Metadata syntax (was Universal syntax for Markdown)

Fletcher T. Penney fletcher at fletcherpenney.net
Wed Aug 17 17:29:50 EDT 2011


1) This is a totally separate issue from the discussion I was responding to, which is how to try to converge the Markdown derivatives, so I renamed the thread.

2) That said, there are certainly multiple ways of including metadata information, and I'm happy to discuss. But I do want to be clear that I think this is a secondary discussion, and do not wish to distract from the bigger picture of how to develop a plan for unifying the core features of the Markdown family. How to merge metadata syntaxes should be a secondary or tertiary concern for such an effort.



For those who care about metadata syntaxes, read on. If you don't, feel free to skip to the next email that interests you. ;)


The MMD format for metadata was actually taken from the Blosxom software that you mention. As you may recall, the first line could be used as a title, but beyond that a syntax basically identical to that of MMD was the most common way of including metadata, and I believe that the plugin responsible was, in fact, called metadata. This was necessary to allow information such as dates, categories, etc. to be included in the document itself. The ability to include metadata using arbitrary keys created a blossoming (pun intended) of plugins that added many useful features to the Blosxom package.
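
As a rough illustration of the "key: value" style described above, here is a minimal Python sketch. It is not the actual MMD or Blosxom parser: the function name `parse_metadata`, the rule that indented or colon-free lines continue the previous value, and the key normalization (lowercasing and removing spaces) are all assumptions made for the example.

```python
def parse_metadata(text):
    """Split a document into (metadata dict, body) using 'key: value' lines.

    Simplified sketch: metadata runs from the first line up to the first
    blank line; a line that is indented or has no colon continues the
    previous value.
    """
    lines = text.split("\n")
    meta = {}
    last_key = None
    index = 0
    while index < len(lines) and lines[index].strip():
        line = lines[index]
        key, sep, value = line.partition(":")
        if sep and not line[0].isspace():
            # Assumed normalization: lowercase the key and drop spaces.
            last_key = key.strip().lower().replace(" ", "")
            meta[last_key] = value.strip()
        elif last_key is not None:
            # Continuation line: append to the previous value.
            meta[last_key] += " " + line.strip()
        else:
            # The document does not start with 'key: value': no metadata.
            return {}, text
        index += 1
    # Skip the blank separator line; everything after it is the body.
    body = "\n".join(lines[index + 1:])
    return meta, body


doc = "Title: Sample\nAuthor: John Smith\ndate: 2011-08-17\n\nBody text."
meta, body = parse_metadata(doc)
print(meta["author"], "|", body)
```

Because the keys are arbitrary, a sketch like this never needs to know what "author" or "date" mean; that interpretation is left to whatever consumes the dictionary.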

Your suggested syntax certainly requires less markup than that used by MMD currently, but at the cost of a great deal of flexibility, and would require more complexity in programming the parser.

You mention the English-centric nature of MMD metadata. This is certainly true, but no more so than HTML itself. One could certainly localize MMD to use any language you like (the beauty of open source), but to match your proposal in multiple languages would be quite complicated.

For example, the following are valid MMD metadata dates, and easily used:

date: 8/17/2011
date: August 17th, 2011
date: 2011-08-17
date: 17/8/2011
date: 14. Juni 2001
date: 8 avril 2000

Writing a parser that would correctly catch all of these dates in any language would be quite difficult, and prone to error.
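
To make the difficulty concrete, the following Python sketch tries a few purely numeric layouts against a date string and returns every reading that fits. It only illustrates the ambiguity; the `FORMATS` list and the `candidate_dates` helper are assumptions for this example, and month names such as "Juni" or "avril" are deliberately not handled, since they would need per-language tables.

```python
from datetime import datetime

# Numeric layouts only: US month-first, European day-first, and ISO.
FORMATS = ["%m/%d/%Y", "%d/%m/%Y", "%Y-%m-%d"]


def candidate_dates(text):
    """Return every distinct date the text could mean under FORMATS."""
    found = []
    for fmt in FORMATS:
        try:
            parsed = datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue  # this layout does not fit the text
        if parsed not in found:
            found.append(parsed)
    return found


for sample in ["8/17/2011", "17/8/2011", "4/5/2011", "14. Juni 2001"]:
    print(sample, "->", candidate_dates(sample))
```

A string such as "4/5/2011" yields two different dates (April 5 and May 4), so a parser without an explicit key or locale hint has to guess, and the non-numeric examples yield nothing at all.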

You mention tags as being easily recognized, but this is not always true:

A sample document

by John Smith, MD
Director of Palliative Care, Division of General Medicine, Medical University of Somewhere

While perhaps not the best example of potential problems, this would be incorrectly interpreted as tags, when the author probably intends it to represent his academic affiliation and would like it to be properly placed after his name on the title page, or on the slide deck if generating via beamer.
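
The misfire can be reproduced with a naive version of the proposed rule ("any list of words separated by commas on a single line would be treated as tags"). The Python function below, `looks_like_tags`, is a hypothetical reading of that rule written for illustration, not code from any Markdown implementation.

```python
def looks_like_tags(line):
    """Naive rule: one line of comma-separated items is a tag list.

    Returns the list of items, or None if the line has no commas
    or contains an empty item.
    """
    parts = [part.strip() for part in line.split(",")]
    if len(parts) < 2 or not all(parts):
        return None
    return parts


tags = "Markdown, Standardization, MMD, Metadata"
affiliation = ("Director of Palliative Care, Division of General Medicine, "
               "Medical University of Somewhere")

print(looks_like_tags(tags))
print(looks_like_tags(affiliation))
```

Both lines pass the test, so the affiliation is split into three "tags". Telling the two apart would require either an explicit key or some knowledge of what the words mean.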


So your example would work for simple metadata that relies only on numerical dates. For documents that fit your desired model, this syntax would be great and would involve less markup --- which is good. However, I suspect that for those who want metadata in their document, it would be too limiting --- which is not good. Many of my users, myself included, would end up right back where we started with needing another way to include metadata.


To help give you perspective on the power of the current metadata model, by properly including the right metadata, a single MMD document can be processed into a web page, a PDF slide show (aka "powerpoint"), and a PDF handout. Another document can be processed into letterhead, complete with logo, return address, recipient information, graphical signature, and even a properly addressed envelope. Another can be output as a properly formatted manuscript for submission to a publisher.

I don't expect all users to use the full power of metadata. Many users can simply ignore it altogether. But it is an incredibly useful feature that is one of the primary ways that I integrate MMD into my own personal workflow. It does take a bit of willingness to dig around and experiment in order to understand how metadata works. So while I am certainly interested in ways to improve it, metadata will not be removed from MMD.


That doesn't mean I expect all variants to use metadata, just because MMD does. Nor do I expect them to follow the MMD syntax if they do. Other than yours, I haven't seen any proposals for a metadata syntax that had *less* markup than mine, nor have I seen any that seemed more "human friendly" than this syntax. And for my purposes, your proposal doesn't offer the flexibility that I would need for the ways I use MMD.


I've tried to throw a few things into your example, to show how it wouldn't work as well for my own use cases:

---
Test Document for Automatic Metadata Detection
Is this a subtitle, or a continuation of the title from above?

by Christoph Freitag
date: August 17, 2001
Markdown, Standardization, MMD, Metadata
affiliation: University of Somewhere
comment: This looks funny aligned with preceding line,
but not the other lines
---

(forgive the alignment above - since the world has still not moved towards elastic tab stops, there's no way to guarantee it looks right in every email program.)


How would you suggest combining the simplicity of your proposal, with the flexibility needed by many users?


F-



On Aug 17, 2011, at 4:29 PM, Christoph Freitag wrote:


>

> Am 17.08.2011 um 18:00 schrieb markdown-discuss-request at six.pairlist.net:

>

> Fletcher T. Penney pointed this out:

>

>> I think that for any movement towards a "unification" of the Markdown variants to have a chance of success, the first step is to agree on a core set of principles.

>>

>> For example, one of my core principles for MultiMarkdown is taken from Gruber:

>>

>>> The overriding design goal for Markdown's formatting syntax is to make it as readable as possible. The idea is that a Markdown-formatted document should be publishable as-is, as plain text, without looking like it's been marked up with tags or formatting instructions.

>>

>

> Fletcher, sorry, but personally -- despite loving MMD (and even having used MMD CMS for a diary) -- I have never liked the way MMD handles metadata. Partly this is because, not being a native English speaker, I dislike English meta descriptors. A localization could resolve this -- but I still think it looks ugly. However, do you actually need descriptors at all? I doubt it:

>

> * The title could be anything "at the start" of the document. Blosxom is a good example. Anything up to the first blank line is the title.

> * After that, anything between the first blank line and the second blank line would be treated as additional metadata.

> * Instead of the "Author:" descriptor, explicitly stated, it should suffice to write "by". What follows is the name of the author. (Localization would be easier as only this "keyword" would have to be known to the parser in a number of languages.)

> * Dates would be self-explanatory, to a clever parser.

> * Any list of words separated by commas on a single line would be treated as tags.

> * Any more fanciful meta descriptors might be given explicitly just as in MMD before. This could be left to non-standard, personalized variants of Markdown.

>

> Thus the following would be a valid document:

>

> ---

> Test Document for Automatic Metadata Detection

>

> by Christoph Freitag

> 08/17/2011

> Markdown, Standardization, MMD, Metadata

>

> A Markdown document may contain metadata in a human readable form that the parser converts to a machine readable form of metadata automatically. A casual reader will understand the content directly and without distraction. Bowerbird will love this.

> ---

>

> Best regards,

> Christoph



--
Fletcher T. Penney
fletcher at fletcherpenney.net



