Metadata syntax (was Universal syntax for Markdown)

David Chambers david.chambers.05 at gmail.com
Sun Sep 18 18:08:33 EDT 2011


On Sep 18, 2011, at 10:47 AM, John MacFarlane wrote:

> * There's no provision for structured data (e.g. key/value
> tables or lists), or for boolean or numerical fields.

I'm not convinced that Markdown should have any say as to which data structure a particular value should be transformed into.

These are the things I believe Markdown certainly should define:

* delimiters for metadata blocks (whitespace or otherwise)
* syntax for key–value pairs
* valid keys
* valid values

Perhaps Markdown's responsibilities should be limited to the following:

* ensuring that metadata are omitted from the HTML output
* storing the key–value pairs (as strings) in a dictionary-like object

The reason I lean towards this approach is that the alternative (defining syntax for lists, numbers, etc.) would impose extra syntax in common cases. Take the following, for example:

date: Sunday, 22 May 2011
time: 6:30pm
zone: America/Los_Angeles
tags: JavaScript, regex, regular expressions
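
To make the "strings only" position concrete, here is a minimal Python sketch of the two responsibilities listed above, run against this example. It assumes, hypothetically, that a metadata block is a run of `key: value` lines at the very top of the file, ended by the first blank line; `split_metadata` and `FIELD` are illustrative names, not part of any existing Markdown implementation.

```python
import re

# Hypothetical rule: a metadata line is "key: value", where the key is made of
# letters, digits, underscores or hyphens and the first colon ends the key.
FIELD = re.compile(r"^([A-Za-z0-9_-]+):[ \t]*(.*)$")


def split_metadata(text):
    """Return (metadata, body); every metadata value is kept as a raw string."""
    lines = text.split("\n")
    metadata = {}
    i = 0
    while i < len(lines):
        match = FIELD.match(lines[i])
        if not match:
            break  # the first non-field line (normally a blank one) ends the block
        key, value = match.groups()
        metadata[key] = value.strip()
        i += 1
    # Skip the separating blank line; only the remaining lines form the body,
    # so the metadata lines never reach the HTML output.
    if metadata and i < len(lines) and lines[i].strip() == "":
        i += 1
    return metadata, "\n".join(lines[i:])


doc = """date: Sunday, 22 May 2011
time: 6:30pm
zone: America/Los_Angeles
tags: JavaScript, regex, regular expressions

Body text starts here."""

meta, body = split_metadata(doc)
print(meta["tags"])  # JavaScript, regex, regular expressions
print(body)          # Body text starts here.
```

Note that "date" and "tags" come back as plain strings, commas and all; deciding what either one means is left to whoever consumes the dictionary.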

To a human reader, "tags" is clearly a list. How, though, would a parser know that "tags" is a list but "date"—which also contains a comma—is not? Resolving this ambiguity would require that the tags be wrapped in square brackets (or the addition of some other syntax):

date: Sunday, 22 May 2011
time: 6:30pm
zone: America/Los_Angeles
tags: [JavaScript, regex, regular expressions]
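
Under a bracket convention like this, a parser needs one extra rule. A minimal sketch, assuming (hypothetically) that a value is a list if and only if it is wrapped in `[` and `]`, and that items are split naively on commas; `parse_value` is an illustrative name:

```python
def parse_value(raw):
    """Hypothetical convention: only values wrapped in [ ] are lists."""
    raw = raw.strip()
    if raw.startswith("[") and raw.endswith("]"):
        # Naive split: no item may itself contain a comma.
        return [item.strip() for item in raw[1:-1].split(",")]
    return raw  # anything else, commas included, stays a single string


# Splitting every value on commas would wrongly break up the date:
print("Sunday, 22 May 2011".split(","))  # ['Sunday', ' 22 May 2011']

print(parse_value("Sunday, 22 May 2011"))
# Sunday, 22 May 2011
print(parse_value("[JavaScript, regex, regular expressions]"))
# ['JavaScript', 'regex', 'regular expressions']
```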

What if list items are allowed to contain commas? Perhaps an item could be quoted to resolve this ambiguity. What happens, then, if one wishes to include an item that is itself meant to appear in quotation marks?

tags: [foo, bar, "baz!"]

If quotation marks are optional, would this necessitate wrapping "baz!" in an extra pair?
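
For comparison, CSV already answers these questions, and Python's standard `csv` module implements its rules: surrounding quotes are stripped, and a literal quotation mark inside a quoted item is written twice. A sketch that borrows those rules for the contents of a bracketed list (an assumed convention, not something proposed in this thread; `parse_list` is an illustrative name):

```python
import csv


def parse_list(inner):
    """Split list contents using CSV-style quoting (an assumed convention)."""
    # skipinitialspace lets a quote directly after ", " open a quoted item.
    return next(csv.reader([inner], skipinitialspace=True))


print(parse_list('foo, bar, "baz!"'))              # ['foo', 'bar', 'baz!']
print(parse_list('foo, bar, """baz!"""'))          # ['foo', 'bar', '"baz!"']
print(parse_list('"Sunday, 22 May 2011", regex'))  # ['Sunday, 22 May 2011', 'regex']
```

Under CSV rules, then, the answer is yes and then some: "baz!" with its quotation marks intact has to be written as `"""baz!"""`, which is exactly the kind of extra punctuation at issue here.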

These are certainly edge cases, but, as we've agreed, defining correct behaviour in such cases is important. If we want to avoid defining our own serialization format, we have two options: we can adopt an existing format (such as JSON or YAML), or we can hand off the responsibility to application developers.

I favour the latter, because serialization formats, by necessity, contain quite a bit of punctuation. Transforming strings from a metadata dictionary into appropriate values is something with which I have first-hand experience. Mango provides a META_LISTS setting which determines which keys' (string) values should be transformed into lists. Sure, this required a bit of work on my part, but the end result is pleasing (no extra punctuation in my Markdown files).
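
A rough sketch of this application-side approach, assuming a setting shaped like META_LISTS (a set of key names whose string values are comma-separated lists). This is a hypothetical illustration of the idea, not Mango's actual code; `transform` is an illustrative name:

```python
# Hypothetical application-level setting modelled on the idea behind META_LISTS;
# the splitting rule below is an assumption, not Mango's implementation.
META_LISTS = {"tags"}


def transform(metadata, list_keys=META_LISTS):
    """Turn the string values of selected keys into lists; leave the rest alone."""
    result = {}
    for key, value in metadata.items():
        if key in list_keys:
            result[key] = [item.strip() for item in value.split(",") if item.strip()]
        else:
            result[key] = value
    return result


meta = {
    "date": "Sunday, 22 May 2011",
    "tags": "JavaScript, regex, regular expressions",
}
print(transform(meta))
# {'date': 'Sunday, 22 May 2011', 'tags': ['JavaScript', 'regex', 'regular expressions']}
```

The Markdown file itself carries no list syntax; only the application knows that "tags" is a list and "date" is not.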

Won't this lead to a situation where one application cannot correctly process another application's metadata? Yes. If we're unwilling to accept this, I fear we'll end up reinventing YAML. ;)

David


On Sep 18, 2011, at 11:07 AM, Fletcher T. Penney wrote:

>
> On Sep 18, 2011, at 1:47 PM, John MacFarlane wrote:
> <snipped>
>
>> To my mind, multimarkdown comments just aren't flexible enough:
>>
>> * There's no way to have multiline metadata fields that contain
>> blank lines, e.g. an abstract with two paragraphs.
>
> True - but in MMD an abstract would be included in the document with a separate header, not as metadata. But you're correct that blank lines are not allowed. I've never needed them, but they aren't allowed.
>
>> * There's no provision for structured data (e.g. key/value
>> tables or lists), or for boolean or numerical fields.
>
> True. I've never needed them, and have never had them requested. But there is no provision for that.
>
>> * Metadata fields are interpreted as raw strings, not markdown.
>> That's sometimes what you want, but not always. Titles
>> often contain emphasis and other formatting, for example,
>> and sometimes even footnotes (for acknowledgements). If
>> these are just going into an html meta field, it doesn't much
>> matter, but if you're using the metadata fields in templates,
>> it does. (And sure, you could always run a raw string through
>> your markdown processor again, before passing it to the template engine,
>> but that creates problems for things like reference links and
>> footnotes.)
>
> This is a slight difference in behavior from MMD 2. I'm considering approaches to allow processing the contents of the metadata, as this can be an issue occasionally.
>
>> Another major problem, in my view, is that if a document starts
>> with a phrase followed by a colon, it gets swallowed into metadata:
>>
>> % multimarkdown
>> To be or not to be: that is the question.
>> ^D
>> <?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
>> <!DOCTYPE html>
>> <html xmlns="http://www.w3.org/1999/xhtml">
>> <head>
>> <meta name="tobeornottobe" content="that is the question."/>
>> </head>
>> <body>
>>
>> </body>
>> </html>
>>
>> That's not what most authors would expect!
>
> This is true. But a blank line at the top of the document solves the problem. And it doesn't match a URL on the first line as metadata, so I'm not sure how often this really happens in real life.
>
>> For this reason, I would favor something more like reStructuredText
>> field lists, which mark the fields explicitly as fields:
>>
>> :title: Here is the title.
>> :author: John
>> :abstract: The abstract here.
>>     It can span multiple lines.
>>
>>     As long as the indentation is maintained.
>>
>> This is not part of the metadata.
>>
>> This is slightly less texty because of the leading colon, but less likely to
>> capture regular text.
>
> This becomes a matter of values. To me, the ugliness of this approach outweighs the virtually negligible chance that I will have a document triggering metadata when I don't mean it. But it's certainly not as bad as some other alternatives. If it was proposed as a standard, I would try to vote against it, but would not necessarily "boycott" it within MultiMarkdown.
>
>> Also, because this is recognizable as metadata wherever it occurs
>> in the document, one could then drop the requirement that the
>> metadata occur at the top of the document, which I think is
>> undesirable. When there's lots of metadata, it's nicer to put
>> it at the bottom (or at least to put some of it at the bottom),
>> so it doesn't interfere with reading the article. lunamark's
>> lua_metadata allows that, by the way -- so you don't have to
>> start the document with something that doesn't look like plain
>> text.
>
> I don't view metadata as necessarily belonging at the bottom, but the flexibility is a bonus.
>
>> One nice point that David Sanson made is that one could combine
>> a simple, "texty" metadata format for common things like titles
>> and authors with a flexible, more "cody" format for everything else.
>> One should keep this in mind in thinking about how to balance flexibility
>> vs. textiness.
>>
>> John
>
> My vote would be for something more akin to MMD's metadata as the first option, and then for something more robust as the optional variant for those who need it. The "cody" alternative could allow lists, key value pairs, multiple paragraphs, etc. I suspect it would be used by only a minority of users, but that the minority is going to be over-represented on this discussion list.
>
> F-
>
> --
> Fletcher T. Penney
> fletcher at fletcherpenney.net
>
> _______________________________________________
> Markdown-Discuss mailing list
> Markdown-Discuss at six.pairlist.net
> http://six.pairlist.net/mailman/listinfo/markdown-discuss
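
The two detection rules debated in the quoted exchange can be contrasted with a small sketch. Both regular expressions are simplified, hypothetical approximations: `mmd_style` only checks whether the first line looks like `key: value` (it is not MultiMarkdown's actual grammar), and `field_list` collects reStructuredText-like `:key: value` fields with indented continuation lines (it is not a real reStructuredText parser).

```python
import re

# Simplified, hypothetical rules for illustration only; neither is the exact
# grammar of MultiMarkdown or of reStructuredText.
MMD_LINE = re.compile(r"^([^:\s][^:]*):\s+(.*)$")  # "Some key: value"
FIELD_LINE = re.compile(r"^:([^:]+):\s*(.*)$")      # ":key: value"


def mmd_style(text):
    """True if the first line looks like 'key: value', i.e. would start metadata."""
    first_line = text.split("\n", 1)[0]
    return MMD_LINE.match(first_line) is not None


def field_list(text):
    """Collect ':key: value' fields; indented or blank lines continue the current field."""
    fields = {}
    key = None
    for line in text.split("\n"):
        match = FIELD_LINE.match(line)
        if match:
            key = match.group(1)
            fields[key] = [match.group(2)]
        elif key is not None and (line == "" or line.startswith(" ")):
            fields[key].append(line.strip())
        else:
            key = None  # an unindented, non-field line ends the current field
    # Join continuation lines; blank lines survive as paragraph breaks.
    return {k: "\n".join(v).strip() for k, v in fields.items()}


quote = "To be or not to be: that is the question."
print(mmd_style(quote))         # True: the sentence would be swallowed as metadata
print(mmd_style("\n" + quote))  # False: a leading blank line avoids it
print(mmd_style("http://example.com/ is a link."))  # False: no space after the colon
print(field_list(quote))        # {}: no leading colon, so it stays body text

doc = """:title: Here is the title.
:author: John
:abstract: The abstract here.
    It can span multiple lines.

    As long as the indentation is maintained.

This is not part of the metadata."""
print(field_list(doc))
```

Under these assumptions the opening sentence is captured by the first rule but not the second, and the two-paragraph abstract survives as a single field value.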

