Metadata syntax (was Universal syntax for Markdown)

John MacFarlane jgm at
Sun Sep 18 11:53:20 EDT 2011

+++ David Sanson [Aug 17 11 23:09 ]:

> First time posting here as well. I've been watching this discussion
> with interest. As a user of (extended) markdown, I have long hoped for
> a unified standard for (most or all) markdown extensions and a unified
> handling of metadata.

Thanks for the thoughtful post. A few comments below, describing
a metadata experiment I've done in implementing lunamark.

> It seems to me that one of the issues that arises when we start
> thinking about metadata is that there really are two different kinds
> of metadata: some metadata (title, author, date) is---at least in many
> cases---also part of the *content* of the document. This is the kind
> of metadata for which I feel the force of the demand for an elegant
> plaintext solution. For some bold suggestions in this direction, see
> this [old post by Michael Thompson][1] to the pandoc-discuss list.
> Here is one of his examples from that post:


> A Good Man Is Hard To Find


> Flannery O'Connor

> Spring 1952



> The grandmother didn't want to go to Florida. She wanted to visit

> some of her connections in east Tennessee and she was seizing at

> every chance to change Bailey's mind.


> Isn't that so much *prettier* than any of the options currently in
> play? Email someone a document like that, and they will know exactly
> what you mean, and see no distracting markup. No doubt this presents
> challenges when it comes to parsing, and I have no idea whether or not
> those challenges are surmountable. Clearly some rules would have to be
> laid down (Does it have to be centered? Indented? Can I underline the
> title ala setext? Do I have to have two blank lines after the date?
> Can I leave the date out? etc.) And it raises issues for backwards
> compatibility too. But I think it's worth having in view a solution
> that achieves a certain degree of perfection along this one dimension.


> But then there is the other kind of metadata. Tags, keywords,
> baseurls, paths to associated files, directives for webpage templating
> software, and so on and so on. This sort of stuff is definitely not
> content. It is a bunch of data that I want to associate with the file
> for some reason or other. It needs to be indefinitely extensible. It
> is frequently tied directly to some specific output format or context.
> In other contexts, it probably just needs to be ignored. Blosxom taught
> us that it should all be at the top of the document (and successors,
> like Jekyll, follow this tradition), but much of it is ugly enough
> that it could just as well be banished to the bottom of the document,
> where nobody but the author would ever have to look at it.


> When it comes to this sort of metadata, I don't see any reason to look
> for something elegant, language-independent, and plaintext-y. This is
> where it feels like I just want a way of embedding a block of data
> within a markdown file, knowing that it won't be treated as content
> (and, depending on my processor and the context, knowing that it may
> be sucked up and used in various ways). It is here that I agree with
> the sentiment that metadata shouldn't be part of the markdown spec,
> *but* I think markdown should be smart enough to ignore the metadata,
> so that I don't have to strip it out before feeding the document to a
> markdown processor.

One way to achieve this is to put metadata inside specially
marked HTML comments. Then existing markdown parsers will all
ignore it (at any rate, it won't display).

That's what I did in lunamark's experimental 'lua_metadata' feature.
Here's an example:

    <!--@
    catalog_number = "23423423A"
    category = "fish"
    tags = { "Arctic", "fish", "char" }
    bib = { title = "Fishing for Arctic char",
            author = "Samuel Smith",
            publisher = "Alaska Press",
            year = "2008" }
    -->

Inside the comment we just have lua declarations (they're processed
in a sandbox, so metadata can't do anything nasty). This makes the
metadata slightly less "textual" looking, but it gives you the ability
to have metadata of various types: string, number, array, key-value
table. And it's actually pretty readable -- note that lua's table
syntax was influenced by bibtex's format.
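
As a rough illustration of the sandboxing idea (a minimal sketch, not lunamark's actual code), metadata declarations can be run against an empty environment table so that globals like `os` and `io` are simply not visible. The function name `eval_metadata` is made up for this example. The sketch only restricts which names are reachable; it does not guard against things like infinite loops.

```lua
-- Evaluate metadata declarations in an empty environment.
-- Handles both Lua 5.1 (loadstring + setfenv) and Lua 5.2+ (load with env).
local function eval_metadata(src)
  local env = {}  -- nothing from the host program is visible in here
  local chunk, err
  if setfenv then
    chunk, err = loadstring(src, "metadata")
    if chunk then setfenv(chunk, env) end
  else
    chunk, err = load(src, "metadata", "t", env)
  end
  if not chunk then return nil, err end
  local ok, runerr = pcall(chunk)
  if not ok then return nil, runerr end
  return env  -- every global the chunk assigned is now a field of env
end

local meta = assert(eval_metadata([[
catalog_number = "23423423A"
tags = { "Arctic", "fish", "char" }
bib = { title = "Fishing for Arctic char", year = "2008" }
]]))

print(meta.catalog_number, #meta.tags, meta.bib.year)
```

Because the environment is empty, a declaration such as `x = os.time()` fails with an error instead of reaching the real `os` table.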

One thing that needs to be considered in a metadata format is that
some metadata entries need to be parsed as markdown, while others
should remain literal (suppose you have a product number with
lots of '*' and '[' in it). I handle this by providing a function
'markdown' or 'm' that you can use:

title = m"Reading *Hamlet*",
author = m"[Sally Cho]("

It doesn't matter whether you write

    markdown "foo"
    markdown("foo")
    m"foo"

They all work.
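
One way such a marker function could work (an assumption for illustration, not a description of lunamark's internals) is to wrap the string in a tagged table, so that a later pass can tell markdown values from literal ones. The field names `markdown` and `source` and the helper `is_markdown` are hypothetical.

```lua
-- Hypothetical marker: wrap a string so a later pass knows to parse it.
local function markdown(s)
  return { markdown = true, source = s }
end

-- Expose the same function under both names.
local env = { markdown = markdown, m = markdown }

-- These are all ordinary Lua call syntax for a single string argument.
local a = env.m"Reading *Hamlet*"
local b = env.m("Reading *Hamlet*")
local c = env.markdown "Reading *Hamlet*"

-- A consumer can distinguish marked values from plain literal strings.
local function is_markdown(v)
  return type(v) == "table" and v.markdown == true
end

print(is_markdown(a), is_markdown("a product number with * and ["))
```

Lua lets you drop the parentheses when a function is called with a single string literal, which is why the different spellings are equivalent.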

It would be possible to define other functions as well, even
ones that do IO, and expose them individually without giving access
to general IO functions. So you could provide a function 'csv' that parsed a
CSV file and included its data as an array. Or a function 'timestamp' that
returns a timestamp.
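
A minimal sketch of that idea, assuming an environment-table sandbox like the one described above: only the names placed in the environment are callable from metadata, so a `timestamp` helper can be offered without exposing `os` or `io`. The helper `run_in` is a made-up name for this example.

```lua
-- Run a chunk with 'env' as its only set of globals (Lua 5.1 and 5.2+).
local function run_in(src, env)
  local chunk, err
  if setfenv then
    chunk, err = loadstring(src, "metadata")
    if chunk then setfenv(chunk, env) end
  else
    chunk, err = load(src, "metadata", "t", env)
  end
  if not chunk then return nil, err end
  local ok, runerr = pcall(chunk)
  if not ok then return nil, runerr end
  return env
end

-- Expose exactly one capability: reading the clock.
local function make_env()
  return { timestamp = function() return os.time() end }
end

local meta = assert(run_in('generated = timestamp()', make_env()))
print(type(meta.generated))

-- General IO stays out of reach: 'io' is not defined in the environment,
-- so this returns nil plus an error message.
print(run_in('f = io.open("somefile")', make_env()))
```

A `csv` function could be exposed the same way: the host opens and parses the file, and the metadata code only ever sees the resulting array.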

Lunamark allows these metadata sections to occur anywhere inside the
document, not just at the beginning. You can also have multiple
metadata sections, and their results are aggregated. All of the
defined variables are available for use in lunamark's document templates.
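
The aggregation can be sketched as follows, assuming metadata sections are HTML comments that open with `<!--@` and that all sections share one environment, so later declarations override earlier ones with the same name. This is an illustration rather than lunamark's implementation, and `collect_metadata` is a hypothetical name.

```lua
-- Find every <!--@ ... --> comment and run its contents in one shared
-- environment, so declarations from all sections end up in one table.
local function collect_metadata(doc)
  local env = {}
  for src in doc:gmatch("<!%-%-@(.-)%-%->") do
    local chunk, err
    if setfenv then                       -- Lua 5.1
      chunk, err = loadstring(src, "metadata")
      if chunk then setfenv(chunk, env) end
    else                                  -- Lua 5.2+
      chunk, err = load(src, "metadata", "t", env)
    end
    if not chunk then return nil, err end
    local ok, runerr = pcall(chunk)
    if not ok then return nil, runerr end
  end
  return env
end

local doc = [[
<!--@
category = "fish"
-->

Some text about Arctic char.

<!--@
tags = { "Arctic", "fish", "char" }
-->
]]

local meta = assert(collect_metadata(doc))
print(meta.category, #meta.tags)
```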

> So, here is my *pipe dream* implementation of metadata in markdown:

> 1. A syntax for clean, language independent title, author, date (and
> ?) that looks the way you would have done it on a typewriter or in a
> plaintext email.

lunamark supports pandoc_title_blocks, which are close to this.
As you pointed out on the pandoc-discuss list, you can always write:

% A Good Man Is Hard To Find
% Flannery O'Connor
% Spring 1952

Getting rid of the '%'s would be feasible if you didn't mind requiring

> 2. Support for embedding arbitrary metadata inside of appropriate
> delimiters (e.g., YAML's '---' and '...') *anywhere* within the
> document.

This is handled by lua_metadata -- not YAML, but more flexible
(since you can define things like 'm') and arguably quite a bit simpler.

