Metadata syntax (was Universal syntax for Markdown)

David Sanson dsanson at gmail.com
Mon Sep 19 23:04:44 EDT 2011


On Mon, Sep 19, 2011 at 2:34 PM, Fletcher T. Penney
<fletcher at fletcherpenney.net> wrote:

> Not to repeat myself, but I again think we're approaching this from the wrong end.  If there's going to be a consensus, I think it's going to have to start with a shared philosophy for the standards.  Each variant may end up with its own philosophy outside of that, but there has to be a common vision for the purpose of the standard.


It seems to me that many visions have been expressed here. I'm not
sure what more can be done to generate consensus. But I'm happy to try
to express my own. And, for what it is worth, bowerbird, this is the
vision of a user, not a developer :-)

I have two visions, and I think they are compatible. One is an agreed
upon text-y format for title, author, and date. The other is an agreed
upon text-y-as-possible-but-no-doubt-more code-y format for arbitrary
metadata. If I were going to push for consensus on one of these rather
than the other, it would be the second, but I'd like to see both, and
I think, as I've suggested before, that some of the reasons for
resisting code-y metadata (elegance and aesthetics, not assuming that
all documents are in English) are better thought of as reasons for
developing a text-y format for title, author, and date.

As for the code-y metadata, I think it is a mistake to think that we
can imagine ahead of time all the ways this arbitrary metadata might
be used, so I'd like it to be as flexible and powerful as possible.
I've already mentioned one vision---the ability to embed bibliographic
data in academic papers---but that's just something I think about
because I am an academic who often uses markdown to write papers with
lots of citations. Markdown is used in so many ways by so many
different people---bloggers writing posts, academics writing research
papers, scriveners writing novels, developers writing READMEs, and so on. I
say: make it as powerful as feasible and let the users discover new
uses.

There has been some discussion of whether or not there is any real
need for multi-paragraph metadata, focusing on the example of
abstracts. I currently use Jekyll for my website. By far the easiest
way to generate a "blurb" for a given page---the sort of thing that on
a blog gets shown "before the fold"---is to toss it into a metadata
field and adjust Jekyll's templates to use the content of the blurb.
There are no doubt other ways to do this---filters and scripts and
pre- or post-processors. But that doesn't take away from the fact that
using metadata is one very easy way to do this. So multi-paragraph
metadata is something I use regularly in this context.
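
To make the blurb case concrete, here is a minimal sketch in Python.
It assumes Jekyll-style front matter delimited by "---" lines and a
YAML literal block ("|") for the multi-paragraph field. The field name
"blurb", the sample text, and the helper read_front_matter are all
hypothetical, and the parser handles only this tiny subset of YAML; a
real tool would use a proper YAML library.

```python
SAMPLE = """\
---
title: Example page
blurb: |
  First paragraph of the blurb.

  Second paragraph of the blurb.
---
Body of the page in markdown.
"""


def read_front_matter(text):
    """Split a document into (metadata dict, body).

    Handles only 'key: value' lines and 'key: |' literal blocks whose
    continuation lines are indented by two spaces. Not a YAML parser.
    """
    lines = text.split("\n")
    if not lines or lines[0] != "---":
        return {}, text
    try:
        end = lines.index("---", 1)  # closing delimiter
    except ValueError:
        return {}, text

    meta = {}
    block_key = None   # key of the literal block being collected, if any
    block_lines = []
    for line in lines[1:end]:
        if block_key is not None and (line.startswith("  ") or line == ""):
            block_lines.append(line[2:])  # drop the two-space indent
            continue
        if block_key is not None:
            # An unindented line ends the literal block.
            meta[block_key] = "\n".join(block_lines).strip("\n")
            block_key = None
        if not line.strip():
            continue
        key, _, value = line.partition(":")
        if value.strip() == "|":
            block_key, block_lines = key.strip(), []
        else:
            meta[key.strip()] = value.strip()
    if block_key is not None:
        meta[block_key] = "\n".join(block_lines).strip("\n")
    return meta, "\n".join(lines[end + 1:])


meta, body = read_front_matter(SAMPLE)
paragraphs = meta["blurb"].split("\n\n")  # the blurb keeps both paragraphs
```

A template could then render the paragraphs of the blurb "before the
fold" and the body after it, with no extra filters or pre-processors.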

There has been some discussion of whether or not markdown
implementations should be responsible for parsing this code-y
metadata. I suppose it is part of my vision that markdown
implementations do parse this code, and pass it along as appropriate
to templates and the like. But John's first point of possible
consensus,

1. Agreement about which bits of the document are metadata, so
these won't be processed as part of the document's text.

would be of great value on its own. I've spent time converting
documents from Scrivener or Mellel to MMD, and then to Pandoc's
extended markdown. A MMD document with lots of metadata---even with
hard line breaks---is, when used with other processors, a markdown
file with a bunch of junk at the top that has to be trimmed away.
Likewise, I've written documents using Pandoc's title-author-date
blocks, and then needed to use those documents with other processors,
and that stuff at the top was just so much junk that had to be trimmed
away. So if everyone could just agree on what to ignore, that would be
a serious improvement.
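
As a rough illustration of the "agree on what to ignore" step, here is
a sketch in Python of the trimming I currently have to do by hand. It
assumes two conventions: a Pandoc-style title block (leading lines that
start with "%", with indented continuation lines) and an MMD-style
block of "Key: value" lines that ends at the first blank line. The
function name strip_leading_metadata is hypothetical, and the
detection rules are simplified guesses, not either tool's actual
parser.

```python
import re


def strip_leading_metadata(text):
    """Return text with a leading metadata block removed, if one is found."""
    lines = text.split("\n")
    i = 0
    if lines and lines[0].startswith("%"):
        # Pandoc-style title block: '%' lines plus indented continuations.
        while i < len(lines) and (
            lines[i].startswith("%") or lines[i].startswith(" ")
        ):
            i += 1
    elif lines and re.match(r"^[A-Za-z0-9][^:]*:\s", lines[0] + " "):
        # MMD-style block: runs until the first blank line.
        # Note the ambiguity: an ordinary first paragraph such as
        # "Note: ..." would also be treated as metadata here.
        while i < len(lines) and lines[i].strip():
            i += 1
    if i:
        # Skip the blank lines separating the metadata from the body.
        while i < len(lines) and not lines[i].strip():
            i += 1
    return "\n".join(lines[i:])


pandoc_doc = "% My Title\n% A. Author\n% 2011-09-19\n\nBody text.\n"
mmd_doc = "Title: My Title\nAuthor: A. Author\n\nBody text.\n"
```

The ambiguity noted in the comment is the reason a shared rule would
help: each processor could apply the same test and skip the same lines.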

But if markdown implementations are not themselves going to be
responsible for parsing the code-y metadata, I would strongly prefer
that the metadata be in a format that has existing wide support. I
doubt that some decree by the markdown community will have the power
to move all the developers who have developed all the various tools
that use markdown and rely on metadata. And I think the whole thing is
likely to be a nonstarter if it requires that these developers all
write parsers for some newfangled format. Even if markdown
implementations are going to handle the parsing, I guess someone is
also going to need to write tools for translating existing data
formats into the new format---unless we are assuming that nobody would
ever want to use existing data as metadata in a markdown document?

So if there were a standard out there for human writable/machine
readable plaintext data that shares the values of markdown, I would
think it made more sense to use that, and let the markdown community
focus their intellectual energy on markdown. I had naively mentioned
YAML in an earlier post just because among us naive users, it has the
reputation of being such a standard. But I really don't know anything
about plaintext data formats, and have no special affection for YAML.
Maybe the reStructuredText format is better; maybe Lua's format is
better; maybe there really isn't anything out there that gets things
right.

So that's my vision. I can see that it differs from Fletcher's vision
of figuring something out that is good enough for 90% of existing
uses. But I'm not so sure that, in practice, there is a large gap when
it comes to what would need to be implemented.

David

