Metadata syntax (was Universal syntax for Markdown)
Bowerbird at aol.com
Fri Aug 26 14:01:45 EDT 2011
all pooped out, are you?
oh well, the conversation this time lasted longer
than it ever has before, in my memory, so maybe
you're just working up your stamina for next time...
so let me finish off this round...
> A Markdown document may contain metadata
> in a human readable form that the parser converts
> to a machine readable form of metadata automatically.
> A casual reader will understand the content directly
> and without distraction. Bowerbird will love this.
indeed, christoph... because you've begun to describe
the very system that i use, for the very reason i use it.
i'll describe it more fully below, but first other stuff....
i'm not sure i fully understand the mentality that says
"implementations of markdown 2.0 can toss metadata".
isn't the objective to dispense with implementations that
act differently from each other? ok, sure, i'm not naive;
i realize that once a "standard" for "markdown 2.0" is made,
someone will come along and "tweak" it for their benefit,
and then we are once again on the path toward fracture.
but still, the goal for here and now is to unify all. right?
i feel the same way about command-line switches that
turn on different "modes", like "quirks" and "extensions".
isn't it our zeitgeist to gather everyone under one roof?
you'll just ignore (or never learn) features you don't need.
so everyone gets what they want. and if it's not possible,
if you want to use the system you have been using which
is tweaked the way you want it, just continue to do that...
it's not like those scripts will stop working or something.
but manufacturing a situation where all of the differences
are _blessed_ (rather than removed) is counterproductive.
now on to "metadata"...
as for the color of the metadata bikeshed, we have one
shade of paint -- "simple" -- so that's what it must be...
you've probably over-discussed it already, without even
getting to the meat of the matter. for _most_ purposes,
the "metadata" is relatively unimportant, which you'll see
quite clearly if you only begin to concentrate on specifics.
in a .pdf, for example, the "metadata" consists merely of
title, author, subject, creator, and keywords. that's it...
in an .epub or a .mobi, you can specify a ton of metadata,
if you want, but there's no standardized way of getting it,
so you're basically whistling at a noisy construction site...
(or doing pantomime in the dark, if you prefer that image.)
unless/until the "microformat" people get an upper-hand
-- and lord help us if that kind of bureaucracy wins out --
"metadata" in .html continues to be a rather iffy thing, so
at least for now, i think this issue needs little attention...
as for the matter of "tags" or "keywords", they're _lame_,
to a large degree, because they can be gleaned from the
text itself in most cases. and perhaps more importantly,
such descriptive judgments need to be accumulated over
the input from hundreds or thousands of "objective" users,
rather than plugged in by a document's author or publisher,
or the specter of gaming the system makes it all worthless...
i'm not telling people not to use tags, but i think it's obvious
that any worthwhile recommendation system will ignore 'em.
your metadata often tries to tell lies; google knows the truth.
there are a lot of consultants selling metadata as a cure-all.
it's more like snake-oil.
as for my system...
as i said, my focus is on _books_, so for me, the concept of
the "title-page" (plus the "cover") is the one that rules here.
the first "section" or "chapter" in a .zml file is the title-page,
and _everything_ on that page is considered as "metadata".
remember that my first pass consists of separating "chunks"
-- a sequence of non-blank lines bordered by blank lines --
so the top chunk (of one or more lines) is defined as the title.
the second chunk is considered to be the subtitle, and the
third is considered to be the author. the "author" chunk is
required to start with the word "by", so if the second chunk
starts with "by" and the third chunk does not, my routines
assume that the book has no subtitle, so the second chunk
is considered to be the "author" chunk. subsequent chunks
are required to be labeled appropriately, such as "edited by"
or "illustrations by" or "plus additional contributions by" or
"with preface by", and so on. you get the picture; it's clear.
other things which commonly appear on the title-page are
the publisher's name and often the city where it is located,
publication date, contact information for the author(s), etc.
none of this is particularly difficult to parse.
nor does it sacrifice any power _or_ flexibility.
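
to make the chunk logic concrete, here is a minimal sketch in python.
it is not the actual z.m.l. code, just an illustration of the rules
described above. the names split_chunks and parse_title_page, the
returned dictionary and its keys, and the choice to strip the leading
"by" from the author are all assumptions made for this example. it also
assumes the title-page text has already been separated from the rest of
the file.

```python
import re


def split_chunks(text):
    """Split text into chunks: runs of non-blank lines bordered by blank lines."""
    parts = re.split(r"\n\s*\n", text.strip())
    return [p.strip() for p in parts if p.strip()]


def starts_with_by(chunk):
    """True if the chunk begins with the word "by" (case-insensitive)."""
    return re.match(r"by\b", chunk, re.IGNORECASE) is not None


def parse_title_page(text):
    """Read title, optional subtitle, and author from a title-page.

    Hypothetical helper: anything after the author chunk is returned
    untouched under "other" (e.g. "edited by ..." chunks).
    """
    chunks = split_chunks(text)
    meta = {"title": None, "subtitle": None, "author": None, "other": []}
    if not chunks:
        return meta

    meta["title"] = chunks[0]
    second = chunks[1] if len(chunks) > 1 else None
    third = chunks[2] if len(chunks) > 2 else None

    if second is not None and starts_with_by(second) and not (
        third is not None and starts_with_by(third)
    ):
        # second chunk starts with "by" and the third does not: no subtitle
        author_chunk, rest = second, chunks[2:]
    elif third is not None and starts_with_by(third):
        # usual layout: title, subtitle, author
        meta["subtitle"] = second
        author_chunk, rest = third, chunks[3:]
    else:
        # no recognizable author chunk; leave the remainder unclassified
        author_chunk, rest = None, chunks[1:]

    if author_chunk is not None:
        # drop the leading "by" so only the name(s) remain
        meta["author"] = re.sub(r"^by\s+", "", author_chunk, flags=re.IGNORECASE)
    meta["other"] = rest
    return meta
```

calling parse_title_page on a title-page whose chunks are a title, a
subtitle, "by jane doe", and "edited by john roe" would fill in all three
fields and leave the "edited by" chunk under "other" for a later pass
to label.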
other info about the document is obtained in the course of
analyzing it, like the number of chapters and illustrations,
the size of the file, the number of references, and so forth.
you also have to acknowledge, at some point in time, that
no matter what you do, you ain't gonna make a professional
book-cataloger happy... and one of my close friends is just
such an animal, working in the library system over at u.c.l.a.
their cataloging workflow can summon hundreds of variables,
depending on the unique characteristics of a particular book,
and that's a complexity that we could never hope to replicate.
at the same time, though, we can get 80% of the utility with
2% of the effort (yes i did say 2%, and not the expected 20%),
so that's the sweet spot we need for maximum cost-benefit.
as i said, there are a lot of consultants selling metadata as
snake-oil, and the most common pitch is that metadata will
give better discovery. that's hogwash. discovery will always
be inferior until we develop good collaborative filtering, and
that's necessary anyway, and fully independent of metadata.
there's something else that i generally put under "metadata"
-- which other people do not -- namely the specifications
used to create the output-formats. these include things like
straight-quotes vs. curly, indented paragraphs vs. block, and
the pagesize (for .pdf), the font, fontsize, leading, and so on.
this allows the end-user who receives the z.m.l. file to create
outputs matching what the author intended them to look like.
in accordance with the all-text-in-one-file mandate of z.m.l.,
these specifications should be included in the text-file itself,
and can fall in the "metadata" section, the "colophon" section,
or in their own "output specifications" section, as you desire...
and, of course, end-users can also change the specifications,
so as to create output that is formatted to their own desires...