text/markdown effort in IETF (invite)

Thu Jul 10 09:30:10 EDT 2014

* Sean Leonard <dev+ietf at seantek.com> [2014-07-10 06:30]:
> On 7/9/2014 8:06 PM, Aristotle Pagaltzis wrote:
> > Yet guessing wrong is largely without consequence.
> >
> > There are really no syntax features that affect the document’s
> > rendering non-locally. If part of a document is written with
> > unsupported syntax, only that part will render incorrectly, but the
> > other parts will come out fine.
>
> There are two use cases that I am particularly interested in: #1 You
> put .md files in a project (readme.md, etc.). These .md files are then
> passed around among project users, which may include developers,
> copy-writers, copy-editors, etc. They need to be sure that the
> readme.md is treated in the same way, which ought to be communicated
> with the data. If one person edits the document in UTF-8 and commits
> and another person edits the document in ISO-8859-1, you're going to
> have problems.
>
> #2 You have some app (let's say some web forum for example, but it
> literally could be anything, an electronic health record, some
> national criminal records, whatever) and you export data from the app.
> Say to some structured data format like XML or a sqlite database. Part
> of data liberation or backup or whatever. You want to get whatever
> your users actually input into the fields--not the HTMLized versions.
> So you need to annotate the blobs of data as Markdown, since users
> like to upload various kinds of data (Word docs, JPEG images, MP4
> videos, bits of text like names of individuals, whatever).
>
> In both cases, rendering matters "non-locally".

I’m afraid you entirely misunderstood what I meant by non-local.

What I was referring to is, e.g. in HTML you can insert tags at the top
of a document such as `<table>` or `<pre>` which then change the way the
entire remaining document is to be rendered. They affect the document
non-locally.

Markdown does not have such constructs. If you include a Markdown Extra
table in the document and you put that document through Markdown.pl, you
will get a garbled form of the source of the table syntax as output for
the table, but the misrendering is only local. The rest of the document
will be unaffected and will render correctly.
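
To make the locality point concrete, here is a minimal sketch in Python. It
is a toy, not Markdown.pl or any real processor: the function names and the
tiny supported subset (headings and paragraphs only) are assumptions made
purely for illustration. Each blank-line-separated block is rendered on its
own, so a block in unsupported syntax cannot affect its neighbours.

```python
import html
import re


def render_block(block: str) -> str:
    """Render one block using a deliberately tiny subset of Markdown."""
    heading = re.match(r"(#{1,6}) +(.*)", block)
    if heading and "\n" not in block:
        level = len(heading.group(1))
        return f"<h{level}>{html.escape(heading.group(2))}</h{level}>"
    # Anything unrecognised (for example a Markdown Extra table) falls
    # through and comes out as a plain paragraph showing its source text.
    return f"<p>{html.escape(block)}</p>"


def render(document: str) -> list[str]:
    """Split on blank lines and render every block independently."""
    blocks = [b.strip() for b in re.split(r"\n\s*\n", document) if b.strip()]
    return [render_block(b) for b in blocks]


DOC = """# Title

First paragraph.

| a | b |
|---|---|
| 1 | 2 |

Last paragraph.
"""

for rendered in render(DOC):
    print(rendered)
```

The table block is emitted as a paragraph containing its raw source, while
the heading and both paragraphs around it render as intended.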

> As someone new to Markdown development, I really want to see some
> comprehensive references (since "authority" in Markdown-land is notably
> absent).

I’m afraid you will have to first find and then survey all of the
processors yourself. The closest there is to central coordination is
discussion on this list, but it’s more of a users list that a lot of
implementors seem to shun (partly or fully) and others are unaware of.

> Besides, since Markdown is such a free-for-all, someone could easily
> write a Markdown processor that turns (!) into
> <script>alert('hello!');</script>.

Sure, someone could, but who would use it? There is no point in basing
any technical considerations on this.

> > So you will get a document that differs from the author’s intent in
> > some way. But it will be clear *where* the differences are and you
> > will still get all of the data in *some* form, quite possibly fully
> > intelligible if not pretty.
>
> For what we might call "sensible flavors" of Markdown, yes. But the
> author's intent may be poorly represented when processed through
> a tool that injects lolcat pictures every third word. Or, the author's
> intent may be very well-represented.

It makes no sense to me to consider obviously silly pseudo-flavours just
because anything can claim to implement Markdown. What author is going
to write a real document using such a processor, and what user is going
to try and read Markdown documents with it?

> The point is...we don't know what the author's intent is, /unless the
> author tells us/.

And he has: he said it’s Markdown. It may not be entirely clear which
flavour, but that alone is a lot more than nothing. Now he sure should
be able to explain himself more specifically than that, but the user
does not depend on more detail to make reasonable sense of the
document.

> > Therefore the flavour parameter ought to be considered nothing more
> > than loosely informative, and the processor should just render the
> > document to the best of its ability regardless of the flavour
> > specified. It MAY use the parameter value to adapt to the document,
> > in RFC 2119 lingo, but ought not be bound by it.
>
> I would reword this:
>
> The flavor parameter informs recipients of the author's intent. The
> processor should just render the document to the best of its ability
> regardless of the flavor specified. It SHOULD use the parameter value
> to adapt to the document.

MAY, MUST or bust. SHOULD is almost automatically a bad idea and should
be employed very sparingly (though it should also not be shied away from
when warranted).

Note that RFC 2119 “SHOULD” is not the same as English “should”.

> I don't know what should happen if the flavor is absent. I am trying
> to understand. Let me put it this way: if you come across un-annotated
> Markdown in the wild (as in, not attached to any processing scripts,
> instructions, directions, whatever), what do you do? "Guess?"

Yes! That was what my entire mail was saying: you just guess. And if you
guess wrong, nothing much happens. The result looks a little ugly and
the user goes View Source and end of story. And that’s if they cannot
decipher the intended meaning at all.
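
As a rough sketch of that behaviour, a processor might treat the parameter
as a hint and otherwise fall back to whatever it does by default. Everything
named here is hypothetical: the parameter name `flavor`, the flavour names
in `SUPPORTED`, and the function `pick_flavour` are assumptions for
illustration, not anything taken from a draft or an existing tool. Only the
header parsing uses a real API, Python's standard `email.message.Message`.

```python
from email.message import Message

# Hypothetical flavour names this imaginary processor knows how to handle.
SUPPORTED = {"original", "extra"}
# What the processor does when it has to guess.
DEFAULT = "original"


def pick_flavour(content_type: str) -> str:
    """Use the declared flavour if we support it, otherwise just guess."""
    msg = Message()
    msg["Content-Type"] = content_type
    declared = msg.get_param("flavor")  # None when the parameter is absent
    if isinstance(declared, str) and declared.lower() in SUPPORTED:
        return declared.lower()
    # Absent or unknown flavour: the choice is up to the processor.
    return DEFAULT


print(pick_flavour("text/markdown; flavor=extra"))
print(pick_flavour("text/markdown"))
print(pick_flavour("text/markdown; flavor=lolcat"))
```

The parameter is never treated as binding here: an unknown or missing value
does not cause an error, it just leaves the processor to its own default.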

> > Furthermore, an absent flavour parameter ought to mean that the
> > flavour is unspecified, not that it is any particular default
> > flavour; i.e. the choice of flavour in that case ought to be up to
> > the processor.
>
> The choice of how to act on the Markdown is /always/ up to the
> processor...so...probably. It just may not represent the author's
> intent.
>
> Between this and the Gruber discussion, I need to get used to this
> idea that "guessing" is a normative part of Markdown culture. :)

The thing is Markdown is not terribly hard to process and it’s easy to
support extra syntax or change interpretation of things slightly.

Part of it is even Gruber himself; his last release is a beta with some
small differences in syntax from the previous stable release, which he
never superseded. Furthermore, he has agreed with certain proposed
tweaks, such as forbidding intra-word underscore emphasis, that he never
got around to putting in code himself but that have been adopted
elsewhere.

So naturally a lot of people have taken it and run with it, in all sorts
of directions. The fact that it accommodates this (while simple uses of
basic features work the same everywhere) is part of the appeal. The
core features are well picked and well designed, so they are attractive
to take as a basis for anyone who wants to design a nice human-readable
shorthand syntax – no need to go through all the basics, just spec out
the one other thing you need and implement it. By calling your own thing
Markdown+extensions you get to profit at least partially from a lot of
software that already exists.

Of course the result is a highly informal and highly fractured landscape
where no two implementations agree on every edge case use of the syntax
and any given syntax extension likely has only a single implementation.
Trying to put this in any order is not going to be easy, if it is even
possible.

But actually I don’t know that Markdown would have been as successful as
it is if it were more strongly formalised. That it makes an attractive
platform for one’s own extensions is probably why it has spread so much:
people extending it do so with their own extensions as their goal, but
thereby implicitly help the core Markdown feature set reproduce itself
in another implementation.

One might say that Markdown is a highly virulent meme, in the original
Dawkins sense of the word.

Regards,
-- 
Aristotle Pagaltzis // <http://plasmasturm.org/>