text/markdown effort in IETF (invite)

Thu Jul 10 00:25:44 EDT 2014

On 7/9/2014 8:06 PM, Aristotle Pagaltzis wrote:
> * Sean Leonard <dev+ietf at seantek.com> [2014-07-09 22:10]:
>> Markdown has no way to communicate the character set in the document
>> (other than the Unicode Byte Order Marks, which is a generalized
>> property about text streams, not specific to Markdown)--and it would
>> be counterproductive to invent one. So that is a perfect example of
>> relevant metadata. And the second one, is how to turn it into
>> something else that the author wants. If it's not communicated, it's
>> going to be implied. Implied means "guessing" and likely "guessing
>> wrong".
> Yet guessing wrong is largely without consequence.
>
> There are really no syntax features that affect the document’s rendering
> non-locally. If part of a document is written with unsupported syntax,
> only that part will render incorrectly, but the other parts will come
> out fine.
There are two use cases that I am particularly interested in:
#1 You put .md files in a project (readme.md, etc.). These .md files are
then passed around among project users, who may include developers,
copy-writers, copy-editors, etc. They all need readme.md to be treated
the same way, and how to treat it ought to be communicated with the
data. If one person edits the document in UTF-8 and commits, and another
person then edits it as ISO-8859-1, you're going to have problems.
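
To make that failure concrete, here is a minimal Python sketch. The sample text is invented for illustration and the two "editors" are just two steps in one script, so treat it as a picture of the problem rather than a description of any particular tool.

```python
# Invented sample text standing in for a line of readme.md.
original = "Café résumé"

# Editor A saves the file as UTF-8.
stored_bytes = original.encode("utf-8")

# Editor B's tool wrongly assumes ISO-8859-1 when opening the same bytes.
# Every byte is a valid ISO-8859-1 character, so no error is raised.
misread = stored_bytes.decode("iso-8859-1")

# Editor B saves as UTF-8 and commits, baking the damage into the file.
resaved_bytes = misread.encode("utf-8")

print(misread == original)                    # False
print(len(stored_bytes), len(resaved_bytes))  # 14 20
```

Nothing fails loudly here. The text just quietly turns into "CafÃ© rÃ©sumÃ©", which is why the charset has to travel with the data.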

#2 You have some app (say a web forum, but it could literally be
anything: an electronic health record system, a national criminal
records database, whatever) and you export data from the app, say to
some structured data format like XML or a SQLite database, as part of
data liberation or backup or whatever. You want to get whatever your
users actually input into the fields--not the HTMLized versions. So you
need to annotate the blobs of data as Markdown, since users upload
various kinds of data (Word docs, JPEG images, MP4 videos, bits of text
like names of individuals, whatever).
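
As a rough sketch of what such an annotated export could look like, the Python below stores each blob next to a media type in an in-memory SQLite database. The table name, column names, and sample rows are all hypothetical, and `text/markdown` is used as the label only because it is the type under discussion here.

```python
import sqlite3

# Hypothetical export schema: each user-supplied blob is stored verbatim,
# next to a media type that says how to interpret it.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE export_field ("
    " id INTEGER PRIMARY KEY,"
    " media_type TEXT NOT NULL,"
    " body BLOB NOT NULL)"
)

# Invented sample rows covering several kinds of user input.
rows = [
    ("text/plain; charset=UTF-8", "Jane Doe".encode("utf-8")),
    ("text/markdown; charset=UTF-8", "# Notes\n\n*Raw* user input.".encode("utf-8")),
    ("image/jpeg", b"\xff\xd8\xff\xe0"),  # truncated placeholder, not a real image
]
conn.executemany(
    "INSERT INTO export_field (media_type, body) VALUES (?, ?)", rows
)

# A consumer of the export can now find the Markdown without guessing.
markdown_rows = conn.execute(
    "SELECT body FROM export_field WHERE media_type LIKE 'text/markdown%'"
).fetchall()

for (body,) in markdown_rows:
    print(body.decode("utf-8"))
```

Without the `media_type` column, the consumer would have to sniff every blob to decide whether it is Markdown, plain text, or something else.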

In both cases, interpretation matters "non-locally": a wrong guess
affects the whole file or the whole export, not just one passage.

>
> And there are no large overlapping surfaces among the syntaxes of the
> various extensions (esp. those for very different document features),
> which makes unsupported syntax unlikely to appear to have been intended
> to be rendered as some completely dissimilar feature.

As someone new to Markdown development, I really want to see some 
comprehensive references (since "authority" in Markdown-land is notably 
absent). Besides, since Markdown is such a free-for-all, someone could 
easily write a Markdown processor that turns (!) into 
<script>alert('hello!');</script>.

>
> So you will get a document that differs from the author’s intent in some
> way. But it will be clear *where* the differences are and you will still
> get all of the data in *some* form, quite possibly fully intelligible if
> not pretty.
For what we might call "sensible flavors" of Markdown, yes. But the 
author's intent may be poorly represented when processed through a tool 
that injects lolcat pictures every third word. Or, the author's intent 
may be very well-represented.

The point is...we don't know what the author's intent is, /unless the 
author tells us/. And I think we need some more metadata to make the 
author's intent clear.

>
> And because of the primary goal of Markdown to be human-readable in its
> source form, there is always an easy and cheap last resort: view source.

This is a goal. Agreed.

> Therefore the flavour parameter ought to be considered nothing more than
> loosely informative, and the processor should just render the document
> to the best of its ability regardless of the flavour specified. It MAY
> use the parameter value to adapt to the document, in RFC 2119 lingo, but
> ought not be bound by it.
I would reword this:

The flavor parameter informs recipients of the author's intent. The
processor should just render the document to the best of its ability
regardless of the flavor specified. It SHOULD use the parameter value
to adapt to the document.
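
One way a processor might follow that wording is sketched below in Python. The parameter name `flavor` is taken from this discussion, while the flavor identifiers and the stub renderers are made up for illustration; none of this reflects an existing registry or implementation.

```python
from email.message import Message


def render_original(source: str) -> str:
    """Stub standing in for a renderer of one hypothetical flavor."""
    return "[original] " + source


def render_extended(source: str) -> str:
    """Stub standing in for a renderer of another hypothetical flavor."""
    return "[extended] " + source


def render_best_effort(source: str) -> str:
    """Stub for the processor's own choice when it has nothing to go on."""
    return "[best effort] " + source


# Hypothetical flavor identifiers mapped to the renderers above.
RENDERERS = {"original": render_original, "extended": render_extended}


def choose_renderer(content_type: str):
    """Adapt to the flavor parameter if present and known, else fall back."""
    msg = Message()
    msg["Content-Type"] = content_type
    flavor = msg.get_param("flavor")  # None when the parameter is absent
    if isinstance(flavor, str):
        return RENDERERS.get(flavor.lower(), render_best_effort)
    return render_best_effort


render = choose_renderer("text/markdown; charset=UTF-8; flavor=Original")
print(render("*hello*"))
```

The processor never refuses a document: an absent or unrecognized flavor simply falls through to `render_best_effort`, which is the "to the best of its ability" part.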

I don't know what should happen if the flavor is absent. I am trying to 
understand. Let me put it this way: if you come across un-annotated 
Markdown in the wild (as in, not attached to any processing scripts, 
instructions, directions, whatever), what do you do? "Guess?"

> Furthermore, an absent flavour parameter ought to mean that the flavour
> is unspecified, not that it is any particular default flavour; i.e. the
> choice of flavour in that case ought to be up to the processor.

The choice of how to act on the Markdown is /always/ up to the 
processor...so...probably. It just may not represent the author's intent.

Between this and the Gruber discussion, I need to get used to this idea 
that "guessing" is a normative part of Markdown culture. :)

>
> Lastly, the spec should mention (as informal guidance to implementors)
> that applications containing Markdown processors which have any chance
> of being exposed to source documents of unknown flavour should, if at
> all possible, provide a means for the user to view the source Markdown
> document in unformatted form.

Agreed on that one. I will include something like that in the next draft.

-Sean