Optional features (was: Markdown Extra Specification (First Draft))

Aristotle Pagaltzis pagaltzis at gmx.de
Sat May 24 05:10:43 EDT 2008

* Yuri Takhteyev <qaramazov at gmail.com> [2008-05-23 08:35]:

> * Aristotle Pagaltzis <pagaltzis at gmx.de> [2008-05-23 05:40]:
> > I also agree with your opposition to them; if anything, one
> > should filter the *output* of a Markdown-to-HTML conversion
> > so that it won't matter whether people write literal `<em>`
> > tags or use asterisks.


> This is true in theory... I actually just recently wrote
> something along those lines in Lua [1] to use with my Lua wiki.
> The idea is to do as you suggest: Convert from MD to HTML
> first, then filter the HTML. To make it safe, I parse HTML as
> XHTML and complain if it doesn't parse. Hence a problem: if the
> user screws up with their HTML (and my filter is pretty
> unforgiving), it becomes hard to communicate to them what went
> wrong. I can tell them where there is a problem in the overall
> HTML, but this doesn't help much, since the user didn't know
> there was all of this HTML to begin with.

It seems to me that filtering is a red herring in your case. If
you want to allow users to enter literal tags, you will have this
problem whether you filter the ultimate output or not.

> There is no easy way to show them where the problem occurred
> relative to the input that they provided, or to show them the
> content with just _their_ HTML escaped. So, a good solution in
> Markdown itself actually would be a good thing.

If your XHTML parser has a streaming input mode, you can couple
your Markdown converter directly to the XHTML parser and feed the
HTML output to it as you go. If the XHTML parser throws a well-
formedness error, you can then relate it to the vicinity of the
last Markdown chunk you converted to HTML and passed into the
XHTML parser.

The culprit will sometimes be an earlier chunk; e.g. if the user
writes `&nbsp` (notice the missing semicolon) and this is exactly at
the end of the string in the HTML chunk you pass to the XHTML parser,
then the XHTML parser will have to wait until the next chunk before
it can decide that the entity is broken.
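
To make the coupling concrete, here is a minimal sketch in Python
(not Lua), using the standard library's expat bindings as a stand-in
for the XHTML parser. The function name `first_bad_chunk` and the
shape of its input, a list of (Markdown source, HTML output) pairs
that the converter is assumed to have produced chunk by chunk, are
hypothetical; treat this as an illustration of the idea rather than
a finished filter.

```python
import xml.parsers.expat


def first_bad_chunk(chunks):
    """Feed HTML chunks to a streaming XML parser one at a time.

    `chunks` is a list of (markdown_source, html_output) pairs, in
    document order (an assumed interface, not part of any converter).
    Returns None if the whole stream is well-formed, otherwise a tuple
    (index, markdown_source, message) for the chunk that was being fed
    when the parser first complained.
    """
    parser = xml.parsers.expat.ParserCreate()
    # Wrap the fragments in a synthetic root element so that several
    # top-level blocks still form a single well-formed document.
    parser.Parse("<root>", False)
    index, source = -1, ""
    try:
        for index, (source, html) in enumerate(chunks):
            parser.Parse(html, False)   # not final: more input follows
        parser.Parse("</root>", True)   # final: catches unclosed tags
    except xml.parsers.expat.ExpatError as err:
        return index, source, str(err)
    return None
```

Fed the pairs `("Hello *world*", "<p>Hello <em>world</em></p>")` and
`("bad <em>markup", "<p>bad <em>markup</p>")`, this points at the
second pair, so the message can be shown next to the Markdown the
user actually typed. As noted above, the reported chunk is only where
the error surfaced; the cause may sit at the tail of the chunk before.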

If you don’t want to couple the Markdown converter with an XHTML
parser that closely, it’s still possible to do this, but the
Markdown converter will have to be able to accept streaming input
itself and will need to generate output sufficiently frequently
that you can track the correlation of input and output with a
useful amount of precision. The glue code that combines the
Markdown converter with the XHTML parser will have to do some
relatively hairy (tho not very complex) bookkeeping in that case.
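
One possible shape for that bookkeeping, sketched in Python under
the assumption that the glue code learns, for each piece of HTML the
converter emits, which input lines it came from. The class name
`OffsetMap` and its methods are made up for illustration, and offsets
are counted in characters of the emitted HTML; a parser that reports
byte offsets would need the lengths measured in bytes instead.

```python
import bisect


class OffsetMap:
    """Maps positions in the generated HTML back to input line ranges."""

    def __init__(self):
        self._ends = []    # cumulative HTML length after each emission
        self._ranges = []  # (first_line, last_line) of the input consumed
        self._total = 0

    def record(self, first_line, last_line, html):
        """Note that input lines first_line..last_line produced `html`."""
        self._total += len(html)
        self._ends.append(self._total)
        self._ranges.append((first_line, last_line))

    def locate(self, html_offset):
        """Return the input line range whose output covers html_offset."""
        if not self._ranges:
            raise ValueError("nothing recorded yet")
        # First emission whose end lies beyond the offset; offsets past
        # the end of the output are attributed to the last emission.
        i = bisect.bisect_right(self._ends, html_offset)
        return self._ranges[min(i, len(self._ranges) - 1)]
```

The glue code would call `record` each time the converter emits
output and `locate` with the position at which the XHTML parser
reports its error. The more often the converter emits output, the
narrower the line ranges and the more useful the answer.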

Aristotle Pagaltzis // <http://plasmasturm.org/>
