Formal Grammar — some thoughts

A. Pagaltzis pagaltzis at
Sat Jul 29 17:54:26 EDT 2006

* Allan Odgaard <29mtuz102 at> [2006-07-29 22:40]:

> 1. interpreting tokens as literal text when end token is
> missing, example: `this is __not starting bold`. For bold it
> doesn’t matter IMO (having to escape the token,) but having to
> escape all single appearances of `_` and `*` could be
> irritating, although presently _ often do come in pairs, so
> here one often already do need to wrap filenames, environment
> variables and similar which use the underscore in a raw
> environment.

I wouldn’t go for a pure formal grammar. If you don’t, then it’s
easy to tolerate ambiguity in the language by deferring
disambiguation until possible. Just accumulate potential tokens
and only assign meaning once it’s decidable.
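
To make that concrete, here is a minimal sketch in Python. It is not how
the reference implementation works; the name `render_bold` and the
restriction to `**` and `__` are assumptions made purely for
illustration. Every delimiter is emitted as literal text first and is
only rewritten into a tag once a matching closer turns up, so an
unmatched `__` simply stays literal.

```python
import re

# Hypothetical delimiter set for this sketch: only "**" and "__" (bold).
TOKEN = re.compile(r"(\*\*|__)")

def render_bold(text):
    """Accumulate potential bold delimiters; assign meaning once decidable."""
    parts = TOKEN.split(text)      # odd indexes hold candidate delimiters
    out = []                       # output pieces, patched in place later
    open_at = {}                   # delimiter -> index in `out` of a pending opener
    for i, part in enumerate(parts):
        if i % 2 == 0:
            out.append(part)       # plain text between delimiters
        elif part in open_at:
            # A closer has appeared, so the earlier token really was an opener.
            out[open_at.pop(part)] = "<strong>"
            out.append("</strong>")
        else:
            # Defer: this may be an opener or plain text. Keep it literal
            # for now and remember where it sits in the output.
            open_at[part] = len(out)
            out.append(part)
    return "".join(out)
```

With this, `render_bold("this is __not starting bold")` returns its input
unchanged, while `render_bold("this is __bold__ text")` wraps `bold` in
`<strong>`. Overlapping delimiters are not handled; that would need a
stack rather than a dictionary.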

> 2. using back-references in end-tokens, example: `a ``` ``raw``
> ``` environment`. A formal grammar can’t really do that,

I’m pretty sure it can. You just need a couple redundant

> 5. heuristically defined end of lists, sub-lists and
> block-quotes. This would need to be more strict. I am not
> entirely sure what the current definition is, so I am wary of
> reformulating a strict version. From the source it seems that
> a sub-list is started when a line is a list item with a
> different (exact) indent as the first list item, allowing for
> some fun flexibility:

That was recently discussed. It will be stricter in future
versions, requiring a certain amount of indentation.

> There is also an ambiguity between `*` used for bold and used
> for a list item.

That one is helped if the vocabulary contains newlines as
terminals, and gets easy if you allow deferred disambiguation.
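
A small Python sketch of the newline-as-terminal idea, under simplifying
assumptions: the three-token vocabulary, the name `classify_stars`, and
the rule "a star that opens a line and is followed by whitespace is a
list marker" are all illustrative, and indented list markers are
ignored. Because the tokenizer keeps `\n` as its own token, deciding
what a `*` means only takes a look at its neighbours.

```python
import re

# Hypothetical token vocabulary: newline, star, and runs of anything else.
TOKEN = re.compile(r"\n|\*|[^\n*]+")

def classify_stars(text):
    """Label each `*` as 'list' or 'emph', using newline tokens as context."""
    tokens = TOKEN.findall(text)
    labels = []
    for i, tok in enumerate(tokens):
        if tok != "*":
            continue
        at_line_start = i == 0 or tokens[i - 1] == "\n"
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        # A star that opens a line and is followed by whitespace is a list marker.
        if at_line_start and nxt[:1] in (" ", "\t"):
            labels.append("list")
        else:
            labels.append("emph")
    return labels
```

For the input `"* item with *emphasis*\n*not* a list\n* second"` this
labels the first and last stars as list markers and the four in between
as emphasis.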

> A minor problem is that when in a list item environment the
> rule e.g. for raw blocks needs to be redefined (to require 2
> tabs or 8 spaces) and that would be necessary for each new
> level (to add an extra indent in the requirement) with the
> likely outcome that raw blocks would only be supported in e.g.
> the 3 first levels of list items.

Objection. To me, a great feature of Markdown over nearly every
wiki markup out there is that nested block structures are
composable with straightforward rules. If a pure formal grammar
can’t cope, then to hell with pure formal grammars. It’s quite
easy to cope with nesting once you leave the purely declarative
path. Heck, Perl 5 pattern matches can do it.
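
Roughly what leaving the purely declarative path can look like, as a
Python sketch under heavy simplifying assumptions: only `* ` list items,
four-space indents, no tabs, no lazy continuation lines, and raw blocks
that end at the first blank line. The name `parse_blocks` and the tuple
shapes are hypothetical and not taken from any Markdown implementation.
The point is that a list item strips one level of indentation from its
body and hands it back to the same function, so the raw-block rule is
written once and works at any depth.

```python
def parse_blocks(lines):
    """Parse lines into a nested tree; the same rules apply at every depth."""
    blocks, i = [], 0
    while i < len(lines):
        line = lines[i]
        if not line.strip():
            i += 1                                  # skip blank lines
        elif line.startswith("    "):
            # Raw block: consecutive lines indented by four spaces.
            j = i
            while j < len(lines) and lines[j].startswith("    "):
                j += 1
            blocks.append(("code", [l[4:] for l in lines[i:j]]))
            i = j
        elif line.startswith("* "):
            # List item: its first line plus following indented or blank lines.
            j = i + 1
            while j < len(lines) and (lines[j].startswith("    ")
                                      or not lines[j].strip()):
                j += 1
            # Strip one indent level and recurse: no per-level rules needed.
            body = [line[2:]] + [l[4:] for l in lines[i + 1:j]]
            blocks.append(("item", parse_blocks(body)))
            i = j
        else:
            blocks.append(("para", line))
            i += 1
    return blocks
```

A raw block inside a top-level item then needs eight spaces, one inside
a nested item twelve, and so on, without any rule ever being restated
per level.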

> Take the following relative simple code which produce bogus
> markup as an example of how fragile this stuff currently is:

The current reference implementation of Markdown, frankly, isn’t
very good. It’s a search&replace train, which makes it inherently
fragile and painful to extend. It’s just valuable anyway because
it’s actual running code that works without breaking badly too
often (cf. Anthony DeBoer’s delectably sardonic definition of

Aristotle Pagaltzis
