Parsing Code Blocks

Michel Fortin michel.fortin at
Thu May 22 23:25:27 EDT 2008

On 2008-05-16 at 0:31, Yuri Takhteyev wrote:

> Your first two examples are not treated as the same by any
> implementation. It seems that all implementations interpret this:
>
> ~~~
>> One
> Two
>
>> Three
> Four
>
> Five
> ~~~
>
> as meaning that "One" is in a code block, but "Two" is not.
>
> Or did you mean to put a few more spaces in front of "Two"?

Hmm, yes I did, and in fact I had. It just looks like my email client
(Mac OS X's Mail) eats the first space on each line that begins with a
space... I really wish it didn't use WebKit as its text editor in
plain-text mode.

> I think it would help if the spec made it clearer what part of
> each line of the blockquote is consumed before we go looking for
> sub-elements, especially as far as consuming initial whitespace goes.

Quoting item 2 of the blockquote rule (as it stood when you wrote the above):

> A run of the [block element generator](#block-element-generator) by
> pushing the following sequence to the <var>context-line-prefix</var>
> stack:
> 1. Zero or one [insignificant-indent](#insignificant-indent)
> 2. ">"
> 3. Zero or one [space](#space)

This means that the block element generator is used as a grammar rule
at this point: it matches if it can generate one or more block
elements. Since each rule in the block element generator first checks
for a hard-block-content-line-prefix, you could perform that check
yourself, making sure a hard-block-content-line-prefix matches before
calling the generator (this *could* be faster).
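
To make the prefix-stack idea concrete, here is a minimal Python sketch of pushing the blockquote's three-part prefix onto a context-line-prefix stack and stripping it from a line. It assumes an insignificant indent is up to three spaces, and it approximates the hard-block-content-line-prefix check as "every prefix on the stack matches". The names `regex_matcher`, `strip_line_prefix`, `can_start_block` and `BLOCKQUOTE_PREFIX` are hypothetical, not taken from the spec.

```python
import re
from typing import Callable, List, Optional

# A prefix matcher consumes the start of a line and returns the rest,
# or None when the line does not carry that prefix.
Matcher = Callable[[str], Optional[str]]


def regex_matcher(pattern: str) -> Matcher:
    compiled = re.compile(pattern)

    def match(line: str) -> Optional[str]:
        m = compiled.match(line)
        return None if m is None else line[m.end():]

    return match


# Assumption: an insignificant indent is up to three spaces; "{0,3}"
# already covers the "zero or one" wording of the spec.
BLOCKQUOTE_PREFIX: List[Matcher] = [
    regex_matcher(r" {0,3}"),  # zero or one insignificant-indent
    regex_matcher(r">"),       # the ">" marker
    regex_matcher(r" ?"),      # zero or one space
]


def strip_line_prefix(line: str, stack: List[Matcher]) -> Optional[str]:
    """Consume every prefix on the context stack, outermost first."""
    rest = line
    for matcher in stack:
        result = matcher(rest)
        if result is None:
            return None
        rest = result
    return rest


def can_start_block(line: str, stack: List[Matcher]) -> bool:
    """Cheap pre-check to run before calling the (hypothetical) generator."""
    return strip_line_prefix(line, stack) is not None
```

With this sketch, `strip_line_prefix(">     One", BLOCKQUOTE_PREFIX)` leaves four spaces before "One", enough to start a code block inside the quote, while "Two" fails the check entirely. Nesting a quote is just concatenating the prefix list again (`BLOCKQUOTE_PREFIX + BLOCKQUOTE_PREFIX`).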

I've added this to the block element generator section:

> The block element generator is used as a parsing rule in the grammar of
> the document element generator and the block element generator. The
> element generator matches if one of the following rules matches and
> creates an element.
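
A rough sketch of the generator acting as an ordered-choice parsing rule: it tries each rule in turn and matches as soon as one of them creates an element. The two rules here (`code_block_rule`, `paragraph_rule`) and the `(kind, text)` element tuples are hypothetical stand-ins, not the spec's actual rule list.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical element: (kind, text).
Element = Tuple[str, str]
# A rule inspects lines[pos:] and returns (element, next_pos), or None.
Rule = Callable[[List[str], int], Optional[Tuple[Element, int]]]


def code_block_rule(lines: List[str], pos: int) -> Optional[Tuple[Element, int]]:
    """Consume consecutive lines indented by at least four spaces."""
    end = pos
    while end < len(lines) and lines[end].startswith("    "):
        end += 1
    if end == pos:
        return None
    body = "\n".join(line[4:] for line in lines[pos:end])
    return ("code", body), end


def paragraph_rule(lines: List[str], pos: int) -> Optional[Tuple[Element, int]]:
    """Consume consecutive non-blank lines."""
    end = pos
    while end < len(lines) and lines[end].strip():
        end += 1
    if end == pos:
        return None
    text = " ".join(line.strip() for line in lines[pos:end])
    return ("paragraph", text), end


# Order matters: an indented line could also start a paragraph,
# so the code block rule is tried first.
BLOCK_RULES: List[Rule] = [code_block_rule, paragraph_rule]


def block_element_generator(lines: List[str], pos: int) -> Optional[Tuple[Element, int]]:
    """Match if one of the rules matches and creates an element."""
    for rule in BLOCK_RULES:
        result = rule(lines, pos)
        if result is not None:
            return result
    return None
```

Because the rules are tried in order, `["    x = 1"]` becomes a code element rather than a paragraph, and a blank line matches no rule at all.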

That said, I decided to revamp the blockquote rule so it no longer uses
the block element generator directly. Everything now passes through a
rule named block-element-run, which matches one or more block elements
(using the block element generator), and the blockquote's first ">" is
parsed separately by the blockquote rule instead of indirectly while
attempting to parse block elements.
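
The revised structure can be sketched in Python as a small recursive-descent parser: `blockquote_rule` consumes the ">" prefix itself, then hands the stripped lines to `block_element_run`, which only matches if at least one block element is produced. All names, the element tuples and the `" {0,3}> ?"` prefix pattern are assumptions for illustration; lazy continuation lines and paragraph interruption are deliberately left out.

```python
import re

# Hypothetical blockquote prefix: up to three spaces, ">", optional space.
BLOCKQUOTE_PREFIX = re.compile(r" {0,3}> ?")


def paragraph_rule(lines, pos):
    """Consume consecutive non-blank lines as one paragraph."""
    end = pos
    while end < len(lines) and lines[end].strip():
        end += 1
    if end == pos:
        return None
    return ("paragraph", " ".join(line.strip() for line in lines[pos:end])), end


def blockquote_rule(lines, pos):
    """Parse the leading ">" here, then run block-element-run on the rest."""
    inner = []
    end = pos
    # Simplification: the quote stops at the first line without a ">".
    while end < len(lines):
        m = BLOCKQUOTE_PREFIX.match(lines[end])
        if m is None:
            break
        inner.append(lines[end][m.end():])
        end += 1
    if not inner:
        return None
    run = block_element_run(inner, 0)
    if run is None:
        return None
    children, _ = run
    return ("blockquote", children), end


def block_element_generator(lines, pos):
    """Ordered choice: the first rule that creates an element wins."""
    for rule in (blockquote_rule, paragraph_rule):
        result = rule(lines, pos)
        if result is not None:
            return result
    return None


def block_element_run(lines, pos):
    """Match one or more block elements; return (elements, next_pos) or None."""
    elements = []
    while True:
        # Blank lines only separate elements; skip them.
        while pos < len(lines) and not lines[pos].strip():
            pos += 1
        result = block_element_generator(lines, pos)
        if result is None:
            break
        element, pos = result
        elements.append(element)
    return (elements, pos) if elements else None
```

For example, `block_element_run(["> One", ">", "> > Two", "", "Three"], 0)` yields a blockquote holding a paragraph and a nested blockquote, followed by a top-level paragraph. Keeping the ">" check inside `blockquote_rule` means the generator never has to discover the quote marker indirectly.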

Does this make it clearer?

By the way, I agree things are not optimal at the moment. They are
also far from what PHP Markdown and actually
do in many cases. The plan is to start by making something that mostly
works. Then I'll compare it with the actual regular expressions used in
the code and make adjustments as necessary. After that, I'll compare
with the test cases in MDTest and with the output of other
implementations in Babelmark. I might shuffle the order a bit.

Michel Fortin
michel.fortin at

More information about the Markdown-Discuss mailing list