Re: Formal Grammar — some thoughts

Allan Odgaard 29mtuz102 at sneakemail.com
Sat Jul 29 18:02:05 EDT 2006


On 29/7/2006, at 23:22, Eric Astor wrote:


>> 1. interpreting tokens as literal text when end token is missing,
>> example: `this is __not starting bold`.
>
> This is actually simple to deal with in most formal grammars - since
> formal grammars are recursive, you simply define bold (for example) as:
> bold := ('__' SPAN '__') | ('**' SPAN '**')


Well, yes, you can put that in your formal grammar, but the generated
parser will have a problem. Parsers generally tokenize the text and
then go through it token-by-token selecting which rule to pick.

So this parser will only see the `__` token (not what follows) and
will then pick the bold rule. If we have defined SPAN as not
containing any `\n`, then when it reaches end-of-line it will give
the error that it sees `\n` but expected `__`.
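
To make that failure concrete, here is a minimal sketch of a hand-written predictive parser. It is not taken from any real Markdown implementation, and the names (`tokenize`, `parse_line`, `ParseError`) are made up for illustration. It assumes SPAN may not contain `\n` and only handles the `__` form of bold.

```python
import re

# Hypothetical token set: "__", newline, a run of plain text, or a lone "_".
TOKEN = re.compile(r"__|\n|[^_\n]+|_")


class ParseError(Exception):
    pass


def tokenize(text):
    return TOKEN.findall(text)


def parse_line(tokens):
    """Pick a rule by looking at the current token only."""
    out = []
    i = 0
    while i < len(tokens):
        if tokens[i] == "__":
            # The parser commits to the bold rule as soon as it sees "__".
            i += 1
            span = []
            while i < len(tokens) and tokens[i] not in ("__", "\n"):
                span.append(tokens[i])
                i += 1
            if i == len(tokens) or tokens[i] != "__":
                found = repr(tokens[i]) if i < len(tokens) else "end of input"
                raise ParseError(f"saw {found} but expected '__'")
            i += 1  # consume the closing "__"
            out.append(("bold", "".join(span)))
        else:
            out.append(("text", tokens[i]))
            i += 1
    return out
```

With this sketch, `parse_line(tokenize("this is __bold__\n"))` succeeds, while `parse_line(tokenize("this is __not starting bold\n"))` raises `ParseError` at the newline instead of treating the `__` as literal text.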

Given a sufficiently large look-ahead (in parser terms, i.e. looking
at the next n tokens) and defining some dummy rules to deal with
isolated `__` it could possibly be pulled off, but it could likely
still be fooled.
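
A rough sketch of that idea, again with made-up names and the same assumed token set: before committing to bold, the parser scans ahead to the end of the line for a closing `__`, and a fallback ("dummy") rule emits an isolated `__` as literal text.

```python
import re

# Hypothetical token set: "__", newline, a run of plain text, or a lone "_".
TOKEN = re.compile(r"__|\n|[^_\n]+|_")


def parse_with_lookahead(text):
    tokens = TOKEN.findall(text)
    out = []
    i = 0
    while i < len(tokens):
        if tokens[i] == "__":
            # Look ahead for a closing "__" before the end of the line.
            j = i + 1
            while j < len(tokens) and tokens[j] not in ("__", "\n"):
                j += 1
            if j < len(tokens) and tokens[j] == "__":
                out.append(("bold", "".join(tokens[i + 1:j])))
                i = j + 1
                continue
            # Dummy rule: no closer on this line, so "__" is literal text.
            out.append(("text", "__"))
        else:
            out.append(("text", tokens[i]))
        i += 1
    return out
```

This handles `this is __not starting bold`, but it is still easy to fool: in `snake__case and other__name` the two `__` are paired up and `case and other` comes out bold.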

A slightly related problem is the ambiguity when seeing `___` in the
text. That will be tokenized as the two tokens `__` and `_`, i.e.
first start bold, then italic. But the entire line could be: `___bold
and__ only italic_`.

I.e. in this particular case it should have been tokenized as `_` and
`__`.
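
A small sketch of the tokenization issue, assuming a longest-match tokenizer (the regex and the name `tokenize` are illustrative only):

```python
import re

# Longest match first: "__" is tried before "_".
TOKEN = re.compile(r"__|_|[^_]+")


def tokenize(text):
    return TOKEN.findall(text)


tokens = tokenize("___bold and__ only italic_")
# tokens == ['__', '_', 'bold and', '__', ' only italic', '_']
```

The tokenizer splits `___` into `__` followed by `_` without knowing that the closers arrive in the order `__` then `_`, which only nests properly if the opener had been split as `_` followed by `__`.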

A workaround would be using `*` for either the bold or italic. I.e.
the strict parser would disallow three consecutive `*` or `_`.
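
As a sketch, such a strict check could be a pre-pass that rejects any run of three or more identical markers (the function name `check_strict` and the error wording are assumptions, not part of any existing parser):

```python
import re

# Three or more consecutive "_" or three or more consecutive "*".
TRIPLE = re.compile(r"_{3,}|\*{3,}")


def check_strict(text):
    """Return text unchanged, or raise if it contains an ambiguous run."""
    match = TRIPLE.search(text)
    if match:
        raise ValueError(
            f"ambiguous run {match.group()!r} at column {match.start()}"
        )
    return text
```

Under this rule `__bold and *italic*__` passes, while `___bold and__ only italic_` is rejected up front.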


