Backtick Hickup
    Eric Astor 
    eastor1 at swarthmore.edu
       
    Mon Aug 27 17:02:34 EDT 2007
    
    
  
Michel Fortin wrote:
> As to how to parse it with an incremental parser, I assume you could do 
> that:
> 
>     text: this
>     mark: **
>     text: is
>     mark: `
>     (switch tokenizer into "raw" mode until it sees a backtick)
>     text: raw** text
>     mark: `
>     (take last text token, remove backtick marks, and make a code span)
>     (switch back tokenizer into "span" mode)
>     end reached in span
> 
> The hard part comes when no matching backtick is found (assuming 
> non-paired backticks do not constitute code). Here's what I suggest for 
> the same case with no ending backtick:
> 
>     text: this
>     mark: **
>     text: is
>     mark: `
>     (switch tokenizer into "raw" mode until it sees a backtick)
>     text: raw** text
>     end reached in raw
>       (reparse last text token in "span" mode)
>         text: raw
>         mark: **
>         (take tokens between the two ** marks and put them in emphasis,
>         the two marks are removed)
>         text: text
>         end
> 
> Note that in this case backtracking is limited to the last token, which 
> is itself limited in length by the current block (paragraph, list item, 
> ...). I have no idea how that could fit any formal grammar language 
> however.
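
A minimal Python sketch of the scheme quoted above, to make the limited
backtracking concrete. The function name `tokenize` and the token kinds
"text", "mark" and "code" are my own labels, not anything from an existing
Markdown implementation, and I'm assuming an unmatched backtick is simply
kept as literal text. Pairing up the ** marks into emphasis is left out.

```python
def tokenize(block):
    """Split one block into (kind, value) tokens.

    Hypothetical sketch: "span" mode is the main loop, "raw" mode is the
    scan for a closing backtick. Token kinds are illustrative only.
    """
    tokens = []
    pos = 0
    n = len(block)
    while pos < n:
        if block.startswith("**", pos):
            tokens.append(("mark", "**"))
            pos += 2
        elif block[pos] == "`":
            # Raw mode: consume everything up to the next backtick.
            close = block.find("`", pos + 1)
            if close != -1:
                # Matching backtick found: the raw text becomes a code span.
                tokens.append(("code", block[pos + 1:close]))
                pos = close + 1
            else:
                # End reached in raw mode. Assumption: the lone backtick is
                # literal text. Resuming span mode right after it has the
                # same effect as reparsing the raw token in span mode, and
                # the rescan never goes past the end of the current block.
                tokens.append(("text", "`"))
                pos += 1
        else:
            # Span mode: plain text runs until the next ** or backtick.
            end = pos
            while end < n and block[end] != "`" and not block.startswith("**", end):
                end += 1
            tokens.append(("text", block[pos:end]))
            pos = end
    return tokens
```

With a closing backtick, `tokenize("this **is `raw** text`")` ends in a
single ("code", "raw** text") token; without one, the tail is rescanned and
the second ** shows up as a mark again.
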
Well, has anyone else looked into ANTLR 3.0 at all? Its grammar language
(an EBNF notation) drives an LL(*) parser, which supports full
backtracking and looks ahead as far as it needs to. It's fairly
well optimized, as I understand it, borrowing the memoization idea from
packrat parsing so the same stretch of text isn't reparsed repeatedly.

I suspect Markdown might be formally specifiable in ANTLR v3, and I'd
bet that even if it isn't, it comes very close. If it is, getting Markdown
parsers into various languages would just be a matter of helping develop
new ANTLR v3 target-language backends.
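
To show what I mean by the packrat idea, here is a rough Python sketch
(not ANTLR code, and not how ANTLR implements it internally): a
backtracking parser caches each rule's result by start position, so a
second alternative that begins with the same rule gets its answer from
the cache. The names `make_parser`, `code_span` and `statement`, and the
toy grammar (a code span followed by "!" or "?"), are made up for
illustration.

```python
from functools import lru_cache


def make_parser(text):
    """Build a tiny backtracking parser over `text` (hypothetical grammar)."""
    stats = {"code_span_runs": 0}

    @lru_cache(maxsize=None)  # memo table keyed by start position
    def code_span(pos):
        """Return the index just past a code span at pos, or None."""
        stats["code_span_runs"] += 1
        if pos < len(text) and text[pos] == "`":
            close = text.find("`", pos + 1)
            if close != -1:
                return close + 1
        return None

    def literal(pos, s):
        """Match the string s at pos; propagate an earlier failure (None)."""
        if pos is not None and text.startswith(s, pos):
            return pos + len(s)
        return None

    def statement(pos=0):
        """Ordered choice: code_span "!" / code_span "?"."""
        for suffix in ("!", "?"):
            # On backtracking, code_span(pos) is answered from the cache.
            end = literal(code_span(pos), suffix)
            if end is not None:
                return end
        return None

    return statement, stats
```

Parsing "`raw** text`?" fails the first alternative and succeeds on the
second, yet the body of `code_span` runs only once.
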
- Eric Astor
    
    