when rational discussion was still a possibility

Michel Fortin michel.fortin at michelf.ca
Sat Sep 6 07:26:04 EDT 2014


On 6 Sept. 2014, at 0:16, John MacFarlane <jgm at berkeley.edu> wrote:

> Michel,
> 
> What you did at the beginning, I gather, was to port (and then extend)
> an existing implementation, Markdown.pl.  The same will be possible with
> CommonMark, which provides two implementations that use the same parsing
> algorithm, one in portable C and one in 1540 lines of javascript (with
> no library dependencies).  The javascript implementation doesn't use any
> unusual javascript features and should be straightforward to
> port to other dynamic languages: perl, python, ruby, PHP.  (Or you could
> just use the javascript library client-side and skip the server-side
> rendering.) Those who work with compiled languages will be able to use
> the C library directly.
> 
> The parsers are both fast and accurate.  The original C parser I wrote
> was about as fast as discount.  An expert C coder is now working on
> optimizing it and, without changing the algorithm, has managed to make it
> about as fast as sundown, which is very fast indeed (0.01 seconds to
> parse a 1MB document, for example).  When optimization is complete, it
> should be even faster.  The javascript parser is also very fast (0.28
> seconds for the above-mentioned 1MB document, running in the Chrome
> browser).  By comparison, Markdown.pl takes 250 seconds on the same
> input, and pandoc takes 3.19 seconds.

I have no doubt a parser written in C, or even JavaScript (which nowadays gets executed with JIT compilers) will beat PHP Markdown. I also have no doubt that your algorithm can be ported to PHP. I have some doubt it'll be fast enough in PHP.

But regardless of performance, I can't swap my algorithm for yours and still call it PHP Markdown if it gives significantly different results. CommonMark does not pass the PHP Markdown test suite, nor does it pass the original test suite made by John Gruber.

Failing tests from the original test suite:
https://github.com/michelf/mdtest/blob/master/Markdown.mdtest/Hard-wrapped%20paragraphs%20with%20list-like%20lines.text
https://github.com/michelf/mdtest/blob/master/Markdown.mdtest/Literal%20quotes%20in%20titles.text

Failing tests from PHP Markdown test suite:
https://github.com/michelf/mdtest/blob/master/PHP%20Markdown.mdtest/Backslash%20escapes.text
https://github.com/michelf/mdtest/blob/master/PHP%20Markdown.mdtest/Code%20block%20in%20a%20list%20item.text
https://github.com/michelf/mdtest/blob/master/PHP%20Markdown.mdtest/Email%20auto%20links.text
https://github.com/michelf/mdtest/blob/master/PHP%20Markdown.mdtest/Headers.text
https://github.com/michelf/mdtest/blob/master/PHP%20Markdown.mdtest/Ins%20%26%20del.text
https://github.com/michelf/mdtest/blob/master/PHP%20Markdown.mdtest/Tight%20blocks.text

Some of these are obviously bugs on your side that you'll likely fix. Some are degenerate cases where I don't really care about the result as long as it is valid HTML. But for some there is an obvious intent to produce something different (and there are probably more of these than the test suite can catch).
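
To make concrete what "failing a test" means here, this is a minimal sketch of an mdtest-style comparison: render each input, normalize whitespace, and compare against the expected HTML. The names `normalize`, `failing_cases` and `toy_render` are hypothetical, the whitespace-only normalization is an assumption rather than what mdtest actually does, and the toy renderer merely stands in for a real Markdown implementation.

```python
import re
from typing import Callable, Iterable, List, Tuple


def normalize(html: str) -> str:
    """Collapse whitespace runs so purely cosmetic differences are ignored.

    Assumption: a real test runner may normalize more (or less) than this.
    """
    return re.sub(r"\s+", " ", html).strip()


def failing_cases(
    render: Callable[[str], str],
    cases: Iterable[Tuple[str, str, str]],
) -> List[str]:
    """Return the names of cases whose rendered output differs from the expected HTML."""
    failures = []
    for name, source, expected in cases:
        if normalize(render(source)) != normalize(expected):
            failures.append(name)
    return failures


def toy_render(text: str) -> str:
    """Hypothetical stand-in renderer: wraps each blank-line-separated block in <p>."""
    blocks = [b.strip() for b in re.split(r"\n\s*\n", text) if b.strip()]
    return "\n\n".join(f"<p>{b}</p>" for b in blocks)


# Each case is (name, Markdown input, expected HTML); these are made-up examples.
cases = [
    ("Plain paragraph", "Hello world.\n", "<p>Hello world.</p>\n"),
    ("Emphasis", "*hi*\n", "<p><em>hi</em></p>\n"),
]

print(failing_cases(toy_render, cases))
```

A comparison like this only reports that the output differs. Whether a given difference is a bug, a degenerate case, or an intentional divergence still has to be judged by hand.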

My understanding is that CommonMark is a different flavor of Markdown that chose to diverge in a couple of small ways from the original. I could obviously fork it and "fix" things so that it passes my test suite and John Gruber's test suite and behaves more like the original Markdown, but that's going to take a lot of time and it'll just create one more flavor situated in between PHP Markdown and CommonMark. That's not a worthy goal to me.

 - - -

With all that said, if I do port CommonMark to PHP I'd probably call it PHP CommonMark and promote it as an alternative, better defined, Markdown-like syntax.


-- 
Michel Fortin
michel.fortin at michelf.ca
http://michelf.ca



More information about the Markdown-Discuss mailing list