Benchmarks with TextMate's manual

John Fraser john at attacklab.net
Mon Aug 27 20:43:05 EDT 2007


On 8/27/07, Andrea Censi <andrea at censi.org> wrote:

> Maruku takes 8 seconds for parsing (on my PowerBook G4 1.5GHz).

> (please note that Ruby, per se, is much slower than Perl)

>

> I guess that if you plot [time for parsing] versus [length of the

> document], you get a curve which grows more than linearly for

> Markdown.pl and PHP Markdown.

>

> This is the same behaviour that I observed in Bluecloth (straight port

> of Markdown.pl in Ruby) -- if I remember well, time was O(length^2).

> By comparison, Maruku, and other real parsers, takes O(length).


One implementation of the current Markdown algorithm can actually beat
Maruku's time: [Showdown] is a straight JavaScript port of
Markdown.pl, but it converts the TextMate in under 4 seconds (in
Firefox, on an ancient machine).

[Showdown]: http://www.attacklab.net/showdown-gui.html

John Gruber's focus in the reference implementation has been on
functionality -- not speed. That's good news, because it means
there's a lot of room for improvement. I didn't do any clever
optimizations with Showdown; I just benchmarked, then fixed the
glaring problems. It's not so easy to speed up the Perl version right
now, because the dog-slow text::balanced parser would eclipse any
simple fixes. But the PHP port shouldn't be hard too speed up. In
fact, a good band-aid for the Perl version might be to backport
Michael's variant of the old HTML parsing code -- which builds a big
regexp to handle balanced tags up to four levels deep. (My gut tells
me we can write a simple HTML parser that's even faster than that, and
I'll take a crack at it as soon as I find the time.) Once you have a
fast HTML parser, it's easy to optimize the rest.

So, big-O aside, Markdown's current algorithm can perform a lot better
in the real world than the Perl and PHP implementations indicate; in
fact, it sounds like it can keep up with Maruku on reasonably-sized
documents. But don't get me wrong; I'm actually strongly in favor of
switching to a traditional parsing model -- even though I'd have to
throw out all the work I did porting the current algorithm to
JavaScript. I'd originally planned to write a Markdown parser from
scratch, but quickly realized it would be impossible to get 100%
compatibility without doing a direct, line-by-line port. And that's a
bad sign.

John Fraser


More information about the Markdown-Discuss mailing list