PHP 5 port of Markdown, plugin-based

John Gruber gruber at fedora.net
Wed Aug 16 01:11:55 EDT 2006


Michel Fortin <michel.fortin at michelf.com> wrote on 8/15/06 at 5:40 PM:


> > I have taken some liberties with this implementation, such as using
> > delimited integer markers instead of MD5 hashes, and changing some
> > of the rule names (e.g., from "Anchors" to "Links" in one case, and
> > from "ItalicsAndBold" to "EmStrong" in another).
>
> I've always wondered why John chose these function names. And going
> away from hashes seems like a good idea too.


The MD5 hashing is (I thought) clearly just a very odd
implementation detail in Markdown.pl. It has nothing whatsoever to
do with Markdown itself.

And it turns out that Perl's `Digest::MD5` module has some serious
incompatibilities with Unicode text (the whole bytes-vs-characters
thing), which I never uncovered even though I pass UTF-8 input to
Markdown.pl every single day. The difference is that in all the
places where I use Markdown, my strings aren't explicitly encoded
as UTF-8 from Perl's perspective. Perl just treats my input as a
sequence of bytes, which MD5 hashes properly, and the Right Thing
just happens. However, if anyone uses Markdown in a script where
input is explicitly encoded as UTF-8, such that Perl is aware of
the string as UTF-8, then `Digest::MD5` will choke.
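
To make the bytes-vs-characters distinction concrete, here is a minimal sketch in Python rather than Perl. It is only an analogy: Python's `hashlib` refuses any character string, which is stricter than what is described above for `Digest::MD5`, and the sample text is an arbitrary assumption. The point it shows is that MD5 is defined over bytes, so a string the language treats as characters has to be encoded before it can be hashed.

```python
import hashlib

text = "café"               # a character string: 4 characters
raw = text.encode("utf-8")  # the same text as UTF-8 bytes: 5 bytes

# MD5 operates on bytes, so hashing the encoded form works.
digest = hashlib.md5(raw).hexdigest()

# Passing the character string itself is rejected.
try:
    hashlib.md5(text)
    rejected = False
except TypeError:
    rejected = True
```

Encoding to bytes at the point of hashing is the usual workaround; dropping the hash altogether avoids the question entirely.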

So: soon(ish), Markdown.pl will no longer do the hashing thing
internally, either. It's really quite silly if you think about it,
but at the time I was writing that code, it seemed easier than
keeping a counter.
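
For illustration, a counter-based placeholder scheme along the lines of the "delimited integer markers" mentioned in the quoted message might look like the sketch below. This is not code from Markdown.pl or the PHP port: the `BlockStash` class, its method names, and the choice of `\x1a` as the delimiter are all hypothetical. It is written in Python only to show the idea.

```python
import re


class BlockStash:
    """Hypothetical helper: swap protected chunks for counter-based markers."""

    def __init__(self):
        self.saved = []

    def hide(self, chunk):
        # The marker is the chunk's index, wrapped in a delimiter character
        # that is assumed not to occur in the input text.
        self.saved.append(chunk)
        return "\x1a{}\x1a".format(len(self.saved) - 1)

    def restore(self, text):
        # Replace each marker with the chunk stored under that index.
        return re.sub(r"\x1a(\d+)\x1a",
                      lambda m: self.saved[int(m.group(1))],
                      text)


stash = BlockStash()
text = "before " + stash.hide("<div>*raw*</div>") + " after"
restored = stash.restore(text)
```

Because the marker is just an index into a list, no hashing is involved and the encoding of the hidden chunk never matters. A single `restore` pass does not expand markers nested inside saved chunks; a real implementation would need to decide how to handle that.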

`_DoAnchors()` still seems sensible to me, in that `<a>` tags are
"anchor" tags.

`_DoItalicsAndBold()` matches how I think of *this* and **that**,
in my mind.

I know why Michel followed my use of MD5 internally -- by copying
as much of my algorithm as possible, he made it as easy as
possible to sync changes between our implementations.

-J.G.

