[ANN] PHP Markdown 1.0.2b7

Jacob Rus jrus at hcs.harvard.edu
Mon Sep 18 19:00:42 EDT 2006


John Gruber wrote:

> Michel Fortin <michel.fortin at michelf.com> wrote on 9/16/06 at 5:23 PM:
>
>> Another big change is the automatic hashing of all Markdown-generated
>> HTML content. Previous versions of PHP Markdown Extra were already
>> doing this, but it was limited on block-level elements only and was
>> done to have less call to make to the expensive html block parser.
>> This has been ported to the more basic PHP Markdown, and in addition
>> to hashing block-level content it now also hash span-level elements:
>> this has the benefit of preventing bad nesting of elements, so
>> something like this:
>>
>> *Some **strange* emphasis**
>>
>> will now give valid HTML:
>>
>> *Some <strong>strange* emphasis</strong>
>
> That's interesting, and because the output is valid, it's probably
> better than what Markdown.pl currently generates.


I don't like this solution. It seems to me that the output should
instead be:

<em>Some **strange</em> emphasis**

because the "do what comes first, and then toss out improper nesting"
rule is more understandable for humans (well, at least for this one) and,
I expect, also easier for a computer parser. The current Markdown
behavior is indeed broken.
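
To make the "first come, first served" rule concrete, here is a rough sketch in Python. It is for illustration only: it is not how Markdown.pl or PHP Markdown works, the names `render` and `find_closer` are made up, and it handles only `*` and `**` (no underscores, no backslash escapes, no whitespace rules).

```python
import re

# Hypothetical mapping for this sketch: only asterisk emphasis is handled.
TAGS = {"**": "strong", "*": "em"}
DELIM = re.compile(r"\*\*|\*")


def find_closer(text, delim, start):
    """Return the index of the next occurrence of the same delimiter, or None."""
    for m in DELIM.finditer(text, start):
        if m.group() == delim:
            return m.start()
    return None


def render(text):
    """Honor whichever delimiter opens first; leave improperly nested ones literal."""
    out = []
    pos = 0
    while True:
        m = DELIM.search(text, pos)
        if m is None:
            out.append(text[pos:])
            break
        delim = m.group()
        close = find_closer(text, delim, m.end())
        if close is None:
            # No closer in this span: keep the delimiter as literal text.
            out.append(text[pos:m.end()])
            pos = m.end()
            continue
        out.append(text[pos:m.start()])
        tag = TAGS[delim]
        # Anything that opens inside this span must also close inside it,
        # because the inner text is rendered on its own.
        out.append(f"<{tag}>{render(text[m.end():close])}</{tag}>")
        pos = close + len(delim)
    return "".join(out)


print(render("*Some **strange* emphasis**"))
# <em>Some **strange</em> emphasis**
```

Properly nested input still works as expected under this sketch: `*a **b** c*` comes out as `<em>a <strong>b</strong> c</em>`.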


> I've been thinking that a better solution for input like that
> would be to generate markup like this:
>
> <em>Some <strong>strange</strong></em><strong> emphasis</strong>
>
> Which is more of a "do what I mean" solution. However, I've given
> no thought whatsoever to how this would be done algorithmically.


Please, please don't travel down this path. I've never seen a
non-contrived example where overlapping bold and italic like this is
needed, so forcing authors to use proper nesting should be perfectly
fine. MediaWiki tries to do this DWIM thing, and it causes all sorts of
bugs and edge cases that are more trouble than they're worth: mostly
because they go a long way towards committing the language to a
particular implementation, but also because they often just don't work.


>> * Made the block-level HTML parser smarter using a
>> specially-crafted regular expression capable of handling
>> nested tags.
>
> A single pattern that matches nested tags?!
>
> $me == "downloading now";


Somehow it doesn't excite me to learn that even more things will get
round-tripped through markdown's weird MD5 hash step. :P
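
For anyone who hasn't looked at that step, the general idea is roughly the following. This is a simplified Python sketch of the hash-and-restore technique, not the actual code of either implementation; `hash_spans`, `unhash`, and the `<b>` pattern in the usage lines are made up for illustration.

```python
import hashlib
import re


def hash_spans(text, pattern, store):
    """Replace each match with its MD5 hex digest and remember the original."""
    def stash(m):
        key = hashlib.md5(m.group().encode("utf-8")).hexdigest()
        store[key] = m.group()
        return key
    return re.sub(pattern, stash, text)


def unhash(text, store):
    """Swap every stored digest back for the text it replaced."""
    for key, original in store.items():
        text = text.replace(key, original)
    return text


store = {}
hidden = hash_spans("a <b>*x*</b> c", r"<b>.*?</b>", store)
# While hidden, the span contains no asterisks for later passes to trip over.
restored = unhash(hidden, store)
```

The digest is plain hex, so later passes that look for `*`, `_`, or `<` cannot match inside a hidden span; the cost is that every hidden span has to make the round trip through the lookup table.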

Are there any examples of what the current behavior is, and what changes
when we use this "smarter" block HTML parser?

-Jacob


