Markdown within HTML

John Fraser showdown at
Fri Mar 2 19:36:11 EST 2007

Before I start writing an HTML parser for Showdown, I want to see if there's
a safe way to have Markdown process the contents of HTML block elements by
default. I don't think `markdown="1"` is an official part of the language
yet, so this seems like a good time to talk about it.

Being able to wrap Markdown text in divs and spans would make it possible to
add more structure to documents, and to give CSS and JavaScript something to
hold on to. It's probably reasonable to ask advanced users to write simple
markup like `<div class="description">`, and it helps keep Markdown's syntax
small. But I think something as verbose as `<div class="description"
markdown="1">` is too much to ask.

So what breaks when you run complex HTML through Markdown? I've come up
with two problems so far: code blocks and paragraph wrapping.

First, we don't want to accidentally trigger a code block with indented HTML
like this:

<div class="vcard">
    <span class="fn">John Smith</span>
    <div class="tel">212-555-1212</div>
</div>

I think the best way around this is just to disable Markdown's code blocks
within HTML. People can always use `<code>` and `<pre>` tags (and there's
probably a safe way for Markdown to do `&` and `<` encoding within `<code>`
elements to make them less of a pain in the ass). Disallowing code blocks
in HTML would make writing an HTML parser easier too -- but that's a lousy
reason to do it.
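
To make the idea concrete, here is a rough sketch of "no code blocks inside HTML". It is written in Python purely for illustration (Showdown itself is JavaScript), and everything in it is an assumption: the `BLOCK_TAGS` set, the `indented_code_lines` name, and the regex-based tag counting that stands in for a real HTML parser. It also ignores Markdown's rule that a code block needs a blank line before it.

```python
import re

# Hypothetical, deliberately small set of block-level tags for this sketch.
BLOCK_TAGS = {"div", "blockquote", "table", "ul", "ol", "form"}

# Matches an opening or closing tag and captures the slash and the tag name.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)\b[^>]*>")


def indented_code_lines(text):
    """Return the lines that would be treated as indented code.

    A line indented by four or more spaces counts as code only while no
    block-level HTML element is open.
    """
    depth = 0  # number of currently open block-level elements
    code = []
    for line in text.splitlines():
        if depth == 0 and line.startswith("    ") and line.strip():
            code.append(line)
            continue  # tags inside a code line are literal text
        for closing, name in TAG_RE.findall(line):
            if name.lower() in BLOCK_TAGS:
                depth = max(depth + (-1 if closing else 1), 0)
    return code
```

Fed the vcard example followed by a normally indented line, only the line outside the `<div>` should come back as code.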

Second, we need to make sure we don't litter HTML with lots of extra
paragraph tags. I haven't looked at all the places it's a problem, but I
think we can come up with a set of rules that will work well: don't make a
solitary paragraph or one whose only siblings are block elements; don't wrap
anything the `<p>` element isn't allowed to contain; don't add `<p>` tags if
they'd be siblings to existing ones... that kind of thing.
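
One possible reading of those rules, sketched in Python for illustration only. The data model is an assumption: each blank-line-separated chunk inside an element is reduced to the set of tag names it contains, with an empty set meaning plain text. `NOT_ALLOWED_IN_P` is a hypothetical, incomplete list, and `chunks_to_wrap` is a made-up name.

```python
# Hypothetical subset of the elements that <p> may not contain.
NOT_ALLOWED_IN_P = {
    "div", "p", "ul", "ol", "table", "blockquote", "pre", "form",
    "h1", "h2", "h3", "h4", "h5", "h6",
}


def chunks_to_wrap(chunks):
    """Return the indexes of the chunks that should get a <p> wrapper.

    `chunks` holds one set of tag names per blank-line-separated chunk
    inside a single HTML element; an empty set means plain text.
    """
    # Don't add <p> tags next to paragraphs the author already wrote.
    if any("p" in tags for tags in chunks):
        return []
    # Don't wrap anything the <p> element isn't allowed to contain.
    candidates = [
        i for i, tags in enumerate(chunks) if not (tags & NOT_ALLOWED_IN_P)
    ]
    # Skip a solitary paragraph, or one whose only siblings are blocks.
    if len(candidates) < 2:
        return []
    return candidates
```

For example, `chunks_to_wrap([set(), {"em"}, {"table"}])` wraps the first two chunks and leaves the table alone, while a lone text chunk next to a `<div>` is left unwrapped.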

There's probably a show-stopper I'm missing, but I feel like we can get this
working for the vast majority of HTML in the wild. And if a user pastes in
HTML that we do break, there's always `markdown="0"`.
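
The opt-out could be as simple as the check below. This is a Python sketch under assumptions, not how any existing Markdown implementation does it: `process_contents` is a hypothetical helper, and it inspects the opening tag with a regex rather than properly parsing its attributes.

```python
import re

# Finds markdown="0" or markdown="1" (quotes optional) in an opening tag.
# The lookbehind keeps attributes such as data-markdown from matching.
MARKDOWN_ATTR_RE = re.compile(
    r"(?<![\w-])markdown\s*=\s*[\"']?([01])[\"']?", re.IGNORECASE
)


def process_contents(open_tag, default=True):
    """Return True if Markdown should run inside the element opened by open_tag.

    With default=True, contents are processed unless the tag says markdown="0".
    """
    match = MARKDOWN_ATTR_RE.search(open_tag)
    if match is None:
        return default
    return match.group(1) == "1"
```

With processing on by default, `<div class="description">` is processed and `<div markdown="0">` is left alone; passing `default=False` gives the opt-in `markdown="1"` behavior instead.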

So what are the other obstacles?



More information about the Markdown-Discuss mailing list