when rational discussion was still a possibility
jgm at berkeley.edu
Sat Sep 6 21:38:06 EDT 2014
Michel, I also wanted to comment on these failing test cases,
to explain why they fail. The current spec is a work in progress,
and certainly still up for comment and revision. Your comments would
be most welcome!
CommonMark currently allows a list to interrupt a paragraph (as Markdown
1.0.0 and earlier did, but not later versions). I am not certain about
this choice, but as I see it, these are the tradeoffs.
CON: There is a danger that a hard-wrapped numeral at the end of
a sentence will be misinterpreted as a list item. (Of course, this
could be avoided with a backslash escape, but it might escape the
PRO 1: It is natural and common to write things like:
People write things like this in web forms all the time.
PRO 2: Allowing lists to interrupt a paragraph allows us to keep
a very nice property, which is that a block of text, when converted
into a list by prepending '1.' and indenting, will have the same
meaning inside the list as it had without it.
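The transformation described in PRO 2 can be sketched in a few lines of Python. This is only an illustration: the function name `wrap_in_list_item` and the sample text are hypothetical and come from neither the spec nor any parser.

```python
def wrap_in_list_item(block: str, marker: str = "1.") -> str:
    """Prepend a list marker to the first line and indent the remaining
    lines so they line up with the text after the marker."""
    indent = " " * (len(marker) + 1)  # marker width plus one space
    lines = block.split("\n")
    out = [f"{marker} {lines[0]}"]
    # Blank lines stay blank; every other line is shifted right.
    out += [indent + line if line else line for line in lines[1:]]
    return "\n".join(out)


# Illustrative input: a paragraph immediately followed by a list.
sample = "A paragraph\n- item one\n- item two"
wrapped = wrap_in_list_item(sample)
print(wrapped)
```

If lists may interrupt paragraphs, `sample` and the content of the item in `wrapped` are both read as a paragraph followed by a list, which is the property at stake.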
So, in CommonMark,
is a (paragraph followed by a list) by itself, and also a
(paragraph followed by a list) inside a list item:
We lose this (to my mind very natural and desirable) property if
we don't allow a list to interrupt a paragraph.
I think that the PROs outweigh the CON here.
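To make the CON concrete, here is a rough Python sketch of how a hard-wrapped numeral can look like the start of an ordered list item, and how a backslash escape avoids that. The regex is a deliberately simplified stand-in for the spec's list-marker rule, and the name `starts_ordered_item` and the sample sentence are made up for illustration.

```python
import re

# Simplified assumption: up to three spaces of indentation, one or more
# digits, a '.' or ')' delimiter, then a space.
ORDERED_MARKER = re.compile(r"^ {0,3}\d+[.)] ")


def starts_ordered_item(line: str) -> bool:
    """Return True if the line begins with something shaped like an
    ordered list marker."""
    return ORDERED_MARKER.match(line) is not None


# A sentence hard-wrapped so that a numeral begins the second line.
hard_wrapped = ["The best vintage we tasted was from", "1986. It was a warm year."]
escaped = "1986\\. It was a warm year."  # backslash before the period

print(starts_ordered_item(hard_wrapped[1]))  # marker-shaped line
print(starts_ordered_item(escaped))          # escape breaks the marker
```

When a list is allowed to interrupt a paragraph, the second line of `hard_wrapped` would open a list item, whereas the escaped form stays part of the paragraph.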
See here http://jgm.github.io/stmd/spec.html#link-title
and particularly the paragraph:
"(Note: Markdown.pl did allow double quotes inside a double-quoted title, and
its test suite included a test demonstrating this. But it is hard to see a good
rationale for the extra complexity this brings, since there are already many
ways--backslash escaping, entities, or using a different quote type for the
enclosing title--to write titles containing double quotes. Markdown.pl’s
handling of titles has a number of other strange features. For example, it
allows single-quoted titles in inline links, but not reference links. And, in
reference links but not inline links, it allows a title to begin with " and end
with ). Markdown.pl 1.0.1 even allows titles with no closing quotation mark,
though 1.0.2b8 does not. It seems preferable to adopt a simple, rational rule
that works the same way in inline links and link reference definitions.)"
>Failing tests from PHP Markdown test suite:
As far as I can see, stmd's output is semantically equivalent HTML; it's
just a matter of whether '>' is escaped as '&gt;'.
For email addresses we used the "non-normative regex" from the HTML5 spec,
which seemed a nonarbitrary and practical thing to use:
It seems not to allow the international example or the crazier ones
(with strange symbols and quotes). Probably this should be fixed in our
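For reference, here is a small Python sketch built on the e-mail pattern as it appears in the current HTML standard. The exact pattern in the draft used at the time may have differed, so treat this as an assumption; the test addresses are invented for illustration.

```python
import re

# Pattern modelled on the "valid e-mail address" regex in the HTML standard
# (assumed here; the version used by the parser may not be identical).
HTML5_EMAIL = re.compile(
    r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
    r"@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*"
)


def is_html5_email(address: str) -> bool:
    """Return True if the whole string matches the pattern."""
    return HTML5_EMAIL.fullmatch(address) is not None


for candidate in ["user@example.com", "用户@例子.广告", '"odd quoted"@example.com']:
    print(candidate, is_html5_email(candidate))
```

The pattern is ASCII-only and has no provision for quoted local parts, which is consistent with the international and quoted examples being rejected.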
Our spec does not allow setext headers to interrupt a paragraph.
So the example
does not get parsed as containing a header. I'm not sure there's any
very strong rationale for not allowing setext headers to interrupt
a paragraph, so maybe this should be revisited. It would be an easy
matter to change in both spec and parsers.
The spec does not include ins or del among the list of HTML block tags.
I can't recall where we got this list, and it now seems a mistake.
Adding these to the list would still yield different output from PHP
Markdown, because of differences in treatment of HTML blocks,
but more reasonable output.
The differences here are explained by three decisions:
1. we allow lists to interrupt paragraphs (as discussed above)
2. we don't allow headers to interrupt paragraphs
3. we do allow block quotes to interrupt paragraphs.
I think that the grounds for (1) are pretty strong (discussed above).
I don't feel strongly about (2). In this case, we have counterparts of
CON and PRO 1 from the discussion above, but PRO 2 is not an issue.
This makes the balance more even, but an argument from consistency
could tilt the balance towards allowing headers to interrupt paragraphs.
This would also preserve compatibility with the majority of existing
With (3), we also have counterparts of CON and PRO 1. Again, it
seems arbitrary to go different ways with (2) and (3), and this is the
first time I've noticed this. This is definitely something we'll
have to reconsider.
Here I just need to refer you to the extensive discussion in the spec
of the motivation for the list rules we chose.
This was one of the hardest things to work out in a (to me) satisfactory
way. NO choices here will be perfectly backwards compatible with every
implementation, since they go in so many directions. But I'm pretty confident
that the choices we've made are better than any of the alternatives I've
considered. I would be interested to hear your feedback on this!