Universal syntax for Markdown

John MacFarlane jgm at berkeley.edu
Sat Aug 13 18:00:47 EDT 2011


I'll chime in too. In developing pandoc, lunamark, and peg-markdown, I've
thought a lot about markdown extensions, and also about how to resolve some of
the ambiguities in the markdown syntax description.

I agree that it would be good if the various implementations could
converge as much as possible on these things. In some cases, they have
converged: I think most implementations do footnotes the same way, for
example, and as a result of a discussion on this list, PHP Markdown Extra
and pandoc have the same syntax for fenced code blocks. In other cases, they
haven't converged, even after discussion. Some of the divergences are pretty
basic -- e.g. whether nested lists have to be indented by 4 spaces.

Of course, going forward, implementors have to worry about backwards
compatibility. I don't want to make changes that are going to break
old pandoc documents. On the other hand, there's no reason why an
implementation has to accept just one syntax for (say) definition
lists. So if we could agree on a standard syntax that diverged from
pandoc's old syntax, I might be able to modify pandoc to accept them
both.

I can think of three tools that would help a lot in discussions about
extensions and edge cases:

* An updated version of [babelmark](http://babelmark.bobtfish.net/),
which allows you to compare the output of many implementations on the same
input. This was really a useful tool in its time -- any chance it could be
updated? The version of pandoc there, for example, is about three years
old. [I know it's a burden to keep versions of so many implementations
up to date. Perhaps each maintainer could set up a web app that returns
output and metadata (version number, author, website) for a single
implementation. There could be an API key so that only the main babelmark
site could use these apps. babelmark could then just send out a bunch of
HTTP queries and display the output. It wouldn't even need its own server.]

* A wiki with a page for each syntax feature or extension, allowing
comparison of different versions and discussion of pros and cons,
with links to user's guides etc.

* A test suite articulated into many very small tests, each testing
something very specific. We could try to separate these into "agreed" and
"disputed." There are many tests that extend the standard Markdown
test suite that would be agreed on by virtually everyone.
Michel Fortin's MDTest is a major step in this direction.
I have a nice little test-runner that allows each test to be in
a single file, with input and expected output. This makes it easy
to add little tests.
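
As a rough illustration of the one-file-per-test idea (this is not the
test-runner mentioned above; the `<<<` separator line and the names
`parse_case` and `run_cases` are assumptions made up for this sketch), a
runner along these lines might look like:

```python
SEPARATOR = "<<<"  # hypothetical marker line between input and expected output


def parse_case(text):
    """Split the text of one test file into (markdown_input, expected_output)."""
    source, found, expected = text.partition("\n" + SEPARATOR + "\n")
    if not found:
        raise ValueError("test case has no separator line")
    return source, expected


def run_cases(cases, convert):
    """Run every case through `convert` and return the names that fail.

    `cases` maps a test name to the full text of its file; `convert` is any
    function from markdown text to output text (an assumption of this sketch).
    """
    failures = []
    for name, text in sorted(cases.items()):
        source, expected = parse_case(text)
        # Compare ignoring leading/trailing whitespace only.
        if convert(source).strip() != expected.strip():
            failures.append(name)
    return failures
```

Keeping "agreed" and "disputed" tests in separate directories would then
just mean building two `cases` dictionaries and reporting them separately.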

A few more thoughts:

1. Many people have mentioned "rule #1" - markdown should look natural and
readable just by itself. I strongly agree. In my own tinkerings, I've also
insisted on another principle, which Fletcher also articulated, but which I
think would not be accepted by everyone on this list:

Format-independence: Markdown is not just for writing HTML.

I've seen people on this list say, "why do you need extension X, when you
can just include raw HTML?" To which I reply: "Because I want to be
able to convert my document to LaTeX, where the raw HTML won't do much good."
It's true that John Gruber presented markdown primarily as a readable shortcut
to HTML. But I don't see why we should keep thinking about it this way, when
tools like pandoc and multimarkdown can easily convert markdown to a variety
of formats. Indeed, one of the main reasons I write in markdown whenever I
can is that I'm not tied to a single output format. I can have a canonical
document that can be converted reliably to just about any text format.

2. We really need to clarify the rules for indented lists. As I've
argued before on this list, the markdown documentation at least strongly
implies that sublists need to be indented by four spaces, but many
implementations (including Markdown.pl) don't insist on this.

3. I think most people agree that changing from ordered to bulleted
list markers should start a new list (discussed earlier on this mailing
list).

4. I think the opening number of an ordered list should be significant.

5. My own preference would be to require a blank line before a heading
or blockquote, to avoid unexpected results.

6. Tables -- here there's a significant divergence between pandoc, PHP
Markdown Extra, and multimarkdown. A limitation of pandoc's tables is
that they require a monospaced font, since they rely on column
alignment. The advantage is that they look exactly like tables. In
addition, they allow table cells that contain whole paragraphs, and
even arbitrary block-level content -- whereas, if I understand the
documentation correctly, PHP Markdown Extra only allows simple tables
with one-line cells. The philosophical differences here may be too
deep for convergence.

7. Metadata -- multimarkdown's system is simple, flexible, and
readable. One reservation I have about it is that it is
English-centric -- nobody wants to write 'Title' at the beginning
of a Swedish document -- but that could be solved by localization.
It also seems a bit pedantic to have to say 'Title' if that's all you have.
Pandoc's system is convenient and doesn't use English keywords, but it's not
flexible enough, and I've been thinking about alternatives.

8. Image/link attributes -- the difficulty here is respecting
format-independence. Saying that an image is 200px is not going
to be helpful if you're targeting both HTML and LaTeX.

9. Citations -- I think multimarkdown's citation system is a step
in the right direction, but too unambitious to make part of a standard.
We put a lot of thought into a good markdown citation format on
pandoc-discuss, and came up with this:
http://johnmacfarlane.net/pandoc/README#citations
This gives you automatic bibliographies and citations, with configurable
styles -- you can even move between footnote styles and parenthesized
inline references -- and still looks pretty natural.

10. Definition lists -- Pandoc is pretty similar to PHP Markdown Extra,
but only supports one term per definition. HTML definition lists support
multiple terms, but this doesn't make sense in many other output
formats, and I don't think it's necessary.

11. Nesting/precedence -- this is probably less of a concern in
practice, but there seems to be no standard for parsing nested
inline elements. For example, consider the input
'[hi `there] friend`](/url)'. Markdown.pl parses this as a link,
and discount doesn't. I don't see anything in the Markdown syntax
description that resolves the ambiguities here. Similarly for
nested emph and strong -- Michel Fortin's MDTest suite contains
some opinionated tests for these, but I'm not sure what the principle
behind them is.
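
To make the link/code-span ambiguity above concrete, here is a small
sketch of two precedence rules that produce the two outcomes for that
input. It is only an illustration: the regexes and the names `LAZY_LINK`,
`STRICT_LINK`, and `render` are assumptions of this sketch, and it makes no
claim about how Markdown.pl or discount actually arrive at their results.

```python
import re

CODE_SPAN = re.compile(r"`([^`]+)`")
# Rule A: the link text runs up to the first "]" that is followed by "(".
LAZY_LINK = re.compile(r"\[(.*?)\]\(([^)\s]*)\)")
# Rule B: the first "]" always closes the bracket, so it must be followed by "(".
STRICT_LINK = re.compile(r"\[([^\]]*)\]\(([^)\s]*)\)")


def render_code_spans(text):
    """Turn `code` into <code>code</code> (no HTML escaping in this sketch)."""
    return CODE_SPAN.sub(r"<code>\1</code>", text)


def render(text, link_pattern):
    """Render links found by `link_pattern`, then code spans everywhere."""
    out, pos = [], 0
    for m in link_pattern.finditer(text):
        out.append(render_code_spans(text[pos:m.start()]))
        out.append('<a href="{}">{}</a>'.format(
            m.group(2), render_code_spans(m.group(1))))
        pos = m.end()
    out.append(render_code_spans(text[pos:]))
    return "".join(out)


SAMPLE = "[hi `there] friend`](/url)"
print(render(SAMPLE, LAZY_LINK))    # rendered as a link containing a code span
print(render(SAMPLE, STRICT_LINK))  # brackets stay literal, only the code span is rendered
```

Both rules are defensible readings of the syntax description, which is
why a shared test for this case would need an explicit precedence rule.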

John


+++ Fletcher T. Penney [Aug 10 11 21:40 ]:

> A few caveats:
>
> 1) I am responding, at least in part, since I (or at least my software) was mentioned
>
> 2) I've had a very nice meal, a nice relaxing evening in the mountains on vacation, and a few glasses of wine
>
> 3) I can only speak for myself, not the authors of other Markdown derivatives/forks
>
> I agree with some other points that have been made by others --- Gruber seems to be quite content with the current feature set and performance of Markdown, and not inclined to pursue it further. If he's happy, then I don't see any need for him to put further effort into development.
>
> After being introduced to Markdown, it took me about 2 seconds to realize the beauty and elegance that it offered. It took me a little bit longer, but not that long, to realize that it had not been taken as far as it could go. To my knowledge, I was the first person to apply the idea of the Markdown syntax to an output format other than HTML. I then also tried to tie together the improvements made by Michel Fortin in terms of syntax additions. For me, MultiMarkdown offered the ultimate blend of syntax features and output format flexibility.
>
> This is not the first time that the call has gone out for "one Markdown variant to rule them all" to be developed. I've even written, and then deleted, such a call myself. IMHO, the fatal flaw is that those of us capable and inclined to create a derivative of Markdown to scratch our own itch are happy with the variant we have created. We don't see a problem. We added what we needed, and we're content.
>
> In the final analysis, it doesn't matter to me if the other authors of Markdown variants follow my syntax or not. They have their own goals, needs, and opinions that don't necessarily match mine. If you think that Markdown works best for you - great, stick with it. If MultiMarkdown offers features that you find useful, use it. If something else is better, by all means go with it.
>
> That said, I am perfectly willing to tweak the syntax of MMD to mesh with some consensus if it were to exist. But, there is a limit to the features I would be interested in incorporating. I've been asked to include many syntax additions that I have said no to, because I thought they would end up detracting, rather than contributing to, the overall success of MMD. Some may agree with what I've done. Many others will disagree. That's fine.
>
> Where I do think consensus would be helpful is in the features that are *almost* identical across implementations. Early on, I made changes to my footnote syntax to match what others were doing. There is value in such changes to improve compatibility across implementations. That said, I don't want to edge towards the "everything but the kitchen sink" mentality that plagues Word, for example. Gruber has made it pretty clear in the past that he is not a big fan of the syntax additions that I have made for MMD (though, strangely he seems supportive of PHP Markdown Extra.... ;)
>
> My proposal, then, is to develop a "standards body" to create a core set of syntax additions, edge case resolution, and definitive test files to define "Markdown 2.0". Obviously, it would need a different name, but I am too lazy to think of one right now. My personal opinion is that this new standard would include fewer, rather than more, extensions to the core Markdown standard. I think it should be defined in a fairly rigorous manner, to avoid some of the ambiguity present in the canonical Markdown.pl (I think John MacFarlane's peg-markdown work was pretty good in this regard). I think some of the core features would include:
>
> * metadata
> * footnotes
> * tables
> * complete test cases/tools
>
> secondary features could include:
>
> * citations
> * definition lists
> * automatic cross-references/labels
> * math extension
> * image/link attributes
>
> All this said, however, I think an important consideration for this discussion is:
>
> What benefit do the authors of current Markdown variants gain from the effort required to agree on a standard?
>
> Being realistic, I'm pretty busy with my day job. I'm even busier throwing in maintaining MMD and now trying to release a new application. I've put in countless hours on a project that has in total provided me with the equivalent of a weekend or two working at my day job in donations from the generosity from those who have themselves saved countless hours of their own time. Clearly I'm not doing this for the money. My guess is that other Markdown authors aren't doing it for the money either.
>
> I think we all do it because we care. We see the beauty and utility in this approach to writing, whether it be for the web (Markdown) or other document formats (MultiMarkdown). For progress to be made on an official "next version" of Markdown, it's going to take a cause that offers some benefit to those of us who have worked so hard during the past few years to contribute our own changes and additions.
>
> Again - my own $.02, and may not even be worth that much....
>
> F-
>
> --
> Fletcher T. Penney
> fletcher at fletcherpenney.net