Universal syntax for Markdown

Waylan Limberg waylan at gmail.com
Sat Aug 13 22:14:20 EDT 2011


Guess I'll add my comments as well. In fact, because Python-Markdown
not only includes some extensions, but also provides an API for third
party extensions, I've given a lot of thought to extensions in
general.

On Sat, Aug 13, 2011 at 6:00 PM, John MacFarlane <jgm at berkeley.edu> wrote:

> I'll chime in too. In developing pandoc, lunamark, and peg-markdown, I've
> thought a lot about markdown extensions, and also about how to resolve some of
> the ambiguities in the markdown syntax description.
>
> I agree that it would be good if the various implementations could
> converge as much as possible on these things. In some cases, they have
> converged: I think most implementations do footnotes the same way, for
> example, and as a result of a discussion on this list, PHP markdown extra
> and pandoc have the same syntax for fenced code blocks. In other cases, they
> haven't converged, even after discussion. Some of the divergences are pretty
> basic -- e.g. whether nested lists have to be indented by 4 spaces.


I agree here. In fact, Python-Markdown has based any extension which
implements a feature in PHP Markdown Extra on the PHP implementation.
From time to time we get a request for feature x to be added to
extension y. We just say: "No, it's not part of PHP Extra, but you can
build your own extension if you like." Sometimes they do. A few have
even become semi-popular.


> Of course, going forward, implementors have to worry about backwards
> compatibility.  I don't want to make changes that are going to break
> old pandoc documents.


We have the same concern. A few extensions are rather old now, and
while other implementations have since adopted a better syntax, we can't
change ours because people using our library with the old syntax would
then need to update countless documents.


> *  An updated version of [babelmark](http://babelmark.bobtfish.net/),
>   which allows you to compare the output of many implementations on the same
>   input. This was really a useful tool in its time -- any chance it could be
>   updated? The version of pandoc there, for example, is about three years
>   old. [I know it's a burden to keep versions of so many implementations
>   up to date. Perhaps each maintainer could set up a web app that returns
>   output and metadata (version number, author, website) for a single
>   implementation. There could be an API key so that only the main babelmark
>   site could use these apps. babelmark could then just send out a bunch of
>   HTTP queries and display the output.  It wouldn't even need its own server.]


That's an interesting idea. I've been thinking about it too and was
considering writing my own implementation as an experimental Node.js
project (server-side JavaScript). Seems like the calls to all the
various implementations would make a good fit for JavaScript's
asynchronous nature. I always imagined all the implementations being
on one server, but nothing would prevent them from being on separate
servers either.
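
To make the fan-out idea concrete, here is a rough sketch, written with
Python's asyncio rather than Node.js. Everything in it is an assumption
for illustration: the `fake_*` coroutines stand in for HTTP calls to
per-implementation web apps, the version strings are placeholders, and
the function names are made up. It only shows the shape of "send
everything out at once, collect whatever comes back".

```python
import asyncio

# Hypothetical stand-ins for per-implementation web apps. In the setup
# described above, each would be an HTTP request to a separate server.
async def fake_pandoc(text):
    await asyncio.sleep(0.01)  # simulate network latency
    return {"name": "pandoc", "version": "0.0", "html": "<p>%s</p>" % text}

async def fake_python_markdown(text):
    await asyncio.sleep(0.02)
    return {"name": "Python-Markdown", "version": "0.0", "html": "<p>%s</p>" % text}

async def fake_broken(text):
    raise RuntimeError("service unavailable")

async def compare(text, services):
    """Query every service concurrently; results keep the order of `services`."""
    results = await asyncio.gather(
        *(service(text) for service in services),
        return_exceptions=True,  # one failing service must not sink the rest
    )
    report = []
    for result in results:
        if isinstance(result, Exception):
            report.append({"error": str(result)})
        else:
            report.append(result)
    return report

def run_compare(text):
    """Synchronous entry point for the sketch."""
    services = [fake_pandoc, fake_python_markdown, fake_broken]
    return asyncio.run(compare(text, services))
```

Calling `run_compare("hi")` returns one entry per service, with the
failing one reported as an error instead of aborting the whole
comparison. Whether the services live on one server or many makes no
difference to this part.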


> 2.  We really need to clarify the rules for indented lists.  As I've
> argued before on this list, the markdown documentation at least strongly
> implies that sublists need to be indented by four spaces, but many
> implementations (including Markdown.pl) don't insist on this.


Python-Markdown has only ever supported 4 spaces for indent. From time
to time we get a bug report complaining that "nested lists don't work"
as if they don't work *at all*. Every time I'm a little confused at
first, then I realize that their sample source text uses fewer than
four spaces of indent. Interestingly, every such report points to
another implementation in which their sample does work. So far, we've
refused to change our implementation and simply suggest that if they
really want it to work, they can adjust the `tab_length` argument to
whatever they want. Of course, this does have the side effect of also
changing the indentation requirements for code blocks, etc. But hey,
we are Python people, white space is significant and must be
consistent or our code won't run so just indent everything with four
spaces already. ;-)
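
As a rough illustration of why those reports happen (this is not
Python-Markdown's actual code, and `nesting_level` is a made-up helper),
think of the indent unit as a divisor applied to the leading whitespace
of each line:

```python
def nesting_level(line, tab_length=4):
    """Return how many whole indent units precede the first non-space character."""
    expanded = line.expandtabs(tab_length)
    leading = len(expanded) - len(expanded.lstrip(" "))
    return leading // tab_length

# A sublist item indented by two spaces, as in the typical bug report.
two_space_child = "  * child item"

# With the default unit of four, the item is not nested at all.
default_level = nesting_level(two_space_child)

# With a unit of two it nests one level deep, but so does everything else:
# a line indented by four spaces now counts as two levels, which is the
# side effect on code blocks mentioned above.
adjusted_level = nesting_level(two_space_child, tab_length=2)
```

The same unit is applied to every construct that depends on
indentation, so shrinking it to rescue a two-space sublist changes how
the rest of the document is read as well.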


> 3.  I think most people agree that changing from ordered to bulleted
> list markers should start a new list (discussed earlier on this mailing
> list).


Probably, but what about those old documents?


> 4.  I think the opening number of an ordered list should be significant.


Agreed, but those old documents will come back to bite us again.


> 5.  My own preference would be to require a blank line before a heading
> or blockquote, to avoid unexpected results.


Old documents again? Sigh, I'm getting tired of that response too.


> 5.  Tables -- here there's a significant divergence between pandoc, PHP
> markdown extra, and multimarkdown. A limitation of pandoc's tables is
> that they require a monospaced font, since they rely on column
> alignment. The advantage is that they look exactly like tables. In
> addition, they allow table cells that contain whole paragraphs, and
> even arbitrary block-level content -- whereas, if I understand the
> documentation correctly, PHP markdown extra only allows simple tables
> with one-line cells.  The philosophical differences here may be too
> deep for convergence.


Here is one of those features where we copied PHP's implementation and
tell people to go build their own if they want something different.
Our extension API should make it relatively easy, but I have yet to
see any significant advancements by third party extensions here.

Personally, I'm in the tables-should-be-raw-html camp. You want to
output to something besides html? Hmm, I thought tables were for
tabular data. Surely some library exists that could convert tabular
data in html to your format of choice. Not that I'm trying to be
inflammatory, but if your tables are that complex, I would suggest
you're doing it wrong. Maybe the problem is that I just don't produce
that much tabular data.


> 6.  Metadata -- multimarkdown's system is simple, flexible, and
> readable.  One reservation I have about it is that it is
> English-centric -- nobody wants to write 'Title' at the beginning
> of a Swedish document -- but that could be solved by localization.
> It also seems a bit pedantic to have to say 'Title' if that's all you have.
> Pandoc's system is convenient and doesn't use English keywords, but it's not
> flexible enough, and I've been thinking about alternatives.


Python-Markdown provides a metadata extension which almost exactly
follows multimarkdown's. The primary difference is that we don't
include a "complete document" (with html head section) feature.
Therefore the lib user is expected to build their own. So we simply
provide the metadata as a data dict which the user can pass into a
templating system or whatever they want to do with it.

Interestingly, after having this feature for a few years, I would
instead prefer a separate library (like say YAML) which reads and
removes the metadata before passing the document on to markdown. In
fact, the metadata could even alter some of the arguments passed to
markdown, or perhaps skip calling markdown at all (basically like
Jekyll - used by GitHub Pages). To me this is much more flexible and
doesn't require markdown to define a metadata syntax at all. Just let
YAML (or whatever spec is chosen) determine the syntax and we
automatically get the benefits of that spec's well developed data
structure.
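
A minimal sketch of that "strip the metadata first" approach, under
some loudly stated assumptions: the header is delimited by `---` lines
(the Jekyll convention), it is parsed here as naive `key: value` pairs
instead of real YAML so the example needs no third-party parser, and
the `markdown: off` switch and both function names are hypothetical.

```python
def split_front_matter(text):
    """Split a leading '---' delimited header from the document body.

    Returns (meta, body). The header is parsed as naive 'key: value'
    lines; a real setup would hand it to a YAML parser instead.
    """
    lines = text.split("\n")
    if lines[0].strip() != "---":
        return {}, text  # no header at all
    try:
        end = lines.index("---", 1)
    except ValueError:
        return {}, text  # no closing delimiter: treat everything as body
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip().lower()] = value.strip()
    body = "\n".join(lines[end + 1:])
    return meta, body

def render(text, convert):
    """Strip metadata first, then decide whether to call the converter at all."""
    meta, body = split_front_matter(text)
    if meta.get("markdown", "on") == "off":  # hypothetical switch
        return meta, body
    return meta, convert(body)
```

Here `convert` is whatever markdown function the caller already uses.
The converter never sees the header, so it needs no metadata syntax of
its own, and the returned dict can go straight into a template.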

My personal preferences aside, we continue to support the existing
extension for those people still using the current implementation.

In summary, I think the best thing we did with Python-Markdown was
make the extension API public and document it [1]. Then, whoever
uses our library for whatever unforeseen (by us) purpose can build on
the markdown syntax to fill whatever needs they have. I find it
interesting that GitHub has recently taken the same route. They built
a new C lib [2] (more recently renamed to sundown) with an extension
API. They then built their own extensions [3] on the API which meet
their needs. Use the basic lib, and you get basic markdown. Use
whatever extensions you want and you get whatever you need.
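
To show the general shape of "basic lib plus extensions", here is a toy
sketch. It is neither Python-Markdown's API nor sundown's: the
`Converter` class, the `extend` hook, and the strikethrough extension
are all invented for illustration, and the "core" only wraps
paragraphs without escaping HTML.

```python
import re

class Converter:
    """A deliberately tiny 'core': paragraphs only, plus a hook for extensions."""

    def __init__(self, extensions=()):
        self.inline_patterns = []  # (compiled regex, replacement) pairs
        for extension in extensions:
            extension.extend(self)  # each extension registers itself

    def convert(self, text):
        paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
        out = []
        for paragraph in paragraphs:
            # Apply every registered inline pattern in registration order.
            for pattern, replacement in self.inline_patterns:
                paragraph = pattern.sub(replacement, paragraph)
            out.append("<p>%s</p>" % paragraph)
        return "\n".join(out)

class StrikethroughExtension:
    """Hypothetical extension: ~~text~~ becomes <del>text</del>."""

    def extend(self, converter):
        converter.inline_patterns.append(
            (re.compile(r"~~(.+?)~~"), r"<del>\1</del>")
        )

basic = Converter()
extended = Converter(extensions=[StrikethroughExtension()])
```

With the same input, `basic.convert` leaves `~~text~~` untouched while
`extended.convert` rewrites it, and the core never had to know that
strikethrough exists.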

PS: Sundown's docs list bindings for ten different languages. At least
the Python bindings claim [4] to be significantly faster than any
other Python implementation, including those wrapping C code. You
might want to take a look, especially considering that the developers
have had no association with or communication with the community on
this list AFAICT.

[1]: https://github.com/waylan/Python-Markdown/blob/master/docs/writing_extensions.txt
[2]: https://github.com/tanoku/sundown
[3]: https://github.com/tanoku/redcarpet
[4]: https://github.com/FSX/misaka/blob/master/docs/documentation.md

--
----
\X/ /-\ `/ |_ /-\ |\|
Waylan Limberg

