Tightening the rules for literal `[` and `]` chars in link ids

John Gruber gruber at fedora.net
Mon Sep 25 15:09:31 EDT 2006


So here's an interesting bug I just discovered:

[Like this][d]: [here][h].

[d]: foo

[h]: bar


The output here should be:

<a href="foo">Like this</a>: <a href="bar">here</a>.


But instead the output is completely empty. I see this bug in both
Markdown.pl and PHP Markdown.

The problem is that all three lines are being treated as link
definitions. The first line is being matched as though

Like this][d

is the link id, and

[here][h].

is the URL.

You can trigger it in other, simpler ways as well:

[Like this][d]: here.

[Like this][]: here.

And with the magic implicit link references, even:

The next line will disappear
[like this]: here.

The current pattern for identifying link definitions, translated to
English and simplified slightly for this discussion, is:

1. An opening bracket `[`
2. Followed by anything other than a newline
3. A closing bracket `]`
4. A colon `:`
5. Zero or more spaces and tabs
6. Followed by the URL

The URL is defined, simply, as a run of non-space characters.

So, we *won't* trigger this bug with this:

[Like this][d]: two words.
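
To make the behavior concrete, here is a rough Python sketch of the simplified pattern as described in English above. It is not the actual regex from Markdown.pl or PHP Markdown. The names `LINK_DEF` and `find_link_defs` are made up for illustration, and the requirement that nothing but spaces or tabs follow the URL is an assumption, chosen so that the two-word case doesn't match.

```python
import re

# Simplified sketch only; NOT the real pattern from Markdown.pl or PHP Markdown.
LINK_DEF = re.compile(
    r"""
    ^\[(.+)\]   # opening bracket, anything but a newline, closing bracket
    :           # a colon
    [ \t]*      # zero or more spaces and tabs
    (\S+)       # the URL: a run of non-space characters
    [ \t]*$     # assumption: only trailing spaces/tabs may follow the URL
    """,
    re.MULTILINE | re.VERBOSE,
)


def find_link_defs(text):
    """Return (link_id, url) pairs for every line the pattern swallows."""
    return [(m.group(1), m.group(2)) for m in LINK_DEF.finditer(text)]


sample = "[Like this][d]: [here][h].\n\n[d]: foo\n\n[h]: bar\n"
for link_id, url in find_link_defs(sample):
    print(repr(link_id), "->", repr(url))
```

Under these assumptions the first line of the sample is picked up as a definition whose id is `Like this][d`, while `[Like this][d]: two words.` is left alone.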

* * *

My thinking is that this can be solved by changing the rules for
link IDs to state that they can only contain embedded literal
brackets if (a) they're properly nested; or (b) they're backslash
escaped.
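
As a sketch of what that rule could look like, here is one possible check in Python. The function name `is_valid_link_id` is hypothetical, and treating a backslash as an escape only when it comes directly before a bracket is an assumption; the proposal above doesn't spell out those details.

```python
def is_valid_link_id(link_id):
    """Hypothetical check: literal brackets in a link id must be
    properly nested or backslash-escaped."""
    depth = 0
    i = 0
    while i < len(link_id):
        ch = link_id[i]
        # Assumption: a backslash escapes only a bracket right after it.
        if ch == "\\" and i + 1 < len(link_id) and link_id[i + 1] in "[]":
            i += 2  # skip the escaped bracket; it counts as a literal
            continue
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth < 0:
                return False  # closing bracket with no matching opener
        i += 1
    return depth == 0  # every opener must have been closed


print(is_valid_link_id("Like this][d"))  # the id from the bug above
print(is_valid_link_id("a [nested] id"))
```

With a check like this, `Like this][d` would be rejected as a link id, so the first line of the original example would fall through to ordinary inline link processing.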

Objections or suggestions?

This change won't solve the problem for the magic implicit link
references:

The next line will disappear
[like this]: here.

One way to address this might be to tighten up the rules for what
a URL is.

I should also note that this entire problem has never been
reported to me by anyone, so it doesn't seem to be something
people are stumbling upon frequently.

-J.G.

