Backslash escapes

Jacob Rus jrus at hcs.harvard.edu
Sun Jan 7 20:19:29 EST 2007


Andrea Censi wrote:

>>> b) \<newline> represents a linebreak
>> I can't see why this would be better than what we have now. In fact I
>> think it's worse as it'll clutter the text version of the document
>> unnecessarily; the current double-space syntax means that the
>> Markdown-formatted text looks fine by itself, something which is a
>> core goal for Markdown.
>
> The problem I find with the current syntax is that I cannot *see*
> whether there is the line break.


Get a text editor which allows you to color that line break ;)
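
For anyone whose editor can't do that, here is a rough sketch of the
same idea as a script. It assumes the current rule that two or more
trailing spaces mean a line break; the function name and the "[br]"
marker are made up for illustration, and it doesn't check whether the
line is actually inside a paragraph.

```python
import re

# Two or more spaces at the very end of a line: the current
# double-space line break syntax.
HARD_BREAK = re.compile(r" {2,}$")

def mark_hard_breaks(text, marker="[br]"):
    """Replace trailing runs of 2+ spaces with a visible marker.

    Hypothetical helper, only meant for eyeballing a document.
    """
    lines = text.split("\n")
    return "\n".join(HARD_BREAK.sub(marker, line) for line in lines)

print(mark_hard_breaks("one  \ntwo \nthree"))
```

Only the first line gets marked; a single trailing space is left alone.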


>>> 2) Inside "quoted values", you MUST escape `"`
>>> 3) Inside 'quoted values', you MUST escape `'`
>> But what happens if you don't? If you want to go deep in the corner-
>> cases of the syntax I think it'd be more useful to explain what
>> parsers have to do when they encounter that rather than tell the
>> author what not to write.
>
> At one point, you have to decide what is legal and what is not in a
> language. And, if it's not legal, then the behaviour is
> implementation-dependent.


No, that's a bad way to go about it. The edge-case behavior should be
clearly defined, and not left up to implementations.


> Just like HTML: it's very clear what is a legal HTML document.
> However, even though browser do their best to sanitize illegal
> documents, their behaviour in that case isn't specified by the spec.


Yes, and look at all the problems that has caused for web authors aiming
for cross-browser compatible sites.


>>> I would tend to drop the special case
>>>> [text](url "title"with"quotes")
>>> as it is ambiguous.
>> Drop it and replace it with what output? I agree that it has some
>> ambiguities, but it's not that bad really, especially when parsing
>> with regular expressions.
>
> My personal point is that, to support that kind of syntax, I had to
> write a function that it's the only ugly one in my shiny new
> recursive-descent parser.
>
> Also - but I reckon that it is sort of philosophical matter - it's
> really really evil to design a language which contains ambiguities.
> This is one case when the implementation (regexp-based system) heavily
> influenced the syntax.


You'll have to explain the ambiguity here a little bit. I'm not really
clear on what the syntax allows, as I don't ever use separate link
titles, so maybe someone can fill that in as well?


> Anyway, to the goal of reaching a compromise, here's the revised
> proposal for escaping:
>
> =======
>
> 1. No escaping in code spans/blocks.
>
> 2. Everywhere else, **all** PUNCTUATION characters **can** be escaped,
> and **must** be escaped when they could trigger links, tables, etc.
> (punctuation=[^a-zA-Z0-9\s\n])
>
> 3. As a rule, quotes **must** be escaped inside quoted values:
>
> * Inside `"quoted values"`, you **must** escape `"`.
> * Inside `'quoted values'`, you **must** escape `'`.


Yes, this all sounds reasonable to me. The tricky part is that number 2
isn't always completely cut and dried, especially not given the
heuristic regexp replacement method of the current Markdown.pl. I
suppose that's what you're aiming to fix here, though.
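
To make rules 1 and 2 concrete, here is a minimal sketch of how I read
them, not a description of what Markdown.pl does. It covers only the
"can be escaped" half of rule 2 and only backtick code spans (not
indented code blocks); the names TOKEN and unescape are invented for
the example, and the punctuation class is the one from the proposal.

```python
import re

TOKEN = re.compile(
    r"(?P<code>(`+).+?\2)"             # rule 1: a code span, left untouched
    r"|\\(?P<punct>[^a-zA-Z0-9\s])",   # rule 2: backslash + punctuation
    re.DOTALL,
)

def unescape(text):
    """Drop the backslash before escaped punctuation outside code spans."""
    def repl(match):
        if match.group("code") is not None:
            return match.group("code")   # no escaping inside code spans
        return match.group("punct")      # keep only the escaped character
    return TOKEN.sub(repl, text)

print(unescape(r"\*not emphasis\* and `\*code\*`"))
```

A backslash before a letter or digit is left as it is, since those
aren't punctuation under the proposed definition. Deciding when an
escape is *required* is the part that stays fuzzy.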

Incidentally, is anyone interested at all in discussions on any of the
following:

1. Footnotes
2. Tables
3. A more formalized extension mechanism

The first two of those have lengthy archived discussions which could use
someone summarizing them for the rest of us. I plan on taking that up
at some point in the nearish future if no one else will. The last would
be really nice, for adding things like TeX-formatted math, or
LilyPond-formatted music, or alternate table syntaxes, or whatever else,
for people running into Markdown's limitations and not wanting to just
use raw HTML. I think that curly braces are still available for such a
use.

-Jacob


