Backslash escapes (was: Revised 2005 proposal for meta-data)
andrea at censi.org
Fri Jan 5 18:40:51 EST 2007
On 1/4/07, Michel Fortin <michel.fortin at michelf.com> wrote:
> On 2007-01-01 at 15:25, Andrea Censi wrote:
> >>  Even further, you could allow non-punctuation to be escaped.
> > In a sense, this is the most consistent way of escaping.
After implementing it, and playing around, I changed my mind about
escaping [a-zA-Z]. It's useless and just confusing.
> > b) \<newline> represents a linebreak
> I can't see why this would be better than what we have now. In fact I
> think it's worse as it'll clutter the text version of the document
> unnecessarily; the current double-space syntax means that the
> Markdown-formatted text looks fine by itself, something which is a
> core goal for Markdown.
The problem I find with the current syntax is that I cannot *see*
whether there is a line break.
> > 2) Inside "quoted values", you MUST escape `"`
> > 3) Inside 'quoted values', you MUST escape `'`
> But what happens if you don't? If you want to go deep in the corner-
> cases of the syntax I think it'd be more useful to explain what
> parsers have to do when they encounter that rather than tell the
> author what not to write.
At one point, you have to decide what is legal and what is not in a
language. And, if it's not legal, then the behaviour is undefined.
Just like HTML: it's very clear what is a legal HTML document.
However, even though browsers do their best to sanitize illegal
documents, their behaviour in that case isn't specified by the spec.
> > I would tend to drop the special case
> >> [text](url "title"with"quotes")
> > as it is ambiguous.
> Drop it and replace it with what output? I agree that it has some
> ambiguities, but it's not that bad really, especially when parsing
> with regular expressions.
My personal point is that, to support that kind of syntax, I had to
write a function that is the only ugly one in my shiny new implementation.
Also - but I reckon that it is sort of a philosophical matter - it's
really really evil to design a language which contains ambiguities.
This is one case when the implementation (regexp-based system) heavily
influenced the syntax.
> > The first pass of processing the document simply becomes:
> > until eof
> > end
> Something that sounds odd to me is that you're doing this as the
> first pass of the whole document, yet you don't take into account
> HTML blocks, code blocks and inline HTML tags, but you've thought of
> code spans. It'll have to get much more complicated than that if you
> want to handle escapes as a first pass.
Actually, it worked ok in my first implementation. The trick is to
re-expand the escapes in code blocks or HTML code.
> Why do you want to process escapes first anyway?
Assume the input string is
" `code` - \`not code\` - ``code with \` slash-tick `` "
The first pass I did was to replace "\`" with a code outside of the
input range. Let `?` represent that code. The string becomes:
" `code` - ?not code? - ``code with ? slash-tick `` "
Now extract the code spans (marked CB below):
CB("code"), "- ?not code? - ", CB("code with ? slash-tick")
and undo the escapes: in strings ? becomes `, in code spans ? becomes \`:
CB("code"), "- `not code` - ", CB("code with \` slash-tick")
I did the same for code blocks and HTML.
It worked, but I don't use this method anymore.
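
For readers who want to try the placeholder trick described above, here is a
minimal sketch in Python. It is not the code from my implementation: the names
`PLACEHOLDER`, `CODE_SPAN` and `split_code_spans` are made up for illustration,
NUL is assumed never to appear in the input, and the regexp is a simplified
idea of what a code span is.

```python
import re

# Assumption: NUL never occurs in the input, so it can play the role of
# "a code outside of the input range".
PLACEHOLDER = "\x00"

# Simplified code span: a run of backticks, some content, the same run again.
CODE_SPAN = re.compile(r"(`+)(.+?)\1", re.DOTALL)


def split_code_spans(text):
    """Return a list of (kind, content) pairs, kind being "text" or "code"."""
    # Pass 1: hide every escaped backtick behind the placeholder.
    hidden = text.replace("\\`", PLACEHOLDER)

    # Pass 2: extract code spans; escaped backticks can no longer interfere.
    pieces = []
    pos = 0
    for match in CODE_SPAN.finditer(hidden):
        if match.start() > pos:
            pieces.append(("text", hidden[pos:match.start()]))
        pieces.append(("code", match.group(2).strip()))
        pos = match.end()
    if pos < len(hidden):
        pieces.append(("text", hidden[pos:]))

    # Pass 3: undo the escapes. In plain text the placeholder becomes a
    # literal backtick; in code spans it goes back to backslash-backtick.
    restored = []
    for kind, content in pieces:
        replacement = "\\`" if kind == "code" else "`"
        restored.append((kind, content.replace(PLACEHOLDER, replacement)))
    return restored
```

Calling `split_code_spans` on the example string yields the same three
interesting pieces as the walk-through, plus the surrounding whitespace as
separate text pieces.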
Anyway, to the goal of reaching a compromise, here's the revised
proposal for escaping:
1. No escaping in code spans/blocks.
2. Everywhere else, **all** PUNCTUATION characters **can** be escaped,
and **must** be escaped when they could trigger links, tables, etc.
3. As a rule, quotes **must** be escaped inside quoted values:
* Inside `"quoted values"`, you **must** escape `"`.
* Inside `'quoted values'`, you **must** escape `'`.
* Other examples:
`"bah 'bah' bah"` = `"bah \'bah\' bah"` = `'bah \'bah\' bah'`
`'bah "bah" bah'` = `'bah \"bah\" bah'` = `"bah \"bah\" bah"`
4. There is an exception for backward compatibility, in link and image titles:
The exception is not valid for attribute lists and in other
contexts, where you have to use the canonical syntax.
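
To make rules 2 and 3 concrete, here is a rough sketch of how a quoted value
could be read under the canonical syntax. The function name `parse_quoted` is
hypothetical, "punctuation" is assumed to mean ASCII punctuation
(`string.punctuation`), and a backslash before a non-punctuation character is
assumed to stay literal. Rejecting illegal input with an error is just one
possible choice, since the proposal does not say what a parser must do there.

```python
import string


def parse_quoted(value):
    """Return the unescaped content of a "quoted" or 'quoted' value.

    Raises ValueError when the value is not legal under rule 3.
    """
    if len(value) < 2 or value[0] not in "\"'" or value[-1] != value[0]:
        raise ValueError("not a quoted value")
    quote = value[0]
    body = value[1:-1]

    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            if i + 1 == len(body):
                # The backslash would escape the closing quote itself.
                raise ValueError("closing quote is escaped")
            if body[i + 1] in string.punctuation:
                # Rule 2: any escaped punctuation character is literal.
                out.append(body[i + 1])
                i += 2
                continue
            # Assumption: a backslash before a letter or digit is kept as-is.
            out.append(ch)
            i += 1
        elif ch == quote:
            # Rule 3: the delimiting quote must be escaped inside the value.
            raise ValueError("unescaped quote inside quoted value")
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

With this sketch, the three spellings on the first "Other examples" line all
parse to the same content, and `'bah 'bah' bah'` is rejected.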
As for point 4, my implementation tries its best to parse it, but
warns the user that it's bad manners.
"Life is too important to be taken seriously" (Oscar Wilde)