Detab should be multi-byte aware?

John Gruber gruber at fedora.net
Mon Oct 9 18:19:38 EDT 2006


Allan Odgaard <29mtuz102 at sneakemail.com> wrote on 10/9/06 at
11:02 PM:


> This raises two questions:
>
> 1. Should Markdown convert tabs to spaces in pre-formatted text?
> 2. If yes, should Markdown be aware of multi-byte characters?
>
> I’d say yes to #1 -- Markdown converts to (X)HTML, which
> does not define the tab size, and a good rule of thumb is to
> always convert to spaces before publishing on the net.


For #1, that's exactly why it does it.
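For illustration only, here is a minimal Python sketch of what detabbing means. It assumes 4-column tab stops, which is the default tab width in Markdown.pl. Each tab is replaced by however many spaces it takes to reach the next multiple of the tab width, so columns line up no matter how a browser would render a literal tab. The function name `detab` is hypothetical, and Markdown.pl itself is written in Perl, so this is not its code.

```python
TAB_WIDTH = 4  # assumed tab stop width (Markdown.pl's default)


def detab(text: str, tab_width: int = TAB_WIDTH) -> str:
    """Replace each tab with spaces up to the next tab stop, line by line."""
    out_lines = []
    for line in text.split("\n"):
        buf = []
        col = 0  # current column within this line, counted from 0
        for ch in line:
            if ch == "\t":
                pad = tab_width - (col % tab_width)  # spaces to next stop
                buf.append(" " * pad)
                col += pad
            else:
                buf.append(ch)
                col += 1  # assumes every character is one column wide
        out_lines.append("".join(buf))
    return "\n".join(out_lines)
```

For plain ASCII text without carriage returns, this produces the same result as Python's built-in `str.expandtabs(4)`.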



> As for #2, Markdown doesn’t know the encoding of the source
> document, so it can’t really be aware of things such as
> UTF-8 multi-byte sequences; OTOH, if it changes my
> pre-formatted text, I would like it to do the right thing.


If Markdown.pl ever gains explicit support for text encodings, the
rules will be simple: UTF-8 in, UTF-8 out, no exceptions.

This would break the way some people are using it, I'm sure. I
don't really have much sympathy for people who are clinging to
other encodings, though.

I don't think the rules for the syntax (as opposed to the
implementation) need to mention it, though, at least not yet.

I say "yet" because from the get-go I've always considered using
non-ASCII punctuation characters for certain features.

I don't think there's any reason that someone couldn't write a
UTF-8 savvy Markdown implementation using the 1.0 syntax, though.
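To make the multi-byte problem concrete, here is a hedged Python sketch comparing two hypothetical functions. `detab_bytes` counts every byte as one column, which is roughly what a detab does when it doesn't know the encoding. `detab_utf8` follows the "UTF-8 in, UTF-8 out" rule by decoding first, so each code point counts as one column. Counting code points is still a simplification: wide East Asian characters and combining marks don't occupy exactly one column, so a truly display-aware detab would need more than this.

```python
TAB_WIDTH = 4  # assumed tab stop width


def detab_bytes(data: bytes, tab_width: int = TAB_WIDTH) -> bytes:
    """Encoding-unaware detab: every byte counts as one column."""
    out = bytearray()
    col = 0
    for b in data:
        if b == 0x09:  # tab
            pad = tab_width - (col % tab_width)
            out += b" " * pad
            col += pad
        elif b == 0x0A:  # newline resets the column
            out.append(b)
            col = 0
        else:
            out.append(b)
            col += 1  # a 2-byte UTF-8 character advances col by 2 here
    return bytes(out)


def detab_utf8(data: bytes, tab_width: int = TAB_WIDTH) -> bytes:
    """UTF-8 in, UTF-8 out: decode first so each code point is one column."""
    text = data.decode("utf-8")  # raises UnicodeDecodeError on non-UTF-8 input
    return text.expandtabs(tab_width).encode("utf-8")
```

If the input is "é\tx", the byte-counting version pads with two spaces, because é is two bytes in UTF-8. The UTF-8 version pads with three spaces, which is the alignment a reader actually sees. For pure ASCII input, the two functions agree.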

-J.G.


More information about the Markdown-Discuss mailing list