Markdown Extra Specification (First Draft)

Michel Fortin michel.fortin at michelf.com
Tue May 6 23:57:19 EDT 2008


On 2008-05-06 at 18:18, Sherwood Botsford wrote:


> As a suggestion for the next pass at this, add an example of each,

> and how it should be rendered.

> I found fireball's web site fairly lucid for this.


The content model section currently includes a basic description of
each syntax element, but its primary goal is to explain the structure
of a Markdown Extra document. The parsing section is intended to
define without ambiguity the syntax for each element.

Nothing is set in stone though. I could, for instance, remove the syntax
descriptions from the content model entirely and leave them to the
parsing section. I'll see what fits when I write the parsing section.



> E.g.

> Example abbreviation definition

> *[TLA] :Three Letter Acronym

>

> Note, If I'm reading this correctly then

> * [TLA]: Three Letter Acronym

>

> would be incorrect, as there is a space after the asterisk and after

> the colon.

> It's also not clear if this critter has to appear on a line by itself.


This will be defined in more precise terms in due time. I'm not there
just yet.

That said, I think the first one should be allowed, but not the second
one. Putting a space before a colon is common in the French-speaking
world, and I don't think allowing it causes a problem; whereas the
second one is ambiguous with a list item, and I'd rather have the user
see a list item where he doesn't expect one than see a list item
disappear because it looks like an abbreviation definition.

In fact, the first one is allowed by PHP Markdown Extra, but not the
second one which becomes a list item.
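
To make the distinction concrete, here is a minimal Python sketch of a
line matcher that accepts the first form and rejects the second. The
pattern and the name `parse_abbr_def` are assumptions made for
illustration; they are not taken from PHP Markdown Extra's source, and
the allowance for up to three leading spaces is a guess.

```python
import re

# Hypothetical pattern: "*[" with no space in between, the abbreviation,
# "]", an optional space before the colon, then the expansion.
# The " {0,3}" leading indent is an assumption, not a documented rule.
ABBR_DEF = re.compile(r"^ {0,3}\*\[(.+?)\] ?:(.*)$")


def parse_abbr_def(line):
    """Return (abbreviation, expansion), or None if the line is not a definition."""
    match = ABBR_DEF.match(line)
    if match is None:
        return None
    return match.group(1), match.group(2).strip()


print(parse_abbr_def("*[TLA] :Three Letter Acronym"))   # ('TLA', 'Three Letter Acronym')
print(parse_abbr_def("* [TLA]: Three Letter Acronym"))  # None (left to the list parser)
```

A line that returns None here would simply fall through to the list
item rules, which is the behaviour described above.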



> If the abbreviation is long enough that it spans line ends, is that

> ok?

>

> * [ODITLOID]: A Day in the Life of Ivan

> Denisovich (Alexander Solzhenitsyn)


Currently, PHP Markdown Extra doesn't handle that very well. I'm not
sure what the final specification should say about this. I'll look at
it when doing the parsing section.



> ***

> A lot of the critters that appear as references aren't clear about

> how they

> appear in the text, and how they appear when resolved. The footnote

> definition starts with [^ but does the footnote in the text also

> start that way?


I'm not sure I understand your concern here. The spec may not be clear
about that currently, but the spec's extra features are coming from
PHP Markdown Extra, and so will follow PHP Markdown Extra's syntax.
Please take a look at this document:
<http://michelf.com/projects/php-markdown/extra/#footnotes>
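
In short, both the marker in the text and the definition use the same
`[^name]` form; the definition is the one followed by a colon at the
start of a line. The Python sketch below separates the two. The
patterns, the function name `split_footnotes` and the sample text are
assumptions for illustration only, and multi-paragraph footnotes are
ignored.

```python
import re

# Assumed patterns, for illustration only.
FOOTNOTE_DEF = re.compile(r"^ {0,3}\[\^(.+?)\]:[ ]?(.*)$", re.MULTILINE)
FOOTNOTE_REF = re.compile(r"\[\^(.+?)\]")


def split_footnotes(text):
    """Return (reference names found in the body, {name: footnote text})."""
    definitions = {m.group(1): m.group(2) for m in FOOTNOTE_DEF.finditer(text)}
    # Remove the definitions first so they are not counted as references.
    body = FOOTNOTE_DEF.sub("", text)
    references = FOOTNOTE_REF.findall(body)
    return references, definitions


sample = "Some text with a footnote.[^1]\n\n[^1]: And that is the footnote.\n"
print(split_footnotes(sample))
```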



> 2.2.1 Link Reference

>

> Quote:

> A link reference is alone on a line. It begins with the reference

> name inside square brackets, optionally followed by a space or a no-

> break-space, a colon, a URI (either enclosed in angle brackets or

> not), and an optional title enclosed in single or double quotes, or

> in parenthesis (which can be preceded by a newline).

>

>

> four things:

> 1. Sometimes the link is *(@*&^(^ long. So if I'm editing with vi, I

> have everything else in 60-70 column

> lines, then this great bloody honker. I'd like an optional syntax

> for breaking a long URI into chunks.

>

> Eg. the usual unix convention of \ with optional trailing whitespace

> means continued on next line, with the \ and whitespace going to the

> bit bucket.


I'm not sure that's useful enough to justify the added complexity for
parsers. But still, please remind me of this problem after I've done
the part of the parsing section that deals with URLs.



> 2. I'd like some way to hook references to an external file, or

> database lookup instead of doing them internally.


That can be a parser feature; it's out of the scope of the
specification.



> 3. Why three versions of quoting characters for the title?

>

> 4. Why the <> around the URI?


Because that's what Markdown.pl supports, alongside other
implementations based on it such as PHP Markdown. I think that's the
exact kind of detail that is under-documented right now and that is
making life difficult for other implementers who want to be compatible
with existing documents.
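
As an illustration of how much detail hides in that one quoted
sentence, here is a Python sketch of a single-line matcher for the
link reference described above. The pattern and the name
`parse_link_ref` are assumptions; the sketch does not handle a title
placed on the following line, and it is not the pattern used by
Markdown.pl or PHP Markdown.

```python
import re

# Assumed pattern built from the quoted description, for illustration only.
LINK_REF = re.compile(
    r"""^[ ]{0,3}\[(?P<name>.+?)\][ \u00a0]?:    # name, optional space or NBSP, colon
        [ \t]*
        (?:<(?P<bracketed>[^<>\s]+)>|(?P<bare>\S+))   # URI with or without angle brackets
        (?:\s+(?:"(?P<dq>.*)"|'(?P<sq>.*)'|\((?P<paren>.*)\)))?   # optional title
        [ \t]*$""",
    re.VERBOSE,
)


def parse_link_ref(line):
    """Return (name, uri, title); title is None when absent. None if no match."""
    m = LINK_REF.match(line)
    if m is None:
        return None
    uri = m.group("bracketed") or m.group("bare")
    # Pick whichever of the three title forms matched, if any.
    title = next(
        (m.group(g) for g in ("dq", "sq", "paren") if m.group(g) is not None),
        None,
    )
    return m.group("name"), uri, title


print(parse_link_ref('[foo]: http://example.com/ "Title"'))
print(parse_link_ref("[foo]: <http://example.com/> 'Title'"))
print(parse_link_ref("[foo] : http://example.com/ (Title)"))
```

All three calls describe the same link, which is the compatibility
burden an implementer has to carry.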



> 5. Only if you put the title in () can you start on a newline?


No idea. I haven't looked at the implications of this yet, but perhaps
it could be done.



> 2.2.2 Abbreviation

> Again, I'd like a hook so that I can put these in an external file.

> In my tree farm web page I'd like to use botanical descriptions, but

> be able to let users see the definition on mouse over or click. But

> the word 'glabrous' may appear on 40 pages. Be nice if I only had to

> define it once. If someone is creating an annotated Shakespeare they

> would want to use an Elizabethan English dictionary as their

> external file, style it so that defined words are barely different

> from the text, and let the confused reader click for enlightenment.


That should be an implementation-specific feature; perhaps we should
make sure the spec doesn't disallow that.



> 2.2.2 footnotes

>

> Note possible numbering error both abbreviation and footnotes are

> 2.2.2


Oops... should be fixed now.

[Note to self: I really need to add an automatic numbering system to
my publishing system.]



> How does the footnote appear in the text? For clarity in reading,

> all the things that refer to something else

> should be visibly different. E.g. In markdown we presently have

> [link text ][LINKREF]

> ![Image alt text][IMGREF]

> so

> ^[FOOTNOTE] (although I'd prefer _[FOOTNOTE] as it tells me it's

> below the ruler line at the bottom of the page)

> Except from your text it appears it should be [^FOOTNOTE] which is

> at odds with the image and abbreviation

> syntax.

> How are footnotes numbered?


That feature, along with others, should follow how PHP Markdown Extra
does it. Footnote numbering could be left implementation-defined,
however.



> I think you could make a case for a footnote being the child of the

> block element that the reference appears in.

> This may potentially allow clever people with CSS to have the

> footnote appear as a sidebar div, adjacent to the reference.


You'll probably need a different HTML output if you want to have
sidenotes, but that shouldn't be disallowed.

I intend to describe a reference HTML output in the spec, but that
section will probably be non-normative so that implementers are free
to give any output they feel right for their users.



> 2.3.7 Table syntax

>

> Suggested syntax

> [TT] Table title

> |[TH] elements | separated by pipes | with white space | on either

> side |

> | anything | that | appears | with | leading and trailing |

> | is | formatted | as a | table row|

>

> |> This cell spans two columns | and | so forth|

> | This cell also spans two columns <| and | so forth|

> |>> This cell spans three columns | in the | table |

> | This cell spans two rows | in the | table|

> | " " | because it has ditto marks | in the cell below |

>

> Since many tables are done without a title or header, the pipe

> syntax is the usual.

> You can spend some time pretty printing it.

> Suggested implementation would have warnings when the number of

> cells per row is inconsistent.

>

> 2.4.2 and 2.4.3 Emphasis and strong emphasis.

> The current markdown uses either _ or * for emphasis and any

> combination of the two

> doubled for strong emphasis. I suggested that * be used for strong

> (default bold) and _ be used for emphasis.

> (default italic) This gives three combinations possible with the

> same set of symbols, and fits the general

> intuitive nature of markdown.


The plan is to use the syntax implemented by PHP Markdown Extra.

For emphasis with underscore, there's going to be a special note about
the difference between Markdown Extra and plain Markdown (about
middle-word emphasis), so that implementers of plain Markdown can
implement the thing correctly.
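
A rough Python sketch of that difference follows. Both patterns are
simplified assumptions written for this illustration: they ignore
doubled underscores, asterisks and escaping, and neither is the
pattern used by Markdown.pl or PHP Markdown Extra.

```python
import re

# Simplified plain Markdown rule: any _text_ pair becomes emphasis,
# even in the middle of a word.
PLAIN_EM = re.compile(r"_(?=\S)(.+?)(?<=\S)_")

# Simplified Markdown Extra rule: the underscores must not touch a
# letter or digit on their outer side, so middle-word pairs are ignored.
EXTRA_EM = re.compile(r"(?<![A-Za-z0-9])_(?=\S)(.+?)(?<=\S)_(?![A-Za-z0-9])")


def emphasize(text, pattern):
    """Wrap each match of the given emphasis pattern in <em> tags."""
    return pattern.sub(r"<em>\1</em>", text)


text = "call my_function_name now"
print(emphasize(text, PLAIN_EM))   # call my<em>function</em>name now
print(emphasize(text, EXTRA_EM))   # call my_function_name now
print(emphasize("an _important_ word", EXTRA_EM))
```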



> 2.4.6 Hard line break

> This one bites me regularly, as I learned to touch type in high

> school and to end a sentence with

> a period followed by two spaces. This means that every time I end a

> sentence on a line end, I

> get an involuntary break. Lots of head scratching over this one. I

> don't like markup that depends on invisible

> trailing characters.

>

>

> I would favour ending a line with a forward slash. You sometimes see

> this in poetry where line length exceeds

> the column width. And it has an easy mnemonic: If a back slash

> means concatenate the next line onto this one

> then forward slash means, force the line break here.

>

> Thus my address would appear

>

> Sherwood Botsford /

> RR 1 Site 2 Box 5 /

> Warburg, Alberta T0C 2T0 /

>

> I would propose that any amount of white space surrounding the /

> would be allowed. So if you wanted

> to add extra space so the /'s would line up, you could.


Removing the double-space-at-the-end rule isn't on the table; such a
change would break all documents that are already out there using the
current hard line break syntax.

That said, I agree with you that it can bite easily if you have the
habit of writing two spaces after a sentence (and I know this is quite
common among people where I live). But, sadly perhaps, I think it's
too late to change.
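
For reference, the existing rule can be sketched in a few lines of
Python. The function name `apply_hard_breaks` and the exact `<br />`
output are assumptions for illustration; the point is only that two or
more spaces before a newline become a break, whether or not the writer
meant it.

```python
import re

# Two or more spaces immediately before a newline mark a hard line break.
HARD_BREAK = re.compile(r" {2,}\n")


def apply_hard_breaks(paragraph):
    """Replace each trailing-double-space line end with an HTML break."""
    return HARD_BREAK.sub("<br />\n", paragraph)


address = "Sherwood Botsford  \nRR 1 Site 2 Box 5  \nWarburg, Alberta T0C 2T0"
print(apply_hard_breaks(address))

# The involuntary case: a sentence typed with two spaces after the period
# that happens to end at a line end.
print(apply_hard_breaks("End of a sentence.  \nStart of the next."))
```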



> Abbreviation is element 2.2.2 and 2.4.7 Is this correct? It is both

> a document element and a span element? Ditto Link. Will this cause

> trouble for designing the parser?


The document element 2.2.2 is "Abbreviation definition", telling what
word means what, and the span element 2.4.7 is "Abbreviation",
representing an instance of an abbreviation in the text (deduced
automatically by the parser). I'm not sure what the problem is there.
It's much like 2.2.1 Link Reference and 2.4.4 Link: one is the
definition of the URL and title of a link; the other is the actual link.
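
The two roles can be sketched as two passes in Python: the first
collects and removes the definitions (2.2.2), the second marks up each
instance found in the remaining text (2.4.7). The pattern, the name
`render_abbreviations` and the `<abbr>` output are assumptions for
illustration, and the sketch works on raw text rather than on a parsed
document.

```python
import html
import re

# Assumed definition pattern, for illustration only.
ABBR_DEF = re.compile(r"^ {0,3}\*\[(.+?)\] ?:(.*)\n?", re.MULTILINE)


def render_abbreviations(text):
    """Strip abbreviation definitions and wrap each instance in <abbr>."""
    definitions = {}

    def collect(match):
        # Pass 1: remember the definition and drop its line from the output.
        definitions[match.group(1)] = match.group(2).strip()
        return ""

    body = ABBR_DEF.sub(collect, text)
    if not definitions:
        return body

    # Pass 2: match whole words only, longest abbreviations first.
    words = "|".join(
        re.escape(w) for w in sorted(definitions, key=len, reverse=True)
    )
    instance = re.compile(r"(?<!\w)(?:%s)(?!\w)" % words)

    def mark(match):
        title = html.escape(definitions[match.group(0)], quote=True)
        return '<abbr title="%s">%s</abbr>' % (title, match.group(0))

    return instance.sub(mark, body)


print(render_abbreviations("A TLA is short.\n\n*[TLA]: Three Letter Acronym\n"))
```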

* * *

I'm sorry to ditch most of your suggestions like that, but I can't
really make any breaking change to the syntax, or that syntax wouldn't
be Markdown anymore. The idea behind the spec is to give implementers
an unambiguous reference about how to implement Markdown (and Markdown
Extra), allowing documents tested with one parser to work with any
other, unchanged.

Given the current situation, it may be a little utopian to believe no
current document will be broken as implementations adjust themselves
to the spec, but we should try to minimize that.


Michel Fortin
michel.fortin at michelf.com
http://michelf.com/



