what this has to do with markdown

bowerbird bowerbird at aol.com
Sun Jul 7 22:21:22 EDT 2013


first, i'm sorry for the expletive in your in-box. really.

i also apologize for the smell from those dead skunks.

***

as to "what this has to do with markdown", it's simple.

if i remember correctly -- i might not, but who cares? --
"fan_f*ck*ng_tastic" was the word gruber used to justify
his choice that his version of markdown would recognize
intraword italics. so that's why _i_ used that one as well.

now, the reason i followed it up with my reference to the
dead-skunk problem is because it's almost perfect as a
demonstration of the full range of problems these days...

a person comes in and says, "hey, i noticed this glitch".

somebody else says "here's a workaround you can use."

which -- first -- ignores the fact that it's after-the-fact.

but, in this particular case, the suggestion was actually
better than most. to remind you, the workaround was to
surround filename_withanunderbar.txt with `backticks`,
which marks it as `code`, and thus short-circuits italics.

because, as the suggester pointed out, it is the case that
you probably _want_ filenames to be marked as `code`,
so they will display in a different typeface, and stand out.

the problem with that tactic, however, is that it does not
address the situation where you would want the word to
be rendered with the same typeface as surrounding text.
you wouldn't want "fan_f*ck*ng_tastic" marked as code.

so... sticking with the problem in regard to filenames...

another workaround would be to backslash/escape the
underbar in the filename, which will also nix the italics,
but that presents a different problem, which is that now
we've gummed up the plain-text version of the filename
with an unwanted backslash, with unknown side-effects.
(since you just know somebody is going to end up using
that now-improper filename, and they will suffer for it.)

that same type of problem would likely manifest with the
"just use raw .html" workaround, even if you can find the
way to concoct that. (it hurts my brain to think about it;
i'm using light-markup so i'm not forced to do raw html.)

the fact is, we really want to leave a filename untouched.
but we also don't want its underbars to be italic triggers.

and remember that when an underbar is misrecognized
as an italic-trigger, it's dropped from .html output, so
we now have _another_ wrong version of that filename,
in addition to the difficult problem of the runaway italics.
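
to make that concrete, here is a minimal sketch in python. it is not taken from any real markdown implementation; `naive_italics` and `rendered_text` are hypothetical names, and the single regex stands in for a converter that treats every underbar pair as an italic trigger, even in the middle of a word.

```python
import re

def naive_italics(text):
    """hypothetical converter: every _..._ pair becomes <em>, even mid-word."""
    return re.sub(r"_(.+?)_", r"<em>\1</em>", text)

def rendered_text(html):
    """strip the tags, to see what a reader could copy off the rendered page."""
    return re.sub(r"</?em>", "", html)

html = naive_italics("open my_long_file.txt now")
print(html)                  # open my<em>long</em>file.txt now
print(rendered_text(html))   # open mylongfile.txt now
```

both underbars vanish from the output, so the filename a reader copies from the page is not the one the writer typed.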

and, just to remind y'all that this is even _more_ thorny,
this underbar problem also happens regularly with urls.

(there are other instances too, but i do not intend to
share all of the results from my hard-fought research;
since urls have the problem, it is significant enough.)

this is not a thing we can casually sweep under the rug.

which is why some markdown script-writers have just
decided that they will _disallow_ intraword underbars.

and, in defense of that decision, it is the absolute truth
that browsers make a sad tragedy with intraword italics.
go look at some, take a hard look, and you _will_ see it:
the italic characters either slant into the upright ones, or
lean far too far away from them. either side, it's _awful_.

so yes, many markdown scripters do an outright ban...

which is fine if you are god, and you make the decisions.

but if you are beholden to users, it might not be so good.

and if you consider yourself to be a _servant_of_writers_,
then you really need to do a bit of research (or lots of it)
to discern if writers actually do ever use intraword italics.

that was what i did, as i was developing my light-markup.

so i can tell you that, yes, indeed, writers _do_ use them.

not a lot, of course, but they're not that infrequent either,
and it is a sizable percentage of writers that do use them.

so that's probably why about _half_ of the implementations
ban 'em, and half _allow_ them. it's split down the middle.

so if you really want to know if it's acceptable to ban them,
my advice would be "no".

***

now, let's go back and look at what the original poster said.


> Why not to ignore all "_"

> which are not followed or preceded

> either by a whitespace or by a newline?


just for the record, a newline _is_ whitespace, so we can
strike the "or by a newline" phrase; just use "whitespace".

as a first pass in thinking about that issue, that's not bad.
i'd say it's the "solution" most people would come up with.

i wouldn't even be surprised if some implementations do
indeed use exactly that rule to govern their conversions...

but if you actually go look at where italics markup is used,
you'll find many people put italics _inside_ any punctuation.
(most typically, you can find this with double-quote marks,
but any terminal-punctuation will present the same issue.)

now i wouldn't recommend that, because -- as i just said --
browsers do a lousy job when italics are next to un-italics,
and that's true for punctuation as much as other characters.

but the fact remains that a lot of people use italics like that,
so if you use "whitespace" as the rule, you'll screw them up.
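
a rough sketch of one reading of that whitespace rule, in python. the regex and the name `whitespace_rule` are my own assumptions, not code from any implementation: an opening underbar must have whitespace (or the start of the text) before it, and a closing underbar must have whitespace (or the end of the text) after it.

```python
import re

# assumed reading of the rule: the opener is preceded by whitespace/start,
# the closer is followed by whitespace/end; all other underbars stay literal
WHITESPACE_RULE = re.compile(r"(?<!\S)_(\S(?:.*?\S)?)_(?!\S)")

def whitespace_rule(text):
    return WHITESPACE_RULE.sub(r"<em>\1</em>", text)

print(whitespace_rule("a _good_ idea"))              # a <em>good</em> idea
print(whitespace_rule("see my_long_file.txt"))       # see my_long_file.txt
print(whitespace_rule('she said "_really_" to me'))  # she said "_really_" to me
```

the filename survives, which is the point of the rule, but the italics tucked inside the quotemarks are left as literal underbars.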

(of course, by putting your underbars _outside_ quotemarks,
you can screw up some conversion routines for curly-quotes,
because _they_ are using whitespace to make their decisions;
but that's why you need to decide things in a systematic way.)

again, back to the original poster:


> It would be nice to make

> a part of the official Markdown definition

> then all implementation will display this in the same way.


as gruber put it, years ago and very recently, people _say_
they wanna have an "official" version of markdown -- but
what they _mean_ is that they want _their_ pet desires to
receive his stamp of approval as "the official markdown".

but if gruber _were_ to make an "official version", he says
that it would make those people very unhappy, because he
will instantiate _his_ pet desires as the canonical standard.

so, let me say to the original poster, gruber _did_ make the
closest thing to an official version, and it specifically _allows_
intraword italics. so you wouldn't get what you want anyway.

which is not to say that other implementations, which do it
_differently_, are "wrong", because gruber likes it "flexible".

in other words, he doesn't _want_ all implementations to
"display in the same way". which could be well and good,
if not for all these dead skunks in the middle of the road.

you can call it "flexibility", or you can call it "inconsistencies".

whether you, or i, or anyone else for that matter, considers
all this to be "right" or "wrong" is entirely beside the point...

since gruber ain't gonna change his ways, and neither are
the many developers, whose stubborn insistence has also
been equally-well documented, there is no resolution here.

which is why most people have stopped thinking long ago.

***

and _that_, my friends, is another one of the problems here.

because that refusal to do any more thinking on the matters
-- the disinclination to remove dead skunks from the road --
means that the situation really has become totally hopeless.

as fletcher put it, in his reply to the original poster:


> Stick around. You'll learn. ;)


hey, at least he put a winkey-smiley after it... ;+)

***

so, just to do a follow-through as a for-example for you,
let me run you through the thinking that i did when i was
working through the aspects of this intraword italics issue.

one part, which i mentioned above, was to survey books
-- as my system focuses on books -- to see if authors
actually use intraword italics. and they occasionally do.

on the other hand, more research revealed quite readily
that there was a problem with both filenames and urls,
as they often contain underbars. (and, so i note it, yes,
a url _is_ a filename, but sometimes it's a symbolic one
-- in the sense that the "file" does not actually exist --
so both for purposes of clarity and to remind us of the
full range of the problem, i mention them specifically.)

so, both use-cases do exist. we have intraword italics,
and intraword underbars that must be taken as literals.

thus, we need a way to differentiate them.

the key here, to which i have already given one big hint,
is that the literal-underbars occur in specific situations,
namely for filenames and urls. intraword italics, on the
other hand, occur (by definition) in the middle of words.

so when my system encounters an underbar in a string,
it decides whether the string is a filename/url or a word.
in the former, the underbar is seen as a literal character;
in the latter, the underbar is considered an italic trigger.

it's relatively simple to determine if something is a url;
e.g., an "http" or a "www" or a ".com" is a dead giveaway.
and an internal period is a good indicator of a filename,
especially if it's followed by a known filename extension.

likewise, it's relatively easy to tell if something is a word,
or is not, once you have removed the underbars inside it.
if it's in the dictionary, or if it's repeated (sans underbars)
elsewhere in the document, odds are that the underbars
in this version of the string are intended as italic triggers.

so, in my testing, this decision-rule has been pretty solid.

it's not something that i would recommend for markdown,
because of factors i will discuss later, but it works for me.
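
for readers who want to see the shape of that decision-rule, here is a minimal python sketch. it is an illustration built from the description above, not the actual code of the system; every name in it is hypothetical, the extension list and the tiny dictionary are stand-ins, the "undecided" outcome is my own addition, and tokens are assumed to arrive with surrounding punctuation already stripped.

```python
KNOWN_EXTENSIONS = {"txt", "html", "md", "jpg", "png", "pdf"}  # assumed list
DICTIONARY = {"fantastic", "unbelievable"}  # stand-in for a real word list

def looks_like_url(token):
    t = token.lower()
    return t.startswith(("http", "www")) or ".com" in t

def looks_like_filename(token):
    # an internal period followed by a known filename extension
    stem, dot, ext = token.rpartition(".")
    return bool(stem) and bool(dot) and ext.lower() in KNOWN_EXTENSIONS

def looks_like_word(token, document_words):
    # remove the underbars, then check the dictionary and the document itself
    bare = token.replace("_", "").lower()
    return bare in DICTIONARY or bare in document_words

def classify_underbars(token, document_words=frozenset()):
    """return 'literal', 'italic', or 'undecided' for a token with underbars."""
    if looks_like_url(token) or looks_like_filename(token):
        return "literal"
    if looks_like_word(token, document_words):
        return "italic"
    return "undecided"

print(classify_underbars("filename_withanunderbar.txt"))  # literal
print(classify_underbars("http://example.com/my_page"))   # literal
print(classify_underbars("un_believ_able"))               # italic
```

the url and filename checks run first, on the assumption that a false italic in a filename does more damage than a missed italic in a word.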

and, more to the point i'm trying to make here, it's what
can happen if you really try hard to resolve a discrepancy,
rather than simply just throwing your hands up in the air.
(like you just don't care. hu-hum, hu-hum, baby-cakes.)

i mean, i understand the paralysis that _will_ result when
you're mired in a standoff situation, like this has become,
but i think you markdown developers need to fight that.
instead, you've all let yourselves become complacent about
the edge-cases and inconsistencies that dog the format.

a little elbow-grease might go a long way, is what i say.

but you're going to have to apply it. i had to work a lot
to come to the easy understanding of intraword italics
that i have just imparted to you. you need to work too.

and, for me, the italics situation was actually less sticky
than the asterisk problem, because asterisk-overload is
much, much worse. asterisks -- which i use for *bold*
(and i didn't take the easy way out and require two) --
_also_ represent bullets in unordered lists, _and_ occur
in equations where they are the sign for multiplication.
writing the routines to sort through all that was a pain.

further, curly-quote conversion isn't as easy as it seems.
a single round of thinking (like microsoft did) will create
a converter that makes some very embarrassing mistakes.

even a couple more rounds of thinking might not give
you a routine that correctly gives straight-quotes in the
cases where the marks are referring to feet and inches,
or the minutes-and-seconds part of latitude/longitude.
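
as a sketch of what "a couple more rounds of thinking" might look like, here is a small python routine. it is an assumption-laden illustration, not any shipping converter: `curl_quotes` is a hypothetical name, and the only extra rule beyond "opening after whitespace, closing otherwise" is that a quote right after a digit stays straight, on the guess that it marks feet, inches, minutes, or seconds.

```python
OPENERS = {'"': "\u201c", "'": "\u2018"}
CLOSERS = {'"': "\u201d", "'": "\u2019"}

def curl_quotes(text):
    out = []
    for i, ch in enumerate(text):
        prev = text[i - 1] if i > 0 else " "
        if ch not in OPENERS:
            out.append(ch)                # not a quote mark at all
        elif prev.isdigit():
            out.append(ch)                # 6'2" or 40 26'46": keep it straight
        elif prev.isspace() or prev in "([{":
            out.append(OPENERS[ch])       # start of a quotation
        else:
            out.append(CLOSERS[ch])       # end of a quotation, or an apostrophe
    return "".join(out)

print(curl_quotes('he said "hi"'))
print(curl_quotes("a 6'2\" man"))
print(curl_quotes('"it was 1999"'))
```

the third line shows the digit rule misfiring: the closing quote after "1999" stays straight, so even this extra round of thinking is not enough.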

again, this is the kind of intense thinking you have to do
if you wanna sort through these types of difficulties, but
nobody here that i can see is doing much thinking at all.
and for sure you don't share any thinking you are doing,
or bounce ideas off of each other in a collaborative way.

and that's really sad.

***

so, anyway, this is what i'd recommend for markdown,
as your general solution to the underbar/italic problem.

(and, yes, i am chuckling as i write this, because i know
darn well that nobody even wants "a general solution",
and even though some implementations already do it,
the rest -- including gruber -- will never, ever, follow,
so any such proposal is an exercise in mere folly, but...)

anyway, here it is:

ban intraword italics, outright, with full notice, _but_
make it clear that the workaround is to use raw .html
to obtain the necessary italics for any intraword needs.

(and if you're curious why i don't use this in my system,
the reason is because i do not permit raw .html at all.)
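
a minimal python sketch of that recommendation, under stated assumptions: `ban_intraword` is a hypothetical name, the regex is my own, and "intraword" is simplified to mean an underbar touching an ascii letter or digit on its outer side. raw .html passes through untouched because it contains no underbars to trigger on.

```python
import re

# an opening underbar may not follow a letter/digit, and a closing underbar
# may not precede one; the text inside may not start or end with whitespace
NO_INTRAWORD = re.compile(r"(?<![A-Za-z0-9])_(?=\S)(.+?)(?<=\S)_(?![A-Za-z0-9])")

def ban_intraword(text):
    return NO_INTRAWORD.sub(r"<em>\1</em>", text)

print(ban_intraword("see my_long_file.txt"))    # see my_long_file.txt
print(ban_intraword('she said "_really_"'))     # she said "<em>really</em>"
print(ban_intraword("fan<em>tas</em>tic"))      # fan<em>tas</em>tic
```

unlike a pure whitespace rule, this one still lets italics sit inside quotemarks, while filenames and urls keep their underbars.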

***

and, finally, hey, let's put this all into perspective, ok?

the kind of standoff we have here is relatively minor.
and the problems we see border on the most trivial...

we see the same type of stubbornness at a larger level
as the big corporations continue lobbying for d.r.m.,
and the big tech companies up their lock-in tactics.

and unlike here, in little old markdown land, where
there is no money to be made one way or the other,
the dollars from d.r.m. and lock-in could be _huge_.
so those companies are gonna be firm, intransigent,
and persistent in their stubbornness and their greed.

and, on a bigger level still, look at global warming,
and the way that we are rapidly polluting our planet.

again, the standoff there is so much more dangerous,
as the money is _staggering_, so don't even bother to
wonder if any of the big corporations will ever change.

and once humans go extinct, it will not really matter if,
once upon a time, somewhere along the line, someone
had their italics messed up because of a stray underbar.

so, just so you know, if it was _just_ markdown that this
was relevant to, i probably wouldn't care nearly so much.

but the problem of stubborn standoffs is much bigger,
and applies to arenas far larger than this little molehill,
causing problems worse than the smell of dead skunks,
and _that_ is why i care, and why i choose to speak up...

now i will ask you: why do you sit and suffer in silence?

-bowerbird


