fast sailing on the way to a clearer model

bowerbird
Sat Sep 6 04:12:50 EDT 2014

john macfarlane said:
>   CommonMark, which provides two implementations
>   that use the same parsing algorithm,
>   one in portable C and one in 1540 lines of javascript


>   The javascript implementation doesn't use any unusual
>   javascript features and should be straightforward to port
>   to other dynamic languages: perl, python, ruby, PHP.

double bingo.

>   (Or you could just use the javascript library client-side
>   and skip the server-side rendering.)


>   The parsers are both fast and accurate.

accurate is assumed.  so, for fast, let's have the numbers.

(i'm ostensibly writing this reply to john, but really to michel,
and to everyone else on this list, because i know macfarlane
already recognizes this, and i know some of you others don't.)

>   without changing the algorithm, has managed to make
>   it about as fast as sundown, which is very fast indeed
>   (0.01 seconds to parse a 1MB document, for example).
>   When optimization is complete, it should be even faster.

first of all, let me just say that, from a user's perspective,
any conversion that happens in one second is "fast enough".
so if you're talking 100 times faster than that, you're good.

now let's put this into perspective.

a 1-megabyte "document" is a book, and not a small one.

a small book -- "alice in wonderland" -- runs at about 150k.

a medium-sized book is anywhere between 400k and 600k.

and a _big_ book weighs in at around the 1-megabyte range.
("moby dick" is 1.2-megs.  "war and peace" is 3.2-megs huge;
but then again, it was also split up into 15 different "books".)

so let's say that, as a general rule-of-thumb, you can assume
that 99.9% of your needs are gonna fall under the 1-meg mark.

so if you can keep it under a second, or even two, you're good.

>   The javascript parser is also very fast  (0.28 seconds for the
>   above-mentioned 1MB document, running in the Chrome browser).

so, you're good.

> takes 250 seconds on the same input

really really really bad.  but you knew that before you started.

>   and pandoc takes 3.19 seconds.

not good.  not really bad, on a 1-meg file.  but room to improve.

ok, actually, considering that pandoc runs in a "batch" modality,
3.19 seconds is fantastic.  but nobody really wants to do "batch"
when they could instead be doing on-the-fly immediate reactive.
so you're gonna have to have an editor, and a web-app, wherein
a 3.2-second turnover will seem "pokey" to today's spoiled users.
so you're gonna need a definite improvement.
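
to make that arithmetic concrete, here's a minimal sketch in python that holds the times quoted in this thread up against the one-second rule-of-thumb. the labels, the function names, and the one-second budget as a hard cutoff are my own assumptions for illustration; the 250-second quote is truncated in this message, so its tool stays unnamed.

```python
# timings quoted in this thread, in seconds, all for the same 1MB document.
# labels are illustrative; the 250-second quote is truncated in this
# message, so the tool it refers to is deliberately left unnamed.
QUOTED_SECONDS = {
    "c parser (sundown range)": 0.01,
    "javascript in chrome": 0.28,
    "pandoc": 3.19,
    "unnamed (quote truncated)": 250.0,
}

# assumption: treat the "one second is fast enough" rule as a hard budget.
FAST_ENOUGH_SECONDS = 1.0


def headroom(seconds, budget=FAST_ENOUGH_SECONDS):
    """how many times faster (above 1) or slower (below 1) than the budget."""
    return budget / seconds


def verdict(seconds, budget=FAST_ENOUGH_SECONDS):
    """label a timing against the budget."""
    return "fast enough" if seconds <= budget else "too slow"


if __name__ == "__main__":
    for name, secs in QUOTED_SECONDS.items():
        print(f"{name:28s} {secs:7.2f}s  {headroom(secs):8.3f}x  {verdict(secs)}")
```

the headroom column is the useful one: anything well above 1 means you can stop worrying about speed on book-sized input, and anything below 1 means an interactive editor will feel slow.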


and you _have_ definite improvement, thanks to a better model,
to the point that performance metrics are no longer your concern.

you're way past "fast enough", even on huge files, so you can now
shift your focus to all of the other things you should be prioritizing.

and now i'll say it a third time, so everyone besides macfarlane
understands exactly what i'm saying:  throw away your flavors,
and get on board this new, better model, with a good algorithm.
maybe the way you did it before was good enough for back then,
but the future _will_ leave you behind if you do not get on board.

_however_, you might want to wait, for maybe a month or two.

because macfarlane's _model_ isn't yet as clear as it should be...

and now i will speak to macfarlane.  (but y'all should listen too.)

this is the point where i basically throw all this back in your lap.

because i now respect you as a worthy competitor to my system.
and i don't need to be giving free advice to a worthy competitor.

(and not to put too much stress on the "competitor" aspect, since
i think it's a win-win situation, and nobody's extracting any cash.
besides, you might not even see me out here on the playing field,
let alone think that i could be capable of giving _you_ any advice.)

but i also imagine you are getting "lots" of advice from elsewhere.
and i don't feel any big need to be a part of that bellowing chorus.

but there's a definite weakness in your approach, and i suspect that
you know it yourself, so here goes: it is too complex for the users.

yes, you've improved gruber's system by eliminating ambiguities,
but you did it by creating additional rules on how to handle them,
via what, 159 examples?, which is a very heavy burden for users.
you need to lighten the load for them, or you don't stand a chance.

specifically, stop retaining the backward compatibility with gruber.

instead, ditch the aspects that are _causing_ all those edge cases,
so you can boil things down into a cleaner mental model which is
conducive to giving your users a more transparent understanding.

i know, i know, you've spent years and years trying to fit yourself
into his mental model, so it's very difficult now to let yourself out
of that fenced area to run free, suffering as much as you have from
stockholm syndrome, but gruber had a very inferior understanding.
if you toss out _everything_ from him, and start over from scratch,
first you will experience a remarkable feeling of freedom, and then
you will find yourself making great progress toward crystal clarity.
because you now have an infinitely better understanding than his.
after that, you can drag back the pieces of his model that still "fit".

all by itself, that will ensure your success.  but here's another tip:
if different flavor-developers made incompatible interpretations,
it's because they weren't operating with the same mental model,
so those incompatibilities are sign-posts to spots that need work.
and the work is _not_ in the simple resolution of a decision-fork,
but rather an understanding of how those developers ended up
in two separate places with different perspectives on that issue.
what led one to be on the east side, and the other on the west?
and how do you channel _users_ so they're all on the same side?

i've probably said too much, for my own good, and in your eyes,
but there it is.  best of luck down the line in the work you're doing.


p.s.  andrei said:
>   Times without hardware specs mean nothing,
>   please also provide some hardware specs

times _with_ hardware specs mean next-to-nothing.
give me the code, so i can run it on my own machines,
on my own documents, so i can feel the speed myself.
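
for anyone who wants to do the same, here's a minimal sketch of a timing harness in python. it is not anybody's official benchmark: `time_conversion` and `placeholder_convert` are hypothetical names, and the placeholder is _not_ a markdown parser, just a stand-in to be swapped for whatever converter you actually want to measure.

```python
import time


def time_conversion(convert, text, repeats=5):
    """return the best wall-clock time, in seconds, of convert(text).

    taking the best of several runs reduces noise from whatever else
    the machine happens to be doing.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        convert(text)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best


def placeholder_convert(text):
    # hypothetical stand-in: NOT a markdown parser, just something to time.
    # it wraps each blank-line-separated block in <p> tags.
    return "\n".join(f"<p>{block}</p>" for block in text.split("\n\n"))


if __name__ == "__main__":
    # synthetic input of roughly 1MB; replace it with your own documents.
    sample = "a paragraph of text.\n\n" * 50_000
    seconds = time_conversion(placeholder_convert, sample)
    print(f"{len(sample) / 1_000_000:.2f}MB in {seconds:.4f}s")
```

run it on your own machine with your own files, and the hardware-spec question answers itself.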
