Minor regexp oversight for setext headings

A. Pagaltzis pagaltzis at gmx.de
Mon Oct 9 13:15:47 EDT 2006


* Angie Ahl <alists at vertebrate.co.uk> [2006-10-09 16:10]:

> And I've read the book about 5 times by now so I'd really

> appreciate you qualifying what you say to the list for the

> benefit of the list as I'm pretty sure you're mistaken.


Ask yourself: if non-greedy is always faster, why is it not the
default?

Greedy matching is the strategy of immediately gobbling up as
much as possible and then unmatching it one repetition at a time
every time an attempt to match the following atom fails. Non-
greedy matching is the strategy of matching the least possible
amount and then adding another repetition every time an attempt
to match the following atom fails.

So if you have the following string:

xxxxxxxxxxxxxxxxxxxxxxxxxy

then the pattern `.*y` will immediately allocate the entire
string to the `.*`, then fail to match the `y` at end-of-string,
give up one character from the `.*`, try to match `y`, and
succeed. Done.

The pattern `.*?y` will immediately allocate nothing to `.*?`,
then fail to match `y`, go back to allocate one character to the
`.*?`, then again fail to match `y`, go back to allocate one more
character to the `.*?`, still fail to match `y`, and so on, until
after 25 attempts it finally has allocated all the x-es to the
`.*?` and the `y` matches the last character so the match
succeeds.

Obviously, which one will succeed faster depends on the data. If
you use the following string:

yxxxxxxxxxxxxxxxxxxxxxxxxx

then the non-greedy pattern will succeed in two steps while the
greedy one will backtrack like a madman trying to find the match.

Actually, that’s not even how it really works. Dot-star matches
are usually heavily optimised in modern engines. You really can’t
say which one will be faster for what data without looking at how
the matching engine you use compiles and executes the pattern.

Non-greedy matching is not some kind of magic pixie dust to make
patterns go faster. Don’t tell people it is. And if you really do
need to make your pattern faster, then analyse how it is compiled
and profile how it is executed. Standard programming practice.

Regards,
--
Aristotle Pagaltzis // <http://plasmasturm.org/>


More information about the Markdown-Discuss mailing list