[om-list] Inverse Cumulative Probability Distribution Functions

Fri Jan 4 13:40:04 EST 2002

(OM people: This email assumes introductory mathematical statistics.  If you
don't have that background, read it anyway if you're interested in helping
4C with its Informatica; 4C-Informatica will probably be heavily based on
probability distributions, and when learning new things it's good to get
the exposure as soon as possible.)

Mark

    Remember our phone conversation about generating inverse distribution
functions from estimated (sampled) p.d.f.s?  I'm concerned about cases of
discrete p.d.f.s.

    You remember how the "nodes" in the domain of the inverse distribution
function would not necessarily correspond to the nodes in the domain of the
original p.d.f.?  If only a few discrete points in the p.d.f. domain have
positive probability, how will we recover those exact points through the
inverse distribution function (i.d.f.) if this distribution function is
generated by integrating between points other than the "support" points?

    That is, think of the process of generating a semi-random Y: we generate
a random number in the domain of the i.d.f., between 0 and 1, and then look
for the node with that value, or interpolate to find an approximation.  In
the discrete case, there will rarely be a node with that exact value, so
we'd look for the point where the distribution function jumps from below the
input value to above it and then interpolate.  This could be drastically
wrong: we could end up generating a lot of Y's that in reality have zero
probability.
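
    To make the concern concrete, here is a minimal sketch in Python.  The
support points, probabilities, and function names are all made up for
illustration; this is not an existing implementation.  It compares
interpolating between nodes with stepping to the nearest support point.

```python
import bisect
import random

# Hypothetical discrete distribution: only these points have positive probability.
support = [1.0, 2.0, 5.0]
probs = [0.2, 0.5, 0.3]

# Cumulative probability at each support point.
cdf = []
total = 0.0
for p in probs:
    total += p
    cdf.append(total)
cdf[-1] = 1.0  # guard against floating-point round-off in the running sum

def idf_step(u):
    """Return the first support point whose cumulative probability is >= u."""
    return support[bisect.bisect_left(cdf, u)]

def idf_interpolated(u):
    """Linearly interpolate between the nodes bracketing u (the risky approach)."""
    i = bisect.bisect_left(cdf, u)
    if i == 0:
        return support[0]
    t = (u - cdf[i - 1]) / (cdf[i] - cdf[i - 1])
    return support[i - 1] + t * (support[i] - support[i - 1])

u = 0.45
y_step = idf_step(u)            # always one of the support points
y_interp = idf_interpolated(u)  # usually falls between support points
```

    With these assumed numbers, the step version only ever returns 1.0, 2.0,
or 5.0, while the interpolated version returns in-between values that the
discrete distribution never produces.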

    There's another reason we should make the i.d.f. correspond directly to
the sampled p.d.f.: it would be more accurate, even in the continuous
case -- or at least I think it would be, since we'd have less approximation
error: we'd be interpolating once instead of twice.

    How easy might it be to generate an inverse distribution function whose
nodes correspond directly to the sample points of the p.d.f.?
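
    One possibility, as a rough sketch rather than a worked-out answer:
integrate the sampled p.d.f. with the trapezoid rule at its own nodes, then
use the pairs (F[i], x[i]) as the nodes of the i.d.f.  The sample data, the
trapezoid rule, and the linear interpolation are all assumptions chosen for
illustration.

```python
import bisect

# Hypothetical sampled p.d.f.: density values f[i] estimated at nodes x[i].
x = [0.0, 1.0, 2.0, 3.0]
f = [0.0, 1.0, 1.0, 0.0]

# Trapezoid-rule cumulative integral, evaluated at the same nodes x[i].
F = [0.0]
for i in range(1, len(x)):
    F.append(F[-1] + 0.5 * (f[i - 1] + f[i]) * (x[i] - x[i - 1]))
area = F[-1]
F = [v / area for v in F]  # normalize so the last value is 1.0

def idf(u):
    """Map u in [0, 1] to x by interpolating once between nodes (F[i], x[i])."""
    i = bisect.bisect_left(F, u)
    if i == 0:
        return x[0]
    t = (u - F[i - 1]) / (F[i] - F[i - 1])
    return x[i - 1] + t * (x[i] - x[i - 1])

# Each cumulative value F[i] maps back to its own sample point x[i].
recovered = [idf(v) for v in F]
```

    Because the i.d.f. nodes are built from the p.d.f. nodes themselves,
every sample point is recovered exactly, and a lookup involves a single
interpolation.  The linear interpolation inside each interval is still only
an approximation.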

tomp