How to algorithmically simulate probabilistic distributions using other distributions

Posted on 2014 03 21 by phimuemue

In this article, we explore (in short) how we can generate random numbers of any distribution if we have a corrupt toin coss generator (i.e. a random toin that produces 1 with (unknown) probability \(p\in(0;1)\) and 0 with (unknown) probability \(q=1-p\)). So first of all, we convince ourselves that we can construct a fair coin tosser (i.e. 1 and 0 occur both with probability 0.5) from the above mentioned corrupt coin tosser.

Therefore, we do the following: We execute the corrupt coin tosser two times. Then there are four possible outcomes:

We repeat until we have either 01 or 10 as output. Then, we can compute: The probability of reaching the result "01" is the following: Either we reach "01" immediately or we get first some "00" and "11". Thus, we can compute the probability of reaching "01" as follows: \(\sum_{i=0}^\infty (p^2+q^2)^i \cdot (pq)\). Using the fact that \(q+p=1\) and geometric series, we compute that this probability is 0.5. Moreover, we can do the same for the outcome "10". Thus, we get a fair random 0/1-number-generator.

We saw that we can w.l.o.g. assume that we have a fair 0/1-random-number-generator \(R\). Armed with this knowledge, we can now say that we can generate any discrete random distribution, where the number of results is finite. Therefore, we can just employ a binary encoding of successive calls to \(R\). To be more precise, if we want to have \(n\)possible outcomes (namely from 0 to \(n-1\)), we do \(\lceil \log_2 n \rceil\) calls to \(R\). If the binary encoding of the successive \(R\)-calls yields a number less than \(n\), we are done and return that number as the result of our random process. Otherwise, we repeat until we get such a number as the result.

So, up to now we saw that we can w.l.o.g. assume that we have random number generators that generate a random number between 0 and \(n\).

For now, we assume that we want to construct a random variable \(X\) with a specific distribution. To generate any such variable, we need a continuous random variable. That is, we have to convert somehow our discrete variable to a continuous one.

There are several methods to do so, however we focus on the following: We assume \(R\) is a fair 0/1-random-number-generator. Then, we execute \(n\) times the generator \(R\) and interpret the result as a \(n\)-digit binary number. After that, we divide the result by \(2^n\). Note that this yields still a discrete random variable, but for large \(n\) it "behaves" like a uniformly distributed variable over the intervall \([0;1]\). To be honest, I do not know if there is a (theoretical) way of transforming any discrete random variable into a uniformly distributed continuous random variable.

So, we now assume that we have (almost) a random variable \(R'\) that is uniformly distributed over \([0;1]\). If we now want a (continuous) random variable \(X\) with the distribution \(F_X(t)=P(X\leq t)\), we can do as follows (we assume \(F_X\) to be strictly increasing, and thus we also assume that there is an \(F_X\) inverse \(F_X^{-1}\) with \(F_X(F_X^{-1}(t))\)): We define this random variable as follows: \(X=F_X^{-1}(R')\).

Now we compute:

\(P(X\leq t)=P(F_X^{-1}(R')\leq t)= P(R' \leq F_X(t))\approx F_R'(F_X(t))= F_X(t)\)

This tells us that \(X\) is a random variable with the distribution \(F_X\), which is what we wanted.

Note that out of a continuous variable it is no problem to derive a discrete random variable that e.g. takes only integral values.