Five ways to solve one of my favourite mathematical problems

Posted on May 10, 2014 by phimuemue

During my studies, we had to take the lecture "Discrete Probability Theory" which quickly became one of my favourite ones. The day before the exam, I found the following riddle in the internet (see here):

n persons, all of different heights are standing in a queue facing the window of ticket distributor in a cinema hall. What is the expected number of persons visible to the person at the ticket window if each permutation of n persons is equally likely?

Over time, I found some solutions to this problem, and I'd like to present them to you.

Preliminaries

We tackle this problem the following way: We realize that it is not actually important which sizes the people have. Thus, we assume that these \(n\) people have sizes \(1, 2, 3, \dots, n\) (or sizes \(0,1,2,\dots,n-1\) - which is easier to implement without index shifting) and base all our calculations on this assumption.

Experimental approach

The following program can be used to do a quick experimental check:

import itertools
import fractions

def visible(p):
    res = 0
    m = -1
    for x in p:
        if x > m:
            m = x
            res = res + 1
    return res

def E(n):
    res = fractions.Fraction(0)
    for p in itertools.permutations(range(n)):
        res = res + visible(p) 
    return res/reduce(lambda x,y: x*y, range(1, n+1))

for i in range(1, 6):
    print E(i)

Its output (the numbers 1, 3/2, 11/6, 25/12, 137/60) suggests that the expected value for \(n\) people in line is exactly the \(n\)-th harmonic number.

We will now see several proofs of this fact.

Linearity of expectation and indicator variables

The most straightforward solution is to use indicator variables: We define \(I_i\) as the indicator variable that is 1 iff the person of size \(i\) is visible and 0 otherwise.

We can then say that the expected number of visible people among \(n\) people in line (- denoted by \(\mathbb{E}[X_n]\) -) is given by \(\mathbb{E}[X_n] = \mathbb{E}[\sum_{i=1}^n I_i]=\sum_{i=1}^n \mathbb{E}[I_i]\) (because of linearity of expectation).

Because \(I_i\) is an indicator variable, we know that \(\mathbb{E}[I_i]=Pr[I_i=1]\). Now, the probability that \(I_i=1\) denotes the probability that the person of size \(i\) is visible.

The probability that the person of size \(i\) is visible depends only on the person of size \(i\) and larger people (i.e. the people of sizes \(i+1,i+2,\dots,n\)). The person of size \(i\) is visible iff all taller people are standing behind person \(i\). This means: We have to compute the probability that - only considering people of sizes \(i, i+1, i+2, \dots, n\), we have a permuatation whose first person is person \(i\).

There are \((n-i+1)!\) permutations of the people \(i,i+1,i+2,\dots,n\) in total, and exactly \((n-i)!\) of them are beginning with person \(i\) (because we can arrange the people \(i+1,i+2,\dots,n\) in \((n-i)!\) ways). Thus, \(Pr[I_i=1]=\frac{(n-i)!}{(n-i+1)!}=\frac{1}{n-i+1}\).

We plug this knowledge into the above equation obtained from linearity of expectation and obtain that \(\mathbb{E}[X_n]=\sum_{i=1}^n \frac{1}{n-i+1} = \sum_{j=1}^n \frac{1}{j} = H_n\), where \(H_n\) denotes the \(n\)-th harmonic number. qed.

A combinatorial approach

This approach uses quite elementary combinatorial facts, but is somewhat lenghty to carry out. We define a function \(v: S^n \mapsto \mathbb{N}\) that maps a permontation (an element of the symmetric group \(S^n\)) onto its number of visible people in line (just like the function visible in our Python code above).

We then can argue that \(\mathbb{E}(X_n)=\frac{\sum_{p\in S^n} v(p)}{n!}\) - we simply iterate over all permutations, sum up the visible people per permutation and divide by the number of permutations in total. So far, this is no problem at all.

We now imagine that we have a table, where each line stands for exactly one permutation, and each column represents a position in the queue. The element at row \(p\) and column \(i\) in this table is 1 iff in permutation \(p\), the person at position \(i\) is visible. As an example, consider the following table for 3 people:

        1 2 3
-------------
1 2 3 | 1 1 1
1 3 2 | 1 1 0
2 1 3 | 1 0 1
2 3 1 | 1 1 0
3 1 2 | 1 0 0
3 2 1 | 1 0 0

We can now argue that the sum of row \(p\) is exactly \(v(p)\). Thus, \(\sum_{p\in S^n} v(p)\) is taking the sum for each row, and sums all these sub-sums up to the final result.

It is clear that we are not limited to computing this sum by summing up sums per row, but that we can also sum up sums per column - the result must be the same.

Thus, we define \(col(i)\) to be the function that takes the column sum of column \(i\). The function \(col(i)\) intuitively counts the number of permutations of \(n\) people whose person at position \(i\) is visible. Using this function, we can say that \(\sum_{p\in S^n} v(p) = \sum_{i=1}^n col(i)\).

It turns out that \(col(i)\) is relatively easy to compute. We do it by examining how we could generate a permutation where the person at position \(i\) is visible:

In total, this yields \(\binom{n}{i}\cdot(i-1)!\cdot(n-i)!=\frac{n!}{i}\).

We plug this into the column-sum and obtain that \(\mathbb{E}[X_n]=\frac{\sum_{i=1}^n col(i)}{n!}=\frac{\sum_{i=1}^n \frac{n!}{i}}{n!}=\sum_{i=1}^n \frac{1}{i}=H_n\). qed.

Conditional expectation

This was actually my first solution to this problem, but it is not the most straightforward one. We use conditional expectation and the law of total probability to compute the desired result.

We define \(T\) to be the position of the tallest person. We then can argue that \(\mathbb{E}[X_n] = \sum_{t=1}^n Pr[T=t] \cdot \mathbb{E}[X_n \mid T=t]\).

It is clear that \(Pr[T=t]\), the probability that the tallest person is exactly at position \(t\), is \(\frac{1}{n}\) because the tallest person must land on one of \(n\) different positions in total, and each of these positions is equally likely.

What is not so clear, is how we can compute \(\mathbb{E}[X_n \mid T=t]\). This denotes the expected value under the condition that the tallest person is placed at position \(t\).

If the tallest person is placed at position \(t\), there are \(t-1\) people in front of the tallest person. All people standing at positions \(t+1,t+2,\dots,n\) are invisible since overlapped by the tallest person.

That is, to compute \(\mathbb{E}[X_n \mid T=t]\), we simply compute the expected number of visible people among the \(t-1\) people in front of the tallest person, and add 1 to obtain the expected number of visible people in total. In formulae, this means that \(\mathbb{E}[X_n \mid T=t]=1+\mathbb{E}[X_{t-1}]\).

It is clear that \(\mathbb{E}[X_0]=0\) and \(\mathbb{E}[X_1]=1\), which will be the base cases of our induction.

The induction step is a bit technical and is done as follows: We start out by computing \(\sum_{t=1}^{n+1} Pr[T=t] \cdot E[X_{n+1}\mid T=t] = \sum_{t=1}^{n+1} \frac{1}{{n+1}} \cdot (1+\mathbb{E}[X_{t-1}])\). We can now use the induction hypothesis and reformulate to \(\sum_{t=1}^{n+1} \frac{1}{{n+1}} (1+\cdot H_{t-1}) =\sum_{t=1}^{n+1} \frac{1}{{n+1}} \cdot (1+\sum_{m=1}^{t-1} \frac{1}{m})\).

We can transform the above sum to \(\frac{1}{n+1}\cdot \sum_{t=1}^{n+1}(1+\sum_{m=1}^{t-1}\frac{1}{m})\), which itself can be simplified to \(1 + \frac{1}{n+1}\cdot\sum_{t=1}^{n+1}\sum_{m=1}^{t-1} \frac{1}{m}\).

We rewrite \(\sum_{t=1}^{n+1}\sum_{m=1}^{t-1} 1/m = \sum_{j=1}^{n+1} (n+1-j)\cdot\frac{1}{j} = \sum_{j=1}^{n+1} (n+1)\cdot\frac{1}{j} - \sum_{j=1}^{n+1} j\cdot \frac{1}{j}\), whose first subsum is just \((n+1) \cdot H_{n+1}\) and whose second subsum is \({n+1}\) resulting in \(\sum_{t=1}^{n+1}\sum_{m=1}^{t-1} 1/m = (n+1)\cdot H_{n+1} - (n+1)\).

This, finally, means that \(\mathbb{E}[X_{n+1}] =1+\frac{1}{n+1}\cdot\sum_{t=1}^{n+1}\sum_{m=1}^{t-1}\frac{1}{m} =1+\frac{1}{n+1}\cdot\left((n+1)H_{n+1} - (n+1)\right) = H_{n+1}\). qed.

Raw probabilities

A variation of the previous approach works by using probabilities directly: We define \(P_n(m) = \begin{cases} 0 & \text{ if } m=0 < n \text{ or } n<m \\ 1 & \text{ if } m=n=0 \\ \frac{1}{n} \sum_{i=1}^n P_{i-1}(m-1) & \text{ otherwise} \end{cases}\).

The value \(P_n(m)\) denotes the probability that there are exactly \(m\) people visible in a queue of \(n\) people in total.

These cases can be explained quite easily as follows:

We use induction again and see that the base cases are directly encoded in the definition of \(P_n(m)\), so we don't write this explicitly here.

For the induction step (- nota bene: \(n\geq m > 0\), thus the third case of the definition of \(P_n(m)\) applies -), we now can use the definition of the expected value: \(\mathbb{E}[X_{n+1}] = \sum_{i=1}^{n+1} m \cdot P_{n+1}(m) = \sum_{i=1}^{n+1} m \cdot \left(\sum_{i=1}^{n+1}\frac{1}{n+1}P_{i-1}(m-1)\right) = \frac{1}{n+1}\cdot\sum_{i=1}^{n+1} \sum_{i=1}^{n+1}{m}\cdot P_{i-1}(m-1) \).

We transform the innermost sum: \(\sum_{i=1}^n \sum_{i=1}^{n+1}{m}\cdot P_{i-1}(m-1) = \sum_{m=1}^{n+1}(m-1)\cdot P_{i-1}(m-1) + \sum_{m=1}^{n-1}P_{i-1}(m-1)\) and consider the two resulting sums seperately:

The first subsum \(\sum_{m=1}^{n+1}(m-1)\cdot P_{i-1}(m-1)\) describes - by definition - exactly the expected value \(\mathbb{E}(X_{i-1})\) of visible people in a line of \(i-1\) people.

The second subsum \(\sum_{m=1}^{n-1}P_{i-1}(m-1)\) just evaluates to 1, since it just sums up the probabilities that there are exactly \(m-1\) visible people in a queue of \(i-1\) people in total, where \(m-1\) ranges from 0 to \(n\). Since \(i-1\leq n\) and all the events are disjunct, it is clear that this probability results in 1.

We use this and infer that \(\mathbb{E}[X_{n+1}] = \frac{1}{n+1}\cdot\sum_{i=1}^{n+1} \sum_{i=1}^{n+1}{m}\cdot P_{i-1}(m-1) = \frac{1}{n+1}\cdot\sum_{i=1}^{n+1} \left( \mathbb{E}[X_{i-1}] + 1 \right) \).

This, on the other hand, is exactly the term we arrived at when we did the computation using conditional expectations (see above). We can continue as before and are done. qed.

A useful bijection

Finally a very tricky possibility to compute the expected value: We define the following function \(tocyc\) that maps a permutation onto another one by constructing one cycle of the permutation after another.

The function traverses the given permutation from left to right and does the following:

Here's an implementation in Python:

def tocyc(p):
    res = []
    m = -1
    for i in p:
        if i > m:
            m = i
            res.append([i])
        else:
            res[-1].append(i)
    return res

for p in itertools.permutations(range(4)):
    print "%s : %s"%(str(p), str(tocyc(p)))

The output of the above program would be

(0, 1, 2, 3) : [[0], [1], [2], [3]]
(0, 1, 3, 2) : [[0], [1], [3, 2]]
(0, 2, 1, 3) : [[0], [2, 1], [3]]
(0, 2, 3, 1) : [[0], [2], [3, 1]]
(0, 3, 1, 2) : [[0], [3, 1, 2]]
(0, 3, 2, 1) : [[0], [3, 2, 1]]
(1, 0, 2, 3) : [[1, 0], [2], [3]]
(1, 0, 3, 2) : [[1, 0], [3, 2]]
(1, 2, 0, 3) : [[1], [2, 0], [3]]
(1, 2, 3, 0) : [[1], [2], [3, 0]]
(1, 3, 0, 2) : [[1], [3, 0, 2]]
(1, 3, 2, 0) : [[1], [3, 2, 0]]
(2, 0, 1, 3) : [[2, 0, 1], [3]]
(2, 0, 3, 1) : [[2, 0], [3, 1]]
(2, 1, 0, 3) : [[2, 1, 0], [3]]
(2, 1, 3, 0) : [[2, 1], [3, 0]]
(2, 3, 0, 1) : [[2], [3, 0, 1]]
(2, 3, 1, 0) : [[2], [3, 1, 0]]
(3, 0, 1, 2) : [[3, 0, 1, 2]]
(3, 0, 2, 1) : [[3, 0, 2, 1]]
(3, 1, 0, 2) : [[3, 1, 0, 2]]
(3, 1, 2, 0) : [[3, 1, 2, 0]]
(3, 2, 0, 1) : [[3, 2, 0, 1]]
(3, 2, 1, 0) : [[3, 2, 1, 0]]

We can confirm two things that the function tocyc fulfills:

From these two facts, it is clear that the expected number of visible people in a random-permutation queue of \(n\) people is exactly the expected number of cycles in a random \(n\)-permutation.

Fortunately, we've already computed that the expected number of cycles is exactly in \(H_n\) (see here). qed.