Five ways to solve one of my favourite mathematical problems
During my studies, we had to take the lecture "Discrete Probability Theory", which quickly became one of my favourites. The day before the exam, I found the following riddle on the internet (see here):
n persons, all of different heights, are standing in a queue facing the window of the ticket distributor in a cinema hall. What is the expected number of persons visible to the person at the ticket window if each permutation of the n persons is equally likely?
Over time, I found some solutions to this problem, and I'd like to present them to you.
Preliminaries
We tackle this problem the following way: We realize that the actual heights of the people do not matter. Thus, we assume that these \(n\) people have heights \(1, 2, 3, \dots, n\) (or heights \(0,1,2,\dots,n-1\), which is easier to implement without index shifting) and base all our calculations on this assumption.
Experimental approach
The following program can be used to do a quick experimental check:
import itertools
import fractions
import math

def visible(p):
    # A person is visible iff they are taller than everyone in front
    # of them, i.e. iff they are a new prefix maximum.
    res = 0
    m = -1
    for x in p:
        if x > m:
            m = x
            res = res + 1
    return res

def E(n):
    # Average the number of visible people over all n! permutations.
    res = fractions.Fraction(0)
    for p in itertools.permutations(range(n)):
        res = res + visible(p)
    return res / math.factorial(n)

for i in range(1, 6):
    print(E(i))
Its output (the numbers 1, 3/2, 11/6, 25/12, 137/60) suggests that the expected value for \(n\) people in line is exactly the \(n\)-th harmonic number.
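As a quick cross-check, we can compare these values against the harmonic numbers directly; a minimal sketch reusing E and fractions from above (H is a helper name introduced here):

def H(n):
    # The n-th harmonic number as an exact fraction.
    return sum(fractions.Fraction(1, j) for j in range(1, n + 1))

for i in range(1, 6):
    assert E(i) == H(i)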
We will now see several proofs of this fact.
Linearity of expectation and indicator variables
The most straightforward solution is to use indicator variables: We define \(I_i\) as the indicator variable that is 1 if the person of height \(i\) is visible and 0 otherwise.
We can then say that the expected number of visible people among \(n\) people in line, denoted by \(\mathbb{E}[X_n]\), is given by \(\mathbb{E}[X_n] = \mathbb{E}[\sum_{i=1}^n I_i]=\sum_{i=1}^n \mathbb{E}[I_i]\) (by linearity of expectation).
Because \(I_i\) is an indicator variable, we know that \(\mathbb{E}[I_i]=Pr[I_i=1]\), and \(Pr[I_i=1]\) is simply the probability that the person of height \(i\) is visible.
Whether the person of height \(i\) is visible depends only on that person and the taller people (i.e. the people of heights \(i+1,i+2,\dots,n\)): the person of height \(i\) is visible iff all taller people are standing behind them. This means: Considering only the people of heights \(i, i+1, i+2, \dots, n\), we have to compute the probability that the first of them in line is person \(i\).
There are \((n-i+1)!\) arrangements of the people \(i,i+1,i+2,\dots,n\) in total, and exactly \((n-i)!\) of them begin with person \(i\) (because we can arrange the people \(i+1,i+2,\dots,n\) in \((n-i)!\) ways). Thus, \(Pr[I_i=1]=\frac{(n-i)!}{(n-i+1)!}=\frac{1}{n-i+1}\).
We plug this knowledge into the above equation obtained from linearity of expectation and obtain that \(\mathbb{E}[X_n]=\sum_{i=1}^n \frac{1}{n-i+1} = \sum_{j=1}^n \frac{1}{j} = H_n\), where \(H_n\) denotes the \(n\)-th harmonic number. qed.
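This probability can also be sketched empirically, assuming n = 5 and people labelled 1 to n: person \(i\) is visible iff \(i\) is the maximum of the queue prefix ending at their position.

import itertools
from fractions import Fraction
from math import factorial

n = 5
for i in range(1, n + 1):
    # Person i is visible iff nobody in front of them is taller,
    # i.e. iff i is the maximum of the prefix ending at i's position.
    count = sum(1 for p in itertools.permutations(range(1, n + 1))
                if max(p[:p.index(i) + 1]) == i)
    print(i, Fraction(count, factorial(n)))  # equals 1/(n - i + 1)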
A combinatorial approach
This approach uses quite elementary combinatorial facts, but is somewhat lengthy to carry out. We define a function \(v: S^n \to \mathbb{N}\) that maps a permutation (an element of the symmetric group \(S^n\)) onto its number of visible people in line (just like the function visible in our Python code above).
We can then argue that \(\mathbb{E}[X_n]=\frac{\sum_{p\in S^n} v(p)}{n!}\): we simply iterate over all permutations, sum up the visible people per permutation, and divide by the total number of permutations. So far, this is no problem at all.
We now imagine a table where each row stands for exactly one permutation and each column represents a position in the queue. The entry at row \(p\) and column \(i\) is 1 if, in permutation \(p\), the person at position \(i\) is visible, and 0 otherwise. As an example, consider the following table for 3 people:
1 2 3
-------------
1 2 3 | 1 1 1
1 3 2 | 1 1 0
2 1 3 | 1 0 1
2 3 1 | 1 1 0
3 1 2 | 1 0 0
3 2 1 | 1 0 0
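The table can be reproduced with a short script along the lines of the visible function above (a sketch for 3 people):

import itertools

for p in itertools.permutations(range(1, 4)):
    # An entry is 1 iff the person at that position is a prefix maximum.
    flags = [1 if p[i] == max(p[:i + 1]) else 0 for i in range(3)]
    print("%s | %s" % (" ".join(map(str, p)), " ".join(map(str, flags))))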
We can now argue that the sum of row \(p\) is exactly \(v(p)\). Thus, \(\sum_{p\in S^n} v(p)\) computes the sum of each row and adds these row sums up to the final result.
Of course, we are not limited to computing this sum row by row; we can just as well sum column by column, and the result must be the same.
Thus, we define \(col(i)\) to be the sum of column \(i\). Intuitively, \(col(i)\) counts the number of permutations of \(n\) people in which the person at position \(i\) is visible. Using this function, we can say that \(\sum_{p\in S^n} v(p) = \sum_{i=1}^n col(i)\).
It turns out that \(col(i)\) is relatively easy to compute. We do it by examining how we could generate a permutation where the person at position \(i\) is visible:
- We choose \(i\) of our \(n\) people in total (\(\binom{n}{i}\) possibilities to do so). These people are distributed onto the positions \(1,2,\dots,i\). We call this set \(I\).
- In order for the person at position \(i\) to be visible, they must be the tallest of the \(i\) people chosen in the previous step. Thus, the tallest person within \(I\) must be placed at position \(i\), while all the other people from \(I\) can be placed arbitrarily on positions \(1,2,\dots,i-1\), which yields \((i-1)!\) possibilities.
- The remaining \(n-i\) people can be placed randomly at the positions \(i+1,i+2,\dots,n\), for which there are \((n-i)!\) possible arrangements.
In total, this yields \(col(i)=\binom{n}{i}\cdot(i-1)!\cdot(n-i)!=\frac{n!}{i}\).
We plug this into the column-sum and obtain that \(\mathbb{E}[X_n]=\frac{\sum_{i=1}^n col(i)}{n!}=\frac{\sum_{i=1}^n \frac{n!}{i}}{n!}=\sum_{i=1}^n \frac{1}{i}=H_n\). qed.
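The closed form \(col(i)=\frac{n!}{i}\) can be cross-checked by brute force; a minimal sketch, assuming n = 4:

import itertools
from math import factorial

n = 4
for i in range(1, n + 1):
    # col(i): permutations in which the person at (1-based) position i
    # is visible, i.e. is the maximum of the first i positions.
    col = sum(1 for p in itertools.permutations(range(1, n + 1))
              if max(p[:i]) == p[i - 1])
    print(i, col, factorial(n) // i)  # the two counts agree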
Conditional expectation
This was actually my first solution to this problem, but it is not the most straightforward one. We use conditional expectation and the law of total probability to compute the desired result.
We define \(T\) to be the position of the tallest person. We then can argue that \(\mathbb{E}[X_n] = \sum_{t=1}^n Pr[T=t] \cdot \mathbb{E}[X_n \mid T=t]\).
It is clear that \(Pr[T=t]\), the probability that the tallest person is exactly at position \(t\), is \(\frac{1}{n}\) because the tallest person must land on one of \(n\) different positions in total, and each of these positions is equally likely.
What is not so clear is how we can compute \(\mathbb{E}[X_n \mid T=t]\), the expected number of visible people under the condition that the tallest person is placed at position \(t\).
If the tallest person is placed at position \(t\), there are \(t-1\) people in front of them. All people standing at positions \(t+1,t+2,\dots,n\) are hidden behind the tallest person and therefore invisible.
That is, to compute \(\mathbb{E}[X_n \mid T=t]\), we simply compute the expected number of visible people among the \(t-1\) people in front of the tallest person, and add 1 to obtain the expected number of visible people in total. In formulae, this means that \(\mathbb{E}[X_n \mid T=t]=1+\mathbb{E}[X_{t-1}]\).
It is clear that \(\mathbb{E}[X_0]=0\) and \(\mathbb{E}[X_1]=1\), which will be the base cases of our induction.
The induction step is a bit technical and is done as follows: We start out by computing \(\sum_{t=1}^{n+1} Pr[T=t] \cdot \mathbb{E}[X_{n+1}\mid T=t] = \sum_{t=1}^{n+1} \frac{1}{n+1} \cdot (1+\mathbb{E}[X_{t-1}])\). We can now use the induction hypothesis and reformulate this to \(\sum_{t=1}^{n+1} \frac{1}{n+1} \cdot (1+H_{t-1}) =\sum_{t=1}^{n+1} \frac{1}{n+1} \cdot (1+\sum_{m=1}^{t-1} \frac{1}{m})\).
We can transform the above sum to \(\frac{1}{n+1}\cdot \sum_{t=1}^{n+1}(1+\sum_{m=1}^{t-1}\frac{1}{m})\), which itself can be simplified to \(1 + \frac{1}{n+1}\cdot\sum_{t=1}^{n+1}\sum_{m=1}^{t-1} \frac{1}{m}\).
We rewrite \(\sum_{t=1}^{n+1}\sum_{m=1}^{t-1} 1/m = \sum_{j=1}^{n+1} (n+1-j)\cdot\frac{1}{j} = \sum_{j=1}^{n+1} (n+1)\cdot\frac{1}{j} - \sum_{j=1}^{n+1} j\cdot \frac{1}{j}\), whose first subsum is just \((n+1) \cdot H_{n+1}\) and whose second subsum is \({n+1}\) resulting in \(\sum_{t=1}^{n+1}\sum_{m=1}^{t-1} 1/m = (n+1)\cdot H_{n+1} - (n+1)\).
This, finally, means that \(\mathbb{E}[X_{n+1}] =1+\frac{1}{n+1}\cdot\sum_{t=1}^{n+1}\sum_{m=1}^{t-1}\frac{1}{m} =1+\frac{1}{n+1}\cdot\left((n+1)H_{n+1} - (n+1)\right) = H_{n+1}\). qed.
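The recurrence \(\mathbb{E}[X_n]=1+\frac{1}{n}\sum_{t=1}^{n}\mathbb{E}[X_{t-1}]\) behind this proof can also be evaluated directly with exact fractions; a minimal sketch (E_rec is a helper name introduced here):

from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def E_rec(n):
    # Base case E[X_0] = 0; otherwise apply the recurrence from above.
    if n == 0:
        return Fraction(0)
    return 1 + sum(E_rec(t - 1) for t in range(1, n + 1)) / n

for n in range(1, 7):
    print(n, E_rec(n))  # 1, 3/2, 11/6, 25/12, 137/60, 49/20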
Raw probabilities
A variation of the previous approach works by using probabilities directly: We define \(P_n(m) = \begin{cases} 0 & \text{ if } m=0 < n \text{ or } n<m \\ 1 & \text{ if } m=n=0 \\ \frac{1}{n} \sum_{i=1}^n P_{i-1}(m-1) & \text{ otherwise} \end{cases}\).
The value \(P_n(m)\) denotes the probability that there are exactly \(m\) people visible in a queue of \(n\) people in total.
These cases can be explained quite easily as follows:
- The first two cases are base cases where it is either impossible or certain that exactly \(m\) people are visible.
- The third case is a conditional probability and conditions on where the tallest person is standing. If the tallest person stands at position \(i\), there must be exactly \(m-1\) people visible in front of them; the probability for this is \(P_{i-1}(m-1)\). Since each of the \(n\) positions for the tallest person is equally likely, we average these probabilities, which explains the factor \(\frac{1}{n}\). A Python sketch of this definition follows below.
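The definition translates directly into Python; a minimal sketch with exact fractions (P is a helper name introduced here):

from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def P(n, m):
    # Probability that exactly m people are visible among n people.
    if (m == 0 and n > 0) or n < m:
        return Fraction(0)
    if n == m == 0:
        return Fraction(1)
    return sum(P(i - 1, m - 1) for i in range(1, n + 1)) / n

n = 5
print(sum(m * P(n, m) for m in range(n + 1)))  # 137/60 = H_5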
We use induction again; the base cases are directly encoded in the definition of \(P_n(m)\), so we don't spell them out explicitly here.
For the induction step (nota bene: \(n+1\geq m > 0\), so the third case of the definition of \(P_{n+1}(m)\) applies), we can now use the definition of the expected value: \(\mathbb{E}[X_{n+1}] = \sum_{m=1}^{n+1} m \cdot P_{n+1}(m) = \sum_{m=1}^{n+1} m \cdot \frac{1}{n+1}\sum_{i=1}^{n+1}P_{i-1}(m-1) = \frac{1}{n+1}\sum_{i=1}^{n+1} \sum_{m=1}^{n+1} m\cdot P_{i-1}(m-1)\).
We transform the innermost sum: \(\sum_{m=1}^{n+1} m\cdot P_{i-1}(m-1) = \sum_{m=1}^{n+1}(m-1)\cdot P_{i-1}(m-1) + \sum_{m=1}^{n+1}P_{i-1}(m-1)\), and consider the two resulting sums separately:
The first subsum \(\sum_{m=1}^{n+1}(m-1)\cdot P_{i-1}(m-1)\) is, by definition, exactly the expected value \(\mathbb{E}[X_{i-1}]\) of the number of visible people in a line of \(i-1\) people.
The second subsum \(\sum_{m=1}^{n+1}P_{i-1}(m-1)\) evaluates to 1: it sums up the probabilities that there are exactly \(m-1\) visible people in a queue of \(i-1\) people in total, where \(m-1\) ranges from 0 to \(n\). Since \(i-1\leq n\) and all these events are disjoint, the probabilities sum to 1.
We use this and infer that \(\mathbb{E}[X_{n+1}] = \frac{1}{n+1}\sum_{i=1}^{n+1} \sum_{m=1}^{n+1} m\cdot P_{i-1}(m-1) = \frac{1}{n+1}\sum_{i=1}^{n+1} \left( \mathbb{E}[X_{i-1}] + 1 \right)\).
This is exactly the term we arrived at when we did the computation using conditional expectation (see above). We can continue as before and are done. qed.
A useful bijection
Finally, a very tricky way to compute the expected value: We define the following function \(tocyc\) that maps a permutation onto another one by constructing one cycle of the result after another.
The function traverses the given permutation from left to right and does the following:
- If we see a person that is taller than all people seen so far, we start a new cycle and append the current person as the first item to the current cycle.
- Otherwise, we append the current person at the end of the current cycle.
Here's an implementation in Python:
import itertools

def tocyc(p):
    res = []
    m = -1
    for i in p:
        if i > m:
            # A new prefix maximum starts a new cycle.
            m = i
            res.append([i])
        else:
            # Everyone else joins the current cycle.
            res[-1].append(i)
    return res

for p in itertools.permutations(range(4)):
    print("%s : %s" % (str(p), str(tocyc(p))))
The output of the above program is:
(0, 1, 2, 3) : [[0], [1], [2], [3]]
(0, 1, 3, 2) : [[0], [1], [3, 2]]
(0, 2, 1, 3) : [[0], [2, 1], [3]]
(0, 2, 3, 1) : [[0], [2], [3, 1]]
(0, 3, 1, 2) : [[0], [3, 1, 2]]
(0, 3, 2, 1) : [[0], [3, 2, 1]]
(1, 0, 2, 3) : [[1, 0], [2], [3]]
(1, 0, 3, 2) : [[1, 0], [3, 2]]
(1, 2, 0, 3) : [[1], [2, 0], [3]]
(1, 2, 3, 0) : [[1], [2], [3, 0]]
(1, 3, 0, 2) : [[1], [3, 0, 2]]
(1, 3, 2, 0) : [[1], [3, 2, 0]]
(2, 0, 1, 3) : [[2, 0, 1], [3]]
(2, 0, 3, 1) : [[2, 0], [3, 1]]
(2, 1, 0, 3) : [[2, 1, 0], [3]]
(2, 1, 3, 0) : [[2, 1], [3, 0]]
(2, 3, 0, 1) : [[2], [3, 0, 1]]
(2, 3, 1, 0) : [[2], [3, 1, 0]]
(3, 0, 1, 2) : [[3, 0, 1, 2]]
(3, 0, 2, 1) : [[3, 0, 2, 1]]
(3, 1, 0, 2) : [[3, 1, 0, 2]]
(3, 1, 2, 0) : [[3, 1, 2, 0]]
(3, 2, 0, 1) : [[3, 2, 0, 1]]
(3, 2, 1, 0) : [[3, 2, 1, 0]]
We can verify two properties that the function tocyc fulfills:
- It is a bijection from \(S^n\) to \(S^n\).
- For a permutation \(p\), the number of cycles in \(tocyc(p)\) equals the number of visible people in the queue described by \(p\).
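The first property rests on the fact that the cycle lists are in canonical form: each cycle is led by its maximum, and these leading elements increase from left to right, so the original queue can be recovered by simply flattening the cycle lists. A minimal sketch of this round trip (fromcyc is a helper name introduced here):

def fromcyc(c):
    # Concatenate the cycles back into one queue.
    return tuple(x for cycle in c for x in cycle)

for p in itertools.permutations(range(4)):
    assert fromcyc(tocyc(p)) == p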
From these two facts, it is clear that the expected number of visible people in a queue of \(n\) people in uniformly random order is exactly the expected number of cycles in a uniformly random permutation of \(n\) elements.
Fortunately, we have already computed that this expected number of cycles is exactly \(H_n\) (see here). qed.
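As a final sanity check, we can count cycles directly, independently of tocyc, and average over all permutations; a minimal sketch assuming n = 5 (cycle_count is a helper name introduced here):

import itertools
from fractions import Fraction
from math import factorial

def cycle_count(p):
    # Count the cycles of p, given in one-line notation on 0..n-1.
    seen, count = set(), 0
    for start in range(len(p)):
        if start not in seen:
            count += 1
            j = start
            while j not in seen:
                seen.add(j)
                j = p[j]
    return count

n = 5
total = sum(cycle_count(p) for p in itertools.permutations(range(n)))
print(Fraction(total, factorial(n)))  # 137/60 = H_5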