# A hiring problem

Imagine the following scenario: You want to hire a new employee, and of course you want to take the best candidate. Therefore, you conduct a series of interviews with the candidates one after another. We now impose the following restriction: You have to tell the candidate whether he or she is good enough or not (i.e. if you want to hire him or her) *directly after the interview*. That is, once you told a candidate you hire her or him, the interview process is over and you do not interview any more candidates. And once you told a candidate you won't hire her or him, this candidate is away and can not be hired anymore. We now consider a strategy to solve this problem: We hire a candidate if he or she is better than the expected value of the best of all following candidates. If there are no more candidates, we have to hire the last candidate.

## Some considerations regarding our strategy

We formalize our assumptions as follows: By \(X\) we denote a vector of random variables \(X_1,\dots,X_m\), where \(X_i\) stands for how good the candidate fits into your company. We will focus on the case where the \(X_i\)s are distributed uniformly over \([0,1]\), but one can adapt our considerations for other distributions, as well. We assume that we know the total number \(m\) of candidates. Moreover, we assume that the candidates are independent of each other.

Our strategy now can be formalized as follows: We hire candidate \(i\) if \(X_i \geq \mathbb{E}[\max\{X_{i+1},\dots,X_m\}]\). Therefore, it might be helpful to obtain a formula to compute this expected value. We can observe that \(\max\{X_{i+1},\dots,X_m\}\) as the maximum of \(m-i\) random variables and thus, compute the expected values of \(m-i\) identically distributed, independent random variables \(X_1,\dots,X_{m-i}\).

\(\mathbb{E}(\max\{X_1,\dots,X_{m-1}\}) = \int_0^1 x \cdot f_{\max\{X_1,\dots,X_{m-1}\}}(x)\, dx\).

To compute \(f_{\max\{X_1,\dots,X_{m-1}\}}(x)\) (the density function for the random variable \({\max\{X_1,\dots,X_{m-1}\}}\)), we compute its distribution function \(F_{\max\{X_1,\dots,X_{m-1}\}}(x)\) and take its derivative:

\(F_{\max\{X_1,\dots,X_{m-1}\}}(x) = Pr[\max\{X_1,\dots,X_{m-1} \leq x\}] = Pr[X_1 \leq x \wedge \dots \wedge X_{m-i} \leq x]\)

We can split the above probability because the \(X_i\) are independent and obtain \(Pr[X_1 \leq x] \cdot \dots \cdot Pr[X_{m-i} \leq x]=Pr[X_1 \leq x]^{m-i}\).

This leads to the fact (by evaluating the resulting integral for the exptected value) that \(\mathbb{E}[ \max \{ X_1,\dots,X_{m-1} \}] = \frac{m-i}{m-i+1}\).

## Expected value of the competency of the hired candidate

Now, of course, we want to know what we can expect from our strategy. I.e. we want to know how good the hired candidate is expectedly. Therefore, we define events \(A_i\), denoting that the \(i\)-th candidate is hired.

First of all, note that the probability of \(A_i\), i.e. the probability that exactly the \(i\)-th candidate is hired, is given by \(Pr[X_i \geq \frac{m-i}{m-i+1}] \cdot Pr[X_1 < \frac{m-1}{m-1+1}] \cdot Pr[X_2 < \frac{m-2}{m-2+1}] \cdot \dots Pr[X_1 < \frac{m-(i-1)}{m-(i-1)+1}]\), because we therefore have to require that all candidates before candidate \(i\) are worse than the corresponding expected value of the best candidate after them, and only candidate \(X_i\) is better than the expected value of the best candidate of all following candidates.

This means, we can compute \(Pr[A_i]\) by \((1-\frac{m-i}{m-i+1}) \cdot \prod_{j=1}^{i-1}\frac{m-j}{m-j+1}\).

We now use the law of total probability (extended to work for expected values) and compute the expected value of \(H\) (\(H\) denotes the random variable indicating how good our hired candidate is):

\(\mathbb{E}(H) = \sum_{i=1}^m \mathbb{E}[H | A_i] \cdot Pr[A_i]\). We can compute \(\mathbb{E}[H | A_i]\) by the following consideration: If candidate \(i\) is chosen (that's what \(A_i\) tells us), then we have that \(X_1 \leq \frac{m-1}{m-1+1} \wedge \dots \wedge X_{i-1} \leq \frac{m-(i-1)}{m-(i-1)+1} \wedge X_i \geq \frac{m-i}{m-i+1}\). So we can reforumlate \(\mathbb{E}[H | A_i] = \mathbb{E}[H | latex X_1 \leq \frac{m-1}{m-1+1} \wedge \dots \wedge X_{i-1} \leq \frac{m-(i-1)}{m-(i-1)+1} \wedge X_i \geq \frac{m-i}{m-i+1}]\). Since \(H\) only depends on \(X_i\) (only the value of the chosen candidate is important for \(H\), we can simplify the conditions to \(\mathbb{E}[H | X_i \geq \frac{m-i}{m-i+1}] = \mathbb{E}[X_i | X_i \geq \frac{m-i}{m-i+1}]\). Since \(X_i\) is uniformly distributed over \([0,1]\) we can tell that \(\mathbb{E}[X_i | X_i \geq \frac{m-i}{m-i+1}] = \frac{1 + \frac{m-i}{m-i+1}}{2}\).

Combining our simplified expression for \(\mathbb{E}[H | A_i\) and \(Pr[A_i]\), we arrive at \(\mathbb{E}(H) = \sum_{i=1}^m \mathbb{E}[H | A_i] \cdot Pr[A_i] = \sum_{i=1}^m \frac{1 + \frac{m-i}{m-i+1}}{2} \cdot \left[(1-\frac{m-i}{m-i+1}) \cdot \prod_{j=1}^{i-1}\frac{m-j}{m-j+1}\right]\). However, simplifying the above term seems very non-trivial (Wolfram Alpha returned the term as it was).

## Testing

However, we can verify our computations by simulating it (Python):

```
from random import random
import operator
def E_M(m):
"""Computes E(max(X_1,...,X_m)), where X_i ~ Uni(0,1)"""
return float(m)/(m+1)
def hire1(l):
"""Scans list l from left to right and hires i if l[i] is greater than
E(max(l[i+1],...,l[-1]))"""
remaining = len(l)
for i in xrange(len(l)):
remaining = remaining - 1
if l[i] > E_M(remaining):
return i
return i
def S1(m):
def prod(i):
return reduce(operator.mul, [(float(m-j))/(m-j+1) for j in xrange(1,i)], 1)
def binom(i):
return 0.5 * (1-(float(m-i)/(m-i+1))**2)
return sum([binom(i)*prod(i) for i in xrange(1, m+1)])
N=20
# compute expectancy
print S1(N)
# simulate
RUNS = 1000
total = 0
maxes = 0
for r in xrange(RUNS):
l = [random() for _ in range(N)]
h = hire1(l)
total = total + l[h]
if l[h] == max(l):
maxes = maxes + 1
total = total / RUNS
maxes = float(maxes) / RUNS
print total, maxes
```

By examining larger and larger values for `N`

, we can observe that the expectation converges to 1. Moreover, we can approximate the probability that we choose the best element (this is done by the variable `maxes`

). Some tests I did suggest that the probability of choosing the maximum is greater than 0.5.