An amazon interview question (not completely solved)
Taken from this site:
Given red balls and blue balls and some containers, how would you distribute those balls among the containers such that the probability of picking a red ball is maximized, assuming that the user randomly chooses a container and then randomly picks a ball from that?
The solution to this problem works as follows: Take all your red balls and put them one by one in one container after another until you have no balls left. If there are containers left (i.e. without red balls), take your blue balls and distribute them as you like into the remaining containers. If there are no containers left, take all your blue balls and put them in one arbitrary container.
To prove this (at least one case), we first do some definitions:
W.l.o.g I assume , i.e. all containers can be filled with at least one ball.
Let's call the number of containers . denotes the event that container is choosen. denotes the event that a red ball is chosen from the -th container. The probability of choosing the -th container is (since we have containers in total) .
Let denote the number of red balls in container and denote the number of blue balls in container . It is obvious that .
So the probability of choosing a red ball from container if we chose container already, is .
No we can use a case differentiation:
In this case, we have containers containing exactly 1 red ball and containers containing either one or more blue balls. So the probability of picking a red ball according to the proposed experiment, is
Proving that this probability is maximal is not very hard: We know, that at most summands can be positive (since containers with no red balls yield a probability of 0 and we have exactly red balls). We know further, that . Therefore we know, that the maximal probability in this case is as stated above.
Case 2: (unsolved)
Here we have at least one red ball in each container. More precisely, we have containers with exactly one red ball, and one container holding red and blue balls. The probability of picking a red ball can then be computed as
The above can be simplified to
To prove that this is the optimal probability is quite difficult (as far as I can judge it).
We have to fill all containers. For all holds that .
If we distribute the balls according to the above rule, we have containers with just one red ball, i.e. choosing one of these containers directly leads to picking a red ball. Just one single container has a probability of less than . More formally, (there exists exactly one such that) . Moreover .
If we choose any other distribution, there are at least two values such that and .
Let the number of summands that are equal to . That is, we have summands remaining that have to sum up to something greater than .
I do not know how to prove (or disprove) the second case.
If you can't prove, test it!
Since I was not able to prove the second case, I wrote some lines to test it experimental. Below you find some code to simulate the experiment and determine the best distribution:
def p(l): sum = 0 for (n,m) in l: if (n>0): sum = sum + (1/len(l))*(n/(n+m)) return sum def distribute(n,k): #distributes n balls into k urns if (k==1): yield [n] return for i in range(n+1): for d in distribute(n-i, k-1): yield [i] + d def bestDist(n,m,k): return max([(p(list(zip(l1,l2))),list(zip(l1,l2))) for l1 in distribute(n,k) for l2 in distribute(m,k)])
bestDist(n,m,k) returns a tuple consisting of the maximal probability and the corresponding Distribution. For example, the tuple
(0.8, [(1, 4), (1, 0), (1, 0), (1, 0)])
means, that we have a maximal probability of 0.8, that occurs if we distribute 1 red ball and 4 blue balls into container 1, 1 red ball and 0 blue balls into container 2, 1 red ball and 0 blue balls into container 3, 1 red and 0 blue ball into container 4.
>>> bestDist(2,2,2) (0.6666666666666666, [(1, 2), (1, 0)]) >>> bestDist(3,3,3) (0.75, [(1, 3), (1, 0), (1, 0)]) >>> bestDist(4,4,4) (0.8, [(1, 4), (1, 0), (1, 0), (1, 0)]) >>> bestDist(5,5,5) (0.8333333333333335, [(1, 0), (1, 0), (1, 0), (1, 5), (1, 0)]) >>> bestDist(6,6,6) (0.8571428571428571, [(1, 0), (1, 0), (1, 0), (1, 6), (1, 0), (1, 0)]) >>> bestDist(5,4,3) (0.8095238095238095, [(3, 4), (1, 0), (1, 0)])
As you can see, the distribution is always such that we have containers with exactly 1 red ball and 0 blue balls. One single container is filled with red and blue balls.
The simulation experiments I've done lead one to assume that the strategy is correct.
Hints for solving this quiz mathematically are highly appreciated...