Following this question on stackoverflow, I came up with the following “merge shuffle” algorithm to permute a linked list. Originally, I had doubts that the results would be distributed uniform, but I discovered that the algorithm gives good results. So I decided to check whether the distribution is indeed uniform.
Let us first of all discover what the algorithm looks like (in Haskell-like pseudo code):
splitlist  = (,) splitlist (x:xs)=(x:r,l) where (l,r)=splitlist xs merge  r = r merge l  = l merge (lh:l) (rh:r) = if (randomCondition) then lh:(merge l (rh:r)) else rh:(merge (lh:l) r) mergeshuffle  =  mergeshuffle [a] = [a] mergeshuffle (x:xs) = merge (mergeshuffle l) (mergeshuffle r) where (l,r) = splitlist (x:xs)
As you can see, it is just a merge sort algorithm with a randomized merge subroutine, that takes the first element of the randomly chosen list. We now assume that
randomCondition is fulfilled with probability 0.5 and not fulfilled with probability 0.5.
We are now going to prove that the above algorithm yields evenly distributed results, i.e. all outcomes of
mergeShuffle are equally probable.
Therefore, we first examine three base cases – lists with 0, 1 or 2 elements. We choose these base cases, since all recursive calls somewhen result in calls for lists with these lengths. For those cases, it is clear that all outcomes are equally probable (with the assumption that the afore mentioned
randomCondition occurs with probability 0.5). Thus, we can say that if we have a list with at most 2 elements, all outcomes are equally probable.
Now, we consider lists with more than 2 elements. First, we take the
merge subroutine. It is clear that for two fixed input lists of lengths resp. , there is for every possible (merged) result exactly one run of the merge subroutine. Furthermore, it is clear (since we have list items in total) that there are possible outcomes in total (since we have times the random variable
randomCondition). So the probability of a certain outcome of merge has probability .
Now, we prove the claim using induction. We assume that the algorithm yields uniformly distributed results for lists of lengths resp. and conclude that then the algorithm yields uniformly distributed results for a list of length .
We saw above that for two lists, each possible result of merge is reached via one unique sequence of
randomConditions. Moreover we know – because of our induction assumption – that the algorithm (i.e. the recursive calls of
mergeshuffle) yields uniformly distributed results. I.e. the probability that
mergeshuffle (applied to a list of length ) yields a specific result, is .
Thus, we can compute the probability that
mergeshuffle applied to a list of length results in a specific outcome by the following:
This is because the recursive calls yield two specific results with probability and for the results, the probability that the merge subroutine reaches (another) specific outcome is . Since both phases are independent (or at least we assume they are), we can just multiply the two probabilities to see that all results are equally likely.