Have you ever compared your random draws and found that they don’t seem that random at all? You might’ve noticed a pretty substantial overlap in the people listed in each selection. Well, it turns out that this is exactly what we should expect. And with a short lesson in probability, we’ll show you why.
For starters, let’s consider a consortium with 100 people in it. We do an initial draw of 20%, which is 20 people. So, the probability of any one person being selected is 20 in 100, which simplifies to 1 in 5. If we do another draw, how much overlap should we expect between the first and second draws?
Probability tells us that, if two events are independent (meaning that one has no effect on the other), you can find the probability that they’ll both happen by multiplying their individual probabilities. In this case, our first event is one particular person, let’s call him Jim, getting selected in the first draw. The probability of that occurring is 1 in 5, as noted above. The second event is Jim getting selected in another draw. We’re doing exactly the same thing as before, and the second draw takes no account of the first, so the probability is the same: 1 in 5. To get the probability that both of these events occur, we multiply the two probabilities (1/5 x 1/5) to get the probability of Jim getting selected for both draws: 1/25.
To summarize, if we do 2 random draws of 20% of the people from a 100-person consortium, each of the 100 people has a 1 in 25 chance of landing in both draws. So, on average, we should expect the draws to have 100 x 1/25 = 4 people in common out of the 20 selected. That may be higher than our intuition would tell us but, like our math teachers told us, probability never lies.
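If you’d like to check the arithmetic yourself, here is a minimal Python sketch of the calculation above. The function name and its parameters are made up for this illustration; they are not part of TestVault or any other tool.

```python
from fractions import Fraction


def expected_overlap(population: int, draw_size: int) -> Fraction:
    """Expected number of people common to two independent random draws.

    Hypothetical helper for illustration only. It assumes both draws pick
    draw_size people from the same population and ignore each other.
    """
    # Chance that one particular person (Jim) is picked in a single draw.
    p_selected = Fraction(draw_size, population)
    # Independent draws: multiply to get the chance of being picked in both.
    p_both = p_selected * p_selected
    # Every person has the same chance, so multiply by the population size.
    return population * p_both


print(expected_overlap(100, 20))  # prints 4
```

Using exact fractions rather than decimals keeps the result at exactly 4 with no rounding noise. Try other draw sizes to see how quickly the expected overlap grows: a 50% draw from the same consortium, for example, gives 25 people in common.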
So, let’s test this. To produce the image below I performed 20 real random selections in TestVault and, for each pair (e.g., selection 2 and selection 1, selection 3 and selection 1, selection 3 and selection 2, etc.), I counted the number of people the two selections had in common. With 20 selections, that makes 190 pairs in all. Then, I tallied how many pairs produced each overlap count. For example, you can see in the chart that there were 56 pairs that had 4 people in common. In fact, the chart shows that 4 is the most common overlap and that the other counts cluster around it. However, while 4 is the winner, we saw overlaps of 0 people all the way up to 10 (half of the number of people selected).
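You can run a similar experiment at home. The sketch below is a simulation using Python’s built-in random module, not TestVault’s actual selection process, and the function name, parameters, and seed are assumptions made for this illustration. It performs 20 draws of 20 people from 100, compares every pair of draws, and tallies the overlaps.

```python
import random
from collections import Counter
from itertools import combinations


def tally_overlaps(population=100, draw_size=20, n_draws=20, seed=0):
    """Tally how many pairs of simulated draws share each overlap count.

    Hypothetical helper for illustration; it does not reproduce how
    TestVault performs its selections.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    people = range(population)
    # Each draw picks draw_size distinct people, independently of the others.
    draws = [set(rng.sample(people, draw_size)) for _ in range(n_draws)]
    # Compare every pair of draws and count the people they share.
    return Counter(len(a & b) for a, b in combinations(draws, 2))


tally = tally_overlaps()
for overlap in sorted(tally):
    print(f"{overlap} people in common: {tally[overlap]} pairs")
```

The exact tallies will differ from the chart and will change with the seed, but the counts should pile up around 4 in the same way, with the occasional pair sharing far fewer or far more people.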
As you can see, we can’t predict exactly how much overlap to expect for any two selections. The average is 4, but the number for any given pair can vary tremendously. I hope, however, that this article helps to inform your intuition for future selections. Happy testing and good luck.