# How many cards you need to pull to collect all of them?

Let’s assume some series has five common cards. How many common cards do you need to pull out to collect all five without trading? In fact, there is no particular answer because this task is about probabilities. So let’s correct the question: how many cards do you need to pull out to collect all cards in 95% of cases, and which number will be the most likely?

I’ll use the following variables: c - number of different cards in the series, s - number of pulled out surplus cards, and t = c + s - total number of pulled out cards.

Let start from a trivial case: what is the chance to pull out only different cards, and hence there no need to pull additional cards? It’s just a ratio of fitting cases to the total number. When we pull out the first card, we’ll be glad any. At the second time, we don’t need this card again and will be satisfied only with one of the other four cards. At the third time - one of three, the fourth time - one of two, and at the fifth time, we need only the last card. Hence, the number of fitting cases is 5·4·3·2·1 = 5! = 120. The total number of possible cases is 5·5·5·5·5 = 55 = 3125. Thus the chance is 120/3125 = 3.84%. Or, if we generalize, .

What if you were not so lucky and once pulled a duplicate? It can happen at a different time, so the number of fitting cases will be 5·4·4·3·2·1 + 5·4·3·3·2·1 + 5·4·3·2·2·1 + 5·4·3·2·1·1 = 5·4·3·2·1·(4+3+2+1) = 5!·(4+3+2+1). The change is 5!·(4+3+2+1) / 56 = 7.68%. And if we apply the formula of sum of arithmetic progression we’ll get .

If you unfortunately twice pulled duplicates, then the number of fitting cases is 5·4·4·4·3·2·1 + 5·4·4·3·3·2·1 + 5·4·4·3·2·2·1 + 5·4·4·3·2·1·1· + 5·4·3·3·3·2·1 + 5·4·3·3·2·2·1 + 5·4·3·3·2·1·1 + 5·4·3·2·2·2·1 + 5·4·3·2·2·1·1 + 5·4·3·2·1·1·1 = 5!·(4·4 + 4·3 + 4·2 + 4·1 + 3·3 + 3·2 + 3·1 + 2·2+2·1 + 1·1) = 5!·(4·(4+3+2+1) + 3·(3+2+1) + 2·(2+1) + 1·1). And the chance is 9.984%. The general formula will be: .

At the next step, after simplification, we'll get 5!(4(4(4+3+2+1) + 3(3+2+1) + 2(2+1) + 1·1) + 3(3(3+2+1) + 2(2+1) + 1·1) + 2(2(2+1) + 1·1) + 1·1·1). Here the chance is 10.75%.

By repeating it, the general formula turns into this: .

where f function is: .

To ensure that the formula is correct, let’s compare its distribution with one of simulating 100k players. And it’s clearly seen that they are the same - the average deviance of difference is 0.02.

By the way, this is how it looks for 30 cards. Note, the maximum is smaller at almost ten times.

Now, let’s see the chance to get all cards while pulling a certain number of surplus cards. For it, we just sum all possibilities: .

And for series with a different number of cards, it looks so:

The sum never exceeds 100% that says we on the right way as well.

To get the most likely number, we just need to find the surplus number with the highest chance. For another part of the question, we need to find the surplus number that exceeds 95% of the summary chance. But we can do smarter. The normal distribution has the following feature: about 95% of the values lie within two standard deviations away from the mean. And, in general, our distributions are normal. We’ll get to numbers: the lower and upper boundaries. You will likely be still collecting below the lower border and probably already finish when you reach the upper border.

Starting from 8 cards, the most likely number of surpluses is higher than the number of different cards, beginning in 20 cards, higher in twice or more.

So, which conclusion we can make from it? It’s simple: let’s trade and help each other to faster finish their favorite series.

For curious, here is the source code I used to generate all these data:
``````const cache = {};
function f (cards, surplus) {
if (cards <= 1n || surplus === 0n ) return 1n;
if (surplus == 1n) return cards * (cards+1n) / 2n;
const k = `\${cards}:\${surplus}`;
if (k in cache) return cache[k];
cache[k] = f(cards-1n, surplus) + cards*f(cards, surplus-1n);
return cache[k];
}

function fact (n) {
return (n <= 1n) ? 1n : n * fact(n-1n);
}

function prop (cards, surplus) {
cards = BigInt(cards);
surplus = BigInt(surplus);
const numerator = fact(cards) * f(cards - 1n, surplus);
const denominator = cards ** (cards + surplus);
// for surplus > 180 denominator exceeds Number
return 1000 / Number(denominator * 1000n / numerator);
}

const cards = 30, surplus = 200;
console.log(`chances by formula for \${cards} cards and \${surplus} surpluses:`);
console.log(Array(surplus + 1)
.fill(1)
.map((_, i) => prop(cards, i))
// .map((_, i, arr) => arr.slice(0, i+1).reduce((a,b)=>a+b))
.join("\n")
// .reduce((a, b) => a + b)
);

const dist = [], count = 100000;
console.log(`chances by simulation for \${count} users for \${cards} cards:`)
for (let i = count; i > 0; i--) {
const collection = Array(cards).fill(0);
do {
collection[~~(Math.random()*cards)]++;
} while (collection.some(cards => cards === 0));
const surplus = collection.reduce((a, b) => a + b) - cards;
if (dist.length < surplus+1) dist.length = surplus+1;
dist[surplus] = (dist[surplus] ?? 0) + 1;
}
console.log(dist.map((v) => v/count).join("\n"));

console.log(`lowest, most likely, and highest number of surpluses for different number of cards:`)
for (let cards = 2; cards <= 30; cards++) {
let sum = prop(cards, 0), surplus = 0;
const dist = [sum];
while (sum < 0.99999) {
surplus++;
dist[surplus] = prop(cards, surplus);
sum += dist[surplus];
}

const top = dist.findIndex((p, i, arr) => p > arr[i+1]);
const expVal = dist.reduce((ex, p, x) => ex + p*x, 0);
const stDev = dist.reduce((dev, p, x) => dev + p*(x - expVal)**2, 0) ** 0.5;

console.log(cards, Math.max(0, expVal - 2*stDev), top, expVal + 2*stDev);
}
``````
2 Likes    All these maths and the key isa each other anyways…  