Coupon Collector Problem

In the coupon collector problem, the goal is to purchase distinct objects in order to make a complete set of objects. Each purchase gives a random object, and the contents are independent of all other purchases. "Coupon" is just a placeholder word; the objects collected can be any kind of object.

Mathematically, the goal of the problem is to quantify the effort required to complete the collection. This is done by calculating the expected value of purchases of an object in order to obtain a full set of those objects.

One of the most interesting observations of the coupon collector problem is that a collection becomes more difficult to complete as one approaches a full collection. The final item in a collection typically requires the most effort to obtain.

The mathematical principles behind this problem are useful for problems involving any number of different types of things: cards from collectible card games (CCGs), sports cards, and, as seen in the example above, collectible toys.

General Case

There are many possible variations of the coupon collector problem. The following is the most basic case of the coupon collector problem:

The Coupon Collector Problem

There is a bin that contains \(n\) distinct objects. Each "purchase" consists of selecting an object out of the bin at random and then replacing it. \(X\) is the discrete random variable that represents the number of purchases until each of the \(n\) objects is selected at least once.

The goal of the coupon collector problem is to find \(\text{E}[X]\).

Typically, the "bin" is interpreted to be some pack in which the contents are randomized. One of the key aspects of this problem is that the selections are mutually independent (selection with replacement). Practically speaking, this means that objects can be selected more than once. However, in this problem, multiple selections of the same object do not matter; only selections of distinct objects matter.

The linearity of expectation page contains several problems relating to the coupon collector problem. These problems should be attempted before moving on to the generalization of the coupon collector problem, below.

General Solution to the Coupon Collector Problem

For the coupon collector problem as stated above, the expected value of the number of purchases required in order to select each of the \(n\) objects at least once is:

\[\text{E}[X]=nH_n\]

Where \(H_n\) is the \(n^\text{th}\) harmonic number.

Let \(X\) be the discrete random variable that represents the number of purchases until each of the \(n\) objects is selected at least once.

Let \(X_k\) be the discrete random variable that represents the number of purchases after the \((k-1)^\text{th}\) distinct object to select the \(k^\text{th}\) distinct object. As a base case, \(X_1=1\), because the first object selected will always be distinct.

By linearity of expectation,

\[\text{E}[X]=\sum\limits_{k=1}^{n}\text{E}[X_k]\].

After the \((k-1)^\text{th}\) distinct object is selected, there are \(n-k+1\) objects remaining to be selected. Let \(A_k\) be the event that one of those objects is selected in the next purchase. Then by probability by outcomes,

\[P(A_k)=\frac{n-k+1}{n}\].

\(X_k\) follows a geometric distribution (of trials). It's expected value is the reciprocal of \(P(A_k)\):

\[\text{E}[X_k]=\frac{1}{P(A_k)}=\frac{n}{n-k+1}\]

From before, \(\text{E}[X]\) is equal to the sum of all these expectations:

\[\text{E}[X]=\sum\limits_{k=1}^{n}\frac{n}{n-k+1}=n\sum\limits_{k=1}^{n}\frac{1}{k}\]

Thus, \(\text{E}[X]=nH_n\), where \(H_n\) is the \(n^\text{th}\) harmonic number.

There is no closed form expression for the \(n^\text{th}\) harmonic number. However, there does exist a very good estimate for \(H_n\).

For relatively large \(n\), the \(n^\text{th}\) harmonic number can be approximated as:

\[H_n\approx\ln{n}+\gamma+\frac{1}{2n}\]

Where \(\gamma\) is the Euler-Mascheroni Constant, \(\gamma\approx 0.577216\).

This gives an approximate solution to the coupon collector problem:

Approximate Solution to the Coupon Collector Problem

For the coupon collector problem as stated above, the expected value of the number of purchases required in order to select each of the \(n\) objects at least once is approximated as:

\[\text{E}[X]\approx n(\ln{n}+\gamma)+\frac{1}{2}\]

Where \(\gamma\) is the Euler-Mascheroni constant, \(\gamma\approx 0.577216\).

This approximation tends to be very accurate, and is more accurate with larger \(n\). For \(n\ge 5\), the approximation is accurate to one decimal place.

The collectible card game Arcane: The Congregation just came out with a new set of cards. Cards are sold in sealed packs, and each pack contains a single random rare card. The contents of each pack is independent of every other pack, and each rare is equally likely.

If there are 55 distinct rare cards in the newest set, what is the expected number of packs needed to obtain at least one copy of each rare card? Round your answer to the nearest integer.

This problem, as described, fits the format of a coupon collector problem. Packs are independent, the contents are random, and each rare is equally likely.

If \(X\) is the random variable that represents the number of packs opened until each of the 55 rares is collected at least once, then \(\text{E}[X]\) could be calculated with:

\[\text{E}[X]=55H_{55}\]

However, it would be very tedious to calculate the \(55^\text{th}\) harmonic number. Fortunately, the problem as stated only requires an answer to the nearest integer. Therefore, the approximation can be used:

\[\text{E}[X]\approx 55(\ln{55}+0.577216)+\frac{1}{2}\approx 252.650\]

The expected number of packs needed to obtain at least one copy of each rare card, rounded to the nearest integer, is \(\boxed{253}\).

Incidentally, using computer software to compute the exact answer, it is \(\text{E}[X]\approx 252.6486716559\).

Variance

It should be noted that the solution to the coupon collector problem is not a guarantee. The expected value of purchases one would need to make in order to complete a collection is not the same as what will actually happen in practice. In fact, there is significant variability in the number of purchases needed to complete a collection. This variability can be quantified with the statistics of variance and standard deviation.

Variance of the Coupon Collector Problem

For the coupon collector problem as stated in the general case section above, the variance of the number of purchases required in order to select each of the \(n\) objects at least once is:

\[\text{Var}[X]=n^2\sum\limits_{k=1}^{n}{\frac{1}{k^2}}-nH_n\]

Where \(H_n\) is the \(n^\text{th}\) harmonic number.

Let \(X\) be the discrete random variable that represents the number of purchases until each of the \(n\) objects is selected at least once.

Let \(X_k\) be the discrete random variable that represents the number of purchases after the \((k-1)^\text{th}\) distinct object to select the \(k^\text{th}\) distinct object.

Given that each \(X_k\) is independent,

\[\text{Var}[X]=\sum\limits_{k=1}^{n}\text{Var}[X_k]\].

After the \((k-1)^\text{th}\) distinct object is selected, there are \(n-k+1\) objects remaining to be selected. Let \(A_k\) be the event that one of those objects is selected in the next purchase. Then by probability by outcomes,

\[P(A_k)=\frac{n-k+1}{n}\].

\(X_k\) follows a geometric distribution. It's variance is:

\[\text{Var}[X_k]=\frac{1-P(A_k)}{\left[P(A_k)\right]^2}=\frac{1-\frac{n-k+1}{n}}{\left(\frac{n-k+1}{n}\right)^2}=\frac{n(k-1)}{(n-k+1)^2}\]

From before, \(\text{Var}[X]\) is equal to the sum of all these variances:

\[\begin{align} \text{Var}[X]&=\sum\limits_{k=1}^{n}\frac{n(k-1)}{(n-k+1)^2} \\ \\ &=n\sum\limits_{k=1}^{n}\frac{n-k}{k^2} \\ \\ &=n^2\sum\limits_{k=1}^{n}{\frac{1}{k^2}}-n\sum\limits_{k=1}^{n}{\frac{1}{k}} \\ \\ &=n^2\sum\limits_{k=1}^{n}{\frac{1}{k^2}}-nH_n \\ \\ \end{align}\]

Thus, \(\displaystyle\text{Var}[X]=n^2\sum\limits_{k=1}^{n}{\frac{1}{k^2}}-nH_n\), where \(H_n\) is the \(n^\text{th}\) harmonic number.

You might notice that the variance calculation contains a very inconvenient finite sum in \(\displaystyle\sum\limits_{k=1}^{n}{\frac{1}{k^2}}\). Fortunately, there is a way to approximate this sum using the Basel Problem.

The Basel Problem gives the following value for the convergent series:

\[\sum\limits_{k=1}^{\infty}{\frac{1}{k^2}}=\frac{\pi^2}{6}\]

Thus, for relatively large \(n\),

\[\sum\limits_{k=1}^{n}{\frac{1}{k^2}}\approx \frac{\pi^2}{6}\]

With this approximation, and the previous approximation for \(nH_n\), an approximation can be given for \(\text{Var}[X]\):

Approximate Variance of the Coupon Collector Problem

For the coupon collector problem as stated in the general case section above, the variance of the number of purchases required in order to select each of the \(n\) objects at least once is approximated as:

\[\text{Var}[X]\approx\frac{\pi^2}{6}n^2-n(\ln{n}+\gamma)-\frac{1}{2}\]

Where \(\gamma\) is the Euler-Mascheroni constant, \(\gamma\approx 0.577216\).

Contents

Image credit: toyspedia.blogspot.com

Image credit: ravnerdwars.info