
Better living through probability generating functions

The straightforward way to calculate the mean value, mean squared value, or any other moment of a distribution is by taking a weighted average:

\[\begin{align} \langle x \rangle &= \sum\limits_i x_ip(x_i) \\ \vdots \\ \langle x^m \rangle &= \sum\limits_i x_i^mp(x_i) \end{align} \]
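To make this concrete, here is a minimal Python sketch of the weighted-average computation. The helper name `moment` and the six-sided-die example are illustrative choices, not part of the note:

```python
# Compute <x^order> = sum_i x_i^order * p(x_i) by direct weighted average.

def moment(xs, ps, order):
    """Return the order-th moment of a discrete distribution."""
    return sum(x**order * p for x, p in zip(xs, ps))

# Illustrative example: a fair six-sided die.
xs = [1, 2, 3, 4, 5, 6]
ps = [1 / 6] * 6

print(moment(xs, ps, 1))  # mean <x> = 3.5
print(moment(xs, ps, 2))  # mean square <x^2> ~ 15.1667
```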

For example, in the voting model, the mean value of \(\hat{A}\) is given formally by

\[ \begin{align} \langle\hat{A}\rangle &= 0\cdot p(0)+1\cdot p(1)+2\cdot p(2)+\ldots+N\cdot p(N) \\ &= \sum\limits_m m\cdot p(m) \end{align} \]

where \(p(m)\) is given by the binomial distribution: \(\binom{N}{m}p_A^m\left(1-p_A\right)^{N-m}\).

As we expect, the mean value of \(\hat{B}\) is

\[ \begin{align} \langle\hat{B}\rangle &= \sum\limits_m (N-m)\cdot p(m) \\ &= N - \langle \hat{A} \rangle \end{align} \]
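As a quick numerical check, here is a sketch with illustrative parameters \(N = 10\) and \(p_A = 0.3\) (values chosen for this example, not from the note). The weighted sums give \(\langle\hat{A}\rangle = Np_A\) and \(\langle\hat{B}\rangle = N - \langle\hat{A}\rangle\), as one can verify:

```python
from math import comb

# Illustrative parameters, not from the note.
N, p_A = 10, 0.3
p = [comb(N, m) * p_A**m * (1 - p_A)**(N - m) for m in range(N + 1)]

mean_A = sum(m * p[m] for m in range(N + 1))
mean_B = sum((N - m) * p[m] for m in range(N + 1))

print(mean_A)              # ~3.0, i.e. N * p_A
print(mean_B, N - mean_A)  # both ~7.0
```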

In principle, we can extract anything we'd like to know from a distribution using similar series. In general, however, these sums can get cumbersome and quite unfriendly. It would be preferable to bypass them altogether by finding an equivalent representation of the distribution that reveals its properties through simpler means.

A similar problem arises in statistical mechanics, where we have a set of states, characterized by energies \(E_i\), and we'd like to know the likelihood of finding the system in any one of those states. In principle, we have to calculate some dreadful weighted sum.

However, it is often possible to express the probability distribution in a compact form known as the partition function \(Z(E_i)\) and obtain various bulk averages by means of simple operations on \(Z\), rather than wading through the murky waters of series manipulation. Can we do something similar here?

Let's consider the object \(\displaystyle Z = \left(p_A + p_B\right)^N\). On its face this may not seem like much of anything. For one thing, it is always equal to 1, since \(p_A + p_B = 1\).

If we expand it, we see that it is a sum of terms, each one corresponding to a \(p(m)\) in the binomial distribution, i.e.

\[\left(p_A + p_B\right)^N = \sum\limits_{m=0}^{N}\binom{N}{m}p_A^mp_B^{N-m}\]
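A short sketch (same illustrative \(N\) and \(p_A\) as above) confirms that the expansion reproduces the distribution term by term and that the terms sum to 1:

```python
from math import comb

# Illustrative parameters, not from the note.
N, p_A = 10, 0.3
p_B = 1 - p_A

# Each term is one p(m) from the binomial distribution.
terms = [comb(N, m) * p_A**m * p_B**(N - m) for m in range(N + 1)]
print(sum(terms))      # ~1.0, matching the compact form
print((p_A + p_B)**N)  # 1.0, i.e. Z
```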

This is a good sign: we have found a concise expression that reproduces the binomial distribution. It would be great if we could embed something in it that allows us to easily transform the compact expression from distribution, to mean value, to mean squared value, and so on.

Right now, we have an expression for \(\displaystyle\sum\limits_{m=0}^{N}\binom{N}{m}p_A^mp_B^{N-m}\) while we'd like one for \(\displaystyle\sum\limits_{m=0}^{N}m\binom{N}{m}p_A^mp_B^{N-m}\).

Consider \(\left(p_Az+p_B\right)^N\). Upon expanding, we find that this is

\[f(z) = \displaystyle\sum\limits_{m=0}^{N}z^m\binom{N}{m}p_A^mp_B^{N-m}\]

and in particular \(f(1) = \displaystyle\sum\limits_{m=0}^{N}\binom{N}{m}p_A^mp_B^{N-m}\).

Cleaning things up a bit, we have \(f(z) = \displaystyle\sum\limits_{m=0}^{N}z^mp(m)\). This object is the probability generating function of the title.

Now, if we take the derivative of \(f\) with respect to \(z\), we can bring down a factor of \(m\) in each term of the sum, i.e.

\[f'(z) = \displaystyle\sum\limits_{m=0}^{N}mz^{m-1}p(m)\]

and, by setting \(z=1\), we obtain the mean value of the distribution \[f'(1) = \displaystyle\sum\limits_{m=0}^{N}m\cdot p(m) = \langle m \rangle\]

This is amazing. Rather than summing the series by hand, manipulating the terms, applying identities, and so on, we perform a single derivative and find \(f'(1) = \langle m \rangle \).
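Here is a sketch of that single derivative using sympy (assumed available), with the same illustrative parameters as before. Symbolically differentiating \(f(z) = \left(p_Az+p_B\right)^N\) and evaluating at \(z = 1\) should return \(Np_A\):

```python
import sympy as sp

z = sp.symbols('z')
N, p_A = 10, sp.Rational(3, 10)  # illustrative values
p_B = 1 - p_A

f = (p_A * z + p_B)**N           # the generating function
mean = sp.diff(f, z).subs(z, 1)  # f'(1) = <m>
print(mean)                      # 3, i.e. N * p_A
```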

It is natural to then ask what happens if we take the second derivative. Let us see!

\[ \begin{align} f''(1) &= \sum\limits_{m=0}^{N} m(m-1)z^{m-2}p(m)\bigg|_{z=1} \\ &= \left[\sum\limits_{m=0}^{N} m^2z^{m-2}p(m) - \sum\limits_{m=0}^{N} mz^{m-2}p(m)\right]\bigg|_{z=1} \\ &= \langle m^2 \rangle - \langle m \rangle \end{align} \]

Then, we have \[\langle m^2 \rangle = f''(1) + f'(1)\]

and the variance follows immediately:

\[ \begin{align} \sigma^2(m) &= \langle m^2 \rangle - \langle m \rangle ^2 \\ &= f''(1)+f'(1)-f'(1)^2 \end{align} \]
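Continuing the sympy sketch, the second derivative delivers the variance with the same ease. For the binomial it should come out to \(Np_Ap_B\):

```python
import sympy as sp

z = sp.symbols('z')
N, p_A = 10, sp.Rational(3, 10)  # illustrative values
p_B = 1 - p_A

f = (p_A * z + p_B)**N
f1 = sp.diff(f, z).subs(z, 1)     # f'(1)  = <m>
f2 = sp.diff(f, z, 2).subs(z, 1)  # f''(1) = <m^2> - <m>

print(f2 + f1 - f1**2)  # 21/10, i.e. N * p_A * p_B
```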

It appears we have stumbled on something quite valuable: for each derivative we perform on \(f(z)\), we receive another piece of information about the distribution \(p(m)\).

At this point you might wonder what all of this is good for. After all, we began with some ugly looking sum, and now we have arrived at some slightly less ugly but more abstract sum to take its place. But remember, we are working with the abstract representation for \(f(z)\) in order to derive rules to manipulate \(f\)'s simpler form, \(\left(p_Az+p_B\right)^N\). Calculations are simpler and more transparent once these are in place.

Next, let's try out some of our tools.

Note by Josh Silverman · 2 years, 6 months ago


I didn't understand it, but it's good, so I reshared it. Julian Poon · 1 year, 11 months ago

What does this symbol \((!)\) mean? Cyrus Wilson · 1 year, 1 month ago

@Cyrus Wilson Sorry, where in the note do you see the ! symbol? I can't find it. Josh Silverman Staff · 1 year, 1 month ago

@Josh Silverman On a problem in algebra. All I remember is something like \(6! - 4!\) divided by \(10!\), or something like that. I do not understand it. Can you help! Cyrus Wilson · 1 year, 1 month ago

@Cyrus Wilson Oh, I see. \(!\) means the factorial function. It is easy to see what it means from an example: \(5! = 5\times 4\times 3\times 2\times 1\). Not much more complicated than that! Josh Silverman Staff · 1 year, 1 month ago
