Linearity of Expectation

Linearity of expectation is the property that the expected value of the sum of random variables is equal to the sum of their individual expected values, regardless of whether they are independent.

The expected value of a random variable is essentially a weighted average of possible outcomes. We are often interested in the expected value of a sum of random variables. For example, suppose we are playing a game in which we take the sum of the numbers rolled on two six-sided dice:

Calculating the expected value of the sum of the rolls is tedious using our basic methods. Instead, we make the following argument: "Well, the expected value for each die is $3.5$, and the two dice rolls are independent events, so the expected value for their sum should be $3.5+3.5=7$."
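As a quick sanity check of this argument, here is a minimal sketch that computes the expected sum in both ways: as a weighted average over all 36 equally likely ordered rolls, and as the sum of the two individual expected values. The function names are illustrative and not part of the original discussion.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)  # faces of a fair six-sided die

def expected_single_die():
    """Expected value of one fair die: each face has probability 1/6."""
    return sum(Fraction(face, 6) for face in FACES)

def expected_sum_by_enumeration():
    """Weighted average of the sum over all 36 equally likely ordered rolls."""
    return sum(Fraction(a + b, 36) for a, b in product(FACES, repeat=2))

# Sum of the two individual expectations versus the direct weighted average.
print(expected_single_die() + expected_single_die())
print(expected_sum_by_enumeration())
```

Exact fractions are used so that the two results can be compared without floating-point rounding.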

And this is true—these expected values add. But there’s more! The linearity property of expectation is especially powerful because it tells us that we can add expected values in this fashion even when the random variables are dependent.

Let that sink in for a moment, as it can be quite counter-intuitive! As an example, this means that the expected value for the amount of rain this weekend is precisely the expected value for the amount of rain on Saturday plus the expected value for the amount of rain on Sunday, even though we do not think that the amount of rain on Saturday is independent of the amount of rain on Sunday (for example, a very rainy Saturday increases the likelihood of a rainy Sunday).

On this page, we derive this property of expected value. We'll solve some basic problems, and then dive into the advanced techniques which allow us to solve many combinatorics problems, ranging from reasonably straightforward to quite challenging. Finally, we’ll explore applications in other subject areas such as computer science and geometry.

Definition and Proof

First, let's clearly state the linearity property of the expected value function (usually referred to simply as "linearity of expectation"):

For random variables $X$ and $Y$ (which may be dependent),

\[E[X+Y] = E[X]+E[Y].\]

More generally, for random variables $X_1,X_2,\ldots,X_n$ and constants $c_1,c_2,\ldots,c_n,$

\[E\left[\sum_{i=1}^{n} c_i X_i\right] = \sum_{i=1}^{n} \big(c_i \cdot E\left[X_i\right]\big). \]
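A sketch of the proof for two discrete random variables, assuming a joint probability mass function written here as $p(x,y) = P(X=x, Y=y)$ (this notation is introduced only for the sketch):

\[\begin{aligned}
E[X+Y] &= \sum_{x}\sum_{y} (x+y)\, p(x,y) \\
&= \sum_{x} x \sum_{y} p(x,y) + \sum_{y} y \sum_{x} p(x,y) \\
&= \sum_{x} x\, P(X=x) + \sum_{y} y\, P(Y=y) \\
&= E[X]+E[Y].
\end{aligned}\]

No step replaces $p(x,y)$ by $P(X=x)\,P(Y=y)$, so independence is never used. The same kind of computation gives $E[cX] = c\,E[X]$ for a constant $c,$ and the general statement for $n$ variables then follows by induction on $n.$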

Hopefully, it is clear from the proof above why this property holds, regardless of whether or not the random variables are independent. This is one of the key concepts of linearity of expectation, so be sure to work through the proof again if this isn't evident.

Basic Examples

Before we jump into problem-solving techniques, let's show how to directly apply linearity of expectation. Consider the example given in the introduction: a game in which two six-sided dice are rolled. We've discussed how we were able to find the expected value of the sum as $7,$ since the expected value of each die is $3.5.$

However, remember that one of the most important distinctions of linearity of expectation is that it can be applied to dependent random variables. Let's do an example with our dice:

If the sum of the numbers rolled on the dice is $A$ and the product of the numbers rolled is $B,$ compute $E\left[A+B\right].$
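One way to check an answer to this example is to enumerate all 36 rolls. The sketch below (with hypothetical helper names) computes $E[A],$ $E[B],$ and $E[A+B]$ separately. Note that $A$ and $B$ are dependent, since both are determined by the same two rolls, yet the expected values still add.

```python
from fractions import Fraction
from itertools import product

ROLLS = list(product(range(1, 7), repeat=2))  # 36 equally likely ordered rolls

def expectation(f):
    """Expected value of f(x, y), where x and y are the two fair dice."""
    return sum(Fraction(f(x, y), len(ROLLS)) for x, y in ROLLS)

e_sum = expectation(lambda x, y: x + y)            # E[A]
e_product = expectation(lambda x, y: x * y)        # E[B]
e_total = expectation(lambda x, y: x + y + x * y)  # E[A + B], computed directly

print(e_sum, e_product, e_total)
```

Here $E[B] = 3.5 \cdot 3.5$ because the two individual rolls are independent; that is a separate fact from linearity, which is what lets us add $E[A]$ and $E[B].$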

You can practice directly applying linearity of expectation in the following problem:

Now that we've seen some direct applications of linearity of expectation, let's jump into some problem-solving techniques!

Introductory Problem-solving

Interestingly enough, one of the most common uses of linearity of expectation in problem-solving is when it can be applied to finding the expected value of a single random variable. You're probably thinking, "Wait, how can I apply a tool about sums of random variables to a single random variable?"

In situations where linearity of expectation is most useful, it is often not obvious that it should be applied. Instead, we have to use our problem-solving skills to reframe our single random variable as a sum of other random variables.

First, let's note two important signs which alert us to the fact that we may be able to apply linearity of expectation when solving a given expected value problem:

Computing the expected value as a weighted average is difficult/messy because the probability of each individual outcome is hard to calculate.

The random variable under consideration can be written as a sum of some simpler random variables.

Let’s take a look at an example:

Caroline is going to flip 10 fair coins. If she flips $n$ heads, she will be paid $\$n$. What is the expected value of her payout?
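The two routes to the answer can be compared in a short sketch. The first function computes the weighted average over the binomial distribution of the number of heads; the second writes the payout as $X_1 + X_2 + \cdots + X_{10},$ where $X_i$ is 1 if coin $i$ lands heads and 0 otherwise (these names are chosen here for illustration).

```python
from fractions import Fraction
from math import comb

N_COINS = 10
P_HEADS = Fraction(1, 2)

def payout_by_weighted_average(n=N_COINS, p=P_HEADS):
    """Sum of k * P(exactly k heads) over k = 0..n (binomial distribution)."""
    return sum(k * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))

def payout_by_linearity(n=N_COINS, p=P_HEADS):
    """Payout = X_1 + ... + X_n with E[X_i] = p, so the expectation is n * p."""
    return n * p

print(payout_by_weighted_average(), payout_by_linearity())
```

Both functions return the same value, but the second needs no binomial coefficients at all.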

As you can see, linearity of expectation can greatly simplify an expected value calculation! Notice how the messiness of the direct calculation prompted us to consider linearity of expectation, after which we wrote the random variable (Caroline's payout) as a sum of simpler random variables. Go ahead and try the following problem to test your skills:

In the example and problem above, we have applied linearity of expectation to sums of independent random variables. That’s all well and good, but remember that one of the most powerful aspects of linearity of expectation is that it applies to random variables which are dependent! Let’s work out an example problem where we have a sum of dependent random variables.

The digits $1,2,3,$ and $4$ are randomly arranged to form two two-digit numbers, $\overline{AB}$ and $\overline{CD}.$ For example, we could have $\overline{AB} = 42$ and $\overline{CD} = 13.$ What is the expected value of $\overline{AB}\cdot \overline{CD}?$
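A possible way to organize this computation, offered as a sketch rather than as the intended solution: expand $\overline{AB}\cdot\overline{CD} = (10A+B)(10C+D) = 100AC + 10AD + 10BC + BD,$ and observe that each term is a product of the digits in two distinct positions, so all four products have the same expected value. The code below checks this against a brute-force average over all 24 arrangements.

```python
from fractions import Fraction
from itertools import permutations

DIGITS = (1, 2, 3, 4)
ARRANGEMENTS = list(permutations(DIGITS))  # 24 equally likely (A, B, C, D)

def expected_product_brute_force():
    """Average of AB * CD over every arrangement of the four digits."""
    total = sum((10 * a + b) * (10 * c + d) for a, b, c, d in ARRANGEMENTS)
    return Fraction(total, len(ARRANGEMENTS))

def expected_product_by_linearity():
    """Use (10A+B)(10C+D) = 100AC + 10AD + 10BC + BD and linearity."""
    pairs = list(permutations(DIGITS, 2))  # digits in two distinct positions
    e_pair = Fraction(sum(x * y for x, y in pairs), len(pairs))
    return (100 + 10 + 10 + 1) * e_pair

print(expected_product_brute_force(), expected_product_by_linearity())
```

The digits are dependent (once $A$ is known, $C$ cannot equal it), which is why the expected value of each pairwise product is computed over distinct digits rather than as a product of two expected values.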

Now that we've got the basic ideas under our belt, let's move on to some more complicated problem-solving techniques.

Using Indicator Variables

In this section, we'll continue exploring techniques that allow us to solve combinatorics problems using linearity of expectation, leading up to the introduction of a new tool known as indicator variables. In particular, this technique is useful when the random variable under consideration is counting the number of occurrences of simple events.

In these types of problems, the fact that the random variable under consideration can be written as a sum of other random variables will be less obvious than in the previous section. But with the tools we've built up thus far, we'll be able to solve these problems in no time!

Let's begin with an example to help spark some ideas about clever ways to write our random variable as a sum of simpler random variables:

A box contains a yellow ball, an orange ball, a green ball, and a blue ball. Billy randomly selects 4 balls from the box (with replacement). What is the expected value for the number of distinct colored balls Billy will select?

We want to find the expected value of the number of distinct colors Billy selects. We could directly compute the probabilities that Billy selects $k=1,2,3,4$ distinct colors, but this is a little bit messy (and would be even harder if there were many more than 4 balls). That's our cue to see if linearity of expectation might be helpful!

Now, we need to think about how to write the number of distinct colors Billy selects as a sum of some other random variables. To get started, consider one possible set of four selections, in which Billy draws only orange, green, and blue balls.

How would you determine the number of distinct colors Billy has selected? Well, Billy has chosen the orange, green, and blue balls, but he hasn't chosen the yellow ball, so that's 3 distinct colors. If you're thinking to yourself, "yeah sure, I know how to count already!", hang tight: here's where the magic happens.

Since our random variable is just counting how many of the colors have been selected, we can think of it as a sum of four random variables: one for each color, which is equal to 1 if the color is selected and 0 if it is not. So, in the example above, the orange, green, and blue random variables would be 1, while the yellow random variable would be 0. Adding these up gives a total of 3 distinct colored balls selected.

Furthermore, the expected value of any of these four random variables is simple to compute. In fact, using the basic definition of expected value, we see that it is simply equal to the probability that the color is selected. Using complementary probability, we find that the probability that an arbitrary color (let's say yellow) is selected is

\[1-P(\text{yellow is not selected}) = 1-\left(\frac{3}{4}\right)^4 = \frac{175}{256}.\]

Finally, by linearity of expectation, the expected value for the number of distinct colored balls that Billy will select is

\[4\cdot \frac{175}{256} = \frac{175}{64} \approx 2.73. \ _\square\]
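The result can be verified by brute force. The following sketch enumerates all $4^4 = 256$ equally likely sequences of draws and compares the average number of distinct colors with the value obtained from the four per-color random variables (function names are illustrative).

```python
from fractions import Fraction
from itertools import product

COLORS = ("yellow", "orange", "green", "blue")
DRAWS = 4

def expected_distinct_brute_force():
    """Average number of distinct colors over all 4^4 = 256 draw sequences."""
    outcomes = list(product(COLORS, repeat=DRAWS))
    return Fraction(sum(len(set(o)) for o in outcomes), len(outcomes))

def expected_distinct_by_linearity():
    """Sum over colors of P(color is selected) = 1 - (3/4)^4."""
    p_selected = 1 - Fraction(len(COLORS) - 1, len(COLORS)) ** DRAWS
    return len(COLORS) * p_selected

print(expected_distinct_brute_force(), expected_distinct_by_linearity())
```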

Let's reflect on this example and see what important ideas we have used and developed:

Linearity of expectation helped us compute a seemingly complicated expected value in a very simple way (albeit after using a clever insight, but these will become second nature with more practice)!
Linearity of expectation allowed us to not worry about the fact that we were considering a sum of dependent random variables. Obviously, these variables are dependent: for example, if three of them are 0, then the fourth must be 1 (since at least one color must be selected). However, as we proved at the beginning of this page, linearity of expectation holds for all random variables, regardless of independence.
Because our random variable counted something (e.g. the number of distinct colored balls), we were able to write it as a sum of random variables that indicated which things should be counted (e.g. a 1 for each color which was selected).

Looking at this last point, it is actually quite reminiscent of how we solved the earlier example with Caroline's coin flipping when we counted the number of heads she flipped by summing over random variables which indicated, for each coin, whether or not it was heads. In fact, there is a formal term for such a random variable:

An indicator random variable on an event $t$, often denoted as $\mathbb{1}_t$, is the random variable which is $1$ if $t$ occurs and $0$ if $t$ does not occur. By the definition of expected value,

\[E\left[\mathbb{1}_t\right] = P(t\text{ occurs})\cdot 1 + P(t\text{ does not occur})\cdot 0 = P(t\text{ occurs}).\]

These indicator variables can be extremely helpful for solving problems with linearity of expectation, especially when it is simple to compute their expected value (that is, the probability of the event occurring). Using this idea, you’re ready to take on a problem that would otherwise be essentially impossible to solve in a simple, clean way.

To summarize, using linearity of expectation with indicator variables is often useful when

a problem looks like it could be solved using linearity of expectation;
there is not an obvious way to write the random variable under consideration as a sum of simpler random variables;
the random variable under consideration is counting the number of occurrences of a fixed number of simple events.

Using States

In this section, we'll introduce a technique for applying linearity of expectation when the random variable under consideration measures the amount of time or number of steps it takes to complete some sort of process. If that sounds a little vague, let's consider the following simple example to introduce this idea of "states" within a process:

Allison has an unfair coin which lands on heads with probability $p.$ What is the expected value for the number of times she will have to flip the coin until she flips a heads?
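A sketch of the states argument, assuming $0 < p \le 1$ and writing $E$ for the expected number of flips (a symbol introduced here): after the first flip, Allison is either done (with probability $p$) or back in the starting state (with probability $1-p$), having used one flip either way. Hence

\[E = p\cdot 1 + (1-p)(1+E) = 1 + (1-p)E \implies pE = 1 \implies E = \frac{1}{p}.\]

For example, with a fair coin $\left(p=\frac{1}{2}\right)$ the expected number of flips is $2.$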

Let's look at a more complicated example using states, in which we'll be able to directly apply the result we've just derived!

With each purchase at SlurpeeShack, you receive one random piece of a 12-piece puzzle.

Once you collect all 12 pieces, you get a free Slurpee!

What is the expected value for the number of purchases you will need to make in order to collect all 12 pieces?
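One way to set up the states here, sketched under the assumption that every purchase yields each of the 12 pieces with equal probability, independently of earlier purchases: let the state be the number $k$ of distinct pieces already collected. From state $k,$ a purchase yields a new piece with probability $\frac{12-k}{12},$ so the expected number of purchases to leave that state is $\frac{12}{12-k}$ (one over the success probability). By linearity of expectation, the total is the sum of these waiting times.

```python
from fractions import Fraction

def expected_purchases(pieces=12):
    """Sum of the expected waiting times to move from k to k + 1 distinct pieces.

    Assumes each purchase gives a uniformly random piece, independently.
    """
    return sum(Fraction(pieces, pieces - k) for k in range(pieces))

total = expected_purchases()
print(total, float(total))
```

The waiting times in different states are handled separately, and only their expected values are added, so nothing about their joint distribution is needed.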

As we've seen, the method of states is useful when we can write our random variable as the sum of other random variables which measure the amount of time/steps it takes to get from one state to the next. If you're looking for an additional challenge, try this problem where the states are even less obvious:

Applications

We've seen how to solve some great combinatorics problems using linearity of expectation. However, there are also many real-world and cross-discipline uses of linearity of expectation, and in this section we'll explore a few of those.

Let's begin by considering lotteries.

There is a lottery contest in which participants pay \$1 to choose 5 distinct numbers from the integers 1 to 50. The lottery company offers a \$2,500,000 prize if someone guesses the correct 5 numbers (and the prize will be split among all winners if there are multiple correct participants).

We know from combinations that there are $\binom{50}{5} = 2,118,760$ possible choices. But wait—by the pigeonhole principle, this means that if we just get a group of $2,118,760$ people to each submit a distinct lottery ticket, we will surely make money from the lottery company!

While that's true, lottery companies know that participants usually don't have the resources to coordinate in such a way. Instead, they may assume that each person chooses their numbers randomly. If this is the case and $n$ people participate in a lottery with $m$ possible choices, what is the probability that some participant wins?
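A minimal sketch of the quantities involved, assuming each participant picks one of the $m$ choices uniformly at random and independently of the others. Each ticket matches the winning numbers with probability $\frac{1}{m},$ so by linearity the expected number of winning tickets is $\frac{n}{m},$ while the probability that at least one ticket wins is $1 - \left(1-\frac{1}{m}\right)^n.$ The function names are hypothetical.

```python
import math
from fractions import Fraction

def expected_winners(n, m):
    """One indicator per ticket, each with expectation 1/m, so E = n/m."""
    return Fraction(n, m)

def prob_some_winner(n, m):
    """1 - P(no ticket matches), assuming independent uniform choices."""
    return 1 - (1 - 1 / m) ** n

m = math.comb(50, 5)  # number of possible tickets in the lottery described above
print(expected_winners(m, m), prob_some_winner(m, m))
```

With $n = m$ uncoordinated participants, the expected number of winners is 1, yet the probability that anyone wins at all is only about $1 - \frac{1}{e},$ because many participants duplicate each other's tickets.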

It's very cool to see how we were able to apply our skills with linearity of expectation to discover an interesting fact about real-world lotteries! Let's turn to something with a very different flavor: a famous geometric probability question.

Suppose a needle of length 1 is dropped onto a floor with strips of wood 1 unit apart. What is the probability that the needle will land across two strips of wood?

The traditional solution to this problem uses calculus, but we're going to show how to solve it with linearity of expectation instead!
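The linearity argument is commonly presented along the following lines (a sketch, which may differ in detail from the derivation intended here). By linearity, the expected number of line crossings of any rigid curve is proportional to its length. A circle of diameter 1 has length $\pi$ and always crosses the lines exactly twice, so the constant of proportionality is $\frac{2}{\pi}.$ A needle of length 1 crosses at most one line, so its crossing probability equals its expected number of crossings, namely $\frac{2}{\pi}.$ The simulation below is a rough numerical check; the parametrization by center position and angle is an assumption of this sketch.

```python
import math
import random

def estimate_crossing_probability(trials=200_000, seed=0):
    """Monte Carlo estimate for a unit needle on lines spaced 1 unit apart."""
    rng = random.Random(seed)
    crossings = 0
    for _ in range(trials):
        center = rng.random()              # needle center within one strip, in [0, 1)
        angle = rng.uniform(0.0, math.pi)  # angle between the needle and the lines
        half_span = 0.5 * math.sin(angle)  # half the needle's extent across the strip
        if center - half_span < 0.0 or center + half_span >= 1.0:
            crossings += 1
    return crossings / trials

print(estimate_crossing_probability(), 2 / math.pi)
```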

Here are some other examples which can be tackled with linearity of expectation:

Many questions about expected values in random graphs can be answered with linearity of expectation. For example, suppose that there is a group of $n$ potential friends, and each pair of people becomes friends with probability $\frac{1}{2}$. What is the expected value for the number of "friend-triplets," i.e., groups of three people who are all mutual friends?

If $n$ people are in a room, what is the expected number of distinct birthdays represented, assuming there are 365 days in every year? How does this relate to the birthday paradox problem?

In computer science, the randomized quicksort algorithm has expected runtime $O(n\log n).$ How does linearity of expectation allow us to show this?
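Each of the three questions above can be approached with one indicator variable per object being counted. The sketch below states the assumptions it relies on: friendships form independently; birthdays are uniform over 365 days and independent; and, for quicksort, the keys are distinct and each pivot is chosen uniformly at random, in which case the standard analysis gives that the keys of rank $i < j$ are compared with probability $\frac{2}{j-i+1}.$ All function names are illustrative.

```python
from fractions import Fraction
from math import comb

def expected_friend_triplets(n, p=Fraction(1, 2)):
    """One indicator per group of three; all three pairs are friends w.p. p^3."""
    return comb(n, 3) * p**3

def expected_distinct_birthdays(n, days=365):
    """One indicator per day; a day is someone's birthday w.p. 1 - (1 - 1/days)^n."""
    return days * (1 - (1 - Fraction(1, days)) ** n)

def expected_quicksort_comparisons(n):
    """One indicator per pair of ranks i < j, compared w.p. 2 / (j - i + 1)."""
    return sum(
        Fraction(2, j - i + 1)
        for i in range(1, n + 1)
        for j in range(i + 1, n + 1)
    )

print(expected_friend_triplets(10))
print(float(expected_distinct_birthdays(23)))
print(float(expected_quicksort_comparisons(100)))
```

In the quicksort case, the sum over pairs grows like $2n\ln n,$ which is where the $O(n\log n)$ bound on the expected number of comparisons comes from.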
