Linearity of Expectation
Linearity of expectation is the property that the expected value of the sum of random variables is equal to the sum of their individual expected values, regardless of whether they are independent.
The expected value of a random variable is essentially a weighted average of possible outcomes. We are often interested in the expected value of a sum of random variables. For example, suppose we are playing a game in which we take the sum of the numbers rolled on two six-sided dice:
Calculating the expected value of the sum of the rolls is tedious using our basic methods. Instead, we make the following argument: "Well, the expected value for each die is \(3.5\), and the two dice rolls are independent events, so the expected value for their sum should be \(3.5+3.5=7\)."
And this is true—these expected values add. But there’s more! The linearity property of expectation is especially powerful because it tells us that we can add expected values in this fashion even when the random variables are dependent.
Let that sink in for a moment, as it can be quite counter-intuitive! As an example, this means that the expected value for the amount of rain this weekend is precisely the expected value for the amount of rain on Saturday plus the expected value for the amount of rain on Sunday, even though we do not think that the amount of rain on Saturday is independent of the amount of rain on Sunday (for example, a very rainy Saturday increases the likelihood of a rainy Sunday).
On this page, we derive this property of expected value. We'll solve some basic problems, and then dive into the advanced techniques which allow us to solve many combinatorics problems, ranging from reasonably straight-forward to quite challenging. Finally, we’ll explore applications in other subject areas such as computer science and geometry.
Contents
Definition and Proof
First, let's clearly state the linearity property of the expected value function (usually referred to simply as "linearity of expectation"):
For random variables \(X\) and \(Y\) (which may be dependent),
\[E[X+Y] = E[X]+E[Y].\]
More generally, for random variables \(X_1,X_2,\ldots,X_n\) and constants \(c_1,c_2,\ldots,c_n,\)
\[E\left[\sum_{i=1}^{n} c_i X_i\right] = \sum_{i=1}^{n} \big(c_i \cdot E\left[X_i\right]\big). \]
We'll explicitly prove this theorem for discrete random variables \(X\) and \(Y\). By the basic definition of expected value,\[\begin{align*} E[X+Y] &= \sum_{x}\sum_{y} \big[(x+y)\cdot P(X=x,Y=y)\big] \\ &=\sum_{x}\sum_{y} \big[x\cdot P(X=x,Y=y)\big] + \sum_{x}\sum_{y} \big[y\cdot P(X=x,Y=y)\big] \\ &=\sum_{x}x\underbrace{\sum_{y} P(X=x,Y=y)}_{P(X=x)} + \sum_{y}y\underbrace{\sum_{x} P(X=x,Y=y)}_{P(Y=y)} \\ &=\sum_{x}x\cdot P(X=x) + \sum_{y}y \cdot P(Y=y) \\ &=E[X] + E[Y]. \end{align*}\]
This result can be extended for \(n\) variables using induction.
Note that we have never used any properties of independence in this proof, and thus linearity of expectation holds for all random variables!
For continuous random variables, the proof is essentially the same except that the summations are replaced by integrals. Additionally, it is easy to extend the proof for two random variables to the more general case using the properties of expected value. \(_\square\)
Hopefully, it is clear from the proof above why this property holds, regardless of whether or not the random variables are independent. This is one of the key concepts of linearity of expectation, so be sure to work through the proof again if this isn't evident.
Basic Examples
Before we jump into problem-solving techniques, let's show how to directly apply linearity of expectation. Consider the example given in the introduction: a game in which two six-sided dice are rolled. We've discussed how we were able to find the expected value of the sum as \(7,\) since the expected value of each die is \(3.5.\)
However, remember that one of the most important distinctions of linearity of expectation is that it can be applied to dependent random variables. Let's do an example with our dice:
If the sum of the numbers rolled on the dice is \(A\) and the product of the numbers rolled is \(B,\) compute \(E\left[A+B\right].\)
We know that \(E[A] = 7,\) and since the two numbers rolled are independent, we have \(E[B] = 3.5 \cdot 3.5 = 12.25.\) Thus, despite the fact that \(A\) and \(B\) are clearly dependent, linearity of expectation tells us that\[E\left[A+B\right] = E[A] + E[B] = 7 + 12.25 = 19.25. \ _\square\]
You can practice directly applying linearity of expectation in the following problem:
The expected value for the amount of rain on Saturday and Sunday is 2 inches and 3 inches, respectively. There is a 50% chance of rain on Saturday. If it rains on Saturday, there is a 75% chance of rain on Sunday, but if it does not rain on Saturday, then there is only a 50% chance of rain on Sunday.
What is the expected value (in inches) for the amount of rain over the weekend?
Now that we've seen some direct applications of linearity of expectation, let's jump into some problem-solving techniques!
Introductory Problem-solving
Interestingly enough, one of the most common uses of linearity of expectation in problem-solving is when it can be applied to finding the expected value of a single random variable. You're probably thinking, "Wait, how can I apply a tool about sums of random variables to a single random variable?"
In situations where linearity of expectation is most useful, it is often not obvious that it should be applied. Instead, we have to use our problem-solving skills to reframe our single random variable as a sum of other random variables.
First, let's note two important signs which alert us to the fact that we may be able to apply linearity of expectation when solving a given expected value problem:
- Computing the expected value as a weighted average is difficult/messy because the probability of each individual outcome is hard to calculate.
- The random variable under consideration can be written as a sum of some simpler random variables.
Let’s take a look at an example:
Caroline is going to flip 10 fair coins. If she flips \(n\) heads, she will be paid $\(n\). What is the expected value of her payout?
Let’s call Caroline's payout $\(X\). Our first instinct is to find the probability that \(X=0,1,2,\ldots\) and do a weighted average, but computing these probabilities and doing an expected value summation gets ugly quickly -- the first sign that we consider trying to make use of linearity of expectation! How can we express \(X\) as the sum of simpler random variables?We note that getting paid $\(n\) for \(n\) heads is the same as saying that she gets paid $\(1\) for each head, so we can think of her total payout as a sum of the payouts for each individual coin ($\(1\) for heads). Thus, if we let \(X_i=1\) if the \(i^\text{th}\) coin is heads and \(X_i=0\) if it is tails, then we can write Caroline's payout as
\[X = X_1+X_2+\cdots+X_{10}.\]
To find the expected value of \(X\), we now just need to find the expected value of \(X_1+X_2+\cdots+X_{10}\). Just like that, we’ve got our problem looking like a linearity of expectation problem!
Of course, since each coin is heads with probability \(\frac{1}{2}\), \(E\left[X_i\right]=1\cdot \frac{1}{2} + 0\cdot \frac{1}{2} = \frac{1}{2}\) for all \(i\). Thus,
\[E[X] = E\left[X_1+X_2+\cdots+X_{10}\right] = 10 \cdot \frac{1}{2} = 5,\]
so Caroline's expected payout is $\(5\). \(_\square\)
As you can see, linearity of expectation can greatly simplify the calculation required in an expected value calculation! Notice how we used the fact that the expected value calculation seemed messy to consider invoking linearity of expectation, and then we cleverly wrote the random variable (Caroline's payout) as a sum of simpler random variables. Go ahead and try the following problem to test your skills:
Sammy is lost and starts to wander aimlessly. Each minute, he walks one meter forward with probability \(\frac{1}{2}\), stays where he is with probability \(\frac{1}{3}\), and walks one meter backward with probability \(\frac{1}{6}\).
After one hour, what is the expected value for the forward distance (in meters) that Sammy has traveled?
In the example and problem above, we have applied linearity of expectation to sums of independent random variables. That’s all well and good, but remember that one of the most powerful aspects of linearity of expectation is that it applies to random variables which are dependent! Let’s work out an example problem where we have a sum of dependent random variables.
The digits \(1,2,3,\) and \(4\) are randomly arranged to form two two-digit numbers, \(\overline{AB}\) and \(\overline{CD}.\) For example, we could have \(\overline{AB} = 42\) and \(\overline{CD} = 13.\) What is the expected value of \(\overline{AB}\cdot \overline{CD}?\)
Our first instinct is to find \(E\big[\,{\small\overline{AB}}\,\big]\) and \(E\big[\,{\small\overline{CD}}\,\big]\) and simply multiply them. However, remember that \( E[xy] = E[x] \times E[y] \) only holds when the random variables are independent . Clearly, \(\overline{AB}\) and \(\overline{CD}\) are not independent since each digit can only be used once; for example, if \(\overline{AB}=13\) then we would know that \(\overline{CD} \) can only be 24 or 42. This inspires us to try to turn our problem into some kind of sum.Since our most simple variables are the individual digits, we are inspired to write the product as
\[\left(10A+B\right)\cdot \left(10C+D\right) = 100\cdot AC + 10 \cdot AD + 10 \cdot BC + BD.\]
However, it is clear that the expected value of any of these products of the form \(AC\) is the same since there is symmetry among \(A,B,C,D.\) We can compute
\[E\left[AC\right] = \frac{1\cdot 2 + 1\cdot 3 + 1\cdot 4 +2\cdot 3 + 2 \cdot 4 + 3\cdot 4}{6} = \frac{35}{6}.\]
Thus, by linearity of expectation,
\[E\big[\,{\small\overline{AB}\cdot \overline{CD}}\,\big] = E\left[100\cdot AC\right]+ E\left[10 \cdot AD\right] + E\left[10 \cdot BC\right] + E\left[BD\right] = 121\cdot E\left[AC\right] = \frac{4235}{6} = 705.8\overline{3}.\ _\square\]
Now that we've got the basic ideas under our belt, let's move on to some more complicated problem-solving techniques.
Using Indicator Variables
In this section, we'll continue exploring techniques that allow us to solve combinatorics problems using linearity of expectation, leading up to the introduction of a new tool known as indicator variables. In particular, this technique is useful when the random variable under consideration is counting the number of occurrences of simple events.
In these types of problems, the fact that the random variable under consideration can be written as a sum of other random variables will be less obvious than in the previous section. But with the tools we've built up thus far, we'll be able to solve these problems in no time!
Let's begin with an example to help spark some ideas about clever ways to write our random variable as a sum of simpler random variables:
A box contains a yellow ball, an orange ball, a green ball, and a blue ball. Billy randomly selects 4 balls from the box (with replacement). What is the expected value for the number of distinct colored balls Billy will select?
We want to find the expected value of the number of distinct colors Billy selects. We could directly compute the probabilities that Billy selects \(k=1,2,3,4\) distinct colors, but this is a little bit messy (and would be even harder if there were many more than 4 balls). That's our cue to see if linearity of expectation might be helpful!Now, we need to think about how to write the number of distinct colors Billy selects as a sum of some other random variables. To get started, suppose Billy's four selections were as follows:
How would you determine the number of distinct colors Billy has selected? Well, looking at the image above, Billy has chosen the orange, green, and blue balls, but he hasn't chosen the yellow ball, so that's 3 distinct colors. If you're thinking to yourself, "yeah sure, I know how to count already!", hang tight—here's where the magic happens.
Since our random variable is just counting how many of the colors have been selected, we can think of it as a sum of four random variables: one for each color, which is equal to 1 if the color is selected and 0 if it is not. So, in the example above, the orange, green, and blue random variables would be 1, while the yellow random variable would be 0. Adding these up gives a total of 3 distinct colored balls selected.
Furthermore, the expectancy of any of these four random variables is simple to compute. In fact, using the basic definition of expected value, we see that its expectancy is simply equal to the probability that the color is selected. Using complimentary probability techniques, we find that the probability that an arbitrary color (let's say yellow) is selected is
\[1-P(\text{yellow is not selected}) = 1-\left(\frac{3}{4}\right)^4 = \frac{175}{256}.\]
Finally, by linearity of expectation, the expected value for the number of distinct colored balls that Billy will select is
\[4\cdot \frac{175}{256} = \frac{175}{64} \approx 2.73. \ _\square\]
Let's reflect on this example and see what important ideas we have used and developed:
- Linearity of expectation helped us compute a seemingly complicated expected value and in a very simple way (albeit after using a clever insight—but these will become second nature with more practice)!
- Linearity of expectation allowed us to not worry about the fact that we were considering a sum of dependent random variables. Obviously, these variables are dependent: for example, if three of them are 0, then the fourth must be 1 (since at least one color must be selected). However, as we proved at the beginning of this page, linearity of expectation holds for all random variables, regardless of independence.
- Because our random variable counted something (e.g. the number of distinct colored balls), we were able to write it as a sum of random variables that indicated which things should be counted (e.g. a 1 for each color which was selected).
Looking at this last point, it is actually quite reminiscent of how we solved the earlier example with Caroline's coin flipping when we counted the number of heads she flipped by summing over random variables which indicated, for each coin, whether or not it was heads. In fact, there is a formal term for such a random variable:
An indicator random variable on an event \(t\), often denoted as \(\mathbb{1}_t\) is the random variable which is \(1\) if \(t\) occurs and 0 if \(t\) does not occur. By the definition of expected value,
\[E\left[\mathbb{1}_t\right] = P(t\text{ occurs})\cdot 1 + P(t\text{ does not occur})\cdot 0 = P(t\text{ occurs}).\]
These indicator variables can be extremely helpful for solving problems with linearity of expectation, especially when it is simple to compute their expectancy (that is, the probability of the event occurring). Using this idea, you’re ready to take on a problem that would otherwise be essentially impossible to solve in a simple, clean way.
To summarize, using linearity of expectation with indicator variables is often useful when
- a problem looks like it could be solved using linearity of expectation;
- there is not an obvious way to write the random variable under consideration as a sum of simpler random variables;
- the random variable under consideration is counting the number of occurrences of a fixed number of simple events.
Using States
In this section, we'll introduce a technique for applying linearity of expectation when the random variable under consideration measures the amount of time or number of steps it takes to complete some sort of process. If that sounds a little vague, let's consider the following simple example to introduce this idea of "states" within a process:
Allison has an unfair coin which lands on heads with probability \(p.\) What is the expected value for the number of times she will have to flip the coin until she flips a heads?
At first glance, our random variable is certainly counting simple events, so it seems like indicator variables might help. We just need to count the number of tails we flip, and each flip is tails with probability \(1-p.\) But there's a major problem! We don't know how many of these flips should be in our sum; in fact, that is almost precisely the question being posed.Instead, we think of this problem in terms of "completing a process" in which there are multiple states. Here, the states are simple: we have either not yet flipped a head (let's call this state 0), or we have flipped a head (let's call this state 1).
Now, let \(E\left[X_i\right]\) denote the expected number of flips needed to complete the process (flip the first head) given that we are in state \(i.\) It should be clear that \(E\left[X_1 \right] = 0,\) since state 1 means we have flipped a head so the process is completed. On the other hand, once we flip in state 0, we will remain in state 0 with probability \(1-p\) and we will move to state 1 with probability \(p.\) Thus, considering our state after the first flip in state 0, we have
\[E\left[X_0\right] = 1 + (1-p) \cdot E\left[X_0\right] + p \cdot \underbrace{E\left[X_1\right]}_{0},\]
so \(p \cdot E\left[X_0\right] = 1,\) giving \(E\left[X_0\right] = \frac{1}{p}. \ _\square\)
Remark: This is one way to derive the expected value of a Bernoulli distribution.
Let's look at a more complicated example using states, in which we'll be able to directly apply the result we've just derived!
With each purchase at SlurpeeShack, you receive one random piece of the puzzle seen at right.
Once you collect all 12 pieces, you get a free Slurpee!
What is the expected value for the number of purchases you will need to make in order to collect all 12 pieces?
Here, our states are how many distinct coupons we have. Let \(X_i\) denote the number of purchases we need to make to get the \(i^\text{th}\) distinct coupon, so the random variable under consideration is \(\sum_{i=1}^{12} X_i.\)Once you have \(i-1\) distinct coupons, the probability of getting a new one on any given purchase is \(\frac{12-(i-1)}{12}.\) As derived in the previous example, the expected number of purchases needed will be \(E\left[X_i\right] = \frac{12}{12-(i-1)}.\) Thus, by linearity of expectation,
\[E\left[\sum_{i=1}^{12} X_i \right] = \sum_{i=1}^{12} E\left[X_i\right] = \sum_{i=1}^{12} \frac{12}{12-(i-1)} = \frac{12}{12} + \frac{12}{11} + \cdots + \frac{12}{1} = \frac{86021}{2310} \approx 37. \ _\square\]
Bonus: This is a famous problem, often referred to as the coupon collector's problem. In general, for \(n\) coupons, the expected value for the number of purchases will be \(nH_n,\) where \(H_n = \sum_{i=1}^{n} \frac{1}{i}\) is the \(n^{\text{th}}\) harmonic number. It can be shown with calculus that, for large \(n,\) \(H_n \approx \ln n,\) so the expected value is approximately \(n \ln n.\)
As we've seen, the method of states is useful when we can write our random variable as the sum of other random variables which measure the amount of time/steps it takes to get from one state to the next. If you're looking for an additional challenge, try this problem where the states are even less obvious:
Sarah the squirrel is trying to find her acorn, but she can't remember where she left it!
She starts in the lower-left corner of the \(2\times 2\) grid, and at each point, she randomly steps to one of the adjacent vertices (so she may accidentally travel along the same edge multiple times).
What is the expected value for the number of steps Sarah will take before she finds her acorn in the top-right corner?
Jorge has an \(N\)-sided fair die and wonders how many times he would need to roll it until he has rolled all the numbers from \(1\) to \(N\) (in any order).
He does a quick calculation and discovers that the expected value for the number of rolls is 91 when rounding to the nearest integer.
How many sides does his die have?
Clarification: One distinct number from \(1\) to \(N\) is printed on each face of the die.
Image credit: ravnerdwars.info
Other Expected Value Quizzes
An infinite line of stepping stones stretches out into an infinitely large lake.
A frog starts on the second stone from the shore.
Every second, he takes a jump to a neighboring stone. He has a 60% chance of jumping one stone closer to the shore and a 40% chance of jumping one stone further away from the shore.
What is the expected value for the number of jumps he will take before reaching the first stone (the one closest to the shore)?
Other Expected Value Quizzes
Applications
We've seen how to solve some great combinatorics problems using linearity of expectation. However, there are also many real-world and cross-discipline uses of linearity of expectation, and in this section we'll explore a few of those.
Let's begin by considering lotteries.
There is a lottery contest in which participants pay $1 to choose 5 distinct numbers from the integers 1 to 50. The lottery company offers a $2,500,000 prize if someone guesses the correct 5 numbers (and the prize will be split among all winners if there are multiple correct participants).
We know from combinations that there are \(\binom{50}{5} = 2,118,760\) possible choices. But wait—by the pigeonhole principle, this means that if we just get a group of \(2,118,760\) people to each submit a distinct lottery ticket, we will surely make money from the lottery company!
While that's true, lottery companies know that participants usually don't have the resources to coordinate in such a way. Instead, they may assume that each person likely chooses their numbers random. If this is the case and \(n\) people participate in a lottery with \(m\) possible choices, what is the probability that some participant wins?
Let \(X_i\) be an indicator variable on the event that lottery combination \(1 \le i \le m\) is not chosen by any participants. Since each person chooses randomly,\[E\left[X_i\right] = \left(1 - \frac{1}{m}\right)^n\]
for all \(i.\) By linearity of expectation, the expected value for the number of unchosen combinations is
\[E\left[\sum_{i=1}^{m} X_i\right] = \sum_{i=1}^{m} E\left[X_i\right] = m \cdot \left(1 - \frac{1}{m}\right)^n.\]
Now, since the winning combination is chosen randomly,
\[1-\frac{m \cdot \left(1 - \frac{1}{m}\right)^n}{m} = 1- \left(1 - \frac{1}{m}\right)^n.\]
For example, in our lottery with 2,118,760 choices and 2,118,760 participants, the probability that someone wins is approximately 63%. \(_\square\)
Bonus: What happens when as \(m,\) the number of possible choices, becomes very large? How does this affect how a lottery company would need to adapt if the number of participants were to increase? Hint: The result involves a famous limit of a sequence related to the constant \(e.\) Additionally, read this answer for a deeper insight between \(e\) and combinatorics.
It's very cool to see how we were able to apply our skills with linearity of expectation to discover an interesting fact about real-world lotteries! Let's turn to something with a very different flavor: a famous geometric probability question.
Suppose a needle of length 1 is dropped onto a floor with strips of wood 1 unit apart. What is the probability that the needle will land across two strips of wood?
The traditional solution to this problem uses Calculus, but we're going to show how to solve it with linearity of expectation instead!
First, we're going to argue that the expected number of times a needle crosses the strips of woods is directly proportional to the length \(l\) of the needle. Suppose that the needle is made up of \(n\) linear pieces of equal length, and let \(X_i\) be an indicator variable on the \(i^{\text{th}}\) piece crossing two strips of wood. Then, the expected number of total crosses, by linearity of expectation, is\[E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^n E\left[X_i\right] = n \cdot E\left[X_1\right].\]
Holding the length of these small pieces constant, we see that the expected number of times is proportional to the length of the needle (admittedly, this is a little bit of hand-waving, but the intuition should be clear).
Thus, as a function of the length \(l,\) the expected number of crossings is \(cl\) for some constant \(c.\) However, consider a circle with diameter 1 (so circumference \(\pi\)); with probability 1, this circle will intersect exactly 2 of the wood-crossings. Hence, \(c\pi = 2,\) so \(c = \frac{2}{\pi}.\) If you're uncomfortable with the idea of a circle, consider approximating the circle by combining a bunch of very, very small linear segments.
For any needle (such as ours) which can intersect at most one wood-crossing, \(\sum_{i=1}^{n} X_i\) is in fact an indicator variable on the event that the needle lands across two strips of wood, so the expected value is precisely the probability of this occurring. Thus, the probability of a crossing with a needle of length 1 is simply \(\frac{2}{\pi}\approx 64\%. \ _\square\)
Bonus: Can you think about how you could use this result to estimate the value of \(\pi\) via a Monte-Carlo simulation?
Here are some other examples which can be tackled with linearity of expectation:
Many questions about expected values in random graphs can be answered with linearity of expectation. For example, suppose that there is a group of \(n\) potential friends, and each pair of people becomes friends with probability \(\frac{1}{2}\). What is the expected value for the number of "friend-triplets," e.g. groups of three people who are all mutual friends?
Note that there are \(\binom{n}{3}\) such triplets, and each triplet is a friend triplet with probability \(\left(\frac{1}{2}\right)^3 = \frac{1}{8}.\) If we let \(X_i\) be an indicator variable on the \(i^{\text{th}}\) triplet being a friend triplet, then the expected number of friend triplets is\[E\left[\sum_{i=1}^{\binom{n}{3}} X_i \right] = \sum_{i=1}^{\binom{n}{3}} E\left[X_i\right] = \frac{\binom{n}{3}}{8}. \ _\square\]
If \(n\) people are in a room, what is the expected number of distinct birthdays represented, assuming there are 365 days in every year? How does this relate to the birthday paradox problem?
This is very similar to the example with lottery tickets. See if you can work this out!
In computer science, the randomized quicksort algorithm has expected runtime \(O(n\log n).\) How does linearity of expectation allow us to show this?
(Want to write a solution to this example? Edit this wiki!)