Variance
Variance is a statistic that is used to measure deviation in a probability distribution. Deviation is the tendency of outcomes to differ from the expected value.
Studying variance allows one to quantify how much variability is in a probability distribution. Distributions whose outcomes vary wildly have a large variance, while distributions whose outcomes are close together have a small variance.
The variance explored on this page is different from sample variance, which is the variance of a sample of data.
Calculating the Variance
If \(X\) is a numerical discrete random variable with distribution \(p(x)\) and expected value \(\mu = E[X]\), the variance of \(X\), denoted as \(\text{Var}(X)\) or \(\sigma^2\), is
\[ \text{Var}(X) = E\big[(X - \mu)^2\big] = \sum_x (x - \mu)^2 p(x). \]
Note that from the definition, the variance is always non-negative, and if the variance is equal to zero, then the random variable takes a single constant value, which is its expected value \(\mu\).
In the rest of this summary, it is assumed that \(X\) is a discrete numerical random variable with distribution \(p(x)\) and expected value \(\mu = E[X].\) The following theorem gives another method to calculate the variance.
The variance of random variable \(X\) is
\[ \text{Var}(X) = E[X^2] - E[X]^2. \]
By definition,
\[ \begin{align} \text{Var}(X) =& E\big[(X - \mu)^2\big] \\ =& E\big[X^2 - 2\mu X + \mu^2\big] \\ =& E[X^2] - 2\mu E[X] + \mu^2 \\ =& E[X^2] - 2\mu^2 + \mu^2 \\ =& E[X^2] - \mu^2, \end{align} \]
where the third line follows from linearity of expectation.
What is the variance of a fair, six-sided die roll?
Let \(X\) be the random variable that represents the result of the die roll. It is known that \(E[X] = \frac{7}{2}\).
\(E[X^2]\) is calculated as follows:
\[ E[X^2] = \frac{1}{6}\big(1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2\big) = \frac{91}{6}. \]
Then, the variance can be calculated:
\[ \text{Var}(X) = E[X^2] - E[X]^2 = \frac{91}{6} - \left(\frac{7}{2}\right)^2 = \frac{91}{6} - \frac{49}{4} = \frac{35}{12} \approx 2.92. \]
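As a quick numeric check, the die's variance can be computed exactly from its distribution with Python's `fractions` module. The `variance` helper below is illustrative, not from the original text:

```python
from fractions import Fraction

# Illustrative helper: variance of a finite discrete distribution,
# given as {outcome: probability}, via Var(X) = E[X^2] - E[X]^2.
def variance(dist):
    mean = sum(x * p for x, p in dist.items())
    mean_of_squares = sum(x**2 * p for x, p in dist.items())
    return mean_of_squares - mean**2

die = {x: Fraction(1, 6) for x in range(1, 7)}
print(variance(die))  # 35/12
```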
Properties of Variance
The following properties of variance correspond to many of the properties of expected value. However, some of these properties have different results.
For a constant \(c\),
\[ \text{Var}(c) = 0. \]
We have
\[ \text{Var}(c) = E\big[(c - E[c])^2\big] = E\big[(c - c)^2\big] = E[0] = 0. \]
For random variable \(X\) and any constant \(c\),
\[ \text{Var}(cX) = c^2\,\text{Var}(X). \]
By the properties of expectation, \(E[cX] = cE[X] = c\mu\). Then,
\[ \text{Var}(cX) = E\big[(cX - c\mu)^2\big] = E\big[c^2(X - \mu)^2\big] = c^2 E\big[(X - \mu)^2\big] = c^2\,\text{Var}(X). \]
For random variable \(X\) and any constant \(c\),
\[ \text{Var}(X + c) = \text{Var}(X). \]
This follows since \(E[X + c] = \mu + c\), so \((X + c) - E[X + c] = X - \mu\) and the squared deviations are unchanged.
The above two theorems show how scaling or translating a random variable by a constant changes the variance. The first theorem shows that scaling the values of a random variable by a constant \(c\) scales the variance by \(c^2\). This makes sense intuitively, since the variance is defined by a square of differences from the mean. The second theorem shows that translating all values by a constant does not change the variance. This also makes intuitive sense: translating all values by a constant also translates the expected value, so the spread of the translated values around the translated expected value remains unchanged.
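Both properties are easy to confirm empirically. The sketch below (a simulation using only the standard library; names are illustrative) scales and translates a large sample of die rolls and compares sample variances:

```python
import random

random.seed(0)
xs = [random.randint(1, 6) for _ in range(100_000)]

def sample_var(data):
    """Population-style sample variance: mean squared deviation from the sample mean."""
    m = sum(data) / len(data)
    return sum((x - m) ** 2 for x in data) / len(data)

c = 3
v = sample_var(xs)
# Scaling by c multiplies the variance by c^2 ...
assert abs(sample_var([c * x for x in xs]) - c**2 * v) < 1e-3
# ... while translating by c leaves it unchanged.
assert abs(sample_var([x + c for x in xs]) - v) < 1e-3
```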
From the linearity property of expected value, for any two random variables \(X\) and \(Y\),
\[ E[X + Y] = E[X] + E[Y]. \]
However, this does not hold for variance in general. One special case for which this does hold is the following:
Let \(X\) and \(Y\) be independent random variables. Then
\[ \text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y). \]
We have
\[ \begin{align} \text{Var}(X + Y) =& E\big[(X + Y)^2\big] - \big(E[X + Y]\big)^2 \\ =& E\big[X^2 + 2XY + Y^2\big] - \big(E[X] + E[Y]\big)^2 \\ =& E[X^2] + 2E[XY] + E[Y^2] - E[X]^2 - 2E[X]E[Y] - E[Y]^2 \\ =& E[X^2] + 2E[X]E[Y] + E[Y^2] - E[X]^2 - 2E[X]E[Y] - E[Y]^2 \\ =& \big(E[X^2] - E[X]^2\big) + \big(E[Y^2] - E[Y]^2\big) \\ =& \text{Var}(X) + \text{Var}(Y), \end{align} \]
where the calculation in the fourth line follows from the independence of random variables and .
The following is a generalization of the above theorem.
Let \(X_1, X_2, \ldots, X_n\) be pairwise independent random variables. Then
\[ \text{Var}\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} \text{Var}(X_i). \]
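For instance, the variance of the sum of three independent fair dice should be \(3 \cdot \frac{35}{12} = \frac{35}{4}\). The exact enumeration below is an illustrative check, not part of the original text:

```python
from fractions import Fraction
from itertools import product

# Enumerate all 6^3 equally likely outcomes of three fair dice
# and compute the variance of their sum exactly.
sums = [sum(roll) for roll in product(range(1, 7), repeat=3)]
n = len(sums)
mean = Fraction(sum(sums), n)
var_sum = sum((s - mean) ** 2 for s in sums) / n
print(var_sum)  # 35/4
```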
For non-independent random variables \(X\) and \(Y\),
\[ \text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y) + 2\,\text{Cov}(X, Y), \]
where \(\text{Cov}(X, Y) = E[XY] - E[X]E[Y]\) is the covariance of \(X\) and \(Y\).
We have
\[ \begin{align} \text{Var}(X + Y) =& E[X^2] + 2E[XY] + E[Y^2] - E[X]^2 - 2E[X]E[Y] - E[Y]^2 \\ =& \text{Var}(X) + \text{Var}(Y) + 2\big(E[XY] - E[X]E[Y]\big). \end{align} \]
In order to calculate the variance of the sum of dependent random variables, one must take into account covariance.
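A minimal sketch of the dependent case takes the extreme \(Y = X\) for a fair die, so that \(\text{Cov}(X, Y) = \text{Var}(X)\) and the covariance term cannot be dropped:

```python
from fractions import Fraction

die = {x: Fraction(1, 6) for x in range(1, 7)}

def ev(f):
    """Expected value of f(X) for a fair die."""
    return sum(f(x) * p for x, p in die.items())

mu = ev(lambda x: x)
var_x = ev(lambda x: (x - mu) ** 2)
cov_xx = ev(lambda x: x * x) - mu * mu          # Cov(X, X) = Var(X)
var_sum = ev(lambda x: (2 * x - 2 * mu) ** 2)   # Var(X + Y) with Y = X
assert var_sum == var_x + var_x + 2 * cov_xx    # the covariance formula holds
print(var_sum)  # 35/3, i.e. 4 * Var(X), not 2 * Var(X)
```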
You plant 5 sunflowers in each of 2 gardens, where the plants grow to varying heights.
Shown above is the graph depicting the height of each sunflower, where the red line indicates the mean height of each sunflower population.
For example, the shortest sunflower in Garden A is 5 cm shorter than average, while the tallest one in Garden B is 7 cm taller than average.
Which set of sunflowers has higher population variance?
Standard Deviation
The standard deviation of a random variable, denoted \(\sigma\), is the square root of the variance, i.e.
\[ \sigma = \sqrt{\text{Var}(X)}. \]
Note that the standard deviation has the same units as the data. The variance of a random variable is also denoted by \(\sigma^2\).
Worked Examples
There are \(2\) bags, each containing \(5\) balls numbered \(1\) through \(5\). From each bag, a ball is removed uniformly at random. What are the variance and standard deviation of the total of the two balls?
Let \(X\) be the random variable denoting the sum of these values. Then, the probability distribution of \(X\) is given as follows:
\[ \begin{array}{c|ccccccccc} k & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\ \hline P(X = k) & \frac{1}{25} & \frac{2}{25} & \frac{3}{25} & \frac{4}{25} & \frac{5}{25} & \frac{4}{25} & \frac{3}{25} & \frac{2}{25} & \frac{1}{25} \end{array} \]
Previously, the expected value was calculated to be \(E[X] = \mu = 6.\) As such,
\[ \begin{align} \text{Var}(X) =& E\big[(X - \mu)^2\big] \\ =& (2-6)^2 \times \frac{1}{25} + (3-6)^2 \times \frac{2}{25} + (4-6)^2 \times \frac{3}{25} \\ & + (5-6)^2 \times \frac{4}{25} + (6-6)^2 \times \frac{5}{25} + (7-6)^2 \times \frac{4}{25} \\ & + (8-6)^2 \times \frac{3}{25} + (9-6)^2 \times \frac{2}{25} + (10-6)^2 \times \frac{1}{25} \\ =& 4.
\end{align} \]
Then the standard deviation is \(\sigma = \sqrt{\text{Var}(X)} = \sqrt{4} = 2.\)
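The result can be double-checked by enumerating the \(25\) equally likely pairs of draws directly (an illustrative verification, not part of the original text):

```python
from fractions import Fraction
from itertools import product

# One ball numbered 1..5 is drawn uniformly from each of two bags;
# enumerate all 25 equally likely totals and compute mean and variance.
totals = [a + b for a, b in product(range(1, 6), repeat=2)]
n = len(totals)
mean = Fraction(sum(totals), n)
var = sum((t - mean) ** 2 for t in totals) / n
print(mean, var)  # 6 4
```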
Two six-sided dice are rolled. What are the variance and standard deviation for the number of times a \(6\) is rolled?
Let \(X\) be the random variable representing the number of times a \(6\) is rolled. The table below lists the probabilities of rolling different numbers of \(6\)s:
\[ \begin{array}{c|ccc} k & 0 & 1 & 2 \\ \hline P(X = k) & \frac{25}{36} & \frac{10}{36} & \frac{1}{36} \end{array} \]
The expected number of times a \(6\) is rolled is \(2 \cdot \frac{1}{6} = \frac{1}{3}\), so \(\mu = \frac{1}{3}\). Now the goal is to calculate \(E[X^2]:\)
\[ E[X^2] = 0^2 \cdot \frac{25}{36} + 1^2 \cdot \frac{10}{36} + 2^2 \cdot \frac{1}{36} = \frac{14}{36} = \frac{7}{18}. \]
Therefore, \(\text{Var}(X) = E[X^2] - \mu^2 = \frac{7}{18} - \frac{1}{9} = \frac{5}{18}\) and \(\sigma = \sqrt{\frac{5}{18}} \approx 0.527.\)
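Under a two-dice reading of this example, the count of sixes can be checked by exact enumeration of all \(36\) outcomes (an illustrative sketch):

```python
from fractions import Fraction
from itertools import product
from math import sqrt

# Count sixes across all 36 equally likely rolls of two fair dice,
# then compute the mean and variance of that count exactly.
counts = [(a == 6) + (b == 6) for a, b in product(range(1, 7), repeat=2)]
n = len(counts)
mean = Fraction(sum(counts), n)
var = sum((c - mean) ** 2 for c in counts) / n
print(mean, var, round(sqrt(var), 3))  # 1/3 5/18 0.527
```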
Consider the Bernoulli process of a sequence of independent coin flips for a coin with probability of heads \(p\). Let \(X_i\) be a random variable with \(X_i = 1\) if the \(i\)th flip is heads and \(X_i = 0\) if the \(i\)th flip is tails. Let \(Y\) be a random variable indicating the number of trials until the first flip of heads in the sequence of coin flips. What are the variance and standard deviation of \(Y?\)
In the Expected Value wiki, it was demonstrated that \(Y\) is a geometrically distributed random variable with \(E[Y] = \frac{1}{p}\). Then
\[ \text{Var}(Y) = E[Y^2] - E[Y]^2 = E[Y^2] - \frac{1}{p^2}, \]
where
\[ E[Y^2] = \sum_{k=1}^{\infty} k^2 (1 - p)^{k-1} p. \]
To compute this sum, consider the algebraic identity
\[ \sum_{k=0}^{\infty} x^k = \frac{1}{1 - x}, \qquad |x| < 1. \]
By first differentiating this equation, then multiplying throughout by \(x\), and then differentiating again,
\[ \sum_{k=1}^{\infty} k^2 x^{k-1} = \frac{1 + x}{(1 - x)^3}. \]
This implies, substituting \(x = 1 - p\),
\[ E[Y^2] = p \sum_{k=1}^{\infty} k^2 (1 - p)^{k-1} = p \cdot \frac{1 + (1 - p)}{\big(1 - (1 - p)\big)^3} = \frac{2 - p}{p^2}. \]
Therefore,
\[ \text{Var}(Y) = \frac{2 - p}{p^2} - \frac{1}{p^2} = \frac{1 - p}{p^2}, \]
and the standard deviation is \(\sigma = \frac{\sqrt{1 - p}}{p}.\)
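A Monte Carlo sanity check of this formula, using only the standard library: for, say, \(p = \frac{1}{4}\), the formula predicts \(\text{Var}(Y) = \frac{1 - p}{p^2} = 12\) and \(E[Y] = 4\).

```python
import random

random.seed(1)
p = 0.25

def flips_until_first_head():
    """Simulate one geometric trial: flips up to and including the first head."""
    k = 1
    while random.random() >= p:
        k += 1
    return k

samples = [flips_until_first_head() for _ in range(200_000)]
mean = sum(samples) / len(samples)
var = sum((y - mean) ** 2 for y in samples) / len(samples)
print(round(mean, 2), round(var, 1))  # close to 4 and 12
```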