Covariance
The covariance generalizes the concept of variance to multiple random variables. Instead of measuring the fluctuation of a single random variable, the covariance measures how two random variables fluctuate together.
Definition
Recall that the variance is the mean squared deviation from the mean for a single random variable \( X \): \[ \text{Var}(X) = E[\left(X - E[X]\right)^2]. \] The covariance adopts an analogous functional form.
The covariance \( \text{Cov}(X, Y) \) of random variables \( X \) and \( Y \) is defined as \[ \text{Cov}(X, Y) = E\left[(X - E[X])(Y - E[Y])\right]. \] Now, instead of measuring the fluctuation of a single variable, the covariance measures how two variables fluctuate together. For the covariance to be large, both \( X - E[X] \) and \( Y - E[Y] \) must be large at the same time or, in other words, change together.
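For a concrete illustration, the definition can be estimated from simulated data by averaging the product of the deviations from the means. The following Python sketch (using numpy; the seed, the sample size, and the particular choice \( Y = X + \text{noise} \) are arbitrary assumptions for the example) does exactly that:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated pair of variables that tend to deviate from their means together.
x = rng.normal(size=100_000)
y = x + rng.normal(scale=0.5, size=100_000)

# Sample version of the definition E[(X - E[X])(Y - E[Y])].
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
print(cov_xy)  # close to 1, since here Cov(X, Y) = Var(X) = 1
```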
Calculation of the Covariance
It is generally simpler to find the covariance by expanding the product inside the expectation: \[ \begin{align} \text{Cov}(X, Y) &= E\left[(X - E[X])(Y - E[Y])\right] \\ &= E[XY - E[X] Y - X E[Y] + E[X] E[Y]] \\ &= E[XY] - E[X] E[Y] - E[X] E[Y] + E[X] E[Y] \\ &= \boxed{E[XY] - E[X] E[Y].} \end{align} \] In other words, to compute the covariance, one can equivalently find \( E[XY] \) (in addition to the means of \( X \) and \( Y \)).
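As a quick numerical sanity check (again a sketch, with arbitrarily chosen simulated variables), the shortcut \( E[XY] - E[X] E[Y] \) agrees with the definitional form:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100_000)
y = 2 * x + rng.normal(size=100_000)

# Definition: E[(X - E[X])(Y - E[Y])]
cov_definition = np.mean((x - x.mean()) * (y - y.mean()))
# Shortcut:   E[XY] - E[X] E[Y]
cov_shortcut = np.mean(x * y) - x.mean() * y.mean()

print(cov_definition, cov_shortcut)  # both close to 2
```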
Similarly, one can find an expression in terms of variances: \[ \begin{align} \text{Var}(X + Y) &= E\left[(X + Y - E[X] - E[Y])^2\right] \\ &= E[\left(X - E[X]\right)^2] + E[\left(Y - E[Y]\right)^2] + 2 E\left[(X - E[X])(Y - E[Y])\right] \\ &= \boxed{\text{Var}(X) + \text{Var}(Y) + 2 \text{Cov}(X, Y).} \end{align} \]
A generalized statement of this result is as follows.
Variance of a sum. Given random variables \( X_i \), each with finite variance,
\[ \text{Var}\left( \sum_i X_i \right) = \sum_i \ \text{Var}(X_i) + 2 \sum_{i<j} \text{Cov}(X_i, X_j). \]
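A short simulation can illustrate the two-variable case of this identity; the distributions below are arbitrary choices made only for the sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=100_000)
y = 0.5 * x + rng.normal(size=100_000)  # correlated with x

lhs = np.var(x + y)
cov_xy = np.mean(x * y) - x.mean() * y.mean()
rhs = np.var(x) + np.var(y) + 2 * cov_xy

print(lhs, rhs)  # both close to 1 + 1.25 + 2(0.5) = 3.25
```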
Properties of the Covariance
The covariance inherits many of the same properties as the inner product from linear algebra. The proofs involve only straightforward algebra and are left as an exercise for the reader; a numerical check of the properties appears after the list below.
Given a constant \( a \) and random variables \( X \), \( Y \), and \( Z \), the following properties hold:
- \( \text{Cov}(X, X) = \text{Var}(X) \geq 0 \)
- \( \text{Cov}(X, Y) = \text{Cov}(Y, X) \)
- \( \text{Cov}(aX, Y) = a \text{Cov}(X, Y) \)
- \( \text{Cov}(X, a) = 0 \)
- \( \text{Cov}(X + Y, Z) = \text{Cov}(X, Z) + \text{Cov}(Y, Z) \).
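The following sketch checks each property on simulated data (the distributions, the constant \( a \), and the helper function `cov` are arbitrary choices for the example):

```python
import numpy as np

rng = np.random.default_rng(3)
x, y, z = rng.normal(size=(3, 100_000))
a = 3.0

def cov(u, v):
    """Sample covariance, computed as E[UV] - E[U]E[V]."""
    return np.mean(u * v) - u.mean() * v.mean()

print(np.isclose(cov(x, x), np.var(x)))                 # Cov(X, X) = Var(X)
print(np.isclose(cov(x, y), cov(y, x)))                 # Cov(X, Y) = Cov(Y, X)
print(np.isclose(cov(a * x, y), a * cov(x, y)))         # Cov(aX, Y) = a Cov(X, Y)
print(np.isclose(cov(x, np.full_like(x, a)), 0.0))      # Cov(X, a) = 0
print(np.isclose(cov(x + y, z), cov(x, z) + cov(y, z))) # additivity in the first slot
```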
Given knowledge of \( \text{Cov}(W, Y) \), \( \text{Cov}(W, Z) \), \( \text{Cov}(X, Y) \), and \( \text{Cov}(X, Z) \), which of the following can necessarily be computed?
I. \( \text{Cov}(W + X, Y + Z) \)
II. \( \text{Cov}(Y + Z, W + X) \)
III. \( \text{Cov}(W, X + Y + Z) \)
IV. \( \text{Cov}(W, X + Y + Z) \), if it is known that \( W \) and \( X \) are independent
Let \( X \) and \( Y \) be random variables such that \( \text{Var}(X) = \sigma^2 \) and \( Y = aX \), where \( \sigma \) and \( a \) are constants. Determine \( \text{Cov}(X, Y) \).
The inner product properties yield
\[ \text{Cov}(X, Y) = \text{Cov}(X, aX) = \text{Cov}(aX, X) = a\text{Cov}(X, X) = a \sigma^2. \]
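A brief numerical check of this answer, with arbitrarily chosen values of \( a \) and \( \sigma \):

```python
import numpy as np

rng = np.random.default_rng(4)
sigma, a = 2.0, -1.5
x = rng.normal(scale=sigma, size=100_000)
y = a * x

cov_xy = np.mean(x * y) - x.mean() * y.mean()
print(cov_xy, a * sigma**2)  # both close to -6
```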
Because the covariance behaves like an inner product, the Cauchy-Schwarz inequality also holds for covariances.
Cauchy-Schwarz inequality. Given random variables \( X \) and \( Y \),
\[ \left[ \text{Cov}(X ,Y) \right]^2 \leq \text{Var}(X) \text{Var}(Y). \]
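A simulated example is consistent with the inequality (the particular variables below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=100_000)
y = 0.7 * x + rng.normal(scale=2.0, size=100_000)

cov_xy = np.mean(x * y) - x.mean() * y.mean()
print(cov_xy**2 <= np.var(x) * np.var(y))  # True: Cov(X, Y)^2 <= Var(X) Var(Y)
```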
One of the key properties of the covariance is the fact that independent random variables have zero covariance.
Covariance of independent variables. If \( X \) and \( Y \) are independent random variables, then \( \text{Cov}(X, Y) = 0. \)
If \( X \) and \( Y \) are independent, then \( E[XY] = E[X] E[Y] \) and therefore \( \text{Cov}(X, Y) = 0 \). (Recall that \( E[XY] = E[X] E[Y] \) is a simple consequence of the fact that \( P(X | Y) = P(X) \).)
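For instance, sampling two independently generated variables gives a sample covariance near zero (the distributions below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(size=100_000)
y = rng.exponential(size=100_000)  # generated independently of x

cov_xy = np.mean(x * y) - x.mean() * y.mean()
print(cov_xy)  # close to 0
```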
Dependent variables with zero covariance. However, the converse is not in general true. As a simple example, suppose that \( X \) is a standard normal random variable and that \( Y = X^2 \). Notice that knowledge of \( X \) completely determines \( Y \), in which case \( X \) and \( Y \) are very clearly dependent. However, since \( E[X] = 0 \) and \( E[XY] = E[X^3] = 0 \) (the odd moments of a standard normal vanish by symmetry), it holds that \[ \text{Cov}(X, Y) = E[XY] - E[X] E[Y] = 0. \]
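A simulation of this example shows the sample covariance hovering near zero despite the exact dependence \( Y = X^2 \) (the sample size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(size=1_000_000)
y = x**2  # completely determined by x, hence dependent on it

cov_xy = np.mean(x * y) - x.mean() * y.mean()
print(cov_xy)  # close to 0 despite the dependence
```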
A simple corollary is as follows.
Variance of the sum of independent variables. Given independent random variables \( X_i \), each with finite variance,
\[ \text{Var}\left( \sum_i X_i \right) = \sum_i \ \text{Var}(X_i). \]
Since the \( X_i \) are independent, it must be the case that \( \text{Cov}(X_i, X_j) = 0 \) for all \( i \neq j \), and the result follows directly from the variance of a sum theorem.
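A short sketch with three arbitrarily chosen independent variables illustrates the corollary:

```python
import numpy as np

rng = np.random.default_rng(8)
# Three independent variables with standard deviations 1, 2, and 3.
xs = [rng.normal(scale=s, size=100_000) for s in (1.0, 2.0, 3.0)]

print(np.var(sum(xs)), sum(np.var(x) for x in xs))  # both close to 1 + 4 + 9 = 14
```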
Covariance Matrix
When dealing with a large number of random variables \( X_i \), it makes sense to consider a covariance matrix whose \( (m, n) \)th entry is \( \text{Cov}(X_m, X_n) \).
Since \( \text{Cov}(X, Y) = \text{Cov}(Y, X) \), the covariance matrix is symmetric.
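As a sketch of how such a matrix might be estimated in practice, numpy's `np.cov` builds the sample covariance matrix from data (each row is treated as one variable), and the result is symmetric as expected; the simulated variables below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(9)
x1 = rng.normal(size=100_000)
x2 = 0.5 * x1 + rng.normal(size=100_000)
x3 = rng.normal(size=100_000)

# np.cov treats each row as one variable; entry (m, n) estimates Cov(X_m, X_n).
sigma = np.cov(np.vstack([x1, x2, x3]))
print(sigma)
print(np.allclose(sigma, sigma.T))  # True: the covariance matrix is symmetric
```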