Continuous Random Variables - Joint Probability Distribution
In many physical and mathematical settings, two quantities may vary probabilistically in such a way that the distribution of each depends on the other. In this case, it is no longer sufficient to consider the probability distribution of each random variable on its own. One must use the joint probability distribution of the continuous random variables, which takes into account how the distribution of one variable may change when the value of the other changes.
Definition of Joint Probability Distribution
The probability that the ordered pair of random variables \((X,Y)\) takes values in the (open or closed) intervals \([a,b]\) and \([c,d],\) respectively, is given by the integral of a function called the joint probability density function \(f_{XY} (x,y):\)
\[P(a\leq X \leq b, c\leq Y\leq d) = \int_a^b \int_c^d f_{XY} (x,y) \,dy\, dx.\]
In the discrete case, if \(X\) and \(Y\) are two random variables, then to each pair of possible outcomes \(X=x\) and \(Y=y\) can be assigned the number \(p_{_{XY}} (x,y)\), the probability of that pair of outcomes. The sum over all possible pairs of outcomes is then equal to one in the discrete case:
\[\sum_{xy} p_{_{XY}}(x,y) = 1.\]
As before, the generalization to the continuous case follows by replacing the sums with integrals and \(p_{_{XY}}\) with \(f_{XY}:\)
\[\int \int f_{XY}(x,y)\, dx\, dy = 1.\]
This is the normalization condition for joint probability density functions.
Intuitively, the joint probability density function gives the probability density for finding a certain point \((x,y)\) in two-dimensional space, just as the usual probability density function gives the density for finding a certain point in one-dimensional space.
A certain joint probability density function is given by the formula
\[f_{XY} (x,y) = Ce^{-x^2 - y^2},\]
where \(x\) and \(y\) both range over the entire real number line. Find the normalization constant \(C\).
Computing the normalization integral using polar coordinates,
\[\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} Ce^{-x^2-y^2}\, dy\, dx = \int_0^{2\pi} \int_0^{\infty} Cre^{-r^2}\, dr\, d\theta = 2\pi C \left(\frac12 \right) = C \pi.\]
Setting this equal to \(1\) shows that the constant \(C\) is \(\frac{1}{\pi}.\ _\square\)
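This result can also be checked numerically. The sketch below is a minimal check using SciPy's dblquad (an assumed tooling choice, not part of the solution above): the unnormalized integral comes out to approximately \(\pi\), so \(C \approx \frac{1}{\pi}\).

```python
# Numerically verify that the integral of e^(-x^2 - y^2) over the plane is pi,
# so the normalization constant C is 1/pi.
import numpy as np
from scipy.integrate import dblquad

total, _ = dblquad(
    lambda y, x: np.exp(-x**2 - y**2),  # integrand, called as f(y, x)
    -np.inf, np.inf,                    # x limits
    lambda x: -np.inf,                  # lower y limit
    lambda x: np.inf,                   # upper y limit
)

print(total)      # approximately 3.14159 (pi)
print(1 / total)  # approximately 0.31831 (1/pi), the constant C
```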
A normalized joint probability density function on the square \([0,3]\times[0,3]\) is given by
\[f_{XY} (x,y) = \frac{2}{81} xy^2.\]
Find the probability that \(X\) is between \(2\) and \(3\) and \(Y\) is greater than \(1\).
By the definition of the joint probability density function, this probability is
\[P(2\leq X \leq 3, Y \geq 1) = \int_2^3 \int_1^3 \frac{2}{81} xy^2 \,dy\, dx = \int_2^3 \frac{2}{81} \cdot \frac{26x}{3}\, dx = \frac{130}{243}.\ _\square\]
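As a quick sanity check, the same probability can be computed numerically. The sketch below again uses SciPy's dblquad (an assumed choice) and agrees with \(\frac{130}{243} \approx 0.535\).

```python
# Numerically verify P(2 <= X <= 3, Y >= 1) for f(x, y) = (2/81) x y^2 on [0,3] x [0,3].
from scipy.integrate import dblquad

prob, _ = dblquad(
    lambda y, x: (2 / 81) * x * y**2,  # integrand, called as f(y, x)
    2, 3,                              # x ranges over [2, 3]
    lambda x: 1,                       # y ranges over [1, 3]
    lambda x: 3,
)

print(prob, 130 / 243)  # both approximately 0.53498
```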
Marginal Distributions
Suppose that one has the joint probability density function \(f_{XY} (x,y)\) for \(X\) and \(Y\), but perhaps only the variable \(X\) is relevant to the problem at hand, i.e. one only cares about the distribution of \(X\) regardless of the value of \(Y\). Fortunately, the marginal distributions \(f_X (x)\) and \(f_Y (y)\) can be extracted from the joint probability distribution.
In the discrete case, recall that every ordered pair of outcomes \((x,y)\) is assigned the probability \(p (x,y)\). Since the outcomes are discrete, these probabilities may be thought of as matrix elements \(p_{ij}\) by ordering the possible outcomes in some way, so that each fixed \(i\) corresponds to a fixed \(x\) and each fixed \(j\) to a fixed \(y\). If one is looking for the probability of \(X=x\), one wants to sum all the probabilities in the matrix where this is true:
\[P(X=x) = p (x,y_1) + p(x,y_2) + \cdots.\]
If fixed \(x\) corresponds to row \(i\), this probability is
\[P(X=x) = \sum_j p_{ij} = \sum_y p(x,y).\]
That is, the probability that \(X=x\) is found by summing the probabilities of every possible outcome where \(X=x\).
Given the last formula above in the discrete case, the generalization to the continuous case is now easy by replacing the sums with integrals. The marginal distributions are found by integrating over the "irrelevant" variable:
\[f_X (x) = \int f(x,y)\, dy, \qquad f_Y (y) = \int f(x,y)\, dx.\]
In probability, two random variables are independent if the outcome of one does not influence the outcome of the other. Independence can be stated in terms of the joint probability density function and the marginal distributions via the statement
\[f_{XY}(x,y) = f_X (x) f_Y (y).\]
That is, two random variables are independent if their joint probability density function factors into the product of the marginal distributions.
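For a concrete illustration, the marginals of the earlier example \(f_{XY}(x,y) = \frac{2}{81} xy^2\) on \([0,3]\times[0,3]\) can be computed by integrating out the other variable. The sketch below uses SymPy (an assumed tool, not part of the original text) and also confirms that this particular joint PDF factors into its marginals, so \(X\) and \(Y\) are independent.

```python
# Compute the marginal distributions of f(x, y) = (2/81) x y^2 on [0, 3] x [0, 3]
# and check whether the joint PDF factors into them.
import sympy as sp

x, y = sp.symbols('x y', nonnegative=True)
f_xy = sp.Rational(2, 81) * x * y**2

f_x = sp.integrate(f_xy, (y, 0, 3))  # marginal in X: integrate out y -> 2*x/9
f_y = sp.integrate(f_xy, (x, 0, 3))  # marginal in Y: integrate out x -> y**2/9

print(f_x, f_y)
print(sp.simplify(f_x * f_y - f_xy))  # 0, so the joint PDF factors: X and Y are independent
```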
A certain joint probability density function is given by the formula
\[f_{XY} (x,y) = \frac{\sqrt{\pi}}{2} x \sin (xy),\]
where \(X\) and \(Y\) are both drawn from the interval \(\big[0,\sqrt{\pi}\big].\) Find the marginal distribution \(f_X (x).\)
The marginal distribution in \(X\) is given by integrating out \(Y:\)
\[f_X (x) = \int_0^{\sqrt{\pi}} \frac{\sqrt{\pi}}{2} x \sin (xy)\, dy = \frac{\sqrt{\pi}}{2} \left. \big(- \cos(xy)\big) \right|^{\sqrt{\pi}}_0 = \frac{\sqrt{\pi}}{2} - \frac{\sqrt{\pi}}{2} \cos \big(x\sqrt{\pi}\big).\ _\square\]
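The same marginal can be obtained symbolically; the short sketch below (SymPy assumed) reproduces the result above.

```python
# Symbolically integrate out y from f(x, y) = (sqrt(pi)/2) x sin(xy) on [0, sqrt(pi)]^2.
import sympy as sp

x, y = sp.symbols('x y', positive=True)
f_xy = sp.sqrt(sp.pi) / 2 * x * sp.sin(x * y)

f_x = sp.integrate(f_xy, (y, 0, sp.sqrt(sp.pi)))
print(sp.simplify(f_x))  # equivalent to sqrt(pi)/2 - (sqrt(pi)/2) cos(sqrt(pi) x)
```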
Expectation, Variance, and Covariance
The expected value, variance, and covariance of random variables with a given joint probability distribution are computed in exact analogy to the single-variable case. The expected value of any function \(g(X,Y)\) of two random variables \(X\) and \(Y\) is given by
\[E\big(g(X,Y)\big) = \int \int g(x,y) f_{XY} (x,y)\, dy\, dx.\]
For instance, the expected value of \(X\) is
\[E(X) = \int \int x f_{XY} (x,y)\, dy\, dx.\]
The variance of each variable separately is defined as usual:
\[\text{Var} (X) = E\big(X^2\big) - E(X)^2.\]
Note that the expected values can be computed using either the joint probability distributions or the marginal distributions, since the two cases will be mathematically equivalent (in one case, the two integrations are performed together; in the other, they are performed one at a time).
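The sketch below (SymPy assumed) illustrates this equivalence for the earlier example \(f_{XY}(x,y) = \frac{2}{81} xy^2\) on \([0,3]\times[0,3]\): computing \(E(X)\) from the joint PDF and from the marginal gives the same answer.

```python
# Compute E(X) two ways for f(x, y) = (2/81) x y^2 on [0, 3] x [0, 3]:
# once from the joint PDF, once from the marginal distribution f_X.
import sympy as sp

x, y = sp.symbols('x y', nonnegative=True)
f_xy = sp.Rational(2, 81) * x * y**2
f_x = sp.integrate(f_xy, (y, 0, 3))  # marginal in X

E_joint = sp.integrate(x * f_xy, (y, 0, 3), (x, 0, 3))  # double integral over the joint PDF
E_marginal = sp.integrate(x * f_x, (x, 0, 3))           # single integral over the marginal

print(E_joint, E_marginal)  # both equal 2
```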
A new quantity relevant to joint probability distributions is the covariance of two random variables, which is defined by
\[\text{Cov} (X,Y) = E(XY) - E(X) E(Y).\]
There are a few important observations to make about this expression. The first is to note that \(\text{Cov} (X,X) = \text{Var} (X)\) and similarly for \(Y\). Second, note that if \(X\) and \(Y\) are independent, then their covariance vanishes: the joint probability density function factors, so \(E(XY) = E(X) E(Y)\). (The converse does not hold in general; two random variables can have zero covariance without being independent.) The covariance thus encapsulates how much changing one random variable affects the other.
A certain joint probability distribution is given by the joint PDF
\[f_{XY} (x,y) = 4xy,\]
where \(X\) and \(Y\) are sampled from the interval \([0,1]\). Compute \(\text{Cov} (X,Y)\).
To compute the covariance, one must compute a number of expectation values according to the above definition. Computing each gives
\[E(X) = \int_0^1 \int_0^1 4x^2 y\, dy\, dx = \frac23 = E(Y),\]
where, in the last equality, the symmetry of \(x\) and \(y\) in the joint probability density function allows one to say \(E(X) = E(Y)\) without doing more computation. Now
\[E(XY) = \int_0^1 \int_0^1 4x^2 y^2\, dy\, dx = \frac49.\]
Since \(E(XY) = E(X) E(Y) = E(X)^2 = E(Y)^2\), \(\text{Cov} (X,Y) = 0\).
This should have been expected; since the joint PDF factorizes into marginal distributions, \(X\) and \(Y\) are independent. \(_\square\)
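As a final check, the covariance can be computed symbolically; the sketch below (SymPy assumed) confirms that it vanishes.

```python
# Verify Cov(X, Y) = E(XY) - E(X)E(Y) = 0 for f(x, y) = 4xy on [0, 1] x [0, 1].
import sympy as sp

x, y = sp.symbols('x y', nonnegative=True)
f_xy = 4 * x * y

E_X = sp.integrate(x * f_xy, (y, 0, 1), (x, 0, 1))       # 2/3
E_Y = sp.integrate(y * f_xy, (y, 0, 1), (x, 0, 1))       # 2/3
E_XY = sp.integrate(x * y * f_xy, (y, 0, 1), (x, 0, 1))  # 4/9

print(E_XY - E_X * E_Y)  # 0, so the covariance vanishes
```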