Continuous Random Variables - Probability Density Function (PDF)
The probability density function or PDF of a continuous random variable gives the relative likelihood of any outcome in a continuum occurring. Unlike the case of discrete random variables, for a continuous random variable any single outcome has probability zero of occurring. Instead, the probability density function describes how probability is distributed across a continuous set of values; its magnitude therefore encodes the likelihood of finding the continuous random variable near a particular point.
Heuristically, the probability density function is just the distribution from which a continuous random variable is drawn, such as the normal density, which is the PDF of a normally distributed continuous random variable.
Definition of the Probability Density Function
The probability that a random variable \(X\) takes a value in the interval \([a,b]\) is given by the integral of a function called the probability density function \(f_X(x)\) (whether the interval is open or closed makes no difference, since any single point has probability zero):
\[P(a\leq X \leq b) = \int_a^b f_X(x) \,dx.\]
If the random variable can be any real number, the probability density function is normalized so that:
\[\int_{-\infty}^{\infty} f_X(x) \,dx = 1.\]
This is because the probability that \(X\) takes some value between \(-\infty\) and \(\infty\) is one: \(X\) does take a value!
These formulas may make more sense in comparison to the discrete case, where the function giving the probabilities of events occurring is called the probability mass function \(p(x)\). In the discrete case, the probability of outcome \(x\) occurring is just \(p(x)\) itself. The probability \(P(a\leq X \leq b)\) is given in the discrete case by:
\[P(a\leq X \leq b) = \sum_{a\leq x \leq b} p(x),\]
and the probability mass function is normalized to one so that:
\[\sum_x p(x) = 1,\]
where the sum is taken over all possible values of \(x\). One can see that the analogous formulas for continuous random variables are identical, with the sums promoted to integrals.
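As a quick numerical illustration of these two normalization conditions, here is a minimal sketch (assuming Python with NumPy and SciPy, which are not part of the original discussion): it sums the probability mass function of a fair die and integrates the standard normal PDF over the real line, and both totals come out to one.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Discrete case: a fair six-sided die has p(x) = 1/6 for each face,
# and the probability masses sum to one.
p = np.full(6, 1 / 6)
print(p.sum())  # 1.0

# Continuous case: the standard normal PDF integrates to one
# over the entire real line.
total, _ = quad(norm.pdf, -np.inf, np.inf)
print(total)  # ~1.0
```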
The non-normalized probability density function of a certain continuous random variable \(X\) is:
\[f(x) = \frac{1}{1+x^2}.\]
Find the probability that \(X\) is greater than one, \(P(X > 1)\).
Solution:
First, the probability density function must be normalized. This is done by multiplying by a constant to make the total integral one. Computing the integral:
\[\int_{-\infty}^{\infty} \frac{1}{1+x^2} \,dx = \bigl. \arctan (x)\bigr|_{-\infty}^{\infty} = \pi.\]
So the normalized PDF is:
\[\tilde{f}(x) = \frac{1}{\pi(1+x^2)}.\]
Computing the probability that \(X\) is greater than one,
\[P(X>1) = \int_1^{\infty} \frac{1}{\pi(1+x^2)} \,dx = \frac{1}{\pi} \bigl. \arctan(x) \bigr|_1^{\infty} = \frac{1}{\pi} \left(\frac{\pi}{2} - \frac{\pi}{4}\right) =\frac{1}{4}.\]
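To double-check this result numerically, here is a minimal sketch (again assuming Python with SciPy): it recovers the normalization constant \(\pi\) and the probability \(\frac{1}{4}\) by numerical integration.

```python
import numpy as np
from scipy.integrate import quad

f = lambda x: 1 / (1 + x**2)  # the non-normalized PDF

# The normalization constant should be pi.
Z, _ = quad(f, -np.inf, np.inf)
print(Z, np.pi)  # both ~3.14159

# P(X > 1) under the normalized density should be 1/4.
prob, _ = quad(lambda x: f(x) / Z, 1, np.inf)
print(prob)  # ~0.25
```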
Note that the probability density function of a continuous random variable need not itself be continuous.
In contrast to the case of discrete random variables, the probability density function \(f_X(x)\) of a continuous random variable need not satisfy the condition \(f_X(x)\leq 1\). A uniformly distributed continuous random variable on the interval \([0,\frac{1}{2}]\) has constant probability density function \(f_X(x)=2\) on \([0,\frac{1}{2}]\). Another example is the unbounded probability density function \(f_X(x) = \frac{1}{2\sqrt{x}}, 0< x <1\) of a continuous random variable taking values in \((0,1)\).
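One can check numerically that the second, unbounded density still integrates to one; here is a minimal sketch (assuming Python with SciPy, whose adaptive quadrature handles the integrable singularity at \(x = 0\)):

```python
from scipy.integrate import quad

# f(x) = 1/(2*sqrt(x)) blows up near x = 0 but is still integrable on (0, 1).
total, _ = quad(lambda x: 1 / (2 * x**0.5), 0, 1)
print(total)  # ~1.0, so f is a valid PDF despite being unbounded
```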
Mean and Variance of Continuous Random Variables
Recall that in the discrete case, the mean or expected value \(E(X)\) of a discrete random variable is the weighted average of the possible values \(x\) of the random variable:
\[E(X) = \sum_x x p(x).\]
This formula makes intuitive sense. Suppose that there were \(n\) outcomes, equally likely with probability \(\frac{1}{n}\) each. Then the expected value is just the arithmetic mean, \(E(X) = \frac{x_1 + x_2 + \ldots + x_n}{n}\). In cases where some outcomes are more likely than others, those outcomes should contribute more to the expected value.
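For example, a fair six-sided die roll has expected value
\[E(X) = \sum_{x=1}^{6} x \cdot \frac{1}{6} = \frac{1+2+3+4+5+6}{6} = \frac{7}{2},\]
even though \(\frac{7}{2}\) is not itself a possible outcome.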
In the continuous case, the generalization is again found just by replacing the sum with the integral and \(p(x)\) with the PDF:
\[E(X) = \int_{-\infty}^{\infty} x f(x) \,dx,\]
assuming the possible values of \(X\) are the entire real line. If \(X\) is instead constrained to \([0,\infty)\) or some other interval, the limits of integration should be changed accordingly.
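For example, a uniform random variable on \([0,1]\) has PDF \(f(x) = 1\) there, so
\[E(X) = \int_0^1 x \,dx = \frac{1}{2},\]
matching the intuition that the average position of a uniformly random point in \([0,1]\) is the midpoint.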
The variance is defined identically to the discrete case:
\[\text{Var} (X) = E(X^2) - E(X)^2.\]
Computing \(E(X^2)\) only requires inserting an \(x^2\) instead of an \(x\) in the above formulae:
\[E(X^2) = \int_{-\infty}^{\infty} x^2 f(x) \,dx.\]
The mean and variance of a continuous random variable need not be finite, or even exist. A Cauchy-distributed continuous random variable is an example of a continuous random variable for which both the mean and the variance are undefined.
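To see this phenomenon in practice, here is a minimal simulation sketch (assuming Python with NumPy; the normal sample is included only for comparison): sample means of Cauchy draws fail to settle down as the sample size grows, while the normal sample means converge toward zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample means of a standard Cauchy do not converge as n grows,
# reflecting its undefined mean; a standard normal's sample means do.
for n in (10**3, 10**5, 10**7):
    print(n, rng.standard_cauchy(n).mean(), rng.normal(size=n).mean())
```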
Show that the exponential random variable given by the normalized PDF:
\[f(x) = \lambda e^{-\lambda x}\]
has mean \(E(X) = \frac{1}{\lambda}\) and variance \(\text{Var}(X) = \frac{1}{\lambda^2}\).
Solution:
Note that the exponential random variable is defined for \(x\) in the range \([0,\infty)\) (if it is not obvious why, observe that the PDF is normalized only over this range). Computing the expected values that define the mean and variance, respectively, using integration by parts:
\[E(X) = \int_0^{\infty} \lambda x e^{-\lambda x}\,dx = \Bigl. -x e^{-\lambda x}\Bigr|_0^{\infty} + \int_0^{\infty}e^{-\lambda x}\,dx = \frac{1}{\lambda},\]
since the boundary term vanishes at both limits.
The mean is therefore indeed \(\frac{1}{\lambda}\).
\[E(X^2) = \int_0^{\infty} \lambda x^2 e^{-\lambda x}\,dx = \int_0^{\infty} 2x e^{-\lambda x}\,dx = \frac{2}{\lambda} E(X) = \frac{2}{\lambda^2}.\]
So the variance is:
\[\text{Var}(X) = E(X^2) - E(X)^2 = \frac{2}{\lambda^2}- \frac{1}{\lambda^2} = \frac{1}{\lambda^2}.\]
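These closed-form results can be checked by numerical integration; here is a minimal sketch (assuming Python with SciPy, and an arbitrary rate \(\lambda = 2\) chosen only for the check):

```python
import numpy as np
from scipy.integrate import quad

lam = 2.0  # arbitrary rate parameter, chosen only for this check
pdf = lambda x: lam * np.exp(-lam * x)

EX, _ = quad(lambda x: x * pdf(x), 0, np.inf)       # mean E(X)
EX2, _ = quad(lambda x: x**2 * pdf(x), 0, np.inf)   # second moment E(X^2)
print(EX, 1 / lam)               # both ~0.5
print(EX2 - EX**2, 1 / lam**2)   # both ~0.25
```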
Suppose a continuous random variable \(X\) is given by the PDF:
\[f(x) = \begin{cases} 2x \quad & x \in [0,1] \\ 0 \quad & \text{otherwise.} \end{cases}\]
If the mean of \(X\) is \(A\) and the variance of \(X\) is \(B\), what is \(A+B\)?
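For readers who want to verify their answer, here is a minimal numerical sketch (assuming Python with SciPy) that computes the mean \(A\), the variance \(B\), and their sum directly from the PDF:

```python
from scipy.integrate import quad

f = lambda x: 2 * x  # the given PDF on [0, 1]; zero elsewhere

A, _ = quad(lambda x: x * f(x), 0, 1)        # mean E(X)
EX2, _ = quad(lambda x: x**2 * f(x), 0, 1)   # second moment E(X^2)
B = EX2 - A**2                               # variance
print(A + B)
```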