Continuous Random Variables - Cumulative Distribution Function
The cumulative distribution function (CDF) of a continuous random variable is a function derived from its probability density function (PDF). It gives the probability of finding the random variable at a value less than or equal to a given cutoff. Many questions and computations about probability distributions are easier to rephrase or carry out in terms of CDFs, for example computing the PDF of a function of a random variable.
Definition of the Cumulative Distribution Function
For any random variable \(X,\) the cumulative distribution function \(F_X\) is defined as
\[F_X(x) = P(X \leq x),\]
which is the probability that \(X\) is less than or equal to \(x.\)
Using this definition, one can write the probability that \(X\) takes a value in a certain interval \([a,b]\) without using an integral. Recall that previously this probability was defined in terms of a PDF:
\[P(a\leq X \leq b) = \int_a^b f_X (x) \,dx.\]
Now, the probability is rewritten as the difference in values of the CDF:
\[P(a \leq X \leq b) = F_X(b) - F_X(a).\]
So the difference between two values of the CDF gives the area underneath the PDF between the two points. The CDF itself never decreases: it rises from zero (as \(x \to -\infty\)) to one (as \(x \to \infty\)). As \(x \to -\infty\), the probability \(P(X \leq x)\) tends to zero, because a normalized PDF has vanishing area in its far left tail. As \(x \to \infty\), \(F_X(x)\) tends to \(P(X < \infty)\), which is one because it is certain that \(X\) takes some finite value.
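The equivalence between the integral of the PDF and the difference of CDF values can be checked numerically. The following is a minimal sketch, not a definitive implementation. It assumes a standard normal \(X\), chosen only because Python's standard library provides both its PDF and CDF; the interval \([-1, 2]\) and the helper `integrate_pdf` are likewise illustrative choices.

```python
from statistics import NormalDist

# Assumption: X is standard normal. Any continuous distribution with a
# known PDF and CDF would serve equally well here.
X = NormalDist(mu=0.0, sigma=1.0)

def integrate_pdf(dist, a, b, steps=10_000):
    """Approximate the integral of dist.pdf over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return sum(dist.pdf(a + (i + 0.5) * width) for i in range(steps)) * width

a, b = -1.0, 2.0
prob_from_integral = integrate_pdf(X, a, b)   # area under the PDF
prob_from_cdf = X.cdf(b) - X.cdf(a)           # F_X(b) - F_X(a)

print(prob_from_integral, prob_from_cdf)
```

The two printed numbers should agree to many decimal places, with the CDF version requiring no numerical integration at all.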
In the case of discrete random variables, the value of \(F_X\) makes a discrete jump at all possible values of \(x\); the size of the jump corresponds to the probability \(P(X = x)\) of that value. In the case of a continuous random variable, the function increases continuously; it is not meaningful to speak of the probability that \(X = x\) because this probability is always zero. Instead one considers the probability that the value of \(X\) lies in a given interval:
\[P(X \in [a,b]) = P(a \leq X \leq b) = F_X(b)-F_X(a).\]
Note that it does not matter if the inequalities are strict (if the interval is \([a,b]\) or \((a,b)\) for example): since the probability of any given value is zero, the endpoints can be included or not without changing any probabilities.
Still, one frequently wants to make use of the probability density function \(f_X (x)\) rather than the CDF. Since the CDF corresponds to the integral of the PDF, the PDF corresponds to the derivative of the CDF:
\[f_X(x) = F_X'(x) = \frac{dF_X}{dx} .\]
A fly lands on a \(30\text{ cm}\) long ruler at a random position chosen uniformly along the ruler. Let \(X\) be the position of the fly in centimeters, and let \(f_X(x)\) be the probability density function for \(X.\) What is \(f_X(5)\)?
Solution:
This probability distribution is uniform, meaning that the probability density is constant on the entire interval \([0, 30]\). This means that \(F_X\) is linear on that interval: \[F_X(x) = \left\{\begin{array}{ll} 0 & x \leq 0 \\ \frac{x}{30} & 0 \leq x \leq 30 \\ 1 & 30 \leq x. \end{array}\right.\] The probability density function is the derivative: \[f_X(x) = \left\{\begin{array}{ll} 0 & x < 0 \\ \frac{1}{30} & 0 \leq x \leq 30 \\ 0 & 30 < x. \end{array}\right.\]
Therefore the probability density function at \(x = 5\) is equal to \(\frac{1}{30}.\)
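As a quick numerical check of this example, the CDF can be coded directly and differentiated with a finite difference. This is only a sketch: the function names `ruler_cdf` and `numerical_pdf` and the step size `h` are illustrative assumptions, not part of the problem.

```python
def ruler_cdf(x, length=30.0):
    """CDF of a uniform position on a ruler of the given length (in cm)."""
    if x <= 0.0:
        return 0.0
    if x >= length:
        return 1.0
    return x / length

def numerical_pdf(cdf, x, h=1e-4):
    """Central-difference estimate of the derivative of cdf at x."""
    return (cdf(x + h) - cdf(x - h)) / (2.0 * h)

density_at_5 = numerical_pdf(ruler_cdf, 5.0)       # inside the ruler
density_outside = numerical_pdf(ruler_cdf, 40.0)   # beyond the ruler
prob_10_to_25 = ruler_cdf(25.0) - ruler_cdf(10.0)  # P(10 <= X <= 25)

print(density_at_5, density_outside, prob_10_to_25)
```

The estimate at \(x = 5\) should be close to \(\frac{1}{30}\), the density beyond the end of the ruler is zero, and the interval probability is a plain difference of CDF values.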
A dart player always hits the dartboard (with a radius of \(20\text{ cm}\)), but has such a poor aim that the distribution of darts is uniform across the entire board. Let \(R\) be the distance in cm between the dart and the center. Evaluate the probability density function for \(R\) at \(0,\) \(10,\) and \(20.\)
Solution:
The probability \(P(R < r)\) is directly proportional to the area of a circle with radius \(r\):
\[F_R(r) = P(R < r) = \frac{\text{area of circle with radius}\ r}{\text{area of dartboard}} = \frac{\pi r^2}{\pi\times 20^2} = \left(\frac r{20}\right)^2.\]
The probability density function is the derivative:
\[f_R(r) = \frac{d}{dr}\left(\frac r{20}\right)^2 = \frac{2r}{400} = \frac r{200} \quad \text{for } 0 \leq r \leq 20.\]
Thus one obtains:
\[f_R(0) = 0,\ \ f_R(10) = \tfrac1{20},\ \ f_R(20) = \tfrac1{10}.\]
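The CDF \(F_R(r) = (r/20)^2\) can also be checked by simulation. The sketch below assumes rejection sampling from the enclosing square as the way to draw points uniformly on the board. The seed, the sample size, and the helper names are arbitrary illustrative choices.

```python
import random

RADIUS = 20.0  # dartboard radius in cm, from the problem statement

def throw_dart(rng):
    """Return the distance from the center of a point uniform on the board.

    Rejection sampling: draw a point from the enclosing square and keep
    it only if it lands inside the circle.
    """
    while True:
        x = rng.uniform(-RADIUS, RADIUS)
        y = rng.uniform(-RADIUS, RADIUS)
        if x * x + y * y <= RADIUS * RADIUS:
            return (x * x + y * y) ** 0.5

def empirical_cdf(samples, r):
    """Fraction of samples that are less than or equal to r."""
    return sum(1 for d in samples if d <= r) / len(samples)

rng = random.Random(0)  # fixed seed so the run is repeatable
distances = [throw_dart(rng) for _ in range(200_000)]

# Compare the simulated CDF with (r / 20)^2 at a few radii.
for r in (5.0, 10.0, 15.0):
    print(r, empirical_cdf(distances, r), (r / RADIUS) ** 2)

# Rough density estimate at r = 10 from a 1 cm wide window.
density_near_10 = empirical_cdf(distances, 10.5) - empirical_cdf(distances, 9.5)
print(density_near_10)
```

The last value should land near \(f_R(10) = \tfrac1{20}\), up to sampling noise.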
Functions of a Continuous Random Variable
One question that often comes up in applications of continuous probability is the following: given the PDF of a random variable, is it possible to find the PDF of an arbitrary function of that random variable?
The answer is yes, and the easiest method uses the CDF of the random variable. The general case goes as follows: consider the CDF \(F_X (x)\) of the random variable \(X\), and let \(Z = g(X)\) be a function of \(X\). It's important to note the distinction between upper and lower case: \(X\) is a random variable while \(x\) is a real number. Recall that the PDF is given by the derivative of the CDF:
\[f_X (x) = \frac{d}{dx} F_X (x) = \frac{d}{dx} P(X \leq x).\]
Now write the PDF of \(Z\) as the derivative of its CDF. If \(g\) is invertible and increasing, then \(g(X) \leq z\) is equivalent to \(X \leq g^{-1}(z)\), so:
\[f_Z (z) = \frac{d}{dz} P(Z \leq z) = \frac{d}{dz} P(g(X) \leq z) = \frac{d}{dz} P(X \leq g^{-1} (z)) = \frac{d}{dz} F_X (g^{-1} (z)).\]
By the chain rule:
\[f_Z (z) = f_X (g^{-1} (z)) \frac{dg^{-1} (z)}{dz}.\]
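This formula can be generalized to cases where \(g\) is not increasing or not invertible. For example, if \(g\) is invertible and decreasing, the inequality reverses, so \(P(g(X) \leq z) = P(X \geq g^{-1}(z)) = 1 - F_X(g^{-1}(z))\), and the result is \(f_Z (z) = f_X (g^{-1} (z)) \left|\frac{dg^{-1} (z)}{dz}\right|\).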
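The formula for an invertible, increasing \(g\) can be sketched as a small helper and compared against a direct numerical derivative of \(F_X(g^{-1}(z))\). The example is hypothetical and not taken from the text: it assumes \(X\) is standard normal and \(g(x) = e^x\), so that \(g^{-1}(z) = \ln z\) with derivative \(1/z\) for \(z > 0\). The helper name `pdf_of_transformed` is likewise an illustrative choice.

```python
import math
from statistics import NormalDist

def pdf_of_transformed(pdf_x, g_inverse, g_inverse_derivative):
    """Build f_Z for Z = g(X), assuming g is invertible and increasing."""
    def pdf_z(z):
        return pdf_x(g_inverse(z)) * g_inverse_derivative(z)
    return pdf_z

# Hypothetical example: X standard normal, g(x) = exp(x).
X = NormalDist()
pdf_z = pdf_of_transformed(X.pdf, math.log, lambda z: 1.0 / z)

def cdf_z(z):
    """F_Z(z) = F_X(g^{-1}(z)), valid for z > 0."""
    return X.cdf(math.log(z))

# Differentiate the CDF of Z numerically and compare with the formula.
z, h = 2.0, 1e-5
finite_difference = (cdf_z(z + h) - cdf_z(z - h)) / (2.0 * h)

print(pdf_z(z), finite_difference)
```

Passing \(g^{-1}\) and its derivative explicitly keeps the helper free of symbolic or numerical inversion, at the cost of asking the caller to supply both.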
Let \(X\) be a uniform random variable on the interval \([0,1]\). Find the distribution (i.e., PDF) of \(Z = X^3\).
Solution:
Note that \(Z = g(X)\) with \(g(x) = x^3\), which is invertible and increasing on \([0,1]\), so the discussion above applies with \(g^{-1}(z) = z^{1/3}\). The CDF of \(X\) is:
\[F_X (x) = x \quad \text{for } 0 \leq x \leq 1.\]
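So, for \(0 < z \leq 1\):
\[f_Z (z) = \frac{d}{dz} F_X (g^{-1} (z)) = \frac{d}{dz} z^{1/3} = \frac13 z^{-2/3}.\]
This agrees with the chain-rule formula derived above, since \(f_X (g^{-1} (z)) = 1\) and \(\frac{dg^{-1} (z)}{dz} = \frac13 z^{-2/3}\).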
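This result can be sanity-checked by simulation: cube many uniform samples and compare the fraction falling below \(z\) with \(F_Z(z) = z^{1/3}\). The sketch below is illustrative only. The seed, the sample size, and the test points are arbitrary assumptions.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

# Z = X^3 with X uniform on [0, 1).
samples = [rng.random() ** 3 for _ in range(200_000)]

def empirical_cdf(values, z):
    """Fraction of values that are less than or equal to z."""
    return sum(1 for v in values if v <= z) / len(values)

def pdf_z(z):
    """f_Z(z) = (1/3) z^(-2/3), valid for 0 < z <= 1."""
    return z ** (-2.0 / 3.0) / 3.0

# Compare the simulated CDF with z^(1/3) at a few points.
for z in (0.001, 0.125, 0.5):
    print(z, empirical_cdf(samples, z), z ** (1.0 / 3.0))
```

Note that the density grows without bound as \(z \to 0\), yet the CDF stays finite: about a tenth of the samples fall below \(z = 0.001\).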