Gamma Distribution
The gamma distribution is a generalization of the exponential distribution. While the exponential distribution models the time between consecutive events in a Poisson process, the gamma distribution models the waiting time until the \(k^\text{th}\) event; equivalently, the sum of \(k\) i.i.d. exponential random variables follows a gamma distribution.
The gamma distribution is often used to model waiting times, particularly in lifespan testing, where the "waiting time" until failure or death is modeled by a gamma distribution. It is also commonly used in applied fields such as finance, civil engineering, climatology (e.g. in estimating rainfall), and econometrics.
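The sum-of-exponentials characterization can be checked empirically. The following sketch (standard library only; the parameter choices \(k = 3\), \(\theta = 2\) are illustrative) draws many sums of \(k\) exponential random variables with mean \(\theta\) and compares the sample mean and variance with the Gamma\((k, \theta)\) values \(k\theta = 6\) and \(k\theta^2 = 12\):

```python
import random
import statistics

random.seed(0)
k, theta = 3, 2.0  # illustrative shape and scale parameters

# Each draw is the sum of k i.i.d. Exponential(mean theta) variables,
# which should follow a Gamma(k, theta) distribution.
sums = [sum(random.expovariate(1 / theta) for _ in range(k))
        for _ in range(100_000)]

sample_mean = statistics.fmean(sums)   # should be close to k*theta = 6
sample_var = statistics.pvariance(sums)  # should be close to k*theta^2 = 12

print(round(sample_mean, 2))
print(round(sample_var, 2))
```

With 100,000 draws, the sample mean and variance land within a few percent of the theoretical values.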
Definition and p.d.f.
There are multiple parametrizations of the gamma distribution. The first is the \(k\)-\(\theta\) parametrization, perhaps the more natural one, with p.d.f.
\[f_X(x) = \frac{1}{\Gamma(k)\theta^k}x^{k-1}e^{-\frac{x}{\theta}} \quad (x > 0),\]
where \(\Gamma\) denotes the gamma function, which lends its name to the distribution. This parametrization is natural because, when \(k\) is a positive integer, it is the distribution of the sum of \(k\) i.i.d. exponential random variables with mean \(\theta\), and it retains essentially the same interpretation for nonintegral \(k\). Here \(k\) is the shape parameter and \(\theta\) is the scale parameter.
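The density above is straightforward to evaluate directly; \(\Gamma\) is available in Python as `math.gamma`. A minimal sketch (the choice \(k = 3\), \(\theta = 2\) is illustrative), with a numerical sanity check that the density integrates to 1:

```python
import math

def gamma_pdf(x, k, theta):
    """p.d.f. of Gamma(k, theta) in the shape-scale parametrization."""
    if x <= 0:
        return 0.0
    return x ** (k - 1) * math.exp(-x / theta) / (math.gamma(k) * theta ** k)

# Sanity check: the density should integrate to 1. A midpoint rule over
# [0, 60] captures essentially all of the mass for k = 3, theta = 2.
k, theta = 3, 2.0
dx = 0.001
total = sum(gamma_pdf((i + 0.5) * dx, k, theta) * dx for i in range(60_000))
print(round(total, 4))  # approximately 1.0
```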
The alternative parametrization, the \(\alpha\)-\(\beta\) parametrization, is commonly used in Bayesian statistics as a conjugate prior for rate parameters in distributions such as the exponential distribution and the gamma distribution itself. It has p.d.f.
\[f_X(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}x^{\alpha-1}e^{-\beta x} \quad (x > 0),\]
which is obtained from the first by substituting \(\beta = \frac{1}{\theta}\) and \(\alpha = k\), and so describes the same family of distributions. Here \(\alpha\) is the shape parameter and \(\beta\) is the inverse scale parameter (also called the rate parameter).
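The equivalence of the two parametrizations under \(\beta = \frac{1}{\theta}\), \(\alpha = k\) can be verified pointwise. The sketch below codes each p.d.f. formula directly (the parameter values and evaluation points are arbitrary):

```python
import math

def pdf_k_theta(x, k, theta):
    # shape-scale form: x^{k-1} e^{-x/theta} / (Gamma(k) theta^k)
    return x ** (k - 1) * math.exp(-x / theta) / (math.gamma(k) * theta ** k)

def pdf_alpha_beta(x, alpha, beta):
    # shape-rate form: beta^alpha x^{alpha-1} e^{-beta x} / Gamma(alpha)
    return beta ** alpha * x ** (alpha - 1) * math.exp(-beta * x) / math.gamma(alpha)

k, theta = 2.5, 1.5
alpha, beta = k, 1 / theta  # the substitution relating the two forms

for x in (0.5, 1.0, 3.0, 7.0):
    assert math.isclose(pdf_k_theta(x, k, theta), pdf_alpha_beta(x, alpha, beta))
print("parametrizations agree")
```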
Properties
There are several important values that give information about a particular probability distribution. The most important are as follows:
- The mean, or expected value, of a distribution is the long-run average one would expect from a large number of repeated trials.
- The median of a distribution is another measure of central tendency, useful when the distribution contains outliers (i.e. particularly large/small values) that make the mean misleading.
- The mode of a distribution is the value at which its probability density (or mass) function attains its maximum.
- The variance of a distribution measures how "spread out" the data is. Related is the standard deviation, the square root of the variance, useful due to being in the same units as the data.
Of these, the mean, mode, and variance can be explicitly calculated for the gamma distribution, while the median has no simple closed form and only partial results are known. The known quantities are as follows:
- The mean of the \(\text{Gamma}(k,\theta)\) distribution is \(k\theta\). Equivalently, in the alternate parametrization, the mean of the \(\text{Gamma}(\alpha, \beta)\) distribution is \(\frac{\alpha}{\beta}\).
- The mode of the \(\text{Gamma}(k,\theta)\) distribution is \((k-1)\theta\), provided \(k \geq 1\). Equivalently, the mode of the \(\text{Gamma}(\alpha, \beta)\) distribution is \(\frac{\alpha - 1}{\beta}\), provided \(\alpha \geq 1\).
- The variance of the \(\text{Gamma}(k,\theta)\) distribution is \(k\theta^2\). Equivalently, the variance of the \(\text{Gamma}(\alpha, \beta)\) distribution is \(\frac{\alpha}{\beta^2}\).
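These formulas can be checked numerically from the p.d.f. alone. The sketch below (the choice \(k = 4\), \(\theta = 1.5\) is illustrative) approximates the mean and variance by a midpoint-rule integral and locates the mode by a grid search, then compares against \(k\theta = 6\), \(k\theta^2 = 9\), and \((k-1)\theta = 4.5\):

```python
import math

def gamma_pdf(x, k, theta):
    """p.d.f. of Gamma(k, theta) in the shape-scale parametrization."""
    return x ** (k - 1) * math.exp(-x / theta) / (math.gamma(k) * theta ** k)

k, theta = 4.0, 1.5
dx = 0.001
xs = [(i + 0.5) * dx for i in range(int(80 / dx))]  # [0, 80] holds ~all mass
weights = [gamma_pdf(x, k, theta) * dx for x in xs]

mean = sum(x * w for x, w in zip(xs, weights))           # ~ k*theta = 6.0
var = sum((x - mean) ** 2 * w for x, w in zip(xs, weights))  # ~ k*theta^2 = 9.0
mode = max(zip(weights, xs))[1]                          # ~ (k-1)*theta = 4.5

print(round(mean, 3))
print(round(var, 3))
print(round(mode, 3))
```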