# Poisson Distribution

The **Poisson distribution** is the probability distribution of the number of events occurring in a given time period, given the average number of times the event occurs over that time period. It is especially useful for scheduling, since it describes how many visitors are likely to arrive over a certain interval, and it has applications in many other areas such as biology (especially mutation detection), finance, disaster readiness, and any other situation in which events occur independently of time.

## Conditions for use

The Poisson distribution is applicable only when several conditions hold:

- An event can occur any number of times (or effectively so) during a time period.
- Events occur independently: in other words, if an event occurs, it does not affect the probability of another event occurring in the same time period.
- The rate of occurrence is constant; that is, the rate does not change based on time.
- The expected number of events is proportional to the length of the time period. For example, twice as many events should be expected, on average, in a 2-hour period as in a 1-hour period.

The Poisson distribution can also be thought of as a limiting case of the binomial distribution. Let \(n\) be the number of trials and \(p\) the probability of success on each trial in a binomial distribution. If \(n\) tends to infinity and \(p\) tends to 0 while the product \(np\) is held constant, the binomial distribution behaves like a Poisson distribution whose mean is that constant. This can be proved using the fact that convergence of moment generating functions implies convergence in distribution.
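The limiting behaviour can be checked numerically. The following is a minimal sketch, not a proof: the helper names `binomial_pmf`, `poisson_pmf`, and `max_gap`, the value `lam = 4.0` for the fixed product \(np\), and the cutoff `k_max = 20` are all choices made for this illustration.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events for a Poisson variable with mean lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)

def max_gap(n: int, lam: float, k_max: int = 20) -> float:
    """Largest pointwise difference between Binomial(n, lam/n) and Poisson(lam)
    over k = 0, ..., k_max (k_max is an arbitrary cutoff for this sketch)."""
    p = lam / n  # keeps the product n * p fixed at lam
    return max(
        abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
        for k in range(k_max + 1)
    )

lam = 4.0  # assumed value of the constant product n * p
for n in (10, 100, 1000, 10000):
    print(n, max_gap(n, lam))
```

The printed gap should shrink as \(n\) grows, which is the numerical counterpart of the convergence described above.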

For example, the Poisson distribution is appropriate for modelling the number of phone calls an office would receive during the noon hour, if they know that they average 4 calls per hour during that time period.

- The events are effectively independent, since there is no reason to expect a caller to affect the chances of another person calling,
- the occurrence rate may be assumed to be constant,
- and it is reasonable to assume that (for example) the probability of getting a call in the first half hour is the same as the probability of getting a call in the final half hour.

Finally, there can be essentially any number of calls during the time period, even though this is technically limited by outside factors: for instance, the office certainly cannot receive a trillion calls during that time period, as there are fewer than a trillion people alive to be making calls. Practically speaking, however, there is no upper bound on the number of calls.

## Finding the Poisson distribution

If the Poisson distribution is appropriate, then the average number of events over the time period is denoted \(\lambda\) (lambda). The probability of exactly \(k\) events occurring over the time period is

\[\text{Pr}(X=k) = \frac{\lambda^ke^{-\lambda}}{k!},\]

where \(e\) is Euler's number (roughly 2.718).

**Example.** In the World Cup, an average of 2.5 goals are scored per match. If the Poisson model is applicable, what is the probability that \(k\) goals are scored in a match?

In this instance, \(\lambda=2.5\). The above formula applies directly:

\[ \begin{align*}
\text{Pr}(X=0) &= \frac{2.5^0e^{-2.5}}{0!} \approx 0.082 \\
\text{Pr}(X=1) &= \frac{2.5^1e^{-2.5}}{1!} \approx 0.205 \\
\text{Pr}(X=2) &= \frac{2.5^2e^{-2.5}}{2!} \approx 0.257 \\
\text{Pr}(X=3) &= \frac{2.5^3e^{-2.5}}{3!} \approx 0.214 \\
\text{Pr}(X=4) &= \frac{2.5^4e^{-2.5}}{4!} \approx 0.134 \\
&\ \ \vdots
\end{align*} \]
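The same values can be reproduced with a few lines of code. This is a minimal sketch using only the Python standard library; the function name `poisson_pmf` and the variable names are chosen for this illustration.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Pr(X = k) for a Poisson random variable with mean lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)

lam = 2.5  # average number of goals, taken from the example above
probs = [poisson_pmf(k, lam) for k in range(5)]

# Print each probability rounded to three decimal places.
for k, p in enumerate(probs):
    print(f"Pr(X={k}) = {p:.3f}")
```

Summing `poisson_pmf(k, lam)` over a long enough range of `k` should give a value very close to 1, which is a quick sanity check on the formula.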

This can also be represented pictorially, as in the following image:

## Properties of the Poisson distribution

There are several important values that give information about a particular probability distribution. The most important are:

- The **mean**, or **expected value**, of a distribution gives useful information about what average one would expect from a large number of repeated trials.
- The **median** of a distribution is another measure of central tendency, useful when the distribution contains **outliers** (i.e. particularly large/small values) that make the mean misleading.
- The **mode** of a distribution is the value that has the highest probability of occurring.
- The **variance** of a distribution measures how "spread out" the data is. Related is the **standard deviation**, the square root of the variance, useful due to being in the same units as the data.

Three of these values -- the mean, mode, and variance -- can be calculated exactly for any Poisson distribution. The median has no simple closed form, but some bounds are known.

The mean is particularly nice:

The mean of a Poisson distribution with parameter \(\lambda\) is \(\lambda\).

By the definition of expected value,

\[E[X] = \sum_{x \in \text{Im}(X)}x\text{Pr}(X=x),\]

where \(x \in \text{Im}(X)\) simply means that \(x\) is one of the possible values of the random variable \(X\). Applying this to the Poisson distribution,

\[ \begin{align*} E[X] &= \sum_{k = 0}^{\infty} k \cdot \frac{\lambda^ke^{-\lambda}}{k!} \\ &=\lambda e^{-\lambda}\sum_{k=1}^{\infty} \frac{\lambda^{k-1}}{(k-1)!} \\ &=\lambda e^{-\lambda}\sum_{j=0}^{\infty} \frac{\lambda^j}{j!} \\ &=\lambda e^{-\lambda}e^{\lambda} \\ &=\lambda, \end{align*} \]

where the substitution \(j=k-1\) and the Taylor series \(e^x=\sum_{k=0}^{\infty}\frac{x^k}{k!}\) were used. \(_\square\)

Similarly,

The variance of a Poisson distribution with parameter \(\lambda\) is \(\lambda\).

The proof involves the routine (but computationally intensive) calculation that \(E[X^2]=\lambda^2+\lambda\). Then using the formula for variance \[\text{Var}[X] = E[X^2]-E[X]^2,\] we have \(\text{Var}[X]=\lambda^2+\lambda-\lambda^2=\lambda\).
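One common route to \(E[X^2]\), sketched here as an illustration rather than as the only possible computation, is to first find \(E[X(X-1)]\), since the factor \(k(k-1)\) cancels neatly against \(k!\):

\[ \begin{align*} E[X(X-1)] &= \sum_{k=0}^{\infty} k(k-1) \cdot \frac{\lambda^ke^{-\lambda}}{k!} \\ &=\lambda^2 e^{-\lambda}\sum_{k=2}^{\infty} \frac{\lambda^{k-2}}{(k-2)!} \\ &=\lambda^2 e^{-\lambda}e^{\lambda} \\ &=\lambda^2. \end{align*} \]

Since \(X^2 = X(X-1) + X\), linearity of expectation gives \(E[X^2] = E[X(X-1)] + E[X] = \lambda^2 + \lambda\).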

The mode is only slightly more complicated:

If \(\lambda\) is not an integer, the mode of a Poisson distribution with parameter \(\lambda\) is \(\lfloor \lambda \rfloor\). Otherwise, both \(\lambda\) and \(\lambda-1\) are modes.
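A short way to see this, given here as a sketch, is to compare consecutive probabilities:

\[\frac{\text{Pr}(X=k+1)}{\text{Pr}(X=k)} = \frac{\lambda^{k+1}e^{-\lambda}/(k+1)!}{\lambda^{k}e^{-\lambda}/k!} = \frac{\lambda}{k+1}.\]

The ratio is greater than 1 when \(k+1 < \lambda\), equal to 1 when \(k+1 = \lambda\), and less than 1 when \(k+1 > \lambda\). The probabilities therefore increase up to \(k = \lfloor \lambda \rfloor\) and decrease afterwards. When \(\lambda\) is an integer, the ratio equals 1 at \(k = \lambda - 1\), so \(\text{Pr}(X=\lambda-1) = \text{Pr}(X=\lambda)\) and the two values tie for the maximum.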

Finally, the median is bounded by:

The median \(\rho\) of a Poisson distribution with parameter \(\lambda\) satisfies

\[\lambda-\ln 2 \leq \rho \leq \lambda+\frac{1}{3}.\]
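The bounds can be checked numerically for particular values of \(\lambda\). The sketch below assumes the usual convention that the median is the smallest integer \(m\) with \(\text{Pr}(X \leq m) \geq \frac{1}{2}\); the function names and the sample values of `lam` are chosen for this illustration.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Pr(X = k) for a Poisson random variable with mean lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)

def poisson_median(lam: float) -> int:
    """Smallest integer m with Pr(X <= m) >= 1/2 (assumed median convention)."""
    cdf = 0.0
    k = 0
    while True:
        cdf += poisson_pmf(k, lam)  # accumulate the cumulative probability
        if cdf >= 0.5:
            return k
        k += 1

# Compare the computed median with the stated lower and upper bounds.
for lam in (0.5, 1.0, 2.5, 4.0, 10.0):
    m = poisson_median(lam)
    lower, upper = lam - math.log(2), lam + 1 / 3
    print(lam, m, lower <= m <= upper)
```

This only spot-checks a handful of values; it is not a substitute for the proof of the inequality.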

## Practical applications

The classical example of the Poisson distribution is the number of Prussian soldiers accidentally killed by horse kicks, which was the first application of the distribution to a large real-world data set. Ten army corps were observed over 20 years, for a total of 200 observations (one per corps per year), and 122 soldiers were killed by horse kicks over that time period. The question is how many deaths a single corps should expect in a year, and the observed counts turn out to be excellently modeled by the Poisson distribution (with \(\lambda = 122/200 = 0.61\)):

| # of deaths | Predicted % | Expected # of occurrences | Actual # of occurrences |
|---|---|---|---|
| 0 | 54.34 | 108.67 | 109 |
| 1 | 33.14 | 66.29 | 65 |
| 2 | 10.11 | 20.22 | 22 |
| 3 | 2.06 | 4.11 | 3 |
| 4 | 0.31 | 0.63 | 1 |
| 5 | 0.04 | 0.08 | 0 |
| 6 | 0.00 | 0.01 | 0 |
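
The predicted and expected columns can be recomputed from the observed counts alone. The following is a minimal sketch; the dictionary `observed` simply transcribes the "Actual" column of the table, and the other names are chosen for this illustration.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Pr(X = k) for a Poisson random variable with mean lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)

# Number of corps-years in which exactly k deaths were recorded.
observed = {0: 109, 1: 65, 2: 22, 3: 3, 4: 1, 5: 0, 6: 0}

n_obs = sum(observed.values())                        # total observations
deaths = sum(k * count for k, count in observed.items())  # total deaths
lam = deaths / n_obs                                  # estimated mean per corps-year

# Expected number of corps-years with k deaths under the Poisson model.
expected = {k: n_obs * poisson_pmf(k, lam) for k in observed}

for k in observed:
    pct = 100 * poisson_pmf(k, lam)
    print(f"{k} | {pct:.2f} | {expected[k]:.2f} | {observed[k]}")
```

Estimating \(\lambda\) as total deaths divided by total observations is the natural choice here because the mean of a Poisson distribution equals its parameter.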

The interpretation of this data is important: since the Poisson distribution measures the frequency of events under the assumption of statistical randomness, the agreement of the expected distribution with the actual data suggests that the actual data was indeed due to randomness. If the actual data resulted in many more deaths than expected, an alternate explanation should be sought (e.g. inadequate training, a clever and subtle enemy plot, etc.).

The Poisson distribution is also useful in determining the probability that a certain number of events occur over a given time period. For example, if an office averages 12 calls per hour, they can calculate that the probability of receiving at least 20 calls in an hour is

\[\sum_{k=20}^{\infty}\frac{12^ke^{-12}}{k!} \approx 2.13\%,\]

which means they can generally feel comfortable keeping only enough staff on hand to handle 20 calls. Of course, the choice of threshold depends on context; an emergency room, for instance, may still wish to have extra staff on hand.
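The infinite sum is easiest to evaluate through the complement, \(\text{Pr}(X \geq 20) = 1 - \text{Pr}(X \leq 19)\), which involves only finitely many terms. A minimal sketch, with the function name `poisson_tail` chosen for this illustration:

```python
import math

def poisson_tail(k_min: int, lam: float) -> float:
    """Pr(X >= k_min), computed as 1 - Pr(X <= k_min - 1)."""
    cdf = sum(
        lam**k * math.exp(-lam) / math.factorial(k)
        for k in range(k_min)  # k = 0, ..., k_min - 1
    )
    return 1.0 - cdf

# Office averaging 12 calls per hour: chance of at least 20 calls in an hour.
p = poisson_tail(20, 12.0)
print(f"{p:.2%}")
```

Calling `poisson_tail` with other thresholds shows how quickly the tail probability falls as the staffing level rises, which is the trade-off the paragraph above describes.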

In short, the list of applications is very long. A partial list[1] of recently studied phenomena that obey a Poisson distribution is below:

- the number of mutations on a given strand of DNA per time unit
- the number of bankruptcies that are filed in a month
- the number of arrivals at a car wash in one hour
- the number of network failures per day
- the number of file server virus infections at a data center during a 24-hour period
- the number of Airbus 330 aircraft engine shutdowns per 100,000 flight hours
- the number of asthma patient arrivals in a given hour at a walk-in clinic
- the number of hungry persons entering a McDonald's restaurant per day
- the number of work-related accidents over a given production time
- the number of births, deaths, marriages, divorces, suicides, and homicides over a given period of time
- the number of customers who call to complain about a service problem per month
- the number of visitors to a web site per minute
- the number of calls to a consumer hotline in a 5-minute period
- the number of telephone calls per minute in a small business
- the number of arrivals at a turnpike tollbooth per minute between 3 A.M. and 4 A.M. in January on the Kansas Turnpike

## References

[1] Western New England University. *Applications of the Poisson probability distribution*. Retrieved February 9, 2016 from http://www.aabri.com/SA12Manuscripts/SA12083.pdf.

**Cite as:** Poisson Distribution. *Brilliant.org*. Retrieved from https://brilliant.org/wiki/poisson-distribution/