Poisson Distribution
The Poisson distribution is the discrete probability distribution of the number of events occurring in a given time period, given the average number of times the event occurs over that time period.
A certain fast-food restaurant gets an average of 3 visitors to the drive-through per minute. This is just an average, however; the actual number of visitors in any given minute can vary.
A Poisson distribution can be used to analyze the probability of various events regarding how many customers go through the drive-through. It can allow one to calculate the probability of a lull in activity (when there are 0 customers coming to the drive-through) as well as the probability of a flurry of activity (when there are 5 or more customers coming to the drive-through). This information can, in turn, help a manager plan for these events with staffing and scheduling.
In addition to its use for staffing and scheduling, the Poisson distribution also has applications in biology (especially mutation detection), finance, disaster readiness, and many other situations in which events occur independently at a constant average rate.
Conditions for Use
The Poisson distribution is applicable only when several conditions hold.
Conditions for Poisson Distribution:
- An event can occur any number of times during a time period.
- Events occur independently. In other words, if an event occurs, it does not affect the probability of another event occurring in the same time period.
- The rate of occurrence is constant; that is, the rate does not change based on time.
- The probability of an event occurring in a short time interval is proportional to the length of that interval. Equivalently, the expected number of events is proportional to the length of the time period; for example, twice as many events should be expected in a 2-hour period as in a 1-hour period.
For example, the Poisson distribution is appropriate for modeling the number of phone calls an office would receive during the noon hour, if they know that they average 4 calls per hour during that time period.
- Although the average is 4 calls, they could theoretically get any number of calls during that time period.
- The events are effectively independent since there is no reason to expect a caller to affect the chances of another person calling.
- The occurrence rate may be assumed to be constant.
- It is reasonable to assume that (for example) the probability of getting a call in the first half hour is the same as the probability of getting a call in the final half hour.
Of course, this situation isn't a perfect theoretical fit for the Poisson distribution. For instance, the office certainly cannot receive a trillion calls during the time period, as there are fewer than a trillion people alive to make calls. Practically speaking, though, the situation is close enough that the Poisson distribution models its behavior well.
The following problem gives an idea of how the Poisson distribution was derived:
Consider a binomial random variable \(X\sim B(n,p)\).
It can be easily shown that \(P(X=k)={n\choose k}p^k{(1-p)}^{n-k}\) for \(k=0,1,2,3,\ldots,n\).
Now take the limit of the above as \(n \to \infty\). Since \(p\) must then shrink toward 0, assume instead that \(np\), the mean of the distribution, is held fixed at some finite value \(m\).
Find \(P(X=k)\) in terms of \(m\) and \(k\) for this new distribution, where \(k=0,1,2,3,\ldots\), without looking anything up or reciting any formulas from memory.
Probabilities with the Poisson Distribution
Given that a situation follows a Poisson distribution, there is a formula which allows one to calculate the probability of observing \(k\) events over a time period for any non-negative integer value of \(k\).
Let \(X\) be the discrete random variable that represents the number of events observed over a given time period. Let \(\lambda\) be the expected value (average) of \(X\). If \(X\) follows a Poisson distribution, then the probability of observing \(k\) events over the time period is
\[P(X=k) = \frac{\lambda^ke^{-\lambda}}{k!},\]
where \(e\) is Euler's number.
In the World Cup, an average of 2.5 goals are scored each game. Modeling this situation with a Poisson distribution, what is the probability that \(k\) goals are scored in a game?
In this instance, \(\lambda=2.5\). The above formula applies directly:
\[\begin{align} P(X=0) &= \frac{2.5^0e^{-2.5}}{0!} \approx 0.082\\\\ P(X=1) &= \frac{2.5^1e^{-2.5}}{1!} \approx 0.205\\\\ P(X=2) &= \frac{2.5^2e^{-2.5}}{2!} \approx 0.257\\\\ P(X=3) &= \frac{2.5^3e^{-2.5}}{3!} \approx 0.214\\\\ P(X=4) &= \frac{2.5^4e^{-2.5}}{4!} \approx 0.134\\\\ &\ \ \vdots \end{align}\]
There is no upper limit on the value of \(k\) for this formula, though the probability rapidly approaches 0 as \(k\) increases. \(_\square\)
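As a minimal sketch (assuming Python 3 and only the standard `math` module; the helper name `poisson_pmf` is hypothetical, not part of any library), the formula can be evaluated directly to reproduce the World Cup probabilities:

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam), evaluated directly from lam^k e^(-lam) / k!."""
    if k < 0:
        return 0.0  # a count of events can never be negative
    return lam**k * math.exp(-lam) / math.factorial(k)


# World Cup example: an average of 2.5 goals per game
for k in range(5):
    print(f"P(X={k}) = {poisson_pmf(k, 2.5):.3f}")
```

Evaluating `lam**k` and `k!` separately is fine for small \(k\), but both overflow a float for large \(k\); a log-space version built on `math.lgamma` avoids that.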
A fast food restaurant gets an average of 2.8 customers approaching the register every minute.
Assuming the number of customers approaching the register per minute follows a Poisson distribution, what is the probability that 4 customers approach the register in the next minute?
Round your answer to 3 decimal places.
The Poisson distribution can also be used to calculate "less than" and "more than" probabilities by applying the sum rule and the complement rule.
A statistician records the number of cars that approach an intersection. He finds that an average of 1.6 cars approach the intersection every minute.
Assuming the number of cars that approach this intersection follows a Poisson distribution, what is the probability that 3 or more cars will approach the intersection within a minute?
For this problem, \(\lambda=1.6.\) The goal of this problem is to find \(P(X \ge 3),\) the probability that 3 or more cars approach the intersection within a minute. Since there is no upper limit on the value of \(k,\) computing this probability directly would require summing infinitely many terms. However, its complement, \(P(X \le 2),\) is a finite sum, and \(P(X \ge 3) = 1 - P(X \le 2):\)
\[\begin{align} P(X=0) &= \frac{1.6^0e^{-1.6}}{0!} \approx 0.202 \\\\ P(X=1) &= \frac{1.6^1e^{-1.6}}{1!} \approx 0.323 \\\\ P(X=2) &= \frac{1.6^2e^{-1.6}}{2!} \approx 0.258 \\\\ \Rightarrow P(X \le 2) &= P(X=0) + P(X=1) + P(X=2) \\ &\approx 0.783 \\ \\ \Rightarrow P(X \ge 3) &= 1-P(X \le 2) \\ &\approx 0.217. \end{align}\]
Therefore, the probability that there are 3 or more cars approaching the intersection within a minute is approximately \(0.217.\) \(_\square\)
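The same complement trick is easy to express in code. This is a sketch assuming Python 3; `poisson_cdf` is a hypothetical helper, and it builds each term from the previous one using \(P(X=i) = P(X=i-1)\cdot\lambda/i\), which avoids computing large factorials:

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam), as a finite sum of PMF terms."""
    if k < 0:
        return 0.0
    term = math.exp(-lam)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= lam / i  # P(X = i) = P(X = i - 1) * lam / i
        total += term
    return total


lam = 1.6  # average cars per minute
p_at_most_2 = poisson_cdf(2, lam)
p_at_least_3 = 1 - p_at_most_2  # complement rule
print(f"P(X <= 2) = {p_at_most_2:.3f}, P(X >= 3) = {p_at_least_3:.3f}")
```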
When a computer disk manufacturer tests a disk, it writes to the disk and then tests it using a certifier. The certifier counts the number of missing pulses or errors. The number of errors in a test area on a disk has a Poisson distribution with \(\lambda = 0.2\).
What percentage of test areas have two or fewer errors?
There are other applications of the Poisson distribution that come from more open-ended problems. For example, it can be used to help determine the amount of staffing that is needed in a call center.
A call center receives an average of 4.5 calls every 5 minutes. Each agent can handle one of these calls over the 5-minute period. If a call is received but no agent is available to take it, then that caller will be placed on hold. Assuming that the calls follow a Poisson distribution, what is the minimum number of agents needed on duty so that calls are placed on hold at most 10% of the time?
For every call to be taken, the number of agents on duty must be greater than or equal to the number of calls received. If \(X\) is the number of calls received and \(k\) is the number of agents, then \(k\) should be chosen so that \(P(X > k)\le 0.1,\) or equivalently, \(P(X \le k) \ge 0.9.\)
The average number of calls is 4.5, so \(\lambda=4.5:\)
\[\begin{array}{cl} P(X=0) = \frac{4.5^0 e^{-4.5}}{0!} \approx 0.011 & \\ P(X=1) = \frac{4.5^1 e^{-4.5}}{1!} \approx 0.050 &\implies P(X\le 1) \approx 0.061 \\ P(X=2) = \frac{4.5^2 e^{-4.5}}{2!} \approx 0.112 &\implies P(X\le 2) \approx 0.174 \\ P(X=3) = \frac{4.5^3 e^{-4.5}}{3!} \approx 0.169 &\implies P(X\le 3) \approx 0.342 \\ P(X=4) = \frac{4.5^4 e^{-4.5}}{4!} \approx 0.190 &\implies P(X\le 4) \approx 0.532 \\ P(X=5) = \frac{4.5^5 e^{-4.5}}{5!} \approx 0.171 &\implies P(X\le 5) \approx 0.703 \\ P(X=6) = \frac{4.5^6 e^{-4.5}}{6!} \approx 0.128 &\implies P(X\le 6) \approx 0.831 \\ P(X=7) = \frac{4.5^7 e^{-4.5}}{7!} \approx 0.082 &\implies P(X\le 7) \approx 0.913. \end{array}\]
Since \(P(X \le 6) \approx 0.831 < 0.9\) but \(P(X \le 7) \approx 0.913 \ge 0.9,\) at least \(\boxed{7}\) agents should be on duty for calls to be placed on hold in at most 10% of 5-minute periods. \(_\square\)
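The table of cumulative probabilities is a search for the smallest \(k\) with \(P(X \le k) \ge 0.9\). A minimal sketch of that search, assuming Python 3 (the function name `min_agents` and its `max_hold_prob` parameter are hypothetical):

```python
import math


def min_agents(lam: float, max_hold_prob: float) -> int:
    """Smallest k with P(X > k) <= max_hold_prob, where X ~ Poisson(lam).

    Assumes 0 < max_hold_prob < 1 (and not so tiny that float round-off
    keeps 1 - cdf from ever reaching it).
    """
    k = 0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term  # P(X <= k)
    while 1 - cdf > max_hold_prob:  # still too likely that more than k calls arrive
        k += 1
        term *= lam / k  # P(X = k)
        cdf += term
    return k


print(min_agents(4.5, 0.10))  # 4.5 calls per 5 minutes, at most a 10% chance of a hold
```

Changing `max_hold_prob` shows the staffing trade-off directly: a looser target such as 0.5 needs fewer agents, while a stricter one pushes \(k\) further into the tail.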
Properties of the Poisson Distribution
The expected value of a Poisson distribution should come as no surprise, as each Poisson distribution is defined by its expected value.
Expected Value of Poisson Random Variable:
Given a discrete random variable \(X\) that follows a Poisson distribution with parameter \(\lambda,\) the expected value of this variable is
\[\text{E}[X]=\lambda.\]
By the definition of expected value,
\[\text{E}[X] = \sum_{x \in \text{Im}(X)}xP(X=x),\]
where \(x \in \text{Im}(X)\) simply means that \(x\) is one of the possible values of the random variable \(X\). Applying this to the Poisson distribution,
\[ \begin{align*} \text{E}[X] &= \sum_{k = 0}^{\infty} k \cdot \frac{\lambda^ke^{-\lambda}}{k!} \\ &=\lambda e^{-\lambda}\sum_{k=1}^{\infty} \frac{\lambda^{k-1}}{(k-1)!} \\ &=\lambda e^{-\lambda}\sum_{j=0}^{\infty} \frac{\lambda^j}{j!} \\ &=\lambda e^{-\lambda}e^{\lambda} \\ &=\lambda, \end{align*} \]
where the \(k=0\) term was dropped because it is zero, and the reindexing \(j=k-1\) and the Taylor series \(e^x=\sum_{k=0}^{\infty}\frac{x^k}{k!}\) were used. \(_\square\)
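The series manipulation can be sanity-checked numerically. The sketch below (Python 3, hypothetical helper `truncated_mean`) sums \(k \cdot P(X=k)\) over the first 100 terms, which for the modest values of \(\lambda\) used in this article leaves a negligible tail:

```python
import math


def truncated_mean(lam: float, terms: int = 100) -> float:
    """Approximate E[X] = sum over k of k * P(X = k) using k = 0 .. terms - 1."""
    term = math.exp(-lam)  # P(X = 0); its contribution 0 * P(X = 0) is zero
    mean = 0.0
    for k in range(1, terms):
        term *= lam / k  # P(X = k)
        mean += k * term
    return mean


for lam in (0.61, 1.6, 2.5, 4.5, 12.0):
    print(lam, truncated_mean(lam))
```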
The variance of the Poisson distribution is also conveniently simple.
Variance of Poisson Random Variable:
Given a discrete random variable \(X\) that follows a Poisson distribution with parameter \(\lambda,\) the variance of this variable is
\[\text{Var}[X]=\lambda.\]
The proof involves the routine (but computationally intensive) calculation that \(E[X^2]=\lambda^2+\lambda\). Then using the formula for variance
\[\text{Var}[X] = E[X^2]-E[X]^2,\]
we have \(\text{Var}[X]=\lambda^2+\lambda-\lambda^2=\lambda\).
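The calculation of \(E[X^2]\) is less intensive if it goes through the factorial moment \(E[X(X-1)]\), which simplifies exactly like the expected-value proof: the \(k=0\) and \(k=1\) terms vanish, the sum is reindexed, and the Taylor series for \(e^{\lambda}\) is used.
\[ \begin{align*} \text{E}[X(X-1)] &= \sum_{k=0}^{\infty} k(k-1) \cdot \frac{\lambda^ke^{-\lambda}}{k!} \\ &=\lambda^2 e^{-\lambda}\sum_{k=2}^{\infty} \frac{\lambda^{k-2}}{(k-2)!} \\ &=\lambda^2 e^{-\lambda}e^{\lambda} \\ &=\lambda^2, \end{align*} \]
so \(E[X^2] = E[X(X-1)] + E[X] = \lambda^2 + \lambda\).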
The mode is only slightly more complicated:
Mode of Poisson Random Variable:
If \(\lambda\) is not an integer, the mode of a Poisson distribution with parameter \(\lambda\) is \(\lfloor \lambda \rfloor\). Otherwise, both \(\lambda\) and \(\lambda-1\) are modes.
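A brute-force check of this rule, as a sketch assuming Python 3.9+ and moderate \(\lambda\) (for very large \(\lambda\), `math.exp(-lam)` underflows to 0); `poisson_modes` is a hypothetical helper that simply scans the probabilities for their maximum:

```python
import math


def poisson_modes(lam: float, search_limit: int = 1000) -> list[int]:
    """All k in [0, search_limit) at which P(X = k) is maximal, found by direct scan."""
    probs = []
    term = math.exp(-lam)  # P(X = 0)
    for k in range(search_limit):
        if k > 0:
            term *= lam / k  # P(X = k)
        probs.append(term)
    best = max(probs)
    # Integer lam gives exact mathematical ties; isclose absorbs last-bit float differences.
    return [k for k, p in enumerate(probs) if math.isclose(p, best, rel_tol=1e-12)]


print(poisson_modes(2.5))  # non-integer lambda: expected [2], i.e. floor(2.5)
print(poisson_modes(3.0))  # integer lambda: expected [2, 3]
```

The tie for integer \(\lambda\) comes from the ratio \(P(X=\lambda)/P(X=\lambda-1) = \lambda/\lambda = 1\).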
The median of a Poisson distribution does not have a closed form, but its bounds are known:
Median of Poisson Random Variable:
The median \(\rho\) of a Poisson distribution with parameter \(\lambda\) satisfies
\[\lambda-\ln 2 \leq \rho \leq \lambda+\frac{1}{3}.\]
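The bounds can be spot-checked numerically. This sketch (Python 3; `poisson_median` is a hypothetical helper) takes the median to be the smallest integer \(m\) with \(P(X \le m) \ge \tfrac12\), one common convention for discrete distributions, and asserts that it lies inside the interval above for a few values of \(\lambda\) used in this article:

```python
import math


def poisson_median(lam: float) -> int:
    """Smallest integer m with P(X <= m) >= 1/2, for X ~ Poisson(lam)."""
    m = 0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term  # P(X <= m)
    while cdf < 0.5:
        m += 1
        term *= lam / m  # P(X = m)
        cdf += term
    return m


for lam in (0.61, 1.6, 2.5, 4.5, 12.0):
    m = poisson_median(lam)
    assert lam - math.log(2) <= m <= lam + 1 / 3, (lam, m)
    print(lam, m)
```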
The sum of two independent Poisson random variables is a Poisson random variable.
Sum of Independent Poisson Random Variables:
Let \(X\) and \(Y\) be Poisson random variables with parameters \(\lambda_1\) and \(\lambda_2\), respectively. If \(X\) and \(Y\) are independent, then \(X+Y\) is a Poisson random variable with parameter \(\lambda_1+\lambda_2.\) Its distribution can be described with the formula
\[P(X+Y=k)=\frac{(\lambda_1+\lambda_2)^k e^{-(\lambda_1+\lambda_2)}}{k!}.\]
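One way to see this is by convolution: \(P(X+Y=k)=\sum_{i=0}^{k}P(X=i)\,P(Y=k-i)\), which the binomial theorem collapses to the formula above. A numerical sketch in Python 3 (the helpers `poisson_pmf` and `sum_pmf_by_convolution` are hypothetical, and the rates 2 and 3 are arbitrary):

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * math.exp(-lam) / math.factorial(k)


def sum_pmf_by_convolution(k: int, lam1: float, lam2: float) -> float:
    """P(X + Y = k) for independent X ~ Poisson(lam1), Y ~ Poisson(lam2).

    Sums over every way to split k events as X = i, Y = k - i; independence
    lets each joint probability factor into a product.
    """
    return sum(poisson_pmf(i, lam1) * poisson_pmf(k - i, lam2) for i in range(k + 1))


lam1, lam2 = 2.0, 3.0
for k in range(6):
    print(k, sum_pmf_by_convolution(k, lam1, lam2), poisson_pmf(k, lam1 + lam2))
```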
Damon is working the evening shift at the register of his retail job. There are currently two registers open, but his coworker is about to go home for the day and close her register.
The number of customers approaching each register is an independent Poisson random variable. If each register were getting an average of 2 customers per minute, what is the probability that Damon will have more than 4 customers approaching his register in the minute after his coworker goes home?
Round your answer to 3 decimal places.
Additionally, the Poisson distribution can be thought of as the limiting case of the binomial distribution. If there are \(n\) independent trials, \(p\) is the probability of a successful trial, and \(np\) remains constant, then this binomial distribution will behave as a Poisson distribution as \(n\) approaches infinity.
Poisson Limit Theorem:
As \(n\) approaches infinity and \(p\) approaches \(0\) such that \(\lambda\) is a constant with \(\lambda=np,\) the binomial distribution with parameters \(n\) and \(p\) is approximated by a Poisson distribution with parameter \(\lambda\):
\[\binom{n}{k}p^k(1-p)^{n-k} \simeq \frac{\lambda^k e^{-\lambda}}{k!}.\]
This can be proved by considering the fact that convergence in moment generating functions implies convergence in distribution.
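The convergence is easy to see numerically. The sketch below (Python 3.8+ for `math.comb`; helper names are hypothetical, and \(\lambda=3\), \(k=2\) are chosen arbitrarily) holds \(np=\lambda\) fixed while \(n\) grows and prints the gap between the binomial and Poisson probabilities:

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * math.exp(-lam) / math.factorial(k)


lam, k = 3.0, 2
gaps = []
for n in (10, 100, 1000, 10000):
    p = lam / n  # keep the mean np equal to lam as n grows
    gap = abs(binom_pmf(k, n, p) - poisson_pmf(k, lam))
    gaps.append(gap)
    print(f"n={n:>5}: binomial={binom_pmf(k, n, p):.6f}, poisson={poisson_pmf(k, lam):.6f}, gap={gap:.2e}")
```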
Practical Applications
The classical example of the Poisson distribution is the number of Prussian soldiers accidentally killed by horse-kick, because it was the first application of the Poisson distribution to a large real-world data set. Ten army corps were observed over 20 years, for a total of 200 observations (corps-years), and 122 soldiers were killed by horse-kick over that time period. The question is how many deaths would be expected in a single corps over one year, which turns out to be modeled very well by the Poisson distribution \((\)with \(\lambda=122/200=0.61):\)
| # of deaths | Predicted % | Expected # of occurrences | Actual # of occurrences |
|---|---|---|---|
| 0 | 54.34 | 108.67 | 109 |
| 1 | 33.14 | 66.29 | 65 |
| 2 | 10.11 | 20.22 | 22 |
| 3 | 2.06 | 4.11 | 3 |
| 4 | 0.31 | 0.63 | 1 |
| 5 | 0.04 | 0.08 | 0 |
| 6 | <0.01 | 0.01 | 0 |
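The predicted and expected columns can be recomputed from the observed counts alone. A sketch in Python 3 (variable names are illustrative), which estimates \(\lambda\) as the average number of deaths per corps-year, \(122/200\):

```python
import math

# Observed data from the table: deaths in a corps-year -> number of corps-years
observed = {0: 109, 1: 65, 2: 22, 3: 3, 4: 1, 5: 0, 6: 0}
n_obs = sum(observed.values())  # 200 corps-years
lam = sum(k * count for k, count in observed.items()) / n_obs  # 122 / 200

for k in sorted(observed):
    p = lam**k * math.exp(-lam) / math.factorial(k)  # Poisson PMF
    print(f"{k}: predicted {100 * p:.2f}%, expected {n_obs * p:.2f}, actual {observed[k]}")
```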
The interpretation of this data is important: since the Poisson distribution describes the frequency of events that occur independently at random, the close agreement between the predicted and actual counts is consistent with the deaths having been random accidents. If the actual data had shown many more deaths than expected, an alternate explanation should be sought (e.g., inadequate training, a clever and subtle enemy plot, etc.).
The Poisson distribution is also useful in determining the probability that a certain number of events occur over a given time period. For example, if an office averages 12 calls per hour, they can calculate that the probability of receiving at least 20 calls in an hour is
\[\sum_{k=20}^{\infty}\frac{12^ke^{-12}}{k!} \approx 2.13\%,\]
which means they will receive 20 or more calls in only about 2% of hours, so they can generally feel comfortable keeping only enough staff on hand to handle about 20 calls per hour. Of course, the choice of threshold depends on context; an emergency room, for instance, may still wish to have extra staff on hand.
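The infinite sum cannot be evaluated term by term forever, but its complement is a finite sum. A sketch in Python 3 (hypothetical helper `poisson_sf`, for "survival function") that gives about 2.1% for \(\lambda = 12\) and \(k = 20\):

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):  # accumulate P(X = 0) + ... + P(X = k - 1)
        if i > 0:
            term *= lam / i  # P(X = i)
        cdf += term
    return 1 - cdf


print(f"P(X >= 20) = {100 * poisson_sf(20, 12):.2f}%")
```

Subtracting from 1 loses relative precision when the tail is extremely small; summing the tail terms directly would be more accurate in that regime.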
In short, the list of applications is very long. A partial list[1] of recently studied phenomena that obey a Poisson distribution is below:
- the number of mutations on a given strand of DNA per time unit
- the number of bankruptcies that are filed in a month
- the number of arrivals at a car wash in one hour
- the number of network failures per day
- the number of file server virus infections at a data center during a 24-hour period
- the number of Airbus 330 aircraft engine shutdowns per 100,000 flight hours
- the number of asthma patient arrivals in a given hour at a walk-in clinic
- the number of hungry persons entering a McDonald's restaurant per day
- the number of work-related accidents over a given production time
- the number of births, deaths, marriages, divorces, suicides, and homicides over a given period of time
- the number of customers who call to complain about a service problem per month
- the number of visitors to a web site per minute
- the number of calls to a consumer hotline in a 5-minute period
- the number of telephone calls per minute in a small business
- the number of arrivals at a turnpike tollbooth per minute between 3 A.M. and 4 A.M. in January on the Kansas Turnpike.
References
[1] Western New England University. Applications of the Poisson probability distribution. Retrieved February 9, 2016 from http://www.aabri.com/SA12Manuscripts/SA12083.pdf.