Bernoulli Distribution
The Bernoulli distribution essentially models a single trial of flipping a weighted coin. It is the probability distribution of a random variable taking on only two values, \(1\) ("success") and \(0\) ("failure"), with complementary probabilities \(p\) and \(1-p,\) respectively. The Bernoulli distribution therefore describes events having exactly two outcomes, which are ubiquitous in real life. Some examples of such events are as follows: a team will win a championship or not, a student will pass or fail an exam, and a rolled die will either show a 6 or some other number.
The Bernoulli distribution serves as a building block for discrete distributions that model sequences of Bernoulli trials, such as the binomial distribution and the geometric distribution.
Definition
The Bernoulli distribution is the probability distribution of a random variable \(X\) having the probability mass function
\[ \text{Pr}(X=x) = \begin{cases} p & x = 1 \\ 1-p & x = 0 \end{cases}\]
for \(0<p<1\).
Intuitively, it describes a single trial of an experiment having two outcomes: success ("1") occurring with probability \(p,\) and failure ("0") occurring with probability \(1-p.\)
A closed form of the probability mass function of the Bernoulli distribution is \(P(x) = p^{x}(1-p)^{1-x}\) for \(x \in \{0,1\}\).
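To make the closed form concrete, here is a minimal sketch in Python that evaluates \(p^{x}(1-p)^{1-x}\) at \(x=0\) and \(x=1\) and cross-checks the result against scipy.stats.bernoulli; the value \(p=0.3\) is just an illustrative choice.

```python
from scipy.stats import bernoulli

def bernoulli_pmf(x, p):
    """Closed-form Bernoulli PMF: p^x * (1-p)^(1-x) for x in {0, 1}."""
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    return p**x * (1 - p)**(1 - x)

p = 0.3  # illustrative success probability
for x in (0, 1):
    ours = bernoulli_pmf(x, p)
    ref = bernoulli.pmf(x, p)  # SciPy's reference implementation
    print(f"Pr(X={x}) = {ours:.2f} (scipy: {ref:.2f})")
# Pr(X=0) = 0.70, Pr(X=1) = 0.30
```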
One can represent the Bernoulli distribution graphically as follows:
[Figure: bar plot of the Bernoulli probability mass function with \(p=0.3\), showing \(\text{Pr}(X=0)=0.7\) and \(\text{Pr}(X=1)=0.3\).]
A fair coin is flipped once. The outcome of the experiment is modeled by the Bernoulli distribution with \(p=0.5\).
Basic Properties
The expected value of a Bernoulli distribution is
\[ E(X) = 0\times (1-p) + 1\times p = p. \]
The variance of a Bernoulli distribution is calculated as
\[ \text{Var}(X) = E(X^2) - E(X)^2 = 1^2 \times p + 0^2 \times (1-p) - p^2 = p - p^2 = p(1-p). \]
The mode, the value with the highest probability of occurring, of a Bernoulli distribution is \(1\) if \(p>0.5\) and \(0\) if \(p<0.5\). If \(p=0.5\), success and failure are equally likely and both \(0\) and \(1\) are modes. This is intuitively clear: since there are only two outcomes with complementary probabilities, \(p>0.5\) implies that the probability of success is higher than the probability of failure.
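As a sanity check, these quantities can be estimated by simulation. The sketch below draws Bernoulli samples with NumPy (the sample size and \(p=0.3\) are arbitrary choices) and compares the empirical mean, variance, and mode to \(p\), \(p(1-p)\), and the mode described above.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
p = 0.3            # illustrative success probability
n = 1_000_000      # number of simulated Bernoulli trials

samples = rng.binomial(1, p, size=n)  # Bernoulli(p) = Binomial(1, p)

print(f"empirical mean     {samples.mean():.4f}  (theory: {p})")
print(f"empirical variance {samples.var():.4f}  (theory: {p*(1-p):.4f})")
print(f"empirical mode     {int(np.bincount(samples).argmax())}  (theory: 0, since p < 0.5)")
```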
Basic properties of the Bernoulli distribution can be recovered by taking \(n=1\) in the binomial distribution.
Conversely, using linearity of expectation and the rule for the variance of a sum of independent variables, the Bernoulli distribution can be used to derive the properties of distributions built from Bernoulli trials, such as the binomial distribution.
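For instance, a Binomial\((n,p)\) variable is a sum of \(n\) independent Bernoulli\((p)\) variables, so its mean is \(np\) and its variance is \(np(1-p)\). The short sketch below (with arbitrary parameters \(n=20\), \(p=0.3\)) checks this by summing simulated Bernoulli trials.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n, p, reps = 20, 0.3, 200_000  # arbitrary illustrative parameters

# Each row is one Binomial(n, p) draw built as a sum of n Bernoulli(p) trials.
bernoulli_trials = rng.binomial(1, p, size=(reps, n))
binomial_draws = bernoulli_trials.sum(axis=1)

print(f"mean     {binomial_draws.mean():.3f}  (theory: n*p = {n*p})")
print(f"variance {binomial_draws.var():.3f}  (theory: n*p*(1-p) = {n*p*(1-p):.2f})")
```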
Examples
The Bernoulli distribution models the following situations:
- A newborn child is either male or female. (Here the probability of a child being male is roughly 0.5.)
- You either pass or fail an exam.
- A tennis player either wins or loses a match.
- A dart thrown at a circular dartboard lands uniformly at random over its area. The dart will either land closer to the center than to the edge or not (in the second case it is either closer to the edge or equally distant from the center and the edge). The dart is closer to the center exactly when it lands inside the concentric disk of half the board's radius, which covers \(\left(\frac{1}{2}\right)^2 = \frac{1}{4}\) of the area, so \(p=0.25\). (A Monte Carlo check of this value appears after this list.)
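Here is a minimal Monte Carlo sketch of the dartboard example: it samples points uniformly over the unit disk (via rejection sampling from the enclosing square) and estimates the probability that a point lands closer to the center than to the edge, which should approach \(0.25\).

```python
import numpy as np

rng = np.random.default_rng(seed=2)
n = 1_000_000

# Rejection-sample points uniformly from the unit disk:
# draw from the enclosing square and keep points with x^2 + y^2 <= 1.
x = rng.uniform(-1, 1, size=n)
y = rng.uniform(-1, 1, size=n)
r = np.hypot(x, y)
r = r[r <= 1.0]  # keep only points inside the disk

# A point at distance r from the center (board radius 1) is closer to
# the center than to the edge exactly when r < 1 - r, i.e. r < 1/2.
p_hat = np.mean(r < 0.5)
print(f"estimated p = {p_hat:.4f}  (theory: 0.25)")
```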
An integer \(n\in \{1,\ldots, 999999 \}\) is chosen randomly. We consider three variables, \(X_1,X_2,\) and \(X_3\). \(X_1\) assumes the value \(1\) if the sum of the digits of \(n\) is divisible by \(9\) and \(0\) otherwise; \(X_2\) assumes the value \(1\) if \(n\) can be expressed as a sum of four squares of integers and \(0\) otherwise; \(X_3\) assumes the values \(0,\) \(1,\) and \(2\), respectively, if \(n\) leaves a remainder of \(0,\) \(1,\) or \(2\) when divided by \(3\).
The sum of the digits of a positive integer \(n\) is divisible by \(9\) if and only if \(9\) divides \(n\). The probability that a randomly chosen integer in \(\{1,\ldots, 999999 \}\) is divisible by \(9\) is \(\frac{1}{9},\) since exactly \(\frac{999999}{9} = 111111\) of these integers are multiples of \(9\). Therefore, \(X_1\) is a Bernoulli distributed random variable with \(p=\frac{1}{9}\).
By Lagrange's four-square theorem, every positive integer can be expressed as a sum of four squares, so \(X_2\) is constant: it equals \(1\) with probability \(1\). It is therefore not Bernoulli distributed, since the restriction \(0<p<1\) in the definition excludes the degenerate case \(p=1\).
The variable \(X_3\) models an experiment with more than two outcomes, and hence it is not Bernoulli distributed.
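The claim about \(X_1\) can also be verified by brute force; the sketch below counts how many \(n \le 999999\) have digit sum divisible by \(9\) and confirms that the fraction is exactly \(\frac{1}{9}\).

```python
from fractions import Fraction

N = 999_999
count = sum(1 for n in range(1, N + 1)
            if sum(int(d) for d in str(n)) % 9 == 0)

print(count)               # 111111
print(Fraction(count, N))  # 1/9, so p = 1/9 exactly
```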
Suppose we have \(a + b\) independent Bernoulli\((p)\) trials. Let \(N_a\) be the number of successes in the first \(a\) of these trials, and \(N_b\) be the number of successes in the last \(b\) of these trials. Using properties of the Bernoulli distribution, we can then say the following:
- \(N_a \sim \text{Binomial}(a, p)\) because we have \(a\) independent Bernoulli\((p)\) trials. We don't care about the last \(b\) trials.
- \(N_b \sim \text{Binomial}(b, p)\) because we have \(b\) independent Bernoulli\((p)\) trials. We don't care about the first \(a\) trials.
- \(N_a + N_b \sim \text{Binomial}(a + b, p)\) because all \(a+b\) Bernoulli trials are independent and identically distributed.
- \(N_a\) and \(N_b\) are independent because they are computed from two disjoint groups of independent trials (the first \(a\) trials and the last \(b\) trials).
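A quick simulation sketch (with arbitrary parameters \(a=5\), \(b=7\), \(p=0.4\)) illustrates these facts: it checks the means of \(N_a\), \(N_b\), and \(N_a + N_b\) against the corresponding binomial means and estimates the correlation of \(N_a\) and \(N_b\), which should be near zero for independent counts.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
a, b, p, reps = 5, 7, 0.4, 200_000  # arbitrary illustrative parameters

trials = rng.binomial(1, p, size=(reps, a + b))  # a+b i.i.d. Bernoulli(p) trials
N_a = trials[:, :a].sum(axis=1)                  # successes in the first a trials
N_b = trials[:, a:].sum(axis=1)                  # successes in the last b trials

print(f"E[N_a]       {N_a.mean():.3f}  (theory: a*p = {a*p})")
print(f"E[N_b]       {N_b.mean():.3f}  (theory: b*p = {b*p:.1f})")
print(f"E[N_a + N_b] {(N_a + N_b).mean():.3f}  (theory: (a+b)*p = {(a+b)*p:.1f})")
print(f"corr(N_a, N_b) = {np.corrcoef(N_a, N_b)[0, 1]:+.4f}  (independence: ~0)")
```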