Bayes' Theorem and Conditional Probability

Bayes' theorem is a formula that describes how to update the probabilities of hypotheses when given evidence. It follows directly from the definition of conditional probability, yet it supports powerful reasoning about a wide range of problems involving belief updates.

Given a hypothesis \(H\) and evidence \(E\), Bayes' theorem states that the relationship between the probability of the hypothesis before getting the evidence \(P(H)\) and the probability of the hypothesis after getting the evidence \(P(H \mid E)\) is

\[P(H \mid E) = \frac{P(E \mid H)} {P(E)} P(H).\]

Many modern machine learning techniques rely on Bayes' theorem. For instance, spam filters use Bayesian updating to determine whether an email is legitimate or spam, given the words in the email. Additionally, many specific tasks in statistics, such as interpreting \(p\)-values or medical test results, are best described in terms of how they contribute to updating hypotheses using Bayes' theorem.

Explaining Counterintuitive Results

Probability problems are notorious for yielding surprising and counterintuitive results. One famous example, or rather a pair of examples, is the following:

A couple has two children, and the older child is a boy. If the probabilities of having a boy or a girl are both \(50\%\), what is the probability that the couple has two boys?
We already know that the older child is a boy. The probability of two boys is equivalent to the probability that the younger child is a boy, which is \(50\%\).

A couple has two children, of which at least one is a boy. If the probabilities of having a boy or a girl are both \(50\%\), what is the probability that the couple has two boys?

At first glance, this appears to be asking the same question. We might reason as follows: “We know that one is a boy, so the only question is whether the other one is a boy, and the chances of that being the case are \(50\%\). So again, the answer is \(50\%\).”

This makes perfect sense. It also happens to be incorrect.

Deriving Bayes' Theorem

Bayes' theorem centers on relating different conditional probabilities. A conditional probability expresses how probable one event is given that some other event is known to have occurred. For instance, "what is the probability that the sidewalk is wet?" will have a different answer than "what is the probability that the sidewalk is wet given that it rained earlier?"

For a joint probability distribution over events \(A\) and \(B\), \(P(A \cap B)\), the conditional probability of \(A\) given \(B\) is defined as

\[P(A\mid B) = \frac{P(A\cap B)}{P(B)}.\]

In the sidewalk example, where \(A\) is "the sidewalk is wet" and \(B\) is "it rained earlier," this expression reads as "the probability that the sidewalk is wet given that it rained earlier is equal to the probability that the sidewalk is wet and it rained earlier, divided by the probability that it rained earlier."

Note that \(P(A \cap B)\) is the probability of both \(A\) and \(B\) occurring, which is the same as the probability of \(A\) occurring times the probability that \(B\) occurs given that \(A\) occurred: \(P(B \mid A) \times P(A).\) Using the same reasoning, \(P(A \cap B)\) is also the probability that \(B\) occurs times the probability that \(A\) occurs given that \(B\) occurs: \(P(A \mid B) \times P(B)\). The fact that these two expressions are equal leads to Bayes' Theorem. Expressed mathematically, this is:

\[\begin{align} P(A \mid B) &= \frac{P(A\cap B)}{P(B)}, \text{ if } P(B) \neq 0, \\ P(B \mid A) &= \frac{P(B\cap A)}{P(A)}, \text{ if } P(A) \neq 0, \\ \Rightarrow P(A\cap B) &= P(A\mid B)\times P(B)=P(B\mid A)\times P(A), \\ \Rightarrow P(A \mid B) &= \frac{P(B \mid A) \times P(A)} {P(B)}, \text{ if } P(B) \neq 0. \end{align}\]

Notice that both the conditional probability formula and Bayes’ theorem remain valid when the events are independent. In that case, \(P(A \mid B) = P(A)\) and \(P(B \mid A) = P(B)\), so the expressions simplify.
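As a short worked check of the independent case, assume \(A\) and \(B\) are independent with \(P(A) \neq 0\) and \(P(B) \neq 0\), so that \(P(A \cap B) = P(A) \times P(B)\). Under that assumption, both the definition of conditional probability and Bayes' theorem reduce to \(P(A)\):

\[\begin{align} P(A \mid B) &= \frac{P(A \cap B)}{P(B)} = \frac{P(A) \times P(B)}{P(B)} = P(A), \\ P(A \mid B) &= \frac{P(B \mid A) \times P(A)}{P(B)} = \frac{P(B) \times P(A)}{P(B)} = P(A). \end{align}\]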

Bayes' Theorem

\[P(A \mid B) = \frac{P(B \mid A)} {P(B)} P(A)\]

While this is an equation that applies to any probability distribution over events \(A\) and \(B\), it has a particularly nice interpretation in the case where \(A\) represents a hypothesis \(H\) and \(B\) represents some observed evidence \(E\). In this case, the formula can be written as

\[P(H \mid E) = \frac{P(E \mid H)}{P(E)} P(H).\]

This relates the probability of the hypothesis before getting the evidence, \(P(H)\), to the probability of the hypothesis after getting the evidence, \(P(H \mid E)\). For this reason, \(P(H)\) is called the prior probability, while \(P(H \mid E)\) is called the posterior probability. The factor that relates the two, \(\frac{P(E \mid H)}{P(E)}\), is called the likelihood ratio here. (Elsewhere, the term "likelihood ratio" more often refers to \(\frac{P(E \mid H)}{P(E \mid \neg H)}\), which is a different quantity.) Using these terms, Bayes' theorem can be rephrased as "the posterior probability equals the prior probability times the likelihood ratio."

If a single card is drawn from a standard deck of playing cards, the probability that the card is a king is 4/52, since there are 4 kings in a standard deck of 52 cards. Rewording this, if \(\text{King}\) is the event "this card is a king," the prior probability \(P(\text{King}) = \frac{4}{52} = \frac{1}{13}.\)

If evidence is provided (for instance, someone looks at the card) that the single card is a face card, then the posterior probability \(P(\text{King} \mid \text{Face})\) can be calculated using Bayes' theorem:

\[P(\text{King} \mid \text{Face}) = \frac{P(\text{Face} \mid \text{King})}{P(\text{Face})} P(\text{King}).\]

Since every king is also a face card, \(P(\text{Face} \mid \text{King}) = 1\). Since there are 3 face cards in each suit (Jack, Queen, King), the probability of a face card is \(P(\text{Face}) = \frac{3}{13}\). Combining these gives a likelihood ratio of \(\frac{1}{\frac{3}{13}} = \frac{13}{3}\).

Using Bayes' theorem gives \(P(\text{King} \mid \text{Face}) = \frac{13}{3} \cdot \frac{1}{13} = \frac{1}{3}\). \(_\square\)
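The card calculation can be reproduced with exact fractions. The following is a minimal sketch, not a canonical implementation; the function name `posterior` and the variable names are illustrative choices that do not come from the text.

```python
from fractions import Fraction


def posterior(prior, likelihood, evidence):
    """Return P(H | E) = P(E | H) / P(E) * P(H).

    `posterior` is a hypothetical helper name used only for this sketch.
    """
    if evidence == 0:
        raise ValueError("P(E) must be nonzero")
    return likelihood / evidence * prior


# Values taken from the card example in the text.
p_king = Fraction(4, 52)         # prior P(King)
p_face_given_king = Fraction(1)  # every king is a face card
p_face = Fraction(12, 52)        # 3 face cards in each of 4 suits

p_king_given_face = posterior(p_king, p_face_given_king, p_face)
print(p_king_given_face)  # 1/3
```

Using `Fraction` rather than floating-point numbers keeps the result exact, so it can be compared directly with the hand calculation.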

Bayes' theorem clarifies the two-children problem from the first section:

1. A couple has two children, the older of which is a boy. What is the probability that they have two boys?

2. A couple has two children, at least one of which is a boy. What is the probability that they have two boys?

Define three events, \(A\), \(B\), and \(C\), as follows:

\[ \begin{align} A & = \mbox{ both children are boys}\\ B & = \mbox{ the older child is a boy}\\ C & = \mbox{ at least one of their children is a boy.} \end{align}\]

Question 1 is asking for \(P(A \mid B)\), and Question 2 is asking for \(P(A \mid C)\). The first is computed using the simpler version of Bayes’ theorem:

\[P(A \mid B) = \frac{P(A)P(B \mid A)}{P(B)} = \frac{ \frac{1}{4}\cdot 1 }{\frac{1}{2}} = \frac{1}{2}.\]

To find \(P(A \mid C)\), we must determine \(P(C)\), the prior probability that the couple has at least one boy. This is equal to \(1 - P(\mbox{both children are girls}) = 1 - \frac{1}{4}=\frac{3}{4}\). Therefore the desired probability is

\[P(A \mid C) = \frac{P(A)P(C \mid A)}{P(C)} = \frac{\frac{1}{4}\cdot 1}{\frac{3}{4}} = \frac{1}{3}. \ _\square \]
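Both answers can also be checked by listing the four equally likely families and counting. This is a small sketch under the same assumption as the text (each child is a boy or a girl with probability \(\frac{1}{2}\), independently); the helper `conditional` and the predicate names are made up for illustration.

```python
from fractions import Fraction
from itertools import product

# All equally likely (older, younger) pairs: BB, BG, GB, GG.
families = list(product("BG", repeat=2))


def conditional(event, given):
    """P(event | given), computed by counting equally likely outcomes."""
    matching = [f for f in families if given(f)]
    favorable = sum(1 for f in matching if event(f))
    return Fraction(favorable, len(matching))


def both_boys(family):           # event A
    return family == ("B", "B")


def older_is_boy(family):        # event B
    return family[0] == "B"


def at_least_one_boy(family):    # event C
    return "B" in family


p_a_given_b = conditional(both_boys, older_is_boy)
p_a_given_c = conditional(both_boys, at_least_one_boy)
print(p_a_given_b, p_a_given_c)  # 1/2 1/3
```

Conditioning on \(B\) leaves two families (BB, BG), while conditioning on \(C\) leaves three (BB, BG, GB), which is where the difference between the two answers comes from.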

For a similarly paradoxical problem, see the Monty Hall problem.

Visualizing Bayes’ Theorem

Venn diagrams are particularly useful for visualizing Bayes' theorem, since both the diagrams and the theorem are about looking at the intersections of different spaces of events.

A disease is present in 5 out of 100 people, and a test that is 90% accurate (meaning that the test produces the correct result in 90% of cases) is administered to 100 people. If one person in the group tests positive, what is the probability that this one person has the disease?

The intuitive answer is that the one person is 90% likely to have the disease. But we can visualize this to show that it’s not accurate. First, draw the total population and the 5 people who have the disease:

The circle A represents 5 out of 100, or 5% of the larger universe of 100 people.

Next, overlay a circle to represent the people who get a positive result on the test. We know that 90% of those with the disease will get a positive result, so we need to cover 90% of circle A, but we also know that 10% of the population that does not have the disease will get a positive result, so we need to cover 10% of the non-disease-carrying population (the total universe of 100 less circle A).

Circle B is covering a substantial portion of the total population. It actually covers more area than the total portion of the population with the disease. This is because 14 out of the total population of 100 (90% of the 5 people with the disease + 10% of the 95 people without the disease) will receive a positive result. Even though this is a test with 90% accuracy, this visualization shows that any one patient who tests positive (Circle B) for the disease only has a 32.14% (4.5 in 14) chance of actually having the disease.
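The same numbers can be checked with a few lines of arithmetic. The sketch below assumes, as the example does, that "90% accurate" means the test is correct for 90% of people with the disease and for 90% of people without it; the function name `positive_predictive_value` is an illustrative choice.

```python
from fractions import Fraction


def positive_predictive_value(prevalence, accuracy):
    """P(disease | positive test) via Bayes' theorem.

    Assumes the same accuracy applies to people with and without the disease.
    """
    true_positives = accuracy * prevalence               # 90% of the 5%
    false_positives = (1 - accuracy) * (1 - prevalence)  # 10% of the 95%
    return true_positives / (true_positives + false_positives)


ppv = positive_predictive_value(Fraction(5, 100), Fraction(90, 100))
print(ppv, f"{float(ppv):.4f}")  # 9/28 0.3214
```

The denominator is the total probability of a positive result, \(\frac{14}{100}\), which corresponds to circle B in the diagram.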

Diagnosing Disease

Main article: Bayesian theory in science and math

Bayes’ theorem can show the likelihood of getting false positives in scientific studies. An in-depth look at this can be found in Bayesian theory in science and math.

Many medical diagnostic tests are said to be \(X\)% accurate, for instance 99% accurate, referring specifically to the probability that the test result is correct given your condition (or lack thereof). This is not the same as the posterior probability of having the disease given the result of the test. The example in the previous section shows the gap: a test that is 90% accurate gave a posterior probability of only 32.14% for a patient who tested positive.

More Examples

Balls numbered 1 through 20 are placed in a bag. Three balls are drawn out of the bag without replacement. What is the probability that all the balls have odd numbers on them?

In this situation, the events are not independent. There is a \(\frac{10}{20} = \frac{1}{2}\) chance that any particular ball is odd. However, the probability that all the balls are odd is not \(\frac{1}{8}\). The probability that the first ball is odd is \(\frac{1}{2}.\) For the second ball, given that the first ball was odd, there are only 9 odd-numbered balls that could be drawn from a total of 19 balls, so the probability is \(\frac{9}{19}\). For the third ball, given that the first two were both odd, there are 8 odd-numbered balls that could be drawn from a total of 18 remaining balls, so the probability is \(\frac{8}{18}\).

So the probability that all 3 balls are odd numbered is \(\frac{10}{20} \times \frac{9}{19} \times \frac{8}{18} = \frac{2}{19}.\) Notice that \(\frac{2}{19} \approx 0.105\), whereas \(\frac{1}{8} = 0.125.\) \(_\square\)
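The chain of conditional probabilities can be cross-checked by brute force. This sketch computes the product directly and then counts all unordered draws of three balls; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

# Chain rule: P(1st odd) * P(2nd odd | 1st odd) * P(3rd odd | first two odd).
p_chain = Fraction(10, 20) * Fraction(9, 19) * Fraction(8, 18)

# Brute force: every unordered draw of 3 distinct balls is equally likely.
draws = list(combinations(range(1, 21), 3))
all_odd = sum(1 for draw in draws if all(n % 2 == 1 for n in draw))
p_count = Fraction(all_odd, len(draws))

print(p_chain, p_count)  # 2/19 2/19
```

Both approaches agree because the conditional probabilities multiply to the joint probability of drawing three odd balls.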

A family has two children. Given that at least one of the children is a boy, what is the probability that both children are boys?

We assume that the probability of a child being a boy or a girl is \(\frac{1}{2}\). We solve this using Bayes’ theorem. We let \(B\) be the event that the family has at least one child who is a boy. We let \(A\) be the event that both children are boys. We want to find \(P(A \mid B) = \frac{P(B \mid A) \times P(A)}{P(B)}\). We can easily see that \(P(B \mid A) = 1\). We also note that \(P(A) = \frac{1}{4}\) and \(P(B) = \frac{3}{4}\). So \(P(A \mid B) = \frac{1 \times \frac{1}{4}}{\frac{3}{4}} = \frac{1}{3} \). \(_\square\)

A family has two children. Given that at least one of the children is a boy who was born on a Tuesday, what is the probability that both children are boys?

Your first instinct might be to answer \(\frac{1}{3}\), since this looks like the same question as the previous one. Knowing the day of the week a child was born on can’t possibly give you additional information, right?

Let’s assume that the probability of being born on a particular day of the week is \(\frac{1}{7}\) and is independent of whether the child is a boy or a girl. We let \(B\) be the event that the family has at least one child who is a boy born on a Tuesday and \(A\) be the event that both children are boys, and apply Bayes’ theorem. We notice right away that \(P(B \mid A)\) is no longer equal to one. Given that there are 7 days of the week, there are 49 possible combinations for the days of the week the two boys were born on, and \(49 - 6^2 = 13\) of these have at least one boy who was born on a Tuesday, so \(P( B \mid A) = \frac{13}{49}\). \(P(A)\) remains unchanged at \(\frac{1}{4}\). To calculate \(P(B)\), we note that there are \(14^2 = 196\) possible ways to select the gender and the day of the week of birth for the two children. Of these, there are \(13^2 = 169\) ways which do not have a boy born on a Tuesday, and \(196 - 169 = 27\) which do, so \(P(B) = \frac{27}{196}\). This gives \(P(A \mid B) = \frac{ \frac{13}{49} \times \frac{1}{4}} {\frac{27}{196}} = \frac{13}{27}\). \(_\square\)

Note: This answer is certainly not \(\frac{1}{3}\), and is actually much closer to \(\frac{1}{2}\).
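The counts 27 and 13 can be confirmed by enumerating all 196 families. This is a sketch under the stated assumptions (sex and weekday are uniform and independent); the weekday index chosen for Tuesday is an arbitrary label, and the function names are illustrative.

```python
from fractions import Fraction
from itertools import product

TUESDAY = 1  # arbitrary index in 0..6; any single fixed day gives the same count

# One child is a (sex, weekday) pair: 2 * 7 = 14 equally likely possibilities.
children = list(product("BG", range(7)))
# A family is an (older, younger) pair of children: 14 * 14 = 196 possibilities.
families = list(product(children, repeat=2))


def has_tuesday_boy(family):
    """Event B: at least one child is a boy born on a Tuesday."""
    return any(child == ("B", TUESDAY) for child in family)


def both_boys(family):
    """Event A: both children are boys."""
    return all(sex == "B" for sex, _ in family)


with_evidence = [f for f in families if has_tuesday_boy(f)]
favorable = sum(1 for f in with_evidence if both_boys(f))
p_a_given_b = Fraction(favorable, len(with_evidence))

print(len(with_evidence), favorable, p_a_given_b)  # 27 13 13/27
```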
