## Inverse Probabilities

If you test positive for a disease, what are the chances that you actually have it? A manufacturer might advertise the probability that their test comes back positive for someone who has the disease, but that’s not necessarily the same as the probability that someone has the disease, given that their test came back positive.

Those probabilities sound similar, so what is the difference? To walk through an example, keep reading; otherwise, jump straight to today’s challenge to do your own calculation.

The probabilities mentioned above are conditional probabilities. That is, they’re of the form “the probability that $A$ happens, given that $B$ has occurred,” the notation for which is $P(A | B).$

For example, consider a test with the following accuracies:

• If someone doesn’t have the disease, the probability that the test comes back negative (indicating that they don’t have the disease) is $97\%.$ As a conditional probability, that is $P(\text{negative test } | \text{ don’t have the disease}) = 0.97.$
• If someone has the disease, the probability that the test comes back positive (indicating that they have the disease) is $91\%.$ As a conditional probability, that is $P(\text{positive test } | \text{ have the disease}) = 0.91.$

However, if your test comes back positive, that doesn’t necessarily mean that there is a $91\%$ chance you have the disease. You need to instead find the conditional probability $P(\text{have the disease } | \text{ positive test}).$ That is, the manufacturer of the test tells you $P(A | B),$ whereas you need to determine $P(B | A).$ Mixing up the two, or assuming that they’re the same, is known as confusion of the inverse.

So let’s calculate the probability you would need to evaluate the seriousness of your test results. It turns out that it depends in part on how prevalent the disease is. Suppose that, in a population of $1000,$ the disease affects $10\%.$ Then we would expect $100$ people to have the disease and the other $900$ not to have it.

Of those who have the disease, there is a $91\%$ chance that their test comes back positive. So, of the $100$ people, we expect $91$ positive tests. Then the other $9$ are negative.

Of those who don’t have the disease, there is a $97\%$ chance that their test comes back negative. So, of the $900$ people, we expect $873$ negative tests, and the other $27$ are positive.

| | Have disease | Don't have disease |
|---|---|---|
| Positive test | $91$ | $27$ |
| Negative test | $9$ | $873$ |
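The counts in the table above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `expected_counts` and the parameter names `sensitivity` (the chance of a positive test for someone with the disease) and `specificity` (the chance of a negative test for someone without it) are labels chosen for this illustration, not terms used in the walkthrough.

```python
def expected_counts(population, prevalence, sensitivity, specificity):
    """Expected number of people in each cell of the test/disease table.

    All names here are illustrative assumptions, not part of the original text.
    """
    sick = population * prevalence          # people who have the disease
    healthy = population - sick             # people who don't
    true_pos = sick * sensitivity           # sick and test positive
    true_neg = healthy * specificity        # healthy and test negative
    return {
        ("positive", "disease"): true_pos,
        ("positive", "no disease"): healthy - true_neg,
        ("negative", "disease"): sick - true_pos,
        ("negative", "no disease"): true_neg,
    }


# Values from the example: 1000 people, 10% prevalence, 91% and 97% accuracies.
counts = expected_counts(1000, 0.10, 0.91, 0.97)
for cell, value in counts.items():
    print(cell, round(value))
```

Rounding is applied only when printing, since the expected counts are computed with floating-point arithmetic.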

With these values, we’re looking to calculate $P(\text{have the disease } | \text{ positive test}).$ The total number of positive tests is $91+27=118$ and, within that set of people, $91$ of them actually have the disease. So the probability is $P(\text{have the disease } | \text{ positive test}) = \frac{91}{118} \approx 0.77.$

We see that, in this case, $P(\text{have the disease } | \text{ positive test}) \neq P(\text{positive test } | \text{ have the disease}).$

Now let’s quickly look at how the former probability changes when the prevalence of the disease changes. Again, with a population of $1000,$ suppose the disease affects $20\%.$ Then we get the following:

| | Have disease | Don't have disease |
|---|---|---|
| Positive test | $182$ | $24$ |
| Negative test | $18$ | $776$ |

Then $P(\text{have the disease } | \text{ positive test}) = \frac{182}{206} \approx 0.88.$
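The same figure can be reached without a population table by working directly with the probabilities. The following is a sketch using Bayes' theorem, which this walkthrough does not name explicitly. It assumes $0.03 = 1 - 0.97$ is the chance of a positive test for someone without the disease, and $0.80 = 1 - 0.20$ is the share of the population without it:

$$
P(\text{have the disease } | \text{ positive test}) = \frac{0.91 \times 0.20}{0.91 \times 0.20 + 0.03 \times 0.80} = \frac{0.182}{0.206} \approx 0.88.
$$

The numerator and denominator are the table entries $182$ and $206$ divided by the population of $1000.$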

With a more prevalent disease, the probability of having that disease if you tested positive for it is greater. While $P(\text{positive test } | \text{ have the disease})$ is constant, $P(\text{have the disease } | \text{ positive test})$ can change, so we must not assume that they’re the same.
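To see that dependence on prevalence directly, here is a small Python sketch that repeats the calculation for both prevalences used above. The function name `prob_disease_given_positive` and its parameter names are hypothetical, chosen for this illustration. The defaults are the $91\%$ and $97\%$ accuracies from the example test.

```python
def prob_disease_given_positive(prevalence, sensitivity=0.91, specificity=0.97):
    """P(have the disease | positive test) for a given prevalence.

    sensitivity = P(positive test | have the disease)
    specificity = P(negative test | don't have the disease)
    (Illustrative names; the defaults come from the example test.)
    """
    true_pos = prevalence * sensitivity                # share of people: sick and positive
    false_pos = (1 - prevalence) * (1 - specificity)   # share of people: healthy but positive
    return true_pos / (true_pos + false_pos)


for prevalence in (0.10, 0.20):
    result = prob_disease_given_positive(prevalence)
    print(f"prevalence {prevalence:.2f}: {result:.2f}")
# prevalence 0.10: 0.77
# prevalence 0.20: 0.88
```

Only the `prevalence` argument changes between the two calls, yet the result moves from about $0.77$ to about $0.88.$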

Can you distinguish between the two in today’s challenge?

# Today's Challenge

A disease is prevalent in $5\%$ of a population. A test for the disease has the following results:

• If someone has the disease, there is a $94\%$ chance that the test comes back positive (indicating that they have the disease).
• If someone doesn't have the disease, there is a $96\%$ chance that the test comes back negative (indicating that they don't have the disease).

If someone gets tested and the test comes back positive, which of the following is the closest to the probability that they have the disease?