100 Day Challenge 2020

Inverse Probabilities

If you test positive for a disease, what are the chances that you actually have it? A manufacturer might advertise the probability that their test identifies whether someone has a disease, but that’s not necessarily the same as the probability of someone having a disease, given that their test said so.

Those probabilities sound similar, so what is the difference? To walk through an example, keep reading; otherwise, jump straight to today’s challenge to do your own calculation.

The probabilities mentioned above are conditional probabilities. That is, they’re of the form “the probability that $A$ happens, given that $B$ has occurred,” the notation for which is $P(A \mid B)$.

For example, consider a test with the following accuracies:

  • If someone doesn’t have the disease, the probability that the test comes back negative (indicating that they don’t have the disease) is 97%. As a conditional probability, that is $P(\text{negative test} \mid \text{don’t have the disease}) = 0.97$.
  • If someone has the disease, the probability that the test comes back positive (indicating that they have the disease) is 91%. As a conditional probability, that is $P(\text{positive test} \mid \text{have the disease}) = 0.91$.

However, if your test comes back positive, that doesn’t necessarily mean that there is a 91% chance you have the disease. You need to instead find the conditional probability $P(\text{have the disease} \mid \text{positive test})$. That is, the manufacturer of the test tells you $P(A \mid B)$, whereas you need to determine $P(B \mid A)$. Mixing up the two, or assuming that they’re the same, is known as confusion of the inverse.

So let’s calculate the probability you would need to evaluate the seriousness of your test results. It turns out that it depends in part on how prevalent the disease is. Suppose that, in a population of 1000, the disease affects 10%. Then we would expect 100 people to have the disease and the other 900 not to have it.

Of those who have the disease, there is a 91% chance that their test comes back positive. So, of the 100 people, we expect 91 positive tests. The other 9 are negative.

Of those who don’t have the disease, there is a 97% chance that their test comes back negative. So, of the 900 people, we expect 873 negative tests, and the other 27 are positive.

                 Have disease    Don't have disease
Positive test    91              27
Negative test    9               873
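The counting that produces this table can be sketched in a few lines of Python. This is only an illustration, not part of the original walkthrough: the function name `expected_counts` and the parameter names `sensitivity` (the chance of a positive test given the disease) and `specificity` (the chance of a negative test given no disease) are labels chosen here, and counts are rounded to whole people for display.

```python
def expected_counts(population, prevalence, sensitivity, specificity):
    """Expected counts for the 2x2 table of test result vs. disease status.

    sensitivity = P(positive test | have the disease)
    specificity = P(negative test | don't have the disease)
    (Illustrative names; they are assumptions of this sketch.)
    """
    diseased = population * prevalence
    healthy = population - diseased

    true_pos = diseased * sensitivity   # have disease, test positive
    false_neg = diseased - true_pos     # have disease, test negative
    true_neg = healthy * specificity    # no disease, test negative
    false_pos = healthy - true_neg      # no disease, test positive

    # Round to whole people so floating-point noise does not show up.
    return {
        "true_pos": round(true_pos),
        "false_neg": round(false_neg),
        "false_pos": round(false_pos),
        "true_neg": round(true_neg),
    }


counts = expected_counts(1000, 0.10, 0.91, 0.97)
print(counts)
```

With the example’s numbers (population 1000, prevalence 10%, accuracies 91% and 97%), the four entries should match the table: 91, 9, 27, and 873.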

With these values, we’re looking to calculate $P(\text{have the disease} \mid \text{positive test})$. The total number of positive tests is $91 + 27 = 118$ and, within that set of people, 91 of them actually have the disease. So the probability is

$$P(\text{have the disease} \mid \text{positive test}) = \frac{91}{118} \approx 0.77.$$

We see that, in this case,

$$P(\text{have the disease} \mid \text{positive test}) \neq P(\text{positive test} \mid \text{have the disease}).$$

Now let’s quickly look at how the former probability changes when the prevalence of the disease changes. Again, with a population of 1000, suppose the disease affects 20%. Then we get the following:

                 Have disease    Don't have disease
Positive test    182             24
Negative test    18              776

Then $P(\text{have the disease} \mid \text{positive test}) = \frac{182}{206} \approx 0.88$.
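The same two results can be written without a population of 1000, using Bayes’ theorem. The shorthand here is introduced only for this sketch: $D$ means “have the disease,” $\neg D$ means “don’t have the disease,” and $+$ means “positive test.” The false-positive rate $P(+ \mid \neg D) = 1 - 0.97 = 0.03$ follows from the 97% figure given earlier.

```latex
\begin{align*}
P(D \mid +)
  &= \frac{P(+ \mid D)\,P(D)}{P(+ \mid D)\,P(D) + P(+ \mid \neg D)\,P(\neg D)} \\[0.5em]
\text{prevalence } 10\%:\quad
P(D \mid +)
  &= \frac{0.91 \times 0.10}{0.91 \times 0.10 + 0.03 \times 0.90}
   = \frac{0.091}{0.118} \approx 0.77 \\[0.5em]
\text{prevalence } 20\%:\quad
P(D \mid +)
  &= \frac{0.91 \times 0.20}{0.91 \times 0.20 + 0.03 \times 0.80}
   = \frac{0.182}{0.206} \approx 0.88
\end{align*}
```

Multiplying the numerator and denominator by 1000 recovers the fractions $\frac{91}{118}$ and $\frac{182}{206}$ from the counting argument.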

With a more prevalent disease, the probability of having that disease if you tested positive for it is greater. While $P(\text{positive test} \mid \text{have the disease})$ is constant, $P(\text{have the disease} \mid \text{positive test})$ can change, so we must not assume that they’re the same.
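To see this dependence more directly, here is a minimal Python sketch that holds the test’s accuracies fixed at the example’s 91% and 97% and varies only the prevalence. The function name `prob_disease_given_positive` and the prevalence values other than 10% and 20% are choices made for this illustration; they do not come from the example above.

```python
def prob_disease_given_positive(prevalence, sensitivity, specificity):
    """P(have the disease | positive test), computed from proportions.

    sensitivity = P(positive test | have the disease)
    specificity = P(negative test | don't have the disease)
    (Illustrative names; they are assumptions of this sketch.)
    """
    # Fraction of the population that is diseased AND tests positive.
    true_pos = prevalence * sensitivity
    # Fraction of the population that is healthy AND tests positive.
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


# Same test, different prevalences (1% and 50% are added for contrast).
for prevalence in (0.01, 0.10, 0.20, 0.50):
    p = prob_disease_given_positive(prevalence, 0.91, 0.97)
    print(f"prevalence {prevalence:.0%}: P(disease | positive) = {p:.2f}")
```

The rows for 10% and 20% should reproduce the values of about 0.77 and 0.88 found above, while the other rows show how far the answer can move even though the test itself is unchanged.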

Can you distinguish between the two in today’s challenge?

Today's Challenge

A disease is prevalent in 5% of a population. A test for the disease has the following results:

  • If someone has the disease, there is a 94% chance that the test comes back positive (indicating that they have the disease).
  • If someone doesn't have the disease, there is a 96% chance that the test comes back negative (indicating that they don't have the disease).

If someone gets tested and the test comes back positive, which of the following is the closest to the probability that they have the disease?