If you test positive for a disease, what are the chances that you actually have it? A manufacturer might advertise the probability that their test identifies whether someone has a disease, but that’s not necessarily the same as the probability of someone having a disease, given that their test said so.
Those probabilities sound similar, so what is the difference? To walk through an example, keep reading; otherwise, jump straight to today’s challenge to do your own calculation.
The probabilities mentioned above are conditional probabilities. That is, they’re of the form “the probability that $A$ happens, given that $B$ has occurred,” the notation for which is $P(A \mid B)$.
For example, consider a test with the following accuracies:
- If someone doesn’t have the disease, the test comes back negative (indicating that they don’t have the disease) with a stated probability. As a conditional probability, that is $P(\text{negative} \mid \text{no disease})$.
- If someone has the disease, the test comes back positive (indicating that they have the disease) with a stated probability. As a conditional probability, that is $P(\text{positive} \mid \text{disease})$.
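However, if your test comes back positive, that doesn’t necessarily mean that your chance of having the disease equals the probability the manufacturer quotes for a positive test. You need to instead find the conditional probability $P(\text{disease} \mid \text{positive})$. That is, the manufacturer of the test tells you $P(\text{positive} \mid \text{disease})$, whereas you need to determine $P(\text{disease} \mid \text{positive})$. Mixing up the two, or assuming that they’re the same, is known as confusion of the inverse.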
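So let’s calculate the probability you would need to evaluate the seriousness of your test results. It turns out that it depends in part on how prevalent the disease is. Suppose that, in a population of $N$ people, the disease affects a fraction $p$ of them. Then we would expect $pN$ people to have the disease, and the other $(1 - p)N$ don’t have it.
Of those who have the disease, there is a $P(\text{positive} \mid \text{disease})$ chance that their test comes back positive. So, of the $pN$ people, we expect $pN \cdot P(\text{positive} \mid \text{disease})$ positive tests (the true positives). Then the other $pN \cdot \bigl(1 - P(\text{positive} \mid \text{disease})\bigr)$ are negative (the false negatives).
Of those who don’t have the disease, there is a $P(\text{negative} \mid \text{no disease})$ chance that their test comes back negative. So, of the $(1 - p)N$ people, we expect $(1 - p)N \cdot P(\text{negative} \mid \text{no disease})$ negative tests (the true negatives), and the other $(1 - p)N \cdot \bigl(1 - P(\text{negative} \mid \text{no disease})\bigr)$ are positive (the false positives).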
| | Have disease | Don't have disease |
|---|---|---|
| Positive test | true positives | false positives |
| Negative test | false negatives | true negatives |
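To make the bookkeeping concrete, here is a minimal Python sketch that fills in the four cells of such a table. The function name and the input values (a population of 10,000, 1% prevalence, and test accuracies of 90% and 95%) are hypothetical, chosen only for illustration. The parameter names `sensitivity` for $P(\text{positive} \mid \text{disease})$ and `specificity` for $P(\text{negative} \mid \text{no disease})$ follow the usual terminology for these two quantities.

```python
def expected_counts(population, prevalence, sensitivity, specificity):
    """Expected number of people in each cell of the test-result table.

    sensitivity = P(positive | disease)
    specificity = P(negative | no disease)
    """
    diseased = population * prevalence
    healthy = population - diseased
    return {
        "true_positives": diseased * sensitivity,
        "false_negatives": diseased * (1 - sensitivity),
        "false_positives": healthy * (1 - specificity),
        "true_negatives": healthy * specificity,
    }


# Hypothetical inputs, chosen only for illustration.
counts = expected_counts(
    population=10_000, prevalence=0.01, sensitivity=0.90, specificity=0.95
)
for cell, value in counts.items():
    print(f"{cell}: {value:.0f}")
```

Under these assumed inputs the false positives outnumber the true positives, which is why a positive result can be much less alarming than the test's advertised accuracy suggests.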
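With the values in the table, we’re looking to calculate $P(\text{disease} \mid \text{positive})$. The total number of positive tests is the sum of the true positives and the false positives and, within that set of people, only the true positives actually have the disease. So the probability is

$$P(\text{disease} \mid \text{positive}) = \frac{\text{true positives}}{\text{true positives} + \text{false positives}}.$$

We see that, in this case, $P(\text{disease} \mid \text{positive}) \neq P(\text{positive} \mid \text{disease})$.

Now let’s quickly look at how the former probability changes when the prevalence of the disease changes. Again, with a population of the same size, suppose the disease affects a larger share of it. Then we get the following: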
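| | Have disease | Don't have disease |
|---|---|---|
| Positive test | true positives (more than before) | false positives (fewer than before) |
| Negative test | false negatives (more than before) | true negatives (fewer than before) |

Then $P(\text{disease} \mid \text{positive})$, which is still the number of true positives divided by the total number of positive tests, comes out larger than before.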
With a more prevalent disease, the probability of having that disease if you tested positive for it is greater. While $P(\text{positive} \mid \text{disease})$ is constant, $P(\text{disease} \mid \text{positive})$ can change, so we must not assume that they’re the same.
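The effect of prevalence can be checked numerically. The sketch below assumes a hypothetical test with $P(\text{positive} \mid \text{disease}) = 0.90$ and $P(\text{negative} \mid \text{no disease}) = 0.95$. These values, the prevalences, and the function name are illustrative assumptions. The function computes the share of positive tests that are true positives, working per person rather than with a fixed population, since the population size cancels out.

```python
def prob_disease_given_positive(prevalence, sensitivity, specificity):
    """P(disease | positive) as true positives over all positive tests.

    sensitivity = P(positive | disease)
    specificity = P(negative | no disease)
    """
    true_positive_rate = prevalence * sensitivity
    false_positive_rate = (1 - prevalence) * (1 - specificity)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Hypothetical test accuracies, held constant while prevalence varies.
SENSITIVITY = 0.90
SPECIFICITY = 0.95

for prevalence in (0.001, 0.01, 0.1, 0.3):
    result = prob_disease_given_positive(prevalence, SENSITIVITY, SPECIFICITY)
    print(f"prevalence {prevalence:.1%}: P(disease | positive) = {result:.3f}")
```

The same test gives a very different $P(\text{disease} \mid \text{positive})$ at each prevalence, even though $P(\text{positive} \mid \text{disease})$ never changes.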
Can you distinguish between the two in today’s challenge?