Bayesian Theory in Science and Math

Scientists and mathematicians are increasingly realizing that Bayes' theorem has been missing from much historical analysis. In some cases, scientists were unable to carry out analysis that is now possible with Bayes' theorem; in other cases, doctors and scientists failed to apply Bayes' theorem where it was needed, relying instead on frequency probability.^[1] This is especially common in medical testing, where a test is said to be $x\%$ accurate. That does not necessarily mean that any single result has an $x\%$ chance of being correct, merely that over a large population $x\%$ of the results were correct.

Frequentist Probability

Historically, basic frequency probability theory dominated statistical analysis. It yields conclusions like "In a normal, two-sided, unweighted coin, there is a 50% chance of flipping one side, and a 50% chance of flipping the other," but its claims can get more complicated. For instance, a study found that persons with a family income of $20,000 or less have "about twice the incidence of ulcers" compared to persons with family incomes above $20,000. [1]

These probabilities are not wrong. However, they have been misapplied. Throughout the $19^\text{th}$ and $20^\text{th}$ centuries, many mathematicians, scientists, and statisticians resisted using Bayesian theory. For instance, the Scottish mathematician George Chrystal urged that Bayes' theorem, and Laplace's revival of it, "should be decently buried out of sight, and not embalmed in textbooks and examination papers... The indiscretions of great men should be quietly allowed to be forgotten." [2]

This misapplication is prevalent in published scientific and statistical research. Reporting the views of Dr. Andrew Gelman, a statistics professor at Columbia, The New York Times wrote, "Even if scientists always did the calculations correctly—and they don’t, he argues—accepting everything with a $p$-value of 5 percent means that one in 20 'statistically significant' results is nothing but random noise. The proportion of wrong results published in prominent journals is probably even higher, he said, because such findings are often surprising and appealingly counterintuitive." [3]

One way to demonstrate the failures, or at least the poor applicability, of the frequentist approach to probability is the Monty Hall problem. In the problem, Monty Hall is the host of a game show and gives a contestant the chance to choose one of three doors without knowing what is behind them. The catch is that one of the doors hides a prize like a car, and the other two hide goats. After the contestant picks a door, Monty opens one of the doors that the contestant did not pick and reveals a goat behind it. Before the final reveal, Monty gives the contestant the chance to switch their choice of door. The approach guided by frequency probability is to reason that, because only two doors are left and one hides a car and the other a goat, the chance of picking correctly is 50-50, so it doesn't matter whether the contestant switches.

This, however, is incorrect, and Bayesian thinking helps to illustrate why. A Bayesian probabilist will realize that Monty opening one door is additional evidence provided to the contestant. The contestant's initial guess had a 1 in 3 chance of being right and a 2 in 3 chance of being wrong. Because Monty deliberately (and not randomly) eliminates a wrong door, the 2 in 3 chance that the car is behind one of the two unchosen doors now rests entirely on the one unchosen, unopened door. Staying therefore still has a 1 in 3 chance of being right, while switching has a 2 in 3 chance. The Monty Hall problem isn't the only place where educated people become confused. Physicians and scientists have mistakenly used frequency probabilities when they should have used Bayes' theorem to report results and analyze clinical tests.
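The 1 in 3 versus 2 in 3 split can be checked empirically. The following is a minimal simulation sketch, not part of the original argument; the function names `play_monty_hall` and `win_rate`, the trial count, and the fixed seed are all assumptions chosen for illustration.

```python
import random

def play_monty_hall(switch, rng):
    """Play one round; return True if the contestant ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # Monty knowingly opens a goat door that the contestant did not pick.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one door that is neither the first pick nor the opened door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

def win_rate(switch, trials=100_000, seed=0):
    """Estimate the winning probability of a strategy by repeated play."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatable runs
    wins = sum(play_monty_hall(switch, rng) for _ in range(trials))
    return wins / trials

if __name__ == "__main__":
    print("stay:  ", win_rate(switch=False))
    print("switch:", win_rate(switch=True))
```

Over many rounds the "stay" rate should settle near 1/3 and the "switch" rate near 2/3. The key modeling choice is that Monty never opens the door with the car; if he opened a door at random, the result would differ.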

Biomedical Test Results

Many medical tests come with claims like "this test is 99% accurate" (usually the accuracy is less than 99%). Such a figure commonly refers to the test's sensitivity, with the error rate (in this case, 1%) producing false positives or false negatives. Because tests have such high success rates, some physicians look at a single test result and conclude that it is as accurate as the test itself, for instance, that a single patient's result is 99% likely to be correct. The gap has to do with the difference between the posterior probability and the prior probability in Bayes' theorem. Many physicians and scientists neglect the prior probability of the condition, $P(A),$ and so conflate the test's accuracy, $P(B \mid A),$ the probability of the evidence $B$ given $A,$ with the posterior probability, $P(A \mid B),$ the probability of $A$ given that evidence.
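For reference, Bayes' theorem shows how these quantities fit together. The statement below is a sketch in the notation of this section, reading $A$ as "the patient has the condition" and $B$ as "the test is positive"; that reading, and the symbol $\neg A$ for "the patient does not have the condition," are assumptions made here for illustration.

```latex
\[
P(A \mid B)
  = \frac{P(B \mid A)\,P(A)}{P(B)}
  = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid \neg A)\,P(\neg A)}
\]
```

A quoted accuracy figure pins down only $P(B \mid A)$ and $P(B \mid \neg A)$. The posterior $P(A \mid B)$ also depends on the prior $P(A)$, so when the condition is rare, even an accurate test can produce a low posterior.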

In fact, both formal and informal studies have shown that a majority of doctors misunderstand clinical data. In a famous 1982 paper on probabilistic reasoning in clinical medicine,[4] David Eddy examined how the language used to report clinical results steers doctors toward incorrect reasoning. For instance, the breast cancer screening literature includes statements such as "The accuracy of mammography is approximately 90%,"[4] and "The accuracy of mammography in correctly diagnosing malignant lesions of the breast averages 80 to 85 percent,"[4] with specific results reported as "The results showed 79.2 percent of 475 malignant lesions were correctly diagnosed and 90.4 percent of 1,105 of benign lesions were correctly diagnosed for an average of 87 percent."[4] Eddy goes on to describe an informal survey, built around the problem that follows, in which 95 out of 100 doctors mistook accuracy figures like these for the probability that a single patient with a positive result actually has cancer.[4] Other studies have confirmed this, for instance Windeler and Köbberling in 1986[5] and a more recent survey by Wegwarth, Schwartz, Woloshin, Gaissmaier, and Gigerenzer in 2012,[6] showing that many physicians do not apply Bayesian reasoning when interpreting clinical results.

[Figure: a visualization of the above problem.]

Testing Twice

The key to determining if a patient received a false positive or a true positive is in testing the patient twice. The problem at the bottom of this wiki goes through the math here.

Additionally, the medical field often makes claims like "mammogram screening reduces the risk of dying from breast cancer by 25%!" However, the absolute reduction, as opposed to this relative reduction, is often small. For instance, if a group of 1,000 women get breast cancer screening, in 10 years' time three of them might die from breast cancer. If another group of 1,000 women do not get screened, in 10 years' time four of them might die from breast cancer. That is a 25% improvement in relative terms but only one fewer death per 1,000 women in absolute terms, and the relative figure is misleading to an average woman, who might think that getting screened makes her 25% less likely to have cancer.[7]
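The gap between the relative and absolute figures is simple arithmetic. The snippet below is a small sketch using the illustrative counts from the paragraph above (four versus three deaths per 1,000 women); the function name `risk_reduction` is hypothetical.

```python
def risk_reduction(deaths_unscreened, deaths_screened, group_size):
    """Return (relative, absolute) risk reduction for two equal-sized groups."""
    risk_unscreened = deaths_unscreened / group_size
    risk_screened = deaths_screened / group_size
    absolute = risk_unscreened - risk_screened  # difference in risk
    relative = absolute / risk_unscreened       # difference as a share of baseline risk
    return relative, absolute

# Counts taken from the example in the text: 4 vs. 3 deaths per 1,000 women.
relative, absolute = risk_reduction(4, 3, 1000)
print(f"relative risk reduction: {relative:.0%}")
print(f"absolute risk reduction: {absolute:.1%}")
print(f"women screened per death avoided: {1 / absolute:.0f}")
```

The same data yield a 25% relative reduction and a 0.1 percentage point absolute reduction, which is why reporting only the relative figure can mislead.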

Natural Frequencies as an Alternative

A proposed solution to this common misunderstanding is to represent results in terms of natural frequencies. Multiple studies, including work by Hoffrage, Krauss, Martignon, and Gigerenzer, have shown that this drastically improves Bayesian reasoning.[8]

Natural frequencies have been shown to be an effective tool for inducing Bayesian reasoning in numerous laboratory studies,[9] in the interpretation of DNA evidence in court,[10] and in teaching children about Bayesian thinking.[11]

The issue isn't that Bayes' theorem is too difficult to understand; it lies in how risk and probabilities are presented. A natural frequency representation of the above problem would look something like the following:

Out of every 1000 women at age 40 who participate in breast cancer screening, 10 will have breast cancer. Eight out of every 10 women with breast cancer will get a positive mammography (80% of 10). 95 out of every 990 women without breast cancer will also get a positive mammography (9.6% of 990). If you have a new group of women at age 40 and look at those who receive a positive result in the screening, what percentage of these positive results do you think actually indicate that the woman has breast cancer?
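With natural frequencies the answer reduces to a ratio of counts. The sketch below works it out from the counts in the statement above and, as a cross-check, from the equivalent rates (1% prevalence, 80% sensitivity, 9.6% false positive rate); the function names `ppv_from_counts` and `ppv_from_rates` are hypothetical.

```python
def ppv_from_counts(true_positives, false_positives):
    """Share of positive results that are real, from natural-frequency counts."""
    return true_positives / (true_positives + false_positives)

def ppv_from_rates(prior, sensitivity, false_positive_rate):
    """The same quantity via Bayes' theorem, starting from probabilities."""
    true_pos = prior * sensitivity
    false_pos = (1 - prior) * false_positive_rate
    return true_pos / (true_pos + false_pos)

# Counts from the text: 8 of the 10 women with cancer test positive,
# and 95 of the 990 women without cancer also test positive.
counts_answer = ppv_from_counts(true_positives=8, false_positives=95)

# The same problem expressed as rates.
rates_answer = ppv_from_rates(prior=10 / 1000, sensitivity=0.80,
                              false_positive_rate=0.096)

print(f"from counts: {counts_answer:.1%}")
print(f"from rates:  {rates_answer:.1%}")
```

Both routes give just under 8%: of the 8 + 95 = 103 women with a positive result, only 8 have breast cancer. The tiny difference between the two results comes from rounding 9.6% of 990 to 95.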

In this case, because it's the same problem as before, just represented differently, the answer is the same. However, in surveys, physicians were far more likely to reach the correct result when the data were presented this way than when they were presented in the earlier format.

Further Problems

Testing twice: what happens when a patient who got a positive result on the above cancer screening is tested again?
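One way to approach this is to apply Bayes' theorem repeatedly, using the posterior from the first test as the prior for the second. The sketch below assumes the figures from the screening example above (1% prevalence, 80% sensitivity, 9.6% false positive rate) and, importantly, assumes that the two test results are independent given the patient's true condition, which may not hold for a real repeated mammogram. The function names are hypothetical.

```python
def update(prior, sensitivity, false_positive_rate):
    """Posterior probability of disease after one positive test result."""
    true_pos = prior * sensitivity
    false_pos = (1 - prior) * false_positive_rate
    return true_pos / (true_pos + false_pos)

def posterior_after_positives(prior, sensitivity, false_positive_rate, n_positive):
    """Apply the update once per positive result.

    Assumes the results are conditionally independent given disease status.
    """
    p = prior
    for _ in range(n_positive):
        p = update(p, sensitivity, false_positive_rate)
    return p

for n in (1, 2):
    p = posterior_after_positives(0.01, 0.80, 0.096, n)
    print(f"after {n} positive result(s): {p:.1%}")
```

Under these assumptions, one positive result leaves the probability of cancer just under 8%, and a second positive result raises it to roughly 41%.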

References

[1] Everhart, J. E., Byrd-Holt, D., & Sonnenberg, A. (1998). Incidence and Risk Factors for Self-reported Peptic Ulcer Disease in the United States. American Journal of Epidemiology, Vol. 147, No. 6. Accessed on March 3rd 2016 from: http://aje.oxfordjournals.org/content/147/6/529.full.pdf
[2] Muehlhauser, L. A History of Bayes Theorem. Less Wrong. 2011, August 29th. Accessed on March 3rd 2016 from: http://lesswrong.com/lw/774/ahistoryofbayestheorem/
[3] Flam, F.D. The Odds, Continually Updated. The New York Times 2014, Sept. 29 Science Accessed on March 3rd 2016 from: http://www.nytimes.com/2014/09/30/science/the-odds-continually-updated.html?
[4] Eddy DM (1982). "Probabilistic Reasoning in Clinical Medicine: Problems and Opportunities". Cambridge University Press. Accessed on February 24th 2016 from: http://personal.lse.ac.uk/robert49/teaching/mm/articles/Eddy1982ProbReasoningInClinicalMedicine.pdf
[5] Windeler, J., & Köbberling, J. (1986). Empirische Untersuchung zur Einschätzung diagnostischer Verfahren am Beispiel des Haemoccult-Tests. Klinische Wochenschrift, 64(21), 1106-1112.
[6] Wegwarth O, Schwartz LM, Woloshin S, Gaissmaier W, Gigerenzer G. Do Physicians Understand Cancer Screening Statistics? A National Survey of Primary Care Physicians in the United States. Ann Intern Med. Accessed on February 24th 2016 from http://annals.org/article.aspx?articleid=1090696
[7] Hoffrage U.; and Gigerenzer G. (2004) How to Improve the Diagnostic Inferences of Medical Experts. In E. Kurz-Milcke & G. Gigerenzer (Eds.), Experts in science and society (pp. 249–268). Accessed on February 24th 2016 from http://library.mpib-berlin.mpg.de/ft/gg/GGHow2004.pdf
[8] Hoffrage, U., Krauss, S., Martignon, L., & Gigerenzer, G. (2015). Natural frequencies improve Bayesian reasoning in simple and complex inference tasks. Frontiers in Psychology, 6, 1473. http://doi.org/10.3389/fpsyg.2015.01473
[9] Cosmides, L., & Tooby, J. (1996). Are humans good intuitive statisticians after all? Rethinking some conclusions from the literature on judgment under uncertainty. Cognition, 58(1), 1-73. Accessed on February 24th 2016, from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.131.8290&rep=rep1&type=pdf
[10] Lindsey, S., Hertwig, R., & Gigerenzer, G. (2003). Communicating statistical DNA evidence. Jurimetrics, 147-163. Accessed on February 24th 2016, from http://pubman.mpdl.mpg.de/pubman/item/escidoc:2101705/component/escidoc:2101704/SLCommunicating2003.pdf
[11] Zhu, L., & Gigerenzer, G. (2006). Children can solve Bayesian problems: The role of representation in mental computation. Cognition, 98(3), 287-308. Accessed on February 24th 2016, from http://pubman.mpdl.mpg.de/pubman/item/escidoc:2100734/component/escidoc:2100733/GGChild2006.pdf

References

Gillies, D. (2000). Philosophical theories of probability (Philosophical Issues in Science) (p. 88). London, UK: Routledge.
