Berkson's Paradox
Berkson's paradox is a result in statistics, closely related to Simpson's paradox, showing that two variables can be negatively correlated within a selected sample even when they are positively correlated, or not correlated at all, in the overall population. As a simple illustration of a misleading correlation, being taller in a high school may appear to be positively correlated with being good at math. However, once age is taken into account, a student's height and math skills are not correlated, and being taller doesn't make someone better or worse at math. It is simply the case that taller students tend to be older and have studied more math. Many applications of Berkson's paradox are less obvious.
Berkson's paradox is a particular kind of selection bias, a statistical distortion caused by systematically observing some events more than others. In this paradox, observations are restricted to those where a combination of two variables, typically their sum, falls within a certain range. If you know that \(A + B\) must be within a certain range, then a high \(A\) implies a lower \(B\), and vice versa. Since how likely an observation is to be recorded often depends on a combination of variables, Berkson's paradox is ubiquitous. Berkson's paradox, and selection bias in general, appear in almost every field of research, particularly epidemiology and economics.
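The effect of restricting \(A + B\) to a range can be sketched with a small simulation. This is only an illustration under assumed conditions: \(A\) and \(B\) are drawn independently and uniformly from \([0, 1]\), and the bounds `LOW` and `HIGH` are arbitrary values chosen for the example, not taken from any study.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumption: A and B are independent, so the population has no correlation.
population = [(rng.random(), rng.random()) for _ in range(20_000)]

# Selection step: only observations whose sum lies in a narrow range are seen.
LOW, HIGH = 0.9, 1.1  # hypothetical bounds, chosen for illustration
selected = [(a, b) for a, b in population if LOW <= a + b <= HIGH]

r_population = pearson(*zip(*population))
r_selected = pearson(*zip(*selected))
print(f"population: r = {r_population:+.3f}")
print(f"selected:   r = {r_selected:+.3f}")
```

In the full population the correlation is close to zero, while among the selected observations it is strongly negative, because a large \(A\) leaves room only for a small \(B\).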
Patrons rate restaurants based on a number of features, including the quality of the food and the quality of the "atmosphere" (what it looks like, what music they play, etc.). They combine these ratings into an overall rating from one to five stars.
You come across two restaurants that you know have five-star ratings. You can see inside, but you haven't tasted the food yet. Which restaurant is more likely to have the better-tasting food?
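One way to explore the question is to simulate it. The sketch below rests on assumptions that are not part of the example: food and atmosphere scores are independent and uniform between 1 and 5, the overall rating is their plain average, and "five stars" is approximated by an average of at least 4.0.

```python
import random
from statistics import mean

rng = random.Random(2)  # fixed seed so the run is repeatable

# Assumption: (food, atmosphere) scores are independent and uniform on [1, 5].
restaurants = [(rng.uniform(1, 5), rng.uniform(1, 5)) for _ in range(50_000)]

# Assumption: the overall rating is the average of the two scores, and a
# restaurant counts as top-rated when that average is at least 4.0.
top_rated = [(food, atm) for food, atm in restaurants if (food + atm) / 2 >= 4.0]

# Among top-rated restaurants, compare food quality by how the place looks.
food_nice_looking = mean(food for food, atm in top_rated if atm >= 4.0)
food_plain_looking = mean(food for food, atm in top_rated if atm < 4.0)

print(f"mean food score, nice atmosphere:  {food_nice_looking:.2f}")
print(f"mean food score, plain atmosphere: {food_plain_looking:.2f}")
```

Under these assumptions, the top-rated restaurant with the plainer atmosphere tends to have the better food: it needed excellent food to earn its rating, whereas the attractive one could get there partly on looks.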
University Admissions
Universities pick students based on a number of attributes. In the US, two commonly considered attributes are high school GPA and SAT scores. These are positively correlated [1], so one would expect that within a given school they would also be positively correlated. However, this need not be so.
The admissions committee accepts students who have either a sufficiently high GPA, a sufficiently high SAT score, or some combination of the two. However, applicants who have both high GPAs and high SAT scores will likely get into a higher-tier school and not attend, even if they are accepted. The range of students that actually attend the school is given by the blue dots in the plot in the introduction. These dots show a downward trend even though the overall population (red and blue dots) shows an upward trend. This trend reversal is the "paradox," though there is nothing truly paradoxical about it. It is the result of a trade-off between GPA and SAT scores among the students observed.
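The trend reversal can be reproduced in a short simulation. The model here is an assumption made for illustration: GPA and SAT are standardized scores that share a hidden "ability" component (which makes them positively correlated), and the two cutoffs `ADMIT_ABOVE` and `LEAVE_ABOVE` are hypothetical values applied to the sum of the two scores.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(1)  # fixed seed so the run is repeatable


def draw_applicant():
    """Return (gpa, sat) as standardized scores sharing a common component."""
    ability = rng.gauss(0, 1)  # assumed shared factor behind both scores
    return ability + rng.gauss(0, 1), ability + rng.gauss(0, 1)


applicants = [draw_applicant() for _ in range(20_000)]

# Hypothetical firm cutoffs on gpa + sat: below the first, nobody is admitted;
# above the second, everybody goes to a higher-tier school instead.
ADMIT_ABOVE, LEAVE_ABOVE = 0.5, 1.5
attendees = [(g, s) for g, s in applicants if ADMIT_ABOVE <= g + s <= LEAVE_ABOVE]

r_applicants = pearson(*zip(*applicants))
r_attendees = pearson(*zip(*attendees))
print(f"all applicants: r = {r_applicants:+.3f}")
print(f"attendees only: r = {r_attendees:+.3f}")
```

Across all applicants the correlation is positive, while among attendees it is negative, matching the upward and downward trends described for the plot.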
Berkson's paradox is also referred to as Berkson's bias, since this form of selection doesn't necessarily result in a trend reversal. In the example plot, the two lines are firm cutoffs--nobody below the lower line was admitted, and nobody above the upper line decided to attend. However, if the cutoffs were instead probabilistic, so that students with a higher combined GPA and SAT score were merely more or less likely to attend, then the effect would be less drastic. The lines are also relatively close together. If they were sufficiently far apart, so that only students with very low scores were rejected and only students with very high scores went elsewhere, there would still be a positive correlation among students at the school; it would just be smaller than the correlation for all students.
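How the distance between the two cutoffs changes the outcome can be sketched by sweeping the width of the attendance band. As before, this is a toy model under stated assumptions: standardized GPA and SAT scores with a shared component, and a band on their sum centred at the hypothetical value `CENTRE`.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(3)  # fixed seed so the run is repeatable


def draw_student():
    """Return (gpa, sat) as standardized scores sharing a common component."""
    ability = rng.gauss(0, 1)  # assumed shared factor behind both scores
    return ability + rng.gauss(0, 1), ability + rng.gauss(0, 1)


students = [draw_student() for _ in range(20_000)]
r_everyone = pearson(*zip(*students))
print(f"all students: r = {r_everyone:+.3f}")

CENTRE = 1.0  # hypothetical midpoint between the two cutoffs on gpa + sat
r_by_half_width = {}
for half_width in (0.5, 1.0, 2.0, 4.0, 8.0):
    # Students attend only if their combined score lies inside the band.
    attending = [(g, s) for g, s in students if abs(g + s - CENTRE) <= half_width]
    r_by_half_width[half_width] = pearson(*zip(*attending))
    print(f"half-width {half_width:>3}: r = {r_by_half_width[half_width]:+.3f}")
```

With a narrow band the correlation among attendees is negative; as the band widens it turns positive and approaches the population value from below, which is the bias without the reversal.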
Examples
Hospitals
Berkson's paradox was originally discovered in the context of epidemiological studies--studies that track the relationship between disease and exposure to potential risk factors. The relationship between lung cancer and smoking is a good example. It is very convenient to conduct these studies at hospitals, since there are a large number of patients in one place who have already been diagnosed with the disease. (Also, the researchers conducting the studies are frequently doctors.)
However, in the same way that universities accept students based on a combination of positive traits, hospitals accept patients based on a combination of symptoms. For example, if a study wants to know if pregnancy increases or decreases the time for an HIV-positive woman to develop full-blown AIDS, and the study is conducted at an antenatal clinic, the study may be biased. Women will come either to be seen about pregnancy or to be seen about AIDS risk [2]. So, a woman who comes to the clinic and is not pregnant is more likely to have AIDS than a woman in the general population, since she likely came to the clinic for some reason, and that reason isn't pregnancy. Among the clinic's patients, there may therefore seem to be a correlation even if there is none in the general population.
In a hospital-based study determining whether pancreatic cancer and coffee use are correlated, the control group was pulled from patients of a gastroenterologist. However, patients with gastrointestinal distress are less likely to drink coffee than the general population because of that distress. So, the rate of coffee drinking in the test group looked artificially high next to the deflated rate in the control group [3].
Dating
People pick partners for dating based on a combination of physical traits and personality traits. A graph similar to the one for universities could be drawn, with physical and personality traits on the two axes and a region marking the people someone would actually date. Within that region, the two kinds of traits trade off against each other. This may account for the common (though unsubstantiated) observation that handsome men are jerks [4].
Book Recommendations
Researchers have found that quality ratings for books that had won literary awards went down after the book won the award [4]. This seems paradoxical, since people usually like books more if they know other people like them. However, people choose to read a book based on a combination of whether the book is popular and whether it seems interesting to them. If a book is made more popular (by getting awards, for instance), then more of the people who read it will do so because of its popularity rather than because it directly appeals to them. If you want to read the best books, avoid ones that you only want to read because your friends are reading them.
Probabilities
While Berkson's bias can apply to any two random variables, it is most easily expressed in terms of two independent events \(X\) and \(Y\), each with probability strictly between 0 and 1. If \(X\) and \(Y\) are independent, then the conditional probability of \(X\) given that \(X\) or \(Y\) occurred is greater than the conditional probability of \(X\) given both that \(X\) or \(Y\) occurred and that \(Y\) occurred:
\[P(X \mid Y, X \cup Y) < P(X \mid X \cup Y).\]
Consider two binary random variables \(X \in \{0,1\}\) and \(Y \in \{0,1\}\). They are independent and uniformly distributed, so that each combination has probability \(\frac{1}{4}\). Restricting to the event \((X = 1) \cup (Y = 1)\) leaves three possible outcomes, \((X,Y) = (1,0), (0,1)\), and \((1,1)\), each of which has equal probability \(\frac{1}{3}\).
The conditional probability \(P(X = 1 \mid (X = 1) \cup (Y = 1))\) is \(\frac{2}{3}\), since two of the three possible outcomes have \(X = 1\). However, the conditional probability \(P(X = 1 \mid Y = 1, (X = 1) \cup (Y = 1))\) is \(\frac{1}{2}\), since only one of the two remaining outcomes has \(X = 1\).
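The two conditional probabilities for the binary example can be checked by enumerating the four equally likely outcomes with exact fractions. The helper names below (`conditional`, `x_is_one`, and so on) are made up for this sketch.

```python
from fractions import Fraction
from itertools import product

# Uniform joint distribution over (x, y) with x, y in {0, 1}.
joint = {outcome: Fraction(1, 4) for outcome in product((0, 1), repeat=2)}


def conditional(event, given):
    """P(event | given), computed by enumerating the joint distribution."""
    p_given = sum(p for outcome, p in joint.items() if given(outcome))
    p_both = sum(
        p for outcome, p in joint.items() if given(outcome) and event(outcome)
    )
    return p_both / p_given


def x_is_one(outcome):
    return outcome[0] == 1


def x_or_y_is_one(outcome):
    return outcome[0] == 1 or outcome[1] == 1


def y_is_one_and_x_or_y_is_one(outcome):
    return outcome[1] == 1 and x_or_y_is_one(outcome)


p_x_given_union = conditional(x_is_one, x_or_y_is_one)
p_x_given_y_and_union = conditional(x_is_one, y_is_one_and_x_or_y_is_one)

print(p_x_given_union)        # 2/3
print(p_x_given_y_and_union)  # 1/2
```

Learning that \(Y = 1\) lowers the probability that \(X = 1\) from \(\frac{2}{3}\) to \(\frac{1}{2}\), even though \(X\) and \(Y\) are independent before the restriction.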
Collider
Berkson's paradox and Simpson's paradox are both examples of conditioning on causal colliders. When representing how events are causally related, statisticians will draw directed graphs, where an edge from event \(A\) to event \(B\) represents \(A\) causing \(B\). When multiple nodes point into the same node, it's called a "collider," since the incoming arrows "collide" at that node. In the university example, the collider is university attendance, which is caused by both GPA and SAT scores.
The paradox can be restated as saying that controlling for a collider introduces correlations that were not previously there. This is an issue when developing protocols for scientific studies, since usually it is a good thing to control variables so that there is as little difference as possible between two populations.
All of the examples here have relied on a collider that somehow involves summing together two options. However, colliders can, in general, be any function involving two other events. In the Simpson's paradox example of Caltech admission rates, \(X\) and \(Y\) are applicant gender and department, respectively, and the collider is admission rate.
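A collider that is not a sum can be sketched the same way. In this illustration, which is an assumed toy model rather than one of the examples above, \(X\) and \(Y\) are independent standard normal variables, and the collider is the event that at least one of them exceeds a hypothetical threshold.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(4)  # fixed seed so the run is repeatable

# Assumption: the two causes are independent standard normal variables.
samples = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20_000)]

THRESHOLD = 1.0  # hypothetical cutoff, chosen for illustration


def collider(x, y):
    """Non-additive collider: true when either cause exceeds the threshold."""
    return x > THRESHOLD or y > THRESHOLD


# Conditioning on the collider means looking only at samples where it is true.
conditioned = [(x, y) for x, y in samples if collider(x, y)]

r_unconditioned = pearson(*zip(*samples))
r_conditioned = pearson(*zip(*conditioned))
print(f"all samples:          r = {r_unconditioned:+.3f}")
print(f"collider held at True: r = {r_conditioned:+.3f}")
```

The two causes are uncorrelated overall, but once the collider is held fixed a negative correlation appears: if one cause did not exceed the threshold, the other must have.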
Citations
[1] Paulos, John Allen. Do SAT Scores Really Predict Success? ABC News. Retrieved on 4 Mar 2016 from http://abcnews.go.com/Technology/WhosCounting/story?id=98373&page=1
[2] Westreich D. Berkson’s bias, selection bias, and missing data. Epidemiology (Cambridge, Mass). 2012;23(1):159-164. doi:10.1097/EDE.0b013e31823b6296.
[3] University of Illinois at Chicago. Bias in Study Design. Retrieved on 7 Mar 2016 from http://www.uic.edu/classes/epid/epid401/lectures/lecture4.pdf
[4] Ellenberg, Jordan. Why Are Handsome Men Such Jerks? Slate. Retrieved on 5 Mar 2016 from http://www.slate.com/blogs/hownottobewrong/2014/06/03/berksonsfallacywhyarehandsomemensuchjerks.html
[5] Bates, Tim. Retrieved on 5 Mar 2016 from https://en.wikipedia.org/wiki/Collider_(epidemiology)#/media/File:Collider(statistics).png