Simpson's paradox is a paradox from statistics. It is named after Edward H. Simpson, a British statistician who described it in 1951.[1] The statistician Karl Pearson had already described a very similar effect in 1899,[2] and Udny Yule described it in 1903.[3] For this reason, it is sometimes called the Yule–Simpson effect. The paradox is that a result seen in each of several groups can change, or even reverse, when the groups are combined into one larger group. This case often occurs in social sciences and medical statistics.[4] It can be misleading when frequency data are used to explain a causal relationship.[5] Other names for the paradox include reversal paradox and amalgamation paradox.[6]

## Example: Kidney stone treatment

This is a real-life example from a medical study[7] comparing the success rates of two treatments for kidney stones.[8]

The table shows the numbers of successful and failed treatments, and the matching rates, for patients with small and large kidney stones. Treatment A includes all open procedures, and Treatment B is percutaneous nephrolithotomy:

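| | Treatment A: success | Treatment A: failure | Treatment B: success | Treatment B: failure |
|---|---|---|---|---|
| **Small stones** | Group 1 | | Group 2 | |
| Number of patients | 81 | 6 | 234 | 36 |
| Share of group | 93% | 7% | 87% | 13% |
| **Large stones** | Group 3 | | Group 4 | |
| Number of patients | 192 | 71 | 55 | 25 |
| Share of group | 73% | 27% | 69% | 31% |
| **Both** | Group 1+3 | | Group 2+4 | |
| Number of patients | 273 | 77 | 289 | 61 |
| Share of group | 78% | 22% | 83% | 17% |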

The paradoxical conclusion is that treatment A is more effective when used on small stones, and also when used on large stones, yet treatment B appears more effective when both sizes are considered together. In this example, the size of the kidney stone influenced the result, but this was not known when the groups were combined. Such a variable is called a hidden variable (or lurking variable) in statistics.
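The comparison can be checked with a few lines of Python. This is only a minimal sketch: the counts are copied from the table, while the names `counts`, `success_rate` and `pooled_rate` are made up for this illustration and do not come from the study.

```python
from fractions import Fraction

# (successes, failures) per treatment and stone size, copied from the table.
counts = {
    "A": {"small": (81, 6), "large": (192, 71)},
    "B": {"small": (234, 36), "large": (55, 25)},
}


def success_rate(successes, failures):
    """Return successes / total as an exact fraction."""
    return Fraction(successes, successes + failures)


def pooled_rate(by_size):
    """Success rate after adding up the small-stone and large-stone groups."""
    successes = sum(s for s, _ in by_size.values())
    failures = sum(f for _, f in by_size.values())
    return success_rate(successes, failures)


# Success rate inside each group (stone size looked at separately).
rates = {
    treatment: {size: success_rate(*sf) for size, sf in by_size.items()}
    for treatment, by_size in counts.items()
}

# Success rate when stone size is ignored.
pooled = {treatment: pooled_rate(by_size) for treatment, by_size in counts.items()}

for size in ("small", "large"):
    better = "A" if rates["A"][size] > rates["B"][size] else "B"
    print(f"{size} stones: A {float(rates['A'][size]):.0%}, "
          f"B {float(rates['B'][size]):.0%}, better: {better}")

better = "A" if pooled["A"] > pooled["B"] else "B"
print(f"both sizes: A {float(pooled['A']):.0%}, "
      f"B {float(pooled['B']):.0%}, better: {better}")
```

Exact fractions are used so that the comparisons do not depend on rounding. Treatment A comes out ahead for each stone size, and treatment B comes out ahead when the sizes are combined.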

Which treatment is considered better is determined by an inequality between two ratios (successes/total). The reversal of the inequality between the ratios, which creates Simpson's paradox, happens because two effects occur together:

1. The sizes of the groups, which are combined when the lurking variable is ignored, are very different. Doctors tend to give the severe cases (large stones) the better treatment (A), and the milder cases (small stones) the inferior treatment (B). Therefore, the totals are dominated by groups 3 and 2, and not by the two much smaller groups 1 and 4.
2. The lurking variable has a large effect on the ratios, i.e. the success rate is more strongly influenced by the severity of the case than by the choice of treatment. Therefore, the group of patients with large stones who received treatment A (group 3) does worse than the group with small stones, even though the latter received the inferior treatment B (group 2).
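Both effects can be seen by writing each overall success rate as a weighted average of the rates for the two stone sizes. The following is a worked calculation with the counts from the table; the symbols $r_A$ and $r_B$ for the overall success rates are notation chosen here and are not taken from the sources.

```latex
\begin{align*}
r_A &= \underbrace{\frac{87}{350}}_{\text{small}} \cdot \frac{81}{87}
     + \underbrace{\frac{263}{350}}_{\text{large}} \cdot \frac{192}{263}
     = \frac{273}{350} \approx 0.78, \\
r_B &= \underbrace{\frac{270}{350}}_{\text{small}} \cdot \frac{234}{270}
     + \underbrace{\frac{80}{350}}_{\text{large}} \cdot \frac{55}{80}
     = \frac{289}{350} \approx 0.83.
\end{align*}
```

About three quarters of the weight for treatment A falls on the large stones, where success is less likely with either treatment. About three quarters of the weight for treatment B falls on the small stones. This difference in weights is enough to reverse the inequality, even though A has the higher rate for each stone size.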

## References

1. Simpson, Edward H. (1951). "The Interpretation of Interaction in Contingency Tables". Journal of the Royal Statistical Society, Ser. B 13: 238–241.
2. Pearson, Karl; Lee, A.; Bramley-Moore, L. (1899). "Genetic (reproductive) selection: Inheritance of fertility in man". Philosophical Transactions of the Royal Society, Ser. A 173: 534–539.
3. G. U. Yule (1903). "Notes on the Theory of Association of Attributes in Statistics". Biometrika 2 (2): 121–134. doi:10.1093/biomet/2.2.121.
4. Clifford H. Wagner (February 1982). "Simpson's Paradox in Real Life". The American Statistician 36 (1): 46–48. doi:10.2307/2684093.
5. Judea Pearl. Causality: Models, Reasoning, and Inference, Cambridge University Press (2000, 2nd edition 2009). ISBN 0-521-77362-8.
6. I. J. Good, Y. Mittal (June 1987). "The Amalgamation and Geometry of Two-by-Two Contingency Tables". The Annals of Statistics 15 (2): 694–711. doi:10.1214/aos/1176350369. ISSN 0090-5364.
7. C. R. Charig, D. R. Webb, S. R. Payne, J. E. Wickham (29 March 1986). "Comparison of treatment of renal calculi by open surgery, percutaneous nephrolithotomy, and extracorporeal shockwave lithotripsy". Br Med J (Clin Res Ed) 292 (6524): 879–882. doi:10.1136/bmj.292.6524.879. PMC 1339981. PMID 3083922.
8. Steven A. Julious and Mark A. Mullee (3 December 1994). "Confounding and Simpson's paradox". BMJ 309 (6967): 1480–1481. PMC 2541623. PMID 7804052.