False Positives: Commentary

Random Allocation in Observational Data: How Small But Robust Effects Could Facilitate Hypothesis-free Causal Inference

Davey Smith, George

doi: 10.1097/EDE.0b013e31821d0426

Conventional observational epidemiology has an unenviable reputation for generating false-positive findings,1,2 or “scares,” as others call them.3 In 1993, for example, the New York Times reported that “vitamin E greatly reduces the risk of heart disease”4 following simultaneous publication of 2 observational studies in the New England Journal of Medicine demonstrating that use of vitamin E supplements, even for just a few years, was associated with a substantially lower risk of coronary heart disease (CHD).5,6 Randomized controlled trials (RCTs)—testing precisely the same hypothesis—revealed no reduction in risk at all.7 As I write this, UK news media are reporting that “junk food makes your kids dumb,” in response to a paper reporting that children given a poor diet at age 3 had lower IQ scores 5 years later.8 In the latter case, however, no RCTs are ever likely to be carried out, and the status of the finding will remain liminal.

The ratio of false-positive to false-negative (FP:FN) published findings in traditional epidemiology is very high, John Ioannidis and colleagues argue.9 But what exactly are “false-positive” findings in the context of conventional observational epidemiology, and how can their prevalence be quantified? My suspicion is that use of vitamin E supplements would—in most contexts—be associated with lower risk of CHD, because of substantial confounding by a myriad of measured and unmeasured factors related to such a socially and behaviorally patterned exposure.10 The same will apply to children fed on junk food and their later intelligence.11 The inability to substantially “control” statistically for confounding in many situations—due to measurement error in assessed confounders and omission of others—remains underappreciated.12,13 In the sense that these associations do exist, they should perhaps not be called false positives; they are false positives only if they are taken to be indicators of underlying causal effects. Noncausal but replicable observational associations could clearly be of considerable value through prediction of disease risk, and targeting of preventive measures to those who can benefit most. For the holy grail of epidemiology—identifying causes of disease—they are a disappointment, however. A wide range of approaches—from formally comparing associations in contexts where confounding structures differ,14 utilizing correlates of the exposure under study that are not plausible causes15 and natural experiments,16 through to formal instrumental variables methods17—offer greater hope for reliable causal inference than plowing on with traditional approaches and keeping the FP:FN ratio high.9
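The point about residual confounding can be made concrete with a small simulation. The following is a minimal sketch, not an analysis from any of the cited studies: all variable names, effect sizes, and error variances are assumptions chosen for illustration. The exposure has no causal effect on the outcome, yet the association survives adjustment when the confounder is measured with error.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def slope(y, x):
    """Unadjusted least-squares slope of y on x."""
    return cov(x, y) / cov(x, x)

def adjusted_slope(y, x, c):
    """Coefficient of x in a least-squares regression of y on x and c."""
    sxx, scc, sxc = cov(x, x), cov(c, c), cov(x, c)
    sxy, scy = cov(x, y), cov(c, y)
    return (sxy * scc - scy * sxc) / (sxx * scc - sxc ** 2)

def simulate_confounding(n=20000, seed=1):
    """Simulate an exposure with NO causal effect on the outcome (assumed values)."""
    rng = random.Random(seed)
    # Unobserved "true" confounder, e.g. a composite of social and behavioral factors
    confounder = [rng.gauss(0, 1) for _ in range(n)]
    # Exposure (say, supplement use) is patterned by the confounder
    exposure = [c + rng.gauss(0, 1) for c in confounder]
    # Outcome depends on the confounder only; the exposure plays no causal role
    outcome = [-c + rng.gauss(0, 1) for c in confounder]
    # What the study actually records: the confounder plus measurement error
    measured = [c + rng.gauss(0, 1) for c in confounder]
    return {
        "crude": slope(outcome, exposure),
        "adjusted_true": adjusted_slope(outcome, exposure, confounder),
        "adjusted_measured": adjusted_slope(outcome, exposure, measured),
    }
```

Under these assumed variances the crude slope is about -0.5 in expectation, adjustment for the true confounder removes it, and adjustment for the error-prone measurement leaves about -1/3: a replicable but noncausal association.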

Ioannidis and colleagues reiterate the low positive predictive value of a nominally “significant” P, something I (along with a generation of epidemiologists, I imagine) first encountered in Michael Oakes' seminal Statistical Inference,18 although (as has been publicly confessed) it did not instantly prevent me from using the word “significant” and over-interpreting such tests.19 As Sander Greenland20 argued in this journal 20 years ago, “randomization provides the key link between inferential statistics and causal parameters.” It is through this prism that we should consider Ioannidis' finding that in genetic association studies, once appropriate thresholds were applied,21 false positives became a rarity.9 Greenland stated that his arguments were “largely derived from the writings of R.A. Fisher,”20 and it was Fisher who clarified that randomization is inherent in genetic analysis. When lecturing on “Statistical Methods in Genetics” in 1951, Fisher clarified the relationship between the 2 disciplines to which he contributed so much22:

And here I may mention a connection between our two subjects which seem not to be altogether accidental, namely that the factorial method of experimentation, now of lively concern so far afield as the psychologists, or the industrial chemists, derives its structure and its name, from the simultaneous inheritance of Mendelian factors. Geneticists certainly need not feel that the intellectual debt is all on one side.22

Fisher goes on:

Genetics is indeed in a peculiarly favoured condition in that Providence has shielded the geneticist from many of the difficulties of a reliably controlled comparison. The different genotypes possible from the same mating have been beautifully randomised by the meiotic process.22

This principle—that analysis of genetic data is analogous to that of a randomized experiment—has, in epidemiology, been termed “Mendelian randomization.”23 This depends on the basic (but approximate) laws of Mendelian genetics. If the probability that a postmeiotic germ cell that has received any particular allele at segregation contributes to a viable conceptus is independent of environment (following from Mendel's first law), and if genetic variants sort independently (following on from Mendel's second law), then these variants will not be associated with the confounding factors that generally distort conventional observational studies.24 Fisher was referring only to the implications of Mendel's second law when he stated that “A more perfect control of conditions is scarcely possible, than that of different genotypes appearing in the same litter,”22 which would imply that family-based analysis is required.24 If, however, basic precautions with respect to population stratification are applied, then, at a population level, genetic variants are indeed unrelated to nongenetic confounding factors, as has been empirically demonstrated.25
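The logic can be sketched with a simulation under the assumptions just stated. This is an illustration only: the allele frequency, effect sizes, and the use of a simple ratio (Wald-type) instrumental variable estimate are assumptions, not values or methods taken from the studies cited. Genotype is drawn independently of the confounder, as meiosis is assumed to do, so the genetic estimate is not distorted by confounding while the conventional observational estimate is.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def simulate_mr(n=50000, true_effect=0.3, allele_freq=0.3, seed=7):
    """Compare a confounded observational slope with a genotype-based ratio estimate."""
    rng = random.Random(seed)
    # Genotype = number of effect alleles (0, 1 or 2), allocated at random
    genotype = [int(rng.random() < allele_freq) + int(rng.random() < allele_freq)
                for _ in range(n)]
    # Confounder is generated independently of genotype
    confounder = [rng.gauss(0, 1) for _ in range(n)]
    # Exposure is raised by each allele (assumed 0.5 units) and by the confounder
    exposure = [0.5 * g + u + rng.gauss(0, 1)
                for g, u in zip(genotype, confounder)]
    # Outcome depends causally on the exposure and, separately, on the confounder
    outcome = [true_effect * x + u + rng.gauss(0, 1)
               for x, u in zip(exposure, confounder)]
    return {
        # Conventional observational estimate: biased by the confounder
        "observational": cov(exposure, outcome) / cov(exposure, exposure),
        # Ratio estimate: genotype-outcome over genotype-exposure association
        "genetic_ratio": cov(genotype, outcome) / cov(genotype, exposure),
        # Check that genotype is (near enough) unrelated to the confounder
        "genotype_confounder_cov": cov(genotype, confounder),
    }
```

With these assumed values the observational slope lands well above the true effect of 0.3, whereas the genotype-based ratio is centered on it, albeit with a wider sampling error.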

For Fisher, perhaps, Mendelian randomization provided the basis for formularizing randomization in experiments. Fisher appears to be saying this in his 1951 lecture, although his initial advocacy of randomization related to ensuring symmetry of the error distribution,26,27 and “ensur[ing] the validity of normal-theory analysis.”28 His daughter, Joan Fisher Box, writes “the structure of the factorial experiment was borrowed, in all its efficiency and versatility, from genetics.”29 She reminds us that the analysis of variance components was developed in Fisher's pioneering work on polygenic inheritance,30 and it was in the context of analysis of variance that he first hinted at randomization, when crop variation data were analyzed as “if all the plots are undifferentiated, as if the numbers had been mixed up and written down in random order.”31 Fisher's students similarly consider that his pioneering genetic work, in particular with respect to polygenic inheritance,30 was reflected in his later work on the design of experiments.32,33 Iain Chalmers has pointed out that statistical theory did not underlie the development of controlled trials in medicine,34 and it is entertaining to speculate that, rather than statistical theory, it was analogy with the factorial randomization of Mendel's second law that provided the basis for the development of randomized experiments in general.

In biomedical science the key value of Mendelian randomization is that genetic variants can proxy for modifiable risk factors, and their randomization allows considerably greater inferential power than is provided in conventional observational epidemiology.24 The principles have been reviewed24,35,36 and a now very substantial body of empirical studies has been reported in the leading medical journals. Indeed, for biomarkers, this approach is rapidly becoming de rigueur.37,38 Obstacles to reliable interpretation from Mendelian randomization studies have been discussed at considerable length.24,35,36 Important issues include low statistical power of the instrumental variable analyses conducted within this framework and the possibility of confounding being reintroduced by pleiotropy, that is, possible multiple functional consequences of genetic variants. Here the right-hand term in Ioannidis' FP:FN ratio comes into play. The very low ratio with respect to properly conducted genetic studies reflects the fact that there are many false-negative findings and a very large number of common variants waiting to be identified through adequately powered studies.39 Large numbers of variants (eg, over 200 variants for height, 100 for circulating lipids, etc) have already been identified in genome-wide association studies, and the harvest will continue.40 These allow the construction of allele scores41 that explain more of the variance in the proxied-for phenotype and thus increase the power beyond that of the single-variant approaches used in most Mendelian randomization studies to date. Indeed, such allele scores often explain more of the variance in the phenotype than any potentially randomizable intervention could.
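The gain from allele scores can be illustrated with a back-of-envelope calculation. The sketch below rests on stated assumptions: an additive model, Hardy-Weinberg proportions, mutually independent variants, a phenotype standardized to unit variance, and the usual large-sample approximation that the variance of an instrumental variable estimate is inversely proportional to sample size times the variance in the phenotype explained by the instrument. The function names and the example frequencies and effects are hypothetical.

```python
def variance_explained(freq, beta):
    """Share of variance in a standardized phenotype explained by one variant.

    Assumes an additive per-allele effect `beta` (in SD units) and
    Hardy-Weinberg proportions, so genotype variance is 2 * freq * (1 - freq).
    """
    return 2 * freq * (1 - freq) * beta ** 2

def score_variance_explained(variants):
    """Variance explained by an allele score built from independent variants.

    `variants` is a list of (allele frequency, per-allele effect) pairs;
    independence lets the individual contributions simply add up.
    """
    return sum(variance_explained(freq, beta) for freq, beta in variants)

def relative_sample_size(r2_single, r2_score):
    """Approximate sample size needed with the score, relative to one variant.

    Uses the approximation var(IV estimate) proportional to 1 / (n * R^2),
    so equal precision requires n_score / n_single = r2_single / r2_score.
    """
    return r2_single / r2_score

# Hypothetical example: ten independent variants, each with allele
# frequency 0.5 and an effect of 0.1 SD per allele.
example_variants = [(0.5, 0.1)] * 10
single_r2 = variance_explained(0.5, 0.1)
score_r2 = score_variance_explained(example_variants)
sample_ratio = relative_sample_size(single_r2, score_r2)
```

In this hypothetical case a single variant explains 0.5% of the variance and the ten-variant score 5%, so the score would need roughly one tenth of the sample size for the same precision.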

More importantly, multiple variants (or independent combinations of variants) working through different pathways can be used as separate instruments. If these predict the same causal effect of the proxied-for environmentally modifiable risk factor, then it becomes much less plausible that reintroduced confounding (through pleiotropy, for example) explains the association, because the confounding would have to be acting in the same way for these 2 unlinked variants or combination of variants. This can be likened to RCTs of blood pressure–lowering agents (eg, diuretics and ACE inhibitors), which work through different biologic mechanisms and have different potential side effects. If the various agents produce the reductions in cardiovascular disease risk predicted by the degree to which they lower blood pressure, then it is unlikely that they act through agent-specific (pleiotropic) effects of the drugs; rather, such a finding points to blood pressure lowering as being key. The latter is indeed what is generally observed.42
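The consistency check described above can be sketched as a simulation with two unlinked variants used as separate instruments. Everything here is an assumption made for illustration (allele frequencies, effect sizes, and the simple ratio estimator); it is not the analysis of any cited study. When neither variant is pleiotropic the two estimates agree; when one variant is given a direct effect on the outcome that bypasses the exposure, they diverge.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def two_instrument_estimates(direct_effect_g2=0.0, n=50000,
                             true_effect=0.3, allele_freq=0.4, seed=11):
    """Return ratio estimates of the exposure effect from two unlinked variants.

    `direct_effect_g2` is a hypothetical pleiotropic effect of the second
    variant on the outcome that does not pass through the exposure.
    """
    rng = random.Random(seed)

    def draw_genotype():
        # Number of effect alleles (0, 1 or 2), drawn independently per variant
        return int(rng.random() < allele_freq) + int(rng.random() < allele_freq)

    g1 = [draw_genotype() for _ in range(n)]
    g2 = [draw_genotype() for _ in range(n)]
    confounder = [rng.gauss(0, 1) for _ in range(n)]
    # Both variants raise the exposure by an assumed 0.5 units per allele
    exposure = [0.5 * a + 0.5 * b + u + rng.gauss(0, 1)
                for a, b, u in zip(g1, g2, confounder)]
    outcome = [true_effect * x + direct_effect_g2 * b + u + rng.gauss(0, 1)
               for x, b, u in zip(exposure, g2, confounder)]
    estimate_1 = cov(g1, outcome) / cov(g1, exposure)
    estimate_2 = cov(g2, outcome) / cov(g2, exposure)
    return estimate_1, estimate_2
```

Calling `two_instrument_estimates()` gives two estimates that both sit near the assumed true effect of 0.3, while `two_instrument_estimates(direct_effect_g2=0.2)` leaves the first estimate in place and shifts the second upward, flagging that at least one instrument violates the assumptions.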

Multiple variant approaches have been reported in the Mendelian randomization context,43 but they have not yet exploited the possibilities provided by large numbers of independent variants. These allow very large numbers of combinations to be used as instruments to evaluate the validity of the assumptions. In this setting, even if some variants do have pleiotropic effects, this will balance out across the different combinations. More speculatively, in the future we will be able to use these methods to obtain hypothesis-free causal inference, through combining forward genetics (phenotype to genetic variant) and reverse genetics (genetic variant to phenotype) approaches. As a shorthand it can be said that traits tend to have heritability of around 50%,44 and rather few traits with low to zero heritability exist. Genome-wide association data (or, in time, whole-genome sequence data) can be used to generate multiple instruments for each trait, which can then be related to every other variable, and in this way the causal structure will be revealed. The wonderful “hypothesis-generating machine,” invented by Philip Cole and announced to the world in this journal in 1993,45 will become redundant. We will enter the pure Fisherian world, where inferential statistics have meaning. Greenland envisaged this possibility in 1990, stating that “If nature or circumstance resulted in what is essentially random allocation and we knew this was so, we could employ all the above interpretations of our statistics.”20 He thought, however, that “such ‘natural experiments’ are rare.”20 They no longer are: unmediated hypothesis-free causality is available to all.


GEORGE DAVEY SMITH is Professor of Clinical Epidemiology at the University of Bristol, Director of the MRC Centre for Causal Analyses in Translational Epidemiology (CAiTE) and coeditor of the International Journal of Epidemiology. After nearly 3 decades of contributing to the miasma of obvious, spurious or uninteresting epidemiologic findings, he is attempting to do something more useful during his declining years.


I thank Sander Greenland, Debbie Lawlor, and John Lynch for their comments on an earlier draft.


1.Taubes G, Mann CC. Epidemiology reaches its limits. Science. 1995;269:164–169.
2.Boffetta P. False positive results in cancer epidemiology. J Natl Cancer Inst. 2008;100:988–995.
3.Brignell J. The Epidemiologists: Have They Got Scares for You! London: Brignell Associates; 2004.
4.Brody JE. Vitamin E greatly reduces risk of heart disease, studies suggest. New York Times. May 20th, 1993.
5.Rimm EB, Stampfer MJ, Ascherio A, et al. Vitamin E consumption and the risk of coronary heart disease in men. N Engl J Med. 1993;328:1450–1456.
6.Stampfer MJ, Hennekens CH, Manson JE, et al. Vitamin E consumption and the risk of coronary disease in women. N Engl J Med. 1993;328:1444–1449.
7.Eidelman RS, Hollar D, Hebert PR, et al. Randomized trials of vitamin E in the treatment and prevention of cardiovascular disease. Arch Intern Med. 2004;164:1552–1556.
8.Northstone K, Joinson C, Emmett P, et al. Are dietary patterns in childhood associated with IQ at 8 years of age? A population-based cohort study. J Epidemiol Community Health. In press.
9.Ioannidis J, Tarone R, McLaughlin JK. The false-positive to false-negative ratio in epidemiologic studies. Epidemiology. 2011;22:450–456.
10.Lawlor DA, Davey Smith G, Bruckdorfer KR, et al. Those confounded vitamins: what can we learn from the differences between observational versus randomised trial evidence? Lancet. 2004;363:1724–1727.
11.Connor S. Lab notes: is fast food making children stupid? Don't swallow the stories. The Independent. February 11th, 2011.
12.Phillips A, Davey Smith G. How independent are “independent” effects? Relative risk estimation when correlated exposures are measured imprecisely. J Clin Epidemiol. 1991;44:1223–1231.
13.Phillips AN, Davey Smith G. Bias in relative odds estimation owing to imprecise measurement of correlated exposures. Stat Med. 1992;11:953–961.
14.Brion M-J, Lawlor DA, Matijasevich A, et al. What are the causal effects of breastfeeding on IQ, obesity and blood pressure? Evidence from comparing high-income and middle-income cohorts. Int J Epidemiol. 2011; doi: 10.1093/ije/dyr020.
15.Davey Smith G. Assessing intrauterine influences on offspring health outcomes: can epidemiological findings yield robust results? Basic Clin Pharmacol Toxicol. 2008;102:245–256.
16.Rutter M. Epidemiological methods to tackle causal questions. Int J Epidemiol. 2009;38:3–6.
17.Angrist JD, Pischke JS. Mostly Harmless Econometrics: An Empiricist's Companion. Princeton, NJ: Princeton University Press; 2008.
18.Oakes M. Statistical Inference: A Commentary for the Social and Behavioural Sciences. Chichester: Wiley; 1986.
19.Sterne J, Davey Smith G. Sifting the evidence—what's wrong with significance tests? BMJ. 2001;322:226–231.
20.Greenland S. Randomization, statistics, and causal inference. Epidemiology. 1990;1:421–429.
21.Colhoun HM, McKeigue PM, Davey Smith G. Problems of reporting genetic associations with complex outcomes. Lancet. 2003;361:865–872.
22.Fisher RA. Statistical methods in genetics [reprint in: Int J Epidemiol. 2010;39:335–339]. Heredity. 1952;6:1–12.
23.Davey Smith G. Capitalizing on Mendelian randomization to assess the effects of treatments. J R Soc Med. 2007;100:432–435. Available at:
24.Davey Smith G, Ebrahim S. Mendelian randomization: can genetic epidemiology contribute to understanding environmental determinants of disease? Int J Epidemiol. 2003;32:21–25.
25.Davey Smith G, Lawlor DA, Harbord R, et al. Clustered Environments and Randomized Genes: a fundamental distinction between conventional and genetic epidemiology. PLoS Med. 2008;4:1985–1992.
26.Fisher RA. The arrangement of field experiments. J Min Agric G Br. 1926;33:503–513.
27.Box JF. R.A. Fisher: The Life of a Scientist. New York: John Wiley & Sons; 1978.
28.Hinkley DV. Contribution to the discussion of Basu's “Randomization Analysis of Experimental Data: The Fisher Randomization Test.” J Am Stat Assoc. 1980;75:582–584.
29.Box JF. On RA Fisher's Bateson lecture on statistical methods in genetics. Int J Epidemiol. 2010;39:335–339.
30.Fisher RA. The correlation between relatives on the supposition of Mendelian inheritance. Trans R Soc Edinb. 1918;52:399–433.
31.Fisher RA, Mackenzie WA. Studies in crop variation. II the manurial response of different potato varieties. J Agric Sci. 1923;13:311–320.
32.Bodmer WF. Commentary: Connections between genetics and statistics: a commentary on Fisher's 1951 Bateson lecture—“Statistical Methods in Genetics.” Int J Epidemiol. 2010;39:340–344.
33.Finney DJ. Commentary: “Statistical Methods in Genetics” by Sir Ronald A Fisher. Int J Epidemiol. 2010;39:339–340.
34.Chalmers I. Statistical theory was not the reason that randomization was used in the British Medical Research Council's clinical trial of streptomycin for pulmonary tuberculosis. In: Jorland G, Opinel A, Weisz G, eds. Body Counts: Medical Quantification in Historical and Sociological Perspectives. Montreal: McGill-Queen's University Press; 2005:309–334.
35.Davey Smith G. Use of genetic markers and gene-diet interactions for interrogating population-level causal influences of diet on health. Genes Nutr. 2011;6:27–43.
36.Sheehan NA, Didelez V, Burton PR, et al. Mendelian randomisation and causal inference in observational epidemiology. PLoS Med. 2008;5:e177.
37.Thanassoulis G, O'Donnell CJ. Mendelian randomization: nature's randomised trial in the post genome era. JAMA. 2009;301:2386–2388.
38.Shah SH, de Lemos JA. Biomarkers and cardiovascular disease: determining causality and quantifying contribution to risk assessment. JAMA. 2009;302:92–93.
39.Yang J, Benyamin B, McEvoy BP, et al. Common SNPs explain a large proportion of the heritability for human height. Nat Genet. 2010;42:565–569.
40.National Human Genome Research Institute. A catalog of published genome wide association studies. Available at:
41.Pierce BL, Ahsan H, VanderWeele TJ. Power and instrument strength requirements for Mendelian randomization studies using multiple genetic variants. Int J Epidemiol. 2011; doi: 10.1093/ije/dyq151.
42.Law MR, Morris JK, Wald NJ. Use of blood pressure lowering drugs in the prevention of cardiovascular disease: meta-analysis of 147 randomised trials in the context of expectations from prospective epidemiological studies. BMJ. 2009;338:b1665.
43.Timpson NJ, Sayers A, Davey Smith G, Tobias JH. How does body fat influence bone mass in childhood? A Mendelian randomisation approach. J Bone Miner Res. 2009;24:522–533.
44.MacGregor AJ, Snieder HH, Schork NJ, Spector TD. Twins—novel uses to study complex traits and genetic diseases. Trends Genet. 2000;16:131–134.
45.Cole P. The hypothesis generating machine. Epidemiology. 1993;4:271–273.
© 2011 Lippincott Williams & Wilkins, Inc.