From the *Departments of Epidemiology and Statistics, University of California Los Angeles, Los Angeles, California; †Department of Preventive Medicine, USC/Norris Comprehensive Cancer Center, Keck School of Medicine of the University of Southern California, Los Angeles, California.
Correspondence: Sander Greenland, Departments of Epidemiology and Statistics, University of California Los Angeles, Los Angeles, CA 90095-1772.
We thank the commentators for their thoughtful observations. We heartily endorse the comments from Weiss.1 Susser2 addresses issues beyond our scope, emphasizing the importance of social factors for health; as he notes, this emphasis doesn't lessen the importance of the individual factors such as diet (especially when those factors mediate effects of social factors like food pricing and promotion). Social science and psychology also bear on problems of methods and inference: not only does social context heavily influence what is studied and how, it also influences presentation and interpretation of results.3 For example, the association of residential magnetic fields with health was deemed outlandish by the highest caste of scientists (physicists); we suspect that is why some major studies were reported as negative even though they exhibited positive associations similar to those seen in most studies.4 In light of this controversy and others (eg, the Byzantine history of hormone replacement), epidemiology should be a rich source of material for studies in the sociology of science.
Haack5 raises many subtle issues, as do Mayo and Spanos;6 in the brief space allowed we can do little more than point to our longer discussions.7–15 These commentators interpreted “risk-factor epidemiology” as analysis of data collected for purposes other than that at hand. For us, it also includes studies that collect data to test primitives like “X causes Y,” with (at most) only speculative hypotheses about the innards of the black box connecting X to Y. Here, hypothesized mechanisms do little more than make the causal hypothesis sound reasonable (which is valuable: implausibility is what dogs magnetic-field research).
Haack5 proposes “honesty” as a methodologic need. Honesty, balance, and an ability to see and report unpleasant facts are usually taken for granted. Yet prejudicial reporting happens, eg, via selective citation or criticism of mixed literature.7 Imbalance is built into social conventions (like null-hypothesis testing) that privilege the null and misrepresent absence of evidence as evidence of absence.8 It is worsened by naive discussions of policy implications, without thought to other evidence or to practicality or side-effects of actions. Balance is a casualty when scientists become advocates of hypotheses or actions.7,8 Seeing the damage done by unquestioned conventions and advocacy,7–9 we see merit in accurate descriptions of observations and the circumstances surrounding the observation, unpressured by demands to “reach conclusions”—even in literature review.
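To make the point about null-hypothesis testing concrete, the sketch below uses a hypothetical study (not one from the literature cited here) reporting a risk ratio (RR) of 1.5 with 95% confidence interval 0.9 to 2.5. It recovers the standard error of log(RR) from the interval under the usual Wald (normal) approximation and computes 2-sided P-values for 2 hypotheses. The function names are illustrative only.

```python
import math
from statistics import NormalDist

def se_from_ci(lower, upper, level=0.95):
    """Recover the standard error of log(RR) from a reported Wald confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (math.log(upper) - math.log(lower)) / (2 * z)

def p_value(rr_hat, se, rr_test=1.0):
    """Two-sided Wald P-value for the hypothesis RR = rr_test (log scale)."""
    z = (math.log(rr_hat) - math.log(rr_test)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical study: RR = 1.5, 95% CI 0.9 to 2.5
se = se_from_ci(0.9, 2.5)
p_null = p_value(1.5, se, 1.0)     # test of "no effect"
p_double = p_value(1.5, se, 2.0)   # test of "doubled risk"
print(f"P(RR=1) = {p_null:.2f}, P(RR=2) = {p_double:.2f}")
```

The null test gives P of about 0.12, which conventions would label "negative," yet the same data give P of about 0.27 for a doubling of risk. Reporting only the first number treats absence of evidence as evidence of absence.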
We agree with Mayo and Spanos6 that statistical regularities can be established without appeal to explanatory theory; indeed, searching for statistical regularities (data mining) is one definition of “black-box” research. Machine learning (algorithmic modeling) shows just how far one can go with “black-box” statistics when pure prediction is the goal and the error structure is identifiable.16 We saw statistical regularity in the relation of magnetic fields to leukemia in case-control studies,4 and this regularity successfully predicted other results.10, sec.3 Whether it represents field effects or shared biases is another matter.
Mayo and Spanos6 say, “the ability to establish the validity of statistical assumptions without appeals to substantive theories is key to experimental knowledge of real effects.” But this key is missing in observational epidemiology. Its absence pulls the plug on frequentist methods, for those assume the data arise from experiments with adequate control or knowledge of errors (both random and systematic). In most epidemiology (and, we suspect, in most social science) claims of adequate error control are unfounded; in this sense the data are nonexperimental. Worse, such control is usually infeasible within the severe budgetary and ethical constraints of human-subjects research. Hence, there are enduring controversies about whether certain associations even exist, let alone are causal.
Without adequate error control, the most we can infer from conventional statistics is that some study designs tend to exhibit a pattern of results. The key assumption implicit in inference to targets of interest is that the data were generated via an identifiable random process (a design with errors known up to an estimable set of parameters) from a population like the target. Most epidemiologic analyses take statistical assumptions for granted, yet this one is absurd because identified random components of typical epidemiologic studies (eg, random-digit dialing) do not dominate the data-generating process.10–12 When someone thinks a statistic provides “knowledge of real effects,” they are assuming it successfully accounts for, or controls, not only random error but also residual confounding, selection bias, and measurement error. Such assumptions are usually untestable with epidemiologic data, which renders specification testing moot; that is, realistic models for epidemiologic data are not identified.10,12 Exhortations to “collect better data” to address this problem will (rightly) be seen as naive, given the complexities of epidemiologic research.
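A toy simulation can illustrate why a conventional confidence interval says nothing about residual confounding. In the sketch below, all parameters are hypothetical: a binary factor U raises both exposure prevalence and disease risk, exposure X has no effect at all (true RR = 1), and the analyst computes the crude RR with the usual large-sample interval, ignoring U. The sketch counts how often the nominal 95% interval covers the true value.

```python
import math
import random

def simulate_crude_rr(n, rng):
    """One hypothetical cohort: binary confounder U; exposure X has NO effect on Y."""
    a = n1 = c = n0 = 0  # exposed cases, exposed total, unexposed cases, unexposed total
    for _ in range(n):
        u = rng.random() < 0.5
        x = rng.random() < (0.6 if u else 0.2)    # U raises exposure prevalence
        y = rng.random() < (0.10 if u else 0.05)  # U raises risk; X does nothing
        if x:
            n1 += 1
            a += y  # bool counts as 0 or 1
        else:
            n0 += 1
            c += y
    log_rr = math.log((a / n1) / (c / n0))
    se = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)  # usual large-sample SE of log RR
    return log_rr, se

def coverage(n, reps=200, true_rr=1.0, seed=2024):
    """Fraction of nominal 95% Wald intervals (crude analysis) that contain true_rr."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        log_rr, se = simulate_crude_rr(n, rng)
        hits += (log_rr - 1.96 * se) <= math.log(true_rr) <= (log_rr + 1.96 * se)
    return hits / reps

results = {n: coverage(n) for n in (1000, 10000)}
for n, cov in results.items():
    print(f"n = {n}: coverage of nominal 95% interval = {cov:.2f}")
```

Under these settings the crude RR converges to about 1.31 rather than 1, so coverage falls well short of 95% and gets worse as the sample grows: a larger study shrinks random error but leaves the bias untouched, and the interval becomes confidently wrong.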
A feasible approach is to make good use of available data, regardless of the design or discipline of origin. We see 2 honest analytic responses to this epidemiologic reality: description, relying on subject-matter background to specify what data summaries would most efficiently encode and convey desired information,13 and opinion refinement, relying on subject-matter background to model major sources of bias or uncertainty.10,12,17,18 Inferences will be infinitely sensitive to model choices, and different observers will have differing opinions about what models are acceptable. Hence, rarely can we offer severe tests; at best we can offer modest opinions (perhaps in the form of subjective probabilities) whose derivation is laid bare for critical inspection.
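As one illustration of bias modeling for opinion refinement, the sketch below performs a simple Monte Carlo sensitivity analysis for a single unmeasured binary confounder U. It uses the standard external-adjustment formula for the ratio of crude to U-adjusted RR, [p1(RR_UD − 1) + 1]/[p0(RR_UD − 1) + 1], which assumes U's effect on disease is the same in exposed and unexposed. The observed result (RR 1.5, SE of log RR 0.26) and the prior distributions for p1, p0 (prevalence of U among exposed and unexposed) and RR_UD (the U-disease risk ratio) are hypothetical stand-ins for an analyst's stated opinions; this is a minimal sketch, not a reproduction of the methods in the cited papers.

```python
import math
import random

def bias_factor(p1, p0, rr_ud):
    """Crude RR / U-adjusted RR for a binary unmeasured confounder U
    (assumes no modification of the U-disease RR by exposure)."""
    return (p1 * (rr_ud - 1) + 1) / (p0 * (rr_ud - 1) + 1)

def monte_carlo_bias_analysis(rr_obs, se_log_rr, draws=20000, seed=1):
    """Return (median, 2.5th, 97.5th percentile) of the bias-adjusted RR."""
    rng = random.Random(seed)
    adjusted = []
    for _ in range(draws):
        # Hypothetical priors on the bias parameters
        p1 = rng.uniform(0.3, 0.5)                        # U prevalence, exposed
        p0 = rng.uniform(0.1, 0.3)                        # U prevalence, unexposed
        rr_ud = math.exp(rng.gauss(math.log(2.0), 0.3))   # U-disease risk ratio
        # Random error: draw log RR around the observed estimate
        log_rr = rng.gauss(math.log(rr_obs), se_log_rr)
        adjusted.append(log_rr - math.log(bias_factor(p1, p0, rr_ud)))
    adjusted.sort()
    lo = adjusted[int(0.025 * draws)]
    med = adjusted[draws // 2]
    hi = adjusted[int(0.975 * draws)]
    return math.exp(med), math.exp(lo), math.exp(hi)

med, lo, hi = monte_carlo_bias_analysis(1.5, 0.26)
print(f"bias-adjusted RR {med:.2f}, 95% simulation interval {lo:.2f} to {hi:.2f}")
```

Every input is laid bare for inspection: a reader who disputes the prior on p1 or RR_UD can change it and rerun. The resulting simulation interval is centered below the observed 1.5 and is wider than the conventional interval, because it carries uncertainty about confounding as well as random error.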
No inferential theory or method is optimal or complete in practice; thus, some diversity is needed. Bias modeling arose to address limits of canonical (“criterion-based”) causal inference, which treats causal inference as a qualitative diagnosis.14 Modeling provides quantification, which allows us to compare sources of uncertainty and to show how strong opinions hinge on strong assumptions.10 Subjective Bayesian (prior) modeling seems natural in these opinion-refinement and opinion-debunking roles.10,15 Frequentism provides a repeated-sampling framework to simulate the hypothetical experimental performance of procedures, including Bayesian ones. But Bayesian reasoning provides a check on frequentist methods by revealing the often absurd error models needed to support causal inferences from P-values and confidence limits. Such revelations are crucial if the application of statistics is to do more good than harm in epidemiology.10–12
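The dependence of conclusions on prior assumptions can be shown with the simplest subjective Bayesian analysis: a normal prior on log(RR) combined with a normal approximation to the likelihood, so the posterior mean is an information-weighted average of prior mean and estimate. The sketch below applies this to the same hypothetical result (RR 1.5, SE of log RR 0.26) under 3 hypothetical priors centered on RR = 1, each specified by its 95% limits; the function name is illustrative.

```python
import math
from statistics import NormalDist

def normal_posterior(est, se, prior_mean, prior_sd, level=0.95):
    """Conjugate normal update on the log-RR scale; returns (RR, lower, upper)."""
    w_data, w_prior = 1 / se**2, 1 / prior_sd**2   # information (inverse variance)
    post_var = 1 / (w_data + w_prior)
    post_mean = post_var * (w_data * est + w_prior * prior_mean)
    half = NormalDist().inv_cdf(0.5 + level / 2) * math.sqrt(post_var)
    return math.exp(post_mean), math.exp(post_mean - half), math.exp(post_mean + half)

est, se = math.log(1.5), 0.26          # hypothetical study result
z95 = NormalDist().inv_cdf(0.975)
posteriors = {}
for upper in (4.0, 2.0, 1.25):          # prior 95% limits: 1/upper to upper
    prior_sd = math.log(upper) / z95
    posteriors[upper] = normal_posterior(est, se, 0.0, prior_sd)
    rr, lo, hi = posteriors[upper]
    print(f"prior 95% limits 1/{upper} to {upper}: posterior RR {rr:.2f} ({lo:.2f}, {hi:.2f})")
```

The posterior RRs come out at roughly 1.43, 1.30, and 1.07: the same data support a near-null or a clearly elevated risk depending on how strongly one is willing to believe, a priori, that large effects are implausible. Making that dependence explicit is the debunking role described above.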
1. Weiss NS. Presents can come in black boxes, too [commentary]. Epidemiology
2. Susser E. Eco-epidemiology: thinking outside the black box [commentary]. Epidemiology
3. Feyerabend PK. Against method. 3rd ed. New York: Verso; 1993.
4. Greenland S, Sheppard AR, Kaune WT, Poole C, Kelsh MA. A pooled analysis of magnetic fields, wire codes, and childhood leukemia. Epidemiology
5. Haack S. An epistemologist among the epidemiologists [commentary]. Epidemiology
6. Mayo DG, Spanos A. When can risk-factor epidemiology provide reliable tests? [commentary] Epidemiology
7. Greenland S. Science versus advocacy: The challenge of Dr. Feinstein. Epidemiology
8. Greenland S. The need for critical appraisal of expert witnesses in epidemiology and statistics. Wake Forest Law Rev
9. Greenland S. The relation of the probability of causation to the relative risk and the doubling dose: A methodologic error that has become a social problem. Am J Public Health
10. Greenland S. Multiple-bias modeling for observational studies (with discussion). J Royal Stat Soc, ser A, 2005 (in press).
11. Greenland S. Randomization, statistics, and causal inference. Epidemiology
12. Greenland S. Interval estimation by simulation as an alternative to and extension of confidence intervals. Int J Epidemiol
13. Greenland S. Summarization, smoothing, and inference. Scand J Soc Med
14. Greenland S. An overview of methods for causal inference from observational studies. In: Gelman A, Meng XL, eds. Applied Bayesian modeling and causal inference from an incomplete-data perspective. New York: Wiley; 2004 (in press).
15. Greenland S. Probability logic and probabilistic induction. Epidemiology
16. Breiman L. Statistical modeling: the two cultures. Stat Sci
17. Lash TL, Fink AK. Semi-automated sensitivity analysis to assess systematic errors in observational epidemiologic data. Epidemiology
18. Phillips CV. Quantifying and reporting uncertainty from systematic errors. Epidemiology