Epidemiology & Society

Theorizing About Causes at the Individual Level While Estimating Effects at the Population Level

Implications for Prevention

Rockhill, Beverly

doi: 10.1097/01.ede.0000147111.46244.41



Population-level thinking is not intrinsically more important or compelling than individual-level thinking. However, we can most accurately address the population level. To increase our contribution to disease prevention, we must reject the goal of identifying “susceptible” individuals on practical public health grounds, rather than only on political grounds. We must match our philosophies about causes (and prevention) to our statistical methods, which convey information on averages in aggregates of individuals. We must broaden the purview of our discipline beyond questions related to individuals and biologic mechanisms to include, more frequently, strategic questions about causes of differences in distributions of risk, causes of population shifts in risk, and ethical and efficacious means to achieve reductions in average risk in populations.

I start with the assumption that the goal of epidemiologic research is, ultimately, disease prevention. This leads to a key question: “To what uses are quantitative findings from epidemiology most appropriately applied, given a prime concern with prevention?” For the noninfectious diseases, most risk factors (including genetic ones) are associated with very low positive predictive values. This means that strategies based in individualism (eg, individual risk communication, or “individualized prevention”1) are questionable scientifically and as public health policy.

There is a larger issue. The dominant philosophy in epidemiology is one of individualism,2,3 specifically one of biologic mechanisms in individuals. Yet our primary analytic tools are incapable of addressing issues of biologic mechanisms in individuals. The distinction between questions about group averages and questions about mechanisms of individual events can be framed by an analogy to coin-flipping. “Did more heads than expected arise in the repeated tossing of this coin?” is a question readily answered by recourse to the binomial probability model. The question of why a particular flip resulted in heads rather than tails is a mechanistic question not answerable through reference to the statistical model.
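The first coin-flipping question can be made concrete with a short calculation. What follows is a minimal sketch under stated assumptions: the function name `binomial_upper_tail` and the example of 60 heads in 100 tosses are illustrative and do not come from the article. It computes the probability of at least a given number of heads under the binomial model; nothing in it speaks to why any single toss landed heads.

```python
from math import comb


def binomial_upper_tail(n_flips: int, n_heads: int, p_heads: float = 0.5) -> float:
    """Return P(X >= n_heads) for X ~ Binomial(n_flips, p_heads)."""
    if not 0 <= n_heads <= n_flips:
        raise ValueError("n_heads must lie between 0 and n_flips")
    # Sum the binomial probabilities for every count from n_heads up to n_flips.
    return sum(
        comb(n_flips, k) * p_heads**k * (1 - p_heads) ** (n_flips - k)
        for k in range(n_heads, n_flips + 1)
    )


if __name__ == "__main__":
    # Hypothetical data: 60 heads observed in 100 tosses of a supposedly fair coin.
    tail = binomial_upper_tail(100, 60)
    print(f"P(at least 60 heads in 100 fair tosses) = {tail:.4f}")
```

The result describes the aggregate of tosses. The model has no term for the mechanics of an individual toss, which is the distinction the analogy draws.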

Any average quantitative measure of association discovered by epidemiologists can be consistent with myriad biologic mechanisms in different individuals. An average estimate of causal effect does not suggest that mechanisms or causal paths are homogeneous across individuals or even that the average represents reality for any individual. A relative risk of 1.0, for instance, obviously does not mean that all individuals experience no effect on their risk of disease from exposure. Precisely the opposite could be true; the exposure may increase risk in some persons and reduce the risk in others. Consider the situation of oral contraceptives and breast cancer, for instance (Malcolm Pike, personal communication, 2000). The overall relationship in many studies appears close to the null. However, some women whose lifetime exposure to endogenous ovarian hormone levels is low to begin with may have an increase in breast cancer risk with the increased ovarian hormone levels they experience with oral contraceptive use, whereas those women whose endogenous hormone levels are higher to begin with can experience a decrease in levels with oral contraceptive use and consequently can experience a decrease in breast cancer risk. This is not just a matter of not including the right “effect modifiers” in the analysis. Rather, it is a philosophical matter about averages and the inability to draw conclusive statements about individuals based on summary information about classes or groups of individuals.
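The arithmetic behind a null average that conceals opposing effects can be sketched as follows. All numbers are hypothetical and chosen only for illustration, as are the names `Subgroup` and `average_relative_risk`. The sketch also assumes that exposure is unrelated to subgroup membership, so the population risk under each exposure condition is a simple weighted average.

```python
from dataclasses import dataclass


@dataclass
class Subgroup:
    share: float            # fraction of the population in this subgroup
    risk_unexposed: float   # disease risk without the exposure
    risk_exposed: float     # disease risk with the exposure


def average_relative_risk(subgroups: list[Subgroup]) -> float:
    """Population-average relative risk, assuming exposure is unrelated to subgroup."""
    total_share = sum(g.share for g in subgroups)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("subgroup shares must sum to 1")
    risk_exposed = sum(g.share * g.risk_exposed for g in subgroups)
    risk_unexposed = sum(g.share * g.risk_unexposed for g in subgroups)
    return risk_exposed / risk_unexposed


# Hypothetical numbers, not estimates from any study.
POPULATION = [
    Subgroup(share=0.5, risk_unexposed=0.02, risk_exposed=0.03),  # exposure raises risk
    Subgroup(share=0.5, risk_unexposed=0.02, risk_exposed=0.01),  # exposure lowers risk
]

if __name__ == "__main__":
    for g in POPULATION:
        print(f"subgroup relative risk: {g.risk_exposed / g.risk_unexposed:.2f}")
    print(f"population-average relative risk: {average_relative_risk(POPULATION):.2f}")
```

In this constructed example the subgroup relative risks are 1.5 and 0.5, yet the population average is 1.0. In practice the subgroups would be unobserved, which is why the average alone cannot settle what happens in any individual.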

By focusing on questions of biologic mechanisms in individuals and limiting our discussion of causes to such mechanisms, we use our methods and data inappropriately. We avoid deeper questions of determinants of population risk, and of the nature of the relationship between causes of population risk and causes of individual cases. We also sidestep thoughtful discussion of the relation between labeling of “causes” and the possibilities for meaningful strides in prevention. Finally, although the language of tradeoffs between individual-level risks and benefits is becoming increasingly common in our discipline, there is little corresponding discussion of the more appropriate topic of population-level tradeoffs of risk and benefit with respect to societal interventions.

In laying out my arguments, I discuss some limitations and consequences of the current hunt for universally necessary causes of disease, mainly in the genome, and I tie these issues back to our responsibility to strive for primary prevention of disease. I also propose one possible explanation for how epidemiology, as a discipline, has become so individualistic in its current focus. My concern throughout is the common noninfectious diseases such as cancer and cardiovascular disease.


We pursue information on increasingly reductionist causes in our search for accurate knowledge of causes of specific cases. Both philosophical reasoning and empiric evidence suggest that this search may not be as fruitful as proponents claim.

We have collected a wealth of knowledge about causes (risk factors) for many diseases. Yet we persist in the search for more, and “better,” causal knowledge. Sidestepping the question of whether this endless search for causes is appropriate from a social-ethical or economic standpoint, I want to raise the philosophical question of what is meant by “better” causal knowledge.

Epidemiology, as a discipline, is pursuing an increasingly reductionist search for causes. For many noninfectious diseases, this search now occurs largely in the genome. The search is driven in large part by the ideal of being able to answer questions about causes of specific individuals’ disease and being able to predict individuals’ futures with a high degree of accuracy. In the language of our most popular causal model,2,3 one might call this a search for universally necessary component causes of disease—those causes or mechanisms that appear in every case of disease and, ideally, are present in few people who do not develop disease.

The reductionist search for genetic causes is expected by some leading scientists to lead to a revolution in the field of medicine: individualized prevention.1 Clearly, however, there is no necessary link between the discovery of universally necessary component causes and prevention. For example, we could label as a universally necessary cause of myocardial infarction “cutoff of blood supply to the heart.” This causal knowledge is useless. It is, in essence, definitional and leads nowhere with respect to primary prevention. In a similar vein, defining a series of genetic mutations that result in carcinogenesis as a necessary cause of cancer may not be relevant at all to prevention.

In addition to the ambiguity of the link between “causes” and prevention, there is another large issue equally relevant to the aspiration of individualized prevention. The promise that molecular and genetic epidemiologic research will allow prediction of individuals’ futures, and, by implication, efficient preventive interventions, is flawed empirically and philosophically.4 First, despite much speculation and anticipation, virtually all studies to date of common genetic polymorphisms and their interactions with environmental exposures have produced modest relative risks. Observed relative risks are mostly of the same magnitude (ie, well under 3.0) as those for most conventional nongenetic risk factors, rather than of the magnitude seen for lung cancer with heavy smoking or cervical cancer with oncogenic human papillomavirus (HPV) infection. For a risk factor or risk marker to serve as a useful discriminatory tool at the individual level (in terms of accurately segregating individuals into those who will and those who will not get disease), we need relative risks or odds ratios much greater than usually seen in epidemiology, greater than 50 or so.5,6 Genetic mutations known to be associated with very large disease risks are found in only a small proportion of the population and account for a relatively small proportion of cases. Furthermore, it appears likely that even some of the high estimates of risk associated with genetic mutations are overestimates.4,7,8

Some have argued that the field is still young and that the promise of widespread individualized risk-targeting based on genes will be fulfilled in the future. Is there a theoretical basis for such a promise? Pritchard,9 a population geneticist, uses evolutionary theory to argue that widespread, even moderately penetrant alleles for the common long-latency noninfectious diseases are unlikely. Others4,10 have argued similarly. Numerous empiric findings to date support these cautious opinions. Thus, despite the anticipatory talk of accurate “individualized prevention,” there is virtually no empiric or theoretical support for the concept.


Are improvements in public health likely to come from providing individuals information on “individual risks” of myriad possible health outcomes or from changing conditions at a macrolevel so that more individuals can avoid health risks?

The fundamental question that underlies most epidemiologic investigations is, “What is the best estimate of the true causal association between exposure and disease?” The answer to this question should have real quantitative relevance for public health. The goal is not to churn out a stream of unbiased association measures that have no intrinsic meaning beyond a small circle of academic epidemiologists.

What, then, is the primary quantitative relevance of epidemiologic findings? I argue that the relevance lies with population-level thinking, that is, thinking about large numbers of people rather than the individual. For many of the diseases studied by epidemiology, it makes little difference whether an exposure is associated with a relative risk of 1.2, 1.5, 3.0, or 10 in terms of discriminatory ability at the individual level (that is, in terms of positive and negative predictive value).5,6 Epidemiologists and other health professionals concerned with primary prevention are usually in a weak position quantitatively when it comes to convincing individuals of the relevance of epidemiologic findings to their own lives. We do not often admit the poor positive and negative predictive values of most of our risk designations.
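A rough calculation shows why these relative risks discriminate poorly when disease is uncommon. The sketch below is illustrative only. It assumes a single binary exposure, a baseline risk of 1% in the unexposed, and an exposure prevalence of 20%, none of which comes from the article. It takes the positive predictive value to be the risk among the exposed and the negative predictive value to be one minus the risk among the unexposed. The function name `predictive_values` is hypothetical.

```python
from typing import NamedTuple


class PredictiveValues(NamedTuple):
    ppv: float                 # P(disease | exposed)
    npv: float                 # P(no disease | unexposed)
    share_cases_exposed: float # fraction of all cases arising among the exposed


def predictive_values(
    relative_risk: float, baseline_risk: float, exposure_prevalence: float
) -> PredictiveValues:
    """Predictive values of a binary risk factor treated as a 'test' for future disease."""
    risk_exposed = min(1.0, relative_risk * baseline_risk)
    cases_exposed = exposure_prevalence * risk_exposed
    cases_unexposed = (1.0 - exposure_prevalence) * baseline_risk
    return PredictiveValues(
        ppv=risk_exposed,
        npv=1.0 - baseline_risk,
        share_cases_exposed=cases_exposed / (cases_exposed + cases_unexposed),
    )


if __name__ == "__main__":
    # Assumed values for illustration: 1% baseline risk, 20% of people exposed.
    for rr in (1.2, 1.5, 3.0, 10.0):
        pv = predictive_values(rr, baseline_risk=0.01, exposure_prevalence=0.20)
        print(
            f"RR {rr:>4}: PPV {pv.ppv:.3f}, NPV {pv.npv:.3f}, "
            f"cases among exposed {pv.share_cases_exposed:.0%}"
        )
```

Under these assumptions, even a relative risk of 10 leaves nine of every ten exposed persons free of disease, and the negative predictive value is the same at every relative risk because the baseline risk is held fixed. Different baseline risks or prevalences would change the figures but not the general pattern for uncommon diseases.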

There are a few situations in which individual-level discriminatory accuracy is quite high and where individualized prevention is the optimal strategy, for example, with oncogenic HPV infection and cervical cancer. However, exposures like oncogenic HPV, which are associated with extremely high relative risks, are relatively few in number. The relative risk for cervical cancer associated with oncogenic HPV infection may be as high as infinity—ie, the predictive value associated with being negative for this risk factor may be 1.0. This example is a poor paradigm for the usual epidemiologic risk factor, however. The focus on individual risk and individualized prevention is inappropriately strong in our discipline, given the more common situation of poor risk discrimination.

In many circumstances, if an individual were provided with all the relevant numeric information, it is not clear that a decision to change his or her behavior would be “rational.” Would a well-informed individual decide to change dietary patterns to lower “individual risk” of colon cancer from a high risk of 32 in 10,000 in 5 years to a low risk of 8 in 10,000 in 5 years? Is it a rational decision for a woman to agree to annual mammographic screening to lower the 10-year risk of dying of breast cancer from 24 in 1000 to 12 in 1000? A “rational” or “numerate” individual could justifiably dismiss most epidemiologic information as personally unconvincing on quantitative grounds and therefore not worth the inconvenience. Our estimates of absolute and relative risk derive from aggregates, and the logic in their quantitative comparison and relevance is inherent at the aggregate level. The epidemiologic issue is not one of providing individuals with individual risk estimates and expecting them to act “rationally,” but rather considering whether population-wide shifts on risk factors should be encouraged.
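The two examples above can be restated as absolute risk differences and as the number of people who would need to change for one expected case to be averted. The sketch below uses only the risks quoted in the text. The function name `number_needed_to_change` is hypothetical, and the calculation assumes the full risk difference is causal and applies on average to those who change.

```python
from fractions import Fraction


def number_needed_to_change(risk_before: Fraction, risk_after: Fraction) -> Fraction:
    """Persons who must move from risk_before to risk_after to avert one expected case."""
    difference = risk_before - risk_after
    if difference <= 0:
        raise ValueError("risk_after must be lower than risk_before")
    return 1 / difference


if __name__ == "__main__":
    examples = {
        # 5-year colon cancer risk: 32 in 10,000 versus 8 in 10,000
        "dietary change, colon cancer": (Fraction(32, 10_000), Fraction(8, 10_000)),
        # 10-year breast cancer mortality: 24 in 1000 versus 12 in 1000
        "annual mammography, breast cancer death": (Fraction(24, 1000), Fraction(12, 1000)),
    }
    for label, (before, after) in examples.items():
        needed = number_needed_to_change(before, after)
        print(f"{label}: risk difference {float(before - after):.4f}, "
              f"about {float(needed):.0f} people per case averted")
```

With the quoted figures, roughly 417 people would need to change diet, and roughly 83 women would need annual screening, for one expected case or death to be averted over the stated period. The comparison is meaningful for the aggregate, while the benefit to any one person remains small and uncertain.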

This is not an argument for not communicating risk information to individuals. Indeed, honest and complete communication about the limitations and errors in applying epidemiologic findings to the individual level might help convince individuals and policymakers that macrolevel issues must be addressed. Such communication might also convince people that, when it comes to epidemiologic findings and the goal of primary prevention, the language of population-level tradeoffs of risks and benefits should take precedence over the language of individual-level tradeoffs—not because the population level is more important or compelling, but simply because that is what we can address with any respectable and consistent accuracy.

One of the key questions that epidemiologists concerned with disease prevention must openly address is the following: “By what ethical means, if any, can we envision many people changing in response to risk factor findings?” There is a strategy, although it often goes unspoken, that underlies the heralding of the era of “individualized prevention.” This strategy relies, in large part, on well-intended but often inaccurate persuasion in addition to well-intended but often inaccurate reassurance; some individuals can be convinced that they need to change, based on their genetic or risk factor profile, whereas others can be convinced that all is fine, that they are “nonsusceptible.” The ethics of such a strategy are questionable given both the weak discriminatory accuracy of most risk factors (including genetic ones) and the poor understanding of probability among many who will give and receive such communication. In addition to the ethical concerns, there is the issue of effectiveness. Because most people who will eventually get a disease will not be designated as “high risk” by our tools (ie, because most risk factors do not improve knowledge of negative, as well as positive, predictive values), such a strategy will fail to prevent many of the cases of disease.

Mechanic11 addresses the question of if, and how, many individuals might be encouraged to change to lower average risk in the population. He points out that many positive or damaging health-relevant behaviors (including smoking, physical activity, dietary intake, maintenance of a recommended weight, ability to adhere to physician recommendations, ability to avoid harmful toxins or pollutants in the environment or in the workplace, understanding of and access to effective screening tests, and so forth) arise from the routine activities and conventional patterns of everyday life in a given society or social group. He argues that changing health-damaging behaviors thus depends on the abilities of societies and communities to create, or, in some cases, recreate, environments and activities of daily living so that more individuals readily and easily act in ways that will improve or maintain health for as long as possible. The goal underlying the “macro” level approaches to disease prevention (described also by Rose12) is to make it easier for individuals to act in a health-promoting way by default rather than by persistent individual risk–benefit analysis. There is a large literature in the health sciences documenting the failure of health education campaigns to convince individuals to change behavior simply because it is good for them. There is also a large literature documenting Mechanic's point, that for most individuals, “health behaviors” arise out of normal patterns of everyday social life. Mechanic places the emphasis on changes in social structure as the most effective means by which risk factor findings can be incorporated into disease prevention strategies.

Whether one agrees with Mechanic is not the key issue. The critical point is that in modern epidemiology, the discussion of population attributes that determine average risk and the consideration of “macro” level interventions to improve public health have virtually been swept off the table. Such topics are commonly denigrated as “political” or “social” rather than “scientific” concerns. However, we ourselves are basing our decisions on values, and not on scientific logic, when we relegate a discussion of social change and social intervention to the realm of “politics” or “not epidemiologic science,” whereas we embrace such concepts as genetic susceptibility testing and individual risk prediction and communication as parts of “epidemiologic science.” The statistical methods and risk factor findings of noninfectious disease epidemiology demand a population focus, independent of any political leanings.

A large analysis of coronary heart disease13 demonstrated that, although risk factors were highly prevalent among those with disease, prevalences were almost as high (69–90% for varying combinations of the risk factors) in those without disease. Although these major risk factors are causal in the sense of predicting incidence of disease, they obviously do not answer important individual-level questions about causes of specific cases or about causes of averting disease. The implicit and critical message in this analysis is that those individuals with coronary heart disease are mirrors, not outliers, of society.14 When 60% to 90% of individuals in a population have some combination of major risk factors for heart disease, it is not clear that locating “risk” or “causal mechanisms” primarily within the individual is the most productive or effective or even the most scientifically logical strategy from a public health standpoint.14

Some epidemiologists may dismiss the call for a population focus as a call for “social engineering” or even as a call for a “license to tinker promiscuously with society.”15(p.811) I believe these are unjust characterizations. Today, individuals can hear, on nearly a daily basis, the latest epidemiologic findings on causes or preventives of diseases. There is an enthusiastic movement among influential scientists, sometimes working with corporate interests, to bring genetic testing and individual risk prediction into the clinic. These trends are rarely, if ever, labeled pejoratively as social engineering or tinkering, at least by academic epidemiologists, although clearly there are consequences to imposing massive “risk awareness” among healthy individuals throughout the general population. Furthermore, we are participating in open and unsystematic experimentation in these arenas. There is no study underway of the public health impact of these trends, nor is the relation of such practices to official public health policy explicitly scrutinized by epidemiologists. Such practices could be considered part of epidemiologists’ contributions, by default or selective silence, to social change and public health policy.


How did epidemiology, a discipline that has historically defined itself through reference to public or population health, come to this current obsession with individual risk and individual susceptibility? At times it seems our discipline has become merely an extension of the disciplines of genetics or molecular biology; we distinguish ourselves from such disciplines primarily by our sample size rather than by theory or intent. There are many possible answers to this question. One explanation lies with our foundational models of causality, those models2,3,16 that first became widely known to most epidemiologists in the 1980s and that are now taught to probably every epidemiology student in the United States.

In the epidemiology literature, the language surrounding the concept of “cause” is usually presented as applying to a single 1-time event in an “n of 1.” Our theoretical causal models (the popular sufficient-component-cause model, also referred to as the “causal pie” model,2,3 and the less frequently discussed potential-outcomes model,16 also known as the counterfactual model) are commonly presented in the language of individualism. The sufficient-component-cause model has as its basic unit of analysis the causal mechanisms (the constellations of component causes) that pertain to individual cases of disease2,3; the potential-outcomes model usually makes reference to individual-level causes and effects. The 4 individual “response types”17,18 discussed in epidemiologic methods courses (“doomed,” “exposure causative,” “exposure preventive,” and “immune”) complement the potential outcomes model, because they derive entirely from the notion of individual-level causal effects. Yet despite this language of distinct response types among individuals,17,18 our methods and data analysis can never speak to such distinctions.

Given the difficulty, even impossibility, of discerning complex biologic mechanisms from epidemiologic data, it is paradoxic that our most widely used causal model presumes causal mechanisms of specific cases as the unit of analysis.2,3 The training that many epidemiologists have received in this model may contribute to the high prevalence in epidemiology articles of speculations and even declarations, drawn from statistical findings on averages alone, about biologic mechanisms in individuals. Lieberson and Lynn19 discuss the need for alternative causal models in their own discipline of sociology; much of their discussion, and offering of alternatives, is relevant to epidemiology.


The epidemiologic perspective speaks to averages in large aggregates of individuals. It provides only limited information, if any, about biologic mechanisms and causal pathways underlying specific cases of disease. In reality, and despite the theoretical assumptions and language underlying our commonly used causal models, our knowledge of mechanisms and causes as they operate in individuals to produce or prevent disease remains weak. Our ability to predict individuals’ futures is correspondingly weak, despite the wealth of risk factor knowledge we have accumulated. If the differences between relative risks of 1.2, 1.5, 3.0, and 10 are indeed meaningful (and much of the methodologic focus in our discipline is designed to convince us they are), they are meaningful mostly at the aggregate level. In contrast, such relative risks are usually irrelevant at the individual level for many diseases we study in terms of positive and negative predictive values. From a scientific and quantitative standpoint, and not just from the standpoint of political argument, primary prevention requires us to address questions of determinants of average risk, of shifts in population risk distributions, and of population-level rather than individual-level tradeoffs in risk and benefit. We must reexamine the concepts of population-level prevention and intervention in light of the limits of risk factor findings at the individual level.

The failure of epidemiologists to fully acknowledge the tension between biologic complexity of disease and the comparative empiric crudeness of our statistical tools hinders our effectiveness in contributing to disease prevention. We pursue information about increasingly reductionist causes in our seemingly endless search for “better” knowledge about causal mechanisms in individuals. In doing so, we ignore the wisdom of the philosophers of science when it comes to thinking about the search for causes: there is no such thing as the “cause” of an event separate from the investigator's interest.20 In a similar vein, Helman21(p.181) claims that “the idea of ‘cause’ has become meaningless other than as a convenient designation for the point in the chain of event sequences at which intervention is most practical.” The current focus on genes as the ultimate repository of “causes” may run the risk of conveying the nihilistic, postmodern, and incorrect view that there is little to be done by public health professionals on the societal level to prevent disease; it is only the molecular level within individuals that can be intervened on. There is much research and historic observation telling us otherwise.

Finally, in the ever-ongoing pursuit of knowledge of causes, we seem to ignore the quantitative reality that underlies nearly all studies of causes (including genes): many individuals will need to change, or “be changed,” or otherwise become unexposed, to prevent disease in a relatively small number of people. It is more fruitful to raise questions addressed to the population level: What are the determinants of average risk in the population? How many individuals need to become unexposed to prevent a single case of disease? Can “mass unexposure” be encouraged, or, equivalently, how can the determinants of average risk be ethically and feasibly addressed? For many topics in noninfectious disease epidemiology, these questions seem more productive than those directed to the individual level.


I would like to thank Doug Levine and Amy Sayle for their help in reading the manuscript several times.


1. Collins F, McKusick V. Implications of the Human Genome Project for medical science. JAMA. 2001;285:540–544.
2. Rothman K. Causes. Am J Epidemiol. 1976;104:587–592.
3. Rothman KJ. Modern Epidemiology. Boston: Little, Brown and Co; 1986.
4. Chanock S, Wacholder S. One gene and one outcome? No way. Trends Molecular Med. 2002;8:266–269.
5. Pepe M, Janes H, Longton G, et al. Limitations of the odds ratio in gauging the performance of a diagnostic, prognostic, or screening marker. Am J Epidemiol. 2004;159:882–890.
6. Wald N, Hackshaw A, Frost C. When can a risk factor be used as a worthwhile screening test? BMJ. 1999;319:1562–1565.
7. Begg C. On the use of familial aggregation in population-based probands for calculating penetrance. J Natl Cancer Inst. 2002;94:1221–1226.
8. Struewing J, Hartge P, Wacholder S, et al. The risk of cancer associated with specific mutations of BRCA1 and BRCA2 among Ashkenazi Jews. N Engl J Med. 1997;336:1401–1408.
9. Pritchard J. Are rare variants responsible for susceptibility to complex diseases? Am J Hum Genet. 2001;69:124–137.
10. Holtzman N, Marteau T. Will genetics revolutionize medicine? N Engl J Med. 2000;343:141–144.
11. Mechanic D. The social context of health and disease and choices among health interventions. In: Brandt A, Rozin P, eds. Morality and Health. New York: Routledge; 1997:53–78.
12. Rose G. The Strategy of Preventive Medicine. New York: Oxford University Press; 1992.
13. Greenland P, Knoll M, Stamler J, et al. Major risk factors as antecedents of fatal and nonfatal coronary heart disease events. JAMA. 2003;290:891–897.
14. Rockhill B. Major risk factors as antecedents of fatal and nonfatal coronary heart disease events [Letter]. JAMA. 2004;291:299.
15. Rothman KJ, Adami H-O, Trichopoulos D. Should the mission of epidemiology include the eradication of poverty? Lancet. 1998;352:810–813.
16. Greenland S, Brumback B. An overview of relations among causal modelling methods. Int J Epidemiol. 2002;31:1030–1037.
17. Robins J, Greenland S. Identifiability and exchangeability for direct and indirect effects. Epidemiology. 1992;3:143–155.
18. Greenland S, Robins J. Identifiability, exchangeability, and epidemiologic confounding. Int J Epidemiol. 1986;15:412–418.
19. Lieberson S, Lynn F. Barking up the wrong branch: scientific alternatives to the current model of sociological science. Ann Rev Sociol. 2002;28:1–19.
20. Van Fraassen B. The Scientific Image. Oxford: Clarendon Press; 1980.
21. Helman C. Culture, Health and Illness. Bristol, UK: Wright; 1984.
© 2005 Lippincott Williams & Wilkins, Inc.