The overall potential for A to cause an increase or decrease in M in the target population is determined by the proportions of people who are susceptible to A as a cause or preventive of M (response types 2–3, 5–12, and 14–15, according to the notation of Rothman and Greenland 10 ,p.333). This potential is called the causal RD (for example, equation 18-10 10 ,p.334). For notational convenience, we will also call it the aetiologic (using the British spelling of the word etiologic) or agential RD, aRDA. The superscript on the left refers not only to the type of RD (aetiologic, agential) but also to the domain where it occurs (agency). The superscript on the right (A) refers to the specific agent, in this case alcoholic drinks. Before alcoholic drinks were invented, aRDA was the difference between the unobservable, counterfactual risk of migraine if all people had been alcohol drinkers vs the actual, potentially observable risk if all were nondrinkers (as they were). In every population, aRDA is unobservable because the population cannot be simultaneously all drinkers and all nondrinkers.
B. The Domain of Background Randomness
Before human beings began making alcoholic drinks, there existed background factors (B) that caused migraine attacks, such as barometric pressure changes. The causal (aetiologic) RD due to B can be represented by aRDB and illustrated in Figure 3 as an arrow from B (barometric pressure) to M (migraine). Later, when alcohol intake emerges as a human habit, some of these background factors will remain independent from alcohol drinking. It will be as if barometric pressure fluctuations are allocated at random so they are not associated with alcohol intake except by chance. This relation is illustrated in Figure 3 by the absence of any arrow or broken line between A and B, meaning they are not causally connected. “Not causally connected” means that if A and B ever happen to be associated, the association exists only by chance. “By chance” means any concurrence of A with B, like the coincidence of weekend alcohol intake with days of higher barometric pressure, is the result of causal processes so remote, mixed, and/or indirect that they are unable to cause A and B to be associated reproducibly (that is, more often than if alcohol intake were randomly allocated to days). When B is associated with A by chance, B causes a distortion in the relation between A and M. If random background factors, B, are the only cause of distortion, as in a perfect randomized controlled trial, the potentially observable RD, bRDA, differs from the causal RD, aRDA, by only a chance deviation. We suggest calling bRDA the base (or best) RD, because it is a potentially observable RD in the base (source) population and is the best possible estimate of the unobservable aRDA.
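The idea that a base RD differs from the causal RD only by a chance deviation can be illustrated with a small simulation. The sketch below is ours, not part of the model; it assumes hypothetical risks of 0.30 among drinkers and 0.20 among nondrinkers (a causal RD of 0.10), and the function name `simulate_base_rd` is invented for illustration. Exposure is allocated at random, so background factors act only through chance.

```python
import random

def simulate_base_rd(n, risk_if_exposed, risk_if_unexposed, seed):
    """Randomly allocate exposure A, then return the observed RD."""
    rng = random.Random(seed)
    cases = {True: 0, False: 0}
    totals = {True: 0, False: 0}
    for _ in range(n):
        exposed = rng.random() < 0.5           # A allocated at random
        risk = risk_if_exposed if exposed else risk_if_unexposed
        totals[exposed] += 1
        cases[exposed] += rng.random() < risk  # background factors act by chance
    return cases[True] / totals[True] - cases[False] / totals[False]

CAUSAL_RD = 0.30 - 0.20  # hypothetical causal RD, assumed for illustration

# Small populations: the observed RD scatters around the causal RD.
small = [simulate_base_rd(200, 0.30, 0.20, seed) for seed in range(5)]

# Large population: the chance deviation shrinks.
large = simulate_base_rd(200_000, 0.30, 0.20, seed=0)
print(small, large)
```

Under these assumptions, the RDs from the small populations vary noticeably from one random allocation to the next, whereas the RD from the large population should lie close to 0.10.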
C. The Domain of Connections
When humans began making and imbibing alcoholic drinks (A), background factors were divided into those associated with A only by random chance (which we will still call B) vs factors associated with A by nonrandom causal connections. For example, cultural factors (C*) cause alcohol intake to be connected with caffeine consumption (C). This relation is illustrated in Figure 4, which shows that C (a correlated cause such as caffeine) can cause (or prevent) M and is connected with A because of a common cause C*. Such a correlated cause is called a “confounder” by many epidemiologists (including us), but other epidemiologists prefer to restrict use of the term “confounder” to the stage of data analysis for reasons we will discuss later. Correlated causes produce additional distortions to the base RD, so the RD in the domain of causal connections is a contaminated or crude RD, cRDA. A special type of connection in this domain is when M itself is C*, as when history of migraine causes a person to avoid alcohol intake. This connection produces reverse causation bias.
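A worked example may help show how a correlated cause contaminates the RD. The following sketch uses hypothetical risks and prevalences that we have made up for illustration (they are not data from any study): alcohol raises the risk of migraine by 0.10 within each caffeine stratum, but caffeine use is more common among drinkers. The names `crude_risk` and `standardized_risk` are ours.

```python
# Hypothetical risk of migraine, indexed by (alcohol a, caffeine c).
risk = {
    (1, 1): 0.40, (0, 1): 0.30,
    (1, 0): 0.20, (0, 0): 0.10,
}
# Hypothetical prevalence of caffeine use among drinkers (1) and nondrinkers (0).
p_c_given_a = {1: 0.80, 0: 0.20}

def crude_risk(a):
    """Risk in exposure group a, mixing strata by that group's own caffeine use."""
    p = p_c_given_a[a]
    return p * risk[(a, 1)] + (1 - p) * risk[(a, 0)]

def standardized_risk(a, p_c):
    """Risk in exposure group a if its caffeine prevalence were p_c."""
    return p_c * risk[(a, 1)] + (1 - p_c) * risk[(a, 0)]

crude_rd = crude_risk(1) - crude_risk(0)

# Assumes half the population drinks, so overall caffeine prevalence is 0.50.
p_c_total = 0.5 * p_c_given_a[1] + 0.5 * p_c_given_a[0]
rd_without_c = standardized_risk(1, p_c_total) - standardized_risk(0, p_c_total)

print(round(crude_rd, 2), round(rd_without_c, 2))
```

With these assumed numbers the crude RD is 0.22, whereas the RD standardized to a common caffeine distribution is 0.10; the difference is the distortion attributable to C.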
D. The Domain of Diagnosis
At some point in history, humans began to diagnose illnesses such as migraine. Having a true migraine greatly increased the chances of having such a diagnosis, but a true migraine is neither a sufficient nor a necessary cause of a diagnosis. Figure 5 shows that M (morbidity) causes D (diagnosis) but the disease assessment process is somewhat insensitive; migraine causes some but not all people to receive the diagnosis. Figure 5 also shows that m, a mimicking state (for example, muscle tension headache) in the absence of M, can cause D (a false positive diagnosis). This means the disease assessment process is also somewhat nonspecific. The insensitivity and nonspecificity cause further distortions to cRDA, resulting in the diagnostic RD, dRDA.
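One simple way to think about this distortion, offered here as our own sketch rather than as a formula from the model, is to assume that sensitivity and specificity of diagnosis are the same for drinkers and nondrinkers. Under that assumption the proportion diagnosed is sensitivity × risk + (1 − specificity) × (1 − risk), and the RD shrinks by the factor (sensitivity + specificity − 1). The risks and accuracy values below are hypothetical.

```python
def diagnosed_risk(true_risk, sensitivity, specificity):
    """Proportion receiving diagnosis D: true positives plus false positives."""
    return sensitivity * true_risk + (1 - specificity) * (1 - true_risk)

def diagnostic_rd(risk_exposed, risk_unexposed, sensitivity, specificity):
    """RD in diagnosed morbidity, assuming the same accuracy in both groups."""
    return (diagnosed_risk(risk_exposed, sensitivity, specificity)
            - diagnosed_risk(risk_unexposed, sensitivity, specificity))

crude_rd = 0.30 - 0.20  # hypothetical RD before diagnostic error
d_rd = diagnostic_rd(0.30, 0.20, sensitivity=0.80, specificity=0.95)

print(round(d_rd, 3))  # 0.075, that is, (0.80 + 0.95 - 1) * 0.10
```

If sensitivity or specificity differed between drinkers and nondrinkers, this simple attenuation would no longer hold.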
E. The Domain of Encoding
In time, humans began recording their exposures, including their consumption of alcoholic and caffeine-containing drinks. Figure 6 shows that A (alcohol intake) causes E (evidence in the historical record, such as “Epicurean,” “enjoyed eau-de-vie,” or “exposed to ethanol”). A is not a sufficient cause of E, because the process of observing and recording is somewhat insensitive; consumption of alcohol causes some but not all drinkers to be recorded as exposed. Nor is A a necessary cause of E, because the observing and recording is also somewhat nonspecific; an alternative agent, a (such as unfermented apple cider that is mistakenly considered alcoholic), can cause E (a false positive) in the absence of A. The insensitivity and nonspecificity of encoding add further distortions to dRDA. The result is the evident RD, eRDA. This is the first RD that is evident, ie, observable, not just potentially observable, because only now do recorded observations of both the agent and the morbidity exist.
F. The Domain of Files
Eventually medical records of diagnoses began to be collated with exposure information in paper or electronic files in filing cabinets and databases. Losses of diagnostic data are sometimes systematically associated with losses of exposure histories. Thus, some causes of disease insensitivity also cause exposure insensitivity. This situation can be illustrated with Figure 7, showing a filter (F) causing loss of information about both exposure and diagnosis, resulting in simultaneous insensitivity. Figure 7 can also represent the opposite situation, simultaneous nonspecificity caused by something that fabricates false positives among E (exposed) and D (diagnosed). For example, F could be the fancy that both alcoholism and migraine commonly occur together, causing the record keeper often to classify muscle tension headaches among apple cider drinkers as migraines among alcohol drinkers. Alternatively, F might increase sensitivity of the exposure measure while decreasing sensitivity of diagnosis.
Note that F can be E or D itself. Knowledge that a patient is a drinker can directly influence the diagnosis, and knowledge of the diagnosis can directly influence the exposure measurement. Likewise there can be direct biological causes of differential misclassification. Excessive intake of alcohol might cause the drinker to forget having had a migraine attack, in which case A is the F factor. Conversely, a severe migraine might directly cause loss of memory about alcohol intake, in which case M is the F factor. Thus, differential misclassification also occurs in the Domains of Diagnosis and Encoding. For clarity, however, we have introduced it and illustrated it only in the Domain of Files. This presentation also conforms to our experience studying impacts of health services. A common type of differential misclassification occurs when health care providers file claims for their services (the exposure of interest) and they record diagnoses that are both causes of, and justifications for, those services. (F can be interpreted as their expectation of a fee.) F causes further distortion of eRDA. The result is fRDA, which we call the file RD.
F is not well understood by most epidemiologists (including us). Contrary to conventional wisdom, F can produce misclassification that is nondifferential (not associated with A or M) but biases the eRDA away from the null. 11 (Note that F may represent a mixture of minor causes of diagnosis and encoding in much the same way that B represents a mixture of background causes of migraine, whereas we have specified that A, a, C, M, m, D, and E are single distinct characteristics, things or events.) Even if F does not exist, it may seem to exist. For example, the causes of false positive diagnoses and encoding (m and a) sometimes coincide by chance, which has the same effect in a database as if a single factor F could reproducibly cause simultaneous nonspecificity. If we assume that F does not exist, the probability that measurement errors coincide by chance can be estimated. Phillips 12 has proposed that uncertainty due to random variation in sensitivities and specificities be routinely quantified and reported as part of an enhanced confidence interval.
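The point that nondifferential but dependent errors can bias away from the null can be checked with a small calculation. The sketch below is our own simplification: it assumes no true association between A and M, considers false positives only, and gives both measurements the same false-positive probability `p_fp`. The parameter `p_both` (a name we invented) is the probability that the two false positives occur together; errors are independent when `p_both` equals `p_fp` squared, and fully dependent (a single F producing both) when `p_both` equals `p_fp`.

```python
from itertools import product

def observed_rd(p_a, risk, p_fp, p_both):
    """Expected RD in recorded data when A and M are truly unassociated.

    Errors do not depend on the true value of the other variable, so they
    are nondifferential; p_both controls how often they coincide.
    """
    error_probs = {
        (1, 1): p_both,
        (1, 0): p_fp - p_both,
        (0, 1): p_fp - p_both,
        (0, 0): 1 - 2 * p_fp + p_both,
    }
    cell = {(e, d): 0.0 for e, d in product((0, 1), repeat=2)}
    for a, m in product((0, 1), repeat=2):
        p_truth = (p_a if a else 1 - p_a) * (risk if m else 1 - risk)
        for (err_e, err_d), p_err in error_probs.items():
            e, d = max(a, err_e), max(m, err_d)  # a false positive overrides a true 0
            cell[(e, d)] += p_truth * p_err
    risk_e1 = cell[(1, 1)] / (cell[(1, 1)] + cell[(1, 0)])
    risk_e0 = cell[(0, 1)] / (cell[(0, 1)] + cell[(0, 0)])
    return risk_e1 - risk_e0

independent = observed_rd(p_a=0.5, risk=0.2, p_fp=0.05, p_both=0.05 * 0.05)
dependent = observed_rd(p_a=0.5, risk=0.2, p_fp=0.05, p_both=0.05)
print(independent, dependent)
```

Under these hypothetical values, independent errors leave the expected RD at zero, whereas fully dependent errors produce an RD of about 0.076 even though alcohol has no effect on migraine in this example.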
G. The Domain of Grouping
Although eRDA and fRDA are observable, they are not yet observed. We now add the descriptive epidemiologist who makes observations by grouping the file data, that is, forming subgroups of the target population in which the morbidity frequency is observed. Figure 8 shows an arrow from G to X and a broken line between G and D, indicating that the act of drawing a particular subgroup (extract, X) with a certain level of exposure, from the population classified as exposed (E), can be influenced by data on D. For example, past drinkers may be excluded and X defined as recent drinkers because the association with migraine diagnosis becomes stronger. Likewise, G* may be a criterion for producing the unexposed group. In a hypothesis-generating study, the epidemiologist is likely to find more than one possible file RD, fRDA, depending on the grouping. The one that attracts the most attention is apt to be the largest. We think of this quantity as a generated or group RD, gRDA. Most group RDs (and risk ratios) remain unpublished observations.
Now we introduce a team of analytic epidemiologists to conduct a hypothesis-testing study. They aim to improve the grouping by carefully producing a cohort of drinkers and nondrinkers so that many of the above-mentioned causes of distortion are reduced or avoided. Ideally they conduct a double-blind randomized controlled trial of alcohol with good compliance and accurate outcome assessment. If that is not ethically or logistically feasible, they do an observational study involving subject selection. G and G* in Figure 8 can be interpreted as contributing causes of selecting the exposed group, X, and the unexposed group for the cohort study. (G and G* are usually not single measurable phenomena, such as a definable selection criterion, but rather a mix of mental, logistical, and data processes.) In a retrospective cohort study, if G and G* are causally connected with diagnosis, D, in an unbalanced way, then the result is selection bias. In a prospective cohort study, D cannot cause selection bias, but confounder C can influence selection and produce a bias (which some epidemiologists call a selection bias, whereas others call it just confounding by C).
The RD in the cohort is another type of generated group RD, gRDA. If the cohort is sufficiently different from the source population, epidemiologists often make a convenient mental shift and redefine the target population as the cohort itself. Epidemiologists who take that viewpoint will find that the Domain of Grouping is relatively unimportant. Nevertheless, for epidemiologists who continue to regard the source population as the target population, the causes of cohort selection bias remain a major concern worthy of defining a distinct domain.
When a study population is defined as the target population, rather than its source, this definition produces some terminology problems. If by matching on C the investigator eliminates the association between C and A in the crude data from a cohort, C is no longer a correlated cause and may be considered a background factor, B. (If the association between C and A reappears after stratifying by another variable, it may again be called a C.) Conversely, a background factor, B, not associated with A in the source population, might become associated with A after the cohort is selected. Then B is considered to be a C. In a properly conducted randomized trial, all alternative causes of morbidity are Bs, associated with A only by chance. But if one of those Bs is measurable, and strongly associated with A, it sometimes is considered to be a C (although it is debatable whether stratifying by C is legitimate without reinterpreting the confidence interval). This changeability is why some epidemiologists prefer not to label anything a confounder until a study is done and an analysis is specified. Similarly, the values of the sensitivities and specificities, and their interrelations, may change when a cohort is created and defined as the target population.
H. The Domain of Harvesting
As the cohort is followed, the yield of cases, Y, harvested by the follow-up protocol, H, may be incomplete (Figure 9). If a major cause of loss to follow-up is associated with exposure (for example, H is influenced by E), then the RD is further distorted. If a nested case-control study is done within the cohort, the protocol, H*, for harvesting controls can also produce bias if it is not done properly. All case-control studies are nested in an underlying cohort (the study base, a fixed or dynamic cohort) from which the cases and controls are harvested. In most case-control studies, however, the relation between the study population and the underlying cohort is not well defined. That means the harvesting processes, H and H* (for example, “hospital catchment patterns”), are not well defined, so the study is susceptible to case- and control-selection bias, the major limitation of case-control studies. If the harvesting factors are causally associated with exposure, E, the result is likely that the harvested RD, hRDA, deviates from the gRDA. (The same applies to the risk ratio, which would normally be the parameter estimated in a case-control study.)
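A rough numerical sketch, under assumptions of our own, shows how incomplete harvesting of cases can distort the RD. We assume that the cohort denominators are fixed at baseline, that only a fraction of true cases in each group is captured by follow-up, and that noncases are never miscounted as cases. The capture fractions and risks are hypothetical, and `harvested_rd` is a name invented for this illustration.

```python
def harvested_rd(risk_exposed, risk_unexposed, capture_exposed, capture_unexposed):
    """RD computed from harvested cases over the full baseline denominators."""
    return risk_exposed * capture_exposed - risk_unexposed * capture_unexposed

group_rd = harvested_rd(0.30, 0.20, 1.0, 1.0)      # complete follow-up
equal_loss = harvested_rd(0.30, 0.20, 0.8, 0.8)    # incomplete, unrelated to E
unequal_loss = harvested_rd(0.30, 0.20, 0.9, 0.7)  # capture influenced by E

print(round(group_rd, 2), round(equal_loss, 2), round(unequal_loss, 2))
```

With these assumed values the RD of 0.10 becomes 0.08 when capture is equally incomplete in both groups and 0.13 when capture is better among the exposed, so the direction of the distortion depends on how harvesting relates to exposure.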
I. The Domain of Inference
During data analysis, the investigators’ interpretations (I)—their prior hypotheses and their statistical assumptions—can cause mismodeling and misinterpretation. Investigators often calculate several values for hRDA during an analysis, only one or two of which are chosen as the inferred RDs, iRDA, to be printed in a table.
J. The Domain of Journals
The inferred RDs are carefully judged for validity, plausibility, and interest before they are published. Investigators’ and reviewers’ judgments (J) cause journal RDs, jRDA, to differ from the average of all inferred RDs ever tabulated.
K. The Domain of Knowledge Use
An emerging field is the translation of published evidence into decision aids, clinical guidelines, and policies. People who do this are sometimes called knowledge brokers. Their task is to supply users with a known RD, kRDA. This value can differ from one knowledge broker to the next. Their knowledge (K) of the kinds of decisions people want to make can influence their methods of meta-analysis, interpretations, and judgments.
Interpretations (I), judgments (J) and knowledge of context (K) are usually unmeasurable complex mixtures of causes so their causal arrows, shown in Figure 10, are relatively unquantifiable and uncontrollable. Sometimes, however, they include a major component cause that is an indicator of a particular interpretation, judgment, or piece of knowledge, which enables us to express the cause of bias in terms of a counterfactual conditional statement. For example, “If the investigator had not included factor Z in the model, the RD would not have been β,” or “If the meta-analyst had included the non-English studies, the RD would not have been β**.” Generally I, J, and K are modifiers of the RD (except when I is an invention, such as a belief that a RD is zero, based on no direct evidence.)
Up to this point, we have avoided distinguishing between independent causes and modifiers. We could have added a susceptibility factor, S, as a necessary component of the mechanism by which A, B, and C cause M (Figure 11). By analogy, we could have stated that F, G, and H often can be viewed as necessary component causes, respectively, of (1) certain types of simultaneous false positive observations, (2) alternative ways of being extracted from the source population and included in the cohort, and (3) particular methods of being harvested as a case (Figure 11). The topics of synergism and antagonism would have made our model and discussion too complex. Nevertheless, for readers who wish to translate our arrow diagrams into the pie charts of Rothman’s classic paper Causes, 13 now known as the sufficient/component cause model, 10 ,pp.8–13 we have included pie shapes in Figure 11. Rothman’s model is an excellent introduction to causal interpretations of strengths of effects, synergism, antagonism, and necessary and sufficient causes. But it cannot equal the ability of arrows to portray sequences of causes. (Note that we depart from the syntax of directed acyclic graphs in Figure 11 when we use curved lines merging with straight arrows, where a curved line represents a necessary component of the causal chain represented by the straight arrow.)
The pies at the bottom of Figure 11 represent two of many possible alternative combinations of component causes that are sufficient to cause a person to be in the “a” cell of a study (that is, having characteristics X and Y). Imagine a person who is born, with a potential for alcohol-induced migraine, into a world of seemingly random barometric changes. She starts consuming alcoholic and caffeine-containing drinks. Eventually she experiences headaches and receives a diagnosis of migraine. She reports being an occasional drinker of alcohol and this fact is filed in her medical chart with her diagnosis. Later her data are extracted from a database and included in a retrospective cohort study in which she is a harvested case. The component causes of her journey to the “a-cell” can be shown as accumulating pieces of a circular jigsaw puzzle (the left pie chart in Figure 11).
In closing, we should repeat that we used two heuristic devices: a temporal sequence of the evolutionary origins of epidemiology and a hierarchy from the most universal to the most local causes. In epidemiologic studies, the temporal sequences of events are often different from the sequence we used here. In a prospective cohort study, A may cause E and E cause X long before A causes M. In a case-control study, M may cause D, which may cause Y before a measurement E is made from A. Additionally, the hierarchy as we have presented it often does not hold exactly. For example, the causation of E by A (for example, an expensive new measurement method) may occur in only one subgroup of one local study population. The validity of our model does not depend on whether a particular temporal sequence or hierarchy occurs. It depends on whether the arrows correctly reflect the sequence and convergence of causes on the two main paths, from A (or a) to X and from M (or m) to Y. This depiction depends on the validity of the basic idea that a phenomenon causes (and precedes) its observation, which causes (and precedes) the use of the observation for selection of subjects into a study, which causes (and precedes) data analysis, which causes (and precedes) use of the study results.
A potential misinterpretation of these causal diagrams is to conclude optimistically that we can treat all biases as we treat confounders. A widespread idea is “there are five possible explanations for an association between an exposure ... and an outcome ... 1. Chance ... 2. Bias ... 3. Confounding ... 4. Causal ... 5. Reverse causality ...”14 ,pp.26–27 Readers who begin with this misleading simplification may misinterpret our statement, “Bias is caused,” to mean, “All sources of bias are like confounders.” In fact, it is well established that sources of bias often cannot be treated as confounders. 15
On the other hand, the arrow diagrams can also be misused by pessimists who believe epidemiologic evidence is hopelessly biased. The mere existence of a cause of bias does not automatically make it quantitatively important. In future papers, we will present some simple algebraic formulas to help us think more quantitatively about the degradations from the top-quality, “grade-a”, causal RD, aRD, to the lower-quality RDs we normally use.
The main lesson from our causal model of bias is that we should question the adequacy of qualitative assessments of bias in the Discussion sections of study reports. Even with the help of user-friendly notation and illustrations, it is hard to think about all these causes of bias simultaneously. Above, we mentioned how selecting a cohort from a target population may result in a random background factor becoming a confounder or vice versa. We also mentioned that causes of measurement error can coincide by chance. We did not grapple with many other possible chance associations among these causes of bias. To handle such complexity, we need sensitivity analysis software tailored to epidemiologic methods. With such tools, we hope epidemiologists in the future can go far beyond simplistic confidence intervals and embrace thorough quantitative treatments of uncertainty in epidemiologic reports. 12
We thank Lucas Neas for collaborating on a different version of this causal model long ago. We thank Charles Poole, Jay Kaufman, Sander Greenland, James Robins, and Carl Phillips for their helpful critiques and suggestions.
1. Greenland S, Robins JM. Identifiability, exchangeability, and epidemiological confounding. Int J Epidemiol 1986; 15: 413–419.
2. Greenland S. Absence of confounding does not correspond to collapsibility of the rate ratio or rate difference. Epidemiology 1996; 7: 498–501.
3. Greenland S, Pearl J, Robins JM. Causal diagrams for epidemiologic research. Epidemiology 1999; 10: 37–48.
4. Maldonado G, Greenland S. The causal-contrast study design (Abstract). Am J Epidemiol 2000; 151: S39.
5. Pearl J. Causal diagrams for empirical research (with discussion). Biometrika 1995; 82: 669–710.
6. Robins JM, Wasserman L. Estimation of effects of sequential treatments by reparameterizing directed acyclic graphs. In: Geiger D, Shenoy P, eds. Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence, Providence, Rhode Island, August 1–3, 1997. San Francisco: Morgan Kaufmann, 1997; 409–420.
7. Greenland S. Basic methods for sensitivity analysis of biases. Int J Epidemiol 1996; 25: 1107–1116.
8. Steineck G, Ahlbom A. A definition of bias founded on the concept of the study base. Epidemiology 1992; 3: 477–482.
9. Stang P, Sternfeld B, Sidney S. Migraine headache in a prepaid health plan: ascertainment, demographics, physiological, and behavioral factors. Headache 1996; 36: 69–76.
10. Rothman KJ, Greenland S. Modern Epidemiology. 2nd ed. Philadelphia: Lippincott-Raven, 1998.
11. Kristensen P. Bias from nondifferential but dependent misclassification of exposure and outcome. Epidemiology 1992; 3: 210–215.
12. Phillips CV. Applying fully articulated probability distribution calculations (Abstract). Am J Epidemiol 2000; 151: S41.
13. Rothman KJ. Causes. Am J Epidemiol 1976; 104: 587–592.
14. Ebrahim S, Harwood R. Stroke: Epidemiology, Evidence and Clinical Practice. 2nd ed. New York: Oxford University Press, 1999; 26–7.
15. Greenland S, Robins J. Confounding and misclassification. Am J Epidemiol 1985; 122: 495–506.
Keywords: bias; meta-analysis; causal inference; epidemiologic methods; confounding.
© 2001 Lippincott Williams & Wilkins, Inc.