Bias is caused. Preventing and adjusting for bias in epidemiology is improved by understanding its causation. Causal thinking has deepened understanding of confounding 1–3 and study design. 4 Now that the theory of causal diagrams has been applied to epidemiologic causation, 3,5,6 we hypothesize that it can be used to elucidate the causes of bias in epidemiologic studies.
In this paper, we present an integrated model of causation of bias in epidemiology with user-friendly notation. We have tested it as a tool for quickly conveying to nonepidemiologists how bias enters epidemiologic evidence. Our aim is to give epidemiologists a picture that not only encapsulates current qualitative thinking, but also encourages more quantitative treatment of hypotheses about biases. We envision that the components of our model will map directly onto multiple equations used in sensitivity analyses of simultaneous hypothesized biases. 7 We also hope this paper increases interest in the innovative work of Robins, Pearl, and Greenland on causal modeling. We will build our model piece by piece, after explaining its overall design by an analogy.
Imagine that a target population on Earth can be observed through a special telescope on an orbiting space station. Suppose the telescope’s electronic components include a “disease detector”; an “exposure sensor”; various filters; an “extract grouper”; a “yield harvester”; an “image processor”; a “joiner-transmitter”; and, back on Earth, a “knowledge receiver.” A skeptical user of this telescope would wonder how it produces the final picture and what could cause image distortion and degradation.
Now consider the actual observation tools of epidemiology: webs of information systems between morbidity in human populations and users of epidemiologic evidence. Suppose we think of this as an “episcope.” Figure 1 shows schematically a user looking from right to left through the lenses and filters of an episcope. Once the episcope has been constructed, the direction of causation (information flow, like light rays through a telescope) is mainly from left to right, as follows.
A. The original cause of the image transmitted through the episcope is the association (if it exists) between the causal agent and morbidity.
B. Other background factors cloud the association randomly.
C. In the source population, the association of interest can be further distorted by correlated causes.
None of the above phenomena is observable without diagnoses and exposure measurements, which are produced as follows:
D. Morbidity is a contributing cause of diagnoses, which are recorded in medical charts or death certificates, as well as self-reported on questionnaires. These diagnoses and recordings have varying sensitivities and specificities.
E. The occurrence of the causal agent is a contributing cause of the occurrence of evidence of exposure, which is recorded via interviews, self-administered questionnaires, administrative forms, and other instruments with varying sensitivities and specificities.
F. Data on diagnoses and exposures are collated by various means into files and databases, usually for administrative purposes and sometimes specifically for epidemiologic studies.
Using files and databases, it is possible to select people into an epidemiologic study by exposure or disease status, as follows.
G. In a descriptive study, the selection usually involves grouping subjects into various exposure levels and examining disease rates. For a cohort study, the database may be used for choosing specific exposure groups to follow.
H. Then comes the harvesting of new cases in a cohort study or, analogously, the selection of cases and controls for a case-control study.
I. The investigators do many data analyses but submit only their best for publication.
J. The journal judges the submitted paper and may decide not to publish it.
K. Knowledge brokers, such as Cochrane Collaboration meta-analysts, guideline committees, or local experts, help decision makers use the published papers.
As users of episcopes ourselves, when we sit at a computer screen obtaining evidence from the Internet and then delving deep into a review and thoroughly reading an individual study, we experience these layers roughly in reverse order, from K to A. Each layer is a distinct component of the episcope, a domain where distinct biases occur. We will use this concept of domains while building our tool for understanding the causation of bias. We will start at the left and finish at the right, tracing the evolution of a risk difference (RD) from a causal RD in the “Domain of Agency” to a known RD in the “Domain of Knowledge Use.” As information moves through the episcope, the RD goes through a sequence of potential distortions caused by domain-specific biases. In future papers, we will embellish this tool with some algebra quantifying some of these biases. The algebra is easier for the RD than for the risk ratio, so we will focus here on the RD. In this paper, we discuss biases to the RD only qualitatively. The same qualitative arguments apply to the risk ratio.
Our arrow diagrams use two conventions drawn from the language of “directed acyclic graphs” explained for epidemiologists by Greenland et al 3:
(1) An arrow indicates that the thing or event at the tail of the arrow can sometimes produce the thing or event at the head of the arrow.
(2) A broken line indicates that the two things or events are associated because they are sometimes joint outcomes of a common cause not shown in the diagram.
It is important to appreciate that some causes of bias—for example, other causes of morbidity correlated with the cause of interest—are more universal (or general) than other causes, because they exist in the source population, even if no observer exists. Observers (and, therefore, the causes of their observations) are less common than the causes being observed, but they are more widespread than the causes involved in making databases. These, in turn, are more common than the causes of selection into epidemiologic studies. Biases in inference and judgment are the most local; they may be specific to one coinvestigator on the study team or one reviewer of the published literature. As we build our model, moving from the left to the right of the episcope, the most universal causes of bias precede the most local causes. The idea of organizing our model as a hierarchy of domains was influenced by a paper by Steinbeck and Ahlbom. 8 As an additional heuristic device, we also exploit the fact that the causal sequence of our model, whereby phenomena cause observations that cause uses of those observations, corresponds approximately to their historic chronology in the evolutionary origins of epidemiology. To reinforce our notation system, we use the epidemiology of migraine 9 as an example.
A. The Domain of Agency (Causal Potential)
The hierarchy begins with the causal laws that govern life on Earth, including the causes of migraines in human beings. One possible cause might be intake of alcoholic drinks. This is shown in Figure 2 as an arrow from A (the agent, alcoholic drink) to M (the morbidity, migraine) with a question mark. (This, like every other arrow in Figures 3–10, concisely illustrates an association that can also be expressed using a two-by-two table.) We think of this as the most universal of domains because, long before humans started making alcoholic drinks, the potential for a large quantity of alcohol to cause or prevent migraine in the human brain could have existed as a biological characteristic of humans. In terms of “counterfactual response types,” 10,p.333 there were individuals in the population who would have responded to alcohol by experiencing a migraine (or by having the migraine prevented), although there were no alcoholic drinks yet to produce those responses. (Similarly, the causal potential for a smallpox virus to cause smallpox still exists although the virus no longer exists, except in cold storage.) In this sense, a causal potential can exist even before (or after) the exposure’s existence.
The overall potential for A to cause an increase or decrease in M in the target population is determined by the proportions of people who are susceptible to A as a cause or preventive of M (response types 2–3, 5–12, and 14–15, according to the notation of Rothman and Greenland 10,p.333). This potential is called the causal RD (for example, equation 18-10 10,p.334). For notational convenience, we will also call it the aetiologic (using the British spelling of the word etiologic) or agential RD, aRDA, written as RD with a left superscript a and a right superscript A. The left superscript refers not only to the type of RD (aetiologic, agential) but also to the domain where it occurs (agency). The right superscript (A) refers to the specific agent, in this case alcoholic drinks. Before alcoholic drinks were invented, aRDA was the difference between the unobservable, counterfactual risk of migraine if all people had been alcohol drinkers vs the actual, potentially observable risk if all were nondrinkers (as they were). In every population, aRDA is unobservable because the population cannot be simultaneously all drinkers and all nondrinkers.
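A minimal sketch may make the counterfactual definition concrete. It assumes a simplified classification with only four response types for a single binary agent (doomed, causal, preventive, immune) rather than the finer numbered notation cited above. The function name and the proportions are hypothetical, chosen only for illustration.

```python
def causal_rd(p_doomed, p_causal, p_preventive, p_immune):
    """Causal risk difference from response-type proportions.

    Simplifying assumption: every person belongs to exactly one of four
    types for a single binary agent A and outcome M.
    """
    total = p_doomed + p_causal + p_preventive + p_immune
    if abs(total - 1.0) > 1e-9:
        raise ValueError("response-type proportions must sum to 1")
    # Counterfactual risk if everyone were exposed: doomed and causal types get M.
    risk_all_exposed = p_doomed + p_causal
    # Counterfactual risk if no one were exposed: doomed and preventive types get M.
    risk_all_unexposed = p_doomed + p_preventive
    return risk_all_exposed - risk_all_unexposed


# Hypothetical proportions, not estimates from any study.
example_rd = causal_rd(p_doomed=0.05, p_causal=0.10, p_preventive=0.02, p_immune=0.83)
```

Under these assumptions the causal RD reduces to the proportion of causal types minus the proportion of preventive types, which is why it can be zero even when some individuals are affected by the agent.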
B. The Domain of Background Randomness
Before human beings began making alcoholic drinks, there existed background factors (B) that caused migraine attacks, such as barometric pressure changes. The causal (aetiologic) RD due to B can be represented by aRDB and illustrated in Figure 3 as an arrow from B (barometric pressure) to M (migraine). Later, when alcohol intake emerges as a human habit, some of these background factors will remain independent from alcohol drinking. It will be as if barometric pressure fluctuations are allocated at random so they are not associated with alcohol intake except by chance. This relation is illustrated in Figure 3 by the absence of any arrow or broken line between A and B, meaning they are not causally connected. “Not causally connected” means that if A and B ever happen to be associated, the association exists only by chance. “By chance” means any concurrence of A with B, like the coincidence of weekend alcohol intake with days of higher barometric pressure, is the result of causal processes so remote, mixed, and/or indirect that they are unable to cause A and B to be associated reproducibly (that is, more often than if alcohol intake was randomly allocated to days). When B is associated with A by chance, B causes a distortion in the relation between A and M. If random background factors, B, are the only cause of distortion, as in a perfect randomized controlled trial, the potentially observable RD, bRDA, differs from the causal RD, aRDA, by only a chance deviation. We suggest calling bRDA the base (or best) RD, because it is a potentially observable RD in the base (source) population and is the best possible estimate of the unobservable aRDA.
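The chance deviation between the base RD and the causal RD can be illustrated with a small simulation. This is a sketch under stated assumptions: a simplified four-type response model (doomed, causal, preventive, immune), in which the "doomed" type stands in for migraine produced by background factors regardless of the agent, and a fair-coin allocation of the agent. All names and proportions are hypothetical.

```python
import random


def simulate_base_rd(n, p_doomed, p_causal, p_preventive, seed):
    """Observed RD in one random allocation of the agent to n people."""
    rng = random.Random(seed)
    cases = {True: 0, False: 0}    # migraine counts by exposure status
    totals = {True: 0, False: 0}   # group sizes by exposure status
    for _ in range(n):
        u = rng.random()           # draws the person's response type
        doomed = u < p_doomed
        causal = p_doomed <= u < p_doomed + p_causal
        preventive = p_doomed + p_causal <= u < p_doomed + p_causal + p_preventive
        exposed = rng.random() < 0.5   # allocation ignores the response type
        ill = doomed or (causal and exposed) or (preventive and not exposed)
        totals[exposed] += 1
        cases[exposed] += int(ill)
    if totals[True] == 0 or totals[False] == 0:
        raise ValueError("one allocation group is empty; increase n")
    return cases[True] / totals[True] - cases[False] / totals[False]


CAUSAL_RD = 0.10 - 0.02   # causal minus preventive proportion (hypothetical)
small_trial = simulate_base_rd(200, 0.05, 0.10, 0.02, seed=1)
large_trial = simulate_base_rd(200_000, 0.05, 0.10, 0.02, seed=1)
```

With a small population the observed RD can sit well away from the causal value; with a large one the chance deviation shrinks, which is the sense in which the base RD is the best available estimate.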
C. The Domain of Connections
When humans began making and imbibing alcoholic drinks (A), background factors were divided into those associated with A only by random chance (which we will still call B) vs factors associated with A by nonrandom causal connections. For example, cultural factors (C*) cause alcohol intake to be connected with caffeine consumption (C). This relation is illustrated in Figure 4, which shows that C (a correlated cause such as caffeine) can cause (or prevent) M and is connected with A because of a common cause C*. Such a correlated cause is called a “confounder” by many epidemiologists (including us), but other epidemiologists prefer to restrict use of the term “confounder” to the stage of data analysis for reasons we will discuss later. Correlated causes produce additional distortions to the base RD, so the RD in the domain of causal connections is a contaminated or crude RD, cRDA. A special type of connection in this domain is when M itself is C*, as when history of migraine causes a person to avoid alcohol intake. This connection produces reverse causation bias.
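The distortion produced by a correlated cause can be worked through with expected values. The sketch below assumes a single binary correlated cause C and additive risks (the risk of M is a base risk plus an RD for A plus an RD for C); both are simplifying assumptions, and the function name and numbers are hypothetical.

```python
def crude_rd(base_risk, rd_a, rd_c, p_c, p_a_given_c, p_a_given_not_c):
    """Expected crude RD for A when a correlated cause C is ignored.

    Assumes additive risks: risk = base_risk + rd_a * A + rd_c * C.
    """
    p_a = p_c * p_a_given_c + (1 - p_c) * p_a_given_not_c
    if not 0 < p_a < 1:
        raise ValueError("both exposure groups must be non-empty")
    # Prevalence of C within each exposure group (Bayes' rule).
    p_c_given_a = p_c * p_a_given_c / p_a
    p_c_given_not_a = p_c * (1 - p_a_given_c) / (1 - p_a)
    risk_exposed = base_risk + rd_a + rd_c * p_c_given_a
    risk_unexposed = base_risk + rd_c * p_c_given_not_a
    return risk_exposed - risk_unexposed


# Hypothetical values: caffeine (C) is more common among drinkers (A).
confounded = crude_rd(0.05, 0.08, 0.10, p_c=0.5, p_a_given_c=0.8, p_a_given_not_c=0.2)
# Same causal effects, but C is unrelated to A.
unconfounded = crude_rd(0.05, 0.08, 0.10, p_c=0.5, p_a_given_c=0.5, p_a_given_not_c=0.5)
```

In this additive setting the crude RD exceeds the RD for A by the RD for C multiplied by the difference in the prevalence of C between exposed and unexposed, so the distortion vanishes when C is equally common in both groups.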
D. The Domain of Diagnosis
At some point in history, humans began to diagnose illnesses such as migraine. Having a true migraine greatly increased the chances of having such a diagnosis, but a true migraine is neither a sufficient nor a necessary cause of a diagnosis. Figure 5 shows that M (morbidity) causes D (diagnosis) but the disease assessment process is somewhat insensitive; migraine causes some but not all people to receive the diagnosis. Figure 5 also shows that m, a mimicking state (for example, muscle tension headache) in the absence of M, can cause D (a false positive diagnosis). This means the disease assessment process is also somewhat nonspecific. The insensitivity and nonspecificity cause further distortions to cRDA, resulting in the diagnostic RD, dRDA.
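As a rough illustration of how insensitivity and nonspecificity distort an RD, the sketch below assumes that sensitivity and specificity of diagnosis are the same in exposed and unexposed people (nondifferential error) and that false positives arise at the same rate among everyone without M. Function names and numbers are hypothetical.

```python
def diagnosed_risk(true_risk, sensitivity, specificity):
    """Expected proportion receiving the diagnosis D."""
    true_positives = sensitivity * true_risk
    false_positives = (1 - specificity) * (1 - true_risk)
    return true_positives + false_positives


def diagnostic_rd(risk_exposed, risk_unexposed, sensitivity, specificity):
    """Expected RD for diagnoses when error is the same in both groups."""
    return (diagnosed_risk(risk_exposed, sensitivity, specificity)
            - diagnosed_risk(risk_unexposed, sensitivity, specificity))


# Hypothetical true risks and error rates.
true_rd = 0.21 - 0.07
observed_rd = diagnostic_rd(0.21, 0.07, sensitivity=0.80, specificity=0.95)
# Under these assumptions the RD shrinks by the factor (sensitivity + specificity - 1).
shrink_factor = observed_rd / true_rd
```

Here the factor is 0.75, so a true RD of 0.14 would appear as about 0.105 among diagnoses.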
E. The Domain of Encoding
In time, humans began recording their exposures, including their consumption of alcoholic and caffeine-containing drinks. Figure 6 shows that A (alcohol intake) causes E (evidence in the historical record, such as “Epicurean,” “enjoyed eau-de-vie,” or “exposed to ethanol”). A is not a sufficient cause of E, because the process of observing and recording is somewhat insensitive; consumption of alcohol causes some but not all drinkers to be recorded as exposed. Nor is A a necessary cause of E, because the observing and recording is also somewhat nonspecific; an alternative agent, a (such as unfermented apple cider that is mistakenly considered alcoholic), can cause E (a false positive) in the absence of A. The insensitivity and nonspecificity of encoding add further distortions to dRDA. The result is the evident RD, eRDA. This is the first RD that is evident, ie, observable, not just potentially observable, because only now do recorded observations of both the agent and the morbidity exist.
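The effect of imperfect encoding of exposure can be sketched in the same spirit. The example assumes nondifferential exposure error (the same sensitivity and specificity regardless of disease status) and that the recorded exposure E has no effect on risk beyond the true agent A. Names and values are hypothetical.

```python
def evident_rd(risk_exposed, risk_unexposed, p_a, sensitivity, specificity):
    """Expected RD comparing people recorded as exposed (E) vs not recorded."""
    p_e = sensitivity * p_a + (1 - specificity) * (1 - p_a)
    if not 0 < p_e < 1:
        raise ValueError("both recorded-exposure groups must be non-empty")
    # Share of truly exposed people within each recorded group.
    truly_exposed_among_e = sensitivity * p_a / p_e
    truly_exposed_among_not_e = (1 - sensitivity) * p_a / (1 - p_e)
    rd = risk_exposed - risk_unexposed
    risk_e = risk_unexposed + rd * truly_exposed_among_e
    risk_not_e = risk_unexposed + rd * truly_exposed_among_not_e
    return risk_e - risk_not_e


# Hypothetical values: half the population drinks, recording is imperfect.
with_error = evident_rd(0.175, 0.07, p_a=0.5, sensitivity=0.9, specificity=0.8)
without_error = evident_rd(0.175, 0.07, p_a=0.5, sensitivity=1.0, specificity=1.0)
```

Each recorded group is a mixture of truly exposed and truly unexposed people, so the contrast between the recorded groups is diluted relative to the contrast between the true groups.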
F. The Domain of Files
Eventually medical records of diagnoses began to be collated with exposure information in paper or electronic files in filing cabinets and databases. Losses of diagnostic data are sometimes systematically associated with losses of exposure histories. Thus, some causes of disease insensitivity also cause exposure insensitivity. This situation can be illustrated with Figure 7, showing a filter (F) causing loss of information about both exposure and diagnosis, resulting in simultaneous insensitivity. Figure 7 can also represent the opposite situation, simultaneous nonspecificity caused by something that fabricates false positives among E (exposed) and D (diagnosed). For example, F could be the fancy that both alcoholism and migraine commonly occur together, causing the record keeper often to classify muscle tension headaches among apple cider drinkers as migraines among alcohol drinkers. Alternatively, F might increase sensitivity of the exposure measure while decreasing sensitivity of diagnosis.
Note that F can be E or D itself. Knowledge that a patient is a drinker can directly influence the diagnosis, and knowledge of the diagnosis can directly influence the exposure measurement. Likewise there can be direct biological causes of differential misclassification. Excessive intake of alcohol might cause the drinker to forget having had a migraine attack, in which case A is the F factor. Conversely, a severe migraine might directly cause loss of memory about alcohol intake, in which case M is the F factor. Thus, differential misclassification also occurs in the Domains of Diagnosis and Encoding. For clarity, however, we have introduced it and illustrated it only in the Domain of Files. This presentation also conforms to our experience studying impacts of health services. A common type of differential misclassification occurs when health care providers file claims for their services (the exposure of interest) and they record diagnoses that are both causes of, and justifications for, those services. (F can be interpreted as their expectation of a fee.) F causes further distortion of eRDA. The result is fRDA, which we call the file RD.
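Differential misclassification can be illustrated by letting the accuracy of diagnosis depend on exposure status. This is a sketch only: it assumes that exposure itself is recorded without error and that the F factor acts solely by changing diagnostic sensitivity and specificity within each exposure group. Names and values are hypothetical.

```python
def diagnosed_risk(true_risk, sensitivity, specificity):
    """Expected proportion receiving the diagnosis D."""
    return sensitivity * true_risk + (1 - specificity) * (1 - true_risk)


def rd_with_differential_diagnosis(risk_exposed, risk_unexposed,
                                   se_exposed, se_unexposed,
                                   sp_exposed, sp_unexposed):
    """Expected RD when diagnostic accuracy differs by exposure status."""
    return (diagnosed_risk(risk_exposed, se_exposed, sp_exposed)
            - diagnosed_risk(risk_unexposed, se_unexposed, sp_unexposed))


# Hypothetical null scenario: the true risk is 0.10 in both groups, but
# migraine is detected more readily in people known to be drinkers.
spurious_rd = rd_with_differential_diagnosis(
    0.10, 0.10,
    se_exposed=0.9, se_unexposed=0.7,
    sp_exposed=0.98, sp_unexposed=0.98,
)
```

In this scenario an RD of about 0.02 appears although the true RD is zero, which is the kind of distortion that nondifferential error in diagnosis alone would not produce.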
F is not well understood by most epidemiologists (including us). Contrary to conventional wisdom, F can produce misclassification that is nondifferential (not associated with A or M) but biases the eRDA away from the null. 11 (Note that F may represent a mixture of minor causes of diagnosis and encoding in much the same way that B represents a mixture of background causes of migraine, whereas we have specified that A, a, C, M, m, D, and E are single distinct characteristics, things or events.) Even if F does not exist, it may seem to exist. For example, the causes of false positive diagnoses and encoding (m and a) sometimes coincide by chance, which has the same effect in a database as if a single factor F could reproducibly cause simultaneous nonspecificity. If we assume that F does not exist, the probability that measurement errors coincide by chance can be estimated. Phillips 12 has proposed that uncertainty due to random variation in sensitivities and specificities be routinely quantified and reported as part of an enhanced confidence interval.
G. The Domain of Grouping
Although eRDA and fRDA are observable, they are not yet observed. We now add the descriptive epidemiologist who makes observations by grouping the file data, that is, forming subgroups of the target population in which the morbidity frequency is observed. Figure 8 shows an arrow from G to X and a broken line between G and D, indicating that the act of drawing a particular subgroup (extract, X) with a certain level of exposure, from the population classified as exposed (E), can be influenced by data on D. For example, past drinkers may be excluded and X defined as recent drinkers because the association with migraine diagnosis becomes stronger. Likewise, G* may be a criterion for producing the unexposed group. In a hypothesis-generating study, the epidemiologist is likely to find more than one possible file RD, fRDA, depending on the grouping. The one that attracts the most attention is apt to be the largest. We think of this quantity as a generated or group RD, gRDA. Most group RDs (and risk ratios) remain unpublished observations.
Now we introduce a team of analytic epidemiologists to conduct a hypothesis-testing study. They aim to improve the grouping by carefully producing a cohort of drinkers and nondrinkers so that many of the above-mentioned causes of distortion are reduced or avoided. Ideally they conduct a double-blind randomized controlled trial of alcohol with good compliance and accurate outcome assessment. If that is not ethically or logistically feasible, they do an observational study involving subject selection. G and G* in Figure 8 can be interpreted as contributing causes of selecting the exposed group, X, and the unexposed group for the cohort study. (G and G* are usually not single measurable phenomena, such as a definable selection criterion, but rather a mix of mental, logistical, and data processes.) In a retrospective cohort study, if G and G* are causally connected with diagnosis, D, in an unbalanced way, then the result is selection bias. In a prospective cohort study, D cannot cause selection bias, but confounder C can influence selection and produce a bias (which some epidemiologists call a selection bias, whereas others call it just confounding by C).
The RD in the cohort is another type of generated group RD, gRDA. If the cohort is sufficiently different from the source population, epidemiologists often make a convenient mental shift and redefine the target population as the cohort itself. Epidemiologists who take that viewpoint will find that the Domain of Grouping is relatively unimportant. Nevertheless, for epidemiologists who continue to regard the source population as the target population, the causes of cohort selection bias remain a major concern worthy of defining a distinct domain.
When a study population is defined as the target population, rather than its source, this definition produces some terminology problems. If by matching on C the investigator eliminates the association between C and A in the crude data from a cohort, C is no longer a correlated cause and may be considered a background factor, B. (If the association between C and A reappears after stratifying by another variable, it may again be called a C.) Conversely, a background factor, B, not associated with A in the source population, might become associated with A after the cohort is selected. Then B is considered to be a C. In a properly conducted randomized trial, all alternative causes of morbidity are Bs, associated with A only by chance. But if one of those Bs is measurable, and strongly associated with A, it sometimes is considered to be a C (although it is debatable whether stratifying by C is legitimate without reinterpreting the confidence interval). This changeability is why some epidemiologists prefer not to label anything a confounder until a study is done and an analysis is specified. Similarly, the values of the sensitivities and specificities, and their interrelations, may change when a cohort is created and defined as the target population.
H. The Domain of Harvesting
As the cohort is followed, the yield of cases, Y, harvested by the follow-up protocol, H, may be incomplete (Figure 9). If a major cause of loss to follow-up is associated with exposure (for example, H is influenced by E), then the RD is further distorted. If a nested case-control study is done within the cohort, the protocol, H*, for harvesting controls can also produce bias if it is not done properly. All case-control studies are nested in an underlying cohort (the study base, a fixed or dynamic cohort) from which the cases and controls are harvested. In most case-control studies, however, the relation between the study population and the underlying cohort is not well defined. That means the harvesting processes, H and H* (for example, “hospital catchment patterns”), are not well defined, so the study is susceptible to case- and control-selection bias, the major limitation of case-control studies. If the harvesting factors are causally associated with exposure, E, the result is likely that the harvested RD, hRDA, deviates from the gRDA. (The same applies to the risk ratio, which would normally be the parameter estimated in a case-control study.)
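Two simple calculations can illustrate harvesting distortions. This is a sketch under stated assumptions. For the cohort, only case ascertainment is incomplete while denominators are fully known. For the case-control design, we use the odds ratio from the harvested two-by-two table, with a separate sampling fraction for each cell. Names, counts, and fractions are hypothetical.

```python
def harvested_rd(risk_exposed, risk_unexposed, harvest_exposed, harvest_unexposed):
    """Expected RD when only a fraction of cases is harvested in each group."""
    return risk_exposed * harvest_exposed - risk_unexposed * harvest_unexposed


def harvested_odds_ratio(a, b, c, d, s_a, s_b, s_c, s_d):
    """Odds ratio after sampling each cell of the study base.

    a, b: exposed and unexposed cases; c, d: exposed and unexposed noncases.
    s_a .. s_d: probability that a person in that cell is harvested.
    """
    return (a * s_a * d * s_d) / (b * s_b * c * s_c)


# Cohort: follow-up finds 90% of cases among drinkers but 60% among nondrinkers.
cohort_rd = harvested_rd(0.15, 0.07, harvest_exposed=0.9, harvest_unexposed=0.6)

# Case-control: exposed controls are harvested twice as readily as unexposed ones.
base_or = harvested_odds_ratio(150, 70, 850, 930, 1.0, 1.0, 1.0, 1.0)
biased_or = harvested_odds_ratio(150, 70, 850, 930, 0.9, 0.9, 0.2, 0.1)
```

In the case-control calculation the odds ratio is multiplied by (s_a × s_d) / (s_b × s_c), so it is undistorted whenever harvesting of cases and of controls is unrelated to exposure.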
I. The Domain of Inference
During data analysis, the investigators’ interpretations (I)—their prior hypotheses and their statistical assumptions—can cause mismodeling and misinterpretation. Investigators often calculate several values for hRDA during an analysis, only one or two of which are chosen as the inferred RDs, iRDA, to be printed in a table.
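The consequence of choosing among several calculated values can be sketched with a simulation. It assumes, purely for illustration, that each analysis yields the true RD plus independent normal noise and that the investigator reports the largest value; real analyses of one data set would be correlated, and selection is rarely this mechanical. All names and parameter values are hypothetical.

```python
import random


def mean_reported_rd(true_rd, noise_sd, n_analyses, n_studies, seed):
    """Average RD over many studies under two reporting rules.

    Returns (mean of the first analysis, mean of the largest analysis).
    """
    rng = random.Random(seed)
    first_total = 0.0
    largest_total = 0.0
    for _ in range(n_studies):
        estimates = [rng.gauss(true_rd, noise_sd) for _ in range(n_analyses)]
        first_total += estimates[0]       # a prespecified analysis
        largest_total += max(estimates)   # the most striking analysis
    return first_total / n_studies, largest_total / n_studies


prespecified_mean, largest_mean = mean_reported_rd(
    true_rd=0.08, noise_sd=0.02, n_analyses=5, n_studies=2000, seed=7
)
```

Under these assumptions the prespecified analysis is centered on the true RD, whereas reporting the largest of five analyses shifts the average upward by roughly one noise standard deviation.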
J. The Domain of Journals
The inferred RDs are carefully judged for validity, plausibility, and interest before they are published. Investigators’ and reviewers’ judgments (J) cause journal RDs, jRDA, to differ from the average of all inferred RDs ever tabulated.
K. The Domain of Knowledge Use
An emerging field is the translation of published evidence into decision aids, clinical guidelines, and policies. People who do this are sometimes called knowledge brokers. Their task is to supply users with a known RD, kRDA. This value can differ from one knowledge broker to the next. Their knowledge (K) of the kinds of decisions people want to make can influence their methods of meta-analysis, interpretations, and judgments.
Interpretations (I), judgments (J), and knowledge of context (K) are usually unmeasurable complex mixtures of causes, so their causal arrows, shown in Figure 10, are relatively unquantifiable and uncontrollable. Sometimes, however, they include a major component cause that is an indicator of a particular interpretation, judgment, or piece of knowledge, which enables us to express the cause of bias in terms of a counterfactual conditional statement. For example, “If the investigator had not included factor Z in the model, the RD would not have been β,” or “If the meta-analyst had included the non-English studies, the RD would not have been β**.” Generally I, J, and K are modifiers of the RD (except when I is an invention, such as a belief that an RD is zero, based on no direct evidence).
Up to this point, we have avoided distinguishing between independent causes and modifiers. We could have added a susceptibility factor, S, as a necessary component of the mechanism by which A, B, and C cause M (Figure 11). By analogy, we could have stated that F, G, and H often can be viewed as necessary component causes, respectively, of (1) certain types of simultaneous false positive observations, (2) alternative ways of being extracted from the source population and included in the cohort, and (3) particular methods of being harvested as a case (Figure 11). The topics of synergism and antagonism would have made our model and discussion too complex. Nevertheless, for readers who wish to translate our arrow diagrams into the pie charts of Rothman’s classic paper Causes, 13 now known as the sufficient/component cause model, 10,pp.8–13 we have included pie shapes in Figure 11. Rothman’s model is an excellent introduction to causal interpretations of strengths of effects, synergism, antagonism, and necessary and sufficient causes. But it cannot equal the ability of arrows to portray sequences of causes. (Note that we depart from the syntax of directed acyclic graphs in Figure 11 when we use curved lines merging with straight arrows, where a curved line represents a necessary component of the causal chain represented by the straight arrow.)
The pies at the bottom of Figure 11 represent two of many possible alternative combinations of component causes that are sufficient to cause a person to be in the “a” cell of a study (that is, having characteristics X and Y). Imagine a person who is born, with a potential for alcohol-induced migraine, into a world of seemingly random barometric changes. She starts consuming alcoholic and caffeine-containing drinks. Eventually she experiences headaches and receives a diagnosis of migraine. She reports being an occasional drinker of alcohol and this fact is filed in her medical chart with her diagnosis. Later her data are extracted from a database and included in a retrospective cohort study in which she is a harvested case. The component causes of her journey to the “a-cell” can be shown as accumulating pieces of a circular jigsaw puzzle (the left pie chart in Figure 11).
In closing, we should repeat that we used two heuristic devices: a temporal sequence of the evolutionary origins of epidemiology and a hierarchy from the most universal to the most local causes. In epidemiologic studies, the temporal sequences of events are often different from the sequence we used here. In a prospective cohort study, A may cause E and E cause X long before A causes M. In a case-control study, M may cause D, which may cause Y before a measurement E is made from A. Additionally, the hierarchy as we have presented it often does not hold exactly. For example, the causation of E by A (for example, an expensive new measurement method) may occur in only one subgroup of one local study population. The validity of our model does not depend on whether a particular temporal sequence or hierarchy occurs. It depends on whether the arrows correctly reflect the sequence and convergence of causes on the two main paths, from A (or a) to X and from M (or m) to Y. This depiction depends on the validity of the basic idea that a phenomenon causes (and precedes) its observation, which causes (and precedes) the use of the observation for selection of subjects into a study, which causes (and precedes) data analysis, which causes (and precedes) use of the study results.
A potential misinterpretation of these causal diagrams is to conclude optimistically that we can treat all biases as we treat confounders. A widespread idea is “there are five possible explanations for an association between an exposure ... and an outcome ... 1. Chance ... 2. Bias ... 3. Confounding ... 4. Causal ... 5. Reverse causality ...” 14,pp.26–27 Readers who begin with this misleading simplification may misinterpret our statement, “Bias is caused,” to mean, “All sources of bias are like confounders.” In fact, it is well established that sources of bias often cannot be treated as confounders. 15
On the other hand, the arrow diagrams can also be misused by pessimists who believe epidemiologic evidence is hopelessly biased. The mere existence of a cause of bias does not automatically make it quantitatively important. In future papers, we will present some simple algebraic formulas to help us think more quantitatively about the degradations from the top-quality, “grade-a”, causal RD, aRD, to the lower-quality RDs we normally use.
The main lesson from our causal model of bias is that we should question the adequacy of qualitative assessments of bias in the Discussion sections of study reports. Even with the help of user-friendly notation and illustrations, it is hard to think about all these causes of bias simultaneously. Above, we mentioned how selecting a cohort from a target population may result in a random background factor becoming a confounder or vice versa. We also mentioned that causes of measurement error can coincide by chance. We did not grapple with many other possible chance associations among these causes of bias. To handle such complexity, we need sensitivity analysis software tailored to epidemiologic methods. With such tools, we hope epidemiologists in the future can go far beyond simplistic confidence intervals and embrace thorough quantitative treatments of uncertainty in epidemiologic reports. 12
We thank Lucas Neas for collaborating on a different version of this causal model long ago. We thank Charles Poole, Jay Kaufman, Sander Greenland, James Robins, and Carl Phillips for their helpful critiques and suggestions.