The theory of directed acyclic graphs (DAGs), as extensively developed by Pearl in the setting of artificial intelligence¹ and in the epidemiology setting in 1999,² is producing growing pains for the field, even as it clarifies how we think about sampling biases and confounder adjustment in statistical models for causal relationships. I welcomed the paper by VanderWeele and Robins in the current issue,³ as a long-awaited and important step toward using DAGs to clarify the role of effect modifiers in causing disease. Most diseases are caused by multiple factors acting together and often through distinct pathways that can lead to a common final phenotype. Teasing apart the causal choreography will remain a prize worth the struggle; and the prize seems more attainable than ever, thanks to the rich array of molecular tools that are newly available to us.
VanderWeele and Robins³ propose a system for classifying effect modifiers in DAGs according to 4 categories: direct effect modifiers, indirect effect modifiers, effect modifiers by proxy, and effect modifiers by common cause. Their categorizations are intended to be DAG-specific, and not necessarily biologically meaningful: an indirect effect modifier might morph to become a “direct” effect modifier, simply by omitting an intermediate node from the DAG. While one might wish for a more biologic meaning, few cause-effect relationships in epidemiology or biology are ever “direct,” as one can typically think of additional proximal intermediates. For example, a genotype is not plausibly a “direct” cause of much (despite Fig. 1 in the paper by VanderWeele and Robins). A variant allele that influences risk by producing an aberrant protein product (as in sickle cell anemia) and one that influences the rate of transcription of some other gene can both be validly represented as “direct” in the nosology of VanderWeele and Robins, even though their effects are biologically indirect. As another example, these authors clarify that a factor categorized as an effect modifier by “common cause” can sometimes be transformed to be an effect modifier by proxy, simply by omitting an intermediate factor from the DAG, thereby transforming the shared indirect cause of D to a direct cause. Thus the 4-way categorization is telling us something about the somewhat arbitrary way we have drawn the DAG itself and may capture little about the nature of the causal factor or its role in causing disease.
Are these classifications helpful? Presumably, the direct and indirect categories of effect modification are the ones with potential implications for intervention, although VanderWeele and Robins do not comment on the utility of their classification scheme.
It has been frustrating to me that important kinds of causal relationships are not captured graphically by DAGs, so I was disappointed that the example DAGs given by VanderWeele and Robins provide no graphical representation to indicate that a factor is regarded as an effect modifier for some other risk factor. Consider their example of a genotype, X, that might influence response to treatment in a randomized clinical trial of an exposure E. Their Figure 1 shows the corresponding DAG. One might have hoped to see causal diagrams that were able to show more interaction than is suggested by the 2 separate, mutually aloof arrows—one from E to D and one from X to D. Perhaps one could instead show an arrow from X that ends at the E-to-D arrow itself, as in my Figure 1. (Refer to the figure legends for how I am defining “direct cause” mathematically.) Note that if both X and E are causes of D, and X is an effect modifier for E, then E is necessarily also an effect modifier for X.
Variations of this causal scenario can also be captured. Consider Figure 1 in VanderWeele and Robins and remove the arrow from X to D. X might influence either the uptake or the metabolism of E, or the biologic response to E, but have no effect on D by itself. Thus, it may only affect the arrow from E to D, ie, the causal process itself. We would not think of X as a “direct cause” of D (its “main effect” would be 0), and we could show this by omitting its direct arrow to D, although the standard DAG might still demand that arrow. In environmental health, some of the most plausible effect modifiers are those that influence absorption, specific metabolic detoxification pathways, immune responses, apoptosis, or DNA repair processes. Some effect-modifying cofactors may have little or no effect in the absence of exposure (although others may retain effects via unmeasured other exposures whose effects are also modified). This phenomenon could be shown graphically as in my Figure 2. Note that this effect modification is present regardless of the scale selected. As another example, the polio vaccine should have no effect on an individual's risk of developing polio in the absence of exposure to the polio virus.
There are also scenarios where both the X-to-D arrow and the E-to-D arrow could be omitted, because neither produces D by itself. This possible category of “pure” effect modification could be represented by arrows that join together, as in my Figure 3. Retardation secondary to phenylketonuria is an example, because neither the genetic metabolic defect nor dietary phenylalanine produces retardation by itself. Note that this kind of scenario also reflects effect modification that is present regardless of the scale selected, and could also be seen as a 2-component, sufficient-cause scenario, as described by Rothman⁴ as a causal pie. More generally, however, I tend to see things in stochastic rather than deterministic terms, and would not expect causal processes that do not involve a highly penetrant mutation to be usefully represented by pies.
Another kind of scenario that may not be well captured by the usual DAGs is one in which biologic intermediates are included as “E” in the DAG and those intermediates can be phenotypically diverse in ways that have not been identified, but may to some extent depend on how they were caused. Thus, for example, if pesticide exposure in pregnancy causes gestational diabetes,⁵ a known risk factor for developing pre-eclampsia, the condition may sometimes retain a physiologic fingerprint that we have not characterized, but which reflects its causation in a way that has implications for sequelae, eg, the risk of pre-eclampsia. Suppose an exposure X is not causally related to D except through a path involving an intermediate E. Then the risk of D among those with E may still depend on X, as if E retains a memory of its parent, X. This is an interesting kind of effect modification, which could be represented by the graph of Figure 4. Notice that if we did not include the arrow-on-arrow, representing effect modification, we could mistakenly think that D does not depend on X once we have conditioned on E. This kind of DAG strongly suggests the existence of diverse subtypes that have been inappropriately lumped into a single intermediate phenotype, subtypes that themselves carry implications for risk. In the example, the phenotypic diversity may be subtle, or may be as simple as variation in severity of gestational diabetes.
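The "memory" argument can be illustrated with a small calculation. The sketch below assumes, purely for illustration, that the intermediate E has two latent subtypes ("mild" and "severe"), that X shifts the mix of subtypes, and that the risk of D depends only on the subtype. All probabilities and names are hypothetical and are not estimates from any study.

```python
def risk_given_intermediate(subtype_probs, risk_by_subtype):
    """Return P(D | E present, X) when E is a mixture of latent subtypes.

    subtype_probs:   P(subtype of E | X); the values sum to P(E present | X),
                     so the remainder is the probability of not having E.
    risk_by_subtype: P(D | subtype), assumed not to depend on X directly.
    """
    p_e = sum(subtype_probs.values())
    joint = sum(subtype_probs[s] * risk_by_subtype[s] for s in subtype_probs)
    return joint / p_e


# Hypothetical numbers: X leaves the mild subtype unchanged but makes the
# severe subtype more common.
risk_by_subtype = {"mild": 0.10, "severe": 0.30}
unexposed = {"mild": 0.04, "severe": 0.01}
exposed = {"mild": 0.04, "severe": 0.06}

risk_unexposed = risk_given_intermediate(unexposed, risk_by_subtype)
risk_exposed = risk_given_intermediate(exposed, risk_by_subtype)
print(risk_unexposed, risk_exposed)
```

Under these assumed numbers the risk of D among those with E differs by X, even though X acts on D only through E. Conditioning on the lumped phenotype E is then not enough to remove the dependence on X.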
Having proposed Figures 1 to 4, which seem to me to capture more about effect modification than do the usual DAGs, I should acknowledge that there may be some formal logical reason to avoid arrow-intersecting graphs; such graphs are probably not kosher within formal directed acyclic graph theory. It may also be that, once one allows arrow-on-arrow effects and begins to think about representing overlapping sets of contributing causes, the resulting tangle of arrows begins to look too much like some kind of pasta, and the complexity may become daunting.
I consider the terminology itself to be a significant problem in thinking about models for joint effects: “effect modification” may be the most unfortunate jargon in epidemiology. The phrase strongly implies that a cofactor actually is acting to modify the causal effect we are studying, and investigators (and others) can be seduced by the words into presuming their finding has causal meaning. In practice, a finding of “interaction” or “effect modification” usually means little more than inequality of an estimated parameter across strata, which is a much more accurate but less sexy way to say the same thing.
VanderWeele and Robins³ step very carefully around the old issues related to the choice of scale one should use when defining and identifying effect modification. Readers who favor an additive model, because it is of more immediate public health relevance than the popular multiplicative model, will appreciate their choice of the risk-difference scale in which to assess effect modification. Although they develop their ideas in a context where effect modification is based on a risk-difference criterion, VanderWeele and Robins point out that one could alternatively use either the risk ratio (log risk) or the odds ratio (log odds) as the scale for defining effect modification. The richness of these choices for identifying “effect modifiers” becomes clear if one considers the fact that if E is a risk factor, then any second risk factor must be an “effect modifier” for the effect of E on at least 2 of those 3 scales. Basic algebra guarantees that effect modification is all around us, while all this plenty must also make some of us doubt its usefulness as a statistical and epidemiological construct.
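The algebraic point about scales can be checked numerically. The following is a minimal sketch with made-up risks: it computes the effect of E within each stratum of X on the risk-difference, risk-ratio, and odds-ratio scales and reports the scales on which the two strata disagree. The function name, the input format, and the tolerance are assumptions made for the illustration.

```python
import math


def odds(p):
    """Convert a risk to odds."""
    return p / (1.0 - p)


def modified_scales(risk, tol=1e-9):
    """List the scales on which the effect of E differs across strata of X.

    risk[(e, x)] is the risk of D given E=e and X=x (hypothetical format).
    """
    effects = {
        "risk difference": [risk[(1, x)] - risk[(0, x)] for x in (0, 1)],
        "risk ratio": [risk[(1, x)] / risk[(0, x)] for x in (0, 1)],
        "odds ratio": [odds(risk[(1, x)]) / odds(risk[(0, x)]) for x in (0, 1)],
    }
    return [
        name
        for name, (in_x0, in_x1) in effects.items()
        if not math.isclose(in_x0, in_x1, rel_tol=0.0, abs_tol=tol)
    ]


# Two hypothetical risk tables in which both E and X raise risk.
additive = {(0, 0): 0.10, (1, 0): 0.20, (0, 1): 0.30, (1, 1): 0.40}
multiplicative = {(0, 0): 0.10, (1, 0): 0.20, (0, 1): 0.30, (1, 1): 0.60}

print(modified_scales(additive))
print(modified_scales(multiplicative))
```

With risks that are exactly additive, X shows up as an effect modifier on the ratio scales. With risks that are exactly multiplicative, it shows up on the risk-difference and odds-ratio scales.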
Nonetheless, one scale not mentioned by VanderWeele and Robins, which I think should be considered more often because it may sometimes actually have causal meaning, is the log-complement scale. This is the scale that attempts to capture probabilistic independence among causal processes. Toxicologists developed the concept of “simple independent action” for toxic effects of 2 different chemicals administered simultaneously.⁶ The idea, which has also been discussed in the context of epidemiology,⁷ is simple. Suppose that exposure A can cause D and exposure B can cause D by a truly independent pathway, while D can also occur in the absence of both A and B through an independent background cause. The paradigm would be 2 hunters who are independently aiming at the same duck. Of course, the duck could also drop dead from some unrelated and independent background cause, eg, lightning. If the 3 pathways are probabilistically independent, we have:
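One way to write this relation is sketched below, in notation that is assumed here for illustration. Let $q_0$, $q_A$, and $q_B$ be the probabilities that the background cause, exposure A, and exposure B, respectively, fail to produce D, and let an overbar denote absence (so $\bar{D}$ means that D does not occur).

```latex
\begin{align*}
\Pr(\bar{D} \mid \bar{A}, \bar{B}) &= q_0, \\
\Pr(\bar{D} \mid A, \bar{B})       &= q_0 \, q_A, \\
\Pr(\bar{D} \mid \bar{A}, B)       &= q_0 \, q_B, \\
\Pr(\bar{D} \mid A, B)             &= q_0 \, q_A \, q_B .
\end{align*}
```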
Hence we have that:
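A sketch of this consequence follows, with an overbar denoting absence of the disease or of an exposure (notation assumed here). The pathway-specific probabilities cancel, leaving a relation among the observable probabilities of remaining free of D:

```latex
\[
\Pr(\bar{D} \mid A, B)
  = \frac{\Pr(\bar{D} \mid A, \bar{B}) \, \Pr(\bar{D} \mid \bar{A}, B)}
         {\Pr(\bar{D} \mid \bar{A}, \bar{B})}
\]
```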
Under probabilistically independent effects, then, it follows that there is additivity on the log-complement scale:
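Written out as a sketch, with an overbar denoting absence of the disease or of an exposure (notation assumed here), independent action of the pathways makes the joint effect equal to the sum of the two separate effects, each measured as a change in the log of the probability of remaining free of D:

```latex
\begin{align*}
\log \Pr(\bar{D} \mid A, B) - \log \Pr(\bar{D} \mid \bar{A}, \bar{B})
  ={}& \bigl[ \log \Pr(\bar{D} \mid A, \bar{B}) - \log \Pr(\bar{D} \mid \bar{A}, \bar{B}) \bigr] \\
   & + \bigl[ \log \Pr(\bar{D} \mid \bar{A}, B) - \log \Pr(\bar{D} \mid \bar{A}, \bar{B}) \bigr]
\end{align*}
```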
Thus there is additivity of the effect of A and the effect of B on the log-complement scale. For a rare outcome, the analogous additive model will approximately hold on the absolute risk scale. However, when the outcome is not rare, additivity of risks is not equivalent to statistical independence (there being a non-negligible chance that both hunters will hit the duck), and so is not equivalent to additivity on the log-complement scale.
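The rare-outcome approximation can be illustrated numerically. The sketch below assumes three independently acting pathways with hypothetical probabilities of producing D, and compares the joint risk with the risk that additivity of risk differences would predict. Function and variable names are illustrative only.

```python
def risk_independent_action(p0, p_a, p_b, a, b):
    """Risk of D when background, A, and B act as independent pathways.

    p0, p_a, p_b: probability that each pathway alone would produce D.
    a, b:         1 if the exposure is present, 0 if absent.
    """
    survive = (1 - p0) * (1 - p_a) ** a * (1 - p_b) ** b
    return 1 - survive


def additivity_gap(p0, p_a, p_b):
    """Joint risk minus the joint risk predicted by additive risk differences."""
    r00 = risk_independent_action(p0, p_a, p_b, 0, 0)
    r10 = risk_independent_action(p0, p_a, p_b, 1, 0)
    r01 = risk_independent_action(p0, p_a, p_b, 0, 1)
    r11 = risk_independent_action(p0, p_a, p_b, 1, 1)
    return r11 - (r10 + r01 - r00)


rare_gap = additivity_gap(0.001, 0.002, 0.003)    # rare outcome
common_gap = additivity_gap(0.10, 0.30, 0.40)     # common outcome
print(rare_gap, common_gap)
```

Under these assumptions the gap works out algebraically to minus (1 - p0) times p_a times p_b: the probability that both hunters hit a duck that lightning has spared. It is negligible when the pathway probabilities are small and substantial when they are not.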
When we study more than one individual, variation across individuals in susceptibility (certain ducks may be easier for the independently shooting hunters to see) can cause violations of probabilistic independence, even if stochastic independence of causes holds for each individual at risk. Thus even this (to me) appealing formulation does not necessarily permit a biologically meaningful inference.⁷ Moreover, one must also think differently and in a more complicated way about protective factors, such as vaccines. Nonetheless, although the field may have grown weary of this debate, issues of scale will continue to matter when we try to draw useful inference from models for joint effects.
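The effect of heterogeneous susceptibility can be seen in a small example. The sketch below assumes a population made up of two hypothetical types of individual, within each of which the three pathways act independently, and computes the interaction contrast on the log-complement scale at the population level. The weights and probabilities are invented for the illustration.

```python
import math


def population_survival(types, a, b):
    """Population probability of remaining free of D under exposures (a, b).

    Each type is a tuple (weight, p0, p_a, p_b); within a type the
    background, A, and B pathways are assumed to act independently.
    """
    return sum(
        w * (1 - p0) * (1 - p_a) ** a * (1 - p_b) ** b
        for w, p0, p_a, p_b in types
    )


def log_complement_interaction(types):
    """Departure from additivity on the log-complement scale (0 if additive)."""
    s = {(a, b): population_survival(types, a, b) for a in (0, 1) for b in (0, 1)}
    return (
        math.log(s[(1, 1)])
        - math.log(s[(1, 0)])
        - math.log(s[(0, 1)])
        + math.log(s[(0, 0)])
    )


# One type only: every duck is equally easy to hit.
homogeneous = [(1.0, 0.05, 0.3, 0.3)]
# Two types: half the ducks are easy for both hunters to see, half are hard.
mixed = [(0.5, 0.05, 0.5, 0.5), (0.5, 0.05, 0.1, 0.1)]

print(log_complement_interaction(homogeneous))
print(log_complement_interaction(mixed))
```

In the homogeneous population the contrast is zero up to rounding. In the mixed population it is clearly nonzero, even though independence holds within every individual type.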
DAGs have met with a mixed reaction in epidemiology, with some of my colleagues recognizing their importance for analysis of etiologic factors, and others preferring to think about confounding in more classic terms. I have long been one of the boosters.
One practical challenge with DAGs in my experience, however, is the contentiousness of choosing a DAG. These choices can be especially problematic in reproductive epidemiology, where time-related factors become important. For example: Is long interpregnancy interval a cause of change of partner, or is change of partner a cause of long interpregnancy interval? But the hard thinking and very careful consideration of the etiologic context that are needed to decide which DAG is epidemiologically most plausible can be extremely useful, as we try to break through our academically enforced reluctance to think directly about causes.
A different, longstanding problem with DAGs has been that when synergistic effects may be involved in the etiology of the disease, DAGs are annoyingly noncommittal. I'm afraid the paper by VanderWeele and Robins,³ although providing useful categorizations for effect modifiers, has not rescued DAGs from this limitation. There may be more future in rethinking the basics of how we draw the DAGs.
I thank Richard MacLehose and Anne Marie Jukic for their critical reading of the manuscript.