The article in this issue on “Attributing effects to interactions”1 is on the assessment of interactions, a topic that has attracted considerable interest in epidemiology. It is technically accomplished, well written, and easy to read, at least for a statistically minded reader.
Although the article can be understood by many epidemiologists, it may be hard for the general readership of EPIDEMIOLOGY to assess its importance. I will therefore provide some critical comments on the contribution of this particular article and offer some general remarks on the publication practice of epidemiology journals.
The basic statistical model considered is a linear risk model for 2 exposures with an interaction. When Y is a binary outcome, there are 2 binary exposures G and E, and there is no confounding, the model can be written as

$$P(Y=1 \mid G, E) = \alpha_0 + \alpha_G G + \alpha_E E + \alpha_{GE}\, G E, \qquad (1)$$
where P(Y= 1|G, E) is the probability or risk that Y = 1 for given values of the exposures G and E, and the parameter αGE is called the most fundamental epidemiologic measure of interaction by Rothman.2 The main contribution of the article1 is to propose a new measure based on model (1):
The proportion of the total effect of an exposure G or E attributable to the interaction between G and E.
Before identifying some problematic features, I will give a brief derivation of what I see as the main results. I refer to the appendix of the article1 for an extensive technical derivation that also uses counterfactuals.
BRIEF DERIVATION OF MAIN RESULTS
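I begin by deriving the total effect of exposure E in model (1); analogous results would be obtained for the other exposure G. Taking the population average or “marginalizing” P(Y=1|G, E) in (1) over the conditional distribution of G given E, we obtain the conditional probability or risk P(Y=1|E) where we condition only on E:

$$P(Y=1 \mid E) = \mathrm{E}_{G \mid E}[P(Y=1 \mid G, E)] = \alpha_0 + \alpha_G P(G=1 \mid E) + \alpha_E E + \alpha_{GE} P(G=1 \mid E)\, E.$$

Here, $\mathrm{E}_{G \mid E}(\cdot)$ takes the (conditional) population average over the distribution of G for given E, and $\mathrm{E}_{G \mid E}(G) = P(G=1 \mid E)$ since G is binary. Taking the difference between this risk when exposure E is “on” (E = 1) and “off” (E = 0), the total effect of E becomes

$$P(Y=1 \mid E=1) - P(Y=1 \mid E=0) = \alpha_G [P(G=1 \mid E=1) - P(G=1 \mid E=0)] + \alpha_E + \alpha_{GE} P(G=1 \mid E=1),$$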
where αGE P(G=1|E=1) is the component of the total effect attributable to the interaction (because it is the only component involving the interaction parameter αGE).
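The proportion of the total effect of exposure E that is attributable to interaction (called pAIG=0 (E) in the original article1) is then obtained by dividing the component of the total effect attributable to the interaction by the total effect:

$$\mathrm{pAI}_{G=0}(E) = \frac{\alpha_{GE} P(G=1 \mid E=1)}{\alpha_G [P(G=1 \mid E=1) - P(G=1 \mid E=0)] + \alpha_E + \alpha_{GE} P(G=1 \mid E=1)}. \qquad (2)$$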
If G and E are statistically independent, as is assumed in the first part of the article, then P(G=1|E) = P(G=1). The numerator of (2) simplifies to αGE P(G=1), the total effect in the denominator simplifies to αE + αGE P(G=1), and it follows that

$$\mathrm{pAI}_{G=0}(E) = \frac{\alpha_{GE} P(G=1)}{\alpha_E + \alpha_{GE} P(G=1)}. \qquad (3)$$
From model (1) we see that if G were fixed at 0, the total effect of E would be

$$P(Y=1 \mid G=0, E=1) - P(Y=1 \mid G=0, E=0) = \alpha_E.$$
Hence, if the exposures were independent, αE/[αE + αGE P(G=1)] would be the proportion of the total effect of E that remains after setting G = 0. Because pAIG = 0(E)=1 − αE/[αE + αGE P(G = 1)], VanderWeele and Tchetgen Tchetgen1 argue that pAIG=0 (E) can be interpreted as the proportion of the total effect of E that could be eliminated if G were fixed at 0. In this sense, the measure “may help determine the extent to which an intervention on a potential effect modifier would successfully alter the effect of the exposure of interest.”
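The arithmetic behind these proportions can be checked with a small numerical sketch. The Python functions below are illustrative helpers, not code from the original article, and the parameter values are hypothetical. The general form follows the ratio of the interaction component αGE P(G=1|E=1) to the total effect of E; the special case assumes independent exposures.

```python
def pai_general(a_g, a_e, a_ge, p_g1_e1, p_g1_e0):
    """Interaction component of the total effect of E divided by the total effect.

    p_g1_e1 and p_g1_e0 stand for P(G=1|E=1) and P(G=1|E=0).
    """
    interaction_part = a_ge * p_g1_e1
    total_effect = a_g * (p_g1_e1 - p_g1_e0) + a_e + interaction_part
    return interaction_part / total_effect


def pai_independent(a_e, a_ge, p_g1):
    """Special case with independent exposures, so that P(G=1|E) = P(G=1)."""
    return a_ge * p_g1 / (a_e + a_ge * p_g1)


# Hypothetical parameter values, chosen only for illustration.
a_g, a_e, a_ge = 0.05, 0.10, 0.15
p_g1 = 0.40

pai = pai_independent(a_e, a_ge, p_g1)
# Share of the total effect of E that remains after setting G = 0.
remaining = a_e / (a_e + a_ge * p_g1)

print(round(pai, 3), round(remaining, 3))  # 0.375 0.625
```

With these made-up values, 37.5% of the total effect of E would be attributed to the interaction and 62.5% would remain if G were fixed at 0. Passing the same prevalence for both conditional probabilities to `pai_general` reproduces the independent-exposure value.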
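Adjustment for confounders C can be performed by including them as covariates in the linear risk model,

$$P(Y=1 \mid G, E, C) = \alpha_0 + \alpha_G G + \alpha_E E + \alpha_{GE}\, G E + \alpha_C^{\prime} C, \qquad (4)$$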
where C is a vector of covariates with corresponding coefficients αC. The measures pAIG=0 (E) can then be based on the parameters of this extended model.
The above expressions can be used in cohort studies, where the parameters of the linear risk model can be estimated, but cannot be used in case-control studies unless auxiliary information such as sampling fractions is available. However, in the spirit of Rothman’s seminal work,2 it is easy to express pAIG=0 (E) in a form amenable to case-control studies. Still assuming that there are 2 independent exposures, we can simply divide the numerator and denominator of (3) by α0 to obtain

$$\mathrm{pAI}_{G=0}(E) = \frac{\mathrm{RERI}\; P(G=1)}{\mathrm{RR}_{01} - 1 + \mathrm{RERI}\; P(G=1)}. \qquad (5)$$
Here, I have used the facts that αGE / α0 is the relative excess risk due to interaction (RERI) and that αE / α0 equals RR01 − 1, where the risk ratio is defined as RR01 = P(Y=1|G=0,E=1)/P(Y=1|G=0,E=0). In line with Rothman’s ideas, VanderWeele and Tchetgen Tchetgen1 point out that the logistic regression model

$$\mathrm{logit}\, P(Y=1 \mid G, E) = \gamma_0 + \gamma_G G + \gamma_E E + \gamma_{GE}\, G E$$
can be used to estimate pAIG=0 (E) in (5) from case-control studies if the outcome is rare or incidence density sampling is used. This is because in this case RERI ≈ exp(γG + γE + γGE) − exp(γG)−exp(γE)+1 and RR01 ≈ exp(γE).
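As a rough sketch of this plug-in calculation, the Python snippet below computes the approximate RERI and RR01 from logistic regression coefficients and combines them as RERI·P(G=1)/[RR01 − 1 + RERI·P(G=1)], the form obtained by dividing (3) by α0. The function name, the coefficient values, and the prevalence are hypothetical; independent exposures and a rare outcome (or incidence density sampling) are assumed.

```python
import math


def pai_from_logistic(gamma_g, gamma_e, gamma_ge, p_g1):
    """Plug-in pAI from logistic coefficients, assuming a rare outcome
    (or incidence density sampling) and independent exposures."""
    rr01 = math.exp(gamma_e)  # approximate risk ratio for E when G = 0
    reri = (math.exp(gamma_g + gamma_e + gamma_ge)
            - math.exp(gamma_g) - math.exp(gamma_e) + 1.0)
    return reri * p_g1 / (rr01 - 1.0 + reri * p_g1)


# Hypothetical coefficients corresponding to approximate risk ratios
# RR10 = 1.5, RR01 = 2 and RR11 = 4, so that RERI = 4 - 1.5 - 2 + 1 = 1.5.
gamma_g = math.log(1.5)
gamma_e = math.log(2.0)
gamma_ge = math.log(4.0 / (1.5 * 2.0))

pai = pai_from_logistic(gamma_g, gamma_e, gamma_ge, p_g1=0.40)
print(round(pai, 3))  # 0.375
```

In a case-control study the prevalence P(G=1) is not estimated by the logistic model itself and would have to come from elsewhere, for example from the controls when the outcome is rare.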
SOME PROBLEMS
Having summarized the major results of the article by VanderWeele and Tchetgen Tchetgen,1 I now turn to some problematic features.
Uniqueness and Misspecification Problems
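In practice, confounding is of course expected to be present. For case-control studies, these authors follow Rothman2 and adjust for confounders C by simply including them as covariates in a logistic regression model,

$$\mathrm{logit}\, P(Y=1 \mid G, E, C) = \gamma_0 + \gamma_G G + \gamma_E E + \gamma_{GE}\, G E + \gamma_C^{\prime} C. \qquad (6)$$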
Estimates of the parameters γG, γE, and γGE from this extended model are then plugged into the approximations for RERI and RR01 given above.
Unfortunately, this produces the fundamental problems that I uncovered3 for Rothman’s logistic regression approach when covariates are included to handle confounding. In brief, the proposed approach leads to a uniqueness problem (where pAIG=0 (E) takes on a different value for each combination of the covariate values of C because RERI depends on C, whereas the interaction parameter αGE is invariant) and a misspecification problem (where the estimated logistic regression model (6) that includes C is not equivalent to the assumed linear risk model (4) that includes C).
These authors fail to mention that pAIG=0 (E) suffers from both the problems outlined above. This is peculiar because they argue elsewhere in the article that an advantage of their measures for attributing joint effects (which I have not discussed here) is that they do not suffer from the uniqueness problem (and even cite my article in that context).
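One ingredient of the uniqueness problem, namely that RERI depends on C under the linear risk model with covariates while αGE does not, can be seen in a small numerical sketch. It assumes a single binary covariate and hypothetical parameter values. Under that model the baseline risk at covariate value c is α0 + αC c, so RERI at that value is αGE / (α0 + αC c). The snippet illustrates only this dependence of RERI on C.

```python
def reri_linear(a0, a_ge, a_c, c):
    """RERI at covariate value c under a linear risk model with one covariate."""
    baseline_risk = a0 + a_c * c  # risk for G = 0, E = 0 at covariate value c
    return a_ge / baseline_risk


# Hypothetical parameter values, chosen only for illustration.
a0, a_ge, a_c = 0.02, 0.03, 0.02

reri_c0 = reri_linear(a0, a_ge, a_c, c=0)
reri_c1 = reri_linear(a0, a_ge, a_c, c=1)

# The additive interaction parameter is the same at both covariate values,
# but RERI is not.
print(a_ge, round(reri_c0, 2), round(reri_c1, 2))  # 0.03 1.5 0.75
```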
Understated Uncertainty
VanderWeele and Tchetgen Tchetgen1 propose to use the delta method to obtain standard errors for the estimated pAIG=0 (E) and provide SAS and Stata code to implement this approach. This is laudable in principle, but a fundamental problem with the actual implementation is that uncertainty regarding the estimated prevalence P(G=1) is ignored. The estimated standard errors are therefore likely to be biased downward.
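The direction of this bias can be illustrated with a minimal delta-method sketch for the independent-exposure form pAI = αGE P(G=1) / [αE + αGE P(G=1)]. This is not the authors' SAS or Stata code. All estimates and (co)variances below are hypothetical, and the prevalence estimate is assumed to be uncorrelated with the estimates of αE and αGE, which is a simplifying assumption.

```python
import math


def pai(a_e, a_ge, p):
    """pAI under independent exposures: a_ge * p / (a_e + a_ge * p)."""
    return a_ge * p / (a_e + a_ge * p)


def pai_gradient(a_e, a_ge, p):
    """Partial derivatives of pai with respect to (a_e, a_ge, p)."""
    d = a_e + a_ge * p
    return (-a_ge * p / d**2, a_e * p / d**2, a_e * a_ge / d**2)


def delta_se(a_e, a_ge, p, var_e, var_ge, cov_e_ge, var_p=0.0):
    """Delta-method standard error; var_p = 0 treats the prevalence as known.

    Assumes the prevalence estimate is uncorrelated with the alpha estimates.
    """
    g_e, g_ge, g_p = pai_gradient(a_e, a_ge, p)
    variance = (g_e**2 * var_e + g_ge**2 * var_ge
                + 2.0 * g_e * g_ge * cov_e_ge
                + g_p**2 * var_p)
    return math.sqrt(variance)


# Hypothetical estimates and (co)variances, for illustration only.
a_e, a_ge, p = 0.10, 0.15, 0.40
var_e, var_ge, cov_e_ge = 2e-4, 6e-4, -2e-4
var_p = p * (1 - p) / 1000  # binomial variance of a prevalence estimated from n = 1000

se_fixed_p = delta_se(a_e, a_ge, p, var_e, var_ge, cov_e_ge)
se_with_p = delta_se(a_e, a_ge, p, var_e, var_ge, cov_e_ge, var_p)
print(se_fixed_p, se_with_p)
```

Under the stated assumption, the term for the prevalence is nonnegative, so the standard error that treats P(G=1) as known can never exceed the one that accounts for its estimation. How large the gap is in practice depends on the sample used to estimate the prevalence.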
No Quality Control?
The previous problem brings me to another problem with the article1 (and many other methods articles in epidemiology journals), namely that the actual performance of the proposed methods does not appear to have been investigated in simulation studies. This is a major problem because approximations that look fine in theory may not work well in practice. Extensive quality controls are vital before exposing epidemiologists to readily available statistical tools.
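A minimal sketch of the kind of simulation check meant here is given below. It is an illustration under strong assumptions, not an evaluation of the authors' code: a cohort design, independent binary exposures, no confounding, and a saturated linear risk model estimated from cell proportions. It examines only the plug-in point estimate of pAIG=0 (E), and all parameter values, sample sizes, and names are hypothetical.

```python
import random


def simulate_pai(n, alphas, p_g, p_e, rng):
    """Simulate one cohort of size n and return the plug-in estimate of pAI."""
    a0, a_g, a_e, a_ge = alphas
    # counts[g][e] = [number of subjects, number of events]
    counts = [[[0, 0], [0, 0]], [[0, 0], [0, 0]]]
    n_g1 = 0
    for _ in range(n):
        g = 1 if rng.random() < p_g else 0
        e = 1 if rng.random() < p_e else 0  # drawn independently of g
        risk = a0 + a_g * g + a_e * e + a_ge * g * e
        y = 1 if rng.random() < risk else 0
        counts[g][e][0] += 1
        counts[g][e][1] += y
        n_g1 += g
    # Cell-specific risks identify the parameters of the saturated model.
    risk_hat = [[counts[g][e][1] / counts[g][e][0] for e in (0, 1)]
                for g in (0, 1)]
    a_e_hat = risk_hat[0][1] - risk_hat[0][0]
    a_ge_hat = (risk_hat[1][1] - risk_hat[1][0]
                - risk_hat[0][1] + risk_hat[0][0])
    p_hat = n_g1 / n
    return a_ge_hat * p_hat / (a_e_hat + a_ge_hat * p_hat)


rng = random.Random(2012)  # fixed seed so the run is reproducible
true_alphas = (0.10, 0.10, 0.20, 0.30)  # hypothetical (a0, aG, aE, aGE)
p_g, p_e = 0.4, 0.5
true_pai = true_alphas[3] * p_g / (true_alphas[2] + true_alphas[3] * p_g)

estimates = [simulate_pai(4000, true_alphas, p_g, p_e, rng)
             for _ in range(200)]
mean_est = sum(estimates) / len(estimates)
sd_est = (sum((x - mean_est) ** 2 for x in estimates)
          / (len(estimates) - 1)) ** 0.5

print(true_pai, mean_est, sd_est)
```

A fuller study would compare the empirical standard deviation with the average delta-method standard error, check confidence interval coverage, and vary the sample size, outcome rarity, and dependence between the exposures.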
Independent Exposures?
The authors assume in the first part of the article1 that the 2 exposures considered are statistically independent, which implies that they are uncorrelated. This is usually unrealistic in epidemiology, as they indicate in the following understatement: “When exposures are 2 environmental factors, or 2 behavioral factors, the 2 exposures may often be correlated with each other.” An exception is genetic epidemiology, where genes may be independent of environmental factors, but even this assumption may be questionable, as they acknowledge. However, a saving grace is that falsely assuming independent exposures can produce a lower bound for the measure assuming causally ordered exposures, although under extra assumptions.
Interaction or Mediation?
The authors relax the assumption of independent exposures in the second part of their article1 by assuming that one of the exposures causes the other exposure, for instance that G affects E. This seems to open up a can of worms. In particular, how is the proposed notion of interaction with causally ordered exposures related to mediation? Not much is said about this, apart from briefly contrasting the proposed decompositions to the decompositions in the mediation analysis literature (this is useful, but for some reason buried in an appendix). The authors’ position is that “when G affects the second exposure E, the questions concerning mediation may be the more relevant question of interest.”
Correct Causal Model?
VanderWeele and Tchetgen Tchetgen1 interpret pAIG=0 (E) as the proportion of the total effect of E that could be eliminated if G were fixed at 0. This is an attractive interpretation but seems to require that the linear risk model (4) is a correctly specified causal model, which is a tall order for observational data.
PUBLICATION PRACTICE OF EPIDEMIOLOGY JOURNALS
It is a shame, and unfortunate for these authors, that the reviewers of their article1 apparently failed to discover the problems identified above. This may illustrate what appears to be a general problem for epidemiology journals, namely finding competent methodologists who are willing to review articles in disciplines they do not publish in themselves. Finding suitably independent reviewers from the epidemiologic community may also be a challenge.
Another general problem with epidemiology journals is that it is possible to publish a number of articles on small variations of a methodological topic, each article adding little value. Admittedly, this is also a problem in other fields, but it seems particularly acute in epidemiology. An even more depressing feature of epidemiology journals is that they sometimes publish methods articles that are confused or merely recycle existing results. The article by VanderWeele and Tchetgen Tchetgen1 certainly does not belong to this category.
There has lately been considerable overlapping interest in the prospects for causal inference in disciplines relying on observational studies, such as epidemiology and economics, and even some cross-fertilizing research. In contrast, interest in the conceptualization of interactions is confined to epidemiology. The article being discussed here1 is sober and balanced, but much of the epidemiologic literature on interactions has a quasi-religious fervor. Even the rhetorical trick of framing has been used, where interaction based on multiplicative risk models has been dubbed “statistical interaction,” whereas the advocated conceptualization based on additive risk models has been given the considerably more attractive name “biologic interaction.” Seen from the outside, epidemiologists appear naive and simplistic when they measure biologic interaction by the coefficient αGE in a linear risk model or some measure derived from it.
I am not convinced that yet another theoretical article on interaction is moving epidemiology forward. There are surely more pressing methodological challenges to address in epidemiology, such as unobserved confounding, measurement error, missing data, and their combinations. Maybe it is time to move on.
ABOUT THE AUTHOR
ANDERS SKRONDAL is a senior scientist at the Division of Epidemiology, Norwegian Institute of Public Health. He was previously Professor of Statistics and Director of The Methodology Institute at the London School of Economics. His research interests include topics in statistics, biostatistics, social statistics, econometrics, and psychometrics.
REFERENCES