# Sufficient Cause Interactions and Statistical Interactions

When the outcome and all exposures of interest are binary, it is sometimes possible to draw conclusions from empirical data about mechanistic interactions in the sufficient cause sense. Empirical conditions are given for sufficient cause interactions, and these conditions are compared and contrasted with interaction coefficients in linear, log-linear, and logistic regression models. Conditions that suffice to allow statistical interactions to be interpreted as sufficient cause interactions are derived. The implications of including confounding variables in the model are also discussed.

From the Department of Health Studies, University of Chicago, Chicago, Illinois.

Submitted April 3, 2008; accepted September 26, 2008.

Correspondence: Department of Health Studies, University of Chicago, 5841 S. Maryland Ave., MC 2007, Chicago, IL 60637. E-mail: vanderweele@uchicago.edu.

It is often pointed out that many different biologic pathways or mechanisms can be consistent with the results of a statistical model^{1,2}; in such cases it is not possible to draw conclusions about causal biologic mechanisms through statistical analysis. In certain simple settings, however, such as when the outcome and all exposures of interest are binary, it is possible to draw conclusions about mechanisms from empirical analyses provided there are no unmeasured confounding variables.^{3,4} In particular, VanderWeele and Robins^{4,5} recently derived empirical tests for synergism in the sufficient cause sense of Rothman.^{6} This form of synergism essentially implies joint presence of 2 causes in the same causal mechanism or sufficient cause. In this article we relate the empirical conditions of VanderWeele and Robins^{4,5} to interaction terms arising in standard statistical models. Linear, log-linear, and logistic models are all considered. In each case we use the empirical conditions of VanderWeele and Robins^{4,5} to derive conditions on model coefficients that suffice to conclude the presence of a sufficient cause interaction and we provide additional conditions under which the interactions in statistical models can be interpreted as the presence of a sufficient cause interaction. The remainder of the paper is organized as follows. We first summarize the sufficient component cause framework as conceptualized by Rothman^{6} and formalized by VanderWeele and Robins^{4,5} and give the empirical conditions of VanderWeele and Robins^{4,5} that suffice to conclude the presence of a sufficient cause interaction. We then relate sufficient cause interactions first to linear statistical models, and then in the following section to interaction terms in log-linear and logistic models; 2-way sufficient cause interactions are discussed explicitly in the text whereas extensions to 3-way interactions are given in the Appendix. 
We next consider the implications of the presence of confounding variables in statistical models for inference about sufficient cause interactions and we close with some general discussion.

### Sufficient Causes and Sufficient Cause Interactions

Rothman^{6} conceptualized causation as a collection of different causal mechanisms, each sufficient to bring about the outcome. These causal mechanisms Rothman called “sufficient causes” and conceived of them as minimal sets of actions, events, or states of nature that together initiated a process that inevitably resulted in the outcome. For a particular outcome there would likely be many different sufficient causes, that is, many different causal mechanisms by which the outcome could come about. Each sufficient cause involved various component causes. Whenever all components of a particular sufficient cause were present, the outcome would inevitably occur; within every sufficient cause, each component would be necessary for that sufficient cause to lead to the outcome. If 2 distinct causes are both components of the same sufficient cause, then the causes participate together in the same causal mechanism and synergism is said to be present. Often there will be several primary causes of interest and other background causes will be necessary to complete the sufficient causes. We use *A*_{i} to denote the background causes for the *i*th sufficient cause. Consider, for example, the case of 2 binary causes *X*_{1} and *X*_{2} for some outcome *D*. Each sufficient cause may involve background causes as well as either or both of *X*_{1} and *X*_{2} or the complements of *X*_{1} and *X*_{2}, which we will denote by $\overline{X}_{1}$ and $\overline{X}_{2}$. In the case of 2 binary causes, Greenland and Poole^{7} thus enumerate 9 different sufficient causes: $A_{1}$, $A_{2}X_{1}$, $A_{3}\overline{X}_{1}$, $A_{4}X_{2}$, $A_{5}\overline{X}_{2}$, $A_{6}X_{1}X_{2}$, $A_{7}\overline{X}_{1}X_{2}$, $A_{8}X_{1}\overline{X}_{2}$, and $A_{9}\overline{X}_{1}\overline{X}_{2}$.
For a particular outcome *D*, only some of these sufficient causes might be present; for example, if the presence of *X*_{1} or *X*_{2} can never prevent the outcome, then none of the sufficient causes with $\overline{X}_{1}$ or $\overline{X}_{2}$ will be present, that is, none of $A_{3}\overline{X}_{1}$, $A_{5}\overline{X}_{2}$, $A_{7}\overline{X}_{1}X_{2}$, $A_{8}X_{1}\overline{X}_{2}$, and $A_{9}\overline{X}_{1}\overline{X}_{2}$ will be present and the only possible sufficient causes will be $A_{1}$, $A_{2}X_{1}$, $A_{4}X_{2}$, and $A_{6}X_{1}X_{2}$. When none of the causes of interest *X*_{1} and *X*_{2} can ever prevent the outcome, we will say that the effects of *X*_{1} and *X*_{2} on *D* are monotonic. If we let $D_{x_{1}x_{2}}$ denote the counterfactual value of *D* after intervening to set *X*_{1} = *x*_{1} and *X*_{2} = *x*_{2}, then the effects of *X*_{1} and *X*_{2} on *D* are monotonic if $D_{x_{1}x_{2}}$ is nondecreasing in *x*_{1} and *x*_{2}. The equivalence of the definitions of monotonicity based on counterfactuals and on sufficient causes is discussed elsewhere.^{8}

If, for the *i*th sufficient cause, no background causes are necessary, then *A*_{i} = 1. If for example the outcome *D* always occurred whenever *X*_{1} = 1 and *X*_{2} = 1, then the 6th sufficient cause in the list would be $X_{1}X_{2}$ rather than $A_{6}X_{1}X_{2}$. Now instead suppose that for all individuals *D* = 1 if and only if either *X*_{1} = 1 or *X*_{2} = 1. Greenland and Brumback^{9} note that several different sets of sufficient causes could represent this response pattern. For example, if there were 3 sufficient causes, $X_{1}X_{2}$, $\overline{X}_{1}X_{2}$, and $X_{1}\overline{X}_{2}$, this would replicate the response pattern. However, the 2 sufficient causes, *X*_{1} and *X*_{2}, would also replicate the response pattern. VanderWeele and Robins^{5} formally defined a sufficient cause interaction to be present between *X*_{1} and *X*_{2} (or more generally between *X*_{1},…,*X*_{k}) if for every set of sufficient causes that replicates the response patterns there is a sufficient cause in which *X*_{1} and *X*_{2} are both present (or more generally in which *X*_{1},…,*X*_{k} are all present). Thus if a sufficient cause interaction between *X*_{1} and *X*_{2} is present, then there must be some mechanism, represented by that sufficient cause, that requires the presence of both *X*_{1} and *X*_{2} to operate.

VanderWeele and Robins^{4,5} furthermore derived empirical conditions that were sufficient to conclude that a sufficient cause interaction was present. Let $p_{x_{1}x_{2}}$ = *E*(*D*|*X*_{1} = *x*_{1}, *X*_{2} = *x*_{2}). It was shown that for a binary outcome *D* and 2 binary exposures *X*_{1} and *X*_{2}, if the effects of *X*_{1} and *X*_{2} on *D* are unconfounded, then if

$$p_{11} - p_{10} - p_{01} > 0 \tag{1}$$

then a sufficient cause interaction must be present between *X*_{1} and *X*_{2}. It was further shown that if the effects of *X*_{1} and *X*_{2} on *D* are monotonic (ie, if neither *X*_{1} nor *X*_{2} can ever prevent the outcome), then if

$$p_{11} - p_{10} - p_{01} + p_{00} > 0 \tag{2}$$

then a sufficient cause interaction must be present between *X*_{1} and *X*_{2}. If we let

$$RR_{x_{1}x_{2}} = \frac{p_{x_{1}x_{2}}}{p_{00}}$$

denote the relative risk that *D* = 1 when *X*_{1} = *x*_{1} and *X*_{2} = *x*_{2}, then provided *p*_{00} > 0, condition 1 can be rewritten as *RR*_{11} − *RR*_{10} − *RR*_{01} > 0 and condition 2 can be rewritten as *RR*_{11} − *RR*_{10} − *RR*_{01} + 1 > 0. Condition 2 is simply a condition for “superadditivity” or positive interaction on the risk difference scale; however, it is only applicable for sufficient cause interactions when the effects of both *X*_{1} and *X*_{2} on *D* are monotonic. Condition 1 is a stronger condition and applies without making assumptions about monotonicity. The intuition for condition 1 is that, if it holds, then there must be some individuals for whom *D*_{11} = 1 but *D*_{10} = *D*_{01} = 0 and, if this is the case, then there must be a sufficient cause with both *X*_{1} and *X*_{2} present.^{5}
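The two empirical conditions, *p*_{11} − *p*_{10} − *p*_{01} > 0 and *p*_{11} − *p*_{10} − *p*_{01} + *p*_{00} > 0, can be checked directly from the four risks. The following is a minimal sketch, assuming the risks are known without sampling error; the function name and the numerical risks are hypothetical and chosen only for illustration.

```python
def sufficient_cause_conditions(p11, p10, p01, p00):
    """Evaluate the two empirical conditions for a 2-way sufficient cause interaction.

    Returns (condition_1, condition_2):
      condition_1: p11 - p10 - p01 > 0        (no monotonicity assumption needed)
      condition_2: p11 - p10 - p01 + p00 > 0  (informative only under monotonic effects)
    """
    contrast = p11 - p10 - p01
    return contrast > 0, contrast + p00 > 0


# Hypothetical risks, for illustration only.
strong = sufficient_cause_conditions(p11=0.30, p10=0.03, p01=0.10, p00=0.01)
weak = sufficient_cause_conditions(p11=0.12, p10=0.05, p01=0.08, p00=0.02)

print(strong)  # both conditions hold
print(weak)    # only the monotonicity-based condition holds
```

With estimated risks, inference would rest on a confidence interval for the contrast rather than on the sign of its point estimate.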

Extensions to 3-way sufficient cause interactions were also given by VanderWeele and Robins.^{5} For 3 binary exposures *X*_{1}, *X*_{2}, and *X*_{3}, let $p_{x_{1}x_{2}x_{3}}$ = *E*(*D*|*X*_{1} = *x*_{1}, *X*_{2} = *x*_{2}, *X*_{3} = *x*_{3}). Suppose that the effects of *X*_{1}, *X*_{2}, and *X*_{3} on *D* are unconfounded; then if

$$p_{111} - p_{110} - p_{101} - p_{011} > 0 \tag{3}$$

then a sufficient cause interaction must be present between *X*_{1}, *X*_{2}, and *X*_{3}. Finally, if the effects of *X*_{1}, *X*_{2}, and *X*_{3} on *D* are monotonic, then any of the following 3 conditions implies that a sufficient cause interaction is present between *X*_{1}, *X*_{2}, and *X*_{3}:

$$\begin{aligned}
p_{111} - p_{110} - p_{101} - p_{011} + p_{100} + p_{010} &> 0 \\
p_{111} - p_{110} - p_{101} - p_{011} + p_{100} + p_{001} &> 0 \\
p_{111} - p_{110} - p_{101} - p_{011} + p_{010} + p_{001} &> 0
\end{aligned} \tag{4}$$

If we let

$$RR_{x_{1}x_{2}x_{3}} = \frac{p_{x_{1}x_{2}x_{3}}}{p_{000}}$$

denote the relative risk that *D* = 1 when *X*_{1} = *x*_{1}, *X*_{2} = *x*_{2}, and *X*_{3} = *x*_{3}, then provided *p*_{000} > 0, condition 3 can be rewritten as *RR*_{111} − *RR*_{110} − *RR*_{101} − *RR*_{011} > 0 and similarly the conditions given in 4 can also be rewritten in terms of relative risks. If the effects of {*X*_{1}, *X*_{2}} or {*X*_{1}, *X*_{2}, *X*_{3}} on *D* are unconfounded conditional on some set of covariates *C*, then conditions 1–4 can also be made conditional on *C*. A sufficient cause interaction is then present if the conditions hold in any stratum of the confounding variables *C*. In the context of no confounding factors, the fact that condition 2 was sufficient to conclude the presence of a sufficient cause interaction was stated explicitly and proved by Rothman and Greenland^{3} and it was also anticipated elsewhere.^{10,11} Theory concerning sufficient causes developed in VanderWeele and Robins^{5} was necessary to derive conditions 1, 3, and 4. Note that these are sufficient conditions for a sufficient cause interaction, that is, if they hold then a sufficient cause interaction must be present. They are not, however, necessary conditions; a sufficient cause interaction might be present even if they do not hold. See VanderWeele and Robins^{4,5} for further discussion.

### Sufficient Cause Interaction and Linear Statistical Models

In this section we will relate sufficient cause interactions to interactions arising in linear statistical models. The discussion follows from that given in VanderWeele and Robins.^{5} For simplicity we will assume that the causal effects of {*X*_{1}, *X*_{2}} or alternatively {*X*_{1}, *X*_{2}, *X*_{3}} are unconfounded. Below we will consider also settings in which the effects of the exposures of interest are confounded by some set of measured confounding variables *C*. Consider first the setting of 2 causes of interest, *X*_{1} and *X*_{2}. To model disease with a linear statistical model one could use a saturated Bernoulli regression model:

$$P(D = 1 \mid X_{1} = x_{1}, X_{2} = x_{2}) = \alpha_{0} + \alpha_{1}x_{1} + \alpha_{2}x_{2} + \alpha_{3}x_{1}x_{2} \tag{5}$$

Because *X*_{1} and *X*_{2} are binary and 4 coefficients are in the model, the model will fit the conditional probabilities of the outcomes perfectly and the model is said to be saturated. In this linear model, one would test for a statistical interaction by testing the hypothesis α_{3} = 0. We will consider first the case of monotonic effects. Condition 2 states that if the effects of *X*_{1} and *X*_{2} on *D* are monotonic and *p*_{11} − *p*_{10} − *p*_{01} + *p*_{00} > 0, then there is a sufficient cause interaction between *X*_{1} and *X*_{2}. We may write this condition in terms of the coefficients in the linear statistical model 5 as follows:

$$p_{11} - p_{10} - p_{01} + p_{00} = (\alpha_{0} + \alpha_{1} + \alpha_{2} + \alpha_{3}) - (\alpha_{0} + \alpha_{1}) - (\alpha_{0} + \alpha_{2}) + \alpha_{0} = \alpha_{3} > 0$$

Thus in the case of monotonic effects, if α_{3} > 0 then a sufficient cause interaction is necessarily present between *X*_{1} and *X*_{2}. When the effects of *X*_{1} and *X*_{2} on *D* are not monotonic we need condition 1, that *p*_{11} − *p*_{10} − *p*_{01} > 0, to be able to conclude the presence of a sufficient cause interaction. We may also rewrite this condition in terms of the coefficients in the linear statistical model 5:

$$p_{11} - p_{10} - p_{01} = (\alpha_{0} + \alpha_{1} + \alpha_{2} + \alpha_{3}) - (\alpha_{0} + \alpha_{1}) - (\alpha_{0} + \alpha_{2}) = \alpha_{3} - \alpha_{0} > 0$$

In this case, we need α_{3} > α_{0} to be able to conclude the presence of a sufficient cause interaction. Note that these statements concern the true parameters. In practice, of course, the parameters will have to be estimated from data, and inference concerning the true parameters drawn from the estimates and their confidence intervals.
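Because the linear model with an intercept, two main effects, and a product term is saturated, its coefficients can be read off the four risks. The sketch below assumes the parameterization α_{0} + α_{1}*x*_{1} + α_{2}*x*_{2} + α_{3}*x*_{1}*x*_{2}; the function name and the risks are hypothetical.

```python
import math


def saturated_linear_coefficients(p00, p10, p01, p11):
    """Coefficients of a0 + a1*x1 + a2*x2 + a3*x1*x2 that reproduce the four risks exactly."""
    a0 = p00
    a1 = p10 - p00
    a2 = p01 - p00
    a3 = p11 - p10 - p01 + p00  # interaction contrast on the risk-difference scale
    return a0, a1, a2, a3


# Hypothetical risks, for illustration only.
a0, a1, a2, a3 = saturated_linear_coefficients(p00=0.02, p10=0.05, p01=0.08, p11=0.12)

monotonic_test = a3 > 0   # sufficient only if the effects are monotonic
general_test = a3 > a0    # sufficient without the monotonicity assumption

print(monotonic_test, general_test)
```

In this illustration the interaction coefficient is positive but smaller than the baseline risk, so a sufficient cause interaction could be concluded only if monotonicity were assumed.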

We see then that a test for a statistical interaction only implies a test for a sufficient cause interaction in the case of monotonic effects, not in general, and that even with monotonic effects a statistical interaction implies a sufficient cause interaction only in the case of a positive interaction (α_{3} > 0); a negative interaction (α_{3} < 0) does not suffice. Without the assumption of monotonic effects, a test for a sufficient cause interaction, α_{3} > α_{0}, will only be implied by a test for a statistical interaction, α_{3} > 0, if α_{0} = *p*_{00} = 0, that is, if the baseline risk for the outcome when both exposures *X*_{1} and *X*_{2} are absent is 0. In the Appendix we also relate conditions for 3-way sufficient cause interactions, conditions 3 and 4, to 3-way statistical interactions in linear models. There it is shown that for 3-way sufficient cause interactions neither the condition with monotonicity, condition 4, nor the condition without monotonicity, condition 3, is in general implied by a test for the presence of a 3-way statistical interaction in a linear model for the probability of the outcome.

### Sufficient Cause Interaction and Log-Linear and Logistic Models

In this section we will relate sufficient cause interactions to interaction terms arising in log-linear and logistic models. There is a substantial epidemiologic literature on using log-linear and logistic models in tests for the additivity of effects.^{12–15} Here we consider some of the implications of this literature for testing for sufficient cause interactions. Consider the following saturated log-linear model for the probability of the outcome:

$$\log P(D = 1 \mid X_{1} = x_{1}, X_{2} = x_{2}) = \beta_{0} + \beta_{1}x_{1} + \beta_{2}x_{2} + \beta_{3}x_{1}x_{2} \tag{6}$$

From this model it follows that $p_{11} = e^{\beta_{0}+\beta_{1}+\beta_{2}+\beta_{3}}$, $p_{10} = e^{\beta_{0}+\beta_{1}}$, $p_{01} = e^{\beta_{0}+\beta_{2}}$, and $p_{00} = e^{\beta_{0}}$. Condition 2, which suffices to conclude the presence of a sufficient cause interaction under the monotonicity assumption, can be rewritten in terms of the coefficients in model 6 as follows:

$$e^{\beta_{0}+\beta_{1}+\beta_{2}+\beta_{3}} - e^{\beta_{0}+\beta_{1}} - e^{\beta_{0}+\beta_{2}} + e^{\beta_{0}} > 0$$

which can be rewritten as

$$e^{\beta_{1}+\beta_{2}+\beta_{3}} - e^{\beta_{1}} - e^{\beta_{2}} + 1 > 0 \tag{7}$$

This condition can be used to test for a sufficient cause interaction between *X*_{1} and *X*_{2} by using model 6 provided the effects of *X*_{1} and *X*_{2} on *D* are monotonic. Note that the quantity given in 7 is what Rothman^{12} defined as the relative excess risk due to interaction (*RERI*) and which Rothman and Greenland^{3} call the interaction contrast ratio or *ICR*. Note that the condition *RERI* > 0 implies that there is a sufficient cause interaction only if the effects of *X*_{1} and *X*_{2} on *D* are monotonic.
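The relative excess risk due to interaction can be computed from the log-linear coefficients alone, because the intercept cancels. The following is a minimal sketch with hypothetical coefficient values; the function name is not from the source.

```python
import math


def reri_from_loglinear(beta1, beta2, beta3):
    """RERI = RR11 - RR10 - RR01 + 1, with relative risks taken from log-linear coefficients."""
    rr10 = math.exp(beta1)
    rr01 = math.exp(beta2)
    rr11 = math.exp(beta1 + beta2 + beta3)
    return rr11 - rr10 - rr01 + 1.0


# Hypothetical coefficients: RR10 = 1.5, RR01 = 2.0, multiplicative interaction factor 1.2.
positive = reri_from_loglinear(math.log(1.5), math.log(2.0), math.log(1.2))
# Same main effects with a sub-multiplicative interaction factor of 0.7.
negative = reri_from_loglinear(math.log(1.5), math.log(2.0), math.log(0.7))

print(round(positive, 3), round(negative, 3))
```

A positive value supports a sufficient cause interaction only when the effects of both exposures are monotonic.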

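We will now characterize the conditions such that a test for a statistical interaction, β_{3} > 0, in the log-linear model 6 corresponds to a test for a sufficient cause interaction. Suppose that β_{1} ≥ 0. We can rewrite condition 7 as

$$e^{\beta_{1}}\left(e^{\beta_{2}+\beta_{3}} - 1\right) > e^{\beta_{2}} - 1 \tag{8}$$

Suppose β_{3} > 0; then because β_{1} ≥ 0 we have that $e^{\beta_{1}}(e^{\beta_{2}+\beta_{3}} - 1) \geq (e^{\beta_{2}+\beta_{3}} - 1) > (e^{\beta_{2}} - 1)$ and thus condition 8 must be satisfied and a sufficient cause interaction between *X*_{1} and *X*_{2} must be present. (The first inequality also uses $e^{\beta_{2}+\beta_{3}} - 1 \geq 0$, which holds here because β_{2} ≥ 0 under monotonicity.) By symmetry the conclusion would also hold if β_{2} ≥ 0 and β_{3} > 0. Note that because

$$\beta_{1} = \log\left(\frac{p_{10}}{p_{00}}\right)$$

and

$$\beta_{2} = \log\left(\frac{p_{01}}{p_{00}}\right)$$

the conditions β_{1} ≥ 0 and β_{2} ≥ 0 are necessarily satisfied, as we have assumed that the effects of *X*_{1} and *X*_{2} on *D* are monotonic. We have thus established the following result.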

Result 1. Suppose the effects of *X*_{1} and *X*_{2} on *D* are monotonic and unconfounded. If in model 6, β_{3} > 0 then there is a sufficient cause interaction between *X*_{1} and *X*_{2}.

Thus, under monotonicity, a test for a statistical interaction in model 6, β_{3} > 0, implies a test for a sufficient cause interaction. It can similarly be verified that if β_{1} > 0 and β_{2} > 0 then β_{3} ≥ 0 implies that there is a sufficient cause interaction between *X*_{1} and *X*_{2} (when β_{3} = 0 the left side of condition 7 reduces to $(e^{\beta_{1}} - 1)(e^{\beta_{2}} - 1)$, which is then positive). Thus if β_{1} > 0 and β_{2} > 0 then a sufficient cause interaction will be present even if β_{3} = 0.^{16} It is thus also clear that a sufficient cause interaction can be present even under a multiplicative model, that is, if β_{3} = 0 so that log(*P*(*D* = 1|*X*_{1} = *x*_{1}, *X*_{2} = *x*_{2})) = β_{0} + β_{1}*x*_{1} + β_{2}*x*_{2} and

$$\frac{p_{11}}{p_{00}} = \frac{p_{10}}{p_{00}} \cdot \frac{p_{01}}{p_{00}}$$

that is, *p*_{11}*p*_{00} = *p*_{10}*p*_{01}. Following VanderWeele and Robins,^{4} suppose that in some study it is found that the relative risk of lung cancer with only an asbestos exposure is 3, the relative risk of lung cancer with only the smoking exposure is 10, and the relative risk of lung cancer with both the asbestos and the smoking exposure is 30; then the risks are multiplicative and thus β_{3} = 0. But from this information alone, provided the effects of asbestos and smoking on lung cancer are monotonic, one can conclude that a sufficient cause interaction between asbestos and smoking must be present because

$$\beta_{1} = \log(3) > 0$$

and similarly

$$\beta_{2} = \log(10) > 0$$

Thus, assuming that our estimates are unconfounded and that the effects of asbestos and smoking on lung cancer are monotonic, we could conclude that there must be a causal mechanism that requires both the exposure to smoking and to asbestos to operate.
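As a worked check of the arithmetic in this example, the quantities can be computed from the three relative risks quoted above. Labelling asbestos as *X*_{1} and smoking as *X*_{2} is an arbitrary choice made here for illustration.

```latex
\begin{aligned}
e^{\beta_{3}} &= \frac{RR_{11}}{RR_{10}\,RR_{01}} = \frac{30}{3 \times 10} = 1
  \quad\Longrightarrow\quad \beta_{3} = 0, \\
RERI &= RR_{11} - RR_{10} - RR_{01} + 1 = 30 - 3 - 10 + 1 = 18 > 0 .
\end{aligned}
```

So the interaction term on the multiplicative scale vanishes, while the additive contrast that matters for the sufficient cause interaction under monotonicity is clearly positive.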

We now consider the case when it cannot be assumed that the effects of *X*_{1} and *X*_{2} on *D* are monotonic. Condition 1, which suffices to conclude the presence of a sufficient cause interaction without the monotonicity assumption, can be rewritten in terms of the coefficients in model 6 as follows:

$$e^{\beta_{0}+\beta_{1}+\beta_{2}+\beta_{3}} - e^{\beta_{0}+\beta_{1}} - e^{\beta_{0}+\beta_{2}} > 0$$

which can be rewritten as

$$e^{\beta_{1}+\beta_{2}+\beta_{3}} - e^{\beta_{1}} - e^{\beta_{2}} > 0 \tag{9}$$

This condition can be used to test for a sufficient cause interaction between *X*_{1} and *X*_{2} by using model 6 even when the effects of *X*_{1} and *X*_{2} on *D* are not monotonic. Note that condition 9 can be rewritten as *RERI* > 1.

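We will now characterize the conditions such that a test for a statistical interaction, β_{3} > 0, in the log-linear model in 6 corresponds to a test for a sufficient cause interaction. We can rewrite condition 9 as

$$\tfrac{1}{2}e^{\beta_{1}+\beta_{2}+\beta_{3}} - e^{\beta_{2}} + \tfrac{1}{2}e^{\beta_{1}+\beta_{2}+\beta_{3}} - e^{\beta_{1}} > 0$$

or as

$$e^{\beta_{2}}\left(\tfrac{1}{2}e^{\beta_{1}+\beta_{3}} - 1\right) + e^{\beta_{1}}\left(\tfrac{1}{2}e^{\beta_{2}+\beta_{3}} - 1\right) > 0 \tag{10}$$

Clearly if $\tfrac{1}{2}e^{\beta_{1}+\beta_{3}} - 1 > 0$ and $\tfrac{1}{2}e^{\beta_{2}+\beta_{3}} - 1 > 0$ then condition 10 will be satisfied. These 2 conditions we can rewrite as $e^{\beta_{3}} > 2e^{-\beta_{1}}$ and $e^{\beta_{3}} > 2e^{-\beta_{2}}$ or as β_{3} > log(2) − β_{1} and β_{3} > log(2) − β_{2}. We have thus established the following result.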

Result 2. Suppose the effects of *X*_{1} and *X*_{2} on *D* are unconfounded. If in model 6 both β_{3} > log(2) − β_{1} and β_{3} > log(2) − β_{2} then there must be a sufficient cause interaction between *X*_{1} and *X*_{2}.

If β_{1} ≥ log(2) and β_{2} ≥ log(2) then the conditions in result 2 will be satisfied if β_{3} > 0. Let

$$RR_{10} = \frac{p_{10}}{p_{00}}$$

and

$$RR_{01} = \frac{p_{01}}{p_{00}}$$

denote the relative risks when either only exposure *X*_{1} or *X*_{2}, respectively, is present. Note that

$$RR_{10} = \frac{e^{\beta_{0}+\beta_{1}}}{e^{\beta_{0}}} = e^{\beta_{1}}$$

and

$$RR_{01} = \frac{e^{\beta_{0}+\beta_{2}}}{e^{\beta_{0}}} = e^{\beta_{2}}$$

and so the conditions β_{1} ≥ log(2) and β_{2} ≥ log(2) are simply that *RR*_{10} ≥ 2 and *RR*_{01} ≥ 2. Thus if both *RR*_{10} ≥ 2 and *RR*_{01} ≥ 2 then a test for a statistical interaction in model 6, β_{3} > 0, corresponds to a test for a sufficient cause interaction. If either of the conditions *RR*_{10} ≥ 2 and *RR*_{01} ≥ 2 does not hold then the condition for a statistical interaction, β_{3} > 0, does not in general imply the presence of a sufficient cause interaction.

Also, note that by Result 2 if β_{1} ≥ 0 and β_{2} ≥ 0 then β_{3} > log(2) implies the presence of a sufficient cause interaction. Note that Rothman et al^{16} use the results of VanderWeele and Robins^{4,5} to show that if *RR*_{10} > 2 and *RR*_{01} > 2, that is, if the inequalities are strict, then a sufficient cause interaction must be present even if β_{3} = 0. Result 2 generalizes this observation of Rothman et al.^{16} Note also that a sufficient cause interaction might be present even if β_{3} < 0. For example, β_{3} might be negative and yet we might have both β_{3} > log(2) − β_{1} and β_{3} > log(2) − β_{2} if β_{1} and β_{2} are sufficiently large.
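The relation between Result 2 and the underlying condition *e*^{β1+β2+β3} − *e*^{β1} − *e*^{β2} > 0 can be illustrated numerically. The sketch below uses hypothetical coefficients and function names; it shows one case with a negative interaction coefficient that still satisfies Result 2, and one case where Result 2 fails although the underlying condition holds, since Result 2 is sufficient but not necessary.

```python
import math


def result2_holds(beta1, beta2, beta3):
    """Sufficient check from Result 2: beta3 > log(2) - beta1 and beta3 > log(2) - beta2."""
    threshold = math.log(2.0)
    return beta3 > threshold - beta1 and beta3 > threshold - beta2


def condition_without_monotonicity(beta1, beta2, beta3):
    """Direct check of exp(b1 + b2 + b3) - exp(b1) - exp(b2) > 0."""
    return math.exp(beta1 + beta2 + beta3) - math.exp(beta1) - math.exp(beta2) > 0


# Hypothetical case A: RR10 = RR01 = 5 with a negative interaction coefficient.
case_a = (math.log(5.0), math.log(5.0), math.log(0.6))
# Hypothetical case B: RR10 = 10, RR01 = 1.5, no multiplicative interaction.
case_b = (math.log(10.0), math.log(1.5), 0.0)

for name, betas in (("A", case_a), ("B", case_b)):
    print(name, result2_holds(*betas), condition_without_monotonicity(*betas))
```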

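In the smoking and asbestos example, if we were unwilling to assume that the effects of smoking and asbestos on lung cancer were monotonic we could still conclude the presence of a sufficient cause interaction by result 2 because

$$\beta_{3} = 0 > \log(2) - \log(3) \quad \text{and} \quad \beta_{3} = 0 > \log(2) - \log(10)$$

Consider now the case of logistic regression. A saturated logistic model for the probability of the outcome is given by:

$$\operatorname{logit} P(D = 1 \mid X_{1} = x_{1}, X_{2} = x_{2}) = \gamma_{0} + \gamma_{1}x_{1} + \gamma_{2}x_{2} + \gamma_{3}x_{1}x_{2} \tag{11}$$

Conditions 1 and 2 can be respectively written in terms of the coefficients in model 11 as follows:

$$\frac{e^{\gamma_{0}+\gamma_{1}+\gamma_{2}+\gamma_{3}}}{1 + e^{\gamma_{0}+\gamma_{1}+\gamma_{2}+\gamma_{3}}} - \frac{e^{\gamma_{0}+\gamma_{1}}}{1 + e^{\gamma_{0}+\gamma_{1}}} - \frac{e^{\gamma_{0}+\gamma_{2}}}{1 + e^{\gamma_{0}+\gamma_{2}}} > 0 \tag{12}$$

and

$$\frac{e^{\gamma_{0}+\gamma_{1}+\gamma_{2}+\gamma_{3}}}{1 + e^{\gamma_{0}+\gamma_{1}+\gamma_{2}+\gamma_{3}}} - \frac{e^{\gamma_{0}+\gamma_{1}}}{1 + e^{\gamma_{0}+\gamma_{1}}} - \frac{e^{\gamma_{0}+\gamma_{2}}}{1 + e^{\gamma_{0}+\gamma_{2}}} + \frac{e^{\gamma_{0}}}{1 + e^{\gamma_{0}}} > 0 \tag{13}$$

These conditions can be used to test for a sufficient cause interaction between *X*_{1} and *X*_{2} by using the logistic model given in 11. However, there appear to be no simple conditions on the coefficients γ_{0}, γ_{1}, and γ_{2} that imply that whenever γ_{3} > 0 condition 12 or 13 is satisfied. Nevertheless, if the outcome is sufficiently rare so that the odds ratio closely approximates the risk ratio, so that

$$e^{\gamma_{1}} \approx RR_{10}, \quad e^{\gamma_{2}} \approx RR_{01}, \quad e^{\gamma_{1}+\gamma_{2}+\gamma_{3}} \approx RR_{11}$$

then the discussion above will apply also to the coefficients in model 11. That is, if the outcome is sufficiently rare, then

$$e^{\gamma_{1}+\gamma_{2}+\gamma_{3}} - e^{\gamma_{1}} - e^{\gamma_{2}} + 1 > 0$$

implies the presence of a sufficient cause interaction if the effects of *X*_{1} and *X*_{2} on *D* are monotonic, and then a test of γ_{3} > 0 implies a test for a sufficient cause interaction. Also, if the effects of *X*_{1} and *X*_{2} on *D* are not monotonic, then

$$e^{\gamma_{1}+\gamma_{2}+\gamma_{3}} - e^{\gamma_{1}} - e^{\gamma_{2}} > 0$$

implies the presence of a sufficient cause interaction, and if γ_{1} ≥ log(2) and γ_{2} ≥ log(2) then a test of γ_{3} > 0 implies a test for a sufficient cause interaction; if γ_{1} ≥ 0 and γ_{2} ≥ 0 then a test of γ_{3} > log(2) implies a test for a sufficient cause interaction. Provided the outcome is rare, these tests for the coefficients of a logistic regression model could be used to test for sufficient cause interactions by using data arising from case-control studies. In the Appendix we also relate conditions for 3-way sufficient cause interactions, that is, conditions 3 and 4, to 3-way statistical interactions in log-linear and logistic models.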
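For a logistic model the risk contrasts can be evaluated by converting each linear predictor back to a probability. The following is a minimal sketch, assuming a saturated logistic model with intercept γ_{0}, main effects γ_{1} and γ_{2}, and product term γ_{3}; the function names and coefficient values are hypothetical. The second case illustrates that, with a common outcome, a positive γ_{3} by itself guarantees neither contrast.

```python
import math


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_conditions(g0, g1, g2, g3):
    """Return (contrast without p00 > 0, contrast with p00 > 0) from logistic coefficients."""
    p00 = expit(g0)
    p10 = expit(g0 + g1)
    p01 = expit(g0 + g2)
    p11 = expit(g0 + g1 + g2 + g3)
    contrast = p11 - p10 - p01
    return contrast > 0, contrast + p00 > 0


# Hypothetical rare outcome with odds ratios 3 and 10 and no product term.
rare = logistic_conditions(g0=-5.0, g1=math.log(3.0), g2=math.log(10.0), g3=0.0)
# Hypothetical common outcome with a positive product term.
common = logistic_conditions(g0=0.0, g1=2.0, g2=2.0, g3=0.5)

print(rare, common)
```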

### Implications of Confounding Variables

Our discussion thus far has assumed that no confounding variables are present, that is, that the exposures *X*_{1} and *X*_{2} are effectively randomized. As noted above, if the effects of *X*_{1} and *X*_{2} are unconfounded given some set of variables *C*, the conditions derived in VanderWeele and Robins^{4,5} can be made conditional on *C*. For example, let $p_{x_{1}x_{2}c}$ = *E*(*D*|*X*_{1} = *x*_{1}, *X*_{2} = *x*_{2}, *C* = *c*); then condition 1 becomes

$$p_{11c} - p_{10c} - p_{01c} > 0 \tag{14}$$

and condition 2 becomes

$$p_{11c} - p_{10c} - p_{01c} + p_{00c} > 0 \tag{15}$$

If these conditions hold in any stratum *C* = *c*, this suffices to conclude the presence of a sufficient cause interaction. However, if 1 or more of the confounding variables in *C* is continuous, this can raise difficulties in the models we have been considering. The models we have considered thus far have all had only binary variables and have all been saturated models so that they fit the conditional probabilities of the outcomes perfectly. With a continuous covariate this will not be possible, and one will need to specify a model that will impose certain additional distributional assumptions. Tests for sufficient cause interactions will be valid only if the assumption of no unmeasured confounders holds true (at least approximately) and if the model is correctly specified. With saturated models there was no danger of misspecification.

It is also well known that when a Bernoulli model with linear or log-linear link is used with 1 or more continuous covariates *C*, the convergence properties of maximum likelihood estimators are generally poor^{17} and fitted probabilities can lie outside the range of [0, 1]. Even if the parameter estimates do converge and if the fitted probabilities are all between 0 and 1, the tests for sufficient cause interaction in certain cases may be quite sensitive to model specification.

Let *C* denote a set of confounding variables. Suppose that the following linear model for the probability of the outcome is used:

$$P(D = 1 \mid X_{1} = x_{1}, X_{2} = x_{2}, C = c) = \alpha_{0} + \alpha_{1}x_{1} + \alpha_{2}x_{2} + \alpha_{3}x_{1}x_{2} + f(\alpha_{4}, c) \tag{16}$$

where *f* is some function of the parameter α_{4} and the confounding variables *c*. Note that *C* may be multivariate and α_{4} may denote a vector of coefficients. Condition 15 can be written in terms of the regression coefficients in model 16 as follows:

$$\alpha_{3} > 0 \tag{17}$$

Condition 14 can be written in terms of the regression coefficients in model 16 as:

$$\alpha_{3} - \alpha_{0} - f(\alpha_{4}, c) > 0 \tag{18}$$

Note that unlike condition 17, condition 18 in fact depends on the value of *c* and is thus sensitive to the specification of *f*(α_{4}, *c*).
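The dependence on *c* can be seen in a small numerical sketch. It assumes a linear risk model in which the confounder enters additively through *f*(α_{4}, *c*) = α_{4}*c*; this functional form, the coefficient values, and the function names are all hypothetical choices made for illustration.

```python
def linear_risk(x1, x2, c, a0=0.02, a1=0.03, a2=0.05, a3=0.06, a4=0.01):
    """Hypothetical linear risk model with an additive, linear confounder term a4 * c."""
    return a0 + a1 * x1 + a2 * x2 + a3 * x1 * x2 + a4 * c


def stratum_conditions(c):
    """Return (no-monotonicity contrast > 0, monotonicity contrast > 0) within stratum C = c."""
    contrast = linear_risk(1, 1, c) - linear_risk(1, 0, c) - linear_risk(0, 1, c)
    return contrast > 0, contrast + linear_risk(0, 0, c) > 0


results = {c: stratum_conditions(c) for c in (0.0, 5.0)}
print(results)
```

In this illustration the contrast that includes the baseline risk has the same sign in both strata, whereas the contrast without it changes sign as *c* increases, so conclusions drawn without monotonicity hinge on how the confounder term is specified.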

Consider now a log-linear model for the probability of the outcome that includes the confounding variables *C*:

$$\log P(D = 1 \mid X_{1} = x_{1}, X_{2} = x_{2}, C = c) = \beta_{0} + \beta_{1}x_{1} + \beta_{2}x_{2} + \beta_{3}x_{1}x_{2} + g(\beta_{4}, c) \tag{19}$$

It can be shown that model 19 effectively implies that a change in *C* from *c* to *c** multiplies the likelihood of each of the background causes (the *A*_{i} variables) being present by a factor of $e^{g(\beta_{4},c^{*})-g(\beta_{4},c)}$. Condition 15 can be written in terms of the regression coefficients in model 19 as:

$$e^{\beta_{0}+\beta_{1}+\beta_{2}+\beta_{3}+g(\beta_{4},c)} - e^{\beta_{0}+\beta_{1}+g(\beta_{4},c)} - e^{\beta_{0}+\beta_{2}+g(\beta_{4},c)} + e^{\beta_{0}+g(\beta_{4},c)} > 0$$

which can be rewritten as $e^{\beta_{0}+\beta_{1}+\beta_{2}+\beta_{3}} - e^{\beta_{0}+\beta_{1}} - e^{\beta_{0}+\beta_{2}} + e^{\beta_{0}} > 0$. Similarly, condition 14 can be written in terms of the regression coefficients in model 19 as:

$$e^{\beta_{0}+\beta_{1}+\beta_{2}+\beta_{3}+g(\beta_{4},c)} - e^{\beta_{0}+\beta_{1}+g(\beta_{4},c)} - e^{\beta_{0}+\beta_{2}+g(\beta_{4},c)} > 0$$

which can be rewritten as $e^{\beta_{0}+\beta_{1}+\beta_{2}+\beta_{3}} - e^{\beta_{0}+\beta_{1}} - e^{\beta_{0}+\beta_{2}} > 0$. Neither of these 2 expressions depends on *c* or on the specification of *g*(β_{4}, *c*). If, however, *C* served also as an effect modifier for the effect of *X*_{1} or *X*_{2} on *D* on the log-linear scale so that the model for the conditional probabilities of the outcome was in fact

$$\log P(D = 1 \mid X_{1} = x_{1}, X_{2} = x_{2}, C = c) = \beta_{0} + \beta_{1}x_{1} + \beta_{2}x_{2} + \beta_{3}x_{1}x_{2} + g(\beta_{4}, c) + g_{1}(\beta_{5}, c)x_{1} + g_{2}(\beta_{6}, c)x_{2}$$

then it is easy to verify that conditions 14 and 15 will once again depend on *c* and on the specification of *g*_{1}(β_{5}, *c*) and *g*_{2}(β_{6}, *c*). Clearly, with continuous confounding variables, the conclusions drawn about sufficient cause interaction will be sensitive to model specification. The issues of confounding and model specification are common to almost all research with observational data. Further work, however, remains to be done in determining how sensitive tests for sufficient cause interactions are to unmeasured confounding and model specification. In recent work, multiply robust tests have been derived for sufficient cause interactions that will be valid if either a model for the outcome is correctly specified or if a model for the joint probability of the exposures is correctly specified.^{18}
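The invariance to *c* under a log-linear model without effect modification can be checked numerically. The sketch below assumes the confounder enters the log risk additively through *g*(β_{4}, *c*) = β_{4}*c*; this form, the coefficient values, and the function names are hypothetical and used only for illustration.

```python
import math


def loglinear_risk(x1, x2, c, b0=math.log(0.01), b1=math.log(3.0),
                   b2=math.log(10.0), b3=0.0, b4=0.2):
    """Hypothetical log-linear risk model with an additive confounder term b4 * c."""
    return math.exp(b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2 + b4 * c)


def scaled_contrast(c):
    """(p11c - p10c - p01c) / p00c, which equals RR11 - RR10 - RR01 within stratum c."""
    contrast = loglinear_risk(1, 1, c) - loglinear_risk(1, 0, c) - loglinear_risk(0, 1, c)
    return contrast / loglinear_risk(0, 0, c)


values = [scaled_contrast(c) for c in (0.0, 1.0, 3.0)]
print([round(v, 6) for v in values])
```

Because the confounder term multiplies every risk in a stratum by the same factor, the scaled contrast is identical across strata, and the sign of each condition cannot change with *c*.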

## DISCUSSION

We have focused here on linear, log-linear, and logistic models and have compared and contrasted tests for sufficient cause interactions to tests for interactions in statistical models. Only when the effects of *X*_{1} and *X*_{2} on *D* were monotonic did a test for a statistical interaction in linear and log-linear models correspond to a test for a sufficient cause interaction. In other cases, certain additional conditions on other regression coefficients were necessary for a test for a statistical interaction in a linear, log-linear, or logistic model to correspond to a test for a sufficient cause interaction.

Although our focus in this paper has been on the relationship between statistical interactions and sufficient cause interactions and has not been on estimation or modeling, the models we have discussed could be fit to data and used in testing for sufficient cause interactions. Logistic regression models are of course routinely used in epidemiologic research. As noted above, when the outcome is rare, the relative excess risk due to interaction (*RERI*) can be used to test for sufficient cause interactions. If the effects of the exposures on the outcome are monotonic then *RERI* > 0 implies the presence of a sufficient cause interaction. If it cannot be assumed that the effects of the exposures on the outcome are monotonic then a stronger condition, *RERI* > 1, can still be used to test for the presence of a sufficient cause interaction. This approach to testing for sufficient cause interactions by using logistic models could also be employed with case-control data. Linear and log-linear models are used less frequently for binary outcomes because of the convergence and fitting issues previously mentioned. Nevertheless, several authors discuss how such linear and log-linear Bernoulli regression models can be fit.^{17,19–21} Other approaches to estimating risk differences and relative risks are also available and could be used in the context of tests for sufficient cause interactions. Zou^{22} discusses using modified Poisson regression models to estimate risk ratios while controlling for covariates. Skrondal^{23} discusses the advantages of using linear odds models rather than logistic regression models to assess departures from additivity in case-control studies. Cheung^{24} discusses using a modified least squares regression approach to estimating risk differences while controlling for covariates. Lumley et al^{25} have recently compared several different approaches to estimating relative risks.

Moving from conclusions about statistical models to conclusions about causation and mechanisms always requires certain assumptions. To move from conclusions about association to conclusions about causation we first need that the assumption of no unmeasured confounding holds and second that the statistical model that is used is correctly specified. To draw conclusions about mechanisms, further conditions are needed. In this paper, we have shown that even if the assumption of no unmeasured confounding is met and if the statistical model is correctly specified, interaction terms in statistical models do not in general correspond to interaction or synergism in the sufficient cause sense, that is, to the joint presence of 2 causes in the same causal mechanism. We have, however, provided the appropriate conditions to allow one to conclude the presence of sufficient cause interactions from the coefficients of a statistical model.

## REFERENCES

*Int J Epidemiol*. 1981;10:383–387.

*J Clin Epidemiol*. 1991;44:221–232.

*Modern Epidemiology*. 2nd ed. Philadelphia: Lippincott-Raven; 1998.

*Epidemiology*. 2007;18:329–339.

*Biometrika*. 2008;95:49–61.

*Am J Epidemiol*. 1976;104:587–592.

*J Work Environ Health*. 1988;14:125–129.

*Ann Statist*. In press.

*Int J Epidemiol*. 2002;31:1030–1037.

*Am J Epidemiol*. 1981;113:716–724.

*Biometrika*. 1994;81:259–270.

*Modern Epidemiology*. 1st ed. Boston, MA: Little, Brown and Company; 1986.

*Epidemiology*. 1992;3:452–456.

*Epidemiology*. 1996;7:286–290.

*Int J Epidemiol*. 2007;36:1111–1118.

*Modern Epidemiology*. 3rd ed. Philadelphia: Lippincott Williams & Wilkins; 2008.

*Am J Epidemiol*. 1986;123:174–184.

*J Am Statist Assoc*. In press.

*Stat Med*. 1991;10:1069–1074.

*Am J Epidemiol*. 2004;160:301–305.

*Am J Epidemiol*. 2005;162:199–200.

*Am J Epidemiol*. 2004;159:702–706.

*Am J Epidemiol*. 2003;158:251–258.

*Am J Epidemiol*. 2007;166:1337–1344.

## APPENDIX

### Three-Way Sufficient Cause Interactions in Linear Models

Here we relate 3-way sufficient cause interactions to interaction terms in linear statistical models following the discussion in VanderWeele and Robins.^{5} Consider the following saturated linear model for the probability of the outcome with 3 binary variables *X*_{1}, *X*_{2}, and *X*_{3}:

$$P(D = 1 \mid X_{1} = x_{1}, X_{2} = x_{2}, X_{3} = x_{3}) = \alpha_{0} + \alpha_{1}x_{1} + \alpha_{2}x_{2} + \alpha_{3}x_{3} + \alpha_{4}x_{1}x_{2} + \alpha_{5}x_{1}x_{3} + \alpha_{6}x_{2}x_{3} + \alpha_{7}x_{1}x_{2}x_{3} \tag{20}$$

Under the assumption that the effects of *X*_{1}, *X*_{2}, and *X*_{3} on *D* are monotonic we can rewrite the conditions given in 4 in terms of the coefficients of the linear probability model given in 20. The 3 conditions become

$$\alpha_{7} > \alpha_{3}, \qquad \alpha_{7} > \alpha_{2}, \qquad \alpha_{7} > \alpha_{1}$$

If the effects of *X*_{1}, *X*_{2}, and *X*_{3} on *D* are monotonic and any of these 3 conditions is satisfied then there must be a sufficient cause interaction between *X*_{1}, *X*_{2}, and *X*_{3}. In this case, a test for a 3-way statistical interaction, α_{7} > 0, will imply a test for a 3-way sufficient cause interaction only if 1 of α_{1}, α_{2}, or α_{3} is 0. If it cannot be assumed that the effects of *X*_{1}, *X*_{2}, and *X*_{3} on *D* are monotonic we may use condition 3, which can be rewritten in terms of the coefficients of the linear probability model given in 20 as

$$\alpha_{7} > 2\alpha_{0} + \alpha_{1} + \alpha_{2} + \alpha_{3}$$

If this condition is satisfied then there must be a sufficient cause interaction between *X*_{1}, *X*_{2}, and *X*_{3} even if it cannot be assumed that the effects of *X*_{1}, *X*_{2}, and *X*_{3} on *D* are monotonic. In this case, a test for a 3-way statistical interaction, α_{7} > 0, will imply a test for a 3-way sufficient cause interaction only if 2α_{0} + α_{1} + α_{2} + α_{3} ≤ 0. Thus, for 3-way sufficient cause interactions neither the condition with monotonicity, condition 4, nor the condition without monotonicity, condition 3, is in general implied by a test for the presence of a 3-way statistical interaction, α_{7} > 0, in a linear model for the probability of the outcome. Stronger conditions are needed both with and without monotonicity.
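The coefficients of a saturated linear model in 3 binary exposures can be recovered from the 8 risks by inclusion-exclusion, which makes the link between the risk contrasts and the coefficients easy to check. The sketch below assumes the usual saturated parameterization with an intercept, 3 main effects, 3 two-way products, and 1 three-way product; the function name, the tuple-keyed layout, and the risks are hypothetical.

```python
import math
from itertools import product


def saturated_linear_coefficients(p):
    """Map risks p[(x1, x2, x3)] to coefficients keyed by the exposures in each product term.

    alpha[(0, 0, 0)] is the intercept, alpha[(1, 0, 0)] the X1 main effect,
    and alpha[(1, 1, 1)] the three-way interaction coefficient.
    """
    alpha = {}
    for s in product((0, 1), repeat=3):
        total = 0.0
        for t in product((0, 1), repeat=3):
            if all(ti <= si for ti, si in zip(t, s)):
                total += (-1) ** (sum(s) - sum(t)) * p[t]
        alpha[s] = total
    return alpha


# Hypothetical risks, for illustration only.
p = {(0, 0, 0): 0.01, (1, 0, 0): 0.02, (0, 1, 0): 0.03, (0, 0, 1): 0.02,
     (1, 1, 0): 0.05, (1, 0, 1): 0.04, (0, 1, 1): 0.06, (1, 1, 1): 0.30}
alpha = saturated_linear_coefficients(p)

a0, a1, a2, a3 = alpha[(0, 0, 0)], alpha[(1, 0, 0)], alpha[(0, 1, 0)], alpha[(0, 0, 1)]
a7 = alpha[(1, 1, 1)]

no_monotonicity = a7 > 2 * a0 + a1 + a2 + a3   # p111 - p110 - p101 - p011 > 0
with_monotonicity = a7 > a3                    # adds p100 + p010 to the contrast
print(no_monotonicity, with_monotonicity)
```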

### Three-Way Sufficient Cause Interactions in Log-Linear and Logistic Models

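Here we relate 3-way sufficient cause interactions to interaction terms in log-linear and logistic models. Consider the following saturated log-linear model for the probability of the outcome:

$$\log P(D = 1 \mid X_{1} = x_{1}, X_{2} = x_{2}, X_{3} = x_{3}) = \beta_{0} + \beta_{1}x_{1} + \beta_{2}x_{2} + \beta_{3}x_{3} + \beta_{4}x_{1}x_{2} + \beta_{5}x_{1}x_{3} + \beta_{6}x_{2}x_{3} + \beta_{7}x_{1}x_{2}x_{3} \tag{21}$$

Consider first the condition for a 3-way sufficient cause interaction without the assumption that the effects of *X*_{1}, *X*_{2}, and *X*_{3} on *D* are monotonic. Condition 3 can be written in terms of the coefficients in model 21 as

$$e^{\beta_{0}+\beta_{1}+\beta_{2}+\beta_{3}+\beta_{4}+\beta_{5}+\beta_{6}+\beta_{7}} - e^{\beta_{0}+\beta_{1}+\beta_{2}+\beta_{4}} - e^{\beta_{0}+\beta_{1}+\beta_{3}+\beta_{5}} - e^{\beta_{0}+\beta_{2}+\beta_{3}+\beta_{6}} > 0$$

which can be rewritten as

$$e^{\beta_{1}+\beta_{2}+\beta_{3}+\beta_{4}+\beta_{5}+\beta_{6}+\beta_{7}} - e^{\beta_{1}+\beta_{2}+\beta_{4}} - e^{\beta_{1}+\beta_{3}+\beta_{5}} - e^{\beta_{2}+\beta_{3}+\beta_{6}} > 0 \tag{22}$$

It is then easily verified that if β_{3} + β_{5} + β_{6} > log(3) and β_{2} + β_{4} + β_{6} > log(3) and β_{1} + β_{4} + β_{5} > log(3) and β_{7} > 0 then condition 22 is satisfied, because each of the 3 subtracted terms is then less than one third of the first term. Thus if β_{3} + β_{5} + β_{6} > log(3) and β_{2} + β_{4} + β_{6} > log(3) and β_{1} + β_{4} + β_{5} > log(3), then a test for a 3-way statistical interaction, β_{7} > 0, implies a test for a 3-way sufficient cause interaction.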

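Now suppose that the effects of *X*_{1}, *X*_{2}, and *X*_{3} on *D* are monotonic. Consider the first condition in 4, *p*_{111} − *p*_{110} − *p*_{101} − *p*_{011} + *p*_{100} + *p*_{010} > 0. This can be written in terms of the coefficients in model 21 as

$$e^{\beta_{0}+\beta_{1}+\beta_{2}+\beta_{3}+\beta_{4}+\beta_{5}+\beta_{6}+\beta_{7}} - e^{\beta_{0}+\beta_{1}+\beta_{2}+\beta_{4}} - e^{\beta_{0}+\beta_{1}+\beta_{3}+\beta_{5}} - e^{\beta_{0}+\beta_{2}+\beta_{3}+\beta_{6}} + e^{\beta_{0}+\beta_{1}} + e^{\beta_{0}+\beta_{2}} > 0$$

which can in turn be rewritten as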

or as

where

and

. It can then be verified that if β_{3} + β_{5} + β_{6} > log(3 − *c*_{1}) and β_{2} + β_{4} + β_{6} > log(3 − *c*_{2}) and β_{1} + β_{4} + β_{5} > log(3 − *c*_{3}) and β_{7} > 0 then condition 23 is satisfied. Thus, if the effects of *X*_{1}, *X*_{2}, and *X*_{3} on *D* are monotonic and if β_{3} + β_{5} + β_{6} > log(3 − *c*_{1}) and β_{2} + β_{4} + β_{6} > log(3 − *c*_{2}) and β_{1} + β_{4} + β_{5} > log(3 − *c*_{3}) then a test for a 3-way statistical interaction, β_{7} > 0, implies a test for a 3-way sufficient cause interaction. Note that since *c*_{1}, *c*_{2}, and *c*_{3} are all positive quantities, the conditions required under monotonicity for a statistical interaction to correspond to a sufficient cause interaction are weaker than when monotonicity cannot be assumed. Similar implications hold for the other 2 conditions in 4.
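The implication can be made concrete with one possible choice of positive quantities *c*_{1}, *c*_{2}, and *c*_{3}. The definitions below are an assumption made for illustration, chosen so that the argument goes through under a saturated log-linear model with main effects β_{1}, β_{2}, β_{3}, two-way terms β_{4} (*X*_{1}*X*_{2}), β_{5} (*X*_{1}*X*_{3}), β_{6} (*X*_{2}*X*_{3}), and three-way term β_{7}; they are not necessarily the definitions intended above.

```latex
\begin{aligned}
&\text{Let } S = p_{100} + p_{010}, \qquad
c_{1} = \frac{S}{p_{110}}, \quad c_{2} = \frac{S}{p_{101}}, \quad c_{3} = \frac{S}{p_{011}} . \\[4pt]
&\text{Since } \frac{p_{111}}{p_{110}} = e^{\beta_{3}+\beta_{5}+\beta_{6}+\beta_{7}},
\text{ the conditions } \beta_{3}+\beta_{5}+\beta_{6} > \log(3 - c_{1}),\ \beta_{7} > 0
\text{ give } p_{111} > 3p_{110} - S . \\[4pt]
&\text{Likewise } p_{111} > 3p_{101} - S \text{ and } p_{111} > 3p_{011} - S . \\[4pt]
&\text{Adding the three inequalities and dividing by 3:} \quad
p_{111} - p_{110} - p_{101} - p_{011} + p_{100} + p_{010} > 0 .
\end{aligned}
```

Under this choice each *c*_{j} is a ratio of risks and therefore positive, so each threshold log(3 − *c*_{j}) lies below log(3), in line with the remark that the conditions are weaker under monotonicity.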

As noted previously, if the outcome is sufficiently rare so that the odds ratio closely approximates the risk ratio then these remarks concerning 3-way sufficient cause interactions and statistical interactions in log-linear models apply also to logistic models.