There are two types of assumptions at play in causal inference. One type encapsulates what we know (or think we know) both in the content area and about the data we have collected within a given study design. These assumptions are needed to connect the effects of interest, or estimands, to the observed data, to enable their estimation. They are termed identifiability assumptions, and a familiar example in observational studies is the exchangeability assumption, which in practical terms helps decide which covariate adjustment set is sufficient to avoid, or at least minimize, confounding bias.1 The other type relates to what we want to know, which can be referred to as estimand assumptions.2 These assumptions are embedded in the way we actually define the estimands to answer our questions and include basic aspects such as the target population, the interventions being compared (including how they are applied), and the outcomes.
The paper by Jackson3 in this issue defines estimands and proposes estimators that are novel in that they emphasize the key distinction between covariates with a role in estimand assumptions and those with a role in identification assumptions. Jackson3 compellingly argues that the choice of covariates in each set reflects equity value judgments that directly and critically affect the interpretability of estimates and whether they appropriately address the questions of interest. This exemplifies the importance of distinguishing what we want to know from what we know.
Considering the three broad steps of the causal inference roadmap,4,5 estimand assumptions pertain to the first step, definition, where the question and effect of interest are defined, while identifiability assumptions come in the second step, identification, which is followed by the third step, estimation. In terms of methodologic advances in causal inference, and especially their uptake in practice, the latter two steps have received much attention, as attested by the widespread use of directed acyclic graphs (DAGs)6,7 to depict identifiability assumptions and the rise of g-methods,1,8 which move beyond regression to enable estimation under more complex DAGs. The recent advent of the target-trial framework9,10 has put renewed focus on the first step, where a detailed description of the protocol of the hypothetical randomized trial to be emulated promotes explicit articulation of estimand assumptions, yielding refined definitions of the research question and estimand. However, a common interpretation of this approach is that it requires, and thus may only be relevant for, well-defined interventions.11–13 This is problematic given the prevalence of ill-defined interventions in many areas of epidemiologic inquiry.14–17
In this commentary, I argue that the key distinction made by Jackson3 and related issues can be illuminated by systematic application of the target-trial framework as a guide to defining estimands and analysis planning when tackling any causal inference problem, regardless of the extent to which the intervention is well defined.
RECOGNIZING CAUSAL INFERENCE QUESTIONS IN PRACTICE
Following Hernán and Robins,1 by causal inference questions I mean “what if” questions,18 about the effects of causes,19 which seek to predict how outcomes might have been under changes to the system (counterfactual prediction; e.g., what if my friend hadn’t smoked?). Of note, “why” questions, about the causes of effects (e.g., why did my friend die of lung cancer?), are also important causal questions, although it has been argued that answers to these may only be possible in terms of effects of causes.20 Recognizing causal inference questions can be challenging, especially in the absence of well-defined interventions. In my practice and teaching, I use the following guideline: for the purpose of analysis planning, consider a question to concern causal inference when the translational intent of the research is to inform on the likely impact of interventions, regardless of whether these exist or are hypothetical. Indeed, depending on the state of advancement of the field, the study findings might seek to inform future trials of existing interventions, development of new interventions, or simply further research into potential interventions or their targets. In all these cases, even when informing future research, one would want to plan the analysis so as to produce the least biased answer, to maximize the potential of subsequent studies to identify effective interventions. The anticipated conclusion of the study’s abstract usually provides a clear giveaway as to whether the question is about causal inference or not.
Applying this guideline to studies focusing on causal mediation or decomposition analysis reveals that the intent is almost invariably to inform on potential mediator interventions. In the case of Jackson,3 the aim is specifically to inform how intervention on a mediator would reduce health disparities, and the conclusion of the hypothetical study put forward might say something like “intervening on decisions to intensify antihypertensive treatment might help to reduce disparities in hypertension control.” Therefore, this is a causal inference problem, even though the actual intervention to change those decisions (e.g., training of physicians to raise awareness of unconscious bias) is not well defined.
DEFINE THE ESTIMAND FIRST AND FOREMOST
Questions like Jackson’s3 abound in life course and social epidemiology, in particular questions concerning mediator interventions to redress health disparities. As colleagues and I have argued in a general setting,2 these problems warrant the explicit formulation of estimands that reflect the research question at hand, as Jackson3 has done, rather than the use of ready-made causal mediation estimands, which might at best only implicitly emulate target trials, and if so, target trials of interventions that are unlikely to be relevant.21 In this sense, it is not surprising that when Jackson3 compares the derived estimator with existing mediation estimators, none of the latter is found to use covariates in a way that is appropriate for the question addressed: these estimators target different estimands. This highlights the critical importance of defining meaningful estimands before considering identification assumptions and estimators.
SPECIFICATION OF THE TARGET TRIAL AND ITS EMULATION
The target trial provides an invaluable tool for specifying suitably tailored estimands, facilitating systematic consideration of all important aspects. Following Hernán and Robins,9 the Table shows a possible target trial for the Jackson3 example, as well as how it might be emulated (noting that the cohort study described by Jackson3 was also hypothetical). Highlighted in the Table is where each covariate type distinguished by Jackson3 first appears. “Allowable” covariates are used to define the relevant estimands, which quantify the effect of an intervention on a mediating variable that is delivered conditional on “target-allowable” covariates, and which are standardized to a given distribution of “outcome-allowable” covariates. As shown in the Table, Jackson3 also includes these covariates in the confounder set in the example, that is, treats them as necessary to satisfy conditional exchangeability, although this need not be the case. Conversely, “nonallowable” covariates play a key role in the confounder set, enabling identification of the estimands, but Jackson3 argues that they should not be conditioned upon or standardized for in the estimand definition.
Possible Target-trial Specification and Emulation for the Example of Jackson3

| Protocol component | Target trial (estimand assumptions) | Emulation (identification assumptions) |
|---|---|---|
| A. Eligibility criteria | • Hypertensive patients • Outcome-allowable covariates (age, sex) distributed as in the pooled population of blacks and whites | • Patients with systolic blood pressure above 140 mm Hg • Age and sex distribution as in the pooled sample of blacks and whites |
| B. Treatment strategies | • Intervention arm: blacks, hypothetical intervention that would shift the distribution of antihypertensive treatment intensification to that in whites, conditional on target-allowable covariates (diabetes, baseline blood pressure, age, sex) • Comparator arm I: blacks, no intervention • Comparator arm II: whites, no intervention | Treatment arm measure: • Intervention arm: ill-defined intervention; unverifiable identifiability assumptions are needed to emulate the hypothetical shifts from data in blacks • Comparator arm I: blacks • Comparator arm II: whites |
| C. Assignment procedures | Randomization at recruitment without blind assignment | Selection of confounders: • Nonallowable covariates (educational attainment, private health insurance) • Outcome- and target-allowable covariates |
| D. Follow-up period | • Starts: first visit • Ends: 6 months later | Timing of measures: • Mediator: first visit • Outcome: 6 months later |
| E. Outcome | Systolic blood pressure above 140 mm Hg | |
| F. Causal contrasts | • Disparity reduction: comparator arm I versus intervention arm • Residual disparity: intervention arm versus comparator arm II | |
Simply completing the Table makes it immediately clear that allowable and nonallowable covariates are of a critically different nature, because they first appear in different columns: allowable covariates pertain to the target-trial specification, that is, to estimand assumptions, while nonallowable covariates pertain to the emulation and the identification assumptions it requires. The Table further shows that outcome-allowable covariates pertain to the target population, given that estimands are standardized to the distribution of these covariates in a chosen standard population, in this case the population of blacks and whites who are hypertensive at first visit (though Jackson3 mentions alternatives). Jackson3 comments that the selected nature of this population does not preclude the identifiability of the estimands; fundamentally, this is because selection is a characteristic of the target trial, not just of its emulation. Meanwhile, target-allowable covariates pertain to the treatment strategies under comparison; specifically, they condition the hypothetical shifts in the distribution of antihypertensive treatment intensification (the mediator) that are being evaluated.
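To make the Table's estimand assumptions concrete, the following Python sketch computes the three target-trial arm means and the two causal contrasts on a small, fully specified toy population. Everything in it is an illustrative assumption. This includes the variable coding: Q for older age, which is both outcome- and target-allowable; D for diabetes, which is target-allowable; N for private insurance, which is nonallowable; M for treatment intensification; and Y = 1 for systolic blood pressure above 140 mm Hg. It also includes all probabilities and the helper names `bern`, `p_m1`, `mean_y`, `p_m1_shifted`, `pooled_q` and `arm_mean`. The Table's longer list of target-allowable covariates is reduced to Q and D for brevity. The sketch is a plug-in g-formula computation that simply assumes the identification step has succeeded, so that the outcome model is causal given (R, Q, D, N). It is not Jackson's weighting estimator.

```python
from itertools import product

# Toy population; every number below is invented for illustration only.
# R: group ("black"/"white")
# Q: older age (binary); both outcome-allowable and target-allowable
# D: diabetes (binary); target-allowable
# N: private health insurance (binary); nonallowable
# M: antihypertensive treatment intensification at first visit (binary mediator)
# Y: systolic blood pressure above 140 mm Hg at 6 months (binary outcome)
P_R = {"black": 0.4, "white": 0.6}                       # P(R)
P_Q1 = {"black": 0.5, "white": 0.6}                      # P(Q=1 | R)
P_D1 = {("black", 0): 0.2, ("black", 1): 0.4,            # P(D=1 | R, Q)
        ("white", 0): 0.1, ("white", 1): 0.3}
P_N1 = {"black": 0.5, "white": 0.8}                      # P(N=1 | R); independent of Q, D here


def bern(p1, x):
    """Probability that a Bernoulli(p1) variable equals x (0 or 1)."""
    return p1 if x == 1 else 1.0 - p1


def p_m1(r, q, d, n):
    """Natural P(M=1 | R, Q, D, N); intensification is less likely among blacks."""
    return 0.3 + 0.2 * d + 0.1 * q + 0.1 * n + (0.2 if r == "white" else 0.0)


def mean_y(r, q, d, n, m):
    """P(Y=1 | R, Q, D, N, M), assumed equal to P(Y(m)=1 | R, Q, D, N)
    (conditional exchangeability given allowable and nonallowable covariates)."""
    return 0.6 - 0.2 * m - 0.1 * n + 0.1 * d + 0.05 * q + (0.05 if r == "black" else 0.0)


def p_m1_shifted(q, d):
    """Whites' P(M=1 | Q, D): conditions on target-allowable covariates only,
    averaging over the nonallowable covariate N among whites."""
    return sum(bern(P_N1["white"], n) * p_m1("white", q, d, n) for n in (0, 1))


def pooled_q(q):
    """Standard population: distribution of Q pooled over blacks and whites."""
    return sum(P_R[r] * bern(P_Q1[r], q) for r in P_R)


def arm_mean(r, shift=False):
    """Mean outcome in a target-trial arm for group r, standardized to pooled_q.

    shift=False: no intervention (group r's natural mediator distribution).
    shift=True: M drawn from whites' distribution given (Q, D).
    """
    total = 0.0
    for q, d, n, m in product((0, 1), repeat=4):
        pm1 = p_m1_shifted(q, d) if shift else p_m1(r, q, d, n)
        total += (pooled_q(q) * bern(P_D1[(r, q)], d) * bern(P_N1[r], n)
                  * bern(pm1, m) * mean_y(r, q, d, n, m))
    return total


arm_I = arm_mean("black")                    # comparator arm I: blacks, no intervention
arm_int = arm_mean("black", shift=True)      # intervention arm: blacks, shifted mediator
arm_II = arm_mean("white")                   # comparator arm II: whites, no intervention

disparity_reduction = arm_I - arm_int
residual_disparity = arm_int - arm_II
print(f"disparity reduction: {disparity_reduction:.4f}")   # 0.0460
print(f"residual disparity:  {residual_disparity:.4f}")    # 0.0860
print(f"total disparity:     {arm_I - arm_II:.4f}")        # 0.1320
```

The intervention arm differs from comparator arm I only in its mediator distribution. As a result, the two contrasts add up to the standardized total disparity between comparator arms I and II. The nonallowable covariate N enters only through its natural distribution among blacks and through the outcome model. The shift itself conditions on (Q, D) alone, which mirrors the distinction drawn in the Table.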
TACKLING ILL-DEFINED INTERVENTIONS
The issue of the intervention being ill-defined arises under the “treatment strategies” heading, which is one component of the target-trial exercise, but not its defining one. Thus, having an ill-defined intervention should not preclude consideration of all the other aspects of the target-trial specification, which, for example, lead to the distinction between covariates defining the target population and confounders, as above. The ill-defined nature of the intervention is nonetheless problematic,12,22 as it adds an important layer of uncertainty to findings, and yet many important questions in epidemiologic research face this issue.14–17,23 In work on interventional effects for multiple mediators, colleagues and I2 have suggested that ill-defined interventions may be tackled by positing hypothetical interventions that would lead to shifts in the intervention target (a mediator, as in Jackson,3 or an exposure, as in the so-called population intervention effects of stochastic interventions24), with estimand assumptions about those shifts made specifically according to the research question. Like some g-formula approaches, this can be seen as an intermediate step between traditional causal inference, which relies predominantly on data, and simulation-based approaches like agent-based modeling, which depend less on data and more on theory and modeling;23 here, theory enters in the form of estimand assumptions.
In this view, the choice of target-allowable covariates is an example of such an assumption. Distinguishing these covariates from the confounding adjustment set is an important contribution of Jackson,3 as previous proposals of interventional effects2,25–28 had not done so. There are additional aspects that would also warrant consideration when positing estimand assumptions for interventional effects designed to evaluate the impact of mediator distributional shifts.2 First, one can consider whether it makes sense in the given context to posit that the exposure is intervened upon as well, which is not the case for the effects proposed by Jackson3 or Micali29 but is for other previously proposed interventional effects.2,25–28 Second, the contrasts of interest should preferably be specified according to the question rather than the mathematical attractiveness of the decomposition; for instance, Jackson3 considered comparator arms without interventions, which they argued made more sense for their purpose than considering comparator arms under interventions as do the effects of Vansteelandt and Daniel.28 Third, the hypothetical shift of interest can also be specified according to the question; as for previous interventional effects, Jackson3 considers the unexposed group as benchmark to reflect an intervention that eliminates mediator disparities, but other choices are possible. Finally, consideration of other mediators, whether they are intervened upon, and how, is also important and is an issue that colleagues and I have considered in detail.2,21
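The point that target-allowable covariates reflect an estimand assumption rather than a confounding-adjustment choice can be illustrated with a second toy computation. In this hypothetical sketch, all names and numbers are assumptions, and the outcome model is simply assumed causal given group and diabetes. The same population yields different disparity reductions depending only on how the hypothetical shift in blacks' treatment intensification is defined. In one version, the shift is conditional on diabetes, so diabetes is target-allowable. In the other, it targets whites' marginal intensification rate.

```python
# Hypothetical toy population; all numbers are invented for illustration.
# R: group; D: diabetes (binary); M: treatment intensification (binary);
# Y: systolic blood pressure above 140 mm Hg at follow-up (binary).
P_D1 = {"black": 0.4, "white": 0.2}                      # P(D=1 | R)


def bern(p1, x):
    """Probability that a Bernoulli(p1) variable equals x (0 or 1)."""
    return p1 if x == 1 else 1.0 - p1


def p_m1(r, d):
    """Natural P(M=1 | R, D)."""
    return 0.3 + 0.3 * d + (0.2 if r == "white" else 0.0)


def mean_y(r, d, m):
    """P(Y=1 | R, D, M), assumed causal for M given (R, D)."""
    return 0.6 - 0.2 * m + 0.1 * d + (0.05 if r == "black" else 0.0)


def white_m1(d_is_target_allowable):
    """Return a function d -> P(M=1) under the shift: whites' distribution given D
    if D is target-allowable, otherwise whites' marginal distribution of M."""
    if d_is_target_allowable:
        return lambda d: p_m1("white", d)
    marginal = sum(bern(P_D1["white"], d) * p_m1("white", d) for d in (0, 1))
    return lambda d: marginal


def black_arm_mean(m1_given_d):
    """Mean outcome among blacks when M=1 with probability m1_given_d(D)."""
    return sum(bern(P_D1["black"], d) * bern(m1_given_d(d), m) * mean_y("black", d, m)
               for d in (0, 1) for m in (0, 1))


no_intervention = black_arm_mean(lambda d: p_m1("black", d))
reductions = {}
for cond in (True, False):
    reductions[cond] = no_intervention - black_arm_mean(white_m1(cond))
    print(f"D target-allowable: {cond}; disparity reduction = {reductions[cond]:.3f}")
# D target-allowable: True; disparity reduction = 0.040
# D target-allowable: False; disparity reduction = 0.028
```

Diabetes raises the probability of intensification but is less common among whites in this toy population. Under the marginal shift, blacks with diabetes are therefore assigned whites' overall intensification rate (0.56), which is lower than their own natural rate (0.6). The conditional shift instead assigns them the higher rate of whites with diabetes. Neither number is wrong: the two estimands answer different questions.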
All estimand assumptions encapsulate substantive judgments that require thought. What Jackson3 exemplifies in the context of health equity research is that choosing not to consider them explicitly is a choice in itself, and a potentially dangerous one. Therefore, putting an emphasis on estimand assumptions, and how we might use them to refine our research questions, is much needed, and the target-trial specification is a very efficient way of doing this systematically and comprehensively, even with ill-defined interventions.
From a practical perspective, a key takeaway message is that once a question has been recognized as a causal inference question, as described above, it should be embraced from start to finish for the purpose of the analysis, via the target-trial framework, while keeping in mind that this will never fix all the limitations that must still be considered when drawing causal conclusions. That is, the target-trial concept is, above all, a powerful analysis planning tool that aids clear thinking, refines estimands, and limits biases in their emulation. With this in mind, it is very helpful to explicitly incorporate question type and target-trial specification into analysis plan templates for observational studies, such as the one our group has developed for life-course cohort studies.30
To conclude, restricting the use of the target-trial framework to questions regarding well-defined interventions would represent a missed opportunity to sharpen questions and corresponding estimands when addressing what are clearly causal inference problems in many important areas. An approximate answer to the right question will always be better than an exact answer to the wrong question.
I thank John Carlin for his helpful comments.
1. Hernán MA, Robins JM. Causal Inference: What If. Chapman & Hall/CRC; 2020.
2. Moreno-Betancur M, Moran P, Becker D, Patton G, Carlin JB. Mediation effects that emulate a target randomised trial: simulation-based evaluation of ill-defined interventions on multiple mediators. Stat Methods Med Res. In press.
3. Jackson J. Meaningful causal decompositions in health equity research: definition, identification, and estimation through a weighting framework. Epidemiology. 2021;32:282–290.
4. van der Laan MJ, Rose S. Targeted Learning: Causal Inference for Observational and Experimental Data. Springer; 2011.
5. Petersen ML. Commentary: applying a causal road map in settings with time-dependent confounding. Epidemiology. 2014;25:898–901.
6. Greenland S, Pearl J, Robins JM. Causal diagrams for epidemiologic research. Epidemiology. 1999;10:37–48.
7. Pearl J. Causal diagrams for empirical research. Biometrika. 1995;82:669–688.
8. Robins J. A new approach to causal inference in mortality studies with a sustained exposure period-application to control of the healthy worker survivor effect. Math Model. 1986;7:1393–1512.
9. Hernán MA, Robins JM. Using big data to emulate a target trial when a randomized trial is not available. Am J Epidemiol. 2016;183:758–764.
10. Hernán MA, Alonso A, Logan R, et al. Observational studies analyzed like randomized experiments: an application to postmenopausal hormone therapy and coronary heart disease. Epidemiology. 2008;19:766–779.
11. Schwartz S, Prins SJ, Campbell UB, Gatto NM. Is the “well-defined intervention assumption” politically conservative? Soc Sci Med. 2016;166:254–257.
12. Hernán MA. Invited commentary: hypothetical interventions to define causal effects–afterthought or prerequisite? Am J Epidemiol. 2005;162:618–620.
13. Didelez V. Commentary: should the analysis of observational data always be preceded by specifying a target experimental trial? Int J Epidemiol. 2016;45:2049–2051.
14. Galea S, Hernán MA. Win-Win: reconciling social epidemiology and causal inference. Am J Epidemiol. 2020;189:167–170.
15. Jackson JW, Arah OA. Invited commentary: making causal inference more social and (social) epidemiology more causal. Am J Epidemiol. 2020;189:179–182.
16. VanderWeele TJ. Invited commentary: counterfactuals in social epidemiology-thinking outside of “the box.” Am J Epidemiol. 2020;189:175–178.
17. Robinson WR, Bailey ZD. Invited commentary: what social epidemiology brings to the table-reconciling social epidemiology and causal inference. Am J Epidemiol. 2020;189:171–174.
18. Hernán MA, Hsu J, Healy B. A second chance to get causal inference right: a classification of data science tasks. Chance. 2019;32:42–49.
19. Dawid AP. Causal inference without counterfactuals. J Am Stat Assoc. 2000;95:407–424.
20. Gelman A, Imbens G. Why ask why? Forward causal inference and reverse causal questions. NBER Working Paper No. w19614. National Bureau of Economic Research; 2013.
21. Moreno-Betancur M, Carlin JB. Understanding interventional effects: a more natural approach to mediation analysis? Epidemiology. 2018;29:614–617.
22. Kaufman JS. There is no virtue in vagueness: comment on: causal identification: a charge of epidemiology in danger of marginalization by Sharon Schwartz, Nicolle M. Gatto, and Ulka B. Campbell. Ann Epidemiol. 2016;26:683–684.
23. Hernán MA. Invited commentary: agent-based models for causal inference—reweighting data and theory in epidemiology. Am J Epidemiol. 2015;181:103–105.
24. Muñoz ID, van der Laan M. Population intervention causal effects based on stochastic interventions. Biometrics. 2012;68:541–549.
25. Geneletti S. Identifying direct and indirect effects in a non-counterfactual framework. J R Stat Soc Ser B. 2007;69:199–215.
26. Didelez V, Dawid AP, Geneletti S. Direct and indirect effects of sequential treatments. In: Dechter R, Richardson T, eds. Proceedings of the 22nd Annual Conference on Uncertainty in Artificial Intelligence. AUAI Press; 2006:138–164.
27. VanderWeele TJ, Vansteelandt S, Robins JM. Effect decomposition in the presence of an exposure-induced mediator-outcome confounder. Epidemiology. 2014;25:300–306.
28. Vansteelandt S, Daniel RM. Interventional effects for mediation analysis with multiple mediators. Epidemiology. 2017;28:258–265.
29. Micali N, Daniel RM, Ploubidis GB, De Stavola BL. Maternal prepregnancy weight status and adolescent eating disorder behaviors: a longitudinal study of risk pathways. Epidemiology. 2018;29:579–589.
30. Moreno-Betancur M, Carlin JB. Analysis plan template for life-course cohort studies. 2020. Available at: https://doi.org/10.26188/12471380. Accessed June 12, 2020.