Importance of Homogeneous Effect Modification for Causal Interpretation of Meta-analyses

Steele, Russell J.; Schnitzer, Mireille E.; Shrier, Ian

doi: 10.1097/EDE.0000000000001181

This issue of the Journal contains two articles focusing on how to draw causal conclusions using data from multiple studies that would generally be analyzed via meta-analysis. Manski1 approaches the problem using partial identification methods commonly found in the economics literature to synthesize conditional regression models across studies. Dahabreh et al.2 identify the assumptions necessary for causal inference for meta-analyses via counterfactuals under a nonparametric structural equation model and propose the use of g-methods3 for estimation using individual patient data. In this commentary, we draw links between these two seemingly disparate approaches and place them in the broader context of methods currently used in meta-analysis of randomized trial data.


The goal of a meta-analysis in the above context is to synthesize information about a target population parameter from data collected from different randomized controlled trials (RCTs), where each RCT unconditionally randomly assigned one of the same two treatments to each subject with equal probability. Both articles1,2 allow for relaxation of these assumptions in different ways, but our basic example suffices for the purposes of characterizing some important differences between the two approaches. Further assume that each RCT assigns the same number of subjects to each treatment arm, so that the sample sizes are balanced across trials.

The target population causal parameter for meta-analyses of data structured as in the preceding paragraph can often be expressed as $E[Y^1 - Y^0]$, where $Y^a$ is the potential outcome for the response variable if the patient were to receive treatment level $a$ (0 or 1). As mentioned in both articles, the first challenge is to clearly define the population of subjects over which we want to compute the expectation of potential outcomes. Manski1 focuses on a population of a single subject, the patient of interest with covariate profile $x$, for whom the clinician wants to choose an appropriate treatment level based on a model estimated from subjects with the same covariate profile. Dahabreh et al.2 assume a distinct target population for which baseline characteristics are available for a simple random sample of size $n$. Therefore, we can view both articles as having the same target parameter, where the article by Manski1 targets a homogeneous population represented by the relevant baseline covariates of the individual patient, and the article by Dahabreh et al.2 allows for a heterogeneous target population.


When synthesizing information across RCTs, the underlying causal inference challenge is subtle. For a single RCT with perfect adherence, causal parameters can always be estimated consistently for the sampling frame of the study. However, drawing causal inferences from several studies requires connecting information from each study to the population of interest through shared model parameters, that is, transporting estimation from studies with observed counterfactual outcomes to the target population.4 When multiple studies are estimating independent, disconnected treatment effects, there is no obviously correct way to combine information to target a particular summary parameter without additional assumptions.

Dahabreh et al.2 make this connection directly with a critical conditional exchangeability in measure assumption over the different trial participation mechanisms, similar to our work.5 If $\mathcal{S}$ indicates the collection of study population indices and $s \in \mathcal{S}$ indexes a study of interest (with $S = 0$ indicating the target population), Dahabreh et al.2 assume $E[Y^1 - Y^0 \mid X = x, S = 0] = E[Y^1 - Y^0 \mid X = x, S = s]$ for every covariate combination $x$ in the target population. This assumption guarantees that the expected difference in potential outcomes conditional on a covariate pattern in the target population is the same as the expected difference in potential outcomes conditional on the same covariate combination in every one of the other studies in the meta-analysis. The purpose of the conditional exchangeability assumption here is different from our work in aggregate network meta-analyses,6 in that both allow for effect modification, but our assumption involves covariate influence at the study level, while the assumption by Dahabreh et al.2 is at the individual level.
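To make the role of this assumption concrete, the following derivation sketches how it lets a single trial's observable quantities identify the target population effect. The notation is ours and is an assumption rather than a quotation from either article: $X$ is a discrete baseline covariate vector, $S$ the study indicator (with $S = 0$ for the target population), $A$ the randomized treatment, and $Y$ the observed outcome. The sketch additionally assumes consistency, positivity of trial participation for every covariate pattern found in the target population, and randomization within trial $s$.

```latex
\begin{align*}
E[Y^1 - Y^0 \mid S = 0]
  &= \sum_x E[Y^1 - Y^0 \mid X = x, S = 0] \, P(X = x \mid S = 0)
     && \text{(law of total expectation)} \\
  &= \sum_x E[Y^1 - Y^0 \mid X = x, S = s] \, P(X = x \mid S = 0)
     && \text{(conditional exchangeability in measure)} \\
  &= \sum_x \bigl\{ E[Y \mid A = 1, X = x, S = s]
       - E[Y \mid A = 0, X = x, S = s] \bigr\} \, P(X = x \mid S = 0)
     && \text{(randomization and consistency in trial } s\text{)}
\end{align*}
```

The second equality is the only step that borrows information across populations; if the conditional effects differ between trial $s$ and the target population, the final expression identifies a different quantity.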

Manski1 implicitly makes a similar assumption. To be able to use the bounds in sections 3 and 4 of Manski’s article effectively requires that $E[Y^a \mid X = x]$ is equal across trials as well as for the target subject of interest with covariate pattern $x$ at treatment level $a$. Assume the contrary in our simple case, that is, that $E[Y^a \mid X = x]$ is not equal for two of the trials. As the number of studies and the sample size of each study increase, the set intersection interval in Manski’s equation 41 becomes empty. This is because at least one study would be consistently estimating a different $E[Y^a \mid X = x]$, requiring the user to choose arbitrarily among sets of intervals believed to be measuring the same quantity in the target population of interest.
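The empty-intersection argument above can be illustrated with a minimal sketch. It is not Manski's procedure: the function name `intersect_intervals` and the numeric bounds are hypothetical, and each interval simply stands in for one study's bound on the same conditional mean outcome.

```python
from typing import Optional, Sequence, Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound) from one study


def intersect_intervals(intervals: Sequence[Interval]) -> Optional[Interval]:
    """Return the intersection of closed intervals, or None if it is empty."""
    lower = max(lo for lo, _ in intervals)  # tightest lower bound
    upper = min(hi for _, hi in intervals)  # tightest upper bound
    return (lower, upper) if lower <= upper else None


# Hypothetical bounds: three studies bounding the same quantity.
agreeing = [(0.20, 0.60), (0.35, 0.70), (0.30, 0.55)]

# Hypothetical bounds after sample sizes grow: the intervals have narrowed
# around two different conditional means, so they no longer overlap.
conflicting = [(0.20, 0.30), (0.45, 0.60)]

print(intersect_intervals(agreeing))
print(intersect_intervals(conflicting))
```

When the studies bound the same quantity, the intersection narrows as studies are added. When they do not, the intersection eventually vanishes and the sketch returns `None`, which is the situation in which the analyst would have to choose among the intervals.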

One cannot overstate the importance of the assumption of homogeneous effect modification across studies for both articles, which allows for the pooling of contrasts or arm-specific outcomes across studies. However, many meta-analysts might consider it unlikely to hold at first glance. Note that the treatment effects are not necessarily restricted to be constant across studies conditional on covariates. Heterogeneity in treatment effects via random effects that do not confound the treatment effect5 would also satisfy the equations above.


Our previous work6 dealt with identifiability under restrictions of the unconfoundedness of study-level information rather than focus on the problem at the individual patient level because access to individual patient data for meta-analysis (even of randomized trials) remains quite restricted.7 Lack of access to individual patient data impacts the g-methods approach most directly. The g-methods approach requires the ability to compute the probabilities that subjects participate in a particular RCT. Without access to individual patient data, we cannot compute such probabilities. The widely used CONSORT checklist8 mentions only the reporting of marginal baseline individual covariate summaries by treatment arm, which remains insufficient to compute the probability of participation in the vast majority of situations.

Manski’s partial identification approach does not require individual patient data; in fact, one of the key strengths of the partial identification approach is that it allows one to construct intervals from each study separately and then combine interval bounds across studies. However, the partial identification approach requires that individual studies report the same expected outcomes adjusted for the target subject’s covariates across studies, and that all effect modifiers are properly adjusted for. Studies may generally measure the same demographic data, but often vary in the measurement or reporting of outcomes and baseline covariates9 as well as reporting of results for regressions including interactions between variables or polynomial regression terms. In these cases, the information required to convert the conditional means to marginal means for the observed treatment arms is often not provided.10


In conclusion, Manski1 and Dahabreh et al.2 propose different strategies for combining causal inferences from multiple studies. Both articles require homogeneous effect modification by covariates across studies. Furthermore, both articles require specialized situations where either individual patient data are available or covariate-adjusted results are reported. Given the move toward reproducibility and open access to data across science, we envision that these two approaches could be useful in the future.

In the current context, other work in meta-analysis addresses the limitations of currently available multi-study datasets by using other available information. Vo et al.11 discussed using individual patient data (IPD) to standardize each study to the covariate distribution of interest and to calculate marginal effects. In the context of network meta-analysis, additional complications arise when only a subset of treatment arms is observed in some (or all) study populations. Making assumptions based only on observed marginal characteristics of the studies, Schnitzer et al.6 examined g-methods and a targeted learning approach, whereas others have used parametric Bayesian approaches.12,13 Although network meta-analysis methods are generally used when not all studies collect data on all possible treatment arms, these methods provide different sets of assumptions necessary to connect the parameters (or effects) from the different study populations. White et al.14 assume parameter heterogeneity can be expressed through a hierarchical normal model. Wang et al.5 define a union of study superpopulations for IPD meta-analysis and an overall summary target causal parameter in terms of the superset of all study population potential outcomes, as in Ref. 6. They then use targeted learning to appropriately synthesize the information across studies.
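As a rough numerical illustration of standardizing a study to a covariate distribution of interest, the sketch below reweights stratum-specific treatment effects by a target case mix. It is a simplified stand-in, not the method of Vo et al.: the function `standardize`, the two strata, and all numbers are hypothetical, and it assumes the stratum-specific effects are shared between the trial and the target population.

```python
from typing import Dict


def standardize(effects_by_stratum: Dict[str, float],
                covariate_distribution: Dict[str, float]) -> float:
    """Average stratum-specific effects over a covariate distribution."""
    total = sum(covariate_distribution.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("covariate distribution must sum to 1")
    return sum(effects_by_stratum[stratum] * prob
               for stratum, prob in covariate_distribution.items())


# Hypothetical stratum-specific risk differences estimated in one trial.
trial_effects = {"younger": 0.10, "older": 0.30}

# Hypothetical case mix of the trial itself and of the target population.
trial_case_mix = {"younger": 0.50, "older": 0.50}
target_case_mix = {"younger": 0.25, "older": 0.75}

effect_in_trial = standardize(trial_effects, trial_case_mix)
effect_in_target = standardize(trial_effects, target_case_mix)
print(effect_in_trial, effect_in_target)
```

With these made-up inputs the marginal effect is roughly 0.20 under the trial's own case mix and roughly 0.25 under the target case mix, which shows how the same conditional effects can yield different marginal effects when the covariate distribution changes.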

One theme that resonates throughout all of these articles, and is explicitly stated by Dahabreh et al.,2 is the need for sensitivity analyses of the untestable assumptions that enable proper evidence synthesis. Given the generally small number of studies that are typically available for any research question of interest, conclusions can be quite sensitive to even small deviations from assumptions. Turner et al.15 address this issue from a Bayesian (noncounterfactual) perspective for assumptions relating to different kinds of bias. The bias most relevant to our discussion is that of selection bias, where the synthesized studies included in the analysis are not reflective of the target population of interest. Although an assumption of homogeneous effect modification allows one to draw causal inferences in both proposed approaches, sensitivity to deviations from this assumption should be carefully considered.


Russell J. Steele is an Associate Professor in the Department of Mathematics and Statistics at McGill University. He has published articles developing new methods for meta-analysis and causal inference, as well as collaborated with clinicians and epidemiologists using those methods to solve clinical research problems. Steele also has a number of publications in the areas of Bayesian inference and Monte Carlo computation.

Mireille E. Schnitzer is an Associate Professor of Biostatistics at Université de Montréal and holds a Canada Research Chair in Causal Inference and Machine Learning in Health Science. Schnitzer received her PhD in Biostatistics from McGill University in 2012 and was a postdoctoral researcher at the Harvard T.H. Chan School of Public Health in 2013. Schnitzer has multiple publications on causal inference theory and methodology including semiparametric efficient estimation in longitudinal, survival, and meta-analytical settings.

Ian Shrier has published several methodologic articles on combining observational and randomized studies, and how to apply a causal inference approach to evidence synthesis. He has participated in the development of the Cochrane Risk of Bias Tool for Non-Randomized Studies, and the Cochrane Risk of Bias Tool for Randomized Studies. Dr. Shrier is on the Board of Trustees for the Society for Research Synthesis Methods, and the Co-Editor-in-Chief of the journal Research Synthesis Methods.


1. Manski CF. Towards credible patient-centered meta-analysis. Epidemiology. 2020;31.
2. Dahabreh IJ, Petito LC, Robertson SE, et al. Towards causally interpretable meta-analysis: transporting inferences from multiple studies to a target population. Epidemiology. 2020;31.
3. Naimi AI, Cole SR, Kennedy EH. An introduction to g methods. Int J Epidemiol. 2017;46:756–762.
4. Bareinboim E, Pearl J. Causal inference and the data-fusion problem. Proc Natl Acad Sci U S A. 2016;113:7345–7352.
5. Wang G, Schnitzer ME, Menzies D, et al. Estimating treatment importance in multidrug-resistant tuberculosis using targeted learning: an observational individual patient data network meta-analysis. Biometrics. 2020. In press.
6. Schnitzer ME, Steele RJ, Bally M, et al. A causal inference approach to network meta-analysis. J Causal Inference. 2016;4:1–19.
7. Naudet F, Sakarovitch C, Janiaud P, et al. Data sharing and reanalysis of randomized controlled trials in leading biomedical journals with a full data sharing policy: survey of studies published in the BMJ and PLOS Medicine. BMJ. 2018;360:k400.
8. Moher D, Hopewell S, Schulz KF, et al. CONSORT 2010 explanation and elaboration: updated guidelines for reporting parallel group randomised trials. BMJ. 2010;340:c869.
9. Higgins JPT, Green S. The Cochrane Collaboration. Cochrane Handbook for Systematic Reviews of Interventions Version 5.1.0. 2011. Available at: Accessed February 1, 2020.
10. Shrier I, Redelmeier A, Schnitzer ME, et al. Challenges in interpreting results from ‘multiple regression’ when there is interaction between covariates. BMJ Evid Based Med. 2020. In press.
11. Vo T-T, Porcher R, Chaimani A, et al. Rethinking meta-analysis: assessing case-mix heterogeneity when combining treatment effects across patient populations. arXiv preprint arXiv:1908.10613. 2019.
12. Lu G, Ades A. Assessing evidence inconsistency in mixed treatment comparisons. J Am Statist Assoc. 2006;101:447–459.
13. Dias S, Welton NJ, Caldwell DM, Ades AE. Checking consistency in mixed treatment comparison meta-analysis. Stat Med. 2010;29:932–944.
14. White IR, Turner RM, Karahalios A, Salanti G. A comparison of arm-based and contrast-based models for network meta-analysis. Stat Med. 2019;38:5197–5213.
15. Turner RM, Spiegelhalter DJ, Smith GC, Thompson SG. Bias modelling in evidence synthesis. J R Stat Soc Ser A Stat Soc. 2009;172:21–47.
Copyright © 2020 Wolters Kluwer Health, Inc. All rights reserved.