Although well-conducted randomized experiments are in principle the preferred study design for evaluating the causal effects of an exposure on an outcome, randomization is often infeasible, leaving observational studies as the best alternative. However, observational studies are prone to confounding and selection bias. Several analytic methods can be used to overcome these potential biases, such as multivariable regression, propensity scores, and the instrumental variable approach.1
These techniques may be inadequate in longitudinal observational studies when more complex biases are present, such as when exposure is time-dependent and time-varying confounders are present. A time-varying confounder is a variable affected by prior exposure that predicts both the subsequent outcome and subsequent exposure. To correct for time-varying confounding in observational studies, Robins developed a new class of causal models, called marginal structural models, in 1997.2 These models use inverse-probability (of exposure) weights to create an artificial population in which covariate imbalances are removed and causal effects can be estimated. The publication of 2 companion papers3,4 in 2000 facilitated the practical application of this new method.
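Why conventional adjustment fails here, and why weighting does not, can be seen in a toy example. The sketch below enumerates a hypothetical two-visit study exactly; every probability in it is an illustrative assumption, not data from this review. An unmeasured factor U raises both the time-varying confounder L1 and the outcome Y; L1 is affected by the first exposure A0 and influences the second exposure A1; neither exposure has any effect on Y. Weighting each cell by 1/[P(A0) P(A1 | L1)] recovers the true null effect, whereas comparing A0 within a stratum of L1, as a model that adjusts for L1 implicitly does, produces a spurious association.

```python
from itertools import product

# Hypothetical data-generating process; all numbers are illustrative assumptions.
#   U  : unmeasured risk factor, P(U = 1) = 0.5
#   A0 : baseline exposure, randomized with P(A0 = 1) = 0.5
#   L1 : time-varying confounder, affected by A0 and U
#   A1 : follow-up exposure, depends on L1 only
#   Y  : outcome, depends on U only (the true effect of A0 and A1 is null)


def bernoulli(value, p_one):
    """Probability that a binary variable equals `value` when P(it = 1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one


def p_a0(a0):
    return bernoulli(a0, 0.5)


def p_l1(l1, a0, u):
    return bernoulli(l1, 0.2 + 0.3 * a0 + 0.4 * u)


def p_a1(a1, l1):
    return bernoulli(a1, 0.3 + 0.5 * l1)


def mean_y(u):
    return 10.0 + 5.0 * u


def cells():
    """Yield (probability, u, a0, l1, a1) for every cell of the joint distribution."""
    for u, a0, l1, a1 in product((0, 1), repeat=4):
        prob = bernoulli(u, 0.5) * p_a0(a0) * p_l1(l1, a0, u) * p_a1(a1, l1)
        yield prob, u, a0, l1, a1


def ipw_mean(a0, a1):
    """Inverse-probability-weighted mean of Y among subjects with history (a0, a1)."""
    num = den = 0.0
    for prob, u, b0, l1, b1 in cells():
        if (b0, b1) == (a0, a1):
            weight = 1.0 / (p_a0(b0) * p_a1(b1, l1))
            num += prob * weight * mean_y(u)
            den += prob * weight
    return num / den


def mean_given(a0, l1):
    """E[Y | A0 = a0, L1 = l1], the comparison a model adjusting for L1 relies on."""
    num = den = 0.0
    for prob, u, b0, m, _ in cells():
        if (b0, m) == (a0, l1):
            num += prob * mean_y(u)
            den += prob
    return num / den


print("Weighted contrast, always vs never exposed:", ipw_mean(1, 1) - ipw_mean(0, 0))
print("Contrast of A0 within L1 = 1:", mean_given(1, 1) - mean_given(0, 1))
```

The weighted contrast is zero up to floating-point rounding, while the stratified contrast is about -0.54 even though A0 has no effect: stratifying on L1, which shares the cause U with Y, induces an association between A0 and U. Weighting never conditions on L1, yet still removes its influence on A1.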
We aimed to determine whether there were differences in exposure effect estimates between marginal structural models and conventional models in real applications, and to examine the completeness of reporting of marginal structural models.
We systematically reviewed published studies that used marginal structural models to estimate exposure effects potentially affected by time-varying confounding. Citations from 2000 to October 2009 were retrieved from both PubMed and ISI Web of Knowledge databases following this search strategy: first, we retrieved all papers from PubMed selected by the query: (marginal structural model) OR (“marginal structural model”) OR (marginal structural models) OR (“marginal structural models”) OR (“inverse probability weighting”) OR (“inverse probability weighted”) OR (“inverse probability weights”); second, we retrieved from ISI all papers selected by the query: Topic = (“marginal structural”) AND Topic = (model*); third, we retrieved from ISI all papers selected by the query: Topic = (“inverse probability”) AND Topic = (weight*); and fourth, we retrieved all papers that cited one of the 2 seminal epidemiologic papers on marginal structural models.3,4 Methodologic papers and non-English language papers were included, but not abstracts, editorials, or letters.
The selected articles were abstracted independently by 2 reviewers. From each article with results from both marginal structural models and conventional analyses, we abstracted the measure of association, confidence intervals, and standard errors for the exposure–outcome associations. We defined conventional models as the alternative model reported in the paper, if any (eg, a multivariable regression model). If more than one conventional model was reported, we chose the one providing the highest degree of adjustment. Only results for the exposure of interest were abstracted, even if a paper reported estimates for other covariates. In addition, for all papers using marginal structural models (including those that did not provide a comparison with conventional methods), we recorded 2 good practices in marginal structural modeling: whether the paper stated that stabilized inverse-probability weights had been used, and whether it stated that the mean of the stabilized inverse-probability weights was close to one. Robins et al3 recommended stabilized inverse-probability weights because they have good convergence properties, are efficient, and are easy to compute. Stabilized inverse-probability weights with a mean far from one may indicate a violation of some of the assumptions of the model, eg, a misspecification of the weight model or a violation of the positivity assumption.5
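For a binary exposure measured at several visits, the stabilized weight in its common form is a product over visits: the probability of the exposure level actually received given past exposure (numerator), divided by the probability of that same level given past exposure and past confounders (denominator). The sketch below assumes both probabilities have already been predicted by fitted models (for instance, pooled logistic regressions, which are not shown); the function name and the subject data are hypothetical.

```python
from statistics import mean


def stabilized_weight(exposures, p_num, p_den):
    """Cumulative stabilized inverse-probability weight for one subject.

    exposures: observed binary exposure (0 or 1) at each visit.
    p_num: P(exposed at visit k | past exposure), from a numerator model
           assumed to have been fitted already.
    p_den: P(exposed at visit k | past exposure, past confounders), from a
           denominator model assumed to have been fitted already.
    """
    if not len(exposures) == len(p_num) == len(p_den):
        raise ValueError("one probability per visit is required")
    weight = 1.0
    for a, pn, pd in zip(exposures, p_num, p_den):
        # Probability of the exposure level actually received at this visit.
        num = pn if a == 1 else 1.0 - pn
        den = pd if a == 1 else 1.0 - pd
        weight *= num / den
    return weight


# Hypothetical predicted probabilities for three subjects followed for 2 visits.
subjects = [
    ([1, 1], [0.5, 0.6], [0.8, 0.9]),
    ([0, 0], [0.5, 0.4], [0.3, 0.2]),
    ([1, 0], [0.5, 0.4], [0.4, 0.5]),
]
weights = [stabilized_weight(*s) for s in subjects]
print([round(w, 3) for w in weights], "mean:", round(mean(weights), 3))
```

When past confounders carry no information about exposure, numerator and denominator coincide and every weight equals 1. In a real cohort with correctly specified models the weights average close to 1, which is why a mean far from 1 is a warning sign; a handful of hypothetical subjects, as here, is not expected to show that.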
The search yielded 661 unique papers; 543 of these were excluded because they did not use marginal structural models to estimate the effect of an exposure on an outcome while accounting for time-varying confounding. The Figure shows the distribution of the resulting 118 papers by year. The number of articles that used marginal structural models to control for time-varying confounding increased from 2 in 2002 to 29 in 2008 (almost 15-fold); in comparison, the total number of publications listed in PubMed increased by only 46% over the same period. Of the 118 papers, 65 (55%) compared the numerical results of the marginal structural model and a conventional model, and an additional 5 papers (4%) reported the results qualitatively.
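The counts in this paragraph can be checked with a few lines of arithmetic; all inputs below are taken from the text above.

```python
retrieved, excluded = 661, 543
included = retrieved - excluded          # papers using MSMs for time-varying confounding
fold_increase = 29 / 2                   # MSM papers in 2008 relative to 2002
pct_numeric = 100 * 65 / included        # papers with numerical MSM vs conventional results
pct_qualitative = 100 * 5 / included     # papers with qualitative comparisons only
print(included, fold_increase, round(pct_numeric), round(pct_qualitative))
# 118 14.5 55 4
```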
The 65 papers that provided data to compare marginal structural models and conventional models reported 164 exposure-outcome associations. Of these, 18 (11%) yielded a marginal structural model (MSM) estimate and a conventional model estimate with opposite interpretations, ie, one model considered the exposure a risk factor and the other considered it a protective factor. For the remaining 146 associations (in which the direction of the effect was the same), the MSM estimate differed by at least 20% from the conventional estimate on the usual scale (eg, odds ratio or linear regression coefficient) in 58 (40%). Similar results were obtained when restricting the comparisons to linear regression coefficients (43 of 164 estimates, 26%): 6 (14%) yielded estimates with opposite interpretations, and the MSM estimates differed from the conventional estimates by at least 20% in 17 of the remaining 37 associations (46%).
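The 2 comparison criteria can be expressed as a small classification rule. This is a hypothetical sketch: the function name, the use of the null value (1 for ratio measures, 0 for regression coefficients) to decide the direction of an effect, and the use of the conventional estimate as the reference for the 20% difference are assumptions of this sketch, not code used in the review.

```python
def classify(msm, conventional, null_value, threshold=0.20):
    """Classify a pair of point estimates for the same exposure-outcome association.

    null_value: 1.0 for ratio measures (odds or hazard ratios), 0.0 for
    linear-regression coefficients.
    Returns "opposite" if the estimates lie on different sides of the null,
    "differs" if they differ by at least `threshold` relative to the
    conventional estimate, and "similar" otherwise.
    """
    if (msm - null_value) * (conventional - null_value) < 0:
        return "opposite"
    if abs(msm - conventional) >= threshold * abs(conventional):
        return "differs"
    return "similar"


# Hypothetical (MSM, conventional, null value) triples.
pairs = [(0.8, 1.3, 1.0), (1.5, 1.2, 1.0), (2.0, 2.1, 0.0)]
print([classify(*p) for p in pairs])  # ['opposite', 'differs', 'similar']
```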
Standard errors for both methods were available for 156 associations. The MSM standard errors were a median of 19% greater than the corresponding conventional standard errors (IQR: 2% to 48%).
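One way to compute such a summary is to express each MSM standard error as a percentage increase over its paired conventional standard error and then take the median and quartiles of those percentages. The function name and the input values below are hypothetical; only Python's standard `statistics` module is used.

```python
from statistics import median, quantiles


def se_increase_summary(se_msm, se_conventional):
    """Median and interquartile range of the percentage by which MSM standard
    errors exceed the paired conventional standard errors."""
    if len(se_msm) != len(se_conventional):
        raise ValueError("standard errors must be paired")
    pct = [100.0 * (m / c - 1.0) for m, c in zip(se_msm, se_conventional)]
    q1, _, q3 = quantiles(pct, n=4, method="inclusive")
    return median(pct), (q1, q3)


# Hypothetical paired standard errors, not data from the review.
print(se_increase_summary([0.12, 0.30, 0.25, 0.50], [0.10, 0.28, 0.25, 0.35]))
```

Summarizing per-association ratios, rather than comparing the median standard error of each method, keeps each MSM estimate matched to its own conventional counterpart.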
Of the original 118 papers, 30 were purely methodologic. The remaining 88 used marginal structural models to analyze a real dataset. Only 53 (60%) of these papers reported that stabilized inverse-probability weights were used, and only 28 (32%) reported verifying that the mean of the stabilized inverse-probability weights was close to one. Finally, almost half (46%) of the 88 papers concerned HIV treatment. The eAppendix (http://links.lww.com/EDE/A481) provides a list of the 118 publications that used marginal structural models.
This review of the literature shows that the number of published papers using marginal structural models has increased rapidly since the publication of 2 seminal papers on marginal structural models in 2000.3,4 Some reports from single studies have provided evidence that, compared with conventional models, marginal structural models yield results closer to those of randomized experiments.6,7 However, no systematic review of this difference in the literature has been undertaken. Our review shows that there are substantial differences between marginal structural model and conventional estimates in real studies in which time-varying confounding was suspected.
Controlling for time-varying confounding with marginal structural models came at a cost in precision: the MSM standard errors were a median of 19%, or nearly 20%, larger than the corresponding conventional standard errors. Cole and Hernán5 have described this increase in standard error as a trade-off between bias and precision.
Our review also shows that the reporting of marginal structural models can be improved. Researchers should routinely report whether they used stabilized inverse-probability weights and whether they verified that the mean of these weights was close to 1.0.
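A short numeric summary of the stabilized weights is enough to report these items. The sketch below is a hypothetical helper; in particular, the 0.1 tolerance for "close to 1" is an illustrative assumption, since no specific cut-off is prescribed in the text.

```python
from statistics import mean, pstdev


def weight_report(weights, tolerance=0.1):
    """Summary of stabilized weights for reporting alongside MSM results.

    tolerance: hypothetical cut-off for judging whether the mean is close to 1.
    """
    m = mean(weights)
    return {
        "mean": m,
        "sd": pstdev(weights),
        "min": min(weights),
        "max": max(weights),
        "mean_close_to_one": abs(m - 1.0) <= tolerance,
    }


# Hypothetical stabilized weights.
print(weight_report([0.82, 0.95, 1.01, 1.10, 1.30]))
```

Reporting the minimum and maximum alongside the mean helps readers judge whether a few extreme weights drive the results.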
The implications of the results of this review are not limited to observational studies with time-varying confounding. As noted by Toh and Hernán,8 randomized studies can also suffer from biases similar to those affecting observational studies if the study is randomized at baseline but postbaseline changes in treatment are not randomized. Thus, the use of marginal structural models is to be encouraged in that context as well.
Our review has some limitations. First, our search would have missed any papers that used marginal structural models without explicitly reporting doing so in the title, keywords, or abstract. However, marginal structural models are still viewed as an advanced and novel approach to data analysis, so it is likely that few papers would use this method without clearly acknowledging it. Second, marginal structural models provide marginal estimates, whereas conventional estimates are conditional. For odds ratios and hazard ratios, the 2 approaches can differ even in the absence of confounding.9 However, this is also true for the comparison of propensity scores versus multivariable regression models, and the differences found in the present review are larger than those previously reported for propensity scores.10,11 In addition, restricting our comparison to the linear regression coefficient estimates, for which marginal and conditional estimates coincide,12,13 resulted in differences similar to those found in the overall analysis.
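The noncollapsibility of the odds ratio can be checked numerically. In the hypothetical randomized setting below (all risks are illustrative assumptions), a binary covariate L is balanced between exposure groups, so there is no confounding, and the exposure has the same odds ratio of 3 in both strata of L; nevertheless, the marginal odds ratio is smaller. A common difference in means, by contrast, carries over unchanged to the marginal scale, which is why the restriction to linear regression coefficients is informative.

```python
def odds(p):
    return p / (1.0 - p)


def risk_from_odds(o):
    return o / (1.0 + o)


# Hypothetical randomized setting: P(L = 1) = 0.5, independent of exposure.
baseline_risk = {0: 0.2, 1: 0.6}   # risk among the unexposed in each stratum of L
conditional_or = 3.0               # same odds ratio in both strata

risk_exposed = {l: risk_from_odds(conditional_or * odds(r)) for l, r in baseline_risk.items()}
marginal_unexposed = 0.5 * baseline_risk[0] + 0.5 * baseline_risk[1]
marginal_exposed = 0.5 * risk_exposed[0] + 0.5 * risk_exposed[1]
marginal_or = odds(marginal_exposed) / odds(marginal_unexposed)
print(round(marginal_or, 2))  # 2.48, not 3.0

# On the linear (difference) scale, a common stratum-specific difference of 2.0
# is also the marginal difference.
mean_unexposed = {0: 5.0, 1: 9.0}
difference = 2.0
marginal_diff = (0.5 * (mean_unexposed[0] + difference) + 0.5 * (mean_unexposed[1] + difference)) - (
    0.5 * mean_unexposed[0] + 0.5 * mean_unexposed[1]
)
print(marginal_diff)  # 2.0
```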
Third, violations of the positivity assumption could lead to biased marginal structural model estimates.14 This point underlines the importance of checking that the mean of the stabilized inverse-probability weights is close to 1.0 and of exploring the impact of extreme weights. Fourth, there may be publication bias: if papers in which marginal structural models and conventional analyses gave similar results were more likely to report the results of only one of the 2 methods, our comparison would overstate the differences between the approaches.
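A common way to explore the impact of extreme weights is truncation: weights beyond chosen percentiles are reset to those percentiles and the analysis is repeated at several cut-offs, an approach discussed by Cole and Hernán.5 The helpers below are a hypothetical sketch; the 1st and 99th percentile defaults are an illustrative choice, and the percentile uses linear interpolation between sorted values.

```python
def percentile(values, pct):
    """Linear-interpolation percentile of a non-empty list (0 <= pct <= 100)."""
    if not values or not 0.0 <= pct <= 100.0:
        raise ValueError("need a non-empty list and 0 <= pct <= 100")
    ordered = sorted(values)
    position = (len(ordered) - 1) * pct / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def truncate_weights(weights, lower_pct=1.0, upper_pct=99.0):
    """Reset weights below/above the given percentiles to those percentiles."""
    low = percentile(weights, lower_pct)
    high = percentile(weights, upper_pct)
    return [min(max(w, low), high) for w in weights]


# Hypothetical weights with one extreme value.
print(truncate_weights([0.1, 1.0, 1.0, 1.0, 50.0], lower_pct=0.0, upper_pct=75.0))
```

Truncation trades some bias for lower variance, so how much the MSM estimate moves as the cut-offs tighten is itself informative about the influence of extreme weights.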
In conclusion, the findings of this review suggest that there are important differences between marginal structural model and conventional estimates in real settings and that reporting of marginal structural models can be improved. Applied researchers should use marginal structural models in their analyses whenever time-varying confounding is likely to occur, taking care to provide a complete description of their results.
We thank Samuel Rolfe for his help and an anonymous reviewer for the helpful comments.
1. Haro JM, Kontodimas S, Negrin MA, Ratcliffe M, Suarez D, Windmeijer F. Methodological aspects in the assessment of treatment effects in observational health outcomes studies. Appl Health Econ Health Policy.
2. Robins JM. Marginal structural models. In: 1997 Proceedings of the Section on Bayesian Statistical Science, Alexandria, VA: American Statistical Association, 1998;1–10.
3. Robins JM, Hernán MA, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology.
4. Hernán MA, Brumback B, Robins JM. Marginal structural models to estimate the causal effect of zidovudine on the survival of HIV-positive men. Epidemiology.
5. Cole SR, Hernán MA. Constructing inverse probability weights for marginal structural models. Am J Epidemiol.
6. Cook NR, Cole SR, Hennekens CH. Use of a marginal structural model to determine the effect of aspirin on cardiovascular mortality in the Physicians' Health Study. Am J Epidemiol.
7. Suarez D, Haro JM, Novick D, Ochoa S. Marginal structural models might overcome confounding when analyzing multiple treatment effects in observational studies. J Clin Epidemiol.
8. Toh S, Hernán MA. Causal inference from longitudinal studies with baseline randomization. Int J Biostat. 2008;4:Article 22.
9. Kaufman JS. Marginalia: comparing adjusted effect measures. Epidemiology.
10. Shah BR, Laupacis A, Hux JE, Austin PC. Propensity score methods gave similar results to traditional regression modeling in observational studies: a systematic review. J Clin Epidemiol.
11. Stürmer T, Joshi M, Glynn RJ, Avorn J, Rothman KJ, Schneeweiss S. A review of the application of propensity score methods yielded increasing use, advantages in specific settings, but not substantially different estimates compared with conventional multivariable methods. J Clin Epidemiol.
12. Gail M, Wieand S, Piantadosi S. Biased estimates of treatment effect in randomized experiments with nonlinear regression and omitted covariates. Biometrika.
13. Vellaisamy P, Vijaya V. Collapsibility of regression coefficients and its extensions. J Stat Plan Inference.
14. Kurth T, Walker AM, Glynn RJ, et al. Results of multivariable logistic regression, propensity matching, propensity adjustment, and propensity-based weighting under conditions of nonuniform effect. Am J Epidemiol.