Specifying the Correlation Structure in Inverse-Probability-Weighting Estimation for Repeated Measures

Tchetgen Tchetgen, Eric J.; Glymour, M. Maria; Weuve, Jennifer; Robins, James

doi: 10.1097/EDE.0b013e31825727b5

Departments of Epidemiology and Biostatistics Harvard School of Public Health Boston, MA (Tchetgen Tchetgen)

Department of Society, Human Development and Health Harvard School of Public Health Boston, MA (Glymour)

Rush Institute for Healthy Aging Department of Internal Medicine Rush University Medical Center Chicago, IL (Weuve)

Departments of Epidemiology and Biostatistics Harvard School of Public Health Boston, MA (Robins)

Supplemental digital content is available through direct URL citations in the HTML and PDF versions of this article. This content is not peer-reviewed or copy-edited; it is the sole responsibility of the author.

To the Editor:

Inverse-probability weighting has recently gained popularity as an intuitive and practical approach for estimation in the context of causal inference and missing-data problems in epidemiology. Inverse-probability weighting a person's data by the probability density for his or her observed exposure history is most commonly used in epidemiology to account for time-varying confounding when estimating the parameters indexing the joint causal effects of a time-varying exposure in a marginal structural model.1,2 Inverse-probability weights for dropout are similarly used when estimating the regression parameters of a right-censored outcome or to account for dependent forms of attrition in the analysis of repeated measures.3,4
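
As a minimal sketch of how dropout weights of this kind are typically built under monotone dropout: each visit's weight is the inverse of the cumulative probability of still being under observation at that visit. The function name `dropout_weights` and the input probabilities are hypothetical; in practice the conditional probabilities would come from a fitted dropout model, which is not shown here.

```python
from itertools import accumulate
from operator import mul


def dropout_weights(stay_probs):
    """Inverse-probability-of-dropout weights for one person.

    stay_probs[k] is an assumed, model-estimated probability of remaining
    in the study at visit k, given that the person was observed at visit
    k-1 and given the history observed up to then.
    Returns one weight per visit: 1 / P(still observed at that visit).
    """
    if any(not 0.0 < p <= 1.0 for p in stay_probs):
        raise ValueError("each conditional probability must lie in (0, 1]")
    # Cumulative product = probability of remaining through each visit.
    cumulative = accumulate(stay_probs, mul)
    return [1.0 / c for c in cumulative]


# Illustrative (made-up) probabilities for three visits.
weights = dropout_weights([1.0, 0.8, 0.5])
print(weights)  # approximately [1.0, 1.25, 2.5]
```

A person who is still observed at a visit that many similar people have left therefore counts for more in the weighted analysis.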

In studies of repeated outcomes, it is customary to account for dependence in the outcomes of a given person by specifying a working correlation structure for the person's outcomes and to subsequently estimate the mean regression parameters of main interest using generalized estimating equations that incorporate both the inverse-probability weights and the working correlation structure. In the absence of weights, it is well known that generalized estimating equations consistently estimate the parameters of a correctly specified regression model, irrespective of whether the working correlation structure is correct.

Here, we show that the situation is different when weights are present, and that regression estimates obtained from generalized estimating equations that are inverse-probability weighted can be biased even when the correlation structure is correct. Specifically, in an eAppendix, we focus on generalized estimating equations as implemented in PROC GENMOD in SAS,5 and we establish sufficient conditions for bias to occur in regression estimates, in the absence of modeling error, when inverse-probability weights are used to account for dependent dropout. In particular, the eAppendix provides a formal proof that regression estimates from a repeated measures analysis that is inverse-probability weighted for dropout will generally be biased unless at least 1 of the following 2 conditions holds:

Condition 1. The repeated outcomes are assumed to be uncorrelated, and therefore, the independence working correlation structure is used, or

Condition 2. The dropout process is independent of the repeated outcomes, given the covariates in the regression model.

The second condition will generally fail to hold in settings where, as we assume throughout, it is believed that the observed history of the outcome process predicts a person's chance of dropping out. When (as is typically the case in practice) the correlation structure is estimated from the data, the first condition may be modified to state that, for consistency, the estimated within-person correlation must converge to 0 as the sample size grows.
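
The role of the two conditions can be illustrated with a rough sketch. It is not the eAppendix proof, and all notation is assumed here rather than taken from the letter. Let $R_{it}$ indicate that person $i$ is still observed at visit $t$ (monotone dropout), let $\pi_{it}$ be the probability of that event given covariates $X_i$ and the outcomes before visit $t$, let $\mu_{it}(\beta)$ be the modeled mean of $Y_{it}$ given $X_i$, and let $d_{it}(\beta)$ be a function of $X_i$ only. Under working independence, each term of the weighted estimating equation has mean zero at the true $\beta$. If instead the weights are assumed to enter symmetrically through their square roots around a nonindependence working correlation with inverse elements $c_{st}$, cross-visit terms with $s<t$ generally do not have mean zero:

```latex
% Working independence: single-visit terms only
\sum_i \sum_t d_{it}(\beta)\,\frac{R_{it}}{\pi_{it}}\,\{Y_{it}-\mu_{it}(\beta)\} = 0,
\qquad
E\!\left[\frac{R_{it}}{\pi_{it}}\{Y_{it}-\mu_{it}(\beta)\}\,\middle|\,X_i\right]
 = E\!\left[Y_{it}-\mu_{it}(\beta)\,\middle|\,X_i\right] = 0 .

% Nonindependence (assumed square-root form), cross term for s < t,
% using R_{is}R_{it} = R_{it} under monotone dropout
E\!\left[c_{st}\,d_{is}(\beta)\,
   \frac{R_{it}}{\sqrt{\pi_{is}\,\pi_{it}}}\,\{Y_{it}-\mu_{it}(\beta)\}\,\middle|\,X_i\right]
 = c_{st}\,d_{is}(\beta)\,
   E\!\left[\sqrt{\frac{\pi_{it}}{\pi_{is}}}\,\{Y_{it}-\mu_{it}(\beta)\}\,\middle|\,X_i\right].
```

In this sketch the last expectation vanishes when $c_{st}=0$ for $s \neq t$ (condition 1) or when $\pi_{it}/\pi_{is}$ depends only on $X_i$ (condition 2). Otherwise the ratio depends on outcomes observed between visits $s$ and $t$, which are typically correlated with $Y_{it}$.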

In light of the aforementioned result, a simple strategy that allows more careful use of estimating equations to obtain asymptotically unbiased regression estimates is to impose condition 1 and ignore the correlation structure altogether for point estimation, ie, assume a possibly incorrect working independence correlation structure for estimation. The approach is akin to pooling multiple artificial studies, each study ending at a different follow-up time with corresponding dropout weights, and ignoring, for the purposes of point estimation, the fact that the same person may contribute to multiple such artificial studies. Robust standard errors or the bootstrap can then be used for inference. An alternative, equally simple approach is discussed in the eAppendix.
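
The pooled, working-independence strategy can be sketched in a few lines for the simplest case, a linear mean model with one covariate. This is an illustration under stated assumptions, not the analysis code used by the authors: the function name `fit_weighted_independence`, the record layout, and the toy numbers are all hypothetical, the weights are taken as already estimated, and the robust standard errors treat those weights as known.

```python
import math
from collections import defaultdict


def fit_weighted_independence(records):
    """Weighted working-independence fit of E[Y | x] = b0 + b1 * x.

    records: iterable of (person_id, x, y, weight), one row per observed
    person-visit; weight is that visit's inverse-probability weight.
    Returns ((b0, b1), (se_b0, se_b1)) with person-clustered robust SEs.
    """
    rows = list(records)
    s_w = s_wx = s_wxx = s_wy = s_wxy = 0.0
    for _, x, y, w in rows:
        s_w += w
        s_wx += w * x
        s_wxx += w * x * x
        s_wy += w * y
        s_wxy += w * x * y

    # Solve the 2x2 weighted normal equations (person-visits pooled).
    det = s_w * s_wxx - s_wx * s_wx
    if det == 0.0:
        raise ValueError("x must take at least two distinct values")
    inv = ((s_wxx / det, -s_wx / det), (-s_wx / det, s_w / det))
    b0 = inv[0][0] * s_wy + inv[0][1] * s_wxy
    b1 = inv[1][0] * s_wy + inv[1][1] * s_wxy

    # Sum weighted score contributions within each person, so that the
    # variance estimate respects within-person dependence.
    scores = defaultdict(lambda: [0.0, 0.0])
    for pid, x, y, w in rows:
        resid = y - b0 - b1 * x
        scores[pid][0] += w * resid
        scores[pid][1] += w * resid * x
    m00 = sum(u[0] * u[0] for u in scores.values())
    m01 = sum(u[0] * u[1] for u in scores.values())
    m11 = sum(u[1] * u[1] for u in scores.values())

    def robust_se(row):
        # Diagonal element of the sandwich: inv * meat * inv.
        a, b = row
        var = a * a * m00 + 2.0 * a * b * m01 + b * b * m11
        return math.sqrt(max(var, 0.0))

    return (b0, b1), (robust_se(inv[0]), robust_se(inv[1]))


# Made-up toy data: (person, visit time, outcome, dropout weight).
toy = [
    ("a", 0, 10.0, 1.0), ("a", 1, 9.0, 1.2), ("a", 2, 8.5, 1.5),
    ("b", 0, 11.0, 1.0), ("b", 1, 10.5, 1.1),
    ("c", 0, 9.0, 1.0),
]
(b0, b1), (se0, se1) = fit_weighted_independence(toy)
print(f"intercept={b0:.3f} slope={b1:.3f} se=({se0:.3f}, {se1:.3f})")
```

The point estimates never use a within-person correlation; dependence among a person's visits enters only through the clustered variance estimate, for which the bootstrap over persons is an alternative.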

We briefly illustrate the results of the previous section in an analysis of the effects of smoking on cognitive decline in an aging population subject to substantial attrition due to death and dropout for other reasons.6 In their paper, Weuve et al6 noted that selective attrition in this population may introduce bias into analyses of the effects of smoking status measured at the start of follow-up on cognitive decline. In an effort to appropriately account for selective attrition, Weuve and colleagues used inverse-probability-of-attrition weights and examined the influence of selective attrition on the estimated association of current smoking (vs. never smoking) with cognitive decline in participants of the Chicago Health and Aging Project (n = 3713), 65–109 years of age, who were current smokers or never-smokers, and underwent cognitive assessments up to 5 times at 3-year intervals.6 The Table compares their estimates obtained under an exchangeable correlation structure with estimates obtained under the independence correlation structure.

Although the point estimates contrasting smokers' and never-smokers' rates of cognitive decline appear to have been relatively robust to bias induced by the use of an exchangeable working correlation structure, the estimated 10-year cognitive decline for never-smokers was notably more sensitive to the correlation structure used in these analyses. Overall, the qualitative results of Weuve et al6 remain largely unchanged.

In the eAppendix, we discuss implications of our findings for weighted analyses of marginal structural mean models for repeated measures. Specifically, to avoid bias, we generally recommend that a possibly incorrect independence correlation structure be used in such analyses; we also discuss, in the eAppendix, an alternative approach that allows for use of a nonindependence working correlation structure when estimating marginal structural models.

Eric J. Tchetgen Tchetgen

Departments of Epidemiology and Biostatistics

Harvard School of Public Health

Boston, MA

M. Maria Glymour

Department of Society,

Human Development and Health

Harvard School of Public Health

Boston, MA

Jennifer Weuve

Rush Institute for Healthy Aging Department of Internal Medicine

Rush University Medical Center

Chicago, IL

James Robins

Departments of Epidemiology and Biostatistics

Harvard School of Public Health

Boston, MA


1. Robins JM. Marginal structural models. In: 1997 Proceedings of the American Statistical Association. Section on Bayesian Statistical Science. Alexandria, VA: American Statistical Association; 1998: 1–10.
2. Robins JM, Hernán M, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology. 2000;11:550–560.
3. Robins JM, Rotnitzky A. Semiparametric efficiency in multivariate regression models with missing data. J Am Stat Assoc. 1995;90:122–129.
4. Robins JM, Rotnitzky A, Zhao LP. Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. J Am Stat Assoc. 1995;90:106–121.
5. SAS Institute Inc. SAS/STAT 9.2 User's Guide. 2nd ed. Cary, NC: SAS Institute Inc.; 2009.
6. Weuve J, Tchetgen Tchetgen EJ, Glymour MM, et al. Accounting for bias due to selective attrition: the example of smoking and cognitive decline. Epidemiology. 2012;23:119–128.

© 2012 Lippincott Williams & Wilkins, Inc.