An R Package for G-estimation of Structural Nested Mean Models

Wallace, Michael P.; Moodie, Erica E. M.; Stephens, David A.

doi: 10.1097/EDE.0000000000000586
Letters

Supplemental Digital Content is available in the text.

Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC, Canada, michael.wallace@uwaterloo.ca

Department of Mathematics and Statistics, McGill University, Montreal, QC, Canada

Supported by Discovery Grants from the Natural Sciences and Engineering Research Council (NSERC) of Canada. The author E. E. M. M. is supported by a Chercheur-boursier career award from the Fonds de recherche du Québec – Santé (FRQS).

The authors report no conflicts of interest.

Supplemental digital content is available through direct URL citations in the HTML and PDF versions of this article (www.epidem.com).


To the Editor:

Structural nested mean models are a useful tool for estimating the effect of time-varying treatments, a challenge made more difficult by the presence of treatment-dependent confounders. We consider the situation where data are measured on, and a treatment assigned to, subjects at a number of distinct time points (or stages). We wish to identify the effect of treatment at each stage on a final (continuous) outcome, when all time-varying confounders are correctly measured, with each treatment’s effect characterized by a structural nested mean model. For example, consider a study of the effect of activity level (the treatment) on blood pressure (the outcome), with data collected by repeated questionnaires over time. We might expect activity level to be associated with blood pressure, but other factors such as age or body mass index may interact with treatment, potentially obscuring its effect. A structural nested mean model for the treatment effect can take such interactions into account.

G-estimation¹ is an estimating equation-based approach used to estimate the parameters of structural nested mean models, although it has wider applications. Despite theoretical advantages over alternative approaches,²⁻⁵ it has seen relatively little use, owing to its typically strongly theoretical presentation and challenging implementation. We have therefore derived simplified theory, reducing G-estimation to straightforward matrix equations, and produced an accompanying R package, DTRreg.⁶ Theoretical details are included as Supplementary Material (http://links.lww.com/EDE/B132).

We demonstrate G-estimation and DTRreg with a simulation study; a real data analysis is included in the Supplementary Material (http://links.lww.com/EDE/B133). We consider a three-stage example with binary treatments at stages 1 and 3, continuous treatment at stage 2, and treatment–covariate interaction:

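  • Stage 1. Covariate: [equation not recovered]; treatment: [equation not recovered];
  • Stage 2. Covariate: [equation not recovered]; treatment: [equation not recovered];
  • Stage 3. Covariate: [equation not recovered]; treatment: [equation not recovered]; and
  • Outcome: [equation not recovered]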

where for simplicity, we set all parameters equal to 1.

Treatment effects are therefore characterized by structural nested mean models [equation not recovered], and we seek estimates [equation not recovered] for [equation not recovered] and [equation not recovered], so that the effect of assigning a patient with covariate [equation not recovered] the treatment a_j is estimated by [equation not recovered].

In addition, at each stage, we consider a treatment-free model, characterizing the expected outcome assuming no treatment (a_j = 0) at that and all subsequent stages (denoted G_j):

  • Stage 1: [equation not recovered];
  • Stage 2: [equation not recovered]; and
  • Stage 3: [equation not recovered].
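The idea of a treatment-free outcome can be sketched in a few lines of Python (used here purely for illustration; this is not the DTRreg interface). The sketch assumes a linear blip of the form a(ψ0 + ψ1·x), which is an assumption made for the example, and the function names, the single subject, and the parameter values are all hypothetical. In practice the ψ values are unknown and are replaced by their estimates.

```python
def blip(a, x, psi):
    """Assumed linear blip a * (psi0 + psi1 * x); it is zero whenever a == 0."""
    return a * (psi[0] + psi[1] * x)


def treatment_free_outcome(y, stages):
    """Remove the blips of the given stages from the observed outcome y.

    stages: list of (a, x, psi) tuples for stage j and every later stage.
    """
    return y - sum(blip(a, x, psi) for a, x, psi in stages)


# One hypothetical subject observed over three stages (illustrative values only).
stages = [
    (1, 0.5, (1.0, 1.0)),     # stage 1: binary treatment received
    (2.0, -0.2, (1.0, 1.0)),  # stage 2: continuous treatment
    (0, 1.3, (1.0, 1.0)),     # stage 3: binary treatment not received
]
y = 7.0

g1 = treatment_free_outcome(y, stages)      # no treatment from stage 1 onward
g3 = treatment_free_outcome(y, stages[2:])  # no treatment at stage 3 only
```

Because the stage 3 treatment is zero for this subject, removing the stage 3 blip leaves the outcome unchanged, whereas removing all three blips lowers it by the stage 1 and stage 2 effects.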

Our final component is the treatment model: the expected value of treatment given prior information. At stages 1 and 3 (binary treatment), we estimate this via logistic regression; at stage 2 (continuous treatment), we use linear regression.
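As a rough illustration of these two treatment-model fits, the following standard-library Python sketch fits a logistic regression (by Newton-Raphson) for a binary treatment and a least-squares line for a continuous one, each on a single covariate. The simulated data, coefficient values, and function names are hypothetical and are not taken from the letter or from DTRreg. The fitted values play the role of the expected treatment given prior information.

```python
import math
import random


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, a, iters=25):
    """Newton-Raphson fit of logit P(A = 1 | x) = b0 + b1 * x."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, ai in zip(x, a):
            p = expit(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += ai - p               # score for the intercept
            g1 += (ai - p) * xi        # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def fit_linear(x, a):
    """Least-squares fit of E[A | x] = c0 + c1 * x."""
    n = len(x)
    xm, am = sum(x) / n, sum(a) / n
    c1 = sum((xi - xm) * (ai - am) for xi, ai in zip(x, a)) / sum((xi - xm) ** 2 for xi in x)
    return am - c1 * xm, c1


rng = random.Random(2017)
n = 5000
x = [rng.gauss(0, 1) for _ in range(n)]
a_bin = [1 if rng.random() < expit(xi) else 0 for xi in x]   # binary treatment
a_con = [1.0 + xi + rng.gauss(0, 1) for xi in x]             # continuous treatment

b0, b1 = fit_logistic(x, a_bin)
c0, c1 = fit_linear(x, a_con)

# Residuals "treatment minus fitted expected treatment" are what G-estimation uses.
resid_bin = [ai - expit(b0 + b1 * xi) for xi, ai in zip(x, a_bin)]
resid_con = [ai - (c0 + c1 * xi) for xi, ai in zip(x, a_con)]
```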

G-estimation for such an analysis boasts the double-robustness property: our structural nested mean model parameter estimators at each stage are consistent if at least one of the treatment or treatment-free models is correctly specified. To demonstrate, we conduct our analysis with a misspecified treatment-free model at stage 3, a misspecified treatment model at stage 2, and both models misspecified at stage 1. Misspecification is achieved by omitting all covariates from the affected models.
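Double robustness can be checked numerically with a single-stage sketch. The Python code below is a minimal illustration under stated assumptions, not the authors' implementation or the DTRreg interface: the data-generating model, the linear blip a(ψ0 + ψ1·x), the continuous treatment, and all function names are hypothetical. The estimator solves the linear system (WᵀD)θ = WᵀY, where D stacks the treatment-free covariates with a·(1, x), and W replaces a with a minus its fitted expectation.

```python
import random


def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [u - f * v for u, v in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def cross(W, D):
    """Compute the matrix product W^T D for row-wise lists."""
    return [[sum(w[j] * d[k] for w, d in zip(W, D)) for k in range(len(D[0]))]
            for j in range(len(W[0]))]


def g_estimate(x, a, y, treat_covs, tf_covs):
    """One-stage G-estimation for the assumed blip a * (psi0 + psi1 * x).

    treat_covs / tf_covs map x_i to the design row of the (linear) treatment
    model and of the treatment-free model. Returns [psi0_hat, psi1_hat].
    """
    Ht = [treat_covs(xi) for xi in x]
    alpha = solve(cross(Ht, Ht), [sum(h[j] * ai for h, ai in zip(Ht, a))
                                  for j in range(len(Ht[0]))])
    a_hat = [sum(c * h for c, h in zip(alpha, row)) for row in Ht]
    D = [tf_covs(xi) + [ai, ai * xi] for xi, ai in zip(x, a)]
    W = [tf_covs(xi) + [ai - ah, (ai - ah) * xi] for xi, ai, ah in zip(x, a, a_hat)]
    WtY = [sum(w[j] * yi for w, yi in zip(W, y)) for j in range(len(W[0]))]
    return solve(cross(W, D), WtY)[-2:]


# Hypothetical data: confounded continuous treatment, true psi = (1, 1).
rng = random.Random(1)
n = 20000
x = [rng.gauss(0, 1) for _ in range(n)]
a = [xi + rng.gauss(0, 1) for xi in x]
y = [xi + ai * (1.0 + xi) + rng.gauss(0, 1) for xi, ai in zip(x, a)]


def with_x(xi):
    return [1.0, xi]      # model that includes the covariate


def no_x(xi):
    return [1.0]          # misspecified: covariate omitted


psi_tf_wrong = g_estimate(x, a, y, treat_covs=with_x, tf_covs=no_x)
psi_treat_wrong = g_estimate(x, a, y, treat_covs=no_x, tf_covs=with_x)
psi_both_wrong = g_estimate(x, a, y, treat_covs=no_x, tf_covs=no_x)
```

With a large sample, the first two fits should land near the true values (1, 1), while the fit with both models misspecified should not. Under this particular hypothetical model, a short moment calculation gives a limit of 1.5 rather than 1 for the first blip parameter.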

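  • Stage 1 (both misspecified). Treatment: [equation not recovered] (fit by logistic regression). Treatment-free: [equation not recovered];
  • Stage 2 (treatment model misspecified). Treatment: [equation not recovered] (fit by linear regression). Treatment-free: [equation not recovered];
  • Stage 3 (treatment-free model misspecified). Treatment: [equation not recovered] (fit by logistic regression). Treatment-free: [equation not recovered].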

We can estimate the structural nested mean model parameters at each stage in a step-by-step fashion, either manually through matrix equations (eAppendix; http://links.lww.com/EDE/B134), or through our R package (Figure). Analyzing 1,000 datasets of size n = 1,000, we obtain mean estimates [equation not recovered], [equation not recovered], and [equation not recovered] at stages 1, 2, and 3, respectively. As expected, the estimators appear consistent when either the treatment or treatment-free model was correctly specified (stages 2 and 3), but not when both were misspecified (stage 1). Inference may be pursued by either the bootstrap or sandwich-based approaches.
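Bootstrap inference can be sketched as follows. This Python example is a deliberately simplified illustration, not the procedure implemented in DTRreg: it assumes a single stage, a randomized binary treatment, a one-parameter blip a·ψ, and intercept-only treatment and treatment-free models, for which the G-estimating equations have a closed-form solution. The data, the true value ψ = 1, and the function names are hypothetical.

```python
import random


def g_est_psi(a, y):
    """Closed-form G-estimate of psi for the blip a * psi.

    With intercept-only treatment and treatment-free models, the estimating
    equations reduce to psi = sum((a - abar) * y) / sum((a - abar) * a).
    """
    a_bar = sum(a) / len(a)
    num = sum((ai - a_bar) * yi for ai, yi in zip(a, y))
    den = sum((ai - a_bar) * ai for ai in a)
    return num / den


def percentile_bootstrap(a, y, n_boot=500, level=0.95, seed=0):
    """Nonparametric percentile interval: resample subjects with replacement."""
    rng = random.Random(seed)
    n = len(a)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(g_est_psi([a[i] for i in idx], [y[i] for i in idx]))
    stats.sort()
    lower = stats[int((1 - level) / 2 * n_boot)]
    upper = stats[int((1 + level) / 2 * n_boot) - 1]
    return lower, upper


# Hypothetical randomized study with true psi = 1.
rng = random.Random(42)
n = 500
x = [rng.gauss(0, 1) for _ in range(n)]
a = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
y = [1.0 + xi + ai * 1.0 + rng.gauss(0, 1) for xi, ai in zip(x, a)]

psi_hat = g_est_psi(a, y)
ci_lower, ci_upper = percentile_bootstrap(a, y)
```

In a multistage analysis, the whole stage-by-stage estimation would be repeated within each bootstrap resample, so that uncertainty from the later stages carries through to the earlier ones.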

FIGURE

Structural nested mean models are a valuable, but underused, alternative to more established modeling techniques, with G-estimation one approach for parameter estimation within this framework. Through simplified theory, or our computational routine, G-estimation may be implemented with ease, and we encourage practitioners to consider its use in future analyses.


ACKNOWLEDGMENT

The authors thank the National Heart, Lung, and Blood Institute for allowing access to data from the Honolulu Heart Program.

Michael P. Wallace

Erica E. M. Moodie

Department of Epidemiology, Biostatistics and Occupational Health

McGill University

Montreal, QC, Canada

michael.wallace@uwaterloo.ca

David A. Stephens

Department of Mathematics and Statistics

McGill University

Montreal, QC, Canada


REFERENCES

1. Robins JM. Optimal structural nested models for optimal sequential decisions. In: Lin D, Heagerty P, eds. Proceedings of the Second Seattle Symposium on Biostatistics. New York, NY: Springer-Verlag; 2004:189–326.
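2. Robins JM. Marginal structural models versus structural nested models as tools for causal inference. In: Halloran ME, Berry D, eds. Statistical Models in Epidemiology, the Environment, and Clinical Trials. New York, NY: Springer; 1999:95–134.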
3. Robins JM, Hernán MA, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology. 2000;11:550–560.
4. Robins JM, Hernán MA. Estimation of the causal effects of time-varying exposures. In: Fitzmaurice G, Davidian M, Verbeke G, Molenberghs G, eds. Longitudinal Data Analysis. Boca Raton, FL: CRC Press; 2009:553–599.
5. Joffe MM. Structural nested models, g-estimation, and the healthy worker effect: the promise (mostly unrealized) and the pitfalls. Epidemiology. 2012;23:220–222.
6. Wallace MP, Moodie EEM, Stephens DA. DTRreg: DTR Estimation and Inference via G-Estimation, Dynamic WOLS, and Q-Learning, 2016. Available at: https://cran.r-project.org/. Accessed November 1, 2016.

Copyright © 2017 Wolters Kluwer Health, Inc. All rights reserved.