In this issue of EPIDEMIOLOGY, Chevrier et al^{1} apply G-estimation to estimate the effect of exposure to metalworking fluids when a healthy-worker survivor effect is present. This is the first application of G-estimation to this problem in the more than 20 years since G-estimation was first proposed.^{2} Like G-computation (or the G-formula), G-estimation has been much less popular in practice than the more recently proposed marginal structural models. In this commentary, I discuss possible reasons for the relative unpopularity of G-estimation in practice, as well as some little recognized advantages of the approach compared with competing methods.

##### Pitfalls of G-estimation: Unfamiliarity and Artificial Censoring

G-estimation is a semiparametric way to estimate parameters in structural nested models. These models include structural nested mean models,^{3},^{4} structural nested distribution models,^{4} structural nested failure time models,^{2},^{5},^{6} and the newly developed structural nested cumulative failure time models.^{7} These models differ from the more familiar standard regression models and marginal structural models in important ways. In the more familiar models, covariates may be used in the structural model both as main (statistical) effects and as modifiers of the causal effects of the treatments of interest; however, only baseline covariates may be included if one wishes to retain a causal interpretation for the joint effects of treatments provided at different times. In contrast, structural nested models directly parameterize only the effect of a possibly sequential series of interventions on the outcome of interest. Thus, nontreatment covariates enter the structural model only as modifiers of the effect of treatment, not as main effects; in particular, time-varying covariates that modify the effects of subsequent treatments may enter the model as effect modifiers.

For outcomes measured at the end of a fixed period or for repeated-measures outcomes, the form of these models can resemble that of standard regression models. However, structural nested failure time models do not directly estimate hazard ratios or other relative-risk parameters that are ubiquitous in epidemiology and applied survival analysis. The accelerated failure time model with time-varying covariates is essentially the only structural nested failure time model that has been used in practice. Conversion between the parameters in the accelerated failure time model used most often in conjunction with G-estimation and parameters in the proportional hazards model may be done in 1 of 2 ways. If the treatment-free survival times follow a Weibull distribution, there is a direct correspondence between the parameters in the accelerated failure time model and the hazard ratio.^{6} The Weibull model is restrictive; for example, it cannot accommodate hazards that start at a nonzero value at the beginning of follow-up and increase over time. When failure times do not follow a Weibull model, there will be a separate causal hazard ratio at each follow-up time. One can then characterize that hazard ratio as a function of time, which is unwieldy and difficult to summarize, or estimate the hazard ratio at one or a few specific times, as done by Chevrier et al.^{1} This approach is of greatest value if the hazard ratio does not vary too much over time. It is not clear from the paper how representative the reported hazard ratio is of the set of hazard ratios over the full follow-up, and it would be useful to report hazard ratios at several times if the hazard ratio varies substantially with time.
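To make the Weibull correspondence concrete, the sketch below (my own illustrative Python example, with parameter values chosen purely for exposition) checks numerically that when exposure rescales survival time by a factor e^{ψ} in a Weibull accelerated failure time model with shape parameter k, the implied hazard ratio is constant over time and equals e^{−kψ}:

```python
import numpy as np

def weibull_hazard(t, lam, k):
    """Hazard of a Weibull distribution with scale lam and shape k:
    h(t) = (k / lam) * (t / lam)**(k - 1)."""
    return (k / lam) * (t / lam) ** (k - 1)

# Illustrative (assumed) values: shape k, scale lam, AFT parameter psi.
k, lam, psi = 1.8, 10.0, 0.5
t = np.array([1.0, 5.0, 20.0])  # check the ratio at several follow-up times

# Under the accelerated failure time model, exposure rescales time,
# i.e., the Weibull scale changes from lam to lam * exp(psi).
hr = weibull_hazard(t, lam * np.exp(psi), k) / weibull_hazard(t, lam, k)

# hr is the same at every t and equals exp(-k * psi), the direct
# Weibull correspondence between AFT parameter and hazard ratio.
```

The same calculation with a non-Weibull treatment-free distribution would yield a ratio that varies with `t`, which is why a single reported hazard ratio is then only a time-specific summary.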

G-estimation of structural nested failure time models suffers further from the consequences of artificial censoring.^{8} To test a hypothesis about causal effects without making parametric assumptions beyond those in the structural nested failure time model, some subjects whose failure is observed must be treated as if they were censored; subjects artificially censored for testing one hypothesis about causal effects may not be censored for testing other hypotheses.^{5},^{6} The result is estimating functions that are not smooth in the parameters being tested or estimated, which has several consequences. First, common algorithms (eg, Newton-Raphson) for finding solutions to the estimating functions may not work. Second, estimates of the variance of the estimator become problematic in practice. Third, the problems with artificial censoring may be even deeper when, as is nearly inevitable, the models for causal effects are misspecified, because the optimization criterion may then no longer be appropriate.

The first 2 problems are not, in principle, serious limitations when estimating a causal model with a single parameter. One can use line-search methods to obtain point estimates and confidence intervals; an implementation is available in Stata.^{9} However, if one is trying to estimate a complex causal model in which the effect of treatment may be modified by covariates and prior treatment, or in which there are dose-response functions for nonbinary treatments, grid searches quickly become infeasible. Sometimes one can obtain estimates for individual parameters without a high-dimensional grid search; in particular, if there are separate parameters for the effects of earlier and later interventions given in sequence, one can estimate the parameters in reverse chronological sequence.^{10} The problems with high-dimensional optimization can be mitigated to some degree by using estimating equations that are smoother functions of the parameters being estimated, but this cannot always be relied on.^{8} In the application presented by Chevrier et al,^{1} only one parameter is estimated. This will often be a reasonable approximation to the truth in occupational health, where an exposure may be uniformly harmful and its effects may not vary much with measured covariates. In contrast, when sorting out complex questions about the clinical management of patients, where treatment may be beneficial or harmful in different subgroups and dose-response functions are nonlinear, such 1-parameter models may not answer useful clinical questions, and G-estimation may not be useful in that setting. Artificial censoring is not necessary for G-estimation in the other classes of structural nested models; thus, the resulting problems do not complicate estimation of those models.
G-estimation of structural nested cumulative failure time models may thus be able to substitute for noncumulative models in some settings with failure times as outcomes.^{7} Alternatively, parametric methods, despite their pitfalls, might be used to estimate the noncumulative models. Some SAS macros are available for G-estimation of structural nested distribution models in the context of mediation analyses.^{11}
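As a concrete illustration of the 1-parameter case, the following sketch (my own simulated Python example, not the authors' analysis, and simplified to a point exposure with no censoring) carries out G-estimation of a structural nested accelerated failure time model by line search. The logic is the one described above: at the true ψ, the back-calculated treatment-free failure time H(ψ) = T·e^{ψA} is conditionally independent of treatment A given the confounder L, so an estimating function built from that independence crosses zero near the true value:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Simulated point-exposure data with confounding by L (illustrative only).
L = rng.binomial(1, 0.5, n)                      # binary confounder
A = rng.binomial(1, 0.3 + 0.4 * L, n)            # exposure depends on L
T0 = rng.exponential(scale=10.0 / (1.0 + L))     # treatment-free failure time, depends on L
true_psi = 0.5
T = T0 * np.exp(-true_psi * A)                   # observed time: exposure shortens survival

# E[A | L], estimated within strata of the discrete confounder.
pA = np.where(L == 1, A[L == 1].mean(), A[L == 0].mean())

def estimating_function(psi):
    """Mean of H(psi) * (A - E[A|L]); zero in expectation at the true psi,
    because H(true_psi) = T0 is independent of A given L."""
    H = T * np.exp(psi * A)                      # back-calculated treatment-free time
    return np.mean(H * (A - pA))

# One-dimensional line search over a grid, as is typical with one parameter.
grid = np.linspace(-1.0, 2.0, 301)
vals = np.array([estimating_function(p) for p in grid])
psi_hat = grid[np.argmin(np.abs(vals))]          # value where the function is closest to zero
```

With several parameters the search would be over a grid in several dimensions, which is what makes the approach infeasible for complex causal models; a confidence interval can be read off as the set of ψ values for which the associated test statistic is not rejected.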

##### Advantages of G-estimation: Flexibility in Modeling and Assumptions

The authors justify the use of G-estimation by comparing the approach with variants of standard methods, citing confounding by variables affected by treatment as the reason for using G-estimation. Further, they demonstrate nicely that appropriate methods give different and more reasonable estimates in their application. In the first applied paper to use G-estimation for the healthy-worker survivor effect, this demonstration is appropriate. In general, after many papers on the subject of time-varying confounders, lengthy justification for using appropriate estimation approaches in this setting should no longer be necessary in the epidemiology literature (although it may still be necessary for papers in various clinical specialties).

Marginal structural models have become the de facto standard for dealing with time-varying confounding, so an explanation of the reasons for using G-estimation instead would often be worthwhile; the authors have not provided one. In occupational health settings like those considered here, estimation of standard marginal structural models typically runs afoul of the required positivity assumption, because people who leave work have no subsequent exposure^{12}; thus, G-estimation or G-computation is preferable here. General marginal structural models^{7} might also be used productively here; for example, one might use them to contrast outcomes under regimes of the type: “continue to be exposed for *t* years unless one leaves work.”

Structural nested models and G-estimation do have some important but sometimes little-recognized advantages over marginal structural models and G-computation. (Further discussion of the drawbacks of structural nested models and G-estimation is available elsewhere.^{7},^{13}) Robins and Hernan^{7} note that structural nested models allow one to model modification of the effects of treatments provided after baseline by postbaseline time-varying covariates. (History-adjusted marginal structural models, themselves a modification of marginal structural models, somewhat mitigate this restriction.) G-estimation of structural nested models is also not adversely affected by the large weights that can characterize estimation of marginal structural models, especially when there is an extended period of follow-up. Further, G-estimation may be used when treatment assignment is confounded, provided that: (1) there is a baseline instrument (eg, randomization in a randomized trial),^{3},^{14},^{15} (2) there is a time-varying instrument,^{7} or (3) sufficient confounders are measured to render treatment assignment unconfounded in an identifiable subset of the person-time experience in a study, but not in other subsets.^{16}^{–}^{18} Thus, the assumptions used in estimation of structural nested models can be tailored to fit the specific conditions of the study without resorting to speculation about the degree of departure of treatment assignment from exchangeability or ignorability, which is required with marginal structural models but difficult in practice. Finally, G-estimation may be used to reduce the sensitivity of inference to error in the measurement of treatment in identifiable subsets of the data.^{18}

Finally, structural nested distribution models and a variant of G-estimation can sometimes be used to deal with situations in which sufficient information on confounders is lacking in all subsets of the data.^{19} When there are repeated measures of the outcome, future outcomes may sometimes serve as a proxy for the confounding variables used in making treatment decisions but recorded incompletely at best. This issue may arise when treatment decisions are made more frequently than their determinants are recorded for research—a common situation in epidemiology.

In conclusion, structural nested models and G-estimation have substantial promise that, so far, has remained incompletely realized in practice. Recent work has identified additional theoretical advantages of the approach. Nonetheless, good software is needed to make the method more widely used, as are convincing applications of the methods to practical problems, especially where other approaches fail. The paper by Chevrier et al^{1} is a useful step in this direction.