When estimating the effect of an exposure on an outcome in the time-varying setting, epidemiologists routinely target the average causal effect, which compares counterfactual outcomes had one intervened to expose all versus none of a sample across all time points. However, there is a growing recognition that the average causal effect is an unrealistic contrast because, even in time-fixed settings, it is difficult to imagine an intervention that would result in all individuals in a population being exposed or unexposed.1 The interventions implied by the average causal effect are even more challenging to imagine in the time-varying setting, where we have to assume we can set all individuals to being exposed at all time points. Furthermore, the longer we follow participants, the more unrealistic such an intervention becomes.
Fortunately, there exist alternative causal estimands to the average causal effect. One such estimand—the incremental effect—allows us to estimate the effect of shifting each individual’s probability of being exposed, instead of intervening on the exact, fixed value of the exposure as in the average causal effect.2–4 Estimation of incremental effects has several advantages. First, depending on the exposure of interest and target population, this approach may better reflect the impact of a realistic public health intervention. For example, realistic interventions to increase patient adherence to medication (e.g., daily cellphone notifications) might on average increase patients’ likelihood of taking their medication but are unlikely to result in all patients always adhering. Interventions that would lead to perfect adherence (e.g., daily nurse visits) are likely to be costly or unethical. Exposure interventions are also not typically applied uniformly in populations, so a fixed intervention like that imagined by the average causal effect may not be of practical policy interest.3 Second, incremental effects can be estimated using double-robust methods, which enable balancing the tradeoff between bias from the curse of dimensionality and bias from potential statistical model misspecification. Specifically, the approach we use here can achieve optimal statistical properties (root-n convergence) regardless of the number of timepoints—even when implemented with flexible machine learning tools.4,5 Third, identification of incremental effects does not require meeting the positivity assumption, which makes this an attractive estimator in settings where either structural or random violations of positivity may be likely.6
Prior work has described the theory and motivation behind the estimation of incremental effects.2,4 Naimi et al.3 demonstrated how to estimate incremental effects in time-fixed settings, using as an example the effect of increasing vegetable intake on the risk of preeclampsia. Here, we demonstrate how to estimate these effects in longitudinal data with a time-varying exposure, time-varying confounding, and drop-out. We build on the applied example in Kim et al.,4 by describing in depth how to estimate the effect of taking preconception low-dose aspirin on the incidence of pregnancy in the Effects of Aspirin in Gestation and Reproduction (EAGeR) trial. This work was motivated by the challenge of analyzing the effects of a time-varying exposure (adherence to aspirin) that suffered from nonpositivity as follow-up accrued.
The EAGeR trial was a double-blind trial, designed to investigate whether taking preconception low-dose aspirin had an impact on pregnancy outcomes.7,8 The study enrolled 1228 women at high risk for pregnancy loss and randomized women 1:1 to receive 81 mg of aspirin or placebo; all women additionally received 400 mcg of folic acid. Participants were followed up for 6 menstrual cycles if they did not become pregnant and, if they did become pregnant, throughout pregnancy. Women were allowed to leave the study at any point during follow-up. The trial’s primary outcome was live birth, with additional outcomes of interest including pregnancy and preterm birth. EAGeR participants provided written informed consent to participate in the trial. Our secondary analysis fell under the approval of the Institutional Review Board of the University of Pittsburgh, who deemed the work not human subjects’ research.
Here, we focus on the incidence of pregnancy by 26 weeks of follow-up (approximately six menstrual cycles). Pregnancy was determined by either a positive result on a “real-time” urine pregnancy test carried out at home or at a study visit or from urine testing conducted on stored samples after study completion.7 During this time period, 116 (9.5%) of the 1226 women included in our analysis dropped out of the study (two women enrolled in the study were excluded due to all missing data).
The intention-to-treat analysis of the EAGeR trial reported a small increase in rates of pregnancy among those assigned to aspirin relative to placebo.8 However, noncompliance with assigned treatment was noted and increased as follow-up accrued (Figure 1A). Thus, a per-protocol analysis was conducted that assessed the effect on pregnancy outcomes of being assigned to aspirin and complying in each week of follow-up versus being assigned to placebo and always complying.9 Compliance with randomized treatment was determined based on bottle weight measurements and was defined as taking an assigned pill 5 out of 7 days in a given week. This per-protocol analysis reported a small increase in the incidence of pregnancy among those compliant with aspirin, relative to those compliant with placebo; specifically, the estimated risk difference was 7.8% (95% CI = 4.6%, 11%).
Defining the Causal Effect of Interest
We often define the per-protocol effect as the effect of assigning everyone to a given treatment and intervening to ensure they always comply with a specified protocol versus assigning everyone to a comparator treatment and intervening to ensure they always comply:10

E[Y^(z=1, ā=1)] − E[Y^(z=0, ā=1)]
where Y indicates the outcome (here, pregnancy), Z indicates randomization to treatment (here, aspirin or placebo), and Ā indicates compliance to protocol across follow-up (throughout, the overbar denotes past history for a variable). The per-protocol effect is thus a special type of average causal effect that contrasts treatment regimens requiring all participants always be compliant. The exposure in a per-protocol analysis is analogous to the exposure in an observational study, with the difference being that we only need to control for the confounders of the effect of postbaseline compliance Ā on Y (as there are no confounders of the effect of Z on Y). Given that compliance to the protocol is generally a time-varying variable, g-methods such as inverse probability weighting, g-computation, or a double-robust alternative are needed to control for the presence of time-varying confounders that might be affected by past treatment.11,12
However, suppose we thought it unrealistic to model an intervention that would force all women to comply in all weeks of follow-up, but we thought we could instead model an intervention that would increase women’s probability of complying. If this were the case, we might be interested in targeting an incremental effect, rather than the average causal effect. Incremental effects in the longitudinal setting have been described in detail elsewhere; here, we provide an overview of the method in the context of our application.4
In the EAGeR per-protocol analysis, we could estimate the risk of the outcome through a certain time (τ) under an intervention that shifts each woman’s probability of being compliant with treatment by a specified odds ratio (δ). The usual time-varying propensity score (probability of being exposed) takes the form:

π_t(h_t) = Pr(A_t = 1 | H_t = h_t, R_t = 1)
where A_t is the exposure at time t; H_t is the set of all relevant historical variables through t, including past exposure, time-varying confounders, and baseline confounders; and R_t is an indicator of not having dropped out of the study through t. We will hereafter define A_t = 1 as “complied with aspirin” (i.e., randomized to aspirin and compliant in week t).
To shift π_t(H_t) by δ, we use the shifted propensity score:

q_t(H_t) = δπ_t(H_t) / [δπ_t(H_t) + 1 − π_t(H_t)]
Rewriting this equation, we can see that δ is interpreted as the odds ratio comparing the odds of exposure under the shifted propensity score to the odds of exposure under the observed propensity score:

δ = [q_t(H_t) / (1 − q_t(H_t))] / [π_t(H_t) / (1 − π_t(H_t))]
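The algebra behind this interpretation is a one-line rearrangement (shown here in our notation, suppressing the argument H_t):

```latex
\frac{q_t}{1-q_t}
= \frac{\delta\pi_t/(\delta\pi_t + 1 - \pi_t)}{(1-\pi_t)/(\delta\pi_t + 1 - \pi_t)}
= \delta \cdot \frac{\pi_t}{1-\pi_t}
\quad\Longrightarrow\quad
\delta = \frac{q_t/(1-q_t)}{\pi_t/(1-\pi_t)}
```

That is, the intervention multiplies each woman’s odds of exposure by δ, whatever her covariate history.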
We are then interested in estimating what the average counterfactual outcome (i.e., the counterfactual risk of the outcome) would be under exposure to the shifted propensity score across follow-up:

ψ(δ) = Pr(Y^(Q̄_τ) = 1)

where Q_t represents random draws from the conditional exposure distributions that have been shifted by δ. Unlike the average causal effect above, which estimates the effect of a deterministic intervention on the exposure, this estimand represents a dropout-adjusted stochastic intervention through the shifted propensity score q_t(H_t). We can then compare the counterfactual risk of the outcome under different values of δ. For example, we can compare the risks under interventions to increase the probability of being exposed (δ > 1) against the observed risk (δ = 1).
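The shift itself is a two-line function. Below is a minimal numerical sketch (in Python, for illustration only; the analysis itself used R), with hypothetical values: doubling the odds of a 40% compliance probability yields a shifted probability of about 57%, while degenerate propensities of 0 or 1 are left unchanged.

```python
# Minimal numerical sketch (hypothetical values, not EAGeR estimates).

def shift_propensity(pi, delta):
    """Shifted propensity score: q = delta*pi / (delta*pi + 1 - pi)."""
    return delta * pi / (delta * pi + 1 - pi)

def odds(p):
    return p / (1 - p)

pi = 0.40     # hypothetical weekly probability of complying
delta = 2.0   # intervention: double each woman's odds of complying

q = shift_propensity(pi, delta)
print(round(q, 3))                    # → 0.571 (shifted probability)
print(round(odds(q) / odds(pi), 3))   # → 2.0 (recovers delta)

# Degenerate propensities are left untouched, whatever delta is:
print(shift_propensity(0.0, delta), shift_propensity(1.0, delta))  # → 0.0 1.0
```

The last line previews why positivity is not needed for this estimand: where the propensity score is exactly zero or one, the intervention leaves it as is.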
The double-robust estimator for longitudinal incremental effects combines information on the shifted propensity score q_t(H_t) with the output of an outcome regression (m_t), as well as the estimated probability of not dropping out (ω_t). We can estimate ψ(δ), the risk of the outcome at τ, by averaging an individual-level influence-function term φ over the sample. The term φ has two portions: the first is the pseudo-outcome at time t, whereas the second is the inverse probability weight for treatment and dropout at t.
Although different in form, this estimator shares a similar structure with the standard double-robust estimator, in that it has an augmentation term for bias correction followed by an inverse-probability-weighted term. The above estimator only requires the second-order product of the errors in estimating the propensity scores and the pseudo-regression functions to be “small enough” for all t. This double-robustness property enables us to employ more flexible nonparametric methods for estimating each regression function. Moreover, the estimator can yield important efficiency gains over the classical inverse probability weighting estimator.4
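Schematically, and omitting the analogous term for the dropout model, the resulting bias is controlled by a sum of products of nuisance-estimation errors (our notation: hats denote estimates, ‖·‖ an L2 norm; see Kim et al.4 for the precise conditions):

```latex
\bigl|\operatorname{Bias}\{\hat{\psi}(\delta)\}\bigr|
\;\lesssim\;
\sum_{t=1}^{\tau} \lVert \hat{\pi}_t - \pi_t \rVert \, \lVert \hat{m}_t - m_t \rVert
```

Each summand can shrink faster than n^(−1/2) even when neither nuisance estimate converges at the parametric rate, which is what permits flexible machine learning while retaining root-n inference.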
Identifying Incremental Effects
To interpret an estimate obtained in the observed data as the targeted causal effect, meeting certain identification conditions is required. For the standard g-methods, one sufficient set of conditions includes exchangeability, causal consistency, and positivity.13 One complication of estimating average causal effects in data with time-varying exposures and long follow-up periods, though, is that violations of the positivity condition are common, particularly random violations due to data sparsity.4,6
In the time-varying setting, positivity requires a nonzero probability of following a given treatment regime (conditional on those variables necessary to achieve exchangeability) across all of follow-up. For example, when estimating the per-protocol effect in EAGeR, we have to assume that the cumulative probability of remaining compliant with assigned treatment (conditional on the baseline and time-varying confounders) in each successive week of follow-up is bounded away from zero (and one). Specifically, we define the positivity condition for the per-protocol effect as:

0 < Pr(A_t = 1 | H_t = h_t, R_t = 1) < 1 for t = 1, …, τ and all histories h_t observable in the population
When we estimated these probabilities using logistic regression, we saw that they approached zero as follow-up accrued, indicating a random positivity violation (Figure 1B).6
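The mechanics of this sparsity are easy to see with a stylized calculation (hypothetical numbers in Python, not the EAGeR estimates): even with a high weekly probability of compliance, the cumulative probability of having always complied decays geometrically with follow-up.

```python
# Stylized illustration (hypothetical weekly probability, not an EAGeR estimate):
# the probability of "always complied" through t weeks is the product of the
# weekly compliance probabilities, which decays toward zero as t grows.

def cumulative_compliance(weekly_prob, n_weeks):
    """Cumulative probability of complying in every one of n_weeks weeks."""
    cumulative = 1.0
    trajectory = []
    for _ in range(n_weeks):
        cumulative *= weekly_prob
        trajectory.append(cumulative)
    return trajectory

probs = cumulative_compliance(weekly_prob=0.90, n_weeks=26)
print(round(probs[0], 3))    # → 0.9   (week 1)
print(round(probs[25], 3))   # → 0.065 (week 26)
```

With few participants remaining on the “always comply” trajectory, estimated probabilities of full compliance drift toward zero, producing exactly the random nonpositivity shown in Figure 1B.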
When positivity is violated, several solutions can be pursued.6,14 If the violations are random, one can use parametric models to smooth over the sparse data, at the cost of strong model specification assumptions. This was the solution pursued by the original EAGeR per-protocol analysis.9 Another solution, which works regardless of whether the violations are random or structural, is changing the target estimand to one that will not be affected by the positivity violation (e.g., by estimating the average causal effect in the subset of participants with exposure opportunity) or one that does not require the positivity condition.6 Incremental effects are one example of the latter type of causal estimand.2–4
Previous articles on the estimation of incremental effects have demonstrated why this approach does not require positivity for identification.3,4 The core idea is that, for individuals and times with a propensity score of exactly zero or one, regardless of whether that is owing to data sparsity or to structural violations, the intervention naturally leaves the propensities as is. We do not intervene on them at all. In such cases, an individual’s estimated outcome becomes a function of their observed exposure, the specified δ, the estimated probability of not dropping out, and the pseudo-regression functions. Thus, the risk of the outcome under the incremental effect is interpreted as the average outcome in the entire population when the probability of exposure is shifted among those with exposure opportunity. The performance of the incremental effect estimator has not been assessed under near violations of positivity, when the propensity score is close to but not exactly equal to zero or one. In such cases, we would likely expect our 95% confidence intervals to be wide, just as they would be for methods to estimate the average causal effect. It is also worth mentioning that the estimation of incremental effects can be used as a sensitivity analysis for the positivity assumption: when positivity is violated, the counterfactual risk approaches the risk under “all who could be exposed are exposed” at very large δ (approaching ∞) and approaches the risk under “all who could be unexposed are unexposed” at very small δ (near 0).2
When controlling for right censoring as we do here, we must also meet the positivity condition for drop out, which is often more likely to hold relative to positivity assumptions for time-dependent treatments.4 Finally, it is critically important to note that we must still meet the exchangeability (by controlling for confounding and selection bias due to informative drop out) and consistency conditions to identify the incremental effect.
Estimating Incremental Effects
To estimate incremental effects, we can use the following steps:4
- Sample Splitting. Split the full sample into K nonoverlapping sample splits.5,15–17 For a given split k, define testing (including all individuals selected into split k) and training (including all individuals not selected into split k) data sets. Sample splitting not only allows us to avoid any restrictions on the complexity of the nuisance estimators, so that we can use arbitrarily complex modern machine learning methods, but also makes our algorithm easily parallelizable.2
- Estimate nuisance parameters. Regress the exposure on historical variables (H_t) and the indicator for not dropping out on the exposure and historical variables (A_t, H_t) within the training data, and use the output from these models to predict the propensity scores π_t and the probabilities of remaining in the study, ω_t, in the full sample. Then, use these predicted values and the observed exposure to build cumulative weights:

W_t = Π_{s=1}^{t} (δA_s + 1 − A_s) / {[δπ_s(H_s) + 1 − π_s(H_s)] ω_s}
- Estimate the pseudo-regression functions. Starting at the last time point τ, let m_{τ+1} = Y and regress m_{τ+1} on the exposure and historical variables (A_τ, H_τ) within the training data, and use the model output to predict the outcomes under exposure and nonexposure within the individuals who have not dropped out. Then combine these predicted outcomes with the estimated propensity scores, shifted propensity scores, and dropout probabilities to compute the pseudo-outcome m_t at the previous time point. Repeat the process of regression in the training set, prediction in the full sample, and computing m_t, for t = τ − 1, …, 1.
- Estimate risk. Within the testing data set, combine the results from the steps above to estimate the risk of the outcome for this sample split, which we denote ψ_k(δ).
We then repeat these steps for the other sample splits, and the overall estimated risk of the outcome is the average of the split-specific estimates ψ_k(δ):

ψ(δ) = (1/K) Σ_{k=1}^{K} ψ_k(δ)
We should note that all steps are only carried out among those who were “observable” at a given time t, by which we mean they had neither dropped out nor had the outcome. Because we use sample-splitting estimators that allow arbitrarily complex nuisance estimation, all regressions above can be carried out using not only traditional parametric models (e.g., logistic regression) but also flexible machine learning algorithms or ensemble approaches such as SuperLearner,18 even when our regression problems are high-dimensional. To obtain 95% confidence intervals (CI), we can estimate the variance of the risk estimate from the efficient influence function.4 When comparing risks under different values of δ, 95% CI can be obtained using the delta method.19
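To make the workflow concrete, the following is a deliberately simplified sketch in Python (the analysis itself was done in R): a single time point, one binary confounder, no dropout, and only the plug-in portion of the estimator, with all simulated quantities hypothetical. It illustrates the sample-splitting logic above: nuisances are fit in the training folds, the shifted risk is evaluated in the test fold, and the split-specific estimates are averaged.

```python
# Simplified, hypothetical sketch of cross-fitting a plug-in incremental-effect
# estimate -- NOT the authors' R implementation: one time point, a single
# binary confounder L, no dropout, and no augmentation term.
import random

random.seed(1)

# Simulate: confounder L, exposure A depending on L, outcome Y on (A, L)
n = 4000
data = []
for _ in range(n):
    L = int(random.random() < 0.5)
    A = int(random.random() < (0.7 if L else 0.3))
    Y = int(random.random() < (0.2 + 0.3 * A + 0.2 * L))
    data.append((L, A, Y))

def stratified_mean(rows, key, target):
    # Nonparametric "regression": mean of target within each stratum of key
    sums, counts = {}, {}
    for r in rows:
        k = key(r)
        sums[k] = sums.get(k, 0) + target(r)
        counts[k] = counts.get(k, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}

def shift_propensity(pi, delta):
    # Incremental propensity score intervention: multiply odds by delta
    return delta * pi / (delta * pi + 1 - pi)

def crossfit_risk(data, delta, n_splits=2):
    folds = [data[i::n_splits] for i in range(n_splits)]
    estimates = []
    for s in range(n_splits):
        test = folds[s]
        train = [r for j, f in enumerate(folds) if j != s for r in f]
        # Nuisance models fit on training folds only
        pi_hat = stratified_mean(train, key=lambda r: r[0], target=lambda r: r[1])
        mu_hat = stratified_mean(train, key=lambda r: (r[1], r[0]), target=lambda r: r[2])
        # Plug-in risk under the delta-shifted propensity, averaged in the test fold
        psi = 0.0
        for L, _a, _y in test:
            q = shift_propensity(pi_hat[L], delta)
            psi += q * mu_hat[(1, L)] + (1 - q) * mu_hat[(0, L)]
        estimates.append(psi / len(test))
    # Overall estimate: average across sample splits
    return sum(estimates) / n_splits

print(round(crossfit_risk(data, delta=1.0), 3))  # risk under no intervention
print(round(crossfit_risk(data, delta=3.0), 3))  # risk when exposure odds are tripled
```

In this toy setup the outcome probability rises with exposure, so the estimated risk increases with δ, mirroring the pattern reported for EAGeR below; a full implementation would add the time-varying recursion, the dropout weights, and the augmentation term.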
Application to EAGeR
As mentioned above, our outcome of interest in the EAGeR application was the incidence (risk) of pregnancy by 26 weeks of follow-up. Our exposure of interest was compliance with aspirin. We examined values of δ ranging from 0.3 to 3.0. We considered four baseline covariates as potential confounders: age, body mass index, any smoking, and which trial eligibility criteria a woman met. We additionally controlled for two time-varying confounders: reporting of any bleeding or any nausea or vomiting in a given week. This adjustment set matches that used in the published per-protocol analysis.9 We ran all regressions using the SuperLearner R package to combine generalized linear models, random forest, and k-nearest neighbors (using default hyperparameters).18
Because our motivating example was the per-protocol analysis of the EAGeR trial, we repeated the above steps to estimate the incremental effect for an exposure defined as “complied with placebo” (i.e., randomized to placebo and compliant in a given week). We estimated the risk difference comparing the results for the two exposures at specific values of δ and estimated 95% CIs using the delta method.
We carried out all analyses using R version 4.1.0 (The R Foundation, Vienna, Austria); code can be found on GitHub (https://github.com/jerudolph13/inc_effect_eager).
As illustrated in Figure 1A, the overall proportion of EAGeR participants who complied with assigned treatment in a given week dropped consistently across follow-up, from a high of 96% in week 1 to a low of 45% in week 26. Compliance was relatively similar for each treatment arm, although there were weeks in which the proportion of women who complied with aspirin was notably lower than the proportion of women who complied with placebo. For example, in week 23, 69% of women assigned to placebo complied, compared with 59% of women assigned to aspirin (a difference in proportions of 10%). By 26 weeks of follow-up, 773 women became pregnant, with 403 and 370 of those pregnancies occurring among women assigned to aspirin and placebo, respectively. The observed incidence of pregnancy by 26 weeks (not controlling for informative censoring) was 67% among all participants, 70% among those assigned to aspirin, and 64% among those assigned to placebo.
We summarized how the incidence of pregnancy by 26 weeks changed as we shifted women’s probability of complying with aspirin in Figure 2A. When δ = 1, we estimated that the incidence of pregnancy under no intervention on the exposure (but under an intervention to remove censoring) was 77% (95% CI = 74%, 80%). As we decreased women’s probability of complying (δ < 1), the incidence of pregnancy did not meaningfully change, given the width of the confidence intervals. For example, at the smallest value of δ we examined, the incidence of pregnancy was 76% (95% CI = 72%, 81%). This implies that the incidence of pregnancy was 0.29% (95% CI = −3.2%, 2.6%) lower when we multiplied women’s odds of complying with aspirin across follow-up by that factor compared to when we did not intervene on their exposure (δ = 1). As we increased women’s probability of complying with aspirin (δ > 1), the incidence of pregnancy steadily increased with δ. At an intermediate increased value of δ, the incidence of pregnancy was 83% (95% CI = 79%, 87%), and the risk difference relative to δ = 1 was 6.4% (95% CI = 3.8%, 9.0%). At the largest value of δ we examined, the incidence was 89% (95% CI = 84%, 93%), for a risk difference relative to δ = 1 of 12% (95% CI = 7.8%, 16%). Risks and risk differences for additional values of δ are provided in the Table.
Table. Incidence of Pregnancy (per 100 Women) Under Interventions to Shift EAGeR Participants’ Odds of Complying with Aspirin and Placebo by δ, and the Difference in Incidence Relative to No Intervention (δ = 1)

Column groups: Compliance with aspirin | Compliance with placebo | Aspirin vs. placebo

δ indicates propensity score odds ratio; CI, confidence interval; RD, risk difference.
We saw a similar pattern in our results when we shifted women’s probability of complying with placebo, although the increase in pregnancy incidence at δ < 1 was more pronounced (Figure 2B). When δ = 1, we estimated that the incidence of pregnancy by 26 weeks was 77% (95% CI = 74%, 80%). The risk difference comparing intervening on aspirin to intervening on placebo at δ = 1 was 0.01% (95% CI = −0.62%, 0.64%). At the smallest value of δ we examined, the incidence of pregnancy was 78% (95% CI = 74%, 83%); the risk difference relative to δ = 1 was 1.7% (95% CI = −0.75%, 4.2%), and the risk difference comparing aspirin to placebo at this δ was −2.0% (95% CI = −6.2%, 2.1%). At an intermediate increased value of δ, the incidence was 85% (95% CI = 81%, 89%), for a risk difference relative to δ = 1 of 8.0% (95% CI = 5.4%, 11%) and a risk difference comparing aspirin to placebo at this δ of −1.5% (95% CI = −5.2%, 2.1%).
In this study, we estimated longitudinal incremental effects in the EAGeR trial to assess how an intervention to shift probabilities of complying with aspirin and complying with placebo impacted the incidence of pregnancy by 26 weeks of follow-up. In doing so, we sought to provide information that would answer a similar question as a standard per-protocol analysis but in a manner that would not be vulnerable to the random nonpositivity we observed in our data. We estimated that the incidence of pregnancy steadily increased as we increased women’s probability of complying and changed little if we decreased the probability of complying; however, the results were nearly identical regardless of whether women were complying with aspirin or placebo.
The similarity in results seen for the aspirin and placebo exposures seems to indicate that aspirin (even when taken regularly) has little estimated impact on the incidence of pregnancy by 26 weeks in the EAGeR sample. Both the original intention-to-treat and per-protocol analyses reported small increases in incidence of pregnancy for the aspirin arm relative to placebo arm.8,9 Potential reasons for the differing results include that here we simply targeted a different estimand than the original per-protocol analysis and that we used flexible machine learning methods, rather than parametric models. Nonetheless, we saw that incidence of pregnancy steadily increased as we increased women’s probability of complying with either treatment. This finding suggests that the act of complying with and staying involved in the EAGeR trial mattered more for pregnancy incidence than the treatment being taken. The results seen here for both treatment arms demonstrate why the per-protocol effect usually has as its comparator “always complied with placebo.” The goal is to isolate the effect of the drug’s active ingredient. Having a comparator group with the same, perfect level of compliance as the active treatment arm controls for the impact of behaviors related to complying with the trial protocol.10
Thinking beyond this particular application in EAGeR, there are several important advantages to estimating incremental effects in epidemiologic analyses—many of which we have already mentioned. First, the intervention proposed by incremental effects is interpretable and realistic. The average causal effect imagines one could intervene to make everyone in a given sample exposed or unexposed across all time points, which in some contexts would be infeasible or impossible. In contrast, the intervention implied by incremental effects instead simply increases or decreases everyone’s probability of being exposed, which mirrors the sort of impact one could achieve via many public health interventions. Second, this approach does not require the positivity assumption to interpret the risk obtained in observed data as the counterfactual risk of the outcome that would be seen if we intervened to shift everyone’s probability of being exposed. This can make estimation of incremental effects an attractive method to use in analyses where nonpositivity due to either structural or random violations is likely.6 Third, incremental effects can be estimated using a double-robust approach implemented with machine learning algorithms, which makes them less vulnerable to statistical model misspecification bias than other approaches that rely on parametric models (e.g., g-computation, inverse probability weighting, or even double robust estimation of marginal structural models).5
There are, however, some caveats in the use of the incremental effect as a causal estimand. Most fundamentally, the incremental effect will not answer every research question. Causal estimands and the estimators used to target them should only be chosen if they appropriately answer the scientific question of interest. In particular, incremental effects capture natural increases and decreases in treatment propensity relative to the observational setting; if personalized optimal treatment regimens are of interest, or if treatment assignments really can be finely controlled (e.g., if all in a population can feasibly be treated), then incremental effects may not be the most useful estimand. Furthermore, while we do not need to meet the positivity condition to identify incremental effects, exchangeability and consistency conditions must still be met to interpret one’s estimate as causal. As in any other analysis, these assumptions cannot be tested in the data and must be justified solely on background knowledge. Finally, if everyone in the population has zero or one probability of treatment, then no method—even estimation of incremental effects—will be able to estimate a meaningful contrast without extrapolation. Our results could also be sensitive to the number of sample splits used. This limitation is not unique to our analysis; it may be mitigated in larger samples or avoided by assuming empirical-process conditions, though such conditions would impose strong restrictions on the complexity of our nuisance estimators.17
Despite these limitations, the estimation of incremental effects is a novel approach that provides results that are both highly interpretable and robust. The method described here is thus likely to be attractive for many analyses of epidemiologic studies.
1. Westreich D. From patients to policy: population intervention effects in epidemiology. Epidemiology. 2017;28:525–528.
2. Kennedy EH. Nonparametric causal effects based on incremental propensity score interventions. J Am Stat Assoc. 2019;114:645–656.
3. Naimi AI, Rudolph JE, Kennedy EH, et al. Incremental propensity score effects for time-fixed exposures. Epidemiology. 2021;32:202–208.
4. Kim K, Kennedy EH, Naimi AI. Incremental intervention effects in studies with dropout and many timepoints. J Causal Inference. 2021;9:302–344.
5. Naimi AI, Mishler A, Kennedy EH. Challenges in obtaining valid causal effect estimates with machine learning algorithms. Am J Epidemiol. 2021.
6. Petersen ML, Porter KE, Gruber S, et al. Diagnosing and responding to violations in the positivity assumption. Stat Methods Med Res. 2012;21:31–54.
7. Schisterman EF, Silver RM, Perkins NJ, et al. A randomised trial to evaluate the effects of low-dose aspirin in gestation and reproduction: design and baseline characteristics. Paediatr Perinat Epidemiol. 2013;27:598–609.
8. Schisterman EF, Silver RM, Lesher LL, et al. Preconception low-dose aspirin and pregnancy outcomes: results from the EAGeR randomised trial. Lancet. 2014;384:29–36.
9. Naimi AI, Perkins NJ, Sjaarda LA, et al. The effect of preconception-initiated low-dose aspirin on human chorionic gonadotropin-detected pregnancy, pregnancy loss, and live birth: per protocol analysis of a randomized trial. Ann Intern Med. 2021;174:595–601.
10. Rudolph JE, Naimi AI, Westreich DJ, et al. Defining and identifying per-protocol effects in randomized trials. Epidemiology. 2020;31:692–694.
11. Robins J. A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect. Math Model. 1986;7:1393–1512.
12. Naimi AI, Cole SR, Kennedy EH. An introduction to g methods. Int J Epidemiol. 2017;46:756–762.
13. Hernan MA, Robins JM. Causal Inference: What If. Boca Raton, FL: Chapman & Hall/CRC; 2020.
14. Rudolph JE, Benkeser D, Kennedy EH, et al. Estimation of the average causal effect in longitudinal data with time-varying exposures: the challenge of non-positivity and the impact of model flexibility. [published online ahead of print July 27, 2022]. Am J Epidemiol. doi:10.1093/aje/kwac136.
15. Zivich PN, Breskin A. Machine learning for causal inference: on the use of cross-fit estimators. Epidemiology. 2021;32:393–401.
16. Zhong Y, Kennedy EH, Bodnar LM, et al. AIPW: an r package for augmented inverse probability weighted estimation of average causal effects. Am J Epidemiol. 2021;190:2690–2699.
17. Kennedy EH. Semiparametric theory and empirical processes in causal inference. https://arxiv.org/abs/1510.04740. Published 2016. Updated July 22, 2016. Accessed 2020.
18. van der Laan MJ, Polley EC, Hubbard AE. Super learner. Stat Appl Genet Mol Biol. 2007;6:Article25.
19. Cox C. Delta Method. In: Encyclopedia of Biostatistics. New York: John Wiley; 1998:1125–1127.