Relative excess risk due to interaction (RERI) was first proposed by Rothman1 and measures departure from additivity of the effects of 2 binary exposures. It has been used to quantify the joint effects of 2 exposures in epidemiology. For example, RERI has been used to assess possible synergy between smoking and alcohol use in relation to oral cancer among male veterans under age 60.2,3 Knol et al4 also used RERI to investigate the joint effects of age and body mass index on diastolic blood pressure.
Several approaches have been developed for inference on RERI. Hosmer and Lemeshow3 proposed a Wald-type confidence interval (CI) based on the standard linear-logistic regression model. However, as the authors implied, this Wald-type CI can have poor coverage and low statistical power. Zou5 therefore described a refined method that improves on the Wald-type CI, and Assmann et al6 proposed a nonparametric bootstrap CI. Zou's method is a general approach for constructing CIs for linear functions of parameters. It recovers variance estimates from the asymmetric confidence limits of the odds ratios, rather than relying on the symmetric variance estimates that the Wald-type CI improperly assumes. More recently, Richardson and Kaufman7 showed that Hosmer and Lemeshow's odds-based interpretation is equivalent to an interaction term in a linear additive odds ratio (OR) model. They used this model to derive a likelihood-based CI. They also reported a case study in which the bootstrap CI proposed by Assmann et al6 was unacceptably wide.
When data are sparse, these earlier approaches for inference regarding RERI either fail or result in very wide CIs. Here, we propose modified nonparametric and parametric bootstrap approaches using a continuity correction and tailoring the resampling scheme to minimize the effect of sparse data. The proposed methods are illustrated on 3 example data sets, and limited Monte Carlo simulations are conducted to assess finite sample properties of the proposed and existing methods.
We illustrate the proposed approaches using 3 examples. The first example is a cohort study of the joint effect of age (coded as A = 1 if age ≥40 years and 0 if age <40 years) and body mass index (BMI) (coded as B = 1 if BMI ≥25 kg/m2 and 0 if BMI <25 kg/m2) on the incidence of hypertension, defined as a diastolic blood pressure ≥90 mm Hg.4 Cell counts in this study are large. The second example is a case–control study of the joint effect of sports participation (A) and smoking (B) on the incidence of herniated lumbar disc,6,7 where cell counts are of moderate size. The third example is a case–control study of 458 male veterans under age 60 years concerning the joint effects of alcohol use (A) and smoking (B) on the incidence of oral cancer,3,5,7 where cell counts are relatively sparse. Data for all 3 examples are reproduced in Table 1.
Relative Excess Risk Due to Interaction
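Following the notation of Hosmer and Lemeshow,3 let A and B denote the presence of 2 binary exposures, and let Ā and B̄ denote their absence. Let the quantity RR denote the relative risk, taken relative to the group with neither exposure (ĀB̄). RERI can be defined as

$$\mathrm{RERI} = RR_{AB} - RR_{A\bar{B}} - RR_{\bar{A}B} + 1.$$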
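Let $y_{ij}$ denote the number of cases with exposure status A = i and B = j, where i, j = 0, 1 represent the absence and presence of the exposures, respectively. Assume $y_{ij} \sim \mathrm{Binomial}(n_{ij}, p_{ij})$ for i, j = 0, 1, and

$$\operatorname{logit}(p_{ij}) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2,$$

where $x_1 = i$ and $x_2 = j$ are the exposure statuses of A and B. Hosmer and Lemeshow3 estimated RERI as

$$\widehat{\mathrm{RERI}} = \exp(\hat\beta_1 + \hat\beta_2 + \hat\beta_3) - \exp(\hat\beta_1) - \exp(\hat\beta_2) + 1, \tag{1}$$

where $\hat\beta_j$ is the maximum likelihood estimate (MLE) of $\beta_j$. Because this 4-parameter model is saturated, $\exp(\hat\beta_1 + \hat\beta_2 + \hat\beta_3)$, $\exp(\hat\beta_1)$, and $\exp(\hat\beta_2)$ equal the observed odds ratios for cells (1, 1), (1, 0), and (0, 1) relative to cell (0, 0). The RERI in Equation (1) is therefore equivalent to

$$\widehat{\mathrm{RERI}} = \frac{y_{11}/(n_{11}-y_{11})}{y_{00}/(n_{00}-y_{00})} - \frac{y_{10}/(n_{10}-y_{10})}{y_{00}/(n_{00}-y_{00})} - \frac{y_{01}/(n_{01}-y_{01})}{y_{00}/(n_{00}-y_{00})} + 1. \tag{2}$$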
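The equivalence of Equations (1) and (2) can be checked numerically. The following is a minimal sketch, not the authors' code. It assumes that the saturated logistic model's MLEs are the observed log odds differences, so no model fitting is needed. The counts are illustrative: the $n_{ij}$ come from Simulation 2 and each $y_{ij}$ is $n_{ij} p_{ij}$ rounded to an integer. They are not a reproduction of Table 1.

```python
import math

def odds(y, n):
    """Odds of disease in a cell with y cases among n subjects."""
    return y / (n - y)

def reri_from_counts(y, n):
    """Equation (2): odds ratios for cells (1,1), (1,0), (0,1) versus cell (0,0)."""
    o00 = odds(y[0, 0], n[0, 0])
    or_ = {k: odds(y[k], n[k]) / o00 for k in [(1, 1), (1, 0), (0, 1)]}
    return or_[1, 1] - or_[1, 0] - or_[0, 1] + 1

def reri_from_saturated_logit(y, n):
    """Equation (1), using the closed-form MLEs of the saturated logistic model."""
    logit = {k: math.log(odds(y[k], n[k])) for k in y}
    b1 = logit[1, 0] - logit[0, 0]                              # log OR, A alone
    b2 = logit[0, 1] - logit[0, 0]                              # log OR, B alone
    b3 = logit[1, 1] - logit[1, 0] - logit[0, 1] + logit[0, 0]  # product term
    return math.exp(b1 + b2 + b3) - math.exp(b1) - math.exp(b2) + 1

# Illustrative counts: n_ij from Simulation 2 and y_ij = round(n_ij * p_ij).
n = {(0, 0): 23, (0, 1): 26, (1, 0): 18, (1, 1): 391}
y = {(0, 0): 3, (0, 1): 8, (1, 0): 6, (1, 1): 225}

print(round(reri_from_counts(y, n), 2))           # 3.74
print(round(reri_from_saturated_logit(y, n), 2))  # 3.74
```

With covariates, the model is no longer saturated, and this shortcut for Equation (1) no longer applies.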
Nonparametric Bootstrap Methods
Assmann et al6 proposed generating bootstrap resamples from the data to construct a CI for RERI. For each resample, RERI can be estimated through Equation (2) or, equivalently, through a logistic regression model. A 95% percentile bootstrap CI is defined as ($\mathrm{RERI}_{0.025}$ to $\mathrm{RERI}_{0.975}$), where $\mathrm{RERI}_{0.025}$ and $\mathrm{RERI}_{0.975}$ are the 2.5th and 97.5th percentiles of the RERI estimates across the bootstrap resamples.
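In the third (and sparse) oral cancer example,3,5,7 only $y_{00} = 3$ of the $N = 458$ subjects are cases in the doubly unexposed cell. Each of the N draws in a nonparametric bootstrap resample misses these 3 cases with probability $1 - y_{00}/N$. Approximately $(1 - y_{00}/N)^N \approx \exp(-y_{00}) \approx 5\%$ of resampled datasets are therefore expected to contain no such cases. In these resamples the denominators of Equation (2) are zero, so RERI is unbounded. It is negative or positive unbounded depending on the sign of

$$\frac{y_{11}}{n_{11}-y_{11}} - \frac{y_{10}}{n_{10}-y_{10}} - \frac{y_{01}}{n_{01}-y_{01}}.$$

A simulation shows that about 1% of resamples have a negative and about 4% a positive unbounded value; see the Figure for more information. Because more than 2.5% of the resamples are positive unbounded, the upper bound of the usual 95% CI ($\mathrm{RERI}_{0.025}$ to $\mathrm{RERI}_{0.975}$) is itself positive unbounded.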
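The zero-cell problem can be reproduced with a short simulation. This is a minimal sketch, not the authors' code. It assumes illustrative counts obtained by rounding $n_{ij} p_{ij}$ from Simulation 2. It also relies on the fact that resampling N individuals with replacement is the same as one multinomial draw over the 8 (cell, case status) categories. Percentiles are taken as order statistics so that infinite values do not break interpolation.

```python
import numpy as np

def percentile(sorted_vals, q):
    """Order-statistic percentile (no interpolation, so +/-inf values are safe)."""
    k = int(np.ceil(q * len(sorted_vals))) - 1
    return sorted_vals[max(k, 0)]

# Illustrative counts, cells ordered (0,0), (0,1), (1,0), (1,1).
n = np.array([23, 26, 18, 391])
y = np.array([3, 8, 6, 225])
N = n.sum()

# Multinomial over 4 case columns followed by 4 noncase columns.
probs = np.concatenate([y, n - y]) / N
rng = np.random.default_rng(2010)
B = 20_000
draws = rng.multinomial(N, probs, size=B)
y_star, m_star = draws[:, :4], draws[:, 4:]

with np.errstate(divide="ignore", invalid="ignore"):
    odds = y_star / m_star  # per-cell odds, no continuity correction
    reri = (odds[:, 3] - odds[:, 2] - odds[:, 1]) / odds[:, 0] + 1  # Equation (2)

share_zero = np.mean(y_star[:, 0] == 0)
share_pos_inf = np.mean(reri == np.inf)
share_neg_inf = np.mean(reri == -np.inf)
ordered = np.sort(reri[~np.isnan(reri)])
lower, upper = percentile(ordered, 0.025), percentile(ordered, 0.975)

print(f"zero cases in cell (0,0): {share_zero:.3f} "
      f"(closed form {(1 - y[0] / N) ** N:.3f})")
print(f"+inf share {share_pos_inf:.3f}, -inf share {share_neg_inf:.3f}")
print(f"percentile CI: {lower:.2f} to {upper}")
```

Once the positive-unbounded share exceeds 2.5%, the upper percentile is infinite no matter how many resamples are drawn.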
Modified Bootstrap Approach 1: Nonparametric Bootstrap Method With a Continuity Correction
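The nonparametric bootstrap method may be problematic when cell counts are sparse, as in the oral cancer example. We therefore modify Equation (2) with a continuity correction that adds 0.5 to every cell count (cases and noncases):

$$\widehat{\mathrm{RERI}}_{cc} = \frac{(y_{11}+0.5)/(n_{11}-y_{11}+0.5)}{(y_{00}+0.5)/(n_{00}-y_{00}+0.5)} - \frac{(y_{10}+0.5)/(n_{10}-y_{10}+0.5)}{(y_{00}+0.5)/(n_{00}-y_{00}+0.5)} - \frac{(y_{01}+0.5)/(n_{01}-y_{01}+0.5)}{(y_{00}+0.5)/(n_{00}-y_{00}+0.5)} + 1. \tag{3}$$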
A 95% percentile bootstrap CI is then defined as for the original nonparametric bootstrap method, with RERI computed for each resample through Equation (3).
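The nonparametric bootstrap with a continuity correction can be sketched as follows. This is a minimal illustration under stated assumptions. The counts are the rounded $n_{ij} p_{ij}$ from Simulation 2. Individuals are resampled through an equivalent multinomial draw. The correction `c = 0.5` is added to every case and noncase count, as in Equation (3). The point estimate still comes from Equation (2), that is, `c = 0`.

```python
import numpy as np

def reri_cc(y, m, c=0.5):
    """Equation (3): RERI after adding c to every case (y) and noncase (m) count.

    The last axis is ordered (0,0), (0,1), (1,0), (1,1); works row-wise on arrays.
    """
    odds = (y + c) / (m + c)
    return (odds[..., 3] - odds[..., 2] - odds[..., 1]) / odds[..., 0] + 1

def percentile_ci(values, level=0.95):
    """Two-sided percentile interval of bootstrap RERI estimates."""
    lo, hi = np.quantile(values, [(1 - level) / 2, (1 + level) / 2])
    return lo, hi

n = np.array([23, 26, 18, 391])  # illustrative counts
y = np.array([3, 8, 6, 225])
N = n.sum()
probs = np.concatenate([y, n - y]) / N

rng = np.random.default_rng(1)
draws = rng.multinomial(N, probs, size=20_000)  # N individuals per resample
boot = reri_cc(draws[:, :4], draws[:, 4:])      # always finite with c > 0

point = reri_cc(y, n - y, c=0.0)                # Equation (2), no correction
lo, hi = percentile_ci(boot)
print(f"RERI = {point:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

Because every corrected odds is strictly positive, no resample produces an unbounded RERI.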
Modified Bootstrap Approach 2: Stratified Nonparametric Bootstrap Method With a Continuity Correction
We also investigate a stratified nonparametric bootstrap method to reduce the percentage of bootstrap samples with zero cell counts. This matters especially when sample sizes are unbalanced across the 4 strata, as is often the case with observational data. In this method, bootstrap samples are drawn separately within each of the 4 exposure strata, keeping each stratum size $n_{ij}$ fixed. For each bootstrap sample, we calculate RERI through Equation (3) and then construct the 95% percentile bootstrap CI.
For the oral cancer example, stratification reduces the percentage of bootstrap samples in which RERI cannot be calculated because of a zero cell by about 20%, from about 5% to 4%. Had the data been even sparser (eg, 2 of 5 nonsmoking teetotalers with oral cancer, with the other cell counts unchanged), stratification would reduce the percentage of incalculable RERIs by 43%, from about 13.4% to about 7.7%.
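These percentages can be approximated in closed form. The sketch below considers only zero cases in cell (0, 0), which is the dominant source of incalculable RERIs here. Pooled resampling draws N individuals from the whole dataset. Stratified resampling draws $n_{00}$ individuals from cell (0, 0) alone. The sparser scenario is read as 2 cases among 5 subjects in that cell, with total N = 458 − 23 + 5. That reading is our assumption.

```python
def p_zero_pooled(y00, N):
    """Chance that N draws from the whole dataset miss all y00 cases in cell (0,0)."""
    return (1 - y00 / N) ** N

def p_zero_stratified(y00, n00):
    """Same chance when n00 individuals are resampled within cell (0,0) only."""
    return (1 - y00 / n00) ** n00

# Observed table: 3 cases among 23 subjects in cell (0,0), N = 458 overall.
print(f"{p_zero_pooled(3, 458):.3f} vs {p_zero_stratified(3, 23):.3f}")
# Hypothetical sparser table: 2 cases among 5 in cell (0,0), other cells unchanged.
print(f"{p_zero_pooled(2, 458 - 23 + 5):.3f} vs {p_zero_stratified(2, 5):.3f}")
```

These approximations (about 4.9% vs 4.0%, and 13.5% vs 7.8%) are close to the simulated percentages reported above.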
Modified Bootstrap Approach 3: Parametric Bootstrap CI With a Continuity Correction
Because $y_{ij} \sim \mathrm{Binomial}(n_{ij}, p_{ij})$ for i, j = 0, 1, we may estimate $p_{ij}$ by maximum likelihood, namely $\hat p_{ij} = y_{ij}/n_{ij}$. We then draw resamples of $y_{ij}$ (i, j = 0, 1) from $\mathrm{Binomial}(n_{ij}, \hat p_{ij})$, compute RERI for each resample through Equation (3), and construct the CI as in the nonparametric bootstrap method with a continuity correction. In this setting, the parametric bootstrap method is equivalent to the stratified nonparametric bootstrap method. Resampling $n_{ij}$ subjects with replacement within stratum (i, j) yields a case count distributed exactly as $\mathrm{Binomial}(n_{ij}, \hat p_{ij})$. SAS code to implement this approach is given in eAppendix 1 (https://links.lww.com/EDE/A398).
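The authors provide SAS code in eAppendix 1. The following is an independent Python sketch of the same idea, not a translation of that code. It assumes cells ordered (0,0), (0,1), (1,0), (1,1) and a continuity correction `c = 0.5` added to cases and noncases. The function name `parametric_bootstrap_ci` is hypothetical.

```python
import numpy as np

def parametric_bootstrap_ci(y, n, B=50_000, c=0.5, level=0.95, seed=None):
    """Percentile CI for RERI from a parametric (binomial) bootstrap.

    y, n: case counts and subjects per cell, ordered (0,0), (0,1), (1,0), (1,1).
    Each resample draws y*_ij ~ Binomial(n_ij, y_ij / n_ij) independently and
    applies the continuity-corrected Equation (3).
    """
    y, n = np.asarray(y), np.asarray(n)
    rng = np.random.default_rng(seed)
    p_hat = y / n                                 # MLEs of p_ij
    y_star = rng.binomial(n, p_hat, size=(B, 4))  # broadcasts over the 4 cells
    odds = (y_star + c) / (n - y_star + c)
    reri = (odds[:, 3] - odds[:, 2] - odds[:, 1]) / odds[:, 0] + 1
    lo, hi = np.quantile(reri, [(1 - level) / 2, (1 + level) / 2])
    return lo, hi

# Illustrative counts (rounded n_ij * p_ij from Simulation 2).
lo, hi = parametric_bootstrap_ci([3, 8, 6, 225], [23, 26, 18, 391], seed=2010)
print(f"95% parametric bootstrap CI: {lo:.2f} to {hi:.2f}")
```

Drawing all B × 4 binomial counts in one vectorized call is what makes this variant cheap compared with resampling individual records.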
In all bootstrap methods, the point estimate of RERI is given by Equation (2), and the continuity correction is used only to construct the bootstrap CIs. We drew 50,000 bootstrap samples for each example. A 95% bootstrap CI is defined as ($\mathrm{RERI}_{0.025}$ to $\mathrm{RERI}_{0.975}$).
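Finally, it has been argued that a RERI defined on the relative-risk scale could be more appropriate for a cohort study.5 The bootstrap methods extend easily to this RERI:

$$\widehat{\mathrm{RERI}}_{RR} = \frac{y_{11}/n_{11}}{y_{00}/n_{00}} - \frac{y_{10}/n_{10}}{y_{00}/n_{00}} - \frac{y_{01}/n_{01}}{y_{00}/n_{00}} + 1.$$

After a similar continuity correction (adding 0.5 to each case and noncase count), it becomes

$$\widehat{\mathrm{RERI}}_{RR,cc} = \frac{(y_{11}+0.5)/(n_{11}+1)}{(y_{00}+0.5)/(n_{00}+1)} - \frac{(y_{10}+0.5)/(n_{10}+1)}{(y_{00}+0.5)/(n_{00}+1)} - \frac{(y_{01}+0.5)/(n_{01}+1)}{(y_{00}+0.5)/(n_{00}+1)} + 1.$$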
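A minimal sketch of the relative-risk version follows. It assumes that the continuity correction adds `c` to both cases and noncases, so each risk becomes $(y + c)/(n + 2c)$. The counts are hypothetical, chosen so that the arithmetic is easy to follow, and are not taken from any of the 3 examples.

```python
def reri_rr(y, n, c=0.0):
    """RR-based RERI from case counts y and cell sizes n (dicts keyed by (i, j)).

    c = 0 gives the uncorrected estimator; c > 0 adds c to every case and
    noncase count, so each risk is (y + c) / (n + 2c).
    """
    risk = {k: (y[k] + c) / (n[k] + 2 * c) for k in n}
    rr = {k: risk[k] / risk[0, 0] for k in n}
    return rr[1, 1] - rr[1, 0] - rr[0, 1] + 1

# Hypothetical cohort counts: risks 0.1, 0.2, 0.2, 0.4.
n = {(0, 0): 100, (0, 1): 100, (1, 0): 100, (1, 1): 100}
y = {(0, 0): 10, (0, 1): 20, (1, 0): 20, (1, 1): 40}
print(reri_rr(y, n))  # RR11 - RR10 - RR01 + 1 = 4 - 2 - 2 + 1

# A zero reference cell is undefined without correction but finite with c = 0.5.
y_sparse = dict(y)
y_sparse[0, 0] = 0
print(reri_rr(y_sparse, n, c=0.5))
```

In a bootstrap, `reri_rr(..., c=0.5)` would simply replace Equation (3) for each resample.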
MONTE CARLO SIMULATION
Simulation 1: Cell Counts Are Not Sparse
A Monte Carlo simulation was conducted to evaluate the performance of the modified bootstrap approaches and existing approaches (likelihood-based and Zou's approaches) when cell counts are not sparse. The setting was chosen to mimic the herniated lumbar disc example.6,7 We let n00=208, n01=251, n10=51, n11=64, p00=.394, p01=.550, p10=.608, and p11=.563. A total of 1000 data sets of $y_{ij}$ were drawn randomly from $\mathrm{Binomial}(n_{ij}, p_{ij})$, i, j = 0, 1, and RERI CIs were obtained for each data set using each method. Each bootstrap method used 50,000 bootstrap resamples per data set. In this setting, the true population RERI, obtained from Equation (2) by replacing the observed odds with the population odds $p_{ij}/(1-p_{ij})$, is

$$\mathrm{RERI} = \mathrm{OR}_{11} - \mathrm{OR}_{10} - \mathrm{OR}_{01} + 1 \approx -1.28, \qquad \mathrm{OR}_{ij} = \frac{p_{ij}/(1-p_{ij})}{p_{00}/(1-p_{00})}.$$
Simulation 2: Some Cell Counts Are Sparse
A second Monte Carlo simulation was conducted to evaluate the performance of the modified bootstrap approaches and existing approaches (likelihood-based and Zou's approaches) when some cell counts are sparse. The setting was chosen to mimic the oral cancer example.3,6,7 We let n00=23, n01=26, n10=18, n11=391, p00=.130, p01=.308, p10=.333, and p11=.575. A total of 1000 data sets of $y_{ij}$ were drawn randomly from $\mathrm{Binomial}(n_{ij}, p_{ij})$, i, j = 0, 1. RERI CIs were obtained for each data set through the likelihood approach and the parametric bootstrap approach with 50,000 bootstrap resamples. In this setting, the true population RERI, obtained from Equation (2) by replacing the observed odds with the population odds $p_{ij}/(1-p_{ij})$, is

$$\mathrm{RERI} = \mathrm{OR}_{11} - \mathrm{OR}_{10} - \mathrm{OR}_{01} + 1 \approx 3.73, \qquad \mathrm{OR}_{ij} = \frac{p_{ij}/(1-p_{ij})}{p_{00}/(1-p_{00})}.$$
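The true population RERI values for the two simulation settings follow directly from the stated $p_{ij}$, with population odds in place of the observed odds in Equation (2). The short check below is our own arithmetic and is not taken from the paper's tables.

```python
def reri_from_probs(p):
    """Population RERI from cell risks p[(i, j)], on the odds ratio scale."""
    odds = {k: v / (1 - v) for k, v in p.items()}
    ors = {k: odds[k] / odds[0, 0] for k in odds}
    return ors[1, 1] - ors[1, 0] - ors[0, 1] + 1

sim1 = {(0, 0): 0.394, (0, 1): 0.550, (1, 0): 0.608, (1, 1): 0.563}
sim2 = {(0, 0): 0.130, (0, 1): 0.308, (1, 0): 0.333, (1, 1): 0.575}
print(round(reri_from_probs(sim1), 2))  # -1.28
print(round(reri_from_probs(sim2), 2))  # 3.73
```

These are the targets against which CI coverage is judged in each simulation.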
In Table 2, we present the 95% CIs obtained using the various approaches. In the hypertension data, where cell counts are not sparse, all approaches provide similar inference. In the herniated lumbar disc data, where cell counts are moderate, the differences among the approaches remain small. In the oral cancer data, however, where some cell counts are sparse, the parametric and stratified nonparametric bootstrap approaches provide the narrowest CIs. Kernel-smoothed bootstrap distributions of RERI for the oral cancer example are presented in the Figure. The distributions are slightly right-skewed.
We also briefly present results for the relative-risk-based RERI in the first example (the cohort study of hypertension), for which Zou5 notes that this RERI might be more appropriate. The Wald-type approach gives an asymptotic 95% CI of 0.39 to 2.30, whereas the approach by Zou5 gives a 95% CI of 0.31 to 2.37. The bootstrap 95% CI based on 50,000 Monte Carlo samples is 0.42 to 2.29. The differences are again small because the cell counts are large.
In Table 3, we summarize the results of simulation 1, where cell counts are not sparse. For each method, coverage probability was estimated as the proportion of CIs that contained the true value of RERI. The reported lower and upper bounds are the means of the lower and upper bounds across the simulated 95% CIs, and the reported length is the difference between these mean bounds. As expected, all compared methods provide similar 95% CIs. Combining these results with those presented by Zou,5 we conclude that, where cell counts are not sparse, all methods perform similarly. This holds for the likelihood-based and Zou's approaches, the nonparametric bootstrap approach, and the modified bootstrap methods.
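The coverage, mean-bound, and length summaries can be sketched for one method, the parametric bootstrap with a continuity correction, under the simulation 1 settings. This is an illustration, not a reproduction of Table 3. It assumes a true RERI of about −1.284, computed from the stated $p_{ij}$. It also uses far fewer simulated data sets and resamples than the 1000 and 50,000 used in the paper, so its numbers will be noisier. The names `reri_cc` and `simulate_coverage` are hypothetical.

```python
import numpy as np

def reri_cc(y, n, c=0.5):
    """Equation (3) on the last axis, cells ordered (0,0), (0,1), (1,0), (1,1)."""
    odds = (y + c) / (n - y + c)
    return (odds[..., 3] - odds[..., 2] - odds[..., 1]) / odds[..., 0] + 1

def simulate_coverage(n, p, true_reri, n_sims=500, B=2_000, seed=0):
    """Coverage, mean lower/upper bounds, and length of parametric-bootstrap 95% CIs."""
    rng = np.random.default_rng(seed)
    n, p = np.asarray(n), np.asarray(p)
    bounds = np.empty((n_sims, 2))
    for s in range(n_sims):
        y = rng.binomial(n, p)                        # one simulated data set
        y_star = rng.binomial(n, y / n, size=(B, 4))  # its parametric resamples
        bounds[s] = np.quantile(reri_cc(y_star, n), [0.025, 0.975])
    covered = (bounds[:, 0] <= true_reri) & (true_reri <= bounds[:, 1])
    mean_lo, mean_hi = bounds.mean(axis=0)
    return covered.mean(), mean_lo, mean_hi, mean_hi - mean_lo

# Simulation 1 settings from the text; n_sims and B reduced for speed.
cov, lo, hi, length = simulate_coverage(
    n=[208, 251, 51, 64], p=[0.394, 0.550, 0.608, 0.563], true_reri=-1.284)
print(f"coverage {cov:.3f}, mean CI {lo:.2f} to {hi:.2f}, length {length:.2f}")
```

Other methods can be compared on the same simulated data sets by replacing the bootstrap step inside the loop.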
In Table 4, we summarize the results of simulation 2, where some cell counts are sparse. The 95% CIs obtained from the nonparametric, parametric, and stratified nonparametric bootstrap approaches with a continuity correction have appropriate coverage and are shorter than those from the likelihood-based approach. Zou's approach has overly conservative coverage.
We did not evaluate a nonparametric bootstrap approach without continuity correction in this simulation study because we do not expect it to provide meaningful inference about RERI. As the third example showed, when the data are sparse, a high percentage of bootstrap samples from a nonparametric bootstrap without continuity correction contain a zero cell, so RERI cannot be calculated.
Wald-type CIs are not reported here because they have been examined thoroughly in prior work.5,6 Our limited results nonetheless show that Wald-type CIs do not ensure correct coverage probabilities, consistent with these previous results.
This paper presents several modified bootstrap methods for inference on RERI. The modified bootstrap approaches outperform other methods when data are sparse. Among the 3 modified bootstrap methods with a continuity correction, the stratified and parametric bootstrap methods are (in this setting) theoretically equivalent. Both are expected to be superior to the nonparametric bootstrap method, particularly when data are extremely sparse. Between these 2 methods, there is a slight preference for the parametric bootstrap method with a continuity correction, because it is computationally faster and poses no computational burden with modern computing resources.
When there are additional covariates, Equation (2) can no longer be used to compute RERI because Equations (1) and (2) are not equivalent in this case. However, the bootstrap methods can be modified in the same spirit. In short, we estimate RERI through Equation (1), using constrained maximum likelihood estimates8 of $\beta_j$. Details are provided in eAppendix 2 (https://links.lww.com/EDE/A398).
When an estimate is defined by a ratio whose denominator could be zero, a continuity correction may be useful to provide a meaningful estimate. Continuity corrections have been used in many areas (eg, binomial proportions,9 meta-analyses of adverse events,10 and sparse data in general11). From a Bayesian perspective, they can be viewed as a weak null-centered prior.12 We note that the continuity correction we chose may not be optimal. However, because the purpose is to construct CIs for RERI rather than to provide a point estimate, the method is not likely to be sensitive to the choice of a reasonable continuity correction (eg, adding 0.25 or 1 to all cell counts instead of 0.5).
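How the interval responds to the choice of correction can be checked for any given dataset. One way is to reuse a single set of parametric resamples and vary only the added constant `c`. The sketch below does this for the illustrative oral-cancer-like counts (rounded $n_{ij} p_{ij}$ from Simulation 2). It makes no claim about what the resulting bounds will show. The counts and the seed are assumptions.

```python
import numpy as np

n = np.array([23, 26, 18, 391])  # illustrative counts, cells (0,0), (0,1), (1,0), (1,1)
y = np.array([3, 8, 6, 225])
rng = np.random.default_rng(7)
y_star = rng.binomial(n, y / n, size=(50_000, 4))  # one shared set of parametric resamples

results = {}
for c in (0.25, 0.5, 1.0):
    odds = (y_star + c) / (n - y_star + c)          # continuity-corrected odds
    reri = (odds[:, 3] - odds[:, 2] - odds[:, 1]) / odds[:, 0] + 1
    results[c] = tuple(np.quantile(reri, [0.025, 0.975]))
    print(f"c = {c:<4}: 95% CI {results[c][0]:.2f} to {results[c][1]:.2f}")
```

Sharing the resamples across values of `c` means that any change in the bounds comes from the correction itself, not from Monte Carlo noise.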
The approaches described here have limitations. First, we assume that the data contain no zero cells. None of the methods presented provides reasonable inference about RERI when the data contain zero cells. The maximum likelihood estimates needed in the Wald-type approach, Zou's approach, and the likelihood-based approach do not exist in the presence of zero cells.13 In bootstrap methods, every resampled dataset inherits the zero cells of the original data, so these approaches do not provide reasonable estimates either. Firth14 proposed a penalized-likelihood approach that circumvents the problem of zero cells. Second, we used simple bootstrap methods and found adequate finite-sample properties. More sophisticated bootstrap methods (eg, the bias-corrected and accelerated bootstrap and the double bootstrap) may provide better properties but are beyond our current scope.
The likelihood-based approach proposed by Richardson and Kaufman7 seems to be superior to a Wald-type approach or a nonparametric bootstrap method without continuity correction. On the other hand, the likelihood-based approach does not appear to function as well as bootstrap approaches with a continuity correction as described here. Further work is needed to elucidate exactly why the bootstrap outperforms the likelihood-based approach.
We thank the 3 reviewers for their helpful and constructive comments. Views expressed in this paper are the author's professional opinions and do not necessarily represent the official positions of the US Food and Drug Administration.
1. Rothman KJ. Modern Epidemiology. 1st ed. Boston: Little, Brown & Co; 1986.
2. Rothman K, Keller A. The effect of joint exposure to alcohol and tobacco on risk of cancer of the mouth and pharynx. J Chronic Dis
3. Hosmer DW, Lemeshow S. Confidence interval estimation of interaction. Epidemiology
4. Knol MJ, van der Tweel I, Grobbee DE, Numans ME, Geerlings MI. Estimating interaction on an additive scale between continuous determinants in a logistic regression model. Int J Epidemiol
5. Zou GY. On the estimation of additive interaction by use of the four-by-two table and beyond. Am J Epidemiol
6. Assmann SF, Hosmer DW, Lemeshow S, Mundt KA. Confidence intervals for measures of interaction. Epidemiology
7. Richardson DB, Kaufman JS. Estimation of the relative excess risk due to interaction and associated confidence bounds. Am J Epidemiol
8. Tian GL, Tang ML, Fang HB, Tan M. Efficient methods for estimating constrained parameters with applications to lasso logistic regression. Comput Stat Data Anal
9. Pires AM, Amado CA. Interval estimators for a binomial proportion: comparison of twenty methods. REVSTAT Stat J
10. Sutton AJ, Cooper NJ, Lambert PC, Jones DR, Abrams KR, Sweeting MJ. Meta-analysis of rare and adverse event data. Expert Rev Pharmacoecon Outcomes Res
11. Sweeting MJ, Sutton AJ, Lambert PC. What to add to nothing? Use and avoidance of continuity corrections in meta-analysis of sparse data. Stat Med
12. Greenland S. Interval estimation by simulation as an alternative to and extension of confidence intervals. Int J Epidemiol
13. Albert A, Anderson JA. On the existence of maximum likelihood estimates in logistic regression models. Biometrika
14. Firth D. Bias reduction of maximum likelihood estimates. Biometrika