It is uncommon for occupational cohort mortality studies to include smoking history information; consequently, smoking is often an unmeasured potential confounder in analyses of occupational exposure-disease associations.1 When the outcome of interest is strongly related to smoking, this can seriously complicate interpretation of study results. Since smoking is more strongly associated with lung cancer than with any other cancer site, concerns about unmeasured confounding due to smoking are particularly serious in occupational cohort studies of lung cancer.2
More than 2 decades ago Axelson and Steenland3 described a quantitative approach to assessing the sensitivity of relative rate estimates to uncontrolled confounding by smoking. This approach requires estimates of the adjusted relative rate to be calculated under various assumptions about the prevalence of smoking among exposed and unexposed workers. If correct assumptions are made about the prevalence of smoking among exposed and unexposed workers in the study cohort, this approach works perfectly.
However, quantitative sensitivity analyses remain the exception rather than the rule in occupational cohort mortality studies. One reason may be that a sensitivity analysis requires speculation about the prevalence of smoking among exposed and unexposed workers, often with little empirical basis. One proposed solution has been to assign subjective probabilities to various confounding scenarios and derive, via Monte Carlo methods, posterior distributions of risk estimates accounting for smoking differences.4 The method proposed by Axelson and Steenland3 also requires the assumption that smoking prevalence does not depend upon age or birth cohort (an often implausible assumption in large historical cohort studies). Extensions of this approach to allow for such variations in smoking prevalence require further assumptions about the distribution of smoking by occupational exposure level and age category or birth cohort.4,5
I propose a related quantitative approach to account for bias due to confounding by smoking in occupational cohort studies. Unlike standard approaches to sensitivity analysis, this approach does not require the analyst to explicitly posit distributions of smoking by occupational exposure level.
Suppose an investigator is interested in the association between an occupational exposure and risk of lung cancer in a cohort in which smoking is an unmeasured potential confounder. Confounding by smoking could be assessed in a qualitative fashion by examining the association between the exposure and risk of a different disease that is known to be associated with smoking, such as chronic obstructive pulmonary disease (COPD).6,7 Evidence of such an association could be interpreted as evidence of confounding of the exposure-lung cancer association by smoking. To be valid, such an interpretation requires that smoking is related to both lung cancer and COPD, that there is no true causal association between exposure and COPD, and that the only uncontrolled confounder of the association between exposure and COPD is smoking. Figure 1 illustrates these necessary assumptions about the underlying causal relations among exposure, smoking, lung cancer, and COPD. Using the same assumptions, I describe in the following text how to conduct a quantitative sensitivity analysis.
Model for Exposure and Smoking
Let RRLungunadj be the rate ratio for the association between exposure and lung cancer in the study cohort unadjusted for smoking, and define indicator (1 = yes, 0 = no) variables: X1 = exposure, X2 = current smoking, and X3 = former smoking. Assume that the lung cancer rate conforms to a proportional hazards model of the form Log (RRLung)=β1X1+β2X2+β3X3.
The parameter of central interest in this model, β1, is the log hazard ratio for lung cancer contrasting exposed to unexposed workers, adjusted for smoking. The study cohort may comprise a single stratum, or multiple strata defined by covariates such as age and birth cohort. Let ωt denote a weight proportional to the contribution of subgroup t to the study cohort, let π1,2,t and π1,3,t be the proportions of current and former smokers among the exposed workers in covariate stratum t, and let π0,2,t and π0,3,t be the corresponding proportions among the unexposed workers. The bias due to confounding by smoking, BIASLung, is a function of the weighted average of the stratum-specific proportions of current and former smokers among the exposed relative to the unexposed, such that β1=log(RRLungunadj) − log(BIASLung).
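One form of this bias factor that is consistent with the definitions above can be written as follows. This is a sketch that assumes the usual indirect-adjustment expression with never smokers as the reference group and the same weights ωt applied to exposed and unexposed workers; it is not copied from the published display equation.

```latex
\mathrm{BIAS}_{Lung} =
\frac{\sum_{t} \omega_{t}\left[1 + \pi_{1,2,t}\left(e^{\beta_{2}} - 1\right) + \pi_{1,3,t}\left(e^{\beta_{3}} - 1\right)\right]}
     {\sum_{t} \omega_{t}\left[1 + \pi_{0,2,t}\left(e^{\beta_{2}} - 1\right) + \pi_{0,3,t}\left(e^{\beta_{3}} - 1\right)\right]},
\qquad
\beta_{1} = \log\left(RR_{Lung}^{unadj}\right) - \log\left(\mathrm{BIAS}_{Lung}\right)
```

Under this form, the bias factor equals 1 whenever the weighted smoking distributions are the same among exposed and unexposed workers.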
Let RRCOPDunadj be the relative rate for the association between exposure and COPD in the study cohort (unadjusted for smoking) and assume that the rate of COPD conforms to the proportional hazards model Log (RRCOPD)=θ1X1+θ2X2+θ3X3.
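The bias due to confounding by smoking, BIASCOPD, is given by the analogous function of the stratum-specific proportions of current and former smokers, with θ2 and θ3 in place of β2 and β3, where θ1=log(RRCOPDunadj) − log(BIASCOPD). If, as in Figure 1, there is no true causal association between exposure and COPD (ie, θ1 = 0) then BIASCOPD=RRCOPDunadj.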
Adjustment for Confounding by Smoking
The proposed approach to correcting the crude estimate of the association between exposure and lung cancer for the confounding effect of smoking begins by making use of available information on the correction factor for the association between exposure and COPD. Recall that the correct expression for adjusting the target parameter is β1=log(RRLungunadj) − log(BIASLung). Substituting RRCOPDunadj into this expression in place of BIASLung results in an adjusted estimate of the association between exposure and lung cancer that may reduce the magnitude of bias due to uncontrolled confounding by smoking, particularly if θ2≅β2 and θ3≅β3. The proposed expression for correction of the target parameter for confounding by smoking is the quantity β1 ≅ log(RRLungunadj) − log(RRCOPDunadj).
The performance of this adjustment for bias due to confounding can be assessed via direct calculations (eAppendix 1, http://links.lww.com/EDE/A357). The difference between the true bias correction factor for the target parameter, log(BIASLung), and the quantity log(RRCOPDunadj) indicates the magnitude of bias in the target parameter after this adjustment for confounding by smoking. A value of 0 indicates perfect adjustment.
Figure 2 provides illustrative calculations derived for a hypothetical study in which the relative rate of lung cancer among current and former smokers was 23.6 and 8.7, respectively, and the relative rate of COPD among current and former smokers was 12.2 and 8.4, respectively (values obtained from the American Cancer Society's second Cancer Prevention Study7). The percentage of current and former smokers among the unexposed was 35% and 31%, respectively (values that conform to the smoking distribution among US men aged 25–64 years in 1987 national survey data).4 The performance of this adjustment approach is shown for various scenarios in which the prevalence of smoking is greater among exposed than unexposed workers. Under the scenarios considered, 90% or more of the bias due to confounding by smoking is removed.
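A calculation of this kind can be sketched in a few lines of Python. The sketch below assumes a single stratum and the usual indirect-adjustment form of the bias factor, with never smokers as the reference group. The relative rates and the unexposed smoking distribution are the values quoted above. The exposed smoking distribution (45% current, 31% former) is a hypothetical choice for illustration and is not necessarily one of the scenarios plotted in Figure 2.

```python
import math


def bias_factor(rr_current, rr_former, exposed, unexposed):
    """Ratio of mean smoking-related rate multipliers, exposed vs unexposed.

    `exposed` and `unexposed` are (proportion current, proportion former)
    pairs for a single stratum; never smokers are the reference group.
    """
    def mean_multiplier(p_current, p_former):
        return 1.0 + p_current * (rr_current - 1.0) + p_former * (rr_former - 1.0)

    return mean_multiplier(*exposed) / mean_multiplier(*unexposed)


unexposed = (0.35, 0.31)  # current, former smokers among the unexposed (from the text)
exposed = (0.45, 0.31)    # hypothetical distribution among the exposed

bias_lung = bias_factor(23.6, 8.7, exposed, unexposed)  # true correction factor
bias_copd = bias_factor(12.2, 8.4, exposed, unexposed)  # what the COPD analysis estimates

# Bias left in the target parameter (log scale) after the indirect adjustment;
# a value of 0 would indicate perfect adjustment.
residual_log_bias = math.log(bias_lung) - math.log(bias_copd)

print(f"BIAS_Lung = {bias_lung:.3f}, BIAS_COPD = {bias_copd:.3f}")
print(f"residual log bias = {residual_log_bias:.3f}")
```

Varying `exposed` over a grid of prevalences gives a sense of how the residual bias changes as the smoking imbalance between exposed and unexposed workers grows.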
Figure 3 provides illustrative calculations derived for a hypothetical study in which the relative rate of lung cancer among current and former smokers was 14.6 and 4.0, respectively, and the relative rate of COPD among current and former smokers was 14.2 and 5.8, respectively (values obtained from the UK study of smoking and death among male British doctors8). The proportion of current and former smokers among the unexposed was 35% and 31%, respectively. Under the scenarios considered, 96% or more of the bias due to confounding by smoking is removed via adjustment.
Fitting a regression model that simultaneously estimates the crude association between occupational exposure and lung cancer and COPD accounts for potential correlation of the outcomes. For survival analysis, one simple approach to fitting such a model starts by constructing a person-year data set that includes a unique record for each person-year of observation.9,10 The Cox proportional hazards model may be approximated by use of pooled logistic regression analysis of the discrete time hazard (ie, the probability of case occurrence at a given time interval conditional on survival until the start of that interval).
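The person-year data set described above might be constructed along the following lines. This is a minimal sketch under simplifying assumptions: follow-up is counted in whole years, exposure is fixed rather than time-varying, and the `Worker` record and `expand_person_years` helper are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    worker_id: int
    exposed: int         # 1 = exposed, 0 = unexposed
    years_followed: int  # whole person-years of observation
    final_outcome: int   # 0 = alive at end of follow-up, 1 = lung cancer, 2 = COPD


def expand_person_years(worker):
    """Return one record per person-year; the event, if any, falls in the last year."""
    records = []
    for year in range(1, worker.years_followed + 1):
        is_last_year = year == worker.years_followed
        records.append({
            "worker_id": worker.worker_id,
            "year": year,
            "exposed": worker.exposed,
            # Every interval survived is coded 0; only the final interval
            # can carry the cause-of-death category.
            "outcome": worker.final_outcome if is_last_year else 0,
        })
    return records


cohort = [
    Worker(worker_id=1, exposed=1, years_followed=3, final_outcome=1),
    Worker(worker_id=2, exposed=0, years_followed=2, final_outcome=0),
]
person_years = [record for worker in cohort for record in expand_person_years(worker)]
```

Each record then serves as one observation in the pooled logistic regression, with `outcome` as the dependent variable and `exposed` (plus any stratification covariates) as predictors.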
Consider a cohort mortality study in which decedents are classified by underlying cause of death according to a multilevel outcome variable with 3 levels (0 = alive, 1 = lung cancer, 2 = COPD). A polytomous logistic regression model for this multilevel outcome variable estimates the log of the probability that the outcome category is 1, divided by the probability that the outcome category is 0, and simultaneously estimates the log of the probability that the outcome category is 2, divided by the probability that the outcome category is 0. Let β11 denote the parameter representing the crude estimate of the exposure-lung cancer association under this model, and let β21 denote the parameter representing the crude estimate of the exposure-COPD association. When the discrete time hazard probabilities for person-year intervals are low, the estimated β coefficients from the discrete time hazard model approximate those obtained via the continuous time model for hazard ratios,11 and the quantity of interest is RRLung=exp(β11−β21). The variance for (β11−β21) may be derived via the equation Var(β11−β21) = Var(β11) + Var(β21) − 2Cov(β11, β21), and the 95% Wald-type confidence interval for RRLung may be derived as exp[(β11−β21) ± 1.96 (Var(β11−β21))½].
Consider a cohort study of the association between an occupational exposure and lung cancer. Table 1 shows the number of lung cancer cases, person-years at risk, and the hazard ratio contrasting exposed to unexposed workers. The hazard ratio, unadjusted for confounding by smoking, is 2.32. There may be confounding due to smoking in this study, although smoking history information was unmeasured. The adjustment approach described above estimates the crude association between occupational exposure and COPD, a category of cause of death that is unrelated to the occupational hazard (Table 2). Under the assumptions outlined above, the adjusted estimate of the target parameter will be β1 ≅ log(2.32) − log(1.17) = log(1.99).
Table 3 reports the results that would have been obtained if the investigator had had smoking information. The hazard ratio contrasting lung cancer rates among exposed workers to unexposed workers is 2.00, after conditioning on smoking status. Table 3 shows that confounding occurred, because the prevalence of smokers was greater among the exposed than among the unexposed workers, and the risk of lung cancer was substantially higher among smokers than among never smokers. The simple adjustment procedure corrected for essentially all bias due to confounding.
A polytomous logistic regression model was fitted to these hypothetical data to simultaneously estimate crude associations between exposure and lung cancer and COPD. Table 4 reports the obtained estimates and associated covariance matrix. The adjusted estimate of the exposure effect was exp(0.8955–0.2098) = 1.99. Using the reported covariance matrix, a 95% confidence interval (CI) (1.85–2.13) was derived.
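The point estimate and Wald-type interval can be computed from the fitted coefficients as sketched below. The coefficients 0.8955 and 0.2098 are those quoted above. The variance and covariance values are hypothetical placeholders, because the entries of Table 4 are not reproduced in the text, so the printed interval is illustrative only and will not match the reported CI.

```python
import math


def adjusted_rr_with_ci(b_lung, b_copd, var_lung, var_copd, cov, z=1.96):
    """Indirectly adjusted rate ratio exp(b_lung - b_copd) with a Wald-type CI."""
    diff = b_lung - b_copd
    # Var(b_lung - b_copd) = Var(b_lung) + Var(b_copd) - 2 Cov(b_lung, b_copd)
    var_diff = var_lung + var_copd - 2.0 * cov
    se = math.sqrt(var_diff)
    return math.exp(diff), math.exp(diff - z * se), math.exp(diff + z * se)


rr, lower, upper = adjusted_rr_with_ci(
    b_lung=0.8955,    # crude exposure-lung cancer coefficient (from the text)
    b_copd=0.2098,    # crude exposure-COPD coefficient (from the text)
    var_lung=0.0010,  # hypothetical placeholder, not from Table 4
    var_copd=0.0008,  # hypothetical placeholder, not from Table 4
    cov=0.0002,       # hypothetical placeholder, not from Table 4
)
print(f"adjusted RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

A positive covariance between the two coefficients narrows the interval relative to treating the estimates as independent, which is one practical reason to fit the two outcomes in a single model.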
Other Smoking-Related Diseases
In some settings, the occupational exposure of interest may actually increase the risk of COPD, violating the assumption that there is no true causal association between exposure and COPD. If θ1≠0 then log(RRCOPDunadj) =log(BIASCOPD) + θ1. An indirect adjustment procedure that substitutes RRCOPDunadj in place of BIASLung will result in residual bias equal to log(BIASLung) − log(BIASCOPD) − θ1. If there is concern about the causal associations underlying this approach, an investigator might consider analysis of outcomes other than COPD, or examine the consistency of findings obtained via analysis of outcomes other than COPD. The alternative outcome should be associated with smoking but not with the occupational hazard. Occupational exposure to silica, for example, is associated with lung cancer and COPD,12–14 but not with cancers of the mouth, pharynx, larynx, or esophagus (which are associated with smoking). Illustrative calculations were derived for a hypothetical study in which the relative rate of lung cancer among current and former smokers was 14.6 and 4.0, respectively, and the relative rate of cancers of the mouth, pharynx, larynx, and esophagus among current and former smokers was 6.7 and 2.9, respectively (values obtained from the UK study of smoking and death among male British doctors8). The proportion of current and former smokers among the unexposed was 35% and 31%, respectively. Under the scenarios considered, 91% or more of the bias due to confounding by smoking is removed via adjustment (results not shown).
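The residual bias expression given above for the case θ1≠0 follows directly from the two bias relations defined earlier. The steps can be written out as follows, using only quantities already introduced in the text:

```latex
\begin{aligned}
\log\left(RR_{Lung}^{unadj}\right) &= \beta_{1} + \log\left(\mathrm{BIAS}_{Lung}\right), \\
\log\left(RR_{COPD}^{unadj}\right) &= \theta_{1} + \log\left(\mathrm{BIAS}_{COPD}\right), \\
\log\left(RR_{Lung}^{unadj}\right) - \log\left(RR_{COPD}^{unadj}\right)
  &= \beta_{1} + \left[\log\left(\mathrm{BIAS}_{Lung}\right) - \log\left(\mathrm{BIAS}_{COPD}\right) - \theta_{1}\right].
\end{aligned}
```

The bracketed term is the residual bias. When θ1 = 0 it reduces to the difference between the two log bias factors, and when θ1 > 0 it pulls the adjusted estimate below what that difference alone would give.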
It has been argued that failure to adjust for smoking generally will not result in substantial confounding of associations between occupational exposures and lung cancer.3,15 However, confounding by smoking continues to be of concern in epidemiologic studies of lung cancer, particularly with low-level occupational exposures to lung carcinogens typical in contemporary settings. Epidemiologic methods for sensitivity analysis to evaluate uncontrolled confounding due to smoking in occupational cohort studies have a long history.3,4,6,15 Typically, a sensitivity analysis requires various plausible assumptions for the smoking-lung cancer association and the proportion of smokers among exposed and unexposed subgroups (eg, values for π12, π13, π02, and π03). To account for typical age or birth cohort trends in baseline smoking habits, these proportions must be allowed to vary over time, which requires further assumptions.
This article proposes an approach to adjust for potential confounding by smoking. The approach does not require assumptions about the proportion of smokers among exposed and unexposed subgroups. Rather, this adjustment makes use of an association between the exposure and a smoking-related disease other than the one of primary interest. Which smoking-related cause of death should be considered? The approach developed here requires that the cause of death should not have a causal association with the occupational exposure of interest but should be associated with smoking (Fig. 1). In many settings, COPD may be a reasonable choice. The motivating example for this article was research on lung cancer among workers who were occupationally exposed to x-rays or gamma radiation at low dose rates (not believed to be associated with risk of COPD). Any apparent association between occupational exposure and COPD should be due only to uncontrolled confounding by smoking. This assumption should be considered carefully in the context of the occupational exposure under investigation. Some occupational hazards, such as asbestos, silica, and other dusts, are associated with COPD.13,14 If the exposure under study actually increases the risk of COPD, then adjustment may bias the risk coefficient downwards.
Unlike a qualitative assessment of confounding by smoking, the approach described here results in a quantitative assessment of confounding by smoking, with assumptions made explicit by their mathematical presentation, and without the need to speculate about the (unknown) distribution of smoking with respect to exposure in the study cohort. This approach employs a standard proportional hazards model that implies multiplicative effects of the exposure and confounder; a similar approach can be developed for a model of additive effects (eAppendix 2, http://links.lww.com/EDE/A357).
As illustrated in Figures 2 and 3, under plausible values for θ2, β2, θ3, and β3 this approach leads to a reasonable degree of control for confounding by smoking, even for some relatively extreme scenarios of confounding. Figures 2 and 3 include scenarios, for example, in which essentially all exposed workers are either current or former smokers. Furthermore, an estimate of the target parameter of interest, adjusted for confounding by smoking, remains valid even under scenarios in which the prevalence of smoking varies across covariate strata.
Unlike a sensitivity analysis, the adjustment approach described in this article does not require assumptions regarding the magnitude of the smoking-lung cancer associations. This is an advantage because such estimates can vary among study cohorts. Figures 2 and 3 illustrate scenarios using relative-rate estimates from 2 large, widely cited studies of smoking effects (the American Cancer Society's second Cancer Prevention Study7 and the UK study of smoking and death among male British doctors8). Despite the fact that the associations among smoking, COPD, and lung cancer differ between these populations, the adjustment procedure described in this article consistently performed well (Figs. 2, 3).
Best and Hansell16 recently described an approach for joint modeling of the spatial distribution of COPD and lung cancer mortality data. Their focus was on detection of geographic variation in COPD mortality rates in Great Britain. Similar to the method described in this article, they considered joint models for these diseases to account for unmeasured confounding due to spatial clustering of smoking. The spatial methods described by Best and Hansell assume spatially structured effects (ie, a structure for effects across adjacent areas); no such assumption is employed in this article. Rather, the current article focuses on etiologic research in occupational cohort mortality studies, and specifically on estimation of the association between a primary exposure variable of interest and lung cancer mortality.
Most published examples of sensitivity analyses for assessing confounding by smoking of occupational exposure-lung cancer associations have assumed that the prevalence of smoking in the study cohort does not vary over time.3,4,17 However, in a typical occupational cohort study, people move through cumulative exposure groups over time. Since smokers typically commence smoking at young ages and may quit at older ages, the percentage of former smokers may tend to increase and the percentage of current smokers may tend to diminish as the cohort ages. Therefore, even if the prevalence of smoking is similar in jobs with different exposure intensities, a pattern of confounding by smoking may occur in an analysis that employs a cumulative metric of occupational exposure. As long as the assumptions outlined above hold, the adjustment procedure described in this article will accommodate even the relatively complex and time-dynamic patterns of confounding by smoking that may occur in occupational cohort studies with time-varying exposure metrics.
In summary, this adjustment approach for confounding by smoking appears to provide a high degree of control for confounding under plausibly encountered scenarios. Unlike the typical approaches to sensitivity analysis, this adjustment does not require assumptions regarding distributions of the confounding variable. The results obtained via this method may help in the interpretation of effect estimates derived from occupational cohort studies that lack individual smoking history information.
I thank N. Kyle Steenland and Steve Wing for comments on an earlier draft of this manuscript.
1. Blair A, Steenland K, Shy C, O'Berg M, Halperin W, Thomas T. Control of smoking in occupational epidemiologic studies: methods and needs. Am J Ind Med
2. Schottenfeld D, Fraumeni JF. Cancer Epidemiology and Prevention. New York: Oxford University Press; 2006.
3. Axelson O, Steenland K. Indirect methods of assessing the effects of tobacco use in occupational studies. Am J Ind Med
4. Steenland K, Greenland S. Monte Carlo sensitivity analysis and Bayesian analysis of smoking as an unmeasured confounder in a study of silica and lung cancer. Am J Epidemiol
5. Gail MH, Wacholder S, Lubin JH. Indirect corrections for confounding under multiplicative and additive risk models. Am J Ind Med
6. Steenland K, Beaumont J, Halperin W. Methods of control for smoking in occupational cohort mortality studies. Scand J Work Environ Health
7. Thun MJ, Apicella LF, Henley SJ. Smoking vs. other risk factors as the cause of smoking-attributable deaths: confounding in the courtroom. JAMA
8. Doll R, Peto R, Boreham J, Sutherland I. Mortality in relation to smoking: 50 years' observations on male British doctors. BMJ
9. Singer JD, Willett JB. Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence. New York: Oxford University Press; 2003.
10. Allison PD. Survival Analysis Using SAS: A Practical Guide. Cary, NC: SAS Institute; 1995.
11. Abbott RD. Logistic regression in survival analysis. Am J Epidemiol
12. Checkoway H, Heyer NJ, Seixas NS, et al. Dose-response associations of silica with nonmalignant respiratory disease and lung cancer mortality in the diatomaceous earth industry. Am J Epidemiol
13. Rushton L. Chronic obstructive pulmonary disease and occupational exposure to silica. Rev Environ Health
14. Rushton L. Occupational causes of chronic obstructive pulmonary disease. Rev Environ Health
15. Axelson O. Confounding from smoking in occupational epidemiology. Br J Ind Med
16. Best N, Hansell AL. Geographic variations in risk: adjusting for unmeasured confounders through joint modeling of multiple diseases. Epidemiology
17. Kriebel D, Zeka A, Eisen EA, Wegman DH. Quantitative evaluation of the effects of uncontrolled confounding by alcohol and tobacco in occupational cancer studies. Int J Epidemiol