Models of Smoking and Lung Cancer Risk: A Means to an End

Samet, Jonathan M.*; Thun, Michael J.†; Berrington de Gonzalez, Amy*

doi: 10.1097/EDE.0b013e3181271afa

This commentary provides some historical context to the analysis of smoking and lung cancer risk by Lubin and colleagues in this issue of Epidemiology. It also considers the potential utility of ongoing efforts to apply complex mathematical models to epidemiologic data on smoking and lung cancer risk. We conclude that the work of Lubin and colleagues adds to the models already developed and points to some potential complexities that models should incorporate.

From the *Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland; and †the American Cancer Society, Atlanta, Georgia.

Correspondence: Jonathan M. Samet, Professor and Chairman, Department of Epidemiology, Johns Hopkins University, Bloomberg School of Public Health, 615 N. Wolfe St., Suite W6041, Baltimore, MD 21205. E-mail:

More than a half century has passed since the first epidemiologic studies provided strong evidence that cigarette smoking causes lung cancer.1 Since then, numerous case-control and cohort studies have characterized how risk varies with the number of cigarettes smoked and the duration of smoking, as well as time since cessation for former smokers.2,3 The epidemiologic data have been used to test and refine models of carcinogenesis, to predict individual risk of developing disease for clinical purposes,4,5 and to estimate the proportion of lung cancer cases or deaths in the population attributable to smoking or other exposures as the basis for regulatory policies. This commentary provides some historical context to the analysis of smoking and lung cancer risk by Lubin and colleagues6 in this issue of Epidemiology. It also considers the potential utility of ongoing efforts to apply complex mathematical models to epidemiologic data on smoking and lung cancer risk.

Epidemiologic data have long been a basis for formulating disease models, particularly for cancer. One early example is the multistage model of carcinogenesis, which fits closely with the age-related increase in the occurrence of lung and other solid tumors. This was first noted by Armitage and Doll in 1961,7 decades before the discovery of specific genetic or epigenetic events involved in lung cancer. In 1978, Doll and Peto fit a multistage model of carcinogenesis to lung cancer incidence in the British Doctors’ Study8 and demonstrated that risk increased with the fourth or fifth power of the duration of smoking but only with the second power of the number of cigarettes smoked per day. The results imply that the temporal trend toward increasingly younger age at initiation and more prolonged duration of smoking would lead to substantially increased risk. Furthermore, analyses that combine number of cigarettes smoked and duration of smoking into a single cumulative measure (eg, pack-years) may misrepresent the relative importance of the 2 factors.9
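
The contrast between the two exponents can be made concrete with a minimal sketch. It assumes a pure power-law scaling and an exponent of 4.5 for duration (an illustrative midpoint of the "fourth or fifth power" quoted above); it ignores any baseline or offset terms in the published model, and the function name is hypothetical.

```python
# Illustrative power-law scaling only. The exponents are assumptions taken
# from the ranges quoted in the text, not fitted values.
DURATION_POWER = 4.5   # assumed midpoint of "fourth or fifth power"
INTENSITY_POWER = 2.0  # "second power" of cigarettes smoked per day


def risk_multiplier(duration_ratio: float, intensity_ratio: float) -> float:
    """Factor by which risk changes when duration and intensity are rescaled."""
    return duration_ratio ** DURATION_POWER * intensity_ratio ** INTENSITY_POWER


# Both changes below double cumulative exposure (pack-years).
double_duration = risk_multiplier(2.0, 1.0)   # same cigarettes/day, twice the years
double_intensity = risk_multiplier(1.0, 2.0)  # same years, twice the cigarettes/day

print(f"doubling duration multiplies risk by about {double_duration:.1f}")
print(f"doubling intensity multiplies risk by about {double_intensity:.1f}")
```

Under these assumed exponents, doubling duration multiplies risk roughly 22.6-fold and doubling intensity multiplies it 4-fold, even though both double the pack-years. This is the sense in which a single cumulative measure can misrepresent the relative importance of the 2 factors.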

More recently, researchers have examined other models of lung cancer risk in smokers, some replicating the findings of Doll and Peto9,10 and others offering new approaches.5,11,12 These new papers are based on large cohort and case-control studies, and model development and estimation are facilitated by new statistical methods and software. The models vary in their dependence on assumptions about the underlying biology of carcinogenesis in the situation of interest. Some, known as biologically motivated models, include parameters for tissue growth, cell kinetics, and number of stages, none of which can be verified or quantified at present. Others, known as empirical risk models, rely more directly on the observational data, although they too make implicit assumptions about the shape of the dose-response relationship and how model parameters should be specified.

In this issue of Epidemiology, Lubin and colleagues6 further examine the relationship between lung cancer risk (or, more precisely, the excess relative risk of lung cancer) and several measures of smoking. Using a model they proposed previously,11 they describe a set of analyses on modification of the effects of a cumulative exposure indicator (pack-years of smoking) and of an indicator of “intensity” of smoking (cigarettes smoked per day) by attained age and sex, as well as additional descriptors of smoking. The underlying goal of this analysis, like that of Doll and Peto in their 1978 paper,8 is to gain insight about the biologic actions of smoking on lung cancer risk. Despite its laudable aim, however, their approach does not resolve some of the inherent challenges encountered when using epidemiologic data to test and refine models of carcinogenesis.
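
For orientation, the excess relative risk (ERR) is the relative risk (RR) minus one. One generic way to let the effect of cumulative exposure be modified by intensity is sketched below. The symbols $d$ (pack-years), $n$ (cigarettes smoked per day), $\beta$, and the function $g$ are notation assumed here for illustration; they are not taken from the paper, whose exact specification may differ.

```latex
\mathrm{ERR}(d, n) \;=\; \mathrm{RR}(d, n) - 1 \;=\; \beta \, d \, g(n)
```

In this sketch, $\beta$ would be the ERR per pack-year at a reference intensity where $g(n) = 1$, and $g(n)$ would describe how the effect per pack-year varies with the number of cigarettes smoked per day.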

Lubin et al6 pool data from 2 large case-control studies: the European Smoking and Health Study, a multicenter study conducted in the late 1970s, and the German Radon Study, carried out in the 1990s. In classifying exposure to cigarette smoking, the authors recognize the limitations of standard approaches that specify either duration or cigarettes/day separately or the combined variable pack-years; such approaches preclude the ability to separate dose rate from total dose. Pack-years alone cannot distinguish a history of smoking 40 cigarettes/day for 10 years from smoking 20 cigarettes/day for 20 years. Similarly, an increase in relative risk associated with smoking more cigarettes/day for a fixed duration cannot be interpreted as representing solely the effect of the increase in cigarettes smoked per day, if cumulative exposure is assumed to affect risk independently. The authors attempt to resolve this problem by including indices of both cumulative exposure (pack-years of smoking) and “intensity” (cigarettes smoked per day) in the model. However, the attempted solution creates a new problem, since cigarettes smoked per day is represented twice in the model. For a fixed number of pack-years, an increase in the number of cigarettes/day requires a decrease in the duration of smoking. A cumulative exposure of 20 pack-years of smoking could represent either 20 cigarettes/day for 20 years or 40 cigarettes/day for 10 years.
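
The arithmetic behind this ambiguity can be sketched in a few lines, assuming the conventional definition of a pack-year as 20 cigarettes per day for 1 year. The function names are hypothetical and serve only to illustrate the point.

```python
CIGARETTES_PER_PACK = 20  # conventional pack size used to define pack-years


def pack_years(cigarettes_per_day: float, years_smoked: float) -> float:
    """Cumulative exposure: packs smoked per day times years of smoking."""
    return cigarettes_per_day / CIGARETTES_PER_PACK * years_smoked


def implied_duration(total_pack_years: float, cigarettes_per_day: float) -> float:
    """Years of smoking implied by a fixed pack-year total at a given intensity."""
    return total_pack_years * CIGARETTES_PER_PACK / cigarettes_per_day


# Two different smoking histories that give the same cumulative exposure.
moderate_long = pack_years(cigarettes_per_day=20, years_smoked=20)
heavy_short = pack_years(cigarettes_per_day=40, years_smoked=10)
print(moderate_long, heavy_short)

# Holding pack-years fixed, raising intensity necessarily shortens duration.
for intensity in (10, 20, 40):
    print(intensity, implied_duration(20, intensity))
```

Because duration is fully determined once pack-years and cigarettes per day are both fixed, a model containing both terms cannot vary intensity at a constant cumulative exposure without also varying duration.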

The difficulties in classifying these various dimensions of exposure are further complicated by the number of interaction terms tested in the models. Testing for effect-modification of both smoking intensity and total pack-years results in 5 interaction tests for each potential effect modifier. Not only does the large number of interaction tests increase the potential for false-positive results, but the observed interactions are complex and interpreted based primarily on the level of statistical significance. Most of the interaction terms considered, such as years since cessation, appear to modify the association with intensity. The opposite is reported for sex, which modifies the association of lung cancer with total exposure (pack-years) but not with cigarettes per day. It is difficult to interpret these findings in a biologic framework, given the potential for false positives, the complexity of the associations, and unexplained heterogeneity of results between the 2 studies.

Despite these concerns, several findings are intriguing and may in fact provide insight into underlying biologic processes. Both this study and a previous analysis of European case-control study data11 report that lung cancer risk does not increase in a linear fashion with the number of cigarettes smoked per day. Instead, the excess relative risk per cigarette smoked diminishes above approximately 20 cigarettes per day. This observation meshes nicely with studies of some biomarkers of smoke components13; for nicotine specifically, the intake per cigarette is lower among those smoking more than 20 cigarettes per day than among those smoking fewer.13 Type of cigarette smoked has little impact on risk, even though the contrast is between filter and nonfilter cigarettes smoked decades previously, particularly in the European study. The latter finding is consistent with other epidemiologic studies and with studies comparing biomarkers in smokers of cigarettes with differing machine-measured yields of tar and nicotine.3,13,14 These observations may relate more to the dosimetry of exposure to carcinogens from cigarette smoke than to underlying mechanisms of carcinogenesis.

Another reason to develop models of lung cancer risk in relation to smoking is to predict individual risk for clinical purposes such as counseling about smoking cessation or selecting high-risk individuals for participation in clinical trials of chemoprevention or screening.4,5 Models developed for other diseases are exemplary; the risk prediction model based on the Framingham Heart Study data has been widely applied in clinical practice,15 as has the breast cancer model developed by Gail and colleagues.16 The Bach lung cancer model was developed primarily to estimate expected lung cancer incidence and mortality rates in heavy smokers to compare these with the observed rates in observational studies of computed tomography screening for lung cancer.17

For clinical purposes, we need accepted and validated models for individual risk prediction. Cigarette smoking is a remarkably strong cause of lung cancer, and there are numerous large data sets available for model development and validation. The work of Lubin and colleagues6 adds to the models already developed and points to some potential complexities that models should incorporate. A next step might be collaboration among modelers to compare their approaches and assess generalizability of their risk estimates with the goal of offering the best models for clinical and public health application. After all, these models are just a means to the end: ending the epidemic of lung cancer deaths.

JONATHAN SAMET is professor and chair of the Department of Epidemiology at the Johns Hopkins Bloomberg School of Public Health and director of the School's Institute for Global Tobacco Control. The epidemiology of lung cancer has been a major focus of his work, with emphasis on tobacco smoking and radon. MICHAEL THUN is vice-president of Epidemiology and Surveillance Research at the American Cancer Society, overseeing both cancer surveillance and studies on the causes and prevention of cancer. AMY BERRINGTON DE GONZALEZ is an assistant professor of epidemiology and biostatistics at the Johns Hopkins Bloomberg School of Public Health where her research is focused on cancer epidemiology and prevention and includes the development of statistical methods for epidemiologic studies.



1. U.S. Department of Health Education and Welfare (DHEW). Smoking and Health. Report of the Advisory Committee to the Surgeon General. DHEW Publication No. [PHS] 1103. 1964. Washington, DC: U.S. Government Printing Office.
2. Burns DM, Garfinkel L, Samet JM, eds., for the U.S. Department of Health and Human Services (USDHHS), Public Health Service & National Cancer Institute (NCI). Changes in Cigarette-Related Disease Risks and Their Implication for Prevention And Control. Bethesda, MD: U.S. Government Printing Office (NIH Publication No. 97-4213). Smoking and Tobacco Control Monograph; 1997.
3. U.S. Department of Health and Human Services (USDHHS). The Health Effects of Active Smoking: A Report of the Surgeon General. Washington, DC: U.S. Government Printing Office; 2004.
4. Spitz MR, Hong WK, Amos CI, et al. A risk model for prediction of lung cancer. J Natl Cancer Inst. 2007;99:715–726.
5. Bach PB, Kattan MW, Thornquist MD, et al. Variations in lung cancer risk among smokers. J Natl Cancer Inst. 2003;95:470–478.
6. Lubin JH, Caporaso N, Wichmann HE, et al. Cigarette smoking and lung cancer: modeling effect modification of total exposure and intensity. Epidemiology. 2007;18:639–648.
7. Armitage P, Doll R. Stochastic models for carcinogenesis. Berkeley, CA: Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability; 1961;19–38.
8. Doll R, Peto R. Cigarette smoking and bronchial carcinoma: dose and time relationships among regular smokers and lifelong non-smokers. J Epidemiol Community Health. 1978;32:303–313.
9. Flanders WD, Lally CA, Zhu BP, et al. Lung cancer mortality in relation to age, duration of smoking, and daily cigarette consumption: results from Cancer Prevention Study II. Cancer Res. 2003;63:6556–6562.
10. Knoke JD, Shanks TG, Vaughn JW, et al. Lung cancer mortality is related to age in addition to duration and intensity of cigarette smoking: an analysis of CPS-I data. Cancer Epidemiol Biomarkers Prev. 2004;13:949–957.
11. Lubin JH, Caporaso NE. Cigarette smoking and lung cancer: modeling total exposure and intensity. Cancer Epidemiol Biomarkers Prev. 2006;15:517–523.
12. Thurston SW, Liu G, Miller DP, et al. Modeling lung cancer risk in case-control studies using a new dose metric of smoking. Cancer Epidemiol Biomarkers Prev. 2005;14:2296–2302.
13. Blackford AL, Yang G, Hernandez-Avila M, et al. Cotinine concentration in smokers from different countries: relationship with amount smoked and cigarette type. Cancer Epidemiol Biomarkers Prev. 2006;15:1799–1804.
14. International Agency for Research on Cancer (IARC). Tobacco smoke and involuntary smoking. IARC monograph. Lyon, France: International Agency for Research on Cancer; 2004.
15. Grundy SM, Balady GJ, Criqui MH, et al. Primary prevention of coronary heart disease: guidance from Framingham: a statement for healthcare professionals from the AHA Task Force on Risk Reduction. American Heart Association. Circulation. 1998;97:1876–1887.
16. Gail MH, Brinton LA, Byar DP, et al. Projecting individualized probabilities of developing breast cancer for white females who are being examined annually [see comments]. J Natl Cancer Inst. 1989;81:1879–1886.
17. Bach PB, Jett JR, Pastorino U, et al. Computed tomography screening and lung cancer outcomes. JAMA. 2007;297:953–961.
© 2007 Lippincott Williams & Wilkins, Inc.