Original Article

Smoking and the Risk of Lung Cancer

Susceptibility with GSTP1 Polymorphisms

Miller, David P.*; Neuberg, Donna; De Vivo, Immaculata; Wain, John C.§; Lynch, Thomas J.; Su, Li*; Christiani, David C.*∥

doi: 10.1097/01.ede.0000073120.46981.24


Cigarette smoke contains many carcinogens, including polyaromatic hydrocarbons, n-nitrosamines, and aromatic amines.1 When inhaled, they enter the lung and then the bloodstream, where they are metabolized (generally) in a two-phase process.2 Phase I involves the “activation” of the carcinogen by oxygenation. This is mainly performed through cytochrome P450 enzymes encoded by the CYP gene superfamily. Phase II is the detoxification step, in which the “activated” carcinogen is rendered more hydrophilic, thus excretable. Glutathione S-transferases, a major group of enzymes whose main classes are α, μ, π, and θ, are directly involved in the detoxification step of metabolism (phase II).2 The expression of these enzymes differs among organs.

GST π has the highest expression in the lung and is one of the main detoxifiers of the activated form of benzo[a]pyrene (BPDE; 7,8-diol-9,10-epoxide), a major carcinogen of tobacco smoke.1,3,4 GST π is encoded by a polymorphic gene, GSTP1. One polymorphism of GSTP1 is caused by a single base pair substitution in which adenine (A) is replaced by guanine (G), leading to an amino acid substitution in which isoleucine (I105) is replaced by valine (V105). This substitution results in lower GST π enzymatic activity5,6 and is associated with higher hydrophobic adduct levels in lung tissue7 and higher levels of polycyclic aromatic hydrocarbon (PAH)–DNA adducts in human lymphocytes.8

GSTP1 polymorphisms may modify the association between cumulative exposure to active smoking and lung cancer. Lung cancer risk is directly associated with increases in cumulative smoking exposure.9,10 Adduct levels have also been shown to increase in a dose-dependent manner in smokers.11 Lung cancer risk is also affected by other components of a smoking history. Specifically, lung cancer risk increases across smoking status categories (nonsmoker, ex-smoker, current smoker) and decreases with increasing number of years since smoking cessation.9 Previous studies have examined the modification of the association between smoking and lung cancer risk by polymorphisms in metabolism genes. Most studies have simply stratified lifetime exposure to cigarette smoking without considering possible effects of smoking status or years since smoking cessation.12-16 Few studies have examined the effect of polymorphisms on the association between continuous cumulative smoking exposure and lung cancer risk, and those that did also did not consider smoking status.17,18 Yet there is evidence that the time since a participant quit smoking may affect the association between cumulative exposure to smoking and lung cancer.

We investigated whether the GSTP1 GG genotype modified the association between a cumulative exposure to smoking and lung cancer risk, assuming that this association also differed by smoking status. We also examined the assumptions and resulting interpretation of common modeling approaches.


We recruited subjects for this study as part of an ongoing hospital-based case–control study initiated in 1992 at the Massachusetts General Hospital (MGH), in Boston, MA. The study was approved by the Institutional Review Board at both Massachusetts General Hospital and Harvard School of Public Health. Eligible cases included any person over the age of 18 years with a diagnosis of primary lung cancer, evaluated by the pulmonary, thoracic surgery, or hematology–oncology units at MGH for either surgery (from 1992), chemotherapy (from 1996), radiation treatment, or any combination. An MGH lung pathologist histologically confirmed all cases.

We recruited controls first among friends and nonblood-related family members of the cases (usually spouses). If friends of lung cancer patients were not available, we recruited controls from friends and family of patients receiving thoracic surgery, chemotherapy, or radiation treatment of a condition other than lung cancer. To determine whether our controls were similar to the Massachusetts general population, we compared important covariate data obtained from our controls with information provided by the Massachusetts Tobacco Survey, 1993 to 19979 and noted that the results were similar. Although the data were not restricted to residents of Massachusetts, 84% of the cases and 79% of the controls resided in Massachusetts, whereas 10% of the cases and 11% of the controls were from the rest of the New England area. The remaining cases (6%) and controls (10%) were from other regions of the United States.

A research nurse sought and obtained informed consent from all cases and controls and proceeded to administer the health and diet questionnaires. Some participants opted to complete the questionnaires at home and returned them by mail in a self-addressed stamped envelope. Participants were contacted by telephone when there were missing data. The participation rate was over 85% and did not differ between cases and controls. To reduce potential variation in allele frequency by ethnicity, only whites were considered in the analysis.

We collected blood samples from all participants at the time of recruitment. Two or three 10-ml EDTA tubes were used for sample collection. Samples were processed in the molecular epidemiology laboratory at the Harvard School of Public Health. DNA was extracted from whole blood for the purpose of genotyping. All genotyping was performed using polymerase chain reaction–restriction fragment length polymorphism techniques and was blinded to case or control status. GSTP1 genotypes were determined using methods described by Harris et al.19 For quality control, a random 5% of the samples were repeated and found to be 100% concordant. Two authors independently reviewed all of the agarose gels and genotype data entry.

Data on other variables were obtained through the health questionnaire.17,20,21 These included age, sex, race, weight, education, medical history, smoking history, family history of cancer, work history, exposure to various substances, participation in specific activities, and food preparation and consumption. We estimated cumulative exposure to cigarette smoking in pack-years by multiplying the mean number of packs smoked per day by the number of years of smoking, taking into account periods of smoking cessation. Smoking status was defined as current smoker (current smoker or quit smoking less than a year before recruitment), ex-smoker (quit smoking more than a year before recruitment), and nonsmoker (smoked 100 cigarettes or fewer in a lifetime).
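The exposure definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the study's actual data-processing code: the `SmokingPeriod` record, the function names, and the choice to sum over separate smoking periods (as one way of accounting for cessation) are assumptions, as is the treatment of someone who quit exactly one year before recruitment as an ex-smoker, a boundary the text does not specify.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class SmokingPeriod:
    """One uninterrupted stretch of smoking (hypothetical record layout)."""
    packs_per_day: float
    years: float


def pack_years(periods: Sequence[SmokingPeriod]) -> float:
    # Packs per day times years smoked, summed over periods so that
    # intervals of cessation contribute nothing.
    return sum(p.packs_per_day * p.years for p in periods)


def smoking_status(lifetime_cigarettes: int,
                   years_since_quit: Optional[float]) -> str:
    # years_since_quit is None for someone still smoking at recruitment.
    if lifetime_cigarettes <= 100:
        return "nonsmoker"
    if years_since_quit is None or years_since_quit < 1:
        return "current"
    # Assumption: quitting exactly one year ago counts as ex-smoker.
    return "ex-smoker"
```

For example, a participant who smoked one pack a day for 10 years, stopped for a while, and then smoked half a pack a day for another 10 years would be assigned 15 pack-years under this sketch.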

Genotype frequencies were calculated among controls to test for Hardy–Weinberg equilibrium. We considered GSTP1 GG and AG genotypes separately in each model, thus avoiding the assumptions required to pool heterozygote and homozygote variants. The continuous variables used in the models were assessed to determine if any transformations were necessary. The square root transformation of the variable for pack-years was used in the analyses.
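As an illustration of the Hardy–Weinberg check, the sketch below compares observed genotype counts with those expected from the estimated allele frequency, using a one-degree-of-freedom chi-square goodness-of-fit test. The text does not say which test was used, so the chi-square approach is an assumption, as are the function name and the genotype counts in the usage note.

```python
import math


def hwe_chi_square(n_aa: int, n_ag: int, n_gg: int):
    """Chi-square test (1 df) for Hardy-Weinberg equilibrium.

    Returns (chi2, p_value). Assumes both alleles are present,
    so that no expected count is zero.
    """
    n = n_aa + n_ag + n_gg
    p = (2 * n_aa + n_ag) / (2 * n)   # estimated frequency of the A allele
    q = 1.0 - p                       # estimated frequency of the G allele
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ag, n_gg)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

With hypothetical counts of 36 AA, 48 AG, and 16 GG, the observed counts match the expected counts and the statistic is essentially zero, which is the pattern one would expect from controls in equilibrium.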

To assess whether GSTP1 GG modified the association between pack-years and lung cancer risk, we used a model with two interaction terms, one between genotype and pack-years in current smokers and another between genotype and pack-years in ex-smokers (model A). Model A included indicator variables for the GSTP1 polymorphism (AG, GG), a continuous variable for pack-years among ex-smokers (where the variable equals 0 if the participant is either a nonsmoker or a current smoker), a continuous variable for pack-years among current smokers (where the variable equals 0 for nonsmokers and ex-smokers), and two separate interaction terms: one between GSTP1 GG and pack-years for ex-smokers, and one between GSTP1 GG and pack-years for current smokers.
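The coding of model A's genotype and pack-years terms can be made concrete with a short sketch. The function and key names are hypothetical, and the adjustment covariates are omitted; the square root of pack-years is used because that is the transformation the analysis reports.

```python
import math

GENOTYPES = {"AA", "AG", "GG"}
STATUSES = {"nonsmoker", "ex-smoker", "current"}


def model_a_terms(genotype: str, status: str, pack_years: float) -> dict:
    """Genotype and pack-years terms of model A for one participant."""
    if genotype not in GENOTYPES:
        raise ValueError(f"unknown genotype: {genotype}")
    if status not in STATUSES:
        raise ValueError(f"unknown smoking status: {status}")
    root_py = math.sqrt(pack_years)
    ag = 1.0 if genotype == "AG" else 0.0
    gg = 1.0 if genotype == "GG" else 0.0
    # Each pack-years variable is zero outside its own smoking stratum,
    # so nonsmokers have zeros for both.
    py_ex = root_py if status == "ex-smoker" else 0.0
    py_current = root_py if status == "current" else 0.0
    return {
        "AG": ag,
        "GG": gg,
        "py_ex": py_ex,
        "py_current": py_current,
        "GG_x_py_ex": gg * py_ex,
        "GG_x_py_current": gg * py_current,
    }
```

Under this coding a nonsmoker has zeros for every pack-years term but still enters the model, which is how participants with zero exposure can anchor the dose–response estimate.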

We also examined a model that assumes nonsmokers do not contribute to the estimate of lung cancer risk associated with pack-years (model B), and a model that assumes smoking status does not have an impact on the association between a cumulative smoking exposure and lung cancer risk (Model C). Model B considered a three-way interaction among genotype, smoking status and pack-years. Specifically, Model B contained indicator variables for the GSTP1 polymorphisms (AG, GG), smoking status (ex-smoker, current smoker), pack-years, and all the two-way and three-way interactions of these variables. Model C considered an interaction term between genotype and pack-years only. Specifically, model C included indicator variables for the GSTP1 polymorphisms (AG, GG), smoking status (ex-smoker, current smoker), pack-years, and an interaction term between GSTP1 GG and pack-years. All models included age, sex, and the number of years since a participant quit smoking (which was coded as zero for current and nonsmokers). We used the likelihood ratio test to evaluate the interaction terms.22
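A likelihood ratio test for interaction terms compares the maximized log-likelihoods of two nested models, one with and one without those terms. The sketch below is a generic illustration rather than the authors' code: the function name and the log-likelihood values in the usage note are hypothetical, and only 1 or 2 degrees of freedom are handled because those cases have closed-form tail probabilities in the standard library.

```python
import math


def likelihood_ratio_test(loglik_reduced: float, loglik_full: float, df: int):
    """Return (statistic, p_value) for nested models differing by df terms."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    if stat < 0:
        raise ValueError("the full model should fit at least as well")
    if df == 1:
        p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square tail, 1 df
    elif df == 2:
        p_value = math.exp(-stat / 2.0)             # chi-square tail, 2 df
    else:
        raise NotImplementedError("this sketch handles df of 1 or 2 only")
    return stat, p_value
```

For instance, dropping the two interaction terms of model A would give a 2-df test; with hypothetical log-likelihoods of -105.0 (reduced) and -100.0 (full), the statistic is 10.0 and the p-value is exp(-5), about 0.0067.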


Table 1 shows the basic characteristics of cases and controls. Cases were older and more likely to be men. As expected, the proportion of nonsmokers was higher among controls than among cases (36% vs. 6%). The median pack-years among smokers was significantly higher for cases (53 vs. 26), and the median number of years since an ex-smoker quit smoking was higher among controls (18 vs. 11). Although the frequency of participants without a college degree was similar for cases and controls (70% vs. 69%), there were fewer college graduates among cases (22% vs. 30%), with 8% of the data missing for cases and 1% missing for controls.

Covariate Summary Data for Cases and Controls

The overall genotype distribution did not differ substantially between cases and controls (Table 1). The genotype frequencies were in Hardy–Weinberg equilibrium among the controls. Adenocarcinoma cell type accounted for 39% of cases, squamous cell-type comprised 22%, and all others represented 20%. Cell type data were not available at the time the study was conducted for 195 cases (19%). Staging information was available for 71% of the cases: stage I and II, 36%; stage III and IV, 35%. The relatively high percentage of early stage (potentially curable) cancers reflected a referral bias for this specialty hospital.

Table 2 shows the distribution of the covariates from Table 1, stratified by polymorphisms. The distribution of polymorphisms was similar by sex and education. The frequency of the GSTP1 GG genotype was highest among cases with squamous cell histology (15%). Among cases, the frequencies of the GSTP1 AA and GG genotypes were higher among those who had ever smoked, whereas the frequency of the GSTP1 AG genotype was higher among nonsmokers. Among controls, this pattern was reversed for the GSTP1 GG genotype: nonsmokers had the highest and current smokers the lowest frequency. Stratification by genotype showed no important differences in median pack-years or in the median number of years since a participant quit smoking, among either cases or controls. The following crude odds ratios (ORs) were calculated from the data in Table 2 to examine the risk of lung cancer associated with the GSTP1 GG genotype compared with the AA genotype within each smoking status stratum: nonsmokers (OR = 0.86; 95% confidence interval [CI] = 0.32–2.24), ex-smokers (1.19; 0.78–1.82), and current smokers (1.95; 1.03–2.95).
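A crude odds ratio of this kind comes from a 2×2 table of genotype (GG vs. AA) by case status within one smoking stratum. The sketch below uses the cross-product ratio with a Woolf (log-scale) confidence interval; the text does not state how the intervals were computed, so that method is an assumption, and the counts in the usage note are hypothetical rather than taken from Table 2.

```python
import math


def crude_odds_ratio(cases_gg: int, cases_aa: int,
                     controls_gg: int, controls_aa: int,
                     z: float = 1.96):
    """Return (odds_ratio, lower, upper) with a Woolf 95% CI by default.

    All four cell counts must be greater than zero.
    """
    odds_ratio = (cases_gg * controls_aa) / (cases_aa * controls_gg)
    # Standard error of the log odds ratio.
    se = math.sqrt(1 / cases_gg + 1 / cases_aa
                   + 1 / controls_gg + 1 / controls_aa)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper
```

With hypothetical counts of 20 GG and 80 AA among cases and 10 GG and 90 AA among controls, the cross-product ratio is (20 × 90) / (80 × 10) = 2.25.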

Covariate Summary Data for Cases and Controls by Genotype

Table 3 shows the results of our gene-pack-years interaction analysis evaluating whether the association between pack-years and lung cancer risk is modified by the GSTP1 polymorphism. We selected a 26 pack-year increase to represent the adjusted odds ratios (AORs) because this was the median among controls, although the results are robust at other cutpoints. The table shows the adjusted odds ratios for the association between a 26 pack-year increase and lung cancer risk.
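Because pack-years entered the models on the square root scale, the odds ratio for a fixed increase depends on where the increase starts. The sketch below shows one plausible way to turn a coefficient into an odds ratio for a 26 pack-year increase; the assumption that the increase is measured from zero pack-years, the function names, and the coefficient in the usage note are illustrative and are not the authors' fitted values.

```python
import math


def stratum_coefficient(beta_main: float, beta_interaction: float,
                        has_gg: bool) -> float:
    # For GG carriers the interaction coefficient adds to the main
    # pack-years coefficient of the same smoking stratum.
    return beta_main + (beta_interaction if has_gg else 0.0)


def odds_ratio_for_increase(beta_sqrt_py: float,
                            increase: float = 26.0,
                            baseline: float = 0.0) -> float:
    """Odds ratio for moving from `baseline` to `baseline + increase`
    pack-years when the model term is sqrt(pack-years)."""
    delta = math.sqrt(baseline + increase) - math.sqrt(baseline)
    return math.exp(beta_sqrt_py * delta)
```

As a purely arithmetical check, a coefficient of ln(4.7)/√26 (about 0.30) would reproduce an odds ratio of 4.7 for an increase from 0 to 26 pack-years, whereas the same coefficient applied to an increase from 26 to 52 pack-years would give a smaller odds ratio.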

Lung Cancer Risk Associated with an Increase in 26 Pack–years Stratified by Smoking Status and Genotype

Model A assumes that smokers and nonsmokers share a baseline risk of lung cancer in the absence of any exposure, so participants with zero pack-years (nonsmokers) contribute to the estimate of the dose–response for pack-years. The interaction terms were statistically significant (P < 0.01). The association between pack-years and lung cancer risk for ex-smokers and current smokers is substantially modified by the presence of the GSTP1 GG genotype (Table 3). The AOR for a 26 pack-year increase for ex-smokers with the GSTP1 AA genotype is 4.7 (CI = 3.9–5.8), and for ex-smokers with the GSTP1 GG genotype is 7.5 (4.2–14). The AOR for a 26 pack-year increase for current smokers with the GSTP1 AA genotype is 6.1 (4.9–7.5), and for current smokers with the GSTP1 GG genotype is 13 (6.5–25). Figure 1 shows the lung cancer risk associated with increases in pack-years, stratified by smoking status and genotype (derived from model A).

Lung cancer risk associated with increases in pack-years stratified by genotype and smoking status based on results from model A. Current GG, current smoker with GSTP1 GG genotype; Ex GG, ex-smoker with GSTP1 GG genotype; Current AA, current smoker with GSTP1 AA genotype; Ex AA, ex-smoker with GSTP1 AA genotype.

Model B considered a three-way interaction among smoking status, pack-years, and genotype. This model assumed that lung cancer risk differs not only by genotype but also by smoking status. Model B did not include nonsmokers in the calculation of the estimate for pack-years and assumed differences in risks at pack-years zero for current and ex-smokers. As expected, all the calculated odds ratio estimates among the different strata of smoking status and genotypes evaluating the association between pack-years and lung cancer risk were reduced but still substantial.

Model C evaluated an interaction between GSTP1 GG and pack-years only, assuming that lung cancer risk associated with pack-years is modified by genotype but not smoking status. Participants with pack-years equal to zero (nonsmokers) contributed to this estimate. There was modest interaction by genotype (Table 3). Lung cancer risk associated with a 26 pack-year increase was 6.4 (CI = 3.5–11.7) with the GSTP1 GG compared with 3.7 (2.8–5.0) for GSTP1 AA. All three models (A, B, and C) evaluated the entire case–control population.


Modification of tobacco carcinogen metabolism is the basis of almost all xenobiotic metabolizing polymorphism studies of lung cancer risk. Thus, polymorphisms should be evaluated as an effect modifier of the smoking–lung cancer risk association, through evaluation of gene–smoking interactions. Most studies have treated smoking as a confounder of the polymorphism–lung cancer risk association by adjusting for smoking variables in a logistic regression.7,12,13,15,23-25

In the specific case of the GSTP1 polymorphism, there are biologic data to suggest an association between the polymorphisms and intermediary markers of lung carcinogenesis. GST π is involved primarily in the detoxification of BPDE,1,3,4 one of the main carcinogens in tobacco smoke. Ryberg and colleagues7 found substantially higher levels of DNA adducts in patients with the GSTP1 GG polymorphism compared with GSTP1 AA patients. Watson et al6 reported that the enzymatic activity on substrates (including BPDE) associated with the GSTP1 GG genotype was lower than the activity associated with GSTP1 AA genotype.

Lung cancer risk is associated with several aspects of smoking exposure (cumulative smoking exposure, smoking status, number of years since quitting smoking).9 All these factors and their possible modification by metabolism genes should be considered when assessing smoking-related lung cancer risk. If the exposure variable is collapsed into too few categories, the dose effect from a cumulative exposure and the way in which this can vary depending on an individual’s genetic susceptibility profile can be missed.

Other studies have examined effect modification by genetic polymorphisms of the association between pack-years and lung cancer risk. These studies were mostly conducted by stratified analyses in which the risk associated with one gene is compared in two groups dichotomized by a preset amount of pack-years. London et al13 reported an AOR of 1.77 for GSTM1 Null and lung cancer risk among light smokers (<40 pack-years) but not in heavy smokers (AOR = 0.90). They used a cutoff point for smoking that was based on two previous studies. To-Figueras et al14 also reported an AOR of 1.77 for the association between GSTM1 Null and lung cancer risk among those with pack-years less than or equal to 50. The cutoff point was based on the median pack-years among cases because pack-year information for controls was not available. Kihara et al15 reported an increase in squamous cell lung cancer risk associated with the GSTM1 Null genotype when comparing the lowest category for cumulative smoke exposure (AOR = 0.8) to the highest (AOR = 3.1). Although Jourenkova-Mironova et al12 considered interactions of GSTP1, GSTM1, and GSTM3 with pack-years, they reported only the results of stratified analyses. Lung cancer risk associated with the three deficient genotypes was higher among heavy smokers (>35 pack-years) compared with light smokers (<35 pack-years). The cutoff point was based on the distribution of pack-years among controls. In contrast, two studies investigated these gene–pack-years interactions using a continuous variable for pack-years. We previously evaluated GSTM1 and CYP1A1 MSP1 genotypes as individual effect modifiers of the association between pack-years and lung cancer risk,17 and Nyberg et al18 separately examined similar associations with GSTM1 and NAT2 polymorphisms. No interaction was detected. However, both studies had small sample sizes and therefore low power to detect interactions.

Our study reports an interaction between GSTP1 GG and cumulative smoking exposure, measured in pack-years. Specifically, the GSTP1 GG genotype is associated with a substantially higher risk of lung cancer at any given level of exposure defined by pack-years, and the risk ratio for a 26 pack-year increase is greater for current smokers (GSTP1 GG, AOR = 12.8; GSTP1 AA, AOR = 6.1) compared with ex-smokers (GSTP1 GG, AOR = 7.5; GSTP1 AA, AOR = 4.7) (model A). Even if we assume that the association between pack-years and lung cancer risk is not modified by smoking status (model C), there is still interaction between genotype and pack-years (for a 26 pack-year increase: GSTP1 GG, AOR = 6.4; GSTP1 AA, AOR = 3.7). An analysis removing outliers did not change the results.

To assess fully the association between a continuous variable and outcome, a broad range of exposure is necessary. In the case of smoking and lung cancer, excluding nonsmokers reduces this range and may alter what may be the true association between pack-years and lung cancer risk. In addition, power would be reduced by excluding nonsmokers (22%) from the study population. Both models A and C allow the nonsmokers (pack-years = 0) to contribute to the estimate of the coefficient for pack-years. In addition, model A accounts for differences in the gene–pack-years interaction because of smoking status, which is a biologically plausible assumption, whereas model C adjusts for smoking status as a confounder but assumes that the gene–pack-years interaction does not differ based on smoking status.

Models with three-way interactions (model B) are a common approach when examining effect modification of the association between pack-years and lung cancer by two variables (smoking status and genotype). In this model, nonsmokers did not contribute to the estimate of the pack-years coefficient. However, excluding those at pack-years 0 from the estimate of the association between pack-years and lung cancer risk may not be biologically appropriate. This not only alters the association between smoking and lung cancer risk, but also affects the differences in this association in the different genotypic strata.

Our study had several limitations. Recall bias can be a concern for our smoking exposure variables, which were collected through an administered health questionnaire. Because we chose controls who were either friends or nonblood-related family members of cases, the motivation to recall exposures possibly associated with lung cancer risk is more likely to be similar for cases and controls.26 However, our choice of controls leads to another concern: overmatching. The data show a wide distribution of our cumulative smoking exposure. If the analysis results in an underestimation of the lung cancer risk associated with a cumulative smoking exposure, this should not affect the gene–environment analysis. A third limitation is that we did not have the power to stratify the data further by lung cancer cell type.

In conclusion, the lung cancer risk associated with pack-years is substantially greater with the GSTP1 GG genotype. This risk is greater for current smokers than for ex-smokers. Understanding the assumptions that underlie statistical models is critical to appropriate interpretations of these models. When assessing which model assumptions are appropriate, biologic plausibility should be considered.26


We gratefully acknowledge the assistance of Linda Lineback, Barbara Bean, Jeanne Jackson, and Andrea Solomon for patient recruitment and data collection; Lucy Ann Principe, Salvatore V. Mucci, and Richard Rivera-Massa for data entry; Stephanie Shih for sample preparation and genotyping; and Geoffrey Liu and Shannon Magari for editing advice and comments.


1.Hecht SS. Tobacco smoke carcinogens and lung cancer. J Natl Cancer Inst. 1999;91:1194–1210.
2.Casarett LJ, Klaassen CD, Amdur MO, et al. Casarett and Doull’s Toxicology: the Basic Science of Poisons. 5th ed. New York: McGraw–Hill Health Professions Division; 1996.
3.Fields WR, Morrow CS, Doss AJ, et al. Overexpression of stably transfected human glutathione S-transferase P1–1 protects against DNA damage by benzo[a]pyrene diol-epoxide in human T47D cells. Mol Pharmacol. 1998;54:298–304.
4.Nakajima T, Elovaara E, Anttila S, et al. Expression and polymorphism of glutathione S-transferase in human lungs: risk factors in smoking-related lung cancer. Carcinogenesis. 1995;16:707–711.
5.Ali-Osman F, Akande O, Antoun G, et al. Molecular cloning, characterization, and expression in Escherichia coli of full-length cDNAs of three human glutathione S-transferase Pi gene variants. Evidence for differential catalytic activity of the encoded proteins. J Biol Chem. 1997;272:10004–10012.
6.Watson MA, Stewart RK, Smith GB, et al. Human glutathione S-transferase P1 polymorphisms: relationship to lung tissue enzyme activity and population frequency distribution. Carcinogenesis. 1998;19:275–280.
7.Ryberg D, Skaug V, Hewer A, et al. Genotypes of glutathione transferase M1 and P1 and their significance for lung DNA adduct levels and cancer risk. Carcinogenesis. 1997;18:1285–1289.
8.Butkiewicz D, Grzybowska E, Phillips DH, et al. Polymorphisms of the GSTP1 and GSTM1 genes and PAH-DNA adducts in human mononuclear white blood cells. Environ Mol Mutagen. 2000;35:99–105.
9.Samet JM. The epidemiology of lung cancer. Chest. 1993;103(Suppl 1):20S–29S.
10.Samet JM, Wiggins CL, Humble CG, et al. Cigarette smoking and lung cancer in New Mexico. Am Rev Respir Dis. 1988;137:1110–1113.
11.Randerath E, Randerath K. Monitoring tobacco smoke-induced DNA damage by 32P-postlabelling. IARC Sci Publ. 1993;124:305–314.
12.Jourenkova-Mironova N, Wikman H, Bouchardy C, et al. Role of glutathione S-transferase GSTM1, GSTM3, GSTP1 and GSTT1 genotypes in modulating susceptibility to smoking-related lung cancer. Pharmacogenetics. 1998;8:495–502.
13.London SJ, Daly AK, Cooper J, et al. Polymorphism of glutathione S-transferase M1 and lung cancer risk among African-Americans and Caucasians in Los Angeles County, California. J Natl Cancer Inst. 1995;87:1246–1253.
14.To-Figueras J, Gene M, Gomez-Catalan J, et al. Glutathione-S-Transferase M1 and codon 72 p53 polymorphisms in a northwestern Mediterranean population and their relation to lung cancer susceptibility. Cancer Epidemiol Biomarkers Prev. 1996;5:337–342.
15.Kihara M, Noda K. Lung cancer risk of the GSTM1 null genotype is enhanced in the presence of the GSTP1 mutated genotype in male Japanese smokers. Cancer Lett. 1999;137:53–60.
16.Taioli E, Ford J, Trachman J, et al. Lung cancer risk and CYP1A1 genotype in African Americans. Carcinogenesis. 1998;19:813–817.
17.Garcia-Closas M, Kelsey KT, Wiencke JK, et al. A case-control study of cytochrome P450 1A1, glutathione S-transferase M1, cigarette smoking and lung cancer susceptibility (Massachusetts, United States). Cancer Causes Control. 1997;8:544–553.
18.Nyberg F, Hou SM, Hemminki K, et al. Glutathione S-transferase mu1 and N-acetyltransferase 2 genetic polymorphisms and exposure to tobacco smoke in nonsmoking and smoking lung cancer patients and population controls. Cancer Epidemiol Biomarkers Prev. 1998;7:875–883.
19.Harris MJ, Coggan M, Langton L, et al. Polymorphism of the Pi class glutathione S-transferase in normal populations and cancer patients. Pharmacogenetics. 1998;8:27–31.
20.Ferris BG. Epidemiology Standardization Project (American Thoracic Society). Am Rev Respir Dis. 1978;118(6 Pt 2):1–120.
21.Cheng TJ, Christiani DC, Xu X, et al. Glutathione S-transferase mu genotype, diet, and smoking as determinants of sister chromatid exchange frequency in lymphocytes. Cancer Epidemiol Biomarkers Prev. 1995;4:535–542.
22.Hosmer DW, Lemeshow S. Applied Logistic Regression. 1st ed. New York: Wiley; 1989.
23.Hou SM, Ryberg D, Falt S, et al. GSTM1 and NAT2 polymorphisms in operable and non-operable lung cancer patients. Carcinogenesis. 2000;21:49–54.
24.Rom WN, Hay JG, Lee TC, et al. Molecular and genetic aspects of lung cancer. Am J Respir Crit Care Med. 2000;161(4 Pt 1):1355–1367.
25.To-Figueras J, Gene M, Gomez-Catalan J, et al. Genetic polymorphism of glutathione S-transferase P1 gene and lung cancer risk. Cancer Causes Control. 1999;10:65–70.
26.Rothman KJ, Greenland S. Modern Epidemiology. 2nd ed. Philadelphia: Lippincott Williams & Wilkins; 1998.

GSTP1; lung cancer; case-control; susceptibility

© 2003 Lippincott Williams & Wilkins, Inc.