Preterm birth, commonly defined as delivery before 37 weeks of gestation, is a serious public health problem1 that affects all ethnic groups but particularly African American populations.
The contemporary African American population is a genetic admixture between Africans and Europeans.2 Genetic ancestry has been linked to a number of health outcomes with a known black–white disparity, including asthma, cardiovascular diseases, prostate cancer, lung function, and preterm birth.3–5 However, it remains largely unknown whether genetic ancestry can further explain individual variation in preterm birth risk among self-reported African American women.6
In an earlier report,6 using 57 ancestry-informative markers in 812 women, we found that African ancestry was associated with preterm birth and its related traits. However, our previous ancestral estimates may not have been highly accurate because we used a set of only 57 ancestry-informative markers. The current study was therefore designed to strengthen and expand on our previous findings and to provide more convincing evidence for the potential usefulness of genetic ancestry in preterm birth research. In addition, our team previously identified interactions between maternal smoking and two genes, cytochrome P450 1A1 (CYP1A1) and glutathione S-transferase theta 1 (GSTT1), that were associated with preterm birth in the same population.7–9
In this study, we first examined whether the previously observed association between African ancestral proportion and preterm birth and its subphenotypes (including very preterm birth and late preterm birth) was retained when African ancestral proportion was estimated by genotyping a larger panel of ancestry-informative markers in a larger set of patients. Second, we assessed the association of African ancestry with preterm birth, very preterm birth, and late preterm birth in self-reported African American women after controlling for pertinent risk factors. Finally, we examined whether including African ancestral proportion, gene polymorphisms, and gene–environment (GxE) interactions could make additional and independent contributions to predicting preterm birth, very preterm birth, and late preterm birth beyond known epidemiologic risk factors.
MATERIALS AND METHODS
This study included a subset of 1,030 African American mothers (518 preterm birth individuals and 512 controls) enrolled in an ongoing case-control study of preterm birth at the Boston Medical Center.10 The current study overlaps with our earlier study6 in that 441 of the 812 individuals in our first study also participated in this study. The enrollment period was from 1998 to 2008. Mothers in the case group were those who had singleton live births occurring at less than 37 weeks of gestation, and those in the control group were mothers delivering at 37 or more weeks of gestation with birth weight appropriate for gestational age as defined by the National Center for Health Statistics/Centers for Disease Control and Prevention guidelines (birth weight between 2,500 and 4,000 g).11 Pregnancies resulting in multiple births and newborns with major birth defects were excluded. A detailed description of the study population was previously published.10 The Institutional Review Boards of the Boston University Medical Center, the Massachusetts Department of Public Health, and Children's Memorial Hospital in Chicago approved the study protocol. All participants gave written informed consent.
Preterm birth was evaluated as both a binary (less than 37 weeks of gestation compared with 37 or more weeks of gestation) and a continuous (gestational age) variable. Gestational age was assessed using an algorithm based on last menstrual period and the result of early ultrasonography (less than 20 weeks of gestation). The last menstrual period estimate was used only if confirmed by ultrasonography within 7 days or if no ultrasonogram estimate was obtained; otherwise, the ultrasonogram estimate was used. This approach has been used in previous studies.10,12
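The selection rule for gestational age described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's actual code: the function name, the use of days as the unit, and the treatment of "within 7 days" as an inclusive bound are all assumptions made for the example.

```python
from typing import Optional


def gestational_age_days(
    lmp_days: Optional[float],
    ultrasound_days: Optional[float],
    tolerance_days: float = 7.0,  # assumed inclusive bound for "within 7 days"
) -> Optional[float]:
    """Choose a gestational age estimate (in days) from two candidate estimates.

    lmp_days: estimate based on last menstrual period, or None if unavailable.
    ultrasound_days: estimate based on early ultrasonography, or None if unavailable.
    """
    if ultrasound_days is None:
        # No ultrasonogram estimate was obtained: fall back to the LMP estimate.
        return lmp_days
    if lmp_days is not None and abs(lmp_days - ultrasound_days) <= tolerance_days:
        # The LMP estimate is confirmed by ultrasonography within the tolerance.
        return lmp_days
    # Otherwise the ultrasonogram estimate is used.
    return ultrasound_days
```

For example, with an LMP estimate of 260 days and an ultrasonogram estimate of 250 days, the two disagree by more than 7 days, so the ultrasonogram estimate would be returned.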
In this study, we used a cutoff point of less than 34 weeks to define very preterm birth and 34 0/7–36 6/7 weeks to define late preterm birth, cutoffs that have been used by other groups.13,14 In addition, we categorized preterm birth cases as spontaneous preterm birth if they occurred secondary to documented active preterm labor (uterine contractions with cervical effacement and dilation at less than 37 weeks), preterm premature rupture of membranes (PROM) (at less than 37 weeks without uterine contractions), or both uterine contractions and preterm PROM occurring simultaneously. Cases were categorized as indicated preterm birth if delivery was not preceded by uterine contractions, rupture of membranes, or both. A detailed description was previously published.6
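The gestational age cutoffs above translate directly into a small classifier. The sketch below is illustrative only; the function name and the label strings are hypothetical, and gestational age is assumed to be given as completed weeks plus additional days (for example, 36 6/7 weeks is `weeks=36, days=6`).

```python
def classify_by_gestational_age(weeks: int, days: int = 0) -> str:
    """Classify a delivery using the cutoffs stated in the text.

    very preterm: less than 34 weeks
    late preterm: 34 0/7 through 36 6/7 weeks
    term:         37 or more weeks
    """
    if weeks < 0 or not 0 <= days <= 6:
        raise ValueError("weeks must be non-negative and days must be 0-6")
    total_days = weeks * 7 + days
    if total_days < 34 * 7:
        return "very preterm"
    if total_days < 37 * 7:
        return "late preterm"
    return "term"
```

Working in total days avoids any ambiguity at the boundaries: 33 6/7 weeks is very preterm, 34 0/7 weeks is late preterm, and 37 0/7 weeks is term.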
We genotyped a total of 1,509 ancestry-informative markers previously identified as highly informative between African and European ancestry4,15 for 1,030 African American mothers (518 preterm birth cases and 512 matched controls). Specifically, we applied the Illumina African American Panel as our genotyping platform (http://www.illumina.com/products/african_american_admixture_panel.ilmn). For quality control, four duplicate DNA samples were placed on each 96-well plate. The concordance rate of these duplicate samples was more than 99.5%.
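As an illustration of the duplicate-sample quality check, the following sketch computes a concordance rate between two sets of genotype calls for the same DNA sample. The text does not specify how concordance was calculated, so the details here are assumptions: genotypes are represented as strings, missing calls are represented as `None`, and pairs with a missing call are excluded from the denominator.

```python
from typing import Optional, Sequence


def concordance_rate(
    calls_a: Sequence[Optional[str]],
    calls_b: Sequence[Optional[str]],
) -> float:
    """Fraction of markers with identical genotype calls in two duplicate runs.

    Markers where either call is missing (None) are skipped (an assumption).
    """
    if len(calls_a) != len(calls_b):
        raise ValueError("both runs must cover the same markers")
    compared = 0
    matched = 0
    for a, b in zip(calls_a, calls_b):
        if a is None or b is None:
            continue  # skip markers with a missing call in either run
        compared += 1
        if a == b:
            matched += 1
    if compared == 0:
        raise ValueError("no markers with calls in both runs")
    return matched / compared
```

Under this definition, a plate would pass a 99.5% threshold when `concordance_rate(...) > 0.995` for each duplicate pair.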
In addition, we genotyped genetic markers within the two genes, CYP1A1 and GSTT1, separately. Detailed information regarding DNA extraction, polymerase chain reaction condition and quality check of CYP1A1 and GSTT1 has been described elsewhere.8,10
Several analytical approaches were used in this study. First, we obtained ancestral estimates for each patient, using the Structure program.16,17 Second, we compared the equality of ancestral distributions between preterm birth and term controls, very preterm birth and term controls, and very preterm birth and late preterm birth, respectively, using the Kolmogorov-Smirnov statistic. Third, using stepwise model selection in a logistic regression framework, we identified a set of significant risk factors and further tested the association of significant risk factors, African ancestry, CYP1A1, GSTT1, and their interactions with smoking with preterm birth, very preterm birth, and late preterm birth, individually. Last, the receiver operating characteristic curve analysis and the corresponding c statistic were obtained, and a nonparametric statistic was applied to assess the discrimination ability of different predictive models. Below, we provide detailed information about each analytical approach applied in this study.
First, we applied an admixture model implemented in the program Structure to estimate individual admixture proportions using our panel of 1,509 ancestry-informative markers.16,17 Specifically, the admixture model assumes that each individual inherits some proportion of their ancestry from each ancestral population. To compute African ancestral estimates, we input genotyping data from both ancestral populations (Africans and Europeans) (of note, genotyping data for both ancestral populations were from the International HapMap project [http://hapmap.ncbi.nlm.nih.gov/]), specified as known populations, and from admixed patients, specified as an unknown population. We then assumed an admixture model and used default values for other parameters provided by Structure with 5,000 burn-in and 50,000 further iterations through the Markov chain Monte Carlo algorithm. In addition, we generated plots to compare the distribution of African ancestral proportion between preterm birth and term controls, very preterm birth and term controls, and very preterm birth and late preterm birth, respectively. We also carried out the nonparametric test using the Kolmogorov-Smirnov statistic for the equality of distributions.
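To make the distribution comparison concrete, here is a minimal pure-Python sketch of the two-sample Kolmogorov-Smirnov statistic, the largest absolute difference between the two empirical cumulative distribution functions. It computes only the test statistic, not the P value, and it is not the implementation used in the study; the function name is hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def ks_two_sample_statistic(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_x(t) - F_y(t)|.

    Both empirical CDFs are step functions that change only at observed
    values, so it is enough to evaluate the difference at every data point.
    """
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    d = 0.0
    for t in xs + ys:
        f_x = bisect_right(xs, t) / n  # share of x values <= t
        f_y = bisect_right(ys, t) / m  # share of y values <= t
        d = max(d, abs(f_x - f_y))
    return d
```

In this setting, `x` and `y` would hold the estimated African ancestral proportions for two groups, for example preterm birth cases and term controls. A value of D near 0 indicates nearly identical distributions, and a value of 1 indicates no overlap.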
Because this study was based on a “loosely age-matched” design, we further took into account this “matching-adjusted age effect” to ensure that we did not misinterpret the results. We adjusted for this matching effect as follows. We first obtained the prevalence of preterm birth for each matching stratum. For stratum $a$, let $n_{\text{case}}$ be the number of preterm birth cases, $n_{\text{control}}$ the number of controls, and $p_a$ the prevalence of preterm birth in that stratum. We then computed an age-adjusted variable for each individual within the corresponding stratum as $\log(n_{\text{case}}/n_{\text{control}}) - \log(p_a/(1-p_a))$.18 Finally, we fit the subsequent logistic regression models with this age-adjusted variable included through the “offset” option implemented in STATA.
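The stratum-specific offset described above is simple arithmetic and can be sketched as follows. The function name and argument names are hypothetical, and the natural logarithm is assumed, as is conventional for logistic regression offsets.

```python
import math


def matching_offset(n_case: int, n_control: int, prevalence: float) -> float:
    """Offset for one matching stratum:

        log(n_case / n_control) - log(prevalence / (1 - prevalence))

    n_case, n_control: sampled cases and controls in the stratum.
    prevalence: prevalence of preterm birth in the stratum (0 < p < 1).
    """
    if n_case <= 0 or n_control <= 0:
        raise ValueError("stratum must contain at least one case and one control")
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")
    sample_log_odds = math.log(n_case / n_control)
    population_log_odds = math.log(prevalence / (1.0 - prevalence))
    return sample_log_odds - population_log_odds
```

The offset is zero when the case-to-control ratio in the sample equals the odds of preterm birth in the stratum, and positive when cases are oversampled relative to the stratum prevalence. Every individual in the same stratum receives the same value.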
Next, we identified a set of risk factors using stepwise model selection (cutoff P<.15). The known risk factors of preterm birth examined were maternal age (younger than 20, 20–24, 25–29, 30–34, and 35 years or older), education (middle school or less, high school, and more than high school), parity (0, 1, and 2 or higher), marital status (married, other), maternal prepregnancy body mass index (BMI, calculated as weight (kg)/[height (m)]²) (less than 20, 20–24, 25–29, and 30 or more), maternal smoking during pregnancy (current smokers, quitters, and nonsmokers), illicit drug use (yes or no), stress (not stressed, average stress, and very stressed), and number of years in the United States (born in the United States, fewer than 5 years of residence, and 5 years or more of residence). In addition, we used logistic regression to examine the association of African ancestry proportion, CYP1A1, GSTT1, and their interactions with maternal smoking with preterm birth, very preterm birth, and late preterm birth, respectively. The odds ratio (OR) is expressed for each 10% increment of African ancestry proportion.
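To show how an odds ratio per 10% increment relates to a logistic regression coefficient, the sketch below rescales a coefficient and its Wald confidence limits. It assumes, purely for illustration, that ancestry proportion enters the model on a 0 to 1 scale, so that a 10% increment corresponds to 0.10 units; the text does not state how the variable was coded. The function name is hypothetical.

```python
import math
from typing import Tuple


def odds_ratio_per_increment(
    beta: float,
    se: float,
    increment: float = 0.10,  # a 10% step on an assumed 0-1 ancestry scale
    z: float = 1.96,          # normal quantile for a 95% Wald interval
) -> Tuple[float, float, float]:
    """Return (OR, lower, upper) for a given increment of the predictor.

    beta: log-odds coefficient per one unit of the predictor.
    se:   standard error of beta.
    """
    point = math.exp(increment * beta)
    lower = math.exp(increment * (beta - z * se))
    upper = math.exp(increment * (beta + z * se))
    return point, lower, upper
```

Under this coding, the reported OR of 1.11 per 10% increment would correspond to a per-unit coefficient of `math.log(1.11) / 0.10`, roughly 1.04 on the log-odds scale.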
We further assessed the model performance for the added value of African ancestry, CYP1A1, GSTT1, and their interactions with maternal smoking for the prediction of preterm birth, very preterm birth, and late preterm birth, individually, using the receiver operating characteristic curve analysis.19 Specifically, we evaluated the predictability across different models by calculating the concordance (c) statistic, the most commonly used quantity for indicating the discrimination ability of different models. In addition, we applied a nonparametric statistic implemented in STATA to test the equality of the area under the curve across four sequential predictive models.20 The set of risk factors assessed in the predictive models was as follows: the known epidemiologic variables described above, African ancestral proportion, two previously identified genetic polymorphisms (CYP1A1 and GSTT1), and their interactions with maternal smoking. Finally, to evaluate whether African ancestry was collinear with known preterm birth epidemiologic risk factors, we computed the Pearson correlation coefficient between African ancestry and maternal age, BMI, and years in the United States, and the point-biserial correlation coefficient between African ancestry and education, marital status, and parity. Data analyses were performed using statistical packages R 2.10.0 (http://www.r-project.org) and Intercooled STATA 11.0.
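The c statistic has a direct pairwise interpretation: it is the proportion of case-control pairs in which the case receives the higher predicted risk, with ties counted as one half. The sketch below computes it by brute force over all pairs. It is intended only to illustrate the definition and is not the STATA routine used in the analysis; the function name is hypothetical.

```python
from typing import Sequence


def c_statistic(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Concordance (c) statistic, equal to the area under the ROC curve.

    scores: predicted risks from a model.
    labels: 1 for cases, 0 for controls.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    case_scores = [s for s, y in zip(scores, labels) if y == 1]
    control_scores = [s for s, y in zip(scores, labels) if y == 0]
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    concordant = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                concordant += 1.0   # case ranked above control
            elif case == control:
                concordant += 0.5   # tie counts as half
    return concordant / (len(case_scores) * len(control_scores))
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of cases from controls.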
RESULTS
A total of 1,030 African American mothers (518 preterm birth individuals and 512 term controls) were included in the study. The 518 preterm birth individuals were further divided into 211 very preterm birth and 307 late preterm birth. Table 1 shows the demographic, clinical, and genetic characteristics of the study participants, stratified by preterm birth, very preterm birth, late preterm birth, and term controls. The average and corresponding standard deviation (SD) of African proportions was 0.88 (SD 0.15) in preterm birth, 0.88 (SD 0.12) in very preterm birth, 0.87 (SD 0.16) in late preterm birth, and 0.85 (SD 0.16) in controls. The distribution of African ancestry between preterm birth individuals and controls is provided in Figure 1 in the Appendix (available online at http://links.lww.com/AOG/A262). In addition, the Kolmogorov-Smirnov statistic indicated that the distributions of African ancestral proportion did not differ significantly in preterm birth compared with term controls (P=.78), very preterm birth compared with term controls (P=.20), and very preterm birth compared with late preterm birth (P=.18).
For the known preterm birth epidemiologic risk factors, we applied stepwise model selection to identify a subset of important risk factors, which were included in the subsequent predictive models (Table 2). Consistent with previous findings, maternal smoking during pregnancy and overall stress were significantly associated with preterm birth, whereas illicit drug use was borderline significant. In addition, we examined the association of African ancestry and CYP1A1, GSTT1, and their interactions with maternal smoking with preterm birth, very preterm birth and late preterm birth, separately. Notably, African ancestry was significantly associated with preterm birth (22% compared with 31%, OR 1.11 for every 10% increment, 95% confidence interval [CI] 1.02–1.20) and very preterm birth (23% compared with 33%, OR 1.17, 95% CI 1.03–1.33), but not with late preterm birth (22% compared with 29%, OR 1.06, 95% CI 0.97–1.16), whereas CYP1A1-maternal smoking was significantly associated with preterm birth (OR 1.83, 95% CI 1.20–2.81) and late preterm birth (OR 2.03, 95% CI 1.30–3.19), but not GSTT1-maternal smoking (Table 2).
Moreover, we performed stepwise model selection by stratifying preterm birth into the subtypes of spontaneous preterm birth and indicated preterm birth to identify a set of important preterm birth risk factors. Interestingly, the same set of important risk factors was identified among preterm birth, very preterm birth, late preterm birth, and spontaneous preterm birth, but not indicated preterm birth (Tables 1 and 2 in the Appendix, http://links.lww.com/AOG/A262).
Furthermore, we performed receiver operating characteristic curve analysis. We evaluated four different models for predicting preterm birth, very preterm birth, and late preterm birth, separately, as follows: 1) model 1: a set of known preterm birth epidemiologic risk factors, including smoking, illicit drug use, and overall stress, which were significant factors identified from the above association tests; 2) model 2: model 1 plus African ancestry; 3) model 3: model 1 plus GSTT1 and interaction of GSTT1 and maternal smoking, and CYP1A1 and interaction of CYP1A1 and maternal smoking; 4) model 4: model 3 plus African ancestry. As shown in Table 3, model 4 (containing African ancestry, genetic findings, and GxE interactions) showed the highest area under curve at 0.66 (95% CI 0.61–0.70) among the predictive models for very preterm birth. Likewise, model 4 (containing African ancestry, genetic findings, and GxE interactions) for preterm birth and late preterm birth, individually, also showed the best discrimination ability among the four presented models. Similarly, we performed receiver operating characteristic curve analysis and evaluated the four predictive models described above for spontaneous preterm birth and indicated preterm birth, respectively. We also used another set of important risk factors identified for indicated preterm birth and carried out the corresponding receiver operating characteristic curve analysis (Tables 3–5 in the Appendix, http://links.lww.com/AOG/A262). The results in Tables 5 and 6 in the Appendix (http://links.lww.com/AOG/A262) showed that the best discrimination ability was observed in model 4 (containing African ancestry, genetic findings, and GxE interactions) for spontaneous preterm birth.
We further applied a nonparametric statistic to test the equality of the area under curve. There was significant improvement when adding African ancestry, genetic findings, and GxE interactions in model 4 (Table 4). The results in Table 4 indicated that model 4 was significantly better than all the other models (model 1 compared with model 4, P=.002 for preterm birth; P<.001 for very preterm birth; P=.004 for late preterm birth).
We further examined whether African ancestral proportion was associated with the two preterm birth susceptible genetic polymorphisms (CYP1A1 and GSTT1). We found that neither of these genetic polymorphisms was significantly associated with African ancestral proportion (Table 1 in the Appendix, http://links.lww.com/AOG/A262). In addition, we tested whether African ancestry was collinear with known preterm birth epidemiologic risk factors such as age, BMI, education, marital status, parity, and years in the United States. Interestingly, we did not find any substantial degree of correlation between African ancestry and the above known preterm birth epidemiologic risk factors (data not shown).
DISCUSSION
Although the effect of genetic ancestry and its potential confounding effect have been examined in several common complex diseases such as asthma, breast cancer, prostate cancer, and cardiovascular disease, few studies have assessed the effect of genetic ancestry in preterm birth and very preterm birth. This study has evaluated the distributions of genetic ancestry among preterm birth, very preterm birth, and late preterm birth compared with term controls in a sample of 1,030 U.S. African American women. We further examined the association of genetic ancestry with preterm birth, very preterm birth, and late preterm birth in self-reported African American individuals. Moreover, we evaluated whether African ancestry, two preterm birth susceptibility genes, and their interactions with maternal smoking could make additional and independent contributions to predicting preterm birth, very preterm birth, and late preterm birth beyond known epidemiologic risk factors of preterm birth.
This study yielded several important findings. First, we demonstrated that African ancestry was significantly associated with preterm birth (OR 1.11, 95% CI 1.02–1.20) and very preterm birth (OR 1.17, 95% CI 1.03–1.33), but not with late preterm birth (OR 1.06, 95% CI 0.97–1.16) in the logistic regression models. However, no difference was observed in known epidemiologic factors between the very preterm birth group and the late preterm birth group in this study, which suggests that those known epidemiologic factors may have similar effects in both preterm subgroups. Interestingly, previous studies also reported a greater black–white disparity for very preterm birth than for preterm birth overall.21 Together, these findings suggest that genetic ancestral background may have a greater effect on very preterm birth than on late preterm birth. In addition, the effect of gene–environment interaction may also be stronger in very preterm birth. However, at present, the underlying mechanism of genetic ancestral influence in very preterm birth remains largely unexplored. It will be important to confirm this finding in another study and to conduct further functional investigation. Second, we showed that the model including African ancestry, specific gene polymorphisms, and their interactions with maternal smoking provided the best discrimination ability with an area under curve of 0.66 (95% CI 0.61–0.70) for very preterm birth (Table 3).
Previous studies have identified a number of preterm birth–related genetic variants in the hope that these discoveries will provide deeper insights into the etiology of preterm birth and ultimately lead to the development of new therapies, preventive strategies, or both.7,8,22 However, for many complex diseases including preterm birth, few studies have applied these genetic discoveries to risk assessment in clinical practice. In this study, we included African ancestry along with genetic findings, gene–smoking interactions on preterm birth, and the known epidemiologic factors in predictive models for preterm birth, very preterm birth, and late preterm birth, respectively. The results suggest that incorporating African ancestry alone, or genetic findings and their interactions with smoking alone, offers only limited improvement in our ability to predict preterm birth, very preterm birth, and late preterm birth beyond known preterm birth epidemiologic risk factors. Nevertheless, the model including African ancestry, preterm birth susceptibility genes, and gene–smoking interactions provided the highest area under curve, 0.66 (95% CI 0.61–0.70), for very preterm birth among the sequential predictive models (Table 3).
This study has several strengths. First, the accuracy of estimating ancestral proportion is strongly affected by the number of ancestry-informative markers. In this study, we genotyped a panel of 1,509 ancestry-informative markers, which provided a robust estimation of African ancestry. Second, this is a large sample of inner-city African American women assembled to study the influence of African ancestry on preterm birth and its subgroups: very preterm birth and late preterm birth. Third, we applied receiver operating characteristic/area under curve methods to evaluate sequential models of African ancestry, genetic contributions, and GxE interactions beyond known epidemiologic risk factors. Our results suggested that genetic ancestry, gene polymorphisms, and their interactions with maternal smoking together can improve predictability for very preterm birth, although our predictive model is far from perfect and there is tremendous work that remains. Finally, our predictive models, which integrated epidemiologic factors, genetic discoveries, and GxE interactions, illuminate a future direction toward a better and more accurate predictive model of preterm birth in clinical practice.
On the other hand, this study also has some limitations. Although we included major known risk factors of preterm birth in this population, the relatively modest discrimination ability suggests that additional risk factors need to be taken into account in the predictive models. In addition, we included only two genes and their interactions in the predictive models. Recent studies have reported a potential interaction of cytokine polymorphisms and bacterial vaginosis in preterm birth development as well as the interaction of inflammatory-response regulatory polymorphisms and bacterial vaginosis.23,24 Therefore, we anticipate that there are other important genetic factors and GxE interactions yet to be considered or discovered. The ultimate goal of our work is to develop a highly accurate predictive model for preterm birth that can be used in clinical and public health settings. The predictive models presented in this study are far from perfect given their modest discrimination ability, and they represent only a first step toward that goal. Looking forward, the performance of our predictive models needs to be validated and evaluated in independent samples.
In summary, consistent with our previous report, African ancestry was significantly associated with an increased risk of preterm birth and very preterm birth in this sample of inner-city African American mothers. Our data underscore the need to simultaneously consider African ancestry, important gene polymorphisms, and GxE interactions to better understand the racial disparity in preterm birth and to improve our ability to predict, treat, and prevent preterm birth, especially very preterm birth.
REFERENCES
1. Mattison DR, Damus K, Fiore E, Petrini J, Alter C. Preterm delivery: a public health perspective. Paediatr Perinat Epidemiol 2001;(suppl 2):7–16.
2. Parra EJ, Marcini A, Akey J, Batzer MA, Cooper R, Forrester T, et al. Estimating African American admixture proportions by use of population-specific alleles. Am J Hum Genet 1998;63:1839–51.
3. Kumar R, Seibold MA, Aldrich MC, Williams LK, Reiner AP, Colangelo L, et al. Genetic ancestry in lung-function predictions. N Engl J Med 2010;363:321–30.
4. Freedman ML, Haiman CA, Patterson N, McDonald GJ, Tandon A, Waliszewska A, et al. Admixture mapping identifies 8q24 as a prostate cancer risk locus in African-American men. Proc Natl Acad Sci U S A 2006;103:14068–73.
5. Reiner AP, Carlson CS, Ziv E, Iribarren C, Jaquish CE, Nickerson DA. Genetic ancestry, population sub-structure, and cardiovascular disease-related traits among African-American participants in the CARDIA Study. Hum Genet 2007;121:565–75.
6. Tsai HJ, Yu Y, Zhang S, Pearson C, Ortiz K, Xu X, et al. Association of genetic ancestry with preterm delivery and related traits among African American mothers. Am J Obstet Gynecol 2009;201:94.e1–10.
7. Hao K, Wang X, Niu T, Xu X, Li A, Chang W, et al. A candidate gene association study on preterm delivery: application of high-throughput genotyping technology and advanced statistical methods. Hum Mol Genet 2004;13:683–91.
8. Tsai HJ, Liu X, Mestan K, Yu Y, Zhang S, Fang Y, et al. Maternal cigarette smoking, metabolic gene polymorphisms, and preterm delivery: new insights on GxE interactions and pathogenic pathways. Hum Genet 2008;123:359–69.
9. Yu Y, Tsai HJ, Liu X, Mestan K, Zhang S, Pearson C, et al. The joint association between F5 gene polymorphisms and maternal smoking during pregnancy on preterm delivery. Hum Genet 2009;124:659–68.
10. Wang X, Zuckerman B, Pearson C, Kaufman G, Chen C, Wang G, et al. Maternal cigarette smoking, metabolic gene polymorphism, and infant birth weight. JAMA 2002;287:195–202.
11. Hamill PV, Drizd TA, Johnson CL, Reed RB, Roche AF, Moore WM. Physical growth: National Center for Health Statistics percentiles. Am J Clin Nutr 1979;32:607–29.
12. Kramer MS, Platt R, Yang H, Joseph KS, Wen SW, Morin L, et al. Secular trends in preterm birth: a hospital-based cohort study. JAMA 1998;280:1849–54.
13. Engle WA, Tomashek KM, Wallman C; Committee on Fetus and Newborn, American Academy of Pediatrics. “Late-preterm” infants: a population at risk. Pediatrics 2007;120:1390–401.
14. Mathews TJ, Menacker F, MacDorman MF; Centers for Disease Control and Prevention, National Center for Health Statistics. Infant mortality statistics from the 2002 period: linked birth/infant death data set. Natl Vital Stat Rep 2004;53:1–29.
15. Reich D, Patterson N, Ramesh V, De Jager PL, McDonald GJ, Tandon A, et al. Admixture mapping of an allele affecting interleukin 6 soluble receptor and interleukin 6 levels. Am J Hum Genet 2007;80:716–26.
16. Falush D, Stephens M, Pritchard JK. Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics 2003;164:1567–87.
17. Pritchard JK, Stephens M, Donnelly P. Inference of population structure using multilocus genotype data. Genetics 2000;155:945–59.
18. Breslow NE, Zhao LP. Logistic regression for stratified case-control studies. Biometrics 1988;44:891–9.
19. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982;143:29–36.
20. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 1988;44:837–45.
21. Institute of Medicine of the National Academies, Committee on Understanding Premature Birth and Assuring Healthy Outcomes, Board on Health Sciences Policy. Preterm birth: causes, consequences, and prevention. IOM Report. Washington, DC: The National Academies Press; 2006.
22. Jakobsdottir J, Gorin MB, Conley YP, Ferrell RE, Weeks DE. Interpretation of genetic association studies: markers with replicated highly significant odds ratios may be poor classifiers. PLoS Genet 2009;5:e1000337.
23. Jones NM, Holzman C, Friderici KH, Jernigan K, Chung H, Wirth J, et al. Interplay of cytokine polymorphisms and bacterial vaginosis in the etiology of preterm delivery. J Reprod Immunol 2010;87:82–9.
24. Gomez LM, Sammel MD, Appleby DH, Elovitz MA, Baldwin DA, Jeffcoat MK, et al. Evidence of a gene–environment interaction that predisposes to spontaneous preterm birth: a role for asymptomatic bacterial vaginosis and DNA variants in genes that control the inflammatory response. Am J Obstet Gynecol 2010;202:386.e1–6.