Genome-wide by environment interaction studies of maternal smoking and educational score in UK biobank : Psychiatric Genetics

Original Article

Genome-wide by environment interaction studies of maternal smoking and educational score in UK biobank

Huang, Huimei a,*; Liu, Li b,*; Feng, Fenling a,c; Sun, Hongli d; Li, Fei e; Wu, Haibin f; Liang, Chujun b; Chu, Xiaomeng b; Ning, Yujie b; Zhang, Feng b

Psychiatric Genetics, May 24, 2023. DOI: 10.1097/YPG.0000000000000347



Maternal tobacco exposure adversely affects offspring, particularly the development of brain structure and function (Ekblad et al., 2015; Zhang et al., 2017). A growing body of literature has reported associations between maternal smoking (MS) during pregnancy (MSDP) and neurodevelopmental disorders and mental health outcomes in offspring. For instance, Bublitz and Stroud indicated that MSDP is associated with neurobehavioral and cognitive deficits in offspring (Bublitz and Stroud, 2012). Moreover, Herrmann et al. suggested that prenatal tobacco exposure was consistently associated with children’s neurodevelopment and behavior, such as attention deficit hyperactivity disorder (ADHD) and cognitive impairments (Herrmann et al., 2008). A previous cohort study indicated that MS rates (e.g. >20 cigarettes per day) were positively associated with the levels of internalizing behaviors (anxiety and depression) in offspring (Moylan et al., 2015). However, the role of MS in the structural and functional brain development of offspring remains unclear, and the underlying biological mechanisms have not been comprehensively explored.

Socioeconomic status (SES) is a complex indicator that commonly encompasses income, educational attainment, occupational prestige, and subjective perceptions of social status and social class. A growing number of studies have revealed that social outcomes are influenced by SES (such as income, educational level, and occupation), even in gene-environment interaction analyses (Abdellaoui et al., 2019). For example, MS often occurs among women with lower SES, less education, family and parenting difficulties, and so on (Moussa et al., 2010; Oskarsdottir et al., 2017). Education score, a component of SES, is a theoretical index of the highest educational attainment a person could reach, and it can be strongly influenced by nutritional status and financial situation. Throughout compulsory education, the link between family SES and children’s school performance is evident, and the impact of economic status on academic performance is partially independent of intelligence (von Stumm, 2017). Thus, SES is a common influencing factor for both the educational level of adult offspring and smoking behavior during pregnancy. However, the education score used in the present study is a region-level indicator, which measures the extent of deprivation in terms of education, skills and training in an area. Thus, the education score in the present study refers to SES in a designated area.

Education score is a complex characteristic resulting from the combination of genetic and environmental factors. For example, a registry-based cohort study provided evidence of a persistent negative impact of MSDP on academic achievement in offspring (Kristjansson et al., 2018). Genetic studies have also made remarkable advances. A previous study constructed a random polygenic effect model and found that at least 2% of the variation in education scores could be explained by genetic factors; the authors also indicated that this percentage could rise to 22.4% if the training sample used to estimate the linear polygenic score were larger (Okbay et al., 2016). Moreover, in that study, a large number of single nucleotide polymorphisms (SNPs) associated with educational attainment were located on 2q32 and regulate gene expression in the fetal brain, primarily in neural tissue (Okbay et al., 2016). Rietveld et al. also identified some candidate SNPs for educational attainment that were associated with health, cognitive, and central nervous system phenotypes (Rietveld et al., 2013). Furthermore, Lee et al. conducted a large-scale genetic association analysis of educational attainment in a sample of approximately 1.1 million individuals and identified 1271 independent genome-wide-significant SNPs, which implicate genes involved in brain development processes and neuron-to-neuron communication. Polygenic scores generated by a multi-phenotype analysis of educational attainment and three related cognitive phenotypes explained 11–13% of the variance in educational attainment and 7–10% of the variance in cognitive performance (Lee and Wedow, 2018). Nevertheless, few studies have investigated the potential association between MS and offspring education score.

In complex diseases, multiple candidate loci interact with each other and combine with environmental factors to affect disease development and progression (Huynh, 2017; Brand et al., 2019). The pathogenesis of complex diseases is usually thought to involve the interaction of genetic factors and environmental exposures, which makes the two difficult to study separately (Klimek et al., 2016). Gene-environment interaction (G × E) analysis plays an important role in improving the accuracy and precision of estimated genetic and environmental influences on complex diseases and traits in the context of genome-wide association studies (GWAS) (Milmont et al., 2008). With the development and application of GWAS, it has become easier to build gene-environment interaction models and analyze available datasets more comprehensively (Rava et al., 2015; Dehghan, 2018). For instance, a genome-wide by environment interaction study (GWEIS) estimated the effect of gene-environment interactions on BMI and identified 15 lifestyle factors that interacted with BMI-associated genetic variants, such as alcohol intake frequency, usual walking pace, and Townsend deprivation index (TDI) (Rask-Andersen et al., 2017). Tao et al. performed a GWEIS and observed a significant interaction between rs671 in ALDH2 and alcohol consumption on ferritin levels, a complex trait regulated by genetic variants, total iron status, systemic inflammation and metabolic disorders (Tao et al., 2019). To the best of our knowledge, no GWEIS of MS and offspring education score has been conducted. More efforts are required to clarify the role of MS in the development of brain function in offspring.

In the present study, leveraging individual-level genotypic and phenotypic data on MS and education score from the UK Biobank, we first performed a two-stage observational analysis with a logistic regression model to evaluate the association between MS and education score. The population from England was used as the discovery cohort, and the populations from Scotland and Wales were used as replication cohorts. A GWEIS was then conducted to identify gene-MS interaction effects on offspring education score.

Material and methods

The UK Biobank cohort

The individual-level phenotypic and genotypic data used in this study were derived from the UK Biobank health resource under UKB application 46478. The UK Biobank project is a UK-wide, ongoing, prospective cohort study that recruited ~500 000 volunteers aged between 40 and 69 who were registered with the National Health Service in England, Wales and Scotland. Participants were required to visit one of the 21 assessment centers to complete a computer-assisted self-administered questionnaire, participate in a face-to-face interview and provide physical measures and biological samples. Detailed information about DNA extraction and the participants can be found in published studies (Bycroft et al., 2018; Mutambudzi et al., 2020).

Genotyping, imputation and quality control

Genotyping, imputation and quality control (QC) data were obtained as processed by the UK Biobank (Sudlow et al., 2015; Bycroft et al., 2018). In brief, the Affymetrix UK BiLEVE Axiom and the Affymetrix UK Biobank Axiom arrays (Santa Clara, California, USA) were used for genotyping. Imputation was carried out by IMPUTE4 in chunks of approximately 50 000 imputed markers with a 250 kb buffer region. QC consisted of two parts, namely sample-based QC and marker-based QC.

Additionally, in the GWEIS analyses, we conducted further QC on the UK Biobank genotype data. First, we restricted our analyses to a relatively genetically homogeneous sample to avoid spurious findings induced by population stratification. Thus, we limited our participants to those with ‘Caucasian’ genetic ethnic grouping based on principal component analysis (UKB Data-Field 22006) and those with self-reported ‘white British’ ethnic background (UKB Data-Field 21000). Second, kinship coefficients were estimated with KING software to screen out genetically related individuals. Specifically, we generated kinship coefficients for all pairs of individuals using KING, and pairs related at degree 3 or closer were assigned using the kinship coefficient boundaries recommended by the authors of KING (kinship coefficient ≥ 1/2^(9/2) = 0.04419418; see Table 1 in published article 12) (Rietveld et al., 2013). Individuals related at the third degree or closer were then removed. Third, we removed low-quality SNPs using PLINK. After excluding individuals with too much missing genotype data (mind = 0.1), we included only SNPs with minor allele frequency (MAF) ≥0.01, ≥90% genotyping rate (≤10% missing) and Hardy–Weinberg equilibrium test P > 1.00 × 10−3.
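The kinship boundary quoted above follows a simple power-of-two pattern, which the following minimal Python sketch illustrates. It assumes the usual KING-style convention that the lower bound for relatives of degree d is 2^-(d + 1.5); the function names are hypothetical and are not taken from KING or from the study's pipeline.

```python
# Illustrative sketch only: KING-style kinship boundaries, assuming the
# lower bound for relatives of degree d is 2 ** -(d + 1.5).
# Function names are hypothetical, not part of KING or PLINK.

def kinship_lower_bound(degree: int) -> float:
    """Lower kinship-coefficient bound for relatives of the given degree.

    degree 0 = duplicate / monozygotic twin, 1 = first degree, and so on.
    """
    return 2.0 ** -(degree + 1.5)


def relatedness_degree(kinship: float, max_degree: int = 3):
    """Return the closest degree whose lower bound is met, or None if the
    pair is less related than max_degree."""
    for degree in range(0, max_degree + 1):
        if kinship >= kinship_lower_bound(degree):
            return degree
    return None


def is_related_pair(kinship: float) -> bool:
    """True when a pair is related at the third degree or closer."""
    return relatedness_degree(kinship, max_degree=3) is not None
```

Under this convention the third-degree bound is 2^-4.5, which matches the 0.04419418 cut-off quoted in the text, so a pair with a kinship coefficient of 0.05 would be flagged and a pair at 0.01 would not.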

Table 1 - Basic characteristics of study population
Population | Group | n | Sex, male | Age, years
England (N = 276 999) | MS | 85 362 | 40 446 (47.38%) | 56.33 ± 7.65
England (N = 276 999) | Non-MS | 191 637 | 86 941 (45.37%) | 57.16 ± 8.08
Scotland (N = 24 355) | MS | 7712 | 3539 (45.89%) | 55.47 ± 7.67
Scotland (N = 24 355) | Non-MS | 16 643 | 7723 (43.40%) | 56.92 ± 8.13
Wales (N = 14 526) | MS | 4316 | 2092 (48.47%) | 55.75 ± 7.46
Wales (N = 14 526) | Non-MS | 10 210 | 4592 (44.98%) | 56.54 ± 8.03
MS, participants who answered ‘Yes’ to the maternal smoking question; Non-MS, participants who answered ‘No’.

Phenotype definition

The phenotypic information on MS and offspring education score was derived from the UK Biobank cohort (Biobank, 0000). Information on MS (Data-Field 1787) is available for 494 288 participants of both sexes. All participants were recruited with informed consent and attended the initial assessment visit (2006–2010). MS was coded as a binary variable from the ACE touchscreen question ‘Did your mother smoke regularly around the time when you were born?’ Four answers were possible: Yes, No, Do not know and Prefer not to answer. A value of ‘1’ was assigned to participants who answered ‘Yes,’ and ‘0’ to those who answered ‘No.’ Participants who answered ‘Prefer not to answer’ or ‘Do not know’ were excluded from the dataset. A total of 134 621 participants answered ‘Yes,’ 326 116 answered ‘No,’ 64 807 answered ‘Do not know’ and 254 answered ‘Prefer not to answer.’

The education scores for the England population (discovery cohort, Data-Field 26414), Scotland population (replication cohort, Data-Field 26431) and Wales population (replication cohort, Data-Field 26421) were all collected from the UK Biobank. The education score used in the present study is a region-level indicator that measures the extent of deprivation in terms of education, skills and training in an area. The indicators are divided into two sub-domains, which represent the ‘flow’ and ‘stock’ of educational disadvantage within each area, respectively. Different indicators are used to calculate the education score in different populations. For the England population, seven indicators were used: average points score of pupils taking English, Math and Science Key Stage 2 exams; average points score of pupils taking English, Math and Science Key Stage 3 exams; average capped points score of pupils taking Key Stage 4 (General Certificate of Secondary Education or equivalent) exams; proportion of young people not staying on in school or non-advanced education above age 16; secondary school absence rate (the proportion of authorized and unauthorized absences from secondary school); proportion of those aged under 21 not entering Higher Education; and proportion of adults aged 25–54 with no or low qualifications. More information about the calculation of the education score for the other two populations is available from the UK Biobank (Biobank, 0000).

Because the education score in these three populations was calculated from different indicators, direct comparisons between the populations in a single analysis are not appropriate (Abel et al., 2016). In the observational analyses, to maximize comparability, the education score was recoded as a categorical variable in the logistic regression model. Individuals were placed into groups based on the quartiles of the education score in the area where they reside, and these groups were assumed to be equivalent across the areas (Abel et al., 2016). With the upper 25% of scores as the reference category, the associations between MS and the lowest 25%, second quartile, and third quartile of scores were estimated separately. However, this approach assumes that there are no deprivation gradients among the three areas. In the subsequent gene-MS interaction analyses (GWEIS), the education score was used as a continuous variable in a linear regression model.
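The quartile recoding described above can be sketched in a few lines of Python. This is an illustration only, not the authors' SPSS procedure: the function name is hypothetical, and the choice of `statistics.quantiles` with its default cut-point method is an assumption, since the text does not say how ties or boundaries were handled.

```python
# Illustrative sketch only: recode a continuous area-level education score
# into within-population quartile groups (1 = lowest 25%, 4 = upper 25%).
# The quantile method and the function name are assumptions.
import bisect
import statistics


def quartile_groups(scores):
    """Assign each score to a quartile group computed within one population."""
    cuts = statistics.quantiles(scores, n=4)  # three cut points
    # bisect_left counts how many cut points lie strictly below the score.
    return [bisect.bisect_left(cuts, s) + 1 for s in scores]


# Group 4 (the upper 25% of scores) would serve as the reference category,
# and the quartiles are computed separately for England, Scotland and Wales.
REFERENCE_GROUP = 4
```

Computing the cut points separately within each population is what makes the groups comparable by rank even though the underlying indicators differ between the three areas.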

Observational analyses

A two-stage observational analysis was first performed to evaluate the association between MS and offspring education score, with age and sex as covariates in the model; that is, separate cohorts were used for the logistic regression analyses. In the first stage, a total of 276 999 subjects from England in the UK Biobank cohort were enrolled for the discovery study. In the second stage, 24 355 subjects from Scotland and 14 526 subjects from Wales were enrolled for the replication study.

Genome-wide by environment interaction studies analyses

GWEIS analyses of individual education score were conducted using PLINK 2.0, which fits a regression model for each SNP to estimate gene-environment interactions. In effect, this is a large-scale association scan for gene-environment interactions in complex traits. Education score was adjusted for 10 principal components of population structure, and SNP-age and SNP-gender were set as covariates in the GWEIS model. In the model, offspring education score is the outcome variable D, MS during pregnancy is the environmental factor (e or E), and the gene is denoted g or G. β0 is the regression constant, and β1, β2 and β3 are the regression coefficients; ge denotes the gene × environment interaction term.
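Written out with the symbols defined above, a linear interaction model of this kind usually takes the following form. This is a sketch under stated assumptions: the assignment of β1, β2 and β3 to the gene, environment and interaction terms is the conventional ordering and is assumed here, and the covariate terms (principal components and the SNP-age and SNP-gender terms) are abbreviated as C with coefficients γ, which are notation introduced for illustration.

```latex
D = \beta_0 + \beta_1 G + \beta_2 E + \beta_3 (G \times E) + \gamma^{\top} C + \varepsilon
```

In this form, β3 is the quantity tested in the GWEIS: it measures how much the per-allele effect of a SNP on education score differs between participants with and without MS.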


The additive genetic model of PLINK 2.0 was used to assess the effects of SNP-MS interaction on offspring education score. SNPs were excluded if they had a call rate <0.90, a Hardy–Weinberg equilibrium exact test P value <0.001, or a MAF <0.01. The genome-wide significance threshold was P < 5.0 × 10−8 (Purcell et al., 2007; Chang et al., 2015).
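The exclusion criteria above amount to three per-SNP threshold checks. The minimal Python sketch below shows that logic on summary statistics; it does not reproduce PLINK itself, and the `SnpStats` record and `passes_qc` function are hypothetical names introduced for illustration.

```python
# Illustrative sketch only: the per-SNP exclusion rules stated in the text,
# applied to summary statistics. SnpStats and passes_qc are hypothetical
# names; the study applied these filters in PLINK.
from dataclasses import dataclass


@dataclass
class SnpStats:
    snp_id: str
    call_rate: float  # fraction of samples with a called genotype
    hwe_p: float      # Hardy-Weinberg equilibrium exact test P value
    maf: float        # minor allele frequency


def passes_qc(snp: SnpStats,
              min_call_rate: float = 0.90,
              min_hwe_p: float = 1e-3,
              min_maf: float = 0.01) -> bool:
    """Keep a SNP unless it falls below any of the three thresholds."""
    return (snp.call_rate >= min_call_rate
            and snp.hwe_p >= min_hwe_p
            and snp.maf >= min_maf)


def filter_snps(snps):
    """Return only the SNPs that pass all three checks."""
    return [s for s in snps if passes_qc(s)]
```

A SNP is dropped as soon as any one criterion fails, so a common, well-called variant that departs strongly from Hardy–Weinberg equilibrium is still excluded.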

Statistical analysis

For the observational analyses, all statistical analyses were performed in SPSS. Genome-wide gene-environment interaction studies were conducted with PLINK 2.0. The circular Manhattan plot of the GWEIS results was generated using the ‘CMplot’ R script.


Observational analyses in UK Biobank cohort

The characteristics of the study population are shown in Table 1. In the UK Biobank cohort, we observed a significant association between MS and offspring education score in the discovery cohort (P < 0.0001) and in both replication cohorts, the Scotland population (P < 0.0001) and the Wales population (P < 0.0001). With the upper 25% of scores as the reference category, the associations between MS and the lowest 25%, second quartile, and third quartile of scores were estimated separately (Table 2).

Table 2 - The observational associations between maternal smoking and education score in UK Biobank cohorts
Population | Education score category (a) | P-value | OR | 95% CI for OR (lower bound–upper bound)
England | Lowest 25% | <0.0001 | 1.584 | 1.548–1.621
England | Second 25% | <0.0001 | 1.408 | 1.376–1.440
England | Third 25% | <0.0001 | 1.265 | 1.237–1.294
Wales | Lowest 25% | <0.0001 | 1.676 | 1.514–1.855
Wales | Second 25% | <0.0001 | 1.484 | 1.344–1.640
Wales | Third 25% | <0.0001 | 1.274 | 1.155–1.406
Scotland | Lowest 25% | <0.0001 | 1.635 | 1.514–1.766
Scotland | Second 25% | <0.0001 | 1.456 | 1.350–1.571
Scotland | Third 25% | <0.0001 | 1.265 | 1.174–1.363
CI, confidence interval; MS, maternal smoking; OR, odds ratio.
(a) The reference category is the upper 25% of education score. OR represents the odds that an outcome will occur given a particular exposure, compared to the odds of the outcome occurring in the absence of that exposure, and quantifies the strength of the association between MS and education score. P-value, the significance of the OR.
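To relate the odds ratios in Table 2 to the underlying logistic regression coefficients, the following Python sketch back-calculates the log odds ratio and its approximate standard error from a reported OR and 95% CI. It assumes the intervals are Wald intervals on the log scale (estimate ± 1.96 standard errors), which the text does not state explicitly; the function name is hypothetical.

```python
# Illustrative sketch only: recover the logistic coefficient (log OR) and an
# approximate standard error from a reported OR and 95% CI, assuming a
# Wald interval on the log scale. The function name is hypothetical.
import math


def log_or_and_se(odds_ratio: float, ci_lower: float, ci_upper: float,
                  z_crit: float = 1.96):
    """Return (beta, se), where beta = ln(OR) and se is inferred from the CI width."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2.0 * z_crit)
    return beta, se


# England, lowest 25% versus the upper 25% (values taken from Table 2).
beta_england, se_england = log_or_and_se(1.584, 1.548, 1.621)
z_england = beta_england / se_england
```

Because the England cohort is large, the inferred standard error is small relative to the coefficient, which is consistent with the very small P-values reported for every row.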

Genome-wide by environment interaction studies result of maternal smoking and offspring education score

At the significance threshold of P < 5.0 × 10−8, GWEIS analyses identified two independent significant SNP-MS interactions. rs72768988, located on chromosome 16 (position: 22 768 798, P = 1.22 × 10−8, β = 6.7662), interacted positively with MS for education score, whereas 2:196424612_GT_G, located on chromosome 2 (position: 196 424 612, P = 3.60 × 10−9, β = −0.4721), interacted negatively with MS for education score. In addition, 76 significant SNP-MS interactions are summarized in Table S1, Supplemental digital content 1, such as rs72927705 (P = 1.09 × 10−8), rs72927707 (P = 1.09 × 10−8) and rs16836823 (P = 1.35 × 10−8). All of the significant SNPs are listed in Table 3 and Table S1, Supplemental digital content 1. The P values of the GWEIS are displayed in a circular Manhattan plot (Fig. 1).

Table 3 - List of top 10 significant SNP-MS interactions (P < 5.0 × 10−8)
Chromosome | Position | SNP ID | Region | β | Standard error | P-value
16 | 22768798 | rs72768988 | 16p12.2 | 6.7662 | 1.1878 | 1.22 × 10−8
2 | 196424612 | 2:196424612_GT_G | 2q32.3 | -0.4721 | 0.08 | 3.60 × 10−9
2 | 196389182 | rs545437977 | 2q32.3 | -0.5417 | 0.0924 | 4.66 × 10−9
2 | 196421646 | rs148984532 | 2q32.3 | -0.4627 | 0.0795 | 5.81 × 10−9
2 | 196424039 | rs2102745 | 2q32.3 | -0.4629 | 0.0795 | 5.82 × 10−9
MS, maternal smoking during pregnancy; SNP, single nucleotide polymorphism.
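The β, standard error and P-value columns of Table 3 are linked by the usual Wald test, which the short Python sketch below makes explicit. It assumes a two-sided test with a standard normal reference distribution, a reasonable approximation at this sample size, although the exact test statistic used by the software is not stated in the text. The function name is hypothetical.

```python
# Illustrative sketch only: two-sided Wald test relating an interaction
# coefficient and its standard error to a P value, assuming a standard
# normal reference distribution. The function name is hypothetical.
import math


def wald_z_and_p(beta: float, se: float):
    """Return (z, p) with z = beta / se and p the two-sided normal tail area."""
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Values taken from the first two rows of Table 3.
z_chr16, p_chr16 = wald_z_and_p(6.7662, 1.1878)
z_chr2, p_chr2 = wald_z_and_p(-0.4721, 0.08)
```

Under these assumptions, both results should come out close to the tabulated P values and below the 5.0 × 10−8 threshold; small differences are expected because the standard errors in the table are rounded.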

Fig. 1:
The circular Manhattan plot of the genome-wide gene-environment interaction study results. From the center, the first circle depicts the −log10(P) values of each variant for the SNP-MS interaction effect. The second circle shows chromosome density. The blue line on the Manhattan plot represents the genome-wide significance threshold (5 × 10−8), and the red line represents the suggestive association threshold (5 × 10−7). The window size for counting SNP number was set to 1 × 10^6 bp (1 Mb). The plots were generated using the ‘CMplot’ R script. MS, maternal smoking during pregnancy; SNP, single nucleotide polymorphism.


Previous studies have demonstrated that MSDP influences brain development in offspring (Roza et al., 2007). For example, compared with fetuses not exposed to MSDP, the head circumference of exposed fetuses showed a growth reduction of 0.13 mm per week, and the biparietal diameter showed the same trend (Roza et al., 2007). In the present study, to evaluate the association between MS and education score, we performed a two-stage observational analysis using individual-level phenotypic and genotypic data from the UK Biobank. We then conducted a GWEIS to detect SNP-MS interaction effects on offspring education score. We observed a strong association between MS and offspring education score in both the discovery and replication cohorts. The GWEIS identified two independent significant SNP-MS interactions, on chromosome 16 and in the chromosomal 2q32 region. Our results indicate a significant impact of MS on offspring education score and a potential role of 2q32.

It has also been demonstrated that MSDP is associated with low birth weight, which is related to significant neurologic consequences, such as increased risk of behavior problems, decreased intellectual capacity, hyperactivity and learning disabilities (McCarton, 1998; Braun et al., 2009; Kristjansson et al., 2017; Narvestad et al., 2019). MSDP also influences some psychiatric conditions in offspring, including depressive and anxiety behaviors (Thakur et al., 2013; Moylan et al., 2015). Thakur et al. suggested that MSDP was related to a more severe form of ADHD, characterized by more severe clinical manifestations and poorer neuropsychological performance (Thakur et al., 2013). Animal and human research has demonstrated that MSDP could disrupt neurodevelopment by influencing maturing neurotransmitter systems and brain architecture in regions associated with stress and mood regulation, such as the hippocampus (Slotkin, 2004; Moylan et al., 2015). Epidemiological evidence suggests that MSDP is associated with poorer academic performance in childhood and adolescence (Kristjansson et al., 2017). For example, a remarkable statistical effect of birth weight, maternal education and smoking in pregnancy on offspring’s IQ scores was found in 1822 children (Rahu et al., 2010).

Several plausible mechanisms have been suggested for the association. First, the association may be caused by prenatal exposure to nicotine. Nicotine easily passes through the syncytium, targets specific neurotransmitters, disrupts neurodevelopment and impairs fetal brain development and function (Ekblad et al., 2015). Moreover, the detection of blood concentrations of cotinine (a nicotine metabolite) in exposed newborns indicates that the fetus is exposed to equal or even higher levels of nicotine than the smoking mother (Falgreen Eriksen et al., 2012). Dwyer et al. suggested that nicotinic acetylcholine receptors are widely expressed in the fetal central nervous system and that nicotine exposure has detrimental effects on cholinergic modulation of brain development (Dwyer et al., 2008). Furthermore, exposure to nicotine could cause many neurochemical alterations that affect the major neuromodulatory pathways in the developing brain (Oliff and Gallardo, 1999; Dwyer et al., 2009). The variation in education score among individuals could also be induced by low intelligence resulting from impairment of brain structure or function (Kristjansson et al., 2018). For example, MSDP has been related to lower intelligence in 9-year-old children and reduced intellectual abilities in 8-year-olds (Kristjansson et al., 2017). Moreover, Mortensen et al. indicated a dose-response relationship between MSDP and offspring adult intelligence (Mortensen et al., 2005).

Second, when considering the association between smoking and education, it is also important to consider the role of genetic factors (Silventoinen et al., 2022). A recent study reported that some of the genetic variation is shared between MSDP and education (Silventoinen et al., 2022), and a recent GWAS found that smoking and education share four SNPs (Erzurumluoglu et al., 2020), indicating an important role of genetic loci in the association. Semick et al. performed a genome-wide differential gene expression analysis using RNA sequencing (RNA-seq) on prenatal and adult human postmortem prefrontal cortices. They found 14 genes directly associated with MSDP and identified different exposure effects between MSDP and direct exposure in adulthood, providing evidence that MSDP affects gene expression in the prenatal human cortex (Semick et al., 2020) and pointing to interaction effects between MSDP and genetic loci. In addition, MSDP has been associated with altered DNA methylation and dysregulated expression of microRNA (Knopik et al., 2012). The evidence above indicates a possible interaction between MSDP and genetic loci. In the present study, we found that MSDP interacted with HECW2/ZNF804A for education score. However, our study is only a preliminary screening analysis; a deeper understanding of the biological and genetic mechanisms underlying the associations, as well as of how these interactions may affect offspring educational attainment, remains to be established.

Our GWEIS identified the chromosomal 2q32 region as showing significant interaction effects with MS on offspring education score, but no supporting evidence was found in previous studies. In addition, the two significant loci identified in our GWEIS analysis (rs72768988 and 2:196424612_GT_G) have not been reported in previous GWAS analyses. More functional studies are warranted to confirm our findings. HECW2 (HECT, C2 and WW Domain Containing E3 Ubiquitin Protein Ligase 2) is a gene on chromosome 2q32.3. By encoding a member of a family of E3 ubiquitin ligases, HECW2 plays a key role in the proliferation, migration and differentiation of neural crest cells as a regulator of glial cell line-derived neurotrophic factor/Ret signaling. Several lines of evidence have demonstrated that mutations in this gene are associated with neurodevelopmental delay and epilepsy. Five de novo mutations in HECW2 have been identified by previous exome sequencing projects in neurodevelopmental disorders, including intellectual disability and epilepsy (EuroEPINOMICS-RES Consortium; Epilepsy Phenome/Genome Project; Epi4K Consortium, 2014; Iossifov et al., 2014; Krumm et al., 2015; Wright et al., 2015). For example, a previous study performed exome sequencing on 39 patient-parent trios and identified HECW2 as a new candidate gene for epilepsy and intellectual disability (Halvardson et al., 2016). According to the authors, HECW2 is moderately expressed in the brain throughout development, with the highest expression in the frontal cortex. Interestingly, CDKL5, a well-established causative gene in intellectual disability and epilepsy, shows the highest correlation in expression in the frontal cortex during brain development (Halvardson et al., 2016). All of this evidence suggests a potential role of HECW2 in brain development. Moreover, previous studies have shown that loss of the chromosomal 2q32 region could lead to intellectual disability (Kaminsky et al., 2011). For instance, one study indicated that singleton deletion within 2q31.1-q33.1 leads to unexplained developmental delay, intellectual disability, dysmorphic features, multiple congenital anomalies, autism spectrum disorders, or clinical features suggestive of a chromosomal syndrome (Kaminsky et al., 2011).

ZNF804A is a protein-coding gene on 2q32 that encodes a zinc finger binding protein. Polymorphisms in the gene have been identified as conferring susceptibility to schizophrenia, bipolar disorder, and heroin addiction (O’Donovan et al., 2008; Sun et al., 2016). Previously, a genome-wide meta-analysis of the FTND (Fagerström test for nicotine dependence) and TTFC (time to smoke first cigarette in the morning) phenotypes was conducted in adult smokers, and ZNF804A was reported to be associated with nicotine dependence (Chen et al., 2020). However, the neural mechanism of ZNF804A in nicotine dependence and brain development remains unclear. Therefore, further investigation is needed to discover the effects of ZNF804A on the development of the nervous system. It is worth noting that, while ZNF804A is a well-established susceptibility gene for schizophrenia, bipolar disorder, and heroin addiction, our analyses do not necessarily point to this gene as a causal mechanism.

Our study has several strengths. First, we used phenotypic and genetic data from the UK Biobank, which includes a large number of subjects, and the two-stage design of the observational analyses further enhanced the reliability of our results. Second, to the best of our knowledge, few GWEIS have been conducted for MS to date, and this study represents one of the limited efforts to explore the gene-environment interaction between MS and offspring education score. Our results highlight the potential role of the chromosomal 2q32 region for future studies. It is also worth noting that the outcome variable, education score, is a regional measure of deprivation relating to education and training. It is a region-level rather than an individual-level indicator: it is derived from several indices of different types of material deprivation related to education and training and reflects the deprivation of an area.

This study has limitations. First, the present study restricted the subjects to ‘white British’ ancestry, which limits the generalizability of the results to other ancestral groups. Second, in the logistic regression model, the education score was recoded as a categorical variable rather than a continuous variable. This model is valid only under the hypothesis that no deprivation gradients exist among the three areas in the UK. In addition, owing to the limitations of the available UK Biobank data, we used a region-level education score in the association analysis. The region-level education score represents material deprivation for a given area and is used as an important indicator of social deprivation, as are the index of multiple deprivation and the TDI. Region-level social deprivation has been used as an individual descriptor in several recently published studies (Lokar et al., 2019; Woodward et al., 2021). However, by using region-level indicators, the associations reported in the present study are valid tests of the null hypotheses that no deprivation gradients exist among the three areas in the UK. Nevertheless, attenuation bias could affect our results because the UK Biobank is a volunteer sample that over-sampled more-educated people. Our results may also average over the region level and underrepresent the difference between more-educated and less-educated people. Thus, our results should be interpreted with caution. Third, more research is required to determine how much subtle confounding by population differences contributes to the SNP × MS interaction. Fourth, probably owing to the small sample sizes in Scotland and Wales, none of the significant SNP-MS interactions identified by the GWEIS analysis remained significant in the replication samples. In addition, the phenotypic information on MS collected in our study was based on individuals’ recall as adults, which may be subject to recall bias. Furthermore, mothers might have smoked habitually and not only during pregnancy. Future prospective research is necessary to identify potential sensitive periods using biological markers of MS and detailed information on MS instead of retrospective self-reports. Further studies are also required to confirm our findings and to examine the potential mechanisms behind the gene-MS interactions that we observed.

In conclusion, we conducted a two-stage observational analysis and a gene-MS interaction analysis (GWEIS). The results suggest a significant association between MS and offspring education score and highlight the potential role of 2q32.


This study is supported by the Program for Tackling Key Problems in Shannxi Provincial Science and Technology (2016SF-288).

Data availability statement: The UK Biobank data are available through the UK Biobank Access Management System. Information on the education score of the study population can be found at the following website:

Conceptualization, Huimei Huang, Yujie Ning and Feng Zhang; Data curation, Feng Zhang; Funding acquisition, Huimei Huang; Methodology, Li Liu, Fenling Feng, Hongli Sun; Writing – original draft, Huimei Huang, Li Liu; Writing – review & editing, Yujie Ning, Hongli Sun, Fei Li, Haibin Wu, Chujun Liang, Xiaomeng Chu.

Institutional review board statement: Not applicable. Our data were downloaded from an online resource.

Informed consent statement: Not applicable. Our data were downloaded from an online resource.

Conflicts of interest

There are no conflicts of interest.


EuroEPINOMICS-RES Consortium; Epilepsy Phenome/Genome Project; Epi4K Consortium. (2014). De novo mutations in synaptic transmission genes including DNM1 cause epileptic encephalopathies. Am J Hum Genet 95:360–370.
Abdellaoui A, Hugh-Jones D, Yengo L, Kemper KE, Nivard MG, Veul L, et al. (2019). Genetic correlates of social stratification in Great Britain. Nat Hum Behav 3:1332–1342.
Abel GA, Barclay ME, Payne RA (2016). Adjusted indices of multiple deprivation to enable comparisons within and between constituent countries of the UK including an illustration using mortality rates. BMJ Open 6:e012750.
UK Biobank. Indices of multiple deprivation. [Accessed May 2018]
Brand JE, Moore R, Song XI, Xie Y (2019). Parental divorce is not uniformly disruptive to children’s educational attainment. Proc Natl Acad Sci U S A 116:7266–7271.
Braun JM, Daniels JL, Kalkbrenner A, Zimmerman J, Nicholas JS (2009). The effect of maternal smoking during pregnancy on intellectual disabilities among 8-year-old children. Paediatr Perinat Epidemiol 23:482–491.
Bublitz MH, Stroud LR (2012). Maternal smoking during pregnancy and offspring brain structure and function: review and agenda for future research. Nicotine Tob Res 14:388–397.
Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, et al. (2018). The UK Biobank resource with deep phenotyping and genomic data. Nature 562:203–209.
Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ (2015). Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience 4:7.
Chen J, Loukola A, Gillespie NA, Peterson R, Jia P, Riley B, et al. (2020). Genome-wide meta-analyses of FTND and TTFC phenotypes. Nicotine Tob Res 22:900–909.
Dehghan A (2018). Genome-wide association studies. Methods Mol Biol 1793:37–49.
Dwyer JB, Broide RS, Leslie FM (2008). Nicotine and brain development. Birth Defects Res C Embryo Today 84:30–44.
Dwyer JB, McQuown SC, Leslie FM (2009). The dynamic effects of nicotine on the developing brain. Pharmacol Ther 122:125–139.
Ekblad M, Korkeila J, Lehtonen L (2015). Smoking during pregnancy affects foetal brain development. Acta Paediatr 104:12–18.
Erzurumluoglu AM, Liu M, Jackson VE, Barnes DR, Datta G, Melbourne CA, et al.; Understanding Society Scientific Group, EPIC-CVD, GSCAN, Consortium for Genetics of Smoking Behaviour, CHD Exome+ consortium (2020). Meta-analysis of up to 622,409 individuals identifies 40 novel smoking behaviour associated genetic loci. Mol Psychiatry 25:2392–2409.
Falgreen Eriksen HL, Kesmodel US, Wimberley T, Underbjerg M, Kilburn TR, Mortensen EL (2012). Effects of tobacco smoking in pregnancy on offspring intelligence at the age of 5. J Pregnancy 2012:945196.
Halvardson J, Zhao JJ, Zaghlool A, Wentzel C, Georgii-Hemming P, Månsson E, et al. (2016). Mutations in HECW2 are associated with intellectual disability and epilepsy. J Med Genet 53:697–704.
Herrmann M, King K, Weitzman M (2008). Prenatal tobacco smoke and postnatal secondhand smoke exposure and child neurodevelopment. Curr Opin Pediatr 20:184–190.
Huynh K (2017). Risk factors: Low educational attainment linked to high CVD risk. Nat Rev Cardiol 14:442.
Iossifov I, O'Roak BJ, Sanders SJ, Ronemus M, Krumm N, Levy D, et al. (2014). The contribution of de novo coding mutations to autism spectrum disorder. Nature 515:216–221.
Kaminsky EB, Kaul V, Paschall J, Church DM, Bunke B, Kunig D, et al. (2011). An evidence-based approach to establish the functional and clinical significance of copy number variants in intellectual and developmental disabilities. Genet Med 13:777–784.
Klimek P, Aichberger S, Thurner S (2016). Disentangling genetic and environmental risk factors for individual diseases from multiplex comorbidity networks. Sci Rep 6:39658.
Knopik VS, Maccani MA, Francazio S, McGeary JE (2012). The epigenetics of maternal cigarette smoking during pregnancy and effects on child development. Dev Psychopathol 24:1377–1390.
Kristjansson AL, Thomas S, Lilly CL, Thorisdottir IE, Allegrante JP, Sigfusdottir ID (2018). Maternal smoking during pregnancy and academic achievement of offspring over time: a registry data-based cohort study. Prev Med 113:74–79.
Kristjansson AL, Thorisdottir IE, Steingrimsdottir T, Allegrante JP, Lilly CL, Sigfusdottir ID (2017). Maternal smoking during pregnancy and scholastic achievement in childhood: evidence from the LIFECOURSE cohort study. Eur J Public Health 27:850–855.
Krumm N, Turner TN, Baker C, Vives L, Mohajeri K, Witherspoon K, et al. (2015). Excess of rare, inherited truncating mutations in autism. Nat Genet 47:582–588.
Lee JJ, Wedow R (2018). Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals. Nat Genet 50:1112–1121.
Lokar K, Zagar T, Zadnik V (2019). Estimation of the ecological fallacy in the geographical analysis of the association of socio-economic deprivation and cancer incidence. Int J Environ Res Public Health 16:296.
McCarton C (1998). Behavioral outcomes in low birth weight infants. Pediatrics 102(5 Suppl E):1293–1297.
Milmont C, Lewinger JP, Gauderman W (2008). Gene-environment interaction in genome-wide association studies. Am J Epidemiol 169:219–226.
Mortensen EL, Michaelsen KF, Sanders SA, Reinisch JM (2005). A dose-response relationship between maternal smoking during late pregnancy and adult intelligence in male offspring. Paediatr Perinat Epidemiol 19:4–11.
Moussa KM, Ostergren P-O, Eek F, Kunst AE (2010). Are time-trends of smoking among pregnant immigrant women in Sweden determined by cultural or socioeconomic factors? BMC Public Health 10:374.
Moylan S, Gustavson K, Øverland S, Karevold EB, Jacka FN, Pasco JA, et al. (2015). The impact of maternal smoking during pregnancy on depressive and anxiety behaviors in children: the Norwegian Mother and Child Cohort Study. BMC Med 13:24.
Mutambudzi M, Flowers P, Demou E (2020). Neuroticism, health and health behaviours in emergency personnel: a UK Biobank study. Occup Med (Lond) 69:617–624.
Narvestad H, Vestergaard CH, Rytter D, Bech BH (2019). Maternal smoking during pregnancy and offspring utilisation of health care services: a population-based cohort study. Paediatr Perinat Epidemiol 33:384–393.
O’Donovan MC, Craddock N, Norton N, Williams H, Peirce T, Moskvina V, et al.; Molecular Genetics of Schizophrenia Collaboration (2008). Identification of loci associated with schizophrenia by genome-wide association and follow-up. Nat Genet 40:1053–1055.
Okbay A, Beauchamp JP, Fontana MA, Lee JJ, Pers TH, Rietveld CA, et al.; LifeLines Cohort Study (2016). Genome-wide association study identifies 74 loci associated with educational attainment. Nature 533:539–542.
Oliff HS, Gallardo KA (1999). The effect of nicotine on developing brain catecholamine systems. Front Biosci 4:D883–D897.
Oskarsdottir GN, Sigurdsson H, Gudmundsson KG (2017). Smoking during pregnancy: a population-based study. Scand J Public Health 45:10–15.
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. (2007). PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81:559–575.
Rahu K, Rahu M, Pullmann H, Allik J (2010). Effect of birth weight, maternal education and prenatal smoking on offspring intelligence at school age. Early Hum Dev 86:493–497.
Rask-Andersen M, Karlsson T, Ek WE, Johansson A (2017). Gene-environment interaction study for BMI reveals interactions between genetic factors and physical activity, alcohol consumption and socioeconomic status. PLoS Genet 13:e1006977.
Rava M, Smit LA, Nadif R (2015). Gene-environment interactions in the study of asthma in the postgenome wide association studies era. Curr Opin Allergy Clin Immunol 15:70–78.
Rietveld CA, Medland SE, Derringer J, Yang J, Esko T, Martin NW, et al.; LifeLines Cohort Study (2013). GWAS of 126,559 individuals identifies genetic variants associated with educational attainment. Science 340:1467–1471.
Roza SJ, Verburg BO, Jaddoe VWV, Hofman A, Mackenbach JP, Steegers EAP, et al. (2007). Effects of maternal smoking in pregnancy on prenatal brain development. The Generation R Study. Eur J Neurosci 25:611–617.
Semick SA, Collado-Torres L, Markunas CA, Shin JH, Deep-Soboslay A, Tao R, et al. (2020). Developmental effects of maternal smoking during pregnancy on the human frontal cortex transcriptome. Mol Psychiatry 25:3267–3277.
Silventoinen K, Piirtola M, Jelenkovic A, Sund R, Tarnoki AD, Tarnoki DL, et al. (2022). Smoking remains associated with education after controlling for social background and genetic factors in a study of 18 twin cohorts. Sci Rep 12:13148.
Slotkin TA (2004). Cholinergic systems in brain development and disruption by neurotoxicants: nicotine, environmental tobacco smoke, organophosphates. Toxicol Appl Pharmacol 198:132–151.
von Stumm S (2017). Socioeconomic status amplifies the achievement gap throughout compulsory education independent of intelligence. Intell 60:57–62.
Sudlow C, Gallacher J, Allen N, Beral V, Burton P, Danesh J, et al. (2015). UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med 12:e1001779.
Sun Y, Zhao L-Y, Wang G-B, Yue W-H, He Y, Shu N, et al. (2016). ZNF804A variants confer risk for heroin addiction and affect decision making and gray matter volume in heroin abusers. Addict Biol 21:657–666.
Tao Y, Huang X, Xie Y, Zhou X, He X, Tang S, et al. (2019). Genome-wide association and gene-environment interaction study identifies variants in ALDH2 associated with serum ferritin in a Chinese population. Gene 685:196–201.
Thakur GA, Sengupta SM, Grizenko N, Schmitz N, Pagé V, Joober R (2013). Maternal smoking during pregnancy and ADHD: a comprehensive clinical and neurocognitive characterization. Nicotine Tob Res 15:149–157.
Woodward M, Peters SAE, Harris K (2021). Social deprivation as a risk factor for COVID-19 mortality among women and men in the UK Biobank: nature of risk and context suggests that social interventions are essential to mitigate the effects of future pandemics. J Epidemiol Community Health 75:1050–1055.
Wright CF, Fitzgerald TW, Jones WD, Clayton S, McRae JF, van Kogelenberg M, et al.; DDD study (2015). Genetic diagnosis of developmental disorders in the DDD study: a scalable analysis of genome-wide research data. Lancet 385:1305–1314.
Zhang D, Cui H, Zhang L, Huang Y, Zhu J, Li X (2017). Is maternal smoking during pregnancy associated with an increased risk of congenital heart defects among offspring? A systematic review and meta-analysis of observational studies. J Matern Fetal Neonatal Med 30:645–657.

genome-wide by environment interaction studies; maternal smoking; offspring education score; two-stage observational analysis


Copyright © 2023 The Author(s). Published by Wolters Kluwer Health, Inc.