Minimal improvement in coronary artery disease risk prediction in Chinese population using polygenic risk scores: Evidence from the China Kadoorie Biobank : Chinese Medical Journal

Original Article

Minimal improvement in coronary artery disease risk prediction in Chinese population using polygenic risk scores: Evidence from the China Kadoorie Biobank

Yang, Songchun1,2; Sun, Dong1; Sun, Zhijia1; Yu, Canqing1,3; Guo, Yu4; Si, Jiahui1; Sun, Dianjianyi1,3; Pang, Yuanjie1; Pei, Pei3; Yang, Ling5,6; Millwood, Iona Y.5,6; Walters, Robin G.5,6; Chen, Yiping5,6; Du, Huaidong5,6; Pang, Zengchang7; Schmidt, Dan6; Stevens, Rebecca6; Clarke, Robert6; Chen, Junshi8; Chen, Zhengming6; Lv, Jun1,3; Li, Liming1,3

Editor(s): Ni, Jing

Chinese Medical Journal, May 17, 2023. | DOI: 10.1097/CM9.0000000000002694

Introduction

Coronary artery disease (CAD), a subtype of cardiovascular disease (CVD), is a major cause of morbidity and mortality in China and worldwide.[1,2] Risk prediction models can help identify individuals who are at high risk of CAD and may benefit from lifestyle modifications and medical interventions. Current guideline-recommended CAD or CVD risk prediction models, such as the World Health Organization CVD risk chart,[3] pooled cohort equations (PCEs) in the United States,[4] SCORE2 in Europe,[5] and the China-PAR model[6] achieve risk stratification mainly based on sex, age, blood pressure, blood lipid levels, diabetes, smoking, and other traditional risk factors.

Genome-wide association studies (GWAS) have identified more than 160 genetic loci associated with CAD.[7,8] There has been considerable interest in whether incorporating genetic information such as a polygenic risk score (PRS) could enhance CAD risk prediction beyond traditional risk factors. Based on summary statistics derived from large-scale GWAS such as the CARDIoGRAMplusC4D collaboration group and UK Biobank (UKB), researchers have constructed several CAD PRSs in Western populations using different methods and containing between 40,000 and 6.6 million genetic variants.[9–17] Most studies have found that adding a PRS to traditional models (TMs) (such as the Framingham Risk Score, PCE, and QRISK3 [University of Nottingham and EMIS; https://www.qrisk.org/index.php]) could enhance risk prediction,[9,10,12,17–23] with a potential increase of up to 0.03 in Harrell's C index, a measure of discrimination.[22] However, two studies reported negative results.[14,24] Because large-scale GWAS of CAD are lacking in the Chinese population, the development of PRSs and their use in risk prediction remain limited in China. Recently, a PRS comprising 540 genetic variants was developed using a meta-analytical approach and results from a GWAS for CAD and CAD-related traits in East Asian populations. The addition of this PRS to the China-PAR model yielded a modest yet statistically significant improvement in Harrell's C of 0.01 and a net reclassification improvement (NRI) of 3.5%.[25]

In this study, based on nearly 100,000 participants with genome-wide genotypic data from the China Kadoorie Biobank (CKB), we examined the association of PRSs developed in previous studies[9–17,25] with the risk of CAD in the Chinese population. We also developed new PRSs based on publicly available summary statistics from the largest CAD GWAS conducted globally and in East Asian populations.[8,13] The PRS showing the strongest association with CAD risk was selected to further evaluate its effects on improving the traditional risk prediction model.

Methods

Ethical approval

CKB was approved by the Ethical Review Committee of the Chinese Center for Disease Control and Prevention (Beijing, China) (approval No.: 005/2004) and the Oxford Tropical Research Ethics Committee, University of Oxford (UK) (reference: 025–04). All the participants provided written informed consent.

Study population

The CKB is an ongoing prospective study with 512,725 participants aged 30–79 years enrolled from five urban and five rural regions in China between 2004 and 2008. The details of this study have been described previously.[26] Briefly, all participants provided valid baseline data, including a complete interviewer-administered laptop-based questionnaire and physical measurements conducted by trained staff using calibrated instruments and standard protocols. A 10 mL random blood sample was collected from each participant on the investigation day, with the time since the last meal recorded.

To date, nearly one-fifth of all participants have genome-wide genotypic data [Supplementary Figure 1, https://links.lww.com/CM9/B558]. Of these, 75,982 participants were randomly selected from the entire CKB cohort ("population-based samples"). The remaining participants were selected based on a case–cohort design ("case–cohort samples," n = 24,658). From the "population-based samples," after excluding participants with CAD or stroke at baseline (n = 3832), the remaining participants were included in prospective analyses ("testing set," n = 72,150). By combining the case–cohort samples with the 3832 participants excluded at baseline, we obtained a "potential training set" (n = 28,490), from which two matched case–control training sets ("training set for hard CAD" and "training set for soft CAD") were determined [Supplementary Methods, Supplementary Table 1, https://links.lww.com/CM9/B558].
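The sample flow described above can be checked with simple arithmetic. The following is a minimal sketch that uses only the counts quoted in this section; the variable names are ours and are not taken from the CKB analysis code.

```python
# Counts quoted in the text (variable names are illustrative only).
cohort_size = 512_725            # all CKB participants
population_based = 75_982        # randomly selected from the whole cohort
case_cohort = 24_658             # selected under the case-cohort design
baseline_cad_or_stroke = 3_832   # excluded from the population-based samples

genotyped_total = population_based + case_cohort
testing_set = population_based - baseline_cad_or_stroke
potential_training_set = case_cohort + baseline_cad_or_stroke

print(genotyped_total)                           # 100640
print(testing_set)                               # 72150
print(potential_training_set)                    # 28490
print(round(genotyped_total / cohort_size, 3))   # 0.196
```

The last figure, about 19.6% of the cohort, is consistent with the statement that nearly one-fifth of participants have genome-wide genotypic data.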

Study design

In this study, four interrelated components were involved [Figure 1]: (1) validation of previous PRSs: ten previously reported CAD PRSs were selected for validation; (2) development of new PRSs: clumping and thresholding ("C + T") and LDpred[27] were used to develop new PRSs; (3) identification of the optimal PRS: different PRSs were compared in the training sets of hard CAD and soft CAD, separately; and (4) validation and evaluation of the optimal PRS: we prospectively examined the association between the optimal PRS and the risk of CAD and evaluated the improvement by adding the optimal PRS to traditional risk prediction models for CAD.

Figure 1: Study overview of the process evaluating polygenic risk scores used for coronary artery disease risk prediction in Chinese population. C + T: Clumping & thresholding; GWAS: Genome-wide association study; HR: Hazard ratio; LD: Linkage disequilibrium; OR: Odds ratio; PRS: Polygenic risk score; SD: Standard deviation.

Collection and definition of variables in the traditional risk prediction models

The present analysis was based on the "CKB-CVD models," which are newly derived 10-year risk prediction models for CAD and stroke subtypes developed in the CKB cohort. The predictors included age, systolic and diastolic blood pressure, use of blood pressure-lowering treatments, current daily smoking, self-reported diabetes, and waist circumference [Supplementary Table 2, https://links.lww.com/CM9/B558]. Details of the collection and definition of each variable were described in our previous study.[28]

Genetic data

Genotyping and imputation in this study were conducted centrally by the CKB research team and have been described elsewhere.[29] Briefly, two custom-designed single-nucleotide polymorphism (SNP) arrays optimized for Chinese Han participants (Affymetrix Axiom® CKB array) were used for genotyping. These arrays were developed by the University of Oxford's Clinical Trial Service Unit and Epidemiological Studies Unit (Oxford, UK) in collaboration with the Beijing Genomics Institute (Shenzhen, China) and Affymetrix (now Thermo Fisher Scientific, Santa Clara, CA, USA). Standard quality control after genotyping revealed 532,415 biallelic variants present in both array versions. Qualified genotypes for each chromosome were phased using SHAPEIT3 (https://jmarchini.org/software/#shapeit-3).[30] Imputation was performed for each 5-Mb interval using IMPUTE4 (https://jmarchini.org/software/#impute-4)[31] based on haplotypes derived from the 1000 Genomes Project Phase 3 (1KGP) reference panels.[32] The genetic principal components of ancestry (PCA) were also centrally computed. According to the quality control criteria of the UKB,[33] there were 9.54 million genetic variants with high reliability that achieved good coverage of the whole genome [Supplementary Figure 2, https://links.lww.com/CM9/B558]. The QCTOOL (version 2; https://www.well.ox.ac.uk/~gav/qctool/#overview) was used to convert the imputed genotype data to dosages.
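To illustrate what converting imputed genotypes to dosages means, the sketch below computes the expected count of one allele from the three imputed genotype probabilities. It is a generic illustration, not QCTOOL's implementation; the function name and the choice of which allele is counted are assumptions.

```python
def expected_dosage(p_ref_ref, p_ref_alt, p_alt_alt):
    """Expected count (0 to 2) of the alternate allele from imputed genotype probabilities."""
    total = p_ref_ref + p_ref_alt + p_alt_alt
    if total <= 0:
        raise ValueError("genotype probabilities must sum to a positive value")
    # Divide by the total in case the three probabilities do not sum exactly to 1.
    return (p_ref_alt + 2.0 * p_alt_alt) / total


# A confidently imputed heterozygote has a dosage close to 1.
dosage = expected_dosage(0.02, 0.95, 0.03)
```

A dosage is therefore a continuous value between 0 and 2, which lets imputation uncertainty carry through to the PRS rather than forcing a hard genotype call.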

PRSs

In the current analysis, we validated previous PRSs and developed new PRSs. First, we searched the PGS Catalog[34] and selected 10 previously reported CAD PRSs for subsequent analyses [Supplementary Methods, Supplementary Table 3, https://links.lww.com/CM9/B558].[9–16] Second, we developed new PRSs using two methods: "C + T" and LDpred. For the "C + T" method, Biobank of Japan (BBJ)[13] and the "UKB-CARDIoGRAMplusC4D meta-analysis (UCM)"[8] were selected as the base data on the basis of ethnicity, sample size, and accessibility of summary statistics files [Supplementary Figure 3, https://links.lww.com/CM9/B558]. We applied r² thresholds of 0 (no pruning), 0.2, 0.4, 0.6, and 0.8, and P value thresholds from 5 × 10⁻⁸ to 1 (40 values in total) [Supplementary Methods, https://links.lww.com/CM9/B558]. For the LDpred method, the base data were the same as for the "C + T" method, and the variants were restricted to HapMap3 SNPs.[35] East Asians (n = 504) and Europeans (n = 503) in the 1KGP were used as linkage disequilibrium (LD) reference panels for BBJ and UCM, respectively. A range of values for the parameter p (the fraction of causal variants) was used: 1.0, 0.3, 0.1, 0.03, 0.01, 0.003, and 0.001 [Supplementary Methods, https://links.lww.com/CM9/B558].
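The thresholding and scoring steps of a "C + T" score can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline used in the study: LD clumping is assumed to have been done already, the data are invented, and the function names and the use of the population SD for standardization are our own choices.

```python
from statistics import mean, pstdev


def prs_for_threshold(dosages, betas, p_values, p_threshold):
    """Weighted allele score using only variants with GWAS P value <= p_threshold.

    dosages : one list per person of effect-allele dosages (one entry per variant)
    betas   : per-variant effect sizes (log odds ratios) from the base GWAS
    """
    keep = [j for j, p in enumerate(p_values) if p <= p_threshold]
    return [sum(person[j] * betas[j] for j in keep) for person in dosages]


def standardise(scores):
    """Rescale scores to mean 0 and SD 1 so that effects can be reported per SD."""
    mu, sd = mean(scores), pstdev(scores)
    return [(s - mu) / sd for s in scores]


# Toy data: 4 people x 3 variants (all values invented for illustration).
dosages = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [1, 2, 2]]
betas = [0.10, -0.05, 0.20]
p_values = [1e-9, 0.03, 4e-4]

raw = prs_for_threshold(dosages, betas, p_values, p_threshold=5e-4)
z = standardise(raw)
```

Repeating the scoring step over each combination of r² and P value threshold yields the set of candidate scores from which the best-performing one is chosen in the training data.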

Outcomes

All participants were followed up for incident disease outcomes from baseline. Incident events were identified through linkage with local disease and death registries and the National Health Insurance database, supplemented by active follow-up.[26] The loss to follow-up was <1% before censoring on December 31, 2017. Trained staff, blinded to the baseline information, coded all events using the International Classification of Diseases, Tenth Revision (ICD-10). In this study, hard CAD events included nonfatal myocardial infarction (I21–I23) and fatal CAD (I20–I25), while soft CAD events included all fatal or nonfatal CAD (I20–I25).

Statistical analysis

In the training set for hard CAD (n = 3513 pairs), conditional logistic regression models were used to measure the association of each PRS with hard CAD, stratified by the case–control pair. The analyses were adjusted for the top ten PCA and array versions. The PRS with the highest odds ratio (OR) per standard deviation (SD) was selected as the optimal PRS for hard CAD (PRShard). These steps were repeated for the soft CAD training set (n = 7142) to obtain the optimal PRS for soft CAD (PRSsoft). We used the Pearson correlation coefficient to measure the correlation between two continuous variables.
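For 1:1 matched pairs and a single predictor, the conditional likelihood reduces to a logistic model on the within-pair difference with no intercept. The sketch below fits it by Newton-Raphson. It is a simplified illustration: it assumes 1:1 matching, omits the adjustment for PCA and array version used in the study, and runs on invented data; the function name is ours.

```python
import math


def conditional_logit_1to1(case_scores, control_scores, n_iter=50, tol=1e-10):
    """Fit beta for 1:1 matched pairs on the conditional likelihood.

    For pair i with d_i = x_case - x_control, the likelihood contribution is
    1 / (1 + exp(-beta * d_i)). Returns (beta, standard_error).
    """
    diffs = [c - k for c, k in zip(case_scores, control_scores)]

    def score_and_information(beta):
        probs = [1.0 / (1.0 + math.exp(-beta * d)) for d in diffs]
        grad = sum(d * (1.0 - p) for d, p in zip(diffs, probs))
        info = sum(d * d * p * (1.0 - p) for d, p in zip(diffs, probs))
        return grad, info

    beta = 0.0
    for _ in range(n_iter):
        grad, info = score_and_information(beta)
        step = grad / info
        beta += step
        if abs(step) < tol:
            break
    _, info = score_and_information(beta)
    return beta, 1.0 / math.sqrt(info)


# Toy standardized PRS values for six matched pairs (invented for illustration).
case_prs = [0.9, 0.2, -0.1, 1.4, 0.3, -0.6]
control_prs = [0.1, 0.5, -0.8, 0.6, 0.4, -0.2]

beta, se = conditional_logit_1to1(case_prs, control_prs)
or_per_sd = math.exp(beta)
ci_95 = (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))
```

Because the score is standardized, `exp(beta)` is read directly as the OR per SD, which is the quantity used to rank candidate PRSs.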

In the testing set, the optimal PRSs (PRShard and PRSsoft) were grouped by quintiles. Cox proportional hazards models were used to estimate the associations between PRSs and CAD risk, stratified by sex and the ten study regions, with age as the time scale. The covariates included the top ten PCA and array versions. The proportional hazards assumption was evaluated by examining the Schoenfeld residuals. Restricted cubic splines were used to examine potential nonlinear associations between the PRS and CAD risk.
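Grouping a score by quintiles can be sketched with the Python standard library as below. The exact cut-point convention used in the study is not stated, so the "inclusive" quantile method and the handling of values that fall exactly on a cut point are assumptions.

```python
from bisect import bisect_right
from statistics import quantiles


def quintile_groups(scores):
    """Assign each score to a quintile group numbered 1 (lowest) to 5 (highest)."""
    cuts = quantiles(scores, n=5, method="inclusive")  # four cut points
    return [bisect_right(cuts, s) + 1 for s in scores]


# Toy scores 1..100 (invented); each quintile should hold 20 values.
toy_scores = list(range(1, 101))
groups = quintile_groups(toy_scores)
group_sizes = [groups.count(g) for g in range(1, 6)]
```

The lowest quintile then typically serves as the reference group when hazard ratios are reported by quintile.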

In this study, traditional risk prediction models for CAD were defined as sex-specific Cox models stratified by ten study regions, with time-on-study as the time scale, including models for hard and soft CAD. Predictors in TMs were the same as in the "CKB-CVD models."[28] The addition of PRS to TMs led to "PRS-enhanced models." The discrimination performance was assessed using Harrell's C-statistic.[36] Calibration performance was graphically assessed by comparing the mean predicted risks at 10 years with the observed risks across the deciles of predicted risks. The Nam–D'Agostino test[37] was used to quantify the agreement or fit. NRI and integrated discrimination improvement (IDI) were used to evaluate model reclassification before and after the addition of PRS.[38] Since the "CKB-CVD models" have not identified high-risk thresholds for each CVD subtype, we applied different thresholds for hard CAD (ranging from 1% to 10%) and soft CAD (ranging from 5% to 50%) while calculating the categorical NRI.
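Harrell's C can be understood as the proportion of usable pairs in which the participant who had the event earlier was also assigned the higher predicted risk. The sketch below is a plain, unstratified implementation on invented data; it skips pairs with tied follow-up times and does not reproduce the region-stratified, sex-specific setting of the study.

```python
def harrell_c(times, events, risks):
    """Harrell's C for right-censored data.

    A pair is usable when the participant with the shorter follow-up time had an
    event. It is concordant when that participant also has the higher predicted
    risk; tied predictions count as half.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / usable


# Toy follow-up data (invented): years of follow-up, event indicator, predicted risk.
times = [2, 4, 5, 7, 9]
events = [1, 1, 0, 1, 0]
tm_risk = [0.30, 0.10, 0.20, 0.15, 0.05]

print(harrell_c(times, events, tm_risk))  # 0.75
```

The change in C after adding a PRS is the difference between this statistic computed on the PRS-enhanced predictions and on the TM predictions for the same participants.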

Analyses were performed using Stata (version 17.0; StataCorp, Texas, USA) and R (version 4.0.3; The R Foundation for Statistical Computing, Vienna, Austria). All statistical tests were two-sided. P values <0.05 were considered significant.

Results

Selection of the optimal PRS

In the hard and soft CAD training sets [Supplementary Figure 1, https://links.lww.com/CM9/B558], the median ages (25th–75th percentile range) at baseline were 63 (55–70) and 62 (54–69) years, respectively. Of all the participants, 42.6% (1496/3513) and 48.1% (3434/7142) were women, and 39.9% (1400/3513) and 38.4% (2746/7142) were urban residents, respectively [Supplementary Table 4, https://links.lww.com/CM9/B558]. Of all previously reported PRSs, MetaPRS_CAD (PRS ID: PGS000337) performed the best in both training sets. The ORs per SD (ORSD) were 1.21 (95% confidence interval [CI]: 1.15–1.27) for hard CAD and 1.11 (1.08–1.15) for soft CAD [Table 1 and Supplementary Figure 4, https://links.lww.com/CM9/B558]. In the "C + T" method and the LDpred method, the optimal PRSs for hard CAD and soft CAD both came from UCM. The corresponding ORSD were 1.20 (1.14–1.26) and 1.11 (1.07–1.14) in the "C + T" method [Table 1 and Supplementary Figure 5, https://links.lww.com/CM9/B558], and 1.19 (1.13–1.25) and 1.12 (1.08–1.16) in the LDpred method [Table 1 and Supplementary Figure 6, https://links.lww.com/CM9/B558]. Finally, the optimal PRSs for hard CAD (PRShard) and soft CAD (PRSsoft) were PGS000337, developed in a previous study, and LD-UCM-004 from the LDpred method, respectively [Table 1]. The Pearson correlation coefficients between PRShard and PRSsoft were 0.65 and 0.67 for the training sets of hard and soft CAD, respectively [Supplementary Figure 7, https://links.lww.com/CM9/B558].

Table 1 - Performance of the optimal PRSs in training sets.

Outcome*   Method           PRS name      Number of variants   ORSD (95% CI)||
Hard CAD   Previous study   PGS000337     59,951               1.21 (1.15, 1.27)
Hard CAD   C + T            CT-UCM-011†   1403                 1.20 (1.14, 1.26)
Hard CAD   LDpred           LD-UCM-004‡   1,018,036            1.19 (1.13, 1.25)
Soft CAD   Previous study   PGS000337     59,951               1.11 (1.08, 1.15)
Soft CAD   C + T            CT-UCM-009§   1093                 1.11 (1.07, 1.14)
Soft CAD   LDpred           LD-UCM-004    1,018,036            1.12 (1.08, 1.16)

CAD: Coronary artery disease; CI: Confidence interval; LD: Linkage disequilibrium; OR: Odds ratio; ORSD: OR per SD; PCA: Principal components of ancestry; PRS: Polygenic risk score; SD: Standard deviation; UCM: UK Biobank (UKB)–CARDIoGRAMplusC4D meta-analysis.
*Hard CAD includes nonfatal myocardial infarction (I21–I23) and fatal CAD (I20–I25); soft CAD includes all fatal and nonfatal CAD (I20–I25).
†Base data were UCM, r² threshold = 0, P value threshold = 0.0004.
‡Base data were UCM, parameter p (the fraction of non-zero effects in the prior) was 0.1.
§Base data were UCM, r² threshold = 0, P value threshold = 0.0002.
||Conditional logistic regression models were used to measure the association of each PRS with CAD, stratified by the case–control pair. The covariates included array versions and the top ten PCA. The PRS with the highest ORSD is selected as the optimal PRS.

Validation of the optimal PRS

The median age (25th–75th percentile range) for all participants in the testing set was 51 years (43–59 years); 59.8% (43,171/72,150) were women, and 47.0% (33,929/72,150) were urban residents. PRSsoft was strongly correlated with PRShard (Pearson's correlation coefficient = 0.66) [Supplementary Figure 8, https://links.lww.com/CM9/B558]. Participants in the higher PRS quintiles were more likely to have a higher mean blood pressure, diabetes, and a family history of CVD [Supplementary Table 5, https://links.lww.com/CM9/B558].

During a mean follow-up of 11.2 years (SD = 1.9 years), we documented 1214 hard CAD and 7201 soft CAD cases. After multivariate adjustment, PRShard was positively associated with CAD risk. Hazard ratios per SD (HRSD) were 1.26 (95% CI: 1.19–1.33) for hard CAD and 1.11 (1.09–1.14) for soft CAD [Figure 2]. The corresponding HRSD of PRSsoft were 1.24 (1.17–1.31) and 1.10 (1.07–1.12) [Supplementary Figure 9, https://links.lww.com/CM9/B558]. Considering the strong correlation between the two PRSs and that PRShard showed slightly stronger associations with the risk of CAD than PRSsoft, only PRShard was included in the subsequent analyses. The HRSD for hard and soft CAD attenuated slightly after additional adjustment for education level, smoking status, systolic blood pressure, diabetes, and waist circumference [Supplementary Table 6, https://links.lww.com/CM9/B558]. The associations between PRShard and CAD were consistent across different subgroups of sex, age, smoking status, body mass index, waist circumference, hypertension status, and diabetes status. The HRSD for soft CAD was greater in participants with a family history of CVD than in those without (P = 0.017) [Supplementary Figure 10, https://links.lww.com/CM9/B558].

Figure 2: Adjusted HRs for CAD associated with the optimal PRS for hard CAD. The PRS reported here is the optimal PRS for hard CAD (PGS000337, see Table 1 for details). (A) Hard CAD events included non-fatal myocardial infarction (I21–I23) and fatal CAD (I20–I25). (B) Soft CAD events included all fatal or non-fatal CAD (I20–I25). The models were stratified by sex and 10 study regions and adjusted simultaneously for the top 10 PCA and array versions, with age as the time scale. In each subgraph, HRs and P values on the upper left were derived from linear trend tests. The abscissa of each closed square represents the mean value of the standardized PRS in the corresponding quintile group. The number above the closed square represents the HR. The number below the closed square represents the number of events in this group. The vertical lines indicate 95% CIs. CAD: Coronary artery disease; CIs: Confidence intervals; HRs: Hazard ratios; PCA: Principal components of ancestry; PRS: Polygenic risk score.

Addition of the optimal PRS to TMs

Based on the TMs defined in this study, the addition of the PRS did not improve or only slightly improved the discrimination performance of the models. For hard CAD, the addition of PRS increased Harrell's C by 0.001 in women (P = 0.282) and 0.003 in men (P = 0.003). For soft CAD, the addition of PRS increased Harrell's C by 0.001 in both sexes (P = 0.028 for women and P = 0.043 for men) [Figure 3]. The calibration performance showed little change before and after the addition of the PRS [Supplementary Figure 11, https://links.lww.com/CM9/B558]. The addition of the PRS offered little to no improvement in risk stratification. For hard CAD, the largest categorical NRI was 0.032 (95% CI: 0.004–0.060) at the 10% high-risk threshold in women and 0.020 (0.007–0.032) at the 1% high-risk threshold in men. For soft CAD, the categorical NRIs were below 0.01 in both sexes, and most were not statistically significant [Figure 4]. For hard CAD, the continuous NRI was 0.212 (95% CI: 0.119–0.305) in women and 0.193 (0.108–0.278) in men; the relative IDI was 3.9% in women and 4.6% in men. The corresponding continuous NRI and relative IDI of the soft CAD model were lower than those of the hard CAD model [Supplementary Table 7, https://links.lww.com/CM9/B558].

Figure 3: C statistics evaluating the performance of PRS. The PRS reported here is the optimal PRS for hard CAD (PGS000337, see Table 1 for details). Hard CAD events included non-fatal myocardial infarction (I21–I23) and fatal CAD (I20–I25); soft CAD events included all fatal or non-fatal CAD (I20–I25). TMs for CAD were defined as sex-specific Cox models stratified by 10 study regions, with time on study as the time scale, including models for hard CAD and models for soft CAD. Predictors included in TMs were the same as the "CKB-CVD models," including age, systolic and diastolic blood pressure, use of anti-hypertensives, current daily smoking, self-reported diabetes, and waist circumference. Interactions between age and the other six predictors were also included. The 95% CIs of Harrell's C and Harrell's C changes were calculated by 100 bootstrap replications using the bias-corrected accelerated (BCa) method in Stata. CAD: Coronary artery disease; CIs: Confidence intervals; CKB: China Kadoorie Biobank; CVD: Cardiovascular disease; PRS: Polygenic risk score; TMs: Traditional models.

Figure 4: Reclassification based on the categorical NRI. The PRS reported here is the optimal PRS for hard CAD (PGS000337, see Table 1 for details). (A) Hard CAD events included non-fatal myocardial infarction (I21–I23) and fatal CAD (I20–I25). (B) Soft CAD events included all fatal or non-fatal CAD (I20–I25). A range of high-risk thresholds was applied in the current analyses. For example, threshold = 1% means that participants with a 10-year CAD risk >1% are grouped into a high-risk group. The 95% CIs of the categorical NRI were calculated by 100 bootstrap replications using the bias-corrected accelerated (BCa) method in Stata. CAD: Coronary artery disease; CIs: Confidence intervals; NRI: Net reclassification improvement; PRS: Polygenic risk score.
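The categorical NRI at a single high-risk threshold can be sketched as below. This is the standard two-category definition applied to a binary event indicator on invented data; it ignores censoring, and whether the study used a survival-based variant is not stated here, so the function should be read as an illustration only.

```python
def categorical_nri(old_risk, new_risk, event, threshold):
    """Two-category NRI at one high-risk threshold (risk > threshold means high risk).

    NRI = [P(up | event) - P(down | event)] + [P(down | no event) - P(up | no event)]
    """
    up_e = down_e = up_n = down_n = 0
    n_event = sum(1 for e in event if e)
    n_nonevent = len(event) - n_event
    for old, new, e in zip(old_risk, new_risk, event):
        moved_up = old <= threshold < new      # low risk under TM, high risk with PRS
        moved_down = new <= threshold < old    # high risk under TM, low risk with PRS
        if e:
            up_e += moved_up
            down_e += moved_down
        else:
            up_n += moved_up
            down_n += moved_down
    return (up_e - down_e) / n_event + (down_n - up_n) / n_nonevent


# Toy 10-year risks from a TM and a PRS-enhanced model (invented values).
tm_risk = [0.008, 0.012, 0.015, 0.006, 0.009, 0.020]
prs_risk = [0.011, 0.013, 0.009, 0.005, 0.012, 0.018]
had_event = [1, 1, 0, 0, 0, 1]

nri_at_1pct = categorical_nri(tm_risk, prs_risk, had_event, threshold=0.01)
```

In this toy example one of three cases moves up, and among non-cases one moves down and one moves up, giving an NRI of 1/3 + 0 ≈ 0.33.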

Discussion

In this study, we examined the association between previously developed PRSs and CAD risk and developed new PRSs in the Chinese population. The PRS developed by Koyama et al[13] performed the best in this analysis. This PRS was independently associated with CAD risk in prospective analyses. However, the addition of PRS to the traditional CAD risk prediction model, which contained only non-laboratory-based information, minimally changed risk discrimination and offered little to no improvement in risk stratification.

The PRS developed by Koyama et al[13] (PGS000337) was the optimal PRS for hard CAD in the training set. It was built on the summary statistics with the largest sample size globally, combining CARDIoGRAMplusC4D, UKB, and BBJ, and its parameters were tuned in East Asians. Although PGS000337 was not the optimal PRS for soft CAD in the training set, it ranked second, with an OR only 0.003 lower than that of the top-ranked PRS (OR for PGS000337 = 1.114; OR for LD-UCM-004 = 1.117). In the testing set, the HR of PGS000337 for soft CAD was slightly higher than that of LD-UCM-004. Compared with PGS000337, the other PRSs developed in European populations[9–17] and the PRS developed in the Chinese population (metaPRS_CAD, PRS ID: PGS002262)[25] were based on smaller GWAS datasets, which may have yielded less robust regression coefficients and thus weaker associations between these PRSs and the risk of CAD in the current study. Koyama et al[13] reported the association of PGS000337 with the risk of CAD mortality over a median of 7.7 years of follow-up among 49,230 participants in BBJ, with an adjusted HRSD of 1.22 (95% CI: 1.11–1.33). To facilitate comparison, we examined the association between PGS000337 and the risk of CAD mortality in the testing set of the current study. A total of 800 CAD-related deaths were recorded during a mean follow-up period of 11 years. The adjusted HRSD in our study was 1.25 (95% CI: 1.17–1.34), similar to that reported by Koyama et al[13] (data not shown).

On the one hand, the stronger the association between PRS and disease outcome, the more obvious the effect of PRS on improving the TM.[21,23] In previous studies, the HRSD of PRS for CAD was usually between 1.20 and 1.60.[10,12,14,19,20,23–25,39,40] In the current analysis, however, the HRSD of PGS000337 was only 1.26 for hard CAD and 1.11 for soft CAD. These relatively weak associations might be the main reason why PRS hardly changed risk discrimination or improved risk stratification. A PRS developed by Riveros-McKay et al[22] had an HRSD of 1.62 (95% CI: 1.57–1.67) among 186,451 UKB participants, which is much stronger than that in our study. The addition of this PRS to the PCE model increased Harrell's C by 0.03 (95% CI: 0.02–0.04).[22] The PRS developed by Lu et al[25] had an HRSD of 1.44 (95% CI not reported) and increased Harrell's C by 0.01 (P = 7.72 × 10⁻⁷) based on the China-PAR model. Two previous studies reported a strength of association similar to that in our study. Mars et al[14] developed a PRS using LDpred, with GWAS summary statistics from the UKB as the base data (PRS_CHD, PRS ID: PGS000329). The HRSD was 1.25 (95% CI: 1.18–1.32) in 20,165 FINRISK participants. The addition of this PRS to the PCE model did not improve risk discrimination (ΔC = −0.003, P > 0.05). Mosley et al[24] evaluated a previous PRS (GPS_CAD; PRS ID: PGS000013) based on summary statistics from CARDIoGRAMplusC4D in two independent cohorts. The HRSD was 1.24 (95% CI: 1.15–1.34) in the Atherosclerosis Risk in Communities study (N = 4847) and 1.38 (95% CI: 1.21–1.58) in the Multi-Ethnic Study of Atherosclerosis (N = 2390). The addition of the PRS to the PCE model did not significantly increase Harrell's C in either cohort (P > 0.05).

On the other hand, the better the predictive performance of the TM, the more limited the improvement provided by the PRS. The TM defined in the current study did not include lipid information, although dyslipidemia is an important risk factor for CAD. This is because blood lipid testing is not included as a free item for the entire population in the National Basic Public Health Service Program in China. A model that excludes blood lipids may facilitate broader use in primary prevention.[3,28] The addition of blood lipids should further enhance the current TM. Therefore, adding PRS to a "lipid-enhanced TM" might lead to an even smaller improvement than that observed in the present study.

In the current study, we observed that the association between the PRS and hard CAD was much stronger than that between the PRS and soft CAD, suggesting that the PRS may have a greater value in predicting the risk of hard CAD. Hard CAD accounted for approximately 15% of all CAD events among the participants. Most of the remaining events involved angina pectoris or chronic ischemic heart disease (ICD-10, I25).[41] Improving the risk prediction for soft CAD may be of greater public health significance. However, the diagnosis of angina pectoris and chronic ischemic heart disease is not well established in clinical practice. This may partially explain the weaker association between the PRS and soft CAD. Further studies are warranted to evaluate the effect of the PRS in improving risk prediction for soft CAD.

The current study, which to date is the largest study based on a Chinese population, systematically evaluated the effects of previous PRSs on improving the traditional CAD risk prediction model. The loss to follow-up rate was <1% at an average follow-up of 11 years in the CKB cohort, and both hard and soft CAD outcomes were considered. Genotyping and imputation of genetic data in this study were conducted centrally using a standard quality control process. Genetic variants with high reliability also covered the entire genome.

This study had several limitations that merit consideration. First, ambiguous SNPs (i.e., A/T, C/G) and genetic variants that were not found in the CKB or had low imputation quality scores were removed during the standard quality control process of PRSs. This may have weakened the association between previous PRSs and CAD. Second, because information on blood lipids was not available for the current study population, we were unable to compare the effects of blood lipids and PRS on improving the traditional CAD risk prediction model.

In conclusion, based on nearly 100,000 Chinese participants with genome-wide genotypic data and prospective follow-up, we examined the association between previous PRSs and the risk of CAD and developed new PRSs in the Chinese population. The optimal PRS minimally changed risk discrimination and offered little to no improvement in risk stratification for CAD. Therefore, it may not currently be appropriate to promote genetic screening in the general Chinese population to improve CAD risk prediction. With the development of CAD GWAS in the Chinese population, PRSs with stronger associations with CAD may be developed to improve the predictive performance of existing models in the future.

Acknowledgments

The most important acknowledgment is to the participants in the study and the members of the survey teams in each of the 10 regional centers, as well as to the project development and management teams based at Beijing, Oxford, and the 10 regional centers. The members of the steering committee and collaborative group are listed in the Supplementary Materials, https://links.lww.com/CM9/B558. We thank the 1000 Genomes Project Consortium for access to genetic data. We thank the Polygenic Score Catalog for access to previous polygenic risk score (PRS) files. We thank Biobank of Japan (BBJ), CARDIoGRAMplusC4D consortium, and UK Biobank (UKB) for access to summary statistics data. We also thank Dr. Jiachen Li for technical support in the development of PRS.

Funding

This work was supported by grants from the National Natural Science Foundation of China (Nos. 82192904, 82192901, 82192900, 91846303). The CKB baseline survey and the first re-survey were supported by a grant from the Kadoorie Charitable Foundation in Hong Kong. The long-term follow-up is supported by grants from the UK Wellcome Trust (Nos. 212946/Z/18/Z, 202922/Z/16/Z, 104085/Z/14/Z, 088158/Z/09/Z), the National Key Research and Development Program of China (No.2016YFC0900500), National Natural Science Foundation of China (No.81390540), and Chinese Ministry of Science and Technology (No.2011BAI09B01).

Conflicts of interest

The funders played no role in the study design, data collection, data analysis, data interpretation, or manuscript writing. The corresponding author had full access to all data in the study and had the final responsibility for the decision to submit for publication.

References

1. GBD 2017 Causes of Death Collaborators. Global, regional, and national age-sex-specific mortality for 282 causes of death in 195 countries and territories, 1980-2017: A systematic analysis for the Global Burden of Disease Study 2017. Lancet 2018;392: 1736–1788. doi: 10.1016/S0140-6736(18)32203-7.
2. Xia X, Cai Y, Cui X, Wu R, Liu F, Huang K, et al. Temporal trend in mortality of cardiovascular diseases and its contribution to life expectancy increase in China, 2013 to 2018. Chin Med J 2022;135: 2066–2075. doi: 10.1097/CM9.0000000000002082.
3. WHO CVD Risk Chart Working Group. World Health Organization cardiovascular disease risk charts: Revised models to estimate risk in 21 global regions. Lancet Glob Health 2019;7: e1332–e1345. doi: 10.1016/S2214-109X(19)30318-3.
4. Andrus B, Lacaille D. 2013 ACC/AHA guideline on the assessment of cardiovascular risk. J Am Coll Cardiol 2014;63: 2886. doi: 10.1016/j.jacc.2014.02.606.
5. SCORE2 working group and ESC Cardiovascular risk collaboration. SCORE2 risk prediction algorithms: New models to estimate 10-year risk of cardiovascular disease in Europe. Eur Heart J 2021;42: 2439–2454. doi: 10.1093/eurheartj/ehab309.
6. Yang X, Li J, Hu D, Chen J, Li Y, Huang J, et al. Predicting the 10-year risks of atherosclerotic cardiovascular disease in Chinese population: The China-PAR project (prediction for ASCVD risk in China). Circulation 2016;134: 1430–1440. doi: 10.1161/CIRCULATIONAHA.116.022367.
7. Nikpay M, Goel A, Won HH, Hall LM, Willenborg C, Kanoni S, et al. A comprehensive 1,000 Genomes-based genome-wide association meta-analysis of coronary artery disease. Nat Genet 2015;47: 1121–1130. doi: 10.1038/ng.3396.
8. van der Harst P, Verweij N. Identification of 64 novel genetic loci provides an expanded view on the genetic architecture of coronary artery disease. Circ Res 2018;122: 433–443. doi: 10.1161/CIRCRESAHA.117.312086.
9. Abraham G, Havulinna AS, Bhalala OG, Byars SG, De Livera AM, Yetukuri L, et al. Genomic prediction of coronary heart disease. Eur Heart J 2016;37: 3267–3278. doi: 10.1093/eurheartj/ehw450.
10. Inouye M, Abraham G, Nelson CP, Wood AM, Sweeting MJ, Dudbridge F, et al. Genomic risk prediction of coronary artery disease in 480,000 adults: Implications for primary prevention. J Am Coll Cardiol 2018;72: 1883–1893. doi: 10.1016/j.jacc.2018.07.079.
11. Khera AV, Chaffin M, Aragam KG, Haas ME, Roselli C, Choi SH, et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat Genet 2018;50: 1219–1224. doi: 10.1038/s41588-018-0183-z.
12. Elliott J, Bodinier B, Bond TA, Chadeau-Hyam M, Evangelou E, Moons KGM, et al. Predictive accuracy of a polygenic risk score-enhanced prediction model vs a clinical risk score for coronary artery disease. JAMA 2020;323: 636–645. doi: 10.1001/jama.2019.22241.
13. Koyama S, Ito K, Terao C, Akiyama M, Horikoshi M, Momozawa Y, et al. Population-specific and trans-ancestry genome-wide analyses identify distinct and shared genetic risk loci for coronary artery disease. Nat Genet 2020;52: 1169–1177. doi: 10.1038/s41588-020-0705-3.
14. Mars N, Koskela JT, Ripatti P, Kiiskinen TTJ, Havulinna AS, Lindbohm JV, et al. Polygenic and clinical risk scores and their impact on age at onset and prediction of cardiometabolic diseases and common cancers. Nat Med 2020;26: 549–557. doi: 10.1038/s41591-020-0800-0.
15. Wang M, Menon R, Mishra S, Patel AP, Chaffin M, Tanneeru D, et al. Validation of a genome-wide polygenic score for coronary artery disease in South Asians. J Am Coll Cardiol 2020;76: 703–714. doi: 10.1016/j.jacc.2020.06.024.
16. Ye Y, Chen X, Han J, Jiang W, Natarajan P, Zhao H. Interactions between enhanced polygenic risk scores and lifestyle for cardiovascular disease, diabetes, and lipid levels. Circ Genom Precis Med 2021;14: e003128. doi: 10.1161/CIRCGEN.120.003128.
17. Tamlander M, Mars N, Pirinen M, Widén E, Ripatti S; FinnGen. Integration of questionnaire-based risk factors improves polygenic risk scores for human coronary heart disease and type 2 diabetes. Commun Biol 2022;5: 158. doi: 10.1038/s42003-021-02996-0.
18. Tikkanen E, Havulinna AS, Palotie A, Salomaa V, Ripatti S. Genetic risk prediction and a 2-stage risk screening strategy for coronary heart disease. Arterioscler Thromb Vasc Biol 2013;33: 2261–2266. doi: 10.1161/atvbaha.112.301120.
19. Iribarren C, Lu M, Jorgenson E, Martínez M, Lluis-Ganella C, Subirana I, et al. Clinical utility of multimarker genetic risk scores for prediction of incident coronary heart disease: A cohort study among over 51 000 individuals of European ancestry. Circ Cardiovasc Genet 2016;9: 531–540. doi: 10.1161/circgenetics.116.001522.
20. Hindy G, Aragam KG, Ng K, Chaffin M, Lotta LA, Baras A, et al. Genome-wide polygenic score, clinical risk factors, and long-term trajectories of coronary artery disease. Arterioscler Thromb Vasc Biol 2020;40: 2738–2746. doi: 10.1161/ATVBAHA.120.314856.
21. Bauer A, Zierer A, Gieger C, Büyüközkan M, Müller-Nurasyid M, Grallert H, et al. Comparison of genetic risk prediction models to improve prediction of coronary heart disease in two large cohorts of the MONICA/KORA study. Genet Epidemiol 2021;45: 633–650. doi: 10.1002/gepi.22389.
22. Riveros-Mckay F, Weale ME, Moore R, Selzam S, Krapohl E, Sivley RM, et al. Integrated polygenic tool substantially enhances coronary artery disease prediction. Circ Genom Precis Med 2021;14: e003304. doi: 10.1161/CIRCGEN.120.003304.
23. Sun L, Pennells L, Kaptoge S, Nelson CP, Ritchie SC, Abraham G, et al. Polygenic risk scores in cardiovascular risk prediction: A cohort study and modelling analyses. PLoS Med 2021;18: e1003498. doi: 10.1371/journal.pmed.1003498.
24. Mosley JD, Gupta DK, Tan J, Yao J, Wells QS, Shaffer CM, et al. Predictive accuracy of a polygenic risk score compared with a clinical risk score for incident coronary heart disease. JAMA 2020;323: 627–635. doi: 10.1001/jama.2019.21782.
25. Lu X, Liu Z, Cui Q, Liu F, Li J, Niu X, et al. A polygenic risk score improves risk stratification of coronary artery disease: A large-scale prospective Chinese cohort study. Eur Heart J 2022;43: 1702–1711. doi: 10.1093/eurheartj/ehac093.
26. Chen Z, Chen J, Collins R, Guo Y, Peto R, Wu F, et al. China Kadoorie Biobank of 0.5 million people: Survey methods, baseline characteristics and long-term follow-up. Int J Epidemiol 2011;40: 1652–1666. doi: 10.1093/ije/dyr120.
27. Vilhjalmsson BJ, Yang J, Finucane HK, Gusev A, Lindström S, Ripke S, et al. Modeling linkage disequilibrium increases accuracy of polygenic risk scores. Am J Hum Genet 2015;97: 576–592. doi: 10.1016/j.ajhg.2015.09.001.
28. Yang S, Han Y, Yu C, Guo Y, Pang Y, Sun D, et al. Development of a model to predict 10-year risk of ischemic and hemorrhagic stroke and ischemic heart disease using the China Kadoorie Biobank. Neurology 2022;98: e2307–e2317. doi: 10.1212/WNL.0000000000200139.
29. Zhu Z, Li J, Si J, Ma B, Shi H, Lv J, et al. A large-scale genome-wide association analysis of lung function in the Chinese population identifies novel loci and highlights shared genetic etiology with obesity. Eur Respir J 2021;58: 2100199. doi: 10.1183/13993003.00199-2021.
30. O'Connell J, Sharp K, Shrine N, Wain L, Hall I, Tobin M, et al. Haplotype estimation for biobank-scale data sets. Nat Genet 2016;48: 817–820. doi: 10.1038/ng.3583.
31. Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 2018;562: 203–209. doi: 10.1038/s41586-018-0579-z.
32. 1000 Genomes Project Consortium, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, et al. A global reference for human genetic variation. Nature 2015;526: 68–74. doi: 10.1038/nature15393.
33. Mitchell R, Hemani G, Dudding T, Corbin L, Harrison S, Paternoster L. UK biobank genetic data: MRC-IEU quality control, version 2. Bristol: University of Bristol, 2019.
34. Lambert SA, Gil L, Jupp S, Ritchie SC, Xu Y, Buniello A, et al. The polygenic score catalog as an open database for reproducibility and systematic evaluation. Nat Genet 2021;53: 420–425. doi: 10.1038/s41588-021-00783-5.
35. International HapMap 3 Consortium, Altshuler DM, Gibbs RA, Peltonen L, et al. Integrating common and rare genetic variation in diverse human populations. Nature 2010;467: 52–58. doi: 10.1038/nature09298.
36. Harrell FE Jr., Lee KL, Mark DB. Multivariable prognostic models: Issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med 1996;15: 361–387. doi: 10.1002/(SICI)1097-0258(19960229)15:4<361::AID-SIM168>3.0.CO;2-4.
37. D'Agostino RB, Nam BH. Evaluation of the performance of survival analysis models: Discrimination and calibration measures. In: Handbook of statistics. Amsterdam: Elsevier; 2003: 1–25.
38. Pencina MJ, D'Agostino RB Sr., D'Agostino RB Jr., Vasan RS. Evaluating the added predictive ability of a new marker: From area under the ROC curve to reclassification and beyond. Stat Med 2008;27: 157–172. doi: 10.1002/sim.2929.
39. Dikilitas O, Schaid DJ, Kosel ML, Carroll RJ, Chute CG, Denny JA, et al. Predictive utility of polygenic risk scores for coronary heart disease in three major racial and ethnic groups. Am J Hum Genet 2020;106: 707–716. doi: 10.1016/j.ajhg.2020.04.002.
40. Neumann JT, Riaz M, Bakshi A, Polekhina G, Thao LTP, Nelson MR, et al. Prognostic value of a polygenic risk score for coronary heart disease in individuals aged 70 years and older. Circ Genom Precis Med 2022;15: e003429. doi: 10.1161/CIRCGEN.121.003429.
41. Lv J, Yu C, Guo Y, Bian Z, Yang L, Chen Y, et al. Adherence to healthy lifestyle and cardiovascular diseases in the Chinese population. J Am Coll Cardiol 2017;69: 1116–1125. doi: 10.1016/j.jacc.2016.11.076.
Keywords:

Coronary artery disease; Polygenic risk score; Risk prediction model; Chinese population

Copyright © 2023 The Chinese Medical Association, produced by Wolters Kluwer, Inc. under the CC-BY-NC-ND license.