Most kidney diseases are complex, with both genetic and environmental factors contributing to risk. Heritability estimates from family studies range from 30% to 75%.1–4 Genome-wide association studies (GWASs) have grown rapidly in the last decade and identified numerous loci for kidney function, stimulating increasing interest in polygenic risk scores (PRSs) as risk factors for kidney diseases.5–11 However, previous PRSs provided limited risk stratification for adverse kidney outcomes.5,10 Potential reasons include small sample sizes of early GWASs, which might lead to imprecise estimation of the associations between individual variants and disease risk; limiting the PRS to genetic variants that reached genome-wide significance (P<5×10−8); and a lack of deeply phenotyped data to identify cases.6–11 With new data and methodologies, there is an opportunity to mitigate these limitations.
New methodologies for large-scale proteomic measurement, using aptamer technologies, also provide an opportunity to assess the effect of genetic susceptibility to low kidney function, measured using the PRS, on the plasma proteome.12,13 The plasma proteome consists of thousands of circulating proteins involved in numerous physiologic and pathologic processes, including transporting and signaling, metabolism, vascular function, and defense mechanisms.14–16 Therefore, the plasma proteome is a reservoir of important potential biomarkers capturing current physiology and pathophysiology. Although previous studies have demonstrated the heritability of plasma protein levels,17 research into the plasma proteomic signals of genetic susceptibility for disease has been limited.18–21 In kidney disease, reduced kidney function is correlated with elevations in many proteins, but the balance of genetically predicted risk versus secondary influences on proteomic signals is unknown.
Using large studies and new algorithms, we explored the potential of using a PRS for kidney function as a prediction tool by developing a PRS using summary statistics from a multiethnic GWAS and investigating the strength of associations of the PRS with incident kidney diseases over 30 years of follow-up in a deeply phenotyped, community-based cohort. We also examined the association of the PRS with 4877 plasma proteins measured at two time points, approximately 20 years apart, and evaluated whether the proteomic associations with genetically predicted risk were mediated by concurrent eGFR.
Methods
Study Cohort
The Atherosclerosis Risk in Communities (ARIC) study is an ongoing, longitudinal cohort of 15,792 middle-aged Black and White participants (55% female) recruited from four communities in the United States (Forsyth County, North Carolina; Jackson, Mississippi; suburbs of Minneapolis, Minnesota; and Washington County, Maryland) from 1987 to 1989 (visit 1). Follow-up examinations were conducted approximately every 3 years: 1990–1992 (visit 2); 1993–1995 (visit 3); and 1996–1998 (visit 4); and, more recently, in 2011–2013 (visit 5), in 2016–2017 (visit 6), and in 2018–2019 (visit 7).22 Each study visit consisted of a clinical examination, blood and urine specimen collection, and administration of extensive questionnaires. Proteomic levels were measured at visit 3 and visit 5. Our primary analysis was restricted to 8866 unrelated participants self-reported as being of European ancestry (EA) (Supplemental Figure 1 ), because most genomic studies with available summary statistics have been conducted among EA individuals. In the proteomic analysis of our study, 7213 participants with valid proteomic measurements at either visit remained. In secondary analysis, we evaluated the PRS in 2871 unrelated ARIC participants self-reported as Black. Study protocols were approved by the institutional review boards, and all study participants provided informed consent (including agreement for industry studies for SomaLogic-sponsored proteomic quantification).
Genotyping
Genotyping was performed on the Affymetrix 6.0 DNA microarray (Affymetrix, Santa Clara, CA) and analyzed with the Birdseed variant-calling algorithm. Haplotype phasing was performed using ShapeIt (version 1.r532).23 Genotypes were imputed on the Michigan Server to the TOPMed reference panel.24,25 Quality control was carried out before imputation: single nucleotide polymorphisms (SNPs) were excluded if they had a call rate <95%, a Hardy–Weinberg equilibrium P value <0.0001, or a minor allele frequency <1%.26 Individuals with cryptic relatedness, defined as an identity-by-state distance >0.8, generated using PLINK, were also excluded.27
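The three pre-imputation SNP filters can be illustrated with a minimal sketch. It assumes per-SNP summary metrics are already available; the `SnpQc` container, the function name, and the toy values are hypothetical and are not part of the study pipeline.

```python
from dataclasses import dataclass


@dataclass
class SnpQc:
    """Per-SNP summary metrics (hypothetical container for illustration)."""
    snp_id: str
    call_rate: float  # fraction of samples with a non-missing genotype
    hwe_p: float      # Hardy-Weinberg equilibrium test P value
    maf: float        # minor allele frequency


def passes_qc(snp, min_call_rate=0.95, min_hwe_p=1e-4, min_maf=0.01):
    """Return True when the SNP clears all three pre-imputation filters."""
    return (snp.call_rate >= min_call_rate
            and snp.hwe_p >= min_hwe_p
            and snp.maf >= min_maf)


# Toy records, one failing each filter in turn.
snps = [
    SnpQc("rs_a", 0.99, 0.50, 0.20),   # kept
    SnpQc("rs_b", 0.90, 0.50, 0.20),   # dropped: call rate < 95%
    SnpQc("rs_c", 0.99, 1e-6, 0.20),   # dropped: HWE P < 0.0001
    SnpQc("rs_d", 0.99, 0.50, 0.005),  # dropped: MAF < 1%
]
kept = [s.snp_id for s in snps if passes_qc(s)]
print(kept)
```

A SNP is retained only if it passes every filter, which mirrors the "excluded if any criterion is met" wording above.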
Assessing Kidney Function
Kidney function, measured as eGFR, was assessed from serum creatinine (measured at all visits except visit 7) and serum cystatin C (measured at all visits except visits 1 and 7) using the 2009 Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) creatinine equation (eGFRcr) and the 2012 CKD-EPI cystatin C equation (eGFRcys).28,29 Serum creatinine level was measured by the modified kinetic Jaffé method, standardized to the National Institute of Standards and Technology standard, and calibrated to an isotope dilution mass spectrometry–traceable reference method.30–32 Serum cystatin C level was measured by the turbidimetric method, and standardized and calibrated to the International Federation of Clinical Chemistry and Laboratory Medicine reference.33 In the PRS development, we used eGFRcr as the kidney function measurement because this has been the main trait with the largest sample sizes in GWAS meta-analysis.5
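For readers unfamiliar with the creatinine equation, the following is a minimal sketch of the 2009 CKD-EPI creatinine formula. The coefficients are reproduced from the published equation as commonly stated and should be checked against reference 28 before any use; the function and argument names are illustrative.

```python
def egfr_ckd_epi_2009(scr_mg_dl, age_years, female, black):
    """Sketch of the 2009 CKD-EPI creatinine equation (ml/min per 1.73 m2).

    scr_mg_dl is standardized serum creatinine in mg/dl. Coefficients are
    quoted from the published equation and should be verified independently.
    """
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = scr_mg_dl / kappa
    egfr = (141.0
            * min(ratio, 1.0) ** alpha
            * max(ratio, 1.0) ** -1.209
            * 0.993 ** age_years)
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159
    return egfr


# Example: 50-year-old man, not Black, serum creatinine 0.9 mg/dl.
example_egfr = egfr_ckd_epi_2009(0.9, 50, female=False, black=False)
print(round(example_egfr, 1))
```

Higher serum creatinine or older age lowers the estimate, which is the direction relied on throughout the analyses.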
PRSs for Kidney Function
PRSs aggregate genome-wide genetic variation into a single score that reflects an individual’s inherited disease risk. They are most commonly calculated by summing across SNPs associated with a given trait, weighted by their effect sizes from GWAS results of that trait.
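The weighted sum described above can be sketched in a few lines. The weights and genotypes below are toy values chosen for illustration, and the function names are hypothetical; the study itself derived weights with LDpred and P+T rather than using raw GWAS effects directly.

```python
import statistics


def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0 to 2) for one individual."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def standardize(scores):
    """Rescale scores to zero mean and unit (population) variance."""
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    return [(s - mean) / sd for s in scores]


# Toy per-SNP weights (effect sizes); values are made up for illustration.
weights = [0.02, -0.01, 0.005]
# Rows are individuals, columns are effect-allele dosages at each SNP.
genotypes = [[0, 1, 2], [2, 0, 1], [1, 1, 1]]

raw_scores = [polygenic_score(g, weights) for g in genotypes]
z_scores = standardize(raw_scores)
print(raw_scores, z_scores)
```

Standardizing the score makes effect estimates interpretable per 1 SD of PRS, the scale used in the association analyses.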
For the PRS construction, we first conducted a GWAS for log(eGFRcr) using PLINK among 90% of unrelated multiethnic participants in the UK Biobank (n=406,431; application identifier 17712), using an additive genetic model adjusted for age, sex, diabetes, and the first 40 principal components (PCs) of genetic ancestry.27 Details of the UK Biobank cohort have been described elsewhere.34 To ensure balance across different ethnic groups, the selection was conducted within each ethnic group that was available in the UK Biobank (Supplemental Figure 2). We then conducted a fixed-effects inverse variance–weighted meta-analysis, using METAL, on the summary statistics from our UK Biobank GWAS and a meta-analysis by the CKD Genetics (CKDGen) Consortium of the GWAS of eGFRcr, including up to 765,348 multiethnic individuals (https://ckdgen.imbi.uni-freiburg.de/).5,35 Because the CKDGen Consortium included the ARIC study (n=11,908), we adjusted the effect sizes of SNPs by reconducting the meta-analysis with the ARIC data removed. This led to a total of 1,159,871 participants, including 949,116 White, 166,997 East Asian, 21,338 South Asian, 17,459 Black, and 4961 Hispanic participants. We also used a set of 608 unrelated multiethnic individuals from phase 3 of the 1000 Genomes Project as a linkage disequilibrium (LD) reference panel for the PRS construction.36 The proportions of the different ethnic groups in the LD reference panel were consistent with those in the meta-analyzed UK Biobank and CKDGen multi-ancestry GWAS summary statistics. The details regarding this UK Biobank GWAS and meta-analysis can be found in the Supplemental Appendix and Supplemental Figure 3. Approximately 1.5 million SNPs in the union of the Illumina Multi-Ethnic Genotyping Array (MEGA) Beadchip and HapMap3 were kept for score construction.
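To make the pooling step concrete, the sketch below shows the standard fixed-effects inverse variance–weighted estimator for a single SNP. It is a generic illustration of the formula, not METAL's implementation, and the input effect sizes are toy values.

```python
import math


def ivw_fixed_effects(betas, standard_errors):
    """Pool per-study effects for one SNP with inverse-variance weights.

    Returns the pooled effect, its standard error, and the Z statistic.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se, pooled_beta / pooled_se


# Two toy studies with equal precision: the pooled effect is their average.
pooled_beta, pooled_se, z = ivw_fixed_effects([0.10, 0.20], [0.1, 0.1])
print(pooled_beta, pooled_se, z)
```

Because each weight is the reciprocal of the squared standard error, larger studies dominate the pooled estimate, and removing a contributing cohort (as done here for ARIC) changes both the estimate and its precision.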
We used MEGA Beadchip in addition to HapMap3 for variant filtering because of its expanded coverage of SNPs for multiethnic populations, as recommended by Vilhjálmsson et al.37–39 We computed PRS in three ways: LDpred; pruning and thresholding (P+T); and a weighted combination of top SNPs that reached genome-wide significance in our meta-analysis combining UK Biobank and CKDGen, a special case of P+T.
The primary PRS was calculated using the LDpred algorithm.38 For this method, we created seven candidate LDpred PRSs corresponding to seven different fractions of causal variants. This Bayesian approach uses GWAS summary statistics to compute the posterior mean effect sizes for the genetic variants by assuming a prior of the joint effect sizes and incorporating the LD structure of the reference population. Two parameters of LDpred need to be set by the user. One is the LD radius, which is the number of variants being adjusted for at each side of a variant. We set it to 400 (which corresponds to 1.2×10^6/3000), based on Vilhjálmsson et al.38 The other parameter is the fraction of causal variants, ρ, which can be selected via parameter tuning on a separate dataset. Our tested ρ values were 1, 0.3, 0.1, 0.03, 0.01, 0.003, and 0.001, as suggested in Vilhjálmsson et al.38
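For orientation, the prior that this tuning acts on can be written out. The display below is a sketch of the point-normal prior as described by Vilhjálmsson et al.38; the symbols M (number of variants) and h_g^2 (SNP heritability) are notation introduced here for illustration rather than taken from the text above.

$$
\beta_j \sim
\begin{cases}
\mathcal{N}\!\left(0,\ \dfrac{h_g^2}{M\rho}\right) & \text{with probability } \rho, \\
0 & \text{with probability } 1-\rho .
\end{cases}
$$

Under this prior, ρ=1 corresponds to an infinitesimal model in which every variant has a nonzero effect, whereas smaller values of ρ concentrate the same heritability on fewer variants. The LD radius follows the rule of thumb M/3000, which gives 400 for M=1.2×10^6.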
We also implemented a second approach, named P+T. P+T scores were constructed by applying two filtering steps based on LD and P value.40 The variants were first pruned to only keep variants that have absolute pairwise correlation weaker than a threshold, r 2 , within a specific genetic distance. The remaining variants are further filtered by removing the ones that have a P value larger than a predefined threshold of significance. We created 30 candidate P+T PRSs on the basis of five r 2 levels (0.1, 0.2, 0.4, 0.6, and 0.8) and six P value thresholds (5×10−8 , 5×10−6 , 5×10−4 , 0.05, 0.5, and 1). We then selected the optimal r 2 and P value threshold via parameter tuning on a separate dataset. Finally, we created a “top SNPs PRS” in a similar manner, using the most commonly used r 2 level, 0.1, and P value threshold, 5×10−8 (genome-wide significance).
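The two filtering steps and the 5×6 tuning grid can be sketched as follows. This is a simplified illustration under stated assumptions: variants are visited in order of increasing P value during pruning (as in LD clumping), the genetic distance window is ignored, and the LD lookup is a toy dictionary. None of the names correspond to a real tool's interface.

```python
from itertools import product

R2_LEVELS = (0.1, 0.2, 0.4, 0.6, 0.8)
P_THRESHOLDS = (5e-8, 5e-6, 5e-4, 0.05, 0.5, 1.0)


def prune_and_threshold(p_values, r2, r2_max, p_max):
    """Return SNPs kept after LD pruning followed by P value thresholding.

    p_values maps SNP id to GWAS P value; r2 maps frozenset({a, b}) to the
    squared correlation between SNPs a and b (missing pairs count as 0).
    """
    kept = []
    # Assumption: visit the most significant variants first.
    for snp in sorted(p_values, key=p_values.get):
        if all(r2.get(frozenset((snp, other)), 0.0) < r2_max for other in kept):
            kept.append(snp)
    return [snp for snp in kept if p_values[snp] <= p_max]


# Toy inputs: rs1 and rs2 are in strong LD; rs3 is independent but weak.
p_values = {"rs1": 1e-10, "rs2": 1e-9, "rs3": 0.2}
r2 = {frozenset(("rs1", "rs2")): 0.9}

grid = list(product(R2_LEVELS, P_THRESHOLDS))  # 30 candidate settings
top_snps = prune_and_threshold(p_values, r2, r2_max=0.1, p_max=5e-8)
lenient = prune_and_threshold(p_values, r2, r2_max=0.1, p_max=1.0)
print(len(grid), top_snps, lenient)
```

The "top SNPs PRS" corresponds to the single grid point with r2 of 0.1 and P of 5×10−8, which is why it is described as a special case of P+T.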
For the PRS tuning, the seven candidate LDpred PRSs and the 30 candidate P+T PRSs were calculated in a tuning dataset of the remaining 10% of unrelated participants in the UK Biobank (n=45,158), independent from the training set. The best PRS of each approach was determined on the basis of the proportion of the variance (R2) of eGFRcr that can be explained by the PRS. Specifically, we fitted a linear regression model with eGFRcr as the outcome; each candidate PRS as the exposure; and age at baseline, sex, and the first PCs of genetic ancestry as the covariates. The best LDpred PRS and P+T PRS and the top SNPs PRS were carried forward into subsequent analyses in a validation dataset independent from both the training and tuning sets.
PRS validation was conducted in the 8866 unrelated EA participants in ARIC. The R2 for eGFRcr based on the best LDpred PRS, best P+T PRS, and top SNPs PRS was calculated using the same approach with adjustment for the same set of covariates as in the tuning step. We compared the three PRSs with respect to the number of SNPs included, phenotypic variance explained, and correlations with each other. In secondary analysis, we also evaluated the constructed PRS in 2871 unrelated Black participants in ARIC.
Assessing Incident Kidney Diseases
Four incident kidney diseases were included in our study as outcomes: CKD, ESKD, kidney failure, and AKI. CKD was defined on the basis of the following criteria: eGFR <60 ml/min per 1.73 m2 plus ≥30% eGFR decline during a follow-up visit compared with baseline, ESKD cases identified through the direct linkage to the United States Renal Data System (USRDS) registry, or International Classification of Diseases Ninth Revision (ICD-9) Clinical Modification (CM)/ICD-10-CM codes (Supplemental Table 1A ) representing CKD in any position of hospitalization or death records.41 ESKD was defined as having kidney transplant or dialysis in the USRDS registry. Kidney failure was defined by hospitalization codes or ESKD (Supplemental Table 1B ). AKI was defined by hospitalization or death codes (ICD-9-CM code 584.X or ICD-10-CM code N17.X).42
Protein Measurements
Plasma proteins were measured in ARIC participants at visit 3 and visit 5 using the SOMAscan version 4 assay by SomaLogic. This platform uses slow off-rate modified aptamers to bind to targeted proteins and then uses a DNA microarray to quantify them. SOMAscan version 4 includes 4931 unique human proteins or protein complexes, with 95% of the proteins tagged by one modified aptamer and a total of 5211 modified aptamers. Protein measurements were reported as relative fluorescence units.12 There were no missing values in the proteomic data. Details of the quality control of the proteins were described elsewhere.20 Previous studies of SOMAscan version 3, consisting of 4001 aptamers, have shown high precision of this assay in quantifying proteins, with a median coefficient of variation of 4%–8%.43–45 In this study, all proteins were log2 transformed.
Assessing Covariates
Age, sex, center, and education level were assessed at baseline, and current smoking status was assessed at all visits using an interviewer-administered questionnaire.22 Body mass index (BMI) was calculated as weight (in kilograms) divided by the square of height (in meters), both of which were measured at all visits. Clinical factors included history of hypertension, diabetes, and coronary heart disease (CHD). Hypertension was defined at all visits as systolic BP ≥140 mm Hg, diastolic BP ≥90 mm Hg, or use of antihypertensive medication in the past 2 weeks. Diabetes was defined at all visits as fasting blood glucose ≥126 mg/dl, nonfasting glucose ≥200 mg/dl, self-reported doctor-diagnosed diabetes, or use of diabetes medication in the past 2 weeks. CHD was defined at all visits as prior myocardial infarction observed on electrocardiogram, self-reported doctor-diagnosed heart attack, or self-reported cardiovascular surgery or coronary angioplasty. Albumin-creatinine ratio (ACR) was calculated as urinary albumin divided by urinary creatinine, with albumin measured using an immunoturbidimetric method and creatinine measured using a modified kinetic Jaffé method.
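The covariate definitions above translate directly into simple rules. The sketch below encodes the BMI, hypertension, and diabetes definitions as stated; function and argument names are illustrative and do not reflect ARIC variable names.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by the square of height (m)."""
    return weight_kg / height_m ** 2


def has_hypertension(systolic_bp, diastolic_bp, on_antihypertensives):
    """Systolic BP >=140 mm Hg, diastolic BP >=90 mm Hg, or medication use."""
    return systolic_bp >= 140 or diastolic_bp >= 90 or on_antihypertensives


def has_diabetes(glucose_mg_dl, fasting, self_reported, on_medication):
    """Fasting glucose >=126 mg/dl, nonfasting glucose >=200 mg/dl,
    self-reported doctor diagnosis, or diabetes medication use."""
    cutoff = 126 if fasting else 200
    return glucose_mg_dl >= cutoff or self_reported or on_medication


example_bmi = bmi(80.0, 1.80)
print(round(example_bmi, 1),
      has_hypertension(138, 92, False),
      has_diabetes(150, fasting=False, self_reported=False, on_medication=False))
```

Note that the glucose cutoff depends on fasting status, so the same measured value can classify differently between fasting and nonfasting samples.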
Statistical Analyses
Baseline characteristics of the primary study population were examined. The R2 for eGFRcr at all visits except visit 7 explained by the LDpred PRS, P+T PRS, and top SNPs PRS was calculated as $R^2 = \mathrm{Var}(\mathrm{PRS}) \times \beta_{\mathrm{PRS}}^2 / \mathrm{Var}(\mathrm{eGFRcr})$, where $\beta_{\mathrm{PRS}}$ is the coefficient of the PRS in a linear regression model of eGFRcr, adjusting for age at the corresponding visit, sex, center, and first ten genetic PCs. We also calculated the R2 at all visits with data for eGFRcys (visit 2 to visit 6) and ACR (visit 4 to visit 6). For comparison, we calculated the R2 for eGFRcr, eGFRcys, and ACR by the PRS using the same methods among the 2871 Black participants as a secondary analysis.
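The variance-explained formula can be illustrated with a minimal sketch. As a simplifying assumption, the regression here contains only the PRS (no covariates), in which case the quantity equals the squared Pearson correlation; the study's models additionally adjusted for age, sex, center, and genetic PCs. The data are toy values.

```python
import statistics


def variance_explained(prs, outcome):
    """Var(PRS) * beta^2 / Var(outcome), with beta from simple regression.

    Assumption: no covariates, so beta is the ordinary least squares slope
    of the outcome on the PRS alone.
    """
    mean_x = statistics.fmean(prs)
    mean_y = statistics.fmean(outcome)
    sxx = sum((x - mean_x) ** 2 for x in prs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(prs, outcome))
    beta = sxy / sxx
    return statistics.variance(prs) * beta ** 2 / statistics.variance(outcome)


# Toy data: standardized PRS values and eGFRcr in ml/min per 1.73 m2.
prs_values = [-1.0, 0.0, 1.0, 2.0]
egfr_values = [95.0, 100.0, 103.0, 110.0]
r_squared = variance_explained(prs_values, egfr_values)
print(round(r_squared, 4))
```

With covariates in the model, the same expression uses the adjusted PRS coefficient, so the result is the share of outcome variance attributable to the PRS term specifically.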
We evaluated the association between PRS and incident kidney diseases. Using Cox proportional hazard models, we estimated hazard ratios (HRs) and associated 95% CIs of PRSs (per 1 SD lower PRS) for incident kidney disease outcomes. We considered time at risk to start at visit 1 (1987–1989) and continue until the event of interest, death, loss to follow-up, or the end of follow-up (December 31, 2018). We evaluated three models: model 1, which included age, sex, center, and first ten genetic PCs; model 2, which additionally included education, baseline BMI, baseline smoking status, and baseline history of hypertension, diabetes, and CHD; and model 3, which included all variables in model 2 and baseline eGFRcr. We also estimated the HRs of participants with the highest 10% of PRS versus the rest for incident kidney disease outcomes, with adjustment for the same sets of covariates. In sensitivity analysis, we evaluated additional adjustments for ACR. Because ACR was first measured at visit 4, the time to event for this sensitivity analysis started at visit 4 and the baseline covariates were also assessed at this time. We also examined the associations between PRS and all-cause mortality and comorbidities, including incident hypertension, diabetes, CHD, and heart failure. Time to incident kidney diseases was assessed among quartiles of PRS using proportional hazard models and displayed using Kaplan–Meier survival curves.
Performance of measurable risk factors with or without PRS for predicting CKD risk was evaluated using area under the receiver‐operator characteristic curves (AUCs) over 30-year follow-up.25 Seven sets of predictors were compared: LDpred PRS alone, modifiable covariates (education, baseline BMI, baseline smoking status, baseline history of hypertension) with or without the LDpred PRS, all covariates (modifiable covariates plus age, sex, center, and baseline history of diabetes, and CHD) with or without the LDpred PRS, and all covariates plus eGFR with or without the LDpred PRS.
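As a simplified illustration of the discrimination metric, the sketch below computes a rank-based AUC for a binary outcome. This is an assumption-laden stand-in: the study evaluated AUCs over follow-up time, whereas this version treats CKD as a fixed yes/no label. Because a lower PRS for kidney function indicates higher risk, the negated PRS is used as the risk score. All values are toy data.

```python
def auc(scores, labels):
    """Probability that a random case scores higher than a random control.

    Ties contribute 0.5. labels must contain 1 for cases and 0 for controls.
    """
    cases = [s for s, label in zip(scores, labels) if label == 1]
    controls = [s for s, label in zip(scores, labels) if label == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0
               for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Toy standardized PRS values and CKD status (1 = incident CKD).
prs_values = [-1.2, -0.3, 0.4, 1.1, -0.8, 0.9]
ckd_status = [1, 1, 0, 0, 0, 1]
risk_scores = [-x for x in prs_values]  # lower PRS means higher risk
prs_auc = auc(risk_scores, ckd_status)
print(round(prs_auc, 3))
```

An AUC of 0.5 corresponds to no discrimination, so values such as 0.58 for the PRS alone indicate modest separation of cases from noncases.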
To evaluate the association between PRS for kidney function and proteomic measurements, we conducted linear regression of LDpred PRS on 4877 proteins measured at visit 3 and visit 5, adjusting for age, sex, center, and first ten genetic PCs. These estimates reflect the difference in each log2-transformed protein per normalized SD unit higher PRS for kidney function. Given that multiple statistical tests were performed, we used a Bonferroni-adjusted P value threshold of 0.05/4877≈1.02×10−5 to indicate evidence for significant associations. We identified proteins significantly associated with LDpred PRS at both visit 3 and visit 5. We then examined their correlations and associations with eGFRcr and eGFRcys at each visit through a Pearson correlation matrix and linear regression of PRS on eGFRcr or eGFRcys with adjustment for age, sex, center, and first ten genetic PCs. We then made scatterplots of Pearson correlations between those proteins and eGFR at each visit against their Pearson correlations with PRS. Causal mediation effects of eGFRcr on the association between PRS and proteins were evaluated using the “mediation” R package, with adjustment for age, sex, center, and first ten genetic PCs. Results were obtained on the basis of 100 simulations using a quasi-Bayesian Monte Carlo method with normal approximation.46 Analyses used R version 3.6.2 software (R Foundation), two-tailed P values, and a statistical significance level of P<0.05, except for the identification of proteomic signals, which was P<1.02×10−5.
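The multiple-testing threshold and the "significant at both visits" rule can be checked with a short sketch. The protein names and P values below are placeholders invented for illustration and do not correspond to study results.

```python
ALPHA = 0.05
N_PROTEINS = 4877

# Bonferroni-adjusted threshold: 0.05 / 4877, about 1.02e-05.
bonferroni_threshold = ALPHA / N_PROTEINS


def significant_at_both(p_visit3, p_visit5, threshold):
    """Names of proteins whose P value is below the threshold at both visits."""
    return sorted(name for name in p_visit3
                  if name in p_visit5
                  and p_visit3[name] < threshold
                  and p_visit5[name] < threshold)


# Placeholder P values for three hypothetical proteins.
p_visit3 = {"protein_a": 1e-12, "protein_b": 1e-7, "protein_c": 0.03}
p_visit5 = {"protein_a": 1e-9, "protein_b": 2e-4, "protein_c": 1e-8}

consistent = significant_at_both(p_visit3, p_visit5, bonferroni_threshold)
print(f"{bonferroni_threshold:.3e}", consistent)
```

Requiring significance at both visits is stricter than either visit alone, which is why the consistent set is smaller than the per-visit counts.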
Results
Characteristics of Study Cohort
Our primary study population included 8866 participants (mean age, 54.3 years; 53% female). Around 40% of participants had a college education or above (Figure 1). At baseline, 25% of participants were smokers; mean BMI was 27.0 kg/m2; and the percentage of participants with prevalent hypertension, diabetes, and CHD was 26.7%, 8.6%, and 5.1%, respectively. Over 30 years of follow-up, the numbers of participants with incident CKD, ESKD, kidney failure, and AKI were 2959, 137, 470, and 1723, respectively (Table 1).
Figure 1. PRSs constructed and tuned in large-scale datasets were examined for their associations with incident kidney diseases and the circulating proteome in ARIC. Three polygenic risk scores (PRSs) for kidney function, measured as estimated glomerular filtration rate based on creatinine level (eGFRcr), were constructed from a meta-analysis of a UK Biobank multi-ancestry GWAS for eGFRcr (90% of the cohort) and a multi-ancestry meta-analysis of GWAS for eGFRcr conducted by the CKDGen Consortium, excluding the Atherosclerosis Risk in Communities (ARIC) study. The scores were built using the LDpred algorithm, pruning and thresholding (P+T), and a simple weighted combination of SNPs that reached genome-wide significance. Parameters were tuned using data from the remaining 10% of UK Biobank multi-ancestry participants, and the scores were then tested for their associations with the proteome and incident kidney diseases in ARIC.
Table 1. Characteristics of the study population in the Atherosclerosis Risk in Communities study (n=8866)

| Characteristic | Value |
|---|---|
| Age, yr | 54.3 (5.7) |
| Female | 4708 (53.0) |
| Advanced education | 3513 (39.6) |
| Current smokers^a | 2192 (24.7) |
| BMI, kg/m2^a | 27.0 (4.8) |
| History of hypertension^a | 2363 (26.7) |
| History of diabetes^a | 764 (8.6) |
| History of CHD^a | 445 (5.1) |
| eGFRcr, ml/min per 1.73 m2^a | 99.6 (12.5) |
| Events during follow-up | |
| Incident CKD | 2959 (33.6) |
| Incident ESKD | 137 (1.5) |
| Incident kidney failure | 470 (5.3) |
| Incident AKI | 1723 (19.4) |

Values are shown as mean (SD) for continuous variables and n (%) for categoric variables.
a Indicates baseline values.
Characteristics of the PRSs
LDpred PRS, P+T PRS, and top SNPs PRS were all standardized to zero mean and unit variance and were approximately normally distributed in the population. Details of the PRS derivation and validation are provided in Figure 1. Additional technical details of the three PRSs are summarized in Supplemental Table 2 and described in detail elsewhere.47 LDpred PRS was highly correlated with P+T PRS, with a Pearson correlation coefficient (r) of 0.85, and moderately correlated with the top SNPs PRS (r=0.64; Supplemental Figure 2). The adjusted eGFRcr variance explained by the LDpred PRS for kidney function was relatively consistent across the first four visits and slightly decreased at the last two visits, ranging from 5.5% to 9.4%. The P+T PRS and top SNPs PRS explained less of the adjusted variance in eGFRcr (P+T PRS, 5.3%–7.8%; top SNPs PRS, 4.3%–6.2%). The adjusted variance explained in eGFRcys was lower, with ranges for LDpred PRS, P+T PRS, and top SNPs PRS of 2.2%–3.9%, 2.0%–3.1%, and 1.6%–2.4%, respectively. Variance explained for eGFR on the basis of both creatinine and cystatin C was intermediate, and that for ACR was minimal (Supplemental Table 3). Variance explained (pseudo-R2) for CKD was 2.34% for LDpred PRS, 2.00% for P+T PRS, and 1.52% for top SNPs PRS. Applying the PRS trained mainly and tuned completely on EA participants to Black participants led to substantially poorer score performance, with the eGFRcr variances explained by the LDpred PRS, P+T PRS, and top SNPs PRS ranging from 1.4% to 3.6%, 0.7% to 2.3%, and 0.5% to 1.8%, respectively (Supplemental Table 4).
Associations between PRSs for Kidney Function and Incident Kidney Diseases
Categorizing the PRSs into quartiles showed an incremental association with risk in Kaplan–Meier survival curves (Figure 2 ). In continuous analysis, we observed that the LDpred PRS for kidney function was strongly associated with all four incident kidney diseases: HRs per 1-SD unit lower in LDpred PRS, indicating worse kidney function, were 1.33 (95% CI, 1.28 to 1.37), 1.24 (95% CI, 1.05 to 1.46), 1.18 (95% CI, 1.08 to 1.29), and 1.06 (95% CI, 1.01 to 1.11) for incident CKD, ESKD, kidney failure, and AKI, respectively, after adjusting for age at baseline, sex, center, and first ten genetic PCs. Using P+T PRS and top SNPs PRS, HRs for all incident kidney diseases were of a smaller magnitude than the HR obtained using LDpred PRS, and were only statistically significant for CKD and kidney failure (Table 2 ). Participants in the top 10% of PRS had higher risk of all outcomes compared with the remaining 90% (Table 3 ).
Figure 2. Quartiles of LDpred PRS showed incremental associations with incident kidney disease risks (n=8866). The LDpred PRS for kidney function was categorized into quartiles, and quartiles were examined for their unadjusted associations with incident kidney diseases over 30 years of follow-up.
Table 2. Risk for incident kidney diseases per 1-SD lower PRS for kidney function (n=8866)

| Kidney Disease | Model^a | LDpred PRS^b: Hazard Ratio (95% CI) | LDpred PRS^b: P Value | P+T PRS^b: Hazard Ratio (95% CI) | P+T PRS^b: P Value | Top SNPs PRS^b: Hazard Ratio (95% CI) | Top SNPs PRS^b: P Value |
|---|---|---|---|---|---|---|---|
| Incident CKD | 1 | 1.33 (1.28 to 1.37) | 1.3×10−52 | 1.30 (1.25 to 1.35) | 7.6×10−46 | 1.24 (1.20 to 1.29) | 7.7×10−32 |
| Incident CKD | 2 | 1.32 (1.27 to 1.37) | 3.2×10−48 | 1.30 (1.25 to 1.35) | 1.2×10−44 | 1.25 (1.20 to 1.30) | 3.8×10−32 |
| Incident CKD | 3 | 1.19 (1.15 to 1.24) | 4.0×10−19 | 1.19 (1.14 to 1.23) | 2.6×10−18 | 1.15 (1.11 to 1.19) | 8.1×10−13 |
| Incident ESKD | 1 | 1.24 (1.05 to 1.46) | 1.4×10−2 | 1.23 (1.04 to 1.45) | 1.6×10−2 | 1.07 (0.91 to 1.27) | 4.1×10−1 |
| Incident ESKD | 2 | 1.24 (1.04 to 1.47) | 1.6×10−2 | 1.28 (1.07 to 1.53) | 6.1×10−3 | 1.12 (0.94 to 1.33) | 2.2×10−1 |
| Incident ESKD | 3 | 0.97 (0.81 to 1.17) | 7.4×10−1 | 1.04 (0.87 to 1.25) | 6.6×10−1 | 0.94 (0.79 to 1.12) | 4.6×10−1 |
| Incident kidney failure | 1 | 1.18 (1.08 to 1.29) | 4.0×10−4 | 1.18 (1.08 to 1.29) | 3.9×10−4 | 1.11 (1.02 to 1.22) | 2.1×10−2 |
| Incident kidney failure | 2 | 1.17 (1.07 to 1.29) | 6.8×10−4 | 1.20 (1.10 to 1.32) | 1.0×10−4 | 1.13 (1.03 to 1.24) | 7.6×10−3 |
| Incident kidney failure | 3 | 1.03 (0.93 to 1.13) | 5.8×10−1 | 1.07 (0.97 to 1.18) | 1.6×10−1 | 1.02 (0.93 to 1.12) | 6.4×10−1 |
| Incident AKI | 1 | 1.06 (1.01 to 1.11) | 2.2×10−2 | 1.04 (0.99 to 1.09) | 9.8×10−2 | 1.00 (0.95 to 1.04) | 8.5×10−1 |
| Incident AKI | 2 | 1.05 (1.00 to 1.10) | 6.9×10−2 | 1.04 (0.99 to 1.10) | 8.4×10−2 | 1.00 (0.95 to 1.05) | 9.7×10−1 |
| Incident AKI | 3 | 1.01 (0.96 to 1.06) | 7.1×10−1 | 1.01 (0.96 to 1.06) | 6.9×10−1 | 0.97 (0.92 to 1.02) | 2.0×10−1 |

a Model 1 was adjusted for age at baseline, sex, center, and first ten genetic PCs; model 2 was adjusted for all covariates in model 1 and education, baseline BMI, baseline smoking status, baseline history of hypertension, diabetes, and CHD; and model 3 was adjusted for all covariates in model 2 and baseline eGFR.
b LDpred PRS was constructed using LDpred algorithm, a Bayesian approach which uses GWAS summary statistics to compute the posterior mean effect sizes for the genetic variants by assuming a prior of the joint effect sizes and incorporating the LD structure of the reference population. P+T PRS was constructed using P+T, which first prunes variants to only keep those that have an absolute pairwise correlation weaker than a threshold within a certain genetic distance, and then filters variants that have a P value larger than a predefined threshold of significance. Top SNPs PRS was constructed using the most commonly used level of absolute pairwise correlation for pruning and genome-wide significance level for thresholding.
Table 3. Risk for incident kidney diseases comparing individuals with top 10% PRS versus the remaining 90% (n=8866)

| Kidney Disease | Model^a | LDpred PRS^b: Hazard Ratio (95% CI) | LDpred PRS^b: P Value | P+T PRS^b: Hazard Ratio (95% CI) | P+T PRS^b: P Value | Top SNPs PRS^b: Hazard Ratio (95% CI) | Top SNPs PRS^b: P Value |
|---|---|---|---|---|---|---|---|
| Incident CKD | 1 | 1.69 (1.52 to 1.88) | 1.5×10−22 | 1.78 (1.61 to 1.98) | 9.5×10−28 | 1.48 (1.33 to 1.65) | 1.1×10−12 |
| Incident CKD | 2 | 1.66 (1.49 to 1.85) | 1.0×10−20 | 1.79 (1.61 to 1.99) | 1.5×10−27 | 1.48 (1.32 to 1.65) | 3.9×10−12 |
| Incident CKD | 3 | 1.34 (1.20 to 1.49) | 2.2×10−7 | 1.49 (1.34 to 1.66) | 3.4×10−13 | 1.21 (1.09 to 1.36) | 7.2×10−4 |
| Incident ESKD | 1 | 1.64 (1.04 to 2.60) | 3.4×10−2 | 1.42 (0.87 to 2.29) | 1.6×10−1 | 1.39 (0.86 to 2.27) | 1.8×10−1 |
| Incident ESKD | 2 | 1.74 (1.10 to 2.76) | 1.7×10−2 | 1.57 (0.97 to 2.55) | 6.7×10−2 | 1.59 (0.97 to 2.59) | 6.4×10−2 |
| Incident ESKD | 3 | 1.11 (0.69 to 1.78) | 6.7×10−1 | 1.10 (0.68 to 1.80) | 7.0×10−1 | 1.09 (0.67 to 1.80) | 7.2×10−1 |
| Incident kidney failure | 1 | 1.46 (1.12 to 1.89) | 4.6×10−3 | 1.43 (1.10 to 1.85) | 7.3×10−3 | 1.34 (1.02 to 1.75) | 3.3×10−2 |
| Incident kidney failure | 2 | 1.42 (1.09 to 1.85) | 8.7×10−3 | 1.49 (1.14 to 1.93) | 3.0×10−3 | 1.43 (1.09 to 1.88) | 9.0×10−3 |
| Incident kidney failure | 3 | 1.08 (0.82 to 1.41) | 6.0×10−1 | 1.20 (0.92 to 1.56) | 1.8×10−1 | 1.14 (0.87 to 1.51) | 3.4×10−1 |
| Incident AKI | 1 | 1.07 (0.91 to 1.25) | 4.1×10−1 | 1.03 (0.88 to 1.21) | 7.1×10−1 | 0.91 (0.77 to 1.07) | 2.7×10−1 |
| Incident AKI | 2 | 1.02 (0.87 to 1.19) | 8.2×10−1 | 1.04 (0.89 to 1.22) | 6.2×10−1 | 0.92 (0.78 to 1.09) | 3.4×10−1 |
| Incident AKI | 3 | 0.94 (0.80 to 1.11) | 4.8×10−1 | 0.98 (0.83 to 1.15) | 7.8×10−1 | 0.86 (0.73 to 1.02) | 9.2×10−2 |

a Model 1 was adjusted for age at baseline, sex, center, and first ten genetic PCs; model 2 was adjusted for all covariates in model 1 and education, baseline BMI, baseline smoking status, baseline history of hypertension, diabetes, and CHD; and model 3 was adjusted for all covariates in model 2 and baseline eGFR.
b LDpred PRS was constructed using LDpred algorithm, a Bayesian approach which uses GWAS summary statistics to compute the posterior mean effect sizes for the genetic variants by assuming a prior of the joint effect sizes and incorporating the LD structure of the reference population. P+T PRS was constructed using P+T, which first prunes variants to only keep those that have an absolute pairwise correlation weaker than a threshold within a certain genetic distance, and then filters variants that have a P value larger than a predefined threshold of significance. Top SNPs PRS was constructed using the most commonly used level of absolute pairwise correlation for pruning and genome-wide significance level for thresholding.
After adjustment for lifestyle and clinical risk factors (education; baseline BMI; baseline smoking status; and hypertension, diabetes, and CHD history at baseline) in Cox models, we observed limited changes in the risk estimates of the PRSs for all incident kidney diseases (Table 2; Supplemental Table 5). However, risk estimates were substantially attenuated after additionally adjusting for eGFRcr (Table 2). Additional adjustment for ACR made little difference (Supplemental Table 6). For incident CKD, the diagnostic performance of all seven models with different sets of covariates is shown in Supplemental Figure 3. LDpred PRS alone resulted in an average AUC of 0.58 over 30 years of follow-up, and inclusion of the PRS improved the performance of each set of nongenetic risk factors for CKD. With only modifiable covariates, the average AUC was estimated to be 0.60, and adding PRS to modifiable covariates improved the estimated AUC to 0.63. After nonmodifiable covariates were added, the average AUC was 0.66 without PRS and 0.68 with PRS. With the further addition of eGFR, the difference in AUC with or without PRS was much less pronounced, with the average AUC being 0.685 without PRS and 0.690 with PRS. During follow-up, the AUC appeared to be steady for the PRS (Supplemental Figures 4 and 5). In comparison, AUCs for nongenetic variables all showed a marked decrease with time, as shown in Supplemental Figure 5. The PRS was not associated with all-cause mortality or comorbidities (data not shown).
Plasma Proteome Screening Using PRS
Using linear regression models adjusted for age, sex, center, and first ten genetic PCs, we observed that 210 proteins were associated with LDpred PRS for kidney function at the P=1.02×10−5 level among 7213 participants with valid proteomic measurements at visit 3, and 155 proteins among 3666 participants at visit 5. Among those proteins, 132 were significant at both visits, which were approximately 20 years apart. The strongest associations were with cystatin C, collagen α-1(XV) chain, and desmocollin-2. For the 132 proteins consistently associated with LDpred PRS for kidney function, all but five of the associations were negative, indicating higher protein levels at lower kidney function. Testican-2, klotho, carbonic anhydrase-related protein 10, hypoxanthine-guanine phosphoribosyltransferase, and angiostatin were the only five proteins with significant positive correlations with kidney function. The correlations with eGFRcr and eGFRcys measured at the corresponding visits were much stronger than those with the LDpred PRS, especially at visit 5 (median negative correlations at visit 5 of −0.0963, −0.4415, and −0.4583 for proteins with LDpred PRS, eGFRcr, and eGFRcys, respectively, with corresponding values at visit 3 of −0.0720, −0.2509, and −0.2661; Figure 3, Supplemental Table 7). This was also true for the significant positive correlations; the median positive correlation coefficients between proteins and PRS, eGFRcr, and eGFRcys at visit 3 were 0.0587, 0.1509, and 0.1723, respectively, and those at visit 5 were 0.0825, 0.2734, and 0.3182, respectively (Supplemental Table 7). After additionally adjusting for eGFRcr, 18 out of 132 proteins remained nominally significantly associated at visit 3. Among them, testican-2 was significant at a Bonferroni-corrected P threshold (P=2.1×10−5).
After accounting for the mediation effects of eGFRcr, at visit 3, the average direct causal effects of the PRS on only 20 proteins were nominally significant, and eight, including testican-2, were significant after Bonferroni correction for multiple testing (Supplemental Table 8).
Figure 3. LDpred PRS broadly influences the circulating proteome, primarily mediated by eGFR. Both protein measures and eGFR are visit specific: (A) for visit 3, (B) for visit 5. A total of 132 proteins were identified as significantly (Bonferroni threshold P<1.02×10−5) associated with LDpred PRS at both visit 3 and visit 5 through linear regression of each of 4877 proteins on LDpred PRS, adjusting for age at the corresponding visits, sex, center, and the first ten genetic PCs. Visit 3 (n=7213) was conducted during 1993–1995, when the mean age of the study population was 60.4 years; visit 5 (n=3666) was conducted during 2011–2013, when the mean age of the study population was 75.9 years. The dashed line in gray is the identity line. CA10, carbonic anhydrase-related protein 10; COL15A1, collagen α-1(XV) chain; CST3, cystatin C; GPC3, glypican-3; GSS, glutathione synthetase; HPRT1, hypoxanthine-guanine phosphoribosyltransferase; KL, klotho; PLG, angiostatin; RNASE1, ribonuclease pancreatic; SPOCK2, testican-2.
Discussion
In this community-based, deeply phenotyped cohort of 8866 middle-aged adults, we leveraged a large multiethnic GWAS and the UK Biobank to construct a range of PRSs for kidney function. A genome-wide score that included 1.5 million SNPs (LDpred PRS) showed strong and significant associations with kidney function and a spectrum of incident kidney diseases, suggesting a potential prediction role of the PRS, whereas narrower risk scores (P+T PRS and top SNPs PRS) showed weaker associations. The LDpred PRS was also associated with a range of plasma protein levels in midlife and older age, and these associations were primarily mediated by eGFR itself, providing additional evidence of biologic plausibility.
During the last decade, GWASs identified thousands of genetic loci associated with hundreds of phenotypes.48 However, for most traits, those SNPs explain only a small portion of narrow-sense heritability, i.e., the estimated proportion of phenotypic variance due to additive genetic effects.49 One of the proposed reasons for this is the existence of common causal variants of exceedingly small effect size, which require extremely large sample sizes to detect via GWAS.50 51 52 53 – 54 Using large studies for discovery and algorithms that incorporate variants across the genome, our results showed a significant improvement in the performance of PRS compared with previous efforts for score development (7.1% versus 1.7%–2.8% of variance explained).6 7 8 9 10 – 11 The phenotypic variance explained by the PRS was larger than that of previous studies and in line with GWAS meta-analysis estimates.5
Our results demonstrate that, for a spectrum of kidney diseases, PRS can now identify individuals with higher genetic risk over 30 years of follow-up, suggesting a role for prediction and risk stratification in future research and potentially clinical medicine. This was true not only for diseases with established high heritability but also for entities, such as AKI, whose genetic basis is less pronounced.1 2 3 – 4 , 55 However, as reflected in our plot comparing AUCs using different sets of predictors, the incremental contribution of the PRS became smaller as more measurable clinical risk factors were included for prediction. In particular, much of the genetic risk was mediated through eGFR. Indeed, when all risk factors and eGFR in midlife were included as adjustment variables, the eGFR PRS no longer added to risk prediction.
With more advanced algorithms and larger sample sizes, we believe the predictive power of the PRS will be further improved. An advantage of PRS for kidney function is that it can be assessed early, well before the emergence of lifestyle and clinical risk factors or midlife declines in eGFR. We hypothesize that the PRS may be most useful early, before eGFR decline has manifested. People with higher risk of kidney disease based on PRS may benefit from early intervention.
Using genetic information on kidney function aggregated as a PRS, we established the link between eGFR and the plasma proteome in a hypothesis-free manner and identified proteins that were consistently associated with genetically predicted kidney function at both midlife and older age. This association reflected a broad signal across many proteins associated with a broadly defined PRS based on 1.5 million SNPs, rather than the product of one or a few genetic loci. Furthermore, our causal mediation analysis indicated that the relationship between the genetic PRS for low GFR and plasma proteins was mediated by eGFR, supporting a causal framework whereby genetic susceptibility leads to GFR decline and many consequent plasma proteome alterations. Our findings, combined with future functional validation, can contribute to prioritizing proteins that are functionally involved in the filtration process and/or can be used as kidney function markers that are relatively robust to environmental influences.
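The mediation reasoning in this paragraph can be made concrete with a simplified product-of-coefficients decomposition. The sketch below is an assumption-laden illustration and not the method used in the study, which cites the R mediation package (reference 46) and adjusts for covariates. Here the data are simulated, covariates are omitted, the function and variable names are hypothetical, and the simulated effect sizes (0.3 and −0.5) are arbitrary.

```python
import random

# Illustrative sketch only: simulated data, no covariates, arbitrary effect sizes.


def _cross(u, v):
    """Sum of centered cross-products of two equal-length sequences."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))


def decompose_effect(x, m, y):
    """Split the total effect of x on y into direct and indirect (via m) parts.

    Uses ordinary least squares with an intercept:
      total    = slope of y ~ x
      a        = slope of m ~ x
      direct   = coefficient of x in y ~ x + m
      b        = coefficient of m in y ~ x + m
      indirect = a * b
    """
    s_xx, s_mm, s_xm = _cross(x, x), _cross(m, m), _cross(x, m)
    s_xy, s_my = _cross(x, y), _cross(m, y)
    det = s_xx * s_mm - s_xm ** 2
    total = s_xy / s_xx
    a = s_xm / s_xx
    direct = (s_xy * s_mm - s_my * s_xm) / det
    b = (s_my * s_xx - s_xy * s_xm) / det
    return {"total": total, "direct": direct, "indirect": a * b}


# Simulate a fully mediated pathway: PRS -> eGFR -> protein, with no direct path.
random.seed(0)
n = 2000
prs = [random.gauss(0, 1) for _ in range(n)]
egfr = [0.3 * g + random.gauss(0, 1) for g in prs]
protein = [-0.5 * e + random.gauss(0, 1) for e in egfr]

effects = decompose_effect(prs, egfr, protein)
print({name: round(value, 3) for name, value in effects.items()})
```

In this simulated setting the direct effect is close to zero and nearly all of the total effect is indirect, which is the pattern the paragraph describes for most proteins. For linear models fitted by least squares, the total effect equals the sum of the direct and indirect effects exactly.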
Although the majority of proteins were inversely associated with eGFR, suggesting accumulation of proteins or upregulation of ongoing pathologic processes, such as inflammation, specific proteins were positively associated with eGFR, including testican-2, klotho, carbonic anhydrase-related protein 10, hypoxanthine-guanine phosphoribosyltransferase, and angiostatin. Among the negative associations, collagen α-1(XV) chain showed strong and consistent associations with PRS and both eGFRcr and eGFRcys (with a magnitude similar to that of cystatin C, the best marker for kidney function independent of demographics in current practice), suggesting its potential role as a marker for genetically predicted kidney function. It forms the α-chain of type XV collagen, a structural component widely expressed across tissues, including kidney, with a predominant localization in the basement membrane.56 , 57 Experimental studies conducted on fetal kidneys demonstrated the existence of both its mRNAs and proteins at glomeruli and collecting ducts, and elevated levels in samples collected from patients with glomerular diseases.58 , 59 Testican-2, one of seven proteins that were lower at lower kidney function and also remained significant after accounting for the mediation effect of eGFRcr, forms a structural component of the extracellular matrix by covalently binding to glycosaminoglycans.60 It is expressed in multiple tissues, including the kidney, and genetic variants in its gene, SPOCK2, have been strongly associated with bronchopulmonary dysplasia. This gene is not at or near any eGFR loci reported by previous GWAS results.5 Angiostatin, another protein that was lower with lower kidney function and had significant average direct effects after adjusting for the mediation effect of eGFRcr, is a potent angiogenesis inhibitor generated through the proteolysis of plasminogen. Evidence also suggests its anti-inflammatory roles through hindering the recruitment of leukocytes61 and the movement of neutrophils and macrophages.62 , 63 Alterations in angiogenesis and inflammation have important roles in kidney disease pathophysiology.64 , 65 Previous animal experiments demonstrated that angiostatin overexpression slowed the progression of renal disease after chronic kidney injury, and its decreased expression accelerated the pathogenesis of diabetic nephropathy.66 , 67 Observational studies suggested elevated urinary angiostatin as a potential biomarker of disease severity and progression for IgA nephropathy and lupus nephritis.68 , 69
Our study has limitations. The PRSs developed in our study were constructed using summary statistics from multiethnic meta-analyses that incorporated primarily EA participants (82%). We tuned the PRS in EA participants only, mirroring most genetic studies done so far. Because LD patterns, minor allele frequencies, effect sizes of common variants, and phenotypic features vary by ancestry, our PRS constructed on the basis of GWAS results and LD structure of EA individuals will have compromised power for disease prediction for individuals of other ancestries.70 , 71 When directly applying our PRS trained and tuned mainly on EA participants to Black participants, the phenotypic variance explained was approximately three-fold lower (more than six-fold lower for P+T and top SNP PRSs). It is, therefore, necessary and important for future efforts to include more non-European participants of different ancestries in genomic studies and develop novel methods that appropriately tailor genetic risk scores to each ethnic group. The PRSs presented in this study were for eGFRcr, which means they may include genetic influences of creatinine metabolism and kidney function. However, we included eGFRcys as an outcome to assess the extent to which associations were robust to the kidney function marker used.
In conclusion, our results show PRSs for kidney function are associated with future risk of incident kidney diseases, including CKD progression, ESKD, kidney failure, and AKI, over 30 years of follow-up in a community-based cohort. This association was independent of most risk factors, including albuminuria, and was largely mediated through kidney function itself. A large number of plasma protein levels were elevated among individuals with high genetic risk for low kidney function, whereas seven proteins had lower levels at lower kidney function. The plasma protein associations were much stronger with concurrent kidney function, which mediated the association with the genetic PRS, providing face validity for the PRS.
Disclosures
D.E. Arking reports serving on the Association for the Eradication of Heart Attack Scientific Advisory Board. C.M. Ballantyne reports receiving research funding from Abbott Diagnostic, Akcea, Amgen, Esperion, Ionis, Novartis, Regeneron, and Roche Diagnostic; having consultancy agreements with Abbott Diagnostics, Althera, Amarin, Amgen, Arrowhead, AstraZeneca, Corvidia, Denka Seiken, Esperion, Genentech, Gilead, Matinas BioPharma Inc., New Amsterdam, Novartis, Novo Nordisk, Pfizer, Regeneron, Roche Diagnostic, and Sanofi-Synthélabo; and serving as a scientific advisor for, or member of, Amarin, Amgen, Arrowhead, AstraZeneca, Corvidia, Esperion, and Matinas BioPharma. J. Coresh reports having ownership interest in Healthy.io; having consultancy agreements with Healthy.io, Kaleido, and Ultragenyx; serving as a scientific advisor for, or member of, Healthy.io and National Kidney Foundation (NKF); and receiving research funding from National Institutes of Health and NKF (which receives industry support). M.E. Grams reports serving as a scientific advisor for, or member of, American Journal of Kidney Diseases, CJASN, JASN (editorial fellowship committee), Kidney Disease Improving Global Outcomes (KDIGO; executive committee), NKF (scientific advisory board), and USRDS (scientific advisory board); receiving honoraria from the American Society of Nephrology (Young Investigator Award) and academic institutions (for giving grand rounds); receiving travel support from DCI (to speak at the annual meeting) and KDIGO (for participation in scientific meetings and the executive committee); and having other interests in/relationships with NKF, which receives funding from Abbvie, Relypsa, and Thrasos. R.C. Hoogeveen reports having consultancy agreements with and receiving research funding from Denka Seiken. A. Köttgen reports serving as a scientific advisor for, or member of, American Journal of Kidney Diseases, American Kidney Fund, JASN, Kidney International, and Nature Reviews Nephrology; and receiving honoraria from Sanofi Genzyme. All remaining authors have nothing to disclose.
Funding
The work of A. Köttgen is supported by the Deutsche Forschungsgemeinschaft (German Research Foundation) project identifier 431984000–SFB 1453. C.M. Ballantyne is supported by the National Heart, Lung, and Blood Institute (NHLBI) grant R01 HL134320. J. Chen is supported by National Institute of Diabetes and Digestive and Kidney Diseases grants U01 DK085689 and U01 DK106981. J. Jin and N. Chatterjee are supported by National Human Genome Research Institute grant R01 HG010480. The ARIC study has been funded, in whole or in part, with federal funds from the National Institutes of Health, Department of Health and Human Services via NHLBI, National Institute of Neurological Disorders and Stroke, National Institute on Drug Abuse, and National Institute on Deafness and Other Communication Disorders contracts HHSN268201700001I, HHSN268201700002I, HHSN268201700003I, HHSN268201700004I, and HHSN268201700005I; and grants R01 HL134320, U01 2U01HL096812, 2U01HL096814, 2U01HL096899, 2U01HL096902, and 2U01HL096917.
Acknowledgments
The authors thank the staff and participants of the ARIC study (RRID:SCR_021769) for their important contributions and the CKDGen Consortium (https://ckdgen.imbi.uni-freiburg.de) for sharing GWAS summary statistics of individual studies that contributed to PMID 31152163.
SomaLogic Inc. conducted the SOMAscan assays in exchange for use of ARIC data. Some of the data reported here have been supplied by the USRDS.
The funding sources had no role in (1) the design or conduct of the study; (2) the collection, management, analysis, and interpretation of the data; or (3) preparation, review, or approval of the manuscript.
Z. Yu, J. Jin, N. Chatterjee, M.E. Grams, J. Chen and J. Coresh designed the study, wrote the research plan, and interpreted the results; Z. Yu wrote the first draft of the manuscript with critical comments and revision from J. Jin, A. Tin, A. Köttgen, B. Yu, J. Chen, A. Surapaneni, L. Zhou, C.M. Ballantyne, R.C. Hoogeveen, D.E. Arking, N. Chatterjee, M.E. Grams, and J. Coresh.
Data Sharing Statement
The full summary statistics from the multiethnic meta-analysis of CKDGen Consortium GWAS and UK Biobank GWAS for eGFR generated during this study have been deposited in the National Human Genome Research Institute–European Bioinformatics Institute GWAS Catalog under accession code GCST90026654. Data contributed by the CKDGen Consortium are available at https://ckdgen.imbi.uni-freiburg.de/. The PRSs developed during this study have been deposited in the Polygenic Score Catalog and can be accessed at www.pgscatalog.org/publication/PGP000229/.
Supplemental Material
This article contains the following supplemental material online at http://jasn.asnjournals.org/lookup/suppl/doi:10.1681/ASN.2020111599/-/DCSupplemental.
Supplemental Appendix. Details of the meta-analysis of the CKDGen Consortium GWAS and the UK Biobank GWAS.
Supplemental Table 1. ICD-9/10 codes used for identifying chronic kidney disease and kidney failure.
Supplemental Table 2. Technical details of the PRS.
Supplemental Table 3. Adjusted proportion of the variance for eGFR and ACR explained by PRS.
Supplemental Table 4. Adjusted proportion of the variance for eGFR and ACR explained by PRS among participants with African ancestry.
Supplemental Table 5. Risk for incident kidney diseases according to conventional risk factors of kidney diseases.
Supplemental Table 6. Risk for incident kidney diseases according to polygenic risk scores of kidney function among participants who attended visit 4.
Supplemental Table 7. Associations of LDpred PRS for kidney function and eGFR with proteins significantly associated with LDpred PRS at both visit 3 and visit 5.
Supplemental Table 8. Causal mediation effects of estimated glomerular filtration rate measured based on creatinine on proteins significantly associated with LDpred polygenic risk score at both visit 3 and visit 5.
Supplemental Figure 1. Flow chart of subjects included in the study.
Supplemental Figure 2. Scatter plots of PRS with locally weighted smoothing (LOESS) regression line.
Supplemental Figure 3. 10-fold cross-validation area under the curve (AUC) performances for predicting incident CKD for each of the seven models.
Supplemental Figure 4. Population substructure shown by the first two principal components of genetic ancestry among participants included in the UK Biobank GWAS.
Supplemental Figure 5. Quantile–quantile plots for the meta-analysis of UK Biobank and CKDGen GWAS.
References
1. Satko SG, Freedman BI: The familial clustering of renal disease and related phenotypes. Med Clin North Am 89: 447–456, 2005
2. O’Seaghdha CM, Fox CS: Genome-wide association studies of chronic kidney disease: What have we learned? Nat Rev Nephrol 8: 89–99, 2011
3. Wu HH, Kuo CF, Li IJ, Weng CH, Lee CC, Tu KH, et al.: Family aggregation and heritability of ESRD in Taiwan: A population-based study. Am J Kidney Dis 70: 619–626, 2017
4. Akrawi DS, PirouziFard M, Fjellstedt E, Sundquist J, Sundquist K, Zöller B: Heritability of end-stage renal disease: A Swedish Adoption Study. Nephron 138: 157–165, 2018
5. Wuttke M, Li Y, Li M, Sieber KB, Feitosa MF, Gorski M, et al.; Lifelines Cohort Study; V. A. Million Veteran Program: A catalog of genetic loci associated with kidney function from analyses of a million individuals. Nat Genet 51: 957–972, 2019
6. Pani A, Bragg-Gresham J, Masala M, Piras D, Atzeni A, Pilia MG, et al.: Prevalence of CKD and its relationship to eGFR-related genetic loci and clinical risk factors in the SardiNIA study cohort. J Am Soc Nephrol 25: 1533–1544, 2014
7. Ma J, Yang Q, Hwang SJ, Fox CS, Chu AY: Genetic risk score and risk of stage 3 chronic kidney disease. BMC Nephrol 18: 32, 2017
8. Thio CHL, van der Most PJ, Nolte IM, van der Harst P, Bültmann U, Gansevoort RT, et al.: Evaluation of a genetic risk score based on creatinine-estimated glomerular filtration rate and its association with kidney outcomes. Nephrol Dial Transplant 33: 1757–1764, 2018
9. Yun S, Han M, Kim HJ, Kim H, Kang E, Kim S, et al.: Genetic risk score raises the risk of incidence of chronic kidney disease in Korean general population-based cohort. Clin Exp Nephrol 23: 995–1003, 2019
10. Hellwege JN, Velez Edwards DR, Giri A, Qiu C, Park J, Torstenson ES, et al.: Mapping eGFR loci to the renal transcriptome and phenome in the VA Million Veteran Program. Nat Commun 10: 3842, 2019
11. Fujii R, Hishida A, Nakatochi M, Furusyo N, Murata M, Tanaka K, et al.: Association of genetic risk score and chronic kidney disease in a Japanese population. Nephrology (Carlton) 24: 670–673, 2019
12. Rohloff JC, Gelinas AD, Jarvis TC, Ochsner UA, Schneider DJ, Gold L, et al.: Nucleic acid ligands with protein-like side chains: Modified aptamers and their use as diagnostic and therapeutic agents. Mol Ther Nucleic Acids 3: e201, 2014
13. Tin A, Yu B, Ma J, Masushita K, Daya N, Hoogeveen RC, et al.: Reproducibility and variability of protein analytes measured using a multiplexed modified aptamer assay. J Appl Lab Med 4: 30–39, 2019
14. Stastna M, Van Eyk JE: Secreted proteins as a fundamental source for biomarker discovery. Proteomics 12: 722–735, 2012
15. Uhlén M, Fagerberg L, Hallström BM, Lindskog C, Oksvold P, Mardinoglu A, et al.: Proteomics. Tissue-based map of the human proteome. Science 347: 1260419, 2015
16. Schwenk JM, Omenn GS, Sun Z, Campbell DS, Baker MS, Overall CM, et al.: The Human Plasma Proteome Draft of 2017: Building on the Human Plasma PeptideAtlas from mass spectrometry and complementary assays. J Proteome Res 16: 4299–4310, 2017
17. Liu Y, Buil A, Collins BC, Gillet LC, Blum LC, Cheng LY, et al.: Quantitative variability of 342 plasma proteins in a human twin population. Mol Syst Biol 11: 786, 2015
18. Suhre K, Arnold M, Bhagwat AM, Cotton RJ, Engelke R, Raffler J, et al.: Connecting genetic risk to disease end points through the human blood plasma proteome. Nat Commun 8: 14357, 2017
19. de Vries PS, Yu B, Feofanova EV, Metcalf GA, Brown MR, Zeighami AL, et al.: Whole-genome sequencing study of serum peptide levels: The Atherosclerosis Risk in Communities study. Hum Mol Genet 26: 3442–3450, 2017
20. Sun BB, Maranville JC, Peters JE, Stacey D, Staley JR, Blackshaw J, et al.: Genomic atlas of the human plasma proteome. Nature 558: 73–79, 2018
21. Emilsson V, Ilkov M, Lamb JR, Finkel N, Gudmundsson EF, Pitts R, et al.: Co-regulatory networks of human serum proteins link genetics to disease. Science 361: 769–773, 2018
22. The ARIC Investigators: The Atherosclerosis Risk in Communities (ARIC) Study: Design and objectives. Am J Epidemiol 129: 687–702, 1989
23. Delaneau O, Marchini J, Zagury JF: A linear complexity phasing method for thousands of genomes. Nat Methods 9: 179–181, 2011
24. Howie BN, Donnelly P, Marchini J: A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet 5: e1000529, 2009
25. Kowalski MH, Qian H, Hou Z, Rosen JD, Tapia AL, Shan Y, et al.; NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium; TOPMed Hematology & Hemostasis Working Group: Use of >100,000 NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium whole genome sequences improves imputation quality and detection of rare variant associations in admixed African and Hispanic/Latino populations. PLoS Genet 15: e1008500, 2019
26. Ellinor PT, Lunetta KL, Albert CM, Glazer NL, Ritchie MD, Smith AV, et al.: Meta-analysis identifies six new susceptibility loci for atrial fibrillation. Nat Genet 44: 670–675, 2012
27. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al.: PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81: 559–575, 2007
28. Levey AS, Stevens LA, Schmid CH, Zhang YL, Castro AF 3rd, Feldman HI, et al.; CKD-EPI (Chronic Kidney Disease Epidemiology Collaboration): A new equation to estimate glomerular filtration rate. Ann Intern Med 150: 604–612, 2009
29. Inker LA, Eckfeldt J, Levey AS, Leiendecker-Foster C, Rynders G, Manzi J, et al.: Expressing the CKD-EPI (Chronic Kidney Disease Epidemiology Collaboration) cystatin C equations for estimating GFR with standardized serum cystatin C values. Am J Kidney Dis 58: 682–684, 2011
30. Eckfeldt JH, Chambless LE, Shen YL: Short-term, within-person variability in clinical chemistry test results. Experience from the Atherosclerosis Risk in Communities Study. Arch Pathol Lab Med 118: 496–500, 1994
31. Coresh J, Astor BC, McQuillan G, Kusek J, Greene T, Van Lente F, et al.: Calibration and random variation of the serum creatinine assay as critical elements of using equations to estimate glomerular filtration rate. Am J Kidney Dis 39: 920–929, 2002
32. Parrinello CM, Grams ME, Couper D, Ballantyne CM, Hoogeveen RC, Eckfeldt JH, et al.: Recalibration of blood analytes over 25 years in the atherosclerosis risk in communities study: Impact of recalibration on chronic kidney disease prevalence and incidence. Clin Chem 61: 938–947, 2015
33. Grubb A, Blirup-Jensen S, Lindström V, Schmidt C, Althaus H, Zegers I; IFCC Working Group on Standardisation of Cystatin C (WG-SCC): First certified reference material for cystatin C in human serum ERM-DA471/IFCC. Clin Chem Lab Med 48: 1619–1621, 2010
34. Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, et al.: The UK Biobank resource with deep phenotyping and genomic data. Nature 562: 203–209, 2018
35. Köttgen A, Pattaro C: The CKDGen Consortium: Ten years of insights into the genetic basis of kidney function. Kidney Int 97: 236–242, 2020
36. Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, et al.; 1000 Genomes Project Consortium: A global reference for human genetic variation. Nature 526: 68–74, 2015
37. Altshuler DM, Gibbs RA, Peltonen L, Altshuler DM, Gibbs RA, Peltonen L, et al.; International HapMap 3 Consortium: Integrating common and rare genetic variation in diverse human populations. Nature 467: 52–58, 2010
38. Vilhjálmsson BJ, Yang J, Finucane HK, Gusev A, Lindström S, Ripke S, et al.; Schizophrenia Working Group of the Psychiatric Genomics Consortium, Discovery, Biology, and Risk of Inherited Variants in Breast Cancer (DRIVE) study: Modeling linkage disequilibrium increases accuracy of polygenic risk scores. Am J Hum Genet 97: 576–592, 2015
39. Illumina: Infinium® Expanded Multi-Ethnic Genotyping Array (MEGAEX): A consortium-built array with increased power for understanding complex disease in diverse human populations. Available at: https://www.illumina.com/content/dam/illumina-marketing/documents/products/datasheets/mega-ex-data-sheet-370-2015-004.pdf. Accessed October 21, 2021
40. Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ: Second-generation PLINK: Rising to the challenge of larger and richer datasets. Gigascience 4: 7, 2015
41. Grams ME, Rebholz CM, McMahon B, Whelton S, Ballew SH, Selvin E, et al.: Identification of incident CKD stage 3 in research studies. Am J Kidney Dis 64: 214–221, 2014
42. Grams ME, Waikar SS, MacMahon B, Whelton S, Ballew SH, Coresh J: Performance and limitations of administrative data in the identification of AKI. Clin J Am Soc Nephrol 9: 682–689, 2014
43. Ganz P, Heidecker B, Hveem K, Jonasson C, Kato S, Segal MR, et al.: Development and validation of a protein-based risk score for cardiovascular outcomes among patients with stable coronary heart disease. JAMA 315: 2532–2541, 2016
44. Ngo D, Sinha S, Shen D, Kuhn EW, Keyes MJ, Shi X, et al.: Aptamer-based proteomic profiling reveals novel candidate biomarkers and pathways in cardiovascular disease. Circulation 134: 270–285, 2016
45. Candia J, Cheung F, Kotliarov Y, Fantoni G, Sellers B, Griesman T, et al.: Assessment of variability in the SOMAscan assay. Sci Rep 7: 14248, 2017
46. Tingley D, Yamamoto T, Hirose K, Keele L, Imai K: Mediation: R package for causal mediation analysis. J Stat Soft 59: 1–38, 2014
47. Aragam KG, Natarajan P: Polygenic scores to assess atherosclerotic cardiovascular disease risk: Clinical perspectives and basic implications. Circ Res 126: 1159–1177, 2020
48. Hindorff LA, Sethupathy P, Junkins HA, Ramos EM, Mehta JP, Collins FS, et al.: Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci U S A 106: 9362–9367, 2009
49. Visscher PM, Brown MA, McCarthy MI, Yang J: Five years of GWAS discovery. Am J Hum Genet 90: 7–24, 2012
50. Visscher PM, Hill WG, Wray NR: Heritability in the genomics era—Concepts and misconceptions. Nat Rev Genet 9: 255–266, 2008
51. Loh PR, Bhatia G, Gusev A, Finucane HK, Bulik-Sullivan BK, Pollack SJ, et al.; Schizophrenia Working Group of Psychiatric Genomics Consortium: Contrasting genetic architectures of schizophrenia and other complex diseases using fast variance-components analysis. Nat Genet 47: 1385–1392, 2015
52. Boyle EA, Li YI, Pritchard JK: An expanded view of complex traits: From polygenic to omnigenic. Cell 169: 1177–1186, 2017
53. Zeng J, de Vlaming R, Wu Y, Robinson MR, Lloyd-Jones LR, Yengo L, et al.: Signatures of negative selection in the genetic architecture of human complex traits. Nat Genet 50: 746–753, 2018
54. Zhang W, Bojorquez-Gomez A, Velez DO, Xu G, Sanchez KS, Shen JP, et al.: A global transcriptional network connecting noncoding mutations to changes in tumor gene expression. Nat Genet 50: 613–620, 2018
55. Stafford-Smith M, Li YJ, Mathew JP, Li YW, Ji Y, Phillips-Bute BG, et al.; Duke Perioperative Genetics and Safety Outcomes (PEGASUS) Investigative Team: Genome-wide association study of acute kidney injury after coronary bypass graft surgery identifies susceptibility loci. Kidney Int 88: 823–832, 2015
56. Myers JC, Dion AS, Abraham V, Amenta PS: Type XV collagen exhibits a widespread distribution in human tissues but a distinct localization in basement membrane zones. Cell Tissue Res 286: 493–505, 1996
57. Iozzo RV: Basement membrane proteoglycans: From cellar to ceiling. Nat Rev Mol Cell Biol 6: 646–656, 2005
58. Kivirikko S, Saarela J, Myers JC, Autio-Harmainen H, Pihlajaniemi T: Distribution of type XV collagen transcripts in human tissue and their production by muscle cells and fibroblasts. Am J Pathol 147: 1500–1509, 1995
59. Hägg PM, Hägg PO, Peltonen S, Autio-Harmainen H, Pihlajaniemi T: Location of type XV collagen in human tissues and its accumulation in the interstitial matrix of the fibrotic kidney. Am J Pathol 150: 2075–2086, 1997
60. Nakada M, Miyamori H, Yamashita J, Sato H: Testican 2 abrogates inhibition of membrane-type matrix metalloproteinases by other testican family proteins. Cancer Res 63: 3364–3369, 2003
61. Chavakis T, Athanasopoulos A, Rhee JS, Orlova V, Schmidt-Wöll T, Bierhaus A, et al.: Angiostatin is a novel anti-inflammatory factor by inhibiting leukocyte recruitment. Blood 105: 1036–1043, 2005
62. Benelli R, Morini M, Carrozzino F, Ferrari N, Minghelli S, Santi L, et al.: Neutrophils as a key cellular target for angiostatin: implications for regulation of angiogenesis and inflammation. FASEB J 16: 267–269, 2002
63. Perri SR, Annabi B, Galipeau J: Angiostatin inhibits monocyte/macrophage migration via disruption of actin cytoskeleton. FASEB J 21: 3928–3936, 2007
64. Kang DH, Kanellis J, Hugo C, Truong L, Anderson S, Kerjaschki D, et al.: Role of the microvascular endothelium in progressive renal disease. J Am Soc Nephrol 13: 806–816, 2002
65. Stenvinkel P, Ketteler M, Johnson RJ, Lindholm B, Pecoits-Filho R, Riella M, et al.: IL-10, IL-6, and TNF-alpha: Central factors in the altered cytokine network of uremia—The good, the bad, and the ugly. Kidney Int 67: 1216–1233, 2005
66. Mu W, Long DA, Ouyang X, Agarwal A, Cruz PE, Roncal CA, et al.: Angiostatin overexpression is associated with an improvement in chronic kidney injury by an anti-inflammatory mechanism. Am J Physiol Renal Physiol 296: F145–F152, 2009
67. Zhang SX, Wang JJ, Lu K, Mott R, Longeras R, Ma JX: Therapeutic potential of angiostatin in diabetic nephropathy. J Am Soc Nephrol 17: 475–486, 2006
68. Xia YY, Bu R, Cai GY, Zhang XG, Duan SW, Wu J, et al.: Urinary angiostatin: A novel biomarker of kidney disease associated with disease severity and progression. BMC Nephrol 20: 118, 2019
69. Wu T, Du Y, Han J, Singh S, Xie C, Guo Y, et al.: Urinary angiostatin–A novel putative marker of renal pathology chronicity in lupus nephritis. Mol Cell Proteomics 12: 1170–1179, 2013
70. Wojcik GL, Graff M, Nishimura KK, Tao R, Haessler J, Gignoux CR, et al.: Genetic analyses of diverse populations improves discovery for complex traits. Nature 570: 514–518, 2019
71. Martin AR, Gignoux CR, Walters RK, Wojcik GL, Neale BM, Gravel S, et al.: Human demographic history impacts genetic risk prediction across diverse populations. Am J Hum Genet 100: 635–649, 2017