
Advances in identifying coding variants of common complex diseases

Cai, Minglong; Ran, Delin; Zhang, Xuejun*

doi: 10.1097/JBR.0000000000000046

In contrast to monogenic diseases, the etiology of many common diseases is multifactorial with a complex genetic predisposition. The discovery of disease-causing genes informs the pathogenesis, diagnosis, intervention and treatment of diseases. The emergence of high-throughput genotyping technologies, together with completion of the Human Genome Project, has enabled research into genetic architecture through genome-wide association studies (GWAS). In the last decade, GWAS of common diseases have discovered an overwhelming number of susceptibility loci, which has led to a better understanding of the genetic structure and pathomechanisms of diseases. However, one of the surprises of the GWAS findings is that, for most common diseases, a single genetic association has a limited effect size and that, collectively, significant markers account for only a limited proportion of heritability.[1] This phenomenon can be explained by the fact that although GWAS are highly effective in identifying common single nucleotide polymorphisms (SNPs), variants with very low allele frequencies are difficult to detect.[2] The second surprise is that most genetic markers discovered by GWAS reside in non-coding regions,[3] making the functional interpretation of these variants a challenge. Therefore, exome SNP array analysis and next generation sequencing approaches were subsequently employed to identify rare coding variants.[4,5] Unlike non-coding variants that confer risk for diseases by influencing gene expression through cis or trans regulatory mechanisms, coding variants can perturb both the structure and function of a protein and can disrupt its stability.[6] Moreover, the coding region of the genome (also called the exome) accounts for only approximately 1.5% of the overall genome sequence, but for many common diseases, coding variants collectively make a disproportionately large contribution to heritability (Fig. 1).[7,8] These features indicate that a coding variant has a greater probability of being causal than an otherwise equivalent non-coding variant. Therefore, searching for disease-driving coding variants has been prioritized in revealing the pathogenesis of diseases and in developing new therapies. However, disease-associated variants in coding regions have not yet been systematically reviewed. In this review, we focus on the advances in identifying coding variants, and we discuss how to interpret the functional role of coding variants. Furthermore, we highlight the translation of coding variants to clinical implementation. We note that coding variants in the major histocompatibility complex region make a great contribution to common diseases, but our focus is on associations between common diseases and coding variants within non-major histocompatibility complex regions. Fine mapping of major histocompatibility complex associations in common diseases is not reviewed here.

Figure 1: Enrichment estimates for coding variants across 11 traits/diseases. Y-axis represents enrichment, which equals the proportion of heritability divided by the proportion of single nucleotide polymorphisms. X-axis represents traits/diseases. BMI = body mass index.
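
The enrichment statistic defined in the caption of Figure 1 can be sketched as a simple ratio. The function name and the input values below are hypothetical and serve only to illustrate the arithmetic; they are not estimates taken from the cited studies.

```python
def enrichment(prop_heritability: float, prop_snps: float) -> float:
    """Enrichment = proportion of heritability / proportion of SNPs.

    Both arguments are fractions between 0 and 1 for one annotation
    category (for example, coding variants).
    """
    if not 0 < prop_snps <= 1:
        raise ValueError("prop_snps must be in (0, 1]")
    if not 0 <= prop_heritability <= 1:
        raise ValueError("prop_heritability must be in [0, 1]")
    return prop_heritability / prop_snps


# Hypothetical illustration: a category that holds 1.5% of the SNPs
# but explains 15% of the heritability is enriched about 10-fold.
example = enrichment(prop_heritability=0.15, prop_snps=0.015)
print(round(example, 2))
```

An enrichment of 1 would mean that a category explains heritability exactly in proportion to its share of SNPs; values above 1 indicate a disproportionately large contribution.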

Database search strategy

A search of the MEDLINE database was undertaken encompassing publications from January 2005 to June 2019. We used the following inclusion criteria: full-text articles that discussed relationships between coding variants and common diseases that were published in English. To define coding variants, the search terms ‘coding variant’, ‘exome chip’, ‘exome sequencing’, and ‘whole genome sequencing’ were used, and to define common diseases, ‘complex disease’ and ‘common disease’ were used. In addition, a search of the MEDLINE database for methods of functional interpretation was performed using the following search criteria: ‘functional interpretation’ and ‘functional validation of coding variants’. We further screened the reference lists of included studies to identify other potentially useful studies. The results were screened by title and abstract, then full texts were analyzed for keywords to identify potentially suitable publications. Data from included papers were extracted independently by 2 groups of researchers.

Discovery of coding variants in complex diseases

Determining disease-driving variants is the first step in translating genetic associations into mechanistic understanding as well as novel or precise treatments (Fig. 2). GWAS using array-based genotyping have now been reported for hundreds of complex diseases, including auto-immune, mental, and cardiovascular diseases, and various cancers.[9] GWAS with large sample sizes have discovered numerous coding variants.[10,11] For example, Mahajan et al[12] combined datasets from 32 GWAS that included 74,124 European type 2 diabetes cases and 824,006 controls, and identified 51 loci that are strongly associated with type 2 diabetes. Among these 51 loci, 8 were missense coding variants, and 5 were implicated as causal for type 2 diabetes. These findings substantially improve fine mapping of causal variants in diseases. However, GWAS are mainly based on linkage disequilibrium (LD) among genetic variants that constitute haplotype blocks across the genome. Each haplotype is tagged by several SNPs, called tag SNPs. This means that markers identified by GWAS may be only tag SNPs in the haplotype blocks rather than causal variants. Under the assumption that a tag SNP could be in strong LD with causal variants that reside in coding regions, researchers started to perform targeted resequencing of candidate genes to look for disease-related coding variants. For example, by using exome sequencing of 781 patients with psoriasis and 676 controls followed by targeted sequencing of 1326 candidate genes in 9946 patients with psoriasis and 9906 controls from the Chinese population, Tang et al[13] discovered 2 independent psoriasis-associated, low-frequency single nucleotide variants in IL23R and GJB2 and 5 common single nucleotide variants in LCE3D, ERAP1, CARD14, and ZNF816A. All these single nucleotide variants are missense variants that reside in coding regions.
To systematically search for coding variants, exome-wide array analysis has been successfully employed in numerous studies to identify a series of functional coding variants.[4,14–18] For example, a large-scale exome-wide analysis, including 10,716 cases with esophageal squamous cell carcinoma and 12,637 healthy controls, found 6 new susceptibility loci in CCHCR1, TCN2, TNXB, LTA, CYP26B1, and FASN, and 3 low-frequency variants with an odds ratio >1.5,[17] which have greatly expanded the genetic spectrum of the disease. These findings emphasize the important role of coding variants in the development of common diseases. More importantly, with the development of high-throughput sequencing technology and decreasing sequencing costs, whole-genome sequencing or whole-exome sequencing can be employed to provide nearly full coverage of variation in the genome and a much wider minor allele frequency spectrum of variants. This means that whole-exome sequencing/whole-genome sequencing can be used more directly to detect disease-causing mutations. Whole-genome sequencing/whole-exome sequencing technology has been applied with association analysis for various common diseases, including Alzheimer's disease (AD), age-related macular degeneration and cancer.[5,19–21] A whole-exome sequencing and association study of AD with 5740 cases and 5096 cognitively normal controls identified 3 novel genes: IGHG3, AC099552.4, and ZNF655.[5] Furthermore, the sequence kernel association test was developed to test for association between the genetic variants in a region and disease based on sequencing data,[22] which greatly increases statistical power.
In the study of late-onset AD, researchers using gene-based tests identified two novel genes, PINX1 and TREM2.[23] Although these findings do not directly explain the pathogenesis of the disease, they significantly expand the genetic spectrum of the disease and inform research into the mechanism of the disease and the discovery of novel therapeutics.

Figure 2: Pipeline analysis of coding variants in common diseases. (1) GWAS and NGS are the first step in identifying susceptibility loci of common diseases. (2) Functional annotation to determine whether the variants are detrimental. (3) Experimental studies to validate causal variants. (4) Application of causal variants, for example, in drug development. GWAS = genome-wide association study, NGS = next generation sequencing.

Coding variants: from associations to biology

Associations identified by genetic research do not in themselves provide a disease mechanism. With the goal of revealing the biological effects behind risk loci, functional interpretation of coding variants and experimental validation are indispensable.

Functional interpretation of coding variants

Coding variants, especially non-synonymous mutations, can cause substitutions of amino acids that affect protein structure, stability, and biochemical properties; thus they may modify the molecular function of a protein.[24] Therefore, when we detect a coding variant, the first step is to determine whether it is detrimental to the function of the protein. For this, bioinformatic analyses were developed to predict the functional impact of coding variants, primarily by evaluating their effect on the structure and stability of proteins. Computational methods include PROVEAN,[25] Combined Annotation-Dependent Depletion (CADD),[26] SIFT,[27] PolyPhen2,[28] MutationTaster,[29] VIPUR,[30] FATHMM,[31] and the Mutation Assessor tool.[32] Furthermore, addition of information concerning artificial neural networks, protein-protein interaction networks, the protein essentiality index, and the pathway in which the protein is involved can significantly improve the predictions, as shown by SNPMuSiC,[33] DEOGEN,[34,35] and SuSPect.[36] For example, a high expression level of P-glycoprotein, encoded by the ATP-binding cassette gene ABCB1, was reported to obstruct the treatment of breast cancer with anti-cancer drugs. Comprehensive bioinformatic analysis incorporating simulation of functional change of P-glycoprotein, prediction of deleterious human SNPs, modeling of the mutant protein structure, and molecular dynamic simulation of P-glycoprotein has detected 2 mutations (R538S and M701R) in ABCB1 that have deleterious effects on breast cancer-associated P-glycoprotein.[37] More importantly, protein-coding genes generally express various transcripts, and their expression might differ across body tissues, which is crucial in the interpretation of coding variants.[38] A novel annotation tool, TiSAn, was developed for estimating tissue-specific effects of coding and non-coding variants.
TiSAn combines the power of supervised machine learning with tissue-specific annotations, including genomic, epigenomic, and transcriptomic annotations, and showed high accuracy when predicting tissue-specific functional interpretation of variants in large cohorts of autism spectrum disorder and coronary artery disease patients.[39] Next, the impact of coding variants on molecular pathways needs to be profiled. Thanks to the availability of public data, Kyoto Encyclopedia of Genes and Genomes analysis,[40] gene ontology annotations and gene ontology enrichment analysis[41] can provide insights into the biology underlying the variants. Using a whole-exome microarray followed by genotype imputation in a large AD case-control cohort, Sims et al[42] observed 3 novel AD-associated coding variants in PLCG2, ABI3, and TREM2. Further co-expression network analysis showed that co-expression networks harboring PLCG2 and/or ABI3 were enriched in microglial genes and immune response gene ontology terms.[43] These findings provided robust evidence that the microglia-mediated innate immune response contributes to AD development.
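
Gene ontology enrichment analysis of the kind described above is commonly based on an over-representation test. The following is a minimal sketch under the assumption of a one-sided hypergeometric test; the function name and the toy numbers are hypothetical, and dedicated tools additionally correct for the many terms tested.

```python
from math import comb


def go_enrichment_p(n_background: int, n_term: int,
                    n_selected: int, n_overlap: int) -> float:
    """One-sided hypergeometric p-value for over-representation.

    n_background: number of genes in the background set
    n_term:       background genes annotated with the term
    n_selected:   genes in the list of interest (e.g. a co-expression module)
    n_overlap:    genes of interest annotated with the term

    Returns P(X >= n_overlap) when n_selected genes are drawn at random.
    """
    total = comb(n_background, n_selected)
    upper = min(n_term, n_selected)
    tail = sum(
        comb(n_term, k) * comb(n_background - n_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy numbers (hypothetical): 10 background genes, 4 annotated with the term,
# 3 genes in the module, 2 of which carry the annotation.
p_term = go_enrichment_p(n_background=10, n_term=4, n_selected=3, n_overlap=2)
print(round(p_term, 4))
```

A small p-value suggests that the term is over-represented in the gene list relative to chance, which is the type of evidence behind statements that a network is enriched in immune response terms.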

Experimental functional validation of coding variants

Coding variants with a functional role (evaluated by bioinformatic analysis) are important in the pathogenesis of common diseases. To further validate coding variants and uncover the pathway a variant protein is involved in, many experimental functional studies have been developed, from cellular to animal models, and from in vitro to in vivo assays. Cytological functional studies of coding variants can help to determine differences in protein levels and assess the impact on cell functions, such as proliferation, apoptosis, and migration. Functional studies using tissue enable the detection of differential gene expression in disease-associated tissue. Creating animal models of human disease is extremely valuable because of the similarities between humans and animals. Many experimental strategies have been developed, including luciferase reporter assays, DNase I hypersensitivity assays, chromatin conformation capture, electrophoretic mobility shift assay, Hi-C, clustered regularly interspaced short palindromic repeats with Cas9 nuclease (CRISPR/Cas9), proliferation and apoptosis assays, in vivo transgenic reporter assays and microRNA mimic assays.[44] For example, DNA variants in the TREM2 coding region were reported to be associated with AD susceptibility.[42] To elucidate the role of TREM2 in the amyloid cascade in early stage AD, Parhizkar et al[45] created an amyloid precursor protein/presenilin-1 transgenic mouse model and performed intrahippocampal injections of amyloid β-containing brain extracts into the mice. This in vivo assay combined with proteomic analyses and microglia depletion experiments demonstrated that loss-of-function TREM2 mutants increase amyloid plaque seeding and decrease plaque-associated apolipoprotein E and microglial clustering around newly seeded plaques, indicating that loss of TREM2 function contributes to AD risk by increasing amyloidogenesis at relatively young ages. 
Gene editing techniques (including zinc-finger nucleases, transcription activator-like effector nucleases, and CRISPR/Cas9) make it possible to assess the effects of a single variant in vivo by altering specific position(s) within genes. The more recently developed gene editing technology, CRISPR/Cas9, has a high targeting efficiency and accuracy, few off-target effects, low cytotoxicity and low costs, and facilitates vector design and manipulation.[46,47] CRISPR/Cas9 has been widely used in creating cellular and animal models of human disease to explore the functional effect of genetic variations and the molecular basis of disease, and to facilitate precision treatment.[48] For instance, coding variants in SLC16A11 were reported to be strongly associated with type 2 diabetes in the Mexican population.[49] A functional study with CRISPR/Cas systems found that knocking out wild-type Slc16a11 produced no significant metabolic defects in mouse models, whereas reconstituting the Slc16a11 knockout mouse with the mutant SLC16A11 protein caused elevated triglyceride accumulation and induced insulin resistance via upregulation of lipin1, indicating the gain of an aberrant function by the mutant protein that affects lipid metabolism.[50]

Coding variants: from associations to application

One of the ultimate goals of genetic research is translation of knowledge to enable effective prediction, prevention, and treatment of disease. Although the number of disease-associated coding variations is far smaller than that of non-coding variations, coding variations have greatly contributed to precision medicine.

Coding variants combined with non-coding variants predict disease risk

A key advantage of a genetic-based prediction is that it can be assessed from the time of birth, well before the discriminative capacity of most traditional risk factors emerges, which may facilitate intensive preventive efforts. Predicting disease risk using genetic markers has been a challenge, especially for common diseases, where a single variant has only a moderate effect on phenotype. With an increasing number of common and rare variants identified by GWAS and next generation sequencing analysis, there is great interest in combining the vast number of disease-associated variants to increase the power of risk prediction. For example, integrating multiple loci into a polygenic risk score (PRS) has been proposed to improve predictive accuracy.[51] Lee et al[52] detected common and rare variants in coding as well as regulatory regions using targeted sequencing, and incorporated these markers into genetic risk scores. As a result, the genetic risk scores improved classification accuracy for vertebral and non-vertebral fractures by 10.2% and 4.9%, respectively. Recently, a more precise computational algorithm, LDpred, has been developed that uses a Bayesian PRS and combines all markers throughout the whole genome. This method calculates posterior mean causal effect sizes based on GWAS summary statistics, genetic architecture and LD information in a reference panel, and was shown to outperform other existing PRS methods.[53,54] For example, Khera et al[54] have successfully applied LDpred to 5 common diseases and compared predictive accuracy between a pruning and thresholding method and LDpred. Among 31 predictors, PRS with the LDpred algorithm had a higher accuracy. The LDpred method revealed coronary artery disease, atrial fibrillation, type 2 diabetes, inflammatory bowel disease, and breast cancer to have >3-fold increased risk in 8.0%, 6.1%, 3.5%, 3.2%, and 1.5% of the population, respectively.
However, there is still a long way to go to accurately predict diseases through genetic markers. With increases in study size, algorithm improvement and the assembly of other non-genetic risk factors, prediction will become more feasible in years to come.
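
The scoring step common to polygenic risk scores can be sketched as a weighted sum of risk-allele dosages. This is a minimal illustration under stated assumptions: the variant identifiers, weights and genotypes are hypothetical, and the sketch does not implement LDpred, which re-estimates the per-variant weights from summary statistics and LD before a sum of this kind is taken.

```python
def polygenic_risk_score(dosages: dict, weights: dict) -> float:
    """Weighted sum of risk-allele dosages.

    dosages: variant id -> number of risk alleles carried (0, 1 or 2)
    weights: variant id -> per-allele effect size (e.g. a log odds ratio)

    Variants without a genotype for this individual are skipped.
    """
    score = 0.0
    for variant, weight in weights.items():
        if variant in dosages:
            dosage = dosages[variant]
            if not 0 <= dosage <= 2:
                raise ValueError(f"dosage out of range for {variant}")
            score += weight * dosage
    return score


# Hypothetical effect sizes and one individual's genotypes.
weights = {"variant_A": 0.10, "variant_B": 0.25, "variant_C": -0.05}
individual = {"variant_A": 2, "variant_B": 1, "variant_C": 0}
score = polygenic_risk_score(individual, weights)
print(round(score, 2))
```

In practice, scores of this kind are compared with the distribution in a reference population, for example to identify individuals in the upper tail of the score distribution.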

The application of coding variants in disease treatment

Coding variant associations offer a direct route to biological insight for complex diseases and identification of validated therapeutic targets. Although the effect sizes of single genetic variants are small, their effect sizes on a molecular pathway can be large, thus making it possible to develop new drugs. Coding variants involved in disease pathogenesis can be targeted by new drugs. For example, through candidate-gene resequencing and genotyping, loss-of-function mutations in SLC30A8 were shown to be associated with protection against type 2 diabetes. SLC30A8 encodes the zinc transporter ZnT8, which is expressed in pancreatic islets, and the protective mutations encode unstable ZnT8 proteins; this finding has led to efforts by several pharmaceutical companies to develop ZnT8 antagonists.[55] Another study that combined exome sequencing and targeted sequencing demonstrated a pathogenic role of the interleukin-23 pathway in the development of psoriasis, and now biologics (including tildrakizumab, guselkumab, and secukinumab) targeting components of the interleukin-23 pathway are becoming the dominant treatments for moderate-to-severe psoriasis as well as psoriatic arthritis.[13,56–59] Pharmacogenomic studies of common diseases can be used to predict the clinical response and adverse events of drugs in patients with specific genotypes, enabling the most suitable drugs to be used for specific individuals. For instance, niacin has been used to modulate plasma lipids, an important independent risk factor for cardiovascular disease. A study of 2067 participants detected 2 lipid-associated coding variants, p.R311C and p.M317I, in the niacin receptor gene (HCAR2). The study found that the reduction in lipoprotein (a) in response to niacin was greater in homozygous carriers of the major 317M allele than in minor allele carriers,[60] indicating that the coding variant p.M317I could act as a predictive marker for lipid-modulating treatment.
Furthermore, gene editing technology (such as CRISPR/Cas9) has proved to be a powerful tool for the identification of new targets in common diseases,[61–63] which may facilitate development of precise treatments based on coding variants.

Conclusions and directions

GWAS and next generation sequencing studies have provided a more comprehensive map of coding variations in common diseases. Subsequently, bioinformatic analyses and functional experiments have revealed the functional effects of coding variants and indicated the biological pathways involved in diseases. However, the use of large data sets to systematically explore functional variants is currently limited owing to the lack of sequence-structure-function interaction data. Sequence data are available for the entire human proteome, but the availability of protein structural data, experimentally verified functional associations and biomolecular interaction data is limited.[6] Moreover, the translation of coding variations to clinical implementation is also limited. The availability of public multi-omics datasets and improvement of computational algorithms that integrate tissue-specific resources will help interpretation of coding variants. In addition, CRISPR/Cas9 genome-editing technology will help to identify the functional effects of coding variants and to identify novel therapeutic targets. Last, pharmacogenetic studies with large sample sizes will facilitate personalized treatment based on coding variants.



Author contributions

MC participated in the design and writing of the manuscript, reviewed the manuscript, and approved submission. DR participated in the writing of the manuscript, reviewed the manuscript, and approved submission. XZ participated in the design of the manuscript, reviewed the manuscript, and approved submission.

Financial support

This work was financially supported by the National Natural Science Foundation of China (No. 81130031).

Conflicts of interest

The authors declare that they have no conflicts of interest.

References


1. Manolio TA, Collins FS, Cox NJ, et al. Finding the missing heritability of complex diseases. Nature 2009; 461:747–753.
2. Marouli E, Graff M, Medina-Gomez C, et al. Rare and low-frequency coding variants alter human adult height. Nature 2017; 542:186–190.
3. Kyono Y, Kitzman JO, Parker SCJ. Genomic annotation of disease-associated variants reveals shared functional contexts. Diabetologia 2019; 62:735–743.
4. Wen L, Zhu C, Zhu Z, et al. Exome-wide association study identifies four novel loci for systemic lupus erythematosus in Han Chinese population. Ann Rheum Dis 2018; 77:417.
5. Bis JC, Jian X, Kunkle BW, et al. Whole exome sequencing study identifies novel rare and common Alzheimer's-Associated variants involved in immune response and transcriptional regulation. Mol Psychiatry 2018; doi: 10.1038/s41380-018-0112-7.
6. Shameer K, Tripathi LP, Kalari KR, et al. Interpreting functional effects of coding variants: challenges in proteome-scale prediction, annotation and assessment. Brief Bioinform 2016; 17:841–862.
7. Gusev A, Lee SH, Trynka G, et al. Partitioning heritability of regulatory and cell-type-specific variants across 11 common diseases. Am J Hum Genet 2014; 95:535–552.
8. Finucane HK, Bulik-Sullivan B, Gusev A, et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat Genet 2015; 47:1228–1235.
9. Visscher PM, Wray NR, Zhang Q, et al. 10 years of GWAS discovery: biology, function, and translation. Am J Hum Genet 2017; 101:5–22.
10. Wellcome Trust Case Control Consortium, Maller JB, McVean G, et al. Bayesian refinement of association signals for 14 loci in 3 common diseases. Nat Genet 2012; 44:1294–1301.
11. Mahajan A, Wessel J, Willems SM, et al. Refining the accuracy of validated target identification through coding variant fine-mapping in type 2 diabetes. Nat Genet 2018; 50:559–571.
12. Mahajan A, Taliun D, Thurner M, et al. Fine-mapping type 2 diabetes loci to single-variant resolution using high-density imputation and islet-specific epigenome maps. Nat Genet 2018; 50:1505–1513.
13. Tang H, Jin X, Li Y, et al. A large-scale screen for coding variants predisposing to psoriasis. Nat Genet 2014; 46:45–50.
14. Zuo X, Sun L, Yin X, et al. Whole-exome SNP array identifies 15 new susceptibility loci for psoriasis. Nat Commun 2015; 6:6793.
15. Huyghe JR, Jackson AU, Fogarty MP, et al. Exome array analysis identifies new loci and low-frequency variants influencing insulin processing and secretion. Nat Genet 2013; 45:197–201.
16. Kozlitina J, Smagris E, Stender S, et al. Exome-wide association study identifies a TM6SF2 variant that confers susceptibility to nonalcoholic fatty liver disease. Nat Genet 2014; 46:352–356.
17. Chang J, Zhong R, Tian J, et al. Exome-wide analyses identify low-frequency variant in CYP26B1 and additional coding variants associated with esophageal squamous cell carcinoma. Nat Genet 2018; 50:338–343.
18. Chang J, Tian J, Zhu Y, et al. Exome-wide analysis identifies three low-frequency missense variants associated with pancreatic cancer risk in Chinese populations. Nat Commun 2018; 9:3688.
19. Huang LZ, Li YJ, Xie XF, et al. Whole-exome sequencing implicates UBE3D in age-related macular degeneration in East Asian populations. Nat Commun 2015; 6:6687.
20. Witkiewicz AK, McMillan EA, Balaji U, et al. Whole-exome sequencing of pancreatic cancer defines genetic diversity and therapeutic targets. Nat Commun 2015; 6:6744.
21. Wu S, Ou T, Xing N, et al. Whole-genome sequencing identifies ADGRG6 enhancer mutations and FRS2 duplications as angiogenesis-related drivers in bladder cancer. Nat Commun 2019; 10:720.
22. Wu MC, Lee S, Cai T, et al. Rare-variant association testing for sequencing data with the sequence kernel association test. Am J Hum Genet 2011; 89:82–93.
23. Tosto G, Vardarajan B, Sariya S, et al. Association of Variants in PINX1 and TREM2 With Late-Onset Alzheimer Disease. JAMA Neurol 2019; doi: 10.1001/jamaneurol.2019.1066.
24. MacArthur DG, Balasubramanian S, Frankish A, et al. A systematic survey of loss-of-function variants in human protein-coding genes. Science 2012; 335:823–828.
25. Choi Y, Chan AP. PROVEAN web server: a tool to predict the functional effect of amino acid substitutions and indels. Bioinformatics 2015; 31:2745–2747.
26. Kircher M, Witten DM, Jain P, et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet 2014; 46:310–315.
27. Kumar P, Henikoff S, Ng PC. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc 2009; 4:1073–1081.
28. Adzhubei IA, Schmidt S, Peshkin L, et al. A method and server for predicting damaging missense mutations. Nat Methods 2010; 7:248–249.
29. Schwarz JM, Rodelsperger C, Schuelke M, et al. MutationTaster evaluates disease-causing potential of sequence alterations. Nat Methods 2010; 7:575–576.
30. Baugh EH, Simmons-Edler R, Muller CL, et al. Robust classification of protein variation using structural modelling and large-scale data integration. Nucleic Acids Res 2016; 44:2501–2513.
31. Shihab HA, Gough J, Cooper DN, et al. Predicting the functional, molecular, and phenotypic consequences of amino acid substitutions using hidden Markov models. Hum Mutat 2013; 34:57–65.
32. Ionita-Laza I, McCallum K, Xu B, et al. A spectral approach integrating functional genomic annotations for coding and noncoding variants. Nat Genet 2016; 48:214–220.
33. Ancien F, Pucci F, Godfroid M, et al. Prediction and interpretation of deleterious coding variants in terms of protein structural stability. Sci Rep 2018; 8:4480.
34. Raimondi D, Gazzo AM, Rooman M, et al. Multilevel biological characterization of exomic variants at the protein level significantly improves the identification of their deleterious effects. Bioinformatics 2016; 32:1797–1804.
35. Raimondi D, Tanyalcin I, Ferte J, et al. DEOGEN2: prediction and interactive visualization of single amino acid variant deleteriousness in human proteins. Nucleic Acids Res 2017; 45:W201–W206.
36. Yates CM, Filippis I, Kelley LA, et al. SuSPect: enhanced prediction of single amino acid variant (SAV) phenotype using network features. J Mol Biol 2014; 426:2692–2701.
37. Chakraborty R, Gupta H, Rahman R, et al. In silico analysis of nsSNPs in ABCB1 gene affecting breast cancer associated protein P-glycoprotein (P-gp). Comput Biol Chem 2018; 77:430–441.
38. Zimmermann MT. The importance of biologic knowledge and gene expression context for genomic data interpretation. Front Genet 2018; 9:670.
39. Vervier K, Michaelson JJ. TiSAn: estimating tissue-specific effects of coding and non-coding variants. Bioinformatics 2018; 34:3061–3068.
40. Kanehisa M, Goto S, Furumichi M, et al. KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Res 2010; 38:D355–360.
41. Gene Ontology Consortium. The Gene Ontology project in 2008. Nucleic Acids Res 2008; 36:D440–444.
42. Sims R, van der Lee SJ, Naj AC, et al. Rare coding variants in PLCG2, ABI3, and TREM2 implicate microglial-mediated innate immunity in Alzheimer's disease. Nat Genet 2017; 49:1373–1384.
43. Conway OJ, Carrasquillo MM, Wang X, et al. ABI3 and PLCG2 missense variants as risk factors for neurodegenerative diseases in Caucasians and African Americans. Mol Neurodegener 2018; 13:53.
44. Farashi S, Kryza T, Clements J, et al. Post-GWAS in prostate cancer: from genetic association to biological contribution. Nat Rev Cancer 2019; 19:46–59.
45. Parhizkar S, Arzberger T, Brendel M, et al. Loss of TREM2 function increases amyloid seeding but reduces plaque-associated ApoE. Nat Neurosci 2019; 22:191–204.
46. Smith AJP, Deloukas P, Munroe PB. Emerging applications of genome-editing technology to examine functionality of GWAS-associated variants for complex traits. Physiol Genomics 2018; 50:510–522.
47. Jiang S, Shen QW. Principles of gene editing techniques and applications in animal husbandry. Biotech 2019; 9:28.
48. Zarei A, Razban V, Hosseini SE, et al. Creating cell and animal models of human disease by genome editing using CRISPR/Cas9. J Gene Med 2019; 21:e3082.
49. SIGMA Type 2 Diabetes Consortium, Williams AL, Jacobs SBR, et al. Sequence variants in SLC16A11 are a common risk factor for type 2 diabetes in Mexico. Nature 2014; 506:97–101.
50. Zhao Y, Feng Z, Zhang Y, et al. Gain-of-function mutations of SLC16A11 contribute to the pathogenesis of type 2 diabetes. Cell Rep 2019; 26:884–892.e4.
51. Dudbridge F. Power and predictive accuracy of polygenic risk scores. PLoS Genet 2013; 9:e1003348.
52. Lee SH, Kang MI, Ahn SH, et al. Common and rare variants in the exons and regulatory regions of osteoporosis-related genes improve osteoporotic fracture risk prediction. J Clin Endocrinol Metab 2014; 99:E2400–2411.
53. Vilhjálmsson BJ, Yang J, Finucane HK, et al. Modeling linkage disequilibrium increases accuracy of Polygenic Risk scores. Am J Hum Genet 2015; 97:576–592.
54. Khera AV, Chaffin M, Aragam KG, et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat Genet 2018; 50:1219–1224.
55. Flannick J, Thorleifsson G, Beer NL, et al. Loss-of-function mutations in SLC30A8 protect against type 2 diabetes. Nat Genet 2014; 46:357–363.
56. Reich K, Papp KA, Blauvelt A, et al. Tildrakizumab versus placebo or etanercept for chronic plaque psoriasis (reSURFACE 1 and reSURFACE 2): results from two randomised controlled, phase 3 trials. Lancet 2017; 390:276–288.
57. Deodhar A, Gottlieb AB, Boehncke WH, et al. Efficacy and safety of guselkumab in patients with active psoriatic arthritis: a randomised, double-blind, placebo-controlled, phase 2 study. Lancet 2018; 391:2213–2224.
58. Sofen H, Smith S, Matheson RT, et al. Guselkumab (an IL-23-specific mAb) demonstrates clinical and molecular response in patients with moderate-to-severe psoriasis. J Allergy Clin Immunol 2014; 133:1032–1040.
59. McInnes IB, Sieper J, Braun J, et al. Efficacy and safety of secukinumab, a fully human anti-interleukin-17A monoclonal antibody, in patients with moderate-to-severe psoriatic arthritis: a 24-week, randomised, double-blind, placebo-controlled, phase II proof-of-concept trial. Ann Rheum Dis 2014; 73:349–356.
60. Tuteja S, Wang L, Dunbar RL, et al. Genetic coding variants in the niacin receptor, hydroxyl-carboxylic acid receptor 2, and response to niacin therapy. Pharmacogenet Genomics 2017; 27:285–293.
61. Liu B, Saber A, Haisma HJ. CRISPR/Cas9: a powerful tool for identification of new targets for cancer treatment. Drug Discov Today 2019; 24:955–970.
62. Jensen TI, Axelgaard E, Bak RO. Therapeutic gene editing in haematological disorders with CRISPR/Cas9. Br J Haematol 2019; 185:821–835.
63. Wang L, Zheng W, Liu S, et al. Delivery of CRISPR/Cas9 by novel strategies for gene therapy. ChemBioChem 2019; 20:634–643.

Keywords: common diseases; GWAS; next generation sequencing; coding variants; non-coding variants

Copyright © 2019 The Chinese Medical Association. Published by Wolters Kluwer Health, Inc.