Supplement Articles

Celiac Disease Genetics

Past, Present and Future Challenges

Wijmenga, Cisca; Gutierrez-Achury, Javier

Journal of Pediatric Gastroenterology and Nutrition: July 2014 - Volume 59 - Issue - p S4-S7
doi: 10.1097/01.mpg.0000450392.23156.10



In the early 1970s it was discovered that particular HLA molecules were involved in CD. With time it became evident that patients with CD more often carried particular HLA risk molecules. It is now widely recognized that the HLA-DQA1*05 and DQB1*02 alleles confer risk for CD (1). These HLA-associated alleles are frequently found not only in patients with CD (up to 95%) but also in the general population (up to 35%), implying that HLA is a necessary but not sufficient factor for CD pathogenesis. This prompted research into other genetic factors predisposing to CD. The first efforts focused on linkage analysis in large pedigrees segregating the disease or in affected sibling pairs, and on investigating association with candidate genes in case-control cohorts. Both approaches were largely unsuccessful, for several reasons: (1) limited power, as most studies were too small; (2) the investigation of only a few candidate genes; and (3) the lack of comprehensive genetic markers across the entire human genome. By 2000, the CD28-CTLA4-ICOS locus was the only other locus that had been suggested to be associated with CD.
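As a rough illustration of why HLA is necessary but not sufficient, Bayes' rule can convert the carrier frequencies quoted above into an approximate disease risk for a carrier. The sketch below uses the upper-bound frequencies from the text (95% of patients, 35% of the general population) and assumes a population prevalence of 1%, a figure that is not given in this article; the function names are illustrative.

```python
def risk_given_carrier(freq_in_cases, freq_in_population, prevalence):
    """P(CD | carrier) = P(carrier | CD) * P(CD) / P(carrier)."""
    return freq_in_cases * prevalence / freq_in_population


def risk_given_noncarrier(freq_in_cases, freq_in_population, prevalence):
    """P(CD | non-carrier), using the complementary frequencies."""
    return (1 - freq_in_cases) * prevalence / (1 - freq_in_population)


# Frequencies from the text (upper bounds); prevalence is an ASSUMPTION.
ASSUMED_PREVALENCE = 0.01
ppv = risk_given_carrier(0.95, 0.35, ASSUMED_PREVALENCE)
residual = risk_given_noncarrier(0.95, 0.35, ASSUMED_PREVALENCE)

print(f"Risk for HLA risk-allele carriers: {ppv:.1%}")
print(f"Risk for non-carriers:             {residual:.2%}")
```

Under these assumptions only a few percent of carriers would develop CD, whereas the risk for non-carriers is close to zero, which is the pattern expected of a necessary but not sufficient factor.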


The genetic landscape of CD changed completely after the discovery of SNPs (single nucleotide polymorphisms) and the development of array-based genotyping technology in 2006, which allowed genome-wide association studies (GWAS) to be conducted. CD was one of the first diseases for which a GWAS was performed. The initial study tested 310,605 SNPs for association in 778 cases and 1422 controls and yielded 13 new CD-associated loci (2–5). Many of these new loci contained genes related to the immune response. This became even more apparent after a much larger, second GWAS in close to 5000 cases and more than 10,000 controls, which resulted in 26 CD-associated loci (4). Many of these loci suggested that T-cell development and the innate immune system were causally related to CD.
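To make the scale of such a scan concrete, the sketch below runs a basic allelic chi-square test for a single SNP and compares the result with a Bonferroni-corrected threshold for 310,605 tests. The sample sizes match the initial study described above, but the allele counts are hypothetical and the test is a textbook simplification, not the analysis pipeline of the cited studies.

```python
import math


def allelic_chi_square(case_risk, case_other, control_risk, control_other):
    """Pearson chi-square (1 df) on a 2x2 table of allele counts.

    Returns (chi2, p_value, odds_ratio).
    """
    a, b, c, d = case_risk, case_other, control_risk, control_other
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (a * d) / (b * c)
    return chi2, p_value, odds_ratio


N_SNPS = 310_605                 # SNPs tested in the initial GWAS (from the text)
bonferroni = 0.05 / N_SNPS       # per-SNP threshold after Bonferroni correction

# HYPOTHETICAL allele counts: 778 cases -> 1556 alleles, 1422 controls -> 2844.
chi2, p, odds = allelic_chi_square(500, 1056, 740, 2104)

print(f"Bonferroni threshold: {bonferroni:.2e}")
print(f"chi2 = {chi2:.2f}, p = {p:.2e}, OR = {odds:.2f}")
print("Genome-wide significant:", p < bonferroni)
```

With these made-up counts the SNP is nominally significant but does not pass the corrected threshold, which illustrates why modest effects call for the much larger cohorts used in later studies.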

An important result from the GWAS was the strong overlap seen in the association signals with other autoimmune diseases (6). This observation prompted the development of the Immunochip as a customized genotyping platform to refine the signals to single genes in loci that had already been significantly associated with CD, and to reveal more of the overlapping association signals. Immunochip genotyping in CD revealed another 13 non-HLA loci contributing risk to CD, resulting in 39 GWAS loci comprising 57 independent SNP variants (7) (Fig. 1). Immunochip analysis has been performed for at least another 12 autoimmune diseases and has shown that the majority of the 39 non-HLA CD GWAS-associated loci overlap with at least one of the other phenotypes (Fig. 2).

FIGURE 1. Timeline of celiac disease associations. Outside the human leukocyte antigen (HLA) region, there are currently 39 genome-wide association study (GWAS) loci and 57 different single-nucleotide polymorphisms (SNPs) that show association with celiac disease (CD) (data are based on references 2–5 and 7). For some loci, >1 independent SNP shows association. A resequencing study of the coding part of associated CD loci in large case-control cohorts recently identified 1 additional rare coding variant in the NCF2 gene (12), bringing the total to 40 non-HLA loci and 58 SNPs associated with CD.

FIGURE 2. Immunochip results of non-HLA loci shared across 13 other autoimmune diseases. Celiac disease (CD) shares at least 1 locus with each of 13 other diseases (red ribbons). The same is seen for the other 13 diseases analyzed. The numbers in the external ring represent the total number of shared loci per disease, and each ribbon represents the absolute number of shared loci between any 2 specific diseases. Each locus can be shared with more than 1 disease, which results in a larger number of shared loci. The other diseases represented in the figure include autoimmune thyroiditis (ATD), systemic sclerosis (SS), atopic dermatitis (AD), primary sclerosing cholangitis (PSCh), ulcerative colitis (UC), Crohn disease (CD), inflammatory bowel diseases (IBDs) with shared loci between UC and CD (sharedIBD), primary biliary cirrhosis (PBC), juvenile idiopathic arthritis (JIA), ankylosing spondylitis (AS), psoriasis (PS), rheumatoid arthritis (RA), and multiple sclerosis (MS). Only loci identified by Immunochip analysis and reaching the genome-wide significance threshold, P < 5 × 10⁻⁸, are included.

HLA and the 39 non-HLA GWAS loci in CD can explain approximately 50% of the genetic variation of the disease. Finding the remaining genetic variation will require many more samples, which have become difficult to collect. Alternative approaches may include cross-disease meta-analysis, that is, including other autoimmune diseases given their genetic overlap. A study of CD and rheumatoid arthritis has shown that this is feasible (8). Although it is difficult to estimate the total number of unique samples already genotyped with the Immunochip platform (given the possible overlap across studies), pooling all of these samples would substantially improve statistical power.

The genetic architecture of CD is likely to be polygenic (ie, many relatively common alleles with very modest effect sizes contribute to the phenotype). A recent study by Stahl et al suggested that at least another 2667 genetic variants could be involved in CD (9). If this proves to be true, large case-control studies will indeed be the only way to find such variants. Large families in which CD segregates across multiple generations have, however, also been documented (10). The inheritance pattern in such families could be consistent with a simpler genetic model than the polygenic one and could involve rare alleles with larger effect sizes.

Sequencing all genes in the genome may be a powerful tool to identify such alleles, although a study by Szperl et al in 2011 was unable to reveal any rare coding variants with a strong effect to explain CD in a 3-generation family (11). Similarly, a resequencing study of the coding part of associated CD loci in large case-control cohorts only identified 1 additional rare coding variant in the NCF2 gene (12), bringing the total to 40 non-HLA loci and 58 SNPs associated with CD.

Further study of the GWAS findings suggests, however, that the true causal variants may well be located outside the coding part of the genome. Kumar et al have shown that 81% of the GWAS variants in CD are located in noncoding regions of the genome (either intergenic or intronic) (13). This suggests that one of the mechanisms by which genetic variation could have an impact on phenotypic expression in CD is by affecting the levels of gene expression (4,7) rather than by changing the nature of the protein-coding genes.

More recent evidence suggests that the great majority of the human genome is involved in gene regulation, in part by encoding so-called noncoding RNAs. A more detailed inspection of the CD GWAS loci showed that some of the genetic variants associated with the disease are in, or close to, long noncoding RNA (lncRNA) genes, antisense RNA genes, or microRNAs (miRNAs) (14). This observation has implications for our understanding of CD. Because gene regulation is tissue specific and cell specific, future studies aimed at elucidating disease mechanisms should focus on the proper effector cell type. For CD this is likely to be the gluten-restricted T cell, which can only be found in intestinal biopsies from patients.


The challenge for the near future is to translate the genetic findings in CD research into risk prediction models and functional analyses towards unraveling disease mechanisms. For risk prediction it is clear that genetics alone is not, and never will be, sufficient to predict accurately the risk in the general population. Nevertheless, in high-risk groups this may be possible. The major genetic risk factor in CD (ie, carrying the determining HLA genotype) is already being used in clinical practice to identify potentially affected individuals. A recent study showed that the 57 non-HLA alleles could improve the predictive performance of a model compared with using HLA alone, although the difference is not striking, with an area under the receiver operating characteristic curve of 0.854, compared with 0.823 for HLA only (15). This study showed, however, that 11.1% of individuals could be reclassified into a more accurate risk group. This information could be useful in the future, particularly if a more accurate risk classification has implications for treatment or intervention times. Our knowledge about the relation between genetic risk carriers and phenotypic outcome will clearly be boosted by large, prospective cohort studies that follow individuals from birth onward and monitor clinical signs, food intake and microbiome patterns, epigenetic changes, and all kinds of other factors that may affect health and the development of disease. In this regard, it is extremely interesting to study the group of individuals who carry a high genetic risk for CD but remain healthy, because they may provide new insights into how we can prevent disease development.
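The comparison of areas under the curve (0.854 vs 0.823) can be made more tangible with a minimal sketch of how a genetic risk score and its AUC might be computed. Everything below is an assumption for illustration: the additive log-odds score is a common convention rather than the model used in the cited study (15), and the odds ratios, genotypes, and scores are invented.

```python
import math


def genetic_risk_score(risk_allele_counts, odds_ratios):
    """Additive score: sum of risk-allele counts (0, 1, 2) times log(OR)."""
    return sum(n * math.log(o) for n, o in zip(risk_allele_counts, odds_ratios))


def auc(case_scores, control_scores):
    """Probability that a random case scores higher than a random control.

    Ties count as one half (the Mann-Whitney formulation of the AUC).
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# HYPOTHETICAL per-variant odds ratios and one person's genotype.
example_ors = [1.2, 1.5, 1.1]
example_genotype = [2, 1, 0]
print(f"Example score: {genetic_risk_score(example_genotype, example_ors):.3f}")

# HYPOTHETICAL scores for a handful of cases and controls.
toy_cases = [0.9, 0.8, 0.4]
toy_controls = [0.7, 0.3, 0.2, 0.1]
print(f"Toy AUC: {auc(toy_cases, toy_controls):.3f}")
```

An AUC of 0.5 corresponds to chance and 1.0 to perfect separation, so a gain from 0.823 to 0.854 is a modest improvement in ranking, in line with the authors' remark that the difference is not striking.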

With respect to disease mechanisms, genetic studies have provided novel insights into underlying pathways that were not well appreciated earlier. For example, we have seen that many of the genes located in the same loci as risk alleles are involved in innate immunity, which may suggest an intricate relation between the adaptive and innate immune systems. How innate immunity is triggered is unknown, although it is tempting to speculate that certain gliadin molecules play a role in the process. Future work should be directed towards a better understanding of these aspects.


For a complete picture of CD genetics, thousands of additional genetic variants need to be identified, which will require cohorts of many thousands of individuals. Such studies can be conducted only by pooling samples within and between populations and within and between related phenotypes. In addition, to better understand the relation between genotype and phenotype, patients should be clinically characterized as accurately as possible, in a standardized fashion, and should be studied over time. In the future this may also allow for the development of better prediction models for disease development and progression. Studying healthy individuals who are at risk for CD (based on their genetics and family history) may provide insight into preventive factors.

Determining how genetic variation changes the levels of gene expression, and how that affects both protein-coding and noncoding RNA genes, will require studies of the appropriate effector cell types. Eventually, unraveling the genetics of CD will reveal why the immune system becomes dysregulated and will provide leads to alternative ways to treat the disease.


The authors thank Gosia Trynka for providing Figure 1 and Jackie Senior for editing the manuscript. Current work in the Wijmenga group is funded by the European Research Council under the European Union's Seventh Framework Programme (FP/2007-2013)/ERC Grant Agreement no. 2012-322698, and the Dutch Digestive Disease Research Foundation (MLDS, WO11-30).


1. De Haas EC, Kumar V, Wijmenga C. Devi S, Mullin GE. Immunogenetics of celiac disease. Clinical Gastroenterology. Totowa, NJ:Humana Press; 2013. 1–16.
2. Van Heel DA, Franke L, Hunt KA, et al. A genome-wide association study for celiac disease identifies risk variants in the region harboring IL2 and IL21. Nat Genet 2007; 39:827–829.
3. Hunt KA, Zhernakova A, Turner G, et al. Newly identified genetic risk variants for celiac disease related to the immune response. Nat Genet 2008; 40:395–402.
4. Dubois PC, Trynka G, Franke L, et al. Multiple common variants for celiac disease influencing immune gene expression. Nat Genet 2010; 42:295–302.
5. Trynka G, Zhernakova A, Romanos J, et al. Coeliac disease-associated risk variants in TNFAIP3 and REL implicate altered NF-kappaB signalling. Gut 2009; 58:1078–1083.
6. Zhernakova A, van Diemen CC, Wijmenga C. Detecting shared pathogenesis from the shared genetics of immune-related diseases. Nat Rev Genet 2009; 10:43–55.
7. Trynka G, Hunt KA, Bockett NA, et al. Dense genotyping identifies and localizes multiple common and rare variant association signals in celiac disease. Nat Genet 2011; 43:1193–1201.
8. Zhernakova A, Stahl EA, Trynka G, et al. Meta-analysis of genome-wide association studies in celiac disease and rheumatoid arthritis identifies fourteen non-HLA shared loci. PLoS Genet 2011; 7:e1002004.
9. Stahl EA, Wegmann D, Trynka G, et al. Bayesian inference analyses of the polygenic architecture of rheumatoid arthritis. Nat Genet 2012; 44:483–489.
10. van Belzen MJ, Vrolijk MM, Meijer JW, et al. A genomewide screen in a four-generation Dutch family with celiac disease: evidence for linkage to chromosomes 6 and 9. Am J Gastroenterol 2004; 99:466–471.
11. Szperl AM, Ricano-Ponce I, Li JK, et al. Exome sequencing in a family segregating for celiac disease. Clin Genet 2011; 80:138–147.
12. Hunt KA, Mistry V, Bockett NA, et al. Negligible impact of rare autoimmune-locus coding-region variants on missing heritability. Nature 2013; 498:232–235.
13. Kumar V, Wijmenga C, Withoff S. From genome-wide association studies to disease mechanisms: celiac disease as a model for autoimmune diseases. Semin Immunopathol 2012; 34:567–580.
14. Ricano-Ponce I, Wijmenga C. Mapping of immune-mediated disease genes. Annu Rev Genomics Hum Genet 2013; 14:325–353.
15. Romanos J, Rosen A, Kumar V, et al. Improving coeliac disease risk prediction by testing non-HLA variants additional to HLA variants. Gut 2014; 63:415–422.
© 2014 by European Society for Pediatric Gastroenterology, Hepatology, and Nutrition and North American Society for Pediatric Gastroenterology,