In recent years, the development of new DNA technologies has allowed the successful completion of genome-wide association studies (GWAS), and multiple genetic associations have been identified in several diseases such as celiac disease,1 schizophrenia,2 or type 2 diabetes.3 In AIDS, several GWAS have been published since 2007,4-10 and associations passing genome-wide significance have been found solely in chromosome 6 in the region of HLA (HCP54-6, HLA-C4,5,11) and in the CXCR6 gene.12 The design of genotyping chips tends to rely mainly on common variants with minor allele frequencies (MAFs) >5%. Moreover, the power to detect low allele frequency SNP associations in AIDS is weakened for 2 complementary reasons: (1) a lower frequency means fewer carriers in each group and thus weaker P values, and (2) AIDS-related genomic cohorts have enrolled fewer patients than cohorts for other pathologies such as chronic kidney disease13 or type 2 diabetes.3 However, low-frequency SNPs are predicted to have the potential for greater functional consequences than common alleles and may contribute strongly to genetic susceptibility to common diseases14-16; thus, they constitute very good candidates for genetic association studies. We therefore decided to reanalyze the genome-wide data obtained from the Genomics of Resistance to Immunodeficiency Virus (GRIV) cohort on slow progression6 and on rapid progression,7 focusing specifically on low-frequency SNPs (MAF < 5%). To increase the power of the study, we also included in our analysis rapid and slow progressors from the Dutch ACS cohort11 and from the American MultiCenter AIDS Cohort Study (MACS) 156 group.8
The Case Groups
Slow Progressors and Rapid Progressors were gathered from 3 HIV cohorts:
The Genomics of Resistance to Immunodeficiency Virus Cohort
The GRIV cohort, established in 1995 in France, is a collection of DNA samples assembled to identify host genes associated with slow progression and with rapid progression to AIDS.17-20 Only whites of European descent living in France were eligible for enrollment, to reduce confounding by population substructure. These criteria limit the influence of ethnic and environmental factors (all subjects live in a similar environment and are infected by HIV-1 subtype B strains) and put the emphasis on the genetic make-up of each individual in determining rapid progression (RP) or slow progression (SP) to AIDS. The RP and the SP were included on the basis of the main clinical outcomes, CD4 T-cell count, and time to disease progression. SP were defined as HIV-1-infected individuals who had remained asymptomatic for more than 8 years, without treatment and with a CD4 T-cell count above 500 cells per cubic millimeter. The SP group (n = 270) was composed of 200 males and 70 females aged at inclusion from 19 to 62 (mean = 35). Rapid progressors (RP) were stringently defined as having 2 or more CD4 T-cell counts below 300 cells per cubic millimeter less than 3 years after the last seronegative test. The RP group (n = 84) was composed of 72 males and 12 females aged at inclusion from 21 to 55 (median = 32). DNA was obtained from fresh peripheral blood mononuclear cells or from EBV-transformed cell lines. All patients provided written informed consent before enrollment in the GRIV genetic association study.
The Amsterdam Cohort Studies Cohort
The ACS (Amsterdam Cohort Studies) cohort is composed of 316 HIV-1 homosexual men and 100 HIV-1 drug users. This cohort was established to follow the course of HIV-1 infection using various endpoints related to HIV-1 infection and AIDS. The ACS participants were described in detail previously.11
Since CD4 T-cell count was assessed during routine clinical follow-up, we could extract the ACS SP and RP patients respecting the GRIV criteria (SP: n = 36, RP: n = 41). The SP and RP status was easily determined among seroconverter subjects since the date of seroconversion was known. We could also extract SP from seroprevalent subjects when the time of seropositivity was known to be higher than 8 years.
The MultiCenter AIDS Cohort Study 156 Group
The MACS156 study comprises a subset of 156 HIV-1 homosexual men enrolled in the MACS cohort, a prospective cohort originally established to investigate the natural history of HIV infection.21 This subset of MACS European American participants was chosen to be enriched for the extreme AIDS progression phenotypes.8 The MACS156 participants were described in detail previously.8
Because CD4 T-cell count was assessed during routine clinical follow-up, we could extract the MACS SP and RP respecting the GRIV definition (SP: n = 59, RP: n = 22). As with the ACS cohort, the SP and the RP were selected from seroconverter subjects. SP were also selected from seroprevalent subjects.
Three white control groups from France, the Netherlands, and the United States were merged and used as a control group.
The Data from an Epidemiological Study on Insulin Resistance Syndrome Control Group
The Data from an Epidemiological Study on Insulin Resistance Syndrome (DESIR) program was a 9-year follow-up study designed to clarify the development of the insulin resistance syndrome. Subjects were recruited from 1994 to 1996 from volunteers insured by the French social security system, which offers periodic health examinations free of charge.22 This control group comprised 694 participants both nonobese and normoglycemic of the DESIR trial, all French and HIV-1 seronegative. It was composed of 281 males and 413 females aged from 30 to 64 years.
Dutch Controls (CTR-ACS)
This control group corresponds to 376 Dutch subjects genotyped with HumanHap300 BeadChips.23
Illumina Control Group
This control group corresponds to white subjects genotyped with HumanHap300 BeadChips from the Illumina Genotyping Control Database (www.illumina.com). There were 324 individuals.
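As a quick consistency check, the group sizes reported in the cohort descriptions above can be tallied into the pooled case and control groups used in the analysis. The sketch below only sums the numbers stated in the text; the dictionary layout and variable names are illustrative choices, not part of the study.

```python
# Group sizes as reported in the cohort descriptions
# (SP = slow progressors, RP = rapid progressors).
cases = {
    "GRIV": {"SP": 270, "RP": 84},
    "ACS": {"SP": 36, "RP": 41},
    "MACS156": {"SP": 59, "RP": 22},
}
controls = {"DESIR": 694, "CTR-ACS": 376, "Illumina-CTR": 324}

# Pool the three cohorts into one SP group and one RP group,
# and the three control sets into one control group.
total_sp = sum(cohort["SP"] for cohort in cases.values())
total_rp = sum(cohort["RP"] for cohort in cases.values())
total_ctr = sum(controls.values())

print(f"SP: {total_sp}, RP: {total_rp}, controls: {total_ctr}")
```

The totals agree with the pooled sizes used in the case-control comparisons (365 SP, 147 RP, 1394 controls).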
Genotyping Method, Quality Control
The GRIV cohort, the ACS cohort, and the control groups were genotyped using the Illumina Infinium II HumanHap300 BeadChips (Illumina, San Diego, CA). The genotyping quality was assessed for each group using the BeadStudio software (version 3.1, Illumina). SNPs with missing data >2%, with MAF <1%, or deviating from Hardy-Weinberg equilibrium in the control groups (P < 1.0 × 10−3) were removed during these quality control steps.6,7,11 The MACS156 group genotype data were obtained through the Affymetrix GeneChip Human Mapping 500K Array (Affymetrix, Santa Clara, CA). Different quality control filters were applied to ensure reliable genotyping data.8 For all the cohorts, we removed outliers exhibiting nonwhite ancestry, cohort by cohort, using the Eigenstrat method.24
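The three SNP-level filters described above (missing data, MAF, Hardy-Weinberg equilibrium in controls) can be illustrated with a minimal sketch. This is not the BeadStudio procedure used in the study: the function names and the input layout are hypothetical, and the Hardy-Weinberg test shown is a simple 1-degree-of-freedom chi-square, whereas exact tests are often preferred in practice.

```python
import math

def hwe_p_value(n_aa, n_ab, n_bb):
    """Chi-square (1 df) test of Hardy-Weinberg proportions from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)          # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df.
    return math.erfc(math.sqrt(chi2 / 2))

def passes_qc(n_aa, n_ab, n_bb, n_missing, ctrl_counts,
              max_missing=0.02, min_maf=0.01, hwe_alpha=1e-3):
    """Apply the three filters; ctrl_counts = (AA, AB, BB) counts in controls."""
    n_called = n_aa + n_ab + n_bb
    if n_called == 0:
        return False
    missing_rate = n_missing / (n_called + n_missing)
    freq_a = (2 * n_aa + n_ab) / (2 * n_called)
    maf = min(freq_a, 1 - freq_a)
    return (missing_rate <= max_missing
            and maf >= min_maf
            and hwe_p_value(*ctrl_counts) >= hwe_alpha)

# Made-up genotype counts, for illustration only.
print(passes_qc(900, 95, 5, 10, ctrl_counts=(900, 95, 5)))   # kept
print(passes_qc(900, 95, 5, 50, ctrl_counts=(900, 95, 5)))   # too much missing data
print(passes_qc(900, 95, 5, 10, ctrl_counts=(900, 20, 80)))  # strong HWE deviation
```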
We considered 2 pooled case groups (SP from GRIV, ACS, and MACS156 on the one hand, RP from GRIV, ACS, and MACS156 on the other hand) and the pooled control groups (DESIR, CTR-ACS, Illumina-CTR). We retained the 8584 SNPs exhibiting a MAF < 5% for the slow progressors versus controls (SP-CTR) comparison (Bonferroni 5.8 × 10−6), and 10295 SNPs exhibiting a MAF < 5% for the RP-CTR comparison (Bonferroni 4.8 × 10−6). It was important to choose SNPs with a low frequency in either group because we were looking for factors either promoting progression (low MAF in SP compared with CTR or low MAF in CTR compared with RP) or preventing progression (low MAF in RP compared with CTR or low MAF in CTR compared with SP). The choice to screen specifically low-frequency SNPs stems from 2 complementary reasons. (1) A biological reason: HIV-1 infection is a multifactorial disease with several genetic factors impacting disease. Several groups have pointed out that most signals involved in complex diseases should lie in the low-frequency variant spectrum.14-16 Indeed, this observation was confirmed in AIDS because the main signal found up to now was in HCP5 with a low-frequency variant.4-6,9 (2) A statistical fact: in our case-control configuration, a low frequency either in the CTR or in the CASE group systematically means a weaker P value for a given odds ratio (OR, which measures the real biological impact of the SNP). For instance, for an OR of 0.5 in the dominant mode, the P values obtained are: 0.02 with a SP MAF of 2% and a CTR MAF of 3.9%, 1.5 × 10−5 with a SP MAF of 5% and a CTR MAF of 9.5%, and 9.98 × 10−8 with a SP MAF of 10% and a CTR MAF of 18.2%. Moreover, in the Illumina genotyping chips used, SNPs with low MAF are underrepresented compared with SNPs with higher MAF (data not shown).
Overall, for a biological effect measured by a given OR, SNP associations are thus more difficult to identify in the low-frequency spectrum because they exhibit weaker P values by essence and because they are underrepresented and thus artificially penalized through global Bonferroni corrections.
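The statistical argument above can be illustrated with a short sketch that holds the OR near 0.5 and varies the MAF, using the 3 MAF pairs quoted in the text and the pooled group sizes (365 SP, 1394 CTR). The assumptions are ours: MAFs are converted to carrier frequencies under Hardy-Weinberg proportions, the OR is computed at the allele level, and the test is a plain Pearson chi-square on expected carrier counts without covariates. It is therefore not expected to reproduce the quoted P values exactly, only the trend.

```python
import math

N_SP, N_CTR = 365, 1394  # pooled group sizes reported in the text

def carrier_freq(maf):
    """Fraction of subjects carrying at least 1 minor allele (Hardy-Weinberg assumed)."""
    return 1 - (1 - maf) ** 2

def allelic_or(maf_case, maf_ctr):
    """Odds ratio computed on allele frequencies."""
    return (maf_case / (1 - maf_case)) / (maf_ctr / (1 - maf_ctr))

def dominant_p_value(maf_case, maf_ctr, n_case=N_SP, n_ctr=N_CTR):
    """Pearson chi-square (1 df) comparing expected carrier proportions."""
    p1, p2 = carrier_freq(maf_case), carrier_freq(maf_ctr)
    pooled = (n_case * p1 + n_ctr * p2) / (n_case + n_ctr)
    chi2 = (p1 - p2) ** 2 / (pooled * (1 - pooled) * (1 / n_case + 1 / n_ctr))
    return math.erfc(math.sqrt(chi2 / 2))

# (SP MAF, CTR MAF) pairs quoted in the text for an OR of about 0.5.
scenarios = [(0.02, 0.039), (0.05, 0.095), (0.10, 0.182)]
p_values = [dominant_p_value(sp, ctr) for sp, ctr in scenarios]

for (sp, ctr), p in zip(scenarios, p_values):
    print(f"SP MAF {sp:.1%}, CTR MAF {ctr:.1%}: "
          f"OR = {allelic_or(sp, ctr):.2f}, P = {p:.2e}")
```

With the OR essentially fixed, the P value improves by several orders of magnitude as the MAF rises from 2% to 10%, which is the penalty on low-frequency SNPs described above.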
We performed a case-control analysis by comparing either the SP group (n = 365) or the RP group (n = 147), consisting of GRIV, ACS, and MACS patients, with the control group consisting of DESIR, CTR-ACS, and Illumina control individuals (n = 1394). The statistical analysis was performed by logistic regression (with SNPtest software25) in the dominant mode, taking stratification into account by adding the first 2 Eigenstrat principal component (PC) axes, computed with EIGENSOFT, as covariates. Testing for association under the dominant model was appropriate because we lacked power to test for associations under the recessive model and, additionally, for low-frequency SNPs the dominant model is nearly identical to the additive model, since homozygotes for the minor allele are very rare. For each SNP passing the Bonferroni threshold, we recomputed the regression by adding the HCP5 SNP (rs2395029) as a covariate to check for nonindependence due to linkage disequilibrium (with SNPtest software25).
Using SNPtest Impute software,25 it was possible to impute untyped SNPs of the MACS156 study subjects, absent from the Affymetrix GeneChip Human Mapping 500K Array (Affymetrix, Santa Clara, CA) and present in the Illumina HumanHap300 BeadChips (Illumina). They were imputed using the HapMap release 21 phased data for the European population (Utah residents with ancestry from northern and western Europe [CEU]) as the reference panel (http://www.hapmap.org). Only the SNPs imputed with high reliability (imputation quality score25 P > 0.9) were retained.
After comparing the total SP group (combined from GRIV, ACS, and MACS156) with the total control group (combined from DESIR, CTR-ACS, and CTR_Illumina), we also performed an individual analysis of each group (GRIV, ACS, and MACS156) for the 4 SNPs passing the Bonferroni threshold. For GRIV, we checked the result obtained in our previous GWAS.6 For ACS, we tested the association between the 4 SNPs and time to AIDS93 by linear regression. For the MACS156 group, we also tested the association between the SNPs and time to AIDS93 by linear regression.
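The choice of the dominant mode can be made concrete with a small calculation. Assuming Hardy-Weinberg proportions (our assumption, used only for illustration), the sketch below shows how rare minor-allele homozygotes are when MAF < 5%: too rare to test a recessive model in 365 SP, and too rare for the additive and dominant codings to differ much. The function names are illustrative.

```python
def genotype_freqs(maf):
    """Expected genotype frequencies by minor-allele count, under Hardy-Weinberg."""
    q = maf
    return {0: (1 - q) ** 2, 1: 2 * q * (1 - q), 2: q * q}

def dominant_code(minor_allele_count):
    """Dominant coding: 1 for carriers (1 or 2 minor alleles), 0 otherwise."""
    return 1 if minor_allele_count >= 1 else 0

def homozygote_share_among_carriers(maf):
    """Fraction of carriers whose additive code (2) differs from the dominant code (1)."""
    f = genotype_freqs(maf)
    return f[2] / (f[1] + f[2])

N_SP = 365  # pooled slow progressors

for maf in (0.01, 0.03, 0.05):
    f = genotype_freqs(maf)
    print(f"MAF {maf:.0%}: expected homozygotes among SP = {N_SP * f[2]:.2f}, "
          f"share of carriers that are homozygous = "
          f"{homozygote_share_among_carriers(maf):.1%}")
```

Even at the upper bound of MAF = 5%, fewer than 1 minor-allele homozygote is expected among the SP, so carriers are almost exclusively heterozygotes.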
For each SNP exhibiting a significant association, we looked for the other SNPs in linkage disequilibrium (r2 ≥ 0.9) in the HapMap population of Western European ancestry (Utah residents with ancestry from northern and western Europe [CEU], HapMap data Release 21a/phase II January 2007, on NCBI B35 assembly, dbSNP125, http://www.hapmap.org) to identify the genes possibly tracked by the SNP associations. A SNP was assigned to a gene if it was located within the gene or in the 2 kilobase flanking regions (potential regulatory sequence), otherwise it was considered intergenic.
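The two rules in this paragraph (the r2 ≥ 0.9 criterion and the 2 kilobase flanking window) can be sketched as follows. This is an illustrative sketch, not the pipeline used in the study: r2 is computed from phased haplotypes with the standard formula r2 = D²/(pA(1−pA)pB(1−pB)), and the gene name and coordinates in the example are hypothetical placeholders.

```python
def r_squared(haplotypes):
    """r2 between 2 biallelic SNPs from phased haplotypes.

    haplotypes: list of (a, b) pairs, each allele coded 0 or 1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                       # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r2 is undefined when a SNP is monomorphic")
    return d * d / denom

FLANK_BP = 2000  # 2 kilobase flanking region, as stated in the text

def assign_gene(snp_pos, genes, flank=FLANK_BP):
    """Return the first gene whose span (plus flanks) contains the SNP, else 'intergenic'.

    genes: list of (name, start, end) tuples on the same chromosome as the SNP.
    """
    for name, start, end in genes:
        if start - flank <= snp_pos <= end + flank:
            return name
    return "intergenic"

# Hypothetical data for illustration only.
perfect_ld = [(1, 1)] * 10 + [(0, 0)] * 90
print(r_squared(perfect_ld) >= 0.9)            # proxies in full LD are retained

genes = [("GENE_X", 10_000, 20_000)]           # placeholder gene and coordinates
print(assign_gene(8_500, genes))               # inside the 2 kb upstream flank
print(assign_gene(7_999, genes))               # just outside: intergenic
```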
To further explore the associations observed, we tried to identify putative modifications in mRNA expression as shown in Genevar26 and Dixon27 databases, in splicing (FastSNP,28 http://fastsnp.ibms.sinica.edu.tw/pages/input_CandidateGeneSearch.jsp/), in polyadenylation (polyAH, http://linux1.softberry.com/berry.phtml?topic=polyah&group=programs&subgroup=promoter and polyApred, http://www.imtech.res.in/raghava/polyapred/submission.html), or in transcription factor binding sites (SignalScan, http://www-bimas.cit.nih.gov/molbio/signal/, TESS, http://www.cbil.upenn.edu/cgi-bin/tess/tess?RQ=WELCOME, and TFSearch, http://www.cbrc.jp/research/db/TFSEARCH.html, derived from TRANSFAC database). We used the Genecards database to look for the tissues and organs expressing the proteins (GeneCards,29 http://www.genecards.org).
For the 8584 SNPs with a MAF <5%, we tested a total of 365 SP patients and compared them with a control cohort of 1394 seronegative individuals (from France, the Netherlands, and the United States; Methods). Four signals passed the Bonferroni threshold in the dominant mode, 3 in chromosome 6 and 1 in chromosome 17 (Table 1). Not unexpectedly, the best result was obtained for the well-replicated HCP5 rs2395029 (P = 8.54 × 10−15). The 2 other associations found in chromosome 6 were for rs9368699 in C6orf48 (P = 3.03 × 10−10) and rs8192591 in NOTCH4 (P = 9.08 × 10−7). These 2 SNPs are in partial linkage disequilibrium (LD) with HCP5-rs2395029 (r2 = 0.68 and r2 = 0.57, respectively). The fourth association corresponded to the chromosome 17 SNP rs2072255 located in the RICH2 gene (P = 3.30 × 10−6). This SNP is in full LD (r2 = 1) with rs2072254, which corresponds to a synonymous change (Gly188Gly) in RICH2 (Fig. 1; see Map, Supplemental Digital Content 1, http://links.lww.com/QAI/A114). The 4 SNPs corresponded to the higher end of the MAF distribution of the SNPs studied (see Histogram, Supplemental Digital Content 2, http://links.lww.com/QAI/A115), which is logical since larger numbers of carriers lead to stronger P values.
To evaluate the role of LD in these associations, we recomputed the P values using the HCP5 SNP (rs2395029) as a covariate. The 2 chromosome 6 SNPs were not significant in the adjusted analysis (P = 0.78 for rs9368699 in C6orf48, P = 0.31 for rs8192591 in NOTCH4), but the association remained statistically robust for the chromosome 17 SNP after controlling for the HCP5 SNP (P = 1.82 × 10−6 for rs2072255 in RICH2). In line with this computation, we found that the rs2072255 frequency was not significantly different between the SP elite controllers12 and the SP nonelite controllers12 (P = 0.7).
The subjects carrying the rs2072255-A allele were 9.04% in the SP group (Table 1), 18.87% in the control group, and 17.1% in the RP group (the similar frequency in controls and RP excludes the hypothesis of an effect on infection). Interestingly, these frequencies were consistent within each of the 3 SP groups and within each of the 3 control groups (Fig. 2). Moreover, the positive signal for association of rs2072255 was confirmed in the individual cohorts GRIV, ACS, and MACS156: the comparison of the SP with DESIR controls in GRIV led to P = 8.1 × 10−5, the analysis of ACS by linear regression led to P = 0.05, and the analysis of the MACS156 group by linear regression also led to P = 0.05. A table summarizing the results in the different cohorts is provided in Supplemental Digital Content 3 (http://links.lww.com/QAI/A116). Finally, the rs2072254 RICH2 exonic SNP (in LD with rs2072255) is located in a splicing site according to FastSNP28 (Fig. 1).
When comparing the 10295 selected SNPs between the 147 RP patients and the 1394 seronegative controls, no signal passed the Bonferroni threshold.
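The Bonferroni thresholds and the 4 reported signals can be checked with a few lines of arithmetic. The family-wise error rate of 0.05 is our assumption (it is not stated explicitly, but it reproduces the reported thresholds to within rounding); the SNP counts and P values are those given in the text.

```python
ALPHA = 0.05           # assumed family-wise error rate
N_TESTS_SP = 8584      # SNPs tested in the SP-CTR comparison
N_TESTS_RP = 10295     # SNPs tested in the RP-CTR comparison

threshold_sp = ALPHA / N_TESTS_SP
threshold_rp = ALPHA / N_TESTS_RP
print(f"SP-CTR threshold: {threshold_sp:.2e}")
print(f"RP-CTR threshold: {threshold_rp:.2e}")

# P values reported for the SP-CTR comparison (dominant mode).
reported = {
    "rs2395029 (HCP5)": 8.54e-15,
    "rs9368699 (C6orf48)": 3.03e-10,
    "rs8192591 (NOTCH4)": 9.08e-7,
    "rs2072255 (RICH2)": 3.30e-6,
}
passing = [snp for snp, p in reported.items() if p < threshold_sp]
print(f"{len(passing)} of {len(reported)} reported SNPs pass the SP-CTR threshold")
```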
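To give a sense of the effect size behind these carrier frequencies, the sketch below computes an unadjusted carrier-level odds ratio for SP versus controls with a Woolf (log-normal) 95% confidence interval and a Pearson chi-square P value. The carrier counts are back-calculated from the reported percentages and pooled group sizes, which is our assumption, and no covariates are included, so the result is only an approximation of the covariate-adjusted logistic regression reported in Table 1.

```python
import math

N_SP, N_CTR = 365, 1394

# Carrier counts back-calculated from the reported percentages (assumption).
sp_carriers = round(0.0904 * N_SP)      # 9.04% of SP
ctr_carriers = round(0.1887 * N_CTR)    # 18.87% of controls
sp_non = N_SP - sp_carriers
ctr_non = N_CTR - ctr_carriers

# Unadjusted odds ratio for carrying rs2072255-A (SP versus controls).
odds_ratio = (sp_carriers / sp_non) / (ctr_carriers / ctr_non)

# Woolf 95% confidence interval on the log odds ratio.
se_log_or = math.sqrt(1 / sp_carriers + 1 / sp_non + 1 / ctr_carriers + 1 / ctr_non)
ci_low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
ci_high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

# Pearson chi-square (1 df) on the 2 x 2 carrier table.
p1, p2 = sp_carriers / N_SP, ctr_carriers / N_CTR
pooled = (sp_carriers + ctr_carriers) / (N_SP + N_CTR)
chi2 = (p1 - p2) ** 2 / (pooled * (1 - pooled) * (1 / N_SP + 1 / N_CTR))
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"carriers: SP {sp_carriers}/{N_SP}, CTR {ctr_carriers}/{N_CTR}")
print(f"OR = {odds_ratio:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f}), P = {p_value:.1e}")
```

An odds ratio below 1 here means that carriers of the A allele are underrepresented among slow progressors relative to controls.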
We decided to reanalyze previous GWAS data on AIDS cohorts by focusing specifically on low-frequency SNPs (MAF < 5%). For that purpose, we combined rapid and slow progressors from 3 international cohorts from France (GRIV), the Netherlands (ACS), and the United States (MACS156 study), totalling 365 SP and 147 RP, who were compared with 1394 controls (seronegative individuals). No association was found when comparing the RP group with the CTR group. This was not a surprise since the RP group comprised only 147 individuals, and for a MAF of 5% in the CTR, a quite strong biological effect (OR > 2.8) was needed to pass the Bonferroni threshold. Four SNPs passed the Bonferroni threshold when comparing the SP group with the CTR group. Among them, 3 are in chromosome 6 and were previously found significant by several studies: rs2395029 in HCP54-6,8,9,11, rs9368699 in C6orf485,6, rs8192591 in NOTCH45. NOTCH4 is an interesting candidate gene due to its role in immune regulation, and the NOTCH4 rs8192591 corresponds to a nonsynonymous Gly534Ser protein variant. This association was found to be independent from the HCP5 rs2395029 by Fellay et al5; however, the signal disappeared in our own study when using HCP5 rs2395029 as a covariate. A possible explanation for this discrepancy could be the use of viral load as an endpoint in the study by Fellay et al,5 whereas we used here a progression phenotype.
The fourth signal identified in the present study is new and corresponds to rs2072255 in the chromosome 17 RICH2 gene. The RICH2 gene encodes a Rho-type GTPase activator composed of 818 amino acids (89.247 kDa) with an intracellular localization. It is expressed highly in the brain and at a basal level in several tissues, notably in the lymph nodes.29 A recent study has shown that RICH2 is a part of the physical link between BST-2 and the actin cytoskeleton and prevents the internalization of BST-2.30 RICH2 could thus contribute to the externalization of BST-2, which prevents HIV-1 virion budding and release.31 This counteracts HIV-1 Vpu, which is known to favor internalization and degradation of BST-2.31 The rs2072255-A RICH2 allele favors progression to AIDS because 18.87% of the CTR carry the variant, whereas only 9.04% of the SP carry it (Table 1, Fig. 2). Interestingly, rs2072255 is in total LD with the SNP rs2072254 located in a splicing site of RICH2 as suggested by FastSNP (Fig. 1). If the rs2072254-G minor allele alters mRNA splicing, it could lead to a down-modulation of RICH2, and thus explain a diminished effect of BST-2 against HIV-1 production.
The identification of 3 chromosome 6 signals already confirmed by several studies shows the relevance of targeting specifically low-frequency SNPs in GWAS. The genetic and biological data regarding the RICH2 signal are also quite compelling and provide a new relevant candidate gene to explore the molecular etiology of HIV-1 pathogenesis. Further genetic and experimental studies will be needed to confirm and understand the effect of RICH2 in AIDS pathogenesis.
The authors are grateful to all the patients and medical staff who have kindly collaborated with the GRIV project. Data in this article were collected by the Multicenter AIDS Cohort Study (MACS) with centers (Principal Investigators) at The Johns Hopkins Bloomberg School of Public Health (Joseph B. Margolick, Lisa P. Jacobson), Howard Brown Health Center, Feinberg School of Medicine, Northwestern University, and Cook County Bureau of Health Services (John P. Phair, Steven M. Wolinsky), University of California, Los Angeles (Roger Detels), and University of Pittsburgh (Charles R. Rinaldo). Website located at http://www.statepi.jhsph.edu/macs/macs.html.
1. Dubois PC, Trynka G, Franke L, et al. Multiple common variants for celiac disease influencing immune gene expression. Nat Genet
2. O'Donovan MC, Craddock N, Norton N, et al. Identification of loci associated with schizophrenia by genome-wide association and follow-up. Nat Genet
3. Zeggini E, Scott LJ, Saxena R, et al. Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes. Nat Genet
4. Fellay J, Shianna KV, Ge D, et al. A whole-genome association study of major determinants for host control of HIV-1. Science
5. Fellay J, Ge D, Shianna KV, et al. Common genetic variation and the control of HIV-1 in humans. PLoS Genet
6. Limou S, Le Clerc S, Coulonges C, et al. Genomewide association study of an AIDS-nonprogression cohort emphasizes the role played by HLA genes (ANRS Genomewide Association Study 02). J Infect Dis
7. Le Clerc S, Limou S, Coulonges C, et al. Genomewide association study of a rapid progression cohort identifies new susceptibility alleles for AIDS (ANRS Genomewide Association Study 03). J Infect Dis
8. Herbeck JT, Gottlieb GS, Winkler CA, et al. Multistage genomewide association study identifies a locus at 1q41 associated with rate of HIV-1 disease progression to clinical AIDS. J Infect Dis
9. Dalmasso C, Carpentier W, Meyer L, et al. Distinct genetic loci control plasma HIV-RNA and cellular HIV-DNA levels in HIV-1 infection: the ANRS Genome Wide Association 01 study. PLoS One
10. Pelak K, Goldstein DB, Walley NM, et al. Host determinants of HIV-1 control in African Americans. J Infect Dis
11. van Manen D, Kootstra NA, Boeser-Nunnink B, et al. Association of HLA-C and HCP5 gene regions with the clinical course of HIV-1 infection. AIDS
12. Limou S, Coulonges C, Herbeck JT, et al. Multi-cohort genetic association study reveals CXCR6 as a new chemokine receptor involved in AIDS long-term non-progression. J Infect Dis
13. Kottgen A, Pattaro C, Boger CA, et al. New loci associated with kidney function and chronic kidney disease. Nat Genet
14. Kryukov GV, Pennacchio LA, Sunyaev SR. Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. Am J Hum Genet
15. Gorlov IP, Gorlova OY, Sunyaev SR, et al. Shifting paradigm of association studies: value of rare single-nucleotide polymorphisms. Am J Hum Genet
16. Bodmer W, Bonilla C. Common and rare variants in multifactorial susceptibility to common diseases. Nat Genet
17. Flores-Villanueva PO, Hendel H, Caillat-Zucman S, et al. Associations of MHC ancestral haplotypes with resistance/susceptibility to AIDS disease development. J Immunol
18. Hendel H, Caillat-Zucman S, Lebuanec H, et al. New class I and II HLA alleles strongly associated with opposite patterns of progression to AIDS. J Immunol
19. Rappaport J, Cho YY, Hendel H, et al. 32 bp CCR-5 gene deletion and resistance to fast progression in HIV-1 infected heterozygotes. Lancet
20. Winkler CA, Hendel H, Carrington M, et al. Dominant effects of CCR2-CCR5 haplotypes in HIV-1 disease progression. J Acquir Immune Defic Syndr
21. Kaslow RA, Ostrow DG, Detels R, et al. The Multicenter AIDS Cohort Study: rationale, organization, and selected characteristics of the participants. Am J Epidemiol
22. Balkau B. An epidemiologic survey from a network of French Health Examination Centres, (D.E.S.I.R.): epidemiologic data on the insulin resistance syndrome [in French]. Rev Epidemiol Sante Publique
23. van Es MA, Veldink JH, Saris CG, et al. Genome-wide association study identifies 19p13.3 (UNC13A) and 9p21.2 as susceptibility loci for sporadic amyotrophic lateral sclerosis. Nat Genet
24. Price AL, Patterson NJ, Plenge RM, et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet
25. Marchini J, Howie B, Myers S, et al. A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet
26. Ge D, Zhang K, Need AC, et al. WGAViewer: software for genomic annotation of whole genome association studies. Genome Res
27. Dixon AL, Liang L, Moffatt MF, et al. A genome-wide association study of global gene expression. Nat Genet
28. Yuan HY, Chiou JJ, Tseng WH, et al. FASTSNP: an always up-to-date and extendable service for SNP function analysis and prioritization. Nucleic Acids Res. 2006;34(Web Server issue):W635-W641.
29. Rebhan M, Chalifa-Caspi V, Prilusky J, et al. GeneCards: integrating information about genes, proteins and diseases. Trends Genet
30. Rollason R, Korolchuk V, Hamilton C, et al. A CD317/tetherin-RICH2 complex plays a critical role in the organization of the subapical actin cytoskeleton in polarized epithelial cells. J Cell Biol
31. Tokarev A, Skasko M, Fitzpatrick K, et al. Antiviral activity of the interferon-induced cellular protein BST-2/tetherin. AIDS Res Hum Retroviruses
AIDS; disease progression; genome-wide association study; HIV-1; RICH2; SNP
Copyright © 2011 Wolters Kluwer Health, Inc. All rights reserved.