The HIV genome encodes nine proteins, some of which are produced from precursor polyproteins that are cleaved to generate multiple proteins. Expression of the viral proteins is tightly regulated by a complex mechanism that principally controls mRNA splicing, transport and translation and the cleavage of the polyproteins, ensuring the proper timing of protein expression and the required relative yield of each protein. Thus, not all HIV proteins are expressed simultaneously during the virus life cycle, and an ‘early’ regulatory phase and a ‘late’ assembly phase can be defined. Tat is the major regulator of HIV replication, and once replication is initiated, it upregulates the production of all other viral proteins. Rev is a nuclear RNA export factor that controls the transport of unspliced and singly spliced mRNAs to the cytoplasm. Through this activity, Rev increases the production of structural and enzymatic proteins and the transport of genomic RNA to the cytoplasm. Once Rev reaches a high enough expression level, Gag, Env and Pol proteins are expressed [1–3]. However, Rev upregulates neither its own expression nor that of Tat [4,5], which remains relatively low. Thus, the small mRNAs encoding Tat and Rev predominate during the initial phases of replication, whereas production of all other HIV proteins proceeds only when a threshold level of Tat and Rev is attained. Here, we compare the predicted epitope expression of the ‘early’ regulatory proteins Tat and Rev with that of the ‘late’ Gag, Pol and Env proteins.
The infection of a cell by HIV can elicit a cytotoxic T-lymphocyte (CTL) response to viral peptides presented by HLA class I molecules. During the early phases of the infection, CTLs play a critical role in the host's anti-HIV immune response. This role is suggested by the drop in HIV viremia and by the relief of the acute infection symptoms following the emergence of HIV-specific CTLs, as well as by data from CD8+ T-cell-depleted animal models [6,7]. The CTL response is also associated with a rapid selection of viral CTL escape variants [8,9]. In order to explore the HIV-specific CD8+ T-cell response, one has to consider the factors affecting the CTL response, including the kinetics of viral protein expression, the HLA class I genetic background of the infected individuals and the diversification of the viral genes.
In general, CTL epitopes originate from short peptides cleaved by the proteasome that can pass through the transporter associated with antigen processing (TAP) channel and associate noncovalently with the groove of major histocompatibility complex (MHC)-I molecules. The vast majority of these epitopes are nine-mers. A cleaved nine-mer is presented on an MHC-I molecule only if its affinity to the MHC molecule is sufficiently high.
We here propose to use an immunomic methodology combining genomic data and multiple bioinformatic tools to study the evolution of the anti-HIV CTL response to different viral genes. We use the number of CTL-predicted epitopes within a genomic section and the number and type of mutations they accumulate to study the interplay between the immune pressure and the viral sequence evolution. We show a downtrend in the total number of HIV-predicted epitopes over time, predominantly in the regulatory Tat and Rev proteins, opposed by an uptrend in the number of Gag epitopes.
Materials and methods
HIV and SIV gene sequences were used for this analysis. The sequences were obtained from the NCBI (http://www.ncbi.nlm.nih.gov/) and LANL databases.
We focused on HIV-1 group M, clade B. We used 4059 and 4140 sequences from the NCBI and LANL databases, respectively, as well as the LANL ancestral sequence. The sequences were obtained from nine genetic segments: Env, Gag, Nef, Pol, Tat, Rev, Vif, Vpr and Vpu. Only sequences covering at least 80% of the consensus length were included. The sequences from the LANL database included only the first exon of Rev to minimize errors due to mispositioning of the second exon. For HIV-2, we focused on clade A and used 174 sequences from the LANL database. For the SIV analysis, we used all the available sequences of SIVcpz, SIVsmm and SIVmac from the LANL and NCBI databases (55, 141 and 282 sequences, respectively). In the HIV-2 and SIV analyses, the gene Vpu was replaced by Vpx. For the time evolution analysis, the sequences were divided into six groups according to the year they were reported: all years before 1985, 1986–1990, 1991–1994, 1995–1998, 1999–2002 and 2003–2006.
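The length filter and the grouping by report year can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names are hypothetical, the first group is assumed to include 1985 itself, and gap characters are assumed not to count toward sequence length.

```python
from typing import Optional

# Report-year groups as listed in the text (inclusive bounds).
# Assumption: the first group is read as "up to and including 1985".
PERIODS = [
    ("<=1985", None, 1985),
    ("1986-1990", 1986, 1990),
    ("1991-1994", 1991, 1994),
    ("1995-1998", 1995, 1998),
    ("1999-2002", 1999, 2002),
    ("2003-2006", 2003, 2006),
]


def period_of(year: int) -> Optional[str]:
    """Return the label of the group containing `year`, or None if outside all groups."""
    for label, start, end in PERIODS:
        if (start is None or year >= start) and year <= end:
            return label
    return None


def long_enough(sequence: str, consensus_length: int, min_fraction: float = 0.8) -> bool:
    """Keep a sequence only if it covers at least `min_fraction` of the consensus length.

    Assumption: alignment gaps ('-') are not counted as sequence.
    """
    ungapped = sequence.replace("-", "")
    return len(ungapped) >= min_fraction * consensus_length
```

A sequence reported in 1992 would fall into the "1991-1994" group, and a sequence reported after 2006 would be left ungrouped.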
Mutation and consensus
All HIV sequences of a given gene were aligned using ClustalW (http://www.ebi.ac.uk/Tools/clustalw/) at the nucleotide level, and a consensus sequence was formed. Each codon in the sequence was compared to the appropriate codon in the consensus by removing the gaps within this codon and finding the optimal match within the codon. Any remaining difference was defined as a mutation. If the mutation led to a change in the amino acid, it was defined as a replacement mutation.
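The distinction between replacement and silent mutations can be illustrated with the standard genetic code. The sketch below is a simplification under stated assumptions: the function name is hypothetical, and a codon that is incomplete after gap removal is simply reported as undetermined instead of being optimally matched as described above.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon(codon: str, consensus_codon: str) -> str:
    """Compare an aligned codon with the consensus codon.

    Returns 'identical', 'silent', 'replacement' or 'undetermined'.
    Simplification: codons left incomplete after gap removal are not re-matched.
    """
    codon = codon.replace("-", "").upper()
    consensus_codon = consensus_codon.replace("-", "").upper()
    if codon not in CODON_TABLE or consensus_codon not in CODON_TABLE:
        return "undetermined"
    if codon == consensus_codon:
        return "identical"
    if CODON_TABLE[codon] == CODON_TABLE[consensus_codon]:
        return "silent"  # same amino acid, different nucleotides
    return "replacement"  # amino acid changed
```

For instance, GAA versus GAG (both glutamate) is silent, whereas GAT versus GAG (aspartate versus glutamate) is a replacement.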
Size of Immune Repertoire score
We have analyzed the ratio between the number of epitopes presented in HIV and SIV genes and the number in their random counterparts. The epitope number was computed using three algorithms: a homemade cleavage algorithm, a TAP-binding algorithm developed by Peters et al. and the BioInformatics and Molecular Analysis Section (BIMAS) MHC-binding algorithms. We have used 31 HLA alleles and weighted the results according to the allele frequency in a given human population. The algorithms' quality was systematically validated by measuring the fraction of experimentally observed epitopes actually predicted by the algorithm and the fraction of random peptides predicted as epitopes. Both fractions were low. A detailed description of the algorithms, their validation and the SIR score can be found in previous works. The SIR score analysis was performed on each sequence by itself. In order to estimate the epitope mutation rate, a consensus was built, and all sequences were compared to it.
In order to study the evolution of HIV epitopes, we computed all predicted epitopes in each HIV protein and compared this number to the number of expected epitopes in a random amino acid sequence of the same length and composition. We defined the ‘Size of Immune Repertoire’ (SIR) score of an amino acid sequence as the ratio of the number of predicted CTL epitopes to the number of epitopes expected within the same number of random nine-mers with a similar amino acid couple distribution. For example, assume a sequence of 308 amino acids (300 overlapping nine-mers), with four computed HLA A*0201 epitopes. If a set of 300 random nine-mers with a similar amino acid distribution is expected to have 10 epitopes, the SIR score of the sequence for HLA A*0201 would be 0.4. The SIR score of a gene in a population is defined as the average SIR score over all HLAs, weighted by the HLA frequency in this population. In the current analysis, we have tested the Sub-Saharan population, the West-African population (Dogon and Mandenka) and the all-human population. An average SIR score of less than 1 represents an under-presentation of epitopes, whereas an average SIR score of more than 1 represents an over-presentation of epitopes.
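The definition above reduces to a ratio followed by a frequency-weighted average. The following sketch reproduces the worked example (four predicted epitopes against 10 expected gives 0.4). The function names are hypothetical, and the second allele name and all frequencies in the usage note are made-up illustrative values, not data from the study.

```python
from typing import Dict


def sir_score(predicted_epitopes: float, expected_random_epitopes: float) -> float:
    """SIR score for one sequence and one HLA allele: predicted / expected at random."""
    if expected_random_epitopes <= 0:
        raise ValueError("expected number of random epitopes must be positive")
    return predicted_epitopes / expected_random_epitopes


def population_sir(sir_by_allele: Dict[str, float], allele_frequency: Dict[str, float]) -> float:
    """Average SIR score over alleles, weighted by allele frequency in a population.

    Frequencies are renormalized over the alleles actually scored.
    """
    total = sum(allele_frequency[a] for a in sir_by_allele)
    if total <= 0:
        raise ValueError("allele frequencies must sum to a positive value")
    return sum(sir_by_allele[a] * allele_frequency[a] for a in sir_by_allele) / total
```

With `sir_score(4, 10)` for HLA-A*0201 and a hypothetical second allele scoring 1.0, illustrative frequencies of 0.25 and 0.75 would give a population SIR score of 0.85, which would be read as an under-presentation of epitopes.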
Predicted CTL epitopes are nine-mers fulfilling the following three criteria: (A) Production through proteasomal cleavage. In other words, a given peptide can potentially be presented if its extreme and flanking residues enhance proteasomal cleavage and if it is not cleaved in its center. (B) Transport through the TAP machinery to the endoplasmic reticulum. (C) Presentation by MHC-I (Fig. 1). We have used algorithms to predict all peptides within a protein successfully passing all these stages [13–15]. All algorithms used here were validated using a quality assurance process against seven different databases of experimentally observed epitopes. The validation process ensured that the error levels were low enough to allow a systematic analysis of the repertoire (http://peptibase.cs.biu.ac.il/peptibase/validation.htm).
These algorithms were systematically applied using a nine-amino-acid sliding window to all nine-mers within each HIV and SIV protein. A fraction of the epitopes is transported through a TAP-independent pathway. As we performed a comparative analysis, we ignored this fraction, assuming that their statistics resemble those of TAP-dependent epitopes and that they should have only a minor effect on the total epitope number. We have ignored octamers and decamers for the same reasons. We estimated the number of epitopes from viral proteins on the nine most common HLA-A, nineteen HLA-B and three HLA-C alleles, whose binding motifs are well defined. These combined HLA alleles are present in 80–90% of the human population.
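The sliding-window procedure with its three successive filters can be sketched as below. The three predictors are passed in as placeholder callables; they stand in for the cleavage, TAP and MHC-binding algorithms cited above and are not implementations of them. All function and parameter names are hypothetical.

```python
from typing import Callable, Iterator, List, Tuple


def nine_mers(protein: str, k: int = 9) -> Iterator[Tuple[int, str]]:
    """Yield (start, peptide) for every overlapping window of length k."""
    for start in range(len(protein) - k + 1):
        yield start, protein[start:start + k]


def predicted_epitopes(
    protein: str,
    is_cleaved: Callable[[str, int], bool],  # sees the whole protein, since flanking residues matter
    passes_tap: Callable[[str], bool],
    binds_mhc: Callable[[str], bool],
) -> List[Tuple[int, str]]:
    """Keep the nine-mers that pass cleavage, TAP transport and MHC-I binding, in that order."""
    return [
        (start, peptide)
        for start, peptide in nine_mers(protein)
        if is_cleaved(protein, start) and passes_tap(peptide) and binds_mhc(peptide)
    ]
```

A protein of 308 residues yields 300 windows. Any real use would replace the callables with validated predictors; a toy binding rule is enough to exercise the control flow but says nothing about actual epitopes.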
We computed the SIR scores of all HIV-1, HIV-2, SIVcpz, SIVsmm and SIVmac sequences in the Los Alamos National Laboratory (LANL) and National Center for Biotechnology Information (NCBI) databases. The average Sub-Saharan population HIV SIR score (i.e. the averaged SIR score of all HIV protein sequences weighted by their length) is significantly lower than 1 (SIR = 0.7, t-test, P < 0.0025). Similar results were obtained taking into account the all-human population frequencies (SIR = 0.75, P < 0.0004) (Fig. 2), showing a clear tendency of HIV to evade immune detection. Furthermore, SIVcpz, which is the ancestor of HIV-1, has a higher SIR score than the LANL HIV-1 ancestral sequence, which is in turn higher than the average SIR score of all HIV-1 sequences, most of them recent (Fig. 2). A similar trend was obtained when comparing either the West-African population-based SIR score or the all-human population-based SIR score of HIV-2 with those of SIVsmm and SIVmac (Fig. 2). The main reason for the decreasing SIR score in the HIV population is probably the ‘selection pressure’ exerted on HIV by the immune system and the establishment of escape variants. The role of the immune pressure is supported by the more significant SIR score decrease in HLA-B alleles, known to produce a stronger immune pressure on HIV [18–20]. Although the SIR score of the HLA-A alleles did not change significantly between the SIVcpz and the HIV sequences (ΔSIR = 0.0453, t-test, P = 0.1), the change averaged over HLA-B alleles was significant (ΔSIR = 0.103, P < 0.05).
To study the evolution of HIV epitopes in the human population, we studied the dynamics of the SIR score with respect to the year the HIV sequence was recorded. As we have no precise sequencing time, we assumed that the reporting year was highly correlated with the sequencing year. The LANL HIV sequences (4140 sequences) were grouped according to the year of report. When all HLA alleles were considered, the correlation between the report year and the SIR score was small and of borderline significance (Pearson's correlation, r = −0.033, P = 0.0541). However, when only HLA-B alleles were considered, a small but significant decrease was observed (r = −0.0869, P < 1 × 10^−7). This negative correlation represents a net decrease of less than 5% in the average SIR score, much less than one would expect if HIV systematically evolved to remove its epitopes on all proteins. A similar analysis performed using a different methodology also did not report a significant change in the total HIV epitope density over time. This small decrease is the result of opposing tendencies of different genes, as will be further discussed.
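The correlation between report year and SIR score is an ordinary Pearson coefficient, which can be computed as in the sketch below. The function name is hypothetical and the numbers in the usage note are invented for illustration; the P-values quoted above would additionally require a significance test, which is not shown here.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, at least 2 points each")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)
```

Calling `pearson_r` on a list of report years and the matching list of per-sequence SIR scores returns a negative value when the score tends to decrease over time, as with the made-up series years = [1985, 1990, 1995, 2000] and scores = [0.8, 0.7, 0.75, 0.65].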
If epitopes are indeed systematically positively or negatively selected in HIV, we expect to see a signature of this process in point mutations within epitopes. We created a consensus of all the HIV sequences used in the current analysis and compared all sequences to the consensus. We then compared the fraction of replacement (nonsynonymous, R) and silent (synonymous, S) mutations. The R/(R + S) ratio was compared within and outside predicted epitopes on the same proteins. As we do not know the HLA serotyping of each host, we defined an average R/(R + S) as the weighted R/(R + S) average, based on the appropriate HLA allele frequency in that population (we tested the Sub-Saharan and all-human populations for HIV-1). As expected, the average R/(R + S) ratio is significantly higher inside epitopes than outside epitopes (Sub-Saharan population, 0.94 versus 0.74, two-sample, two-sided t-test, P < 0.01; similar results for the all-human population). The correlation between epitopes and variability has previously been reported on experimentally measured epitopes on some HIV genes.
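For a single HLA allele, the comparison inside and outside predicted epitopes amounts to counting mutations by region and by type. The sketch below assumes mutations are given as (amino acid position, kind) pairs and epitopes as nine-mer start positions; these data structures and the function name are hypothetical, and the weighting over alleles and the t-test are not shown.

```python
from typing import Iterable, Optional, Tuple


def replacement_fraction_by_region(
    mutations: Iterable[Tuple[int, str]],  # (amino acid position, 'replacement' or 'silent')
    epitope_starts: Iterable[int],
    k: int = 9,
) -> Tuple[Optional[float], Optional[float]]:
    """Return R/(R + S) inside and outside predicted epitopes.

    A region with no mutations at all yields None.
    """
    covered = set()
    for start in epitope_starts:
        covered.update(range(start, start + k))

    counts = {"inside": [0, 0], "outside": [0, 0]}  # [R, S] per region
    for position, kind in mutations:
        if kind not in ("replacement", "silent"):
            raise ValueError(f"unknown mutation kind: {kind!r}")
        region = "inside" if position in covered else "outside"
        counts[region][0 if kind == "replacement" else 1] += 1

    def fraction(r: int, s: int) -> Optional[float]:
        return r / (r + s) if (r + s) else None

    return fraction(*counts["inside"]), fraction(*counts["outside"])
```

A higher first value than second value is the pattern the text describes as the signature of selection acting on epitopes.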
We then separated the HIV-1-M(B) CTL epitope repertoire into its nine genetic components and computed the appropriate SIR score for each protein. A clear and logical hierarchy of the SIR score emerges. Regulatory proteins (Tat and Rev) have very few epitopes left already at the HIV ancestral sequence. Their SIR score is close to the basal level induced by the errors inherent to the prediction algorithms. The ‘late’ virion-associated proteins (Gag, Pol and Env) are found at the top of the list. Accessory proteins have an intermediate SIR score (Fig. 3). This is in good agreement with the observed critical role of Tat-specific CTL in stopping the acute infection stage and with the differential total variability of the HIV genes [23,24]. Epitopes from proteins that are produced during the first stages of HIV gene expression after cell entry (e.g. Tat and Rev) are exposed early to CTLs. Their detection by CTLs can lead to cellular destruction long before new virions are produced. Epitopes from proteins presented later in the infection may not critically impair HIV, as some virions may have the time to bud before the cell is destroyed. The same trend was observed using the Sub-Saharan allele frequency (Fig. 3). One can actually observe similar results at the single-allele level when looking at frequent HLA alleles (that are probably highly sampled in the unserotyped population) (Supplementary material, Figure S1).
The evolution of HIV must therefore be tested at the single protein level. Some proteins have reduced their SIR score upon entry of HIV into the human population (e.g. Tat and Rev). Other genes, such as Nef, significantly decrease their SIR score over time, whereas others, such as Gag and Pol, do not. One could raise the possibility that it is advantageous for HIV to increase the exposure of Gag, as it may serve as a decoy for the immune system, removing the pressure from the most critical early components. As Gag is expressed at high levels and contains immunodominant epitopes [25–27], it attracts a massive immune response. When the immune system reaches an equilibrium (i.e. after the first acute phase), the Gag immune response competes with other clones for CD8+ T-cell activation and thus may prevent the response to the critical Tat and Rev genes. If this is indeed the case, one expects a continuous increase in Gag's SIR score as a function of time. We have created the phylogenetic tree of the Gag protein based on 465 LANL sequences (Supplementary material methodology). The average SIR scores of the Gag sequences in the intermediate nodes are positively correlated with their level, and they are higher than one in the last levels (Fig. 4). In other words, as Gag evolves from its ancestor sequences, its SIR score increases.
The average SIR results may be affected by the HLA allele usage in the population or by sampling effects on the HLA usage of HIV carriers. We have repeated the analysis using a smaller number of sequences, in which the host HLA serotyping is known. These sequences are the aggregated results of all HIV-1 clade B cohorts available in the LANL database. For these HIV sequences, we computed the SIR scores in the HLA allele of their host and used protein/HLA combinations for which at least 10 sequences were available to obtain reliable averages. As was observed in the average analysis, Tat and Rev systematically have low SIR scores, whereas Gag has high ones (Fig. 5). Interestingly, when comparing the SIR of HIV genes on the HLA of their host with the average SIR score of all sequences (i.e. serotyped and unserotyped sequences) of the same gene on the same HLA allele, the weighted average of the difference is significantly negative for the regulatory genes Tat and Rev and significantly positive for Gag (R = 0.2, P < 0.0001) (Supplementary material, Figure S2). One can thus clearly see from multiple angles a positive selection of epitopes in Gag and a negative selection of such epitopes in regulatory genes in hosts with matching HLA.
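The comparison between host-matched and overall SIR scores can be sketched as a grouped difference of means with the minimum of 10 sequences per protein/HLA combination. The record layout, function name and allele labels are hypothetical; the weighting of the differences and the significance test reported above are not reproduced.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str]  # (protein, HLA allele)


def host_minus_overall(
    host_records: Iterable[Tuple[str, str, float]],  # (protein, host HLA allele, SIR on that allele)
    overall_mean: Dict[Key, float],  # mean SIR of all sequences for the same protein and allele
    min_sequences: int = 10,
) -> Dict[Key, float]:
    """Mean host-matched SIR minus overall mean SIR, per protein/HLA combination.

    Combinations with fewer than `min_sequences` serotyped sequences are dropped.
    """
    groups = defaultdict(list)
    for protein, allele, sir in host_records:
        groups[(protein, allele)].append(sir)
    return {
        key: mean(values) - overall_mean[key]
        for key, values in groups.items()
        if len(values) >= min_sequences and key in overall_mean
    }
```

A negative entry for a gene would correspond to fewer predicted epitopes on the host's own allele than on average, and a positive entry to more.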
The SIR score only provides statistical information. In order to translate these results to the properties of specific epitopes, we tracked the HLA-A*0201 Gag, Pol and Tat epitopes over the past 30 years. HLA-A*0201 is the most frequent allele, and a large number of experimental epitopes were tested on this allele. Tat has no epitopes in the ancestral sequence and only two epitopes in all early sequences: QPLQIVVIV and VPIAIVKSV. They both appear in a single sequence and disappear over time. Some transient epitopes appear but only briefly. Not only do the epitopes themselves disappear, but also their position in the genome is highly mutated. By the end of our sampling, it is completely different (only 30% similarity). Interestingly, once the epitopes disappeared, this position remained conserved. Rev evolution is very similar to Tat, showing a systematic elimination of the epitopes.
A very different trend was observed in Gag and Pol. Gag originally has a large number of computed epitopes, including the immunodominant SLYNTIATL. The original epitopes are actually highly conserved over time, and a good alignment to the original epitopes can be found in 85% of the sequences in the last recorded period. Pol demonstrates a similar epitope evolution to Gag, showing again how differently Tat and Rev evolve compared with Gag and Pol.
A major obstacle preventing the development of useful anti-HIV T-cell vaccines is the rapid mutation of HIV's CTL epitopes. Significant efforts have been directed toward the analysis of specific epitopes or the detection of ‘robust’ epitopes. Within this analysis, Gag epitopes played a major role. The immune response to Gag was indeed correlated with a lower viremia. We have here extended the existing analysis of specific epitopes, most of them on frequent HLA alleles such as HLA-A*0201, to a systematic analysis of the HIV CTL epitope repertoire. We found that the number of epitopes in HIV was initially low and further decreased over time, but that this decrease is very minor. The decrease is mainly associated with HLA-B alleles, and it reaches an average of 10% from SIV to current HIV sequences. This limited decrease results from the opposing tendencies of ‘early’ and ‘late’ genes. The regulatory Rev and Tat have significantly reduced their SIR score upon the transition to HIV. They have a very low SIR score, leaving no room for further evolution. Late genes, such as the virion-associated Gag, Env and Pol, maintained the overall number of epitopes through HIV evolution. The Gag SIR score has even increased in recent years to more than 1. Recently, we observed a similar trend in the herpesviridae proteins. This difference has been experimentally observed through the critical role of Tat-specific CTL in the early stages of infection in single patients. This response is then lost through epitope mutation and replaced by the immunodominant Gag response [25–27]. CTL may attempt to detect the small amounts of Gag expressed by the infecting virus, before the initiation of HIV protein transcription. Given the limited amounts of Gag proteins at this stage, such attempts are probably of limited success.
The SIR score difference between regulatory and virion-associated HIV genes raises the intriguing possibility that HIV may actually evolve to direct the immune response toward Gag, rather than toward the more essential earlier genes. If an immune response is raised against proteins expressed only during capsid formation and budding, the appropriate CTLs may not have the window of time to destroy the cell before at least a few HIV virions exit the cell. This prospect may hint that Gag is not an ideal target for immunotherapy. Note that in contrast with these results, there is considerable evidence that an anti-Gag response is associated with a low viremia. A possible explanation for this discrepancy is the complex dynamics affecting the immune system–virus steady state. These complex dynamics obviously affect the host survival but may not affect the short time scale dynamics of the virus. These short time scale dynamics are probably the main factor driving the HIV epitope evolution.
The global decrease in the number of HIV CTL epitopes was accompanied, as expected, by an increased R/S ratio inside epitopes, revealing not only the end product of the selection but also at least one of the driving mechanisms reducing the epitope number.
The orderly life cycle is not the only factor affecting the detection of epitopes by the immune system. Other critical factors are the RNA expression level, the ubiquitylation rate and the subcellular localization of proteins, as well as structural and functional constraints on their amino acid usage. Beyond the obvious life cycle effect, one could indeed observe finer differences, for example, between Gag, Env and Pol, as was recently reported. The current analysis is only a first-order analysis; the analysis of more complex aspects of the immune system–HIV interaction will require more detailed information on the HLA alleles of HIV-carrying hosts and the effect of epitope immunodominance.
The work of Y.L. and T.V.-S. was sponsored by NIH grant AI61062-01.
1. Cullen BR. Regulation of HIV-1 gene expression. Faseb J 1991; 5:2361–2368.
2. Pomerantz RJ, Seshamma T, Trono D. Efficient replication of human immunodeficiency virus type 1 requires a threshold level of Rev: potential implications for latency. J Virol 1992; 66:1809–1813.
3. Fukumori T, Kagawa S, Iida S, Oshima Y, Akari H, Koyama AH, Adachi A. Rev-dependent expression of three species of HIV-1 mRNAs [review]. Int J Mol Med 1999; 3:297–302.
4. Vicenzi E, Poli G. Regulation of HIV expression by viral genes and cytokines. J Leukoc Biol 1994; 56:328–334.
5. Steffy K, Wong-Staal F. Genetic regulation of human immunodeficiency virus. Microbiol Rev 1991; 55:193–205.
6. Letvin NL, Schmitz JE, Jordan HL, Seth A, Hirsch VM, Reimann KA, Kuroda MJ. Cytotoxic T lymphocytes specific for the simian immunodeficiency virus. Immunol Rev 1999; 170:127–134.
7. Negri DR, Borghi M, Baroncelli S, Macchia I, Buffa V, Sernicola L, et al. Identification of a cytotoxic T-lymphocyte (CTL) epitope recognized by Gag-specific CTLs in cynomolgus monkeys infected with simian/human immunodeficiency virus. J Gen Virol 2006; 87:3385–3392.
8. Lichterfeld M, Yu XG, Le Gall S, Altfeld M. Immunodominance of HIV-1-specific CD8(+) T-cell responses in acute HIV-1 infection: at the crossroads of viral and host genetics. Trends Immunol 2005; 26:166–171.
9. Fields BN, Knipe DM, Howley PM, Griffin DE. Fields virology. 4th ed. Philadelphia: Lippincott Williams & Wilkins; 2001.
10. Rock KL, York IA, Saric T, Goldberg AL. Protein degradation and the generation of MHC class I-presented peptides. Adv Immunol 2002; 80:1–70.
11. Vider-Shalit T, Fishbain V, Raffaeli S, Louzoun Y. Phase-dependent immune evasion of herpesviruses. J Virol 2007; 81:9536–9545.
12. Ginodi I, Vider-Shalit T, Tsaban L, Louzoun Y. Precise score for the prediction of peptides cleaved by the proteasome. Bioinformatics
13. Louzoun Y, Vider T. Score for proteasomal peptide production probability. Immunology 2004:1.
14. Peters B, Bulik S, Tampe R, Endert PMV, Holzhutter HG. Identifying MHC class I epitopes by predicting the TAP transport efficiency of epitope precursors. J Immunol 2003; 171:1741–1749.
15. Parker KC, Bednarek MA, Coligan JE. Scheme for ranking potential HLA-A2 binding peptides based on independent binding of individual peptide side-chains. J Immunol 1994; 152:163–175.
16. Yewdell JW, Snyder HL, Bacik I, Anton LC, Deng Y, Behrens TW, et al. TAP-independent delivery of antigenic peptides to the endoplasmic reticulum: therapeutic potential and insights into TAP-dependent antigen processing. J Immunother 1998; 21:127–131.
17. Kuiken C, Korber B, Shafer RW. HIV sequence databases. AIDS Rev 2003; 5:52–61.
18. Bihl F, Frahm N, Di Giammarino L, Sidney J, John M, Yusim K, et al. Impact of HLA-B alleles, epitope binding affinity, functional avidity, and viral coinfection on the immunodominance of virus-specific CTL responses. J Immunol 2006; 176:4094–4101.
19. Kiepiela P, Leslie AJ, Honeyborne I, Ramduth D, Thobakgale C, Chetty S, et al. Dominant influence of HLA-B in mediating the potential co-evolution of HIV and HLA. Nature 2004; 432:769–775.
20. Specht A, Degottardi MQ, Schindler M, Hahn B, Evans DT, Kirchhoff F. Selective downmodulation of HLA-A and -B by Nef alleles from different groups of primate lentiviruses. Virology 2007; 373:229–237.
21. Schmid B, Kesmir C, de Boer RJ. The specificity and polymorphism of the MHC class I prevents the global adaptation of HIV-1 to the monomorphic proteasome and TAP. PLoS ONE 2008; 3:e3525.
22. Yusim K, Kesmir C, Gaschen B, Addo MM, Altfeld M, Brunak S, et al. Clustering patterns of cytotoxic T-lymphocyte epitopes in human immunodeficiency virus type 1 (HIV-1) proteins reveal imprints of immune evasion on HIV-1 global variation. J Virol 2002; 76:8757–8768.
23. Addo MM, Yu XG, Rosenberg ES, Walker BD, Altfeld M. Cytotoxic T-lymphocyte (CTL) responses directed against regulatory and accessory proteins in HIV-1 infection. DNA Cell Biol 2002; 21:671–678.
24. Betts MR, Yusim K, Koup RA. Optimal antigens for HIV vaccines based on CD8+ T response, protein length, and sequence variability. DNA Cell Biol 2002; 21:665–670.
25. Kaufmann DE, Bailey PM, Sidney J, Wagner B, Norris PJ, Johnston MN, et al. Comprehensive analysis of human immunodeficiency virus type 1-specific CD4 responses reveals marked immunodominance of gag and nef and the presence of broadly recognized peptides. J Virol 2004; 78:4463–4477.
26. Frahm N, Korber BT, Adams CM, Szinger JJ, Draenert R, Addo MM, et al. Consistent cytotoxic-T-lymphocyte targeting of immunodominant regions in human immunodeficiency virus across multiple ethnicities. J Virol 2004; 78:2187–2200.
27. Gao X, Bashirova A, Iversen AK, Phair J, Goedert JJ, Buchbinder S, et al. AIDS restriction HLA allotypes target distinct intervals of HIV-1 pathogenesis. Nat Med 2005; 11:1290–1292.
28. Geldmacher C, Currier JR, Herrmann E, Haule A, Kuta E, McCutchan F, et al. CD8 T-cell recognition of multiple epitopes within specific Gag regions is associated with maintenance of a low steady-state viremia in human immunodeficiency virus type 1-seropositive patients. J Virol 2007; 81:2440–2448.
29. Sacha JB, Chung C, Rakasz EG, Spencer SP, Jonas AK, Bean AT, et al. Gag-specific CD8+ T lymphocytes recognize infected cells before AIDS-virus integration and viral protein expression. J Immunol 2007; 178:2746–2754.
30. Brumme ZL, Brumme CJ, Heckerman D, Korber BT, Daniels M, Carlson J, et al. Evidence of differential HLA class I-mediated viral evolution in functional and accessory/regulatory genes of HIV-1. PLoS Pathog 2007; 3:e94.
31. Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, et al. Clustal W and Clustal X version 2.0. Bioinformatics