HIV-1 infection continues to spread rapidly throughout the world at a rate of 15,000 new infections every day. According to the UNAIDS/WHO (December 2008) report, an estimated 33 million people are living with the virus worldwide.1
HIV-1 is characterised by high rate of genetic variability, a rapid evolution, and diversification. It has been acknowledged that genetic recombination within and between different subtypes, a high rate of reverse transcriptase (RT) errors, and rapid turnover contribute significantly to HIV-1 variability.2-5 HIV-1 can be classified into 9 approximately equidistant phylogenetic subtypes (A-K) and a number of circulating recombinant forms.6-9
In Western Europe, where HIV-1 subtype B predominates, the epidemic showed a rapid exponential growth in the early stages that has subsequently declined.10-12 HIV incidence peaked around 1983 among men-having-sex-with-men (MSM) and in 1988 among injection drug users (IDUs). Between 1997 and 2002, HIV diagnoses decreased among injecting drug users but increased greatly in subjects infected through heterosexual or homosexual contact.13 MSM were the most frequently diagnosed risk group until 1998 and remained the group at highest risk of acquiring HIV in Western Europe14; however, since then, heterosexuals have been more frequently diagnosed.
The HIV epidemic in Italy, initiate by HIV-1 subtype B, entered with IDUs who (including homosexual IDUs) accounted for 57% of all of the AIDS cases reported in the adult population between 1982 and 2007; in the same period, MSM accounted for an increasing percentage ranging from 15% in 1996-1997 to 22% in 2006-2007.15
Many studies have exploited the rapid evolution of HIV and used statistical methods to reconstruct its origins and evolutionary dynamics on the basis of viral sequence data16-19 We used a recently described Bayesian coalescent-based framework20 to reconstruct the epidemiological history of the HIV-1 subtype B epidemic in Italy. To this aim, we estimated the evolutionary rates and dates of origin of the epidemic and identified and characterised a number of epidemiological clusters from a dataset of pol gene sequences isolated from MSM living in Rome.
PATIENTS AND METHODS
Patients, Dataset, Phylogenetic, and Demographic Analysis
The dataset consisted of 125 nonrecombinant HIV-1 subtype B pol sequences (1233 nucleotides long) encompassing protease and RT coding regions. The isolates were obtained from a prevalent cohort of HIV-1-infected MSM, who attended the sexually transmitted infection/HIV Clinic of the San Gallicano Institute Dermatological in Rome between 2004 and 2008. All samples were collected at their first HIV-positive test, or, if diagnosed as HIV-1 infected before January 2004, at the first follow-up consultation within the study period. Demographic and clinical data were collected anonymously in accordance with the standards of the Institute's Ethics Committee. Their median age was 36 years (interquartile range: 30-40.25), and 21 (16.8%) were non-Italians. At the moment of the sample collection, 30 patients were in treatment with highly active antiretroviral therapy. In an online Table, the clinical characteristics of the patients were reported (see Table, Supplemental Digital Content 1, http://links.lww.com/QAI/A70).
Pol Gene Sequencing
Viral RNA was extracted from plasma samples using a Roche Amplicor HIV-1 Monitor and Ultrasensitive Specimen Preparation kit (alcohol extraction). After reverse transcriptase-polymerase chain reaction, the protease and RT coding regions of the pol gene were amplified using the Viroseq HIV-1 Genotyping System (Celera Diagnostics, Alameda, CA), and the amplified products were sequenced using Big Dye terminators and an ABI 310 automated sequencer (Applied Biosystems Inc, Foster City, CA), and analyzed using Viroseq analysis software (Applied Biosystems Inc).
The sequences used in this study were deposited in GenBank with assigned accession numbers (from FJ228036 to FJ228136).
All HIV pol sequences were analyzed using REGA HIV-1 subtyping tool21 and aligned removing gaps and cutting to identical sequence lengths using ClustalX software.22 The manual editing was performed using Bioedit software (http://www.mbio.ncsu.edu/BioEdit/bioedit.html).
The Bayesian phylogenetic tree was reconstructed by means of MrBayes23 using a general-time-reversible model of nucleotide substitution, a proportion of invariant sites, and gamma-distributed rates among sites. A Markov Chain Monte Carlo search was made for 5 × 106 generations using tree sampling every 100th generation and a burn-in fraction of 50%. Statistical support for specific clades was obtained by calculating the posterior probability of each monophyletic clade, and a posterior consensus tree was generated after a 50% burn-in. The epidemiological clusters were recognized on the basis of the same tree, and posterior probability was used as a statistical support for each clade. Clades with a posterior probability of 1 were considered epidemiological clusters.
The dated trees, evolutionary rates, and population growth were co-estimated by using a Bayesian Markov Chain Monte Carlo approach (Beast version 1.4.8 http://beast.bio.ed.ac.uk)24 implementing a general-time-reversible + Invariant + Gamma model. The posterior consensus tree obtained by means of MrBayes was used as the starting tree. As coalescent priors, different parametric demographic models (a constant population size and exponential and logistic growth) and a nonparametric Bayesian skyline plot (BSP) were compared under strict and relaxed clock conditions, and the best models were selected by means of a Bayes factor (BF, using marginal likelihoods) implemented in Beast.25 In accordance with Kass and Raftery,26 the strength of the evidence against H0 was evaluated as follows: 2lnBF <2 no evidence; 2-6 weak evidence; 6-10 strong evidence; >10 very strong evidence. A negative value indicates evidence in favour of H0. Only values of ≥6 were considered significant.
Chains were conducted for at least 50 × 106 generations and sampled every 5000 steps. Convergence was assessed on the basis of the effective sampling size after a 10% burn-in using Tracer software version 1.4 (http://tree.bio.ed.ac.uk/software/tracer/). Only parameter estimates with effective sampling size's of >200 were accepted. Uncertainty in the estimates was indicated by 95% highest posterior density (95% HPD) intervals. The trees were summarised in a target tree by the Tree Annotator program included in the Beast package by choosing the tree with the maximum product of posterior probabilities (maximum clade credibility) after a 10% burn-in.
To avoid the influence of convergent evolution due to the drug associated selection pressure, we confirmed the existence and significance of the clades by performing an analysis partitioned into codon positions (1 + 2 and 3) and considering only the third codon position at the same conditions of the nonpartitioned analysis.
The Bayesian phylogenetic tree reconstructed using the 125 Italian MSM isolates showed a number of highly significant clades (see Figure, Supplemental Digital Content 2, http://links.lww.com/QAI/A71), and a total of 10 clades with posterior probabilities = 1 (a-l) were identified when the analysis was limited to the clades including more than 2 isolates. The clades were made up of between 9 isolates (clade e) and 3 isolates (clades f, g, h, i, and l). All these clades were also highly significant after limiting the analysis to the third codon position.
The genetic distances between clades was always ≥1.7% (from 1.7 to 3.2), and the median intraclade distance was 0.6% (range 0%-1.7%). Two groups of clades were identified as follows: 1 group (clades l, i, c, and f) whose intraclade distance was less than the median (range 0%-0.4%) and another (clades a, b, d, e, g, and h) whose genetic distance was the same as or more than the median (range 0.6%-1.7%) (see Table, Supplemental Digital Content 3, http://links.lww.com/QAI/A72).
Figure 1 shows the dated phylogeny. Analyses of the noncontemporary sequences under a relaxed (log normal) or strict molecular clock model showed that the first performed better than the second, as assessed by the Bayes factor (2lnBF = 128.9). A mean evolutionary rate of 2.6 × 10−3 (range 1.4-3.9 × 10−3) substitutions/site/year was estimated for the considered HIV-1 pol sequences and, on the basis of this evolutionary rate, we estimated the time of the most common recent ancestor (tMRCAs) of the entire tree: in particular, the root of the tree was placed about 25.3 before 2008 (95% HPD 15.5-37.5).
Analysis of the tMRCAs of the different clades showed that some clades (a, b, d, e, and h) originated more than 10 years ago (between 1990 and 1995, 17.8-12.3 years before 2008), whereas the others (c, f, g, I, and l) began to radiate between 1999 and 2004 (8.8-3.7 years before 2008). There was a significant positive correlation between the tMRCA and the intraclade genetic distance (Pearson correlation r = 0.964, P < 0.01).
Table 1 shows the mean clade tMRCA estimates and the main demographic characteristics of the patients included in the clades. There were no statistically significant differences between the patients' characteristics in the different clades, but there was a trend toward a younger age (mean birth year 1974 versus 1970; Pearson correlation between tMRCA and birth year r = 0.33, P = 0.03) and a not statistically significant higher frequency of foreign nationality (6 of 16, 37.5% versus 4 of 27, 14.8%; P = 0.08 by χ2 test) in the clades with a tMRCA of <10 years.
Assuming that the distance between 2 adjacent tMRCAs represents the upper boundary of the time between 2 transmission events, analysis of the times estimated for each node in the major clades (a and e) showed that the median time between transmission events was 2.1 years (0.8-4.7) in clade e and 2.5 years (range: 1.5-14.8) in clade a.
Bayes factor comparison of the parametric demographic models showed that a logistic growth model fitted the data better than constant population size (2lnBF = 9.3) or exponential growth (2lnBF = 935.6) (see Table, Supplemental Digital Content 4, http://links.lww.com/QAI/A73). We estimated a high exponential growth rate (r = 1.54 year−1, HPD95% 1.03-2.13), corresponding to a duplication time of 0.448 years (5.8 months). Bayes factor comparison of the parametric and nonparametric models showed that the BSP fitted the data better than the logistic model (2lnBF = 632).
Figure 2 shows the changes in population size at different times from the root of the tree to the time of the most recent isolates. The effective number of infections exponentially grew from 1982 to the early 1990s, flattened out until about 2000, and then decreased until about 2003.
MSM continue to be the population group most at risk of acquiring HIV in high-income countries. Unprotected sex between men still accounts for more than 50% of new HIV diagnoses in the United States and about 40% in Canada.1,27,28 In Western Europe, the number of diagnoses in MSM has increased over the last few years, probably together with an increase in risk behaviors among homosexual and bisexual men.1,13 The HIV epidemic in Italy, which was originally supported by HIV-1 subtype B, entered with IDUs who (including homosexual IDUs) accounted for 57% af all of the AIDS cases reported.29
The main aim of this study was to investigate the evolutionary and demographic history of the HIV-1 B subtype in an Italian cohort of MSM using a Bayesian coalescent-based framework. Dated phylogeny allowed us to estimate a tMRCA of about 25 years for the root of the tree, thus suggesting that the HIV-1B strains currently circulating in Italy originated in the early 1980s, about 20 years later than the mean coalescent-based tMRCA estimate for subtype B in high-income countries as a whole (1959-1962).12
After excluding clades consisting of only 2 sequences, our Bayesian phylogenetic analysis of 125 pol sequences revealed the existence of 10 clades supported by high degree of statistical significance and including between 3 and 9 individuals.
A number of authors have recently attempted the time-scale reconstruction and characterization of epidemiological clusters among individuals infected with HIV-1 probably through sexual route (particularly MSM) using molecular clock-based approaches.10,30 The identification of several clades among MSM living in the United Kingdom has made it possible to conclude that the currently circulating subtype B strains were introduced by multiple subepidemics, and a similar conclusion can be drawn from our data. Although smaller than that analyzed by Huè,10 our sequence data set was obtained from a restricted and homogeneous population of MSM living in Rome and attending one clinical centre.
The tMRCA estimates in our study showed that the clades fall into 2 groups. The first, which originated in the late 1980s and early 1990s, includes the largest clades (a and e) and has the highest level of intraclade genetic heterogeneity, which suggests that the clades represent strains that accumulated a large number of mutations during the time. This hypothesis is supported by the positive correlation between the tMRCA estimates and genetic diversity. The second group of clades, with lower intraclade distances, originated between 1998 and 2003, which is in line with the evidence of a re-emergence of the HIV epidemic in MSM between 1996 and 2005 due to increased risk behaviors among MSM in Western Europe,13,31 including Italy.29
We did not find any significantly supported clades originating before 1990, probably because of the high rate of mortality among HIV-infected subjects before the introduction of highly active antiretroviral therapy and the fact that the samples were collected between 2004 and 2008.
Analysis of the demographic characteristics of the patients showed that the more recent clades had a trend through an increasing proportion of foreign subjects, which suggests the introduction of new HIV-1 B strains, as has been reported in relation to the recent introduction of new HIV-1 variants from different geographic areas.32,33
In a recent analysis of the internode intervals of the dated clades, Lewis et al30 estimated a median time of 14 months between transmission events, whereas we estimated a median of 30 months. However, as our sample size was smaller, it is highly probable that only a small fraction of the actual members of a epidemiological cluster were sampled and so the actual average time between events should be shorter than our estimate.
One possible bias in the identification of epidemiological clusters is represented by the effects of convergence due to drug therapy-associated selection pressure, but we did not find any major drug-resistant mutations in our clades and the partitioned analysis limited to the third codon position, not subjected to the selection pressure, gave similar results to the nonpartitioned one.
On the basis of the dated Bayesian phylogeny, we made a demographic analysis using 2 different approaches: one parametric and the other a nonparametric BSP. This analysis showed that the epidemic grew in Italy after a logistic model, which is in line with the findings of other authors concerning the expansion of the subtype B epidemic in high-income countries.34,35 The BSP showed that the epidemic was characterised by rapid exponential growth from the early 1980s (corresponding to the root of the tree) until the early 1990s. During this period, the effective number of infections increased rapidly at a rate of 1.5 years−1, which corresponds to a doubling time of about 6 months. This is a higher rate than that of previous estimates for Western populations,34-36 but similar to the rates observed by Huè in a number of epidemiological clusters among MSM in the United Kingdom.10 Our data suggest that HIV entered the Italian population later than in other countries but then spread rapidly to reach a high frequency of infection. The effective number of infections showed a 10-fold growth in a short time interval of only one decade, remained constant until the early 2000s and then decreased until 2003, when it reached a new plateau. This result is not in contrast with the appearance of the new epidemiological clusters during the period 1999-2003. In fact we know that in Western Europe the HIV diagnosis increased in homosexual and bisexual men in early 2000s, after a slow decline in the previous years.13 Probably, our demographic analysis reflects the trend of the epidemic of HIV-1 subtype B in the whole Italian population not only in MSM. Moreover, the recent replacement of the endemic subtype by new subtypes from other geographic areas32,33,37 may explain the decrease in the number of people infected with HIV-1 B.
In conclusion, our preliminary data show the importance of phylodynamics when reconstructing the origin and evolution of specific HIV epidemiological networks in different groups of subjects. Further studies are necessary to elucidate the demographic evolution of the other HIV subtypes more recently introduced into Europe to clarify the current trend of the infection among Italian MSM.
2. Ho DD, Neumann AU, Perelson AS, et al. Rapid turnover of plasma virions and CD4 lymphocytes in HIV-1 infection. Nature
3. Op de Coul EL, Prins M, Cornelissen M, et al. Using phylogenetic analysis to trace HIV-1 migration among western European injecting drug users seroconverting from 1984 to 1997. AIDS
4. Roberts JD, Bebenek K, Kunkel TA. The accuracy of reverse transcriptase from HIV-1. Science
5. Temin HM. Retrovirus variation and reverse transcription: abnormal strand transfers result in retrovirus genetic variation. Proc Natl Acad Sci U S A
6. Ayouba A, Souquieres S, Njinku B, et al. HIV-1 group N among HIV-1-seropositive individuals in Cameroon. AIDS
7. Gurtler LG, Hauser PH, Eberle J, et al. A new subtype of human immunodeficiency virus type 1 (MVP-5180) from Cameroon. J Virol
8. Louwagie J, McCutchan FE, Peeters M, et al. Phylogenetic analysis of gag genes from 70 international HIV-1 isolates provides evidence for multiple genotypes. AIDS
9. Simon F, Mauclere P, Roques P, et al. Identification of a new human immunodeficiency virus type 1 distinct from group M and group O. Nat Med
10. Hue S, Pillay D, Clewley JP, et al. Genetic analysis reveals the complex structure of HIV-1 transmission within defined risk groups. Proc Natl Acad Sci U S A
11. Pybus OG, Harvey PH. Testing macro-evolutionary models using incomplete molecular phylogenies. Proc Biol Sci
12. Walker PR, Pybus OG, Rambaut A, et al. Comparative population dynamics of HIV-1 subtypes B and C: subtype-specific differences in patterns of epidemic growth. Infect Genet Evol
13. Hamers FF, Downs AM. The changing face of the HIV epidemic in western Europe: what are the implications for public health policies? Lancet
14. McCutchan FE. Global epidemiology of HIV. J Med Virol
. 2006;78(Suppl 1):S7-S12.
15. Italian AIDS Operational Center C. Aggiornamento delle nuove diagnosi di infezioni da HIV e dei casi di AIDS in Italia. Notiziario Istituto Superiore di Sanità
16. Korber B, Muldoon M, Theiler J, et al. Timing the ancestor of the HIV-1 pandemic strains. Science
17. Lemey P, Salemi M, Wang B, et al. Site stripping based on likelihood ratio reduction is a useful tool to evaluate the impact of non-clock-like behavior on viral phylogenetic reconstructions. FEMS Immunol Med Microbiol
18. Salemi M, Strimmer K, Hall WW, et al. Dating the common ancestor of SIVcpz and HIV-1 group M and the origin of HIV-1 subtypes using a new method to uncover clock-like molecular evolution. Faseb J
19. Worobey M, Pitchenik AE, Gilbert MT, et al. Reply to Pape et al: the phylogeography of HIV-1 group M subtype B. Proc Natl Acad Sci U S A
20. Drummond AJ, Rambaut A, Shapiro B, et al. Bayesian coalescent inference of past population dynamics from molecular sequences. Mol Biol Evol
21. de Oliveira T, Deforche K, Cassol S, et al. An automated genotyping system for analysis of HIV-1 and other microbial sequences. Bioinformatics
22. Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res
23. Huelsenbeck JP, Ronquist F. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics
24. Drummond AJ, Rambaut A. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol
25. Suchard MA, Weiss RE, Sinsheimer JS. Bayesian selection of continuous-time Markov chain evolutionary models. Mol Biol Evol
26. Kass RE, Raftery AE. Bayes factors. J Am Stat Assoc
27. Hall HI, An Q, Hutchinson AB, et al. Estimating the lifetime risk of a diagnosis of the HIV infection in 33 states, 2004-2005. J Acquir Immune Defic Syndr
28. Hall HI, Song R, Rhodes P, et al. Estimation of HIV incidence in the United States. JAMA
29. Giuliani M, Di Carlo A, Palamara G, et al. Increased HIV incidence among men who have sex with men in Rome. AIDS
30. Lewis F, Hughes GJ, Rambaut A, et al. Episodic sexual transmission of HIV revealed by molecular phylodynamics. PLoS Med
31. Sullivan PS, Hamouda O, Delpech V, et al. Reemergence of the HIV epidemic among men who have sex with men in North America, Western Europe, and Australia, 1996-2005. Ann Epidemiol
32. Giuliani M, Montieri S, Palamara G, et al. Non-B HIV type 1 subtypes among men who have sex with men in Rome, Italy. AIDS Res Hum Retroviruses
33. Visco-Comandini U, Balotta C. Genotypic resistance tests for the management of the HIV-infected patient with non-B viral isolates. Scand J Infect Dis Suppl
34. Bello G, Eyer-Silva WA, Couto-Fernandez JC, et al. Demographic history of HIV-1 subtypes B and F in Brazil. Infect Genet Evol
35. Walker PR, Worobey M, Rambaut A, et al. Epidemiology: sexual transmission of HIV in Africa. Nature
36. Robbins KE, Lemey P, Pybus OG, et al. U.S. Human immunodeficiency virus type 1 epidemic: date of origin, population history, and characterization of early strains. J Virol
37. Buonaguro L, Tagliamonte M, Tornesello ML, et al. Genetic and phylogenetic evolution of HIV-1 in a low subtype heterogeneity epidemic: the Italian example. Retrovirology