Time-measured phylogenies of gag, pol and env sequence data reveal the direction and time interval of HIV-1 transmission

Rachinger, Andrea (a); Groeneveld, Paul HP (b); van Assen, Sander (c); Lemey, Philippe (d); Schuitemaker, Hanneke (a)

doi: 10.1097/QAD.0b013e3283467020
Basic Science: Concise Communication

Objective: To investigate whether time-measured phylogenetic analysis of longitudinal viral sequences can establish the direction and timing of HIV-1 transmission in an epidemiologically linked transmission cluster of three homosexual men.

Design: An HIV-1-infected homosexual man (patient 1) and his long-term HIV-negative partner (patient 2) engaged in a triangular relationship with an additional partner (patient 3). On the basis of phylogenetic analysis of gag sequences, patient 3 was previously identified as the source for superinfection of patient 1 but the source of HIV-1 infection of patient 2, who seroconverted during the triangular relationship, remained unclear. Here, we set out to analyze newly obtained gag, pol and env sequences from all three patients to fully elucidate the transmission history in this epidemiologically linked cluster.

Methods: Bayesian Markov Chain Monte Carlo (MCMC) phylogenetic analyses incorporating a relaxed clock model and a flexible Bayesian skyride tree prior were applied to the longitudinally obtained gag, pol and env sequences from all three patients.

Results: Our time-measured evolutionary reconstructions convincingly supported transmission of HIV-1 from the new partner patient 3 to both patients 1 and 2. In addition, estimates of viral divergence times assisted in narrowing down the transmission intervals delineated by seroconversion estimates.

Conclusion: Our analysis implies that Bayesian MCMC phylogenetic reconstruction incorporating temporal information can indeed reveal the direction of multiple HIV-1 transmission events in an epidemiologically linked cluster and provide more detail on the timing of transmission.

Author Information

(a) Department of Experimental Immunology, Sanquin Research, Landsteiner Laboratory, and Center for Infection and Immunity Amsterdam (CINIMA) at the Academic Medical Center of the University of Amsterdam, Amsterdam, The Netherlands

(b) Department of Internal Medicine, Isala Clinics, Zwolle, The Netherlands

(c) Department of Internal Medicine, Division of Infectious Diseases, University Medical Center Groningen, Groningen, The Netherlands

(d) Department of Microbiology and Immunology, Rega Institute, K.U. Leuven, Belgium.

Received 26 July, 2010

Revised 9 February, 2011

Accepted 4 March, 2011

Correspondence to Hanneke Schuitemaker, Department of Experimental Immunology, AMC M01-120, Meibergdreef 15, 1105 AZ Amsterdam, The Netherlands. Tel: +31 20 5668298; fax: +31 20 5669756

Introduction

Phylogenetic analysis has frequently been used to investigate HIV-1 transmission. HIV-1 transmission histories can be hypothesized based on epidemiological information [1] (Rachinger et al. submitted), by contact tracing in well-established transmission chains [2,3] and in the context of criminal investigations [4,5]. Forensic applications have spawned a great deal of discussion on the standards of phylogenetic inference [6,7]. Although epidemiological linkage can be tested phylogenetically, the direction and timing of HIV-1 transmission are far more difficult to assess. Standard phylogenetic reconstruction establishes evolutionary relatedness but not evolutionary direction. The integration of molecular clock models in phylogenetic inference has enabled the reconstruction of rooted, time-measured phylogenies [8], and the time scale of viral evolutionary histories has proven useful in shedding light on HIV-1 and hepatitis C virus transmission hypotheses [9].

In 2008, we reported on HIV-1 superinfection of an HLA-B*5701-positive long-term elite controller of HIV-1 infection (patient 1, Fig. 1a) [1]. During a 14-year period, his steady partner (patient 2) had remained HIV negative despite their unprotected sexual contact. Patient 1 contracted HIV-1 superinfection in 2005 after unprotected sex with a new partner (patient 3). The three men subsequently maintained a triangular relationship with unprotected sexual intercourse and, as a consequence, patient 2 seroconverted 11 months after the estimated time of superinfection of patient 1 (Fig. 1a). Phylogenetic analysis of gag sequences confirmed that all three men were infected with the same virus variant from 2005 onward and identified patient 3 as the source of superinfection of patient 1 [1]. The source of the initial infection of patient 2 remained unresolved (Fig. 1a).

Here, we adopted a Bayesian Markov Chain Monte Carlo (MCMC) framework to infer evolutionary histories from partial HIV-1 genes serially sampled from the three patients. Time-measured phylogenies of clonally amplified gag, pol and C2–C4 env sequences provided compelling evidence for the direction of HIV-1 transmission from patient 3 to both patients 1 and 2, and relatively precise time frames for the transmission events.

Methods

RNA isolation, RT-PCR, PCR amplification, molecular cloning of multiple PCR products and sequencing

We eluted 50 μl HIV-1 RNA from 140 μl serum obtained at multiple time points during follow-up of all three patients (QIAamp Viral RNA Mini kit; Qiagen, Hilden, Germany). cDNA synthesis was performed from 10 μl RNA by RT-PCR (SuperScript First-Strand Synthesis system; Invitrogen, Carlsbad, California, USA). HIV-1 C2–C4 env (546 nucleotides), gag (1018 nucleotides) and pol (977 nucleotides) fragments were then PCR amplified (GoTaq Flexi DNA Polymerase; Promega, Madison, Wisconsin, USA), multiple PCR products were cloned (pGEM-T-Easy Vector System; Promega) and the clones were sequenced (Big Dye Terminator Cycle Sequencing kit, v1.1; Applied Biosystems, Foster City, California, USA) as described previously [10] (for primers, see supplemental information, http://links.lww.com/QAD/A132). Per patient and time point, a median of 3 env (range 1–5), 4 gag (range 1–5) and 1 pol (range 1–4) PCR product(s) were generated and cloned. A median of 6 env (range 2–27), 6 gag (range 1–15) and 5.5 pol (range 4–9) sequences were generated per time point. Additionally, sequences obtained from replication-competent biological clones of all three patients generated during our earlier study were included (20 env, 17 gag, 18 pol sequences) [1]. In total, 172 env, 155 gag and 66 pol sequences were included in the phylogenetic analyses. For patient 1, only sequences obtained after HIV-1 superinfection were used. Sequences were submitted to GenBank (accession numbers JF278168–JF278560).

Bayesian phylogenetic inference

To exclude recombination as a potential confounder, we tested for its presence using the robust Phi-test [11]. No significant recombination signal was detected (gag: P = 0.25, pol: P = 0.15, env: P = 0.06).

Evolutionary histories for the patients' gag, pol and env nucleotide alignments were reconstructed using BEAST v.1.4.8 [12] (for details, see supplemental information, http://links.lww.com/QAD/A132). MCMC analyses were run for 100 million generations and diagnosed using Tracer (http://tree.bio.ed.ac.uk/software/tracer). Phylogenetic relationships were summarized as a maximum clade credibility tree obtained using TreeAnnotator and visualized in FigTree (http://tree.bio.ed.ac.uk/software/figtree/).
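To illustrate what a maximum clade credibility summary selects, the following Python sketch scores each sampled tree by the product of the posterior frequencies of its clades and returns the highest-scoring tree. It is a simplified illustration and not the TreeAnnotator implementation: trees are hypothetical nested tuples of tip names, branch lengths and node ages are ignored, and the function names and the toy sample are assumptions made for this example.

```python
import math
from collections import Counter


def clades_of(tree):
    """Return the internal-node clades of a rooted nested-tuple tree as frozensets of tip names."""
    found = set()

    def visit(node):
        if isinstance(node, str):               # a tip: contributes its own name
            return frozenset([node])
        tips = frozenset().union(*(visit(child) for child in node))
        found.add(tips)                         # record the clade below this internal node
        return tips

    visit(tree)
    return found


def maximum_clade_credibility(trees):
    """Pick the sampled tree with the highest product of clade posterior frequencies."""
    clade_sets = [clades_of(tree) for tree in trees]
    counts = Counter(clade for clade_set in clade_sets for clade in clade_set)
    n = len(trees)

    def log_credibility(clade_set):
        # Sum of logs instead of a raw product, to avoid numerical underflow.
        return sum(math.log(counts[clade] / n) for clade in clade_set)

    best = max(range(n), key=lambda i: log_credibility(clade_sets[i]))
    return trees[best]


# Hypothetical posterior sample of three trees over four tips.
TREE_A = (("A", "B"), ("C", "D"))
TREE_B = (("A", "C"), ("B", "D"))
SAMPLE = [TREE_A, TREE_A, TREE_B]
MCC_TREE = maximum_clade_credibility(SAMPLE)
```

In this toy sample the clades of TREE_A each occur in two of three trees, so TREE_A is selected over TREE_B.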

Results

Patients and serum samples

From patient 1, seven HIV-1-positive serum samples were collected during a 3.3-year period (November 2005 until March 2009) from 2 months after the estimated time of superinfection onwards. From patient 2, five HIV-1-positive serum samples were collected during a 1.6-year period (October 2006 until May 2008), starting 6 months after estimated time of infection. From patient 3, nine HIV-1-positive serum samples were collected over a 3-year period (May 2005 until May 2008), starting 2 months after estimated time of infection. Median time between samples was 4.5 months (range 0.3–11 months).

Viral load and CD4+ T cell counts

After superinfection of patient 1, viral load peaked at 2.5 × 10⁴ copies/ml plasma and subsequently remained between 1.7 × 10³ and 2.7 × 10³ copies/ml. CD4 counts decreased to 400 cells/μl blood but increased again to 500 cells/μl. Patients 2 and 3 experienced constantly high viral load (4.1 × 10⁴–2.8 × 10⁵ and 4.3 × 10⁴–1.5 × 10⁵ copies/ml, respectively) and a decline of CD4+ T cells to 260 and 340 cells/μl within 2.3 and 3.5 years postinfection, respectively. Patients 2 and 3 initiated antiretroviral therapy in July 2008, 3.7 and 2.4 years after seroconversion, respectively, and viral load declined to undetectable levels in both patients. Patient 1, infected with the same virus, remained therapy-naïve until the end of follow-up.

Transmission direction and timing

To elucidate HIV-1 transmission directions within this epidemiologically linked transmission cluster (Fig. 1a), we adopted a Bayesian MCMC approach that infers a posterior distribution of time-measured phylogenies from temporally spaced sequence data [12]. Maximum clade credibility trees represent the phylogenetic relationships for gag, pol and env in Fig. 2. All trees corroborate a single transmission event from patient 3 to patient 1 and from patient 3 to patient 2. Almost all posterior trees for gag, pol and env (100, 97 and 99%, respectively) support a monophyletic cluster of sequences obtained from patient 1, and almost all posterior trees for gag and pol (99 and 100%, respectively) and a lower number of trees for env (24%) support a monophyletic cluster for sequences of patient 2. The mean evolutionary rates and coefficients of variation (see supplemental information, http://links.lww.com/QAD/A132) indicated significant nonclock-like behavior in all three genes justifying the use of relaxed molecular clocks.
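The posterior support values quoted above are the fraction of sampled trees in which all sequences of a patient form a single clade. A minimal Python sketch of that calculation follows. The nested-tuple tree representation, the tip names and the four-tree sample are hypothetical and chosen only to make the counting explicit; they are not the study data.

```python
def clades(tree):
    """Return every clade of a rooted nested-tuple tree as a frozenset of tip names."""
    found = set()

    def visit(node):
        if isinstance(node, str):               # a tip
            tips = frozenset([node])
        else:                                   # an internal node
            tips = frozenset().union(*(visit(child) for child in node))
        found.add(tips)
        return tips

    visit(tree)
    return found


def monophyly_support(trees, tips):
    """Fraction of trees in which the given tips form exactly one clade."""
    target = frozenset(tips)
    hits = sum(1 for tree in trees if target in clades(tree))
    return hits / len(trees)


# Hypothetical posterior sample: P1*, P2* and P3* are sequences of patients 1, 2 and 3.
POSTERIOR_SAMPLE = [
    ("P3a", ("P3b", (("P1a", "P1b"), ("P3c", ("P2a", "P2b"))))),
    ("P3a", (("P1a", "P1b"), ("P3b", ("P3c", ("P2a", "P2b"))))),
    ("P3a", (("P1a", "P1b"), ("P3b", ("P2a", ("P3c", "P2b"))))),
    ("P3a", ("P3b", (("P1a", "P1b"), ("P2a", ("P3c", "P2b"))))),
]

SUPPORT_P1 = monophyly_support(POSTERIOR_SAMPLE, ["P1a", "P1b"])
SUPPORT_P2 = monophyly_support(POSTERIOR_SAMPLE, ["P2a", "P2b"])
```

In this toy sample the patient 1 sequences cluster together in every tree, whereas a patient 3 sequence interrupts the patient 2 cluster in two of the four trees, which lowers the support for patient 2 monophyly.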

In Fig. 1b, we depict the viral divergence time estimates to delineate the transmission intervals of patient 1 and patient 2 and compare them to the transmission intervals established using the calendar dates of the last time point before and the first time point after superinfection for patient 1 and of the last HIV-negative test and first HIV-positive test for patient 2. If we consider the smallest common time interval based on the credible intervals for both estimates across all genes (gray area in Fig. 1b), the divergence times can considerably narrow down the transmission time estimates.
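The narrowing of the transmission interval can be sketched as an interval intersection. The Python example below rests on one possible reading of the procedure: for each gene, transmission is assumed to lie between the lower credible bound of the donor-recipient divergence time and the upper credible bound of the recipient TMRCA, and the common interval is the overlap of these per-gene intervals with the serological window. All dates (in decimal years) and all names are hypothetical and are not the estimates shown in Fig. 1b.

```python
def gene_interval(divergence_ci, tmrca_ci):
    """Interval spanned by the branch leading to the recipient clade (assumed reading)."""
    return (divergence_ci[0], tmrca_ci[1])


def common_interval(intervals):
    """Overlap of (start, end) intervals, or None if they do not all overlap."""
    start = max(lo for lo, _ in intervals)
    end = min(hi for _, hi in intervals)
    return (start, end) if start <= end else None


# Hypothetical 95% credible intervals per gene: (divergence time CI, recipient TMRCA CI).
ESTIMATES = {
    "gag": ((2005.10, 2005.40), (2005.30, 2005.60)),
    "pol": ((2005.20, 2005.50), (2005.35, 2005.75)),
    "env": ((2005.05, 2005.35), (2005.25, 2005.55)),
}
# Hypothetical window between last negative and first positive sample.
SEROLOGICAL_WINDOW = (2005.00, 2005.90)

per_gene = [gene_interval(div_ci, tmrca_ci) for div_ci, tmrca_ci in ESTIMATES.values()]
NARROWED = common_interval(per_gene + [SEROLOGICAL_WINDOW])
```

With these hypothetical values the common interval runs from 2005.20 to 2005.55, which is narrower than the serological window alone.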

We further investigated how Bayesian MCMC phylogenetic analysis of rooted trees relates to unrooted phylogenetic analysis in revealing the direction of HIV-1 transmission. For this purpose, all three sequence datasets were subjected to Maximum Likelihood analysis (see supplemental information, http://links.lww.com/QAD/A132). In all Maximum Likelihood phylogenetic reconstructions, patient 1 and patient 2 sequences were nested clusters within the larger diversity of patient 3 sequences, which again implicates patient 3 as the source for the superinfection and the initial infection of patient 1 and patient 2, respectively. However, the bootstrap support for monophyly of sequences from patient 1 (70, 51 and 42% for gag, pol and env, respectively) and patient 2 (28, 25 and 6% for gag, pol and env, respectively) was generally less convincing (see supplemental information, http://links.lww.com/QAD/A132).

Discussion

In the present study, we analyzed HIV-1 gag, pol and C2–C4 env sequences generated from serum and peripheral blood mononuclear cells obtained at multiple time points during 3 years of HIV-1-positive follow-up of an HIV-1-superinfected former elite controller and his two partners to elucidate the time and direction of HIV-1 transmission. Bayesian MCMC phylogenetic analysis indicated that not the superinfected patient 1 but rather the new partner, patient 3, was the index case for the infection of patient 2. This evolutionary analysis of sequences from three HIV-1 genes demonstrates that transmission direction can be inferred even for closely related sequences derived from an epidemiologically linked transmission cluster involving three individuals.

Our study highlights two key requirements for drawing conclusions about transmission direction. First, we required a characterization of the viral variants within the donor and recipients, which allowed us to demonstrate that the diversity of the recipients' viral quasispecies is nested within the diversity of HIV-1 in the donor (Fig. 2). In this respect, sample size and sampling bias may be important to consider, although even a few variants from the donor and recipients may already reveal these nested or paraphyletic relationships. Second, an essential aspect of our probabilistic inference is the ability to provide rooted, time-measured phylogenies. Without evolutionary direction, tree topologies are essentially unrooted (see Maximum Likelihood trees, supplemental information, http://links.lww.com/QAD/A132); the root may lie on any branch, which can be compatible with different hypotheses of transmission direction. Standard phylogenetic analysis frequently resorts to an outgroup, generally obtained from unrelated patients, for rooting purposes. However, it may be particularly challenging to establish evolutionary direction for closely related sequences from a transmission chain using unrelated, considerably more divergent outgroup sequences. By averaging over all plausible time-measured evolutionary histories, the Bayesian MCMC analysis takes into account rooting uncertainty contingent on the data at hand, and phylogenetic (topological) uncertainty in general. All plausible roots can therefore be taken into consideration when evaluating transmission hypotheses. Averaging over all plausible histories also provides a measure of support for monophyletic clustering in the form of posterior probabilities. In our case, these appear to provide better support for recipient monophyly than Maximum Likelihood bootstrap support, but such differences between Bayesian posterior probability support and bootstrap frequencies appear to be a general observation in phylogenetics [13].
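The dependence of the inferred direction on root placement can be made concrete with a toy example. The Python sketch below roots one hypothetical unrooted four-tip tree (two sequences each from patients A and B) on different branches and applies a deliberately simplified criterion: a patient is called the source if the other patient's sequences form a clade while the patient's own sequences do not. The tree, the tip names and the criterion are illustrative assumptions, not the analysis performed in this study.

```python
from itertools import chain

# Hypothetical unrooted tree: internal nodes "x" and "y" joined by the central branch.
EDGES = [("x", "A1"), ("x", "A2"), ("x", "y"), ("y", "B1"), ("y", "B2")]


def neighbours(edges):
    """Adjacency map of the unrooted tree."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def tips_below(adj, node, parent):
    """Tips reachable from node without passing back through parent."""
    children = adj[node] - {parent}
    if not children:
        return frozenset([node])
    return frozenset(chain.from_iterable(tips_below(adj, c, node) for c in children))


def rooted_clades(edges, root_edge):
    """All clades of the tree obtained by placing the root on root_edge."""
    adj = neighbours(edges)
    found = set()

    def walk(node, parent):
        found.add(tips_below(adj, node, parent))
        for child in adj[node] - {parent}:
            walk(child, node)

    u, v = root_edge
    walk(u, v)
    walk(v, u)
    return found


def inferred_source(edges, root_edge, tips_a, tips_b):
    """Simplified rule: the patient whose sequences are paraphyletic is the source."""
    found = rooted_clades(edges, root_edge)
    a_mono = frozenset(tips_a) in found
    b_mono = frozenset(tips_b) in found
    if b_mono and not a_mono:
        return "A"
    if a_mono and not b_mono:
        return "B"
    return "undetermined"


TIPS_A, TIPS_B = ["A1", "A2"], ["B1", "B2"]
RESULTS = {edge: inferred_source(EDGES, edge, TIPS_A, TIPS_B) for edge in EDGES}
```

The same unrooted topology points to patient A as the source when the root lies on a branch leading to an A sequence, to patient B when it lies on a branch leading to a B sequence, and to neither when it lies on the central branch.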

As the viral evolutionary histories are estimated on a real-time scale, we can also demarcate the transmission interval based on the donor and recipient divergence time and the time to the most recent common ancestor (TMRCA) of the recipients (Fig. 1b), both connected by a single branch in the phylogeny. Considering the common denominator of the credible intervals for both estimates across the three phylogenies, we establish relatively narrow time intervals. Whereas the time interval determined by the last-negative and the first-positive sample for the superinfecting strain in patient 1 widely overlaps with the time interval determined by the last HIV-negative and first HIV-positive test for patient 2, this is no longer the case for the common genetic demarcation, and the transmission to patient 1 is clearly estimated to precede the transmission to patient 2. To achieve such precision, HIV-1-positive samples as close as possible to transmission are needed, which is an important practical consideration for further applications aimed at investigating transmission dynamics.

In conclusion, time-measured Bayesian MCMC analysis of longitudinally sampled sequences generated from multiple genes may be a useful approach to determine direction, and possibly also timing, of HIV-1 transmission.

Acknowledgements

A.R. conceived of and designed the study, performed experiments, phylogenetic analysis and interpretation of the data, and wrote the manuscript. P.H.P.G. and S.v.A. provided serum samples and clinical data of the patients. P.L. performed phylogenetic analysis and interpretation of the data, and wrote the manuscript. H.S. designed the study and wrote the manuscript.

The authors are indebted to the three patients for their continuous participation in our study.

Financial support for this study was obtained from the Academic Medical Center at the University of Amsterdam. P.L. is supported by a postdoctoral fellowship from the Fund for Scientific Research (FWO) Flanders.

There are no conflicts of interest.

References

1. Rachinger A, Navis M, van Assen S, Groeneveld PH, Schuitemaker H. Recovery of viremic control after superinfection with pathogenic HIV type 1 in a long-term elite controller of HIV type 1 infection. Clin Infect Dis 2008; 47:e86–e89.
2. Leitner T, Escanilla D, Franzen C, Uhlen M, Albert J. Accurate reconstruction of a known HIV-1 transmission history by phylogenetic tree analysis. Proc Natl Acad Sci U S A 1996; 93:10864–10869.
3. Lemey P, Derdelinckx I, Rambaut A, Van LK, Dumont S, Vermeulen S, et al. Molecular footprint of drug-selective pressure in a human immunodeficiency virus transmission chain. J Virol 2005; 79:11981–11989.
4. Albert J, Wahlberg J, Leitner T, Escanilla D, Uhlen M. Analysis of a rape case by direct sequencing of the human immunodeficiency virus type 1 pol and gag genes. J Virol 1994; 68:5918–5924.
5. Lemey P, Van DS, Van LK, Schrooten Y, Derdelinckx I, Goubau P, et al. Molecular testing of multiple HIV-1 transmissions in a criminal case. AIDS 2005; 19:1649–1658.
6. Lemey P, Vandamme AM. Exploring full-genome sequences for phylogenetic support of HIV-1 transmission events. AIDS 2005; 19:1551–1552.
7. Bernard EJ, Azad Y, Vandamme AM, Weait M, Geretti AM. HIV forensics: pitfalls and acceptable standards in the use of phylogenetic analysis as evidence in criminal investigations of HIV transmission. HIV Med 2007; 8:382–387.
8. Drummond AJ, Ho SY, Phillips MJ, Rambaut A. Relaxed phylogenetics and dating with confidence. PLoS Biol 2006; 4:e88.
9. De Oliveira T, Pybus OG, Rambaut A, Salemi M, Cassol S, Ciccozzi M, et al. Molecular epidemiology: HIV-1 and HCV sequences from Libyan outbreak. Nature 2006; 444:836–837.
10. Rachinger A, Stolte IG, van de Ven TD, Burger JA, Prins M, Schuitemaker H, et al. Absence of HIV-1 superinfection 1 year after infection between 1985 and 1997 coincides with a reduction in sexual risk behavior in the seroincident Amsterdam cohort of homosexual men. Clin Infect Dis 2010; 50:1309–1315.
11. Bruen TC, Philippe H, Bryant D. A simple and robust statistical test for detecting the presence of recombination. Genetics 2006; 172:2665–2681.
12. Drummond AJ, Rambaut A. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol 2007; 7:214.
13. Erixon P, Svennblad B, Britton T, Oxelman B. Reliability of Bayesian posterior probabilities and bootstrap frequencies in phylogenetics. Syst Biol 2003; 52:665–673.

Keywords: Bayesian Markov Chain Monte Carlo; elite control; HIV; Maximum Likelihood; phylogenetic analysis; superinfection; transmission direction

© 2011 Lippincott Williams & Wilkins, Inc.