Epidemiology and Prevention

Inferring HIV-1 Transmission Dynamics in Germany From Recently Transmitted Viruses

Pouran Yousef, Kaveh PhD*; Meixenberger, Karolin PhD; Smith, Maureen R. MSc*; Somogyi, Sybille PhD; Gromöller, Silvana MSc; Schmidt, Daniel MSc§; Gunsenheimer-Bartmeyer, Barbara PhD§; Hamouda, Osamah PhD; Kücherer, Claudia PhD; von Kleist, Max PhD*

JAIDS Journal of Acquired Immune Deficiency Syndromes: November 1, 2016 - Volume 73 - Issue 3 - p 356-363
doi: 10.1097/QAI.0000000000001122



HIV-1 continues to spread globally with a fairly stable incidence rate in Europe.1 Recently, novel strategies, such as treatment as prevention (TasP)2 and pre-exposure prophylaxis3 have been introduced. Their effective implementation however requires substantial knowledge of the epidemiological dynamics of HIV-1.4 Transmission dynamics represent a random subset of the underlying sexual contact dynamics,5,6 which may not be reconstructable from patient-derived information alone. This is because HIV-1 infection is characterized by long periods of infectivity and a small probability of infection per sexual act.7 However, viral sequences extracted shortly after infection (recent) yield valuable insights into patterns of disease transmission8 because closely related viral sequences may reflect direct transmission events or transmission clusters, where transmission may have occurred through one or several unknown individuals.

Except for few cases,9 the population of infected individuals is usually insufficiently covered to precisely reconstruct the connectivity structure of the underlying network. However, certain statistical features may still be represented. For instance, the distribution of cluster sizes may be considered as a sparse sample of the original epidemiological network approximating its connectivity structure,4 although there may not be a one-to-one correspondence. Furthermore, temporal and geographical information of individuals co-clustered by their viral sequences may yield statistical parameters describing the spread of the infection. Such parameters are given by the typical time elapsed between infection and onward transmission and its geographical spread.10,11 The objective of this study was a characterization of HIV-1 transmission dynamics in Germany based on viral sequences from recently infected individuals with known date of infection, geographical origin, and transmission group. We first inferred putative transmission clusters using various clustering thresholds and analyzed their size distribution. We then derived a conservative clustering threshold that maximized co-clustering of individuals connected by a direct transmission event. The derived clusters were analyzed to infer temporal dynamics of HIV-1 spread.


Study Population

The German HIV-1 seroconverter study is a nationwide, multicenter, open, prospective, long-term observational cohort study initiated in 1997 (written informed consent, ethical vote obtained from Charité University Medicine Berlin). In the present work, we only included “acute seroconverters” based on a first reactive HIV test before confirmed seroconversion or “documented seroconverters” who were confirmed HIV-1 antibody–positive within at most 1 year after the last negative HIV test (Section 1, Supplemental Digital Content). Duration of infection, that is, the period between the calculated date of seroconversion and the date of the first blood sampling for HIV-1 genotyping, was restricted to a maximum of 6 months. Also, the study population was limited to therapy-naïve seroconverter patients for whom at least one pol sequence was available. All follow-up sequences of the eligible participants were included, provided that the participant remained untreated.

HIV-1 pol Sequences

EDTA-blood samples taken between 1997 and 2012 from study participants were collected at the date of enrollment and roughly at yearly follow-ups. HIV-1 pol sequences were determined from plasma by direct Sanger sequencing of reverse transcriptase-polymerase chain reaction amplicons using the Viroseq HIV-1 Genotyping System (Abbott Molecular, Wiesbaden, Germany) or an in-house pol reverse transcriptase-polymerase chain reaction, as previously described.12–14 The entire protease sequence (99 codons) and the first 247 codons of the reverse transcriptase were included (GenBank accession numbers KX465238 to KX467180).

Ambiguous base calls were identified at a threshold of approximately 20% and were called if the minor peak was 3 times higher than the background and evident in at least 2 differently primed sequences. Overall, there was a median of 0.22% [interquartile range (IQR): 0%–0.78%] ambiguous base calls (“RYKMSWBDHVN”) per sequence.
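The per-sequence proportion of ambiguous base calls can be computed directly from the sequence string. The following is a minimal sketch, not the study's own code; the function name `ambiguous_fraction` and the choice to ignore alignment gaps ("-") are assumptions made for illustration.

```python
# IUPAC ambiguity codes as listed in the text ("RYKMSWBDHVN").
AMBIGUITY_CODES = frozenset("RYKMSWBDHVN")


def ambiguous_fraction(sequence: str) -> float:
    """Fraction of non-gap positions that carry an ambiguity code.

    Assumption: gap characters ("-") are excluded from the denominator.
    """
    bases = [b for b in sequence.upper() if b != "-"]
    if not bases:
        return 0.0
    return sum(b in AMBIGUITY_CODES for b in bases) / len(bases)
```

Applying this function to every sequence and taking the median and quartiles of the results would give a summary of the kind reported above.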

Identification of Transmission Clusters by Phylogenetic Analyses

A multiple sequence alignment was generated and end-trimmed using the program MAFFT. We deleted all putative drug resistance sites according to the International Antiviral Society list 201315 to avoid the effects of convergent evolution through transmitted drug resistance mutations. Maximum likelihood (ML) phylogenies were computed using RAxML.16 We used the general time reversible model of nucleotide substitution17 with the CAT (category) approximation to rate heterogeneity.18 To minimize the probability of converging to local optima in the tree space, ML-estimation was repeated 200 times with different randomized maximum parsimony starting trees. In addition, the tree topology was bootstrapped by a random shuffling of the multiple sequence alignment columns with 500 bootstrap runs. The best-scoring ML tree was selected for further analysis. We tested 2 alternatives for rooting the tree: either a chimpanzee simian immunodeficiency virus sequence (GenBank accession number U42720) as outgroup or the node separating subtype B sequences (91.1% of all sequences) from the other subtypes. Both rooting methods yielded equivalent clustering results.
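The two alignment manipulations mentioned above (removing resistance-associated sites and resampling columns for bootstrapping) can be illustrated as follows. This is a sketch under stated assumptions: the alignment is held as a dictionary of equal-length strings, the excluded column indices are supplied by the caller, and column resampling with replacement (the standard nonparametric bootstrap) stands in for what RAxML performs internally. The function names are hypothetical.

```python
import random


def drop_columns(alignment, excluded):
    """Remove the 0-based columns in `excluded` from every aligned sequence."""
    excluded = set(excluded)
    return {
        name: "".join(b for i, b in enumerate(seq) if i not in excluded)
        for name, seq in alignment.items()
    }


def bootstrap_replicate(alignment, rng):
    """Build one pseudo-alignment by sampling columns with replacement."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}
```

Each replicate has the same dimensions as the input alignment; a tree is inferred per replicate, and the share of replicates recovering a clade gives its bootstrap support.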

Phylogenetic clustering of individuals (and their viral sequences) was based on a breadth-first search, which recursively identifies the largest clades with a bootstrap support of at least 95% and a mean interpatient (patristic) sequence distance not exceeding a threshold (Section 3, Supplemental Digital Content).
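The breadth-first search described above can be sketched on a toy tree structure. This is an illustrative reimplementation, not the transmicBS code: the `Node` class, the rule that a clade must contain at least 2 patients, and all function names are assumptions. Distances between sequences of the same patient are excluded from the mean, following the "interpatient" wording.

```python
from collections import deque
from dataclasses import dataclass, field
from itertools import combinations, product
from typing import List, Optional


@dataclass
class Node:
    length: float = 0.0              # branch length to the parent
    support: float = 0.0             # bootstrap support (%) of the clade below
    patient: Optional[str] = None    # set for leaves only
    children: List["Node"] = field(default_factory=list)


def leaf_depths(node):
    """(patient, distance from `node`) for every leaf below `node`."""
    if not node.children:
        return [(node.patient, 0.0)]
    return [(p, d + c.length) for c in node.children for p, d in leaf_depths(c)]


def interpatient_distances(node):
    """Patristic distances of all leaf pairs below `node` from different patients."""
    dists = []
    for c in node.children:
        dists.extend(interpatient_distances(c))
    sides = [[(p, d + c.length) for p, d in leaf_depths(c)] for c in node.children]
    for a, b in combinations(sides, 2):
        dists.extend(da + db for (pa, da), (pb, db) in product(a, b) if pa != pb)
    return dists


def find_clusters(root, threshold, min_support=95.0):
    """Top-down search: accept the first (largest) clade meeting both criteria."""
    clusters, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        if not node.children:
            continue
        d = interpatient_distances(node)
        if d and node.support >= min_support and sum(d) / len(d) <= threshold:
            clusters.append(sorted({p for p, _ in leaf_depths(node)}))
        else:
            queue.extend(node.children)  # criteria failed: inspect subclades
    return clusters
```

Because the search starts at the root and stops descending once a clade qualifies, each reported cluster is the largest clade on its lineage that satisfies both criteria. The recursion recomputes distances at every node, which is acceptable for a sketch but would need caching on trees with thousands of taxa.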

The optimality criterion for identifying a conservative threshold was defined as a trade-off between clustering uniqueness (using the silhouette score as a specificity surrogate) and the probability that all follow-up sequences of one individual are assigned to the same cluster (sensitivity surrogate). The latter is motivated as follows: Because we analyzed treatment-naïve pol sequences, we did not expect strong adaptive/diversifying selection pressure19 (ratio of nonsynonymous to synonymous substitutions dN/dS was ≤1 in most follow-up sequence pairs). Therefore, we assumed that follow-up sequence data approximate the evolutionary dynamics of viruses of direct descent. Optimizing the threshold in a way that maximizes co-clustering of follow-up sequences may then also maximize clustering of individuals linked by a direct transmission event. For a detailed description of the hierarchical clustering method, the threshold optimization, and a short discussion, see Section 3, Supplemental Digital Content. A program (transmicBS) for the identification of transmission clusters on phylogenetic trees allowing for inclusion of follow-up sequences was implemented in Python.
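The sensitivity surrogate can be illustrated with a small function that computes, among individuals with at least 2 sequences, the proportion whose sequences all fall into one cluster. This is a sketch only: the function name, the input layout, and the decision to count individuals whose sequences are all unclustered as failures are assumptions. How this score is combined with the silhouette score is described in the supplement and is not reproduced here.

```python
from collections import defaultdict


def followup_inclusion_score(seq_patient, seq_cluster):
    """Share of multi-sequence patients whose sequences share one cluster.

    seq_patient: sequence id -> patient id
    seq_cluster: sequence id -> cluster label (missing or None = unclustered)
    """
    labels_by_patient = defaultdict(list)
    for seq, patient in seq_patient.items():
        labels_by_patient[patient].append(seq_cluster.get(seq))
    multi = [labels for labels in labels_by_patient.values() if len(labels) >= 2]
    if not multi:
        return float("nan")  # no follow-up sequences available
    together = sum(
        1 for labels in multi if labels[0] is not None and len(set(labels)) == 1
    )
    return together / len(multi)
```

Evaluating such a score over a grid of distance thresholds would give a curve of the kind that is traded off against cluster uniqueness.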

Relation Between Co-clustering and Residential Proximity

We developed a significance test for an unbiased statistical assessment of over- or under-representation of certain geographical cluster compositions as defined by the residence information of the clustered individuals. An observed geographical cluster composition (eg, Berlin/Brandenburg/Bavaria) was considered significant if it was observed significantly more often than expected by chance; see Section 4, Supplemental Digital Content, for details.
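The exact test is specified in the supplement. As a rough illustration of the underlying idea, the sketch below uses a simple label-permutation test as a stand-in: residence states are shuffled among the clustered individuals, and the number of single-state clusters under shuffling is compared with the observed number. The null model, the function names, and the restriction to same-state clusters are all assumptions made for this example.

```python
import random


def same_state_count(clusters, state_of):
    """Number of clusters whose members all live in the same state."""
    return sum(1 for members in clusters if len({state_of[m] for m in members}) == 1)


def permutation_p_value(clusters, state_of, n_perm=999, seed=0):
    """One-sided p value for over-representation of same-state clusters."""
    rng = random.Random(seed)
    people = sorted({m for members in clusters for m in members})
    states = [state_of[p] for p in people]
    observed = same_state_count(clusters, state_of)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(states)  # break the link between person and residence
        if same_state_count(clusters, dict(zip(people, states))) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_perm + 1)
```

Shuffling keeps the overall distribution of residence states fixed, so a dominant state such as Berlin is accounted for in the null expectation.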

Interinfection Times

We assessed the temporal transmission dynamics by computing the interinfection times between co-clustered individuals and by comparing them to the interinfection times in a null model (same individuals, all interinfection times, regardless of co-clustering). The details are provided in Section 5, Supplemental Digital Content.
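The two sets of interinfection times can be derived from seroconversion dates as sketched below. The data layout (a dictionary of dates and a list of clusters) and the function name are assumptions for illustration; the statistical comparison itself is described in the supplement and is not shown.

```python
from datetime import date
from itertools import combinations


def interinfection_days(seroconversion, clusters):
    """Return (within-cluster, background) absolute date differences in days.

    seroconversion: patient id -> datetime.date of estimated seroconversion
    clusters: list of lists of patient ids
    Background = all pairs among the clustered individuals, co-clustered or not.
    """
    def gap(a, b):
        return abs((seroconversion[a] - seroconversion[b]).days)

    within = [
        gap(a, b) for members in clusters for a, b in combinations(sorted(members), 2)
    ]
    clustered = sorted({m for members in clusters for m in members})
    background = [gap(a, b) for a, b in combinations(clustered, 2)]
    return within, background
```

The medians of the two lists can then be compared, for example with `statistics.median`.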


Characteristics of the Study Population

A total of 1159 study participants had eligible baseline HIV-1 sequences of whom 427 participants had 1–7 follow-up sequences, cumulating to 1943 sequences included. The vast majority of participants were men (1101/1159, 95%), and the predominant transmission group was men who have sex with men (MSM, 1018/1159, 87.8%). Berlin was the predominant residence state (62.7%). The median duration of infection at baseline blood sampling for HIV-1 sequencing was 6 weeks (IQR: 2–15 weeks), and the median age of participants was 33 years (range: 27–39 years). The vast majority (91%) were infected with subtype B virus (1056/1159 individuals). Characteristics of the study population are depicted in Table 1.

Characteristics of the Study Population in Clustered vs. Unclustered Individuals

Cluster-Size Distribution

Using the described set of HIV-1 pol sequences, we identified clusters based on the reconstructed phylogeny if the mean interindividual distance between the taxa in a clade was below a certain threshold and if the corresponding subtree had sufficient bootstrap support (≥95%). We tested whether the size distribution of identified clusters may be explained by a power-law model (Section 2, Supplemental Digital Content), as described in previous studies.4 To analyze this hypothesis, we used various clustering thresholds and applied the rigorous method of Clauset et al.20 This method tests whether a synthetic sample from the power-law distribution provides a better fit to an analytical power-law distribution than the actual data set (Section 2, Supplemental Digital Content). A P value above 0.1 suggests that a power law is a plausible model for the observed data, meaning that the synthetic data set does not exhibit a significantly better fit than the observed data set. For all analyzed values of the clustering threshold, the cluster sizes exhibit a heavy-tailed distribution, ie, the logarithm of the cluster size is almost perfectly anticorrelated with the logarithm of the frequency of a particular cluster size (Fig. 1A–C). However, only for clustering thresholds , a power law seems to be a plausible model according to the criteria defined in Clauset et al20 (Fig. 1D). The reason is that for clustering thresholds , we obtain too few clusters for a significant statistical assessment according to the proposed method20 (eg, 50 clusters at , 77 clusters at and ≥89 clusters at ). For all tested thresholds, we deduced a power law with a finite mean and variance as indicated by an exponent parameter (Supplemental Digital Content). Furthermore, we analyzed whether alternative distributions such as the exponential, Poisson, Waring, and Yule distributions exhibit a better fit to the data than a power law. Using Vuong's22 likelihood-ratio test, we found that the exponential and Poisson distributions can be ruled out if the clustering threshold is sufficiently large (Figure S1 and corresponding section 2, Supplemental Digital Content). The fitting quality of the Waring distribution (weak power law23) to the cluster-size data could not be distinguished from the power-law fit for an intermediate range of clustering thresholds. In contrast, for all analyzed clustering thresholds, the goodness of fit of the Yule distribution was not significantly different from the power-law fit. This is in line with results of Handcock et al,24 suggesting that the limit of the Yule distribution is given by a power law.
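As an illustration of how a power-law exponent can be estimated from cluster sizes, the sketch below uses the approximate discrete maximum-likelihood estimator given by Clauset et al,20 namely 1 + n / Σ ln(x_i / (x_min − 1/2)). It is not the full procedure used in the study: the goodness-of-fit test on synthetic samples and the likelihood-ratio comparisons are omitted, the function name is hypothetical, and the approximation is known to be coarse when the lower cutoff is as small as 2.

```python
import math


def powerlaw_alpha(sizes, xmin=2):
    """Approximate MLE of the exponent of a discrete power law for x >= xmin.

    Returns (alpha, approximate standard error). Assumption: xmin is fixed
    by the caller rather than estimated from the data.
    """
    tail = [x for x in sizes if x >= xmin]
    if not tail:
        raise ValueError("no observations at or above xmin")
    log_sum = sum(math.log(x / (xmin - 0.5)) for x in tail)
    alpha = 1.0 + len(tail) / log_sum
    stderr = (alpha - 1.0) / math.sqrt(len(tail))
    return alpha, stderr
```

An estimated exponent above 3 corresponds to a distribution with finite mean and variance. With only a few dozen clusters, the standard error is large, which is one reason small samples cannot be assessed reliably.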

Statistical attributes of cluster-size distribution. A–C, Cluster size distributions (double-logarithmic scale) shown for 3 different clustering thresholds, corresponding to global or selected local maxima of the combined clustering performance score (Fig. 2B). The lines show the best linear fits with Pearson correlation coefficient ρ as indicated. D, P value of the fit of the power-law distribution to observed cluster-size distributions at different clustering thresholds (details of the significance test are described in section 2, Supplemental Digital Content).

Identification of the Optimal Conservative Clustering Threshold

Next, we aimed at inferring temporal dynamics of HIV-1 spread. To this end, we sought a conservative clustering threshold that maximizes the probability that individuals who are related by direct transmission events are placed into the same putative transmission cluster. Furthermore, it should minimize the probability that individuals who are connected by indirect transmission events are co-clustered. We simultaneously maximized clustering modularity, using the silhouette score, and the relative number of individuals whose follow-up sequences are assigned to the same cluster (Fig. 2A). Because follow-up sequences are of close descent, their co-clustering may yield a good divergence estimate for closely related viruses from different individuals. Hence, we assumed that the follow-up criterion maximizes co-clustering of individuals who are directly connected by transmission. The procedure yielded a composite score (Fig. 2B), which identified an optimal clustering threshold (Section 3, Supplemental Digital Content). Using this threshold, a total of 195 HIV-1 sequences from 110 individuals were clustered in 50 putative transmission clusters. These clusters contained sequences from a median of 2 patients (IQR: 2–2). Of the putative transmission clusters, 94% (47/50) contained only individuals from identical transmission groups, that is, 92% (46/50) were MSM-only and 2% (1/50) were heterosexual (HET)-only clusters. A total of 6% (3/50) of the identified clusters contained individuals from different transmission groups: either MSM and HET, or MSM and unknown risk.

Clustering performance score depending on the mean pairwise distance threshold (x-axis). A, Silhouette score (solid line) along with the follow-up inclusion score (dotted line). The vertical dashed line indicates the global optimum of the combined score. B, Combined score as described in Materials and Methods. Global and local optima are marked by vertical thick and thin dashed lines, respectively.

The vast majority of putative transmission clusters (94%, 47/50) contained only men, whereas 6% (3/50) were mixed-gender transmission clusters. Of the entire eligible study population, 9.7% (107/1101) of all male patients and 3.6% (2/55) of all female individuals were in a putative transmission cluster. A total of 94% of identified clusters were of subtype B, whereas 3 clusters consisted of subtypes CRF01_AE, A (A1), and a unique recombinant form. We did not identify any significant differences between the groups of clustered vs. unclustered individuals, except that MSM were more likely to be clustered (P < 0.05).

Co-clustering and Residential Proximity

We analyzed co-clustered individuals in terms of their place of residence within German federal states. Our focus was on 3 types of cluster compositions: clusters consisting of individuals living in the same, bordering, or nonbordering federal states. The observed number of clusters of a certain regional composition was considered significant if it was observed more often than expected by the random background model (Methods and Section 4, Supplemental Digital Content). As a result, clusters comprising individuals living in the same state were significantly more frequent than expected by chance (Fig. 3A, P < 0.01). Clusters including individuals living in bordering states were also more frequent than expected (Fig. 3B, P < 0.01) except for Berlin/Brandenburg clusters. In contrast, clusters from nonbordering states were either not significantly different or even less frequent than expected by chance (Fig. 3C).

Spatial transmission dynamics. Bars are expressed as deviations from randomness (see section 4, Supplemental Digital Content). Normalized proportion of A, Homogeneous clusters (all patients from the same federal state). B, Geographically adjacent clusters (all patients must come from bordering federal states). C, Interregional clusters (patients come from nonbordering federal states). The flanking stars indicate that the outcome is significantly different from the null model at α = 0.01 (***).

Interinfection Times

We computed the interinfection times (the absolute pairwise differences of seroconversion dates) between individuals who were co-clustered vs. all interinfection times for the same individuals as background (ie, including differences between seroconversion dates of individuals not co-clustered). The resulting empirical distribution is shown in Figure 4A. The median interinfection time between co-clustered individuals was 182 days (IQR: 90.9–571.2, average of 13 months) and was significantly smaller than in the background data set (1193 days, P = 4.2 × 10^-22, Fig. 4), where we computed all possible interinfection times, regardless of whether individuals were co-clustered or not. We also repeated the analysis using all participants (clustered and unclustered) as background, coming to the same conclusion (see Figure S4A, Supplemental Digital Content).

Temporal transmission dynamics. A, Probability distribution of interinfection times between co-clustered individuals (light gray, median: 26 wk/182 d) and for the same individuals, all pairwise distances, irrespective of whether co-clustered or not (black, median: 170 wk/1193 d). The dashed white horizontal lines represent the respective medians. B, Probability that the interinfection time is greater than the time indicated on the x-axis (survival plot). Light gray line: Interinfection times between individuals belonging to the same transmission cluster. Black: All interinfection times for the same individuals. Gray areas represent the respective confidence bands, computed using Greenwood's formula.

To assess the temporal dynamics more quantitatively, we computed the probability that the interinfection time T is greater than some time lag t, ie, P(T > t). The time course of P(T > t) is shown in Figure 4B based on the interinfection times of co-clustered individuals vs. all possible interinfection times for these individuals. This figure shows that the probability of interinfection times being larger than a certain threshold declines much more rapidly than expected from the background data (all interinfection times), hazard ratio: 4.53, P < 0.001. The analysis with all participants (clustered and unclustered) as background resulted in the same conclusion (Figure S4B, Supplemental Digital Content).
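The probability that an interinfection time exceeds a given lag can be estimated as the fraction of observed times above that lag. The sketch below assumes that no interinfection time is censored, in which case the Kaplan–Meier estimate reduces to this empirical fraction and Greenwood's variance reduces to the binomial form S(1 − S)/n. The function name is hypothetical, and the hazard ratio reported above is not computed here.

```python
import math


def survival_with_se(times, lag):
    """Empirical probability that a time exceeds `lag`, with its standard error.

    Assumption: every interinfection time is fully observed (no censoring).
    """
    n = len(times)
    if n == 0:
        raise ValueError("no interinfection times given")
    s = sum(1 for x in times if x > lag) / n
    return s, math.sqrt(s * (1.0 - s) / n)
```

Evaluating the function on a grid of lags for the co-clustered pairs and for the background pairs gives two curves that can be plotted against each other.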


The main aim of this work was to characterize HIV-1 transmission dynamics in Germany based on a large sample of HIV-1 pol sequences from recently infected individuals with known infection dates. We used a phylodynamic clustering approach,25 meaning that we inferred a phylogenetic tree (ML) from the set of available sequences. We then assigned the sequences (and corresponding individuals) to putative transmission clusters if the mean pairwise evolutionary distance within a corresponding subtree was below an optimal threshold (see Supplemental Digital Content). Note that, depending on the data set and the objective of the analysis, different thresholds may be useful. In general, smaller thresholds result in fewer clusters, where co-clustered individuals are connected by shorter infection chains. This enables the analysis of dynamics related to transmission. Large thresholds, however, yield more clusters and thus enable a more statistically solid analysis of general cluster properties, bearing in mind that co-clustered individuals may be related by more indirect transmission events. Although some authors a priori fixed the threshold at a conservative level,26,27 others optimized the threshold using the distribution of sequence mismatches11 or by maximizing the number of clusters.28

Because we observed that the statistics of the cluster-size distribution remained fairly similar across various thresholds (Figure S1, Supplemental Digital Content), we analyzed whether the cluster-size distribution obtained from our analysis is power-law distributed. Although in many studies, simple Pearson correlation between the logarithm of the cluster sizes and their frequency is used (eg, Hughes et al29), Clauset et al20 and Jones and Handcock30 argued that this is insufficient, because degree distribution data violate the assumptions required for a regression fit. We thus adopted the more rigorous significance test from Clauset et al20 and indeed found that the power-law distribution is a plausible model for the distribution of putative transmission clusters. We then tested various alternative models as potential explanations for the observed long-tail distribution of cluster sizes (ie, very few large clusters and many small clusters consisting of 2–3 individuals). Our analysis confirmed that the sampled subnetwork is structured, favoring a simple power law and Yule distribution over alternative models. A connection to the structure of the underlying sexual contact network is tempting because putative transmission cluster sizes and node degrees in sexual networks obey similar statistical laws.6,24 To this end, a comparison with cluster-size distributions obtained from simulated trees under alternative epidemiological models can be made.31 However, the time-dependent nature of edges in epidemic networks32 introduces an additional level of complexity, emphasizing the need for further research on this topic.

To infer properties of the temporal spread of HIV-1 and to minimize contamination of putative transmission clusters by indirect transmission links, conservative clustering thresholds are required. Hence, we optimized the clustering threshold: We assumed that longitudinal sequence data approximate the evolutionary dynamics of directly descended viruses. Optimizing the clustering threshold in a way that maximizes co-clustering of follow-up sequences may then also maximize clustering of individuals linked by a direct transmission event. In the absence of a real “gold standard,” we used this information as a surrogate to maximize the sensitivity of the clustering threshold. In addition, we used the silhouette score to maximize the uniqueness (specificity) of cluster assignment. The tradeoff between these 2 criteria defined an optimal threshold, at which we identified 50 distinct transmission clusters.

To evaluate whether individuals' residency had a profound impact on their propensity to cluster together, we devised a statistical test (Section 4, Supplemental Digital Content). We identified a strong association between residential proximity and co-clustering (Fig. 3A and B). It should be noted that we observed this tendency only at a sufficiently small clustering threshold, which is expected to maximize the likelihood of co-clustering only direct transmission events. Unsurprisingly, at higher thresholds, more cross-regional clusters were observed (data not shown), related to indirect transmission events (multiple missing transmission links). This post hoc analysis thus substantiated our claim of minimizing contamination of the derived clusters by indirect transmission events at the chosen threshold. Furthermore, it highlights the importance of the choice of the clustering threshold, depending on the aim of analysis. After this plausibility check, we assessed interinfection times between co-clustered individuals and derived a mean estimate of 13 months (median 6 months, IQR: 3–19 months). This estimate is in agreement with other studies: Brenner et al33 derived a mean estimate of 15.2 months, whereas Lewis et al11 and Bezemer et al34 arrived at comparable results with respective median estimates of 13.14 and 13.2 months, both located within our IQR. Furthermore, Recordon-Pinson et al35 obtained a median estimate of 27 months, which lies outside our IQR but may be attributable to the use of maximum interinfection intervals in their study.

Inexact seroconversion estimates may introduce inaccuracy into the estimated interinfection times. The error range was 0–12 months (duration between last negative and first positive test) for “documented seroconverters” and 0 months for “acute seroconverters.” This inaccuracy is much smaller than in related studies,11,33–35 because of our strict inclusion criteria.

Note that there may be cases where interinfection times over- or underestimate the time to onward transmission, as outlined in Section 6, Supplemental Digital Content. Finally, the participants of our cohort might have known their HIV status within 6 months after infection, which could alter their behavior. In the absence of this knowledge, the period of onward transmission may be prolonged.

In the case of substantial diversifying intrahost evolution, distance-based clustering methods may not co-cluster transmission pairs whose viral samples are temporally far apart.31 Subsequent analysis of interinfection times in clusters can then bias this estimate downward. We took several measures to minimize and assess the impact of this potential bias: Firstly, we only considered treatment-naïve pol sequences, which exhibit a very low evolutionary rate.19 Secondly, we compared all available sequences of individuals and if any 2 sequences co-clustered, the corresponding individuals were co-clustered. The patients' baseline sequence was derived within 6 months after infection, minimizing evolution within a (potential) virus recipient. All follow-up sequences of an individual were included in the analysis, guaranteeing that the distance between any 2 sequences of a transmission pair is minimal, irrespective of their interinfection times. In the absence of follow-up sequences, knowledge of infection times, or strong diversifying selection, a coalescent approach, modeling the evolution at the population level, may be more suitable. Such an approach may incorporate transmission models with distinct disease stages.36

In conclusion, we used phylodynamic clustering to infer HIV-1 transmission dynamics in Germany based on viral sequences from recently infected individuals with known infection dates. We observed a power-law distribution of cluster sizes with a few large and many small clusters. We optimized our clustering to minimize contamination of clusters by too many unobserved intermediate transmission events. Co-clustering was associated with spatial proximity and short interinfection times. The latter suggests that onward transmission occurred shortly after infection in our data set. Importantly, most individuals may be unaware of their serological status during this time and may consequently not have initiated intervention strategies like TasP. Thus, for TasP to have maximum epidemiologic impact, it should be combined with HIV testing campaigns.37


We are very grateful to all study patients and medical doctors participating in the German HIV-1 Seroconverter Study since 1997. We acknowledge the enduring excellent technical assistance of Sabrina Neumann, Hanno von Spreckelsen, and Katrin Arndt, the excellent RKI sequencing (head Andreas Nitsche) service of Julia Tesch, Silvia Muschter, and Julia Hinzmann, and the careful data documentation of Parvinolsadat Ghassim.


1. UNAIDS Report on the Global AIDS Epidemic. 2013. Available at: Accessed November 04, 2015.
2. Cohen MS, Chen YQ, McCauley M, et al. Prevention of HIV-1 infection with early antiretroviral therapy. N Engl J Med. 2011;365:493–505.
3. Grant RM, Lama JR, Anderson PL, et al. Preexposure chemoprophylaxis for HIV prevention in men who have sex with men. N Engl J Med. 2010;363:2587–2599.
4. Leigh Brown AJ, Lycett SJ, Weinert L, et al. Transmission network parameters estimated from HIV sequences for a nationwide epidemic. J Infect Dis. 2011;204:1463–1469.
5. Doherty IA, Padian NS, Marlow C, et al. Determinants and consequences of sexual networks as they affect the spread of sexually transmitted infections. J Infect Dis. 2005;191(suppl 1):S42–S54.
6. Liljeros F, Edling CR, Amaral LA, et al. The web of human sexual contacts. Nature. 2001;411:907–908.
7. Royce RA, Sena A, Cates W Jr, et al. Sexual transmission of HIV. N Engl J Med. 1997;336:1072–1078.
8. Leitner T, Escanilla D, Franzen C, et al. Accurate reconstruction of a known HIV-1 transmission history by phylogenetic tree analysis. Proc Natl Acad Sci U S A. 1996;93:10864–10869.
9. Bruhn CA, Audelin AM, Helleberg M, et al. The origin and emergence of an HIV-1 epidemic: from introduction to endemicity. AIDS. 2014;28:1031–1040.
10. Frentz D, Wensing AM, Albert J, et al. Limited cross-border infections in patients newly diagnosed with HIV in Europe. Retrovirology. 2013;10:36.
11. Lewis F, Hughes GJ, Rambaut A, et al. Episodic sexual transmission of HIV revealed by molecular phylodynamics. PLoS Med. 2008;5:e50.
12. Poggensee G, Kuecherer C, Werning J, et al. Impact of transmission of drug-resistant HIV on the course of infection and the treatment success. Data from the German HIV-1 Seroconverter Study. HIV Med. 2007;8:511–519.
13. Bartmeyer B, Kuecherer C, Houareau C, et al. Prevalence of transmitted drug resistance and impact of transmitted resistance on treatment success in the German HIV-1 Seroconverter Cohort. PLoS One. 2010;5:e12718.
14. Zu Knyphausen F, Scheufele R, Kuecherer C, et al. First line treatment response in patients with transmitted HIV drug resistance and well defined time point of HIV infection: updated results from the German HIV-1 seroconverter study. PLoS One. 2014;9:e95956.
15. Johnson VA, Calvez V, Gunthard HF, et al. Update of the drug resistance mutations in HIV-1 March 2013. Top Antivir Med. 2013;21:6–14.
16. Stamatakis A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics. 2006;22:2688–2690.
17. Tavare S. Some Probabilistic and Statistical Problems in the Analysis of DNA Sequences. Lectures on Mathematics in the Life Sciences. Vol. 17: Providence, RI: Amer Mathematical Society; 1986:57–86.
18. Stamatakis A. Phylogenetic models of rate heterogeneity: a high performance computing perspective. Proceedings of 20th IEEE/ACM International Parallel and Distributed Processing Symposium (IPDPS2006), High Performance Computational Biology Workshop, Rhodos, Greece. April 25–29, 2006.
19. Zanini F, Brodin J, Thebo L, et al. Population genomics of intrapatient HIV-1 evolution. Elife. 2015;4:e11282.
20. Clauset A, Shalizi CR, Newman MEJ. Power-law distributions in empirical data. Siam Rev. 2009;51:661–703.
21. Newman MEJ. Power laws, Pareto distributions and Zipf's law. Contemp Phys. 2005;46:323–351.
22. Vuong Q. Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica. 1989;57:307–333.
23. Chen WC. On the weak form of Zipf law. J Appl Probab. 1980;17:611–622.
24. Handcock MS, Jones JH. Likelihood-based inference for stochastic models of sexual network formation. Theor Popul Biol. 2004;65:413–422.
25. Pybus OG, Rambaut A. Evolutionary analysis of the dynamics of viral infectious disease. Nat Rev Genet. 2009;10:540–550.
26. Aldous JL, Pond SK, Poon A, et al. Characterizing HIV transmission networks across the United States. Clin Infect Dis. 2012;55:1135–1143.
27. Kaye M, Chibo D, Birch C. Phylogenetic investigation of transmission pathways of drug-resistant HIV-1 utilizing pol sequences derived from resistance genotyping. J Acquir Immune Defic Syndr. 2008;49:9–16.
28. Lawyer G, Schulter E, Kaiser R, et al. Endogenous or exogenous spreading of HIV-1 in Nordrhein-Westfalen, Germany, investigated by phylodynamic analysis of the RESINA Study cohort. Med Microbiol Immunol. 2012;201:259–269.
29. Hughes GJ, Fearnhill E, Dunn D, et al. Molecular phylodynamics of the heterosexual HIV epidemic in the United Kingdom. PLoS Pathog. 2009;5:e1000590.
30. Jones JH, Handcock MS. An assessment of preferential attachment as a mechanism for human sexual network formation. Proc Biol Sci. 2003;270:1123–1128.
31. Volz EM, Koopman JS, Ward MJ, et al. Simple epidemiological dynamics explain phylogenetic clustering of HIV from patients with recent infection. PLoS Comput Biol. 2012;8:e1002552.
32. Rocha LE, Liljeros F, Holme P. Information dynamics shape the sexual networks of internet-mediated prostitution. Proc Natl Acad Sci U S A. 2010;107:5706–5711.
33. Brenner BG, Roger M, Routy JP, et al. High rates of forward transmission events after acute/early HIV-1 infection. J Infect Dis. 2007;195:951–959.
34. Bezemer D, van Sighem A, Lukashov VV, et al. Transmission networks of HIV-1 among men having sex with men in the Netherlands. AIDS. 2010;24:271–282.
35. Recordon-Pinson P, Anies G, Bruyand M, et al. HIV type-1 transmission dynamics in recent seroconverters: relationship with transmission of drug resistance and viral diversity. Antivir Ther. 2009;14:551–556.
36. Volz EM, Ionides E, Romero-Severson EO, et al. HIV-1 transmission during early infection in men who have sex with men: a phylodynamic analysis. PLoS Med. 2013;10:e1001568.
37. McNairy ML, El-Sadr WM. Antiretroviral therapy for the prevention of HIV transmission: what will it take? Clin Infect Dis. 2014;58:1003–1011.

transmission cluster; phylogeny; transmission dynamics; onward transmission; clustering threshold


Copyright © 2016 Wolters Kluwer Health, Inc. All rights reserved.