Eyer-Silva, Walter A. MD*†‡; Morgado, Mariza G. PhD*
The AIDS epidemic in Brazil is spreading from the large urban centers to small counties and the innermost parts of the country.1 However, data on the features of HIV-1 infection in these places are scarce. Such spread is expected to pose formidable medical, social, and logistic challenges. The future of the epidemic in Brazil is likely to be governed by our ability to halt HIV-1 spread toward these areas. Thus, there is an urgent need to delineate the clinical, epidemiologic, and virologic features of HIV-1 infection in relatively small Brazilian communities.
Molecular analysis of nucleotide sequences has been successfully used to trace patterns of HIV-1 spread.2 These studies, for example, established the epidemiologic linkage between a rape victim and the assailant,3 documented iatrogenic transmission to a cluster of patients,4 allowed the reconstruction of a long chain of transmission involving 9 individuals,5 identified routes of viral spread within different regions of the same country,6 and provided evidence that 2 independent epidemics were going on in the same country.7 Additionally, analysis of HIV-1 sequences has pointed to a high-incidence profile, as evidenced by molecular findings such as multiple transmission networks,7-9 and to an old and mature epidemic pattern, as indicated by high sequence diversity and the absence of multiple transmission clusters.10,11
Therefore, to better understand local epidemiologic patterns of HIV-1 spread in a small Brazilian county, we analyzed viral sequences recovered from patients who were receiving care at the municipal HIV-1/AIDS Program of Miracema. Our aim was to clarify routes of HIV-1 spread in a setting that we already knew was primarily characterized by heterosexual transmission.12
PATIENTS AND METHODS
Patients and Setting
Miracema is a small county in Northwestern Rio de Janeiro State, with 27,042 inhabitants as of August 2000.13 Monthly medical appointments are offered at a local ambulatory facility with a physician based in the city of Rio de Janeiro. Patients are referred by local health care professionals whenever HIV-1 infection is diagnosed or suspected. Antiretroviral agents, as well as treatment and prophylaxis for opportunistic disorders, are supplied free of charge when clinically indicated. Patients were staged according to the 1993 revised Centers for Disease Control and Prevention (CDC) classification system.14 Between July 1999 and July 2005, a total of 78 HIV-1-infected adult patients (41 female) visited the unit at least once. Most patients attributed the acquisition of HIV-1 infection to unprotected sexual intercourse, whereas 1 subject (patient M48) was an intravenous drug user (IVDU). A detailed clinical and epidemiologic characterization of the cohort is presented elsewhere.12 The study protocol was approved by the Ethics Review Board at Instituto de Pesquisas Clínicas Evandro Chagas, Instituto Oswaldo Cruz, Rio de Janeiro.
Blood Samples, DNA Extraction, Amplification, and Nucleotide Sequencing
After obtaining signed informed consent, blood samples were collected from 63 adult patients. Total genomic DNA was extracted, and DNA samples were amplified by polymerase chain reaction (PCR) using nested protocols. The C2V3/envelope (env) region and a fragment of the polymerase (pol) spanning protease (PR) and reverse transcriptase (RT) regions were amplified using outer and nested primer sets (available on request). Purified PCR products were sequenced in an ABI 3100 automated sequencer (Applied Biosystems) with BigDye Terminator (version 3.0) Cycle Sequencing Reaction Kits. Contigs were assembled using SeqMan II, which was included in the DNASTAR package.15 Sequences were assigned GenBank accession numbers AY177893 through AY177913, AY928978 through AY929061, DQ058780 through DQ058783, and DQ224346 through DQ224357.
Sequence Analysis and Phylogenetic Studies
As an initial subtyping analysis, sequences were aligned against a set of reference strains from all known HIV-1 group M subtypes (using an SIVcpz sequence as the outgroup) and trimmed to equivalent lengths using CLUSTAL X.16 Gap-stripping and minor adjustments were performed. An alignment of 328 base pairs (bp), corresponding to positions 6892 through 7225 relative to the HXB2 genome, was obtained for env, and an alignment of 836 bp was obtained for pol (positions 2363-3198). Inferences were performed by the neighbor-joining (NJ) algorithm17 based on a DNA distance matrix using the F84 model of nucleotide substitution implemented in phylogenetic analysis using parsimony (*and other methods) (PAUP*) software, version 4.0b10.18 The robustness of the trees was evaluated by bootstrap analysis19 with 100 rounds of replication. The bootscanning method was used to detect and study recombination, as implemented in SIMPLOT software, version 2.5.20
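As an illustration of the bootstrap step described above, the following minimal Python sketch resamples alignment columns with replacement to build pseudo-replicate alignments and checks the length of an inclusive HXB2 coordinate range. It is not the PAUP* implementation: the function names and the toy sequences are hypothetical, and tree inference on each replicate is omitted.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one pseudo-replicate: alignment columns resampled with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all aligned sequences must have the same length")
    # Draw as many column indices as the alignment has columns.
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in columns) for name in names}


def inclusive_span(start, end):
    """Number of positions in an inclusive coordinate range."""
    return end - start + 1


# Toy alignment (hypothetical sequences, for illustration only).
toy = {"seqA": "ACGTACGTAC", "seqB": "ACGTTCGTAC", "seqC": "ACGAACGTTC"}
rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_alignment(toy, rng) for _ in range(100)]

# The pol range quoted in the text (2363-3198) spans 836 positions.
pol_length = inclusive_span(2363, 3198)
```

In a full analysis, each pseudo-replicate would be passed to the same tree-building method, and the percentage of replicates recovering a given branch would be reported as its bootstrap value.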
To understand the phylogenetic relationships between clustered sequences further, we realigned those identified as subtype B in both genomic regions against a set of additional subtype B sequences using 2 subtype C sequences as outgroups (ETH2220 and 92BR025) and applying the NJ and maximum likelihood (ML)21 methods. A total of 38 sequences were added to the env alignment, and 55 sequences were added to the pol alignment, including 21 controls collected in Rio de Janeiro.22 The best-fitting nucleotide substitution models for each data set were selected using the Akaike criterion as implemented in MODELTEST, version 3.06.23 We found the general time-reversible (GTR) model with γ-distributed rates across sites and a fraction of sites assumed to be invariable to be the best-fitting model for both data sets. ML trees were inferred with PAUP* using heuristic searches based on a subtree pruning and regrafting (SPR) branch-swapping algorithm and 1000 rounds of bootstrap replication. The presence of saturation was analyzed by plotting the transitions and transversions versus the F84 genetic distance using DAMBE.24 No evidence of saturation was observed, implying that the data sets were phylogenetically informative.
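Two of the steps above lend themselves to a short worked sketch in Python: ranking candidate substitution models by the Akaike information criterion (AIC = 2k - 2 lnL), and counting transitions and transversions between a pair of aligned sequences, which are the quantities plotted in a saturation analysis. This is not the MODELTEST or DAMBE code. The log-likelihood values are hypothetical, the function names are illustrative, and the parameter counts cover only the substitution model (branch lengths are assumed to be shared across candidates).

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def aic(log_likelihood, n_params):
    """Akaike information criterion; lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood


def best_model(candidates):
    """candidates maps model name -> (lnL, number of free parameters)."""
    return min(candidates, key=lambda name: aic(*candidates[name]))


def count_ts_tv(seq1, seq2):
    """Count transitions and transversions between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    ts = tv = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        # Skip identical sites, gaps, and ambiguity codes.
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            ts += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            tv += 1  # purine<->pyrimidine
    return ts, tv


# Hypothetical log-likelihoods, for illustration only.
candidates = {
    "JC69": (-5230.4, 0),
    "HKY85": (-5101.7, 4),
    "GTR": (-5080.2, 8),
    "GTR+I+G": (-4992.9, 10),
}
selected = best_model(candidates)
```

In a saturation plot, the transition and transversion counts for every sequence pair would be plotted against the corresponding genetic distance; a plateau in transitions at larger distances would suggest saturation.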
Patients whose samples yielded sequences that fell within a cluster were compared with those whose samples yielded nonclustered sequences. Depending on the variables, the Wilcoxon rank-sum test and the χ2 test were used for univariate analysis. Multivariate logistic regression was used to identify variables independently associated with being a patient whose sample yielded clustered sequences. Statistical analyses were performed using the software R.25
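The univariate comparisons can be sketched in plain Python as follows. The analyses in this study were run in R; this block is only an illustration of the two test statistics, with hypothetical function names and hypothetical data. The χ2 statistic is computed for a 2 × 2 table without continuity correction, and the rank-sum statistic W is computed as the number of pairs in which a value from the first group exceeds a value from the second, with ties counted as one half.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and P value (1 df) for a 2 x 2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def rank_sum_w(x, y):
    """Rank-sum statistic W: pairs with x > y, plus half the tied pairs."""
    w = 0.0
    for xi in x:
        for yi in y:
            if xi > yi:
                w += 1.0
            elif xi == yi:
                w += 0.5
    return w


# Hypothetical counts: rows = clustered / nonclustered patients,
# columns = has / lacks the characteristic of interest.
table = [[22, 7], [14, 20]]
chi2, p_value = chi_square_2x2(table)

# Hypothetical ages for two small groups.
w = rank_sum_w([24, 29, 31, 35], [33, 38, 41, 45])
```

A P value for W would additionally require its null distribution (exact or by normal approximation), which is left out of this sketch.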
RESULTS
Assignment to Subtypes
Of the 63 patients from whom a blood sample was available, env sequences were obtained from all 63 and pol sequences were obtained from 58. We failed to obtain pol sequences from 5 samples: 4 were subtype B (M19, M32, M35, and M74) and 1 (M59) was sub-subtype F1 (henceforth designated subtype F1) in env. Fifty-four samples were subtype B in both regions, whereas 3 (from couple M02/M08 and patient M31) were F1 in env and B/F1 recombinants in pol. These recombinants shared the same intersubtype breakpoints and consistently clustered together with high bootstrap values in all analyses. One additional sample (M36) was subtype B in env with a different B/F1 mosaic pattern in pol. These recombinants are described further elsewhere. No other subtypes were found.
Identification of Phylogenetically Related Clusters and Couples
Distance matrix-based phylogenetic analyses revealed potential clusters and couples of genetically related sequences in env and pol studies (not shown). To understand the genetic relatedness among subtype B strains further, ML analyses were performed using additional subtype B controls rooted with 2 subtype C strains (Fig. 1).
Five subtype B clusters ensued (see Fig. 1). Cluster A contains 10 sequences (M20, M33, M34, M35, M40, M50, M51, M52, M64, and M65) supported by significant bootstrap values in env (98) and pol (89), except M33, which branches with a bootstrap value of 83 in pol but only 60 in env. Cluster B contains 7 sequences (M11, M17, M21, M24, M25, M55, and M61) supported by a bootstrap value of 94 in env and 95 in pol. Cluster C contains 3 sequences (M57, M58, and M63) supported by a bootstrap value of 100 in pol and 86 in env. Cluster D contains 3 sequences (M47, M68, and M70) supported by a bootstrap value of 100 in pol and 93 in env. Cluster E contains 3 sequences (M67, M77, and M78) supported by a bootstrap value of 100 in both phylogenies.
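The cluster memberships listed above can be tallied with a short Python sketch, which also checks that no patient is assigned to more than one cluster. The data structure and function name are illustrative. Counting the 3 B/F1 recombinant sequences that clustered together (M02, M08, and M31) as an additional cluster is an assumption of this sketch.

```python
# Subtype B cluster membership as listed in the text.
clusters = {
    "A": ["M20", "M33", "M34", "M35", "M40", "M50", "M51", "M52", "M64", "M65"],
    "B": ["M11", "M17", "M21", "M24", "M25", "M55", "M61"],
    "C": ["M57", "M58", "M63"],
    "D": ["M47", "M68", "M70"],
    "E": ["M67", "M77", "M78"],
}


def tally(cluster_map):
    """Return per-cluster sizes and the total number of clustered patients."""
    sizes = {name: len(members) for name, members in cluster_map.items()}
    all_members = [m for members in cluster_map.values() for m in members]
    if len(all_members) != len(set(all_members)):
        raise ValueError("a patient is listed in more than one cluster")
    return sizes, len(all_members)


sizes, total_subtype_b = tally(clusters)

# Assumption: the B/F1 recombinants that clustered together form a sixth cluster.
recombinant_cluster = ["M02", "M08", "M31"]
total_clustered = total_subtype_b + len(recombinant_cluster)
```

Under that assumption, the tally gives 26 subtype B sequences plus 3 recombinant sequences, for 29 clustered sequences in total.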
Known Epidemiologic Relationships in the Cohort
Epidemiologic linkage was known a priori for some patients. Nine heterosexual couples (M02/M08, M11/M21, M25/M55, M20/M52, M35/M50, M64/M65, M57/M58, M42/M44, and M14/M23), 1 homosexual couple (M68/M70), and 1 group of 1 man and 2 women (M45/M48/M54) had a known direct link. Patients M11 and M25 were also epidemiologically related, thus linking couples M11/M21 and M25/M55.
ML analyses confirmed that sequences recovered from patients with a known epidemiologic relationship shared close genetic relatedness, were topologically associated, and clustered with bootstrap values greater than 80, except for sequences from the group M45/M48/M54. Sequences from epidemiologically related couples fell within cluster A (M20/M52, M35/M40, and M64/M65), cluster B (M11/M21 and M25/M55), cluster C (M57/M58), and cluster D (M68/M70) as well as in isolated branches (M14/M23 and M42/M44). Sequences from couple M02/M08 were B/F1 recombinants.
Comparison of Envelope and Polymerase Phylogenies
Env and pol phylogenies yielded the same overall results, and relationships in one of the studies were reproduced in the other. The env tree, however, tended to present somewhat lower bootstrap values. Examples are the lower bootstrap figures for the common branch of clusters A (including sequence M33), C, D, and E as well as for couple M42/M44. In contrast, couple M14/M23 was strongly supported in env. Nonclustered sequences in env (eg, samples from patients M12, M39, M49, and M76, and from inhabitants of a neighboring county; see Fig. 1A) were invariably nonclustered in pol.
Demographic, Social, Clinical, and Behavioral Associations Within Phylogenetic Clusters
Samples from 29 (46%) patients yielded clustered sequences (Table 1). On univariate analysis, these patients were found to be younger, more likely to have a known epidemiologic link within the cohort, and more likely to have always lived in Miracema. Multivariate logistic regression analysis identified having a known direct epidemiologic relationship (odds ratio [OR] = 4.46, 95% confidence interval [CI]: 3.27 to 5.68; P = 0.014) and having always lived in Miracema (OR = 5.48, 95% CI: 4.30 to 6.65; P = 0.0044) as independent predictors of belonging to a cluster.
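For readers less familiar with logistic regression output, the following Python sketch shows how an odds ratio, a Wald-type 95% confidence interval, and a two-sided Wald P value are conventionally derived from a fitted coefficient and its standard error, and how the proportion of clustered patients is obtained. The coefficient and standard error are hypothetical and are not taken from this study; the function names are illustrative.

```python
import math


def odds_ratio_with_ci(beta, se, z_crit=1.96):
    """Exponentiate a logistic coefficient and its Wald confidence limits."""
    return (
        math.exp(beta),
        math.exp(beta - z_crit * se),
        math.exp(beta + z_crit * se),
    )


def wald_p_value(beta, se):
    """Two-sided P value for the Wald z statistic beta / se."""
    z = beta / se
    return math.erfc(abs(z) / math.sqrt(2))


def percent(part, whole):
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# Hypothetical coefficient and standard error, for illustration only.
or_estimate, ci_low, ci_high = odds_ratio_with_ci(beta=1.5, se=0.6)
p_value = wald_p_value(beta=1.5, se=0.6)

# Proportion of sampled patients with clustered sequences (29 of 63).
clustered_pct = percent(29, 63)
```

Because the interval is built on the log-odds scale, it is asymmetric around the odds ratio once exponentiated.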
Genotypic Resistance to Reverse Transcriptase and Protease Inhibitors
Two B/F1 recombinants and 7 subtype B sequences recovered from treatment-experienced patients harbored mutations known to confer reduced drug susceptibility (see Fig. 1B). No evidence of transmission of resistant variants was recorded. The presence of these mutations did not seem to obscure or artificially create phylogenetic relationships.
DISCUSSION
Current epidemiologic data show the spread of the AIDS epidemic toward small counties and the innermost parts of Brazil.1 Small Brazilian communities are likely to be challenged to provide care to an increasing number of patients, yet these places generally have a much less comprehensive health infrastructure and lack physicians familiar with the medical management of HIV-1 infection. Patients from these areas may face problems such as confidentiality issues (which can present barriers to counseling, testing, and treatment), long distances to medical facilities, and lack of nongovernmental organization support. The features of HIV-1 infection in these places need to be appropriately studied so as to optimize the institution of adequate control measures, grasp the true magnitude of the problem, improve clinical recognition and management, and better allocate resources.
The extreme nucleotide sequence variation of HIV-1 makes it possible to trace patterns of viral spread between populations, groups, and people.2 There has been considerable discussion, however, on which genomic region is the most informative and on which phylogenetic methodology5,26 is best suited for such investigations. In recent years, genotypic analysis of antiretroviral drug resistance has generated expanding local and public pol data sets. The suitability of this genomic region for epidemiologic investigations is being increasingly demonstrated,27-29 although it remains a controversial issue.30-32 The present Miracema cohort offers, in our opinion, a unique contribution to the subject. Previous investigations on the usefulness of pol involved the study of viral transmission between isolated cases27,28,30 or the preselection of samples on the basis of closest pairwise distances.29 Instead, we had the opportunity to study cross-sectional samples from 63 members of a 78-patient cohort. Our patients were placed at variable degrees of proximity in the transmission chains, and analyses of both regions yielded essentially the same overall results. Caution should be exercised when refined transmission chain analysis is needed, however, because a pattern of parallel evolution may emerge among sequences that harbor a similar set of resistance mutations.33
Caution should also be taken when studying patients who might have been multiply infected. In a C2V3 study, the establishment of epidemiologic linkage between 2 multiply infected IVDUs who had shared infected needles required the analysis of a series of proviral and plasma strains.34 In the Miracema cohort, the only situation in which a phylogenetic relationship between env and pol sequences from linked patients could not be established involved an IVDU (patient M48) and his female sexual partners (M45/M54). Interestingly, studies that found pol suboptimal for phylogenetic analyses also involved IVDU-recovered sequences.3,30
We found a polyphyletic pattern suggesting multiple viral introductions in the region. Subtyping analysis was in accordance with studies that found the prevalent subtypes in southeast Brazil to be B, F1, and B/F1 recombinants.22,35 We also found 29 samples (representing 46% of the cohort) forming 6 clearly defined clusters. Intracluster cases, including those with no known direct epidemiologic link, probably took part in the same chain of viral transmission, suggesting the existence of sexual networks and the emergence of multiple new infections within a relatively short period. Such potential molecular markers of high incidence highlight the urgent need to perform incidence studies in inner Brazil. In general, clusters were supported by a lower bootstrap value in env. This might reflect the continuously evolving nature of C2V3/env and may even indicate that the targeted fragment of pol is better suited for establishing relationships, especially after a given period has elapsed between transmission and sampling. The presence of mutations associated with drug resistance in the Miracema cohort did not seem to obscure or artificially create phylogenetic relationships.
In summary, this molecular epidemiology study of HIV-1 sequences from a small Brazilian county found evidence of multiple clusters of strains sharing close genetic relatedness, suggesting the existence of sexual networks and a high-incidence molecular profile. Our results highlight the need for further investigations to better delineate the features of HIV-1 infection in small Brazilian counties and to devise appropriate control strategies to halt epidemic spread toward these areas.
The authors are indebted to the patients for agreeing to participate in this study. They thank the many dedicated health care professionals in Miracema, without whom this study would not have been possible. They also thank Laboratório Avançado de Saúde Pública in Salvador, Bahia State, Programa Nacional de DST/AIDS, and Fundação de Amparo à Pesquisa do Estado da Bahia for organizing invaluable international workshops on bioinformatics.
1. Szwarcwald CL, Bastos FI, Esteves MA, et al. The spread of the AIDS epidemic in Brazil from 1987 to 1996: a spatial analysis. Cad Saude Publica. 2000;16:7-19.
2. Kuiken C, Thakallapalli R, Eskild A, et al. Genetic analysis reveals epidemiologic patterns in the spread of human immunodeficiency virus. Am J Epidemiol. 2000;152:814-822.
3. Albert J, Wahlberg J, Leitner T, et al. Analysis of a rape case by direct sequencing of the human immunodeficiency virus type 1 pol and gag genes. J Virol. 1994;68:5918-5924.
4. Ou CY, Ciesielski CA, Myers G, et al. Molecular epidemiology of HIV transmission in a dental practice. Science. 1992;256:1165-1171.
5. Leitner T, Escanilla D, Franzen C, et al. Accurate reconstruction of a known HIV-1 transmission history by phylogenetic tree analysis. Proc Natl Acad Sci USA. 1996;93:10864-10869.
6. Yu XF, Liu W, Chen J, et al. Rapid dissemination of a novel B/C recombinant HIV-1 among injection drug users in southern China. AIDS. 2001;15:523-525.
7. Ou CY, Takebe Y, Weniger BG, et al. Independent introduction of two major HIV-1 genotypes into distinct high-risk populations in Thailand. Lancet. 1993;341:1171-1174.
8. Oelrichs RB, Shrestha IL, Anderson DA, et al. The explosive human immunodeficiency virus type 1 epidemic among injecting drug users of Kathmandu, Nepal, is caused by a subtype C virus of restricted genetic diversity. J Virol. 2000;74:1149-1157.
9. Nguyen L, Hu DJ, Choopanya K, et al. Genetic analysis of incident HIV-1 strains among injection drug users in Bangkok: evidence for multiple transmission clusters during a period of high incidence. J Acquir Immune Defic Syndr. 2002;30:248-256.
10. Vidal N, Peeters M, Mulanga-Kabeya C, et al. Unprecedented degree of human immunodeficiency virus type 1 (HIV-1) group M genetic diversity in the Democratic Republic of Congo suggests that the HIV-1 pandemic originated in Central Africa. J Virol. 2000;74:10498-10507.
11. Trask SA, Derdeyn CA, Fideli U, et al. Molecular epidemiology of human immunodeficiency virus type 1 transmission in a heterosexual cohort of discordant couples in Zambia. J Virol. 2002;76:397-405.
12. Eyer-Silva WA, Basílio-de-Oliveira CA, Morgado MG. HIV-1 infection and AIDS in a small municipality in Southeast Brazil. Rev Saude Publica. 2005;39:950-955.
13. Instituto Brasileiro de Geografia e Estatística (IBGE). Censo Demográfico de 2000. IBGE, 2001, Rio de Janeiro, Brazil.
14. Centers for Disease Control and Prevention. 1993 Revised classification system for HIV infection and expanded surveillance case definition for AIDS among adolescents and adults. MMWR Recomm Rep. 1992;41:961-962.
15. Burland TG. DNASTAR's Lasergene sequence analysis software. Methods Mol Biol. 2000;132:71-91.
16. Thompson JD, Gibson TJ, Plewniak F, et al. The CLUSTAL X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 1997;25:4876-4882.
17. Saitou N, Nei M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol. 1987;4:406-425.
18. PAUP*: Phylogenetic analysis using parsimony (*and other methods) [computer program]. Version 4.0b10. Sunderland, MA: Sinauer Associates; 1999.
19. Felsenstein J. Confidence limits on phylogenies: an approach using the bootstrap. Evolution. 1985;39:783-791.
20. Salminen MO, Carr JK, Burke DS, et al. Identification of breakpoints in intergenotypic recombinants of HIV type 1 by bootscanning. AIDS Res Hum Retroviruses. 1995;11:1423-1425.
21. Felsenstein J. Maximum-likelihood estimation of evolutionary trees from continuous characters. Am J Hum Genet. 1973;25:471-492.
22. Brindeiro PA, Brindeiro RM, Mortensen C, et al. Testing genotypic and phenotypic resistance in human immunodeficiency virus type 1 isolates of clade B and other clades from children failing antiretroviral therapy. J Clin Microbiol. 2002;40:4512-4519.
23. Posada D, Crandall KA. MODELTEST: testing the model of DNA substitution. Bioinformatics. 1998;14:817-818.
24. Xia X, Xie Z. DAMBE: software package for data analysis in molecular biology and evolution. J Hered. 2001;92:371-373.
25. Ihaka R, Gentleman R. R: a language for data analysis and graphics. J Comput Graph Stat. 1996;5:299-314.
26. Posada D, Crandall KA. Selecting models of nucleotide substitution: an application to human immunodeficiency virus 1 (HIV-1). Mol Biol Evol. 2001;18:897-906.
27. Goujon CP, Schneider VM, Grofti J, et al. Phylogenetic analyses indicate an atypical nurse-to-patient transmission of human immunodeficiency virus type 1. J Virol. 2000;74:2525-2532.
28. Metzker ML, Mindell DP, Liu XM, et al. Molecular evidence of HIV-1 transmission in a criminal case. Proc Natl Acad Sci USA. 2002;99:14292-14297.
29. Hue S, Clewley JP, Cane PA, et al. HIV-1 pol gene variation is sufficient for reconstruction of transmissions in the era of antiretroviral therapy. AIDS. 2004;18:719-728.
30. Palmer S, Vuitton D, Gonzales MJ, et al. Reverse transcriptase and protease sequence evolution in two HIV-1-infected couples. J Acquir Immune Defic Syndr. 2002;31:285-290.
31. Sturmer M, Preiser W, Gute P, et al. Phylogenetic analysis of HIV-1 transmission: pol gene sequences are insufficient to clarify true relationships between patient isolates. AIDS. 2004;18:2109-2113.
32. Jenwitheesuk E, Liu T. Single phylogenetic reconstruction method is insufficient to clarify relationships between patient isolates in HIV-1 transmission case. AIDS. 2005;19:743-744.
33. Lemey P, Derdelinckx I, Rambaut A, et al. Molecular footprint of drug-selective pressure in a human immunodeficiency virus transmission chain. J Virol. 2005;79:11981-11989.
34. Song JZ, Wang B, Ge YC, et al. Significance of plasma and peripheral blood mononuclear cell derived HIV-1 sequences in establishing epidemiologic linkage between two individuals multiply exposed to HIV-1. Microb Pathog. 1999;26:287-298.
35. Guimaraes ML, dos Santos Moreira A, Loureiro R, et al. High frequency of recombinant genomes in HIV type 1 samples from Brazilian southeastern and southern regions. AIDS Res Hum Retroviruses. 2002;18:1261-1269.
© 2006 Lippincott Williams & Wilkins, Inc.