The HIV-1 circulating recombinant form (CRF) 01_AE, first identified among female sex workers in northern Thailand in 1989 [1–3] initially named subtype (or clade) E before its identification as a new recombinant virus [4,5]. Phylogenetic studies have shown that CRF01_AE originated in central Africa in the 1970s and spread in Thailand in the 1980s through heterosexual transmission [4,6]. It was subsequently confirmed as the first large-scale epidemic of a recombinant HIV-1 strain in the world [5,7].
In the early 1990s, CRF01_AE strains were first identified in China among persons at risk of sexual transmission and IDUs in the southwest provinces of Yunnan and Guangxi, which border Myanmar and Vietnam [7–11]. Subsequently, China's nationwide molecular epidemiological surveys in 1996–2002 showed that the early spread of CRF01_AE was limited to the eastern coastal areas and southwest border provinces, predominantly in heterosexual populations [12,13]. However, as of the latest nationwide molecular epidemiological survey for newly reported cases in 2006, CRF01_AE strains had quickly emerged as the most widespread HIV-1 strain in terms of geographic and risk group distributions, accounting for approximately 28% of nationwide HIV infections of that period . More recently, CRF01_AE strains have also become the predominant strain among MSM in China [15,16]. Clearer delineation of the origin, timescale, spatial spread and risk population structure of the HIV-1 recombinant CRF01_AE epidemic in China is, therefore, of considerable public health importance.
In the present study, we sampled various risk populations across major CRF01_AE epidemic regions in China and determined a total of 75 new near full-length genome (NFLG) sequences of CRF01_AE strains. We analyzed the sequences using phylogenetic and molecular clock methods, and identified at least seven closely timed transmission clusters with distinct geographic and risk group distributions. Our findings elucidate the origins and complexity of China's CRF01_AE epidemic and underscore the challenges in controlling sexual transmission of HIV-1.
Sample selection and dataset characteristics
On the basis of the results of the latest national HIV molecular epidemiology survey (NHMES) , all HIV-1 CRF01_AE samples identified were subjects for this study. Approximately 50% of the available specimens (n = 162) were randomly selected for amplification of the NFLGs. Additional samples (n = 99) were collected in order to cover the regions where the NHMES study failed to obtain NFLG CRF01_AE sequences. All together, a total of 75 HIV-1 CRF01_AE NFLG sequences were obtained with about two-thirds (n = 47) from the NHMES and one-third (n = 28) from additional sampling. All available CRF01_AE NFLGs from China (n = 27) in the Los Alamos HIV database were also downloaded and analyzed together with the 75 newly generated NFLG sequences.
The dataset of 102 NFLGs of CRF01_AE strains consists of all major risk populations for CRF01_AE in China, including heterosexuals (n = 58), IDUs (n = 20), MSM (n = 18), and unknown risk (n = 6). Altogether, it provides wide coverage in geographical regions, from which over 90% of the total estimated CRF01_AE HIV-1-infected cases were found . All participants, except four people (two from Yunnan province and two from Hunan provinces), in the study were antiviral treatment naive.
The study was approved by the institutional review board of the National Center for AIDS/STD Control and Prevention. Informed consent was obtained from all participants at the time of sample collection.
Near full-length genome amplification and sequencing
Viral RNA was extracted from plasma using the QIAamp Viral RNA Mini kit (Qiagen, Valencia, California, USA) and reverse transcribed by SuperScript III reverse transcriptase (Invitrogen, Carlsbad, California, USA). The NFLG fragments (552–9636 nt relative to HXB2) were amplified by nested PCR as described by Li et al.  using primers 01_AE-FL1.5 (5′- CCTTGAGTGCTTAAAGTGGTGTGTGCC CGTCTGT-3′, HXB2: 538–571) and 01_AE-FL1.3 (5′- ACCACTTTAAGCACTCAA GGCAAGCTTTATTG-3′, HXB2: 9666–9611) for the first round PCR reaction, and 01_AE-FL2.5 (5′- AGTGGTGTGTGCCCGTCTGTGTTAGGACTC -3′, HXB2: 552–581) and 01_AE-FL2.3 (5′- TTAAGCACTCAAGGCAAGCTTTATTGAGGCTT A-3′, HXB2: 9636–9604) for the second. The PCR products were purified using QIAquick Gel Extraction Kit (Qiagen, Valencia, California, USA) and direct-sequenced using BigDye terminators (Applied Biosystems, Foster City, California, USA). Sequences newly determined in this study are available in GenBank under accession number JX112796-JX112870.
Reference sequences and sequence alignment
In addition to the 102 Chinese NFLGs sequences described above, all available HIV-1 CRF01_AE NFLGs from other countries (accessed in April 2012) were downloaded from Los Alamos HIV sequence database as reference sequences for this analysis. After removing redundancy and those without sampling information, the references (n = 92) were combined with the Chinese sequences to yield a total dataset of 194 sequences. The geographic location, sampling year, and risk groups for the sequences used in this study are summarized in Table 1 and Supplemental Table S1, http://links.lww.com/QAD/A341. An initial alignment of all sequences was performed using Gene Cutter from the Los Alamos HIV sequence database (http://www.hiv.lanl.gov/content/sequence/GENE_CUTTER/cutter.html). The alignment was subsequently adjusted manually using BioEdit v7.0.9 .
Model selection analyses
ModelTest v3.7  was used to find the best-fitting model of nucleotide substitution for our dataset. The general time reversible (GTR) model with γ-distributed (Γ4) among-site rate heterogeneity, and a proportion of invariant sites (I) (the GTR+I+Γ4 model) were chosen as the most appropriate mode on the basis of the standard Akaike information criterion.
PhyML 3.0  was used to estimate a maximum likelihood phylogenetic tree for NFLG sequences using the GTR+I+Γ4 nucleotide substitution model. Tree topologies were searched heuristically using the subtree pruning and regrafting procedure. The confidence of each node in phylogenetic trees was determined using the bootstrap method with 500 replicates. The final maximum likelihood tree was visualized using the program FigTree v1.3.1 (http://beast.bio.ed.ac.uk).
Web-based software Hypermut (http://www.hiv.lanl.gov/) was used to screen for hypermutation from the 194 NFLGs. Consequently, one sequence (FJ061, Fig S1) was identified from Cluster 6 and removed to avoid disproportional influence on the Bayesian phylogenetic analysis.
To estimate the evolutionary rate (μ: nucleotide substitutions/site per year) and the divergence times (time to most recent common ancestor; tMRCA) of various CRF01_AE lineages, we performed Bayesian phylogeny analysis for the gag, pol and env genes. Phylogenies were inferred using BEAST v.1.6.1 under an uncorrelated log-normal relaxed clock model, GTR+Γ4 substitution model, and Bayesian skyline plot demographic model [21–23]. For each gene, BEAST analysis was performed using Markov Chain Monte Carlo runs of chain length 3 × 108, and the first 10–30% of states of each run were discarded as burn-in. Ten thousand trees were then sampled to estimate μ and the tMRCA for each CRF01_AE lineage. Convergence was checked using Tracer v1.5 (http://beast.bio.ed.ac.uk/Tracer), and most parameters had effective sample sizes more than 200, except the tMRCA of the env region of cluster 6 (ESS = 196) (Supplemental Table S2, http://links.lww.com/QAD/A341).
Identification of seven independent HIV-1 CRF01_AE lineages in China
We generated 75 new NFLG sequences of CRF01_AE HIV-1 strains collected from various risk groups and geographical regions in China. We performed phylogenetic analysis on this dataset in combination with existing NFLG sequences available from the Los Alamos HIV Databases, which consisted of 27 sequences from China and 92 from other countries. As shown in Fig. 1, the maximum-likelihood phylogeny identified seven well supported (bootstrap values were greater than 90%) and distinct clusters of CRF01_AE strains in China. Detailed information of the phylogeny tree in Fig. 1 is shown in supplemental Figure S1, http://links.lww.com/QAD/A341. Clusters 1 (n = 39), 2 (n = 10) and 3 (n = 3) were found primarily among heterosexuals and IDUs in southern provinces of China (Table 2). Of note, cluster 2 contained CRF01_AE strains circulating among IDUs in northern Vietnam (n = 8), southeastern China (Guangxi, n = 7; Fujian n = 2) and eastern China (n = 1) (Fig. 1) was placed within a large phylogenetic group comprised of CRF01_AE strains from Vietnam (designated as VN/CN cluster) (Fig. 1). As previously reported by Liao et al. , CRF01_AE strains from Vietnam can be divided into three groups that are defined by risk groups and geographic regions (Fig. 1): CRF01_AE variants circulating among heterosexuals and IDUs in southern Vietnam (SV Hetero and SV IDU clusters, respectively), and among IDUs in northern Vietnam and China's Guangxi province (NV/GX IDU cluster). Here, Cluster 2 entirely encompasses the sequences from the NV/GX IDU cluster, and is closely related to the SV IDU cluster, with which it is grouped together with a bootstrap score of 100%.
On the contrary, clusters 4 (n = 15) and 5 (n = 9) appeared to be associated with the epidemic among MSM in China. In fact, all CRF01_AE sequences identified from MSM in China in the present study (n = 18) belonged to either cluster 4 (n = 12) or cluster 5 (n = 6) (Table 2 and Fig. 1). Cluster 4 was comprised of two subclusters, which are labeled as 4a (n = 10) and 4b (n = 5) (Fig. 1). Subcluster 4a was exclusively from MSM in northern China [Beijing (n = 9); Tianjin (n = 1)], whereas subcluster 4b contained CRF01_AE strains from non-MSM populations (or risk factor unknown) from eastern and southern provinces (three of five) (Fig. 1). Cluster 5 strains (n = 9) were found among MSM (n = 6) and heterosexuals (n = 3) in Beijing (n = 2) and other northeastern provinces [Jilin (n = 4) and Liaoning (n = 3)].
Clusters 6 (n = 5) and 7 (n = 3) are province-specific clusters detected only among heterosexuals in Fujian and Yunnan provinces, respectively (Table 2 and Fig. 1). Finally, the remaining 18 HIV-1 strains from China were intermingled with CRF01_AE sequences of Thai origin (designated ‘ungrouped’; Table 2) and were all from southern provinces [Fujian (n = 9); Guangdong (n = 2); Guizhou (n = 1); Yunnan (n = 6)] (Fig. 1 and Table 2). The geographic and risk group distribution of the CRF01_AE clusters identified in this study are illustrated in Fig. 2 (see also Table 2).
Evolutionary characterization of CRF01_AE clusters in China
To explore the timeline of the emergence of CRF01_AE in China, we estimated the time of the most recent common ancestor (tMRCA) of each cluster using a relaxed molecular clock approach . Analyses were performed in BEAST v1.6.1, using three different subgenomic regions [gag (HXB2: 790–2292nt); pol (HXB2: 2085–5096nt); env (HXB2 : 6225 −8795nt)], under an uncorrelated lognormal relaxed clock model with a GTR+Γ4 substitution model and a Bayesian skyline demographic model. The estimated evolutionary rates were 5.05 (4.46–5.65) × 10–3, 3.25 (2.94–3.58) × 10–3, and 6.59 (5.96–7.21) × 10–3 substitutions/site per year for the gag, pol, and env regions, respectively (numbers in parenthesis show the 95% highest posterior density for each estimate). The estimates and 95% credible regions of the tMRCAs of each CRF01_AE lineage are depicted in Fig. 3 (see also supplemental Table S2, http://links.lww.com/QAD/A341). The tMRCAs of lineages from central Africa and Thailand were estimated to be in the mid-1970s and mid-1980s, respectively, in accordance with the literature. The tMRCAs of the seven Chinese CRF01_AE lineages fell within in a narrow time range dated from early to late 1990s (Fig. 3; supplemental Table S2, http://links.lww.com/QAD/A341). In general, the substitution and evolutionary models and genomic regions used for the analyses had no significant impact on tMRCA estimates (supplemental Table S2, http://links.lww.com/QAD/A341).
In this study, we identified seven independent lineages of HIV-1 CRF01_AE strains circulating in China, the origins of which all date to the mid-to-late 1990s (Fig. 1, Fig. 3). Each cluster appears to have a unique role in China's CRF01_AE HIV-1 epidemic, with distinct associations with particular risk groups and geographic regions (Fig. 2, Table 2). Clusters 1 and 3 were found among heterosexuals and IDUs in mostly southern provinces; the high prevalence of CRF01_AE in these regions is supported by former surveys on southern China [24,25]. Cluster 2, which contained strains circulating among IDUs in Guangxi province, as well as northern Vietnam, was most likely a descendant of strains found among IDUs in southern Vietnam, which clustered closely . Clusters 4 and 5 were highly represented by the more recently emerged transmission among MSM (13 of 15 and 6 of 9, respectively; Table 2) in Beijing and other northern provinces. Clusters 6 and 7 were small, region-specific clusters found only among heterosexuals in Fujian and Yunnan provinces, respectively. Together, these clusters demonstrate the exceptional diversity of the HIV-1 CRF01_AE epidemic in China.
Two observations about the origin and spread of China's CRF01_AE epidemic can be inferred from the phylogenetic structure and transmission patterns. First, although both CRF01_AE and subtype B′ strains entered China following earlier epidemics in nearby Southeast Asian countries [1–3,7–11,26–29], including Thailand, their transmission through distinct risk populations resulted in highly dissimilar geographic and demographic distributions in China. Previous study has shown that following border entry through IDU populations in Yunnan, subtype B′ HIV-1 strains were introduced to plasma donors in central China and consequently a single lineage was amplified throughout the region and the rest of the country . As of 2011, all subtype B′ lineages in circulation in China could be traced to this common ancestor . In contrast, our findings here concerning CRF01_AE showed multiple independent and approximately concurrent introductions into China, followed by population-specific and region-specific transmission through predominantly sexual routes (Fig. 2). These different transmission patterns of subtype B′ and CRF01 AE HIV-1 underscore the complexity of sexual transmission of HIV-1, which involves a more diffuse risk population and is, thus, much more difficult to control than blood transmission routes. As CRF01_AE becomes the most prevalent strain among heterosexual transmitters and MSM in China [13–16,30], it is likely that we will observe continued evolution and diversification within and across the seven major viral lineages.
Second, all CRF01_AE clusters identified here were estimated to have originated in China during a short window period from mid to late 1990s (Fig. 3). Although previous evidences have implicated the entry of CRF01_AE into China with concurrent epidemics in neighboring countries [7–10], there has been no comprehensive analysis of the subtype's origin in China due to limited samples of those studies. Here, we observe that all major clusters and ungrouped sequences, except for cluster 2, bear direct phylogenetic relationships to sequences from Thailand. Demographic data on travel supports this observation: this was likely a result of vastly increased international travel to and from China, starting in the early 1990s when the Chinese government began relaxing border restrictions to Chinese citizens for unofficial international travel (so called ‘Free travel policy’) as part of the open door and economic reform policy of the early 1990s . Thailand has been the top destination of outbound tourism for Chinese in the first decade of ‘China's free travel policy’, as it was the first country that Chinese citizens were allowed to visit (Table 3) [3,32]. This timing coincides with Thailand's explosive HIV-1 epidemic which peaked in mid-1990s and driven by the country's sex industry through heterosexual transmission of CRF01_AE strains. After the Thai government successfully initiated the ‘100% Condom Program’, a rapid increase in condom usage rate was followed by significant decreases in HIV incidence among both direct and indirect CSWs in the country beginning from the late 1990s (Table 3). It is likely that the few years between the opening up of Chinese travel to Thailand and Thailand's effective reduction of HIV incidence served as a short window for the introduction of multiple CRF01_AE strains from Thailand into China.
Additionally, the ‘ungrouped’ CRF01_AE strains in our phylogenetic analysis (18 of 102, 17.6%) were exclusively from southern provinces in China: Fujian, Yunnan, Guizhou, and Guangdong (Table 2, Fig. 2). Although we cannot formally rule out sampling bias (despite of the relatively large sample size of NFLGs, n = 102;), this implicates the earliest entries of CRF01_AE strains into China from Southeast Asia to have taken place in individuals and populations in southern China, especially Fujian, Yunnan and Guangdong provinces, where the earliest epidemics among heterosexuals in China have been reported.
In summary, our results show, for the first time, that the CRF01_AE epidemic in China is remarkably complex, comprising of multiple genetically distinct lineages that were independently introduced into China during the mid-to-late 1990s and subsequently spread into different risk groups and geographic regions. Further study and surveillance of China's CRF01_AE lineages will provide knowledge on their increasing transmission and continued evolution, which will help to inform effective intervention programs.
The authors would like to thank Yao Yang and Jing Wei for technical support, Dr Punnee Pitisuttithum of Mahidol University for assistance in acquiring official Bureau of Epidemiology data records on HIV prevalence among Thailand's sex workers.
The authors would like to thank the Group for HIV Molecular Epidemiologic Survey. Contributing members of the Group [name of contributor (facility of the contributor)]: Hongyan Lu (Beijing CDC); Pingping Yan (Fujian CDC); Peng Lin (Guangdong CDC); Shen Zhiyong, Shujia Liang (Guangxi CDC); Xianguang Sun (Guizhou CDC); Xi Chen, Jianmei He (Hunan CDC); Haitao Yang, Xiaoqin Xu (Jiangsu CDC); Xiangdong Meng, Xihui Zang (Jilin CDC); Fengxia Jiang (Liaoning CDC); Guangming Qin, Shu Liang (Sichuan CDC); Xiaoke Zhu, Minna Zheng (Tianjin CDC); Yanling Ma, Manhong Jia (Yunnan CDC).
Y.S., X.H. and Y.F. conceived and designed the study. Y.F., F.L., X.L., Q.W., T.L., and X.H. performed the experiments and analyzed the data. Y.F., J.H.H., Y.T. and Y.S. drafted the manuscript. Y.S., X.H., Y.R., H.X., and O.P. interpreted data and provided critical review. All authors reviewed and approved the final manuscript.
This study was supported by the National Science and Technology Major Project for Infectious Diseases Control and Prevention (2008ZX10001–004, 2012ZX10001–002), and 2012ZX10001 - 008. National Natural Science Foundation of China (81261120379), NIH Foundation (1R01AI094562-01). International Cooperative Grant (2009DFB30420) and SKLID Development Grant (2012SKLID103).
Conflicts of interest
All authors declare that they have no conflicts of interest.
1. McCutchan FE, Hegerich PA, Brennan TP, Phanuphak P, Singharaj P, Jugsudee A, et al. Genetic variants of HIV-1 in Thailand
. AIDS Res Hum Retroviruses
2. Carr JK, Salminen MO, Koch C, Gotte D, Artenstein AW, Hegerich PA, et al. Full-length sequence and mosaic structure of a human immunodeficiency virus type 1 isolate from Thailand
. J Virol
3. Nelson KE, Celentano DD, Suprasert S, Wright N, Eiumtrakul S, Tulvatana S, et al. Risk factors for HIV infection among young adult men in northern Thailand
4. Gao F, Robertson DL, Morrison SG, Hui H, Craig S, Decker J, et al. The heterosexual human immunodeficiency virus type 1 epidemic in Thailand is caused by an intersubtype (A/E) recombinant of African origin
. J Virol
5. Robertson DL, Sharp PM, McCutchan FE, Hahn BH. Recombination in HIV-1
6. McCutchan FE, Artenstein AW, Sanders-Buell E, Salminen MO, Carr JK, Mascola JR, et al. Diversity of the envelope glycoprotein among human immunodeficiency virus type 1 isolates of clade E from Asia and Africa
. J Virol
7. Liao H, Tee KK, Hase S, Uenishi R, Li XJ, Kusagawa S, et al. Phylodynamic analysis of the dissemination of HIV-1 CRF01_AE in Vietnam
8. Chen J, Young NL, Subbarao S, Warachit P, Saguanwongse S, Wongsheree S, et al. HIV type 1 subtypes in Guangxi Province, China, 1996
. AIDS Res Hum Retroviruses
9. Piyasirisilp S, McCutchan FE, Carr JK, Sanders-Buell E, Liu W, Chen J, et al. A recent outbreak of human immunodeficiency virus type 1 infection in southern China was initiated by two highly homogeneous, geographically separated strains, circulating recombinant form AE and a novel BC recombinant
. J Virol
10. Yu XF, Chen J, Shao Y, Beyrer C, Lai S. Two subtypes of HIV-1 among injection-drug users in southern China
11. Cheng H, Zhang J, Capizzi J, Young NL, Mastro TD. HIV-1 subtype E in Yunnan, China
12. Xing H, Pan PL, Su L, Fan XJ, Feng Y, Qiang LY, et al. [Molecular Epidemioiogical Study of a HIV-1 Strain of subtype E in China between 1996 and 1998]
. Zhongguo Xing Bing Ai Zi Bing Fang Zhi
13. Xing H, Liang H, Wan ZY, Chen X, Wei M, Ma PF, et al. [Distribution of recombinant human immunodeficiency virus type-1 CRF01_AE strains in China and its sequence variations in the env V3-C3 region]
. Zhonghua Yu Fang Yi Xue Za Zhi
14. He X, Xing H, Ruan Y, Hong K, Cheng C, Hu Y, et al. A comprehensive mapping of HIV-1 genotypes in various risk groups and regions across China based on a nationwide molecular epidemiologic survey
. PLoS One
15. Zhao B, Han X, Dai D, Liu J, Ding H, Xu J, et al. New trends of primary drug resistance among HIV type 1-infected men who have sex with men in Liaoning Province, China
. AIDS Res Hum Retroviruses
16. Zhang X, Li S, Li X, Xu J, Li D, Ruan Y, et al. Characterization of HIV-1 subtypes and viral antiretroviral drug resistance in men who have sex with men in Beijing, China
2007; 21 (Suppl 8):S59–S65.
17. Li Z, He X, Wang Z, Xing H, Li F, Yang Y, et al. Tracing the origin and history of HIV-1 subtype B′ epidemic by near full-length genome analyses
18. Hall TA. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT
. Nucleic Acids Symp Ser
19. Posada D, Crandall KA. MODELTEST: testing the model of DNA substitution
20. Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, Gascuel O. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0
. Syst Biol
21. Drummond AJ, Ho SY, Phillips MJ, Rambaut A. Relaxed phylogenetics and dating with confidence
. PLoS Biol
22. Drummond AJ, Nicholls GK, Rodrigo AG, Solomon W. Estimating mutation parameters, population history and genealogy simultaneously from temporally spaced sequence data
23. Drummond AJ, Rambaut A. BEAST: Bayesian evolutionary analysis by sampling trees
. BMC Evol Biol
24. Cheng CL, Feng Y, He X, Lin P, Liang SJ, Yi ZQ, et al. [Genetic characteristics of HIV-1 CRF01_AE strains in four provinces, southern China]
. Zhonghua Liu Xing Bing Xue Za Zhi
25. Zeng H, Sun Z, Liang S, Li L, Jiang Y, Liu W, et al. Emergence of a New HIV Type 1 CRF01_AE Variant in Guangxi, Southern China
. AIDS Res Hum Retroviruses
26. Ma Y, Li Z, Zhao SD. HIV infected people were first identified in intravenous drug users in China
. Chin J Epidemiol
27. Ou CY, Takebe Y, Weniger BG, Luo C, Kalish ML, Auwanit W, et al. Independent introduction of two major HIV-1 genotypes into distinct high-risk populations in Thailand
28. Shao Y, Chen Z, Wang B, Zeng Y, Zhao SD, Zhang ZR. Isolation of viruses from HIV infected individuals in Yunnan
. Chin J Epidemiol
29. Shao Y, Zhao QB, Wang B, Chen Z, Su L, Zeng Y, et al. Sequence analysis of HIV env genes among HIV infected drug injecting users in Dehong epidemic area of Yunnan province, China
. Chin J Virol
30. Li L, Lu X, Li H, Chen L, Wang Z, Liu Y, et al. High genetic diversity of HIV-1 was found in men who have sex with men in Shijiazhuang, China
. Infect Genet Evol
31. The yearbook of China tourism statistics.
Beijing: National Tourism Administration of the P.R.C. 1978–2011.
32. Rojanapithayakorn W, Hanenberg R. The 100% condom program in Thailand
China; CRF01_AE; HIV-1; near full-length genome; phylogenetic cluster; risk population
Supplemental Digital Content
© 2013 Lippincott Williams & Wilkins, Inc.