Spatiotemporal Phylodynamics of Hepatitis C Among People Who Inject Drugs in India : Hepatology

Secondary Logo

Journal Logo

Original Articles: Viral Hepatitis

Spatiotemporal Phylodynamics of Hepatitis C Among People Who Inject Drugs in India

Clipman, Steven J.1; Mehta, Shruti H.2; Rodgers, Mary A.5; Duggal, Priya1,2; Srikrishnan, Aylur K.3; Saravanan, Shanmugam3; Balakrishnan, Pachamuthu3; Vasudevan, Canjeevaran K.3; Ray, Stuart C.1; Kumar, Muniratnam S.3; Quinn, Thomas C.1,4; Cloherty, Gavin A.5; Lucas, Gregory M.1; Solomon, Sunil S.*,1,3

Author Information
Hepatology 74(4):p 1782-1794, October 2021. | DOI: 10.1002/hep.31912


Supported by the National Institute on Drug Abuse (grants R01DA032059, DP2DA040244, R01DA041736, K24DA035684), National Institute of Allergy and Infectious Diseases (AI094189), the Division of Intramural Research, and Abbott Diagnostics.

Potential conflict of interest: Dr. Mehta is on the speakers’ bureau for Gilead. Dr. Rodgers is employed by and owns stock in Abbott. Dr. Cloherty is employed by and owns stock in Abbott. Dr. Solomon reports grants/products and advisory board fees from Gilead Sciences and grants/products from Abbott Diagnostics, outside this published work.

People who inject drugs (PWID) bear a high burden of HCV with an estimated HCV prevalence upward of 50%.(1,2) Moreover, injection drug use is a major driver of new infections globally and primary prevention of HCV among this population is a major challenge.(3) However, over the past 5 years, there have been major advances in curative HCV therapeutics, and with these developments, conversations have turned to treatment or “cure” as prevention,(4‐8) following the example of HIV. Indeed, the World Health Organization has called for the elimination of HCV as a public health problem by 2030.(9‐11) It is well recognized that elimination will not be achieved unless treatment reaches all populations that are infected and capable of transmitting, including PWID. Some have posited that the optimal approach for HCV treatment delivery is through PWID networks(12‐14) with the added benefit of reducing the risk of reinfection. Yet these hypotheses are based on an incomplete understanding of the transmission dynamics of HCV, particularly in low and middle‐income countries (LMICs). A more nuanced understanding of these dynamics is critical for efficiently designing HCV elimination programs.

Advances in genetic sequencing and bioinformatic methods provide epidemiologists with a molecular toolset to elucidate viral transmission dynamics and harness epidemiologic and viral sequence data to infer evolutionary histories and identify factors driving viral spread in space and time. We leverage epidemiologic, geospatial, and viral genetic data to characterize HCV transmission dynamics in four cities across India. India provides a unique setting for exploring these dynamics because it is uniquely situated between the world’s two largest heroin‐producing regions, the “Golden Crescent” and the “Golden Triangle,” resulting in a diverse epidemiology of drug use and the largest number of opioid users globally.(15) Specifically, historical opioid epidemics in Northeastern India are thought to have been driven by proximity to the Golden Triangle, and more recent shifts in heroin production to the Golden Crescent appear to have seeded newer opioid epidemics in North and Central India. We included data from four cities across India representing this diversity of geography and drug use epidemiology and history (Amritsar [Northwest], New Delhi [North], Kanpur [Central], and Imphal [Northeast]). We further draw comparisons to HIV dynamics in coinfected and monoinfected PWID in the same cities.

Materials and Methods

Study Design

This study used survey data and 483 HCV and 492 HIV sequences isolated from stored specimens of PWID sampled using respondent‐driven sampling (RDS) across four Indian cities (Amritsar, New Delhi, Kanpur, and Imphal) between August 2016 and April 2017 as part of the evaluation assessment of a cluster‐randomized trial ( Identifier: NCT01686750). The number of PWID in each of these cities, from estimates in the literature, ranges from 34,000 to 100,000; HCV antibody prevalence among PWID in these cities ranges from 45.7% to 68.5%; and prevalence of HIV ranges from 13.8% to 31.1% (Supporting Table S1).(16,17) Eligibility criteria for the survey were (1) age ≥18 years, (2) self‐report of injecting drugs in the prior two years, and (3) provision of oral informed consent. Detailed procedures of this assessment, including evaluation of RDS assumptions, have been published.(17‐19) We did not find assumptions violated to an extent that could bias phylogenetic analyses, and individuals included in this paper are a subset of the larger RDS sample based on HIV/HCV specimen availability, further increasing stochasticity. None of the individuals who clustered recruited one another, and of the individuals included in this sample, 425 (88%) were recruited by someone other than an injection or sexual partner (e.g., a stranger, friend, acquaintance). As part of the trial, blood specimens were banked with appropriate consent for additional testing. Specimens were selected for HCV and HIV sequencing if they were antibody positive, had RNA >3.5 log10 copies/mL, and had adequate sample volume (≥1.5 mL). Sites were selected to represent geographic diversity, with Amritsar in the northwest bordering Pakistan, New Delhi in the north, Kanpur in North‐Central India, and Imphal in the northeast bordering Myanmar. The original sample size was 1,000 PWID per city. Of the 4,000 participants recruited across these four sites, 704 HCV and 622 HIV samples were antibody‐positive and satisfied the thresholds listed above, of which we obtained sequences for 498 HCV sequences (target sample size = 500; 15 sequences were excluded because they did not pass quality control [QC]) and 492 HIV sequences (see Supporting Fig. S2 for details). A total of 233 sequences were obtained from HIV/HCV coinfected individuals (i.e., we had HIV and HCV sequence from the same individual). Although the study presented in this paper is focused primarily on HCV phylodynamics, HIV sequences were used to draw comparisons between the two epidemics within the same population and cities and to evaluate evidence of shared transmission pathways.

Epidemiologic Survey Data and HIV/HCV Screening

As part of the study visit, participants completed an interviewer‐administered electronic survey, provided a blood sample, and underwent on‐site rapid HIV testing with pretest and posttest counseling with appropriate referrals. No personal identifying information was recorded. Biometric (fingerprint data) verification was used to identify duplicates. Residual samples were stored at the central repository in Chennai, India. The survey included modules on sociodemographics, including zip code of residence and injection, substance use, and risk behaviors, including sexual practices. The stored specimens were tested for HCV antibodies using the Genedia HCV ELISA 3.0 (Green Cross Medical Science, Chungbuk, Korea). All HCV antibody‐positive samples were tested for HCV RNA with the RealTime HCV assay (ABBOTT Molecular Inc., Des Plaines, IL) using an ABBOTT m2000rt instrument (ABBOTT Molecular Inc., Des Plaines, IL) with a lower limit of quantification of 30 IU/mL.


The HCV 5’ untranslated region (UTR)‐core region spanning ~700 bp and HIV envelope (env) immunodominant region (IDR) spanning ~650 bp were amplified by reverse transcription‐polymerase chain reaction (RT‐PCR) and sequenced using the Sanger methods described.(20,21) Primer sequences were removed with Sequencher v5.4.6 (Gene Codes, Ann Arbor, MI), and sequences were trimmed by quality and to remove leading and trailing ambiguous bases. Consensus contigs were assembled from forward and reverse reads with a minimum overlap of 20 bp and an 85% minimum match percentage. Contigs were manually reviewed for further trimming and calling of mixed bases.

Statistical Analyses

The 483 HCV and 492 HIV sequences were separately aligned using MUltiple Sequence Comparison by Log‐Expectation and edited manually for optimization. Further QC included checks for duplicated sequences, and the phylogenetic signal of the multiple sequence alignments was investigated using the Xia et al.(22) method in DAMBE7. In brief, this method is based on the concept of entropy in information theory and utilizes the index of substitution rate (Iss), defined as Iss=H/HFss, where H represents observed entropy and HFSS represents the entropy when sequences have reached full substitution saturation. If Iss is greater than Iss.c (the critical value at which the sequences fail to recover the true tree), we can conclude that the sequences have experienced severe substitution saturation and should not be used for phylogenetic reconstruction. These data show Iss.c > Iss (P < 0.0001) (Supporting Table S3). Substitution saturation was also evaluated by plotting the observed number of transitions and transversions against Tamura‐Nei (TN93) genetic distance for nn1/2 pairwise comparisons in the multiple sequence alignment of n taxa for HCV and HIV (Supporting Figs. S4 and S5). Transitions and transversions are expected to increase linearly with genetic distance and transitions are expected to be higher than transversions. Substitution saturation is reached when transversions become higher than transitions.

Determination of Nucleotide Substitution Model

The most appropriate nucleotide substitution model for phylogenetic analysis was determined using jModelTest v2.1.10 with scores determined using hierarchical likelihood ratio tests and Akaike Information Criterion. A general time reversible (GTR) model with four categories of assumed rate heterogeneity (Γ4) and invariant sites (I) was best suited for both these HCV and HIV data. The GTR model is generally one of the most flexible because it assumes unequal base frequencies and that each nucleotide has different transition and transversion rates as well as the gamma parameter for rate variation among sites.

Tree Inference

Maximum likelihood (ML) phylogenetic trees were inferred using RAxML with 500 bootstrap replications under a GTR+Γ4+I model. Temporal signal was examined using root‐to‐tip regression in TempEst v1.5.(23) Sequences sampled from Indian PWID were supplemented with additional reference sequences from India and neighboring countries by retrieving all unique sequences that spanned the 5’UTR‐core region and were annotated with sample date, country, and genotype in the Los Alamos HCV Sequence Database (India, n = 18; China, n = 32; Pakistan, n = 3; Myanmar, n = 7; sample years: 2001‐2016). Time‐resolved phylogenies were inferred using a Bayesian Markov Chain Monte Carlo (MCMC) approach in BEAST 2.5(24) with a GTR+Γ4+I site model and uncorrelated lognormally distributed relaxed molecular clock. To account for potential confounding from spatial and temporal population substructure, a structured birth‐death model(25) was used. Sensitivity analyses were conducted under a coalescent Bayesian skyline model and birth‐death serial skyline. Sampling dates were used to calibrate the time scale. Priors were set using knowledge from the literature and ML tree inference. MCMC was run for a chain length of 50 million states with sampling every 1,000 steps. Chain convergence was assessed using Tracer v1.7. The effective sample size (ESS) was examined for each parameter, and all ESS values were >200, generally considered sufficient for sampling and chain mixing. The maximum clade credibility tree was selected from the posterior tree distribution after a 10% burn‐in using TreeAnnotator v1.8.4.

Phylogeographic Analysis

A Bayesian phylogeographic approach implemented in BEAST 2.5 was used to describe the geographical distribution of HCV lineages and reconstruct the geospatial dynamics of HCV spread with temporal information obtained from time‐resolved phylogenies. By considering geographic locations as discrete states in a Bayesian statistical framework, the evolutionary history of viral migration through time can be reconstructed using Markov Jump counts of location‐state transitions along the posterior tree distribution and label nodes by location, both at tips where it is known, and internal nodes where it is inferred using ancestral state reconstruction. To avoid potential for bias in viral gene flow estimates introduced by unequal viral subsampling across sites, 90 sequences were randomly selected from each city. Sensitivity analysis was conducted using the full dataset. The model used an asymmetric GTR+Γ4 substitution model and a strict molecular clock, further sensitivity analysis was conducted using a relaxed lognormal clock. A coalescent Bayesian skyline model was used as the population model, and a coalescent constant population model was used for sensitivity analysis. Priors were set using knowledge gleaned from prior MCMC inference. The GEO_SPHERE package in BEAST was used to create a discrete geographic model and apply a Bayesian stochastic search variable selection method to identify the number of migrations between cities. Statistical support for location changes was assigned using a Bayes factor test and determines the most parsimonious depiction of viral migration patterns. The direction of transitions between the states (cities) was inferred in BEAST using the asymmetrical discrete traits analysis. The MCMC simulations ran for 55 million states reaching ESS > 200. SpreaD3 was used to visualize the data on a map and is further depicted in a time‐resolved phylogenetic tree format in the supplement (Supporting Fig. S8).

Identification of Linked Transmission Clusters

Identification of putative transmission clusters was performed using Cluster Picker with a distance threshold (d) of 0.045 for HCV and 0.015 for HIV and support threshold (s) of 0.9, considering only valid nucleotides (A,C,T,G).(26,27) The optimal distance threshold was determined using EPI‐ClusT (‐ClusT) by computing the threshold that maximizes the number of nonsingleton clusters in steps of 0.001. Sensitivity analysis was conducted with a distance‐free approach using Phydelity ( Phydelity negates the need for arbitrary distance thresholds by identifying groups of sequences that are more closely related to one another than the ensemble distribution under a statistically principled framework based on integer linear programming optimization.(28) We did not see substantial differences in cluster size or composition between these two methods.

Epidemiologic Risk Factor Analysis

Individual and network‐level risk factors were analyzed for an association with clustering through univariable and multivariable logistic regression. The Boruta random forest machine learning feature selection algorithm was used in Python to nominate candidate risk factors.

Geospatial Analysis

The geospatial dispersal of genetically linked individuals in New Delhi was evaluated by calculating the median Euclidean distance between centroids of self‐reported zip code of residence and injection venue using the pgeocode package ( in Python. Distances from linked individuals were compared with those from nonlinked individuals. To further determine if linked transmissions were significantly closer than expected by chance alone, 5,000 samples were randomly permuted from New Delhi at a probability proportional to the district population density. The median distance between permuted samples was compared with that of the linked study samples.

Ethical Approval

The study protocol was approved by the institutional review boards at Johns Hopkins Medicine (IRB00082575) and the YR Gaitonde Centre for AIDS Research and Education in India (YRG‐282). The study protocol conforms to the ethical guidelines of the 1975 Declaration of Helsinki. All participants provided oral informed consent.



The majority of HCV‐infected PWID in this study were male, with a median age of 33. Overall, 95% reported injecting drugs in the prior 6 months and 73% reported ever sharing needles. Higher proportions of heroin injection were seen in Amritsar and Imphal and higher proportions of buprenorphine injection were seen in New Delhi and Kanpur (Table 1). Of 483 HCV samples, 363 were coinfected with HIV (75.2%).

Table 1 - Participant characteristics of 483 HCV‐infected PWID across four Indian cities: Amritsar (AM), New Delhi (DH), Kanpur (KA), and Imphal (IM). Parentheses denote n unless otherwise specified.
Characteristic Total AM DH KA IM
Participants (n) 483 123 128 138 94
Number of clustered HCV sequences 139 (28.8%) 40 (32.5%) 50 (31.3%) 24 (17.4%) 35 (37.2%)
Median age (IQR) 33 (27‐41) 31 (26‐36) 28 (23‐35) 36 (30‐) 41 (37‐44)
Male sex 99.4% (480) 100% (123) 100% (128) 100% (138) 96.8% (91)
Married to or living with female partner 39.1% (190) 38.1% (48) 18.0% (23) 44.2% (61) 61.7% (58)
Men who have sex with men 12.9% (62) 22.8% (28) 11.7% (15) 13.8% (19) 0% (0)
At least secondary school education 50.9% (246) 52.9% (65) 28.9% (37) 42.0% (58) 91.5% (86)
Employed daily 66.1% (319) 56.9% (70) 85.9% (110) 70.3% (97) 44.7% (42)
Ever tested for HIV 54.9% (265) 74.8% (92) 46.9% (60) 25.4% (35) 83.0% (78)
Ever tested for HCV 25.5% (123) 43.9% (54) 1.6% (2) 0% (0) 71.3% (67)
Injected drugs in past 6 months 94.6% (457) 95.9% (118) 99.2% (127) 96.4% (133) 84.0% (79)
Median no. days injected past 6 months 180 (0‐180) 180 (0‐180) 176 (0‐180) 180 (0‐180) 85 (0‐180)
Median no. injections per day (range) 2 (1‐10) 2 (1‐8) 2 (1‐10) 2 (1‐5) 2 (1‐10)
Median years injecting drugs (IQR) 10 (5‐16) 10 (6‐14) 7 (3‐10) 11 (5‐16) 21 (11‐26)
Ever shared needles 73.3% (354) 77.3% (95) 75.0% (96) 56.5% (78) 90.4% (85)
Shared needles in past 6 months 63.8% (308) 58.5% (72) 71.1% (91) 44.9% (62) 88.3% (83)
Inject heroin 72.5% (350) 91.9% (113) 64.8% (83) 44.2% (61) 98.9% (93)
Inject buprenorphine 71.2% (344) 80.5% (99) 96.9% (124) 86.2% (119) 2.1% (2)
Inject other drugs 29.0% (140) 32.5% (40) 22.7% (29) 37.7% (52) 20.2% (19)
Travel for work in past 12 months 28.6% (138) 14.6% (18) 18.8% (24) 31.9% (44) 55.3% (52)
Travel for drugs in past 12 months 2.5% (12) 0% (0) 4.7% (6) 2.2% (3) 3.2% (3)
Mean kilometers traveled in past 12 months (range) 291 (0‐>9,995) 265 (0‐3,000) 142 (0‐2,500) 265 (0‐9,995) 738 (0‐>9,995)

Linked Transmission Clusters

Overall, 139 (28.8%) of HCV sequences had a putative epidemiologic link with another sequence and fell into 19 transmission clusters (Fig. 1) with a median size of three individuals (range: 2‐49). Nine clusters were dyads and three clusters contained >10 individuals. Overall, HCV phylogeny and clustering localized largely by city of origin with some evidence of intercity clustering. Independent predictors of HCV clustering in models adjusted for sex, city, needle sharing, and injection frequency included age (adjusted odds ratio [AOR] per 5‐year increase = 0.83, 95% CI, 0.72‐0.95) and infection with HCV subtype 3b (AOR, 3.44; 95% CI, 2.04‐5.81). Analysis of age by HCV subtype and city revealed that individuals infected with 3b also tend to be younger (Supporting Fig. S6). Overall, HCV sequences clustered by city; however, transmission between cities was seen with 3 of 19 HCV clusters that contained samples from multiple cities, appearing primarily between Delhi (North India), Amritsar (Northwestern India), and Kanpur (Central India). Imphal (Northeastern India) only clustered with another city in one instance.

FIG. 1:
Transmission cluster structure of 139 genetically linked HCV sequences. Circular nodes denote an individual sample and edges denote a putative link. Nodes are colored by participant study city.

HCV Subtypes

Subtypes have been examined in detail and published elsewhere.(29) In brief, HCV subtype 3a (39.0%), 1a (26.9%), 3b (20.7%), and 6n (4.8%) were the predominant subtypes. A supplemental phylogenetic tree depicts the evolutionary relationship between subtypes (Supporting Fig. S7).

Geospatial Analysis

The median geographical distance between genetically linked HCV samples was 6.3 km (interquartile range [IQR], 0‐10.6) by zip code of residence and 1.1 km (IQR, 0‐10.6) by zip code of injection venue. The Euclidian distance between the centroids of self‐reported zip code of residence and zip code of injection venue was found to be significantly (P < 0.05) closer in genetically linked participants than expected by chance alone (Fig. 2) as well as between nonlinked participants. Nonlinked participants had a median distance of 7.8 km (IQR, 0‐11.4) by zip code of residence and 3.4 km (IQR, 0‐10.5) by zip code of injection venue. The overall median distance between zip code of residence and zip code of injection venue was 0 km (mean = 2.0 km).

FIG. 2:
Histogram of distance between centroid of self‐reported zip code of residence of genetically linked HCV samples versus 5,000 permuted samples drawn randomly with respect to population density in New Delhi.

Time‐Resolved Phylogeny

The narrow sample collection window hindered fine calibration of the molecular clock; however, temporal analysis using root‐to‐tip regression uncovered a modest signal (R2 = 0.03), which was shown to support a temporal signal under a relaxed molecular clock.(30) The Bayesian inferred tree (Fig. 3) for HCV shows a shallow time depth, and the overall mean time to most recent common ancestor (TMRCA) for samples within a transmission cluster was 1 year (IQR, 5 months ‐ 2 years).

FIG. 3:
Maximum clade credibility (MCC) tree of HCV sequences from Indian PWID (orange) and references (green). MCC tree was selected from the posterior distribution of a 50 million state MCMC simulation under a structured birth‐death model and GTR+Γ4+I substitution model.

Phylogeographic Analysis

The Bayesian phylogeographic approach employs stochastic models that afford quantification of uncertainty in both ancestral state reconstructions and the phylogeographic process to infer spatiotemporal dispersion patterns of HCV in this population of PWID. The most probable location states agree on Imphal as the origin of the most recent common ancestor of these sequences (Supporting Fig. S8). Phylogeographic patterns indicate a historically higher net gene flow along an east to west gradient, initially spreading from Imphal to Kanpur and then to Amritsar before moving to Delhi. This gradient later reverses with more recent transmission between cities showing well‐supported eastwardly movement around 2007 onward. Circular polygons (Fig. 4) depict Markov reward counts such that the size of the polygons around a sampling location is proportional to the number of lineages that maintain that location. This captures the absolute and relative intensity of the local virus spread at any given point in time. Imphal and Amritsar have the highest amount of local virus spread, whereas Delhi and Kanpur appear as more transitory hotspots. An animated version of this figure is available online (

FIG. 4:
Geospatial diffusion of HCV among PWID across four Indian cities. Circular polygons depict Markov reward counts such that the size of the polygons around a sampling location is proportional to the number of lineages that maintain that location. Colored lines depict movement into a particular city. Direction is indicated with arrowheads, with eastward movements further depicted by lines with an upward curvature and westward movements depicted by lines with a downward curvature. Only lines with well‐supported rates (Bayes Factor > 3) that dominate the diffusion process by Bayesian stochastic search variable selection are displayed.

Comparison With HIV and Assessment of Shared Transmission Pathways

Phylogenetic cluster inference using HIV sequences revealed that 109 (22.2%) of samples had a putative link with at least one other sample and fell into 41 transmission clusters (Fig. 5). Twenty‐seven of these clusters were dyads, and none of the samples displayed intercity clustering. None of the Imphal samples clustered with any other samples. HIV sequences were available for 233 of 363 of the HIV/HCV coinfected samples. Overall, 18 coinfected samples had HCV and HIV sequences that clustered, which was indicative that both infections happened recently; however, only two pairs of coinfected samples clustered together in both HCV and HIV, which was indicative of a potentially shared transmission pathway. In the cases in which samples clustered together, individuals were all younger than 33 years.

FIG. 5:
Transmission cluster structure of 109 genetically linked HIV sequences. Circular nodes denote an individual sample, and edges denote a putative link. Nodes are colored by participant study city.


We studied 483 HCV sequences and 492 HIV sequences isolated from PWID across four Indian cities to better understand viral transmission dynamics to inform public health interventions. HCV transmission cluster analysis revealed that about a third of the HCV sequences had a putative epidemiologic link with another sequence, and intercity clustering was seen. The number of clusters and cluster size suggest that there is ongoing rapid transmission of HCV among PWID, which is further supported by the topology and short tip branch lengths of the HCV time‐resolved tree.(31) Epidemiologic analysis failed to uncover many individual‐level risk factors for HCV clustering (i.e., recent transmission) beyond age and HCV subtype 3b; however, there was a clear and statistically significant association with geographic distance. Clustered individuals generally lived and injected close to one another, with the median distance between injection venue being shortest. Phylogeographic analyses showed that viral spread appears to follow established drug trafficking routes,(15,32,33) with Delhi in Northern India and Kanpur in Central India appearing as important intermediaries through which drugs and virus pass. Finally, in comparison with HIV, there is more recent and ongoing transmission of HCV that is not fully consistent with shared pathways in HIV, potentially suggesting high rates of HCV reinfection or superinfection with dominance of the secondary infection.(34)

These results have implications for public health programming. Transmission cluster analysis answers questions relevant to HCV elimination programs. Cluster size and number reflect infection/reinfection, and intercity clusters suggest that there is viral spread between cities, supporting the idea that a national (opposed to network‐based or “microelimination”) HCV program may be required to fully combat the epidemic and prevent reintroduction of disease from neighboring states. Intercity transmission was seen largely between Delhi, Amritsar, and Kanpur, suggesting that these may be possible hotspots for propagating intercity transmission. Indeed, this is consistent with subsequent phylogeographic analyses that pinpointed Delhi and Kanpur as waypoints where virus passes through; Delhi remains a major city for employment, attracting migrant laborers from the neighboring states of Punjab, Uttar Pradesh, and Haryana, which may explain this observation. Targeting such locations could be critical to interrupting transmission across states.

Comparing results from phylogenetic inference and cluster analysis with those of HIV further suggest that not only do PWID appear to experience regular HCV reinfection or superinfection but also that injection networks are evolving, i.e., individuals do not exclusively inject with the same partners. Comparisons with HIV show that the recent and ongoing transmission of HCV is not fully consistent with shared HIV transmission pathways, and no intercity clustering was seen with HIV as it was with HCV. In instances in which there was evidence of shared transmission pathways or time of infection, individuals were younger, more likely representing first infection. The time depth, measured as mean TMRCA from tips to first internal node, in a time‐resolved phylogenetic tree provides a rough approximation to the time of transmission and was an average of 1 year within HCV transmission clusters. Although the exact calibration of time is a limitation of these analyses, these findings are consistent with a high reinfection rate of HCV and potentially superinfection and dominance of the secondary infection. It is worth noting that reinfection, or rather, superinfection with HIV along shared transmission pathways is possible; however, because of the biology of HIV and the Sanger sequencing methodology used in this study, superinfection is not as evident in these phylogenetic comparisons. Nonetheless, although an initial unobserved common‐source infection of HIV and HCV is possible, these data show that PWID are most likely clearing and becoming reinfected with HCV from another source.

Regression analysis failed to uncover major individual‐level predictors for HCV clustering but did identify a strong association with geographic distance, highlighting the value of venue‐based interventions. Public health programs in India may be best suited to focus PWID interventions spatially on injection venues rather than trying to target specific individual risk factors or profile of drug use (e.g., injection of prescription opioids versus heroin).

Finally, the phylogeographic analyses in this study further inform geographic priorities for surveillance and interventions. Around 30% of participants in this study reported traveling in the prior 12 months, suggesting that the viral geographic diffusion observed has likely been influenced by travel of HCV‐infected PWID. Analyses show that the HCV epidemic among this population of PWID was likely first imported into Imphal, with diffusion consistent with historic drug trafficking patterns, as opioids were imported from the Golden Triangle of Myanmar, Laos, and Thailand.(15) Consequently, an initial east to west gradient of viral gene flow is seen. Although historical spread appears to have come from Imphal, gene flow later shifts west to east and Amritsar appears to become a larger contributor to viral spread, with Delhi and Kanpur serving as intermediaries where drugs and virus pass through. This is further evidenced by Markov reward counts, which showed less intense local virus spread in Delhi and Kanpur compared with Amritsar and Imphal. Opium production in Afghanistan increased to historic highs in 2007(33) and replaced the Golden Triangle as the world’s largest producer of opium.(15,35) This increase in production and trafficking from Afghanistan(32) and the Golden Crescent coincides with the increased eastwardly gene flow seen out of Amritsar, which borders Pakistan. These data demonstrate the role of robust surveillance measures incorporating sequencing to plan effective public health programs for HCV.

Although this study offers a characterization of HCV phylodynamics among PWID in India and draws comparisons to HIV dynamics, results should be considered in the context of some notable limitations. The cross‐sectional study design represents a clear limitation in determining the impact of reinfection and/or superinfection. Participants in this study were recruited through RDS, and therefore, recruitment ties do not represent a true injection network that could be compared against phylogenetic clusters. Our ability to accurately date time‐measured phylogenetic trees is limited by the short sampling window (2016‐2017), which prevented fine calibration of the molecular clock; however, the directionality of the temporal diffusion patterns of HCV remains valid. Additionally, only four sites were studied and not all positive samples could be sequenced; however, the analytical methodology and modeling attempt to address these limitations. As with most studies, the entire population of PWID was not sampled; therefore, it is possible that there could be additional PWID who could have clustered or been intermediaries in transmission chains but were not sampled. Finally, the genetic sequence data in this study were obtained by Sanger sequencing and did not capture the whole viral genomes or quasispecies diversity.


This study characterizes HCV phylodynamics among PWID in an LMIC, leveraging epidemiologic, geospatial, and viral genetic data as well as drawing comparisons to HIV dynamics in the same cities. Phylogenetic cluster analysis suggests that a nationally concerted effort is needed to combat HCV across Indian states, and phylogeographic analyses suggest that surveillance in Imphal and Amritsar could help prevent reimportation of virus by drug trafficking routes. Interventions should be countrywide and be focused on spaces, i.e., injection venues. These findings also shed light on the potential importance of computational genetic methods and phylogeographic analysis in the planning of HCV prevention and treatment programs.

Author Contributions

S.S.S., S.J.C., S.S., G.M.L., T.C.Q., and S.H.M. conceptualized the study and obtained funding. S.S.S., A.K.S., P.B., and C.K.V. led protocol implementation. M.A.R., S.C.R., G.A.C., M.S.K, and S.S. provided laboratory expertise. S.J.C. led the statistical analysis with input from S.C.R., P.D., S.S.S., and S.H.M. S.J.C., S.S.S., and S.H.M. drafted the manuscript, and S.J.C. prepared figures. S.J.C., S.S.S., and S.H.M. had full access to all the data in the study and had final responsibility for the decision to submit for publication. All authors revised the manuscript and approved the final version.

Data Availability Statement

Data from this study, including deidentified participant data, data dictionary, study protocol, statistical analysis plan, and informed consent documents will be made available to researchers. To access data, researchers should contact the corresponding author. Researchers will need to complete a concept sheet for their proposed analyses, to be reviewed and approved by the study investigators. The study investigators will consider overlap of the proposed project with active or planned analyses and the appropriateness of study data for the proposed analysis. All sequences in this study have been deposited in GenBank. The GenBank accession numbers for HCV 5’UTR‐core sequences are MN697739MN698236, and the HIV env IDR sequences are MN378645MN379312


1. Aceijas C, Rhodes T. Global estimates of prevalence of HCV infection among injecting drug users. Int J Drug Policy 2007;18:352‐358.
2. Solomon SS, Mehta SH, Srikrishnan AK, Solomon S, McFall AM, Laeyendecker O, et al. Burden of hepatitis C virus disease and access to hepatitis C virus services in people who inject drugs in India: a cross‐sectional study. Lancet Infect Dis 2015;15:36‐45.
3. Joint United Nations Programme on HIV/AIDS . Do No Harm—Health, Human Rights, and People Who Use Drugs. Geneva, Switzerland: UNAIDS; 2016.
4. Sulkowski MS, Gardiner DF, Rodriguez‐Torres M, Reddy KR, Hassanein T, Jacobson I, et al. Daclatasvir plus sofosbuvir for previously treated or untreated chronic HCV infection. N Engl J Med 2014;370:211‐221.
5. Afdhal N, Zeuzem S, Kwo P, Chojkier M, Gitlin N, Puoti M, et al. Ledipasvir and sofosbuvir for untreated HCV genotype 1 infection. N Engl J Med 2014;370:1889‐1898.
6. Naggie S, Cooper C, Saag M, Workowski K, Ruane P, Towner WJ, et al. Ledipasvir and sofosbuvir for HCV in patients coinfected with HIV‐1. N Engl J Med 2015;373:705‐713.
7. Ferenci P, Bernstein D, Lalezari J, Cohen D, Luo Y, Cooper C, et al. ABT‐450/r‐ombitasvir and dasabuvir with or without ribavirin for HCV. N Engl J Med 2014;370:1983‐1992.
8. Wyles DL, Ruane PJ, Sulkowski MS, Dieterich D, Luetkemeyer A, Morgan TR, et al. Daclatasvir plus sofosbuvir for HCV in patients coinfected with HIV‐1. N Engl J Med 2015;373:714‐725.
9. Hagan LM, Wolpe PR, Schinazi RF. Treatment as prevention and cure towards global eradication of hepatitis C virus. Trends Microbiol 2013;21:625‐633.
10. Alter HJ, Liang TJ. Hepatitis C: the end of the beginning and possibly the beginning of the end. Ann Intern Med 2012;156:317‐318.
11. Edlin BR, Winkelstein ER. Can hepatitis C be eradicated in the United States? Antiviral Res 2014;110:79‐93.
12. Zelenev A, Li J, Mazhnaya A, Basu S, Altice FL. Hepatitis C virus treatment as prevention in an extended network of people who inject drugs in the USA: a modelling study. Lancet Infect Dis 2018;18:215‐224.
13. Hajarizadeh B, Grebely J, Martinello M, Matthews GV, Lloyd AR, Dore GJ. Hepatitis C treatment as prevention: evidence, feasibility, and challenges. Lancet Gastroenterol Hepatol 2016;1:317‐327.
14. Leask JD, Dillon JF. Review article: treatment as prevention ‐ targeting people who inject drugs as a pathway towards hepatitis C eradication. Aliment Pharmacol Ther 2016;44:145‐156.
15. United Nations Office on Drugs and Crime . World Drug Report 2019, Sales No. E.19.XI.8. Published June 2019.
16. Ambekar A, Agrawal A, Rao R, Mishra AK, Khandelwal SK, Chadda RK, on behalf of the group of investigators for the National Survey on Extent and Pattern of Substance Use in India . Magnitude of Substance Use in India. New Delhi: Ministry of Social Justice and Empowerment, Government of India; 2019.
17. Solomon SS, McFall AM, Lucas GM, Srikrishnan AK, Kumar MS, Anand S, et al. Respondent‐driven sampling for identification of HIV‐ and HCV‐infected people who inject drugs and men who have sex with men in India: a cross‐sectional, community‐based analysis. PLoS Medicine 2017;14:e1002460.
18. Solomon SS, Lucas GM, Celentano DD, McFall AM, Ogburn E, Moulton LH, et al. Design of the Indian NCA study (Indian National Collaboration on AIDS): a cluster randomized trial to evaluate the effectiveness of integrated care centers to improve HIV outcomes among men who have sex with men and persons who inject drugs in India. BMC Health Serv Res 2016;16:652.
19. Solomon SS, Mehta SH, Srikrishnan AK, Vasudevan CK, Mcfall AM, Balakrishnan P, et al. High HIV prevalence and incidence among MSM across 12 cities in India. AIDS 2015;29:723‐731.
20. Rodgers MA, Vallari AS, Harris B, Yamaguchi J, Holzmayer V, Forberg K, et al. Identification of rare HIV‐1 Group N, HBV AE, and HTLV‐3 strains in rural South Cameroon. Virology 2017;504:141‐151.
21. Rodgers MA, Holzmayer V, Vallari A, Olivo A, Forberg K, Fuhrman J, et al. Hepatitis C virus surveillance and identification of human pegivirus 2 in a large Cameroonian cohort. J Viral Hepat 2019;26:30‐37.
22. Xia X, Xie Z, Salemi M, Chen L, Wang Y. An index of substitution saturation and its application. Mol Phylogenet Evol 2003;26:1‐7.
23. Rambaut A, Lam TT, Max Carvalho L, Pybus OG. Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path‐O‐Gen). Virus Evol 2016;2:vew007.
24. Bouckaert R, Vaughan TG, Barido‐Sottani J, Duchêne S, Fourment M, Gavryushkina A, et al. BEAST 2.5: an advanced software platform for Bayesian evolutionary analysis. PLOS Computational Biol 2019;15:e1006650.
25. Kühnert D, Stadler T, Vaughan TG, Drummond AJ. Phylodynamics with migration: a computational framework to quantify population structure from genomic data. Mol Biol Evol 2016;33:2102‐2116.
26. Ragonnet‐Cronin M, Hodcroft E, Hué S, Fearnhill E, Delpech V, Brown AJL, et al. Automated analysis of phylogenetic clusters. BMC Bioinformatics 2013;14:317.
27. Rose R, Lamers SL, Dollar JJ, Grabowski MK, Hodcroft EB, Ragonnet‐Cronin M, et al. Identifying transmission clusters with cluster picker and HIV‐TRACE. AIDS Res Hum Retroviruses 2017;33:211‐218.
28. Han AX, Parker E, Maurer‐Stroh S, Russell CA. Inferring putative transmission clusters with Phydelity. Virus Evol 2019;5:vez039.
29. Rodgers MA, Gomathi S, Vallari A, Saravanan S, Lucas GM, Mehta S, et al. Diverse HCV strains and HIV URFS identified amongst people who inject drugs in India. Sci Rep 2020;10:7214.
30. Duchene S, Lemey P, Stadler T, Ho SYW, Duchene DA, Dhanasekaran V, et al. Bayesian evaluation of temporal signal in measurably evolving populations. Mol Biol Evol 2020;37:3363‐3379.
31. Poon AFY, Walker LW, Murray H, McCloskey RM, Harrigan PR, Liang RH. Mapping the shapes of phylogenetic trees from human and zoonotic RNA viruses. PLoS One 2013;8:e78122.
32. United Nations Office on Drugs and Crime . The Global Afghan Opium Trade: A Threat Assessment, Sales No. E.11.XI.11. 2011.
33. Farooq SA, Rasooly MH, Abidi SH, Modjarrad K, Ali S. Opium trade and the spread of HIV in the Golden Crescent. Harm Reduction J 2017;14:47.
34. Osburn WO, Fisher BE, Dowd KA, Urban G, Liu L, Ray SC, et al. Spontaneous control of primary hepatitis C virus infection and immunity against persistent reinfection. Gastroenterology 2010;138:315‐324.
35. Beyrer C. The golden crescent and HIV/AIDS in Central Asia: deadly interactions. Glob Public Health 2011;6:570‐576.

Author names in bold designate shared co‐first authorship.