

Prediction of HIV Transmission Cluster Growth With Statewide Surveillance Data

Billock, Rachael M. MSPH*; Powers, Kimberly A. MSPH, PhD*; Pasquale, Dana K. MPH, PhD*; Samoff, Erika MPH, PhD; Mobley, Victoria L. MPH, MD; Miller, William C. MD, PhD; Eron, Joseph J. MD§; Dennis, Ann M. MD§

JAIDS Journal of Acquired Immune Deficiency Syndromes: February 1, 2019 - Volume 80 - Issue 2 - p 152-159
doi: 10.1097/QAI.0000000000001905



Introduction

Despite major advances in HIV treatment and prevention, flat or rising HIV incidence remains a major public health problem in numerous settings.1 Globally, only very gradual declines in incidence have been observed in most regions since 2010, whereas large increases have occurred in Eastern Europe and Central Asia.1 In the United States, young, black or Hispanic men who have sex with men (MSM) have shown static or increasing incidence in recent years.2 Novel approaches to reducing HIV transmission across a range of highly affected populations have been proposed, including investigation of HIV transmission clusters.3,4

HIV transmission clusters are groups of persons living with HIV (PLWH) with closely related viral sequences, signifying a network of local HIV transmission.4–8 Identification of and response to HIV transmission clusters that are likely to grow holds promise in guiding interventions to interrupt transmission and improve HIV care engagement. However, accurate forecasting of future cluster growth is needed to implement this strategy. Although multiple factors—including large baseline size,9,10 presence of acute HIV infections,10 and geographic diversity9,11—have been individually associated with future cluster growth, no tool synthesizing predictor information for use by public health agencies has been developed.

We developed and validated a predictive model to forecast cluster growth using cluster-level demographic, clinical, temporal, and contact tracing characteristics contained in routinely collected HIV surveillance data in North Carolina (NC), a setting with a high relative HIV burden.2,12 By identifying clusters that are likely to grow, the model is intended to guide cluster selection for enhanced interventions to diagnose additional cluster members, bridge viremic cluster members to expedited treatment, and link HIV-negative contacts with pre-exposure prophylaxis.


Methods

Study Design

We conducted a combined analysis of HIV sequence and surveillance data from diagnosed PLWH in NC who had at least one sequence available. We obtained partial pol (protease and reverse transcriptase) HIV-1 sequences generated from routine genotypic resistance testing at Laboratory Corporation of America, the largest reference laboratory in NC, from November 2010 to December 2017. Sequence data were linked to the NC Division of Public Health's (DPH) Electronic Disease Surveillance System (NC EDSS), which includes demographic, clinical, and contact-tracing data for all persons diagnosed with HIV and reported in NC. We evaluated predictors of cluster growth using two 18-month observation periods. First, we developed a model to predict cluster growth from January 2015 to June 2016. Next, we temporally externally validated the model in the subsequent 18 months, July 2016–December 2017, using data that became available after the initial model development phase. The Biomedical Institutional Review Board at the University of North Carolina at Chapel Hill approved this study.

Transmission Cluster Identification and Growth Assessment

Putative clusters were identified using the first available sequence (full-length protease and partial reverse transcriptase) for each individual. Sequences were aligned with Clustal Omega and manually edited to strip gapped positions.13 The final sequence length was 1497 bases. Clusters were generated in HIV-TRACE14 by linking pairs of sequences with a pairwise genetic distance of ≤0.015 substitutions per site (1.5%) under the Tamura-Nei 93 (TN-93) substitution model15 (see Figure 1, Supplemental Digital Content, for an explanation of this cut-off). We defined transmission clusters as groups of ≥2 members in which each member was linked to at least one other member at a distance of ≤1.5%.
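The linkage rule above can be illustrated with a short Python sketch. It is not HIV-TRACE: the names `tn93_distance` and `cluster_sequences` are hypothetical, the sketch simply skips gaps and ambiguity codes (HIV-TRACE offers its own ambiguity-handling options), and it forms clusters as connected components of the graph whose edges join sequence pairs at a TN-93 distance of ≤0.015.

```python
import math
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def tn93_distance(seq1: str, seq2: str) -> float:
    """Tamura-Nei 93 distance between two aligned nucleotide sequences.

    Assumption for illustration: sites with gaps or ambiguity codes in either
    sequence are skipped; returns math.inf when the distance saturates.
    """
    sites = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    n = len(sites)
    if n == 0:
        raise ValueError("no comparable sites")
    # Base frequencies pooled over both sequences at the compared sites
    counts = dict.fromkeys("ACGT", 0)
    for a, b in sites:
        counts[a] += 1
        counts[b] += 1
    pi = {base: c / (2 * n) for base, c in counts.items()}
    pi_r, pi_y = pi["A"] + pi["G"], pi["C"] + pi["T"]
    pi_ag, pi_ct = pi["A"] * pi["G"], pi["C"] * pi["T"]
    if pi_ag == 0 or pi_ct == 0:
        raise ValueError("TN-93 needs all four bases to be present")
    # p1: A<->G transitions, p2: C<->T transitions, q: transversions
    p1 = sum(1 for a, b in sites if {a, b} == PURINES) / n
    p2 = sum(1 for a, b in sites if {a, b} == PYRIMIDINES) / n
    q = sum(1 for a, b in sites if (a in PURINES) != (b in PURINES)) / n
    arg1 = 1 - pi_r * p1 / (2 * pi_ag) - q / (2 * pi_r)
    arg2 = 1 - pi_y * p2 / (2 * pi_ct) - q / (2 * pi_y)
    arg3 = 1 - q / (2 * pi_r * pi_y)
    if min(arg1, arg2, arg3) <= 0:
        return math.inf  # too divergent for the correction to be defined
    return (-(2 * pi_ag / pi_r) * math.log(arg1)
            - (2 * pi_ct / pi_y) * math.log(arg2)
            - 2 * (pi_r * pi_y - pi_ag * pi_y / pi_r - pi_ct * pi_r / pi_y)
            * math.log(arg3))


def cluster_sequences(seqs: dict[str, str], threshold: float = 0.015) -> list[set[str]]:
    """Connected components of the graph linking pairs with distance <= threshold."""
    parent = {sid: sid for sid in seqs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if tn93_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)
    groups: dict[str, set[str]] = {}
    for sid in seqs:
        groups.setdefault(find(sid), set()).add(sid)
    # A transmission cluster needs at least 2 linked members
    return [g for g in groups.values() if len(g) >= 2]
```

Because linkage is transitive in this construction, two sequences can share a cluster even when their own distance exceeds 1.5%, as long as a chain of ≤1.5% links connects them.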

For each 18-month observation period (January 2015–June 2016 and July 2016–December 2017), we first identified clusters that were both established and recently active at the start of that observation period (baseline). “Established” was defined as having ≥2 sequences sampled before baseline and “recently active” was defined as having any sequences sampled in the 2 years before baseline. Receipt of new sequences in the 2 years prior was chosen as a signifier of recent transmission potential because viremia is a necessary condition for viral sequencing. Cluster members were categorized as (1) baseline: person with an HIV diagnosis and sequence before baseline; (2) new: person newly diagnosed with HIV and sequenced during the observation period; (3) hidden: person diagnosed before baseline, but whose first available sequence was sampled during the observation period. Clusters and relevant members were identified independently for each observation period and could be included as “established” in the internal, temporal external, or both validation populations.

Among clusters that were established and recently active at baseline, growth was defined as the diagnosis of at least one new member in the relevant 18-month observation period (Fig. 1A). Nongrowing clusters were composed only of sequences sampled from persons diagnosed before baseline (baseline or hidden persons) (Fig. 1B). Although hidden members would have been identified as baseline members had a sequence been available before baseline, they were not treated as baseline cluster members because they would not be observable in the cluster for real-time public health application of the predictive model. Hidden cluster members were also not counted toward cluster growth because they were diagnosed before baseline.
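The membership categories and the growth definition can be expressed as a small decision rule. The sketch below is a hypothetical Python illustration (the `Member` record and the function names are assumptions) that uses only each person's diagnosis date and first sequence date, and it assumes cluster membership has already been determined.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Member:
    diagnosis_date: date
    first_sequence_date: date


def categorize(m: Member, baseline: date, end: date) -> Optional[str]:
    """Assign a member to 'baseline', 'new', or 'hidden' for one observation period."""
    if m.first_sequence_date < baseline:
        return "baseline"          # diagnosed and sequenced before baseline
    if m.first_sequence_date > end:
        return None                # not yet observable in this period
    if m.diagnosis_date >= baseline:
        return "new"               # newly diagnosed and sequenced in the period
    return "hidden"                # diagnosed before baseline, first sequenced after


def classify_cluster(members: list[Member], baseline: date, end: date) -> Optional[str]:
    """Return 'growing'/'nongrowing' for established, recently active clusters, else None."""
    categories = [categorize(m, baseline, end) for m in members]
    # Assumes baseline is not Feb 29 (the study baselines were Jan 1 and Jul 1)
    window_start = baseline.replace(year=baseline.year - 2)
    established = categories.count("baseline") >= 2
    recently_active = any(window_start <= m.first_sequence_date < baseline for m in members)
    if not (established and recently_active):
        return None
    # Hidden members do not count toward growth; only new diagnoses do
    return "growing" if "new" in categories else "nongrowing"
```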

Figure 1. A, Network diagram for an example “growing cluster.” Black circles indicate sequences sampled before baseline (baseline cluster members), white circles indicate sequences sampled after baseline from new diagnoses (new cluster members), and gray circles indicate sequences sampled after baseline from previous diagnoses (hidden cluster members). Lines indicate a TN-93 genetic distance of ≤1.5% between individuals. B, Network diagram for an example “nongrowing cluster.”

Predictive Model Development

We completed model development and validation in 5 main steps (see Figure 2, Supplemental Digital Content). First, we identified 18 cluster-level variables as candidate predictors of cluster growth over time (step 1), choosing variables that could be reliably assessed using NC EDSS data and that had known or plausible associations with HIV transmission or case detection (see Table 1, Supplemental Digital Content). One of these 18 variables was cluster size at baseline; the remaining 17 were calculated from baseline members' characteristics and categorized into 1 of 4 domains: temporal, demographic, clinical, or contact tracing. Temporal variables were based on diagnosis and sequence dates relative to baseline. Demographic variables were based on surveillance data recorded at the time of diagnosis or first recognition as a case in NC. Clinical variables were largely based on laboratory data (CD4 counts and viral load measurements), which are required by NC law to be reported to the DPH for surveillance purposes. As is common in surveillance-based analyses,16 we treated these laboratory reports as proxies for HIV care visits. Infection stage at diagnosis was classified as acute or recent versus chronic, as previously described.8 Contact-tracing variables were based on documented encounters with disease intervention specialists (DIS), DPH employees who attempt to contact all persons newly diagnosed with HIV within approximately 1 week of diagnosis to link them to HIV medical care and to provide partner counseling and referral services.

Of the 18 candidate predictors specified a priori, 2 were excluded from further consideration because of substantial missingness (>64% of cluster members with missing data): the percentage of cluster members who reported meeting sex partners online and the percentage who reported a previous or prevalent sexually transmitted infection (STI). Of the remaining 16 predictors, 1 was cluster size at baseline, and the other 15 (grouped by domain) were as follows: (1) temporal: years since the most recent diagnosis, years since the earliest diagnosis, and years since the earliest sequence; (2) demographic: median age at baseline, percentage of persons who inject drugs (PWID), percentage MSM, percentage male, percentage non-Hispanic black, and percentage residing in a single NC region; (3) clinical: percentage with HIV viral loads ≥1000 copies per milliliter at the most recent care visit or no viral load in the year before baseline (assumed out of care and detectable), percentage diagnosed during acute or recent infection in the 2 years before baseline, median time to HIV care entry after diagnosis, and percentage in HIV care during the year before baseline; and (4) contact tracing: percentage interviewed by DIS at diagnosis and percentage with no named, identifiable contacts.
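As an illustration of how cluster-level predictors are summarized from baseline members, the sketch below computes a few of the variables listed above. The record fields, the function names, and the 365.25-day year are assumptions for illustration, and the viremia rule is simplified to the most recent viral load before baseline.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median
from typing import Optional


@dataclass
class BaselineMember:
    birth_date: date
    diagnosis_date: date
    is_msm: bool
    last_vl_date: Optional[date] = None      # most recent viral load before baseline
    last_vl_copies: Optional[float] = None   # copies per milliliter


def years_between(start: date, stop: date) -> float:
    return (stop - start).days / 365.25


def cluster_predictors(members: list[BaselineMember], baseline: date) -> dict[str, float]:
    """Illustrative subset of cluster-level predictors computed from baseline members."""
    # Assumes baseline is not Feb 29
    one_year_before = baseline.replace(year=baseline.year - 1)

    def viremic_or_no_vl(m: BaselineMember) -> bool:
        # No viral load in the year before baseline: assumed out of care and detectable
        if m.last_vl_date is None or m.last_vl_date < one_year_before:
            return True
        return m.last_vl_copies is not None and m.last_vl_copies >= 1000

    n = len(members)
    return {
        "size": n,
        "median_age": median(years_between(m.birth_date, baseline) for m in members),
        "pct_msm": 100 * sum(m.is_msm for m in members) / n,
        "pct_viremic_or_no_vl": 100 * sum(viremic_or_no_vl(m) for m in members) / n,
        "years_since_latest_dx": years_between(max(m.diagnosis_date for m in members), baseline),
    }
```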

We assessed the crude association between each of these 16 candidate predictors and cluster growth to select variables for retention in a full multivariable model (step 2). To preserve the original sample for internal model validation (step 4 below), we conducted these analyses in a sample drawn from the original set of established, recently active clusters: to create a sample of the same size as the original set of N observed clusters, we made N individual draws (with replacement) of clusters from the original sample. We then used logistic regression to estimate the unadjusted association between each predictor and binary cluster growth, testing continuous, binary, and categorical coding to identify the strongest predictive form for each variable. Predictors with Wald χ² P values >0.25 in the logistic regression model, along with collinear variables (Pearson correlation coefficient ≥0.7), were excluded from further consideration. Of the original 16 candidate predictors, all but percentage male (which we removed because of collinearity with percentage MSM) survived this process and were eligible for inclusion in a full multivariable model in their strongest predictive forms.
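Step 2 can be sketched in pure Python: a univariable logistic regression fitted by Newton-Raphson, a 1-df Wald χ² P value, the >0.25 screening rule, and a pairwise Pearson check at |r| ≥0.7. The study analyses were run in SAS; this is a hypothetical stand-in whose function names are assumptions. It only flags collinear pairs, leaving the choice of which variable to drop (percentage male, in the text) to the analyst, and using |r| rather than r is also an assumption.

```python
import math
import random


def _sigmoid(eta: float) -> float:
    # Numerically stable logistic function
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def univariable_logit(x, y, max_iter=50, tol=1e-10):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson; return (b1, se_b1)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        # Score vector (g) and information matrix (h) at the current estimates
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = _sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        if abs(step0) < tol and abs(step1) < tol:
            break
        b0 += step0
        b1 += step1
    # Var(b1) is the (1, 1) element of the inverse information matrix
    return b1, math.sqrt(h00 / det)


def wald_p(beta: float, se: float) -> float:
    # P(chi-square with 1 df > (beta/se)^2) = erfc(|z| / sqrt(2))
    return math.erfc(abs(beta / se) / math.sqrt(2))


def pearson_r(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def screen(candidates, y, p_max=0.25, r_max=0.7):
    """Keep predictors with Wald P <= p_max; list all pairs with |r| >= r_max."""
    kept = [name for name, x in candidates.items()
            if wald_p(*univariable_logit(x, y)) <= p_max]
    names = list(candidates)
    collinear = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
                 if abs(pearson_r(candidates[a], candidates[b])) >= r_max]
    return kept, collinear


def bootstrap_sample(items, rng):
    # N draws with replacement from the N original clusters
    return rng.choices(items, k=len(items))
```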

To determine whether reduced models could predict cluster growth with similar accuracy but greater parsimony than the full model, we applied a bootstrap method17–19 for model selection (step 3) because of the small available sample of established clusters. This method uses repeated random samples of the same underlying population during model development and makes the volatility of predictor selection visible and documented.17–19 We first drew 100 bootstrapped samples of size N, including the sample drawn in step 2 for preliminary predictor assessments, forming each sample through N draws (with replacement) of individual clusters from the original set of N established, recently active clusters. We then constructed reduced logistic regression models in each bootstrapped sample through backward elimination, sequentially removing the predictor with the largest Wald χ² P value and retaining predictors if a reduced model showed a Wald χ² P value ≤0.10 compared with the previous model. For each predictor, we assessed the frequency of retention in the final model after complete backward elimination across the bootstrapped samples (see Table 2, Supplemental Digital Content). Retained predictors did not change substantially with additional (200 or 500) bootstrapped samples. Five candidate reduced models were then generated by including predictors retained in ≥20%, ≥30%, ≥40%, ≥50%, and ≥70% of the bootstrapped final models (see Table 3, Supplemental Digital Content).
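The retention-frequency logic of step 3 can be sketched as follows. The backward-elimination routine is left as a hypothetical `select` argument, the function names and the seed are arbitrary assumptions, and, unlike the procedure described above, the sketch draws all bootstrap samples afresh rather than reusing the step 2 sample.

```python
import random
from collections import Counter


def retention_frequencies(clusters, select, n_boot=100, seed=2015):
    """Share of bootstrap samples in which each predictor survives selection.

    `select` is a hypothetical user-supplied function that runs backward
    elimination on one bootstrap sample and returns the retained predictor names.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_boot):
        sample = rng.choices(clusters, k=len(clusters))  # N draws with replacement
        counts.update(set(select(sample)))
    return {name: counts[name] / n_boot for name in counts}


def candidate_models(freqs, thresholds=(0.2, 0.3, 0.4, 0.5, 0.7)):
    """One candidate reduced model per retention threshold."""
    return {t: sorted(name for name, f in freqs.items() if f >= t) for t in thresholds}
```

Higher thresholds yield nested, smaller models, so the 5 candidates range from most inclusive (≥20%) to most parsimonious (≥70%).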

The 5 candidate reduced models and the full model were then applied to the original sample for final model selection (step 4).17–19 Models were considered to fit adequately if the Hosmer–Lemeshow test yielded a χ² P value ≥0.10.20 The area under the receiver operating characteristic curve (ROC-AUC) was calculated for each reduced model in the original sample, and decreases of ≤0.01 in ROC-AUC relative to the full model were deemed acceptable losses of predictive power.20 We also calculated the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) to evaluate comparative model fit. A final model was selected considering ROC-AUC, AIC, BIC, and model complexity. Optimism and the optimism-corrected ROC-AUC were calculated for the final model to account for potential overestimation of predictive power from fitting and internally validating the model on the same source data set, despite bootstrapping.21 This model was then applied to the temporal external validation population of clusters to test reproducibility over time through ROC-AUC (step 5).
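The step 4 metrics can be sketched in a few functions: ROC-AUC computed as a Mann-Whitney concordance probability, AIC and BIC from a model's log-likelihood, bootstrap optimism correction, and a Hosmer–Lemeshow statistic whose P value comes from the chi-square distribution with (number of groups minus 2) degrees of freedom. These are generic textbook forms with hypothetical function names, not the SAS procedures used in the study. Hosmer–Lemeshow grouping rules (for example, handling of tied probabilities) differ across software, so results may not match SAS exactly, and the closed-form tail probability used here applies only to an even number of degrees of freedom.

```python
import math


def roc_auc(scores, outcomes):
    """ROC-AUC as the probability that a growing cluster outranks a nongrowing one."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def aic(loglik: float, k: int) -> float:
    return 2 * k - 2 * loglik


def bic(loglik: float, k: int, n: int) -> float:
    return k * math.log(n) - 2 * loglik


def optimism_corrected(apparent, boot_apparent, boot_on_original):
    """Apparent AUC minus mean optimism (each bootstrap model's AUC on its own
    sample minus the same model's AUC on the original sample)."""
    diffs = [b - o for b, o in zip(boot_apparent, boot_on_original)]
    return apparent - sum(diffs) / len(diffs)


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper tail of a chi-square distribution with even df (closed form)."""
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


def hosmer_lemeshow(probs, outcomes, groups=10):
    """Hosmer-Lemeshow statistic over groups of sorted predicted risk; df = groups - 2."""
    ranked = sorted(zip(probs, outcomes))
    n = len(ranked)
    stat = 0.0
    for g in range(groups):
        chunk = ranked[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        mean_p = expected / len(chunk)
        stat += (observed - expected) ** 2 / (len(chunk) * mean_p * (1 - mean_p))
    return stat, chi2_sf_even_df(stat, groups - 2)
```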

Predictive Model Application

Using the final model, we estimated predicted probabilities of cluster growth for each established, recently active cluster, and we calculated the sensitivities and specificities of predicted probability cut-offs (≥0.1 to ≥0.9 in increments of 0.1) in identifying growing clusters for potential intervention. We also calculated, for each cut-off, the proportion of clusters that would require additional public health investigation and the proportion of new members who belonged to a cluster that would be selected for investigation. We then repeated these calculations for the temporal external validation sample.
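The cut-off evaluation described above can be sketched as follows. The function name and inputs are hypothetical; growth is taken as having at least one new member, matching the definition used here.

```python
def evaluate_cutoffs(probs, new_members, cutoffs=None):
    """Sensitivity, specificity, and coverage for each predicted-probability cut-off.

    probs: predicted probability of growth per cluster.
    new_members: observed number of new members per cluster (growth = at least 1).
    """
    if cutoffs is None:
        cutoffs = [round(0.1 * k, 1) for k in range(1, 10)]  # 0.1, 0.2, ..., 0.9
    grew = [n > 0 for n in new_members]
    n_pos = sum(grew)
    n_neg = len(grew) - n_pos
    total_new = sum(new_members)
    rows = []
    for c in cutoffs:
        flagged = [p >= c for p in probs]
        tp = sum(f and g for f, g in zip(flagged, grew))
        tn = sum(not f and not g for f, g in zip(flagged, grew))
        covered = sum(n for f, n in zip(flagged, new_members) if f)
        rows.append({
            "cutoff": c,
            "sensitivity": tp / n_pos,
            "specificity": tn / n_neg,
            "pct_clusters_investigated": 100 * sum(flagged) / len(probs),
            "pct_new_members_covered": 100 * covered / total_new,
        })
    return rows
```

For example, with made-up predicted probabilities [0.95, 0.6, 0.3, 0.15, 0.05] and new-member counts [5, 2, 0, 1, 0], the ≥0.5 cut-off flags 40% of clusters, which together contain 87.5% of the new members.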

All analyses were conducted in SAS 9.4 (SAS Institute Inc, Cary, NC), and visualizations were produced in R 3.5.0. Sequence and participant data may not be shared under the terms of the Data Use Agreement governing this analysis. Deidentified data may be available through a data request process requiring a Data Use Agreement.


Results

Study Population and HIV Transmission Clusters

Sequences were available for 10,084 persons diagnosed with HIV from 1982 to 2017. Approximately 35,000 persons were living with diagnosed HIV in NC in 2017, giving an overall sequence coverage of ∼29%.22 Sequence coverage has increased in recent years; sequences were available for 49%, 52%, and 51% of persons newly diagnosed in 2014, 2015, and 2016, respectively.12 Compared with all persons diagnosed with HIV and living in NC in 2017, persons with sequences were younger (32.6% vs. 20.3% aged <35 years as of December 31, 2017), less likely to self-identify as MSM (47.1% vs. 72.0%) or PWID (7.3% vs. 11.0%), and more likely to be African American (69.2% vs. 58.4%).

For initial model development and internal validation (baseline of January 2015), sequence data were available for 8202 persons. One-third of these persons (2750/8202; 33.5%) were identified in 730 putative clusters, and nearly half of all clusters (352/730; 48.2%) were established and recently active at baseline (1835 sequences) (Fig. 2A). The median established cluster size was 3 baseline members (range 2–34); 163 clusters (46.3%) were dyads at baseline, 126 (35.8%) contained 3–5 members, 46 (13.1%) contained 6–10 members, and 17 (4.8%) contained >10 members at baseline (see Figure 3, Supplemental Digital Content).

Figure 2. A, Study population identification for model development and internal validation. B, Targeted study population identification for temporal external validation.

Characteristics of Growing and Nongrowing Clusters

One-quarter of established, recently active clusters (24.4%; 86/352) grew during the initial observation period, adding 209 new cluster members (Fig. 2A). In growing clusters, 89/916 cluster members (9.7%) were hidden, compared with 47/919 (5.1%) in nongrowing clusters. Growing clusters were larger at baseline than nongrowing clusters (median size: 5 vs. 2 baseline members) and had more recently experienced a new diagnosis (median 0.4 vs. 1.8 years between the most recent diagnosis and baseline).

Baseline members of growing clusters were younger at baseline (median age: 29 vs. 35 years), more likely to be male (88.8% vs. 76.3%), and more likely to identify as black, non-Hispanic (75.7% vs. 71.9%), and MSM (70.7% vs. 52.2%) than baseline members of nongrowing clusters (Table 1). Baseline members of growing clusters also had a shorter median time to HIV care entry after diagnosis than those in nongrowing clusters (46 vs. 78 days) but were more likely to have HIV viremia at the most recent care visit (≥1000 copies per milliliter) or no available viral load in the year before baseline (49.4% vs. 43.5%).

Table 1. Selected Characteristics of Individuals in Recently Active, Established Clusters, Including Baseline Members of Both Growing and Nongrowing Clusters and New Members of Growing Clusters in the Internal and Temporal External Validation Samples

New members of growing clusters were younger (median age: 26 years) than baseline members of both growing and nongrowing clusters and were predominantly self-identifying MSM (73.7%). They were also more likely to have been diagnosed during acute or recent infection (10.1%) than baseline members of both growing (5.5%) and nongrowing clusters (3.2%).

Predictive Model Internal and Temporal External Validation

Because the ROC-AUC values and ROC curves were nearly identical for all candidate reduced models (see Figure 4, Supplemental Digital Content) and the Hosmer–Lemeshow test showed acceptable goodness of fit, we selected the model with the lowest BIC (344.47; see Table 3, Supplemental Digital Content), and thus the best comparative fit, as our final model. This model was also considerably simpler than the full model (6 predictors, each retained after complete elimination in ≥50/100 bootstrapped samples) and had a loss of <0.01 in ROC-AUC compared with the full model.

In this final model, cluster growth was predicted by larger baseline cluster size [adjusted odds ratio (aOR) = 1.17 per one-person increase], shorter median time to HIV care entry after diagnosis (aOR = 0.85 per 1-year increase), and younger median age at baseline (aOR = 0.67 per 10-year increase). Cluster growth was also predicted by >50% baseline cluster members with no named contacts (aOR = 2.13), ≤1 year since the most recent diagnosis in the cluster (aOR = 2.69), and higher percentage with HIV viremia (RNA ≥1000 copies per milliliter) or no available viral load during the year before baseline (0% < x ≤ 25%: aOR = 2.03; 25% < x ≤ 50%: aOR = 4.31; and >50%: aOR = 2.74) (Fig. 3). The final model had an ROC-AUC of 0.83 and optimism-corrected ROC-AUC of 0.82 in the internal validation sample, indicating excellent predictive ability.23
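Because the final model is a logistic regression, each aOR is an exponentiated coefficient, and an aOR reported per unit can be rescaled to other increments by exponentiation, holding the other predictors fixed. The worked example below uses only the aORs reported above. The intercept and coefficients needed to compute absolute predicted probabilities are not reported in this section, so none are shown.

```latex
\begin{aligned}
\operatorname{logit}\Pr(\text{growth}) &= \beta_0 + \sum_{j} \beta_j x_j, \qquad \mathrm{aOR}_j = e^{\beta_j},\\
\mathrm{aOR}_j(\Delta) &= e^{\beta_j \Delta} = \left(\mathrm{aOR}_j\right)^{\Delta},\\
\text{5 additional baseline members: } &\; 1.17^{5} \approx 2.19,\\
\text{1-year increase in median age: } &\; 0.67^{1/10} \approx 0.96.
\end{aligned}
```

This rescaling holds only for predictors entered linearly (cluster size, median time to care entry, and median age); the categorical predictors have one aOR per category relative to their reference level.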

Figure 3. Predictor aORs with application of the final model to the internal and temporal external validation samples. Ninety-five percent confidence intervals are shown with horizontal bars. aORs for cluster size at baseline and median years to care entry represent a 1-unit increase in cluster members and years, respectively. The aOR for median age at baseline represents a 10-year increase in median age. (a) aOR is relative to ≤50% of cluster members with no named contacts. (b) aOR is relative to >1 year since the most recent diagnosis in the cluster. (c) aOR is relative to 0% of cluster members with HIV viremia or no available viral load.

The temporal external validation population was composed of 426 established clusters identified from all sequences sampled through December 2017 (Fig. 2B). The final model had an ROC-AUC of 0.83 in this population.

A low predicted probability cut-off of ≥0.1 would identify clusters accounting for 92% of all new cluster members identified during 18 months of observation but would require investigation of only 55% of established clusters in the temporal external validation population (Fig. 4B). Intervention in these clusters could potentially allow for earlier diagnosis of, or prevention of transmission to, these 92% of new cluster members. Conversely, a high cut-off of ≥0.9 would require investigation of only 2% of established clusters but would identify clusters containing only 18% of these new cluster members.

Figure 4. A, Evaluation of sensitivity, specificity, and coverage of predicted probability cut-offs in the internal validation cluster population. The solid black line indicates sensitivity and the solid gray line indicates specificity. Ninety-five percent confidence intervals for sensitivity and specificity are shown with vertical bars. The dashed black line indicates the proportion of clusters with predicted probabilities equal to or greater than the predicted probability cut-off. The dashed gray line indicates the proportion of new cluster members that are included in investigated clusters with each cut-off. B, Evaluation of sensitivity, specificity, and coverage of predicted probability cut-offs in the temporal external validation cluster population.

Sensitivity Analyses

We performed a sensitivity analysis with a maximum pairwise genetic distance of ≤0.005 for cluster detection to evaluate how a tighter genetic distance threshold would affect the model's predictive ability. However, too few growing clusters were observed under this cut-off for the model to converge in either population of clusters. In a second sensitivity analysis excluding dyads, the final model had an optimism-corrected ROC-AUC of 0.746 in the internal validation sample and an ROC-AUC of 0.812 in the temporal external validation sample. Predictor aORs are presented in Table 4, Supplemental Digital Content. In this smaller population of clusters, aORs showed generally similar trends to the main analyses but with lower precision.


Discussion

Identification of HIV transmission clusters that are likely to grow in the near future has the potential to guide prioritization of public health interventions to cluster members and their known contacts in an era of ongoing HIV incidence.3 Limited resources within public health departments necessitate novel approaches to improving efficiency, particularly given current uncertainties around funding for HIV prevention, care, and treatment.24 To our knowledge, we have developed and validated the first public health tool for forecasting cluster growth with multiple cluster-level characteristics obtainable from routine HIV surveillance data. Our model showed excellent ability to predict HIV transmission cluster growth over 18 months.

The predictors identified in our final model likely correspond to several different mechanisms of observed cluster growth, because new cluster members (ie, persons signifying growth) could represent new transmissions, newly diagnosed cases reached through contact tracing of baseline cluster members, or newly diagnosed cases identified through other routes. Two of the final predictors, more recent diagnoses and a higher prevalence of viremia among baseline cluster members, may predict observed cluster growth through new transmissions. Viremia is a necessary condition for viral transmission, and recent diagnoses within a cluster may signal members who were unaware of their infections and/or not in care in the recent past. On the other hand, shorter median time between diagnosis and care entry may reflect strong care-seeking behaviors or access to care that may also be common to baseline members' partners. As such, the association between this predictor and cluster growth may relate more strongly to detection of infections in a cluster than to transmission. By combining predictors of new transmissions with predictors of detection of new cluster members (several of which have been individually identified in previous analyses9–11,25–29), the model described here can forecast cluster growth as observed in real-world public health scenarios.

The influence of multiple growth mechanisms suggests a variety of intervention strategies, and selection of an appropriate intervention may be guided by individual predictors within the model. Clusters with large proportions of virally unsuppressed PLWH may benefit from enhanced support for care engagement and immediate antiretroviral therapy30 allocated to cluster members and their named contacts who are newly diagnosed or currently unsuppressed. Clusters with a majority of members who do not report any sexual or injecting contacts to DIS may be ideal recipients of nontraditional network recruitment methods, including internet partner notification or social network HIV screening.31 Finally, clusters with recent diagnoses and young members may especially benefit from enhanced support for pre-exposure prophylaxis linkage and uptake among HIV-negative named contacts of cluster members to interrupt future transmissions.

Selection of predicted probability cut-offs for prioritization of cluster investigation can be based on resource availability and intervention priorities in a given setting. Higher predicted probability cut-offs have lower sensitivity (and will thus miss some growing clusters) but allow for concentration of limited resources and identify clusters responsible for disproportionate percentages of new cluster members. Public health officials may prefer a low cut-off if the algorithm is applied only to identify clusters warranting further digital oversight, or a high cut-off if the algorithm is used to identify potential clusters for heightened, resource-intensive interventions.
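One simple way to operationalize resource-based cut-off selection, offered here as an assumption rather than the authors' procedure, is to pick the lowest cut-off whose share of flagged clusters fits within investigation capacity. The function name and the capacity parameter are hypothetical.

```python
def lowest_feasible_cutoff(probs, capacity_fraction, cutoffs=None):
    """Lowest cut-off whose share of flagged clusters fits within capacity.

    Because the flagged share can only fall as the cut-off rises, the lowest
    feasible cut-off is the one with the highest sensitivity within capacity.
    Returns None when even the highest cut-off flags too many clusters.
    """
    if cutoffs is None:
        cutoffs = [round(0.1 * k, 1) for k in range(1, 10)]
    for c in sorted(cutoffs):
        share = sum(p >= c for p in probs) / len(probs)
        if share <= capacity_fraction:
            return c
    return None
```

A department able to investigate 40% of established clusters would call `lowest_feasible_cutoff(probs, 0.4)`; lowering the capacity pushes the chosen cut-off upward, trading sensitivity for a smaller, higher-yield set of clusters.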

Temporal external validation in a second period demonstrated that the combination of identified predictors maintained excellent predictive ability over the subsequent 18-month interval in NC, but future changes in epidemic dynamics and public health practice could affect algorithm performance. Specifically, a new mandate for sequence reporting in NC32 will likely increase the proportion of newly diagnosed cases with sequences available beyond the 40%–50% observed here. The time between diagnosis and sequencing is also shortening; thus, a larger proportion of diagnosed cluster members have sequences available for analysis, can be recognized as cluster members, and contribute to the calculation of predictors year over year. Increasing completeness and speed of sequencing are likely to reduce the number of “hidden” cluster members, which we found to be more prevalent in growing than in nongrowing clusters. Interventions spurred by model results may also alter cluster characteristics, potentially limiting future predictability for some clusters. As these changes occur, the model may need to be updated.

Although we limited our analysis to cluster-level predictors of cluster growth, we did observe expansion of singletons into clusters (Fig. 2A). Further evaluation of individual-level predictors of this process may provide additional insights for targeted public health efforts. STI diagnoses before and after an HIV diagnosis are strong indicators of sexual risk behavior that allow for HIV transmission.33 We were unable to assess STI diagnosis as a potential predictor here due to substantial missing self-reported data, but linkage of HIV and STI surveillance data may allow for inclusion of this predictor in the future.

HIV transmission clusters are derived from available data and should not be interpreted as the full underlying transmission network. MSM and PWID are underrepresented in current sequence data, possibly reducing our ability to detect cluster growth due to new diagnoses in these risk groups. We are unable to determine directionality from these data and we note that genetic linkages may not signify direct transmission events. We also note that cluster growth may, in some cases, reflect strong case finding through existing mechanisms. A high likelihood of cluster growth should be treated as a signal for further evaluation and potential tailored intervention, not necessarily a sign of rapid transmission. However, an elevated prevalence of acute/recent HIV infections among new members of growing clusters relative to all new diagnoses with sequences in NC over the same time frame indicates potential to disproportionately impact ongoing transmission with cluster-based interventions.


Conclusions

The predictive model developed and validated here leveraged existing HIV surveillance data and showed excellent predictive ability to forecast transmission cluster growth in NC. Identification of HIV transmission clusters that are likely to grow over time with this type of predictive tool may help guide prioritization of public health interventions to maximize HIV incidence reductions.


Acknowledgments

All data used in this study were provided by the NC Department of Health and Human Services, Division of Public Health (DPH). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or the NC DPH.


References

1. UNAIDS. Global AIDS Update. Available at: Accessed August 1, 2018.
2. Singh S, Song R, Johnson AS, et al. HIV incidence, HIV prevalence, and undiagnosed HIV infections in men who have sex with men, United States. Ann Intern Med. 2018;168:685–694.
3. Oster AM, France AM, Mermin J. Molecular epidemiology and the transformation of HIV prevention. JAMA. 2018;319:1657–1658.
4. Paraskevis D, Nikolopoulos GK, Magiorkinis G, et al. The application of HIV molecular epidemiology to public health. Infect Genet Evol. 2016;46:159–168.
5. Leigh Brown AJ, Lycett SJ, Weinert L, et al. Transmission network parameters estimated from HIV sequences for a nationwide epidemic. J Infect Dis. 2011;204:1463–1469.
6. Chaillon A, Essat A, Frange P, et al. Spatiotemporal dynamics of HIV-1 transmission in France (1999–2014) and impact of targeted prevention strategies. Retrovirology. 2017;14:15.
7. Wertheim JO, Leigh Brown AJ, Hepler NL, et al. The global transmission network of HIV-1. J Infect Dis. 2013;209:304–313.
8. Dennis AM, Pasquale DK, Billock R, et al. Integration of contact tracing and phylogenetics in an investigation of acute HIV infection. Sex Transm Dis. 2018;45:222–228.
9. Brenner BG, Roger M, Stephens D, et al. Transmission clustering drives the onward spread of the HIV epidemic among men who have sex with men in Quebec. J Infect Dis. 2011;204:1115–1119.
10. Ragonnet-Cronin M, Ofner-Agostini M, Merks H, et al. Longitudinal phylogenetic surveillance identifies distinct patterns of cluster dynamics. J Acquir Immune Defic Syndr. 2010;55:102–108.
11. Ragonnet-Cronin M, Hodcroft E, Hue S, et al. Automated analysis of phylogenetic clusters. BMC Bioinformatics. 2013;14:317.
12. North Carolina HIV/STD Surveillance Unit. 2016 North Carolina HIV/STD/Hepatitis Surveillance Report. Available at: Accessed August 1, 2018.
13. Sievers F, Wilm A, Dineen DG, et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol. 2011;7:539.
14. Kosakovsky Pond SL, Weaver S, Leigh Brown AJ, et al. HIV-TRACE (TRAnsmission cluster engine): a tool for large scale molecular epidemiology of HIV-1 and other rapidly evolving pathogens. Mol Biol Evol. 2018;35:1812–1819.
15. Rose R, Lamers SL, Dollar JJ, et al. Identifying transmission clusters with cluster picker and HIV-TRACE. AIDS Res Hum Retroviruses. 2017;33:211–218.
16. Lesko CR, Sampson LA, Miller WC, et al. Measuring the HIV care continuum using public health surveillance data in the United States. J Acquir Immune Defic Syndr. 2015;70:489–494.
17. Austin PC, Tu JV. Bootstrap methods for developing predictive models. Am Stat. 2004;58:131–137.
18. Harrell FE, Slaughter JC. Biostatistics for Biomedical Research. Available at: Accessed August 1, 2018.
19. Steyerberg EW, Harrell FE. Prediction models need appropriate internal, internal-external, and external validation. J Clin Epidemiol. 2015;69:245–247.
20. Steyerberg EW, Vickers AJ, Cook NR, et al. Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology. 2010;21:128–138.
21. Smith GCS, Seaman SR, Wood AM, et al. Correcting for optimistic prediction in small data sets. Am J Epidemiol. 2014;180:318–324.
22. North Carolina HIV/STD Surveillance Unit. 2017 North Carolina HIV/STD/Hepatitis Surveillance Report. Available at: Accessed August 1, 2018.
23. Hosmer DW, Lemeshow S. Assessing the fit of the model. In: Applied Logistic Regression. 2nd ed. Hoboken, NJ: John Wiley & Sons, Inc.; 2000.
24. Kaiser Family Foundation. U.S. Federal Funding for HIV/AIDS: Trends over Time. Available at: Accessed August 1, 2018.
25. Ragonnet-Cronin M, Wertheim JO, Hayford CS, et al. HIV transmission cluster dynamics that inform public health intervention in Illinois [abstract 957]. Presented at: 25th Conference on Retroviruses and Opportunistic Infections; March 4–7, 2018; Boston, MA.
26. Ragonnet-Cronin M, Hu YW, Wertheim JO. Predicting HIV cluster growth using phylodynamic reconstruction in Los Angeles County [abstract 949]. Presented at: 25th Conference on Retroviruses and Opportunistic Infections; March 4–7, 2018; Boston, MA.
27. Bachmann N, Kadelka C, Turk T, et al. Cluster analysis reveals important shift of drivers of the HIV in Swiss men [abstract 41]. Presented at: 25th Conference on Retroviruses and Opportunistic Infections; March 4–7, 2018; Boston, MA.
28. Panneer N, Oster AM, Ocfemia CB, et al. Association between viral suppression and molecular cluster growth, United States [abstract 955]. Presented at: 25th Conference on Retroviruses and Opportunistic Infections; March 4–7, 2018; Boston, MA.
29. McVea D, Liang R, Joy J, et al. A framework for predicting phylogenetic clusters of HIV at high risk for growth [abstract 848]. Presented at: 24th Conference on Retroviruses and Opportunistic Infections; 2017; Seattle.
30. Pilcher CD, Ospina-Norvell C, Dasgupta A, et al. The effect of same-day observed initiation of antiretroviral therapy on HIV viral load and treatment outcomes in a U.S. public health setting. J Acquir Immune Defic Syndr. 2017;74:44–51.
31. Dailey Garnes NJM, Moore ZS, Cadwell BL, et al. Previously undiagnosed HIV infections identified through cluster investigation, North Carolina, 2002–2007. AIDS Behav. 2015;19:723–731.
32. North Carolina Division of Health and Human Services. Rule 10A NCAC 41A.0202.
33. Pasquale D. Epidemiological Analysis of Sociosexual HIV Networks in Central North Carolina: Doctoral Dissertation. Chapel Hill, NC: University of North Carolina at Chapel Hill; 2018.

HIV transmission cluster; HIV genetic cluster; predictive model; cluster growth


Copyright © 2018 Wolters Kluwer Health, Inc. All rights reserved.