Many but small HIV-1 non-B transmission chains in the Netherlands

Objective: The aim of this study was to investigate introductions and spread of different HIV-1 subtypes in the Netherlands. Design: We identified distinct HIV-1 transmission chains in the Netherlands within the global epidemic context through viral phylogenetic analysis of partial HIV-1 polymerase sequences from individuals enrolled in the ATHENA national HIV cohort of all persons in care since 1996, and publicly available international background sequences. Methods: Viral lineages circulating in the Netherlands were identified through maximum parsimony phylogeographic analysis. The proportion of HIV-1 infections acquired in-country among heterosexuals and MSM was estimated from phylogenetically observed, national transmission chains using a branching process model that accounts for incomplete sampling. Results: As of 1 January 2019, 2589 (24%) of 10 971 (41%) HIV-1 sequenced individuals in ATHENA had non-B subtypes (A1, C, D, F, G) or circulating recombinant forms (CRF01AE, CRF02AG, CRF06-cpx). The 1588 heterosexuals were in 1224, and 536 MSM in 270 phylogenetically observed transmission chains. After adjustments for incomplete sampling, most heterosexual (75%) and MSM (76%) transmission chains were estimated to include only the individual introducing the virus (size = 1). Onward transmission occurred mostly in chains size 2–5 amongst heterosexuals (62%) and in chains size at least 10 amongst MSM (64%). Considering some chains originated in-country from other risk-groups, 40% (95% confidence interval: 36–44) of non-B-infected heterosexuals and 62% (95% confidence interval: 49–73) of MSM-acquired infection in-country. Conclusion: Although most HIV-1 non-B introductions showed no or very little onward transmission, a considerable proportion of non-B infections amongst both heterosexuals and MSM in the Netherlands have been acquired in-country.


Introduction
The WHO has the ambition to end HIV transmission this decade [1]. With this aim, it is essential for countries to understand where HIV infections were acquired, and who they affect. In Western European countries, an increasing proportion of newly HIV-1 diagnosed persons are infected with non-B subtypes [2][3][4][5][6]. In the Netherlands, this concerns about a quarter of all people receiving care. In 71% of cases, these are foreign-born individuals, of whom 70% are from sub-Saharan Africa. The level of introductions and national transmission is unknown. The aMASE survey study among immigrants across HIV clinics in Europe estimated that 45% (95% confidence interval: 39-52) of infections among people from sub-Saharan Africa were acquired postmigration to Western Europe [7]. In contrast, a molecular phylogenetic study from Europe suggests that only nearly 20% of non-B infections among people from sub-Saharan Africa acquired their infection postmigration [2]. These estimates are quite different; however, the phylogenetic study had not accounted for individuals that were not sequenced, potentially introducing sampling bias in these estimates.
To obtain more insight into the transmission dynamics of non-B subtypes, we reconstructed partially observed transmission chains through phylogenetic analysis of nationally collected HIV-1 polymerase ( pol) nucleotide sequences, and then estimated the proportion of incountry HIV acquisitions amongst heterosexuals and MSM while accounting for incomplete sequence sampling of these risk groups. The sequences were obtained from the ATHENA national observational HIV cohort, and combined with pol sequences publicly available from the Los Alamos National Lab (LANL) HIV database [4,8]. As subtypes evolve differently amongst different risk groups, and some people have their virus sequenced many years after infection, we avoid phylogenetic clustering analysis with cut-offs on bootstrap values or patristic distances in phylogenetic trees [9][10][11][12]. Instead, the LANL sequences provided the phylogeographic context to identify distinct viral phylogenetic subgraphs associated with the Netherlands. We then interpreted each subgraph as evidence of a distinct national transmission chain resulting from a single introduction. This made it possible to estimate the number, spread, as well as the origin of introductions by transmission group of distinct non-B transmission chains, and for comparison, subtype B transmission chains in the Netherlands. Using a branching process model that adjusts for incomplete observations our results provide insight regarding the proportion of HIV-1 infections that are acquired in-country, both for heterosexual infected individuals and MSM.

ATHENA cohort
The ATHENA national observational HIV cohort includes pseudonymized demographic and clinical data collected from HIV-positive individuals receiving care in one of the 24 HIV treatment centres in the Netherlands since 1996, excluding approximately 1.5% who opt-out of their data being collected [4]. Population-based nucleotide HIV-1 partial pol gene sequences are obtained as part of routine screening for antiretroviral drug resistance, either before start of antiretroviral therapy or at time of virological failure [13]. Data on the most likely mode of HIV infection and assumed geographic region of infection are self-reported at time of entry into HIV care.
HIV-1 sequence selection, alignment and subtyping The study population comprised individuals in the ATHENA cohort infected before 1 January 2019 with at least one available HIV-1 pol RNA sequence with a minimum length of 750 nucleotides by 1 June 2020. If more than one sequence was available, the first sequence was used for analysis. R and the ape package were used to process sequences for phylogenetic analysis [14]. Sequences were aligned to the reference genome HXB2 [15], using VIRULIGN [16]. Subtypes were assigned using COMET [17], and when required checked and assigned using REGA [18].

International epidemic context
To characterize the international epidemic context of sequences obtained from our study population, background sequences were retrieved by subtype from the LANL HIV database on 18 February 2021 [8]. LANL sequences were included into the background sequence data if they overlapped with the ATHENA sequences in at least 1000 nucleotides, had country and year of sampling reported, and were not from the Netherlands. Only one sequence per person was included. Accession numbers are reported in Supplementary Table S1, http:// links.lww.com/QAD/C308. In addition, ATHENA sequences from people on Curaçao were added as background sequences.

Phylogeny construction
For each subtype, aligned background sequences were merged with the alignment of the study sequences using the reference sequence HXB2. Sequences from the closest other subtype in the combined data set were included as outgroups. For every subtype-specific alignment, major resistance conferring sites were deleted [19]. Phylogenetic trees were built in FastTree v2.1.8 [20] and rooted at the outgroup.
Viral phylogenetic estimation of Dutch subgraphs by transmission risk group From the reconstructed phylogenies, we identified phylogenetic subgraphs associated with national transmission amongst either heterosexuals or MSM. First ATHENA sequences were associated with states according to the transmission risk group (heterosexual, MSM, drug users, other/unknown) of the corresponding sampled individual. The state 'other/unknown' includes individuals infected through blood transfusion, vertical transmissions, accidental/occupational exposure and unknown routes of transmission. States associated with sequences from the LANL HIV database were based on the geographic region of the sample (states: Central Europe, Eastern Europe, Western Europe, Latin America and the Caribbean, North Africa and the Middle East, sub-Saharan Africa, United States and Canada, Southand Southeast Asia and Oceania, and Suriname and Curaçao, Supplementary Table S1, http://links.lww.com/QAD/C308). Second, ancestral states of all internal nodes in the phylogeny were estimated with the maximum parsimony ancestral state reconstruction algorithm implemented in phyloscanner v1.8.0 [21] (Supplementary Text, section 1, http://links.lww.com/ QAD/C307). Then, likely transmission chains associated with either MSM or heterosexuals in the Netherlands were identified as phylogenetic subgraphs (of size !1) that had a common state change occurring at the branch leading to the node representing their most recent common ancestor (see Fig. 1a).
The phylogeographic origin of a Dutch phylogenetically identified transmission chain was estimated as the state of the viral lineage ancestral to its root, which was one of the geographic regions attributed to LANL sequences, another Dutch transmission risk group or undefined when two states were equally parsimonious.

Statistical analyses
The number, origin and size of the phylogenetically identified transmission chains were determined by summing these statistics across the subtype-specific phylogenies. We focused on the four largest transmission categories (r) in the Netherlands: non-B heterosexuals, non-B MSM, subtype B heterosexuals and subtype B MSM. Ninety-five percent confidence intervals of these statistics were obtained from 100 replicate analyses on bootstrap sequence alignments, and are abbreviated as 95%BS-CI. As a result, confidence intervals do not necessarily include the central estimate.
In a fully sampled population, every phylogenetically identified transmission chain would correspond to the actual transmission chain within a particular observation period, and comprise one introduction possibly followed by onward in-country transmission. Some introductions will have occurred from other in-country risk groups, and so, the proportion of HIV-1 infections acquired in the Netherlands in category r can be expressed as where E Ã r is the number of transmission chains in category r, a r the proportion of chains that had their origin assigned to outside the Netherlands and N Ã r is the total number of individuals in the transmission chains. However, it is not obvious whether unsampled infections in the Netherlands are part of unobserved transmission chains or represent additional infections in partially observed chains. This makes it necessary to account for incomplete sampling when analysing phylogenetically observed transmission chains. The likelihood of the observed chain size distribution (x) was quantified under a branching process model with Negative Binomial offspring distribution to describe the unobserved size distribution of transmission chains (z), assuming that infected individuals were sampled at random [22]. The model is specified by the expected number of infected individuals under the offspring distribution, the dispersion parameter under the offspring distribution and the sampling probability (the ratio of individuals with a sequence divided by the total number of HIV-1 positive individuals in ATHENA). Using a Bayesian framework (Supplementary Text, section 2, http://links.lww.com/QAD/C307), we sampled from the posterior predictive distribution of complete transmission chains z Ã r , calculated N Ã r and E Ã r , and estimated the proportion of HIV-1 infections acquired in the Netherlands in category r through Eq. (1). Samplingadjusted results are reported by the median estimates and corresponding 95% credible intervals (95% CI). The model does not account for incomplete sampling of infections occurring outside the Netherlands based on the rationale that the sequence of only one external individual separating any two Dutch transmission chains needs to be sampled in order to separate distinct Dutch transmission chains into distinct phylogenetic subgraphs.
Comparison to estimates from the aMASE survey Phylogenetic estimates on the proportion of HIV-1 infections acquired in the Netherlands amongst MSM and heterosexuals with non-B subtypes were compared with estimates from the aMASE cross-sectional study among HIV-positive migrants. The study was conducted from July 2013 to July 2015 in 57 clinics in nine European countries. Postmigration HIV acquisition was estimated from electronic questionnaire and clinical data [7].

Ethics approval
At initiation, the ATHENA cohort was approved by the institutional review board of all participating HIV treatment centres. People entering HIV care receive written material about participation in the ATHENA cohort and are informed by their treating physician on the purpose of data collection, thereafter they can consent verbally or elect to opt-out. Data are pseudonymised before being provided to investigators and may be used for scientific purposes. A designated quality management coordinator safeguards compliance with the European General Data Protection Regulation [4].
b Estimated actual transmission chains, results adjusted for incomplete observations. heterosexuals in the Netherlands and 0 (95%BS-CI: 0-1) from other national chains (Table 3).

Estimated transmission chains after accounting for incomplete sequence sampling
Accounting for incomplete sampling, we estimated more and larger transmission chains than phylogenetically observed (  (Table 3). MSM infected with subtype B were also attributed to proportionally fewer and significantly larger transmission chains than MSM infected with non-B subtypes. The phylogenetically inferred origins of subtype B transmission chains among MSM were also significantly different, with more chains originating from Western Europe, North America and Latin America and the Caribbean, as compared to non-B transmission chains among MSM.
estimates, we find that an estimated 75% (95% CI:  In comparison, phylogenetic empirical estimates are shown with 95% confidence intervals obtained from 100 replicate analyses on bootstrap sequence alignments, hence these do not necessarily include the central estimate. In addition, estimates from selfreported data on the likely location of acquisition, and a selection of estimates from the aMASE survey study among immigrants across HIV clinics in Europe are shown [7]. when compared to non-B-infected, especially so for heterosexuals. The estimated proportion of in-country transmission was larger for all risk groups and all subtypes after adjusting for incomplete sequence sampling (Fig. 2), and more consistent with estimates from self-reported data on the likely location of acquisition as well as relevant estimates from the aMASE survey study among immigrants across HIV clinics in Europe [7] than the unadjusted estimates.

Discussion
We characterized the number, level of spread and region of origin of HIV-1 non-B introductions in the Netherlands by viral phylogenetic analysis, and compared our findings with those for HIV-1 subtype B infections. Our phylogenetic analyses indicate first that thousands of distinct transmission chains are co-circulating among both heterosexuals and MSM in the Netherlands. Second, we use these data to estimate the size distribution of complete transmission chains and find higher levels of incountry transmission, in more and larger transmission chains, than the directly observed phylogenetic data suggest. Our findings are compatible with phylogenetic studies from other European countries, which provided little evidence of large domestic non-B HIV-1 sub-epidemics for heterosexuals, but several larger transmission chains amongst MSM [2,3,6,23]. However, our estimated proportions of in-country HIV acquisitions were consistently lower in phylogenetic analyses that did not adjust for incomplete sampling, showing that the observed phylogenetic subgraphs are challenging to interpret without further modelling that accounts for incomplete sequence sampling. The adjusted phylogenetically derived estimates of in-country HIV acquisition were broadly congruent with overall selfreported data concerning the most likely geographical origin of infection. For non-B-infected heterosexuals, the adjusted phylogenetically derived estimates of incountry HIV acquisition were higher than estimates from patients' self-reported likely geographical origin of infection, but in line with survey estimates among heterosexual immigrants in a European survey, and in particular with survey estimates among heterosexual immigrants from sub-Saharan Africa [7]. The latter is particularly relevant, as the majority of our non-B heterosexual sample population was born in sub-Saharan Africa.
In comparison, among heterosexually infected individuals with subtype B, we find a much higher proportion of in-country acquisitions, primarily because half of transmission chains among subtype B heterosexuals were estimated to have originated in-country from MSM. Similar to what we see for non-B, these subsequently have not resulted in any major sub-epidemics amongst heterosexuals. The large subtype B subgraphs identified with predominantly heterosexual transmission include a high percentage of people from Curaçao and Suriname [24] (Supplementary Table S6, http://links.lww.com/ QAD/C313). Consistent with the European aMASE survey, immigrants from Latin America and the Caribbean reported highest (71%) postmigration HIV acquisition [7].
Our study has several limitations. First, our phylogenetic analysis rests on the assumption that sufficient HIV sequences were publicly available at LANL to place the Dutch viral sequences into the correct epidemiologic context. Although only one ancestor separating distinct Dutch transmission chains needs to be in the background data set, it is certainly possible that some of the larger phylogenetically identified transmission chains in fact correspond to several smaller transmission chains. Our results may therefore overestimate the number of infections acquired in the Netherlands. Second, less than half of ATHENA participants had a viral sequence available for analysis, and the sampling fraction was higher among MSM compared to heterosexuals (44 versus 39%). Although we employed statistical models to account for incomplete sampling, the models have limitations. We assumed individuals are sampled at random with the same sampling probability in each risk group, and we cannot exclude the possibility that violation of these assumptions could introduce bias. Third, viral phylogenetic lineages were attributed the location of individuals at time of sampling, and this approach did not account for individual-level mobility. Several infections per chain could have occurred abroad amongst travellers and/or immigrants, or infections might have occurred in the Netherlands from short-term visitors. This could partly explain a respectively higher and lower proportion of in-country infections estimated by the phylogenetic approach compared to self-reported data. Alternatively, self-reported data on the likely origin of infection could be subject to reporting biases [25]. Finally, we have not estimated time trends, and it is possible that dynamics changed over time, in particular with recent test-andtreat policy and substantial reductions in new diagnose [26].
This study shows HIV-1 non-B subtypes are spreading in the Netherlands in many distinct transmission chains, which are significantly smaller in size than transmission chains of subtype B, both for heterosexuals and for MSM. Nevertheless, we estimate that a substantial proportion of non-B infections among heterosexuals and MSM are acquired within the Netherlands, and find that the discrepancy between previous phylogenetic studies of incountry transmission and recent survey estimates into the proportion of in-country transmission could be explained by incomplete sequence sampling of infected individuals.