We appreciate the correspondence by Dr Butler [1] in response to our article [2] and his careful consideration of the study methodology. We agree that caution should be exercised in interpreting sequence data as originating from single genome amplification (SGA) when increased amplification thresholds are used. Dr Butler expressed particular concern about the sensitivity of detection of minor variants, given the low sampling numbers in our study, and about the number of nucleotide ambiguities accepted per genome. We address these concerns here and explain why our analyses and conclusions are robust to the increased threshold used.
The number of templates contained in an SGA reaction can be modeled with a Poisson distribution. The distribution's only parameter, lambda (λ), is the average number of genomic templates per reaction rather than, as suggested by Dr Butler, the proportion of reactions that are positive. A widely used convention for SGA is a threshold of fewer than 33% of reactions resulting in amplification. This is what one would expect, on average, from a Poisson distribution with λ = 0.4, under which just over 81% of positive reactions will have been amplified from a single genomic template. Our threshold of 66% positive reactions is what one would expect from λ = 1.079 and, given this λ, we should expect 55.6% of positive reactions to have arisen from a single template. For large numbers of reactions, the proportion of positive SGA reactions expected to truly arise from single genomes can be computed as:
\[ \frac{\lambda e^{-\lambda}}{1 - e^{-\lambda}} = \frac{-(1-p)\,\ln(1-p)}{p}, \qquad \lambda = -\ln(1-p), \]
where p is the proportion of reactions that are positive. The 'large number' caveat is required because λ is not observed directly but is estimated from the proportion of positive reactions, and this estimate may be unreliable for experiments with few reactions. Our estimates differ from those of Dr Butler; Fig. 1 plots the relationship between the positivity threshold and the expected proportion of positive reactions derived from a single genome.
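The figures quoted above can be reproduced with a short calculation. The sketch below assumes that template counts per reaction follow a Poisson distribution and that λ is estimated from the observed proportion of positive reactions p as λ = −ln(1 − p). The function names are illustrative and are not taken from the original analysis.

```python
import math


def templates_per_reaction(p_positive):
    """Estimate lambda (mean templates per reaction) from the positive fraction.

    Assumes Poisson-distributed templates, so P(no template) = exp(-lambda)
    and therefore lambda = -ln(1 - p_positive).
    """
    if not 0.0 < p_positive < 1.0:
        raise ValueError("p_positive must lie strictly between 0 and 1")
    return -math.log(1.0 - p_positive)


def single_template_fraction(p_positive):
    """Expected fraction of positive reactions that contain exactly one template."""
    lam = templates_per_reaction(p_positive)
    p_exactly_one = lam * math.exp(-lam)    # Poisson P(k = 1)
    p_at_least_one = 1.0 - math.exp(-lam)   # Poisson P(k >= 1), equal to p_positive
    return p_exactly_one / p_at_least_one


if __name__ == "__main__":
    # Positivity thresholds discussed in the text.
    for p in (0.33, 0.40, 0.66):
        lam = templates_per_reaction(p)
        frac = single_template_fraction(p)
        print(f"positive = {p:.0%}  lambda = {lam:.3f}  single-template = {frac:.1%}")
```

Because λ is estimated from p, these values are expectations for large numbers of reactions; with few reactions the observed positive fraction, and hence the estimate, can vary considerably.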
The purpose of our study was to identify the earliest changes driven by immune selection across the genome by sequencing at time-points between study screening/enrolment and 6 months postinfection. Due to inherent challenges in amplification of near full-length genomes, including the availability of plasma, in some instances exacerbated by low viral loads such as in the case of viremic controller CAP45, we included sequences at a higher amplification threshold. Importantly, however, 47% of sequences sampled over the 6 months were obtained at 33% or less PCR positivity and 64% at 40% or less PCR positivity. Thus, according to a Poisson distribution, for the majority of our dataset, the probability that our sequences were derived from a single template likely ranged from 77 to 81%.
To compensate for the higher positivity threshold, we limited the acceptable number of ambiguities per template. Within our primary dataset, we had a median of 1.5 (mean = 1.99) ambiguous positions per genome. Dr Butler correctly indicates that one reason Salazar-Gonzalez et al. [3] attributed up to five ambiguities to polymerase error was the PCR positivity threshold of less than 20% that they used. However, as also indicated by Salazar-Gonzalez et al. [3], when a consensus is generated from sequences sampled at the earliest time-point, ambiguities are essentially 'absorbed' into that consensus.
In fact, the transmitted/founder (t/f) sequences were also derived in conjunction with env-only subgenomic SGA. As part of a larger study [4], env SGA (using an amplification threshold of 30% and excluding all sequences with double peaks) was carried out on all participants in our study, and all were found to be infected with a single t/f variant. The env consensus sequences generated from our near full-length genomes proved to be identical to those generated from env SGA. As env is the most rapidly diversifying region of the genome, we are confident that the near full-length genome consensus represents the true t/f virus.
Although derivation of the t/f is robust to the inclusion of a minority of limiting dilution sequences, we acknowledge that a low sampling number and a higher SGA positivity threshold increased the likelihood of missing minor variants. However, as indicated in our article, we also performed focused epitope sequencing at time-points before and after detection of mutations associated with cytotoxic T-lymphocyte selection, which mitigates the reduced sensitivity for detecting earlier escape or additional mutant variants. The presence of earlier escape mutations or additional mutant variants that went undetected because of the increased thresholds would only have strengthened our conclusions that immune selection occurs very early following transmission and is dynamic across the subtype C genome.
We would like to thank Miguel Lacerda and Ben Murrell for helpful discussions.
Conflicts of interest
There are no conflicts of interest.
1. Butler DM. Single genome amplification and sequencing methods require appropriate thresholds for viral transmission and evolution studies.
2. Abrahams MR, Treurnicht F, Ngandu N, Goodier SA, Marais JC, Bredell H, et al. Rapid, complex adaptation of transmitted HIV-1 full-length genomes in subtype C-infected individuals with differing disease progression.
3. Salazar-Gonzalez JF, Salazar MG, Keele BF, Learn GH, Giorgi EE, Li H, et al. Genetic identity, biological phenotype, and evolutionary pathways of transmitted/founder viruses in acute and early HIV-1 infection. J Exp Med.
4. Abrahams M-R, Anderson JA, Giorgi EE, Seoighe C, Mlisana K, Ping L-H, et al. Quantitating the multiplicity of infection with human immunodeficiency virus type 1 subtype C reveals a nonpoisson distribution of transmitted variants. J Virol.