Reliable estimates of HIV incidence rates are critical for tracking the epidemic and planning prevention studies. Several recent prevention studies have led to equivocal results because biased estimates of incidence were used in their planning.1 Cross-sectional methods for estimating HIV incidence rates, using a sensitive [eg, enzyme-linked immunosorbent assay (ELISA)] combined with a less sensitive (eg, Vironostika detuned ELISA or BED capture enzyme immunoassay) diagnostic test, offer important advantages over traditional longitudinal follow-up studies in terms of cost, time, and attrition.2 However, as several reports have cautioned, the reliability of cross-sectional methods is in doubt, in part because of inconsistencies between estimates they have produced and those obtained by traditional longitudinal cohort studies.3,4 These concerns have led some investigators to propose adjustments to the standard estimator.5-8 Recently, Brookmeyer9 has questioned the need for adjusted estimators by arguing that “false negatives” and “false positives” cancel out, thus leading to no essential change. This raises fundamental questions of when adjustment of the standard estimator is needed and how such adjustments should be made.
One purpose of this article is to shed light on the choice of incidence estimators by providing intuition behind the McDougal adjustments and demonstrating that, even in the idealized situation when the sensitivity and specificity of the less sensitive test are fully known, these estimators are less precise than the unadjusted estimator in settings where all infected subjects eventually become reactive to the less sensitive test. A second purpose of the article is to determine the statistical properties of adjusted estimators of HIV incidence rate when a subset of infected subjects would never become reactive to the less sensitive test. We derive the maximum likelihood estimator of HIV incidence based on a statistical model for this setting. The resulting estimator coincides with one developed by McWalter and Welte7 using a mathematical modeling approach. We demonstrate that the precision of the maximum likelihood estimator is always greater than that of the adjusted estimators considered by McDougal et al,5 and we develop a variance expression for this estimator. Finally, we determine and illustrate the biases of the unadjusted and adjusted incidence estimators when incorrect assumptions are made about a subpopulation of infected subjects who indefinitely remain nonreactive to the less sensitive test.
We use longitudinal natural history statistical models of HIV seroconversion and subsequent reactivity to a less sensitive diagnostic test to determine the statistical properties of unadjusted and adjusted incidence estimators based on a cross-sectional sample. The method of maximum likelihood estimation is used to derive the optimal cross-sectional estimator of HIV incidence for settings where a subset of the infected persons indefinitely remain nonreactive to the less sensitive test. The bias and precision of the various incidence estimators are assessed and compared using analytic methods and are illustrated using simulation studies.
Three-State Model for HIV Seroconversion and Reactivity to Less Sensitive Test
Suppose that N subjects are randomly selected from an asymptomatic population, and each is tested with an ELISA and, if positive, a less sensitive antibody test. The most commonly used less sensitive tests to date have been the 3A11-LS and Vironostika detuned ELISA assays and the BED capture enzyme immunoassay.2,10,11 Let N1, N2, and N3 denote the resulting numbers of subjects found to be nonreactive to ELISA, reactive to ELISA but nonreactive to the less sensitive test, and reactive to the less sensitive test, respectively, so that N = N1+N2+N3.
Assume that the observations arise from the 3-state longitudinal model depicted in Figure 1. State 1 denotes the preseroconversion period when an individual is either uninfected or infected without yet having seroconverted. State 2 denotes the time interval after seroconversion while an individual is nonreactive to the less sensitive diagnostic test [eg, with a normalized optical density (OD) value below a cut off of 1.0 with the BED capture enzyme immunoassay] and is sometimes referred to as the “recent infection” state. State 3 denotes the period beginning when the individual would test positive (reactive) with the less sensitive test and has been referred to as the “nonrecent infection” state.
Implicit in Figure 1 is the assumption that all infected individuals would eventually become reactive to the less sensitive test. The actual time spent in state 2, say L, varies from person to person and is assumed to be independent of time of seroconversion. Because of the interperson variability in L, use of “recent infection state” to describe state 2 is not meant in any strict literal sense, such as “infected in the past 6 months”. We use μ = E (L), commonly called the “mean window period”, to denote the mean population time in state 2. The distribution of L, and thus the value of μ, will depend on the less sensitive test being used. For example, Janssen et al,2 Parekh et al,11 and Constantine et al10 report estimated mean window periods of 129, 160, and 133 days for the 3A11-LS assay (using an OD cut off of 0.75), the BED assay (using a 1.0 normalized OD cut off), and Vironostika detuned ELISA (using an SOD cut off of 0.75), respectively.
Unadjusted Incidence Estimators
Suppose that I denotes the population incidence rate at the time of the cross-sectional sample and consider the unadjusted cross-sectional estimator
Î = N2/(μN1). (1)
For the moment, we assume that μ is known. This estimator arises as a special case of the snapshot estimator considered by Kaplan and Brookmeyer (equation 10).12 It also arises as the maximum likelihood estimator of I for the 4-stage model considered by Balasubramanian and Lagakos13 when the time between infection and seroconversion is negligible and when the incidence density is constant for a period of time preceding the cross-sectional sample. Because (1) is the maximum likelihood estimator of I in this setting, it follows that as N becomes large, it will converge to the true incidence rate and be the most efficient cross-sectional estimator of incidence. The denominator of Î differs slightly from that in the unadjusted estimator used by Brookmeyer and Quinn14 and Janssen et al,2 who use N1 + N2 instead of N1. However, N1 is typically about 2 orders of magnitude larger than N2, in which case the difference between these versions of the unadjusted estimator is negligible.
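The following minimal Python sketch illustrates the unadjusted estimator and the variant with N1 + N2 in the denominator. It assumes that equation (1) has the form N2/(μN1), which is the form consistent with the probability limits quoted in Appendix 6; the function names and the example counts are hypothetical and chosen only for illustration.

```python
def unadjusted_incidence(n1: int, n2: int, mu: float) -> float:
    """Unadjusted cross-sectional estimate, assumed form N2 / (mu * N1).

    n1: number nonreactive to ELISA (state 1)
    n2: number ELISA-reactive but nonreactive to the less sensitive test (state 2)
    mu: mean window period, in the time unit of the desired rate (eg, years)
    """
    if n1 <= 0 or mu <= 0:
        raise ValueError("n1 and mu must be positive")
    return n2 / (mu * n1)


def unadjusted_incidence_n1_plus_n2(n1: int, n2: int, mu: float) -> float:
    """Variant described in the text that uses N1 + N2 in place of N1."""
    if n1 + n2 <= 0 or mu <= 0:
        raise ValueError("n1 + n2 and mu must be positive")
    return n2 / (mu * (n1 + n2))


if __name__ == "__main__":
    # Hypothetical counts: N = 3000 with N1 = 2700, N2 = 54, N3 = 246.
    n1, n2, mu = 2700, 54, 0.5
    print(unadjusted_incidence(n1, n2, mu))             # N2 / (mu * N1)
    print(unadjusted_incidence_n1_plus_n2(n1, n2, mu))  # N2 / (mu * (N1 + N2))
```

With these hypothetical counts the two versions differ by about 2% in relative terms, in line with the remark that the difference is negligible when N1 is much larger than N2.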
Adjusted Incidence Estimators
Adjusted versions of the estimator in (1) have been proposed by several investigators.5-8 McDougal et al5 (equations 1 and 2) consider the estimators
Here sens, spec, spec1, and spec2 denote the sensitivity and specificities of the less sensitive test as regards identifying subjects as having seroconverted within the past μ days; that is, if Y denotes the time between seroconversion and testing,
sens = p (nonreactive on the less sensitive test | ELISA+, Y ≤ μ),
spec = p (reactive on the less sensitive test | ELISA+, Y > μ),
spec1 = p (reactive on the less sensitive test | ELISA+, μ < Y ≤ 2μ), and
spec2 = p (reactive on the less sensitive test | ELISA+, Y > 2μ)
Based on longitudinal studies that give similar estimates for sens and spec1, Hargrove et al6 recommend a simplified version of (4) obtained by setting sens = spec1. However, Brookmeyer9 notes that for the model in Figure 1, sens = spec1 implies that spec2 = 1, which contradicts the use of a value of spec2 less than 1 by Hargrove et al.6 Welte et al8 recommend replacing the denominator in (4) by spec2, based on analysis of a mathematical model which leads to sens − spec1 + spec2 = 1. We return to these estimators later in the article.
Appendix 1 derives expressions for sens, spec, spec1, and spec2 in terms of HIV incidence and prevalence at the time of the cross-sectional sample and the distribution of time in state 2, based on the 3-state model in Figure 1. To gain insight into the adjusted estimators, imagine a hypothetical less sensitive test that becomes reactive exactly μ time units after every individual seroconverts. That is, the time, L, in state 2 equals μ for every individual, and every subject found to be in state 2 is a “recent infection” in the literal sense of having seroconverted within the past μ time units. Suppose that N2L denotes the number of such subjects from the sample of N subjects. Then from the same theory justifying (1), another valid estimator of the HIV incidence rate I would be
Furthermore, because N2L has the same binomial distribution as N2, the estimator in (6), while giving different numerical results than the unadjusted estimator in (1) for a given data set, has the same distribution and thus is equally precise. Of course, this estimator cannot be used because no such hypothetical less sensitive test exists. However, the adjusted estimators (2) and (4) can be viewed as approximations of (6) based on replacing the unobservable quantity N2L by 𝒩̂2L and Ñ2L, respectively. It is shown in Appendix 2 that for the 3-state model in Figure 1, 𝒩̂2L and Ñ2L have the same expectation as N2 and that Î1 and Î2 are valid estimators of I; that is, will converge to I as N grows large. This result is analogous to the finding by Brookmeyer,9 who shows that the “false positives” and “false negatives” cancel out in the adjustment formula considered by McDougal et al5 and forms the basis for his conclusion that “The McDougal adjustment has no net effect”.9
One key point is that, because (2) and (4) are based on estimates of N2L, the variances of Î1 and Î2 are always larger than that of Î (Appendix 3). Another is that the estimators in (2) and (4) cannot be computed in practice because sens, spec, spec1, and spec2 are not known exactly. That is, the adjusted McDougal estimators of HIV incidence used in practice are actually:
where 𝒩̂2L and Ñ2L are analogous to (3) and (5), but with sens and spec replaced by estimates. If sens, spec, spec1, and spec2 are estimated unbiasedly, (7) and (8) are valid estimators of I. However, as a result of the need to estimate sens and spec, the estimators in (7) and (8) always will have greater variance than those in (2) and (4); that is,
Thus, although these adjusted incidence estimators are valid when sens, spec1, and spec2 are estimated unbiasedly, they always will be less precise than the unadjusted estimator Î for the model in Figure 1. It follows that valid 95% confidence intervals (CIs) for I based on these estimators will be wider than the corresponding 95% CI based on the unadjusted estimator Î.
To illustrate the precision of the various incidence estimators, we conducted a simulation study based on several choices for the prevalence and incidence rates, a random sample of N = 3000, and a Weibull distribution for an individual's time in state 2, with mean μ = 0.6, 0.5, or 0.4 years and standard deviation 0.6 years. For each of 2000 simulated samples, we generated counts (N1, N2, N3) and computed Î using (1), Î1 using (2), and Î2 using (4). To compute the estimated version of Î1, we estimated sens and spec by sampling from truncated normal distributions centered about the true values and with standard deviations of 2%, and used these estimates in (7). Similarly, the estimated version of Î2 was computed by generating estimates of sens, spec1, and spec2, and then using these in (8). Full results are available upon request. The results for a 10% prevalence rate and mean window period of 0.5 years are summarized in Table 1. For each of the 5 estimators, the average was close to the true underlying incidence rate, reflecting their validity. Also consistent with the theoretical results, the variances of the adjusted estimators were uniformly larger than that of the unadjusted estimator, and often substantially greater. The poorer precision of the adjusted estimators is also reflected in the expected widths of 95% symmetric CIs of I shown in Table 1, computed as ±1.96 SE, where SE denotes the standard error of the estimate. Different experimental settings (prevalence, incidence, mean window period, precision of estimates of sens, spec, spec1, and spec2) will lead to estimators with different SEs, but in every case the precision of the adjusted estimators will be poorer than that of the unadjusted estimator.
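A simplified version of such a simulation can be sketched in Python as follows. This is not the simulation used for Table 1: instead of generating Weibull times in state 2, it draws (N1, N2, N3) directly from the multinomial distribution with cell probabilities (φ, φIμ, 1 − φ − φIμ), which are the large-sample proportions given in Appendix 6 with p = 1. It also assumes the form N2/(μN1) for the unadjusted estimator, uses 500 rather than 2000 replicates to keep the run short, and covers only the unadjusted estimator. The function name, seed, and parameter values are illustrative assumptions.

```python
import random
from collections import Counter
from statistics import mean, stdev


def simulate_unadjusted(incidence, prevalence, mu, n=3000, reps=500, seed=0):
    """Monte Carlo mean and SD of the unadjusted estimator N2 / (mu * N1).

    Counts are drawn from a multinomial with cell probabilities
    (phi, phi*I*mu, 1 - phi - phi*I*mu), where phi = 1 - prevalence.
    This is a simplification of the natural-history simulation in the text.
    """
    rng = random.Random(seed)
    phi = 1.0 - prevalence
    p2 = phi * incidence * mu            # probability of being found in state 2
    weights = [phi, p2, 1.0 - phi - p2]  # states 1, 2, 3
    estimates = []
    for _ in range(reps):
        counts = Counter(rng.choices((1, 2, 3), weights=weights, k=n))
        estimates.append(counts[2] / (mu * counts[1]))
    return mean(estimates), stdev(estimates)


if __name__ == "__main__":
    avg, sd = simulate_unadjusted(incidence=0.04, prevalence=0.10, mu=0.5)
    print(f"mean estimate = {avg:.4f}, empirical SE = {sd:.4f}")
    print(f"approximate 95% CI half-width = {1.96 * sd:.4f}")
```

The empirical standard deviation across replicates plays the role of the SE used for the ±1.96 SE intervals described above.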
Moreover, unless the estimates of specificity and sensitivity are unbiased, as assumed in Table 1, the adjusted estimators of incidence will in general be biased. McDougal et al have cautioned on the challenges in obtaining good estimates of these quantities.5
Four-State Model Allowing for a Subpopulation That Indefinitely Remains Nonreactive
Several investigators have noted that some infected individuals can repeatedly test negative (nonreactive) on a less sensitive diagnostic test long after they have seroconverted. For example, Hargrove et al6 report a subject who repeatedly tested nonreactive on the BED assay more than 800 days after seroconversion, and Novitsky et al15 note 1 subject who repeatedly tested nonreactive on the Vironostika assay more than 695 days after seroconversion. If such subjects would never become reactive to the less sensitive test, then the model depicted in Figure 1 no longer applies and the statistical justification for the unadjusted estimator (1) no longer holds. That is, the unadjusted estimator in (1) may be biased by overestimating the number of subjects in the recent infection state. This was a motivation for the proposed adjusted incidence estimators; the need for adjustment here is also noted by Brookmeyer.9
To investigate the properties of the adjusted estimators of HIV incidence in this setting, and to provide a framework for developing an optimal estimator, we consider the expanded model in Figure 2. Here a proportion, 1 − p, of the population would, after seroconversion, never become reactive on the less sensitive test. The remaining population, in proportion p, would become reactive at some point after seroconversion. With this expanded model, μ denotes the mean window period for the subpopulation of infected individuals that would eventually become reactive with the less sensitive test; that is, μ is the conditional mean of L, given that an infected individual is in the subpopulation that will eventually become reactive to the less sensitive test. When p = 1, the model in Figure 2 reduces to the model in Figure 1.
Suppose subpopulation membership is independent of the risk of infection and subsequent HIV progression, and that the incidence density is constant for a period of time preceding the cross-sectional sample. Then (Appendix 4) the maximum likelihood estimator of I for the model in Figure 2 is
Îp = [N2 − (1 − p)(N2 + N3)]/(pμN1), (9)
provided the numerator is nonnegative, and zero otherwise. Note that this estimator reduces to the unadjusted estimator in (1) when p = 1. The estimator in (9) also arises using a mathematical modeling approach for estimating incidence developed by McWalter and Welte.7 Recognition that the McWalter-Welte7 estimator is also the maximum likelihood estimator of I for the model in Figure 2 ensures that, as the cross-sectional sample size becomes large, it will converge to the true incidence rate and be the most efficient (smallest variance) cross-sectional estimator for this setting.
In practice, p is not known exactly, and thus the estimator used in practice is (9) with p replaced by an estimator, say p̂; that is,
Îp̂ = [N2 − (1 − p̂)(N2 + N3)]/(p̂μN1), (10)
provided the numerator is nonnegative, and zero otherwise. Estimates of the variance of (9) and (10) are given in Appendix 4 and can be used to form CIs for I. Methods for estimating p are considered in the Discussion.
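As an illustration, the estimator for the model in Figure 2 can be sketched in Python as below. The algebraic form used, [N2 − (1 − p)(N2 + N3)]/(pμN1), is an assumption inferred from the probability limits stated in Appendices 5 and 6 and from the stated properties of (9) (reduction to the unadjusted estimator when p = 1, truncation at zero when the numerator is negative). The function name and example counts are hypothetical, and the sketch does not include the variance estimates of Appendix 4.

```python
def ml_incidence(n1: int, n2: int, n3: int, mu: float, p: float) -> float:
    """Incidence estimate allowing a fraction 1 - p that never becomes reactive.

    Assumed form: [N2 - (1 - p) * (N2 + N3)] / (p * mu * N1), set to zero
    when the numerator is negative. Pass an external estimate of p to obtain
    the plug-in version used in practice.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if n1 <= 0 or mu <= 0:
        raise ValueError("n1 and mu must be positive")
    numerator = n2 - (1.0 - p) * (n2 + n3)
    if numerator < 0:
        return 0.0
    return numerator / (p * mu * n1)


if __name__ == "__main__":
    # Hypothetical counts with N = 3000.
    n1, n2, n3, mu = 2700, 54, 246, 0.5
    print(ml_incidence(n1, n2, n3, mu, p=1.0))   # equals N2 / (mu * N1)
    print(ml_incidence(n1, n2, n3, mu, p=0.95))  # smaller once 1 - p is removed
```

In this sketch the expected number of long-term nonreactive subjects among the N2 + N3 infected, (1 − p)(N2 + N3), is subtracted from N2 before scaling, which is why the estimate drops when p < 1.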
When p < 1, it is easily verified from Appendix 1 that:
sens − spec1 ≥ 1 − p > 0 for p < 1. Therefore, the assumption made by Hargrove et al6 that sens − spec1 = 0 cannot hold for the model in Figure 2 with p < 1.
To illustrate the relative performance of the adjusted estimators of HIV incidence for the model in Figure 2, we conducted a simulation study using the same settings as in the previous simulation study, except that now a proportion 1 − p = 3%, 5%, or 7% of the population never become reactive on the less sensitive test. Full results are available upon request. The results for a 10% prevalence and μ = 0.5 are summarized in Table 2. Consistent with the theoretical results, the McDougal adjusted and maximum likelihood estimators are essentially unbiased, with the McDougal adjusted estimators being less precise than the maximum likelihood estimator.
Robustness of Estimators to the Assumed Model
The previous sections indicate that the optimal incidence estimator depends on whether or not all infected subjects become reactive to the less sensitive test. Below, we indicate how both the adjusted and unadjusted estimators of incidence can be biased when the wrong underlying model is assumed.
Suppose first that the unadjusted estimator is used, but that the correct model is that shown in Figure 2, for some p < 1. This might occur when the longitudinal study used to approximate μ and p had relatively few seroconverters, all of whom subsequently became reactive to the less sensitive test, thus resulting in an accurate estimate of μ but an estimated p of 1. For example, with p = 0.97 and 20 seroconverters, there is a 54% chance that none of the recent infections are from the subpopulation that will indefinitely remain nonreactive to the less sensitive test.
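The 54% figure can be checked with a one-line calculation, sketched here in Python under the assumption that the seroconverters belong to the eventually reactive subpopulation independently, each with probability p. The function name is hypothetical.

```python
def prob_all_eventually_reactive(p: float, n_seroconverters: int) -> float:
    """Probability that every one of n independent seroconverters belongs to
    the subpopulation that eventually becomes reactive (each with probability p)."""
    return p ** n_seroconverters


if __name__ == "__main__":
    # p = 0.97 and 20 seroconverters, as in the example in the text.
    print(round(prob_all_eventually_reactive(0.97, 20), 3))  # about 0.544
```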
It is shown in Appendix 6 that the unadjusted estimator (1) in this setting converges to
pI + (1 − p)(1 − φ)/(φμ)
as N gets large, where 1 − φ is the HIV prevalence rate at the time of the cross-sectional sample. The constraint that the prevalence in state 2 is less than the overall infection prevalence implies that I < (1 − φ)/(φμ), which in turn ensures that this limit always exceeds the true incidence I. For example, when p = 95%, μ = 0.5 years, the prevalence is 10%, and the true incidence rate is 2%, 4%, and 10%, the unadjusted estimator will actually converge to 3.01%, 4.91%, and 10.6%, respectively.
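The numerical example above can be reproduced with the short Python sketch below. It assumes that the limit has the form pI + (1 − p)(1 − φ)/(φμ), obtained by dividing the probability limit of N2/N given in Appendix 6 by μ times that of N1/N; the function name is hypothetical.

```python
def unadjusted_limit(incidence: float, prevalence: float, mu: float, p: float) -> float:
    """Large-sample limit of the unadjusted estimator when a fraction 1 - p
    of infected subjects never becomes reactive (model in Figure 2).

    Assumed form: p*I + (1 - p)*(1 - phi)/(phi*mu), with phi = 1 - prevalence.
    """
    phi = 1.0 - prevalence
    return p * incidence + (1.0 - p) * (1.0 - phi) / (phi * mu)


if __name__ == "__main__":
    # Settings from the text: p = 0.95, mu = 0.5 years, prevalence 10%.
    for true_incidence in (0.02, 0.04, 0.10):
        limit = unadjusted_limit(true_incidence, prevalence=0.10, mu=0.5, p=0.95)
        print(f"true I = {true_incidence:.2%}, limit = {limit:.2%}")
```

The excess over the true incidence is (1 − p)[(1 − φ)/(φμ) − I], so in absolute terms the overestimate is largest when the true incidence is low.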
Next suppose that the estimator (9) is used with some values, say p0 and μ0, for p and μ, but that in fact the model in Figure 1 holds. This might arise because the longitudinal study used to approximate μ and p did not follow infected subjects for sufficient time, resulting in some not yet becoming reactive to the less sensitive test, and hence leading to underestimating μ by μ0 and p (= 1) by p0 < 1. It is shown in Appendix 6 that the adjusted estimator in (9) converges to
[Iμ − (1 − p0)(1 − φ)/φ]/(p0μ0).
The adjusted estimator will thus be biased in this setting, and could either underestimate or overestimate the true incidence I. To illustrate using an example suggested by a referee, suppose that p = 1 and L has an exponential distribution with mean μ = 1, and that the cohort study used to approximate p and μ only followed subjects for 3 years. This would, on average, lead to p0 = 0.95 and μ0 = 0.95. With a prevalence of 10% (φ = 0.9) and true incidence rates of 2%, 4%, and 8%, the adjusted incidence estimator (9) will converge to 1.60%, 3.82%, and 8.25%, respectively.
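The figures in this example can be reproduced with the Python sketch below. It assumes that the limit has the form [Iμ − (1 − p0)(1 − φ)/φ]/(p0μ0), obtained by substituting the large-sample proportions under the model in Figure 1 into the assumed form of estimator (9). The values p0 = 0.95 and μ0 = 0.95 are taken directly from the text rather than computed, and the function name is hypothetical.

```python
def ml_limit_under_figure1(incidence: float, prevalence: float, mu: float,
                           p0: float, mu0: float) -> float:
    """Large-sample limit of estimator (9) computed with assumed values
    (p0, mu0) when in fact every infected subject eventually becomes reactive.

    Assumed form: [I*mu - (1 - p0)*(1 - phi)/phi] / (p0*mu0),
    with phi = 1 - prevalence and mu the true mean window period.
    """
    phi = 1.0 - prevalence
    return (incidence * mu - (1.0 - p0) * (1.0 - phi) / phi) / (p0 * mu0)


if __name__ == "__main__":
    # Settings from the text: true mu = 1, p0 = mu0 = 0.95, prevalence 10%.
    for true_incidence in (0.02, 0.04, 0.08):
        limit = ml_limit_under_figure1(true_incidence, 0.10, 1.0, 0.95, 0.95)
        print(f"true I = {true_incidence:.2%}, limit = {limit:.2%}")
```

The subtraction in the numerator pulls the limit down while division by p0μ0 < μ pushes it up, which is why the estimator underestimates at the lower incidence rates and overestimates at 8%.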
When every infected individual would eventually become reactive to the less sensitive test (Fig. 1), the unadjusted cross-sectional incidence estimator (equation 1) will always be more precise than the adjusted incidence estimators and does not require external estimates of sensitivity and specificity parameters. On the other hand, when the less sensitive test would never become reactive in a subset of the population (Fig. 2), the most efficient estimation of incidence is given by the maximum likelihood estimator (9), which coincides with the estimator developed by McWalter and Welte,7 using mathematical models. This estimator requires an external estimate of the size of this subpopulation (1 − p), and the precision of this external estimate needs to be accounted for to obtain valid CIs for the underlying incidence rate I.
When the assumptions about the underlying model (Fig. 1 versus Fig. 2) are wrong, both the unadjusted and adjusted incidence estimators can be biased. Thus, a key issue is how to determine which model reflects the practical setting under consideration. Because the model in Figure 1 is the special case of the model in Figure 2 corresponding to p = 1, determining the correct model can be viewed as accurately estimating p. Estimation of sens, spec, spec1, and spec2 typically relies on external information from longitudinal studies that identify time of seroconversion with reasonable accuracy; such studies are relatively hard to conduct. In contrast, estimation of p can be achieved by an appropriate external cross-sectional sample of subjects known to have been infected for longer than Lmax, the maximum possible time from seroconversion until reactivity with the less sensitive test among infected subjects that eventually become reactive (McWalter and Welte, unpublished manuscript). Although stored samples from individuals who have been infected for long periods are abundant, obtaining an appropriate external sample is challenging because the sample must be representative of the population from which the cross-sectional sample (N1, N2, N3) is obtained and, as a referee pointed out, this requires knowledge of Lmax, which is not well characterized for a variety of experimental settings. For the BED capture enzyme immunoassay, Bärnighausen et al16 examine the stability of estimates of 1 − p for possible values of Lmax varying from 250 to 400 days, using nested samples of subjects obtained from a surveillance study in South Africa. Their results show no evidence of a decreasing rate of nonreactive subjects, suggesting that Lmax may be smaller than these candidate values.
An alternative approach is to initially estimate the proportion of nonreactive subjects from a random sample known to be infected for more than a candidate value of Lmax, say 1 year, and to then test samples from the nonreactive subjects at later times (say, .5, 1, and 1.5 years later) to see if they remain nonreactive. Decreasing numbers of nonreactive samples would be evidence that a larger assumed value of Lmax is needed. Another approach is to test independent samples of subjects known to have been infected for differing time periods (say, 1, 1.5, 2, and 2.5 years) and assess the stability of the proportions who are nonreactive.
We have assumed that the cross-sectional sample was drawn from an apparently healthy (asymptomatic) population. Biased incidence estimates can result from a sample drawn from a more general population that includes persons with late stage HIV infection because of their increased risk of death13 and because some of these individuals may previously have been reactive to the less sensitive assay but became nonreactive with advancing HIV infection. For such samples, valid estimates of incidence can be obtained when based only on asymptomatic individuals.13
Use of antiretroviral treatments (ARTs) by individuals in the cross-sectional sample also can bias incidence estimates because ARTs suppress assay SOD levels, and thus can make a previously reactive individual non-reactive to the less sensitive assay. Currently, ART initiation occurs several years following seroconversion, typically more than Lmax time units. In such settings, the unadjusted estimator based on the model in Figure 1 will remain valid if all sampled individuals taking ARTs are regarded as being in state 3, regardless of their reactivity to the less sensitive test. However, appropriate modification of the adjusted estimates of incidence based on the model in Figure 2 is more complicated because the impact of ARTs on SOD levels might be different in the 2 subpopulations. ART use that occurs relatively soon after seroconversion and unrecognized ART use among the N sampled subjects further complicates the estimation of incidence. More research in this area is needed.
Throughout this article, we have assumed that the cross-sectional sample was a random sample from the population of interest. The importance of a random sample cannot be overemphasized, as use of a “convenience” or other nonrandom sample can distort results in ways that are usually impossible to quantify. We also have assumed that the mean window period μ is known. However, when using cross-sectional methods to estimate incidence, an estimate of μ must be used. When a biased estimate of μ is used, all estimators of HIV incidence discussed in this article would, in general, be biased. This underscores the need for carefully conducted and analyzed longitudinal cohort studies to estimate μ for different less sensitive tests and the identification of virologic, demographic, and other factors that might affect μ.1,3-6 An alternative to the standard approach of using an external estimator of μ is to employ an augmented cross-sectional design in which the subset of subjects found to be in the recent infection state are followed forward in time for reactivity to the less sensitive test; Wang and Lagakos18 demonstrate how this can be used to obtain internal estimators of the mean window period and proportion, 1 − p, of the population that would never become reactive to the less sensitive test.
We are grateful to the referees for their helpful comments.
1. Lagakos SW, Gable A, eds. Methodological Challenges in HIV Biomedical Prevention Trials. Washington, DC: Institute of Medicine, National Academy Press; 2008.
2. Janssen RS, Satten GA, Stramer SL, et al. New testing strategy to detect early HIV-1 infection for use in incidence estimates and for clinical and prevention purposes. JAMA
3. Karita E, Price M, Hunter E, et al. Investigating the utility of the HIV-1 BED capture enzyme immunoassay using cross-sectional and longitudinal seroconverter specimens from Africa. AIDS
4. Sakarovitch C, Rouet F, Murphy G, et al. Do tests devised to detect recent HIV-1 infection provide reliable estimates of incidence in Africa? J Acquir Immune Defic Syndr
5. McDougal JS, Parekh BS, Peterson ML, et al. Comparison of HIV type I incidence observed during longitudinal follow-up with incidence estimated by cross-sectional analysis using the BED capture enzyme immunoassay. AIDS Res Hum Retroviruses
6. Hargrove JW, Humphrey JH, Mutasa K, et al. Improved HIV-1 incidence estimates using the BED capture enzyme immunoassay. AIDS
7. McWalter TA, Welte A. Relating recent infection prevalence to incidence with a sub-population of non-progressors. J Math Biol. 2009. [Epub ahead of print] PMID: 19633854.
8. Welte A, McWalter TA, Bärnighausen T. A simplified formula for inferring HIV incidence from cross-sectional surveys using a test for recent infection. AIDS Res Hum Retroviruses
9. Brookmeyer R. Should biomarker estimates of HIV incidence be adjusted? AIDS
10. Constantine NT, Sill AM, Jack N, et al. Improved classification of recent HIV-1 infection by employing a two-stage sensitive/less-sensitive test strategy. J Acquir Immune Defic Syndr
11. Parekh BS, Kennedy MS, Dobbs T, et al. Quantitative detection of increasing HIV type 1 antibodies after seroconversion: a simple assay for detecting recent HIV infection and estimating incidence. AIDS Res Hum Retroviruses
12. Kaplan EH, Brookmeyer R. Snapshot estimators of recent HIV incidence rates. Oper Res
13. Balasubramanian R, Lagakos SW. Estimating HIV incidence based on combined prevalence testing. Biometrics. 2009. [Epub ahead of print] PMID: 19397583.
14. Brookmeyer R, Quinn TC. Estimation of current human immunodeficiency virus incidence rates from a cross-sectional survey using early diagnostic tests. Am J Epidemiol
15. Novitsky V, Wang R, Kebaabetswe L, et al. Better control of early viral replication is associated with slower rate of elicited antiviral antibodies in the detuned EIA during primary HIV-1C infection. J Acquir Immune Defic Syndr
16. Bärnighausen T, Wallrauch C, Welte A, et al. HIV incidence in rural South Africa: comparison of estimates from longitudinal surveillance and cross-sectional cBED assay testing. PLoS One
17. Cox D, Hinkley D. Theoretical Statistics. London, United Kingdom: Chapman and Hall; 1974.
18. Wang R, Lagakos SW. Augmented cross-sectional prevalence testing for estimating HIV incidence. Biometrics. In press.
In what follows, we make the assumptions used in Balasubramanian and Lagakos13 for the model in Figure 1; namely, that time in state 2 is independent of time in state 1, and that the density function, f (u), for HIV infection is constant, say f (u) = f, for a period Lmax before the time, t, at which the cross-sectional sample is drawn. In this interval, f = φI, where 1 − φ is the prevalence at time t. We let E+ (E−) denote that the ELISA is reactive (nonreactive), let LS+ (LS−) denote that the less sensitive test is reactive (nonreactive), and let G(x) denote the cumulative distribution function of time in state 2.
APPENDIX 1. EXPRESSIONS FOR SENS, SPEC, SPEC1, and SPEC2 FOR MODEL IN Figure 1
For an individual found to be infected, let Y denote the elapsed time from seroconversion until the time of the test. Let Lmax denote the upper limit of support for L. For any constants a, b satisfying 0 ≤ a < b < Lmax,
Using similar arguments when b > Lmax, we get,
An expression for sens is obtained by setting a = 0 and b = μ in (A.1), yielding
and 1 − spec is obtained by setting a = μ and b = ∞ in (A.2), yielding
Using (A.1) with a = μ and b = 2μ gives 1 − spec1, and using (A.2) with a = 2μ and b = ∞ gives 1 − spec2. This yields:
It is easily verified that the overall specificity is related to spec1 and spec2 by
APPENDIX 2. EXPECTATION AND CONVERGENCE OF 𝒩̂2L AND Ñ2L; CONVERGENCE OF Î1 AND Î2 FOR MODEL IN FIGURE 1
Substituting the expressions for sens and spec into (3), we have
Rewrite (3) as
Dividing the numerator and denominator of (2) by N and letting N → ∞, the probability limit of (2) is
Substituting the expressions for sens and spec shows that this limit equals I whenever spec < 1. When spec = 1, the limit also equals I when sens = 1.
Dividing the numerator and denominator of (4) by N and letting N → ∞, it follows that the probability limit of Î2 is I.
APPENDIX 3. PROOF THAT VAR (Î1) > VAR (Î) and VAR (Î2) > VAR (Î) FOR MODEL IN FIGURE 1
where the last inequality holds because
The same approach yields Var (Î2) > Var (Î) by showing that (spec2)² > (sens − spec1 + 2spec2 − 1)². This can be verified by substituting (A.3), (A.5), and (A.6) into this expression and simplifying.
APPENDIX 4. MAXIMUM LIKELIHOOD ESTIMATOR OF HIV INCIDENCE RATE FOR MODEL IN FIGURE 2
The likelihood function for (I, φ) based on (N1, N2, N3) can be written as:
From (A.9), the log of the likelihood is
from which one obtains the partial derivatives:
Setting these to zero and solving gives maximum likelihood estimators:
The matrix of negative second derivatives of ℓ(φ, I) with φ and I replaced by their maximum likelihood estimates, can be shown to have (1,1), (1,2), and (2,2) elements
Estimated variances for Î and φ̂ are obtained as the diagonals of the inverse of this matrix. This gives
Regularity conditions for the asymptotic properties of the maximum likelihood estimator hold.17
When p is estimated externally, note that (N1, N2, N3) ∼ Multinomial (N; φ, p2, p3), where
We use the fact that
APPENDIX 5: LIMIT OF ADJUSTED ESTIMATORS FOR MODEL IN FIGURE 2
Under the 4-state model in Figure 2, the probability limits of N2/N and N1/N are φIpμ + (1 − p)(1 − φ) and φ, respectively. Let sens*, spec*, spec*1, and spec*2 denote the algebraic expressions for sens, spec, spec1, and spec2 under the model in Figure 1, given in (A.3), (A.4), (A.5), and (A.6), respectively. It is easily verified that
sens = sens* · p + (1 − p)
spec = spec* · p
spec1 = spec*1 · p
spec2 = spec*2 · p
where the last step follows from Appendix 2. Using similar arguments, the probability limit of Î2 equals I.
Note that although McDougal et al5 allude to the assumption that Lmax is less than 2μ, the McDougal adjusted formula Î2 remains valid regardless of whether this assumption holds or not. Also, under the setting where Lmax <2μ, it can be verified that sens − spec1 = 1 − p and spec2 = p. Using these relations, it is easily verified that Î2 reduces to the unadjusted estimator Î for the model in Figure 1 and to the maximum likelihood estimator Îp for the model in Figure 2.
APPENDIX 6: LIMITS OF UNADJUSTED AND MAXIMUM LIKELIHOOD ESTIMATORS WHEN ASSUMED MODEL DOES NOT HOLD
Under the 4-state model in Figure 2, the probability limits of N2/N and N1/N are φIpμ + (1 − p)(1 − φ) and φ, respectively. The probability limit of the unadjusted estimator is therefore
[φIpμ + (1 − p)(1 − φ)]/(φμ) = pI + (1 − p)(1 − φ)/(φμ).
Now suppose the model in Figure 1 holds, and that the cohort study used to estimate p and μ only follows subjects until time τ < Lmax, and incorrectly assumes that any infected person who has not yet become reactive on the less sensitive test will never become reactive. Let μ0 and p0 denote the expectations of the resulting (under) estimates of μ and p, respectively; that is, μ0 = E(L|L ≤ τ), and p0 = G (τ). Then as the size of the cross-sectional sample grows large, the adjusted estimator (9) converges in probability to
[φIμ − (1 − p0)(1 − φ)]/(p0μ0φ) = [Iμ − (1 − p0)(1 − φ)/φ]/(p0μ0).