#### Introduction

HIV incidence, the rate at which new infections are occurring in populations, measures the current growth of the epidemic. Knowledge of incidence rates leads to more effective and targeted intervention efforts. Incidence rates are important when designing studies to evaluate prevention programs, for example, in sample size determinations, and indeed, the lack of accurate estimates has been identified as one reason for the failure of several large-scale HIV prevention trials ^{[1]}. There are enormous challenges to obtaining reliable estimates of HIV incidence in populations. Recent progress in the use of novel biomarkers in cross-sectional samples is an exciting methodological development that offers the promise of addressing these challenges. The basic idea is to use a biomarker to identify persons recently infected. The approach requires only a cross-sectional sample of persons from whom serum specimens are collected at a single point in time, which is in contrast to the cohort approach that requires that multiple specimens from participants be collected longitudinally over time.

The biomarker approach was first suggested in connection with an assay for P24 antigenemia ^{[2]}. This particular biomarker necessitates large sample sizes to obtain statistically reliable estimates of incidence because the durations that persons are P24 antigenemic are relatively short. Subsequently, the Serologic Testing Algorithm for Recent HIV Seroconversion (STARHS) was developed ^{[3]}, which consists of a dual antibody testing system in which persons are tested with a standard HIV antibody assay and those who are positive are then tested with a second, less sensitive assay. Persons who are positive on the first assay and negative on the second assay are said to be in the ‘window’ period. The second assay that is currently in widespread use is the BED HIV-1 capture enzyme immunoassay ^{[4]}. A person is in the ‘window period’ if they are confirmed positive on the standard enzyme immunoassay (the first assay) and negative on the BED assay (the second assay).

The BED approach was the basis for the recently reported figure that approximately 56 000 new infections occurred in the United States in 2006, which was higher than the previous estimate of 40 000 ^{[5,6]}. That statistical revision raises the questions of whether the increase is due to changes in the methodology or changes in the underlying incidence, and whether the new (BED) method is any more accurate than previous methods. UNAIDS had actually issued a cautionary statement that the BED assay should be used neither for estimating incidence nor for monitoring trends because of reports from African and Asian settings that the BED approach overestimates rates ^{[7]}.

The reports that the BED approach overestimates HIV incidence when compared to cohort studies have led to proposals for statistical adjustments ^{[8,9]}. These adjustments purportedly correct for ‘misclassification’ that arises because BED does not always accurately distinguish recent from past infection; of particular concern is that some long-standing infections have been found in the window period. Yet, such statistical adjustment procedures have not been universally adopted ^{[10]}. For example, the US incidence estimate was not adjusted for misclassification, which raises the question of whether it was an overestimate because it was not adjusted. This article evaluates the appropriateness of statistical adjustments of biomarker incidence estimates for misclassification with respect to the timing of infection.

#### Statistical framework

The biomarker approach to estimating incidence from a cross-sectional survey relies on the epidemiological relationship that the prevalence of a condition is equal to the incidence multiplied by the mean duration of that condition. Here the ‘condition’ refers to the ‘window period’. The window period is not a fixed number but rather has a probability distribution. We use the symbol *μ* for the mean (or expected value) of the window period. Persons can have window periods either shorter or longer than the mean *μ*. The probability that the window period is greater than *t* days, the survival function, is called *S*(*t*). The biomarker approach assumes that the expected number of new infections occurring in the population per unit time is approximately constant for the most recent period up to the maximum possible window period *M*; there have been no reports that the BED window period is longer than about *M* = 3 years. If *g* is the probability density of incident infection (i.e., the expected number of new infections per day divided by the population size), which is assumed constant over the past *M* years, then the probability that an individual is in the window period at the time of recruitment into the cross-sectional sample is

$$P(W) = \int_0^M g\,S(t)\,dt \qquad \text{(Equation 1)}$$

The mean window period is mathematically equal to the integrated survival function, that is,

$$\mu = \int_0^M S(t)\,dt$$

which is a relationship we will use several times in this article. Thus, the right side of (Equation 1) is equal to *g μ*. Using that fact and dividing (Equation 1) by the proportion of persons who are either uninfected or in the window period, we obtain

$$I = \frac{p}{\mu} \qquad \text{(Equation 2)}$$

where *I* is the incidence rate and *p* is the proportion of persons that are in the window period from among persons who are either negative on the standard enzyme immunoassay or in the window period. If the mean *μ* is expressed in days, then *I* is the fraction of the uninfected population that becomes infected per day. (Equation 2) is the basis of the biomarker approach for estimating incidence, which has also been termed a snapshot approach because it requires only a single cross-sectional sample ^{[11]}. The approach relies on an unbiased estimate of the mean window period. Statistical approaches have been proposed for incorporating uncertainty in the mean window period into confidence intervals for the incidence rate including an analytic procedure ^{[12]} or a Monte Carlo procedure ^{[13]}. It is assumed that persons with advanced HIV disease (AIDS) are removed from the cross-sectional sample before calculating incidence from (Equation 2) ^{[10]}.
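The snapshot calculation can be sketched in a few lines of Python. This is only an illustration of the relationship *I* = *p*/*μ* described above: the function name, its arguments, and the counts in the example are hypothetical and are not taken from any study.

```python
def snapshot_incidence(n_window, n_negative, mean_window_days):
    """Annual incidence from one cross-sectional sample, using I = p / mu."""
    if mean_window_days <= 0:
        raise ValueError("mean window period must be positive")
    # p: share in the window among persons who are seronegative or in the window
    p = n_window / (n_window + n_negative)
    per_day = p / mean_window_days      # fraction of the uninfected infected per day
    return per_day * 365.0              # rescaled to a per-year rate


# Hypothetical counts and an assumed mean window period of 187 days
annual = snapshot_incidence(n_window=40, n_negative=7960, mean_window_days=187)
print(f"estimated incidence: {100 * annual:.2f}% per year")
```

Persons with advanced HIV disease are assumed to have been removed from the counts before the function is called, as the text requires.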

#### Evaluation of statistical adjustments

Two adjustment procedures have been proposed to account for misclassification that purportedly arises if persons not infected ‘recently’ are in the window period (the false positives) or if persons infected ‘recently’ are not in the window period (the false negatives). These adjustment procedures define ‘recent infection’ as an infection that occurred within the past *μ* days, where *μ* is the mean window period. The procedures use sensitivity and specificity corrections to adjust the numbers in the window to obtain a ‘corrected’ number infected within the past *μ* days.

##### The McDougal adjustment

McDougal *et al*. derive a correction factor (see (Equation 2) in ^{[8]}) which is the ratio *P*(*T*_{0})/*P*(*W*), where *P*(*T*_{0}) is the probability a person was infected within the past *μ* days and *P*(*W*) is the probability of being in the window. An important point is that the McDougal approach makes a distinction between *P*(*T*_{0}) and *P*(*W*). However, it is a mathematical fact that the frequency (probability) of false negatives is exactly equal to the frequency of false positives. We prove this remarkable fact by the following mathematical argument. The false-negative probability (FN) is the probability of being infected within the last *μ* days and not being in the window period. The false-positive probability (FP) is the probability of being in the window period and having been infected more than *μ* days ago. The fact that FN = FP is true because

$$\mathrm{FN} = \int_0^{\mu} g\,[1 - S(t)]\,dt = g\mu - g\int_0^{\mu} S(t)\,dt = g\int_0^{M} S(t)\,dt - g\int_0^{\mu} S(t)\,dt = g\int_{\mu}^{M} S(t)\,dt = \mathrm{FP}$$

Our result that FN = FP is illustrated in Fig. 1, which shows two circles that represent the probability of being in the window period, *P*(*W*), and the probability of being infected within the past *μ* days, *P*(*T*_{0}). The two regions in which the circles do not overlap are the false-negative and false-positive probabilities and these two regions are of exactly equal size, implying that the two circles themselves are of exactly equal size, that is, *P*(*W*) = *P*(*T*_{0}). Although such a result does not hold in general, it is true in this context in which the ‘screening test’ is the window period and that screening test is being used to classify persons as infected within the past *μ* days or not.

The fact that *P*(*W*) is equal to *P*(*T*_{0}) can also be proved simply and directly as follows. The probability a person was infected within the most recent *μ* days is just the rate at which infections occur in the population, *g*, multiplied by the duration of the time interval, which is *μ*; that argument implies that *P*(*T*_{0}) is equal to *g μ*. But *P*(*W*) is also equal to *g μ* [see (Equation 1)], thus proving that *P*(*W*) = *P*(*T*_{0}).

Thus, the McDougal correction factor, which is the ratio *P*(*T*_{0})/*P*(*W*), is theoretically equal to 1. The adjustment for false negatives exactly counterbalances the adjustment for false positives. The net effect of the McDougal adjustment is zero.
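The equality of false negatives and false positives can be checked numerically. The sketch below assumes, purely for illustration, a survival function of the form *S*(*t*) = (1 − *t*/*M*)^*K* on [0, *M*], a maximum window of 1095 days, and an arbitrary daily infection density *g*; none of these values comes from the article or from BED data. Any survival function that reaches 0 by *M* should give the same conclusion.

```python
def integrate(f, a, b, n=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


M = 1095.0   # assumed maximum window period (3 years), in days
K = 5.0      # assumed shape of the illustrative survival curve
g = 1e-4     # assumed constant daily density of new infections


def S(t):
    """Assumed probability that the window period exceeds t days."""
    return (1.0 - t / M) ** K if t < M else 0.0


mu = integrate(S, 0.0, M)          # mean window period = integrated survival function
p_window = g * mu                  # P(W), the probability of being in the window

# FN: infected within the last mu days but no longer in the window
false_neg = g * integrate(lambda t: 1.0 - S(t), 0.0, mu)
# FP: infected more than mu days ago but still in the window
false_pos = g * integrate(S, mu, M)

# P(T0) is the window probability with false positives removed and false negatives added
p_recent = p_window - false_pos + false_neg
correction_factor = p_recent / p_window

print(f"mu = {mu:.1f} days, FN = {false_neg:.6f}, FP = {false_pos:.6f}")
print(f"P(T0)/P(W) = {correction_factor:.6f}")
```

Under these assumptions the two misclassification probabilities agree to within the integration error, so the ratio *P*(*T*_{0})/*P*(*W*) is 1 up to that error.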

A subtle point is that although the probabilities *P*(*W*) and *P*(*T*_{0}) are exactly equal, the particular persons who comprise the group in the window period are not exactly the same as the persons who comprise the group infected in the last *μ* days. In other words, although the sizes of the groups are exactly the same, the members of the groups are not. Accordingly, the assay for the window period should not be used to make individual determinations (diagnoses) as to whether or not a particular person was infected recently; rather it is used to obtain an aggregate number of persons in the window period, which is then used to estimate the incidence rate in the population.

##### The Hargrove adjustment

Hargrove *et al*. ^{[9]} developed a simplified version of the McDougal adjustment using an additional assumption that the sensitivity of the BED is equal to its specificity among persons infected between *μ* and 2 *μ* days earlier [see (Equation 3) in ^{[9]}]. They develop an adjustment that has an input factor *ϵ*, which is the probability of being in the window period if infected at least 2 *μ* days earlier; Hargrove *et al*. ^{[9]} use *ϵ* = 0.053. Larger (smaller) values of *ϵ* lead to larger (smaller) downward adjustments to the incidence rate.

However, in the following section (Inconsistency of the Hargrove adjustment), we show that the mathematical implication of the Hargrove assumptions is that *ϵ* is 0. That is, there is a fundamental mathematical inconsistency in the Hargrove formula with any nonzero value for *ϵ* and that inconsistency can produce very anomalous results.

We illustrate the anomalies of the Hargrove adjustments with a numerical example. Consider two communities *A* and *B* with the same current rates of HIV incidence and population sizes. Suppose the HIV incidence rates in *A* were 0 up until 3 years ago. The epidemic in *B* occurred in two waves. In the first wave, there was a burst of infections that occurred 5 years earlier, resulting in 20% of the population of *B* becoming infected and thereby creating a pool of prevalent long-standing infections. In the second wave, the HIV incidence rates in *B* were exactly the same as in community *A* during the past 3 years. Cross-sectional samples of 10 000 individuals are taken in both communities. Table 1 shows the numbers of persons in the sample that were uninfected, infected, and in the window period, along with the Hargrove adjusted and unadjusted rates (see Table 1 footnote for explanation). We assume that no persons from *B* who were infected 5 years earlier are currently in the window period (because there are no data suggesting window periods are longer than about 3 years). The unadjusted incidence rates were 3.6% per year in both communities. The Hargrove adjusted incidence rates in *A* and *B* were 3.5 and 0.7% per year, respectively. Thus, the Hargrove procedure yields an anomalous result because the adjusted rate in *A* is about five times that in *B* even though the current HIV incidence rates are equal in the two communities.

Why does the Hargrove adjustment produce such an anomalous result? The Hargrove adjustment is subtracting off a fraction of the long-standing HIV prevalent infections from the numbers in the window period. But because these long-standing infections are not in the window period (they were infected 5 years earlier, and there are no data suggesting window periods extend that long), the procedure is incorrectly deflating the numbers in the window period. As the pool of long-standing prevalent infections becomes larger, the Hargrove adjusted incidence will become increasingly more biased downwards. In fact, the Hargrove adjustment procedure produces negative estimates of the HIV incidence rates in many plausible settings. For example, the Hargrove adjustment (with *ϵ* = 0.053) will give a negative estimate of incidence if the true HIV incidence rate is 1% per year and if the percentage of the population with long-standing prevalent infections of greater than 3 years duration is 10% or more. In effect, the Hargrove adjustment reduces to assuming that a fraction of infected persons (*ϵ*) remain in the window period indefinitely. Yet there are no data to indicate that the window period extends beyond approximately 3 years.
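The sign change described above can be illustrated with a rough calculation. The sketch below is not the published Hargrove formula; it reproduces only the subtraction step as the text describes it (a fraction *ϵ* of infected persons removed from the window count). The population size, the 187-day mean window, and the simplification that only long-standing infections and persons currently in the window count as infected are all assumptions made for illustration.

```python
EPSILON = 0.053                 # value the text attributes to Hargrove et al.
MEAN_WINDOW_YEARS = 187 / 365   # assumed mean window period


def adjusted_window_count(n_window, n_infected, epsilon=EPSILON):
    """Window count after subtracting an assumed share epsilon of all infections."""
    return n_window - epsilon * n_infected


# Hypothetical community: 10% long-standing infections, true incidence 1% per year
population = 10_000
long_standing = 0.10 * population
susceptible = population - long_standing

# Expected number in the window: susceptibles x incidence x mean window duration
n_window = susceptible * 0.01 * MEAN_WINDOW_YEARS
# Simplification: other infections from the past 3 years are ignored here;
# including them would only make the subtraction larger.
n_infected = long_standing + n_window

corrected = adjusted_window_count(n_window, n_infected)
print(f"in window: {n_window:.1f}, after subtraction: {corrected:.1f}")
```

Because the subtracted term scales with the pool of prevalent infections rather than with current incidence, the corrected count in this hypothetical setting falls below zero, and any incidence estimate built on it would be negative.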

##### Inconsistency of the Hargrove adjustment

Hargrove *et al*. ^{[9]} define the sensitivity (sen) as the conditional probability *P*(*W*|*T*_{0}) of being in the window (*W*) if the person was infected within the last *μ* days (*T*_{0}):

$$\mathrm{sen} = P(W \mid T_0) = \frac{1}{\mu}\int_0^{\mu} S(t)\,dt$$

Hargrove *et al*. ^{[9]} define spec1 to be the specificity among persons infected between *μ* and 2 *μ* days ago. Let *T*_{1} represent the event that an individual was infected between *μ* and 2 *μ* days ago. We have

$$1 - \mathrm{spec1} = P(W \mid T_1) = \frac{1}{\mu}\int_{\mu}^{2\mu} S(t)\,dt$$

Let *T*_{2} represent the event that an individual was infected between 2 *μ* and *M* days ago, then the probability that such an individual is in the window is

$$P(W \mid T_2) = \frac{1}{M - 2\mu}\int_{2\mu}^{M} S(t)\,dt \qquad \text{(Equation 3)}$$

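We can express the mean as a sum of integrals over three intervals because

$$\mu = \int_0^{M} S(t)\,dt = \int_0^{\mu} S(t)\,dt + \int_{\mu}^{2\mu} S(t)\,dt + \int_{2\mu}^{M} S(t)\,dt$$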

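Dividing the above equation by *μ*, we have

$$1 = \mathrm{sen} + (1 - \mathrm{spec1}) + \frac{1}{\mu}\int_{2\mu}^{M} S(t)\,dt \qquad \text{(Equation 4)}$$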

The Hargrove adjustment assumes that sen = spec1, which implies from (Equation 4)

$$\int_{2\mu}^{M} S(t)\,dt = 0$$

It follows [from (Equation 3)] that *P*(*W*|*T*_{2}) = 0, which implies *ϵ* = 0 (*ϵ* is the conditional probability of being in the window period if infected more than 2 *μ* days ago). Thus, the Hargrove assumption that sen = spec1 implies *ϵ* = 0. Any nonzero value for *ϵ* is mathematically incompatible with the Hargrove assumptions. The Hargrove adjustment uses *ϵ* = 0.052.
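The identity behind this argument, 1 = sen + (1 − spec1) + [(*M* − 2*μ*)/*μ*] *P*(*W*|*T*_{2}), can be verified numerically. The sketch below again assumes an illustrative survival function *S*(*t*) = (1 − *t*/*M*)^*K* with *M* = 1095 days; the functional form and parameter values are assumptions chosen for convenience, not estimates from BED data.

```python
def integrate(f, a, b, n=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


M = 1095.0   # assumed maximum window period, in days
K = 5.0      # assumed shape of the illustrative survival curve


def S(t):
    """Assumed probability that the window period exceeds t days."""
    return (1.0 - t / M) ** K if t < M else 0.0


mu = integrate(S, 0.0, M)                                # mean window period

sen = integrate(S, 0.0, mu) / mu                         # P(W | T0)
spec1 = 1.0 - integrate(S, mu, 2.0 * mu) / mu            # 1 - P(W | T1)
p_w_t2 = integrate(S, 2.0 * mu, M) / (M - 2.0 * mu)      # P(W | T2)

# The three pieces of the integrated survival function, each divided by mu
total = sen + (1.0 - spec1) + (M - 2.0 * mu) / mu * p_w_t2
# Rearranged: spec1 - sen equals the tail term, so sen = spec1 forces the tail to 0
tail_term = (M - 2.0 * mu) / mu * p_w_t2

print(f"sen = {sen:.4f}, spec1 = {spec1:.4f}, P(W|T2) = {p_w_t2:.4f}")
print(f"sum of terms = {total:.6f}, spec1 - sen = {spec1 - sen:.4f}")
```

In this illustration *P*(*W*|*T*_{2}) is positive and sen differs from spec1 by exactly the tail term, which is the point of the argument: a nonzero *ϵ* and the assumption sen = spec1 cannot hold together.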

#### Why cohort and biomarker incidence estimates may not agree

There have been several reports, especially from Africa, that cohort estimates of incidence are lower than biomarker (BED) estimates ^{[7,14]}. These reports raise the questions of why cohort estimates do not agree with the biomarker estimates, and whether cohort estimates should be regarded as the gold standard for assessing the accuracy of biomarker estimates.

We consider two sources of error that could result in cohort studies yielding biased estimates of incidence rates of a population. The first concerns selection bias into cohorts that results if persons who would agree to participate in cohort studies, the ‘compliers,’ have different risks of infection than other persons. For example, compliers may be less mobile than the noncompliers and mobility may be associated with HIV risks. The second source of error, which we call adherence effects, occurs if the follow-up visits themselves have an effect on HIV incidence, perhaps through repeated exposure to counseling (such as condom promotion or other prevention messages) among persons who adhere to the schedule of visits. We numerically investigated the sensitivity of cohort estimates of incidence to selection bias and adherence effects. Table 2 shows the ratio of the incidence rate in a population and that from a cohort study. For example, if 50% of a population is compliers and the relative risks associated with selection bias and adherence effects are both 0.67, then the population incidence rate would be 1.86 times greater than estimated from a cohort study. Although we do not have direct data on the actual magnitude of the selection and adherence effects, we do know that a significant fraction of persons who were asked to participate in several major cohort studies either refused to participate or did not complete the follow-up schedule. For example, among studies cited as examples where the BED estimates were higher than the cohort estimates, only 61% ^{[15]} and 71% ^{[9]} of persons at baseline gave follow-up blood specimens.
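One simple model that reproduces the factor of 1.86 quoted above is sketched below. The model is an assumption on our part, since the text does not spell out the calculation behind Table 2: the selection relative risk is taken as the risk of compliers relative to noncompliers, and the adherence relative risk as a further multiplier that applies only to persons followed in the cohort. The function and argument names are hypothetical.

```python
def population_to_cohort_ratio(complier_fraction, rr_selection, rr_adherence):
    """Ratio of population incidence to the incidence observed in a cohort.

    The incidence among noncompliers is normalised to 1, so only ratios matter.
    """
    noncomplier_risk = 1.0
    complier_risk = rr_selection * noncomplier_risk
    # Population incidence mixes compliers and noncompliers, without visit effects
    population = (complier_fraction * complier_risk
                  + (1.0 - complier_fraction) * noncomplier_risk)
    # The cohort observes only compliers, whose risk is further changed by follow-up
    cohort = complier_risk * rr_adherence
    return population / cohort


ratio = population_to_cohort_ratio(0.5, 0.67, 0.67)
print(round(ratio, 2))  # 1.86
```

With no selection or adherence effect (both relative risks equal to 1) the ratio is 1, as expected.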

There are also important sources of error in the biomarker approach for estimating population incidence. A main source of error in the biomarker approach is the mean window period. The mean should be calculated from a representative sample of window periods of the population. The mean window period may depend on the circulating HIV strain and other co-infections in the population ^{[10]}. Furthermore, neither short nor long window periods should be systematically excluded from the calculation. If long window periods are excluded, then the mean will be underestimated, potentially causing an overestimation of incidence. For example, the reported mean window period for clade *C* HIV-1 virus in southern Africa was 187 days (0.512 years) ^{[9]}. However, that estimate systematically excluded censored data (cases still in the window period at last follow-up), which resulted essentially in the exclusion of all window periods greater than 12 months (see Fig. 1 in ^{[9]}). Excluding censored observations from the statistical analysis will result in downwardly biased estimates of the mean window period.

We illustrate the impact of excluding long window periods or censored observations when calculating the mean window period. Suppose 85% of observations have window periods 365 days or less, and the mean of those observations is 187 days; and 15% of observations have window periods greater than 365 days and their mean is 620 days. The effect of excluding the 15% of observations with windows greater than 365 days is to produce a mean window period that is 74% of what it should be, and a biomarker incidence estimate that is 1.35 times larger than it should be.
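The arithmetic behind these two figures is shown below, using only the numbers given in the paragraph above; the variable names are ours.

```python
# Shares and means of the two groups of window periods, as stated in the text
short_share, short_mean = 0.85, 187.0   # window periods of 365 days or less
long_share, long_mean = 0.15, 620.0     # window periods greater than 365 days

# Mean over all observations (weighted average of the two group means)
full_mean = short_share * short_mean + long_share * long_mean

# Mean obtained when the long window periods are excluded
truncated_mean = short_mean

mean_ratio = truncated_mean / full_mean            # how far the mean is understated
incidence_inflation = full_mean / truncated_mean   # I = p / mu, so I scales as 1 / mu

print(f"full mean = {full_mean:.2f} days")
print(f"truncated / full mean = {mean_ratio:.2f}")
print(f"incidence inflation factor = {incidence_inflation:.2f}")
```

The inflation factor is simply the reciprocal of the mean ratio, because the incidence estimate is inversely proportional to the mean window period.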

It is also worth reiterating that Equation 2 requires the mean window period and not the median; this is a potentially important distinction because the distribution of window periods is right skewed, in which case the median would be smaller than the mean. Accordingly, if the median was used instead of the mean in (Equation 2), the incidence rate would be overestimated. The median has been used in some analyses ^{[4]}.

In summary, discrepancies between biomarker and cohort incidence estimates, even by a factor of 2 or more, could well be explained by a combination of reasons including underestimation of the mean window period due to exclusion of censored observations, and the impact of selection bias and adherence effects in cohort studies.

#### Discussion

Reports of discrepancies between biomarker and cohort incidence rates have motivated adjustment procedures to correct biomarker estimates. These procedures attempt to correct for misclassification with regard to the timing of HIV infection. Our analysis shows that the use of two of these adjustment procedures is generally misguided. The McDougal adjustment has no numerical effect because mathematically the false positives exactly counterbalance the false negatives. The Hargrove adjustment has a mathematical error that can cause significant underestimation of the HIV incidence rates, especially when there is a pool of long-standing HIV prevalent infections in the population.

Our analysis of the adjustment procedures assumed a maximum possible window period (*M*), and published analyses are consistent with that assumption. Indeed, there are no data documenting any window period longer than approximately 3 years (see e.g., Fig. 7 in ^{[4]} and Fig. 1 in ^{[9]}). If, however, a proportion of HIV-positive persons are identified who remain in the window period indefinitely (i.e., for periods considerably greater than 3 years), then an adjustment would be necessary.

Although published analyses of the distribution of window periods have not reported persons re-entering the window period after having exited, we recognize that such a phenomenon is a possibility. Under such circumstances, *μ* in (Equation 2) refers to the total expected time spent in the window period (the initial time in the window period plus any subsequent revisits; see ^{[11]} for theoretical justification). Further studies are needed to refine our understanding of the duration of window periods including whether persons re-enter the window. It is imperative that these studies not exclude persons with long window periods. Persons still in the window period at last follow-up should be treated as censored data using survival analysis techniques.
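One standard way to treat censored window periods is the Kaplan-Meier estimator, with the mean window period taken as the area under the estimated survival curve up to a chosen horizon (a restricted mean). The sketch below is a minimal illustration of that general technique, not the procedure used in any of the cited studies; the function names and the four follow-up records are hypothetical.

```python
def kaplan_meier(durations, exited):
    """Return [(time, survival)] steps of the Kaplan-Meier curve.

    durations: days of follow-up in the window period for each person
    exited:    True if the person was seen to leave the window, False if censored
    """
    data = sorted(zip(durations, exited))
    at_risk = len(data)
    survival = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        removed = 0
        # Group all persons whose follow-up ends at the same time t
        while i < len(data) and data[i][0] == t:
            events += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if events:
            survival *= 1.0 - events / at_risk
            steps.append((t, survival))
        at_risk -= removed
    return steps


def restricted_mean(steps, horizon):
    """Area under the Kaplan-Meier step function from 0 to horizon."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in steps:
        if t >= horizon:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (horizon - prev_t)


# Hypothetical data: three observed exits and one person still in the window at day 400
durations = [100, 200, 300, 400]
exited = [True, True, True, False]

km_mean = restricted_mean(kaplan_meier(durations, exited), horizon=400)
observed = [d for d, e in zip(durations, exited) if e]
naive_mean = sum(observed) / len(observed)

print(f"restricted mean with censoring: {km_mean:.0f} days")
print(f"mean after dropping censored cases: {naive_mean:.0f} days")
```

In this toy example the censored person still contributes 400 days of time in the window, so the restricted mean exceeds the mean computed after dropping that person. The restricted mean is itself a lower bound when the survival curve has not reached 0 by the horizon.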

How can we improve the accuracy of biomarker estimates of population HIV incidence? It is critically important that cross-sectional samples be representative of the target population with respect to HIV risks. For example, specimens that are drawn principally from antenatal clinics, sexually transmitted disease clinics or voluntary testing and counseling centers may not be representative of the broader population. In such cases, weighting of the sample will be necessary to appropriately adjust the incidence to reflect the target population (see ^{[5]} for an example of such adjustments). Adjustments to improve the representativeness of cross-sectional surveys are very different from those for misclassification discussed in ^{[8,9]} and evaluated in this article.

Persons with advanced HIV disease should be excluded from the cross-sectional samples when using the biomarker approach. The accuracy of biomarker estimates can also be increased through improved estimates of the mean window period across HIV subtypes and diverse populations.

The inconsistency between cohort and biomarker estimates of incidence reported in some studies is likely due to a multitude of reasons. These include errors in the mean window period of the biomarker as well as selection bias and adherence effects in cohort studies. In view of the potential errors with both cohort and biomarker approaches, cohort estimates should not blindly be considered the gold standard for assessing the validity of biomarker estimates.

#### Acknowledgement

The author thanks Peter Ghys and journal reviewers for helpful comments on an earlier draft of the manuscript.

R.B. takes responsibility for the concept, analysis, interpretation and drafting of the manuscript.