To meet the primary goal of the National HIV/AIDS Strategy for the United States of reducing new HIV infections, it is essential to increase the percentage of persons living with HIV who are aware of their infection.1 Estimating HIV incidence and the percentage of persons with undiagnosed infection with appropriate models is important to support the development and evaluation of prevention programs, particularly if estimates can be derived by geographic area or by risk and demographic groups. Such models also provide information on the size of the population living with HIV, which is helpful for planning treatment and care services.
To date, 2 types of statistical methods have been used to estimate incidence at the national level in the United States using HIV surveillance data: one using an additional biomarker test to identify recent infection and the other using a back-calculation approach to estimate the number of new infections from the annual number of diagnoses. The biomarker method can be applied to single-year or multiple-year data to estimate incidence in the year or years for which the additional tests were conducted for new diagnoses.2–4 However, this method cannot be used to estimate the prevalence and the number of undiagnosed infections if there is no estimate of cumulative incidence; ie, all infections that occurred in the past. The back-calculation method can be used to estimate not only incidence but also prevalence and the number of undiagnosed infections.3,5 However, it requires the annual numbers of new diagnoses stratified by stage of disease (AIDS diagnosed during the same calendar year as HIV diagnosis vs. later) for the entire epidemic history, and this information is not available in many states because confidential name-based HIV reporting was implemented at different times in different states.6
As HIV disease progresses, the CD4 cell count can be used to estimate the time since infection at the date of CD4 test, assuming no treatment has been received. The trajectory of CD4, on a square root scale, has typically been modeled as a linear function of time since infection,7–9 although other types of functions have also been used.10,11 Estimates of model parameters for subpopulation groups, without identifying HIV subtypes, have been provided by Lodi et al,8 and estimates for specific subtype groups have been provided by Touloumi et al.9 Hall et al12 applied this model to estimate the distribution of diagnosis delays (time from HIV infection to diagnosis) among persons with newly diagnosed HIV in the United States, and Szwarcwald et al13 used the method to estimate HIV incidence in Brazil based on a linear extrapolation prediction.
In this article, we use the method outlined in Hall et al12 to estimate the distribution of diagnosis delays, but with updated CD4 depletion model parameter estimates that are more specific to the predominant subtype in the U.S. HIV population. We use these parameters to estimate the diagnosis delay distribution, create a diagnosis delay weight, and use this weight to estimate the number of new HIV infections by year of infection, which includes diagnosed and undiagnosed infections. Combined with information on cumulative numbers of deaths, we estimate the number of persons living with HIV infection.
We used data on HIV cases diagnosed during the study period (2006–2013) and the first CD4 value after diagnosis reported to the Centers for Disease Control and Prevention's National HIV Surveillance System to determine diagnosis delays and estimate incidence, prevalence, and the percentage undiagnosed. All states, the District of Columbia, and U.S. territories have mandatory reporting of cases of HIV infection to the state or local health departments, and data on demographic, clinical, and risk characteristics and deaths are collected in a uniform format and electronically submitted to the CDC without personal identifiers. Cases of stage 3 (AIDS) infection were reported since the beginning of the epidemic in the early 1980s, and jurisdictions implemented reporting of HIV over time, with all jurisdictions reporting by 2008. Cases are routinely deduplicated within and between jurisdictions.14 Reporting of the first CD4 test result after diagnosis of HIV infection is a required data element on the HIV case report form. In addition, although jurisdictions have had mandatory reporting of all laboratory test results for CD4 count <200 since the early 1990s, over time they have also implemented mandatory reporting of all values of CD4 test results. As of December 2015, all but 6 states had implemented mandatory laboratory reporting of all CD4 values, and 32 states and the District of Columbia reported complete data to the National HIV Surveillance System, representing 70% of persons living with HIV in the United States.
Not all diagnosed cases are reported to CDC in a timely fashion; some may take as long as several years. To account for cases diagnosed but not yet reported, reporting delay weights are used.15 After adjusting for reporting delay, persons with CD4 tests were also assigned a weight to account for those without a CD4 test based on the year of HIV diagnosis, sex, race/ethnicity, transmission category, age at diagnosis, and the status at the end of the study period—whether the person was living with HIV whose disease had never been classified as AIDS, died without ever having been classified as AIDS, or had progressed to AIDS regardless of whether living or dead.
Because the completeness of CD4 data differed among jurisdictions, we also considered an approach wherein we selected jurisdictions with relatively high completeness (defined as those where more than 50% of persons had a CD4 test within 3 months of diagnosis in each calendar year or more than 85% of persons had a CD4 test during the study period), then assigned weights to the persons in the selected jurisdictions to account for persons not in these jurisdictions and applied the procedure described in this article to estimate incidence, prevalence, and the proportion of undiagnosed infections based on the HIV-infected persons in the selected jurisdictions.
Approximately 30% of cases of HIV infection are reported to CDC without an identified risk factor.16 To provide case counts by transmission category, a summary classification of the single risk factor most likely to have been responsible for transmission, multiple imputation is used to assign a transmission category.17 Multiple imputation is a statistical approach in which each missing transmission category is replaced with a set of plausible values that represent the uncertainty about the true, but missing, value. The plausible values are analyzed using standard procedures, and the results of these analyses are then combined to produce the final results.18,19
Deaths are ascertained by linking HIV surveillance data to vital records, and death information from death certificates is imported into the HIV surveillance system. Death ascertainment for a given year of death is completed within 12–18 months; therefore, we report on data that allow at least 18 months of reporting delay.
Date of Infection
To estimate the date of infection, we first updated previously published CD4 depletion model parameters.9 We modeled CD4 depletion using the procedure in Lodi et al8 with a subset of CASCADE (Concerted Action of Seroconversion to AIDS and Death in Europe) data20 (pooled in September 2014 within EuroCoord; www.eurocoord.net) from selected cohorts in countries where most of the cases are subtype B, as they are in the U.S. HIV population. We also redefined covariates to match the HIV risk and population stratifications commonly used in the United States; for example, risk and age groups were redefined. For each person with diagnosed HIV who had a first CD4 test at or after diagnosis but before antiretroviral therapy, the date of HIV infection is estimated based on the CD4 depletion model that is commonly used in the literature (eg, see Refs. 8,9), which is linear on the square root scale:

$$\sqrt{\mathrm{CD4}_i(t)} = a_i + b_i t,$$

where t is the duration of infection at the date of the CD4 test. If Ti is the duration of infection at the date of the first CD4 test for person i, and CD4i is the result of that test, then Ti is estimated by

$$T_i = \frac{\sqrt{\mathrm{CD4}_i} - a_i}{b_i},$$

and the date of infection is estimated by

$$\text{date of infection} = \text{date of first CD4 test} - T_i.$$
If the first CD4 test is not at (in the same month of) HIV diagnosis, then due to model uncertainty, the estimated date of infection could be after the HIV diagnosis date. In this situation, the date of infection is reset to be the date of HIV diagnosis. Because we report results for persons aged 13 or older, if a person has an estimated date of infection before age 13, then the date of infection for this person is set to the date when the person reaches the age of 13. Because our age cutoff is younger than that of the CASCADE cohorts, we applied the parameters for age group 15–19 to the youngest age group 13–19 in our analysis.
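The date estimate and the two resets described above can be sketched as follows. This is a minimal illustration, assuming the square-root-linear depletion model (square root of CD4 = intercept + slope × years since infection); the function names and the intercept and slope values used in any call are hypothetical and are not the fitted parameters from the Supplemental Digital Content.

```python
from datetime import date, timedelta
from math import sqrt

DAYS_PER_YEAR = 365.25  # assumption: average year length used to convert years to days


def birthday(birth_date, years):
    """Return the date on which a person reaches the given age."""
    try:
        return birth_date.replace(year=birth_date.year + years)
    except ValueError:  # born on Feb 29 and the target year is not a leap year
        return birth_date.replace(year=birth_date.year + years, day=28)


def estimate_infection_date(cd4, cd4_date, dx_date, birth_date, intercept, slope):
    """Estimate the infection date from the first CD4 count.

    intercept and slope are person-level parameters on the square-root CD4
    scale (slope is negative, in sqrt-cells per year); both are inputs here,
    not values taken from the article.
    """
    # Invert sqrt(CD4) = intercept + slope * t for t, the years since infection.
    t_years = (sqrt(cd4) - intercept) / slope
    infection = cd4_date - timedelta(days=round(t_years * DAYS_PER_YEAR))
    # Reset 1: an estimate later than the diagnosis date is set to the diagnosis date.
    infection = min(infection, dx_date)
    # Reset 2: an estimate before age 13 is set to the 13th birthday.
    infection = max(infection, birthday(birth_date, 13))
    return infection
```

With the hypothetical values intercept = 24.0 and slope = -1.5, a first CD4 count of 400 corresponds to about 2.7 years between infection and the CD4 test.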
The intercepts ai and slopes bi vary from person to person and are assumed to follow a bivariate normal distribution N[(a,b), (σa,σb), ρ]. Estimated model parameters are displayed in the Supplemental Digital Content, http://links.lww.com/QAI/A889. A difficulty in applying this model is that age at infection, a stratification variable, is unknown. To solve this problem, we first use the age at the date of first CD4 test as an approximation of the age at infection to determine the CD4 modeling age group and estimate Ti (the time between infection and first CD4 test) using age group–specific CD4 model parameters in the CD4 depletion model. Age at infection is then estimated as age at first CD4 test − Ti. We then repeat the above process but using the updated age at infection to determine the final CD4 modeling age group.
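The two-pass choice of age group can be sketched as below. The age-group boundaries (other than the youngest group, 13–19) and all parameter values are hypothetical placeholders chosen for illustration; the sketch also uses fixed group-level parameters rather than person-level draws from the bivariate normal distribution.

```python
from math import sqrt

# Hypothetical (intercept, slope per year) pairs on the square-root CD4 scale.
# These are NOT the estimates reported in the Supplemental Digital Content.
PARAMS_BY_AGE_GROUP = {
    "13-19": (25.0, -1.0),
    "20-29": (24.0, -1.2),
    "30+": (24.0, -1.5),
}


def age_group(age):
    """Map an age in years to a (hypothetical) CD4 modeling age group."""
    if age < 20:
        return "13-19"
    if age < 30:
        return "20-29"
    return "30+"


def years_since_infection(cd4, group):
    """Invert sqrt(CD4) = a + b * t using the parameters of one age group."""
    intercept, slope = PARAMS_BY_AGE_GROUP[group]
    return (sqrt(cd4) - intercept) / slope


def two_pass_duration(cd4, age_at_cd4_test):
    """Return (Ti, final age group) following the two-pass procedure."""
    # Pass 1: approximate age at infection by age at the first CD4 test.
    first_group = age_group(age_at_cd4_test)
    t_first = years_since_infection(cd4, first_group)
    # Update: age at infection = age at first CD4 test - Ti.
    age_at_infection = age_at_cd4_test - t_first
    # Pass 2: re-estimate Ti with the age group implied by the updated age.
    final_group = age_group(age_at_infection)
    return years_since_infection(cd4, final_group), final_group
```

For a person aged 31 at a first CD4 count of 225, the first pass with these placeholder values gives 6 years, which moves the person into the 20–29 group for the second pass.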
Diagnosis Delay Weight
Diagnosis delay is defined as the time from HIV infection to diagnosis, which is Ti−Di, where Di is the time from HIV diagnosis to the first CD4 test. We are interested in the probability that HIV infection will be diagnosed within a given period after infection. The distribution of diagnosis delay can be estimated using standard survival analysis techniques for right truncated data. Conceptually, diagnosis delay is similar to reporting delay, which is defined as the time from the date of diagnosis of HIV to the date the diagnosis is reported to the surveillance system. Using a procedure similar to that for estimating the reporting delay probability,15 we estimate the diagnosis delay probability, P(x), the probability that an infected person would be diagnosed within x units of the time after infection.
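One standard way to estimate a delay distribution from right truncated data is the reverse-time hazard estimator, sketched below for delays measured in whole time units. This is an illustration of the general technique, not necessarily the exact procedure of Ref. 15; the function name and the discrete time unit are assumptions.

```python
def delay_cdf_right_truncated(delays, limits):
    """Estimate P(x) = Pr(delay <= x | delay <= m) for x = 0..m.

    delays[i] -- observed diagnosis delay of case i, in whole time units
    limits[i] -- time from infection of case i to the end of the study period;
                 a case can only be observed if delays[i] <= limits[i]
    m is the largest truncation limit, so the estimate is conditional on the
    delay not exceeding m.
    """
    if any(d > t for d, t in zip(delays, limits)):
        raise ValueError("every observed delay must be <= its truncation limit")
    m = max(limits)
    cdf = [1.0] * (m + 1)  # cdf[m] = 1 by construction
    for j in range(m, 0, -1):
        # Cases that could have been observed with a delay of exactly j.
        at_risk = sum(1 for d, t in zip(delays, limits) if d <= j <= t)
        events = sum(1 for d in delays if d == j)
        # Reverse-time hazard: Pr(delay = j | delay <= j).
        hazard = events / at_risk if at_risk else 0.0
        cdf[j - 1] = cdf[j] * (1.0 - hazard)
    return cdf
```

Because the estimate is conditional on delays no longer than the longest observable one, infections with longer delays have to be handled separately, which is the role of the extrapolation for infections acquired before the study period.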
A diagnosis delay weight, W(x) = 1/P(x) is assigned to each person according to the infection date estimated for the person (x = time from infection to the ending date of the study period). The diagnosis delay weights are generated separately by strata determined by sex, transmission category, and race/ethnicity. With the estimated date of infection and diagnosis delay weight associated with each case diagnosed in the study period, the number of new infections in a calendar year can be estimated by summing the diagnosis delay weights among cases with infection date in that year within each population group of interest.
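As a rough illustration of the weighting step, the sketch below sums W(x) = 1/P(x) over diagnosed cases by estimated year of infection. The function name, the whole-year time unit, and the lookup table of P(x) values by stratum are assumptions made for the example.

```python
from collections import defaultdict


def infections_by_year(cases, cum_prob, study_end_year):
    """Sum diagnosis delay weights by estimated year of infection.

    cases    -- iterable of (infection_year, stratum) for diagnosed cases
    cum_prob -- cum_prob[stratum][x] is P(x), the probability of diagnosis
                within x years of infection (hypothetical inputs)
    """
    totals = defaultdict(float)
    for infection_year, stratum in cases:
        x = study_end_year - infection_year  # time from infection to end of study
        totals[infection_year] += 1.0 / cum_prob[stratum][x]  # W(x) = 1 / P(x)
    return dict(totals)
```

For example, if only one quarter of infections in a stratum are expected to be diagnosed by the end of the year of infection, each case diagnosed that year stands for four infections.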
The number of persons infected before, but with undiagnosed infection at, the beginning of the study period (U) is also estimated based on infections diagnosed during the study period but with an estimated date of infection before the study period. Let ui be the number of persons among U whose infection is diagnosed during the ith year after the study period begins. Then, we have $U = \sum_{i \ge 1} u_i$, where u1, u2, …, u8 are observed diagnoses in the 8 years of the study period. Let Hi be the total number of persons with HIV diagnosed during the ith year. We use linear regression to model the observed number of diagnoses Hi and the rate ri = ui/Hi in the study period, and then estimate ui for i > 8 (unobserved after the study period) by ui = Hi × ri, where Hi and ri (>0) are extrapolated from linear regression. For each person with HIV diagnosed during the study period but with an estimated date of infection before the study period, a diagnosis delay weight W = U/(u1 + u2 + … + u8) is assigned, to account for those infected before the study period but still not diagnosed at the end of the study period. The number of undiagnosed infections before the study period can be estimated by the sum of diagnosis delay weights among persons with an infection date before the study period within each population group of interest. An example of this extrapolation procedure is illustrated in Figure 1, with “known” results based on cases diagnosed during the study period and unobserved future values predicted from linear regression.
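The extrapolation can be sketched as follows, using ordinary least squares on the year index. The function names and the cap on the number of extrapolated years are assumptions; the cap only guards against a fitted rate that never reaches zero.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_prestudy_undiagnosed(u_obs, h_obs, max_future_years=100):
    """Estimate U and the weight W = U / (u1 + ... + un).

    u_obs -- diagnoses in study year i among persons infected before the study
    h_obs -- all diagnoses in study year i
    """
    n = len(u_obs)
    years = list(range(1, n + 1))
    rates = [u / h for u, h in zip(u_obs, h_obs)]  # ri = ui / Hi
    h_slope, h_intercept = fit_line(years, h_obs)
    r_slope, r_intercept = fit_line(years, rates)
    future = 0.0
    for i in range(n + 1, n + 1 + max_future_years):
        h_i = h_intercept + h_slope * i
        r_i = r_intercept + r_slope * i
        if h_i <= 0 or r_i <= 0:  # only positive extrapolated values are used
            break
        future += h_i * r_i  # ui = Hi x ri for unobserved years
    observed = sum(u_obs)
    total = observed + future
    return total, total / observed
```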
Number of Undiagnosed Infections
The number of persons living with diagnosed HIV infection at the beginning of the study period is required to estimate the prevalence and percentage of undiagnosed HIV infections during the study period. The total number of persons living with HIV infection at the beginning of the study period is estimated as the number of persons living with diagnosed HIV infection at the beginning of the study period plus the number of persons with undiagnosed HIV infection at the beginning of the study period.
The number of persons living with HIV infection at the end of each year of the study period is calculated as the prevalence at the end of the previous year plus the number of new infections during the year minus the number of deaths during the year. The number of undiagnosed HIV infections at the end of each year is the number of undiagnosed HIV infections at the end of the previous year plus the number of new infections estimated to occur during the year minus the number of infections diagnosed during the year.
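The year-by-year bookkeeping described above amounts to two simple recursions, sketched here. The function name and the annual figures used in the test are hypothetical; only the form of the recursions comes from the text.

```python
def roll_forward(prevalence0, undiagnosed0, infections, deaths, diagnoses):
    """Apply the annual recursions for prevalence and undiagnosed infections.

    prevalence0, undiagnosed0 -- values at the beginning of the study period
    infections, deaths, diagnoses -- annual counts, one entry per study year
    Returns a list of (prevalence, undiagnosed, percent undiagnosed) per year.
    """
    rows = []
    prevalence, undiagnosed = prevalence0, undiagnosed0
    for new_infections, new_deaths, new_diagnoses in zip(infections, deaths, diagnoses):
        # Prevalence: previous year + new infections - deaths.
        prevalence = prevalence + new_infections - new_deaths
        # Undiagnosed: previous year + new infections - infections diagnosed.
        undiagnosed = undiagnosed + new_infections - new_diagnoses
        rows.append((prevalence, undiagnosed, 100.0 * undiagnosed / prevalence))
    return rows
```

The starting prevalence passed to such a function would be the sum of diagnosed and undiagnosed infections at the beginning of the study period.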
Uncertainties associated with each estimation step are calculated and combined based on analytical formulas or approximations using the delta method. Because multiple imputation was used in handling missing transmission category, and multiple dates of infection were simulated for each person, the standard procedure for combining results from multiple imputation datasets is followed to obtain the final estimates and their standard errors.18
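The standard combining step for multiple imputation (Rubin's rules) can be sketched as follows. It covers only the pooling of per-dataset estimates and variances, not the delta-method calculations within each dataset; the function name is an assumption.

```python
from math import sqrt


def combine_rubin(estimates, variances):
    """Pool estimates from m imputed datasets using Rubin's rules.

    estimates -- point estimate from each imputed dataset
    variances -- squared standard error of each of those estimates
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two datasets and matching lengths")
    pooled = sum(estimates) / m
    within = sum(variances) / m  # average within-imputation variance
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)
    total = within + (1.0 + 1.0 / m) * between  # total variance
    return pooled, sqrt(total)
```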
An estimated 366,966 persons received a diagnosis of HIV infection in the study period (2006–2013, adjusting for reporting delay); 358,262 of them were reported to the National HIV Surveillance System by December of 2015. Among the reported persons, 61.9% had a CD4 test within 3 months of diagnosis and 89.5% by December of 2015. The annual numbers of HIV diagnoses and those with CD4 test within 3 months of diagnosis or by December of 2015 are shown in Table 1. The proportion of persons with CD4 test within 3 months of HIV diagnosis increased from 54% in 2006 to 73% in 2013. The overall CD4 test completeness by year of diagnosis was stable at around 89.5%. The average weight (N4/N2) to account for reporting delay and no CD4 test by year of diagnosis increased from 1.13 for 2006 to 1.20 for 2013 (Table 1).
Among all persons with HIV diagnosed in 2006–2013, 172,029 persons had been infected before 2006 (Table 2). The number of persons infected with HIV in 2006–2013 and diagnosed by the end of 2013 decreased by year of infection from 33,491 in 2006 to 13,869 in 2013. Approximately 195,800 persons were living with undiagnosed infection at the end of 2005. The annual number of infections decreased from 48,300 in 2007 to 39,000 in 2013 (Table 3).
Based on the national HIV surveillance data, there were 701,100 persons living with diagnosed HIV infection at the end of 2005 (cumulative diagnoses minus cumulative deaths = 1,272,300 − 571,200). The estimated number of undiagnosed infections at the end of 2005 was 195,800. Adding these 2 numbers gives the total number of persons living with HIV infection at the end of 2005 (approximately 896,900). The estimated prevalence increased from 0.9 million to 1.1 million and the estimated proportion of undiagnosed infections decreased from 21.8% to 16.4% during the study period, 2006–2013 (Table 3).
An estimated 81.5% of persons infected with HIV in 2013 were male, 67.9% men who have sex with men, 44.6% black, and 33.1% aged 25–34 years (Table 4). At the end of 2013, of persons living with HIV, 76.8% were male, 55.1% men who have sex with men, 42.7% black, and 32.4% aged 45–54 years. Males had a higher proportion of undiagnosed infections (17.3%) than females (13.4%), as did younger age groups compared with older age groups (51.6% in the youngest age group, 13–24 years, vs. 6.6% in the oldest age group, 55 years and older).
Limiting the analyses to jurisdictions with complete CD4 data as defined above produced results quite similar to those reported here, with differences less than 1 percent (data not shown).
We have introduced a method that uses the first CD4 count after HIV diagnosis to estimate the date of infection. Then, based on the estimated time from HIV infection to diagnosis, we can estimate the diagnosis delay distribution, which in turn can be used to account for individuals infected but not yet diagnosed, and thus estimate the total number of HIV infections (diagnosed and undiagnosed). As a by-product, the estimated diagnosis delay distribution can be used to estimate the average time it would take to diagnose a new infection. In addition, the distribution of diagnosis delays among new diagnoses can be used to estimate the average time a person receiving a diagnosis of HIV has been infected.
Compared with published incidence estimates for 2007–2010,21 our incidence estimates are up to 10% lower, but all 95% CIs overlap. For example, the incidence estimates based on the stratified extrapolation approach are 45,000 (39,900 to 50,100) for 2009 and 47,500 (42,000 to 53,000) for 2010, whereas the estimates based on the CD4 model for 2009 and 2010 are 45,500 (44,100 to 46,800) and 43,400 (41,800 to 44,900), respectively. Our prevalence estimates for 2007–2011 are 12%–15% lower than previously published estimates16 and, we believe, more accurate because, with the CD4 model, adjustment for incomplete reporting of HIV cases is not necessary. Only cases diagnosed in recent years are needed in the CD4 model, and jurisdictions have converted cases reported to them by code to named cases and have reported these cases to CDC, which allows assessment of the prevalence of cases diagnosed before the jurisdiction began reporting. Previous estimates rely on back-calculation models, which require data for the entire epidemic period and hence an adjustment for incomplete reporting in the early years. Such an adjustment would increase the estimated number of persons living with diagnosed HIV infection and thus lower the percent undiagnosed.
Our approach is subject to some uncertainties and limitations. First, the accuracy of incidence, prevalence, and proportion undiagnosed estimates based on CD4 data relies on the accuracy of the CD4 depletion model. Although some nonlinear functions have been used to model CD4 depletion,10,11 there is limited information on whether they are suitable for different HIV subtypes, risk, and various demographic groups. Second, individual CD4 counts are highly variable even over short time intervals. This has an effect on the precision of estimates but not much on the bias if the depletion model is correct. We adopted the multiple imputation approach to account for the variability at the individual level in the group level estimates. Considering this variability, some researchers have modeled CD4 depletion by stage instead of individual value.22–26 Basically, their approach is a generalized back-calculation method that classifies cases into groups at different stages based on CD4 results instead of AIDS or non-AIDS as other back-calculation methods do (eg, our earlier approach for estimating prevalence and percentage undiagnosed for the US3,5). It requires all historical data on HIV diagnoses, which makes it difficult if not impossible to use with our data because CD4 data are very limited in earlier years. In addition, HIV may cause a sharp drop in CD4 count within the first few weeks or months after infection.27,28 This is followed by a small recovery and increase in CD4 count, then a second slower decline in the number of CD4 cells over time. At the population level, this initial drop-off effect will generally be minor because few infected individuals are tested during that short period. The initial drop will cause an overestimate of time since infection, and hence increase the diagnosis delay weight slightly. We could correct this problem if there was information indicating a recent infection, eg, a recent negative test or a nonreactive less sensitive test.
Persons with a CD4 test but not within 3 months of diagnosis may have received treatment before the first CD4 test reported to the surveillance system and hence their first CD4 results do not reflect natural (ie, untreated) depletion. Information from viral load testing could be used as a proxy for treatment receipt so that questionable CD4 results can be removed from analysis. For persons without a timely CD4 test or no CD4 test at all, additional information related to the time since infection, such as time of a previous negative test or indication through an incidence assay of recent vs. long-standing infection, could be used to estimate the length of diagnosis delay. Because we only need to know the number of persons living with diagnosed infection at the end of 2005, it does not matter whether persons who received a diagnosis of HIV infection and died before 2006 are counted in the cumulative number of diagnoses and the cumulative number of deaths at the end of 2005.
Although reporting the first CD4 test result after diagnosis of HIV infection is a required data element on the HIV case report form, there is not always a CD4 result available close to diagnosis for various reasons, such as persons not being linked to care or data not being reported. Because of differences in CD4 data completeness among jurisdictions, we also considered the approach using CD4 data from jurisdictions satisfying the completeness requirements as specified in the Methods section. It turned out that results from the 2 approaches are quite similar with differences less than 1 percent. This is because there are only 3 jurisdictions not satisfying the completeness requirements, and they either have a small number of cases or are very close to meeting the completeness requirements.
An implicit assumption when applying survival analysis for right truncated data to estimate the diagnosis delay distribution is that the diagnosis delay distribution does not change over time. This assumption may not be completely true if HIV testing increases consistently from year to year, which could result in an overestimation of diagnosis delay and hence overestimation of incidence and undiagnosed infections.
In summary, the method using the first CD4 count after HIV diagnosis to measure the progression of HIV disease can readily be applied to HIV surveillance data to estimate the annual numbers of new HIV infections, HIV prevalence, and the proportion of prevalent infections that remain undiagnosed. Estimates can be obtained overall and for subpopulations, thus allowing for the monitoring of the goals and objectives of national and local HIV prevention strategies.
1. The White House Office of National AIDS Policy. National HIV/AIDS Strategy for the United States: updated to 2020. July 2015. Available at: https://aids.gov/federal-resources/national-hiv-aids-strategy/nhas-update.pdf. Accessed March 15, 2016.
2. Karon JM, Song R, Kaplan E, et al. Estimating HIV incidence in the United States from HIV/AIDS surveillance data and biomarker HIV test results. Stat Med. 2008;27:4617–4633.
3. Hall HI, Song R, Rhodes P, et al. Estimation of HIV incidence in the United States. JAMA. 2008;300:520–529.
4. Prejean J, Song R, Hernandez A, et al. For HIV Incidence Surveillance Group. Estimated HIV incidence in the United States, 2006–2009. PLoS One. 2011;6:e17502.
5. An Q, Kang J, Song R, et al. A Bayesian hierarchical model with novel prior specifications for estimating HIV testing rates. Stat Med. 2016;35:1471–1487.
6. Hall HI, An Q, Tang T, et al. Prevalence of diagnosed and undiagnosed HIV infection—United States, 2008–2012. MMWR Morb Mortal Wkly Rep. 2015;64:657–662.
7. DeGruttola V, Lange N, Dafni U. Modeling the progression of HIV infection. J Am Stat Assoc. 1991;86:569–577. Available at: http://www.jstor.org/stable/2290384. Accessed September 1, 2016.
8. Lodi S, Phillips A, Touloumi G, et al. CASCADE Collaboration in EuroCoord. Time from human immunodeficiency virus seroconversion to reaching CD4+ cell count thresholds, <200, <350, and <500 cells/mm3: assessment of need following changes in treatment guidelines. Clin Infect Dis. 2011;53:817–825.
9. Touloumi G, Pantazis N, Pillay D, et al. CASCADE collaboration in EuroCoord. Impact of HIV-1 subtype on CD4 count at HIV seroconversion, rate of decline, and viral load set point in European seroconverter cohorts. Clin Infect Dis. 2013;56:888–897.
10. Yan P, Zhang F, Wand H. Using HIV diagnostic data to estimate HIV incidence: method and simulation. Stat Commun Infect Dis. 2011;3:1948–4690.
11. Meijerink H, Wisaksana R, Iskandar S, et al. Injecting drug use is associated with a more rapid CD4 cell decline among treatment naive HIV-positive patients in Indonesia. J Int AIDS Soc. 2014;17:18844.
12. Hall HI, Song R, Szwarcwald CL, et al. Brief report: time from infection with the human immunodeficiency virus to diagnosis, United States. J Acquir Immune Defic Syndr. 2015;69:248–251.
13. Szwarcwald CL, Pascom ARP, de Souza Júnior PR. Estimation of the HIV incidence and of the number of people living with HIV/AIDS in Brazil. J AIDS Clin Res. 2012;2015:430.
14. Cohen SM, Gray KM, Ocfemia MC, et al. The status of the national HIV surveillance system, United States, 2013. Public Health Rep. 2014;129:335–341.
15. Song R, Green TA. An improved approach to accounting for reporting delay in case surveillance systems. JP J Biostatistics. 2011;7:1–14.
16. Centers for Disease Control and Prevention. Monitoring Selected National HIV Prevention and Care Objectives by Using HIV Surveillance Data—United States and 6 Dependent Areas, 2012. Atlanta, GA: US Department of Health and Human Services, CDC; 2014. Available at: http://www.cdc.gov/hiv/pdf/surveillance_report_vol_19_no_3.pdf
17. Harrison KM, Kajese T, Hall HI, et al. Risk factor redistribution of the national HIV/AIDS surveillance data: an alternative approach. Public Health Rep. 2008;123:618–627.
18. Rubin DB. Multiple Imputation for Nonresponse in Surveys. New York, NY: John Wiley & Sons Inc; 1987.
19. Rubin DB. Multiple imputation after 18+ years. J Am Stat Assoc. 1996;91:473–489.
20. CASCADE Collaboration. Changes in the uptake of antiretroviral therapy and survival in people with known duration of HIV infection in Europe: results from CASCADE. HIV Med. 2000;1:224–231.
21. Prejean J, Hernandez A, Song R, et al. Estimated HIV incidence among adults and adolescents in the United States, 2007–2010. HIV Surveill Supplemental Rep. 2012;17:1–26.
22. Sweeting MJ, De Angelis D, Aalen OO. Bayesian back-calculation using a multi-state model with application to HIV. Stat Med. 2005;24:3991–4007.
23. Ndawinz JD, Costagliola D, Supervie V. New method for estimating HIV incidence and time from infection to diagnosis using HIV surveillance data: results for France. AIDS. 2011;25:1905–1913.
24. van Sighem A, Nakagawa F, De Angelis D, et al. Estimating HIV incidence, time to diagnosis, and the undiagnosed HIV epidemic using routine surveillance data. Epidemiology. 2015;26:653–660.
25. Cori A, Pickles M, van Sighem A, et al. CD4 cell dynamics in untreated HIV-1 infection: overall rates, and effects of age, viral load, sex and calendar time. AIDS. 2015;29:2435–2446.
26. Lodwick RK, Nakagawa F, van Sighem A, et al. Use of surveillance data on HIV diagnoses with HIV-related symptoms to estimate the number of people living with undiagnosed HIV in need of antiretroviral therapy. PLoS ONE. 2015;10:e0121992.
27. Fauci AS, Pantaleo G, Stanley S, et al. Immunopathogenic mechanisms of HIV infection. Ann Intern Med. 1996;124:654–663.
28. Aldrich J, Gross R, Adler M, et al. The effect of acute severe illness on CD4+ lymphocyte counts in nonimmunocompromised patients. Arch Intern Med. 2000;160:715–716.