The Joint United Nations Program on HIV/AIDS (UNAIDS) and country partners produce annual global, regional and country-specific estimates of HIV burden for use in national and global planning and monitoring. These estimates are prepared with oversight from the UNAIDS Reference Group on Estimates, Modeling and Projections, which works to advance the development of statistical and mathematical approaches to modeling the HIV epidemic [1]. Most countries use Spectrum, a UNAIDS-supported modeling software tool (Spectrum, Glastonbury, Connecticut, USA), to generate these annual estimates.
The fit to program data (FPD) tool, first introduced in 2014, was developed as an alternative curve fitting tool to the Estimation and Projection Package (EPP) for countries with concentrated epidemics and robust historical vital registration and case-based HIV surveillance systems. Before 2014, most of these countries would have used EPP to derive national incidence curves from HIV seroprevalence surveillance and survey data among key populations at higher risk of HIV exposure (such as female sex workers, gay men and other MSM and people who inject drugs) and pregnant women attending antenatal care clinics, alongside estimates of the size of each population subgroup [2,3]. EPP-derived estimates have been subject to criticism where prevalence data among key populations are not routinely available or not nationally representative, or where accurate estimates of the size of key populations are not available [4,5]. The challenge in many places is that surveillance, especially among MSM, is comparatively unstable and changes too frequently to provide trend data. In countries with very low-level epidemics where few historical measures of HIV prevalence are available among key populations, estimation of HIV incidence using EPP may not have been possible.
The FPD tool overcomes the challenges of scarce HIV survey data and key population size estimates by offering countries with robust vital registration and case-based HIV surveillance systems an alternative approach for deriving HIV incidence estimates. Previous studies have shown that many middle-income and high-income countries have reasonably complete vital registration systems [6] and that misclassification of AIDS-related deaths to other causes is low [7,8]. Although published evaluations of the quality and completeness of HIV case-based surveillance systems are more limited, studies elsewhere have shown that HIV incidence curves derived from program data are a good alternative to fits in EPP when information about the timing of diagnosis, the proportion of the population undiagnosed and the potential for misclassification of AIDS-related deaths over time is known [9,10].
This article describes the implementation of the FPD tool in the 2016 UNAIDS estimates round, with a focus on advances in the development of methods and a new approach for incorporating uncertainty. We also provide a brief summary of where the tool was applied in 2016, the frequency of available program data inputs, and its impact on estimates of HIV incidence, people living with HIV and AIDS-related deaths. Finally, we provide some illustrative examples of the fitting process in Barbados and Chile and describe improvements planned for future estimation rounds.
Modeling HIV incidence among adults aged 15–49 years
Double logistic curve
In the FPD tool, the HIV incidence curve is modeled as a double logistic function of the form
\[ i(t) = \frac{e^{\alpha (t - t_0)}}{1 + e^{\alpha (t - t_0)}} \left( \frac{a\, e^{-\beta (t - t_0)}}{1 + e^{-\beta (t - t_0)}} + b \right) \tag{1} \]
where i(t) is the incidence at time t, α defines the rate of increase at the beginning of the trend, β defines the rate of convergence to the asymptote, t0 determines the time of the inflexion point, a modulates the peak incidence, and b indicates the asymptote (i.e. the value to which incidence converges over time); all parameter values are greater than zero.
Eq. (1) was first proposed by Stover et al. [9] and describes a flexible class of functions. However, these functions assume that the incidence converges to a constant over time. For some countries, program data inputs do not support this assumption. Moreover, with five parameters, the curve may be too flexible to be estimated with good precision when program data are scarce.
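The behaviour of the double logistic curve can be explored with a minimal Python sketch. It assumes the commonly used product form (a logistic rise multiplied by a decaying logistic term plus an asymptote); the function name and all parameter values are hypothetical and chosen only for illustration, not taken from any country file.

```python
import math


def double_logistic(t, alpha, beta, t0, a, b):
    """Assumed double logistic form: logistic rise times (decaying term + asymptote)."""
    # e^{alpha (t - t0)} / (1 + e^{alpha (t - t0)}), written in a numerically convenient way
    rise = 1.0 / (1.0 + math.exp(-alpha * (t - t0)))
    # e^{-beta (t - t0)} / (1 + e^{-beta (t - t0)})
    decay = 1.0 / (1.0 + math.exp(beta * (t - t0)))
    return rise * (a * decay + b)


# Hypothetical parameter values, for illustration only.
params = dict(alpha=0.5, beta=0.3, t0=1990.0, a=0.004, b=0.001)

# Incidence curve evaluated for each calendar year.
curve = {year: double_logistic(year, **params) for year in range(1975, 2016)}
```

Under this form the curve starts near zero, rises around t0 and settles at b for large t, which is the convergence assumption discussed above.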
Simple logistic curve
When there is no evidence of an inflexion point or there are too few data points, a simple logistic curve may be warranted. In the 2016 version of the FPD tool, we added a second option that models incidence as a simple logistic function, Eq. (2), in which c defines the incidence at a reference time and α defines the rate of increase of the trend. As in the double logistic curve, all parameter values must be greater than zero.
To estimate trends in HIV incidence, we assume that, for a given country, the numbers of AIDS deaths recorded at times $t^d_1, \ldots, t^d_{S_d}$ were $N_d(t^d_1), \ldots, N_d(t^d_{S_d})$, respectively; the numbers of newly detected HIV cases recorded at times $t^n_1, \ldots, t^n_{S_n}$ were $N_n(t^n_1), \ldots, N_n(t^n_{S_n})$, respectively; and the numbers of people living with HIV (PLHIV) at times $t^l_1, \ldots, t^l_{S_l}$ were $N_l(t^l_1), \ldots, N_l(t^l_{S_l})$, respectively, where $S_d$, $S_n$ and $S_l$ are the respective numbers of years for which AIDS deaths, new HIV diagnoses and PLHIV were available.
Let θ be the parameter vector for the incidence curve. We have θ = (α, β, t0, a, b) if Eq. (1) is considered, and θ = (α, c) if Eq. (2) is considered instead. For each θ and each time t, let $n_d(\theta; t)$, $n_n(\theta; t)$ and $n_l(\theta; t)$ be the numbers of AIDS deaths, new cases and PLHIV, respectively, as predicted by Spectrum given the incidence curve corresponding to θ.
Our objective is to find the incidence curve that provides estimates of new cases, deaths and number living with HIV that fit the observed numbers best.
Maximum likelihood estimation
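In the maximum likelihood estimation approach, the observed numbers are assumed to follow independent Poisson distributions, such that the likelihood is
\[ \mathrm{Lik}(\theta) = \prod_{k \in \{d, n, l\}} \prod_{j=1}^{S_k} \frac{n_k(\theta; t^k_j)^{N_k(t^k_j)} \, e^{-n_k(\theta; t^k_j)}}{N_k(t^k_j)!} \]
where k indexes AIDS deaths (d), new diagnoses (n) and PLHIV (l). We estimated θ by maximizing Lik, which is equivalent to minimizing
\[ \mathrm{llik}(\theta) = -\log \mathrm{Lik}(\theta) = \sum_{k \in \{d, n, l\}} \sum_{j=1}^{S_k} \left[ n_k(\theta; t^k_j) - N_k(t^k_j) \log n_k(\theta; t^k_j) + \log N_k(t^k_j)! \right] \]
that is, finding
\[ \hat{\theta} = \arg\min_{\theta} \mathrm{llik}(\theta), \]
the value of θ that minimizes llik and therefore maximizes the log-likelihood.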
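The Poisson fitting criterion can be illustrated with a short Python sketch. Everything beyond the Poisson negative log-likelihood itself is an assumption made for illustration: `toy_projection` is a hypothetical stand-in for the Spectrum projection, its logistic shape and population size are arbitrary, and a coarse grid search is swapped in for whatever optimizer the tool uses.

```python
import math


def poisson_nll(observed, predicted):
    """Negative Poisson log-likelihood summed over years."""
    return sum(p - n * math.log(p) + math.lgamma(n + 1)
               for n, p in zip(observed, predicted))


def toy_projection(theta, years, population=1_000_000):
    """Hypothetical stand-in for Spectrum: expected new cases per year."""
    alpha, c = theta
    return [population * c / (1.0 + math.exp(-alpha * (t - years[0])))
            for t in years]


years = list(range(2000, 2010))
true_theta = (0.3, 0.0004)  # illustrative values only
# Synthetic "observed" counts: rounded expected counts, no random noise.
observed = [round(x) for x in toy_projection(true_theta, years)]

# Coarse grid search over (alpha, c); an assumption, not the tool's optimizer.
grid = [(a / 100, c / 100000)
        for a in range(10, 61, 5) for c in range(20, 61, 5)]
theta_hat = min(grid,
                key=lambda th: poisson_nll(observed, toy_projection(th, years)))
```

In practice the three data series (deaths, new diagnoses and PLHIV) would each contribute a term of this kind, and the terms would be summed before minimizing.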
When this approach is used for parameter estimation, the double and simple logistic models can be compared using the Akaike information criterion (AIC) [11].
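As a small worked example of such a comparison, the sketch below computes the AIC from a minimized negative log-likelihood and the number of curve parameters (five for the double logistic, two for the simple logistic). The negative log-likelihood values are hypothetical and serve only to show the arithmetic.

```python
def aic(neg_log_lik, n_params):
    """Akaike information criterion, 2k - 2 log L, from the negative log-likelihood."""
    return 2 * n_params + 2 * neg_log_lik


# Hypothetical minimized negative log-likelihoods for the two curve shapes.
aic_double = aic(neg_log_lik=120.0, n_params=5)  # double logistic, 5 parameters
aic_simple = aic(neg_log_lik=121.0, n_params=2)  # simple logistic, 2 parameters

# The model with the lower AIC is preferred.
preferred = "simple" if aic_simple < aic_double else "double"
```

Here the simple curve fits slightly worse but is preferred because it uses three fewer parameters.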
Minimum chi-squared distance
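The minimum chi-squared distance (MKS) approach avoids the Poisson assumption [12]. It consists of minimizing the following loss function:
\[ \mathrm{ks}(\theta) = \sum_{k \in \{d, n, l\}} \sum_{j=1}^{S_k} \frac{\left[ N_k(t^k_j) - n_k(\theta; t^k_j) \right]^2}{n_k(\theta; t^k_j)} \]
that is, finding
\[ \hat{\theta} = \arg\min_{\theta} \mathrm{ks}(\theta), \]
the value of θ that minimizes the chi-squared distance.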
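A chi-squared loss of this kind can be sketched in a few lines of Python. The sketch assumes the Pearson form, with the predicted count in the denominator; the function names, series labels and numbers are hypothetical.

```python
def chi_squared_distance(observed, predicted):
    """Assumed Pearson form: sum of (observed - predicted)^2 / predicted."""
    return sum((n - p) ** 2 / p for n, p in zip(observed, predicted))


def total_distance(observed_by_series, predicted_by_series):
    """Sum the distance over the data series (e.g. deaths, new cases, PLHIV)."""
    return sum(chi_squared_distance(observed_by_series[k], predicted_by_series[k])
               for k in observed_by_series)


# Hypothetical counts, for illustration only.
observed = {"deaths": [10, 20], "new_cases": [50]}
predicted = {"deaths": [8.0, 25.0], "new_cases": [40.0]}
loss = total_distance(observed, predicted)
```

The parameter vector would then be chosen to make this loss as small as possible, in the same way as for the likelihood-based criterion.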
In 2016, a new approach to quantifying uncertainty was adopted on the basis of asymptotic properties of the estimators. For each curve fit, under regularity conditions, the parameter estimate is assumed to be asymptotically Gaussian, and the covariance matrix of the estimated parameter, that is $\hat{\Sigma} = \widehat{\mathrm{Cov}}(\hat{\theta})$, is obtained from the Hessian matrix of llik (or ks) [13]. Uncertainty is then derived by sampling parameter vectors using that covariance matrix and generating a new incidence curve for each draw (1000 curves by default). These incidence curves are then used to make projections and, for each indicator of interest, 95% confidence bounds are obtained using the bootstrap and percentile method.
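The sampling step can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the point estimate, covariance matrix and `toy_incidence` curve are hypothetical, the draws are taken from a multivariate Gaussian through a hand-written Cholesky factor, and the bounds are simple percentiles of the sampled values rather than the tool's exact procedure.

```python
import math
import random


def cholesky(cov):
    """Lower-triangular factor L with L L^T = cov (cov must be positive definite)."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def sample_parameters(theta_hat, cov, n_draws, rng):
    """Draw parameter vectors from N(theta_hat, cov)."""
    L = cholesky(cov)
    draws = []
    for _ in range(n_draws):
        z = [rng.gauss(0.0, 1.0) for _ in theta_hat]
        draws.append([m + sum(L[i][k] * z[k] for k in range(i + 1))
                      for i, m in enumerate(theta_hat)])
    return draws


def percentile_bounds(values, lower=2.5, upper=97.5):
    """Percentile-method bounds taken from the sorted sample."""
    ordered = sorted(values)

    def pick(p):
        return ordered[round(p / 100.0 * (len(ordered) - 1))]

    return pick(lower), pick(upper)


def toy_incidence(theta, t, t_start=2000):
    """Hypothetical logistic incidence curve standing in for the fitted curve."""
    alpha, c = theta
    return c / (1.0 + math.exp(-alpha * (t - t_start)))


# Hypothetical estimate and covariance (e.g. an inverse Hessian), illustration only.
theta_hat = [0.3, 0.0004]
cov = [[4e-4, -1e-7], [-1e-7, 1e-10]]

rng = random.Random(2016)  # fixed seed so the sketch is reproducible
draws = sample_parameters(theta_hat, cov, n_draws=1000, rng=rng)
incidence_2015 = [toy_incidence(th, 2015) for th in draws]
low, high = percentile_bounds(incidence_2015)
```

In the tool, each sampled curve would be passed through the full projection so that bounds can be derived for every indicator, not only for incidence.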
Adjustment to reported program data
In practice, the data from case-based notification and vital registration systems will not account for all relevant cases. Some proportion of the population living with HIV will be undiagnosed, data from certain facilities or geographical areas may be missing, and deaths from AIDS-related causes may have been attributed to some other cause or not reported.
Additional inputs to the tool are therefore used for adjusting program data inputs, including annual estimates of percentage undercounted, percentage undiagnosed and number of years from infection to diagnosis. Indeed, two sets of values for estimated percentage undercounted can be used, for scaling new HIV cases and AIDS-related deaths, whereas estimated percentage undiagnosed is used to scale PLHIV. Estimated years from infection to diagnosis are used to shift the new HIV cases data to earlier years as a representation of new HIV infections rather than diagnoses.
As the FPD tool fits its incidence curve to the program data, the aim of these adjustments is to make the inputs a more accurate representation of the number of people living with and dying from HIV, and of HIV infections as opposed to diagnoses. Proportions undiagnosed are used to directly adjust the reported number in the same year. For new diagnoses, let τ be the average duration in years between infection and diagnosis, and let $\tilde{N}_n(t + \tau)$ be the number of reported new diagnoses at time t + τ (Spectrum input). Then the ‘true’ number of new diagnoses at time t (defined in the ‘Minimum chi-squared distance’ section), $N_n(t)$, is calculated as follows:
\[ N_n(t) = \frac{\tilde{N}_n(t + \tau)}{1 - q(t)}, \]
where q(t) is the proportion of undercount at time t, an input that can be obtained from external studies.
Numbers of people who have died because of AIDS-related causes are also scaled to account for potential misclassification or under-reporting of deaths using the same approach as described for the proportion undiagnosed.
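The adjustments described in this section can be sketched in Python as follows. The sketch assumes that a reported count is scaled up by dividing by one minus the proportion missed, that the time from infection to diagnosis is a whole number of years, and that the undercount proportion is indexed by the shifted (infection) year; function names and numbers are hypothetical.

```python
def scale_up(reported, proportion_missed):
    """Scale a reported count up for the share that was missed (assumed form)."""
    return reported / (1.0 - proportion_missed)


def adjust_new_diagnoses(reported_by_year, undercount_by_year, tau):
    """Shift reported diagnoses back by tau years and correct for undercount."""
    adjusted = {}
    for report_year, reported in reported_by_year.items():
        infection_year = report_year - tau
        # Assumption: undercount is looked up for the shifted year; 0 if absent.
        q = undercount_by_year.get(infection_year, 0.0)
        adjusted[infection_year] = scale_up(reported, q)
    return adjusted


# Hypothetical inputs, for illustration only.
reported = {2010: 90, 2011: 80}
undercount = {2007: 0.10, 2008: 0.20}
adjusted = adjust_new_diagnoses(reported, undercount, tau=3)

# Reported PLHIV scaled for an assumed 15% undiagnosed.
plhiv_adjusted = scale_up(1700, 0.15)
```

The same `scale_up` helper would apply to AIDS-related deaths, with the proportion misclassified or unreported in place of the proportion undiagnosed.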
Application of the fit to program data tool in the 2016 UNAIDS estimation round
In 2016, the FPD tool was used to estimate incidence curves for 62 countries, an increase from 16 countries in 2015 (Table 1). The tool was used most often in countries in Western and Central Europe and North America. In Latin America and the Caribbean, 13 countries opted for FPD over EPP approaches in 2016, up from five in the previous round. Georgia was the first country in Eastern Europe and Central Asia to adopt the tool and publish estimates from it in 2016, although results from four other countries contributed to 2016 UNAIDS regional and global estimates. In the Middle East and North Africa, Lebanon and Algeria used the tool for a second year.
Data inputs to inform FPD curve fits were primarily AIDS-related mortality data coming from vital registration systems and new HIV case reports from case-based notification systems. Five countries in Latin America and 13 high-income and middle-income countries had sufficiently robust case-based notification systems linked to vital registration systems to provide historical numbers of PLHIV as an input.
In four countries, Bulgaria, Chile, Montenegro and Slovakia, a simple logistic curve to model incidence was required. For all countries, the maximum likelihood and minimum chi-squared distance approaches produced similar results.
Example applications of the fit to program data tool
In Barbados, previous efforts to publish estimates from Spectrum had failed because of insufficient availability of HIV serosurveillance data from key populations. Unpublished estimates that were used to contribute to regional and global totals were based on attempts to fit EPP to case-based notifications of the people newly diagnosed with HIV by sex.
In 2016, the country adopted the FPD tool, taking advantage of historically robust program data of the number of people currently alive in their HIV case-based surveillance system and the number of people reported to have died from AIDS-related causes. Reports of the number of PLHIV in the National Surveillance database were available between 1989 and 2013. Approximately 15% of PLHIV were assumed to be undiagnosed over time. Program data on AIDS-related mortality was available from 2003 to 2012.
Comparisons of HIV incidence trends and estimates of the number of PLHIV and the number of AIDS-related deaths in the 2015 and 2016 models relative to program data inputs are shown in Fig. 1a–c, respectively. The FPD tool fits the program data, for example the reported number of PLHIV, better than EPP, which in this context relies on a poor data set, appears to underestimate that indicator from 2005 onward and yields wider confidence intervals. Estimates from the FPD tool were validated and accepted for publication by the Ministry of Health in Barbados and UNAIDS (personal communication, Dr Anton Best, 25 May 2016).
Chile adopted the FPD tool in 2015 because of limited availability of historical HIV seroprevalence surveillance and survey data among key populations. The country has robust historical data on the number of people newly diagnosed with HIV in the national surveillance system, the number of positive tests from the central reference laboratory and the number of people who died from AIDS-related causes.
AIDS-related mortality data have been reportable in Chile since 1997 and available through 2013. New HIV cases were reported to the Ministry of Health from 2004 and were available through 2012. In 2015, the FPD tool was used to estimate HIV incidence using a double logistic curve. In 2016, however, an improved curve fit was achieved using a simple logistic approach, as suggested by a change in AIC of 6. Nevertheless, both approaches suggest that the incidence remains stable over time. This suggests that prevention efforts have had limited success in the country. Estimates of HIV incidence, validated and published by the Ministry of Health in Chile and UNAIDS, are shown in Fig. 2 for the 2015 and 2016 rounds.
Impact of moving to the fit to program data tool on 2016 global estimates
Figure 3 shows a comparison of estimates of HIV incidence, new HIV infections, PLHIV and AIDS-related mortality for the 46 countries that were using EPP but newly adopted the FPD tool in 2016. Caution is needed when interpreting this impact, as some of the differences are due to other modifications in the 2016 Spectrum model described elsewhere (in this issue). Overall, adoption of the FPD tool resulted in lower estimates of incidence between the early 1980s and mid-1990s and a higher and modestly rising trend in incidence from the early 2000s through 2015 compared with results from the previous year. Estimates of PLHIV in 2015 were similar in both rounds, although the increase was steeper in the early 2000s using the FPD tool. Peaks in AIDS-related deaths occurred in 1995 for both methods, although the FPD tool results showed somewhat steeper declines as compared with the previous fitting approach.
In this article, we presented a review of the FPD tool methods and most recent advances as well as a summary of its application and potential impact on the 2016 round of estimates. The tool was used in the most recent round to estimate HIV incidence trends among adults aged 15–49 years in 62 of the 167 countries that contribute to UNAIDS regional and global estimates, an increase from the previous year, when it was used by just 16 countries. The newly added simple logistic functional form for incidence yielded an improved fit to the program data in four countries. Aggregation of results across countries newly adopting the tool suggested that in more recent years, HIV incidence was rising modestly rather than declining and that progress in reducing AIDS-related mortality may have been greater than assumed during the 2015 UNAIDS estimates round in these countries.
The use of the FPD tool, with its ongoing expansion and improvements in methods, has offered a number of benefits. Perhaps the most important for countries is the transparent fitting process to routinely available surveillance and vital registration data and the acceptability of modeled results. Previously, many countries, including the Bahamas, Barbados and Costa Rica, were unable to generate acceptable estimates using EPP because of limited or no HIV survey data among key populations. Elsewhere in Latin America and the Caribbean, historical estimates of HIV prevalence from surveys among key populations have been shown to be potentially biased by various sampling, selection and participation biases, and there is no accepted gold standard method for estimating the size of key populations [4,14]. In addition, as observed in this study, because the FPD tool was tailored to fit mortality data among other indicators, it should perform better than EPP in countries with good vital registration systems.
Another benefit of the tool as implemented in 2016 compared with 2015 is that uncertainty around the estimates is now directly incorporated, which better informs the precision around the estimates.
Additional work is planned to improve the FPD tool and its application by countries in future estimation rounds. One of the main priorities will be to work with countries to document the quality and completeness of program data inputs and to more accurately reflect the uncertainty in the inputs in the uncertainty bounds around the final incidence estimates. For example, historical estimates of the proportion of the undiagnosed population or time from infection to diagnosis may not be available in many countries but could be estimated from other models or taken from other studies in the region. Estimates of misclassification of AIDS-related mortality and completeness of vital registration systems from other modeling work could also be used to scale AIDS-related mortality data [15]. However, using indicators obtained from modeling studies, such as the number of PLHIV, as inputs to the FPD tool may carry over biases from those studies that are hard to detect. In addition, recent diagnoses or deaths may not yet have been reported to the system, which may create greater uncertainty around the most recent estimates than is currently captured by the model.
A second area for further exploration is the parameterization of the incidence curve. In the current version of the tool, the incidence curve is parameterized as either a double logistic or a simple logistic curve. The double logistic curve appears flexible and provided reasonable fits for most countries. Although both curves provided satisfactory fits, neither may be well suited to detecting outbreaks. This suggests that a wider class of functions, including the repertoire of curves offered by EPP, should be explored in future development of the tool, possibly together with model selection tools for users.
A final area for future work is to explore how countries might produce HIV incidence curves by key population or for smaller geographic areas using this tool. Although the functionality could easily be incorporated into the tool, it would require countries to introduce changes in case notification forms to capture the likely location and suspected route of transmission of the infection. However, as countries begin to realize the benefits of using case reporting and vital registration system data to produce more robust estimates of the impact of the HIV epidemic, the level of effort required to achieve this vision may not seem so extraordinary.
The authors wish to thank UNAIDS for ongoing support for development of Spectrum. We would like to acknowledge the work done by colleagues in the Ministry of Health in Barbados during the 2016 round of UNAIDS estimates.
Authors' contributions: conceived, designed and performed the experiments: J.S., S.G.M. and K.M.; analyzed the data: S.G.M., K.G., K.C., K.M. and S.C.; wrote the first draft of the manuscript: J.S., G.M., K.G. and K.M.; contributed to the writing of the manuscript: J.S., G.M., K.G., K.C., K.M. and S.C.; agree with the manuscript's results and conclusions: J.S., G.M., K.G., K.C., K.M. and S.C.
Conflicts of interest
There are no conflicts of interest.
1. Brown T, Bao L, Eaton JW, Hogan DR, Mahy M, Marsh K, et al. Improvements in prevalence trend fitting and incidence estimation in EPP 2013. AIDS 2014; 28 (suppl 4):S415–S425.
2. Hogan DR, Zaslavsky AM, Hammitt JK, Salomon JA. Flexible epidemiological model for estimates and short-term projections in generalised HIV/AIDS epidemics. Sex Transm Infect 2010; 86 (suppl 2):ii84–ii92.
3. Hogan DR, Salomon JA. Spline-based modelling of trends in the force of HIV infection, with application to the UNAIDS Estimation and Projection Package. Sex Transm Infect 2012; 88 (suppl 2):i52–i57.
4. Abdul-Quader AS, Baughman AL, Hladik W. Estimating the size of key populations: current status and future possibilities. Curr Opin HIV AIDS.
5. Yu D, Calleja JM, Zhao J, Reddy A, Seguy N. Estimating the size of key populations at higher risk of HIV infection: a summary of experiences and lessons presented during a technical meeting on size estimation among key populations in Asian countries. Western Pac Surveill Response J.
6. Phillips DE, Lozano R, Naghavi M, Atkinson C, Gonzalez-Medina D, Mikkelsen L, et al. A composite metric for assessing data on mortality and causes of death: the vital statistics performance index. Popul Health Metr.
7. Fazito E, Cuchi P, Fat DM, Ghys PD, Pereira MG, Vasconcelos AM, et al. Identifying and quantifying misclassified and under-reported AIDS deaths in Brazil: a retrospective analysis from 1985 to 2009. Sex Transm Infect 2012; 88 (suppl 2):i86–i94.
8. Trepka MJ, Sheehan DM, Fennie KP, Niyonsenga T, Lieb S, Maddox LM. Completeness of HIV reporting on death certificates for Floridians reported with HIV infection, 2000–2011. AIDS Care.
9. Stover J, Andreev K, Slaymaker E, Gopalappa C, Sabin K, Velasquez C, et al. Updates to the spectrum model to estimate key HIV indicators for adults and children. AIDS 2014; 28 (suppl 4):S427–S434.
10. Vesga JF, Cori A, van Sighem A, Hallett TB. Estimating HIV incidence from case-report data: method and an application in Colombia. AIDS 2014; 28 (suppl 4):S489–S496.
11. Akaike H. A new look at the statistical model identification. IEEE Trans Automatic Control.
12. Berkson J. Minimum chi-square, not maximum likelihood. Ann Statist 1980; 8:457–487.
13. Press WH, Numerical Recipes Software (Firm). Numerical recipes in C. 2nd ed., v2.0. Cambridge, England; New York, NY: Cambridge University Press; 1993.
14. Miller WM, Buckingham L, Sanchez-Dominguez MS, Morales-Miranda S, Paz-Bailey G. Systematic review of HIV prevalence studies among key populations in Latin America and the Caribbean. Salud Publica Mex 2013; 55 (suppl 1):S65–S78.
15. Wang H, Wolock TM, Carter A, Nguyen G, Kyu HH, Gakidou E, et al. Estimates of global, regional, and national incidence, prevalence, and mortality of HIV, 1980–2015: the Global Burden of Disease Study 2015. Lancet HIV.