An effective national response to the HIV epidemic requires strong prevention, care and treatment programmes focused on key populations at risk for HIV exposure. Key populations consist of people who inject illicit drugs (PWIDs), men who have sex with men (MSM), transgender women, and sex workers and their sex partners. Monitoring HIV prevalence, and access to and utilization of programmes by the members of key populations, provides important measures of progress in combatting the epidemic. National programmes must ‘know their epidemic’ epidemiologically and programmatically to plan efficiently. The primary means to collect data on these programmes is through repeated biological–behavioural surveys.
Programmatically, the measurement of coverage is critical to determine if sufficient numbers of people living with HIV or at risk for contracting HIV are receiving prevention, care or treatment services with intensity adequate to turn or accelerate the epidemic trend downward. For example, if only 30% of PWIDs know their correct HIV status and routinely access clean needles, it is unlikely that HIV transmission among them will be curtailed. Modeling from Vietnam suggests that programme coverage needs to approach high figures (50% for opiate substitution therapy, 85–100% for condom use among sex workers and regular testing with early treatment) before the epidemic curve moves toward zero. In addition, estimating the sizes of these populations to comprehend the potential magnitude of the epidemic in each subpopulation is important to planning efforts. National programmes, donors and multilateral agencies use coverage measures (e.g. condom use, uptake of HIV testing, antiretroviral therapy) as a sign of the impact of successful responses in reducing HIV transmission.
OVERALL CHALLENGES AND SUCCESSES
The challenges for collecting and using these important measures can be broken down into two categories. One category is methodological. Probability sampling methods available for surveys of key populations have been developed and fielded only in the past 10 years. Key populations are often stigmatized and suffer discrimination in their societies; 60% of countries reporting on legal conditions to UNAIDS have laws, regulations and policies that inhibit access to prevention, care and treatment among key population members. These conditions make these communities difficult to sample in a statistically meaningful way, that is, with probability sampling. In the mid-1990s, time–location sampling (TLS) (also known as venue–day–time or time–space) adapted multistage cluster sampling to provide a statistical framework for sampling populations that could be mapped. In the early 2000s, respondent-driven sampling (RDS) became popular for surveying populations that were socially networked [6,7]. Analytic advances for both methods permit theoretically unbiased estimators based on probability statistics to be calculated [8,9,10▪▪,11▪▪]. These methodological advances combined with additional resources for surveys led to an unprecedented expansion of surveys in many populations wherein data were previously absent. Many of the field methods for implementing TLS or RDS surveys seem to be settled. However, correct analysis of the resulting data remains challenging.
The second category of challenges is overcoming blind spots. Although increasing numbers of countries are conducting surveys of key populations, these surveys often cover a small number of localities. Even in countries with large numbers of surveys, like China, Ukraine or Vietnam, surveys are conducted largely in cities. Still, many more cities do not collect any survey data, relying on case reports and little else to monitor the epidemic and response. More commonly, data are available from two or three cities and estimates from these cities are extrapolated or simply applied to represent the entire country. Limits on survey sites generally come from the lack of resources to expand and implementation challenges in small towns and rural areas. Some countries make efforts to assess whether the context of one city can be applied to its nearest neighbours; most do not.
Issues that arise in the first category above are being addressed by the academic and survey implementation community. This article will review some of the notable advances of the past year in this area. The second category has only recently been acknowledged. No publications were uncovered that discuss it explicitly. However, work from Pakistan suggests that mixed method approaches that include broad area mapping, survey data and injecting/sexual network data might provide an understanding that is closer to the national picture. In the absence of a nationally distributed surveillance system, widespread HIV testing sites and complete HIV case reporting can provide a full picture of an epidemic.
Many behaviours that put people at risk for HIV transmission are socially stigmatized in many settings. An on-going challenge is obtaining valid responses to important behavioural questions such as recent opiate use or condom use during sex. Opiate testing of urine specimens is infrequently reported but has been available for a long time. Advances in laboratory testing to confirm validity of responses by the use of biological markers are both useful and concerning. Evans et al. describe the use of prostate-specific antigen (PSA) testing of vaginal swabs collected during a survey of sex workers to determine recent (<48 h) exposure to semen. Among women who tested positive for recent exposure to semen (42/183), 42% reported only protected or no sexual intercourse. Gallo et al. review data available on semen biomarkers, finding equivocal validity of survey responses. They suggest pairing analyses of reported condom use with the results of PSA or Y-chromosome DNA testing to assess the validity of responses.
KEY POPULATIONS COVERAGE CHALLENGES
Key population surveys typically cover PWIDs, MSM and sex workers. Transgender persons, especially transgender women, were captured primarily within surveys of MSM or sex workers. Increasingly, surveys of transgender women are reported in the literature and their importance in the required epidemic response is recognized. Network analysis suggested that RDS could be used successfully to sample transgender women populations and surveys specifically designed for transgender women have succeeded using RDS [17,18]. The Global AIDS Response Progress Report requests selected indicators be disaggregated by sex, with the option of male, female or transgender woman for the first time in 2014.
GEOGRAPHIC COVERAGE CHALLENGES
Survey sampling of key populations typically necessitates adaptive sampling methods, such as TLS or RDS. Statistically representative household samples such as the Demographic and Health Surveys are unusual and not recommended for obtaining estimates of prevalence of HIV and associated behaviors. Such sampling has low yield of respondents at risk of HIV infection owing to injection drug use, male–male sex or commercial sex, attributable to the relative rarity of these behaviors in the population. Moreover, household survey respondents are often reluctant to divulge practices that might be stigmatized or illegal, especially in a home where relatives might overhear the conversation. However, household surveys can be used to estimate sizes of key populations. In most countries, data are collected from key populations in a limited number of cities. This decision is often based on available resources, conventional wisdom of where populations are located, previous prevalence estimates, where programs are based, and sources of large numbers of case reports. In countries with sufficiently large and diverse survey sites and relatively homogeneous epidemics and responses, estimates may be extrapolated to the unsurveyed portions of the country. Otherwise, if prevention, care and treatment programmes are available, programme data may complement survey data. It is difficult to assess individual site programmes with published aggregated data from multisite surveys. For example, Kerr et al. published results from a 10-site survey of MSM and Szwarcwald et al. did likewise for female sex workers (FSW) in Brazil. Kerr et al. describe important differences among the communities participating in the surveys that would intimate different requirements for programmes in different communities. Emmanuel et al. describe results in greater detail, by site, for 20 cities in Pakistan. Very clear differences emerge, suggesting that different responses are necessary and that different responses may be in place.
It is clear that whatever national policy might exist, needs and/or implementation vary by site, and more granular data are required. These data are also important for measuring trends in HIV prevalence; it is critical to maintain regular and consistent data collection sites over a period of years to create a clear picture of the epidemic by site and nationally.
CHOOSING A SAMPLING METHOD
The decision to choose a sampling method requires understanding the methodology in conjunction with the structure of the population to be surveyed. Although there is general agreement that a sampling method once successfully employed should remain in use to allow time trend analysis, there is some agreement that RDS captures harder-to-reach elements of a population. However, rules for selecting a method have not been promulgated, though suggestions are available. A few comparisons to inform sampling method selection were published in the past [25,26].
Recently, Paz-Bailey et al. published a formal comparison of the two methods to inform the selection of a national methodology for key population surveys in Guatemala, and Wei et al. compared RDS and TLS for sampling black MSM in the USA. Paz-Bailey found RDS recruited a more diverse, higher risk sample of MSM for lower cost. Design effect was similar for both methods. Wei found a more diverse, higher risk population with RDS and suggests using it for future surveys of black MSM in the USA. Tran et al. report a similar comparison among PWIDs in Hai Phong, Vietnam, finding similar HIV prevalence, with lower cost and effort per respondent with RDS but higher refusal rates when compared to TLS (Table 1).
TIME–LOCATION SAMPLING CHALLENGES AND ADVANCES
TLS is the most effective method for obtaining probability samples of populations who can be located at venues. TLS mimics multistage cluster sampling by constructing sampling frames for places where target populations gather in appreciable numbers over different time frames. The estimated numbers in each place–time cell can be used for probability proportional to size sampling. Often the decision to use TLS over RDS is based on the history of prior use, familiarity with mapping populations, safety and perception of validity. The similarity between TLS and probability proportional to size cluster sampling provided a strong veneer of statistical validity. However, TLS clusters are typically defined by the very characteristics being measured, giving a likely high level of collinearity within sampling clusters. Karon and Wejnert [10▪▪] describe the need to weight TLS samples appropriately by taking into account the frequency of venue attendance and propose methods to do so. Venue-based sampling applied in Tanzania suggests that people who attend more than two social venues per day were more likely to have concurrent sexual partners, highlighting the need to adjust for venue attendance.
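As a rough illustration of how estimated attendance in each venue–day–time unit can drive probability proportional to size selection, the following Python sketch draws units by systematic PPS sampling. The venue names, attendance figures and the function `pps_systematic_sample` are hypothetical and are not taken from any cited survey; real TLS protocols add further stages, such as sampling individuals within each selected unit and weighting for attendance frequency.

```python
from itertools import accumulate
from typing import Dict, List


def pps_systematic_sample(sizes: Dict[str, int], n: int, start: float) -> List[str]:
    """Select n units with probability proportional to size (systematic PPS).

    sizes: expected attendance for each venue-day-time unit (hypothetical values).
    n:     number of units to draw.
    start: random start in [0, total / n); passed in so the draw is reproducible.
    """
    total = sum(sizes.values())
    interval = total / n
    if not 0 <= start < interval:
        raise ValueError("start must lie in [0, total / n)")
    names = list(sizes)
    cumulative = list(accumulate(sizes.values()))
    selected = []
    idx = 0
    for k in range(n):
        point = start + k * interval
        # Advance to the first unit whose cumulative size exceeds the point.
        while cumulative[idx] <= point:
            idx += 1
        selected.append(names[idx])
    return selected


# Hypothetical sampling frame: unit name -> expected number of attendees.
frame = {
    "bar_fri_22h": 60,
    "park_sat_18h": 20,
    "club_sat_23h": 100,
    "sauna_sun_15h": 20,
}
picked = pps_systematic_sample(frame, n=2, start=30.0)
print(picked)
```

In practice the start would be drawn at random, for example with `random.uniform(0, total / n)`. A unit whose size exceeds the sampling interval can be selected more than once under this scheme, which a field protocol would need to handle explicitly.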
RESPONDENT-DRIVEN SAMPLING CHALLENGES AND ADVANCES
RDS uses social networks to access members of populations. Recruitment is initiated by selecting a small number of ‘seeds’ (eligible population members). Each seed receives a fixed number of recruitment coupons to recruit his/her peers who then present the coupons at a fixed site to enrol in the survey. Eligible recruits who finish the survey process are also given a set number of coupons to recruit their peers. This process continues until the desired sample size is reached.
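The arithmetic of this chain-referral process can be sketched in a few lines of Python. The example assumes, purely for illustration, that seeds count toward the sample and that every participant yields the same average number of recruits (coupons multiplied by a redemption rate); the function name and all input values are hypothetical, and real recruitment chains are far more uneven.

```python
from typing import Optional


def waves_to_reach_target(
    seeds: int,
    coupons: int,
    redemption_rate: float,
    target: int,
    max_waves: int = 100,
) -> Optional[int]:
    """Return the number of recruitment waves needed to reach the target sample.

    Assumes every participant, seeds included, counts toward the sample and
    that each participant yields coupons * redemption_rate recruits on average.
    Returns None if the target is not reached within max_waves.
    """
    recruits_per_person = coupons * redemption_rate
    current_wave = float(seeds)  # wave 0 is the seeds themselves
    cumulative = current_wave
    wave = 0
    while cumulative < target:
        if wave >= max_waves:
            return None
        wave += 1
        current_wave *= recruits_per_person
        cumulative += current_wave
    return wave


# Hypothetical design: 8 seeds, 3 coupons each, half of all coupons redeemed.
waves_needed = waves_to_reach_target(seeds=8, coupons=3, redemption_rate=0.5, target=400)
print(waves_needed)
```

When the average number of recruits per participant falls below one, recruitment dies out before a large target is reached and the function returns `None`, which mirrors the practical need to add seeds or coupons when chains stall.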
A big challenge with RDS is merging statistics with implementation realities. Most RDS estimators are heavily dependent on the assumptions of random walk, Markov process models, specifically that RDS is a sampling with replacement method (the sample size is a small proportion of the population size) and the final sample is independent of the selected seeds. Most empirical explorations to develop more robust statistics attempt to address these assumptions and provide diagnostics to identify violations of them [9,11▪▪,33▪▪]. For instance, the Successive Sampling estimator relaxes the assumptions of a random walk, Markov process model by using a successive sampling statistic based on sampling without replacement, which requires some knowledge of the size of the target population. The model-assisted estimator goes one step further by relaxing the replacement sampling and the seed dependence assumptions. Some of these estimators and diagnostic tools have been incorporated into new analysis software [11▪▪,34].
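To make the dependence on network information concrete, the sketch below computes an inverse-degree weighted proportion, the form commonly referred to as the RDS-II or Volz–Heckathorn estimator, which rests on the random walk assumptions described above. It is a minimal illustration only: the data are invented, the function name is hypothetical, and it does not implement the Successive Sampling or model-assisted estimators, nor any variance estimation.

```python
from typing import Sequence


def rds_ii_proportion(has_trait: Sequence[bool], degrees: Sequence[int]) -> float:
    """Inverse-degree weighted proportion (RDS-II form).

    has_trait: whether each respondent has the characteristic of interest.
    degrees:   each respondent's self-reported personal network size (> 0).
    """
    if len(has_trait) != len(degrees):
        raise ValueError("has_trait and degrees must have the same length")
    if any(d <= 0 for d in degrees):
        raise ValueError("network sizes must be positive")
    inverse_degrees = [1.0 / d for d in degrees]
    weighted_with_trait = sum(w for w, t in zip(inverse_degrees, has_trait) if t)
    return weighted_with_trait / sum(inverse_degrees)


# Hypothetical respondents: those with the trait report larger networks,
# so they are more likely to be recruited and are down-weighted.
has_trait = [True, True, False, False, False]
degrees = [10, 20, 5, 5, 10]
crude = sum(has_trait) / len(has_trait)
adjusted = rds_ii_proportion(has_trait, degrees)
print(f"crude={crude:.3f} adjusted={adjusted:.3f}")
```

The adjusted figure falls below the crude sample proportion because well-connected respondents are over-represented in the chain; the size of that correction depends entirely on the accuracy of self-reported network sizes.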
Another widely explored assumption that is difficult to meet in real populations is that respondents are recruited from a peer's network at random. Additional suggestions for statistics to cope with nonrandom recruitment offer hope for post-hoc adjustments [35▪▪,36▪▪]. Another area of discussion among RDS practitioners is the size of design effects used to calculate sample sizes to ensure adequate power and confidence for the sample. Findings from surveys conducted among PWIDs in the United States and from surveys on multiple key populations in international settings suggest that a design effect of at least two might be adequate for most RDS studies, although an effect closer to three or four would be ideal [37▪,38].
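The practical consequence of the assumed design effect can be seen by inflating the usual simple random sampling formula for a proportion, n = deff × z² × p(1 − p) / e². The Python sketch below applies that standard formula; the prevalence, margin of error and function name are illustrative assumptions rather than values from the cited studies.

```python
import math


def rds_sample_size(
    p: float, margin: float, design_effect: float, z: float = 1.96
) -> int:
    """Sample size for estimating a proportion, inflated by a design effect.

    p:             anticipated proportion (e.g. expected HIV prevalence).
    margin:        desired half-width of the confidence interval.
    design_effect: variance inflation relative to simple random sampling.
    z:             normal quantile (1.96 for 95% confidence).
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    if margin <= 0 or design_effect <= 0:
        raise ValueError("margin and design_effect must be positive")
    srs_n = z ** 2 * p * (1 - p) / margin ** 2
    return math.ceil(design_effect * srs_n)


# Hypothetical planning scenario: p = 0.5, +/- 5 percentage points.
for deff in (1, 2, 3, 4):
    print(deff, rds_sample_size(p=0.5, margin=0.05, design_effect=deff))
```

Under these assumptions the required sample grows from 385 under simple random sampling to 769, 1153 and 1537 for design effects of two, three and four, which is why the choice of design effect dominates survey budgets.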
POPULATION SIZE ESTIMATIONS
Both TLS and RDS can be used in deriving population size estimations of hidden populations by using multiplier methods [39,40]. The most promising method that uses only normally collected RDS data applies a Bayesian mathematical solution to estimating the size of a key population [41▪▪].
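In its simplest form, a multiplier estimate divides the number of population members known to have received a service (or a distributed object) by the proportion of survey respondents who report having received it. The sketch below shows only that division; the counts and the function name are hypothetical, and it ignores the uncertainty in both inputs as well as the Bayesian approach mentioned above.

```python
def multiplier_size_estimate(service_count: int, survey_proportion: float) -> float:
    """Population size estimate: service count divided by survey proportion.

    service_count:     unique population members recorded by a programme.
    survey_proportion: share of survey respondents reporting use of that
                       programme in the same period and area (0 < value <= 1).
    """
    if service_count < 0:
        raise ValueError("service_count must be non-negative")
    if not 0 < survey_proportion <= 1:
        raise ValueError("survey_proportion must lie in (0, 1]")
    return service_count / survey_proportion


# Hypothetical inputs: 1200 clients on programme records; 120 of 400
# survey respondents (30%) report using the programme.
reported_share = 120 / 400
estimate = multiplier_size_estimate(1200, reported_share)
print(round(estimate))
```

The estimate is only as good as the match between the two data sources: the programme count must be free of duplicates and must refer to the same population, period and area as the survey question.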
Measuring HIV prevalence among key populations in any epidemic context is an on-going methodological challenge. Many Ministries of Health and public health researchers regularly collect data using probability sampling methods from populations heretofore deemed too difficult to sample with such methods. New analytic and diagnostic tools render these data more validly analyzed while laboratory assays permit validation of some critical behavioural questions, giving greater confidence to managers who must make decisions based on these data. The new analytical methods for both RDS and TLS may be initially daunting; however, the new RDS analysis software should ease the transition to using new estimators. Political will, resources and community engagement are most important to increase the quantity of data to close gaps in knowledge about the epidemic state and local responses to it.
Conflicts of interest
The authors have no conflicts of interest to declare.
REFERENCES AND RECOMMENDED READING
Papers of particular interest, published within the annual period of review, have been highlighted as:
▪ of special interest
▪▪ of outstanding interest
2. Kato M, Granich R, Bui DD, et al. The potential impact of expanding antiretroviral therapy and combination prevention in Vietnam: towards elimination of HIV transmission. J Acquir Immune Defic Syndr 2013; 63:e142–e149.
5. MacKellar D, Valleroy L, Karon J, et al. The Young Men's Survey: methods for estimating HIV seroprevalence and risk factors among young men who have sex with men. Public Health Rep 1996; 111 (Suppl 1):138–144.
6. Johnston LG, Sabin K, Hien MT, Huong PT. Assessment of respondent driven sampling for recruiting female sex workers in two Vietnamese cities: reaching the unseen sex worker. J Urban Health 2006; 83 (Suppl 1):16–28.
7. World Health Organization. Regional Office for the Eastern Mediterranean. Introduction to HIV/AIDS and sexually transmitted infection surveillance, Module 4, Introduction to respondent-driven sampling; 2013. Geneva, Switzerland. http://applications.emro.who.int/dsaf/EMRPUB_2013_EN_1539.pdf [Accessed 1 December 2013]
8. Heckathorn DD. Extensions of respondent-driven sampling: analyzing continuous variables and controlling for differential recruitment. Sociol Methodol 2007; 37:151–207.
9. Tomas A, Gile KJ. The effect of differential recruitment, nonresponse and nonrecruitment on estimators for respondent-driven sampling. Electron J Stat 2011; 5:899–934.
10▪▪. Karon JM, Wejnert C. Statistical methods for the analysis of time-location sampling data. J Urban Health 2012; 89:565–586.
This article is the first to present a weighting method to adjust data collected with TLS. The methods should provide estimates with greater validity and appropriate variance.
Diagnostic tools are presented to test the underlying assumptions for RDS. The tools are simply implemented and will allow researchers to better consider potential biases in their RDS-derived estimates.
12. Emmanuel F, Salim M, Akhtar N, et al. Second-generation surveillance for HIV/AIDS in Pakistan: results from the 4th round of integrated behavior and biological survey 2011–2012. Sex Transm Infect 2013; 89:iii23–iii28.
13. Evans JL, Couture MC, Stein ES, et al. Biomarker validation of recent unprotected sexual intercourse in a prospective study of young women engaged in sex work in Phnom Penh, Cambodia. Sex Transm Dis 2013; 40:462–468.
14. Gallo MF, Steiner MJ, Hobbs MM, et al. Biological markers of sexual activity: tools for improving measurement in HIV/sexually transmitted. Sex Transm Dis 2013; 40:447–452.
15. Baral SD, Poteat T, Strömdahl S, et al. Worldwide burden of HIV in transgender women: a systematic review and meta-analysis. Lancet Infect Dis 2013; 13:214–222.
16. Barrington C, Wejnert C, Guardado ME, et al. Social network characteristics and HIV vulnerability among transgender persons in San Salvador: identifying opportunities for HIV prevention strategies. AIDS Behav 2012; 16:214–224.
17. Silva-Santisteban A, Raymond HF, Salazar X, et al. Understanding the HIV/AIDS epidemic in transgender women of Lima, Peru: results from a sero-epidemiologic study using respondent driven sampling. AIDS Behav 2012; 16:872–881.
18. Bauer GR, Travers R, Scanlon K, Coleman TA. High heterogeneity of HIV-related sexual risk among transgender people in Ontario, Canada: a province-wide respondent-driven sampling survey. BMC Public Health 2012; 12:292.
20. Guo W, Bao S, Lin W, et al. Estimating the Size of HIV Key Affected Populations in Chongqing, China, Using the Network Scale-Up Method. PLoS ONE 2013; 8:e71796.
21. Kerr LR, Mota RS, Kendall C, et al. HIV among MSM in a large middle-income country. AIDS 2013; 27:427–435.
22. Szwarcwald CL, de Souza Júnior PRB, Damacena GN, et al. Analysis of data collected by RDS among sex workers in 10 Brazilian Cities 2009: estimation of the prevalence of HIV, Variance, and Design Effect. J Acquir Immune Defic Syndr 2011; 57:S129–S135.
25. Robinson WT, Risser JMH, McGoy S, et al. Recruiting injection drug users: a three-site comparison of results and experiences with respondent-driven and targeted sampling procedures. J Urban Health 2006; 83 (Suppl 1):29–38.
26. Platt L, Wall M, Rhodes T, et al. Methods to recruit hard-to-reach groups: comparing two chain referral sampling methods of recruiting injecting drug users across nine studies in Russia and Estonia. J Urban Health 2006; 83 (Suppl 1):39–53.
27. Paz-Bailey G, Miller W, Shiraishi RW, et al. Reaching men who have sex with men: a comparison of respondent-driven sampling and time-location sampling in Guatemala City. AIDS Behav 2013; 17:3081–3090.
28. Wei CY, McFarland W, Colfax GN, et al. Reaching black men who have sex with men: a comparison between respondent-driven sampling and time-location sampling. Sex Transm Infect 2012; 88:622–626.
29. Tran VH, Le LN, Johnston LG, et al. Application of time-location and respondent-driven sampling methods for HIV surveillance in Vietnam. ICAAP 2013; Abstract: ICAAP2469-00979.
31. Yamanis TJ, Doherty IA, Weir SS, et al. From coitus to concurrency: sexual partnership characteristics and risk behaviours of 15-19 year old men recruited from urban venues in Tanzania. AIDS Behav 2013; 17:2405–2415.
32. Volz E, Heckathorn DD. Probability based estimation theory for respondent driven sampling. J Official Stat 2008; 24:79–97.
33▪▪. Gile K, Handcock M. Network model-assisted inference from respondent-driven sampling data. arXiv:1108.0298v1 [stat.ME]; 2011. http://arxiv.org/pdf/1108.0298.pdf [Accessed 1 December 2013]
The estimator presented in this article offers a more robust option to overcome the violation of certain RDS assumptions.
35▪▪. Yamanis TJ, Merli MG, Neely WW, et al. An empirical analysis of the impact of recruitment patterns on RDS estimates among a socially ordered population of female sex workers in China. Soc Methods Res 2013; 42:392–425.
The authors propose a new bootstrap to correct variance for the branching recruitment process.
36▪▪. McCreesh N, Copas A, Seeley J, et al. Respondent driven sampling: determinants of recruitment and a method to improve point estimation. PLoS One 2013; 8:e78402.
The authors suggest an additional interview step that provides information for improved weighting of RDS estimates. ‘Interview presentation weighting’ improved estimates for characteristics associated with coupon distribution.
37▪. Wejnert C, Pham H, Krishna N, et al. Estimating design effect and calculating sample size for respondent-driven sampling studies of injection drug users in the United States. AIDS Behav 2012; 16:797–806.
A thorough analysis of two rounds of RDS data collection in 20 sites shows that design effect is typically between two and four in these populations, for most variables. This is much lower than published theoretical work.
38. Johnston LG, Chen YH, Silva-Santisteban A, Raymond AH. An empirical examination of respondent driven sampling design effects among HIV risk groups from studies conducted around the world. AIDS Behav 2013; 17:2202–2210.
40. Johnston LG, Prybylski D, Raymond HF, et al. Incorporating the service multiplier method in respondent driven sampling surveys to estimate the size of hidden and hard-to-reach populations: case studies from around the world. Sex Transm Dis 2013; 40:304–310.
41▪▪. Handcock MS, Gile KJ, Mar CM. Estimating hidden population size using respondent driven sampling data. arXiv:1209.6241v1 [stat.ME] 27 Sep 2012.
Applying statistics for incomplete social network data, the authors develop an algorithm for estimating the population size of the sampled network. The algorithm will be implemented in freeware for easy use.