Respondent-driven sampling (RDS) is a network-based method to recruit hidden populations1 that is increasingly used in human immunodeficiency virus (HIV)-related studies of persons who use illicit drugs, engage in commercial sex work, or are men who have sex with men.2–8 RDS involves direct recruitment of peers by their peers, a dual system of incentives, and a coupon system. Recruitment starts with an initial set of subjects known as “seeds,” and continues in waves, with seeds recruiting first-wave respondents, first-wave respondents recruiting second-wave respondents, and so on, until the target sample size is achieved. Respondents are typically compensated monetarily for completing the interview as well as for each peer whom they successfully recruit. A coupon system is used to enforce the recruitment quota (i.e., the number of peers one can recruit into the study), and recruitment information is used to link recruiters to recruits.
RDS is an adaptation of traditional chain-referral sampling methods first introduced by Coleman9 to study characteristics of social networks. It was specifically designed to eliminate some of the biases associated with these methods, such as bias due to nonrandom selection of seeds, volunteerism, and masking.1,10–12 Although RDS can be successful in eliminating these biases, it is prone to other sources of bias such as differential recruitment effectiveness, differential recruitment patterns, and heterogeneity in degree.10,11,13,14
Differential recruitment effectiveness occurs when some groups are better at recruiting than others. When this occurs, the group with better recruitment effectiveness usually becomes overrepresented in the sample.10 Overrepresentation takes place when the population is homophilous (i.e., its members are more likely to connect with other individuals who are similar to themselves), the opposite being true for heterophilous populations (i.e., its members are more likely to connect with other individuals who are dissimilar to themselves). However, since most populations are homophilous, overrepresentation of groups with better recruitment effectiveness is much more common than underrepresentation.
Differential recruitment patterns are usually the result of individuals’ tendencies to associate with other individuals who are similar to them, also known as homophily. This causes personal networks to be homogeneous with regard to many sociodemographic, behavioral, and intrapersonal characteristics.15 The presence of homophily will cause a greater correlation between the sample and its seeds. In the presence of differential recruitment, homophily may bias the sample because recruitment patterns will reflect affiliation patterns, with preference for ties within a group.10
Heterogeneity in degree refers to differences between groups in terms of network size. When such differences exist, subjects with larger network sizes are oversampled because more recruitment paths lead to them.10
In public health, the notion that biased samples can yield benefits is rarely entertained. A biased sample can be problematic if valid statistical inference cannot be made. However, when sources of bias can be identified and quantified, bias becomes less problematic because one may correct for it and obtain unbiased (or at least asymptotically unbiased) estimates of parameters of interest. Bias in RDS has been extensively studied and methods to assess, quantify, and correct for it have been developed and thoroughly described.10–14,16,17 However, the potential benefits that RDS bias may yield have been understudied. We explored bias in an RDS-based study by examining patterns of recruitment of HIV-positive and HIV-negative recruiters. We first determined to what extent differential recruitment effectiveness, differential recruitment patterns, and heterogeneity in degree, by HIV-status, were present in our sample. We subsequently identified factors associated with being recruited by an HIV-positive IDU. Although the goal of this study was not to recruit HIV+ or other high-risk persons, our results have implications for intervention studies that may consider using RDS to identify HIV+ or other high-risk individuals.
Study subjects were 1056 IDUs (1024 recruits and 32 seeds) recruited between April, 2006 and April, 2007, in Tijuana, Mexico, into a prospective study of behavioral and contextual factors associated with HIV, syphilis, and tuberculosis infection. Eligibility criteria included: age ≥18 years; having injected illicit drugs within the past month as confirmed by inspection of injection stigmata (track marks); ability to speak Spanish or English; willingness and ability to provide informed consent; and having no plans to permanently move out of the city in the following 18 months. Methods were approved by the Institutional Review Board of the University of California, San Diego and the Ethics Board of the Tijuana General Hospital.
Study Design and Procedures
Participants were recruited via RDS, whereby a diverse group of seeds, heterogeneous in age, gender, and geographic location, underwent an interview, were educated on how to refer other eligible IDUs, and were given uniquely coded coupons to refer their peers to the study, as described previously.7 Of the 32 subjects treated as seeds, 24 were productive at recruiting other individuals into the study. Recruitment continued in waves as subjects returning with coupons were given coupons to recruit members from their social networks. Although men were given 3 coupons, women received between 6 and 12 coupons in an unsuccessful attempt to recruit more women. Computerized interviews, produced via a questionnaire development system, were conducted by indigenous outreach workers in a modified recreational vehicle and a storefront office. IDUs completed an interviewer-administered survey that elicited information on sociodemographic, behavioral, and contextual characteristics.
Study staff recorded the serial numbers of the coupons given to respondents and of the coupons returned by respondents who were enrolled into the study. This information allowed us to link each recruit with his/her recruiter and to adjust for correlations between recruiter and recruit. We also collected information on the size of each participant's injection drug use network and used this information for a multiplicity adjustment, in which study participants were weighted by the reciprocal of their network sizes.
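The multiplicity adjustment described above can be illustrated with a minimal sketch. It assumes that each respondent's weight is simply the reciprocal of his or her reported network size, normalized to sum to 1. The function names and the toy data are hypothetical, and the sketch is not the procedure implemented in the software used for the study.

```python
def inverse_degree_weights(network_sizes):
    """Weights proportional to 1 / network size, normalized to sum to 1."""
    if any(size <= 0 for size in network_sizes):
        raise ValueError("network sizes must be positive")
    raw = [1.0 / size for size in network_sizes]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_proportion(indicator, network_sizes):
    """Inverse-degree-weighted share of respondents whose indicator is 1."""
    weights = inverse_degree_weights(network_sizes)
    return sum(w for w, flag in zip(weights, indicator) if flag)


# Hypothetical toy data: the two respondents with the trait have large networks.
sizes = [10, 20, 40, 40]
flags = [0, 0, 1, 1]

unweighted = sum(flags) / len(flags)
weighted = weighted_proportion(flags, sizes)
print(f"unweighted: {unweighted:.3f}, weighted: {weighted:.3f}")
```

In this toy example the weighted proportion is lower than the unweighted one, because the respondents carrying the trait have large networks and are therefore down-weighted.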
The “Determine” rapid HIV antibody test was administered to determine the presence of HIV antibodies (Abbott Pharmaceuticals, Boston, MA). All reactive samples were then tested using an HIV-1 enzyme immunoassay and immunofluorescence assay. Syphilis serology used the rapid plasma reagin (RPR) test (Macro-Vue, Becton Dickinson, Cockeysville, MD). RPR-positive samples were subjected to confirmatory testing using the Treponema pallidum particle agglutination assay (Fujirebio, Wilmington, DE). Quantitative nontreponemal test titers were obtained for subjects who tested positive on the RPR test. Titers ≥1:8 were considered to be consistent with active syphilis infection. QuantiFERON TB Gold In-Tube (QFT-G) was used for detecting Mycobacterium tuberculosis infection. Specimen testing was conducted at the San Diego County Health Department. HIV/STI test results were provided to participants after confirmation; those testing positive were referred to the municipal health clinic for free medical care.
Statistical analyses primarily compared recruits of HIV-positive IDUs with recruits of HIV-negative IDUs. Because seeds did not have a recruiter, they were excluded from these comparisons. Additionally, to assess recruitment effectiveness and recruitment patterns by HIV status, we compared HIV-positive subjects with HIV-negative subjects. The entire sample was used for these comparisons. Depending on whether distributional assumptions were met, continuous outcomes were examined using either t tests or Wilcoxon rank sum tests. Similarly, binary outcomes were examined using either the Pearson χ2 test or the Fisher exact test. To control for multiple testing, the raw P values associated with outcomes within each area of interest (i.e., sociodemographics, social influence, individual behaviors/risks, and structural/environmental factors) were adjusted for false discovery rate (FDR) by using the Benjamini and Hochberg method.18 Although both raw and FDR-adjusted P values are listed in Table 1, the corresponding statistical inferences are based on FDR-adjusted P values. To identify factors associated with being the recruit of an HIV-positive IDU, we performed univariate and multivariate logistic regressions. For model building we used a manual procedure, in which all variables that had attained a significance level ≤10% in the univariate models were considered for inclusion in a multivariate model. Lack of multicollinearity between the predictor variables in the final model was confirmed by appropriate values of the largest condition index and of the variance inflation factors.
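For readers unfamiliar with the FDR adjustment, the following is a minimal sketch of the Benjamini and Hochberg step-up procedure applied to one family of raw P values. The function name and the example P values are hypothetical; in the analysis described above, the adjustment would be applied separately within each area of interest.

```python
def benjamini_hochberg(p_values):
    """Return FDR-adjusted P values (step-up procedure), in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P values for one family of outcomes.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
print([round(p, 4) for p in adj])
```

Each adjusted value is the smallest FDR level at which the corresponding outcome would be declared significant, so inference can proceed by comparing the adjusted values with the chosen threshold.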
To correct for differential recruitment effectiveness by HIV status, we calculated inverse probability weights based on individualized recruitment weights. The weights include a factor to control for respondents’ heterogeneity in degree (i.e., multiplicity) and were derived via the Respondent-Driven Sampling Analysis Tool (RDSAT).19 The variable containing the weights was used as a covariate in the logistic regression models. Interactions between this covariate and the predictors were also explored. To account for correlation between recruiter and recruit, we created a variable identifying the recruiter of each subject and used it as the cluster variable in a generalized estimating equations (GEE) algorithm. An exchangeable correlation structure within each cluster was assumed (i.e., the correlation between any 2 subjects recruited by the same recruiter was assumed to be the same).
To estimate the model’s rate of classification accuracy, we conducted Monte-Carlo cross-validation.20 The Monte-Carlo procedure randomly split the data into model fitting and model testing subsets. For each iteration, the proportion of observations for which the model agreed with the outcome was calculated and averaged over the 10,000 iterations to obtain the estimate for the classification accuracy of the model.
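The procedure can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code: the test fraction of 20% is an assumption (the split proportion is not reported here), the number of iterations is reduced, and a trivial majority-class classifier stands in for the logistic model. All function names are hypothetical.

```python
import random


def monte_carlo_cv(X, y, fit, predict, n_iter=1000, test_fraction=0.2, seed=0):
    """Average held-out classification accuracy over repeated random splits."""
    rng = random.Random(seed)
    n = len(y)
    n_test = max(1, int(round(test_fraction * n)))
    indices = list(range(n))
    accuracies = []
    for _ in range(n_iter):
        rng.shuffle(indices)  # new random split each iteration
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        hits = sum(predict(model, X[i]) == y[i] for i in test_idx)
        accuracies.append(hits / n_test)
    return sum(accuracies) / n_iter


# Placeholder model (assumption): always predict the majority training class.
def fit_majority(X_train, y_train):
    return int(2 * sum(y_train) >= len(y_train))


def predict_majority(model, x):
    return model


# Hypothetical toy data: 20 observations, 3 with the outcome.
X = [[i] for i in range(20)]
y = [1, 1, 1] + [0] * 17
accuracy = monte_carlo_cv(X, y, fit_majority, predict_majority)
print(f"estimated classification accuracy: {accuracy:.3f}")
```

Any model exposing a fit step and a predict step can be substituted for the placeholder without changing the cross-validation loop.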
Description of HIV-Positive and HIV-Negative Recruits
Our sample consisted of 1056 (32 seeds and 1024 recruits) participants. Table 1 summarizes the baseline characteristics of the 1024 recruits (50 recruits of HIV-positive IDUs and 974 recruits of HIV-negative IDUs). Most recruits were males (85.2%), the median age was 37, and 67.2% were born outside the Mexican state of Baja California. Overall, 45 recruits were diagnosed as HIV-positive and the majority (93.3%) were previously unaware of their own serostatus. However, the percentage of subjects unaware of their own serostatus was marginally higher among those recruited by HIV-negative subjects as compared to those recruited by HIV-positive subjects (97.5% vs. 60%; PFDR-adj = 0.07). Recruits of HIV-positive versus HIV-negative recruiters did not differ in terms of age, gender, educational attainment, income, marital status, birthplace, or sexual orientation.
We next examined group differences in terms of social influences. Recruits of HIV-positive IDUs had larger numbers of IDUs in their social network (median = 40 vs. 22; PFDR-adj = 0.02), personally knew a larger number of HIV-positive individuals (median = 1.5 vs. 0; PFDR-adj <0.001), were more likely to be friends with the recruiter (80% vs. 61.1%; PFDR-adj = 0.02), and were more likely to come in contact with the recruiter in a shooting gallery (30% vs. 11.9%; PFDR-adj <0.001). They also spent marginally more time on the street (median = 12 vs. 10 hours per day; PFDR-adj = 0.06), but were less likely to come in contact with the recruiter on the street (16% vs. 45%; PFDR-adj <0.001). The 2 groups did not differ in terms of the proportion ever having been forced to have sex, the proportion with high perceived risk of HIV infection, the proportion with an IDU sex partner, or the number of people that they inject with.
We next compared recruits of HIV-positive versus HIV-negative recruiters in terms of risk behaviors, protective behaviors, and infectious disease status. During the previous 6 months, recruits of HIV-positive IDUs were less likely than recruits of HIV-negative IDUs to have had sex with a regular sex partner (10% vs. 59.9%; PFDR-adj = 0.01), more likely to have had sex with casual sex partner(s) (100% vs. 63.2%; PFDR-adj = 0.05), and marginally more likely to have never used a condom during sex with their casual partner(s) (70% vs. 36%; PFDR-adj = 0.09). They also were more likely to report having had unprotected sex with an HIV-infected partner (10% vs. 2%; PFDR-adj = 0.02) and to report obtaining syringes from the Tijuana needle exchange program (34% vs. 15.9%; PFDR-adj = 0.001). Finally, they were more likely to test positive for syphilis antibodies (32% vs. 14.6%; P <0.001), and marginally more likely to test positive for HIV (10% vs. 4%; P = 0.06). The proportion of subjects with syphilis titers ≥1:8 was greater among the recruits of HIV-positive IDUs (12% vs. 7.7%), but this difference did not reach statistical significance (P = 0.27). The groups did not differ in reported years of injection; frequency of injection; receptive needle sharing; male having sex with male status; ever trading sex; or having been tested previously for HIV.
Finally, we examined group differences for a variety of structural influences. Compared to recruits of HIV-negative IDUs, recruits of HIV-positive IDUs were significantly more likely to report having been forced by police to rush an injection (50% vs. 32%; PFDR-adj = 0.04), and to report having been forced by police to leave the place where they lived in the previous 6 months (32.6% vs. 12.6%; PFDR-adj = 0.001). No differences between groups were found in terms of number of years lived in Tijuana; homelessness; places where they inject drugs; ever being arrested; or the number of times in jail/prison.
All variables attaining P ≤0.10 in univariate regressions were considered as candidates for multivariate models (Table 1). The following 6 factors remained independently associated with being the recruit of an HIV-positive recruiter: personally knowing someone with HIV/AIDS (adjusted [adj] OR = 2.4); having had unprotected sex with an HIV-infected person (adj OR = 6.7), lifetime syphilis infection (adj OR = 2.8), meeting the recruiter in a shooting gallery (adj OR = 4.5), having obtained needles from a needle exchange program in the previous 6 months (adj OR = 2.3), and having a larger number of arrests for track marks (adj OR = 1.11) (Table 2). The average classification rate of accuracy for this model yielded by Monte-Carlo cross-validation was 94.96% with a standard error of 0.01%.
Among the 1056 study participants, 47 (4.5%) were diagnosed as HIV-positive. Twenty of the 47 (42.6%) HIV-positive and 449 of the 1009 (44.5%) HIV-negative subjects were recruiters, with HIV-positive subjects generating 4.9% (50/1024) of the recruits. Because HIV-positive subjects comprised 4.5% of the sample and generated 4.9% of the recruits, we found no evidence that recruitment effectiveness varied by HIV status.
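The comparison above rests on simple proportions, which can be reproduced from the counts reported in the text. The variable names below are ours; only the counts come from the study.

```python
# Counts reported in the text.
n_total = 1056             # all participants (seeds and recruits)
n_hiv_pos = 47             # HIV-positive participants
n_recruits = 1024          # participants who were recruited (non-seeds)
recruits_of_hiv_pos = 50   # recruits generated by HIV-positive subjects
recruiters_hiv_pos = 20    # HIV-positive subjects who recruited at least 1 person
recruiters_hiv_neg = 449   # HIV-negative subjects who recruited at least 1 person

n_hiv_neg = n_total - n_hiv_pos

share_of_sample = n_hiv_pos / n_total
share_of_recruits = recruits_of_hiv_pos / n_recruits
recruiter_rate_pos = recruiters_hiv_pos / n_hiv_pos
recruiter_rate_neg = recruiters_hiv_neg / n_hiv_neg

print(f"HIV-positive share of sample:   {100 * share_of_sample:.1f}%")
print(f"HIV-positive share of recruits: {100 * share_of_recruits:.1f}%")
print(f"Recruiters among HIV-positive:  {100 * recruiter_rate_pos:.1f}%")
print(f"Recruiters among HIV-negative:  {100 * recruiter_rate_neg:.1f}%")
```

Under equal recruitment effectiveness, a group's share of the recruits it generates should be close to its share of the sample, which is the comparison made in the text.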
Differential Recruitment Patterns.
Of the subjects recruited by HIV-positive recruiters, 10% (5/50) were HIV-positive, compared with 4.1% (40/974) of the subjects recruited by HIV-negative recruiters (P = 0.06). After controlling for whether the recruiter and the recruit injected drugs together, the odds of an HIV-positive recruiter recruiting an HIV-positive individual were 2.8 times greater than the corresponding odds for an HIV-negative recruiter (P = 0.04). We also found that 20% of the HIV-positive recruiters, as compared to 0% of the HIV-negative recruiters, recruited more than one HIV-positive subject into the study (P = 0.002), indicating that recruitment patterns differed significantly by HIV status. Only 1 of the 20 HIV-positive recruiters was aware of his HIV-positive serostatus, suggesting that HIV-positive recruiters were more likely to recruit other HIV-positive IDUs even though they were not aware of their own serostatus.
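The unadjusted comparison can be laid out as a 2×2 table of recruiter HIV status by recruit HIV status. The sketch below computes the crude odds ratio and a two-sided Fisher exact P value from the reported counts. It is an illustration only: it ignores the clustering of recruits within recruiters and the covariate adjustment described above, so it is not expected to reproduce the adjusted odds ratio, and we do not assume that it is the test behind the reported P value. The function names are hypothetical.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Crude odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value: sum of the probabilities of all tables
    (with the same margins) that are no more likely than the observed one."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    total = sum(p for p in (prob(k) for k in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Counts from the text: rows are recruiter status, columns are recruit status.
pos_pos, pos_neg = 5, 45     # recruits of HIV-positive recruiters
neg_pos, neg_neg = 40, 934   # recruits of HIV-negative recruiters

crude_or = odds_ratio(pos_pos, pos_neg, neg_pos, neg_neg)
p_value = fisher_exact_two_sided(pos_pos, pos_neg, neg_pos, neg_neg)
print(f"crude odds ratio: {crude_or:.2f}")
print(f"Fisher exact P (two-sided): {p_value:.3f}")
```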
Heterogeneity in Degree.
As chain referral samples are biased towards individuals with larger network sizes, we adjusted the distribution of network size by weighting each observation by the inverse of its network size.12 This led to a substantial drop in estimated network size, from medians of 95 and 70 for recruits of HIV-positive and HIV-negative recruiters to corresponding values of 40 and 22, respectively. Recruits of HIV-positive recruiters had significantly larger network sizes than recruits of HIV-negative recruiters (PFDR-adj = 0.02) and thus had a higher probability of being recruited into the study.
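To show why inverse-degree weighting lowers the median, the following minimal sketch computes a weighted median in which each reported network size is weighted by its own reciprocal. The function name and the toy values are hypothetical, and the exact estimator used in the study may differ.

```python
def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total."""
    pairs = sorted(zip(values, weights))
    half = sum(w for _, w in pairs) / 2
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    raise ValueError("values must be non-empty")


# Hypothetical reported network sizes.
sizes = [5, 10, 20, 100, 200]

raw_median = weighted_median(sizes, [1.0] * len(sizes))
adjusted_median = weighted_median(sizes, [1.0 / s for s in sizes])
print(f"unweighted median: {raw_median}, inverse-degree-weighted median: {adjusted_median}")
```

Because respondents with large networks receive small weights, the weighted median shifts towards the smaller network sizes, which mirrors the direction of the drop reported above.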
An important contribution of this study is the finding that HIV-positive IDUs were significantly more likely than HIV-negative IDUs to recruit other HIV-positive IDUs into a research study, even though they were unaware of their own HIV serostatus. The finding not only reinforces the assertion that to draw valid statistical inferences from RDS-based studies, one has to assess and possibly adjust for differential recruitment bias, but also that RDS can be used to identify HIV-positive individuals who are unaware of their HIV-positive status who can then be referred to counseling and medical services. One has to keep in mind that the goal of our study was not to recruit HIV-positive or other high-risk individuals, and as such, only 2 of the seeds in our study were HIV-positive. However, RDS can be easily adapted to oversample individuals, such as undiagnosed HIV cases, if we know certain characteristics of the people who are more likely to recruit them. For instance, if the goal is to recruit high-risk individuals, one strategy may be to initiate RDS recruitment with most or all seeds consisting of HIV-positive and/or other high risk individuals.
A second important finding is that compared to recruits of HIV-negative IDUs, recruits of HIV-positive IDUs have larger IDU networks. This indicates not only that recruits of HIV-positive IDUs have a higher probability of selection but also suggests that they have a heightened vulnerability to HIV infection. For instance, having a larger number of peers who are IDU has been associated with higher levels of needle sharing,21 overdose,22 and lower drug use cessation,23 which are known risk factors for HIV infection. Thus, RDS can potentially be used to identify not only undiagnosed HIV cases but also HIV-negative individuals at high risk of HIV acquisition.
Finally, compared to recruits of HIV-negative IDUs, recruits of HIV-positive IDUs were more likely to know someone infected with HIV, to have unprotected sex with an HIV-infected person, to have a higher lifetime prevalence of syphilis antibodies, and to have been more frequently arrested for track marks. These factors have been associated with an increased risk of acquiring HIV in our population,7 and others.24–26
Our study was limited by the fact that it only analyzed characteristics and behaviors of the HIV-positive and HIV-negative recruits cross-sectionally, allowing us to identify factors associated with being recruited by an HIV-positive recruiter, without being able to ascribe causal interpretations to the data. Longitudinal studies are needed to ascribe such interpretations. We were also limited by the fact that only 2 of the seeds in our sample were HIV-positive, which precluded the comparison of recruitment patterns between HIV-positive and HIV-negative seeds. Another limitation was that only 1 of the 20 HIV-positive recruiters was aware of his HIV-positive serostatus, and so our finding that HIV-positive recruiters are more likely than HIV-negative recruiters to recruit other HIV-positive individuals can only be generalized to HIV-positive recruiters who are unaware of their HIV serostatus. Further studies including HIV-positive respondents who are aware of their HIV serostatus are needed to determine their recruitment patterns. Similarly, our sample included a low proportion of female participants. However, an RDS convergence analysis by gender indicated that the sample composition reached equilibrium, thus supporting the robustness of our results.
Overall, our findings suggest that beyond success in recruiting hidden populations, RDS may facilitate identification of persons with undiagnosed HIV infection and high-risk networks. As such, RDS could become a vehicle for what has been referred to as “the new generation of network-based interventions.”27 An example of such intervention is the peer-driven intervention of the Eastern Connecticut Health Outreach Project, whereby IDUs were recruited through RDS and successfully motivated to recruit and educate each other about HIV prevention through a voucher-based incentive system.27,28 Heckathorn et al27 found that compared to the traditional street-based outreach intervention, the peer-driven intervention not only accessed a larger number of people and was more effective in reducing their levels of HIV risk behaviors, but it did so at a lower cost. Similarly, in an evaluation of 4 different sampling methods (targeted, stratified, time-space, and RDS) Semaan et al29 concluded that RDS uses the “least amount of formative research and resources.”
These findings suggest that RDS-based network interventions may be especially useful in resource-constrained settings with emerging HIV epidemics, like Tijuana. Our study opens the doors to a very practical application that can be used in fields other than HIV research. If our results can be generalized to other populations, RDS can be easily adapted to oversample individuals if we know certain characteristics of the people who are more likely to recruit them.
1.Heckathorn DD. Respondent driven sampling: A new approach to study hidden populations. Soc Probl 1997; 44:174–179.
2.Abdul-Quader AS, Heckathorn DD, McKnight C, et al. Effectiveness of respondent-driven sampling for recruiting drug users in New York City: Findings from a pilot study. J Urban Health 2006; 83:458–476.
3.Abdul-Quader AS, Heckathorn DD, Sabin K, et al. Implementation and analysis of respondent driven sampling: Lessons learned from the field. J Urban Health 2006; 83:i1–i5.
4.Yeka W, Maibani-Michie G, Prybylski D, et al. Application of respondent driven sampling to collect baseline data on FSW and MSM for HIV risk reduction interventions in two urban centers in Papua New Guinea. J Urban Health 2006; 83(7 suppl 1):i60–i72.
5.Johnston LG, Sabin K, Hien MT, et al. Effectiveness of respondent-driven sampling to recruit female sex workers in two cities in Vietnam. J Urban Health 2006; 83(suppl 7):16–28.
6.Magnani R, Sabin K, Saidel T, et al. Review of sampling hard-to-reach and hidden populations for HIV surveillance. AIDS 2005; 19(suppl 2):S67–S72.
7.Strathdee SA, Lozada R, Pollini RA, et al. Individual, social, and environmental influences associated with HIV infection among injection drug users in Tijuana, Mexico. J Acquir Immune Defic Syndr 2008; 47:369–376.
8.Ramirez-Valles J, Heckathorn DD, Vázquez R, et al. From networks to populations: The development and application of respondent-driven sampling among IDUs and Latino gay men. AIDS Behav 2005; 9:387–402.
9.Coleman JS. Relational analysis: The study of social organization with survey methods. Hum Organ 1958; 17:28–36.
10.Heckathorn DD. Respondent driven sampling II: Deriving valid populations estimates from chain-referral samples of hidden populations. Soc Probl 2002; 49:11–34.
11.Heckathorn DD, Semaan S, Broadhead RS, et al. Extensions of respondent-driven sampling: A new approach to the study of injection drug users aged 18–25. AIDS Behav 2002; 6:55–67.
12.Salganik MJ, Heckathorn DD. Sampling and estimation in hidden populations using respondent-driven sampling. Sociol Methodol 2004; 34:193–240.
13.Heckathorn DD. Extensions of respondent-driven sampling: Analyzing continuous variables and controlling for differential recruitment. Sociol Methodol 2007; 6:151–208.
14.Ramirez-Valles J, Heckathorn DD, Vasquez R, et al. The fit between theory and data in respondent-driven sampling: Response to Heimer. AIDS Behav 2005; 9:409–414.
15.McPherson M, Smith-Lovin L, Cook JM. Birds of a feather: Homophily in social networks. Ann Rev Sociol 2001; 27:415–444.
16.Goel S, Salganik M. Respondent driven sampling as Markov Chain Monte Carlo. Available at: http://www.cam.cornell.edu/~sharad/papers/RDSasMCMC.pdf. Accessed September 10, 2008.
17.Volz E, Heckathorn DD. Probability based estimation theory for respondent driven sampling. J Off Stat 2008; 24:79–97.
18.Benjamini Y, Hochberg Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J R Stat Soc Series B Stat Methodol 1995; 57:289–300.
19.Volz E, Wejnert C, Degani I, et al. Respondent-Driven Sampling Analysis Tool (RDSAT), Version 5.6. Ithaca, NY: Cornell University, 2007.
20.Shao J. Linear model selection by cross-validation. J Am Stat Assoc 1993; 88:486–494.
21.Costenbader EC, Astone NM, Latkin CA. The dynamics of injection drug users’ personal networks and HIV risk behaviors. Addiction 2006; 101:1003–1013.
22.Tobin KE, Hua W, Costenbader EC, et al. The association between change in social network characteristics and non-fatal overdose: Results from the SHIELD study in Baltimore, MD, USA. Drug Alcohol Depend 2007; 87:63–68.
23.Latkin CA, Knowlton AR, Hoover D, et al. Drug network characteristics as a predictor of cessation of drug use among adult injection drug users: A prospective study. Am J Drug Alcohol Abuse 1999; 25:463–473.
24.El-Bassel N, Gilbert L, Wu E, et al. A social network profile and HIV risk among men on methadone: Do social networks matter? J Urban Health 2006; 83:602–613.
25.Buchacz K, Klausner JD, Kerndt PR, et al. HIV incidence among men diagnosed with early syphilis in Atlanta, San Francisco, and Los Angeles, 2004 to 2005. J Acquir Immune Defic Syndr 2008; 47:234–240.
26.Service SK, Blower SM. HIV transmission in sexual networks: An empirical analysis. Proc R Soc Lond B Biol Sci 1995; 260:237–244.
27.Heckathorn DD, Broadhead RS, Anthony DL. AIDS and social networks: HIV prevention through network mobilization. Sociol Focus 1999; 32:159–179.
28.Broadhead RS, Heckathorn DD, Weakliem DL, et al. Harnessing peer networks as an instrument for AIDS prevention: Results from a peer-driven intervention. Public Health Rep 1998; 113(suppl 1):42–57.
29.Semaan S, Lauby J, Liebman J. Street and network sampling in evaluation studies of HIV risk-reduction interventions. AIDS Rev 2002; 4:213–223.