Estimating the size of populations at high risk of HIV is critical to inform programming and policy activities, including advocacy, response planning and resource allocation, modelling, projections and programme management. The independent size estimation of high-risk populations involved in covert, stigmatized and socially ostracized activities is a difficult exercise [2,3]. Mapping has limitations when groups are largely clandestine, operate in diverse locations or when interventions lack strong networks in the community; nevertheless, mapping for interventions (Table 1) is required to target interventions appropriately [4–6]. Regular updates from intervention staff on the number and location of risk group members, together with results from independent mapping for sampling frame development and size estimation, will help make the data more comprehensive. There is no standard method of size estimation that can be used universally. The UNAIDS Working Group reviewed the major size estimation methods, and some studies have compared them for reliability [1,7–9].
Avahan, the India AIDS initiative, is implementing a large-scale HIV prevention effort in southern and north-east India [10,11]. As part of the overall evaluation strategy, a large cross-sectional survey (termed the integrated behavioural and biological assessment; IBBA) of the key populations was carried out to monitor behavioural and biological trends and to obtain data for modelling the impact of the Avahan programme [12,13]. Estimates of the size of high-risk groups in four high HIV prevalence states in India (Karnataka, Tamil Nadu, Andhra Pradesh, Maharashtra) suggest that 0.56–0.73% of adult urban women are female sex workers (FSW) and 0.12–0.18% and 0.03% of adult urban men are high-risk men who have sex with men (MSM) and injecting drug users (IDU), respectively [14,15].
Size estimates were calculated using three different methodologies: capture–recapture, the multiplier method and an approach based on the Hansen and Hurwitz model [16] (henceforth termed the reverse tracking method; RTM). Estimates were evaluated for validity and compared with each other and with programme data [7,8,17,18]. This paper reviews operational issues in the context of a large-scale, diverse programme and the validity of estimates for the states where size estimation took place.
As part of IBBA, comprehensive sampling frames of each district (i.e. sampling domain) and probability sampling (cluster-based sampling and respondent-driven sampling; RDS) offered an excellent opportunity for size estimation. A detailed description of IBBA methodology is included in a paper in this volume. Briefly, cluster-based sampling and RDS were applied depending on local population dynamics. In cluster sampling, the mapping teams: met with all local non-governmental organizations (NGO) (both Avahan and non-Avahan) to collect information on hotspots; and independently visited each hotspot, and identified new ones, to update information with regard to the number (minimum and maximum) of risk group members at the site and timing (peak and lean days/times). Keeping in mind that the number of individuals at the site could change at different times of day, the mapping team followed the Delphi method (Table 1), taking the help of peer educators, members of the risk group and key informants at the site. When specific members were associated with a single site and the size of the site was stable, conventional cluster sampling (CCS) was used by treating the site as a cluster; otherwise time location cluster sampling (TLCS) was used. RDS, a method used for conducting surveys with hidden populations, relies on community members recruiting a limited number of their peers for participation, and those peers recruiting their peers, etc., until completion of the survey. The size estimation component focuses on FSW, high-risk MSM and IDU in 24, 10 and five survey groups, respectively.
Capture–recapture calculates size estimates by first ‘tagging’ a number of individuals and, through an independent recapture, calculating the proportion of overlap. One to 2 weeks before IBBA, unique objects (e.g. key chains, plastic stars) were distributed to individuals who met the survey eligibility criteria. Objects were selected based on the likelihood of the community easily remembering them without confusing them with other, similar, objects. Distribution throughout the IBBA sampling domain was done by teams of peer educators from NGO already working with the target population, survey team members and community monitoring board members (i.e. members of the target population who provided feedback on survey quality and community concerns). Those who received the object were requested to retain it. During the interview, respondents were shown the object and asked whether they had received it in the previous 1–8 weeks depending on the survey period. After survey completion, the procedures executed for capture–recapture were reviewed with each survey group (i.e. to establish whether all the conditions required for capture–recapture were satisfied) through personal discussions with the individuals who actually distributed the objects. Details of the coverage of the distribution, the number distributed, how and when distribution took place and how many people were involved in the process were also collected. The proposed method of estimation was based on Petersen's capture–recapture procedure [23].
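The two-sample calculation described above can be illustrated with a minimal sketch. It assumes the simple unweighted Lincoln–Petersen form (objects distributed × respondents surveyed ÷ respondents reporting receipt) and a commonly used large-sample variance approximation; the paper does not state which interval formula was applied, and the function name and all numbers are hypothetical.

```python
import math

def petersen_estimate(n_tagged, n_surveyed, n_recaptured):
    """Lincoln-Petersen size estimate with an approximate 95% CI.

    n_tagged     -- objects distributed to eligible individuals (first capture)
    n_surveyed   -- survey respondents (second capture)
    n_recaptured -- respondents who report having received the object
    """
    if n_recaptured <= 0:
        raise ValueError("at least one recapture is needed")
    estimate = n_tagged * n_surveyed / n_recaptured
    # Large-sample variance approximation (an assumption, not taken from the paper)
    variance = (n_tagged * n_surveyed
                * (n_tagged - n_recaptured) * (n_surveyed - n_recaptured)
                / n_recaptured ** 3)
    half_width = 1.96 * math.sqrt(variance)
    return estimate, estimate - half_width, estimate + half_width

# Hypothetical numbers: 200 objects distributed, 400 respondents, 50 report receipt
size, low, high = petersen_estimate(200, 400, 50)
print(f"estimated size {size:.0f} (95% CI {low:.0f} to {high:.0f})")
```

With few recaptures the interval widens quickly, which is one way to see why distributing more objects gives a more precise estimate.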
The multiplier method compares two independent, overlapping sources of information to estimate the size of populations. The method, although mathematically simple, requires respondents to remember exposure to a specific service/event. Good record keeping at the service/event site and precise questions in the survey instruments are required.
Service information from Avahan NGO working with FSW and high-risk MSM in the district was one source of data, and the reported exposure to the same services during the IBBA interview was the second. Four indicators for exposure to intervention were designed in consultation with the state Avahan partners: the number of individuals who (1) were currently (i.e. during the survey period) registered with the project; (2) were contacted by peer educators or outreach workers in the past month; (3) received a new health card in the past 3 months; and (4) visited an Avahan clinic in the past 3 months.
Exposure to the above services was confirmed in the IBBA through use of the project name and showing the project logo (there were seven different logos and names corresponding to the seven state partners) and then asking specific service-related questions to each of the respondents. Data with respect to the above four services were collected from the records of local Avahan NGO working with the same risk group.
The estimate of the population size (ŜM) by this method was obtained by: ŜM = (1/P) × M
The 95% confidence interval (CI) for the size estimate ŜM:
Lower limit = (1/PU) × M
Upper limit = (1/PL) × M
Where: M is the number of subpopulation members exposed to an event/service (e.g. total number registered for a programme in a sampling district). P is the estimated weighted* proportion of the sampled population (estimated by the complex sampling plan module of SPSS 14.0) reporting exposure to the service/event. PU and PL are the upper and lower 95% confidence limits of P estimated by the complex sampling plan.
*Since simple random sampling was not done, the weighted proportion was used.
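As a minimal sketch of the arithmetic above, the following function applies ŜM = M/P and inverts the confidence limits of P to obtain the limits of the size estimate. The function name and the example values are hypothetical; in the survey, P and its limits came from the weighted complex-sample analysis rather than from raw counts.

```python
def multiplier_estimate(m_exposed, p, p_lower, p_upper):
    """Multiplier-method size estimate with its 95% CI.

    m_exposed -- M, count of individuals in programme records for the indicator
    p         -- weighted proportion of respondents reporting that exposure
    p_lower   -- lower 95% confidence limit of p (PL)
    p_upper   -- upper 95% confidence limit of p (PU)
    """
    for value in (p, p_lower, p_upper):
        if not 0 < value <= 1:
            raise ValueError("proportions must lie in (0, 1]")
    estimate = m_exposed / p
    lower = m_exposed / p_upper   # the larger proportion gives the smaller size
    upper = m_exposed / p_lower
    return estimate, lower, upper

# Hypothetical example: 1200 registered individuals, 60% of respondents report registration
size, low, high = multiplier_estimate(1200, 0.60, 0.52, 0.68)
print(f"estimated size {size:.0f} (95% CI {low:.0f} to {high:.0f})")
```

Because the limits are obtained by inversion, the resulting interval is not symmetric around the point estimate.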
Reverse tracking method
The RTM relies on updated mapping for sampling frame development and the actual size of the selected clusters obtained during the survey. Systematic random sampling with probability proportional to size was used to select clusters. Selection of clusters within a district was done separately for CCS and TLCS. If a cluster was selected for the survey, the team visited the cluster and listed the actual number of members associated with the cluster (NCi), if CCS, or the number of individuals who visited the cluster (NTi) during a specified time period, if TLCS. A detailed description of the method with a numerical example is given in Appendices 1 and 2.
The 95% CI is generally estimated through a complex process for systematic random sampling without replacement (complexity increases with n and only an approximation is possible), and the gain compared with sampling with replacement may not be substantial if the number of selected clusters ‘n’ is large. As between 51 and 202 clusters were selected for each FSW survey group and between 28 and 99 clusters for each high-risk MSM survey group, n was reasonably large. Therefore, the CI was calculated assuming selection of clusters with replacement.
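Systematic selection with probability proportional to size can be sketched as follows. This is a generic illustration of the technique, not the IBBA field protocol: the function name, the random start rule and the example sizes are assumptions. Clusters are laid end to end on a line of length M, and every (M/n)th point from a random start picks the cluster it falls in.

```python
import random
from itertools import accumulate

def pps_systematic_sample(sizes, n, rng=None):
    """Select n clusters by systematic PPS sampling.

    sizes -- mapped measure of size (Mi) for each cluster in the frame
    n     -- number of selections
    Returns the indices of the selected clusters; an index can repeat when
    a cluster's mapped size exceeds the sampling interval.
    """
    rng = rng or random.Random()
    total = sum(sizes)
    interval = total / n
    start = rng.random() * interval          # random start in [0, interval)
    cumulative = list(accumulate(sizes))
    selected, idx = [], 0
    for k in range(n):
        point = start + k * interval
        # advance to the cluster whose cumulative range contains the point
        while idx < len(sizes) - 1 and cumulative[idx] <= point:
            idx += 1
        selected.append(idx)
    return selected

# Hypothetical frame: one large site among smaller ones, four selections
print(pps_systematic_sample([60, 10, 10, 20], 4, random.Random(1)))
```

In this hypothetical frame the first site (mapped size 60) is larger than the sampling interval of 25, so it is selected at least twice.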
Avahan programme estimates
Before implementing their interventions, Avahan-supported NGO conducted mapping and size estimation exercises. Most Avahan partners did a formal (externally managed) size estimation, which used a combination of social and geographical mapping with iterative, intensive Delphi techniques in urban and peri-urban areas. Each Avahan state partner designed their own tools for this. Most updated their size estimations every 6 months to one year either through a full-fledged mapping exercise or by regularly updating information obtained by peer educators (Avahan, personal communication).
The RTM was not applied in three FSW and four IDU groups in which RDS was used, as mapping for sampling frame development was not part of survey preparation. For three FSW groups (Hyderabad, Karimnagar, Warangal) and IDU in Bishnupur, capture–recapture and the multiplier method were not applied as protocols were not finalized before survey implementation. The multiplier method was not applied in groups in which RDS was used because the method had not been defined before IBBA implementation.
In assessing adherence to underlying assumptions and the necessary rigour of data source for the multiplier method and capture–recapture, capture–recapture was considered acceptable for calculating size estimates in Andhra Pradesh, Nagaland and Maharashtra. In particular, for programme indicators 2 (peer contact) and 4 (clinic visit) used in the multiplier method, a detailed review of documentation kept by NGO (Avahan) after survey completion revealed that data were not unique with respect to the individual. Duplication was possible as the outreach worker and clinic staff could not precisely calculate the total number of individuals contacted during the past 3 months. With respect to programme indicator 3 (health card) and 1 (registration), not all members of the risk group had a chance (i.e. a non-zero probability) to receive the health card or be registered in districts where Avahan was not the sole implementing partner. Furthermore, individuals could not recall if they received the card during the past 3 months, some NGO discontinued distribution of health cards and many were unaware that they were ‘registered’ with an NGO even if they accessed services from the same.
Questions assessing exposure to interventions by non-Avahan NGO were not addressed in as much detail as Avahan supported interventions in the IBBA questionnaire. Programme data from these NGO were not available to calculate size estimates. Therefore the two information sources used in the multiplier method may have covered different catchment areas in places where non-Avahan NGO were also present.
In districts of Tamil Nadu, preparation for capture–recapture was not completed before the survey started as some peers retained the object for themselves instead of distributing it, and some respondents were given the object just before participating in the survey (i.e. data were not independent). Registration in the state was widespread and well documented. Although some eligible individuals may have had a zero chance of representation in programme data as Avahan was not the sole implementing partner for FSW in three districts (Chennai, Madurai, Dharmapuri), the multiplier method using ‘registration’ was considered acceptable (compared with capture–recapture). The RTM was applied with all survey groups in which a sampling frame was used. If the ratio of Ni to Mi (the measure of size in mapping) varies greatly between clusters, the standard error will be large. As it happened, the Ni were found to be proportional or close to the Mi; therefore variances were small, resulting in narrow CI.
Comparisons below are based on the RTM, Avahan programme estimate and capture–recapture or the multiplier method depending on the survey group. Four of 13 districts in which capture–recapture and the RTM were applied and one of five districts in which the multiplier method and the RTM were applied showed similar results for FSW (Table 2). The results of the RTM were generally similar in Maharashtra where capture–recapture was used but not in Andhra Pradesh. When compared with survey estimates, Avahan programme estimates were larger in 14 of 24 FSW groups.
Among high-risk MSM, data from capture–recapture and the RTM showed similar results in four of six survey groups and two of four survey groups in which the multiplier method and the RTM were used (Table 3). As compared with survey estimates, Avahan programme estimates were larger in five of 10 survey groups. With IDU, two of three estimates using capture–recapture were similar to the Avahan programme estimates (Table 3).
The various size estimation techniques yielded divergent results from each other and from programme estimates.
Capture–recapture, although mathematically simple and less resource intensive, presents several implementation issues. When a unique object is distributed as one source of data, random distribution should be ensured so that individuals have a non-zero, equal probability of receiving the object and geographical coverage and eligibility criteria match in both data sources. Using community members for object distribution made it difficult to track the number of objects actually distributed to eligible populations, although in theory community members should have the best access to target populations. Verifying that all objects were distributed randomly and matching geographical coverage between data sources was impossible. The object distributed must be memorable and distributed close to the time of survey to reduce the effects of migration and recall bias. When an object is too valuable, it may not get distributed. Yet, when the object is of no value, individuals may give the object away or forget receiving it. If there is in or outmigration between the time the object is distributed and the survey, the size estimate will not be accurate. Distribution of a larger number of objects would have produced a less biased and more precise estimate.
Nevertheless, with multiple district-wide surveys in which multiple interventions exist, capture–recapture appeared to be more reliable than the multiplier method partly because of the shorter time period between the distribution of objects and the survey. Capture–recapture is operationally simple as there is no need to match survey exposure questions with information collected by NGO, which can be a difficult task for outsiders to the project. Applying knowledge about the location and size of hotspots and the mobility of populations to select sites for distribution and the number of objects to distribute may help make the estimate less biased in future surveys.
Although the questions for exposure to intervention had been discussed with Avahan state partners, difficulty in collecting the same was realized only after the survey. Understanding actual field methods of programme data collection at the NGO level was essential. The scale of the IBBA and time constraints prevented this from being done more thoroughly. Assessing exposure to the intervention itself was problematical as the intensity of branding of the intervention varied both by state partners and NGO. As a result, the recall of specific programme services was difficult to guarantee. Moreover, in districts with NGO supported by different funding agencies, individuals may be exposed to multiple programmes, but may not be able to differentiate between them and may be reported more than once in programme data. Different methods of assessing general programme exposure should be pre-tested. Assigning the review of programme data and accessing data as a specific task including field visits will help improve the quality of the multiplier method. When a large number of surveys are implemented, like in this IBBA, feasibility of the multiplier method should be considered carefully given the amount of programme data in different places that need to be assessed to identify the right multiplier.
Reverse tracking method
Data for the RTM were collected as a part of field preparation for the IBBA survey but may not be routinely available. The RTM does not account for populations that are hidden or mobile, that do not visit the cluster on the day of survey, or that may otherwise be missed, and thus would underestimate population size. Collecting additional information during the survey by the method proposed by Laska et al. [31] and its extension by Tate and Hudgens [3], or the truncated Poisson approach by Van der Heijden et al. [32], could resolve this (to some extent).
The RTM is advantageous because Mi need not be accurate as it is enough to be indicative of the ‘bigness that may be highly correlated with actual size’. This method can be applied without any additional cost in cluster-based surveys in which all of the information (mapping, cluster data) is required for the survey itself.
In applying the RTM, a few limitations were identified. Individuals counted at a site during mapping or the survey may not be uniquely and exclusively attached to a cluster, especially in the case of surveys covered by TLCS. By nature, individuals soliciting at cruising sites are mobile and may frequent other sites within the same geographical area. If individuals solicit at multiple sites, overestimation of the population size is possible. Collecting information from respondents about other venues frequented during defined periods may help to reduce this effect [31,34]. In the IBBA design, clusters of a site could be sampled more than once depending on cluster size and sampling interval. Practically resolving the NTi specific to selected clusters was difficult in such situations.
Standard errors of the RTM estimate were calculated by assuming that the sampling was with replacement.
Programme-based size estimates from interventions well established in the community are more easily updated as project staff are in regular contact with the groups. The trust built with the community over time helps in the identification of hidden subpopulations. Frequent updating of site size information helps address issues of mobility, thereby ensuring that data reflect the latest known solicitation behaviour. The target population may not be able to provide a precise estimate of the site size. If the group is partly or mostly hidden, use of the census method, with in-depth interviews or discussions, may not be effective enough to understand the size of the population as a whole. Consensus on the overall size estimates is difficult in districts with multiple interventions: sites of solicitation may overlap, individuals may visit multiple interventions and others may avoid programme services altogether.
In comparing the IBBA district-wide size estimate with the Avahan programme estimates, programme estimates were generally larger. Whereas this may indicate that the programme has penetrated further into the community, working with more hidden individuals, it could also indicate an inflated size estimate. This may happen when individuals visit multiple sites, are counted by more than one intervention, when individuals that are not part of the target community access programme services, or when funding is linked to size estimates.
Multiple tools for calculating size estimation may result in different estimates. Standardizing the survey method of size estimation across multiple surveys and locations may be tempting but can be misleading if the dynamics of the target population, strength of programmes and programme data and variation in survey methods are not considered (Table 4). Triangulation of data can yield a more focused estimate. Increasing time devoted to selecting, planning and localizing survey size estimation methods and operational plans should be prioritized. Specific plans to ensure assumptions are addressed for each survey in which estimates will be generated are needed.
Further exploration of the size estimation techniques is required to understand whether and when different techniques are suitable. This may help in resolving large variations in size estimates. For example: when both TLCS and CCS were employed to cover one risk group, how did this affect the estimation? With more mobile or hidden risk groups, how are the assumptions addressed? How do slight variations in recall affect the estimate? Enhancing this exploration with qualitative methods to understand trends in the behaviour of the risk group (e.g. seasonal migration, mobility as a result of harassment) will help in evaluating the methodologies.
The authors would like to thank the Avahan state partners who provided district-level programme size estimation data including: Tamil Nadu AIDS Initiative, International HIV/AIDS Alliance, Hindustan Latex Family Planning Promotion Trust, Emmanuel Hospital Association and Australian International Health Institute, Pathfinder International, and Family Health International.
IBBA study team
National AIDS Research Institute (NARI), Pune: Abhijit Deshpande, Amey S., Amol Salagare, Arun Risbud, Bhagyashri, Deepak More, Dilip Pardeshi, Geetanjali Mehetre, Jagnnath Navale, Jayesh Dale, Mandar Mainkar, Milind Pore, Narayan Panchal, Rahul Gupta, Raman Gangakhedkar, Ramesh Paranjape, Sachin Kale, Sachin P., Shailaja Aralkar, Shashikant Vetal, Shirin Kazi, Shradha Gaikwad, Shradha Jadhav, Sucheta Deshpande, Sujata Zankar, Tanuja Khatavkar, Trupti Joshi, Uma Mahajan
National Institute of Nutrition (NIN) Hyderabad: B. Narayana Goud, B. Sesikeran, Ch. Hanumatha Reddy, G. Krishna Reddy, G.N.V. Brahmam, K. Venkaiah, L.A. Rama Raju, M. Chandra Sekhara Rao, M. Shamsuddin, R. Harikumar, R. Hemalatha, S.P.V. Prasad, V.V. Annapurna
National Institute of Epidemiology (NIE), Chennai: A. Bhubneswari, A. Manjula, A. Pauline Priscilla, A. Sivaraman, Beena Thomas, C. Femina, C. Kalpana Devi, C. Selvendran, C.P. Girish Kumar, D. Prabhu, J. Rajkumar, Jagan, Jeyasingh, Joseph David, K.J. Dhananjeyan, K.J. Kalyanam, L. Palani, M. Amulu, M. Stabri Dhanabakyam, Michael, Muniraja, Paul Tambi, R. Muthu, S. Karthikeyan, S. Periasamy, S. Tilakvathi, S. Velan, Stephen Raja, T. Karunakaran, T. Rabinson, T. Venkata Rao
National Institute of Medical Statistics (NIMS), New Delhi: Arvind Pandey, B.S. Sharma, D. Sahu, D.K. Joshi, G.P. Jena, M. Thomas, Nandini Roy, P. Mahato, R.P. Sharma, R.S. Chadha, S.K. Benara, U. Sengupta
Regional Medical Research Council (RMRC), Dibrugarh: Ashim Das, Basumati Apum, D. Borasaikia, Dulon Chetia, G.K. Medhi, Gajendra Singh Golap, Ch. Barua, Gunavi Sonowal, H.K. Das, J. Mahanta, Jogeswar Barman, Manas Barman, Mintu Gogoi, Nabajyoti Laskar, Purnima Barua, R. Gogoi, S.Z. Hussain, T. Rahman, Utpal Saikia, Wahid Bora
Family Health International (FHI): Ajay Prakash, Bitra George, Gay Thongamba, Kathleen Kay, Lakshmi Ramakrishnan, Motiur Rahman, Nandan Roy, Sharad Malhotra, Srinivasan Kallam, Tobi Saidel, Umesh Chawla
Sponsorship: Support for this study was provided by the Bill and Melinda Gates Foundation.
The views expressed herein are those of the authors and do not necessarily reflect the official policy or position of the Bill and Melinda Gates Foundation.
Conflicts of interest: None.
2. Vandepitte J, Lyerla R, Dallabetta G, Crabbe F, Alary M, Buve A. Estimates of the number of female sex workers in different regions of the world. Sex Transm Infect 2006; 82(Suppl 3):iii18–iii25.
3. Tate JE, Hudgens MG. Estimating population size with two and three stage sampling designs. Am J Epidemiol 2007; 165:1314–1320.
5. Zhang D, Wang L, Lv F, Su W, Liu Y, Shen R, et al. Advantages and challenges of using census and multiplier method to estimate the number of female sex workers in a Chinese city. AIDS Care 2007; 19:17–19.
6. CARE Bangladesh/UNAIDS South Asia Inter Country Team. Guidelines for behaviour change interventions to prevent HIV: sharing lessons from an experience in Bangladesh based on the application of lessons from Sonagachi, Kolkata. 2003. For details E-mail: email@example.com.
7. Luan R, Zeng G, Zhang D, Luo L, Yuan P, Liang B, et al. A study on methods of estimating the population size of men who have sex with men in Southwest China. Eur J Epidemiol 2005; 20:581–585.
8. Archibald CP, Jayaraman GC, Major C, Patrick DM, Houston SM, Sutherland D. Estimating the size of hard-to-reach populations: a novel method using HIV testing data compared to other methods. AIDS 2001; 15(Suppl. 3):S41–S48.
9. UNODC Global Assessment Program on Drug Abuse. Estimating prevalence: indirect methods for estimating the size of the drug problem. 2003, GAP Toolkit Module 2. www.unodc.org/unodc/en/GAP/index.html. Accessed: October 2008.
10. Steen R, Mogasale V, Wi T, Singh AK, Das A, Daly C, et al. Pursuing scale and quality in STI interventions with sex workers: initial results from Avahan India AIDS Initiative. Sex Transm Infect 2006; 82:381–385.
11. The Bill and Melinda Gates Foundation. Avahan – the India AIDS initiative: the business of HIV prevention at scale. New Delhi, India: The Bill and Melinda Gates Foundation; 2008.
12. Chandrasekaran P, Dallabetta G, Loo V, Mills S, Saidel T, Adhikary R, et al. Evaluation design for large-scale HIV prevention programmes: the case of Avahan, the India AIDS initiative. AIDS 2008; 22(Suppl. 5):S1–S15.
13. The Bill and Melinda Gates Foundation. Use it or lose it: how Avahan used data to shape its HIV prevention efforts in India. New Delhi, India: The Bill and Melinda Gates Foundation; 2008.
14. Chandrasekaran P, Dallabetta G, Loo V, Rao S, Gayle H, Alexander A. Containing HIV/AIDS in India: the unfinished agenda. Lancet Infect Dis
15. Census of India 2001. Census data summary. Available at: www.censusindia.gov.in. Accessed: 26 October 2008.
16. Hansen MH, Hurwitz WN. On the theory of sampling from finite populations. Ann Math Stat 1943; 14:333–362.
17. Luan R, Liang B, Yuan P, Fan L, Huang Y, Zeng G, et al. A study on the capture–recapture method for estimating the population size of injecting drug users in South-west China. J Health Sci 2005; 51:405–409.
18. International Working Group for Disease Monitoring and Forecasting. Capture–recapture and multiple record systems estimation: history and theoretical development. Am J Epidemiol
19. Heckathorn D. Respondent-driven sampling: a new approach to the study of hidden populations. Soc Problems 1997; 44:174–199.
20. Saidel T, Adhikary R, Mainkar M, Dale J, Loo V, Rahman M, et al. Baseline integrated behavioural and biological assessment among most at-risk populations in six high-prevalence states of India: design and implementation challenges. AIDS 2008; 22(Suppl. 5):S17–S34.
21. Dalkey N. An experimental study of group opinion: the delphi method. Futures 1969; 7:408–426.
22. Geibel S, van der Elst E, King'ola N, Luchters S, Davies A, Getambu E, et al. ‘Are you on the market?’: a capture–recapture enumeration of men who sell sex to men in and around Mombasa. AIDS 2007; 21:1349–1354.
23. Petersen CGJ. The yearly immigration of young plaice into the Limfjord from the German Sea. Report of the Danish Biological Station 1896; 6:1–48.
24. Zheng D, Lv F, Wang L, Sun L, Zhou J, Su W, et al. Estimating the population of female sex workers in two Chinese cities on the basis of HIV/AIDS behavioural surveillance approach combined with a multiplier method. Sex Transm Infect 2007; 83:228–231.
25. Hartley HO, Rao JNK. Sampling with unequal probabilities and without replacement. Ann Math Stat 1962; 33:350–374.
26. Seber GAF. Note on multiple recapture census model. Biometrika 1965; 52:249–259.
27. Weir S, Wilson D, Smith P, Schoenbach V, Thomas JC, Lamptey P, et al. Assessment of a capture–recapture method for estimating the size of the female sex worker population in Bulawayo, Zimbabwe. Measure Evaluation 2003; WP-03-63. Available at: http://www.cpc.unc.edu/measure/publications/pdf/wp-03-63.pdf. Accessed: April 2008.
28. Broadhead RS, Heckathorn D, Grund JP, Anthony D. Drug users vs. outreach workers in combating AIDS: the results of peer-driven intervention. In: XIth International Conference on AIDS. Vancouver, July 1996 [abstract no. w.e.c. 3553].
29. Kruse N, Behets FMTF, Vaovola G, Burkhardt G, Barivelo T, Amida X, et al. Participatory mapping of sex trade and enumeration of sex workers using capture recapture methodology in Diego-Suarez, Madagascar. Sex Transm Dis 2003; 30:664–670.
30. Hook EB, Regal RR. Capture–recapture methods in epidemiology: methods and limitations. Epidemiol Rev 1995; 17:243–264.
31. Laska EM, Meisner M, Siegel C. Estimating the size of a population from a single sample. Biometrics 1988; 44:461–472.
32. Van der Heijden P, Bustami R, Cruyff M, Engbersen G, van Houwelingen HC. Point and interval estimation of the population size using the truncated Poisson regression model. Statistical Modelling – An International Journal 2003; 3:305–322.
33. Cochran WG. Sampling techniques. New York: John Wiley and Sons; 1977.
34. Laska EM, Meisner M, Wanderling JA, Kushner HB. Estimating population size when duplicates are present. Stat Med 1996; 15:1635–1646.
Appendix 1: Explanation of reverse tracking method
The calculation of estimates by the RTM is similar to probability proportional to size (PPS) sampling. In PPS sampling, the probability of selection of the ith cluster is Mi/M for all i = 1, 2, …, K, where Mi is the measure of size of the ith cluster, K is the total number of clusters and
$$M = \sum_{i=1}^{K} M_i$$
is the total of all Mi of a specific population. If ‘Yi’ is the observed value of a random variable ‘Y’ of the ith sampled cluster then each of the ratios
$$\frac{Y_i}{M_i/M} = \frac{M\,Y_i}{M_i}$$
estimates the total of that population. If ‘n’ clusters are selected using a systematic sampling procedure, the total population size is estimated by taking the average of this ratio. Therefore, an unbiased estimator of the population total is given by the Hansen–Hurwitz estimator [16]:
$$\hat{Y} = \frac{1}{n}\sum_{i=1}^{n}\frac{M\,Y_i}{M_i}$$
Here Mi is surrogate to Yi. Assuming that the sampling is with replacement, the variance of the estimated population size is estimated by:
$$v(\hat{Y}) = \frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\frac{M\,Y_i}{M_i} - \hat{Y}\right)^{2}$$
The variance factor in this procedure is determined by the proportional relationship between the Yi and their Mi. The variance of the estimated total will be at a minimum if Yi is proportional to Mi.
In the IBBA survey, the sampling was PPS without replacement. The RTM simply replaces the random variable ‘Yi’ by Ni, where Ni is the actual number (head count) observed during the survey in the ith selected cluster, and takes Mi to be the measure of size of the ith cluster obtained during mapping for sampling frame development. M is the total mapped. Therefore the formula for the estimation of population size Ŝ is:
$$\hat{S} = \frac{M}{n}\sum_{i=1}^{n}\frac{N_i}{M_i}$$
If the ‘Ni’ are true values and mapping is complete, Ŝ will approach the true size. It may be noted that if Mi = Ni for all the sampled clusters, the RTM yields ‘M’, the total mapped size (essentially indicating the estimated size), and zero variance. If sampling involves both CCS and TLCS in a district, then the estimate of the total size of that district is arrived at by adding the estimated total of CCS and that of TLCS, as the sampling was independent. The formula used for estimating the total when CCS and TLCS are used in a district is:
$$\hat{S} = \hat{S}_C + \hat{S}_T = \frac{M_C}{n_C}\sum_{i=1}^{n_C}\frac{N_{Ci}}{M_{Ci}} + \frac{M_T}{n_T}\sum_{i=1}^{n_T}\frac{N_{Ti}}{M_{Ti}}$$
Here the suffixes C and T represent CCS and TLCS, respectively.
The formula used to estimate the standard error is
$$SE(\hat{S}) = \sqrt{\frac{1}{n_C(n_C-1)}\sum_{i=1}^{n_C}\left(\frac{M_C\,N_{Ci}}{M_{Ci}} - \hat{S}_C\right)^{2} + \frac{1}{n_T(n_T-1)}\sum_{i=1}^{n_T}\left(\frac{M_T\,N_{Ti}}{M_{Ti}} - \hat{S}_T\right)^{2}}$$
Therefore the 95% CI is Ŝ ± 1.96 × SE(Ŝ)
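The RTM calculation can be sketched in a few lines. The sketch assumes the with-replacement Hansen–Hurwitz form, Ŝ = (M/n) Σ Ni/Mi, with the usual sample-based variance estimator, and adds the CCS and TLCS components as independent estimates; function names and all cluster values are hypothetical and are not taken from Appendix 2.

```python
import math

def rtm_estimate(total_mapped, mapped_sizes, observed_counts):
    """RTM size estimate and its estimated variance for one frame (CCS or TLCS).

    total_mapped    -- M, sum of mapped sizes over all clusters in the frame
    mapped_sizes    -- Mi for each selected cluster (from mapping)
    observed_counts -- Ni for each selected cluster (head count during survey)
    """
    n = len(mapped_sizes)
    if n < 2 or n != len(observed_counts):
        raise ValueError("need matching lists with at least two clusters")
    # Each selected cluster gives its own estimate of the total: M * Ni / Mi
    expanded = [total_mapped * n_i / m_i
                for m_i, n_i in zip(mapped_sizes, observed_counts)]
    estimate = sum(expanded) / n
    variance = sum((z - estimate) ** 2 for z in expanded) / (n * (n - 1))
    return estimate, variance

def combined_estimate(parts):
    """Add independent (estimate, variance) pairs and return total with 95% CI."""
    total = sum(est for est, _ in parts)
    se = math.sqrt(sum(var for _, var in parts))
    return total, total - 1.96 * se, total + 1.96 * se

# Hypothetical district with a CCS frame (M = 120) and a TLCS frame (M = 80)
ccs = rtm_estimate(120, [10, 20, 30], [10, 22, 27])
tlcs = rtm_estimate(80, [8, 16], [8, 16])   # Ni equals Mi, so the estimate is M with zero variance
total, low, high = combined_estimate([ccs, tlcs])
print(f"estimated size {total:.0f} (95% CI {low:.0f} to {high:.0f})")
```

The TLCS frame in this example shows the property noted above: when the observed counts equal the mapped sizes, the estimate collapses to the mapped total and the variance is zero.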
Appendix 2: Numerical example for the reverse tracking method
Information from mapping for sampling frame development is as below:
The selection of IBBA clusters (CCS and TLCS) is shown below:
The size estimates by the RTM method (first three columns are the selected clusters from the above tables and the fourth column is obtained during survey. Columns five and six provide size estimation and variance):
These nine cluster values are used to estimate the population of 20 clusters and population size is estimated at 299 with 95% CI (287–311).
NB: the Mi and Ni are close to each other, a situation similar to the IBBA data.
© 2008 Lippincott Williams & Wilkins, Inc.