Accurate estimates of HIV prevalence are essential for health service planning, tracking the course of the epidemic, and assessing the effects of interventions. However, all estimation methods have inherent biases. With expanding access to HIV testing, and widespread availability of antiretroviral therapy (ART), an increasing proportion of the general population in sub-Saharan Africa know their HIV status. This knowledge affects their subsequent willingness to participate in serosurveys, introducing new biases.
For many years, HIV prevalence estimation in generalized epidemics relied on sentinel surveillance in antenatal clinics (ANC), based on unlinked anonymous testing [1–4]. The limitations of ANC surveillance are well recognized: there is no direct information on men, and HIV prevalence differs between pregnant and nonpregnant women. Methods have been devised that attempt to minimize the biases, but they are affected by contraceptive use [5–8], and anonymous surveillance can be difficult to use alongside named testing for prevention of mother-to-child transmission.
General population surveys provide better estimates, if they are large with high testing uptake [3,9–11]. Demographic and Health Surveys (DHS), which are conducted regularly in most sub-Saharan African countries, started to include HIV testing in 2001. Individuals miss testing if they or their households refuse participation or if they are not found at home. Until recently, few individuals in Africa knew their HIV status, so this was unlikely to influence participation. However, with rapid expansion in access to HIV counselling and testing and the roll-out of free ART, the proportion of individuals who know their HIV status has increased dramatically. If individuals who know they are HIV-positive are less likely to consent to testing in a survey than those who were HIV-negative when last tested, this will introduce bias. In a longitudinal population-based study in southern Malawi, individuals who were HIV-positive in 2004 were more likely to refuse a repeat HIV test in 2006 than those who were HIV-negative (risk ratio 4.6, 95% confidence interval (CI) 2.6,8.2).
In the DHS, although questions include prior testing, the results of previous tests cannot be corroborated. Biases due to prior testing have previously been investigated using data from six African DHS from 2003–2006. Using a conditional probability equations approach to adjust for selection bias, HIV prevalence was estimated by combining the refusal risk ratio from southern Malawi with DHS data on the proportion who had ever tested, the probability of refusing HIV testing for those with and without a prior HIV test, and HIV prevalence among those who were tested. This approach makes three assumptions. First, HIV-positive and HIV-negative individuals are equally likely to have previously tested. Second, individuals who do not know their status are equally likely to refuse testing, whatever their true status. Third, the refusal risk ratio can be extrapolated from one setting to another. The bias was small in West Africa, but much larger in Malawi and Zimbabwe, in both absolute and relative terms.
In this article, we assess alternative methods for estimating HIV prevalence, including some that adjust for previous levels of HIV testing, using data from northern Malawi where an increasingly large proportion of adults know their HIV status.
Materials and methods
The Karonga demographic surveillance site (DSS) was established in a population of 33 000 in rural northern Malawi in 2002. Information on births and deaths is collected monthly through key informants. In-migrations and out-migrations are updated by annual re-census, forming an ‘open cohort’. During the re-census, socio-demographic information is collected for all individuals.
The DSS is part of the Karonga Prevention Study, which has been conducting population-based epidemiological studies since 1980. Unique identifiers allow individuals to be traced over time and linked across studies. In the district, adult HIV prevalence was 0.2% during 1981–1984, 2% during 1985–1989, and 13% by the late 1990s. It then stabilized [16,17], and was 11.5% in a sample serosurvey in the DSS in 2005–2006. Free ART became available in Malawi in 2004, in the district in 2005, and in the DSS area in 2006.
From the 2005–2006 sample serosurvey onwards, individuals included in research studies were asked at each contact if they had previously tested for HIV and if they knew their HIV status. Counselling and testing was limited outside the research setting before 2004, so until then the research database captured most individuals who knew their HIV status. During 1998–2005, more than 90% of individuals tested in research studies chose to receive, and were given, their results (unpublished data).
Ethics approval for the studies was granted by the National Health Sciences Research Committee of Malawi and the Ethics Committee of the London School of Hygiene and Tropical Medicine. Routine demographic monitoring used verbal consent. For interview and HIV testing, individual written consent was obtained.
HIV serosurveys, 2007–2010
Three annual house-to-house cross-sectional HIV serosurveys were completed in the DSS area between October 2007 and October 2010. In each area, the serosurvey followed the re-census by about 6 weeks. All adults (≥15 years old) were sought at home, with up to three repeat visits if necessary. Consent was sought separately for interview and for HIV testing, and participants could accept testing but choose not to know the result. Interviewers asked about previous HIV testing, including the timing and result of the most recent test. All three surveys used rapid tests, with results immediately available to participants.
Antiretroviral therapy cohort
By mid-2008, ART uptake in the DSS area was estimated to be at least 60% of those eligible (i.e. WHO stage 3/4 or CD4 < 250 cells/μl). From January 2008, all new attendees at the local HIV care/ART clinic were asked to participate in a study, whether or not they started ART, and 90% accepted. These individuals can be anonymously linked to the serosurvey data and therefore contribute information on HIV status.
The focus of our analysis was the three serosurveys 2007–2010, but for completeness we also present HIV prevalence estimates for the 2005–2006 sample serosurvey. All analyses were done separately for men and women, and sometimes overall, using Stata 11 (Stata Corp, College Station, Texas, USA).
We assessed the proportion of individuals who were found, consented to interview, and consented to HIV testing, by prior HIV status and socio-demographic characteristics. For the 2007/2008 survey, prior HIV status was taken from up to 9 years previously. The sources were the community controls in a TB case–control study (1998–2006), individuals in a retrospective cohort study from 1998–2000, and the 2005–2006 sample serosurvey [16,18,22]. Given the design of these studies, these individuals should be representative of the general population after standardization for age and sex. For subsequent surveys, results from previous serosurveys were added. Among individuals who were HIV-tested, we analysed the association between socio-demographic characteristics and HIV status, and we present these results for the first total-population serosurvey (2007/2008). Crude and adjusted odds ratios (ORs) were calculated using logistic regression. For method 5 below, adjusted risk ratios were calculated using a generalized linear model with a log link.
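Test refusal was common among people with a prior positive result. In that situation, the odds ratios from logistic regression and the risk ratios from the log-link model are not interchangeable. The sketch below illustrates the difference using crude, unadjusted measures only. It computes both from a 2×2 table of refusal by prior HIV status. The helper `crude_or_rr` and the counts are hypothetical.

```python
def crude_or_rr(refused_pos, tested_pos, refused_neg, tested_neg):
    """Crude odds ratio and risk ratio of HIV-test refusal,
    comparing prior HIV-positive with prior HIV-negative individuals."""
    # Risk = proportion refusing among each prior-status group
    risk_pos = refused_pos / (refused_pos + tested_pos)
    risk_neg = refused_neg / (refused_neg + tested_neg)
    # Odds = refusers per accepter in each group
    odds_pos = refused_pos / tested_pos
    odds_neg = refused_neg / tested_neg
    return odds_pos / odds_neg, risk_pos / risk_neg


if __name__ == "__main__":
    odds_ratio, risk_ratio = crude_or_rr(60, 40, 200, 800)
    print(f"OR = {odds_ratio:.2f}, RR = {risk_ratio:.2f}")
```

Suppose 60 of 100 prior positives and 200 of 1000 prior negatives refuse. The risk ratio is then about 3, but the odds ratio is about 6. This is why the risk ratio, not the odds ratio, is the measure carried into the prevalence adjustment.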
We estimated HIV prevalence in the total adult DSS population, separately for each survey round, using the following methods:
1. ‘Crude’ HIV prevalence: the proportion HIV-positive among all individuals tested.
2. HIV prevalence age-standardized to the total adult DSS population, to adjust for variation in HIV-testing uptake by age group.
3. HIV prevalence after multiple imputation of missing values using chained equations. The predictors of HIV status in the imputation model were age group, area of residence, marital status, education, and occupation. These variables were chosen because they were associated with serosurvey participation and/or with HIV prevalence among those tested. HIV prevalence was taken as the mean across 20 imputed datasets. After imputation, all resident adults are included.
4. HIV prevalence standardized for age and participation level. There were three non-tested groups: those not seen, those seen but refusing interview, and those interviewed but not tested. For each of these groups, HIV prevalence within each age stratum was estimated from the subset with a prior HIV result. All individuals tested in the survey round or with a prior HIV result contribute to the estimate.
5. HIV prevalence adjusted using the conditional probability equations approach described above. We estimated the risk ratio for refusing HIV testing, comparing HIV-positive with HIV-negative individuals, from our own data (method 5a). This was done separately for men and women and for each of the three serosurveys. We also applied the method using the published risk ratio of 4.6 (95% CI 2.6,8.2) (method 5b). The denominator is all individuals who consented to interview.
6. HIV prevalence incorporating previous HIV-positive results and subsequent HIV-negative results, age-standardized.
7. HIV prevalence incorporating previous and subsequent results, age-standardized. An individual was classed as HIV-negative if they had a negative test subsequently or less than 3 years previously. They were classed as HIV-positive if they had a positive test previously or less than 3 years subsequently.
8. HIV prevalence as for (7), plus self-reported HIV status among individuals who did not test in any of the three serosurveys, age-standardized. Some individuals never tested in a serosurvey but reported having been tested elsewhere. Among them, the number who self-reported being HIV-positive was multiplied by 1.67 (= 1/0.6) to adjust for under-reporting of positive status. This assumes that for every six people who report that they are HIV-positive, there are truly ten (see Results).
9. HIV prevalence as for (8), but additionally adjusted for nonparticipation, age-standardized. This assumes that age-specific HIV prevalence among those never interviewed, or interviewed but never tested anywhere, was the same as among individuals who never tested in a serosurvey but reported having tested elsewhere.
10. ‘Minimum’ HIV prevalence: individuals who ever tested HIV-positive, ever self-reported being HIV-positive, or registered at the ART clinic from 2008 onwards, with the total adult population as the denominator.
11. ‘Maximum’ HIV prevalence: those who ever tested HIV-positive in a serosurvey, plus those who reported testing HIV-positive elsewhere [corrected for under-reporting as in (8)], plus all individuals who said they had never tested or who were never interviewed, all treated as HIV-positive.
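Several of the methods above are age-standardized to the total adult DSS population. The sketch below shows direct standardization. It weights age-specific prevalence, estimated among those with a known status, by the age distribution of all resident adults. It assumes the counts are already tabulated by age group. The function name, age bands, and counts are hypothetical.

```python
def age_standardized_prevalence(positives, known, population):
    """Direct age standardization to a standard (here: total resident) population.

    positives[g]: number HIV-positive among those with known status in age group g
    known[g]:     number with known HIV status in age group g
    population[g]: number of resident adults in age group g (the standard)
    """
    total = sum(population.values())
    prevalence = 0.0
    for group, pop in population.items():
        # age-specific prevalence, weighted by the group's share of all adults
        prevalence += (pop / total) * (positives[group] / known[group])
    return prevalence


if __name__ == "__main__":
    population = {"15-24": 600, "25-49": 400}
    known = {"15-24": 500, "25-49": 200}
    positives = {"15-24": 10, "25-49": 30}
    print(round(age_standardized_prevalence(positives, known, population), 4))
```

Here the crude prevalence is 40/700 (about 5.7%), whereas the standardized estimate is 7.2%. The difference arises because the older, higher-prevalence group is under-represented among those with a known status.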
Finally, we assessed the accuracy of self-reported HIV status, separately for each survey round. The aim was to determine whether self-reports could be used to adjust estimates when actual prior test results are not known.
Results
Around 17 000 individuals were sought in each of the three serosurveys (2007–2010), and around 2300 in the sample survey in 2005/2006. Most of those seen agreed to be interviewed. HIV-testing uptake in each round was higher for women (63–70%) than for men (54–62%), mainly because men were more often absent. Among individuals who consented to HIV testing, 98% chose to learn the result in 2007/2008, and over 99% in later rounds. By 2009/2010, 79% of men and 85% of women had tested for HIV in a serosurvey at least once. Supplementary Figure 1 (http://links.lww.com/QAD/A247) shows the proportion of individuals in each round who were absent, refused interview, consented to interview but refused HIV testing, and consented to testing.
Associations with HIV-test refusal
Among individuals who were found, refusal rates for HIV testing were lowest among 15–24 year olds and highest among those aged more than 60 years (Table 1). Refusal was higher in roadside than in more remote areas, and higher among more educated individuals. Among women, widows refused testing more than married women, and those in professional or ‘unskilled’ occupations refused more than those in other occupations. Among men, differences by marital status were small, whereas farmers and students had lower test refusal than those in other occupations.
Associations with HIV prevalence
HIV prevalence differed substantially by age (Table 2). It rose and peaked earlier in women (2.6% at 15–24 years, with a peak of 16.3% at 30–39 years) than in men (0.8% at 15–24 years, with a peak of 14.7% at 40–49 years). HIV prevalence was higher in roadside than in remote areas, in those with more education, and in divorced or widowed compared with married individuals. Among men, there was little difference by occupation after controlling for other characteristics. Among women, HIV prevalence was highest among small traders, and lower in farmers and students than in other occupations.
Association between prior HIV status and absence, interview and HIV-test refusal
In 2008/2009 and 2009/2010, but not 2007/2008, men and women with a prior HIV-positive result were more likely to be absent at the time of a serosurvey than individuals whose most recent test was HIV-negative, with ORs around 1.5 (Table 3).
Among individuals seen during the serosurveys, those with a prior HIV-positive test were more likely to refuse testing than those previously HIV-negative (adjusted ORs 2.4–4.9, Table 3). The results were similar when restricted to individuals who consented to interview. For both men and women, and in all surveys, there was no evidence that the association between prior HIV test result and HIV test refusal varied by age, area of residence, education, occupation, or marital status (data not shown).
The conditional probability equations approach (method 5a) used adjusted risk ratios (95% CIs) for HIV-test refusal, comparing HIV-positive with HIV-negative individuals. For men, these were 1.8 (1.1,3.0), 3.1 (2.6,3.7), and 2.7 (2.3,3.2) in 2007/2008, 2008/2009, and 2009/2010, respectively. For women, the corresponding estimates were 3.4 (2.3,5.1), 2.8 (2.4,3.2), and 2.2 (2.0,2.5).
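The sketch below makes the role of the refusal risk ratio concrete. It is a simplified adaptation of the conditional probability idea, not the published equations. Among interviewed adults who report a previous test, it finds refusal probabilities for HIV-negative and HIV-positive people, with their ratio fixed at `rr`, that reproduce the observed number of refusers. From these it infers how many refusers are HIV-positive. Among never-testers, refusal is assumed to be unrelated to HIV status. The function name and counts are hypothetical.

```python
def cpe_adjusted_prevalence(pos_prior, neg_prior, refused_prior,
                            pos_never, neg_never, refused_never, rr):
    """Refusal-adjusted HIV prevalence among interviewed adults (simplified sketch).

    pos_prior / neg_prior: HIV-positive / HIV-negative survey results among
        people who reported a previous HIV test; refused_prior: such people
        who refused testing. *_never: the same counts for reported never-testers.
    rr: assumed refusal risk ratio, HIV-positive vs HIV-negative (>= 1),
        applied only to people who had previously tested.
    """
    if rr < 1:
        raise ValueError("this sketch assumes rr >= 1")

    def implied_refusers(r_neg):
        # If HIV-negative prior testers refuse with probability r_neg and
        # HIV-positive ones with rr * r_neg, n tested implies n * r / (1 - r) refusers.
        r_pos = rr * r_neg
        return pos_prior * r_pos / (1 - r_pos) + neg_prior * r_neg / (1 - r_neg)

    # implied_refusers rises from 0 towards infinity on [0, 1/rr), so bisection
    # finds the unique r_neg that matches the observed number of refusers.
    lo, hi = 0.0, 1.0 / rr
    for _ in range(200):
        mid = (lo + hi) / 2
        if implied_refusers(mid) < refused_prior:
            lo = mid
        else:
            hi = mid
    r_pos = rr * (lo + hi) / 2
    refused_pos = pos_prior * r_pos / (1 - r_pos)

    prior_total = pos_prior + neg_prior + refused_prior
    prior_positive = pos_prior + refused_pos

    # Never-testers: refusers assumed to have the same prevalence as those tested.
    never_total = pos_never + neg_never + refused_never
    never_positive = never_total * pos_never / (pos_never + neg_never)

    return (prior_positive + never_positive) / (prior_total + never_total)


if __name__ == "__main__":
    for rr in (1.0, 3.0):
        print(rr, round(cpe_adjusted_prevalence(70, 900, 130, 50, 450, 100, rr), 4))
```

With `rr = 3`, the adjusted prevalence in this toy example is 160/1700 (about 9.4%), against a crude 120/1470 (about 8.2%) among those tested. With `rr = 1`, the estimate falls back close to the crude value. Larger assumed risk ratios push the estimate upwards, so the choice of `rr` largely drives the result.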
HIV prevalence estimates
In the 2005–2006 sample serosurvey, 22% of men and 30% of women reported that they had previously had an HIV test and knew the result (Table 4). In the first whole DSS serosurvey, in 2007/2008, 51% of men and 59% of women reported previous testing. This had increased to 87% of men and 92% of women by 2009/2010.
The crude HIV prevalence (i.e. number HIV-positive/total tested) was 9.2% for men and 13.8% for women in 2005/2006 (Table 4). It was lower in 2007/2008 (6.2 and 8.4%, respectively) and lower again in 2008/2009 (4.4 and 6.3%) and 2009/2010 (4.5 and 6.8%). Age-standardizing (method 2) and multiple imputation (method 3) made little difference to these estimates.
Standardizing using participation level as well as age (method 4) increased the estimated HIV prevalence for women in all time periods (Table 4), and for men in the later periods, when participation rates were lower and a higher proportion had previously tested for HIV.
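Method 4 can be read as standardization over cells defined by age group and participation level. Each cell's prevalence comes from its members with a known result: the current test for those tested, and a prior result for the non-tested groups. That prevalence is then weighted by the cell's size in the total population. The sketch below assumes the cell counts are already tabulated. The function name and counts are hypothetical.

```python
def participation_standardized_prevalence(cells):
    """Prevalence standardized for age and participation level (sketch).

    cells maps (age_group, participation_level) -> (population, known, positives):
      population: all resident adults in the cell
      known:      those in the cell with an HIV result (current or prior)
      positives:  HIV-positive among those with a result
    """
    total = 0
    expected_positives = 0.0
    for (age_group, level), (population, known, positives) in cells.items():
        if known == 0:
            raise ValueError(f"no HIV results in cell {age_group!r}, {level!r}")
        # Apply the cell-specific prevalence to everyone in the cell
        expected_positives += population * positives / known
        total += population
    return expected_positives / total


if __name__ == "__main__":
    cells = {
        ("15-24", "tested"): (400, 400, 8),
        ("15-24", "not seen"): (200, 50, 3),
        ("25-49", "tested"): (200, 200, 20),
        ("25-49", "interviewed, not tested"): (100, 40, 10),
    }
    print(round(participation_standardized_prevalence(cells), 4))
```

In this toy example the estimate is 65/900 (about 7.2%), compared with 28/600 (about 4.7%) among those tested in the round. The gap arises because the non-tested cells have higher prevalence among those with prior results.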
Using the conditional probability equations approach with risk ratios from this study to adjust for non-participation (method 5a) increased the estimated HIV prevalence in all time periods, for both men and women. Using the previously published risk ratio (method 5b) gave much higher estimates.
Using prior HIV-positive results and subsequent HIV-negative results (method 6) gave low estimates for the 2007/2008 survey, for which few prior HIV-positive but many subsequent HIV-negative results were available. It gave high estimates for the 2009/2010 survey, for which no subsequent results were available. Allowing up to 3 years for prior HIV-negative results and subsequent HIV-positive results (method 7) gave more stable estimates.
Incorporating (adjusted) self-reported HIV status among individuals not tested in the serosurveys (method 8) allowed more individuals to be included in the analysis (93% of women and 88% of men in 2008/2009) and slightly increased prevalence estimates. Method 9 further increased prevalence estimates. It extrapolated (adjusted) self-reported HIV prevalence among individuals who were interviewed but never tested in the serosurveys to those who had never tested or were never interviewed.
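The classification rule behind method 7 can be sketched as a small function. Certain evidence is applied first: a positive test before the round, or a negative test after it. The 3-year window rules follow. Several details are assumptions not given in the text: the precedence between rules when they conflict, the encoding of test times, and the function name `infer_status`.

```python
from typing import Optional, Sequence, Tuple

# (years relative to the survey round, result): negative times are earlier
# tests, positive times are later tests; results are "pos" or "neg".
Test = Tuple[float, str]


def infer_status(current: Optional[str], other_tests: Sequence[Test],
                 window: float = 3.0) -> Optional[str]:
    """Infer HIV status at a survey round from other test results (sketch)."""
    if current is not None:          # tested in this round: use that result
        return current
    earlier = [(t, r) for t, r in other_tests if t < 0]
    later = [(t, r) for t, r in other_tests if t > 0]

    # Definitive evidence, assuming no false results and no seroreversion.
    if any(r == "pos" for _, r in earlier):
        return "pos"
    if any(r == "neg" for _, r in later):
        return "neg"

    # Window rules (method 7); giving the later positive precedence over the
    # earlier negative when both apply is an assumption of this sketch.
    if any(r == "pos" and t < window for t, r in later):
        return "pos"
    if any(r == "neg" and -t < window for t, r in earlier):
        return "neg"
    return None                      # status remains unknown
```

Setting `window=0` disables both window rules, leaving only prior positive and subsequent negative results. This corresponds to method 6 and shows why that method cannot classify people in the last round of a series who tested negative earlier.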
The minimum estimate (method 10), using all direct evidence of prior HIV-positive tests, gave HIV prevalence figures that were higher than the crude, standardized and imputed estimates (and their upper 95% confidence limits) for the later surveys, for both men and women. For 2008/2009, maximum estimates were 18% for men and 16% for women.
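The minimum and maximum bounds are simple counts, but overlapping evidence must not be double-counted. The same person may test positive in a serosurvey, self-report, and attend the ART clinic. The sketch below uses sets of individual IDs so that set union handles this overlap. The function name, argument names, and IDs are hypothetical. The 1/0.6 factor is the under-reporting correction described in the methods.

```python
def prevalence_bounds(all_adults, sero_positive, self_reported_positive,
                      art_clinic, never_tested_or_not_interviewed,
                      under_report_factor=1 / 0.6):
    """Return (minimum, maximum) HIV prevalence from sets of individual IDs (sketch)."""
    n = len(all_adults)
    # Minimum: any direct evidence of being HIV-positive; the set union counts
    # each person once even when several sources agree.
    known_positive = (sero_positive | self_reported_positive | art_clinic) & all_adults
    minimum = len(known_positive) / n

    # Maximum: serosurvey positives, plus self-reported positives inflated for
    # under-reporting, plus everyone whose status is entirely unknown.
    sero = sero_positive & all_adults
    self_report_only = (self_reported_positive & all_adults) - sero
    unknown = (never_tested_or_not_interviewed & all_adults) - sero
    maximum = (len(sero) + under_report_factor * len(self_report_only)
               + len(unknown)) / n
    return minimum, maximum


if __name__ == "__main__":
    low, high = prevalence_bounds(set(range(100)), {0, 1, 2, 3, 4, 5}, {5, 6, 7},
                                  {0, 8}, set(range(90, 100)))
    print(f"minimum {low:.1%}, maximum {high:.1%}")
```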
Accuracy of self-reported HIV status
Among people with a prior HIV-positive result who refused HIV testing, 62% in 2008/2009 and 54% in 2009/2010 reported that they were HIV-positive, and one-third that they were HIV-negative (Table 5). We therefore used an approximation of 60% to correct for under-reporting in methods 8 and 9. Under-reporting of HIV-positivity was less common among those who consented to re-testing. Among individuals with previous HIV-negative results, over 95% self-reported as HIV-negative.
Discussion
Refusal bias, due to prior knowledge of HIV status, leads to substantial underestimation of HIV prevalence
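The under-reporting correction used for methods 8 and 9 is a single division. Only about 60% of truly HIV-positive people report being positive, so each self-reported positive stands for about 1/0.6 ≈ 1.67 true positives. The sketch below applies this to a group of self-reporters. The function name is hypothetical, the term `reporting_sensitivity` is our own label for the 60% figure, and the cap at the group size is an added safeguard not described in the text.

```python
def corrected_self_report_prevalence(reported_positive, group_size,
                                     reporting_sensitivity=0.6):
    """Self-reported HIV prevalence corrected for under-reporting (sketch).

    reporting_sensitivity: assumed share of truly HIV-positive people who
    report being HIV-positive (0.6 gives the 1/0.6 = 1.67 multiplier).
    The cap at group_size is an added safeguard.
    """
    corrected = min(reported_positive / reporting_sensitivity, group_size)
    return corrected / group_size


if __name__ == "__main__":
    print(corrected_self_report_prevalence(30, 500))
```

For example, 30 self-reported positives in a group of 500 become about 50, a corrected prevalence of 10% rather than 6%.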
We showed a dramatic increase in the percentage of the adult population who know their HIV status between 2005 and 2010. Even before the first whole population serosurvey in 2007/2008, half the adults reported prior HIV testing. By 2009/2010, over 80% had HIV-tested in at least one of the three serosurveys.
As expected, individuals who knew they were HIV-positive were less likely to retest in a serosurvey than those who were HIV-negative. The risk ratios for HIV-test refusal were around two to three, similar for men and women, and consistent across time. These ratios were much lower than the only previously published estimate [4.6, 95% CI (2.6,8.2)], which came from a study in southern Malawi based on relatively few HIV-positive individuals. There is little benefit for an HIV-positive individual in retesting, and they may fear disclosure. Many do retest, perhaps to get confirmation. This may also reflect the good relationship between the research teams and the community, which would lower the refusal risk ratio. Offering point-of-care CD4 counts could provide an incentive to retest and so reduce the risk ratio.
Refusal bias, combined with inaccurate or absent self-reports, has important implications for population HIV prevalence estimates in settings where a large proportion of the population already know their status. This is illustrated by the ‘crude’ and age-standardized HIV prevalence estimates. It is implausible for age-standardized HIV prevalence to fall as dramatically as shown in Table 4. HIV prevalence was stable at around 10% for the decade before 2005/2006 [15,17], and ART use should increase HIV prevalence by prolonging the survival of HIV-positive individuals. The confidence intervals are narrow, so the fall is not explained by sampling variation. Furthermore, the ‘minimum’ estimates of HIV prevalence in 2008/2009 and 2009/2010 were higher than the age-standardized estimates. As ‘true’ HIV prevalence increases, the bias would increase in absolute terms but change little in relative terms.
Among individuals who consented to HIV testing, HIV prevalence patterns by age, sex, marriage, and area are consistent with previous findings in this and other similar populations. However, multiple imputation based on these factors made little difference to the estimates, leaving them implausibly low. Two analyses of HIV prevalence data in other populations have used multiple imputation [23,24], and two have used regression to predict HIV prevalence among nonparticipants [25,26]. All four found a small effect on the overall prevalence estimate. Heckman-type selection models could be explored. However, they require ‘selection’ variables that predict test refusal but are not associated with HIV status.
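The final point, that absolute bias grows with true prevalence while relative bias changes little, can be checked with a toy model. The model assumes fixed testing probabilities: HIV-positive people test with probability 0.5 and HIV-negative people with probability 0.8. These values are hypothetical and give a refusal risk ratio of 2.5, within the range estimated here. The function names are hypothetical too.

```python
def observed_prevalence(true_prev, p_test_pos, p_test_neg):
    """Prevalence among those tested when testing probability depends on status."""
    tested_pos = true_prev * p_test_pos
    tested_neg = (1 - true_prev) * p_test_neg
    return tested_pos / (tested_pos + tested_neg)


def bias_table(true_prevs, p_test_pos=0.5, p_test_neg=0.8):
    """Rows of (true, observed, absolute bias, relative bias) under fixed testing probabilities."""
    rows = []
    for p in true_prevs:
        obs = observed_prevalence(p, p_test_pos, p_test_neg)
        rows.append((p, obs, obs - p, (obs - p) / p))
    return rows


if __name__ == "__main__":
    for p, obs, abs_bias, rel_bias in bias_table([0.05, 0.10, 0.15]):
        print(f"true {p:.2f}  observed {obs:.3f}  "
              f"absolute {abs_bias:+.3f}  relative {rel_bias:+.1%}")
```

Under these assumptions, the absolute underestimate roughly triples as true prevalence rises from 5% to 15%. The relative underestimate stays at about one-third throughout.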
‘Best’ estimate of HIV prevalence in this population
The crude, age-standardized, and imputed HIV prevalence estimates are misleading. Standard adjustment methods are therefore inadequate in settings where a large proportion of adults already know their status. Over 80% of adults in the Karonga DSS had tested for HIV at least once by 2009/2010, so our estimates using longitudinal data (methods 6 and 7) are preferred. However, they probably underestimate prevalence, because refusal bias was already important at the time of the first total-population serosurvey. Method 7, which includes positive HIV results up to 3 years later and negative results up to 3 years earlier, is preferred to method 6 for three reasons. It allows results to be inferred for more individuals. The 3-year window is narrow enough that misclassification should be small. It also avoids the bias that method 6 shows in the first and last serosurveys of a series. Using method 7, the best estimate of HIV prevalence in 2008/2009 was 6.5% for men and 8.4% for women. Supplementing these data with self-reported information includes more individuals. Extrapolating to individuals who reported never testing or were never interviewed then provides a prevalence estimate for the total adult population (method 9). In 2008/2009, this estimate was 7.1% for men and 9.2% for women.
Method 4, standardized for age and participation level, has the advantage of including all individuals in the analysis (including absentees), and gave similar results to method 9. It worked well for the first serosurvey in 2007/2008, because all prior tests were expected to be representative of the general population. It worked less well later, when prior test results came mainly from the 2007/2008 serosurvey.
The conditional probability equations approach (method 5) gave similar results to the longitudinal data when using estimates of the refusal risk ratio from our own data (method 5a). However, using the published risk ratio (method 5b) gave implausibly high estimates compared with the longitudinal data (methods 7, 8, and 9). The theoretical advantage of this method is that it can be applied to a single cross-sectional survey using a standard refusal risk ratio, and that advantage is therefore lost. A further limitation is that in a cross-sectional survey, the percentage previously HIV-tested is known only for those interviewed. Relying on this figure would underestimate prevalence because, as we showed, absence and interview refusal are also more common among HIV-positive than HIV-negative individuals. Asking those who refuse retesting to disclose their status does not help unless it is supplemented by information on the extent of under-reporting, as many HIV-positive individuals decline to report or misreport their status (Table 5). In most settings the extent of under-reporting is not known, so sensitivity analyses using a range of assumptions would be needed.
In settings where a high percentage of adults know their HIV status, which is increasingly common in sub-Saharan Africa, participation in serosurveys is lower for HIV-positive than HIV-negative individuals. Population-based HIV serosurveys will therefore substantially underestimate HIV prevalence, as demonstrated in our analyses, unless adjustments are made to account for the bias. Such adjustments require longitudinal data that are not widely available. Furthermore, the bias is likely to increase over time, so trends are also misleading. Strategies to increase participation are needed. Data sources with high coverage and high participation, such as ANC surveillance, remain important for estimating HIV prevalence in the general population.
S.F. and J.R.G. wrote the article. All authors gave comments and suggested edits to an advanced draft of the article. S.F., J.R.G., N.F., A.M., A.C.C., and R.H. designed the study. A.M., A.D., M.C., N.K., A.C.C., and A.P. led or co-led data acquisition. J.S. took overall responsibility for the management of all project databases that were used in this study. S.F. led the statistical analysis, with contributions from J.R.G., R.H., A.C.C., and N.F. All authors have read and approved the manuscript as submitted.
Source of funding: the research presented here was funded by a Wellcome Trust programme grant (grant number 079828/Z/06/Z).
Conflicts of interest
There are no conflicts of interest.
References
1. Chin J, Mann J. Global surveillance and forecasting of AIDS. Bull World Health Organ.
2. UNAIDS/WHO/CDC. Guidelines for conducting HIV sentinel serosurveys among pregnant women and other groups. Geneva, Switzerland: WHO and UNAIDS; 2003.
3. Garcia-Calleja JM, Gouws E, Ghys PD. National population based HIV prevalence surveys in sub-Saharan Africa: results and implications for HIV and AIDS estimates. Sex Transm Infect 2006; 82 (Suppl 3):iii64–70.
4. Ghys PD, Walker N, McFarland W, Miller R, Garnett GP. Improved data, methods and tools for the 2007 HIV and AIDS estimates and projections. Sex Transm Infect 2008; 84 (Suppl 1):i1–i4.
5. Zaba BW, Carpenter LM, Boerma JT, Gregson S, Nakiyingi J, Urassa M. Adjusting ante-natal clinic data for improved estimates of HIV prevalence among women in sub-Saharan Africa.
6. Gouws E, Mishra V, Fowler TB. Comparison of adult HIV prevalence from national population-based surveys and antenatal clinic surveillance in countries with generalised epidemics: implications for calibrating surveillance data. Sex Transm Infect 2008; 84 (Suppl 1):i17–i23.
7. Montana LS, Mishra V, Hong R. Comparison of HIV prevalence estimates from antenatal care surveillance and population-based surveys in sub-Saharan Africa. Sex Transm Infect 2008; 84 (Suppl 1):i78–i84.
8. Walker N, Grassly NC, Garnett GP, Stanecki KA, Ghys PD. Estimating the global burden of HIV/AIDS: what do we really know about the HIV pandemic?
9. Boerma JT, Holt E, Black R. Measurement of biomarkers in surveys in developing countries: opportunities and problems. Popul Dev Rev.
10. Boerma JT, Ghys PD, Walker N. Estimates of HIV-1 prevalence from national population-based surveys as a new gold standard.
11. Calleja JM, Marum LH, Carcamo CP, Kaetano L, Muttunga J, Way A. Lessons learned in the conduct, validation, and interpretation of national population based HIV surveys. 2005; 19 (Suppl 2):S9–S17.
12. Cremin I, Cauchemez S, Garnett GP, Gregson S. Patterns of uptake of HIV testing in sub-Saharan Africa in the pre-treatment era. Trop Med Int Health.
13. Reniers G, Eaton J. Refusal bias in HIV prevalence estimates from nationally representative seroprevalence surveys.
14. Jahn A, Glynn JR, Mwaiyeghele E, Branson K, Fine PEM, Crampin AC, et al. Evaluation of a village-informant driven demographic surveillance system. Demogr Res.
15. Glynn JR, Ponnighaus J, Crampin AC, Sibande F, Sichali L, Nkhosa P, et al. The development of the HIV epidemic in Karonga District, Malawi.
16. Crampin AC, Glynn JR, Floyd S, Malema SS, Mwinuka VK, Ngwira BM, et al. Tuberculosis and gender: exploring the patterns in a case control study in Malawi. Int J Tuberc Lung Dis.
17. Crampin AC, Glynn JR, Ngwira BM, Mwaungulu FD, Ponnighaus JM, Warndorff DK, et al. Trends and measurement of HIV prevalence in northern Malawi.
18. McGrath N, Kranzer K, Saul J, Crampin AC, Malema S, Kachiwanda L, et al. Estimating the need for antiretroviral treatment and an assessment of a simplified HIV/AIDS case definition in rural Malawi. 2007; 21 (Suppl 6):S105–113.
19. Floyd S, Molesworth A, Dube A, Banda E, Jahn A, Mwafulirwa C, et al. Population-level reduction in adult mortality after extension of free antiretroviral therapy provision into rural areas in northern Malawi. PLoS One.
20. Molesworth AM, Ndhlovu R, Banda E, Saul J, Ngwira B, Glynn JR, et al. High accuracy of home-based community rapid HIV testing in rural Malawi. J Acquir Immune Defic Syndr.
21. Wringe A, Floyd S, Kazooba P, Mushati P, Baisley K, Urassa M, et al. Antiretroviral therapy uptake and coverage in four HIV community cohort studies in sub-Saharan Africa. Trop Med Int Health.
22. Crampin AC, Floyd S, Glynn JR, Sibande F, Mulawa D, Nyondo A, et al. Long-term follow-up of HIV-positive and HIV-negative individuals in rural Malawi.
23. Marston M, Harriss K, Slaymaker E. Nonresponse bias in estimates of HIV prevalence due to the mobility of absentees in national population-based surveys: a study of nine national surveys. Sex Transm Infect 2008; 84 (Suppl 1):i71–i77.
24. Ziraba AK, Madise NJ, Matilu M, Zulu E, Kebaso J, Khamadi S, et al. The effect of participant nonresponse on HIV prevalence estimates in a population-based survey in two informal settlements in Nairobi city. Popul Health Metr.
25. Mishra V, Vaessen M, Boerma JT, Arnold F, Way A, Barrere B, et al. HIV testing in national population-based surveys: experience from the Demographic and Health Surveys. Bull World Health Organ.
26. Mishra V, Barrere B, Hong R, Khan S. Evaluation of bias in HIV seroprevalence estimates from national household surveys. Sex Transm Infect 2008; 84 (Suppl 1):i63–i70.
27. Barnighausen T, Bor J, Wandira-Kazibwe S, Canning D. Correcting HIV prevalence estimates for survey nonparticipation using Heckman-type selection models.