Introduction
Knowledge of the sexual behaviour of populations provides a basis for estimating the spread of sexually transmitted diseases including HIV, and for designing health promotion programmes. The key estimates for these purposes are sensitive to biases, including participation bias. In Britain the only large population-based probability sample survey addressing this issue is the National Survey of Sexual Attitudes and Lifestyles (NATSAL) [1–3]. The participation rate for this survey was around 65% (depending on assumptions made), comparable to similar surveys in other countries, such as the 1991–1992 survey in France [4], and to surveys of a less sensitive nature in Britain. However, even if the participation rate in a survey is comparatively high and the participants demographically representative of the target population, estimates of population parameters may be biased as a result of participation or item response biases. By participation bias we mean that those who do not take part in the survey differ in sexual behaviour from those who do, even within demographic classes (e.g., age-group and marital status). By item response bias we mean that among participants, those who do not answer the specific question of interest differ in sexual behaviour from those who do, even within demographic classes.
Hypotheses concerning the behaviour of non-participants are needed to produce parameter estimates. The hypothesis generally accepted in survey analysis is that the prevalence of behaviours in non-responders is the same as in responders (at least within demographic classes). This hypothesis amounts to assuming that there are no participation or item response biases. Under this assumption, the standard analytical technique of weighting by demographic variables [5] results in unbiased estimates of population parameters. However, this hypothesis may well not be reasonable in surveys of sensitive topics, such as sexual behaviour, and participation bias in experimental sexual behaviour studies is well documented [6]. If around 35% of those sampled did not participate, as in NATSAL, then even a relatively small deviation from the standard hypothesis will affect population parameter estimates. Furthermore, the identification of participation and item response biases is important because there is no simple technique to reduce their effects in analysis.
When the prevalence amongst the responders is high (e.g., 50%), then the sample prevalence will be of a similar order of magnitude even if the prevalence amongst the non-responders is 0 or 100%. This is not the case, however, when the prevalence amongst responders is low (e.g., 2%). Hence, participation bias is typically of most concern when estimating the prevalence of rare behaviours.
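The arithmetic behind this point can be sketched as follows. The snippet is illustrative only: the function name `prevalence_bounds` is ours, the 65% participation rate is the approximate NATSAL figure quoted above, and the bounds simply take the two extreme cases in which the prevalence amongst non-responders is 0 or 100%.

```python
def prevalence_bounds(responder_prev, participation_rate):
    """Return the (lowest, highest) possible prevalence in the whole sample.

    The lowest value assumes no non-responder has the behaviour;
    the highest assumes every non-responder has it.
    """
    observed_part = participation_rate * responder_prev
    non_responders = 1.0 - participation_rate
    return observed_part, observed_part + non_responders


# Responder prevalences of 50% and 2%, as in the examples in the text.
for prev in (0.50, 0.02):
    low, high = prevalence_bounds(prev, 0.65)
    print(f"responders {prev:.0%}: overall between {low:.1%} and {high:.1%}")
```

With a responder prevalence of 50% the overall value stays between roughly one-third and two-thirds, whereas with a responder prevalence of 2% the upper bound is many times the responder value.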
In this study we explore the likelihood and possible nature of participation bias in NATSAL. We examine the associations between behavioural factors and refusal of a self-completion booklet that formed part of the interview, and associations with interviewee embarrassment, as recorded by the interviewer. The embarrassed and booklet refusers can be considered ‘unwilling’ participants. Assuming that non-participants are closer in sexual behaviour to unwilling participants than to others, these analyses provide initial hypotheses concerning participation bias. We assess which demographic groups had higher booklet refusal and embarrassment rates, and consider what effects any participation bias may have had on estimates of the prevalence of HIV risk behaviours.
Materials and methods
Sampling and interview structure
A full description of the survey methodology and the questionnaires has been published elsewhere [1,3]. In brief, NATSAL was carried out in 1990–1991. The Post Office Small Users Address File (PAF) was the sampling frame for the survey. First, a random sample of wards was taken from the PAF, stratified by region, and then addresses were selected from the list for each ward. Recipients of large amounts of mail such as large residential institutions and prisons were not included in this sampling frame. Finally, one person aged 16–59 years was selected randomly from each eligible household without substitution, and invited to take part in the survey. From the initial sample of 50 010 addresses, a total of 18 876 interviews were conducted. After removing non-residential addresses and ineligible households from the denominator, the participation rate was 65%. Those interviewed were found to be demographically representative of the general population in terms of ethnic mix, although men and older people were under-represented [1].
The interview used a combination of face-to-face questioning and a self-completion booklet containing the more sensitive questions. In the face-to-face interview, question cards were used so that the interviewee read the questions relating to age at first heterosexual intercourse and first heterosexual experience, which were then answered verbally. A response card to the question of whether any sexual experience had been with just women, just men, both, or no such experience was also used so that the interviewee only had to state a letter in reply. Those who reported no sexual experience (with men or women) and no heterosexual intercourse were not offered the self-completion booklet. In addition, those aged 16 or 17 years who reported only sexual experience with the opposite sex, and no intercourse since age 13 years, were not offered the booklet, to avoid any distress. In total, 539 (2.9%) of the interviewees were not offered the booklet, of whom 305 (56%) were aged 16 or 17 years. A further 688 (3.8% of those offered) refused the booklet (including one person who accepted the booklet but did not complete it in a way that could be analysed). The reason for refusal, if given, was recorded and the frequencies are presented in Table 1.
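The two percentages above use different denominators, which the following short check makes explicit. It uses only the counts quoted in the text; the variable names are ours.

```python
# Counts quoted in the text.
interviews = 18876   # completed interviews
not_offered = 539    # interviewees not offered the booklet
refused = 688        # interviewees who were offered the booklet but refused it

# The refusal rate is taken over those offered the booklet, not all interviewees.
offered = interviews - not_offered

print(f"offered the booklet: {offered}")
print(f"not offered, as a share of all interviewees: {not_offered / interviews:.1%}")
print(f"refused, as a share of those offered: {refused / offered:.1%}")
```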
Table 1: Reason given for refusal of self-completion booklet.
Purpose of analyses
We analysed the associations between demographic and behavioural variables and booklet refusal to give a concise description of the item response bias in the survey and of population subgroups with low completion rates. Booklet refusal provides a neat analysis of item non-response because the answers to many of the questions within the self-completion booklet could be deduced for all, or almost all, of those not offered the booklet. For instance, those not offered the booklet (unless aged 16 or 17 years and reporting intercourse before age 13 years but not since) could not have had anal intercourse with a partner of the opposite sex. Hence, those who were offered the booklet but refused it form a ‘core’ of non-responders to many sexual behaviour questions. This analysis of item response bias provides one indicator of the likelihood and possible nature of participation bias in the survey.
Analysis of embarrassment (here grouped to be binary, see below) provides an alternative indication of the likelihood and nature of participation bias. Embarrassment is clearly subjective, and interviewers had no written guidelines as to how to record it. Booklet refusal, although objective, is only applicable to those offered it. In view of these limitations, hypotheses about non-participants are best derived from considering the two analyses together.
Variables used in analysis
The demographic variables tested for association with booklet refusal and embarrassment were as follows: sex; age-groups 16–24, 25–34, 35–44 and 45–59 years; household occupational class classified as in the UK census, so that ‘other’ consists primarily of those who have never worked; ethnicity classified white, black, Asian, and other; marital status; and region of Britain. For the multivariate analysis household occupational class, ethnicity, marital status and region were condensed. Professional, intermediate, and skilled non-manual class were reclassified as ‘skilled non-manual’; skilled manual, part skilled and unskilled were reclassified as ‘unskilled or manual’. Ethnicity was condensed to white and non-white, and region was condensed to ‘metropolitan’ (inner London, outer London, other metropolitan areas), and ‘town or rural’. Widowed people were grouped with those divorced or separated.
The interviewer also recorded whether there were any language or literacy problems, or other problems of understanding, using a scale of ‘yes, severe’, ‘yes, some’, and ‘no problems’. For this analysis a binary variable indicating any problems of understanding was created, taking the value ‘yes’ if the interviewer recorded some or severe problems in any of the three areas.
Two behavioural variables were also used in this analysis. These are two key behavioural variables asked in the face-to-face part of the interview: homosexual experience as asked with a response card, and number of heterosexual partners since age 13 years. The number of heterosexual partners can be deduced to be zero, one or more than one, since participants were asked in the face-to-face interview whether they had had intercourse with a partner of the opposite sex, and how long the interval was between the first two such partners.
Interviewee embarrassment was recorded by the interviewer, in all but 429 cases, as one of four levels. Of the total 18 427 cases, 14 073 (76.4%) were recorded as not at all embarrassed, 3246 (17.6%) as only slightly embarrassed, 878 (4.8%) as somewhat embarrassed, and 230 (1.2%) as very embarrassed. For simplicity of analysis these last three levels were combined, creating a binary embarrassment variable.
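The grouping into a binary variable can be illustrated with the counts quoted above. This is a minimal sketch using only those counts; the dictionary and variable names are ours and do not reflect how the survey data were stored.

```python
# Recorded embarrassment levels and their counts, as quoted in the text.
levels = {
    "not at all": 14073,
    "only slightly": 3246,
    "somewhat": 878,
    "very": 230,
}

total = sum(levels.values())

# Binary variable: any recorded embarrassment versus none.
embarrassed = sum(n for level, n in levels.items() if level != "not at all")

print(f"total with embarrassment recorded: {total}")
print(f"embarrassed (any level): {embarrassed} ({embarrassed / total:.1%})")
```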
Statistical methods
For univariate analysis, the same tests were used to compare those who refused the booklet with those who did not as were used to compare those who were embarrassed with those who were not. Ages were compared using the t test, and number of partners (as grouped) and occupational class using the χ2 test for trend. Sex, ethnicity (white versus non-white), region (metropolitan versus non-metropolitan), problems of understanding, homosexual experience, and marital status were compared using the χ2 test.
Logistic regression was used to assess the effect of factors together. For both embarrassment and booklet refusal two models are presented. The demographic model is intended to identify those demographic subgroups in which refusal of the self-completion booklet or embarrassment were high. All the demographic terms were selected by the forwards stepwise method in both models (except sex in the refusal model), and hence for ease of comparison, all terms are included in both models we present. The full model of demographic and behavioural variables is intended to show whether behaviour is associated with embarrassment and refusal after controlling for demographic variables. All terms were included in these models, so as to best answer this question. Only the interactions between sex and number of heterosexual partners, and between homosexual experience and number of heterosexual partners were tested.
All analysis was performed with the statistical package SAS (SAS Institute, Cary, North Carolina, USA).
Results
Booklet refusal
Table 2 shows the univariate associations between demographic variables and booklet refusal. All factors except sex were significantly associated with refusal. Non-white people were far more likely to refuse, as were those with problems of understanding, those in metropolitan areas, and those of lower occupational class (all with P < 0.0001).
Table 2: Associations with refusal of the self-completion questionnaire.
Table 2 also shows the univariate associations between the two behavioural variables and booklet refusal. Those with homosexual experience were somewhat less likely to refuse (P = 0.03). Booklet refusal was highest among those reporting exactly one heterosexual partner since age 13 years, and lowest among those reporting two or more.
In the demographic model for booklet refusal (Table 3) the associations remain significant when controlling for the other variables. Age, problems of understanding, and ethnicity are seen to have the strongest effects. Refusal was highest among those aged 45–59 years, with fitted odds ratio (OR) of 4.01 [95% confidence interval (CI), 2.64–6.09] compared with those aged 16–24 years, amongst whom refusal was lowest. Refusal was highest among those of unskilled or manual household occupational class, with fitted OR of 1.58 (95% CI, 1.28–1.93) compared with those of skilled non-manual class, amongst whom refusal was lowest. Refusal among single people was similar to those married, but cohabitees and those widowed, divorced or separated were less likely to refuse, with fitted OR of 0.44 (95% CI, 0.26–0.77) and 0.62 (95% CI, 0.46–0.84), respectively, compared with those who were married.
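For readers who wish to relate a fitted OR and its CI to the underlying logistic regression coefficient, the following sketch may help. It assumes that the interval is a Wald interval, symmetric on the log-odds scale with a multiplier of 1.96; the text does not state how the intervals were computed, so this is an assumption, and the function name `log_odds_summary` is ours.

```python
import math


def log_odds_summary(odds_ratio, ci_low, ci_high, z=1.96):
    """Recover the log-odds coefficient and an approximate standard error.

    Assumes a Wald-type interval: exp(beta - z * se) to exp(beta + z * se).
    Also returns the OR implied by the midpoint of the interval on the log
    scale, which should be close to the reported OR if the assumption holds.
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    midpoint_or = math.exp((math.log(ci_low) + math.log(ci_high)) / 2)
    return beta, se, midpoint_or


# Age 45-59 versus 16-24 years in the demographic model for booklet refusal.
beta, se, midpoint_or = log_odds_summary(4.01, 2.64, 6.09)
print(f"log-odds coefficient: {beta:.3f}")
print(f"approximate standard error: {se:.3f}")
print(f"OR implied by interval midpoint: {midpoint_or:.2f}")
```

The implied OR agreeing with the reported value to two decimal places is consistent with, but does not prove, the Wald assumption.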
Table 3: Model of the odds of refusing the self-completion booklet.
In the full model of booklet refusal (Table 3) the number of heterosexual partners was found to be significant after controlling for the demographic variables. Those reporting exactly one partner remained most likely to refuse, with fitted OR of 1.56 (95% CI, 1.27–1.92) compared with those with two or more. Those reporting homosexual experience were somewhat less likely to refuse the booklet, after controlling for other variables. The demographic terms in the model were relatively unaffected by the inclusion of the two behavioural factors.
Embarrassment
Table 4 shows the univariate associations between the demographic variables and embarrassment. Men were significantly more likely to be embarrassed (P = 0.002), white people far less likely than others, those with problems of understanding far more likely, and those of lower household occupational class also far more likely (all with P < 0.0001). The youngest and the oldest were more likely to be embarrassed than the middle-aged. Those in London were least likely to be embarrassed, whereas those in other metropolitan areas were most likely. Single people were most likely to be embarrassed, and cohabitees least likely.
Table 4: Associations with interviewee embarrassment.
Table 4 also shows the univariate associations between the two behavioural factors and embarrassment. Those reporting homosexual experience were significantly less likely to be embarrassed (P = 0.0005), and those reporting more lifetime heterosexual partners (grouped as before) were significantly less likely to be embarrassed (P = 0.0001).
In the demographic model of embarrassment (Table 5), all demographic factors were found to be significant, with problems of understanding having the strongest effect. Embarrassment increased steadily with age, those aged 45–59 years having a fitted OR of 1.45 (95% CI, 1.27–1.65) compared with those aged 16–24 years. Those of unskilled or manual household occupational class were most likely to be embarrassed, with fitted OR of 1.21 (95% CI, 1.12–1.31) compared with skilled non-manual workers. Cohabitees and those widowed, divorced or separated had similar embarrassment to those who were married, whereas single people were more likely to be embarrassed, with fitted OR of 1.31 (95% CI, 1.18–1.46) compared with those who were married.
Table 5: Model of the odds of being embarrassed.
In the full model (Table 5), both behavioural factors were significant after controlling for demographic factors. Embarrassment decreased with increasing number of reported heterosexual partners; the fitted OR among those reporting no such partners relative to those reporting two or more was 2.30 (95% CI, 1.95–2.71). Those reporting homosexual experience were less likely to be embarrassed, with fitted OR of 0.69 (95% CI, 0.56–0.85). The interaction between homosexual experience and the number of heterosexual partners was not found to be significant. The associations between the demographic variables and embarrassment were relatively unaltered by entering the two behavioural factors into the model.
Among the small number of participants with ‘unknown’ characteristics the proportion embarrassed and the proportion refusing the booklet are both high. This probably reflects the fact that participants who refused even simple demographic and behavioural information were likely to be finding the interview embarrassing and to go on to refuse the booklet of more personal questions.
Discussion
Participation bias is a potential problem whenever 100% participation is not achieved, and may be a major source of bias in surveys of sexual behaviour, causing appreciable error in estimates of HIV risk. In surveys of such sensitive topics, participation bias is especially likely, and should perhaps be assumed unless there is evidence to the contrary. The method we have used to generate hypotheses about the non-participants from comparisons of willing and unwilling participants (defined in some way) would be applicable to any survey, and has been used by others (discussed below). Although the assumption that such comparisons predict the nature of participation bias is untestable, we suggest it may well be more realistic than assuming no participation bias. We outline below the effect we believe participation bias has had on two key estimates. First, however, we identify those demographic subgroups with low completion rates.
We have found that, among participants, refusing the booklet was associated with demographic variables in important ways. After controlling for other variables, booklet refusal was higher amongst ethnic minorities, the lower occupational classes, people with problems of understanding, older people and those married or single. Embarrassment followed a broadly similar pattern of association with respect to the demographic variables. However, men were more likely to be embarrassed than women, although their booklet refusal rates were similar, and metropolitan people were significantly more likely to refuse the booklet but significantly less likely to be embarrassed. Estimates of behavioural parameters (derived from questions in the booklet) within any of the ethnic minorities, amongst the lower occupational classes, amongst the old, and amongst the married or single need to be treated with particular caution.
Of more interest, we have found that among the NATSAL participants, sexual behaviour was associated with embarrassment and booklet refusal, even after controlling for demographic and understanding variables. This supports the hypothesis that there is some element of participation bias.
Reporting one lifetime heterosexual partner was significantly associated with increased booklet refusal, compared with two or more. The level of refusal in the model among those reporting no such partners may be misleading since the majority of those participants reporting no partners were not offered the booklet and so were excluded from this first analysis. Embarrassment was significantly associated with fewer partners, those reporting no partners being most embarrassed. Therefore, a crude initial hypothesis about the non-participants is that they consisted of more people with zero and one lifetime partner than the participants. Hence, estimates of these proportions based on responders alone should be regarded as probable underestimates.
Those reporting homosexual experience were somewhat less likely to refuse the booklet, and significantly less likely to be embarrassed than the other participants. Therefore, one might well suspect that the estimate of the proportion of the population with such experience, based on the responders alone, should be regarded as a probable overestimate.
On the basis of this limited analysis it would seem that, if anything, participation bias is more likely to have caused an overestimation of HIV risk behaviour than an underestimation. Other sources of error, such as reporting error, and exclusions from the sampling frame, should also be considered, and their effect on key estimates may potentially counter the effect of participation bias. In particular, the exclusion from the sampling frame of prisoners may counter some of this effect [7]. Estimates of HIV risk behaviour are key parameters in the estimation of population HIV prevalence and future spread [8]. Any overestimation of HIV risk behaviour leads to an overestimation of these measures.
However, the behaviour of the non-participants will never be known, and furthermore, one cannot determine whether embarrassment or item response bias predicts the nature of any participation bias. It is entirely possible for there to be item response bias but no appreciable participation bias. It is also possible for one set of sexual experiences to cause participants to be embarrassed or refuse the booklet and for a different set of experiences to cause non-participation. Furthermore, we note that we cannot establish causality between reported behaviour and embarrassment or willingness to complete the self-completion booklet. It may be that people with certain sexual experiences are more embarrassed and less willing participants than those with other experiences, and this assumption underpins our method. Alternatively, it may be the case that embarrassment is a marker of unwillingness to disclose censured behaviour such as homosexual experience or multiple heterosexual partners. Under this alternative hypothesis, the sexual behaviour of the non-participants is unlikely to be closer to the reported behaviour of the embarrassed or unwilling participants than to the reported behaviour of the others.
We also note that recorded embarrassment was rather subjective, and indeed may only have been noticed by the interviewer if the participant refused the booklet or a set of questions. Furthermore, there may have been variation between interviewers in the way embarrassment was recorded. Although biases from these sources cannot be excluded, the similarities between the pattern of associations with embarrassment (Table 5) and that with booklet refusal (Table 3) have already been described.
Similar approaches to assessing participation bias, based on comparing willing and unwilling participants in a multiple regression analysis, and making the same assumption, have been taken by others. Biggar and Melbye [9] compared those who participated early and late in a postal survey of sexual behaviour in Denmark, finding similar reported numbers of lifetime heterosexual partners and similar reporting of homosexual experience among those reporting some sexual experience. However, they did find that, among men, those without sexual experience tended to respond late. They conclude, after also telephoning a sample of non-participants (see below), that the non-participants do not differ greatly in sexual behaviour from the participants. Laumann et al. [10] also used an approach broadly comparable to our own to assess the possibility of participation bias. They took number of visits to the address, whether a large payment was required to obtain participation, and the judgement of the interviewer about how hard participation had been to obtain, in place of our embarrassment and booklet refusal. They used different demographic and behavioural variables in their analysis, and found little evidence to support participation bias, except to suggest that male non-participants may be less likely to have masturbated in the last year.
There is one obvious alternative technique to generate hypotheses about non-participants, that of determinedly approaching a sample of initial non-participants and asking basic questions, for example, reasons for non-participation, and this method has also been used by Biggar and Melbye [9]. However, one would be concerned that those who refuse this information or remain uncontacted differ from the others.
None of the main analyses of any of the large sexual behaviour surveys to date has incorporated the possibility that non-participants may have different sexual behaviour from participants within demographic classes. However, the possibility of participation bias arising and differentially affecting the mean number of partners reported by men and women has been considered in a recent paper based upon NATSAL [11]. It may be that the identification of and adjustment for participation bias are felt by many to be too problematic, and to be based upon assumptions that are less conservative than assuming no participation bias. In fact, the assumption of no participation bias is also a strong assumption.
To present an estimate derived from just the participants (who answered the specific question) as an estimate of the population parameter is to assume that participants, non-participants and question non-responders have the same sexual behaviour, at least within definable demographic classes. Since this assumption may well not be true, the plausible range for the true value is much wider than the apparent confidence interval.
We suggest that one more appropriate way of presenting results of such a survey would be to calculate estimates based on a limited range of plausible hypotheses concerning the behaviour of the non-participants. These hypotheses could be derived from the observed data, such as from embarrassment or number of visits to address, or be taken from elsewhere. The estimates under each of the hypotheses could be accompanied by both a confidence interval assuming the hypothesis to be true, and a range of values that would be obtained by varying the precise nature of the assumption within plausible bounds (i.e., a sensitivity analysis). We intend to follow up this work with research into the best ways of constructing hypotheses about non-participants, based on the observed data. This research will focus on measures of HIV risk, and deal with a full range of sexual behaviour variables. Indeed, estimates of the prevalence of injecting non-prescribed drugs can also be included. Work is also needed to assess the effect that these hypotheses have, via estimates of HIV risk, on estimates of population HIV prevalence and future spread. Further research is needed into how best to design a survey such that hypotheses about non-participants can easily be generated.
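One simple form such a presentation might take is sketched below. It is not an analysis of NATSAL data: the participant prevalence of 2%, the three hypotheses, and the function name `adjusted_prevalence` are all hypothetical, chosen only to show the calculation. Each hypothesis expresses the prevalence amongst non-participants as a multiple of that amongst participants, and confidence intervals are omitted for brevity.

```python
def adjusted_prevalence(participant_prev, participation_rate, ratio):
    """Population prevalence if non-participant prevalence is
    `ratio` times the participant prevalence (capped at 100%)."""
    non_participant_prev = min(1.0, ratio * participant_prev)
    return (participation_rate * participant_prev
            + (1.0 - participation_rate) * non_participant_prev)


# Hypothetical hypotheses about non-participants (illustrative values only).
hypotheses = {
    "half as common among non-participants": 0.5,
    "no participation bias": 1.0,
    "twice as common among non-participants": 2.0,
}

# Hypothetical participant prevalence of 2% and a participation rate of 65%.
for label, ratio in hypotheses.items():
    estimate = adjusted_prevalence(0.02, 0.65, ratio)
    print(f"{label}: {estimate:.2%}")
```

Varying `ratio` over a plausible interval for each hypothesis would give the range of values referred to above as a sensitivity analysis.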
Further research may also be required into survey design in order to improve the completion of questions by those groups, such as ethnic minorities and those with language or literacy problems, in which booklet refusal was high.
Any method that improves participation rates may be expected to reduce participation bias. Methods that make the interview process less invasive or more private, for example through the use of computer-based interviews, may particularly reduce participation bias, since embarrassment and worries about confidentiality may be key causes of participation bias. Other methods that attempt to ensure that people of all levels and varieties of sexual experience feel that their experiences are of interest, such as use of an explanatory letter before the visit of the interviewer, may also particularly reduce participation bias. Further research is needed in these areas of survey design. However, even if a participation rate of say 95% is achieved, then estimates of rare behaviours remain sensitive to participation bias (see Introduction). Since the prevalence of certain rare behaviours (e.g., paying for sex) is crucial to estimating the potential spread of HIV in populations, there will always be a need to consider the likelihood and possible nature of participation bias in sexual behaviour research.
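The remaining sensitivity at a 95% participation rate can be made concrete with a short worked bound. The symbols below are ours, and the participant prevalence of 2% is an illustrative value matching the example of a rare behaviour given in the Introduction, not an estimate from the survey.

```latex
% p_R : prevalence among participants (illustrative value 0.02)
% p_N : unknown prevalence among non-participants, between 0 and 1
% r   : participation rate (0.95, as in the text)
p = r\,p_R + (1 - r)\,p_N , \qquad 0 \le p_N \le 1
\quad\Longrightarrow\quad
0.95 \times 0.02 \;\le\; p \;\le\; 0.95 \times 0.02 + 0.05 ,
\quad\text{i.e.}\quad 0.019 \le p \le 0.069 .
```

In this illustration the population prevalence could, at the extreme, be more than three times the value observed among participants, even though only 5% of the sample did not take part.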
Acknowledgement
The NATSAL was supported by a generous grant from the Wellcome Trust and the fieldwork was carried out by Social and Community Planning Research.