This Invited Commentary accompanies the following article:
van Gelder FE, de Graaff JC, van Wolfswinkel L, van Klei WA. Preoperative testing in noncardiac surgery patients: a survey amongst European anaesthesiologists. Eur J Anaesthesiol 2012; 29:465–470.
In this issue of the European Journal of Anaesthesiology, van Gelder et al. present a cross-sectional survey which aims to describe the practice of preoperative cardiac testing among European anaesthesiologists.1 On the basis of 359 responses from 17 countries, the investigators found considerable variation in the reported frequency of, and indications for, preoperative testing. Additionally, half of respondents reported not adhering to practice guidelines for preoperative testing, and most supported reducing, if not eliminating, routine preoperative testing.
These results provide a previously undescribed picture of contemporary perioperative practice among some European anaesthesiologists, namely the survey respondents. However, the important point to consider is whether these results can be extrapolated to the overall practice of anaesthesiology within Europe. Although all cross-sectional surveys provide information about the study sample (i.e. survey respondents) directly, they implicitly try to provide information about the underlying population from which the sample is drawn (i.e. all European anaesthesiologists). The degree to which the sample is ‘representative’ of the underlying population is an important indicator of the quality of any survey.2 If it is representative, we can reasonably infer that 70% of European anaesthesiologists support eliminating routine preoperative testing. Conversely, if the respondents are systematically different (e.g. disproportionately from specific countries), we cannot make this extrapolation.
How might readers judge how representative a survey sample is? They should first consider how participants were selected for the survey. There are two main approaches for doing this, namely ‘probability sampling’ and ‘purposive (nonprobability) sampling’.3,4
In probability sampling, participants are selected using the laws of chance. When conducted well, such methods are most likely to result in a sample which is representative of the underlying population. Typical probability sampling methods include simple random sampling, systematic random sampling, stratified random sampling and cluster random sampling.3 All these methods require that the investigators first have a list of all individuals who might potentially be eligible for the study, termed a ‘sampling frame’. An example of a sampling frame could be the membership list of a national anaesthesia society. The different probability sampling methods then use alternative approaches for randomly selecting individuals from this list. In ‘simple random sampling’, everyone in the list has an equal chance of being selected. Thus, if the list included 10 000 anaesthesiologists, and 100 potential participants were to be approached, then each individual would have a one in 100 chance of being approached. In ‘systematic random sampling’, only the starting point on the sorted list (e.g. position 130 in the alphabetically sorted membership list) is randomly selected, after which the investigators select subsequent participants at pre-specified intervals (e.g. intervals of 50 individuals).
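To make the two procedures concrete, the following is a minimal sketch in Python rather than a description of any particular survey's method. The sampling frame of 10 000 member identifiers is hypothetical, as are the function names. The systematic version assumes a ‘circular’ variant, in which the random start may fall anywhere on the list and selection wraps around at the end; other variants restrict the start to the first interval.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n members without replacement; each member has an n/len(frame) chance."""
    rng = random.Random(seed)
    return rng.sample(frame, n)


def systematic_sample(frame, interval, seed=None):
    """Circular systematic sampling (an assumed variant): only the start is random.

    After the random start, every interval-th member is taken, wrapping around
    the end of the list, until len(frame) // interval members are selected.
    """
    rng = random.Random(seed)
    size = len(frame)
    n = size // interval
    start = rng.randrange(size)
    return [frame[(start + k * interval) % size] for k in range(n)]


# Hypothetical sampling frame: an alphabetically sorted membership list.
frame = [f"member_{i:05d}" for i in range(10_000)]

srs = simple_random_sample(frame, 100, seed=1)
sys_sample = systematic_sample(frame, interval=100, seed=1)

# With 100 of 10 000 members approached, each has a one in 100 chance.
inclusion_probability = len(srs) / len(frame)
```

Both functions return 100 distinct members here; the difference lies in how much of the selection is left to chance, the whole sample in the first case and only the starting position in the second.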
In ‘stratified sampling’, the membership list is first divided into strata based on shared characteristics – for example, the hospital type in which anaesthesiologists practise. Simple or systematic random sampling would then be used to identify participants within each stratum. Stratified sampling is helpful if there are important, but small, subgroups of individuals who should be included in the study – for example, anaesthesiologists practising in small rural hospitals. In ‘cluster sampling’, the membership list is first divided into smaller heterogeneous groups which could more feasibly be approached to participate in the survey. Unlike stratified sampling, these groups are not based on shared characteristics, but instead include a diverse range of individuals. A major reason for using cluster sampling is to increase the feasibility of conducting a survey. For example, in a survey involving visits to physicians’ offices, the membership list could first be divided into clusters based on geographical regions, and individuals then randomly sampled within these clusters. The geographical clustering of the physicians’ offices to be visited would help to increase the feasibility of such a survey.
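The contrast between these two designs can be sketched as follows. This is an illustrative example only: the membership list, the hospital types, the regions and the function names are all hypothetical. The cluster version additionally assumes a common two-stage form in which a few clusters are first chosen at random and individuals are then sampled within the chosen clusters.

```python
import random
from collections import defaultdict


def group_by(frame, key):
    """Split the sampling frame into groups according to a key function."""
    groups = defaultdict(list)
    for person in frame:
        groups[key(person)].append(person)
    return groups


def stratified_sample(frame, stratum_of, n_per_stratum, seed=None):
    """Sample within EVERY stratum, so small subgroups are guaranteed a place."""
    rng = random.Random(seed)
    strata = group_by(frame, stratum_of)
    return {name: rng.sample(members, min(n_per_stratum, len(members)))
            for name, members in strata.items()}


def cluster_sample(frame, cluster_of, n_clusters, n_per_cluster, seed=None):
    """Two-stage sketch (assumed): pick a few clusters, then sample within them."""
    rng = random.Random(seed)
    clusters = group_by(frame, cluster_of)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
            for name in chosen}


# Hypothetical membership list; small rural hospitals are deliberately rare.
regions = ["north", "south", "east", "west"]
frame = [
    {
        "id": i,
        "hospital_type": "small rural" if i % 20 == 0
        else ("university" if i % 2 else "community"),
        "region": regions[i % 4],
    }
    for i in range(1000)
]

by_stratum = stratified_sample(frame, lambda p: p["hospital_type"], 10, seed=1)
by_cluster = cluster_sample(frame, lambda p: p["region"], 2, 15, seed=1)
```

In this sketch the stratified design returns 10 anaesthesiologists from each hospital type, including the rare rural group, whereas the cluster design confines fieldwork to two regions.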
Unlike probability sampling, purposive or nonprobability sampling involves selecting individuals from the underlying population using non-random methods. An example is ‘convenience sampling’ in which individuals who happen to be available at the time of the study (e.g. physicians attending a conference)5 are chosen for participation.4 Given that participants are selected in a non-random manner from the underlying population, their responses cannot necessarily be extrapolated to the population. Nonetheless, purposive sampling has an important role in research and can be employed ‘as long as it is understood to whom the results do (or do not) apply’.6 Indeed, such sampling methods are often used in qualitative research studies7 in which investigators may specifically ‘select only those cases that best illuminate and test the hypothesis of the research team’.4
In addition to sampling methods, readers should also consider the ‘response rate’ among participants. It is likely that individuals who do not respond to a survey are systematically different from those who do, leading to a ‘nonresponse bias’. For example, a survey conducted exclusively by electronic mail may lead to more nonresponses by older anaesthesiologists, who may also have systematically different preferences for preoperative testing. Consequently, investigators should focus on quantifying and maximising the response rate in any survey. Various methods have been proposed to maximise response rates,8 which should exceed 60 to 70% to maintain external validity.2
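A short worked calculation may help to show how nonresponse can distort a survey estimate. The sketch below is purely illustrative: the two groups, their sizes, their response probabilities and their levels of support are invented numbers, not data from any survey, and the function names are hypothetical.

```python
def response_rate(n_responses, n_distributed):
    """Responses divided by surveys distributed; None if the denominator is unknown."""
    if not n_distributed:
        return None
    return n_responses / n_distributed


def true_support(groups):
    """Proportion of the whole population supporting a statement."""
    total = sum(size for size, _, _ in groups)
    return sum(size * support for size, _, support in groups) / total


def observed_support(groups):
    """Expected proportion supporting the statement among respondents only."""
    respondents = sum(size * respond for size, respond, _ in groups)
    supporters = sum(size * respond * support for size, respond, support in groups)
    return supporters / respondents


# Hypothetical population: (group size, probability of responding, probability of support)
groups = [
    (6000, 0.5, 0.8),  # assumed younger group: responds more often, supports more often
    (4000, 0.2, 0.4),  # assumed older group: responds less often, supports less often
]

expected_respondents = sum(size * respond for size, respond, _ in groups)
rate = response_rate(expected_respondents, sum(size for size, _, _ in groups))

population_value = true_support(groups)    # 6400 / 10 000
survey_value = observed_support(groups)    # 2720 / 3800

# With 359 responses but an unknown number of surveys distributed,
# the response rate cannot be computed.
unknown_rate = response_rate(359, None)
```

Under these invented figures the response rate is 38%, and the survey would suggest about 72% support when the true population value is 64%, simply because the group that responds more readily also holds the more favourable view.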
These issues of sampling and response rates are relevant not only to anaesthesia research. Indeed, underestimation of their importance led to the infamous botched prediction of the 1948 American presidential contest between President Harry S. Truman and the Republican nominee Thomas E. Dewey (http://en.wikipedia.org/wiki/United_States_presidential_election,_1948). In that contest, Truman trailed significantly behind Dewey in most pre-election opinion polls. Truman's impending defeat was felt to be so certain that, on election night itself, the Chicago Daily Tribune published ‘Dewey defeats Truman’ as its headline for the following morning. However, the following morning, the results showed that Truman had defeated Dewey by more than 2 million votes.
So why were the polling companies’ predictions so wrong? Of several reasons cited, two are relevant to our discussion. First, as opposed to random sampling, many polling companies in 1948 used a nonprobability method called quota sampling.9,10 Essentially, polling companies provided their interviewers with specific quotas for subgroups to be interviewed, for example 10 males aged over 60 years and 20 females aged less than 30 years. The choice of individuals to interview to meet these quotas was left to the discretion of the interviewers. Second, many polling companies at the time contacted participants by telephone. In 1948, access to telephones was greatest among wealthier individuals, who were also more likely to vote for Dewey.9 Thus, the failure to consider whether participants in the pre-election polls of 1948 were representative of the larger voting population helped lead to a very public humiliation for the polling companies.
On the basis of these criteria of sampling and response rates, how should readers view the study by van Gelder et al.? The investigators describe providing a link to a web-based survey to representatives of 36 national anaesthesia societies, which then distributed the link to their members using a variety of methods, ranging from emails to all members to publishing the link on the society website. Thus, the survey employed a nonprobability sampling approach. In addition, the response rate cannot be determined because the total number of surveys distributed is unknown. Given these important limitations, should readers simply ignore the results? The short answer is no. Some results can be reasonably extrapolated to the larger population of European anaesthesiologists. For example, it is likely that the observed variability in preoperative testing patterns also applies to the population of European anaesthesiologists. Conversely, we cannot extrapolate the 70% preference for eliminating preoperative testing among the respondents to all European anaesthesiologists. It would, therefore, be premature to use these results for justifying initiatives to remove preoperative testing altogether because these initiatives may lack broad support.
In summary, readers of cross-sectional surveys should carefully consider both how survey participants were selected and how often they fail to respond to the survey. Alternatively, they can simply remember this humorous, yet informative, quote from the comedian Craig Kilborn: ‘a telephone survey says that 51 percent of college students drink until they pass out at least once a month. The other 49 percent didn’t answer the phone’.
Assistance with the Commentary: none declared.
Sources of funding: D.N.W. and S.R.J. are supported by Clinician-Scientist Awards from the Canadian Institutes of Health Research, and D.N.W. is also supported by a Merit Award from the Department of Anesthesia at the University of Toronto.
Conflicts of interest: There are no conflicts of interest.
Comment from the Editor: This manuscript has been reviewed by the editors but not submitted for external peer review.
1. van Gelder FE, de Graaff JC, van Wolfswinkel L, van Klei WA. Preoperative testing in noncardiac surgery patients: a survey amongst European anaesthesiologists. Eur J Anaesthesiol 2012; 29:465–470.
2. Burns KEA, Duffett M, Kho ME, et al. A guide for the design and conduct of self-administered surveys of clinicians. CMAJ
3. Aday LA, Cornelius LJ. Designing and conducting health surveys: a comprehensive guide. Hoboken, New Jersey:John Wiley and Sons; 2006.
4. Kemper E, Stringfield S, Teddlie CB. Mixed methods sampling strategies in social science research. In: Tashakkori A, Teddlie C, editors. Handbook of mixed methods in social and behavioral research. Thousand Oaks, California: Sage Publications; 2003. pp. 273–296.
5. Johnson SR, Granton JT, Tomlinson GA, et al. Effect of warfarin on survival in scleroderma-associated pulmonary arterial hypertension (SSc-PAH) and idiopathic PAH. Belief elicitation for Bayesian priors. J Rheumatol
6. Fletcher RH, Fletcher SW. Clinical epidemiology: the essentials. Philadelphia, Pennsylvania:Lippincott Williams and Wilkins; 2005.
7. Wijeysundera DN, Feldman BM. Quality, not just quantity: the role of qualitative methods in anesthesia research. Can J Anaesth
8. Dillman DA, Smyth JD, Christian LM. Internet, mail, and mixed-mode surveys: the tailored design method. Hoboken, New Jersey: John Wiley and Sons; 2009.
9. Friedenson B. Dewey defeats Truman and cancer statistics. J Natl Cancer Inst
10. Mosteller F. Why did Dewey beat Truman in the preelection polls. In: Fienberg SE, Hoaglin DC, Tanur JM, editors. The pleasures of statistics: the autobiography of Frederick Mosteller. New York, New York: Springer; 2009. pp. 1–13.