- The population is defined clearly, for both subjects (participants) and stimulus (intervention), and is sufficiently described to permit the study to be replicated.
- The sampling procedures are sufficiently described.
- Subject samples are appropriate to the research question.
- Stimulus samples are appropriate to the research question.
- Selection bias is addressed.
ISSUES AND EXAMPLES RELATED TO CRITERIA
Investigators in health outcomes, public health, medical education, clinical practice, and many other domains of scholarship and science are expected to describe the research population(s), sampling procedures, and research sample(s) for the empirical studies they undertake. These descriptions must be clear and complete to allow reviewers and research consumers to decide whether the research results are valid internally and can be generalized externally to other research samples, settings, and conditions. Given necessary and sufficient information, reviewers and consumers can judge whether an investigator's population, sampling methods, and research sample are appropriate to the research question.
Sampling from populations has become a key issue in 20th- and 21st-century applied research because it addresses both research efficiency and accuracy. To illustrate, the Gallup Organization achieves highly accurate (±3 percentage points) estimates of the opinions of the U.S. population (280 million) using samples of approximately 1,200 individuals.1
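The arithmetic behind such estimates is worth making explicit: for a simple random sample, the margin of error depends on the sample size, not the population size, which is why roughly 1,200 respondents suffice for a population of 280 million. A minimal sketch using the standard normal-approximation formula (the function name is ours, not Gallup's method as published):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a proportion estimated from a
    simple random sample of size n (worst case at p = 0.5)."""
    return z * math.sqrt(p * (1 - p) / n)

print(f"±{margin_of_error(1200) * 100:.1f} percentage points")  # → ±2.8 percentage points
```

Note that the population size never enters the formula; only increasing n shrinks the margin of error.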
Sampling from research populations operates along at least two dimensions: sampling of subjects or participants (e.g., North American medical students) and sampling of stimuli or conditions (e.g., clinical problems or cases). Some investigators employ a third approach, matrix sampling, to sample research subjects and stimuli simultaneously.2 In all cases, however, reviewers should find the subject and stimulus populations and the sampling procedures clearly defined and described.
Given a population of interest (e.g., North American medical students), how does an investigator define a population subset (sample) for the practical matter of conducting a research study? Textbooks provide detailed, scholarly descriptions of purist sampling procedures.3,4 Other scholars, however, offer practical guides. For example, Fraenkel and Wallen5 identify five sampling methods that a researcher may use to draw a representative subset from a population of interest, including simple random, systematic, stratified random, and cluster sampling.
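These chance-based methods can be sketched in a few lines. The student identifiers, strata, and cluster sizes below are hypothetical illustrations, not data from any study:

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible
population = [f"student_{i:03d}" for i in range(100)]

# Simple random: every member has an equal chance of selection.
simple = random.sample(population, 10)

# Systematic: every k-th member after a random start.
k = len(population) // 10
start = random.randrange(k)
systematic = population[start::k]

# Stratified random: sample separately within strata
# (here, two hypothetical schools) to guarantee representation.
strata = {"school_A": population[:60], "school_B": population[60:]}
stratified = [m for members in strata.values()
              for m in random.sample(members, 5)]

# Cluster: randomly select whole groups (e.g., classes) and keep
# every member of the chosen groups.
clusters = [population[i:i + 20] for i in range(0, 100, 20)]
cluster_sample = [m for c in random.sample(clusters, 2) for m in c]
```

Each method trades off practicality against the precision of its population estimates; stratification guards representation of known subgroups, while cluster sampling eases field logistics at some statistical cost.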
Experienced reviewers know that most research in medical education involves convenience samples of students, residents, curricula, community practitioners, or other units of analysis. Generalizing the results of studies done on convenience samples of research participants or other units is risky unless there is a close match between the research subjects and the target population to which the results are applied. In some areas, such as clinical studies, the match is crucial, and there are many excellent guides (for example, see Fletcher, Fletcher, and Wagner6). Sometimes research is deliberately done on “significant”7 or specifically selected samples, such as Nobel Laureates or astronauts and cosmonauts,8 where description of particular subjects, not generalization to a subject population, is the scholarly goal.
Once a research sample is identified and drawn, its members may be assigned to study conditions (e.g., treatment and control groups in the case of intervention research). By contrast, measurements are obtained uniformly from a research sample for single-group observational studies looking at statistical correlations among variables. Qualitative observational studies of intact groups such as the surgery residents described in Forgive and Remember9 and the internal medicine residents in Getting Rid of Patients10 follow a similar approach but use words, not numbers, to describe their research samples.
Systematic sampling of subjects or other units of analysis from a population of interest allows an investigator to generalize research results beyond the information obtained from the sample values. The same logic holds for the stimuli or independent variables involved in a research enterprise (e.g., clinical cases and their features in problem-solving research). Careful attention to stimulus sampling is the cornerstone of representative research.11–13
An example may make the issue clearer. (The specifics here are from medical education; they are directly applicable to health professions education and broadly applicable to the social sciences.) Medical learners and practitioners are expected to solve clinical problems of varied degrees of complexity as one indicator of their clinical competence. However, neither the population of eligible problems nor clear-cut rules for sampling clinical problems from the parent population have been made plain. Thus the problems, often expressed as cases, that are used to evaluate medical personnel are chosen haphazardly. This probably contributes to the frequently cited finding of case specificity (i.e., nongeneralizability) of performance in research on medical problem solving.14 An alternative hypothesis is that case specificity has more to do with how the cases are selected or designed than with the problem-solving skill of physicians in training or practice.
Recent work on construction of examinations of academic achievement in general15,16 and medical licensure examinations in particular17 is giving direct attention to stimulus sampling and representative design. Conceptual work in the field of facet theory and design18 also holds promise as an organizing framework for research that takes stimulus sampling seriously.
Research protocols that make provisions for systematic, simultaneous sampling of subjects and stimuli use matrix sampling.2 Matrix sampling is especially useful when an investigator aims to judge the effects of an overall program on a broad spectrum of participants.
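The idea can be illustrated with a toy design (the resident and case labels below are hypothetical): each subject receives only a fraction of the stimulus pool, assigned in rotation, yet the pool is fully covered across the sample:

```python
subjects = [f"resident_{i}" for i in range(12)]
cases = [f"case_{c}" for c in "ABCDEFGH"]  # hypothetical pool of 8 clinical cases

# Matrix sampling: each subject works only 3 of the 8 cases, assigned
# in rotation, so the full case pool is covered across subjects while
# no individual carries the whole testing burden.
assignment = {s: [cases[(i + j) % len(cases)] for j in range(3)]
              for i, s in enumerate(subjects)}

covered = {c for picks in assignment.values() for c in picks}
print(len(covered))  # → 8: every case in the pool is covered
```

This is why matrix sampling suits program-level questions: the design yields information about the whole stimulus domain without demanding exhaustive testing of any one participant.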
Isolating and ruling out sources of bias is a persistent problem when identifying research samples. Subject-selection bias is more likely to occur when investigators fail to specify and use explicit inclusion and exclusion criteria; when there is differential attrition (dropout) of subjects across study conditions; or when samples are too small to give valid estimates of population parameters and adequate statistical power. Reviewers must be attentive to these potential flaws. Research reports should also describe the use of incentives, compensation for participation, and whether the research participants were volunteers.
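One way a reviewer can gauge whether a sample is "too small" is a back-of-the-envelope sample-size calculation. A minimal sketch under the usual normal approximation for comparing two group means (two-sided alpha = .05, power = .80; the function name and defaults are ours):

```python
import math

def n_per_group(effect_size, alpha_z=1.96, power_z=0.84):
    """Approximate subjects needed per group to detect a standardized
    effect of the given size in a two-group comparison of means
    (normal approximation; two-sided alpha = .05, power = .80)."""
    return math.ceil(2 * (alpha_z + power_z) ** 2 / effect_size ** 2)

print(n_per_group(0.5))  # → 63 per group for a medium effect
print(n_per_group(0.2))  # → 392 per group for a small effect
```

A convenience sample far below such figures cannot rule out chance as an explanation for a null result, which is exactly the flaw the criterion asks reviewers to catch.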
1. Gallup Opinion Index. Characteristics of the Sample. Princeton, NJ: Gallup Organization, 1999.
2. Sirotnik KA. Introduction to matrix sampling for the practitioner. In: Popham WJ (ed). Evaluation in Education: Current Applications. Berkeley, CA: McCutchan, 1974.
3. Henry GT. Practical sampling. In: Applied Social Research Methods Series, Vol. 21. Newbury Park, CA: Sage, 1990.
4. Patton MQ. Qualitative Evaluation and Research Methods. 2nd ed. Newbury Park, CA: Sage, 1990.
5. Fraenkel JR, Wallen NE. How to Design and Evaluate Research in Education. 4th ed. Boston, MA: McGraw-Hill, 2000.
6. Fletcher RH, Fletcher SW, Wagner EH. Clinical Epidemiology: The Essentials. 3rd ed. Baltimore, MD: Williams & Wilkins, 1996.
7. Simonton DK. Significant samples: the psychological study of eminent individuals. Psychol Meth. 1999;4:425–51.
8. Santy PA. Choosing the Right Stuff: The Psychological Selection of Astronauts and Cosmonauts. Westport, CT: Praeger, 1994.
9. Bosk CL. Forgive and Remember: Managing Medical Failure. Chicago, IL: University of Chicago Press, 1979.
10. Mizrahi T. Getting Rid of Patients: Contradictions in the Socialization of Physicians. New Brunswick, NJ: Rutgers University Press, 1986.
11. Brunswik E. Systematic and Representative Design of Psychological Experiments. Berkeley, CA: University of California Press, 1947.
12. Hammond KR. Human Judgment and Social Policy. New York: Oxford University Press, 1996.
13. Maher BA. Stimulus sampling in clinical research: representative design revisited. J Consult Clin Psychol. 1978;46:643–7.
14. van der Vleuten CPM, Swanson DB. Assessment of clinical skills with standardized patients: state of the art. Teach Learn Med. 1990;2:58–76.
15. Linn RL, Gronlund NE. Measurement and Assessment in Teaching. 7th ed. Englewood Cliffs, NJ: Prentice-Hall, 1995.
16. Millman J, Green J. The specification and development of tests of achievement and ability. In: Linn RL (ed). Educational Measurement. 3rd ed. New York: Macmillan, 1989.
17. LaDuca A. Validation of professional licensure examinations: professions theory, test design, and construct validity. Eval Health Prof. 1994;17:178–97.
18. Shye S, Elizur D, Hoffman M. Introduction to Facet Theory: Content Design and Intrinsic Data Analysis in Behavioral Research. Applied Social Methods Series Vol. 35. Thousand Oaks, CA: Sage, 1994.
Review Criteria for Research Manuscripts
Joint Task Force of Academic Medicine and the GEA-RIME Committee