Purpose: High-quality checklists are essential to performance test score validity. Prior research found that physical exam checklists of items that clinically discriminated between competing diagnoses provided more generalizable scores than all-encompassing thoroughness checklists. The purpose of this study was to compare validity evidence for clinically discriminating versus thoroughness checklists, hypothesizing that evidence would favor the former.
Method: Faculty at four Chicago-area medical schools developed six standardized patient (SP) cases with checklists of about 20 items (“thoroughness [long] checklists”). Four clinicians identified a subset of items that clinically discriminated between competing diagnoses of each case (“clinically discriminating [short] checklists”). Cases were administered to 155 University of Illinois at Chicago fourth-year medical students during their 2011 Clinical Skills Examination (CSE). Validity evidence was compared for CSE scores based on thoroughness versus clinically discriminating checklist items.
Results: Validity evidence favoring clinically discriminating checklists included response process evidence, namely greater SP checklist accuracy (kappa = 0.75 for long and 0.84 for short checklists, P < .05), and internal structure evidence: better item discrimination (0.28 long, 0.42 short, P < .001), higher internal consistency reliability (0.80 long, 0.92 short), lower standard error of measurement (z score 8.87 long, 8.05 short), and higher generalizability (G = 0.504 long, 0.533 short). There were no significant differences overall in relevance ratings, item difficulty, or cut scores of long versus short checklist items.
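For readers less familiar with these indices, the following is a minimal sketch of how three of them are conventionally computed: Cohen's kappa for agreement between two recordings of the same checklist, Cronbach's alpha for internal consistency, and the standard error of measurement derived from alpha. It is not the study's analysis code. The function names and the toy 0/1 data are hypothetical, and the study may have used different estimators or software.

```python
import math

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two binary (0/1) recordings of the same items."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    p_a = sum(rater_a) / n  # proportion marked "done" by rater A
    p_b = sum(rater_b) / n  # proportion marked "done" by rater B
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # chance agreement
    return (observed - expected) / (1 - expected)

def variance(values):
    """Sample variance (n - 1 denominator)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

def cronbach_alpha(scores):
    """Cronbach's alpha; scores has one row per examinee, one column per item."""
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def standard_error_of_measurement(scores):
    """SEM = SD of total scores * sqrt(1 - reliability), using alpha as reliability."""
    totals = [sum(row) for row in scores]
    unreliability = max(0.0, 1 - cronbach_alpha(scores))  # guard against rounding
    return math.sqrt(variance(totals)) * math.sqrt(unreliability)

# Hypothetical toy data, for illustration only (not from the study).
sp_recording = [1, 1, 0, 0, 1, 0, 1, 1]      # SP's checklist entries
expert_recording = [1, 1, 0, 1, 1, 0, 1, 0]  # reference rater's entries
item_scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]  # 4 examinees x 3 items

print(cohen_kappa(sp_recording, expert_recording))
print(cronbach_alpha(item_scores))
print(standard_error_of_measurement(item_scores))
```

Higher alpha shrinks the SEM for a given score spread, which is consistent with the short checklists showing both higher reliability and lower SEM in the results above.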
Conclusions: Limiting checklist items to those affecting diagnostic decisions resulted in better accuracy and psychometric indices. Thoroughness items that examinees perform by rote, without thinking, do not reflect clinical reasoning ability and contribute construct-irrelevant variance to scores.