Academic Medicine

Academic Medicine: July 2014 - Volume 89 - Issue 7
doi: 10.1097/ACM.0000000000000235
Research Reports

Clinically Discriminating Checklists Versus Thoroughness Checklists: Improving the Validity of Performance Test Scores

Yudkowsky, Rachel MD, MHPE; Park, Yoon Soo PhD; Riddle, Janet MD; Palladino, Catherine; Bordage, Georges MD, PhD



Purpose: High-quality checklists are essential to performance test score validity. Prior research found that physical exam checklists of items that clinically discriminated between competing diagnoses provided more generalizable scores than all-encompassing thoroughness checklists. The purpose of this study was to compare validity evidence for clinically discriminating versus thoroughness checklists, hypothesizing that evidence would favor the former.

Method: Faculty at four Chicago-area medical schools developed six standardized patient (SP) cases with checklists of about 20 items (“thoroughness [long] checklists”). Four clinicians identified a subset of items that clinically discriminated between competing diagnoses of each case (“clinically discriminating [short] checklists”). Cases were administered to 155 University of Illinois at Chicago fourth-year medical students during their 2011 Clinical Skills Examination (CSE). Validity evidence was compared for CSE scores based on thoroughness versus clinically discriminating checklist items.

Results: Validity evidence favoring clinically discriminating checklists included response process: greater SP checklist accuracy (kappa = 0.75 for long and 0.84 for short checklists, P < .05); internal structure: better item discrimination (0.28 long, 0.42 short, P < .001); internal consistency reliability (0.80 long, 0.92 short); standard error of measurement (z score 8.87 long, 8.05 short); and generalizability (G = 0.504 long, 0.533 short). There were no significant differences overall in relevance ratings, item difficulty, or cut scores of long versus short checklist items.
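For readers less familiar with the indices reported above, the following is a minimal sketch of how two of them, Cohen's kappa (rater agreement) and Cronbach's alpha (one common internal consistency coefficient), can be computed from dichotomous checklist data. The abstract does not state how the authors computed their indices, so the function names, the choice of Cronbach's alpha, and the toy data here are illustrative assumptions, not the study's method or data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same checklist items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


def sample_variance(values):
    """Unbiased sample variance (n - 1 denominator)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def cronbach_alpha(scores):
    """Cronbach's alpha; `scores` holds one row per examinee, one column per item."""
    n_items = len(scores[0])
    item_variances = sum(
        sample_variance([row[j] for row in scores]) for j in range(n_items)
    )
    total_variance = sample_variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - item_variances / total_variance)


# Hypothetical toy data: 1 = item done, 0 = item not done.
sp_ratings = [1, 1, 0, 0]       # standardized patient's checklist entries
expert_ratings = [1, 1, 0, 1]   # a second rater's entries for the same encounter
kappa = cohen_kappa(sp_ratings, expert_ratings)

examinee_scores = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [1, 0, 0]]
alpha = cronbach_alpha(examinee_scores)
```

With these toy inputs, kappa is 0.5 and alpha is about 0.75; the values are unrelated to the study's results and serve only to show the arithmetic.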

Conclusions: Limiting checklist items to those affecting diagnostic decisions resulted in better accuracy and psychometric indices. Thoroughness items performed without thinking do not reflect clinical reasoning ability and contribute construct-irrelevant variance to scores.

© 2014 by the Association of American Medical Colleges

