In Reply to O’Connor et al: We thank O’Connor and colleagues for their comments on our Research Report. Assessment of competency is a complex, multifaceted process that includes workplace-based assessments, such as the emergency department shift cards used for the residents in Dayal and colleagues’ study,1 as well as other assessment methods. These assessments contribute to programmatic assessment for milestone reporting to the Accreditation Council for Graduate Medical Education (ACGME). Systematic bias, including gender bias, may be incorporated in the assessment process. In our study of the national dataset of emergency medicine (EM) resident milestone assessments, males and females were rated similarly for the majority of subcompetencies. However, statistically significant but very small absolute differences were noted in several subcompetencies, all within the general competency of patient care.
Assessment with validity and reliability evidence is challenging, with few assessment instruments and processes demonstrating strong validity evidence. All biases threaten the generalizability and validity of assessments. O’Connor and colleagues raise the concern of straight line scoring (SLS), which has been noted in some EM milestone reporting.2 SLS per se does not imply greater gender bias, nor does it negate the importance of the analyses in our study. Whether SLS affects each gender equally is a question that could be answered empirically in a future study.
Thus, one key step in measuring competency is the examination of validity evidence and the continuous improvement of the assessments. For example, the ACGME is undertaking the Milestones 2.0 project to realign and, in some cases, redefine the milestones for all specialties. Similarly, there is a responsibility to examine the validity evidence, including evidence of gender bias, for all types of assessment, from shift cards to clinical competency committees (CCCs). This process will improve the quality of the assessments as well as of summative decisions and formative feedback. The responsibility for examining and improving assessments lies with the programs through single- and multi-institutional research. In addition, as evidenced by our study, the ACGME welcomes collaborative partnerships to engage in rigorous research projects to address these important questions. Finally, qualitative research on how raters score trainees and how CCCs render decisions may help improve validity and decrease bias.
Sally A. Santen, MD, PhD
Professor and senior associate dean, Virginia Commonwealth University School of Medicine, Richmond, Virginia; [email protected]; ORCID: http://orcid.org/0000-0002-8327-8002.
Kenji Yamazaki, PhD
Senior analyst, Milestones Research and Evaluation, Accreditation Council for Graduate Medical Education, Chicago, Illinois.
Eric S. Holmboe, MD
Chief research, milestone development and evaluation officer, Accreditation Council for Graduate Medical Education, Chicago, Illinois; ORCID: https://orcid.org/0000-0003-0108-6021.
Stan J. Hamstra, PhD
Vice president, Milestones Research and Evaluation, Accreditation Council for Graduate Medical Education, Chicago, Illinois; ORCID: https://orcid.org/0000-0002-0680-366X.
1. Dayal A, O’Connor DM, Qadri U, Arora VM. Comparison of male vs female resident milestone evaluations by faculty during emergency medicine residency training. JAMA Intern Med. 2017;177:651–657.
2. Beeson MS, Hamstra SJ, Barton MA, et al. Straight line scoring by clinical competency committees using emergency medicine milestones. J Grad Med Educ. 2017;9:716–720.