Education & Administration

Is There Evidence of Gender Bias in the Oral Examination for Initial Certification by the American Board of Physical Medicine & Rehabilitation?

Driscoll, Sherilyn W. MD; Robinson, Lawrence R. MD; Raddatz, Mikaela M. PhD; Kinney, Carolyn L. MD

American Journal of Physical Medicine & Rehabilitation: June 2019 - Volume 98 - Issue 6 - p 512-515
doi: 10.1097/PHM.0000000000001126


Unconscious bias (UB) is a well-documented phenomenon in medical education, research, and practice.1–9 Unconscious or implicit associations of certain stereotypes with individuals outside of one's own social or identity group may influence perceptions or judgments of other people. Bias, whether conscious or unconscious, may result in a prejudicial evaluation of another person and could potentially lead to unfair treatment or unequal opportunities.

Medical professionals demonstrate levels of UB similar to those of the general population.9,10 Most studies focus on the impact of UB on patient care, although a growing body of literature addresses UB in medical school activities such as faculty recruitment and advancement.3,4,8 In 2015, women made up approximately 35% of the field of physical medicine and rehabilitation (PMR).11 A recent report from the Association of Academic Physiatrists Women's Task Force suggested that, compared with the gender distribution of the general membership, women were underrepresented in leadership positions, including trustees, committee chairs, annual meeting plenary speakers, award recipients, and American Journal of PMR editorial board members.12 Whether this gender inequality is related to UB or to other causes is unknown, but these reported gender differences prompted questions about other professional activities, including board certification.

Little is known about UB in the certification processes of the American Board of Medical Specialties Member Boards, several of which still include an oral examination as a component of initial certification. Unconscious bias at this level could potentially result in unfair examination scoring. The American Board of PMR (ABPMR) includes an oral examination (Part II) as a component of the initial certification process in PMR. The Part II examination tests patient care, systems-based practice, and interpersonal and communication skills across five domains: data acquisition, problem-solving, patient management, systems-based practice, and interpersonal communication. Every May, the ABPMR conducts the Part II examination in Rochester, MN. Examiner scoring behavior is modeled and statistically controlled through a severity measure, which is derived from the pattern of each examiner's ratings given the ability level of the candidate and the difficulty of the task.
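The article describes the severity measure only in words. A model in which each rating depends jointly on candidate ability, task difficulty, and examiner severity is the many-facet Rasch model commonly used for rater-mediated examinations. The article does not name the model the ABPMR uses, so the formulation below and all of its symbols are assumptions offered as an illustration, not the Board's documented method.

```latex
\log\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k
```

Here P_{nijk} is the probability that candidate n receives rating category k (rather than k - 1) from examiner j on task i; B_n is the candidate's ability, D_i the task's difficulty, C_j the examiner's severity, and F_k the difficulty of the step from category k - 1 to k. Under this formulation, a larger C_j lowers the expected rating for every candidate examiner j scores, which is how a harsh examiner can be separated statistically from a weak candidate.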

Because this examination involves face-to-face interactions between candidates and examiners, there is a potential risk of gender bias in scoring despite design elements intended to minimize this possibility. Because of this concern, UB training was mandated for all examiners in 2017 and 2018.

This retrospective observational study was undertaken to determine whether performance on the ABPMR Part II examination differed based on candidate gender and whether the gender of examiner/candidate pairs affected Part II examination scores. The impact of examiner UB training on candidate performance was also assessed.


Aggregate performance of first-time men and women Part II examination candidates from 2013 to 2018 was reviewed (2359 candidates; 1427 men/932 women). Gender was self-identified. The data analysis included the Part II examination scores and pass rates of men and women candidates. Part II performance was analyzed after controlling for Part I scores, and the analysis also compared severity measures of men and women examiners, mean domain scores of men and women candidates, and candidate performance by examiner/candidate gender combination.

In addition, these candidate and examiner performance metrics, along with examiner consistency measures, were compared before and after implementation of examiner UB training. The UB training required of all examiners was a publicly available 15-min online module produced by Cook Ross with the following objectives: (1) to define terms related to UB, (2) to understand the research, science, and impact of UB, (3) to identify the filters through which individuals view and interpret themselves and others, and (4) to explore patterns of evaluating other people based on one's own background. In addition, part of the in-person pre-examination examiner training session was devoted to raising awareness of UB through group discussion.

The Mayo Clinic Institutional Review Board determined that the study did not meet the criteria for Human Subjects Research as defined under 45 CFR 46.102. This study conforms to STROBE guidelines (see Supplemental Checklist, Supplemental Digital Content 1).


For first-time Part II PMR certification examination takers between 2013 and 2018, pass rates (men 84%, women 89%) and mean scaled scores (men 6.56, women 6.81) differed significantly between men and women (P < 0.001), with women consistently achieving higher scores (Table 1). Mean domain scores also differed significantly between men and women (all P < 0.001) (Fig. 1), with the largest differences in data acquisition and interpersonal and communication skills. Although comparisons of candidate domain performance varied somewhat by examiner/candidate gender combination, women examinees scored higher than men examinees whether their examiner was male or female.

Table 1. Comparison of mean scaled scores and pass rates between men and women (both P < 0.001) taking the Part II PMR examination for the first time between 2013 and 2018
Figure 1. Comparison of women and men on subscores and total scores; all differences significant at P < 0.001.
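The article reports P < 0.001 for the pass-rate difference but does not name the statistical test used. A pooled two-proportion z-test is one standard way to compare two pass rates, and the sketch below assumes that test. The pass counts are not given in the article, so they are reconstructed approximately from the rounded rates (84% of 1427 men, 89% of 932 women); the output therefore only approximates whatever the authors computed, and the function name is hypothetical.

```python
import math
from typing import Tuple


def two_proportion_z_test(pass_a: int, n_a: int, pass_b: int, n_b: int) -> Tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = pass_a / n_a
    p_b = pass_b / n_b
    # Pooled pass rate under the null hypothesis that both groups share one rate
    pooled = (pass_a + pass_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# ASSUMED counts, reconstructed from the rounded pass rates reported in the text
men_n, women_n = 1427, 932
men_pass = round(0.84 * men_n)      # approximate number of men who passed
women_pass = round(0.89 * women_n)  # approximate number of women who passed
z, p = two_proportion_z_test(men_pass, men_n, women_pass, women_n)
print(f"z = {z:.2f}, two-sided p = {p:.5f}")
```

With these reconstructed counts the test gives a P value below 0.001, consistent with the reported result. A Pearson chi-square test on the 2 × 2 pass/fail table (without continuity correction) would give the same P value, because for two groups its statistic equals z squared.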

Regarding examiner characteristics, scoring severity (a measure of how harshly an examiner grades) did not differ significantly between men and women examiners (men 4.93, women 4.86; P = 0.244) (Table 2).

Table 2. Difference in severity between men and women examiners (P = 0.244)

Implementation of UB training in 2017 and 2018 did not significantly change candidate scores or examiner severity or consistency (Tables 3–5).

Table 3. Comparison of mean scaled scores after implementing examiner unconscious bias training in 2017 and 2018 (P = 0.165)
Table 4. Comparison of examiner severity after implementing examiner unconscious bias training in 2017 and 2018 (P = 0.109)
Table 5. Comparison of examiner consistency after implementing examiner unconscious bias training in 2017 and 2018 (P = 0.623)

The combination of examiner and candidate gender significantly predicted candidate scores (P < 0.001). However, the preceding analyses and post hoc analyses indicate that this effect reflected differences in candidate performance, not in examiner scoring.

Mean raw scores differed significantly between male and female candidates assessed by male examiners (male examiner/male candidate vs. male examiner/female candidate) and between male and female candidates assessed by female examiners (female examiner/male candidate vs. female examiner/female candidate). However, female candidates' scores did not differ significantly between male and female examiners (male examiner/female candidate vs. female examiner/female candidate) (Table 6). Therefore, scores appear to have been predicted by candidate performance, not examiner performance.

Table 6. Comparisons of relative scores on Part II for different combinations of male and female examiners and examinees

For the 2013–2018 Part II examinee cohort, the vast majority took the Part I written PMR examination between 2012 and 2017, so their written examination scores could be directly compared for men versus women. Overall, mean scaled scores did not differ significantly between men and women (P = 0.567), nor did Part I pass rates (P = 0.170) (Table 7).

Table 7. Comparison of Part I examination scores and pass rates for men and women who took Part I 2012–2017

After controlling for Part I scores, the gender of candidates still significantly predicted Part II scores (P < 0.001).
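The article states that candidate gender still predicted Part II scores after controlling for Part I scores but does not specify the model. One common approach is a linear regression (ANCOVA) of Part II score on Part I score plus a gender indicator; the coefficient on the indicator is then the gender difference adjusted for Part I performance. The sketch below assumes that model and fits it by ordinary least squares on a hypothetical synthetic cohort. The `ols` helper, the cohort size, and every coefficient are invented for illustration and are not ABPMR data or the authors' method.

```python
import random
from typing import List


def ols(rows: List[List[float]], y: List[float]) -> List[float]:
    """Ordinary least squares via the normal equations (X'X) b = X'y,
    solved with Gaussian elimination and partial pivoting."""
    k = len(rows[0])
    # Build X'X (k x k) and X'y (length k)
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Augmented matrix [X'X | X'y]
    a = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        # Swap in the row with the largest pivot for numerical stability
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(col + 1, k):
            factor = a[row][col] / a[col][col]
            for c in range(col, k + 1):
                a[row][c] -= factor * a[col][c]
    # Back substitution on the upper-triangular system
    b = [0.0] * k
    for i in range(k - 1, -1, -1):
        b[i] = (a[i][k] - sum(a[i][j] * b[j] for j in range(i + 1, k))) / a[i][i]
    return b


# HYPOTHETICAL synthetic cohort, not ABPMR data: Part I scores are independent of gender,
# and Part II = 1.5 + 0.01 * Part I + 0.25 * female + noise (all coefficients invented).
rng = random.Random(42)
design, part2 = [], []
for _ in range(2000):
    female = 1.0 if rng.random() < 0.4 else 0.0
    part1 = rng.gauss(500, 80)
    design.append([1.0, part1, female])  # intercept, covariate (Part I), gender indicator
    part2.append(1.5 + 0.01 * part1 + 0.25 * female + rng.gauss(0, 0.5))

intercept, b_part1, b_female = ols(design, part2)
print(f"Part I slope: {b_part1:.4f}; gender effect adjusted for Part I: {b_female:.3f}")
```

Because the simulated gender effect is built in independently of Part I, the fitted gender coefficient should land near 0.25 and the Part I slope near 0.01. A real analysis would also report a standard error and P value for the gender coefficient, which this sketch omits.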


Women candidates scored higher and had a higher pass rate than men candidates overall on the ABPMR Part II (oral) examination for initial certification. This difference does not seem to be due to gender bias in scoring by Part II examiners or to differences in candidate aptitude as measured by the Part I examination.

Previous studies specifically looking for evidence of gender bias in medical education have produced mixed results, with some showing no evidence of bias or difference between men and women.13–17 However, other studies raise suspicion of gender bias in a variety of medical settings. Women gastroenterology fellows were scored lower than men by faculty evaluators, whereas there was no difference between women and men internal medicine residents at the same institution.18 Gender pairing was a significant factor in another internal medicine program, where male faculty awarded significantly higher scores to male residents than female faculty awarded to the same residents.7 Women emergency medicine residents received scores similar to those of men from faculty during their first year of residency but then lost ground in milestone achievement through graduation.1 Another study suggested that female emergency medicine residents received discordant feedback from staff regarding autonomy and assertiveness, whereas feedback to male residents was more consistent about areas needing improvement.6 Female obstetrics and gynecology residents early in training received harsher feedback from nurses than did male residents.2 Female students scored higher than male students on a musculoskeletal clinical evaluation and were given significantly higher scores by male examiners than by female examiners.19

The ABPMR Part II examination is designed and structured to minimize bias. The examination content is standardized, and scoring metrics are reviewed with all examiners. In addition, each candidate is assigned both male and female examiners across their three individual examinations. Nevertheless, because of concern about potential bias, UB training was instituted for all examiners in 2017 and 2018. There is evidence that such training may raise awareness, change habits, and reduce gender bias in the medical setting.3,20,21 However, scores and pass rates did not change after examiner UB training was introduced in our study. Possible explanations include the following: (1) any UB that does exist among examiners was minimized by the design and structure of the examinations, (2) the examiners were already educated about and aware of UB, or (3) additional or a different type of UB training might have produced measurable differences.

Because women and men perform equally well on the written Part I PMR examination but women outperform men on the Part II examination, we considered whether we might be seeing evidence of bias against men. Alternatively, women may simply outperform men despite any lingering gender biases. In support of the latter, several studies suggest that women demonstrate stronger communication skills than men in the medical setting. For example, women outperform men on United States Medical Licensing Examination Step 2 Clinical Skills communication and interpersonal skills scores,22 medical student clinical examination scores in Ob-Gyn,23 medical student clinically based performance examinations,24 and medical student general practice Objective Structured Clinical Examination (OSCE)-style communication scores.25 Colliver et al26 found higher ratings for women in the attribute of “personal manner,” and among Spanish medical students, women showed a trend toward outscoring men on a clinical competence score using standardized patients.27 Although PMR is male-dominated by number of practicing physicians (65% male, 35% female),11 women candidates outperform men on the oral examination. On the oral examination, it is possible that some men ask fewer questions because they believe they already know the answer and want to move quickly to problem-solving. A critical review and a meta-analysis support the notion that women physicians spend more time with patients (2.24 min more on average), gathering information and adopting a patient-centered communication style.28,29 Differences in communication style between genders may therefore help explain the higher scores awarded to women in oral examination situations. Unfortunately, our limited data allow us only to speculate about the underlying reasons for these gender differences.

To date, two other specialties have published data specifically on gender differences in performance on American Board of Medical Specialties oral board examinations. In anesthesiology, women showed a trend toward better performance on Part II oral examinations.30 In general surgery, no gender differences were found in scores or pass rates, and scores were not influenced by examiner/examinee gender pairing.31

Limitations of our study include its retrospective design; the inability to evaluate other biases, such as those based on appearance or accent, that may supersede any gender bias; and the inability to truly identify and understand the potential biases of both examiners and examinees. Moreover, gender, as opposed to sex, is not the simple dichotomous variable we have treated it as here; gender lies along a spectrum of socially constructed roles and reflects an individual's concept of themselves along this spectrum. Finally, because our examination content and examiner training are specific to the ABPMR, we do not believe that these results can necessarily be extrapolated to other specialties or other examinations.

Despite these safeguards, UB is an inherent limitation of all interpersonal interactions. A rigorous orientation process for Part II examiners now includes UB training as a best practice. Although this study did not reveal an impact of UB training on examiner or candidate performance, studies support that UB training raises awareness of potential bias among those in grading or scoring roles,3 and UB training will continue as a component of ABPMR's Part II examiner orientation. Further research is needed to determine the most relevant and effective UB training for medical specialty certification processes.


1. Dayal A, O'Connor DM, Qadri U, et al.: Comparison of male vs female resident milestone evaluations by faculty during emergency medicine residency training. JAMA Intern Med 2017;177:651–7
2. Galvin SL, Parlier AB, Martino E, et al.: Gender bias in nurse evaluations of residents in obstetrics and gynecology. Obstet Gynecol 2015;126:7S–12
3. Girod S, Fassiotto M, Grewal D, et al.: Reducing implicit gender leadership bias in academic medicine with an educational intervention. Acad Med 2016;91:1143–50
4. Holmboe ES, Huot SJ, Brienza RS, et al.: The association of faculty and residents' gender on faculty evaluations of internal medicine residents in 16 residencies. Acad Med 2009;84:381–4
5. Morgan HK, Purkiss JA, Porter AC, et al.: Student evaluation of faculty physicians: gender differences in teaching evaluations. J Womens Health (Larchmt) 2016;25:453–6
6. Mueller AS, Jenkins TM, Osborne M, et al.: Gender differences in attending physicians' feedback to residents: a qualitative analysis. J Grad Med Educ 2017;9:577–85
7. Rand VE, Hudes ES, Browner WS, et al.: Effect of evaluator and resident gender on the American Board of Internal Medicine evaluation scores. J Gen Intern Med 1998;13:670–4
8. Templeton K: Impact of gender on teaching evaluations of faculty: another example of unconscious bias? J Womens Health (Larchmt) 2016;25:420–1
9. FitzGerald C, Hurst S: Implicit bias in healthcare professionals: a systematic review. BMC Med Ethics 2017;18:19
10. Johnson TJ, Ellison AM, Dalembert G, et al.: Implicit bias in pediatric academic medicine. J Natl Med Assoc 2017;109:156–63
11. Association of American Medical Colleges: Active Physicians by Sex and Specialty, 2015. Available at: Accessed October 15, 2018
12. Silver JK, Cuccurullo SJ, Ambrose AF, et al.: Association of Academic Physiatrists women's task force report. Am J Phys Med Rehabil 2018;97:680–90
13. Richens D, Graham TR, James J, et al.: Racial and gender influences on pass rates for the UK and Ireland specialty board examinations. J Surg Educ 2016;73:143–50
14. Halperin EC, Broadwater GJ: Are there sex biases in standardized tests of radiation oncology knowledge? J Clin Oncol 1997;15:2722–7
15. McManus IC, Elder AT, Dacre J: Investigating possible ethnicity and sex bias in clinical examiners: an analysis of data from the MRCP(UK) PACES and nPACES examinations. BMC Med Educ 2013;13:103
16. Solomon DJ, Speer AJ, Ainsworth MA, et al.: Investigating gender bias in preceptors' ratings of medical students. Acad Med 1993;68:703
17. Denney ML, Freeman A, Wakeford R: MRCGP CSA: are the examiners biased, favouring their own by sex, ethnicity, and degree source? Br J Gen Pract 2013;63:e718–25
18. Thackeray EW, Halvorsen AJ, Ficalora RD, et al.: The effects of gender and age on evaluation of trainees and faculty in gastroenterology. Am J Gastroenterol 2012;107:1610–4
19. Schleicher I, Leitner K, Juenger J, et al.: Examiner effect on the objective structured clinical exam—a study at five medical schools. BMC Med Educ 2017;17:71
20. Carnes M, Devine PG, Baier Manwell L, et al.: The effect of an intervention to break the gender bias habit for faculty at one institution: a cluster randomized, controlled trial. Acad Med 2015;90:221–30
21. Devine PG, Forscher PS, Cox WTL, et al.: A gender bias habit-breaking intervention led to increased hiring of female faculty in STEMM departments. J Exp Soc Psychol 2017;73:211–5
22. Cuddy MM, Swygert KA, Swanson DB, et al.: A multilevel analysis of examinee gender, standardized patient gender, and United States Medical Licensing Examination Step 2 clinical skills communication and interpersonal skills scores. Acad Med 2011;86:S17–20
23. Jacques L, Kaljo K, Treat R, et al.: Intersecting gender, evaluations, and examinations: averting gender bias in an obstetrics and gynecology clerkship in the United States. Educ Health (Abingdon) 2016;29:25–9
24. Haist SA, Wilson JF, Elam CL, et al.: The effect of gender and age on medical school performance: an important interaction. Adv Health Sci Educ Theory Pract 2000;5:197–205
25. Wiskin CM, Allan TF, Skelton JR: Gender as a variable in the assessment of final year degree-level communication skills. Med Educ 2004;38:129–37
26. Colliver JA, Vu NV, Marcy ML, et al.: Effects of examinee gender, standardized patient gender, and their interaction on standardized patients' ratings of examinees' interpersonal and communication skills. Acad Med 1993;68:153–7
27. Gómez JM, Prieto L, Pujol R, et al.: Clinical skills assessment with standardized patients. Med Educ 1997;31:94–8
28. Roter DL, Hall JA: Physician gender and patient-centered communication: a critical review of empirical research. Annu Rev Public Health 2004;25:497–519
29. Jefferson L, Bloor K, Birks Y, et al.: Effect of physicians' gender on communication and consultation length: a systematic review and meta-analysis. J Health Serv Res Policy 2013;18:242–8
30. McClintock JC, Gravlee GP: Predicting success on the certification examinations of the American Board of Anesthesiology. Anesthesiology 2010;112:212–9
31. Ong TQ, Kopp JP, Jones AT, et al.: Is there gender bias on the American Board of Surgery general surgery certifying examination? J Surg Res 2019;237:131–5

Gender; Unconscious Bias; Board Certification; ABPMR


Copyright © 2019 Wolters Kluwer Health, Inc. All rights reserved.