The Association of Faculty and Residents’ Gender on Faculty Evaluations of Internal Medicine Residents in 16 Residencies

Holmboe, Eric S. MD; Huot, Stephen J. MD, PhD; Brienza, Rebecca S. MD, MPH; Hawkins, Richard E. MD

doi: 10.1097/ACM.0b013e3181971c6d
Gender Issues

Purpose Previous studies have found gender bias in the global evaluations of trainees. The purpose of this study was to investigate the association of faculty and residents’ gender with the evaluation of residents’ specific clinical skills, using direct observation.

Method In 2001–2002, 40 clinician–educators from 16 internal medicine residency programs viewed a series of nine scripted videotapes depicting varying levels of residents’ clinical performance in medical interviewing, physical examination, and counseling. Differences in the ratings of women versus men faculty, in relation to differences in the residents’ gender, were compared using random-effects regression analysis.

Results There were no statistically or educationally significant differences in the rating of clinical skills attributable to faculty or residents’ gender for medical interviewing, physical examination, or counseling.

Conclusions This study suggests that gender bias may be less prevalent in the current era of evaluation of clinical skills, particularly when specific skills are directly observed by faculty. Further work is needed to examine whether the findings of this study translate to the actual training setting.

Dr. Holmboe is senior vice president for quality research and academic affairs, American Board of Internal Medicine, Philadelphia, Pennsylvania, and professor adjunct, Yale University, New Haven, Connecticut.

Dr. Huot is professor, Department of Medicine, Yale University, New Haven, Connecticut, and program director, Yale Primary Care Residency Program, New Haven, Connecticut.

Dr. Brienza was formerly clinic director, Norwalk Hospital Internal Medicine Residency Program, Norwalk, Connecticut.

Dr. Hawkins is vice president, Assessment Programs, National Board of Medical Examiners, Philadelphia, Pennsylvania.

Correspondence should be addressed to Dr. Holmboe, American Board of Internal Medicine, 510 Walnut Street, Suite 1700, Philadelphia, PA 19160; telephone: (215) 446-3606; fax: (215) 446-3633; e-mail: (

A number of studies have assessed the relationship between trainees’ gender and their evaluation of medical knowledge and clinical competence.1–7 Some have only assessed whether cognitive or noncognitive performance between men and women trainees is different.8,9 Other studies have assessed the perception by others of women trainees, compared with men, in the areas of humanism and technical skills.9,10 Limited data exist that address potential differences in performance ratings attributable to gender bias within the evaluation process of resident trainees by faculty. In addition, some reported differences in ratings by gender may simply reflect real differences in clinical and evaluative skills not associated with gender.

A study examining the American Board of Internal Medicine (ABIM) evaluation form found that male residents, compared with female residents, received significantly higher scores from male attending physicians than from female attending physicians in many of the domains studied.11 A small, single-institution study of faculty ratings of interns did not find any differences by gender.8 A study of 14,340 U.S. and Canadian graduates taking the ABIM certifying examination found that for overall competence, male trainees were rated higher than female trainees by program directors.12 However, a more recent study did not find significant differences in competency ratings of male and female residents on evaluation forms completed at the end of a ward rotation during a randomized controlled trial of an educational intervention.13

All of the previously published studies compared global ratings of various domains of competence. These previous findings of potential rating bias by faculty based on gender are more concerning, given more recent studies that used standardized patients (SPs) for assessment.14,15 For example, Haist and colleagues14 found that fourth-year female medical students scored higher than male students in all domains tested on a clinical skills assessment. Van Zanten et al15 reported the same finding among all international medical graduates who took the Educational Commission for Foreign Medical Graduates’ clinical skills assessment, including the areas of empathy, attentiveness, and attitude. The findings from SPs are also more in line with differences noted among female and male physicians in communication behaviors with actual patients.16

Less is known about whether the gender of the trainee and/or attending is associated with the evaluation of clinical skills through direct observation by faculty. The objective of our study was to examine the potential interaction of the gender of faculty and residents on the evaluation of the clinical skills of medical interviewing, physical examination, and counseling.



Forty faculty from 16 internal medicine residency programs in the Northeast (Connecticut, Massachusetts, and Rhode Island) and Mid-Atlantic (Maryland, Virginia, and District of Columbia) regions participated in a randomized controlled trial of a faculty development course designed to improve evaluation skills in 2001–2002.16 Participating faculty from each institution were chosen by the residency program director. Five university-based programs (i.e., situated at a medical school’s primary teaching hospital) and 11 university-affiliated community-based programs (i.e., not situated at a medical school and not the medical school’s primary teaching hospital) participated. Program directors were encouraged to select faculty who did or could play a significant role in the program’s educational and assessment activities; program directors were also encouraged to participate. Participants were informed that they would be assigned to a control or intervention group to test a faculty development intervention and that they would complete a comprehensive baseline assessment, including the rating of a series of videotaped clinical encounters. The study was approved by the Yale University human investigation committee and the Uniformed Services University of the Health Sciences institutional review board.

Assessment of clinical skills

Before participating in the faculty development sessions, all faculty observed and rated a series of nine videotaped clinical encounters that were presented in random order.

  • The history skill videotapes depict a Caucasian male resident evaluating a 64-year-old African American woman presenting to the emergency room with acute shortness of breath and chest pain attributable to a pulmonary embolism.
  • The physical exam skill videotapes depict a Caucasian male resident evaluating a 69-year-old Caucasian male with progressive shortness of breath secondary to ischemic cardiomyopathy with varying levels of physical examination skills (errors of both omission and commission).
  • The counseling skill videotapes depict a Caucasian female resident evaluating a 48-year-old Caucasian male returning to clinic for follow-up of recently diagnosed hypertension to discuss treatment options with varying levels of informed decision making and patient-centered counseling skills.

Regarding the different history skill videotapes, the “poor” performance encounter shows the resident conducting a very doctor-centered interview with all closed-ended questions. The resident also neglects a number of key feature history items. The “best” performance encounter shows the use of open-ended questions and coverage of key feature questions that lead to a proper diagnosis.

Faculty rated resident performance on the tapes using a modified version of the ABIM’s nine-point mini-clinical evaluation exercise form.17–19 On this form, a score of one to three denotes unsatisfactory performance, four to six denotes satisfactory performance, and seven to nine denotes superior performance. Videotape scripts were written for SPs and standardized residents (SRs) to depict three levels of performance for each of three clinical skills: history taking, physical examination, and counseling. The same SR and patient portrayed the three levels of performance for each of the three clinical skills. None of the tapes were designed to be a “gold standard”; some deficiencies were depicted on each tape.
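The score bands of the nine-point form described above can be sketched in a few lines of Python. This is only an illustration of the stated cut points (one to three, four to six, seven to nine); the function name `mini_cex_category` is hypothetical and is not part of the ABIM form or of the study's analysis.

```python
def mini_cex_category(score: int) -> str:
    """Map a nine-point rating to the performance band described in the text.

    Hypothetical helper for illustration only.
    """
    if not 1 <= score <= 9:
        raise ValueError("rating must be an integer from 1 to 9")
    if score <= 3:
        return "unsatisfactory"
    if score <= 6:
        return "satisfactory"
    return "superior"


# Print the band for every possible rating on the form.
for s in range(1, 10):
    print(s, mini_cex_category(s))
```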

Statistical analysis

The mean differences in ratings between male and female faculty were first examined in an unadjusted analysis for each videotape, with P < .05 considered statistically significant. A random-effects regression analysis was then performed, adjusting for faculty age, years in current job, and specialty (general internist or subspecialist). Male faculty served as the reference group. An effect coefficient >0 denotes that female faculty were more likely to give a higher rating, and a value <0 denotes that female faculty were more likely to give a lower rating. All analyses were performed with SAS (SAS Institute, Inc., Cary, North Carolina).
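One way to write out the adjusted analysis is as a random-intercept model for a single clinical skill. This is a sketch under assumptions, not the authors' stated specification: the text names the covariates and the reference group but not the random-effects structure, so the rater-level intercept and every symbol below are illustrative.

```latex
% Assumed form: random intercept for each faculty rater, one model per clinical skill.
% y_{ij}: rating (1 to 9) given by faculty rater i to videotape j
% Female_i: 1 for female faculty, 0 for male faculty (reference group)
% x_i: rater covariates (age, years in current job, specialty)
\begin{equation}
  y_{ij} = \beta_0 + \beta_1\,\mathrm{Female}_i
         + \boldsymbol{\gamma}^{\top}\mathbf{x}_i + u_i + \varepsilon_{ij},
  \qquad u_i \sim N(0, \sigma_u^2), \quad \varepsilon_{ij} \sim N(0, \sigma^2)
\end{equation}
```

Under this form, the effect coefficient corresponds to $\beta_1$: a positive value means female faculty tended to give higher ratings than male faculty with the same covariates, and a negative value means they tended to give lower ratings.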


The demographic characteristics of faculty participants are shown in Table 1. The majority of the faculty were general internists, and nearly half (48%) were women. A total of 348 tapes were rated by the 40 faculty. Some faculty arrived late to the baseline assessment; a total of 12 faculty–videotape encounters (3%) were not rated. Table 2 displays the unadjusted mean rating score for each videotape encounter according to the gender of the faculty rater. Level one videotapes were scripted to portray the lowest level of performance for the depicted clinical skill, and level three videotapes were scripted to depict the highest level of performance for the depicted clinical skill. This unadjusted analysis shows no statistically significant differences between the male and female faculty ratings for each of the clinical skills, regardless of the gender of the trainee on the videotape.
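The rating counts reported above follow from simple arithmetic, sketched here in Python. The function name `rating_counts` is hypothetical; the inputs (40 faculty, nine videotapes, 12 unrated encounters) are taken from the text.

```python
def rating_counts(n_faculty: int, n_tapes: int, n_missing: int):
    """Return expected encounters, rated encounters, and percent not rated."""
    expected = n_faculty * n_tapes          # every rater sees every tape
    rated = expected - n_missing            # encounters actually rated
    pct_missing = 100 * n_missing / expected
    return expected, rated, pct_missing


expected, rated, pct_missing = rating_counts(n_faculty=40, n_tapes=9, n_missing=12)
print(expected, rated, round(pct_missing, 1))  # 360 348 3.3
```

The 348 rated encounters and the roughly 3% of encounters not rated match the figures reported in the paragraph.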

Table 1

Table 2

Table 3 displays the results of the random-effects regression model that adjusts for the faculty characteristics of age, program director status, length of time in job, and specialty. Using this analysis, compared with their male colleagues, female faculty rated the physical exam skills of the scripted male resident slightly lower, the interviewing skills of a different scripted male resident slightly higher, and the counseling skills of the scripted female resident slightly higher. None of these differences, however, were statistically significant.

Table 3


Our data demonstrate—in a controlled research setting investigating the observation of discrete, measurable behaviors—that there are no significant differences in the rating of clinical skills via direct observation based on the gender of the faculty or the resident. To our knowledge, this is one of the first studies to directly study the association of gender with the evaluation of clinical skills. Almost all previous studies on faculty evaluations had only examined gender as it was associated with global competency ratings, not specific, targeted competencies such as interviewing, physical examination, and counseling skills.3,8,11–13

How do these observations compare with the real-world setting of residency training? And are our findings consistent across specialties? Our cohort consisted of relatively young clinician–educators who trained in an era of increasing enrollment of women into medical school and internal medicine residencies.20 This group of faculty, perhaps more accustomed to working with women colleagues in their formative educational years, likely had different experiences and possibly less bias than would colleagues who trained during a time or in specialties in which direct interaction with female physicians was more limited. Also, the task of rating specific clinical skills and behaviors, rather than providing a global rating of perceived competence not grounded in direct observation, may help to mitigate any potential effects of the trainee gender bias seen in past studies with global ratings. Past studies have shown that faculty based their global ratings mostly on the personality and perceived medical knowledge of the resident and not on explicit criteria or directly observed clinical skills.21,22 Furthermore, the faculty had no personal relationship with the trainees shown in the videotapes. This may have substantially mitigated the effect of the “personality factor” seen in previous studies of global rating scales.21,22

Our finding of a lack of an educationally significant difference in ratings based on gender when directly observing residents perform clinical skills is encouraging, especially given that the videotapes were scripted to depict varying levels of performance based on the quality of the clinical skills and not gender-specific traits. Given the growing concern over the state of trainees’ clinical skills, there is an urgent need for more direct observation by faculty, and with women now constituting approximately 50% of the medical school enrollment, recognizing and addressing gender bias in the evaluation of clinical competence is important.20–23 This study, along with two earlier single-institution studies, suggests that gender bias in evaluation may be lessening.8,13 However, future studies on the potential association of gender and the evaluation of clinical skills now need to move into the training setting.

Several limitations should be noted. Faculty participants were either program directors or selected by their program directors; the majority were relatively early in their careers, and most were general internists. Our results may not generalize to all faculty, particularly older faculty or subspecialty physicians. Also, we do not know whether the rating behaviors seen in this study reflect what faculty actually do when rating real residents in their own programs. Characteristics known to differ between female and male physicians, such as female physicians’ tendency to conduct longer visits and be more conversational, could have a larger effect when trainees are observed in actual clinical encounters.16 Finally, we did not assess the effects of ethnicity on evaluation outcomes, another important area for study.


To our knowledge, this is one of the few studies to investigate the potential effect of gender on the rating of trainees’ clinical skills. In a controlled setting, in which specific skills were evaluated via direct observation by faculty, we did not find any evidence of gender bias. Given the refocus on clinical skills in medical education, we need a fuller understanding of the factors that affect the quality and accuracy of evaluation based on direct observation.24,25 Future studies should examine whether gender and ethnicity bias in the ratings of trainee performance occur when working with actual patients.


The authors wish to thank Dr. Yun Wang for his statistical assistance on this manuscript.

This project was supported in part by a grant from the Robert Wood Johnson Foundation.


1 Dawson-Saunders B, Iwamoto C, Postell LE, Nungester RJ, Swanson DB. Initial investigation of differential performance by men and women on a national certification examination in medicine. Paper presented at: Annual Meeting of the American Educational Research Association; April 1990; Boston, Mass.
2 Norcini JJ, Fletcher SW, Quimby BB, Shea JA. Performance of women candidates on the American Board of Internal Medicine Certifying Examination, 1973–1982. Ann Intern Med. 1985;102:115–118.
3 Day SC, Norcini JJ, Shea JA, Benson JA. Gender differences in the clinical competence of residents in internal medicine. J Gen Intern Med. 1989;4:309–312.
4 Stillman PL, Regan MB, Swanson DB, Haley HA. Gender differences in clinical skills as measured by an examination using standardized patients. In: Hart IR, Harden RM, DesMarchais JE, eds. Current Developments in Assessing Clinical Competence. Montreal, Canada: Can-Heal Publications; 1992.
5 van der Vleuten CPM, Jacobs EA. Comparison between the clinical competence of female and male medical students. In: Bender W, Hiemstra RJ, Scherpbier AJJA, Zwierstra RP, eds. Teaching and Assessing Clinical Competence. Groningen, Netherlands: Boek-Werk Publications; 1990.
6 Colliver JA, Marcy ML, Travis TA, Robbs RS. The interaction of student gender and standardized-patient gender on a performance-based examination of clinical competence. Acad Med. 1991;66(9 suppl):S31–S33.
7 Schueneman AL, Pickleman J, Freeark RJ. Age, gender, lateral dominance and prediction of operative skill among general surgery residents. Surgery. 1985;98:506–514.
8 Ringdahl EN, Delzell JE, Kruse RL. Evaluation of interns by senior residents and faculty: Is there any difference? Med Educ. 2004;38:646–651.
9 Linn LS, Cope DW, Leake B. The effect of gender and training of residents on satisfaction ratings by patients. J Med Educ. 1984;59:964–966.
10 Engleman EG. Attitudes toward women physicians: A study of 500 clinic patients. West J Med. 1974;120:95–100.
11 Rand VE, Hudes ES, Browner WS, Wachter RM, Avins AL. Effect of evaluator and resident gender on the American Board of Internal Medicine evaluation scores. J Gen Intern Med. 1998;13:670–674.
12 Day SC, Norcini JJ, Shea JA, Benson JA Jr. Gender differences in the clinical competence of residents in internal medicine. J Gen Intern Med. 1989;4:309.
13 Brienza R, Huot S, Holmboe ES. Evaluation of internal medicine residents: Does gender bias exist? J Womens Health. 2004;13:77–83.
14 Haist SA, Witzke DB, Quinliven S, Murphy-Spencer A, Wilson JF. Clinical skills as demonstrated by a comprehensive clinical performance examination: Who performs better—men or women? Adv Health Sci Educ. 2003;8:189–199.
15 van Zanten M, Boulet JR, Norcini JJ, McKinley D. Using a standardized patient assessment to measure professional attributes. Med Educ. 2005;39:20–29.
16 Hall JA, Irish JT, Roter DL, Ehrlich CM, Miller LH. Gender in medical encounters: An analysis of physician and patient communication in a primary care setting. Health Psychol. 1994;13:384–392.
17 Holmboe ES, Hawkins RE, Huot SJ. Direct observation of competence training: A randomized controlled trial. Ann Intern Med. 2004;140:874–881.
18 Norcini JJ, Blank LL, Arnold GK, Kimball HR. The mini-CEX (clinical evaluation exercise): A preliminary investigation. Ann Intern Med. 1995;123:795–799.
19 Norcini JJ, Blank LL, Duffy FD, Fortna GS. The mini-CEX: A method for assessing clinical skills. Ann Intern Med. 2003;138:476–481.
20 Lambert E, Holmboe ES. The relationship between specialty choice and gender of U.S. medical students. Acad Med. 2005;80:797–802.
21 Thompson WG, Lipkin M Jr, Gilbert DA, Guzzo RA, Roberson L. Evaluating evaluation: Assessment of the American Board of Internal Medicine resident evaluation form. J Gen Intern Med. 1990;5:214–217.
22 Haber RJ, Avins AL. Do ratings on the American Board of Internal Medicine resident evaluation form detect differences in clinical competence? J Gen Intern Med. 1994;9:140–145.
23 Association of American Medical Colleges. Women in U.S. Academic Medicine Statistics and Medical School Benchmarking 2004–2005. Washington, DC: Association of American Medical Colleges; October 2005.
24 Holmboe ES. The importance of faculty observation of trainees’ clinical skills. Acad Med. 2004;79:16–22.
25 Association of American Medical Colleges. Educating Doctors to Provide High Quality Medical Care. A Vision for Medical Education in the United States. Available at: ( Accessed November 4, 2008.
© 2009 Association of American Medical Colleges