As part of the licensure and certification process in the United States, physicians must achieve acceptable scores on a series of standardized competency tests. Performances on previous qualifying examinations can be used to some degree to predict a candidate's success on subsequent assessments. In addition, previous research has shown that certain cohorts of candidates have a greater likelihood of performing well on specific examinations. Candidate characteristics such as age, gender, and native language have all been shown to be related to an examinee's likelihood of success on particular assessments.1–3
Numerous studies have examined predictive relationships and candidate characteristics that influence performance on the United States Medical Licensing Examination (USMLE). In 1997, Swanson et al. analyzed the performance of international medical school graduates (IMGs) on the USMLE Step 1 (basic science) examination. Results showed that gender (male), taking the test at a foreign test center and earlier in the education process, age (younger), and English as a first language were attributes associated with better performance.1 Factors associated with IMGs' USMLE Step 2 (clinical science) performance were also studied, and the authors reported similar findings.2 Finally, the performance of US citizens who graduated from medical schools outside of the US and Canada (USIMGs) on the Step exams was examined, and similar results were reported.3 Younger examinees and those who took the examination near the time of graduation performed better than older candidates and those who took the exams later in their careers. Although males scored better than females on Step 1, males and females performed comparably on Step 2. On both Step exams, native English speakers received higher scores than non-native speakers.
In other studies of predictive factors of USMLE performance, Swanson et al. and Case et al. examined the relationship between candidates' medical school performance and achievement on Steps 1 and 2.4,5 Moderate to strong relationships between Step scores and school-based measures of clinical performance were reported. When US citizens who attended international medical schools were studied specifically, data showed that younger examinees who took the Step exams close to graduation did better, as did native English speakers. Males tended to achieve higher scores on Step 1, although there was no appreciable gender difference based on Step 2 performance.
Additional studies have specifically investigated the role of gender in the assessment of clinical performance. Generally, females performed comparably to or better than males on Objective Structured Clinical Examinations.6,7 In a study specifically focused on OB-GYN clinical clerkship performance, females clearly did better than males.8
The purpose of the study was to examine the relationship between performance on the Educational Commission for Foreign Medical Graduates' (ECFMG) Clinical Skills Assessment (CSA) and candidate characteristics. The CSA is a performance-based examination designed to ensure that the clinical skills of IMGs are comparable to those of students graduating from accredited medical schools in the United States and Canada. Several candidate characteristics were analyzed, including age, native language (English versus all others), and years since graduation from medical school. Additionally, the following examination scores were studied to determine their relationship, if any, to CSA outcomes: Test of English as a Foreign Language (TOEFL), USMLE Step 1, and USMLE Step 2.
Clinical Skills Assessment
Graduates of international medical schools who wish to enter graduate training programs in the United States are required to be certified by ECFMG. Requirements for ECFMG certification include verification of a medical diploma, passing scores on USMLE Step 1 and Step 2, an acceptable score on the TOEFL, and passing status on the CSA.
The CSA is a performance-based assessment that requires candidates to demonstrate their clinical skills in a simulated medical environment. Candidates interact with 10 or 11 standardized patients (SPs), lay people trained to realistically portray common clinical complaints. The candidates evaluate the SPs as they would actual patients, gathering relevant patient data, performing a focused physical examination, and summarizing their findings in the form of a clinical note. Candidates have 15 minutes to interact with each SP and 10 minutes after each encounter to write up their findings. The exam is divided into two components, the Integrated Clinical Encounter (ICE) and Doctor-Patient Communication (COM). The ICE portion comprises history taking and physical exam skills, scored analytically by SPs completing case-specific checklists, and written communication of patient findings, scored holistically by trained physician raters. Standardized patient ratings of interpersonal skills (IPS) and spoken English proficiency (ENG) make up the COM portion of the exam. The psychometric properties of CSA component and composite scores, including generalizability and dependability coefficients, have been reported elsewhere.9–12 In general, depending on the specific component or composite, the reliability of the scores, over 10 encounters, ranges from 0.66 to 0.93.
Interpersonal skills are assessed across four dimensions using a four-point Likert scale. The first dimension, skills in interviewing and collecting information, includes criteria such as the effective use of open and closed-ended questions, clarity of questions and the use of verification, summarization and transition phrases. The second dimension focuses on a physician's skills in counseling and delivering information to patients. Criteria such as checking for a patient's understanding, avoiding jargon in presenting possible diagnoses and leaving a patient with an understanding of what will happen next are evaluated. The third dimension assesses a candidate's ability to establish rapport with a patient. Factors evaluated include a doctor's attentiveness, body language, attitude and demonstration of empathy and support for patient's concerns. The fourth dimension, personal manner, assesses criteria such as a candidate's mood and demeanor and skills in conducting a physical exam respectful of a patient's comfort and modesty. Candidates must achieve passing scores on both the ICE and COM portions of the exam to pass the CSA.
From February 1, 2000 to January 31, 2002, over 13,000 candidates took the ECFMG CSA, of whom 11,690 were taking the CSA for the first time. Due to possible performance and personal characteristic differences between repeat and first-attempt candidates, only data from first-time takers were analyzed. Of this cohort, approximately 17% failed the CSA, with 27% of this failing group not passing COM, 54% failing ICE, and 19% failing both exam components. English as a native language was reported by 24% of the candidates, and 42% were female. Twenty-four percent of candidates were US citizens at the time of medical school, followed by 20% Indian, 6% Pakistani, and 3% Chinese. Each of the other nationalities was represented by less than 3% of candidates. The mean age of first-time test takers was 30.5 years. Average 3-digit Step 1 and Step 2 scores were 206 and 204, respectively. The mean TOEFL score was 263.
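The percentages above can be turned into approximate counts and overall component failure rates. The following is a minimal sketch, assuming the three failing subgroups (COM only, ICE only, both) are mutually exclusive, which is consistent with their summing to 100%. The function and key names are illustrative, and because the published percentages are rounded, the outputs are approximations rather than figures reported by the study.

```python
def component_failure_summary(n_candidates, fail_rate, com_only, ice_only, both):
    """Turn the rounded percentages quoted in the text into approximate figures.

    com_only, ice_only and both are shares of the failing group and are
    assumed to be mutually exclusive (they sum to 1.0 in the text).
    """
    n_failed = n_candidates * fail_rate
    return {
        "n_failed": round(n_failed),
        # Share of ALL first-time takers who failed each component.
        "failed_com_share": fail_rate * (com_only + both),
        "failed_ice_share": fail_rate * (ice_only + both),
    }


summary = component_failure_summary(11690, 0.17, 0.27, 0.54, 0.19)
print(summary)
```

Under these assumptions, roughly 8% of first-time takers failed COM and roughly 12% failed ICE, with some candidates counted in both groups.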
A stepwise logistic regression was used to explore the relationships between candidate characteristics and CSA pass/fail status. Separate models were based on COM and ICE status (dependent variables). The independent variables were gender, native language (English versus other), age (≥30, <30), recent graduation from medical school (≥5 years, <5 years), USMLE Step 1 passing score, USMLE Step 2 passing score, and TOEFL score. Each variable was entered in a stepwise fashion and retained if it met a p < .001 criterion.
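For readers less familiar with the method, the model fitted for each composite can be written in the standard logistic form. This is the generic textbook formulation, not notation taken from the study; the symbols $x_j$ (a retained predictor, either a 0/1 indicator or a score) and $\beta_j$ (its coefficient) are introduced here for illustration only.

```latex
\[
\log\frac{P(\text{pass})}{1 - P(\text{pass})}
  = \beta_0 + \sum_{j=1}^{k} \beta_j x_j ,
\qquad
\mathrm{OR}_j = e^{\beta_j},
\qquad
\mathrm{OR}_j^{(10)} = e^{10\,\beta_j}
\]
```

Here $\mathrm{OR}_j$ is the odds ratio for a one-unit change in $x_j$ with the other retained predictors held fixed, and $\mathrm{OR}_j^{(10)}$ is the corresponding odds ratio for a 10-unit change in a score.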
Integrated Clinical Encounter
Based on the variable selection criteria, passing ICE was positively associated with Step 2 and TOEFL scores and was significantly more likely for female than for male candidates. The odds ratio for passing the ICE composite for females relative to males, adjusting for Step 2 and TOEFL scores, was 2.64 (95% confidence interval (CI), 2.31–3.03). The odds ratios for Step 2 and TOEFL were somewhat lower: for every 10-unit increase in either score, the odds of passing ICE increased by a factor of 1.20.
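In a logistic model, a factor such as the 1.20 per 10-unit score increase is a multiplicative change in the odds, so it can be rescaled to other score increments by working on the log scale. The sketch below illustrates this; the function name is hypothetical, and the calculation assumes the effect is linear in the log-odds across the score range, as the model itself does.

```python
import math


def rescale_odds_ratio(odds_ratio, from_units, to_units):
    """Convert an odds ratio quoted per `from_units` into one per `to_units`.

    Assumes a logistic model, so log-odds change linearly with the score.
    """
    return math.exp(math.log(odds_ratio) * to_units / from_units)


per_point = rescale_odds_ratio(1.20, from_units=10, to_units=1)
per_twenty = rescale_odds_ratio(1.20, from_units=10, to_units=20)
print(round(per_point, 4), round(per_twenty, 2))
```

A 10-unit factor of 1.20 therefore corresponds to about a 1.8% increase in the odds per score point, and to a factor of 1.44 for a 20-unit difference.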
Passing the doctor–patient communication composite was associated with gender, recent graduation, Step 2 performance, and TOEFL score, and was more likely for native English speakers. Similar to the results presented for ICE, controlling for other retained variables, the odds of passing COM were more than twice as high for females as for males (95% CI, 1.71–2.39). Native language was, however, the strongest predictor of COM pass/fail status (odds ratio = 6.85; 95% CI, 3.81–12.29). Recent graduation (≤5 years) was positively related to COM pass/fail status (odds ratio = 1.54; 95% CI, 1.32–1.81). For a 10-unit increase in USMLE Step 2 score, the odds of passing COM increased by a factor of 1.08. Likewise, a 10-unit increase in TOEFL score was associated with a 1.62-fold increase in the odds of passing COM.
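The confidence intervals can be checked against the reported odds ratios. The sketch below assumes the intervals are Wald intervals, that is, symmetric on the log-odds scale with a 1.96 multiplier; the text does not state how they were computed, so this is an assumption, and the function name is illustrative.

```python
import math


def wald_summary(lower, upper, z=1.96):
    """Recover the implied odds ratio and the standard error of the log odds
    ratio from a confidence interval, assuming it is symmetric on the log scale."""
    log_lo, log_hi = math.log(lower), math.log(upper)
    implied_or = math.exp((log_lo + log_hi) / 2)  # geometric mean of the limits
    std_err = (log_hi - log_lo) / (2 * z)
    return implied_or, std_err


language_or, language_se = wald_summary(3.81, 12.29)
gender_or, gender_se = wald_summary(1.71, 2.39)
print(round(language_or, 2), round(gender_or, 2))
```

Under that assumption, the native-language interval implies an odds ratio of about 6.84, in line with the reported 6.85, and the gender interval, which is quoted without a point estimate, implies an odds ratio of about 2.02, consistent with "more than twice".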
To further investigate the role of gender on CSA status, the performance of males and females on various assessment components was contrasted (see Table 1). Across all CSA components/composites, female candidates outperformed males. On interpersonal skills, females had markedly better rapport and personal manner. For the ICE elements, data gathering and patient note scores were approximately 0.40 standard deviations higher for female candidates than for males.
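One way to read a difference of about 0.40 standard deviations is as the probability that a randomly chosen candidate from the higher-scoring group outscores a randomly chosen candidate from the other group. The sketch below is an interpretive aid only: it assumes normally distributed scores with equal variance in both groups, which the text does not claim, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def probability_of_superiority(d):
    """P(a score from the higher group exceeds a score from the lower group),
    assuming two normal distributions with equal variance whose means differ
    by d standard deviations."""
    return NormalDist().cdf(d / sqrt(2))


p = probability_of_superiority(0.40)
print(round(p, 2))
```

Under these assumptions, a 0.40 standard deviation gap corresponds to a probability of roughly 0.61, compared with 0.50 when the groups do not differ.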
There are many factors that may increase one's likelihood of passing a test. These include motivation, test taking strategies, preparation, experience, and, most importantly, ability in the domain being assessed. For many examinations, certain cohorts are more likely to pass and, more often than not, this is related to ability. If this were not true, the validity of the examination would be compromised. Therefore, it is extremely important to investigate the relationships between examinee characteristics, including previous performance on related ability measures, and test performance. The strength of these relationships, provided they can be explained in a meaningful fashion, can provide valuable information to support the use of the test scores.
For the ICE portion of the CSA exam, gender emerged as the most highly predictive factor of success. Females did better on both component parts of the ICE, data gathering (DG) and the patient note (PN). This finding is interesting for a number of reasons. Considering that DG is scored by SPs completing case-specific checklists, females are asking more of the relevant history-taking questions and correctly performing more physical exam maneuvers. The DG ability of females is likely related to their rapport and personal manner. As noted in the literature,6,13,14 females generally score better on measures of interpersonal skills within a clinical encounter than do males. Patients are more likely to be forthcoming with information to candidates with whom they feel most comfortable. These findings regarding higher DG scores would also support the higher PN ratings achieved by women. Candidates who gathered more information from patients would be more likely to be able to write a comprehensive note. Additionally, handwriting could play a role in higher female PN scores, as legibility is considered by raters when assigning a score. Females have been shown to write faster and smaller and make fewer penmanship mistakes than males.15
In addition to gender, USMLE Step 2 scores emerged as a significant predictor of ICE pass/fail status. This would be expected in that clinical science knowledge is required for effective interviewing and summarization of clinical findings. Finally, higher TOEFL scores were an indicator of ICE success. Candidates with greater English language ability would be expected to be able to gather more information and write better clinical notes than those with lower language ability.
These three predictors of ICE success (gender, USMLE Step 2 score, and TOEFL score) were also predictors of success on the Doctor-Patient Communication portion of the CSA. However, as expected, native language emerged as having the highest predictive value. Candidates with English as their first language achieved much higher spoken English proficiency ratings, providing additional validity evidence for the SPs' ratings. English as a native language was also related to interpersonal skills. As expected, native English-speaking doctors were better able to collect information, counsel and deliver information, and establish rapport with patients than candidates with a first language other than English.
In addition to native language, gender emerged as a relatively strong predictor of COM status. Based on our performance data, women have a better ability to connect with patients and make them feel at ease in a medical encounter. They may also ask more personal and lifestyle-related questions, and therefore achieve a more egalitarian relationship with their patients. These specific behaviors, including draping, need to be studied in more detail. They would definitely influence IPS ratings, especially the rapport and personal manner dimensions of the CSA IPS rating scale.
Following gender, recent graduation, TOEFL and USMLE Step 2 scores all emerged as predictors of COM status. One would expect that more recently graduated candidates did better on COM due to their experience with patients. However, it is unclear exactly how candidates spent their time between medical school and taking the CSA. A significant amount of time away from patient interactions would certainly be expected to have a negative impact on a candidate's ability to communicate in the medical environment. Based on language and clinical science proficiency, candidates with higher TOEFL and USMLE Step 2 scores would also logically be expected to perform better on the COM portion of the CSA.
Additional studies need to be conducted to better understand the relationship between candidate background characteristics and CSA performance. First, it would be informative to gather more data that could be used to explain female performance on the ICE. Are females asking more history questions and performing more physical exam maneuvers overall, or just more of the appropriate ones credited on the checklist? Also, female candidates may spend more time in the encounters and therefore have the opportunity to be credited for more checklist items. Finally, SPs are taught to credit a candidate with an item if the SP is not asked the question but accidentally volunteers the information. If male and female SPs volunteer information differentially based on candidate gender, some of the performance differences noted could not be explained based on ability.
It would also be informative to investigate the background of female candidates who take CSA. Although their Step exam performance is about equal with males, it would appear that female candidates are somehow better prepared to take the CSA. Additionally, CSA candidates come from all over the world, including countries where the majority of physicians are male. One might expect that female candidates from certain countries would be more motivated than their male counterparts due to the nature of obstacles female physicians need to overcome to advance in the medical training and certification process.
Our study focused on various candidate characteristics and previous assessment results to determine which factors were associated with a physician's success on the CSA. Variables such as native language, recent graduation, clinical science performance and TOEFL scores would be expected to have some relationship with CSA status. These findings provide some support for the validity of CSA pass/fail decisions, and give candidates some prospective information regarding their potential CSA performance. Performance differences by gender emerged as an interesting finding, and further research needs to be conducted to determine the potentially complex reasons why females scored better on every component of the CSA.
1. Swanson DB, Case SM, Ripkey D, et al. Performance of examinees from foreign schools on the basic science component of United States Medical Licensing Examination. Scherpbier A, van der Vleuten C, Rethans J, van der Steeg A (eds). Proceedings from the Seventh Ottawa Conference on Medical Education and Assessment, The Netherlands, Kluwer Academic Publishers. 187–90. 1997.
2. Ripkey D, Case SM, Swanson DB, et al. Performance of examinees from foreign schools on the clinical science component of the United States Medical Licensing Examination. Scherpbier A, van der Vleuten C, Rethans J, van der Steeg A (eds). Proceedings from the Seventh Ottawa Conference on Medical Education and Assessment, The Netherlands, Kluwer Academic Publishers. 175–8. 1997.
3. Case SM, Swanson DB, Ripkey D, et al. Preliminary descriptive analyses of the performance of US citizens attending foreign medical schools on USMLE Steps 1 and 2. Scherpbier A, van der Vleuten C, Rethans J, van der Steeg A (eds). Proceedings from the Seventh Ottawa Conference on Medical Education and Assessment, The Netherlands, Kluwer Academic Publishers. 135–8. 1997.
4. Swanson DB, Ripkey DR, Case SM. Relationship between achievement in basic science coursework and performance on 1994 USMLE Step 1. 1994-95 Validity Study Group for USMLE Step 1/2 Pass/Fail Standards. Acad Med. 1996;71(1 Suppl):S28–S30.
5. Case SM, Ripkey DR, Swanson DB. The relationship between clinical science performance in 20 medical schools and performance on Step 2 of the USMLE licensing examination. 1994-95 Validity Study Group for USMLE Step 1 and 2 Pass/Fail Standards. Acad Med. 1996;71(1 Suppl):S31–S33.
6. Rutala PJ, Witzke DB, Leko EO, et al. The influences of student and standardized patient genders on scoring in an objective structured clinical examination. Acad Med. 1991;66(9 Suppl):S28–S30.
7. Rothman AI, Cohen R, Ross J, et al. Station gender bias in a multiple-station test of clinical skills. Acad Med. 1995;70:42–6.
8. Krueger PM. Do women medical students outperform men in obstetrics and gynecology? Acad Med. 1998;73:101–2.
9. Boulet JR, Ben David MF, Ziv A, et al. Using standardized patients to assess the interpersonal skills of physicians. Acad Med. 1998;73(10 Suppl):S94–S96.
10. Boulet JR, McKinley DW, Whelan GP, et al. Quality assurance methods for performance-based assessments. Adv Health Sci Educ Theory Pract. 2003;8:27–47.
11. Boulet JR, van Zanten M, McKinley DW, et al. Evaluating the spoken English proficiency of graduates of foreign medical schools. Med Educ. 2001;35:767–73.
12. Boulet JR, Rebbecchi T, Denton E, et al. Assessing the written communication skills of medical school graduates. Adv Health Sci Educ Theory Pract (in press).
13. Colliver JA, Vu NV, Marcy ML, et al. Effects of examinee gender, standardized-patient gender, and their interaction on standardized patients' ratings of examinees' interpersonal and communication skills. Acad Med. 1993;68:153–7.
14. Chambers KA, Boulet JR, Furman GE. Are interpersonal skills ratings influenced by gender in a clinical skills assessment using standardized patients? Adv Health Sci Educ Theory Pract. 2001;6:231–41.
15. Ziviani J, Elkins J. An evaluation of handwriting performance. Education Rev. 1984;36:249–61.