Formative and summative assessment of medical students' clinical skills is a critical component of medical student education.1 As such, multiple tools have been developed to assess students' clinical skills with actual patients, with one of the most well studied being the mini-Clinical Evaluation Exercise (mCEX). The mCEX, developed by the American Board of Internal Medicine, is used to provide formative feedback to trainees about their history taking, physical exam, counseling, and interpersonal skills.1–3 Prior research has demonstrated the reliability and the concurrent and predictive validity of mCEX scores when used with internal medicine (IM) residents4,5 and students1–3; however, research on the predictive validity of mCEX scores with medical students has largely focused on correlating mCEX performance with scores on multiple-choice exams, patient write-ups, and summative clinical evaluations of students by their residents and attendings.3 Whether mCEX performance during actual patient encounters correlates with students' short-term and longer-term clinical performance on standardized patient (SP) examinations has not been studied.
In this study, we evaluated the predictive validity of the mCEX by correlating IM core clerkship students' mCEX scores with their subsequent clinical performance on SP examinations. We hypothesized that mCEX performance would have a modest, positive correlation with student performance on two SP exams, one taken at the end of the IM core clerkship and the other at the conclusion of the core clerkship year. Additionally, we hypothesized that there would be a small correlation of competency-specific measures (i.e., history taking, physical exam, interpersonal skills) on the mCEX with the corresponding competencies on the SP exams, and that correlations would increase over the clerkship year as students became more clinically proficient.
After obtaining IRB approval, we examined data for all students taking the eight-week inpatient IM core clinical clerkship at our institution in 2006 and 2007. Clerkships run from January to December and are scheduled in four 12-week blocks. The medicine block combines the eight-week IM rotation with a four-week family medicine clerkship. Students were required to complete eight mCEX cards (four by housestaff, four by attendings) and return them to the course director at the end of the IM rotation; however, card ratings did not influence clerkship grades.
Students were included in the study population if they (1) completed at least four of eight assigned mCEX cards during the IM rotation, (2) took the family medicine/IM (FIM) SP exam on the last day of the block clinical rotation, and (3) took the school of medicine's multidisciplinary SP clinical skills exam (CSE) in the spring after the completion of the core clerkship year. We chose a minimum of four mCEX cards because it is the number needed to document minimal clinical competency.1,3
During an mCEX, students were observed with actual patients by IM housestaff or attendings. The mCEX included five competencies (medical interviewing, physical examination [PE], professionalism, clinical judgment, and counseling) and a separate rating for overall clinical performance. Competencies and overall performance were rated on a nine-point rating scale (1–3 = performs below expectation, 4–6 = performs at expectation, 7–9 = performs above expectation).
The FIM exam is a four-encounter, pass/fail exam given on the last day of the medicine block. Students are required to perform a focused history and physical exam and to counsel a patient in clinical scenarios focused on FIM. Students' grades were determined by their performance on a checklist completed by the SP, a clinical note graded by the clerkship director, and an interpersonal skills score assigned by the SP. The checklists for each case had between 15 and 18 yes/no items indicating whether key elements of the history, physical exam, or counseling were performed. Six interpersonal skills (Eliciting Information, Listening, Giving Information, Respectfulness, Empathy, and Professionalism) were rated on a four-point scale (1 = poor/almost never, 2 = fair/somewhat less, 3 = good/somewhat more, 4 = very good/almost always). An additional interpersonal skills item rated on a four-point scale (1 = not at all, 2 = somewhat, 3 = comfortable, 4 = very comfortable) asked, “If this student were a doctor, how comfortable would you feel referring a family member or friend to him/her?” Students taking the FIM received two final scores: a combined checklist/patient notes score and an interpersonal skills score.
The CSE, a multidisciplinary exam that incorporated cases and clinical skills taught in IM as well as the other core clerkships, was given after the completion of all core clerkships. In 2007 it was a six-encounter exam; in 2008, it had 12 cases. Students taking the CSE were evaluated by the SP with a checklist and interpersonal skills ratings, and a patient note rated by the course director. For both SP exams, the checklist/notes scores were calculated as percentages of total possible points awarded, and students were required to pass both the checklist/notes portion and the interpersonal skills portion to pass the exam.
For each student, we calculated a mean score for each mCEX competency, a mean score for the overall clinical performance rating, and a mean interpersonal skills score (a composite of mean professionalism and counseling scores). We used Pearson correlations to estimate the relationship between the mean overall mCEX score and the overall combined checklist/notes score on the FIM and CSE, as well as the mean interpersonal skills scores on these exams. Correlations were disattenuated to correct for lack of reliability.
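The two calculations described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used in the study: it assumes the correction for unreliability was the standard Spearman correction for attenuation (the observed correlation divided by the square root of the product of the two reliabilities), and the function names `pearson_r` and `disattenuate` are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def disattenuate(r_xy, rel_x, rel_y):
    """Spearman correction for attenuation (assumed here, not confirmed
    by the source): r_xy / sqrt(rel_x * rel_y)."""
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


# Illustrative call: an observed correlation of 0.15 with reliabilities
# of 0.90 and 0.59 for the two measures.
corrected = disattenuate(0.15, 0.90, 0.59)
print(round(corrected, 2))
```

Because both reliabilities are at most 1, the corrected coefficient is never smaller in magnitude than the observed one, and the correction grows as either measure becomes less reliable. Any value produced this way is an illustrative calculation under the stated assumption, not a figure taken from Table 2.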
For the secondary analyses, we correlated specific mCEX competencies with corresponding competencies on the CSE. Ordinary least squares regressions estimated the R² with CSE and FIM checklist/notes and interpersonal scores as dependent variables and mCEX as the independent variable. We assessed whether correlations between the mCEX and CSE varied depending on the time of year that students took the IM clerkship (Block 1 versus 2 versus 3 versus 4).
A total of 310 students took the IM clerkship in 2006 and 2007, and 244 students (79%) met inclusion criteria. Of the 66 excluded students, 8 (12%) completed fewer than four cards; the remainder were nontraditional students (e.g., MD/PhD, MD/MBA) who did not complete the CSE because their clinical training was interrupted.
Table 1 shows that mean scores were greater than seven for all mCEX competencies. Estimated reliability coefficients were 0.90 for the mCEX, 0.59 for the FIM checklist/notes, 0.67 for the CSE checklist/notes, 0.31 for the FIM interpersonal scores, and 0.68 for the CSE interpersonal scores.
The correlation between overall mCEX performance and SP checklist/notes scores was 0.15 (P = .018) with the FIM and 0.21 (P < .001) with the CSE (Table 2). Overall mCEX performance correlated slightly better with SP interpersonal skills scores on both clinical skills exams (Table 2). Disattenuated correlations were slightly higher (Table 2). Regressions of the SP performance measures on the mCEX overall revealed generally low R² values of 0.02 and 0.05 for the FIM and CSE checklist/notes dependent variables and 0.05 and 0.15 for the FIM and CSE interpersonal ratings, respectively.
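As a consistency check on the checklist/notes results, recall that in a regression with a single predictor the coefficient of determination equals the squared Pearson correlation. The worked arithmetic below assumes the regressions used only the overall mCEX score as a predictor, as described in the methods, and uses the rounded correlations reported above.

```latex
R^2 = r^2
\qquad\Rightarrow\qquad
r_{\mathrm{FIM}} = 0.15:\; R^2 = 0.15^2 = 0.0225 \approx 0.02,
\qquad
r_{\mathrm{CSE}} = 0.21:\; R^2 = 0.21^2 = 0.0441
```

The FIM value matches the reported 0.02. The CSE value agrees with the reported 0.05 only if the unrounded correlation was slightly above 0.21 (roughly 0.213 or higher), which is compatible with rounding to two decimals.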
Observed correlations between the mCEX and CSE specific competencies were low and not statistically significant for history taking (r = 0.08, P = .19) and PE (r = 0.03, P = .66); however, the aggregate interpersonal skills scores on the mCEX correlated modestly with CSE interpersonal skills (r = 0.31, P < .0001). Student mCEX scores were significantly higher in Blocks 3 and 4 than in Blocks 1 and 2 (P = .02); however, contrary to our hypothesis, we found somewhat higher correlations between mCEXs completed earlier in the clerkship year and CSE checklist/notes scores (r = 0.26, P = .05; r = 0.45, P = .002; r = 0.07, P = .57; r = 0.12, P = .31; Blocks 1–4, respectively).
Our analysis demonstrates that performance on the mCEX has a small positive correlation with overall future clinical skills performance in both the short term (at the end of the IM clerkship) and the longer term (after the completion of the core clerkship year). Prior studies evaluating tools for direct observation of clinical skills with actual patients across medical specialties have largely focused on correlations with written and oral exams and faculty/resident evaluations of students.6–9 In many studies, the mCEX has been shown to detect differences in trainee performance.1,3,4,10 In one study, residents' mCEX performance was related to performance on a high-stakes exam that included an SP component,10 and faculty ratings of a medical student with an SP using the mCEX have been shown to correlate with the SP's ratings of that same encounter.11 However, few studies,8 and none in IM, have explored the relationship between scores on tools used to observe trainees with actual patients and future performance on SP/OSCE exams.
There are several possible explanations for the generally low magnitude of the observed correlations and the absence of the competency-specific correlations that we predicted. One explanation is that attendings and residents are not formally trained on criteria that must be met for a student to receive a certain mCEX score. This lack of training may contribute to inconsistency in how ratings are assigned. It is also reasonable to expect that rating biases, such as the halo effect, may exist. Although each mCEX is intended as an evaluation of a student during a single patient encounter, assessment of student performance by an attending or resident who has a longer-term relationship with a student may be influenced by factors (e.g., work ethic, interpersonal skills) other than the interaction he or she has with that student at the time of the mCEX. This contrasts with a true single interaction as evaluated by each SP during an SP exam. If mCEX evaluators are influenced by factors other than competency-specific factors (e.g., interpersonal skills while evaluating physical exam skills), then this may also help explain the higher observed correlation of mCEX scores with interpersonal skills scores on the SP exams. There is a relative paucity of research on best approaches to train raters to use direct observation tools, and more research in the area is needed.
Another difference between the mCEX and the SP exams is that the former is intended as a tool for formative feedback and the latter are higher-stakes tools that impact grades. These different stakes could affect student performance and limit the observed correlations; however, our experience has been that students still try to perform well on the mCEX because attendings and residents who ultimately submit summative evaluations are assessing them.
The low correlations might also reflect the restricted range of scores on the mCEX, which places boundaries on the magnitude of correlations that might be observed, and the relatively short FIM and 2007 CSE exams.12
Another limitation of our study is the increase in the number of cases on the CSE from 2007 to 2008. Despite this change, we chose to combine the data for the two CSE exams because it increased the power of the study. In addition, we viewed each case as a sample of representative items from the domain of general clinical skills for students who had completed the core clerkships. In 2008 we replaced two paper-based cases (not included in the 2007 CSE analysis) with additional case-based stations to improve test characteristics and test a wider range of content; however, the format of the case-based stations and all checklists remained the same.
Our finding that correlations between mCEX and CSE scores were generally higher for Block 1 and 2 students is intriguing, especially because all students took the CSE at the same time, after completing the entire clerkship year. Why mCEX scores better discriminated among students earlier in their clinical training merits additional exploration.
Finally, although it is helpful to know that some predictive validity exists, we acknowledge that critical analysis of a feedback tool, such as the mCEX, should also focus on whether the feedback provided affects outcome-based results, such that all students can improve their future clinical performance based on the feedback they receive during an mCEX. If this is occurring, then it may explain why our observed correlations between mCEX and SP exam scores were not higher.
This study provides additional validity evidence for the mCEX. The mCEX can be useful for feedback; however, to maximize its utility, raters should be better trained to use it so that scores and associated feedback are accurate assessments of performance, thereby giving students the opportunity to improve their clinical skills.
Future research should address whether the observed correlations exist with a larger sample size of students from multiple institutions and whether feedback provided by the mCEX actually impacts students' clinical skills.
1 Norcini JJ, Blank LL, Arnold GK, Kimball HR. The mini-CEX (clinical evaluation exercise): A preliminary investigation. Ann Intern Med. 1995;123:795–799.
2 Hauer KE. Enhancing feedback to students using the mini-CEX. Acad Med. 2000;75:542.
3 Kogan JR, Bellini LM, Shea JA. Feasibility, reliability, and validity of the mini-clinical evaluation exercise (mCEX) in a medicine core clerkship. Acad Med. 2003;78(10 suppl):S33–S35.
4 Durning SJ, Cation LJ, Markert RJ, Pangaro LN. Assessing the reliability and validity of the mini-clinical evaluation exercise for internal medicine residency training. Acad Med. 2002;77:900–904.
5 Holmboe ES, Huot S, Chung J, Norcini J, Hawkins RE. Construct validity of the mini clinical evaluation exercise (miniCEX). Acad Med. 2003;78:826–830.
6 Hamdy H, Prasad K, Williams R, Salih FA. Reliability and validity of the direct observation clinical encounter examination (DOCEE). Med Educ. 2003;37:205–212.
7 Price J, Byrne JA. The direct clinical examination: An alternative method for the assessment of clinical psychiatry skills in undergraduate medical students. Med Educ. 1994;28:120–125.
8 Richards ML, Paukert JL, Downing SM, Bordage G. Reliability and usefulness of clinical encounter cards for a third-year surgical clerkship. J Surg Res. 2007;140:139–148.
9 Dunnington G, Reisner L, Witzke D, Fulginiti J. Structured single-observer methods of evaluation for the assessment of ward performance on the surgical clerkship. Am J Surg. 1990;159:423–426.
10 Hatala R, Ainslie M, Kassen BO, Mackie I, Roberts JM. Assessing the mini-clinical evaluation exercise in comparison to a national specialty examination. Med Educ. 2006;40:950–956.
11 Boulet JR, McKinley DW, Norcini JJ, Whelan GP. Assessing the comparability of standardized patient and physician evaluations of clinical skills. Adv Health Sci Educ. 2002;7:85–97.
12 Guilford JP, Fruchter B. Fundamental Statistics in Psychology and Education. 6th ed. New York, NY: McGraw Hill; 1978.