The evaluation of the clinical competence of medical students and residents is best achieved through direct observation. However, there has been inconsistency in how best to measure and compare performance across clinical skill domains. In response to the problems of the longer clinical evaluation exercise, in 1995 the American Board of Internal Medicine proposed the mini-clinical evaluation exercise (mini-CEX) to evaluate residents as they complete a patient history and physical examination and then demonstrate organized clinical judgment and efficient counseling skills.1
The mini-CEX is a seven-item global rating scale designed to evaluate medical students’ and residents’ patient encounters in about 15 to 20 minutes. It is specifically designed to assess the skills that residents require in actual patient encounters and to reflect the educational requirements that attending physicians expect during teaching rounds.2,3 As described by Norcini et al,1 using the mini-CEX repeatedly with each trainee samples performance across a greater variety of patient encounters, which yields a more reliable and valid measure of clinical skill practice and development. It is a performance-based evaluation method used to assess selected clinical competencies (e.g., patient interview and physical examination, communication and interpersonal skills) within a medical training context.
Although the mini-CEX continues to be widely used in a broad range of clinical settings, there are concerns about the reliability and validity of this instrument for evaluating medical students’ and residents’ clinical performance. In a recent literature review of in-training assessment using direct observation of single-patient encounters, Pelgrim et al4 acknowledged the mini-CEX as one of the best-supported instruments but stated that more evidence of construct validity is needed. For example, Alves de Lima et al5 computed a range of mean scores on the mini-CEX items for a sample of 108 first- to fourth-year cardiology residents, showing modest increases across the years of training. Kogan et al6 reported correlation coefficients of r = 0.17 to 0.43 with a sample of medical students (n = 162) when comparing their mean mini-CEX scores with other written and clinical performance measures. In addition, Hatala et al7 computed correlation coefficients ranging from r = 0.29 to 0.60 between residents’ mean mini-CEX scores and their results on certifying internal medicine oral, bedside, and written exams (n = 162). Because the mini-CEX has been used in a variety of contexts, and because the research has focused either on scores across program years or on comparisons with other assessment methods, the validity of this clinical evaluation instrument needs further exploration.
The main purpose of our study, therefore, was to empirically integrate all published data on the use of the mini-CEX to assess medical students’ or residents’ clinical skills, either by contrasting groups of trainees or by comparing mini-CEX scores with those participants’ results on other clinical measures at various training levels. In the present study, we conducted a meta-analysis of the construct and criterion (predictive or concurrent) validity of the mini-CEX, reporting summary effect sizes with their confidence intervals (CIs) and interpreting the magnitude of these coefficients.
Selection of studies
In this study, we followed the Meta-analysis Of Observational Studies in Epidemiology (MOOSE) guidelines, published in JAMA, for reporting meta-analyses of observational studies.8 We searched the MEDLINE, PsycINFO, EMBASE, and CINAHL databases, each from January 1995 to January 2012. In addition, we reviewed the reference lists of all articles that used or referenced the mini-CEX to ensure that all relevant publications were identified.
To be included, a primary study had to meet the following criteria: (1) it used the original seven-item version of the mini-CEX; (2) it reported empirical findings on the use of the mini-CEX in relation to medical students’ or residents’ clinical performance; (3) when applicable, it employed psychometrically sound criterion measures (e.g., standardized instruments, summative in-training evaluations, objectively scored observational ratings); and (4) it was published in a peer-reviewed journal. We restricted the search to refereed journals to favor the inclusion of high-quality studies. Conversely, studies were excluded if (1) the article focused only on a generalizability analysis or on the internal structure of the mini-CEX,9 (2) the article reviewed the use of the mini-CEX without providing any new empirical data,10 or (3) the analysis focused on differences in rater stringency without reporting actual student performance outcomes.11,12
Our initial literature review and database search yielded 31 peer-reviewed journal articles; 11 met all the inclusion criteria, and the other 20 did not (e.g., review articles without new data, a focus on rater stringency or training, an emphasis on factor analysis or generalizability of the mini-CEX). We developed a coding protocol that recorded each study’s title, author(s), year of publication, source of publication, study design (i.e., construct or criterion validity study), mini-CEX measures reported (i.e., a single item, all item domains, total mean score), student category (i.e., medical students, residents), program specialty, and types of raters (i.e., faculty, residents, consultants). The following moderator variables were coded when available: sex, age, race, ethnicity, and location of the medical school or residency program. All 11 articles were coded independently by two of us (A.A. and S.K.), and any discrepancies (e.g., in effect size calculations) were reviewed by a third author (T.D.). Through iterative reviews and discussions, we achieved 100% agreement on all coded data.
We performed all effect size calculations with the Comprehensive Meta-Analysis software program (version 1.0.23; Biostat Inc., Englewood, New Jersey). Depending on the data reported in each primary study, we used either the Pearson product–moment correlation coefficient (r) or the standardized mean difference (Cohen d) as the effect size measure. The variables were individual mini-CEX items or the total mean score, which we either contrasted between groups (e.g., postgraduate or in-training year) or compared with other clinical skill measures (e.g., in-training evaluation reports). For 5 (45%) of the 11 included studies, a correlation coefficient between mini-CEX scores and performance on another outcome was reported in the results section (see Table 1). For 7 (64%) of the studies, we calculated d from other reported data (from Kogan and colleagues’6 study, we were able to extract both r and d).
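Calculating d "from other reported data" usually means working from group means, standard deviations, and sample sizes. The sketch below shows one standard way to do this, using the pooled within-group SD and the large-sample variance of d given by Borenstein et al;15 it is an illustration under those assumptions, not a reproduction of the software's internal routines. The function names and the rating values in the usage note are hypothetical.

```python
import math
from typing import Tuple


def cohen_d(mean_1: float, sd_1: float, n_1: int,
            mean_2: float, sd_2: float, n_2: int) -> float:
    """Standardized mean difference using the pooled within-group SD."""
    pooled_var = ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / (n_1 + n_2 - 2)
    return (mean_1 - mean_2) / math.sqrt(pooled_var)


def cohen_d_variance(d: float, n_1: int, n_2: int) -> float:
    """Approximate sampling variance of d (large-sample formula)."""
    return (n_1 + n_2) / (n_1 * n_2) + d ** 2 / (2 * (n_1 + n_2))


def d_with_ci(mean_1: float, sd_1: float, n_1: int,
              mean_2: float, sd_2: float, n_2: int,
              z: float = 1.96) -> Tuple[float, Tuple[float, float]]:
    """Return d and its (symmetric) 95% confidence interval."""
    d = cohen_d(mean_1, sd_1, n_1, mean_2, sd_2, n_2)
    se = math.sqrt(cohen_d_variance(d, n_1, n_2))
    return d, (d - z * se, d + z * se)
```

For example, hypothetical mean item ratings of 6.8 (SD 1.0, n = 40) for second-year residents and 6.3 (SD 1.0, n = 40) for first-year residents give d = 0.50 with a CI of roughly 0.05 to 0.95. Because this CI is symmetric around d, it can be used as a quick plausibility check on reported intervals.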
Because we combined results from studies with different research designs (e.g., different years of residency training, resident versus faculty ratings of medical students) and different comparisons (e.g., mini-CEX versus in-training evaluation reports, oral exams, bedside assessments), we used a random-effects model to combine the effect sizes. A fixed-effect model assumes that all studies share one true effect size (e.g., because of the consistent use of the mini-CEX instrument). The random-effects model instead allows the true effect to vary from study to study and incorporates this between-study variance, which yields a more conservative estimate of the combined effect.13 We produced forest plots with Cochran Q tests for heterogeneity of effect sizes, but a nonsignificant Q may reflect the low power of the test (e.g., when few or small studies are combined) rather than actual consistency or homogeneity across the studies included in the meta-analysis.14,15 Consequently, visual review of the dispersion of the studies in the forest plots was an important indicator of consistency between studies. We interpreted the magnitude of both mean differences and correlations according to Cohen’s16 conventions: r = 0.10 to 0.29 and d = 0.20 to 0.49 are “small,” r = 0.30 to 0.49 and d = 0.50 to 0.79 are “medium,” and r ≥ 0.50 and d ≥ 0.80 are “large” effect sizes.
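The random-effects pooling and heterogeneity test described above can be sketched with the DerSimonian and Laird13 method-of-moments estimator, followed by a helper that applies Cohen's16 magnitude labels. This is a minimal sketch assuming inverse-variance weights; the Comprehensive Meta-Analysis program may differ in implementation details, and the function names and example inputs are hypothetical.

```python
import math
from typing import Dict, Sequence


def pool_random_effects(effects: Sequence[float], variances: Sequence[float],
                        z: float = 1.96) -> Dict[str, object]:
    """DerSimonian-Laird random-effects pooling with Cochran's Q."""
    k = len(effects)
    if k < 2:
        raise ValueError("need at least two studies")
    w = [1.0 / v for v in variances]                 # fixed-effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)                    # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]   # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return {"fixed": fixed, "q": q, "df": df, "tau2": tau2,
            "pooled": pooled, "se": se, "ci": (pooled - z * se, pooled + z * se)}


def cohen_label(value: float, metric: str) -> str:
    """Cohen's conventions: r 0.10/0.30/0.50, d 0.20/0.50/0.80."""
    small, medium, large = (0.10, 0.30, 0.50) if metric == "r" else (0.20, 0.50, 0.80)
    size = abs(value)
    if size >= large:
        return "large"
    if size >= medium:
        return "medium"
    if size >= small:
        return "small"
    return "negligible"
```

With three hypothetical studies (d = 0.2, 0.5, 0.8, each with variance 0.04), Q = 4.5 on 2 degrees of freedom and tau² = 0.05. The pooled d stays at 0.5, but its standard error grows from about 0.115 (fixed effect) to about 0.173, which is why the random-effects CI is the more conservative one. A P value for Q would come from the chi-square distribution with k − 1 degrees of freedom.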
The 11 studies included in the meta-analysis fall into four groups (see Table 1): contrasts between trainees one year apart within a residency program (Group A), differences between performance levels within a peer group (Group B), rating differences between faculty and residents (Group C), and comparisons between the mini-CEX and other measures of achievement or performance (Group D). Table 1 also gives the reported mini-CEX domain measure (i.e., items 1–7 or the total mean score) and the corresponding unweighted effect sizes for the contrast or comparison variables. The included studies illustrate different approaches to testing the validity of the mini-CEX. Groups A, B, and C test the construct validity of the mini-CEX items or mean scores by showing that trainees at more advanced training or performance levels, or those rated by residents rather than faculty, tend to obtain higher clinical skills performance scores. Group D tests criterion validity by comparing the mini-CEX with other assessments of clinical performance as either a concurrent or a predictive measure.
The sample sizes of the studies ranged from 9 residents17 to 244 medical students,18 who were assessed with as few as 2.3 completed forms5 and as many as 38 completed forms17 per individual (in Holmboe and colleagues’17 study, 38 faculty members each viewed nine videotaped resident performances at the poor, marginal, and superior levels). In our meta-analysis, we treated medical students and residents equally, as learners at different stages of clinical skills development. We therefore evaluated these trainees’ performance as a function of their ability to conduct an appropriate medical interview, physical examination, and so on. Medical students and residents are trained at the bedside or on the ward through a similar teaching and learning process, in which performance of these skills reflects the clinical competencies expected of all practicing clinicians. Specific demographic characteristics such as trainees’ sex or age were not reported, but level of training and residency program were typically identified. For each study, the unweighted mean effect size (Cohen d) or Pearson product–moment correlation (rUWM) is provided between the mini-CEX item or mean score and either a contrasting variable (e.g., in-training level) or a comparison measure (e.g., in-training evaluation reports).
Construct validity of the mini-CEX
Of the 11 studies that reported data on medical students’ or residents’ clinical skills performance, 7 (64%) reported results that support the construct validity of the mini-CEX. As shown in Table 2, combining 4 of the studies (Group A) showed that, for each of the seven mini-CEX items, the effect size differences in performance across a single year of residency training (e.g., the change in ratings from year 1 to year 2, year 2 to year 3, etc.) ranged from d = 0.25 (95% CI, 0.04–0.46) for humanistic qualities/professionalism to d = 0.50 (95% CI, 0.31–0.70) for overall clinical competence. As illustrated in the forest plot (Chart 1), the combined fixed-effect and random-effects sizes for the four Group A studies (and seven outcome measures) on the overall clinical competence item were both “medium,” d = 0.50 (95% CI, 0.31–0.70).
When differences between performance levels within a peer group (superior/honors, marginal/high pass, poor/pass) were investigated, two studies (Group B) showed mean differences in clinical performance on three mini-CEX items and on the total mean score. Holmboe et al17 compared ratings on the medical interviewing, physical examination, and counseling skills items and found consistently large effect size differences between the mean ratings of poor, marginal, and superior second-year residents, ranging from d = 0.90 between superior and marginal residents on the medical interviewing item to d = 4.00 between superior and poor residents on the physical examination skills item. Similarly, Kogan et al6 found differences in mean mini-CEX scores that ranged from d = 0.04 between high-pass and pass-level medical students in an inpatient setting to d = 1.00 between honors and pass-level medical students in an outpatient setting.
In Group C, we combined the outcomes from two studies that investigated mean differences between residents’ and faculty members’ ratings of medical students. As shown in Table 2, faculty ratings of medical students were consistently more stringent than residents’ ratings across all seven mini-CEX items. As illustrated by the forest plot in Chart 1, residents were more lenient in their mean ratings of medical students’ performance on the overall clinical competence item, with a combined “small” random-effects size difference of d = 0.38 (95% CI, 0.15–0.62).
Criterion (predictive/concurrent) validity of the mini-CEX
Of the 11 studies included in the meta-analysis, 5 reported data (Table 3) relating medical students’ or residents’ mini-CEX item or mean score ratings to some other criterion measure (e.g., in-training evaluation reports, inpatient or outpatient write-ups, examination checklists). Although the combined correlations were “medium” or larger for each of the seven items on the mini-CEX, the total mean score yielded a combined “small” correlation, r = 0.26 (95% CI, 0.16–0.35). As shown in the forest plot (Chart 1), the combined random-effects correlation for the overall clinical competence item was “large,” r = 0.64 (95% CI, 0.48–0.77).
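Correlations are conventionally combined on the Fisher z scale, where the sampling variance of z is approximately 1/(n − 3); back-transforming to r makes the CI asymmetric around the point estimate, as in the intervals reported above. The sketch below assumes this standard approach and shows only the fixed-effect (inverse-variance) combination for brevity; a random-effects analysis would add a between-study variance term to each study's variance before weighting. The function names and inputs are hypothetical.

```python
import math
from typing import Sequence, Tuple


def r_confidence_interval(r: float, n: int, z_crit: float = 1.96) -> Tuple[float, float]:
    """95% CI for a Pearson r via the Fisher z transformation."""
    z = math.atanh(r)              # Fisher z = 0.5 * ln((1 + r) / (1 - r))
    se = 1.0 / math.sqrt(n - 3)    # standard error of z
    # Symmetric in z, asymmetric once back-transformed to r
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def pool_r_fixed(rs: Sequence[float], ns: Sequence[int]) -> float:
    """Inverse-variance pooled r: each study's z is weighted by n - 3."""
    weights = [n - 3 for n in ns]
    z_bar = sum(w * math.atanh(r) for w, r in zip(weights, rs)) / sum(weights)
    return math.tanh(z_bar)
```

For a hypothetical study with r = 0.50 and n = 103, the interval is approximately 0.34 to 0.63: the lower arm (0.16) is longer than the upper arm (0.13), which is the expected shape for correlation CIs.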
Although the Cochran Q test showed significant heterogeneity among the studies in Groups A, C, and D for medical students’ and residents’ mini-CEX ratings, the information reported in the primary studies was insufficient to analyze potential moderator variables (e.g., sex, year of program). Nevertheless, each study was weighted by the inverse of its variance, which largely reflects its sample size, and the random-effects model (with its wider 95% CIs) provides a more conservative estimate of the combined effect sizes, including that for the overall clinical competence item illustrated in the forest plot.
Discussion and Conclusion
Our study provided four major findings.
- The mini-CEX has evidence of construct validity when used with residents across the years of a residency program. Residents’ performance on the mini-CEX items across one year of residency training showed “small” to “medium” effect size differences, with the effect sizes ranging from d = 0.25 (95% CI, 0.04–0.46) to d = 0.50 (95% CI, 0.31–0.70). When performance on items across more than one year of residency training was reviewed, however, the unweighted mean effect size differences were found to be even greater (e.g., d = 3.80 between superior/poor-performing residents on the counseling skills item).
- The effect size differences between performance levels within a peer group (superior/honors, marginal/high pass, poor/pass) ranged from d = 0.43 (95% CI, 0.23–0.63) on the total mean score of the mini-CEX in one study up to d = 1.86 (95% CI, 0.31–3.40) on the physical examination skills item. Although these results come from only two studies, they draw on 3 and 12 outcome measures, respectively, from those studies.
- Differences between residents’ and faculty members’ mini-CEX ratings of medical students showed “small” to “medium” effect sizes, ranging from d = 0.23 (95% CI, 0.04–0.50) on the clinical judgment item to d = 0.50 (95% CI, 0.34–0.65) on the counseling skills item. These results support other findings that, in comparison with faculty evaluators, residents tend to be more lenient and score medical students higher on in-training evaluation checklists.19,20
- The mini-CEX shows evidence of criterion-related validity when compared with other clinical skill achievement (e.g., certifying oral and written examinations) or performance (e.g., in-training evaluation reports, inpatient or outpatient write-ups) measures. We found “small” to “large” correlation coefficients with combined effect sizes ranging from r = 0.26 (95% CI, 0.16–0.35) on the mean score of the mini-CEX to r = 0.64 (95% CI, 0.48–0.77) on the overall clinical competence item.
The construct- and criterion-related validity of the mini-CEX are supported by the findings of the studies included in one or more of the four group comparisons. As shown in Tables 2 and 3, the outcomes of the studies in Groups A, C, and D yielded significant combined effect sizes for each of the mini-CEX items and the total mean score. As illustrated in the forest plot for the overall clinical competence item, not all reported outcomes on differences across a single year of residency training (Group A) were statistically significant. When the outcomes from the four studies were combined, however, the total random-effects size was significant, d = 0.50 (95% CI, 0.31–0.70). In support of the mini-CEX as a criterion-related measure, we also found a “large” combined random-effects correlation with other measures of clinical skills performance, r = 0.64 (95% CI, 0.48–0.77).
There are limitations to this meta-analysis. Its quality depends on the quality of the primary studies, which we selected on the basis of our inclusion criteria. Because we were interested in the construct- and criterion-related validity of the mini-CEX as a direct observation of clinical skill development, the research designs of the included studies varied in who or what was assessed (e.g., medical student, resident, performance-level ranking), which evaluators were used (i.e., residents, faculty, specialists, consultants), and whether the mini-CEX was compared with other clinical skill measures (e.g., in-training evaluation reports, inpatient or outpatient write-ups, certifying exams). To address this concern, we used the more conservative (weighted) random-effects analysis and tested for heterogeneity between studies with the Cochran Q test. Although some of the studies had small sample sizes,17,21 such as 9 and 10 participants, this was partly offset by the 38 and 10 mini-CEX forms, respectively, completed for each participant in these studies. To exercise additional control over the quality of the included studies, we selected only articles published in refereed journals. Analyses of potential moderator variables (e.g., sex, age, residency discipline) were not possible because many of the included studies did not report such data.
The findings of this meta-analysis on the construct- and criterion-related validity of the mini-CEX show consistent, significant combined effect sizes, mostly in the “small” to “medium” range, for both the mini-CEX items and the total mean score. The introduction and use of the mini-CEX as a standard instrument for assessing medical students’ and residents’ clinical skill performance across a number of domains (e.g., medical interviewing, physical examination, counseling skills) are an important advance in the use and recognition of direct observation as a method of clinical evaluation. The mini-CEX has been adopted and used extensively to assess clinical skill performance in medical schools and residency programs in Canada, the United States, Europe, and other countries.
The mini-CEX has the potential to be used frequently, even daily; however, its use is constrained by the discipline (e.g., it suits general and specialty programs in which trainees encounter patients regularly) and by the purpose of the evaluation (e.g., whether it meets program expectations for assessing trainees’ clinical competencies). Consequently, in some disciplines (e.g., anesthesiology, emergency medicine, pathology, radiology), the mini-CEX may simply not be appropriate in many cases. Although the number of mini-CEX forms completed for a trainee will vary from program to program, it has been suggested that only six encounters are required to obtain a generalizability coefficient of 0.88.22
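The link between the number of encounters and reliability can be illustrated with the Spearman–Brown prophecy formula, R_k = k·r₁ / (1 + (k − 1)·r₁), where r₁ is the reliability of a single encounter. Treating the reported generalizability coefficient this way is a simplification that assumes encounters are the only facet of measurement error; the single-encounter value of 0.55 below is derived arithmetic, not a figure reported in the cited study. The function names are hypothetical.

```python
import math


def spearman_brown(single_reliability: float, k: float) -> float:
    """Projected reliability of the mean of k parallel encounters."""
    return k * single_reliability / (1 + (k - 1) * single_reliability)


def single_from_composite(composite: float, k: float) -> float:
    """Invert Spearman-Brown: single-encounter reliability implied by a k-encounter value."""
    return composite / (k - (k - 1) * composite)


def encounters_needed(single_reliability: float, target: float) -> int:
    """Smallest whole number of encounters whose projected reliability reaches the target."""
    k = target * (1 - single_reliability) / (single_reliability * (1 - target))
    return math.ceil(k - 1e-9)   # small tolerance guards against float round-off
```

Under these assumptions, a coefficient of 0.88 from six encounters implies a single-encounter reliability of 0.88 / (6 − 5 × 0.88) = 0.55, and the same value would need only four encounters to exceed 0.80. This shows why repeated observations, rather than a single form, drive the reliability of mini-CEX scores.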
Medical educators face the challenge of evaluating medical students’ and residents’ clinical skills during training, on the basis of patient encounters, in a manner that is feasible, reliable, and valid. To date, the mini-CEX has proved to be a useful in-training assessment measure with clear evidence of construct- and criterion-related validity. Even so, it should not be the only measure used to assess trainees’ clinical skill development; other reliable and valid methods should be used in conjunction with the mini-CEX to overcome the limitations of relying on a single measure.
Other disclosures: None.
Ethical approval: Not applicable.
1. Norcini JJ, Blank LL, Duffy FD, Fortna GS. The mini-CEX: A method for assessing clinical skills. Ann Intern Med. 2003;138:476–481
2. Norcini JJ, Blank LL, Arnold GK, Kimball HR. The mini-CEX (clinical evaluation exercise): A preliminary investigation. Ann Intern Med. 1995;123:795–799
3. Norcini JJ, Blank LL, Arnold GK, Kimball HR. Examiner differences in the mini-CEX. Adv Health Sci Educ Theory Pract. 1997;2:27–33
4. Pelgrim EAM, Kramer AWM, Mokkink HGA, van den Elsen L, Grol RPTM, van der Vleuten CPM. In-training assessment using direct observation of single-patient encounters: A literature review. Adv Health Sci Educ Theory Pract. 2011;16:189–199
5. Alves de Lima A, Barrero C, Baratta S, et al. Validity, reliability, feasibility and satisfaction of the mini-clinical evaluation exercise (mini-CEX) for cardiology residency training. Med Teach. 2007;29:785–790
6. Kogan JR, Bellini LM, Shea JA. Feasibility, reliability, and validity of the mini-clinical evaluation exercise (mCEX) in a medicine core clerkship. Acad Med. 2003;78(10 suppl):S33–S35
7. Hatala R, Ainslie M, Kassen BO, Mackie I, Roberts JM. Assessing the mini-clinical evaluation exercise in comparison to a national specialty examination. Med Educ. 2006;40:950–956
8. Stroup DF, Berlin JA, Morton SC, et al. Meta-analysis of observational studies in epidemiology: A proposal for reporting. Meta-analysis Of Observational Studies in Epidemiology (MOOSE) group. JAMA. 2000;283:2008–2012
9. Cook DA, Beckman TJ, Mandrekar JN, Pankratz VS. Internal structure of mini-CEX scores for internal medicine residents: Factor analysis and generalizability. Adv Health Sci Educ Theory Pract. 2010;15:633–645
10. Hawkins RE, Margolis MJ, Durning SJ, Norcini JJ. Constructing a validity argument for the mini-clinical evaluation exercise: A review of the research. Acad Med. 2010;85:1453–1461
11. Kogan JR, Hess BJ, Conforti LN, Holmboe ES. What drives faculty ratings of residents’ clinical skills? The impact of faculty’s own clinical skills. Acad Med. 2010;85(10 suppl):S25–S28
12. Margolis MJ, Clauser BE, Cuddy MM, et al. Use of the mini-clinical evaluation exercise to rate examinee performance on a multiple-station clinical skills examination: A validity study. Acad Med. 2006;81(10 suppl):S56–S60
13. DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials. 1986;7:177–188
14. Cooper H, Hedges LV, Valentine JC, eds. The Handbook of Research Synthesis and Meta-Analysis. 2nd ed. New York, NY: Russell Sage Foundation; 2009
15. Borenstein M, Hedges LV, Higgins JPT, Rothstein HR. Introduction to Meta-Analysis. West Sussex, UK: John Wiley & Sons; 2009
16. Cohen J. Statistical Power Analysis for the Behavioral Sciences. Hillsdale, NJ: Erlbaum; 1988
17. Holmboe ES, Huot S, Chung J, Norcini J, Hawkins RE. Construct validity of the miniclinical evaluation exercise (miniCEX). Acad Med. 2003;78:826–830
18. Ney EM, Shea JA, Kogan JR. Predictive validity of the mini-clinical evaluation exercise (mCEX): Do medical students’ mCEX ratings correlate with future clinical exam performance? Acad Med. 2009;84(10 suppl):S17–S20
19. Hatala R, Norman GR. In-training evaluation during an internal medicine clerkship. Acad Med. 1999;74(10 suppl):S118–S120
20. Hill F, Kendall K, Galbraith K, Crossley J. Implementing the undergraduate mini-CEX: A tailored approach at Southampton University. Med Educ. 2009;43:326–334
21. Boulet JR, McKinley DW, Norcini JJ, Whelan GP. Assessing the comparability of standardized patient and physician evaluations of clinical skills. Adv Health Sci Educ Theory Pract. 2002;7:85–97
22. Sidhu RS, Hatala R, Barron S, Broudo M, Pachev G, Page G. Reliability and acceptance of the mini-clinical evaluation exercise as a performance assessment of practicing physicians. Acad Med. 2009;84(10 suppl):S113–S115
23. Wiles CM, Dawson K, Hughes TA, et al. Clinical skills evaluation of trainees in a neurology department. Clin Med. 2007;7:365–369
24. Torre DM, Simpson DE, Elnicki DM, Sebastian JL, Holmboe ES. Feasibility, reliability and user satisfaction with a PDA-based mini-CEX to evaluate the clinical skills of third-year medical students. Teach Learn Med. 2007;19:271–277
25. Durning SJ, Cation LJ, Markert RJ, Pangaro LN. Assessing the reliability and validity of the mini-clinical evaluation exercise for internal medicine residency training. Acad Med. 2002;77:900–904
© 2013 by the Association of American Medical Colleges