Special Theme: Complementary, Alternative, and Integrative Medicine: RESEARCH REPORTS

Assessing the Reliability and Validity of the Mini-Clinical Evaluation Exercise for Internal Medicine Residency Training

Durning, Steven J., MD; Cation, Lannie J., MD; Markert, Ronald J., PhD; Pangaro, Louis N., MD


The American Board of Internal Medicine (ABIM) eliminated the oral examination requirement for board certification in 1972.1 Since then, the ABIM has delegated the critical task of evaluating residents' clinical competence to residency program directors. The ABIM has asked program directors to assess clinical competence through direct evaluation methods, such as the traditional clinical evaluation exercise (tCEX) and the mini-clinical evaluation exercise (mCEX).

In the tCEX model, which takes approximately two hours, an attending physician observes a resident perform a complete history and physical examination on a single patient, and give an oral presentation of pertinent data and a management plan.1 Studies have shown that approximately 82% of residents have one tCEX during their first postgraduate year (PGY-1), and only about 32% have completed more than one tCEX.2 The tCEX has been criticized for the excessive amount of time required to complete each assessment, its limited reliability for a given resident due to the few assessments per resident, and its incompatibility with the real-world practice of physicians' focused encounters with patients.1

The mCEX was recently recommended by the ABIM for evaluating residents' clinical competence. The mCEX is an evaluation tool that assesses residents' clinical skills, attitudes, and behaviors. The mCEX entails a much shorter evaluation encounter and can be applied to a wide variety of clinical settings.1

To date, no study has been conducted to assess the validity of the mCEX. We compared mCEX scores with other widely used methods of evaluating residents' clinical competence to determine the validity of this newer tool for evaluating residents. We also compared mean mCEX scores for each of seven mCEX assessments to determine whether the scores changed with subsequent encounters.


Wright-Patterson Medical Center (WPMC), in Dayton, Ohio, is an Air Force Regional Medical Center that provides primary and tertiary care to active-duty and retired military members and their dependents. WPMC, one of four U.S. Air Force internal medicine training programs, is the primary teaching hospital for the internal medicine residency program and has eight categorical internal medicine residents per year.

The mCEX, in a format recommended by the ABIM, was incorporated into the internal medicine residency program at WPMC in February 1996. All PGY-1 residents are required to perform one mCEX during each month of inpatient general internal medicine. The attending physician for the inpatient ward team completes the evaluation and is also responsible for completing the ABIM monthly evaluation form (MEF). The respective service's attending physician completes the ABIM evaluation for each outpatient month. All MEF and mCEX evaluation forms completed for each resident were included in our study, without exclusionary criteria.

We reviewed the scores on all mCEX evaluation forms for three classes of PGY-1 residents for academic years 1996–97, 1997–98, and 1998–99 (n = 23). (One resident left our program prior to completing residency training and was thus excluded from our study.) The sections for the mCEX during our study were clinical skills history, clinical skills physical exam, clinical judgment, humanistic attributes, and overall clinical competence. In these five areas, we rated residents on a 1–9 scale (1 = unsatisfactory, 9 = superior). We calculated each PGY-1 resident's mean score for each of these five areas.

An attending physician completes our modified ABIM MEF for each resident during each month of his or her residency. Our ABIM MEF includes the following sections: clinical skills history, clinical skills physical exam, clinical judgment (divided into four sections), humanistic attributes, medical knowledge, medical care, and overall clinical competence. We calculated a mean score for each PGY-1 resident for each section.

Internal consistency reliability was measured using Cronbach's coefficient alpha. Analysis of variance (ANOVA) was used to compare mean scores on the first seven mCEXs. We assessed construct validity by comparing the mean mCEX scores for the first seven encounters.

Mean mCEX scores were compared with mean scores for the ABIM MEF. We also compared mCEX scores for clinical judgment and overall clinical competence with the overall PGY-2 In-Training Examination (ITE) percentile score using the Pearson product-moment correlation. We made inferences at the .05 level of significance for all inferential statistical procedures with no correction for multiple comparisons.

We used two units of analysis. The resident was the unit of analysis for correlations of mean mCEX scores and mean ABIM MEF scores. The encounter was the unit of analysis for the coefficient alpha calculations and for comparing mCEX scores across months.


Each of the 23 PGY-1 residents had an average of seven mCEXs completed by an average of six different ward attending physicians. One PGY-1 resident had five mCEXs, two residents had six mCEXs, and the remainder had seven or more mCEXs. (Nine was the maximum number of mCEXs.) During academic year 1996–97, each intern completed eight inpatient ward months; during academic years 1997–98 and 1998–99, each intern completed seven ward months. We had 162 of the 168 required mCEX evaluations (96.43% completion rate). Of the 23 residents in our study, 21 were men, two were women, and the age range was 26–38 years.

In our study, 46 different attending physicians completed mCEXs. Our inspection of the data showed no leniency or stringency effects among raters (data not shown). A formal assessment of differences among raters was precluded by numerous raters who each had only a few encounters with residents. Each resident also had an average of 12 ABIM MEFs completed during the study period. We used all ABIM MEFs for analysis, including MEFs completed during months when a corresponding mCEX was not performed.

We calculated Cronbach's coefficient alpha, a measure of internal consistency reliability, for 162 mCEXs and for the mCEXs for seven individual months (since 20 residents had seven mCEX ratings). The coefficient alpha for 162 forms was .90, and the median coefficient alpha for months one through seven was also .90.
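For readers less familiar with coefficient alpha, the following is a minimal sketch of the standard formula, alpha = k/(k - 1) × (1 - sum of item variances / variance of the form totals), where k is the number of items. It assumes each form is treated as one row of five section ratings; the function name and the example ratings are hypothetical and are not the study data.

```python
from statistics import variance


def cronbach_alpha(forms):
    """Cronbach's coefficient alpha for a list of forms.

    Each form is a sequence of k item scores (here, k section ratings).
    """
    if len(forms) < 2:
        raise ValueError("need at least two forms")
    k = len(forms[0])
    if k < 2 or any(len(form) != k for form in forms):
        raise ValueError("every form needs the same number (>= 2) of items")

    # Sample variance of each item (column) across forms.
    item_variance_sum = sum(variance(column) for column in zip(*forms))
    # Sample variance of the per-form total score.
    total_variance = variance([sum(form) for form in forms])
    if total_variance == 0:
        raise ValueError("alpha is undefined when all form totals are equal")

    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical ratings on the 1-9 scale (NOT the study data). Columns:
# history, physical exam, clinical judgment, humanistic attributes, overall.
example_forms = [
    [7, 7, 8, 8, 7],
    [6, 6, 7, 8, 7],
    [8, 8, 8, 9, 8],
    [7, 6, 7, 8, 7],
    [8, 7, 8, 8, 8],
]
print(round(cronbach_alpha(example_forms), 2))
```

A value near 1 indicates that the five sections rise and fall together across forms, which is also what a strong halo effect would produce.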

Mini-CEX history, physical exam, clinical judgment, humanistic attributes, and overall clinical competence all significantly correlated with corresponding ABIM sections (p < .01). Additionally, mCEX clinical judgment significantly correlated with ABIM medical knowledge, medical care, and overall clinical competence (p < .01). Overall clinical competence in the mCEX also significantly correlated with ABIM medical care and medical knowledge (p < .01) (see Table 1). Clinical judgment in the mCEX (r = .57, p < .01) and overall clinical competence (r = .47, p < .05) correlated with the ITE score.
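The correlation and its significance test can be sketched as follows. This is an illustration only: it assumes that all 23 residents contributed an ITE score (the text does not state n for this comparison), and the function names are hypothetical. The t statistic for testing r = 0 is r × sqrt((n - 2) / (1 - r²)) with n - 2 degrees of freedom.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic (n - 2 degrees of freedom) for testing r = 0."""
    if n < 3 or not -1 < r < 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Reported correlations with the ITE, with n = 23 as an ASSUMPTION.
for label, r in [("clinical judgment", 0.57), ("overall competence", 0.47)]:
    print(label, round(t_statistic(r, 23), 2))
```

Under that assumption the two t statistics are about 3.18 and 2.44. The two-tailed critical values of t with 21 degrees of freedom are 2.080 (.05 level) and 2.831 (.01 level), which is consistent with the reported p < .01 and p < .05.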

Table 1: A Comparison of Residents' Scores from the mCEX Forms and from the ABIM Evaluation Forms, Wright-Patterson Medical Center, 1996–99*

Our comparison of the mean mCEX scores for each of the first seven encounters yielded no statistically significant difference (p = .21) (see Table 2).
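The comparison across encounters can be illustrated with a one-way ANOVA, in which each of the seven encounters forms one group of scores. This is a sketch under assumptions: the text does not say whether a one-way or a repeated-measures design was used, and the function name and example scores are hypothetical rather than the study data.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of score lists, one list per group (encounter).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need >= 2 non-empty groups and more scores than groups")

    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of individual scores around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    if ss_within == 0:
        raise ValueError("F is undefined when there is no within-group variation")

    df_between = k - 1
    df_within = n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical overall-competence ratings for three encounters (NOT study data).
example = [[7, 8, 7, 7], [7, 7, 8, 8], [8, 7, 8, 7]]
print(one_way_anova_f(example))
```

The p value is then read from the F distribution with the returned degrees of freedom; an F ratio near 1 means the encounter means differ no more than the scatter within encounters would predict.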

Table 2: Residents' Mean mCEX Scores for the First Seven mCEX Evaluations in This Study, Wright-Patterson Medical Center, 1996–99*


With the elimination of the oral examination, the ABIM, other institutions, and program directors have attempted to develop tools to directly observe residents' clinical competence. The term “clinical competence” is often used broadly to incorporate the domains of knowledge, skills, and attitudes necessary to practice medicine.3 Clinical competence is the quality of “not only knowing but knowing how.”4

Evaluating residents' clinical competence is a difficult task. Standardized tests such as the ABIM Certifying Examination (ABIMCE) primarily assess knowledge and clinical judgment.5,6,7 Evaluating residents' competence requires directly observing trainees with patients, an appropriate evaluation format, sufficiently trained raters, and the experience to judge competence and provide feedback to the residents on areas for improvement. Ideally, assessing residents' clinical competence should occur in a setting and with a format closely modeling actual clinical practice. This is one of the drawbacks of the tCEX; clinicians do not usually perform a comprehensive history and physical examination with each patient encounter, as they do in the tCEX. Moreover, inferences about residents' competence may be limited to the single clinical problem presented by the patient at hand. The advantages of the mCEX over the tCEX are (1) enhanced flexibility for a variety of clinical settings, (2) less time required to perform the evaluation, (3) increased reliability due to more evaluations per resident (with far more content validity since the resident may be evaluated using multiple clinical problems), and (4) a focused evaluation that more closely models actual clinical practice1 (face validity) but that assesses selectivity and prioritization of data gathering (content validity).

The ITE and the ABIM MEF were chosen for criterion-related validity, as they are generally accepted, standard tools for evaluating residents' clinical competence. Unfortunately, no single evaluation tool adequately provides all the data required for determining the multidimensional construct of clinical competence. Using a standardized form that specifies items for documentation, similar to that used in the mCEX, has been shown to improve both the quantity and the accuracy of raters' observations.5

The ITE has been shown to be a valid evaluation tool for medical knowledge and clinical judgment, as several studies have shown the predictive validity of the ITE for subsequent performance on the ABIMCE.6,8 The ABIM MEF has been shown to detect global differences among residents with regard to clinical competence,9 but has failed to distinguish among specific areas of competence.9,10

The ABIM MEF is used to regularly assess performance and to produce a composite evaluation for the ABIMCE. The ABIM MEF, which has behavior-related descriptors, provides a global rating scale that measures various components of clinical competence such as judgment, knowledge, humanistic attributes, professionalism, and overall clinical competence. As with many global rating scales, a “halo effect,” or a single factor, appears to positively influence all ratings.9,10

Each resident had an average of seven mCEXs completed during our study, exceeding the current suggested guideline of a minimum of four mCEXs per resident during the year.1 The mCEX has been shown to have reasonable reliability (reproducibility coefficient of .56 with a standard error of measurement of .35)1 with this number of evaluations. Similarly, we found excellent internal consistency reliability for the mCEX (coefficient alpha = .90). Additionally, our mCEX completion rate of 96% supports the feasibility of conducting this evaluation method.

In our study, the mCEX rating of clinical judgment correlated with the ITE, suggesting that attending physicians can reasonably rate residents' medical knowledge and clinical judgment by observing them in clinical encounters. In addition, our measurements of the mCEX components were sufficiently precise: clinical skills history (95% confidence interval 7.22–7.54), clinical skills physical exam (95% confidence interval 7.13–7.48), clinical judgment (95% confidence interval 7.29–7.58), humanistic attributes (95% confidence interval 7.81–8.09), and overall clinical competence (95% confidence interval 7.30–7.60). Overall clinical competence in the mCEX significantly correlated with the ITE, suggesting that attending raters consider medical knowledge and clinical judgment as main factors in assessing clinical competence. Additionally, the significant correlation of mCEX clinical judgment and clinical competence with the ABIM sections on individual and composite clinical judgment, as well as with ABIM medical knowledge, medical care, and overall clinical competence, suggests that attending physicians rate residents' perceived medical knowledge and clinical judgment as central to assessing residents' clinical competence. Thus, our study suggests that the “halo effect” may be perceived medical knowledge and clinical judgment. Prior studies, using factor analysis, have shown that ABIM MEF ratings tend to be weighted heavily on perceived knowledge as well as interpersonal skills.9,10

We found that mean scores did not differ on the first seven mCEXs. Initially, we thought this analysis would be an appropriate test of the mCEX's construct validity. However, due to several factors that could affect sequential mCEX scores, we are reluctant to draw any conclusion about construct validity from our results. These factors include our small sample size, restriction of range for mCEX ratings, possible halo effect, and the fact that raters may be more lenient during the early months of residency, when residents are perceived to be less knowledgeable and skilled.

We saw lower correlation coefficients for unrelated mCEX and ABIM MEF sections (data not shown). For example, mCEX humanistic attributes would not be expected to correlate with ABIM MEF clinical judgment or medical knowledge (correlations of .22 and .34, respectively). Likewise, the mCEX clinical skills physical exam would not be expected to correlate with ABIM MEF humanistic attributes (correlation coefficient = .17).

Our study was limited by its small sample of PGY-1 residents for each year from a single institution; however, we collected data over a three-year period and included all mCEXs and ABIM MEFs for residents in the program. Additionally, our results could have been biased in cases where the attending physician's ABIM MEF clinical skills history and clinical skills physical exam ratings were based solely on his or her observation of history and physical exam skills evaluated during the mCEX. It is unlikely, however, that this bias influenced our results, because each resident, on average, had almost half of his or her evaluations completed during months when no corresponding mCEX was performed. As with many descriptive evaluation tools, no definitive “gold standard” exists for assessing residents' clinical skills. The ABIM MEF and the ITE have been the subject of relatively few critical reviews and have not been conclusively shown to be valid. In the absence of a definitive “gold standard” for assessing residents' clinical skills, however, significant correlation with generally accepted rating instruments suggests that the mCEX is valid. While our study was also limited by restriction of range for mCEX and ABIM MEF ratings, the correlations between the mCEX and the ABIM would have been greater without this restriction.
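The effect of range restriction on a correlation can be illustrated with the standard correction for direct restriction on one variable (often called Thorndike's Case 2). We did not apply this correction in our study; the sketch below is offered only to show the direction of the effect, and the function name and the standard deviations are hypothetical.

```python
import math


def correct_for_range_restriction(r, sd_unrestricted, sd_restricted):
    """Estimate the correlation in the unrestricted group.

    Uses r_c = r*u / sqrt(1 + r^2 * (u^2 - 1)), where u is the ratio of the
    unrestricted to the restricted standard deviation of one variable.
    """
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    if sd_unrestricted <= 0 or sd_restricted <= 0:
        raise ValueError("standard deviations must be positive")
    u = sd_unrestricted / sd_restricted
    return r * u / math.sqrt(1 + r * r * (u * u - 1))


# Hypothetical example: an observed r of .50 where ratings span only half
# of the spread they would have in an unrestricted group (u = 2).
print(round(correct_for_range_restriction(0.50, 2.0, 1.0), 2))
```

Whenever the restricted standard deviation is smaller than the unrestricted one (u > 1), the estimated correlation is larger in magnitude than the observed one; with u = 1 the correlation is unchanged.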

We found that the mCEX forms had more specific or goal-oriented comments for feedback, as opposed to the ABIM MEF, in which the comments were generally vague. Specifically, most of the mCEXs contained goal-oriented comments regarding history and physical exam skills, which were rarely present on the ABIM MEF. Additionally, several of the mCEX forms contained specific comments on bedside manner and clinical competence, notes rarely found on the ABIM MEF. These findings should be addressed in a larger study.

The reliability of the mCEX is supported by our high internal consistency reliability and by the number of evaluations completed per resident (all but three residents had seven or more mCEXs, with a range of five to nine). Our completion rate of 96.43% supports the feasibility of this evaluation method.

The validity of the mCEX is supported by the strong correlations between mCEX scores and corresponding ABIM MEF scores, as well as the ITE scores. Based on the results of our study, the mCEX appears to be complementary to other methods of assessing residents' clinical competence and is a valid tool for both formative and summative evaluation of PGY-1 residents for residency program directors.


1. Norcini JJ, Blank LL, Arnold GK, Kimball HR. The mini-CEX (clinical evaluation exercise): a preliminary investigation. Ann Intern Med. 1995;123:795–9.
2. Day SC, Grosso LG, Norcini JJ Jr, Blank LL, Swanson DB, Horne MH. Residents' perceptions of evaluation procedures used by their training program. J Gen Intern Med. 1990;5:421–6.
3. Holmboe ES, Hawkins RE. Methods for evaluating the clinical competence of residents in internal medicine: a review. Ann Intern Med. 1998;129:42–8.
4. Miller GE. The assessment of clinical skills/competence/performance. Acad Med. 1990;65(9 suppl):S63–S67.
5. Noel GL, Herbers JE, Caplow MP, Cooper GS, Pangaro LN, Harvey J. How well do internal medicine faculty members evaluate the clinical skills of residents? Ann Intern Med. 1992;117:757–65.
6. Grossman RS, Fincher RM, Layne RD, Seelig CB, Berkowitz LR, Levine MA. Validity of the in-training examination for predicting American Board of Internal Medicine Certifying Examination scores. J Gen Intern Med. 1992;7:63–7.
7. Petersdorf RG, Beck JC. The new procedure for evaluating the clinical competence of candidates to be certified by the American Board of Internal Medicine. Ann Intern Med. 1972;76:491–6.
8. Waxman H, Braunstein G, Dantzker D, et al. Performance on the internal medicine second-year residency In-training Examination predicts the outcome of the ABIM Certifying Examination. J Gen Intern Med. 1994;9:692–4.
9. Haber RJ, Avins AL. Do ratings on the American Board of Internal Medicine resident evaluation form detect differences in clinical competence? J Gen Intern Med. 1994;9:140–5.
10. Thompson WG, Lipkin M Jr, Gilbert DA, Guzzo RA, Robertson L. Evaluating evaluation: assessment of the American Board of Internal Medicine resident evaluation form. J Gen Intern Med. 1990;5:214–7.
© 2002 Association of American Medical Colleges