Holmboe, Eric S. MD; Huot, Stephen PhD, MD; Chung, Jeff MD; Norcini, John PhD; Hawkins, Richard E. MD
Drs. Holmboe and Huot are associate professors of medicine, and Dr. Chung is a fellow, Department of Medicine and the Yale Primary Care Internal Medicine Residency Program, Yale University School of Medicine, New Haven, Connecticut. Dr. Norcini is president and CEO, Foundation for the Advancement of International Medical Education and Research (FAIMER), Philadelphia, Pennsylvania. Dr. Hawkins is deputy vice president, Assessment Programs, National Board of Medical Examiners, Philadelphia, Pennsylvania.
Correspondence and requests for reprints should be addressed to Eric S. Holmboe, Waterbury Hospital, Pomeroy 6, 64 Robbins Street, Waterbury, CT 06721; e-mail: email@example.com or firstname.lastname@example.org.
This work was supported by a grant from the Robert Wood Johnson Foundation and the American Board of Internal Medicine.
The direct observation of residents' clinical skills by faculty is essential to ensure that residents have achieved a satisfactory level of competence to practice independently. To facilitate direct observation during residency training, the American Board of Internal Medicine (ABIM) developed the mini-clinical evaluation exercise (miniCEX).1 The miniCEX assessment method is intended to promote and facilitate the direct observation of history taking, physical examination, and counseling skills. The ABIM strongly recommends the miniCEX for interns (PGY1). The Accreditation Council for Graduate Medical Education (ACGME) has endorsed the miniCEX as a method of evaluating several of the six new general competencies, including the domain of patient care.2 Previous research has demonstrated that the miniCEX has sufficient reliability for “pass/fail” determinations after just four observations.1 Few data, however, are available regarding the validity of the miniCEX.
Evaluation of the validity of the miniCEX is important for several reasons. First, studies have documented significant deficiencies in the basic clinical skills of physicians-in-training, highlighting the need for a valid evaluation method for clinical skills.3–9 Failure to identify poor performance precludes effective remediation or improvement in clinical skills. Second, the miniCEX may be the only evaluation method used by many residency programs to directly observe clinical skills. Although standardized patients offer another opportunity to directly observe clinical skills, the miniCEX is easier to implement and less costly. Furthermore, standardized patients are not always readily available to residency programs and are not a substitute for observing trainees in actual patient encounters. Real patients present with genuine symptoms and discomfort and therefore pose a broader range of medical challenges. Finally, previous research with the original model of the CEX in the late 1980s (the “traditional” CEX) uncovered significant concerns about the quality of faculty observations.10–12
We undertook this study to evaluate the construct validity of the miniCEX as an evaluation method for the clinical skills of residents. For the purposes of this study, construct validity is defined as the ability of the miniCEX to differentiate between levels of performance.13 Demonstrating that the miniCEX has construct validity would be valuable to medical educators who need tools to differentiate between levels of competence, especially between satisfactory and unsatisfactory performances.
Forty faculty members from 16 internal medicine residency programs in the Northeast region (Connecticut, Massachusetts, and Rhode Island) and Mid-Atlantic region (Maryland, Virginia, and the District of Columbia) were randomized to participate in a study of faculty development in evaluation. The participating faculty members from each institution were chosen by the institutions' residency program directors. The program directors were also encouraged to participate. A minimum of two and a maximum of four faculty members were recruited from each program. The participants were informed that the main objective of the trial was to improve the faculty's evaluation skills. All faculty members completed a baseline assessment of their observation skills in October 2001, using the miniCEX instrument to evaluate a series of nine videotapes before any training in evaluation or direct observation. The Yale University Human Investigation Committee and the Uniformed Services University of the Health Sciences Institutional Review Board approved the study.
Scripts were written for standardized patients and standardized residents that depicted three levels of performance for each of three clinical skills: history taking, physical examination, and counseling. A videotape of each scripted encounter was filmed and recorded for a total of nine videotapes. The depicted resident was stated to be a PGY2 level, and the same standardized resident portrayed each level of performance per clinical skill. The levels of performance depicted by the residents were unsatisfactory, marginal/satisfactory, and high satisfactory/superior level. The scripts were developed by one of the authors (EH), then reviewed and edited independently by two other authors (SH, RH). The scripts were revised until consensus was reached among these authors. Scripts were then sent to outside reviewers for additional comments before the final edits were made. The history-taking skills videotapes depicted a 64-year-old woman presenting to the emergency room with acute shortness of breath and chest pain due to a pulmonary embolism. The physical-examination skills tapes depicted a 69-year-old man with progressive shortness of breath who had ischemic cardiomyopathy. The counseling skills tapes portrayed a 48-year-old man returning to clinic for follow-up of recently diagnosed hypertension to discuss treatment options. To discriminate among the three levels of performance for counseling, for example, the criteria for informed decision making developed by Braddock et al.14 were used. The Braddock criteria recommend seven main behaviors when counseling a patient about starting a new medication. For the “unsatisfactory” videotape, the resident performed only one of the seven criteria, whereas for the “high satisfactory/superior” videotape, the resident performed six of the seven criteria.
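This criterion-counting approach to scripting performance levels can be sketched in code. The sketch below is purely illustrative: the element names are paraphrased stand-ins, not the exact wording of the Braddock criteria, and the tallies simply mirror the one-of-seven and six-of-seven counts described above.

```python
# Sketch: tallying how many of the seven Braddock informed-decision-making
# elements a scripted counseling performance includes. Element names are
# paraphrased for illustration and are not the criteria's exact wording.
BRADDOCK_ELEMENTS = [
    "patient's role in decision making",
    "nature of the decision",
    "alternatives",
    "pros and cons",
    "uncertainties",
    "assessment of patient understanding",
    "patient preference",
]

def count_elements(performed: set) -> int:
    """Count how many of the seven elements the resident performed."""
    return sum(1 for e in BRADDOCK_ELEMENTS if e in performed)

# The "unsatisfactory" script covered only one element; the
# "high satisfactory/superior" script covered six of the seven.
unsatisfactory = {"nature of the decision"}
superior = set(BRADDOCK_ELEMENTS) - {"uncertainties"}

print(count_elements(unsatisfactory))  # 1
print(count_elements(superior))        # 6
```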
Overall, the videotapes ranged in length from three minutes to 15 minutes. Both the inpatient and outpatient settings were used for the videotapes to reflect the use of the miniCEX in each of these settings.
Participants, divided into four groups, viewed the tapes in random order on a 32-inch television screen. They were instructed to use the ABIM's miniCEX form to rate whatever dimension(s) of clinical competence they believed they could appropriately evaluate while they observed the tape (the miniCEX form covers seven dimensions: medical interviewing, physical examination, humanistic qualities/professionalism, clinical judgment, counseling skills, organization/efficiency, and overall competence). The form uses a nine-point rating scale (1–3 = unsatisfactory performance, 4–6 = marginal/satisfactory performance, and 7–9 = superior performance).
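The three bands of the nine-point scale can be expressed as a simple lookup. This sketch only restates the bands described above; it is not part of the ABIM instrument itself.

```python
def minicex_band(score: int) -> str:
    """Map a miniCEX rating (1-9) to its performance band.

    Bands as described in the text: 1-3 unsatisfactory,
    4-6 marginal/satisfactory, 7-9 superior.
    """
    if not 1 <= score <= 9:
        raise ValueError("miniCEX ratings run from 1 to 9")
    if score <= 3:
        return "unsatisfactory"
    if score <= 6:
        return "marginal/satisfactory"
    return "superior"

print(minicex_band(2))  # unsatisfactory
print(minicex_band(5))  # marginal/satisfactory
print(minicex_band(8))  # superior
```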
The dimensions of clinical competence scored and the scores for each evaluation were recorded from all completed miniCEX forms. All participants completed the miniCEX forms independently without interaction with other participants. The lead investigator proctored all sessions. The participants viewed and rated the performances on all nine videotapes over the course of a full morning. Two breaks were included to reduce raters' fatigue.
Data Collection and Statistical Analysis
The main focus of this study is the participants' ratings of the miniCEX dimensions of medical interviewing, physical examination, and counseling based on their observation of the videotapes. Demographic data about the participants are presented as means and proportions as appropriate. Rating scores on the miniCEX form are presented as means and medians. The three-way analysis of variance (ANOVA) was used to compare ratings of each level of scripted performance for the three clinical skills the faculty evaluated.
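As a simplified illustration of this comparison, the sketch below runs a one-way ANOVA across three groups of hypothetical ratings, one per scripted performance level. The study itself used a three-way ANOVA, and the numbers here are invented for illustration, not study data; the point is only that clear separation of group means yields a significant F statistic.

```python
from scipy import stats

# Hypothetical miniCEX ratings for one clinical skill at the three
# scripted performance levels (invented values, not study data).
unsatisfactory = [2, 3, 2, 4, 3, 2]
marginal_satisfactory = [5, 4, 6, 5, 5, 6]
superior = [8, 7, 8, 7, 9, 8]

# One-way ANOVA across the three levels; the study used a three-way
# ANOVA, but the underlying comparison of group means is the same idea.
f_stat, p_value = stats.f_oneway(
    unsatisfactory, marginal_satisfactory, superior
)
print(f"F = {f_stat:.1f}, p = {p_value:.2g}")
```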
Demographic characteristics of the 40 faculty and 16 residency programs that participated in the study are shown in Table 1. The faculty represented five university- and 11 community-based internal medicine residency programs. All but one community program had an active university affiliation. The participants were relatively early in their educational careers. Their mean amount of teaching experience was approximately three years, and half of the participants classified themselves as instructors. The residency programs ranged in size from five to 26 categorical residents per year, with almost half of the programs reporting 20–39 total residents.
Participants watched a total of 348 taped performances (of a possible 360, 97%) and turned in a completed miniCEX form for each performance. At least 38 faculty members viewed each of the nine videotapes. The mean ratings on the miniCEX increased with each higher level of performance (Table 2). For all three clinical skills, the three-way ANOVA demonstrated a statistically significant change (p < .0001 for all analyses) in scoring over the three levels of depicted performance. The range of scores for each tape, however, was wide (Table 2). Scores ranged from unsatisfactory to superior on four tapes, including two tapes scripted for high satisfactory/superior performance. Fourteen of the 38 (37%) observations for both history taking and physical examination on the “unsatisfactory” videotapes were scored as marginal/satisfactory (miniCEX rating of 4–6). None of the participants scored an “unsatisfactory” performance tape as superior (miniCEX rating of 7–9).
Using scripted videotapes of standardized patients and standardized residents, we demonstrated that the miniCEX has construct validity. Equally important, we found that the magnitude of rating differentiation between depicted levels of performance was educationally as well as statistically significant. Norcini and colleagues1 have previously shown that the miniCEX is a reliable evaluation tool. Durning and colleagues15 recently argued that the miniCEX had concurrent validity in their single residency program. They demonstrated a statistically significant correlation between miniCEXs and monthly evaluation forms, but their study was limited because the majority of miniCEXs and monthly evaluation forms were completed by the same attending physician. The results reported in this article are, to our knowledge, the first to demonstrate the construct validity of the miniCEX.
Program directors must choose from a variety of methods to accomplish their teaching and evaluation objectives. Given the growing recognition of deficiency in clinical skills among physicians-in-training and practicing physicians, more attention to interviewing, physical examination, and counseling skills is needed in residency training.3–12 The direct observation of residents' behaviors is essential to assess these clinical skills. Whether this involves observation of the resident interacting with patients in real clinical settings or with standardized patients or other simulation methods will depend upon local resources. Standardized patients can be an excellent way to assess trainees' capabilities in clinical skills but should not replace the evaluation of residents' performance with actual patients.16–18
This study also suggests that faculty can distinguish between the poor or marginal resident and the satisfactory or superior resident. For example, among the 40 participating faculty, only 11 ratings inaccurately scored a tape depicting a poor level of performance as satisfactory or higher: three for medical interviewing, seven for physical examination, and one for counseling. The vast majority of faculty in this study were able to provide a poor or marginal rating when the performance called for such a rating. From the perspective of a program director, the ability of faculty to make this distinction is very important for making decisions about competency.
Direct observation by teaching faculty remains vital and essential to the evaluation of clinical skills. In addition, direct observation by faculty has other clear benefits, including real-time, potentially high-quality feedback, and relationship building between residents and faculty.1 Finally, direct observation using the miniCEX helps to underscore the professional responsibility of faculty through the modeling and mentoring of basic but critically important clinical skills. In our opinion, direct observation is a fundamental duty of teaching faculty.
Although our study provides strong evidence for the construct validity of the miniCEX and the study by Norcini et al.1 documents the reliability of the miniCEX, important challenges remain. First, our results reinforce the need for multiple observations of the same resident and/or performance to ensure sufficient reliability, as witnessed by the wide range of ratings given for each level of clinical skill. Second, although construct validity is an important characteristic for any evaluation method, the question of predictive validity remains unanswered (whether performance on the miniCEX will correlate with future performance). Finally, this study does not address the important issue of the accuracy of direct observation. By accuracy, we mean the faculty's ability to identify whether a specific skill (e.g., use of a stethoscope in the cardiac examination) was performed properly, if performed at all. Accuracy is particularly important from a formative evaluation standpoint; faculty cannot correct errors or deficiencies if they cannot correctly identify the errors and deficiencies.
Two previous studies found serious problems with the accuracy of faculty's observations.11,12 Noel et al.,11 for example, found that faculty using an unstructured evaluation form for the “traditional” CEX achieved an accuracy score of only 40% on a simulated clinical videotape. Accuracy only improved to 66% with a structured CEX evaluation form. Not surprisingly, a brief training intervention that did not specifically address the issue of accuracy failed to improve the accuracy of faculty's ratings. This study and others highlight the clear need for more research on the accuracy of faculty's observations and strategies to improve their observation skills.
Two potential limitations of this study should be noted. First, the study's participants were highly motivated educators, and all were from internal medicine training programs. They may not be representative, therefore, of all teaching faculty. Second, the participants rated videotapes in a classroom setting rather than in actual patient encounters in a clinical setting, and thus their rating behavior may not reflect actual practice with their own residents. However, the characteristics of this young, highly motivated faculty group may actually make it well suited for a test of construct validity. Motivation is required for sufficient attention to detail while viewing the videotapes, and the young faculty's relative inexperience would favor a negative study rather than the positive results we found. Although the use of videotapes may in some way be a limitation of the study, the main purpose of the study was to evaluate the ability of faculty to discriminate between levels of clinical skill using the miniCEX. The use of videotapes allows for strict control of the content and ensures that each participant views the same encounter in the same environment. Thus, this rigorous methodology helps to control for confounding that can be present in “live” interactions and is a logical first step in validity assessment.
Direct observation of residents' clinical skills is an essential responsibility of faculty educators. The miniCEX is a reliable tool to facilitate the evaluation of clinical skills. This is one of the first studies to document the construct validity of the miniCEX. Although the miniCEX has good construct validity, more work is needed on effective training methods to improve faculty's observation skills, especially in the era of the new ACGME general competencies.
The views expressed herein are solely those of the authors and do not represent the views of the Department of Defense or the Department of the Navy.
1. Norcini JJ, Blank LL, Arnold GK, Kimball HR. The Mini-CEX (clinical evaluation exercise): a preliminary investigation. Ann Intern Med. 1995;123:795–9.
2. Accreditation Council for Graduate Medical Education/American Board of Medical Specialties. Toolbox of Assessment Methods. Version 1.1. Chicago: ACGME, 2000.
3. Weiner S, Nathanson M. Physical examination. Frequently observed errors. JAMA. 1976;236:852–5.
4. Johnson JE, Carpenter JL. Medical housestaff performance in physical examination. Arch Intern Med. 1986;146:937–41.
5. Pfeiffer C, Madray H, Ardolino A, Willms J. The rise and fall of students' skill in obtaining a medical history. Med Educ. 1998;32:283–8.
6. Mangione S, Nieman LZ. Cardiac auscultatory skills of internal medicine and family practice trainees: a comparison of diagnostic proficiency. JAMA. 1997;278:717–22.
7. Li LTC. Assessment of basic examination skills of internal medicine residents. Acad Med. 1994;69:296–9.
8. Fox RA, Clark CLI, Scotland AD, Dacre JE. A study of pre-registration house officers' clinical skills. Med Educ. 2000;34:1007–12.
9. Sachdeva AK, Loiacono LA, Amiel GE, Blair PG, Friedman M, Roslyn JJ. Variability in the clinical skills of residents entering training programs in surgery. Surgery. 1995;118:300–9.
10. Herbers JE Jr, Noel GL, Cooper GS, Pangaro LN, Harvey J, Weaver MJ. How accurate are faculty evaluations of clinical competence? J Gen Intern Med. 1989;4:202–8.
11. Noel GL, Herbers JE Jr, Caplow MP, Cooper GS, Pangaro LN, Harvey J. How well do internal medicine faculty members evaluate the clinical skills of residents? Ann Intern Med. 1992;117:757–65.
12. Kroboth FJ, Hanusa BH, Parker S et al. The inter-rater reliability and internal consistency of a clinical evaluation exercise. J Gen Intern Med. 1992;7:174–9.
13. Feinstein AR. Clinimetrics. New Haven, CT: Yale University Press, 1987.
14. Braddock CH, Edwards KA, Hasenberg NM, Laidley TL, Levinson W. Informed decision making in outpatient practice. Time to get back to basics. JAMA. 1999;282:2313–20.
15. Durning SJ, Cation LJ, Markert RJ, Pangaro LN. Assessing the reliability and validity of the mini-clinical evaluation exercise for internal medicine residency training. Acad Med. 2002;77:900–4.
16. Holmboe ES, Hawkins RE. Methods for evaluating the clinical competence of residents in internal medicine: a review. Ann Intern Med. 1998;129:42–8.
17. Stillman P, Swanson D, Regan MB et al. Assessment of clinical skills of residents utilizing standardized patients. A follow-up study and recommendations for application. Ann Intern Med. 1991;114:393–401.
18. Van der Vleuten CPM, Swanson DB. Assessment of clinical skills with standardized patients: state of the art. Teach Learn Med. 1990;2:58–76.