Dore, Kelly L.; Kreuger, Sharyn; Ladhani, Moyez; Rolfson, Darryl; Kurtz, Doris; Kulasegaram, Kulamakan; Cullimore, Amie J.; Norman, Geoffrey R.; Eva, Kevin W.; Bates, Stephen; Reiter, Harold I.
The Accreditation Council for Graduate Medical Education competencies1 and the similar Royal College of Physicians and Surgeons of Canada's CanMEDS 20002 roles are becoming the framework for residency training and evaluation. The roles—Professional, Communicator, Collaborator, Manager, Health Advocate, Scholar, and Medical Expert—ostensibly describe the ideal practicing physician and provide the standards for trainee assessment.
Assessment tools testing medical knowledge/expertise are readily available.1,3 However, reliably and validly assessing the other competencies challenges residency program directors: a challenge exacerbated by evidence suggesting that some qualities (e.g., empathy and moral reasoning) tend to remain static4 or even worsen5 during training. Ensuring that trainees maintain these skills as practicing physicians may thus depend partly on selecting medical school graduates who already possess them at entry into residency.
Entry into Canadian postgraduate training occurs via the Canadian Residency Matching Service, which attempts to match the candidate's choice with the program's choice, a resource- and time-intensive process as candidates routinely interview for many programs and programs consider many applicants. In-person interviews are commonly held, requiring considerable departmental resources. During interviews, a program must balance assessing the candidate's suitability with portraying itself advantageously. Initial screening generally includes the curriculum vitae, personal letter, medical school transcript, reference letters, and deans' letters.6 Residency directors, surveyed across Canada, believe the personal interview is the most important evaluation of noncognitive qualities in candidates.7 However, the interview and other screening measures have limitations.8,9
Most critically, the available tools do not predict performance on the outcomes they are intended to assess.10–13 These same issues, faced by undergraduate admissions committees, led to development of the Multiple Mini-Interview (MMI),14 similar in format to an objective structured clinical examination (OSCE).15 Robust reliability values for the MMI of 0.70 and above have been confirmed across multiple health professions when 10 or more stations are used. At the undergraduate level, the MMI is predictive of in-training OSCE performance,16 clinical clerkship scores, and licensing examination performance.17 Correlations with the component of the Medical Council written exam related to ethical and communication skills range from 0.37 to 0.39. Similar results have been found in other health care disciplines.18
However, undergraduate and postgraduate admissions processes differ in applicant pool size and composition: Postgraduate pools are smaller and much more homogeneous. In addition, while undergraduate programs have an abundance of applicants relative to available positions, postgraduate programs often must actively recruit their preferred candidates. This study investigates the reliability and acceptability of the MMI at the postgraduate level, given the differences in the applicant pool and process. In each of two years, three cohorts of Canadian medical graduates (CMGs) and one cohort of international medical graduates (IMGs), who completed their medical training outside of Canada, were assessed using the MMI. In addition, this study assessed candidates' and interviewers' perceptions of the MMI for resident selection, which is important given the need for postgraduate programs to appeal to candidates.
All CMGs invited to interview by three training programs, obstetrics–gynecology and pediatrics (McMaster University) and internal medicine (University of Alberta), and IMGs applying to pediatrics (McMaster University), completed an MMI as part of their postgraduate admissions. The McMaster University and University of Alberta internal review boards both approved this study.
The IMGs completed their MMIs separately from the CMG applicants. Assessors were told to score each applicant relative to the appropriate pool, CMG or IMG, as candidates were applying for different training spots. A total of 484 candidates participated across the two years. The first year included 56 CMGs applying to obstetrics–gynecology, 56 CMGs applying to pediatrics, 8 IMGs applying to pediatrics, and 107 CMGs applying to internal medicine. In the second year, 52, 64, 16, and 125 candidates interviewed in these four cohorts, respectively.
All candidates underwent an MMI in which candidates rotate through multiple rooms (stations), each station addressing a separate question. Before entering the room, the candidate has two minutes to read a scenario on the door. On entering, he or she has eight minutes to address the scenario with an interviewer. The same interviewer remains at each station, thus minimizing potential halo effect.19 The MMI stations were based on the CanMEDS competencies,2 other than medical expert, allowing the same stations to be used across different specialties. Example stations have been published,14 and stations are available on request from the author. Within each year, the same seven MMI stations were used by all three programs.
The assessor in each station was asked to evaluate each candidate's communication skills, strength of discussion, and overall performance, using a nine-point anchored scale. Interviewers were also asked to note any “red flags” (i.e., incidents during the interview indicating unprofessional behavior that might exclude the candidate from the ranking process). Both obstetrics–gynecology and pediatrics used only one interviewer per room, either a faculty member or a current resident; internal medicine used one faculty and one resident in most stations.
All raters were provided with a rater-training booklet and, on interview day, underwent an orientation covering relevant background and theory, along with additional probing questions they could use at their station.
At the conclusion of the interview day, each candidate and interviewer was asked to complete a brief, anonymous survey (similar to that used in the undergraduate context16) regarding their perceptions of the MMI for residency selection.
Generalizability theory was used to assess three kinds of reliability—internal consistency/interitem, interrater within station (for stations with two raters), and interstation. Thus, in a generalizability theory framework, we examined the generalizability across three facets of generalization—item, rater, and station. On the basis of the literature, we anticipated that station would contribute the most error variance, rater next, and item the least. Finally, we computed the overall test reliability considering all sources of variance.
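The overall reliability described above combines the person (candidate) variance with the error variance contributed by each facet. One standard form of the generalizability coefficient for a fully crossed candidate × station × rater × item design (the exact design used here is not stated, so this is an illustrative assumption) is:

```latex
E\hat{\rho}^{2} \;=\;
\frac{\sigma^{2}_{p}}
     {\sigma^{2}_{p}
      + \dfrac{\sigma^{2}_{ps}}{n_{s}}
      + \dfrac{\sigma^{2}_{pr}}{n_{r}}
      + \dfrac{\sigma^{2}_{pi}}{n_{i}}
      + \dfrac{\sigma^{2}_{\mathrm{res}}}{n_{s}\,n_{r}\,n_{i}}}
```

where p denotes candidates, s stations, r raters, and i items, and n_s, n_r, n_i are the numbers of stations, raters per station, and items per station. Because the station interaction term is divided only by n_s, a large candidate × station component dilutes reliability most, which motivates the decision study reported below.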
As the candidates were scored relative to the other candidates within their applicant pools, it is of little benefit to compare across the applicant groups. However, as demonstrated by the means and ranges (Table 1), the overall mean score and the range for the IMG applicant pools were lower than those of the CMGs.
The interitem reliability (i.e., the internal consistency of the three scores assigned within any one station) was very high across all years and programs (see Table 1), perhaps indicating that all three subscales were not needed on the evaluation form. Table 1 also illustrates that the interrater reliability (only calculable for internal medicine) was also high.
The interstation reliability, as seen in previous evaluations of interviews, was quite low from one station to the next (see Table 1). That is, it is difficult to generalize from the scores an applicant received on one station to scores he or she received on another station.
Generalizability analyses demonstrated candidate × station variance components ranging from 1.26 to 1.96, candidate × item variance components ranging from 0.00 to 0.01, and candidate × rater variance components of 0.36 in 2008 and 0.75 in 2009 for internal medicine. These results indicate that a greater degree of assessment error is due to variance between stations rather than to raters or items.
The overall reliability of the seven-station MMI across all years and programs was at least moderately acceptable, ranging from 0.55 to 0.72 (Table 2). Seven stations were used in this study for reasons of feasibility, pending demonstration of acceptability. However, anticipating the future use of more stations, a decision study was performed, estimating overall reliability for 10 stations. The results demonstrated that the range of reliability would increase to 0.64–0.79 (Table 2).
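The projected gain from adding stations can be sketched with the Spearman-Brown prophecy formula, which estimates reliability after the test length is multiplied by a factor k. (The authors' decision study was presumably run on the generalizability-theory variance components; this simpler length correction is shown only because it reproduces the reported 0.64–0.79 range from the seven-station values.)

```python
def projected_reliability(r: float, k: float) -> float:
    """Spearman-Brown prophecy: reliability of a test whose length
    is multiplied by k, given current reliability r."""
    return (k * r) / (1 + (k - 1) * r)

# Projecting the seven-station reliabilities (0.55 and 0.72) to 10 stations.
k = 10 / 7
low = projected_reliability(0.55, k)   # ~0.64
high = projected_reliability(0.72, k)  # ~0.79
print(round(low, 2), round(high, 2))
```

Both projected values match the decision-study range reported in Table 2.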
In year one, 19 red flags were reported, 15 across the 219 CMG candidates (6.8%) and 4 for the 8 IMG candidates (50%) (Fisher exact test, P = .002). In year two, 40 were reported, 31 for the 241 CMG candidates (12.9%) and 9 for 16 IMG candidates (56.2%) (Fisher exact test, P = .0001). Across all years and programs, a total of seven candidates received more than one red flag.
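The red-flag comparisons above use the Fisher exact test, appropriate here because the IMG cells are small. A minimal stdlib sketch of the two-sided test (summing the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed table) applied to the counts reported above:

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]]."""
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def prob(k: int) -> float:
        # Hypergeometric probability of the table with cell (1,1) = k
        return comb(r1, k) * comb(r2, c1 - k) / comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Sum all tables at most as probable as the observed one
    return sum(prob(k) for k in range(lo, hi + 1)
               if prob(k) <= p_obs * (1 + 1e-9))

# Year one: 4 of 8 IMGs vs. 15 of 219 CMGs flagged
p_year1 = fisher_exact_two_sided(4, 4, 15, 204)
# Year two: 9 of 16 IMGs vs. 31 of 241 CMGs flagged
p_year2 = fisher_exact_two_sided(9, 7, 31, 210)
```

Both p values fall well below conventional significance thresholds, consistent with the reported P = .002 and P = .0001.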
Survey results indicated that overall the process was well received. Eighty-eight percent of candidates believed they could accurately portray themselves during the MMI, and 77% indicated that specialized medical knowledge was not needed to complete the stations. Seventy percent of candidates believed the stations provided sufficient time. Most important, only 8% of candidates said that the use of the MMI would discourage their applying. Of the assessors, 90% believed they received a reasonable portrayal of the candidate's abilities, and 74% believed the MMI outperformed the traditional interview method.
Assessment methods are generally evaluated in at least four areas: reliability, feasibility, acceptability, and validity. This study's results provide support for the MMI selection protocol in the context of postgraduate training in the first three of these areas. The MMI process demonstrated reliability across multiple specialty programs. While reliability could theoretically have been higher, the first two years of the study demonstrated acceptable results within the feasibility constraints, particularly when compared with the traditional interview. Additionally, acceptability was demonstrated in the eyes of both the applicants and assessors. Personal communication with the program directors of the three programs involved indicated that an MMI requires no more resources than the traditional interview process, save finding adequate physical space. The results of these studies indicate that the MMI may allow postgraduate admissions, regardless of specialty, to reap benefits similar to those in undergraduate admissions. However, predictive validity must still be established for a postgraduate sample.17 Over 95% of the candidates who completed the MMI for this study agreed to allow the authors to compare their MMI results with their Medical Council of Canada Qualifying Exam Part II, written in the second year of their residency. Portions of this licensing exam were developed with the CanMEDS roles in mind.
The generalizability analysis indicated that overall test reliability was in the range observed in the other studies of the MMI. As expected, internal consistency across items was high, but correlation across stations was low.20 The reliability of the MMI increases with the number of stations, and the number of stations depends on individual program resource availability.
Evaluation of the variance components indicates that more error variance is attributable to stations, rather than item or rater, suggesting that maximizing the number of stations is more beneficial than increasing the number of raters per station.
The MMI stations used to date required no specific medical knowledge, reflecting competencies expected of all physicians, and were developed to apply across various specialties. The consistency of the findings across programs suggests that this effort was successful. Combining a better selection process with better training protocols during residency training could aid in elevating the profession by encompassing a more holistic perspective of a competent practitioner.
The authors are grateful to the Medical Council of Canada for funding this study.
Other disclosures: None.
Ethical approval: The McMaster University and University of Alberta internal review boards both approved this study.
1 ACGME Outcome Project, Accreditation Council for Graduate Medical Education; American Board of Medical Specialties. Toolbox of Assessment Methods. Available at: http://www.acgme.org/Outcome/assess/Toolbox.pdf. Accessed June 11, 2010.
2 Skills for the New Millennium: Report of the Societal Needs Working Group–CanMEDS 2000 project. Ann R Coll Physicians Surg Can. 1996;29:206–216.
4 Mangione S, Kane GC, Caruso JW, Gonnella JS, Nasca TJ, Hojat M. Assessment of empathy in different years of internal medicine training. Med Teach. 2002;24:370–373.
5 Patenaude J, Niyonsenga T, Fafard D. Changes in students' moral development during medical school: A cohort study. CMAJ. 2003;168:840–844.
6 Bandiera G, Regehr G. Evaluation of a structured application assessment instrument for assessing applications to Canadian postgraduate training programs in emergency medicine. Acad Emerg Med. 2003;10:594–598.
7 Provan JL, Cuttress L. Preferences of program directors for evaluation of candidates for postgraduate training. CMAJ. 1995;153:919–923.
8 Bandiera G, Regehr G. Reliability of a structured interview scoring instrument for a Canadian postgraduate emergency medicine training program. Acad Emerg Med. 2004;11:27–32.
9 Mallott D. Interview, dean's letter, and affective domain issues. Clin Orthop Relat Res. 2006;449:56–61.
10 Brown E, Rosinski EF, Altman DF. Comparing medical school graduates who perform poorly in residency with graduates who perform well. Acad Med. 1993;68:806–808.
11 Erlandson EE, Calhoun JG, Barrack FM, et al. Resident selection: Applicant selection criteria compared with performance. Surgery. 1982;92:270–275.
12 Hojat M, Connella JS, Veloski J, Erdmann JB. Is the glass half full or half empty? A reexamination of the associations between assessment measures during medical school and clinical competence after graduation. Acad Med. 1993;68(2 suppl):S69–S76.
13 Papp KK, Polk HC Jr, Richardson JD. The relationship between criteria used to select residents and performance during residency. Am J Surg. 1997;173:326–329.
14 Eva KW, Rosenfeld J, Reiter HI, Norman GR. An admissions OSCE: The multiple mini-interview. Med Educ. 2004;38:314–326.
15 Harden RM, Gleeson FA. Assessment of clinical competence using an objective structured clinical examination (OSCE). Med Educ. 1979;13:41–54.
16 Eva KW, Reiter HI, Rosenfeld J, Norman GR. The ability of the multiple mini-interview to predict preclerkship performance in medical school. Acad Med. 2004;79(10 suppl):S40–S42.
17 Eva KW, Reiter HI, Trinh K, Wasi P, Rosenfeld J, Norman GR. Predictive validity of the multiple mini-interview for selecting medical trainees. Med Educ. 2009;43:767–775.
18 Salvatori and Stratford, unpublished. [Contact authors for more information.]
19 Dore KL, Hanson M, Reiter HI, Blanchard M, Deeth K, Eva KW. Medical school admissions: Enhancing the reliability and validity of an autobiographical screening tool. Acad Med. 2006;81(10 suppl):S70–S73.
20 Elstein AS, Shulman LS, Sprafka SA. Medical Problem-Solving: An Analysis of Clinical Reasoning. Cambridge, Mass: Harvard University Press; 1978.