The current crisis in providing effective and accessible health care in the United States has spawned a number of dual-degree leadership programs for medical undergraduates.1 In 2005, the University of California (UC) launched an ambitious initiative, the Program in Medical Education (PRIME), to increase enrollment in its medical schools in order to address the needs of California's disadvantaged populations.2,3 In 2007, at the David Geffen School of Medicine at UCLA, UCLA-PRIME was developed as a five-year dual-degree program focused on developing leadership skills in 18 medical students per year whose career goals would be to improve health care for the disadvantaged and medically underserved. UCLA-PRIME received 980 applications for the 18 available seats in its inaugural class. Clearly, given the specific and unique goals of the program, a new admissions process and selection mechanism were needed to identify students with leadership potential who would remain committed to practicing in disadvantaged communities.
Given the low attrition rates in U.S. medical schools, it has been argued by Eva and Reiter4 that the admissions process is the most important test in one's medical career; once admitted, the vast majority of candidates will become practicing physicians. The selection of future physicians, however, often fails on several counts.4 First, the cognitive record of the applicant, that is, grade point average (GPA) and Medical College Admission Test (MCAT) scores, commonly overrides any consideration of noncognitive attributes in decisions to admit.5 Second, the noncognitive qualities sought in applicants are unclear, remain implicit, and are not necessarily agreed on by stakeholders. Third, even if a set of desirable noncognitive qualities for candidates is clear and agreed on, reliable and valid assessment methods are scarce. This is particularly true for characteristics such as altruism, empathy, and leadership. Furthermore, the entire admissions process is rarely transparent or uniformly applied.
For years, interviews have been considered part and parcel of the admissions process in almost all medical schools in the United States and Canada. Besides serving as a recruitment tool, interviews purportedly provide an assessment of the applicant that complements the information gleaned from MCAT scores, GPA, and other cognitive measures.6 Unfortunately, admissions interviews are, like many other assessments, prone to “context specificity.”7 That is, the performance of an applicant during the interview may depend to an important extent on the particular interviewer, the specific questions asked, or other factors irrelevant to the applicant's suitability. Indeed, Kreiter and colleagues8 studied the variance components of admissions interview scores and found that the variance component attributable to applicants was smaller than the variance component attributable to the applicant-by-occasion interaction. These and similar findings imply that traditional interviews may have inadequate reliability and, thus, questionable validity. Methods proposed to improve the reliability of the admissions interview include structuring the interview, training interviewers, or involving more than one interviewer9–11 and/or using “medical judgment vignettes.”10,11
A more substantive departure from the traditional admissions interview was described in studies by Eva and colleagues12,13 at the Michael G. DeGroote Faculty of Medicine at McMaster University. In lieu of one or two long admissions interviews, they proposed a series of multiple mini-interviews (MMIs), a circuit of 8 to 12 short, structured interviews in which applicants briefly interact with interviewers. Immediately after a brief, prescribed encounter with each applicant, the interviewer evaluates and scores the applicant. Scores are combined for each applicant across the 8 to 12 encounters to provide an overall assessment. Thus, by involving a sufficient number of questions and interviewers, the MMI addresses the problem of “context specificity.” In their initial study, Eva et al12 demonstrated that, with 12 stations, each with a single interviewer, a reliability (generalizability index) of 0.85 could be achieved. Since then, the MMI has been gaining in popularity with medical schools around the world,14–20 and modifications have been suggested.21
These findings suggest that the psychometric properties of the MMI are superior to those of a traditional admissions interview. The initial MMI study by Eva and colleagues,12 however, was conducted on graduate students, a relatively heterogeneous group compared with a pool of medical school applicants. This may have inflated their reliability results. As Eva and colleagues22 put forth in a subsequent article, “the reliability and validity of any assessment strategy is dependent on the context in which the strategy is applied and the content of the assessment.” In other words, the promising psychometric properties of the MMI may not necessarily hold up for a more homogenous pool of applicants who have been selected for consideration on the basis of a more specific set of attributes. The purpose of the current study was to investigate the reliability and acceptability of an MMI when used as a central component of the admissions process for the UCLA-PRIME program, which was narrowly focused on selecting students with similar goals, prior experiences, and capacities matched to serving as leaders in improving health care for disadvantaged populations. Specifically, in this study we describe the implementation of the MMI in two consecutive years, including measures put in place to enhance the reliability of the assessment.
The development of a new admissions process for UCLA-PRIME students was modeled after an approach developed by the Michael G. DeGroote Faculty of Medicine at McMaster University.4 First, we generated an inventory of the desirable characteristics of UCLA-PRIME candidates with a focus on leadership and commitment to disadvantaged populations using a Delphi approach among stakeholders (program administrators, deans, faculty members, and community leaders). We described the details of the Delphi study elsewhere.23 Characteristics that were deemed essential for the PRIME program included commitment to and experience with underserved populations, cultural sensitivity, leadership potential, maturity, and being an effective team member. We selected 12 MMI stations from the pool of stations developed at McMaster University with these characteristics in mind. We implemented the MMI in 2009 (after piloting it in 2008) and then again with some procedural changes in 2010.
Study 1 (2009)
In 2009, we created a panel of 28 interviewers consisting of 18 faculty members, 6 medical students, and 4 community members. Most of the interviewers were drawn from the UCLA-PRIME advisory board, which is made up of faculty and administrators who have been involved with improving health care in disadvantaged communities. None had seen the applicants' application materials. We introduced the interviewers to the MMI process in a 90-minute orientation session several weeks prior to the first round of interviews. We explained the rationale behind replacing the traditional interview process with MMIs guided by the seminal article by Eva et al,12 and we reviewed the MMI interview and scoring process step-by-step.
On the day of the MMI, we handed out the scenarios and a list of applicants to the interviewers. The interviewers practiced the scenarios with each other before the applicants arrived. We instructed the interviewers to rate the overall performance of the applicant using a seven-point Likert scale (1 = unsatisfactory; 7 = outstanding). Specifically, we asked them to “consider the applicant's communication skills, strength of the argument, and suitability for the medical profession.” We strongly encouraged the interviewers to use the full rating scale, recognizing that interviewees had been selected from a very large pool of applicants and exceeded all other admissions requirements. Interviewers scored the applicants immediately after each interview. They could adjust their scoring after they completed interviewing the entire cohort. A total score was calculated for each applicant by summing the scores for individual stations. Thus, total scores could range from 12 through 84.
In 2009, the MMIs took place on three separate weekend days in the internal medicine outpatient suite of the UCLA Medical Center. We sent a copy of Eva and colleagues'12 paper to the applicants ahead of time to familiarize them with the MMI process. We used 13 adjacent examination rooms, one of which functioned as a “rest station.” Applicants cycled through the 13-station circuit, preparing for each interview for 2 minutes and then spending 8 minutes with each interviewer. We interspersed two short restroom breaks. Thus, it took 145 minutes to complete the full MMI cycle. We interviewed two cohorts of 13 applicants on each day while keeping cohorts separate from one another throughout the day. All faculty, staff, and applicants signed statements of confidentiality.
Applicants and interviewers evaluated their MMI experiences anonymously with an instrument that was based on work by Eva et al.13 We also studied the preliminary psychometric properties by conducting a generalizability theory24 analysis using the GENOVA system.25 All other analyses were conducted using SPSS version 17 (SPSS, Inc, Chicago, Illinois). The UCLA Office for the Protection of Human Subjects reviewed and approved this research.
Study 2 (2010)
In 2010, we repeated the MMI with a new cohort of 78 applicants. A total of 31 interviewers participated, of whom 11 had participated in the 2009 study. The applicants, but not the interviewers, evaluated the MMI experience with the same instrument used in 2009. All procedures were identical to 2009 with the exception of four changes that we implemented in an effort to improve the reliability of the assessment.
First, we moved the MMI venue to our education building and used adjacent rooms typically used for small-group teaching of medical students. The applicants could familiarize themselves with the layout of the facility before commencing the MMI.
Second, we replaced an easy station (Station 9, “How did you prepare for this interview?”) with a perhaps more challenging task in which applicants were asked to describe student characteristics desirable for the PRIME program. Difficulty level was not assessed formally but was suggested by the fact that interviewers had difficulty differentiating performance of the applicants in the original station. The remaining 11 stations were the same as in 2009.
Third, we asked the interviewers to rate the performance of an applicant relative to the pool of all applicants. Accordingly, we changed the seven-point Likert-scale anchors to a normative scoring rubric (1 = bottom 15%; 4 = middle 50%; 7 = top 15%). The scoring procedure was otherwise the same as in 2009. That is, interviewers scored applicants immediately after each interview and could make adjustments after completing the entire cohort.
Finally, we changed the wording of two stations that previously led to confusion among some applicants. In 2009, one station asked the applicants to discuss “surgeons' mortality rates.” A few applicants proceeded to discuss the mortality rate of surgeons and not their patients. In 2010, we changed the prompt to “surgeons' patient mortality rates.” In another station, we replaced the term “SARS epidemic” with the more recent “H1N1 epidemic” but left the crux of the station the same.
Study 1 (2009)
A total of 76 applicants participated in the MMI across three days. Two applicants cancelled their participation at the last moment. Demographic information of the applicants is summarized in Table 1. Twenty-six of the 28 interviewers (93%) and all 76 applicants completed the postassessment questionnaire. Responses of the applicants and the interviewers are listed in Tables 2 and 3, respectively. Using a seven-point Likert scale (1 = definitely not; 7 = definitely), applicants indicated that the MMI process was free of cultural bias (average 6.3) or gender bias (average 6.6). Only 3 applicants (4%) felt that “a lot” of specialized knowledge was required. The timing of each station (eight minutes) was seen by applicants as adequate for portraying one's abilities, although 5 (6.6%) indicated that they had “too little” time. In contrast, only 1 interviewer (3.8%) felt that the time was more than enough. Five applicants (6.6%) rated the experience as “definitely stressful.” All interviewers, however, seemed to enjoy the experience; 23 (88.5%) indicated that they would “very likely” participate in the MMI again in the future.
We found no differences in total MMI scores between men and women or between applicants who did and did not identify as economically or educationally disadvantaged. The distribution of the total MMI scores, however, was skewed toward the maximum score, suggesting that interviewers had difficulty using the lower range of the rating rubric (Figure 1).
To estimate the reliability of the MMI process, we calculated the variance components (Table 4). We used a random P × I design where applicants were crossed with MMI stations. The generalizability coefficient was estimated to be 0.59 (see Crossley et al26 for a theoretical background of generalizability theory).
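To make the estimation concrete, the variance components for a crossed p × i design can be recovered from the applicant-by-station score matrix with a standard ANOVA decomposition. The sketch below is illustrative only and is not the GENOVA analysis we ran; the function name and the toy score matrix are hypothetical.

```python
import numpy as np

def g_study(scores):
    """Variance components and G coefficient for a crossed p x i design
    (applicants x stations, one observation per cell)."""
    scores = np.asarray(scores, dtype=float)
    n_p, n_i = scores.shape
    grand = scores.mean()
    p_means = scores.mean(axis=1)  # one mean per applicant
    i_means = scores.mean(axis=0)  # one mean per station

    # Sums of squares for the two main effects and the residual
    ss_p = n_i * ((p_means - grand) ** 2).sum()
    ss_i = n_p * ((i_means - grand) ** 2).sum()
    ss_res = ((scores - grand) ** 2).sum() - ss_p - ss_i

    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_res = ss_res / ((n_p - 1) * (n_i - 1))

    # Expected-mean-squares solutions, truncated at zero
    var_res = ms_res
    var_p = max((ms_p - ms_res) / n_i, 0.0)
    var_i = max((ms_i - ms_res) / n_p, 0.0)

    # Relative G coefficient for the full n_i-station circuit
    g = var_p / (var_p + var_res / n_i)
    return {"person": var_p, "station": var_i, "residual": var_res, "g": g}
```

With real data, the resulting components play the role of the person, station, and residual variances, and g is the generalizability coefficient for the full circuit.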
Study 2 (2010)
A total of 78 applicants participated. Their demographic information was highly similar to the 2009 cohort (Table 1). All but one applicant completed the postinterview survey. Their responses were comparable to the previous cohort (Table 2), with one notable exception: The 2010 cohort rated the MMI process as less stressful (average 3.7 in 2010 versus 4.2 in 2009; chi-square = 4.1; df = 1; P < .05).
The distribution of the total MMI scores is displayed in Figure 1. The average score dropped significantly from 60.6 in 2009 to 54.7 in 2010 (t test, t = 4.1; df = 152; P < .001). The dispersion of the scores, however, was larger in 2010 compared with 2009 (Levene test, F = 7.0; P < .01). This suggests that interviewers were better able to use the full range of the revised Likert scale.
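The year-to-year comparison of mean scores is a standard pooled two-sample t test; the minimal sketch below (the function name is ours) reproduces the reported degrees of freedom for cohorts of 76 and 78.

```python
import numpy as np

def pooled_t(a, b):
    """Student t statistic with pooled variance for two independent samples
    (e.g., total MMI scores from the 2009 and 2010 cohorts)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    # Pooled variance estimate across both cohorts
    sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    t = (a.mean() - b.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2  # 76 + 78 - 2 = 152 degrees of freedom
```

The Levene test for the difference in dispersion follows the same pattern, applied to absolute deviations from each group's center.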
The variance components of the MMI are listed in Table 4. An estimated 16% of the variance between scores could be attributed to consistent candidate-to-candidate variation across MMI stations and interviewers (the “true” candidate score)—up from 10% in 2009. The remaining 84% of the variance is from unwanted sources, such as interviewer leniency and interaction between candidate and MMI question. (Note that in this study we were not able to separate out these sources of variance from the error term. Interviewers and stations were confounded, and each applicant participated in each station only once.) The generalizability coefficient was 0.71.
Using the variance components obtained in 2009 and 2010, we conducted a D study (Figure 2) to estimate the reliability of the assessment with varying numbers of stations and with and without Station 9 (the station replaced in 2010 with a more challenging task). Figure 2 clearly indicates that the reliability of the assessment in 2010 was superior to that noted in 2009. We estimated that the number of stations in 2009 would have needed to increase to 20 to reach the same reliability we achieved in 2010 with only 12 stations. Also, the inclusion of Station 9 in 2009 depressed the reliability of the overall assessment, whereas its replacement in 2010 did not affect the reliability.
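The D-study projection is the standard generalizability formula, Eρ² = σ²p / (σ²p + σ²res/n′), evaluated at varying station counts n′. A minimal sketch using the approximate single-station variance proportions reported above (the function name is ours):

```python
def projected_g(var_person, var_residual, n_stations):
    """D-study projection: generalizability coefficient for a hypothetical
    number of stations, given single-observation variance components."""
    return var_person / (var_person + var_residual / n_stations)

# Approximate single-station variance proportions reported above:
# 10%/90% in 2009 and 16%/84% in 2010.
g_2009 = projected_g(0.10, 0.90, 12)  # roughly 0.57-0.59
g_2010 = projected_g(0.16, 0.84, 12)  # roughly 0.70-0.71
```

Evaluating `projected_g(0.10, 0.90, 20)` yields roughly 0.69, consistent with the estimate that the 2009 circuit would have needed 20 stations to match the 2010 reliability.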
Our study showed that the MMI can be effectively used to assess a homogeneous group of applicants and that its reliability can be enhanced with minor changes in protocol. Our implementation of the MMI was part of a selection process for a new undergraduate leadership program focusing on health care for disadvantaged populations. The process was well received by applicants and interviewers. Despite its novelty, both applicants and interviewers felt well prepared for the MMI experience and considered it a fair process. Most applicants and interviewers felt that the MMI did not require specialized knowledge, supporting the supposition that the MMI assessed “noncognitive” qualities other than aptitude and knowledge. The process was perceived to be free of gender and cultural bias, a perception that was corroborated by the data. We found no differences in scores between applicants with or without self-declared disadvantaged backgrounds, suggesting that the MMI provides a level playing field for applicants. A similar finding was reported in another study comparing aboriginal and nonaboriginal applicants.27
Reliability of the first MMI implementation in 2009 was 0.59, lower than reported elsewhere. Typically, estimated reliabilities for 12 stations with one interviewer per station have been found to be between 0.69 and 0.85.12,22 Our interviewees were a relatively homogenous group of applicants because initial screening considered primary and secondary application information that demonstrated a strong commitment to disadvantaged populations. This homogeneity and the smaller sample size may have resulted in comparatively less variability among the interviewees and could have suppressed the reliability of the overall MMI assessment as estimated by the generalizability coefficient. And, indeed, some of our interviewers in the 2009 study stated they found it difficult to differentiate between members of our select group of applicants. Still, Eva et al22 demonstrated high reliability (0.80) even in a small, homogeneous group of residents, underscoring that the psychometric properties of an assessment depend on the context and population in which it is applied and do not necessarily transfer from one setting to another.
We made a few changes in the 2010 implementation of the MMI process that, taken together, seemed to have contributed to a substantial improvement in reliability. One such change was the replacement of a seemingly “easy” station (determined at face value) with a more challenging one. To facilitate discrimination between applicants, the stations must have an optimal level of difficulty. Item response theory suggests that items of medium difficulty discriminate best across a broad range of a latent trait, whereas very easy or very difficult items discriminate well only at the extremes.28 In other words, if the task that a candidate is asked to carry out is too easy (or too difficult, for that matter), an interviewer will find it difficult to rank-order candidates. And, indeed, our analysis showed that an easy station simply “added noise to the signal.” When we recalculated the reliability excluding Station 9, the reliability improved; it did not decrease, as one would expect when taking away one assessment point. Other institutions interested in evaluating their MMI processes could consider reanalyzing reliability while holding out one specific station at a time to identify a relatively ineffective or, as in our case, a detrimental station.
In our 2009 study, interviewers found that some of the Likert anchors on the scoring sheet (1 = unsatisfactory, 3 = borderline, 5 = satisfactory, etc.) rarely applied to a highly select group of mostly excellent applicants. In our 2010 study, the interviewers seemed better able to use the full range of the rating scale after we changed its anchors to “bottom 15%,” “bottom 30%,” “middle 50%,” etc., and asked interviewers to rate an applicant's performance relative to the pool of all applicants. Thus, we encouraged rank-ordering of candidates with a more normative scoring approach. Interviewers could adjust their scoring after having seen a cohort of 13 applicants (as was allowed in the 2009 study as well). This approach seemed to work well even with a relatively homogeneous group of applicants. The MMI scores had a wider range and a lower average compared with 2009, even though the two cohorts of candidates were comparable.
Another factor that may have enhanced the reliability of the MMI was that 2010 applicants found the MMI process significantly less stressful than the 2009 cohort did. The change of venue to a less intimidating environment (an education building as opposed to an internal medicine suite) and the tour through the MMI areas beforehand may have contributed to this. Because “stress” does not affect all applicants equally, it may increase irrelevant variation in scores (specifically, the interaction between applicants and stations and interviewers), impeding reliable rank-ordering of candidates. As institutions often need to resort to multiple venues to accommodate concurrent MMI circuits, they may want to study to what extent the location affects MMI performance.
We found that implementing MMIs was feasible but a daunting task nonetheless. The preparation involves blueprinting the MMI circuit, finding appropriate space, securing the availability of interviewers and support staff, and sequestering cohorts of applicants. Clearly, the MMI requires extensive human resources. In a recent cost-efficiency analysis, Rosenfeld et al29 found that MMI requires more upfront preparation (securing space, identifying appropriate interview questions, interviewer training, etc.) compared with the traditional interview process. This cost, however, was offset by considerably fewer hours required of each person to assess a pool of applicants. We would note that the time saving is even more considerable if the time spent by interviewers in writing reports and attending committee meetings in which applicants are discussed is taken into account. Currently, the David Geffen School of Medicine at UCLA is ramping up the MMI process to be used with all of its medical school applicants.
Our study has several limitations. First, we did not assess the validity of the MMI process, even though one could argue that blueprinting the MMI stations based on our Delphi study provided an acceptable level of content validity. Several studies have estimated the predictive validity of the MMI process.18,30,31 A recent study by Eva and colleagues22 correlated the MMI scores of 22 internal medicine residents with scores on Part II of the Medical Council of Canada Qualifying Examination (MCCQE). To obtain a medical license in Canada, residents must pass this OSCE-style exam in which they are observed interacting with standardized patients. The investigators found that the MMI scores were statistically predictive of the MCCQE Part II scores, in particular the patient-interaction subscore (r = 0.65). This finding was replicated with another cohort of 34 students for whom MMI scores had been obtained five years previously. In fact, the MMI was the only assessment among the admissions tools included that could predict MCCQE Part II scores at a statistically significant level. More studies assessing the long-term validity of the MMI need to be planned carefully in advance and carried out. If early performance of the first two UCLA-PRIME cohorts can be an indicator of the effectiveness of our selection process, the signs are promising. Both cohorts have generated remarkable projects with underserved communities, including a foot care clinic for the homeless on Skid Row in downtown Los Angeles and the rebuilding of a community center in an urban area stricken by gang violence.
Our study is further limited by the naturalistic environment in which we collected data and the fact that we did not systematically control the various factors that may have influenced the reliability of the MMI. It is certainly possible that factors beyond our control contributed to the improvement in reliability in 2010, such as a mostly different group of interviewers (only one-third participated in both studies). A strength of this study is that both applicants and interviewers found the MMI to be a fair and effective assessment and that the changes we implemented in 2010 did not adversely affect this perception. Moreover, our results suggest that these changes considerably improved the reliability of the assessment, which was already at a level superior to most traditional admissions interview practices.
Research in this area is hampered by the ubiquitous but ill-defined term “noncognitive characteristics.” As Norman32 pointed out, the umbrella term “noncognitive skills” is used to describe those characteristics that MCAT scores or GPAs do not reflect, such as tacit knowledge, communication skills, emotional intelligence, and stable personality traits. We feel that admissions committees must explicitly define those qualities they deem essential for a successful medical school career and subsequent practice and that are in concordance with the institution's philosophy and goals. In our case, we determined the qualities in relation to the unique mission of the UCLA-PRIME program through a Delphi process. Valid and reliable assessments, then, need to be identified or developed for each of those characteristics. The MMI may or may not be appropriate for some of these characteristics. For instance, should certain stable personality traits (e.g., based on the Big Five: Neuroticism, Extroversion, Openness to Experience, Agreeableness, and Conscientiousness) be deemed essential for the “ideal” physician or medical student, they may be better assessed with validated paper-and-pencil instruments. In fact, such instruments have been shown to have predictive validity for medical school performance in a recent Belgian study.33
We describe an admissions process designed to select future physicians with leadership skills and commitment to serve vulnerable populations. First, stakeholders were involved in prioritizing selection criteria they felt were essential for physicians to be able to meet the needs of the underserved. A circuit of 12 MMI stations was used to screen applicants for these criteria. MMI results were then combined with other admission information to yield a selection decision. This process was seen as transparent and fair by both applicants and interviewers. The reliability of the MMIs was enhanced by, among other things, clarifying the scoring and replacing an easy station with a more challenging task. It remains to be seen whether or not this selection process was effective. Building on our experience with assessing the long-term outcomes of the UCLA/Charles R. Drew University Medical Education Program,34 we will continue to monitor the outcomes of this selection process by tracking the academic performance, clinical competence, community engagement, and career choices of applicants for many years to come.
The authors wish to thank Dr. LuAnn Wilkerson for her comments on an earlier version of this manuscript and Ms. Michelle Vermillion and Ms. Emma Ledesma for their help with the preparation of this manuscript.
This study was approved by the UCLA Office for the Protection of Human Subjects and exempted from full review.
1Crites GE, Ebert JR, Schuster RJ. Beyond the dual degree: Development of a five-year program in leadership for medical undergraduates. Acad Med. 2008;83:52–58. http://journals.lww.com/academicmedicine/Fulltext/2008/01000/Beyond_the_Dual_Degree__Development_of_a_Five_Year.8.aspx. Accessed April 28, 2011.
2Nation CL, Gerstenberger A, Bullard D. Preparing for change: The plan, the promise, and the parachute. Acad Med. 2007;82:1139–1144. http://journals.lww.com/academicmedicine/Fulltext/2007/12000/Preparing_for_Change__The_Plan,_the_Promise,_and.5.aspx. Accessed April 28, 2011.
3Manetta A, Stephens F, Rea J, Vega C. Addressing health care needs of the Latino community: One medical school's approach. Acad Med. 2007;82:1145–1151. http://journals.lww.com/academicmedicine/Fulltext/2007/12000/Addressing_Health_Care_Needs_of_the_Latino.7.aspx. Accessed April 28, 2011.
4Eva KW, Reiter HI. Where judgement fails: Pitfalls in the selection process for medical personnel. Adv Health Sci Educ Theory Pract. 2004;9:161–174.
5Johnson L, Mitchel K, Boyd C, Solow C. Holistic Review in Medical School Admissions: Making It Real. Chicago, Ill: Central Group on Student Affairs; 2009.
6Edwards JC, Johnson EK, Molidor JB. The interview in the admission process. Acad Med. 1990;65:167–177. http://journals.lww.com/academicmedicine/Abstract/1990/03000/The_interview_in_the_admission_process.8.aspx. Accessed April 28, 2011.
7Eva KW. On the generality of specificity. Med Educ. 2003;37:587–588.
8Kreiter CD, Yin P, Solow C, Brennan RL. Investigating the reliability of the medical school admissions interview. Adv Health Sci Educ Theory Pract. 2004;9:147–159.
9Albanese MA, Snow MH, Skochelak SE, Huggett KN, Farrell PM. Assessing personal qualities in medical school admissions. Acad Med. 2003;78:313–321. http://journals.lww.com/academicmedicine/Fulltext/2003/03000/Assessing_Personal_Qualities_in_Medical_School.16.aspx. Accessed April 28, 2011.
10Donnon T, Oddone-Paolucci E, Violato C. A predictive validity study of medical judgment vignettes to assess students' noncognitive attributes: A 3-year prospective longitudinal study. Med Teach. 2009;31:e148–e155.
11Donnon T, Paolucci EO. A generalizability study of the medical judgment vignettes interview to assess students' noncognitive attributes for medical school. BMC Med Educ. 2008;8:58.
12Eva KW, Rosenfeld J, Reiter HI, Norman GR. An admissions OSCE: The multiple mini-interview. Med Educ. 2004;38:314–326.
13Eva KW, Reiter HI, Rosenfeld J, Norman GR. The relationship between interviewers' characteristics and ratings assigned during a multiple mini-interview. Acad Med. 2004;79:602–609. http://journals.lww.com/academicmedicine/Fulltext/2004/06000/The_Relationship_between_Interviewers_.21.aspx. Accessed April 28, 2011.
14Dodson M, Crotty B, Prideaux D, Carne R, Ward A, de Leeuw E. The multiple mini-interview: How long is long enough? Med Educ. 2009;43:168–174.
15Lemay JF, Lockyer JM, Collin VT, Brownell AK. Assessment of non-cognitive traits through the admissions multiple mini-interview. Med Educ. 2007;41:573–579.
16Brownell K, Lockyer J, Collin T, Lemay JF. Introduction of the multiple mini interview into the admissions process at the University of Calgary: Acceptability and feasibility. Med Teach. 2007;29:394–396.
17Harris S, Owen C. Discerning quality: Using the multiple mini-interview in student selection for the Australian National University Medical School. Med Educ. 2007;41:234–241.
18Hofmeister M, Lockyer J, Crutcher R. The multiple mini-interview for selection of international medical graduates into family medicine residency education. Med Educ. 2009;43:573–579.
19Kumar K, Roberts C, Rothnie I, du Fresne C, Walton M. Experiences of the multiple mini-interview: A qualitative analysis. Med Educ. 2009;43:360–367.
20Humphrey S, Dowson S, Wall D, Diwakar V, Goodyear HM. Multiple mini-interviews: Opinions of candidates and interviewers. Med Educ. 2008;42:207–213.
21Harding DH. Fine tuning the MMI. Paper presented at: International Ottawa Conference on Medical Education; 2008; Melbourne, Australia.
22Eva KW, Reiter HI, Trinh K, Wasi P, Rosenfeld J, Norman GR. Predictive validity of the multiple mini-interview for selecting medical trainees. Med Educ. 2009;43:767–775.
23Doyle HL, Vermillion M, Uijtdehaage S. Developing an innovative admission policy for a new leadership program in medicine. Paper presented at: Annual Meeting of the American Educational Research Association; 2009; San Diego, Calif.
24Brennan RL. Generalizability Theory. New York, NY: Springer; 2001.
25Crick J, Brennan RL. A Generalized Analysis of Variance (GENOVA) System. Iowa City, Iowa: The American College Testing Program; 1983.
26Crossley J, Russell J, Jolly B, et al. “I'm pickin' up good regressions”: The governance of generalisability analyses. Med Educ. 2007;41:926–934.
27Moreau K, Reiter H, Eva KW. Comparison of aboriginal and nonaboriginal applicants for admissions on the Multiple Mini-Interview using aboriginal and nonaboriginal interviewers. Teach Learn Med. 2006;18:58–61.
28Hambleton RK, Swaminathan H, Rogers HJ. Fundamentals of Item Response Theory. Newbury Park, Calif: Sage Publications; 1991.
29Rosenfeld JM, Reiter HI, Trinh K, et al. A cost efficiency comparison between the multiple mini-interview and traditional admissions interviews. Adv Health Sci Educ Theory Pract. 2008;13:43–58.
30Eva KW, Reiter HI, Rosenfeld J, Norman GR. The ability of the multiple mini-interview to predict preclerkship performance in medical school. Acad Med. 2004;79(10 suppl):S40–S42. http://journals.lww.com/academicmedicine/Fulltext/2004/10001/The_Ability_of_the_Multiple_Mini_Interview_to.12.aspx. Accessed April 28, 2011.
31Reiter HI, Eva KW, Rosenfeld J, Norman GR. Multiple mini-interviews predict clerkship and licensing examination performance. Med Educ. 2007;41:378–384.
32Norman G. Non-cognitive factors in health sciences education: From the clinic floor to the cutting room floor. Adv Health Sci Educ Theory Pract. 2010;15:1–8.
33Lievens F, Ones DS, Dilchert S. Personality scale validities increase throughout medical school. J Appl Psychol. 2009;94:1514–1535.
34Ko M, Edelstein RA, Heslin KC, et al. Impact of the University of California, Los Angeles/Charles R. Drew University Medical Education Program on medical students' intentions to practice in underserved areas. Acad Med. 2005;80:803–808. http://journals.lww.com/academicmedicine/Fulltext/2005/09000/Impact_of_the_University_of_California,_Los.4.aspx. Accessed April 28, 2011.