Despite improvements in health indicators of U.S. racial and ethnic populations, disparities in care continue to exist.1 Within the context of medical education, there has been a call for broad strategies extending beyond measures of the compositional diversity of trainees or representational ratios.2 Enhancing diversity in the health care workforce has been proposed as one approach to address those group disparities.3 Cohen et al3 cite increasing access and ensuring optimal management of the health care system, in addition to issues of equity and fairness, as pragmatic reasons for attaining workforce diversity.
While traditional measures of academic competencies, such as undergraduate grade point averages (GPAs) and Medical College Admission Test (MCAT) scores, are commonly used, reliance on traditional approaches overlooks the importance of personal competencies in predicting success within medical school.4–6 Considerable research has examined different approaches to identify and assess personal competencies in the admissions process.7,8 Limitations in traditional assessment methods—for instance, letters of evaluation9–13 and autobiographical assessments5,14—have been identified. In response, some medical schools across North America and Europe have adopted the Multiple Mini Interview (MMI) to increase the reliability and predictive validity of the interview process.15–18
The MMI is a multiple-station assessment process. Applicants complete a series of brief interviews with a number of different raters. Research has shown that MMI scores are more reliable than those obtained from traditional interviews,15,19,20 are able to predict clerkship performance and national licensing examination performance,18–20 and can be designed to reflect the values of the individual medical school.21 Furthermore, applicants’ performance on the MMI is unrelated to gender, community of origin size, and income,17 suggesting that the MMI may promote diversity within medical school by increasing the number of accepted applicants from groups underrepresented in medicine (URIM). Increasing trainee diversity is important for shaping educational quality for all students, increasing access to health care in rural, inner-city, and minority populations, and accelerating advances in medical and public health research.22 Reliance on GPAs and MCAT scores may severely constrain diversity within medicine,23,24 and researchers have argued that basing admission selections on MMI scores may promote applicant diversity.17,25
Changes in accreditation requirements reflect the enhanced attention to diversity expected of all medical schools.26 The Holistic Review Project has articulated a model that promotes the consideration of both academic and personal competencies in the application process.27 In response, Rutgers Robert Wood Johnson Medical School (RWJMS) began to implement a more holistic screening process; after initial screening of academic and experiential criteria, applicants are assessed on a series of core personal competencies in an MMI setting. Importantly, applicants are admitted exclusively on the basis of their MMI scores. MCAT data support this reliance on academic thresholds. Using data from 11 schools, a study by Julian28 demonstrated that the risk of academic difficulties remained very low until entering students’ MCAT scores fell below 8 for biological sciences, 7 for physical sciences, and 6 for verbal reasoning. These findings suggest that for students exceeding acceptable academic thresholds, selection procedures should be less concerned with academic performance and more concerned with core personal competencies performance. Longitudinal performance in academics and experiential preparation is given significant weight at RWJMS, as thousands of applicants are screened for inclusion in the interview cohort. Our basic tenet is that once an applicant meets baseline academic and experiential criteria, responses to how s/he has behaved in the past or to how s/he might behave in a particular situation, as measured by an MMI, would be as good a predictor of future performance as, and perhaps a better predictor than, a combination of academic, interview, and screening scores.
In support of this hypothesis, the first cohort at RWJMS whose final admissions decision was based solely on MMI scores performed equivalently in first- and second-year courses and on United States Medical Licensing Examination (USMLE) Step 1 relative to previous cohorts admitted on the basis of traditional interviews, academic scores, and experiences. Additionally, the MMI scores from this first cohort predicted scores for students’ core personal competencies assessed in medical school (reliability, integrity, service/sensitivity to diversity).29
To our knowledge, this is the first medical education program to implement an MMI to specifically evaluate applicants’ core personal competencies, as defined by the Association of American Medical Colleges Committee on Admissions (AAMC COA),4 and to base final admissions decisions exclusively on MMI performance. We were interested in examining whether the acceptance of applicants based solely on MMI performance enhances diversity among medical students. Specifically, we examined whether academic measures (GPA, MCAT), experience scores (service, clinical, and research [SCR]), and personal competencies scores (MMI) varied as a function of applicants’ self-reported race/ethnicity, and whether changes in the weighting of scores would impact diversity by altering the demographic composition of the entering classes. We define URIM applicants as those self-identifying as black/African American, Hispanic, Native American, Native Hawaiian, or Native Alaskan. None of the applicants in the three years of the study self-identified as a member of the latter three groups; as such, URIM applicants in this study are those who identified as either black/African American or Hispanic.
Setting, study population, application screening process
This is a retrospective study of previously collected and recorded data for the RWJMS admissions process for entering classes 2011–2013. The study was approved by the institutional review board of Rutgers University. The Hamilton Integrated Research Ethics Board of McMaster University (data analysis site) determined that deidentified secondary use of data was exempt from ethics review. We excluded applicants who participated in articulated programs (BA MD, Access Med Pipeline Program, and postbaccalaureate linkage) from the final dataset because the admissions procedures for these trainees differed. Over the three years, 72 (5%) applicants participated in articulated programs.
We determined that applicants screened for MMI were academically and experientially prepared, based on threshold criteria previously set by the RWJMS Admissions Committee (total GPA > 3.0, total MCAT > 22, MCAT biological science score > 8, and no other MCAT score < 6). For experiential preparation, a new behaviorally anchored scale was developed for screening. We scored service, clinical exposure, research, the personal essay, and letters of recommendation on a 1–5 Likert scale. The scale was developed so that a score of 3 is an acceptable score for an applicant. For example, a research rating of 5 would indicate culmination of the research experience in a peer-reviewed presentation or publication. With respect to service, regular involvement in a service organization would be rated 3, whereas founding a service organization would be rated 5. The sums of the screening scores were not used to rank applicants but served as threshold scores below which an interview would not be offered. An SCR score was developed to inform but not dictate screening decisions, as some students did not have research experience. We considered for interview only applicants who met the academic criteria and who had SCR, personal essay, and letters scores of at least 3. We did not revisit the GPA, MCAT, experiences screening scores, essays, and letters after applicants were selected for interview.
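The screening thresholds described above can be expressed as a short eligibility check. The following is a minimal sketch rather than the committee's actual procedure: the `Applicant` fields and the function name are hypothetical, the SCR rating is treated as a single 1–5 value (the text does not say how the three experience ratings are combined), and the cutoffs are copied from the criteria as stated.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    # Hypothetical field names chosen for this illustration.
    gpa: float
    mcat_total: int
    mcat_bio: int
    mcat_phys: int
    mcat_verbal: int
    scr: float      # service/clinical/research rating, 1-5 (single value assumed)
    essay: float    # personal essay rating, 1-5
    letters: float  # letters of recommendation rating, 1-5


def meets_interview_thresholds(a: Applicant) -> bool:
    """Return True if the applicant clears every screening threshold."""
    academic = (
        a.gpa > 3.0
        and a.mcat_total > 22
        and a.mcat_bio > 8
        # "no other MCAT score < 6"
        and min(a.mcat_phys, a.mcat_verbal) >= 6
    )
    # A rating of 3 is "acceptable"; all three ratings must reach it.
    experiential = min(a.scr, a.essay, a.letters) >= 3
    return academic and experiential
```

Because the scores act only as thresholds, the function returns a yes/no answer and deliberately produces no ranking.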
The MMI process, committee deliberations, admissions decisions
The MMI process at RWJMS consists of a six-station MMI. Each station consists of a behavioral descriptor or situational judgment-type interview stem addressing a specific AAMC COA core personal competency4 or combination of competencies. All interview stems are unique on a given interview day and written by one of the authors (C.A.T.). The MMI process at RWJMS employs only the 30 members of the standing committee, who participate in modified frame-of-reference training prior to the sessions. Extensive interviewer training allows for the assumption of adequate reliability with a six-station MMI.
In each station, interviewers evaluate applicants on the basis of communication, content/argument, and overall global impression using a behaviorally anchored 1–5 Likert scale. Table 1 shows the behaviorally anchored rating scale for communication.
A score of 3 indicates “acceptable for an entering medical student” for all three of the scales. We calculated mean scores for communication, argument/content, and global for each interviewee across the six stations, and we calculated the final MMI score by summing the three mean scores, with a maximum achievable score of 15.
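As an illustration of the scoring arithmetic, the sketch below averages each of the three scales across stations and sums the three means. The data layout (one dictionary of ratings per station) and the function name are assumptions made for this example.

```python
from statistics import mean

# The three rating scales used at every station.
SCALES = ("communication", "content", "global")


def mmi_score(stations):
    """Sum of the per-scale means across stations.

    stations: list of dicts, one per station, each holding a 1-5 rating
              for every scale in SCALES (hypothetical layout).
    """
    return sum(mean(s[scale] for s in stations) for scale in SCALES)


# An applicant rated "acceptable" (3) on every scale at all six stations.
all_threes = [{"communication": 3, "content": 3, "global": 3}] * 6
score = mmi_score(all_threes)  # three scale means of 3 each
```

With ratings capped at 5 on each of three scales, the ceiling of 15 follows directly.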
The Admissions Committee considered only the MMI score and made all offers of admission, rejection decisions, or placements on an alternate list exclusively on the basis of applicants’ MMI scores. The committee did not discuss applicants individually unless the applicant received a score below 3 in more than one station. After discussion of the content of the applicant’s response, the committee voted to keep the applicant at the current MMI score or to move the applicant to the alternate list or reject category.
We combined data from the three interview year cohorts. We calculated descriptive statistics as a function of self-reported race and ethnicity, specifically URIM as described above or non-URIM. We conducted Pearson correlational analyses to evaluate the relationship between GPA and MCAT scores, SCR experience scores, and MMI personal competency scores.
For each dependent variable we conducted an independent-samples t test to compare mean admission test scores for URIM and non-URIM applicants. Because GPA, MCAT, SCR, and MMI scores are all measures of applicant performance, we considered conducting a multivariate analysis of variance (MANOVA) to analyze the four dependent variables simultaneously. However, further investigations revealed that our performance measures were poorly correlated with one another (r range: 0.003–0.25), and therefore, we determined there was no additional value in using a MANOVA model over separate t tests.30
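For readers who want to see the arithmetic behind the group comparison, the sketch below computes an equal-variance (pooled) independent-samples t statistic using only the Python standard library. The text does not state whether a pooled or unequal-variance test was used, so pooling is an assumption, and the scores shown are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(x, y):
    """Pooled-variance t statistic and degrees of freedom for two groups."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    # Weighted average of the two sample variances.
    pooled = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, df


# Invented scores for two small groups (not study data).
group_a = [1.0, 2.0, 3.0]
group_b = [2.0, 3.0, 4.0]
t_stat, dof = independent_t(group_a, group_b)
```

Turning the statistic into a P value requires the t distribution, which the standard library does not provide; a statistics package would normally handle that step.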
In addition to comparing differences in mean performance scores as a function of applicant self-reported race/ethnicity, we also conducted a series of “what-if” analyses to determine whether alternative weighting methods would have changed final admissions decisions and entering class composition. Because the different performance measures are on different numeric scales, we converted performance measures (GPA, MCAT, SCR score, and MMI) to z scores before implementing alternative weighting schemes. Four potential weighting methods were used: 100% MMI; 30% GPA and 70% MMI; 10% GPA, 10% MCAT, 10% SCR score, and 70% MMI; and 50% MCAT and 50% MMI. We chose these weighting formulae to reflect the proportion of MMI weighting used at McMaster University17 (70%), where authors M.M. and H.I.R. are faculty, as well as formulae incorporating interview, academic, and experiential scores commonly used by admissions committees. RWJMS currently accepts applicants based solely on their total MMI score (option 1), with approximately 45% of applicants being accepted. Consequently, we calculated the entering class composition for the alternative weighting methods (options 2–4) by theoretically “accepting” those applicants whose combined z scores placed them in the top 45% of applicants and identified URIM ethnicity composition by each alternative weighting method.
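The “what-if” procedure described above can be sketched in a few lines: standardize each measure, form a weighted sum, and keep the top 45%. This is an illustrative reconstruction, not the study's analysis code. The dictionary layout, function names, use of the sample standard deviation, and rounding of the acceptance cutoff are all assumptions, and ties are broken arbitrarily by input order.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize a list of scores (sample standard deviation assumed)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def what_if_accept(applicants, weights, accept_fraction=0.45):
    """Indices of applicants in the top `accept_fraction` by weighted z score.

    applicants: list of dicts holding numeric measures (hypothetical layout)
    weights:    dict mapping measure name -> weight, e.g. {"gpa": 0.3, "mmi": 0.7}
    """
    z = {name: z_scores([a[name] for a in applicants]) for name in weights}
    composite = [
        sum(w * z[name][i] for name, w in weights.items())
        for i in range(len(applicants))
    ]
    n_accept = round(accept_fraction * len(applicants))
    ranked = sorted(range(len(applicants)), key=lambda i: composite[i], reverse=True)
    return set(ranked[:n_accept])


# The four weighting schemes named in the text.
SCHEMES = {
    "100% MMI": {"mmi": 1.0},
    "30% GPA, 70% MMI": {"gpa": 0.3, "mmi": 0.7},
    "10% GPA, 10% MCAT, 10% SCR, 70% MMI": {"gpa": 0.1, "mcat": 0.1, "scr": 0.1, "mmi": 0.7},
    "50% MCAT, 50% MMI": {"mcat": 0.5, "mmi": 0.5},
}
```

Given the real applicant records, counting how many of the returned indices belong to URIM applicants under each entry of `SCHEMES` would yield the kind of comparison this analysis describes.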
Of the 5,740 candidates who applied to RWJMS during the study period, 847 applicants (14.8%) were self-described as URIM. A total of 1,339 candidates (23.3% of entire pool) were invited to participate in the MMI process. Of the 1,339 applicants, 141 (10.5%) did not report their ethnicity and were therefore not included in the analyses. Of the remaining 1,198 candidates selected for interview, 129 (10.8%) were URIM, while 1,069 (89.2%) were non-URIM.
Traditional performance measures
Correlations between admissions performance measures are provided in Table 2. A moderate correlation existed between GPA and MCAT scores. Small but significant positive correlations were found between SCR and MMI scores and between SCR and MCAT scores, and a small but significant negative correlation was found between GPA and MMI scores. In addition to examining the overall relation between applicant performance measures, generalizability theory was used to determine overall reliability of the MMI scores. Variance components for MMI scores were estimated for applicants (a; n = 1,198), MMI station (s; n = 6), and MMI item (i; n = 3: communication, argument/content, and global score) nested in station (i:s). Table 3 displays the facets and variance components for MMI scores. The majority of variance (62%) was attributed to applicants, indicating that the MMI was able to reliably differentiate between individuals of varying performance. The interaction between applicant and MMI station accounted for the second largest amount of variance (33%). This interaction effect indicates that applicant performance varied across MMI stations, an effect commonly referred to as “context-specificity.”15
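For orientation, the relative generalizability coefficient for a design in which applicants are crossed with stations and items are nested in stations is conventionally written as below. This is the standard textbook form, offered as a sketch; it is not a formula reported in the study. The symbols are assumptions of this illustration: each σ² denotes an estimated variance component (applicant, applicant-by-station, and the residual applicant-by-item-within-station term), with n_s = 6 stations and n_i = 3 items per station.

```latex
E\rho^2 \;=\;
\frac{\sigma^2_{a}}
     {\sigma^2_{a}
      + \dfrac{\sigma^2_{as}}{n_s}
      + \dfrac{\sigma^2_{ai:s,e}}{n_s\, n_i}}
```

Under this form, a large applicant-by-station component lowers reliability less as the number of stations grows, because it is divided by n_s.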
Relation of traditional performance measures to applicant diversity
Figure 1 shows mean and standard error scores for URIM and non-URIM applicants. Overall, URIM applicants had significantly lower GPA, MCAT, and SCR scores relative to non-URIM applicants. However, this pattern of lower scores did not hold with MMI scores, with equivalent MMI scores being found for URIM and non-URIM applicants.
“What-if” analyses: The effects of alternative weighting of performance measures on race/ethnicity composition of accepted applicants
Table 4 shows the means and standard deviations for GPA, MCAT, and SCR scores for the students who would theoretically have been accepted under each of the four alternative weighting protocols. Alternative weighting analyses, based on z score combinations, resulted in marked differences in the racial and ethnic composition of the top 45% of interviewed applicants. Of the 129 interviewed URIM applicants, the number (percentage) who would have been accepted under each of the four weighting schemes is as follows:
- 100% MMI: 73 (57%) applicants accepted.
- 70% MMI, 30% GPA: 50 (39%) accepted.
- 70% MMI, 10% GPA, 10% MCAT, 10% SCR score: 55 (43%) accepted.
- 50% MMI, 50% MCAT: 37 (22%) accepted.
These data are further illustrated in Figure 2, whereby the proportion of URIM applicants accepted into the undergraduate medical program would have declined from 57% to 22% depending on weighting.
Our findings suggest that increasing use of MMI scores in admission decisions may enhance racial/ethnic diversity among entering medical students, relative to reliance on traditional academic measures and experience scores. To our knowledge this is the only report from a U.S. medical school showing the neutrality of the MMI for underrepresented applicants, in contrast to the MCAT or GPA.31 Compared with traditional methodologies,5,32–39 the MMI has psychometric advantages, including higher reliability than traditional interviews and predictive validity for future performance.15–20 Despite such advantages, there have been anecdotal concerns that the MMI format and the types of interview stems could disadvantage certain racial/ethnic groups. Our results revealed no statistically significant difference in MMI performance between URIM and non-URIM groups, a finding consistent with a small Canadian study of five aboriginal applicants.25 Extrapolation from that study, however, is limited by its size and by the very different social and cultural backgrounds of the United States and Canada.
We predicted that assessment of personal competencies by the MMI would provide unique information about applicants when compared with the academic measures; as such, the small correlations are not surprising. We were somewhat surprised that the MMI correlations with SCR scores were not stronger, as these screening variables are conceptually more relevant to the core personal competencies assessed by the MMI. One explanation is that ratings of students’ SCR activities may function as an indicator of the number of activities completed rather than an indicator of what knowledge, skills, or attitudes one developed through participating in such activities. This finding provides justification for the screening process used at RWJMS, which does not give additional weight to an application with experience ratings of “4” or “5” versus “3” in the holistic application screening.
The change in racial/ethnic makeup of the top 45% ranked students who would be offered acceptance is even more surprising. Reiter et al17 combined MMI results of six Canadian medical schools over two years, focusing on the effect of the MMI on enhancing diversity, increasing access to medical school, and neutralizing the effect of academic variables. McMaster’s formulaic approach to invitation for interview was 60% GPA and 40% autobiographical questionnaire, and postinterview selection was 70% MMI score and 30% GPA. The Canadian study found that these differential weighting schemes did not impact the diversity of accepted cohorts, as measured by income and community size.17 The lack of a relationship between diversity and MMI weighting may be due to fallacies in assumptions regarding postal code data use as a surrogate for socioeconomic status, reluctance in collecting race/ethnicity data in Canada, or the fact that early screening formulas weighting academics and autobiographical submissions may irrevocably alter the demographic makeup of the interview pool. This last explanation seems likely given the very different academic thresholds set for interview eligibility at RWJMS relative to the Canadian schools in that study, namely, the decision not to score, weight, or revisit threshold criteria for invitation for interview.
The preliminary success of RWJMS’s first cohort in terms of academic performance and demonstration of personal competencies represents early validation of our selection methodology.29 We compared the medical school performance of the first MMI cohort in 2011 with that of the 2010 entering class, admitted under the former process (a traditional single open-file one-on-one semistructured interview). Course grades reflect cognitive end points, and the numbers of course failures across these years were not statistically different: 15 students with exam failures in 2010 versus 12 students in 2011. The mean (SD) USMLE Step 1 score for the MMI cohort was 231 (19) compared with 231 (20) for the entering class of 2010 (P = .70).
Our study has several limitations. The data are based on a single-institution study that has implemented a “home-grown” MMI system. The thresholds set for interview eligibility are different for different schools. The closed rater pool, the modified reference standard setting exercises preceding each MMI day, the behaviorally anchored rating scale, and the stems based on combinations of the core personal competencies may be difficult to reproduce. Additionally, URIM applicants from our articulated pipeline program were not included in the dataset because their acceptances were based on a hybrid MMI/traditional interview, and thus the findings cannot be extrapolated to this subset.
Evidence-based admissions practices are upon us. Core personal competencies4 have been identified to inform admissions committees and to serve as a basis for outcome studies on medical student performance. The MMI is gaining traction in the United States, and its value is now being recognized in graduate medical education selection processes.40–42 The public trusts medical schools to select the next generation of providers to meet the health care needs of society. This single-institution multiyear cohort study suggests that sole reliance on the MMI for final admissions decisions, after threshold academic and experiential preparation have been met, results in a diverse accepted applicant pool; in contrast, weighting of “the numbers” as pertains to GPA or MCAT or to what is written about the application may decrease the acceptance of URIM applicants.
Acknowledgments: The authors thank Meryle R. Kramer, MA, for her dedication and support in implementing the new interview process.
1. Mechanic D. Disadvantage, inequality, and social policy. Health Aff (Millwood). 2002;21:48–59
2. Nivet MA. Commentary: Diversity and inclusion in the 21st century: Bridging the moral and excellence imperatives. Acad Med. 2012;87:1458–1460
3. Cohen JJ, Gabriel BA, Terrell C. The case for diversity in the health care workforce. Health Aff (Millwood). 2002;21:90–102
4. Koenig TW, Parrish SK, Terregino CA, Williams JP, Dunleavy DM, Volsch JM. Core personal competencies important to entering students’ success in medical school: What are they and how could they be assessed early in the admission process? Acad Med. 2013;88:603–613
5. Albanese MA, Snow MH, Skochelak SE, Huggett KN, Farrell PM. Assessing personal qualities in medical school admissions. Acad Med. 2003;78:313–321
6. Carrothers RM, Gregory SW Jr, Gallagher TJ. Measuring emotional intelligence of medical school applicants. Acad Med. 2000;75:456–463
7. Dore KL, Reiter HI, Eva KW, et al. Extending the interview to all medical school candidates—Computer-Based Multiple Sample Evaluation of Noncognitive Skills (CMSENS). Acad Med. 2009;84(10 suppl):S9–S12
8. Lievens F, Sackett PR. The validity of interpersonal skills assessment via situational judgment tests for predicting academic success and job performance. J Appl Psychol. 2012;97:460–468
9. Girzadas DV, Harwood RC, Dearie J, Garnett S. A comparison of standardized and narrative letters of evaluation. Acad Emerg Med. 1998;5:1101–1104
10. Greguras GJ, Robie C. A new look at within-source interrater reliability of 360-degree feedback ratings. J Appl Psychol. 1998;83:960–968
11. Dirschl DR, Adams GL. Reliability in evaluating letters of recommendation. Acad Med. 2000;75:1029
12. Ferguson E, James D, O’Hehir F, Sanders A. Pilot study of the roles of personality, references, and personal statements in relation to performance over the five years of a medical degree. Br Med J. 2003;326:429–432
13. Ozuah PO. Variability in deans’ letters. JAMA. 2002;288:1061
14. Ziv A, Rubin O, Moshinsky A, et al. MOR: A simulation-based assessment centre for evaluating the personal and interpersonal qualities of medical school candidates. Med Educ. 2008;42:991–998
15. Eva KW, Rosenfeld J, Reiter HI, Norman GR. An admissions OSCE: The multiple mini-interview. Med Educ. 2004;38:314–326
16. Reiter HI, Eva KW, Rosenfeld J, Norman GR. Multiple mini-interviews predict clerkship and licensing examination performance. Med Educ. 2007;41:378–384
17. Reiter HI, Lockyer J, Ziola B, Courneya CA, Eva K; Canadian Multiple Mini-Interview Research Alliance (CaMMIRA). Should efforts in favor of medical student diversity be focused during admissions or farther upstream? Acad Med. 2012;87:443–448
18. Eva KW, Reiter HI, Rosenfeld J, Norman GR. The relationship between interviewers’ characteristics and ratings assigned during a multiple mini-interview. Acad Med. 2004;79:602–609
19. Eva KW, Reiter HI, Rosenfeld J, Norman GR. The ability of the multiple mini-interview to predict preclerkship performance in medical school. Acad Med. 2004;79(10 suppl):S40–S42
20. Eva KW, Reiter HI, Rosenfeld J, Trinh K, Wood TJ, Norman GR. Association between a medical school admission process using the multiple mini-interview and national licensing examination scores. JAMA. 2012;308:2233–2240
21. Reiter HI, Eva KW. Reflecting the relative values of community, faculty, and students in the admissions tools of medical school. Teach Learn Med. 2005;17:4–8
22. Terrell C, Beaudreau J. 3000 by 2000 and beyond: Next steps for promoting diversity in the health professions. J Dent Educ. 2003;67:1048–1052
23. Koenig JA, Sireci SG, Wiley A. Evaluating the predictive validity of MCAT scores across diverse applicant groups. Acad Med. 1998;73:1095–1106
24. Huff KL, Koenig JA, Treptau MM, Sireci SG. Validity of MCAT scores for predicting clerkship performance of medical students grouped by sex and ethnicity. Acad Med. 1999;74(10 suppl):S41–S44
25. Moreau K, Reiter H, Eva KW. Comparison of aboriginal and nonaboriginal applicants for admissions on the Multiple Mini-Interview using aboriginal and nonaboriginal interviewers. Teach Learn Med. 2006;18:58–61
26. Liaison Committee on Medical Education. Functions and structure of a medical school: Standards for accreditation of medical education programs leading to the M.D. degree. http://www.lcme.org/publications.htm#standards-section. Accessed August 4, 2015
27. Witzburg RA, Sondheimer HM. Holistic review—shaping the medical profession one applicant at a time. N Engl J Med. 2013;368:1565–1567
28. Julian ER. Validity of the Medical College Admission Test for predicting medical school performance. Acad Med. 2005;80:910–917
29. Terregino CA, Dunleavy D, Geiger T, Kramer MR, Kohan E. Admitting the 21st century physician: Exclusive use of the MMI in final admissions decisions to predict clinical and interpersonal performance. Poster presented at: Association of American Medical Colleges Annual Meeting; November 7–11, 2014; Chicago, IL. [Unpublished data, available from the authors on request.]
30. Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum Associates; 1988
31. Davis D, Dorsey JK, Franks RD, Sackett PR, Searcy CA, Zhao X. Do racial and ethnic group differences in performance on the MCAT exam reflect test bias? Acad Med. 2013;88:593–602
32. Morris JG. The value and role of the interview in the student admissions process: A review. Med Teach. 1999;21:473–481
33. Elam CL, Andrykowski MA. Admission interview ratings: Relationship to applicant academic and demographic variables and interviewer characteristics. Acad Med. 1991;66(9 suppl):S13–S15
34. Elam C, Stratton T, Scott KL, et al. Review, deliberation and voting: A study of selection decisions in a medical school admission committee. Teach Learn Med. 2002;14:98–103
35. Stansfield RB, Kreiter CD. Conditional reliability of admissions interview ratings: Extreme ratings are the most informative. Med Educ. 2007;41:32–38
36. Basco WT, Gilbert GE, Chessman AW, Blue AV. The ability of a medical school admissions process to predict clinical performance and patients’ satisfaction. Acad Med. 2000;75:743–747
37. Campion MA, Elliot DP, Brown BA. Structured interviewing: Raising the psychometric properties of the employment interview. Pers Psychol. 1998;41:25–42
38. Kreiter CD, Yin P, Solow C, Brennan RL. Investigating the reliability of the medical school admissions interview. Adv Health Sci Educ Theory Pract. 2004;9:147–159
39. Husbands A, Dowell J. Predictive validity of the Dundee multiple mini-interview. Med Educ. 2013;47:717–725
40. Hofmeister M, Lockyer J, Crutcher R. The acceptability of the multiple mini interview for resident selection. Fam Med. 2008;40:734–740
41. Hopson LR, Burkhardt JC, Stansfield RB, Vohra T, Turner-Lawrence D, Losman ED. The multiple mini-interview for emergency medicine resident selection. J Emerg Med. 2014;46:537–543
42. Strand EA, Moore E, Laube DW. Can a structured, behavior-based interview predict future resident success? Am J Obstet Gynecol. 2011;204:446.e1–446.e13