In November 2001, addressing the Association of American Medical Colleges, President Dr. Jordan Cohen decried the present admissions process that “underemphasizes personal characteristics.”1 Although the gauntlet was thrown down, a greater call to arms would not come until 2004, with studies by Papadakis et al2–5 correlating undergraduate behavior with subsequent complaints to state medical boards.
Attrition rates from U.S. and Canadian medical schools are too low to envision weeding out the “bad apples” after entry.6 The emphasis is thus placed squarely on the shoulders of admissions offices to admit students with a sufficient level of the personal qualities they perceive as desirable. The Medical College Admission Test (MCAT) provides a reliable, valid, and acceptable tool that can be applied to more than 100,000 test-takers annually. Where is the same testing capability to screen out those with weaker noncognitive skills? Traditional approaches (autobiographical letters, reference letters, and interviews) have been shown to be of limited value.7 The Multiple Mini-Interview (MMI)8 has shown promise in this regard, revealing correlations with future clinical and noncognitive performance in the 0.35 to 0.57 range,8,9 thus mirroring the success achieved by MCAT for cognitive end points.10 Researchers in other schools have demonstrated promising results using similar methodology at assessment centers.11,12 However, the MMI and similar assessments can be administered only to the small number of candidates who come on-site for an interview, not to the thousands who apply. Therefore, the decision of whom to bring to interview is made without the benefit of psychometrically sound measures of noncognitive qualities.
Following discussions at McMaster University, the same psychometric principles used in the OSCE and in the development of the MMI were applied to developing the Computer-based Multiple Sample Evaluation of Noncognitive Skills (CMSENS). These principles are intended to minimize the deleterious effects of context specificity and halo effect by the use of many small, independent “biopsies” of candidates' personal characteristics. Unlike the MMI, these tests are computerized and, therefore, have the potential for large-scale, remote screening. Recently, two studies of reliability and concurrent validity, herein described, were completed. Both studies received ethics approval from the hospital research ethics board.
The purpose of the two studies presented in this paper was to determine the psychometric qualities—including reliability of various response methods and correlation with an established noncognitive measure, the MMI—of CMSENS in a controlled environment. Study 1 sought to determine the psychometric properties of audio and typewritten responses as well as concurrent validity with the MMI. Study 2 expanded on the findings of Study 1 with a larger sample, preceding the MMI by three months.
Method and Results
In 2006, a pilot study investigated the reliability and validity of data obtained from CMSENS with undergraduate medical school candidates. All participants wrote CMSENS in a proctored computer lab at McMaster on the same day they completed the admissions interview, the MMI. The proctored environment allowed for test security and elimination of plagiarism.
One hundred ten candidates to the Michael G. DeGroote School of Medicine at McMaster University completed CMSENS. CMSENS consisted of 12 scenarios—eight 60- to 90-second video-based vignettes presenting ethical dilemmas and group dynamic challenges, all lacking an obvious correct response, as well as four self-descriptive questions similar to traditional interview questions (e.g., “What makes your heart sing?”). Each video and self-descriptive scenario had three related questions requiring a response.
The video vignettes, developed with the help of field experts, covered topics including, but not limited to, collaboration, communication, professionalism, and confidentiality, reflecting the nonmedical expert qualities identified by both the Royal College of Physicians and Surgeons of Canada and the Accreditation Council for Graduate Medical Education to ensure construct validity.13 The nonclinical situations ensured no advantage to those with clinical backgrounds and were presented on computers. The video vignettes allowed for richer context in the depiction of cases (compared with written description) while still being less resource-intensive than live standardized performance.
To enable generalization to the preinterview candidate population, the 110 who completed the CMSENS pilot comprised 82 candidates who had been invited to interview at the medical school and 28 pseudocandidates who had applied to McMaster and had been turned down for interviews but completed the MMI as part of the pilot project. To recruit the pseudocandidates, letters were sent to 300 candidates who were unsuccessful in receiving an interview and were geographically proximate to McMaster. Pseudocandidates were aware that their voluntary participation would not impact their current or future application to medical school. The 30 pseudocandidates were selected on a first-come, first-served basis. Two pseudocandidates did not participate in the study, one because of illness and one no-show. In terms of true candidates, all candidates were invited to participate; 80 were recruited, 5 declined to participate on the day of interview, and 7 additional candidates were asked to participate.
Of those candidates who completed the study, 78 verbally recorded their responses into an audio file captured by the computer, and 32 typed their responses. The two groups saw the same scenarios; the only variation was their method of recording their responses. The true interview candidates were represented in both the audio response (n = 60) and the typewritten response (n = 22) groups, as were the pseudocandidates (n = 18 and n = 10, respectively). Raters and other candidates were not aware which candidates were true candidates and which candidates were completing the MMI just for the purposes of the pilot study.
Ninety-two raters for Study 1 attended an orientation session explaining the scoring process and were given the background and theory for the scenario they were evaluating. Raters were asked to score candidates' communication skills, strength of the argument raised, suitability for a career in medicine, and overall performance on that scenario using a 10-point anchored Likert scale ranging from “Bottom 25%” to “Top 1%.” Responses were marked across scenario, rather than across candidate, in an effort to decrease the possibility of a halo effect.14 Two independent raters were used for each CMSENS scenario, a medical student and either a faculty or community member. In addition, CMSENS raters were asked to estimate the average time they took to mark each scenario. MMI global ratings were scored by one rater for each scenario using a 10-point Likert scale. Data were analyzed using generalizability theory, analysis of variance, and Pearson correlation.
Analyses performed on the pilot study results indicated that the overall test generalizability was 0.86 for the audio CMSENS and 0.72 for the typewritten version of CMSENS. Interrater reliability was 0.82 and 0.81 for the audio and typed versions of CMSENS, respectively. The average score per scenario did not differ as a function of response type [audio = 4.5, SD = 1.20, typewritten = 4.14, SD = 0.71, F(1, 118) = 2.73, ns]. There was a trend toward real candidates outperforming pseudocandidates, but the difference was not statistically significant either in the audio response [4.57 (SD = 1.23) and 4.15 (SD = 1.19), respectively, F(1, 76) = 1.51, ns], or typed response [4.14 (SD = 0.75) and 4.0 (SD = 0.72), respectively, F(1, 32) = 0.19, ns].
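To make the generalizability figures concrete, the following is a minimal sketch of how a G coefficient can be computed for a simplified one-facet design in which every candidate is crossed with every scenario. The actual G-study design used in these studies (for example, how raters were treated) is not described here, so the function name, the design, and the toy score matrix are all illustrative assumptions rather than the authors' analysis.

```python
def g_coefficient(scores):
    """Relative G coefficient for a fully crossed persons x scenarios design.

    scores[p][s] is the score of candidate p on scenario s (no missing cells).
    Assumption: one facet (scenario) only; rater effects are ignored.
    """
    n_p = len(scores)
    n_s = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_s)

    person_means = [sum(row) / n_s for row in scores]
    scenario_means = [sum(scores[p][s] for p in range(n_p)) / n_p
                      for s in range(n_s)]

    # Sums of squares from a two-way ANOVA without replication.
    ss_person = n_s * sum((m - grand) ** 2 for m in person_means)
    ss_scenario = n_p * sum((m - grand) ** 2 for m in scenario_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_residual = ss_total - ss_person - ss_scenario

    ms_person = ss_person / (n_p - 1)
    ms_residual = ss_residual / ((n_p - 1) * (n_s - 1))

    # Variance components: candidate variance and residual (interaction + error).
    var_person = max((ms_person - ms_residual) / n_s, 0.0)
    var_residual = ms_residual

    # Reliability of the mean score over n_s scenarios.
    return var_person / (var_person + var_residual / n_s)


# Hypothetical scores for four candidates on four scenarios (not study data).
toy_scores = [
    [4, 5, 3, 4],
    [6, 7, 6, 5],
    [2, 3, 2, 4],
    [5, 5, 6, 6],
]
toy_g = g_coefficient(toy_scores)
```

In this simplified design the coefficient rises as scenarios are added, because the residual variance is divided by the number of scenarios; this is the rationale for sampling many short scenarios per candidate.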
The typewritten CMSENS also demonstrated concurrent validity, correlating with the MMI at r = 0.51. The audio CMSENS scores, however, correlated with the MMI at only r = 0.15.
According to rater estimates, more time was required to score the audio responses—on average, 20 minutes per scenario per candidate, as opposed to a reported average of 2 minutes per scenario per candidate for typewritten responses. In addition, the typewritten responses could be evaluated off-site at the raters' convenience, whereas the audio responses, because of file size, had to be scored on-site. The typed responses also provided increased anonymity for the candidates in terms of sex, ethnicity, and other potential sources of bias.
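The practical size of this difference can be illustrated with simple arithmetic. The sketch below assumes that the reported averages apply to each rater separately and that every response is scored by both raters; neither assumption is stated explicitly in the text, and the function name is hypothetical.

```python
def rater_hours(n_candidates, n_scenarios, raters_per_scenario, minutes_per_response):
    """Total rater time in hours for scoring all responses.

    Assumption: minutes_per_response is the time one rater spends on one
    candidate's response to one scenario.
    """
    total_minutes = (n_candidates * n_scenarios
                     * raters_per_scenario * minutes_per_response)
    return total_minutes / 60


# Study 1 used 12 scenarios with two raters per scenario.
audio_hours_per_candidate = rater_hours(1, 12, 2, 20)  # audio responses
typed_hours_per_candidate = rater_hours(1, 12, 2, 2)   # typewritten responses
```

Under these assumptions, scoring one candidate takes roughly 8 rater-hours for audio responses and under 1 rater-hour for typewritten responses, a tenfold difference that grows linearly with the number of candidates screened.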
For two reasons, further evaluation of CMSENS used the typewritten response format only. First, relative to audio responses, typewritten responses demanded markedly fewer rater resources, specifically marking time, and offered less possibility for unconscious bias based on sex or culture. Second, the typewritten responses correlated with the MMI at an acceptable level.
One hundred sixty-seven candidates to McMaster Medical School completed CMSENS two months before the admissions interviews. Roughly half (n = 88) of those completing CMSENS also completed the MMI. This CMSENS version used new scripts and videos of considerably higher production quality than those in Study 1. It consisted of a series of eight 60-second video vignettes, as well as six self-descriptive questions. As before, each video and self-descriptive scenario had three related questions requiring a response. All participants wrote the 90-minute CMSENS in a proctored computer lab on the university campus. Most candidates agreed to participate before knowing whether they had received an interview or not. The rating scales were modified from those for Study 1 on the basis of feedback from raters asking for greater clarity, with CMSENS global ratings scored using an anchored nine-point Likert scale ranging from “Unacceptable” to “Superior.” Again, two independent raters (n = 56) were used for each scenario. MMI scoring remained the same as described in Study 1. Additional evaluations available during the admissions process included the overall grade point average (GPA) and rater scores on an autobiographical submission (ABS). The ABS consists of five short answers completed by all candidates remotely with their application to medical school and without a proctor, and scored by two independent raters using global ratings on a seven-point Likert scale. Fifty-seven percent of interviewed candidates in the study also completed the MCAT.
Pearson correlations were analyzed between CMSENS (specifically, CMSENS–total, CMSENS–video, and CMSENS–self-descriptive) and GPA, ABS, MMI, and MCAT (broken into MCAT–total, MCAT–biological sciences, MCAT–physical sciences, MCAT–writing sample, and MCAT–verbal reasoning).
Analyses performed on the larger study results indicated an overall test generalizability for all 14 scenarios, 8 video scenarios, and 6 self-descriptive scenarios of 0.83, 0.75, and 0.69, respectively. Interrater reliability for the entire CMSENS, video CMSENS, and self-descriptive CMSENS was 0.95, 0.92, and 0.90, respectively. Approximately three minutes per rater was required for scoring each scenario per candidate. Correlations are recorded in Table 1 for the whole sample of 167 who completed CMSENS and for the subgroup of 88 who were invited for interviews and completed the MMI. The correlation of CMSENS–total to MCAT–verbal reasoning was 0.38 in the whole sample of 167, and 0.26 for the subgroup of 88 invited to interview. For CMSENS–video, the corresponding correlations were 0.37 and 0.22; for CMSENS–self-descriptive, correlations were 0.28 and 0.22.
Of greatest import, if CMSENS is to be considered as a screening tool for interview, the correlations of CMSENS–total, CMSENS–video, and CMSENS–self-descriptive to MMI were r = 0.46, r = 0.51, and r = 0.33, respectively. With correction for disattenuation, the correlation of CMSENS with MMI was 0.60. For purposes of considering incremental benefit of CMSENS, the correlation of MMI to MCAT–verbal reasoning (n = 50) was found to be 0.26, so CMSENS seems to be more closely related to MMI performance than MCAT–verbal reasoning is.
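The correction reported above can be illustrated with the classical Spearman formula, in which the observed correlation is divided by the square root of the product of the two measures' reliabilities. The sketch below assumes that this is the correction the authors applied and that the CMSENS–total generalizability of 0.83 served as its reliability. The MMI reliability is not reported in this paper, so the value computed here is only the figure implied by the reported numbers, not a sourced estimate; both function names are hypothetical.

```python
import math


def disattenuate(r_observed, rel_x, rel_y):
    """Spearman correction: estimated correlation between true scores."""
    return r_observed / math.sqrt(rel_x * rel_y)


def implied_reliability(r_observed, r_corrected, rel_x):
    """Reliability of the second measure implied by a reported correction."""
    return (r_observed / r_corrected) ** 2 / rel_x


r_observed = 0.46   # CMSENS-total vs. MMI, as reported
r_reported = 0.60   # corrected correlation, as reported
rel_cmsens = 0.83   # CMSENS-total generalizability (assumed to be the reliability used)

# MMI reliability that would reproduce the reported corrected value.
rel_mmi_implied = implied_reliability(r_observed, r_reported, rel_cmsens)
r_check = disattenuate(r_observed, rel_cmsens, rel_mmi_implied)
```

Under these assumptions the implied MMI reliability is approximately 0.71, and substituting it back into the formula recovers the reported corrected correlation of 0.60.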
The potential advantages of an accurate assessment of noncognitive skills of all medical school candidates include enhanced inclusiveness for all candidates from the time of application, improved behavior during medical school, better performance on national licensing examinations, and ultimately fewer grounds for lawsuits and medical board disciplinary action. Although this is a long causal chain, evidence is gradually emerging to link these components. CMSENS correlated with MMI at 0.60 after correction for disattenuation. The MMI predicts clinical clerkship performance with a correlation of 0.57.9 Clerkship performance seems predictive of state licensure board disciplinary action for professional behavior problems.5 So, although the results of the two studies reported here are preliminary, and should be taken as such, the cumulative results show promise in that they seem to enable reliable assessment of noncognitive qualities before an interview, with a potential link to noncognitive measures at the interview and beyond. Future research will endeavor to make more direct links between CMSENS and other outcomes.
It is important to note that predictive validity has not been completely established. The use of the surrogate end point of MMI performance is a promising beginning, but future research should follow the 57 CMSENS volunteers subsequently admitted to McMaster (through licensing examinations and, ideally, well into practice). This will also be done for candidates admitted to other medical schools who gave consent to have their CMSENS performance correlated to future medical exam scores.
Despite all these limitations, the correlation with MMI provides a basis for optimism, in contrast to traditional written measures.14 That said, there are other indications of progress in the literature. The verbal reasoning portion of the MCAT has demonstrated some success, predicting communication skills on both the U.S.15 and Canadian16 national licensing examinations.
More powerful preinterview noncognitive screening of all candidates would continue that progress. Specifically, neither personal interviews nor MMIs can be feasibly applied to all candidates in many institutions. The studies reported here supply a further step toward overcoming that challenge, although more investigation remains. Future studies include evaluating the feasibility of administering CMSENS to a larger sample, either via a Web platform or in proctored settings.
The authors wish to acknowledge the grant support of the Medical Council of Canada and the Stemmler Fund, National Board of Medical Examiners.
1 Cohen J. Facing the future. President's address delivered at: 112th Annual Meeting of the Association of American Medical Colleges; November 4, 2001; Washington, DC.
2 Papadakis MA, Arnold GK, Blank LL, Holmboe ES, Lipner RS. Performance during internal medicine residency training and subsequent disciplinary action by state licensing boards. Ann Intern Med. 2008;148:869–876.
3 Papadakis MA, Teherani A, Banach MA, et al. Disciplinary action by medical boards and prior behavior in medical school. N Engl J Med. 2005;353:2673–2682.
4 Teherani A, Hodgson CS, Banach M, Papadakis MA. Domains of unprofessional behavior during medical school associated with future disciplinary action by a state medical board. Acad Med. 2005;80(10 suppl):S17–S20.
5 Papadakis MA, Hodgson CS, Teherani A, Kohatsu ND. Unprofessional behavior in medical school is associated with subsequent disciplinary action by a state medical board. Acad Med. 2004;79:244–249.
6 McGrath B, McQuail D. Decelerated medical education. Med Teach. 2004;26:510–513.
7 Eva KW, Reiter HI, Rosenfeld J, Norman GR. An admissions OSCE: The multiple mini-interview. Med Educ. 2004;38:314–326.
8 Eva KW, Reiter HI, Rosenfeld J, Norman GR. The ability of the Multiple Mini-Interview to predict pre-clerkship performance in medical school. Acad Med. 2004;79(10 suppl):S40–S42.
9 Reiter HI, Eva KW, Rosenfeld J, Norman GR. Multiple mini-interviews predict clerkship and licensing examination performance. Med Educ. 2007;41:378–384.
10 Julian ER. Validity of the Medical College Admission Test for predicting medical school performance. Acad Med. 2005;80:910–917.
11 Ziv A, Rubin O, Moshinsky A, et al. MOR: A simulation-based assessment centre for evaluating the personal and interpersonal qualities of medical school candidates. Med Educ. 2008;42:991–998.
12 Donnon T, Paolucci EO. A generalizability study of the medical judgment vignettes interview to assess students' noncognitive attributes for medical school. BMC Med Educ. 2008;8:58.
13 Downing S. Validity: On the meaningful interpretation of assessment data. Med Educ. 2003;37:830–837.
14 Dore KL, Hanson M, Reiter HI, Blanchard M, Deeth K, Eva KW. Medical school admissions: Enhancing the reliability and validity of an autobiographical screening tool. Acad Med. 2006;81:S70–S73.
15 Veloski JJ, Callahan CA, Xu G, Hojat M, Nash DB. Prediction of students' performances on licensing examinations using age, race, sex, undergraduate GPAs and MCAT scores. Acad Med. 2000;75(10 suppl):S28–S30.
16 Kulatunga-Moruzi C, Norman G. Validity of admission measures in predicting performance outcomes: The contribution of cognitive and non-cognitive domains. Teach Learn Med. 2002;14:34–42.