Medical school admissions protocols commonly employ cognitive measures (e.g., undergraduate grade point average [uGPA] and Medical College Admission Test scores) and noncognitive measures (e.g., personal interviews and written submissions).1 Assessment of noncognitive characteristics is key to the admissions process, yet reliable and valid methods for assessing them have proven elusive. Research on the admissions interview at the Michael G. DeGroote School of Medicine at McMaster University has produced the Multiple Mini-Interview (MMI).2 The MMI provides a good measure of candidates’ noncognitive characteristics, with demonstrated reliability and predictive validity.3,4 However, feasibility constraints limit its use to the subset of applicants invited to interview. Preinterview screening of noncognitive characteristics is therefore achieved using a candidate-written autobiographical submission (ABS).
Candidate-written submission materials have a long history, with their local use in admissions decisions dating back almost four decades.5,6 At that earlier time, the submission was a candidate-written letter completed according to specific guidelines and scored by teams of three raters comprising a medical student, a faculty member, and a community member. Initial concerns included the veracity of the letters, overestimation of candidates’ accomplishments, and whether candidates completed the letters independently, but those concerns were allayed by a rater evaluation study and referee verification checks.6 The referee verification study drew a random subsample of autobiographical letters; for these, contact was made with the personal verifiers candidates had named for their cited activities, a process that is still used. A very high degree of corroboration was reported, and many verifiers indicated that candidates had in fact downplayed their accomplishments.
Today, the ABS consists of written responses to five questions designed to evaluate noncognitive personal characteristics, such as applicants’ personal experiences, suitability for training at McMaster, and suitability for a medical career. The questions are delivered and responses are received by way of a Web-based system operated by the Ontario Medical School Application Service (OMSAS). The system becomes available to candidates in August of each year, and a candidate has until November to submit his or her application. An ABS score (assigned by averaging the global ratings provided by three raters), in combination with uGPA, is used to select applicants for interview.
Past concerns regarding ABS veracity and independence of completion have resurfaced, partly because of increased competition secondary to a disproportionate increase in the number of candidates relative to the number of spaces available. In 1972, 1,400 candidates applied for 80 positions, resulting in a 6% success rate.5 Today, that success rate has halved, with close to 5,000 candidates competing for 164 positions. Intuitively, increased competition creates increased concern regarding academic dishonesty.7 More empirically, findings reported by Albanese et al8 support this concern; whereas surveyed medical school graduates reported believing that a written submission enabled them to adequately describe their personal characteristics, they also commonly reported a lack of independence in preparing such submissions. A small minority reported going so far as to obtain professional consultation. Assistance of this sort affects both the fairness of the tool as an admissions measure and the tool’s psychometric properties: reliability may suffer from a restriction in the range of candidates’ scores, and validity may suffer in that the assigned score may better reflect the characteristics of a candidate’s support system than those of the candidate him- or herself.9
In this paper, we revisit the question of ABS validity by evaluating candidates’ ability to complete the ABS in an on-site, controlled environment and comparing that with their ability to complete the ABS in the typical off-site, noncontrolled environment. We hypothesized that ABS completion in a time-controlled and supervised condition might lower the performance scores assigned, but that the relative performance of candidates would be maintained. In addition, we compared the psychometric properties of controlled ABS questions as a function of the amount of time candidates were provided to answer each question, hypothesizing that shorter time periods might enable better candidate differentiation.
Off-site, noncontrolled ABSs were received via the Web-based OMSAS system. Each submission was scored on a seven-point global rating scale by three independent raters: one medical student, one faculty member, and one community member.
The 696 top-ranking applicants (based on a weighted combination of 67% uGPA and 33% ABS score) for the 2005 application cycle of the Michael G. DeGroote School of Medicine at McMaster University were invited to campus for an admissions interview. This on-site process consisted of a 12-station MMI and completion of eight ABS questions under controlled conditions; the control consisted of supervising completion and limiting the time allowed to respond to each question. Half the applicants completed the MMI in the morning and the ABS in the afternoon, with the reverse order for the other half. Six of the on-site ABS questions were novel: comparable with, but not identical to, the questions candidates had answered off-site before the interviews, focusing on issues such as ethics, advocacy, and personal experiences. The remaining two on-site ABS questions were not novel; they were identical to two of the questions candidates had answered off-site before their interviews. Ethical approval for the study was obtained through the Protocol Review Committee of the Michael G. DeGroote School of Medicine at McMaster University.
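As a minimal sketch of the 67% uGPA / 33% ABS ranking step described above: the source does not specify how the two scales are made comparable, so standardizing each measure before weighting is an assumption of this illustration, and the applicant data are invented.

```python
# Hypothetical sketch of the 67%/33% composite used to rank applicants
# for interview. The z-scoring step is an assumption: the paper does not
# state how uGPA and ABS scores are placed on a common scale.
from statistics import mean, stdev

def composite_ranks(ugpa, abs_scores, w_ugpa=0.67, w_abs=0.33):
    """Return applicant indices ordered from strongest to weakest composite."""
    def z(values):
        m, s = mean(values), stdev(values)
        return [(v - m) / s for v in values]
    z_gpa, z_abs = z(ugpa), z(abs_scores)
    composite = [w_ugpa * g + w_abs * a for g, a in zip(z_gpa, z_abs)]
    return sorted(range(len(composite)), key=lambda i: composite[i], reverse=True)

# Toy data: four applicants' uGPAs and mean ABS ratings
order = composite_ranks([3.9, 3.6, 3.8, 3.2], [4.1, 6.2, 5.0, 5.5])
```

Because uGPA carries twice the weight of the ABS, a strong ABS score can only partially offset a weak uGPA in this composite.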
The time allowed to answer the novel on-site ABS questions was manipulated such that one question pair fell into each of three time allotments: 20 minutes were provided to complete one question pair, 10 minutes for a second pair, and five minutes for a third pair. Candidates were told the time allotment before each question was presented, and they were informed that spelling and grammar would not count. The order of the timing conditions and the questions assigned to each timing condition were counterbalanced across interview days. After completion of the six novel questions, a 10-minute allotment was provided for each of the repeated questions.
Teams of three raters (one faculty member, one community member, and one student) were each assigned to score the ABSs of a subset of approximately 30 candidates. They rated the quality of each ABS response on a seven-point scale, and their scores were averaged to create a score for each candidate. Sets of ratings were compared using Pearson correlation coefficients and analysis of variance. Reliability analyses were carried out using generalizability theory.
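The scoring and comparison steps above reduce to averaging three ratings per candidate and correlating sets of candidate scores. A minimal sketch follows; the rating values are invented for illustration only.

```python
# Illustrative sketch of the scoring step: each response receives a
# 7-point global rating from three raters, and the ratings are averaged
# into a single candidate score. Score sets are then compared with
# Pearson's correlation coefficient.

def candidate_score(ratings):
    """Average one candidate's ratings across the three raters."""
    return sum(ratings) / len(ratings)

def pearson_r(x, y):
    """Pearson correlation between two sets of candidate scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)
```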
Comparison of on-site (controlled) and off-site (noncontrolled) performance
Candidates’ mean performance scores were significantly higher for ABS questions completed off-site (mean = 4.68; 95% CI = 4.63–4.73) relative to the scores assigned to ABS questions completed on-site under controlled conditions (mean = 4.33; 95% CI = 4.27–4.40; F = 69.4; P < .001). In addition, scores assigned to off-site ABS submissions were nonpredictive of scores assigned to on-site ABS submissions (r = 0.16); in other words, performance on the preliminary ABS screen was unrelated to performance in the controlled setting. This poor correlation was not simply attributable to restriction of range: although the ABS was used to help determine who was invited to interview, it was weighted lightly enough that the interviewed sample did not show substantially smaller variance in ABS scores relative to the larger pool of applicants (SD = 1.26 versus 1.31).
The internal consistency was between 0.85 and 0.92, regardless of where the ABS was completed (on-site or off-site) and regardless of which subgroup of examiners (faculty, community members, or students) was considered, indicating good agreement in the rank ordering of candidates across questions. The interrater reliability, however, was poorer for the ABS completed off-site (G = 0.28; 95% CI = 0.26–0.30) than for the ABS completed on-site (G = 0.51; 95% CI = 0.49–0.53). Correcting for the unreliability of both measures did not yield greater cause for optimism: the disattenuated correlation between on-site and off-site scores was r = 0.42. In other words, the correlation between the two sets of scores would be expected to reach only 0.42 even if both measures could be made perfectly reliable.
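The disattenuated value of 0.42 follows from Spearman’s classical correction for attenuation, sketched here using the figures reported above (observed r = 0.16; interrater reliabilities of 0.28 off-site and 0.51 on-site):

```python
from math import sqrt

def disattenuate(r_observed, rel_x, rel_y):
    """Spearman's correction for attenuation: estimate the correlation
    between true scores from the observed correlation and the
    reliability of each measure."""
    return r_observed / sqrt(rel_x * rel_y)

# Figures reported in the text: r = 0.16, G = 0.28 (off-site), G = 0.51 (on-site)
r_true = disattenuate(0.16, 0.28, 0.51)  # ≈ 0.42
```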
The impact of time on on-site candidate performance
The amount of time candidates were given to respond had no impact on the reliability of the measure; the interrater reliabilities of the on-site ABS varied between 0.48 and 0.52 across the three timing conditions. Moreover, there was a strong relationship between the scores assigned across timing conditions, with correlations ranging from r = 0.69 (20-minute versus five-minute questions) to r = 0.72 (10-minute versus five-minute questions). We did, however, find that candidates received higher mean scores the more time they had to complete their responses. Considering just the novel questions, participants received a mean score of 4.48 when given 20 minutes per question (95% CI = 4.42–4.54), 4.20 when given 10 minutes (95% CI = 4.14–4.26), and 4.04 when given five minutes (95% CI = 3.99–4.10; F = 174; P < .001). The time allowed interacted with rater type: the difference between the five- and 10-minute conditions was nonsignificant when students or community members provided the ratings but significant when faculty served as assessors.
Performance on “repeated” questions
Examining candidates’ performance on the two questions that were repeated during the on-site ABS (i.e., those that were directly drawn from the noncontrolled ABS submission) revealed that these scores were more strongly predicted by candidates’ performance on the six novel questions presented on-site (r = 0.72) than by candidates’ initial performance ratings on the same questions completed off-site (r = 0.16). That said, candidates did receive higher mean scores on the repeated questions completed on-site (4.49; 95% CI = 4.43–4.55) relative to the novel questions completed on-site (4.20; 95% CI = 4.14–4.26; F = 159; P < .001).
Faculty tended to assign higher scores to ABS submissions (4.51; 95% CI = 4.44–4.58) than did students (4.12; 95% CI = 4.05–4.20; F = 33; P < .001). Community members’ mean scores lay between the two (4.39; 95% CI = 4.31–4.48). Rater type and question novelty did not interact (F = 1.7; P > .15), indicating that the difference in on-site performance between novel and repeated questions was consistent regardless of which rater group assigned the ratings.
Criterion validity check
Finally, the correlation between MMI performance and ABS was considered. Scores assigned to candidates’ off-site ABS correlated poorly with the MMI (r = 0.12), whereas on-site ABS scores were moderately predictive of MMI performance (r = 0.30).
Assessment of noncognitive characteristics for medical student selection is an integral component of the admissions process at the Michael G. DeGroote School of Medicine at McMaster University, as it is at many other medical schools. To this end, candidate-written submission materials have been a long-standing component of the admissions process. Yet their use is not without controversy. Of particular concern is the validity with which these materials reflect the candidates themselves, because they are not always completed independently. As medical education has increasingly emphasized the noncognitive aspects of medical practice, such as professionalism, it becomes increasingly important to send candidates the right message from their first contact with the medical school: that the institution values noncognitive characteristics and strives to use appropriate selection strategies for that purpose.
Clearly, the spirit is willing, but the tests are weak. Candidates’ ABS scores were higher when obtained under noncontrolled conditions than under controlled conditions. Of greater concern, the two sets of scores were uncorrelated with one another, and only the controlled, on-site scores were correlated with the MMI. The difference in mean performance between on-site and off-site conditions could be attributed simply to the amount of time available for response refinement, but that hypothesis is discounted by the finding that, in the controlled condition, performance on repeated ABS questions was more strongly related to performance on novel questions answered in the same controlled environment than to the ratings candidates originally received for the very same questions within their noncontrolled submission. In essence, performance on the initial ABS screen under normal, noncontrolled conditions was unrelated to performance in the on-site controlled condition. Unreliability of the test does not account for this finding, because the disattenuated correlations remained relatively low.
This disconnect between off-site and on-site performance is concerning because it attests either to poor ABS methodology or, perhaps, to intentionally fraudulent behavior on the part of candidates.10 Academic dishonesty, such as receiving substantial unpermitted help on an assignment, is significantly more likely to occur when the prospect of being reported is low,11 a condition certainly met when the ABS is completed off-site in a noncontrolled manner. One may therefore ask whether we should continue to employ the ABS as a screening tool. Screening for noncognitive characteristics has been a tradition at McMaster University since the medical school’s inception, and eliminating it would require a major shift in the ethos of McMaster’s admissions process, because preinterview selection would then depend entirely on uGPA. Still, the status quo cannot remain.
How, then, can we begin to modify the ABS methodology? Our first step was to try to improve its reliability.9 We have demonstrated that reliability can be improved with a new scoring approach in which raters read across questions rather than across applicants. The current study further suggests that (a) completion under noncontrolled conditions is a major threat to the validity of the ABS, and (b) allowing three months to complete the ABS harms the tool’s reliability relative to much shorter time allotments (measured in minutes). In essence, the current ABS protocol resembles an open-book examination in which one has an ample but finite time frame to gather whatever resources one considers necessary. Some applicants may consider it appropriate to use multiple resources, whereas others may not.
Our findings suggest that the psychometric properties of the assessment improve when the submission time frame is limited. Applicants certainly require sufficient time to complete the overall application procedure, but three months is not necessary for ABS submission. Three months provides more than ample time for multiple individuals to guide candidates with their submissions or (in a small minority of instances, as reported by Albanese et al8) for candidates to purchase professional services. The three-month time frame may be a holdover from the dictates of mail service; with Internet-based technologies, consideration of a new, time-sensitive ABS submission procedure is warranted.
We cannot determine from this study, however, whether a nonsupervised, Internet-based, time-limited ABS protocol would improve the psychometric properties of the ABS relative to the current protocol. Indeed, we view the inability to separate the effect of supervision from the effect of stringent time limits as the primary limitation of this study. It is also possible that the correlations between the on-site and off-site ABS submissions were so low because the two tools simply measure different aspects of noncognitive performance. Were that the case, however, we would still be forced to conclude that the controlled ABS provided the better assessment, because (a) the on-site, controlled ABS correlated more highly with the MMI than did the off-site, noncontrolled ABS, and (b) previous work has shown the off-site ABS to be uncorrelated with other outcome measures of interest.4,12 This study’s findings do not solve the problems associated with the ABS screening tool, but they should raise awareness of the fallibility of existing screening tools such as the ABS.