Purpose. Medical College Admission Test (MCAT) examinees with disabilities who test with accommodations receive flagged scores indicating nonstandard administration. This report compares the characteristics and performance of MCAT examinees who received accommodations with those of standard examinees.
Method. Aggregate history records of all 1994–2000 MCAT examinees were identified as flagged (2,401) or standard (297,880), then further sorted by race/ethnicity (broadly identified as underrepresented minority and non-URM, at the time of testing) and gender. Those with flagged scores were also classified by disability (LD = learning disability, ADHD = attention deficit hyperactivity disorder, LD/ADHD = learning disability and attention deficit hyperactivity disorder, and Other = other disability) and type of accommodation. Mean MCAT scores were calculated for all groups. A group of 866 examinees took the MCAT first as a standard administration and subsequently with accommodations. In a separate analysis, their two sets of scores were compared.
Results. Fewer than 1% of examinees (2,401) had accommodations; of these, 55% were LD, 17% ADHD, 5% LD/ADHD, and 23% Other. Extended time was the most frequently provided accommodation. Mean flagged scores slightly exceeded mean standard scores on all MCAT sections. Examinees who retook the MCAT with accommodations after a standard administration increased their scores by six points, roughly quadruple the average gain of a Standard–Standard retest cohort in another study.
Conclusion. The small but statistically significant advantage in flagged scores may reflect either appropriate compensation or overly generous accommodations. Extended time had a positive impact on the scores of those who retested with this accommodation. The validity of the flagged MCAT in predicting success in medical school is not known, and further investigation is underway.
Dr. Julian is assistant vice president and director of the Medical College Admission Test, and Dr. Etienne is director, Medical College Admission Test research, Association of American Medical Colleges, Washington, DC; Dr. Ingersoll is educational resources coordinator, Office of Student Affairs, and Dr. Hilger is health professions advisor and associate professor, Division of Clinical Laboratory Science, The University of North Carolina at Chapel Hill School of Medicine, Chapel Hill, North Carolina.
Correspondence and requests for reprints should be addressed to Ellen R. Julian, PhD, Assistant Vice President and Director of the Medical College Admission Test, Association of American Medical Colleges, 2450 N Street, NW, Washington DC 20037; telephone: (202) 828-0692; fax: (202) 828-4799.
To a large extent, admission to medical school is contingent upon undergraduate grade point average and Medical College Admission Test (MCAT) scores.1 A student who does not meet predetermined criteria may be excluded from the pool of applicants, regardless of additional desirable application credentials.
Examinees with learning disabilities (LD) and attention deficit hyperactivity disorder (ADHD) may have difficulty achieving MCAT scores that reflect their potential for success in medical school when they take the examination under standard conditions. The Association of American Medical Colleges (AAMC), which administers the MCAT, evaluates requests for nonstandard testing conditions (accommodations) based on documentation of a disability and approves appropriate accommodations for MCAT examinees who meet the Americans with Disabilities Act of 1990 definition of “disabled.”
When accommodations alter the conditions under which the MCAT is taken, the examinee's scores are “flagged” with an asterisk, indicating that the test was administered under nonstandard conditions. Complete information about MCAT accommodations can be found at 〈http://www.aamc.org/students/mcat/about/ada2003.pdf〉. Such accommodations include extended time, use of an intermediary to read the test or record answers, the use of a device or aid prohibited to the general testing population, and “stop-the-clock” breaks within test sections. Other accommodations (e.g., special seating and extra bathroom breaks) do not result in flagged scores because they do not alter the testing conditions. Medical school admissions officers receive no information about the specific reason for or the nature of an accommodation when they receive examinees’ scores that have been flagged.
Concerns exist about whether accommodations truly allow examinees to compensate for a disability; whether flagged MCAT scores stigmatize examinees and, thus, adversely affect their admission to medical school; and whether MCAT scores attained with accommodations are valid predictors of success in medical school to the same extent as standard scores. Some speculate that students with LD and/or ADHD are unlikely to survive the academic rigors of medical school and the medical profession.2,3 A thorough review of the psychometric, legal, and social policy issues involved in testing students with disabilities can be found in Pitoniak and Royer.4 The power of the MCAT taken under standard conditions to predict success in medical school has been well documented,5–8 but research is lacking regarding the predictive success of the MCAT taken with accommodations.
A wide range of testing programs, including the Graduate Record Examination® (GRE), the ACT™ Assessment, the SAT®, and the Graduate Management Admission Test (GMAT®), have discontinued flagging for the reasons noted previously. The Law School Admission Council® (LSAC), however, has recently decided to continue flagging the Law School Admission Test (LSAT®) because of research showing that accommodations affect the validity of the test scores.9 The AAMC is considering whether to continue flagging scores obtained when specific accommodations alter testing conditions.
This report focuses on the MCAT scores for the pool of MCAT examinees from 1994–2000 who took the test with accommodations and compares these flagged scores with those of examinees who took the test under standard conditions (standard examinees with standard scores). Because data were obtained from MCAT applications, which do not contain grade point averages or test scores such as intelligence quotients or SATs, no independent covariates were available with which to compare MCAT scores. The purpose of this report was to delineate similarities and differences between examinees with flags and those without.
Prior to conducting this research, the researchers presented the research proposal to the Institutional Review Board (IRB) of the University of North Carolina at Chapel Hill. The IRB determined that the proposed research met the criteria for an exemption of review.
Data for all 300,281 examinees who took the MCAT within a seven-year period (1994–2000) were analyzed. The scores, gender, and race/ethnicity (defined broadly as underrepresented minority and non-underrepresented minority) of the 2,401 examinees who took the MCAT with accommodations (indicated by flagged scores) were compared with those of the 297,880 examinees who took the MCAT under standard conditions. The type of documented disability and the accommodation(s) approved were noted. Descriptive statistics were used to analyze both the demographics of the examinees and their MCAT scores.
Examinees provide demographic information when they register to take the MCAT. Accommodations data, including a description of the disability and approved accommodations, were retrieved from records of approved accommodation requests. Performance information was taken from AAMC databases. Data were matched using unique and confidential numerical identifiers to construct data files consisting of a single record for each examinee. For examinees who took the MCAT multiple times between 1994 and 2000, the most recent scores were used unless otherwise noted. There were several reasons for using most recent scores. Some examinees, against the advice of their premedical advisers, still take the test with little or no preparation as an initial practice run. Furthermore, most medical school admissions committees place more weight on the most recent score. And, finally, our assumption was that the most recent score satisfied the examinee (after which he or she discontinued efforts to improve or was admitted). Examinees were included in either the standard or flagged group based on whether their most recent examination was standard or flagged. A small group of examinees (88) whose most recent examination placed them in the standard group had a flagged examination in their past. Because of the ambiguity about which group was more appropriate for them, they were omitted from this study. Another larger group (866) had accommodations on their most recent MCAT but also had a standard administration in their past. A separate analysis of the difference between their earlier standard and later accommodated examinations was conducted.
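The group-assignment rule described above can be sketched in a few lines of Python. This is an illustration only, not the procedure actually used to build the study files; the function name assign_group and the representation of an examinee's history as a chronological list of booleans (True for a flagged administration) are assumptions made for the example.

```python
from typing import List, Tuple


def assign_group(flagged_history: List[bool]) -> Tuple[str, bool]:
    """Assign an examinee to a study group from a chronological test history.

    flagged_history holds one entry per MCAT administration, oldest first
    (hypothetical layout); True means the administration was flagged.
    Returns (group, in_standard_accommodated_analysis).
    """
    if not flagged_history:
        raise ValueError("examinee has no test administrations")

    most_recent_flagged = flagged_history[-1]
    earlier = flagged_history[:-1]

    if most_recent_flagged:
        # Flagged group. Examinees with an earlier standard administration
        # also enter the separate Standard-Accommodated comparison.
        return "flagged", any(not flagged for flagged in earlier)

    if any(earlier):
        # Most recent examination is standard but an earlier one was flagged:
        # group membership is ambiguous, so the examinee is omitted.
        return "omitted", False

    return "standard", False
```

Under this sketch, a history of [False, True] lands in the flagged group and in the separate retest analysis, whereas [True, False] is omitted from the study.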
On the MCAT, two of the multiple-choice question (MCQ) test sections, Biological Sciences (BS) and Physical Sciences (PS), are scored from one to 15, and the third MCQ section, Verbal Reasoning (VR), is scored from one to 13. To simplify data presentation, we summed these three scores to create an MCQ total score ranging from three to 43. Using this MCQ total score in our analysis is supported by the high correlation among the three scores that make up the total. The correlation between the BS scores and the PS scores is approximately .80. The BS and PS scores have correlations of .64 and .60, respectively, with the VR scores. The MCQ total score was added to MCAT score reports beginning in 2003.
The MCAT's Writing Sample (WS) scores are letters ranging from “J” (the lowest) to “T” (the highest). For analysis, these letter scores were converted to a numeric scale ranging from 1 to 11, with the unscorable “X” receiving a zero. Writing Sample scores, because of their low correlation with the other MCAT test section scores (typically around .35), were not combined with the MCQ scores.
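The two score transformations just described can be expressed compactly. The following Python sketch is illustrative; the names mcq_total, ws_to_numeric, and WS_SCALE are hypothetical, and the mapping assumes that the Writing Sample letters run consecutively through the alphabet from J to T, which is consistent with the 11-point scale described above.

```python
# Writing Sample letters J (lowest) through T (highest) map to 1..11.
# Assumption: the letters are consecutive in alphabetical order.
WS_SCALE = {letter: value for value, letter in enumerate("JKLMNOPQRST", start=1)}
WS_SCALE["X"] = 0  # unscorable sample


def mcq_total(bs: int, ps: int, vr: int) -> int:
    """Sum the three multiple-choice section scores (possible range 3 to 43)."""
    if not (1 <= bs <= 15 and 1 <= ps <= 15):
        raise ValueError("BS and PS scores must be between 1 and 15")
    if not 1 <= vr <= 13:
        raise ValueError("VR score must be between 1 and 13")
    return bs + ps + vr


def ws_to_numeric(letter: str) -> int:
    """Convert a Writing Sample letter score to the numeric scale."""
    return WS_SCALE[letter]  # raises KeyError for an unrecognized letter
```

For example, section scores of 9, 8, and 9 give an MCQ total of 26, and a Writing Sample score of "O" converts to 6.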
A total of 300,281 examinees took the MCAT from 1994–2000, including 2,401 (.8%) receiving accommodations (see Table 1). Among the accommodated examinees were 1,320 (55%) with LD, 409 (17%) with ADHD, and 120 (5%) with both LD and ADHD. Nearly one quarter (23%) of those receiving accommodations had other disabilities such as visual, hearing, or mobility impairments, and psychological disabilities or chronic illness. The number of accommodations granted peaked in 1997 (441); the disability documentation requirements became more stringent in 1998, and the number of examinees receiving accommodations then declined to a low of 265 in 2000.
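The percentages above follow directly from the reported counts, as the short Python check below illustrates. The count for the Other category is not stated in the text; here it is derived by subtraction from the 2,401 accommodated examinees, and the function name category_shares is hypothetical.

```python
def category_shares(counts: dict, total: int) -> dict:
    """Return each category's share of the total as a whole-number percentage."""
    return {name: round(100 * n / total) for name, n in counts.items()}


ACCOMMODATED = 2401
ALL_EXAMINEES = 300281

counts = {"LD": 1320, "ADHD": 409, "LD/ADHD": 120}
# Derived, not reported: everyone not in the three named categories.
counts["Other"] = ACCOMMODATED - sum(counts.values())

shares = category_shares(counts, ACCOMMODATED)
accommodated_pct = round(100 * ACCOMMODATED / ALL_EXAMINEES, 1)
```

With these inputs, shares reproduces the 55%, 17%, 5%, and 23% figures, and accommodated_pct rounds to .8%.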
More men than women took the MCAT from 1994–2000 (53% and 47%, respectively), and a greater percentage of examinees with accommodations were men (57% were men and 43% were women). The percentages of underrepresented minorities in the standard (11.7%) and flagged (11.9%) groups were similar.
The most commonly provided accommodations were extended time and separate room (97% and 97.5% of all accommodated examinees, respectively). The use of a separate room may have been approved, although not explicitly requested, when a separate room was procedurally necessary to implement extended time or other accommodations.
Accommodated examinees achieved higher mean scores than standard examinees on all sections of the MCAT (see Table 2). The difference, though small, was statistically significant, as indicated by the effect sizes reported below. Examinees with ADHD attained the highest means, followed by examinees with LD, other disabilities (Other), both LD and ADHD, and standard examinees. These results are consistent across all years of the study.
A total of 866 examinees took the MCAT first as a standard administration and subsequently took the MCAT again with accommodations (Standard–Accommodated group). Their mean scores for the two test administrations are shown in Table 3. On their initial (standard) administration, the mean MCQ total score for this group was 19.9, and on their subsequent (accommodated) administration, the mean MCQ total was 26.1. The gain of 6.2 points was associated with an effect size of 1.4 (the difference between the means of the standard-administration MCAT scores and the accommodated-administration MCAT, divided by the standard deviation of the standard-administration MCAT scores). An effect size of .3 or greater is generally considered to indicate a significant practical difference between two sets of scores.10
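The effect-size definition given in parentheses above can be written out as a small calculation. In the Python sketch below, the function names are hypothetical, and the standard deviation is not taken from the study: because this passage does not report it, the sketch back-calculates an approximate value from the rounded effect size of 1.4, so the result is indicative only.

```python
def effect_size(mean_standard: float, mean_accommodated: float,
                sd_standard: float) -> float:
    """Mean difference divided by the SD of the standard-administration scores."""
    return (mean_accommodated - mean_standard) / sd_standard


def implied_sd(mean_standard: float, mean_accommodated: float,
               reported_effect_size: float) -> float:
    """Back-calculate the SD implied by a reported (rounded) effect size."""
    return (mean_accommodated - mean_standard) / reported_effect_size


# Means reported for the Standard-Accommodated group (MCQ total).
gain = 26.1 - 19.9                      # about 6.2 points
sd_estimate = implied_sd(19.9, 26.1, 1.4)  # roughly 4.4, an approximation
```

An effect size computed with a standard deviation near 4.4 returns approximately 1.4, matching the value reported for this group.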
Koenig and Leger5 studied the scores of examinees who took the MCAT twice (Standard–Standard; 12,111) and, using their data, we extrapolated that the mean gain in MCQ total score was 1.7 on the second MCAT (see Table 3), a difference equivalent to an effect size of .4. Although the sizes of the Koenig and Leger study group and our subsample differ, it is interesting to note that the effect size for the Standard–Accommodated MCQ total score difference in our study was almost 3.6 times greater than that for the Koenig and Leger study. Standard–Accommodated examinees scored lower on the VR section than on the other MCQ sections, but when retested, they made the greatest gains on the VR section. Compared with the Koenig and Leger study's findings, the Standard–Accommodated gain on the VR section was four times that of the Standard–Standard gain. On the WS, Standard–Accommodated examinees gained 4.6 times more than Standard–Standard examinees.
Our study shows that examinees who received accommodations had higher mean MCAT scores than did examinees who sat for the standard administration of the examination. The slightly higher scores may reflect either appropriate compensation or overly generous accommodation.
By far, the accommodation most frequently granted during the study period was extended time (and almost all examinees who used extended time also received a separate room). One might expect that higher scores would result when the MCAT is taken with extended time. Indeed, the literature on extended time for course examinations for postsecondary students reveals that, although most students are able to raise their scores if given additional time, students with LD make significantly greater gains than those without documented disabilities.11,12 Our study supports these findings.
In our study, the impact of standard time versus extended time on the test may be observed in the score increases among the subset of Standard–Accommodated examinees. Their phenomenal average gain of 6.2 MCQ points is over four points higher than the expected gain seen in Standard–Standard examinees of the Koenig and Leger study.5 Although our study found the mean MCQ total score for accommodated examinees was higher than that of standard examinees (26.3 and 25.1, respectively), the practical significance of the difference between these means, when viewed as admission criteria or a predictor of success in medical school, may be negligible.
Because no related covariate was available to adjust for differences in initial ability between the accommodated and standard examinees, we offer several possible explanations for the differences in flagged and standard scores. First, the difference in mean MCQ scores of flagged and standard examinees may indicate that the accommodations permitted examinees to compensate for their disabilities and demonstrate that they have slightly more academic ability than standard examinees as measured by the MCAT. Another explanation could be that examinees with disabilities, having succeeded for years in meeting academic challenges and identifying effective compensatory strategies, are more cognizant of and introspective about their strengths, weaknesses, talents, and limitations and, therefore, may be more adept at realistically assessing their educational experiences, abilities, and potential. If such is the case, prospective examinees with disabilities on the lower end of the academic spectrum may opt out of pursuing a medical career at a higher rate than similarly qualified standard examinees. This would result in higher mean MCAT scores for the accommodated examinees. Standard examinees with mediocre academic records may be less aware of their abilities and weaknesses and may take the MCAT hoping against the odds for favorable outcomes but achieving scores that lower the mean for the standard examinees. Finally, extended time may have been an overly generous accommodation resulting in artificially inflated scores for some examinees. Perhaps extended time reduces anxiety about time limits and contributes to the phenomenal gain in MCQ total scores, or simply allows accommodated examinees ample time to review test items, time that is not available to the standard examinees.
The MCAT historically has been perceived to be a “speeded” test (i.e., most examinees experience some concern about being able to finish all test items and believe their performance would improve with additional time). Does removing or reducing speededness fundamentally alter the validity of the MCAT? This question is currently being debated and research is pending. Nevertheless, provision of equitable testing conditions for all examinees might be achieved by carefully assessing specific abilities of examinees with disabilities to more precisely measure the functional limitations of each examinee in the context of taking high-stakes speeded tests. Ofiesh and Hughes11 propose that this might be possible by analyzing specific subtests (e.g., processing speed and reading rate) as reported in examinees’ required documentation of disability. Rather than defaulting to time-and-a-half and double-time, as is current practice, extended time as an appropriate accommodation could be calculated in more precise and individually appropriate intervals. The goal would be for all examinees, both with and without disabilities, to experience equivalent degrees of speededness; however, accounting for variations in working speed among standard examinees would be difficult if not impossible.
National testing agencies have been dealing with the controversy of whether to flag nonstandard test administrations. Recently, a blue-ribbon panel convened to consider issues relating to the flagging of scores on College Board standardized tests administered with extended time. The majority position was to discontinue the practice of flagging the SAT I, in part because the College Board “does not consider the SAT to be a measure of speed. …”13 The AAMC is currently investigating whether speed of processing written material is an element of performance on the MCAT, and whether it should be. Where speed is a component of what tests are attempting to assess, the testing industry and disabilities specialists need to work together to advance the science of assigning individually appropriate accommodations so that flagging will be unnecessary.
This report documents a significant difference in MCAT scores of examinees who take the test first under standard conditions and again with extended time. We also found a statistically significant but small difference between scores of all standard and accommodated examinees. Two important questions remain: are examinees with flagged scores admitted to medical school at a rate comparable to examinees with standard scores, and do examinees with flagged scores progress through medical school and into the medical profession as well as examinees with standard scores? In future studies addressing these questions, it will be possible to correlate MCAT scores with other data on medical school applicants, their undergraduate grade point averages, academic difficulties in medical school, and scores on the national medical licensure examinations.
The authors gratefully acknowledge Rebecca Johnson for her substantial contributions to this project. This research was supported in part by a grant from the ACLD Foundation, an arm of the Learning Disabilities Association of America.
1. Scotti MJ. Medical school admission criteria: the needs of patients matter. JAMA. 1997;278:1196–7.
2. Hafferty FW, Gibson GG. Learning disabilities and the meaning of medical education. Acad Med. 2001;76:1027–31.
3. Hafferty FW, Gibson GG. Learning disabilities, professionalism, and the practice of medical education. Acad Med. 2003;78:189–201.
4. Pitoniak MJ, Royer JM. Testing accommodations for examinees with disabilities: a review of psychometric, legal, and social policy issues. Rev Educ Res. 2001;71:53–104.
5. Koenig JA, Leger KF. A comparison of retest performances and test-preparation methods for MCAT examinees grouped by gender and race-ethnicity. Acad Med. 1997;72(10 suppl):S100–S102.
6. Wiley A, Koenig JA. The validity of the Medical College Admission Test for predicting performance in the first two years of medical school. Acad Med. 1996;71(10 suppl):S83–S85.
7. Mitchell KJ. Traditional predictors of performance in medical school. Acad Med. 1990;65:149–58.
8. Swanson DB, Case SM, Koenig J, Killian CD. Preliminary study of the accuracies of the old and new Medical College Admission Tests for predicting performance on USMLE Step 1. Acad Med. 1996;71(10 suppl):S25–S27.
9. Thornton AE, Reese LM, Pashley PJ, Dalessandro SP. Predictive Validity of Accommodated LSAT Scores. Law School Admission Council Technical Report 01-01, May 2002.
10. Wolf RM. Evaluation in Education: Foundations of Competency Assessment and Program Review. New York: Praeger Publishers, 1990.
11. Ofiesh NS, Hughes CA. How much time? A review of the literature on extended test time for postsecondary students with learning disabilities. J Postsecondary Educ Disability. 2000;16:2–16.
12. Ofiesh NS, Kroger S, Funckes C. Use of processing speed and academic fluency test scores to predict the benefit of extended time for university students with reading-based learning disabilities. Presented at the 25th meeting of the Association on Higher Education and Disability, Washington, DC, July 8–12, 2002.
13. Gregg N, Mather N, Shaywitz S, Sireci S. The Flagging of Test Scores of Individuals with Disabilities Who Are Granted the Accommodation of Extended Time: A Report of the Majority Opinion of the Blue Ribbon Panel on Flagging. Princeton, NJ: Educational Testing Service, 2002.