The medical school admission process involves recruiting, evaluating, and accepting applicants to medical school. During the evaluation process, admission officers review a wide variety of applicant data related to academic preparation, personal attributes, and extracurricular experiences to assess applicants’ strengths and determine their likelihood of success in medical school. In determining whom to accept, admission officers must weigh these data in relation to the institution’s mission, goals, and diversity interests.
One important source of academic preparation data is the Medical College Admission Test (MCAT), a standardized examination that assesses fundamental knowledge of scientific concepts, critical reasoning ability, and written communication skills. MCAT scores are the only common metric of academic preparedness on which medical school applicants can be compared because the meaning of undergraduate grade point averages (UGPAs) can vary by field, course, and undergraduate institution.1,2
It is vital, therefore, that tests used in high-stakes decisions, such as medical school admissions, be subjected to the highest levels of scrutiny. Professional testing standards call for an ongoing program of validation to collect evidence about such tests’ validity, reliability, and fairness, among other things.3
Medical college admission officers, as the primary users of MCAT scores, recognize how critical it is that the MCAT exam produce valid and reliable information about applicants’ academic preparedness and that scores be used appropriately in the admission process. They have a stake in ensuring that MCAT content is relevant, that the exam predicts important medical school outcomes, and that the test does not unfairly disadvantage any applicant group.
The importance of these considerations is heightened in the context of selecting a diverse student body. Diversity, including but not limited to racial and ethnic diversity, is widely viewed as an important goal by admission committees. Admitting a heterogeneous student body may help the medical school achieve its distinct missions and goals, such as serving underserved communities or populations. A diverse student body has been linked to important benefits in medical school and beyond, including improved teaching4 and learning5 and strong attitudes about the importance of equitable access to care.6 Further, increasing the diversity of medical students increases the likelihood that future physicians will be prepared to care for a diverse and global patient population, as well as to serve communities with disparate health care needs.6
In this article, we examine evidence culled from extant data and primary research findings about the use of the MCAT scores of white, black, and Latino medical school applicants. This research was conducted as part of the fifth comprehensive review (MR5) of the MCAT exam, the current version of which was introduced in 1991. The Association of American Medical Colleges (AAMC) convened the MR5 Committee to evaluate the current exam and make recommendations for a new version. The committee reviewed research related to medical school admissions, racial and ethnic group differences in academic achievement, bias in testing, stereotype threat, test speededness, and the predictive value of MCAT scores. Primary analyses were also conducted, using data from the total population of MCAT examinees, from the subset of examinees who applied to medical school, and from the more restricted groups of examinees who received offers of acceptance from one or more medical schools and of examinees who ultimately matriculated.
Here, we present our analysis of primary and extant data sources, organized around four issues central to the question of whether racial and ethnic group differences in MCAT performance reflect test bias. First, we evaluate how mean scores on the MCAT differ for racial and ethnic groups, and how these differences compare to differences on other widely used admission exams. Second, we examine whether MCAT scores exhibit bias in their prediction of subsequent outcomes—specifically, graduation from medical school and United States Medical Licensing Examination (USMLE) Step 1 performance. Third, we explore possible explanations for mean differences on MCAT scores, such as family, neighborhood, and school conditions that may limit an applicant’s opportunities to achieve his or her potential. Finally, we consider whether the MCAT exam acts as a barrier to medical school admission for underrepresented minorities (URMs).
To examine these issues, we extracted data from the data warehouse maintained by the AAMC, which contains records of MCAT examinees, applicants to U.S. medical schools, and students who have matriculated at U.S. medical schools. The AAMC assigns a unique personal identifier to each person’s data, which allows researchers to link deidentified records for analysis. We limited our study to data for individuals who self-reported their race/ethnicity as white, black, or Latino because data for other URM groups were based on small sample sizes insufficient for the analyses. Preliminary analyses were conducted during 2008–2011, and we finalized our results in May 2012. This study was approved by the institutional review board of the American Institutes for Research as part of the MCAT program’s psychometric research protocol.
Question 1: Do White, Black, and Latino Applicants Differ in Mean MCAT Scores?
The MCAT exam has been shown to be a useful predictor of selected benchmarks of success in medical school, both when considered alone and in combination with other academic credentials.7–9 Evidence from a wide variety of sources has demonstrated that well-developed standardized tests of knowledge, skill, and achievement, including the MCAT exam, are valid predictors of performance in a variety of employment and academic settings.10,11
Although the MCAT exam has been shown to be an effective predictor of performance in medical school, differences have been observed between the mean MCAT scores of examinees in different racial and ethnic groups. Prior research has reported, for example, that mean MCAT scores are lower for black and Latino students than for white students.8 Using data for white, black, and Latino examinees who tested during 2009, we calculated mean MCAT total and section scores, as well as standardized mean differences (ds) to facilitate interpretation of between-group differences. A d of 0 reflects no difference in mean scores between groups, whereas a positive d reflects a majority mean that is higher than the minority mean, and a negative d reflects a minority mean that is higher than the majority mean. In terms of magnitude of differences, a d of 0.2 is small, 0.5 is medium, and 0.8 is large.12 As reported in Table 1, we found large standardized mean differences in MCAT total scores (white–Latino = 0.8, white–black = 1.0) and medium to large differences in MCAT section scores that are consistent with differences in MCAT scores from other years.8
Table 1: MCAT Scores for 2009 Examinees and Undergraduate GPAs for Medical School Applicants to the 2010 Matriculating Class, Means and Standardized Mean Differences (ds) by Racial and Ethnic Group*
We also compared the mean UGPAs of white, black, and Latino medical school applicants who applied for admission to the 2010 matriculating class. The large white–black difference (d = 0.9) is similar in magnitude to that in mean MCAT total scores, whereas the medium white–Latino difference (d = 0.5) is somewhat smaller than that in MCAT total scores.
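The standardized mean difference used in this section can be illustrated with a short sketch. It assumes the common pooled-standard-deviation form of Cohen's d, which the text does not spell out; the function names and the "negligible" label for values below 0.2 are illustrative choices, and the scores in the usage note are made up rather than taken from Table 1.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(majority, minority):
    """Majority mean minus minority mean, divided by the pooled standard deviation.

    Assumption: pooled-SD form of Cohen's d. A positive value means the
    majority mean is higher; a negative value means the minority mean is higher.
    """
    n1, n2 = len(majority), len(minority)
    pooled_variance = (
        (n1 - 1) * variance(majority) + (n2 - 1) * variance(minority)
    ) / (n1 + n2 - 2)
    return (mean(majority) - mean(minority)) / sqrt(pooled_variance)


def describe_magnitude(d):
    """Label |d| using the benchmarks quoted in the text (0.2, 0.5, 0.8)."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"  # illustrative label; the text names no category below 0.2
```

For example, with the made-up score lists `[2, 4, 6]` and `[1, 3, 5]`, both groups have a sample variance of 4, so the pooled standard deviation is 2 and d is (4 − 3) / 2 = 0.5, which the benchmarks above label as medium.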
The mean differences we found in 2009 MCAT scores mirror differences on other standardized tests used for a variety of educational purposes.10,13–20 Table 2 presents ds for several graduate admission exams including the Graduate Record Examination (GRE), the Graduate Management Admission Test (GMAT), and the Law School Admission Test (LSAT), all of which have been shown to have racial/ethnic differences in test performance comparable to (or greater than) those on the MCAT exam.13,15,19 As this table illustrates, the performance differences on graduate admission exams are also similar to those seen on undergraduate admission tests14,15,19 and on measures of earlier achievement in kindergarten through high school.19
Table 2: White–Black and White–Latino Standardized Mean Differences (ds) on Different Types of Admission and School Exams
It is critical, however, to recognize that mean differences in MCAT scores do not provide a complete picture of black and Latino examinees’ performance on the MCAT exam compared with that of white examinees. Specifically, substantial overlap exists in the distribution of MCAT scores for these three groups of examinees (Figure 1).
Figure 1: The distribution of Medical College Admission Test (MCAT) total scores* for white, black, and Latino examinees who tested in 2009. The box-and-whisker plots show the scores associated with the 10th, 25th, 50th (median), 75th, and 90th percentiles for each group. The most recent score in 2009 was used for repeat examinees. Source: AAMC Data Warehouse: Examinee File, accessed January 27, 2012. *The MCAT exam has three multiple-choice sections, each of which is scored 1 to 15. The total score reflects the sum of these three section scores and ranges from 3 to 45. The MCAT exam also includes a writing sample section, whose results are not reported here.
Question 2: Are the Mean Differences in MCAT Scores Due to Test Bias?
Numerous authors in the popular and academic press have expressed concern that the mean differences in majority and URM examinees’ performance on admission and other standardized tests could be due to test bias.10,21–24 Test bias is a fundamental concern when life-altering decisions rely in part on test scores, because it could unfairly limit access to opportunities—in the case of the MCAT exam, test bias could affect admission to medical school. Although some individuals may conclude that any test showing average differences in performance between majority and URM examinees is biased, such differences, in the absence of corroborating evidence, are insufficient to conclude that there is bias.3,25 Instead, professional testing standards compel testing experts to gather logical and empirical evidence about potential sources of these group differences using well-established (and generally agreed-on) procedures to determine whether test bias exists.3
Test bias arises “when deficiencies in a test itself or the manner in which it is used result in different meanings for scores earned by members of different identifiable subgroups.”3 Deficiencies in the test could be caused by construct-irrelevant variance, which occurs when test performance is influenced by factors, such as test content and administration conditions, that are unrelated to the knowledge and skills, or the “construct,” measured by a test. Item content is also construct-irrelevant if it draws on experiences more common to one group of examinees than another and is unrelated to the knowledge or skills being measured. Researchers evaluate whether these types of construct-irrelevant factors cause people with the same underlying skill level to earn different test scores.3,26
Two broad strategies are employed to examine the possibility of bias in the MCAT exam. First, extensive resources are devoted to preventing irrelevant test content from influencing performance and to making sure that the test administration is standardized and relies on procedures suited to the tasks being performed by the examinee. MCAT item writers, reviewers, and editors follow detailed guidelines to ensure that the content of passages and items meets test specifications. All items undergo bias and sensitivity review by experts with diverse backgrounds to identify and eliminate any features of the items that are construct-irrelevant. Items that survive the bias and sensitivity review are tried out on the MCAT exam but are not counted in examinees’ scores. Instead, examinees’ responses to these items are analyzed to determine whether the items are of appropriate difficulty and adequate reliability. Only items that survive these sensitivity and empirical reviews are used as scored items on the MCAT exam. The MCAT administration process is also standardized to ensure that scores have comparable meaning. Examinees receive the same instructions, the same amount of testing time, and the same types of computer equipment to eliminate the possibility that differences in test scores are caused by differences in administration conditions. (Examinees with appropriately documented disabilities are granted nonstandard accommodations to minimize aspects of a disability that are not related to the construct being measured.)3
Second, “differential prediction”3 analysis is used to examine whether a given MCAT score forecasts the same level of future performance regardless of the examinee’s race or ethnicity. If the MCAT exam predicts success in medical school in a comparable fashion for different racial and ethnic groups, medical students with the same MCAT score will, on average, achieve the same outcomes regardless of racial or ethnic background.25 On the other hand, if their outcomes differ significantly, test bias in the form of differential prediction exists because the prediction will be more accurate for some groups than for others. Determining whether the MCAT exam is biased in identifying who will be successful in medical school is of utmost importance because of the role that MCAT scores play in the process of selecting qualified applicants.27,28
Consider, for example, using MCAT scores to predict the likelihood of medical students’ graduating within four years of matriculation. We would look for bias in prediction by comparing the predicted and observed graduation rates for each group. If 90% of students with a given MCAT score were predicted to graduate in four years, we would expect the observed graduation rates for white and black students earning that MCAT score to be highly similar and around 90%. If the observed graduation rates were 90% for white students but 95% for black students with the same MCAT score, this would be evidence of predictive bias against black students because their four-year graduation rate was higher than predicted and also higher than that of white students earning the same score. This is an example of underprediction.
In general, if a test were to underpredict the performance of a group, the observed performance of students in that group would be higher than their predicted performance. Underprediction of URM students’ performance is important in the context of medical school admissions because if URMs perform better in medical school than their MCAT scores would suggest, they may be admitted at lower rates than they should be.
We therefore examined the differential prediction of the MCAT exam for white, black, and Latino students who matriculated at U.S. medical schools in 2000–2004 for two types of medical school outcomes: (1) passing USMLE Step 1 and (2) graduating from medical school. We used MCAT total scores to predict pass/fail status on the Step 1 exam, both at first attempt and eventually (after additional attempts until 2010), and to predict graduation status four and five years after matriculation. All outcome measures were dichotomous, coded as “1” if a student succeeded (e.g., graduated within four years) and “0” if a student did not succeed (e.g., did not graduate within four years).
We conducted logistic regression analyses to estimate the probability of success on the basis of students’ MCAT total scores (e.g., the probability that someone earning an MCAT total score of 27 will graduate in four years). We compared the predicted success rates with the observed success rates—separately for white, black, and Latino students—to examine whether the MCAT exam is biased against URM (i.e., black and Latino) students in its predictions of their performance in medical school. For each outcome measure, we conducted two identical sets of logistic regression analyses: one for black and white students and one for Latino and white students.* We conducted the analyses separately by school and then pooled the results.†
We defined prediction error as observed minus predicted success rates, so positive differences indicate that more students succeeded than predicted, whereas negative differences indicate that fewer students succeeded than predicted. For example, on the outcome of graduation within four years of matriculation, positive differences would indicate that MCAT scores underestimated black (or Latino) students’ performance in medical school because more black (or Latino) students graduated in four years than were predicted to graduate in four years on the basis of their MCAT total scores. On the other hand, zero or negative differences would indicate, respectively, that the same number of or fewer black (or Latino) students succeeded on the criterion than were predicted to succeed on the basis of their MCAT total scores, suggesting that the MCAT exam is not biased against black (or Latino) students.
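The comparison of observed and predicted success rates can be sketched in code. This is a simplified illustration, not the procedure used for Table 3: it fits a single logistic regression to one pooled set of hypothetical records instead of fitting by school and pooling the results, and the function names, the Newton's-method fitting routine, and the example records are all assumptions made for the sketch.

```python
from math import exp


def fit_logistic(scores, outcomes, iterations=25):
    """Fit P(success) = 1 / (1 + exp(-(b0 + b1 * score))) by Newton's method."""
    center = sum(scores) / len(scores)  # centering keeps the updates stable
    xs = [s - center for s in scores]
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, outcomes):
            p = 1.0 / (1.0 + exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p          # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w             # entries of the information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0 - b1 * center, b1  # intercept and slope on the raw score scale


def predicted_probability(b0, b1, score):
    return 1.0 / (1.0 + exp(-(b0 + b1 * score)))


def prediction_errors(records):
    """Observed minus predicted success rate for each group.

    records: (group, mcat_total, outcome) tuples, with outcome coded 1 or 0.
    A positive error would indicate underprediction for that group.
    """
    b0, b1 = fit_logistic([s for _, s, _ in records], [y for _, _, y in records])
    errors = {}
    for group in sorted({g for g, _, _ in records}):
        rows = [(s, y) for g, s, y in records if g == group]
        observed = sum(y for _, y in rows) / len(rows)
        predicted = sum(predicted_probability(b0, b1, s) for s, _ in rows) / len(rows)
        errors[group] = observed - predicted
    return errors


# Hypothetical records for illustration only; not AAMC data.
example_records = [
    ("A", 22, 0), ("A", 24, 1), ("A", 26, 0), ("A", 28, 1),
    ("A", 30, 1), ("A", 32, 1), ("A", 34, 1), ("A", 36, 1),
    ("B", 20, 0), ("B", 22, 0), ("B", 24, 1), ("B", 26, 1),
    ("B", 28, 0), ("B", 30, 1), ("B", 32, 1), ("B", 34, 1),
]
example_errors = prediction_errors(example_records)
```

Because the model includes an intercept, the predicted and observed success rates agree for the fitted sample as a whole, so any group-level errors offset one another; the question of interest is the sign of the error for each group.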
The results of our analysis are summarized in Table 3. For the outcome of passing Step 1 on the first attempt, 83.1% of black students were predicted to pass compared with 80.9% who actually passed. In other words, the observed pass rate for black students was 2.2 percentage points lower than the predicted rate. Similarly, the observed pass rate for Latino students was 1.6 percentage points lower than the predicted rate. The differences between observed and predicted success rates were smaller for the outcome of passing Step 1 eventually, but these analyses similarly did not show underprediction that would suggest the MCAT exam is biased against black or Latino students (–0.3% and –0.2%, respectively).
Table 3: Comparison of Observed and Predicted Success Rates on Four Measures of Medical School Performance for White, Black, and Latino Medical Students Who Matriculated at MD-Granting U.S. Medical Schools From 2000 to 2004*,†
The results for predicting graduation rates were similar. Specifically, fewer black and Latino medical students graduated in four years than were predicted to graduate (–6.6% and –4.8%, respectively). The differences between the observed and predicted percentages of students graduating in five years were smaller (–3.4% and –2.2% for black and Latino students, respectively). These results indicate that the graduation rates of black and Latino medical students were not underpredicted by the MCAT exam.
Two important trends are reflected in Table 3. First, the differences between the observed and predicted success rates are greater for passing Step 1 on the first attempt and graduation within four years of matriculation than they are for the outcome measures reflecting eventual (or later) success. As the success rates approach 100%, the differences decrease. The second, and arguably more important, finding is that these results provide no evidence that the MCAT exam is biased against either black or Latino medical students: none of the four outcomes showed these minority groups succeeding at rates greater than those predicted by their MCAT scores.
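Both trends can be checked directly against the observed-minus-predicted differences quoted in the text. The sketch below uses only those quoted values; the dictionary layout and function name are illustrative.

```python
# Observed minus predicted success rates (percentage points), as quoted in the text.
reported_differences = {
    "Step 1, first attempt": {"black": -2.2, "Latino": -1.6},
    "Step 1, eventually": {"black": -0.3, "Latino": -0.2},
    "Graduation in four years": {"black": -6.6, "Latino": -4.8},
    "Graduation in five years": {"black": -3.4, "Latino": -2.2},
}


def underpredicted(differences):
    """Return (outcome, group) pairs where the observed rate exceeded the predicted rate."""
    return [
        (outcome, group)
        for outcome, by_group in differences.items()
        for group, diff in by_group.items()
        if diff > 0
    ]


# An empty list means no outcome showed underprediction for either group.
underpredicted_pairs = underpredicted(reported_differences)
```

The list is empty because every quoted difference is negative, and within each pair of outcomes the earlier measure (first attempt, four years) shows the larger absolute difference.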
Our findings are consistent with past studies on the differential prediction of the MCAT exam and other standardized tests used for college and graduate school admissions, which have shown no statistically significant predictive bias against minority students.8,26,29–31 Confirming that the current MCAT exam is not biased against black and Latino applicants was important as the MR5 Committee sought to identify changes that would improve the exam’s value in identifying the applicants who are the most likely to succeed in medical school.
Question 3: What Might Explain Mean Differences in MCAT Scores Across Groups?
If the MCAT exam is not biased, what other factors could be contributing to differences in MCAT scores? The environments and experiences of youth raised in the United States vary in innumerable ways, as reflected in social class and economic status; rural, suburban, or urban environments; variations in racial and ethnic diversity in communities, neighborhoods, and schools; and access to resources and educational opportunities, to name a few. Some of these environments have been shown to contribute—positively or negatively—to academic achievement, meaning that exposure to some conditions may maximize one’s potential for achievement, whereas exposure to other conditions may limit one’s potential.32–35 None of these factors works in isolation; rather, it is likely that both positive and negative conditions accumulate, taking shape in different ways for different people.
Family and neighborhood characteristics, educational factors, and geographic conditions all have been shown to correlate with academic achievement gaps spanning kindergarten through high school and college; these gaps, in turn, have been shown to vary systematically by racial and ethnic group status.32–35 Table 4 presents a small sample of the various factors that correlate with the academic achievement gaps among racial and ethnic groups. The risk factors analyzed in these prior studies suggest that, compared with white students, black and Latino students generally are more likely to be exposed to school, family, and environmental conditions that may reduce their potential for academic achievement and are less likely to be exposed to positive factors. For example, black and Latino third-graders are more likely than white third-graders to report having changed schools three or more times since the first grade32 and to live in food-insecure households.34 Conversely, the parents of white students in grades K to 12 are more likely to volunteer or serve on a committee at their child’s school than are the parents of black and Latino students.34 Black and Latino students are more likely to experience living in poverty and attending low-quality day care,35 whereas white students are more likely to be read to every day by a family member.34
Table 4: Factors Affecting the General Population Related to Academic Achievement Gaps Between Racial and Ethnic Groups*
The prevalence of the environmental conditions reported in Table 4 for the general U.S. population may not reflect the conditions experienced by medical school examinees, applicants, or matriculants, however. Recent research suggests that medical students are more likely to come from families earning incomes that are higher than those of families in the general population36 and that white medical students’ parental education levels are likely to be higher than those of URM medical students.37 For example, in each year from 1987 to 2005, less than 6% of medical students had parental incomes in the bottom fifth of U.S. household incomes, whereas 48% of medical students had parental incomes in the highest fifth.36,38 We therefore explored whether, within the select group of persons interested in medical school, whites, blacks, and Latinos varied in their exposure to certain conditions that could influence their achievement on the MCAT exam.
Table 5 presents selected indicators pertaining to parental education and income, along with undergraduate education indicators and MCAT preparation activities for whites, blacks, and Latinos who, in 2010, took the MCAT exam, applied to medical school, or matriculated. These data suggest that although medical schools disproportionately attract individuals from better economic circumstances than those of the general population, differences exist in the conditions experienced by whites, blacks, and Latinos who are interested in pursuing a medical degree. We found that, compared with white applicants, black and Latino applicants were less likely to have at least one parent who earned a college or graduate degree and more likely to have no parent with a college degree. Black and Latino applicants were also more likely than white applicants to have been raised in families with lower household incomes or in single-parent households and to qualify for the fee assistance programs provided by the AAMC.
Table 5: Profiles of White, Black, and Latino MCAT Examinees, Medical School Applicants, and Matriculants at U.S. Medical Schools, 2010*
The data in Table 5 suggest only a small sample of the potential explanations for average differences in performance on the MCAT exam. Beyond the influence of socioeconomic status, achievement may be influenced (positively or negatively) by differential access to the educational or occupational opportunities that can occur through social and cultural capital—whereby individuals with certain social networks gain access to opportunities that promote academic achievement39,40—or by institutional racism, where differential access to opportunities is incorporated into institutional policies or practices.41 Similarly, achievement may be influenced by repeated exposure to subtle slights or microaggressions42 that permeate the educational process and cause the individual to question his or her own competence on the basis of his or her race or ethnicity.43 These various influences are complex and were not addressed by this study.
Although the data in Table 5 provide context for understanding group differences in MCAT performance, they cannot provide direction for deciding which applicants will be successful in medical school. No two applicants will have the same school, home, and life experiences. Indeed, the same life experiences could have different effects on achievement for any two applicants. It is also true that these factors are correlates rather than determinants of performance. Students of varying backgrounds have succeeded in medical school and, more importantly, as practicing physicians. Finally, as discussed previously, whereas white, black, and Latino applicants’ exposure to risk factors and mean MCAT scores vary, there is a wide range of overlap in the MCAT scores (Figure 1). Between-group differences in exposure to risk factors in the general or the medical school applicant population do not readily apply to any individual in a particular group.
Question 4: Does the MCAT Exam Act as a Barrier to Admission for Black and Latino Applicants?
In this section, we examine admission decisions as a means of understanding how medical school admission committees weigh white, black, and Latino applicants’ MCAT scores in relation to other personal characteristics and life experiences, as well as other indicators of academic preparedness. The admission process is complex and involves multiple stages44; in addition, each medical school establishes its own criteria and weights for different types of applicant data to select students who will succeed given the educational program, resources, and mission of the institution. At the end of the process, however, each committee makes decisions about whom to accept, and this admission decision is therefore our focus. Specifically, the mean differences in MCAT performance and in UGPAs reported in Table 1 suggest that if these academic credentials were highly emphasized in the admission process, the acceptance rates of black and Latino applicants would be considerably lower than those of white applicants.
Figure 2 compares white, black, and Latino individuals’ academic qualifications, as measured by MCAT scores and reported in applications for admission to the 2010 matriculating class, with the percentages of applicants in each group who were ultimately offered acceptance by one or more medical schools. As the figure illustrates, the percentage of white applicants with MCAT scores ≥ 25 is much greater than the percentage of black or Latino applicants reporting similar scores (84% for whites versus 37% for blacks and 56% for Latinos). This profile of MCAT scores stands in sharp contrast to the overall acceptance rates shown in Figure 2 (47% for whites versus 40% for blacks and 49% for Latinos), reflecting differences of 7 percentage points for white versus black applicants and –2 percentage points for white versus Latino applicants.
Figure 2: The percentages of white, black, and Latino individuals applying for admission to the 2010 matriculating medical school class with Medical College Admission Test (MCAT) total scores ≥ 25,* and the percentages of those same applicants who were accepted into at least one MD-granting U.S. medical school. If an applicant took the MCAT more than once, his or her most recent score at the completion of the application cycle was used in this analysis. Source: AAMC Data Warehouse: Applicant Matriculant File, accessed January 27, 2012. *An MCAT total score of 25 was selected as the cut score because more than 75% of applicants report MCAT total scores ≥ 25, and the profile of percentages of white, black, and Latino applicants reporting scores at or above a given threshold remained very similar when the threshold was set at higher and lower values. Only when the threshold was set at MCAT total scores between 15 and 20 were the differences reduced in the profile of percentages of white, black, and Latino applicants reporting scores at or above the cut. MCAT total scores reflect the sum of the exam’s three multiple-choice sections and range from 3 to 45.
The similar overall acceptance rates for the three groups suggest that admission committees do not limit themselves to the consideration of MCAT scores in their efforts to identify the applicants who are the most likely to succeed in medical school. If they did, differences in acceptance rates across racial and ethnic groups would more closely parallel differences in their mean MCAT scores. That is, greater emphasis on the MCAT exam would decrease the percentages of minority applicants selected for entry into medical school.11 In sum, although group differences on the MCAT exam have the potential to reduce the percentage of URM students selected into medical school, these results show that this is not occurring in practice.
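The contrast drawn in this section can be reproduced from the percentages quoted above for applicants to the 2010 matriculating class. The sketch below uses only those quoted figures; the variable and function names are illustrative.

```python
# Percentages quoted in the text for applicants to the 2010 matriculating class.
share_scoring_25_or_higher = {"white": 84, "black": 37, "Latino": 56}
acceptance_rate = {"white": 47, "black": 40, "Latino": 49}


def gap_versus_white(rates):
    """White percentage minus each other group's percentage, in percentage points."""
    return {
        group: rates["white"] - rate
        for group, rate in rates.items()
        if group != "white"
    }


score_gaps = gap_versus_white(share_scoring_25_or_higher)  # {"black": 47, "Latino": 28}
acceptance_gaps = gap_versus_white(acceptance_rate)        # {"black": 7, "Latino": -2}
```

The gaps in the share of applicants scoring 25 or higher (47 and 28 percentage points) are far larger than the gaps in acceptance rates (7 and –2 percentage points), which is the pattern the paragraph above interprets.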
In Conclusion
Black and Latino examinees had lower average 2009 MCAT scores than did white examinees. The between-group differences we found are similar to those reported for the GRE, LSAT, SAT, and other admission tests. They also are similar to differences in the average UGPAs of URM and majority medical school applicants. Our findings do not, however, point to bias in the design, use, or predictive value of the MCAT exam. Rather, data that predict medical students’ performance on the basis of their MCAT scores show that the MCAT exam is not biased against black and Latino applicants. Factors other than bias in the exam might explain differences in performance, such as family, neighborhood, and school conditions, which relate to academic achievement and differ by group. Admission committees accept majority and URM applicants at similar rates, looking beyond MCAT data to select students with a wide range of experiences and characteristics. Indeed, the high success rates for all students on the outcomes we examined likely reflect multiple influences, including admission committees’ use of MCAT scores in conjunction with other data in determining applicants’ likelihood for success in medical school, the resources provided by institutions to assist students who need support, and the efforts of the medical students themselves.
Although this study provides evidence that the current MCAT exam is not biased against URMs in predicting performance in medical school, additional research is needed. Arguably, passing Step 1 and graduating are not the only measures of success in medical school. It is important to understand how the MCAT exam predicts other important measures of medical student performance that it might reasonably be expected to predict and, conversely, which outcomes it does not predict well.45 Upcoming changes to the MCAT exam will necessitate the collection of new evidence on the predictive validity of the revised test, particularly with respect to aspects of medical school performance that rely on foundations of scientific reasoning, reasoning with data, biochemistry knowledge, and knowledge of the behavioral and social sciences.46
Medical schools with different missions likely look for different qualities in applicants and also value and reward performance differently. Therefore, an important related focus of future research is potential variations in the predictive validity of the MCAT exam at medical schools whose priorities lie in research, education, and clinical performance, among other areas. It also is important to consider the value of the MCAT exam together with other applicant data, such as personal characteristics and experiences.47
Finally, in this article, we presented a small sample of socioeconomic indicators on which white, black, and Latino examinees, applicants, and matriculants varied. Not all factors that might influence achievement are socioeconomic, however. Stereotype threat, for example, is often cited as a factor that may influence performance.43 According to this theory, the negative stereotypes about a group to which an examinee belongs can be internalized and thereby hinder the examinee’s performance, particularly if the examinee affiliates strongly with the group and if the stereotype is made salient on the test. Stereotype threat has been shown to reduce working memory capacity,48 interfere with the learning process,49 and impair performance on tests in laboratory settings.43 However, much remains unknown about how stereotype threat works in applied educational settings, and the evidence for its effects in these settings has been mixed.50,51 Future research should examine the differences between minority-serving and other institutions and whether these different types of medical schools show different patterns of performance for white, black, and Latino students. Stereotype threat involves a complex interplay of conditions and individual reactions, and attempts should also be made to design research that will improve our understanding of whether and how it plays a role in mean MCAT score differences among white, black, and Latino examinees.
Acknowledgments: The authors would like to thank the National Board of Medical Examiners and the following Association of American Medical Colleges (AAMC) personnel and MR5 Committee members for reviewing earlier drafts of this article: Atul Grover, Robert Jones, Karen Mitchell, Alicia Monroe, Scott Oppler, Norma Poll-Hunter, Elisa Siegal, Henry Sondheimer, Frank Trinity, and Geoffrey Young. We would also like to thank the members of the MR5 Committee for their tireless efforts to perform the comprehensive review for which this research was conducted: Steven Gabbe, Ronald Franks, Lisa Alty, Dwight Davis, Kevin Dorsey, Michael Friedlander, Robert Hilborn, Barry Hong, Richard Lewis, Maria Lima, Catherine Lucey, Alicia Monroe, Saundra Oyewole, Erin Quinn, Richard Riegelman, Gary Rosenfeld, Wayne Samuelson, Richard Schwartzstein, Maureen Shandling, Catherine Spina, and Ricci Sylla.
Funding/Support: None.
Other disclosures: Dr. Davis, Dr. Dorsey, and Dr. Franks were members of the MR5 Committee. Dr. Sackett was a paid consultant to the MR5 Committee. Medical College Admission Test (MCAT) is a program of the AAMC. Related trademarks owned by the AAMC include Medical College Admission Test, MCAT, and MCAT2015.
Ethical approval: This study was approved by the institutional review board of the American Institutes for Research as part of the MCAT program’s psychometric research protocol.
* By conducting analyses separately for the set of black and white students and for the set of Latino and white students, we explicitly established white students as the benchmark against which differential prediction of each minority group was compared. Results based on an overall model that included all three groups in the same analysis did not differ appreciably in magnitude or direction of differences.
† For each school, we summed the predicted probabilities of success separately for white and black students, computing four indices: (1) predicted number of white students succeeding, (2) observed number of white students succeeding, (3) predicted number of black students succeeding, and (4) observed number of black students succeeding. We then summed these four indices over all schools. We performed the same set of steps to estimate the predicted and observed numbers of students succeeding for Latino and white students.
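The aggregation described in this footnote can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the student records, school names, and probabilities below are invented, and the function name `predicted_vs_observed` is hypothetical. It is not the authors' code, and it assumes the predicted probabilities have already been produced by a fitted prediction model.

```python
from collections import defaultdict

# Hypothetical records (invented for illustration only):
# (school, group, predicted probability of success, observed success as 0 or 1)
students = [
    ("School A", "white", 0.95, 1),
    ("School A", "white", 0.90, 1),
    ("School A", "black", 0.85, 1),
    ("School A", "black", 0.80, 0),
    ("School B", "white", 0.92, 1),
    ("School B", "black", 0.88, 1),
]


def predicted_vs_observed(records):
    """Return {group: (predicted_count, observed_count)} summed over all schools."""
    # Step 1: within each school, sum predicted probabilities and observed
    # successes separately for each group.
    per_school = defaultdict(lambda: [0.0, 0])
    for school, group, p_success, succeeded in records:
        per_school[(school, group)][0] += p_success
        per_school[(school, group)][1] += succeeded

    # Step 2: sum the per-school indices over all schools.
    totals = defaultdict(lambda: [0.0, 0])
    for (_school, group), (predicted, observed) in per_school.items():
        totals[group][0] += predicted
        totals[group][1] += observed

    return {group: (pred, obs) for group, (pred, obs) in totals.items()}


totals = predicted_vs_observed(students)
for group, (predicted, observed) in sorted(totals.items()):
    print(f"{group}: predicted {predicted:.2f} succeeding, observed {observed}")
```

Comparing the predicted and observed counts for each group shows whether the prediction model, on balance, overpredicts or underpredicts that group's success. The same steps would apply to the Latino and white comparison.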
References
1. Young JW. Grade adjustment methods. Rev Educ Res. 1993;63:151–165
2. Goldman RD, Widawski MH. A within-subjects technique for comparing college grading standards: Implications in the validity of the evaluation of college achievement. Educ Psychol Meas. 1976;36:381–390
3. American Educational Research Association, American Psychological Association, National Council on Measurement in Education. Standards for Educational and Psychological Testing. 2nd ed. Washington, DC: American Educational Research Association; 1999
4. Milem JF. Increasing diversity benefits: How campus climate and teaching methods affect student outcomes. In: Orfield G, Kurlaender M, eds. Diversity Challenged: Evidence on the Impact of Affirmative Action. Cambridge, Mass: Harvard Education Publishing Group; 2001:233–249
5. Tedesco LA. The role of diversity in the training of health professionals. In: Smedley BD, Stith AY, eds. The Right Thing to Do, The Smart Thing to Do: Enhancing Diversity in the Health Professions: Summary of the Symposium on Diversity in Health Professions in Honor of Herbert W. Nickens, MD. Washington, DC: National Academies Press; 2001
6. Saha S, Guiton G, Wimmers PF, Wilkerson L. Student body racial and ethnic composition and diversity-related outcomes in U.S. medical schools. JAMA. 2008;300:1135–1145
7. Julian ER. Validity of the Medical College Admission Test for predicting medical school performance. Acad Med. 2005;80:910–917
8. Koenig JA, Sireci SG, Wiley A. Evaluating the predictive validity of MCAT scores across diverse applicant groups. Acad Med. 1998;73:1095–1106
9. Dunleavy DM, Kroopnick MH, Dowd KW, Searcy CA, Zhao X. The predictive validity of the MCAT exam in relation to academic performance throughout medical school: A national cohort study of 2001–2004 matriculants. Acad Med. 2013;88:666–671
10. Kuncel NR, Hezlett SA. Standardized tests predict graduate students’ success. Science. 2007;315:1080–1081
11. Sackett PR, Schmitt N, Ellingson JE, Kabin MB. High-stakes testing in employment, credentialing, and higher education. Prospects in a post-affirmative-action world. Am Psychol. 2001;56:302–318
12. Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum Associates; 1988
13. Roth PL, Bevier CA, Bobko P, Switzer FS, Tyler P. Ethnic group differences in cognitive ability in employment and educational settings: A meta-analysis. Pers Psychol. 2001;54:297–330
14. D’Mello S, Koch A, Sackett PR. Cohen’s d and the homoscedasticity assumption: How much heteroscedasticity is too much? Paper presented at: Annual Conference of the Society for Industrial and Organizational Psychology. April 10, 2010 Atlanta, Ga
15. Camara WJ, Schmidt AE. Group Differences in Standardized Testing and Social Stratification. New York, NY: College Entrance Examination Board; 1999
16. Kuncel NR, Ones DS, Hezlett SA. A comprehensive meta-analysis of the predictive validity of the Graduate Record Examinations: Implications for graduate student selection and performance. Psychol Bull. 2001;127:162–181
17. Linn RL, Hastings CN. A meta-analysis of the validity of predictors of performance in law school. J Educ Meas. 1984;21:245–259
18. Kuncel NR, Hezlett SA, Ones DS. Academic performance, career potential, creativity, and job performance: Can one construct predict them all? J Pers Soc Psychol. 2004;86:148–161
19. Sackett PR, Shen W. Subgroup differences on cognitively loaded tests in contexts other than personnel selection. In: Outtz JL, ed. Adverse Impact: Implications for Organizational Staffing and High Stakes Selection. New York, NY: Taylor and Francis Group; 2010:323–346
20. Kuncel NR, Credé M, Thomas LL, Klieger DM, Seiler SN, Woo SE. A meta-analysis of the validity of the Pharmacy College Admission Test (PCAT) and grade predictors of pharmacy student performance. Am J Pharm Educ. 2005;69:339–347
21. Ramsey JH. Test scores aren’t 100% of the picture. San Jose Mercury News. August 14, 1997:10B
22. Roser MA. Test firm lauds A&M de-emphasis of MCAT. Austin American-Statesman. February 15, 1998:B1
23. Weiss K. UC faculty chief backs dropping SAT as unfair. Los Angeles Times. February 18, 2001. http://articles.latimes.com/2001/feb/18/local/me-27050. Accessed December 14, 2012
24. National Center for Fair and Open Testing (FairTest). “Healthy” medical school admissions. http://www.fairtest.org/facts/mcat.html. Accessed December 14, 2012
25. Sackett PR, Borneman MJ, Connelly BS. High stakes testing in higher education and employment: Appraising the evidence for validity and fairness. Am Psychol. 2008;63:215–227
26. Cleary TA. Prediction of grades of negro and white students in integrated colleges. J Educ Meas. 1968;5:115–124
27. Shepard LA. The selection of medical students. JAMA. 1987;257:2291–2292
28. Linn RL. Ability testing: Individual differences, prediction, and differential prediction. In: Wigdor AK, Garner WR, eds. Ability Testing: Uses, Consequences, and Controversies. Washington, DC: National Academy Press; 1982
29. Kyei-Blankson LS. Predictive Validity, Differential Validity, and Differential Prediction of the Subtests of the Medical College [dissertation]. Athens, Ohio: Ohio University; 2005
30. Linn RL. Fair test use in selection. Rev Educ Res. 1973;43:139–161
31. Young JW. Differential Validity, Differential Prediction, and College Admission Testing: A Comprehensive Review and Analysis. New York, NY: College Entrance Examination Board; 2001
32. Barton PE. Parsing the Achievement Gap: Baselines for Tracking Progress. Princeton, NJ: Educational Testing Service, Policy Information Center; 2003
33. Logan JR, Oakley D. Segregation, unequal educational opportunities, and the achievement gap in the Boston region. In: Tate WF, ed. Research on Schools, Neighborhoods, and Communities: Toward Civic Responsibility. Rowman and Littlefield Publishers; 2012:103–124
34. Barton PE, Coley RJ. Parsing the Achievement Gap II. Princeton, NJ: Educational Testing Service, Policy Information Center; 2009
35. Barton PE, Coley RJ. The Family: America’s Smallest School. Princeton, NJ: Educational Testing Service, Policy Information Center; 2007
36. Jolly P. Diversity of U.S. medical students by parental income. AAMC Analysis in Brief. January 2008;8(1)
37. Grbic D, Garrison G, Jolly P. Diversity of U.S. medical school students by parental education. AAMC Analysis in Brief. August 2010;9(10)
38. United States Census Bureau. Table H-1: Income limits for each fifth and top 5 percent of all households: 1967 to 2010. http://www.census.gov/hhes/www/income/data/historical/household. Accessed May 18, 2012
39. Newman DA, Hanges PJ, Outtz JL. Racial groups and test fairness, considering history and construct validity. Am Psychol. 2007;62:1082–1083
40. Lee JS, Bowen NK. Parent involvement, cultural capital, and the achievement gap among elementary school children. Am Educ Res J. 2006;43:193–218
41. Brondolo E, Libretti M, Rivera L, Walsemann KM. Racism and social capital: The implications for social and physical well-being. J Soc Issues. 2012;68:358–384
42. Sue DW, Capodilupo CM, Torino GC, et al. Racial microaggressions in everyday life: Implications for clinical practice. Am Psychol. 2007;62:271–286
43. Steele CM, Aronson J. Stereotype threat and the intellectual test performance of African Americans. J Pers Soc Psychol. 1995;69:797–811
44. Monroe A, Quinn E, Samuelson W, Dunleavy DM, Dowd KW. An overview of the medical school admission process and use of applicant data in decision making: What has changed since the 1980s? Acad Med. 2013;88:672–681
45. Epstein RM. Assessment in medical education. N Engl J Med. 2007;356:387–396
46. Association of American Medical Colleges. Preview Guide for the MCAT2015 Exam. 2nd ed. Washington, DC: Association of American Medical Colleges; 2012. https://www.aamc.org/students/download/266006/data/2015previewguide.pdf. Accessed December 14, 2012
47. Lievens F, Sackett PR. The validity of interpersonal skills assessment via situational judgment tests for predicting academic success and job performance. J Appl Psychol. 2012;97:460–468
48. Schmader T, Johns M, Forbes C. An integrated process model of stereotype threat effects on performance. Psychol Rev. 2008;115:336–356
49. Rydell RJ, Rydell MT, Boucher KL. The effect of negative performance stereotypes on learning. J Pers Soc Psychol. 2010;99:883–896
50. Cullen MJ, Waters SD, Sackett PR. Testing stereotype threat theory predictions for math-identified and non-math-identified students by gender. Hum Perf. 2006;19:421–440
51. Cullen MJ, Hardison CM, Sackett PR. Using SAT-grade and ability-job performance relationships to test predictions derived from stereotype threat theory. J Appl Psychol. 2004;89:220–230