Callahan, Clara A. MD; Hojat, Mohammadreza PhD; Veloski, Jon MS; Erdmann, James B. PhD; Gonnella, Joseph S. MD
The Medical College Admission Test (MCAT) has been widely used as a criterion for selecting medical students in the United States and Canada. Since its first administration in 1928, the examination has undergone successive revisions to improve its content and predictive validity.1
The Association of American Medical Colleges, which sponsors and administers the MCAT, has sought to improve the test's ability to predict students' success in medical school and on licensing examinations by modifying the content and components of its successive versions.1
For example, the pre-1978 version of the MCAT (administered between 1962 and 1977) included four components (subtests): Science Achievement, General Information, Quantitative Ability, and Verbal Ability. To maximize content relevance, minimize cultural and social influence on performance, and improve comparability of measures of achievement in sciences, a new version of the MCAT was developed and administered between 1978 and 1991. This version included Science Problem Solving (a composite score derived from the Biology, Chemistry, and Physics subtests), Quantitative Skills, and Reading Skills. The latest version of the MCAT, administered in 1991 and thereafter, consists of the following four subtests: Biological Sciences, Physical Sciences, Verbal Reasoning, and Writing Sample.
Because the MCAT is a high-stakes examination, evidence supporting its psychometric properties in general, and its predictive validity in particular, is essential to justify its prominent role in medical school admission decisions. A number of studies have addressed the validity of different versions of the MCAT, with mixed results.2–17 Julian9 reported moderately high validity coefficients for the post-1991 version of the MCAT in predicting scores on the United States Medical Licensing Examination (USMLE): r = 0.61 for Step 1, r = 0.49 for Step 2, and r = 0.49 for Step 3. Shen and Comrey18 reported statistically significant correlations between scores on the 1978–1991 version of the MCAT and a combined index of academic performance in medical school that included scores on National Board (NB) Parts I and II. Swanson and colleagues14 compared the validity coefficients of the 1978–1991 and post-1991 versions of the MCAT in predicting USMLE Step 1 scores and found no significant change in predictive validity between the two versions; the median correlations were 0.49 and 0.51 for the two study cohorts. In a recent meta-analysis, Donnon and colleagues19 reported that the predictive validity of the current version of the MCAT ranges from small to medium for performance measures in medical school and for scores on licensing examinations.
Studying the predictive validity of the MCAT is therefore necessary to justify its broad use in medical school admission decisions. Because a primary purpose of the MCAT is to maximize predictive power, the impact of each revision on its predictive validity needs to be examined.
To our knowledge, no published study has compared the validity of the last three versions of the MCAT and their subtest scores in predicting performance during medical school, ratings of clinical competence in the first year of residency training, and scores on licensing examinations. We designed this large-scale longitudinal study to address this gap. We also compared the predictive validity coefficients of the different versions of the MCAT by gender.
Study participants were 7,859 matriculants in 36 classes who entered Jefferson Medical College (JMC) between 1970 and 2005. We divided them into three groups based on the version of the MCAT they had taken as applicants. The Jefferson Longitudinal Study of Medical Education, initiated more than four decades ago and currently including information on more than 10,000 JMC students and graduates, provided the data for our study population (more information about the Jefferson Longitudinal Study of Medical Education and its publications is posted at http://jdc.jefferson.edu/jlsme).
The MCAT has always been a requirement for consideration for admission to JMC. Our data set includes MCAT scores for all matriculated students during the three time periods. Group 1 consisted of 1,728 matriculants (1,445 men, 283 women) who entered JMC between 1970 and 1977. They took the pre-1978 version of the MCAT. Group 2 consisted of 3,032 matriculants (2,152 men, 880 women) who entered JMC between 1978 and 1991 and had taken the 1978–1991 version of the MCAT. Group 3 comprised 3,099 matriculants (1,769 men, 1,330 women) who entered JMC between 1992 and 2005. They took the post-1991 version of the MCAT.
Subtest scores of the MCAT were used as predictors. For Group 1 (matriculants between 1970 and 1977), scores on four subtests of the pre-1978 version of the MCAT (Science Achievement, General Information, Quantitative Ability, and Verbal Ability) were used.
For Group 2 (matriculants between 1978 and 1991), the 1978–1991 version of the MCAT contained the following subtests: Biology, Chemistry, Physics, Science Problem Solving, Quantitative Skills, and Reading Skills. The Science Problem Solving scores were not independent of the Biology, Chemistry, and Physics scores because they were derived from items in those subtests. Therefore, for statistical analysis, we excluded Biology, Chemistry, and Physics and used Science Problem Solving, Quantitative Skills, and Reading Skills as the predictors. We chose Science Problem Solving over the three individual science subtests because its scores yielded higher correlations with the criterion measures.
For Group 3 (matriculants between 1992 and 2005), the post-1991 version of the MCAT contained the following subtests: Biological Sciences, Physical Sciences, and Verbal Reasoning. This version also included a Writing Sample section, which required applicants to write essays on assigned topics; the essays were rated on an alphabetic scale ranging from J (the lowest) to T (the highest). Because the Writing Sample is reported as a letter rather than a number, it does not lend itself to the correlational analyses typically used in validity studies. The letter scores can be converted to integers (1 = J to 11 = T) by assuming that they constitute an interval scale of measurement, but that assumption might not be widely accepted.6 Although our research team previously found significant associations between Writing Sample scores and medical school class rank, indicators of clinical competence in medical school, and ratings of interpersonal skills and attitudes in the first year of residency training,6 a meta-analytic study19 and a study at the Medical University of South Carolina20 did not confirm the predictive validity of this subtest. Therefore, we excluded the Writing Sample and used only the Biological Sciences, Physical Sciences, and Verbal Reasoning scores in this study.
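To make the interval-scale assumption concrete, here is a minimal Python sketch of the letter-to-integer conversion mentioned above (J = 1 through T = 11). The function and constant names are hypothetical; the code only encodes the mapping and does not make the equal-spacing assumption any more defensible.

```python
# Hypothetical illustration: convert MCAT Writing Sample letter scores to
# integers. Treating the result as interval data assumes equal spacing
# between adjacent letters, which is the contested assumption.
WRITING_SAMPLE_LETTERS = "JKLMNOPQRST"  # J = lowest, T = highest (11 levels)


def writing_sample_to_int(letter: str) -> int:
    """Return 1 for 'J' through 11 for 'T'; raise ValueError otherwise."""
    letter = letter.strip().upper()
    # Check the length first: an empty string would otherwise pass "in".
    if len(letter) != 1 or letter not in WRITING_SAMPLE_LETTERS:
        raise ValueError(f"not a Writing Sample score: {letter!r}")
    return WRITING_SAMPLE_LETTERS.index(letter) + 1
```

Under this mapping, a P becomes 7. A rank-based coefficient such as Spearman's rho needs only the ordering of the letters, not equal spacing between them, which is one way to sidestep the interval assumption.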
The following three sets of criterion measures were used.
Performance in medical school.
We used a combined grade point average (GPA) for the first and second years of medical school and grades on examinations in six core clerkships (family medicine, internal medicine, obstetrics–gynecology, pediatrics, psychiatry, and surgery) in the third year as measures of performance in medical school. The first- and second-year GPAs and the clerkship examination grades are based on objective examinations with reliability coefficients usually in the 0.70s or higher. Evidence supporting the short- and long-term predictive validity of these measures has been reported.21,22 Attrition was also used as a criterion measure.
Performance in the first year of residency.
We used ratings in two areas of competence: “knowledge and clinical capabilities” (the science of medicine) and “professionalism” (the art of medicine). The rating form included 24 items and was completed at the end of the first postgraduate year by either the residency program director or the faculty member most familiar with the graduate's performance. These two components of clinical competence emerged in a factor analytic study.23 Data supporting the psychometric properties of these ratings have been reported for the original version of the rating form24,25 as well as for its revised version.23
Performance on licensing examinations.
This set of scores included the three licensing examinations: Steps 1, 2, and 3 of the USMLE (formerly NB Parts I, II, and III). Part III/Step 3 scores were available only for graduates who granted written permission for the National Board of Medical Examiners to report their scores to Jefferson (77% of 1970–1977 matriculants and 70% of 1978–1991 matriculants). For 1992–2005 matriculants, Step 3 scores were available for 61%, in part because some had not yet taken the examination at the time of this study.
For students who repeated the MCAT, we used the mean of their subtest scores, because our previous research suggested that mean scores are the best predictors of performance in medical school.26 Similarly, for students who failed a licensing examination on the first attempt, we used the mean of their scores across attempts.
Steps 1 and 2 of the USMLE replaced NB Parts I and II in 1992. This change affected students who entered Jefferson beginning in 1990 and 1989, respectively. Step 3 replaced Part III in 1995, affecting students beginning with the class that entered in 1990. Because the content, format, and pass/fail standards changed in the transition from the NB examinations to the USMLE,27 we examined the predictive validity coefficients separately for two cohorts in Group 2 (who took the 1978–1991 version of the MCAT): one cohort included students who had taken the NB examinations (Parts I, II, and III), and the other included those who took the USMLE (Steps 1, 2, and 3) during this period. The predictors for both cohorts were the same MCAT subtest scores.
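The cohort boundaries above can be expressed as a small lookup. This Python sketch is hypothetical: it assumes each student took every examination on schedule with his or her entering class, which the text does not guarantee, and it only labels which series (NB or USMLE) a class would have taken.

```python
# Hypothetical helper encoding the transition years stated in the text:
# Step 1 from the 1990 entering class, Step 2 from 1989, Step 3 from 1990.
# Assumes on-schedule progression; delayed or repeated attempts are not modeled.
FIRST_USMLE_CLASS = {1: 1990, 2: 1989, 3: 1990}
ROMAN = {1: "I", 2: "II", 3: "III"}


def exam_series(entry_year: int, sequence: int) -> str:
    """Return 'NB Part ...' or 'USMLE Step ...' for an entering class."""
    if sequence not in FIRST_USMLE_CLASS:
        raise ValueError("sequence must be 1, 2, or 3")
    if entry_year >= FIRST_USMLE_CLASS[sequence]:
        return f"USMLE Step {sequence}"
    return f"NB Part {ROMAN[sequence]}"
```

For example, the 1989 entering class would fall in the NB cohort for Part I but the USMLE cohort for Step 2, which is why the Group 2 cohorts are defined per examination rather than per class.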
We calculated bivariate correlations to examine the relationship between each MCAT subtest score and each criterion measure. We also used multiple regression analyses to examine simultaneously the overall relationship between each group's set of MCAT subtest scores and each criterion measure. We adjusted the obtained multivariate R values for the number of predictors in each analysis.
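The paper does not give its regression formulas. As a minimal sketch, assuming standardized variables, the standardized regression weights follow from the predictor intercorrelation matrix Rxx and the vector of bivariate validity coefficients rxy as beta = Rxx^-1 rxy, with R^2 = beta · rxy. The adjustment shown is the common Wherry/Ezekiel shrinkage formula, which is an assumption about what the authors used; all names are hypothetical, and only the standard library is used.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.
    Assumes a is nonsingular (true for a proper correlation matrix)."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def standardized_betas(rxx, rxy):
    """Standardized regression weights: beta = Rxx^-1 rxy."""
    return solve(rxx, rxy)


def multiple_r(rxx, rxy):
    """Multiple R from correlations: R^2 = sum(beta_j * r_jy)."""
    betas = standardized_betas(rxx, rxy)
    r2 = sum(b * r for b, r in zip(betas, rxy))
    return math.sqrt(max(r2, 0.0))


def adjusted_r(r, n, k):
    """Shrink R for k predictors and n cases (Wherry/Ezekiel, assumed):
    R2_adj = 1 - (1 - R^2)(n - 1)/(n - k - 1), floored at zero."""
    r2_adj = 1 - (1 - r * r) * (n - 1) / (n - k - 1)
    return math.sqrt(max(r2_adj, 0.0))
```

With uncorrelated predictors the weights equal the bivariate correlations; when predictors overlap, each weight shrinks, which is how a subtest can correlate with a criterion yet add little unique contribution to the model.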
We determined the relative contribution of each predictor to the multivariate model by examining the magnitudes of the standardized regression coefficients for each criterion measure within each group. The adjusted multivariate correlations were compared for men and women within each group to examine the predictive validity of the MCAT by gender. We set the probability of a type I error (alpha) at .05 to judge statistical significance. The practical importance of differences between correlations was assessed with effect size estimates based on the transformation of correlation coefficients to z values.28 A correlation, standardized regression coefficient, or effect size smaller than 0.10 was considered practically (clinically) unimportant.28,29
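The effect size described above is most likely Cohen's q, the difference between Fisher z-transformed correlations as defined in reference 28; the paper does not spell out the formula, so this Python sketch is an assumption. It also includes the standard large-sample z test for two correlations from independent samples (such as men and women), which the paper does not name. Both formulas are derived for simple correlations, so applying them to multiple R values is an approximation. Function names are hypothetical.

```python
import math


def cohens_q(r1, r2):
    """Effect size for the gap between two correlations:
    q = |atanh(r1) - atanh(r2)| (difference of Fisher z values)."""
    return abs(math.atanh(r1) - math.atanh(r2))


def practically_important(r1, r2, threshold=0.10):
    """Apply the 0.10 cut-off used in the text for practical importance."""
    return cohens_q(r1, r2) >= threshold


def compare_independent_correlations(r1, n1, r2, n2):
    """Two-sided z test for correlations from independent samples.
    Returns (z, p); p = erfc(|z| / sqrt(2)) is the two-sided normal p."""
    z = (math.atanh(r1) - math.atanh(r2)) / math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return z, math.erfc(abs(z) / math.sqrt(2))
```

For instance, cohens_q(0.36, 0.30) is about 0.067, below 0.10, which matches the conclusion that the decline in GPA validity coefficients from 0.36 to 0.30 is not of practical importance. Keeping the significance test separate from q reflects the paper's distinction between statistical significance and practical importance.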
Performance in medical school
Table 1 reports the multiple R values from regression analyses in which the MCAT subtest scores were used to predict performance measures in medical school and residency. As shown, the validity coefficients in relation to first- and second-year GPAs declined from 0.36 (pre-1978 version) to 0.30 (1978–1991 and post-1991 versions). Similarly, the predictive validity in relation to third-year clerkship examinations declined from 0.31 (pre-1978 version) to 0.24 (1978–1991 version) and 0.27 (post-1991 version).
Although the pattern of correlations reported in Table 1 suggests a slight decline in the validity coefficients of the later versions of the MCAT in predicting performance in medical school, the differences were not of practical importance. In additional analyses, we examined the association between MCAT scores and attrition in medical school. Attrition due to academic difficulties was associated with scores on the Science Achievement subtest of the pre-1978 MCAT (r = 0.09, P < .01), but not with the Science Problem Solving subtest of the 1978–1991 version (r = 0.01) or the Biological Sciences subtest of the post-1991 version (r = 0.03).
We also examined attrition rates for the top 25% and bottom 25% of scorers on these subtests. For the Science Achievement subtest of the pre-1978 version, attrition was 2% and 6% for the top and bottom 25% of scorers, respectively (χ2(2) = 9.2, P < .01). No significant difference in attrition was observed between top and bottom scorers on the Science Problem Solving subtest of the 1978–1991 version (5.5% in the top 25%, 6% in the bottom 25%) or on the Biological Sciences subtest of the post-1991 version (4.4% in the top 25%, 5.5% in the bottom 25%).
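A comparison like this one can be run with a Pearson chi-square test on a contingency table. The following hypothetical Python sketch computes the statistic for any r × c table of observed counts, with rows for score groups (for example, top and bottom quartiles) and columns for outcomes (withdrew versus continued); the degrees of freedom are (r − 1)(c − 1). No counts from this study are used, and the table layout is an assumption.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c
    table of observed counts (rows = score groups, columns = outcomes)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of score group and outcome.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df
```

A p value can then be read from the chi-square distribution with the returned degrees of freedom, for example with scipy.stats.chi2.sf(statistic, df) if SciPy is available.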
These findings indicate that scores on the Science Achievement subtest of the pre-1978 version of the MCAT were better predictors of attrition than scores on the Science Problem Solving subtest of the 1978–1991 version and the Biological Sciences subtest of the post-1991 version.
Clinical competence in residency
As shown in Table 1, slight variation can be observed in the validity coefficients in relation to ratings of clinical competence in the first year of residency. However, those predictive validity coefficients were either nonsignificant or practically negligible.
Table 2 summarizes the results of bivariate and multivariate correlational analyses of the MCAT scores and licensing examinations. The bivariate correlations are the validity coefficients of each subtest in predicting the corresponding criterion measure, and the multivariate R values are the validity coefficients of the full set of subtest scores in predicting the corresponding criterion measure within each group.
Examination of the bivariate correlations indicates that the Science Achievement subtest (for the pre-1978 version of the MCAT), Science Problem Solving subtest (for the 1978–1991 version), and Biological Sciences subtest (for the post-1991 version), compared with the other subtests of a given version of the MCAT, were the best predictors of Part I/Step 1 scores. Data reported in Table 2 also indicate that, compared with the Part I/Step 1 scores, the predictive validity of the verbal/reading subtest of the MCAT (Verbal Ability in the pre-1978 version, Reading Skills in the 1978–1991 version, and Verbal Reasoning in the post-1991 version) increased when Part II/Step 2 scores were the criterion measures.
Also, the Reading Skills (in Group 2) and Verbal Reasoning (in Group 3) scores predicted Part III/Step 3 scores as well as, or better than, the other MCAT subtests. These findings are further supported by the magnitudes of the standardized regression coefficients for these groups. This pattern is consistent with our previous report that verbal skills are better than science scores in predicting measures of clinical competence in medical school.6 It may reflect the heavy demand that clinical performance, in the context of the patient–physician relationship, places on verbal skills. The quantitative subtests of the MCAT did not make a significant unique contribution to the multivariate prediction models. The adjusted multivariate correlations indicate moderately high validity coefficients, ranging from R = 0.30 (P < .01) for the 1978–1991 version of the MCAT in predicting Step 3 scores to R = 0.47 (P < .01) for the pre-1978 version in predicting Part II scores.
Data reported in Table 2 show no systematic improvement in the predictive validity of successive versions of the MCAT. Instead, they show a systematic decline in the validity coefficients in relation to Part II/Step 2 scores, from the earliest version (pre-1978, R = 0.47, P < .01) to the current version of the MCAT (post-1991, R = 0.37, P < .01). The effect size estimates indicate that the declines in validity coefficients from the pre-1978 version to the next two versions in predicting Part II/Step 2 scores are of practical importance.
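As a worked check, assuming the effect size is Cohen's q (the difference between Fisher z-transformed correlations) applied directly to the reported adjusted R values for the pre-1978 and post-1991 versions:

```latex
z(r) = \tfrac{1}{2}\ln\frac{1+r}{1-r}, \qquad
q = z(0.47) - z(0.37) \approx 0.510 - 0.388 = 0.122
```

Because q exceeds the 0.10 threshold used in this study, the decline would count as practically important under that assumption.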
Predictive validity coefficients by gender
The predictive validity coefficients (multivariate R values) for men and women in relation to performance in medical school and residency (Table 1) and on the licensing examinations (Table 3) suggest that in most cases the validity coefficients were larger for women than for men. For example, with only one exception (the predictive validity of the current version of the MCAT in relation to Step 3 scores), all predictive validity coefficients reported in Table 3 were larger for women than for men in all three versions of the MCAT.
Although the pattern of higher validity coefficients for women is largely consistent, none of the gender differences reported in Table 1 is of practical importance. However, the gender differences in the validity coefficients reported in Table 3 are of practical importance for the pre-1978 version in predicting Part I scores and for the pre-1978 and 1978–1991 versions in predicting Part II and Step 2 scores, respectively. In addition, the larger validity coefficient of the 1978–1991 version for women in predicting Step 3 scores is of practical importance.
Previous multi-institutional research has shown substantial continuity in academic performance across the stages of medical education.30 Consistent with that concept, our findings indicate that scores on all three versions of the MCAT were moderately correlated with performance measures in medical school and with scores on medical licensing examinations. Considering the time intervals between taking the MCAT before medical school and taking Part I/Step 1 (usually at the completion of the second year of medical school), Part II/Step 2 (usually during the fourth year), and Part III/Step 3 (usually at the completion of the first year of residency), the obtained predictive validity coefficients are impressive.
In a recent critical review of high-stakes testing in higher education and employment, Sackett and colleagues concluded that, despite criticisms raised by skeptics, tests of ability are generally valid in predicting a wide variety of short- and long-term academic and job performance outcomes.31 Our finding that MCAT scores yield significant predictive validity coefficients with medical licensing examinations taken years later (approximately three, four, and five years later for Part I/Step 1, Part II/Step 2, and Part III/Step 3, respectively) confirms that assertion about the short- and long-term predictive validity of high-stakes admission tests in higher education.
Hemphill32 reported that a validity coefficient greater than 0.30 is typically in the upper third of findings in meta-analytic studies. Given this classification, the MCAT predictive validity coefficients obtained in this study are encouraging and support the use of the MCAT scores in the screening of a large applicant pool.
Another important finding is that not only was there no significant improvement in the validity coefficients of successive versions of the MCAT, but there was also an unexpected, systematic decline in the validity of the MCAT in predicting Part II/Step 2 performance. This downward trend is important because the National Board of Medical Examiners is considering having only one licensing examination during undergraduate medical education, most likely administered at the point when the Step 2 examination is now taken.
It can be argued that changes in the format, administration, scoring, and pass/fail standards of the licensing examinations (the conversion from Parts to Steps) contributed to the decline in validity coefficients. This explanation seems unlikely for two reasons. First, although all three licensing examinations changed, the decline in predictive validity was observed only for Part II/Step 2 and not for the other two.
Second, we examined the validity coefficients of the same version of the MCAT (1978–1991) in predicting performance on both series of licensing examinations to determine whether the same set of MCAT scores would yield different validity coefficients with NB and USMLE scores. As reported in Table 2, no significant change in the validity coefficients of the 1978–1991 MCAT in predicting licensing examination scores was observed in the transition from NB to USMLE. Thus, it is reasonable to conclude that the decline in the validity coefficient of the MCAT in predicting performance on NB Part II or USMLE Step 2 cannot be attributed to changes in the licensing examinations. Further research is needed to investigate the reasons for this decline. There may also be other valid reasons for revising the test, for instance, to measure achievement and performance in the pertinent disciplines in a way that is more relevant to medicine than GPAs are.
The finding that the predictive validity coefficients were consistently higher for women than for men also requires further exploration. Gender differences in favor of women in the correlations between personality measures and scores on NB Part I have been reported,33 but no explanation has been offered.
We examined the variances of the MCAT and licensing examination scores for men and women to determine whether differences in score variance might account for the differences in validity coefficients. Women were more homogeneous than men on the Verbal Ability subtest of the pre-1978 MCAT (P < .05 by F test) and on Step 3 scores. In contrast, men were more homogeneous than women on the Quantitative Skills subtest of the 1978–1991 MCAT and on USMLE Step 2. No other gender differences in score variances were observed. Because greater homogeneity (a restricted range of scores) tends to lower rather than raise correlations, and because the few variance differences we found ran in both directions, these findings suggest that the consistent gender differences in the predictive validity coefficients cannot be attributed to differences in score variance.
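The F test mentioned above compares two sample variances through their ratio. This minimal Python sketch, with hypothetical names, returns the ratio of the larger to the smaller sample variance and the matching degrees of freedom; it does not compute a p value, which requires the F distribution.

```python
import statistics


def variance_ratio(sample_a, sample_b):
    """F statistic for equality of two variances: larger sample variance
    over smaller, with (numerator df, denominator df).
    Raises ZeroDivisionError if the smaller variance is zero."""
    var_a = statistics.variance(sample_a)  # sample variance (n - 1 divisor)
    var_b = statistics.variance(sample_b)
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1
```

With SciPy, a one-sided p value would be scipy.stats.f.sf(F, df1, df2), doubled for a two-sided test because the larger variance is always placed in the numerator.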
We also found that women tended to score higher than men on the verbal sections of the different versions of the MCAT (Verbal Ability in the pre-1978 version, Reading Skills in the 1978–1991 version, and Verbal Reasoning in the post-1991 version), whereas men tended to score higher than women on the science sections (Science Achievement in the pre-1978 version, Science Problem Solving in the 1978–1991 version, and Biological Sciences in the post-1991 version). However, the effect size estimates of these gender differences were small to moderate, and the differences did not generally change the subtests' contributions to the prediction models, as judged by the magnitudes of their standardized regression coefficients. It is important to recognize these gender differences in predictive validity and to investigate their causes further, especially if similar differences are observed at other medical schools.
These validity findings are important because another comprehensive review of the MCAT is forthcoming and may produce a “revamped version” of the test that incorporates measures of compassion, collaboration, lifelong learning, and cultural competence.34
Because our data come from a single institution, it would be worthwhile to replicate this study at other medical schools that maintain longitudinal databases, to assess the generalizability of our findings. One advantage of longitudinal studies in medical education is that they allow historical changes and their outcomes to be examined.
Our findings provide broad support for the predictive validity of the three versions of the MCAT administered between 1970 and 2005 in relation to students' performance in medical school and scores on licensing examinations. However, we did not find any measurable improvement in the test's predictive validity after major revisions to its content and score reporting were introduced in 1978 and, subsequently, in 1991. Moreover, we observed a steady decline throughout the 36-year time period in its capacity to predict students' performance on the NB Part II examination and, subsequently, on the USMLE Step 2. We also found that the predictive validity coefficients for women were consistently higher than those for men in all three versions, across nearly four decades, in relation to a broad array of outcomes. It is essential that any plans to revise the MCAT include efforts to strengthen its ability to predict students' performance on USMLE Step 2 and to minimize the differential validity related to gender.
The authors would like to thank Dorissa Bolinski for her editorial assistance.
This study was approved by the Jefferson Medical College institutional review board.
1McGaghie W. Assessing readiness for medical education: Evolution of the Medical College Admission Test. JAMA. 2002;288:1085–1090.
2Basco WT Jr, Way DP, Gilbert GE, Hudson A. Undergraduate institutional MCAT scores as predictors of USMLE Step 1 performance. Acad Med. 2002;77(10 suppl):S13–S16.
3Colliver JA, Verhulst SJ, Williams RG. Using a standardized-patient examination to establish the predictive validity of the MCAT and undergraduate GPA as admissions criteria. Acad Med. 1989;64:482–484.
4Friedman CP, Bakewell WE. Incremental validity of the new MCAT. J Med Educ. 1980;55:399–404.
5Hall FR, Bailey BA. Correlating students' undergraduate science GPAs, their MCAT scores, and the academic caliber of their undergraduate colleges with their first-year academic performances across five classes at Dartmouth Medical School. Acad Med. 1992;67:121–123.
6Hojat M, Erdmann JB, Veloski JJ, et al. A validity study of the writing sample section of the Medical College Admission Test. Acad Med. 2000;75(10 suppl):S25–S27.
7Huff KL, Koenig JA, Treptau MM, Sireci SG. Validity of MCAT scores for predicting clerkship performance of medical students grouped by sex and ethnicity. Acad Med. 1999;74(10 suppl):S41–S44.
8Jones RF, Thomae-Forgues M. Validity of the MCAT for predicting performance in the first two years of medical school. J Med Educ. 1984;59:455–464.
9Julian ER. Validity of the Medical College Admission Test for predicting medical school performance. Acad Med. 2005;80:910–917.
10Koenig JA, Sireci SG, Wiley A. Evaluating the predictive validity of MCAT scores across diverse applicant groups. Acad Med. 1998;73:1095–1106.
11Mitchell K, Haynes R, Koenig J. Assessing the validity of the updated Medical College Admission Test. Acad Med. 1994;69:394–401.
12Nowacek GA, Pullen E, Short J, Blumner HN. Validity of MCAT scores as predictors of preclinical grades and NBME Part I examination scores. J Med Educ. 1987;62:989–991.
13Silver B, Hodgson CS. Evaluating GPAs and MCAT scores as predictors of NBME I and clerkship performances based on students' data from one undergraduate institution. Acad Med. 1997;72:394–396.
14Swanson DB, Case SM, Koenig JA, Killian CD. Preliminary study of the accuracies of the old and new Medical College Admission Test for predicting performance on USMLE Step 1. Acad Med. 1996;71(10 suppl):S25–S27.
15Vancouver JB, Reinhart MA, Solomon DJ, Haf JJ. Testing for validity and bias in the use of GPA and the MCAT in selection of medical school students. Acad Med. 1990;65:694–697.
16Veloski JJ, Callahan CA, Xu G, Hojat M, Nash DB. Prediction of students' performances on licensing examinations using age, race, sex, undergraduate GPA, and MCAT scores. Acad Med. 2000;75(10 suppl):S28–S30.
17Wiley A, Koenig JA. The validity of the MCAT for predicting performance in the first two years of medical school. Acad Med. 1996;71(10 suppl):S83–S85.
18Shen H, Comrey AL. Predicting medical students' academic performances by their cognitive abilities and personality characteristics. Acad Med. 1997;72:781–786.
19Donnon T, Paolucci EO, Violato C. The predictive validity of the MCAT for medical school performance and medical board licensing examinations: A meta-analysis of the published research. Acad Med. 2007;82:100–106.
20Gilbert GE, Basco WT Jr, Blue AV, O'Sullivan PS. Predictive validity of the Medical College Admission Test writing sample for the United States Medical Licensing Examination. Adv Health Sci Educ Theory Pract. 2002;7:191–200.
21Gonnella JS, Erdmann JB, Hojat M. An empirical study of the predictive validity of number grades in medical school using three decades of longitudinal study: Implications for a grading system. Med Educ. 2004;38:425–434.
22Hojat M, Gonnella JS, Erdmann JB, Veloski JJ. The fate of medical students with different levels of knowledge: Are the basic medical sciences relevant to physician competence? Adv Health Sci Educ Theory Pract. 1997;1:179–196.
23Hojat M, Paskin DL, Callahan CA, et al. Components of postgraduate competence: Analysis of 30 years of longitudinal data. Med Educ. 2007;41:982–989.
24Hojat M, Veloski JJ, Borenstein BD. Components of clinical competence ratings: An empirical approach. Educ Psychol Meas. 1986;46:761–769.
25Hojat M, Borenstein BD, Veloski JJ. Cognitive and noncognitive factors in predicting the clinical performance of medical school graduates. J Med Educ. 1988;63:323–325.
26Hojat M, Veloski JJ, Zeleznik C. Predictive validity of the MCAT for students with two sets of scores. J Med Educ. 1985;60:911–918.
27Swanson DB, Case SM, Waechter D, et al. A preliminary study of the validity of scores and pass/fail standards for the USMLE Steps 1 and 2. Acad Med. 1993;68(10 suppl):S19–S21.
28Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum; 1988.
29Hojat M, Xu G. A visitor's guide to effect size: Statistical significance versus practical (clinical) importance of research findings. Adv Health Sci Educ Theory Pract. 2004;9:241–249.
30Gonnella JS, Hojat M, Erdmann JB, Veloski JJ. Assessment Measures in Medical School, Residency, and Practice: The Connections. New York, NY: Springer; 1993.
31Sackett PR, Borneman MJ, Connelly BS. High-stakes testing in higher education and employment: Appraising the evidence for validity and fairness. Am Psychol. 2008;63:215–227.
32Hemphill JF. Interpreting the magnitudes of correlation coefficients. Am Psychol. 2003;58:78–79.
33Willoughby L, Calkins V, Arnold L. Different predictors of examination performance for male and female medical students. J Am Med Womens Assoc. 1979;34:316–317, 320.