Learner Needs and Environments

A Multilevel Analysis of Examinee Gender and USMLE Step 1 Performance

Cuddy, Monica M.; Swanson, David B.; Clauser, Brian E.

Section Editor(s): Reis, Diane C.; Robins, Lynne PhD

doi: 10.1097/ACM.0b013e318183cd65

Abstract

As the first component of the United States Medical Licensing Examination (USMLE) sequence, Step 1 assesses how well an examinee understands and can apply important basic science concepts fundamental to the safe and competent practice of medicine. Examinees from most U.S. medical schools usually sit for Step 1 at the end of their second year.

Past research has documented gender differences in performance on National Board of Medical Examiners (NBME) Part I and USMLE Step 1 examinations, indicating that men generally outperformed women.1–4 Controlling for differences in prematriculation measures, such as undergraduate grade point average (GPA) and Medical College Admission Test (MCAT) scores, partially reduced these gender-related score differences.2,3 Considering that these prematriculation measures were most likely positively related to Part I and Step 1 performance, the reduction in the magnitude of the gender-related score differences suggests that men, on average, scored higher on prematriculation measures than women. Indeed, past research has shown that men tend to have higher MCAT Biological and Physical Sciences scores compared with women1–3,5 and that MCAT Biological and Physical Sciences scores are positively correlated with Step 1 performance.6,7 Although past research demonstrates differences in Step 1 performance by gender, it is important to note that research also indicates similar ultimate pass rates on Step 1 for men and women.4,8

Because much of this research was conducted more than a decade ago, changes in medical students and/or medical schools may have affected the relationship between examinee gender and Step 1 performance. Using analytic procedures similar to those used in a recent investigation of the relationship between examinee gender and Step 2 Clinical Knowledge (CK) performance,9 the present study provides an updated examination of the relationship between examinee gender and Step 1 performance, controlling for both student and school characteristics.

The current paper has three main objectives: (1) to reevaluate earlier findings related to the relationship between examinee gender and Step 1 performance with more recent data, (2) to investigate the effect of examinee gender on the relationships between prematriculation measures and Step 1 performance, and (3) to examine how medical school characteristics influence the relationships between examinee characteristics and Step 1 performance.

Method

Data.

The data were from USMLE records and included scores and demographic information for examinees who matriculated into U.S. Liaison Committee on Medical Education-accredited medical schools between 1997 and 2002 and took Step 1 for the first time between 1999 and 2004. The sample included 66,412 examinees from 133 medical schools/campuses and spanned six entering cohorts. Puerto Rican schools were excluded from the sample because there is an alternate route to licensure in Puerto Rico that may affect the perceived stakes of USMLE examinations. Cohorts with less than 80% of the median cohort size at a school and schools with fewer than five cohorts were excluded because they may reflect atypical medical school admission policies, Step 1 requirements, and/or curricula. On the basis of these criteria, approximately 3% of cohorts and 4% of schools/campuses were excluded. Because the sample was limited to Step 1 administrations through 2004 and it is possible to sit for Step 1 several years after matriculation, a small percentage of examinees was excluded. In the current sample, on average, 92% of examinees took Step 1 two years after matriculation, and close to 99% had taken the exam by three years after matriculation. Only examinees who consented to the confidential use of their scores in research were included in the sample. Although all USMLE examinees are given the opportunity to decline the use of their scores for research purposes, on average, fewer than 0.01% of examinees do so.

Hierarchical linear modeling analyses.

Hierarchical linear modeling (HLM) techniques offer a means of analyzing multilevel data sets and, in the current study, provide a way to investigate the relationships among examinee characteristics, medical school characteristics, and Step 1 performance. Conceptually, HLM does this by estimating a separate regression line predicting Step 1 scores from examinee characteristics for each medical school (random-coefficients models) and then uses the results of these examinee-level regression analyses (estimates of school-specific intercepts and slopes) as dependent variables in school-level regression analyses (intercepts-and-slopes-as-outcomes models). School characteristics can then be used in the school-level analyses to predict variation in within-school intercepts and slopes.
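As a point of reference, a generic two-level specification (with a single examinee-level predictor X and a single school-level predictor W, rather than the study's full variable set) can be written as

$$\text{Level 1 (examinees):}\quad Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + r_{ij}$$

$$\text{Level 2 (schools):}\quad \beta_{0j} = \gamma_{00} + \gamma_{01} W_j + u_{0j}, \qquad \beta_{1j} = \gamma_{10} + \gamma_{11} W_j + u_{1j}$$

where Y_ij is the Step 1 score of examinee i in school j, r_ij is an examinee-level residual, and u_0j and u_1j are school-level random effects that allow the intercept and slope to vary across schools.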

Such multilevel modeling is useful for disentangling individual and group effects, both of which may matter for medical education outcomes. For example, MCAT scores may be positively related to Step 1 scores, but this relationship may be stronger for students attending medical schools with weaker basic science curricula, because schools with stronger basic science instruction may help students with lower MCAT scores perform better than expected on Step 1.

For the current study, a series of HLM examinees-nested-in-schools analyses was conducted, with Step 1 total scores as the dependent variable. These analyses included (1) a random-effects ANOVA that partitioned total variation in Step 1 scores into within- and between-school components, (2) a series of random-coefficients models used to determine (a) which examinee characteristics to use as within-school predictors and (b) which within-school coefficients to fix and which to allow to vary across schools, and (3) a series of intercepts-and-slopes-as-outcomes models used to examine the impact of school characteristics on mean Step 1 scores (intercepts) and the relationships between examinee characteristics and Step 1 scores (slopes).
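As an illustration only, this sequence could be approximated with a general linear mixed-model routine. The sketch below uses Python's statsmodels with hypothetical file and column names (step1, school, female, esl, major, sci_gpa_adj, mcat_vr, mcat_bp); it is not the software or data set used in the study.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data set: one row per first-time Step 1 examinee.
df = pd.read_csv("step1_examinees.csv")

# (1) Random-effects ANOVA (null model): partitions Step 1 score variance
# into within-school and between-school components.
null_model = smf.mixedlm("step1 ~ 1", df, groups=df["school"]).fit()
within_var = null_model.scale                # examinee-level (within-school) variance
between_var = null_model.cov_re.iloc[0, 0]   # school-level (between-school) variance

# (2) Random-coefficients model: examinee-level predictors, with the intercept
# and the slopes for gender and MCAT BP allowed to vary across schools.
rc_model = smf.mixedlm(
    "step1 ~ female + esl + C(major, Treatment('BiolSci')) "
    "+ sci_gpa_adj + female:sci_gpa_adj + mcat_vr + mcat_bp",
    df,
    groups=df["school"],
    re_formula="~female + mcat_bp",
).fit()
print(rc_model.summary())
```

The third step, the intercepts-and-slopes-as-outcomes models, is sketched in the Results section below.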

Variables.

Within-school predictors included a gender variable (0 = male; 1 = female) and an English as a second language (ESL) variable (0 = native English speaker; 1 = ESL). They also included a series of dichotomous variables indicating the following undergraduate majors: Health Sciences, Social Sciences, Humanities, Biological Sciences, Mathematics, and Physical Sciences. The percentages of male examinees in each major were as follows: Health Sciences (48%), Social Sciences (48%), Humanities (50%), Biological Sciences (54%), Mathematics (63%), and Physical Sciences (64%). In all models, Biological Sciences was used as the reference group.

Additional within-school predictors included undergraduate science and nonscience GPAs adjusted for the selectivity of the examinees’ undergraduate institutions. As in past research,2,3 this adjustment was made by multiplying each GPA by a selectivity index equal to the sum of the undergraduate institution’s mean verbal and quantitative SAT scores divided by 1,000. To investigate the influence of gender on the relationship between adjusted undergraduate science GPAs and Step 1 performance, an interaction term between gender and adjusted undergraduate science GPAs was also used as a within-school predictor.

Because preliminary analyses indicated that MCAT Biological Sciences and Physical Sciences scores were both relatively well correlated with Step 1 scores and with each other, they were averaged and used as a single within-school predictor (MCAT BP). MCAT Verbal Reasoning scores (MCAT VR) and MCAT Essay scores were also used as within-school predictors. Table 1 provides means and standard deviations for adjusted undergraduate GPAs, MCAT scores, and Step 1 scores by examinee gender.
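Continuing the hypothetical sketch above, the variable construction described in the preceding paragraphs might look as follows (column names are illustrative):

```python
# Dichotomous codings: female (0 = male, 1 = female); esl (0 = native English, 1 = ESL).

# Dummy code undergraduate major, with Biological Sciences as the reference group.
# (The formula term C(major, Treatment('BiolSci')) used earlier does this automatically;
# the explicit columns below simply mirror the description in the text.)
major_dummies = pd.get_dummies(df["major"], prefix="major").drop(columns=["major_BiolSci"])
df = pd.concat([df, major_dummies], axis=1)

# Adjust undergraduate GPAs for undergraduate-institution selectivity:
# GPA x (institution's mean SAT verbal + mean SAT quantitative) / 1,000.
selectivity = (df["ug_sat_verbal"] + df["ug_sat_quant"]) / 1000.0
df["sci_gpa_adj"] = df["sci_gpa"] * selectivity
df["nonsci_gpa_adj"] = df["nonsci_gpa"] * selectivity

# Gender x adjusted science GPA interaction term.
df["female_x_sci_gpa_adj"] = df["female"] * df["sci_gpa_adj"]

# Average MCAT Biological Sciences and Physical Sciences scores into MCAT BP.
df["mcat_bp"] = (df["mcat_bio"] + df["mcat_phys"]) / 2.0
```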

Table 1: Adjusted Undergraduate GPAs, MCATs, and United States Medical Licensing Examination (USMLE) Step 1 Observed Scores, by Examinee Gender

Between-school predictors included the percentage of students at a school who are female (mean = 46.4%, SD = 6.0), the percentage of students at a school who are native English speakers (mean = 89.9%, SD = 6.8), school size as indicated by schools’ average cohort size (mean = 84.0 students, SD = 38.8), schools’ average MCAT VR scores (mean = 9.5, SD = 0.6), and schools’ average MCAT BP scores (mean = 10.0, SD = 0.9). All within-school predictors were group-mean centered, and all between-school predictors were grand-mean centered.
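The centering described above might be implemented as in the following sketch (again with illustrative column names): each within-school predictor is expressed as a deviation from its own school's mean, and each between-school predictor as a deviation from the mean across schools.

```python
# Group-mean center within-school predictors (subtract each school's mean).
within_vars = ["female", "esl", "sci_gpa_adj", "mcat_vr", "mcat_bp"]
for var in within_vars:
    df[var + "_c"] = df[var] - df.groupby("school")[var].transform("mean")

# Grand-mean center between-school predictors (subtract the mean across schools,
# with each school counted once).
between_vars = ["pct_female", "pct_native_eng", "cohort_size",
                "school_mcat_vr", "school_mcat_bp"]
for var in between_vars:
    school_means = df.groupby("school")[var].first()
    df[var + "_c"] = df[var] - school_means.mean()
```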

Results

Random-effects ANOVA.

The top portion of Table 2 provides the results of the random-effects ANOVA. Of the total variation in Step 1 scores, 90.7% was within schools, and 9.3% was between schools. These percentages were determined by dividing the within-school variance component (463.26) and the between-school variance component (47.71) by the sum of the within- and between-school variance components (510.97) and then multiplying the individual quotients by 100. These findings are generally consistent with Step 2 CK, where 94.2% of the total variation in scores was within schools and 5.8% was between schools.9 Thus, for both Steps, although some variation in scores exists between schools, the majority is among examinees.
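Equivalently, the between-school share of variance is the intraclass correlation implied by the Table 2 variance components:

$$\rho = \frac{\sigma^2_{\text{between}}}{\sigma^2_{\text{within}} + \sigma^2_{\text{between}}} = \frac{47.71}{463.26 + 47.71} \approx 0.093,$$

with the remaining share, 463.26 / 510.97 ≈ 0.907, lying within schools.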

Table 2: Results of Examinees-Nested-in-Schools Hierarchical Linear Modeling Analyses

Random-coefficients models.

To determine which examinee-level independent variables to use as within-school predictors and which within-school coefficients to let vary across schools, a series of random-coefficients models was run. Preliminary models indicated that nonscience GPAs were less strongly related to Step 1 scores than science GPAs, so nonscience GPAs were excluded from the final random-coefficients model. MCAT Essay scores were also excluded from the final random-coefficients model because preliminary models showed them to be essentially unrelated to Step 1 scores. All other within-school predictors were included in the final random-coefficients model.

The middle portion of Table 2 presents the results of the final random-coefficients model, which explained 25.2% of the within-school variation in Step 1 scores, the majority of which was attributable to MCAT BP scores. Controlling for differences in undergraduate major, adjusted undergraduate science GPAs, and MCAT scores, men performed, on average, 1.5 points higher than women on Step 1. Without controlling for such prematriculation measures, men outperformed women by about 6.2 points, as indicated by an initial random-coefficients model not reported in Table 2. Overall, the regression of Step 1 scores on adjusted undergraduate science GPAs was steeper for women compared with men. For every one-point increase in adjusted GPAs, Step 1 scores were expected to increase 6.60 points for men and 7.58 points (6.60 + 0.98) for women.
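Expressed as an equation, the gender × adjusted science GPA interaction implies a gender-specific slope (coefficients taken from Table 2, other predictors held constant):

$$\frac{\partial\,\widehat{\text{Step 1}}}{\partial\,\text{adjusted science GPA}} = 6.60 + 0.98 \times \text{female},$$

which equals 6.60 for men (female = 0) and 7.58 for women (female = 1).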

With respect to undergraduate major, Health Sciences majors, on average, outperformed Biological Sciences majors by 2.63 points, controlling for adjusted undergraduate science GPAs and MCAT scores. Biological Sciences majors outperformed other majors, with differences ranging from 2.43 points for Physical Sciences majors to 4.05 points for Mathematics majors. In terms of MCAT scores, MCAT VR and MCAT BP scores were both positively related to Step 1 scores, although the effect was larger for MCAT BP scores. For every one-point increase in MCAT VR scores, Step 1 scores were expected to increase 0.96 points, whereas for every one-point increase in MCAT BP scores, Step 1 scores were expected to increase 5.9 points.

Average Step 1 scores varied significantly from school to school, as did the relationships between examinee gender and Step 1 scores, and MCAT BP scores and Step 1 scores. Therefore, the intercept, the slope for gender, and the slope for MCAT BP were allowed to vary across schools in the intercepts-and-slopes-as-outcomes models. Slopes for all other within-school predictors were fixed.

Intercepts-and-slopes-as-outcomes models.

Once the final random-coefficients model was determined, a series of intercepts-and-slopes-as-outcomes models was run to explain school-to-school variation in intercepts and slopes. The lower portion of Table 2 presents the results of the final intercepts-and-slopes-as-outcomes model, which explained 25.2% of the within-school variation and 77.4% of the between-school variation in Step 1 scores. Because the final intercepts-and-slopes-as-outcomes model included the same within-school predictors and yielded results similar to those of the final random-coefficients model, the lower portion of Table 2 does not include all of the within-school results.

The final intercepts-and-slopes-as-outcomes model included the following between-school predictors of schools’ mean Step 1 scores: percent native English speakers, medical school size, and schools’ mean MCAT BP scores. Percent female was unrelated to differences in mean Step 1 scores, and, controlling for mean MCAT BP scores, mean MCAT VR scores were also unrelated to differences in mean Step 1 scores. Percent female was included as a between-school predictor of the relationship between MCAT BP and Step 1 scores. None of the other between-school predictors were significantly related to school differences in MCAT BP slopes, and none of the between-school predictors explained any school-to-school variation in the relationship between gender and Step 1 scores.
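In mixed-model terms, continuing the hypothetical Python sketch from the Method section, the school-level main effects predict schools’ mean Step 1 scores and the percent female × MCAT BP cross-level interaction predicts school-to-school variation in the MCAT BP slope:

```python
# (3) Intercepts-and-slopes-as-outcomes model. Between-school main effects
# (percent native English speakers, school size, school mean MCAT BP) predict
# school mean Step 1 scores; the pct_female x mcat_bp cross-level interaction
# predicts variation in the MCAT BP slope across schools. In the study,
# within-school predictors are group-mean centered and between-school
# predictors are grand-mean centered, as sketched earlier.
final_model = smf.mixedlm(
    "step1 ~ female + esl + C(major, Treatment('BiolSci')) "
    "+ sci_gpa_adj + female:sci_gpa_adj + mcat_vr + mcat_bp "
    "+ pct_native_eng + cohort_size + school_mcat_bp "
    "+ pct_female:mcat_bp",
    df,
    groups=df["school"],
    re_formula="~female + mcat_bp",
).fit()
print(final_model.summary())
```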

In terms of MCAT scores, schools with higher mean MCAT BP scores, on average, had higher mean Step 1 scores. For every one-point increase in schools’ average MCAT BP score, schools’ mean Step 1 scores were expected to increase 7.8 points. It is important to note that a one-point increase in a school’s mean MCAT BP score is a large change, given that a one-point change is slightly greater than a one-standard-deviation increase (0.9) in mean MCAT BP scores. The percentage of female students at a school affected the slope of the regression of Step 1 scores on MCAT BP scores. The regression was slightly steeper at schools with more female students. On average, the slope was expected to increase 0.3 points (0.03 × 10) for every 10-percentage-point increase in female students.

Discussion

The current study demonstrated that men tend to outperform women slightly on Step 1 and that controlling for prematriculation measures reduced, but did not eliminate, this relatively small performance difference. This is generally consistent with earlier research.1–4 As noted elsewhere,2 compared with women, men may acquire stronger basic science backgrounds during their undergraduate education, as indicated by their generally higher MCAT Biological and Physical Sciences scores. This may help explain their somewhat higher Step 1 scores, because much of the content of Step 1 is basic science related. Choice of undergraduate major may account for some of this knowledge difference, because more men than women appear to major in basic science disciplines. The type of undergraduate instruction associated with the specific science disciplines in which many men major may also better prepare students for Step 1.

The relatively small difference in average Step 1 performance between men and women after controlling for prematriculation measures found in the current study (approximately 1.5 points, or 0.10 SDs) could be interpreted as additional validity support for the interpretation of Step 1 scores. Indeed, it is quite possible that the use of additional examinee- and/or school-level independent variables could completely account for this slight gender-related difference in Step 1 performance. Possibilities include the methods by which examinees prepare for the USMLE sequence (e.g., study groups or preparation courses) and/or the type and content of medical schools’ basic science curricula.

The current study also examined the influence of gender on the relationship between adjusted undergraduate science GPAs and Step 1 performance and found that GPAs were generally more strongly associated with Step 1 performance for women than for men. This may indicate that, during the initial years of medical school, instruction affects men in ways that it does not affect women. As others have noted, this could be attributable to adjustment issues for women, or to women’s preferences for the subject matter stressed in the later years of medical school.1,2 This would be consistent with the idea that a basic science knowledge gap favoring men at the start of medical school persists during the first few years of medical education.

As outlined in earlier work, it seems that this gender-related knowledge gap narrows over the course of medical education,1,2 and when the focus shifts from basic to clinical science, women begin to “catch up” and outperform men, as indicated by their generally comparable or higher Step 2 CK scores.1,2,4,9 One recent study showed that Step 1 scores were more strongly associated with Step 2 CK performance for men than for women.9 In this case, during the later years of medical education, medical school training seems to affect women in ways that it does not affect men. Again, this may be attributable to women’s preferences for the areas taught in the later years of medical school, because these domains may be more closely associated with the specialty areas in which many women may ultimately want to practice.

One school-level finding worth highlighting is that MCAT Biological and Physical Sciences scores were slightly more strongly associated with Step 1 performance for students from schools with higher percentages of female students than for students from schools with lower percentages of female students. Although this effect was small, it may be that at schools with lower percentages of women, increased interaction with men, who may, on average, have stronger basic science backgrounds, better prepares students for Step 1. It is important to note that past research has generally found the percentage of female students at a school to be unrelated to schools’ mean performances on the NBME Part I, II, and III examinations,10 and on USMLE Step 2 CK.9 Past research has further found this school-level variable to be unrelated to between-school variation in the effect of Step 1 scores on Step 2 CK performance.9 In general, the medical school characteristics used in the current study were often unable to predict variation in school-to-school performance differences. Thus, when available, a wider range of medical school characteristics should be examined in relation to USMLE score data.

The present study provides some useful updated information on the relationship between examinee gender and Step 1 performance. In conjunction with a recent study of gender differences in Step 2 CK performance, the current findings revealed patterns of gender-related achievement differences in medical education that generally mirrored those reported consistently during the past 35 years. Although it seems that gender-related performance differences in medical education begin with dissimilar basic science backgrounds, the reasons, both at the student and school levels, for this knowledge gap are unclear from the present analyses and should be investigated in future studies.

References

1 Weinberg E, Rooney JF. The academic performance of women students in medical school. J Med Educ. 1973;48:240–247.
2 Case SM, Becker DF, Swanson DB. Performance of men and women on NBME Part I and Part II: The more things change …. Acad Med. 1993;68(10 suppl):S25–S27.
3 Dawson B, Iwamoto CK, Ross LP, Nungester RJ, Swanson DB, Volle RL. Performance on the National Board of Medical Examiners Part I examination by men and women of different race and ethnicity. JAMA. 1994;272:674–679.
4 Case SM, Swanson DB, Ripkey DR, Bowles LT, Melnick DE. Performance of the class of 1994 in the new era of USMLE. Acad Med. 1996;71(10 suppl):S91–S93.
5 Koenig JA, Leger KF. A comparison of retest performances and test-preparation methods for MCAT examinees grouped by gender and race-ethnicity. Acad Med. 1997;72(10 suppl):S100–S102.
6 Swanson DB, Case SM, Koenig JA, Killian CD. Preliminary study of the accuracies of the old and new Medical College Admission Tests for predicting performance on USMLE Step 1. Acad Med. 1996;71(10 suppl):S25–S30.
7 Julian ER. Validity of the Medical College Admission Test for predicting medical school performance. Acad Med. 2005;80:910–917.
8 DeChamplain AF, Winward ML, Dillon GF, DeChamplain JE. Modeling passing rates on a computer-based medical licensing examination: An application of survival data analysis. Educ Meas Issues Pract. 2004;23:15–22.
9 Cuddy MM, Swanson DB, Dillon GF, Holtman MC, Clauser BE. A multilevel analysis of the relationships between selected examinee characteristics and United States Medical Licensing Examination Step 2 Clinical Knowledge performance: Revisiting old findings and asking new questions. Acad Med. 2006;81(10 suppl):S103–S107.
10 Shen L. Gender effects on student performances on the NBME Part I, Part II, and Part III. Acad Med. 1994;69(10 suppl):S75–S77.
© 2008 Association of American Medical Colleges