The annual U.S. News & World Report (USN&WR) undergraduate and graduate school rankings attract attention among1–14 and appear to influence the behavior of15–18 both lay and academic audiences. The rankings purport to reflect the relative quality of training at various schools.19–22 If rankings truly reveal quality, they could provide useful guidance to students considering application options, and to educators and administrators seeking to benchmark and improve their programs. USN&WR does not publish information regarding the measurement properties of their rankings, but independent empirical studies have examined this issue, and the findings call into question the reliability, validity, and utility of the rankings.1,2,4,6,7,9,13,14,18,23
Nearly all of the prior studies have focused on USN&WR rankings of undergraduate and nonmedical graduate schools; we found only one that has focused on aspects of the overall medical school ranking methodology.9 Furthermore, to our knowledge, no investigators have examined the measurement properties of the USN&WR Primary Care Medical School (PCMS) ranking, nor of the PCMS score used to derive the ranking (Table 1).22 This research gap is an important one to fill given both the importance of primary care to meeting the health needs of the U.S. population24 and the key differences in the PCMS ranking methodology as compared with the overall medical school (and certainly other USN&WR) methodologies. Just two examples illustrate these differences: the “primary care rate” component of the PCMS ranking methodology is not a part of the overall medical school ranking methodology; conversely, the “research activity” factor in the overall medical school methodology is not employed in the PCMS methodology. These and other differences suggest the need for studies that specifically examine the measurement properties of the PCMS score and resulting ranking.
Among the key unexamined aspects of the USN&WR PCMS score are its short-term stability (within-school, between-year [or year-to-year] reliability), and the short-term stability of the two subjective components of the calculation: (1) the peer assessment score and (2) the residency director score, which together account for 40% of each school’s total score (Table 1). In theory, we would anticipate these two scores to change little from year to year, as most schools do not undergo changes that substantively affect primary care training in the short term. Further, even when such changes do occur, they generally are not immediately apparent to external raters. Low short-term stability of either the peer or the residency director score would limit both the short-term stability of the overall score and the ability to distinguish reliably among schools. Because adequate measure reliability is necessary (though not sufficient) for validity, low short-term stability of these subjective components would also undermine the validity of the composite PCMS score.
Also unexamined is the degree to which differences in PCMS between-year rankings for a single school and the rankings among schools within a given year are informative. These differences are important to consider, given both the publicity the rankings receive and the human tendency to interpret rank-ordered lists as indicative of meaningful differences among ranked entities even when meaningful differences do not exist.25,26
Although we cannot directly evaluate the validity of the PCMS because of the lack of a readily available reference standard, several elements have low face validity.27 USN&WR defines the “primary care rate” as the percentage of graduates entering family medicine, internal medicine, and pediatrics. Whereas most family medicine graduates become primary care physicians, smaller percentages of pediatric and internal medicine graduates do so.28–30 Thus, schools that produce many pediatric and/or internal medicine subspecialists may receive artificially high PCMS scores. The relevance of the “student selectivity” component (Table 1) is also unclear because some evidence suggests that higher selectivity might be detrimental to primary-care-interested students.31 Also, USN&WR defines its “faculty resources” score as the ratio of all full-time science and clinical faculty to full-time students. Having relatively large numbers of basic science and non-primary-care clinical faculty could increase a school’s faculty resources score, but whether these faculty would contribute to better primary care training is unclear. Eliminating such questionable elements of the score or, where feasible, modifying them to improve their face validity might substantively change the rankings.
We analyzed data employed in the 2009–2012 USN&WR PCMS rankings, augmented with additional relevant, publicly available data from the American Academy of Family Physicians (AAFP).32,33 We had three general aims: (1) to reconstruct the USN&WR PCMS scores and rankings for these years, using their published methodology, (2) to assess the short-term stability of the reconstructed USN&WR PCMS score and its two subjective components (the peer and residency director scores), as well as the short-term stability of the PCMS rankings, and (3) to examine the spread of individual schools’ USN&WR PCMS rankings during the four-year study period. In exploratory analyses, we also compared the reconstructed USN&WR PCMS rankings with rankings derived from two alternative methodologies, which we devised with the intention of improving the face validity of the rankings.
The study primarily employed data from the 2009–2012 USN&WR PCMS rankings.22 For 2009, 2011, and 2012, we had access to all of the data for all schools listed by USN&WR on their subscription-based, annually updated Web site.34
In 2010, we had access to data for only the 22 schools ranked (some schools were tied) in the top 20, as published in the print version of the magazine.35
USN&WR PCMS score elements and methodology
Using data presented on the USN&WR Web site,34 we reconstructed the USN&WR PCMS scores and rankings for all schools, using the following elements and methodology (Table 1).
USN&WR conducted a quality assessment that consisted of two components: peer assessment and residency director assessment. To compute the peer assessment component, a USN&WR contractor mailed a survey in the fall prior to each ranking year to MD-granting and osteopathic medical school deans, deans of academic affairs, heads of internal medicine, and/or directors of admissions. The survey asked these individuals to rate each U.S. MD-granting and osteopathic school’s primary care program on a five-point Likert-type scale (1 = marginal, 2 = adequate, 3 = good, 4 = strong, 5 = outstanding, or “Don’t know well enough to rate fairly”). USN&WR calculated a mean peer assessment score for each school, based on all responses. USN&WR reported survey response rates of 48%, 46%, and 43% for, respectively, 2009, 2011, and 2012, but did not report the number of surveys mailed or peer ratings returned for each school. We were unable to obtain the peer assessment survey response rate for 2010.
The process for computing the residency director component was analogous; the only difference was that the survey was mailed to family medicine, internal medicine, and pediatrics residency directors. Also, USN&WR noted that for the residency director component, ratings for the two most recent survey years were averaged for use in ranking calculations.22 The USN&WR reported overall response rates of 18%, 19%, and 16% for, respectively, the 2009, 2011, and 2012 rankings; we were unable to obtain the response rate for 2010.
USN&WR calculated primary care rate as the percentage of graduates entering family medicine, internal medicine, and pediatrics residencies, averaged over the ranking year and the two prior years.
Student selectivity consisted of three components: mean composite MCAT score, mean undergraduate grade point average, and acceptance rate (proportion of applicants offered admission), all in the ranking year.
USN&WR reported calculating faculty resources as the ratio of full-time science and clinical faculty to full-time students in the ranking year. However, in reconstructing the USN&WR published scores, we discovered that USN&WR used the student-to-faculty ratio instead, applying negative weights such that a higher student-to-faculty ratio meant a worse score.
For each ranking year we internally z-score-transformed (i.e., standardized to a sample mean of 0 and standard deviation of 1) the scores for each scoring element. To avoid bias in computing z scores in 2010, for which data from only the top 22 schools were available to us, we temporarily imputed to schools below the top 22 a value for each PCMS score element that was equal to the per-school mean value of that element in the two adjacent years (2009 and 2011).
The weighted average of a school’s computed z scores was the reconstructed PCMS score. In its publications, USN&WR rescales the PCMS score so that the highest-rated school receives a score of 100. We opted to forego this rescaling and present weighted mean scores, which we felt would facilitate reader interpretation of the score distribution without altering the relative rankings of schools.
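This standardization-and-weighting step can be sketched in a few lines of code. All school values below are hypothetical; the peer (0.25), residency director (0.15), and primary care rate (0.30) weights reflect the relative weights noted in this article, while the split between the remaining two elements is assumed purely for illustration.

```python
import statistics

def z_scores(values):
    """Standardize one scoring element to mean 0, SD 1 within a ranking year."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def weighted_pcms(z_by_element, weights):
    """Weighted average of each school's z scores across scoring elements."""
    n_schools = len(next(iter(z_by_element.values())))
    return [sum(weights[e] * z_by_element[e][i] for e in weights)
            for i in range(n_schools)]

# Hypothetical raw values for three schools. The peer (0.25), residency
# director (0.15), and primary care rate (0.30) weights come from the
# published methodology; the 0.15/0.15 split for student selectivity and
# faculty resources is an assumption made here for illustration only.
raw = {"peer": [3.8, 2.9, 3.4], "rd": [3.5, 3.0, 3.2],
       "pc_rate": [0.42, 0.35, 0.50], "selectivity": [0.6, 0.4, 0.5],
       "faculty": [1.1, 0.9, 1.3]}
weights = {"peer": 0.25, "rd": 0.15, "pc_rate": 0.30,
           "selectivity": 0.15, "faculty": 0.15}
z = {element: z_scores(vals) for element, vals in raw.items()}
scores = weighted_pcms(z, weights)
```

Because each z-transformed element averages to 0 within a ranking year, the weighted composite also averages to 0 across schools, which is consistent with the clustering of reconstructed scores around 0 reported in the Results.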
Exploratory alternative PCMS score methodologies
In an effort to improve the face validity of the rankings, we compared the reconstructed USN&WR PCMS rankings with rankings derived from two alternative methodologies. Alternative methodology 1 involved calculating the PCMS score using only those elements of the USN&WR methodology that we viewed as plausibly associated with primary care training quality (i.e., those with reasonable face validity). The three elements, each z-transformed and relatively weighted as in the original USN&WR methodology (Table 1), were the peer score (25/70), residency director score (15/70), and primary care rate (30/70). In other words, we excluded the “student selectivity” and “faculty resource” components.
Alternative methodology 2 was similar to alternative methodology 1; the only difference was that we adjusted the primary care rate calculation. Instead of employing the percentages of all family medicine, internal medicine, and pediatrics graduates as in the USN&WR methodology, we discounted the percentages of graduates into different primary care fields, based on national data regarding the rates at which graduates in each field actually end up in primary care practice. We obtained publicly available data from the AAFP regarding the number of students matching into family medicine by school from 2006 to 2010 (no comparable data were available for internal medicine and pediatrics).32,33 For each school for each year, we estimated the proportion of recent graduates matching into family medicine, based on all available data for that school for that year and up to three previous years. We divided this proportion by the overall USN&WR primary care rate to estimate the proportion of primary care graduates matching into family medicine and the proportion (determined by taking the complement) matching into internal medicine or pediatrics.
We applied a propensity factor of 0.95 to the proportion of family medicine graduates from each school, reflecting that the vast majority of family physicians practice primary care. We applied a propensity factor of 0.50 to the proportion of internal medicine or pediatrics graduates. This represents a conservative approach, because whereas about 50% to 60% of pediatricians practice primary care, far fewer internists do so.28–30
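The adjusted primary care rate in alternative methodology 2 thus amounts to a propensity-weighted discount of the USN&WR rate. A minimal sketch (the function name and example inputs are ours, not USN&WR's):

```python
def adjusted_primary_care_rate(pc_rate, fm_match_rate,
                               fm_propensity=0.95, im_peds_propensity=0.50):
    """Discount the USN&WR primary care rate by specialty-specific propensities
    to actually practice primary care, per alternative methodology 2.

    pc_rate: fraction of graduates entering FM, IM, or pediatrics residencies.
    fm_match_rate: fraction of graduates matching into family medicine.
    """
    fm_share = fm_match_rate / pc_rate  # share of primary care matches that are FM
    im_peds_share = 1.0 - fm_share      # complement: internal medicine or pediatrics
    return pc_rate * (fm_share * fm_propensity
                      + im_peds_share * im_peds_propensity)

# Hypothetical school: 40% of graduates enter FM/IM/pediatrics, 10% enter FM.
adjusted = adjusted_primary_care_rate(0.40, 0.10)  # 0.40 * (0.25*0.95 + 0.75*0.50)
```

In this hypothetical example the nominal 40% primary care rate is discounted to 24.5%, illustrating how schools whose primary care matches skew toward internal medicine and pediatrics receive larger discounts.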
We conducted our statistical analyses using SAS System for Windows version 9.3 (SAS Institute Inc, Cary, North Carolina). We have denoted the 25th percentile and the 75th percentile of distributions as, respectively, Q1 and Q3. We have also reported Pearson correlations between the reconstructed and published PCMS scores for each year.
Using all available data in 2009, 2011, and 2012, we used mixed-effects models to assess the within-school between-year reliability (short-term stability) of the reconstructed PCMS score, the peer and residency director score components, and the PCMS rankings that are based on the reconstructed score and the two alternative methodologies. We used restricted maximum likelihood estimates of between-school and residual variance components from simple mixed-effects models to estimate intraclass correlation coefficients (ICCs—the ratio of the between-school variance component divided by the sum of between-school and residual variance components).36,37 ICCs of 0.90 and 0.95 have been proposed as, respectively, the “minimum” and “desirable” standard for assessing reliability in the context of decision making.38 Our mixed-effects models included a fixed effect for the study year factor and random intercepts for each school, and we have reported the square root of the residual variance component as the standard error of measurement.38
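Given estimated variance components from such a mixed-effects model, the ICC, standard error of measurement, and approximate 95% CI span reduce to simple arithmetic. The variance values below are illustrative only, although the residual variance of 121 mirrors the ranking standard error of measurement of 11 reported in the Results:

```python
import math

def icc(var_between_school, var_residual):
    """Intraclass correlation: between-school variance over total variance."""
    return var_between_school / (var_between_school + var_residual)

def sem(var_residual):
    """Standard error of measurement: square root of the residual variance."""
    return math.sqrt(var_residual)

def ci_span_95(standard_error):
    """Approximate span of a 95% CI: about four standard errors of measurement."""
    return 4 * standard_error

# Illustrative variance components. A residual variance of 121 corresponds to
# the ranking standard error of measurement of 11 reported in the Results.
example_icc = icc(900.0, 121.0)          # ~0.88
example_sem = sem(121.0)                 # 11.0
example_span = ci_span_95(example_sem)   # 44 ranking positions
```

The example ICC of about 0.88 falls just below the proposed 0.90 "minimum" standard for decision making, showing how even seemingly high reliability can leave a wide CI around an individual school's ranking.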
We used a variant of the Bland-Altman plot to depict the short-term spread (within-school range) versus the mean annual ranking of the reconstructed PCMS from 2009 to 2012,39 and we used the Kruskal-Wallis test to compare the magnitude of these short-term spreads for those schools whose average annual ranking fell in the 1–20 range versus the spreads for all other schools.
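The within-school spread statistic underlying this plot is simply each school's best-to-worst ranking range across the available years. A toy sketch with hypothetical rankings (the Kruskal-Wallis group comparison is omitted here):

```python
import statistics

def within_school_range(rankings_by_school):
    """Range (worst minus best ranking) for each school across available years."""
    return {school: max(ranks) - min(ranks)
            for school, ranks in rankings_by_school.items()}

# Hypothetical annual rankings for four schools over 2009-2012.
rankings = {
    "School A": [2, 1, 3, 2],      # stable near the top of the list
    "School B": [15, 18, 12, 20],
    "School C": [45, 60, 38, 55],  # large short-term spread
    "School D": [70, 88, 66, 81],
}
spreads = within_school_range(rankings)
median_spread = statistics.median(spreads.values())
```

Plotting each school's range against its mean annual ranking, as in the Bland-Altman-style figure, makes visible whether spread grows for schools outside the top positions.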
Reconstructed USN&WR PCMS scores and rankings
Sufficient data were available to reconstruct USN&WR PCMS scores for 119, 22, 121, and 112 schools in, respectively, 2009, 2010, 2011, and 2012. Figure 1 presents a histogram of the reconstructed USN&WR PCMS scores for the 112 schools in 2012. The median (Q1, Q3) score was 0.04 (−0.26, 0.34), and most schools were tightly clustered near the center of the distribution.
Table 2 shows the reconstructed 2012 PCMS scores and rankings. The reconstructed PCMS score correlated highly with USN&WR published scores in each study year; the Pearson correlations were 99.9% for 61 schools in 2009, 99.5% for 22 schools in 2010, 99.6% for 98 schools in 2011, and 98.9% for 89 schools in 2012.
Comparison of USN&WR and alternative methodology rankings
Table 2 also shows the 2012 rankings determined via the two alternative methodologies and, for each, the difference in ranking as compared with the USN&WR methodology. Of the 112 schools with sufficient data for analysis in 2012, 40 (36%) had a ranking that differed from the USN&WR ranking by more than 10 positions when we applied alternative methodology 1, and 53 (47%) had a ranking that differed from the USN&WR ranking by more than 10 positions when we applied alternative methodology 2.
Short-term stabilities of the PCMS score, quality assessment components, and PCMS ranking
Table 3 shows the variance components and ICCs of the reconstructed USN&WR PCMS score. It also provides these data for the peer and residency director scores, the reconstructed PCMS ranking, and the two alternative methodology rankings. All of the ICC estimates indicate fair to good short-term stability. The standard error of measurement for an individual school’s ranking was 11.
Short-term spread of the PCMS rankings
Figure 2 presents a Bland-Altman-type plot of the within-school range versus the mean annual ranking based on the reconstructed PCMS scores for the 107 schools with at least three annual scores from 2009 to 2012. The plot indicates a large amount of within-school variation for schools with mean annual rankings below the top few slots. The median (Q1, Q3) within-school range was 14 (7, 23), indicating that at least half of the schools experienced a change from best to worst ranking of at least 14 positions. For the 18 schools whose average annual ranking was in the 1–20 range, the median (Q1, Q3) difference between the school’s best and worst annual ranking during the study period was 4 (1, 12), while for the other 89 schools, it was 17 (9, 24) (P < .001 for comparison, Kruskal-Wallis test).
Discussion and Conclusions
Our findings begin to characterize key measurement properties of both the USN&WR PCMS score and the associated, widely publicized annual PCMS ranking. We found fair to good short-term stability of the PCMS score and its peer and residency director scores. The ICCs for these scores were around 90% (Table 3), a level commonly cited as the minimum standard necessary (but not sufficient) for reliable decision making (i.e., discriminating among schools).38
By contrast, our findings regarding the PCMS rankings suggest they are more problematic. Although the PCMS score itself was reasonably stable in the short term, most schools’ scores were tightly clustered near the center of the score distribution (Figure 1). Thus, for any given school, even a very small change in PCMS score from one year to the next could result in a large change in ranking position. The span of the 95% confidence interval (CI) for a school’s ranking can be estimated at four times the standard error of measurement.38 Given the standard error of measurement in our ranking analysis of approximately 11, this yields an estimated 95% CI span of 44 ranking positions, a large proportion of the total ranking positions (slightly over 100) in any year. Indeed, we observed that individual schools’ rankings tended to vary considerably across the short study time interval, particularly schools with mean rankings below the top 20. For those schools, the median range in rankings over the study period was 17. That is, over half of the schools below the top 20 had a difference of 17 or more between their best and worst ranking during the four study years. That the actual primary care training quality would change to such a degree at so many schools over such a short period of time seems implausible.
In this context, the USN&WR ranking appears to distinguish between two groupings of schools: those in the top 20, whose stable high rankings were largely driven by their consistently high subjective peer and residency director scores (data not shown, available from authors), and all others. Our finding of stable PCMS rankings over the short term among the top 20 schools from 2009 to 2012 echoes the finding of a prior study, which examined the long-term stability of overall medical school rankings in an analysis limited to the 25 most highly ranked schools from 1996 to 2000.9 Similarly, our finding of unstable PCMS rankings among schools ranked below the top 20 is generally consistent with the findings of prior research regarding the measurement properties of the USN&WR undergraduate and nonmedical graduate school rankings.1,2,4,6,7,13,14,18,23 For schools ranked below the top 20 in our study, the differences in rankings among schools within years, and the differences in an individual school’s ranking across years, seem unlikely to represent meaningful, informative differences to medical school applicants, faculty, or administrators.
Whether or not schools with mean annual rankings in the top 20 actually offer better-quality primary care training than other schools remains open to question because validation studies of the USN&WR PCMS method have not been published. Directly examining the validity of the method is difficult because no gold standard reference method for determining primary care quality exists. Thus, we explored how removing or modifying those elements of the USN&WR PCMS methodology that have low face validity would change the scores and rankings. Both alternative methodologies resulted in substantially different rankings. For example, in 2012, Harvard Medical School was 15th in the reconstructed USN&WR ranking, but 34th and 41st when applying alternative methodologies 1 and 2, respectively. Harvard Medical School’s high ranking in the USN&WR methodology was driven in part by a relatively high “primary care rate” (data not shown, available from authors), which, as mentioned, assumes that all graduates matching into internal medicine and pediatrics residencies will pursue primary care (Table 1). Given national data regarding the proportions of physicians in these specialties who actually pursue primary care,28–30 this assumption is likely spurious. Harvard Medical School’s high ranking was also partly driven by elements of the USN&WR methodology without clear relevance to primary care training quality (data available from authors) such as “student selectivity” and “faculty to student ratio” (Table 1). Modifying (in the case of the primary care rate) or removing (in the case of student selectivity and faculty resources) low-face-validity elements of the USN&WR methodology resulted in alternative rankings that we believe have higher face validity (Table 2).
Our study had some limitations. USN&WR declined our request for access to the original raw data employed in the ranking calculations, so we had to reconstruct the scores and rankings from the publicly available information on the USN&WR Web site34; nonetheless, our replicated PCMS scores and rankings correlated highly with the published scores and rankings. Also, for 2010, we had PCMS ranking data for only the top 22 schools. Thus, the four-year ranking spreads calculated for the remaining schools represent underestimations because they are based on only three years of data. We maintained the same scoring weights for the components retained in our alternative methodologies that the original USN&WR methodology employs. These weights appeared reasonable to us, but they are ultimately arbitrary because no criterion standard exists for determining the relative importance of each of the components to primary care training quality.
Response rates were low for the peer and residency director surveys employed to calculate the “quality assessment” score, which is used in the original USN&WR methodology and therefore also in our alternative methodologies, potentially undermining the validity of all three methods. Finally, our analyses focused only on examining some key measurement properties of the USN&WR methodology and on exploring the utility of alternative methodologies (adjustments) with higher face validity. Ideally, factors not addressed in any of the methodologies would be considered in assessing the primary care training quality at medical schools. Such factors include student and expert educator ratings of the quality of primary care curricula, as well as the contribution of primary care training to meeting aspects of the social mission of medical schools (e.g., training adequate numbers of minority physicians, supplying physicians to underserved areas)24; however, information regarding such factors is not readily available from all medical schools. These observations suggest the need for enhanced collaboration among schools and key national organizations (e.g., Association of American Medical Colleges, AAFP) to develop comprehensive, valid, nonproprietary methods of assessing and comparing the quality of primary care training at U.S. medical schools.
In conclusion, the USN&WR PCMS score and ranking had reasonably good short-term stability. However, the spread in individual schools’ PCMS rankings during the study time period was large, particularly among those with mean annual rankings below the top 20. The variation in individual schools’ rankings was greater than can be plausibly attributed to changes in the quality of primary care training, raising questions regarding the validity of the rankings and their utility to medical school applicants, faculty, and administrators.
Acknowledgments: The authors thank Nicholas Clark, University of California Davis Center for Healthcare Policy and Research, for meticulous and timely data entry.
2. Brooks RL. Measuring university quality. Rev High Educ. 2005;29:1–21
4. Clarke M. News or noise? An analysis of U.S. News and World Report’s ranking scores. Educ Meas Issues Pract. 2002;21:39–48
5. Cole JR, Lipton JA. The reputations of American medical schools. Soc Forces. 1977;55:662–684
6. Dichev I. News or noise? Estimating the noise in the U.S. News university rankings. Res High Educ. 2001;42:237–266
8. Hazelkorn E. Learning to live with league tables and ranking: The experience of institutional leaders. High Educ Policy. 2008;21:193–215
9. McGaghie WC, Thompson JA. America’s best medical schools: A critique of the U.S. News & World Report rankings. Acad Med. 2001;76:985–992
10. Schwenk TL, Sheets KJ. Family medicine in highly ranked medical schools. Fam Med. 2008;40:538–539
13. Webster TJ. USNWR college rankings reexamined. J Coll Teach Learn. 2005;2:3–16
14. Clarke M. Quantifying quality: What can the U.S. News and World Report rankings tell us about the quality of higher education? Educ Policy Anal Arch. 2002;10. http://epaa.asu.edu/epaa/v10n16/. Accessed April 17, 2013
15. Monks J, Ehrenberg RG. The impact of U.S. News & World Report college rankings on admissions outcomes and pricing policies at selective private institutions. CHERI working paper no. 1. Ithaca, NY: Cornell Higher Education Research Institute, Cornell University; 1999. http://digitalcommons.ilr.cornell.edu/cheri/1/. Accessed April 17, 2013
17. Sauder M, Lancaster R. Do rankings matter? The effects of U.S. News & World Report rankings on the admissions process of law schools. Law Soc Rev. 2006;40:105–134
18. Tsakalis K, Palais JC. Improving a school’s U.S. News and World Report ranking. J Eng Educ. 2004;93:259–263
19. Blau PM, Margulies RZ. The reputations of American professional schools. Change. 1974;6:42–47
23. American Bar Association, Section on Legal Education and Admissions to the Bar. Report of the Special Committee on the U.S. News and World Report Rankings. http://ms-jd.org/files/f.usnewsfinal-report.pdf. Accessed April 17, 2013
24. Mullan F, Chen C, Petterson S, Kolsky G, Spagnola M. The social mission of medical education: Ranking the schools. Ann Intern Med. 2010;152:804–811
25. Goldstein H, Spiegelhalter DJ. League tables and their limitations: Statistical issues in comparisons of institutional performance. J R Stat Soc A. 1996;159:385–443
26. Spiegelhalter D. Ranking institutions. J Thorac Cardiovasc Surg. 2003;125:1171–1173
27. Jerant A, Srinivasan M, Bertakis KD, Azari R, Pan RJ, Kravitz RL. Attributes affecting the medical school primary care experience. Acad Med. 2010;85:605–613
28. Connelly MT, Sullivan AM, Peters AS, et al. Variation in predictors of primary care career choice by year and stage of training. J Gen Intern Med. 2003;18:159–169
29. Hauer KE, Durning SJ, Kernan WN, et al. Factors associated with medical students’ career choices regarding internal medicine. JAMA. 2008;300:1154–1164
30. Jeffe DB, Whelan AJ, Andriole DA. Primary care specialty choices of United States medical graduates, 1997–2006. Acad Med. 2010;85:947–958
31. Lawson SR, Hoban JD, Mazmanian PE. Understanding primary care residency choices: A test of selected variables in the Bland-Meurer model. Acad Med. 2004;79(10 suppl):S36–S39
32. McGaha AL, Schmittling GT, DeVilbiss AD, Pugno PA. Entry of US medical school graduates into family medicine residencies: 2008–2009 and 3-year summary. Fam Med. 2009;41:555–566
33. McGaha AL, Schmittling GT, DeVilbiss Bieck AD, Crosley PW, Pugno PA. Entry of US medical school graduates into family medicine residencies: 2009–2010 and 3-year summary. Fam Med. 2010;42:540–551
35. The top schools—Primary care. U.S. News & World Report. May 2010:84
36. Littell RC, Milliken GA, Stroup WW, Wolfinger RD, Schabenberger O. SAS for Mixed Models. 2nd ed. Cary, NC: SAS Institute Inc.; 2006
37. Shrout PE, Fleiss JL. Intraclass correlations: Uses in assessing rater reliability. Psychol Bull. 1979;86:420–428
38. Nunnally JC, Bernstein IH. Psychometric Theory. 3rd ed. New York, NY: McGraw-Hill; 1994
39. Bland JM, Altman DG. Measuring agreement in method comparison studies. Stat Methods Med Res. 1999;8:135–160