The thought process was easy—good school, good doctor; bad school, bad doctor…. Shaped by magazine lists, friends’ and strangers’ confirmations and professional hearsay, the notion that a medical school’s quality can be ranked and then passed on directly to their graduates has become an integral part of American culture.
—Pauline Chen, The New York Times 1
Since the Flexner Report was issued in 1910, the internal evaluation of medical school performance has focused on meeting the accreditation standards set by the Liaison Committee on Medical Education (LCME), assessing curricular content and structure, and measuring students’ performance on the National Board of Medical Examiners (NBME) licensing examinations. Given the nature of these evaluation criteria (i.e., pass/fail for LCME accreditation and NBME examinations), a publicly available comparative analysis of medical schools and their ability to train future physicians is particularly challenging.
Despite these challenges, medical school rankings have been an integral part of our educational enterprise for many years. In their 1977 study of medical school faculty, Cole and Lipton2 reported on both the objective and subjective nature of rankings, noting that “research and publication, eminence of faculty, training and research grants available, size of full-time faculty, and perceived effectiveness of training” all correlated with perceived quality (i.e., reputation). They found that although reputation is partially linked to institutional performance, “there is some evidence of a ceiling effect (Harvard) and a halo effect for schools affiliated with universities having national reputations.”2 Rankings have broad implications for institutions’ educational programs as well as for their graduates. Influential rankings that do not include students’ academic performance have also been developed; these evaluate schools by their efforts to serve underserved areas, produce primary care physicians, and promote the training of racial and ethnic minority physicians.3 Given that such rankings and their influences are often more accessible and applicable to the general public than are rankings based on academic performance alone, we must pay close attention to their content, accuracy, and validity.
This situation is akin to that of Major League Baseball, in which the traditional, subjective evaluation of players was supplanted by the data-driven system of sabermetrics, which now provides the objective data used to rank teams and individual players.4 In baseball, the traditional system was perceived to be correct because it was intuitively plausible: a player’s apparent skills determined his worth. Using statistical data, however, has proven to be a better method of determining why teams win and lose (i.e., what components of a team’s offensive and defensive strategies need modification). Correcting our medical school rankings system may similarly upset the existing academic hierarchy and challenge our current thinking about optimal methods for evaluating medical school performance.
Current Models to Evaluate Medical School Performance
The most recognized modern comparative analysis of medical schools is performed and published by U.S. News & World Report (USN&WR).5 However, USN&WR relies heavily on subjective and premedical student performance measures, including a subjective peer assessment score (i.e., a numerical rating by the deans of other medical schools), Medical College Admission Test (MCAT) scores, undergraduate grade point average, and school acceptance rates.5 Attempts to reconstruct the USN&WR rankings for primary care medical schools raised questions about accuracy and validity when researchers found that the short-term variability for schools ranked below the top 20 was greater than could be expected from changes in educational quality.6 Although the USN&WR evaluation method has undergone numerous changes, it remains subjective and limited. Importantly, USN&WR’s objective criteria evaluate the quality of matriculating students rather than assessing the value added by undergraduate medical education. In doing so, the rankings present an inaccurate and misleading assessment by focusing on criteria irrelevant to the aspiring medical student.
To our knowledge, no studies have challenged the measurement strategy for evaluating research-intensive medical schools. Yet, there is a critical need for better evaluation metrics in this area,7,8 especially as the amount of funding for academic research has significantly decreased. The Scientific Management Review Board of the National Institutes of Health (NIH) recently convened a working group, which, in its report, called for new approaches to document the value of the biomedical research that the agency conducts and funds.8
A New Model
As highlighted by the Institute of Medicine report To Err Is Human: Building a Safer Health System,9 patient outcome and quality measurements have become central to our health care system in the past decade. Comparative quality data on readmissions, surgical complications, and patient care from over 4,000 Medicare-certified hospitals are now publicly available.10 Unfortunately, extant assessments have not provided parallel objective measures of medical school education in the United States.
A new approach is needed. Previous critiques of the USN&WR rankings have called for an improved methodology for determining medical school, graduate school, and hospital rankings,3,11–14 including, for medical schools, the more systematic measurement of educational processes and impact as well as the gathering of metrics that reflect a school’s contribution to the social mission of medical training. Although arguments for particular measurement criteria are important, these specifics should be decided by all medical education stakeholders (e.g., deans, faculty, students, patients).
Here, we propose a new model to evaluate medical school performance based on two fundamental principles: (1) relevant and accessible objective criteria should replace the subjective, qualitative criteria (e.g., peer assessment score)5 that dominate the current rankings system, and (2) metrics should be based on outcomes that reflect the general mission, vision, and values of the nation’s medical education enterprise.
As a demonstration of this approach, we constructed a simplified model rankings system (see Chart 1) to evaluate medical schools’ production of academic physicians who advance medicine through basic, clinical, translational, and implementation science research. We believe our model is comparable to the USN&WR Best Medical Schools: Research rankings. We acknowledge that our rankings system is not comprehensive, as approximately 85% of U.S. medical school graduates do not join the professoriate. However, we aim to show the feasibility of a novel model for evaluating medical school performance using existing, publicly available data.
Databases and physician matching
We collected the data to input into our model from Doximity, Inc.’s comprehensive physician database (n = 1,144,599), which includes every U.S. physician as identified by a National Provider Identifier (NPI) number. We also collected data from the U.S. Department of Health and Human Services’ NPI Registry, the American Medical Association’s Physician Masterfile, state medical boards, specialty boards (e.g., American Board of Medical Specialties, American Board of Surgery), the NIH’s Research Portfolio Online Reporting Tools (RePORTER), the National Library of Medicine’s MEDLINE bibliographic database, and award databases (e.g., Howard Hughes Medical Institute, American Society for Clinical Investigation). In addition to the basic physician data available in Doximity’s proprietary database, more than 300,000 active (registered and verified) members review and update their profiles to provide additional primary data.
We excluded from our analysis foreign medical graduates (n = 284,562), graduates from medical schools with fewer than 500 graduates, and physicians who graduated before 1950 or after 2009 (n = 103,922). In our model, we included U.S. medical school graduates from 1950 to 2009 (n = 756,115), regardless of their membership status with Doximity. One hundred twenty-seven medical schools were represented. However, any medical school with fewer than 50 graduates per decade was excluded from our analysis by decade, leaving a total of 696,566 physicians. The number of medical schools included by decade was as follows: 1950–1959: 79; 1960–1969: 84; 1970–1979: 112; 1980–1989: 127; 1990–1999: 126; and 2000–2009: 126.
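The exclusion rules above can be sketched as a simple filter. This is an illustrative reconstruction only, not the authors’ actual pipeline; the record fields (school, grad_year, is_foreign_grad) are hypothetical stand-ins for the Doximity data.

```python
from collections import Counter

def filter_cohort(physicians):
    """Apply the stated exclusions: foreign graduates, graduation
    years outside 1950-2009, and any school-decade cell with fewer
    than 50 graduates."""
    # Keep U.S. graduates from 1950 through 2009.
    kept = [p for p in physicians
            if not p["is_foreign_grad"] and 1950 <= p["grad_year"] <= 2009]
    # Count graduates per (school, decade) cell.
    counts = Counter((p["school"], p["grad_year"] // 10) for p in kept)
    # Drop physicians in any school-decade cell with < 50 graduates.
    return [p for p in kept
            if counts[(p["school"], p["grad_year"] // 10)] >= 50]
```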
In our model’s scoring system, each physician was given a score that incorporated data from four primary categories: publications, grants, clinical trials, and awards/honors. We obtained these data from MEDLINE (publication record), the NIH’s RePORTER (grant record), ClinicalTrials.gov (clinical trial record), and 37 official award rosters (honors and awards).
We matched publications, NIH grants, and clinical trials data to physicians according to a proprietary Doximity, Inc. algorithm based on a previously published coauthor clustering method,15 which uses multiple variables (e.g., name, affiliations, key words) to analyze historical and demographic data about each author. We predicted that matching publications to physicians would be the most challenging, as an author’s name can vary because of the presence or absence of a middle name or middle initial and because of the lack of indexing of multiple institutional affiliations for each author. To assess accuracy, we performed spot checks, comparing the curricula vitae of Doximity users with the matches obtained from our model; we measured a publication-matching accuracy of 90%. As the quality and quantity of publicly available data improve, we expect an even higher match rate.
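The actual matching algorithm is proprietary, but the middle-initial problem described above can be illustrated with a toy name-variant generator; everything here, including the function names, is a hypothetical sketch rather than the published method.

```python
def name_keys(last, first, middle=None):
    """Return plausible MEDLINE-style author-name variants for one
    physician. A middle initial may or may not appear in the indexed
    author string, which is one reason matching is ambiguous."""
    keys = {f"{last} {first[0]}"}  # e.g., "Smith J"
    if middle:
        keys.add(f"{last} {first[0]}{middle[0]}")  # e.g., "Smith JA"
    return keys

def could_match(author_string, last, first, middle=None):
    """A publication author string is a candidate match if it is one
    of the physician's plausible name variants; a real system would
    then disambiguate using affiliations, key words, and coauthors."""
    return author_string in name_keys(last, first, middle)
```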
We assigned physicians one, two, or three points for each journal article published. Using the Eigenfactor ranking system (www.eigenfactor.org), we scored articles published in the top 1,000 journals: articles in journals with an Eigenfactor score x ≥ 0.2 were assigned three points; those with 0.2 > x ≥ 0.1, two points; and those with x < 0.1, one point. Compared with the Thomson Reuters impact factor, the Eigenfactor score more fairly assesses impact, giving highly cited journals more influence than lesser-cited journals, and it corrects for self-citation.
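The publication tiers described above reduce to a small lookup, assuming the journal’s Eigenfactor score is already known:

```python
def publication_points(eigenfactor):
    """Points per article by journal Eigenfactor score, per the
    stated tiers for the top 1,000 journals:
    x >= 0.2 -> 3 points; 0.2 > x >= 0.1 -> 2 points; x < 0.1 -> 1 point."""
    if eigenfactor >= 0.2:
        return 3
    if eigenfactor >= 0.1:
        return 2
    return 1
```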
We also assigned physicians one point for each NIH grant received, with the exception of Research Project Grants (R01), which received 10 points. Despite multiple NIH grant series, we did not employ a more detailed point system in our model.
Next, we assigned physicians one point for each clinical trial for which they were named as the principal investigator as reported by ClinicalTrials.gov.
Finally, we assigned physicians between 1 and 10 points for each award or honor they received, which we determined on the basis of the award’s prestige and exclusivity (e.g., Howard Hughes Medical Institute Investigators, Institute of Medicine Members) (see Table 1).
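Taken together, the grant, clinical trial, and award rules above amount to simple point tallies. The sketch below is illustrative; the prestige table passed to award_points stands in for the actual 1-to-10 values in Table 1.

```python
def grant_points(grants):
    """10 points per R01; 1 point per other NIH grant, as stated above.
    `grants` is a list of NIH activity codes (e.g., "R01", "K08")."""
    return sum(10 if g == "R01" else 1 for g in grants)

def trial_points(n_trials_as_pi):
    """1 point per ClinicalTrials.gov trial with the physician as PI."""
    return n_trials_as_pi

def award_points(awards, prestige_table):
    """1-10 points per award or honor, looked up in a prestige table
    (illustrative stand-in for Table 1)."""
    return sum(prestige_table[a] for a in awards)
```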
In Table 2, we report the descriptive statistics for these four categorical scores for all U.S. medical school graduates. We capped each categorical score at the 99.9th percentile to limit the influence of outliers on the results and to ensure that schools that consistently produce successful academic physicians are ranked higher than schools that produce only a small number of high-performing individuals.
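The percentile cap can be sketched in pure Python using the nearest-rank method; the article does not specify which percentile definition was used, so that choice is an assumption here.

```python
import math

def cap_at_percentile(scores, pct=99.9):
    """Clip each score at the given percentile (nearest-rank method)
    to limit the influence of extreme outliers."""
    ordered = sorted(scores)
    # Nearest-rank percentile: the smallest value with at least
    # pct% of observations at or below it.
    rank = max(0, math.ceil(pct / 100 * len(ordered)) - 1)
    ceiling = ordered[rank]
    return [min(s, ceiling) for s in scores]
```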
We calculated the average score of graduates from the same medical school for each category. We then normalized each school’s average score by the maximum average score obtained in that category, so that the maximum normalized score in each category was 1.0 and the four scores could be combined into a single, composite score. Next, we averaged each school’s normalized categorical scores to form a raw composite score, which we subsequently normalized to the highest composite score.
These calculations can be expressed as:

R_medschool = (1/4) Σ_i (C_i / C_i,max)

P_composite = R_medschool / R_max

In these equations, R_medschool is the composite raw score of each individual medical school; C_i is the school’s average score in category i (with i = awards, publications, grants, or clinical trials); C_i,max is the maximum average score obtained by any medical school in category i; P_composite is the school’s normalized composite score; and R_max is the maximum composite raw score among all schools.
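The normalization and composite steps can be sketched as follows, assuming the average category score for each school has already been computed:

```python
CATEGORIES = ("awards", "publications", "grants", "clinical_trials")

def composite_scores(avg):
    """Given {school: {category: average score}}, normalize each
    category by its best-performing school, average the four
    normalized scores into a raw composite, then normalize by the
    best raw composite so the top school scores 1.0."""
    # C_i,max: the best average score in each category.
    c_max = {c: max(s[c] for s in avg.values()) for c in CATEGORIES}
    # R_medschool: mean of the four normalized category scores.
    raw = {school: sum(s[c] / c_max[c] for c in CATEGORIES) / len(CATEGORIES)
           for school, s in avg.items()}
    # P_composite: normalize by the maximum raw composite, R_max.
    r_max = max(raw.values())
    return {school: r / r_max for school, r in raw.items()}
```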
Although many top institutions ranked highly in both our model (see Table 3) and the 2014 USN&WR rankings,5 important differences exist. For example, graduates from the Albert Einstein College of Medicine of Yeshiva University excelled at obtaining awards and NIH grants, which resulted in a rank of 13 in our analysis as compared with a rank of 34 in USN&WR. The University of California, San Francisco, School of Medicine (UCSF) was ranked fourth by USN&WR, in part because the faculty, not the graduates, excelled in securing NIH grants. Our evaluation of UCSF graduates, however, placed the school at 17 because its graduates achieved fewer and lower-impact publications and grants. This finding highlights the important point that the measurement of faculty grants may not reflect the quality of education provided by a given school.
In a secondary analysis, we subdivided physicians by graduation decade to assess institutional performance trajectory over time. Although most institutions maintained a consistent composite score over time, we found some notable exceptions (see Figure 1). Johns Hopkins University School of Medicine consistently placed second, but outcomes from the most recent decade (2000–2009) were very close to those of Harvard Medical School, the benchmark for normalization (ranked first). Like Johns Hopkins, Baylor College of Medicine and Stanford University School of Medicine saw better outcomes over time: Ranked 47th and 34th, respectively, in the 1950s, these institutions improved to 15th and 3rd, respectively, in the 2000s. UCSF, in contrast, experienced a sharp decline in the 1960s with a slow recovery over the following decades. Such an analysis permits us to identify unique performance trends, which can generate hypotheses regarding how major institutional changes may affect the production of successful academic physicians.
Our model is intended to demonstrate the feasibility of an outcomes-based approach to evaluating medical schools’ ability to produce academic physicians who go on to successful biomedical research careers. Institutions without this priority, such as those that focus on creating primary care physicians, would likely be disadvantaged by this model, just as they are with the USN&WR Best Medical Schools: Research rankings. We have intentionally omitted clinical performance in our model, but we do not intend to imply a value judgment that research has greater value than education or clinical care. Unfortunately, no clear metrics of an individual physician’s clinical quality or productivity are publicly available.
Although we purposefully did not include a subjective factor (e.g., peer assessment score) in this objective model, we do recognize that such a construct may incorporate certain intangible factors and may represent data that are influential to prospective medical students. Performing another analysis of our data with the addition of the USN&WR peer assessment score was not appropriate because we did not have access to the raw score data, including the score distribution and variability.
Our model has a number of limitations. First, we developed the scoring system ourselves; it is not based on a previously validated instrument. Second, we elected to include only the number of NIH grants rather than their dollar amounts for several reasons: The individual details of a specific grant can change rapidly in this dataset when no-cost extensions are granted and/or when more funding is granted via a noncompeting award; the award size is not necessarily an equitable metric, as expenses vary widely with the nature of the research (i.e., clinical trials > animal studies > pure in vitro/biochemical studies); and collaborative grants (e.g., program project, P01) can present unique challenges for institutional analyses like ours when multiple principal investigators are at different institutions.
Our model is also limited by a lack of grant data from non-NIH sources (e.g., Agency for Healthcare Research and Quality, Food and Drug Administration, Centers for Disease Control and Prevention, Department of Defense), a lack of publication data from journals not indexed by PubMed, and the use of only four characteristics of traditionally successful academic physicians. We recognize that the quality of students selected for matriculation at a particular institution may influence that institution’s ability to foster the academic and professional development of research-focused physicians, which may confound the rankings in our model. Additional confounders (e.g., graduate medical education) may also influence a physician’s success. In fact, combining this model with other objective outcome measures, including those related to graduate medical education,16 may provide unique insights into specific undergraduate and graduate medical education combinations that significantly influence the production of research-focused physicians.
Despite these caveats, our model demonstrates that the existing assessment paradigm can shift from an opinion- and input-based system to an evidence-based, outcomes-oriented science. As systems designed to longitudinally track individuals in the physician pipeline (e.g., Pivio from the Association of American Medical Colleges) come online, more objective data will be available for use in such models. We hope that our analysis of the academic physician population will prompt further work to assess the outcomes of all medical school graduates who practice, teach, and lead health care innovations, policy, and reform.
The public relies on our medical institutions to train the next generation of physicians, scientists, and medical leaders. As a major funder of undergraduate and graduate medical education through tax revenue, the public has the right to a more transparent and accurate evaluation system to assess these institutions. The backbone of our academic medical system is physicians’ contribution to knowledge creation through research. Appropriately, many students are interested in pursuing research-intensive careers. Ensuring that these students have the skills and experiences they need for successful research careers should be a priority, and we believe that our rankings model can help identify institutions with a proven track record of success in producing such physicians. Additionally, institutions interested in improving their ability to turn out research-focused physicians may use these rankings to identify areas where additional resources, course work, or infrastructure may help foster the success of their graduates.
For a field guided by the principles of evidence-based practice, we know remarkably little about what educational processes produce the best physicians in the domains of high-value patient care and medical research. We are only as good as our data. Thus, we must explore alternative approaches to analyze and understand the performance of medical schools. Such efforts will lead to more rigorous and equitable evaluations, increased transparency of assessment standards, and improved educational quality. By fostering a national discussion about the most meaningful criteria that we should be measuring and reporting, we hope to improve the quality of our institutions, medical education, and the care of our patients.
Acknowledgments: The authors thank David M. Irby, PhD (University of California, San Francisco, School of Medicine), and Graham T. McMahon, MD, MMSc (Harvard Medical School and Brigham and Women’s Hospital), for their critical reviews of this article.