In academic medicine, faculty members are assessed on their research productivity for the purposes of making hiring, promotion, grant, and award decisions.1,2 However, it has been shown that there is bias in the assessment of research productivity that leads to women and underrepresented minorities being less likely to receive promotions than their white male peers.3 Objective metrics for scholarly output are needed so that faculty members can be compared fairly across institutions. Several metrics, including total number of publications and citation counts, have been studied, but each is limited in that it may capture publication quality, publication quantity, or citation trajectory but not all three.
The h-index is a bibliometric index proposed by J.E. Hirsch in 2005 to assess academic productivity (i.e., quantity) across all fields of academia while also representing the quality of an author’s work.4 That is, an individual has index h if h of his or her total publications (Np) “have at least h citations each and the other (Np − h) papers have ≤ h citations each.”4 Additionally, the m-index is calculated by dividing the h-index by the number of years since the author’s first paper was published, thus providing the h-index trajectory. Several specialty-specific studies reveal that the h- and m-indices are positively correlated with academic rank (see Discussion). However, no published study has compared these metrics across all fields and specialties in medicine.
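As an illustration of Hirsch’s definition above, the h-index can be computed directly from a list of per-paper citation counts. This is a minimal sketch; the function name and example counts are illustrative, not taken from the study:

```python
def h_index(citations):
    """Return the largest h such that at least h papers have >= h citations each."""
    # Sort citation counts in descending order, then find the deepest rank
    # (1-based) at which the paper's citation count still meets or exceeds its rank.
    sorted_counts = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(sorted_counts, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h

# Example (hypothetical author): 5 papers with these citation counts yield h = 3,
# since 3 papers have >= 3 citations and the remaining 2 have <= 3 citations each.
print(h_index([10, 8, 5, 3, 1]))  # 3
```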
The purpose of the current study is to systematically review, synthesize, and analyze the available literature on publication productivity (including quantity, quality, and trajectory) by academic rank across medical specialties. We hypothesized that each academic medical specialty would have unique h- and m-indices among academic ranks. The results of this study have the potential to help academic institutions standardize their guidelines for hiring, promotion, grant, and award assessments.
This systematic review of the published literature was conducted and reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA; see Supplemental Digital Appendix 1 at http://links.lww.com/ACADMED/A800)5 guidelines and the reporting checklist for Meta-analysis Of Observational Studies in Epidemiology (MOOSE; see Supplemental Digital Appendix 2 at http://links.lww.com/ACADMED/A800).6
Data source and search
The medical literature, including observational studies, published in English from 2005 to 2018 was searched in PubMed, using the term “h-index,” on July 1, 2018. Using this broad term, we aimed to identify all active researchers (e.g., physicians or psychologists within an academic medical center) and other nonclinical researchers (e.g., scientists, epidemiologists, biostatisticians) who could be compared using this metric. However, there were limited published data on nonmedical fields, so all of the included studies were ultimately focused on academic medical subspecialties.
Study selection and inclusion and exclusion criteria
Initially, the titles of 786 studies were screened for eligibility; subsequently, the abstracts of the 440 studies that passed the initial screen were reviewed. E.O. and J.M. performed the initial review and discussed discrepancies with a third author (N.G.Z.). Review articles, studies that were not available online, and studies not evaluating h-indices were excluded; however, we hand searched the reference lists of these studies and related articles for other studies that met the inclusion criteria. Further, since all of the studies focused on academic medical subspecialties, to ensure the most up-to-date data for all physicians, studies had to include one or more of the 120 medical specialties from the Association of American Medical Colleges website.7 Ultimately, after full-text review of 440 articles, 36 articles, encompassing 21 different medical specialties, met all inclusion criteria. Relatively few studies pertaining to internal medicine subspecialties were published, and most of the studies pertained to surgical specialties.
Additional inclusion criteria for the literature search were defined using the Population, Intervention, Control, Outcome, and Study Design (PICOS) approach (see Supplemental Digital Appendix 3 at http://links.lww.com/ACADMED/A800).8 The population was composed of faculty in academic medicine with reported mean and/or median h-indices.
Studies must have reported h-indices and, if available, other publication metrics, including number of citations, number of publications, and m-indices, stratified by academic rank: instructor, assistant professor, associate professor, full professor, and department chair. The h-index was the primary outcome. The secondary outcome was the m-index, defined as (h-index)/(number of years since the author’s first published paper), which characterizes the rise in the h-index over time. The h-index can only rise over time. Since the h-index must be ≥ 0 and the time elapsed must be > 0 years, the m-index must be ≥ 0 when calculated. The m-index does not adjust for “natural” gaps in the workforce for certain subgroups of faculty; for example, faculty members may temporarily stop publishing after childbirth and during child care. For a study to report an m-index, it must by definition also report the h-index. Thus, in our search criteria, we focused on the h-index and only reported the m-index if it was also calculated by the authors.
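The m-index definition above is a simple ratio; a brief sketch makes the calculation explicit (the function name and input values are hypothetical, for illustration only):

```python
def m_index(h_index_value, years_since_first_paper):
    """m-index = h-index divided by years since the author's first publication."""
    if years_since_first_paper <= 0:
        raise ValueError("career duration must be a positive number of years")
    return h_index_value / years_since_first_paper

# Example (hypothetical author): h-index of 12, first paper published 10 years ago.
print(m_index(12, 10))  # 1.2
```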
Data extraction and quality assessment
The results of the PubMed search were exported into EndNote version X9 (Clarivate Analytics, Philadelphia, Pennsylvania). Data from the 36 remaining articles were initially independently extracted by 2 authors (E.O., J.M.) who were not involved in any of the studies being reviewed. Any discrepancies in data values were resolved by discussion with 2 other investigators (N.G.Z., E.J.L.). When multiple studies existed for the same specialty, the most recent study was chosen to avoid potentially double counting faculty members. This eliminated 15 of the 36 studies, for a total of 21 included studies in the systematic review (see below).
Of the remaining 21 studies, 19 reported mean h-indices and provided the data needed to calculate standard deviation (SD), standard error of the mean (SEM), and 95% confidence interval (CI). These 19 studies were included in the meta-analysis. The other 2 studies reported only median h-indices and interquartile ranges. The authors of these studies were contacted to supply means, SDs, SEMs, and 95% CIs; however, they did not provide these data. Therefore, these 2 studies were included in the systematic review but not in the meta-analysis (PRISMA; see Supplemental Digital Appendix 1 at http://links.lww.com/ACADMED/A800).
The data were analyzed using RStudio version 1.1.383 (RStudio, Inc., Boston, Massachusetts) and the Meta-Analysis Package for R (metafor) version 2.0-0 (publisher/distributor information not available) to conduct the meta-analyses and heterogeneity tests. The DerSimonian and Laird method was used to perform meta-analyses for the primary and secondary outcome measures.9,10 Heterogeneity was assessed using the I2 statistic.11 Significant heterogeneity was considered present if the I2 statistic was > 50%. Plots of the I2 data were not included as significant heterogeneity was expected. Forest plots were generated for the primary and secondary outcome measures by academic rank.
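The DerSimonian and Laird random effects procedure and the I2 statistic referenced above can be sketched in a few lines. This is a generic illustration of the method, not the study’s actual analysis (which used metafor in R); the function name and the input numbers are hypothetical:

```python
import math

def dersimonian_laird(means, sds, ns):
    """Random-effects pooled mean via the DerSimonian-Laird method,
    plus a 95% CI and the I^2 heterogeneity statistic (as a percentage)."""
    k = len(means)
    variances = [sd**2 / n for sd, n in zip(sds, ns)]   # variance of each study mean
    w = [1.0 / v for v in variances]                    # fixed-effect (inverse-variance) weights
    fixed_mean = sum(wi * m for wi, m in zip(w, means)) / sum(w)
    # Cochran's Q and the method-of-moments estimate of between-study variance tau^2
    q = sum(wi * (m - fixed_mean)**2 for wi, m in zip(w, means))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects weights incorporate tau^2
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * m for wi, m in zip(w_star, means)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    ci = (pooled - 1.96 * se, pooled + 1.96 * se)
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0
    return pooled, ci, i2

# Hypothetical example: 3 studies reporting mean h-indices with SDs and sample sizes.
pooled, ci, i2 = dersimonian_laird([5.0, 6.0, 7.0], [2.0, 2.0, 2.0], [100, 100, 100])
print(pooled, ci, i2)
```

An I2 value above 50%, as computed here, would be flagged as significant heterogeneity under the threshold stated above.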
The systematic review included a total of 17,117 academic physicians across 21 separate studies1,12–31 from the years 2009–2018, and the meta-analysis included a total of 14,567 academic physicians across 19 separate studies1,12–29 from the years 2009–2018. All of the studies were from the United States or Canada.
The meta-analysis included publication productivity metrics for general pediatrics and 4 pediatric subspecialties (445 faculty members)12–14; general surgery, 4 surgical subspecialties, and 3 surgery-heavy fields (i.e., otolaryngology, surgical oncology, and urology; 7,260 faculty members)15–22; anesthesiology and cardiothoracic anesthesiology (904 faculty members)23,24; and 6 other specialties: sports medicine (313 faculty members),25 radiation oncology (986 faculty members),26 dermatology (1,061 faculty members),27 psychiatry (1,601 faculty members),28 ophthalmology (1,459 faculty members),29 and radiology (538 faculty members).1 Overall, the meta-analysis included 8 (< 1%) instructors, 6,609 (45%) assistant professors, 3,508 (24%) associate professors, 3,626 (25%) full professors, and 816 (6%) department chairs.
Data on gynecological oncology30 and gastroenterology31 were also found, though these data were reported in medians and therefore not included in the meta-analysis.
All h-indices are dependent on the number of publications and number of citations, so for reference, the means of these data are included in Supplemental Digital Appendixes 4 and 5 (at http://links.lww.com/ACADMED/A800), respectively, when they were reported.
Publication productivity and academic rank
Table 1 lists the mean h-indices by both academic rank and subspecialty; it also lists the median h-indices at each academic rank in gynecological oncology and gastroenterology. Mean and median h-indices increased with successive academic rank for all specialties. Figure 1 presents mean h-indices as forest plots by academic rank and subspecialty. Under weighted random effects models, the summary mean h-indices were 5.22 (95% CI: 4.21–6.23, I2 = 99%, n = 6,609) for assistant professors, 11.22 (95% CI: 9.65–12.78, I2 = 97%, n = 3,508) for associate professors, 20.77 (95% CI: 17.94–23.60, I2 = 98%, n = 3,626) for full professors, and 22.08 (95% CI: 17.73–26.44, I2 = 96%, n = 816) for department chairs.
Only 4 studies1,20,26,31 reported m-indices. Table 2 lists the mean and median m-indices across academic ranks in gastroenterology, radiology, radiation oncology, and orthopedic surgery. Similar to h-indices, m-indices increased with successive academic rank for all specialties. Of these, 3 studies1,20,26 provided mean m-indices that could be included in the meta-analysis. The weighted random effects summary effect sizes for mean m-indices were 0.53 (95% CI: 0.40–0.65, I2 = 96%, n = 1,653) for assistant professors, 0.72 (95% CI: 0.58–0.85, I2 = 92%, n = 883) for associate professors, 0.99 (95% CI: 0.75–1.22, I2 = 96%, n = 854) for full professors, and 1.16 (95% CI: 0.81–1.51, I2 = 79%, n = 195) for department chairs. Figure 2 presents mean m-indices as forest plots by academic rank and subspecialty. Figure 3 presents summary statistics from Figures 1 and 2.
This is the first published meta-analysis that characterizes publication productivity in academic medical specialties by successive academic ranks. The results suggest that h-indices increase with academic rank. Though there were limited data available, a similar trend was seen for the m-index. There are unique distributions of these metrics among medical subspecialties. The h- and m-indices should be used in conjunction with other measures of academic success to evaluate faculty members for hiring, promotion, grant, and award decisions.
The h-index was first described in 2005 by J.E. Hirsch,4 and since its inception, various fields of medicine have started to assess publication productivity and update this metric over time. For example, cardiothoracic anesthesiology and anesthesiology published their h-indices in 201124 and 2013,23 respectively; these studies were some of the earliest included in the current analysis (see Table 1). In contrast, some surgical subspecialties or surgery-heavy fields have published their h-indices only recently, including orthopedic surgery in 201720 and surgical oncology in 2018.16 A faculty member’s h-index is expected to increase over time, and this trend has been shown in the analysis of radiation oncology faculty members, where the mean h-index increased from 8.5 in 2007 to 14.5 in 2017.26,32 Thus, although some specialties in the current analysis appear to have higher mean h-indices than others, this difference may be due to the time at which the studies were published. Since the studies included in the current analysis overrepresented certain specialties, academic rank based on h-indices can be more reliably concluded within those fields rather than applying the results across all medical specialties due to gaps in the existing literature.
One of the trends seen in the current analysis was that specialties with longer training periods (e.g., surgical oncology) typically had higher h-indices for each rank than those with shorter training periods (e.g., general pediatrics; see Table 1). We attribute this higher publication productivity to a persistent commitment to research that is seen throughout residency and fellowship training in specialties with longer training periods. That is, some medical specialties may have a greater focus on research, and this is exemplified by their training requirements. For example, surgical residencies generally last 7 years with 2 years of protected research time, and most radiation oncology residencies provide 6–12 months of guaranteed research time. This built-in research time is usually not a part of primary care residencies and likely contributes to their lower h-indices. This was mirrored in the current analysis as surgical oncology, neurosurgery, pediatric neurosurgery, and pediatric surgery, which all have longer training periods, all had consistently higher mean h-indices than most of the other specialties (see Figure 1 and Table 1).
In addition to variations between medical specialties in terms of dedicated research time during training years, there are significant differences in allotted research time and funding for graduated faculty members between and within institutions and their departments. These allotments are often based on available and awarded funding, which varies greatly based on institutional and departmental goals. In this way, the h-index metric may act in a cyclic manner; that is, higher h-indices can lead to greater funding, and then greater funding can result in higher h-indices.
Since the h-index will increase as the number of publications and citations increases, this bibliometric is dependent on the number of years someone has been publishing. In contrast, the m-index may be an invaluable way to identify highly productive junior faculty members because it adjusts the h-index for time since first publication. Although limited data were available for statistical analysis, Figure 2 presents the available data on m-indices across academic ranks from 3 specialties; similar to what we found for the h-index, there is a positive correlation between m-index and higher academic rank.
There are limitations to our study. First, we only searched one database—PubMed; however, we did this because, by definition, the included studies had to be published in peer-reviewed medical journals, which are all listed in PubMed. Second, we were unable to find data on publication productivity metrics for all fields of medicine and science. In the systematic review portion of our work, we found little or no information on general internal medicine, neurology, emergency medicine, obstetrics and gynecology, pulmonology, occupational medicine, and many surgical subspecialties or surgery-heavy fields, including cardiothoracic surgery, vascular surgery, and interventional radiology. Data from certain specialties were not available or could not be integrated into the current work due to the nature in which the data were reported. For example, an extensive discussion on the h-index in emergency medicine was published by DeLuca and colleagues in 2013 but could not be included in this meta-analysis as h-indices were not stratified by academic rank.33 When this issue (and similar ones) with other existing studies arose, we requested data sets from the authors; when provided, this information was included in the current analysis.
Third, the studies included here used different databases to obtain bibliometric data—some used Google Scholar, while others used Scopus. Scopus appears to have fewer duplicate entries and defects in its functionality; however, it may still list authors through multiple identifiers (e.g., maiden name, married name, nickname) and, therefore, skew search results.34 In contrast, Google Scholar may list conference abstracts, which falsely increases the number of papers. Thus, the h-indices reported here should be considered reliable approximations rather than exact values.
Fourth, faculty members’ contributions to research and scholarship are measured by a variety of other metrics, including the g-index and the Relative Citation Ratio (developed by the National Institutes of Health), which were not included in the current analysis. Further, scholarly activities are composed of more than just publications; for example, clinical acumen and teaching experience are also important metrics of academic success, but these are not directly evaluated by the h- or m-index. Additionally, the h-index may be inflated by self-citations and high coauthorship, where an author is listed on many papers but has minimal input on the actual work and is instead listed as a courtesy because of employment at an institution. Often, the first and last authors put significantly more work into the manuscript than the middle authors; however, since the order of authors in the byline does not influence the h-index, first and last authors of a publication will gain the same numeric value toward their h- and m-indices as middle authors.
Another inherent limitation of the h-index is a temporal bias, where the h-index increases over time and favors physicians that have been in practice for longer than others in their specialty (e.g., neurosurgery, joint reconstruction).19,35 Since h-indices are expected to increase over time, it is possible that some of the studies included here from an earlier time (e.g., 2011) would have higher h-indices if they were reevaluated today. This temporal bias may partly explain the correlation observed between academic rank and h-indices. The m-index can correct for this bias as it accounts for career duration. However, despite this benefit, the m-index has its own limitations. For example, though it accounts for time, it does not provide reliable adjustments for career gaps that occur for parental leave or other reasons.
In terms of the statistical analysis, we did observe significant heterogeneity (I2 values) across academic ranks for both h- and m-indices. This is expected as this type of analysis closely mimics the modern climate of academic medicine, wherein scholarly activity varies widely across specialties. Future work will be necessary to include more medical specialties, including internal medicine specialties, as these data become available.
Finally, the h-index is not adjusted for publication in high- versus low-impact journals because it depends only on the number of citations each publication receives. Thus, a faculty member with a highly cited article in a mid-tier specialty journal would have a higher h-index than a faculty member with a sparsely cited commentary in a high-impact journal. Future work looking to broaden this research should include journal impact factor as an additional variable to further evaluate the merit of applying the h-index across specialties.
We find that most of these limitations detract from the value of the h- and m-indices; as such, we feel that these indices should be used in conjunction with other measures of academic success to evaluate individual faculty members for hiring, promotion, grant, and award decisions. The h- and m-indices continue to serve as valuable metrics for assessing research quantity and quality, but they should not be used as the sole determinants of a faculty member’s success.
This is the first published meta-analysis that characterizes publication productivity in academic medical specialties by successive academic ranks. H- and m-indices increase with successive academic rank across multiple specialties. Further, our findings highlight the unique distributions of these metrics among academic medical specialties. Despite the aforementioned limitations of these metrics, this study has the potential to assist academic institutions in standardizing their guidelines for hiring, promotion, grant, and award assessments based on a combination of clinical acumen, teaching experience, and publication productivity metrics.
1. Jiang A, Ginocchio LA, Rosenkrantz AB. Associations between academic rank and advanced bibliometric indices among United States academic radiologists. Acad Radiol. 2016;23:1568–1572.
2. Venable GT, Khan NR, Taylor DR, Thompson CJ, Michael LM, Klimo P Jr. A correlation between National Institutes of Health funding and bibliometrics in neurosurgery. World Neurosurg. 2014;81:468–472.
3. Fang D, Moy E, Colburn L, Hurley J. Racial and ethnic disparities in faculty promotion in academic medicine. JAMA. 2000;284:1085–1092.
4. Hirsch JE. An index to quantify an individual’s scientific research output. Proc Natl Acad Sci U S A. 2005;102:16569–16572.
5. Moher D, Liberati A, Tetzlaff J, Altman DG; PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. J Clin Epidemiol. 2009;62:1006–1012.
6. Stroup DF, Berlin JA, Morton SC, et al. Meta-analysis of observational studies in epidemiology: A proposal for reporting. Meta-analysis Of Observational Studies in Epidemiology (MOOSE) group. JAMA. 2000;283:2008–2012.
7. Association of American Medical Colleges. Careers in Medicine: Specialties. https://www.aamc.org/cim/specialty/exploreoptions/list. Accessed January 2, 2019.
8. Eldawlatly A, Alshehri H, Alqahtani A, Ahmad A, Al-Dammas F, Marzouk A. Appearance of Population, Intervention, Comparison, and Outcome as research question in the title of articles of three different anesthesia journals: A pilot study. Saudi J Anaesth. 2018;12:283–286.
9. DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials. 1986;7:177–188.
10. Rücker G, Schwarzer G, Carpenter J, Olkin I. Why add anything to nothing? The arcsine difference as a measure of treatment effect in meta-analysis with zero cells. Stat Med. 2009;28:721–738.
11. Higgins JP, Thompson SG. Quantifying heterogeneity in a meta-analysis. Stat Med. 2002;21:1539–1558.
12. Tschudy MM, Rowe TL, Dover GJ, Cheng TL. Pediatric academic productivity: Pediatric benchmarks for the h- and g-indices. J Pediatr. 2016;169:272–276.
13. Kalra RR, Kestle JR. An assessment of academic productivity in pediatric neurosurgery. J Neurosurg Pediatr. 2013;12:262–265.
14. Watson C, King A, Mitra S, et al. What does it take to be a successful pediatric surgeon-scientist? J Pediatr Surg. 2015;50:1049–1052.
15. Eloy JA, Svider P, Chandrasekhar SS, et al. Gender disparities in scholarly productivity within academic otolaryngology departments. Otolaryngol Head Neck Surg. 2013;148:215–222.
16. Nguyen V, Marmor RA, Ramamoorthy SL, Blair SL, Clary BM, Sicklick JK. Academic surgical oncologists’ productivity correlates with gender, grant funding, and institutional NCI comprehensive cancer center affiliation. Ann Surg Oncol. 2018;25:1852–1859.
17. Mueller C, Wright R, Girod S. The publication gender gap in US academic surgery. BMC Surg. 2017;17:16.
18. Lopez J, Susarla SM, Swanson EW, Calotta N, Lifchez SD. The association of the h-index and academic rank among full-time academic hand surgeons affiliated with fellowship programs. J Hand Surg Am. 2015;40:1434–1441.
19. Lee J, Kraus KL, Couldwell WT. Use of the h index in neurosurgery. Clinical article. J Neurosurg. 2009;111:387–392.
20. Bastian S, Ippolito JA, Lopez SA, Eloy JA, Beebe KS. The use of the h-index in academic orthopaedic surgery. J Bone Joint Surg Am. 2017;99:e14.
21. Therattil PJ, Hoppe IC, Granick MS, Lee ES. Application of the h-index in academic plastic surgery. Ann Plast Surg. 2016;76:545–549.
22. Kasabwala K, Morton CM, Svider PF, Nahass TA, Eloy JA, Jackson-Rosario I. Factors influencing scholarly impact: Does urology fellowship training affect research output? J Surg Educ. 2014;71:345–352.
23. Pashkova AA, Svider PF, Chang CY, Diaz L, Eloy JA, Eloy JD. Gender disparity among US anaesthesiologists: Are women underrepresented in academic ranks and scholarly productivity? Acta Anaesthesiol Scand. 2013;57:1058–1064.
24. Pagel PS, Hudetz JA. Scholarly productivity of United States academic cardiothoracic anesthesiologists: Influence of fellowship accreditation and transesophageal echocardiographic credentials on h-index and other citation bibliometrics. J Cardiothorac Vasc Anesth. 2011;25:761–765.
25. Cvetanovich GL, Saltzman BM, Chalmers PN, Frank RM, Cole BJ, Bach BR Jr. Research productivity of sports medicine fellowship faculty. Orthop J Sports Med. 2016;4:2325967116679393.
26. Zhang C, Murata S, Murata M, et al. Factors associated with increased academic productivity among US academic radiation oncology faculty. Pract Radiat Oncol. 2017;7:e59–e64.
27. John AM, Gupta AB, John ES, Lopez SA, Lambert WC. A gender-based comparison of promotion and research productivity in academic dermatology. Dermatol Online J. 2016;22:2.
28. MacMaster FP, Swansburg R, Rittenbach K. Academic productivity in psychiatry: Benchmarks for the h-index. Acad Psychiatry. 2017;41:452–454.
29. Lopez SA, Svider PF, Misra P, Bhagat N, Langer PD, Eloy JA. Gender differences in promotion and scholarly impact: An analysis of 1460 academic ophthalmologists. J Surg Educ. 2014;71:851–859.
30. Hill EK, Blake RA, Emerson JB, et al. Gender differences in scholarly productivity within academic gynecologic oncology departments. Obstet Gynecol. 2015;126:1279–1284.
31. Diamond SJ, Thomas CR Jr, Desai S, et al. Gender differences in publication productivity, academic rank, and career duration among U.S. academic gastroenterology faculty. Acad Med. 2016;91:1158–1163.
32. Quigley MR, Holliday EB, Fuller CD, Choi M, Thomas CR Jr. Distribution of the h-index in radiation oncology conforms to a variation of power law: Implications for assessing academic productivity. J Cancer Educ. 2012;27:463–466.
33. DeLuca LA Jr, St John A, Stolz U, Matheson L, Simpson A, Denninghoff KR. The distribution of the h-index among academic emergency physicians in the United States. Acad Emerg Med. 2013;20:997–1003.
34. El Emam K, Arbuckle L, Jonker E, Anderson K. Two h-index benchmarks for evaluating the publication performance of medical informatics researchers. J Med Internet Res. 2012;14:e144.
35. Khan AZ, Kelley BV, Patel AD, McAllister DR, Leong NL. Academic productivity among fellowship associated adult total joint reconstruction surgeons. Arthroplast Today. 2017;3:298–302.