Iyengar, Rama MBA; Wang, Yanping PhD; Chow, Jennifer; Charney, Dennis S. MD
The Mount Sinai School of Medicine (MSSM) has embarked on a new strategic plan for the next decade. As the strengths and weaknesses of current operational practices were evaluated, it became clear that as an institution, faculty and administration needed to agree on the definitions for success and productivity. This was particularly important in research, where often the administration assessed success largely by the level of extramural peer-reviewed grant support, whereas faculty often focused on peer-reviewed modalities such as letters of recommendation from senior researchers in a field. Evaluating faculty and programs through peer reviews and recommendations is valuable and continues to carry significant weight in decisions regarding resource allocation. However, use of more objective measures of the significance and impact of research programs, ones that can be computed both by the administration and the individual researcher or department, introduces transparency into the decision-making process and can validate decisions that are both positive and negative.1
We conducted a literature review of the methods used by institutions in the United States and Europe to evaluate research faculty and to allocate resources.2–4 We were particularly interested in the guidelines for the allocation of laboratory space. We also conducted a brief survey of most of the member institutions of the Association of American Medical Colleges (AAMC) to determine how they evaluated the productivity of their research faculty. The survey posed the following questions: (1) What criteria does your institution use to allocate research space? (2) Is “research density” a criterion? Can you share benchmarks used by your institution for research density? (3) What other measures of performance do you use? (4) Finally, what is your experience using impact factor for publications? We posted the survey on the AAMC listserv on February 6, 2007, and the survey remained open for two weeks (only one response was received after February 20). At the time of our study, there were 100 institutions on the AAMC listserv.
We received a total of 30 responses, a response rate of 30%. The survey period of two weeks is in line with similar AAMC listserv survey periods, but the response rate was almost three times the normal response rate for such surveys, indicating significant interest in this subject. Out of the 30 institutions that responded to the survey, 28 (93%) indicated that their institutions used extramural funding as the primary evaluation criterion in allocating space. Most indicated that their experience in the use of an impact factor was limited and that they did not believe that it was an appropriate measure or added value to faculty evaluations because of the inability to compare across disciplines. Although with this low response rate we cannot assume that the survey responses reflect the situation at all AAMC medical schools, the published literature5–9 supports the anecdotal evidence that U.S. institutions predominantly use research grant levels as the primary benchmark, whereas European institutions use publications data, specifically, the impact factor of the journals in which their faculty members publish. Allocation of funds and recruitment and tenure decisions in several European countries10 are directly linked to the impact factor of journals in which the institution’s researchers publish; there continues to be a vigorous debate about this in the scientific literature.11–23
The Need for an Integrated Approach
Because multi- and interdisciplinary translational research are the overall goals of medical school research enterprises, there is a clear need for metrics that can be adopted across different disciplines and areas of research and are broadly accepted by faculty members as a fair measure of performance on which resource allocation decisions are based. Applicability of these metrics across disciplines is particularly essential because decisions by medical school administrators often involve evaluating and balancing the relative merits of many valuable fields of research. An integrated approach that measures multiple aspects of performance might be considered fairer than a measure of a single aspect of performance. This approach would only be valid if the different metrics were independent. The central hypothesis for the analysis we report below is that research density (extramural grant dollars/square foot of research space) and the impact of a faculty member’s research (as measured by impact factors of journals in which they publish) would be independent measures of their performance. Hence, we decided to examine whether the research density of a faculty member correlated with the impact of that individual’s publications. We also tried to develop an approach whereby the combination of metrics would provide a basis for classification of each faculty member for individualized faculty mentoring and development, when needed. The approach, described in this article, was initially tried using data from 2006 and 2007. It was found to be a success and its use continues at MSSM, as both chairs and faculty have found it to be useful. The analyses reported below used data gathered for routine management purposes. We did not attempt to design the analyses as a formal study with controls, because this is not feasible within one institution.
Research Space Allocation
The allocation of resources such as laboratory space and discretionary funds within the research enterprise is always challenging, because many good and worthwhile ideas and projects compete for them. Research space is a particularly scarce resource because, even under the best of circumstances, it takes several years to develop and bring into operation incremental space. The allocation and reallocation of research space becomes an important ongoing facet of medical school operations that has direct near-term consequences for strategic growth. It has become increasingly clear that there is a need for objective and transparent criteria for the allocation of this institutional resource.2
Like other institutions in the United States, MSSM has historically used funding level as the metric to measure research productivity and to allocate space. The ratio of extramural support to research space, or research density (RD), expressed in dollars/square foot, is computed both at the level of the individual investigator and as an aggregate for the department. At MSSM, both direct and indirect funds are included, and the research space comprises individual investigators’ research space and an allocation of all shared research and administrative spaces. We expect all researchers to achieve a minimum RD target that is set by the Dean’s Office. Current targets at MSSM are $500/square foot for lab-based (wet) research and $1,000/square foot for computational/clinical (dry) research. We conduct an annual space survey and calculate the RD of each principal investigator as part of our space management process.
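As an illustration only, the RD calculation described above can be sketched in Python. The function and parameter names are hypothetical, the investigator's figures are invented for the example, and expressing RD as a percentage of target is an assumption suggested by the percentages reported later in this article; only the two dollar targets come from the text.

```python
# Targets quoted in the text, in dollars per square foot.
RD_TARGETS = {"wet": 500.0, "dry": 1000.0}


def research_density(extramural_dollars, own_sqft, shared_sqft=0.0):
    """Return research density (RD) in dollars per square foot.

    extramural_dollars: direct plus indirect funds for the period.
    own_sqft: the investigator's own research space.
    shared_sqft: allocated share of shared research/administrative space.
    """
    total_sqft = own_sqft + shared_sqft
    if total_sqft <= 0:
        raise ValueError("research space must be positive")
    return extramural_dollars / total_sqft


def rd_percent_of_target(rd, research_type):
    """Express RD as a percentage of the 'wet' or 'dry' target (assumed form)."""
    return 100.0 * rd / RD_TARGETS[research_type]


# Hypothetical wet-lab investigator: $660,000 of support, 1,000 sq ft of
# laboratory space plus 200 sq ft of allocated shared space.
rd = research_density(660_000, 1_000, 200)
print(rd, rd_percent_of_target(rd, "wet"))
```

The same function can be applied to departmental totals to obtain the aggregate RD for a department.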
Impact Factor for Individual Scientists
The impact factor of journals is a widely used and familiar measure of quality of primary research.24 Although newer measures for journal impact such as the Eigenfactor25 or for longitudinal individual impact such as the Hirsch index26,27 have been introduced, the Thomson-Reuters (TR) two-year journal impact factor remains the commonly understood and used measure of research “quality.” As defined by TR, the “impact factor of a journal is calculated by dividing the number of current year citations to the source items published in that journal during the previous two years” (http://thomsonreuters.com/products_services/science/free/essays/impact_factor).
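For reference, the standard two-year journal impact factor can be written out in full as follows. The symbols are our own notation for this illustration and do not come from TR: C_y(t) is the number of citations received in year y by items the journal published in year t, and N_t is the number of citable items the journal published in year t.

```latex
\[
\mathrm{IF}_{y} \;=\; \frac{C_{y}(y-1) + C_{y}(y-2)}{N_{y-1} + N_{y-2}}
\]
```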
However, there are clear limits to its applicability. Because biomedical research consists of many fields of disparate sizes, the impact factors of their journals vary greatly. Thus, whereas the impact factors of the most general and prestigious journals (Cell, Nature, New England Journal of Medicine, and Science) range from 20 to 40, the averages for the leading journals in specific fields or subspecialties are considerably lower and vary several-fold. When comparing researchers in different fields, the impact factor of a journal in which a researcher publishes might not be indicative of the researcher’s scientific influence within his or her respective field. We, therefore, thought that it was necessary to develop a method to allow for valid comparisons across fields.
Comparing Research Impact Across Disciplines
Discussions with the senior faculty and department chairs at MSSM resulted in a metric that was accepted as a fair and appropriate quantitative measure of the quality of an investigator’s publications. The adopted methodology compares the impact factor of an investigator’s articles with those of the top journals within their own field. Each investigator identified the top three journals in his or her field. The average impact factor of these three journals was used as the benchmark for that investigator. Each faculty member was then asked to calculate his or her own individual impact factor (IIF) for two consecutive years, using 75% of their benchmark as target. This benchmark was selected after reviewing results of comparisons of investigators’ IIFs with their self-defined benchmarks at several multiples (50%, 75%, and 100%). We used 75% of the self-defined benchmark as the target, because it is unlikely for every paper to be published in the best journal in the field, and yet 75% reflects the reasonably high standard of the research quality that MSSM strives for. The data were collated and the IIF of each faculty member was computed as the ratio of his or her impact factor to 75% of his or her self-defined benchmark, expressed as a percentage. For example, if the average impact factor of the top three journals of an investigator’s field is 12.0 and the investigator’s own impact factor for the past two consecutive years is 10.0, the IIF = 10.0/(12.0 × 75%) = 111%. Because the IIF is a ratio, it corrects for disparities that arise on account of the varying sizes of research fields. Thus, we could compare a researcher in a field in which the impact factor of the top journal is 2.0 with a researcher in a field where the impact factor of the top journal is 10. This also allowed for comparison of research quality of individuals working in different areas within the same department.
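The IIF arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not the tool used at MSSM: the function names are hypothetical, the three journal impact factors are invented so that their average matches the 12.0 of the worked example, and the investigator's own impact factor is taken as a given input because the text does not spell out how it is aggregated over articles.

```python
def benchmark(top_journal_ifs):
    """Average impact factor of the investigator's self-selected top journals."""
    return sum(top_journal_ifs) / len(top_journal_ifs)


def iif_percent(own_if, top_journal_ifs, target_fraction=0.75):
    """IIF as a percentage: own impact factor relative to 75% of the benchmark."""
    target = target_fraction * benchmark(top_journal_ifs)
    return 100.0 * own_if / target


# Worked example from the text: benchmark 12.0, own impact factor 10.0.
# The three journal values below are hypothetical; only their mean matters.
print(round(iif_percent(10.0, [14.0, 12.0, 10.0])))  # 111

# A researcher in a smaller field (hypothetical values) is scored on the
# same scale, which is what allows comparison across disciplines.
print(round(iif_percent(1.5, [2.0, 2.0, 2.0])))  # 100
```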
Relationship Between RD and IIF
We analyzed the data for two years that covered the performance reviews conducted in 2006 and 2007 (on data from 2005-2006 and 2006-2007). We had data from 188 principal investigators in 2006 and 213 principal investigators in 2007, including 158 investigators who provided data for both years.
Results for 2006 indicated that RD and IIF had a weak but statistically significant and positive correlation (Spearman rho = 0.23, P = .001). Analysis of the 2007 data did not indicate a significant correlation (Spearman rho = 0.10, P = .15; see Figure 1). Overall, we interpret these data to indicate that there is no strong or significant correlation between the IIF and RD, which is not entirely surprising, because the IIF measures past productivity, whereas RD is a measure of the ability to conduct future research. One could argue that the two ought to be linked, because one of the review criteria for grants from the National Institutes of Health comprises the faculty member’s research expertise and ability. Gathering these data over a much longer period (five to seven years) may help clarify this relationship. The current absence of a strong association between RD and IIF suggests that using either variable without taking into account the other will not give the full picture of the faculty member’s research performance. Therefore, it is important to use an integrated approach, taking into consideration both these variables to obtain a fuller evaluation of research performance.
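For readers who wish to run the same kind of check on their own data, Spearman's rho can be computed as the Pearson correlation of the ranks. The sketch below is a plain-Python illustration with hypothetical function names and invented data; it does not reproduce the analysis reported here, and it omits the P value, for which a statistics package would normally be used.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    positions = {}
    for pos, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def spearman_rho(x, y):
    """Spearman rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical RD and IIF values (percent of target) for five investigators.
rd_pct = [60.0, 95.0, 110.0, 150.0, 240.0]
iif_pct = [120.0, 80.0, 140.0, 100.0, 300.0]
print(spearman_rho(rd_pct, iif_pct))
```

Because the calculation uses ranks, extreme values such as a very high RD influence the result no more than any other top-ranked value, which suits the skewed distributions described in this article.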
Development of a Composite Metric for Research Performance
To better understand the relationship between RD and IIF, we took the scatter plots in Figure 1 and converted them into a four-quadrant matrix for each year within which each researcher could be identified by both criteria (Figure 2). The characteristics of the quadrants are listed below.
* Quadrant 1. RD ≥ target and IIF ≥ target. Investigators in Quadrant 1 have a high RD and a high IIF. These could be identified as the strongest researchers in the institution. They have well-funded research programs and are well recognized in their field for their contribution to its knowledge base.
* Quadrant 2. RD < target but IIF ≥ target. Investigators in Quadrant 2 have a high IIF but a low RD. This could be due to temporary loss of funding or other reasons that need further investigation.
* Quadrant 3. RD < target and IIF < target. Investigators in Quadrant 3 have not achieved their benchmark in either metric. They would be considered the weakest investigators, absent mitigating circumstances such as junior faculty in start-up period.
* Quadrant 4. RD ≥ target but IIF < target. Investigators in Quadrant 4 have a high RD but a low IIF. Again, the reasons for this require further investigation.
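The quadrant rules listed above translate directly into a small classification function. The following is a sketch under the assumption that both metrics are expressed as a percentage of their respective targets, so that 100 marks the target; the function name and the sample values are hypothetical.

```python
def quadrant(rd_pct, iif_pct, rd_target=100.0, iif_target=100.0):
    """Return the quadrant (1 to 4) for one investigator.

    rd_pct and iif_pct are percentages of target; a value equal to the
    target counts as meeting it, matching the "≥ target" rule in the list.
    """
    high_rd = rd_pct >= rd_target
    high_iif = iif_pct >= iif_target
    if high_rd and high_iif:
        return 1  # well funded and high impact
    if not high_rd and high_iif:
        return 2  # high impact, below RD target
    if not high_rd and not high_iif:
        return 3  # below target on both metrics
    return 4      # well funded, below IIF target


# Hypothetical investigators: (RD % of target, IIF % of target).
for rd_pct, iif_pct in [(110, 111), (80, 120), (80, 90), (130, 60)]:
    print(rd_pct, iif_pct, "-> Quadrant", quadrant(rd_pct, iif_pct))
```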
Use of Composite Metric as a Management Tool
Identification of the faculty in these quadrants has enabled the Dean’s Office to assess the institution’s strengths and weaknesses and devise faculty development approaches suited to each group.
Quadrant 1. This group represents the strongest investigators in the institution. The goal of the Dean’s Office would be to provide them with the resources to continue to be successful, and the institution would attempt to aggressively retain these investigators. The institution is likely to invest in these investigators’ research programs, if there is strategic convergence with institutional goals.
Quadrant 2. This group is doing research considered important by their peers, but its members are not successful in obtaining funding commensurate with their research impact. The institution should provide these investigators opportunities or training to enhance their grant-writing skills, explore other funding mechanisms/resources so that they could be more competitive, and increase their funding levels. In some cases, certain investigators may have small, high-impact programs but fall into this quadrant because of the nature of the research (e.g., large equipment needs). They need to be identified and valued for their contribution to the institution’s academic reputation on a case-by-case basis.
Quadrant 3. These are the least productive researchers in the institution. This group can be viewed as the one most in need of faculty development. The institution would have to closely monitor their progress, and the chairs of their departments would be asked to be proactive in discussing future plans with them. Because these researchers are not effectively utilizing research space, reallocation of their space to more productive faculty is, in most cases, justified.
Quadrant 4. Although this group is successful in obtaining funds, their research does not seem to have the same impact as that of similarly well-funded investigators in Quadrant 1. The reasons for this disparity may be complex and would need to be carefully analyzed on an individual basis. Some in this group may benefit from mentoring to improve the quality of their publications, leading to better recognition within their field. On the other hand, cutting-edge research often occurs without much recognition for many years, and for this category, the Dean’s Office may benefit from qualitative assessment of research, such as peer reviews from senior investigators in that field. The mentoring goal would be to assist this group of faculty so that they may eventually move to Quadrant 1.
Analyses of the Integrated Evaluation
In the near term, the effect of integrated evaluation has been largely on allocation of research space. Since 2005, there has been a sustained effort to change the space paradigm at the institution. Previously, space was allocated to the departments and was made available to department faculty at the discretion of the chair with little oversight from the Dean’s Office. Since 2005, all space is considered “institutional” and allocation is on the basis of performance. Department chairs are expected to assign space on the basis of their faculty members’ meeting productivity targets of both RD and IIF. Furthermore, department chairs themselves are evaluated in part on their ability to carry out the resource allocation process according to the standards set by the institution. We encourage chairs to share the results of our survey reported in this article with individual faculty members so that each faculty member is aware of the metrics used for his or her assessment. The Dean’s Office has established a new institution-wide leadership position, Associate Dean for Faculty Development and Mentoring, to develop and coordinate mentoring programs for faculty across the school.
Using the integrated evaluation, we (the authors, representing the Dean’s Office) made an effort between 2006 and 2007 to reallocate resources in a systematic manner. We concentrated our efforts on the faculty in Quadrant 3—those performing below target in both dimensions. Specifically, we reassigned space from 11 faculty in Quadrant 3 and 9 in Quadrant 2.
To determine the impact of the Dean’s Office interventional program, we compared the 2006 and 2007 RDs and IIFs of the 158 faculty for whom we had data for both years.
Among those faculty, space from 15 faculty was reallocated to those performing at a very high RD. Because of deviation from a normal distribution and presence of extreme values on the variables, the Wilcoxon signed rank tests were used. Results indicated a small but significant increase in aggregate RD at the institution from 2006 (mean = 111.2%, SD = 98.1%) to 2007 (mean = 124.3%, SD = 94.0%, z = −2.22, P = .03), suggesting that the institution’s space utilization is incrementally more efficient than before. Without adding significant additional area, it has been possible to find space for eight new recruits. In real numbers, the average density for all laboratory-based research in the institution (based on full data set for both years) increased from $480/square foot to $530/square foot, with standard deviations of 214 and 172, respectively.
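A paired comparison of this kind can be sketched as follows. This is an illustrative implementation of the Wilcoxon signed rank test using the large-sample normal approximation, without tie or continuity corrections; whether the reported analysis applied such corrections is not stated, so results from a statistics package may differ slightly. The sign of z depends on which rank sum is used (here the sum of positive ranks), and the function names and data are hypothetical.

```python
from math import erfc, sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    positions = {}
    for pos, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def wilcoxon_signed_rank(before, after):
    """Return (W+, z, two-sided P) for paired samples.

    Pairs with no change are dropped, the absolute differences are ranked,
    and W+ (the sum of ranks of the increases) is compared with its
    expected value under the null hypothesis of no systematic change.
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    expected = n * (n + 1) / 4
    sd = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - expected) / sd
    p = erfc(abs(z) / sqrt(2))  # two-sided P from the standard normal
    return w_plus, z, p


# Hypothetical RD values (percent of target) for the same six investigators.
rd_2006 = [40.0, 85.0, 100.0, 120.0, 150.0, 310.0]
rd_2007 = [55.0, 80.0, 125.0, 160.0, 150.0, 330.0]
print(wilcoxon_signed_rank(rd_2006, rd_2007))
```

With samples this small an exact test would be preferable; the normal approximation is more appropriate for a sample of the size analyzed here.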
To control for potential self-enhancing biases, we used the same self-defined benchmark of the impact factor to evaluate the IIF for 2006 and 2007. The impact factor is typically a lagging indicator in that it usually takes a few years for a published work to receive maximum recognition and citations. Nevertheless, based on the data from the 158 faculty who provided data for both years, our analysis indicated that there seemed to be a small but significant increase of the average IIF from 2006 (mean = 125.8%, SD = 92.6%) to 2007 (mean = 166.1%, SD = 259.7%, z = −2.11, P = .03). Descriptive statistics of RD and impact factor based on the data from the 158 faculty are summarized in Table 1. We examined the proportions of investigators in both years who had reached their RD target and 75% of their IIF benchmark. McNemar test results indicated that there was a significantly higher proportion of investigators who reached their RD target in 2007 (χ² = 4.83, P = .03). There was no significant difference between 2006 and 2007 in the proportion of investigators who reached their 75% target IIF (χ² = 0.02, P = .89). Besides the formal statistical tests presented above based on the 158 investigators for whom we had data from both years, we also summarized the descriptive statistics of the full data based on 188 investigators in 2006 and 213 investigators in 2007 in Table 2. Descriptively, the full data set suggests a similar difference between 2006 and 2007 in RD and IIF. It will take several more years to ascertain whether there is a sustained increase in IIF also.
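The McNemar test compares paired yes/no outcomes, here whether the same investigator met a target in each of the two years, and depends only on the investigators whose status changed. The sketch below is illustrative: the counts are invented, the function name is hypothetical, and because the text does not say whether a continuity correction was applied, the correction is offered as an option rather than assumed.

```python
from math import erfc, sqrt


def mcnemar(met_first_year, met_second_year, continuity=False):
    """Return (chi-square, P) for paired boolean outcomes, 1 degree of freedom.

    b counts investigators who met the target only in the first year,
    c counts those who met it only in the second year.
    """
    pairs = list(zip(met_first_year, met_second_year))
    b = sum(1 for first, second in pairs if first and not second)
    c = sum(1 for first, second in pairs if second and not first)
    if b + c == 0:
        raise ValueError("no investigator changed status")
    diff = abs(b - c)
    if continuity:
        diff = max(diff - 1, 0)
    chi2 = diff ** 2 / (b + c)
    p = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p


# Hypothetical data: 40 met the target in both years, 30 in neither,
# 5 only in the first year, and 15 only in the second year.
first = [True] * 40 + [False] * 30 + [True] * 5 + [False] * 15
second = [True] * 40 + [False] * 30 + [False] * 5 + [True] * 15
print(mcnemar(first, second))
```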
Advantages of Using Multifaceted Evaluation Criteria
As mentioned earlier, the challenge for institutional management in evaluating research is the need to compare across disciplines in order to effectively manage resources. The use of a single metric, with its idiosyncrasies, to assess research has drawbacks, because it is important to ensure that decisions are based on commonly accepted criteria and are perceived to be fair; multiple metrics are therefore preferable. They allow the institution to make assessments based on more balanced criteria and to demonstrate to the faculty members that decisions on resource allocation are based on hard evidence. As this article documents, at MSSM we experimented with an integrated approach between 2005 and 2007 that takes into account both RD and IIF. Whereas RD measures the ability to obtain grants at a point in time, IIF reflects the quality of research conducted by a faculty member. The use of dual metrics not only provides a fairer method of assessing performance but also is helpful as a management tool for resource allocation and faculty development.
At MSSM, an annual performance review of all departments now includes analysis of both grants and publications for the prior year. There has been general support from faculty members and departmental leadership for this process, as individual faculty are able to define the benchmarks against which their IIFs are calculated and department chairs see advantages in their ability to work with their faculty using a standard set of institutional guidelines. Resource allocations within departments and the institution are based on objective criteria that are transparent and shared with faculty members. Overall, faculty members have been receptive to these transparent criteria. The dean visited all academic departments in the fall of 2007 and 2008, and the faculty agreed that the evaluation criteria were fair to them.
The use of dual metrics allows the Dean’s Office to set realistic goals, as faculty development can be individually tailored for those performing below target in one dimension. In the short term, the use of this process allows the Dean’s Office to review resource allocation and make adjustments as needed. It allows the Dean’s Office to couple resource allocation decisions with targeted faculty mentoring activities in an explicit manner. For this coupling, the integrated approach for evaluation of research performance should, when needed, be used in the context of other factors and qualitative evaluations by experts in the field. At the level of individual faculty, this analysis provides a starting point for discussions on research performance with their department chair and allows for consideration of mitigating factors such as ill health, change in family circumstances, and change in research directions when making decisions and designing mentoring programs.
The integrated approach described here has proved helpful at MSSM for evaluating research performance because it is evidence based and tailored to different fields of research (IIF) and types of research (wet lab versus dry lab). At the institutional level, the desired outcome is better use of a scarce and finite resource: research space. The desired outcomes at the individual level are a better grants portfolio and publications in higher-quality journals. The outcome of interest to senior management is that such an evaluation-feedback-mentoring loop will lead to a higher quality of science at the institution. Tracking these metrics and changes over a number of years and comparing the results with actual outcomes at the institution will determine whether this approach has been successful.
The authors thank Dr. Ravi Iyengar for his comments on earlier versions of this article.
1 Holmes EW, Burks TF, Dzau V, et al. Measuring contributions to the research mission of medical schools. Acad Med. 2000;75:303–313.
2 Fink I. Research space: Who needs it, who gets it, who pays for it? Plan High Educ. 2004;33:5–17.
3 Monastersky R. The number that’s devouring science. Chron High Educ. October 14, 2005;52:A12.
4 Fassoulaki A, Sarantopoulos C, Papilas K, Patris K, Melemeni A. Academic anesthesiologists’ views on the importance of the impact factor of scientific journals: A North American and European survey. Can J Anaesth. 2001;48:953–957.
5 Citation data: The wrong impact? Nat Neurosci. 1998;1:641–642.
7 Gastel B. Assessing the impact of investigators’ work: Beyond impact factors. Can J Anaesth. 2001;48:941–945.
8 Gisvold SE. Citation analysis and journal impact factors—Is the tail wagging the dog? Acta Anaesthesiol Scand. 1999;43:971–973.
9 Frank M. Impact factors: Arbiter of excellence? Physiologist. 2002;45:181–183.
10 Adam D. The counting house. Nature. 2002;415:726–729.
11 Talley NJ, Richter JE. The journal’s impact increases! Am J Gastroenterol. 2004;99:1867–1868.
12 Dong P, Loh M, Mondry A. The “impact factor” revisited. Biomed Digit Libr. 2005;2:7.
13 Cartwright VA, McGhee CN. Ophthalmology and vision science research. Part 1: Understanding and using journal impact factors and citation indices. J Cataract Refract Surg. 2005;31:1999–2007.
14 Garfield E. Journal impact factor: A brief review. CMAJ. 1999;161:979–980.
15 Garfield E. Use of journal citation reports and journal performance indicators in measuring short and long term journal impact. Croat Med J. 2000;41:368–374.
16 Garfield E. The impact factor and its proper application. Unfallchirurg. 1998;101:413–414.
17 Seglen PO. Why the impact factor of journals should not be used for evaluating research. BMJ. 1997;314:498–502.
18 Whitehouse GH. Citation rates and impact factors: Should they matter? Br J Radiol. 2001;74:1–3.
19 Not-so-deep impact. Nature. 2005;435:1003–1004.
20 Rossner M, Van Epps H, Hill E. Show me the data. J Cell Biol. 2007;179:1091–1092.
21 Smith G. Impact factors in anaesthesia journals. Br J Anaesth. 1996;76:753–754.
22 Makeham JM, Pilowsky PM. Journal impact factors and research submission pressures. ANZ J Surg. 2003;73:93–94.
23 Garfield E. The history and meaning of the journal impact factor. JAMA. 2006;295:90–93.
26 Hirsch JE. An index to quantify an individual’s scientific research output. Proc Natl Acad Sci U S A. 2005;102:16569–16572.
27 Ball P. Index aims for fair ranking of scientists. Nature. 2005;436:900.