Key Indicators in Academic Medicine (KIAMs), a feature introduced in this issue of Academic Medicine, are intended to inform teaching hospitals and medical schools about the metrics they may use to gauge their health, including the performance of units and programs within these organizations.1 Ultimately, KIAMs may help promote effective growth and development in an increasingly dynamic clinical, training, and research environment. In this perspective, we suggest a framework for analyzing the KIAMs that we believe will enhance the usefulness of the published pieces. These recommendations represent our opinion on how to maximize the applicability and impact of the KIAMs feature. Our suggested approach is more structured than might be imagined or preferred by others: We envision the development and publication of KIAMs as a systematic mechanism to assemble an actionable “playbook” for academic health center (AHC) leaders. We propose that this perspective, together with the first KIAMs, could be used to refine the guidelines and expectations for KIAMs published in the future.
What Are the Ideal Characteristics of KIAMs?
We start with the premise that key indicators (KIs) are intended to provide guidance in tactical and strategic decision making, rather than in setting the mission, vision, or strategic plan of AHCs. Accordingly, KIAMs would help answer “How?” much more than “Why?” or “What?” In this regard, we see KIAMs* as akin to key performance indicators (KPIs) in the business literature. KPIs are measures that an organization uses to define and evaluate its success in making progress toward its long-term organizational goals. KPIs are thereby derived from and aligned with organizational mission, vision, strategy, and objectives. In this paradigm, Academic Medicine readers would be evaluating the KIAMs for applicability to their institutions' established strategic goals, rather than using the KIAMs to set those goals.
The likening of KIAMs to KPIs would help guide their design and use. KPIs (and, by analogy, KIAMs) should be meaningful, understandable, measurable, management driven, longitudinal, and actionable (see Table 1, where we explain the goal-setting acronym SMART,2 which captures these principles, and apply it to the KIAMs). In each KIAM, the text describing the derivation and application of the indicator should stand up to scrutiny through a series of generic questions, including the following:
* How compelling is the methodological and statistical evidence that the indicator is valid?
* How convincing is the cause-and-effect relationship between strategic decision making and improvement (or lack thereof) in the indicator?
* What are the comparators for the indicator in the published/publicly available domain?
* How sustainable is the indicator over time, how effective is the longitudinal follow-up, and what is the lag period between implementation and results?
An additional and important consideration is the effect of establishing the indicator on overall behavior. There are incentives (explicit and implicit) associated with establishing a KI†; clear alignment between strategic goals and incentives is essential.3 As measurement itself often drives the behavior of individuals and institutions (e.g., “Hawthorne effect”), two questions must be asked: Does the establishment of the KI have benefit regardless of the derived metric? Conversely, if the KI is no longer tracked, are the putative benefits of the intervention durable? As discussed in a recent series of articles,4,5 more than half of respondents to a poll on scientific productivity metrics indicated that they had changed their behavior because of the way they were evaluated, and nearly three-quarters were concerned that their colleagues could “game” or “cheat” the systems for evaluations at their institutions. On the other hand, changing the parameters that are rewarded typically results in a change in behavior, with deterioration of performance on valued tasks that are no longer incentivized.6
Even more important is the rationale for proposing a particular KIAM. KPIs are the most relevant and important of the metrics used to guide organizational decision making. They are typically either selected or derived from a much larger set of benchmarks that capture all of an enterprise's activities. Through this selection process, KPIs promote strategic focus. From this perspective, KIAMs should consist of selective metrics that have the greatest utility to guide decision making. They should provide a strong rationale for the utility of the proposed indicator and meet the SMART criteria as described in Table 1.
Applying KIAMs When Allocating Resources: The Importance of Generalizability
In our opinion, there is a key question readers should ask about each KIAM: Can and should the indicator be implemented at and/or adapted for my institution? After considering the fit of the indicator to the institution's strategic plan (and with its culture and history), the reader's answer would then typically hinge on resource allocation decisions (e.g., resource requirements, opportunity costs, return on investment). In this sequence, the KIAM would inform resource allocation by helping to prioritize resource-related decisions.
Considerations in defining KIAMs
We submit that there are three different contexts in which KIAMs could be applied, but two take precedence. When considering a KI as a possible KIAM, one should ask, Is the KI referable (1) to a single AHC only, (2) to teaching hospitals or medical schools generally, or (3) to other types of organizations as well as AHCs? As we will explain below, we submit that the answer to question 2—and, whenever appropriate, question 3—should be “yes.”
KIAMs should be generalizable
More often than not, an AHC will base its resource allocation decisions on the answer to the following question: How does our (single) AHC allocate available resources among various programs to improve performance on KPI metrics? All-funds budgets and funds-flow analyses are representative of more sophisticated approaches to assist in this endeavor.7,8 Although knowledge of practices at other institutions can inform the process, it is not essential. To the extent that available resources are viewed as a fixed pot, this paradigm resembles a zero-sum game, in which incremental allocation of resources to one initiative by definition curtails allocation of resources to another initiative. On the basis of these considerations, and as we discuss further below, resource allocation decisions using institution-specific KIs are not optimal as KIAMs because of their limited generalizability.
KIAMs should allow comparison across institutions
In many situations, resource allocation decisions are made in the context of responding to the following question: How do the resource allocation decisions within our (single) AHC fit within “standard practices” for AHC management? In contrast with allocating resources to improve KPI performance, this process requires the AHC to select KIs that it can “manage toward” or use to “manage to” strategic goals. These metrics are typically derived from comparisons across institutions and are particularly relevant to consider as KIAMs.
KIAMs may utilize tools that allow comparison with other types of organizations
AHCs are increasingly asking, How well do resource allocation decisions within our (single) AHC fit with the theoretically (and empirically derived) optimal approaches in other organizations, as determined using tools from managerial accounting, finance, economics, and decision support analysis? Answering the query for a single AHC does not depend on obtaining information from multiple institutions, although the tools used are derived from, and often validated in, the broader organizational finance community. Accordingly, KIs that use validated financial tools to evaluate resource allocation or other tactical and strategic interventions may have more durable impact and applicability as KIAMs.
Deriving KIAMs for All Components of the AHC Mission
The optimal management of AHCs requires rigorous KIs for all three core missions: clinical care, education, and research. In the clinical arena—particularly practice plan and hospital management—KIs are more developed. Generalizability, comparison across institutions, and, in some cases, comparison with other types of organizations are expected and possible.9 There is a substantial resource base outlining financial and nonfinancial performance measures for AHCs. Metrics available through the University Health System Consortium, Medical Group Management Association, Faculty Practice Solutions Center, and other organizations offer definitions of best practices and provide institutions with targets to “manage toward.” Through a process which began more than two decades ago and continues to be refined, KIs that are both accepted and applicable to a wide range of institutions have been developed. At the most general level, adopting these KIs allows organizations to “right size” their resource allocations. The more recent emphasis on quality of care and patient safety has resulted in an analogous menu of widely applicable performance measures. Therefore, a valuable outcome of the KIAMs feature would be the identification of a selective group of KPIs from these larger sets of performance metrics in the clinical arena. A complementary outcome would be the definition of new indicators, developed de novo or based on derivations and combinations from existing measures.
The availability of KIs to evaluate research and education performance is much more limited. In the absence of common databases and management tools equivalent to those in the clinical arena, there are few agreed-on targets to “manage toward.” Even extensive sources that deal with AHC performance focus largely on hospital and practice plan management metrics.10–12 Hence, there is typically a tenuous basis for determining whether the optimal investment has been made (or whether it is too large or too small) in research and education. The implicit assumption, particularly in the research arena, is usually that bigger is better, perhaps because critically important KIs that assess quality, innovation, and impact of research are difficult to develop.13 However, even with the most desirable sources of research funding, AHCs underrecover total research costs, so striving to continually expand the research base may not optimize the research program's resource utilization or impact.
This limited availability of KIs in the research and educational domains is attributable to the complexity of outcomes and infrequent implementation of KIs across institutions. The measurement of clinical performance two decades ago was similarly restricted until detailed databases were developed that employed precise definitions for data collection and analysis. Accordingly, Academic Medicine's KIAMs should facilitate this same process in the research and educational domains.
Pitfalls and Opportunities in Establishing KIAMs
We believe it will be helpful to provide general and specific characteristics of what we would consider to be useful (and not so useful) KIAMs, by creating contextual settings for their derivation and use. We first outline features of less-than-optimal indicators for AHCs and provide strategies to improve their usefulness. We next describe some indicators derived from other organizations and review their applicability for AHCs. Finally, we propose how we believe the KIAMs feature could be expanded to broaden its impact.
Ratios in which the numerator and/or denominator are not standardized.
Arguably, the biggest impediment to establishing KIs applicable across a wide range of AHCs is the absence of standard definitions for many of the parameters. (An ambitious long-term goal for the KIAMs could be to derive such standards; see below.) Accordingly, any proposed KIAM in which the parameters are not standardized requires, at a minimum, attention to rigorous definitions. Better yet, KIAMs can be constructed so as to minimize this problem. We offer the following examples to illustrate the importance of deriving standard definitions.
Occasionally, one will hear the claim that grant funding per faculty member at a given institution is particularly high compared with that of other institutions. Such a ratio is derived from publicly available data on faculty numbers and some, if not all, grant funding. This ratio is confounded by wide variations in definitions of “faculty.” Even faculty numbers reported for a single institution can be substantially different across various publicly available sources (e.g., Association of American Medical Colleges, National Science Foundation, U.S. News & World Report, institutional Web sites). Therefore, all ratios with faculty numbers in the numerator or denominator are limited by definitional uncertainty (e.g., student-faculty ratio, teaching credit hours per faculty, relative value units per faculty). Even if definitions of faculty categories (e.g., full-time, part-time, tenure track) were standardized across institutions, all institutions and publicly available sources would also have to apply the standards uniformly in their reporting for comparisons to be valid. Furthermore, the number of research faculty within an AHC may vary independently from the number of total faculty, depending on the size of the AHC's clinical, teaching, and research programs, thereby further compromising comparisons of research funding per faculty member across institutions.
As one alternative, ratios involving faculty numbers could be derived with “built-in” definitions for both numerator and denominator that are equivalent across institutions. For example, determining R01s per R01-funded faculty member standardizes the denominator to only those individuals who have obtained an R01. Although this approach would enable a more standardized comparison of the performance of research faculty across institutions, this ratio is confounded by R01s with multiple principal investigators—further illustrating the complexity of standardized comparisons.
Surrogates of multidimensional processes are used to capture performance.
KIAMs have the potential to guide and help simplify the complex task of managing teaching hospitals and medical schools. KIs composed of single metrics to assess performance on complex multidimensional tasks, however, have the potential to mislead.14 This is particularly true when those metrics are surrogates for the actual outcome one is trying to measure. Clearly, KIs are most useful when they directly assess the activity under consideration.
Publication metrics serve as an example of how single measures can fail to adequately assess multidimensional tasks. The use of publication metrics to evaluate faculty or institutional performance in research is commonplace, yet it is fraught with controversy. Although there is general understanding that a journal's impact factor does not capture the performance of an individual scientist or publication, it is nonetheless used as a surrogate for scientific productivity. Although many modifications to the impact factor have been developed to deal with this issue, and tools to normalize the impact factor have been reported,15,16 controversy remains. In a recent comparative analysis of 39 scientific impact measures, Bollen and colleagues17 found that impact factor was one of the least valuable measures of productivity and concluded that scientific impact is a multidimensional construct that cannot be adequately measured by any single indicator. Of interest, in their analysis they determined that the most important factor in capturing scientific impact is whether a metric measures rapid or delayed impact.
Ratios in which the numerator and/or denominator are not controlled by the institution.
It is commonplace for AHCs to use national ranking scales as performance indicators. Among the well-known national rankings are those that compare overall medical school and/or hospital performance (e.g., U.S. News & World Report18) or research funding (National Institutes of Health, National Science Foundation). As Gladwell14 explains in a recent, superb critique, comparisons of medical school or hospital performance are confounded by the use of “heterogeneous” ranking systems that are devised to cover all schools or hospitals and include a wide array of parameters in calculating the final ranking. In the case of research funding comparisons, rankings are based on absolute numbers and are not normalized by institution size or characteristics. Gladwell concludes, “Who comes out on top, in any ranking system, is really about who is doing the ranking.”
However, even if an AHC makes strategic decisions that improve its performance on metrics used in rankings, there is no guarantee that its rank level will improve. Its rank also depends on the performance of its peer institutions, which it has limited or no capacity to influence. The AHC also lacks the ability to influence the weighting of the parameters used to calculate the rankings. This situation does not obviate the importance or relevance of improvement in selected comparative metrics; rather, it highlights the importance of establishing KIAMs that reflect institutional performance independent of other organizations.
One of us (K.A.J.) has previously suggested a series of alternative measures for assessing performance in the research funding arena that reflect institutional performance independent of other organizations.19 These alternative measures are organized around ratios in which both the numerator and denominator are derived from the same institution (e.g., percentage increase in institutional NIH funding from year Y to year Y + 1, 2, 3 …). If the percentage is further normalized by comparison with the change in the NIH budget over the same period, one gets a measure of performance relative to the “funding market.” We are not suggesting that this metric is an appropriate KIAM but, rather, that the logic of its derivation is generalizable.
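The logic of this derivation can be sketched in a few lines of Python. This is only an illustration under our own assumptions, not the method of the cited work: the function names are hypothetical, the dollar figures are invented, and dividing growth factors is just one plausible way to normalize institutional growth by the change in the NIH budget.

```python
def pct_change(start: float, end: float) -> float:
    """Fractional change from start to end (0.10 means +10%)."""
    if start <= 0:
        raise ValueError("start must be positive")
    return (end - start) / start


def market_normalized_growth(inst_start: float, inst_end: float,
                             nih_start: float, nih_end: float) -> float:
    """Institutional growth factor divided by the NIH budget growth factor.

    A value above 1.0 means the institution's NIH funding grew faster than
    the overall NIH budget over the same period; below 1.0 means slower.
    This normalization is an assumption made for illustration.
    """
    inst_factor = 1.0 + pct_change(inst_start, inst_end)
    nih_factor = 1.0 + pct_change(nih_start, nih_end)
    return inst_factor / nih_factor


# Hypothetical figures in $ millions (not real data):
# institution grows from 100 to 110 while the NIH budget grows from 30,000 to 30,600.
score = market_normalized_growth(100.0, 110.0, 30_000.0, 30_600.0)
print(f"growth relative to funding market: {score:.3f}")
```

Because both inputs to each percentage come from the same source (the institution, or the NIH budget), the result does not depend on how any peer institution performed.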
Performance metrics in not-for-profit organizations: Setting the axes correctly
A common perception is that benchmarks for many AHC activities cannot be defined with clarity because of the enterprise's not-for-profit nature. One of us (K.A.J.) has previously described a paradigm for evaluating performance of individual projects in not-for-profit organizations that deals with this issue.20 When displayed in graphical form, any existing or proposed project can be evaluated simultaneously along one axis for its contribution to the mission of the organization and along a perpendicular axis for its revenue-generating potential. The most valued projects are those situated in the upper right quadrant—they score highly on their contribution both to mission and to revenue generation. This paradigm can be applied to the evaluation of proposed KIAMs by responding to the following question: Does the indicator provide generalizable approaches for measuring performance along both axes? The challenge is primarily in developing common metrics for contribution to mission.
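As a rough sketch of the two-axis evaluation, the Python fragment below places projects into quadrants. Everything specific in it is an assumption for illustration: the 0-to-1 scoring scales, the 0.5 cut points, the `Project` type, and the example projects are hypothetical and are not taken from the cited paradigm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    name: str
    mission: float  # hypothetical 0-1 score for contribution to mission
    revenue: float  # hypothetical 0-1 score for revenue-generating potential


def quadrant(p: Project, mission_cut: float = 0.5, revenue_cut: float = 0.5) -> str:
    """Label the quadrant a project falls in; the cut points are assumptions."""
    mission_level = "high" if p.mission >= mission_cut else "low"
    revenue_level = "high" if p.revenue >= revenue_cut else "low"
    return f"{mission_level} mission, {revenue_level} revenue"


# Invented example projects, for illustration only.
projects = [
    Project("new clinical service line", mission=0.8, revenue=0.9),
    Project("community outreach clinic", mission=0.9, revenue=0.2),
    Project("parking expansion", mission=0.1, revenue=0.7),
]

for p in projects:
    print(f"{p.name}: {quadrant(p)}")
```

The code itself is trivial; as the text notes, the hard part is agreeing on a common, generalizable way to produce the mission score.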
Financial ratios as KIAMs: Opportunity for comparison, fraught with peril
Ratios in which both the numerator and denominator are in dollars have an intrinsic advantage—the unit of measure is identical across institutions.21 They also offer the opportunity to extrapolate general principles from other types of organizations to the management of AHCs. For example, Table 2 summarizes four KIs of financial performance for institutions of higher learning.22 Although these are not directly applicable in this form to the Academic Medicine KIAM feature, they are representative of the utility of precise and uniform definitions. At the same time, however, nomenclature of categories and funds-flow accounting vary widely across AHCs, creating a perilous landscape for comparison, absent precise and uniform definitions.
As part of an effort to identify financial ratios as candidates for KIs, one of us (K.A.J.) recently used ratio analysis to analyze data from the 2007–2008 AHC census conducted by the Association of Academic Health Centers (AAHC).23 Fifty-five AHCs responded to the AAHC survey, providing detailed financial information, expressed as raw numbers. These data were normalized within each institution by creating a ratio of two separate values reported by that institution. The ratios were then compared across institutions. To a great extent, this strategy minimizes the effect of institution size on the raw numbers, which matters because differences in size are the predominant limitation of using absolute values for developing meaningful metrics. Ratio analysis thus provides a range of normalized responses, which can be displayed in graphical form to determine both the shape and the range of the distribution. The data can be readily scrutinized to determine where any given institution falls within the distribution.
One of the most interesting ratios from this analysis of the AHC census data23 was that of total AHC payroll to total AHC operating expense. Nearly three-quarters of all evaluable responses were in a peak centered around 0.52, with a range between 0.44 and 0.60. Of note, there was no distinction between institutions by public versus private status, by research intensity, or by geographic region.
As a point of reference, payroll-to-operating-expense ratios vary substantially depending on the sector being measured. Values around 0.2 are characteristic for durable goods manufacturing, construction, and retail and wholesale trade. Ratios in the hospital sector cluster around 0.5, regardless of hospital type (large university teaching, small community, etc.), whereas for practice plans the values are typically 0.8 or higher. It is, therefore, a reasonable hypothesis that the ratio of payroll to operating expenses—if refined by guidelines for uniform assignment of payroll expenses and differentiated by category (e.g., research, education)—could constitute a KIAM to guide optimal resource allocation.
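The ratio analysis described in this section can be sketched as follows. This is a minimal illustration, not the analysis performed on the AAHC census: the institution labels and dollar amounts are invented, and the helper names are hypothetical. Only the 0.44 to 0.60 band is taken from the text above.

```python
from statistics import median


def payroll_ratio(payroll: float, operating_expense: float) -> float:
    """Total payroll divided by total operating expense for one institution."""
    if operating_expense <= 0:
        raise ValueError("operating expense must be positive")
    return payroll / operating_expense


def share_within(ratios: list[float], low: float, high: float) -> float:
    """Fraction of ratios that fall inside the closed interval [low, high]."""
    if not ratios:
        raise ValueError("no ratios supplied")
    return sum(1 for r in ratios if low <= r <= high) / len(ratios)


# Hypothetical (payroll, operating expense) pairs in $ millions; not census data.
institutions = {
    "A": (520.0, 1000.0),
    "B": (90.0, 200.0),
    "C": (1400.0, 2500.0),
    "D": (300.0, 400.0),
}

# Normalize each institution to itself, then compare across institutions.
ratios = {name: payroll_ratio(p, e) for name, (p, e) in institutions.items()}
values = list(ratios.values())

for name, r in sorted(ratios.items()):
    print(f"{name}: {r:.2f}")
print(f"median ratio: {median(values):.2f}")
print(f"share within 0.44-0.60: {share_within(values, 0.44, 0.60):.0%}")
```

Note that institutions B and C differ more than tenfold in size yet produce ratios that can be compared directly, which is the point of normalizing each institution to itself.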
Looking forward: A suggested evolution of the KIAMs feature
More generally, we suggest that the KIAMs feature, by catalyzing data collection using defined criteria, could explore and validate potential KPIs in AHCs' research and educational domains. The intrinsic functions in education and research, just as in the clinical arena, are similar across institutions. As examples, running a small-group teaching session, giving a lecture, or performing wet-lab or dry-lab research are fundamentally the same functions, irrespective of institution or geographic location. Trends in trainee performance in each of the Accreditation Council for Graduate Medical Education's six general competencies following various interventions could be compared across institutions as measurement tools improve. We believe it would be particularly interesting to see changes in relative performance for individual learners in addition to aggregate absolute performance for groups of learners. Each of these approaches would provide useful information to assess teaching and learning strategies. These, in turn, are subsets of the “educational epidemiology” approach suggested by Carney et al24 in which observational and randomized experimental designs would be applied to study physician education, through a comprehensive national network.
Generating data in the educational and research arenas using refined and uniform definitions would simultaneously define which metrics are applicable across institutions, which are relevant to specific categories of institutions (e.g., research-intensive versus non-research-intensive), and which are intrinsic to individual institutions. Fortunately, much of the information needed to generate research and education metrics is already collected by AHCs and provided to accrediting or oversight bodies (e.g., Liaison Committee on Medical Education, A21 reports to the Office of Management and Budget for indirect cost negotiations), oftentimes with highly precise and uniform definitions (such as for categorizing research and education space in the A21 reports). For space and for other categories, there is no need to start from scratch, particularly because some of the data are available from public sources. Identifying and using comprehensive and publicly available data has multiple benefits:
* The data source can be referenced and verified.
* Response rates are not an issue, so conclusions can be drawn without concern about whether the information is representative.
* The data are updated regularly and are usually available in longitudinal fashion.
* The primary data can be scrutinized by individuals and by organizations.
* The definitions and guidelines for providing the data are typically detailed, discriminatory, and consistent over time.
* It is faster, cheaper, and less burdensome to use these data than to try to collect them de novo.
* This approach could serve as a catalyst for coalescing entities/organizations around joint initiatives.
The organizational infrastructure needed to rigorously define KIs in education24 and research would require continuous investment and refinement. Even if such an approach were pursued, identifying the most specific, measurable, attainable, relevant, and time-bound KIs would likely take some time.
We believe that the development of robust KIs will be increasingly critical for the efficient and effective strategic management of the missions of AHCs. This perspective constitutes our initial attempt to propose guidelines for the development of KIAMs. We believe that, ultimately, there should be standard guidelines that all AHCs embrace.
Concluding Thoughts: What Constitutes a Hit?
We conclude with an analogy between the KIAMs feature and the game of baseball. At the outset of this feature, there will appropriately be different perspectives on what constitutes a valuable KIAM—or, in baseball terminology, whether a suggested KIAM is a hit. Not only may the umpires (reviewers) and the head umpire (the editor-in-chief) find it difficult to determine what constitutes a hit, but they may also face challenges in distinguishing singles from extra-base hits, including home runs. Adding in walks, strikeouts, errors, and wild pitches—we could go on—further magnifies the complexity. This perspective represents a first attempt to build a rule book to help guide that endeavor with an ultimate goal of having a standardized set of rules that all players (faculty and staff), managers (department heads, center directors, deans, CEOs), and team “owners” (boards for nonprofit organizations) embrace because the rules improve the game. Finally, identifying the most valuable KIAMs for AHCs, particularly KIAMs that reliably evaluate education and research, will require multiple seasons (and potentially even some playoffs) to determine which KIAMs allow all involved to be champions.
* In management, the term key indicator can be applied to past, coincident, or future events. Given that this Academic Medicine feature encompasses all three time frames, KIAM is an apt description.
† In this article, we use KI when referring to general considerations in formulating metrics for AHC management. KIAM refers more specifically to the KIs published as part of this Academic Medicine feature, particularly those which meet our suggested criteria to qualify as a KIAM. In some contexts, the distinction cannot be made with precision and/or is not germane.