Research Reports

Measuring Physicians’ Productivity

A Three-Year Study to Evaluate a New Remuneration System

Filler, Guido MD, PhD; Burkoski, Vanessa RN, MScN, DHA; Tithecott, Gary MD

doi: 10.1097/ACM.0000000000000058


Academic health science centers have transformed into complex business enterprises in which “clinical revenue and academic performance support each other by being strategically and tactically aligned.”1 As research and clinical success are synergistic and interdependent, medical school and university leaders have to collaborate,2 and department chairs have to work to enhance the academic productivity of their clinical departments.3

Performance-based compensation systems may have a substantial effect on the clinical, research, and teaching activity of physicians at academic health centers, as researchers in the Department of Medicine at Vanderbilt University4 and others5–10 have demonstrated. However, most of these studies focused on physicians’ clinical activity; the degree to which financial incentives improve scholarship remains uncertain. The three factors usually cited as improving clinical practice are changes in the reimbursement system, as outlined above; the threat of legal action; and feedback on how physicians are performing in comparison with their peers.11 Today, physicians hold many roles in the four primary domains of an academic medical department—clinical practice, education, research, and administration—yet measuring their work in each domain is difficult.

Performance-based compensation systems also require that physicians’ performance be measured appropriately, which presents additional challenges. An integrated approach that measures multiple rather than single aspects of performance is preferable.12 Given the trend in academic health science centers to recognize multiple role categories for physicians, measuring these different roles has become particularly important.13 However, no definitive consensus exists as to what constitutes good scholarship or good administrative skills; definitions of success have been rather subjective.

At the time of our study, we found no comprehensive, validated system that both measured physicians’ academic productivity in the four domains of an academic medical department—clinical practice, education, research, and administration—and adjusted for the variable percentages of time physicians work in each domain in each academic role category—clinician administrator, clinician educator, clinician researcher, clinician teacher, and clinician scientist. In the Vanderbilt study, the researchers accounted for only two categories—either 80% research and 20% clinical or 80% clinical and 20% academic.4 In addition, the degree to which financial incentives positively affect scholarship, and specifically what motivates physicians’ scholarly productivity, remains uncertain.

In this article, we discuss the development and assessment of a tool to evaluate a performance-based remuneration system in a medium-sized subspecialty department. The main purpose of our study was to evaluate the reliability and effectiveness of the assessment tool to assist the department chair with the task of rewarding academic productivity.

Method

The institutional research ethics board of the University of Western Ontario (REB file number 102508) approved our study. During the study period (July 1, 2008 to June 30, 2011), the Department of Pediatrics at the University of Western Ontario provided tertiary care in southwestern and northwestern Ontario for approximately 600,000 children. The department received funding from different practice plans—one each for emergency medicine, neonatal medicine, and perinatal medicine, and the London Academic Pediatric Association (LAPA) for all other academic pediatricians.

During the study period, LAPA paid 37 full-time academic physicians for approximately 19,000 inpatient days and 48,000 outpatient visits per annum and to train approximately 40 residents. These 37 physicians included general pediatricians as well as pediatric subspecialists. The LAPA funding stream is complex and consists of a number of fixed revenues for each physician and variable income based on individual physicians’ patient volumes and fee codes. The LAPA financial management committee (FMC), which includes six elected members and an ex officio member (the department chair), decides how to distribute the funding and how to hold recipients accountable for their use of the funding.

Developing the assessment tool

Enhancing academic productivity was a priority for the chair of the Department of Pediatrics, who was appointed in August 2006. In fall 2006, he established a departmental task force, including four LAPA members, a basic scientist, and himself, as a subcommittee of the LAPA FMC to develop a performance assessment tool that would evaluate four components—impact, application, scholarly activity, and mentorship—in each of four domains—clinical practice, education, research, and administration—while accommodating variations in the time physicians spend on tasks in each domain. The distribution of the workload among the four domains matched the role categories outlined by the Schulich School of Medicine & Dentistry.

From October 2006 to May 2008, the task force reviewed all assessment tools used in the various role categories at 16 Canadian universities. From their review, they designed a tool based on those they found at 4 universities—University of Western Ontario, University of Toronto, University of Ottawa, and Dalhousie University. In May 2008, the task force presented three versions of the tool to the LAPA FMC and the LAPA members at large. All LAPA members then voted on which version to implement.

Using the assessment tool

All participating LAPA physicians approved each component and fully endorsed the final scoring sheet (see Appendix 1).

To complete the tool, individual physicians assigned themselves up to three points for each of the four components in each of the work domains. Thus, scores ranged from 0 to 12 for each domain. Physicians had to provide specific examples of their work to justify a score of 3.

Physicians then weighted the scores by the percentage of time they spent on tasks in each domain. This weighting was necessary because physicians hold different role categories—clinician administrator, clinician educator, clinician researcher, clinician teacher, and clinician scientist—and different roles entail different tasks. Table 1 lists the mean percentage of time that physicians in each role category spent working in each domain.

Table 1:
Percentage of Time Spent Working in Four Domains of Five Role Categories Among 37 Physicians, University of Western Ontario, 2008 to 2011

Each August, all physicians completed the scoring sheet (see Appendix 1) and updated their curricula vitae using a commercial software program (Acuity Star). Within two weeks, the physicians met with their section heads (the department has a general pediatrics section and several subspecialty sections) to discuss their self-assigned scores and to review those scores against their curricula vitae. Together, they calculated a total score for each physician. The section heads also completed the scoring sheet and reviewed their scores with the department chair, and the chair completed the tool and reviewed his scores with four senior members of the department. The task force developed a detailed resolution process to address differences of opinion about the correct scores, but it was never used.

LAPA used Year 1 (academic year 2008–2009) as a baseline, implementing the assessment tool but continuing to divide funds equally among physicians irrespective of scores. In Years 2 and 3 (academic years 2009–2010 and 2010–2011), they tied remuneration to physicians’ scores. LAPA divided about $500,000 (all dollar amounts in Canadian dollars) by the sum of the physicians’ total scores to determine a dollar value for each point scored. Each physician then received a bonus calculated by multiplying that per-point dollar value by his or her total score (see Supplemental Digital Appendix 1, https://links.lww.com/ACADMED/A174). For example, in 2010, the average bonus was $10,540.54 ± $3,453.81, and each point scored translated to $1,293.96.
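The allocation arithmetic is simple; the minimal Python sketch below reproduces the per-point calculation described above, using hypothetical total scores (the study’s actual scores are not shown here).

```python
# Hedged sketch of the LAPA bonus allocation: the pool is divided by the
# sum of all physicians' total scores to price one point, and each
# physician's bonus is that price times his or her own total score.
# The scores below are hypothetical placeholders, not study data.
pool = 500_000.00  # approximate annual incentive pool, Canadian dollars

total_scores = [7.4, 9.1, 11.2, 6.8, 8.5]  # hypothetical weighted totals

dollars_per_point = pool / sum(total_scores)

for i, score in enumerate(total_scores, start=1):
    bonus = score * dollars_per_point
    print(f"Physician {i}: score {score:5.2f} -> bonus ${bonus:,.2f}")
```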

To illustrate further, consider a senior clinician administrator who spends 15% of her time in clinical practice, 5% in education, 25% in research, and 55% in administration. Her scores in Year 2 were 11/12 in the clinical practice domain, 5/12 in the education domain, 12/12 in the research domain, and 11/12 in the administration domain, yielding weighted scores of 1.65, 0.25, 3.00, and 6.05, respectively, for a total score of 10.95/12. Against a Year 1 score of 10.78, her productivity improved by approximately 2%.
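Expressed in code, the weighting is a weighted sum of the domain scores; this short sketch reproduces the calculation using the numbers from the example above.

```python
# Weighted scoring for the worked example: each domain score (out of 12)
# is multiplied by the fraction of time spent in that domain.
time_fractions = {"clinical": 0.15, "education": 0.05,
                  "research": 0.25, "administration": 0.55}
domain_scores = {"clinical": 11, "education": 5,
                 "research": 12, "administration": 11}

weighted = {d: domain_scores[d] * time_fractions[d] for d in domain_scores}
total = sum(weighted.values())  # 1.65 + 0.25 + 3.00 + 6.05 = 10.95

year1_total = 10.78
change = (total - year1_total) / year1_total  # ~1.6%, roughly 2%
print(f"total = {total:.2f}, change vs Year 1 = {change:.1%}")
```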

Evaluating the assessment tool

We entered all scores for the three-year study period into Microsoft Excel (version 14.2.2). Only one author (G.F.) had access to the original data. He linked the scores of each physician across all three years in the spreadsheet and then deidentified the data. To verify the data, we compared the percentages in the spreadsheet with those in the role categories documents (see Table 1) and the academic ranks with those in the departmental records. We obtained the physicians’ ages from a departmental database. We did not assess inter- or intrarater reliability. Because a physician’s age, baseline (or preintervention) score, and academic rank would invariably contribute to his or her final score, we planned a subgroup analysis by academic rank and a correlational analysis with age and baseline score.

We used GraphPad Prism 5 (GraphPad Software Inc., La Jolla, California) to analyze our data. We made no adjustments for missing data due to maternity leave or attrition. We assessed distributions for normality using the Kolmogorov–Smirnov test. Because all data except the administration/leadership scores were normally distributed, we used parametric methods in our analysis. We compared groups using the Student t test or repeated-measures ANOVA and tested for associations between variables using standard regression analysis. We used Bland–Altman analysis to test agreement between Year 1 (baseline) and Year 2 results, as well as between Year 1 and Year 3 results. We considered a P value of < .05 to be significant.
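For readers who want to replicate this workflow outside GraphPad, the following Python sketch (NumPy/SciPy) runs the same family of tests on hypothetical paired score arrays; GraphPad’s implementation of the normality test may differ slightly from the plain Kolmogorov–Smirnov test used here.

```python
# Hedged sketch of the statistical workflow described above, run on
# hypothetical paired scores (not the study data).
import numpy as np
from scipy import stats

year1 = np.array([7.4, 9.1, 11.2, 6.8, 8.5, 10.1])
year2 = np.array([7.9, 9.0, 11.0, 7.6, 8.4, 10.3])

# Normality check: Kolmogorov-Smirnov against a normal fitted to the data.
ks_stat, ks_p = stats.kstest(year1, "norm",
                             args=(year1.mean(), year1.std(ddof=1)))

# Paired comparison of Year 1 and Year 2 scores.
t_stat, t_p = stats.ttest_rel(year1, year2)

# Bland-Altman agreement: percentage differences against pairwise means.
means = (year1 + year2) / 2
pct_diff = (year2 - year1) / means * 100
bias, sd = pct_diff.mean(), pct_diff.std(ddof=1)

print(f"KS P = {ks_p:.3f}; paired t P = {t_p:.3f}; "
      f"bias = {bias:+.2f}% (SD {sd:.2f}%)")
```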

Results

The 37 physicians (100% of LAPA members) who participated in the first year of our study included 11 assistant professors, 22 associate professors, and 4 full professors. Although we analyzed the baseline results of all 37 physicians, we had three consecutive assessments for only 33 physicians, due to maternity leaves and attrition. Most physicians were men (22; 59%), and 10 were older than 54 years (28%). Mean (standard deviation [SD]) age at the end of the study was 48.4 [8.3] years (range = 34–61). Most physicians (25; 68%) were in the clinician teacher role category (see Table 1).

Year 1 results

We included the results of all 37 members in the baseline assessment. The mean total score was 7.44 (range = 3.9–11.3; see Table 2). Means [SDs] differed significantly by academic rank (assistant: 7.26 [0.85]; associate: 8.95 [1.60]; and full professor: 11.15 [0.78]), and full professors had higher scores (P < .0001, one-way ANOVA).

Table 2:
Distribution of Performance Scores for 37 Physicians, University of Western Ontario, 2008 to 2011*

Years 2 and 3 results

The Years 2 and 3 results were highly correlated with those of Year 1 (r = 0.85 between Years 1 and 2; r = 0.89 between Years 1 and 3). In Year 2, the first year of performance-based remuneration, mean weighted scores did not differ significantly from scores in Year 1 (see Table 2). Our Bland–Altman analysis comparing Years 1 and 2 revealed a mean (SD) bias of +1.966% (19.17%), which was not significantly different from zero. After two years of performance-based remuneration, we still found no significant change in the mean weighted scores. We found an even closer agreement between the Years 2 and 3 total scores, with a mean (SD) bias of +0.1778% (1.036%). Overall, we found no significant improvement in scores between the baseline year and the following two years (see Table 3). However, we found significant differences between the subgroups. Assistant professors’ scores improved significantly between Years 1 and 2 (mean improvement +1.08, P < .001; see Figure 1).

Table 3:
Mean Performance Scores for 37 Physicians, by Role Categories and Domains, University of Western Ontario, 2008 to 2011*
Figure 1:
Changes in overall weighted performance scores between Years 1 and 2 for 37 physicians, by academic rank, in the Department of Pediatrics at the University of Western Ontario, 2008 to 2010. The bars indicate means and standard deviations.

Factors affecting change in overall scores

When we analyzed whether a physician’s baseline score was related to the change in his or her scores between Years 2 and 3, we found a negative correlation (r = −0.6, P < .001); lower Year 1 scores were correlated with greater improvement in scores between Years 1 and 2. Age was also negatively correlated with the change in scores between Years 2 and 3: whereas younger physicians’ scores improved significantly, older physicians’ scores actually worsened (r = −0.5, P < .001).
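A brief sketch of this correlational analysis follows, on hypothetical data; the Pearson correlation below stands in for the standard regression analysis named in the Method section.

```python
# Do baseline score and age predict the change in scores?
# Hypothetical values chosen only to illustrate the expected pattern
# (negative correlations), not the study data.
import numpy as np
from scipy import stats

baseline = np.array([6.1, 7.4, 8.9, 10.2, 11.0])
change = np.array([1.2, 0.6, 0.1, -0.3, -0.5])  # later score minus baseline
age = np.array([36, 42, 48, 55, 60])

r_base, p_base = stats.pearsonr(baseline, change)
r_age, p_age = stats.pearsonr(age, change)
print(f"baseline vs change: r = {r_base:.2f} (P = {p_base:.3f})")
print(f"age vs change:      r = {r_age:.2f} (P = {p_age:.3f})")
```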

Discussion

We developed an assessment tool to evaluate the academic performance of physicians in clinical practice, education, research, and administration that can accommodate differing percentages of time spent in each domain. The high degree of agreement between Years 2 and 3 scores suggests that our assessment tool and the resulting performance scores are a robust and feasible measurement system for use in our department. We found no evidence of significant improvement in scholarship scores, except among assistant professors, who had lower baseline scores. These results contradict earlier findings from another institution.4

Strengths and limitations

A considerable strength of our study was the high degree of agreement between physicians and their section heads in assigning the scores. We believe that the inclusion of specific anchors to define each score made the tool robust and reliable, as well as adaptable to a large variety of domains. Thus, it may be of value for many clinical departments, not just pediatrics.

However, our study also had several important limitations. First, the lack of an overall increase in scores may simply reflect an assessment tool that is too insensitive to variations in performance, rather than an absence of variation. Second, change takes time; two years may not have been long enough for marked changes to be reflected in performance scores. The Department of Pediatrics has continued to use this assessment tool, so with additional time the scores may yet detect significant improvements in performance. Third, the LAPA incentives were equal to about 10% of each physician’s contracted remuneration. Although some studies found that this amount would improve productivity, it may not have been enough, given that other studies recommend 20% to 30% for such programs.4,14 Fourth, the distribution of role categories may have confounded our results: clinician teachers constituted two-thirds of our sample, the sample included no clinician educators, and the remaining 12 physicians were spread across the other role categories. Finally, no one used the process for resolving discrepancies between a physician’s self-assigned score and the score assigned by the section head or department chair. Although raw scores were sometimes adjusted higher or lower on the basis of a supervisor’s feedback, these adjustments were never contentious. The power differential between a physician and his or her supervisor may have prevented the physician from invoking the conflict resolution process, similar to what may occur between learners and teachers in medical education.15 We believe, however, that this did not occur, as physicians had to provide detailed information about their work to substantiate a score of 3.

Additional considerations

Over the past decade, offering financial incentives to physicians for achieving certain goals has become more common in hospital employment arrangements.16 However, the impact of these financial incentives on physicians’ performance and productivity is unclear and continues to be debated in the literature.5,16 The factors motivating physicians at academic health sciences centers to improve their scholarly productivity must be better understood before leaders can develop appealing incentives.

In addition, no universal design or consistent methodology exists for creating financial incentive plans.16,17 Approaches to allocating financial rewards vary from a chairperson’s subjective decision to the use of flexible or static scoring tools.10

The level of compensation also varies, ranging from as low as 1.5% to as high as 87% of a typical salary.16,17 Bluth14 suggested that, to be effective, incentives must generally be 20% to 30% of contracted compensation. The Vanderbilt study found that an additional 20% compensation attached to clinical and administrative work increased performance by up to 73%, depending on the role category and productivity in clinical care, research, and teaching.4 However, studies of other incentive programs with 10% or less compensation attached to physicians’ performance and productivity also revealed improvements in the quality of care delivery, research, teaching, mentoring, and administrative tasks.10,14,18 At present, financial incentive plans appear to be based on the amount of money available to individual health care organizations and the particular nuances of individual clinical departments, rather than on purposefully developed, evidence-based plans.

We found few examples of nonfinancial incentive programs aimed at stimulating physicians’ performance and productivity. Emery and Gregory10 described one academic department of orthopedics with no specific compensation for academic productivity. The incentive model instead was based on the distribution of academic tasks that, according to Emery and Gregory,10 “[enabled] people to contribute to the academic mission in a fashion that played to their individual strengths.” The department chair and peer pressure toward academic success supported a culture that valued academic productivity. The internal motivation of department physicians then reduced the need to financially reward academic work.10 Interestingly, a survey of the physicians in the study by Emery and Gregory10 revealed that departmental culture was the most important factor driving scholarly activity.

The type of incentive matters. An effective reward should be substantial enough to get the attention of the physician yet remain within the range of competitive practices and not discourage other desirable behaviors, such as providing excellent patient-centered care.10 Support from all members of the department is required, and the incentives should promote excellence in patient care and academic scholarship rather than average work. In our department, all members agreed on the scoring sheet criteria. Outcome variables that are unclear, or incentives that reward tasks the department members do not support, may deter staff innovation, impede performance, and lead to a culture in which quality improvements are treated with indifference.19 Academic departments, like other workplaces, are not homogeneous. One-third of our department, for example, is older than 55 years, and most of the newer physicians are young and have different needs, especially with regard to work–life balance. These generational and cultural differences pose an additional challenge to designing incentive systems. A system with rigid incentives may fail if generational and cultural beliefs are not considered.20

Finally, the performance gap among associate professors is worrisome. Possible explanations include a lack of appropriate mentoring, misalignment between promotion requirements and physicians’ focus areas, clinical workload, and age-related slowing, but we do not know for certain. Future research into the absence of change in this group’s performance should examine their skill sets, dedication, and capacity to succeed while ensuring that no barriers to improvement exist.

McAllister and Vandlen21 argue that “communicating and coaching between managers and staff breed achievement in real time,” meaning that feedback must be timely and responsive; such recognition alone may foster productivity. In response to lower-than-expected performance results, one company adopted a smaller number of goals that were aligned with the organization’s mission and tailored to individually meaningful and measurable targets.22 Whether these business models can be adapted to clinical academic departments remains uncertain. Key to improving productivity are a reliable tool to measure individuals’ progress and a full exploration of nonfinancial incentives before financial incentives are considered; the latter may require significantly more funding while failing to reward what truly motivates physicians.

Conclusions

The physicians in the Department of Pediatrics at the University of Western Ontario responded favorably to our assessment tool. Although physicians with lower baseline scores tended to improve their scores over time, the promise of remuneration for improvements in performance was not associated with score increases.

Acknowledgments: The authors thank the faculty members of the London Academic Pediatric Association practice plan, Department of Pediatrics, Schulich School of Medicine & Dentistry, University of Western Ontario, for their willingness to participate in this study. They are particularly grateful to Dr. Doreen Matsui for her assistance in preparing the ethics submission and reviewing the ethical considerations of this work. The authors also thank Dean Michael Strong, University of Western Ontario, for his important discussions of and contributions to this article, and Professor Alfred Drukker, Jerusalem, for his valuable discussion and assistance in preparing the revisions.

References

1. Wartman SA. Toward a virtuous cycle: The changing face of academic health centers. Acad Med. 2008;83:797–799
2. Levine AS, Detre TP, McDonald MC, et al. The relationship between the University of Pittsburgh School of Medicine and the University of Pittsburgh Medical Center—a profile in synergy. Acad Med. 2008;83:816–826
3. Souba W, Notestine M, Way D, Lucey C, Yu L, Sedmak D. Do deans and teaching hospital CEOs agree on what it takes to be a successful clinical department chair? Acad Med. 2011;86:974–981
4. Tarquinio GT, Dittus RS, Byrne DW, Kaiser A, Neilson EG. Effects of performance-based compensation and faculty track on the clinical activity, research portfolio, and teaching mission of a large academic department of medicine. Acad Med. 2003;78:690–701
5. Golden BR, Hannam R, Hyatt D. Managing the supply of physicians’ services through intelligent incentives. CMAJ. 2012;184:E77–E80
6. Wharam JF, Paasche-Orlow MK, Farber NJ, et al. High quality care and ethical pay-for-performance: A Society of General Internal Medicine policy analysis. J Gen Intern Med. 2009;24:854–859
7. Woodson SB. Making the connection between physician performance and pay. Healthc Financ Manage. 1999;53:39–42, 44
8. Patel PH, Siemons D, Shields MC. Proven methods to achieve high payment for performance. J Med Pract Manage. 2007;23:5–11
9. Christianson JB, Knutson DJ, Mazze RS. Physician pay-for-performance. Implementation and research issues. J Gen Intern Med. 2006;21(suppl 2):S9–S13
10. Emery SE, Gregory C. Physician incentives for academic productivity. An analysis of orthopaedic department compensation strategies. J Bone Joint Surg Am. 2006;88:2049–2056
11. Shaw EK, Howard J, Etz RS, Hudson SV, Crabtree BF. How team-based reflection affects quality improvement implementation: A qualitative study. Qual Manag Health Care. 2012;21:104–113
12. Iyengar R, Wang Y, Chow J, Charney DS. An integrated approach to evaluate faculty members’ research performance. Acad Med. 2009;84:1610–1616
13. Ackerly DC, Sangvai DG, Udayakumar K, et al. Training the next generation of physician–executives: An innovative residency pathway in management and leadership. Acad Med. 2011;86:575–579
14. Bluth EI. An incentive system for radiologists in an academic environment. J Am Coll Radiol. 2007;4:332–334
15. Plaut SM, Baker D. Teacher–student relationships in medical education: Boundary considerations. Med Teach. 2011;33:828–833
16. Butcher L. Financial incentives for employed physicians: Do they work? Physician Exec. 2010;36:18–21
17. Johnson J. Do you know the fair market value of quality? Healthc Financ Manage. 2009;63:52–58, 60
18. Reece EA, Nugent O, Wheeler RP, Smith CW, Hough AJ, Winter C. Adapting industry-style business model to academia in a system of performance-based incentive compensation. Acad Med. 2008;83:76–84
19. Kaplan RS, Porter ME. How to solve the cost crisis in health care. Harv Bus Rev. 2011;89:46–52, 54, 56
20. Latham GP. The motivational benefits of goal-setting. Acad Manage Exec. 2004;18:126–129
21. McAllister RB, Vandlen CE. Motivating employees in R&D. Cornell HR Review. 2010:1–6
22. Shaw KN. Changing the goal-setting process at Microsoft. Acad Manage Exec. 2004;18:139–142

Appendix 1

Performance Criteria and Scoring Sheet for a New Assessment Tool to Measure Physicians’ Academic Productivity in a Performance-Based Remuneration System, University of Western Ontario, 2008*


© 2014 by the Association of American Medical Colleges