Holmboe, Eric S. MD; Ross, Kathryn MBE
Dr. Holmboe is chief medical officer and senior vice president, American Board of Internal Medicine, Philadelphia, Pennsylvania.
Ms. Ross is research associate, American Board of Internal Medicine, Philadelphia, Pennsylvania.
Correspondence should be addressed to Dr. Holmboe, American Board of Internal Medicine, 510 Walnut St., Suite 1700, Philadelphia, PA 19106; telephone: (215) 446-3606; e-mail: email@example.com.
Editor’s Note: This is a commentary on Wright C, Richards SH, Hill JJ, et al. Multisource feedback in evaluating the performance of doctors: The example of the UK General Medical Council patient and colleague questionnaires. Acad Med. 2012;87:1668–1678.
We commend the rigorous psychometric analyses by Wright and colleagues1 of a multisource feedback (MSF) approach involving patient and colleague questionnaires as part of a revalidation program in the United Kingdom. Wright and colleagues note the growing interest in the use of MSF to evaluate physicians, and regulatory bodies in North America—most notably the American Board of Medical Specialties (ABMS) along with several provincial medical authorities in Canada—are also using patient, peer, and other health care provider surveys as part of recertification and evaluation programs for practicing physicians.2,3 Wright and colleagues note that the psychometric analyses demonstrated that the surveys were sufficient for formative assessment. However, it is necessary to define a threshold of suitability when an assessment tool, such as MSF surveys, is used for formative purposes in a program of assessment for practicing physicians. The issue of what is “good enough” is further complicated when the agency delivering the assessment sits within a regulatory system, regardless of whether the agency is governmental, such as the Canadian provincial medical authorities, or independent, such as the UK General Medical Council and the professional self-regulatory certification boards of the ABMS.
What Constitutes “Good Enough?”
In defining “good enough,” we must ask two key questions: for whom and for what purposes? Identifying the end users of the assessment is critical. The ultimate end users, in some form, are the patient and the public. Even if the results of such assessments are not publicly transparent or shared with the patient and public, a logic model must be clear to ensure that the results lead to improved quality and safety. For example, if the primary purpose for the regulatory agency is the identification of poor outlier performance (a “bad apples” approach), the psychometric results reported by Wright and colleagues may be sufficient for that specific purpose, assuming the regulatory agency has in place mechanisms to act on the results. This latter point is also essential; lack of a mechanism to act on the results, especially when poor and potentially dangerous performance is detected, should be unacceptable to the public and, from an institutional professionalism perspective, to regulatory agencies.
The Utility of MSF
The findings of Wright and colleagues reveal a common conundrum for MSF surveys. The authors found that the instruments in this study had sufficient reliability for formative purposes; that is, the reliability coefficients did not meet the benchmark for summative, high-stakes assessments that allow for discrimination between different levels of performance among physicians. At the same time, the results reveal common rating problems, such as range restriction and ceiling effects, in addition to an apparent lack of useful comments from participants. The instrument thus meets threshold psychometric criteria but provides results that are not especially useful for physicians in making changes in their practice. Formative assessment is usually defined as assessment for learning, in contrast to summative assessment that targets assessment of learning. Effective formative assessment should occur on a regular and frequent cycle, is usually viewed as lower-stakes because it should focus on key tasks, and should be specifically tailored to the individual.4
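To make the rating problems named above concrete, the following is a minimal sketch of how a ceiling effect and range restriction might be summarized for one physician's ratings. It is an illustration only, not the analysis used by Wright and colleagues: the 1-to-5 scale, the sample ratings, and the function names `ceiling_share` and `range_used` are all hypothetical assumptions.

```python
from statistics import mean

def ceiling_share(ratings, scale_max=5):
    """Fraction of ratings that sit at the top scale point (ceiling effect)."""
    return sum(1 for r in ratings if r == scale_max) / len(ratings)

def range_used(ratings, scale_min=1, scale_max=5):
    """Fraction of the available scale width the ratings actually span."""
    return (max(ratings) - min(ratings)) / (scale_max - scale_min)

# Hypothetical patient ratings for a single physician on an assumed 1-5 scale.
ratings = [5, 5, 4, 5, 5, 5, 4, 5, 5, 5]

print("mean rating:", mean(ratings))
print("share at ceiling:", ceiling_share(ratings))
print("share of scale used:", range_used(ratings))
```

When most ratings cluster at the top of the scale and only a small part of the scale is used, as in this invented sample, the scores leave little room to distinguish one physician from another or to show where a given physician could improve.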
Norcini and colleagues’5 recent work builds on van der Vleuten’s concept of utility to define criteria for good assessment tools. Although reliability and validity are important, Norcini and colleagues added the concept of the “catalytic effect,” meaning that an assessment process, especially one designed for formative purposes, should be able to drive future learning forward. This is a core principle behind quality improvement approaches such as the plan–do–study–act cycle: Data about performance in practice—in this case, MSF surveys—should enable physicians to make meaningful changes in their practice, such as more effective shared decision making and counseling, to improve the quality and safety of the care they deliver to their patients. Yet, when assessment results are highly skewed without useful comments or guidance, the overall utility of the assessment can be quite low for physicians despite the instrument’s sufficient psychometric properties.
Another major limitation of MSF surveys from a formative assessment perspective is that, even when survey results detect a signal about a potential problem, the physician must engage in additional fact finding that requires substantial skills in quality improvement, systems, and communication science. Most practicing physicians’ formal training did not include the set of skills required for this additional assessment step; thus, the process of improving “the ratings” can be difficult and frustrating. How can these surveys be constructed to provide more specific guidance about what to improve in a physician’s practice?
Incorporating a Meaningful “Patient Voice” in MSF
Although it has received little attention over the years, clinimetrics may be one way to improve the utility of assessment processes. Clinimetrics is a methodology that focuses on the quality and rigor of measurement from a clinical perspective. Concato and Feinstein6 applied this approach in an outpatient clinic and used a simple, open-ended, clinimetrically based survey that asked patients three questions about the care they receive at the end of their clinic visit: “What do you like most, what do you like least, and what one thing would you like to see changed?” In addition to providing substantial and rich actionable narrative feedback about the patient’s experience, this simple clinimetric approach also uncovered many aspects of care not detected by the validated psychometrically based instrument being concurrently used by the institution.
In the experience of one of the authors of this commentary (E.H.), using a combined psychometric and clinimetric approach to assess patient experience in a military general internal medicine clinic was more effective than a purely psychometric approach in facilitating quality improvement. Furthermore, a clinimetric approach enhances the patient-centeredness of an assessment process by giving patients the opportunity to have their specific concerns heard. When those concerns form a theme across patients, the results provide powerful information to the physicians and practice about what is and is not working. In essence, although comments from clinimetric-based surveys can be analyzed for common themes, each survey also serves as an important “n of 1” patient-centered study for individual physicians. By contrast, simply asking the patient to “please add any other comments you want to make about this doctor” is not likely to be very helpful.1
Formative Assessment and Medical Regulation
Medical regulatory bodies face additional challenges when using formative assessments as part of their assessment programs. By definition, medical regulatory bodies’ primary constituency is the public, yet these bodies must work through health care providers to achieve their mandate and goals. Therefore, any required assessment has to be seen as credible to both the public and the professional. It is our view that this very public-facing position tends to drive the narrow focus on the psychometric characteristics of formative assessments such as MSF, at times to the detriment of the overall utility of the assessment process. Recognizing this challenge for professional medical regulation, the American Board of Internal Medicine convened an international group of experts in MSF from medicine and other occupations in February 2012 to better understand how and why MSF approaches and tools might be incorporated in comprehensive, regulatory-based assessment programs, such as revalidation and maintenance of certification.
Consistent with the conclusions of Wright and colleagues, the consensus opinion of this group was that MSF approaches should be used at this time only for formative purposes, even in the context of regulatory-based assessment processes. However, the most important conclusion was the recognition that such agencies need to define with much greater clarity the intended and specific purposes of MSF. Is the purpose to promote lifelong learning? If so, what does that look like in a program of recertification and revalidation? Is it to “raise all boats” to improve quality and safety across physicians, or is it focused on detecting the potentially harmful outlier physician, or both? Is it to raise awareness only, or to serve as a catalyst to improve practice? As we have argued above, we believe the catalytic effect is crucial to successful formative assessment and should involve all physicians participating in an MSF process. The specific purpose of MSF might be different across specialties depending on the nature and context of practice, but some core components will likely be relevant for most physician specialties.
The group also agreed that MSF should be a component of a comprehensive strategy to assess the quality of care a clinician delivers. Instruments that provide a reflective framework for the physician are likely to be helpful, but how best to guide and facilitate such reflection is not well understood, and we need more work to learn how MSF drives future learning. In addition, the group concluded that systems need to be designed to enable efficient collection, analysis, and dissemination of the MSF reports back to physicians. More important, data suggest that physicians should review their results with either a coach or trusted peer as a “best practice” for MSF. Review and reflection with a coach or trusted peer can enhance the effect of the MSF data and help physicians develop meaningful improvement plans, assuming they have actionable data to inform the process.
Striking the Right Balance in MSF
MSF can be a useful approach for formative assessment, but to reach its full potential it requires (1) clarity of purpose and of the nature of the information collected, with specific attention to the catalytic effects that drive improvement, (2) better use of comments and open-ended questions to detect contextually rich local issues, and (3) greater attention to how the data are delivered to and processed by the physician. Both government and independent professional self-regulatory bodies walk a difficult tightrope in developing instruments that are psychometrically credible to both physicians and the public yet truly drive improvements in quality and safety. In the end, however, a formative assessment approach is only as good as the quality of care it detects and improves for the benefit of patients and the public.
Other disclosures: Both authors are employed by the American Board of Internal Medicine, and Dr. Holmboe receives royalties from Mosby-Elsevier for a textbook on assessment.
Ethical approval: Not applicable.
1. Wright C, Richards SH, Hill JJ, et al. Multisource feedback in evaluating the performance of doctors: The example of the UK General Medical Council patient and colleague questionnaires. Acad Med. 2012;87:1668–1678
2. Lipner RS, Blank LL, Leas BF, Fortna GS. The value of patient and peer ratings in recertification. Acad Med. 2002;77(10 suppl):S64–S66
3. Hall W, Violato C, Lewkonia R, et al. Assessment of physician performance in Alberta: The physician achievement review. CMAJ. 1999;161:52–57
4. Rudolph JW, Simon R, Raemer DB, Eppich WJ. Debriefing as formative assessment: Closing performance gaps in medical education. Acad Emerg Med. 2008;15:1010–1016
5. Norcini J, Anderson B, Bollela V, et al. Criteria for good assessment: Consensus statement and recommendations from the Ottawa 2010 Conference. Med Teach. 2011;33:206–214
6. Concato J, Feinstein AR. Asking patients what they like: Overlooked attributes of patient satisfaction with primary care. Am J Med. 1997;102:399–406