In recent years, the professional regulatory authorities in many countries have engaged in the difficult process of ensuring the ongoing competent clinical performance of registered and practicing medical practitioners who have completed their training.1 To this end, they have developed performance assessments with the dual aim of protecting patients and enhancing the clinical performance of doctors. These assessments are often linked to the recertification or revalidation of practitioners.2 Three levels of performance assessment are recognized.3 Level 1 screens either an entire population or a random sample of doctors. Level 2 targets groups whose personal attributes or practice profile place them at risk of poor performance; examples might include older doctors or those working in relative professional isolation. Level 3 assesses doctors about whom there are specific concerns. Such doctors might be identified through a general screening program, by complaints, by aberrant prescribing practices, or by some other review mechanism (e.g., a coroner's investigation or an internal hospital audit).
Because the various performance assessments are based on fundamental and widely accepted educational principles, they tend to have much in common. Yet historical, political, financial, philosophical, and sociological forces also inevitably shape the development of performance assessment. The result is a rich diversity in the approaches taken and in the models developed in different countries, and even by different authorities within the same country.
The 10th Cambridge Conference on Medical Education was held in Sheffield, United Kingdom (UK), in 2001, with “Assessing the Performance of Doctors in Practice” as its theme. This conference, together with a follow-up meeting in Wellington, New Zealand, later that year, brought together many of those engaged in the practical aspects of developing and maintaining performance assessments throughout the English-speaking world. It became clear to those attending that much can be learned by comparing the similarities and differences among the various approaches to performance assessment. A network of programs was subsequently established under the collective name of the International Performance Assessment Coalition.
In this report, we describe and compare 16 performance assessments (Levels 1–3) that have been successfully introduced by 11 regulatory bodies in four countries (Canada, Australia, New Zealand, and the UK) on three continents. Specifically, we identify the responsible organizations and outline how they are supported statutorily and financially, how they are administered, the number and type of assessments undertaken, staffing and budget details, referral mechanisms, assessment methods, and program evaluation. We also outline some of the issues with which the various programs are currently grappling. We then attempt to identify those factors that have contributed to the successful introduction of these programs.
The data contained in this report were supplied by some of those attending the 10th Cambridge Conference on Medical Education in Sheffield, UK, in July 2001 and/or the follow-up meeting in Wellington, New Zealand, in November 2001. The informants and the organizations that they represent are listed in the Appendix. All informants occupy senior positions in their respective organizations, and all have a detailed knowledge of their organizations' work on performance assessment. The collected data relate only to systems of ongoing assessment of practitioners who have completed their training. In view of the fundamental differences between certification and recertification,4 we specifically excluded assessments of undergraduate students and of postgraduate trainees who are undergoing or have just completed their training.
The information was primarily obtained through a standardized written questionnaire that was completed in advance by all those attending the Wellington meeting. Where necessary, this information was supplemented by direct contact between the responsible person and one or other of the four authors. Some informants provided descriptions and analyses of their programs from the published literature.5–10 A draft of this report was circulated to all the program contacts listed in the Appendix so that any inaccuracies could be corrected.
Responsible Organizations and Types of Programs
The 16 Level-1, Level-2, and Level-3 assessments and the 11 responsible organizations are listed in Table 1. The number of doctors enrolled varies widely, from 120 to 200,000. Collectively, the organizations offer three Level-1 assessments that screen populations of doctors, five Level-2 assessments that target specific groups, and eight Level-3 assessments that deal with doctors about whom there are specific concerns. Some organizations provide a range of assessments. For example, the Physician Achievement Review Program of the College of Physicians and Surgeons of Alberta, Canada, conducts both Level-1 and Level-2 assessments, whereas the Practice Enhancement Division of the Collège des médecins du Québec conducts assessments at all three levels. The College of Physicians and Surgeons of Ontario, Canada, offers two Level-3 programs, one for doctors in general/family practice and the other for doctors in specialty practice. The Level-3 performance assessments conducted by the General Medical Council in the UK have two phases: the first is conducted by a team of at least three assessors at the doctor's place of work, and the second at designated centers.
As might be expected, Level-1 programs assess many more doctors (some 2,000 per year in the case of the Practice Enhancement Division of the Collège des médecins du Québec), whereas some Level-3 programs assess as few as 15 physicians per year. Among the Level-3 programs, the performance assessment of the General Medical Council in the UK undertakes the largest number of assessments: 160 in 2001.
Administration, Statutory Support, and Legal Challenge
All of the performance assessment programs are supported by legislation regulating the practice of medical professionals. Most operate under the auspices of a licensing authority or, in the case of the Cambridge Peer Review Program, of the hospital in which the physician practices; that is, under a body with the authority to grant or revoke licenses or privileges. Universities conduct some of the Level-3 assessments, and the results are forwarded to the licensing authority with the signed consent of the participating physician. This arrangement is a deliberate attempt to keep the assessment at arm's length from the licensing authority, so as to promote the program's focus on “What can we do to help you be a better doctor?” and to minimize feelings of “They [i.e., the licensing authority] are out to get me.”
Many of the Level-3 assessments have an appeals process in place. For example, the College of Physicians and Surgeons of Ontario uses a review panel consisting of peer colleagues and public appointees to the college. In November 2001, when we collated these data, only one of the programs had faced a legal challenge, although others felt that it was only a matter of time before this occurred. The single existing legal challenge related to the successful appeal to the Privy Council in the UK by a doctor who had been suspended following a performance assessment undertaken on behalf of the General Medical Council.
Funding and Staffing Levels
In most Level-1 assessments, funding comes from the licensing authority through annual member dues. In contrast, many Level-3 assessments operate on a cost-recovery basis, with all or a significant proportion of the total expense being the responsibility of the participating physician.
Staffing levels vary according to the scope and size of the assessment program. In addition to their regular administrative and support staff, many programs employ practicing physicians as assessors when needed. Budgets, shown in Table 1, also vary widely with the size and scope of each program, which makes direct comparison of the figures difficult.
In the main, Level 1 assesses doctors selected at random, Level 2 targets “at risk” groups, and Level 3 assesses doctors about whom concerns have been reported to the regulatory body. Of the five Level-2 assessments, the Practice Quality Review of the Royal Australasian College of Physicians is a voluntary program that fellows may undertake as an alternative to the mandatory Continuing Medical Education activities that the college otherwise requires of them. The Saskatchewan Practice Enhancement Program in Canada selects physicians by stratified random sampling that intentionally oversamples older physicians. The Peer Assessment Program of the College of Physicians and Surgeons of Ontario, Canada, is a mandatory program of peer review covering both randomly selected physicians and targeted older physicians (over the age of 70) practicing in Ontario.
Table 2 summarizes the tools used in the three Level-1 and five Level-2 assessments. Level-1 programs screen large numbers of doctors and tend to rely on questionnaires and practice profiles to assess performance; the questionnaires are mainly completed by colleagues, coworkers, and patients. The Level-1 Cambridge Peer Review Program, however, is hospital based and has relatively few participants, and it therefore uses assessment methods more commonly associated with Level-3 assessments. All of the Level-2 assessments involve direct contact between the practitioner and the assessment team, and all include a physician interview, a chart review, and a site inspection.
Table 3 summarizes the tools used in the eight Level-3 assessments. In all, 17 different assessment tools are in use, although some of them (e.g., written clinical examinations and written therapeutics examinations) are similar. Only five tools are used by four or more of the Level-3 assessments. In order of popularity, these are case discussion, physician interview, chart review, standardized/simulated patients, and direct observation.
Validation and Evaluation of Programs
Some larger programs (e.g., the Physician Achievement Review Program, College of Physicians and Surgeons of Alberta, Canada) pilot test their survey instruments or statistically analyze their instruments and results. Most programs regularly modify and update their materials, and this makes it difficult to analyze them longitudinally. Some Level-3 assessments (e.g., Clinician Assessment and Professional Enhancement Program of Manitoba and the Physician Review Program of the College of Physicians and Surgeons of Ontario, both in Canada) validate their assessments using randomly selected community physicians (known as “criterion” physicians in Ontario) who are not considered to be performing poorly.
The programs identified a wide range of ongoing issues with which they were dealing when we collected the data in 2001. For example, the Medical Council of New Zealand, the College of Physicians and Surgeons of Ontario (Canada), and the Saskatchewan Practice Enhancement Program (Canada) identified the challenges of recruiting and training adequate numbers of assessors and of ensuring consistent standards between assessors. The Clinical Competence Program of British Columbia (Canada) and the New South Wales Medical Board (Australia) were developing more sophisticated systems for evaluating communication skills. Finding resources for physician enhancement in areas of need was a concern for the Colleges of Physicians and Surgeons of Alberta and of Ontario and for the Clinician Assessment and Professional Enhancement Program of Manitoba.
Our report provides a snapshot, rather than a comprehensive account, of some of the performance assessment programs currently operating in different parts of the English-speaking world. The individuals providing data were a self-selected group of enthusiasts, and the programs they represent are at the cutting edge of performance assessment for practicing physicians in their respective countries. Our report highlights the fact that licensing authorities everywhere are grappling with broadly similar issues related to the organization and conduct of assessments, including deciding on the type of assessments to be conducted, developing referral mechanisms, selecting appropriate assessment tools, selecting and training assessors, ensuring supportive legislation, developing an organizational structure, and procuring funds. Organizations are also actively locating or developing the educational resources needed for remediation to address identified needs appropriately and effectively.
The first important observation is that the various programs operate on three distinct levels. These differ fundamentally in their aims and objectives, the numbers of doctors enrolled, referral mechanisms, assessment methods, and assessment outcomes. For example, Level-1 assessments that screen an entire population or a random sample of doctors provide formative rather than summative assessments of performance and are therefore relatively “low-stakes” assessments. By contrast, Level-3 assessments target relatively small numbers of doctors identified as possible poor performers. Here, the assessments are “high-stakes” and must be as rigorous as any other summative assessment process. We suggest that the evolving literature on performance assessment should recognize this distinction between the levels of performance assessment. Furthermore, an important first step for those contemplating the introduction of performance assessment is to decide on the level of assessment, because so many subsequent decisions flow from this choice.
The programs we describe in this report are all supported by specific legislation. In most cases, such legislation not only allows the relevant bodies to make participation mandatory but also safeguards the confidentiality of participants, whether they are patients, assessors, or those being assessed. This is particularly important in countries where “freedom of information” legislation has been enacted and where the fear of adverse publicity and/or litigation might deter individuals from participating in performance assessments in an open manner.11 To date, only one of the programs in our study (the performance assessment of the UK's General Medical Council) reported a legal challenge to the outcome of a performance assessment. This challenge was successful and led to the revision of the assessment process. However, other programs anticipate that their assessments will sooner or later be legally tested.
The large variety of assessment tools in use, particularly in Level-3 assessments, suggests that no single tool is considered satisfactory or sufficiently robust on its own. The choice of tools inevitably involves a trade-off between validity and reliability on the one hand and cost and acceptability on the other.12 It is increasingly clear that a battery of assessment tools is necessary, and development work in this area is ongoing.13 Notably, case discussion, physician interview, chart review, standardized/simulated patients, and direct observation were the most popular assessment tools in the programs we reviewed. A related issue is setting standards for the assessment of a doctor in practice (rather than of one who has just completed training and is sitting a certifying examination), given the wide variation in practice setting and scope even among doctors in the same clinical specialty.
The validation of assessment processes and tools is a common concern. The aim is to develop high-quality assessments that are valid, legally defensible, and sustainable within the limits of available human and financial resources.5 Currently, programs tend to guard against the possibility of a legal challenge by ensuring the reliability and validity of their assessments and by communicating well with participants so that their processes are seen to be fair and transparent.
In this report, we have attempted to identify the essential elements of successful performance assessment programs. First, programs need to be adequately equipped to carry out their task. This requires human and financial resources, together with some form of statutory support. Successful programs tend to have a core group of people committed to the principles and process of ongoing assessment of clinical competence and to improving the quality of their assessments. In addition to regular administrative and support staff, many programs employ practicing physicians as assessors to ensure that the process is a true “peer assessment.” Some programs undertake research or validation studies to ensure that their assessments are as valid and reliable as possible and that they themselves remain competent in what they do.
Ensuring the ongoing competent clinical performance of practicing doctors presents many challenges. The public, its elected representatives, employers, and doctors themselves may all have expectations and anxieties that need to be reconciled.14 Those involved in performance assessment at various levels have much to learn from each other, for example, about which assessment tools are most effective in which situations, strategies for encouraging buy-in from the local physician population, difficulties encountered and overcome, and problems faced and resolved. By working together, we can improve the chances that each of our programs will succeed.