The literature about assessing clinical competence includes many articles on the development, performance, advantages, and disadvantages of various assessment instruments. Epstein and Hundert,1 for example, undertook an extensive review of assessment methods, which they found to be wanting in some respects, and Fowell et al2 identified problems specific to methods used in the United Kingdom. Few researchers, however, have evaluated the overall performance of an assessment scheme. The assumption seems to be that if the individual pieces are fit for purpose, then the system works as a whole. One recent study3 discussed the need to think about assessment instruments as part of an overall assessment system, which itself is a major part of the curriculum. Some researchers and some regulatory bodies4,5 aim to describe frameworks or standards for assessment programs; the regulators include the Accreditation Council for Graduate Medical Education in the United States and, in the United Kingdom, the Postgraduate Medical Education and Training Board (now merged with the General Medical Council [GMC]) and the Quality Assurance Agency for Higher Education. Only a limited number of articles6,7 describe working programs.
Many curricula and assessment systems divide the study of medicine into disciplines with preclinical and clinical content. During a nine-year period (April 2001 to March 2010), we and our colleagues developed and evaluated an assessment system in a new medical school where disciplinary boundaries are deliberately blurred and initial medical training is viewed holistically. In this article, we describe the development approach and decisions, changes made in response to internal and external feedback, and some of the outcomes selected to measure student performance. We hope that by sharing our experiences, we may help other organizations develop and improve their own assessment systems.
History and Context
In 2000, the creation of five new UK medical schools was approved8; among the five was the Peninsula College of Medicine and Dentistry (PCMD), a joint venture between the Universities of Exeter and Plymouth and five local National Health Service (NHS) trusts.
The GMC provides guidance to UK medical schools in the form of “Tomorrow's Doctors,”9 a set of regularly revised medical school standards. GMC policy requires each education program to consist of both “core” and “student-selected” components. Students' performance on both components must be assessed.
Medical training in the United Kingdom is largely based on a five-year program that most students enter directly from (high) school at age 18. The new PCMD curriculum was designed to reflect the five-year structure and meet GMC standards. The first students enrolled in September 2002 and graduated in July 2007.
Developing a New Curriculum and System of Assessment
The PCMD curriculum was designed according to the principles of the PRISMS approach, which calls for medical curricula to be practice based, relevant to communities, interprofessional, shorter in length, continuous in nature, and taught across multiple sites.10 The resulting curriculum takes an integrated view of medicine; specific “subjects” or “disciplines,” such as anatomy11 or pharmacology, are not taught via formal, single-topic courses. Rather, learning is based around patient presentations, and structured problem-based learning groups act as the main teaching modality in the first two years. The driving force behind the curriculum is producing students who are optimally prepared for practice in the NHS and who meet GMC requirements as set out in “Tomorrow's Doctors.”9
Curriculum development focused on three key phases: the curriculum for years 1 and 2 (phase 1), the curriculum for years 3 and 4 (phase 2), and the curriculum for year 5 (phase 3). It took place against an overall longitudinal blueprint for the curriculum. Much of the development of the three phases occurred simultaneously; although rollout took place year on year, development was an iterative process.
The assessment system was developed in three phases alongside the curriculum. The PCMD assessment scheme needed to follow the same principles of integration and clinical relevance as the curriculum and to align with the PRISMS model: “Assessment will emphasize ‘doing’ rather than ‘knowing,’ and continuous methods of formative assessment, such as portfolios, will predominate.”10 Assessment approaches also had to provide good feedback to students on a continuous basis and increase in authenticity over the five years of the program. Because we wanted to assess overall program outcomes rather than simply performance in independent “courses,” we kept three final outcomes at the heart of the assessment program: (1) applied knowledge of life and human sciences, (2) clinical skills, and (3) personal and professional development.
Rather than regarding these outcomes as units of teaching, we took the unusual step of defining them as units of assessment. Each unit of assessment became a “module” with associated credits, which allowed us to comply with the usual modular structure of UK undergraduate degrees.12 The assessment modules did not need to link directly to units of teaching; relevant learning for each assessment module could take place in a variety of contexts and experiences, from small-group tutorial sessions to patient encounters.
Our two parent universities created an overarching joint approval and review board (JARB) to manage the quality assurance and course approval process. JARB's interacting committees and processes ensure timely development and annual review of the infrastructure, curriculum, resources, and assessment system. The GMC, through its Quality Assurance of Basic Medical Education (QABME) program13 and annual visits, provided a regulatory overview of the program's quality before granting final approval in 2005.
The PCMD curriculum is delivered at a number of localities—including three district hospitals, two university campuses, and more than 200 primary care practices—across the counties of Devon and Cornwall in southwest England. Students normally move among sites during the five-year program. To ensure consistency and quality of learning experiences at all sites, the curriculum was developed centrally but drew on expertise from all localities; it is now managed centrally but delivered locally. To ensure quality and consistency of assessment across the sites, the assessment program was also devised centrally; like the curriculum, it is managed centrally but delivered locally within agreed quality boundaries. Assessors are trained locally. For quality assurance purposes, it was crucial that local teams not devise their own assessment programs.
Assessment System Design
Dijkstra et al5 propose a general framework for designing assessment systems and provide six areas that assessment designers should consider: goals, program in action, support, documenting, improving, and accounting. Their study had not yet been published when we devised our assessment system, but it serves as a useful model for the retrospective evaluation of our design processes. We will try to link our evaluation to these general areas, but it should be noted that these are not sequential steps; the last area, “accounting,” includes establishing an evidence base and should be one of the first steps in designing a system.
Assessment goals: Purposes and outcomes
The primary purposes (goals) of our assessment system are to
1. improve student learning by providing the student with regular, appropriate, and, most important, timely feedback;
2. evaluate student knowledge, skills, and attitudes;
3. provide a mark or grade that enables a student's performance to be established in relation to program learning outcomes; and
4. provide evidence on the performance of the curriculum.
The primary outcome is to certify PCMD graduates fit for practice in the NHS in accordance with GMC regulations.
It is generally accepted3,14,15 that assessment programs need to take account of validity, reliability, impact, and resources, although different authors may express these slightly differently (e.g., usability, affordability). To implement what we considered to be best practices in assessment in accordance with these needs, we adopted seven principles:
1. Policies and procedures should be informed by the best evidence and relevant educational theory.
2. Assessment should be authentic and relevant to the major curriculum outcomes of clinical competence and professional competence.
3. A “frequent look and rapid remediation” approach should underpin the program.
4. Appropriate standard-setting methods for all assessments should be used.
5. A mixture of continuous, cumulative, and end-point assessments should be included.
6. Multiple sampling should be incorporated.
7. The performance of assessment activities and assessors must be evaluated.
At the time we developed our seven principles, the precepts of the Quality Assurance Agency for Higher Education (which, along with the GMC, regulates UK medical schools) were largely related to the institutional-level governance of assessment. Nevertheless, we considered those precepts, which included specific references to reliability and validity, fairness, and clarity of criteria.16 We did not explicitly include fairness (equity in relation to equality and diversity) and openness to students among our principles because they were already enshrined in the values of the whole medical school and did not need to be reiterated for the assessment system. All of our assessments are analyzed to investigate equity, and some interesting findings have been published, such as evidence showing that students with specific learning disabilities, including dyslexia, are not disadvantaged by multiple-choice tests and that the commonly allowed accommodations enable them to perform up to their capability.17
Establishing the evidence base
Having established our goals and principles, we needed to choose an appropriate mix of instruments using the “best evidence and relevant educational theory” that aligned with the PRISMS model.10 We scoured the literature for best assessment practices that fit with our curriculum structure and assessment goals. This work resulted in a document entitled “Undergraduate Assessment Evidence Base,” in which we provided the justification for the instruments and sampling scheme we selected. We made the document available to the external bodies monitoring the development of our curriculum.
Combining our goals with the evidence base led us to select tools for assessing our three modules. Knowledge of human and life sciences would be assessed in an applied context using patient scenarios; a picture of longitudinal growth would be built using a progress testing approach.18 Clinical skills would be assessed by a mixture of individual procedural assessments and a modified (integrated) objective structured clinical examination (OSCE). Personal and professional development would be assessed by a mixture of portfolios and professional judgments. Table 1 shows the structure of our “frequent look” assessment system, which emphasizes “doing rather than knowing” wherever possible: Faculty observe students doing clinical tasks and judge their behaviors. The system is largely based on continuous assessment; there are few high-stakes, end-of-year examinations. The system increases in authenticity as students progress through the program by basing a large proportion of assessments in the later years on real patient encounters.
Implementing the Assessment System
Implementing the assessment system involved numerous parallel processes. First, we devised regulations for approval by our parent universities; these included appeals processes using approved governance structures and became part of PCMD's academic regulations and codes of practice. As the system developed, we produced a technical manual detailing how the assessment process works (with information on all individual assessment instruments and standard-setting methods) and how results are aggregated to make decisions about student competence. The technical manual, a live document that is revised annually, is available to all students and staff through our managed electronic learning environment and on request from the authors. All changes are approved in a multistage process involving specialist assessment groups, the medical program management committee, the college education committee, and, when necessary, the JARB.
Next, we identified, appointed, and trained relevant staff to take lead responsibility for the assessments described in Table 1, and we ensured that all assessors were trained. We put into place administrative processes that allowed assessment data to be collected, disseminated, and acted on. We also developed assessment blueprints, chose appropriate standard-setting methods, and developed quality assurance processes.
Further, we had to decide how closely the medical school would control the timing of assessments. PCMD's student-centered approach suggested that in some areas, such as clinical procedures, students might be able to decide when they were ready to be assessed.
Because our goal was to create an evidence-based assessment system, we also put into place staff and systems for evaluating our assessment instruments using appropriate psychometric approaches. We therefore appointed a psychometrician to work full-time on assessment analysis for the school. The outcomes of these analyses were fed back into the quality assurance processes of the school through structures such as the PCMD Education Committee and the JARB. They were also made available to external quality assurance agencies, such as the GMC through its QABME process. The evidence the analyses provided allowed us to make informed decisions regarding changes to assessment criteria, assessment instruments, standard-setting methods, and assessor training as our experience with the system matured.
These implementation issues had to be addressed in each of the three phases of development. We took the opportunity to learn from, and to modify, existing processes in each phase. However, we needed to ensure that the assessment contexts increased in authenticity at each stage of training in order to produce new graduates who could operate as junior doctors with real patients at the end of the program. So, whereas much assessment in years 1 and 2 of the curriculum was based on simulated patients or simulated instruments in a safe environment, assessment in year 5 was based on encounters with real patients in the clinical environment whenever possible.
Intervention and Remediation
Because our assessment system provides information at frequent points, faculty are able to review students' progress throughout the year and intervene when any assessment identifies poor performance. We therefore instituted a process of regular academic review that identifies struggling students and those who need additional support. As the review process developed, we came to use the phrase “frequent look and rapid remediation” to describe our assessment program. We wrote procedures for remediation and formed a core “remediation team” that works with students whose performance is of particular concern. We have also undertaken research on the effect of remediation on subsequent student performance.19
The developing assessment system was continually reviewed by PCMD, our parent universities, the GMC, and external examiners. We were open with all our assessment results in aggregate form, including any analyses that suggested that all was not perfect. We successfully engaged our external reviewers as part of the development process, and thus we were able to expand the expertise available to the school.
We benefited from the additional evidence that the external reviewers offered. For example, students who were in the early cycles of OSCE examinations and did not know the general content areas in advance claimed that they were disadvantaged because this information was “leaked” to students later in the cycle by those who had completed the assessment. External reviewers provided evidence that knowing the general content area in advance did not confer an advantage. We were able to confirm this using our own data after the examinations, and now we routinely include this analysis as part of our internal review after such assessments. However, we also decided to give an indication of general station content as preparatory information for all students.
The outcomes from the development of our assessment system are a set of successful assessment modules that perform to the required standard, the ability to identify students able to progress, and research publications contributing to the wider understanding of medical assessment.
As noted above, the system uses continuous assessment (see Table 1), and remedial action is possible at any of the many assessment points. Global assessments on progression are made at the end of every academic year. The system has identified students who are not ready or able to progress from one year to the next. In many of these cases, students were able to repeat the year of study and then make satisfactory progress. Other students left the program voluntarily or were asked to leave. The progression figures for 2006–2009, averaged over three cohorts, are shown in Table 2. As the table highlights, progression standards are slightly more stringent between years 2 and 3 and between years 4 and 5 than at other transition points. This is deliberate; both of these transitions reflect a large increase in contact with and responsibility for patients. The success of our graduates continues to be evaluated, and a recent report20 showed that our graduates felt at least as well prepared for working in the NHS as graduates of other medical schools.
Challenges and Conclusions
The biggest challenge facing any assessment development team is that the system devised by a small number of staff will be tested to destruction by a large number of students. Students are prime stakeholders in any assessment system, and development teams can learn from their reactions to assessment instruments and processes. Responding to unexpected problems can seem like fighting fires, but this important process should be acknowledged more widely as part of improving an assessment program. For example, in response to student requests, we changed the basis of the knowledge test at the end of year 1. Students were concerned that the test, which was blueprinted against the knowledge required of a junior doctor, did not accurately indicate what they had learned in their first year because their exposure to clinical learning was not extensive. In addition to using the existing blueprint, therefore, this early knowledge test now assesses performance toward achieving the learning outcomes generated by the students themselves in their problem-based learning sessions.
Some assessment systems allow students flexibility to choose the timing of certain assessments. Clearly, this cannot occur when all students must sit the same assessment at the same time (e.g., our progress tests). However, for some clinical skills procedures we initially sought to encourage students to understand when they might be competent and to choose the time of their assessment, allowing repeat attempts. It rapidly became clear that this practice was not sustainable. The majority of students chose to leave the assessment as late as possible, creating a bottleneck and adding to staff workloads.
Review of the assessment program continues, and there are ambitions both to improve it and to use it to provide research evidence of value to the medical education community. Challenges for the future include reviewing the assessment system with a view to simplification, using generalizability theory as a common framework to evaluate and gain a deeper understanding of the value of the assessments, and building greater flexibility into the existing system so that assessment decisions can be individualized to particular students.
The authors wish to recognize the valuable input of Professor Lambert Schuwirth, their educational advisor during the development of the assessment program.
The authors also acknowledge the contributions of Paul Bradley, Adrian Freeman, John McLachlan, Charlotte Rees, Judy Searle, John Tooke, and Paul Upton to various stages or parts of the assessment program.