“The distinction between summative and formative is replaced by a continuum of stakes.”
Over the past few years, there has been a growing push in the medical education community for competency-based medical education (CBME), which has created a need for new assessment strategies. We all agree that medicine is complex, and hence no single tool or assessment can test a learner's competency or performance. This has led to the introduction of programmatic assessment (PA): a well-planned, blueprinted series of multiple low-stakes assessments over a period of time, each accompanied by constructive and timely feedback. The pass/fail or promotion decision is made by an expert committee. Using multiple tools and multiple assessors minimizes or eliminates bias. PA drives good learning, whereas traditional assessment may drive poor learning because learners tend to study only what will be assessed.
A clinical analogy for PA is a visit to the doctor for a medical checkup. The doctor will take a history, examine you, and order some tests. When you attend for follow-up, the doctor will sit down with you, discuss your body mass index and lipid levels, advise on lifestyle modifications, ask you to exercise more, and perhaps prescribe a lipid-lowering drug with a good explanation of why it is needed. They will make a follow-up appointment to see how you are progressing and to recheck your weight and lipids. Imagine instead that, after the first examination, they simply sent you a letter with your score of 6/10, like a summative assessment outcome delivered in the mail! In summary, workplace-based assessment is an assessment for learning, while the traditional summative assessment is an assessment of learning.
The reason for introducing PA is that during training all assessors see the learners in action on many different occasions. In each specific setting, the learner is observed and receives feedback; however, this rich data is not collected, recorded, or explicitly used for learning or assessment. PA, on the other hand, helps us to utilize this data and give the learner appropriate credit. This is underpinned by the fact that what a learner knows is only a small part of what a doctor needs to do.
So how do we implement PA in a medical curriculum? As mentioned, it should be well planned and carried out over a sufficient period of time, using assessment tools that focus on the learning process. Over the past two decades, several tools have been developed, checked for validity, and analyzed for reliability. The mini-clinical evaluation exercise (mini-CEX) was developed by Norcini, Blank, and colleagues for the American Board of Internal Medicine. It is an observed clinical encounter between the learner and a patient. As the name suggests, it is a short encounter, and it should be blueprinted to capture various domains such as communication, physical examination, management, and counseling. Each assessment should take 30 min, including immediate constructive feedback. The tool is reliable with 8–10 assessments. Another tool is the case-based discussion (CBD), otherwise known as chart-stimulated recall, which tests record-keeping and clinical reasoning skills. Here, the learner is assessed on the case records of a patient they have managed longitudinally. The assessor reviews the record keeping, looking at the adequacy of communication and the clarity of the records, and tests the learner's clinical thinking by asking about their choice of management plans and test ordering. Again, done in sufficient numbers, the tool is reliable for judging the performance of a learner. A further tool is the Direct Observation of Procedural Skills (DOPS). The choice of procedure depends on the specialty in which the learner is being assessed. It can range from simple tasks such as venipuncture or arterial blood gas collection to more complex tasks such as a lumbar puncture. All these assessments should be done on multiple occasions by multiple assessors to minimize or avoid bias.
Another gap in traditional assessment is that it cannot test day-to-day working requirements such as teamwork. For the best patient outcome, we need a collaborative approach. Multisource feedback (MSF), otherwise known as 360° assessment, has been used in business for a long time and is becoming more popular in medicine. As the name suggests, it is an overall view of the professional over a period of time, given by colleagues who work with them, around them, and above them. Different professionals observe different aspects of performance: what the nurse or social worker observes may differ from what medical colleagues observe. Again, there should be a sufficient number of observations to reach a reliable conclusion. We believe there should be at least 12–14 reports to achieve reliability. Patient assessments are also used in 360° assessment, but they need a larger sample size to achieve reliability.
Why do we combine different assessments in PA? We know that some modes of assessment have more validity and others have more reliability. Moreover, each tool is designed for learning and for assessing a certain set of aspects of the learning process. Together, the tools give a more complete view of the performance of the student. The more (assessment) data are collected, the better the overall judgment. This is like a photograph: the more pixels you have, the clearer the image. There is now a tendency to think no longer in terms of summative or formative. Assessment is a continuum, and the number of assessments depends on how low or high the stakes are.
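The pixel analogy can be made concrete with a small calculation. The sketch below uses the Spearman-Brown prophecy formula, a standard psychometric result that is not the method of the studies cited in this article (those use composite reliability across several tools). It treats repeated observations with a single tool as parallel measurements, and the single-observation reliability of 0.30 is a hypothetical value chosen only for illustration.

```python
def spearman_brown(single_reliability: float, n_observations: int) -> float:
    """Projected reliability of the mean of n parallel observations.

    single_reliability: reliability of one observation (hypothetical input).
    n_observations: number of observations being combined.
    """
    r = single_reliability
    k = n_observations
    return k * r / (1 + (k - 1) * r)


# Hypothetical single-observation reliability, for illustration only.
hypothetical_r = 0.30

for k in (1, 4, 8, 12):
    projected = spearman_brown(hypothetical_r, k)
    print(f"{k:>2} observations -> projected reliability {projected:.2f}")
```

Under these assumptions, the projected reliability rises with each added observation but with diminishing returns, which is consistent with the idea that the number of assessments required depends on the stakes of the decision.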
Much research has been done to find the number of assessments needed to achieve reliability for each individual tool. However, in any program, a large number of assessments may impose a huge burden on the process, assessors, and faculty. The good news is that we can combine several tools and arrive at a reliable judgment with fewer individual assessments. In our workplace-based assessment program, in which all tools are designed to support learning and assessment against a predefined competency framework, we have shown that combining 12 mini-CEX, six CBDs, and 12 MSF assessors gives reliable results. This is far fewer than the number of assessments we would need for each tool separately.
We have shown that this approach is acceptable and feasible for both learners and assessors in our program. The long-term outcome is more favorable than with traditional assessment, and it is cost-effective too.
We have the assessment methodology, tools, and reliability data, and we know that any assessment program should strive for an optimal balance between validity, reliability, educational impact, acceptability, and cost. This balance is expressed in the utility index introduced by van der Vleuten, and it is entirely achievable with PA. So why is it not implemented more widely? When moving from a “traditional” assessment program (assessment of learning) to PA, learners, assessors, and policymakers need to adopt the idea that each assessment is an opportunity to give the learner feedback to improve and guide learning. Many small assessments for learning eventually accumulate to support a reliable decision. Changing the curriculum to a PA approach can be done one small step at a time. Calibrating the assessors is vital for any new assessment strategy: even if we have all the tools and data, it is the assessors we depend on. We may have to seize opportunities as they arise to introduce this too.
Back to the doctor's office: If you have a patient with a borderline elevation of prostate-specific antigen, you take a history, examine him, get an ultrasound or MRI of the prostate, and then organize a biopsy. The urologist takes multiple biopsies, at least 6, to avoid false-negative results. Similarly, in assessing a learner's performance, the more data we use, the more defensible the decision and the fewer the false negatives.
In conclusion, PA meets all the criteria for making a good assessment. What are we waiting for?
Financial support and sponsorship
Conflicts of interest
There are no conflicts of interest.
1. Heeneman S, de Jong LH, Dawson LJ, Wilkinson TJ, Ryan A, Tait GR, et al. Ottawa 2020 consensus statement for programmatic assessment – 1. Agreement on the principles. Med Teach. 2021;43:1139–48.
2. Miller GE. The assessment of clinical skills/competence/performance. Acad Med. 1990;65:S63–7.
3. Norcini JJ, Blank LL, Arnold GK, Kimball HR. The mini-CEX (clinical evaluation exercise): A preliminary investigation. Ann Intern Med. 1995;123:795–9.
4. Norcini JJ, Blank LL, Duffy FD, Fortna GS. The mini-CEX: A method for assessing clinical skills. Ann Intern Med. 2003;138:476–81.
5. Wilkinson JR, Crossley JG, Wragg A, Mills P, Cowan G, Wade W. Implementing workplace-based assessment across the medical specialties in the United Kingdom. Med Educ. 2008;42:364–73.
6. Pelgrim EA, Kramer AW, Mokkink HG, van den Elsen L, Grol RP, van der Vleuten CP. In-training assessment using direct observation of single-patient encounters: A literature review. Adv Health Sci Educ Theory Pract. 2011;16:131–42.
7. Barton JR, Corbett S, van der Vleuten CP; English Bowel Cancer Screening Programme; UK Joint Advisory Group for Gastrointestinal Endoscopy. The validity and reliability of a direct observation of procedural skills assessment tool: Assessing colonoscopic skills of senior endoscopists. Gastrointest Endosc. 2012;75:591–7.
8. Mynors-Wallis L, Cope D, Brittlebank A, Palekar F. Case-based discussion: A useful tool for revalidation. Psychiatrist. 2011;35:230–4.
9. Moonen-van Loon JM, Overeem K, Govaerts MJ, Verhoeven BH, van der Vleuten CP, Driessen EW. The reliability of multisource feedback in competency-based assessment programs: The effects of multiple occasions and assessor groups. Acad Med. 2015;90:1093–9.
10. Donnon T, Al Ansari A, Al Alawi S, Violato C. The reliability, validity, and feasibility of multisource feedback physician assessment: A systematic review. Acad Med. 2014;89:511–6.
11. Moonen-van Loon JM, Overeem K, Donkers HH, van der Vleuten CP, Driessen EW. Composite reliability of a workplace-based assessment toolbox for postgraduate medical education. Adv Health Sci Educ Theory Pract. 2013;18:1087–102.
12. Nair BR, Moonen-van Loon JM, Parvathy M, van der Vleuten CP. Composite reliability of workplace based assessment of international medical graduates. MedEdPublish. 2021;10:104. doi: 10.15694/mep.2021.000104.1.
13. Nair BK, Moonen-van Loon JM, Parvathy M, Jolly BC, van der Vleuten CP. Composite reliability of workplace-based assessment of international medical graduates. Med J Aust. 2017;207:453.
14. van der Vleuten CP. The assessment of professional competence: Developments, research and practical implications. Adv Health Sci Educ Theory Pract. 1996;1:41–67.
15. Wilkinson TJ, Tweed MJ. Deconstructing programmatic assessment. Adv Med Educ Pract. 2018;9:191–7.
16. Torre D, Schuwirth L, Van der Vleuten C, Heeneman S. An international study on the implementation of programmatic assessment: Understanding challenges and exploring solutions. Med Teach. 2022;44:928–37.