Simulation-based mastery learning (SBML) is an instructional approach in which all learners must attain a predetermined skill level in a simulated environment before performing the procedure or skill on actual patients.1,2 In contrast to traditional training methods that limit curricular time and result in variable learner achievement, mastery learning allows for uniformly high educational outcomes with variable time needed for training.2 SBML includes a baseline skills examination (pretest), often using a checklist, followed by didactic content and deliberate practice using a simulator.1,3 At the end of training, all participants are required to meet or exceed a minimum passing standard (MPS) during a posttest skills examination.1,3 Trainees who do not meet or exceed the MPS at initial posttesting engage in additional deliberate practice and testing until the MPS is reached.1,3 In contrast to traditional settings in which the MPS is set at the level of minimal competency, in mastery settings the MPS is set at the level at which learners are well prepared for the next stage of training or practice.
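The train, test, and retest cycle described above can be outlined in a few lines of code. This is an illustrative sketch only: the `learner_test` callable and the retry cap are hypothetical conveniences, not part of any published curriculum.

```python
def mastery_train(learner_test, mps, max_rounds=10):
    """Test, retrain, and retest until the learner meets the MPS.

    learner_test: a callable returning a score in percent (hypothetical
    interface); mps: the minimum passing standard in percent.
    """
    for attempt in range(1, max_rounds + 1):
        score = learner_test()
        if score >= mps:
            # All learners eventually pass; only the time needed varies.
            return attempt, score
        # Deliberate practice with feedback would happen here before retesting.
    raise RuntimeError("exceeded maximum retraining rounds")
```

The cap exists only to keep the sketch total; in a true mastery model there is no penalty for retesting and, in principle, no ultimate failure.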
A defensible MPS must be established to measure learner performance outcomes in mastery learning curricula.1 Historically, the Angoff and Hofstee methods have often been used to set these standards.4–9 Both approaches use a panel of judges composed of experts with experience and knowledge of the subject area and of learners at the relevant level of training. In the Angoff method, an item-based method, expert judges are asked to review each checklist item and estimate the percentage of borderline (i.e., minimally or marginally competent) trainees who would perform each checklist item correctly after training.5 The Hofstee method is whole-test based and includes both normative and criterion-based considerations; this method asks judges to bracket the minimum and maximum acceptable passing scores and failure rates.5 Because different methods typically result in different standards, some studies have averaged the Angoff and Hofstee scores to obtain a final cut score.4,7,8
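Both traditional methods reduce to simple arithmetic over the judges' ratings. The sketch below is illustrative only: all ratings, bracket values, and examinee scores are invented, and the Hofstee cut is found with a common discrete shortcut (intersecting the observed failure-rate curve with the line joining the judges' bracketed limits), which is not necessarily the exact procedure a given panel would use.

```python
# Hypothetical judge ratings; each row is one judge's per-item probability
# estimates for borderline examinees (3 checklist items shown).
angoff = [
    [0.90, 0.80, 0.70],
    [0.85, 0.90, 0.75],
    [0.95, 0.85, 0.80],
]

# Angoff: average the per-item estimates across judges and items,
# expressed as a percentage of the checklist.
n_items = len(angoff[0])
angoff_mps = sum(sum(judge) for judge in angoff) / (len(angoff) * n_items) * 100

# Hofstee: average the judges' four bracketing judgments, then find the cut
# score where the observed failure-rate curve crosses the line joining
# (min cut, max fail) and (max cut, min fail). All values hypothetical.
k_min, k_max = 70.0, 90.0   # mean acceptable min/max passing score (%)
f_min, f_max = 0.0, 30.0    # mean acceptable min/max failure rate (%)
scores = [62, 71, 75, 78, 80, 83, 85, 88, 92, 96]  # invented exam scores (%)

def fail_rate(cut):
    return 100.0 * sum(s < cut for s in scores) / len(scores)

def hofstee_line(cut):
    return f_max + (f_min - f_max) * (cut - k_min) / (k_max - k_min)

# Scan candidate cut scores within the bracket; keep the highest one whose
# observed failure rate does not exceed the compromise line.
hofstee_mps = max(c for c in range(int(k_min), int(k_max) + 1)
                  if fail_rate(c) <= hofstee_line(c))
```

With these invented numbers the two methods disagree (Angoff about 83%, Hofstee 75%), which is why studies have sometimes averaged the two results to obtain a final cut score.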
Recent work has questioned the utility of the Angoff and Hofstee methods when used to set standards for mastery learning curricula.10,11 Many medical procedures involve significant patient risk if not performed to very high standards, yet the Angoff method does not distinguish between critical and noncritical test items and allows a borderline or minimally competent trainee to complete training successfully and perform procedures on patients. The use of the Hofstee method in mastery settings has also been challenged: the hallmark of the Hofstee method is consideration of acceptable failure rates, yet in mastery settings there are rarely any ultimate failures. Any learner who does not meet the MPS at a given test administration completes additional training and retests until they pass; thus, initial failure rates are not relevant concerns in a mastery setting. Additionally, the explicitly normative concerns of the Hofstee method are not appropriate in a competency-based approach.
To address these issues, two new standard-setting methods were created for mastery learning environments: the Mastery Angoff and the Patient-Safety methods.10,11 In the Mastery Angoff method, judges are asked to review each checklist item and determine the percentage of trainees, well prepared for the next stage of practice (e.g., well prepared to perform the procedure on a live patient with minimal to no supervision), who would perform each item correctly.10 An equivalent formulation would be to ask the judges to estimate the probability that a well-prepared learner would accomplish each item. In the Patient-Safety method, the judges are asked to review each checklist item and determine whether it has implications for patient safety, comfort, or clinical outcomes, and standards are set conjunctively for critical and noncritical items.11
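The conjunctive logic of the Patient-Safety method can be made concrete with a short sketch. The item names, scored results, and both cut scores below are invented for illustration; the only point is that an examinee must clear the critical-item bar and the noncritical-item bar independently, rather than meeting a single compensatory total.

```python
# Hypothetical conjunctive scoring sketch; all values invented.
checklist = [
    # (item, performed correctly?, critical for safety/comfort/outcome?)
    ("obtains informed consent",    True,  True),
    ("positions patient",           True,  True),
    ("maintains sterile technique", True,  True),
    ("documents procedure",         False, False),
]

CRITICAL_MPS = 1.00     # e.g., every safety-relevant item must be performed
NONCRITICAL_MPS = 0.50  # a lower bar might apply to noncritical items

def passes(items):
    """Conjunctive standard: the examinee must clear BOTH bars."""
    critical = [done for _, done, crit in items if crit]
    noncritical = [done for _, done, crit in items if not crit]
    crit_ok = (sum(critical) / len(critical) >= CRITICAL_MPS) if critical else True
    noncrit_ok = (sum(noncritical) / len(noncritical) >= NONCRITICAL_MPS
                  if noncritical else True)
    return crit_ok and noncrit_ok
```

Under a compensatory standard the examinee above (3 of 4 items) might pass; under the conjunctive sketch they fail, because the noncritical score falls below its own bar.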
The educational and clinical implications of using these newer standard-setting methods compared with traditional methods have received limited evaluation.11 We hypothesized that the use of the Mastery Angoff and Patient-Safety methods would require learners to demonstrate a higher level of skill acquisition before completing training and interacting with patients. The current study had three aims: to compare Angoff and Hofstee MPSs with the MPSs obtained from the Mastery Angoff and Patient-Safety approaches for central venous catheter (CVC) skills examinations; to evaluate the consequences of setting different MPSs based on the historical performance of internal medicine (IM) and emergency medicine (EM) residents who participated in a CVC insertion SBML curriculum; and to compare current Angoff and Hofstee MPSs with those set using these methods in 2010 to evaluate the stability of expert judgments over time.12
We performed Angoff, Hofstee, Mastery Angoff, and Patient-Safety method standard-setting exercises for internal jugular (IJ) and subclavian (SC) CVC insertion checklists in April and May 2015. The study was performed at Northwestern Memorial Hospital, a tertiary academic medical center in Chicago, Illinois, and was approved by the Northwestern University Institutional Review Board.
All standards were set using previously published IJ and SC CVC skills checklists.13 A group of 12 experienced, board-certified attending physicians with experience teaching, supervising, and performing CVC insertions was recruited to participate as expert judges for the standard-setting exercises.
The first part of the exercise was to determine MPSs using both the traditional Angoff and Hofstee methods. Judges were provided with historical performance data from prior CVC SBML pretests and posttests to guide their judgments.12 For the Angoff method, judges described the characteristics of the “borderline” resident (a marginally competent trainee who has a 50% chance of passing the examination) and then estimated the proportion of borderline residents who would correctly perform each step of the IJ and SC CVC checklists at posttest (after training). For the Hofstee method, each expert recorded judgments in response to four questions about the CVC checklist: What is the maximum acceptable passing score? What is the minimum acceptable passing score? What is the maximum acceptable failure rate? And what is the minimum acceptable failure rate?5,6,9
The judges also determined MPSs for the CVC checklist using both the Mastery Angoff and Patient-Safety approaches. However, historical resident performance data were not provided for this part of the exercise because mean pretest performance and performance on first test attempt after training are not relevant in a mastery setting where individual learners are tested and retested multiple times until reaching the passing standard.10,11 For the Mastery Angoff method, judges characterized the “well-prepared” resident (in this case defined as a trainee who can perform the procedure safely on patients, with minimal to no supervision) and then estimated the proportion of well-prepared residents who would correctly perform each step of the IJ and SC CVC checklists at final posttest (i.e., before being allowed to proceed to perform the procedure on patients, and after as much training as needed to reach that point). Using the Patient-Safety method, judges were asked to review each checklist item and determine whether it had implications for patient safety, comfort, or outcomes. Items that affected these domains were considered critical items, while those that did not were considered noncritical items. Judges set standards separately and conjunctively for critical and noncritical items.
We collected demographic data from the 12 judges, including years in practice, medical specialty, and whether they taught or supervised CVC insertions by trainees. The primary outcome was a comparison of the MPSs derived from the traditional Angoff and Hofstee standard-setting exercises with those obtained using the Mastery Angoff and Patient-Safety approaches. We also evaluated how the Mastery Angoff and Patient-Safety MPSs would affect passing rates, using the previously published historical performance of IM and EM residents who participated in a CVC insertion SBML curriculum with standards based on the traditional Angoff and Hofstee methods.4,14,15 We compared the demographic and clinical experience data of the residents who did not meet the MPSs based on the 2010 traditional Angoff and Hofstee approaches versus those who did not meet the MPSs based on the new Mastery Angoff and Patient-Safety standards (2015). Demographic and clinical data included age, sex, specialty (IM or EM), postgraduate year, United States Medical Licensing Examination Step 1 and Step 2 scores, whether the resident graduated from a U.S. medical school, number of procedures previously performed, and procedure self-confidence (0 = not confident to 100 = very confident). A secondary outcome was a comparison of the MPSs set by the traditional Angoff and Hofstee methods in 201012 with those set in this study (which were based on similar performance data) to assess the stability of traditional Angoff and Hofstee judgments over time and across judges.
Each MPS was expressed as a percentage and the associated number of checklist items an examinee would need to perform correctly to complete training. We compared the MPSs obtained by the traditional Angoff and Hofstee methods, and the MPS obtained by averaging the traditional Angoff and Hofstee results, with the MPSs obtained using the Mastery Angoff and Patient-Safety methods for both IJ and SC CVC checklists using Wilcoxon signed-rank tests. We compared demographic and clinical experience data between the groups of residents who did not meet or exceed the MPS based on the 2010 traditional Angoff and Hofstee standards and the 2015 Mastery Angoff and Patient-Safety standards using the independent-samples t test, Mann–Whitney U test, or chi-square test. We compared the average of the traditional Angoff and Hofstee MPSs set in 2010 with the average of the traditional Angoff and Hofstee MPSs set in this study using the Mann–Whitney U test. We performed all analyses using SPSS statistical software, version 24 (IBM Corp, Armonk, New York).
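For readers unfamiliar with the Wilcoxon signed-rank test used for the paired comparisons above, the core statistic can be computed by hand: drop zero differences, rank the absolute paired differences (averaging ranks across ties), and sum the ranks of the positive and negative differences. The sketch below computes only the W statistic on made-up paired ratings; a statistics package such as SPSS is still needed for the P value and proper small-sample handling.

```python
def wilcoxon_w(x, y):
    """Wilcoxon signed-rank W for paired samples; a teaching sketch only
    (no P value, zeros dropped, ties in |difference| given average ranks)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Find the run of tied |differences| and assign the average rank.
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return min(w_plus, w_minus)  # the test statistic is the smaller rank sum
```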
Judge demographic data are presented in Table 1. All judges were board-certified faculty physicians representing the following: IM (2 pulmonary/critical care, 2 interventional cardiology, 2 nephrology, 2 hospital medicine), EM (1), surgical critical care (2), and anesthesiology (1). All judges reported current teaching and supervision of CVC insertions.
Table 2 shows the final MPS (as set in 2010 and in 2015) for each 29-item checklist13 derived using the Angoff, Hofstee, Mastery Angoff, and Patient-Safety standard-setting methods. For the Patient-Safety method, the judges deemed all but 2 items on both the IJ and SC checklists as “critical” to patient safety, patient comfort, and procedure outcome. The 2 remaining items were deemed critical for patient comfort only. Therefore, no MPS for noncritical items was reported. As shown in Table 2, the MPSs derived from the Mastery Angoff and Patient-Safety approaches were equivalent (98% or 28/29 items for both methods). The MPSs from the traditional Angoff, the Hofstee, and the average of the traditional Angoff and Hofstee methods were lower than those from the Mastery Angoff and Patient-Safety methods for both the IJ and SC skills examinations (all P = .002). Next, we evaluated the impact of the mastery MPSs on passing rates based on the performance of a historical cohort of 143 IM and EM residents. These residents completed the SBML intervention4,14,15 from 2006 to 2012. We based their passing scores on the traditional Angoff and Hofstee MPSs of 26/29 items correct for both IJ and SC CVC insertion (set in 2010; Table 2).12 When applying the Mastery Angoff and Patient-Safety MPSs of 28/29 items to the performance of the historical cohort, 55/123 residents (45%) who had originally passed the IJ checklist and 36/130 residents (28%) who had originally passed the SC checklist would now require additional training and assessment. Table 3 shows the demographic and clinical experience of residents who did not achieve the MPSs using the 2010 traditional Angoff and Hofstee approaches and the MPSs using the 2015 Mastery Angoff and Patient-Safety standards for both IJ and SC assessments. There were no statistically significant differences between the groups.
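The consequence analysis above amounts to reapplying a stricter cut score to existing posttest scores. A minimal sketch, using invented scores rather than the study's data:

```python
# Hypothetical posttest scores (checklist items correct out of 29);
# invented numbers, not the cohort's actual results.
scores = [26, 27, 29, 28, 26, 29, 27, 28, 26, 29]

OLD_MPS = 26  # 2010 traditional Angoff/Hofstee standard (26/29 items)
NEW_MPS = 28  # 2015 Mastery Angoff / Patient-Safety standard (28/29 items)

passed_old = [s for s in scores if s >= OLD_MPS]
# Residents who cleared the old bar but would retrain and retest under the new one.
would_retrain = [s for s in passed_old if s < NEW_MPS]
retrain_rate = len(would_retrain) / len(passed_old)
```

In a mastery model these residents are not failures; they simply owe additional deliberate practice and another test attempt before the standard is met.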
Supplemental Digital Appendix 1, available at http://links.lww.com/ACADMED/A533, shows the most common items missed on the checklist for the cohort of residents who did not meet the 2010 traditional Angoff and Hofstee and the 2015 Mastery Angoff and Patient-Safety MPSs. There was wide variability in the specific items missed by each group. However, “maintains sterile technique” was the most common item missed in both groups.
The MPS of 90%, obtained by averaging the new traditional Angoff and Hofstee MPSs, was very close to the final MPSs of 88% and 87% set in 2010 for the IJ and SC examinations, respectively (P = not significant; Table 2). The resulting MPS was stable at 26/29 items for both 2010 and 2015.
A true mastery learning curriculum strives to produce learners who are “well prepared” rather than “minimally competent”10; learners continue to practice and retest without penalty until the required standard is achieved. Mastery learning has expanded beyond procedural skills to include communication16 and team-based skills.8 As this educational model has expanded in use, questions have been raised about the applicability of traditional standard-setting approaches in this context. The findings from this study show that newer mastery learning standard-setting approaches can be used effectively to set standards in a mastery learning curriculum. As expected, these approaches yield more stringent MPSs than those derived from traditional standard-setting approaches not designed for a mastery learning environment, and they would result in additional practice and retesting before learners progress to inserting CVCs in actual patients. Consistent with our previous studies, demographic and clinical experience data of residents who were unable to meet the MPSs did not differ between the 2010 traditional Angoff and Hofstee and the 2015 Mastery Angoff and Patient-Safety groups. Therefore, we suggest that these data cannot be used to predict who will pass the CVC insertion clinical skills assessment.4,14,15 Our findings are also consistent with a 2014 study by Yudkowsky and colleagues,11 which compared the traditional Angoff method with the Patient-Safety method for six basic clinical procedures such as IV insertion and phlebotomy. However, the current study goes beyond that work by including the Hofstee and the combined Angoff/Hofstee methods in the traditional-methods comparison group, by adding the more recently described Mastery Angoff as an additional mastery-focused method, and by addressing the more complex procedure of CVC insertion.
While the MPSs obtained from mastery-focused methods in this study are more stringent than those derived by the Angoff and Hofstee methods, the more lenient CVC insertion MPS set in 2010 has already been shown to have important downstream clinical outcomes. Specifically, completion of CVC SBML using that MPS significantly improves trainee skill that is retained over time, reduces iatrogenic complications, improves patient care outcomes, and is highly cost-effective.4,14,15,17–19 Because the judges felt that each CVC checklist item had important patient safety consequences, it is possible that additional clinical benefit could be obtained from these more stringent passing standards. Further study is needed to assess whether increasing the already-high CVC SBML standards will yield additional downstream patient care benefits.
A secondary purpose of this study was to compare our newly derived traditional Angoff and Hofstee standards with those set by a similarly composed expert panel in 2010.12 Other studies have shown the stability of the Angoff and Hofstee methods over much shorter time periods.6,9 In both this study and in 2010, expert panels reviewed similar group-level pretest and posttest checklist performance data to help guide their judgments. Our findings suggest that the traditional Angoff and Hofstee methods can produce consistent MPSs when informed by benchmark performance data that are stable over time. By contrast, the CVC MPS set in 2006, which was informed by substantially lower benchmark scores, resulted in more lenient Angoff judgments and a lower combined average Angoff/Hofstee MPS of 79%.6 From a patient safety perspective, changes in mastery-level cut scores based on cohort performance are worrisome. In traditional settings, in which the great majority of students are expected to pass on the first attempt, benchmark performance data such as mean scores on first attempt provide content experts with a useful reality check regarding learner performance and tend to moderate minimal-competency cut scores. However, pretest and first-attempt scores are not relevant in mastery settings, where students retest multiple times until they achieve the MPS. Instead, information for mastery standards focuses on the performance or performance outcomes (e.g., patient safety outcomes) of learners at the immediate next level of training, to support the inference that learners who meet the standard are indeed “well prepared.”10
Because these approaches are new, it is not known whether the Mastery Angoff and Patient-Safety methods produce MPSs that are stable over a similar length of time. However, the mastery approach MPS of 98% was stable across methods (Mastery Angoff and Patient-Safety) and CVC approaches (IJ and SC) in this study. This MPS was also identical to the Patient-Safety MPS of 98% set in the 2014 study with different judges across six different procedures.11 This suggests that faculty have a consistent and coherent expectation of high performance in the simulation lab before a learner is deemed “well prepared” to perform procedures on live patients.
Having reliable assessments with defensible standards is essential as medical education moves to a competency-based framework through the Accreditation Council for Graduate Medical Education Milestones20 and the Association of American Medical Colleges Entrustable Professional Activities (EPAs).21–23 Although mastery standards have not been formally built into these frameworks, the Mastery Angoff method can be a reliable component of a set of data supporting the rating of a level 4 milestone (ready for unsupervised practice)24 or an EPA at level 4 supervision (supervision at a distance or post hoc).21 Incorporating simulation-based assessment to mastery standards, followed by assessment of skills during actual patient care, is a well-studied approach that can be used as part of a comprehensive competency-based education assessment system. For example, at our institution we require residents to complete CVC SBML to mastery standards and then complete a minimum of five supervised CVC insertions before they are allowed to perform the procedure independently. This requirement is the same for all trainees regardless of postgraduate year, and residents who continue to insert CVCs are reassessed on the simulator every six months. Although this approach has been linked to improved downstream patient outcomes,4,14,17 further study is required to determine the optimal number of observed procedures following SBML needed to ensure safe independent practice.
Our study has several limitations. First, it was performed at one institution using one data set and a single panel of expert judges, potentially limiting generalizability. Second, we were unable to compare the downstream effects on patient care of using one method versus another. Evaluating the impact of applying a more stringent MPS to downstream patient care quality requires further study. Finally, we did not measure the stability of Mastery Angoff and Patient-Safety MPSs over time.
In conclusion, this study shows that newer standard-setting approaches optimized for mastery learning settings and patient safety resulted in higher MPSs for CVC insertion during SBML compared with traditional standard-setting methods. Rigorous assessment of skills in a mastery learning environment with the newer standard-setting approaches may be a reliable way to support milestone or entrustment decisions. Our previous studies demonstrated that periodic review of performance data is needed to inform a fair and reasonable passing standard.6,12 Our current study is another example of why critical appraisal, follow-up, and periodic examination of assessment tools and passing standards should be part of all medical education interventions.25 Faculty members involved in mastery learning curricula should implement appropriate standard-setting methods to address patient safety considerations, while understanding potential limitations and gaps in current knowledge.
Acknowledgments: The authors acknowledge Drs. Douglas Vaughan and Kevin O’Leary for their support and encouragement of this work. They also thank the expert panel who participated in this study as well as the internal medicine and emergency medicine residents at Northwestern Memorial Hospital for their dedication to patient care.
1. McGaghie WC. Mastery learning: It is time for medical education to join the 21st century. Acad Med. 2015;90:1438–1441.
2. McGaghie WC, Miller GE, Sajid AW, Telder TV. Competency-based curriculum development in medical education: An introduction. Public Health Pap. 1978;(68):11–91.
3. McGaghie WC, Barsuk JH, Wayne DB. AM last page: Mastery learning with deliberate practice in medical education. Acad Med. 2015;90:1575.
4. Barsuk JH, McGaghie WC, Cohen ER, Balachandran JS, Wayne DB. Use of simulation-based mastery learning to improve the quality of central venous catheter placement in a medical intensive care unit. J Hosp Med. 2009;4:397–403.
5. Downing SM, Tekian A, Yudkowsky R. Procedures for establishing defensible absolute passing scores on performance examinations in health professions education. Teach Learn Med. 2006;18:50–57.
6. Wayne DB, Barsuk JH, Cohen E, McGaghie WC. Do baseline data influence standard setting for a clinical skills examination? Acad Med. 2007;82(10 suppl):S105–S108.
7. Wayne DB, Barsuk JH, O’Leary KJ, Fudala MJ, McGaghie WC. Mastery learning of thoracentesis skills by internal medicine residents using simulation technology and deliberate practice. J Hosp Med. 2008;3:48–54.
8. Wayne DB, Butter J, Siddall VJ, et al. Mastery learning of advanced cardiac life support skills by internal medicine residents using simulation technology and deliberate practice. J Gen Intern Med. 2006;21:251–256.
9. Wayne DB, Fudala MJ, Butter J, et al. Comparison of two standard-setting methods for advanced cardiac life support training. Acad Med. 2005;80(10 suppl):S63–S66.
10. Yudkowsky R, Park YS, Lineberry M, Knox A, Ritter EM. Setting mastery learning standards. Acad Med. 2015;90:1495–1500.
11. Yudkowsky R, Tumuluru S, Casey P, Herlich N, Ledonne C. A patient safety approach to setting pass/fail standards for basic procedural skills checklists. Simul Healthc. 2014;9:277–282.
12. Cohen ER, Barsuk JH, McGaghie WC, Wayne DB. Raising the bar: Reassessing standards for procedural competence. Teach Learn Med. 2013;25:6–9.
13. Barsuk JH, Cohen ER, Nguyen D, et al. Attending physician adherence to a 29-component central venous catheter bundle checklist during simulated procedures. Crit Care Med. 2016;44:1871–1881.
14. Barsuk JH, McGaghie WC, Cohen ER, O’Leary KJ, Wayne DB. Simulation-based mastery learning reduces complications during central venous catheter insertion in a medical intensive care unit. Crit Care Med. 2009;37:2697–2701.
15. Barsuk JH, Cohen ER, Potts S, et al. Dissemination of a simulation-based mastery learning intervention reduces central line-associated bloodstream infections. BMJ Qual Saf. 2014;23:749–756.
16. Sharma RK, Szmuilowicz E, Ogunseitan A, et al. Evaluation of a mastery learning intervention on hospitalists’ code status discussion skills. J Pain Symptom Manage. 2017;53:1066–1070.
17. Barsuk JH, Cohen ER, Feinglass J, McGaghie WC, Wayne DB. Use of simulation-based education to reduce catheter-related bloodstream infections. Arch Intern Med. 2009;169:1420–1423.
18. Cohen ER, Feinglass J, Barsuk JH, et al. Cost savings from reduced catheter-related bloodstream infection after simulation-based education for residents in a medical intensive care unit. Simul Healthc. 2010;5:98–102.
19. Barsuk JH, Cohen ER, McGaghie WC, Wayne DB. Long-term retention of central venous catheter insertion skills after simulation-based mastery learning. Acad Med. 2010;85(10 suppl):S9–S12.
21. Journal of Graduate Medical Education. Nuts and bolts of entrustable professional activities. http://www.jgme.org/doi/pdf/10.4300/JGME-D-12-00380.1?code=gmed-site. Accessed January 28, 2018.
22. Lomis K, Amiel JM, Ryan MS, et al.; AAMC Core EPAs for Entering Residency Pilot Team. Implementing an entrustable professional activities framework in undergraduate medical education: Early lessons from the AAMC Core Entrustable Professional Activities for Entering Residency pilot. Acad Med. 2017;92:765–770.
23. Brown DR, Warren JB, Hyderi A, et al.; AAMC Core Entrustable Professional Activities for Entering Residency Entrustment Concept Group. Finding a path to entrustment in undergraduate medical education: A progress report from the AAMC Core Entrustable Professional Activities for Entering Residency Entrustment Concept Group. Acad Med. 2017;92:774–779.
24. Accreditation Council for Graduate Medical Education and American Board of Internal Medicine. The Internal Medicine Milestone Project. https://www.acgme.org/Portals/0/PDFs/Milestones/InternalMedicineMilestones.pdf. Accessed January 28, 2018.
25. Epstein RM. Assessment in medical education. N Engl J Med. 2007;356:387–396.