Research Reports

Validity Evidence for a Knowledge Assessment Tool for a Mastery Learning Scrub Training Curriculum

Hasty, Brittany N. MD, MHPE; Lau, James N. MD, MHPE; Tekian, Ara PhD, MHPE; Miller, Sarah E.; Shipper, Edward S. MD; Bereknyei Merrell, Sylvia DrPH, MS; Lee, Edmund W. MD; Park, Yoon Soo PhD

doi: 10.1097/ACM.0000000000003007


The operating room can be a challenging learning environment for newcomers. A lack of preparedness was found to be the leading cause of bad experiences in the operating room for medical students.1 In addition, the operating room is fraught with unique language, confusion over team member roles, uncertainty over operating room etiquette, and the fear of being asked questions about surgical knowledge, all of which can leave students feeling intimidated.2 On the other hand, medical students who felt adequately prepared for the operating room were more likely to have experiences that positively influenced a career in surgery.1,3 Therefore, an introduction to this environment, via a formal scrub training curriculum, may facilitate student integration into the operating room.

One skill that is distinct in surgical specialties is scrubbing in. Scrubbing in is the act of washing the fingernails, hands, forearms, and elbows with a bactericidal solution in a systematic manner and gowning and gloving in an aseptic fashion before the start of any surgical procedure. In one study, more than 80% of students reported that being able to scrub in is an essential skill for learning in the operating room.4 In another study, less than half of students reported receiving any formal instructions, including scrub training, before entering the operating room.5 Surgical hand hygiene is a priority for health professionals as inappropriate techniques can lead to adverse patient outcomes, such as surgical site infections (SSIs), which can lead to increased length of hospitalization, likelihood of transfer to a surgical intensive care unit, cost of care, and mortality.6 Moreover, SSIs are a leading cause of unplanned readmission, further increasing the burden on our health care system.7 Thus, for students to actively participate as a member of the surgical team, they should rigorously adhere to aseptic techniques taught through a mastery learning curriculum.

Skills such as scrubbing in require learners to achieve “mastery,” rather than a minimum passing standard, as scrubbing in requires rigorous adherence to aseptic technique. Mastery learning is a time-variable instructional method, in which learners accomplish high educational outcomes before being allowed to perform in patient care areas.8 This contrasts with traditional learning methods in which learners are allotted limited curricular time and are only required to achieve minimal competency, thus resulting in variable learner outcomes. Learners under the mastery learning paradigm are allotted unlimited curricular time and are required to achieve a mastery passing standard (MPS), reflecting performance in which learners are well prepared for the next level of practice. Learners who do not achieve the MPS are instructed to engage in deliberate practice with expert feedback until the MPS is reached.9 Thus, when assessing learners in mastery learning curricula, passing standards must be rigorously developed, diligently defined, and defensible.

In 2014, the Association of American Medical Colleges released 13 Core Entrustable Professional Activities (EPAs) for Entering Residency. EPA 12 requires students to follow sterile technique when indicated, and under EPA 13, students are to “engage in daily safety habits,” one of which is hand washing.10 Previous studies evaluating surgical skills curricula that include a scrub training component have demonstrated increased student confidence and knowledge in performing proper surgical hand hygiene.11–14 However, no study to date has established validity evidence for a scrub training knowledge assessment tool. Ultimately, the development of assessment instruments with validity evidence for mastery learning is critical to ensuring the entrustment of learners with promoting daily safety habits and readiness for residency training. This study aims to examine the validity evidence, using Messick’s unified validity framework, for a scrub training knowledge assessment tool to demonstrate the utility and robustness of a multimodal, EPA-aligned, mastery learning scrub training curriculum.15–17

Method


Study participants

Data for this study were gathered from Stanford University School of Medicine. All medical students and physician assistant students who wanted to participate or needed to participate in the curriculum as part of a required clerkship in the operating room during the study period (April 2017–June 2018) were eligible to be included. The medical school’s curriculum includes 2 years of preclinical education followed by 2 clinical years. The physician assistant curriculum is a 30-month program, with 18 months of preclinical education. Medical and physician assistant students share the same preclinical education. Fourth-year visiting subinternship medical students were required to take the scrub knowledge assessment and achieve the MPS to receive scrub privileges at our institution.

Curricular structure

The curricular structure was based on the mastery learning model, and the curricular content was based on a prior national needs assessment.8,18 The multimodal scrub training curriculum consisted of a precurriculum (or baseline) knowledge assessment with 25 selected response items about scrubbing in as well as 5 items related to demographic characteristics (see below), followed by an 18-minute video that contained content covering the curriculum’s 4 learning objectives (Box 1).19 The scrub training knowledge assessment can be found in Supplemental Digital Appendix 1. Students were encouraged to rewatch the video as needed (as of January 15, 2019, there had been over 100,000 views).19 Following this video-based training, students reported for an in-person scrub training session with a team of 4 experts, consisting of an operating room nurse educator, 2 surgical education fellows (B.N.H., E.W.L.), and a faculty surgeon (J.N.L.). All of these experts participated in setting the institutional surgical hand hygiene policies, which were based on national best practices. All students engaged in deliberate practice with feedback from the 4 experts until they were able to perform all scrubbing-in steps without contamination.9

Box 1

Learning Goal and Objectives for a Multimodal, EPA-Aligned, Mastery-Based Scrub Training Curriculum, Stanford University School of Medicine, April 2017–June 2018

Scrub training curriculum goal

By the end of this course, students should be able to apply scrub training principles and techniques to enter the operating room environment and participate fully, at their level of training, as a member of the surgical team.

Scrub training curriculum objectives
  1. Describe and demonstrate appropriate surgical attire for operating room personnel.
  2. Identify all personal protective equipment necessary for entry into the operating room.
  3. Describe and demonstrate proper surgical hygiene, donning of a mask, gowning, and gloving.
  4. Describe and demonstrate the understanding of sterile boundaries.

Once students demonstrated the ability to scrub in without contamination, they received a passing grade on the in-person portion of the curriculum. Following this, students completed the same 25-item knowledge assessment. If students passed the in-person portion of the curriculum and reached the MPS on the postcurriculum knowledge assessment, they received unsupervised scrub privileges. Since this curriculum was based on a mastery learning framework, students were given unlimited curricular time to achieve the MPS. However, a majority of students completed the curriculum within 1 week.

Data collection

We collected validity evidence via students’ assessment scores pre curriculum, post curriculum, and 6 months following the curriculum from April 2017 to June 2018 at Stanford University School of Medicine. Student demographic data such as gender, year in school, prior experience with scrubbing in, and prior scrub training were also collected.

Sources of validity evidence


In accordance with recommendations from the American Educational Research Association, American Psychological Association, and National Council on Measurement in Education, we collected validity evidence based on Messick’s unified validity framework as follows.15,16

Content.


Items in the knowledge assessment were blueprinted to map to curricular objectives, EPAs 12 and 13, and Stanford operating room policies.10

Response process.

The knowledge assessment was piloted with our intended test-taking population, including both preclinical and clinical trainees. We solicited feedback from our test-taking population regarding their construct interpretation and understanding of the question stems and answer choices. All selected response items were reviewed by content and assessment experts (A.T., Y.S.P.) for consistency with item-writing standards.

Internal structure.

The following 3 knowledge assessment measures were calculated: (1) item discrimination, which indicates how well each item differentiated between high and low levels of learner performance; (2) individual item difficulty, defined as the proportion of students who answered an item correctly; and (3) the internal consistency reliability of the knowledge assessment.

Relations to other variables.

We measured the association of prior scrub training and prior experience scrubbing in with knowledge scores on the precurriculum knowledge assessment. Additionally, we measured the association of postcurriculum assessment scores with retention of knowledge 6 months after the curriculum.

Consequences of testing.

We attempted to determine the impact of the knowledge assessment on key stakeholders, such as the students themselves, operating room staff, and patients. The students were engaged in deliberate practice with expert feedback until they were able to master scrub training techniques. Criterion-referenced, rather than norm-referenced, standard setting methods were used. The MPS was established using the Mastery Angoff and Patient-Safety approaches.20 Standard setting judges (hereafter expert raters) included the surgery clerkship director (J.N.L.), a fourth-year medical student (S.E.M.), the vice chair of education in surgery, 2 surgical education fellows (B.N.H., E.W.L.), and an assistant professor of surgery. Raters were asked the following questions for each knowledge assessment item: (1) “What is the probability that the master student will accomplish this item?” and (2) “Is this item critical to patient safety?” To maintain a criterion-based standard setting process, we did not provide prior performance data to the raters. The master student was defined as one that could be entrusted to scrub in unsupervised.
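As an illustration of how a Mastery Angoff cut score is typically derived from ratings of this kind, the following minimal Python sketch averages each item's ratings across raters and then averages those item means across the test. The ratings are hypothetical (the study's rater responses are reported in Table 2), the function name is ours, and the patient-safety classification of items is not modeled here.

```python
from statistics import mean


def angoff_cut_score(ratings):
    """Return (item_means, cut_score) from Angoff-style ratings.

    ratings[r][i] is rater r's estimated probability (0 to 1) that the
    master student answers item i correctly.
    """
    n_items = len(ratings[0])
    if any(len(rater) != n_items for rater in ratings):
        raise ValueError("every rater must rate every item")
    # Mean rating per item across raters.
    item_means = [mean(rater[i] for rater in ratings) for i in range(n_items)]
    # Cut score as a proportion of items: the mean of the item means.
    return item_means, mean(item_means)


# Hypothetical ratings: 3 raters x 4 items (not the study's data).
ratings = [
    [0.90, 1.00, 0.80, 0.90],
    [1.00, 0.90, 0.90, 1.00],
    [0.80, 1.00, 1.00, 0.90],
]
item_means, cut = angoff_cut_score(ratings)
print(f"Mastery Angoff cut score: {cut:.1%}")
```

Multiplying the resulting proportion by the number of items gives the number of items a student would need to answer correctly to meet the standard.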

Statistical analysis

Descriptive summary statistics were used to examine distribution of data and trends. A paired t test was used to examine pre- and postcurriculum differences in individual learner performance.

The internal consistency reliability of knowledge assessment results was measured using Cronbach’s alpha. Reliability coefficients above 0.70 were considered sufficient.21 Item difficulty measurements were calculated by dividing the total number of students who answered an item correctly by the total number of students. Item discrimination indices were calculated using item-total correlations. Item discrimination indices were considered ideal if they were +0.20 or higher; the next best index was +0.15 or higher, while items with an item discrimination of less than +0.14 were considered to have low item discrimination.21 Only when the content of a test item was considered critical for patient safety based on expert review (B.N.H., E.S.S., E.W.L., J.N.L.) were items with a discrimination index of less than +0.14 used.21 All tests were 2-sided, and P values of less than .05 were considered statistically significant.
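The three item statistics described above can be sketched in a few lines of Python. This is an illustration only: the analyses in this study were run in SPSS, the response matrix below is hypothetical, the function names are ours, and the item-total correlation shown is the uncorrected form (each item is correlated with a total that includes the item itself), which is an assumption because the text does not specify the variant used.

```python
from math import sqrt
from statistics import mean, variance


def item_difficulty(responses):
    """Proportion of students who answered each item correctly."""
    n_items = len(responses[0])
    return [mean(row[i] for row in responses) for i in range(n_items)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


def item_total_correlations(responses):
    """Uncorrected item-total correlation for each item."""
    totals = [sum(row) for row in responses]
    n_items = len(responses[0])
    return [pearson([row[i] for row in responses], totals)
            for i in range(n_items)]


def cronbach_alpha(responses):
    """Cronbach's alpha: k/(k-1) * (1 - sum(item variances)/total variance)."""
    k = len(responses[0])
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 6 students x 4 items, scored 1 (correct) or 0 (incorrect).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]
difficulty = item_difficulty(responses)
discrimination = item_total_correlations(responses)
alpha = cronbach_alpha(responses)
print(difficulty, discrimination, alpha)
```

With so few students and items the resulting values are not meaningful in themselves; the sketch is meant only to make the definitions concrete.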

Data were compiled and analyzed using SPSS version 25 (IBM Corp., Armonk, New York). The institutional review boards of Stanford University and the University of Illinois at Chicago approved this study.

Results


Descriptive statistics

From April 2017 to June 2018, a total of 220 medical and physician assistant students participated in the scrub training curriculum (124 [56.4%] female, 96 [43.6%] male). There were 88 (40.0%) first-year medical students, 93 (42.3%) second-year medical students, 2 (0.9%) third-year medical students, 29 (13.2%) fourth-year medical students, and 8 (3.6%) physician assistant students. One hundred eight (49.1%) students had previously received scrub training, and 89 (40.5%) had previously scrubbed into an operation. The mean precurriculum knowledge assessment score was 74.4% (standard deviation [SD] = 15.6), and the mean postcurriculum knowledge assessment score was 90.1% (SD = 8.3), yielding a Cohen’s d = 1.10, P < .001.

Sources of validity evidence

Content.


The knowledge assessment was designed based on the curriculum’s goals and objectives (Box 1). Questions were developed iteratively by content experts, including the surgery clerkship director (J.N.L.), a fourth-year medical student (S.E.M.), 2 surgical education fellows (B.N.H., E.W.L.), and an assistant professor of surgery. The final knowledge assessment contained 25 selected response items covering the 4 learning objectives: appropriate operating room attire (5 questions), personal protective equipment required for entry into the operating room (6 questions), proper surgical hygiene (6 questions), and sterile boundaries (8 questions). In addition to the content experts, medical education experts (J.N.L., B.N.H., E.W.L., A.T., S.B.M., E.S.S.) and a psychometrician (Y.S.P.) reviewed the knowledge assessment before implementation.

Response process.

Initially, the knowledge assessment tool was piloted with 14 third-year medical students on the first day of their surgery clerkship. Following this initial pilot, a focus group was conducted with the pilot students to obtain feedback on their construct interpretation and ability to understand and interpret the question stems and answer choices to reduce construct-irrelevant variance. Next, the knowledge assessment was piloted with 104 second-year medical students. Following each pilot, we gathered the content and assessment experts to conduct a thorough review of the items to ensure that the questions and response options were consistent with item-writing guidelines and revise them if needed, thereby gathering evidence of alignment between the intended scrub knowledge assessment (construct) and the responses of the learners. In addition, all content experts served as both instructors and evaluators for students, and as instructors, they all received feedback on curricular content delivery. Therefore, the content experts were able to consistently deliver the curriculum to ensure all students received the same curricular content.

Internal structure.

The original knowledge assessment tool contained 12 selected response items with an initial internal reliability of α = 0.52, which we determined to be insufficient following the second pilot with 104 second-year medical students. Using the Spearman-Brown formula, we determined that the test needed to be expanded to 25 selected response items to reach a sufficient reliability.21 The final Cronbach’s alpha was determined to be 0.71, which corresponded to a sufficient reliability.21 The mean item discrimination index was 0.35, indicating that our knowledge assessment tool was able to discriminate between learners at different levels of training.21 While there were 4 questions (questions 2 [0.01], 3 [−0.09], 17 [0.13], and 20 [0.06]) with poor item discrimination (< 0.14), there was consensus among the content experts that these questions should be kept due to the importance of the content in them for understanding sterile technique and operating room policies (Table 1). Furthermore, in setting mastery learning standards, item discrimination may be less important than item relevance, lending credibility to our decision to retain these test items.20 The mean item difficulty was 0.74 (SD = 0.2). Individual item discrimination indices and item difficulties are listed in Table 1.
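For readers unfamiliar with the Spearman-Brown formula, the minimal Python sketch below shows the projection it makes when a test is lengthened. It uses only the rounded values reported above (α = 0.52 for 12 items, expansion to 25 items); the function name is ours, and the formula assumes the added items are comparable in quality to the original ones. The authors' exact inputs are not given in the text, so the projection is approximate.

```python
def spearman_brown(reliability, length_factor):
    """Projected reliability when test length is multiplied by length_factor.

    Spearman-Brown prophecy formula:
        r_new = k * r / (1 + (k - 1) * r)
    where r is the current reliability and k is the ratio of new to old length.
    """
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


# Values reported in the text: alpha = 0.52 with 12 items, expanded to 25 items.
initial_alpha = 0.52
length_factor = 25 / 12
projected = spearman_brown(initial_alpha, length_factor)
print(f"Projected reliability for 25 items: {projected:.2f}")
```

With these rounded inputs the projection lands close to the 0.70 threshold, which is in line with the observed final alpha of 0.71.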

Table 1:
Selected Response Items on the Knowledge Assessment Tool for a Multimodal, EPA-Aligned, Mastery-Based Scrub Training Curriculum, Stanford University School of Medicine, April 2017–June 2018

Relations to other variables.

Among participants, 108 (49.1%) had prior scrub training, and their previous experience was predictive of their precurriculum knowledge assessment performance. Students with previous scrub training scored significantly higher on their precurriculum knowledge assessment than those without previous scrub training (81.9% [SD = 12.6] vs 67.0% [SD = 14.9]; P < .001). Similarly, 89 (40.5%) participants had previously scrubbed into an operation, and they scored significantly higher on their precurriculum knowledge assessment than those who had not previously scrubbed in (82.9% [SD = 13.7] vs 68.7% [SD = 14.2]; P < .001). For the subset of our students who retested after 6 months (n = 31), the mean postcurriculum score from 6 months prior was 85.8% (SD = 9.0) and the mean retest score was 81.4% (SD = 13.0), with a mean score difference of 4.4 points. These results showed retention of scrub training knowledge as demonstrated via the consistency of means between the postcurriculum and 6-month retest scores. However, there was a significant increase in the variability of scores between the postcurriculum and 6-month retest scores (SD = 9.0 vs SD = 13.0; P < .001).
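To make the first group comparison concrete, the following Python sketch computes a two-sample t statistic from the summary statistics reported above. It rests on two assumptions that the text does not confirm: that the comparison was an unequal-variance (Welch) t test, and that the group without prior scrub training comprised the remaining 112 of the 220 students. The function name is ours.

```python
from math import sqrt


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and approximate degrees of freedom
    (Welch-Satterthwaite) from group means, SDs, and sizes."""
    v1 = sd1 ** 2 / n1
    v2 = sd2 ** 2 / n2
    t = (mean1 - mean2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Reported summary statistics for precurriculum scores (percent correct).
# n = 112 for the untrained group is an assumption (220 - 108).
t_stat, df = welch_t(81.9, 12.6, 108, 67.0, 14.9, 112)
print(f"t = {t_stat:.2f}, df = {df:.0f}")
```

A t statistic of this size on roughly 200 degrees of freedom is consistent with the reported P < .001.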

Consequences of testing.

There were consequences of this knowledge assessment for key stakeholders. The consequences of failing the assessment were that students were remediated until they fulfilled each of the curriculum’s learning objectives and met the MPS. For patients and operating room staff, the consequences of the assessment were that they could be reassured that medical students were knowledgeable in surgical hand hygiene and could guard against SSIs.

The Mastery Angoff overall cut score for the knowledge assessment was 92.0% (n = 23/25). With this cut score, only 88 of 178 (49.4%) students met the MPS on the postcurriculum knowledge assessment on their first attempt. By consensus, our 6 expert raters deemed questions 5, 12, 16, 17, and 18 to be critical for patient safety (Table 2), and the patient safety cut score was determined to be 97.0%. With this patient safety cut score, 161 of 206 (78.2%) students met the MPS on their first attempt of the postcurriculum knowledge assessment. On average, students required 1.5 attempts to achieve the MPS. Rater responses, rater means, and item means can be seen in Table 2. Students who did not achieve the MPS were given individualized feedback to address knowledge deficits and were allowed to rewatch the scrub training video and retake the assessment until they achieved the MPS. A summary of item statistics for the full knowledge assessment tool can be seen in Table 3.

Table 2:
Question Ratings, Using the Mastery Angoff and Patient-Safety Approaches, for the 25 Selected Response Items on the Knowledge Assessment Tool for a Multimodal, EPA-Aligned, Mastery-Based Scrub Training Curriculum, Stanford University School of Medicine, April 2017–June 2018

Table 3:
Summary of Item Statistics for the 25 Selected Response Items on the Knowledge Assessment Tool for a Multimodal, EPA-Aligned, Mastery-Based Scrub Training Curriculum, Stanford University School of Medicine, April 2017–June 2018

Discussion


This study describes the administration of and provides validity evidence for a knowledge assessment tool for making entrustment decisions about readiness for unsupervised scrub practice, using Messick’s unified validity framework.16 To our knowledge, this is the first study to apply the Mastery Angoff approach to a selected response assessment tool at the undergraduate medical education level. Moreover, our study used a mastery-based standard setting approach that incorporates patient safety standards. We describe the structure of a mastery-based curriculum, including the mastery-based standard setting process, which could be replicated for other educational programs.

Recently, Pilieci et al showed that video-based curricular instruction is superior to skill demonstration alone in providing medical students with knowledge of surgical hand hygiene.14 However, their curriculum was not based on the mastery learning model, and students were not required to complete a precurriculum knowledge assessment.14 In addition, no validity evidence was gathered for the knowledge assessment tool, which limits the inferences we can draw from the knowledge scores for these learners.20 By implementing a mastery-based curriculum with mastery learner standards and establishing validity evidence for a knowledge assessment tool, we were able to draw credible inferences from our students’ test scores; thus, our educators were able to make entrustment decisions with confidence.

This study is also the first to describe an EPA-aligned, multimodal, mastery learning scrub training curriculum. Using our multimodal curriculum, we were able to show that student knowledge of scrub technique significantly increased post curriculum. A high first-attempt failure rate should not cause alarm in mastery-based curricula. Our 50.6% first-time fail rate was not unexpected as our learners were held to a mastery passing standard rather than a minimum passing standard; thus, they were often required to retrain and retest.20 Additionally, we were able to show that a subset of our learners sustained scrub training knowledge for at least 6 months.

As we continue to transition to competency-based rather than time-based medical training, reliable assessments with defensible standards are essential for assessing EPAs.10,22 Medical and physician assistant students who complete our scrub training curriculum can be considered fully entrusted with the practice of scrubbing in unsupervised, partially satisfying the requirements for EPAs 12 and 13 at level 4 supervision (supervision at a distance or post hoc).10 Thus, implementing an EPA-aligned, multimodal, mastery learning scrub training curriculum is not only an example of an ongoing commitment to patient safety but is also pushing forward toward competency-based medical education.

This study has several limitations. First, it was based at a single academic medical center; therefore, our students’ baseline knowledge of scrub training techniques and principles may have differed from that of other U.S. medical students. However, the passing standards were based on mastery-based criterion standards aligned with EPAs, which provide a framework for use at other medical schools. Second, our scrub training curriculum has thus far only been implemented for 1 year, so long-term retention of the knowledge and skills taught in this curriculum has yet to be measured. Additionally, given the short period of time (approximately 1–2 weeks) between the precurriculum and postcurriculum knowledge assessments, students could have developed test memory. However, students were not told which questions they answered incorrectly; rather, they were given individualized feedback tailored to their knowledge deficits. Third, we did not assess the degree to which student scores predicted their likelihood of contaminating either themselves or patients during actual surgical cases. Finally, we have yet to correlate the implementation of our scrub training curriculum with improved patient outcomes.

In conclusion, this study provides validity evidence for the use of scores derived from a knowledge assessment in a mastery-based curriculum among medical and physician assistant students who demonstrated significant improvement after scrub training. The mastery-based curriculum described in this study could also be applied to teaching and assessing other EPAs. Future work could focus on further dissemination of our curriculum to gather additional validity evidence to support the use of our knowledge assessment tool across other institutions. Moreover, studies could evaluate the performance of students in actual surgical cases to determine if their knowledge test scores can predict their likelihood of contaminating themselves or the patient. In many institutions across the country, medical and physician assistant students routinely scrub in unsupervised, making a mastery learning curriculum for this activity that holds students to rigorous standards critical to ensuring patient safety.


The authors wish to thank perioperative services at Stanford Hospital and Clinics, especially Susan Gates, RN, BSN, whose ongoing commitment to student education and patient safety is exemplary.

References


1. Chapman SJ, Hakeem AR, Marangoni G, Raj Prasad K. How can we enhance undergraduate medical training in the operating room? A survey of student attitudes and opinions. J Surg Educ. 2013;70:326–333.
2. Ravindra P, Fitzgerald JE, Bhangu A, Maxwell-Armstrong CA. Quantifying factors influencing operating theater teaching, participation, and learning opportunities for medical students in surgery. J Surg Educ. 2013;70:495–501.
3. Miller S, Shipper E, Hasty B, et al. Introductory surgical skills course: Technical training and preparation for the surgical environment. MedEdPORTAL. 2018;14:10775.
4. Fernando N, McAdam T, Youngson G, McKenzie H, Cleland J, Yule S. Undergraduate medical students’ perceptions and expectations of theatre-based learning: How can we improve the student learning experience? Surgeon. 2007;5:271–274.
5. Fernando N, McAdam T, Cleland J, Yule S, McKenzie H, Youngson G. How can we prepare medical students for theatre-based learning? Med Educ. 2007;41:968–974.
6. Kirkland KB, Briggs JP, Trivette SL, Wilkinson WE, Sexton DJ. The impact of surgical-site infections in the 1990s: Attributable mortality, excess length of hospitalization, and extra costs. Infect Control Hosp Epidemiol. 1999;20:725–730.
7. Merkow RP, Ju MH, Chung JW, et al. Underlying reasons associated with hospital readmission following surgery in the United States. JAMA. 2015;313:483–495.
8. McGaghie WC. When I say … mastery learning. Med Educ. 2015;49:558–559.
9. Ericsson KA. Acquisition and maintenance of medical expertise: A perspective from the expert-performance approach with deliberate practice. Acad Med. 2015;90:1471–1486.
10. Association of American Medical Colleges. Core Entrustable Professional Activities for Entering Residency: Curriculum developers’ guide. Published 2014. Accessed July 19, 2019.
11. Shipper ES, Miller SE, Hasty BN, Merrell SB, Lin DT, Lau JN. Evaluation of a technical and nontechnical skills curriculum for students entering surgery. J Surg Res. 2017;219:92–97.
12. Drolet BC, Sangisetty S, Mulvaney PM, Ryder BA, Cioffi WG. A mentorship-based preclinical elective increases exposure, confidence, and interest in surgery. Am J Surg. 2014;207:179–186.
13. Antiel RM, Thompson SM, Camp CL, Thompson GB, Farley DR. Attracting students to surgical careers: Preclinical surgical experience. J Surg Educ. 2012;69:301–305.
14. Pilieci SN, Salim SY, Heffernan DS, Itani KMF, Khadaroo RG. A randomized controlled trial of video education versus skill demonstration: Which is more effective in teaching sterile surgical technique? Surg Infect (Larchmt). 2018;19:303–312.
15. American Educational Research Association, American Psychological Association, National Council on Measurement in Education. The Standards for Educational and Psychological Testing. Washington, DC: American Psychological Association; 2014.
16. Messick S. Validity. In: Linn RL, ed. Educational Measurement. New York, NY: American Council on Education and Macmillan; 1989.
17. Downing SM. Reliability: On the reproducibility of assessment data. Med Educ. 2004;38:1006–1012.
18. Leeper K, Stegall MS, Stegall MH. Basic aseptic technique for medical students: Identifying essential entry-level competencies. Curr Surg. 2002;59:69–73.
19. YouTube. Stanford scrub training video. Accessed July 19, 2019.
20. Yudkowsky R, Park YS, Lineberry M, Knox A, Ritter EM. Setting mastery learning standards. Acad Med. 2015;90:1495–1500.
21. Downing SM, Yudkowsky R. Assessment in Health Professions Education. New York, NY: Routledge; 2009.
22. Chen HC, van den Broek WE, ten Cate O. The case for use of entrustable professional activities in undergraduate medical education. Acad Med. 2015;90:431–436.


© 2019 by the Association of American Medical Colleges