
Empirical Investigations

Validity of Simulation-Based Assessment for Accreditation Council for Graduate Medical Education Milestone Achievement

Isaak, Robert S. DO; Chen, Fei PhD; Martinelli, Susan M. MD; Arora, Harendra MD; Zvara, David A. MD; Hobbs, Gene CHSE; Stiegler, Marjorie P. MD

Simulation in Healthcare: The Journal of the Society for Simulation in Healthcare: June 2018 - Volume 13 - Issue 3 - p 201-210
doi: 10.1097/SIH.0000000000000285


As part of the Next Accreditation System of the Accreditation Council for Graduate Medical Education (ACGME), anesthesiology residency training programs are required to evaluate residents twice per year on 25 distinct subcompetency milestones. Similar ACGME milestone requirements exist across medical specialties. Some of the milestone domains are particularly challenging to observe repeatedly and reliably during clinical care (eg, crisis or emergency management, difficult communications with patients and families). Simulation-Based Milestones Assessment (SBMA) provides a useful and efficient means to overcome these challenges to traditional resident evaluation methods. Simulation-Based Milestones Assessment can include complex computerized mannequin-based simulations or traditional Objective Structured Clinical Examination (OSCE) activities such as written problem-solving stations, standardized patient (SP) encounters, and interpretation of laboratory or imaging results. Simulation has been used in other contexts to evaluate anesthesiology resident skills, most often for the purpose of formative feedback.1–6 However, simulation is increasingly being considered as a method for summative assessments and higher-stakes evaluations, including successful progression through residency and certification examinations.7 Nevertheless, no criterion standard SBMA tool has been provided or validated by the ACGME.8 Although there are multiple studies evaluating the validity of simulation-based assessments,9,10 to date there are minimal data regarding the external validity of simulation assessment scores specifically for the intent of ACGME milestones evaluations.4 Consequently, there is a recognized need for additional research comparing simulation, or OSCE-style assessment tools, with other established methods of assessment for milestone evaluations.11

Simulation-based assessments may require a trade-off between validity and complexity.12 That is, more tightly controlled study variables increase the psychometric robustness but potentially decrease resemblance to real-life case encounters and vice versa. However, many of the milestones involve measurement of complex medical decision-making and other nontechnical competencies; evaluation of these skills requires complex simulations. Therefore, to evaluate the association between this SBMA program and other metrics of learning progression, this study analyzed whether (1) SBMA scenarios discriminate between postgraduate year (PGY) of training, (2) SBMA scores improve over time, and (3) SBMA scores correlate with other traditional measures of performance such as clinical evaluations.


This investigation was reviewed by the Office of Human Research Ethics at the University of North Carolina and was determined to be exempt from institutional review board approval. This was a retrospective analysis of SBMA data from a single residency program cohort of 55 anesthesiology residents, comprising 30 discrete milestone scenario assessments across four separate simulation sessions, with residents from at least two PGYs for comparison. The four simulation sessions occurred over a 2-year time span, with approximately 6 months between sessions. The SBMA activities included complex computerized mannequin-based simulations as well as traditional OSCE activities such as SP encounters, procedural demonstration on task trainers, and interpretation of laboratory or imaging results. Each scenario was evaluated for time-in-training discrimination (ie, performance scores clustered by PGY). Next, an analysis of two PGYs of residents' average SBMA scores between January 2015 and April 2016 was performed. Finally, SBMA performance was compared with residents' daily clinical evaluations (based on ACGME milestones) accrued during the 3-month period that immediately preceded each of the SBMA sessions. The daily clinical evaluations were completed by supervising clinical faculty, based on residents' performance within the context of a single day of clinical encounters with regard to milestone competencies. Assuming that the residents worked 6 days per week in the preceding 3-month period, each resident would have had up to 72 clinical evaluations (6 d/wk for 12 weeks).


Simulation-Based Milestones Assessment sessions were held twice a year at 6-month intervals to meet the biannual ACGME milestone reporting requirement. For a given SBMA session, all participants managed the same set of simulated encounters regardless of PGY level. Sessions consisted of four to eight scenarios, each of which was 7 to 15 minutes in duration. Each session presented unique sets of simulated scenarios that residents had not been exposed to previously. However, residents were familiar with the capabilities, expectations, and limitations of the SBMA environment as a result of their previous participation in a mandatory anesthesiology residency training simulation curriculum, requiring at least three half-day sessions per year. Further details of simulation capabilities and curricula development for these sessions have been previously reported.13 The scenarios recreated clinical situations using complex computerized mannequins (Laerdal SimMan3G, Wappingers Falls, NY), various procedural task trainers (Simulab, Seattle, Washington), standardized actors as patients or other healthcare professionals, props, and moulage to mimic patient care environments.


The simulation faculty created SBMA scenarios to address milestone competencies selected by the departmental Clinical Competency Committee. Milestone selection focused on those that were otherwise difficult to observe in routine clinical practice (eg, ethical issues such as obtaining consent from a Jehovah's Witness for major surgery) or that mandated significant involvement from the supervising physician such that observation and assessment of the resident were either unethical or impossible13 (eg, life-threatening emergency). Scenarios, corresponding milestones, participating PGY cohorts, and scores by PGY are outlined in Table 1. Participants were given written pre-encounter stem information to provide necessary background and/or medical information, as well as a defined goal for the encounter (Appendix 1, Part 2).

Table 1. Simulation-Based Milestone Assessment Scenarios, Milestones Assessed, and Scores by Year in Training


Professional “SP” actors were used in all applicable SBMA scenarios. Standardized patient actors were trained to portray clinical roles (eg, surgeon, nurse, medical student) or family members. Standardized patients did not participate in scoring. Standardization and SP training for the SBMA sessions were assured using a three-stage preparation pathway directed by the same expert simulation faculty member. Although no criterion standard for SP training is universally accepted, our process mirrored published best practices.14–16 The first (orientation) stage occurred 2 weeks before the SBMA and consisted of a case overview and script reading of each scenario, including particular areas for emphasis or nuance. Each SP was assigned to a single role in a single scenario. Each role was played exclusively by a single SP such that all participants had the same interpersonal experiences. In the second stage, the simulation faculty director, support staff, and SPs rehearsed each scenario with a full walk-through in the actual examination environment. During this stage, the simulation faculty director played the role of the participant, and a spectrum of potential participant behaviors was explored. The walk-through confirmed that each SP could perform any necessary physical tasks and understood the critical verbal cues. This allowed for identification of scenario design weaknesses, while scenario flow and feasibility were confirmed. Stages 1 and 2 of SP training were digitally recorded. Standardized patients were expected to study the videos and written notes for 2 weeks. In the third stage, a dress rehearsal was performed immediately before the first resident assessment on the SBMA day. During this rehearsal, a junior attending or anesthesiology fellow who was not involved in the scenario creation played the role of the participant, and the final rehearsal was observed by the simulation faculty director to ensure quality of SP performance.
This final check ensured the script was executed as intended, actor cues were accurate, and contingencies were addressed. Scripts contained preprogrammed changes in simulated patient vital signs, actor cues and responses, and physical environment details.


Faculty content experts developed behaviorally anchored analytical score sheets (Appendix 1, Part 1) for each scenario via a modified Delphi method, beginning with standard guidelines or algorithms (eg, ASA Practice Guidelines for Preoperative Fasting, ASA Practice Guidelines for Management of the Difficult Airway) when possible.17,18 To ensure objective, reliable scoring, concrete observable behaviors on the score sheets translated directly to each portion of the anesthesiology milestone rubric.19 All scenarios were captured via audiovisual recordings from two or three angles (CAE Healthcare Learning Space, Sarasota, FL). To determine the reliability of the scoring method (video review by one faculty rater who also serves as a clinical faculty supervisor and provides clinical evaluations for residents), two pilot studies were performed, demonstrating good to excellent interrater reliability.20,21 One pilot study of three scenarios with 41 resident participants and nine faculty raters demonstrated acceptable intraclass correlation coefficients (0.50–0.85) for agreement between one rater scoring live in real time versus two scoring separately by delayed video review, and paired t tests showed no significant mean score difference between raters using the two different assessment approaches. In a second pilot study, scores from faculty members who personally knew the residents were compared with scores from faculty at an outside institution who did not know the residents. All raters used video to observe and score the scenarios. We found strong correlation between raters' scores: scenario 1, r = 0.80; scenario 2, r = 0.58; and all 22 encounters combined (scenarios 1 + 2), r = 0.69, P < 0.0001.
We used these pilot data to inform our methodology in the context of the reported reliability among faculty performing standard, direct observation of residents during clinical care, which has been reported in the literature to vary widely, in one review from 0.6 to 0.87.22 In a study of direct observation assessment of milestones for surgical residents, faculty ratings had an interrater reliability near zero.23 Another study of five- and nine-point scoring tools for miniclinical evaluation exercise reported intraclass correlation coefficient of 0.4 and 0.43, respectively.24


Because of the logistics and nature of the SBMA sessions, residents did not receive feedback or debriefing immediately after the sessions. Rather, after all evaluations were completed, score results analyzed, and scenario validity confirmed, residents received the actual physical score sheets completed by faculty members. The score sheets contained the overall performance score, behaviors completed, and qualitative comments from the evaluators. In addition, residents were given another score sheet, in bar graph form, that showed their individual SBMA scores in comparison with the average of the residents in the same year of residency training (Appendix 2). Although residents did not have direct access to the recordings of their performances, they were asked to meet with their faculty mentor, who did have direct access to the recordings, and review their scenario performances together. Faculty mentors were encouraged to facilitate these meetings and asked to debrief the residents' performances after each session. Data on how often residents reviewed their performances with their faculty mentor were not collected.

Statistical Analysis

Thirty scenarios were analyzed for discrimination by PGY level. One-way analysis of variance (ANOVA) was conducted to examine differences between PGY cohort scores on each scenario. Four scenarios demonstrated simulation environment failures that resulted in breaches of scenario standardization and were therefore excluded from subsequent analysis of score trends and correlation with other performance measures (see Table 1 for details of the included and excluded scenarios). Residents' session average score trends over time were analyzed using repeated-measures ANOVA to determine whether residents' scores improved significantly between sessions. Post hoc tests using the Bonferroni correction were conducted to examine pairwise session score differences. Lastly, residents' scores in the four SBMA sessions were compared with other metrics of their clinical performance (ie, clinical faculty evaluations of patient care encounters). Because of the variability in the number of evaluations completed by faculty for given residents during the 3-month period before the SBMA session, Pearson correlation coefficients were calculated for the residents who had at least one clinical evaluation score. The analytical sample size of residents for each of the four correlation analyses ranged from 18 to 27. In addition, Cook distance (ie, Cook D) was used for outlier diagnostics of the bivariate linear relationship using SBMA score to predict clinical evaluation score. Cases with Cook D greater than 4/n (ie, 4 divided by the sample size) were deemed outliers for the bivariate relationship.
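The outlier screen and correlation described above can be sketched in a few lines. The following is a minimal illustration, not the authors' analysis code: it assumes a simple linear regression with an intercept (so two fitted parameters), uses the textbook closed-form leverage for one predictor, and runs on toy numbers that are not study data. The function names `pearson_r`, `cooks_distances`, and `flag_outliers` are hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cooks_distances(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Cook D for each case when y is regressed on x (slope plus intercept)."""
    n, p = len(x), 2  # p = number of fitted parameters
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    mse = sum(e * e for e in residuals) / (n - p)
    distances = []
    for a, e in zip(x, residuals):
        leverage = 1 / n + (a - mx) ** 2 / sxx
        distances.append((e * e / (p * mse)) * leverage / (1 - leverage) ** 2)
    return distances


def flag_outliers(x: Sequence[float], y: Sequence[float]) -> List[int]:
    """Indices of cases whose Cook D exceeds the 4/n cutoff."""
    cutoff = 4 / len(x)
    return [i for i, d in enumerate(cooks_distances(x, y)) if d > cutoff]


# Toy data (NOT study data): x plays the role of SBMA score,
# y the role of the mean clinical evaluation score.
sbma = [1.0, 2.0, 3.0, 4.0, 5.0]
clinical = [1.0, 2.0, 3.0, 4.0, 10.0]

outliers = flag_outliers(sbma, clinical)
kept = [i for i in range(len(sbma)) if i not in outliers]
r_all = pearson_r(sbma, clinical)
r_kept = pearson_r([sbma[i] for i in kept], [clinical[i] for i in kept])
print(outliers, round(r_all, 3), round(r_kept, 3))
```

In this toy sample only the last case exceeds 4/n, and the correlation is recomputed on the remaining cases, mirroring the exclude-then-correlate order described in the text.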


Of the 30 SBMA scenarios, 24 demonstrated discrimination by PGY level. Six SBMAs did not show discrimination by PGY. Upon review of the video data from these six scenarios, four were found to have simulation environment failures. Two of these four scenarios had irregularities related to the timing of patient state vital signs changes, possibly confusing the participants and resulting in wide variability in scores across participants in all PGY cohorts. One of the four scenarios relied on moulage (eg, simulated blood) that went unnoticed by most participants, while the other scenario used a simulated procedural task trainer (eg, airway mannequin) that did not adequately present the intended clinical challenge. As a result, in the latter two cases all participants had similar scores regardless of PGY training level. All four of these SBMAs were excluded from further analysis. The other two of the six scenarios that failed to show discrimination among any groups of participants did not have simulation-related problems; rather, they seemed to be “too easy” in the content being assessed. Nonetheless, these two scenarios were included in subsequent analysis. Although not reported within the article to maintain clarity of the results, analysis was also performed excluding these two scenarios, and the findings were consistent with the results reported.
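For readers who want to reproduce the per-scenario discrimination check, the one-way ANOVA F statistic can be computed as sketched below. This is an illustrative sketch under stated assumptions: the function name `one_way_anova_f` and the scores are hypothetical (not study data), and the P value, which requires the F distribution, is omitted; a statistics library such as SciPy (`scipy.stats.f_oneway`) returns both.

```python
from typing import Sequence, Tuple


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA.

    Each inner sequence holds the scenario scores of one PGY cohort.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more cases than groups")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variance; F is undefined")
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scores on one scenario for two PGY cohorts (NOT study data).
junior_scores = [2.0, 2.5, 3.0]
senior_scores = [3.0, 3.5, 4.0]
f_stat, df1, df2 = one_way_anova_f([junior_scores, senior_scores])
print(f"F({df1}, {df2}) = {f_stat:.2f}")
```

A scenario in which every cohort scores about the same yields an F statistic near or below 1, which corresponds to the "failed to discriminate" pattern that prompted video review.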

For residents who participated in all four sessions, repeated-measures ANOVA showed no interaction between elapsed time and year of training. The between-session score improvements were statistically significant when residents from PGY2 = 2013 and PGY2 = 2014 were combined (F (3, 54) = 17.79, P < 0.001) as well as when stratified by PGY2 year (F (3, 18) = 10.39, P < 0.001 for PGY2 = 2013, F (3, 36) = 9.41, P < 0.01 for PGY2 = 2014). Post hoc tests using the Bonferroni correction revealed that residents' SBMA scores improved significantly from the first to the fourth assessment period as shown in Figure 1 (P < 0.001 for 2 PGY cohorts combined, P = 0.03 for PGY2 = 2013, P = 0.01 for PGY2 = 2014; April 2016 vs. January 2015).
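The Bonferroni step can be illustrated with a short sketch. It rests on assumptions not stated in the article: that all pairwise comparisons among the four sessions were tested (six pairs) and that the familywise alpha was 0.05. The raw P value used here is invented for illustration, and `bonferroni_adjust` is a hypothetical helper name.

```python
from itertools import combinations

# Session labels as used in the Figure 1 legend.
SESSIONS = ["J15", "M15", "N15", "A16"]


def bonferroni_adjust(p_value: float, n_comparisons: int) -> float:
    """Bonferroni-adjusted P value, capped at 1."""
    return min(1.0, p_value * n_comparisons)


# Assumption: every pair of sessions is compared.
pairs = list(combinations(SESSIONS, 2))
n_comparisons = len(pairs)

familywise_alpha = 0.05  # assumed, not stated in the article
per_comparison_alpha = familywise_alpha / n_comparisons

raw_p = 0.002  # invented raw P value for one pairwise comparison
adjusted_p = bonferroni_adjust(raw_p, n_comparisons)
print(n_comparisons, round(per_comparison_alpha, 4), round(adjusted_p, 3))
```

Multiplying each raw P value by the number of comparisons is equivalent to testing each raw P value against the familywise alpha divided by that number.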

Figure 1. Average SBMA scores improve over time as residents progress through training. J15 = January 2015, M15 = May 2015, N15 = November 2015, A16 = April 2016.

Regression outlier diagnostics identified one observation of Cook D greater than 4/18 when using SBMA score to predict clinical evaluation score of May 2015. Thus, the observation was excluded from correlation analysis between the two scores of the session. Pearson correlation coefficients demonstrated moderate to strong correlation between residents' SBMA scores and their clinical evaluation scores for the 3 months preceding the SBMA session: r = 0.67, P < 0.01 (n = 27) for January 2015; r = 0.43, P = 0.09 (n = 17) for May 2015; r = 0.70, P < 0.01 (n = 24) for November 2015; and r = 0.70, P < 0.01 (n = 27) for April 2016. Scatterplots that represent the bivariate correlation between SBMA scores and clinical evaluation scores are shown in Figure 2.

Figure 2. Scatterplots demonstrate positive correlation between SBMA scores and clinical evaluations. The x-axis represents the SBMA scores (Jan2015 = January 2015, May2015 = May 2015, Nov2015 = November 2015, Apr2016 = April 2016) and the y-axis represents clinical evaluations for the 3-month period preceding the corresponding SBMA session (ClinicalJan15 = average of clinical evaluations gathered between November 2014 and January 2015, ClinicalMay15 = average of clinical evaluations gathered between March 2015 and May 2015, ClinicalNov15 = average of clinical evaluations gathered between September 2015 and November 2015, ClinicalApr16 = average of clinical evaluations gathered between February 2016 and April 2016). Grey areas represent the range of 95% confidence intervals of the fitted linear regression line.


Research specifically addressing the validity of SBMA activities for ACGME milestone assessment is lacking. Hastie and colleagues11 reviewed OSCE design and implementation for competency assessment in anesthesiology (including ACGME milestones and primary board certification), concluding with a call for centers to share their experiences with such assessments in the literature. Their review also emphasized the importance of reliability, validity, and rigorous iterative development, while acknowledging time, expertise, and cost as logistical challenges. As well, Sinz25 has described the educational basis and theory for integrating simulation with anesthesiology milestones and has described a method for developing OSCE-based milestones assessments, although this description does not include validation or analysis. Mudumbai et al4 published a small validation study of simulation-based assessments via correlation with markers for clinical ability and called for additional validation studies because simulation is increasingly used beyond formative feedback for high-stakes assessment.

This study aimed to evaluate the criterion and external validity evidence for SBMA scores as a metric of residents' ACGME milestone competency achievement by analyzing the association between those scores and three other markers of progressive competency: (1) whether SBMA scores discriminate between junior and senior residents, (2) whether SBMA scores improved as residents progressed through residency training, and (3) whether SBMA scores correlated to residents' other measures of clinical competencies as determined by their performance in the clinical setting by supervising attending physicians.

The results show that most SBMA scenarios discriminated among residents of different PGY levels. This finding supports the validity of using simulation encounters to evaluate milestones that would otherwise be difficult or impossible to capture in a predictable and repeatable way via routine clinical encounters. Of importance, the methodology described provides a means to screen SBMA scenarios for flaws or for insensitivity to differences among competency levels. For example, scenarios that fail to discriminate by PGY should be reviewed to identify whether they contain irregularities or failed execution of standardization (eg, SP error, technology failure, moulage issues). This can be accomplished by video review or feedback from individuals who participated in, or observed live, SBMA sessions (eg, SPs, faculty coordinators, staff, and resident participants). In some cases, review of scenarios that fail to discriminate among competency levels reveals no flaws; instead, the scenarios may be either “too hard” or “too easy.” Although scenarios that are “too hard” or “too easy” for all residents may be valid in the sense that they reflect true measures of performance, they may not be sensitive enough to infer varied competence levels among residents for assessment of milestone achievement. This is distinct from situations where simulation-based assessments are used solely to determine a minimum level of competency, such as judging appropriateness for new residents to perform invasive procedures without continuous supervision. As such, cases that fail to discriminate can be reviewed and deemed valid or not for the intended purpose of the assessment. Scenarios that are invalid because of simulation environment failures may be either revised or eliminated from the SBMA catalog. In either case, scores from invalid SBMA encounters should be expunged from residents' records.
Because faculty time and monetary resources have been identified as major barriers to simulation-based activities, identification of insufficiently sensitive scenarios allows precious simulation resources to be reallocated to other assessments or training.26,27

This study also demonstrated that SBMA scores improved over time. Just as residents' procedural skills and clinical knowledge are expected to increase as a result of accumulated patient care encounters and related didactic or self-directed learning, so should performance in simulated encounters that mimic patient care, medical decision-making, treatment, and interventions. Improvement may not always occur in a continuous upward trend, but significant improvement over a longer time interval should be expected if the SBMA represents the actual skill level. These results demonstrate improvement for the two academic year study period, which provides further support for their use as a valid metric of competency.

Finally, the positive correlation between resident performance on SBMA and clinical performance as evaluated by direct observation by faculty demonstrates that residents who generally perform less well than others receive scores consistent with that level of performance across all evaluations received. The same is true for residents who perform better than others. Residents' performance as measured during SBMA is similar to residents' performance as measured during clinical encounters. This correlation supports the validity of using SBMA as a supplement to capture and evaluate milestones competencies in clinical contexts that either are infrequent in nature (eg, challenging ethical situations) or require significant attending physician involvement that affects the appropriate assessment of a trainee's competency (eg, crisis management) and therefore may be less likely to be sufficiently observed during clinical care activities to meet the ACGME reporting requirements.

Because these SBMA demonstrated discrimination by PGY of experience, improvement over time as residents progress through residency training, and correlation with clinical evaluations, the results provide evidence for validity of SBMA as metrics of milestone achievement. This study is the first report that explicitly evaluates the validity of simulation-based or OSCE-style activities as a means to reliably generate ACGME milestone data for anesthesiology, particularly for those milestones that are clinically rare or otherwise difficult to capture during routine clinical activities.

There are some limitations of this study that should be noted. First, these data come from a single center and represent the output of a highly resourced simulation program that has a well-equipped simulation facility as well as faculty with extensive experience writing simulation scenarios, preparing simulation environments, training SPs, and standardizing simulation scenarios. Intercenter variability may therefore limit the generalizability of these findings to other centers. Second, the data were collected retrospectively. That is, the data used were obtained from an initiative that was initially intended to address the difficult task of thorough and objective milestone evaluations for a large residency program. Future prospective design of scenarios, with special attention to weighting difficulty and diversity of clinical scenarios, would be important for further evaluating and strengthening the body of evidence to support simulation-based milestone assessments. Third, the simulation-based milestones assessments focused on subcompetencies that can be elusive to observe during routine clinical activities and patient care, specifically professionalism-, communication-, and crisis management-related milestones. Future work should be conducted to examine whether these conclusions may be applied to milestones beyond the subset studied. It will also be helpful to investigate scenarios that are more homogeneous with regard to assessing behaviors that cannot reliably be observed or evaluated in actual patient care for ethical or logistical reasons. A fourth limitation of the study is attributable to the differences that exist between the clinical settings and SBMA settings. Clinical settings often provide nuanced contextual cues, such as dynamic physical environments or varied conversational responses from clinical experts, that are absent in simulated settings.
Future research in developing SBMA scenarios for in situ clinical environments or using clinical experts in place of SPs may help reduce this limitation. Lastly, as described in the methods section, most encounters were scored by a single rater for each SBMA scenario. The data set is composed of more than 1200 video clips, each between 7 and 15 minutes in length. It would be prohibitively resource intensive for most academic centers to dedicate two faculty raters to score each scenario and would certainly increase, rather than decrease, the burden of ACGME milestone reporting. This design also resembles real-life clinical care encounters and evaluations that are most likely completed by one single faculty member. Moreover, scoring sheets contain specific behaviors that provide clear, succinct, and objective elements for observation by a single observer. For these reasons, we believe single-rater video evaluation is at least as good as the current criterion standard of resident milestone competency evaluation (ie, single faculty evaluation per encounter and many different faculty raters across all clinical encounters).

In summary, SBMA scenarios discriminated among residents of different experience levels, SBMA scores improved over time, and there is good correlation between SBMA scores and clinical performance evaluations, providing evidence of construct and external validity for this set of SBMA scenarios. Simulation-Based Milestones Assessment may therefore be used as metrics of residents' ACGME milestone competencies and progress throughout residency. In addition, SBMA could serve as an adjunct to real clinical encounter observations for the purpose of reporting infrequently occurring, or difficult to observe, ACGME milestone competency scenarios.


1. Steadman RH, Huang YM. Simulation for quality assurance in training, credentialing and maintenance of certification. Best Pract Res Clin Anaesthesiol 2012;26:3–15.
2. Blum RH, Boulet JR, Cooper JB, Muret-Wagstaff SL. Harvard Assessment of Anesthesia Resident Performance Research G: simulation-based assessment to identify critical gaps in safe anesthesia resident performance. Anesthesiology 2014;120:129–141.
3. Stiegler MP, Gaba DM. Decision-making and cognitive strategies. Simul Healthc 2015;10:133–138.
4. Mudumbai SC, Gaba DM, Boulet JR, Howard SK, Davies MF. External validation of simulation-based assessments with other performance measures of third-year anesthesiology residents. Simul Healthc 2012;7:73–80.
5. Boulet JR, Murray DJ. Simulation-based assessment in anesthesiology: requirements for practical implementation. Anesthesiology 2010;112:1041–1052.
6. Fehr JJ, Boulet JR, Waldrop WB, Snider R, Brockel M, Murray DJ. Simulation-based assessment of pediatric anesthesia skills. Anesthesiology 2011;115:1308–1315.
7. American Board of Anesthesiology Newsletter. Raleigh, NC; March 2013.
8. ACGME: The Anesthesiology Milestone Project. The Accreditation Council for Graduate Medical Education and The American Board of Anesthesiology, 2015. Available at: Accessed February 12, 2017.
9. Brydges R, Hatala R, Zendejas B, et al. Linking simulation-based educational assessments and patient-related outcomes: a systematic review and meta-analysis. Acad Med 2015;90:246–256.
10. Cook DA, Zendejas B, Hamstra SJ, Hatala R, Brydges R. What counts as validity evidence? Examples and prevalence in a systematic review of simulation-based assessment. Adv Health Sci Educ Theory Pract 2014;19:233–250.
11. Hastie MJ, Spellman JL, Pagano PP, Hastie J, Egan BJ. Designing and implementing the objective structured clinical examination in anesthesiology. Anesthesiology 2014;120:196–203.
12. McIntosh CA. Lake Wobegon for anesthesia…where everyone is above average except those who aren't: variability in the management of simulated intraoperative critical incidents. Anesth Analg 2009;108:6–9.
13. Isaak R, Chen F, Hobbs G, Martinelli SM, Stiegler M, Arora H. Standardized Mixed-Fidelity Simulation for ACGME Milestones Competency Assessment and Objective Structured Clinical Exam Preparation. Med Sci Educ 2016;26:437–441.
14. Ker JS, Dowie A, Dowell J, et al. Twelve tips for developing and maintaining a simulated patient bank. Med Teach 2005;27:4–9.
15. Adamo G. Simulated and standardized patients in OSCEs: achievements and challenges 1992–2003. Med Teach 2003;25:262–270.
16. May W. Training standardized patients for a high-stakes Clinical Performance Examination in the California Consortium for the Assessment of Clinical Competence. Kaohsiung J Med Sci 2008;24:640–645.
17. American Society of Anesthesiologists C: Practice guidelines for preoperative fasting and the use of pharmacologic agents to reduce the risk of pulmonary aspiration: application to healthy patients undergoing elective procedures: an updated report by the American Society of Anesthesiologists Committee on Standards and Practice Parameters. Anesthesiology 2011;114:495–511.
18. Apfelbaum JL, Hagberg CA, Caplan RA, et al. Practice guidelines for management of the difficult airway: an updated report by the American Society of Anesthesiologists Task Force on Management of the Difficult Airway. Anesthesiology 2013;118:251–270.
19. Baker-Genaw K, Kokas MS, Ahsan SF, et al. Mapping direct observations from Objective Structured Clinical Examinations to the milestones across specialties. J Grad Med Educ 2016;8:429–434.
20. Isaak R, Hobbs G, Kolarczyk L, Balfanz G, Bass C, Stiegler M. Real time vs. delayed assessment of ACGME milestones through simulation scenarios. Abstracts of the 16th Annual International Meeting on Simul Healthc. Simulation in Healthcare 2015;10:399.
21. Rebel A, Isaak R, DiLorenzo A, et al. Rating objective standardized clinical examinations: it matters how (but not who). Anesth Analg 2016;122:S 117.
22. Swing SR. Assessing the ACGME general competencies: general considerations and assessment methods. Acad Emerg Med 2002;9:1278–1288.
23. Schott M, Kedia R, Promes SB, et al. Direct observation assessment of milestones: problems with reliability. West J Emerg Med 2015;16:871–876.
24. Cook DA, Beckman TJ. Does scale length matter? A comparison of nine- versus five-point rating scales for the mini-CEX. Adv Health Sci Educ Theory Pract 2009;14:655–664.
25. Sinz E. Simulation for anesthesiology milestones. Int Anesthesiol Clin 2015;53:23–41.
26. Savoldelli GL, Naik VN, Hamstra SJ, Morgan PJ. Barriers to use of simulation-based education. Can J Anaesth 2005;52:944–950.
27. Isaak RS, Chen F, Arora H, Martinelli SM, Zvara DA, Stiegler MP. A descriptive survey of anesthesiology residency simulation programs: how are programs preparing residents for the new American Board of Anesthesiology APPLIED Certification Examination? Anesth Analg 2017;125:991–998.

Appendix 1

Simulation-Based Milestone Assessment Scenario Sample

Part 1: Score Sheet for a Sample Simulation-Based Milestone Assessment

ACGME milestone language is provided in the left 2 columns. Scoring action/behavior elements translate milestone rubrics to the specific scenario context. The analytical behavior score range is between 0 and 5. Successful completion of all the behaviors for a given milestone level is worth 1.0; completing 1 of 2 behaviors successfully is worth 0.5 points; completing none is worth 0 points. The point value from each level of the milestone is tallied to obtain the total score for the milestone competency.
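The tallying rule described above can be illustrated with a short sketch. This is not the authors' scoring tool: the function names and the sample checklist are hypothetical, and the proportional credit given to levels with more than 2 behaviors is an assumption, because the text specifies only the cases of all behaviors, 1 of 2 behaviors, and no behaviors completed.

```python
def level_score(behaviors):
    """Score one milestone level from a list of True/False behavior results.

    All behaviors completed -> 1.0; 1 of 2 -> 0.5; none -> 0.0.
    Assumption (not stated in the text): other counts earn proportional credit.
    """
    if not behaviors:
        raise ValueError("each milestone level needs at least one behavior")
    return sum(behaviors) / len(behaviors)


def milestone_score(levels):
    """Tally the level scores to obtain the total milestone score (0 to 5 for 5 levels)."""
    return sum(level_score(behaviors) for behaviors in levels)


# Hypothetical checklist for one resident: one inner list per milestone level.
example_levels = [
    [True, True],    # level 1: both behaviors completed -> 1.0
    [True, False],   # level 2: 1 of 2 behaviors completed -> 0.5
    [True],          # level 3: single behavior completed -> 1.0
    [False, False],  # level 4: no behaviors completed -> 0.0
    [False],         # level 5: behavior not completed -> 0.0
]

total = milestone_score(example_levels)
print(total)  # 2.5
```

In this hypothetical case, the resident would receive a total of 2.5 out of a possible 5 for the milestone.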

Milestone: Interpersonal and Communications Skills 1: Communication with patients and families

Scenario Title: “Medical Error Disclosure”

Scenario Overview: Resident discloses to the mother a medication error involving her child, including an explanation of how the error occurred and a plan to prevent similar errors in the future.

Part 2: Scenario information provided to the resident immediately before the start of the interaction.

Background/Setting: Conference room/waiting room

HPI: You have just completed a laparoscopic appendectomy for a healthy 9-year-old boy (Michael Williams) with acute appendicitis. You took over the case before the beginning of surgical closing. The procedure lasted 45 minutes, but the patient was not initiating any breaths at the end of the surgery. A twitch monitor was placed and 0/4 twitches were observed.

After reviewing the medical record and inspecting the syringes in your work area, you realized that the previous anesthesia provider administered 2 mg/kg of cisatracurium instead of 0.2 mg/kg for induction. You made the decision to take the patient to the PICU intubated. The attending surgeon has already discussed the outcome of the surgery with the mother but left the OR before the decision to go to the PICU was made, so he did not inform the mother of the transfer. The previous anesthesia provider has already left the hospital and is unavailable. The preoperative assessment is shown below for your reference.

OSCE Scenario Objective: Your objective is to meet with the mother and explain the current situation and answer her questions.


PMHx: None

PSHx: T&A 6 years ago with no complications

Allergies: NKDA

Meds: None

Social: (−) smoker, (−) drinker, (−) other drugs.

NPO: 12 hours


BP: 90/50; HR: 90; SpO2: 99%


General: AAOx3

Airway: MP1, nl TM distance, no loose teeth

Lungs: CTAB

Heart: normal rate, normal rhythm, no murmurs

Neuro: grossly normal

Vascular: strong palpable pulses

Labs: None

Anesthesia record: in the electronic medical record and not physically present in the conference room

Part 3: Script used by standardized patient for the possible directions of the scenario.

Milestone: Professionalism 2

Title: Medication Error Disclosure

Setting: Family waiting room

Appendix 2
Sample Score Report for Jan 2015 SBMA for a single resident (blue) in comparison to the other residents in the same year in training (red).

Key Words: Milestone assessment; OSCE; ACGME; simulation assessment; OSCE validity; milestone assessment validity

Copyright © 2018 Society for Simulation in Healthcare