Rosen, Michael A. MA; Salas, Eduardo PhD; Silvestri, Salvatore MD; Wu, Teresa S. MD; Lazzara, Elizabeth H. BS
There are currently at least 2 rapidly evolving movements in graduate medical education in general, and specifically in Emergency Medicine (EM): 1) a focus on competency-based assessment,1 and 2) the use of simulation to train and assess complex skills.2 First, the Accreditation Council for Graduate Medical Education’s (ACGME) Outcome Project shifts the focus of Graduate Medical Education program accreditation from an assessment based on the structural and process features of the graduate program to the actual learning outcomes of residents. This is a shift from evaluating a program’s potential to educate to an evaluation of how much that program’s residents are actually learning. Therefore, a significant challenge for residency programs is the development of measurement tools that indicate the degree to which their residents are learning. Second, the use of simulation-based training (SBT) as a technique to provide guided learning experiences that replicate real world experiences and accelerate the development of expertise is on the rise.3,4 This is especially true for EM, where the characteristics of the discipline (eg, time pressure, highly dynamic environments5) make the contextualized practice opportunities of SBT particularly valuable.
This article addresses critical needs in both competency-based assessment and medical SBT by providing a generalizable methodology for systematically linking scenario development, performance measurement, and feedback to explicitly defined learning objectives rooted in the ACGME core competencies. This method is called the Simulation Module for Assessment of Resident Targeted Event Responses (SMARTER) Approach.6 The product of the SMARTER process is a set of simulation scenarios and accompanying measurement tools that capture performance during SBT. Subsequently, this measurement can be used to generate corrective feedback and assessment specific to ACGME competencies. This approach is not offered as a universal solution (eg, not all ACGME core competencies are best taught or assessed with the use of simulation, and SMARTER represents only one method of meeting these goals with simulation), but it is an addition to the toolbox of approaches needed to train and assess the complexities of the ACGME core competencies.1,7,8 SMARTER is one method for developing training and measurement tools, and (as with any one method) there are associated strengths and weaknesses. An overall curriculum should incorporate a broad spectrum of learning experiences, and learning outcomes should be assessed through a variety of methods. There is an extensive, scientifically based literature capable of guiding the development of curricula (eg, content development, test specification, scoring, and feedback). The SMARTER approach leverages portions of this literature into practical guidance for application to SBT. Many of the choices made during the development of the SMARTER approach represent tradeoffs between different techniques and methods. Before presenting SMARTER, a brief overview of the rationale of the methodology is provided.
OVERVIEW OF SBT, PERFORMANCE MEASUREMENT AND SMARTER
Measurement is a critical component of building reliable healthcare systems.9 Systematic change (toward increased safety) requires a methodological approach to measurement. Measurement guides change, but misconceived and poorly designed measurement systems misguide development for both the individual resident and the residency program. That is, a resident’s education is influenced by performance feedback during training, and residency programs must now evaluate the effectiveness of and adjust their training methods based on the measurement of resident learning outcomes. Both are dependent on the quality of performance measurement. The core challenge addressed by the SMARTER approach is the development of measures that are diagnostic of the knowledge, skills, and attitudes (KSAs) underlying the ACGME core competencies. The importance of performance measurement in SBT increases with the rise in use of SBT in EM residency curricula.10
How can SBT Help?
SBT is a methodology for accelerating the acquisition of expertise through the provision of structured learning experiences.11–13 It is a technique, and not a specific technology,4 that emphasizes guided practice activities in learning environments that reproduce certain aspects of the real world performance setting.
Therefore, because of its emphasis on dynamic practice, SBT is well suited for training and evaluating the ACGME core competencies, especially those involving dynamic interaction (eg, procedural skills, professionalism). First, it provides opportunities for performance and practice in an environment that replicates much of the “real world” complexity of clinical performance. Exposure to a learning environment that closely resembles the actual clinical environment increases the chances that learning will transfer to on-the-job performance.14 Similarly, measures of performance in simulation are likely to be more predictive of on-the-job performance than measures more distal from actual performance (ie, scores on a written test1) and therefore provide an opportunity for valid assessment. Second, SBT provides an error-tolerant and safe environment for practice; real patients are not at risk. This eliminates the competing goals of patient care and education in a resident’s clinical experience. Third, because competing goals are eliminated, SBT provides an opportunity for immediate feedback on performance, which is critical to learning.15 Fourth, SBT offers control over the content of experience. The resident’s practice opportunities are no longer dependent on chance events such as the available patient population. This provides an added benefit for assessment, as SBT scenarios can be chosen systematically so that the ACGME core competencies are adequately sampled. Fifth, using structured observation protocols during SBT (ie, linking measurement to the structure of the simulation scenario) for the purposes of guiding feedback may eliminate some of the variability in the resident’s education because of the differing teaching abilities of clinicians. This structured measurement in dynamic situations also affords reliable and diagnostic assessment of complex performance.
The technology used during SBT can vary widely (eg, high-fidelity patient simulators, standardized patients, part-task trainers). The underlying commonality in the use of these various technologies is the provision of practice activities for the learner. The SMARTER approach can be applied to the development of any guided and dynamic practice activities. Other factors (eg, purpose, level of expertise of the trainee, and type of KSAs being trained) will determine the appropriateness of a specific technology for a given instance of SBT.
The SMARTER Contribution
SMARTER is an event-based approach to training (EBAT) and measurement16,17 that capitalizes on the unique opportunities afforded by SBT. This approach allows for the development of valid and reliable, formative and summative assessments of complex performance. The central contributions of the SMARTER approach involve providing a systematic means to 1) developing and maintaining links between simulation scenario events, performance measures, and ACGME core competencies, 2) generating diagnostic measurement that feeds the processes of providing corrective feedback to accelerate skill acquisition and providing learning outcomes data rooted in the ACGME core competencies, and 3) providing opportunities to perform that are structured to maximize learning and assessment opportunities. The SMARTER approach is discussed in detail in the following section.
THE SMARTER APPROACH
The SMARTER approach to assessment and feedback of ACGME core competencies involves an 8-step process, as illustrated in Figure 1. Three examples will be referred to throughout this section; the first is a scenario developed to capture the medical knowledge core competency in the context of a septic patient, the second captures the patient care core competency in the context of an emergency department resuscitation, and the third captures aspects of the professionalism core competency in the context of a refusal of treatment scenario. Several documents are created throughout the SMARTER process that serve to maintain connections between the scenario content, measurement tools, competencies, and learning objectives. These are summarized in Table 1 and full versions are available through the corresponding author.
Focus on a Subset of ACGME Core Competencies
As the main purpose of SMARTER is to provide linkages between ACGME core competencies, SBT, and measurement, the SMARTER process begins with clearly articulating a subset of focal competencies. Performance in EM is frequently complex and capturing all of it at once in a meaningful manner is not practical18 or necessary. Therefore, in the SMARTER approach, each scenario focuses on a specific subset of ACGME core competencies (eg, one competency or a small subset of several). The 3 examples provided in this document focus on medical knowledge, patient care, and professionalism. Essentially, this can be viewed as a sampling strategy, where each scenario developed is designed to sample a specific part of the core competencies content. Ultimately, many scenarios are developed, each focusing on a different portion of the core competencies, or on the same competencies in different clinical situations. This strategy has several benefits for assessment and training.
By nature, simulation of any one clinical task provides insight into a limited range of a resident’s KSAs.1,19,20 A resident’s performance on one clinical procedure may not yield much information about how that resident will perform on a different procedure. From an assessment perspective, using multiple scenarios that sample different competencies generates a more robust picture of a resident’s competency level (eg, can this individual demonstrate proficient patient care across a broad range of disease processes?). This strategy is akin to triangulation, where multiple measures are used to indicate a resident’s overall level of proficiency.21 For example, a single core competency can be evaluated in different clinical situations, and in this way, through viewing a competency from multiple “perspectives,” a clearer picture of overall competency can emerge. From a training perspective, each scenario provides an opportunity for in-depth feedback on a particular area of the core competencies. Over many such focused scenarios, the resident is exposed to variations in cases present in the work environment (eg, chances to practice aspects of the same core competencies over different cases and disease processes), which accelerates the acquisition of expertise in complex domains.22 Additionally, from a measurement perspective, narrowing the focus in terms of what is measured is essential to maintaining reliable observations. Even highly trained and experienced observers are limited in what they can reliably rate.23
In sum, starting the development process with the ACGME core competencies will facilitate a systematic sampling of the KSAs that need to be trained. This will help to focus the scenario and measurement tool development, allowing a curriculum developer to craft a learning experience around competencies and not just sample from an unorganized list of disease processes. For example, taking acute myocardial infarction as the clinical situation, the scenario and measurement tools will look quite different if the purpose is to evaluate medical knowledge or interpersonal and communication skills.
Define Specific Learning Objectives Rooted in ACGME Core Competencies
ACGME core competencies are defined at a high level of abstraction and are intended to be applicable across specialty areas and to serve as a guide in developing specific learning objectives and measurement tools.24 Therefore, the second step in the SMARTER process is to define specific learning objectives that are rooted in the ACGME core competencies and are at a level that can be effectively trained and measured. Learning objectives are specifications of what a scenario is intended to instruct. For EM, as for other specialty areas, the integration of ACGME core competencies into the model of clinical practice helps to ground the core competencies and to articulate them in a more concrete fashion.25 However, these too may need further refinement to generate process measures of performance. The learning objectives for the 3 examples in this article have been taken directly from the ACGME core competencies and are listed in Table 1.
Choose a Clinical Context to Frame the Scenario Development
Once a set of focal core competencies has been chosen, and steps have been taken to define the learning objectives, a clinical scenario that can be used to assess the learning objectives should be chosen. For example, if the learning objective is to train or assess compassion, the selected scenario may be one centered on a victim of domestic violence. If an educator wants to evaluate professionalism in their residents, a scenario involving a patient’s refusal of services may be chosen. For instance, our third example scenario rooted in the ACGME core competency of professionalism involves the management of a Jehovah’s Witness patient who is refusing treatment for an abdominal aortic aneurysm on the basis of his religious beliefs. Clearly, there are a multitude of clinical scenarios that can be used to examine each of the core competencies and associated learning objectives. In the aviation domain, this has been addressed by developing libraries of scenarios so that competencies can be evaluated over many different performance contexts.26,27
Develop a Set of Targeted KSAs to Capture the Predefined Learning Objectives and Core Competencies
Developing a set of targeted KSAs is a critical step in bridging the gap between learning objectives rooted in the predefined ACGME core competencies and performance measures tied to the chosen clinical scenario. KSAs are the underlying knowledge, skills, and attitudes that a resident must have to exhibit effective performance28; that is, they are what a resident must know and be able to do to meet the learning objectives. These KSAs serve as the building blocks of performance measures and the development of scenarios. The purpose of training is to build the targeted KSAs in the resident and the purpose of assessment is to determine if the resident possesses the KSAs necessary for clinical performance. Therefore, carefully articulating the KSAs needed to meet the learning objectives is critical. Listings of KSAs are most readily generated by asking skilled individuals (ie, subject matter experts) to list such things as what distinguishes good performers from poor performers and why, and what is necessary to perform effectively in specific clinical scenarios or disease processes. Choosing a specific disease process or other clinical situation for the scenario goes a long way in defining the KSAs necessary for effective performance. In this way, KSAs can be generated by asking, what does the resident need to know and be able to do to effectively manage a chosen clinical situation? How are the training objectives manifested within a resident’s performance during a specific clinical situation? Methods of cognitive task analysis can be used to elicit the knowledge of domain experts in EM to generate lists of KSAs.29,30 Although this process can be labor intensive, it is a powerful tool for identifying the KSAs underlying expert performance. In addition to cognitive task analysis, literature reviews of evidence-based practice can serve as a valuable source for generating lists of KSAs.
In the medical knowledge scenario the resident is required to diagnose and treat a patient with sepsis. The KSAs for this scenario were developed by asking a series of questions in the context of a septic patient. What does the resident need to know and be able to do to demonstrate an investigatory and analytical thinking approach to clinical situations (learning objective 1 for this scenario)? What does the resident need to know and be able to do to know and apply the basic and clinically supportive sciences appropriate to the discipline (learning objective 2 for this scenario)? Answering questions such as these led to the 6 KSAs for this task, which are listed in Table 1.
Defining the KSAs is perhaps the most critical step in the SMARTER process. Steps 1 and 2 are generally provided by the standardized core competencies and learning objectives. The effectiveness of all steps following the definition of KSAs depends on an accurate and complete list of KSAs. Because the KSAs needed are so dependent on which disease process or other clinical situation is being evaluated (eg, the KSAs needed to treat sepsis are different than those needed to perform resuscitation), the choice of these clinical events plays a central role in determining what KSAs can be trained or assessed in a particular scenario. Clinical events are often framed in terms of criticality and frequency. Although criticality and frequency are not orthogonal, the polar extremes are generally considered high-frequency/low-criticality and low-frequency/high-criticality cases. Choosing the correct type of clinical event from this spectrum depends on several factors, including the purpose of the simulation and the degree of expertise of the resident. For example, if the purpose of the simulation is training and the resident is a relative novice, then high-frequency events of low criticality are useful, as they provide a safe environment to practice routine KSAs that the resident has not had the opportunity to develop. However, when training more advanced residents who have mastered these routine procedures, simulating lower-frequency events provides an opportunity to practice new KSAs.
Craft the Scenario to Ensure Residents Have the Opportunity to Display Targeted KSAs—Define the Critical Events
Practice, without competency-based guidance, does not make perfect, especially for complex tasks like those in EM.31,22 Learners must be guided through practice opportunities to ensure that they learn the correct KSAs.31–33 Similarly, from an assessment perspective, the scenario must be standardized in a way that allows for comparison of results either to a set performance criterion or to the performance of other residents. Once the KSAs have been clearly defined, the scenario must be carefully engineered to ensure that the resident has the appropriate opportunities to perform. The scenario must elicit performance that requires the resident to display the targeted KSAs. This is achieved by inserting trigger or critical events into the scenario.
A trigger event is a routine or unexpected prompt defined in terms of changes in the state of the situation originating from sources under the control of those conducting the simulation (eg, verbal communication from confederates, changes in simulated patient physiology). The key is that the events are controllable and elicit observable behaviors linked to KSAs and learning objectives. The types of events inserted into a scenario contribute to the validity of the measurement (ie, events determine what the resident is required to do and hence what can be measured in any scenario) and the number of events influences the reliability of the measure (ie, more items on a test yields more internally consistent tests17,34).
For example, in the sepsis case evaluating the core competency of medical knowledge, one of the learning objectives requires that the resident knows and applies the basic and clinically supportive sciences appropriate to the discipline. This learning objective was used to define the KSA that involves the resident recognizing the indications for early goal-directed therapy (EGT) and applying the algorithm for EGT in sepsis. To assess this particular KSA, the scenario was crafted to give the resident multiple opportunities to demonstrate comprehension of EGT. Critical events crafted to elicit this response include verbal cues from the confederate nurse and changes in the patient’s physiology that render initial resuscitation attempts fruitless. A sample of the events for each of the 3 example scenarios is included in Table 1.
Define a Set of Targeted Responses
Each of the critical events defined in the preceding step needs to be linked with targeted responses—objectively observable behaviors exhibited by the resident in response to the critical events. That is, for each deliberately inserted event in the scenario, a list of acceptable and directly observable resident behaviors needs to be developed. These behaviors are indicators of the presence or absence of the targeted KSAs. Targeted responses indicate whether or not (or to what degree over multiple responses) a resident possesses the explicitly defined KSAs. This chain provides an explicit connection between what is measured and the core competencies. For example, in the patient care scenario, a critical event occurs when the confederate attaches the cardiac rhythm leads. When this occurs, there are 4 targeted responses expected of the resident: 1) recognizing rhythm and verbalizing ventricular fibrillation, 2) ordering immediate defibrillation at 360 J, 3) continuing cardiac compressions, and 4) ordering intravenous access with normal saline. Each of these responses is linked to the KSA of recognizing cardiac rhythm, which is in turn representative of the competency of patient care and the specific learning objective of gathering essential and accurate information about the patient in the context of the resuscitation scenario.
Although the goal of clinicians is to practice evidence based medicine, individual styles and preferences are involved in daily practice. Even when presented with the exact same clinical scenario, the targeted responses defined by one clinician may differ from those generated by another. In a patient who presents with overwhelming sepsis, one educator may be satisfied by a resident ordering a venous lactate, where another may want their residents to consistently order arterial lactate levels. Both actions demonstrate a general understanding of the pathophysiology of sepsis. Because various practice patterns do exist, it is important to define responses that are globally accepted critical actions, and not ones that may be influenced by individual preference. However, when there is an unequivocal and evidence-based correct action in a given situation that is the response that should be trained and measured.
The SMARTER integration form tracks the linkages between ACGME core competencies, learning objectives, KSAs, critical events, and targeted responses. A summary is provided in Table 1. It is frequently difficult to define a specific set of behaviors linked to one and only one event. Therefore, complex mappings of events to responses are possible: one event may elicit many responses, and one response may be linked to several events. Choice of responses and events should provide discrimination. That is, they should not be so basic that everyone does them, and they should not be so far above the residents’ level of expertise that no one does them. In both of these situations there will be no variability in scores (which makes them useless for assessment) and no strong basis for feedback. The level of discrimination of a set of events and responses will change based on the expertise level of the residents being trained or evaluated. By associating responses with specific events, the groundwork is put in place for capturing the timeliness of actions. For instance, 2 residents may both perform the same actions but at drastically different times, or a resident may not perform a specific targeted response but later in the scenario perform an equivalent action. Events serve as landmarks throughout the scenario from which latency measures can be derived to capture these differences.
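The chain of linkages tracked by the integration form (competency, learning objective, KSA, event, targeted responses) can be represented as a simple data structure. The Python sketch below is purely illustrative; the class and field names are hypothetical, and the example entry is modeled loosely on the resuscitation scenario described above, not taken from the actual SMARTER form.

```python
from dataclasses import dataclass

@dataclass
class TargetedResponse:
    description: str  # objectively observable behavior
    ksa: str          # the KSA this behavior indicates

@dataclass
class CriticalEvent:
    description: str
    responses: list   # one event may elicit many targeted responses

# Hypothetical entry modeled on the resuscitation example: the
# confederate attaching the rhythm leads elicits 4 targeted responses,
# each linked to the KSA of cardiac rhythm recognition.
leads_event = CriticalEvent(
    description="Confederate attaches cardiac rhythm leads",
    responses=[
        TargetedResponse("Recognizes and verbalizes ventricular fibrillation",
                         "Cardiac rhythm recognition"),
        TargetedResponse("Orders immediate defibrillation at 360 J",
                         "Cardiac rhythm recognition"),
        TargetedResponse("Continues cardiac compressions",
                         "Cardiac rhythm recognition"),
        TargetedResponse("Orders intravenous access with normal saline",
                         "Cardiac rhythm recognition"),
    ],
)
```

Because events and responses can map many-to-many, a fuller implementation would allow a response to reference several events (and vice versa) and could timestamp events to serve as the landmarks from which latency measures are derived.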
Create Diagnostic Measurement Tools
After targeted responses for each event have been identified, event-based measurement tools can be developed readily. In the most basic case, these measurement tools take the form of event-based checklists. An example is provided in Figure 2. The events are simply ordered in time and the associated responses are grouped for each event. A check box is provided for the rater to mark whether or not the resident performed the behavior. Scores on these measures can be used in several ways. Percent “hit” (ie, number of targeted responses performed) can be calculated for each event or KSA and used as an outcome measure to assess level of proficiency or identify which KSAs the resident needs to focus on developing. Additionally, process-oriented feedback based on the flow of events and responses can be generated. SMARTER tools capture performance variations over time in that there are multiple opportunities to perform within a given scenario (ie, multiple events) which are linked to a common dimension (ie, KSAs and core competencies). This affords a detailed look at the resident’s performance over time that is lost with measures operating at a lower level of temporal resolution (eg, a global rating scale forces an averaging of performance over time). For learning purposes, it is sometimes necessary for the instructor to intervene during performance. To accurately diagnose the resident’s level of competency, there needs to be a way to distinguish between a behavior performed independently by the resident and one performed with instructor guidance (IG). A box labeled “IG” on the SMARTER observation forms provides a means to capture this information.
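As a concrete illustration of the percent-“hit” scoring described above, the following sketch computes the proportion of targeted responses performed, grouped by KSA. The data layout and KSA labels are hypothetical and do not reproduce the actual SMARTER forms.

```python
# Each checklist entry: (KSA, performed?) — one dichotomous rating
# per targeted response, as marked by the rater's check boxes.
checklist = [
    ("Recognize indications for EGT", True),
    ("Recognize indications for EGT", True),
    ("Recognize indications for EGT", False),
    ("Apply EGT algorithm", True),
    ("Apply EGT algorithm", False),
]

def percent_hit(entries):
    """Percent of targeted responses performed, grouped by KSA."""
    totals, hits = {}, {}
    for ksa, performed in entries:
        totals[ksa] = totals.get(ksa, 0) + 1
        hits[ksa] = hits.get(ksa, 0) + int(performed)
    return {ksa: 100.0 * hits[ksa] / totals[ksa] for ksa in totals}

scores = percent_hit(checklist)
```

Scores computed this way can serve either as outcome measures of proficiency or as pointers to the specific KSAs a resident should focus on developing.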
Two features of these checklists have proven extremely beneficial. First, because the checklists are event-based and the scenarios are scripted a priori, the rater knows when each event will occur. This allows the rater to focus on the critical aspects of performance at the right time and hence reduces workload, which in turn increases reliability. Second, the raters are only asked to score the presence or absence of objectively defined behaviors (ie, did the resident do behavior x?). By using dichotomous ratings, observers are only asked to detect whether or not a behavior occurred; they are not asked to make judgments about the quality of the response. This affords greater reliability, as observers do not have to be “calibrated” to each other or to a set performance criterion.35 For example, it has been shown that experts and novices can achieve equally high levels of reliability using dichotomous event-based behavioral ratings.36 There are tradeoffs between different types of scoring and an extensive literature base exists on the relative strengths and weaknesses of these approaches. We have outlined the strengths of rating explicit behaviors above; however, one tradeoff is that developing these types of measures can involve considerable time and effort.
There are variations that can be introduced to the basic form of the event checklist. First, defining “acceptable” responses may cause debate. Ideally, as previously discussed, all targeted responses should have evidence-based support; however, stylistic responses can be included (eg, if a resident administers antibiotics, do they have to be given IV? Or does PO or IM dosing earn the same score?). If this is done, a hierarchical structure can be used where evidence-based and stylistically based responses are weighted differently. Second, time is a critical aspect of performance, especially in EM. Latency measures or ratings of timeliness can be added to the measurement of responses by associating time windows with each event.37 That is, the time from onset of an event to the response of the resident is measured. This is more easily accomplished when SMARTER observation forms are electronic so that latency information can be captured automatically. Third, the dichotomous rating can be replaced with a Likert scale or other multipoint rating where the quality of the response is rated (eg, the resident may have performed the required action, but it was poorly done). When observers must make judgments on the quality of a behavior and not just whether it occurred, more training is needed to ensure that responses are rated reliably across observers and that ratings do not drift over time. The decision to use behavioral indicators versus more general rating scales involves a tradeoff between time spent developing items (ie, creating behavioral indicators is frequently more time consuming) and time spent calibrating raters (ie, more general rating scales are more susceptible to decay in reliability).
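The latency measure described above can be sketched as follows. The event onset time, response time, and acceptable time window are hypothetical values chosen only for illustration; an electronic observation form could capture these timestamps automatically.

```python
def response_latency(event_onset_s, response_time_s, window_s):
    """Return the latency (seconds from event onset to the resident's
    response) and whether the response fell inside the acceptable
    time window associated with the event."""
    latency = response_time_s - event_onset_s
    return latency, 0 <= latency <= window_s

# Hypothetical timing: rhythm leads attached at t = 120 s; the resident
# orders defibrillation at t = 150 s; acceptable window of 60 s.
latency, in_window = response_latency(120, 150, 60)
```

Anchoring latency to scripted events in this way is what lets two residents who perform the same actions be distinguished by when they performed them.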
EBAT measurement tools have been developed and validated for the purposes of training and assessment in numerous complex domains such as military command and control,38 aircrew coordination,39 problem solving skills,40 and distributed teamwork training.41 In these cases, EBAT measurement tools have proven to have sound psychometric properties including high reliability and sensitivity to defined constructs underpinning performance.17 When the purpose of measurement is summative assessment, there is an added step in the process of creating a criterion by which scores in the simulation can be judged. In this case, scoring each response in terms of criticality becomes more important (eg, weighting the points attached to each response relative to importance in the scenario). Determining response weighting has been achieved through expert opinion.42–45 Additionally, the evidence-based literature can serve as a basis for weighting responses.
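When responses are weighted by criticality for summative scoring, a simple weighted percent score can be computed as below. The weights shown are hypothetical; in practice, as noted above, they would come from expert opinion or the evidence-based literature.

```python
def weighted_score(responses, weights):
    """Criticality-weighted score: each performed response earns its
    assigned weight; the result is a percent of the maximum possible."""
    assert len(responses) == len(weights)
    earned = sum(w for done, w in zip(responses, weights) if done)
    return 100.0 * earned / sum(weights)

# Hypothetical weights for 4 targeted responses (3 = most critical,
# 1 = routine); True means the response was performed.
score = weighted_score([True, False, True, True], [3, 1, 2, 1])
```

Unlike a raw percent hit, this weighting lets a missed critical action lower the score more than a missed routine one, which matters when scores are judged against a summative criterion.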
Create Scenario Script
A scenario script is a plan for how the simulation will unfold, a means to coordinate the multiple actors (eg, confederates and simulation technology) so that the simulation meets its objectives. Creating a script ensures that everyone involved in the simulation (eg, confederates) is on the same page: each person knows when to do or say what. A well-designed script ensures that critical events occur at the right time. It also standardizes the “noise” (ie, noncritical event aspects of the simulation) that occurs from session to session. This is especially important when using simulations for high-stakes assessment, when performance is being compared with others in a group or to some set criterion.
Generating the script falls near the end of the SMARTER process, after the definition of competencies, learning objectives, KSAs, critical events, and targeted responses. However, it is unfortunately common practice for simulation programs to create the scenario script first and then define their critical action measurement tools based on the written case. This amounts to working backward from the SMARTER perspective—using an existing case versus systematically identifying the KSAs that need to be trained and engineering practice opportunities capable of training and evaluating the targeted KSAs.
The SMARTER methodology outlined above represents the first effort at adapting the successful event-based measurement approach to EM and, as such, will need further testing and refinement. However, SMARTER holds promise for addressing critical needs. First, SMARTER characterizes the complexity of a resident’s performance in terms of objective behaviors that are responses to events defined a priori. This combination of focusing on the presence or absence of key behaviors and tight control over events occurring during the simulation allows for reliable and valid measurement of complex performance. Additionally, because SMARTER measurement tools are process-oriented (ie, they capture the dynamic behaviors of performance, not just the end result), SMARTER provides a basis for diagnostic feedback. That is, linkages between the resident’s responses and KSAs allow for detailed assessment of the areas where the resident needs improvement. This feedback can be developed rapidly after a simulation session. Debriefing aids can be developed automatically based on computerized versions of the event-based checklists. The SMARTER process also provides a systematic methodology for addressing the content of simulation scenarios that clearly links scenario development to the ACGME core competencies. Clear connections are drawn between what is simulated, what is measured, and the core competencies. SMARTER also provides a basis for structured and systematic feedback that addresses the KSAs underlying performance and thus removes some of the variability in a resident’s learning experiences.
Is SMARTER Practical?
A rigorous evaluation of the practicality, reliability, validity, and diagnosticity of the measurement tools and scenarios developed using the SMARTER approach is currently underway. However, a preliminary evaluation has provided some initial support. A total of 29 observers, including physicians at all levels of training (attending physicians, PGY I’s, II’s, and III’s) and medical students, viewed a video of the sepsis scenario as a group and individually rated the resident’s performance using the SMARTER observation form. See Figure 2 and Table 1 for samples of the observation form, critical events, targeted responses, and linkages to KSAs and competencies. Despite several limitations of this evaluation (ie, raters were not familiar with the simulation scenario beforehand and were given only a brief introduction to the SMARTER tool, and poor audio and video quality during the presentation of the scenario interfered with raters’ ability to discern events and responses), there were reasonably high levels of agreement between raters. The average level of agreement with a “gold standard” scoring of the video was 78.12% (SD = 7.95) across a total of 14 events and 62 targeted responses. Because of technical complications during playback of the videotaped session (ie, several raters reported difficulty understanding the audio) and the observers’ unfamiliarity with the scenario and the measurement tool, this represents a conservative estimate of the levels of inter-rater agreement achievable using SMARTER observation forms. Additionally, after rating the scenario, each observer completed a brief questionnaire concerning their experience and reactions to using the SMARTER observation form.
On a 7-point scale (with 7 indicating a positive response), observers indicated that the SMARTER form was useful for completing assessments of specific tasks (mean = 5.87, SD = 1.1), generating feedback during a debriefing session (mean = 5.83, SD = 1.1), and assessing resident performance (mean = 5.47, SD = 1.3). Additionally, the number of responses rated was judged to be appropriate (mean = 5.1, SD = 1.5), and overall, the form was rated as easy (mean = 4.8, SD = 1.2) and practical (mean = 4.93, SD = 1.2) to use. Again, because of the observers’ lack of familiarity with the scenario and the audio/video quality, we view these as conservative estimates. A more rigorous evaluation of these issues is currently underway.
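The agreement statistic reported above compares each rater’s binary item scores against a gold-standard scoring and then averages across raters. A minimal sketch of that computation follows; the scores shown are invented illustrations, not the study data:

```python
def percent_agreement(rater, gold):
    """Percentage of targeted-response items on which a rater matches the gold-standard scoring."""
    matches = sum(r == g for r, g in zip(rater, gold))
    return 100.0 * matches / len(gold)

# Invented example: gold-standard scoring of 8 targeted responses (1 = performed)
gold = [1, 1, 0, 1, 0, 1, 1, 0]
raters = [
    [1, 1, 0, 1, 1, 1, 1, 0],  # one disagreement with gold -> 87.5%
    [1, 0, 0, 1, 0, 1, 0, 0],  # two disagreements with gold -> 75.0%
]
scores = [percent_agreement(r, gold) for r in raters]
mean = sum(scores) / len(scores)
print(scores, mean)  # prints [87.5, 75.0] 81.25
```

Simple percent agreement is only one index of rater consistency; chance-corrected statistics (eg, kappa) are a common alternative when response base rates are skewed.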
The ACGME Outcome Project has crystallized a pre-existing need for more robust, reliable, valid, and diagnostic forms of measurement in the education of residents. As simulation takes a larger role in the educational experiences of residents, more standardized and theory-based approaches to SBT and performance measurement are necessary. SMARTER is an example of this type of measurement. Although in need of further development and testing, the SMARTER approach is a comprehensive method for generating not only measurement tools for SBT and assessment, but the scenarios themselves. The SMARTER approach ensures that the training content and measurement tools are grounded in the competencies to be trained or assessed. SMARTER is not a “cure-all” for assessing learning outcomes in EM, but it is widely applicable to training and assessment of the portions of the core competencies that require dynamic performance.
The authors thank the 3 anonymous reviewers for extensive and constructive comments on an earlier version of this article.
1. Swing SR. Assessing the ACGME general competencies: general considerations and assessment methods. Acad Emerg Med 2002;9:1278–1288.
2. Bond WF, Lammers RL, Spillane LL, et al. The use of simulation in emergency medicine: a research agenda. Acad Emerg Med 2007;14:353–363.
3. Jha AK, Duncan BW, Bates DW. Simulator-based training and patient safety (Evidence Report/Technology Assessment: Number 43). Rockville, MD: Agency for Healthcare Research and Quality; 2001.
4. Gaba DM. The future vision of simulation in health care. Qual Saf Health Care 2004;13(Suppl 1):i2–i10.
5. Wears RL, Perry SJ. Human factors and ergonomics in the emergency department. In: Carayon P, ed. Handbook of Human Factors and Ergonomics in Health Care and Patient Safety. Mahwah, NJ: Erlbaum; 2007:851–863.
6. Silvestri S, Wu TS, Salas E, et al. Beyond the basics: bringing simulation theory and technology together. Paper presented at: Council of Emergency Medicine Residency Directors (CORD) Academic Assembly; March 2, 2007; Orlando, FL.
8. Swick S, Hall S, Beresin E. Assessing the ACGME competencies in psychiatry training programs. Acad Psychiatry 2006;30:330–351.
9. Measurement: the heart of patient safety. Joint Commission Benchmark 2006;8:4–7.
10. Binstadt ES, Walls RM, White BA, et al. A comprehensive medical simulation education curriculum for emergency medicine residents. Ann Emerg Med 2007;49:495–507.
11. Salas E, Priest HA, Wilson KA, et al. Scenario-based training: improving military mission performance and adaptability. In: Britt TW, Adler AB, Castro CA, eds. Military Life: The Psychology of Serving in Peace and Combat. Vol. 2. Operational Stress. Westport, CT: Praeger Security International; 2006:32–53.
12. Issenberg SB, McGaghie WC, Petrusa ER, et al. Features and uses of high-fidelity medical simulations that lead to effective learning: a BEME systematic review. Med Teach 2005;27:10–28.
13. McFetrich J. A structured literature review on the use of high fidelity patient simulators for teaching in emergency medicine. Emerg Med J 2006;23:509–511.
14. Hays RT, Singer MJ. Simulation Fidelity in Training System Design. New York: Springer-Verlag; 1989.
15. Ende J. Feedback in clinical medical education. JAMA 1983;250:777–781.
16. Fowlkes JE, Dwyer DJ, Oser RL, et al. Event-based approach to training (EBAT). Int J Aviat Psychol 1998;8:209–221.
17. Fowlkes JE, Burke CS. Targeted Acceptable Responses to Generated Events or Tasks (TARGETs). In: Stanton N, Hedge A, Brookhuis K, et al, eds. Handbook of Human Factors and Ergonomics Methods. Boca Raton, FL: CRC Press; 2005:47–56.
18. Vreuls D, Obermayer RW. Human-system performance measurement in training simulators. Hum Factors 1985;27:241–250.
19. Dawson S. Procedural simulation: a primer. J Vasc Interv Radiol 2006;17:205–213.
20. Norman GR. Striking the balance. Acad Med 1994;69:209–210.
21. Boulet JR, Murray D, Kras J, et al. Reliability and validity of a simulation-based acute care skills assessment for medical students and residents. Anesthesiology 2003;99:1270–1280.
22. Schneider W. Training high-performance skills: fallacies and guidelines. Hum Factors 1985;27:285–300.
23. Holt RW, Hansberger JT, Boehm-Davis DA. Improving rater calibration in aviation: a case study. Int J Aviat Psychol 2002;12:305–330.
25. Chapman DM, Hayden S, Sanders AB, et al. Integrating the Accreditation Council for Graduate Medical Education core competencies into the model of the clinical practice of emergency medicine. Ann Emerg Med 2004;43:756–769.
26. Jentsch F, Bowers C, Berry D, et al. Generating line-oriented flight simulation scenarios with the RRLOE computerized tool set. Paper presented at: The 45th Annual Meeting of the Human Factors and Ergonomics Society. 2001; Santa Monica, CA.
27. Bowers C, Jentsch F, Baker D, et al. Rapidly reconfigurable event-set based line operational evaluation scenarios. Paper presented at: The Human Factors and Ergonomics Society 41st Annual Meeting. 1997; Albuquerque, NM.
28. Goldstein IL, Ford JK. Training in Organizations. 4th ed. Belmont, CA: Wadsworth; 2002.
29. Crandall B, Klein G, Hoffman RR. Working Minds: A Practitioner’s Guide to Cognitive Task Analysis. Cambridge, MA: MIT Press; 2006.
30. Schraagen JM, Chipman SF, Shalin VL, eds. Cognitive Task Analysis. Mahwah, NJ: Erlbaum; 2000.
31. Ziv A, Wolpe PR, Small SD, et al. Simulation-based medical education: an ethical imperative. Acad Med 2003;78:783–788.
32. Lorenzet SJ, Salas E, Tannenbaum SI. Benefiting from mistakes: the impact of guided errors on learning, performance, and self-efficacy. Hum Resource Dev Q 2005;16:310–322.
33. Gaba DM. Safety first: ensuring quality care in the intensely productive environment—the HRO model. APSF Newslet 2003;18:1–5.
34. Nunnally JC, Bernstein IH. Psychometric Theory. 3rd ed. New York: McGraw-Hill; 1994.
35. Bakeman R, Gottman JM. Observing Interaction: An Introduction to Sequential Analysis. 2nd ed. Cambridge, UK: Cambridge University Press; 1997.
36. Stout RJ, Salas E, Fowlkes J. Enhancing teamwork in complex environments through team training. Group Dyn 1997;1:169–182.
37. Rothrock L. Using time windows to evaluate operator performance. Int J Cogn Ergon 2001;5:1–21.
38. Johnston JH, Smith-Jentsch KA, Cannon-Bowers JA. Performance measurement tools for enhancing team decision-making training. In: Brannick MT, Salas E, Prince C, eds. Team Performance Assessment and Measurement: Theory, Methods, and Applications. Mahwah, NJ: Erlbaum; 1997:311–327.
39. Fowlkes JE, Lane NE, Salas E, et al. Improving the measurement of team performance: the TARGETs methodology. Mil Psychol 1994;6:47–61.
40. Oser RL, Gualtieri JW, Cannon-Bowers JA. Training team problem solving skills: an event-based approach. Comput Hum Behav 1999;15:441–462.
41. Dwyer DJ, Oser RL, Salas E, et al. Performance measurement in distributed environments: initial results and implications for training. Mil Psychol 1999;11:189–215.
42. Murray DJ, Boulet JR, Ziv A, et al. An acute skills evaluation for graduating medical students: a pilot study using clinical simulation. Med Educ 2002;36:833–841.
43. Murray DJ, Boulet JR, Kras JF, et al. Acute care skills in anesthesia practice. Anesthesiology 2004;101:1084–1095.
44. Murray DJ, Boulet JR, Kras JF, et al. A simulation-based acute skills performance assessment for anesthesia training. Anesth Analg 2005;101:1127–1134.
45. Boulet JR, Murray D, Kras J, et al. Reliability and validity of a simulation-based acute care skills assessment for medical students and residents. Anesthesiology 2003;99:1270–1280.
© 2008 Lippincott Williams & Wilkins, Inc.