Most interns have never performed an infant lumbar puncture (ILP) before starting residency.1 In a single-institution study, our group noted that simulation-based training increased interns’ first ILP success rates; however, a larger multicenter study did not replicate these findings.2,3 Unfortunately, an infant experiencing an unsuccessful LP is 5 times more likely to be hospitalized with an added cost of more than $2000.4 Currently, supervisors have limited data to inform their decision about when a trainee has sufficient skill to safely perform his or her first procedure on a patient.5–7
Workplace-based assessments involve short periods of observation with feedback that take place in the clinical environment.8,9 They are commonly used in graduate medical education to assess procedural performance in the United Kingdom and Australia.8,10 We propose that simulation can be used as an additional source of workplace-based assessment data to guide supervisors’ decisions related to trainees’ procedural readiness. Simulation provides an opportunity for safe procedural skills assessment before performance on patients, thus limiting potential harm.11–14 Simulation-based assessments have been documented to correlate with improvements in patient outcomes for a variety of procedural skills.15,16 However, a recent meta-analysis by Brydges et al15 identified only 5 studies correlating simulation-based assessments with patient outcomes, and none of these conducted the simulation-based assessment in close temporal or physical proximity to clinical performance or attempted to link individual assessments to subsequent clinical performance. The present study involved simulation-based assessments in the clinical environment in an attempt to measure their correlation with actual performance.
This article describes a just-in-time, simulation-based assessment methodology incorporating a global rating scale (GRS) in the workplace to assess the construct of interns’ readiness to safely perform their first clinical ILP. We aimed to describe the correlation of a simulation-based assessment in the workplace with subsequent clinical ILP success. In addition, we will describe the development of a cutoff point to serve as a minimum passing score on this simulation-based instrument that could be used as a source of information for supervisors making decisions related to interns’ readiness to safely perform a clinical ILP.
Primary Study Design and Setting
This report represents a subcomponent of a multi-institution, prospective study conducted over 2 consecutive academic years (2010–2012) at 33 academic medical centers (see Table, Supplemental Digital Content 1, http://links.lww.com/SIH/A259, which lists the participating sites and enrollment per year) exploring the impact of a just-in-time training intervention on procedural success.17 Enrolled sites were all members of the International Network for Simulation-Based Pediatric Innovation, Research and Education (INSPIRE).18 Either verbal or written informed consent was obtained based on individual institutional requirements. For some sites, the need for informed consent was waived by the institutional review board. Participants were incoming postgraduate year 1 trainees (interns) recruited from pediatrics, emergency medicine, or combined pediatrics residency programs (eg, medicine-pediatrics) at the start of the academic year for the primary study. All of the eligible interns agreed to participate in the study during an initial training session described later.
The detailed elements of our training bundle have been reported in previous studies.1–3,17,19–21 In brief, the overall study protocol involved interns viewing a video describing the indications, contraindications, complications, necessary equipment, and key steps of the ILP through expert modeling of the procedure on a simulator (production: Imaginehealth, New York, NY).22 Next, they completed simulation-based training facilitated by expert clinical faculty that used an ILP task-trainer (BabyStap; Laerdal Medical, Stavanger, Norway). Practice sessions on the simulator continued until the intern achieved a predefined mastery performance standard.2,17
The assessment in this study focused on interns caring for infants (<365 days old) requiring an ILP. Interns first completed a simulation-based “just-in-time” skills refresher, after which their performance was assessed using a global rating tool. The simulators, checklists, and necessary equipment were made available in clinical areas where ILPs are routinely performed at all of the participating institutions (eg, neonatal intensive care unit, emergency department, inpatient unit). The supervisor working with the intern on that shift facilitated the refresher, the skills assessment (described later), and the clinical ILP. The simulations occurred in the workplace on the unit where the ILP procedure was performed. If an intern performed poorly on the assessment, he or she was given the opportunity to continue to practice on the simulator as time allowed, and the assessment was repeated on that same day. When multiple assessments occurred, the documented assessment was the final assessment before performing the clinical procedure, to ensure that the assessment used had the highest possible correlation with real-world performance. There was no study recommendation on who should be allowed to perform a clinical ILP; these decisions were based on local standards of care.
The workplace-based assessment involved using a GRS tool to rate the interns’ performance on a simulator.19,21 Global rating scale assessment instruments provide a broad-based assessment using behaviorally anchored categories (eg, 1 = novice, 3 = competent, 5 = expert) where each numeric value has a set of anchors describing the corresponding level of skill.23 Global rating scale instruments are more efficient and feasible than checklists when experts are conducting assessments.24,25 Because this study aimed to apply the simulation-based assessment in time-limited conditions in the workplace, we selected a GRS instrument. The GRS tool used in this study was developed by the INSPIRE investigators to measure the construct of interns’ readiness to perform their first clinical ILP. The tool has anchors relating to the number of prompts that were required during the interns’ performance of the procedure on the simulator and is rated as follows: novice (extensive guidance with >2 prompts), beginner (minimal guidance with 1 or 2 prompts), competent (performed independently and self-corrected or made minor errors), and proficient (performed independently with no prompts or errors). The GRS tool is provided in Figure 1. A prompt was defined in the same way that it has been defined in our previous studies as “a verbal interjection to either prevent or correct an error.”
Assessment Tool (Validity Evidence and Development Process)
This tool was iteratively developed, and validity evidence has been reported in a series of previous studies.19 The previous validation work was conducted with raters applying this tool in a simulation center to an independent group of providers (not those enrolled in this study) and included both faculty and resident raters. The metrics assessing validity in these 2 studies included content, internal structure, relations with other variables, and response process.22,23 Cook and Beckman’s modern validity framework involves “the unitary concept of construct validity: the degree to which a score can be interpreted as representing the underlying construct.”26,27 The 5 elements of validity evidence of this framework as they relate to this GRS are provided in Table 1.27,28 In addition, at the end of each of our previous interventional studies, all of the participating institutions provided feedback on this tool to the study team through conference call discussions.19–22
All individuals who supervise clinical ILPs (senior residents, fellows, faculty, nurse practitioners, and physician assistants) at all institutions had access to the 3 components of the train-the-trainer program: (1) an online presentation on the educational framework of deliberate practice to guide the skills refresher, (2) an online and written guide to the use of the ILP simulator, and (3) online and in-person rater training on the application of the GRS for rater calibration. The total duration of all sessions combined was approximately 45 to 60 minutes. We were not able to mandate completion of the train-the-trainer program as part of this or previous studies.1–3,19–21
Procedural success was defined as obtaining cerebrospinal fluid with fewer than 1000 red blood cells per cubic millimeter on the first needle pass of the first attempt (without previous attempts by other providers); when a cell count was not available, the ILP was coded as a success if and only if the intern described the cerebrospinal fluid as clear. Outcomes data were recorded on data collection instruments after the GRS assessment and the clinical ILP attempt. Participants’ data were linked across the study via a confidential study identification number.
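This coding rule can be expressed as a small helper function; the sketch below is illustrative only, and the function and parameter names are our own rather than from the study protocol:

```python
def ilp_success(first_pass_of_first_attempt, rbc_count=None, csf_clear=False):
    """Code an infant LP as successful per the study definition:
    CSF obtained on the first needle pass of the first attempt with
    <1000 red blood cells or, when no cell count is available,
    CSF described as clear by the intern."""
    if not first_pass_of_first_attempt:
        return False
    if rbc_count is not None:        # cell count available
        return rbc_count < 1000
    return csf_clear                 # fall back to gross appearance

print(ilp_success(True, rbc_count=120))    # True: clean first-pass tap
print(ilp_success(True, rbc_count=5000))   # False: traumatic tap
print(ilp_success(True, csf_clear=True))   # True: no count, clear CSF
print(ilp_success(False, rbc_count=50))    # False: not the first pass
```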
The a priori primary outcome was the correlation of the supervisors’ GRS ratings with interns’ first clinical ILP success as defined earlier. A correlation coefficient of greater than 0.1 would support our hypothesis that this GRS demonstrates predictive validity, with a higher coefficient providing stronger evidence. Of note, we used Kendall’s tau (τ) because of the nonparametric nature of our data and outcomes. Kendall’s τ is a more conservative measure of correlation than the traditional Pearson r or Spearman ρ, and as such, lower interpretive thresholds are described for it: τ greater than 0.3 is a strong correlation, 0.2 to 0.29 is moderate, 0.10 to 0.19 is weak, and less than 0.10 is very weak.29–31 The authors determined that, for an educational study with no existing published data on predictive validity for this procedure, this low correlation would be sufficient to support the future iterative development of this instrument in guiding supervisors’ decisions.
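For readers implementing this analysis, the tie-corrected Kendall τ-b (the variant suited to an ordinal rating compared against a binary outcome) can be sketched in a few lines of pure Python; the data below are invented for illustration:

```python
from math import sqrt

def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation with tie correction,
    suitable for an ordinal rating vs. a binary outcome."""
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0:
                ties_x += 1          # pair tied on the rating
                if dy == 0:
                    ties_y += 1      # tied on both variables
            elif dy == 0:
                ties_y += 1          # pair tied on the outcome only
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    n0 = n * (n - 1) // 2            # total number of pairs
    return (concordant - discordant) / sqrt((n0 - ties_x) * (n0 - ties_y))

# Hypothetical data: GRS level (1=novice .. 4=proficient) vs. LP success (0/1)
ratings = [1, 1, 2, 2, 3, 3, 4, 4]
success = [0, 0, 0, 1, 0, 1, 1, 1]
print(f"{kendall_tau_b(ratings, success):.2f}")  # 0.61
```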
Sample Size Calculation
The sample size for this study was calculated based on an established success rate of 34% from our previous work.3 A total of 182 ILPs were required to detect a 10% difference in success rates with 80% power at the 0.05 significance level for the primary study hypothesis related to the impact of just-in-time training on procedural success. Considering dropouts, missing values, and so on, we planned to collect data on at least 250 interns performing ILPs per cohort. When this number was not reached in the first year of enrollment, we continued the protocol for a second year.
The correlation coefficient of the GRS was evaluated using Kendall’s τ correlation coefficient for rank correlation.32 In addition, we assessed the predictive validity of the GRS by dichotomizing the instrument to calculate sensitivity and specificity at different cutoff points. Three cutoff points were identified by collapsing the 4-point GRS (novice, beginner, competent, and proficient) into a dichotomous variable (competent and noncompetent) to explore a cutoff point that could be used as a minimum passing score to determine procedural readiness. Participant characteristics were analyzed using descriptive statistics, and clinical outcomes were compared across groups using either χ2 tests or the Fisher exact test, as appropriate. Relative risk was calculated for each of the baseline characteristics to assess the risk of success.
The number needed to assess (NNA) was calculated analogously to the number needed to treat (NNT), as the inverse of the absolute risk reduction (ARR): NNA = 1 / ARR. This analytic approach was used to evaluate the effect of the intervention (ie, competency assessment using optimal cutoff points) compared with no intervention (ie, the intern performs the LP on his or her own). All participants received the just-in-time assessment, and our analyses compared those who passed the supervisor competence assessment according to 3 different cutoffs (proficient; competent/proficient; or beginner/competent/proficient) with those who failed it (novice/beginner/competent; novice/beginner; or novice, respectively). The NNA is an attempt to provide a meaningful metric for the potential clinical utility of the tool: it quantifies how many interns need to be assessed to prevent one failed LP attempt on a patient compared with no assessment.
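The NNA calculation reduces to a one-line formula. In the sketch below, the two failure rates are illustrative placeholders chosen to yield an ARR of 16% (the value reported for scheme 2 in the Results), not the study’s exact stratified rates:

```python
def number_needed_to_assess(fail_rate_unassessed, fail_rate_passing):
    """NNA = 1 / ARR, by analogy with the number needed to treat."""
    arr = fail_rate_unassessed - fail_rate_passing  # absolute risk reduction
    if arr <= 0:
        raise ValueError("assessment confers no reduction in failed LPs")
    return 1.0 / arr

# Placeholder failure rates differing by 16 percentage points:
nna = number_needed_to_assess(0.58, 0.42)
print(f"{nna:.2f}")  # 6.25 -- i.e., roughly 6 interns assessed per failed LP prevented
```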
Only cases with nonmissing values for the primary outcome, LP success, were retained in the final data set; all other cases were excluded. For variables with more than 10% missing data, imputation methods were used to complete the data set. All statistical analyses were performed using SPSS (version 22.0; IBM Corp, Armonk, NY).
To test for intersite variability, LP success rate was compared across cohorts using the Cochran-Mantel-Haenszel (CMH) procedure with site used as a stratification variable. The CMH test adjusts for variability across sites in success rate. The Breslow-Day test was used to examine heterogeneity in the effect measure at each level of the site stratification variable. Additional psychometric data were described in previous literature with an overlapping group of raters; however, because of the logistical challenges of conducting research on a workplace-based assessment, we were not able to evaluate elements such as interrater reliability in this work.
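For reference, the continuity-corrected CMH statistic for K site-stratified 2×2 tables can be sketched in pure Python (statistic only, without the P value; the strata below are invented, not the study’s data):

```python
def cmh_statistic(tables):
    """Cochran-Mantel-Haenszel chi-square statistic (1 df, with
    continuity correction) for a list of 2x2 tables [[a, b], [c, d]],
    e.g. rows = training group, columns = LP success/failure,
    one table per site stratum."""
    sum_a = sum_e = sum_v = 0.0
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        row1, col1 = a + b, a + c
        sum_a += a
        sum_e += row1 * col1 / n                                  # E[a] under H0
        sum_v += row1 * (c + d) * col1 * (b + d) / (n * n * (n - 1))
    return (abs(sum_a - sum_e) - 0.5) ** 2 / sum_v

# Two hypothetical site strata
stat = cmh_statistic([[[10, 10], [5, 15]], [[8, 12], [4, 16]]])
print(f"{stat:.2f}")  # 3.51
```

In practice, a stratified-table routine from a standard statistics package (which also provides the Breslow-Day homogeneity test) would be used rather than a hand-rolled statistic; this sketch only makes the computation explicit.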
A total of 1600 interns were eligible to participate, 1215 were enrolled, and 297 completed a workplace-based assessment and a subsequent clinical ILP (Fig. 2). All analyses are reported on this group of 297 participants unless otherwise specified. The overall ILP success rate for the entire cohort was 41.8% (124/297). The baseline characteristics of providers and procedure variables related to the clinical ILPs performed are reported in Table 2. The CMH and Breslow-Day tests did not indicate significant heterogeneity between sites (P = 0.648), indicating that intersite variability in success rate was low.
Correlation of GRS and Success
The success rates for each level of performance on the GRS were as follows: novice, 28.6% (18/63); beginner, 39.2% (51/130); competent, 55.4% (46/83); and proficient, 42.9% (9/21). The GRS rating was positively associated with interns’ clinical LP success (P = 0.010), with a correlation coefficient of τ = 0.161 (95% confidence interval [CI], 0.057–0.265; P = 0.002), indicating a weak association between supervisor rating and success. The associations between procedural factors and ILP success are reported in Table 3. Notably, early stylet removal, family member presence, and clinical area showed statistically significant associations with LP success. We also examined baseline characteristics of study participants (ie, previously observed clinical ILP, previously performed clinical ILP, previously received didactic training or performed ILP on a simulator, supervisor level, and holder) and found no significant differences between successful and failed LPs.
Table 4 reports 3 schemes constructed to explore the test characteristics of the GRS at various cutoff points, with the sensitivity and specificity of each scheme. Scheme 1 dichotomizes novice versus beginner-competent-proficient, scheme 2 dichotomizes novice-beginner versus competent-proficient, and scheme 3 dichotomizes novice-beginner-competent versus proficient alone. Scheme 2 had an appropriate balance of sensitivity and specificity and resulted in the highest ARR of the 3 schemes.
Number Needed to Assess (Potential Impact of GRS)
The principle of the NNT was used to calculate the NNA. This number was intended to quantify the potential impact of implementing this GRS for workplace-based assessment on patient outcomes. The 3 scoring schemes were compared with no intervention on the risk of a failed ILP, and the NNA was calculated as the inverse of the ARR. This number quantifies how many interns need to be assessed to prevent one failed LP compared with no assessment. The NNAs for all 3 schemes are listed in Table 4. With the use of scheme 2, the ARR was 16%, and the inverse of the ARR yields an NNA of 6. This number describes how the application of scheme 2 (not allowing novice or beginner interns to perform) could impact the rate of failed ILPs: if scheme 2 were applied as a summative assessment and providers who “failed” did not perform the ILP on a patient, there would be 1 fewer failed LP for every 6 interns assessed (NNA, 6.2; 95% CI, 4.0–8.5) compared with no GRS assessment intervention. This approach provides guidance to supervisors in selecting an optimal cutoff for this assessment tool if it were used to determine who can and cannot perform an LP (we are not recommending this based on the current iteration of the tool and its operational characteristics). It is important to recognize the “balancing metric” and the potential implications of this application of our tool: preventing some trainees who would have been successful from performing on a patient.
This study is the first to explore the correlation of a workplace simulation-based assessment with clinical outcomes. A weak correlation was noted between supervisor ratings on a workplace simulation-based GRS assessment and interns’ ILP success. Applying the assessment method described in this article allows for some degree of enhanced discrimination of intern LP skills. According to our analyses, applying the GRS with scheme 2 to 6 interns (rated competent or proficient) would result in 1 fewer failed ILP compared with no GRS assessment. This small improvement could allow for improved patient outcomes, but at the price of missed educational opportunities. It should be noted, however, that this is the first application of this tool; iterative development coupled with further validation holds the promise of higher correlation coefficients, which could markedly improve the assessment’s usefulness in clinical practice while reducing the risk of missed educational opportunities.
Primum non nocere—do no harm—is a fundamental ethical standard that we as physicians hold dear; however, it is a standard that is often put at risk by our methods of training the next generation of providers. There is a real, tangible risk to our patients when we allow trainees to develop their skills by “practicing on them” in the traditional apprenticeship process of learning at the patient’s bedside through “see one, do one, teach one.” Every trainee will have a “first” patient on whom they perform a procedure. In the series of studies that we have published, we have described a low level of procedural success among novice participants.1–3,17,19–21 This work has not evaluated the downstream impact of a failed LP on the patient or the clinical system. When trainees learn the LP procedure on patients, repeated needle insertions may occur, increasing the patient’s pain and the parents’ anxiety. Although no data link failed ILPs to increased rates of iatrogenic infection or other serious morbidity, such data exist for many other procedures (central line insertion, surgery, catheter insertion).
Studies of assessment instruments are rarely based in the workplace. Summative assessment studies described in the simulation literature are limited by small samples and involve objective structured assessment of technical skills in a laboratory setting.15 The context and timing of assessment impact the validity of an assessment. If the GRS were applied in a simulation laboratory and/or at a distant time interval, it would not have the same operational characteristics. We aimed to create an easy-to-use, low-cost tool that would be practical in the context of the workplace. We did not collect data on the duration of assessment; however, in our discussion with our site directors, the majority of assessments were limited to less than 5 minutes. Simulation-based assessments in the workplace would not translate well to more complex procedures because of the potential impact on patient flow. This GRS tool is specific to this procedure, patient population, and group of providers; therefore, we caution its use in other settings or for other procedures without further research.
A strength of this study is that we provided a detailed presentation of the performance characteristics of the GRS. This presentation of assessment data allows supervisors to balance optimal operational characteristics with the necessity of having a cutoff point that can realistically be achieved by a significant number of trainees. A cutoff point that either allows for too many failed clinical procedures or is so unattainable that few trainees will ever get to perform the clinical procedure is unsustainable. Reporting the test characteristics of an assessment instrument related to clinical outcomes provides a reference point to guide supervisors’ decisions regarding when a trainee is prepared to perform his or her “first” procedure on a patient. If supervisors intend to use this instrument for summative assessment, they must inform trainees how the score will be used—for example, what level of performance will be required before a trainee is granted the privilege of performing the procedure on a real patient. This would provide trainees with clear performance expectations required before being entrusted to perform a clinical procedure.33
When developing and implementing an assessment tool, researchers must consider the utility of an assessment instrument in combination with its validity, cost-effectiveness, context, practicability, and acceptability.33 Now that we have described the operational characteristics of the GRS, we must work to iteratively adapt our approach and improve its performance. In addition, we must use our data to engage supervisors, trainees, and parents to determine an acceptable level of performance that meets our ethical standards of practice. We have completed qualitative research exploring learners’ perspectives on this training paradigm that revealed barriers to workplace-based assessments, including workplace busyness and a lack of instructor support.3 We are currently conducting work with all of the other stakeholders on this topic.
The current culture of safety has resulted in decreased tolerance for trainees “practicing on patients” or trial-and-error on-the-job learning. Practice changes could mandate that interns and other novice providers cannot perform procedures on patients until they have completed simulation training and assessment. However, providers will always have a “first” patient on whom they perform a procedure, and it is our responsibility as educators to maximize the safety of that event through training, assessment, and supervision. Although the current study is limited to pediatric interns and the infant lumbar puncture, this approach to workplace-based assessment should be explored in other patient and provider populations. For example, a similar assessment strategy could be used before a nurse places his or her first intravenous line or a surgeon performs his or her first appendectomy. To establish the validity concept of consequence in the modern framework,27 we hope that this work encourages other researchers to conduct studies exploring the use of workplace simulation-based assessments as a method for determining procedural readiness. Providing supervisors and trainees with objective data on the operational characteristics of a simulation-based assessment tool could improve decisions related to trainees’ procedural readiness and has the potential to improve downstream patient outcomes. In addition, these data could improve the transparency we provide when informing patients about how prepared a trainee is to safely perform a procedure. Of note, this paradigm would require appropriate safeguards and systems changes to ensure that patients do not perceive that they have a suboptimal provider caring for them without supervision. The majority of validity articles do not report data on this element of the validity argument. Reporting these data would represent a significant paradigm shift within teaching institutions and the simulation research community and would parallel recent transformational changes aimed at maximizing patient safety in academic medicine.
The study has several limitations. The most significant limitation is the weak correlation coefficient noted for the GRS. The tool could be improved through 2 major approaches: first, the tool itself could be modified or adapted; second, the rater training or application of the tool could be improved. From a design perspective, the GRS was applied by a diverse set of supervisors across many institutions to maximize the generalizability of this work. Creating an assessment tool that is both easy to use and generalizable is a well-described challenge in the implementation of workplace-based assessments.34 The multisite design, which allowed all supervisors to serve as raters, created challenges in providing a uniform method for training individuals to apply this instrument and in evaluating their application of the GRS. Unfortunately, the investigators could neither mandate completion of the train-the-trainer program by all supervisors nor determine whether those reporting the GRS had completed the expected training. This may limit the inferences that can be drawn from this work. It could also have contributed to the low number of interns classified as proficient, which prevented us from finding a significant difference in success rate between proficient interns and other providers. Another limitation is that we could not control for or monitor the interaction between each supervisor and trainee during the simulation, the assessment, or the clinical performance. Trainees have differing learning styles, supervisors have diverse teaching styles, and the clinical scenario of each procedural event is variable. This variation occurs both between and within institutions and could lead to an unquantifiable amount of variability in supervision and feedback during the assessment and/or clinical procedure. In addition, the short duration of the training may not have been sufficient to develop this procedural skill.
Although additional training would likely lead to improved skills, there is limited time that training programs have for a single skill. It is notable that only one quarter of interns reported completing a supervised ILP during the academic year. Although this low number could be from underreporting, it is also consistent with the reduced number of procedural experiences for trainees.35
A workplace-based GRS of interns performing a simulated ILP has some value in predicting subsequent clinical procedural success. The application of this GRS as a summative assessment would result in 1 fewer infant experiencing a failed lumbar puncture for every 6 interns to whom it was administered. Future work should focus on the iterative development of this assessment tool to improve its ability to predict procedural success. Further research on workplace-based assessments using simulation has potential in other areas of clinical practice and for diverse populations of health care providers.
The authors thank Charmin Gohel for her help in formatting the references and preparing other files for the submission process.
1. Auerbach M, Chang TP, Reid J, et al. Are pediatric interns prepared to perform infant lumbar punctures? A multi-institutional descriptive study. Pediatr Emerg Care 2013;29(4):453–457.
2. Kessler DO, Auerbach M, Pusic M, Tunik MG, Foltin JC. A randomized trial of simulation-based deliberate practice for infant lumbar puncture skills. Simul Healthc 2011;6(4):197–203.
3. Kessler DO, Arteaga G, Ching K, et al. Interns’ success with clinical procedures in infants after simulation-based training. Pediatrics 2013;131(3):e811–e820.
4. Pingree EW, Kimia AA, Nigrovic LE. The effect of traumatic lumbar puncture on hospitalization rate for febrile infants 28 to 60 days of age. Acad Emerg Med 2015;22(2):240–243.
5. Ludmerer KM. The development of American medical education from the turn of the century to the era of managed care. Clin Orthop Relat Res.
6. Yang J, Howell MD. Commentary: is the glass half empty? Code blue training in the modern era. Acad Med 2011;86(6):680–683.
7. Kogan JR, Conforti L, Bernabeo E, Iobst W, Holmboe E. Opening the black box of clinical skills assessment via observation: a conceptual model. Med Educ 2011;45(10):1048–1060.
9. Swanwick T, Chana N. Workplace-based assessment. Br J Hosp Med (Lond) 2009;70:290–293.
10. CIPHER. Review of Work-Based Assessment Methods. Sydney, Australia: Centre for Innovation in Professional Health Education and Research; 2007.
11. Issenberg SB, McGaghie WC, Petrusa ER, Lee Gordon D, Scalese RJ. Features and uses of high-fidelity medical simulations that lead to effective learning: a BEME systematic review. Med Teach 2005;27(1):10–28.
12. McGaghie WC, Issenberg SB, Cohen ER, Barsuk JH, Wayne DB. Does simulation-based medical education with deliberate practice yield better results than traditional clinical education? A meta-analytic comparative review of the evidence. Acad Med 2011;86(6):706–711.
13. Kneebone R. Evaluating clinical simulations for learning procedural skills: a theory-based approach. Acad Med 2005;80(6):549–553.
14. Lenchus JD. End of the “see one, do one, teach one” era: the next generation of invasive bedside procedural instruction. J Am Osteopath Assoc 2010;110(6):340–346.
15. Brydges R, Hatala R, Zendejas B, Erwin PJ, Cook DA. Linking simulation-based educational assessments and patient-related outcomes: a systematic review and meta-analysis. Acad Med 2015;90(2):246–256.
16. Zendejas B, Brydges R, Wang AT, Cook DA. Patient outcomes in simulation-based medical education: a systematic review. J Gen Intern Med 2013;28(8):1078–1089.
17. Kessler D, Pusic M, Chang TP, Fein DM, et al.; INSPIRE LP investigators. Impact of just-in-time and just-in-place simulation on intern success with infant lumbar puncture. Pediatrics 2015;135(5):e1237–e1246.
18. Gaies MG, Landrigan CP, Hafler JP, Sandora TJ. Assessing procedural skills training in pediatric residency programs. Pediatrics 2007;120(4):715–722.
19. Gerard JM, Kessler DO, Braun C, Mehta R, Scalzo AJ, Auerbach M. Validation of global rating scale and checklist instruments for the infant lumbar puncture procedure. Simul Healthc 2013;8(3):148–154.
20. Auerbach M, Chang T, Fein D, et al. A Comprehensive Infant Lumbar Puncture Novice Procedural Skills Training Package: an INSPIRE Simulation-Based Procedural Skills Training Package. MedEdPORTAL Publications; 2014. Available at: http://www.mededportal.org/publication/9724. Accessed November 2, 2014.
21. Braun C, Kessler DO, Auerbach MA, Mehta R, Scalzo A, Gerard J. Can residents assess other providers’ infant lumbar puncture skills? Validity evidence for a global rating scale and subcomponent skills checklist. Pediatr Emerg Care 2015. In press.
23. Carraccio CL, Benson BJ, Nixon LJ, Derstine PL. From the educational bench to the clinical bedside: translating the Dreyfus developmental model to the learning of clinical skills. Acad Med 2008;83(8):761–767.
24. Regehr G, MacRae H, Reznick RK, Szalay D. Comparing the psychometric properties of checklists and global rating scales for assessing performance on an OSCE-format examination. Acad Med 1998;73(9):993–997.
25. Gray JD. Global rating scales in residency education. Acad Med 1996;71(Suppl 1):S55–S63.
26. Downing SM, Yudkowsky R. Assessment in Health Professions Education. New York, NY: Routledge; 2009.
27. Cook DA, Beckman TJ. Current concepts in validity and reliability for psychometric instruments: theory and application. Am J Med 2006;119(2):166.e7–166.e16.
28. Downing SM. Validity: on meaningful interpretation of assessment data. Med Educ 2003;37(9):830–837.
29. Xu W, Hou Y, Hung YS, Zou Y. A comparative analysis of Spearman’s rho and Kendall’s tau in normal and contaminated normal models. Signal Processing 2013;93(1):261–276.
31. Kendall MG. A new measure of rank correlation. Biometrika 1938;30:81–93.
32. Hinchey KT, Rothberg MB. Can residents learn to be good doctors without harming patients? J Gen Intern Med 2010;25(8):760–761.
33. van der Vleuten CP, Schuwirth LW. Assessing professional competence: from methods to programmes. Med Educ 2005;39(3):309–317.
34. Crossley J, Jolly B. Making sense of work-based assessment: ask the right questions, in the right way, about the right things, of the right people. Med Educ 2012;46(1):28–37.
35. Rodriguez-Paz JM, Kennedy M, Salas E, et al. Beyond “see one, do one, teach one”: toward a different training paradigm. Qual Saf Health Care 2009;18(1):63–68.
Simulation; Infant; Lumbar puncture; Pediatrics; Workplace-based assessment; Global rating scale; Quality; Safety; Graduate medical education
© 2016 Society for Simulation in Healthcare