When interpreting performance scores on an objective structured clinical examination (OSCE), are all checklist items created equal? In written examinations, all questions are typically given equal value, and a numeric cut-off determines the passing grade. Applying this same scoring technique to skill performance assessments may miss important clinical care discriminators.
To address the challenge inherent in the interpretation of skill performance scores, individual checklist items are often weighted. Individual items that are felt to be more important are given greater value in the overall score calculation. In actual clinical practice, however, some clinical performance actions are not just more important, they are essential. Whether the clinician performed a specific task or not determines the quality of care. In this respect, the use of weighting for clinical skill assessments in medical education may not realistically reflect the student’s ability to properly demonstrate essential skill performance elements.
As a more discriminating alternative to weighting, we reviewed the concept of the critical incident technique. Originally described in 1954, this technique was used in psychological research to assign relative importance among behaviors essential to successful performance.1 The National Board of Medical Examiners used this approach in 1965 to create a definition of clinical competence.2 More recently, the term critical action was defined by Petrusa3 for the purpose of introducing a clinical standard of care into long-case skill performance examinations. In this article, we define a critical action as an OSCE checklist item whose performance is critical to ensure an optimal patient outcome and avoid medical error. Below, we report the findings from applying Petrusa’s idea of critical actions to an analysis of our school’s recent OSCE performance data.
Identifying critical actions for use with clinical skill assessments requires considering two kinds of OSCE checklist items: synergistic items and critical action items. On the one hand, all items included in an OSCE checklist can be seen as synergistic. Taken together, they create an effect greater than that predicted by their separate effects. Thus, each OSCE checklist item is selected because it contributes synergistically to the total performance of the clinical skill being assessed.
On the other hand, within a clinical setting, there are some elements that are critical for optimal clinical outcomes. Within the context of an OSCE, critical actions are those select checklist items that most encompass the essential purpose for performing the clinical skill.3 When performed correctly, the critical action can result in the appropriate clinical decision, even though there may have been inadequate performance on other checklist items. When performed incorrectly, the critical action might result in the wrong clinical decision and an adverse patient outcome.
For example, in an OSCE designed to assess a student’s ability to accurately perform all the elements of blood pressure measurement, a student may be observed to perform those elements but, nevertheless, obtain an inaccurate blood pressure reading. Using current scoring techniques with or without weighting, this learner would achieve a high overall score, reflecting correct performance on a sufficient number of checklist items to meet a preselected passing standard. However, this high-scoring student did not perform the critical action: obtaining an accurate blood pressure. This example illustrates the clinical utility of designating critical checklist items when interpreting the performance scores of clinical learners. In applying the critical action concept to an analysis of our recent OSCE assessment data, we considered two hypotheses:
1. that students with above-average overall OSCE scores will more likely accomplish the critical actions than will lower-scoring students; and
2. that some students who perform above average will fail to correctly accomplish the critical actions.
The University of Virginia Clerkship Clinical Skill Learning Assessment was a semiannual formative OSCE assessment of discrete, observable clinical skills. This project, funded by the Health Resources and Services Administration, was conducted from 2003 to 2006. Three hundred ninety-eight third-year clerkship students rotated through four different clinical cases twice during the clerkship years 2003–2006. Twenty-five cases were developed covering five skill domains of clinical care: physical examination, communication, clinical reasoning, procedure performance, and test or imaging interpretation. Cases were authored by selected teaching faculty and reviewed by project leadership using an established template.
During testing, students were instructed to read the clinical scenario posted on the door before entering the examination room. Each OSCE station used standardized patients (either simulating a condition or actually having it) or clinical models to portray case scenarios. Each student was observed by a trained clinical faculty member who evaluated the student using a detailed OSCE checklist. After each encounter, students received feedback and instruction from the faculty observer and the standardized patient. During this time, students were given the opportunity to practice the skill again in an effort to assimilate their newly acquired understanding of skill performance.
A consensus technique was used to identify OSCEs containing critical actions from among the 25 cases.4 This process involved checklist item review by each of the six program faculty and the case authors. The faculty represented the Departments of Medicine (E.C.C., E.B.B., E.B.H.), Family Medicine (K.L.M.), Pediatrics (N.J.P.), and Surgery, and the Office of Medical Education (V.E.M.). They were asked to review the case instructions and consider the implications of the clinical findings in each checklist item. The faculty were given the definition of critical action and were then asked to identify which checklist items met this definition. After this, the program faculty met to discuss the identified critical actions and to reach a final consensus.
For inclusion in this study, 10 of the 25 OSCE cases were selected because they contained checklist items that met the definition of a critical action. These 10 cases represented the skill domains of physical exam, communication, clinical reasoning, and procedure performance. Nine cases used standardized patients (one with an actual condition, chronic atrial fibrillation), and one used a mannequin model (child’s ear examination). In the end, between one and nine critical actions were identified within each of the 10 cases (see Table 1). Examples of cases that were excluded from this analysis include the motivational interviewing case, because it focused on interview technique and not on a specific patient outcome, and the shoulder examination case, because the standardized patient had a normal shoulder and the emphasis was on performance of all physical examination maneuvers.
Students were separated into higher-performing and lower-performing groups based on overall OSCE performance for the purpose of subgroup comparison. The median score for each OSCE was chosen as the breakpoint between higher- and lower-performing students. The median was chosen for two reasons. First, these OSCEs were originally developed as a formative learning assessment, not as a graded, summative exercise. Thus, no standard-setting process or passing score existed at the time of this analysis. Second, the median was selected rather than the mean because students’ scores were not normally distributed and overall performance scores were low. In light of this, a norm-referenced cut-off, such as one standard deviation below the mean, would not have been statistically valid, and might have implied standard setting.
For each student on each OSCE, an overall mean score was calculated that represented the percentage of weighted checklist items performed correctly. From these scores, a group mean was calculated for each OSCE. Individual OSCE scores were evaluated to determine which students performed higher than the group median but failed to perform the critical action correctly.
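The scoring and grouping steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's actual procedure: the checklist weights, the five students, the choice of the last checklist item as the critical action, and the rule that a score exactly at the median counts as lower-performing are all assumptions made for the example.

```python
from statistics import median

def weighted_percent(performed, weights):
    """Percentage of total checklist weight earned by correctly performed items."""
    earned = sum(w for done, w in zip(performed, weights) if done)
    return 100.0 * earned / sum(weights)

def high_scorers_missing_critical(checklists, weights, critical_index):
    """Return per-student scores, the group median, and the students who
    scored above the median but did not perform the critical action."""
    scores = {s: weighted_percent(p, weights) for s, p in checklists.items()}
    cut = median(scores.values())
    flagged = sorted(
        s for s, p in checklists.items()
        if scores[s] > cut and not p[critical_index]
    )
    return scores, cut, flagged

# Hypothetical data: four checklist items, the last one is the critical action.
weights = [1, 1, 2, 1]
checklists = {
    "A": [True, True, True, True],
    "B": [True, True, True, False],   # high score, critical action missed
    "C": [True, False, False, True],
    "D": [False, False, True, False],
    "E": [False, True, False, False],
}

scores, cut, flagged = high_scorers_missing_critical(checklists, weights, critical_index=3)
print(scores)   # weighted percentage per student
print(cut)      # group median used as the breakpoint
print(flagged)  # above-median students who missed the critical action
```

In this toy data set, student B earns most of the available weight yet is flagged, which is the pattern the analysis was designed to detect.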
Logistic regression models5 were used to discriminate between students who correctly and incorrectly performed the critical actions. An odds ratio was used to quantify the effect of the total OSCE score in predicting the probability of correctly performing the critical actions. The area under the receiver operating characteristic (ROC) curve was used to quantify the predictive discriminatory power, defined in this analysis as the ability to separate those students who correctly performed the critical actions from those who did not. An ROC curve area of 0.5 indicates no discrimination, and a curve area of 1.0 indicates perfect discrimination. All statistical analyses were performed using either SAS 9.1 (SAS Institute, Cary, NC) or S-Plus Version 7.0.6 (Insightful Corp., Seattle, WA).
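To make the ROC area concrete, the sketch below computes it directly from scores and outcomes as the proportion of (performer, non-performer) pairs in which the performer has the higher total score, with ties counted as one half. It is an illustration only and does not reproduce the SAS or S-Plus output. When a logistic model has the total score as its single predictor and a positive coefficient, the fitted probabilities rank students in the same order as the raw scores, so this pairwise calculation gives the same area. The data are hypothetical.

```python
def roc_auc(total_scores, performed_critical):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (performer, non-performer) pairs in which the
    student who performed the critical action has the higher total score;
    tied scores contribute one half.
    """
    pos = [s for s, ok in zip(total_scores, performed_critical) if ok]
    neg = [s for s, ok in zip(total_scores, performed_critical) if not ok]
    if not pos or not neg:
        raise ValueError("both outcome groups must be non-empty")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical total scores (percent) and critical action outcomes.
total_scores = [55, 60, 70, 80, 90]
performed_critical = [False, True, False, True, True]

auc = roc_auc(total_scores, performed_critical)
print(round(auc, 3))
```

An area of 0.5 arises when the score carries no information about the outcome (for example, when every student has the same score), and 1.0 arises when every performer outscores every non-performer.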
The University of Virginia institutional review board considered this an exempt educational study.
Using descriptive statistics, Table 2 illustrates the mean performance scores by OSCE, as well as the quartile analysis for each case. The quartile analysis demonstrates that the data are not normally distributed.
To determine whether students with above-average overall OSCE scores were more likely to accomplish the critical actions, we analyzed the effect of the total performance score in predicting the probability of correctly performing the critical action (see Table 3). For 7 of the 10 cases, the total OSCE score is a significant predictor of whether students performed critical actions correctly (P < .05), demonstrating that students who scored higher were more likely to accomplish the critical actions. Three of the 10 cases were not eligible for this part of the analysis because of a low n (one case) or the lack of sufficient variability in the scores (two cases).
Odds ratios were calculated for the eight cases with adequate variability to determine the magnitude of the effect that the overall score had on critical action performance. The odds ratios varied from 1.35 to 38.54, with six cases having odds ratios above 2 and two cases having odds ratios above 10. As an example, in the blood pressure case, any student who scored 10% higher than a peer had almost twice the odds of performing the critical actions correctly (odds ratio = 1.81, 95% confidence interval: 1.28, 2.56; P < .001).
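Because an odds ratio multiplies odds rather than probabilities, its practical meaning depends on where a student starts. The sketch below works through the blood pressure odds ratio of 1.81 under two assumptions that are ours, not the study's: that the ratio corresponds to a 10-percentage-point difference in total score, and that the log-odds are linear in the score (as in a standard logistic model). The baseline probabilities are hypothetical.

```python
def rescale_odds_ratio(or_per_10_points, points):
    """Odds ratio for a score difference of `points`, given the ratio per
    10 points and log-odds that are linear in the score."""
    return or_per_10_points ** (points / 10.0)

def probability_after(baseline_prob, odds_ratio):
    """Probability obtained by multiplying the baseline odds by odds_ratio."""
    odds = baseline_prob / (1.0 - baseline_prob) * odds_ratio
    return odds / (1.0 + odds)

OR_BLOOD_PRESSURE = 1.81  # reported for the blood pressure case

# Hypothetical baseline probabilities of performing the critical action.
for baseline in (0.20, 0.50, 0.80):
    after = probability_after(baseline, OR_BLOOD_PRESSURE)
    print(f"baseline {baseline:.2f} -> {after:.3f}")

# Under the linearity assumption, a 20-point difference squares the ratio.
print(round(rescale_odds_ratio(OR_BLOOD_PRESSURE, 20), 4))
```

A student with an even chance of performing the critical action would move to a probability of roughly 0.64, so the odds nearly double while the probability itself rises by less.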
Finally, Table 3 lists the area under the ROC curve for the eight analyzed cases. In the blood pressure case example, the ROC area of 0.688 indicates not only that some of the students with higher overall scores failed to perform the critical actions correctly but also that some students with lower overall scores successfully performed the critical actions. ROC areas vary from 0.620 to 0.946 for all the cases analyzed.
To determine whether some students who performed above the median failed to correctly accomplish the critical actions, we prepared Table 4. In this table, the percentages of students who scored above and below the median were compared by critical action performance. For 9 of 10 cases, 6% to 46% of students with above-average overall scores failed to perform the critical actions correctly. In five cases, more than 20% of students with above-average total scores failed to correctly perform the critical action. Additionally, for 6 of the 10 cases, 2% to 28% of students who scored below average still performed the critical actions correctly. In two of these cases, the percentage was 20% or greater.
The purpose of the OSCE is to provide a standardized method for the evaluation of clinical skill performance. OSCEs have been widely critiqued since their development in the 1970s. In the past 10 years alone, a number of studies of their measurement properties have been published.6–8 Their purpose, methodology, and standard-setting procedures have also been reexamined.9–18 However, except for Downing et al’s15 reference to crucial items in standard setting, little has been published regarding the application of the critical action concept to OSCE assessment as defined by Petrusa.3,15
We analyzed clerkship students’ clinical skill performance in OSCEs containing critical actions and found that students with above-average overall OSCE scores correctly performed the critical actions more often than did lower-performing students. We also found that there were many students with above-average scores who failed to perform the critical actions. These findings reveal that an overall performance score alone overlooks important indicators about students’ skill performance. Adding critical action analysis sharpens the eye of the OSCE by identifying certain checklist items that, in clinical practice, would be mandatory, not simply synergistic. This has important implications not only for OSCE scoring but also for OSCE design and the evaluation of curricular effectiveness.
Traditional numeric scoring on OSCEs quantifies a student’s overall skill performance, enables normative standard setting, and allows for individual and group comparisons. For OSCEs that contain critical actions, an overall score does not fully characterize learners’ skill performance. Importantly, a high overall score does not ensure that a student correctly performed the critical actions. Being able to correctly perform all of the elements of blood pressure measurement is important. Accurately determining the patient’s blood pressure is mandatory.
Weighting of checklist items is employed in an effort to emphasize clinically important elements. However, for an OSCE that contains critical actions, checklist item weighting can theoretically fail in one of three ways. First, if the critical actions are weighted too heavily, then the impact of the remaining synergistic items becomes marginalized. This might be desired in cases of a minimum basic competency assessment. However, for comprehensive skill assessment, all items should factor into the final score. Second, if the critical actions are not weighted heavily enough, students may score well on the exercise even though they do not correctly perform the critical actions. Finally, it is possible that all students would miss the critical actions. In that circumstance, even the weighted overall scores would fail to show that students missed these clinically essential items. Recognizing these limitations of weighting compels us to conclude that adding critical action analysis to OSCEs that contain critical actions will more accurately characterize students’ levels of skill.
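The first two failure modes can be illustrated with a small calculation. The sketch below assumes a hypothetical 10-item checklist with nine synergistic items of weight 1 and one critical action whose weight is varied; none of these numbers come from the study's checklists.

```python
def weighted_score(performed, weights):
    """Percentage of total checklist weight earned by performed items."""
    earned = sum(w for done, w in zip(performed, weights) if done)
    return 100.0 * earned / sum(weights)

def scores_for_critical_weight(critical_weight, n_other=9):
    """Scores of two hypothetical students as the critical weight changes.

    Returns (score of a student who performs everything except the critical
    action, score of a student who performs only the critical action).
    """
    weights = [1] * n_other + [critical_weight]
    misses_only_critical = [True] * n_other + [False]
    performs_only_critical = [False] * n_other + [True]
    return (
        weighted_score(misses_only_critical, weights),
        weighted_score(performs_only_critical, weights),
    )

for w in (1, 3, 9):
    missed, only = scores_for_critical_weight(w)
    print(f"critical weight {w}: missed critical = {missed:.0f}%, "
          f"only critical = {only:.0f}%")
```

With a light weight, the student who missed the critical action still scores 90%. With a weight heavy enough to pull that score down to 50%, a student who performed nothing but the critical action scores the same 50%, so the synergistic items no longer distinguish the two performances.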
To achieve optimal patient outcomes and avoid medical error in clinical practice, critical actions must be successfully performed. Even so, critical actions should not be the entire focus of an OSCE. Particularly when calculating skill performance scores, synergistic elements are important because they emphasize the finer points of technique and lend importance to the art of clinical skill performance. The student who happens to record an accurate blood pressure after inconsiderately overinflating the wrong-sized cuff may have performed the critical action, but few would argue that this was an adequate performance. Furthermore, achieving reproducible results on critical actions seems more likely accomplished by correctly performing all of the other elements of a skill. Selecting the appropriate blood pressure cuff size, ensuring the patient remains quiet through the exam, and properly inflating the cuff will likely produce both correct and reproducible blood pressure measurements. If medical education is to realistically prepare the student clinician to develop ideal clinical practice habits, its methods should reflect this same performance expectation. Therefore, when critical actions are present in an OSCE, critical action analysis should be employed in addition to calculating an overall score.
Not all OSCEs have or require critical actions. OSCEs need not have critical actions if they are designed to assess only the ability of a clinical learner to properly perform the steps of a specific skill. This is generally the case at the outset of learning a particular interviewing technique, physical examination skill, or basic clinical procedure. In more advanced clinical performance learning, however, critical actions become essential in evaluating clinical proficiency because they place skill performance within the context of clinical practice. This consideration underscores the importance of designing the OSCE with explicit attention to the developmental level of the learner to be tested.
Informing curricular effectiveness
The analysis of data from clinical skills assessments informs not only how an individual student is progressing but also how effectively a curriculum enables students’ clinical skill development in the aggregate. In our analysis of basic clinical skill OSCEs containing critical actions, we learned that many clerkship students were unable to correctly perform those skill checklist items that are most important in guaranteeing the quality of care. If medical schools are persuaded by overall OSCE scores alone that students are achieving skill performance proficiency, they miss an opportunity to place skill learning in a clinical context and enhance the OSCE’s instructional value. Failure to identify students who cannot perform critical elements of clinical skills can result in a false sense of educational effectiveness. Additionally, students may take away from this experience an inappropriate sense of their own clinical ability.
A 2 × 2 table analysis of scores (Table 5) shows how critical action analysis can contribute to an enhanced understanding of aggregate student skill performance. In a traditional scoring system, students who perform above average but omit the critical action would appear to do well. In fact, they would be deficient in the most critical skill areas and need further learning and practice. They may remain unaware of their performance improvement needs. Furthermore, students in each quadrant of the table may have different instructional, skill learning, and assessment needs that would go unrecognized in a traditional scoring system.
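A cross-tabulation of this kind is straightforward to build from per-student data. The following sketch uses hypothetical scores and outcomes and assumes that a score exactly at the median is grouped with the lower-performing students, a detail the analysis above does not specify.

```python
from collections import Counter
from statistics import median

ABOVE, BELOW = "above median", "at or below median"
DONE, MISSED = "critical action performed", "critical action missed"

def two_by_two(total_scores, performed_critical):
    """Count students in each cell of the score-by-critical-action table."""
    cut = median(total_scores)
    counts = Counter()
    for score, ok in zip(total_scores, performed_critical):
        row = ABOVE if score > cut else BELOW
        col = DONE if ok else MISSED
        counts[(row, col)] += 1
    return counts

# Hypothetical data for six students on one OSCE case.
total_scores = [90, 85, 70, 60, 50, 40]
performed_critical = [True, False, True, True, False, False]

table = two_by_two(total_scores, performed_critical)
for row in (ABOVE, BELOW):
    print(row, "|", table[(row, DONE)], "performed,", table[(row, MISSED)], "missed")
```

The two off-diagonal cells, high scorers who missed the critical action and low scorers who performed it, are the groups that an overall score alone cannot reveal.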
Limitations and future directions
We believe that using the concept of the critical action to enhance OSCE score interpretation informs a more robust understanding of student skill learning. However, there are limitations to our analysis. First, this study represents one investigation of a small set of selected skills at one institution. Further understanding of the utility of critical action analysis will surely be gained from further studies on a larger scale and with a broader set of skills both at our institution and others.
Second, there is a need for determining optimal consensus methods among physician educators regarding which checklist items should be designated as critical actions. Faculty vary in their opinions on what may be the most important elements in performing a specific clinical skill. For any group of OSCE authors interested in using the critical action technique, choosing critical actions through a transparent consensus process is more important than whether or not they concur with the critical actions as defined in our study.
Third, our students’ overall scores are low. Their performance may have been different had this been a high-stakes examination rather than a formative learning exercise. Although all of these OSCEs were framed within a clinical scenario, our students’ perception of the formative nature of this assessment may have influenced their performance. It is also unclear to us whether their low scores are attributable to OSCE design, curricular effectiveness, or student-centered factors. Further study is needed to better understand the influence of these variables on students’ critical action performance.
Fourth, the small sample size in this study limits the generalizability of our scoring results and affected the score distribution. Because students’ scores were not normally distributed, we used the OSCE median performance score to distinguish between higher- and lower-performing students. Different grading threshold methods, such as quartiles or normative curving, would have produced a different number of students in each of the four performance groups.
Fifth, this study was not designed to compare the influence of OSCE format on skill performance. Students can be influenced by whether real patients, standardized patients, or models are used in skill assessment. Further study is needed to evaluate the impact of testing formats on students’ critical action performance.
Finally, this retrospective analysis did not allow us to determine patterns of individual student performance. The low number of students in some skill categories and study data-collection methods did not allow for tracking of individual student performance over multiple cases. A better understanding of the critical action technique would be advanced with a prospective study encompassing a broader range of clinical skills at multiple institutions.
The Bottom Line
Clinical skills education should reflect the performance expectations of clinical practice. Through the application of Petrusa’s idea of the critical action to OSCE assessment, clinical performance standards can be further incorporated into the skills assessment process. In the performance of the clinical act, there are certain things that a physician ideally should do, and others that simply must be done to guarantee minimum standards of quality in the care of the patient. When applied to the assessment of clinical-skill-performance learning in the medical education setting, critical action analysis sharpens the eye of the OSCE and serves to enhance the clinical relevance of this widely practiced skill assessment technique.