Balancing patient safety with the often-limited availability of critical care resources requires that clinicians recognize or predict adverse outcomes in patients who present for emergency care and then require hospitalization. Failure to accurately recognize and treat patients when their condition deteriorates can lead to significantly poorer patient outcomes, which runs counter to the goal of maintaining a high level of patient safety (1,2). To that end, significant effort has been made over the past 2 decades or more to create and implement standardized alert systems with the goal of improving the detection of patients at risk for an adverse physiologic event (3). An early warning score (EWS), subsequently known as the modified early warning score (MEWS), has been implemented worldwide, leading some clinicians to explore whether these scoring tools are the best approach to the recognition of an acute adverse event (4). The “jury” has, perhaps, been hearing a one-sided argument given the mounting evidence for the use of MEWS and the paucity of evidence for relying on clinical gestalt when deciding whether to admit patients from emergency departments (EDs) to intensive care units.
The MEWS has been studied in a variety of populations including but not limited to surgical patients (5), patients with acute pancreatitis (6), and emergency medical admissions (7). To calculate the MEWS, numerical values are assigned to five common physiological parameters: the Alert, reacting to Voice, reacting to Pain, Unresponsive (AVPU) score; temperature; heart rate; respiratory rate; and systolic blood pressure (7). In the studies cited, a score of 4 or greater indicated an increased risk of catastrophic deterioration. It is worth noting that the number of items and the cutoff scores are highly variable in the literature regarding EWS and MEWS, as additional items of physiologic importance (e.g., decreased urinary output, oxygen saturation, and mental status) have been added in some institutions and in the research literature depending on case mix (8). The consensus of the literature regarding the use of MEWS is that the tool improves safety outcomes and that it is an instant, easy, and repeatable way to standardize the recognition of patients who may suffer a deleterious medical event.
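To make the calculation concrete, the following is a minimal sketch of a five-parameter MEWS in Python. The scoring bands are an assumption: they follow the table commonly attributed to Subbe et al (7) and are not stated in this editorial, and, as noted above, institutions vary both the items and the cutoffs. The function names (`mews`, `needs_escalation`) and the handling of boundary values are hypothetical choices made for illustration only, not a validated clinical implementation.

```python
# Illustrative MEWS calculator. The bands below are ASSUMED from the commonly
# reproduced table attributed to Subbe et al (7); verify against the local
# protocol before any real use. Inputs are assumed to be charted values:
# whole numbers for vital signs, one decimal place for temperature.

INF = float("inf")

# Each band is (inclusive upper bound, points), checked in ascending order.
SBP_BANDS = [(70, 3), (80, 2), (100, 1), (199, 0), (INF, 2)]             # mm Hg
HR_BANDS = [(40, 2), (50, 1), (100, 0), (110, 1), (129, 2), (INF, 3)]    # beats/min
RR_BANDS = [(8, 2), (14, 0), (20, 1), (29, 2), (INF, 3)]                 # breaths/min
TEMP_BANDS = [(34.9, 2), (38.4, 0), (INF, 2)]                            # degrees Celsius
AVPU_POINTS = {"A": 0, "V": 1, "P": 2, "U": 3}


def band_points(value, bands):
    """Return the points of the first band whose upper bound covers value."""
    for upper, points in bands:
        if value <= upper:
            return points
    raise ValueError(f"value {value!r} is not covered by any band")


def mews(sbp, hr, rr, temp_c, avpu):
    """Sum the points of the five parameters into a single MEWS."""
    return (
        band_points(sbp, SBP_BANDS)
        + band_points(hr, HR_BANDS)
        + band_points(rr, RR_BANDS)
        + band_points(temp_c, TEMP_BANDS)
        + AVPU_POINTS[avpu.upper()]
    )


def needs_escalation(score, cutoff=4):
    """Flag scores at or above the cutoff (4 or greater in the cited studies)."""
    return score >= cutoff


# Hypothetical patient: hypotensive, tachycardic, tachypneic, febrile,
# responding to voice only.
example_score = mews(sbp=85, hr=115, rr=24, temp_c=38.6, avpu="V")
print(example_score, needs_escalation(example_score))
```

Keeping the bands in plain tables rather than in nested conditionals makes it easier to swap in a locally modified version of the score, which matters given how much the items and cutoffs vary between institutions.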
Though MEWS has been widely used and recommended (3–8), barriers to the use of these tools have been noted that limit their ability to affect outcomes. The most significant barriers were reported in a recent qualitative evidence synthesis conducted by O’Neill et al (9) regarding why healthcare providers do not escalate care when indicated by EWS protocols. The synthesis yielded 18 studies from seven countries, from which the following barriers were suggested: resources, lack of accountability, fear, rapid response team behaviors, lack of standardization, increased conflict, hierarchy, overconfidence, lack of confidence, and patient variability. Improper use of EWS tools by healthcare providers has been documented in other recent literature as well (10).
The preponderance of the literature on the use of EWS tools to recognize the deterioration of patients recommends their use even given the barriers highlighted above. However, there are only a small number of studies demonstrating the importance of clinical judgment (11,12). One such multicenter observational study supporting clinical gestalt to recognize critically ill patients is presented in this issue of Critical Care Medicine (12). In the study, the authors compared MEWS with medical caregivers (defined as emergency medical services [EMS] nurses, ED nurses, and physicians) in predicting the development of critical illness. They hypothesized that healthcare provider prediction of critical illness in patients presenting to the ED is more accurate than MEWS. The study included all adult patients presenting via EMS to the ED of a level-one trauma center with two sites providing care to approximately 50,000 patients annually. The MEWS was calculated from primarily imputed data, and acute caregivers were asked standardized questions aimed at predicting clinical outcomes (defined as clinical gestalt for the study) for the 800 patients who were included in the study, 113 of whom suffered from critical illness. A trauma-related diagnosis was the most frequent in the sample and stroke the second most frequent, with approximately 67% of patients having some other nonidentified illness. The ability to predict a critical illness occurrence (serious adverse events, intensive care admission, or death) within 72 hours was the primary outcome. Sensitivity and specificity for both MEWS and clinical gestalt (based on a yes/no question) were calculated. The MEWS sensitivity of 64.8% was significantly higher than that of the clinical gestalt of the EMS nurses (41.6%) and ED nurses (41.8%). Physician sensitivity was 60.9%. With regard to specificity, MEWS was reported at 70.4% compared with 93.2–97.3% for the medical caregivers.
Thus, the positive predictive value (PPV) was higher for clinical gestalt compared with MEWS; however, the negative predictive value (NPV) was similar. A secondary outcome assessed the performance of MEWS ≥ 3 and healthcare providers (based on the highest score of a 4-point scale in recognizing the most severely ill) in predicting which patients would develop critical illness. This was done by calculating the area under the receiver operating characteristic curve (AUROC), which was 0.809 for ED nurses and 0.848 for physicians versus 0.731 for MEWS. The decreased performance of MEWS compared with ED nurses and physicians was statistically significant in this analysis. When the medical professionals were combined and compared with MEWS, the AUROC ranged from 0.691 to 0.704, which, relative to the MEWS AUROC reported above, indicated no improvement in accuracy for the medical professionals.
Kuit et al (12) concluded that healthcare provider judgment could predict which patients would not become critically ill within 72 hours better than MEWS. Conversely, MEWS overestimates the number of patients who become critically ill, resulting in a higher number of false positives. Ultimately, the authors’ recommendation is that healthcare providers’ judgment should be considered for early escalation of care in patients they suspect will become critically ill.
The study described above was an excellent attempt at quantifying the phenomenon of clinical gestalt. A large sample was used in multiple centers with skillful statistical analysis applied. However, like most research (especially observational studies), there are several threats to the internal and external validity of the study. Collecting data only during daytime hours likely alters the outcome and creates a selection bias. Collection tools using a yes/no question along with a 4-point scale created by the authors create an instrumentation bias. Regarding the MEWS, the data were imputed for approximately two-thirds of the sample, indicating that the tool was not being utilized in the way intended, which may account for the reported poor statistical performance. The heterogeneity of the patients’ diagnoses is a limitation when calculating predictive values since the PPV and NPV vary with changing prevalence (13). Perhaps the most significant limitation is the historical event bias created by healthcare providers’ familiarity with the physiological parameters used in the MEWS.
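The dependence of predictive values on prevalence can be illustrated with a short calculation. The sketch below applies Bayes' theorem to the MEWS sensitivity (64.8%) and specificity (70.4%) quoted above, first at the study prevalence of critical illness (113 of 800 patients) and then at two hypothetical prevalences chosen only for contrast. It is an illustration under those assumptions and does not attempt to reproduce the predictive values reported by Kuit et al (12); the function name `predictive_values` is likewise hypothetical.

```python
# PPV and NPV as a function of sensitivity, specificity, and prevalence.
# Sensitivity, specificity, and the study prevalence are taken from the text;
# the other prevalences are ASSUMED values for illustration only.

def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


MEWS_SENSITIVITY = 0.648
MEWS_SPECIFICITY = 0.704
STUDY_PREVALENCE = 113 / 800  # critical illness within 72 hours

for label, prevalence in [
    ("hypothetical low prevalence", 0.05),
    ("study prevalence", STUDY_PREVALENCE),
    ("hypothetical high prevalence", 0.30),
]:
    ppv, npv = predictive_values(MEWS_SENSITIVITY, MEWS_SPECIFICITY, prevalence)
    print(f"{label}: prevalence={prevalence:.3f} PPV={ppv:.3f} NPV={npv:.3f}")
```

With sensitivity and specificity held fixed, the PPV rises and the NPV falls as prevalence increases, which is why predictive values computed in a sample with mixed diagnoses may not transfer to a population with a different case mix.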
The readers of this Journal are the “jury” referred to by the title and, as such, are reminded to collect all the evidence and evaluate its merit before acting on it. In the context of recognizing when patients deteriorate and are in need of precious resources like intensive care, clinicians need to err on the side of safety and be given the benefit of the doubt. Healthcare provider gestalt and MEWS should not be considered mutually exclusive. Standardized tools are safety nets scientifically developed to support clinical decision-making, given the potential fallibility arising from clinicians’ variable experience and expertise. Prediction models have inherent advantages over clinical gestalt as they can accommodate a much larger number of variables than a human can (14). Given the multivariate data intrinsic in assessing human physiology, a statistical model will provide consistent results compared with the inconsistency of human judgment, especially with less experienced clinicians. As far as prediction and/or screening tools are concerned, false positives would be preferred over false negatives: false positives would likely lead to early escalation of care more often, whereas the alternative is potentially devastating outcomes like more cardiopulmonary arrest events and higher mortality (4). Given the significant research on prediction tools and the paucity of data on human judgment, the jury should rule for healthcare providers to work on better implementation and use of prediction models, combined with their clinical gestalt, to ensure that patient safety stays at the forefront.
REFERENCES
1. Barwise A, Thongprayoon C, Gajic O, et al.: Delayed rapid response team activation is associated with increased hospital mortality, morbidity, and length of stay in a tertiary care institution. Crit Care Med 2016; 44:54–63
2. Parkhe M, Myles PS, Leach DS, et al.: Outcome of emergency department patients with delayed admission to an intensive care unit. Emerg Med (Fremantle) 2002; 14:50–57
3. Pimentel MAF, Redfern OC, Malycha J, et al.: Detecting deteriorating patients in the hospital: Development and validation of a novel scoring system. Am J Respir Crit Care Med 2021; 204:44–52
4. Chirag M, WuQiang F, Karen V, et al.: Modified Early Warning System improves patient safety and clinical outcomes in an academic community hospital. J Community Hosp Intern Med Perspect 2015; 5:26716
5. Gardner-Thorpe J, Love N, Wrightson J, et al.: The value of Modified Early Warning Score (MEWS) in surgical in-patients: A prospective observational study. Ann R Coll Surg Engl 2006; 88:571–575
6. Suppiah A, Malde D, Arab T, et al.: The Modified Early Warning Score (MEWS): An instant physiological prognostic indicator of poor outcome in acute pancreatitis. JOP 2014; 15:569–576
7. Subbe CP, Kruger M, Rutherford P, et al.: Validation of a modified Early Warning Score in medical admissions. QJM 2001; 94:521–526
8. Smith ME, Chiovaro JC, O’Neil M, et al.: Early warning system scores for clinical deterioration in hospitalized patients: A systematic review. Ann Am Thorac Soc 2014; 11:1454–1465
9. O’Neill SM, Clyne B, Bell M, et al.: Why do healthcare professionals fail to escalate as per the early warning system (EWS) protocol? A qualitative evidence synthesis of the barriers and facilitators of escalation. BMC Emerg Med 2021; 21:15
10. van Galen LS, Struik PW, Driesen BE, et al.: Delayed recognition of deterioration of patients in general wards is mostly caused by human related monitoring failures: A root cause analysis of unplanned ICU admissions. PLoS One 2016; 11:e0161393
11. Veldhuis LI, Ridderikhof ML, Bergsma L, et al.: Performance of early warning and risk stratification scores versus clinical judgement in the acute setting: A systematic review. Emerg Med J 2022; 39:918–923
12. Kuit M, Veldhuis LI, Hollman M, et al.: Recognition of critically ill patients by acute healthcare providers: A multicenter observational study. Crit Care Med 2023; 51:697–705
13. Akobeng AK: Understanding diagnostic tests 1: Sensitivity, specificity and predictive values. Acta Paediatr 2007; 96:338–341
14. Adams ST, Leveson SH: Clinical prediction rules. BMJ 2012; 344:d8312