Sepsis is a common and costly condition, accounting for over 750,000 hospital admissions, 500,000 emergency department (ED) visits, and 200,000 deaths annually, with estimated total hospital costs exceeding $24 billion per year (1, 2). The ED is often the point of initial presentation, where early detection, timely treatment, and appropriate disposition can reduce morbidity and mortality while efficiently allocating limited resources (3). Accurate risk stratification and disposition are therefore of paramount importance. Patients with sepsis admitted directly to the intensive care unit (ICU) from the ED have better outcomes than those admitted to the hospital ward and later transferred to the ICU (4, 5). The ability to accurately identify patients at risk of future deterioration is necessary to optimize disposition from the ED.
A wide range of clinical measures, scores, and prediction rules have been developed to facilitate risk stratification (6–10). These range from the relatively simple qSOFA (7) and shock index (8) to the more complex APACHE II (9) and SOFA (10) scores, which provide estimates of illness severity from weighted combinations of clinical and laboratory measures. The predictive ability of these tools varies widely (11, 12), and many are impractical for use in the ED, where variables such as hourly urine output are not routinely obtained.
Heart rate variability (HRV) analysis provides a set of electrocardiograph (ECG)-derived metrics that are associated with illness severity (13) and outcomes in sepsis (14) and do not require additional measurements or manual computation. Continuous measurement has been shown to be practical in the critical care setting (15). Prior studies have demonstrated that HRV is consistently altered in illness and the degree of alteration correlates with severity of disease (16).
Widespread implementation of electronic medical records has generated a significant increase in the data available for both clinical and research purposes. This has driven growing interest in the use of machine learning techniques to explore large clinical datasets in order to identify novel associations, facilitate diagnosis, and guide treatment decisions (17). To date, no existing risk stratification tools have combined clinical, laboratory, and HRV measures in a predictive model using machine-learning techniques. Such a tool might provide the clinician with a standardized prognostic measure to support the clinical decision regarding treatment and disposition.
In this study, we evaluate, compare, and combine the predictive capability of clinical, laboratory, and HRV measures to identify patients with sepsis at increased risk for future deterioration, defined as any of the following within 72 h: ICU admission, initiation of noninvasive ventilation, endotracheal intubation, administration of vasopressors/inotropes, or death.
PATIENTS AND METHODS
Adult ED patients (≥21 years of age) satisfying the 1992 criteria for sepsis (18) and able to provide consent (directly or through a surrogate, if available) were eligible for enrollment. Patients who died or required aggressive resuscitative measures within 1 h of ED presentation (intubation, noninvasive ventilation, vasopressors/inotropes), and those with Do Not Resuscitate (DNR) or Do Not Intubate (DNI) orders were excluded. Patients in whom HRV analysis could not be performed (permanent pacemaker, atrial fibrillation, or non-sinus rhythm) were similarly excluded.
This study was performed from December 2014 through May 2016 in the Department of Emergency Medicine of Montefiore Medical Center, which comprises two urban, academic EDs in the Bronx, NY, with a combined adult census of more than 150,000 visits per year. The project was approved by the institutional review board of the Albert Einstein College of Medicine/Montefiore Medical Center (Approval #2014-3582).
At enrollment, demographic data were recorded and trained ED Research Associates obtained clinical variables routinely measured in the ED. Enrolled patients had continuous Holter monitoring (DigiTrak XR Recorder, Philips Healthcare, Andover, Mass) for a minimum duration of 15 min, with the amount of usable waveform data ranging from 5 to 135 min. Patients were not monitored for the totality of their ED stay.
The medical center's medical records were reviewed daily by the principal study investigator (DPB) to record laboratory values and detect study endpoints. Laboratory values were recorded from samples drawn upon ED presentation. Candidate clinical and laboratory variables were selected from readily available measures that were part of routine clinical care at our institution and were recorded as raw values (see Online Appendix 1, http://links.lww.com/SHK/A759).
Holter recordings were processed by trained technicians (LifeWatch Services Inc, Rosemont, Ill) and exported for further analysis. Technicians were blinded to study objectives as well as all other patient-level data and outcomes. Using Continuous Individualized Multiorgan Variability Analysis (CIMVA) software, a set of 55 HRV measures was calculated and tracked over time through a windowed analysis of each recording, as we have described elsewhere (19). This analysis consisted of: taking a window of 1,000 R-R intervals from the normal beats identified by the Holter software's algorithm; assessing the quality of the resulting time series and removing any residual artifacts, non-sinus beats (20), and non-stationary periods; computing all variability measures for the given window; and repeating the computation on successive windows, each overlapping the previous by 250 R-R intervals, until the end of the recording. The number of successive windows varied from 1 to 12, with most patients having between 2 and 4 windows. Variability was summarized over the duration of the recording by averaging each variability time series across analysis windows. All analyses adhered to standards developed by the Task Force of the European Society of Cardiology and the North American Society of Pacing and Electrophysiology (21). (See Online Appendix 1, http://links.lww.com/SHK/A759, for a comprehensive listing of all HRV metrics.)
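The windowed computation described above can be sketched as follows. This is a simplified illustration and not the CIMVA implementation: the window length (1,000 R-R intervals) and 250-interval overlap match those reported, but the `rmssd` metric is only a stand-in for the 55 CIMVA measures, and artifact removal is assumed to have already occurred.

```python
import numpy as np

def rmssd(rr_ms):
    """Root mean square of successive differences, one simple HRV measure."""
    diffs = np.diff(rr_ms)
    return float(np.sqrt(np.mean(diffs ** 2)))

def windowed_hrv(rr_ms, window=1000, overlap=250, metric=rmssd):
    """Compute `metric` over successive windows of `window` R-R intervals,
    each overlapping the previous by `overlap` intervals, then summarize by
    averaging across windows (returns None if the recording is too short)."""
    step = window - overlap
    values = [metric(rr_ms[start:start + window])
              for start in range(0, len(rr_ms) - window + 1, step)]
    return float(np.mean(values)) if values else None

# Synthetic example: ~2,500 artifact-free sinus R-R intervals around 800 ms
rng = np.random.default_rng(0)
rr = 800 + rng.normal(0, 30, 2500)
print(windowed_hrv(rr))
```

With 2,500 intervals, this yields three overlapping windows, matching the paper's observation that most recordings produced only a handful of windows.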
For comparison purposes, we computed values for four existing tools (qSOFA (7), SOFA (10), MEWS (22), and NEWS (23)) using threshold values defined a priori corresponding to qSOFA ≥ 2 (7), SOFA ≥ 2 (7), MEWS ≥ 5 (24), and NEWS ≥ 8 (24) to identify patients at increased risk of deterioration.
“Deterioration” was defined by any of the following within 72 h of ED presentation: initiation of noninvasive ventilation; endotracheal intubation; vasopressor/inotrope infusion; ICU admission; or death. Patients previously placed on noninvasive ventilation for obstructive sleep apnea were not coded as reaching the noninvasive ventilation endpoint. To ensure study endpoints reflected true deterioration, we required a minimum treatment duration of 1 h for noninvasive ventilation and vasopressor/inotrope infusion, and a minimum critical care length of stay of 24 h. Patients who did not meet the minimum treatment duration or critical care length of stay were included in the analysis but not coded as having met these endpoints. Patients discharged within 72 h were contacted to ensure that they did not attain one of the study endpoints during the 72-h period following their initial presentation.
Individual measures were characterized by domain as either clinical (“CLIN”; e.g., blood pressure), laboratory (“LAB”; e.g., white blood cell (WBC) count) or heart rate variability (“HRV”; e.g., multiscale entropy) for purposes of our analysis. Candidate variables are listed in Online Appendix 1 (see http://links.lww.com/SHK/A759).
To generate our models, we used ensemble averaging (25) of univariate logistic regression models, an approach we have used previously (26), including only records with complete datasets. In this approach, one begins with a collection of univariate logistic regression models, each providing an isolated estimate of risk based upon a single predictor. These estimates are then linearly combined across multiple predictors (in our case, a simple average) chosen through a sequential predictor selection process, in order to obtain a more robust estimate of risk, with increasing values representing a greater likelihood of reaching a study endpoint.
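The ensemble-averaging idea can be sketched in a few lines. This is a minimal illustration on synthetic data, assuming plain gradient-descent fitting; it is not the study's MATLAB implementation, and the variable names are hypothetical.

```python
import numpy as np

def fit_univariate_logistic(x, y, lr=0.5, n_iter=5000):
    """Fit P(y = 1) = sigmoid(b0 + b1 * z), with z the standardized
    predictor, by plain gradient descent on the log-loss."""
    mu, sd = x.mean(), x.std()
    z = (x - mu) / sd
    b0 = b1 = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(b0 + b1 * z)))
        b0 -= lr * np.mean(p - y)
        b1 -= lr * np.mean((p - y) * z)
    return b0, b1, mu, sd

def predict_univariate(model, x):
    b0, b1, mu, sd = model
    return 1.0 / (1.0 + np.exp(-(b0 + b1 * (x - mu) / sd)))

def ensemble_risk(models, X):
    """Average the isolated per-predictor risk estimates into one score."""
    return np.mean([predict_univariate(m, X[:, j])
                    for j, m in enumerate(models)], axis=0)

# Toy data: 500 patients, two predictors, outcome driven by the first
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
y = (X[:, 0] + 0.5 * rng.normal(size=500) > 1.0).astype(float)
models = [fit_univariate_logistic(X[:, j], y) for j in range(X.shape[1])]
risk = ensemble_risk(models, X)
print(round(float(risk.mean()), 2))
```

Because each component model sees only one predictor, the averaged score degrades gracefully when some predictors are uninformative, which is part of the robustness argument for this design.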
We first divided our study sample into training, validation, and test datasets to identify the best set of predictors and provide an unbiased assessment of model performance (see Fig. 1 for a graphical description). The selection of predictors, and by extension of the associated univariate logistic regression models to combine, was performed via a sequential forward floating selection (SFFS) procedure (27). Briefly, SFFS iteratively added the univariate logistic regression model corresponding to each individual predictor to find the combination maximizing the H-measure (28) in the validation set. We used a maximum of six predictors in order to permit a fair comparison between all ensemble models, given that we considered only six laboratory variables. The H-measure is an alternative to the commonly used area under the receiver operating characteristic curve (AUROC) that handles misclassification costs more coherently when comparing different classifiers. Derived models were evaluated on the unseen test set in order to obtain an unbiased estimate of performance.
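The selection procedure can be illustrated with a plain sequential forward selection. This sketch deliberately simplifies the study's method: it omits SFFS's conditional ("floating") removal step, substitutes the AUROC for the H-measure, and scores a simple column-average risk rather than the ensemble of logistic models; all names below are hypothetical.

```python
import numpy as np

def auroc(scores, labels):
    """AUROC via the rank-sum identity (assumes no tied scores)."""
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    n1, n0 = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)

def forward_select(candidates, score_fn, max_k=6):
    """Greedy forward selection: repeatedly add the candidate whose
    inclusion most improves score_fn on the validation set; stop at
    max_k features or when no candidate improves the score."""
    selected, remaining, best = [], list(candidates), -np.inf
    while remaining and len(selected) < max_k:
        score, choice = max((score_fn(selected + [c]), c) for c in remaining)
        if score <= best:
            break
        selected.append(choice)
        remaining.remove(choice)
        best = score
    return selected, best

# Toy data: 8 candidate predictors, only columns 0 and 3 carry signal
rng = np.random.default_rng(2)
X = rng.normal(size=(600, 8))
y = (X[:, 0] + X[:, 3] + rng.normal(size=600) > 1.0).astype(int)
valid = slice(400, 600)  # held-out validation split used for scoring

def score_fn(cols):
    # Stand-in scorer: column-average risk evaluated on the validation split
    return auroc(X[valid][:, cols].mean(axis=1), y[valid])

cols, best = forward_select(range(8), score_fn, max_k=6)
print(cols, round(best, 2))
```

The cap of six features mirrors the paper's choice to keep all ensemble models comparable to the six-variable laboratory set.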
The final predictive models, based upon the results of the steps described above, were then retrained using the entire dataset in order to obtain the best subsets of features and overall performance for each initial set of features (Fig. 1). The optimal model was then used to compute the risk score reported below (Fig. 2).
As part of an exploratory analysis, we constructed a model outcome score to define low-, medium-, and high-risk strata. The fold increase in the risk of developing at least one outcome at 72 h, compared with the average risk in the population, was represented by grouping the probabilities of deterioration into three bins corresponding to the low-, medium-, and high-risk groups. Thresholds on the probabilities of deterioration were optimized so that the medium-risk group had an incidence of patients developing at least one outcome at 72 h equal to the prevalence in the entire population (8%), and so that the total number of patients in the high-risk group was maximal. To assess performance, risk was defined as the number of patients who developed an outcome divided by the total number of patients within a given bin. The fold increase in risk was therefore equal to the risk within the bin divided by the average risk of developing an outcome over the entire dataset.
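Given a set of predicted probabilities, the binning and fold-increase computation can be sketched as follows. The thresholds here are illustrative fixed values rather than the optimized ones described above, and the data are synthetic.

```python
import numpy as np

def stratify(prob, outcome, lo_thr, hi_thr):
    """Bin predicted probabilities into low/medium/high strata and report
    each stratum's size, observed risk, and fold increase over the
    overall event rate."""
    overall = outcome.mean()
    bins = {"low": prob < lo_thr,
            "medium": (prob >= lo_thr) & (prob < hi_thr),
            "high": prob >= hi_thr}
    return {name: {"n": int(mask.sum()),
                   "risk": float(outcome[mask].mean()),
                   "fold_increase": float(outcome[mask].mean() / overall)}
            for name, mask in bins.items()}

# Synthetic, well-calibrated scores: outcome occurs with probability `prob`
rng = np.random.default_rng(3)
prob = rng.uniform(0.0, 1.0, 2000)
outcome = (rng.uniform(0.0, 1.0, 2000) < prob).astype(float)
report = stratify(prob, outcome, lo_thr=0.2, hi_thr=0.6)
for name, stats in report.items():
    print(name, stats)
```

A fold increase above 1 marks a stratum riskier than the cohort average; below 1, safer, which is how the low- and high-risk groups in Figure 2 are read.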
MATLAB v9.1 (MathWorks, Natick, Mass) was used for all computations.
RESULTS

A total of 1,247 patients were enrolled, of whom 832 had complete datasets and were included in the analysis. Characteristics of the study cohort are described in Table 1. Of these 832 patients, 68 (∼8%) met one or more study endpoints, including 8 who died and 52 who required ICU admission. Twenty patients were intubated and mechanically ventilated, 24 required vasopressor/inotrope infusions, and 21 required new noninvasive ventilation (see Fig. 3). Median intervals between triage and study endpoints are reported in Table 2.
Variables selected through SFFS for our final predictive models are listed in Table 3, and the results of predictive modeling are reported in Table 4. The models with the highest H-measures and areas under the receiver operating characteristic curve (AUROC) were those that included a combination of HRV and laboratory measures; HRV+LAB+CLIN and HRV+LAB were equivalent (AUROC = 0.80; 95% CI, 0.65–0.92). Performance characteristics of our optimal model and existing tools are presented in Table 5.
Our final model, HRV+LAB, provided the greatest predictive ability, combining three HRV measures (DFA α-1 (29), multiscale entropy (30), and the mean of R-R intervals) with three laboratory variables (lactate, international normalized ratio (INR), and serum creatinine). It was statistically superior to all other models (with the exception of HRV+LAB+CLIN, which was equivalent) by Mann–Whitney U tests, correcting for multiple comparisons between models by controlling the false discovery rate at 0.01 (31).
Performance characteristics of our final HRV+LAB model are presented alongside those of established tools in Table 5; performance varied widely, ranging from the relatively insensitive (but highly specific) qSOFA and NEWS to the more balanced HRV+LAB, SOFA, and MEWS. The positive (2.9) and negative (0.4) likelihood ratios of the HRV+LAB predictive model translated into an approximately 20% increase or decrease, respectively, in the estimated probability of deterioration (32).
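The effect of these likelihood ratios on an individual estimate can be worked through with Bayes' theorem in odds form; the sketch below uses the cohort's 8% event prevalence as an illustrative pretest probability.

```python
def posttest_probability(pretest_p, likelihood_ratio):
    """Convert probability to odds, apply the likelihood ratio,
    convert back to a probability."""
    odds = pretest_p / (1.0 - pretest_p)
    post_odds = odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)

pretest = 0.08  # cohort prevalence of deterioration, used for illustration
print(round(posttest_probability(pretest, 2.9), 3))  # positive result: 0.201
print(round(posttest_probability(pretest, 0.4), 3))  # negative result: 0.034
```

A positive result thus moves an average patient from roughly 8% to 20% estimated risk, and a negative result to roughly 3%, consistent with the rule-of-thumb interpretation of likelihood ratios (32).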
We used this final model to derive an outcome score defining low-, medium-, and high-risk groups as described in the previous section. The observed incidence of deterioration was 2% (95% CI, 1%–4%), 8% (95% CI, 5%–13%), and 35% (95% CI, 26%–44%) in the low-, medium-, and high-risk strata, respectively. The high-risk group had a 4.3-fold (95% CI, 3.2–5.4) increased likelihood of meeting one of the study endpoints, while those in the lowest stratum were less than half as likely to deteriorate relative to the average risk of the entire population (Fig. 2). This translated into an approximately 8-fold difference in the likelihood of reaching a study endpoint between the low- and high-risk strata.
DISCUSSION

In this study, the prediction model derived from a combination of laboratory and HRV measures performed best, better than models constructed from clinical, laboratory, or HRV measures alone. These findings are similar to those reported by Eick et al. (33), who evaluated deceleration capacity, also derived from a waveform-based assessment, and combined it with bedside clinical variables, demonstrating performance superior to either measure applied separately. Our study is novel in that it explores predictors from three separate domains of clinical assessment, harnessing a machine-learning algorithm to provide risk estimates without input from the clinician. Waveform-based predictive analytics have the potential to complement traditional clinical and laboratory evaluation. The means and degree to which physicians might incorporate this predictive information into their own clinical intuition, assessment, and ultimate decision making require future study.
While these results require further external validation in advance of clinical application, this approach represents a powerful potential tool in the assessment of ED patients with sepsis. Although technologically straightforward, the clinical application of such a tool is complex, requiring capture of waveform data, a clinical user interface, and the ability to calculate the predictive model, all requiring regulatory approval. If applied clinically, computation of risk scores could be performed continuously. One might envision that such passive surveillance might act as a “safety net”—ensuring that patients at increased risk are identified at the earliest opportunity. Patients in this study with scores in the high-risk group had an over fourfold increased risk (35% incidence) of short-term deterioration, whereas those in the lowest risk group were much less likely (2% incidence) to reach one of the study endpoints.
This study has several limitations. First, serum lactate levels were absent for a substantial number (n = 200) of subjects who were initially enrolled; these subjects were therefore excluded from both model generation and the final analysis. This likely reflects the tendency at our institution to reserve serum lactate measurement for patients in whom there is some degree of diagnostic uncertainty or more severe illness; patients who appeared less severely ill may have been less likely to have serum lactate measured. This was confirmed when we compared the patients who did not have lactate measured with those who did: patients in whom lactate was not measured were more likely to have been discharged at 72 h (65% vs. 44%, P < 0.001) and less likely to have reached one or more study endpoints (2.5% vs. 8.5%, P = 0.003).
Second, patients who lacked capacity to consent and did not have a surrogate present in the ED were excluded. Because alteration in consciousness has been shown to correlate with illness severity in sepsis (7), excluding these patients represented a missed opportunity to capture patients who likely had more severe illness.
Third, the present study utilized a cross-sectional design, measuring clinical, laboratory, and HRV measures only at initial presentation. As others have found improved performance with serial measurements of HRV (13) and with changes in severity scores (34), repeated or even continuous measurement would likely improve predictive ability and will be the focus of future studies.
Fourth, creation of our model outcome score was performed as part of an exploratory analysis for a future validation project. Evaluation in an independent, unseen cohort would be required for an accurate assessment of its performance.
Finally, regarding the definition of “sepsis”: for enrollment we relied upon the 1992 consensus definition, and we have interpreted the results in light of this definition. It is likely that we missed patients with organ dysfunction due to infection by relying exclusively on these criteria for enrollment. However, these criteria facilitated study execution, as they are clinically recognizable and easily integrated into existing sepsis initiatives and clinical pathways. Indeed, it has been suggested that the SIRS criteria may still need to be employed in the early triage of patients with infection in order to identify those with a systemic response to infection who are at risk of organ failure or death (35).
Future projects are planned to further refine the predictive capability of this model, incorporating both repeated assessments and exploring additional predictive measures, with plans to externally validate the findings in a separate cohort. In addition, determining how to optimally present the results of such models to clinicians at the bedside will be examined. Ultimately, a randomized controlled trial will be required to evaluate whether improving prediction of future deterioration in patients with sepsis has the capacity to objectively improve patient outcomes, including hospital and ICU length of stay, proportion of patients requiring ICU admission, and patient mortality.
A predictive model combining HRV and laboratory values outperformed individual models constructed from a single domain (clinical, laboratory, or HRV) and demonstrated comparable or superior discrimination with more balanced sensitivity and specificity when compared with existing risk stratification measures. This tool may help ED physicians evaluate risk of future deterioration in patients presenting with sepsis and thereby guide patient disposition, and merits further evaluation and validation.
ACKNOWLEDGMENTS

The authors acknowledge Joseph Clauser, Leigh Ann Kelly, Heather Luke, and LifeWatch Services, Inc, who provided the Holter equipment and scanning services as a courtesy to our group for research purposes and whose support was instrumental to the successful execution and completion of the project.
REFERENCES

1. Angus DC, Linde-Zwirble WT, Lidicker J, Clermont G, Carcillo J, Pinsky MR. Epidemiology of severe sepsis in the United States: analysis of incidence, outcome, and associated costs of care. Crit Care Med 29(7):1303–1310, 2001.
2. Lagu T, Rothberg MB, Shieh M-S, Pekow PS, Steingrub JS, Lindenauer PK. Hospitalizations, costs, and outcomes of severe sepsis in the United States 2003 to 2007. Crit Care Med 40(3):754–761, 2012.
3. Yealy DM. Early sepsis care: finding the best path. Ann Emerg Med 68(3):312–314, 2016.
4. Renaud B, Santin A, Coma E, Camus N, Van Pelt D, Hayon J, Gurgui M, Roupie E, Hervé J, Fine MJ, et al. Association between timing of intensive care unit admission and outcomes for emergency department patients with community-acquired pneumonia. Crit Care Med 37(11):2867–2874, 2009.
5. Liu V, Kipnis P, Rizk NW, Escobar GJ. Adverse outcomes associated with delayed intensive care unit transfers in an integrated healthcare system. J Hosp Med 7(3):224–230, 2012.
6. Calle P, Cerro L, Valencia J, Jaimes F. Usefulness of severity scores in patients with suspected infection in the emergency department: a systematic review. J Emerg Med 42(4):379–391, 2012.
7. Seymour CW, Liu VX, Iwashyna TJ, Brunkhorst FM, Rea TD, Scherag A, Rubenfeld G, Kahn JM, Shankar-Hari M, Singer M, et al. Assessment of clinical criteria for sepsis: for the Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3). JAMA 315(8):762–774, 2016.
8. Tseng J, Nugent K. Utility of the shock index in patients with sepsis. Am J Med Sci 349(6):531–535, 2015.
9. Knaus WA, Draper EA, Wagner DP, Zimmerman JE. APACHE II: a severity of disease classification system. Crit Care Med 13(10):818–829, 1985.
10. Vincent JL, Moreno R, Takala J, Willatts S, De Mendonça A, Bruining H, Reinhart CK, Suter PM, Thijs LG. The SOFA (Sepsis-related Organ Failure Assessment) score to describe organ dysfunction/failure. On behalf of the Working Group on Sepsis-Related Problems of the European Society of Intensive Care Medicine. Intensive Care Med 22(7):707–710, 1996.
11. Nguyen HB, Van Ginkel C, Batech M, Banta J, Corbett SW. Comparison of Predisposition, Insult/Infection, Response, and Organ Dysfunction, Acute Physiology and Chronic Health Evaluation II, and Mortality in Emergency Department Sepsis in patients meeting criteria for early goal-directed therapy and the severe sepsis resuscitation bundle. J Crit Care 27(4):362–369, 2012.
12. Raith EP, Udy AA, Bailey M, McGloughlin S, MacIsaac C, Bellomo R, Pilcher DV; Australian and New Zealand Intensive Care Society (ANZICS) Centre for Outcomes and Resource Evaluation (CORE). Prognostic accuracy of the SOFA score, SIRS criteria, and qSOFA score for in-hospital mortality among adults with suspected infection admitted to the intensive care unit. JAMA 317(3):290–300, 2017.
13. Green GC, Bradley B, Bravi A, Seely AJE. Continuous multiorgan variability analysis to track severity of organ failure in critically ill patients. J Crit Care 28(5):879.e1–879.e11, 2013.
14. Chen W-L, Chen J-H, Huang C-C, Kuo C-D, Huang C-I, Lee L-S. Heart rate variability measures as predictors of in-hospital mortality in ED patients with sepsis. Am J Emerg Med 26(4):395–401, 2008.
15. Bradley B, Green GC, Batkin I, Seely AJE. Feasibility of continuous multiorgan variability analysis in the intensive care unit. J Crit Care 27(2):218.e9–218.e20, 2012.
16. Ahmad S, Tejuja A, Newman KD, Zarychanski R, Seely AJE. Clinical review: a review and analysis of heart rate variability and the diagnosis and prognosis of infection. Crit Care 13(6):232, 2009.
17. Taylor RA, Pare JR, Venkatesh AK, Mowafi H, Melnick ER, Fleischman W, Hall MK. Prediction of in-hospital mortality in emergency department patients with sepsis: a local big data-driven, machine learning approach. Acad Emerg Med 23(3):269–278, 2016.
18. Bone RC, Balk RA, Cerra FB, Dellinger RP, Fein AM, Knaus WA, Schein RM, Sibbald WJ. Definitions for sepsis and organ failure and guidelines for the use of innovative therapies in sepsis. The ACCP/SCCM Consensus Conference Committee. American College of Chest Physicians/Society of Critical Care Medicine. Chest 101(6):1644–1655, 1992.
19. Herry CL, Green GC, Bravi A, Seely AJE. Continuous multiorgan variability monitoring in critically ill patients: complexity science at the bedside. In: Sturmberg J, Martin C, editors. Handbook of Systems and Complexity in Health. New York, NY: Springer; 2013. pp. 467–481.
20. Clifford GD, McSharry PE, Tarassenko L. Characterizing artefact in the normal human 24-hour RR time series to aid identification and artificial replication of circadian variations in human beat to beat heart rate using a simple threshold. Comput Cardiol.
21. Task Force of the European Society of Cardiology and the North American Society of Pacing and Electrophysiology. Heart rate variability: standards of measurement, physiological interpretation and clinical use. Circulation 93(5):1043–1065, 1996.
22. Subbe CP, Kruger M, Rutherford P, Gemmel L. Validation of a modified Early Warning Score in medical admissions. QJM.
23. National Early Warning Score (NEWS): standardising the assessment of acute illness severity in the NHS. Report of a working party. London: Royal College of Physicians; 2012.
24. Churpek MM, Snyder A, Han X, Sokol S, Pettit N, Howell MD, Edelson DP. qSOFA, SIRS, and early warning scores for detecting clinical deterioration in infected patients outside the ICU. Am J Respir Crit Care Med 195(7):906–911, 2017.
25. Polikar R. Ensemble based systems in decision making. IEEE Circuits Syst Mag 6(3):21–45, 2006.
26. Seely AJE, Bravi A, Herry C, Green G, Longtin A, Ramsay T, Fergusson D, McIntyre L, Kubelik D, Maziak DE, et al. Do heart and respiratory rate variability improve prediction of extubation outcomes in critically ill patients? Crit Care 18(2):R65, 2014.
27. Pudil P, Novovicova J, Kittler J. Floating search methods in feature selection. Pattern Recognit Lett.
28. Hand DJ. Evaluating diagnostic tests: the area under the ROC curve and the balance of errors. Stat Med 29(14):1502–1510, 2010.
29. Peng CK, Havlin S, Stanley HE, Goldberger AL. Quantification of scaling exponents and crossover phenomena in nonstationary heartbeat time series. Chaos 5(1):82–87, 1995.
30. Costa M, Goldberger AL, Peng C-K. Multiscale entropy analysis of complex physiologic time series. Phys Rev Lett 89(6):068102, 2002.
31. Benjamini Y, Yekutieli D. The control of the false discovery rate in multiple testing under dependency. Ann Stat 29(4):1165–1188, 2001.
32. McGee S. Simplifying likelihood ratios. J Gen Intern Med 17(8):646–649, 2002.
33. Eick C, Rizas KD, Meyer-Zürn CS, Groga-Bada P, Hamm W, Kreth F, Overkamp D, Weyrich P, Gawaz M, Bauer A. Autonomic nervous system activity as risk predictor in the medical emergency department: a prospective cohort study. Crit Care Med 43(5):1079–1086, 2015.
34. Yu S, Leung S, Heo M, Soto GJ, Shah RT, Gunda S, Gong MN. Comparison of risk prediction scoring systems for ward patients: a retrospective nested case-control study. Crit Care 18(3):R132, 2014.
35. Sprung CL, Schein RMH, Balk RA. To SIRS with love—an open letter. Crit Care Med 45(4):736–738, 2017.