Clinical Aspects

Validation of the Sepsis Severity Score Compared with Updated Severity Scores in Predicting Hospital Mortality in Sepsis Patients

Khwannimit, Bodin; Bhurayanontachai, Rungsun; Vattanavanit, Veerapong

doi: 10.1097/SHK.0000000000000818



Sepsis and septic shock are among the most serious life-threatening medical conditions. They consume healthcare resources and incur high overall costs (1, 2). Patients with these conditions require intensive care unit (ICU) admission for resuscitation and close hemodynamic monitoring. Several studies have reported that the incidence of sepsis and septic shock is increasing and that mortality remains high (3–5).

Severity scoring systems have been developed for predicting hospital deaths in general critically ill patients (6, 7). The Acute Physiology and Chronic Health Evaluation (APACHE) and the Simplified Acute Physiology Score (SAPS) are the most commonly used scoring models. However, the performance of these scores has declined over time; therefore, updated versions of both scores (APACHE IV (8) and SAPS 3 (9)) were developed using very large databases. Severity scores are used not only to provide information on disease severity and the risk of hospital mortality, but also for evaluation and monitoring of ICU performance and for ICU benchmarking (7, 10). Few studies have evaluated the performance of the APACHE IV and SAPS 3 scores in sepsis and septic shock patients (11, 12).

Recently, the Sepsis Severity Score (SSS) was devised as a new model for predicting hospital mortality in sepsis patients (13). The score was constructed from sepsis patients in the Surviving Sepsis Campaign database, who were admitted to 218 hospitals in 18 countries around the world. This score presented good calibration and acceptable discrimination for estimating hospital mortality in sepsis patients. Williams et al. (14) reported that the SSS had limited ability to predict outcome in sepsis patients within an emergency department. Before a new score is applied, its validity should be assessed in patients who are independent of the original population. Interestingly, the performance of the SSS in predicting outcome in sepsis patients outside the United States has not been evaluated.

The purpose of this study was to validate the SSS and compare its performance with that of the updated severity scores (APACHE IV and SAPS 3) and the standard severity scores (APACHE II, III, and SAPS II) in predicting hospital mortality in sepsis patients.


This study was a retrospective analysis of a prospective registry of severity scoring systems of patients admitted to our medical intensive care unit. This ICU contains 10 beds within an 842-bed tertiary referral, teaching university hospital at Prince of Songkla University, located in southern Thailand. Our ICU is run by three full-time, board-certified intensivists. This study was approved by our Institutional Review Board (REC 57-311-14-1) and informed consent was not required.

Adult patients (older than 15 years) with sepsis who were admitted to the ICU between January 1, 2011 and March 30, 2015 were enrolled in this analysis. Sepsis and septic shock were defined following the criteria of the Third International Consensus Definitions for sepsis and septic shock (Sepsis-3) (15). Patients who stayed in the ICU for less than 8 h (as per the APACHE II criteria) and those in whom life-sustaining therapy was withdrawn were excluded. For patients who had been admitted to the ICU more than once during their hospital stay, only the first admission was included.

All components of the SSS, APACHE II-IV (8, 16, 17), SAPS II (18), and SAPS 3 (9) scores, as described by the original papers, were collected (detailed in Additional file Table S1). The variables and calculation of the SSS are summarized in Additional file Table S2. The physiological data of SSS, APACHE II-IV, and SAPS II were based on the worst values within the first 24 h after ICU admission (8, 13, 16–18), in contrast to data for SAPS 3, which were based on the first hour before or after ICU admission (9). The Sequential Organ Failure Assessment score was calculated on the first day of ICU admission (19). The predicted hospital mortalities of SSS, APACHE II-III, SAPS II, and SAPS 3 were calculated according to the standard formulas defined by the original articles (9, 13, 16–18). APACHE IV predictions of hospital mortality were obtained from the Cerner Corporation. The primary outcome was in-hospital mortality.

Statistical analyses were performed using Stata 7 software (StataCorp, College Station, Tex). Descriptive data were summarized as mean ± standard deviation, median (interquartile range), or percentage. The chi-square test and the Wilcoxon rank sum test were used to compare categorical variables and continuous variables, respectively. The performance of the severity scores was evaluated by discrimination, calibration, and overall performance. Discrimination refers to the ability of the score to distinguish between survivors and non-survivors. It was evaluated using the area under the receiver operating characteristic curve (AUC), as described by Hanley and McNeil (20). Calibration refers to the agreement between the observed and expected numbers of survivors and non-survivors across all of the strata of probabilities of death. Calibration was examined by the Hosmer–Lemeshow (H–L) goodness-of-fit H and C statistics (21), calibration curve analysis, and the standardized mortality ratio (SMR). A P value >0.05 was accepted as indicating good calibration. Calibration curves were constructed by plotting predicted mortality rates, stratified by 10% increments of predicted mortality, against actual mortality rates. The SMR was calculated by dividing observed hospital mortality by predicted hospital mortality. The 95% confidence intervals (CI) for the SMRs were calculated by the method described by Rapoport et al. (22). Overall performance was assessed by the Brier score, which measures the mean squared deviation between the observed and predicted outcomes. The Brier score offers an overall assessment of performance involving elements of both discrimination and calibration (23). A lower score represents higher accuracy. In addition, the performance of all scores was compared between sepsis diagnosed by Sepsis-3 and by the previous definition (Sepsis-2) (24). Statistical significance was set at P <0.05.
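The three summary measures described above can be illustrated with a minimal Python sketch. It is not the Stata code used in the study. The function names and the toy data are hypothetical. The AUC is computed through its rank interpretation (the probability that a randomly chosen non-survivor has a higher predicted risk than a randomly chosen survivor), and the confidence interval of the SMR is omitted.

```python
from typing import Sequence


def auc(predicted: Sequence[float], died: Sequence[int]) -> float:
    """AUC as the probability that a random non-survivor has a higher
    predicted risk than a random survivor (ties count as one half)."""
    non_survivors = [p for p, d in zip(predicted, died) if d == 1]
    survivors = [p for p, d in zip(predicted, died) if d == 0]
    if not non_survivors or not survivors:
        raise ValueError("need at least one survivor and one non-survivor")
    wins = 0.0
    for p_dead in non_survivors:
        for p_alive in survivors:
            if p_dead > p_alive:
                wins += 1.0
            elif p_dead == p_alive:
                wins += 0.5
    return wins / (len(non_survivors) * len(survivors))


def brier_score(predicted: Sequence[float], died: Sequence[int]) -> float:
    """Mean squared difference between predicted risk and outcome (0 or 1)."""
    return sum((p - d) ** 2 for p, d in zip(predicted, died)) / len(died)


def smr(predicted: Sequence[float], died: Sequence[int]) -> float:
    """Observed deaths divided by expected deaths (sum of predicted risks)."""
    return sum(died) / sum(predicted)


# Hypothetical toy data: predicted hospital mortality and outcome (1 = died).
toy_predicted = [0.1, 0.2, 0.4, 0.6, 0.8, 0.9]
toy_died = [0, 0, 1, 0, 1, 1]
print(auc(toy_predicted, toy_died))
print(brier_score(toy_predicted, toy_died))
print(smr(toy_predicted, toy_died))
```

An SMR above 1 means that more patients died than the score predicted, and an SMR below 1 means fewer.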


There were 913 patients enrolled during the study period. Sepsis and septic shock were identified in 437 (47.9%) and 476 (52.1%) patients, respectively. Overall ICU and hospital mortality rates were 32.5% and 43.9%, respectively. The most common sources of ICU admission were the emergency department (47.5%), general wards (43.9%), and transfers from other hospitals (8.6%). Mechanical ventilation was used in 808 patients (88.5%). Patient clinical characteristics stratified by hospital outcome are presented in Table 1. Non-survivors were older and had higher severity scores, higher predicted mortality rates, more comorbidities, and longer ICU stays. However, survivors had more community-acquired infections and longer hospital stays (Table 1). Blood cultures were positive in 289 patients (31.6%) and microorganisms were isolated from 724 patients (79.3%). The most common organisms were Escherichia coli (19.9%), Klebsiella spp. (13%), and Acinetobacter baumannii (10.5%).

Table 1: Patient clinical characteristics data stratified by hospital death

Hospital mortality ranked by SSS is summarized in Figure 1. Hospital mortality increased substantially in patients with a higher SSS. Hospital mortality was below 5% in patients with an SSS of less than 60, whereas it rose to 80% in patients with scores of more than 100. The performance of the SSS and APACHE II-IV, SAPS II, and SAPS 3 is presented in Table 2. The SMRs of the severity scores were between 0.81 and 1.10. The discrimination of all scores was very good, with AUCs ranging from 0.892 to 0.948. The SSS presented good discrimination with an AUC of 0.892 (95% CI, 0.871–0.913). The AUC of the SSS did not differ significantly from that of APACHE II (P = 0.07), SAPS II (P = 0.06), or SAPS 3 (P = 0.11). However, the APACHE IV score showed the best discrimination with an AUC of 0.948 (Table 2 and Fig. 2). The AUC of the APACHE IV score was significantly greater than that of the SSS, APACHE II, SAPS II, and SAPS 3 (P <0.0001 for all) and APACHE III (P = 0.0002) (Fig. 2). The AUC of APACHE III was also higher than that of the SSS (P = 0.0002), APACHE II (P = 0.004), and SAPS II and SAPS 3 (P = 0.002 for both).

Fig. 1: Hospital mortality rate arranged by Sepsis Severity Score.
Table 2: The performance of SSS, APACHE II, III, IV, SAPS II, and SAPS 3
Fig. 2: Comparison of ROC curves for prediction of hospital mortality by Sepsis Severity Score and standard severity scores (APACHE II–IV, SAPS II, and SAPS 3). APACHE indicates acute physiology and chronic health evaluation; ROC, receiver operating characteristic; SAPS, simplified acute physiology score.

The calibration was poor for all scores, with P values of the H–L goodness-of-fit H statistic <0.05 (Table 2). The calibration curves for the SSS and APACHE II-IV scores are presented in Figure 3, and those for the SAPS scores are presented in Additional file Figure S1. Overall performance by the Brier score was best for APACHE IV (0.096) and worst for SSS (0.183).

Fig. 3: Calibration curves for Sepsis Severity Score and Acute Physiology and Chronic Health Evaluation II–IV. APACHE indicates acute physiology and chronic health evaluation; SSS, Sepsis Severity Score.
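The calibration curve and the H–L H statistic both rest on grouping patients by fixed 10% increments of predicted mortality. The Python sketch below illustrates this under stated assumptions and is not the study's Stata analysis. The function names are hypothetical, predicted risks are assumed to lie strictly between 0 and 1, and the degrees of freedom are left to the caller because conventions differ between development and validation samples. The chi-square tail probability uses a closed form that holds only for even degrees of freedom.

```python
import math
from typing import List, Sequence, Tuple

Bin = Tuple[int, int, float]  # (patients, observed deaths, expected deaths)


def calibration_bins(predicted: Sequence[float], died: Sequence[int],
                     n_bins: int = 10) -> List[Bin]:
    """Group patients into fixed-width strata of predicted risk
    (10% increments by default) and drop empty strata."""
    counts = [0] * n_bins
    observed = [0] * n_bins
    expected = [0.0] * n_bins
    for p, d in zip(predicted, died):
        i = min(int(p * n_bins), n_bins - 1)  # a risk of 1.0 joins the top stratum
        counts[i] += 1
        observed[i] += d
        expected[i] += p
    return [(counts[i], observed[i], expected[i])
            for i in range(n_bins) if counts[i] > 0]


def hosmer_lemeshow_h(bins: Sequence[Bin]) -> float:
    """Sum over strata of (O - E)^2 / (n * mean_risk * (1 - mean_risk))."""
    stat = 0.0
    for n, obs, exp in bins:
        mean_risk = exp / n  # assumed strictly between 0 and 1
        stat += (obs - exp) ** 2 / (n * mean_risk * (1.0 - mean_risk))
    return stat


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability, exact closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Hypothetical toy data: 10 patients at 20% risk (2 deaths) and
# 10 patients at 80% risk (5 deaths).
toy_predicted = [0.2] * 10 + [0.8] * 10
toy_died = [1, 1] + [0] * 8 + [1] * 5 + [0] * 5
toy_bins = calibration_bins(toy_predicted, toy_died)
for n, obs, exp in toy_bins:
    # One point of the calibration curve: mean predicted vs observed mortality.
    print(exp / n, obs / n)
print(hosmer_lemeshow_h(toy_bins))
```

Each stratum gives one point on the calibration curve, and a small P value for the H statistic indicates poor agreement between observed and expected deaths.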

The performance of the SSS in subgroup analyses is shown in Table 3. The SSS showed good discrimination in all subgroups of sepsis patients, with AUCs of 0.864 to 0.903, except in patients with severe sepsis diagnosed by Sepsis-2. The discrimination of the SSS was similar across the subgroups of sepsis patients stratified by Sepsis-3. The AUC of the SSS was not significantly different between septic shock defined by Sepsis-3 and by Sepsis-2. However, discrimination declined for all scores in the group of patients with severe sepsis by the Sepsis-2 definition (Table 3 and Additional file Tables S3 and S4). The SSS underestimated hospital mortality in septic shock stratified by both definitions; however, the score overestimated mortality in severe sepsis by Sepsis-2. The calibration of the SSS was unsatisfactory in most subgroups of sepsis in our study. The AUCs of the SSS and the SAPS 3 score did not differ significantly in any subgroup of sepsis except septic shock diagnosed by Sepsis-3 and patients younger than 60 years (Additional file Table S3). Similarly, the APACHE IV score presented the best discrimination and overall performance in all subgroups of our sepsis patients (Additional file Tables S3 and S4).

Table 3: The performance of Sepsis Severity Score for various subgroups of sepsis patients


In our study, the SSS provided good discrimination but poor calibration for predicting hospital mortality in sepsis patients admitted to the ICU. The APACHE IV score had the best discrimination and overall performance. However, the calibration of all models was poor in predicting outcome in our sepsis patients.

To the best of our knowledge, this is the first report of an external evaluation of the performance of the SSS in predicting hospital outcome in sepsis patients under the new sepsis definition (Sepsis-3). It is also believed to be the first attempt to compare the SSS with the updated standard severity scores.

Mortality risk assessment in sepsis patients is commonly used in critical care for many purposes such as patient's risk stratification in clinical trials, risk adjustment in cohort studies, resource allocation, and evaluation of ICU performance (10). Several risk stratification methods in sepsis have been applied, including the use of biomarkers, organ failure scores, and severity scores (12, 25–29). However, there is no ideal score solely for sepsis. Thus, new models were devised for these patients.

The SSS was recently proposed as a new prediction model for outcome in sepsis patients. Osborn et al. (13) undertook a secondary analysis of 23,438 sepsis patients from the Surviving Sepsis Campaign database for constructing the model. Logistic regression analysis was used for estimating the probability of hospital mortality. The prediction model was developed using a random sample of 21,085 patients and was validated by the bootstrapped random sample technique. The SSS model had 34 categorical variables. They reported an AUC of 0.736 in the development group and 0.701 in the validation group with good calibration (the H-L goodness-of-fit P = 0.58) (13). In our study, the discrimination of SSS was higher than that of the original study (AUC 0.892).

Williams et al. (14) evaluated the performance of the SSS alongside the APACHE II and SAPS II scores in sepsis patients presenting to an emergency department. The SSS showed moderate discrimination for sepsis patients (AUC 0.78) but poor discrimination in the subgroups of severe sepsis and septic shock patients (AUC 0.69 and 0.63, respectively) (14). The APACHE II and SAPS II scores had better discrimination than the SSS (AUC 0.86, 0.88, and 0.78, respectively). In contrast, in our study the SSS showed good discrimination that was not different from that of the APACHE II, SAPS II, and SAPS 3 scores for predicting hospital outcome in sepsis patients. Differences in case-mix, sepsis definitions, and severity of sepsis may underlie the opposite results. The majority of sepsis patients in our study had septic shock, and they had higher severity scores than those in the previous report (APACHE II 22.8 vs. 6 and SSS 82.7 vs. 13) (14).

Several studies have evaluated the performance of APACHE IV and SAPS 3 in specific subgroups of critically ill patients, such as those with acute coronary syndrome, acute kidney injury, and sepsis. Nassar et al. (30) compared APACHE IV and SAPS 3 in predicting hospital mortality of acute coronary syndrome patients admitted to an ICU in Brazil. Both scores showed good discrimination but only APACHE IV demonstrated adequate calibration. Costa et al. (31) showed that the APACHE IV and SAPS 3 scores had good discrimination and calibration for predicting mortality in acute kidney injury patients. A single previous study validated the performance of APACHE IV in a small group of surgical abdominal sepsis patients and found that APACHE IV had moderate discrimination with an AUC of 0.72 (11). However, calibration was not evaluated in that study. Zhang et al. (26) evaluated the ability of the APACHE III score to predict outcome in patients with sepsis-associated acute lung injury. APACHE III had moderate discrimination with an AUC of 0.68 but good calibration for predicting the outcome of the study population.

Our results are consistent with previous studies, which reported that the updated versions of the APACHE scores were associated with better discrimination (23, 32). However, an updated version of the APACHE score requires more variables, comorbidities, and details of diagnosis categories. Therefore, the use of an updated version is associated with an increased burden of data collection, a greater workload, more complex calculations, and higher financial costs. Kuzniewicz et al. (33) reported that abstraction times for the variables of APACHE IV and SAPS II were 37.3 and 19.6 min, respectively. The better discrimination of APACHE III and APACHE IV compared with the SSS may result from the inclusion of age, comorbidities, and more predictor variables.

A scoring model with higher discrimination is better able to identify patients with a higher probability of dying. With this in mind, the SSS could be used for risk stratification in sepsis patients in the same way as the APACHE II, SAPS II, and SAPS 3 scores. The calibration of the SSS and all severity scores was poor for predicting outcome in our sepsis patients. There are several possible causes for this suboptimal performance, including differences in case-mix, standard of care, and treatment policies. Therefore, the SSS and the severity scores should be customized, or a new model should be created, for predicting outcome in sepsis patients. First-level customization is a simple and practical method for a local institution or a country to improve the performance of severity models (34, 35). First-level customization refers to calculating a new logistic coefficient while maintaining the same variables with the same weights as the original model. Moreover, the SSS may be more accurately modified using new variables such as age, comorbidities, or new physiological parameters.
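First-level customization as described above can be sketched in Python under one stated assumption: the original score is kept as a single predictor with its points unchanged, and only the intercept and slope of the logistic equation are re-estimated from local data. The function names and the toy data are hypothetical, the fit uses plain Newton-Raphson iterations, and the data must not be perfectly separable by the score. This is an illustration, not the procedure of references 34 and 35.

```python
import math
from typing import Sequence, Tuple


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def first_level_customization(scores: Sequence[float], died: Sequence[int],
                              max_iter: int = 100,
                              tol: float = 1e-10) -> Tuple[float, float]:
    """Re-estimate b0 and b1 in logit(P(death)) = b0 + b1 * score by
    Newton-Raphson, leaving the score itself untouched."""
    mean = sum(scores) / len(scores)
    x = [s - mean for s in scores]  # centring improves numerical behaviour
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, died):
            p = _sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p            # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det <= 0.0:
            raise ValueError("information matrix is singular")
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0 - b1 * mean, b1  # undo the centring


def customized_risk(score: float, b0: float, b1: float) -> float:
    """Predicted hospital mortality from the customized equation."""
    return _sigmoid(b0 + b1 * score)


# Hypothetical toy data: severity scores and hospital outcome (1 = died).
toy_scores = [40, 50, 60, 70, 80, 90, 100, 110]
toy_died = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = first_level_customization(toy_scores, toy_died)
print([round(customized_risk(s, b0, b1), 3) for s in toy_scores])
```

Because the intercept is re-estimated, the customized predictions sum to the observed number of deaths in the local data, so the SMR in that sample becomes 1. The ranking of patients, and therefore the AUC, is unchanged as long as the fitted slope is positive.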

There are some limitations in our study. First, this is a single-center study, so our data may not be applicable to other ICUs with differences in case-mix, admission criteria, and standard of ICU care. However, the evaluation of severity scores in a single institute may more accurately reflect the performance of the score, without potential confounding by ICU management. Second, we only evaluated sepsis patients admitted to a medical ICU; accordingly, these results may not apply to specific groups of sepsis patients, such as those with surgical or traumatic sepsis.

In conclusion, the SSS was similar in discrimination to APACHE II, SAPS II, and SAPS 3 scores for predicting hospital mortality in sepsis patients admitted to the ICU. However, APACHE IV scores provide the best discrimination and overall performance. The calibration of all scores was poor. Furthermore, SSS should be customized as well as validated in independent groups of sepsis patients, before generalized application of the score in clinical practice.


The authors thank Dr Alan Frederick Geater for editing of the manuscript.


1. Khwannimit B, Bhurayanontachai R. The direct costs of intensive care management and risk factors for financial burden of patients with severe sepsis and septic shock. J Crit Care 2015; 30(5):929–934.
2. Jones SL, Ashton CM, Kiehne LB, Nicolas JC, Rose AL, Shirkey BA, Masud F, Wray NP. Outcomes and resource use of sepsis-associated stays by presence on admission, severity, and hospital type. Med Care 2016; 54(3):303–310.
3. Kaukonen KM, Bailey M, Suzuki S, Pilcher D, Bellomo R. Mortality related to severe sepsis and septic shock among critically ill patients in Australia and New Zealand, 2000–2012. JAMA 2014; 311(13):1308–1316.
4. Fedeli U, Piccinni P, Schievano E, Saugo M, Pellizzer G. Growing burden of sepsis-related mortality in northeastern Italy: a multiple causes of death analysis. BMC Infect Dis 2016; 16:330.
5. Kempker JA, Martin GS. The changing epidemiology and definitions of sepsis. Clin Chest Med 2016; 37(2):165–179.
6. Keegan MT, Gajic O, Afessa B. Severity of illness scoring systems in the intensive care unit. Crit Care Med 2011; 39(1):163–169.
7. Salluh JI, Soares M. ICU severity of illness scores: APACHE, SAPS and MPM. Curr Opin Crit Care 2014; 20(5):557–565.
8. Zimmerman JE, Kramer AA, McNair DS, Malila FM. Acute Physiology and Chronic Health Evaluation (APACHE) IV: hospital mortality assessment for today's critically ill patients. Crit Care Med 2006; 34(5):1297–1310.
9. Moreno RP, Metnitz PG, Almeida E, Jordan B, Bauer P, Campos RA, Iapichino G, Edbrooke D, Capuzzo M, Le Gall JR. SAPS 3—from evaluation of the patient to evaluation of the intensive care unit. Part 2: development of a prognostic model for hospital mortality at ICU admission. Intensive Care Med 2005; 31(10):1345–1355.
10. Power GS, Harrison DA. Why try to predict ICU outcomes? Curr Opin Crit Care 2014; 20(5):544–549.
11. Chan T, Bleszynski MS, Buczkowski AK. Evaluation of APACHE-IV predictive scoring in surgical abdominal sepsis: a retrospective cohort study. J Clin Diagn Res 2016; 10(3):PC16–PC18.
12. Khwannimit B, Bhurayanontachai R. Validation of predisposition, infection, response and organ dysfunction score compared with standard severity scores in predicting hospital outcome in septic shock patients. Minerva Anestesiol 2013; 79(3):257–263.
13. Osborn TM, Phillips G, Lemeshow S, Townsend S, Schorr CA, Levy MM, Dellinger RP. Sepsis severity score: an internationally derived scoring system from the surviving sepsis campaign database. Crit Care Med 2014; 42(9):1969–1976.
14. Williams JM, Greenslade JH, Chu K, Brown AF, Lipman J. Severity scores in emergency department patients with presumed infection: a prospective validation study. Crit Care Med 2016; 44(3):539–547.
15. Singer M, Deutschman CS, Seymour CW, Shankar-Hari M, Annane D, Bauer M, Bellomo R, Bernard GR, Chiche JD, Coopersmith CM, et al. The third international consensus definitions for sepsis and septic shock (Sepsis-3). JAMA 2016; 315(8):801–810.
16. Knaus WA, Draper EA, Wagner DP, Zimmerman JE. APACHE II: a severity of disease classification system. Crit Care Med 1985; 13(10):818–829.
17. Knaus WA, Wagner DP, Draper EA, Zimmerman JE, Bergner M, Bastos PG, Sirio CA, Murphy DJ, Lotring T, Damiano A, et al. The APACHE III prognostic system. Risk prediction of hospital mortality for critically ill hospitalized adults. Chest 1991; 100(6):1619–1636.
18. Le Gall JR, Lemeshow S, Saulnier F. A new Simplified Acute Physiology Score (SAPS II) based on a European/North American multicenter study. JAMA 1993; 270(24):2957–2963.
19. Vincent JL, Moreno R, Takala J, Willatts S, De Mendonca A, Bruining H, Reinhart CK, Suter PM, Thijs LG. The SOFA (Sepsis-related Organ Failure Assessment) score to describe organ dysfunction/failure. On behalf of the Working Group on Sepsis-Related Problems of the European Society of Intensive Care Medicine. Intensive Care Med 1996; 22(7):707–710.
20. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982; 143(1):29–36.
21. Lemeshow S, Hosmer DW Jr. A review of goodness of fit statistics for use in the development of logistic regression models. Am J Epidemiol 1982; 115(1):92–106.
22. Rapoport J, Teres D, Lemeshow S, Gehlbach S. A method for assessing the clinical performance and cost-effectiveness of intensive care units: a multicenter inception cohort study. Crit Care Med 1994; 22(9):1385–1391.
23. Keegan MT, Gajic O, Afessa B. Comparison of APACHE III, APACHE IV, SAPS 3, and MPM0III and influence of resuscitation status on model performance. Chest 2012; 142(4):851–858.
24. Levy MM, Fink MP, Marshall JC, Abraham E, Angus D, Cook D, Cohen J, Opal SM, Vincent JL, Ramsay G. 2001 SCCM/ESICM/ACCP/ATS/SIS International Sepsis Definitions Conference. Intensive Care Med 2003; 29(4):530–538.
25. Ghanem-Zoubi NO, Vardi M, Laor A, Weber G, Bitterman H. Assessment of disease-severity scoring systems for patients with sepsis in general internal medicine departments. Crit Care 2011; 15(2):R95.
26. Zhang Z, Chen K, Chen L. APACHE III outcome prediction in patients admitted to the intensive care unit with sepsis associated acute lung injury. PLoS One 2015; 10(9):e0139374.
27. Seymour CW, Liu VX, Iwashyna TJ, Brunkhorst FM, Rea TD, Scherag A, Rubenfeld G, Kahn JM, Shankar-Hari M, Singer M, et al. Assessment of clinical criteria for sepsis: for the third international consensus definitions for sepsis and septic shock (Sepsis-3). JAMA 2016; 315(8):762–774.
28. Lipinska-Gediga M, Mierzchala-Pasierb M, Durek G. Procalcitonin kinetics - prognostic and diagnostic significance in septic patients. Arch Med Sci 2016; 12(1):112–119.
29. Ward MJ, Self WH, Singer A, Lazar D, Pines JM. Cost-effectiveness analysis of early point-of-care lactate testing in the emergency department. J Crit Care 2016; 36:69–75.
30. Nassar Junior AP, Mocelin AO, Andrade FM, Brauer L, Giannini FP, Nunes AL, Dias CA. SAPS 3, APACHE IV or GRACE: which score to choose for acute coronary syndrome patients in intensive care units? Sao Paulo Med J 2013; 131(3):173–178.
31. Costa e Silva VT, de Castro I, Liano F, Muriel A, Rodriguez-Palomares JR, Yu L. Performance of the third-generation models of severity scoring systems (APACHE IV, SAPS 3 and MPM-III) in acute kidney injury critically ill patients. Nephrol Dial Transplant 2011; 26(12):3894–3901.
32. Juneja D, Singh O, Nasa P, Dang R. Comparison of newer scoring systems with the conventional scoring systems in general intensive care population. Minerva Anestesiol 2012; 78(2):194–200.
33. Kuzniewicz MW, Vasilevskis EE, Lane R, Dean ML, Trivedi NG, Rennie DJ, Clay T, Kotler PL, Dudley RA. Variation in ICU risk-adjusted mortality: impact of methods of assessment and potential confounders. Chest 2008; 133(6):1319–1327.
34. Metnitz B, Schaden E, Moreno R, Le Gall JR, Bauer P, Metnitz PG. Austrian validation and customization of the SAPS 3 Admission Score. Intensive Care Med 2009; 35(4):616–622.
35. Khwannimit B, Bhurayanontachai R. A comparison of the performance of Simplified Acute Physiology Score 3 with old standard severity scores and customized scores in a mixed medical-coronary care unit. Minerva Anestesiol 2011; 77(3):305–312.

Acute physiology and chronic health evaluation; outcome; risk assessment; severity of illness; simplified acute physiology score


© 2017 by the Shock Society