The prognostic and general severity scoring systems used in the ICU are useful for predicting the risk of mortality. Mortality prediction is important for informing patients and families and obtaining consent, for comparing ICU results, and for monitoring the quality of ICU care [1,2]. The Acute Physiology and Chronic Health Evaluation (APACHE) [3] and the Simplified Acute Physiology Score (SAPS) [4] are the most common systems in use. The latest version of SAPS, the SAPS3 Admission Score (SAPS3as), was recently introduced as an improvement on older systems [5,6].
An ICU can improve the performance of any score with periodic recalibration of the equations [7]. Changing the scoring system used by an ICU to improve predictive ability is justified when the newly adopted score considers variables that were not in the previous system. SAPS3 departs from the usual physiology-based scoring systems by introducing three subscores: patient characteristics before ICU admission [age, comorbidities, use of vasoactive drugs, intrahospital location, and hospital length of stay (LOS) before ICU admission, called lead time]; circumstances of ICU admission (reason for ICU admission, planned or unplanned admission, surgical status, anatomical site of surgery, and presence of infection and place acquired); and acute physiology (lowest estimated Glasgow coma scale score, highest heart rate, lowest systolic blood pressure, highest bilirubin, highest body temperature, highest creatinine, highest leukocytes, lowest platelets, lowest pH, ventilatory support, and oxygenation). The three subscores are summed to produce the SAPS3 score and provide 50, 22.5, and 27.5% of its predictive power, respectively, in contrast with APACHE II, in which the acute physiology score is the most important factor (65.6% of predictive power, instead of 27.5%). The patients' worst physiologic parameters at ICU admission (<1 h) are recorded. The original study [6] found regional differences in mortality for the same SAPS3as, so the probability of mortality is calculated from the SAPS3as using either the equation derived from the global database (GlobalS3) or an equation customized for one of seven world regions according to the location of the hospital: Southern Europe–Mediterranean countries (SEMcS3), Central–Western Europe, Eastern Europe, North Europe, Australasia, Central–South America, and North America.
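The conversion from a SAPS3 admission score to a probability of hospital death can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name and the tuple layout are hypothetical, and the global coefficients are the values commonly quoted from the original SAPS 3 development paper, so they should be checked against that source before any real use. Regionally customized equations such as SEMcS3 have the same form with different coefficients, which are deliberately not reproduced here.

```python
import math

# Global SAPS 3 equation coefficients as commonly quoted from the original
# development paper. ASSUMPTION: verify against the source before real use.
GLOBAL_S3 = (-32.6659, 20.5958, 7.3068)  # (intercept, shift, slope)


def saps3_mortality(score, coeffs=GLOBAL_S3):
    """Predicted hospital mortality (0..1) for a SAPS3 admission score.

    logit = intercept + ln(score + shift) * slope
    probability = 1 / (1 + exp(-logit))
    """
    intercept, shift, slope = coeffs
    logit = intercept + math.log(score + shift) * slope
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    # Predicted risk rises steeply with the score.
    for s in (30, 50, 70):
        print(s, round(saps3_mortality(s), 3))
```

A customized equation would be passed as `coeffs=(intercept, shift, slope)` once its published coefficients have been looked up.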
We have been recording APACHE II for several years in our ICU but switched to SAPS3as in 2006. To assess the extent of improvement that SAPS3as provided and to validate SAPS3as customized for Southern Europe, we evaluated the performance of both APACHE II and SAPS3as scores in predicting hospital mortality among our ICU patients.
The study was conducted in a mixed medical and surgical ICU at a tertiary referral university teaching hospital. The 12-bed ICU had approximately 1000 admissions per year. The hospital also has a 12-bed coronary and intermediate care unit, which limits the number of coronary care patients in the ICU. The ICU was staffed with fully trained anaesthesiologists in a round-the-clock shift system. The study was approved by the hospital ethics committee, and informed consent was waived. For consecutive ICU admissions over a 1-year period (January–December 2006), we concurrently collected the data needed to compute SAPS3as and assessed the presence of organ dysfunction at admission with the Sequential Organ Failure Assessment (SOFA) [8]. Data for APACHE II were collected retrospectively from computerized ICU charts by three investigators (M.J.Y., M.V. and G.E.), and the main investigator (C.-L.M.) rechecked each chart to control the quality of both APACHE and SAPS3 data collection. We excluded readmissions, patients aged 15 years or younger, and patients who died within 4 h of admission. Data were collected according to the criteria and definitions described by the developers, and we calculated predicted hospital mortality with the SAPS3 global model (GlobalS3), the SAPS3 model customized for Southern Europe–Mediterranean countries (SEMcS3), and the APACHE II model. ICU LOS and lead time (pre-ICU hospital LOS) were also calculated.
The outcome measure was hospital mortality. We compared characteristics of survivors with those of nonsurvivors: continuous variables are reported as median ± interquartile range and compared using Student's t-test. Categorical variables are shown as the count and percentage and compared with the χ2 test.
The performance of the systems (the APACHE II model, the GlobalS3 model, the SEMcS3 model, and SOFA scores) was determined by examining their discrimination and calibration. Discrimination (the ability to correctly classify survivors and nonsurvivors) was tested by the area under the receiver operating characteristic curve (AUROC). The AUROCs were compared using the algorithm proposed by DeLong et al. [9].
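As an illustration of what the AUROC measures, the sketch below computes it directly as the probability that a randomly chosen nonsurvivor has a higher score than a randomly chosen survivor, counting ties as one half (the Mann–Whitney formulation). The function name and inputs are hypothetical, and the study itself used statistical packages rather than code like this. The DeLong comparison of two correlated AUROCs is not implemented here.

```python
def auroc(scores, died):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    scores: severity score or predicted risk, one value per patient
    died:   1 for hospital nonsurvivors, 0 for survivors
    """
    pos = [s for s, d in zip(scores, died) if d == 1]
    neg = [s for s, d in zip(scores, died) if d == 0]
    if not pos or not neg:
        raise ValueError("need at least one survivor and one nonsurvivor")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # nonsurvivor correctly ranked higher
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data: higher scores mostly belong to nonsurvivors.
    print(auroc([12, 18, 25, 31, 40], [0, 0, 1, 0, 1]))
```

The pairwise loop is quadratic in the number of patients, which is acceptable for a cohort of a few hundred admissions; a rank-based version would be preferable for much larger data sets.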
The calibration of the systems (prognostic accuracy at different levels of risk) was studied using the Hosmer–Lemeshow χ2 goodness-of-fit statistic (Ĉ) and the standardized mortality ratio (SMR). The smaller the Ĉ value, the better the model's calibration; a P value above 0.05 in the Ĉ test was taken to indicate good calibration. The SMR, a crude measure of calibration, was calculated by dividing the observed hospital mortality by the predicted hospital mortality. The 95% confidence interval (CI) for the SMR was calculated by the Boice–Monson method.
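The Ĉ statistic can be sketched as follows: patients are sorted by predicted risk, split into equal-sized groups, and the observed and expected deaths in each group are compared. Both function names are hypothetical. The use of 10 groups and 8 degrees of freedom is an assumption, since the text states neither, although the P values reported alongside the Ĉ values in this study are consistent with 8 degrees of freedom. The P value helper uses a closed form that holds only for even degrees of freedom.

```python
import math


def hosmer_lemeshow_c(probs, died, groups=10):
    """Hosmer-Lemeshow C-hat over equal-sized groups of sorted predicted risk."""
    # Sort by predicted risk only; the stable sort keeps tied patients in input order.
    pairs = sorted(zip(probs, died), key=lambda t: t[0])
    n = len(pairs)
    c_hat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        expected = sum(p for p, _ in chunk)   # expected deaths in the group
        observed = sum(d for _, d in chunk)   # observed deaths in the group
        mean_p = expected / len(chunk)
        denom = expected * (1.0 - mean_p)
        if denom > 0:
            c_hat += (observed - expected) ** 2 / denom
    return c_hat


def chi2_sf_even_df(x, df=8):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


if __name__ == "__main__":
    # P value for a given C-hat under the assumed 8 degrees of freedom.
    print(round(chi2_sf_even_df(8.57), 2))
```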
We compared the systems in all included patients and in two subsamples: one after excluding cardiac surgery patients (who were not considered in APACHE II development) and the other after excluding patients with an estimated probability of death of less than 10% according to APACHE II (low-risk patients, whose inclusion could change the models' predictive accuracy).
All statistical tests were two-sided, and a significance level of 0.05 was used. Statistical analysis was performed using the SPSS version 15 software package (SPSS, Inc., Chicago, Illinois, USA) and Stata 10.0 software (Stata Corporation, College Station, TX, USA).
During the study period, 935 patients were admitted to the ICU. Seventy-one patients were excluded (69 readmissions and two early deaths); thus, 864 patients were considered for analysis. Overall, 71 patients (8.2%) died in the hospital and 50 patients (5.8%) in the ICU.
To characterize our ICU population, we show the patients' demographics, type of admission, diagnostic categories, scores, and LOS in Table 1. As expected, our nonsurvivor patients were older, mainly medical admissions with respiratory diseases and with more comorbidities. They presented longer ICU LOS and lead time and higher severity scores.
The performance of the scoring systems is shown in Table 2 and is displayed graphically in Figs 1 and 2. SAPS3as had greater discriminative power (AUROC 0.917) than APACHE II or SOFA, though the comparison by AUROC showed no statistically significant differences (APACHE II vs. SAPS3as χ2 = 0.83; P = 0.36). GlobalS3 (Ĉ = 8.57, P = 0.38) and SEMcS3 (Ĉ = 7.5, P = 0.48) were well calibrated, in contrast with APACHE (Ĉ = 19.0, P = 0.015).
After excluding postoperative cardiac surgery patients, both systems showed better discrimination but worse calibration in the remaining 726 patients.
After excluding patients who had an estimated probability of death of less than 10%, both systems showed poorer discrimination in the remaining 237 patients; only the calibration of APACHE improved.
SMR analysis revealed no differences in predictive results with any model. Both systems overestimated hospital mortality with significant improvement after excluding low-risk patients.
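The SMR calculation can be sketched as below. An SMR below 1 means that fewer patients died than the model predicted, which is what overestimation of mortality looks like. The function name is hypothetical, and the expected number of deaths in the usage line is an invented placeholder, because the predicted mortality figures are not reproduced in the text. The confidence interval uses Byar's approximation to the Poisson limits, which is a substitute chosen for this sketch and is not the Boice–Monson method used in the study.

```python
import math


def smr_with_ci(observed, expected, z=1.96):
    """Standardized mortality ratio with an approximate 95% CI.

    observed: number of observed hospital deaths
    expected: sum of predicted death probabilities over all patients
    The CI uses Byar's approximation (NOT the Boice-Monson method).
    """
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    smr = observed / expected

    o = float(observed)
    lower = 0.0
    if o > 0:
        lower = o * (1 - 1 / (9 * o) - z / (3 * math.sqrt(o))) ** 3 / expected
    u = o + 1.0
    upper = u * (1 + z / (3 * math.sqrt(u)) - 1 / (9 * u)) ** 3 / expected
    return smr, lower, upper


if __name__ == "__main__":
    # 71 observed hospital deaths (from the text); 100.0 expected deaths is a
    # HYPOTHETICAL value used only to show the call.
    print(smr_with_ci(71, 100.0))
```

If the whole interval lies below 1, the overestimation of mortality is unlikely to be due to chance alone.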
In this study, the first validation of SAPS3as in Southern Europe, we demonstrate that the ability to predict hospital mortality in a general ICU can be improved by adopting the newer SAPS3as and departing from use of the older APACHE II.
We found that the discriminative ability of SAPS3as is excellent, even better than in the original report. Moreover, it has greater discriminative power than APACHE or SOFA in our critically ill patients. SAPS3as also has a better, more appropriate calibration than APACHE, so only SAPS3as properly predicts mortality risk in our ICU. Our results indicate that a validation study can confirm the superior predictive performance of SAPS3 compared with APACHE in an adult mixed-case ICU.
We chose APACHE II for comparison with SAPS3as because APACHE II was used in our ICU and is the most popular system in Spain. Many Spanish ICUs are reluctant to implement SAPS3as due to greater familiarity with APACHE II and the lack of validation studies with SAPS3as in Southern Europe.
The improved performance of SAPS3as compared with APACHE II in our study can be explained by several observations. SAPS3as was developed much more recently (2005 versus 1985 for APACHE II). Its developmental database is broader (16 784 patients in 303 ICUs worldwide versus 5005 patients in the United States). It avoids lead-time bias by taking into account both the pre-ICU hospital LOS and therapeutic intervention before ICU admission, and, because data are collected at the time of admission, it avoids the influence of potentially poor care given in the first 24 h (Boyd and Grounds effect: the occurrence of more abnormal physiologic values during the first 24 h in the ICU, leading to an increase in computed severity of illness and a corresponding increase in predicted mortality). Finally, SAPS3as includes other patient characteristics and circumstances of admission rather than just patients' acute physiological changes. Another important advantage of SAPS3as compared with other scoring systems is its free, open access to the scientific community.
The insufficient calibration of APACHE II and its observed overestimation of mortality can be explained by differences in patient selection and case mix. Few demographic patient data were reported in the original description of APACHE II, but compared with the SAPS3as developmental cohort, our patients were more often postoperative admissions, had fewer emergency operations, and suffered from very few traumas, which are characteristics that elicit the best calibration in the SAPS3as developmental model. Although ICU admission policies generally are unknown, they probably also influence outcome. The APACHE model differs in risk assessment of medical or surgical patients. Nevertheless, APACHE II prediction has been more consistent across a wide range of mortality risks than APACHE III or SAPS II. Patient selection may, at least in part, explain our incongruent results compared with recent studies that have evaluated these scoring systems [13–17].
SAPS3as has been validated only in central Europe: in one general ICU in Belgium (802 patients), a surgical ICU in Germany (1851 patients), and 22 ICUs in Austria (2060 patients) [13–15]. These studies demonstrated the superior discriminative power of SAPS3as over APACHE II but noted the poor calibration of the global model [13–15]. The SAPS3as model customized for central and western Europe had a good calibration, similar to SAPS II, in Belgium [13] but not in Germany or Austria [14,15]. Differences in the performance of scoring systems reinforce the need to validate them using data from independent samples in different ICUs and different countries, because case mix, the structure and organization of acute medical care, lifestyles, and genetic makeup vary between populations.
Adequate discrimination by APACHE II has previously been described, with an AUROC of 0.91 in Thailand, 0.88 in Hong Kong, 0.83 in Greece and Saudi Arabia, and 0.79 in Portugal. Its calibration, however, has always been poor, as evidenced by recent studies, primarily due to differences in case mix, data collection, and lead-time bias [11,17]. Because postoperative cardiac surgical patients were not included in the developmental database of APACHE II, we also chose to exclude this subpopulation, which yielded marginally better discrimination (not statistically significant) but poorer calibration.
As our ICU mortality is low, we also chose to exclude the subpopulation of low-risk admissions (mortality risk <10%) because nearly two-fold greater mortality ratios generated by APACHE III and SAPS II in low-risk patients have been described [12]. After excluding low-risk patients, we found poorer discrimination by both systems but better calibration of APACHE II, which points to its overestimation of hospital mortality in low-risk patients.
Because SOFA was developed as an organ dysfunction scoring system, we did not compute prediction of death with SOFA. However, it has also been used as a prognostic score because organ dysfunction is independently associated with mortality, so we used the SOFA score at admission as a comparator among the predictors of hospital mortality. We observed better calibration with it than with APACHE II, at the cost of poorer discriminative power, in contrast with an Australian study in which APACHE II performed better than SOFA in predicting hospital mortality [18].
The present study has some limitations. First, as a single-centre study, there may be bias with regard to case mix, quality of ICU care, and ICU policy. Our low-risk ICU admissions can affect the models' predictive accuracy, though SAPS3as maintained good calibration even after exclusion of low-risk patients. Second, our relatively small sample size is a limiting factor in stratified analysis of calibration. Third, APACHE II is based on retrospective data from an automatic physiological data management system covering the first 24 h of ICU admission; consequently, the sampling rate that is used can influence mortality estimation. Furthermore, though we rechecked the charts for data collection quality, we did not calculate or use any statistical test to analyse interrater variance.
A multicentre study would mitigate the concerns over case mix and benefit from a larger sample size.
We found a better calibration of SAPS3as than of APACHE II, such that SAPS3as improves the ability to predict hospital mortality in comparison with the older APACHE II score. Hospital mortality was lower than predicted by both models. The discrimination of SAPS3as is excellent, and, when it is customized for Southern Europe, SAPS3as accurately predicts mortality risk in our adult mixed-case ICU.
1 Strand K, Flaatten H. Severity scoring in the ICU: a review. Acta Anaesthesiol Scand 2008; 52:467–478.
2 Zimmerman JE, Alzola C, Von Rueden KT. The use of benchmarking to identify top performing critical care units: a preliminary assessment of their policies and practices. J Crit Care 2003; 18:76–86.
3 Knaus WA, Draper EA, Wagner DP, Zimmerman JE. APACHE II: a severity of disease classification system. Crit Care Med 1985; 13:818–829.
4 Le Gall JR, Lemeshow S, Saulnier F. A new Simplified Acute Physiology Score (SAPS II) based on a European/North American multicenter study. JAMA 1993; 270:2957–2963.
5 Metnitz PG, Moreno RP, Almeida E, et al. SAPS3 – from evaluation of the patient to evaluation of the intensive care unit. Part 1: objectives, methods and cohort description. Intensive Care Med 2005; 31:1336–1344.
6 Moreno RP, Metnitz PG, Almeida E, et al. SAPS3 – from evaluation of the patient to evaluation of the intensive care unit. Part 2: development of a prognostic model for hospital mortality at ICU admission. Intensive Care Med 2005; 31:1345–1355.
7 Harrison DA, Brady AR, Parry GJ, et al. Recalibration of risk prediction models in a large multicenter cohort of admissions to adult, general critical care units in the United Kingdom. Crit Care Med 2006; 34:1378–1388.
8 Vincent JL, Moreno R, Takala J, et al. The SOFA (Sepsis-related Organ Failure Assessment) score to describe organ dysfunction/failure. On behalf of the Working Group on Sepsis-Related Problems of the European Society of Intensive Care Medicine. Intensive Care Med 1996; 22:707–710.
9 DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 1988; 44:837–845.
10 Knaus WA, Wagner DP. Selection bias and the relationship between APACHE II and mortality. Crit Care Med 1990; 18:793–795.
11 Markgraf R, Deutschinoff G, Pientka L, Scholten T. Comparison of Acute Physiology and Chronic Health Evaluations II and III and simplified acute physiology score II: a prospective cohort study evaluating these methods to predict outcome in a German interdisciplinary intensive care unit. Crit Care Med 2000; 28:26–33.
12 Beck DH, Smith GB, Taylor BL. The impact of low-risk intensive care unit admissions on mortality probabilities by SAPS II, APACHE II and APACHE III. Anaesthesia 2002; 57:21–26.
13 Ledoux D, Canivet JL, Preiser JC, et al. SAPS3 admission score: an external validation in a general intensive care population. Intensive Care Med 2008; 34:1873–1877.
14 Sakr Y, Krauss C, Amaral AC, et al. Comparison of the performance of SAPS II, SAPS 3, APACHE II, and their customized prognostic models in a surgical intensive care unit. Br J Anaesth 2008; 101:798–803.
15 Metnitz B, Schaden E, Moreno R, et al, ASDI Study Group. Austrian validation and customization of the SAPS 3 Admission Score. Intensive Care Med 2009; 35:616–622.
16 Soares M, Salluh JI. Validation of the SAPS3 admission prognostic model in patients with cancer in need of intensive care. Intensive Care Med 2006; 32:1839–1844.
17 Khwannimit B, Geater A. A comparison of APACHE II and SAPS II scoring systems in predicting hospital mortality in Thai adult intensive care units. J Med Assoc Thai 2007; 90:643–652.
18 Ho KM, Lee KY, Williams T, et al. Comparison of Acute Physiology and Chronic Health Evaluation (APACHE) II score with organ failure scores to predict hospital mortality. Anaesthesia 2007; 62:466–473.