Resources in an intensive care unit (ICU), including beds, medical equipment and physicians' and nurses' time, are scarce. Differences in the use of resources are associated with a variety of factors, including the patients' severity of illness. Although the decision environment in an ICU is highly complex, an assessment of a patient's illness severity and immediate prognosis is an integral part of the initial evaluation of a critically ill patient and of the treatment decisions that the responsible physician subsequently makes. The prospect of successful treatment serves as a useful means of communication between physicians and patients, between physicians and family, and naturally among physicians themselves. Prognoses provide clinicians, researchers and administrators with useful information that can reduce the inherent uncertainty in the decision process. Continuing research, e.g. on the use of critical care resources [1-3], underlines the importance of accurate prognostic models for both clinical practice and research.
Over the last decades, scoring systems have been developed that allow a patient's chances of survival to be estimated. The development of these so-called objective models is closely watched, since the scientific community fears their naïve and improper use in practice [4]. Still, the environment in an ICU, characterized by both uncertainty and limited resources, requires physicians to decide on the level and range of treatment. Their decisions should be based on the most accurate models. In this context, studies discuss the idea of using objective scores as a means of decision support [5,6].
There is a vast literature dealing with scoring systems in the ICU. Its main purpose is to compare the mortality predicted by different models and to contrast it with actual mortality in one cohort of patients [7-9]. Sometimes prediction models are customized to specific settings and databases, e.g. patients with a prolonged length of stay in the ICU [10,11], or to specific diseases, e.g. sepsis [12]. In general, customization yields a heterogeneous picture of predictive accuracy. One strand of the literature compares physicians' prognoses with the most commonly employed objective models, the Acute Physiology and Chronic Health Evaluation (APACHE) II and III scores [13,14]. In these studies, physicians generally estimate the survival probability of patients more accurately than the scoring systems. A corresponding study for the Simplified Acute Physiology Score (SAPS) II model has not yet been performed.
The purpose of the present paper is to partly fill this gap. First, we study the performance of physicians relative to that of SAPS II, based on an ICU patient group at Göttingen. Second, we examine a predictive model that customizes the original SAPS II to the existing data using a modified procedure, which provides a stronger test of the relative performance of physicians when forecasting survival.
Data collection took place from mid-2000 to the end of 2001. During this period, all 504 consecutively admitted patients aged 16 yr and older in the two participating ICUs of Göttingen University Clinic were enrolled. Burn, acute coronary care and cardiac surgery patients, as well as patients with a length of stay in the ICU of <48 h, were excluded, leaving 412 patients for analysis. The patient information consists of a wide range of physiological parameters and the results of questionnaires completed by the physicians treating the patients. For each patient, three prognoses were obtained or determined using different methods. The quality of these prognoses was then examined.
As part of the project structure, the first estimate - made by the physician - was obtained directly in the ICU approximately 48 h after the patient's admission. The physicians were asked to quantify the probability that the patient would be discharged alive from the hospital. The study involved 14 physicians with different levels of expertise. Eight of them were interns, and the least experienced had graduated from medical school only within the last 2 yr. They gave a numerical estimate for 35 patients. Four of the physicians were critical care fellows who had completed at least 3 yr of training. They were specializing in the care of critically ill patients and completed 220 questionnaires. Two of the doctors were 'experts' (157 prognoses): both had more than 10 yr of experience in intensive care. Fifty-seven percent of all physicians were male. We are interested in the physicians' prognoses as a whole; thus, we do not control for any difference in accuracy due to the physicians' gender or level of expertise.
The data needed for the second prognosis, the SAPS II objective score, are routinely recorded in intensive care. SAPS II, developed in the early 1990s [16], is one of a series of scoring systems in intensive care medicine that allow a statement to be made on the severity of the illness and the likelihood that the patient will die. This mortality predictor is frequently used in Germany and has also been evaluated at the Göttingen University Clinic. General scores such as SAPS II cover the average intensive care patient but are not validated for specific groups, e.g. burn patients and cardiac surgery patients. It is worth recapitulating how SAPS II works in order to clarify how it differs from the third model. The score is based on selected parameters recorded within the first 24 h after the patient's admission to the ICU. It uses physical and biochemical variables (listed in Table 2) that were selected from a large number of physiological values during development of the model. During development and validation of the statistical model, these variables proved to be significant predictors of hospital mortality. Each variable is assigned points according to how far its value deviates from the normal range. The sum of the points over all variables gives the SAPS II score of a patient. The more points a patient has, the further his or her physiological state is out of balance and thus the greater the probability that he or she will die during hospital treatment. The developers of SAPS II have published the following formula based on logistic regression, which assigns a risk of hospital mortality Pr to each patient i on the basis of the calculated score:
$$\mathrm{logit}_i = -7.7631 + 0.0737 \cdot \mathrm{SAPS\,II}_i + 0.9971 \cdot \ln(\mathrm{SAPS\,II}_i + 1), \qquad \mathrm{Pr}_i = \frac{e^{\mathrm{logit}_i}}{1 + e^{\mathrm{logit}_i}}$$
According to this method, one member of our study team - an intern well experienced in SAPS II scoring - calculated the score for all patients. Missing physiological variables were assumed to be normal and assigned zero points. When a patient was sedated or mechanically ventilated, we recorded the estimated Glasgow Coma Score as if the patient were not sedated or ventilated.
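The conversion from a SAPS II score to a predicted hospital mortality can be sketched as follows. The coefficients are those of the logistic equation published by the SAPS II developers [16]; the function names, the item names and the point values in the example are hypothetical and serve only to illustrate the rule that missing variables receive zero points.

```python
import math


def total_score(points):
    """Sum the item points; a missing item (None) is treated as normal, i.e. 0 points."""
    return sum(p for p in points.values() if p is not None)


def saps2_mortality(score):
    """Predicted hospital mortality for a SAPS II score (published SAPS II equation)."""
    if score < 0:
        raise ValueError("SAPS II score must be non-negative")
    logit = -7.7631 + 0.0737 * score + 0.9971 * math.log(score + 1)
    return 1.0 / (1.0 + math.exp(-logit))


# Illustrative patient: item names and point values are made up for the example.
example_points = {"age": 12, "heart_rate": 4, "systolic_bp": None, "glasgow_coma_score": 7}
example_score = total_score(example_points)   # 23, the missing item adds nothing
example_risk = saps2_mortality(example_score)
print(example_score, round(example_risk, 3))
```

Because the logit increases with the score, a higher score always maps to a higher predicted mortality; a score of 40, for instance, corresponds to a risk of roughly 25%.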
In the third prognosis, we use the structure of SAPS II to evaluate the effect of customizing this model to the patient sample in Göttingen. Customizing means that the structure of SAPS II is unchanged but the coefficients of the model are re-estimated employing a bootstrap approach. A logistic regression model is set up for the third prediction, in which the binary dependent variable $Y$ can take only the values 0 (for 'survival') and 1 (for 'death'). $p(Y_i = 1) = p_i$ is the probability that individual $i$ dies and $p(Y_i = 0) = 1 - p_i$ is the complementary probability that the patient survives. The event $Y$, or the corresponding mortality probability, is assumed to depend on the SAPS II variables $X_1$-$X_{15}$ explained in Table 2 plus the variable squared age, which is not included in SAPS II. As the relationship between these variables and the probability $p_i$ is non-linear, the logistic distribution function represents a reasonable modelling approach:
$$p_i = \frac{1}{1 + e^{-Z_i}},$$
where $Z_i = \beta_0 + \beta_1 X_{1,i} + \beta_2 X_{2,i} + \dots + \beta_{16} X_{16,i}$.
The coefficients β are estimated via the Maximum Likelihood Principle.
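As an illustration of maximum likelihood estimation for such a logistic model, the following minimal sketch maximizes the log-likelihood by plain gradient ascent. It is not the routine used in the study (the analysis was run in statistical packages); the function names, the learning rate, the number of iterations and the one-variable toy data are all assumptions made for the example.

```python
import math


def sigmoid(z):
    """Logistic distribution function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(beta, x):
    """p_i = sigmoid(Z_i) with Z_i = beta_0 + sum_j beta_j * x_j."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))


def fit_logistic(X, y, lr=0.5, n_iter=2000):
    """Maximum likelihood fit by gradient ascent on the mean log-likelihood."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            resid = yi - predict(beta, xi)      # score contribution of patient i
            grad[0] += resid
            for j, v in enumerate(xi):
                grad[j + 1] += resid * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


# Toy data (hypothetical): one centred predictor, outcome 1 = death, 0 = survival.
X = [[-2.0], [-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5], [2.0]]
y = [0, 0, 1, 0, 0, 1, 1, 1]
beta = fit_logistic(X, y)
print([round(b, 3) for b in beta])
```

With an intercept in the model, the fitted probabilities at the maximum average to the observed mortality of the estimation sample, which is a quick check that the iteration has converged.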
In order to predict the probability of death among patients, the sample is divided into two subgroups of equal size by means of a random algorithm. The coefficients are estimated in the first group (Group A) and are then used to determine a prognosis for the patients in the complementary group (Group K). Figure 1 illustrates the procedure.
As the sample of n = 412 patients is relatively small, we employed a bootstrap [17]: by repeatedly drawing resamples (here, 2000 independent replications) from the original sample, this procedure is intended to yield a reliable prognosis. It is noteworthy that the purpose of this approach is not to validate the SAPS II model but to compare three models forecasting hospital mortality in a given patient sample.
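The split-sample procedure with repeated replications can be sketched as follows. The text does not specify how the replications are combined, so this sketch makes an assumption: each replication randomly halves the sample into an estimation group A and a prediction group K, and each patient's prognosis is the average of the out-of-sample predictions received whenever the patient falls into K. The function names are hypothetical, and the "model" in the example is a deliberately trivial placeholder (the mortality rate of group A) standing in for the logistic regression.

```python
import random


def split_sample_predictions(data, fit, predict, n_rep=2000, seed=0):
    """Average out-of-sample prediction per patient over repeated random half splits."""
    rng = random.Random(seed)
    n = len(data)
    sums = [0.0] * n
    counts = [0] * n
    idx = list(range(n))
    for _ in range(n_rep):
        rng.shuffle(idx)
        group_a, group_k = idx[: n // 2], idx[n // 2:]
        model = fit([data[i] for i in group_a])    # estimate on group A only
        for i in group_k:                          # predict for group K only
            sums[i] += predict(model, data[i])
            counts[i] += 1
    # A patient never drawn into K has no prediction (None).
    return [s / c if c else None for s, c in zip(sums, counts)]


# Placeholder model: predicted risk = observed mortality in the estimation group.
def fit_rate(sample):
    return sum(died for _, died in sample) / len(sample)


def predict_rate(model, patient):
    return model


# Toy data (hypothetical): (SAPS II score, died) pairs.
toy = [(20, 0), (25, 0), (31, 0), (38, 1), (42, 0), (47, 1), (55, 1), (60, 0), (66, 1), (71, 1)]
preds = split_sample_predictions(toy, fit_rate, predict_rate, n_rep=200)
print([round(p, 2) for p in preds])
```

Keeping estimation and prediction on disjoint groups means that no patient's own outcome enters the coefficients used for his or her prognosis.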
As indicated above, three probabilities are available for each individual in the sample, all of which predict the event that the patient will die, albeit using different methods. To assess performance in our cohort, we adopted the following two statistics. First, we examined the ability of the systems to discriminate between survivors and non-survivors on the basis of the estimated mortality. For this purpose, we used the receiver operating characteristic (ROC) curve. The area under this curve indicates the discriminating power of the models and their prognoses [18].
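The area under the ROC curve equals the probability that a randomly chosen non-survivor receives a higher predicted mortality than a randomly chosen survivor, with ties counted as one half. A minimal sketch of this pairwise calculation is given below; the function name and the four-patient example are hypothetical, and the confidence intervals reported in the study are not reproduced here.

```python
def roc_auc(died, risk):
    """Area under the ROC curve via pairwise comparison of deaths and survivors."""
    pos = [r for d, r in zip(died, risk) if d == 1]   # predicted risks of non-survivors
    neg = [r for d, r in zip(died, risk) if d == 0]   # predicted risks of survivors
    if not pos or not neg:
        raise ValueError("both outcomes must occur in the sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0      # correctly ordered pair
            elif p == q:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical): two survivors, two deaths.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

A value of 0.5 corresponds to a prognosis without discriminating power and a value of 1.0 to perfect separation of survivors and non-survivors.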
Second, the results were supported by an approximate Gauss test (a special case of the one-sample t-test) for differences at the 5% significance level. Here we evaluated how well the three prognoses were able to predict the actual event (Y = 1 or Y = 0). This test therefore focuses on the prognosis error, calculated as the squared difference between observed and expected mortality. Squaring the difference between realized and expected mortality gives more weight to outliers. For each pair of models, we tested whether the prognosis errors were equal on average or whether one model performed better. The LIMDEP and SPSS 9.0 statistical packages were used for data analysis and statistics.
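One way to carry out such a comparison is sketched below: the squared prognosis errors of two models are differenced patient by patient, and the mean difference is divided by its standard error to give an approximately standard normal statistic. The exact form of the test used in the study is not spelled out in the text, so this paired formulation, the two-sided p-value, the function names and the toy data are assumptions.

```python
import math
from statistics import NormalDist, mean, stdev


def squared_errors(y, p):
    """Prognosis error per patient: (observed outcome - predicted probability)^2."""
    return [(yi - pi) ** 2 for yi, pi in zip(y, p)]


def gauss_test(y, p_a, p_b):
    """Approximate Gauss test of H0: equal mean squared error of prognoses a and b.

    Returns (z, two-sided p-value); z < 0 means prognosis a has the smaller error.
    """
    d = [ea - eb for ea, eb in zip(squared_errors(y, p_a), squared_errors(y, p_b))]
    z = mean(d) / (stdev(d) / math.sqrt(len(d)))
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Toy data (hypothetical): outcomes and two sets of predicted mortality risks.
y = [1, 0, 0, 1, 0, 1, 0, 0]
p_a = [0.9, 0.1, 0.2, 0.8, 0.1, 0.7, 0.3, 0.2]
p_b = [0.6, 0.4, 0.5, 0.5, 0.3, 0.4, 0.6, 0.5]
z, p_value = gauss_test(y, p_a, p_b)
print(round(z, 2), p_value < 0.05)
```

The normal approximation relies on a reasonably large sample; with only a handful of patients, as in the toy data, the p-value is purely illustrative.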
The mean age of the 412 patients enrolled (59% males, 41% females) was 59.0 yr. There was a clear predominance of medical patients (e.g. pneumonia) and an overall hospital mortality of 17.7%.
The mean mortality predicted by the score amounted to 25.0%, whereas the mean hospital mortality risk assessed by the critical care physicians was 21.9% (Table 1).
Table 2 gives an overview of the results concerning the basic model, which explains the relationship between the SAPS II variables and the probability of death. The estimated β and the coefficient of determination reveal a robust econometric model. Some variables were no longer significant but were kept to maintain comparability with the original SAPS II model. Figure 2 shows the ROC curves for the three prognoses. The curve for the physicians' predictions is the closest to the ideal ROC curve. Below this are the two curves for SAPS II and the customized model. These two curves intersect. An analysis of the areas under the curves (AUC) indicates a figure of 0.84 (95% confidence interval (CI): 0.79-0.89) for the physicians' prognosis, 0.75 (95% CI: 0.69-0.80) for SAPS II and a value of 0.72 (95% CI: 0.66-0.78) for the customized model.
The customized Göttingen model results in an AUC that is very similar to that of the SAPS II model. All three prognoses contain significant positive information. When the prognosis errors are compared, the physicians' prognoses perform better than both objective models. The null hypothesis H0 of an equal prognosis error for the physicians and each of the objective models under review (or even a higher level of error for the physicians' prognoses) can be rejected. By contrast, the null hypothesis of forecast equivalence of the two objective models cannot be rejected. Thus, there is no evidence that the two models differ in predictive power on average. Regarding the symmetry of the curvature of the ROC curve, we did not detect any differences between the three models.
Estimates of hospital mortality provide meaningful information in many contexts, such as in discussions of patient prognosis by intensive care physicians. The basic objective of this work was to compare prognoses made by intensive care clinicians with two scoring systems: SAPS II and a model based on the SAPS II structure but customized to the Göttingen patients. The area under the ROC curve for SAPS II is below the range of >0.8 reported in the initial SAPS II study and in other more recent publications (e.g. that of Moreno and Morais [19]). The customized Göttingen model does not perform better, which confirms the robustness of the SAPS II model. In accordance with the classification taken from previous publications, the discriminating power of the objective models can be considered adequate. By comparison, the physicians' prognoses are good. Our results confirm the research previously undertaken in this field: forecasts made by physicians [13,15] are superior to objective prognosis models. An exception is a recent study [21], which finds that the SUPPORT model, pure or extended, is as accurate as physicians' forecasts.
If one assumes that physicians, when estimating the hospital mortality of a patient, include precisely those factors that are incorporated in the objective models, one might expect physicians' prognoses and those produced by statistical models to be equally good. However, it appears that physicians have more extensive knowledge and other kinds of information available to them when they form their opinions. More extensive knowledge may originate from additional physiological parameters that are also recorded in the ICU. For instance, co-morbidity or chronic illness goes beyond the 15 parameters used for the objective models and provides the physician with further information about the patient's state of health. Moreover, the way in which the patient has responded to treatment is also significant. In this respect, the fact that the SAPS II prognosis preceded the physicians' forecast (24 vs. 48 h after admission) appears to be important in explaining the difference. Having 48 h rather than just the first 24 h to make their predictions, physicians are able to modify their estimate in the light of how the patient responds to therapy. In addition, professional experience and intuition allow the physician, in contrast to statistical models, to form an individual picture of each patient and thus give a more accurate prognosis. A final aspect is noteworthy. It is the physicians who decide on access to and the range of treatment for ICU patients. Through their actions they affect the outcome, giving scope for a (limited) self-fulfilling prophecy. Interestingly, the outlay for therapy tends to be increased if it appears that the prognosis is not likely to be met ([1,22] as well as Teres and colleagues [3], who find this in a subgroup of intensive care patients, namely sepsis patients). However, our study did not record the outlay for treatment but focused on the accuracy of prognosis only.
This study was supported by the Deutsche Forschungsgemeinschaft (DFG). We are indebted to Dipl. Vw. Andreas Werblow, Dipl. Inf. Stefan Behrens, Dr Dirk Schürgers and two anonymous referees for their helpful comments.
1. Detsky AS, Stricker SC, Mulley AG, Thibault GE. Prognosis, survival, and the expenditure of hospital resources for patients in an intensive-care unit. New Engl J Med
2. Wong DT, Gomez M, McGuire GP, Kavanagh B. Utilization of intensive care unit days in a Canadian medical-surgical intensive care. Crit Care Med
3. Teres D, Rapoport J, Lemeshow S, Kim S, Akhras K. Effects of severity of illness on resource use by survivors and nonsurvivors of severe sepsis at intensive care unit admission. Crit Care Med
4. Moreno R. Severity of illness. In: Sibbald WJ, Bion JF, eds. Evaluating Critical Care. Berlin, Germany: Springer, 2001: 51-68.
5. Karfonta T. Decision support systems in the intensive care unit: nurses' and physicians' experiences. PhD thesis, University of Wisconsin-Milwaukee, USA, 1999.
6. Schuster D. Predicting outcome after ICU admission. Chest
7. McNelis J, Marini C, Kalimi R. A comparison of predictive outcomes of APACHE II and SAPS II in a surgical intensive care unit. Am J Med Qual
8. Castella X, Artigas A, Bion J, Kari A. A comparison of severity of illness scoring systems for intensive care unit patients. Results of a multicenter, multinational study. Crit Care Med
9. Moreno R, Reis Miranda D, Fidler V. Evaluation of two outcome prediction models on an independent database. Crit Care Med
10. Timsit JF, Fosse JP, Troché G, et al. Accuracy of a composite score using daily SAPS II scores for predicting hospital mortality in ICU patients hospitalized for more than 72 h. Intensive Care Med
11. Suistomaa M, Niskanen M, Kari A, Hynynen M, Takala J. Customized prediction models based on APACHE II and SAPS II scores in patients with prolonged length of stay in the ICU. Intensive Care Med
12. Le Gall J, Lemeshow S, Leleu G, et al. Customized probability models for early severe sepsis in adult intensive care patients. Intensive Care Unit Scoring Group. JAMA
13. Katzman McClish D, Powell S. How well can physicians estimate mortality in a medical intensive care unit? Med Decis Making
14. Schuster HP, Hesse M, Tröster S. Prognoseeinschätzung durch Ärzte in der Intensivmedizin. Intensivmed
15. Marks RJ, Simons RS, Blizzard RA, Browne DR. Predicting outcome in intensive care units - a comparison of APACHE II with subjective assessments. Intensive Care Med
16. LeGall JR, Lemeshow S, Saulnier F. A new simplified acute physiology score (SAPS II) based on a European/North American multicenter study. JAMA
17. Efron B, Tibshirani R. An Introduction to the Bootstrap. New York, USA: Chapman & Hall, 1993.
18. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology
19. Moreno R, Morais P. Outcome prediction in intensive care: results of a prospective, multicenter, Portuguese study. Intensive Care Med
20. Murphy-Filkins RL, Teres D, Lemeshow S, Hosmer DW. Effect of changing patient mix on the performance of an intensive care unit severity of illness model: how to distinguish a general from a special intensive care unit. Crit Care Med
21. SUPPORT Group. The SUPPORT prognostic model objective estimates of survival for seriously ill hospitalized adults. Ann Intern Med
22. Perkins HS, Jonsen AR, Epstein WV. Providers as predictors: using outcome predictions in intensive care. Crit Care Med