
50th Anniversary Article

Severity of Illness and Predictive Models in Society of Critical Care Medicine’s First 50 Years: A Tale of Concord and Conflict

Kramer, Andrew A. PhD, FCCM1; Zimmerman, Jack E. MD, FCCM2; Knaus, William A. MD3

doi: 10.1097/CCM.0000000000004924

When the Society of Critical Care Medicine (SCCM) was established in 1970, it was recognized that the volume and complexity of data from admissions to acute care hospitals exceeded the human capability to process them. This was especially apparent in the ICU, where measurements are more frequent than in other venues within the hospital. The high cost of critical care, coupled with the high frequency of adverse events and mortality, drove the desire to improve performance within the ICU (1,2). However, the lack of any objective way to measure and compare outcomes across units underscored the need for metrics that transcended subjective assessment.

It was intuitive that case-mix influenced care and patient outcome. Patients seen at a university hospital in an urban setting are far more likely to be at high risk of mortality than those treated at a suburban for-profit hospital (3). Indeed, the transfer of extremely high-risk patients from ICUs in rural hospitals to ICUs in academic centers was accepted practice (2). Therefore, fair evaluation of mortality rates required a system that took patient acuity into account.

The attempts to measure patient acuity around the time of SCCM’s birth relied on manual entry of data, by either nurses or physicians. This required extra nontreatment time by practitioners and additional cost to ICUs. Furthermore, computer assessment of “large” numbers of patients, at the time several hundred admissions, was slow and susceptible to entry errors.

Today, data on hundreds of thousands of patients are collected electronically and processed by high-speed computers running sophisticated data mining algorithms (4). Correspondingly, there has been a push to move beyond comparisons of unit performance to creating predictions that help drive care at the bedside in almost real time.

The growth from hand-calculated acuity assessments to blink-of-an-eye analytics in the ICU over the past 50 years has not been linear. In fact, there have been numerous fits and starts, roadblocks, and fights over commercialization. At times, the push for a particular acuity assessment was more melodrama than scientific debate. As such, the history and lessons of the previous 50 years of critical care research into severity scores, outcome prediction, benchmarking, and clinical decision support systems (CDSS) may provide insight into what the next 50 years will yield. This article recounts the struggles and triumphs experienced in critical care analytics, with an eye toward suggesting what the next 50 years might hold in store.

THERAPEUTIC INTERVENTION SCORING SYSTEM: THE PIONEER

Given the substantial costs involved in providing critical care, it is not surprising that the first scoring system focused on defining and tracking the costly and complicated interventions used in the ICU. The first such set of measures was the Therapeutic Intervention Scoring System (TISS), described in Critical Care Medicine in 1974 (5). Treatments received a value based on the expected amount of clinical resources required, and these values were summed to estimate severity and workload. TISS soon became a useful way to measure the therapeutic burden of patients and was especially useful for nurse staffing decisions, as it measured the intensity of care (6). Outcome assessment using TISS scores also suggested that the more therapy patients received, the higher their death rate. However, there was wide variation in how the specific therapies incorporated in TISS were used across ICUs, making it unsuitable for evaluating severity and outcomes.
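
A minimal Python sketch of TISS-style scoring appears below; the intervention names and point values are invented for illustration and are not the published TISS weights.

```python
# TISS-style scoring: each intervention carries a fixed point value
# reflecting its expected resource intensity, and a patient's score is
# the sum over the interventions received. Values here are invented.
TISS_POINTS = {"mechanical_ventilation": 4, "vasoactive_infusion": 4,
               "arterial_line": 3, "urinary_catheter": 1}

def tiss_score(interventions):
    return sum(TISS_POINTS.get(i, 0) for i in interventions)

print(tiss_score(["mechanical_ventilation", "arterial_line"]))  # 7
```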

THE IMPORTANCE OF PHYSIOLOGY—APACHE

Many of the therapeutic measures in TISS were designed to correct acute physiologic abnormalities. In the early years of intensive care, attempting to achieve what Claude Bernard had first termed “homeostasis” (7) was a major clinical objective. It then followed that quantifying the degree of physiologic derangement, the deviation from homeostasis, formed the basis of the first iteration of a patient-oriented severity scoring system: APACHE, or Acute Physiology, Age, and Chronic Health Evaluation (8). APACHE’s design principle was to use physiologic derangement as the central driving force behind assessing a patient’s severity of illness. Age and chronic disease were added to reflect “physiologic reserve.” This first iteration of APACHE reported on a sample of 582 patients from a university hospital and a community hospital, and showed a direct relationship between increasing score and patient vital status at hospital discharge.
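
To make the design principle concrete, here is a hedged Python sketch of derangement-based scoring: each physiologic value earns points according to how far it deviates from a normal band, and age and chronic health points are added to reflect physiologic reserve. All bands and weights below are invented for illustration; they are not the published APACHE coefficients.

```python
# Illustrative APACHE-style severity score: points rise with the distance
# of each physiologic value from its normal range; age and chronic-health
# points approximate "physiologic reserve". Bands/weights are invented.

# (lower_bound, upper_bound, points) bands per variable
BANDS = {
    "heart_rate": [(70, 109, 0), (110, 139, 2), (140, 179, 3), (180, 999, 4),
                   (55, 69, 2), (40, 54, 3), (0, 39, 4)],
    "mean_bp":    [(70, 109, 0), (110, 129, 2), (130, 159, 3), (160, 999, 4),
                   (50, 69, 2), (0, 49, 4)],
}

def physiology_points(var: str, value: float) -> int:
    """Return the points for the band containing `value`."""
    for low, high, pts in BANDS[var]:
        if low <= value <= high:
            return pts
    return 0

def apache_like_score(worst_values: dict, age: int, chronic_points: int) -> int:
    aps = sum(physiology_points(v, x) for v, x in worst_values.items())
    age_points = 0 if age < 45 else min((age - 45) // 10 + 2, 6)  # invented
    return aps + age_points + chronic_points

print(apache_like_score({"heart_rate": 142, "mean_bp": 62}, age=71, chronic_points=2))
```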

Initial reaction to APACHE by the clinical community was encouraging. There had been other attempts at “case-mix adjustment,” but these relied on administrative data from hospitals’ billing systems (9). APACHE was the first to use clinical measures as the basis for comparing the utilization and outcomes of various ICUs.

The next milestone was the publication of APACHE II in 1985 (10). This new version simplified the scoring system, reducing the number of physiologic measurements from 33 to 12, and more precisely represented the influence of disease and the extent of physiologic derangement on prognosis. APACHE II was very successful and spread rapidly through the clinical community. A major reason behind APACHE II’s acceptance was that it was freely available, concise, and easy to use. The APACHE II article would become a citation classic in critical care, at last count being cited over 10,000 times.

By capturing APACHE II information within a multiinstitutional database, it was possible to compare “severity adjusted” outcomes among ICUs in various hospitals for the first time. This capability enabled a study reported on in 1986 in which APACHE II predictions were obtained for a cohort of 5,030 patients treated at 13 U.S. hospitals (11). In that study, one hospital had significantly fewer deaths than predicted by APACHE II, and another had substantially more deaths based on their APACHE II-adjusted case mix. From this result, it was suggested that the variation in outcome might be ascribable to significant variations in how well care was coordinated in the ICU, specifically the degree of communication, coordination, and quality control by the clinical staff.
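
The arithmetic behind such a comparison is simple: sum a hospital’s predicted death probabilities to obtain its expected deaths, then divide observed deaths by that expectation. A minimal Python sketch, with fabricated admissions and hospital labels:

```python
# Severity-adjusted comparison via the standardized mortality ratio (SMR):
# observed deaths divided by the sum of model-predicted death probabilities.
from collections import defaultdict

# (hospital, predicted_probability_of_death, died) per admission; fabricated
admissions = [
    ("A", 0.10, 0), ("A", 0.40, 0), ("A", 0.80, 1),
    ("B", 0.15, 1), ("B", 0.35, 1), ("B", 0.70, 1),
]

obs = defaultdict(int)
exp = defaultdict(float)
for hosp, p, died in admissions:
    obs[hosp] += died
    exp[hosp] += p

for hosp in sorted(obs):
    smr = obs[hosp] / exp[hosp]
    flag = "fewer deaths than predicted" if smr < 1 else "more deaths than predicted"
    print(f"Hospital {hosp}: SMR = {smr:.2f} ({flag})")
```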

APACHE III: A MORE ROBUST SYSTEM FOR PREDICTING OUTCOMES IN THE ICU

The early success and enthusiasm for improving APACHE, along with its worldwide use, improved computer capability, and the promise of a capability to collect data electronically, subsequently led to a major upgrade of the APACHE system (12). This represented an exhaustive effort to evaluate and improve all components of the system, including the weighting and interaction of physiologic variables, the interaction of physiology with different diseases, and a new way to represent the incremental contribution of all variables. The result was an improvement in overall accuracy of prediction, as seen by an increase in the area under the receiver operating characteristic curve (AU-ROC) (13) from 0.86 to 0.90.
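
The AU-ROC has an intuitive interpretation (13): it is the probability that a randomly chosen patient who died received a higher predicted risk than a randomly chosen survivor. A minimal rank-based Python computation on fabricated predictions:

```python
# AU-ROC as the Mann-Whitney probability that a death outranks a survivor.
from itertools import product

def au_roc(preds_died, preds_survived):
    pairs = list(product(preds_died, preds_survived))
    wins = sum(1.0 if d > s else 0.5 if d == s else 0.0 for d, s in pairs)
    return wins / len(pairs)

print(au_roc([0.9, 0.7, 0.6], [0.8, 0.3, 0.2, 0.1]))  # 0.833
```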

It became apparent to APACHE’s creators that there were important variations in how ICUs were managed. For some hospitals, there were many patients admitted to ICUs who were not severely ill and did not require life-sustaining interventions. These individuals were labeled “low-risk monitor” patients, many of whom were admitted for monitoring and nursing care immediately after complex surgery (14). This was the first new observation to result from the use of APACHE and it encouraged further efforts to scrutinize ICU utilization and outcomes with the use of an objective physiologic measure.

Because APACHE III was so robust in its data collection and interpretation, predictions of outcomes beyond hospital mortality became possible. New outcomes included ICU mortality, a patient’s anticipated ICU length of stay, duration of mechanical ventilation, identification of low-risk monitor patients, along with daily predictions of mortality for individual patients (14,15). To accommodate APACHE III’s complexity, it was embedded within an electronic clinical information and management system using the funds raised by APACHE Medical Systems (AMSI), a company formed to support APACHE’s development and dissemination. The resultant APACHE III information system was designed to capture the majority of variables and connect them to a large number of predictive algorithms through an extensively designed graphical user interface. There was enormous gratification in unveiling a product that enabled clinicians to communicate and compare their patients’ severity level with a recognized, transparent worldwide standard (16).

Included in APACHE III’s development was the recognition that patients undergoing coronary artery bypass graft (CABG) surgery experienced care and outcomes quite different from other critically ill patients. Specifically, mortality was much lower, ICU stay was much shorter for CABG patients, and virtually all were receiving mechanical ventilation. This led to the creation of APACHE predictions exclusively for CABG patients (17).

With the publication of APACHE III and release of the APACHE information system came controversy and disappointment. The controversy concerned the failure of the APACHE III publication to disclose the predictive equation’s coefficients in order to protect the intellectual property of AMSI. The failure to place the core equations of APACHE III in the public domain in large part explains the continuing use of APACHE II today, despite its enormous overprediction of hospital mortality in contemporary populations.

The nondisclosure of the full equations resulted in a hailstorm of criticism. As a result of AMSI’s nondisclosure of APACHE III’s hospital mortality equation, the SCCM sought to compete with AMSI by creating Project IMPACT (18). This ended up dividing a small professional community into competing camps.

The major disappointment was that the promised interconnectivity of the APACHE information system with large electronic medical record (EMR) system vendors did not materialize, even when AMSI was later bought out by Cerner Corporation. In part, this came about because of the resistance of EMR vendors to supporting any sort of data-sharing protocol. Although the APACHE information system was installed in almost 100 hospitals nationwide, the vast majority of the data were still being manually entered. This data burden created an insurmountable roadblock to wider dissemination.

APACHE III was superseded by Versions III-i and III-j, but then stayed dormant for several years. This led to overprediction of mortality by equations developed using older patient data, which has been called “model fade” (19) and is a result of improved disease-specific therapy over time (20). Cerner APACHE tested the accuracy of APACHE III-j using data from admissions to ICUs in 2002 and 2003, and found that the overall standardized mortality ratio (SMR) was far below the desired 1.00. This meant that most ICUs were given an overly optimistic assessment of their performance, that is, “grade inflation.” As a result, a decision was made to update APACHE with more contemporary data, as well as to upgrade certain aspects of the analytics.

APACHE IV: A BRAND NEW SET OF CLOTHES

Cerner APACHE used 110,558 admissions to ICUs from 2002 to 2003 to develop and validate models for predicting hospital mortality and measures of ICU resource use. Each model included new predictive variables and used refined statistical methods (21). APACHE IV employed the same physiologic variables and weights as APACHE III, but was even more complex (142 variables), mainly due to expansion in the number of ICU admission disease groups (from 94 to 116), new predictor variables, and differences in variable measurement. It turned out that physiology and ICU admission diagnosis together accounted for over 80% of the model’s predictive power; comorbidities and age were not nearly as influential. The APACHE IV models for predicting hospital mortality (21) and ICU length of stay (22) were placed in the public domain and made available at a website that provided user instructions and a calculator.
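
The general structure of such a model, a logistic regression in which admission diagnosis enters as categorical indicator variables alongside a physiology score, can be sketched as below. This mirrors only the form, not the actual APACHE IV variables or coefficients; the data and diagnosis labels are fabricated, and scikit-learn is an assumed tooling choice.

```python
# Logistic regression with one-hot diagnosis groups plus a physiology score.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({
    "diagnosis": ["sepsis", "CABG", "trauma", "sepsis", "CABG", "trauma"] * 20,
    "aps":       np.random.default_rng(0).integers(5, 120, 120),  # acute physiology score
    "age":       np.random.default_rng(1).integers(18, 90, 120),
})
# fabricated outcome loosely tied to the physiology score
died = (df["aps"] + np.random.default_rng(2).normal(0, 20, 120) > 80).astype(int)

model = Pipeline([
    ("prep", ColumnTransformer([("dx", OneHotEncoder(), ["diagnosis"])],
                               remainder="passthrough")),
    ("lr", LogisticRegression(max_iter=1000)),
])
model.fit(df, died)
print(model.predict_proba(df.iloc[:3])[:, 1])  # predicted death risks
```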

These models’ complexity came with a cost. The Joint Commission on Accreditation of Healthcare Organizations (JCAHO) initially considered incorporating APACHE IV’s hospital mortality and ICU length of stay predictive models into their set of ICU core measures. However, the implementation effort was judged too daunting and JCAHO abandoned their consideration of APACHE for their core measures.

APACHE IV models also provided updated equations that predicted hospital length of stay (23), risk for active life-supporting therapy among patients who receive only monitoring on ICU day 1 (24), as well as duration of mechanical ventilation in integer days (25). Similar to APACHE III, APACHE IV included a separate set of predictive models for patients undergoing CABG surgery (26). Analyses using the above models focused on patient groups rather than individuals. The accuracy of the 77 APACHE IV models (developed and validated using a 2002–2003 patient database) was assessed every 3–5 years using contemporary data. In 2010, each APACHE IV equation was recalibrated using an external patient database with admissions to adult ICUs from 2006 to 2008 (APACHE IVa).

It is instructive at this point to review the commercialization of predictive models for critical care, as this had a profound impact on intensive care analytics during the last 30 years.

COMMERCIALIZATION OF CRITICAL CARE ANALYTICS AND THE SUBSEQUENT FALLOUT

The development and utilization of predictive models for ICU patients were largely supported by governments in the United Kingdom, France, the Netherlands, Australia, New Zealand, and Finland (27). In the United States, except for a Veterans Affairs model that incredibly did not include vital signs (28), these efforts were initially supported by foundation and government grants. APACHE III broke that mold by becoming the first ICU predictive modeling system to be developed and marketed commercially.

Why was APACHE III developed commercially? First, AMSI was overwhelmed by requests for assistance by users of APACHE II. Second, in response to the shortcomings of APACHE II, the developers of APACHE understood the need to collect patient data to develop and validate an improved model (APACHE III). Third, government and foundation grant funds were available to study the relationship between the ICU management and outcomes, but these funding sources did not support patient data collection.

Explicit academic-commercial intellectual property arrangements for technology transfer were rare for critical care in the late 1980s even though they were common in the basic sciences and in the pharmaceutical and medical device industries. This gap led to misunderstanding and outright hostility to AMSI. Despite the high cost to develop APACHE, some intensivists demanded that its software and support services be provided free or at a nominal charge. There ensued heated discussions at scientific meetings and in print, in which the developers of APACHE received ad hominem attacks.

In 1996, the SCCM countered with a program called “Project IMPACT.” The program’s main focus involved using a public domain predictive model for benchmarking ICU performance, in this case the Mortality Probability Model (MPM) (29) Version II (30). The MPM0 was developed in 1985; the model greatly simplified manual data collection for mortality prediction at ICU admission (29). In 1993, the model was updated (MPM II) using U.S. and European data and later expanded to provide mortality predictions at 24 and 48 hours (30). Also included was a prediction for weighted hospital days (a surrogate for ICU stay). Project IMPACT gained respectable acceptance across ICUs, which enabled the comparison of ICU performance via quarterly reports sent to all participants.

Another useful achievement of Project IMPACT was the development of the Rapoport-Teres chart, an innovative two-dimensional graph based on SMRs and the difference between observed and predicted weighted hospital days (31). This chart allowed ICUs to be compared on these two metrics simultaneously.
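
A simplified Python rendering of the idea appears below; the ICU names and coordinates are fabricated, and the published chart’s exact axis construction is not reproduced.

```python
# Rapoport-Teres-style chart: each ICU plotted by SMR (survival performance)
# against observed minus predicted weighted hospital days (resource use).
import matplotlib.pyplot as plt

icus = {"ICU 1": (0.85, -1.2), "ICU 2": (1.10, 0.8), "ICU 3": (0.95, 2.1)}

for name, (smr, excess_days) in icus.items():
    plt.scatter(excess_days, smr)
    plt.annotate(name, (excess_days, smr))

plt.axhline(1.0, linestyle="--")   # mortality as predicted
plt.axvline(0.0, linestyle="--")   # resource use as predicted
plt.xlabel("Observed - predicted weighted hospital days")
plt.ylabel("Standardized mortality ratio")
plt.title("Rapoport-Teres-style performance chart (fabricated data)")
plt.show()
```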

Similar to AMSI’s experiences with APACHE, SCCM found that building the infrastructure for data acquisition, user assistance, analysis, and performance reports required commercial support. Subsequently, a commercial entity, Tri-Analytics, took over operation of Project IMPACT. Neither AMSI nor Tri-Analytics was a commercial success. The failure of these companies was probably caused by a number of factors: skepticism about ICU benchmarking; competition for funds to purchase a predictive system versus other technologies such as CT and MRI; the difficulty in showing a financial return on investment for the hospital; and the requirement for an APACHE/MPM coordinator, since few hospitals had automation capabilities.

Cerner Corporation acquired AMSI and the APACHE III models in 2000, and in 2004 Project IMPACT and the MPM-II models. The AMSI purchase included a contract with VISICU, which provided day-1 APACHE equations and subsequent updates. A few years later, VISICU was acquired by Philips, which embedded VISICU analytics into its eICU remote telemonitoring solution (32). Thus, two major EMR vendors, Cerner and Philips, had almost 100% ownership of the ICU analytics and software market.

In 2007, a third version (MPM0-III) was published (33,34). The most important difference between MPM0-III and APACHE was that it had 16 variables collected within the first hour after ICU admission, compared with 26 APACHE IV variables plus selection among 116 diagnoses collected during the first 24 hours after ICU admission. As a result, data collection burden was lowest for MPM0-III and highest for APACHE IV. MPM was eventually phased out by Cerner, leaving APACHE as the sole severity-adjustment tool for critical care in the United States. In 2017, APACHE IVa was updated with a simple recalibration called “APACHE IVb.” Since then, Cerner has purportedly phased out APACHE as a dedicated ICU tool (personal communication), focusing instead on a solution applicable across all venues within an acute care hospital.

Medical Decision Network, LLC, an innovation-driven healthcare IT company, created an APACHE-like system called “ACUITY 2016” (35) and its successor, ACUITY 2019. The latter removed elements of APACHE that needed to be manually entered, such as reason for ICU admission, and replaced them with elements, such as diagnosis-related groups, that could be captured electronically.

Philips’ large database of ICU admissions has been in part made publicly available for scientific research (36). This is an ongoing effort, with cooperation from researchers at the Massachusetts Institute of Technology. Another publicly available ICU database is the Medical Information Mart for Intensive Care (MIMIC). The initial publicly available iteration, MIMIC-II, comprised approximately 8,000 patients seen at the Beth Israel Deaconess Medical Center in 2008 (37). In addition to the usual clinical and physiologic variables found in most ICU databases, MIMIC also included information streamed from EKGs. MIMIC-II was an excellent “sandbox” for trying out new predictive algorithms. Unfortunately, its small population from one hospital made it impractical for creating a model for benchmarking purposes. With these limitations in mind, MIMIC-III was introduced (38,39). It contains data on over 40,000 patients seen at the Beth Israel Deaconess Medical Center between 2001 and 2012. Being drawn from a single center still limits this data source for creating benchmarking and reporting tools. However, its rich set of variables, including streaming vital signs, makes it a good candidate for early-stage model development.

APACHE’S SIBLINGS

Several predictive models that are derivatives of APACHE have been developed for the ICU. The U.S. Veterans Affairs model (28), the U.K.’s Intensive Care National Audit and Research Center (ICNARC) model (40), and an attenuated version of APACHE called “OASIS,” based on machine learning techniques (41), are three examples. Each of these models provided hospital mortality predictions using ICU admissions from the early 2000s and has not been updated in the last decade.

The continental European response to APACHE was the Simplified Acute Physiology Score (SAPS) (42), a close variant of APACHE. SAPS 2 was developed in 1993 (43) using 13,152 patients from 137 ICUs in 12 countries. Its successor, SAPS 3 (44), was an ambitious approach to develop a worldwide hospital mortality model for the ICU. SAPS 3 included patients from 307 adult ICUs in 35 countries, but data collection took place over a mere 2 months, which resulted in a cohort of just 19,577 admissions for analysis: a mean of 559 patients per country, or 64 per ICU. These numbers are insufficient for benchmarking across countries and ICUs.

Given SAPS 3’s questionable suitability for benchmarking, it is fair to ask whether APACHE can be used outside of the United States. Several countries have tried to answer that, most notably the Netherlands, Australia, and New Zealand. The Netherlands’ National Intensive Care Evaluation (NICE) made the wise decision to recalibrate APACHE IV to its own population (45), whereas the Australian and New Zealand Intensive Care Society (ANZICS) announced a variation of APACHE III-j called the “Australian and New Zealand Risk of Death” model (46), which added substantially more nonphysiologic data elements. NICE and ANZICS have each built a substantial patient database and have published extensively in the critical care literature. It appears that customization of APACHE to individual countries and their differing healthcare systems makes it suitable for use outside of the United States. Other countries have constructed national ICU databases, such as Sweden’s Intensive Care Registry (47); that entity uses SAPS 3 as its risk-adjustment model.

A COMPARISON OF THE MAJOR MORTALITY MODELS

There are many similarities among the above predictive models (48). Each model includes a prediction of hospital mortality based on logistic regression. The SAPS 3, ICNARC, and Veterans Affairs models are closely based on APACHE methodology. As such, they were replicates of APACHE, not replacements. Table 1 shows the major hospital mortality models, the year in which they were published, the number of admissions in the cohort, the number of diagnostic groups in the model, and the number of nondiagnostic variables (excluding the individual components of the severity of illness score). One major difference not shown in Table 1 is that all of the models except MPM collect data over a patient’s first day in the ICU. MPM uses information taken at 1 hour after admission.

TABLE 1. Components of the Major Hospital Mortality Models That Have Been Used in the ICU

Model           APACHE II  APACHE III  APACHE IV  ICNARC   MPM-II   MPM-III  SAPS 2  SAPS 3
Published       1985       1991        2006       2007     1993     2007     1993    2005
Admissions      5,815      17,440      110,558    216,626  19,124   124,855  13,152  16,784
Dx Groups (a)   50         78          115        101      0        5        0       13
Other Vars (b)  4          9           27         11       15       19       17      7 (1)(c)
Number of equations, as published: 1, 77, 1, 2, 2, 1 (column alignment not recoverable in this copy; the text above notes 77 equations for APACHE IV)

APACHE = Acute Physiology, Age, and Chronic Health Evaluation; ICNARC = Intensive Care National Audit and Research Center; MPM = Mortality Probability Model; SAPS = Simplified Acute Physiology Score.
(a) Dx Groups = number of diagnostic groups.
(b) Other Vars = number of variables other than diagnostic group and components of the severity score.
(c) Seven variables were given weights along with 10 physiologic variables and summed into one score.

The number of admissions used for model development has increased over time. However, the increase in sample size has not resulted in a dramatically increased AU-ROC (Fig. 1). All of the major hospital mortality models exhibited good discrimination. The real gain in using a larger number of patients comes from the ability to calibrate within more patient subgroups. APACHE has consistently employed a large number of diagnostic groups as well as additional variables, making it the preferred candidate for case-mix adjustment.
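
To see why larger samples matter, consider checking calibration within diagnostic subgroups by comparing mean predicted mortality with observed mortality in each group; only large databases provide enough patients per subgroup for this comparison to be meaningful. A minimal Python sketch with fabricated records:

```python
# Subgroup calibration: within each diagnostic group, compare the mean
# predicted mortality with the observed mortality rate.
from collections import defaultdict

records = [  # (diagnosis, predicted_risk, died); fabricated
    ("sepsis", 0.30, 1), ("sepsis", 0.25, 0), ("sepsis", 0.35, 0),
    ("CABG",   0.04, 0), ("CABG",   0.06, 0), ("CABG",   0.05, 1),
]

groups = defaultdict(list)
for dx, p, died in records:
    groups[dx].append((p, died))

for dx, rows in groups.items():
    mean_pred = sum(p for p, _ in rows) / len(rows)
    obs_rate = sum(d for _, d in rows) / len(rows)
    print(f"{dx}: predicted {mean_pred:.2f} vs observed {obs_rate:.2f}")
```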

Figure 1. Discrimination of the major hospital mortality models. APACHE = Acute Physiology, Age, and Chronic Health Evaluation; AU-ROC = area under the receiver operating characteristic curve; ICNARC = Intensive Care National Audit and Research Center; MPM = Mortality Probability Model; SAPS = Simplified Acute Physiology Score.

Determining whether the “best” prognostic model should be more or less complex requires “head-to-head” prospective validation in an independent patient population. This is because prognostic models are overly specific to the populations used in their development (40,49,50). Identifying the “best” model also requires assessment of model accuracy across patient populations, diagnostic subgroups, and among ICUs with large enough numbers of patients to provide meaningful CIs (51). The effectiveness of mortality prediction across ICUs also depends on the extent of patient coverage (exclusion criteria), which varies markedly across models (52). Comparisons across patients and ICUs are unusual because of the cost and complexity involved (53).

In 2014, the results of a head-to-head comparison of the patient-level accuracy of mortality predictions from APACHE IVa, MPM0-III, and the National Quality Forum’s (NQF) extension of MPM0-III, among 174,001 ICU admissions during 2008–2012, were reported (50). APACHE IVa excluded fewer patients than the MPM-III and NQF models and had the best discrimination, calibration, and overall accuracy at the patient level. A second head-to-head comparison focused on ICU-level performance based on APACHE IVa and NQF predicted and observed mortality at 47 ICUs using January 2008 to May 2013 data (54). There were discrepancies between the APACHE IVa and NQF-based ICU performance assessments. Performance was concordant in only 45% of the ICUs. Four ICUs had superior performance based on APACHE IVa predictions, but inferior performance using NQF; two ICUs had inferior performance based on NQF, but average performance based on APACHE IVa.

Differences in model performance at the patient and ICU levels appeared to be related to differences in how well the APACHE IV and NQF models adjust for case-mix. Two case-mix–related measures, ICU admission diagnosis (116 diagnoses for APACHE IV, three for NQF) and measurement of physiologic abnormalities, had the greatest influence on predictive accuracy. In addition, 27.9% of admissions were eliminated by NQF, whereas APACHE IV eliminated 10.6% of admissions. This is important because a prognostic model’s exclusion criteria can alter crude mortality for individual ICUs by as much as 15% (52).

LESSONS LEARNED ABOUT SEVERITY AND PROGNOSTIC SYSTEMS

The competitive push for prominence among predictive models, while sometimes bruising, did shed light on several aspects of ICU analytics. First, there is nothing simple about assessing severity and prognosis. Models with a greater number of predictor variables and fewer exclusions were associated with improved accuracy. However, complexity comes with costs, notably increased data burden and the need for timely revalidation (and, if necessary, recalibration). The need for revalidation is most often due to a change in the frequency of mortality over time. An illustration of this phenomenon is presented in Figure 2. Admissions to a commercial ICU database during 2016 (data courtesy of Medical Decision Network, Charlottesville, VA) had hospital mortality predictions generated from three predictive models developed in different eras: APACHE III in 1988–1989 (12), APACHE IV in 2002–2003 (21), and ACUITY 2016 (35) in 2012–2015. The graph shows that the further back in time a model was developed, the greater the overprediction of hospital mortality, leading to decreasing standardized mortality ratios.

Figure 2. Standardized mortality ratio from applying hospital mortality predictions from different eras to a single data set of ICU admissions. APACHE = Acute Physiology, Age, and Chronic Health Evaluation.

A second lesson learned was that a model developed in one country should not be used in another country without alteration. International differences in healthcare systems and ICU populations require that an imported system be recalibrated and validated before use, and then tested for accuracy in a second (independent) population in the adopting country.
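
One common way to perform such a recalibration, sketched below on fabricated data, is to refit an intercept and slope on the logit of the imported model’s predictions against the new population’s outcomes (logistic recalibration); scikit-learn is an assumed tooling choice.

```python
# Logistic recalibration: refit intercept and slope on the logit of the
# imported model's predictions using the new population's outcomes.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
p_old = rng.uniform(0.01, 0.9, 500)                  # imported model's predictions
died = rng.binomial(1, np.clip(p_old * 0.7, 0, 1))   # new population: lower mortality

logit = np.log(p_old / (1 - p_old)).reshape(-1, 1)
recal = LogisticRegression().fit(logit, died)
p_new = recal.predict_proba(logit)[:, 1]

print(f"SMR before recalibration: {died.mean() / p_old.mean():.2f}")
print(f"SMR after recalibration:  {died.mean() / p_new.mean():.2f}")
```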

Finally, we have learned that accuracy of a predictive model among patient groups does not mean that predictions can be applied to an individual patient. At the patient level, unmeasured factors and institutional characteristics cause significant variation in estimated values. This is particularly cogent for outcomes that involve duration, such as ICU length of stay. Availability of beds, presence of step-down units, and other issues affecting ICU throughput are factors that make predicting ICU length of stay problematic at the patient level. However, predictive models such as APACHE IV and MPM-III have shown impressive accuracy at the unit level. Most likely, this is a result of these unmeasured factors cancelling each other out when looking at a group of patients.

CLINICAL DECISION SUPPORT SYSTEMS

With the development of APACHE III, we believed that our prognostic estimates were becoming precise enough to potentially be used over time as one indicator of whether an individual patient was responding to therapy. This turned out to be a false hope, as APACHE did not capture the information necessary for use at the bedside. Although predictive models such as APACHE have proven valuable for ICU benchmarking and for comparing a unit’s performance over time (55), they are old, the most recent major upgrades to APACHE and ICNARC being about 15 years ago (Table 1). Furthermore, these models have two drawbacks that limit their usefulness for the care of an individual at the bedside. One is that APACHE’s Acute Physiology Score (APS) uses a patient’s worst value within a 24-hour timeframe for each physiologic element. This means that the APS can only get worse during the day; thus, it cannot identify patients who are improving. The second drawback is that APS scores are only available at the end of an “APACHE day,” which makes them of no value for real-time decision-making.
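
The first drawback is easy to demonstrate. In the Python sketch below (with illustrative point bands, not actual APACHE weights), a patient’s heart rate normalizes over the day, yet the running worst-value score never falls:

```python
# A worst-value-in-24-hours score is monotone: it cannot register improvement
# until the scoring day resets, even as the current value returns to normal.
def hr_points(hr):  # illustrative bands, not actual APACHE weights
    if 70 <= hr <= 109: return 0
    if 110 <= hr <= 139 or 55 <= hr <= 69: return 2
    return 4

hourly_hr = [145, 130, 118, 104, 92, 84]  # patient improving over the day
worst = 0
for hr in hourly_hr:
    worst = max(worst, hr_points(hr))
    print(f"HR {hr:>3}: current points {hr_points(hr)}, worst-value points {worst}")
```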

The demand for more contemporaneous analytics has resulted in the development of early warning scores (EWS). These originated on the general wards of acute care hospitals. The first such EWS to receive attention was the Modified Early Warning Score (MEWS) (56), subsequently followed by the National Early Warning Score (NEWS) (57) and NEWS-2 (58). These scores came from data obtained by nurses during normal rounding. Deviations in vital signs and level of consciousness generated integer values that were summed to give a composite numerical score. Higher scores indicated an increased risk of patient deterioration, and various actions were suggested based on thresholds for these values.
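
A hedged Python sketch of this kind of aggregate scoring appears below; the threshold bands and trigger level are illustrative stand-ins, not the published MEWS or NEWS weights.

```python
# EWS-style scoring: each vital sign maps to an integer by threshold bands,
# the integers are summed, and a total above a trigger prompts escalation.
def band(value, cutpoints, points):
    """cutpoints of length n split the line into n+1 bands scored by points."""
    for cut, pts in zip(cutpoints, points):
        if value < cut:
            return pts
    return points[-1]

def ews(resp_rate, heart_rate, systolic_bp, temp_c, avpu):
    score = 0
    score += band(resp_rate,   [9, 15, 21, 30],        [2, 0, 1, 2, 3])
    score += band(heart_rate,  [40, 51, 101, 111, 130], [2, 1, 0, 1, 2, 3])
    score += band(systolic_bp, [70, 81, 101, 200],      [3, 2, 1, 0, 2])
    score += band(temp_c,      [35.0, 38.5],            [2, 0, 2])
    score += {"A": 0, "V": 1, "P": 2, "U": 3}[avpu]  # level of consciousness
    return score

total = ews(resp_rate=24, heart_rate=118, systolic_bp=95, temp_c=38.9, avpu="V")
print(total, "-> escalate" if total >= 5 else "-> routine monitoring")
```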

Although the various EWS can be adapted to collect data electronically at timely intervals, MEWS, NEWS, and NEWS-2 were not calibrated on critically ill patients, thus making their usefulness in ICUs equivocal. Furthermore, a study using general ward patients was inconclusive in terms of the score’s ability to lower mortality (59).

A different kind of analytic was sought for determining a patient’s readiness to be discharged from the ICU. Patients readmitted to the ICU are at high risk of mortality, and readmissions are costly (60,61). That led to the creation of models that predicted readmission to the ICU or death on another unit after ICU discharge (35,62).

There has been a clarion call for a new kind of analytic, one that requires no manual data collection and can emit alerts in near real-time (63,64). These have been called “CDSS.” The U.S. Food and Drug Administration (FDA) has defined a distinct label for these systems: Software as a Medical Device (65). These CDSS analytic systems rely heavily on machine learning and artificial intelligence approaches to data analysis. An example is the Visensia Index (66–68), which was created for intermediate care units in 2006. It is based on detecting multivariable outliers in five vital signs (heart rate, respiratory rate, blood pressure, temperature, and Spo2). A signal is generated when a threshold probability has been exceeded. The Visensia Index has proven robust in clinical studies and received FDA clearance for marketing. It is being combined with another analytic (see below) to produce a CDSS for the ICU.
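
The core idea of flagging combinations of vital signs that are jointly unusual can be illustrated with a simple stand-in. The sketch below uses Mahalanobis distance on fabricated reference data; it is not the proprietary Visensia algorithm, only an analogous multivariate-outlier computation.

```python
# Multivariate abnormality index over five vitals: distance of the current
# observation from a "normal" reference distribution.
import numpy as np

rng = np.random.default_rng(0)
# reference vitals: HR, RR, SBP, temp, SpO2 (fabricated stable patients)
normal = rng.normal([80, 16, 120, 37.0, 97], [10, 3, 12, 0.4, 1.5], (1000, 5))
mu = normal.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(normal, rowvar=False))

def abnormality_index(vitals):
    d = np.asarray(vitals) - mu
    return float(np.sqrt(d @ cov_inv @ d))  # Mahalanobis distance

current = [118, 26, 92, 38.6, 91]  # each value borderline, jointly alarming
index = abnormality_index(current)
print(f"index = {index:.1f}", "-> ALERT" if index > 4.0 else "-> ok")
```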

The first attempts at creating analytics for individual ICU patients have been aimed at alerts for specific conditions. Bedside analytics for detecting sepsis have been developed using vital signs and laboratory measurements, with modest results (4). A biomarker test is available for detecting patients at high risk of progression toward acute kidney injury (69). However, to date, there is no CDSS in use for all patients in ICUs. In part, this is due to the large expense of mounting a pivotal clinical study without the support of a dedicated National Institutes of Health institute like those that exist for cancer, heart disease, infectious diseases, etc. There is also the major issue of IT security and the need to comply with increasingly burdensome federal privacy laws.

Recently, a CDSS for detecting patients at an elevated risk for mortality hours ahead of actual deterioration has been described (70). Called “SIGNIPHY,” it is based on a pattern-recognition algorithm for multiparameter physiologic configurations that are indicators of high mortality risk. Currently, a combination of this algorithm and Visensia is being tested at multiple ICUs where data are being collected in real time.

THE NEXT 50 YEARS: BARRIERS AND PROMISES

The second half of SCCM’s first century could lead to remarkable breakthroughs in analytics that improve patient care. However, there are serious obstacles in the way. Perhaps the foremost roadblock is the poor interoperability among medical devices within the ICU. Critically ill patients are connected to many devices that yield quantitative information (63). These devices are frequently incapable of exchanging and amalgamating their information, resulting in data silos. Furthermore, patient data from before and after ICU care could be valuable, but they are usually not available to the critical care researcher. A fully automated system might allow the inclusion of laboratory values, especially the various “OMICs” (e.g., genomics and metabolomics).

The other obstacle stems from the growing reach of the Health Insurance Portability and Accountability Act (HIPAA). Enacted in 1996 as a means to prevent a patient’s health information from being distributed indiscriminately, it has grown into a mechanism that makes any sort of data linkage extremely difficult (71). This especially affects population health studies, where important information from outside the ICU might be quite useful. Studies to develop new predictive analytics suffer long delays as each hospital’s bureaucratic layer must sign off on HIPAA adherence.

With the above in mind, there are exciting new developments in health informatics that could substantially change the fundamental nature of severity scores. Using genomic information and that obtained from other molecular entities could help deliver on the promise of personalized medicine by substantially reducing the unexplained variation in critical care metrics. This could make severity scores more accurate at the patient level. Algorithms for predicting outcomes other than mortality, including those that aid in treatment decisions (e.g., detecting ventilator-associated pneumonia, sepsis alerts, and when to wean from mechanical ventilation), might be attainable.

The demonstrably different physiologic profile and subsequent outcomes for patients admitted to the ICU due to acute respiratory distress from infection with coronavirus disease 2019 argue for predictive models that are accurate under highly unusual conditions. These models need to be developed quickly and address situation-specific outcomes: prediction of prolonged mechanical ventilation (≥ 96 hr on a ventilator), triaging decisions, etc.

Perhaps the area that will have the most immediate impact on predictive models is the advancement of data science/machine learning algorithms. The advent of free, open-source languages such as Python and R makes algorithm development affordable. Software repositories such as GitHub allow collaboration on a scale previously unattainable. Critical care analytics, with its large and diverse datasets, is well suited to these new computational tools. In turn, this will lead to the use of complex machine learning algorithms that find patterns in previously intractable data streams emanating from devices in the ICU.
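
A sketch of such a workflow in Python, using fabricated features and outcomes in place of a real ICU data stream (in practice, these might come from sources such as MIMIC or eICU):

```python
# Fit a gradient-boosted model to ICU-style features and check its
# discrimination on a held-out set.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n = 2000
X = np.column_stack([
    rng.normal(80, 20, n),    # heart rate
    rng.normal(120, 25, n),   # systolic BP
    rng.normal(1.2, 0.8, n),  # lactate
])
# fabricated mortality risk tied to heart rate and lactate
risk = 1 / (1 + np.exp(-(0.03 * (X[:, 0] - 80) + 0.9 * (X[:, 2] - 1.2) - 2.2)))
y = rng.binomial(1, risk)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = GradientBoostingClassifier().fit(X_tr, y_tr)
print(f"held-out AU-ROC: {roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]):.2f}")
```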

CONCLUSIONS

Critical care analytics have come a long way in SCCM’s first 50 years. From a start of measuring therapy as a reflection of severity, methods for evaluating a patient’s risk have advanced tremendously. APACHE’s emphasis on physiology and diagnosis was seminal in assessing severity and in comparing ICUs, an approach that other predictive models have not surpassed.

A recent set of guidelines by editors of prominent journals in critical care (72) attests to the long-lasting impact of the work that has taken place in developing predictive models of ICU outcomes. Critical care professionals now use predictive modeling for diverse purposes, and the high editorial standards set in that report reflect the many lessons and experiences discussed here.

Yet, despite the large amount of research and development into critical care severity scores, the field continues to be dominated by scoring approaches that are decades old. APACHE II is still widely used today for stratifying patients by severity of illness, because its reliance on physiology means that it still works well in a variety of circumstances. Do APACHE II’s simplicity, free availability, and basic level of information about a patient’s severity of illness outweigh more sophisticated models’ improved accuracy and utility? In many respects, the answer to that question is “Yes,” in large part because updates to scoring systems have been evolutionary rather than revolutionary. Improving a predictive model’s AU-ROC from 0.85 to 0.90 is not sufficiently compelling to cause some intensivists to relinquish a model developed before many of them were born.

What is needed is a different approach to consolidating information from a variety of sources and using that information in a novel way. Severity scores such as APACHE and MPM were predicated on the assumption that medical device integration would allow for the seamless transfer of critical care data into those systems. Access to large volumes of clinical data amenable to advanced machine learning techniques would yield new insights into the pathophysiology of disease. However, this has not yet happened. Why?

The primary culprit is the absence of a common format and nomenclature for EMR data. This is essential for integrating the huge amount of clinical data, ranging from laboratory tests to clinical diagnoses to the various “OMICS,” especially in multicenter studies. Adoption of a consistent standard for clinical data across healthcare systems holds the promise of liberating data from existing silos and combining them into large, accurate databases amenable to advanced machine learning methods. Blocking the path forward are draconian privacy laws that make access to critical care data difficult for research projects whose results could generate new severity scoring systems usable at the bedside.
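
To make the problem concrete, the Python sketch below normalizes the same laboratory result from two invented vendor formats into a single schema; this is the kind of mapping that a common standard (e.g., an effort such as HL7 FHIR) would make unnecessary. All field names here are hypothetical.

```python
# The same lactate result arrives in two vendor-specific shapes (invented
# field names) and must be mapped to one common schema before pooling.
vendor_a = {"obs": "LACTATE", "val": "3.1", "u": "mmol/L", "t": "2020-05-07T14:02"}
vendor_b = {"test_name": "Lactate, serum", "result": 3.1, "units": "mmol/L",
            "drawn_at": "2020-05-07 14:02"}

def normalize_a(rec):
    return {"code": "lactate", "value": float(rec["val"]),
            "unit": rec["u"], "time": rec["t"]}

def normalize_b(rec):
    return {"code": "lactate", "value": float(rec["result"]),
            "unit": rec["units"], "time": rec["drawn_at"].replace(" ", "T")}

print(normalize_a(vendor_a) == normalize_b(vendor_b))  # True once normalized
```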

In the near future, larger and larger quantities of clinical data will be available, but data alone will not lead to improved insights and more precise clinical knowledge. The challenge will be for the clinical enterprise to properly amalgamate, analyze, present, and transform these data into new ways of measuring severity that are useful at the bedside. Only then will clinicians abandon their reliance on severity measures developed in the last century.

ACKNOWLEDGMENTS

We are grateful for the support shown over the years by one of the original developers of APACHE, the late Douglas P. Wagner, PhD.

We also thank Medical Decision Network, LLC, for providing the use of their ICU database “Phoenix” in an analysis included in this article.

REFERENCES

1. Scheffler RM, Knaus WA, Wagner DP, et al. Severity of illness and the relationship between intensive care and survival. Am J Public Health. 1982; 72:449–454
2. Rosenberg AL, Zimmerman JE, Alzola C, et al. Intensive care unit length of stay: Recent changes and future challenges. Crit Care Med. 2000; 28:3465–3473
3. Knaus WA, Wagner DP, Zimmerman JE, et al. Variations in mortality and length of stay in intensive care units. Ann Intern Med. 1993; 118:753–761
4. Giannini HM, Ginestra JC, Chivers C, et al. A machine learning algorithm to predict severe sepsis and septic shock: Development, implementation, and impact on clinical practice. Crit Care Med. 2019; 47:1485–1492
5. Cullen DJ, Civetta JM, Briggs BA, et al. Therapeutic intervention scoring system: A method for quantitative comparison of patient care. Crit Care Med. 1974; 2:57–60
6. Keene AR, Cullen DJ. Therapeutic intervention scoring system: Update 1983. Crit Care Med. 1983; 11:1–3
7. Bernard C. An Introduction to the Study of Experimental Medicine. 1865, First Edition. New York and London: H. Baillière
8. Knaus WA, Zimmerman JE, Wagner DP, et al. APACHE-acute physiology and chronic health evaluation: A physiologically based classification system. Crit Care Med. 1981; 9:591–597
9. Fetter RB, Shin Y, Freeman JL, et al. Case mix definition by diagnosis-related groups. Med Care. 1980; 18(Suppl 2):iii, 1–53
10. Knaus WA, Draper EA, Wagner DP, et al. APACHE II: A severity of disease classification system. Crit Care Med. 1985; 13:818–829
11. Knaus WA, Draper EA, Wagner DP, et al. An evaluation of outcome from intensive care in major medical centers. Ann Intern Med. 1986; 104:410–418
12. Knaus WA, Wagner DP, Draper EA, et al. The APACHE III prognostic system. Risk prediction of hospital mortality for critically ill hospitalized adults. Chest. 1991; 100:1619–1636
13. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982; 143:29–36
14. Zimmerman JE, Wagner DP, Knaus WA, et al. The use of risk predictions to identify candidates for intermediate care units. Implications for intensive care utilization and cost. Chest. 1995; 108:490–499
15. Wagner DP, Knaus WA, Harrell FE, et al. Daily prognostic estimates for critically ill adults in intensive care units: Results from a prospective, multicenter, inception cohort analysis. Crit Care Med. 1994; 22:1359–1372
16. Knaus WA. APACHE 1978-2001: The development of a quality assurance system based on prognosis: Milestones and personal reflections. Arch Surg. 2002; 137:37–41
17. Becker RB, Zimmerman JE, Knaus WA, et al. The use of APACHE III to evaluate ICU length of stay, resource use, and mortality after coronary artery bypass surgery. J Cardiovasc Surg. 1995; 36:1–11
18. Cook SF, Visscher WA, Hobbs CL, et al.; Project IMPACT Clinical Implementation Committee. Project IMPACT: Results from a pilot validity study of a new observational database. Crit Care Med. 2002; 30:2765–2770
19. Kramer AA. Predictive mortality models are not like fine wine. Crit Care. 2005; 9:636–637
20. Zimmerman JE, Kramer AA, Knaus WA. Changes in hospital mortality for United States intensive care unit admissions from 1988 to 2012. Crit Care. 2013; 17:R81
21. Zimmerman JE, Kramer AA, McNair DS, et al. Acute physiology and chronic health evaluation (APACHE) IV: Hospital mortality assessment for today’s critically ill patients. Crit Care Med. 2006; 34:1297–1310
22. Zimmerman JE, Kramer AA, McNair DS, et al. Intensive care unit length of stay: Benchmarking based on Acute Physiology and Chronic Health Evaluation (APACHE) IV. Crit Care Med. 2006; 34:2517–2529
23. Kramer AA, Zimmerman JE. The relationship between hospital and intensive care unit length of stay. Crit Care Med. 2011; 39:1015–1022
24. Zimmerman JE, Kramer AA. A model for identifying patients who may not need intensive care unit admission. J Crit Care. 2010; 25:205–213
25. Kramer AA, Gershengorn HB, Wunsch H, et al. Variations in case-mix-adjusted duration of mechanical ventilation among ICUs. Crit Care Med. 2016; 44:1042–1048
26. Kramer AA, Zimmerman JE. Predicting outcomes for patients admitted to ICUs following cardiac surgery: Problems and solutions. Semin Cardiothorac Vasc Anesth. 2008; 12:175–183
27. Zimmerman JE, Kramer AA. A history of outcome prediction in the ICU. Curr Opin Crit Care. 2014; 20:550–556
28. Render ML, Kim HM, Deddens J, et al. Variation in outcomes in veterans affairs intensive care units with a computerized severity measure. Crit Care Med. 2005; 33:930–939
29. Lemeshow S, Teres D, Pastides H, et al. A method for predicting survival and mortality of ICU patients using objectively derived weights. Crit Care Med. 1985; 13:519–525
30. Lemeshow S, Teres D, Klar J, et al. Mortality probability models (MPM II) based on an international cohort of intensive care unit patients. JAMA. 1993; 270:2478–2486
31. Rapoport J, Teres D, Lemeshow S, et al. A method for assessing the clinical performance and cost-effectiveness of intensive care units: A multicenter inception cohort study. Crit Care Med. 1994; 22:1385–1391
32. Lilly CM, McLaughlin JM, Zhao H, et al.; UMass Memorial Critical Care Operations Group. A multicenter study of ICU telemedicine reengineering of adult critical care. Chest. 2014; 145:500–507
33. Higgins TL, Teres D, Copes WS, et al. Assessing contemporary intensive care unit outcome: An updated Mortality Probability Admission Model (MPM0-III). Crit Care Med. 2007; 35:827–835
34. Nathanson BH, Higgins TL, Teres D, et al. A revised method to assess intensive care unit clinical performance and resource utilization. Crit Care Med. 2007; 35:1853–1862
35. Kramer AA. A novel method using vital signs information for assistance in making a discharge decision from the intensive care unit: Identification of those patients at highest risk of mortality on the floor or discharge to a hospice. Med Res Archives. 2017; 5:1–12. Available at: https://journals.ke-i.org/mra/article/view/1635/1614. Accessed September 21, 2020
36. Pollard TJ, Johnson AEW, Raffa JD, et al. The eICU collaborative research database, a freely available multi-center database for critical care research. Sci Data. 2018; 5:180178
37. Saeed M, Villarroel M, Reisner AT, et al. Multiparameter intelligent monitoring in intensive care II: A public-access intensive care unit database. Crit Care Med. 2011; 39:952–960
38. Johnson AE, Pollard TJ, Shen L, et al. MIMIC-III, a freely accessible critical care database. Sci Data. 2016; 3:160035
39. Johnson AEW, Stone DJ, Celi LA. The MIMIC code repository: Enabling reproducibility in critical care research. J Am Med Inform Assoc. 2018; 25:32–39
40. Harrison DA, Brady AR, Parry GJ, et al. Recalibration of risk prediction models in a large multicenter cohort of admissions to adult, general critical care units in the United Kingdom. Crit Care Med. 2006; 34:1378–1388
41. Johnson AE, Kramer AA, Clifford GD. A new severity of illness scale using a subset of acute physiology and chronic health evaluation data elements shows comparable predictive accuracy. Crit Care Med. 2013; 41:1711–1718
42. Le Gall JR, Loirat P, Alperovitch A, et al. A Simplified Acute Physiology Score for ICU patients. Crit Care Med. 1984; 12:975–977
43. Le Gall JR, Lemeshow S, Saulnier F. A new simplified acute physiology score (SAPS II) based on a European/North American multicenter study. JAMA. 1993; 270:2957–2963
44. Moreno RP, Metnitz PGH, Almeida E, et al. SAPS 3–From evaluation of the patient to evaluation of the intensive care unit. Part 2: Development of a prognostic model for hospital mortality at ICU admission. Intensive Care Med. 2005; 31:1345–1355
45. Verburg IWM, de Jonge E, Peek N, et al. The association between outcome-based quality indicators for intensive care units. PLoS One. 2018; 13:e0198522
46. Paul E, Bailey M, Pilcher D. Risk prediction of hospital mortality for adult patients admitted to Australian and New Zealand intensive care units: Development and validation of the Australian and New Zealand risk of death model. J Crit Care. 2013; 28:935–941
47. Engerström L, Kramer AA, Nolin T, et al. Comparing time-fixed mortality prediction models and their effect on ICU performance metrics using the Simplified Acute Physiology Score 3. Crit Care Med. 2016; 44:e1038–e1044
48. Zimmerman JE, Kramer AA. Outcome prediction in critical care: The APACHE models. Curr Opin Crit Care. 2008; 14:491–497
49. Bakhshi-Raiez F, Peek N, Bosman RJ, et al. The impact of different prognostic models and their customization on institutional comparison of intensive care units. Crit Care Med. 2007; 35:2553–2560
50. Kramer AA, Higgins TL, Zimmerman JE. Comparison of the mortality probability admission model III, national quality forum, and acute physiology and chronic health evaluation IV hospital mortality models: Implications for national benchmarking*. Crit Care Med. 2014; 42:544–553
51. Peek N, Arts DG, Bosman RJ, et al. External validation of prognostic models for critically ill patients required substantial sample sizes. J Clin Epidemiol. 2007; 60:491–501
52. Wunsch H, Brady AR, Rowan K. Impact of exclusion criteria on case mix, outcome, and length of stay for the severity of disease scoring methods in common use in critical care. J Crit Care. 2004; 19:67–74
53. Breslow MJ, Badawi O. Severity scoring in the critically ill: Part 2: Maximizing value from outcome prediction scoring systems. Chest. 2012; 141:518–527
54. Kramer AA, Higgins TL, Zimmerman JE. Comparing observed and predicted mortality among ICUs using different prognostic systems: Why do performance assessments differ? Crit Care Med. 2015; 43:261–269
55. Zimmerman JE, Alzola C, Von Rueden KT. The use of benchmarking to identify top performing critical care units: A preliminary assessment of their policies and practices. J Crit Care. 2003; 18:76–86
56. Morgan R, Williams F, Wright M. An early warning scoring system for detecting developing critical illness. Clin Intensive Care. 1997; 8:100
57. Royal College of Physicians. National Early Warning Score (NEWS). Standardising the Assessment of Acute-Illness Severity in the NHS: Report of a Working Party, 2012. Available at: www.rcplondon.ac.uk/projects/outputs/national-early-warning-score. Accessed October 15, 2018
58. Royal College of Physicians. National Early Warning Score (NEWS) 2: Standardising the Assessment of Acute-Illness Severity in the NHS: Updated Report of a Working Party, 2017. Available at: www.rcplondon.ac.uk/projects/outputs/national-early-warning-score-news-2. Accessed October 15, 2018
59. Bedoya AD, Clement ME, Phelan M, et al. Minimal impact of implemented early warning score and best practice alert for patient deterioration. Crit Care Med. 2019; 47:49–55
60. Kramer AA, Higgins TL, Zimmerman JE. The association between ICU readmission rate and patient outcomes. Crit Care Med. 2013; 41:24–33
61. Kramer AA, Higgins TL, Zimmerman JE. Can this patient be safely discharged from the ICU? Intensive Care Med. 2016; 42:580–582
62. Badawi O, Breslow MJ. Readmissions and death after ICU discharge: Development and validation of two predictive models. PLoS One. 2012; 7:e48758
63. Kramer AA, Sebat F, Lissauer M. A review of early warning systems for prompt detection of patients at risk for clinical decline. J Trauma Acute Care Surg. 2019; 87:S67–S73
64. Sutherland SM, Chawla LS, Gill SL, et al. Utilizing electronic health records to predict acute kidney injury risk and outcomes: Workgroup statements from the 15th ADQI consensus conference. Can J Kidney Health Dis. 2016; 3:11–24
65. U.S. Food and Drug Administration: Software as a Medical Device (SaMD). 2017. Available at: www.fda.gov/media/100714/download. Accessed May 7, 2020
66. Tarassenko L, Hann A, Young D. Integrated monitoring and analysis for early warning of patient deterioration. Br J Anaesth. 2006; 97:64–68
67. Hravnak M, Edwards L, Clontz A, et al. Defining the incidence of cardiorespiratory instability in patients in step-down units using an electronic integrated monitoring system. Arch Intern Med. 2008; 168:1300–1308
68. Hravnak M, Devita MA, Clontz A, et al. Cardiorespiratory instability before and after implementing an integrated monitoring system. Crit Care Med. 2011; 39:65–72
69. Bihorac A, Chawla LS, Shaw AD, et al. Validation of cell-cycle arrest biomarkers for acute kidney injury using clinical adjudication. Am J Respir Crit Care Med. 2014; 189:932–939
70. Kramer AA. A continuously updated predictive analytics model for the timely detection of critically ill patients with a high risk of mortality. Med Res Archives. 2019; 7:1–12. Available at: https://journals.ke-i.org/mra/article/view/2008. Accessed September 21, 2020
71. Gostin LO. National health information privacy: Regulations under the Health Insurance Portability and Accountability Act. JAMA. 2001; 285:3015–3021
72. Leisman DE, Harhay MO, Lederer DJ, et al. Development and reporting of prediction models: Guidance for authors from editors of respiratory, sleep, and critical care journals. Crit Care Med. 2020; 48:623–633
Keywords:

Acute Physiology and Chronic Health Evaluation; critical care outcomes; mortality probability model; predictive models; scoring systems

Copyright © 2021 by the Society of Critical Care Medicine and Wolters Kluwer Health, Inc. All Rights Reserved.