Lacroix, Jacques MD, FRCPC, FAAP; Cotting, Jacques MD; for the Pediatric Acute Lung Injury and Sepsis Investigators (PALISI) Network
Critically ill patients are typically characterized by disturbances of body homeostasis. In both adults and children, these disturbances can be estimated by measuring how far one or more physiologic variables deviate from the normal range. Composite scores can be constructed with such variables. Many types of scores have been developed (Fig. 1). Prognostic scores were developed to better describe the severity of illness at baseline of groups of critically ill patients. These scores consider some comorbidities and physiologic disturbances at entry into the pediatric intensive care unit (PICU) or at randomization in a clinical trial. More precisely, prognostic scores were developed to maximize prediction of the overall risk of mortality among groups of critically ill patients, given the severity of illness of the patients. Outcome scores were developed to better describe the severity of illness during the stay in the intensive care unit (ICU). In this instance, organ physiologic disturbances are collected daily from baseline to outcome or discharge from the ICU. Outcome scores were developed and validated to maximize description of the clinical course of groups of patients.
In this article, we briefly discuss what a composite score is, and we present a brief review of clinical scores that may be used in studies of sepsis or multiple organ dysfunction syndrome (MODS) done in the PICU. We focus our discussion on two groups of predictive scores that are frequently used in the PICU, the Pediatric Risk of Mortality (PRISM) scores (1–3) and the Pediatric Index of Mortality (PIM) scores (4, 5), and on one descriptive score that estimates the severity of cases of MODS in critically ill children, namely, the Pediatric Logistic Organ Dysfunction (PELOD) score (6, 7). We do not discuss scores used in neonatal ICUs, like the Clinical Risk Index for Babies (CRIB) scores (8, 9) and the Score for Neonatal Acute Physiology (SNAP) (10, 11). A detailed description of all these scores can be found on a Web site of the Société Française d'Anesthésie Réanimation (http://www.sfar.org/s/article.php3?id_article=60); the page is written in English.
What Is a Composite Score?
PRISM, PIM, and PELOD scores are composite scores (aggregate scales) that are made up of a group of variables. The death rate was used as the outcome measure to estimate the validity of all these scores. Many types of variables can be used in constructing such scores, including clinical data like heart rate, physiologic data like cardiac index, laboratory data like creatinine or Pao2, and other scores like the Glasgow coma score that is integrated into the PRISM score. Points that estimate severity of illness are given to each variable in proportion to its predictive weight. The number of points of each variable should be proportional to its capacity to predict a given outcome. More points should be attributed to a given variable if the predictive value of the organ or system monitored by this variable is more significant. More points should also be given if the dysfunction is more severe. For example, in the PELOD score, severe dysfunction of the cardiovascular or neurologic system is more heavily weighted (up to 20 points) than that of the renal system (maximum of 10 points) (Table 1) (7). Within a given system, 20, 10, 1, or 0 points can be given in the PELOD score according to whether the dysfunction of the cardiovascular or neurologic system is more or less severe.
The points of all the variables incorporated in a given aggregate scale are added to get the score. The total number of points can then be converted into a risk of mortality. The mean risk of mortality of a population can also be compared with actual mortality to get a standardized mortality ratio.
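The standardized mortality ratio can be illustrated with a minimal sketch. It assumes that each patient already has a predicted risk of death (a probability between 0 and 1) derived from a score; the function name and the sample values are hypothetical and are not taken from any published score.

```python
def standardized_mortality_ratio(predicted_risks, died):
    """Observed deaths divided by expected deaths.

    predicted_risks: per-patient probability of death from a score (0..1).
    died: per-patient outcome, True if the patient died.
    """
    if len(predicted_risks) != len(died):
        raise ValueError("one outcome is needed per predicted risk")
    expected = sum(predicted_risks)          # expected number of deaths
    observed = sum(1 for d in died if d)     # actual number of deaths
    if expected == 0:
        raise ValueError("expected number of deaths is zero")
    return observed / expected


# Hypothetical cohort of five patients (illustrative values only).
risks = [0.02, 0.05, 0.10, 0.40, 0.43]
outcomes = [False, False, False, True, False]
smr = standardized_mortality_ratio(risks, outcomes)
```

A ratio below 1 indicates fewer deaths than the score predicted, and a ratio above 1 indicates more.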
Relevance of Composite Scores.
Well-validated composite scores can be used to harden soft data. Death is a good example of hard data: it is easy to get a consensus on its diagnosis, and there is no interobserver variability. Beauty is an example of soft data: thousands of definitions are advocated, and the evaluation of the beauty of something can be drastically different from one “expert” to the other. Soft data are frequently used in medicine. For example, one can use a qualitative scale to describe the severity of MODS in PICUs: critically ill children can have no organ dysfunction at all or a light, moderate, or severe MODS. The problem with such a 4-grade qualitative scale (no MODS, light, moderate, or severe MODS) is that the interrater variability is large: such a qualitative scale can be considered as soft data because what is meant by words such as light, moderate, and severe can be very different from one caregiver to the other. There is indeed strong evidence that qualitative expressions like these are not reliable (12); this must apply to qualitative scales. A semiquantitative or ordinal score is clearly better. For example, one can describe the severity of cases of MODS by reporting the number of dysfunctional organs, which can range from zero to six in critically ill children (13, 14). There are problems here, too, because the risk of death is different from one organ to the other. For example, in a group of critically ill patients, neurologic or cardiovascular dysfunctions are more important and more predictive of death than hepatic dysfunction (7). A well-developed and well-validated quantitative score can take into account the independent weight of each variable that is integrated into it. More or fewer points are attributed to each variable included in the score.
Composite scores are relevant if they are used. This is the case. PRISM and PIM scores are frequently used to compare the efficacy of different PICUs, given the expected mortality predicted by these scores in these units (quality assurance and quality assessment), and in clinical trials to compare the severity of illness of patients at randomization. MODS scores are frequently used as an outcome measure in clinical trials with critically ill adults; the PELOD score would probably be used similarly in clinical studies performed in PICUs.
PROGNOSTIC (PREDICTIVE) SCORES
PRISM scores should be used in critically ill neonates, infants, children, or adolescents, not in premature infants or in adults. Three versions were published. The first was named the Physiologic Stability Index, and it contained 24 variables (1, 15). Daily (dynamic) assessment of the Physiologic Stability Index score was reported in 1986 (16). In 1988, Pollack et al. (2) published an improved version of the score that was named the Pediatric Risk of Mortality (PRISM) score (it is named PRISM II score by some intensivists). The PRISM score contained 14 variables; its daily assessment was published in 1991 (17). The PRISM score was again improved in 1996. The PRISM III score contains 17 variables or signs of cardiovascular, neurologic, or vital functions (systolic blood pressure, heart rate, Glasgow coma score, pupillary reflexes, temperature), acid-base status (pH, CO2, Pco2 and Pao2), chemistry tests (glucose, potassium, creatinine, blood urea nitrogen), hematology tests (white blood cell count, platelet count, prothrombin time, partial thromboplastin time), and other factors like operative status and some types of diseases (3). Data incorporated into the PRISM III score can be collected during the first 12 hrs (PRISM III-12) or during the first 24 hrs after entry into PICU (PRISM III-24). The most abnormal values are retained.
Discrimination and calibration are two very important characteristics of a score. Discrimination is the ability of a test to differentiate patients who meet the outcome (for example, death) and those who do not. The discrimination capacity (predictor performance) of a test is best described by its area under the receiver operating characteristics curve. The calibration of a score is the degree of correspondence at different levels of probability between the probability of the outcome (for example, death) as predicted by the score and the observed frequency of the outcome (Fig. 2). As shown in this figure, the PELOD score predicts well for different risk strata of mortality. The statistical analysis of calibration is usually done with the Hosmer-Lemeshow goodness-of-fit test. The statistical question is: are discrepancies between observed and expected mortality statistically significant? A p value for calibration of <.05 suggests that there is a statistically significant difference between observed and predicted mortality, which is the opposite of what we want to find. It is traditionally considered that calibration is good if the p value is >.10 (18). A higher p value is probably even better. The discrimination capacity of the PRISM III-24 to differentiate critically ill children who die and those who survive is 0.944 ± 0.021 (area under the receiver operating characteristics curve ± sem). The calibration of the PRISM III-24 score was excellent (p = .5504, 12 df).
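The two notions can be sketched in a few lines. This is an illustrative sketch, not the procedure used in any of the cited studies: the area under the curve is computed as the proportion of (non-survivor, survivor) pairs in which the non-survivor has the higher score, and the Hosmer-Lemeshow statistic is computed over risk-ordered groups of roughly equal size. The function names and the default of ten groups are assumptions. The p value is obtained by comparing the statistic with a chi-square distribution, which is not done here.

```python
def auc(scores, died):
    """Area under the ROC curve: probability that a randomly chosen
    non-survivor has a higher score than a randomly chosen survivor
    (ties count as one half)."""
    pos = [s for s, d in zip(scores, died) if d]
    neg = [s for s, d in zip(scores, died) if not d]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def hosmer_lemeshow_statistic(predicted_risks, died, groups=10):
    """Hosmer-Lemeshow chi-square statistic over risk-ordered groups."""
    pairs = sorted(zip(predicted_risks, died))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)        # expected deaths
        observed = sum(1 for _, d in chunk if d)   # observed deaths
        # Skip degenerate groups where a term would divide by zero.
        if 0 < expected < size:
            # Contribution of deaths, then of survivors, in this group.
            stat += (observed - expected) ** 2 / expected
            stat += (observed - expected) ** 2 / (size - expected)
    return stat
```

An area of 0.5 corresponds to no discrimination and an area of 1.0 to perfect discrimination; a small Hosmer-Lemeshow statistic (large p value) indicates that observed and predicted mortality agree across risk strata.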
Pollack et al. (3) also estimated the value of the PRISM III-12 score. The most abnormal values seen during the first 12 hrs in the PICU are used to calculate that score. Its discrimination capacity was 0.941 ± 0.021, and the p value analyzing the calibration of the PRISM III-12 score was 0.4168 (12 df).
Strengths of the PRISM III score are significant. It was very well validated with large sample size involving a lot of different PICUs. However, a few problems must be underlined. Early treatment bias is the main problem. As Richardson et al. (10) wrote, “The shorter the scoring period, the more the score reflects the patient's condition rather than the therapeutic response.” All scores of the PRISM series include data from the first 12 or 24 hrs in the ICU, not only at entry into the ICU; therefore, better early treatment given in the PICU before the first data of the PRISM score are observed can improve the most abnormal values collected during the first day in the ICU. For example, better mechanical ventilation should improve Pao2, Pco2, and pH if these variables are measured on a blood sample taken a few hours after entry in the PICU. This means that the average PRISM score will be lower in this ICU than it would be for similar patients in a less efficient ICU. The result could be that the risk of death can look worse in better ICUs (4). Another bias is possible. All PRISM scores include data collected up to death. A significant proportion of critically ill children who die do so during the first day of their stay in the ICU (>40% in Australia (4)), and this should inflate the capacity of PRISM scores to predict death. In other words, “there is a danger that the score is really diagnosing death rather than predicting it” (4). Another problem is that the equation required to estimate the predicted mortality with the PRISM III score is not in the public domain: it is patented, and users have to pay to get this equation. This is not well received in many countries, and it probably explains why the PRISM III score is not used in many PICUs outside North America. Despite this, the PRISM III score is used extensively in North American PICUs. It is used frequently for quality assurance and quality assessment. 
It is also frequently used to describe the severity of cases at baseline in the different arms of randomized clinical trials performed with critically ill children: a good balance between the severity of illness at baseline must be found if the randomization process worked correctly; if this is not the case, some adjustment is required while doing the statistical analysis.
The PIM is also a predictive score. The first version (PIM-1) was published in 1997 (4); the score was updated in 2003 (5). PIM-2 has been validated in 20,787 critically ill children from 14 ICUs in Australia, New Zealand, and the United Kingdom. PIM-2 includes ten variables that are measured at entry into the PICU: systolic blood pressure, pupillary reaction, Pao2, base excess, mechanical ventilation, elective admission to the ICU, recovering from surgery or procedure, admission after cardiac bypass, high-risk diagnosis, or low-risk diagnosis. The discrimination value of the PIM-2 is 0.90 (95% confidence interval, 0.89–0.91). The calibration is also good (p = .17, 8 df). It seems that the predictive value of the PIM-2 score is similar to, if not better than, that of the PRISM III score, at least in Australia and New Zealand (19). PIM scores avoid problems of early treatment bias because they include only data at entry into the PICU. Their development and validation were very well done. Their main weakness is that they have not been tested in North America and in many other countries around the world. On the other hand, the PIM score is free, and it can be calculated on the Internet. The PIM score can be used for the same purpose as the PRISM score. More studies are required before one can conclude that the PRISM III is better than the PIM-2 or vice versa.
General or Specific Scores?
General scales like PRISM and PIM scores may not be applicable to children with a specific disease, such as fulminant hepatic failure (20), acute renal failure (21), near-drowning (22), cancer (23), or bone marrow transplant (24, 25). This may hold true also for cases of purpura fulminans and infections caused by Neisseria meningitidis. More than 25 scores have been developed to describe the severity of illness of patients infected with this bacterium (26); nine are more frequently used (27–35). Use of laboratory tests like C-reactive protein and procalcitonin (36) has also been advocated. These tests have always been validated in small series, and the independent predictive value of each variable has rarely been studied. The predictive performance of general and specific markers or scoring systems for meningococcal septic shock in children was compared independently by at least two groups of investigators; both found that the PRISM II score was a better predictor of mortality than specific scores (37, 38). These results suggest that it could be inappropriate to apply a generic score to a specific population, such as in bone marrow transplantation or cancer. On the other hand, they also mean that a score validated in a specific population is not always better than generic scores.
The diagnosis of MODS is supported by the observation in a critically ill patient of the simultaneous dysfunction of at least two organ systems. Up to seven organs have been considered: respiratory, cardiovascular, neurologic, hematologic, renal, hepatic, and gastrointestinal. The diagnostic criteria of these dysfunctions were defined by Wilkinson et al. (39) and Proulx et al. (14). These definitions were updated by experts who met in San Antonio in 2002 (40) and in Boston in 2004 (41). It must be underlined that the diagnostic accuracy of the variables considered in these definitions has never been scientifically validated, but these diagnostic criteria of pediatric MODS are extensively used by practitioners and investigators.
In critically ill adults, at least three quantitative scoring systems estimating the severity of cases of MODS have been developed and validated: the Multiple Organ Dysfunction score (42), the Logistic Organ Dysfunction score (43), and the Sepsis-related Organ Failure Assessment (SOFA) score (44). In children, the number of dysfunctional organs is frequently used to describe the severity of cases of pediatric MODS, the rationale being that MODS with more organ dysfunctions should be more severe. This ordinal scale has been named the pediatric MODS score by some physicians. There can be zero to six or seven organ dysfunctions (the gastrointestinal dysfunction is not retained by most experts). The MODS score is not without interest. There is indeed a relationship between the number of organ dysfunctions and the mortality rate (Fig. 3). However, mortality in the ICU is a function not only of the number of failing systems but also of the relative risk and of the degree of dysfunction of each system. Indeed, the predictive weight of the different organs and systems is not similar: for example, cardiovascular and neurologic dysfunctions are more predictive of death than hepatic or renal dysfunction (Table 2). The relative weight and the severity of the organ dysfunction are not taken into account in the MODS score; these limitations cast doubt on its reliability and its usefulness.
Development and Validation.
Some pediatric intensivists considered that a good tool to estimate the severity of cases of MODS observed in PICUs was needed to describe correctly the clinical course of illnesses observed in critically ill children. Thus, it was thought appropriate to undertake a research program to create and validate a score for MODS in children. That score was named the Pediatric Logistic Organ Dysfunction (PELOD) score. Two consecutive prospective studies were completed to develop and to validate the score.
The creation and development of the PELOD score is reported in detail in an article published in 1999 (6). Item generation was carefully done. First, a list of criteria was independently generated by three pediatric intensivists, based on their clinical experience, the medical literature, and other scores used in PICUs. All clinical and biological variables used as diagnostic criteria for pediatric and adult organ dysfunction, all variables of predictive scores (PRISM, PIM, etc.), and all variables proposed by the experts were considered. The first list of items included 45 variables. Second, a notation grid (from 1 to 4 points) was used to estimate each candidate variable against a set of criteria suggested to describe the ideal descriptor of organ dysfunction (simple, readily available, clearly definable, specific for the function or one system, reproducible, responsive) as advocated by Marshall et al (42). A written questionnaire was used. Third, the experts were asked to exclude variables that were redundant and to keep only the best descriptors of organ dysfunction. Consensus was obtained using Delphi methods (45). A total of 18 variables were retained for the development study.
A prospective descriptive epidemiologic study of consecutive patients was undertaken in three multidisciplinary, university-affiliated PICUs to develop the PELOD score. The occurrence of all 18 clinical and biological variables under study was monitored daily throughout the entire PICU stay. Variables were only measured if clinical status of patients justified their knowledge. If a variable was not measured, it was assumed to be within the normal range. The most abnormal values were retained for the statistical analysis. All physiologic data accumulated during the preterminal period in dying patients (the last 2 hrs of life) were not considered for analysis. Physiologic variables for which values are age dependent were stratified into four age groups: neonates (<7 days or 1 month), infants (1–12 months), children (12–144 months), and adolescents (>144 months). The threshold of each continuous criterion was established a posteriori by using the raw data that were collected; the cutoff with the best global predictive value was retained. We used the Fisher algorithm to determine those thresholds. Moreover, the weight of each variable to predict death was estimated independently. Four levels of increasing severity were found by cluster analysis, and values of 0, 1, 10, or 20 points were attributed to these levels. Then, the severity and the weight of each organ dysfunction were integrated into the PELOD score using multivariate logistic regression. Six systems contributed to the PELOD score, and 12 variables were retained (Table 1). Table 2 summarizes the partial r2 found by multiple regression in the development and validation phases of the PELOD score. It is clear in that table that the most important organ dysfunctions were neurologic and cardiovascular. Accordingly, a greater number of points (maximum of 1, 10, or 20) were attributed to the more significant systems. The development study of the PELOD score included 594 consecutive patients and 51 deaths. 
The discrimination of the PELOD score was 0.98 ± 0.01 (area under the receiver operating characteristics curve ± se). The calibration was good (p = .44, 3 df).
Thereafter, a validation study was undertaken in Canada, France, and Switzerland; a total of 1,806 consecutive patients in seven PICUs were retained for analysis, including 115 deaths (7). The discrimination of the PELOD score was 0.91 ± 0.01 (area under the receiver operating characteristics curve ± se). The calibration was good (p = .54, 5 df).
A daily PELOD score was also studied during the first 5 days in the PICU. When a variable was not measured, it was assumed to be either identical to the last measurement (if the physician considered that the value of the variable did not change) or normal (if the physician considered that the value of the variable was normal). The most abnormal values during the day under evaluation were retained for analysis. The predictive value of the daily PELOD score was quite good, the area under the receiver operating characteristics curve ranging from 0.79 to 0.85 during these 5 days.
What are the limitations of the PELOD score? First, treatment bias may be a problem because the PELOD score includes data that can be modulated by the care provided during PICU stay. Thus, the PELOD score cannot differentiate therapy and severity of disease, but this bias is unavoidable unless one is ready to give no treatment to critically ill children. Second, the PELOD score has not been tested in countries other than Canada, France, and Switzerland; its applicability to other countries needs to be studied. Third, the PELOD score is not validated to predict post-ICU morbidity and mortality; further studies are required before the PELOD score can be used as a surrogate outcome of post-ICU morbidity and mortality.
PELOD Score and Sepsis.
Recently, further evaluation of the PELOD score was done in the context of sepsis. Four “septic states” have been defined in critically ill patients: systemic inflammatory response syndrome (SIRS), sepsis, severe sepsis, and septic shock (40, 46, 47). One can ask if knowing both the PELOD score and the worst septic state observed during PICU stay improves our capacity to predict the risk of death. To address this question, we tested the hypothesis that the risk of death increases with the PELOD score and with the severity of the worst septic state and that this increase is higher if one takes into account both the PELOD score and the worst septic state (48). Septic states were prospectively recorded during the development study of the PELOD score. The hazard ratios of death were 7.43 for SIRS and sepsis (we combined hazard ratios of SIRS and sepsis because they were similar), 27.40 for severe sepsis, and 61.40 for septic shock. If we took into account the hazard ratio of the PELOD score ($1.096^{\text{PELOD points}}$), the adjusted hazard ratios became 9.04 for SIRS/sepsis, 18.8 for severe sepsis, and 32.6 for septic shock. The combined hazard ratio (HR) of death was calculated with the following equation: $\mathrm{HR}_{\text{PELOD}}^{\text{PELOD points}} \times \mathrm{HR}_{\text{septic state}}$. For example, a patient in severe sepsis presents with a PELOD score of 24. The combined hazard ratio for this patient would be 169.6 ($1.096^{24} \times 18.8$). This is significantly higher than the hazard ratio of 27.40 that we got for severe sepsis alone, without any reference to the PELOD score. Therefore, we can conclude that there is some accrual in the information collected if one takes into account both the PELOD score and the worst septic state of a group of critically ill children.
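The arithmetic of the combined hazard ratio can be checked with a short sketch. The hazard ratios are those quoted in the paragraph above; the function and constant names are hypothetical and serve only this illustration.

```python
# Hazard ratio per PELOD point, as quoted in the text.
HR_PER_PELOD_POINT = 1.096

# Adjusted hazard ratios of the worst septic state, as quoted in the text.
ADJUSTED_HR_SEPTIC_STATE = {
    "SIRS/sepsis": 9.04,
    "severe sepsis": 18.8,
    "septic shock": 32.6,
}


def combined_hazard_ratio(pelod_score, septic_state):
    """Hazard ratio per PELOD point raised to the PELOD score,
    multiplied by the adjusted hazard ratio of the septic state."""
    return HR_PER_PELOD_POINT ** pelod_score * ADJUSTED_HR_SEPTIC_STATE[septic_state]


# Worked example from the text: severe sepsis with a PELOD score of 24.
example = combined_hazard_ratio(24, "severe sepsis")
```

The value obtained for the worked example is close to the 169.6 quoted in the text; the small difference comes from rounding.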
PELOD Score as Surrogate Outcome.
Can the PELOD score be used as a surrogate outcome of death in the PICU? Presently, it is the incidence rate of death in ICUs that is considered the standard outcome measure for clinical trials run in critically ill adults. The main justification for such a choice is that death is hard data and that it is quite difficult to bias such outcome. The feasibility of a randomized clinical trial run in PICUs may be a problem if one chooses mortality rate as the primary outcome measure because death is a rare event in PICUs: death rate is between 20% and 40% in adult ICUs, whereas it is 4% to 6% in PICUs (7). Such a low rate of death should increase significantly the sample size required to complete pediatric trials. Indeed, the number of patients required may be so huge that it could be impossible to collect them. The incidence rate of MODS in PICUs is about 18% to 25% (14, 48); this is significantly higher than the death rate. Thus, it makes sense to ask the question: can the PELOD score be a good surrogate outcome of death for clinical studies run in PICUs?
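The effect of a low event rate on sample size can be illustrated with a rough sketch. It uses the usual normal-approximation formula for comparing two proportions and assumes a two-sided alpha of 0.05, a power of 80%, and a 25% relative reduction of the event rate; these design parameters, the function name, and the baseline rates of 5% and 20% (chosen within the ranges cited above) are assumptions made for illustration only.

```python
import math
from statistics import NormalDist


def patients_per_group(baseline_rate, relative_reduction, alpha=0.05, power=0.80):
    """Approximate sample size per arm to compare two proportions
    (normal approximation, two-sided test, equal group sizes)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1.0 - relative_reduction)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1.0 - p1) + p2 * (1.0 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Death as the outcome (assumed baseline rate of 5%).
n_death = patients_per_group(0.05, 0.25)
# MODS as a dichotomous outcome (assumed baseline rate of 20%).
n_mods = patients_per_group(0.20, 0.25)
```

Under these assumptions, the trial that uses death as its outcome needs several times more patients per group than the trial that uses MODS.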
By definition, a surrogate outcome is an outcome measure used instead of the gold standard. Its relationship to the gold standard must be good; this is the case for MODS and PELOD scores because MODS is present in almost all dying critically ill children. There must be a meaningful cause–effect relationship between the surrogate outcome and the gold standard. MODS is the cause of death of most critically ill children who die in the PICU (14). The best way to check if there is really a cause–effect relationship between MODS or PELOD score and death would be to test the hypothesis that improving MODS or PELOD score decreases death rate; this remains to be done. The prevalence or incidence rate of the surrogate outcome should be significantly greater than it is for the gold standard; the incidence rate of MODS is at least three to four times the death rate in PICUs. Moreover, the PELOD score is a quantitative scale, not a dichotomous variable; therefore, using the PELOD score must decrease even more the sample size required to complete randomized clinical trials. The scientific value of a surrogate outcome must be well estimated; this is the case of the PELOD score.
There are pros and cons of using a surrogate outcome. Surrogate outcomes are frequently misused (49). For example, it might look appropriate to use as a surrogate outcome the blood level of protein C in a randomized clinical trial on the efficacy of activated protein C. This is inappropriate because the link between the blood level and clinically relevant outcome like survival or severity of MODS is not well established. Actually, the blood level of activated protein C would be a good measure of the compliance to the research protocol, but it cannot be considered as a reliable surrogate outcome of death. On the other hand, a surrogate outcome can be better than the gold standard in some circumstances. As stated by Zygun et al. (50), “death cannot predict the neurologic outcome of severe brain injury, while there are data suggesting that a MOD [Multiple Organ Dysfunction] score can in critically ill adults with severe brain injury.” That there is such relationship between the PELOD score and neurologic outcome or the quality of life of critically ill children who survive a PICU stay remains to be determined. On the other hand, it is clear that using a severity scale like the PELOD score as a surrogate outcome of death should have indeed a great impact on the sample size of clinical trials undertaken in PICUs. Thus, it makes sense to use the PELOD score as a surrogate outcome in some randomized clinical trial run in PICUs.
The PELOD and the daily PELOD scores seem to be valid measures of the severity of illness in the PICU. The delta PELOD score (ΔPELOD equals the worst PELOD after randomization minus daily PELOD score at day of randomization) may also be useful to chart the course of critical illness. Actually, all three measures (PELOD, daily PELOD, and ΔPELOD) would probably be good surrogate outcome measures of death in PICUs in phase II and phase III randomized clinical trials (51). We also showed that the risk of death during PICU stay is better predicted by taking into account both the hazard ratio of death of the PELOD score and the hazard ratio of the worst septic state observed during PICU stay. On the other hand, we do not know if the PELOD score is a good surrogate outcome measure of long-term morbidity and mortality after a stay in a PICU.
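The ΔPELOD defined above can be written as a small helper. This is a sketch under two assumptions that the text does not spell out: that "worst" means the highest daily score, and that "after randomization" excludes the day of randomization itself. The function name and the sample values are hypothetical.

```python
def delta_pelod(daily_scores, randomization_day=0):
    """Worst (highest) daily PELOD score after randomization minus the
    daily PELOD score on the day of randomization."""
    baseline = daily_scores[randomization_day]
    after = daily_scores[randomization_day + 1:]
    if not after:
        raise ValueError("no daily score recorded after randomization")
    return max(after) - baseline


# Hypothetical daily PELOD scores, starting on the day of randomization.
example_delta = delta_pelod([11, 21, 12, 2])
```

A positive value indicates that organ dysfunction worsened after randomization; a negative value indicates that every later day was better than the day of randomization.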
Is the PELOD system working? Only experience will show what is the external validity of the PELOD score. The score was described in 2003. It is presently used in many ongoing multiple-center clinical studies of the Pediatric Acute Lung Injury and Sepsis Investigators (PALISI) Network and of the Canadian Critical Care Trials Group (CCCTG). For example, the PELOD score and the daily PELOD scores from randomization to death or discharge from PICU are used in the Transfusion Requirements in Pediatric Intensive Care Units (TRIPICU) study, a multiple-center, randomized, clinical trial driven by the CCCTG and the PALISI Network. More experience with the PELOD score is required before one can conclude that it is really useful and reliable.
Composite scoring systems can be used to predict or to describe outcome. Prognostic scores, like PRISM and PIM, are validated to maximize prediction of severity of illness in PICUs. In this instance, data are collected at baseline. The PELOD score measures severity of illness after baseline. It was validated to maximize the description of clinical course of critically ill children. In this instance, data are collected from baseline to death or discharge from the PICU. In the case of septic children, the descriptive capacity of the PELOD score is even better when one takes into account the score and the worst septic state of the patient (SIRS/sepsis, severe sepsis, or septic shock) (48).
All scores discussed in this article (PRISM, PIM, PELOD) were very well validated with respect to short-term outcome (death in PICUs). Despite this, their limitations must be recognized. General scores like the PRISM and PIM scores may not be applicable to populations in a geographic region different from those where the scores were validated (52–54) or to patients observed in different sites (i.e., pre-ICU (55)). The validity of these scores outside the ICU and in countries outside North America, Australia, and some European countries remains to be determined. The value of these scores must also be tested before they are used to describe patients with specific disease. Moreover, mortality changes with time in PICUs (56); such secular trend can change the discrimination and the calibration of these scores with respect to mortality, which means that these scores must be updated and reevaluated regularly.
No score has been validated specifically in critically ill children in SIRS or sepsis. However, these scores were validated in studies in which consecutive patients were included; the frequency of SIRS and septic states is so high in PICUs (between 82% (14) and 87% (48)) that these scores are probably applicable to patients in SIRS or sepsis.
None of the scores discussed in this article were validated to predict or to describe long-term outcomes, like mortality or morbidity observed after PICU stay. We need to know what is the epidemiology of post-ICU mortality and morbidity/disability in children. In other words, what mortality and severe morbidity can be attributed to ICU-related events in children must be better delineated. Moreover, we must find predictors that can be detected in the ICU of post-ICU mortality and morbidity. Such predictors would be useful while running clinical trials planned to improve not only short-term morbidity (MODS) and mortality (death in the ICU) but also long-term morbidity and mortality.
Predictive scores can also be used to select patients to be included into randomized clinical trials on sepsis. As suggested by Cohen et al. (57), the chance that a given intervention improves the outcome of mild or extremely severe cases of sepsis (Fig. 4, zones A and C) is so weak that it is thought by some experts that such cases must be excluded from clinical trials. Civetta (58) wrote, “It is clear that the group providing most of the medical, ethical, and financial problems lies right in the middle of the spectrum of all indices.” Predictive scores (PRISM III, PIM-2) can be used to retain only patients with a significant risk of adverse outcome and with a significant chance of recovery (Fig. 4, zone B). In other words, we recommend that the target group for enrollment in randomized clinical trials on sepsis in PICUs be limited to patients for whom there is the greatest chance of demonstrating a treatment effect, and we are suggesting use of a predictive score to screen and to select these patients.
It may be even better to use a staging system to stratify the critically ill children included in randomized clinical trials on sepsis (those in zone B of Fig. 4). The IRO system advocated by Marshall et al. (59) can be a good choice. In this system, I stands for insult (infection, inflammation caused by trauma, etc.), R stands for response, and O stands for organ dysfunction. Response (R) can be described by baseline severity of illness (which can be estimated with scores like PRISM III or PIM-2), genetic variability, and the presence or severity of the therapeutic target. The magnitude of organ dysfunction (O) can be described with the PELOD score, with or without the septic state (48).
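As a rough sketch of how such a staging could be used for stratification, each patient can be assigned a stratum made of an insult category, a response band, and an organ dysfunction band. Everything numeric here is an assumption made for illustration: the cut-offs in `RESPONSE_CUTOFFS` and `ORGAN_CUTOFFS` and the band labels are hypothetical, and response is reduced to baseline predicted mortality alone, leaving out genetic variability and the therapeutic target.

```python
from bisect import bisect_right
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical cut-offs and labels, chosen only to make the example concrete.
RESPONSE_CUTOFFS = (0.10, 0.30)   # on baseline predicted risk of mortality
ORGAN_CUTOFFS = (10, 20)          # on the PELOD score
BAND_LABELS = ("low", "moderate", "high")


@dataclass(frozen=True)
class IROProfile:
    insult: str             # I: e.g. "infection" or "trauma"
    response_risk: float    # R: baseline predicted mortality (PRISM III or PIM-2)
    organ_score: int        # O: PELOD score


def band(value, cutoffs):
    """Map a value to a band label; a value equal to a cut-off goes to the higher band."""
    return BAND_LABELS[bisect_right(cutoffs, value)]


def stratum(profile: IROProfile):
    """Return the (insult, response band, organ dysfunction band) stratum."""
    return (
        profile.insult,
        band(profile.response_risk, RESPONSE_CUTOFFS),
        band(profile.organ_score, ORGAN_CUTOFFS),
    )


def group_by_stratum(profiles):
    """Group patient profiles by stratum, e.g. for stratified randomization."""
    groups = defaultdict(list)
    for p in profiles:
        groups[stratum(p)].append(p)
    return dict(groups)


patients = [
    IROProfile("infection", 0.05, 3),
    IROProfile("infection", 0.20, 15),
    IROProfile("trauma", 0.50, 25),
]
for key, members in group_by_stratum(patients).items():
    print(key, len(members))
```

Randomization could then be carried out within each stratum, so that the treatment arms are comparable on insult, response, and organ dysfunction at time 0.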
Notwithstanding the limitations listed above, one can conclude that the accuracy, precision, and reliability of at least two predictive scores (PRISM III and PIM-2) and one descriptive score (PELOD score) are quite well established. Predictive scores are frequently used to check whether randomization worked and whether the groups of a randomized clinical trial are balanced at time 0, whereas descriptive scores can be used in the same setting to describe changes after time 0 in the clinical status of trial participants. Both predictive and descriptive scores of severity of illness may be useful to select patients to be included in randomized clinical trials of sepsis. All these scores need frequent revalidation and updating because case mix and risk of mortality in ICUs change with time.
We thank members of the Pediatric Acute Lung Injury and Sepsis Investigators (PALISI) Network and of the Canadian Critical Care Trials Group (CCCTG) for their support and Francis Leclerc and Stéphane Leteurtre for their collaboration.
1. Yeh TS, Pollack MM, Ruttimann UE, et al: Validation of a physiologic stability index for use in critically ill infants and children. Pediatr Res 1984; 18:445–451
2. Pollack MM, Ruttimann UE, Getson PR: Pediatric risk of mortality (PRISM) score. Crit Care Med 1988; 16:1110–1116
3. Pollack MM, Patel KM, Ruttimann UE: PRISM III: An updated pediatric risk of mortality score. Crit Care Med 1996; 24:743–752
4. Shann F, Pearson G, Slater A, et al: Paediatric index of mortality (PIM): A mortality prediction model for children in intensive care. Intensive Care Med 1997; 23:201–207
5. Slater A, Shann F, Pearson G, et al: PIM2: A revised version of the Paediatric Index of Mortality. Intensive Care Med 2003; 29:278–285
6. Leteurtre S, Martinot A, Duhamel A, et al: Development of a pediatric multiple organ dysfunction score: Use of two strategies. Med Decis Making 1999; 19:399–410
7. Leteurtre S, Martinot A, Duhamel A, et al: Validation of the pediatric logistic organ dysfunction (PELOD) score: A prospective multicenter study. Lancet 2003; 362:192–197
8. The CRIB (clinical risk index for babies) score: A tool for assessing initial neonatal risk and comparing performance of neonatal intensive care units. The International Neonatal Network. Lancet 1993; 342:193–198
9. Parry G, Tucker J, Tarnow-Mordi W, et al: CRIB II: An update of the clinical risk index for babies score. Lancet 2003; 361:1789–1791
10. Richardson DK, Gray JE, McCormick MC, et al: Score for Neonatal Acute Physiology: A physiologic severity index for neonatal intensive care. Pediatrics 1993; 91:617–623
11. Richardson DK, Corcoran JD, Escobar GJ, et al: SNAP-II and SNAPPE-II: Simplified newborn illness severity and mortality risk scores. J Pediatr 2001; 138:92–100
12. Kong A, Barnett GO, Mosteller F, et al: How medical professionals evaluate expressions of probability. N Engl J Med 1986; 315:740–744
13. Wilkinson JD, Pollack MM, Glass NL, et al: Mortality associated with multiple organ system failure and sepsis in pediatric intensive care unit. J Pediatr 1987; 111:324–328
14. Proulx F, Fayon M, Farrell CA, et al: Epidemiology of sepsis and multiple organ dysfunction syndrome in children. Chest 1996; 109:1033–1037
15. Pollack MM, Ruttimann UE, Getson PR: Accurate prediction of the outcome of pediatric intensive care: A new quantitative method. N Engl J Med 1987; 316:134–139
16. Ruttimann UE, Albert A, Pollack MM, et al: Dynamic assessment of severity of illness in pediatric intensive care. Crit Care Med 1986; 14:215–221
17. Ruttimann UE, Pollack MM: Objective assessment of changing mortality risks in pediatric intensive care unit patients. Crit Care Med 1991; 19:474–483
18. Tibby SM, Murdoch IA: Calibration of the paediatric index of mortality score for UK paediatric intensive care. Arch Dis Child 2002; 86:65–69
19. Slater A, Shann F: The suitability of PIM, PIM2, PRISM and PRISM III for monitoring the quality of pediatric intensive care in Australia and New Zealand: For the ANZICS Paediatric Study Group. Pediatr Crit Care Med 2004; 5:447–454
20. Tissieres P, Prontera W, Chevret L, et al: The pediatric risk of mortality score in infants and children with fulminant liver failure. Pediatr Transplant 2003; 7:64–68
21. Fargason CA, Langman CB: Limitations of the pediatric risk of mortality score in assessing children with acute renal failure. Pediatr Nephrol 1993; 7:703–707
22. Gonzalez-Luis G, Pons M, Cambra FJ, et al: Use of the Pediatric Risk of Mortality Score as predictor of death and serious neurologic damage in children after submersion. Pediatr Emerg Care 2001; 17:405–409
23. Ben Abraham R, Toren A, Ono N, et al: Predictors of outcome in the pediatric intensive care units of children with malignancies. J Pediatr Hematol Oncol 2002; 24:23–26
24. Schneider DT, Lemburg P, Sprock I, et al: Introduction of the oncological pediatric risk of mortality score (O-PRISM) for ICU support following stem cell transplantation in children. Bone Marrow Transplant 2000; 25:1079–1086
25. Schneider DT, Cho J, Laws HJ, et al: Serial evaluation of the oncological pediatric risk of mortality (O-PRISM) score following allogeneic bone marrow transplantation in children. Bone Marrow Transplant 2002; 29:383–389
26. Cartwright K, Kroll S: Optimising the investigation of meningococcal disease. BMJ 1997; 315:757–758
27. Stiehm ER, Damrosch DS: Factors in the prognosis of meningococcal infection: Review of 63 cases with emphasis on recognition and management of the severely ill patient. J Pediatr 1966; 68:457–467
28. Niklasson PM, Lundbergh P, Strandell T: Prognostic factors in meningococcal disease. Scand J Infect Dis 1971; 3:17–25
29. Kahn A, Blum D: Factors for poor prognosis in fulminating meningococcemia: Conclusions from observations of 67 childhood cases. Clin Pediatr (Phila) 1978; 17:680–682
30. Lewis LS: Prognostic factors in acute meningococcaemia. Arch Dis Child 1979; 54:44–48
31. Ansari BM, Davies DB, Boyce JM: A comparative study of adverse factors in meningococcaemia and meningococcal meningitis. Postgrad Med J 1979; 55:780–783
32. Leclerc F, Beuscart R, Guillois B, et al: Prognostic factors of severe infectious purpura in children. Intensive Care Med 1985; 11:140–143
33. Flaegstad T, Kaaresen PI, Stokland T, et al: Factors associated with fatal outcome in childhood meningococcal disease. Acta Paediatr 1995; 84:1137–1142
34. Gedde-Dahl TW, Bjark P, Hoiby EA, et al: Severity of meningococcal disease: Assessment by factors and scores and implications for patient management. Rev Infect Dis 1990; 12:973–992
35. Riordan FA, Marzouk O, Thomson AP, et al: Prospective validation of the Glasgow Meningococcal Septicaemia Prognostic Score: Comparison with other scoring methods. Eur J Pediatr 2002; 161:531–537
36. Van der Kaay DC, De Kleijn ED, De Rijke YB, et al: Procalcitonin as a prognostic marker in meningococcal disease. Intensive Care Med 2002; 28:1606–1612
37. Castellanos-Ortega A, Delgado-Rodriguez M: Comparison of the performance of two general and three specific scoring systems for meningococcal septic shock in children. Crit Care Med 2000; 28:2967–2973
38. Leteurtre S, Leclerc F, Martinot A, et al: Can generic scores (Pediatric Risk of Mortality and Pediatric Index of Mortality) replace specific scores in predicting the outcome of presumed meningococcal septic shock in children? Crit Care Med 2001; 29:1239–1246
39. Wilkinson JD, Pollack MM, Ruttimann UE, et al: Outcome of pediatric patients with multiple organ system failure. Crit Care Med 1986; 14:271–274
40. Goldstein B, Giroir B, Randolph A: International Pediatric Severe Sepsis Consensus Conference: Definitions for sepsis and organ dysfunction in pediatrics. Pediatr Crit Care Med 2005; 6:2–8
41. Brilli RJ, Goldstein B: Pediatric sepsis definitions: Past, present, and future. Pediatr Crit Care Med 2005; 6(Suppl):S6–8
42. Marshall JC, Cook DJ, Christou NV, et al: Multiple organ dysfunction score: A reliable descriptor of a complex clinical outcome. Crit Care Med 1995; 23:1638–1652
43. Le Gall JR, Klar J, Lemeshow S, et al: The Logistic Organ Dysfunction system: A new way to assess organ dysfunction in the intensive care unit. JAMA 1996; 276:802–810
44. Vincent JL, Moreno R, Takala J, et al: The SOFA (Sepsis-related Organ Failure Assessment) score to describe organ dysfunction/failure. Intensive Care Med 1996; 22:707–710
45. Jones J, Hunter D: Consensus methods for medical and health services research. BMJ 1995; 311:376–380
46. American College of Chest Physicians/Society of Critical Care Medicine Consensus Conference: Definitions for sepsis and organ failure and guidelines for the use of innovative therapies in sepsis. Crit Care Med 1992; 20:864–874
47. Hayden WR: Sepsis terminology in pediatrics. J Pediatr 1994; 124:657–658
48. Leclerc F, Leteurtre S, Duhamel A, et al: Cumulative influence of organ dysfunctions and septic state on mortality of critically ill children. Am J Respir Crit Care Med 2005; 171:348–353
49. Fleming TR, DeMets DL: Surrogate end points in clinical trials: Are we being misled? Ann Intern Med 1996; 125:605–613
50. Zygun D, Kortbeek JB, Fick GH, et al: Non-neurological organ dysfunction in severe traumatic brain injury. Crit Care Med 2005; 33:654–660
51. Marshall JC: Charting the course of critical illness: Prognostication and outcome description in the intensive care unit. Crit Care Med 1999; 27:676–678
52. Wells M, Riera-Fanego JF, Luyt DK, et al: Poor discriminatory performance of the Pediatric Risk of Mortality (PRISM) score in a South African intensive care unit. Crit Care Med 1996; 24:1507–1513
53. Wang JN, Wu JM, Chen YJ: Validity of the updated pediatric risk of mortality score (PRISM III) in predicting the probability of mortality in a pediatric intensive care unit. Acta Paediatr Taiwan 2001; 42:333–337
54. Deerojanawong J, Prapphal N, Udomittipong K: PRISM score and factors predicting mortality of patients with respiratory failure in the pediatric intensive care unit. J Med Assoc Thai 2001; 84(Suppl 1):S68–S75
55. Orr RA, Venkataraman ST, Cinoman MI, et al: Pretransport Pediatric Risk of Mortality (PRISM) score underestimates the requirement for intensive care or major interventions during interhospital transport. Crit Care Med 1994; 22:101–107
56. Tilford JM, Roberson PK, Lensing S, et al: Differences in pediatric ICU mortality risk over time. Crit Care Med 1998; 26:1737–1743
57. Cohen J, Guyatt G, Bernard GR, et al: New strategies for clinical trials in patients with sepsis and septic shock. Crit Care Med 2001; 29:880–886
58. Civetta JM: Scoring systems: Do we need a different approach? Crit Care Med 1991; 19:1460–1461
59. Marshall JC, Vincent JL, Fink MP, et al: Measures, markers, and mediators: Toward a staging system for clinical sepsis. A report of the Fifth Toronto Sepsis Roundtable, Toronto, Ontario, Canada, October 25–26, 2000. Crit Care Med 2003; 31:1560–1567
©2005 The Society of Critical Care Medicine and the World Federation of Pediatric Intensive and Critical Care Societies