Critically ill patients admitted to an ICU face a substantial risk of death (1); thus, all-cause mortality is a common measure of treatment effect in ICU-based randomized clinical trials (RCTs) (2–4). Mortality is patient-centered, objective, and definitive, integrating the net benefits and harms of life-sustaining therapies (5).
Because of ICU support, however, the risk of dying unfolds over time and is impacted by factors other than the immediate acute illness. Death sometimes results from a catastrophic event—devastating brain injury, uncontrollable hemorrhage, or cardiac arrest—and occurs early during the ICU course (6). More commonly, it is the outcome of a contingent series of complications of the presenting illness that lead to a decision to discontinue life support measures (7). Both the timing of this decision and the likelihood of making it are influenced by factors such as age, comorbidities, prior wishes, and cultural and religious mores (8).
The interpretation of a mortality signal, or its absence, in a clinical trial can also be uncertain (9). Well-conducted and adequately powered clinical trials using survival as a measure of efficacy have yielded divergent conclusions about interventions such as the use of corticosteroids (10,11), the management of hyperglycemia (12,13), and the use of adjuvant therapies for sepsis (14,15), leading to pessimism about “negative” trials in the ICU setting (16).
Much about the performance of an all-cause mortality endpoint in clinical trials in critical illness is unknown, or based on untested assumptions, leaving important questions unanswered. When do these deaths occur, and how does this vary with the population studied? When an intervention shows benefit or harm, when is that differential effect maximal, over what time frame is it evident, and for how long does it persist? How do landmark time point estimates (e.g., mortality at day 28) differ from process-based endpoints (e.g., survival to hospital discharge)? And when a mortality signal is demonstrated in a clinical trial, is that information incorporated into clinical practice?
We sought to understand the performance properties of all-cause mortality as a measure of treatment effect in ICU-based RCTs and to address these questions through a systematic review and meta-analysis of large RCTs conducted over the past 3 decades.
We conducted a systematic review of RCTs that recruited critically ill adult patients and were published in one of five high-impact general medical journals (New England Journal of Medicine, JAMA, Lancet, Annals of Internal Medicine, and British Medical Journal) or eight subspecialty journals (American Journal of Respiratory and Critical Care Medicine, Critical Care Medicine, Intensive Care Medicine, Chest, Critical Care, Journal of Critical Care, Shock, and Journal of Trauma, Infection, and Critical Care) between 1990 and 2018; the search strategy is summarized in the Supplemental Methods (https://links.lww.com/CCM/H244).
We included trials that enrolled at least 100 patients and that reported mortality outcomes at two or more prespecified time points: ICU discharge, hospital discharge, 7 days (range, 5–10 d), 14 days, 28 or 30 days, 60 days, 90 days, 180 days, and 1 year or more. Risk of bias was adjudicated using the Cochrane Collaboration’s published tool (17).
We extracted the following data from each study: year of publication; trial size; study population (sepsis, acute respiratory distress syndrome [ARDS], or other); risk of bias (allocation concealment, blinding of outcome assessors, number of centers, whether analysis was by intention to treat, whether the trial was stopped early for benefit, and number of randomized patients lost to follow-up); duration of ICU and hospital length of stay; and mortality outcomes (deaths in each randomized group at each prespecified time point). Mortality outcomes for each time point were recorded when reported explicitly or estimated from survival curves.
Conceptual Framework and Data Analysis
We hypothesized that by analyzing the trajectory of mortality data from multiple trials, pooled using meta-analytic techniques, we could identify a time point where a differential signal was maximal and one beyond which a differential effect was no longer evident.
To determine the temporal course of a “differential” mortality signal, we calculated the change in mortality over successive time intervals (7, 14, 28–30, 60, 90, and 180 days) for all trials where these data were available. For each trial, the incremental absolute risk difference (RD) is the difference between study groups A and B in the change in mortality risk between successive time points, where A was the intervention with the higher mortality at the first time point at which mortality was measured:
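The equation can be written as follows. This form is reconstructed from the definitions of the numerators and denominators given in the text that follows, so the notation is an assumption rather than a verbatim copy of the authors' display:

```latex
\[
\mathrm{RD}_{\text{incremental}}
  = \frac{D_{2,A} - D_{1,A}}{N_A - D_{1,A}}
  - \frac{D_{2,B} - D_{1,B}}{N_B - D_{1,B}}
\]
```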
Where D2,A and D2,B are the cumulative number of deaths at the second time point, D1,A and D1,B are the cumulative number of deaths at the first time point, and NA and NB are the total numbers of patients randomized to study groups A and B. The numerators are the number of additional deaths that occurred between the two time points in each randomized group, while the denominators reflect the number of patients alive at the earlier time point. Further details and an example are available in the Supplemental Methods (https://links.lww.com/CCM/H244).
Because we aggregated data from studies evaluating multiple interventions in heterogeneous patient populations, we calculated the incremental RD at the trial level and pooled these estimates using a meta-analytic random effects model (18). We analyzed data using Review Manager (RevMan Version 5.2; Cochrane Collaboration, Oxford, United Kingdom). Statistical heterogeneity among trials was assessed using both the I² statistic (19) and tau, the SD of the between-study distribution of effects derived from random effects meta-analysis. We conducted sensitivity analyses in three subgroups: RCTs enrolling patients with sepsis or systemic inflammatory response syndrome, RCTs enrolling patients with acute lung injury (ALI) or ARDS, and RCTs enrolling patients without either of these conditions. We used the Z test to calculate interaction p values comparing RDs between the three pairs of these subgroups. To study the potential confounding effects of transfer to long-term acute care facilities in the United States on location-based mortality measures, for comparisons involving ICU or hospital mortality, we stratified RCTs by the proportion of enrolled patients originating from the United States versus other countries, categorized as high (> 90%), intermediate (10–90%), or low (< 10%) percentage of Americans.
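The pooling steps described above can be sketched in Python. This is an illustration only, not the RevMan implementation: it assumes the standard DerSimonian-Laird moment estimator for the between-study variance, a binomial variance for each trial's incremental RD with no continuity correction, and a simple two-subgroup Z test. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def incremental_rd(d1_a, d2_a, n_a, d1_b, d2_b, n_b):
    """Incremental risk difference (A minus B) between two time points.

    d1_*, d2_* are cumulative deaths at the earlier and later time point;
    n_* are the numbers randomized. Returns (rd, variance).
    Assumption: binomial variance, no continuity correction.
    """
    at_risk_a = n_a - d1_a  # patients alive at the earlier time point
    at_risk_b = n_b - d1_b
    risk_a = (d2_a - d1_a) / at_risk_a
    risk_b = (d2_b - d1_b) / at_risk_b
    variance = (risk_a * (1 - risk_a) / at_risk_a
                + risk_b * (1 - risk_b) / at_risk_b)
    return risk_a - risk_b, variance


def pool_random_effects(effects, variances):
    """DerSimonian-Laird random effects pooling with tau and I-squared."""
    weights = [1 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed effect estimate
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    re_weights = [1 / (v + tau2) for v in variances]
    pooled = (sum(w * y for w, y in zip(re_weights, effects))
              / sum(re_weights))
    se = math.sqrt(1 / sum(re_weights))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {"pooled": pooled, "se": se, "tau": math.sqrt(tau2), "i2": i2}


def interaction_p(est_1, se_1, est_2, se_2):
    """Two-sided p value from a Z test comparing two subgroup estimates."""
    z = (est_1 - est_2) / math.sqrt(se_1 ** 2 + se_2 ** 2)
    return 2 * (1 - NormalDist().cdf(abs(z)))
```

For a hypothetical trial with 110 patients per arm, 10 deaths in each arm at the first time point, and 20 versus 15 cumulative deaths at the second, `incremental_rd(10, 20, 110, 10, 15, 110)` gives an incremental RD of 0.05, because 10 of 100 survivors died in arm A against 5 of 100 in arm B.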
We further evaluated trial impact by analyzing whether the conclusions resulted in regulatory approval of a novel therapy or incorporation into clinical practice guidelines, including the Surviving Sepsis Campaign guidelines (20), the European Respiratory Society guidelines on noninvasive ventilation (21), the American Heart Association/American Stroke Association recommendations for the management of cerebral infarction (22), the Brain Trauma Foundation guidelines for the management of severe traumatic brain injury (23), and the Kidney Disease Improving Global Outcomes guidelines for the management of acute renal failure (24).
Baseline data are presented as means and SDs, or medians and interquartile ranges (IQRs). Differences in means were assessed using an unpaired t test, and differences in medians using the Mann-Whitney U test. We considered a two-sided p value of less than 0.05 statistically significant.
There was no funding source for this study.
From 2,592 citations identified through the search strategy, we retrieved 813 articles for a detailed review. After excluding trials that reported mortality at a single time point only, trials that randomized fewer than 50 patients per group, and cluster (n = 13), pediatric (n = 16), and neonatal RCTs (n = 26), we included 346 publications in our analysis (Fig. 1). These 346 publications reported 343 separate RCTs comparing 350 trial interventions (for details see Supplemental Table 1, https://links.lww.com/CCM/H244), and recruited 228,784 patients. Both the number of trials and the duration of follow-up within each trial increased over each decade (Supplemental Fig. 1, https://links.lww.com/CCM/H244).
Most trials (78%) were multicenter studies; they recruited a median of 324 subjects (IQR, 188–735.5; Table 1). Risk of bias was low; most trials reported allocation concealment (93%), use of an intention to treat analysis (99%), recruitment of planned sample size (94%), and a loss to follow-up of less than 5% (91%). Outcome assessment was blinded in 47% of included RCTs.
TABLE 1. Characteristics of 343 Included Trials
n (%; IQR)
|Trial size (median number of subjects [IQR])
| Acute respiratory distress syndrome/acute lung injury
|Follow-up (median [IQR])
| ICU discharge (n = 75)
| Hospital discharge (n = 79)
| 7 d (n = 231; four reported at 5, 6, 8, or 10 d)
| 14 d (n = 228; two reported at 15 d)
| 28–30 d (n = 268)
| 60 d (n = 135; two reported at 56 d)
| 90 d (n = 143)
| 180 d (n = 69; one reported at 140 d)
| 1 yr (n = 24; two reported at 2 and 2.4 yr, respectively)
| Low (< 10% of patients [or sites])
| Intermediate (10–90% of patients [or sites])
| High (> 90% of patients [or sites])
IQR = interquartile range.
Mortality was the primary outcome measure in 50% of the trials and was most commonly measured at 28–30 days after randomization (268/343 [78%] of the included RCTs; Table 1). Mortality was explicitly reported at days 28–30 in 214 trials (80%); it was estimated from survival curves in the remaining 54 (20%) trials. Mortality could be ascertained at 7 days in 231 studies (36 by direct report and 195 estimated from survival curves), at 14 days in 228 studies (44 by direct report, 184 from survival curves), at 60 days in 135 studies (57 by direct report, 78 from survival curves), at 90 days in 143 studies (109 by direct report, 34 from survival curves), at 180 days in 69 studies (63 by direct report, six from survival curves), and at 1 year in 24 studies (22 by direct report). Unadjusted 28–30-day control group mortality declined from a median of 38% in the 1990s to 27% in the 2010s (p < 0.001). Similar declines in control group mortality occurred in trials recruiting patients with sepsis, ALI/ARDS, and other diagnoses (Supplemental Fig. 2, https://links.lww.com/CCM/H244).
The Temporal Trajectory of Mortality
We analyzed time to death among patients who died during trial follow-up. Most deaths occurred early following randomization: approximately one-third of all patients who had died by 180 days had died before day 7, and one-half by day 14 (Fig. 2). On average, 77% (IQR, 67–86%) of in-hospital deaths, and 71% of deaths occurring by 90 days, occurred prior to ICU discharge. The median number of days until death was slightly longer for patients recruited to trials of interventions for ARDS; however, the shape of the survival curves was similar for all diagnoses, with an early steep rise in mortality, followed by a more gradual rise through 90 days, and a smaller rise over the ensuing 90 days (Supplemental Fig. 3, https://links.lww.com/CCM/H244).
When Does a Differential Mortality Signal Occur?
The conclusion of an RCT that an intervention is effective rests on demonstration of a statistically significant difference in rates of the primary endpoint. Among the trials included in our study, we sought to determine when after randomization this differential effect was greatest and for how long it persisted. We anticipated that relatively few trials would report significant effects at the conventional threshold (p < 0.05); therefore, we decided a priori to include in our primary analysis trials with a differential mortality effect with a p value of less than 0.2 at one or more time points. As shown in Figure 3, for the 162 trials meeting this criterion, the RD between treatment arms continued to increase through day 60. Further separation between study arms was evident between days 60 and 90; however, the difference was no longer statistically significant (p = 0.07). The differences persisted for at least 6 months and likely up to 1 year, although only eight studies reported follow-up to a full year. We conducted two sensitivity analyses: one using only trials with a differential mortality effect with a p value of less than 0.05 at one or more time points, and one using all trials regardless of p value. The same temporal course in incremental RD was seen in both analyses (Supplemental Fig. 4, https://links.lww.com/CCM/H244). A further sensitivity analysis by trial focus (sepsis, ARDS, or other) showed that the temporal course of the differential treatment effect was similar across diagnoses (Supplemental Fig. 5, https://links.lww.com/CCM/H244).
Both overall mortality and the differential mortality between treatment groups increased between 28–30 and 90 days. For all trials reporting a mortality difference with a p value of less than 0.2 at one or more time points, and for which mortality data were available at both 28–30 and 90 days (n = 72), median mortality in the arm with lower mortality increased from 24% (IQR, 15–31%) at 28–30 days to 31% (IQR, 22–35%) at 90 days, while the differential mortality signal over this time period increased by 0.94% (95% CI, 0.31–1.58%).
Time- Versus Location-Based Mortality Measures
The number of deaths at ICU discharge approximated the number of 28–30-day deaths (95% [IQR, 86–106%]), while the number of deaths at hospital discharge approximated the number of 60-day deaths (99% [IQR, 94–104%]) (Fig. 4). Location-based outcomes (discharge from the ICU or hospital) are potentially confounded by differences in the organization of care, particularly by the widespread availability of long-term ventilator facilities in the United States (25). Consistent with this, the mean ICU length of stay in trials with fewer than 10% American patients was 15 ± 6 days compared with 6 ± 2 days in trials with more than 90% American patients (p < 0.001), and the mean hospital length of stay in trials with fewer than 10% American patients was 27 ± 10 days compared with 15 ± 5 days in trials with more than 90% American patients (p < 0.001).
Clinical Impact of Interventions That Reduce Mortality in Clinical Trials
Of the 350 comparisons in our review, 51 (15%) reported benefit for one study intervention at one or more time points with a p value of less than 0.05, and 15 (4%) reported an increased risk of harm (Supplemental Table 2, https://links.lww.com/CCM/H244). Only 12 of 43 (28%) unique interventions reported to reduce mortality have been incorporated into practice guidelines. A trial (26) that led to a change in the formulation of the sedative agent propofol (https://www.accessdata.fda.gov/drugsatfda_docs/label/2017/019627s066lbl.pdf) was also considered to have impacted practice, although it was not cited in guidelines. With the exception of drotrecogin alfa (activated) (Xigris), which was briefly licensed for the treatment of sepsis (14), none of the seven studies of novel sepsis therapies that reported a mortality benefit (27–32) resulted in the regulatory approval of a new therapy. Similarly, no novel pharmacologic agents have been introduced for the treatment of ARDS, and the benefit apparent in trials of mechanical ventilation has reflected minimization of the harms of positive pressure ventilation (33–36).
We examined the performance characteristics of a mortality endpoint through a systematic review of 343 trials published over 28 years. Our work has four main findings.
First, independent of the diagnostic group studied, mortality in ICU-based trials displays a characteristic trajectory, with half of all deaths occurring in the first two weeks following randomization and three quarters by 28 to 30 days, but with additional deaths continuing to accrue over the next 6 months. On average, aggregate mortality increases from 29% at days 28–30 to 37% at day 90. The time to death is slightly longer in studies of ARDS, but the trajectory is similar. Three factors may explain this convergence. First, ICUs provide support rather than treatment. ICU survival depends on success in weaning from mechanical ventilation and in avoiding complications, such as acute kidney injury or ventilator-associated pneumonia, that are common to all critically ill ICU patients. Second, ICU diagnoses are syndromic, and there is substantial overlap among them. A patient admitted with community-acquired pneumonia may meet criteria for a trial in sepsis or ARDS and also be a candidate for a trial of stress ulcer prophylaxis. Finally, ICU patients typically have other life-limiting comorbidities, and death is less often a result of an inability to provide support than of a conscious acceptance that prolonged support is inappropriate (37). A recent trial showed a spike in deaths at day 29 (38), suggesting that these decisions may be delayed so that they will not impact trial conclusions.
Second, using a meta-analytically derived incremental RD calculation, we found that a differential mortality risk is evident 60 or more days following randomization. The conventional landmark time point for measuring mortality in ICU-based trials has been 28 days, based on Food and Drug Administration guidance from 3 decades ago (39). Although a differential mortality signal occurs early, a longer time horizon appears necessary to capture the full survival impact of ICU-based interventions. A higher event rate and a larger differential event rate at a more distal time point have important implications for sample size calculations. In pooled data from the 72 trials that reported a mortality difference with a p value of less than 0.2, median mortality was 24% and 29% for the lower and higher mortality arms at days 28–30, and 31% and 37% at day 90. To demonstrate these differences at an α level of 0.05 and a power of 80% would require 2,444 patients for the earlier time point compared with 1,954 patients for the later.
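The sample size comparison can be illustrated with a short Python sketch. The method used for the calculation is not stated in the text, so the choices here are assumptions: a two-sided test of two proportions using the normal approximation with a pooled variance under the null hypothesis, equal allocation, and rounding of the per-group size to the nearest whole patient. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def total_sample_size(p_low, p_high, alpha=0.05, power=0.80):
    """Total patients (two equal arms) to detect p_low versus p_high.

    Assumptions: two-sided test, normal approximation, pooled variance
    under the null hypothesis, per-group size rounded to nearest integer.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    p_bar = (p_low + p_high) / 2
    delta = p_high - p_low
    null_term = z_alpha * sqrt(2 * p_bar * (1 - p_bar))
    alt_term = z_beta * sqrt(p_low * (1 - p_low) + p_high * (1 - p_high))
    n_per_group = (null_term + alt_term) ** 2 / delta ** 2
    return 2 * round(n_per_group)


# Median mortality in the lower and higher mortality arms, from the text
day_28 = total_sample_size(0.24, 0.29)
day_90 = total_sample_size(0.31, 0.37)
```

Under these assumptions, `day_28` evaluates to 2444 and `day_90` to 1954. A different formula or rounding convention would shift these totals by a few patients, but the later time point would still need fewer patients because the absolute difference between arms is larger.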
Third, we found that for trials conducted outside the United States, 28–30-day mortality approximates ICU mortality, while 60-day mortality approximates hospital mortality. Follow-up at a landmark time point may require in-person or telephone contact with study subjects, while ICU or hospital discharge can be determined from the hospital record, an important consideration in an era of large, multinational, but modestly funded investigator-initiated trials (40,41).
Finally, although mortality was the primary outcome in the majority of trials in our review, a mortality benefit was inconsistently associated with adoption of the intervention into clinical practice. Conclusions were incorporated into clinical guidelines, or otherwise judged to have changed practice, for only 13 of 43 interventions (30%) showing a mortality reduction. None of these are novel pharmacologic therapies.
Our study has limitations that may impact its interpretation. Even though the overall number of included RCTs was large, some of the binary comparisons were based on very small numbers. In particular, very few U.S.-based RCTs reported ICU or hospital mortality, and very few of the RCTs reported mortality rates beyond 90 days, resulting in low statistical power to detect further separation of survival curves at later time points. We only included trials for which mortality was available at two or more time points, so the sample was not necessarily representative of all large ICU-based trials. Further, in pooling mortality trajectories across a diverse group of interventions and study contexts, we may have obscured subtle differences in course based on where the study was conducted or what was compared.
In summary, incremental differences in mortality rates between intervention and control groups accrue until at least day 60. The number of deaths at hospital discharge is similar to the number of deaths at 60 days, suggesting that hospital discharge mortality can be used as an alternative to day 60 mortality. Finally, although mortality is commonly used as the primary endpoint for ICU trials, interventions for which an effect was seen were inconsistently adopted into clinical practice, suggesting that factors other than evidence of survival benefit impact clinician uptake of trial results (3).
We acknowledge with gratitude the efforts of Wilson Kwong and Kamya Kommaraju who identified and abstracted some of the articles included in this article.
1. Vincent JL, Marshall JC, Namendys-Silva SA, et al.: Assessment of the worldwide burden of critical illness: The intensive care over nations (ICON) audit. Lancet Respir Med. 2014; 2:380–386
2. Petros AJ, Marshall JC, van Saene HKF: Should morbidity replace mortality as an endpoint for clinical trials in intensive care? Lancet. 1995; 345:369–371
3. Gaudry S, Messika J, Ricard JD, et al.: Patient-important outcomes in randomized controlled trials in critically ill patients: A systematic review. Ann Intensive Care. 2017; 7:28
4. Harhay MO, Wagner J, Ratcliffe SJ, et al.: Outcomes and statistical power in adult critical care randomized trials. Am J Respir Crit Care Med. 2014; 189:1469–1478
5. Roth D, Heidinger B, Havel C, et al.: Different mortality time points in critical care trials: Current practice and influence on effect estimates in meta-analyses. Crit Care Med. 2016; 44:e737–e741
6. Orban JC, Walrave Y, Mongardon N, et al.: Causes and characteristics of death in intensive care units: A prospective multicenter study. Anesthesiology. 2017; 126:882–889
7. Cook D, Rocker G, Marshall J, et al.: Withdrawal of mechanical ventilation in anticipation of death in the intensive care unit. N Engl J Med. 2003; 349:1123–1132
8. Cook DJ, Guyatt GH, Jaeschke R, et al.: Determinants in Canadian health care workers of the decision to withdraw life support from the critically ill. JAMA. 1995; 273:703–708
9. Veldhoen RA, Howes D, Maslove DM: Is mortality a useful primary end point for critical care trials? Chest. 2020; 158:206–211
10. Annane D, Sebille V, Charpentier C, et al.: Effect of treatment with low doses of hydrocortisone and fludrocortisone on mortality in patients with septic shock. JAMA. 2002; 288:862–871
11. Sprung CL, Annane D, Keh D, et al.: Hydrocortisone therapy for patients with septic shock. N Engl J Med. 2008; 358:111–124
12. Van den Berghe G, Wouters P, Weekers F, et al.: Intensive insulin therapy in the surgical intensive care unit. N Engl J Med. 2001; 345:1359–1367
13. Finfer S, Chittock DR, Su SY, et al.: Intensive versus conventional glucose control in critically ill patients. N Engl J Med. 2009; 360:1283–1297
14. Bernard GR, Vincent J-L, Laterre PF, et al.: Efficacy and safety of recombinant human activated protein C for severe sepsis. N Engl J Med. 2001; 344:699–709
15. Ranieri VM, Thompson BT, Barie PS, et al.: Drotrecogin alfa (activated) in adults with septic shock. N Engl J Med. 2012; 366:2055–2064
16. Vincent JL, Marini JJ, Pesenti A: Do trials that report a neutral or negative treatment effect improve the care of critically ill patients? No. Intensive Care Med. 2018; 44:1989–1991
17. Higgins JP, Altman DG, Gotzsche PC, et al.: The Cochrane Collaboration’s tool for assessing risk of bias in randomised trials. BMJ. 2011; 343:d5928
18. DerSimonian R, Laird N: Meta-analysis in clinical trials. Control Clin Trials. 1986; 7:177–188
19. Higgins JP, Thompson SG: Quantifying heterogeneity in a meta-analysis. Stat Med. 2002; 21:1539–1558
20. Rhodes A, Evans LE, Alhazzani W, et al.: Surviving sepsis campaign: International guidelines for management of sepsis and septic shock: 2016. Crit Care Med. 2017; 45:486–552
21. Rochwerg B, Brochard L, Elliott MW, et al.: Official ERS/ATS clinical practice guidelines: Noninvasive ventilation for acute respiratory failure. Eur Respir J. 2017; 50:1602426
22. Wijdicks EF, Sheth KN, Carter BS, et al.: Recommendations for the management of cerebral and cerebellar infarction with swelling: A statement for healthcare professionals from the American Heart Association/American Stroke Association. Stroke. 2014; 45:1222–1238
23. Fletcher DD, Lawn ND, Wolter TD, et al.: Long-term outcome in patients with Guillain-Barre syndrome requiring mechanical ventilation. Neurology. 2000; 54:2311–2315
24. Khwaja A: KDIGO clinical practice guidelines for acute kidney injury. Nephron Clin Pract. 2012; 120:c179–c184
25. Kahn JM, Werner RM, Carson SS, et al.: Variation in long-term acute care hospital use after intensive care. Med Care Res Rev. 2012; 69:339–350
26. Herr DL, Kelly K, Hall JB, et al.: Safety and efficacy of propofol with EDTA when used for sedation of surgical intensive care unit patients. Intensive Care Med. 2000; 26:S452–S462
27. Schuster DP, Metzler M, Opal S, et al.: Recombinant platelet-activating factor acetylhydrolase to prevent acute respiratory distress syndrome and mortality in severe sepsis: Phase IIb, multicenter, randomized, placebo-controlled, clinical trial. Crit Care Med. 2003; 31:1612–1619
28. Panacek EA, Marshall JC, Albertson TE, et al.: Efficacy and safety of the monoclonal anti-TNF antibody F(ab’)2 fragment in patients with severe sepsis stratified by IL-6 level. Crit Care Med. 2004; 32:2173–2182
29. Angstwurm MW, Engelmann L, Zimmermann T, et al.: Selenium in Intensive Care (SIC): Results of a prospective randomized, placebo-controlled, multiple-center study in patients with severe systemic inflammatory response syndrome, sepsis, and septic shock. Crit Care Med. 2007; 35:118–126
30. Guntupalli K, Dean N, Morris PE, et al.: A phase 2 randomized, double-blind, placebo-controlled study of the safety and efficacy of talactoferrin in patients with severe sepsis. Crit Care Med. 2013; 41:706–716
31. Morelli A, Ertmer C, Westphal M, et al.: Effect of heart rate control with esmolol on hemodynamic and clinical outcomes in patients with septic shock: A randomized clinical trial. JAMA. 2013; 310:1683–1691
32. Wu J, Zhou L, Liu J, et al.: The efficacy of thymosin alpha 1 for severe sepsis (ETASS): A multicenter, single-blind, randomized and controlled trial. Crit Care. 2013; 17:R8
33. Brower RG, Matthay MA, Morris A, et al.: Ventilation with lower tidal volumes as compared with traditional tidal volumes for acute lung injury and the acute respiratory distress syndrome. N Engl J Med. 2000; 342:1301–1308
34. Villar J, Kacmarek RM, Perez-Mendez L, et al.: A high positive end-expiratory pressure, low tidal volume ventilatory strategy improves outcome in persistent acute respiratory distress syndrome: A randomized, controlled trial. Crit Care Med. 2006; 34:1311–1318
35. Papazian L, Forel JM, Gacouin A, et al.: Neuromuscular blockers in early acute respiratory distress syndrome. N Engl J Med. 2010; 363:1107–1116
36. Guerin C, Reignier J, Richard JC, et al.: Prone positioning in severe acute respiratory distress syndrome. N Engl J Med. 2013; 368:2159–2168
37. Azoulay E, Metnitz B, Sprung CL, et al.: End-of-life practices in 282 intensive care units: Data from the SAPS 3 database. Intensive Care Med. 2009; 35:623–630
38. Annane D, Siami S, Jaber S, et al.: Effects of fluid resuscitation with colloids vs crystalloids on mortality in critically ill patients presenting with hypovolemic shock: The CRISTAL randomized trial. JAMA. 2013; 310:1809–1817
39. Warren HS, Danner RL, Munford RS: Sounding board: Anti-endotoxin monoclonal antibodies. N Engl J Med. 1992; 326:1153–1157
40. Harhay MO, Casey JD, Clement M, et al.: Contemporary strategies to improve clinical trial design for critical care research: Insights from the First Critical Care Clinical Trialists Workshop. Intensive Care Med. 2020; 46:930–942
41. Marshall JC: Global collaboration in acute care clinical research: Opportunities, challenges, and needs. Crit Care Med. 2017; 45:311–320