Evidence for efficacy of treatments to defeat coronavirus disease 2019 (COVID-19) is needed. The number of registered randomized trials studying promising antiviral drugs is rising steadily. Heterogeneity in choice of endpoints and conflicting results both between and within trials are to be expected (1). On the one hand, this heterogeneity reflects the manifold potential ways of defining a treatment effect, the different information needed by the various decision-makers involved, and the different patient populations under study. On the other hand, harmonization of the different endpoints is an essential step to accelerate decision-making by public health authorities and clinicians (1). Thus, comprehensive judgment is facilitated if treatment effects on the various components of clinical outcomes are presented simultaneously in a compact and informative manner. In this article, we develop a multistate model that helps harmonize a multiplicity of endpoints in a descriptive analysis.
Regarding choice of endpoints, the Core Outcome Measures in Effectiveness Trials (COMET) initiative developed a core outcome set (COS) for COVID-19 randomized trials (2). Similarly, the Clinical Characterization and Management Working Group of the World Health Organization (WHO) Research and Development Blueprint programme, the International Forum for Acute Care Trialists, and the International Severe Acute Respiratory and Emerging Infections Consortium propose a minimal COS (3).
In the primary analysis of a trial, an estimand (the effect measure that is to be estimated) is defined that translates the primary endpoint into a quantity measuring the effect of the treatment on the primary endpoint (e.g., for the primary endpoint of mortality, the estimand may be the relative hazard of death comparing patients receiving treatment to control). Subsequently, using data from the completed trial, the estimand with a measure of statistical uncertainty is estimated and evaluated under prespecified criteria to indicate the presence or absence of a treatment effect. Secondary analyses serve to provide supportive evidence related to the primary analysis in a descriptive manner. In light of the pandemic, it is essential to exploit the full potential of the available data by efficiently defining primary and secondary analyses. We give recommendations on efficient secondary analyses using a multistate model that provides major insights on treatment effects for primary endpoints within, for example, the COS of the COMET initiative and the endpoints recommended by the WHO (4).
Multistate methodology is a powerful tool to study multiple endpoints simultaneously over time (5,6). Based on the multistate model, we propose a stacked probability plot that reflects the manifold treatment effects in a time-dependent manner within a single informative graph, complementing the proposals of the WHO and the COMET initiative.
Review of Endpoints in Registered COVID-19 Trials
The WHO recommends an ordinal endpoint for COVID-19 trials, whose number of categories and category definitions have evolved (Table S1, Supplemental Digital Content 1, http://links.lww.com/CCM/F952). The most recently proposed categories are as follows: 0 (no clinical or virological evidence of infection); 1 (no limitation of activities); 2 (limitation of activities); 3 (hospitalization, no oxygen therapy); 4 (oxygen by mask or nasal prongs); 5 (noninvasive ventilation or high flow oxygen); 6 (intubation and mechanical ventilation); 7 (ventilation plus additional organ support: pressors, renal replacement therapy, and extracorporeal membrane oxygenation); and 8 (death).
Events that are both part of the WHO recommended categorical endpoint (including all of its variants) and considered within the COS of (2,3) are duration of invasive mechanical ventilation, hospitalization, and mortality. These are robust endpoints providing information for clinicians as well as public health authorities. The proposed multistate model is based on at least these major clinical events.
To get an idea of how many of the currently ongoing trials collect information on hospitalization, mechanical ventilation, and death, we review registered clinical COVID-19 trials. In this brief review, we include all randomized trials registered by April 3, 2020, in the U.S. registry database (ClinicalTrials.gov) of the National Library of Medicine at the National Institutes of Health. We restrict the search to trials whose primary purpose is to study the efficacy of remdesivir, lopinavir/ritonavir, or hydroxychloroquine as treatment for patients with COVID-19 (7), which are currently considered to be among the most promising treatment options (8). We emphasize that this is not a systematic review; we aim only to give a basic idea of the endpoints currently in use.
Results of the Review
Our review of COVID-19 clinical trials registered in the database of the National Library of Medicine at the National Institutes of Health includes 38 trials (7). We found that nine of the studies (24%) used an ordinal endpoint as recommended by the WHO (4). The number of categories varied between six and 10 (for the definitions, we refer to Table S1, Supplemental Digital Content 1, http://links.lww.com/CCM/F952).
In total, 25 (66%) of the studies collect information on invasive mechanical ventilation—13 (34%) within the primary endpoint and 12 (32%) within the secondary endpoints. Information on hospitalization is included in 24 (63%) of the studies. Specifically, 13 (34%) comprise hospitalization within the primary endpoint and 11 (29%) within the secondary endpoints. Twenty-nine (76%) of the studies indicate that information on mortality is recorded (n = 19 [50%] for the primary endpoint, n = 10 [26%] for secondary endpoints). For other studies, it is unclear whether this information is available. Nonetheless, 13 of the 19 trials (68%) in which we did not find explicit details on data collection for either hospitalization, invasive mechanical ventilation, or mortality do have a general statement that adverse events are recorded. Thus, even though not specified, it can be presumed that more trials record this information. The review implies that the majority of COVID-19 trials have sufficient information to provide the stacked probability plot.
Follow-up of the 38 trials ranges from 4 to 168 days. Most trials (n = 27, 71%) have a follow-up of 14 to 15 days (n = 10 and 6, respectively) or 28–30 days (n = 11). In Table S1 (Supplemental Digital Content 1, http://links.lww.com/CCM/F952), we give a detailed overview of the endpoints chosen in the 38 trials as well as on the duration of follow-up, the sample size, and the patient population under study.
COVID-19 Trial Data Harmonization Approach
Multistate models are used to analyze time-to-event data. To obtain a detailed understanding of potential treatment effects, we propose the multistate model shown in Figure 1. The boxes represent the possible states a patient may encounter, and the arrows represent the possible transitions from one state to another. The model accounts for hospitalization, invasive mechanical ventilation, discharge alive, and death. At randomization, patients may start either in the hospitalized nonventilated state or in the invasive mechanical ventilation state. Over the course of hospital stay, patients move through the states of the model and may encounter one or multiple episodes of mechanical ventilation. Eventually, individuals end up in the discharged state or the death state. If follow-up is not complete, some individuals are censored before they reach a final state. If follow-up is beyond discharge, the discharge state can be modeled as an intermediate state and the only final state is death.
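The structure described above can be written down as a small transition table. The following Python sketch is only illustrative: the state labels and the helper `is_valid_path` are hypothetical names, and the set of arrows is an assumption derived from the text (in particular, we assume that discharge occurs from the nonventilated state only), since Figure 1 is not reproduced here.

```python
# Illustrative encoding of a four-state model; all names are hypothetical labels.
HOSPITAL, VENTILATED, DISCHARGED, DEAD = "hospital", "ventilated", "discharged", "dead"

# Assumed arrows: patients may alternate between the nonventilated and the
# ventilated state (multiple ventilation episodes) before reaching a final state.
TRANSITIONS = {
    HOSPITAL: {VENTILATED, DISCHARGED, DEAD},
    VENTILATED: {HOSPITAL, DEAD},
    DISCHARGED: set(),  # final state when follow-up ends at discharge
    DEAD: set(),        # final state
}

# At randomization, patients are either hospitalized nonventilated or ventilated.
INITIAL_STATES = {HOSPITAL, VENTILATED}


def is_valid_path(path):
    """Return True if the state sequence starts in an initial state and
    uses only transitions listed in TRANSITIONS."""
    if not path or path[0] not in INITIAL_STATES:
        return False
    return all(nxt in TRANSITIONS[cur] for cur, nxt in zip(path, path[1:]))
```

If follow-up extends beyond discharge, the same table could be adapted by giving the discharged state an arrow to death, so that death is the only final state.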
The four-state multistate model in Figure 1 can easily be extended to allow for more detailed information on treatment effects in, for example, mild and moderate cases and follow-up beyond discharge (Fig. 2). If more detailed data are available (e.g., WHO scale), we recommend using the multistate model in Figure 2, which also includes a general recovery state (denoted as “cured”) separating hospital discharge from negativity (e.g., two consecutive negative results of the 2019 novel coronavirus tests). Thus, this model is not only more sensitive for mild and moderate cases, which can be defined according to the level of oxygen support provided to patients (oxygen, high flow oxygen, and noninvasive ventilation), but also allows for a differentiation between discharge with and without the capability of resuming normal activities (9).
Stacked Probability Plot.
With the multistate model in Figure 1, treatment effects on duration of invasive mechanical ventilation, length of hospital stay, and death are directly quantifiable. The course of a patient's hospital stay in the treatment and control group can be visualized with a stacked probability plot (10) (Fig. 3 and Results section). The stacked probability plot visualizes important events of interest simultaneously over time in a single informative graph.
For the stacked probability plot, the following information needs to be recorded:
1) Hospital admission and discharge dates.
2) Vital state at the end of follow-up.
3) Death date, if applicable.
4) Start and stop dates of mechanical ventilation.
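To illustrate how these four pieces of information determine a patient's state at any time, the following Python sketch derives the state from a minimal record. It is a sketch under stated assumptions: the class `PatientRecord`, its field names, and the function `state_at` are hypothetical, and all times are expressed in days since randomization rather than as calendar dates.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class PatientRecord:
    # Hypothetical record layout; times are days since randomization.
    discharge_day: Optional[float] = None   # None if not discharged alive
    death_day: Optional[float] = None       # None if alive at end of follow-up
    # (start, stop) of each ventilation episode; stop is None if still ongoing
    ventilation_episodes: List[Tuple[float, Optional[float]]] = field(default_factory=list)
    last_follow_up_day: float = 28.0


def state_at(rec: PatientRecord, day: float) -> Optional[str]:
    """Return the state occupied at `day`, or None if the patient is
    no longer under observation (censored) at that time."""
    if rec.death_day is not None and day >= rec.death_day:
        return "dead"
    if day > rec.last_follow_up_day:
        return None
    if rec.discharge_day is not None and day >= rec.discharge_day:
        return "discharged"
    for start, stop in rec.ventilation_episodes:
        if start <= day and (stop is None or day < stop):
            return "ventilated"
    return "hospital"
```

For example, a patient ventilated from day 2 to day 6 and discharged on day 12 would be classified as ventilated on day 3, hospitalized without ventilation on day 6, and discharged on day 12.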
In addition to the stacked probability plot, multiple estimands can be explored to add further information. Two types are of particular interest. The first type of estimand comprises the probabilities of being in one of the states of the multistate model at a particular time (state occupation probability) and the probabilities of moving from one state to another over the course of time (transition probability). For a more detailed definition, we refer to our online supplementary material (Supplemental Digital Content 2, http://links.lww.com/CCM/F953).
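In generic notation, the two quantities of the first type can be written as follows. The symbols are our own assumption for illustration and may differ from those used in the supplementary material: X(t) denotes the state occupied at time t since randomization, and i and j denote states of the model.

```latex
% State occupation probability of state j at time t
P_j(t) = \Pr\bigl(X(t) = j\bigr)

% Transition probability from state i at time s to state j at time t
P_{ij}(s,t) = \Pr\bigl(X(t) = j \mid X(s) = i\bigr), \qquad s \le t
```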
The second type of estimand, based on the state occupation probabilities, is the mean length of stay in each state, such as mean duration under ventilation and mean length of hospital stay. These complex composite outcomes, which can account for multiple episodes of ventilation and adequately control for discharge alive and death, can be easily computed and compared between treatment groups (11). For example, differences in mean number of days alive without mechanical ventilation or mean number of days alive with ventilation between treatment and control group can be modeled and estimated with the transition and state occupation probabilities of the multistate model (11).
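The link between the two types of estimand can be illustrated with a short Python sketch: the mean time spent in a state up to a given day is the area under that state's occupation probability curve. The sketch assumes complete follow-up (no censoring), so that simple proportions can be used; with censored data, a nonparametric multistate estimator would be needed instead. The function names and the toy data are hypothetical.

```python
def occupation_probabilities(histories, states, horizon):
    """Proportion of patients in each state on days 0..horizon.

    `histories` holds one list per patient giving the state on each day.
    Assumes no censoring, so plain proportions are valid estimates.
    """
    n = len(histories)
    return {
        s: [sum(1 for h in histories if h[day] == s) / n for day in range(horizon + 1)]
        for s in states
    }


def mean_time_in_state(curve, step=1.0):
    """Area under an occupation probability curve (trapezoidal rule)."""
    return sum((a + b) / 2.0 * step for a, b in zip(curve, curve[1:]))


# Toy data for two hypothetical patients, days 0 to 4.
STATES = ["hospital", "ventilated", "discharged", "dead"]
histories = [
    ["hospital", "ventilated", "ventilated", "hospital", "discharged"],
    ["hospital", "hospital", "discharged", "discharged", "discharged"],
]
probs = occupation_probabilities(histories, STATES, horizon=4)
days_ventilated = mean_time_in_state(probs["ventilated"])  # 1.0 for these toy data
```

Because the area is taken only up to the chosen horizon, the result is a restricted mean: it is a lower limit of the total time spent in the state.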
The model avoids common pitfalls such as competing risks bias when studying hospital mortality and immortal time bias when considering mechanical ventilation (12). Unlike a simple proportion at a given time point, the transition and occupation probabilities of a multistate model can dynamically display the whole evolution of the patient’s prognosis. They account for random censoring, competing risks, and the possibility for one patient to experience multiple episodes of invasive mechanical ventilation. Furthermore, estimation of both types of estimands can be performed nonparametrically and is therefore independent of modeling assumptions, such as the proportional hazards assumption of the Cox model (13).
Analysis of the extended model (Fig. 2) is the same as for the model in Figure 1. Explanations on multistate model analysis and software are available in the literature (5,6,14–17).
Even if primary analysis uses different endpoints and/or estimands, the stacked probability plot should be added to the analysis for explorative purposes. If the multistate model analysis is chosen as the primary analysis, CIs for the difference between the various estimates of the placebo and treatment groups can be obtained via bootstrapping. However, we recommend estimation of CIs only if the multistate model analysis is the primary analysis.
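As a rough illustration of the bootstrap step, the following Python sketch computes a percentile interval for the difference of a statistic between two arms by resampling patients within each arm. The choice of the percentile method, the function name `bootstrap_ci_difference`, and its parameters are assumptions made for this example; the text above specifies only that bootstrapping is used. For multistate estimands, `statistic` would re-estimate the quantity of interest from each resampled set of patients.

```python
import random


def bootstrap_ci_difference(treated, control, statistic, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for statistic(treated) - statistic(control).

    Patients are resampled with replacement within each arm, which keeps
    the arm sizes fixed as in the randomized design.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        t_sample = [rng.choice(treated) for _ in treated]
        c_sample = [rng.choice(control) for _ in control]
        diffs.append(statistic(t_sample) - statistic(c_sample))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def mean(values):
    return sum(values) / len(values)


# Hypothetical per-patient days on ventilation within 28 days.
treated_days = [0, 0, 3, 5, 0, 7, 0, 2, 0, 4]
control_days = [0, 1, 0, 6, 0, 9, 0, 0, 3, 5]
ci = bootstrap_ci_difference(treated_days, control_days, mean, n_boot=500)
```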
Finally, we remark that there remain some limitations. For example, the stacked probability plot harmonizes heterogeneous endpoints and provides deep insights into time dynamics and treatment effects on different levels, but it does not overcome other differences in clinical trials. These include differences in the patient population, differing doses of the same treatment, differences in timing of the treatment, and potentially different cointerventions.
Hypothetical COVID-19 Trial.
In this section, we provide a proof of concept on the statistical analysis of a COVID-19 randomized trial. Our example is inspired by the study of Cao et al (9), which is a placebo-controlled double-blinded randomized controlled trial (RCT) for lopinavir/ritonavir with a follow-up of 28 days. Via simulation, we obtain individual patient data for the events in the multistate model (Fig. 1). Compared with the eight-category WHO ordinal scale of Cao et al (9), we combine status 1 and 2 into “Discharge from hospital” and status 3, 4, and 5 into “Hospital non ventilated” (first state in the multistate model). Status 6 and 7 remain the “ventilated” and “Death” states, respectively. More details on the simulation are found in the appendix (Supplemental Digital Content 3, http://links.lww.com/CCM/F954).
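The collapsing of ordinal categories into model states can be expressed as a simple lookup. The Python sketch below follows the mapping described above, reading status 6 as the ventilated state and status 7 as death; the dictionary name, the state labels, and the helper `collapse` are hypothetical.

```python
# Mapping from ordinal status to the four model states, as described in the text.
# State labels are hypothetical names chosen for this illustration.
ORDINAL_TO_STATE = {
    1: "discharged",
    2: "discharged",
    3: "hospital",    # hospitalized, not ventilated
    4: "hospital",
    5: "hospital",
    6: "ventilated",
    7: "dead",
}


def collapse(scores):
    """Translate a sequence of ordinal scores into model states."""
    return [ORDINAL_TO_STATE[s] for s in scores]


# Hypothetical course of one patient, assessed at successive visits.
course = collapse([4, 6, 6, 5, 3, 2])
```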
The results of our multistate model analysis for the reconstructed RCT are shown in Table 1 and Figure 3. Table 1 shows the probabilities to be mechanically ventilated, hospitalized without ventilation, discharged alive, and to die at days 7, 14, and 28. These probabilities can be read from the stacked probability plot as distance between the curves. The probability to die is read directly from the first curve. The probability to be mechanically ventilated at day 28 corresponds to the difference between the first and second curves. Similarly, the probability to be hospitalized, but not ventilated, is the difference between the second and the third curve. The probability to be discharged alive is one minus the third curve. The probability to be hospitalized, irrespective of the ventilation state, is the sum of the probability to be ventilated and to be hospitalized nonventilated. At all time points, the four probabilities sum up to 1.
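The reading rule can be made explicit with a small Python sketch that stacks the state probabilities at one time point into boundary curves and then recovers them as distances between the curves. The stacking order (death at the bottom, then ventilated, then hospitalized nonventilated, with discharged alive filling the remainder up to 1) is taken from the description above; the function names and the numerical values are hypothetical.

```python
from itertools import accumulate

# Bottom-to-top stacking order; "discharged" fills the space up to 1.
STACK_ORDER = ["dead", "ventilated", "hospital"]


def boundary_curves(probs):
    """Cumulative sums of the state probabilities at one time point."""
    return list(accumulate(probs[s] for s in STACK_ORDER))


def read_from_plot(curves):
    """Recover the state probabilities as distances between the curves."""
    first, second, third = curves
    return {
        "dead": first,
        "ventilated": second - first,
        "hospital": third - second,
        "discharged": 1.0 - third,
    }


# Hypothetical probabilities at a single time point.
probs = {"dead": 0.1, "ventilated": 0.2, "hospital": 0.3, "discharged": 0.4}
recovered = read_from_plot(boundary_curves(probs))
```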
TABLE 1. Estimates of the First and Second Type of Estimand for the Time Points 7, 14, and 28 Days Since Randomization Using the Constructed Data Example (n
||Treatment (n = 100)
||Placebo (n = 100)
||Difference (95% CI)
||Treatment (n = 100)
||Placebo (n = 100)
||Difference (95% CI)
||Treatment (n = 100)
||Placebo (n = 100)
||Difference (95% CI)
||At 7 d
||At 14 d
||At 28 d
| Discharged alive
||10 (–3 to 23)
||4 (–9 to 17)
||–4 (–11 to 3)
||0 (–1 to 10)
||–4 (–15 to 7)
| Hospitalized without MV
||–11 (–23 to –2)
||–11 (–24 to 2.4)
||–4 (–14 to 6)
| Hospitalized with MV
||0 (–4 to 4)
||1 (–4 to 6)
||4 (–0.03 to 8)
|Mean time spent
||At 7 d
||At 14 d
||At 28 d
| Alive without MV (hospitalized or discharged alive)
||0.2 (–1 to 1)
||0.3 (–1.5 to 2)
||0.2 (–3 to 4)
| Alive with MV
||0.0 (–0.2 to 0.2)
||0 (–0.5 to 0.5)
||0.2 (–0.5 to 1)
||–0.4 (–1 to 0.3)
||–1.0 (–2.5 to 0.5)
||–2.3 (–5 to 1)
MV = mechanical ventilation.
The colors indicate how the estimates relate to the stacked probability plot. The probabilities can be read from the plot as distance between the curves. The mean time spent in each state can be read from the plot as the colored area between the curves. The mean time spent hospitalized is the sum of the time spent with and without ventilation in the hospital. The durations are restricted to the time points of interest; therefore, they are to be considered as the lower limits of the total durations spent in each state. If the multistate model is used for the primary analysis, the 95% CIs of the differences can be interpreted as treatment effects. If the primary analysis is based on a different endpoint, we do not recommend estimating CIs and p values, as the confidence level may not be valid due to multiple testing.
The lower part of Table 1 presents the results of the second type of estimand, namely the duration of mechanical ventilation, the length of hospitalization, and the mean number of days alive without mechanical ventilation. The mean time spent with mechanical ventilation corresponds to the light green area in the stacked probability plot, while the total duration of hospitalization corresponds to the sum of the dark and the light green areas. A popular patient-oriented endpoint is the mean number of days without ventilation. The estimate can be read from the stacked probability plot as the sum of the dark green and orange areas.
Thus, the stacked probability plot not only illustrates the results of the table within a single graph but also provides additional information on the estimates for all time points between randomization and end of follow-up. This single informative graph replaces Table 1 as well as multiple figures of Kaplan-Meier curves or cumulative incidence functions, allowing for direct insights. For example, the plot shows directly that patients in the treatment group are discharged alive more quickly than patients in the placebo group (Table 1). It can also be seen that the duration of mechanical ventilation is longer in the treatment group. However, this comes at the cost of a higher mortality risk in the placebo group.
Real COVID-19 Data Example
A further advantage of the stacked probability plot is that results of different trials can be directly compared. Differing lengths of follow-up are accounted for by the time-dependent graphical display of the probabilities of being in one of the states of interest.
As a proof of concept, we use the published data of Grein et al (18). In this prospective observational cohort on compassionate use of remdesivir, the authors provide a detailed clinical history of respiratory support for the 53 analyzed patients (Fig. 3 in their original article). The primary endpoint is the time to clinical improvement, defined as discharge from the hospital or a decrease of at least 2 points from baseline on a modified six-point ordinal scale derived from the WHO scale.
Based on their Figure 3, we reconstruct individual data and analyze them within the multistate framework using both the four-state model in Figure 1 and a more complex model differentiating the six points of the ordinal scale. Since the seven-state model is nested within the four-state model, results are the same for corresponding estimands such as time spent alive in hospital, time spent alive without ventilation, or mortality (Table S2, Supplemental Digital Content 2, http://links.lww.com/CCM/F953). The use of the six-point ordinal scale allows for a more precise understanding of the clinical course and oxygen support therapy needed by the patients. The use of the four-state model allows for a direct comparison with the trial by Cao et al (9), by looking at the stacked probability plot and by comparing meaningful estimands. For example, although mortality is lower, patients in (18) need more intensive and prolonged respiratory support. This comparison also shows the limitations of the proposed approach. Since the two studies have a different level of evidence (randomized vs observational study) and included different patients (< 1% of patients were intubated at inclusion in the trial by Cao et al (9), whereas 65% of patients were already receiving invasive ventilation at baseline in the remdesivir cohort), differences between the trials should only be explored without drawing conclusions on treatment effects.
To harmonize heterogeneous endpoints of COVID-19 clinical trials, we propose to include multistate methodology as a descriptive analysis in statistical analysis plans. This ensures similar descriptive summaries are being presented uniformly across all COVID-19 trials. Our brief review indicates that the proposed multistate model in Figure 1 is based on information collected in most interventional COVID-19 trials within either the primary or the secondary endpoint. All ongoing clinical trials should add a stacked probability plot of the major events hospitalization, invasive mechanical ventilation, discharge alive, and death. Only simultaneous consideration of these endpoints ensures that reduction of invasive mechanical ventilation and length of stay can be correctly attributed to discharge alive or early death. Thus, early death and efficacy can be differentiated while maintaining information relevant not only for clinicians and patients but also for public health authorities. Indeed, one of the specificities of this pandemic is the pressure it puts on the whole healthcare system. Triage pressure and shortage of ventilators have led to discussion about ethical decision rules to allocate scarce medical resources (21). Healthcare system-centered outcomes such as the number of days spent in the ICU/hospital and number of days spent using mechanical ventilators become of interest for all decision-makers.
The model can also handle do-not-resuscitate (DNR) orders either by stratification, if the DNR order is known at the time of randomization for all patients, or by a combined endpoint death/DNR. Furthermore, the stacked probability plot can be estimated with observational data. The methodological background is covered in (19) using two publicly available datasets. We emphasize that similar approaches can also be applied to registry studies such as the International Severe Acute Respiratory and Emerging Infection Consortium database (20), which provides clinically relevant insights. If follow-up beyond hospital discharge is available, a more detailed plot differentiating between discharge with and without resumption of normal activities should be provided. Specifically, clinical trials using the categorical endpoints recommended by the WHO R&D Blueprint expert group have all the necessary information available for this detailed analysis.
The resulting informative plot can be a powerful contribution in the effort of harmonizing the diversity of clinical endpoints and lengths of follow-up, thereby accelerating access to evidence and, thus, decision-making. Additionally, treatment effects on a number of (potentially clinically opposite) endpoints can be studied simultaneously over time by estimating sojourn time spent in the various states. We note that the proposed multistate model approach cannot overcome all aspects of heterogeneity in randomized trials. Additional statistical modeling to account for heterogeneity in patient populations or differing time scales is needed.
We also highlight that our proposal should be understood as a descriptive complement to the primary analysis. Thus, results from the multistate model, if used as secondary analysis, can only generate new hypotheses but cannot provide evidence. Finally, rather than suggesting core outcomes for COVID-19 trials, we provide a way for simultaneous evaluation of multiple endpoints within COS using information on all the available data.
To conclude, we recommend complementing the primary analysis with a stacked probability plot for the clinical events hospitalization, invasive mechanical ventilation, discharge alive, and death. If data are available, we recommend differentiation between ward and ICU hospitalization as well as differentiation between discharge alive without and with resumption of normal activities.
We thank Jean-Francois Timsit and Dr. Klaus-Dieter Wolkewitz for valuable clinical input. Furthermore, we are grateful for the important and helpful comments of Dr. Gerta Rücker and Dr. Erika Graf who commented on the article. Finally, we are grateful for the reviewer comments which we believe strengthened the article.
1. Timsit JF, de Kraker MEA, Sommer H, et al. COMBACTE-NET consortium: Appropriate endpoints for evaluation of new antibiotic therapies for severe infections: A perspective from COMBACTE’s STAT-Net. Intensive Care Med. 2017; 43:1002–1012
2. Jin X, Pang B, Zhang J, et al. Core outcome set for clinical trials on Coronavirus disease 2019 (COS-COVID). Engineering (Beijing). 2020 Mar 18. [Epub ahead of print]
3. Marshall JC, Murthy S, Diaz J, et al. WHO Working Group on the Clinical Characterisation and Management of COVID-19 infection: A minimal common outcome measure set for COVID-19 clinical research. Lancet Infect Dis. 2020; 20:e192–e197
4. World Health Organization: Coronavirus Disease (COVID-2019) R&D. 2020. Available at: http://www.who.int/blueprint/priority-diseases/key-action/novel-coronavirus/en/. Accessed March 30, 2020
5. Cook RJ, Lawless JF: Multistate Models for the Analysis of Life History Data. Boca Raton. 2018, CRC Press, p 441
6. Andersen PK, Keiding N. Multi-state models for event history analysis. Stat Methods Med Res. 2002; 11:91–115
7. ClinicalTrials.gov: Find Trials. 2017. Available at: https://clinicaltrials.gov/ct2/search. Accessed March 24, 2020
8. Kalil AC. Treating COVID-19—off-label drug use, compassionate use, and randomized clinical trials during pandemics. JAMA. 2020; 323:1897–1898
9. Cao B, Wang Y, Wen D, et al. A trial of lopinavir–ritonavir in adults hospitalized with severe Covid-19. N Engl J Med. 2020; 382:1787–1799
10. de Wreede LC, Fiocco M, Putter H. mstate: An R package for the analysis of competing risks and multi-state models. J Stat Softw. 2011; 38:1–30
11. Beyersmann J, Putter H. A note on computing average state occupation times. Demographic Res. 2014; 30:1681–1696
12. Schumacher M, Allignol A, Beyersmann J, et al. Hospital-acquired infections–appropriate statistical treatment is urgently needed! Int J Epidemiol. 2013; 42:1502–1508
13. Stensrud MJ, Hernán MA. Why test for proportional hazards? JAMA. 2020; 323:1401–1402
14. Putter H, Fiocco M, Geskus RB. Tutorial in biostatistics: Competing risks and multi-state models. Stat Med. 2007; 26:2389–2430
15. Aalen O, Borgan O, Gjessing H: Survival and Event History Analysis: A Process Point of View. New York. 2008, Springer
16. Beyersmann J, Allignol A, Schumacher M: Competing Risks and Multistate Models With R. New York. 2011, Springer Science & Business Media, p 249
17. Allignol A, Schumacher M, Beyersmann J. Empirical transition matrix of multi-state models: The etm package. J Stat Softw. 2011; 38:1–15
18. Grein J, Ohmagari N, Shin D, et al. Compassionate use of remdesivir for patients with severe Covid-19. N Engl J Med. 2020; 382:2327–2336
19. Hazard D, Kaier K, von Cube M, et al. Joint analysis of duration of ventilation, length of intensive care, and mortality of COVID-19 patients: A multistate approach. BMC Med Res Methodol. 2020; 20:206
20. International Severe Acute Respiratory and Emerging Infection Consortium: Home - ISARIC. 2020. Available at: https://isaric.tghn.org/. Accessed March 14, 2020
21. Emanuel EJ, Persad G, Upshur R, et al. Fair allocation of scarce medical resources in the time of Covid-19. N Engl J Med. 2020; 382:2049–2055