Causal Inference From Observational Data

New Guidance From Pulmonary, Critical Care, and Sleep Journals

Maslove, David M., MD, MS; Leisman, Daniel E., BS

doi: 10.1097/CCM.0000000000003531
Editor's Choice

Department of Critical Care Medicine, Queen’s University and Kingston Health Sciences Center, Kingston, ON, Canada

Icahn School of Medicine at Mount Sinai, New York, NY

The authors have disclosed that they do not have any potential conflicts of interest.

This month’s issue of Annals of the ATS features a special article by a group of editors from top pulmonary, sleep, and critical care journals, including Critical Care Medicine, that provides guidance for authors of observational studies (1). Spurred in part by a perceived uptick in the number of submissions of this type, the article aims to build common ground for authors and readers alike, encouraging the use of modern statistical methods, the transparent reporting of results, and a measured interpretation of findings. The guidance does not apply to observational studies that are purely descriptive, that identify associations only, or that develop predictive models; rather, it is intended specifically for those that focus on causal inference.

Although Critical Care Medicine endorses the article’s main principles, which are outlined below and unpacked more fully in the article itself, they are not intended to be strict requirements; as with any submission, observational studies are considered case by case and proceed through the editorial process on their merits.

Why generate this document now? Observational studies in critical care are on the rise. A cursory PubMed search suggests that their numbers have increased over the last 5 years and that, as of 2015, they outnumber studies identified as randomized controlled trials (Fig. 1). A few key factors may account for this trend.

Figure 1

First, a dearth of actionable evidence from interventional trials, most of which are costly and difficult to carry out, has in some circles cast observational studies in a new and favourable light (2). Although they generate a different class of evidence, observational studies may be cheaper, easier, and faster to conduct than interventional ones, especially when retrospective datasets are used. Observational studies may also be the design of choice where randomization poses ethical challenges, where the event of interest is rare, where the research question pertains specifically to real-world practices, or where the effects of long exposures are being examined.

Second, the proliferation of new sources of data, including electronic medical records, patient registries, and administrative databases, ensures that fodder for observational studies is now in ample supply. Secular trends encouraging data sharing and open access have also made such datasets more readily available to researchers, including those who were not involved in the data collection itself. One illustrative example of this phenomenon is the Medical Information Mart for Intensive Care (MIMIC) database, which includes granular data collected from tens of thousands of ICU admissions at a single center and has been used as a data source for hundreds of publications (3).

Third, the analytic tools used in observational studies, including currently popular statistical programming languages, are readily available (and in some cases open source), and both the clinical and academic workforces are becoming increasingly adept at using them.

Observational data can be analyzed in a number of ways, with the appropriateness of the methods determined by the research question. This concept can be illustrated with the following simple example: Suppose you have noticed that when patients receiving high-flow oxygen get intubated, they tend to develop hypotension immediately thereafter. You have access to a retrospective dataset including detailed patient-level clinical data, and consider leveraging this to ask a few questions about this phenomenon, including: 1) are patients truly developing hypotension after intubation, or is it just your individual (and potentially biased) belief?; 2) might we be able to predict when this is going to happen, so that we can institute preventive measures?; and 3) does intubation in fact “cause” hypotension? (You wonder if the transition to positive pressure results in reduced preload in these patients, who may be relatively volume depleted because of excessive insensible losses and fluid-restricted management plans.)

The first question is one of association and can be addressed by applying traditional statistical methods like correlation and regression to the dataset. We recognize this would not tell us anything about causation, but that was not our question. We simply wanted to know whether the perceived phenomenon was supported by something more than a hunch. The second question is one of prediction and can be addressed by building statistical or machine learning models. Since the objective here is simply to generate the best prediction possible, all is fair; any variable can be included, and we need not (and in fact should not) read much into their biological meaning.

The third question, and the focus of the recent article, is one of causal inference. It is here that close attention must be paid to confounding, that is, to external factors that might be related to both the exposure and the outcome. In this case, the use of propofol may be a confounder; it is associated with intubation because patients are often intubated after it is given, and it is associated with hypotension because blood pressure often drops after it is given. This relationship is shown in Figure 2, an example of a directed acyclic graph (DAG). This DAG shows that we must control for propofol as a confounder before we can infer that intubation might cause hypotension.

Figure 2
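
The role of propofol in this example can be illustrated with a small simulation. The following is a minimal sketch in Python and is not part of the guidance itself: all probabilities are invented for illustration, the function names (simulate, crude_risk_difference, adjusted_risk_difference) are hypothetical, and the data are deliberately constructed so that intubation has no effect on hypotension at all. The crude comparison nonetheless suggests an association, which largely disappears once intubated and nonintubated patients are compared within strata of propofol use.

```python
import random


def simulate(n=20000, seed=0):
    """Simulate (propofol, intubated, hypotension) for n hypothetical patients.

    Assumption for illustration: propofol influences both intubation and
    hypotension, while intubation itself has NO effect on hypotension.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        propofol = rng.random() < 0.4
        intubated = rng.random() < (0.7 if propofol else 0.2)
        hypotension = rng.random() < (0.5 if propofol else 0.1)
        rows.append((propofol, intubated, hypotension))
    return rows


def risk(rows, intubated, propofol=None):
    """Proportion with hypotension among the selected patients."""
    selected = [h for p, i, h in rows
                if i == intubated and (propofol is None or p == propofol)]
    return sum(selected) / len(selected)


def crude_risk_difference(rows):
    """Risk difference for intubation, ignoring propofol entirely."""
    return risk(rows, True) - risk(rows, False)


def adjusted_risk_difference(rows):
    """Standardized risk difference: stratum-specific differences,
    weighted by the share of patients in each propofol stratum."""
    total = 0.0
    for level in (False, True):
        weight = sum(1 for p, _, _ in rows if p == level) / len(rows)
        total += weight * (risk(rows, True, level) - risk(rows, False, level))
    return total


if __name__ == "__main__":
    data = simulate()
    print(f"Crude risk difference:    {crude_risk_difference(data):.3f}")
    print(f"Adjusted risk difference: {adjusted_risk_difference(data):.3f}")
```

With these made-up probabilities, the crude risk difference is around 0.2, whereas the propofol-adjusted estimate is close to zero, the true value in this simulation. Stratification of this kind only addresses confounders that were measured and chosen in advance; it offers no protection against those left out of the DAG.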

One common pitfall when identifying confounders for causal inference is to select them based on statistical testing. They should instead be chosen a priori, based on knowledge about the pathophysiology of the process in question. A corollary is that, because only the main exposure variable is modeled with confounding in mind, the effect estimates for the other independent variables included in the model as covariates may be influenced by residual confounding and should not be reported as reliable results. Not only are these estimates unreliable, they can distort the estimates for the primary effect of interest.

A second pitfall is the reliance on p values in reporting results, since these do not represent the magnitude or clinical importance of an association. Effect estimates and their surrounding CIs are more informative and allow for a more nuanced discussion than a simple “significant versus nonsignificant” dichotomy. The STrengthening the Reporting of OBservational studies in Epidemiology (STROBE) guidelines provide additional guidance on how to present results in tabular format.
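
To make the contrast concrete, the sketch below computes a risk difference and an approximate 95% CI from a 2 × 2 table. It is an illustration only: the counts are hypothetical, the function name is our own, and the simple Wald (normal approximation) interval is one of several possible choices, not one prescribed by the guidance or by STROBE.

```python
import math


def risk_difference_ci(events_exposed, n_exposed,
                       events_unexposed, n_unexposed, z=1.96):
    """Return (risk difference, lower bound, upper bound).

    Uses a Wald interval; z=1.96 corresponds to an approximate 95% CI.
    """
    risk_exposed = events_exposed / n_exposed
    risk_unexposed = events_unexposed / n_unexposed
    rd = risk_exposed - risk_unexposed
    se = math.sqrt(risk_exposed * (1 - risk_exposed) / n_exposed
                   + risk_unexposed * (1 - risk_unexposed) / n_unexposed)
    return rd, rd - z * se, rd + z * se


# Hypothetical counts: 30/100 events in the exposed group, 20/100 in the unexposed.
rd, lower, upper = risk_difference_ci(30, 100, 20, 100)
print(f"Risk difference {rd:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

For these invented counts the estimate is 0.10, with an interval running from about -0.02 to 0.22. Reported this way, the result conveys both the size of the estimated effect and the range of values compatible with the data, which a bare “nonsignificant” label would hide.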

The specific language used in concluding statements is also important. Although we see frequent admonitions cautioning authors to refer to “associations” rather than “causation”, and although it may be misleading to state that an exposure had an “effect” on an outcome, reference to “effect estimates” made in the proper context may be reasonable (4).

We encourage authors to refer to the work of Lederer et al (5), as well as to the STROBE guidelines. Consultation with a biostatistician is also encouraged for most analyses, especially where control of confounding for causal inference is concerned.

The modern data deluge brings both opportunities and risks. The increasing availability of data may help in accelerating research, replicating results, and ensuring that leads identified for further inquiry are sound. But the relative ease with which results can be generated from such datasets also opens the door to distraction and false starts. An open understanding of and rigorous approach to causal inference with observational data stands to ensure this valuable methodology makes important contributions to critical care research.



1. Lederer DJ, Bell SC, Branson RD, et al. Control of confounding and reporting of results in causal inference studies: guidance for authors from editors of respiratory, sleep, and critical care journals. Ann Am Thorac Soc 2018 Sep 19. [Epub ahead of print]
2. Vincent JL. We should abandon randomized controlled trials in the intensive care unit. Crit Care Med 2010; 38:S534–S538
3. Johnson AE, Pollard TJ, Shen L, et al. MIMIC-III, a freely accessible critical care database. Sci Data 2016; 3:160035
4. Hernán MA. The C-word: Scientific euphemisms do not improve causal inference from observational data. Am J Public Health 2018; 108:616–619
5. Lederer DJ, Bell SC, Branson RD, et al. Control of confounding and reporting of results in causal inference studies: guidance for authors from editors of respiratory, sleep, and critical care journals. Ann Am Thorac Soc 2018 Sep 19. [Epub ahead of print]. Accessed October 12, 2018

Key Words: causal inference; control of confounding; observational studies

Copyright © 2019 by the Society of Critical Care Medicine and Wolters Kluwer Health, Inc. All Rights Reserved.