
Commentary: Multiple Causes of Death: The Importance of Substantive Knowledge in the Big Data Era

Haneuse, Sebastien

doi: 10.1097/EDE.0000000000000566

From the Harvard T.H. Chan School of Public Health, Boston, MA.

This study was supported by the National Institutes of Health Grant R-01 CA181360-01.

The authors report no conflicts of interest.

Correspondence: Sebastien Haneuse, Department of Biostatistics, Harvard T.H. Chan School of Public Health, 655 Huntington Ave., Building II, Boston, MA 02115. Email:

During the past 10 years, large administrative and clinical electronic health databases have become valuable resources for public health research.1 The central appeal of these databases is that they often provide rich information on large populations at a relatively low cost. This, in turn, provides researchers with unique opportunities to address increasingly nuanced scientific questions. In this issue of Epidemiology, two very interesting papers consider one sliver of this backdrop, specifically settings where primary interest lies in death due to a specific cause and yet the observed data are complicated by the presence of other potential causes of death. Focusing on the estimation of causal effects within the traditional competing risks framework, Lesko and Lau2 consider the critical task of confounder adjustment through a simulation study that investigates two possible strategies. They conclude that, to mitigate confounding bias, one should attend not only to variables that potentially confound the association between the exposure and the primary cause of mortality but also to variables that potentially confound the association between the exposure and the competing cause(s). The second paper seeks to move beyond the traditional competing risks notion that observed deaths are due to a single cause. Specifically, Moreno-Betancur et al.3 motivate and develop a novel set of targets of estimation that acknowledge the presence of multiple causes of death through a user-specified weighting scheme. They refer to this framework as multiple-cause mortality modeling.

Although these papers approach the challenge of multiple causes of death from different perspectives, they are both emblematic of a trend toward recognizing and accommodating competing risks as a challenge in epidemiologic studies of mortality.4 As the general population ages, death will also play an increasingly prominent (and detrimental) role in studies of nonterminal events (i.e., events that, in contrast to death, do not preclude study participants from experiencing subsequent outcomes). Studies of cognitive decline and Alzheimer’s disease, for example, have been shown to be prone to survival bias, a form of selection bias that arises when those who survive to be evaluated for the outcome are not representative of the underlying population.5 In such settings, the data are sometimes referred to as semicompeting risks, with “semi” referring to the fact that death is a competing risk for the event of interest but not vice versa.6,7 The Figure illustrates this concept, distinguishing semicompeting risks from the more familiar competing risks set-up. Although semicompeting risks are not widely appreciated as being distinct from the traditional competing risks setting, there is an emerging literature on methods for semicompeting risks data, providing frameworks and software for formulating causal research questions8 and for performing statistical analyses.9,10 Furthermore, in the context of longitudinal data analysis, methods are being developed to help researchers quantify the potential survival bias.11 Collectively, these methods, along with those proposed in Lesko and Lau2 and Moreno-Betancur et al.,3 provide researchers with opportunities to obtain greater insight into the associations and mechanisms we seek to understand.
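The distinction between the two data settings can be made concrete with a small sketch. The Python below is purely illustrative and is not taken from any of the cited papers or software; the function and field names are hypothetical, and the latent event times are assumed to be known so that the observed data can be derived from them. Under competing risks only the first event is seen, whereas under semicompeting risks death cuts off follow-up for the nonterminal event, yet a nonterminal event does not prevent a later death from being recorded.

```python
from dataclasses import dataclass


@dataclass
class SemicompetingRecord:
    # All names here are hypothetical, chosen for illustration only.
    time_nonterminal: float  # time of nonterminal event, or end of follow-up for it
    had_nonterminal: bool    # True if the nonterminal event was observed
    time_terminal: float     # time of death, or censoring time
    died: bool               # True if death was observed


def observe_competing(t_cause1: float, t_cause2: float, t_censor: float):
    """Competing risks: only the earliest event (or censoring) is observed.

    Returns (time, cause) with cause 0 = censored, 1 = cause 1, 2 = cause 2.
    """
    if t_censor <= min(t_cause1, t_cause2):
        return t_censor, 0
    if t_cause1 <= t_cause2:
        return t_cause1, 1
    return t_cause2, 2


def observe_semicompeting(t_event: float, t_death: float, t_censor: float):
    """Semicompeting risks: death precludes the nonterminal event,
    but the nonterminal event does not preclude a later death."""
    end = min(t_death, t_censor)   # follow-up stops at death or censoring
    had_event = t_event < end      # event counts only if it precedes that time
    return SemicompetingRecord(
        time_nonterminal=t_event if had_event else end,
        had_nonterminal=had_event,
        time_terminal=end,
        died=t_death <= t_censor,
    )


# Nonterminal event at 2, death at 5, censoring at 10: both are observed.
both = observe_semicompeting(2.0, 5.0, 10.0)
# Death at 5 comes first: the nonterminal event at 6 is never observed.
death_first = observe_semicompeting(6.0, 5.0, 10.0)
# With the same times treated as competing causes, only the first event is seen.
first_only = observe_competing(2.0, 5.0, 10.0)
```

In the first semicompeting example the participant contributes both an event time and a death time, which is exactly the information a competing risks record discards.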



The papers by Lesko and Lau2 and Moreno-Betancur et al.3 also serve to highlight the critical role that careful scientific thinking plays in the conduct of epidemiology in the modern age. The methods described in both papers require subject-matter knowledge, both in terms of how the observed data are generated and in terms of how the results of the analyses are to be interpreted and translated into practice. In Lesko and Lau,2 this issue manifests through the choice of adjustment variables when performing causal inference, and in Moreno-Betancur et al.3 through the choice of weights attributed to different causes of death when defining estimands of interest. While this observation is not new,12 it may be more pertinent now than ever as researchers seek to leverage more and more complex datasets, which may not have been collected for the specific research agenda to which they are being applied, with more and more complex methods. While Lesko and Lau assert that “the field of epidemiology has largely moved away from automated variable selection procedures,” with the advent of the Big Data era and an increased emphasis on the use of large electronic administrative and electronic health records databases, this trend may be reversing. In particular, in seeking to address the very real challenges associated with analyzing huge amounts of complex data, researchers are turning to computationally intensive algorithms for a range of tasks, including the control of confounding bias.13–16 Furthermore, these ideas and methods are likely to gain in popularity with the proliferation of online courses and formal educational programs in Big Data and Data Science, many of which place substantial emphasis on modern methods in computer science. While the need for algorithm-based methods is clear, the success of future research endeavors will require their marriage with substantive knowledge and thoughtful study design. The papers by Moreno-Betancur et al.3 and Lesko and Lau2 remind us that, as in all marriages, as one seeks to make the “right” choices, be it in regard to a weighting scheme or in the choice of confounders, a degree of pragmatism will likely be required if progress is to be made.



1. Gallego B, Dunn AG, Coiera E. Role of electronic health records in comparative effectiveness research. J Comp Eff Res. 2013;2:529–532.
2. Lesko C, Lau B. Bias due to confounders for the exposure-competing risk relationship. Epidemiology. 2017;28:20–27.
3. Moreno-Betancur M, Sadaoui H, Piffaretti C, Rey G. Survival analysis with multiple causes of death: extending the competing risks model. Epidemiology. 2017;28:12–19.
4. Lau B, Cole SR, Gange SJ. Competing risk regression models for epidemiologic data. Am J Epidemiol. 2009;170:244–256.
5. Hernán MA, Hernández-Díaz S, Robins JM. A structural approach to selection bias. Epidemiology. 2004;15:615–625.
6. Fine JP, Jiang H, Chappell R. On semi-competing risks data. Biometrika. 2001;88:907–919.
7. Haneuse S, Lee KH. Semi-competing risks data analysis: accounting for death as a competing risk when the outcome of interest is nonterminal. Circ Cardiovasc Qual Outcomes. 2016;9:322–331.
8. Tchetgen Tchetgen EJ. Identification and estimation of survivor average causal effects. Stat Med. 2014;33:3601–3628.
9. Tchetgen Tchetgen EJ, Phiri K, Shapiro R. A simple regression-based approach to account for survival bias in birth outcomes research. Epidemiology. 2015;26:473–480.
10. Lee KH, Haneuse S, Schrag D, Dominici F. Bayesian semi-parametric analysis of semi-competing risks data: investigating hospital readmission after a pancreatic cancer diagnosis. J R Stat Soc Ser C Appl Stat. 2015;64:253–273.
11. Mayeda ER, Tchetgen Tchetgen EJ, Power MC, et al. A simulation platform for quantifying survival bias: an application to research on determinants of cognitive decline. Am J Epidemiol. 2016;184:378–387.
12. Hernán MA, Hernández-Díaz S, Werler MM, Mitchell AA. Causal knowledge as a prerequisite for confounding evaluation: an application to birth defects epidemiology. Am J Epidemiol. 2002;155:176–184.
13. Schneeweiss S, Rassen JA, Glynn RJ, Avorn J, Mogun H, Brookhart MA. High-dimensional propensity score adjustment in studies of treatment effects using health care claims data. Epidemiology. 2009;20:512–522.
14. Wang C, Parmigiani G, Dominici F. Bayesian effect estimation accounting for adjustment uncertainty. Biometrics. 2012;68:661–671.
15. Patorno E, Glynn RJ, Hernández-Díaz S, Liu J, Schneeweiss S. Studies with many covariates and few outcomes: selecting covariates and implementing propensity-score-based confounding adjustments. Epidemiology. 2014;25:268–278.
16. Pirracchio R, Petersen ML, van der Laan M. Improving propensity score estimators’ robustness to model misspecification using super learner. Am J Epidemiol. 2015;181:108–119.
Copyright © 2017 Wolters Kluwer Health, Inc. All rights reserved.