More on Selection Bias

Shahar, Eyal; Shahar, Doron J.

doi: 10.1097/EDE.0b013e3181d7ec13

To the Editor:

Hernán has argued that period-specific hazard ratios “have a built-in selection bias.”1 The claim was based both on a concept of susceptibility to a harmful exposure and on colliding bias in a causal directed acyclic graph (DAG). We would like to offer the following comments.

The term “susceptible” was not explained, but could be interpreted as people of the causative type (deterministic model).2 Under that interpretation, however, early depletion of the causative type in the exposed may be accompanied by early depletion of the preventive type in the unexposed. In the Women's Health Initiative, for example, participants who are susceptible in a positive sense (ie, women in whom taking hormones prevents coronary heart disease) would have developed coronary heart disease early on if they had received placebo. As a result, fewer events were also expected in the placebo group over time, and the net effect of depleting “causative” and “preventive” types on the period-specific hazard ratios is unpredictable. Note that in a deterministic world, the estimated effect from the trial is of no interest to any woman3; it does not inform any woman about her deterministic type. Under an indeterministic model (which is supported by quantum mechanics and other arguments), time-dependent susceptibility implies modification of relative causal propensity (eg, hormones versus placebo) by other time-dependent causal variables. Causal knowledge may be gained when a modifying variable is proposed. In summary, the concept of susceptibility does not show any predictable or unique deficiency of the period-specific hazard ratios.

Far more illuminating is the DAG perspective of colliding bias,4,5 and its implications for every measure of effect from nonrandomized studies. The Figure extends the DAG for hazard ratio-related colliding bias in a randomized trial5 to a nonrandomized study. E and D are time-indexed exposure status and disease status, respectively; the time-indexed Q represents all causes of the disease that are not effects of E. When the hazard ratio is computed to estimate the effect of E0 (baseline exposure status) on D2, we condition on D1 (by restricting the sample to those who remained event-free by time 2) and thereby, as shown previously,5 open the blocked path E0 → D1 ← Q0 → Q1 → D2. Note, however, that we traditionally condition on D0 and earlier D variables as well: people with prevalent disease at baseline are usually excluded in order to study the effect of baseline exposure status on incident disease during follow-up. That conditioning creates colliding bias in a nonrandomized study by opening blocked paths, such as E0 ← E−1 → D0 ← Q−1 → Q0 → Q1 → D2, for any measure of association between E0 and D2, including the confounder-adjusted cumulative probability difference. The magnitude of the bias, however, might be small when prevalent disease is rare.6

Figure. A directed acyclic graph showing colliding bias in a nonrandomized study, caused by excluding people with prevalent disease at baseline. A box denotes conditioning.
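The path-opening argument can be checked mechanically. The following Python sketch is an illustration only, not a reproduction of the Figure: the edge list is an assumption that contains just the arrows needed for the two paths discussed (it omits, for example, any arrows among the D variables and from Q to E), the node names E-1 and Q-1 stand for the variables indexed at time −1, and the function name path_is_open is hypothetical. It applies the usual rule that a path is open when every collider on it is conditioned on (directly or through a descendant) and no noncollider on it is conditioned on.

```python
# Assumed edges (parent, child): only the arrows needed for the two paths.
EDGES = {
    ("E-1", "E0"), ("E-1", "D0"),
    ("Q-1", "D0"), ("Q-1", "Q0"),
    ("Q0", "Q1"), ("Q0", "D1"),
    ("Q1", "D2"), ("E0", "D1"),
}

TRIAL_PATH = ["E0", "D1", "Q0", "Q1", "D2"]
COHORT_PATH = ["E0", "E-1", "D0", "Q-1", "Q0", "Q1", "D2"]


def descendants(node, edges):
    """Return every node reachable from `node` along directed edges."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent, child in edges:
            if parent == current and child not in found:
                found.add(child)
                stack.append(child)
    return found


def path_is_open(path, edges, conditioned):
    """Check whether one specific path is open given a conditioning set."""
    conditioned = set(conditioned)
    # Every consecutive pair on the path must be joined by an arrow.
    for a, b in zip(path, path[1:]):
        if (a, b) not in edges and (b, a) not in edges:
            raise ValueError(f"{a} and {b} are not adjacent")
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        is_collider = (prev, node) in edges and (nxt, node) in edges
        if is_collider:
            # A collider blocks unless it or a descendant is conditioned on.
            if not ({node} | descendants(node, edges)) & conditioned:
                return False
        elif node in conditioned:
            # A conditioned noncollider blocks the path.
            return False
    return True


print(path_is_open(TRIAL_PATH, EDGES, set()))          # blocked at D1
print(path_is_open(TRIAL_PATH, EDGES, {"D1"}))         # opened by restriction
print(path_is_open(COHORT_PATH, EDGES, set()))         # blocked at D0
print(path_is_open(COHORT_PATH, EDGES, {"D0"}))        # opened by exclusion
print(path_is_open(COHORT_PATH, EDGES, {"D0", "Q0"}))  # blocked again at Q0
```

Under these assumed edges, each path is blocked with no conditioning, opens once the collider on it (D1 or D0) is conditioned on, and is blocked again when a Q variable on the path is added to the conditioning set.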

One remedy is simple. If we wish to estimate the effect of baseline exposure on incident disease status by the hazard ratio (or by any other measure), we should condition on all known causes of the disease that are not effects of the exposure, regardless of whether they are also causes of the exposure (ie, regardless of whether they are confounders). Setting aside unfalsifiable truisms (eg, unknown causes of D might exist), the proposed solution should conjecturally hold for the period-specific hazard ratios as well, whether estimated from a randomized trial or from a nonrandomized study. Moreover, the DAG perspective suggests that conditioning on all known causes of D will remove both disease-related colliding bias and confounding bias, whereas conditioning only on confounders (those causes of D that are also causes of E) will not suffice. Of course, the cost of the remedy might be increased variance.
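A small exact calculation may help to illustrate the remedy in the simplest case, a randomized trial. Everything in the Python sketch below is hypothetical: the risks are invented for illustration, Q is reduced to a single time-fixed binary cause of D (standing in for Q0 and Q1), the exposure is assumed to affect D1 but to have no effect at all on D2 among those still event-free, and the discrete-time risk ratio in period 2 stands in for the period-specific hazard ratio.

```python
from fractions import Fraction as F

P_Q1 = F(1, 2)  # hypothetical P(Q = 1); E is randomized, so Q is independent of E


def risk_d1(e, q):
    """Hypothetical P(D1 = 1 | E = e, Q = q): both E and Q raise the risk."""
    return F(1, 10) + F(3, 10) * e + F(4, 10) * q


def risk_d2(e, q):
    """Hypothetical P(D2 = 1 | D1 = 0, E = e, Q = q): no effect of E."""
    return F(1, 10) + F(4, 10) * q


def period2_risk(e, q_values=(0, 1)):
    """P(D2 = 1 | D1 = 0, E = e), restricted to the listed values of Q."""
    survivors = F(0)
    events = F(0)
    for q in q_values:
        p_q = P_Q1 if q == 1 else 1 - P_Q1
        alive = p_q * (1 - risk_d1(e, q))  # event-free at the end of period 1
        survivors += alive
        events += alive * risk_d2(e, q)
    return events / survivors


# Period-2 risk ratio among the event-free, ignoring Q (conditions on D1 only).
crude_ratio = period2_risk(1) / period2_risk(0)

# The same ratio within each stratum of Q (conditions on D1 and on Q).
stratified_ratios = {
    q: period2_risk(1, (q,)) / period2_risk(0, (q,)) for q in (0, 1)
}

print("crude:", crude_ratio)
print("stratified:", stratified_ratios)
```

With these invented numbers the crude period-2 ratio is 14/17, although the exposure has no effect in period 2, whereas the ratio equals 1 within each stratum of Q. The sketch says nothing about the variance cost of stratification, and it does not cover unknown causes of D.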

Eyal Shahar

Division of Epidemiology and Biostatistics

Mel and Enid Zuckerman College of Public Health

University of Arizona

Tucson, AZ

Doron J. Shahar

Departments of Physics and Mathematics

College of Science

University of Arizona

Tucson, AZ


1. Hernán MA. The hazards of hazard ratios. Epidemiology. 2010;21:13–15.
2. Maldonado G, Greenland S. Estimating causal effects. Int J Epidemiol. 2002;31:422–429.
3. Shahar E. Estimating causal parameters without target populations. J Eval Clin Pract. 2007;13:814–816.
4. Glymour MM, Weuve J, Berkman LF, Kawachi I, Robins JM. When is baseline adjustment useful in analyses of change? An example with education and cognitive change. Am J Epidemiol. 2005;162:267–278.
5. Hernán MA, Hernández-Díaz S, Robins JM. A structural approach to selection bias. Epidemiology. 2004;15:615–625.
6. Greenland S. Quantifying biases in causal models: Classical confounding vs. collider-stratification bias. Epidemiology. 2003;14:300–306.
© 2010 Lippincott Williams & Wilkins, Inc.