The incredible strength of instrumental variable (IV) analyses is clear: if we are lucky enough to have a true instrument, we can perhaps learn about the causal effect of a treatment on an outcome without ever identifying, measuring, or adjusting for baseline confounding. In other words, IV analyses help address one of the major criticisms of causal effect estimation using more traditional analytic strategies that require measuring baseline confounders. In some settings, however, this asset has been misconstrued or exaggerated: sometimes the way investigators present or discuss their IV analyses can leave a reader with the impression that IV methods overcome not just this bias but all biases of the more traditional strategies. This is not so.
Indeed, one threat to the validity of IV analyses is selection or collider stratification bias.1 While the possibility of selection bias is clear in many of the seminal papers on IV methods,2,3 the relevant nuances and subtleties have been lost in translation to the more frequent end-users of IV methods. Recently, there has been more attention in the epidemiologic methods literature discussing selection biases in IV analyses, including the paper by Hughes and colleagues4 appearing in this issue, and a series of publications in the past three years which have described specific sources of selection bias,5–7 estimated plausible magnitudes of bias,8,9 and proposed means to mitigate some biases.5,6,8,10 Here, we connect the key messages from this methodologic literature for the applied researcher. Specifically, we try to clarify answers to some common questions regarding selection bias in IV analyses.
For ease of presentation, we assume throughout this guide that, were the selection bias in question absent, the proposed instrument is indeed an instrument (meaning that it is associated with treatment, does not affect the outcome except through treatment, and shares no causes with the outcome). We further assume that any homogeneity or monotonicity conditions relevant to a particular analysis hold.11 We illustrate key points with exemplar causal diagrams; readers unfamiliar with causal diagrams are encouraged to consult other resources to learn more.12 Moreover, given the ubiquity of Mendelian randomization analyses (i.e., studies in which genetic variants are proposed instruments), we address one question specific to that setting.
ARE IV ANALYSES VULNERABLE TO SELECTION BIAS?
The short answer, as an astute reader may suspect, is yes. Many of the forms of selection biases we have learned about when using more traditional confounder adjustment approaches can bias an IV analysis as well. Consider, as one example, loss to follow-up, in which treatment affects loss to follow-up, and loss to follow-up and the outcome share an unmeasured cause. Then, an IV analysis (as with other types of analyses) restricted to those with complete outcome measurement could be biased (Figure 1).7,10
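The loss-to-follow-up structure just described can be explored with a small simulation. What follows is a minimal sketch rather than an analysis from the cited papers: the data-generating model, its coefficients, and the names `wald_estimate` and `simulate_loss_to_follow_up` are all assumptions chosen for illustration. The treatment effect is set to zero, so any departure of the standard (Wald) IV estimate from zero among those with complete follow-up reflects bias introduced by the restriction.

```python
import numpy as np


def wald_estimate(z, a, y):
    """Standard IV (Wald) ratio for a binary proposed instrument z."""
    numerator = y[z == 1].mean() - y[z == 0].mean()
    denominator = a[z == 1].mean() - a[z == 0].mean()
    return numerator / denominator


def simulate_loss_to_follow_up(n=400_000, seed=0):
    """Hypothetical data: treatment affects loss to follow-up, and an
    unmeasured variable w affects both loss to follow-up and the outcome."""
    rng = np.random.default_rng(seed)
    z = rng.integers(0, 2, size=n)       # proposed instrument
    u = rng.normal(size=n)               # unmeasured treatment-outcome confounder
    w = rng.normal(size=n)               # unmeasured cause of follow-up and outcome
    a = (z + u + rng.normal(size=n) > 0.5).astype(int)   # treatment
    y = u + w + rng.normal(size=n)       # outcome; the treatment effect is zero
    followed = 2 * a + 2 * w + rng.normal(size=n) > 1    # outcome is recorded
    full = wald_estimate(z, a, y)
    restricted = wald_estimate(z[followed], a[followed], y[followed])
    return full, restricted


full, restricted = simulate_loss_to_follow_up()
print(f"all participants:        {full:.2f}")
print(f"complete follow-up only: {restricted:.2f}")
```

Under these assumed parameters, the estimate in all participants should sit near zero, while the estimate restricted to complete follow-up should be clearly negative even though treatment has no effect on the outcome.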
ARE THERE SOURCES OF SELECTION BIAS UNIQUE TO IV ANALYSES?
IV analyses are vulnerable to sources of bias owing to selection that may not otherwise affect more traditional confounder adjustment analyses in the same selected study population. One example of this is selecting on treatment.5,8,9 Consider a setting in which we would like to compare two particular treatments, but participants could also be treated with a third option (including no treatment). Selecting only participants who receive one of the two treatments of interest seems like the logical thing to do, but this very selection creates a possible selection bias in an IV analysis (Figure 2). Part of the reason that IV analyses are vulnerable to “unique” sources of bias is that some of these biases are functions of the unmeasured confounding of the treatment–outcome relation the IV analysis was trying to avoid: in Figure 2, for example, the open pathway goes through the unmeasured confounders of the treatment–outcome relation.
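A minimal simulation sketch of selecting on treatment follows. The three-option treatment model, the probabilities, and the function names are hypothetical choices for illustration, not taken from the cited papers. No treatment affects the outcome here, so a valid instrument should be unassociated with the outcome; an association that appears only after restricting to the two treatments of interest reflects the path opened through the unmeasured confounder `u`.

```python
import numpy as np


def instrument_outcome_difference(z, y):
    """Mean outcome difference across levels of a binary proposed instrument."""
    return y[z == 1].mean() - y[z == 0].mean()


def simulate_selecting_on_treatment(n=400_000, seed=1):
    rng = np.random.default_rng(seed)
    z = rng.integers(0, 2, size=n)       # proposed instrument
    u = rng.normal(size=n)               # unmeasured confounder
    # Both the proposed instrument and u affect whether a participant
    # receives either of the two treatments of interest at all.
    treated = z + u + rng.normal(size=n) > 0
    # Among the treated, the proposed instrument shifts the choice toward B.
    prefers_b = rng.random(n) < np.where(z == 1, 0.7, 0.3)
    # 0 = third option (here, no treatment), 1 = treatment A, 2 = treatment B
    a = np.where(treated, np.where(prefers_b, 2, 1), 0)
    y = u + rng.normal(size=n)           # no treatment affects the outcome
    selected = a > 0                     # keep only recipients of A or B
    everyone = instrument_outcome_difference(z, y)
    restricted = instrument_outcome_difference(z[selected], y[selected])
    return everyone, restricted


everyone, restricted = simulate_selecting_on_treatment()
print(f"instrument-outcome difference, everyone:      {everyone:.3f}")
print(f"instrument-outcome difference, A or B only:   {restricted:.3f}")
```

Because the weights needed to undo this selection would depend on `u`, which is unmeasured by construction, the sketch also hints at why this form of selection is hard to correct.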
Another difficulty for IV analyses is that “time zero” (the time at which treatment is assigned, eligibility criteria are met, and outcome recording begins) can be difficult to define, especially if the proposed instrument and treatment are not set close in time.6,13 For example, Mendelian randomization studies may implicitly apply eligibility criteria in adulthood or a later age, yet time zero arguably occurs when the genetic variants proposed as instruments are set at conception. This could create a selection bias even if the research question is about later-life exposures (Figure 3).6,7 Thus, a Mendelian randomization study of the effects of late-life cholesterol levels on dementia risk could be biased if earlier cholesterol levels (which presumably are affected by the genetic variants proposed as instruments) affect living long enough to be at risk of dementia. Time zero misalignments more generally can create selection (or immortal time) biases for any type of analysis, but the unique issues in defining time zero for an IV analysis are often unstated.
Of course, the reverse question could be asked: do IV analyses avoid some selection biases that may afflict other types of analyses? Generally, it is probably reasonable to suspect that a selection or stratification that biases a more traditional analysis in a given dataset might also bias an IV analysis. Note, however, that this is not universally true: consider the causal diagram in Figure 4, which, like the causal diagram in Figure 1, could depict loss to follow-up, but in this particular case, the IV analysis would not be biased.
DO WE NEED TO APPROACH SELECTION BIAS DIFFERENTLY DEPENDING ON WHETHER THE GOAL OF OUR IV ANALYSIS IS ESTIMATION OF A SPECIFIC CAUSAL EFFECT COMPARED WITH TESTING A CAUSAL NULL HYPOTHESIS?
Selection biases generally mean that our proposed instrument is not an instrument in the selected population. This means that whatever we were hoping to do with that instrument (e.g., bound the average causal effect, estimate the average causal effect, estimate the average causal effect in the “compliers,” or test a causal null hypothesis) could be prone to bias.11,14,15 One exception is the setting in which the selection is based entirely on the outcome, and we are only interested in testing the sharp causal null hypothesis: under the sharp causal null, this selection would not bias the IV analysis, which means that any association detected between the proposed instrument and outcome in the selected sample would be evidence against the sharp causal null even though a nonnull causal effect estimate could be biased (Figure 5).4
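The exception for selection based entirely on the outcome can be illustrated with a short simulation. This is a sketch under assumed conditions: the compliance types, their outcome shifts, the selection threshold, and the function names are all hypothetical. The same selection rule is applied once with no treatment effect (`beta=0`) and once with a true effect of 1 (`beta=1`).

```python
import numpy as np


def wald_estimate(z, a, y):
    """Standard IV (Wald) ratio for a binary proposed instrument z."""
    numerator = y[z == 1].mean() - y[z == 0].mean()
    denominator = a[z == 1].mean() - a[z == 0].mean()
    return numerator / denominator


def simulate_outcome_selection(beta, n=400_000, seed=0):
    """Return Wald estimates in everyone and in those selected on the outcome."""
    rng = np.random.default_rng(seed)
    z = rng.integers(0, 2, size=n)       # proposed instrument
    # Hypothetical types: 0 = never-taker, 1 = complier, 2 = always-taker
    ctype = rng.choice([0, 1, 2], size=n, p=[0.25, 0.5, 0.25])
    a = np.where(ctype == 1, z, (ctype == 2).astype(int))   # treatment received
    # Type also shifts the outcome (-1, 0, +1): unmeasured confounding.
    y = beta * a + (ctype - 1) + rng.normal(size=n)
    selected = y > 1.0                   # selection depends on the outcome only
    return (wald_estimate(z, a, y),
            wald_estimate(z[selected], a[selected], y[selected]))


null_full, null_selected = simulate_outcome_selection(beta=0.0)
alt_full, alt_selected = simulate_outcome_selection(beta=1.0)
print(f"beta=0: everyone {null_full:.2f}, selected {null_selected:.2f}")
print(f"beta=1: everyone {alt_full:.2f}, selected {alt_selected:.2f}")
```

Under these assumptions, the estimate in the selected sample should stay near zero when the effect is truly zero, consistent with the test of the sharp causal null remaining valid, whereas with a true effect of 1 the estimate in the selected sample should fall far from 1.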
HOW CAN WE AVOID, ADDRESS, OR AT LEAST MITIGATE SELECTION BIAS IN AN IV ANALYSIS?
The easiest thing we can do to avoid selection bias is not to select on or adjust for a collider. That is, some of the biases can be avoided by simply designing and conducting our studies in a manner that does not inflict a selection bias. An example of this would be a study in which a time zero misalignment is avoidable (Figure 3).
For many settings, it will not be feasible to avoid some selection, however: consider again the ubiquity of loss to follow-up in any follow-up study. In such cases, the tools available to address or mitigate selection bias in IV analyses are similar to those we could apply in other analyses. If we are concerned about the selection bias in Figure 1, we could, for example, create inverse probability of censoring weights (if we measured the appropriate variables) and then conduct the IV analysis in the weighted dataset.10 Hughes and colleagues4 consider another approach to compute inverse probability weights that require information about selection beyond the dataset in hand.
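The weighting idea can be sketched as follows. This is an illustration under stated assumptions, not the procedure of any cited paper: the shared cause of follow-up and outcome (`l_meas`) is assumed to be binary and measured in everyone, the follow-up probabilities are invented, and the weights are estimated nonparametrically within cells of treatment and `l_meas`. The treatment effect is again zero.

```python
import numpy as np


def weighted_wald(z, a, y, weights):
    """Wald ratio computed from weighted means within instrument levels."""
    def wmean(v, mask):
        return np.average(v[mask], weights=weights[mask])
    numerator = wmean(y, z == 1) - wmean(y, z == 0)
    denominator = wmean(a, z == 1) - wmean(a, z == 0)
    return numerator / denominator


def simulate_ipcw(n=1_000_000, seed=2):
    rng = np.random.default_rng(seed)
    z = rng.integers(0, 2, size=n)            # proposed instrument
    u = rng.normal(size=n)                    # unmeasured confounder
    l_meas = rng.integers(0, 2, size=n)       # measured cause of follow-up and outcome
    a = (z + u + rng.normal(size=n) > 0.5).astype(int)
    y = u + 2 * l_meas + rng.normal(size=n)   # treatment effect is zero
    followed = rng.random(n) < 0.2 + 0.3 * a + 0.4 * l_meas

    # Estimate P(followed | treatment, l_meas) by the observed proportion
    # in each of the four cells, then invert to obtain the weights.
    p_followed = np.empty(n)
    for a_val in (0, 1):
        for l_val in (0, 1):
            cell = (a == a_val) & (l_meas == l_val)
            p_followed[cell] = followed[cell].mean()
    weights = 1.0 / p_followed

    zf, af, yf = z[followed], a[followed], y[followed]
    unweighted = weighted_wald(zf, af, yf, np.ones(zf.size))
    weighted = weighted_wald(zf, af, yf, weights[followed])
    return unweighted, weighted


unweighted, weighted = simulate_ipcw()
print(f"complete follow-up, unweighted: {unweighted:.2f}")
print(f"complete follow-up, weighted:   {weighted:.2f}")
```

The weighting works in this sketch only because follow-up depends on measured variables alone; if an unmeasured variable also drove follow-up, the same weights would not be expected to remove the bias.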
Note, however, that inverse probability weighting (or similar adjustment procedures) may not be practical in all cases. In the setting of selecting on treatment (Figure 2), the weights we would need to estimate would be functions of the very same (and therefore likely unmeasured) variables that motivated the use of an IV analysis in the first place.5 This means that it may be the case that, if unmeasured confounding of the treatment–outcome relation motivated the use of the IV analysis, then unfortunately our IV analysis may not be able to avoid the selection bias (see the Appendix of Ref. 5 for further details).
CAN WE QUANTIFY THE EXISTENCE, DIRECTION, OR MAGNITUDE OF BIAS IN AN IV ANALYSIS?
Unfortunately, we can never prove to ourselves that selection bias is not a problem. Our best bet is to reason carefully about what we know about any selection into the study population, perhaps using causal diagrams, to explore its existence. It is sometimes possible to falsify the IV model, however.16–18 For example, applying the instrumental inequalities may allow us to detect a violation of the model, including a violation owing to selection bias. Such falsification strategies would not necessarily inform us whether the violation detected was owing to selection bias versus another violation of the instrumental conditions.
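For a discrete proposed instrument Z, treatment A, and outcome Y, one commonly stated form of the instrumental inequalities (usually attributed to Pearl) requires that, for every treatment level a, the sum over y of the maximum over z of P(A=a, Y=y | Z=z) not exceed 1. The text above does not specify which form is meant, so treating this as the relevant inequality is an assumption, and the function names below are hypothetical. The sketch handles the all-binary case.

```python
import numpy as np


def conditional_table(z, a, y):
    """Return p[z, a, y] = P(A=a, Y=y | Z=z) from binary data arrays."""
    z, a, y = (np.asarray(v, dtype=int) for v in (z, a, y))
    p = np.zeros((2, 2, 2))
    for z_val in (0, 1):
        in_arm = z == z_val
        for a_val in (0, 1):
            for y_val in (0, 1):
                p[z_val, a_val, y_val] = np.mean(
                    (a[in_arm] == a_val) & (y[in_arm] == y_val)
                )
    return p


def instrumental_inequality_statistic(p):
    """Max over a of the sum over y of the max over z of p[z, a, y]."""
    p = np.asarray(p, dtype=float)
    return p.max(axis=0).sum(axis=1).max()


def violates_instrumental_inequality(p, tol=1e-12):
    """True when the statistic exceeds 1, which falsifies the IV model."""
    return instrumental_inequality_statistic(p) > 1 + tol


# Indexing is p[z][a][y]. First table: Z=0 gives (A=0, Y=0), Z=1 gives (A=1, Y=1).
compatible = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]
# Second table: everyone is treated, yet the outcome flips with Z.
incompatible = [[[0, 0], [0, 1]], [[0, 0], [1, 0]]]

print(violates_instrumental_inequality(compatible))
print(violates_instrumental_inequality(incompatible))
```

Applied to sample proportions from `conditional_table`, this check ignores sampling variability, and a table that passes is merely not falsified; it is not thereby shown to come from a valid instrument.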
To date, most approaches for learning about the plausible magnitude or direction of selection bias in IV analyses have focused on simulations.4,5,10 The use of simulations is likely because the particular structure of the selection bias matters quite a bit. A bias formula or sensitivity analysis developed for estimating the local average treatment effect when selecting on treatment,9 for example, will not necessarily inform or be easy to adapt into a sensitivity analysis for estimating a causal effect when there is loss to follow-up (or even for estimating the average treatment effect when selecting on treatment).
WE PLAN TO CONDUCT A MENDELIAN RANDOMIZATION STUDY WITH MULTIPLE GENETIC VARIANTS. CAN WE AVOID SELECTION BIAS BY USING A METHOD THAT DOES NOT REQUIRE THAT ALL PROPOSED INSTRUMENTS ARE INSTRUMENTS?
Many of the premier Mendelian randomization studies feature new instrument-based estimators that do not, strictly speaking, require that all proposed instruments are valid instruments. For example, MR-Egger works in part by allowing the proposed instruments to be invalid as long as additional assumptions hold, including an assumption that the strength of the biasing pathway is independent of the strength of the proposed instrument–treatment relation.19 MR-Egger, unfortunately, would not solve selection bias. Consider Figure 6, an expansion of Figure 1 which now includes three proposed instruments that all are invalid instruments due to the loss to follow-up bias. In this case, the additional assumption needed for MR-Egger would be clearly violated: each biasing pathway is explicitly a function of the proposed instrument–treatment association. Likewise, methods that require a specific subset of proposed instruments to be valid (e.g., a median-based estimator) would also fail in Figure 6 because the selection bias affects all the proposed instruments.20
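The point about MR-Egger can be sketched with three simulated variants of differing strength. This is a deliberately simplified illustration: the variants are standardized normal stand-ins rather than allele counts, the Egger-style fit is an unweighted regression of variant–outcome slopes on variant–treatment slopes (MR-Egger as usually implemented weights by precision), and the model, coefficients, and function names are all hypothetical. The treatment effect is zero.

```python
import numpy as np


def per_variant_slopes(g, v):
    """Marginal least-squares slope of v on each column of g."""
    g_centered = g - g.mean(axis=0)
    return g_centered.T @ (v - v.mean()) / (g_centered ** 2).sum(axis=0)


def egger_style_fit(g, a, y):
    """Unweighted line through (variant-treatment, variant-outcome) slopes."""
    slope, intercept = np.polyfit(
        per_variant_slopes(g, a), per_variant_slopes(g, y), 1
    )
    return slope, intercept


def simulate_three_variants(n=400_000, seed=3):
    rng = np.random.default_rng(seed)
    strengths = np.array([0.2, 0.4, 0.6])   # variant-treatment effects
    g = rng.normal(size=(n, 3))             # stand-ins for three genetic variants
    u = rng.normal(size=n)                  # unmeasured confounder
    w = rng.normal(size=n)                  # unmeasured cause of follow-up and outcome
    a = g @ strengths + u + rng.normal(size=n)
    y = u + w + rng.normal(size=n)          # treatment effect is zero
    followed = a + w + rng.normal(size=n) > 0
    everyone = egger_style_fit(g, a, y)
    restricted = egger_style_fit(g[followed], a[followed], y[followed])
    return everyone, restricted


(full_slope, full_intercept), (sel_slope, sel_intercept) = simulate_three_variants()
print(f"everyone:           slope {full_slope:.2f}, intercept {full_intercept:.3f}")
print(f"complete follow-up: slope {sel_slope:.2f}, intercept {sel_intercept:.3f}")
```

In this sketch the bias in each variant–outcome slope scales with that variant's strength, so among those followed up the three points should lie close to a line through the origin with a nonzero slope: the intercept gives little warning, and the slope does not recover the null effect.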
WE PLAN TO CONDUCT AN INSTRUMENTAL VARIABLE ANALYSIS USING A PUBLICLY AVAILABLE DATABASE. DO WE NEED TO CONSIDER THE POSSIBILITY OF SELECTION BIAS?
It is becoming increasingly common to use large public datasets (e.g., biobanks) to conduct Mendelian randomization studies or other IV analyses. When the study includes a highly selected population, as in the example of the UK Biobank discussed in this issue,4 we must grapple again and again with selection bias as a possible threat to any causal analysis using these data. The existence, direction, and degree of bias will be specific to each new study question, but an awareness of the entire study design will always be necessary. Without careful consideration applied to each study question, there is a risk that these publicly available resources will lead to many biased analyses.
Investigators conducting IV analyses need to carefully consider the real possibility of selection biases. Of course, there is no reason to single out selection bias: if we think about time-dependent confounding, information bias, ill-defined interventions, or any number of other threats to the validity of other analyses in observational data, some component of each of these can also be a threat to the validity of IV analyses. Our goal here is to describe some practical ways to think about what selection biases can be avoided, what can be done about those that cannot be avoided, and how to stay mindful about whatever continues to lurk.
I thank Elizabeth Diemer, Jeremy Labrecque, and Saskia le Cessie for helpful comments.
ABOUT THE AUTHOR
SONJA A. SWANSON is an assistant professor in the Department of Epidemiology, Erasmus MC. Her methodologic research focuses on developing, improving, and increasing the transparency of causal inference methods in epidemiology.
1. Hernán MA, Hernández-Díaz S, Robins JM. A structural approach to selection bias. Epidemiology. 2004;15:615–625.
2. Robins JM. Correcting for non-compliance in randomized trials using structural nested mean models. Commun Stat. 1994;23:2379–2412.
3. Robins JM. Analytic methods for estimating HIV-treatment and cofactor effects. In: Ostrow DG, Kessler RC, eds. Methodological Issues in AIDS Behavioral Research. Berlin: Springer; 2002:213–288.
4. Hughes RA, Davies NM, Davey Smith G, Tilling K. Selection bias when estimating average treatment effects using one-sample instrumental variable analysis. Epidemiology. 2019;30:350–357.
5. Swanson SA, Robins JM, Miller M, Hernán MA. Selecting on treatment: a pervasive form of bias in instrumental variable analyses. Am J Epidemiol. 2015;181:191–197.
6. Swanson SA, Tiemeier H, Ikram MA, Hernán MA. Nature as a trialist?: deconstructing the analogy between Mendelian randomization and randomized trials. Epidemiology. 2017;28:653–659.
7. Boef AG, le Cessie S, Dekkers OM. Mendelian randomization studies in the elderly. Epidemiology. 2015;26:e15–e16.
8. Ertefaie A, Small D, Flory J, Hennessy S. Selection bias when using instrumental variable methods to compare two treatments but more than two treatments are available. Int J Biostat. 2016;12:219–232.
9. Ertefaie A, Small D, Flory J, Hennessy S. A sensitivity analysis to assess bias due to selecting subjects based on treatment received. Epidemiology. 2016;27:e5–e7.
10. Canan C, Lesko C, Lau B. Instrumental variable analyses and selection bias. Epidemiology. 2017;28:396–398.
11. Hernán MA, Robins JM. Instruments for causal inference: an epidemiologist’s dream? Epidemiology. 2006;17:360–372.
12. Greenland S, Pearl J, Robins JM. Causal diagrams for epidemiologic research. Epidemiology. 1999;10:37–48.
13. Hernán MA, Sauer BC, Hernández-Díaz S, Platt R, Shrier I. Specifying a target trial prevents immortal time bias and other self-inflicted injuries in observational analyses. J Clin Epidemiol. 2016;79:70–75.
14. Swanson SA, Hernán MA, Miller M, Robins JM, Richardson TS. Partial identification of the average treatment effect using instrumental variables: review of methods for binary instruments, treatments, and outcomes. J Am Stat Assoc. 2018;113:933–947.
15. Swanson SA, Labrecque J, Hernán MA. Causal null hypotheses of sustained treatment strategies: what can be tested with an instrumental variable? Eur J Epidemiol. 2018;33:723–728.
16. Bonet B. Instrumentality tests revisited. In: Breese JS, Koller D, eds. UAI ’01: Proceedings of the 17th Conference on Uncertainty in Artificial Intelligence. San Francisco, CA: Morgan Kaufmann Publishers, Inc.; 2001:48–55.
17. Glymour MM, Tchetgen Tchetgen EJ, Robins JM. Credible Mendelian randomization studies: approaches for evaluating the instrumental variable assumptions. Am J Epidemiol. 2012;175:332–339.
18. Labrecque J, Swanson SA. Understanding the assumptions underlying instrumental variable analyses: a brief review of falsification strategies and related tools. Curr Epidemiol Rep. 2018;5:214–220.
19. Bowden J, Davey Smith G, Burgess S. Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression. Int J Epidemiol. 2015;44:512–525.
20. Bowden J, Davey Smith G, Haycock PC, Burgess S. Consistent estimation in Mendelian randomization with some invalid instruments using a weighted median estimator. Genet Epidemiol. 2016;40:304–314.