Instrumental variables are routinely used to recover a consistent estimator of an exposure causal effect in the presence of unmeasured confounding. Instrumental variable approaches to account for nonignorable missing data also exist but are less familiar to epidemiologists. Like instrumental variables for exposure causal effects, instrumental variables for missing data rely on exclusion restriction and instrumental variable relevance assumptions. Yet these two conditions alone are insufficient for point identification. For estimation, researchers have invoked a third assumption, typically involving fairly restrictive parametric constraints. Inferences can be sensitive to these parametric assumptions, which are typically not empirically testable. The purpose of our article is to discuss another approach for leveraging a valid instrumental variable. Although the approach is insufficient for nonparametric identification, it can nonetheless provide informative inferences about the presence, direction, and magnitude of selection bias, without invoking a third untestable parametric assumption. An important contribution of this article is an Excel spreadsheet tool that can be used to obtain empirical evidence of selection bias and calculate bounds and corresponding Bayesian 95% credible intervals for a nonidentifiable population proportion. For illustrative purposes, we used the spreadsheet tool to analyze HIV prevalence data collected by the 2007 Zambia Demographic and Health Survey (DHS).

# Implementation of Instrumental Variable Bounds for Data Missing Not at Random


## Abstract

Instrumental variables are routinely used in epidemiology as an approach to recovering a consistent estimator of an exposure causal effect in the presence of unmeasured confounding. Missing data present another frequent complication, but the commonly invoked assumption that data are missing completely at random or missing at random is often tenuous. Moreover, if the missingness depends on the possible unobserved value of the outcome of interest, the data are considered missing not at random. In such cases, missing data methods that only rely on the observed data are likely to fail to deliver a valid result. Instrumental variable approaches to account for selection bias attributable to outcome data missing not at random—where selection bias refers to divergence of the outcome means among complete cases from that of incomplete cases, conditional on observed covariates—also exist but are less familiar to epidemiologists.^{1–6} A valid instrumental variable in the context of data missing not at random must satisfy two conditions:

- (i) (exclusion restriction) The instrumental variable must not be directly related to the outcome in the underlying population.
- (ii) (IV relevance) The instrumental variable must be associated with the missingness mechanism.

These two conditions are depicted in the directed acyclic graph (DAG) in Figure 1. The instrumental variable ($Z$) is not directly related to the outcome ($Y$) but is associated with the missingness indicator ($R$). Throughout we let $R = 1$ denote complete cases and $R = 0$ denote incomplete cases. Often, assumptions (i) and (ii) hold only after conditioning on a set of fully observed covariates, as reflected in Figure 1. To simplify the exposition without loss of generality, we suppress the presence of covariates throughout the presentation.

Suppose one aims to estimate the population mean of the outcome $Y$, which is unobserved for a subset of individuals. As with instrumental variables for exposure causal effects, assumptions (i) and (ii) alone cannot identify the parameter of interest. For estimation, researchers have therefore invoked a third assumption, typically a fairly restrictive parametric constraint such as a bivariate Gaussian latent error model, which together with (i) and (ii) allows one to identify the outcome mean.^{3,7} For example, Bärnighausen and colleagues^{7} used a Heckman-type parametric selection model to correct for selection on unobserved correlates of the outcome. This third assumption is not empirically testable, and inferences can be sensitive to it.^{8–12}

In their discussion of Bärnighausen et al,^{7} Geneletti et al^{13} recommended sensitivity analysis as an alternative to the parametric approach used by Bärnighausen and colleagues. Later, McGovern et al^{14} introduced a flexible parametric copula approach as an alternative to the assumption of bivariate normality. In a recent article, Sun and colleagues^{15} provide necessary and sufficient identification conditions for a valid instrumental variable satisfying assumptions (i) and (ii). They also give easier-to-verify sufficient identifying conditions. These conditions are advantageous as they do not necessarily commit to a specific parametric model for identification and inference but allow the analyst to specify a range of parametric, semiparametric, or nonparametric models, which may provide a better fit to the data, provided their identification conditions are met. This is the approach taken by Tchetgen Tchetgen and Wirth.^{16} The purpose of this article is to discuss another inferential approach for leveraging a valid IV. Although this approach fails to point-identify the mean outcome of interest, it can nonetheless provide informative inferences about the presence, direction, and magnitude of selection bias, without invoking an untestable parametric assumption. In fact, a valid instrumental variable satisfying (i) and (ii) may be used to obtain (1) empirical evidence of selection bias and (2) IV bounds as independently proposed by Robins^{1} and Manski^{2} for the population outcome mean of interest. These bounds are often referred to as Manski bounds and originated in the context of inference on an average treatment effect, where data on the counterfactual outcome is logically missing. The idea was subsequently extended to situations with a monotone instrumental variable^{17} and an instrumental variable satisfying statistical rather than mean independence.^{18}

Similar to instrumental variable approaches for exposure causal effects, the exclusion restriction assumption (i) cannot be established from empirical data. However, it may sometimes be falsified empirically, indicating that a particular IV may be invalid; Wang and colleagues^{19} provide a falsification test in the context of exposure causal effects. In particular, assumption (i) will be falsified when the bounds produced by (2) are incoherent; that is, when either the upper instrumental variable bound is strictly smaller than the lower instrumental variable bound or the instrumental variable bounds are not contained within the nonparametric bounds, which do not utilize the instrumental variable.

We sought to develop an Excel spreadsheet tool (eAppendix 1; https://links.lww.com/EDE/B322) that can be used to obtain empirical evidence of selection bias and calculate IV bounds with corresponding Bayesian 95% credible intervals for a population proportion. For illustrative purposes, we used the tool to reanalyze HIV prevalence data collected by the 2007 Zambia Demographic and Health Survey (DHS)^{20} and reported by Bärnighausen et al.^{7}

### Evidence of Selection Bias

Selection bias will occur if, as shown in Figure 1, the $Y \rightarrow R$ edge indicating a non-null association between $Y$ and $R$ is present. An immediate consequence is that while (i) ensures that $Z$ and $Y$ are independent in the target population, (i) and (ii) together imply that $Z$ and $Y$ will generally be associated among complete cases attributable to collider bias induced by conditioning on collider $R$ along the $Z \rightarrow R \leftarrow Y$ path. Therefore, under assumptions (i) and (ii), empirical evidence of an association between the outcome and instrumental variable among complete cases supports the presence of selection bias. This empiric evidence can be established by standard methods including Pearson’s χ^{2} or Fisher’s exact test for polytomous $Z$ and likelihood ratio, Wald, or score tests in more general regression analysis. Our spreadsheet tool (eAppendix 1; https://links.lww.com/EDE/B322) assumes that $Y$ is binary and $Z$ is polytomous to compute Pearson χ^{2} test statistic and corresponding *P* value for the association between $Y$ and $Z$ conditional on $R = 1$. To our knowledge, our article is the first that explicitly leverages collider bias to test for selection bias with an instrumental variable in a missing data context. We note, however, that the proposed test is not consistent because its power fails to exceed the nominal type 1 error rate to detect selection bias when $\Pr(R = 1 \mid Y, Z)$ is multiplicative in $Y$ and $Z$, that is, when there exist functions $g$ and $h$ such that $\Pr(R = 1 \mid Y, Z) = g(Y)\,h(Z)$. In this case, $Y$ and $Z$ are guaranteed to be independent among complete cases despite the presence of selection bias for estimating the mean of $Y$.^{21}
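As a rough illustration of the test described above, the following Python sketch computes Pearson's χ^{2} statistic and its *P* value for the association between a binary outcome and a polytomous instrument among complete cases. It is not the spreadsheet's implementation: the function names (`chi2_sf`, `pearson_chi2`) and the counts are hypothetical, and the tail probability is computed from the standard closed-form recurrence for integer degrees of freedom so that only the standard library is needed.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed-form base cases: df = 1 (odd) or df = 2 (even).
    if df % 2:
        q, k = math.erfc(math.sqrt(half)), 1
    else:
        q, k = math.exp(-half), 2
    # Recurrence: Q(x; k + 2) = Q(x; k) + exp(-x/2) (x/2)^(k/2) / Gamma(k/2 + 1).
    while k < df:
        q += math.exp(-half) * half ** (k / 2.0) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q

def pearson_chi2(table):
    """table[z] = (count with Y = 1, count with Y = 0) among complete cases at level z.

    Assumes every row and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df)

# Hypothetical complete-case counts for a three-level instrument (not study data).
complete_cases = [(30, 170), (45, 155), (60, 140)]
stat, df, p_value = pearson_chi2(complete_cases)
print(f"chi2 = {stat:.2f}, df = {df}, P = {p_value:.4f}")
```

Under assumptions (i) and (ii), a small *P* value here would be read as evidence of selection bias; a large one does not rule it out, for the reason given above.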

### Instrumental Variable Bounds

First, we describe nonparametric bounds that do not utilize an instrumental variable.^{20} By the law of total probability,

$$E[Y] = E[Y \mid R = 1]\Pr(R = 1) + E[Y \mid R = 0]\Pr(R = 0).$$

All quantities on the right-hand side of the equation, except for $E[Y \mid R = 0]$, can be evaluated empirically. Therefore, lower and upper bounds for $E[Y]$, when $Y$ is bounded by two known values, may be obtained by substituting lower and upper bounds for the unobserved mean $E[Y \mid R = 0]$. In the case of binary $Y$, this produces the following interval:

$$E[Y \mid R = 1]\Pr(R = 1) \;\le\; E[Y] \;\le\; E[Y \mid R = 1]\Pr(R = 1) + \Pr(R = 0). \qquad (1)$$

Notably, these bounds are guaranteed to contain the complete-case mean, $E[Y \mid R = 1]$, and therefore can never rule it out as a plausible parameter value. The nonparametric bounds are also useful as they provide extreme case scenarios: unobserved outcomes are all cases ($Y = 1$) or all noncases ($Y = 0$), an eventuality that is highly unlikely in most settings. However, the length of this interval is equal to the proportion of the sample with incomplete data, $\Pr(R = 0)$. Thus, the resulting bounds may be too wide to be informative (unless the amount of missing data is negligible, in which case selection bias would be unlikely to be of concern).
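The worst-case calculation for a binary outcome can be sketched in a few lines of Python. The function name and the counts below are hypothetical and chosen only for illustration; the lower bound treats every missing outcome as a noncase and the upper bound treats every missing outcome as a case.

```python
def nonparametric_bounds(n_positive, n_complete, n_total):
    """Worst-case bounds for the mean of a binary outcome with missing values."""
    if not 0 < n_complete <= n_total or not 0 <= n_positive <= n_complete:
        raise ValueError("counts are inconsistent")
    p_observed = n_complete / n_total        # Pr(R = 1)
    mean_complete = n_positive / n_complete  # E[Y | R = 1]
    lower = mean_complete * p_observed       # all missing outcomes are noncases
    upper = lower + (1.0 - p_observed)       # all missing outcomes are cases
    return lower, upper

# Hypothetical sample: 1,000 people, 800 with an observed outcome, 120 of them cases.
lower, upper = nonparametric_bounds(n_positive=120, n_complete=800, n_total=1000)
print(f"nonparametric bounds: [{lower:.3f}, {upper:.3f}]")
```

In this made-up example the interval width equals the missing fraction (0.2), and the complete-case mean (0.15) lies inside the interval, as the text notes it always must.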

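When a categorical instrumental variable $Z$ is available, the length of the interval conferred by the nonparametric bounds may be narrowed sufficiently, potentially leading to more informative inferences about the population mean $E[Y]$. We begin by rewriting the above interval (1) conditional on subgroups defined by $Z$:

$$E[Y \mid R = 1, Z = z]\Pr(R = 1 \mid Z = z) \;\le\; E[Y \mid Z = z] \;\le\; E[Y \mid R = 1, Z = z]\Pr(R = 1 \mid Z = z) + \Pr(R = 0 \mid Z = z). \qquad (2)$$

If $Z$ satisfies the exclusion restriction (i) and instrumental variable relevance (ii) assumptions, Robins^{1} and Manski^{2} noted $E[Y \mid Z = z] = E[Y]$ such that the interval (2) can be simplified to:

$$E[Y \mid R = 1, Z = z]\Pr(R = 1 \mid Z = z) \;\le\; E[Y] \;\le\; E[Y \mid R = 1, Z = z]\Pr(R = 1 \mid Z = z) + \Pr(R = 0 \mid Z = z). \qquad (3)$$

Because the interval (3) holds for all levels of $Z$, Robins^{1} and Manski^{2} noted that the lower (4) and upper (5) bounds could be obtained by taking the maximum and minimum of the interval values defined by categories of $Z$, respectively, such that under assumptions (i) and (ii),

$$E[Y] \;\ge\; \max_{z}\,\bigl\{E[Y \mid R = 1, Z = z]\Pr(R = 1 \mid Z = z)\bigr\}, \qquad (4)$$

$$E[Y] \;\le\; \min_{z}\,\bigl\{E[Y \mid R = 1, Z = z]\Pr(R = 1 \mid Z = z) + \Pr(R = 0 \mid Z = z)\bigr\}. \qquad (5)$$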

In other words, the lower bound (4) is identified by choosing the value of $Z$ that maximizes the mean value of $Y$ among subjects with observed data and that value of $Z$ multiplied by the probability of observing that data among subjects with that value of $Z$. If $Z$ is a valid instrumental variable, it is mathematically impossible that the true mean of $Y$ falls below this value. An analogous reasoning establishes that it is not mathematically possible for the true mean to exceed the upper bound. We created an Excel spreadsheet tool (eAppendix 1; https://links.lww.com/EDE/B322) to accompany this article, which allows researchers to enter summary data on $Y$ among complete cases and $R$ and $Z$ for the observed sample to produce both nonparametric and IV bounds. The spreadsheet tool accommodates a binary outcome $Y$ and polytomous $Z$ (up to 50 levels). The Excel tool also computes Bayesian 95% credible intervals around the IV bounds and therefore for the nonidentifiable outcome mean $E[Y]$ by leveraging a well-known large-sample Gaussian approximation to the posterior distribution of estimated parameters used to obtain empirical bounds. The tool also computes the Pearson’s χ^{2} test statistic and corresponding *P* value for the association between $Y$ and $Z$ among complete cases.
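A minimal Python sketch of the point estimates of the IV bounds for a binary outcome follows. It is an illustration under stated assumptions rather than a description of the spreadsheet's internals: the function name `iv_bounds`, the input layout, and the counts are all hypothetical. Each level of the instrument contributes its own worst-case interval; the lower IV bound is the largest level-specific lower limit and the upper IV bound is the smallest level-specific upper limit.

```python
def iv_bounds(levels):
    """Empirical IV bounds for the mean of a binary outcome.

    levels: list of (n_positive, n_complete, n_total) tuples, one per level of
    the instrument. Returns (lower, upper, coherent), where coherent is False
    when the estimated lower bound exceeds the estimated upper bound.
    """
    lowers, uppers = [], []
    for n_positive, n_complete, n_total in levels:
        joint = n_positive / n_total            # E[Y | R=1, Z=z] * Pr(R=1 | Z=z)
        p_missing = 1.0 - n_complete / n_total  # Pr(R=0 | Z=z)
        lowers.append(joint)
        uppers.append(joint + p_missing)
    lower, upper = max(lowers), min(uppers)
    return lower, upper, lower <= upper

# Hypothetical counts for a three-level instrument (not study data).
levels = [(30, 200, 300), (45, 200, 250), (60, 200, 220)]
lower, upper, coherent = iv_bounds(levels)
print(f"IV bounds: [{lower:.3f}, {upper:.3f}], coherent: {coherent}")
```

The returned flag only checks whether the estimated bounds cross; sampling variability still has to be handled separately, for example through the credible intervals discussed in this section.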

Last, we also provide the R programming code (eAppendix 2; https://links.lww.com/EDE/B323) that implements Bayesian credible intervals for the outcome mean with improved finite sample performance, by directly sampling from the exact posterior distribution for upper and lower bounds without relying on large-sample approximations.^{22} Specifically, the Bayesian approach proceeds by (a) placing independent Dirichlet priors on the joint distribution of the observed data within each level of the instrument; (b) drawing a large number of samples (10,000) from the posterior distribution; (c) discarding those that can be falsified empirically (that is, draws whose bounds are incoherent in the sense described in the introduction); (d) calculating the instrumental variable bounds (4) and (5) with each of the remaining samples; and (e) reporting the posterior median and quantile-based credible intervals from step (d). The resulting 95% credible interval contains the true outcome mean with 95% posterior probability. Frequentist alternatives to the Bayesian approach described above also exist to account for sampling variability of empirical bounds. Such approaches are not pursued here but can be found elsewhere.^{23–26}
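Steps (a) through (e) can be sketched in Python as follows. This is not the R program in eAppendix 2; it is a simplified illustration that makes several assumptions not stated in the text: a uniform Dirichlet(1, 1, 1) prior on the three observable cells within each level of the instrument, a binary outcome, and an interval formed from the 2.5% quantile of the lower bound and the 97.5% quantile of the upper bound among retained draws. The function names and counts are hypothetical.

```python
import random

def dirichlet_draw(alphas, rng):
    """One Dirichlet draw, built from independent gamma variates."""
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]

def quantile(sorted_values, q):
    """Simple index-based empirical quantile of an already sorted list."""
    return sorted_values[min(len(sorted_values) - 1, int(q * len(sorted_values)))]

def posterior_iv_interval(levels, n_draws=10_000, prior=1.0, seed=0):
    """levels: (n_positive, n_negative, n_missing) counts per instrument level."""
    rng = random.Random(seed)
    kept_lower, kept_upper = [], []
    for _ in range(n_draws):
        lowers, uppers = [], []
        for counts in levels:
            # Steps (a)-(b): conjugate Dirichlet posterior for the cell probabilities
            # Pr(Y=1, R=1 | z), Pr(Y=0, R=1 | z), Pr(R=0 | z). The prior is assumed.
            p_pos, _p_neg, p_miss = dirichlet_draw([prior + c for c in counts], rng)
            lowers.append(p_pos)
            uppers.append(p_pos + p_miss)
        lower, upper = max(lowers), min(uppers)  # step (d): bounds for this draw
        if lower <= upper:                       # step (c): drop incoherent draws
            kept_lower.append(lower)
            kept_upper.append(upper)
    if not kept_lower:
        raise ValueError("every posterior draw produced incoherent bounds")
    kept_lower.sort()
    kept_upper.sort()
    acceptance = len(kept_lower) / n_draws
    # Step (e): quantile-based interval (an assumed construction).
    return quantile(kept_lower, 0.025), quantile(kept_upper, 0.975), acceptance

# Hypothetical counts for a three-level instrument (not study data).
levels = [(30, 170, 100), (45, 155, 50), (60, 140, 20)]
ci_lower, ci_upper, acceptance = posterior_iv_interval(levels)
print(f"95% interval: [{ci_lower:.3f}, {ci_upper:.3f}], acceptance rate: {acceptance:.3f}")
```

The acceptance rate returned here is the share of posterior draws with coherent bounds; a very low rate would suggest that the instrument is being falsified by the data.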

### Empirical Illustration

#### Study Population

To illustrate the IV bounds using our Excel spreadsheet tool and R program, we reanalyzed data previously reported by Bärnighausen et al.^{7} Specifically, they estimated HIV prevalence using the HIV testing component of the 2007 Zambia Demographic and Health Survey.^{19} This cross-sectional, population-based survey, carried out over a 6-month period from April to October 2007, employed a complex sampling scheme to assess the general health status and family welfare among households in Zambia.^{16} HIV status defines the outcome $Y$, whereas $R$ indicates whether an HIV test result is available for the participant. At the initial visit, a household representative was asked to complete a short survey and provide basic demographic information on all usual household members and any visitors who stayed in the household the previous night. Of those listed, men and women aged 15–59 and 15–49 years (respectively) were eligible for an individual interview and HIV testing. In this illustration, we restrict attention to 7,146 eligible men who were identified from 7,164 household interviews; 7,116 (>99%) men had complete information from the household survey. Of those with complete information, 5,145 (72%) provided a specimen for HIV testing.

#### Instrumental Variables

Following Bärnighausen et al, we used household interviewer identity as the instrumental variable. Interviewer characteristics such as sex, personality, and interpersonal skills may lead to different response rates. However, because interviewers were randomly assigned to households, these factors are unlikely to be associated with an individual’s HIV status in the population. In the 2007 Zambia DHS, 48 distinct interviewers conducted interviews with men; interviewer was highly associated with HIV testing nonparticipation (*P*<0.001; range of nonparticipation for interviewers: 13%–53%).

Although Bärnighausen et al used interviewer identity as a fixed effect in their Heckman-type implementation of IV adjustment for selection bias, we employed the IV bounds using our Excel spreadsheet tool. We captured the association between interviewer identity and nonresponse by computing a propensity score from a logistic regression model for the response indicator $R$ with fixed effects for interviewer identity. We then categorized the estimated propensity scores into $k$ categories, which we then used as the instrumental variable ($Z$) in the Excel spreadsheet tool. This coarsened instrument is generally expected to be valid if interviewer identity is a valid IV across all interviewers, given that the former is solely a function of the latter (in large samples). We expect a bias-variance trade-off to occur with increasing levels of $k$ because fewer observations per category of the instrument will likely increase variability but decrease the length of the bound. IV bounds were generated for $k$ = 10, 20, 30, and 48, with 48 corresponding to the total number of interviewers in the data. We also calculated nonparametric bounds and computed Pearson’s χ^{2} test statistic and *P* value for testing selection bias.
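The coarsening step can be sketched in Python as shown here. Two assumptions are worth flagging. First, when a logistic model contains only interviewer fixed effects, its fitted propensity score for each interviewer equals that interviewer's observed response rate, so the sketch uses those rates directly instead of fitting a model. Second, the text does not say how the scores were cut into $k$ categories; grouping interviewers into equal-sized rank groups is an illustrative choice, and the function name and data are hypothetical.

```python
from collections import defaultdict

def coarsen_instrument(interviewer_ids, responded, k):
    """Map each interviewer to one of k categories ranked by response rate.

    interviewer_ids: one identifier per eligible person.
    responded: matching 0/1 response indicators (1 = outcome observed).
    """
    totals, hits = defaultdict(int), defaultdict(int)
    for interviewer, r in zip(interviewer_ids, responded):
        totals[interviewer] += 1
        hits[interviewer] += r
    if not 1 <= k <= len(totals):
        raise ValueError("k must be between 1 and the number of interviewers")
    # Fitted propensity score under interviewer-only fixed effects.
    rates = {i: hits[i] / totals[i] for i in totals}
    ranked = sorted(rates, key=lambda i: (rates[i], i))
    # Assumed rule: equal-sized groups of interviewers by propensity rank.
    return {i: rank * k // len(ranked) for rank, i in enumerate(ranked)}

# Hypothetical data: four interviewers with four eligible people each.
ids = ["a"] * 4 + ["b"] * 4 + ["c"] * 4 + ["d"] * 4
resp = [1, 1, 1, 1,  1, 1, 0, 0,  1, 0, 0, 0,  1, 1, 1, 0]
categories = coarsen_instrument(ids, resp, k=2)
print(categories)
```

Setting `k` equal to the number of interviewers recovers the original interviewer coding, which corresponds to the finest instrument considered in the analysis.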

#### Ethics

The original protocol for the HIV testing component of the 2007 Zambia Demographic and Health Survey was reviewed and approved by the Zambian Ministry of Health Tropical Diseases Research Centre Ethical Review Committee, the Institutional Review Board of Macro International, and CDC Atlanta. The current analysis of the deidentified survey data was determined to meet criteria for exemption by the Office of Human Research Administration at the Harvard T.H. Chan School of Public Health.

## RESULTS

Crude HIV prevalence was 12.2% (95% credible interval = 11.2%, 13.1%) compared with the IV bounds of 15.3% to 26% with

(95% credible interval = 14.1%, 26%). The crude prevalence was within the nonparametric bounds (9.1%, 36.8%). The instrumental variable estimate from Bärnighausen et al. was 21% (95% CI = 19.8%, 22.2%), which is consistent with our IV bounds. This suggests that, at least in this specific empirical example, the instrumental variable results seem to be fairly robust to the assumptions underlying either adjustment strategy and that the adjustment for selection bias with an instrumental variable seems to matter more than the specific IV analysis. The Pearson’s χ^{2} test statistic was 13.57 corresponding to a *P* value of 0.007 providing strong evidence supporting the presence of selection bias. Results are shown under the “Zambia Application” tab of the Excel spreadsheet tool (eAppendix 1; https://links.lww.com/EDE/B322).

Figure 2 overlays the nonparametric bounds with the IV bounds for $k$ = 10, 20, 30, and 48. The results illustrate the impact of $k$ on the length of the interval with fewer categories leading to wider intervals. Interestingly, when the original coding of interviewer identity is used as instrumental variable ($k$ = 48), the point estimate of the upper bound, 18.8%, is smaller than the point estimate of the lower bound, 22.6%. To account for sampling variability in our estimation of the bounds, we performed a Bayesian analysis and checked the posterior acceptance rate against 5%. Analysis results show that we cannot falsify the instrumental variable from the data, and the 95% posterior credible interval is 16.2%–24.2%. Results were essentially identical in the more exact Bayesian approach implemented in the R program we have provided (eAppendix 2; https://links.lww.com/EDE/B323). This is again consistent with the instrumental variable estimate given by Bärnighausen et al.^{7}

## DISCUSSION

We obtained IV bounds that were tighter than the nonparametric bounds within the Zambia HIV survey sample, as expected on theoretical grounds. The crude HIV prevalence estimate failed to fall within IV bounds, consistent with selection bias. The Pearson’s χ^{2} statistic provided fairly strong empirical evidence of selection bias. Finally, Figure 2 shows that the precision of the bounds increases as the number of categories of the instrument increases, until data within levels of the instrument become exceedingly sparse, causing the bounds to become incoherent.

Instrumental variables have great potential as a design tool to address selection bias attributable to nonparticipation or drop out. For example, researchers could build in an instrumental variable for missingness by randomizing participation incentives, thus guaranteeing conditions (i) and (ii) to hold. If randomization is not possible, researchers could collect information on interviewers for use as observational instrumental variables for missingness. If researchers incorporate instrumental variables at the design stage, without any additional assumptions, they can obtain a distribution-free test for the presence of selection bias and account for this bias through IV bounds.