Besides confounding, nonrandom selection is probably the most important source of bias in epidemiologic studies. Hernán et al.^{1} provided a systematic classification of selection biases, based on causal diagrams. In their classification, the selection into the study is a “collider” on a noncausal path between the exposure and the outcome. One such scenario is depicted in Figure 1, where the exposure, outcome and selection indicator are denoted ^{2}^{,}^{3}

Hernán^{4} discussed another type of selection bias, where the selection is statistically independent of the exposure but associated with the outcome through common causes, as in Figure 2^{4}. He showed, using standard graphical rules, that this selection is more benign than those considered by Hernán et al.^{1}, in that it preserves the sharp causal null; that is, it does not induce an association between the exposure and the outcome unless the exposure has a causal effect on the outcome for at least some individuals. Thus, a hypothesis test of the sharp causal null remains valid. However, he showed with numerical examples that, when the sharp causal null does not hold, this selection may produce bias, in that the exposure–outcome association in the selected population does not in general equal the exposure effect in the source population, that is, the population from which the data were taken.
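The contrast Hernán drew can be checked by exact enumeration. The sketch below is a hypothetical model of our own, not one taken from the cited papers: writing A for the exposure, Y for the outcome, S for selection and U for a common cause of Y and S (symbols chosen for this sketch), it lets Y depend on U only, so that the sharp causal null holds. Selection that depends on U alone leaves the risk difference among the selected at exactly zero, whereas an assumed collider-type selection that depends on both A and Y creates a spurious association.

```python
from fractions import Fraction as F

# Hypothetical model (illustrative numbers): A and U are independent fair coins,
# and Y depends on U only, so the exposure has no effect on anyone (sharp null).
P_A = {0: F(1, 2), 1: F(1, 2)}
P_U = {0: F(1, 2), 1: F(1, 2)}
P_Y1_GIVEN_U = {0: F(1, 5), 1: F(7, 10)}  # P(Y=1 | U=u), no dependence on A


def selected_risk_difference(p_select):
    """Return P(Y=1 | A=1, S=1) - P(Y=1 | A=0, S=1).

    p_select(a, u, y) is the probability of selection given A=a, U=u, Y=y.
    """
    risk = {}
    for a in (0, 1):
        num = den = F(0)
        for u in (0, 1):
            for y in (0, 1):
                p_y = P_Y1_GIVEN_U[u] if y == 1 else 1 - P_Y1_GIVEN_U[u]
                w = P_A[a] * P_U[u] * p_y * p_select(a, u, y)  # P(A=a, U=u, Y=y, S=1)
                den += w
                if y == 1:
                    num += w
        risk[a] = num / den
    return risk[1] - risk[0]


def outcome_associated(a, u, y):
    """Selection depends on the common cause U only (Figure 2 type)."""
    return F(9, 10) if u == 1 else F(3, 10)


def collider(a, u, y):
    """Assumed collider selection: depends on both A and Y."""
    return F(9, 10) if (a == 1 and y == 1) else F(3, 10)


print(selected_risk_difference(outcome_associated))  # 0
print(selected_risk_difference(collider))            # 99/380
```

Only the collider-type mechanism produces a nonzero association when no individual is affected, which is the sense in which outcome-associated selection preserves the sharp causal null.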

There are several issues with the discussion by Hernán.^{4} Although many readers may be content with the numerical examples that he gave, some may find the lack of formal proofs unsatisfactory. Furthermore, whereas the causal diagram in Figure 2 represents one possible mechanism of outcome-dependent selection, there are other such mechanisms. In particular, the causal diagram in Figure 3 represents the important scenario where the selection is directly influenced by the outcome, but not by the exposure. An obvious example is the unmatched case–control study, in which the selection is by definition influenced by the outcome status alone. To distinguish between the scenarios in Figures 2 and 3, we refer to them as “outcome-associated selection” and “outcome-influenced selection,” respectively. Finally, Hernán^{4} focused on causal effects in the source population, that is, the population from which the observed data were selectively drawn. However, one may also wonder to what extent the observed data can help to estimate causal effects in the selected population, that is, the population defined by those selected into the study. To the best of our knowledge, this question has not been addressed elsewhere in the literature either.

In this article, we address these issues. We consider both outcome-associated selection and outcome-influenced selection, and we use formal yet intuitive methods based on counterfactual diagrams^{5} to show whether causal effects, both in the source population and in the selected population, are estimable under these selection mechanisms. We will assume that the reader is familiar with causal diagrams, and in particular with the rules of d-separation.^{2}^{,}^{3} We will focus on selection bias and ignore other possible biases, confounding in particular. Real observational studies typically control for measured confounders to reduce the degree of confounding bias; all our arguments and results hold within levels of (conditional on) such measured confounders, provided that these are sufficient for confounding control. We restrict attention to binary exposures and outcomes, but some of our arguments and results carry over to other types of variables; we indicate this as we proceed. Throughout, we ignore uncertainty due to sampling variability.

## POTENTIAL OUTCOMES AND CAUSAL TARGET PARAMETERS

We use standard potential outcome notation^{2}^{,}^{3} to define causal effects. Let

We emphasize that, since we are not concerned with sampling variability, we use the term “selected population” in an asymptotic sense. That is, we do not use the term to refer to the limited sample of selected individuals in the particular study, but rather to an infinite “super-population” of individuals, generated under the same selection mechanism as the factual sample. Probabilities conditional on

## CONSISTENCY AND EXCHANGEABILITY

We make the standard consistency assumption^{6–8} that the potential outcome

From consistency (1) it follows that

that is, the probability of the outcome

then we could interpret the exposure–outcome association in the selected population as the corresponding causal effect, for example, we could interpret the risk difference

then we could interpret the causal effect in the selected population as the corresponding causal effect in the source population, for example, we could interpret the causal risk difference

The relation in (2) states that the potential outcome

this is often referred to as “conditional exchangeability.”^{2}^{,}^{3} Similarly, the relation in (3) states that the potential outcome

The concepts and definitions above are related to a recent study by Lu et al.^{9} These authors assumed that the causal effect in the source population is the target parameter, and showed that the total bias of the exposure–outcome association in the selected population can be decomposed into two parts. They used the terms “type 1 selection bias” and “type 2 selection bias” for the bias components due to violations of (4) and (5), respectively. This decomposition is also related to the modern literature on transportability of causal effects, where we say that the causal effect in the selected population is “transportable” to the source population if it is equal to the causal effect in that population, that is, if there is no type 2 selection bias; see Bareinboim and Pearl^{10} and the references therein.

## ESTIMATION OF CAUSAL EFFECTS IN THE SELECTED POPULATION

### Outcome-associated Selection

It is difficult to judge whether counterfactual independencies like (4) and (5) hold in a causal diagram using intuitive reasoning alone, because standard causal diagrams do not include potential outcomes like ^{5} In this method, the causal diagram illustrating the factual world is augmented with a parallel diagram, illustrating the counterfactual world where the exposure is set to a certain level for everyone. The factual and counterfactual worlds are joined by exogenous error terms, corresponding to all (measured or unmeasured) factors that influence the variables under consideration, apart from those explicitly depicted on the original causal diagram. For instance, in the causal diagram of Figure 3 there is only one variable, ^{2} provides a formal connection between causal diagrams and potential outcomes through nonparametric structural equations. Once the counterfactual diagram has been constructed, counterfactual independencies like (4) and (5) can easily be evaluated using standard rules of d-separation.

The counterfactual diagram corresponding to the causal diagram in Figure 2 for outcome-associated selection is shown in Figure 4. The left part of the diagram represents the factual world where the exposure

In Figure 4, there are two paths between ^{9} we say that there is no type 1 selection bias. Because the counterfactual diagram in Figure 4 makes no assumption about
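The absence of type 1 selection bias under outcome-associated selection can be illustrated by an exact computation. The sketch below assumes a hypothetical source population (all numbers illustrative; symbols chosen for this sketch): A is randomized, a binary common cause U affects both the outcome and selection, and the joint law of the potential outcomes (Y^0, Y^1) is specified within each level of U. Among the selected, the associational risk difference coincides with the causal risk difference in the selected population, while both differ from the causal risk difference in the source population.

```python
from fractions import Fraction as F

# Hypothetical source population: A is randomized and independent of U.
P_A1 = F(1, 2)
P_U = {0: F(1, 2), 1: F(1, 2)}
# Joint law of the potential outcomes (Y^0, Y^1) within each level of U.
P_TYPE_GIVEN_U = {
    0: {(0, 0): F(3, 5), (0, 1): F(1, 5), (1, 0): F(1, 10), (1, 1): F(1, 10)},
    1: {(0, 0): F(1, 5), (0, 1): F(1, 10), (1, 0): F(3, 10), (1, 1): F(2, 5)},
}
# Outcome-associated selection: P(S=1 | U=u) depends on U only, not on A or Y.
P_S_GIVEN_U = {0: F(1, 10), 1: F(9, 10)}


def risk_differences():
    """Return (associational RD | S=1, causal RD | S=1, causal RD in source)."""
    joint = {}  # P(A=a, U=u, Y^0=y0, Y^1=y1, S=1)
    for u, types in P_TYPE_GIVEN_U.items():
        for (y0, y1), p_type in types.items():
            for a in (0, 1):
                p_a = P_A1 if a == 1 else 1 - P_A1
                joint[(a, u, y0, y1)] = p_a * P_U[u] * p_type * P_S_GIVEN_U[u]
    p_sel = sum(joint.values())

    def observed_risk(a):  # P(Y=1 | A=a, S=1), using consistency Y = Y^A
        num = sum(p * (y1 if a == 1 else y0)
                  for (a_, _, y0, y1), p in joint.items() if a_ == a)
        den = sum(p for (a_, _, _, _), p in joint.items() if a_ == a)
        return num / den

    causal_sel = sum(p * (y1 - y0) for (_, _, y0, y1), p in joint.items()) / p_sel
    causal_src = sum(P_U[u] * p * (y1 - y0)
                     for u, types in P_TYPE_GIVEN_U.items()
                     for (y0, y1), p in types.items())
    return observed_risk(1) - observed_risk(0), causal_sel, causal_src


assoc, causal_sel, causal_src = risk_differences()
print(assoc, causal_sel, causal_src)  # -17/100 -17/100 -1/20
```

The first two numbers agree (no type 1 selection bias), whereas the gap between the second and third is the type 2 selection bias discussed for the source population.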

### Outcome-influenced Selection

Lu et al.^{9} considered a variation of outcome-influenced selection shown in Figure 5, where there is a covariate ^{9} stated that, for the causal diagram in Figure 5, the exposure–outcome association in the selected population “suffers from type 1 selection bias by restricting to one level of a descendant of the collider

To provide a more rigorous argument, we again use counterfactual diagrams. The counterfactual diagram corresponding to the causal diagram in Figure 3 for outcome-influenced selection is shown in Figure 6. Because ^{9} we say that there is type 1 selection bias. Thus, Lu et al.^{9} were correct in that bias occurs because of conditioning on a collider, but this collider appears on the counterfactual diagram, not on the original causal diagram. This bias occurs for nonbinary exposures and outcomes as well.
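The same kind of exact computation shows the type 1 selection bias under outcome-influenced selection, and also the decomposition of Lu et al. into type 1 and type 2 components. In this hypothetical sketch (illustrative numbers; symbols chosen for this sketch), A is randomized, the potential outcomes (Y^0, Y^1) have a fixed joint law, and the selection probability depends on the factual outcome Y only.

```python
from fractions import Fraction as F

# Hypothetical source population: joint law of the potential outcomes (Y^0, Y^1);
# A is randomized with P(A=1) = 1/2.
P_TYPE = {(0, 0): F(2, 5), (0, 1): F(3, 10), (1, 0): F(1, 10), (1, 1): F(1, 5)}
P_A1 = F(1, 2)
# Outcome-influenced selection: P(S=1 | Y=y) depends on the factual outcome only.
P_S_GIVEN_Y = {0: F(1, 5), 1: F(4, 5)}


def risk_differences():
    """Return (associational RD | S=1, causal RD | S=1, causal RD in source)."""
    joint = {}  # P(A=a, Y^0=y0, Y^1=y1, S=1)
    for (y0, y1), p_type in P_TYPE.items():
        for a in (0, 1):
            p_a = P_A1 if a == 1 else 1 - P_A1
            y = y1 if a == 1 else y0  # consistency: Y = Y^A
            joint[(a, y0, y1)] = p_a * p_type * P_S_GIVEN_Y[y]
    p_sel = sum(joint.values())

    def observed_risk(a):  # P(Y=1 | A=a, S=1)
        num = sum(p * (y1 if a == 1 else y0)
                  for (a_, y0, y1), p in joint.items() if a_ == a)
        den = sum(p for (a_, _, _), p in joint.items() if a_ == a)
        return num / den

    causal_sel = sum(p * (y1 - y0) for (_, y0, y1), p in joint.items()) / p_sel
    causal_src = sum(p * (y1 - y0) for (y0, y1), p in P_TYPE.items())
    return observed_risk(1) - observed_risk(0), causal_sel, causal_src


assoc, causal_sel, causal_src = risk_differences()
print(assoc, causal_sel, causal_src)  # 16/95 5/22 1/5
print("type 1 bias:", assoc - causal_sel, "type 2 bias:", causal_sel - causal_src)
```

Here both components are nonzero, and they sum to the total bias of the associational risk difference relative to the causal risk difference in the source population.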

In eAppendix 1, https://links.lww.com/EDE/B995, we prove the stronger result that causal effects in the selected population are not estimable in any way under the causal diagram in Figure 3. We only prove this result for binary variables, but we conjecture that a similar result holds for other types of variables as well.

That causal effects in the selected population cannot be estimated under outcome-influenced selection does not mean that data are completely uninformative about such causal effects. A straightforward application of arguments by Robins^{11} and Manski^{12} leads to the conclusion that the counterfactual probability

For completeness, we prove this relation in eAppendix 2, https://links.lww.com/EDE/B995. By maximizing

and
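A no-assumption bound of the Robins–Manski type can be sketched as follows, assuming the standard form of the argument (symbols and function names are chosen for this sketch and are hypothetical): by consistency, P(Y^a=1 | S=1) equals P(Y=1, A=a | S=1) plus P(Y^a=1 | A≠a, S=1) P(A≠a | S=1), and the last conditional probability is only known to lie between 0 and 1. Combining the bounds for a = 1 and a = 0 gives bounds on the causal risk difference and risk ratio in the selected population.

```python
from fractions import Fraction as F


def counterfactual_bounds(risk_a, prev_a):
    """No-assumption bounds on P(Y^a=1 | S=1).

    risk_a = P(Y=1 | A=a, S=1) and prev_a = P(A=a | S=1), both observed.
    The unobserved P(Y^a=1 | A != a, S=1) is only known to lie in [0, 1].
    """
    observed_part = risk_a * prev_a  # P(Y=1, A=a | S=1)
    return observed_part, observed_part + (1 - prev_a)


def selected_effect_bounds(risk1, risk0, prev1):
    """Bounds on the causal RD and RR in the selected population."""
    lo1, hi1 = counterfactual_bounds(risk1, prev1)
    lo0, hi0 = counterfactual_bounds(risk0, 1 - prev1)
    rd = (lo1 - hi0, hi1 - lo0)
    rr = (lo1 / hi0, hi1 / lo0)  # upper RR bound requires lo0 > 0
    return rd, rr


# Hypothetical selected-population data: risks 3/5 and 2/5, half exposed.
rd, rr = selected_effect_bounds(F(3, 5), F(2, 5), F(1, 2))
print(rd, rr)
```

With these inputs the risk difference is bounded by -2/5 and 3/5 and the risk ratio by 3/7 and 4. In general, the risk difference interval has width exactly one and contains zero, and the risk ratio interval contains one, so the direction of the effect in the selected population is never determined.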

## ESTIMATION OF CAUSAL EFFECTS IN THE SOURCE POPULATION

### Outcome-associated Selection

From the counterfactual diagram in Figure 4, we observe that ^{9} we say that there is type 2 selection bias. This bias occurs for nonbinary variables as well.

In fact, the situation is even worse. In eAppendix 3, https://links.lww.com/EDE/B995, we prove that the observed data have no information about causal effects in the source population under outcome-associated selection. Thus, whatever data we observe under outcome-associated selection, the causal risk difference in the source population can be anywhere between

This result may seem reasonable to some readers, because we have not assumed that the sampling fraction
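The lack of information can be seen in a deliberately simple hypothetical construction (our own illustration, not the proof in eAppendix 3): two source populations that produce identical data among the selected but have causal risk differences of opposite sign. Selection is assumed to be S = U for a binary common cause U, so only individuals with U = 1 are sampled; the two populations differ in P(U=1), which here is the sampling fraction, and in the effect among those never sampled.

```python
from fractions import Fraction as F


def summarize(p_u1, risk_given_u):
    """Return (risks observed among the selected, causal RD in the source).

    Hypothetical model: A is randomized with P(A=1) = 1/2 and independent of U;
    selection is S = U; risk_given_u[u][a] = P(Y^a=1 | U=u).
    """
    # Because S = U and A is independent of U, consistency gives
    # P(Y=1 | A=a, S=1) = P(Y^a=1 | U=1), and P(A=1 | S=1) = 1/2 in every model.
    observed = {a: risk_given_u[1][a] for a in (0, 1)}
    causal_rd_source = sum(
        (p_u1 if u == 1 else 1 - p_u1) * (risk_given_u[u][1] - risk_given_u[u][0])
        for u in (0, 1)
    )
    return observed, causal_rd_source


# Two hypothetical source populations that agree on the sampled stratum U=1.
model_1 = summarize(F(1, 2), {0: {0: F(2, 5), 1: F(3, 5)}, 1: {0: F(2, 5), 1: F(3, 5)}})
model_2 = summarize(F(1, 10), {0: {0: F(9, 10), 1: F(0)}, 1: {0: F(2, 5), 1: F(3, 5)}})
print(model_1[0] == model_2[0])  # True: identical data among the selected
print(model_1[1], model_2[1])    # 1/5 -79/100
```

Because the selected data cannot distinguish these two populations, no estimator based on them alone can recover the causal risk difference in the source population.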

To other readers, the fact that the observed data have no information about causal effects in the source population under outcome-associated selection may seem to contradict the result by Hernán,^{4} that we can test whether the sharp causal null holds under outcome-associated selection. However, we remind the reader that the sharp causal null means that the exposure has no effect for any single individual. A violation of the sharp causal null does not imply that the exposure has an effect at the population level, because the exposure may have positive effects for some individuals and negative effects for others, and these may cancel out in the population.
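A four-person hypothetical population shows how individual effects can cancel. Each individual is described by the pair of potential outcomes (Y^0, Y^1) (notation chosen for this sketch): two individuals are affected in opposite directions, so the sharp causal null fails, yet the causal risk difference in the population is zero.

```python
# Hypothetical population of four equally common individuals,
# each described by the pair of potential outcomes (Y^0, Y^1).
population = [(0, 1), (1, 0), (0, 0), (1, 1)]

# Sharp causal null: Y^0 == Y^1 for every single individual.
sharp_null_holds = all(y0 == y1 for y0, y1 in population)

# Population-level causal risk difference P(Y^1=1) - P(Y^0=1).
risk_treated = sum(y1 for _, y1 in population) / len(population)
risk_untreated = sum(y0 for y0, _ in population) / len(population)

print(sharp_null_holds, risk_treated - risk_untreated)  # False 0.0
```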

### Outcome-influenced Selection

From the counterfactual diagram in Figure 6, we observe that ^{9} we say that there is type 2 selection bias, in addition to the type 1 selection bias noted above. This bias occurs for nonbinary variables as well.

However, in contrast to outcome-associated selection, the data under outcome-influenced selection do have some information about causal effects in the source population. Under outcome-influenced selection, it can be shown that the causal odds ratio in the source population is equal to the odds ratio in the selected population,

and can therefore be estimated; this is true irrespective of the sampling fraction ^{13}^{,}^{14}; for completeness, we give a detailed proof in eAppendix 4, https://links.lww.com/EDE/B995. By using the causal odds ratio in the source population, it is possible to provide bounds on the causal risk difference and risk ratio in the source population. Specifically, it follows from results in King and Zeng^{15} that

and
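The invariance of the odds ratio under outcome-influenced selection can be verified numerically. In the sketch below (hypothetical risks, and no confounding, as assumed throughout; function names are chosen for this sketch), the selection probability depends only on the outcome, so each factor P(S=1 | Y=y) appears once in the numerator and once in the denominator of the cross-product ratio and cancels. The odds ratio among the selected therefore equals the source odds ratio whatever the selection probabilities, including a case–control-like scheme that samples all cases and very few noncases.

```python
from fractions import Fraction as F


def odds_ratio(p1, p0):
    """Odds ratio comparing risk p1 with risk p0."""
    return (p1 / (1 - p1)) / (p0 / (1 - p0))


def selected_odds_ratio(risk, p_a1, p_s_given_y):
    """Exposure-outcome odds ratio among the selected.

    risk[a] = P(Y=1 | A=a) in the source population, p_a1 = P(A=1),
    and P(S=1 | A=a, Y=y) = p_s_given_y[y] depends on the outcome only.
    """
    cell = {}  # P(A=a, Y=y, S=1)
    for a in (0, 1):
        p_a = p_a1 if a == 1 else 1 - p_a1
        for y in (0, 1):
            p_y = risk[a] if y == 1 else 1 - risk[a]
            cell[(a, y)] = p_a * p_y * p_s_given_y[y]
    return (cell[(1, 1)] * cell[(0, 0)]) / (cell[(1, 0)] * cell[(0, 1)])


risk = {0: F(3, 10), 1: F(1, 2)}  # hypothetical, unconfounded source risks
print(odds_ratio(risk[1], risk[0]))  # 7/3
for s in ({0: F(1, 5), 1: F(4, 5)}, {0: F(1, 1000), 1: F(1)}):
    print(selected_odds_ratio(risk, F(1, 2), s))  # 7/3 each time
```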

The bounds in (7) for the causal risk difference in the selected population are qualitatively different from those in (9) for the causal risk difference in the source population, in that the former include both positive and negative values, whereas the latter include either positive or negative values, but not both. Similarly, the bounds in (8) for the causal risk ratio in the selected population include both values above and below 1, whereas the bounds in (10) for the causal risk ratio in the source population include either values above or below 1, but not both. Thus, using these bounds we are able to tell the direction of the exposure effect in the source population, but not in the selected population.
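The one-signed behavior of bounds in the source population can be reproduced with the elementary bounds implied by the odds ratio alone. Writing p0 for the unexposed risk and θ for the causal odds ratio (symbols chosen for this sketch), the exposed risk is determined by p0 and θ; as p0 ranges over (0, 1), the risk ratio moves between 1 and θ and the risk difference between 0 and (√θ − 1)/(√θ + 1). This is a simplified sketch, not a transcription of (9) and (10) or of King and Zeng's results, which may use additional information and be narrower; the function names are hypothetical.

```python
import math


def risk_from_or(p0, theta):
    """Exposed risk implied by unexposed risk p0 and odds ratio theta."""
    odds1 = theta * p0 / (1 - p0)
    return odds1 / (1 + odds1)


def rd_bounds_from_or(theta):
    """Range of p1 - p0 over 0 < p0 < 1 with the odds ratio fixed at theta."""
    # The extreme is attained at p0 = 1 / (sqrt(theta) + 1).
    extreme = (math.sqrt(theta) - 1) / (math.sqrt(theta) + 1)
    return min(0.0, extreme), max(0.0, extreme)


def rr_bounds_from_or(theta):
    """Range of p1 / p0: the risk ratio lies between 1 and theta."""
    return min(1.0, theta), max(1.0, theta)


for theta in (4.0, 0.25):
    print(theta, rd_bounds_from_or(theta), rr_bounds_from_or(theta))

# Brute-force check of the closed form on a grid of unexposed risks.
grid_max = max(risk_from_or(i / 1000, 4.0) - i / 1000 for i in range(1, 1000))
print(grid_max)
```

For θ = 4 the risk difference lies between 0 and 1/3 and the risk ratio between 1 and 4; neither interval crosses its null value, in line with the contrast drawn above.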

## CONCLUSIONS

In this note, we have contrasted outcome-associated selection and outcome-influenced selection. We have shown that causal effects in the selected population are estimable under outcome-associated selection but not under outcome-influenced selection. We have shown that data have no information about causal effects in the source population under outcome-associated selection, but that the causal odds ratio in the source population can be estimated, and the causal risk ratio and risk difference can be bounded, both in the selected population and in the source population, under outcome-influenced selection. For some of these results, we have used counterfactual diagrams, but we note that it may also be possible to prove these results with Single World Intervention Graphs (SWIGs).^{16}

We have presented bounds for the causal risk difference and risk ratio in the source population and in the selected population under outcome-influenced sampling. Other authors have presented related bounds, but under somewhat different conditions. Kuroki et al.^{17} derived bounds for the causal risk difference and the causal risk ratio in the source population under case-control sampling, which is a special case of outcome-influenced sampling. Unlike us, though, these authors allowed for both confounding and biased selection, so we would expect their bounds to be less informative (i.e., wider) than our bounds in (9) and (10); we verify this in eAppendix 5, https://links.lww.com/EDE/B995. Gabriel et al.^{18} derived bounds for the causal risk difference in the source population in scenarios with missing data, which is analogous to selection. Unlike us, though, these authors allowed simultaneous outcome-associated and outcome-influenced missingness or selection (their Figure 1A), and they assumed that the proportion of nonmissingness, corresponding to

We have not taken a stance on which parameter is most relevant from a scientific perspective: a causal effect in the source population or in the selected population. We believe that most researchers would prefer to estimate causal effects in the source population, but given all sources of error in real epidemiologic studies (e.g., selection bias, measurement bias, confounding bias), we conjecture that many researchers would be content with an estimate of any causal effect that is at least approximately unbiased. Thus, the fact that outcome-associated selection admits estimation of causal effects in the selected population, whereas outcome-influenced selection does not, may be useful information to practitioners.

## REFERENCES

**Keywords:** Causal diagrams; Causality; Counterfactual graphs; Outcome-dependent sampling; Selection bias