Vaccines are one of the most important inventions in modern medicine.^{1} Justification for real-life vaccination strategies relies heavily on results from large-scale vaccine randomized controlled trials (RCTs). However, the nature of communicable diseases means that defining and evaluating vaccine effects requires consideration of population characteristics such as the prevalence of current and prior infection, mixing patterns, and concurrent public health measures.

Policy-relevant estimands for vaccine trials have been discussed extensively (see Halloran et al.^{2} for an overview), in particular in the context of the SARS-CoV-2 disease (COVID-19) pandemic.^{3–9} However, methods to study vaccine effects conditional on, or under interventions on, exposure to the infectious agent are still rarely used. Here and henceforth, we use exposure to mean exposure to the disease agent, such as a virus, which is distinct from the treatment, such as a vaccine. A key problem is that exposure status is often difficult, or even impossible, to measure in practice.^{2,10} For example, Halloran and Struchiner^{11} write that measuring susceptibility to infection “might not be easy in practice and might indeed require considerable assumptions regarding who is infectious and when, how infectious the persons are, and who is exposing whom.” Challenge trials, in which participants are intentionally exposed, are one option for controlling exposure status but involve serious ethical issues.^{12–14}

This article specifically targets effects that account for exposure status, even when it is unmeasured. We provide results on the interpretation and identification of causal effects of vaccines from RCTs and observational studies. The results include identification conditions and formulas for the causal effect of a vaccine on clinical outcomes, conditional on an unmeasured exposure to the infectious agent. Specifically, we show that, under a plausible no effect on exposure assumption, the relative effect, though not the absolute effect, of the vaccine can be point identified in an RCT. Furthermore, under the same assumption, we derive sharp bounds for the absolute effect. We clarify how these effects are related to existing estimands and notions of biological effects, and we give identification results on per-exposure effects,^{10,15} a type of controlled direct effect, even when the exposure is unmeasured, as is often the case in practice.

The article is organized as follows. Section Data Structure presents the data structure and the notation. Section Causal Parameters provides definitions and interpretation of causal estimands. Section Identification contains results on the identification of causal estimands, including point identification results for the relative causal effect conditional on exposure, and partial identification results for the absolute causal effect conditional on exposure. Section External Data and Sensitivity Analysis presents results for point identification of absolute causal effects conditional on exposure when external data on exposure risk are available, and suggests a sensitivity analysis when external data are unavailable. Section CECE in Time-to-event Settings extends the results to time-to-event outcomes, in a setting in which individuals can be censored due to loss to follow-up. Section Estimation and Implementation describes how our new parameters can be estimated using existing estimators, even when the exposure is unmeasured. Section Example: Effects of COVID-19 Vaccination implements the new results in a study of the ChAdOx1 nCoV-19 (Oxford) vaccine against COVID-19.

## DATA STRUCTURE

Suppose that we have data from a randomized experiment with ^{16}^{,}^{17} we consider inference in a much larger population from which the trial participants are drawn, so that interactions among patients in the trial are negligible; thus, we suppose the individuals are iid and omit the

Let *unobserved* in the study. Although we will focus on settings where the exposure

We use superscripts to denote counterfactuals.^{18,19} For example,

## CAUSAL PARAMETERS

### The Average Treatment Effect

To motivate the new contributions in this article, we first review the conventional average treatment effect (ATE) of

which compares the average outcome in the trial population had everyone been treated (

#### Conditional Counterfactual Contrasts Are Not Necessarily Causal Effects

To mitigate some of the concerns that are raised about the ATE in vaccine trials, we could attempt to adjust for exposure to the infectious agent.^{2,11} However, defining causal effects conditional on exposure status is not straightforward because exposure status is a post-treatment variable. In particular, a naive contrast of counterfactual outcomes conditional on exposure status,

is not a causal effect when the treatment affects the post-treatment event; it compares counterfactual outcomes in different subpopulations of individuals. This is illustrated by the path

### The Principal Stratum Effect and the Causal Effect Conditional on Exposure

A principal stratum effect (PSE)^{18,20} compares counterfactual outcomes among individuals with the same counterfactual exposure status. We can define a particular PSE among those individuals who would be exposed to the infectious agent, at least once, regardless of treatment assignment,

Unlike (2), the PSE (3) is a contrast of counterfactual outcomes in the same (sub)population of individuals, and it is therefore a causal effect. However, the conditioning set in (3) is defined by exposures in the same individual under two different treatments and, without further assumptions, it is impossible to observe the individuals in this subpopulation,^{18} even when ^{21–24}

As an alternative to the PSE, consider a contrast of counterfactual outcomes conditional on exposure status in the observed data,

Like (3), the contrast in (4) is a causal effect as it compares the same subpopulation of individuals under different treatment. Unlike (3), the conditioning set in (4) is observable when

But there is at least one setting in which differences in exposure status would not be expected between treatment groups: a *blinded* RCT, which is the context of many vaccine efficacy studies. The following mechanistic assumption formalizes the notion that receiving the vaccine does not exert effects on exposure status

**Assumption** (No effect on exposure).

Assumption (5) guarantees that exposure to the infectious agent is the same, regardless of the treatment that was assigned, and, assuming that the intervention on

This assumption can also hold outside of a blinded RCT. In particular, exposures that are outside of the individual’s control can satisfy Assumption (5). Such exposures could be consequences of natural or human-made disasters, such as flooding after intense rainfall or radiation from an atomic bombing.

The DAG in Figure 1B describes the causal structure of a blinded RCT, in which this assumption would be expected to be met, as there is no path

Under assumption (5), the contrasts (2)–(4) are equal, that is,

Halloran and Struchiner^{11} also advocated contrasts of (counterfactual) outcomes in exposed individuals, under the assumption that “people did not change their behavior after randomization.”^{11(p. 147)} Condition (5) formalizes when such contrasts are unambiguous causal effects, that is, contrasts of outcomes in the same (sub)population of individuals.

Because we focus on blinded RCTs in this study, we will use assumption (5) extensively, and under (5), we will denote the contrasts (3)–(4) collectively as the causal effect conditional on exposure (CECE), which is also equal to (2).
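As a sketch in notation assumed here for illustration (binary treatment $A$, exposure indicator $E$, outcome $Y$, superscripts for counterfactuals, and $\mathbb{E}$ for expectation), the contrasts (2)–(4) could be written as

$$
\begin{aligned}
\text{(2)}\quad & \mathbb{E}(Y^{a=1}\mid E^{a=1}=1)\ \text{vs.}\ \mathbb{E}(Y^{a=0}\mid E^{a=0}=1),\\
\text{(3)}\quad & \mathbb{E}(Y^{a=1}\mid E^{a=1}=E^{a=0}=1)\ \text{vs.}\ \mathbb{E}(Y^{a=0}\mid E^{a=1}=E^{a=0}=1),\\
\text{(4)}\quad & \mathbb{E}(Y^{a=1}\mid E=1)\ \text{vs.}\ \mathbb{E}(Y^{a=0}\mid E=1).
\end{aligned}
$$

If the no effect on exposure assumption (5) is read as $E^{a=0}=E^{a=1}$ for every individual, then the observed exposure equals both counterfactual exposures, so the conditioning events $\{E^{a=1}=1\}$, $\{E^{a=0}=1\}$, $\{E^{a=1}=E^{a=0}=1\}$, and $\{E=1\}$ describe the same set of individuals. The three contrasts then coincide, which is the sense in which they are collectively called the CECE.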

The CECE could mitigate some of the concerns that are raised about the generalizability of the ATE (1), because the CECE is confined to those individuals who are exposed to the infectious agent in the observed data, regardless of treatment assignment. Thus, assumption (5) ensures that the CECE has a mechanistic interpretation as an average causal effect given exposure to the infectious agent. The CECE is also of immediate interest for individuals who, based on their own subject-matter knowledge, believe, or possibly know, that they will be, or already are, exposed.

However, the CECE is defined among those who would be exposed in a given study, and the subset who is exposed is context-dependent. To understand the CECE, it is helpful to draw an analogy to ring vaccination trials, in which individuals are recruited only if they have been exposed to an index case, and are subsequently randomly assigned to

### The Controlled Direct Effect

A special case of a controlled direct effect (CDE),^{25} also called a per-exposure effect or a challenge effect,^{10,11} is defined with respect to an intervention on the treatment

This CDE corresponds to the effect that is identified by a challenge trial^{26}; that is, a study where the participants are subject to an intervention where they are guaranteed to be physically exposed to the infectious agent. Outside of RCTs, household studies are sometimes used to infer such effects, based on contrasts of household secondary attack rates.^{11}

Unlike the ATE (1), the CDE is defined in a controlled setting, in which *all* individuals are exposed to the infectious agent. Thus, this effect is insensitive to the risk of exposure in the observed population.

Finally, all the estimands considered so far can be defined conditional on any baseline covariate

### The Notion of a “Biological” Effect

Both the CECE and CDE quantify treatment effects in individuals who are guaranteed to be exposed to the disease agent. In that sense, both effects seem to be captured by the notion of “biological” effects. However, the fact that the CECE and CDE are distinct estimands illustrates that the term “biological” effect, without further clarification, is ambiguous.

## IDENTIFICATION

To motivate the identification results in this work, we first review three standard identifiability conditions for the ATE.

**Assumption** (Treatment exchangeability).

Treatment exchangeability, for example, holds in the Single World Intervention Graph (SWIG)^{19} in Figure 1C, even if

**Assumption** (Positivity).

**Assumption** (Consistency).

Conditions (7)–(9) hold by design in an RCT where treatment is unconditionally randomly assigned. These three conditions allow us to identify the ATE (1) as

However, our focus is on estimand (4) (and (6) in eAppendix D, https://links.lww.com/EDE/B994), which is defined with respect to counterfactual statuses of the exposure

### Identification of the CECE

Under the no effect on exposure assumption (5) and conditions (7)–(9), it is straightforward to express the CECE as a function of factual variables,

but the CECE, as defined as an arbitrary contrast (“vs.”), is not point identified in our data because

is not possible to estimate from the observed data.

To identify the CECE, we therefore introduce an additional assumption, which relates the unmeasured

**Assumption** (Exposure necessity).

The exposure necessity assumption states that only individuals who were exposed to the infectious agent can experience the outcome. Thus, the exposure is a necessary condition for experiencing the outcome. For example, contact with some amount of live virus is necessary to develop severe disease. Many exposures and outcomes of interest meet this criterion, though sometimes researchers may be interested in other exposures that do not necessarily satisfy this criterion, for example, sharing a home or classroom with an infected individual. However, such an exposure definition might be revised to being in the same room with an infected individual for at least 1 minute, though even that might not be strictly necessary. In practice, it is important that the investigator has articulated a well-defined exposure, but it is possible that different investigators use different definitions.

Our first proposition shows that the *relative* CECE is identified under the conditions we have introduced so far, which are expected to hold in a blinded RCT.

**Proposition 1** (Relative CECE). *Under the no effect on exposure assumption (5), standard identifiability conditions (7)–(9) and exposure necessity (10), the relative CECE is equal to*

*given that*

The proof is found in eAppendix A, https://links.lww.com/EDE/B994. From our considerations in Section The Principal Stratum Effect and the Causal Effect Conditional on Exposure and our derivations in Section Identification of the CECE, it follows that Proposition 1 also gives an identification result for the relative principal stratum effect, that is, ^{2}

The fact that the relative CECE is identified by the same formula as the relative ATE is related to the known result in epidemiology that diagnostic tests that have perfect specificity will give unbiased estimates of risk ratios, even if these tests do misclassify disease cases. We discuss this in eAppendix C, https://links.lww.com/EDE/B994.
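The reason the exposure probability drops out of the relative CECE can be sketched in a few lines. The notation is assumed here for illustration ($A$ for treatment, $E$ for the unmeasured exposure indicator, $Y$ for the outcome, $\mathbb{E}$ for expectation), and the outline is heuristic rather than a substitute for the proof in eAppendix A. For $a\in\{0,1\}$,

$$
\begin{aligned}
\mathbb{E}(Y\mid A=a) &= \mathbb{E}(Y\mid A=a,E=1)\,P(E=1\mid A=a) && \text{(exposure necessity: } Y=0 \text{ whenever } E=0\text{)}\\
&= \mathbb{E}(Y^{a}\mid E=1)\,P(E=1) && \text{(randomization, consistency, no effect on exposure)}.
\end{aligned}
$$

Dividing the expression for $a=1$ by the expression for $a=0$ cancels the unknown factor $P(E=1)$:

$$
\frac{\mathbb{E}(Y\mid A=1)}{\mathbb{E}(Y\mid A=0)}=\frac{\mathbb{E}(Y^{a=1}\mid E=1)}{\mathbb{E}(Y^{a=0}\mid E=1)}\quad\text{whenever } \mathbb{E}(Y\mid A=0)>0.
$$

The same cancellation does not occur for a difference, which is why the absolute CECE still depends on $P(E=1)$.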

Although the absolute CECE is not point identified, our next proposition gives partial identification of the absolute CECE for a binary outcome

**Proposition 2** (Absolute CECE). *Under the no effect on exposure assumption (5) and conditions (7)–(10), the absolute CECE on a binary outcome is partially identified by the sharp bounds*

*when*

The proof is given in eAppendix A, https://links.lww.com/EDE/B994.
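A heuristic way to see where bounds of this kind come from, in notation assumed here for illustration: let $r_a=\mathbb{E}(Y\mid A=a)$ be the observed risk in arm $a$ and let $p=P(E=1)$ be the unknown exposure probability. Under the conditions of Proposition 2, the risk among the exposed is $\mathbb{E}(Y^{a}\mid E=1)=r_a/p$, and because a risk cannot exceed 1, $p$ must lie in $[\max(r_0,r_1),\,1]$. Taking the absolute CECE as control minus vaccine (a sign convention assumed here) and supposing $r_0>0$ and $r_1\le r_0$, this gives

$$
r_0-r_1\;\le\;\mathbb{E}(Y^{a=0}\mid E=1)-\mathbb{E}(Y^{a=1}\mid E=1)=\frac{r_0-r_1}{p}\;\le\;1-\frac{r_1}{r_0}.
$$

The left limit corresponds to $p=1$ and the right limit to $p=r_0$. The formal argument, including sharpness, is the one in eAppendix A.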

**Remark on Proposition 2**. The lower bound on the absolute CECE is equal to the absolute ATE. Thus, Proposition 2 gives us another interpretation of a standard risk difference—as a lower bound on the absolute CECE. Furthermore, this lower bound is equal to the absolute CECE if *everybody* is exposed.

The upper bound is 1 minus the relative ATE, which is a quantity that is often reported as the vaccine efficacy in randomized controlled trials,^{2} for example, during the COVID-19 pandemic.^{27} The absolute CECE is equal to this bound if an unvaccinated individual (

It follows from Proposition 2 that the larger

Zhao et al.^{28} studied another interesting setting where relative—but not absolute—risks could be point identified. Their causal question, which concerned racial discrimination in policing, was studied in a setting where the treatment, equivalent to our

## EXTERNAL DATA AND SENSITIVITY ANALYSIS

Consider a binary outcome

Our next proposition shows that knowledge of either

**Proposition 3** (Point identification of the absolute CECE). *Under the no effect on exposure assumption (5) and conditions (7)–(10)*,

The proof of Proposition 3 is given in eAppendix E, https://links.lww.com/EDE/B994. Besides giving point identification results in settings with knowledge from external data, Proposition 3 motivates sensitivity analyses for the magnitude of the absolute CECE using sensitivity parameters that are justified by subject-matter reasoning; that is, the investigator can evaluate (11) and (12) under different values of the marginal sensitivity parameters

## CECE IN TIME-TO-EVENT SETTINGS

In both RCTs and observational studies, it is common to evaluate vaccine effects on time-to-event outcomes. Our results generalize to settings where the exposure status and the outcome of interest are both time-to-event variables, which possibly are censored due to losses to follow-up.

Suppose that ^{18}^{,}^{19}^{,}^{29} suppose that we are interested in outcomes in discrete time intervals ^{30}^{,}^{31} For example,

**Proposition 4** (Relative and absolute CECE for time-to-event outcomes). *Under exchangeability, positivity, consistency, exposure necessity and the no effect on exposure assumption for time-to-event outcomes, formally stated as conditions (S3)–(S9) in eAppendix B*, https://links.lww.com/EDE/B994, *the relative CECE at time* $k$, $0\le k\le K$, *is identified by the ratio of cumulative incidences*,

*where*

*and*

*Under the same conditions, the absolute CECE is partially identified by the sharp bounds*

*when*

See eAppendix B, https://links.lww.com/EDE/B994, for details and a proof. Thus, like the point exposure and point outcome setting, we do not need to measure common causes of

We have restricted all our discussion to results on *risks*, not *rates* such as hazards. Although hazards are sometimes reported as “efficacy parameters” in infectious disease settings, causal estimands on the hazard scale have well-known limitations^{18,32–34} because they condition on a post-treatment event (here, outcomes at earlier times) that is affected by treatment.

### Excess and Etiologic Fractions

Following Greenland and Robins,^{35} the excess (prevented) fraction quantifies the excess of outcomes under treatment versus control. When the assumptions of Proposition 4 hold, the excess fraction among the exposed is

which quantifies the increase in caseload under no treatment.^{35,36} In particular, the excess fraction conditional on exposure is equal to the unconditional excess fraction. Furthermore, the right-hand side of (13) is often what is reported as *the vaccine efficacy* in clinical studies.^{2}
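To see why the conditional and unconditional excess fractions agree, consider the following sketch. The symbols are assumptions introduced for illustration: $\pi_a(k)$ denotes the counterfactual risk by time $k$ among the exposed under treatment $a$, and $\mu_a(k)$ denotes the identified cumulative incidence by time $k$ in arm $a$ among all individuals. Then

$$
\frac{\pi_0(k)-\pi_1(k)}{\pi_0(k)}\;=\;1-\frac{\pi_1(k)}{\pi_0(k)}\;=\;1-\frac{\mu_1(k)}{\mu_0(k)},
$$

where the last equality uses the identification of the relative CECE in Proposition 4. The final expression involves only quantities from the whole trial population, so the excess fraction among the exposed has the familiar form of 1 minus a risk ratio.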

The excess fraction should not be confused with the etiologic fraction, which is the fraction *caused* by treatment. For example, suppose we consider outcomes at time ^{35,36}

## ESTIMATION AND IMPLEMENTATION

Because all our identifying formulas from Section Identification are expressed in terms of simple conditional means, we can use simple estimators with known properties. Let

and similarly the upper bound on the absolute CECE by ^{37} or the Delta method.^{38} The estimator of the lower bound on the absolute CECE is

For the identifying formulas in Section CECE in Time-to-event Settings, which are cumulative incidences, let ^{39} (Sections 2.3 and 2.4).
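For the point-outcome formulas, plug-in estimation amounts to arm-specific sample means. The following Python sketch illustrates this on simulated data. The function names, the simulated parameter values, and the use of a percentile bootstrap for the confidence intervals are assumptions made for illustration, and the absolute CECE is taken as control minus vaccine. Exposure is generated in the simulation but never passed to the estimator, mirroring the unmeasured-exposure setting.

```python
import numpy as np


def plug_in(y, a):
    """Plug-in estimates from arm-specific sample means (exposure not needed)."""
    r1, r0 = y[a == 1].mean(), y[a == 0].mean()
    return np.array([
        r1 / r0,      # relative CECE (same formula as the relative ATE)
        r0 - r1,      # lower bound on the absolute CECE (control minus vaccine)
        1 - r1 / r0,  # upper bound on the absolute CECE
    ])


def estimate_with_bootstrap(y, a, n_boot=500, seed=0):
    """Point estimates plus 95% percentile-bootstrap intervals (an assumed choice)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty((n_boot, 3))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample individuals with replacement
        draws[b] = plug_in(y[idx], a[idx])
    ci = np.percentile(draws, [2.5, 97.5], axis=0)
    return plug_in(y, a), ci


# Simulated blinded RCT; every number below is hypothetical.
rng = np.random.default_rng(1)
n = 20_000
a = rng.integers(0, 2, size=n)                  # 1:1 randomization
e = rng.binomial(1, 0.3, size=n)                # exposure unaffected by treatment
risk_if_exposed = np.where(a == 1, 0.05, 0.20)  # true relative CECE = 0.25
y = e * rng.binomial(1, risk_if_exposed)        # exposure necessity: y = 0 if e = 0

estimates, ci = estimate_with_bootstrap(y, a)

# Oracle check that uses the exposure, which would be unavailable in practice.
oracle_rr = y[(a == 1) & (e == 1)].mean() / y[(a == 0) & (e == 1)].mean()

labels = ["relative CECE", "absolute CECE, lower bound", "absolute CECE, upper bound"]
for name, est, lo, hi in zip(labels, estimates, ci[0], ci[1]):
    print(f"{name}: {est:.3f} (95% CI {lo:.3f}, {hi:.3f})")
print(f"oracle risk ratio among the exposed: {oracle_rr:.3f}")
```

In this simulation the true absolute CECE is 0.20 − 0.05 = 0.15, so the estimated bounds can be compared against a known target, something that is not possible with real trial data.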

## EXAMPLE: EFFECTS OF COVID-19 VACCINATION

To study the effect of the ChAdOx1 nCoV-19 vaccine against COVID-19, Voysey et al.^{40} enrolled 23,848 participants in a blinded RCT conducted across the United Kingdom, Brazil, and South Africa. The participants were randomly assigned 1:1 to the ChAdOx1 nCoV-19 vaccine or control, which contained a meningococcal vaccine. The interim analysis included 11,636 participants.^{40} The cumulative incidence of COVID-19 80 days after the second dose was 0.9% (95% CI = 0.5%, 1.3%) and 3.1% (95% CI = 2.4%, 3.8%) in the vaccine and control arms, respectively. Thus, an estimate of the relative

which corresponds to the reported vaccine efficacy point estimate of ^{40}^{(Table 2)}. Here and henceforth, we omit the

We can use the results from Section Identification of the CECE to derive bounds for the absolute CECE, specifically the sharp lower bound

and the sharp upper bound

Although we obtained informative point estimates of the relative CECE, the bounds on the absolute CECE are wide. The fact that the bounds are wide is not surprising, because they crucially depend on the risk of exposure to the virus. As discussed in Section Identification of the CECE, the lower bound is reached under a setting where everybody is exposed to the virus, and the upper bound when the probability of the outcome among the exposed, unvaccinated individuals is 1. Depending on the definition of exposure, such settings may or may not be plausible. However, we can use a sensitivity analysis, as suggested in Section External Data and Sensitivity Analysis, to reason about the magnitude of the absolute CECE.
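The arithmetic behind these estimates can be reproduced from the two cumulative incidences reported above (0.9% and 3.1%). The Python sketch below does so and then evaluates the absolute CECE over a grid of exposure probabilities. The function names are hypothetical, the absolute CECE is taken as control minus vaccine, the grid values are illustrative rather than estimates from any data source, and the expression (risk difference divided by exposure probability) follows from the same identity that underlies Proposition 1 rather than being quoted from the text.

```python
def relative_cece(risk_vaccine, risk_control):
    """Relative CECE, identified by the ratio of arm-specific risks."""
    if risk_control <= 0:
        raise ValueError("risk_control must be positive")
    return risk_vaccine / risk_control


def absolute_cece_bounds(risk_vaccine, risk_control):
    """Lower and upper bounds on the absolute CECE (control minus vaccine)."""
    if not (0 <= risk_vaccine <= risk_control <= 1) or risk_control == 0:
        raise ValueError("requires 0 <= risk_vaccine <= risk_control <= 1, risk_control > 0")
    lower = risk_control - risk_vaccine       # attained if everybody is exposed
    upper = 1 - risk_vaccine / risk_control   # attained if exposed controls always get the outcome
    return lower, upper


def absolute_cece_given_exposure(risk_vaccine, risk_control, p_exposed):
    """Absolute CECE for an assumed exposure probability (sensitivity parameter)."""
    if not (risk_control <= p_exposed <= 1):
        raise ValueError("p_exposed must lie between risk_control and 1")
    return (risk_control - risk_vaccine) / p_exposed


# Cumulative incidences reported in the trial's interim analysis.
risk_vaccine, risk_control = 0.009, 0.031

rr = relative_cece(risk_vaccine, risk_control)
lower, upper = absolute_cece_bounds(risk_vaccine, risk_control)
print(f"relative CECE: {rr:.2f}")
print(f"1 - relative CECE: {1 - rr:.2f}")
print(f"absolute CECE bounds: [{lower:.3f}, {upper:.3f}]")

# Hypothetical exposure probabilities, chosen only to show the sensitivity.
for p_exposed in (0.05, 0.10, 0.25, 0.50, 1.00):
    value = absolute_cece_given_exposure(risk_vaccine, risk_control, p_exposed)
    print(f"P(exposed) = {p_exposed:.2f} -> absolute CECE = {value:.3f}")
```

The two ends of the grid recover the bounds: an exposure probability of 1 gives the lower bound, and an exposure probability equal to the control-arm risk gives the upper bound.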

### Sensitivity Analysis in the ChAdOx1 nCoV-19 Vaccine Study

Determining sensitivity parameters to generate point estimates of the absolute CECE requires us to think concretely about the definition of exposure, or to consider a range of definitions of exposure. The ChAdOx1 nCoV-19 vaccine trial began enrollment in June 2020 and recruited a sample of 60%–90% health-care workers, depending on the site. Suppose we define

So far we have reasoned about the sensitivity parameter ^{41} Using this estimate of

More broadly, the bounds illustrate an important point: the relative CECE is constant for *any* exposure when Assumptions (10) and (5) hold, but only weak conclusions can be made about the magnitude of the absolute CECE unless we both have a clear idea about the definition of the exposure and have information about

Estimating the CDE requires data on covariates

## DISCUSSION

In this study, we have distinguished various estimands for vaccine effects conditional on exposure to infection and clarified their identification assumptions. We have required that the exposure, for example, close contact with an infectious individual, is necessary for the outcome of interest to occur, for example, symptomatic disease, as stated in our exposure necessity condition (10). An alternative approach would involve adapting the definition of exposure to something that is possible to measure. For example, one might define exposure as close contact with infected people who present with overt disease. However, such definitions have explicitly been discouraged, precisely because they would lead to an *underestimate* of the exposure in settings where some infections are inapparent.^{2} In the case of the CECE, we have considered the exposure to be any event such that the exposure necessity condition holds.

When a necessary exposure is unmeasured, we have shown that relative effects can be point identified under plausible conditions, but absolute effects can only be bounded under the same conditions. The most commonly reported and publicized results are often relative effects, as in major studies on different COVID-19 vaccines.^{40,42,43} Thus, the results presented in this work give valuable interpretations to the numbers that are computed.

However, often both relative and absolute effects are of interest. Absolute effects are usually studied in optimal regime settings,^{44–46} which reflects the common opinion that heterogeneous effects on the additive scale are most appropriate for evaluating public health interventions.^{47} Importantly, bounds on the additive effect can be used in formal decision theoretic approaches, even if these bounds are wide or cover null effects.^{48,49} Furthermore, if the investigator is willing to invoke assumptions about the probability of exposure, the bounds will be narrower, as we describe in Section External Data and Sensitivity Analysis.

In future work, we will formally consider the generalizability of the different vaccine effects on different scales, including the CECE, which could be applicable to settings with interference outside of the randomized experiment.

## ACKNOWLEDGMENTS

This study was initiated during Louisa Smith’s visit to École Polytechnique Fédérale de Lausanne.

## REFERENCES

**Keywords:** Estimands; Randomized Trials; Sharp bounds; Vaccine effects