The regression discontinuity design has been described as the next best thing after a randomized trial: it may produce valid causal inferences, at least under some assumptions,^{1–4} and can do this without randomization and even without having a real-life control group that is comparable in all relevant respects (or made comparable by adjustment). In principle, the regression discontinuity design is able to estimate effects of interventions in all health care situations in which a treatment is given to persons above a threshold of some continuous prognostic measurement and not given to those below the threshold. The basic idea is that the counterfactual outcomes of the persons who were treated (ie, their expected outcomes if they had not been treated) are derived by extrapolation from a model based on those who did not qualify for treatment.^{3} In their review article in this issue of EPIDEMIOLOGY, Bor et al^{5} explain the background theory: for the limiting situation (ie, for persons whose distance to the threshold approaches zero, either from above or below), the assignment to treatment can be seen as being as good as random, because “chance” (that is, all kinds of irrelevant happenings) will have determined their position at infinitesimally small distances above or below the threshold. Like instrumental variable analysis, regression discontinuity seems popular in economics but potentially underused in medicine and epidemiology. Bor et al^{5} argue that the assumptions of regression discontinuity are more easily met than those of instrumental variable analysis.

The article by Bor et al^{5} is a great effort to bring the regression discontinuity design to the attention of epidemiologic and statistical methodologists so that a wide audience becomes aware of the design and can potentially use it in a variety of circumstances. As their article is somewhat couched in statistical language, the potential usefulness of this design might not be immediately apparent. The design might be used in any existing care setting where rules exist, or new interventions are introduced, that apply to people above or below a particular threshold. Because today’s health care delivery is ever more guideline-driven, a myriad of possibilities may exist to measure the effects of guideline-based interventions or regulations, not only newly introduced interventions but also existing guidelines. For example, the effect of writing a stern letter to doctors who were high prescribers of a particular drug (a β-agonist inhaler for children) has been analyzed by a regression discontinuity design.^{6} Similarly, the effect of cardiovascular treatments to lower serum cholesterol, as prescribed by general practitioners in patients above a particular cholesterol level, is amenable to a regression discontinuity design, as was tried several decades ago^{7} and was recently revived in a sophisticated analysis with an in-depth statistical discussion.^{8} For the effect of an intervention on serum cholesterol itself, a simulation of a regression discontinuity design using data from a randomized controlled trial (RCT) demonstrated that the estimates of the regression discontinuity design were close to the estimates obtained when the randomized control group was used.^{4} In addition, the effect of interventions on disease outcomes can be estimated when the interventions are based on a threshold of some prognostic variable, as demonstrated by Bor et al^{5} in the CD4/mortality example in their article and earlier by others.^{4}

#### SUBTLETIES OF REGRESSION DISCONTINUITY DESIGNS

To see some of the subtleties behind the regression discontinuity design idea, it is useful to first think about the situation in which the preintervention measurement is the same variable as the postintervention outcome (say, preintervention cholesterol and postintervention cholesterol). In that situation we immediately suspect that “regression to the mean” will occur: given variation in measurement of whatever source, persons whose initial value is away from the mean will have different values on their second measurement, even without any intervention. The direction of the difference reverses around the mean, and the differences will be larger the farther the initial value lies from the mean. The phenomenon was described as a genetic phenomenon under the name “regression towards mediocrity” by Galton^{9} in 1886 when comparing the height of parents (the mid-parent height, the first measurement) with the height of their children (the second measurement); the original figure by Galton is reproduced in Figure 1. The phenomenon is well known in clinical medicine and forms the basis of such clinical practices as establishing hypertension only if several measurements at various points in time remain above a certain level. A linear relation between pre- and posttreatment measurements will result if the underlying variable is normally distributed. Figure 2A shows the expected linear regression to the mean without any intervention, and Figure 2B shows what might happen after an intervention in the ideal case wherein the treatment effect is constant above the threshold. (More complex cases with interaction have been dealt with in the literature.^{2},^{5},^{10})
The idea of a limiting situation that leads to exchangeability around the threshold is of little practical use, because the regression lines above and below the threshold must be estimated on a rather sizeable window of pretreatment values above and below the threshold, and in any real-life situation, the expected values above the threshold will differ from those below. The beauty of the regression discontinuity design, however, is that it exploits the regression-to-the-mean phenomenon by estimating the regression-to-the-mean line from a group of persons on one side of the intervention threshold and then extrapolating the line to the other side of the threshold to represent the “expected outcomes.”^{4},^{10}
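The extrapolation idea described above can be illustrated with a small simulation. What follows is only a sketch under stated assumptions, not the method of any of the cited articles: the cholesterol-like scale, the noise levels, the constant additive effect, and all function names are hypothetical choices made for illustration. Pre- and postintervention values share a stable underlying level plus independent noise, persons at or above the threshold are treated, and the regression-to-the-mean line is fitted in the untreated and extrapolated to the treated.

```python
import random
from statistics import fmean

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def simulate_pre_post(n, effect, threshold, seed=0):
    """Return (pre, post, treated) triples. Pre and post share a stable level
    plus independent noise; `effect` is added to post when pre >= threshold.
    All numerical values are hypothetical."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        level = rng.gauss(6.0, 1.0)          # stable underlying value
        pre = level + rng.gauss(0.0, 0.7)    # first measurement
        post = level + rng.gauss(0.0, 0.7)   # second measurement, no treatment
        treated = pre >= threshold
        rows.append((pre, post + (effect if treated else 0.0), treated))
    return rows

def effect_by_extrapolation(rows):
    """Fit the regression-to-the-mean line in the untreated, extrapolate it
    above the threshold, and average observed minus expected in the treated."""
    pre_u = [p for p, _, t in rows if not t]
    post_u = [q for _, q, t in rows if not t]
    intercept, slope = fit_line(pre_u, post_u)
    diffs = [q - (intercept + slope * p) for p, q, t in rows if t]
    return fmean(diffs), slope

rows = simulate_pre_post(20000, effect=-1.0, threshold=6.5, seed=1)
estimate, slope = effect_by_extrapolation(rows)
print("estimated effect:", round(estimate, 2), "slope in untreated:", round(slope, 2))
```

In this setup the fitted slope is below 1, which is the regression to the mean, and the estimate should land near the simulated effect only because the linear model for the untreated is correct by construction.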

Figure 1.

Figure 2.

The usefulness of the regression discontinuity design is not limited to the situation wherein pre- and postintervention measurements are of the same variable. The design may also be used when the outcome is different from the pretreatment test, as in the article by Bor et al.^{5} This opens the possibility of even more applications. Again, one may expect that in real-life estimations the windows will be much wider than in the theoretical limiting situation, and that the more persons there are below a particular threshold value of a prognostic variable, the more their expected outcomes will be different from those above the threshold, and vice versa. One can make assumptions about the form of the relationship between the preintervention measurement and the outcome and posit that a shift of regression functions around the threshold corresponds to a treatment effect.

In general, in situations wherein a linear relation is credible,^{1} or a functional relation between test and outcome is already known (or can be derived from a large part of the data), the results will have greater credibility than if the form of a model has to be empirically tried out. Linearity of the regression line in the untreated can be assumed if both the pre- and posttreatment variables have normal distributions. Of course, perfect normal distributions from minus to plus infinity do not exist in biology, but many biological variables are reasonably bell-shaped, even if somewhat lopsided, or they can be transformed. Linear regression will then offer good approximations, except perhaps at extreme ends of the data.
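The linearity claim can be written out explicitly. As a sketch, assume that in untreated persons the pretreatment value $X$ and the posttreatment value $Y$ are jointly (bivariate) normal, which is slightly stronger than each being normal on its own; the symbols are introduced here for illustration only. The expected outcome given the first measurement is then a straight line:

```latex
E[Y \mid X = x] = \mu_Y + \rho \, \frac{\sigma_Y}{\sigma_X} \, (x - \mu_X), \qquad |\rho| \le 1
```

When both measurements are of the same variable with equal mean and variance, the slope reduces to the correlation $\rho$, which is below 1 whenever the two measurements are imperfectly correlated. That is the regression-to-the-mean line.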

#### WHAT IS ESTIMATED?

Bor et al^{5} emphasize that the regression discontinuity design estimates a local causal treatment effect “at the threshold,” which is different from what is obtained in an RCT where the average treatment effect in the total population in the trial is considered. A Bayesian perspective was recently proposed by Geneletti et al,^{8} with an extensive discussion of the causality assumptions. According to Bor et al,^{5} the causal effect might become similar to that of an RCT only in the ideal case of a constant additive effect over all preintervention measurements. They see an estimation of a local threshold effect as important, however, because if there still is an effect “at the threshold,” this shows that the threshold for treatment might be too high. The reasoning about the windows above and below the threshold is symmetrical: the extrapolation of the outcomes of the treated persons to the untreated gives the latter’s counterfactuals under treatment and estimates the same effect.
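In symbols, the local effect “at the threshold” can be sketched as follows. The notation is an assumption made here for illustration: $X$ is the preintervention measurement, $c$ the threshold, $Y$ the outcome, and treatment is given when $X \ge c$.

```latex
\tau(c) = \lim_{x \downarrow c} E[Y \mid X = x] \; - \; \lim_{x \uparrow c} E[Y \mid X = x]
```

In the ideal case of a constant additive effect, one may write $Y = f(X) + \tau \cdot 1[X \ge c] + \varepsilon$ with $f$ continuous at $c$, so that $\tau(c) = \tau$ at every value of $X$ and the local effect coincides with the average effect that an RCT would estimate.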

As with any design, there are all kinds of caveats. First, several caveats stem from choices of doctors or patients: (1) the intensity of treatment might differ farther away from the threshold, and (2) if the threshold is fuzzy,^{5},^{8} this means that other considerations came into play in allocating treatment, which raises the suspicion of confounding by indication. Second, there are biological and statistical considerations: (1) the effect of treatments may differ in persons closer to the threshold than in persons at more extreme values (ie, interaction), and (2) the relations may be nonlinear and, worse, of unknown form. To counter potential objections, most literature on regression discontinuity emphasizes the value of showing graphically what happens, of using different windows around the intervention threshold, and of being aware of nonlinearity and interactions.
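The advice to try different windows can be illustrated with a deliberately nonlinear example. This is a sketch under assumptions chosen for illustration (a cubic relation between measurement and outcome, a threshold at zero, a constant effect, and hypothetical function names), not a recommended analysis: separate straight lines are fitted on each side of the threshold within a given window, and the jump between them at the threshold is taken as the estimate.

```python
import random
from statistics import fmean

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def local_linear_rd(x, y, threshold, window):
    """Jump at the threshold between two straight lines, each fitted only to
    observations within `window` of the threshold on its own side."""
    left = [(xi, yi) for xi, yi in zip(x, y) if threshold - window <= xi < threshold]
    right = [(xi, yi) for xi, yi in zip(x, y) if threshold <= xi <= threshold + window]
    a_left, b_left = fit_line([p[0] for p in left], [p[1] for p in left])
    a_right, b_right = fit_line([p[0] for p in right], [p[1] for p in right])
    return (a_right + b_right * threshold) - (a_left + b_left * threshold)

def simulate_curved(n, effect, seed=0):
    """Hypothetical data: the outcome follows a cubic curve in x, and
    `effect` is added for persons at or above the threshold of 0."""
    rng = random.Random(seed)
    x = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    y = [xi ** 3 + (effect if xi >= 0.0 else 0.0) + rng.gauss(0.0, 0.5) for xi in x]
    return x, y

x, y = simulate_curved(20000, effect=1.0, seed=2)
for window in (2.0, 1.0, 0.5, 0.25):
    print("window", window, "estimate", round(local_linear_rd(x, y, 0.0, window), 2))
```

With a straight-line model imposed on a curved relation, the wide windows give estimates far from the simulated effect, whereas narrower windows come closer at the price of using fewer observations. Estimates that move markedly with the window are a warning sign of nonlinearity.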

A fierce debate that ran in the 1990s should be mentioned. Many a statistician would intuitively suspect that if measurement error is present in the first measurement (the preintervention measurement), this would bias the estimate of the treatment effect.^{11} However, this intuition is incorrect: the estimate of the treatment effect is unbiased if the intervention threshold is based only on the preintervention measurement (and the relation between pre- and postmeasurement is correctly specified).^{12},^{13} In contrast, if the “true” underlying value is used as the threshold instead of one observed preintervention measurement (for example, a doctor prescribes antihypertensive treatment based on repeated blood pressures, that is, “true hypertension”), corrections for measurement error should be made.^{12},^{13} Researchers might have some difficulty in recognizing the problem, because they might be inclined to use in the analysis only one measured value that is readily available before an intervention, whereas this value is part of a series that led to the decision to treat. As an alternative to correcting for measurement error in one preintervention measurement, in such instances one might use a “true underlying value” obtained from several measurements as the variable to which the threshold is applied.
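The two situations in this debate can be contrasted in a simulation. The sketch below rests on assumptions made purely for illustration: a standard normal “true” value, one error-prone observed measurement, a threshold at zero, an outcome that depends on the true value, and hypothetical function names. The “true value” scenario is idealized, with the treatment decision using the error-free level as a stand-in for a decision based on many repeated measurements. In both scenarios the analyst adjusts only for the single observed measurement.

```python
import random
from statistics import fmean

def adjusted_effect(pre, treated, post):
    """Coefficient of `treated` in the least-squares fit post ~ pre + treated."""
    d = [1.0 if t else 0.0 for t in treated]
    mx, md, my = fmean(pre), fmean(d), fmean(post)
    sxx = sum((x - mx) ** 2 for x in pre)
    sdd = sum((di - md) ** 2 for di in d)
    sxd = sum((x - mx) * (di - md) for x, di in zip(pre, d))
    sxy = sum((x - mx) * (y - my) for x, y in zip(pre, post))
    sdy = sum((di - md) * (y - my) for di, y in zip(d, post))
    return (sxx * sdy - sxd * sxy) / (sxx * sdd - sxd ** 2)

def simulate(n, effect, assign_on_true, seed=0):
    """Treatment is assigned on the observed measurement or, if
    `assign_on_true` is set, on the error-free underlying value."""
    rng = random.Random(seed)
    pre, treated, post = [], [], []
    for _ in range(n):
        true_value = rng.gauss(0.0, 1.0)
        observed = true_value + rng.gauss(0.0, 0.5)   # one error-prone measurement
        t = (true_value if assign_on_true else observed) >= 0.0
        pre.append(observed)                          # the analyst sees only this
        treated.append(t)
        post.append(true_value + (effect if t else 0.0) + rng.gauss(0.0, 0.5))
    return pre, treated, post

on_observed = adjusted_effect(*simulate(20000, -1.0, assign_on_true=False, seed=3))
on_true = adjusted_effect(*simulate(20000, -1.0, assign_on_true=True, seed=3))
print("threshold on observed value:", round(on_observed, 2))
print("threshold on true value:    ", round(on_true, 2))
```

When assignment depends only on the observed measurement, adjusting for that measurement should recover the simulated effect despite the measurement error. When assignment depends on the true value, treatment status carries information about the true value beyond what the single observed measurement holds, and the uncorrected estimate is pulled away from the simulated effect.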

#### LET’S TRY IT OUT

Whatever the caveats and the debates, here is an interesting idea with several variants in its practical application. Actually, one wonders whether it can be called “a design,” rather than the general idea of estimating counterfactual outcomes by a model instead of a comparison group. The latter idea has become commonplace in epidemiology, for instance when G-computation is used. A regression discontinuity design is less efficient than an RCT, and larger numbers are needed, but the regression discontinuity design can be applied in existing care settings and can use existing data. Thus, the idea should be experimented with. It might be useful to check its robustness again on data from RCTs (as was done by Finkelstein et al^{4} earlier) or in other situations in which the effect is well-known. Also, it might be useful to examine in which situation the regression discontinuity design is more robust: when pre- and postintervention measurements are the same, or when an outcome is studied that is different from the preintervention measurement. We have the benefit of a large methodological literature, going back several decades and over several fields of observational science from education to economics. Now it is our turn to study the feasibility and robustness of this design in diverse areas of medicine and public health!

#### ABOUT THE AUTHORS

JAN P. VANDENBROUCKE is Royal Academy Professor at the Department of Clinical Epidemiology, Leiden University Medical Center, the Netherlands. His main interest is translating techniques of observational epidemiology to the research interests of tertiary-care physicians. He was introduced to the regression discontinuity design in a course by Olli Miettinen in 1977 in the Netherlands, and has repeatedly tried to interest clinical investigators in this design. SASKIA LE CESSIE is an associate professor in the Departments of Medical Statistics and Clinical Epidemiology of the Leiden University Medical Center. She is interested in statistical methods used in epidemiologic research, in particular methods to derive causal effects from observational data.