The Case-Control Study as Data Missing by Design: Estimating Risk Differences.

Wacholder, Sholom

There are advantages to viewing the case-control design as a missing-data problem instead of as a sampling problem. In the simplest setup, cases are those members of a population who develop disease; controls can be a small random sample of the large number who do not; and covariates, including exposures and other important variables, are available only for cases and controls and are assumed to be missing at random for the remaining large fraction of the population. This approach allows estimation of the joint distribution of all variables in the population. Thus, when the size of the population is known, analysis is not restricted to logistic and other multiplicative intercept models. Methods based on this approach can obtain estimates and confidence intervals for parameters representing the effect of exposure on disease, with multivariate adjustment for other factors. In particular, case-control data can be used to estimate the risk difference, a parameter with great public health value. The missing-data perspective offers an additional advantage by linking the "study base principle" of control selection with the statistical concept of "missing at random." As an illustration, I use a subset of data from a case-control study to estimate the difference in annual risk of bladder cancer between smokers at various levels of smoking and lifetime nonsmokers, adjusted for occupational exposure.
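The role of a known population size can be illustrated with a minimal sketch. It is not the estimation method of the paper, which also provides confidence intervals and multivariate adjustment. It is a crude, unadjusted calculation that assumes all cases are observed and that controls are a simple random sample of the non-cases, so each control can be weighted up to represent the non-cases whose covariates are missing. The function name `risk_difference` and all counts in the usage lines are hypothetical.

```python
def risk_difference(cases_exposed, cases_unexposed,
                    controls_exposed, controls_unexposed,
                    population_size):
    """Crude risk difference from case-control counts and a known population size.

    Assumes every case in the population is observed and that controls are a
    simple random sample of the non-cases (covariates missing at random for
    the unsampled non-cases). No covariate adjustment is attempted.
    """
    n_cases = cases_exposed + cases_unexposed
    n_controls = controls_exposed + controls_unexposed
    if n_controls == 0:
        raise ValueError("at least one control is required")
    if population_size <= n_cases:
        raise ValueError("population_size must exceed the number of cases")

    # Each sampled control stands in for this many non-cases in the population.
    weight = (population_size - n_cases) / n_controls

    # Estimated numbers of exposed and unexposed people in the whole population.
    pop_exposed = cases_exposed + weight * controls_exposed
    pop_unexposed = cases_unexposed + weight * controls_unexposed
    if pop_exposed == 0 or pop_unexposed == 0:
        raise ValueError("both exposure groups must be represented in the data")

    risk_exposed = cases_exposed / pop_exposed
    risk_unexposed = cases_unexposed / pop_unexposed
    return risk_exposed - risk_unexposed, risk_exposed, risk_unexposed


# Hypothetical counts, for illustration only (not data from the study).
rd, r1, r0 = risk_difference(
    cases_exposed=60, cases_unexposed=40,
    controls_exposed=50, controls_unexposed=150,
    population_size=100_100,
)
print(f"risk exposed={r1:.6f}, risk unexposed={r0:.6f}, difference={rd:.6f}")
```

Without `population_size` the weight is unknown, and only ratio measures such as the odds ratio can be recovered from the same four counts; this is the sense in which knowing the population size frees the analysis from multiplicative intercept models.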

(C) Lippincott-Raven Publishers.