In the late 1980s, when we started developing the case-crossover design, confusion about the relation between case-control studies and cohort or follow-up studies was still common. Some epidemiologists were still teaching students that a case-control study is like a backwards cohort study—the “trohoc” fallacy.1
The turning point was the realization that most case-control studies are (or ought to be) the same as dynamic-population follow-up studies, except for a key difference: controls are a small sample rather than a complete census of the study base—the population-time that produced the cases.2 This realization played a role in our formulation of the case-crossover design, as we puzzled over how to identify triggers of myocardial infarction (MI) onset by retrospective interviews with patients.3 Here the underlying dynamic follow-up study was like a crossover experiment, except that the time at which subjects crossed back and forth between exposure and nonexposure was not controlled by an experimenter but influenced by natural events or subjects' personal decisions.
The impetus for using cases as their own controls was our finding that all other control groups are too vulnerable to selection or information bias.4 By interviewing patients about the day of their heart attack (the case day) as well as the day before (the matching control day), we could rule out confounding by constant characteristics of patients. Also, the subjects' interpretations of our questions about exposures would be the same for both days.
Initially, we saw little reason why case-crossover designs might be useful in the absence of those biases, such as in the context of database studies of the health effects of air pollution. Such studies typically capture all deaths or hospitalizations in a population, minimizing selection bias. Air pollution data are comprehensive and collected independently from the health outcome data, so differential information bias is negligible.
However, it rapidly became apparent that case-crossover studies do offer an additional research strategy for air pollution researchers, especially when data are collected from individual subjects. Even with no additional data besides what are already available in health and air pollution databases, a case-crossover analysis can complement time-series analyses. By redefining the time scale so “time zero” is the onset of the outcome, a case-crossover analysis enables the investigator to directly conceptualize the causal relation, and may thereby reveal associations as well as problems that were initially overlooked in time-series analyses. For example, long-term, seasonal and weekly patterns of traffic density, air pollution and timing of disease onset make it necessary to control for time trends in exposure. In a case-crossover analysis, this can be achieved by design, ie, by selecting control intervals on the same day of week and from the same calendar month as each case interval, and ignoring exposure data from other times. This controls confounding by time trends to the extent that the dimensions of the strata capture the temporal variation. In a time-series analysis, the same can be achieved by modeling, ie, by keeping the exposure data from all other times in the follow-up period and adding terms to a multivariate model to control for such time trends. If the model specification happens to be right, then using exposure data from other times preserves statistical information in the analysis and improves statistical precision. However, modeling can add misinformation if the modeling assumptions happen to be wrong; eg, uncontrolled confounding can result from incorrectly specifying the shape of the long- and short-term time trends. Some people understand and trust analyses more when confounding is controlled by design.
Simple crude analyses from case-crossover studies—especially raw counts of events in case and control periods—are easy to understand, despite their involving control for many confounders by self-matching.
Does a case-crossover analysis actually uncover new information, or does it just help investigators to perceive it? First, we must understand the nature of this question by contrasting a case-crossover study with a traditional case-control study. The latter asks “Why them?” (Why did these people become cases whereas those people did not?). A case-crossover study answers the question “Why now?” (Why did these individuals become cases on this day rather than previous days?) As such, case-crossover studies extract different information from databases and estimate different relative risks. For example, the relative risk of motor vehicle collisions comparing benzodiazepine users to nonusers is different from the relative risk of collisions comparing days when benzodiazepines were taken versus not taken among intermittent users who ultimately have collisions.5 The clearest difference is that a case-control study includes people who are “immune” to the outcome while a case-crossover study excludes them. The exposure odds among immune people enter into the computation of the control group's exposure odds, but have no part conceptually or computationally in the case-crossover odds ratio. In lay terms, “How common are people with immunity?” is relevant to the question “Why them?” but not “Why now?”
This difference does not exist between case-crossover and time-series analyses. Both address the question “Why now?” Lu et al, in this issue of the Journal,6 and in a companion paper Lu and Zeger,7 show the near equivalence of conditional-logistic case-crossover analyses and stratified log-linear time-series analyses of air pollution data in relation to population health data. Lu et al6 emphasize that they are dealing with special circumstances—a type of case-crossover study in which the exposure is common to all.
Everyone in the population crosses back and forth with the same frequency between exposed and unexposed periods of air pollution. Another special circumstance assumed by Lu et al is “time-stratified bidirectional control sampling.”8 “Time-stratification” means control days are selected only from the same time window (eg, calendar month) as the case day. “Bidirectional” means control days are selected both before and after the case day.
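The referent selection just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stratum is taken to be the calendar month, control days are additionally matched on day of week, and the function name `time_stratified_referents` is hypothetical rather than taken from Lu et al or any software package.

```python
from datetime import date, timedelta


def time_stratified_referents(case_day: date) -> list[date]:
    """Return control days in the same calendar month and on the same
    weekday as the case day, excluding the case day itself.

    Assumption for illustration: stratum = calendar month x day of week.
    """
    referents = []
    # Bidirectional: step backward (-7) and forward (+7) from the case day,
    # stopping as soon as a step leaves the case day's calendar month.
    for step in (-7, 7):
        day = case_day + timedelta(days=step)
        while day.year == case_day.year and day.month == case_day.month:
            referents.append(day)
            day += timedelta(days=step)
    return sorted(referents)


if __name__ == "__main__":
    # Hypothetical case day; prints the matched control days for its stratum.
    for control_day in time_stratified_referents(date(2004, 6, 16)):
        print(control_day.isoformat())
```

Because the stratum is fixed by the calendar and not by the position of the case day within it, every day in a stratum has the same referent set, whichever of those days turns out to be the case day.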
Such circumstances are uncommon in broad applications of epidemiologic research and apply primarily in studies of individual effects triggered by ecologic fluctuations. Bidirectional designs are feasible only when future exposure events cannot be affected by past outcomes. Examples might include studies of fluctuating water turbidity that triggers gastrointestinal illness, or volatile stock markets that trigger heart attacks among brokers. In these examples, case-crossover analyses might extract no more information than already contained in time-series analyses. In fact, as Lu et al6 show, time-series analysis generally would have more statistical precision (assuming no misspecification of the model), plus the ability to adjust for over-dispersion of data about the estimate of the background trend.
How Generalizable Are the Findings of Lu et al to Other Types of Case-Crossover Study?
They plan to address this mathematically in further investigations. We predict they will discover that the advantages of assumption-dependent time-series analyses wane as they move further from those special circumstances. In any case, the magnitude of those advantages is small compared with the nondifferential exposure misclassification that comes from errors in measuring times of events and quantities of exposures that affect case-crossover and time-series analyses alike.9
To illustrate how special these circumstances are, let us apply them to our analysis of strenuous exertion as a trigger of MI onset.10 First, a “bidirectional” design was unthinkable because patients' exertion on future control days would be greatly influenced by their history of MI. If we had included future control days, reverse causation bias (outcomes influencing exposure) would have rendered our exposure odds ratios uninterpretable. Second, “exposure common to all” would be like restricting the analysis to patients who reported the same usual frequency of exertion, eg, once per week. Everyone would contribute similar weight to the analysis. Not only would there be no “concordant” subjects who drop out of the analysis because they never had strenuous exertion, there would be no “nearly concordant” subjects (eg, with no exertion on the MI day and exertion only once or twice a year in the control period) who contribute just a little information, nor any “very discordant” subjects (eg, with no exertion on the MI day, yet daily exertion in the control period) who contribute the most information.
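The unequal contributions of concordant and discordant subjects can be made concrete with a small sketch. It uses the standard Mantel-Haenszel odds ratio for matched sets, treating each subject as one stratum with one case interval and several control intervals; this is a swapped-in textbook estimator for illustration, not necessarily the analysis used in the exertion study, and the class name, function names and all data values below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    case_exposed: bool     # exposed during the case interval?
    controls_exposed: int  # number of control intervals with exposure
    controls_total: int    # number of control intervals sampled


def mh_terms(s: Subject) -> tuple[float, float]:
    """One subject's contribution to the Mantel-Haenszel numerator and
    denominator (one case interval plus controls_total control intervals)."""
    n = s.controls_total + 1
    if s.case_exposed:
        # Exposed case interval paired against unexposed control intervals.
        return (s.controls_total - s.controls_exposed) / n, 0.0
    # Unexposed case interval paired against exposed control intervals.
    return 0.0, s.controls_exposed / n


def mh_odds_ratio(subjects: list[Subject]) -> float:
    terms = [mh_terms(s) for s in subjects]
    numerator = sum(t[0] for t in terms)
    denominator = sum(t[1] for t in terms)
    if denominator == 0:
        raise ValueError("no informative unexposed case intervals")
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical subjects, each with four control intervals.
    subjects = [
        Subject(False, 0, 4),  # concordant: never exposed, contributes nothing
        Subject(True, 4, 4),   # concordant: always exposed, contributes nothing
        Subject(True, 1, 4),   # exposed on the case day, rarely otherwise
        Subject(True, 0, 4),   # exposed only on the case day
        Subject(False, 1, 4),  # nearly concordant: small contribution
        Subject(False, 4, 4),  # very discordant: largest contribution
    ]
    for s in subjects:
        print(s, mh_terms(s))
    print("MH odds ratio:", mh_odds_ratio(subjects))
```

In this sketch the two concordant subjects add zero to both sums, so they drop out of the estimate, while the weight of each remaining subject depends on how far the case interval departs from that subject's own control intervals.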
Nevertheless, we feel some recommendations by Lu et al are pertinent to most case-crossover studies. As in any thoughtful analysis, investigators should examine their data and look for outliers. If you have the data to do it, examine your problem with a dynamic follow-up design as well. If your main analysis uses a multivariate model, assess your modeling assumptions. If you are estimating a trend in exposure across multiple control intervals, and using the estimate to predict expected exposure frequencies in case intervals, examine for over-dispersion.
If we had had the benefit of the insights of Lu et al, would we have done anything differently in our earlier case-crossover work? We might have made more comparisons within and among control intervals, displaying trends or exposure patterns and trying to use that information to adjust the expected exposure frequency in the case interval. However, the subtleties of accounting for over-dispersion around trend estimates would have seemed inappropriate in a context where avoiding information bias in self-reported data was paramount.
Most problems investigated by case-crossover studies involve complexities that do not occur in air pollution studies. Yet, by simultaneously using case-crossover and time-series analyses to study the same questions with the same data, air pollution researchers have improved our understanding of the theoretical relation between the case-control paradigm and the follow-up study paradigm, in the context of subjects crossing between exposure levels. Such understanding is always useful, even for conducting case-crossover studies in which potential selection and information biases may be large relative to concerns about statistical precision and unmeasured covariates.9
ABOUT THE AUTHORS
MALCOLM MACLURE is Manager of Research in the Pharmaceutical Services Division of the British Columbia Ministry of Health, with adjunct appointments at the Harvard School of Public Health and University of Victoria, British Columbia (Canada). He focuses on methodologic problems in pharmacoepidemiology and health services research using central administrative databases. MURRAY MITTLEMAN is director of the Cardiovascular Epidemiology Research Unit at the Beth Israel Deaconess Medical Center, and Associate Professor of Epidemiology at the Harvard School of Public Health. His applied work is primarily in the areas of cardiovascular disease and injury epidemiology. Maclure introduced the idea of case-crossover study design, and Mittleman has worked on its methodological aspects since its inception.
1. Poole C. Controls who experienced hypothetical causal intermediates should not be excluded from case-control studies. Am J Epidemiol
2. Miettinen OS. The “case-control” study: valid selection of subjects. J Chronic Dis
3. Maclure M. The case-crossover design: a method for studying transient effects on the risk of acute events. Am J Epidemiol
4. Maclure M, Mittleman MA. Should we use a case-crossover design? Ann Rev Public Health
5. Maclure M. ‘Why me?' versus ‘why now?'—differences between operational hypotheses in case-control versus case-crossover studies. Pharmacoepidemiol Drug Saf
6. Lu Y, Symons JM, Geyh AS, et al. An approach to checking and improving upon case-crossover analyses based on equivalence with time-series methods. Epidemiology
7. Lu Y, Zeger SL. On the equivalence of case-crossover and time series methods in environmental epidemiology. Biostatistics
8. Janes H, Sheppard L, Lumley T. Case-crossover analyses of air pollution exposure data: referent selection strategies and their implications for bias. Epidemiology
9. Mittleman MA. Optimal referent selection strategies in case-crossover studies: a settled issue. Epidemiology
10. Mittleman MA, Maclure M, Tofler GH, et al. Triggering of acute myocardial infarction by heavy physical exertion: protection against triggering by regular exertion. Determinants of Myocardial Infarction Onset Study Investigators. N Engl J Med