Epidemiologic studies often examine the experience of a population over time to estimate the association between exposure and outcome. These studies generally use case–control, prospective cohort, or retrospective cohort designs. In this work, we discuss an alternative design, which we term the cross-sectional cohort study. This design uses a cohort identified in the present, assesses the timing of exposures and outcomes retrospectively, and applies analytic methods for cohort studies. The cross-sectional cohort design has previously been mentioned briefly by Miettinen,1 and used in some previous studies,2–4 but recent texts have emphasized primarily the flaws of using cross-sectional sampling to achieve the goals of a cohort or case–control study.5,6 There has been no explicit discussion of the conditions required for validity of the cross-sectional cohort design or the situations in which it may be advantageous. This method can be particularly useful, however, when studying certain chronic or episodic disorders, such as psychiatric disorders—provided that threats to validity, such as nonignorable exiting and measurement error, can be adequately addressed.
DESCRIPTION OF DESIGN
The cross-sectional cohort design involves cross-sectional sampling to obtain a study cohort and then retrospective assessment of the history of exposures and outcomes in the members of that cohort. We define the study cohort as the set of all individuals from a given source population who are available for evaluation at a specific calendar time point, t1. Usually, t1 will be the present, although it does not have to be. After obtaining the study cohort (or more commonly in actual practice, a random sample of the study cohort), we assess exposure and outcome information for each individual at each time point back from t1 to the onset of that individual's period of risk. Normally, the onset of the period of risk is defined as the time at which the individual naturally becomes at risk for an outcome (eg, birth, onset of puberty, time of first exposure to a potential toxin under study). However, the onset of the period of risk may alternatively be defined as a later time point (more technically, the period of risk can be left-truncated by design)—for example, if the investigators are concerned with the decay of memory over time, or with information bias associated with recall of long-past events. For the study cohort as a whole, the observation period begins at the earliest time point at which any individual in the cohort enters the period of risk. We define this time point as t0.
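As a minimal sketch of these definitions, the following Python fragment derives t0 and each subject's start of observation from the calendar year in which that subject enters the period of risk. The function name `observation_window`, the use of whole calendar years, and the optional `truncate_at` argument (one possible way to left-truncate by design) are illustrative assumptions, not part of the original description.

```python
from typing import List, Optional, Tuple


def observation_window(
    risk_onset_years: List[int],
    t1: int,
    truncate_at: Optional[int] = None,
) -> Tuple[int, List[int]]:
    """Return (t0, per-subject start years) for a study cohort assessed at t1.

    risk_onset_years: calendar year in which each subject enters the period of risk.
    truncate_at: hypothetical design choice; if given, no observation starts earlier.
    """
    starts = []
    for onset in risk_onset_years:
        if onset > t1:
            # Subject has not yet entered the period of risk at t1.
            continue
        start = onset if truncate_at is None else max(onset, truncate_at)
        starts.append(start)
    if not starts:
        raise ValueError("no subject has entered the period of risk by t1")
    # t0 is the earliest time at which any subject enters the period of risk.
    return min(starts), starts


# Hypothetical onset years for three subjects, assessed in 2005.
onsets = [1968, 1985, 2001]
t0, starts = observation_window(onsets, t1=2005)
t0_trunc, starts_trunc = observation_window(onsets, t1=2005, truncate_at=1990)
```

With left truncation, subjects whose period of risk began before the truncation year are observed only from that year onward, so t0 moves forward accordingly.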
For any given study cohort, we can construct a hypothetical conceptual cohort, defined as individuals in the study cohort plus all other individuals from the source population who also entered the period of risk between calendar times t0 and t1, but who are not available at t1, and therefore are unknown to the investigator (see Figure 1; note that a similar figure appears in Szklo and Nieto6). Thus in the cross-sectional cohort study, the conceptual cohort forms the study base, and the study cohort represents “survivors” from this base at time point t1 (although we do not use the term “survivor” herein because it suggests mortality—and mortality is only one of many reasons that an individual might be unavailable at t1).
The conceptual cohort is a closed cohort with no loss to follow-up, and the decision to stop observation at t1 is independent of exposure or outcome (technically, there is noninformative type I right censoring7). Note that calendar time is used to define the cohort, whereas an alternative time metric, such as age, may be more appropriate for the analysis.8 A prospective study of this conceptual cohort would clearly yield an unbiased estimate of the hazard for each level of exposure. Therefore, under all conditions where a study cohort yields the same estimates of measures of effect as its underlying conceptual cohort, the corresponding cross-sectional cohort study will also produce unbiased estimates of measures of effect. By extension, the cross-sectional cohort design also will yield unbiased estimates of measures of effect when using efficient sampling schemes employed in traditional cohort studies, such as nested case–control and case–cohort designs.5,6,9 We summarize these conditions for validity in the text below and demonstrate them formally in the Appendix (available with the electronic version of the article).
Finally, to dispel possible confusion regarding the term “cross-sectional cohort,” we note that (1) “cross-sectional” refers to the selection of subjects in the present, usually without regard to exposure or outcome, whereas the assessments cover both present and past experience and (2) “cohort” refers broadly to a dynamic (or open) cohort whose past experience is assessed at t1, rather than referring more narrowly to a specific closed cohort, or to a group of persons still at risk for the outcome at t1.
Consider a study of whether sex and attention-deficit hyperactivity disorder in childhood are risk factors for major depressive disorder and alcohol abuse in adolescence or adulthood (ie, age 13 or older). To test this hypothesis using a cross-sectional cohort design, we interview all individuals age 13–50 years in a given region who report that they were born in that region. Here, t0 is the date that the oldest subject turned 13; t1 is the present; the conceptual cohort is the set of all individuals born in the region between 13 and 50 years ago; and the study cohort is the subset of those individuals who still live in the region. We administer a standard structured diagnostic interview, such as the Structured Clinical Interview for DSM-IV,10 to assess all subjects for the outcomes of major depressive disorder and alcohol abuse (including the age of onset of these outcomes). The exposure variable of sex is easily determined; childhood attention-deficit hyperactivity disorder before age 13 is scored as present or absent on the basis of a standard self-administered rating scale, such as the Wender Utah Rating Scale,11 which has been shown to be reliable and valid for assessing this diagnosis retrospectively.11,12 We then analyze the association between each of the exposure variables and the development of major depressive disorder or alcohol abuse. For example, we can estimate the hazard ratio with a Cox proportional hazards model that uses age (ie, time from cohort entry) as the time scale and includes potential confounders as covariates, or we can use discrete-time survival methods.
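To make the analytic step concrete, the sketch below computes a crude incidence rate ratio from retrospectively reported ages of onset. It is a deliberately simpler stand-in for the Cox or discrete-time models mentioned above: it ignores confounders and assumes a constant rate over age. The `Subject` fields, the handful of records, and the helper name `events_and_person_years` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

RISK_START_AGE = 13  # onset of the period of risk in the example


@dataclass
class Subject:
    age_at_interview: int        # age at t1
    exposed: bool                # e.g., childhood ADHD present
    onset_age: Optional[int]     # reported age at first onset; None if never


def events_and_person_years(subjects: List[Subject], exposed: bool) -> Tuple[int, int]:
    """Count outcome events and years at risk (from age 13) in one exposure group."""
    events, person_years = 0, 0
    for s in subjects:
        if s.exposed != exposed or s.age_at_interview < RISK_START_AGE:
            continue
        if s.onset_age is not None and s.onset_age < RISK_START_AGE:
            continue  # onset before the period of risk: not at risk at entry
        end_age = s.onset_age if s.onset_age is not None else s.age_at_interview
        person_years += end_age - RISK_START_AGE
        events += 1 if s.onset_age is not None else 0
    return events, person_years


# Hypothetical interview data, not taken from any study.
subjects = [
    Subject(30, True, 18), Subject(40, True, None), Subject(25, True, 20),
    Subject(35, False, None), Subject(45, False, 33),
    Subject(20, False, None), Subject(50, False, None),
]

e1, py1 = events_and_person_years(subjects, exposed=True)
e0, py0 = events_and_person_years(subjects, exposed=False)
rate_ratio = (e1 / py1) / (e0 / py0)
```

In a real analysis, the same person-time layout would feed a regression model so that covariates and a flexible baseline hazard could be accommodated.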
Of course, we could study these same exposure–outcome relationships with any of the 3 more common designs listed earlier. A prospective cohort study would be less vulnerable to error associated with retrospective assessment, but would be inefficient, requiring decades to complete. A retrospective cohort study would be feasible, but only if we happened to possess a historical enumerated cohort of individuals from the region. A case–control study might seem efficient because we could recruit cases of depression and alcoholism at treatment centers. But many individuals with these disorders do not seek treatment, and treatment-seeking is likely related to the exposure variables, possibly causing bias. We could perform a more valid case–control study by identifying cases through community interviews, recruiting separate controls for each of the 2 outcomes, and discarding all other interviewed subjects. But this option would offer little advantage over the cross-sectional cohort design, since most of our cost is for interviews to identify cases, and it costs almost nothing to add exposure assessments to each interview.
Thus, the cross-sectional cohort design is attractive in this example: first, because both the exposures and outcomes are common; and second, because retrospective assessment of psychiatric measures such as these is a well-established procedure. Moreover, the cross-sectional cohort design requires fundamentally the same assumptions for its validity as other approaches, although in practice certain of these assumptions require particular attention in a cross-sectional cohort study. Below, we discuss the 2 principal types of threats to validity: (1) nonignorable exiting and (2) measurement error and selection bias. We then discuss practical considerations arising from these threats when conducting a cross-sectional cohort study.
THREATS TO VALIDITY
In the cross-sectional cohort study, as in all of the aforementioned study designs, some subjects will exit the study base by the time that evaluation for the outcome is performed. If subjects exit in a manner that does not bias estimates of the measures of effect, we call this ignorable exiting; conversely, when the pattern of exiting results in biased estimates, exiting is nonignorable. (Note that we are concerned here only with exiting, and not entering, since the fixed rules for entry into the conceptual cohort preclude nonignorable entering.) Nonignorable exiting potentially threatens all study designs, but often this is not explicitly acknowledged, because in practice the threat may be low. For example, when using incident cases in case–control studies, little time exists for the development of the outcome to affect availability for sampling. Furthermore, in prospective and retrospective cohort studies, the investigators know which subjects are unavailable for assessment, and so they can use information available on the missing subjects to analyze for possible effects of the outcome on exiting. But in a typical cross-sectional cohort study, the outcomes may have occurred long in the past, so that the outcome has ample opportunity to influence exiting—and the investigators possess no data on the characteristics of those who have exited. In the cross-sectional cohort study, therefore, one must pay particular attention to the conditions required for ignorable exiting, which thereby permit unbiased estimates of the effects of interest. We present the conditions for ignorable exiting with respect to estimates of the hazard and the hazard ratio below, and provide a formal demonstration of them in the Appendix (available with the electronic version of the article).
The hazard or incidence rate represents a fundamental measure of occurrence of an outcome. For the cross-sectional cohort design to have ignorable exiting and produce unbiased estimates of the hazard, it is sufficient that at each point in the time metric used in the analysis, among individuals sharing any given level of the explanatory variables (exposure and covariates), the probability of ultimately being a member of the study cohort is equal for those with and for those without the outcome (demonstrated in Section I of the Appendix). Hereafter, we will use the term “probability of membership in the study cohort” for brevity, rather than “probability of ultimately being a member of the study cohort.”
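This condition can be sketched in symbols. The notation here is introduced for illustration and is not necessarily that of the Appendix: let $h(t \mid x)$ be the hazard in the conceptual cohort at time $t$ for explanatory-variable level $x$, let $\pi_1(t,x)$ be the probability of membership in the study cohort for individuals whose outcome occurs at $t$, and let $\pi_0(t,x)$ be that probability for individuals still free of the outcome just before $t$. Under these assumed definitions and continuous time, the hazard observed in the study cohort, $h^{*}$, satisfies:

```latex
h^{*}(t \mid x) \;=\; h(t \mid x)\,\frac{\pi_1(t,x)}{\pi_0(t,x)},
\qquad\text{so that}\qquad
\pi_1(t,x) = \pi_0(t,x)\ \ \text{for all } t,\,x
\;\Longrightarrow\;
h^{*}(t \mid x) = h(t \mid x).
```

In words, the observed hazard is the true hazard rescaled by the relative probability of membership for those with versus those without the outcome, and the rescaling disappears when the two probabilities are equal.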
The hazard ratio (or incidence rate ratio) and hazard difference (or incidence rate difference) are among the most important measures of association between an outcome and an exposure. Conditions for unbiased estimates of the hazard difference are the same as those for the hazard. However, when the above assumption for estimating the hazard is not met (ie, when the probability of membership in the study cohort depends on the outcome), the cross-sectional cohort study can still have ignorable exiting with respect to estimating the hazard ratio under a less restrictive sufficient condition—namely, that at each point in the time metric used in the analysis, among individuals sharing any given level of the explanatory variables, the relative probability of membership in the study cohort for those with versus those without the outcome (ie, the ratio of these 2 probabilities) is the same as the relative probability of membership among individuals at any other given level of the explanatory variables; in other words, this ratio is constant across all levels of the explanatory variables (demonstrated in Section II of the Appendix). For a binary exposure, this would mean that the logarithm of the probability of membership in the study cohort, for a given combination of exposure and outcome, is an additive function of the exposure and the outcome status (see Appendix, Section II).
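The less restrictive condition for the hazard ratio can be sketched as follows, using notation assumed for illustration rather than taken from the Appendix: $h(t \mid x)$ is the hazard in the conceptual cohort, $h^{*}(t \mid x)$ is the hazard observed in the study cohort, and $\pi_y(t,x)$ is the probability of membership in the study cohort at time $t$ for exposure level $x$ and outcome status $y$ ($y=1$ with the outcome, $y=0$ without). For a binary exposure with levels $x_1$ and $x_0$:

```latex
\frac{h^{*}(t \mid x_1)}{h^{*}(t \mid x_0)}
\;=\;
\frac{h(t \mid x_1)}{h(t \mid x_0)}
\times
\frac{\pi_1(t,x_1)\,/\,\pi_0(t,x_1)}{\pi_1(t,x_0)\,/\,\pi_0(t,x_0)} .
% The second factor equals 1 when the membership ratio does not depend on x.
% One sufficient form is log-additivity in exposure and outcome:
\log \pi_y(t,x) \;=\; \alpha(t) + \beta(t)\,x + \gamma(t)\,y
\;\;\Longrightarrow\;\;
\frac{\pi_1(t,x)}{\pi_0(t,x)} \;=\; e^{\gamma(t)} \quad\text{for every } x .
```

Under log-additivity the bias factor cancels from the ratio even though each observed hazard is individually rescaled by $e^{\gamma(t)}$.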
In practice, this assumption means that even if (1) the level of exposure influences the probability of membership in the study cohort and (2) occurrence of outcome influences the probability of membership in the study cohort, we can still obtain an unbiased estimate of the hazard ratio, provided that these 2 influences are independent and do not interact with each other (ie, changes in exposure do not change the tendency of the outcome to influence membership, and vice versa).
To illustrate this assumption, suppose that exposure influences membership in the above example: at any given age, men are more likely than women to exit the study region. Suppose also that outcome influences membership: people with major depressive disorder are more likely to exit than those without. Even in these circumstances, our study will still yield an unbiased estimate of the hazard ratio, barring the circumstance that these 2 influences interact with each other (for example, if being male were somehow to accelerate exiting among depressed people, while not comparably accelerating exiting among nondepressed people).
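A small numerical sketch of this illustration follows. It assumes the simplified relation that the observed hazard equals the true hazard multiplied by the ratio of membership probabilities for those with versus those without the outcome. All hazards and membership probabilities are invented for the arithmetic, and the function names are hypothetical.

```python
def membership_prob(male: bool, depressed: bool, interaction: float = 1.0) -> float:
    """Hypothetical probability of still living in the region at t1."""
    base = 0.9
    sex_factor = 0.8 if male else 1.0           # men exit more often
    outcome_factor = 0.7 if depressed else 1.0  # depressed people exit more often
    # interaction != 1 means sex modifies the effect of depression on exiting
    extra = interaction if (male and depressed) else 1.0
    return base * sex_factor * outcome_factor * extra


def observed_hazard(true_hazard: float, male: bool, interaction: float = 1.0) -> float:
    """Assumed relation: true hazard rescaled by the case/noncase membership ratio."""
    ratio = (membership_prob(male, True, interaction)
             / membership_prob(male, False, interaction))
    return true_hazard * ratio


def observed_hazard_ratio(h_male: float, h_female: float, interaction: float = 1.0) -> float:
    """Male-versus-female hazard ratio as it would appear in the study cohort."""
    return (observed_hazard(h_male, True, interaction)
            / observed_hazard(h_female, False, interaction))


true_hr = 0.02 / 0.04                                    # men versus women
hr_no_interaction = observed_hazard_ratio(0.02, 0.04)
hr_with_interaction = observed_hazard_ratio(0.02, 0.04, interaction=0.5)
```

With these invented numbers, each observed hazard is biased downward, yet the hazard ratio without interaction matches the true value of 0.5. Once being male additionally halves the membership probability of depressed people, the observed ratio falls to 0.25.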
Measurement Error and Selection Bias
All study designs require that exposure, outcome, and covariates be measured accurately. However, because the cross-sectional cohort study is retrospective, these assessments are more vulnerable to measurement error than contemporaneous assessments. This error can be either nondifferential (for example, attributable to the decay of memory over time) or differential (for example, when subjects’ outcome status influences their reporting of past exposures, thus causing information bias). Superficially, case–control studies may seem less vulnerable to such measurement error, but this difference reflects the time frame rather than the design: case–control studies often evaluate recent exposures and outcomes, whereas cross-sectional cohort studies often evaluate more distant exposures and outcomes. As mentioned above, any cross-sectional cohort study can be restricted to recent exposures and outcomes simply by left-truncating the period under evaluation. In addition, the above discussion assumes that all members of the study cohort are evaluated, whereas in practice one usually attempts to evaluate only a random sample of the study cohort. If this sampling is not random, the study will also be vulnerable to selection bias. Technically, “selection bias” could refer more broadly to any distortion in sampling from the study base that biases estimates of measures of effect, including distortions caused by nonignorable exiting, but here we use the term more narrowly to describe bias arising from distortions when sampling from the study cohort.
The problems of measurement error and selection bias, however, are not unique to the cross-sectional cohort study. They affect all studies in which random sampling is attempted. These problems are particularly acute when exposure and outcome are assessed retrospectively (especially in situations where historical records are not available and where investigators must rely on self-reports), as is the case in many retrospective cohort studies and most case–control studies. Because these problems are inherent to retrospective studies as a whole,5,6 rather than to the cross-sectional cohort design in particular, we do not discuss them further here.
The cross-sectional cohort study is not restricted to incident cases (cases with new onset at the time of evaluation) but instead captures all cases with onset during the period of risk under study. Below, we refer to this set of cases as “prevalent cases.” Note that we define “prevalent” here to include not only cases active at the time of evaluation, but also cases that developed earlier in the period of risk and that have remitted by the time of evaluation. These broadly defined “prevalent” cases might alternatively be termed “cumulative incident cases.”
The challenges of using prevalent cases have been noted previously5,6; in particular, 2 considerations become important in order for the cross-sectional cohort study (or any retrospective study of prevalent cases) to meet the requirements for validity outlined in the 2 sections above. First, ascertainment bias—an effect of nonignorable exiting—may arise due to length-biased sampling. A common example of this bias is the situation where high mortality among cases accelerates exiting from the population. Of course in theory, even with length-biased sampling, the cross-sectional cohort study can still estimate the hazard ratio if the required assumptions are met—but the greater the degree of length-biased sampling, the more these assumptions are strained. Second, it may be difficult to date accurately the onset of exposures and outcomes that have occurred in the past—increasing the likelihood of measurement error.
For these 2 reasons, neither the cross-sectional cohort study nor any other retrospective design is well suited for, say, a rapidly fatal medical condition such as glioblastoma, or a condition for which date of onset is difficult to establish, such as hypertension. However, the cross-sectional cohort study is well suited for chronic or episodic conditions such as psychiatric disorders, in which (1) incident cases are uncommon, because individuals often come to treatment belatedly or not at all; (2) assessment of prevalent cases is feasible, because exposures and outcomes are typically assessed retrospectively with acceptable accuracy; and (3) length-biased sampling is minimized, because the disorders are rarely fatal or otherwise likely to accelerate exiting, and because past occurrence of disorders now in remission can be detected.
RELATION TO OTHER DESIGNS
The cross-sectional cohort design has much in common with the more familiar prospective cohort, retrospective cohort, and case–control designs. All can be considered variants of the cohort study approach: even the case–control study represents sampling from an underlying cohort in which data are missing by design.13 The cross-sectional cohort study can be seen as a generalization of the retrospective cohort study. In the latter design, the primary study base is a historical cohort with a roster,1,14 whereas the former uses a primary study base (our “conceptual cohort”) for which the roster (the list of members of the study cohort) is incomplete. Stated differently, the cross-sectional cohort study is a retrospective cohort study with all exposure and outcome information collected retrospectively, and with an unknown amount of loss to follow-up. When considered along with its extensions to case–control sampling, the cross-sectional cohort study can also be viewed as a case–control study using prevalent cases (within a primary study base) that are available at the end of follow-up, with the sampling fraction set at 100% for cases and controls when sampling the entire cohort; set at less than 100% for cases and controls when using random sampling; and set higher for cases than controls when performing traditional case–control sampling. A similar relationship exists between a prospective or retrospective cohort study and a case–control study performed within the same primary study base. Indeed, the cross-sectional cohort and other cohort studies differ technically from the case–control study only in that the case–control study does not require a primary study base but also can be performed with a secondary study base, the latter being defined implicitly as the hypothesized population that gives rise to the cases.1,14–17
The “cross-sectional cohort study” represents a useful option for assessing the association of exposures with outcomes in chronic or episodic conditions with low mortality, such as psychiatric disorders. With this design, an investigator samples a source population cross-sectionally, and then retrospectively assesses subjects’ histories of exposures and outcomes over a specified period of time. Because the cross-sectional cohort study examines prevalent cases, it is best suited for situations in which length-biased sampling is unlikely, and in which exposure and outcome can be accurately assessed retrospectively.
1. Miettinen OS. Theoretical Epidemiology: Principles of Occurrence Research in Medicine. Albany, NY: Delmar; 1985.
2. Andrade L, Eaton WW, Chilcoat HD. Lifetime co-morbidity of panic attacks and major depression in a population-based study: age of onset. Psychol Med
3. Kessler RC, Borges G, Walters EE. Prevalence of and risk factors for lifetime suicide attempts in the National Comorbidity Study. Arch Gen Psychiatry
4. Whitlock G, Norton R, Clark T, et al. Motor vehicle driver injury and socioeconomic status: a cohort study with prospective and retrospective driver injuries. J Epidemiol Community Health
5. Rothman KJ, Greenland S. Modern Epidemiology. 2nd ed. Philadelphia: Lippincott-Raven; 1998.
6. Szklo M, Nieto FJ. Epidemiology: Beyond the Basics. Gaithersburg, MD: Aspen Publishers; 2000.
7. Kalbfleisch JD, Prentice RL. The Statistical Analysis of Failure Time Data. 2nd ed. New York: John Wiley; 2002.
8. Korn EL, Graubard BI, Midthune D. Time-to-event analysis of longitudinal follow-up of a survey: choice of the time-scale. Am J Epidemiol
9. Prentice RL. A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika
10. First MB, Spitzer RL, Gibbon M, Williams JBW. Structured Clinical Interview for DSM-IV Axis I Disorders Non-patient Edition. New York, NY: Biometrics Research Department, New York State Psychiatric Institute; 1996.
11. Ward MF, Wender PH, Reimherr FW. The Wender Utah rating scale: an aid in the retrospective diagnosis of childhood attention deficit hyperactivity disorder. Am J Psychiatry
12. Fossati A, Di Ceglie A, Acquarini E, et al. The retrospective assessment of childhood attention deficit hyperactivity disorder in adults: reliability and validity of the Italian version of the Wender Utah Rating Scale. Compr Psychiatry
13. Wacholder S. The case-control study as data missing by design: estimating risk differences. Epidemiology
14. Wacholder S. Design issues in case-control studies. Stat Methods Med Res
15. Wacholder S, McLaughlin JK, Silverman DR, et al. Selection of controls in case-control studies. I. Principles. Am J Epidemiol
16. Wacholder S, Silverman DR, McLaughlin JK, et al. Selection of controls in case-control studies. II. Types of controls. Am J Epidemiol
17. Wacholder S, Silverman DR, McLaughlin JK, et al. Selection of controls in case-control studies. III. Design options. Am J Epidemiol