Epidemiology, March 2011, Volume 22, Issue 2
doi: 10.1097/EDE.0b013e3182087650

Sample-size Formula for Case-cohort Studies

Kubota, Kiyoshi; Wakana, Akira


Author Information

Department of Pharmacoepidemiology, Faculty of Medicine, University of Tokyo, Bunkyo-ku, Tokyo, kubotape-tky@umin.ac.jp

Supplemental digital content is available through direct URL citations in the HTML and PDF versions of this article (www.epidem.com).


To the Editor:

The case-cohort design is an efficient alternative to the full cohort design. Compared with a case-control study nested within the cohort, the case-cohort design offers flexibility for a series of exploratory analyses, because a single subcohort can be used to analyze multiple outcomes.1,2 This feature is particularly important in some types of research, including pharmacoepidemiology. For example, the design can be used to evaluate the association between a single drug and multiple adverse events, where the association with some of the events is often unknown or poorly understood at the start of the study. Nevertheless, the case-cohort design has not been widely used. One reason may be the scarcity of information needed to plan individual studies, including sample-size calculation. Cai and Zeng3,4 recently presented a method for power and sample-size calculation as a natural generalization of the log-rank test in the full cohort design. Here we present a simple sample-size formula for the case-cohort design that can be interpreted as a straightforward extension of the conventional sample-size formula for the cohort study.

Let $N^{\text{full}}$ denote the sample size needed for the cohort study, and let $N_1^{\text{full}}$ ($N_0^{\text{full}}$) be the size of the exposed (unexposed) population in the full cohort, so that $N^{\text{full}} = (1 + K)N_1^{\text{full}}$, where $K = N_0^{\text{full}}/N_1^{\text{full}}$. Let $RR$ be the relative risk, that is, the ratio of the risk (incidence proportion) in the exposed ($P_1$) to that in the unexposed ($P_0$), so that $RR = P_1/P_0$, and let $P_D$ be the common estimate of the incidence proportion under the null hypothesis, defined as $P_D = (N_1^{\text{full}}P_1 + N_0^{\text{full}}P_0)/N^{\text{full}} = P_0(RR + K)/(1 + K)$. The conventional sample-size formula for the cohort study gives

$$N_1^{\text{full}} = \frac{\left(z_{\alpha/2}\sqrt{A} + z_{\beta}\sqrt{B}\right)^2}{C^2},$$

where $z_c$ is the $(1 - c)$th quantile of the standard normal distribution, $A = (1 + 1/K)P_D(1 - P_D)$, $B = RR \cdot P_0(1 - RR \cdot P_0) + P_0(1 - P_0)/K$, and $C = P_0(RR - 1)$. Using $m$, the ratio of the subcohort size to the number of cases in the entire cohort, the total size of the case-cohort study, $N$, is simply formulated as

$$N = \left(1 + \frac{1}{m}\right) N^{\text{full}}.$$

Note that $m$ is chosen by the researcher when planning the study.
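The calculation can be sketched in Python as follows. This is a minimal sketch, not the authors' code: the function names `n_full` and `n_case_cohort` are hypothetical, the test is assumed two-sided (quantile at 1 − α/2, which is what reproduces the worked example in this letter), and the rounding convention (round $N^{\text{full}}$ up to an integer first, then round $N$ up) is our assumption. With $(P_0, RR, K, \alpha, \beta) = (0.001, 4, 3, 0.05, 0.2)$ and $N = (1 + 1/m)N^{\text{full}}$, it reproduces the values of $N$ reported for $m = 1$ and $m = 5$.

```python
import math
from statistics import NormalDist


def n_full(p0: float, rr: float, k: float, alpha: float = 0.05, beta: float = 0.2) -> int:
    """Conventional cohort sample size N_full = (1 + K) * N1_full (rounded up)."""
    z = NormalDist().inv_cdf
    z_a = z(1 - alpha / 2)  # assumed two-sided test: z_{alpha/2}
    z_b = z(1 - beta)       # power 1 - beta: z_beta
    p_d = p0 * (rr + k) / (1 + k)  # pooled incidence proportion under H0
    a = (1 + 1 / k) * p_d * (1 - p_d)
    b = rr * p0 * (1 - rr * p0) + p0 * (1 - p0) / k
    c = p0 * (rr - 1)
    n1 = (z_a * math.sqrt(a) + z_b * math.sqrt(b)) ** 2 / c ** 2  # exposed group size
    return math.ceil((1 + k) * n1)  # assumption: round up only at this step


def n_case_cohort(nfull: int, m: float) -> int:
    """Entire cohort size of the case-cohort study, N = (1 + 1/m) * N_full."""
    return math.ceil((1 + 1 / m) * nfull)


nf = n_full(0.001, 4, 3, alpha=0.05, beta=0.2)
print(nf)                    # 9986
print(n_case_cohort(nf, 1))  # 19972
print(n_case_cohort(nf, 5))  # 11984
```

Rounding $N_1^{\text{full}}$ up before multiplying by $(1 + K)$ would instead give 9988 and change $N$ by only a few subjects; the convention above is the one that matches the example.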


In a simulation study using a time-to-event model,5 the proposed sample size yielded satisfactory empirical power and empirical type I error rates. Let $n_{\text{detail}}$ denote the number of subjects for whom detailed covariate information is collected (ie, subcohort members and/or cases). For a single event, $n_{\text{detail}}$ is smallest when $m = 1$; for multiple events, however, it is smallest at a value of $m$ larger than 1. In general, a larger $m$ brings the size of the entire cohort, $N$, closer to $N^{\text{full}}$ but increases $n_{\text{detail}}$. To balance $N$ against $n_{\text{detail}}$, $m = 3$ to $5$ may be adopted in many situations. For example, when $(P_0, RR, K, \alpha, \beta) = (0.001, 4, 3, 0.05, 0.2)$, $(N, n_{\text{detail}}) = (19{,}972, 70)$ for $m = 1$ and $(11{,}984, 126)$ for $m = 5$. In practice, if measuring some or all of the covariates is costly, $n_{\text{detail}}$ may be minimized by adjusting $m$ within the available resources. Details of the derivation of the formula and of the simulation are available in the eAppendix (http://links.lww.com/EDE/A449).
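The trade-off between $N$ and $n_{\text{detail}}$ can be checked numerically. The sketch below uses hypothetical helpers, not code from the eAppendix. `expected_n_detail` assumes the subcohort is a simple random sample whose size is $m$ times the expected number of cases, so a fraction $P_D$ of subcohort members are themselves cases and are counted once; under that assumption it reproduces $n_{\text{detail}} = 70$ and $126$ for the two examples. `relative_n_detail` ignores that overlap and uses $N = (1 + 1/m)N^{\text{full}}$, which gives $n_{\text{detail}} \approx (m + 1)^2/m \cdot N^{\text{full}}P_D$, a factor that is smallest at $m = 1$, in line with the single-event statement.

```python
import math


def pooled_incidence(p0: float, rr: float, k: float) -> float:
    """P_D = P0 * (RR + K) / (1 + K), the incidence proportion under H0."""
    return p0 * (rr + k) / (1 + k)


def expected_n_detail(n_total: int, m: float, p_d: float) -> int:
    """Expected number of subjects needing detailed covariate data.

    Assumption (not from the letter): the subcohort is a simple random
    sample of size m * (expected cases), so a fraction p_d of its
    members are cases, and those subjects are counted only once.
    """
    cases = n_total * p_d      # expected cases in the entire cohort
    subcohort = m * cases      # m = subcohort size / number of cases
    overlap = subcohort * p_d  # subcohort members who are also cases
    return math.ceil(cases + subcohort - overlap)


def relative_n_detail(m: float) -> float:
    """n_detail / (N_full * P_D) when N = (1 + 1/m) * N_full, ignoring overlap."""
    return (1 + m) * (1 + 1 / m)  # equals (m + 1)**2 / m


pd_example = pooled_incidence(0.001, 4, 3)
for n_total, m in [(19972, 1), (11984, 5)]:
    print(m, n_total, expected_n_detail(n_total, m, pd_example))
# prints "1 19972 70" and "5 11984 126"

# For a single event the factor grows with m beyond m = 1.
for m in range(1, 7):
    print(m, round(relative_n_detail(m), 2))
```

For multiple events the subcohort is shared across outcomes, so the per-event reasoning above no longer applies, which is why the letter suggests $m$ larger than 1 in that setting.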

Kiyoshi Kubota

Akira Wakana

Department of Pharmacoepidemiology

Faculty of Medicine

University of Tokyo

Bunkyo-ku, Tokyo




1. Kupper LL, McMichael AJ, Spirtas R. A hybrid epidemiologic study design useful in estimating relative risk. J Am Stat Assoc. 1975;70:524–528.

2. Langholz B, Thomas DC. Nested case-control and case-cohort sampling from a cohort: a critical comparison. Am J Epidemiol. 1990;131:169–176.

3. Cai J, Zeng D. Sample size/power calculation for case-cohort studies. Biometrics. 2004;60:1015–1024.

4. Cai J, Zeng D. Power calculation for case-cohort studies with nonrare events. Biometrics. 2007;63:1288–1295.

5. Self SG, Prentice RL. Asymptotic distribution theory and efficiency results for case-cohort studies. Ann Stat. 1988;16:64–81.


© 2011 Lippincott Williams & Wilkins, Inc.
