Methods: Commentary

Sample-size Efficiencies via the Multiple-Primary-Cancer Study: Why Fewer Cases May Not Mean Less Work

Berrington de González, Amy

doi: 10.1097/EDE.0b013e3181d74c08

Kuligina et al [1] suggest that substantial sample size reductions could be achieved in genome-wide discovery studies via a design in which individuals with multiple primary cancers in the same organ are compared with disease-free controls. The proposal is illustrated with results from several case-control studies of contralateral breast cancer. These examples provide preliminary empirical evidence in support of the assumption underlying the approach: that the relative risk in this design is approximately the square of the relative risk found in the traditional case-control design (single primary cases vs. disease-free controls).
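To make the size of the potential saving concrete, the following is a minimal sketch of a standard two-proportion sample-size calculation for an unmatched case-control study with equal numbers of cases and controls. It is not the calculation used by Kuligina et al; the function names, the exposure prevalence of 10%, the odds ratio of 1.5, and the default significance level and power are all illustrative assumptions, and the relative risk is approximated by the odds ratio.

```python
from math import ceil, sqrt
from statistics import NormalDist


def exposure_prev_in_cases(p0, odds_ratio):
    """Exposure prevalence among cases implied by the control prevalence p0
    and the odds ratio (rare-disease approximation to the relative risk)."""
    return odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))


def cases_needed(p0, odds_ratio, alpha=0.05, power=0.80):
    """Approximate number of cases (with an equal number of controls) needed
    to detect the given odds ratio with a two-sided test of two proportions."""
    if odds_ratio == 1:
        raise ValueError("no effect to detect when the odds ratio is 1")
    p1 = exposure_prev_in_cases(p0, odds_ratio)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p0 + p1) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p0 * (1 - p0) + p1 * (1 - p1))
    ) ** 2
    return ceil(numerator / (p1 - p0) ** 2)


# Hypothetical inputs: 10% of controls carry the variant, odds ratio 1.5
# in the traditional design (single primary cases vs. disease-free controls).
p0, rr_single = 0.10, 1.5

n_traditional = cases_needed(p0, rr_single)
# Assumption under discussion: the multiple-primary design sees roughly the
# square of the traditional relative risk.
n_multiple_primary = cases_needed(p0, rr_single ** 2)

print("traditional design:", n_traditional, "cases")
print("multiple-primary design:", n_multiple_primary, "cases")
```

A genome-wide study would use a far stricter significance level, which can be passed through the `alpha` argument; the comparison between the two designs is the point of the sketch, not the absolute numbers.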

It has been more than a decade since the publication of Begg's original article on this topic, which highlighted the sample-size advantages of a related approach in which cases with multiple primary cancers are compared with controls with a single primary [2]. However, relatively few epidemiologic studies have taken advantage of this approach to sample size reduction. Will this new publication, which proposes even greater efficiencies, convert more epidemiologists to the use of multiple primary cancers?

There are 2 reasons why the utility of this approach may still be somewhat limited. First, for many cancer sites it may be difficult to obtain the required number of cases with multiple cancers in the same organ. Second, there is considerable variation in the rules for registering multiple primary cancers among cancer registries and also within registries over time, and so careful review of the cases will be required. Kuligina et al [1] briefly mention both of these limitations in their discussion. Some additional details about these practical difficulties may be useful for researchers considering this study design, to illustrate why the smaller sample size required may not necessarily mean less work.

With improved cancer survival, multiple primary cancers now make up about 15% of annual cancer diagnoses in the United States [3]. Nevertheless, on a site-by-site basis they are mostly still rare, especially if the outcome of interest is 2 independent cancers in the same organ. The Table shows the number of multiple primary cancers in the same organ (eg, colon cancers after a first colon cancer) reported to the 17 US Surveillance, Epidemiology, and End Results (SEER) registries [4]. These registries are an important resource for multiple primary cancer studies because they employ sophisticated coding rules for these events and cover a large population (about 25% of the US population). For 13 of the 20 major cancers shown, the number of multiple cancer diagnoses in 2006 was less than 100.

TABLE. Number of Multiple Primary Cancers in the Same Organ Reported to the SEER 17 Registries [4] According to Diagnosis Period of the Second Cancer

The annual number of multiple primary registrations may be small because the initial cancer is rare, such as for laryngeal cancer (n multiple primary diagnoses in 2006 = 9) or testicular cancer (n = 26), or because the cancer has poor survival, as with pancreatic (n = 1), liver (n = 2), esophageal (n = 6), and brain cancer (n = 36). Some common cancers with good survival also rarely result in multiple primaries because the cancer treatment often involves removing all or most of the organ, such as for endometrial cancer (n = 16). The paucity of these cases would render even a small incident case-control study difficult. A prevalent case-control study is a possible alternative for cancers with good survival, although (as Kuligina et al note) this requires the additional assumption that the gene or risk factor of interest is not related to survival.

One of the key assumptions underlying the proposed study design is that the cancers are of different clonal origin and not a recurrence, metastasis, or multifocal occurrence of the initial cancer. Cancer registries have developed coding rules to try to maximize the probability of correct classification. Unfortunately, there is considerable variation in multiple primary coding rules among cancer registries, and rules within registries have also changed over time [5]. For example, the International Agency for Research on Cancer (IARC) recommends that registries report only one cancer per organ (or pair of organs) per lifetime, unless the 2 cancers have different histologies [6]. However, in the SEER program, since 1979, cancers in the same organ with the same histology have been reported as multiple primaries if more than 2 months separate the diagnosis dates [7]. In practice the requirement of different histology is not necessarily clear-cut, and so histology groupings have to be defined for this purpose [6]. In addition, there are a number of site-specific exceptions to these rules. Prostate and bladder cancers are frequently multifocal; even in SEER these are not reported as new primaries if the histology is the same [7]. SEER also has special rules for colon cancer (not used in most other registries) in which each segment of the colon is considered a separate organ. For the oral cavity and pharynx, IARC has developed coding rules that define groups of sites that should be considered as a single organ [6]. These nuances could make it difficult or time-consuming to combine cases from different registries, or to include cases diagnosed over a long period in a prevalent case-control study.
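The practical consequence of these differences can be illustrated with a deliberately simplified sketch that encodes only the rules summarized in the paragraph above. It is not the actual IARC or SEER coding logic: the class and function names are hypothetical, "more than 2 months" is approximated as more than 60 days, histology groupings are taken as given, and site-specific definitions of an organ (such as colon segments in SEER or grouped oral cavity sites in IARC) are assumed to be handled when the `organ` label is assigned.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Diagnosis:
    organ: str            # organ label as defined by the registry (assumption)
    histology_group: str  # predefined histology grouping (assumption)
    diagnosed: date


# Sites described in the text as frequently multifocal.
MULTIFOCAL_SITES = {"prostate", "bladder"}
TWO_MONTHS_IN_DAYS = 60  # crude stand-in for "more than 2 months"


def counted_iarc_like(first, second):
    """Simplified IARC-style rule: one cancer per organ per lifetime
    unless the histologies differ."""
    if first.organ != second.organ:
        return True
    return first.histology_group != second.histology_group


def counted_seer_like(first, second):
    """Simplified SEER-style rule (post-1979): same organ and same histology
    count as a new primary if diagnosed more than 2 months apart, except
    for prostate and bladder."""
    if first.organ != second.organ:
        return True
    if first.histology_group != second.histology_group:
        return True
    if first.organ in MULTIFOCAL_SITES:
        return False
    return (second.diagnosed - first.diagnosed).days > TWO_MONTHS_IN_DAYS


# Hypothetical patient: two same-histology cancers in one organ, a year apart.
first = Diagnosis("stomach", "adenocarcinoma", date(2005, 3, 1))
second = Diagnosis("stomach", "adenocarcinoma", date(2006, 3, 1))

print("IARC-like rule counts a second primary:", counted_iarc_like(first, second))
print("SEER-like rule counts a second primary:", counted_seer_like(first, second))
```

Under these assumptions the same pair of diagnoses is a multiple primary in one registry and a single cancer in the other, which is why pooled case series need case-by-case review.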

Subtle assumptions underpin the design advantages of these studies, such as the requirement that the multiple events are “independent.” Although Kuligina et al [1] provide some preliminary empirical evidence that supports one of the underlying assumptions, the field would benefit from a more detailed theoretical evaluation of the assumptions from both the biologic and statistical perspectives. The application of the proposed study designs to noncancer diseases is also an interesting question that warrants further investigation. The issue of independence of events may be even more challenging to define and to determine for many noncancer diseases. The proposed sample-size efficiencies should also apply to cohort studies, although this may depend on how the individuals who only develop a single cancer are treated in the analysis. The practical difficulties described above, however, would also apply.

Two of the key studies in this field (the WECARE study of contralateral breast cancer [8,9] and the GEM study of multiple melanomas [10,11]) have produced interesting findings for a number of genes and other risk factors for these diseases. These cancer sites are probably 2 of the best candidates for future studies. For other cancers, the savings in sample size have to be weighed carefully against the additional work and assumptions that may be required.


AMY BERRINGTON DE GONZÁLEZ is an Investigator in the Division of Cancer Epidemiology and Genetics at the National Cancer Institute and Adjunct Faculty at the Johns Hopkins Bloomberg School of Public Health. Her research interests include methodologic issues in cancer epidemiology, radiation risk assessment, and second cancers. She is currently carrying out a study of risk factors for contralateral breast cancer.


1. Kuligina E, Reiner A, Imyanitov EN, Begg CB. Evaluating cancer epidemiologic risk factors using multiple primary malignancies. Epidemiology. 2010;21:366–372.
2. Begg CB, Berwick M. A note on the estimation of relative risks of rare genetic susceptibility markers. Cancer Epidemiol Biomarkers Prev. 1997;6:99–103.
3. Ries L, Melbert D, Krapcho M, et al. SEER cancer statistics review, 1975–2004. Bethesda, MD: National Cancer Institute; 2007.
4. Surveillance Research Program, National Cancer Institute SEER*Stat software version 6.5.2. Available at:
5. Filali K, Hedelin G, Schaffer P, et al. Multiple primary cancers and estimation of incidence trends. Eur J Cancer. 1996;32A:683–690.
6. International Agency for Research on Cancer. International Rules for Multiple Primary Cancers (ICD-O Third edition). Lyon: IARC; 2004.
7. Surveillance Epidemiology and End Results Program (SEER). Multiple primary and histology coding rules. Bethesda, MD: National Cancer Institute; 2007.
8. Begg CB, Haile RW, Borg A, et al. Variation of breast cancer risk among BRCA1/2 carriers. JAMA. 2008;299:194–201.
9. Bernstein JL, Teraoka S, Haile RW, et al; WECARE Study Collaborative Group. Designing and implementing quality control for multi-center screening of mutations in the ATM gene among women with breast cancer. Hum Mutat. 2003;21:542–550.
10. Begg CB, Hummer AJ, Mujumdar U, et al; GEM Study Group. A design for cancer case-control studies using only incident cases: experience with the GEM study of melanoma. Int J Epidemiol. 2006;35:756–764.
11. Berwick M, Orlow I, Hummer AJ, et al; GEM Study Group. The prevalence of CDKN2A germ-line mutations and relative risk for cutaneous malignant melanoma: an international population-based study. Cancer Epidemiol Biomarkers Prev. 2006;15:1520–1525.
© 2010 Lippincott Williams & Wilkins, Inc.