Cohort studies are often analyzed by comparing the number of observed deaths from a particular cause with the number expected in the general population. The expected number of deaths is calculated using mortality rates from a reference population. It is not always easy to find an appropriate reference population, because of the difficulty of adjusting for deprivation score or for the healthy-worker effect, but we shall assume that one exists. The reference population is divided into risk-homogeneous strata (typically 5-year age groups, sex, and possibly calendar year), and the number of person-years of follow-up in each stratum is calculated. The “expected” number of deaths is then the sum, over strata, of the person-years multiplied by the reference mortality rate for that stratum.
Although this approach is standard (see, for example, Breslow and Day¹), it is liable to overestimate the number of cancer deaths when individuals are cancer-free at the time they enter the study and the median follow-up is relatively short (under 10 years). This issue is particularly relevant in cancer screening and prevention programs, where it is standard to exclude individuals who already have symptomatic cancer. Screening aimed at the early detection of cancer will generally be evaluated by its effect on cancer mortality, and even cancer prevention studies may use long-term follow-up to examine death from cancer if there are concerns that the intervention may succeed in preventing only the less aggressive cancers. In this research note, we describe how to combine cancer incidence and survival data to obtain a more realistic estimate of the expected number of deaths.
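As a minimal sketch of the standard calculation (written in Python, which the article itself does not use; the strata, person-years, and rates below are hypothetical, not reference data), the expected count is simply a person-year-weighted sum of stratum-specific reference mortality rates:

```python
from typing import Dict, Tuple

# A stratum is identified here by (sex, 5-year age band); calendar period
# could be added as a third element.
Stratum = Tuple[str, str]


def expected_deaths_standard(
    person_years: Dict[Stratum, float],
    mortality_rate: Dict[Stratum, float],
) -> float:
    """Standard method: sum over strata of person-years x reference death rate."""
    return sum(py * mortality_rate[s] for s, py in person_years.items())


# Hypothetical cohort person-years and reference cause-specific death rates
# (deaths per person-year); illustrative numbers only.
person_years = {("F", "50-54"): 40_000.0, ("F", "55-59"): 25_000.0}
mortality_rate = {("F", "50-54"): 0.0003, ("F", "55-59"): 0.0005}

print(expected_deaths_standard(person_years, mortality_rate))
```

Note that nothing in this calculation knows whether the cohort members were cancer-free at entry; that is the weakness the two-step method addresses.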
Example 1: Ovarian Screening Trial
It has been suggested that regular screening by ultrasonography could reduce mortality from ovarian cancer through early detection and more successful treatment. It is also possible that the removal of benign cysts destined to become malignant would favorably affect mortality. Between 1981 and 1987, 5479 self-referred women without symptoms participated in an ultrasonographic screening trial for early ovarian cancer. The vital status (and, where applicable, cause of death) as of June 1999 was traced in 5135 women.² Expected numbers of deaths from various causes were calculated by the standard life-table method. The observed number of deaths from all causes was lower than expected because of the “healthy volunteer effect.” The authors used proportional mortality ratios to study ovarian cancer mortality. Such an analysis fails to take account of the fact that all women were free of symptoms of ovarian cancer at enrollment.
Example 2: Colonoscopic Surveillance in a Cancer-Family Clinic
Colonoscopic surveillance and polypectomy have been proposed as methods of reducing colorectal cancer incidence and mortality in those at high risk for the disease. Such colonoscopic screening is now standard practice in cancer family clinics, but its effectiveness in reducing the mortality from colorectal cancer in those with a moderate family history has not been demonstrated. Using results from other studies, the relative risk of individuals in the clinics can be estimated based on their age and the extent of their family history. Combining these relative risks with age- and sex-specific population rates, one can estimate the expected number of cancers, in the absence of screening, in this cohort. However, the same approach would overestimate the expected number of deaths because the individuals were all free of symptomatic cancer when first screened.
To estimate the number of cancer deaths during the follow-up of a cohort initially free of cancer, we need to consider two stages: (1) cancer incidence and (2) death from cancer. The time to death from cancer can then be treated as the sum of the times required for each of these two steps, and its distribution as the convolution of the distributions of the two component times. Let $\lambda(u)$ denote the cancer incidence rate at age $u$ and $F(v)$ the probability of dying from cancer within $v$ years of diagnosis. Then the probability of dying from cancer for an individual who enters the cohort at age $a$ and is followed for $t$ years is
$$P(a, t) = \int_0^t \lambda(a + u)\, F(t - u)\, du.$$
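For intuition, the convolution integral can be evaluated numerically. The Python sketch below uses the midpoint rule; the incidence and survival functions are hypothetical shapes chosen only for illustration, not SEER estimates. Like the formula, the sketch treats the individual as remaining at risk throughout follow-up, which is a reasonable approximation when incidence is low.

```python
import math
from typing import Callable


def prob_cancer_death(
    a: float,
    t: float,
    incidence: Callable[[float], float],
    cum_death: Callable[[float], float],
    n_steps: int = 1000,
) -> float:
    """Approximate P(a, t) = integral_0^t lambda(a + u) * F(t - u) du (midpoint rule).

    incidence(age): cancer incidence rate per person-year at that age
    cum_death(v):   probability of dying of the cancer within v years of diagnosis
    """
    h = t / n_steps
    total = 0.0
    for k in range(n_steps):
        u = (k + 0.5) * h  # time since entry at the midpoint of step k
        # Diagnosis at time u leaves t - u years in which death can be observed.
        total += incidence(a + u) * cum_death(t - u)
    return total * h


# Hypothetical inputs, for illustration only:
def incidence(age: float) -> float:
    # piecewise-constant rate by 5-year age band
    bands = {50: 0.0020, 55: 0.0025, 60: 0.0030}
    return bands.get(5 * int(age // 5), 0.0030)


def cum_death(v: float) -> float:
    # 30% of cases eventually die of the cancer, approached exponentially
    return 0.30 * (1.0 - math.exp(-v / 4.0))


print(prob_cancer_death(52.0, 8.0, incidence, cum_death))
```

Because $F(t-u)$ is small when a cancer is diagnosed late in follow-up, $P(a,t)$ is well below what the incidence rate alone would suggest over short follow-up.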
In practice, we approximate $\lambda$ by a piecewise-constant function, and it is necessary to do the same for the distribution $F$. Define the $i$th interval of postdiagnosis follow-up as $(t_{i-1}, t_i)$, and let $F_i$ be the average probability of dying from cancer during postdiagnosis follow-up lasting at least $t_{i-1}$ and no more than $t_i$ years, approximated by $[F(t_{i-1}) + F(t_i)]/2$.
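Further, let $\lambda_j$ be the cancer incidence rate in stratum $j$, and $y_{ij}$ the number of person-years at risk in stratum $j$ and follow-up interval $i$ (that is, accrued while the remaining follow-up, and hence the longest possible postdiagnosis follow-up, lies in interval $i$). Then the expected number of deaths from cancer is
$$E = \sum_i \sum_j y_{ij}\, \lambda_j\, F_i.$$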
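Once the person-years have been tabulated, the discrete sum is straightforward. The Python sketch below is not the author's Stata code, and all inputs are hypothetical. It assumes $y_{ij}$ is already available, with interval $i$ taken as the interval containing the follow-up remaining when the person-year is accrued, and it builds each $F_i$ from the cumulative curve by the midpoint approximation given above.

```python
from typing import List, Sequence


def interval_death_probs(cum_death_at: Sequence[float]) -> List[float]:
    """F_i ~ [F(t_{i-1}) + F(t_i)] / 2, given F evaluated at t_0 < t_1 < ... < t_I."""
    return [
        (cum_death_at[i - 1] + cum_death_at[i]) / 2
        for i in range(1, len(cum_death_at))
    ]


def expected_deaths_two_step(
    y: Sequence[Sequence[float]],  # y[i][j]: person-years, interval i, stratum j
    lam: Sequence[float],          # lam[j]: incidence rate in stratum j
    F: Sequence[float],            # F[i]: probability of cancer death, interval i
) -> float:
    """Two-step method: sum over i and j of y_ij * lambda_j * F_i."""
    return sum(
        y[i][j] * lam[j] * F[i]
        for i in range(len(F))
        for j in range(len(lam))
    )


# Hypothetical inputs (not SEER data): interval boundaries t = 0, 1, 2, 3 years,
# cumulative probability F(t) of dying of the cancer at those boundaries,
# two incidence strata, and person-years split by remaining follow-up.
F_i = interval_death_probs([0.0, 0.04, 0.10, 0.15])
lam = [0.002, 0.0025]
y = [[1000.0, 500.0], [2000.0, 800.0], [1500.0, 1200.0]]

print(expected_deaths_two_step(y, lam, F_i))
```

Keeping the $F_i$ calculation separate from the sum makes it easy to swap in survival estimates from a different source.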
Just as the incidence rate may depend on sex and year of diagnosis as well as age, so also the (cause-specific) survival function may depend on age, sex and year of diagnosis. In principle, allowance for such factors does not complicate the analysis; one simply uses $F_{ij}$ instead of $F_i$.
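A sketch of the stratified variant follows, again in Python with hypothetical numbers. The only change from the unstratified sum is that the death probability is looked up by both follow-up interval and stratum; when every row of $F_{ij}$ is constant across strata, it reduces to $\sum_i \sum_j y_{ij}\lambda_j F_i$.

```python
from typing import Sequence


def expected_deaths_two_step_stratified(
    y: Sequence[Sequence[float]],  # y[i][j]: person-years, interval i, stratum j
    lam: Sequence[float],          # lam[j]: incidence rate in stratum j
    F: Sequence[Sequence[float]],  # F[i][j]: death probability, interval i, stratum j
) -> float:
    """Sum over i and j of y_ij * lambda_j * F_ij (survival varies by stratum)."""
    return sum(
        y[i][j] * lam[j] * F[i][j]
        for i in range(len(y))
        for j in range(len(lam))
    )


# Hypothetical inputs: three follow-up intervals and two strata (e.g. two
# age groups), with case fatality differing between the strata.
lam = [0.002, 0.0025]
y = [[1000.0, 500.0], [2000.0, 800.0], [1500.0, 1200.0]]
F_ij = [[0.02, 0.03], [0.07, 0.09], [0.125, 0.15]]

print(expected_deaths_two_step_stratified(y, lam, F_ij))
```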
The Stata³ code for carrying out such a calculation is provided with the electronic version of this article at www.epidem.com.
Data were simulated for the following example, and expected numbers of deaths were calculated using SEER rates.⁴ Women aged 50–54 years at entry were followed until 31 December 1995 or death (whichever came first). Cohort A included a pilot phase and a main study. During the pilot phase, 3000 women were recruited at a uniform rate between 1 January 1987 and 31 December 1989. A further 24,000 women were recruited at a uniform rate between 1 January 1990 and 31 December 1995. Cohort B comprised 27,000 women, recruited in 1987. Loss to follow-up (including death) occurred uniformly throughout the study, and (for simplicity) 3% of each cohort was thus censored.
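The design implies very different amounts of follow-up in the two cohorts. The Python sketch below computes approximate person-years under simplifying assumptions that are ours, not the article's: the 3% losses and all deaths are ignored, dates are written as decimal years (so 31 December 1995 becomes 1996.0), and cohort B is assumed to have been recruited evenly over 1987.

```python
from typing import List, Tuple

# Follow-up ends on 31 December 1995, written here as decimal year 1996.0.
STUDY_END = 1996.0

# (number recruited, recruitment start, recruitment end) as decimal years
Phase = Tuple[int, float, float]


def mean_follow_up(start: float, end: float, study_end: float = STUDY_END) -> float:
    """Mean years of follow-up when entry is uniform on [start, end]."""
    return study_end - (start + end) / 2


def cohort_person_years(phases: List[Phase]) -> Tuple[float, float]:
    """Total person-years and mean follow-up per woman, ignoring losses."""
    n_total = sum(n for n, _, _ in phases)
    py_total = sum(n * mean_follow_up(start, end) for n, start, end in phases)
    return py_total, py_total / n_total


cohort_a = [(3000, 1987.0, 1990.0), (24000, 1990.0, 1996.0)]  # pilot + main study
cohort_b = [(27000, 1987.0, 1988.0)]  # assumed spread evenly over 1987

for name, phases in [("A", cohort_a), ("B", cohort_b)]:
    py, mean_fu = cohort_person_years(phases)
    print(f"Cohort {name}: {py:,.0f} person-years, mean follow-up {mean_fu:.1f} years")
```

Under these assumptions cohort A accrues about 94,500 person-years (a mean of 3.5 years per woman), against about 229,500 (8.5 years) for cohort B.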
The expected numbers of breast, ovarian and stomach cancer deaths were calculated by both the traditional and the proposed methods and are presented in the table. The traditional calculations used SEER mortality rates, in 5-year age bands, from 1987 to 1995. The proposed two-step method used SEER cancer-incidence data (also in 5-year age bands) and survival data (by single year since diagnosis, for women aged 50–64 at diagnosis).
In general, the two methods differ most when the average follow-up is shorter (cohort A) and for cancers (such as breast cancer) in which patients continue to have elevated mortality rates even several years after diagnosis. The new method estimates 16.4 breast cancer deaths in cohort A, compared with 53.4 using the standard method based on mortality rates. Even for cohort B, the standard method yields about twice as many expected breast cancer deaths as the two-step method. By contrast, the two methods give almost identical results for stomach cancer deaths in cohort A. Interestingly, when applied to cohort B (those recruited in 1987 and followed until the end of 1995), the two-step method yields 11.9 expected stomach cancer deaths compared with 10.5 using the traditional method. A possible explanation is that the additional 1.4 deaths represent women who die after a diagnosis of stomach cancer but whose cause of death is not recorded as stomach cancer.
The method of expected number of deaths was first used in the 18th century.⁵ Even at that time there was evidence of what became known as the healthy-worker effect. The method proposed here does not adjust for the healthy-worker effect, in that no allowance is made for individuals who participate in studies being generally healthier than the reference population. Rather, we make explicit allowance for the fact that dying from cancer is a two-stage process: people need first to get cancer before they can die from it.
The situation in which a cohort of cancer-free individuals is followed is common in cancer prevention and cancer-screening studies. When there is no control cohort, the method proposed here can be used to produce a reference number of expected deaths. Comparisons in such nonrandomized epidemiologic studies need to be undertaken with care but are often of interest in the absence of randomized evidence. The method introduced here can also be used when planning randomized studies to project the expected number of deaths at various stages throughout the study. As illustrated here, the effect of naively using mortality rates, instead of adjusting incidence rates for survival, can be substantial. In our simulated example, the traditional method would have overestimated the expected number of deaths from breast cancer by more than two-fold.
I would like to thank Joanna Adams for help with the simulations.
1. Breslow NE, Day NE. Statistical Methods in Cancer Research. Volume II: The Design and Analysis of Cohort Studies. Lyon, France: IARC Scientific Publications, 1987; 1–406.
2. Crayford TJB, Campbell S, Bourne TH, Rawson HJ, Collins WP. Benign ovarian cysts and ovarian cancer: a cohort study with implications for screening. Lancet 2000; 355: 1060–1063.
3. StataCorp. Stata Statistical Software. Release 7.0. College Station, TX: Stata Corporation, 2001.
5. Keiding N. The method of expected number of deaths, 1786–1886–1986. Int Stat Rev 1987; 55: 1–20.