Geographic location is one of the strongest predictors of breast cancer incidence, with as much as a 10-fold difference among regions of the world.1 Within the United States, there are substantial regional variations in breast cancer rates, with the highest rates occurring in the western and northeastern states.2,3 Such regional variations have provoked public concern and generated hypotheses for factors possibly implicated in the disease's etiology. The consistently high rates of breast cancer incidence seen in the San Francisco Bay Area over the course of the past several decades,4,5 including the striking increases in incidence among women living in Marin County,6 have recently increased public concern.7
Geographic variations in breast cancer incidence, although well-recognized, are not well-understood. Many of the earlier studies designed to examine regional differences in breast cancer rates were mortality studies,2,3,8,9 which were unable to disentangle etiologic from prognostic factors. Furthermore, most of these studies have been ecologic in design. Because data on reproductive factors (such as age at menarche or age at first full-term pregnancy) are not usually available at the population level, ecologic studies cannot examine those risk factors that could be the most important for breast cancer.
Two studies designed to evaluate the impact of traditional risk factors on regional differences in breast cancer incidence and mortality between California and other areas of the United States reported that established risk factors did not fully explain the elevated California rates.3,10 Conversely, 2 other studies, more specifically focused on elevated breast cancer rates in the San Francisco Bay Area, reported that known breast cancer risk factors could account for the breast cancer excess in this area.5,11
Our study examined the regional variations in breast cancer incidence rates among women participating in the California Teachers Study cohort, a large, well-monitored cohort of current and retired professional school employees residing throughout California. A preliminary assessment of breast cancer incidence within this cohort revealed geographic incidence patterns remarkably similar to those reported in the statewide population. Because this cohort was established specifically to study breast cancer, we were able to account for many personal risk factors measured at the individual level in a way that has not been feasible in most previous studies of regional variations in breast cancer.
The California Teachers Study cohort is an ongoing prospective study of female California professional school employees. It was established from the 133,479 respondents to a 1995 mailing, which was sent to all 329,000 active and retired female enrollees in the California State Teachers Retirement System. The cohort is followed annually for cancer incidence through linkage to California's statewide cancer registry; members receive a mailed follow-up questionnaire every 2 years to update risk factor information and collect data on factors of emerging interest. The cohort represents a broad age range (21–108 years at baseline), although the majority of participants were over age 50 (58%) when they entered the cohort. Most cohort members resided in California in 1995 (93%) and are geographically distributed throughout the state. The cohort consists primarily of non-Hispanic white women (87%). A full description of the California Teachers Study (CTS) cohort is available elsewhere.12 Although the incidence of invasive breast cancer in the CTS cohort is approximately 50% higher than in the statewide population,12 the geographic patterns of rate differences mirror those observed in the statewide population.4
A cover letter accompanying the first questionnaire, which described the study and its expectations of participants, was sent to all members of the California State Teachers Retirement System; return of the questionnaire constituted consent for enrollment into the cohort. Use of human subjects data in this study was reviewed and approved by the California Health and Human Services Agency, Committee for the Protection of Human Subjects.
Geocoding and Definitions of Region
The baseline residential addresses of California Teachers Study cohort members who resided in California at study entry (n = 123,925) were geocoded to a census block group. The California Cancer Registry sent 2 batches of addresses to Geographic Data Technology (Lebanon, NH)13 for geocoding. The first batch, sent in late 1997, was originally geocoded to the North American Datum 1927 (NAD27) coordinate system; these records were later transformed to match all subsequent geocoding, which was done in the North American Datum 1983 (NAD83) coordinate system. The second batch was sent to Geographic Data Technology in early 2000. The remaining ungeocoded addresses were geocoded by our own Geographic Information System (GIS) specialists at the California Department of Health Services. Many of these remaining addresses were post office boxes, which we traced (through linkages to the Department of Motor Vehicles, Experian,14 and so on) to obtain actual residential addresses. Addresses were cleaned and validated using ZP4 software15 and geocoded using ArcView16 GIS software and street databases from Geographic Data Technology,13 the U.S. Census Bureau (TIGER2000),17 and Navigational Technology, Inc. (Chicago, IL).18 The California Cancer Registry also provided us with a sample (N = 805) of addresses that had been previously geocoded by their geocoding vendors, allowing us to evaluate the consistency of their geocoding with ours. Consistency between the 2 coding approaches was good, with block group assignment agreement for approximately 97% of the sample.
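The consistency check described above amounts to a simple percent agreement between two sets of block group assignments. The Python sketch below illustrates that calculation under stated assumptions: each address is keyed by a hypothetical record ID, and each geocoding result is stored as a 12-digit census block group GEOID (2-digit state, 3-digit county, 6-digit tract, and 1-digit block group codes). The function name and data layout are illustrative, not the registry's or the authors' actual procedure.

```python
def block_group_agreement(ours: dict[str, str], vendor: dict[str, str]) -> float:
    """Share of addresses assigned to the same block group GEOID by both approaches.

    Keys are hypothetical record IDs; values are 12-digit block group GEOIDs.
    Only addresses geocoded by both approaches are compared.
    """
    shared = ours.keys() & vendor.keys()
    if not shared:
        raise ValueError("no addresses were geocoded by both approaches")
    matches = sum(1 for record_id in shared if ours[record_id] == vendor[record_id])
    return matches / len(shared)


# Synthetic example: 2 of 3 addresses land in the same block group.
ours = {"a": "060750101001", "b": "060750101002", "c": "060014001001"}
vendor = {"a": "060750101001", "b": "060750101003", "c": "060014001001"}
print(block_group_agreement(ours, vendor))
```

Applied to a validation sample like the 805 registry addresses, a value near 0.97 would correspond to the agreement reported above.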
Geocoding was performed blinded to case status and, ultimately, was 98% complete. We assigned cohort members to one of 3 California regions based on the county of their block group of residence: the San Francisco Bay Area (Alameda, Contra Costa, Marin, San Francisco, San Mateo, and Santa Clara counties), the Southern Coastal area (Orange, Los Angeles, and San Diego counties), and the rest of California (the state's remaining 49 counties). We defined these regions a priori based on historical statewide registry data reporting higher rates in the urban counties of the San Francisco Bay and Southern Coastal areas.4
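The a priori region definition is a lookup from county of residence to one of 3 regions. The Python sketch below assumes that the county name has already been derived from each member's geocoded block group. The constant and function names are illustrative.

```python
# County lists taken from the region definitions in the text.
BAY_AREA = {"Alameda", "Contra Costa", "Marin", "San Francisco", "San Mateo", "Santa Clara"}
SOUTHERN_COASTAL = {"Orange", "Los Angeles", "San Diego"}


def assign_region(county: str) -> str:
    """Map a California county name to one of the 3 analysis regions."""
    if county in BAY_AREA:
        return "San Francisco Bay Area"
    if county in SOUTHERN_COASTAL:
        return "Southern Coastal"
    # Every other county (49 of California's 58) falls into the reference region.
    return "Rest of California"


print(assign_region("Marin"), assign_region("Fresno"))
```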
The California Teachers Study cohort is linked annually with the California Cancer Registry, a legally mandated, statewide, population-based cancer reporting system.19 Modeled after the National Cancer Institute's Surveillance, Epidemiology and End Results (SEER) program, the California Cancer Registry maintains the highest standards for data quality and completeness; its data are estimated to be 99% complete and include case-sharing from neighboring states.20
In addition, we used mortality files (the California state mortality file, the nationwide Social Security Administration Death Master File, and the National Death Index mortality file), as well as reports from relatives, to ascertain the dates and causes of death for cohort members. Cohort members’ address changes were obtained through annual mailings and responses from participants. We defined cases for our analysis as any case of invasive breast cancer occurring prospectively (ie, after the date that each woman completed her baseline questionnaire) through December 31, 1999. We excluded from the study women who were diagnosed with breast cancer before completing their baseline questionnaire (n = 6131), leaving 115,611 women included in our analysis, among whom 1562 developed newly diagnosed invasive breast cancer during the follow-up period.
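The exclusion and case definitions above can be expressed as a small classification rule. The Python sketch below is illustrative only: the field names (`prior_breast_cancer`, `invasive_diagnosis`) are hypothetical, and treating a diagnosis on the baseline date itself as prior disease is an assumption that the text does not address.

```python
from datetime import date
from typing import Optional

END_OF_FOLLOW_UP = date(1999, 12, 31)


def case_status(baseline: date,
                prior_breast_cancer: bool,
                invasive_diagnosis: Optional[date]) -> str:
    """Classify a cohort member as 'excluded', 'case', or 'noncase'."""
    # Breast cancer diagnosed before (or, by assumption, on) the baseline
    # questionnaire date removes the woman from the analytic cohort.
    if prior_breast_cancer or (invasive_diagnosis is not None
                               and invasive_diagnosis <= baseline):
        return "excluded"
    # Prospective invasive cases: diagnosed after baseline, through the end of 1999.
    if invasive_diagnosis is not None and invasive_diagnosis <= END_OF_FOLLOW_UP:
        return "case"
    return "noncase"


print(case_status(date(1996, 1, 1), False, date(1998, 5, 1)))
```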
Calculation of Follow-up
Person-months at risk were based on the first 4 years of follow-up. For women who remained in California for the entire follow-up period, we calculated person-months at risk as the number of months between the date each woman joined the cohort (ie, the date she completed her baseline questionnaire) and the earliest of 3 dates: her breast cancer diagnosis date, the date of her death, or December 31, 1999. Women who moved out of California during the follow-up period, without developing breast cancer before leaving, were presumed to have lived in California for one half of the time that had elapsed between their entry into the cohort and the date associated with the first non-California address; we therefore assigned these women person-months up to the midpoint of that period.
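The follow-up rule above can be sketched as follows. This is a minimal Python illustration under two assumptions not stated in the text: a month is counted as an average-length month (365.25/12 days), and for movers the midpoint is simply added to the list of candidate exit dates, so that an earlier diagnosis or death still ends follow-up first. Function and argument names are hypothetical.

```python
from datetime import date
from typing import Optional

END_OF_FOLLOW_UP = date(1999, 12, 31)
DAYS_PER_MONTH = 365.25 / 12  # assumption: average-length months


def person_months(entry: date,
                  diagnosis: Optional[date] = None,
                  death: Optional[date] = None,
                  first_non_ca_address: Optional[date] = None) -> float:
    """Person-months at risk from cohort entry to the earliest exit date."""
    exits = [END_OF_FOLLOW_UP]
    if diagnosis is not None:
        exits.append(diagnosis)
    if death is not None:
        exits.append(death)
    if first_non_ca_address is not None:
        # Movers are credited with half the time between entry and the date of
        # their first non-California address (any fractional day is dropped).
        exits.append(entry + (first_non_ca_address - entry) / 2)
    exit_date = min(exits)
    return max(0.0, (exit_date - entry).days / DAYS_PER_MONTH)


print(round(person_months(date(1996, 1, 1)), 1))
```

For a woman entering on January 1, 1996, with no diagnosis, death, or move, this gives just under 48 months, consistent with the 4-year follow-up window.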
Assessment of Breast Cancer Risk Factors
Personal Risk Factors
From the baseline questionnaire, we collected information on the following personal breast cancer risk factors: age, race/ethnicity, family breast cancer history, age at menarche, pregnancy history, lifetime duration of breast feeding, physical activity, menopausal status, body mass index (BMI; kg/m²), alcohol consumption, and hormone therapy. To account for the differing associations between BMI and breast cancer risk in pre- and postmenopausal women, we included 6 BMI-by-menopausal-status interaction terms in our model, using pre-/perimenopausal women with a BMI of <25.8 kg/m² as the referent group.
Ecologically Based Risk Factors
No individual-level data on socioeconomic status (SES) were collected for the cohort. We did, however, link the residential street addresses of cohort members to U.S. census data21 to derive an estimate of neighborhood (census block group) SES. We created a summary SES metric incorporating occupation, education, and income. To do this, we first ranked all block groups in California by education level (percentage of adults over age 25 completing a college degree or higher), income (median family income), and occupation (percentage of adults employed in managerial/professional occupations) according to quartiles based on the statewide adult population. This resulted in a score of 1 to 4 for each of these SES attributes. We then created a summary SES metric by summing the scores across the 3 attributes and categorizing the summed scores into 4 groups based on the quartiles of this score for the statewide population.
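The SES scoring procedure can be sketched in Python as follows. The source does not specify how population-weighted quartile boundaries or ties were handled; as an assumption, this sketch places each group of tied block groups by the midpoint of its share of the statewide adult population. The function names and input layout are hypothetical.

```python
def weighted_quartiles(values, populations):
    """Score each block group 1-4 by population-weighted quartile of `values`."""
    total = float(sum(populations))
    order = sorted(range(len(values)), key=lambda i: values[i])
    scores = [0] * len(values)
    cumulative = 0.0
    start = 0
    while start < len(order):
        # Group tied values so identical block groups receive the same score.
        end = start
        group_pop = 0.0
        while end < len(order) and values[order[end]] == values[order[start]]:
            group_pop += populations[order[end]]
            end += 1
        # Assumption: place the tied group by the midpoint of its population share.
        midpoint_share = (cumulative + group_pop / 2) / total
        score = min(4, int(midpoint_share * 4) + 1)
        for k in range(start, end):
            scores[order[k]] = score
        cumulative += group_pop
        start = end
    return scores


def summary_ses(pct_college, median_income, pct_managerial, adult_pop):
    """Sum the 3 attribute scores (range 3-12), then re-score the sum by quartile."""
    education = weighted_quartiles(pct_college, adult_pop)
    income = weighted_quartiles(median_income, adult_pop)
    occupation = weighted_quartiles(pct_managerial, adult_pop)
    summed = [e + i + o for e, i, o in zip(education, income, occupation)]
    return weighted_quartiles(summed, adult_pop)


print(summary_ses([10, 20, 30, 40], [30000, 40000, 50000, 60000], [5, 10, 15, 20], [100] * 4))
```

Because the quartiles are weighted by adult population, roughly one fourth of the statewide population falls into each summary category, even though the number of block groups per category can differ.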
To define the degree of urbanization, we used a combination of census-based information. The U.S. Census Bureau defines an urbanized area as a densely settled area with a population of 50,000 or more people and a population density of at least 1000 people per square mile.17 Because, by this definition, 85% of California residents live in an urban area, we used additional information to refine the urbanization measure. Categorization was based primarily on population and then refined using population density. Originally, the urbanization measure had 5 categories: “metropolitan urban” comprised block groups in the highest quartile of population density within U.S. Census-defined Urbanized Areas (ie, population >1,000,000); “metropolitan suburban” included the rest of the population within Census-defined Urbanized Areas; “city” included U.S. Census-defined Places with more than 50,000 people outside of an Urbanized Area; “town” included U.S. Census-defined Places with fewer than 50,000 people outside of an Urbanized Area and not in the lowest quartile of population density; and “rural” included U.S. Census-defined Places with fewer than 50,000 people outside of an Urbanized Area and in the lowest quartile of population density, as well as unpopulated areas. Because of the small number of cases in the less populated areas, we collapsed the “town” and “rural” categories into a single “small town/rural” category.
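The urbanization categories can be read as a sequence of decision rules. The Python sketch below is a simplified illustration under stated assumptions. The `BlockGroup` fields are hypothetical and precomputed. `density_quartile` is taken as given because the reference population for the density quartiles is not fully specified. The parenthetical population qualifier for "metropolitan urban" is not modeled. Block groups outside any Place and outside an Urbanized Area are treated as rural.

```python
from dataclasses import dataclass


@dataclass
class BlockGroup:
    in_urbanized_area: bool  # inside a Census-defined Urbanized Area
    place_population: int    # population of the Census-defined Place; 0 if none (assumption)
    density_quartile: int    # 1 (lowest) to 4 (highest), precomputed (assumption)


def urbanization(bg: BlockGroup) -> str:
    """Assign one of the 5 original urbanization categories."""
    if bg.in_urbanized_area:
        return "metropolitan urban" if bg.density_quartile == 4 else "metropolitan suburban"
    if bg.place_population > 50_000:
        return "city"
    if bg.place_population > 0 and bg.density_quartile > 1:
        return "town"
    # Low-density small Places, unpopulated areas, and (by assumption) other non-Place areas.
    return "rural"


def collapsed_urbanization(bg: BlockGroup) -> str:
    """Collapse 'town' and 'rural' into the 'small town/rural' analysis category."""
    category = urbanization(bg)
    return "small town/rural" if category in ("town", "rural") else category


print(collapsed_urbanization(BlockGroup(False, 20_000, 2)))
```

The metropolitan urban, metropolitan suburban, city, and small town/rural categories produced here correspond to the urban, suburban, city, and small town/rural levels used in the analyses.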
We limited our statistical analyses to those cohort members who were living in California at the time they completed their baseline questionnaire, who were successfully geocoded, and who had no personal history of breast cancer at study entry (n = 115,611). Because the U.S. Census Bureau suppresses data on block groups with very small population counts, SES and urbanization data were not available for all block groups; our analyses that included these census-based ecologic risk factors were limited to the 114,927 respondents for whom this information was available. We evaluated the regional distribution of the personal and ecologically based risk factors among cohort members by examining the frequency distributions. We used Cox proportional hazards models to estimate hazard rate ratios (HRs) and 95% confidence intervals (CIs) associated with region. Initial models were adjusted for age and race only. Subsequent models were further adjusted for the personal risk factors of interest (family breast cancer history, age at menarche, pregnancy history, lifetime duration of breast feeding, physical activity, BMI, menopausal status, BMI/menopausal status interaction, alcohol consumption, and hormone therapy use), as well as neighborhood factors (SES and urbanization).
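Cox proportional hazards models are fitted by maximizing the partial likelihood. As a minimal sketch only, the Python code below fits an unadjusted model with a single binary covariate (for example, a hypothetical indicator coded 1 for the San Francisco Bay Area and 0 for the rest of California) by Newton-Raphson. It uses the Breslow approximation for tied event times and reports the HR with a Wald 95% CI. The adjusted models described above include many covariates and would in practice be fitted with standard statistical software. The function names here are illustrative and are not the authors' code.

```python
import math


def _score_info(times, events, x, beta):
    """Score and observed information of the Cox partial likelihood (Breslow ties)."""
    score = info = 0.0
    for i, t_i in enumerate(times):
        if not events[i]:
            continue
        # Risk set: everyone still under observation at t_i, including tied events.
        risk = [(x[j], math.exp(beta * x[j])) for j, t_j in enumerate(times) if t_j >= t_i]
        s0 = sum(w for _, w in risk)
        s1 = sum(xj * w for xj, w in risk)
        s2 = sum(xj * xj * w for xj, w in risk)
        mean = s1 / s0
        score += x[i] - mean
        info += s2 / s0 - mean * mean
    return score, info


def cox_single_covariate(times, events, x, max_iter=50, tol=1e-10):
    """Return (hazard ratio, lower 95% limit, upper 95% limit) for one covariate."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = _score_info(times, events, x, beta)
        step = score / info  # Newton-Raphson update on the log hazard ratio
        beta += step
        if abs(step) < tol:
            break
    _, info = _score_info(times, events, x, beta)
    se = 1.0 / math.sqrt(info)
    return math.exp(beta), math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)


# Toy data: follow-up times, event indicators (1 = invasive breast cancer), region indicator.
print(cox_single_covariate([1, 2, 3, 4], [1, 1, 1, 1], [1, 0, 1, 0]))
```

In this toy example, members with the indicator set to 1 fail earlier, so the estimated HR exceeds 1. With real cohort data, the time scale and the adjustment covariates would follow the study's modeling choices.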
The San Francisco Bay Area had the smallest share of the cohort (18%), followed by the Southern Coastal area (40%) and the rest of California (43%). Table 1 shows the distribution of personal risk factors among members residing in these 3 regions. Notable regional differences were evident for a number of factors potentially relevant to breast cancer risk; the exceptions were age at menarche and family breast cancer history.
Table 2 shows the distribution of ecologic risk factors by region. As expected, SES and degree of urbanization varied dramatically among the 3 regions. The proportion of women in the highest SES category was higher for those living in the San Francisco Bay Area (65%) than in the Southern Coastal area (55%) or the rest of California (28%). Cohort members in the San Francisco Bay and Southern Coastal areas were almost all living in urban or suburban block groups (96% each), compared with only approximately one fourth (24%) of those in the rest of California. Compared with the statewide population, cohort members were much more likely to be in the highest quartile of SES (46% vs. 25%) and to live in suburban areas (55% vs. 45%) (data not shown).
Table 3 shows the hazard rate ratios (HRs) for invasive breast cancer incidence associated with region. After adjusting for age and race, the San Francisco Bay (HR = 1.22; 95% CI = 1.06–1.40) and Southern Coastal (1.16; 1.04–1.30) areas had breast cancer incidence rates approximately 20% higher than the rest of California. Adjusting for personal risk factors and neighborhood SES did not substantially change these rate ratios, whereas adjusting for urbanization appeared to modestly increase the age- and race-adjusted HRs for the San Francisco Bay (HR = 1.40; 95% CI = 1.16–1.69) and Southern Coastal areas (1.32; 1.12–1.57).
In age- and race-adjusted models that excluded the region variable, both neighborhood SES and urbanization were associated with breast cancer rates (data not shown). Compared with those living in neighborhoods in the lowest SES quartile, cohort members in the highest quartile had substantially higher rates (HR = 1.39; 95% CI = 1.04–1.85). Adding region to the model reduced the point estimate and widened the confidence interval for SES (1.29; 0.96–1.74 for the highest vs. lowest quartile). Compared with small town/rural areas, urban (1.12; 0.91–1.38), suburban (1.16; 1.01–1.35), and city (1.16; 0.98–1.38) areas all showed modestly increased rates. After further adjusting for the region variable, however, the point estimates for urbanization dropped to below one for urban (0.86; 0.66–1.11) and suburban areas (0.93; 0.76–1.13), whereas the point estimate for city areas did not appreciably change (1.17; 0.99–1.40). Because adjustment for urbanization appeared to have the largest effect on the point estimates for region, in a somewhat counterintuitive direction, we further explored the relationship among urbanization, region, and breast cancer rates through stratified analyses (results not shown). These analyses seemed to suggest that the patterns of risk associated with urbanization could differ across regions. Formal tests for multiplicative interaction between urbanization and region, however, revealed no statistical evidence of interaction.
Finally, we evaluated model fit with a likelihood ratio test comparing the log likelihoods of models with and without the region variable. The model including age, race, personal risk factors, SES, urbanization, and region was compared with the same model without region. The difference in log likelihoods was statistically significant, indicating that region explains variation in breast cancer incidence beyond that explained by the remaining factors.
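The likelihood ratio test compares nested models through the statistic 2 × (log likelihood of the full model minus log likelihood of the reduced model), referred to a chi-square distribution. The degrees of freedom equal the number of added terms: 2 for a 3-level region variable coded as 2 indicators against the rest of California. The Python sketch below assumes this coding. Because the paper does not report the log-likelihood values, the inputs in the example are hypothetical. For even degrees of freedom, the chi-square tail probability has a closed form, so no statistics library is needed.

```python
import math


def likelihood_ratio_test(loglik_reduced: float, loglik_full: float, df: int):
    """Return (LR statistic, p-value) for nested models; even df only in this sketch."""
    if df <= 0 or df % 2:
        raise ValueError("this sketch handles only positive, even degrees of freedom")
    stat = 2.0 * (loglik_full - loglik_reduced)
    half = stat / 2.0
    # Chi-square survival function for df = 2m: exp(-x/2) * sum_{k<m} (x/2)^k / k!
    p_value = math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))
    return stat, p_value


# Hypothetical log likelihoods for models without and with the 2 region indicators.
print(likelihood_ratio_test(-100.0, -97.0, df=2))
```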
In our study of regional patterns in breast cancer incidence among members of a well-defined cohort of professional California women, we observed elevated breast cancer incidence rates among San Francisco Bay and Southern Coastal area residents that do not seem to be fully explained by regional differences in personal or neighborhood risk factors. The regional rate patterns seen in this cohort are consistent with recent International Agency for Research on Cancer reports of very high invasive breast cancer rates in both the San Francisco Bay and Los Angeles (Southern Coastal) areas compared with other regions of the world.1,22 The patterns and estimated magnitude of rate differences among regions are also consistent with a previous California Department of Health Services report showing approximately 15% to 20% higher rates in these 2 areas compared with other regions of California.4,23
It has been suggested that traditional risk factors both can5 and cannot3,10 explain much of the observed historical breast cancer incidence excesses in the western regions of the United States. Differences in breast cancer incidence between the San Francisco Bay Area and 7 other regions participating in the Cancer and Steroid Hormone Study were the subject of an analysis by Robbins et al.5 Using the risk factor distributions for control subjects in this 1980–1982 study of predominantly premenopausal women, they concluded that regional rate differences for all ages combined could be completely explained by differences in known risk factors. In contrast, analyzing regional differences in breast cancer occurrence among members of the Nurses' Health Study between 1976 and 1992, Laden et al.10 concluded that the higher incidence among postmenopausal cohort members residing in California at baseline was attenuated, but not eliminated, by adjustment for established risk factors.
In a population-based study evaluating reasons for the breast cancer excess in Marin County (a small county in the San Francisco Bay Area), Prehn et al.11 examined a subset of census block-group attributes for 25 California counties and concluded that adjustment for differences in SES, urbanization, and average number of children under age 15 per adult woman (calculated as an indicator of parity) explained the higher Marin County rate during the years 1988–1992. The results presented here are not directly comparable to those of the Prehn study because they are based on different populations and different time periods. The Prehn study included roughly one-third of all California block groups. Also, Marin County comprises only 4.4% of the San Francisco Bay Area population studied here. Furthermore, the Prehn study encompassed a period when Marin breast cancer excesses were less pronounced (1988–1992).6 In addition, Prehn's analysis was ecologic in nature and unable to account for individual risk factors in the areas of study. Unfortunately, inadequate sample size precluded us from looking specifically at rates among cohort members living in Marin County (22 cases among 1235 cohort members).
Results from a recent pooled analysis of 47 epidemiologic studies suggested that breast-feeding patterns in developed countries are a major contributor to the higher breast cancer rates seen there, as compared with rates in developing countries, where the prevalence and duration of breast feeding are greater.24 In our analyses, adjusting for breast-feeding patterns did not alter the observed regional effects. The prevalence of breast feeding in our cohort, however, is remarkably high for U.S. women (approximately 77% of parous women ever breast-fed, with a mean duration of 14 months)12 and might not offer sufficient geographic heterogeneity to detect associated risks. Likewise, mammography screening rates within the CTS cohort are universally high,12 do not vary by region, and are, thus, unlikely to explain the regional differences in rates presented here.
As a study of incidence, rather than mortality, our study more directly addresses factors influencing disease occurrence. In particular, because this cohort was originally designed to study breast cancer risk, it provides a wealth of information on critical risk factors. Detailed analyses of the major breast cancer risk factors are currently underway. Our preliminary models, however, suggest that the covariates considered here predict breast cancer risk in this cohort of women in the direction expected, based on the current literature. For the less well-studied factors of urbanization and SES, we saw risk estimates in the same direction as those published in the literature, although the SES-related risk estimates were less dramatic in the California Teachers Study cohort than those generally reported.
Because this is an occupational cohort, members tend to be more homogeneous than the general population with respect to SES. This is both a strength and a weakness of our study. It limits our ability to directly study the influence of SES on breast cancer incidence and brings into question the generalizability of our results to the more heterogeneous general population. On the other hand, it does offer a number of advantages in minimizing potential confounding from factors such as occupational exposures, differential access to health care, and other correlates of SES. It is noteworthy that our findings are remarkably similar to those of the Nurses' Health Study, which is also an occupational cohort study of professional women. Furthermore, although members of the cohort could be somewhat more homogeneous than the general population with respect to some breast cancer risk factors, they are geographically distributed much as the total population of adult California women and display similar patterns of differences in breast cancer incidence.
Interestingly, adjustment for urbanization appeared to have the strongest effect on the point estimates for region, whereas adjustment for SES seemed to make little difference. The ecologic metrics of SES and urbanization used in our analysis, however, are highly correlated with each other and with region. In measuring both of these attributes at the block group level, rather than the zip code or county level, as many previous studies have done, we have been able to characterize some heterogeneity of these factors within region. Nevertheless, it is difficult to disentangle the independent effects of these factors from the effect of region in our models. This is especially true for the urbanization variable, because over 95% of both the San Francisco Bay and Southern Coastal areas are either urban or suburban. The most striking finding from our study, however, is that personal risk factors, which are often cited as the potential explanation underpinning geographic distributions (including urban/rural gradients) in breast cancer incidence, did not appear to explain the observed rate differences seen in this cohort of women.
Similar to other studies examining geographic patterns in breast cancer incidence, our analysis is limited by our inability to incorporate information on residential mobility. Without this information, we cannot discount the possibility that the high breast cancer rates in the San Francisco Bay and Southern Coastal areas of California are the result of in-migration of “high-risk” women. If such in-migration is driving the observed geographic patterns, however, our analysis suggests that it is not the established risk factors that have put these women at higher risk, because adjustment for such factors did not explain the higher rates in these areas. Earlier lifetime exposure windows are probably more etiologically relevant; nonetheless, this study was initiated in response to concern generated by the geographic differences in published rates, which are based on address at diagnosis.
Published reports of high breast cancer rates in the San Francisco Bay and other urban areas have generated a great deal of public concern. These reports have been predicated on residence at the time of diagnosis for all ages combined, without regard for residential mobility or menopausal status. Because we saw very similar patterns of risk within the CTS cohort, our analysis was designed to assess the degree to which established personal risk factors, as well as ecologic factors, might explain these patterns in this cohort. This cohort of women might not be entirely representative of the statewide California population. Even so, the fact that the remaining excess of breast cancer in these 2 urban regions of the state is not easily explained by known risk factors underscores the importance of considering a broad spectrum of potential contributors to risk, including candidate environmental agents and markers of host vulnerability, along with traditional risk factors, in future efforts to understand the causes of this disease in women.
We express our appreciation to all of the participants in the California Teachers Study and to the analysts and staff who have contributed so much to the success of this research project. We also thank the following people for technical or administrative support: Gretchen Agha, Rachna Nivas, Theresa Saunders, Andrew Hertz, Robert Gunier, Susan Stewart, Jane Sullivan-Halley, Mark Allen, Frank Stasio, and Jan Schaefer.
1. Parkin DM, Whelan SL, Ferlay J, et al. Cancer Incidence in Five Continents, vol VII. Lyon, France: World Health Organization, International Agency for Research on Cancer; 1997.
2. Blot WJ, Fraumeni JF Jr, Stone BJ. Geographic patterns of breast cancer in the United States. J Natl Cancer Inst
3. Sturgeon SR, Schairer C, Gail M, et al. Geographic variation in mortality from breast cancer among white women in the United States. J Natl Cancer Inst
4. Perkins C, Morris C, Wright W, et al. Cancer Incidence and Mortality in California by Detailed Race/ethnicity, 1988–1992. Sacramento, CA: California Department of Health Services, Cancer Surveillance Section; 1995.
5. Robbins AS, Brescianini S, Kelsey JL. Regional differences in known risk factors and the higher incidence of breast cancer in San Francisco. J Natl Cancer Inst
6. Prehn A, Clarke C, Topol B, et al. Increase in breast cancer incidence in middle-aged women during the 1990s. Ann Epidemiol
7. Clarke CA, Glaser SL, West DW, et al. Breast cancer incidence and mortality trends in an affluent population: Marin County, California, USA, 1990–1999. Breast Cancer Res
8. Garland FC, Garland CF, Gorham ED, et al. Geographic variation in breast cancer mortality in the United States: a hypothesis involving exposure to solar radiation. Prev Med
9. Goodwin JS, Freeman JL, Freeman D, et al. Geographic variations in breast cancer mortality: do higher rates imply elevated incidence or poorer survival? Am J Public Health
10. Laden F, Spiegelman D, Neas LM, et al. Geographic variation in breast cancer incidence rates in a cohort of US women. J Natl Cancer Inst
11. Prehn AW, West DW. Evaluating local differences in breast cancer incidence rates: a census-based methodology (United States). Cancer Causes Control
12. Bernstein L, Allen M, Anton-Culver H, et al. High breast cancer incidence rates among California teachers: results from the California Teachers Study (United States). Cancer Causes Control
13. Geographic Data Technologies (GDT) home page. Available at: http://www.geographic.com/home/index.cfm. Accessed December 15, 2002.
14. Experian home page. Available at: http://www.experian.com/consumer/index.html. Accessed December 15, 2002.
15. ZP4 [computer program], version 46. Aptos, CA: Semaphore Corp; 2002.
16. ArcView, version 3.3. Redlands, CA: ESRI; 2000.
17. US Census Bureau. Census 2000 Urban and Rural Classification. April 30, 2002. Available at: http://www.census.gov/geo/www/ua/ua_2k.html. Accessed January 22, 2003.
18. Navigational Technologies (NavTech) home page. Available at: http://www.navtech.com/. Accessed December 15, 2002.
19. California Cancer Registry, Data Standards and Quality Control Unit. California Cancer Reporting System Standards: Abstracting and Coding Procedures for Hospitals, vol 1, 5th ed. Sacramento, CA: California Department of Health Services; 2000.
20. Kwong S, Perkins C, Morris C, et al. Cancer in California: 1988–1999. Sacramento, CA: California Department of Health Services, Cancer Surveillance Section; 2001.
21. Census of Population and Housing, 1990: Modified Age/Race, Sex and Hispanic Origin (MARS) State and County File [data file]. Washington, DC: US Bureau of the Census; 1992.
22. Parkin DM, Muir CS, Whelan SL, et al. Cancer Incidence in Five Continents, vol VI. Lyon, France: World Health Organization, International Agency for Research on Cancer; 1992.
23. Von Behren J, Reynolds P. An Examination of the Reported Excess Incidence of Breast Cancer in the Bay Area. California Association of Cancer Registries Annual Conference, San Diego, CA; April 1998.
24. Collaborative Group on Hormonal Factors in Breast Cancer. Breast cancer and breastfeeding: collaborative reanalysis of individual data from 47 epidemiological studies in 30 countries, including 50302 women with breast cancer and 96973 women without the disease. Lancet
25. Sizer F, Whitney E. Nutrition Concepts and Controversies. Belmont, CA: West and Wadsworth; 1997.