
Spatial Epidemiology: Some Pitfalls and Opportunities

Ryan, Louise

doi: 10.1097/EDE.0b013e318198a5fb
Spatial Epidemiology: Commentary

From the Harvard School of Public Health, Boston, MA.

Correspondence: Louise Ryan, Harvard School of Public Health, Bldg 2, Rm 423, Biostatistics, 655 Huntington Ave, Boston MA 02115. E-mail:

The paper by Tassone et al in this issue of the journal1 makes an important contribution to the health disparities literature, while showcasing the value of sound statistical methods and especially the Bayesian approach. A powerful tool for studying health disparities, spatial epidemiology has been growing in popularity due to the increasing ease and accessibility of GIS software, statistical developments such as the CAR model and ready availability of data that characterize population health and demographic characteristics in small geographic areas such as counties and census tracts.

Caution is needed, though, because the relative ease of fitting spatial models masks a number of subtle challenges that can easily lead to inappropriate conclusions. Fortunately, the authors’ careful analysis avoids many classic pitfalls. For example, by using race-specific data on stroke deaths, the authors avoid the potential for ecologic bias that might be associated with an analysis, say, that modeled county-specific stroke deaths as a function of a county-specific covariate such as percentage black.2 I particularly like the way that the authors convert their results to county-specific age-standardized rates for blacks and whites, and then compute absolute disparity measures in addition to the more typical relative risks.

It is fascinating to compare the maps of relative and absolute disparity, because they present surprisingly different visual impressions. The variation in both relative and absolute risks underscores how much would have been missed had the authors fitted a simpler and more traditional model, for example assuming constant relative risks. The authors discuss the differences between the relative and absolute maps to some extent, although the reader (or at least this one!) is left with a sense of disquiet. It most certainly makes sense that the spatial variation in absolute disparity is driven largely by spatial variation in stroke rates for the 2 racial groups. Indeed, the map of absolute disparity looks remarkably similar to the maps of race-specific stroke rates. The maps of relative disparity, especially the one that does not include extra covariates, are remarkably, almost unbelievably smooth! I cannot help but wonder if this smoothness was somehow induced by the MultiCAR model. The authors report a high correlation between the race-specific random effects in the MultiCAR model. I would like to see what the relative disparity map looks like under the analysis that simply fits separate models for each racial group. I also wonder whether the correlation parameter itself might vary spatially.
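To see why the two kinds of map can look so different, it may help to work through a toy calculation. The sketch below uses made-up age-standardized rates (per 100,000) for 2 hypothetical counties; the function name and all numbers are illustrative assumptions, not values from the paper under discussion.

```python
def disparity(rate_black, rate_white):
    """Return relative (ratio) and absolute (difference) disparity for one county."""
    return {
        "relative": rate_black / rate_white,   # rate ratio
        "absolute": rate_black - rate_white,   # rate difference, same units as the rates
    }

# Hypothetical age-standardized stroke death rates per 100,000 (illustrative only).
county_a = disparity(rate_black=60.0, rate_white=30.0)  # high-rate county
county_b = disparity(rate_black=20.0, rate_white=10.0)  # low-rate county

for name, d in [("A", county_a), ("B", county_b)]:
    print(f"County {name}: relative = {d['relative']:.1f}, absolute = {d['absolute']:.1f}")
```

Both counties have the same relative disparity, yet the absolute disparity in county A is 3 times that in county B, simply because the underlying rates are higher there. A map of absolute disparity would therefore tend to track the map of rates, while a map of relative disparity need not.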

Aside from the observations already made, I have 2 main comments. The first pertains to a caution regarding the potential for model misspecification, whereas the second relates to a potentially lost opportunity associated with the use of age standardization. My own experiences with spatial data analysis suggest that estimated covariate effects can vary quite dramatically with varying modeling assumptions. For example, some colleagues and I used postcode-level emergency-room data in New South Wales, Australia, to explore the impact of social disadvantage on rates of hospitalization for ischemic heart disease.3 Social disadvantage was measured using the Socio-Economic Indexes for Areas (SEIFA) index,4 which has a mean of about 1000 and ranges between 800 and 1200, with low values denoting areas of high disadvantage.

Table 1 of our paper3 shows the estimated coefficient of the SEIFA index, its associated standard error, and the DIC value under 4 different models (Tassone et al also use DIC for model comparison). All 4 models adjust for age and sex effects. Model A naively ignores the possibility of any within-postcode correlations, model B allows for a postcode-specific random effect but ignores spatial correlation, model C assumes a classic CAR model, and model D combines models B and C. All 4 models suggest a substantial decrease in the rate of ischemic heart disease hospitalizations with increasing values of SEIFA, but the magnitude varies considerably. A similar phenomenon can be seen in Breslow and Clayton's analysis of the classic Scottish lip cancer data.5 I believe that there is opportunity and need for interesting methodologic work exploring the impact of model misspecification in the spatial analysis setting.
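For readers who prefer notation, the 4 models can be sketched as special cases of a single log-linear form. The Poisson likelihood, the intrinsic form of the CAR prior, and all symbols below are assumptions made for illustration; they are not copied from the cited paper.

```latex
% Illustrative notation (assumed): y_i = count in postcode i, E_i = expected count,
% x_i = age and sex terms, n_i = number of neighbors of i, j ~ i means j neighbors i.
\begin{align*}
y_i &\sim \mathrm{Poisson}(E_i \lambda_i), \\
\log \lambda_i &= \mathbf{x}_i^{\top}\boldsymbol{\beta} + \gamma\,\mathrm{SEIFA}_i + u_i + v_i, \\[4pt]
\text{Model A:}\quad & u_i = 0,\; v_i = 0, \\
\text{Model B:}\quad & u_i \sim N(0, \sigma_u^2)\ \text{independently},\; v_i = 0, \\
\text{Model C:}\quad & u_i = 0,\; v_i \mid v_{j \neq i} \sim N\!\Big(\tfrac{1}{n_i}\textstyle\sum_{j \sim i} v_j,\ \sigma_v^2 / n_i\Big), \\
\text{Model D:}\quad & u_i \text{ as in B and } v_i \text{ as in C.}
\end{align*}
```

Under this sketch, the coefficient of interest is \gamma, and the 4 models differ only in what they assume about the residual terms u_i and v_i.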

The observed sensitivity to the assumed correlation structure for the Australian data underscores the importance of careful sensitivity analysis in general. Although the authors present some such analyses, there are a number of additional issues that could be explored, for example, adding an area-specific random effect (a “nugget” in traditional spatial statistics parlance). It would also be worth exploring the impact of various alternative approaches to constructing the so-called neighborhood matrix needed for the CAR and MultiCAR analyses. CAR models were originally developed in the context of image analysis in which data were organized in a regular grid of pixels and where it is straightforward to decide whether or not 2 pixels are neighbors (see Fig. 1). In the epidemiologic setting where regions correspond to counties or census tracts, determining an appropriate neighborhood structure can be challenging. Figure 2 displays census tracts in a region of Southeastern Massachusetts. The irregular shapes of the tracts sometimes make it difficult to decide whether to consider 2 regions as neighbors. For instance, although regions 150 and 135 are clearly neighbors (because they share fairly large proportions of each other's boundaries), it is less obvious whether to consider tracts 150 and 131 as neighbors. They touch each other's boundaries, but the region of intersection is very small. Earnest et al6 explore the impact of various ways of determining neighborhood structure, 1 considering 2 regions to be neighbors if their boundaries touch, and another considering neighbors to be those regions whose centroids are within a specified distance of each other. These authors found that neighborhood structure could have considerable impact on modeling results. 
It would be of interest to hear more about how neighborhood structure was defined in the authors’ analysis, and also to know whether they conducted any sensitivity analysis to assess the impact of neighborhood structure.
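The 2 neighborhood definitions mentioned above (shared boundary versus centroid distance) can be compared on a toy example. The sketch below uses a regular 3 × 3 pixel grid, where both definitions are easy to compute; the function names and the distance threshold are hypothetical choices for illustration, and real census tracts would require polygon geometry from GIS software.

```python
import numpy as np

def grid_edge_adjacency(n_rows, n_cols):
    """Binary neighborhood matrix for a pixel grid: neighbors share an edge."""
    n = n_rows * n_cols
    W = np.zeros((n, n), dtype=int)
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            if c + 1 < n_cols:               # pixel to the right
                W[i, i + 1] = W[i + 1, i] = 1
            if r + 1 < n_rows:               # pixel below
                W[i, i + n_cols] = W[i + n_cols, i] = 1
    return W

def centroid_distance_adjacency(centroids, max_dist):
    """Binary neighborhood matrix: neighbors have centroids within max_dist."""
    pts = np.asarray(centroids, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    W = (dist <= max_dist).astype(int)
    np.fill_diagonal(W, 0)                   # a region is not its own neighbor
    return W

# 3 x 3 grid with unit spacing; centroids listed in the same row-major order.
centroids = [(r, c) for r in range(3) for c in range(3)]
W_edge = grid_edge_adjacency(3, 3)
W_dist = centroid_distance_adjacency(centroids, max_dist=1.5)  # threshold is an assumption

center = 4  # index of the middle pixel
print("edge-sharing neighbors of center:", W_edge[center].sum())
print("distance-based neighbors of center:", W_dist[center].sum())
```

Even on a regular grid the 2 definitions disagree: with this threshold the distance rule also counts diagonal pixels, so the middle pixel has twice as many neighbors. With irregular tracts the discrepancy can be less predictable, which is one reason a sensitivity analysis seems worthwhile.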





My last comment pertains to what might be a lost opportunity associated with the use of age standardization. Age standardization is widely used in the epidemiologic literature and has the advantage of simplifying an analysis by avoiding the need to model the age effect.7 Standardization also lowers the computational burden associated with model fitting by reducing the dimension of the data. However, working with age-standardized quantities precludes the possibility of exploring age interactions. Guha et al8 have developed a computationally efficient algorithm that facilitates the fitting of spatial regression models allowing for interactions between covariates of interest and age.7 They applied their algorithm to the Australian data, in the process revealing a strong interaction between SEIFA and age. The results imply that the association between social disadvantage and heart disease is much stronger at younger ages.
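A small numerical sketch may make the point about lost interactions concrete. It applies direct standardization to made-up counts for 2 age bands in a disadvantaged and an advantaged area; the counts, the standard-population weights, and the function name are all hypothetical and are not drawn from the Australian data.

```python
def direct_standardized_rate(events, person_years, std_weights):
    """Weighted average of age-specific rates, using standard-population weights."""
    rates = [e / py for e, py in zip(events, person_years)]
    return sum(w * r for w, r in zip(std_weights, rates)) / sum(std_weights)

# Hypothetical data for 2 age bands (younger, older); illustrative only.
std_weights = [0.6, 0.4]
disadvantaged = {"events": [30, 120], "person_years": [10_000, 10_000]}
advantaged = {"events": [10, 100], "person_years": [10_000, 10_000]}

# One summary number after standardization.
std_ratio = (
    direct_standardized_rate(std_weights=std_weights, **disadvantaged)
    / direct_standardized_rate(std_weights=std_weights, **advantaged)
)

# Age-specific rate ratios, which standardization averages over.
age_specific_ratios = [
    (d_e / d_py) / (a_e / a_py)
    for d_e, d_py, a_e, a_py in zip(
        disadvantaged["events"], disadvantaged["person_years"],
        advantaged["events"], advantaged["person_years"],
    )
]

print(f"standardized rate ratio: {std_ratio:.2f}")
print("age-specific rate ratios:", [round(r, 2) for r in age_specific_ratios])
```

In this toy example the standardized rate ratio is about 1.43, while the age-specific ratios are about 3.0 in the younger band and 1.2 in the older band. The single standardized figure conceals exactly the kind of age interaction described above.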

There are many possible explanations for this observed pattern. For example, it is plausible that adverse social conditions might have more impact on working-age people who are struggling to raise families and pay mortgages, compared with older people, especially in a country such as Australia that provides good financial and social support for older citizens. It is also likely that associations with social conditions may be masked at later ages by the generally increased rate of heart disease in elderly patients. It would be interesting to explore whether the spatial patterns in stroke deaths observed by Tassone and colleagues1 vary by age. Of course the authors address this issue indirectly by focusing their analysis on subjects younger than 65 to capture premature stroke mortality. Even something as simple as reporting 2 separate analyses, 1 in subjects aged 35–50 and the other in subjects aged 50–65, would be interesting.

Finally, as a statistician, I cannot resist making at least 1 nit-picking technical comment! I would like to know how sensitive the results were to the parameters of the inverse-Wishart distribution used as the prior on the variance-covariance matrix from the CAR model. A number of authors9 have written on instabilities associated with using inverse-Wishart and inverse-gamma priors on variance components. I have indeed encountered such instabilities in practice and find that simpler priors such as the uniform often do better, albeit at the cost of losing conjugacy.



LOUISE RYAN is the Henry Pickering Walcott Professor and Chair in the Department of Biostatistics at the Harvard School of Public Health. She is well known for the development and application of statistical methods to a wide range of public health issues, but especially in the area of environmental health. Professor Ryan has also been a passionate advocate of the value of diversity in higher education.



1. Tassone EC, Waller LA, Casper MA. Small-area racial disparity in stroke mortality: an application of Bayesian spatial hierarchical modeling. Epidemiology. 2009;20:
2. Wakefield J, Shaddick G. Health-exposure modeling and the ecological fallacy. Biostatistics. 2006;7:438–455.
3. Burden S, Guha S, Morgan G, Ryan L, Young L. Spatio-temporal analysis of ischemic heart disease in NSW, Australia. Environ Ecol Stat. 2005;12:427–448.
4. Trewin D. Socio-Economic Indexes for Areas, Australia, 2001. Canberra: Australian Bureau of Statistics; 2003 Information Paper ABS Catalogue No: 2039.0.
5. Breslow NE, Clayton DG. Approximate inference in generalized linear mixed models. J Am Stat Assoc. 1993;88:9–25.
6. Earnest A, Morgan G, Mengersen K, Ryan L, Summerhayes R, Beard J. Evaluating the effect of neighbourhood weight matrices on smoothing properties of Conditional Autoregressive (CAR) models. Int J Health Geogr. 2007;6:54–66.
7. Breslow N, Day NE. Statistical Methods in Cancer Research, Volume 2: The Design and Analysis of Cohort Studies. Lyon: International Agency for Research on Cancer; 1987.
8. Guha S, Morara M, Ryan L. Gauss-Seidel Estimation of Generalized Linear Mixed Models with Application to Poisson Modeling of Spatially Varying Disease Rates. Harvard University Biostatistics Working Paper Series 2005. Available at:
9. Gelman A. Prior distributions for variance parameters in hierarchical models. Bayesian Anal. 2006;1:515–533.
© 2009 Lippincott Williams & Wilkins, Inc.