Intelligent Smoothing Using Hierarchical Bayesian Models

Graham, Patrick

doi: 10.1097/EDE.0b013e31816b7859

Hierarchical Bayesian modeling provides a flexible approach to modeling in multiparameter problems. Examples include disease mapping and spatiotemporal analysis, and multiple exposure modeling. A key feature of hierarchical Bayesian models is that prior expectations regarding model structure are embedded in a probability model that reflects uncertainty about the form of the structure that links analytical units (such as geographic areas). This results in posterior estimates that are compromises between raw data summaries and estimates that conform exactly to the prior model structure. The posterior estimates are more precise and generally have lower mean-squared error than traditional data summaries, and yet are not strictly constrained to follow a posited prior model form.

From the Department of Public Health and General Practice, University of Otago, Christchurch, New Zealand.

Submitted 14 December 2007; accepted 17 December 2007.

Correspondence: Patrick Graham, Department of Public Health and General Practice, University of Otago, Christchurch, P.O. Box 4345, Christchurch, New Zealand. E-mail:

Hierarchical Bayesian modeling has emerged over the last 30 years as a flexible modeling approach in complex multiparameter settings. Notable applications include performance comparisons for health-care institutions and programs,1–3 modeling the effects of multiple correlated exposures,4–10 as well as disease mapping and spatiotemporal modeling of disease.11–15 In all of these applications the correlation between analytical units (such as geographic areas) is exploited to improve precision for unit-specific estimates as well as to obtain more plausible estimates of the pattern of variation underpinning the observed data.

The paper by Beard et al11 in this issue of Epidemiology illustrates hierarchical Bayesian modeling in the context of spatiotemporal analysis of access to health services. In the analysis by Beard and colleagues, hierarchical Bayesian modeling leads to intelligent smoothing of raw estimates of standardized mortality, morbidity and service utilization rates obtained from small geographic areas. The hierarchical modeling also accounts appropriately for between-area correlation in estimating the effects of area characteristics on outcomes. In particular, under the model fitted by Beard et al, standardized mortality or morbidity ratios for neighboring areas are not assumed to be independent even after conditioning on area characteristics. Accounting appropriately for departures from statistical independence is an important inferential issue that has been studied extensively from both frequentist and Bayesian viewpoints.16–19 However, it is the approach to smoothing that is particularly characteristic of hierarchical Bayesian modeling, and is the focus of this commentary.

Suppose we want to describe the spatial pattern of variation in acute myocardial infarction (MI) mortality rates across a large geographic area, using data coded into standard subdivisions such as postal area codes. What would be wrong with simply calculating the MI mortality rate in each postal area and mapping the rates produced? Surely this would be “letting the data speak for themselves,” and therefore beyond question. A substantial body of statistical theory, both frequentist and Bayesian,20–29 suggests that such simple analyses are not the best that can be done with the data. For the frequentist, methods of analysis that permit some pooling of information across areas result in reductions in mean-squared error, while, from a Bayesian perspective, pooling of information corresponds to a more plausible prior model than the model of mutual independence implicit in the presentation of raw data summaries as the sole analytical output. Moreover, if data are sparse in some areas, raw data summaries may provide a misleading impression of the underlying pattern of variation, all the more so given the difficulty of adequately representing uncertainty in disease maps.
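
The mean-squared error claim can be illustrated with a small simulation. The sketch below is not taken from any of the cited papers: it assumes normally distributed area effects, a known and equal sampling variance for every area, and a simple method-of-moments (empirical Bayes) estimate of the between-area variance. The function name and all numerical settings are hypothetical choices made for illustration.

```python
import numpy as np

def simulate_mse(n_areas=200, true_sd=0.2, noise_sd=0.5, seed=0):
    """Compare raw area estimates with partially pooled (shrunken) estimates."""
    rng = np.random.default_rng(seed)
    # Hypothetical true log-rates for each area, scattered around a common mean.
    theta = rng.normal(0.0, true_sd, size=n_areas)
    # Raw estimates: truth plus sampling noise (large when data are sparse).
    y = theta + rng.normal(0.0, noise_sd, size=n_areas)

    # Method-of-moments estimate of the between-area variance (assumed setup).
    grand_mean = y.mean()
    between_var = max(y.var(ddof=1) - noise_sd**2, 0.0)
    # Weight given to each raw estimate; the remainder goes to the pooled mean.
    weight = between_var / (between_var + noise_sd**2)
    shrunk = grand_mean + weight * (y - grand_mean)

    mse_raw = np.mean((y - theta) ** 2)
    mse_shrunk = np.mean((shrunk - theta) ** 2)
    return mse_raw, mse_shrunk

mse_raw, mse_shrunk = simulate_mse()
print(f"raw MSE = {mse_raw:.3f}, shrunken MSE = {mse_shrunk:.3f}")
```

With sampling noise that is large relative to the true between-area variation, as assumed here, the pooled estimates should sit closer to the simulated truth on average than the raw estimates do.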

Observable data can be conceptualized as “structure plus noise” with the role of analysis being to reveal the structure by stripping away the noise.29 Since the raw data include the noise, it follows that raw data description is not an optimal analytical output, unless data for each analytical unit are so plentiful that simple data-summaries can be assumed to have minimal noise. If consideration is also given to the various systematic distorting influences (such as misclassification, nonresponse, and unobserved confounding) that act on typical data-gathering systems to produce the data we actually see, it becomes even less defensible to maintain that raw-data summaries are inherently “correct” or in any way morally superior to model-based estimates.30–33

When data are not plentiful for all analytical units, noise can be reduced by “borrowing strength” or pooling information across analytical units. This requires a modeling framework that connects separate analytical units. One simple and familiar example is a standard log-linear model for age-specific mortality rates. The model log(λ_i) = β_0 + age_i × β_1, with age_i representing age in years for group i, says that the mortality rates for groups 1 year apart in age differ by a factor of exp(β_1). When the model is fitted to data, data from all age groups contribute to estimation of the parameters. Plugging these estimates into the model equation yields smoothed estimates of age-specific mortality rates. Thus, strength is borrowed across units (age-groups) through estimation of parameters of a prior model. However, a defect of such simple prior models is that they assume the specified model form is known with certainty, with the result that age-specific estimates are shrunk completely to the model predictions. One response to this is to consider more flexible functional forms such as spline models34 or Generalized Additive Models.35 However, with few exceptions, these more advanced modeling approaches still enforce smoothness and shrink estimates completely to the model predictions.
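
As a minimal sketch of the log-linear age model described above, the following code fits the two parameters by Poisson maximum likelihood using Newton-Raphson iterations and then plugs the estimates back in to obtain smoothed rates. The Poisson likelihood with person-years as exposure, the function names, and the data values are all assumptions made for illustration; they are not from the commentary.

```python
import numpy as np

def fit_loglinear_rates(age, deaths, person_years, n_iter=25):
    """Fit log(rate_i) = b0 + b1 * age_i by Poisson maximum likelihood."""
    age = np.asarray(age, dtype=float)
    deaths = np.asarray(deaths, dtype=float)
    person_years = np.asarray(person_years, dtype=float)
    X = np.column_stack([np.ones_like(age), age])
    # Start from the crude overall rate with no age effect.
    beta = np.array([np.log(deaths.sum() / person_years.sum()), 0.0])
    for _ in range(n_iter):
        mu = person_years * np.exp(X @ beta)   # expected deaths per age group
        score = X.T @ (deaths - mu)            # gradient of the log-likelihood
        info = X.T @ (X * mu[:, None])         # Fisher information matrix
        beta = beta + np.linalg.solve(info, score)
    return beta

def smoothed_rates(beta, age):
    """Model-based (fully shrunk) rates at the given ages."""
    return np.exp(beta[0] + beta[1] * np.asarray(age, dtype=float))

# Hypothetical deaths and person-years by age group (illustrative only).
age = np.array([40, 50, 60, 70, 80])
deaths = np.array([2, 9, 12, 40, 70])
person_years = np.array([1000, 1000, 1000, 1000, 1000])

beta = fit_loglinear_rates(age, deaths, person_years)
raw = deaths / person_years
smooth = smoothed_rates(beta, age)
print("rate ratio per year of age:", np.exp(beta[1]))
print("raw rates:     ", raw)
print("smoothed rates:", smooth)
```

Every age group contributes to the two fitted parameters, so each smoothed rate borrows strength from the others. The smoothed rates also lie exactly on the fitted curve, which is the complete shrinkage to model predictions that the paragraph above identifies as a defect.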

Hierarchical modeling provides an alternative approach to smoothing in which a prior model structure is embedded in a probability model to reflect uncertainty regarding the form of the model that links analytical units.

Returning to the specific problem of modeling spatial variation in MI mortality, one plausible prior model is that MI mortality varies according to some reasonably smooth spatial process. Combining a prior model of spatial smoothness with observed data will result in more precise estimates because the analysis will be conditioning on more information (observed data plus prior model). However, the prior model of spatial smoothness may be incorrect. In the logical sense, it is a priori possible that patterns of MI mortality are extremely irregular and determined by strong localized effects. Accordingly, even if the spatial smoothness model seems plausible, we may not wish to impose it absolutely on the data, in the sense that estimates are forced to exactly follow the smooth pattern predicted by the model. By embedding a prior expectation of smoothness in a probability model, the a priori possibility of nonsmooth patterns of variation can be acknowledged while nevertheless giving some prior weight to spatial smoothness.

With prior assumptions such as smoothness embedded within a prior model, posterior estimates are a compromise between observed data and the prior model form. This follows the standard logic of Bayesian inference: posterior estimates combine observed data with prior information. In the paper by Beard et al,11 the assumption of spatial smoothness is embodied in a model called the intrinsic conditional autoregressive model, which acts like a spatial moving average.13,36 Under this model, area-specific effects are modeled on a log-scale and are assigned a normal prior distribution with expectation equal to the average of the effects for the areas neighboring the target area. However, since it is only prior expectations that are set equal to neighborhood averages, posterior estimates for area effects are not constrained to equal neighborhood averages (see Fig. 2 in the paper by Beard et al11); instead, each area's effect is shrunk toward the average effect for its neighborhood in a manner dependent on the fit of the prior model, and on the amount of data contributed by the area.
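
The way shrinkage depends on the amount of data can be sketched for a single area. This is a deliberate simplification and not the model fitted by Beard et al: it assumes the area's raw log relative risk is approximately normal with known variance, treats the neighboring effects and the prior variance parameter as fixed, and uses the usual conditional form of the intrinsic conditional autoregressive prior (normal, centered on the neighborhood average, with variance inversely proportional to the number of neighbors). The function name and the numbers are hypothetical.

```python
def conditional_posterior_mean(y_i, s2_i, neighbor_effects, tau2):
    """Conditional posterior mean and variance of one area's effect.

    Assumes y_i ~ Normal(phi_i, s2_i) for the area's raw log relative risk
    and the conditional prior
    phi_i | neighbors ~ Normal(mean(neighbors), tau2 / n_neighbors).
    """
    n = len(neighbor_effects)
    prior_mean = sum(neighbor_effects) / n
    data_precision = 1.0 / s2_i      # grows with the amount of data in the area
    prior_precision = n / tau2       # grows with the number of neighbors
    post_var = 1.0 / (data_precision + prior_precision)
    post_mean = post_var * (data_precision * y_i + prior_precision * prior_mean)
    return post_mean, post_var

# Hypothetical neighboring log relative risks; their average is 0.1.
neighbors = [0.10, 0.20, 0.00, 0.10]

# Same raw estimate (0.9), but very different amounts of data behind it.
sparse, _ = conditional_posterior_mean(0.9, s2_i=0.50, neighbor_effects=neighbors, tau2=0.2)
rich, _ = conditional_posterior_mean(0.9, s2_i=0.01, neighbor_effects=neighbors, tau2=0.2)
print(f"sparse-data area: {sparse:.2f}, data-rich area: {rich:.2f}")
```

Under these assumptions the sparse-data area is pulled most of the way toward the neighborhood average of 0.1 (to about 0.17), while the data-rich area stays close to its raw value (about 0.77). In a full hierarchical analysis the neighboring effects and the variance parameter are themselves estimated, so the degree of shrinkage also reflects how well the smoothness prior fits the data.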

Hierarchical Bayesian models are a flexible tool for data analysis in multiparameter problems such as exploring spatial variation in health outcomes. Far from seeing multiplicity as something to be penalized (as with the Bonferroni-style corrections of naive frequentist hypothesis testing), the hierarchical Bayes approach embraces multiplicity as an opportunity to improve precision by borrowing strength across analytical units. Nevertheless, in disease-mapping applications, hierarchical Bayes modeling does not solve all problems of map interpretation, primarily because of the inherent difficulty of representing uncertainty in disease maps.37

Implementation of Bayesian methods, including hierarchical Bayes methodology, is often much less intimidating than many epidemiologists imagine, with some approaches requiring only standard frequentist software.29,38,39 When the full flexibility of Markov Chain Monte Carlo posterior sampling is needed, software is available40 which, while requiring some user knowledge of Bayesian theory and computation, nevertheless frees the user from computational programming so they can concentrate on model specification—an activity in which epidemiologists should be closely involved.

I thank Sander Greenland for commenting on an earlier version of this commentary.

PATRICK GRAHAM is a Senior Research Fellow in the Department of Public Health, University of Otago, Christchurch. He works on applications of Bayesian modeling in health care epidemiology, causal inference theory and confidentiality research.

1.Burgess JF Jr., Christiansen CL, Michalak SE, et al. Medical profiling: improving standards and risk adjustments using hierarchical models. J Health Econ. 2000;19:291–309.
2.Christiansen CL, Morris CN. Improving the statistical approach to health care provider profiling. Ann Intern Med. 1997;127:764–768.
3.Normand SL, Glickman ME, Gatsonis CA. Statistical methods for profiling providers of medical care: issues and applications. J Am Stat Assoc. 1997;92:803–814.
4.Thomas DC, Siemiatycki J, Dewar R, et al. The problem of multiple inference in studies designed to generate hypotheses. Am J Epidemiol. 1985;122:1080–1095.
5.Greenland S. Methods for epidemiologic analyses of multiple exposures: a review and comparative study of maximum-likelihood, preliminary testing and empirical-Bayes regression. Stat Med. 1993;12:717–736.
6.Greenland S. A semi-Bayes approach to the analysis of correlated multiple associations, with an application to an occupational cancer mortality study. Stat Med. 1992;11:219–230.
7.Witte JS, Greenland S, Bird CL, et al. Hierarchical regression analysis applied to a study of multiple dietary exposures and breast cancer. Epidemiology. 1994;5:612–621.
8.MacLehose RF, Dunson DB, Herring AH, et al. Bayesian methods for highly correlated exposure data. Epidemiology. 2007;18:199–207.
9.Thomas DC, Witte JS, Greenland S. Dissecting effects of complex mixtures: who's afraid of informative priors? Epidemiology. 2007;18:186–190.
10.Witte JS. Genetic analysis with hierarchical models. Genet Epidemiol. 1997;14:1137–1142.
11.Beard JR, Earnest A, Morgan G, et al. Socioeconomic disadvantage and mortality from acute coronary events: a spatiotemporal analysis. Epidemiology. 2008;19:485–492.
12.Marshall R. Mapping disease and mortality rates using empirical Bayes estimators. Appl Stat. 1991;40:283–294.
13.Bernardinelli L, Clayton D, Pascutto C, et al. Bayesian analysis of space-time variation in disease risk. Stat Med. 1995;14:2433–2443.
14.Schootman M, Sun D. Small-area incidence trends in breast cancer. Epidemiology. 2004;15:300–307.
15.Lawson AB. Disease map reconstruction. Stat Med. 2001;20:2183–2204.
16.Liang KY, Zeger SL. Longitudinal data analysis using generalised linear models. Biometrika. 1986;73:13–22.
17.Zeger SL, Liang KY. Longitudinal data analysis for discrete and continuous outcomes. Biometrics. 1986;42:121–130.
18.Laird NM, Ware JH. Random effects models for longitudinal data. Biometrics. 1982;38:963–974.
19.Zeger SL, Karim MR. Generalized linear models with random effects; a Gibbs sampling approach. J Am Stat Assoc. 1991;86:79–86.
20.De la Cruz-Mesia R, Marshall G. Non-linear random effects models with continuous time autoregressive errors: a Bayesian approach. Stat Med. 2006;25:1471–1484.
21.Stein C. Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. In: Neyman J, ed. Proceedings of the Third Berkeley Symposium. Vol. 1. Berkeley: University of California Press; 1955:197–206.
22.James W, Stein C. Estimation with quadratic loss. In: Neyman J, ed. Proceedings of the Fourth Berkeley Symposium. Vol. 1. Berkeley: University of California Press; 1960:361–380.
23.Efron B, Morris C. Stein's estimation rule and its competitors—an empirical Bayes approach. J Am Stat Assoc. 1973;68:117–130.
24.Efron B, Morris C. Data analysis using Stein's estimator and its generalizations. J Am Stat Assoc. 1975;70:311–319.
25.Morris CN. Parametric empirical Bayes inference: theory and applications. J Am Stat Assoc. 1983;78:47–65.
26.Lindley DV, Smith AFM. Bayes estimates for the linear model. J Royal Stat Soc B. 1972;34:1–41.
27.Deely JJ, Lindley DV. Bayes empirical Bayes. J Am Stat Assoc. 1981;76:833–841.
28.Gelman A, Carlin JB, Stern HS, et al. Bayesian Data Analysis. London: Chapman and Hall; 1995.
29.Greenland S. Smoothing observational data: a philosophy and implementation for the health sciences. Int Stat Rev. 2006;74:31–46.
30.Maclure M, Schneeweiss S. Causation of bias: the episcope. Epidemiology. 2001;12:114–122.
31.Lash TL, Fink AK. Semi-automated sensitivity analysis to assess systematic errors in observational data. Epidemiology. 2003;14:451–458.
32.Phillips CV. Quantifying and reporting uncertainty from systematic errors. Epidemiology. 2003;14:459–466.
33.Greenland S. Multiple-bias modelling for analysis of observational data. J Royal Stat Soc A. 2005;168:267–306.
34.Greenland S. Dose-response and trend analysis in epidemiology: alternatives to categorical analysis. Epidemiology. 1995;6:356–365.
35.Hastie TJ, Tibshirani RJ. Generalized Additive Models. New York: Chapman and Hall; 1990.
36.Besag J, York J, Mollie A. Bayesian image restoration with two applications in spatial statistics. Ann Inst Stat Math. 1991;43:1–59.
37.Gelman A, Price PN. All maps of parameter estimates are misleading. Stat Med. 1999;18:3221–3234.
38.Greenland S. Bayesian perspectives for epidemiological research. II. Regression analysis. Int J Epidemiol. 2007;36:195–202.
39.Greenland S, Christensen R. Data augmentation priors for Bayesian and semi-Bayes analyses of conditional-logistic and proportional-hazards regression. Stat Med. 2001;20:2421–2428.
40.WinBUGS [computer program]. Version 1.4. Cambridge: MRC Biostatistics Unit; 2003.
© 2008 Lippincott Williams & Wilkins, Inc.