Hierarchical Bayesian modeling has emerged over the last 30 years as a flexible modeling approach in complex multiparameter settings. Notable applications include performance comparisons for health-care institutions and programs,^{1–3} modeling the effects of multiple correlated exposures,^{4–10} as well as disease mapping and spatiotemporal modeling of disease.^{11–15} In all of these applications the correlation between analytical units (such as geographic areas) is exploited to improve precision for unit-specific estimates as well as to obtain more plausible estimates of the pattern of variation underpinning the observed data.

The paper by Beard et al^{11} in this issue of Epidemiology illustrates hierarchical Bayesian modeling in the context of spatiotemporal analysis of access to health services. In the analysis by Beard and colleagues, hierarchical Bayesian modeling leads to intelligent smoothing of raw estimates of standardized mortality, morbidity and service utilization rates obtained from small geographic areas. The hierarchical modeling also accounts appropriately for between-area correlation in estimating the effects of area characteristics on outcomes. In particular, under the model fitted by Beard et al, standardized mortality or morbidity ratios for neighboring areas are not assumed to be independent even after conditioning on area characteristics. Accounting appropriately for departures from statistical independence is an important inferential issue that has been studied extensively from both frequentist and Bayesian viewpoints.^{16–19} However, it is the approach to smoothing that is particularly characteristic of hierarchical Bayesian modeling, and is the focus of this commentary.

#### WHY SMOOTH?

Suppose we want to describe the spatial pattern of variation in acute myocardial infarction (MI) mortality rates across a large geographic area, using data coded into standard subdivisions such as postal area codes. What would be wrong with simply calculating the MI mortality rate in each postal area and mapping the rates produced? Surely this would be “letting the data speak for themselves,” and therefore beyond question. A substantial body of statistical theory, both frequentist and Bayesian,^{20–29} suggests that such simple analyses are not the best that can be done with the data. For the frequentist, methods of analysis that permit some pooling of information across areas result in reductions in mean-squared error, while, from a Bayesian perspective, pooling of information corresponds to a more plausible prior model than the model of mutual independence implicit in the presentation of raw data summaries as the sole analytical output. Moreover, if data are sparse in some areas, raw data summaries may provide a misleading impression of the underlying pattern of variation, the more so given the difficulty of adequately representing uncertainty in disease maps.
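The mean-squared-error argument can be illustrated with a small simulation. The sketch below (all numbers invented; the shrinkage weight is a crude method-of-moments stand-in for a full hierarchical model) compares raw area-specific rates with estimates shrunk toward the grand mean when area populations are small:

```python
import numpy as np

rng = np.random.default_rng(0)
n_areas, pop = 200, 200                     # small populations -> noisy raw rates
true_rates = rng.gamma(shape=4, scale=0.005, size=n_areas)   # mean rate 0.02
deaths = rng.poisson(true_rates * pop)

raw = deaths / pop                          # "letting the data speak for themselves"
overall = deaths.sum() / (n_areas * pop)    # grand mean rate

# Method-of-moments estimate of the between-area variance, then a simple
# precision-based shrinkage of each raw rate toward the grand mean.
samp_var = overall / pop                    # approx. Poisson sampling variance
between_var = max(raw.var() - samp_var, 1e-12)
w = between_var / (between_var + samp_var)  # weight given to each area's own data
shrunk = w * raw + (1 - w) * overall

mse_raw = np.mean((raw - true_rates) ** 2)
mse_shrunk = np.mean((shrunk - true_rates) ** 2)
```

Across the simulated areas, the pooled (shrunk) estimates track the true rates more closely than the raw rates do, despite using no information beyond the data themselves.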

Observable data can be conceptualized as “structure plus noise” with the role of analysis being to reveal the structure by stripping away the noise.^{29} Since the raw data include the noise, it follows that raw data description is not an optimal analytical output, unless data for each analytical unit are so plentiful that simple data-summaries can be assumed to have minimal noise. If consideration is also given to the various systematic distorting influences (such as misclassification, nonresponse, and unobserved confounding) that act on typical data-gathering systems to produce the data we actually see, it becomes even less defensible to maintain that raw-data summaries are inherently “correct” or in any way morally superior to model-based estimates.^{30–33}

When data are not plentiful for all analytical units, noise can be reduced by “borrowing strength” or pooling information across analytical units. This requires a modeling framework that connects separate analytical units. One simple and familiar example is a standard log-linear model for age-specific mortality rates. The model log(*λ_{i}*) = β_{0} + age_{i} × β_{1}, with age_{i} representing age in years, says that mortality rates for groups 1 year apart in age differ by a factor of exp(β_{1}). When the model is fitted to data, data from all age groups contribute to estimation of the parameters. Plugging these estimates into the model equation yields smoothed estimates of age-specific mortality rates. Thus, strength is borrowed across units (age-groups) through estimation of parameters of a prior model. However, a defect of such simple prior models is that they assume the specified model form is known with certainty, with the result that age-specific estimates are shrunk completely to the model predictions. One response to this is to consider more flexible functional forms such as spline models^{34} or Generalized Additive Models.^{35} However, with few exceptions, these more advanced modeling approaches still enforce smoothness and shrink estimates completely to the model predictions.
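The log-linear smoothing described above can be sketched as follows, with invented age-specific counts and a standard Newton-Raphson (IRLS) fit of the Poisson regression, using log person-years as an offset:

```python
import numpy as np

# Hypothetical age-specific data (invented numbers for illustration only).
ages = np.array([45.0, 55.0, 65.0, 75.0, 85.0])
deaths = np.array([6.0, 15.0, 40.0, 110.0, 260.0])
pyears = np.array([12000.0, 11000.0, 9000.0, 7000.0, 4000.0])

# Model: log(lambda_i) = b0 + age_i * b1, fitted as a Poisson regression
# with log person-years as an offset. Age is centred for numerical stability.
X = np.column_stack([np.ones_like(ages), ages - 65.0])
offset = np.log(pyears)

beta = np.array([np.log(deaths.sum() / pyears.sum()), 0.0])
for _ in range(25):                          # Newton-Raphson / IRLS updates
    mu = np.exp(X @ beta + offset)           # expected deaths under the model
    grad = X.T @ (deaths - mu)
    hess = X.T @ (X * mu[:, None])           # Fisher information
    beta = beta + np.linalg.solve(hess, grad)

crude = deaths / pyears                      # raw age-specific rates
smoothed = np.exp(X @ beta)                  # fully model-based rates
rate_ratio = np.exp(beta[1])                 # mortality rate ratio per year of age
```

Every age group contributes to estimating β_{0} and β_{1}, so each smoothed rate borrows strength from all the others, but, as noted above, the estimates are forced to lie exactly on the fitted log-linear curve.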

#### SMOOTHING USING HIERARCHICAL BAYESIAN MODELS

Hierarchical modeling provides an alternative approach to smoothing in which a prior model structure is embedded in a probability model to reflect uncertainty regarding the form of the model that links analytical units.

Returning to the specific problem of modeling spatial variation in MI mortality, one plausible prior model is that MI mortality varies according to some reasonably smooth spatial process. Combining a prior model of spatial smoothness with observed data will result in more precise estimates because the analysis will be conditioning on more information (observed data plus prior model). However, the prior model of spatial smoothness may be incorrect. In the logical sense, it is a priori possible that patterns of MI mortality are extremely irregular and determined by strong localized effects. Accordingly, even if the spatial smoothness model seems plausible, we may not wish to impose it absolutely on the data, in the sense that estimates are forced to exactly follow the smooth pattern predicted by the model. By embedding a prior expectation of smoothness in a probability model, the a priori possibility of nonsmooth patterns of variation can be acknowledged while nevertheless giving some prior weight to spatial smoothness.

With prior assumptions such as smoothness embedded within a prior model, posterior estimates are a compromise between observed data and the prior model form. This follows the standard logic of Bayesian inference—posterior estimates combine observed data with prior information. In the paper by Beard et al^{11} the assumption of spatial smoothness is embodied in a model called the intrinsic conditional autoregressive model, which acts like a spatial moving average.^{13,36} Under this model, area-specific effects are modeled on a log-scale and are assigned a normal prior distribution with expectation equal to the average of the effects for the areas neighboring the target area. However, since it is only *prior* expectations that are set equal to neighborhood averages, *posterior* estimates for area effects are not constrained to equal neighborhood averages (see Fig. 2 in the paper by Beard et al^{11}); instead, each area's effect is shrunk toward the average effect for its neighborhood in a manner dependent on the fit of the prior model, and on the amount of data contributed by the area.
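A stylized one-step sketch of this compromise (not the intrinsic conditional autoregressive model actually fitted by Beard et al, which estimates all areas and the smoothing variance jointly; data, adjacency, and variances below are all invented) shows how each area's estimate is pulled toward its neighborhood average in proportion to how little data it contributes:

```python
import numpy as np

# Hypothetical log relative risks y_i observed with sampling variance s2_i
# (sparser data -> larger s2_i), plus an assumed adjacency structure and an
# assumed prior variance tau2 governing how strongly smoothness is believed.
y = np.array([0.40, -0.10, 0.05, 0.70, -0.30])        # observed log-SMRs
s2 = np.array([0.30, 0.02, 0.05, 0.40, 0.03])         # sampling variances
neighbours = [[1, 2], [0, 2], [0, 1, 3], [2, 4], [3]] # invented adjacency
tau2 = 0.05                                           # assumed prior variance

# Plug-in prior mean for each area: the average of its neighbours' values.
prior_mean = np.array([y[nb].mean() for nb in neighbours])

# Posterior mean under a normal prior and normal likelihood: a
# precision-weighted compromise between an area's own data and its
# neighbourhood average.
w = (1 / s2) / (1 / s2 + 1 / tau2)   # weight on the area's own data
posterior = w * y + (1 - w) * prior_mean
```

Area 0, with the least data (largest s2), is shrunk strongly toward its neighborhood average, while data-rich area 1 retains most of its own estimate—mirroring the behavior visible in Fig. 2 of Beard et al.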

#### DISCUSSION

Hierarchical Bayesian models are a flexible tool for data analysis in multiparameter problems such as exploring spatial variation in health outcomes. Far from seeing multiplicity as something to be penalized (as with the Bonferroni-style corrections of naive frequentist hypothesis testing), the hierarchical Bayes approach embraces multiplicity as an opportunity to improve precision by borrowing strength across analytical units. Nevertheless, in disease-mapping applications, hierarchical Bayes modeling does not solve all problems of map interpretation, primarily because of the inherent difficulty of representing uncertainty in disease maps.^{37}

Implementation of Bayesian methods, including hierarchical Bayes methodology, is often much less intimidating than many epidemiologists imagine, with some approaches requiring only standard frequentist software.^{29,38,39} When the full flexibility of Markov Chain Monte Carlo posterior sampling is needed, software is available^{40} which, while requiring some user knowledge of Bayesian theory and computation, nevertheless frees the user from computational programming so they can concentrate on model specification—an activity in which epidemiologists should be closely involved.

#### ACKNOWLEDGMENTS

I thank Sander Greenland for commenting on an earlier version of this commentary.

#### ABOUT THE AUTHOR

*PATRICK GRAHAM is a Senior Research Fellow in the Department of Public Health, University of Otago, Christchurch. He works on applications of Bayesian modeling in health care epidemiology, causal inference theory and confidentiality research.*