Studies of ecologic or aggregate data suffer from a broad range of biases when scientific interest lies with individual-level associations. To overcome these biases, epidemiologists can choose from a range of designs that combine these group-level data with individual-level data. The individual-level data provide information to identify, evaluate, and control bias, whereas the group-level data are often readily accessible and provide gains in efficiency and power. Within this context, the literature on developing models, particularly multilevel models, is well established, but little work has been published to help researchers choose among competing designs and plan additional data collection.
We review recently proposed “combined” group- and individual-level designs and methods that collect and analyze data at 2 levels of aggregation. These include aggregate data designs, hierarchical related regression, two-phase designs, and hybrid designs for ecologic inference.
The various methods differ in (i) the data elements available at the group and individual levels and (ii) the statistical techniques used to combine the 2 data sources. Implementing these techniques requires care, and it may often be simpler to ignore the group-level data once the individual-level data are collected. A simulation study, based on birth-weight data from North Carolina, is used to illustrate the benefit of incorporating group-level information.
Our focus is on settings where there are individual-level data to supplement readily accessible group-level data. In this context, no single design is ideal. Choosing which design to adopt depends primarily on the model of interest and the nature of the available group-level data.
From the (a) Department of Biostatistics, Harvard School of Public Health, Boston, MA; and (b) Department of Epidemiology and Program in Public Health, University of California at Irvine, Irvine, CA.
Submitted 13 April 2010; accepted 19 November 2010.
Supported, in part, by NCI R-01 grant CA125081.
Supplemental digital content is available through direct URL citations in the HTML and PDF versions of this article (www.epidem.com).
Correspondence: Sebastien Haneuse, Department of Biostatistics, Harvard School of Public Health, 677 Huntington Ave, Boston, MA 02115. E-mail: firstname.lastname@example.org.