Evaluating Geographic Imputation Approaches for Zip Code Level Data: An Application to a Study of Pediatric Diabetes

Hibbert, James*; Liese, Angela*; Lawson, Andrew†; Porter, Dwayne*; Puett, Robin*; Standiford, Debra‡; Liu, Lenna§; Dabelea, Dana¶

doi: 10.1097/01.ede.0000362296.71466.49
Abstracts: ISEE 21st Annual Conference, Dublin, Ireland, August 25–29, 2009: Oral Presentations

*University of South Carolina Arnold School of Public Health, Columbia, SC, United States; †Medical University of South Carolina, Charleston, SC, United States; ‡Children's Hospital Medical Center, Cincinnati, OH, United States; §Seattle Children's Hospital Research Institute, Seattle, WA, United States; and ¶University of Colorado, Denver, CO, United States.

Abstracts published in Epidemiology have been reviewed by the organizations of Epidemiology's Affiliate Societies at whose meetings the abstracts have been accepted for presentation. These abstracts have not undergone review by the Editorial Board of Epidemiology.

Background and Objective:

There is increasing interest in the study of place effects on health, facilitated in part by geographic information systems. However, incomplete or missing address information reduces geocoding success. Several geographic imputation methods have been proposed to overcome this limitation. The accuracy of these methods can be evaluated at the individual level and at higher group levels (e.g., the spatial distribution of cases).

Methods:

We evaluated four geo-imputation methods for allocating addresses from ZIP codes to Census tracts, at both the individual and the group level. Two fixed (deterministic) and two random allocation methods were evaluated, each using either land area or population under age 20 as the weighting factor. Data included 2,126 geocoded cases of incident diabetes mellitus diagnosed in 2002 and 2003 among youth aged 0 to 19 years in four U.S. regions. The imputed distribution of cases across tracts was compared with the true distribution using a chi-squared statistic.
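The four allocation rules can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that each ZIP code comes with a table of the Census tracts it overlaps and a weight for each overlap (population under age 20 or land area in the ZIP-tract intersection), that a fixed method sends every case in a ZIP code to the single tract with the largest weight, and that a random method draws a tract with probability proportional to its weight. The function names, ZIP code, tract IDs, and weights are hypothetical.

```python
import random
from collections import Counter

# Hypothetical lookup: ZIP code -> {Census tract ID: weight}. The weight is
# assumed to be the population under age 20 or the land area of the
# ZIP-tract intersection; swapping the table gives the population- or
# area-weighted variant of each method.
ZipTractWeights = dict[str, dict[str, float]]


def fixed_allocation(zip_code: str, weights: ZipTractWeights) -> str:
    """Deterministic rule (assumed): pick the tract with the largest weight."""
    tracts = weights[zip_code]
    return max(tracts, key=lambda tract: tracts[tract])


def random_allocation(zip_code: str, weights: ZipTractWeights,
                      rng: random.Random) -> str:
    """Random rule (assumed): draw a tract with probability proportional to weight."""
    tracts = weights[zip_code]
    ids = list(tracts)
    return rng.choices(ids, weights=[tracts[t] for t in ids], k=1)[0]


def impute_tracts(case_zips: list[str], weights: ZipTractWeights,
                  method: str, seed: int = 0) -> list[str]:
    """Assign one Census tract to each case, given only its ZIP code."""
    rng = random.Random(seed)  # seeded so a random allocation is reproducible
    if method == "fixed":
        return [fixed_allocation(z, weights) for z in case_zips]
    if method == "random":
        return [random_allocation(z, weights, rng) for z in case_zips]
    raise ValueError(f"unknown method: {method!r}")


if __name__ == "__main__":
    # Made-up weights for one ZIP code split across three tracts.
    pop_under_20 = {"99999": {"T1": 1200.0, "T2": 300.0, "T3": 500.0}}
    land_area = {"99999": {"T1": 2.0, "T2": 9.0, "T3": 4.0}}
    cases = ["99999"] * 10
    for name, table in (("population", pop_under_20), ("land area", land_area)):
        for method in ("fixed", "random"):
            print(name, method, Counter(impute_tracts(cases, table, method)))
```

Under this assumed fixed rule, all ten cases land in T1 with population weighting and in T2 with land-area weighting, while the random rules spread them across the three tracts. Piling every case from a ZIP code into one tract is one way the artificial clusters described in the conclusions can arise.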

Results:

At the individual level, population-weighted fixed allocation was the most accurate, assigning on average 30.45% of cases to the correct Census tract across all regions, followed by the population-weighted random method (21.07%). The distribution of cases across Census tracts was as follows: 58.2% of tracts had no cases, 26.2% had one case, 9.5% had two cases, and less than 3% had three or more. The true distribution was best reproduced by the random allocation methods, which did not differ significantly from it (P > 0.90). In contrast, the distributions produced by the fixed allocation methods differed significantly from the true distribution (P < 0.0003).
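The two kinds of evaluation reported above can be sketched as follows: the percentage of cases assigned to their true (geocoded) tract, and a comparison of how tracts are distributed by number of cases. The abstract does not say exactly how the chi-squared table was built, so this sketch assumes it compares the number of tracts with 0, 1, 2, and 3 or more cases under the true and the imputed assignments. The function names and toy data are hypothetical, and only the Pearson statistic and its degrees of freedom are computed.

```python
from collections import Counter


def individual_accuracy(true_tracts: list[str], imputed_tracts: list[str]) -> float:
    """Percentage of cases whose imputed tract matches their geocoded tract."""
    if not true_tracts or len(true_tracts) != len(imputed_tracts):
        raise ValueError("need two non-empty case lists of equal length")
    hits = sum(t == i for t, i in zip(true_tracts, imputed_tracts))
    return 100.0 * hits / len(true_tracts)


def tract_count_profile(case_tracts: list[str], all_tracts: list[str],
                        top: int = 3) -> list[int]:
    """Number of tracts with 0, 1, ..., top - 1, and top-or-more cases.

    Cases assigned to a tract missing from all_tracts are ignored.
    """
    cases_per_tract = Counter(case_tracts)
    profile = [0] * (top + 1)
    for tract in all_tracts:
        profile[min(cases_per_tract[tract], top)] += 1
    return profile


def chi_squared_2xk(row_a: list[int], row_b: list[int]) -> tuple[float, int]:
    """Pearson chi-squared statistic and degrees of freedom for a 2 x k table."""
    # Columns that are empty in both rows have no expected count; drop them.
    cols = [(a, b) for a, b in zip(row_a, row_b) if a + b > 0]
    total_a = sum(a for a, _ in cols)
    total_b = sum(b for _, b in cols)
    if total_a == 0 or total_b == 0:
        raise ValueError("both rows need at least one count")
    n = total_a + total_b
    stat = 0.0
    for a, b in cols:
        col_total = a + b
        for observed, row_total in ((a, total_a), (b, total_b)):
            expected = row_total * col_total / n
            stat += (observed - expected) ** 2 / expected
    return stat, len(cols) - 1  # df = (2 - 1) * (k - 1)


if __name__ == "__main__":
    all_tracts = ["T1", "T2", "T3", "T4", "T5"]  # every tract in the region
    true_tracts = ["T1", "T2", "T2", "T3"]       # geocoded tract per case
    imputed_tracts = ["T2", "T2", "T2", "T2"]    # e.g., a fixed allocation
    print(individual_accuracy(true_tracts, imputed_tracts))  # 50.0
    true_profile = tract_count_profile(true_tracts, all_tracts)        # [2, 2, 1, 0]
    imputed_profile = tract_count_profile(imputed_tracts, all_tracts)  # [4, 0, 0, 1]
    print(chi_squared_2xk(true_profile, imputed_profile))
```

The p-value for a statistic with `df` degrees of freedom can then be read from the chi-squared distribution, for example with `scipy.stats.chi2.sf(stat, df)`. Because few tracts fall in the higher case-count categories, expected counts there can be small, so the chi-squared approximation should be interpreted with care.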

Conclusions:

These results indicate that fixed imputation methods yield the greatest accuracy at the individual level, which supports their use in studies focusing on distances to exposure sites. However, fixed methods create artificial clusters of cases in single Census tracts. For studies focusing on the spatial distribution of disease, random methods appeared superior, as they most closely replicated the true spatial distribution. When selecting an imputation approach, researchers should carefully consider the study aims.

© 2009 Lippincott Williams & Wilkins, Inc.