Multiple Imputation for Missing Income Data in Population-Based Health Surveillance

Zeng, Zhiwei MD, MPH

Journal of Public Health Management and Practice: November-December 2009 - Volume 15 - Issue 6 - p E12–E21
doi: 10.1097/PHH.0b013e3181aab5f7

Background Although advanced multiple imputation (MI) methodology has been widely introduced and is increasingly used, few studies have reported its use in health surveillance, where missing income data is a common and serious problem. This study examined the application of MI for incomplete income data in population-based health surveillance.

Methods In the 2002–2003 Los Angeles County Health Survey (N = 8 167), self-reported household income converted into Federal Poverty Levels (FPLs) was imputed using MI for the 1 381 (16.9%) cases with missing income. Validity was assessed with the 6 786 complete cases, among which 1 381 FPLs were randomly masked and then imputed with MI. Consistency was examined by Z tests comparing imputed and original FPL statistics. Multiple imputation statistical inference was examined by masking and imputing 5 percent, 10 percent, 15 percent, and 20 percent of FPL values with different sets of covariates, estimating 95 percent confidence intervals of Pearson correlation coefficients, and comparing them with the original correlation coefficients.
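
The mask-and-impute check described above can be illustrated with a minimal sketch. It is not the author's procedure or software: the data are synthetic, the imputation model is a simple bootstrap-based stochastic regression, and the names `mask_at_random`, `impute_once`, and `pooled_correlation_ci` are hypothetical. It masks a fraction of a complete variable, imputes it several times, and pools the Pearson correlation on the Fisher z scale with Rubin's rules so that the resulting 95 percent interval can be compared with the original correlation.

```python
import numpy as np

def mask_at_random(y, fraction, rng):
    """Return a copy of y with `fraction` of its entries set to NaN, plus the mask."""
    y = np.asarray(y, dtype=float)
    n_mask = int(round(fraction * y.size))
    idx = rng.choice(y.size, size=n_mask, replace=False)
    mask = np.zeros(y.size, dtype=bool)
    mask[idx] = True
    masked = y.copy()
    masked[mask] = np.nan
    return masked, mask

def impute_once(y, covariates, rng):
    """One stochastic regression imputation; the bootstrap refit is an
    assumed way of reflecting uncertainty in the regression parameters."""
    obs = ~np.isnan(y)
    design = np.column_stack([np.ones(len(y)), covariates])
    boot = rng.choice(np.flatnonzero(obs), size=int(obs.sum()), replace=True)
    beta, *_ = np.linalg.lstsq(design[boot], y[boot], rcond=None)
    resid = y[boot] - design[boot] @ beta
    sigma = resid.std(ddof=design.shape[1])
    filled = y.copy()
    n_missing = int((~obs).sum())
    filled[~obs] = design[~obs] @ beta + rng.normal(0.0, sigma, size=n_missing)
    return filled

def pooled_correlation_ci(y_masked, covariates, other, m, rng):
    """Pool Pearson r over m imputations on the Fisher z scale (Rubin's rules).
    Uses a normal critical value (1.96) rather than a t reference distribution."""
    n = len(y_masked)
    z = np.empty(m)
    for i in range(m):
        filled = impute_once(y_masked, covariates, rng)
        z[i] = np.arctanh(np.corrcoef(filled, other)[0, 1])
    within = 1.0 / (n - 3)              # sampling variance of Fisher z
    between = z.var(ddof=1)             # variance across imputations
    total = within + (1.0 + 1.0 / m) * between
    half_width = 1.96 * np.sqrt(total)
    return np.tanh(z.mean() - half_width), np.tanh(z.mean() + half_width)

# Synthetic stand-ins (assumptions): `income` plays the role of FPL,
# `other` is a variable whose correlation with income is of interest.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 2))
income = 1.0 + X @ np.array([0.8, -0.5]) + rng.normal(size=n)
other = 0.6 * income + rng.normal(size=n)

original_r = np.corrcoef(income, other)[0, 1]
masked, mask = mask_at_random(income, 0.10, rng)

# The analysis variable is included among the imputation covariates so that
# the imputed values preserve its relationship with income.
covariates = np.column_stack([X, other])
low, high = pooled_correlation_ci(masked, covariates, other, m=5, rng=rng)
print(f"original r = {original_r:.3f}, pooled 95% CI = ({low:.3f}, {high:.3f})")
```

Repeating the last few lines with masking fractions of 0.05, 0.10, 0.15, and 0.20, and with different columns in `covariates`, mirrors the structure of the comparison described in the Methods, though the numbers it produces say nothing about the survey itself.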

Results Among 188 major surveillance statistics, Z tests showed that imputed and original FPL were consistent for 96.3 percent of the statistics when FPL served as a demographic variable but for only 19.4 percent when it served as an outcome variable. With well-established covariates, MI statistical inference was powerful when the missing proportion was within 15 percent, but it began to weaken as the missing proportion increased to 15 percent and over.
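
As a rough illustration of how such a consistency tally could be computed, the sketch below applies a two-sided Z test to pairs of estimates and reports the share of pairs that do not differ at the 5 percent level. The abstract does not give the exact form of the test, so the formula (which treats the two estimates as independent), the function name `z_test`, and the example numbers are all assumptions made for illustration.

```python
import math

def z_test(est_a, se_a, est_b, se_b, critical=1.96):
    """Two-sided Z test for the difference between two estimates.
    Assumes the estimates are independent, which is a simplification."""
    z = (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value, abs(z) < critical

# Hypothetical (estimate, SE) pairs: statistic from imputed FPL vs. original FPL.
pairs = [
    (0.250, 0.010, 0.255, 0.010),
    (0.400, 0.012, 0.340, 0.012),
]
results = [z_test(*pair) for pair in pairs]
share_consistent = sum(consistent for _, _, consistent in results) / len(results)
print(f"share of consistent statistics: {share_consistent:.1%}")
```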

Conclusions Multiple imputation is a feasible approach for incomplete income data in population-based health surveillance, but its results vary with how the imputed variable is used. It performs better for demographic variables than for outcome variables and is more powerful with lower missing proportions than with higher ones. With well-established covariates, MI statistical inference could be reliable for missing proportions up to 15 percent.

This article examines the application range and properties of multiple imputation for incomplete income data in population-based health surveillance.

Zhiwei Zeng, MD, MPH, is Information System Supervisor, Department of Public Health, County of Los Angeles, Los Angeles, California. Dr Zeng was the epidemiologist for the study.

Corresponding Author: Zhiwei Zeng, MD, MPH, Department of Public Health, County of Los Angeles, 2615 S Grand Ave, Room 500, Los Angeles, CA 90007.

The author thanks Dr Margaret Shih for her valuable advice and editorial comments and acknowledges support from the Office of Health Assessment and Epidemiology, Department of Public Health, County of Los Angeles. Special thanks also go to Dr Paul Simon and Ms Cheryl Wold for their assistance at the early stage of the study.

© 2009 Lippincott Williams & Wilkins, Inc.