The National Health Interview Survey (NHIS) is a primary source of information on the changing health of the US population over the past 4 decades. The full potential of NHIS data for analyzing long-term change, however, has rarely been exploited. Time series analysis is complicated by several factors: large numbers of data files and voluminous documentation; complexity of file structures; and changing sample designs, questionnaires, and variable-coding schemes. We describe a major data integration project that will simplify cross-temporal analysis of population health data available in the NHIS. The Integrated Health Interview Series (IHIS) is a Web-based system that provides an integrated set of data and documentation based on the NHIS public use files from 1969 to the present. The Integrated Health Interview Series enhances the value of NHIS data for researchers by allowing them to make consistent comparisons across 4 decades of dramatic changes in health status, health behavior, and healthcare.
From the Division of Health Policy & Management, University of Minnesota (a); and the Minnesota Population Center, University of Minnesota (b), Minneapolis, MN.
Submitted 22 August 2007, accepted 18 March 2008.
Supported by Grant Number R01HD046697 from the National Institute of Child Health and Human Development.
Supplemental material for this article is available with the online version of the journal at www.epidem.com; click on “Article Plus.”
Correspondence: Pamela Jo Johnson, Division of Health Policy & Management, University of Minnesota, School of Public Health, 2221 University Ave SE, Suite 345, Minneapolis, MN 55414.
The National Health Interview Survey (NHIS) is a leading source of data on the health of the American population.1 These data are used to monitor the health of the US population,2 track progress toward Healthy People 2010 objectives,3 and evaluate the quality of health care in the United States.4 The rich array of data has also supported a broad range of population health research. NHIS data allow examination of conditions such as cancer, diabetes, hypertension, asthma, and functional limitations. Data are also available on preventive care utilization, including cancer screening5–7 and immunization,8,9 and on health behaviors such as diet,10,11 physical activity,12–14 and tobacco use.15–17 A wealth of sociodemographic information permits examination of social disparities in access to care, morbidity, and mortality.18–22 Moreover, pooling multiple years of data provides sufficient sample size for analysis of subpopulations of interest.23,24
The NHIS is the longest-running US health survey, with annual microdata from 1963 to the present. This broad chronologic coverage makes these data uniquely suited for studying changes in health over time. Yet, cross-temporal analyses of these important data have been uncommon, particularly by researchers outside the National Center for Health Statistics.25 Use of NHIS data has increased in recent years, most notably after the 2001 release of data files on the Internet. However, use of these complex data in time-series analyses remains rare. The purpose of this paper is to introduce the Integrated Health Interview Series, a resource designed to simplify such analyses, to the epidemiologic community.
The Data Integration Project
The Integrated Health Interview Series (IHIS) project provides a well-documented cross-sectional time series based on the National Health Interview Survey. We make these data freely available through a user-friendly Web-based data dissemination system (available at http://www.ihis.us) to facilitate informed analysis of this valuable source of information about the nation's health. The IHIS builds on the model of the Integrated Public Use Microdata Series (IPUMS), a harmonized set of US Census data from 1850 to 2000.26 The IHIS data integration project has 3 components: (1) harmonization, (2) documentation, and (3) dissemination.
Discontinuities in National Health Interview Survey variables complicate analysis of change over time. Harmonization is the process of taking original variables with different coding schemes and creating a new variable that is comparable over time. The “translation table” is the foundation of this harmonization work. It is a tool (in spreadsheet format) for laying out the various coding schemes for each year and then aligning the coding schemes into a single integrated scheme.
In some cases, the original variables are completely or nearly compatible, and recoding them into a common classification is relatively straightforward. For example, the coding of marital status is nearly identical over time, with only small differences between years. Table 1 shows selected sections of the marital status translation table. The first 2 columns list the final integrated codes and their value labels; the remaining columns list the original (unharmonized) codes for each year.
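A translation table can be read as a lookup from (survey year, original code) to a single integrated code. The Python sketch below assumes exactly that structure. Every code value and label in it is hypothetical, invented only to show the mechanics; none is taken from Table 1 or from the actual NHIS or IHIS codebooks, and the name `harmonize` is illustrative.

```python
# Hypothetical translation table for a marital-status-like variable.
# Keys: survey year -> {original code used that year: integrated code}.
# None of these values are actual NHIS or IHIS codes.
TRANSLATION_TABLE: dict[int, dict[int, int]] = {
    # In this invented year, "Never married" was coded 1 and the
    # remaining categories were numbered after it.
    1970: {1: 50, 2: 10, 3: 20, 4: 30, 5: 40},
    1990: {1: 10, 2: 20, 3: 30, 4: 40, 5: 50},
    2000: {1: 10, 2: 20, 3: 30, 4: 40, 5: 50},
}

# Integrated codes and value labels (the first 2 columns of such a table).
VALUE_LABELS: dict[int, str] = {
    10: "Married",
    20: "Widowed",
    30: "Divorced",
    40: "Separated",
    50: "Never married",
}


def harmonize(year: int, original_code: int) -> int:
    """Return the integrated code for an original code recorded in `year`."""
    try:
        return TRANSLATION_TABLE[year][original_code]
    except KeyError:
        raise ValueError(f"no mapping for code {original_code} in {year}") from None
```

With this table, `harmonize(1970, 1)` and `harmonize(1990, 5)` both return 50, so respondents who were coded differently in different years receive the same integrated label, "Never married".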
For other variables, it is impossible to construct a single uniform classification without losing information. Some years provide more detail than others, and using the “lowest common denominator” of all years would discard important information. In these cases, we construct composite coding schemes. The first digit(s) of the code provide information available across all years. The next digits provide additional information available in a broad subset of years. For example, the variable for self-reported main race uses a variety of coding schemes over time. Table 2 shows selected sections of the translation table, and eFigure 1 (available with the online version of this article) shows a partial screenshot of the IHIS codes available for this variable. The utility of composite coding can be seen in the case of Asian or Pacific Islanders. Codes 400 through 430 are all subclassifications of this group. Using the first digit (4) provides the broadest comparable grouping over time. Using the second digit distinguishes Asian (41) from Pacific Islander (42). Researchers interested in even finer distinctions can use the 3-digit values.
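Because composite codes nest information by digit position, choosing a level of detail amounts to integer truncation. The Python sketch below assumes integer codes of exactly 3 digits, as in the race example; the specific values 411, 412, and 421 are hypothetical members of the 400 to 430 range, not documented IHIS codes, and `collapse` is an illustrative helper name.

```python
def collapse(code: int, digits: int) -> int:
    """Truncate a 3-digit composite code to its leading `digits` digits.

    digits=1 keeps the grouping comparable across all years, digits=2 keeps
    the intermediate detail, and digits=3 keeps the full code.
    """
    if not 100 <= code <= 999:
        raise ValueError("expected a 3-digit composite code")
    if digits not in (1, 2, 3):
        raise ValueError("digits must be 1, 2, or 3")
    return code // 10 ** (3 - digits)


# Hypothetical 3-digit race codes within the 400-430 range described above.
codes = [411, 412, 421, 400]
broad = [collapse(c, 1) for c in codes]         # [4, 4, 4, 4]
intermediate = [collapse(c, 2) for c in codes]  # [41, 41, 42, 40]
```

An analysis spanning all years would use the 1-digit grouping; one restricted to years that distinguish Asian from Pacific Islander respondents could use the 2-digit grouping instead.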
Harmonization also allows us to address sample design discontinuities. We constructed the IHIS survey design variables to be usable whether researchers examine data from 1 year or from many years. We employed the concatenated design period pooling approach suggested by Korn and Graubard27 for pooling data from 1 survey across multiple years and sample designs. Strata and primary sampling unit (PSU) variables are constructed so that researchers need not recode them, regardless of which years of data are analyzed.
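One practical requirement when pooling across design periods is that stratum codes must not collide: a stratum numbered 5 under one design is unrelated to stratum 5 under the next, and merging them would misrepresent the design. The Python sketch below shows one way to build collision-free identifiers under stated assumptions; it is not the IHIS construction. The period boundaries are taken from the design reports in the reference list, while `MAX_STRATUM` and the helper names are hypothetical.

```python
# Design periods as named in the cited design reports (1973-84, 1985-94,
# 1995-2004); treat these boundaries as illustrative.
DESIGN_PERIODS: list[tuple[int, int]] = [(1973, 1984), (1985, 1994), (1995, 2004)]

# Assumed upper bound on within-period stratum codes (hypothetical).
MAX_STRATUM = 10_000


def design_period(year: int) -> int:
    """Return the index of the design period that contains `year`."""
    for index, (first, last) in enumerate(DESIGN_PERIODS):
        if first <= year <= last:
            return index
    raise ValueError(f"{year} is outside the listed design periods")


def pooled_stratum(year: int, stratum: int) -> int:
    """Encode (design period, stratum) as one integer unique across periods."""
    if not 0 <= stratum < MAX_STRATUM:
        raise ValueError("stratum code out of assumed range")
    return design_period(year) * MAX_STRATUM + stratum


def pooled_psu(year: int, stratum: int, psu: int) -> tuple[int, int]:
    """PSUs are nested within strata, so key each PSU by its pooled stratum."""
    return (pooled_stratum(year, stratum), psu)
```

Here stratum 5 in 1990 becomes 10005 and stratum 5 in 2000 becomes 20005, so variance estimation software sees them as different strata, while years within the same design period share stratum identifiers.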
The Integrated Health Interview Series provides documentation designed to enhance researchers’ ability to work with the data as a cross-sectional time series. Along with detailed descriptions of each variable, the IHIS also includes general documentation (such as user notes about the original NHIS source data, sample design, and sampling weights) and guidance on analysis and variance estimation.
For each variable, we consulted the survey descriptions, codebooks, questionnaires, and interviewer instructions for each year, as well as documented discussions of survey methodology, concepts, and sample selection.28–33 We then reorganized this information so that all essential facts about a variable across years appear in a single narrative.
Variable-specific documentation covers the meaning of a variable, years available, universe definitions, codes, and frequency distributions. We also discuss the cross-temporal comparability of each variable: we note potential problems in combining multiple years of the variable and offer suggestions for maximizing comparability and for choosing appropriate weights. Each variable description also links to related variables. When variables cannot be fully harmonized, the documentation explains the limitations of comparability.
Response category codes and frequencies can be accessed from the variable availability grid or from a link on each variable description. Codes and frequencies are displayed so researchers can see which categories are represented in each available year. Codes can also be viewed in "case count" format, with the unweighted sample size for each response category displayed by year. eFigure 2 in the online appendix shows both the category availability view and the case count view for one IHIS variable.
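The case count view is essentially an unweighted cross-tabulation of response category by year. The Python sketch below assumes person-level records supplied as (year, integrated code) pairs; the data and helper names are hypothetical. In this simplification a zero count cannot distinguish a category that was not offered in a year from one that was offered but drew no respondents, a distinction the IHIS documentation handles separately.

```python
from collections import Counter


def case_count_grid(records, years, categories):
    """Unweighted case counts by response category (rows) and year (columns).

    `records` is an iterable of (year, integrated_code) pairs, one per person.
    """
    counts = Counter(records)
    return {
        category: {year: counts.get((year, category), 0) for year in years}
        for category in categories
    }


def availability_grid(grid):
    """Mark a category as represented in a year if it has at least 1 case."""
    return {
        category: {year: n > 0 for year, n in row.items()}
        for category, row in grid.items()
    }


# Hypothetical microdata: (year, integrated marital-status code).
records = [(1990, 10), (1990, 10), (1990, 50), (2000, 10), (2000, 40)]
grid = case_count_grid(records, years=[1990, 2000], categories=[10, 40, 50])
# grid == {10: {1990: 2, 2000: 1}, 40: {1990: 0, 2000: 1}, 50: {1990: 1, 2000: 0}}
```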
User-friendly data dissemination is an integral component of this effort. We distribute the data and documentation through a Web-based data access system that is available free of charge (http://www.ihis.us). For each data extract, the researcher specifies the file type (hierarchical or rectangular), data format (SAS, Stata, or SPSS), years to be included, and variables for analysis. The researcher can add a short description to the extract; each extract is numbered and listed with its description in the researcher's personal download history, which is accessible at every subsequent log-in. When the data extract is ready for download, an e-mail alerts the researcher.
For each extract submitted, the researcher downloads a compressed ASCII data file, an extract-specific codebook, and a command file with syntax to convert the ASCII data to the preferred file format. At any time, the researcher can return to the personalized data download Web page to revise or resubmit a previously created extract request. Users who encounter difficulties can e-mail IHIS user support for assistance.
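The supplied command files do the conversion for SAS, Stata, and SPSS users, but the underlying operation is slicing each fixed-width ASCII record at the column positions given in the codebook. The Python sketch below assumes gzip compression and a made-up 4-variable layout; the variable names, column positions, and helper names are hypothetical and are not an actual IHIS layout. Real command files also handle details this sketch ignores, such as missing-value codes.

```python
import gzip
from typing import Iterator

# Hypothetical column layout: variable -> (start, end), 1-based and inclusive,
# the way fixed-width codebooks usually list positions. Not a real IHIS layout.
LAYOUT: dict[str, tuple[int, int]] = {
    "YEAR": (1, 4),
    "SERIAL": (5, 10),
    "PERNUM": (11, 12),
    "MARSTAT": (13, 14),
}


def parse_record(line: str, layout: dict[str, tuple[int, int]] = LAYOUT) -> dict[str, int]:
    """Slice one fixed-width ASCII record into integer-valued variables."""
    return {name: int(line[start - 1:end]) for name, (start, end) in layout.items()}


def read_extract(path: str) -> Iterator[dict[str, int]]:
    """Yield parsed records from a gzip-compressed fixed-width file (assumed format)."""
    with gzip.open(path, "rt", encoding="ascii") as handle:
        for line in handle:
            if line.strip():
                yield parse_record(line.rstrip("\r\n"))
```

For example, the record `20001234560210` parses to YEAR 2000, SERIAL 123456, PERNUM 2, and MARSTAT 10.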
Project Status and Future Plans
The Integrated Health Interview Series currently consists of more than 1000 integrated variables selected from NHIS data files for 1969 to the present. However, this is only a fraction of the total variables available in NHIS. Additional variables are steadily being added. Furthermore, users can link variables from the original NHIS public use files to an IHIS data extract. A user note with guidance on linking and syntax files for merging are provided on the Web site.
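Linking amounts to a left join of original NHIS records onto IHIS extract records by a shared person identifier. The Python sketch below assumes the identifier is the combination of 3 fields; the key names `YEAR`, `SERIAL`, and `PERNUM`, the variable `XVAR`, and the helper `link` are hypothetical. The actual linking keys and procedure are described in the IHIS user note, which this sketch does not reproduce.

```python
# Hypothetical linking keys; the real keys are given in the IHIS user note.
KEYS = ("YEAR", "SERIAL", "PERNUM")


def link(ihis_rows, nhis_rows, keys=KEYS):
    """Left-join original NHIS variables onto IHIS extract rows by key.

    Every IHIS row is kept. Non-key NHIS columns are added when a matching key
    exists; if a column name appears in both files, the NHIS value wins.
    """
    index = {}
    for row in nhis_rows:
        key = tuple(row[k] for k in keys)
        if key in index:
            raise ValueError(f"duplicate NHIS key {key}")
        index[key] = row

    linked = []
    for row in ihis_rows:
        match = index.get(tuple(row[k] for k in keys), {})
        extra = {name: value for name, value in match.items() if name not in keys}
        linked.append({**row, **extra})
    return linked


ihis = [
    {"YEAR": 2000, "SERIAL": 1, "PERNUM": 1, "MARSTAT": 10},
    {"YEAR": 2000, "SERIAL": 2, "PERNUM": 1, "MARSTAT": 50},
]
nhis = [{"YEAR": 2000, "SERIAL": 1, "PERNUM": 1, "XVAR": 7}]
linked = link(ihis, nhis)
```

Rejecting duplicate keys on the NHIS side guards against a silent many-to-one merge that would otherwise attach the wrong person's values.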
IHIS data can be used by population health researchers in numerous ways. The data can document trends over time in the incidence of conditions such as diabetes, the prevalence of health behaviors such as smoking, or disparities in cancer screening. Relationships between exposures such as socioeconomic indicators (eg, education, income, or poverty status) and outcomes such as cause-specific mortality can be examined for the years 1986–2000. Pooling multiple years of IHIS data can provide sufficient sample size to study small subgroups such as American Indians, farmers, or new immigrants.
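When several annual files are pooled to study a small subgroup, a common convention for NHIS data, assumed here rather than taken from the IHIS documentation, is to divide each sampling weight by the number of years pooled so that weighted totals represent annual averages. The Python sketch below shows that adjustment for point estimates only; the weights are invented, `pooled_estimates` is a hypothetical helper, and standard errors would additionally require the pooled strata and PSU variables.

```python
def pooled_estimates(rows, years_pooled):
    """Annual-average weighted totals and prevalence from pooled years.

    `rows` holds (weight, has_condition) pairs for the subgroup, drawn from
    `years_pooled` annual files. Dividing the weights by the number of years
    turns pooled totals into annual averages; the prevalence ratio itself is
    unchanged by this rescaling.
    """
    if years_pooled < 1:
        raise ValueError("years_pooled must be at least 1")
    population = sum(weight for weight, _ in rows) / years_pooled
    cases = sum(weight for weight, has_condition in rows if has_condition) / years_pooled
    return {"cases": cases, "population": population, "prevalence": cases / population}


# Hypothetical subgroup records pooled from 2 annual files.
rows = [(1000.0, True), (3000.0, False), (2000.0, True), (2000.0, False)]
result = pooled_estimates(rows, years_pooled=2)
# result == {"cases": 1500.0, "population": 4000.0, "prevalence": 0.375}
```

The rescaling matters for estimated counts of people; a prevalence computed from the same rows is identical with or without it.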
We are adding links between the original NHIS survey text and each IHIS variable description. We are also extending the time series backward to include the public use files for 1963–1968, which National Center for Health Statistics staff have recently released. In addition, we are developing new features for the Web site, including online tabulation and a search engine to help users locate variables efficiently.
Old health survey data are not simply of historical interest; rather, they are essential tools for understanding the dynamics of population health. Our goal with IHIS is to reduce barriers to cross-temporal analysis of 4 decades of NHIS data. These integrated, well-documented, and easily accessible health data provide an important new resource for epidemiologic and population health research.
1. Gentleman JF, Pleis JR. The National Health Interview Survey: an overview. Eff Clin Pract. 2002;5(Suppl 3):E2.
2. Adams PF, Dey AN, Vickerie JL. Summary health statistics for the U.S. population: National Health Interview Survey, 2005. Vital Health Stat.
3. US Department of Health and Human Services. Healthy People 2010. 2nd ed. With Understanding and Improving Health and Objectives for Improving Health. 2 vols. US Government Printing Office. Available at: http://www.healthypeople.gov/document/
4. Agency for Healthcare Research and Quality. 2006 National Healthcare Quality Report. Rockville, MD: US Department of Health and Human Services; 2006. AHRQ Pub. No. 07-0013.
5. Calle EE, Flanders WD, Thun MJ, et al. Demographic predictors of mammography and Pap smear screening in US women. Am J Public Health.
6. Hiatt RA, Klabunde C, Breen N, et al. Cancer screening practices from National Health Interview Surveys: past, present, and future. J Natl Cancer Inst.
7. Swan J, Breen N, Coates RJ, et al. Progress in cancer screening practices in the United States: results from the 2000 National Health Interview Survey. Cancer. 2003;97:1528–1540.
8. Dombkowski KJ, Lantz PM, Freed GL. The need for surveillance of delay in age-appropriate immunization. Am J Prev Med. 2002;23:36–42.
9. Pleis JR, Gentleman JF. Using the National Health Interview Survey: time trends in influenza vaccinations among targeted adults. Eff Clin Pract. 2002;5(Suppl 3):E3.
10. Thompson FE, Midthune D, Subar AF, et al. Dietary intake estimates in the National Health Interview Survey, 2000: methodology, results, and interpretation. J Am Diet Assoc. 2005;105:352–363; quiz 487.
11. Patterson BH, Harlan LC, Block G, et al. Food choices of whites, blacks, and Hispanics: data from the 1987 National Health Interview Survey. Nutr Cancer.
12. Ahmed NU, Smith GL, Flores AM, et al. Racial/ethnic disparity and predictors of leisure-time physical activity among U.S. men. Ethn Dis. 2005;15:40–52.
13. Caspersen CJ, Christenson GM, Pollard RA. Status of the 1990 physical fitness and exercise objectives–evidence from NHIS 1985. Public Health Rep.
14. Kruger J, Galuska DA, Serdula MK, et al. Physical activity profiles of U.S. adults trying to lose weight: NHIS 1998. Med Sci Sports Exerc. 2005;37:364–368.
15. Gilpin EA, Pierce JP. Demographic differences in patterns in the incidence of smoking cessation: United States 1950–1990. Ann Epidemiol.
16. Lee DJ, LeBlanc W, Fleming LE, et al. Trends in US smoking rates in occupational groups: the National Health Interview Survey 1987–1994. J Occup Environ Med. 2004;46:538–548.
17. Shopland DR, Brown C. Toward the 1990 objectives for smoking: measuring the progress with 1985 NHIS data. Public Health Rep. 1987;102:68–73.
18. Cubbin C, LeClere FB, Smith GS. Socioeconomic status and injury mortality: individual and neighbourhood determinants. J Epidemiol Community Health.
19. Lowry R, Kann L, Collins JL, et al. The effect of socioeconomic status on chronic disease risk behaviors among US adolescents. JAMA. 1996;276:792–797.
20. Kaufman JS, Long AE, Liao Y, et al. The relation between income and mortality in U.S. blacks and whites. Epidemiology.
21. Newacheck PW, Stein RE, Bauman L, Hung YY. Disparities in the prevalence of disability between black and white children. Arch Pediatr Adolesc Med. 2003;157:244–248.
22. Silver EJ, Stein RE. Access to care, unmet health needs, and poverty status among children with and without chronic conditions. Ambul Pediatr. 2001;1:314–320.
23. Brackbill RM, Cameron LL, Behrens V. Prevalence of chronic diseases and impairments among US farmers, 1986–1990. Am J Epidemiol.
24. McGee DL, Liao Y, Cao G, et al. Self-reported health status and mortality in a multiethnic US cohort. Am J Epidemiol.
25. Mugge RH. The varied uses of health statistics. Public Health Rep. 1981;96:228–230.
26. Ruggles S, Sobek M, Alexander T, et al. Integrated Public Use Microdata Series: Version 3.0 [Machine-readable database]. Minneapolis, MN: Minnesota Population Center [producer and distributor]. Available at: http://usa.ipums.org/usa/
27. Korn EL, Graubard BI. Analysis of Health Surveys. New York: John Wiley & Sons; 1999.
28. Kovar MG, Poe GS. The National Health Interview Survey design, 1973–84, and procedures, 1975–83. Vital Health Stat.
29. Massey JT, Moore TF, Parsons VL, et al. The 1985–94 NHIS sample design. Vital Health Stat.
30. National Center for Health Statistics. National Health Interview Survey: research for the 1995–2004 redesign. Vital Health Stat.
31. Health interview survey 1957–1974. Vital Health Stat.
32. Khrisanopulo MP. Health Survey Procedure. Concepts, Questionnaire Development, and Definitions in the Health Interview Survey. Vital Health Stat.
33. Design and estimation for the National Health Interview Survey, 1995–2004. Vital Health Stat.