Accurate data on health determinants and outcomes are necessary for communities to identify and invest in opportunities for community health improvement. These data are especially important for the numerically small but culturally and geographically diverse American Indian and Alaska Native (AI/AN) population in the United States. Nationally, AI/AN face significant health disparities that have resulted in the lowest life expectancy when compared with other race/ethnicity groups.1 Historically, efforts to address these disparities have been hampered by a lack of accurate and timely data on health outcomes and determinants.
Over the past 2 decades, there has been an increase in tribes' capacity to access, collect, and utilize data for health improvement activities. These gains in tribal public health infrastructure are due in part to efforts by Tribal Epidemiology Centers.2 As tribal epidemiology has gained recognition as a field of public health inquiry, there has been increased awareness of the challenges involved with describing the distribution of health and its determinants for AI/AN communities.
Given the many complex issues surrounding AI/AN identity,3 determining who “counts” as AI/AN is a key challenge to obtaining accurate AI/AN health statistics. One aspect of AI/AN identity with implications for data collection and analysis is that almost 50% of self-identified AI/AN consider themselves multiracial.4,5 Many health data collection systems allow for the selection of multiple races (typically checkboxes for white, black, AI/AN, Asian, Native Hawaiian/Pacific Islander), ethnicity (Hispanic/Latino or not Hispanic/Latino), and the option to write in more specific information on racial/ethnic background or tribal affiliation. Individuals of Hispanic/Latino ethnicity are often combined into a single group, regardless of the race indicated on the record. Typically, researchers use 1 of 3 approaches when analyzing records with multiple-race responses: exclude these records from analysis and only report on single-race groups; combine all multiple-race responses into a single “multiracial” group; or combine single- and multiple-race responses into “alone or in combination with another race” groups.6 An alternative to these approaches is to use bridged race estimates, which utilize regression methods to bridge multiple-race information into single-race categories.6,7 Given the large proportion of AI/AN who identify as multiracial, the method used to handle multiple-race responses can have a significant effect on morbidity and mortality estimates.
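The practical difference between the 3 approaches can be illustrated with a minimal sketch. The records and function names below are hypothetical and are not drawn from the data analyzed in this article; each record is simply the set of race checkboxes selected.

```python
AIAN = "AI/AN"

def single_race_only(races):
    """Approach 1: keep single-race records; multiple-race records are excluded (None)."""
    return next(iter(races)) if len(races) == 1 else None

def multiracial_group(races):
    """Approach 2: pool every multiple-race record into one 'Multiracial' group."""
    return next(iter(races)) if len(races) == 1 else "Multiracial"

def alone_or_in_combination(races):
    """Approach 3: a record counts toward every race it mentions."""
    return sorted(races)

def aian_counts(records):
    """Number of records counted as AI/AN under approaches 1, 2, and 3."""
    a1 = sum(single_race_only(r) == AIAN for r in records)
    a2 = sum(multiracial_group(r) == AIAN for r in records)
    a3 = sum(AIAN in alone_or_in_combination(r) for r in records)
    return a1, a2, a3

# Hypothetical records for illustration only.
records = [
    {"AI/AN"},
    {"AI/AN", "White"},
    {"White"},
    {"AI/AN", "Black"},
]

counts = aian_counts(records)
print(counts)  # (1, 1, 3)
```

In this toy example, only the "alone or in combination" approach counts the 2 multiple-race records toward the AI/AN total, which is why the choice of approach matters for a population in which multiple-race identification is common.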
Another challenge to obtaining accurate AI/AN counts is the misclassification of AI/AN as white or other races in public health data systems.8,9 AI/AN misclassification has been documented in death records,9,10 cancer registries,11 hospital discharge data systems,12 and other data sources. AI/AN misclassification results in the underestimation of AI/AN morbidity and mortality and obscures health disparities experienced by AI/AN communities.3 Probabilistic record linkages, in which registries of known AI/AN individuals are matched against public health data systems, are one method to identify and correct misclassified AI/AN records and thereby produce more accurate health statistics for AI/AN populations.9,11
This article examines the effects of definition and misclassification of AI/AN on estimates of all-cause and cause-specific mortality counts and rates for AI/AN in Washington.
We obtained death certificates for Washington State residents who died in 2015 and 2016 from the Washington Department of Health's Center for Health Statistics. The death certificates were linked to the Northwest Tribal Registry (NTR) to identify misclassified AI/AN death records. The NTR is a demographic data set of known AI/AN patients who received services at Indian Health Service (IHS), tribal, or urban Indian health clinics in Idaho, Oregon, and Washington. Patients are included in the NTR either because their tribe has designated them as eligible to receive services through the Indian health care system or, for patients who receive care at urban Indian health clinics, because they self-identified as AI/AN when registering to receive services. The NTR is updated every 1 to 2 years using the most recent patient registration data available from the IHS, tribal clinics, and urban Indian health clinics within the Portland Area of the IHS (Idaho, Oregon, and Washington). The NTR includes all patients who were registered at the IHS, tribal, or urban Indian health clinics that provided data for the NTR at the time the data for the registry were pulled. The NTR includes patients from tribes outside of Idaho, Oregon, and Washington, patients who no longer live within the 3-state area, and patients who have not recently received services at an IHS, tribal, or urban Indian clinic. The NTR does not include information on AI/AN individuals who are tribal members but do not seek health care services within the Indian health care system, individuals who self-identify as AI/AN but are not eligible for services through the Indian health care system, or AI/AN patients who are registered at a tribal clinic that does not provide data for the NTR. The NTR does not contain health information but does contain personal identifiers (including name, Social Security Number, date of birth, and address).
We used probabilistic linkage methods to match the NTR to Washington death certificates in order to identify individuals with records in both data sets. Probabilistic linkage methods allow for the matching of records across data sets even when there are minor discrepancies in the matching variables (eg, misspelled names, transposed digits in Social Security Number, or birth dates).12,13 The probabilistic linkages between the NTR and Washington death certificates were conducted as 2 separate linkage events. For both linkages, NTR records were matched to death records using Social Security Number, name (last, first, and middle), date of birth, and sex. The first linkage was conducted with 2015 death certificates (N = 54 640) using the LinkPlus v.2.0 software and identified 702 matched records. The second linkage was conducted with 2016 death certificates (N = 54 780) using the MatchPro v.1.1 software and identified 705 matches. The 2015 and 2016 records were combined into a single data set for this analysis.
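To illustrate how a probabilistic linkage can tolerate a minor discrepancy, the following is a minimal sketch of a generic Fellegi-Sunter style match weight. It is not a description of how LinkPlus or MatchPro score record pairs; the m- and u-probabilities, the threshold, the field names, and the records are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical per-field probabilities (assumptions, not estimated from any data):
#   m = P(field agrees | the two records belong to the same person)
#   u = P(field agrees | the two records belong to different people)
FIELD_PROBS = {
    "ssn":        (0.95, 0.0001),
    "last_name":  (0.95, 0.01),
    "first_name": (0.90, 0.02),
    "dob":        (0.97, 0.001),
    "sex":        (0.99, 0.5),
}

THRESHOLD = 10.0  # hypothetical cutoff above which a pair is treated as a match

def match_weight(rec_a, rec_b):
    """Sum of log2 agreement/disagreement weights over the matching fields."""
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # a missing value contributes no evidence either way
        if a == b:
            total += math.log2(m / u)              # agreement weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight
    return total

# Fabricated records for illustration only.
registry = {"ssn": "123456789", "last_name": "DOE", "first_name": "JANE",
            "dob": "1950-01-31", "sex": "F"}
death_same_person = dict(registry, ssn="123465789")  # two SSN digits transposed
death_other_person = {"ssn": "987654321", "last_name": "ROE", "first_name": "MARY",
                      "dob": "1948-07-04", "sex": "F"}

w_same = match_weight(registry, death_same_person)
w_other = match_weight(registry, death_other_person)
print(round(w_same, 1), round(w_other, 1))
```

In this sketch, the pair with the transposed Social Security Number still scores well above the threshold because name, date of birth, and sex agree, whereas the unrelated pair receives a negative total weight.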
Outcomes and case definitions
We obtained counts and age-adjusted rates for the following outcomes:
- All-cause mortality: All deaths reported for Washington residents during 2015-2016, including a small number of records that did not have a cause of death specified in the underlying cause of death field.
- Major cardiovascular disease deaths: Washington resident deaths during 2015-2016 with the underlying cause of death in the following International Classification of Diseases, Tenth Revision (ICD-10) range: I00-I78.
- Cancer deaths: Washington resident deaths during 2015-2016 with the underlying cause of death in the following ICD-10 range: C00-C97.
- Unintentional injury deaths: Washington resident deaths during 2015-2016 with the underlying cause of death in the following ICD-10 range: V01-X59, Y85, Y86.
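The case definitions above can be expressed as a short classification routine. This is a sketch under stated assumptions: the function name is hypothetical, and the underlying cause of death is assumed to be an ICD-10 code stored with or without a decimal point (eg, I21.9 or I219). Every death counts toward all-cause mortality regardless of the category returned here.

```python
def classify_cause(icd10_code):
    """Map an ICD-10 underlying cause code to one of the 3 cause-specific outcomes.

    Returns None when the code is missing or falls outside the listed ranges.
    """
    if not icd10_code:
        return None
    # Reduce the code to its 3-character category (letter plus 2 digits).
    category = icd10_code.strip().upper().replace(".", "")[:3]
    if len(category) < 3:
        return None
    # String comparison works because categories share a fixed 3-character format.
    if "I00" <= category <= "I78":
        return "Major cardiovascular disease"
    if "C00" <= category <= "C97":
        return "Cancer"
    if "V01" <= category <= "X59" or category in ("Y85", "Y86"):
        return "Unintentional injury"
    return None

examples = ["I21.9", "C349", "W19", "X60", "Y86", ""]
labels = [classify_cause(code) for code in examples]
print(labels)
```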
AI/AN definitions and calculation of rates
We calculated counts and age-adjusted rates for 6 “definitions” of AI/AN. Prelinkage AI/AN counts were obtained using information from the race and ethnicity checkbox fields and the National Center for Health Statistics (NCHS) bridged race field originally provided in the Washington death certificate statistical file. Postlinkage AI/AN counts were calculated as prelinkage counts plus records that matched with the NTR but had no mention of AI/AN background in the race/ethnicity fields. Postlinkage non-Hispanic white (NHW) counts and rates were included for comparison. Table 1 shows the IDs and numerator descriptions for each of the definitions compared in this analysis.
We used NCHS vintage 2017 postcensal bridged race estimates for the years 2015 and 2016 as population denominators for calculating rates.14 NCHS bridged race estimates are updated yearly, available at the state and county levels, and are often used when calculating rates by race/ethnicity. However, NCHS bridged race estimates are known to overestimate the Hispanic AI/AN population15 and therefore result in underestimated rates for the total (Hispanic and non-Hispanic combined) AI/AN population.
Age-adjusted mortality rates (shown as deaths per 100 000 population) were directly standardized to the 2000 US standard population. Ninety-five percent confidence intervals (CIs) were calculated using the gamma distribution for AI/AN rates to account for small cell sizes.16 The normal approximation method was used to calculate 95% CIs around NHW rates. Rate ratios and 95% CIs were calculated for each definition of AI/AN using the most conservative definition of AI/AN and postlinkage NHW (definition IDs A and G, respectively) as reference groups. All analyses were conducted in SAS v. 9.4.
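The rate calculations described above can be sketched as follows. This is an illustration only and not the SAS code used for the analysis. The age groups, death counts, populations, and standard weights are hypothetical (they are not the 2000 US standard population weights), and the chi-square quantiles needed for the gamma interval are obtained with the Wilson-Hilferty approximation so that the sketch needs only the Python standard library; a statistical package would compute them exactly.

```python
import math
from statistics import NormalDist

def chi2_quantile_wh(p, df):
    """Wilson-Hilferty approximation to the chi-square quantile (an assumption
    made to keep this sketch dependency-free)."""
    z = NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * math.sqrt(a)) ** 3

def age_adjusted_rate(deaths, population, std_weights, per=100_000, alpha=0.05):
    """Directly standardized rate with normal and gamma-based confidence intervals.

    Assumes at least one death overall and standard weights that sum to 1.
    """
    # w_i = standard population share / stratum population
    w = [s / n for s, n in zip(std_weights, population)]
    y = sum(wi * d for wi, d in zip(w, deaths))        # adjusted rate per person
    v = sum(wi * wi * d for wi, d in zip(w, deaths))   # estimated variance
    w_max = max(w)

    # Normal approximation interval.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    normal_ci = (y - z * math.sqrt(v), y + z * math.sqrt(v))

    # Gamma-based interval (Fay and Feuer).
    gamma_lo = v / (2 * y) * chi2_quantile_wh(alpha / 2, 2 * y * y / v)
    y_u, v_u = y + w_max, v + w_max ** 2
    gamma_hi = v_u / (2 * y_u) * chi2_quantile_wh(1 - alpha / 2, 2 * y_u * y_u / v_u)

    return {
        "rate": y * per,
        "normal_ci": (normal_ci[0] * per, normal_ci[1] * per),
        "gamma_ci": (gamma_lo * per, gamma_hi * per),
    }

# Hypothetical data for 3 age groups (illustration only).
result = age_adjusted_rate(
    deaths=[5, 20, 60],
    population=[40_000, 30_000, 10_000],
    std_weights=[0.50, 0.35, 0.15],
)
print(result)
```

With small death counts, the gamma interval is asymmetric around the rate and has a higher upper limit than the normal approximation, which is the motivation for using it with small cell sizes.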
Mortality counts and rates by definition of AI/AN
Based on the most conservative definition of AI/AN (A), there were 1502 deaths among AI/AN in Washington State during 2015-2016 (Table 2). The corresponding age-adjusted all-cause mortality rate for this definition was 833.8 deaths per 100 000 (95% CI, 788.7-881.1). Using the NCHS bridged race definition of AI/AN (B) increased the mortality count by 252 records (16.8%) and did not appreciably change the mortality rate when compared with A. The largest increase in records resulted from changing from a single-race definition (A) to a multiple-race definition (C). Definition C resulted in the identification of 637 additional AI/AN records (42.4% increase) compared with A and resulted in a statistically significant increase of 47% in the mortality rate.
Based on the least conservative definition of AI/AN (F), there were 2473 deaths among AI/AN in Washington State during 2015-2016, a 971-record (64.6%) increase over the count for the most conservative definition (A). The age-adjusted all-cause mortality rate for F (1231.0 deaths/100 000; 95% CI, 1176.9-1287.3) was 48% higher than the rate for A.
Inclusion of Hispanic ethnicity AI/AN records
The inclusion of Hispanic AI/AN records (and a small number of records where ethnicity was unknown) resulted in the identification of 90 additional records between definitions C and D and 107 additional records between definitions E and F. However, while the definitions that included Hispanic AI/AN had higher counts, the age-adjusted rates calculated for these definitions were lower than the rates for the definitions that were restricted to non-Hispanic AI/AN. This discrepancy is due to the previously discussed overestimation of Hispanic AI/AN in the NCHS's bridged race denominators, which resulted in the underestimation of rates for definitions D and F.
Comparing prelinkage definitions with their corresponding postlinkage definitions allowed us to evaluate the effect of AI/AN misclassification on mortality counts and rates. There were between 227 (difference between E and C) and 244 (difference between F and D) misclassified AI/AN records identified through the probabilistic linkage with the NTR. Correcting misclassified records resulted in postlinkage mortality rates that were approximately 11% higher than prelinkage mortality rates.
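The counts and differences reported in the text can be cross-checked with simple arithmetic. The sketch below uses only figures stated in this article; the short labels in the comments are inferred from the text, because Table 1 is not reproduced here.

```python
# All-cause death counts by AI/AN definition, rebuilt from the reported differences.
A = 1502        # most conservative (single-race) definition
B = A + 252     # NCHS bridged race definition
C = A + 637     # prelinkage multiple-race definition
D = C + 90      # prelinkage multiple-race, Hispanic AI/AN included
F = 2473        # least conservative (postlinkage) definition
E = F - 107     # postlinkage multiple-race, non-Hispanic

def pct_increase(new, base):
    """Percentage increase of `new` over `base`, rounded to 1 decimal place."""
    return round(100 * (new - base) / base, 1)

misclassified_non_hispanic = E - C   # records added by the linkage (E vs C)
misclassified_all = F - D            # records added by the linkage (F vs D)
rate_ratio_F_vs_A = 1231.0 / 833.8   # age-adjusted all-cause rates for F and A

print(B, C, D, E)
print(pct_increase(B, A), pct_increase(C, A), pct_increase(F, A))
print(misclassified_non_hispanic, misclassified_all, round(rate_ratio_F_vs_A, 2))
```

The reconstructed differences (227 and 244 records) and percentage increases (16.8%, 42.4%, and 64.6%) agree with the values reported above, and the ratio of the F and A rates corresponds to the 48% difference.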
Disparities between AI/AN and NHW
Regardless of definition used, AI/AN in Washington had significantly higher all-cause mortality rates than NHW in the state. The disparity in mortality rates between AI/AN and NHW ranged from 20% using the most conservative definition of AI/AN (A) to 96% using the postlinkage non-Hispanic multiple-race definition of AI/AN (E).
The patterns and magnitude of disparities experienced by AI/AN varied by cause of death and definition of AI/AN (Figure). For cardiovascular diseases, the leading cause of death for AI/AN and NHW in the state, the mortality rate for AI/AN was statistically similar to the rate for NHW for definition A and 91% higher for definition E. For cancer, the second leading cause of death for both AI/AN and NHW in the state, AI/AN had lower (though not significantly so) mortality rates than NHW for definitions A and B but significantly higher rates for definitions C to F. For unintentional injuries, the third leading cause of death among AI/AN in Washington, AI/AN had significantly higher mortality rates than NHW regardless of the definition used. The magnitude of the disparity in unintentional injury mortality rates ranged from 2.1 times higher for definition B to 3.4 times higher for definition E.
Discussion and Conclusion
Our findings show that the methods used to identify AI/AN in Washington death certificates resulted in significant differences in mortality rate estimates and disparity measures. The choice of reporting single-race versus multiple-race AI/AN had the most consequential effect, producing a nearly 50% difference in all-cause mortality counts and rates. The correction of misclassified AI/AN records resulted in relatively small but statistically significant increases in AI/AN mortality rates. The inclusion of Hispanic AI/AN resulted in modest increases in AI/AN numerator counts but in significantly underestimated mortality rates, because the NCHS bridged race estimates used for population denominators inflate the Hispanic AI/AN population.
Our findings are subject to at least 2 limitations. First, the NTR used for probabilistic linkages represents only a subset of the total AI/AN population in the Northwest. The NTR only includes records of individuals who receive services from the IHS, tribal, or urban Indian health clinics in Idaho, Oregon, and Washington and does not include AI/AN individuals who are not eligible for these services or who seek care outside the Indian health care system. At best, the NTR represents approximately 76.2% of the Northwest AI/AN population and is known to underrepresent younger age groups and urban Indian populations.17 Therefore, we expect the true number of misclassified AI/AN and postlinkage AI/AN mortality rates to be higher than those reported in this analysis. The second limitation is related to the use of NCHS bridged race estimates as population denominators. As mentioned previously, these estimates are known to inflate the Hispanic AI/AN population in the United States and therefore result in underestimates of rates that include Hispanic AI/AN. These artificially high estimates are due to changes in data collection between the 2000 and 2010 Censuses and the bridging algorithms used to reassign Hispanics who self-identified as “some other race” in the 2010 Census into specific race groups.18 Furthermore, the bridged race estimates likely overestimate the rate denominator for the single-race definition (A) and underestimate the rate denominator for the multiple-race definitions (C-F). Additional study is needed to understand the effect of these discrepancies on AI/AN rate calculations and to identify possible alternate sources of data for AI/AN population denominators.
Tribes and urban Indian communities require accurate and representative health statistics to demonstrate need and seek resources to address the disparities experienced by their communities. Furthermore, the question of who “counts” as American Indian and/or Alaska Native has cultural, political, financial, and health implications for AI/AN individuals and communities.3 Public health researchers and practitioners should consult with AI/AN communities to better understand the complex issues around AI/AN identity and apply this knowledge when identifying AI/AN individuals in health and administrative data sets.
Implications for Policy & Practice
Epidemiologists who wish to describe AI/AN health outcomes and disparities should consider 3 key findings when identifying AI/AN in public health surveillance data sets:
- Reporting single-race (AI/AN alone) versus multiple-race (AI/AN alone or in combination with other races) can result in large and statistically significant differences in estimates of AI/AN morbidity, mortality, and disparities.
- Misclassification of AI/AN as white or other races results in the underestimation of AI/AN morbidity and mortality and obscures disparities experienced by this population. Utilizing data that have been corrected for AI/AN misclassification through probabilistic linkages provides more accurate and robust estimates of disease burden for AI/AN communities.
- The methods used to identify AI/AN individuals in health and administrative data sets, and the limitations associated with these methods, should be clearly described when reporting data on AI/AN communities.
1. Arias E, Xu J, Jim MA. Period life tables for the non-Hispanic American Indian and Alaska Native population, 2007-2009. Am J Public Health. 2014;104(suppl 3):S312–S319.
3. Haozous EA, Strickland CJ, Palacios JF, Solomon TGA. Blood politics, ethnic identity, and racial misclassification among American Indians and Alaska Natives. J Environ Public Health. 2014;2014:321604.
6. Liebler CA, Halpern-Manners A. A practical approach to using multiple-race response data: a bridging method for public-use microdata. Demography. 2008;45(1):143–155.
7. Ingram DD, Parker JD, Schenker N, et al. United States Census 2000 population with bridged race categories. Vital Health Stat 2. 2003;135:1–55.
8. Bauer UE, Plescia M. Addressing disparities in the health of American Indian and Alaska Native People: the importance of improved public health data. Am J Public Health. 2014;104(suppl 3):S255–S257.
9. Anderson RN, Copeland G, Hayes JM. Linkages to improve mortality data for American Indians and Alaska Natives: a new model for death reporting? Am J Public Health. 2014;104(suppl 3):S258–S262.
10. Stehr-Green P, Bettles J, Robertson LD. Effect of racial/ethnic misclassification of American Indians and Alaskan Natives on Washington State death certificates, 1989-1997. Am J Public Health. 2002;92(3):443–444.
11. Puukka E, Stehr-Green P, Becker TM. Measuring the health status gap for American Indians/Alaska Natives: getting closer to the truth. Am J Public Health. 2005;95(5):838–843.
12. Bigback KM, Hoopes M, Dankovchik J, et al. Using record linkage to improve race data quality for American Indians and Alaska Natives in two Pacific Northwest state hospital discharge databases. Health Serv Res. 2015;50(suppl 1):1390–1402.
13. Sayers A, Ben-Shlomo Y, Blom AW, Steele F. Probabilistic record linkage. Int J Epidemiol. 2016;45(3):954–964.
14. National Center for Health Statistics. Vintage 2017 postcensal estimates of the resident population of the United States (April 1, 2010, July 1, 2010-July 1, 2017), by year, county, single-year of age, bridged race, Hispanic origin, and sex. Prepared under a collaborative arrangement with the US Census Bureau. nchs/nvss/bridged_race.htm. Published June 21, 2018. Accessed October 13, 2018.
15. Jim MA, Arias E, Seneca DS, et al. Racial misclassification of American Indians and Alaska Natives by Indian Health Service Contract Health Service Delivery Area. Am J Public Health. 2014;104(suppl 3):S295–S302.
16. Fay MP, Feuer EJ. Confidence intervals for directly standardized rates: a method based on the gamma distribution. Stat Med. 1997;16:791–801.
18. Edwards BK, Noone AM, Mariotto AB, et al. Annual report to the nation on the status of cancer, 1975-2010, featuring prevalence of comorbidity and impact on survival among persons with lung, colorectal, breast, or prostate cancer. Cancer. 2013;120(9):1290–1314.