Research Report

Revealing the Unequal Burden of COVID-19 by Income, Race/Ethnicity, and Household Crowding: US County Versus Zip Code Analyses

Chen, Jarvis T. ScD; Krieger, Nancy PhD

Journal of Public Health Management and Practice: January/February 2021 - Volume 27 - Issue - p S43-S56
doi: 10.1097/PHH.0000000000001263

Abstract

As communities in the United States grapple with the COVID-19 pandemic, there is an urgent need for real-time data to better understand how vulnerable populations are affected, including who is most at risk of infection, developing serious illness, and dying.1,2 Informed by an awareness of the critical importance of racial/ethnic, economic, and gender inequalities in shaping individuals' exposure to and ability to protect themselves from severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), as well as their ability to practice physical distancing, maintain economic well-being, and access health care when sick, there have been increasing calls for improved data to provide an evidence base for action.1–3 Descriptive epidemiology, vital to informing efforts to distribute resources, develop treatments, and coordinate public policy, is hampered by the paucity of surveillance data disaggregated by important social variables such as race/ethnicity and socioeconomic position. For example, while data from the COVID-19 tracking project4 suggest that an increasing number of states are starting to report COVID-19 cases disaggregated by race/ethnicity, among those states that do, substantial proportions (typically ≥50%) of cases are of unknown or missing race/ethnicity. As of May 6, 2020, tables on the US Centers for Disease Control and Prevention's (CDC's) own Web page reporting COVID-19 cases by race/ethnicity indicated that 54.2% of reported cases were missing race/ethnicity information.5 Furthermore, to our knowledge, no states are reporting COVID-19 cases or deaths by measures of individual socioeconomic position, though US death certificates routinely collect information on decedent's education.1,2,6

The Public Health Disparities Geocoding Project was established to address the absence of socioeconomic data in most routinely collected public health surveillance data.6–11 By geocoding health records and linking them to US Census–derived data on neighborhood socioeconomic variables, we have shown that these methods can be used to compute valid estimates of socioeconomic gradients in health and, moreover, that area-based socioeconomic measures (ABSMs) can be used to characterize the influence of neighborhood socioeconomic context on health above and beyond their association with individual socioeconomic position. We have applied these techniques to a wide range of health outcomes, from birth to death and including cancer and infectious diseases, and have shown that the resulting estimates of socioeconomic gradients are valid and robust. The series of articles6–11 stemming from this project has been cited more than 3500 times and has had a demonstrable impact on US public health surveillance systems and health research more generally.

To respond to the urgent need in the United States for documentation of stark social inequities in who is affected by the COVID-19 pandemic, in this article, we quantify disparities in COVID-19 death rates in the United States by county-level sociodemographic attributes. For finer levels of geographic aggregation, we use zip code ABSMs to analyze data on (a) cumulative incidence of confirmed cases in Illinois and (b) cumulative incidence of positive test results in New York City (NYC). Our intention is to illustrate how state and local health departments can easily implement these types of analyses using freely available US Census data, and we provide tabular and graphic summaries of these social inequities to contribute to discussions on policies and interventions.

Methods

COVID-19 data sources

US county death data

We obtained publicly available data on COVID-19 deaths at the county level from USA Facts,12 which aggregates data from multiple sources including state and local health departments. Our analytic sample consisted of 68 656 COVID-19 deaths reported in 3142 US counties (excluding territories) as of May 5, 2020.

Illinois data on confirmed cases at the zip code level

We obtained zip code tabulation area (ZCTA)–level data on confirmed cases in Illinois from the lookup tool developed by the Illinois Department of Public Health and The Chicago Reporter.13 As of May 5, 2020, Illinois reported probable and confirmed cases following the CDC's interim case definition.14 ZCTAs are US Census–defined geographic units that correspond to areas roughly covered by US Postal Service (USPS) zip codes.15 As noted by the Illinois data source, infections among incarcerated populations are not fully represented in these data. Thus, our analytic sample consisted of 63 901 confirmed cases reported in 461 Illinois ZCTAs as of May 5, 2020.

NYC data on positive tests at the zip code level

We obtained ZCTA-level data on positive PCR tests in NYC from the NYC Department of Health and Mental Hygiene's COVID-19 GitHub repository.16 Our analytic sample consisted of 171 615 positive tests reported in 177 NYC ZCTAs as of May 5, 2020.

Population denominator and area attributes data

We extracted county- and ZCTA-level population counts and sociodemographic attributes from the American Community Survey (ACS) 2014-2018 5-year estimates17 using the tidycensus package in R.18 ABSMs included percentage of persons below poverty, percent household crowding, percent population of color (defined as the proportion of population who are not White non-Hispanic), and a measure of racialized economic segregation, the Index of Concentration at the Extremes (ICE).19 This measure captures the extent to which the population in a given area is concentrated at either extreme of a social metric and ranges from −1 (everyone in the worst category) to 1 (everyone in the best category). We set the extremes for this ICE as follows: (a) high-income White population versus (b) low-income Black population, with high versus low income defined as being in the top versus bottom 20% of the US household income distribution in 2014-2018.19 We defined categories of ABSMs using a priori cut points for percent below poverty (0%-4.9%, 5%-9.9%, 10%-14.9%, 15%-19.9%, and 20%-100%)7,8 and quintile cut points based on the distribution of county-level attributes in the United States or the distribution of ZCTA attributes within Illinois and NYC. Definitions, source variables from the ACS, and categorical cut points are presented in Supplemental Appendix Table A1 (available at https://links.lww.com/JPHMP/A712).
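
As an illustration of the ICE described above, the following minimal Python sketch computes the index for a single area as the count in the advantaged extreme minus the count in the disadvantaged extreme, divided by the total. The function name, argument names, and household counts are hypothetical and used only for illustration, and the choice of denominator (all households with reported income) is an assumption; the replication code referenced in this article is written in R.

```python
def ice(n_privileged: float, n_deprived: float, n_total: float) -> float:
    """Index of Concentration at the Extremes for one area.

    n_privileged: count in the advantaged extreme
                  (here, high-income White households)
    n_deprived:   count in the disadvantaged extreme
                  (here, low-income Black households)
    n_total:      all households with reported income in the area
                  (assumed denominator)
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return (n_privileged - n_deprived) / n_total


# Made-up household counts for three hypothetical areas.
print(ice(1000, 0, 1000))   # everyone in the advantaged extreme
print(ice(0, 1000, 1000))   # everyone in the disadvantaged extreme
print(ice(150, 50, 1000))   # mixed area, slightly tilted toward advantage
```

The two boundary cases return 1 and −1, which is the range stated in the text; areas in which neither extreme predominates fall near 0.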

Statistical methods

Drawing on the methods of the Public Health Disparities Geocoding Project,8 we merged cumulative counts of confirmed cases, positive tests, and deaths at the reported level of geography with population denominators and ABSMs. We then aggregated over areas within the defined categories described earlier. Since no data source currently reports disaggregated data by age and county or ZCTA, we computed crude outcome rates per 100 000 person-years by ABSM categories rather than age-standardized rates. Person-time denominators were computed by multiplying ACS population counts by the 104 days from January 22 to May 5, 2020, and converting to person-years. To quantify absolute and relative disparities, we computed rate differences and rate ratios (RRs), setting the reference category to the socially most advantaged groups. In all analyses, we computed confidence limits assuming a Poisson distribution for event counts.20 R code to replicate these analyses is available on the Public Health Disparities Geocoding Project Web site.8 We additionally present sensitivity analyses for (a) age- and sex-adjusted RRs by county characteristics based on indirect standardization; (b) deaths restricted to counties with 50 or more confirmed cases; and (c) US COVID-19 cases as of May 5, 2020, by county characteristics, in the Supplemental Appendix A2 (available at https://links.lww.com/JPHMP/A713).
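
The aggregation and rate calculation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' R code: the function names are hypothetical, the year length of 365.25 days is an assumption (it reproduces the crude county rates in Table 4), and the five input rows are copied from Table 4, so the pooled rates printed here describe those five counties only and are not the Table 1 estimates.

```python
from bisect import bisect_right
from collections import defaultdict

DAYS_AT_RISK = 104       # January 22 to May 5, 2020, as stated in the text
DAYS_PER_YEAR = 365.25   # assumption; reproduces the crude rates in Table 4

# A priori poverty cut points from the Methods section.
POVERTY_CUTS = [5, 10, 15, 20]
POVERTY_LABELS = ["0%-4.9%", "5%-9.9%", "10%-14.9%", "15%-19.9%", "20%-100%"]


def poverty_category(pct_below_poverty):
    """Map a percent-below-poverty value to its a priori category."""
    return POVERTY_LABELS[bisect_right(POVERTY_CUTS, pct_below_poverty)]


def rate_per_100k_py(events, population):
    """Crude rate per 100 000 person-years over the 104-day window."""
    person_years = population * DAYS_AT_RISK / DAYS_PER_YEAR
    return events / person_years * 100_000


# (county, deaths, population, % below poverty), copied from Table 4.
counties = [
    ("Kings, NY", 5745, 2_600_747, 21),
    ("Queens, NY", 5460, 2_298_513, 13),
    ("Bronx, NY", 3671, 1_437_872, 29),
    ("Nassau, NY", 1818, 1_356_564, 6),
    ("Suffolk, NY", 1296, 1_487_901, 7),
]

# Sum deaths and population over counties within each poverty category.
totals = defaultdict(lambda: [0, 0])
for _name, deaths, population, pct_poverty in counties:
    category = poverty_category(pct_poverty)
    totals[category][0] += deaths
    totals[category][1] += population

# One pooled rate per category: summed events over summed person-time.
stratum_rates = {c: rate_per_100k_py(d, p) for c, (d, p) in totals.items()}
for category in POVERTY_LABELS:
    if category in stratum_rates:
        print(category, round(stratum_rates[category], 1))

# Single-county check against Table 4 (Kings County, NY).
print(round(rate_per_100k_py(5745, 2_600_747), 1))  # 775.8
```

Because events and person-time are summed before dividing, each stratum rate is a population-weighted average of its member areas, so populous counties carry more weight than small ones.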

Human subjects: This research is exempt from IRB review (decedents only; 45 CFR 46.102(f)).

Results

County-level COVID-19 death in the United States

As shown in Figures 1A-1D and Table 1, the highest COVID-19 death rates per 100 000 person-years were consistently observed among those living in the most disadvantaged versus most advantaged counties in relation to percent poverty (143.2 vs 83.3), ICE (113.0 vs 108.8), percent crowding (124.4 vs 48.2), and percent population of color (127.7 vs 25.9). The gradient was especially stark for percent population of color, whereby populations living in counties where 61% to 100% of the population is of color experienced a COVID-19 death rate 4.9-fold greater than those living in counties where 0% to 17.2% of the population is of color. However, socioeconomic gradients were not always monotonic, most notably for ICE, for which residents of counties in the most advantaged quintile experienced a death rate only slightly lower than residents of counties in the most disadvantaged quintile. In contrast, residents of counties in the middle quintile of ICE experienced the lowest COVID-19 death rates (37.1 per 100 000).

FIGURE 1: US COVID-19 Deaths per 100 000 Person-Years by County Area-Based Socioeconomic Measures as of May 5, 2020
TABLE 1 - US COVID-19 Death Rate per 100 000 Person-Years, Rate Differences, and Rate Ratios by County Characteristics as of May 5, 2020 (3142 Counties, 68 656 Deaths, 322 903 030 Population)

| County Characteristic | Number of Counties | Number of Deaths | Population^a | Death Rate per 100 000 Person-Years (95% CI) | Rate Difference per 100 000 Person-Years (95% CI) | Rate Ratio (95% CI) |
| --- | --- | --- | --- | --- | --- | --- |
| % Poverty (categories) | | | | | | |
| 0%-4.9% | 41 | 1 067 | 4 495 932 | 83.3 (78.3, 88.4) | 0.0 (reference) | 1.00 (reference) |
| 5%-9.9% | 558 | 17 855 | 71 157 744 | 88.1 (86.8, 89.4) | 4.8 (−0.4, 9.9) | 1.06 (0.99, 1.12) |
| 10%-14.9% | 1023 | 18 895 | 108 820 591 | 61.0 (60.1, 61.9) | −22.4 (−27.4, −17.3) | 0.73 (0.69, 0.78) |
| 15%-19.9% | 860 | 15 990 | 101 961 251 | 55.1 (54.2, 55.9) | −28.3 (−33.3, −23.2) | 0.66 (0.62, 0.70) |
| 20%-100% | 659 | 14 849 | 36 428 205 | 143.2 (140.9, 145.5) | 59.8 (54.3, 65.3) | 1.72 (1.61, 1.83) |
| Index of Concentration at the Extremes (high-income White households vs low-income Black households) | | | | | | |
| Q1: [−0.522, 0.114]^b | 974 | 19 939 | 61 949 063 | 113.0 (111.5, 114.6) | 4.2 (2.1, 6.4) | 1.04 (1.02, 1.06) |
| Q2: [0.114, 0.159] | 701 | 10 991 | 64 942 197 | 59.4 (58.3, 60.5) | −49.4 (−51.2, −47.5) | 0.55 (0.53, 0.56) |
| Q3: [0.159, 0.205] | 696 | 6 879 | 65 113 354 | 37.1 (36.2, 38.0) | −71.7 (−73.4, −70.0) | 0.34 (0.33, 0.35) |
| Q4: [0.205, 0.283] | 515 | 10 297 | 64 525 801 | 56.0 (55.0, 57.1) | −52.8 (−54.6, −50.9) | 0.52 (0.50, 0.53) |
| Q5: [0.283, 0.536] | 255 | 20 550 | 66 333 308 | 108.8 (107.3, 110.3) | 0.0 (reference) | 1.00 (reference) |
| % Crowding (quintiles) | | | | | | |
| Q1: [0%, 1.47%] | 1089 | 8953 | 65 273 354 | 48.2 (47.2, 49.2) | 0.0 (reference) | 1.00 (reference) |
| Q2: (1.47%, 2.12%] | 709 | 10 331 | 64 425 866 | 56.3 (55.2, 57.4) | 8.1 (6.7, 9.6) | 1.17 (1.14, 1.20) |
| Q3: (2.12%, 3.06%] | 656 | 14 499 | 63 510 499 | 80.2 (78.9, 81.5) | 32.0 (30.4, 33.6) | 1.66 (1.62, 1.71) |
| Q4: (3.06%, 4.91%] | 443 | 12 242 | 65 654 959 | 65.5 (64.3, 66.6) | 17.3 (15.8, 18.8) | 1.36 (1.32, 1.40) |
| Q5: (4.91%, 49.3%] | 244 | 22 630 | 63 913 934 | 124.4 (122.7, 126.0) | 76.2 (74.3, 78.1) | 2.58 (2.52, 2.65) |
| % Population of color | | | | | | |
| Q1: [0%, 17.2%] | 1636 | 4804 | 65 219 939 | 25.9 (25.1, 26.6) | 0.0 (reference) | 1.00 (reference) |
| Q2: (17.2%, 30.2%] | 549 | 10 570 | 65 166 967 | 57.0 (55.9, 58.1) | 31.1 (29.8, 32.4) | 2.20 (2.13, 2.28) |
| Q3: (30.2%, 44.3%] | 468 | 15 687 | 69 376 152 | 79.4 (78.2, 80.7) | 53.5 (52.1, 55.0) | 3.07 (2.97, 3.17) |
| Q4: (44.3%, 61%] | 280 | 14 974 | 60 922 155 | 86.3 (84.9, 87.7) | 60.5 (58.9, 62.0) | 3.34 (3.23, 3.45) |
| Q5: (61%, 100%] | 209 | 22 621 | 62 217 817 | 127.7 (126.0, 129.4) | 101.8 (100.0, 103.6) | 4.94 (4.78, 5.09) |

Abbreviation: CI, confidence interval.
^a Population totals can vary due to counties with missing area-based socioeconomic measures.
^b [a, b] indicates interval where value x is a ≤ x ≤ b; (a, b] indicates interval where value x is a < x ≤ b.

ZCTA-level confirmed COVID-19 cases in Illinois

As shown in Figures 2A-2D and Table 2, we observed consistent and monotonic socioeconomic gradients in the cumulative incidence of COVID-19 diagnoses for all ABSMs using finer resolution ZCTA-level data in Illinois. The highest rates of COVID-19 confirmed cases per 100 000 person-years were observed among the most disadvantaged compared with most advantaged categories of percent poverty (2817.5 vs 1093.3), ICE (3454.2 vs 1084.0), percent crowding (3557.3 vs 1160.0), and percent population of color (4027.5 vs 782.4). The steepest gradient was observed by quintiles of percent population of color, with residents of ZCTAs in the highest quintile experiencing a rate 5.2 times that of residents in the lowest quintile.

FIGURE 2: Illinois COVID-19 Confirmed Cases per 100 000 Person-Years by Zip Code Area-Based Socioeconomic Measures as of May 5, 2020. Abbreviation: ZCTA, zip code tabulation area.
TABLE 2 - Illinois Rate of Confirmed COVID-19 Cases per 100 000 Person-Years, Rate Differences, and Rate Ratios by ZCTA Characteristics as of May 5, 2020 (461 ZCTAs, 63 901 Confirmed Cases, 11 383 197 Population)

| ZCTA Characteristic | Number of ZCTAs | Number of Confirmed Cases | Population^a | Confirmed Case Rate per 100 000 (95% CI) | Rate Difference per 100 000 (95% CI) | Rate Ratio (95% CI) |
| --- | --- | --- | --- | --- | --- | --- |
| % Poverty (categories) | | | | | | |
| 0%-4.9% | 72 | 4 912 | 1 577 939 | 1093.3 (1062.7, 1123.8) | 0.0 (reference) | 1.00 (reference) |
| 5%-9.9% | 159 | 15 584 | 3 556 778 | 1538.8 (1514.6, 1562.9) | 445.5 (406.6, 484.5) | 1.41 (1.36, 1.45) |
| 10%-14.9% | 90 | 13 235 | 2 309 648 | 2012.5 (1978.2, 2046.8) | 919.2 (873.3, 965.2) | 1.84 (1.78, 1.90) |
| 15%-19.9% | 60 | 10 085 | 1 458 799 | 2427.9 (2380.5, 2475.3) | 1334.7 (1278.3, 1391.1) | 2.22 (2.15, 2.30) |
| 20%-100% | 80 | 19 896 | 2 480 033 | 2817.5 (2778.4, 2856.7) | 1724.2 (1674.6, 1773.9) | 2.58 (2.50, 2.66) |
| Missing ZCTA | | 189 | | | | |
| Index of Concentration at the Extremes (high-income White households vs low-income Black households) | | | | | | |
| Q1: [−1, 0.0375]^b | 73 | 22 090 | 2 245 980 | 3454.2 (3408.6, 3499.7) | 2370.2 (2318.1, 2422.3) | 3.19 (3.10, 3.27) |
| Q2: (0.0375, 0.166] | 95 | 12 447 | 2 304 293 | 1897.1 (1863.7, 1930.4) | 813.0 (771.2, 854.9) | 1.75 (1.70, 1.80) |
| Q3: (0.166, 0.27] | 101 | 13 182 | 2 276 391 | 2033.7 (1999.0, 2068.4) | 949.7 (906.7, 992.7) | 1.88 (1.82, 1.93) |
| Q4: (0.27, 0.396] | 100 | 8 919 | 2 263 673 | 1383.8 (1355.0, 1412.5) | 299.7 (261.5, 338.0) | 1.28 (1.24, 1.32) |
| Q5: (0.396, 0.721] | 91 | 7 051 | 2 284 383 | 1084.0 (1058.7, 1109.3) | 0.0 (reference) | 1.00 (reference) |
| Missing ZCTA | | 212 | | | | |
| % Crowding (quintiles) | | | | | | |
| Q1: [0%, 0.971%] | 133 | 7 584 | 2 296 157 | 1160.0 (1133.9, 1186.1) | 0.0 (reference) | 1.00 (reference) |
| Q2: (0.971%, 1.7%] | 99 | 5 900 | 2 221 492 | 932.7 (908.9, 956.5) | −227.2 (−262.6, −191.9) | 0.80 (0.78, 0.83) |
| Q3: (1.7%, 2.64%] | 84 | 10 763 | 2 307 613 | 1638.1 (1607.1, 1669.0) | 478.1 (437.6, 518.6) | 1.41 (1.37, 1.45) |
| Q4: (2.64%, 4.46%] | 81 | 16 319 | 2 272 821 | 2521.7 (2483.0, 2560.3) | 1361.7 (1315.0, 1408.3) | 2.17 (2.12, 2.23) |
| Q5: (4.46%, 14.3%] | 64 | 23 146 | 2 285 114 | 3557.3 (3511.5, 3603.2) | 2397.3 (2344.6, 2450.1) | 3.07 (2.99, 3.15) |
| Missing ZCTA | | 189 | | | | |
| % Population of color | | | | | | |
| Q1: [0.685%, 18%] | 146 | 5 104 | 2 290 991 | 782.4 (761.0, 803.9) | 0.0 (reference) | 1.00 (reference) |
| Q2: (18%, 28.6%] | 89 | 6 636 | 2 268 672 | 1027.3 (1002.6, 1052.0) | 244.9 (212.1, 277.6) | 1.31 (1.27, 1.36) |
| Q3: (28.6%, 44.5%] | 94 | 9 652 | 2 291 717 | 1479.2 (1449.6, 1508.7) | 696.7 (660.2, 733.2) | 1.89 (1.83, 1.96) |
| Q4: (44.5%, 71.8%] | 71 | 16 598 | 2 288 835 | 2546.8 (2508.1, 2585.6) | 1764.4 (1720.1, 1808.7) | 3.26 (3.15, 3.36) |
| Q5: (71.8%, 99%] | 61 | 25 722 | 2 242 982 | 4027.5 (3978.3, 4076.7) | 3245.1 (3191.4, 3298.8) | 5.15 (5.00, 5.30) |
| Missing ZCTA | | 189 | | | | |

Abbreviations: CI, confidence interval; ZCTA, zip code tabulation area.
^a Population totals can vary due to counties with missing area-based socioeconomic measures.
^b [a, b] indicates interval where value x is a ≤ x ≤ b; (a, b] indicates interval where value x is a < x ≤ b.

ZCTA-level positive COVID-19 tests in NYC

Strong socioeconomic gradients were observed with finer resolution ZCTA-level data in NYC in relation to the rate of positive tests (Figures 3A-3D; Table 3). These unequal patterns persist even in the context of NYC's substantially greater rates of infection. The population rate of positive COVID-19 tests per 100 000 person-years was highest among residents in the most disadvantaged versus most advantaged categories of ICE (8024.6 vs 4771.1), percent crowding (8441.5 vs 5616.4), and percent population of color (8919.2 vs 5645.0). Similarly, the highest rate of positive tests was observed among residents living in ZCTAs in the 2 most disadvantaged categories of ZCTA-level poverty (15%-19.9% poverty: 7651.7 and 20%-100% poverty: 7411.7 vs 4561.4 in the most advantaged category, 0%-4.9% poverty). These contrasts correspond to RRs of 1.68 and 1.62.

FIGURE 3: New York City COVID-19–Positive Tests per 100 000 Person-Years by Zip Code Area-Based Socioeconomic Measures as of May 5, 2020. Abbreviation: ZCTA, zip code tabulation area.
TABLE 3 - New York City Rate of Positive COVID-19 Tests per 100 000 Person-Years, Rate Differences, and Rate Ratios by ZCTA Characteristics as of May 5, 2020 (177 ZCTAs, 171 615 Positive Tests, 8 433 176 Population)

| ZCTA Characteristic | Number of ZCTAs | Number of Positive Tests | Population^a | Rate per 100 000 (95% CI) | Rate Difference per 100 000 (95% CI) | Rate Ratio (95% CI) |
| --- | --- | --- | --- | --- | --- | --- |
| % Poverty (categories) | | | | | | |
| 0%-4.9% | 9 | 1 690 | 130 121 | 4561.4 (4343.9, 4778.9) | 0.0 (reference) | 1.00 (reference) |
| 5%-9.9% | 41 | 26 941 | 1 506 286 | 6281.5 (6206.5, 6356.5) | 1720.1 (1490.1, 1950.2) | 1.38 (1.31, 1.45) |
| 10%-14.9% | 48 | 41 280 | 2 100 915 | 6900.6 (6834.1, 6967.2) | 2339.2 (2111.8, 2566.7) | 1.51 (1.44, 1.59) |
| 15%-19.9% | 27 | 31 368 | 1 439 746 | 7651.7 (7567.0, 7736.4) | 3090.3 (2856.9, 3323.7) | 1.68 (1.60, 1.76) |
| 20+% | 52 | 68 716 | 3 256 108 | 7411.7 (7356.3, 7467.1) | 2850.3 (2625.9, 3074.7) | 1.62 (1.55, 1.71) |
| Missing ZCTA | | 1 620 | | | | |
| Index of Concentration at the Extremes (high-income White households vs low-income Black households) | | | | | | |
| Q1: [−0.385, −0.102]^b | 29 | 38 587 | 1 688 793 | 8024.6 (7944.5, 8104.6) | 3253.5 (3152.2, 3354.7) | 1.68 (1.65, 1.71) |
| Q2: (−0.102, 0.0212] | 30 | 39 324 | 1 749 736 | 7893.0 (7815.0, 7971.0) | 3121.9 (3022.3, 3221.5) | 1.65 (1.63, 1.68) |
| Q3: (0.0212, 0.141] | 29 | 37 107 | 1 623 732 | 8026.0 (7944.3, 8107.6) | 3254.9 (3152.4, 3357.4) | 1.68 (1.65, 1.71) |
| Q4: (0.141, 0.29] | 39 | 32 180 | 1 692 826 | 6676.2 (6603.3, 6749.2) | 1905.1 (1809.4, 2000.8) | 1.40 (1.38, 1.42) |
| Q5: (0.29, 0.7] | 50 | 22 797 | 1 678 089 | 4771.1 (4709.2, 4833.0) | 0.0 (reference) | 1.00 (reference) |
| Missing ZCTA | | 1 620 | | | | |
| % Crowding (quintiles) | | | | | | |
| Q1: [0.00942, 0.0478] | 48 | 27 327 | 1 708 791 | 5616.4 (5549.8, 5683.0) | 0.0 (reference) | 1.00 (reference) |
| Q2: [0.0478, 0.0698] | 37 | 32 369 | 1 688 963 | 6730.8 (6657.5, 6804.1) | 1114.4 (1015.3, 1213.4) | 1.20 (1.18, 1.22) |
| Q3: [0.0698, 0.0978] | 38 | 34 018 | 1 679 177 | 7114.9 (7039.3, 7190.5) | 1498.5 (1397.7, 1599.2) | 1.27 (1.25, 1.29) |
| Q4: [0.0978, 0.138] | 31 | 36 056 | 1 682 708 | 7525.3 (7447.7, 7603.0) | 1908.9 (1806.6, 2011.2) | 1.34 (1.32, 1.36) |
| Q5: [0.138, 0.297] | 23 | 40 225 | 1 673 537 | 8441.5 (8359.0, 8524.0) | 2825.0 (2719.0, 2931.1) | 1.50 (1.48, 1.53) |
| Missing ZCTA | | 1 620 | | | | |
| % Population of color (quintiles) | | | | | | |
| Q1: [0.0839, 0.402] | 44 | 27 303 | 1 698 653 | 5645.0 (5578.0, 5711.9) | 0.0 (reference) | 1.00 (reference) |
| Q2: [0.402, 0.584] | 38 | 27 575 | 1 678 144 | 5770.9 (5702.8, 5839.0) | 125.9 (30.4, 221.4) | 1.02 (1.01, 1.04) |
| Q3: [0.584, 0.826] | 38 | 35 079 | 1 708 248 | 7212.0 (7136.5, 7287.4) | 1567.0 (1466.1, 1667.9) | 1.28 (1.26, 1.30) |
| Q4: [0.826, 0.957] | 29 | 38 403 | 1 708 722 | 7893.2 (7814.2, 7972.1) | 2248.2 (2144.7, 2351.7) | 1.40 (1.38, 1.42) |
| Q5: [0.957, 0.992] | 28 | 41 635 | 1 639 409 | 8919.2 (8833.6, 9004.9) | 3274.3 (3165.5, 3383.0) | 1.58 (1.56, 1.60) |
| Missing ZCTA | | 1 620 | | | | |

Abbreviations: CI, confidence interval; ZCTA, zip code tabulation area.
^a Population totals can vary due to counties with missing area-based socioeconomic measures.
^b [a, b] indicates interval where value x is a ≤ x ≤ b; (a, b] indicates interval where value x is a < x ≤ b.

Conclusions

The unequal burden of COVID-19

Linkage of available COVID-19 surveillance data to ABSMs at the county and zip code levels reveals a substantially unequal burden of COVID-19 outcomes by county and ZCTA economic and racial/ethnic characteristics. These strikingly inequitable disease distributions, heretofore obscured by the lack of disaggregated reporting by race/ethnicity and socioeconomic position in publicly available US COVID-19 surveillance data, speak to the urgent need for improved testing, surveillance and monitoring, data transparency, and targeting of public health interventions and resources for community protection and health care.

Looking across the United States, people living in the most impoverished, crowded, and racially and economically polarized counties are experiencing substantially elevated rates of COVID-19 infection and death. Of note, counties are presently the most granular geographic level at which comprehensive data on COVID-19 for all parts of the United States are being reported. We focus on death because, unlike confirmed case counts (see Supplemental Appendix Table A4, available at https://links.lww.com/JPHMP/A715), the number of deaths is less likely to be affected by well-documented inconsistencies in testing eligibility, procedures, and availability.21 Reported deaths due to COVID-19 nonetheless may not capture the potentially large burden of mortality due to unexplained deaths among individuals who were not tested for SARS-CoV-2, who might have died at home or in nursing facilities, or who might have died of a preexisting condition whose disease course was exacerbated by coronavirus infection.21,22 If individuals living in disadvantaged counties were less likely to have been tested for SARS-CoV-2, to have accessed health care given infection, or generally less likely to have had their death recorded as COVID-19 related, our analyses would underestimate the magnitude of inequities across categories of ABSMs.

Despite these data limitations, we saw strong associations of COVID-19 death rates with all 4 county-level ABSMs. These inequities are fundamentally related to the material circumstances in which people live and work. For example, individuals living in low-income areas may be more likely to be classified as “essential workers” who are less able to practice physical distancing and may not have access to personal protective equipment.1–3,23,24 “Essential workers” also include many health care professionals, including nurses, home health aides, and nursing home employees, whose risk of occupational exposure to SARS-CoV-2 is high and who live in working class areas.1–3,24 Moreover, we noted a strong association with county percent crowding, defined as the proportion of households in an area with more than 1 person per room (excluding bathrooms and hallways)17,25; by this definition, a 1-bedroom apartment with 1 bedroom, 1 dining room, 1 living room, and 1 kitchen would be categorized as crowded only if 5 or more persons were in the household.
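
The crowding definition above (more than 1 person per room) can be made concrete with a short Python sketch. The function names and the list of households are hypothetical; only the threshold comes from the text.

```python
def is_crowded(persons, rooms):
    """A household is crowded when it has more than 1 person per room."""
    return persons / rooms > 1


def percent_crowded(households):
    """Percent of households that are crowded.

    households: list of (persons, rooms) pairs, where rooms excludes
    bathrooms and hallways.
    """
    crowded = sum(1 for persons, rooms in households if is_crowded(persons, rooms))
    return 100 * crowded / len(households)


# The 4-room apartment from the text: bedroom, dining room, living room, kitchen.
print(is_crowded(4, 4))   # False: exactly 1 person per room is not crowded
print(is_crowded(5, 4))   # True: 5 persons in 4 rooms

# Made-up area with four households.
print(percent_crowded([(2, 4), (5, 4), (3, 3), (6, 3)]))  # 50.0
```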

Socioeconomic gradients in COVID-19 death rates by county poverty and ICE exhibited more complex patterns. This likely reflects the contribution of particularly large counties with high levels of transmission. Depending on the stratum of county-level ABSM in which it falls, a county with a large number of deaths will tend to dominate the computed rate for that stratum. Table 4 shows the top 25 counties by cumulative count of deaths, along with population and ABSM estimates. These counties include all 5 boroughs of NYC as well as surrounding areas, with high death counts in New York State, New Jersey, and Connecticut. The list also includes other large US urban areas with substantial transmission. Together, these 25 counties account for 57% of reported COVID-19 deaths in the United States. Examination of this list suggests that the higher death rates observed in the 5% to 9.9% category of county poverty and the most advantaged quintile of ICE reflect the contribution of counties such as Nassau, Suffolk, Westchester, and New York (Manhattan) counties, New York, to these strata. County-level analyses also notably gloss over important socioeconomic heterogeneity within counties, which may further contribute to the more complex socioeconomic gradients seen here. Also potentially relevant are changing class dynamics of COVID-19 infections, whereby early cases may have arisen from travelers who could afford international travel, followed by community transmission burdening essential workers and working class communities with crowded housing.

TABLE 4 - Deaths, Population, Crude Death Rate per 100 000 Person-Years, and County-Level Area-Based Measures for 25 Counties With the Largest Cumulative Death Counts as of May 5, 2020

| FIPS Code | County Name | State | Deaths | Population | Crude Death Rate per 100 000 Person-Years | % Below Poverty | Index of Concentration at the Extremes (White/Black Race + Income) | % Crowding (>1 Person per Room) | % Population of Color |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 36047 | Kings County | NY | 5745 | 2 600 747 | 775.8 | 21% | 0.07 | 10% | 64% |
| 36081 | Queens County | NY | 5460 | 2 298 513 | 834.3 | 13% | 0.117 | 10% | 75% |
| 36005 | Bronx County | NY | 3671 | 1 437 872 | 896.6 | 29% | −0.065 | 12% | 91% |
| 36061 | New York County | NY | 2260 | 1 632 480 | 486.2 | 17% | 0.289 | 6% | 53% |
| 26163 | Wayne County | MI | 1945 | 1 761 382 | 387.8 | 23% | −0.022 | 2% | 50% |
| 17031 | Cook County | IL | 1922 | 5 223 719 | 129.2 | 15% | 0.138 | 3% | 58% |
| 36059 | Nassau County | NY | 1818 | 1 356 564 | 470.7 | 6% | 0.412 | 3% | 39% |
| 34013 | Essex County | NJ | 1319 | 793 555 | 583.7 | 16% | 0.072 | 4% | 69% |
| 06037 | Los Angeles County | CA | 1313 | 10 098 052 | 45.7 | 16% | 0.168 | 11% | 74% |
| 36103 | Suffolk County | NY | 1296 | 1 487 901 | 305.9 | 7% | 0.416 | 3% | 32% |
| 34003 | Bergen County | NJ | 1261 | 929 999 | 476.2 | 7% | 0.356 | 2% | 43% |
| 36119 | Westchester County | NY | 1116 | 968 815 | 404.6 | 9% | 0.336 | 4% | 46% |
| 25017 | Middlesex County | MA | 1028 | 1 595 192 | 226.3 | 8% | 0.4 | 2% | 28% |
| 09001 | Fairfield County | CT | 935 | 944 348 | 347.7 | 9% | 0.379 | 3% | 38% |
| 34017 | Hudson County | NJ | 870 | 668 631 | 457.0 | 16% | 0.175 | 8% | 71% |
| 09003 | Hartford County | CT | 804 | 894 730 | 315.6 | 11% | 0.271 | 2% | 38% |
| 26125 | Oakland County | MI | 772 | 1 250 843 | 216.8 | 9% | 0.273 | 1% | 28% |
| 34039 | Union County | NJ | 768 | 553 066 | 487.7 | 10% | 0.227 | 5% | 60% |
| 36085 | Richmond County | NY | 758 | 474 101 | 561.5 | 13% | 0.293 | 4% | 38% |
| 34023 | Middlesex County | NJ | 667 | 826 698 | 283.4 | 9% | 0.238 | 4% | 56% |
| 34031 | Passaic County | NJ | 663 | 504 041 | 462.0 | 17% | 0.22 | 7% | 58% |
| 26099 | Macomb County | MI | 647 | 868 704 | 261.6 | 11% | 0.205 | 2% | 20% |
| 42101 | Philadelphia County | PA | 627 | 1 575 522 | 139.8 | 25% | −0.04 | 3% | 65% |
| 09009 | New Haven County | CT | 610 | 859 339 | 249.3 | 12% | 0.247 | 2% | 36% |
| 25025 | Suffolk County | MA | 609 | 791 766 | 270.1 | 19% | 0.192 | 4% | 55% |

Abbreviations: CA, California; CT, Connecticut; IL, Illinois; MA, Massachusetts; MI, Michigan; NJ, New Jersey; NY, New York; PA, Pennsylvania.

Zip code–level analyses

ZCTA-level analyses for Illinois and NYC revealed more consistent gradients for all ABSMs, though the magnitude of disparities comparing the top with the bottom socioeconomic categories was smaller on the relative disparity scale. Together, these results suggest that analyzing inequities in COVID-19 outcomes at finer levels of geographic aggregation is feasible and can provide important information about the unequal spread and impact of COVID-19 within counties and cities. As with the county-level death analysis, the results suggest that areas with higher rates of poverty, crowded housing, and populations of color are disproportionately affected. Moreover, given unequal patterns of testing, if residents of these neighborhoods are not able to access testing, these results may be understating the true magnitude of inequities in COVID-19 infection.

Age adjustment

An important drawback of these data is the lack of disaggregation by age, which prohibits presentation of age-standardized rates. There is some suggestion that age distributions in more disadvantaged counties may be skewed toward younger ages; for example, 42% of the population in counties in the lowest quintile of percent population of color is younger than 45 years compared with 49% of the population in counties in the highest quintile.26 As a result, estimates of inequities based on crude rates may underestimate the magnitude of age-adjusted inequities. We conducted sensitivity analyses to estimate relative inequities using indirect standardization by age and sex. This method does not require county-specific death counts by age and instead relies on combining age- and sex-specific national rates27 with disaggregated county population estimates to predict expected death counts at the county level.28 These can be summed over age and sex and compared with the observed county death counts. These results (see Supplemental Appendix Table A3, available at https://links.lww.com/JPHMP/A714) show that age- and sex-adjusted inequities are substantially higher than the crude estimates for percent population of color (RR for Q5 vs Q1 = 6.69; 95% confidence interval [CI]: 6.49, 6.91) and percent crowding (RR for Q5 vs Q1 = 3.35; 95% CI: 3.27, 3.43), though not for percent below poverty or ICE. However, we caution that the interpretability of age-adjusted comparisons based on indirect standardization depends on an assumption of proportionality across age categories,29 which cannot be evaluated in the absence of age-specific county counts. Thus, efforts should continue to focus on encouraging surveillance systems to report disaggregated outcome data by age so that standardization can be done using the direct method.
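
The logic of indirect standardization described above can be illustrated with a small Python sketch. All numbers below are made up (two hypothetical areas, four age-sex strata, invented reference rates), and the function names are hypothetical; the sketch shows only the mechanics of expected counts and observed-to-expected ratios, not the analysis reported in the Supplemental Appendix.

```python
def expected_deaths(pop_by_stratum, ref_rates):
    """Expected deaths if the area experienced the reference rates.

    pop_by_stratum: {(age_group, sex): population}
    ref_rates:      {(age_group, sex): deaths per person over the period}
    """
    return sum(pop_by_stratum[k] * ref_rates[k] for k in pop_by_stratum)


def standardized_ratio(observed, pop_by_stratum, ref_rates):
    """Observed deaths divided by expected deaths."""
    return observed / expected_deaths(pop_by_stratum, ref_rates)


# Invented reference (national) rates per person over the study period.
ref_rates = {
    ("<45", "F"): 0.00001, ("<45", "M"): 0.00002,
    ("45+", "F"): 0.0005,  ("45+", "M"): 0.0008,
}

# Two hypothetical areas with equal total population and equal observed deaths.
older_area = {("<45", "F"): 20_000, ("<45", "M"): 20_000,
              ("45+", "F"): 30_000, ("45+", "M"): 30_000}
younger_area = {("<45", "F"): 30_000, ("<45", "M"): 30_000,
                ("45+", "F"): 20_000, ("45+", "M"): 20_000}
observed_older = observed_younger = 40

crude_rr = (observed_younger / sum(younger_area.values())) / (
    observed_older / sum(older_area.values()))
adjusted_rr = (standardized_ratio(observed_younger, younger_area, ref_rates)
               / standardized_ratio(observed_older, older_area, ref_rates))

print(round(crude_rr, 2))     # crude comparison shows no difference
print(round(adjusted_rr, 2))  # adjusted comparison exceeds 1
```

In this invented example the younger area has the same crude rate as the older one, yet fewer deaths would be expected given its age structure, so its adjusted ratio is higher. This is the direction of bias the text describes for crude comparisons.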

Recommendations for public health departments

Our results reaffirm the urgency of documenting how historically disadvantaged communities are unequally affected by the devastation of the COVID-19 pandemic, inequities notably not discernible from data provided by US health agencies.1–3 The methods of the Public Health Disparities Geocoding Project8–11 provide a well-validated, robust, and cost-effective methodology by which public health departments can enhance their reporting of disparities in COVID-19 outcomes. Moreover, reporting of rates, especially if age-standardized, allows for meaningfully comparing population burdens across places that vary in population size and age structure,6–11 unlike the case counts currently reported on the CDC and other COVID-19 tracking sites.4,5

We recommend that state and local public health departments adopt reporting of COVID-19 outcomes minimally by ZCTA-level characteristics, which is preferable to county-level reporting. In our earlier work, we originally recommended routine reporting by socioeconomic characteristics of census tracts.9,15 While we stand by that recommendation, we recognize that it may be more feasible for surveillance systems to implement ZCTA-level analyses, since zip code is easy to ask of individuals as they are being tested, is already recorded on death certificates, and does not require additional steps for geocoding.1 We emphasize that reporting of disparities by ZCTA characteristics need not entail risk of individual data disclosure due to small numbers: because our methodology involves aggregating over ZCTAs with similar socioeconomic characteristics, summary statistics are reported for aggregations of ZCTAs, and typically have large enough numbers not to require data suppression.15 Thus, whenever possible, public health departments should report summary statistics by race/ethnicity, gender, and age within strata of ZCTA-level ABSMs in order to paint a fuller picture of the extent of inequities in COVID-19 outcomes. To assist public health departments that wish to implement these types of analyses, we direct interested readers to the Public Health Disparities Geocoding Project Web site.8

Statistical considerations

Aggregation over areas is analogous to how state and local health departments typically report disease rates by sex and race/ethnicity and avoids problems with statistical instability in the estimation of small-area rates by assuming that populations within strata of ABSMs have a common disease experience. While marginalizing over disease counts and population at risk may obscure meaningful area differences relevant to disease etiology or infectious disease transmission dynamics, rates computed for strata of ABSMs still provide an important description of what populations are impacted by COVID-19 and where disease burdens are most substantial. Similarly, while ABSMs such as percent poverty and percent crowding are certainly correlated, we recommend use of distinct social metrics (rather than summary indices or mutual adjustment for multiple metrics) since data on household crowding, poverty, and residential segregation by race/ethnicity and income are all informative and needed to communicate the social burdens of COVID-19.

The analyses we have presented here can be easily implemented by state and local health departments using existing surveillance data and an Excel spreadsheet or similar software. We argue that these simple descriptive analyses of inequities are vital to identifying the communities that are experiencing the most serious impacts of the pandemic and to holding government leaders and policy makers accountable for directing resources to those in need.

Throughout, we have presented confidence limits based on traditional formulas for the variance of an incidence rate,20 which assume that counts of events are Poisson distributed, arise from a homogeneous pool of person-time within stratum, and do not reflect correlation due to a contagious process. Given county variation in SARS-CoV-2 transmission dynamics (including when infected cases were seeded in these communities and how the pace of transmission has been affected by containment and mitigation strategies) and variation in the susceptibility of county populations above and beyond what is explained by ABSMs, the assumption of homogeneity is likely unrealistic. In particular, computed disparities in the death analyses are sensitive to the inclusion of counties where the epidemic had not yet taken hold. As these counties contribute population denominators but few or no deaths, the effect is to depress rates overall. Sensitivity analyses restricted to counties with 50 or more cases found higher rates overall, with sharper disparities comparing top with bottom quintiles of poverty but attenuated disparities by crowding and population of color (see Supplemental Appendix Table A3, available at https://links.lww.com/JPHMP/A714).
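For readers who want to reproduce this kind of interval, the sketch below uses one common large-sample formula for a Poisson count, in which the standard error of the log rate is 1/sqrt(events) and the standard error of the log rate ratio is sqrt(1/events1 + 1/events0). The article does not state which specific formula was used, so this choice, along with the function names and the default of 1.96 for 95% limits, is an assumption. The limits inherit the homogeneity and independence assumptions discussed above and will be too narrow if those assumptions fail.

```python
import math


def rate_ci(events, population, per=100_000, z=1.96):
    """Rate per `per` persons with log-scale Poisson confidence limits.

    Assumes the event count is Poisson and the population is fixed.
    Returns (rate, lower, upper).
    """
    if events <= 0 or population <= 0:
        raise ValueError("events and population must be positive")
    rate = per * events / population
    factor = math.exp(z / math.sqrt(events))  # exp(z * SE of log rate)
    return rate, rate / factor, rate * factor


def rate_ratio_ci(events1, pop1, events0, pop0, z=1.96):
    """Rate ratio (group 1 vs group 0) with log-scale confidence limits.

    Returns (ratio, lower, upper).
    """
    if min(events1, events0) <= 0 or min(pop1, pop0) <= 0:
        raise ValueError("events and populations must be positive")
    ratio = (events1 / pop1) / (events0 / pop0)
    se_log = math.sqrt(1 / events1 + 1 / events0)
    factor = math.exp(z * se_log)
    return ratio, ratio / factor, ratio * factor


if __name__ == "__main__":
    # Invented counts, not data from the article.
    print(rate_ci(100, 100_000))
    print(rate_ratio_ci(90, 20_000, 40, 40_000))
```

The log-scale form keeps the lower limit above zero, which matters for strata with few events; strata with zero events need an exact method instead.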

More sophisticated statistical models can be employed to model area-level variation in rates, including overdispersed Poisson, negative binomial, mixed models, and zero-inflated models.30,31 In our experience, however, estimates of socioeconomic inequities can be sensitive to the modeling approach taken, and the interpretation of summary measures of disparities at the population level may be complicated by model assumptions. Even when there are variations in rates within strata of ABSMs, estimates from the aggregated method still have relevant interpretation as the “average” health experience of persons living in areas with particular socioeconomic characteristics. While future work will address small-area estimation and models for spatial heterogeneity in COVID-19 outcomes, we should not lose sight of the immediate need for timely data on economic and social inequities to inform policy and interventions.
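Before turning to such models, a simple descriptive check can indicate how far the areas within a stratum depart from a common rate. The sketch below computes a Pearson dispersion statistic: the Pearson chi-square comparing each area's observed count with the count expected under the pooled stratum rate, divided by its degrees of freedom. Values well above 1 suggest more variation than a homogeneous Poisson process would produce. This diagnostic is an addition for illustration and is not part of the analyses reported in the article; the function name and input layout are hypothetical.

```python
def pearson_dispersion(areas):
    """Pearson chi-square / degrees of freedom for areas within one stratum.

    areas: list of (events, population) tuples, one per county or ZCTA.
    Expected counts assume every area shares the pooled stratum rate.
    """
    if len(areas) < 2:
        raise ValueError("need at least two areas")
    if any(population <= 0 for _, population in areas):
        raise ValueError("every area needs a positive population")
    total_events = sum(events for events, _ in areas)
    total_population = sum(population for _, population in areas)
    if total_events == 0:
        raise ValueError("no events in this stratum")
    pooled_rate = total_events / total_population
    chi_square = sum(
        (events - pooled_rate * population) ** 2 / (pooled_rate * population)
        for events, population in areas
    )
    return chi_square / (len(areas) - 1)


if __name__ == "__main__":
    # Invented counts, not data from the article.
    print(pearson_dispersion([(10, 10_000), (20, 20_000), (30, 30_000)]))
    print(pearson_dispersion([(0, 10_000), (60, 10_000)]))
```

A large value does not invalidate the aggregated rate as a description of average experience within the stratum, but it is a signal that Poisson-based confidence limits understate the uncertainty.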

Implications for Policy & Practice
  • Prior to our study, no public health agency or scientific investigation had reported rates of COVID-19 outcomes (deaths, confirmed cases, positive tests) in relation to county or zip code measures of socioeconomic characteristics or residential segregation.
  • Analysis of rates, not just counts (the data currently being reported), is crucial, given socially patterned differences in population age structure and size within and across geographic areas.
  • Our study provides evidence of stark social gradients in rates of COVID-19 outcomes (deaths, confirmed cases, percent positive cases) at both the county and zip code levels.
  • Use of distinct social metrics (rather than summary indices) is important to guide action: data on household crowding, poverty, and residential segregation by race/ethnicity and income are all informative and needed to communicate the social burdens of COVID-19.
  • Our cost-effective, straightforward methodology and results can motivate and guide state and local health departments to generate data relevant to monitoring inequities in COVID-19 outcomes and guiding resource allocation to mitigate these inequities.
  • State and local health departments should additionally make data available by age, gender, and race/ethnicity at the county and zip code levels to permit age adjustment of disease rates and to allow investigation of social inequities in COVID-19 outcomes.
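As a sketch of what age-specific population data make possible, the example below applies indirect standardization, one common approach to age adjustment: reference age-specific rates are applied to an area's age-specific population to obtain an expected count, and the observed count is divided by that expectation. The function name, the age bands, and the dictionary layout are hypothetical, and the numbers are invented for illustration.

```python
def indirect_standardization(observed, pop_by_age, reference_rates):
    """Return (expected count, observed/expected ratio).

    observed: total events observed in the area or stratum.
    pop_by_age: {age_band: population} for the area or stratum.
    reference_rates: {age_band: events per person} in a reference population.
    """
    missing = set(pop_by_age) - set(reference_rates)
    if missing:
        raise ValueError(f"no reference rate for age bands: {sorted(missing)}")
    expected = sum(reference_rates[age] * n for age, n in pop_by_age.items())
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return expected, observed / expected


if __name__ == "__main__":
    # Invented numbers, not data from the article.
    population = {"0-64": 80_000, "65+": 20_000}
    reference = {"0-64": 0.0001, "65+": 0.002}
    print(indirect_standardization(72, population, reference))
```

This method needs only the total observed count in each area plus age-specific denominators, which is why it suits settings where age-specific case or death counts are not released.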

References

1. Krieger N, Gonsalves G, Bassett MT, Hanage W, Krumholz HM. The fierce urgency of now: closing glaring gaps in US surveillance data on COVID-19. Health Aff Blog. https://www.healthaffairs.org/do/10.1377/hblog20200414.238084/full. Published April 14, 2020. Accessed May 6, 2020.
2. Krieger N. COVID-19, data, and health justice. To The Point. https://www.commonwealthfund.org/blog/2020/covid-19-data-and-health-justice. Published April 16, 2020. Accessed May 6, 2020.
3. Kendi IX. What the racial data show. The Atlantic. April 6, 2020. https://www.theatlantic.com/ideas/archive/2020/04/coronavirus-exposing-our-racial-divides/609526. Accessed May 6, 2020.
4. The COVID Tracking Project. About the data. https://covidtracking.com/about-data. Accessed May 6, 2020.
5. US Centers for Disease Control and Prevention. Cases of coronavirus (COVID-19) in the US. https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/cases-in-us.html. Accessed May 6, 2020.
6. Rehkopf DH, Haughton L, Chen JT, Waterman PD, Subramanian SV, Krieger N. Monitoring socioeconomic disparities in death: comparing individual-level education and area-based socioeconomic measures [erratum: Am J Public Health. 2007;97:1543]. Am J Public Health. 2006;96:2135–2138.
7. Krieger N, Chen JT, Waterman PD, Rehkopf DH, Subramanian SV. Painting a truer picture of US socioeconomic and racial/ethnic health inequalities: the Public Health Disparities Geocoding Project. Am J Public Health. 2005;95:312–323.
8. Krieger N, Chen JT, Waterman PD. Using the methods of the Public Health Disparities Geocoding Project to monitor COVID-19 inequities and guide action for health justice (May 15, 2020). https://www.hsph.harvard.edu/thegeocodingproject/covid-19-resources. Accessed July 29, 2020.
9. Krieger N, Chen JT, Waterman PD, Soobader M-J, Subramanian SV, Carson R. Geocoding and monitoring US socioeconomic inequalities in mortality and cancer incidence: does choice of area-based measure and geographic level matter? The Public Health Disparities Geocoding Project. Am J Epidemiol. 2002;156(5):471–482.
10. Krieger N, Chen JT, Waterman PD, Soobader M-J, Subramanian SV, Carson R. Monitoring socioeconomic inequalities in sexually transmitted diseases, tuberculosis, and violence: geocoding and choice of area-based socioeconomic measures—the Public Health Disparities Geocoding Project. Public Health Rep. 2003;118:240–260.
11. Krieger N, Chen JT, Waterman PD, Soobader MJ, Subramanian SV, Carson R. Choosing area-based socioeconomic measures to monitor social inequalities in low birthweight and childhood lead poisoning—the Public Health Disparities Geocoding Project (US). J Epidemiol Community Health. 2003;57:186–199.
12. USA Facts. COVID-19 deaths dataset. https://usafacts.org/visualizations/coronavirus-covid-19-spread-map. Accessed May 6, 2020.
13. Eads D, Friedrich P. MAP: find the number of COVID-19 cases in Illinois by zip code. https://www.wbez.org/stories/map-by-zip-code-of-coronavirus-covid-19-cases-illinois/90ca85cd-bdf4-423a-a7bc-924fcee9d0f3. Accessed May 6, 2020.
14. Centers for Disease Control National Notifiable Diseases Surveillance System (NNDSS). Coronavirus disease 2019 (COVID-19) 2020 interim case definitions. https://wwwn.cdc.gov/nndss/conditions/coronavirus-disease-2019-covid-19/case-definition/2020. Approved April 5, 2020. Accessed July 30, 2020.
15. Krieger N, Waterman P, Chen JT, Soobader MJ, Subramanian SV, Carson R. Zip code caveat: bias due to spatiotemporal mismatches between zip codes and US census-defined areas—the Public Health Disparities Geocoding Project. Am J Public Health. 2002;92:1100–1102.
16. New York City Department of Health and Mental Hygiene. COVID-19 GitHub repository. https://raw.githubusercontent.com/nychealth/coronavirus-data/master/tests-by-zcta.csv. Accessed May 6, 2020.
17. US Census Bureau. 2014-2018 American Community Survey 5-year estimates. https://www.census.gov/newsroom/press-releases/2019/acs-5-year.html. Accessed May 6, 2020.
18. GitHub Inc. tidycensus package. https://github.com/walkerke/tidycensus. Accessed May 6, 2020.
19. Krieger N, Waterman PD, Spasojevic J, Li W, Maduro G, Van Wye G. Public health monitoring of privilege and deprivation using the Index of Concentration at the Extremes (ICE). Am J Public Health. 2016;106:256–263.
20. Greenland S, Rothman KJ. Introduction to categorical statistics. In: Rothman KJ, Greenland S, eds. Modern Epidemiology. 2nd ed. Philadelphia, PA: Lippincott-Raven; 1998: chap 14.
21. CDC COVID-19 Response Team. Geographic differences in COVID-19 cases, deaths, and incidence—United States, February 12-April 7, 2020. MMWR Morb Mortal Wkly Rep. 2020;69(15):465–471.
22. Wu J, McCann AL. 60,000 missing deaths: tracking the true toll of the coronavirus crisis. New York Times. May 6, 2020. https://www.nytimes.com/interactive/2020/04/21/world/coronavirus-missing-deaths.html. Accessed May 6, 2020.
23. Hawkins D. The coronavirus is falling heavily on black Americans. Why? The Guardian. April 16, 2020. https://www.theguardian.com/commentisfree/2020/apr/16/black-workers-coronavirus-covid-19. Accessed May 6, 2020.
24. Bailey M, Gee A, Jewett C, Renwick D, Varney S. COVID-19: lost on the frontline. Kaiser Health News. April 15, 2020. https://khn.org/news/lost-on-the-frontline-health-care-worker-death-toll-covid19-coronavirus. Accessed May 6, 2020.
25. US Department of Housing and Urban Development, Office of Policy Development and Research. Measuring overcrowding in housing. https://www.census.gov/content/dam/Census/programs-surveys/ahs/publications/Measuring_Overcrowding_in_Hsg.pdf. Published 2007. Accessed May 6, 2020.
26. CDC WONDER Online Database. Single-race population estimates, United States, 2010-2018. July 1st resident population by state, county, age, sex, single-race, and Hispanic origin. Vintage 2018 estimates released by U.S. Census Bureau on June 20, 2019. http://wonder.cdc.gov/single-race-v2018.html. Accessed June 20, 2020.
27. National Center for Health Statistics. Provisional COVID-19 death counts by sex, age, and week. https://data.cdc.gov/NCHS/Provisional-COVID-19-Death-Counts-by-Sex-Age-and-W/vsak-wrfu. Accessed July 29, 2020.
28. Breslow NE, Day NE. Indirect standardization and multiplicative models for rates, with reference to the age adjustment of cancer incidence and relative frequency data. J Chronic Dis. 1975;28(5/6):289–303.
29. Pickle LW, White AA. Effects of the choice of age-adjustment method on maps of death rates. Stat Med. 1995;14:615–627.
30. Chen JT. Multilevel and hierarchical models for disease mapping. In: Boscoe FP, ed. Geographic Health Data: Fundamental Techniques for Analysis. Wallingford, Oxfordshire, England: CABI; 2013:183–208.
31. Wu X, Nethery RC, Sabath BM, Braun D, Dominici F. Exposure to air pollution and COVID-19 mortality in the United States. medRxiv. 2020. doi:10.1101/2020.04.05.2005450.
Keywords: COVID-19; health disparities; public health surveillance; socioeconomic inequities; state health departments

© 2020 Wolters Kluwer Health, Inc. All rights reserved.