doi: 10.1097/01.ede.0000112213.80320.68
Epidemiology & Society

Who Is Hispanic?: Implications for Epidemiologic Research in the United States

Borak, Jonathan; Fiellin, Martha; Chemerynski, Susan

From the Departments of Medicine and Epidemiology & Public Health, Yale School of Medicine, New Haven, Connecticut.

Editors’ Note: Epidemiology & Society provides a broad forum for epidemiologic perspectives on health research, public policy, and global health.

Submitted 10 June 2003; final version accepted 11 November 2003.

Correspondence: Jonathan Borak, 234 Church Street (#1100), New Haven, CT 06510.

The most recent U.S. Census reported that Hispanics are now the nation's largest minority group. At the same time, increasing attention has focused on the inherent heterogeneity of the U.S. Hispanic population. Such a rapidly growing but heterogeneous minority poses potential challenges to population-based research. To understand those challenges better, we first considered the history of the demographers’ question: “Who is Hispanic?” We then considered the implications of differing Hispanic identity criteria for disease surveillance. Although relevant to political and socioeconomic considerations, the Hispanic ethnic category may not be specifically useful for understanding most disease processes. For epidemiologic studies, there is need for more transparent criteria to classify subpopulations. Those criteria must be regularly subjected to analysis and validation.

Media headlines in early 2003 announced that Hispanics had become the largest U.S. minority group, outnumbering blacks by almost 1 million people. Such growth of the Hispanic population, and the resulting realignment of the nation's racial and ethnic composition, had been anticipated by demographers as the consequence of 2 trends: increasing Hispanic immigration and high Hispanic birth rates.1 Those straightforward headlines, however, masked an interplay of statistical, demographic, and sociologic factors that complicates analysis of such population changes. One complication is the difficulty of identifying and measuring race and ethnicity in a nation characterized by population mobility and assimilation. A second stems from the historic methods used for the U.S. census. Over the years, the Census Bureau frequently revised its criteria for identifying and characterizing people of differing races and ethnic groups, especially Hispanics. Such revisions have reduced the value of longitudinal census comparisons.

A key example is found in the 2000 census. Studies had shown that federal standards for categorizing race and ethnicity were inadequate in the face of an increasingly heterogeneous U.S. population.2 Accordingly, the Census Bureau changed its methods.3 Census respondents were required to self-identify in 2 separate categories, first by selecting one or more racial categories (black, white, Asian/Pacific Islander, Native American/Alaskan Native, or “some other race”) and then by selecting an ethnic category (Hispanic vs. non-Hispanic). By treating race and ethnicity as distinct, and by allowing respondents to self-identify as more than one race, the 2000 census reflected changing perceptions of how to define and categorize an increasingly diverse society. However, interpretation and use of the resulting data will likely prove difficult—challenging many fundamental ideas about race and ethnicity. In particular, the inclusion of questions on both race and ethnicity will lead to operational difficulties for epidemiologists studying heterogeneous populations.

Information on race and ancestry has been collected since the first U.S. census in 1790.4 Unfortunately, there is also a long history of noncomparability of census data across decades. Nearly every census after the first has modified the questions on race and ethnicity data.5 In the early 1900s, for example, groups of recent immigrants (eg, Italians, Irish, and Jews) were classified as “nonwhite” to distinguish them from the original Anglo-Saxon Protestant settlers.6,7 Over the following decades, in response to changing attitudes, and consistent with the American tradition of assimilation, many of those groups were included in the “white” category, with little attention given to ethnic differences or immigrant status.7

Still greater variability has distinguished census criteria for Hispanics.1,8 In 1930, the Census counted “Mexicans” as a separate race, whereas the 1940 census noted “persons of Spanish mother tongue,” but not specifically “Mexicans.” Ten years later, the 1950 census enumerated “persons of Spanish surname,” but not those of “Spanish mother tongue.” By 1970, the census had again changed its focus, asking about national “origin.”9 As a consequence, data for U.S. Hispanics from sequential census surveys are not readily compared, and the interpretation of long-term studies that rely on data from several census surveys is problematic.10

In 1978, during the Nixon Presidency, the Office of Management and Budget (OMB) promulgated Statistical Policy Directive 15 in an effort to set uniform standards for collection of race and ethnicity data, and to increase the availability of information about persons of Hispanic origin.4 Directive 15, still the Federal standard, acknowledged the lack of a scientific or anthropologic basis: “This Directive provides standard classifications for recordkeeping, collection, and presentation of data on race and ethnicity in Federal program administrative reporting and statistical activities. These classifications should not be interpreted as being scientific or anthropological in nature.”11 Nevertheless, to increase data consistency across federal agencies, it set administrative rules for classifying individuals into racial and ethnic categories4,12,13: “The minimum designations are: (a) Race: American Indian or Alaskan Native, Asian or Pacific Islander, Black/African American, White; (b) Ethnicity: Hispanic origin, not of Hispanic origin.

“When race and ethnicity are collected separately, the number of White and Black persons who are Hispanic must be identifiable, and capable of being reported in that category...

“The category which most closely reflects the individual's recognition in his community should be used for purposes of reporting on persons who are of mixed racial and/or ethnic origins.”12

This Directive coincided with a wave of immigration during the 1980s and 1990s that brought many Latin American and Caribbean people to the United States.5 Many were categorized as Hispanic under the rubric of the OMB standards.

Directive 15 also coined the current use of the term “Hispanic.” Previously, that term was used to describe people of Iberian ancestry, not those of Central and South American origin. The North American Association of Central Cancer Registries (NAACCR) has described that history as follows: “The term ‘Hispanic’ was created by the U.S. government; the population so identified is, in fact, an artificial rubric for a set of diverse populations that resulted from the mixture of indigenous American peoples, African slaves, and Europeans.”14 Another perspective on the “political construction and usage [of] a contrived Hispanic ethnicity” was provided by the Argentinian-born sociologist, Martha Gimenez, who has lived in the United States for more than 30 years: “The ‘Hispanic’ label fulfills primarily ideological and political functions...[It] identifies neither an ethnic group nor a minority group. It is the temporary outcome of political struggles between the major parties to win elections...”15 Richard Rodriguez, a contemporary Hispanic commentator on the dilemmas of ethnicity and cultural identity, has described Richard Nixon as “the Dark Father of Hispanicity.”16

The formal inclusion of Hispanic ethnicity in the U.S. Census has raised concerns for the validity of aggregating diverse populations into a single ethnic category. Such heterogeneity is reflected by the many countries from which Hispanics originated, differences in their mother tongues, varieties in family names, and even the alternative terms used to describe (and self-describe) group members (eg, Hispanic, Latino, Chicano, Raza, Mexicano, Manito).8,17 In 1997, an OMB Interagency Review voiced concerns about use of the term “Latino”: “Other terms, such as ‘Latino’ or ‘Spanish Origin,’ can be used to achieve more complete coverage of the Hispanic population. There is some evidence, however, that using the term ‘Latino’ may result in the inclusion of some unintended population groups, so it should not be a part of the minimum standard.”18 Nonetheless, later that year, OMB replaced the term “Hispanic” with “Hispanic or Latino” with the intention of improving data collection: “Because regional usage of the terms differs–Hispanic is commonly used in the eastern portion of the United States, whereas Latino is commonly used in the western portion–this change may contribute to improved response rates.”11

If the concept of ethnicity was incorporated into the collection of vital statistics to better appreciate the range of cultural differences between various populations, then the use of “Hispanic” as a generic term may actually undermine that process: “The Raza population is very heterogeneous, ranging from the non-Spanish-speaking Indians of rural Mexico to British descended Argentines or Germanic Chileans. There are racial, cultural, and language differences between the various Raza groups: Mexican, Puerto Rican, Cuban... If all a planner has is research on urban Cuban residents in Florida, can he/she apply the results to a rural Mexican population in Colorado?... the bald fact is that Latinos are racially quite heterogeneous upon an Indian substrate ranging from pure Indian to pure White, to pure Black, to pure Asian with nearly every conceivable combination.”10

Differing or changing criteria for identifying and categorizing population subgroups have potentially important implications for monitoring disease incidence and interpreting disease rates over time. Health surveillance in the United States often relies on data derived from multiple sources. Because criteria used to characterize subpopulations can vary from source to source, there is risk that combining data from such sources will introduce statistical uncertainty and confusion. Accordingly, it is useful to consider the possible effects of criteria differences before data from various sources are combined.4,19 It is likewise appropriate to ensure that criteria are applied in consistent ways.

The inconsistencies and changes in census criteria used to identify Hispanics have already been described; characteristics that defined Hispanic populations differed from one census to the next. Because of such differences, longitudinal analyses of disease rates that rely on census data for determining appropriate denominators in Hispanic populations are necessarily uncertain. Fortunately, these census-to-census differences have been well described.

Less well recognized are the differences in criteria used by other organizations and agencies to identify Hispanics. For example, the Medicare Enrollment Database, which derives demographic information from Social Security application forms, currently treats “Hispanic” as a racial category distinct from white and black, rather than as an ethnic category.20 Before 1980, Social Security applications did not include Hispanic as a racial group; those obtaining Social Security cards before that year could identify themselves only as white, black, or other. Accordingly, the Medicare race variable has only 39% sensitivity for Hispanic ancestry,21 making the Medicare Enrollment Database severely limited for evaluating Hispanic health status.

Another example involves the National Health Interview Survey (NHIS). From 1957 to 1976, race was categorized on the basis of observations by the NHIS interviewer; respondents were categorized as black, white, or other. From 1976 to 1978, NHIS added one question that asked respondents to identify their national origin or ancestry. In 1978, NHIS adopted the criteria of OMB Directive 15, asking separate questions to determine each respondent's race and Hispanic/non-Hispanic ethnicity.22

Other inconsistencies are found in the criteria used by U.S. cancer registries to identify people of Hispanic ethnicity.23 Most registries rely on abstractors who apply Spanish surname algorithms to medical records that do not otherwise indicate an ethnic category. Such algorithms, based on surname lists initially developed by the Census Bureau, have been used as the sole basis for identifying Hispanics in population-based mortality studies.24,25 However, that method is problematic. It often confuses Hispanics with others who have Italian, Portuguese, or Filipino surnames; it cannot identify Hispanics with non-Hispanic names; and it misclassifies non-Hispanic women married to Hispanic men (although some registries consider women's maiden names).8,17 In addition, the method's positive predictive value varies directly with the proportion of the study population that is Hispanic.

Limitations of the surname method were quantified in a study of San Francisco residents that compared use of a Spanish surname list with self-identified ethnicity.26 Overall sensitivity for identifying Latinos was 79%, but 38% of people with Spanish surnames were not Latinos. Although the specificity of surnames for identifying non-Hispanics was 95%, positive predictive value was only 68% for Latino men and only 56% for Latina women. Similar findings were reported by NAACCR: “The accuracy of all the methods to correctly classify non-Hispanics was extremely high, [specificity and negative predictive values >95%]. However, their accuracy in classifying Hispanics correctly was substantially lower [55–77%].”14
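
The gap between high specificity and modest positive predictive value follows from Bayes' rule: when relatively few people in a population are Hispanic, even a small false-positive rate among the many non-Hispanics yields a large share of the surname matches. The following is a minimal illustrative sketch, not a calculation from the cited study. It takes the reported overall sensitivity (79%) and specificity (95%) as fixed inputs, and the prevalence values are hypothetical, chosen only to show the trend. The function name is likewise an assumption made for this example.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Bayes' rule: probability that a person flagged by the surname
    list is truly Hispanic, given the share of Hispanics in the population."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


SENSITIVITY = 0.79  # overall sensitivity reported in the San Francisco study
SPECIFICITY = 0.95  # specificity reported in the San Francisco study

# Hypothetical prevalences (assumed for illustration, not taken from any study).
for prevalence in (0.05, 0.10, 0.25, 0.50):
    ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"Hispanic share {prevalence:.0%}: positive predictive value {ppv:.0%}")
```

Under these assumptions, sensitivity and specificity stay constant while positive predictive value rises as the Hispanic share of the population rises, which is one reason a surname list may perform differently from one registry's catchment area to another.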

There are also differences between surname lists used by different registries,23 reflecting regional differences in the composition, national origins, and surnames of Hispanics across the United States. For example, the original Census Bureau list of 8000 Spanish surnames, designed to identify Mexican-Americans, has been described as “only usable in 5 southwestern states where Spanish surnames are common: Arizona, California, Colorado, New Mexico, and Texas.”17 The adoption of expanded Spanish surname lists has allowed regional cancer registries to identify Hispanics more accurately in their catchment areas but, because the methods are not consistent from one cancer registry to the next, it has also made Hispanic incidence data less comparable across registries.

Historically, the categorization of populations into racial groups has proved useful to epidemiologists for discerning associations between risk factors (both genetic and environmental) and disease.27 Social scientists increasingly regard race as multidimensional (eg, “social category based on the identification of (1) a physical marker transmitted through reproduction and (2) individual, group, and cultural attributes associated with that marker”28) and of only limited value as a biologic predictor. Even so, its use as an epidemiologic variable has become institutionalized.

Unlike race, which at some level seems to imply a biologic paradigm, the concept of ethnicity encapsulates cultural, behavioral, and environmental factors that contribute to the identification and recognition of human groups as somehow distinct in the absence of distinguishing physical characteristics. Like race, ethnicity has been regarded as a useful predictor or risk factor for disease.7 For example, “The study of cancer risk differences among racial and ethnic groups is a practical epidemiologic method for discovering important clues into the etiology of, and perhaps preventive measures for, specific cancers.”29

Ethnic categories, however, may be more likely than racial categories to suffer from ambiguous definition, leading to greater risks of misclassification. In his considerations of the ambiguous nature of “Belief in Common Ethnicity,” Max Weber concluded that ethnic categorization is inherently so subjective that the “concept of the ‘ethnic’ group dissolves if we define our terms exactly”30: “We shall call ‘ethnic groups’ those human groups that entertain a subjective belief in their common descent because of similarities of physical type or of customs or both, or because of memories of colonization and migration; this belief must be important for the propagation of group formation; conversely, it does not matter whether or not an objective blood relationship exists. Ethnic membership differs from the kinship group precisely by being a presumed identity...[I]t is primarily the political community, no matter how artificially organized, that inspires the belief in common ethnicity.”30 Others have argued that ethnicity “cannot meaningfully be disentangled from ‘race’ in societies with inequitable race relations” and that both are “forged by oppressive systems of race relations.”31

Whether such concerns are generally true (and notwithstanding that racial groups are also prone to misclassification32), there are reasons to suggest that the ethnic category “Hispanic or Latino” reflects aggregation of dissimilar groups, and that analyses based on such categorization may have only limited informational value. This particular concern has been raised by NAACCR: “Information about Hispanics/Latinos, the nation's fastest growing minority, is difficult to interpret because the data collection methods have not been uniformly applied, and have often not been well-defined. Even the terminology for referring to Hispanics may vary from region-to-region, and within a single area, from population-to-population.”14

That possibility has not been systematically evaluated. More importantly, there may be no simple solution to such difficulties. Beyond unscientific OMB guidelines regarding “Who is Hispanic,” there is disagreement among those who would be labeled “Hispanic” as to whether that term defines ethnicity or race and who should be included. Brazilians are sometimes regarded as “non-Hispanic” because their national language is Portuguese, whereas European-origin Jews and first-generation Chinese may be classified as Hispanics if they were born in Argentina or Peru. Moreover, some “Hispanics” argued against official adoption of “Latino” because that term included “peoples, nationalities or countries...whose languages and culture are descended from the Latin”33; eg, French, Italian, Portuguese, and Romanians.

Other examples are found in the record of the U.S. Census. In 1980, for example, 40% of respondents who said they were of “Spanish/Hispanic origin” also wrote in “Spanish/Hispanic” to identify their racial grouping.34 Even clearer evidence is found in the 2000 census. Respondents were asked to identify their racial category and, as an ethnic consideration separate from race, they were asked to describe themselves as Hispanic or non-Hispanic. Many respondents, especially recent Hispanic immigrants, found such distinctions difficult: “Latinos of African, mestizo, and European descent—or any mixture of the three—found it hard to answer the question ‘what is your racial origin?’...some of the nation's 35 million Latinos scribbled in the margins that they were Aztec or Mayan...Nearly 48% described themselves as white, and only 2% as black. Fully 42% said they were ‘some other race.’”35 Likewise, a recent Pew Hispanic Center survey found compelling need to reconsider how we think about Hispanic ethnicity in the United States: “[T]he majority of Hispanics identified themselves by the country in which they or their ancestors were born...When asked about racial identity, 56% indicated that they did not fit into one of the racial categories typically used by the U.S. Government, such as ‘white,’ ‘African-American,’ or ‘Asian.’ Rather, the majority indicated that they preferred to identify their race as ‘Latino’ or ‘Hispanic.’”36

These observations raise additional concerns about the future use of Hispanic categorization in epidemiologic research. If Hispanics cannot agree on criteria for self-identification as an ethnic group, and if they disagree on whether Hispanic is a racial or ethnic label, then it is unlikely that rules promulgated by bureaucratic (and largely non-Hispanic) agencies will lead to epidemiologically useful definitions of Hispanic ethnicity.

Racial and ethnic categories have been important in epidemiologic research. They can remain useful, but only if researchers are increasingly explicit in the ways by which they identify ethnic and racial subgroups. In other words, the question is not “whether but how” to use such categorizations for epidemiologic research.37 Researchers who study such groups should be expected to detail their criteria for identifying members and characterizing populations, ie, “what are the rules they use to select the population and the variables defining the community under investigation.”6 An important lesson to be learned from the growing size, influence, and diversity of the Hispanic population is that, in the face of an increasingly heterogeneous society, we will need more explicit criteria for classifying subpopulations, and those criteria in turn must be regularly subjected to analysis, reanalysis, and validation.

We gratefully acknowledge the thoughtful suggestions of Kenneth Rothman and Mark Cullen who commented on drafts of the manuscript.

The protocol for this study was reviewed and approved by the Yale University Human Investigations Committee and is in compliance with federal, state, and university regulations.

Part of this literature review was performed in the context of services provided to the Office of the Corporation Counsel, City of New York.

© 2004 Lippincott Williams & Wilkins, Inc.

