Non–English Language Availability of Community Health Center Websites

Rodriguez, Jorge A. MD*; Davis, Roger B. ScD; Percac-Lima, Sanja MD, PhD

doi: 10.1097/MLR.0000000000001027
Original Research

Background: For limited English-proficient (LEP) patients, the digital divide has narrowed, creating a new population of Internet users. However, language-appropriate health information is difficult to find. Community health center (CHC) websites are health information resources and their homepages are critical access points for patients. CHCs supported by Health Resources and Services Administration (HRSA) care for many LEP patients.

Objective: We sought to determine the number of HRSA-supported CHC websites providing translated homepage content.

Research Design: In February 2017, we performed a cross-sectional analysis of the language availability of CHC homepages.

Measures: The primary outcome was availability of translated content on CHC homepages. Secondary outcomes were method of translation and associations between homepage translation and CHC demographics, including percent LEP population and socioeconomic and Internet access characteristics.

Results: Of the 1400 CHC homepages, 480 (34.3%) provided translated information with half using Google Translate. We found higher odds of having a translated homepage as the LEP population by county increased [odds ratio (OR): 1.26, confidence interval (CI): 1.07–1.49, P=0.005], Internet subscription at the state level increased (OR: 1.19, CI: 1.02–1.38, P=0.026), and if health centers were in metropolitan areas (OR: 1.81, CI: 1.31–2.51, P<0.001). There was also higher likelihood of having a homepage translated to Spanish in counties with higher Spanish LEP populations (OR: 1.39, CI: 1.19–1.63, P<0.001), but this did not extend to non-Spanish languages (OR: 0.85, CI: 0.71–1.04, P=0.131).

Conclusions: Despite increased Internet use among LEP patients and linguistic diversity of the CHC populations, there is a lack of language-appropriate content on CHC website homepages.

Divisions of *Clinical Informatics

General Medicine and Primary Care, Beth Israel Deaconess Medical Center

Division of General Internal Medicine, Massachusetts General Hospital, Boston, MA

J.A.R. received support from a career development award from the Office of Diversity, Inclusion and Career Advancement at Beth Israel Deaconess Medical Center. The study was partially funded under grant number R01 HS021495-01A1 from the Agency for Healthcare Research and Quality (AHRQ), US Department of Health and Human Services, support from Harvard Catalyst, the Harvard Clinical and Translational Science Center (National Center for Research Resources and the National Center for Advancing Translational Sciences, National Institutes of Health Award UL1 TR001102), and financial contributions from Harvard University and its affiliated academic health care centers. The content is solely the responsibility of the authors and does not necessarily represent the official views of Harvard Catalyst, Harvard University and its affiliated academic health care centers, or the National Institutes of Health.

The preliminary findings of this study were presented as an oral presentation at New England Society of General Internal Medicine Regional Meeting, March 10, 2017, Boston, MA, and at the National Society of General Internal Medicine Conference, April 2018, Denver, CO.

The authors declare no conflict of interest.

Reprints: Jorge A. Rodriguez, MD, Division of Clinical Informatics, Beth Israel Deaconess Medical Center, 1330 Beacon St. Suite 400, Brookline, MA 02144. E-mail:

Health technology has been presented as a promising tool to actively engage patients in their care and improve health outcomes. In 2009, the Health Information Technology for Economic and Clinical Health (HITECH) act provided financial incentives for the integration of technology as part of patient care.1 Given existing health disparities, underserved patients stand to gain significantly from technological tools, including the Internet. Patient-facing technologies can actively engage patients in their care. A review by the Robert Wood Johnson Foundation revealed that patients who were more engaged and informed about their care had improved quality of life, functional autonomy, and decreased hospital use.2 To decrease health disparities, technology solutions must include culturally and language-appropriate patient information to be effectively used among underserved patients.3

The term “digital divide” has been defined as the gap between those who have access to technology and those who do not.4 The digital divide has disproportionately affected rural communities, racial/ethnic minorities, populations with low socioeconomic status, and limited English-proficient (LEP) patients.5 LEP is defined as speaking English less than very well.6 The LEP population accounts for 8.6% of the US population.7 LEP patients suffer from multiple gaps in care including lower access to consistent medical care, lower use of preventative care, and higher rates of hospitalization and readmission.8–11 Technology has the potential to decrease the health disparities faced by LEP patients. Historically, LEP patients have lagged in their Internet use with only 36% of Spanish-dominant Latinos reporting its use in 2009.12 However, the spread of broadband Internet access and Internet-capable mobile devices has expanded Internet access for LEP patients and in 2015, 74% of Spanish-dominant Latinos reported Internet use. Although Internet use data for non–Spanish-speaking LEP patients is lacking, it is likely consistent with rising national trends in Internet use.13 A new population of LEP Internet users has emerged, thereby narrowing the classic digital divide.

Despite the rise in LEP Internet users and the purported benefits of health technology, language-appropriate online health information has not followed. In 2001, Berland et al demonstrated that online searches for health information in Spanish were less likely to yield relevant content.14 In addition, LEP patients often report difficulty finding health information that is language appropriate. An analysis of the 2005 Health Information National Trends Survey (HINTS) data showed that Spanish-speaking patients reported a higher effort, difficulty in understanding, and frustration when searching for online cancer information.15 For LEP patients, the digital divide has shifted from disparities in access to “digital inequalities,” which reflect disparities in content.16,17 Patients vulnerable to digital inequalities may have Internet access, but now lack the requisite tools to meaningfully engage online, including technological literacy, health literacy, and access to appropriate content.

Among the multiple sources of online health information, health center websites are essential for patients. They provide a range of health information, from directions and phone numbers to provider directories, patient portal information, and educational materials. In a study looking at the Internet-use patterns of underserved populations, Viswanath et al18 found that hospital websites were among the most common health-related sites visited by underserved patients. Similarly, community health center (CHC) websites likely serve as key resources for patients on the Internet. Furthermore, CHCs supported by the Health Resources and Services Administration (HRSA) provide primary health care services to medically underserved patients with ~6.1 million (24%) of their patients best served in another language.19,20 Thus, these clinics serve as key physical and digital access points for LEP patients.

Given the shift from the digital divide to digital inequalities for LEP patients, the potential for health technologies to decrease health disparities, and the reported difficulties in finding online health information, the aim of our study was to determine the language availability of HRSA-supported CHC websites and assess its association with community demographics.


Study Design

In February 2017, we performed a cross-sectional review of the language availability of HRSA-supported CHC homepages. We focused on website homepages because they are the first point of contact on the website for patients. From the HRSA data warehouse, we obtained a publicly available dataset of HRSA-supported CHC information, including website URLs (last updated: 05/01/2016).21 We reviewed unique website URLs, removing duplicates from the dataset.

Outcome Measures

The primary outcome was availability of translated homepage content. Secondary outcomes were method of translation, languages offered, and associations between homepage translation and CHC county and state demographics (including percent LEP population as well as economic, education, and Internet access characteristics). The method of translation was defined as machine translation (ie, Google Translate), manual translation, or no translation. Google Translate can translate a standard set of 103 languages.22 Manual translation was defined as translation included on the homepage that did not involve an embedded machine translation tool. Manual translation was further categorized by quantity of translated content as limited (few translated phrases or forms), moderate (1 translated webpage), or full (entire translated site).

Data Collection

We assessed the language availability of each homepage using a combination of manual review and web scraping. Web scraping involves writing computer programs aimed at extracting information from online content at a large scale. We used the web scraping framework Scrapy written in Python.23 This framework allowed us to extract the homepage text. When writing the web scraping program, we used a combination of HyperText Markup Language (HTML) identifiers and a collection of keywords to identify websites with non-English content (Appendix Table, Supplemental Digital Content 3). For example, to identify a website using Google Translate, we searched for identifiers like “google_translate_element” in the website HTML code. We also used language-specific keywords, for example “Spanish” or “Español.” These keywords were generated for the 10 most common languages in the United States apart from English.24 If these homepages were found to have any of the keywords, direct review was performed to confirm the presence of non-English content and assess the quantity of translated content. We defined the quantity of manually translated content as limited, moderate, or full, as described above.
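The identifier-and-keyword screen described above could be sketched as follows. This is a minimal illustration in plain Python, not the authors' Scrapy program. The function name `screen_homepage`, the shortened keyword list, and the three return labels are assumptions made for the example; only “google_translate_element,” “Spanish,” and “Español” come from the text.

```python
import re

# Marker for the embedded Google Translate widget, as named in the text.
MACHINE_TRANSLATION_IDS = ["google_translate_element"]

# Illustrative keywords only; "Spanish" and "Español" come from the article.
# The full study list is in its Appendix Table and is not reproduced here.
LANGUAGE_KEYWORDS = ["Spanish", "Español"]


def screen_homepage(html: str) -> str:
    """Return a screening label for one homepage's raw HTML.

    "machine" -> a Google Translate identifier was found
    "review"  -> a language keyword was found; needs direct (manual) review
    "none"    -> nothing matched
    """
    lowered = html.lower()
    if any(marker in lowered for marker in MACHINE_TRANSLATION_IDS):
        return "machine"
    for keyword in LANGUAGE_KEYWORDS:
        # Whole-word, case-insensitive match so a keyword is not matched
        # inside a longer token.
        if re.search(r"\b" + re.escape(keyword) + r"\b", html, re.IGNORECASE):
            return "review"
    return "none"
```

A keyword hit is treated here only as a flag for direct review, mirroring the two-stage process in the text, because a word such as “Spanish” on a page does not by itself show that translated content is present.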

Using the location of each clinic, we collected information about their counties to assess the impact of location and population demographics on the presence of translated homepage content. From the American Community Survey (ACS), we obtained percent of total LEP population, percent of language-specific LEP population, percent of the population older than 65, percent of households with a broadband or high speed Internet subscription at the state level, and median household income by county.25–27 Given the association between age and Internet use, we also included percent of the population above 65 years of age by county in our model.13 We identified whether a county was considered to be in a metropolitan area as determined by the Office of Management and Budget.28 We used 2015 US Department of Agriculture typology codes as dichotomous county-level markers of low education, low employment, and persistent poverty.28 A county was defined as having low education if ≥20% of county residents aged 25–64 did not have a high school diploma or equivalent. Low employment was defined as <65% of county residents aged 25–64 being employed, as determined by the ACS 5-year average data. Similarly, a county was classified as having persistent poverty if ≥20% of county residents were poor, as measured by the 1980, 1990, and 2000 censuses and the ACS 5-year average data.
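The three dichotomous county markers can be expressed as simple threshold checks. The sketch below is illustrative only: the `CountyStats` fields and the `typology_flags` function are hypothetical names, and the reading of “persistent” poverty as meeting the 20% threshold in every measurement period is an assumption about how the typology code is applied.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CountyStats:
    pct_no_high_school_25_64: float  # % of residents aged 25-64 without a diploma
    pct_employed_25_64: float        # % of residents aged 25-64 who are employed
    poverty_rates: List[float]       # % poor in 1980, 1990, 2000, and ACS 5-year data


def typology_flags(county: CountyStats) -> Dict[str, bool]:
    """Apply the thresholds quoted in the text to one county."""
    return {
        "low_education": county.pct_no_high_school_25_64 >= 20.0,
        "low_employment": county.pct_employed_25_64 < 65.0,
        # Assumption: the 20% threshold must be met in every period,
        # not just in one of them.
        "persistent_poverty": all(rate >= 20.0 for rate in county.poverty_rates),
    }
```

In the study these flags were taken directly from the published typology codes rather than recomputed, so the function serves only to make the definitions concrete.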

Statistical Analysis

We used homepage translation as a dichotomous dependent variable. We used binomial logistic regression to assess the relationship between translated homepage content and the CHC county and state-level demographics. To assess the impact of different LEP groups on homepage translation, we performed 3 separate analyses on total LEP population, Spanish LEP population, and non-Spanish LEP population. Notably, for continuous variables, we calculated a standardized odds ratio (OR) (ie, the OR for a difference of one SD of the independent variable). We also assessed differences for specific languages by comparing language-specific LEP populations. We used the Wilcoxon rank sum test to compare continuous variables and the χ2 test for categorical variables. Statistical analysis was performed using R statistical software (version 3.4.3).
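The standardized OR follows from the logistic regression coefficient: if β is the change in log-odds per one unit of a predictor whose standard deviation is s, the OR for a one-SD difference is exp(β × s). The sketch below shows that conversion. The function names are hypothetical, the Wald-type interval is an assumption (the text does not say how the CIs were computed), and the analysis itself was run in R, not Python.

```python
import math
from typing import Tuple


def standardized_odds_ratio(beta: float, sd: float) -> float:
    """Odds ratio for a one-SD difference in a continuous predictor.

    beta is the logistic regression coefficient per one unit of the
    predictor (log-odds scale); sd is the predictor's standard deviation.
    """
    return math.exp(beta * sd)


def wald_ci(beta: float, se: float, sd: float, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% Wald interval for the standardized odds ratio.

    se is the standard error of beta on the log-odds scale.
    """
    lower = math.exp((beta - z * se) * sd)
    upper = math.exp((beta + z * se) * sd)
    return lower, upper
```

For example, with made-up inputs of β = 0.02 per percentage point and s = 10 percentage points, `standardized_odds_ratio(0.02, 10)` equals exp(0.2), or about 1.22. Equivalently, the predictor can be divided by its SD before fitting, in which case the exponentiated coefficient is already the standardized OR.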


We identified the websites of 1400 HRSA-supported CHCs in the United States (Table 1). Of these 1400 websites, 480 (34.3%) provided translated content on their homepages (Supplemental Fig. 1, Supplemental Digital Content 1). Among the homepages with translation, the methods of translation offered were machine translation (n=244, 50.8%) and manual translation (n=236, 49.2%). For the manually translated homepages, 68 (28.8%) had limited manual translation, 72 (30.5%) had moderate manual translation, and 96 (40.7%) had full manual translation (Supplemental Fig. 1, Supplemental Digital Content 1).
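The percentages in this paragraph use different denominators: all 1400 homepages for the overall share, the 480 translated homepages for the method split, and the 236 manually translated homepages for the quantity split. A short check of that arithmetic, using only the counts reported above (the variable and function names are illustrative):

```python
def share(part: int, whole: int) -> float:
    """Percentage of `whole` made up by `part`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Counts as reported in the text.
total_homepages = 1400
by_method = {"machine": 244, "manual": 236}
manual_by_quantity = {"limited": 68, "moderate": 72, "full": 96}

translated = sum(by_method.values())

report = {
    # Denominator: all homepages reviewed.
    "translated": share(translated, total_homepages),
    # Denominator: homepages with any translation.
    **{name: share(n, translated) for name, n in by_method.items()},
    # Denominator: manually translated homepages only.
    **{name: share(n, by_method["manual"]) for name, n in manual_by_quantity.items()},
}
print(report)
```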



There were 49 different languages offered by manually translated homepages. The most common languages offered were Spanish (232 homepages), followed by Chinese (18 homepages) and Vietnamese (16 homepages) (Supplemental Fig. 2, Supplemental Digital Content 2). Most of the homepages offered only one translated language.

Overall, we found higher odds of having a translated website as the percent of LEP residents by county increased [OR: 1.23, confidence interval (CI): 1.05–1.44, P=0.011] (Table 2). Similarly, as the percent of households with an Internet subscription increased at the state level, the odds of having a translated homepage were higher (OR: 1.19, CI: 1.03–1.39, P=0.022). In addition, health centers in metropolitan areas were more likely to have a translated homepage (OR: 1.64, CI: 1.19–2.28, P=0.002). Health centers located in counties identified as having persistent poverty were found to have a lower likelihood of having a translated homepage (OR: 0.44, CI: 0.27–0.70, P<0.001).
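An OR can be restated as a percent difference in odds by computing (OR − 1) × 100. The sketch below applies this to two of the ORs reported above; the function name is hypothetical. Note that the result describes odds, not probabilities, so an OR of 1.64 does not mean a 64% higher probability of having a translated homepage.

```python
def percent_change_in_odds(odds_ratio: float) -> int:
    """Percent difference in odds implied by an odds ratio, to the nearest whole percent."""
    return round((odds_ratio - 1.0) * 100)


metropolitan = percent_change_in_odds(1.64)        # OR for metropolitan location
persistent_poverty = percent_change_in_odds(0.44)  # OR for persistent poverty
print(metropolitan, persistent_poverty)
```

Under this conversion, metropolitan location corresponds to 64% higher odds and persistent poverty to 56% lower odds of a translated homepage.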



Spanish language analysis revealed a higher likelihood of homepages translated to Spanish in counties with a higher Spanish LEP population (OR: 1.39, CI: 1.19–1.63, P<0.001) (Table 3). The association with home broadband access, location in metropolitan area, and persistent poverty remained. Conversely, analysis of non-Spanish LEP languages demonstrated no association between non-Spanish LEP population and likelihood of having a translated website to a non-Spanish language (OR: 0.85, CI: 0.71–1.04, P=0.131) (Table 3). However, in bivariate analysis, we found significantly higher language-specific LEP populations in counties with translated homepages for Chinese, Russian, Portuguese, and Tagalog (Table 4).






To our knowledge, this is the first study to explore the language availability of CHC websites in the United States. Our study revealed that two thirds of federally qualified HRSA-supported CHCs did not offer translations of their homepage content. Those that did provide translation relied about equally on machine translation (ie, Google Translate) and manual translation, although the quantity of translated content varied.

Our findings are consistent with the review by Londra et al29 of the language availability of reproductive endocrinology and infertility practice websites. They reported that only 27% of reproductive endocrinology and infertility practices offered some translation, mostly content in Spanish. However, compared with these subspecialty practices, CHCs serve a larger swath of the population, particularly LEP patients; thus, our finding of a lack of language-appropriate online content has broader implications. Because CHCs are where most LEP patients receive their care, these patients are more likely to trust and rely on the information presented by their CHCs. Because of the difficulties in finding trustworthy online health information from other sources, health centers could leverage their LEP patients’ trust to engage them in the digital space.

Despite the low number of CHCs offering translated content, the health centers in counties with higher LEP and Spanish LEP populations were more likely to have a translated website. However, this association did not persist for non-Spanish languages. Although we found evidence of language-specific tailoring for Chinese, Tagalog, Portuguese, and Russian populations, our regression model did not reveal an overall association between non-Spanish language population and likelihood of having a translated homepage. This likely reflects the predominance of Spanish LEP populations nationally but may reveal a lack of tailoring of language content for the specific population each clinic serves.

The languages of the translated homepages align with national language demographics. Spanish was by far the most common language offered, reflecting the nearly two thirds of non-English speakers in the United States who speak Spanish.30 Similarly, Chinese, Tagalog, Vietnamese, French, and Korean, also represented in our sample, are among the most common non-English languages spoken in the United States. We also found Russian, Arabic, Portuguese, and Haitian Creole to be common languages on the homepages, although the use of these languages nationally is not as prevalent.

Machine translation (ie, Google Translate) accounted for about half the translated homepages. Manual translation is an intensive and rigorous process that involves commitment of time and financial resources. Given the limited resources available at CHCs, machine translation provides a translation method that is easier to implement. Although machine translation can be a convenient tool for translating content, it should be used with caution due to its propensity for errors.31 Recently, the Quality and Patient Safety Division of the Massachusetts Board of Registration in Medicine released a statement noting that the use of machine translation may “provide erroneous or nonsensical translations that can lead to patient misunderstanding and potentially compromise patient safety.”32 Although websites may not contain sensitive or high-risk information that could present harm to patients if mistranslated, machine translation should aid in the developing of language-appropriate content, rather than be the definitive solution.

The economic status of the population was significantly associated with providing translated homepage content. Health centers in counties with low employment and persistent poverty were less likely to have a translated homepage. CHCs may view the cost of device ownership as well as Internet access as prohibitive to their LEP patients. Although this is consistent with remaining digital divides, these gaps continue to narrow and patients may begin to look for health information from their health centers.33 Notably, the educational level of the population did not affect the likelihood of having translated content. Given the association between Internet use and literacy, we had expected areas with low educational attainment to have a lower likelihood of having translated content.34 This nonsignificant result may reflect a focus on the language preference of the population, but not necessarily the literacy level of the population, which may vary in both English and their preferred language.35,36

Our findings also mirror the effect of the urban and rural digital divide.37,38 CHCs located in metropolitan areas had 64% higher odds of having a translated homepage than their nonmetropolitan counterparts. However, Internet use is rising even among rural populations, reaching 81% in 2016, up from 41% in 2000.13 The Internet and health technology tools (eg, telehealth) continue to provide a platform to bridge some of the physical barriers to accessing health care in rural areas. LEP patients stand to gain from these advances, but the lack of language-appropriate online content may hinder their progress.

Our study has limitations. We acknowledge that translation is only the first step in creating accessible content for LEP patients. We specifically focused on homepages since these are the website entry points and did not review the rest of the site. Our assessment did not include specific content on the translated homepages, including the type of content offered, the quality of translation, or the literacy level. Furthermore, web scraping can lead to misidentification of translation status since we rely on predetermined identifiers and keywords. We attempted to attenuate this limitation with direct review. In addition, although we captured Internet access through broadband adoption, we have not accounted for the impact of mobile Internet adoption. Although county-level demographics serve as proxies for health center population demographics, they do not provide specific information reflecting the population the health center serves. This is a cross-sectional study and further research is needed to explore changes in language availability of CHC websites over time.


Despite the increased Internet use among LEP patients, as well as the linguistic diversity of the CHC populations, there is an overall lack of language-appropriate content on CHC websites. Our study provides insights into the shift from the digital divide, which reflected a lack of Internet access, to digital inequalities arising from a lack of content and design. By developing online content that is culturally and linguistically appropriate for patients, we might be able to see the benefits of informing and empowering LEP patients through technology and subsequently improve equity in health care.


The authors thank Daniel E. Singer, MD; Yuri Quintana, PhD; and Charles Safran, MD for their feedback on the manuscript.


1. Health IT. Legislation and regulation. Available at: Accessed November 16, 2017.
2. James J. Health policy brief: patient engagement. Health Aff. 2013:1–5. Available at: Accessed June 21, 2017.
3. López L, Green AR, Tan-McGrory A, et al. Bridging the digital divide in health care: the role of health information technology in addressing racial and ethnic disparities. Jt Comm J Qual patient Saf Jt Comm Resour. 2011;37:437–445.
4. National Telecommunications and Information Administration. Falling through the net: defining the digital divide. 1999. Available at: Accessed April 26, 2017.
5. Perrin A, Anderson M. 13% of Americans don’t use the internet. Who are they? Pew Research Center. 2016. Available at: Accessed May 17, 2017.
6. Limited English Proficiency (LEP): A federal interagency website. Available at: Accessed May 17, 2017.
7. Pandya C, McHugh M, Batalova J. Limited english proficient individuals in the United States: number, share, growth, and linguistic diversity. LEP data brief. Migr Policy Inst. 2011:1–12. Available at: Accessed May 17, 2017.
8. Flores G. Language barriers to health care in the United States. N Engl J Med. 2006;355:229–231.
9. Kim EJ, Kim T, Paasche-Orlow MK, et al. Disparities in hypertension associated with limited english proficiency. J Gen Intern Med. 2017;632–639.
10. Dubard CA, Gizlice Z. Language spoken and differences in health status, access to care, and receipt of preventive services among US hispanics. Am J Public Health. 2008;98:2021–2028.
11. Karliner LS, Kim SE, Meltzer DO, et al. Influence of language barriers on outcomes of hospital care for general medicine inpatients. J Hosp Med. 2010;5:276–282.
12. Brown A, López G, Lopez MH. Digital Divide Narrows for Latinos as More Spanish Speakers and Immigrants Go Online. 2016. Available at: Accessed November 16, 2017.
13. Pew Research Center Demographics of Internet and Home Broadband Usage in the United States. 2017. Available at: Accessed May 17, 2017.
14. Berland GK, Elliott MN, Morales LS, et al. Health information on the internet: accessibility, quality, and readability in English and Spanish. JAMA. 2001;285:2612–2621.
15. Vanderpool RC, Kornfeld J, Finney Rutten L, et al. Cancer information-seeking experiences: the implications of hispanic ethnicity and Spanish language cancer information-seeking experiences. J Cancer Educ. 2009;24:141–147.
16. Dimaggio P, Hargittai E. From the “Digital Divide” to “Digital Inequality”: Studying Internet Use as Penetration Increases; 2001. Available at: Accessed June 21, 2017.
17. McCloud RF, Okechukwu CA, Sorensen G, et al. Beyond access: barriers to internet health information seeking among the urban poor. J Am Med Informatics Assoc. 2016;23:1053–1059.
18. Viswanath K, Mccloud R, Minsky S, et al. Internet use, browsing, and the urban poor: implications for cancer control. J Natl Cancer Inst Monogr. 2013;47:199–205.
19. Health Resources & Services Administration. Available at: Accessed May 17, 2017.
20. HRSA Demographic Characteristics. 2016. Available at: Accessed July 13, 2018.
21. Health Center Service Delivery and Look-Alike Sites. Available at: Accessed May 17, 2017.
22. Google Translate. Languages. Available at: Accessed May 17, 2017.
23. Scrapy. A fast and powerful scraping and web crawling framework. Available at: Accessed May 17, 2017.
24. Ryan C. Language Use in the United States: 2011. 2013. Available at: Accessed May 17, 2017.
25. US Census Bureau. American Community Survey 2011-2015. DP02_0113PE. Available at: Accessed January 4, 2017.
26. US Census Bureau. American Community Survey 2011-2015. DP03_0062E. Available at: Accessed January 4, 2017.
27. US Census Bureau. American Community Survey 2015. S2801_C02_010E. Available at: Accessed January 4, 2017.
28. Health Resources & Services Administration. Area health resources files. Available at: Accessed May 17, 2017.
29. Londra LC, Tobler KJ, Omurtag KR, et al. Spanish language content on reproductive endocrinology and infertility practice websites. Fertil Steril. 2014;102:1371–1376.
30. Census. New Census Bureau interactive map shows languages spoken in America. 2013. Available at: Accessed May 17, 2017.
31. Patil S, Davies P. Use of Google Translate in medical communication: evaluation of accuracy. BMJ. 2014;349:g7392.
32. Commonwealth of Massachusetts Board of Registration in Medicine, Division Q and PS. Clinical Translation Advisory. 2016. Available at: Accessed September 8, 2016.
33. Pew Research Center. Home Broadband 2015. Available at: Accessed May 17, 2017.
34. Mackert M, Mabry-Flynn A, Champlin S, et al. Health literacy and health information technology adoption: the potential for a new digital divide. J Med Internet Res. 2016;18:1–16.
35. De Alba A, Britigan DH, Lyden E, et al. Assessing health literacy levels of Spanish-speaking Hispanic patients in Spanish at federally qualified health centers (FQHCs) in the Midwest. J Health Care Poor Underserved. 2016;27:1726–1732.
36. Rojas-Guyler L, Britigan DH, Murnan J, et al. Measuring english linguistic proficiency and functional health literacy levels in two languages: implications for reaching Latino immigrants. Heal Educ. 2013;45:2–12.
37. National Telecommunications and Information Administration. The State of the Urban/Rural Digital Divide. 2016. Available at: Accessed May 26, 2017.
38. Pew Research Center. Digital gap between rural and nonrural America persists. 2017. Available at: Accessed May 26, 2017.

Key Words: limited English proficiency; health information technology; health communication; disparities; cultural competency

Copyright © 2019 Wolters Kluwer Health, Inc. All rights reserved.