Culturally competent care refers to the capacity of health care providers at various levels to engage with patients in a safe, patient- and family-centered, evidence-based, and equitable manner.1 Given the increasing size of the Spanish-speaking Hispanic population in the United States,2 the importance of delivering culturally competent care to this population,3–8 and the importance of the patient’s perspective,9 it seems self-evident that stakeholders need reliable and valid measures of patients’ experiences with culturally competent care. Yet, until recently, few tools existed to do this, especially for Spanish-speaking patients.
In response, a team of investigators developed a new measure of patients’ experiences with culturally competent care.10 The Consumer Assessment of Healthcare Providers and Systems (CAHPS) Cultural Competence Survey (CAHPS-CC) assesses 8 domains of culturally competent care: doctor communication-positive behaviors; doctor communication-negative behaviors; doctor communication-health promotion; doctor communication-alternative medicine; shared decision making; equitable treatment; trust; and access to interpreter services. Another paper supports the reliability and validity of this survey among patients generally.10 However, research has not yet examined whether responses to the CAHPS-CC item set provide equally reliable and valid measurement for patients responding in English and in Spanish.
Measurement bias refers to the possibility that 2 people who have had equivalent experiences with culturally competent care nevertheless answer questions about those experiences differently depending on whether they respond in English or Spanish.11 Without establishing measurement equivalence, the field cannot discern whether differences in reports of care between English and Spanish speakers reflect different care experiences or differences in the way the groups respond. Multiple group confirmatory factor analysis (MG-CFA) provides a powerful method for evaluating such bias.12–14 We therefore used MG-CFA to examine potential measurement bias on the CAHPS-CC across the English and Spanish survey versions.
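For readers less familiar with MG-CFA for ordinal survey items, the idea can be written as a probit threshold model. The block below is a standard textbook formulation in the spirit of Millsap and Yun-Tein,20 using our own notation (item j, respondent i, language group g, response categories c = 1, …, C, one factor per item); it omits the identification constraints and is not a transcription of the study’s Mplus setup.

```latex
% Latent response for item j, respondent i, language group g
y^{*}_{ijg} = \lambda_{jg}\,\eta_{ig} + \varepsilon_{ijg}

% Observed category c is produced when y* falls between adjacent thresholds
y_{ijg} = c \iff \tau_{j,c-1,g} < y^{*}_{ijg} \le \tau_{j,c,g},
\qquad \tau_{j,0,g} = -\infty,\quad \tau_{j,C,g} = +\infty

% No measurement bias: given the latent factor, responses do not depend on language
P(y_{ij} \mid \eta_i,\, g) = P(y_{ij} \mid \eta_i) \quad \text{for every } g

% Nested sequence of models compared with difference tests
\text{Model 1: same factor pattern; all } \lambda_{jg},\ \tau_{j,c,g} \text{ free}
\text{Model 2: } \lambda_{j,\mathrm{Eng}} = \lambda_{j,\mathrm{Spa}} \text{ for all } j
\text{Model 3: Model 2 plus } \tau_{j,c,\mathrm{Eng}} = \tau_{j,c,\mathrm{Spa}} \text{ for all } j, c
```

Bias in the loadings shows up as misfit when moving from model 1 to model 2; bias in the thresholds shows up as misfit when moving from model 2 to model 3.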
Using administrative data provided by the states’ health plans, we used a stratified random sampling design (stratified by race/ethnicity and language) to select 3200 adults in New York and 2800 adults in California (18–65 y old). The initial sampling frame consisted of 1200 white English speakers, 1200 black English speakers, 900 Hispanic English speakers, 900 Hispanic Spanish speakers, 900 Asian English speakers, and 900 bilingual Asian speakers. All communications with the bilingual Asian group occurred in English, because cost restrictions and the number of Asian languages prevented us from developing separate surveys in each Asian language.
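A minimal sketch of a stratified random draw of this kind, assuming a hypothetical enrollment file in which every enrollee carries a race/ethnicity-by-language stratum label. It reproduces only the stratum sizes of the initial frame; the New York (3200) versus California (2800) split is not modeled because the per-state allocation within strata is not reported here. The names `STRATUM_SIZES` and `stratified_sample` are illustrative, not part of the study.

```python
import random
from collections import Counter

# Initial sampling frame by stratum (race/ethnicity x language), from the text.
STRATUM_SIZES = {
    ("white", "English"): 1200,
    ("black", "English"): 1200,
    ("Hispanic", "English"): 900,
    ("Hispanic", "Spanish"): 900,
    ("Asian", "English"): 900,
    ("Asian", "bilingual"): 900,
}


def stratified_sample(frame, sizes, seed=None):
    """Draw a simple random sample of fixed size independently within each stratum.

    frame: list of (person_id, stratum) pairs (hypothetical enrollment-file layout).
    sizes: dict mapping stratum -> number of enrollees to draw.
    """
    rng = random.Random(seed)
    pools = {}
    for person_id, stratum in frame:
        pools.setdefault(stratum, []).append(person_id)
    sample = []
    for stratum, n in sizes.items():
        pool = pools.get(stratum, [])
        if len(pool) < n:
            raise ValueError(f"stratum {stratum} has {len(pool)} enrollees; {n} needed")
        # Sampling without replacement inside the stratum.
        sample.extend((pid, stratum) for pid in rng.sample(pool, n))
    return sample


# Synthetic frame: 5000 hypothetical enrollees per stratum.
frame = [(f"{race}-{lang}-{i}", (race, lang))
         for (race, lang) in STRATUM_SIZES for i in range(5000)]
sample = stratified_sample(frame, STRATUM_SIZES, seed=1)
counts = Counter(stratum for _, stratum in sample)
print(len(sample))  # 6000, matching 3200 (New York) + 2800 (California)
```

Fixing the sample size per stratum, rather than sampling the whole frame at random, is what guarantees enough Hispanic Spanish speakers for a language comparison even when they are a small share of enrollment.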
Data collection occurred in 2 phases: a mail survey, followed by telephone interviews with nonrespondents. The mailing included an English survey and a cover letter in English and Spanish. The letter directed Spanish speakers to call a toll-free 800 number to request a copy of the Spanish survey materials. Four weeks after the initial mailing, nonrespondents received a second mailed survey packet. Telephone follow-ups in English and Spanish started 2 weeks after the second mailing. Nonrespondents remaining after the second call attempt received a $10 incentive to complete the survey. In all, 1380 individuals completed the survey, for an overall response rate of 26%.
We used administrative data to compare respondents and nonrespondents on sex, age, race/ethnicity, primary language, and health plan affiliation. Respondents were more likely to be white (24% vs. 20%) and older (39 vs. 36 y), and less likely to be black (18% vs. 22%). We observed no other significant differences. Because this comparison relied on administrative data, it was limited to the variables those data contained, which may have influenced our conclusions regarding nonresponse bias.
After excluding individuals who did not have a personal doctor or a doctor visit during the last 12 months, the final analytic sample comprised 964 respondents: 851 completed the survey in English and 113 completed it in Spanish (see Weech-Maldonado et al10 for further methodological details).
The CAHPS Cultural Comparability team developed the CAHPS-CC by: (1) evaluating existing CAHPS surveys to identify items that addressed the domains; (2) conducting a literature review to identify existing items and instruments; (3) placing a call for measures in the Federal Register; (4) reviewing and adapting existing public domain measures; and (5) writing new survey items for each domain not addressed by steps 1 through 4. This process resulted in a 49-item draft set. Two independent translators certified by the American Translators Association then each produced a forward translation of the items into Spanish. A committee formed by the 2 translators and bilingual members of the Comparability team reviewed the translations and reconciled differences. Cognitive interviews15 were then conducted on the translated items. Finally, psychometric analyses evaluated the CAHPS-CC in the overall sample.10,16,17
At the end of development, the CAHPS-CC consisted of 27 items addressing the extent to which an experience had occurred (rather than evaluating the experience). The items measured 8 constructs: doctor communication-positive behaviors, doctor communication-negative behaviors, doctor communication-health promotion, doctor communication-alternative medicine, shared decision making, equitable treatment, trust, and access to interpreter services. The entire item set is available at https://www.cahps.ahrq.gov/clinician_group/.
Our analyses addressed only the doctor communication-positive behaviors, doctor communication-negative behaviors, doctor communication-health promotion, trust, and equitable treatment domains. Too few respondents used interpreters to provide a sample large enough to evaluate the access to interpreter services domain. In addition, some bivariate frequencies equal to 0 prevented us from estimating the polychoric correlation matrix when all of the remaining items were included. These “empty” cells resulted from sparse responses in some items’ categories.18 Consistent with the literature, we merged categories for polytomous items with this problem19 and dropped dichotomous items with this problem. This resolved all estimation problems but limited our analyses to the 5 factors listed above (19 items in total).
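The empty-cell check and category merging described above can be sketched as follows. The data layout, item names, and merge rule (folding a sparse category into its neighbor) are hypothetical illustrations; the article does not report which categories were merged, and real analyses would be run on the full item set before fitting the polychoric model.

```python
from collections import Counter
from itertools import combinations


def empty_cells(responses, items):
    """List item pairs whose bivariate frequency table contains a zero cell.

    responses: list of dicts mapping item name -> ordinal category (int);
    unanswered items are simply absent. Tables use pairwise-complete cases
    and the categories observed in those cases.
    """
    problems = []
    for a, b in combinations(items, 2):
        pairs = [(r[a], r[b]) for r in responses if a in r and b in r]
        counts = Counter(pairs)
        cats_a = sorted({x for x, _ in pairs})
        cats_b = sorted({y for _, y in pairs})
        zeros = [(x, y) for x in cats_a for y in cats_b if counts[(x, y)] == 0]
        if zeros:
            problems.append((a, b, zeros))
    return problems


def merge_categories(responses, item, mapping):
    """Recode a polytomous item in place, e.g. {3: 2} folds category 3 into 2."""
    for r in responses:
        if item in r:
            r[item] = mapping.get(r[item], r[item])


# Hypothetical data: q1 has 3 categories, q2 is dichotomous; q1 = 3 is sparse.
data = [{"q1": 1, "q2": 1}, {"q1": 1, "q2": 2}, {"q1": 2, "q2": 1},
        {"q1": 2, "q2": 2}, {"q1": 3, "q2": 2}]
before = empty_cells(data, ["q1", "q2"])  # zero cell at (q1=3, q2=1)
merge_categories(data, "q1", {3: 2})      # merge the sparse top category downward
after = empty_cells(data, ["q1", "q2"])   # no zero cells remain
```

If the zero cell had instead come from a sparse category of a dichotomous item, there would be no neighboring category left to merge into, which is why such items were dropped rather than recoded.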
We probed for measurement bias following the method described by Millsap and Yun-Tein20 and Carle.12 To evaluate overall fit, we used fit index levels identified in the literature,21,22 judging fit on the set of indices as a whole rather than on any single index. We used the χ2 difference test (Δχ2) to test for bias. After identifying bias with this omnibus test, we used item-level comparisons to locate its source.12 All analyses used Mplus (version 6.1),18 with its theta parameterization, robust weighted least squares estimator, and missing data estimation capability. Given the number of model tests, and consistent with the literature, we used a more conservative α of 0.01 for all significance tests.12 Following Carle,12 we evaluated the impact of bias on substantive conclusions by comparing the pattern and size of mean differences from a model ignoring measurement bias with those from a model incorporating it.
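The decision rules in this paragraph can be sketched as follows. The cutoffs are an assumption: the article does not list its exact values, so the sketch uses Hu and Bentler’s conventional guidance (RMSEA at or below 0.06, CFI and TLI at or above 0.95). The χ2 tail probability is computed with a closed-form series for integer degrees of freedom so that only the standard library is needed. One caveat: with robust weighted least squares estimation, Mplus does not obtain Δχ2 by subtracting the 2 models’ χ2 values but through its DIFFTEST option, so this sketch assumes an already-adjusted Δχ2 and df as input and may not reproduce published P values exactly. All function names are hypothetical.

```python
import math

# Assumed cutoffs (Hu and Bentler, 1999); hard pass/fail use is a simplification.
CUTOFFS = {"rmsea": 0.06, "cfi": 0.95, "tli": 0.95}
ALPHA = 0.01  # conservative alpha used for all significance tests


def fit_ok(rmsea, cfi, tli, cutoffs=CUTOFFS):
    """Check a model's fit-index set against the cutoffs; all must pass."""
    checks = {
        "rmsea": rmsea <= cutoffs["rmsea"],
        "cfi": cfi >= cutoffs["cfi"],
        "tli": tli >= cutoffs["tli"],
    }
    return all(checks.values()), checks


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square with integer df >= 1.

    Closed-form series for integer df (Abramowitz and Stegun 26.4.4 and 26.4.5).
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    df = int(df)
    if df % 2 == 0:
        # exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term = total = 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # Odd df: twice the normal upper tail at sqrt(x), plus a finite series.
    chi = math.sqrt(x)
    pdf = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(chi / math.sqrt(2)) + 2 * pdf * total


def nested_test(delta_chi2, delta_df, alpha=ALPHA):
    """Return (p, significant); significant means the equality constraints add misfit."""
    p = chi2_sf(delta_chi2, delta_df)
    return p, p < alpha


print(fit_ok(0.05, 0.98, 0.98)[0])  # model 1 indices from the Results: True
print(nested_test(138.6, 34))       # threshold constraints (model 3)
```

Testing the constrained model against the less constrained one, rather than judging each model’s fit alone, is what makes the comparison specific to the equality constraints being added.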
Table 1 shows the analytic sample’s descriptive statistics.
Evaluating Measurement Bias
We initially tested the fit of a 5-factor model (model 1) across the English and Spanish groups. This model fit well (root mean square error of approximation=0.05, Comparative Fit Index=0.98, Tucker-Lewis Index=0.98). We then tested model 2, which constrained the loadings to equality across groups. This model also fit well (root mean square error of approximation=0.04, Comparative Fit Index=0.99, Tucker-Lewis Index=0.99), and the constraints did not produce statistically significant misfit (Δχ2=12.7, df=13; n=633; P=0.23), indicating no statistically significant bias in the loadings. We next examined bias in the thresholds. A threshold marks the level of the latent variable at which a respondent becomes more likely than not to respond in a given category or higher. Model 3 constrained the thresholds to equality across groups. These constraints produced statistically significant misfit (Δχ2=138.6, df=34; n=964; P<0.01), revealing bias in at least 1 threshold. Follow-up analyses localized the bias to the thresholds of the “easy to understand instructions” item. The final partially invariant model relaxed these ill-fitting constraints. In summary, we found no differences in the loadings and differences in only 1 item’s thresholds. Table 2 shows the final partially invariant measurement model.
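What threshold bias means in practice can be illustrated with a small probit threshold model. All parameter values below are hypothetical (a 4-category item with equal loadings in both languages but a lower top threshold for Spanish respondents); they are not the estimates in Table 2, and the residual variance of the latent response is simply fixed at 1 for the illustration.

```python
import math


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def category_probs(eta, loading, thresholds):
    """Category probabilities for one ordinal item under a probit threshold model.

    Latent response y* = loading * eta + e with e ~ N(0, 1). The response falls
    in category k + 1 when cuts[k] < y* <= cuts[k + 1], where cuts pads the
    thresholds with -inf and +inf.
    """
    cuts = [-math.inf] + list(thresholds) + [math.inf]
    mean = loading * eta
    return [phi(cuts[k + 1] - mean) - phi(cuts[k] - mean) for k in range(len(cuts) - 1)]


def expected_score(probs):
    """Expected item score with categories coded 1..C."""
    return sum((k + 1) * p for k, p in enumerate(probs))


# Hypothetical parameters: same loading, different top threshold (threshold bias).
LOADING = 1.2
ENGLISH_THRESHOLDS = [-1.0, 0.0, 1.0]
SPANISH_THRESHOLDS = [-1.0, 0.0, 0.5]

for eta in (-1.0, 0.0, 1.0):
    eng = expected_score(category_probs(eta, LOADING, ENGLISH_THRESHOLDS))
    spa = expected_score(category_probs(eta, LOADING, SPANISH_THRESHOLDS))
    print(f"eta={eta:+.1f}: English {eng:.2f}, Spanish {spa:.2f}")
```

At every latent level the hypothetical Spanish respondent has a higher expected item score, even though the 2 respondents have identical care experiences (the same η); that gap is exactly what a threshold constraint test is designed to detect.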
Evaluating the Influence of Measurement Bias
Statistically significant measurement bias may not substantively influence scores.23,24 To evaluate the influence of bias, we compared model-based estimates from the final partially invariant measurement model, which incorporates the measurement differences, with estimates from a model ignoring bias. A change in the pattern of mean differences across the 2 models would indicate that bias influences substantive conclusions. For statistical identification, English respondents had a mean of 0 on each factor. Thus, we could first evaluate whether each factor mean for Spanish respondents differed from that of English respondents by examining whether it differed significantly from 0. If we observed differences, we could then examine changes (if any) in these differences across the models. None of the means for Spanish respondents (doctor communication-positive=−0.062, z=−0.565; doctor communication-negative=−0.052, z=−0.278; health promotion=−0.092, z=−0.609; trust=0.234, z=2.107; and equitable treatment=−0.137, z=−0.393) differed significantly from those of English respondents at α=0.01. Under the model adjusting for bias, we also observed no statistically significant mean differences, supporting the conclusion that bias does not substantially influence mean-based conclusions for these factors.
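The significance decisions for the reported factor means can be checked directly from the z values, assuming two-sided tests against a standard normal reference. The sketch uses only the values reported above and the study’s α of 0.01; the dictionary layout and function name are illustrative.

```python
import math

ALPHA = 0.01  # conservative alpha used throughout the study

# Spanish-respondent factor means (English respondents fixed at 0) and z values.
reported = {
    "doctor communication-positive": (-0.062, -0.565),
    "doctor communication-negative": (-0.052, -0.278),
    "health promotion": (-0.092, -0.609),
    "trust": (0.234, 2.107),
    "equitable treatment": (-0.137, -0.393),
}


def two_sided_p(z):
    """Two-sided p value for a standard normal test statistic: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2))


for factor, (mean, z) in reported.items():
    p = two_sided_p(z)
    print(f"{factor}: mean={mean:+.3f}, z={z:+.3f}, p={p:.3f}, significant={p < ALPHA}")
```

The trust difference (two-sided p of roughly 0.035) would count as significant at the conventional 0.05 level, so the conservative α is what keeps it from being flagged; all other factors are far from either cutoff.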
In this study, we investigated whether the CAHPS-CC provides sufficiently equivalent measurement for individuals responding in English and Spanish. Despite best efforts at survey translation, the possibility exists that 2 people with equivalent cultural competence experiences who answered the CAHPS-CC in different languages responded to questions about their experiences differently. Our results indicate that the CAHPS-CC has sufficiently equivalent measurement properties across English and Spanish respondents for the domains included in our analyses.
We used MG-CFA to probe for bias across languages (Spanish and English) in a sample of Medicaid patients in New York and California. Although we found some statistically significant measurement bias, further analyses demonstrated that it did not influence mean-based comparative conclusions across languages when using the CAHPS-CC. These findings highlight the importance of evaluating both whether measurement bias exists and whether any statistically significant bias substantively influences decisions.
These findings support the use of the CAHPS-CC to measure the experiences of Spanish- and English-speaking patients with culturally competent care. For the domains studied, scores reflect the underlying CAHPS-CC constructs similarly whether patients answer in Spanish or English. Patients’ reports should be similarly reliable in either language, and mean-based estimates should correspond to similar levels of each domain for English and Spanish respondents.
Before concluding, we note some study limitations. Because of sparse categories and relatively small within-group sample sizes, we had to merge some item categories and drop 3 domains (shared decision making, alternative medicine, and access to interpreter services). Therefore, we could not examine bias in the full set of thresholds or for all of the domains. In addition, our data came from a sample of Medicaid managed care enrollees in 2 states, and the New York and California Medicaid populations may not represent the full Medicaid population. Because we investigated bias across language only among Medicaid patients, our findings may not generalize to other populations. Similarly, we did not have measures of other potentially relevant variables (eg, income and language ability) that might have influenced our results. Moreover, because of sample size restrictions, we could not further split our groups to examine additional variables for which we did have measures (eg, race and ethnicity). Future research in larger, more diverse samples can address these issues.
In summary, we used MG-CFA to examine whether measurement bias related to survey language (Spanish or English) influences conclusions drawn from patients’ responses to the CAHPS-CC. Although we found some statistically significant measurement bias, our analyses demonstrated that this bias does not substantively influence mean-based conclusions. CAHPS-CC users can have confidence in comparisons of the cultural competence experiences of English and Spanish speakers on the studied domains.
Adam Carle thanks Tara J. Carle and Lyla S. B. Carle whose thoughtful comments and unending support make his work possible.
1. National Quality Forum. Endorsing a Framework and Preferred Practices for Measuring and Reporting Culturally Competent Care Quality. Washington, DC: National Quality Forum; 2008
3. Lambert BL, Street RL, Cegala DJ, et al. Provider-patient communication, patient-centered care, and the mangle of practice. Health Commun. 1997;9:27–43
4. McWhinney I. The need for a transformed clinical method. In: Stewart M, Roter D, eds. Communicating With Medical Patients. London: Sage; 1989:25–40
5. Ngo-Metzger Q, Telfair J, Sorkin D, et al. Cultural Competency and Quality of Care: Obtaining the Patient’s Perspective. New York, NY: Commonwealth Fund; 2006
6. Weech-Maldonado R, Morales LS, Elliott M, et al. Race/ethnicity, language, and patients’ assessments of care in Medicaid managed care. Health Serv Res. 2003;38:789–808
7. Weech-Maldonado R, Dreachslin J, Dansky K, et al. Racial/ethnic diversity management and cultural competency: the case of Pennsylvania hospitals. J Healthc Manag. 2002;47:111
8. Nápoles-Springer AM, Santoyo J, Houston K, et al. Patients’ perceptions of cultural factors affecting the quality of their medical encounters. Health Expect. 2005;8:4–17
9. Stewart AL, Nápoles-Springer AM. Advancing health disparities research: can we afford to ignore measurement issues? Med Care. 2003;41:1207–1220
10. Weech-Maldonado R, Carle AC, Weidmer B, et al. Assessing cultural competency from the patient’s perspective: The CAHPS Cultural Competency (CC) Item Set. Working Paper: Department of Health Services Administration, University of Alabama at Birmingham; 2010
11. Mellenbergh GJ. Item bias and item response theory. Int J Educ Res. 1989;13:127–143
12. Carle A. Mitigating systematic measurement error in comparative effectiveness research in heterogeneous populations. Med Care. 2010;48:S68–S74
13. Carle AC. Assessing the adequacy of self-reported alcohol abuse measurement across time and ethnicity: cross-cultural equivalence across Hispanics and Caucasians in 1992, non-equivalence in 2001–2002. BMC Public Health. 2009;9:60
14. Carle AC. Tolerating inadequate alcohol dependence measurement: cross-cultural invalidity of alcohol dependence across Hispanics and Caucasians in 2001 and 2002. Addict Behav. 2009;34:43–50
15. Willis G. Cognitive Interviewing: A Tool for Improving Questionnaire Design. Thousand Oaks, CA: Sage Publications; 2005
16. Weech-Maldonado R, Carle AC, Weidmer B, et al. The Consumer Assessment of Healthcare Providers and Systems (CAHPS) Cultural Competence (CC) Item Set. Med Care. 2012;50(suppl 2):S22–S31
17. Carle AC, Weech-Maldonado R. Evaluating measurement equivalence across race and ethnicity on the CAHPS Cultural Competence Survey. Med Care. 2012;50(suppl 2):S32–S36
18. Muthén LK, Muthén BO. Mplus User’s Guide. Los Angeles, CA: Muthén & Muthén; 2009
19. Crane PK, Gibbons LE, Jolley L, et al. Differential item functioning analysis with ordinal logistic regression techniques: DIFdetect and difwithpar. Med Care. 2006;44(suppl 3):S115–S123
20. Millsap RE, Yun-Tein J. Assessing factorial invariance in ordered-categorical measures. Multivariate Behav Res. 2004;39:479–515
21. Hu L, Bentler PM. Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. Struct Equ Modeling. 1999;6:1–55
22. Hu L, Bentler PM. Fit indices in covariance structure modeling: sensitivity to underparameterized model misspecification. Psychol Methods. 1998;3:424–453
23. Millsap RE. Statistical Approaches to Measurement Invariance. New York, NY: Routledge; 2011
24. Millsap RE, Kwok O-M. Evaluating the impact of partial factorial invariance on selection in two populations. Psychol Methods. 2004;9:93–115