Culturally competent medical care has the potential to reduce racial and ethnic disparities in patients’ experiences with their medical care.1 Though multiple definitions exist,2 culturally competent care refers to the capacity of health care providers at various levels to engage with patients in a safe, patient-centered and family-centered, evidence-based, and equitable manner.3 Yet, until recently, few tools existed to measure cultural competency.
The Consumer Assessment of Healthcare Providers and Systems (CAHPS) Cultural Competence Survey (CC) assesses 8 aspects of culturally competent care: Doctor Communication—Positive Behaviors, Doctor Communication—Negative Behaviors, Doctor Communication—Health Promotion, Doctor Communication—Alternative Medicine, Shared Decision Making, Equitable Treatment, Trust, and Access to Interpreter Services. Another paper provides support for the reliability and validity of this survey.4 However, research has not yet examined whether the CAHPS-CC item set provides equivalently reliable and valid measurement across patients with different racial and ethnic backgrounds.
Measurement bias refers to the possibility that 2 people who have had equivalent experiences with culturally competent care will nevertheless answer questions about their experiences differently based on some characteristic such as their race or ethnicity.5 They should respond similarly, but they do not. Without establishing equivalent measurement, the field cannot discern whether differences in reports and ratings of care between subgroups result from different care experiences or differences in the way the groups interpret or respond to the survey.6,7 In this study, we used multiple group confirmatory factor analysis (MG-CFA)6,8–10 to examine measurement bias on the CAHPS-CC.
Participants came from a field test of the CAHPS-CC conducted in 2008 among a stratified random sample (based on race/ethnicity and language) of 6000 adult (age, 18 to 64 y) Medicaid (a US health program for individuals with low incomes and resources) managed care enrollees in 2 health plans: New York (3200) and California (2800). The initial sampling frame consisted of: 1200 white English speakers, 1200 black English speakers, 900 Hispanic English speakers, 900 Hispanic Spanish speakers, 900 Asian English speakers, and 900 Asian non-English speakers.
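The stratified design described above can be sketched in a few lines. This is a minimal illustration only, assuming simple random sampling within each stratum; the enrollee frame, the stratum labels, and the function name are hypothetical, and only the stratum sizes come from the text.

```python
import random

# Stratum sizes as reported in the text (they sum to 6000).
STRATA = {
    "white_english": 1200,
    "black_english": 1200,
    "hispanic_english": 900,
    "hispanic_spanish": 900,
    "asian_english": 900,
    "asian_non_english": 900,
}


def draw_stratified_sample(frame, strata_sizes, seed=0):
    """Draw a simple random sample of the requested size within each stratum.

    `frame` maps a stratum name to a list of enrollee IDs (hypothetical layout).
    """
    rng = random.Random(seed)
    sample = {}
    for stratum, n in strata_sizes.items():
        members = frame[stratum]
        if len(members) < n:
            raise ValueError(f"stratum {stratum!r} has only {len(members)} members")
        # Sampling without replacement inside the stratum.
        sample[stratum] = rng.sample(members, n)
    return sample


# Hypothetical enrollee frame: 5000 synthetic IDs per stratum.
frame = {name: [f"{name}-{i}" for i in range(5000)] for name in STRATA}
sample = draw_stratified_sample(frame, STRATA)
total = sum(len(ids) for ids in sample.values())
print(total)
```

Fixing the per-stratum sizes in advance is what produces the oversampling of smaller racial, ethnic, and language groups relative to their share of enrollees.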
Data collection consisted of a 2-wave mailing with follow-up telephone interview of nonrespondents. The first mailing included an English survey and a cover letter in English and Spanish. The letter directed Spanish speakers to call an 800 number to request the Spanish survey materials (13% mail response rate; n=722). Four weeks after the initial mailing, nonrespondents received a second mailed survey packet. Telephone follow-ups (English and Spanish) started 2 weeks after the second mailing. We offered a $10 monetary incentive to nonrespondents remaining after the second call (14% phone response rate; n=489). These steps resulted in a 26% response rate overall (n=1380).
Using administrative data, we compared responders and nonresponders on the basis of sex, age, race/ethnicity, primary language, and health plan. Respondents were more likely to be white (24% vs. 20%) and older (39 vs. 36 y on average), and less likely to be black (18% vs. 22%). We observed no other significant differences. After excluding individuals without a personal doctor or a doctor visit during the last 12 months, the final analytic sample comprised 991 respondents: 146 non-Hispanic white (hereafter white), 148 non-Hispanic black (hereafter black), 339 Hispanic, 173 Asian, 182 Other Race/Ethnicity, and 3 Missing Race/Ethnicity.
Among the Asian subgroup, item responses showed too little variation, producing a large number of bivariate frequencies of zero. This in turn led to an inestimable model for this group. Thus, we excluded Asians from the analysis. We also excluded Other Race/Ethnicity individuals from our analyses, given the heterogeneity of racial groups that this category captured. Relatedly, because of the small sample sizes of the groups constituting the “Other” category, we could not include each of these groups separately. Thus, we examined measurement bias across white, black, and Hispanic individuals only.
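To make the idea of zero bivariate frequencies concrete, the sketch below cross-tabulates two ordinal items within one group and lists the empty cells. It also shows category merging, one common remedy for sparse cells. The responses, the 1 to 4 response scale, and the function names are hypothetical and are not taken from the study data.

```python
from collections import Counter
from itertools import product


def zero_cells(item_a, item_b, categories):
    """Return the (a, b) category pairs that never occur together."""
    observed = Counter(zip(item_a, item_b))
    return [pair for pair in product(categories, repeat=2) if observed[pair] == 0]


def merge_categories(responses, mapping):
    """Recode responses so that sparse categories share one code."""
    return [mapping[r] for r in responses]


# Hypothetical responses from one group on a 1-4 scale; most answers sit at the top.
item_a = [4, 4, 3, 4, 4, 3, 4, 2, 4, 4]
item_b = [4, 3, 3, 4, 4, 4, 4, 3, 4, 3]

before = zero_cells(item_a, item_b, [1, 2, 3, 4])

# Collapse the three lowest categories into one (coded 3) and re-check.
mapping = {1: 3, 2: 3, 3: 3, 4: 4}
merged_a = merge_categories(item_a, mapping)
merged_b = merge_categories(item_b, mapping)
after = zero_cells(merged_a, merged_b, [3, 4])

print(len(before), len(after))
```

Empty cells matter because the polychoric correlation for an item pair is estimated from this two-way table; when many cells are empty, the estimate becomes unstable or unavailable.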
The CAHPS Cultural Comparability team developed the CAHPS-CC in several steps: (1) evaluating existing CAHPS surveys to identify existing items addressing the domains of interest; (2) conducting a literature review to identify relevant existing instruments or item sets; (3) placing a Federal Register notice with a call for measures; (4) reviewing and adapting publicly available measures; and (5) writing new items for each domain not addressed in steps 1 to 4. This resulted in a 49-item draft set.
Subsequently, 2 independent American Translators Association–certified translators conducted 2 forward translations of the survey into Spanish. A committee formed by the 2 translators and bilingual members of the comparability team reviewed the translations and reconciled any differences. After the translation, cognitive interviews were conducted.11 Lastly, the team conducted psychometric analyses to evaluate the CAHPS-CC in the sample overall.4
At the end of item development, the CAHPS-CC included 27 items. These measured 8 constructs: Doctor Communication—Positive Behaviors, Doctor Communication—Negative Behaviors, Doctor Communication—Health Promotion, Doctor Communication—Alternative Medicine, Shared Decision Making, Equitable Treatment, Trust, and Access to Interpreter Services. Too few individuals used interpreters to create a large enough sample to evaluate the Access to Interpreter Services domain in this analysis. Consequently, our analyses included 23 items.
Race and Ethnicity
Respondents self-reported their race and ethnicity.
We examined measurement invariance following the method described by Millsap and Yun-Tein.12 This method uses a series of nested models with increasing equivalence constraints on the measurement parameters across groups to evaluate measurement bias. We used fit index levels [root mean square error of approximation (RMSEA), comparative fit index (CFI), and Tucker-Lewis Index (TLI)] identified in the literature.13,14 Fit evaluation focused on the set of indices. After identifying bias using omnibus fit criteria, we used item-level comparisons to identify the source of the bias and modify the model accordingly.6 Constraints that led to significantly decreased fit identified measurement bias. We subsequently freed these constraints to develop a partial invariance model that directly modeled measurement bias.
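As a rough illustration of how the three fit indices are computed, the sketch below uses the standard single-group textbook formulas. The chi-square values are hypothetical, and the cutoffs (RMSEA ≤ 0.06, CFI ≥ 0.95, TLI ≥ 0.95) are an assumption based on commonly cited guidelines; the text does not state the exact levels used. Software such as Mplus may apply adjusted chi-square statistics and a multiple-group correction, so its output can differ from these hand calculations.

```python
import math


def rmsea(chi2, df, n):
    """Root mean square error of approximation (single-group form)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_m, df_m, chi2_b, df_b):
    """Comparative fit index: target model (m) against baseline model (b)."""
    d_m = max(chi2_m - df_m, 0.0)
    d_b = max(chi2_b - df_b, d_m)
    return 1.0 if d_b == 0 else 1.0 - d_m / d_b


def tli(chi2_m, df_m, chi2_b, df_b):
    """Tucker-Lewis Index, based on chi-square to df ratios."""
    ratio_b = chi2_b / df_b
    ratio_m = chi2_m / df_m
    return (ratio_b - ratio_m) / (ratio_b - 1.0)


# Hypothetical chi-square values; only n=633 echoes the analytic sample size.
chi2_model, df_model = 150.0, 100
chi2_base, df_base = 5000.0, 120
n = 633

model_rmsea = rmsea(chi2_model, df_model, n)
model_cfi = cfi(chi2_model, df_model, chi2_base, df_base)
model_tli = tli(chi2_model, df_model, chi2_base, df_base)

# Assumed cutoffs, labeled as such; judge the set of indices together.
acceptable = model_rmsea <= 0.06 and model_cfi >= 0.95 and model_tli >= 0.95
print(round(model_rmsea, 3), round(model_cfi, 3), round(model_tli, 3), acceptable)
```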
All analyses used Mplus (6.1)15 with its theta parameterization, robust weighted least squares estimator, and missing data estimation capability. Consistent with the literature, we used a more conservative α of 0.01 for all significance tests, given the number of tested models.6 We evaluated the influence of bias on substantive conclusions by comparing a model ignoring bias to a model incorporating measurement bias, as described by Carle.6
Table 1 shows the descriptive statistics for the analytic sample. A visual comparison of our sample’s demographics with the general Medicaid population indicated generally similar distributions, except for the variables for which we oversampled (eg, race).
Evaluating Measurement Bias
Given the previous research, we initially tested a 7-factor model’s fit (model 1)4 across whites, blacks, and Hispanics. Though we achieved good fit when estimating the model in the sample ignoring group status (RMSEA=0.04; TLI=0.99; CFI=0.91), we encountered problems when attempting to fit the model using MG-CFA. This occurred for several reasons. First, upon splitting the sample into groups, we observed several bivariate frequencies equal to 0, limiting our ability to estimate the polychoric correlation matrix.15 These zeros occurred primarily as a result of sparse responses in some categories and items, so we merged categories for those items.16 This resolved the problem for all but one item (did this doctor use a condescending…tone). Thus, we dropped it from our model. Second, we experienced difficulty fitting the baseline model because 3 of the factors (Shared Decision Making, Equitable Treatment, and Alternative Medicine) each had only 2 indicators, resulting in an unstable model. Thus, we had to drop these factors from our model, resulting in a 4-factor model (Doctor Communication—Positive Behaviors, Doctor Communication—Negative Behaviors, Doctor Communication—Health Promotion, and Trust). The modified baseline model (model 1b) fit well (RMSEA=0.056; CFI=0.99; TLI=0.99). Given good fit, we tested model 2, which constrained the loadings to equality across the groups. These constraints did not result in statistically significant measurement bias (Δχ2=28.73, df=24; n=633; P=0.23).
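The P value for the loading constraints can be reproduced from the reported statistic, reading the 24 as the degrees of freedom. The sketch below uses the closed-form chi-square survival function for even degrees of freedom, so it needs only the standard library; the function name is hypothetical. It only converts the reported Δχ2 into a P value and does not reproduce how the software derives the difference statistic under a robust weighted least squares estimator.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variate with even df.

    For df = 2k: P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2.0
    term = 1.0   # the i = 0 term
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


ALPHA = 0.01  # the significance level used throughout the analyses

p_loadings = chi2_sf_even_df(28.73, 24)
print(round(p_loadings, 2), p_loadings < ALPHA)
```

Because the P value exceeds 0.01, the equality constraints on the loadings are retained.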
Model 3 constrained the thresholds to equality across the groups. Thresholds indicate the level of the latent trait that must be present before (on average) respondents are more likely than not to endorse a given category. Model 3 revealed statistically significant measurement bias in at least 1 threshold (Δχ2=141.72, df=24; n=633; P<0.01). Univariate tests indicated bias in 4 items’ thresholds: “listens carefully,” “spend enough time,” “show respect,” and “easy to understand instructions.” The pattern of bias was sometimes similar and sometimes different across Hispanics and blacks relative to whites (Table 2). The final partially invariant model (see Table 2 for values) relaxed the equality constraints for these 4 items’ thresholds.
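The role of a threshold can be illustrated with a simple probit response function. This is a minimal sketch assuming a unit residual variance, under which a respondent becomes more likely than not to endorse a category once the loading times the latent trait exceeds the threshold. The loading and threshold values are hypothetical and are not the estimates in Table 2.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def prob_at_or_above(eta, loading, threshold):
    """P(response at or above a category | latent trait eta), probit link."""
    return STD_NORMAL.cdf(loading * eta - threshold)


def crossover_level(loading, threshold):
    """Latent trait level at which endorsement probability equals 0.5."""
    return threshold / loading


# Hypothetical values: equal loadings, but a higher threshold in the focal group.
loading = 1.2
tau_reference = 0.6
tau_focal = 0.9

eta = 0.6  # the same underlying experience in both groups
p_reference = prob_at_or_above(eta, loading, tau_reference)
p_focal = prob_at_or_above(eta, loading, tau_focal)

print(round(p_reference, 3), round(p_focal, 3))
print(crossover_level(loading, tau_reference), crossover_level(loading, tau_focal))
```

With the same latent experience, the two groups have different probabilities of endorsing the category. That difference in response at equal trait levels is what threshold bias means here.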
Evaluating the Influence of Measurement Bias
Statistically significant bias does not necessarily indicate that bias would substantively influence conclusions.17 To evaluate the influence of bias, we compared model-based estimates that resulted from the final partially invariant measurement model incorporating measurement differences to estimates that resulted from a model ignoring bias. Any differences in the pattern of mean differences would indicate an influence of bias. For example, whites had a mean of 0 on each factor (for statistical identification). Thus, we could first evaluate whether the means for each factor and group differed from whites by examining whether their means differed significantly from 0. If we observed differences, we could then examine changes (if any) in these differences across the models. Ignoring bias, none of the means for blacks (Doctor Communication—Positive Mblack=0.42, z=1.37; Doctor Communication—Negative Mblack=−0.73, z=−2.37; Health Promotion Mblack=−0.3, z=−1.643; Trust Mblack=−0.15, z=−0.76) or Hispanics (Doctor Communication—Positive MHispanic=0.136, z=0.517; Doctor Communication—Negative MHispanic=−0.24, z=−1.23; Health Promotion MHispanic=−0.14, z=−0.81; Trust MHispanic=0.12, z=0.73) differed significantly from those of whites. Under the model adjusting for bias, blacks’ and Hispanics’ means still did not differ significantly from the means for whites, supporting the hypothesis that bias did not substantively influence mean-based conclusions.
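The significance check on the latent means can be reproduced from the reported z statistics. The sketch below converts each z value to a two-sided P value and compares it with the α of 0.01 used in the analyses. It assumes the z statistics follow a standard normal distribution under the null hypothesis; the dictionary layout and labels are illustrative.

```python
from statistics import NormalDist

ALPHA = 0.01  # significance level stated for all tests

# z statistics reported for the model ignoring bias (whites are the reference, mean 0).
z_values = {
    ("black", "Doctor Communication (Positive)"): 1.37,
    ("black", "Doctor Communication (Negative)"): -2.37,
    ("black", "Health Promotion"): -1.643,
    ("black", "Trust"): -0.76,
    ("Hispanic", "Doctor Communication (Positive)"): 0.517,
    ("Hispanic", "Doctor Communication (Negative)"): -1.23,
    ("Hispanic", "Health Promotion"): -0.81,
    ("Hispanic", "Trust"): 0.73,
}


def two_sided_p(z):
    """Two-sided P value for a standard normal test statistic."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


p_values = {key: two_sided_p(z) for key, z in z_values.items()}
significant = [key for key, p in p_values.items() if p < ALPHA]

for (group, factor), p in p_values.items():
    print(f"{group:9s} {factor:33s} p={p:.3f}")
print(significant)
```

The largest statistic, z=−2.37, gives a two-sided P value of about 0.018. It would be significant at 0.05 but is not at the 0.01 level adopted here.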
In this study, we evaluated whether the CAHPS-CC provides sufficiently equivalent measurement across individuals of different racial and ethnic backgrounds. Our results suggest that it does. We used MG-CFA and probed for bias across whites, blacks, and Hispanics in a sample of Medicaid patients in New York and California. Though we found some statistically significant measurement bias, sensitivity analyses indicated that the observed measurement bias did not influence conclusions. These findings highlight the importance of evaluating both whether measurement bias exists and whether any observed, statistically significant measurement bias has the potential to substantively influence decisions based on the measure’s scores.
These findings provide preliminary support for the use of the CAHPS-CC to measure experiences in culturally competent care across white, black, and Hispanic patients. Scores on the measure correspond to the underlying constructs similarly across groups. Patients’ reports should also have similar reliability. Moreover, while some differences seem to exist in the levels of Doctor Communication—Positive that must be present before black and Hispanic patients will likely endorse some of the categories measuring this construct, these differences do not seem to substantively influence the mean-based conclusions.
Before concluding, we note some limitations. First, due to sparse categories, we had to merge some item categories and exclude some subscales. Therefore, we could not fully examine bias. Second, our data came from a sample of Medicaid enrollees in 2 states. Our findings may not generalize to the broader Medicaid population or to other populations. Third, the fit indices we used may not have been sensitive enough to identify misfit. Fourth, limited response rates may affect our findings’ validity. Finally, sample sizes precluded us from including Asians or separating Hispanics or the other groups into finer-grained groups (eg, by acculturation, education, or other culturally relevant variables) to address these potential confounds with race and ethnicity. Future research in a larger, more diverse sample can and should address these issues before reaching firm conclusions about measurement bias on the CAHPS-CC.
In summary, we used MG-CFA to examine whether measurement bias influences conclusions with regard to 4 of 8 CAHPS-CC subscales across whites, blacks, and Hispanics. Although we found some statistically significant bias, analyses indicated that bias does not substantively influence conclusions based on patients’ responses for these subscales. This provides preliminary support for stakeholders placing confidence in the CAHPS-CC when it is used among white, black, and Hispanic groups.
The author thanks Tara J. Carle and Lyla S. B. Carle whose thoughtful comments and unending support make his work possible.
1. Weech-Maldonado R, Dreachslin J, Dansky K, et al. Racial/ethnic diversity management and cultural competency: the case of Pennsylvania hospitals. J Healthc Manag. 2002;47:111
2. Betancourt JR, Green AR, Carrillo JE, et al. Defining cultural competence: a practical framework for addressing racial/ethnic disparities in health and health care. Public Health Rep. 2003;118:293
3. National Quality Forum (NQF). 2009. A Comprehensive Framework and Preferred Practices for Measuring and Reporting Cultural Competency. Washington, DC: NQF. http://www.qualityforum.org/Publications/2009/04/A_Comprehensive_Framework_and_Preferred_Practices_for_Measuring_and_Reporting_Cultural_Competency.aspx
4. Weech-Maldonado R, Carle AC, Weidmer B, et al. Assessing cultural competency from the patient’s perspective: the CAHPS cultural competency (CC) item set. Working paper: Department of Health Services Administration, University of Alabama at Birmingham, 2010
5. Mellenbergh GJ. Item bias and item response theory. Int J Educ Res. 1989;13:127–143
6. Carle A. Mitigating systematic measurement error in comparative effectiveness research in heterogeneous populations. Med Care. 2010;48:S68–S74
7. Weech-Maldonado R, Weidmer BO, Morales LS, et al. Cross-cultural adaptation of survey instruments: the CAHPS experience. In: Cynamon M, Kulka R, eds. Seventh Conference on Health Survey Research Methods. Hyattsville, MD: DHHS; 2001:75–82
8. Carle AC. Assessing the adequacy of self-reported alcohol abuse measurement across time and ethnicity: cross-cultural equivalence across Hispanics and Caucasians in 1992, non-equivalence in 2001–2002. BMC Public Health. 2009;9:60
9. Carle AC. Tolerating inadequate alcohol dependence measurement: cross-cultural invalidity of alcohol dependence across Hispanics and Caucasians in 2001 and 2002. Addict Behav. 2008;34:43–50
10. Carle AC. Cross-cultural validity of alcohol dependence across Hispanics and non-Hispanic Caucasians. Hispanic J Behav Sci. 2008;30:106–120
11. Willis G. Cognitive Interviewing: A Tool for Improving Questionnaire Design. Thousand Oaks: Sage Publications Inc.; 2005
12. Millsap RE, Yun-Tein J. Assessing factorial invariance in ordered-categorical measures. J Multivar Behav Res. 2004;39:479–515
13. Hu L, Bentler P. Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. Struct Equ Modeling. 1999;6:1–55
14. Hu L, Bentler PM. Fit indices in covariance structure modeling: sensitivity to underparameterized model misspecification. Psychol Methods. 1998;3:424–453
15. Muthén LK, Muthén BO. Mplus User. Los Angeles, CA: Muthén & Muthén; 2009
16. Crane PK, Gibbons LE, Jolley L, et al. Differential item functioning analysis with ordinal logistic regression techniques: DIFdetect and difwithpar. Med Care. 2006;44(suppl 3):S115–S123
17. Millsap RE, Kwok O-M. Evaluating the impact of partial factorial invariance on selection in two populations. Psychol Methods. 2004;9:93–115