The Consumer Assessment of Healthcare Providers and Systems (CAHPS®) surveys were developed to elicit reports from consumers about their health care experiences. The surveys cover topics such as the communication skills of providers, helpfulness of staff, and access to care, which are important to consumers and for which they are the best source for this information. The surveys and accompanying tools can be used by providers, health care organizations, government agencies, and researchers to assess and improve patient-centered care. Establishing the psychometric properties of CAHPS surveys is an integral step toward enabling valid comparisons on patient experience across organizations and over time.1–5
The CAHPS Clinician and Group Survey (CG-CAHPS) was developed to assess patient experiences with ambulatory care. There are 3 versions of CG-CAHPS: (1) a 12-month Survey that asks patients to report on their experiences over the last 12 months; (2) an expanded 12-month Survey that includes items to assess aspects of the Patient-Centered Medical Home; and (3) a Visit Survey that primarily focuses on experiences during a single visit. The Visit Survey includes questions about doctor communication and office staff interactions at the patient’s most recent visit, and questions about the patient’s access to care with their doctor over the last 12 months. The survey also elicits an overall rating of the doctor from patients and asks about their willingness to recommend their doctor's office to family and friends. The Visit Survey was designed to collect feedback about a specific patient visit that providers can use for monitoring and improving care.
In this paper, we evaluate the hypothesized factor structure and reliability of the CG-CAHPS Adult Visit Survey using data submitted to the CG-CAHPS Database.
The CG-CAHPS Adult Visit Survey contains 28 non-demographic items, of which 13 are used to create 3 composites that assess Access to Care (5 items), Doctor Communication (6 items), and Courteous/Helpful Staff (2 items). The survey also includes 2 questions that ask respondents (1) to rate their doctor; and (2) report if they would recommend the doctor’s office to family and friends. In addition, respondents are asked about their overall health, age, sex, and education.
Access to Care Composite
The 5 Access to Care items ask patients about their ability to get an appointment for urgent care as soon as needed, get an appointment for a check-up or routine care as soon as needed, get an answer to a phone question during regular office hours on the same day, and get an answer to a phone question after hours as soon as needed, and whether they were seen within 15 minutes of their appointment time. All questions in this composite use a 4-point response scale (1=Never, 2=Sometimes, 3=Usually, 4=Always) and a 12-month reference period, unlike the other items on the Visit Survey, which ask about the most recent visit. In field testing, the Access items did not achieve an acceptable level of reliability when they used a visit-based reference period. As a result, the Access items were changed back to the 12-month reference period, leaving all other items visit-specific.
Doctor Communication Composite
The 6 Doctor Communication items ask whether the doctor explained things clearly, listened carefully, gave easy to understand instructions, knew important medical history about the patient, showed respect, and spent enough time with the patient. These questions reference the most recent visit and use a 3-point response scale (1=Yes, definitely; 2=Yes, somewhat; 3=No). The items in this composite were recoded such that higher scores equal more positive responses (eg, Yes, definitely was recoded to 3; No was recoded to 1).
Courteous/Helpful Staff Composite
The 2 Courteous/Helpful Staff items ask whether clerks and receptionists were helpful, and whether they treated the patient with courtesy and respect. These questions reference the most recent visit and use a 3-point response scale (1=Yes, definitely; 2=Yes, somewhat; 3=No). The items in this composite were recoded such that higher scores equal more positive responses (eg, Yes, definitely was recoded to 3; No was recoded to 1).
Overall Doctor Rating
This question asks the patient to rate the doctor on a scale from 0 to 10, with 0 representing the worst doctor possible and 10 representing the best doctor possible.
Recommend Doctor Rating
This question asks whether the patient would recommend the doctor’s office to family and friends and uses a 3-point response scale (1=Yes, definitely; 2=Yes, somewhat; 3=No). This item was recoded such that higher scores equal more positive responses (eg, Yes, definitely was recoded to 3; No was recoded to 1).
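The recoding applied to the 3-point items can be illustrated with a minimal sketch. The function name `reverse_code` and the sample responses are hypothetical and are not taken from the survey's scoring materials; the sketch only shows the mapping described above (1 becomes 3, 3 becomes 1), with missing responses left as missing.

```python
def reverse_code(response, scale_min=1, scale_max=3):
    """Reverse-code one response so that higher scores are more positive.

    On the 3-point scale, 1 (Yes, definitely) becomes 3 and 3 (No) becomes 1.
    Missing responses (None) are passed through unchanged.
    """
    if response is None:
        return None
    if not scale_min <= response <= scale_max:
        raise ValueError(f"response {response} is outside {scale_min}-{scale_max}")
    return scale_min + scale_max - response


# Hypothetical raw responses to a single item, including one skipped answer.
raw = [1, 3, 2, None, 1]
recoded = [reverse_code(r) for r in raw]
print(recoded)  # [3, 1, 2, None, 3]
```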
The data were drawn from the CG-CAHPS Database and consisted of 103,442 respondents from 469 practice sites. The Visit Survey includes a number of screener questions that require a “yes” response before the respondent answers a subsequent question. For one of these questions, a majority of respondents (93%) had not phoned their doctor after regular office hours and therefore were instructed to skip the Access to Care item Q12: “When you phoned this doctor’s office after regular office hours, how often did you get an answer to your medical question as soon as you needed it?” Because there was such a high percentage of valid skips for this item, it was dropped from further analyses. The remaining Access to Care composite items had responses from between 46% and 98% of the respondents. The 2 Courteous/Helpful Staff items and 5 of the 6 Doctor Communication items were answered by 99% of respondents. The Doctor Communication item (Q21) about receiving easy to understand health care instructions was answered by 84% of respondents.
To fit a 3-factor psychometric model with items loading onto their associated composites (Access, Doctor Communication, and Courteous/Helpful Staff), we included only respondents with nonmissing data for the items that make up the 3 CG-CAHPS composites. The final analysis dataset therefore consisted of 21,318 responses from 450 practice sites.
The data used for these analyses came from health systems, medical offices, and survey vendors who voluntarily submitted CG-CAHPS survey data collected from March 2010 to December 2010 to the CAHPS Database. All of the 450 practice sites included in the analysis dataset administered mail surveys. Most of the practice sites specialized in Family Practice and/or Internal Medicine (89%). Over two thirds of the practice sites were owned by a hospital or integrated delivery system (69%). Most respondents were female (67%) and a majority were 45 years or older (81%).
Descriptive statistics for the survey items and Spearman rank-order correlations with their associated composites and the global rating items were computed. In addition, we performed confirmatory factor analyses using Mplus Version 6.12, as described below. Finally, we estimated internal consistency reliability and physician group-level reliability (see below).
Individual-level Confirmatory Factor Analysis
We conducted individual-level confirmatory factor analysis on the proposed 3-factor model, with maximum likelihood estimation, at first ignoring the nesting of respondents within practice sites. To assess the appropriateness of the resulting structure, we examined factor loadings with the criterion that they should be ≥0.40.6 We present standard overall model fit statistics: the χ2, comparative fit index (CFI), the root mean square error of approximation (RMSEA), and the standardized root mean square residual (SRMR).
Given the large sample size of our dataset, we primarily relied on the CFI, RMSEA, and SRMR as indices of model fit because the χ2 is influenced by sample size such that the larger the sample size the more likely it is that the χ2 will be significant (which indicates lack of model fit).7,8 The CFI compares the existing model fit with a null model that assumes the items in the model are uncorrelated. The factor structure is determined to adequately fit the data if the CFI is at least 0.95.9 The RMSEA examines the residuals of the model; an RMSEA of ≤0.06 is indicative of good fit.9 The SRMR is the standardized difference between the observed and predicted covariances from the model. A value of 0 for the SRMR indicates perfect fit, but a value <0.08 is considered good fit.9
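The cutoffs just described can be expressed as a small check. This is an illustrative sketch, not part of the authors' Mplus workflow: the function name `assess_fit` is hypothetical, and the example inputs are the fit values reported later in this paper for the individual-level model and for the multilevel model with the between-practice site SRMR.

```python
# Cutoffs as stated in the text.
CFI_MIN = 0.95    # CFI of at least 0.95
RMSEA_MAX = 0.06  # RMSEA of 0.06 or less
SRMR_MAX = 0.08   # SRMR below 0.08


def assess_fit(cfi, rmsea, srmr):
    """Return, for each index, whether the stated cutoff is met."""
    return {
        "cfi": cfi >= CFI_MIN,
        "rmsea": rmsea <= RMSEA_MAX,
        "srmr": srmr < SRMR_MAX,
    }


# Individual-level model as reported in the Results.
individual = assess_fit(cfi=0.97, rmsea=0.05, srmr=0.04)
# Multilevel model, using the between-practice site SRMR.
between = assess_fit(cfi=0.97, rmsea=0.03, srmr=0.10)

print(individual)  # {'cfi': True, 'rmsea': True, 'srmr': True}
print(between)     # {'cfi': True, 'rmsea': True, 'srmr': False}
```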
When respondent data are nested within practice sites, multilevel modeling is generally more appropriate because it accounts for the nested nature of the data. We performed a number of steps in association with the multilevel analyses.
Intraclass Correlations (ICCs) and Design Effects
First, we examined ICCs and design effects to determine if the data were truly nested and therefore multilevel analyses would be necessary.10 ICCs>0.05 indicate that the multilevel structure of the data needs to be taken into consideration; ICCs<0.05 signify that the consequences of not using multilevel analyses are minimal.11 We also examined design effects, as ICCs are affected when there are few groups comprised of many individuals or many groups comprised of few individuals, as is the case for our dataset. Design effects take into consideration the group sample size (Design effect=1+[Average within group sample size–1]×ICC). A design effect of ≥2.0 implies that group membership is associated with responses of the individuals and therefore multilevel modeling should be conducted to account for the multilevel nature of the data.12
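As a worked illustration of the design effect formula, the sketch below evaluates it for a few ICC values. The average within-site sample size is an assumption made for illustration: it is simply the 21,318 analysis responses divided by the 450 practice sites (about 47), which may differ from the item-specific group sizes used in the actual analysis. The function names are hypothetical.

```python
def design_effect(icc, avg_group_size):
    """Design effect = 1 + (average within-group sample size - 1) x ICC."""
    return 1.0 + (avg_group_size - 1.0) * icc


def multilevel_indicated(icc, avg_group_size, icc_cutoff=0.05, deff_cutoff=2.0):
    """Flag an item when either criterion described in the text is met."""
    return icc > icc_cutoff or design_effect(icc, avg_group_size) >= deff_cutoff


# Assumed average group size: analysis responses / practice sites.
avg_size = 21318 / 450

for icc in (0.01, 0.03, 0.08):
    deff = design_effect(icc, avg_size)
    print(f"ICC={icc:.2f}  design effect={deff:.2f}  "
          f"multilevel indicated={multilevel_indicated(icc, avg_size)}")
```

Under this assumed group size, an item with an ICC of 0.03 falls below the ICC criterion but still has a design effect above 2.0, which is the pattern the design effect is meant to detect.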
Multilevel Confirmatory Factor Analyses (MCFA)
Similar to the individual-level confirmatory factor analyses, a 3-factor model was examined, taking into consideration the nested nature of the data. We evaluated the item factor loadings with the same rule as the individual level confirmatory factor analyses—that factor loadings should be ≥0.40. With multilevel models, 2 sets of factor loadings are provided: between-practice sites and within-practice sites, which coincide with the nested nature of the data. The between factor loadings are based on the between-practice site covariance matrix, whereas the within factor loadings use the within-level or respondent-level covariance matrix. We again present overall model fit indices using standard fit statistics: the χ2, CFI, RMSEA, and SRMR, with the same criteria as at the individual level.
Cronbach coefficient α, an estimate of reliability, was calculated for each composite to assess the extent to which respondents consistently answered the items, with a reliability of at least 0.70 considered acceptable.13
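For readers unfamiliar with the statistic, coefficient α can be computed from respondent-by-item scores as k/(k−1) × (1 − sum of item variances / variance of the total score), where k is the number of items. The sketch below uses only the Python standard library; the function name `cronbach_alpha` and the toy responses are hypothetical and are not drawn from the CG-CAHPS Database.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach coefficient alpha for complete respondent-by-item data.

    rows: list of respondents, each a list of k item scores (no missing data).
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha requires at least 2 items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of each respondent's summed score.
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("total score has zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Toy data: 6 hypothetical respondents answering a 2-item composite
# (already recoded so that 3 is the most positive response).
toy = [[3, 3], [3, 2], [2, 2], [1, 1], [3, 3], [2, 3]]
alpha = cronbach_alpha(toy)
print(f"alpha = {alpha:.2f}, acceptable = {alpha >= 0.70}")
```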
We examined practice site reliability by practice site size (ie, the number of clinicians per site) because practices of different sizes need different numbers of patient surveys to reach acceptable levels of reliability on the measures. We calculated practice site reliability using the following formula:
Practice site reliability=ΣB/(ΣB+ΣW/Ng)
where ΣB refers to the between-group variance, ΣW refers to the within-group variance, and Ng is the sample size for practice site g.14
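A minimal sketch of this calculation follows, assuming the standard form of group-mean reliability from hierarchical linear modeling, ΣB/(ΣB+ΣW/Ng). The function names are hypothetical, and the variance components in the example are illustrative values chosen to correspond to ICCs of 0.02 and 0.08; they are not estimates from the study. Solving the same expression for Ng gives the number of respondents a site would need to reach a target reliability.

```python
import math


def site_reliability(between_var, within_var, n_respondents):
    """Reliability of a practice site mean: B / (B + W / Ng)."""
    return between_var / (between_var + within_var / n_respondents)


def respondents_needed(between_var, within_var, target=0.70):
    """Smallest Ng reaching the target, from Ng = target/(1-target) * W/B."""
    return math.ceil(target / (1.0 - target) * within_var / between_var)


# Illustrative variance components (total variance scaled to 1).
for between_var in (0.02, 0.08):
    within_var = 1.0 - between_var
    n = respondents_needed(between_var, within_var)
    rel = site_reliability(between_var, within_var, n)
    print(f"between={between_var:.2f}: {n} respondents give reliability {rel:.2f}")
```

The sketch makes the dependence on sample size explicit: the smaller the between-site share of variance, the more respondents per site are required to reach 0.70.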
Average reliability estimates were calculated for the 3 composites and 2 global rating items for 6 practice size categories: (1) 1 clinician; (2) 2–3 clinicians; (3) 4–9 clinicians; (4) 10–13 clinicians; (5) 14–19 clinicians; and (6) ≥20 clinicians. A variety of different size categories were considered and other splits are possible but this set of categories was chosen based on variance in reliability and patient sample sizes available in our dataset. Similar to internal consistency reliability, values of at least 0.70 are considered acceptable for practice site comparisons.13
Correlations Among Composites and Global Ratings
Relationships among the composites and global ratings at the individual and practice site levels were also examined using Spearman rank-order correlations. Although the composites should be correlated as they all measure aspects of patient experience, very high intercorrelations indicate that the composites may not be unique enough to be considered separate measures. In general, composite intercorrelations should be <0.80 for the composites to be considered unique.15 We hypothesized that the composites would be positively related to the global rating items.
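The Spearman rank-order correlation is the Pearson correlation computed on ranks, with tied values assigned their average rank, which matters for survey items with only 3 or 4 response options. The sketch below is a standard-library illustration with hypothetical function names and toy scores; it is not the software used for the analyses reported here.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank-order correlation with average ranks for ties."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy composite scores and 0-10 ratings for 6 hypothetical respondents.
composite = [3.0, 2.5, 3.0, 1.5, 2.0, 3.0]
rating = [10, 8, 9, 5, 7, 10]
rho = spearman(composite, rating)
print(f"rho = {rho:.2f}, below 0.80 criterion = {rho < 0.80}")
```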
The means, SDs, top box scores, and correlations for the survey items are provided in Table 1. Consistent with other patient experience data, CG-CAHPS ratings of care tend to be very positive (negatively skewed)—that is, consumers tend to report positive experiences with health care in the United States.16
The item-to-composite correlations (corrected for item overlap with the composite total) ranged from 0.40 (Q13 with Access to Care) to 0.71 (Q28 and Q29 with Courteous/Helpful Staff). The correlations between the composite items and the global rating items ranged from 0.18 (Q29. Courteous/Helpful Staff with Overall Doctor Rating) to 0.53 (Q19. Doctor Communication with Recommend Doctor).
Individual-level Confirmatory Factor Analysis
Table 2 shows that all items within the composites had factor loadings greater than the 0.40 criterion, with an average loading of 0.68 for Access to Care, 0.76 for Doctor Communication, and 0.86 for Courteous/Helpful Staff. The overall model fit indices are shown in Table 3. As expected given the large sample size, the χ2 test was statistically significant (P<0.01). The CFI was 0.97, above the 0.95 criterion for good model fit. The RMSEA was 0.05, below the 0.06 criterion, indicating good model fit. The SRMR was 0.04, below the 0.08 criterion, again signifying good model fit. Overall, the individual-level factor analysis results provided initial support for the 3 composites and justification for aggregating the items into their associated composites.
Multilevel Factor Analyses
ICCs and Design Effects
As shown in Table 2, the item ICCs for Access to Care were all greater than the 0.05 criterion, with an average of 0.08 and a range of 0.07 to 0.11. This finding indicates that between 7% and 11% of the variance may be attributed to practice site membership and establishes the need for multilevel analyses. For Doctor Communication and Courteous/Helpful Staff, all the item ICC values were at or below the 0.05 criterion, indicating very little variability across practice sites (average of 0.02, ranging from 0.01 to 0.05). However, when we examined design effects, both Courteous/Helpful Staff items and one of the Doctor Communication items had values exceeding the 2.00 criterion, indicating that the nested nature of the data matters for these items. Overall, these statistics confirmed that, in general, responses within practice sites were more similar than would be expected by chance; therefore, the clustered nature of the data should be taken into account when examining their factor structure.
All factor loadings estimated with the multilevel models were greater than the 0.40 criterion (Table 2). The between-practice site factor loadings ranged from 0.59 to 0.99 and the within-practice site factor loadings ranged from 0.45 to 0.99. The χ2 test (Table 3) was significant (P<0.01), as expected, but the CFI was 0.97, above the 0.95 criterion. In addition, the RMSEA was 0.03, below the 0.06 criterion, indicating good fit. The within-practice site SRMR was 0.05, below the 0.08 criterion, indicating good fit; however, the between-practice site SRMR was slightly above the cutoff, at 0.10.
All composites had acceptable (≥0.70) individual-level internal consistency reliability estimates, ranging from 0.77 to 0.89 (Table 4). Practice site level reliability was examined across the composites and global rating items by practice site size categories (1 clinician to ≥20 clinicians, Table 5). The practice site reliability estimates were acceptable for all sites with at least 4 clinicians. For sites with 1 clinician, only Access to Care had reliability >0.70. The remaining reliabilities for practice sites with 1 clinician ranged from 0.40 (Courteous/Helpful Staff) to 0.69 (Overall Doctor Rating item). For sites with 2–3 clinicians, both Access to Care and Courteous/Helpful Staff had reliability estimates >0.70. The remaining reliabilities ranged from 0.58 (Recommend Doctor item) to 0.66 (Overall Doctor Rating item). The average number of respondents in 1-clinician and 2–3-clinician offices was <100, indicating that these smaller sites need more respondents per practice site to raise reliability to acceptable levels.
Spearman Correlations Among the Composites and Global Ratings
All Spearman rank-order composite correlations were statistically significant (P<0.01), and none exceeded the 0.80 criterion that would signal potential multicollinearity (Table 4). The average individual-level correlation among the composites was 0.30 (range: r=0.25–0.35). The average practice site level correlation among the composites was 0.48 (range: r=0.41–0.57). The lowest correlations at both levels were between Doctor Communication and Courteous/Helpful Staff (0.25 at the individual level and 0.41 at the practice site level). The highest correlation at the individual level was between Access to Care and Doctor Communication (r=0.35). The highest correlation at the practice site level was between Access to Care and Courteous/Helpful Staff (r=0.57).
The Spearman correlations between the composites and the 2 global rating items were all statistically significant (P<0.01). For the Overall Doctor Rating item, the average individual level correlation with the composites was 0.38 (range: r=0.22–0.52) and the average practice site level correlation was 0.50 (range: r=0.34–0.75). For the Recommend Doctor item, the average individual-level correlation with the composites was 0.38 (range: r=0.29–0.52), whereas the average practice site level correlation was 0.57 (range: r=0.43–0.76). The highest correlation with the global rating items was with the Doctor Communication composite and the Recommend Doctor item (0.52 at the individual level and 0.76 at the practice site level). Finally, the Spearman correlations between the 2 global ratings were 0.47 and 0.76 at the individual and practice site levels, respectively.
The CG-CAHPS Adult Visit survey is a publicly available, standardized tool to measure patients’ experiences with outpatient medical offices. Demonstrating the psychometric properties of the survey is an important step for furthering its use. Overall, both the individual level and multilevel CFA results provided support for the survey’s 3 composites (Access to Care, Doctor Communication, Courteous/Helpful Staff) and 2 global rating items (Overall Doctor Rating, Recommend Doctor).
This study of a large number of practice sites and a large sample of patients provides support that the CG-CAHPS composites have acceptable individual-level internal consistency reliability and practice site level reliability. Practice-level reliability is important because the survey is intended to provide information at the practice level, for public reporting of patient experience data, and to enable confidence in comparisons of data across sites. In our dataset, we found acceptable practice site level reliability for sites with at least 4 clinicians. Reliability remained relatively stable, and >0.70, across all size categories from 4 to ≥20 clinicians (Table 5). Given that site-level reliability is a function of sample size, and the average sample size for practice sites with <4 clinicians was far smaller than for those with ≥4, these smaller practice sites could achieve adequate site-level reliability by obtaining responses from more respondents than were available in our dataset.
The CG-CAHPS survey, in providing the patient’s perspective, is critical for achieving the Institute of Medicine’s aim of patient-centered care and for improving quality of care in outpatient medical offices. Numerous studies have linked patient experience data in various settings to better clinical outcomes, patient adherence to medications, patient retention in physicians’ practices, and lower medical malpractice risk.17 It is therefore important to have reliable and valid measures for assessing patient experience.
The associations between the composites and global rating items provide support for the construct validity of the CG-CAHPS measures. Doctor Communication had the strongest relationship with the global ratings, which is consistent with earlier studies that have shown Doctor Communication to be a key driver of patients’ overall ratings of their doctor and their willingness to recommend their doctor's office.1,2,4 The Courteous/Helpful Staff composite had the weakest relationships with the global ratings suggesting that staff play less of a role in patients’ global assessments of their doctors.
It should be noted that while there were a large number of practice sites included in our dataset, they are not statistically representative of all medical offices in the United States because the data came from sites and states that voluntarily submitted their data to the CAHPS Database. Nevertheless, the analyses presented here represent one of the largest samples of medical offices studied and provide compelling support for the reliability, factor structure, and construct validity of the CG-CAHPS Adult Visit survey. Future research is needed to assess the associations of CG-CAHPS survey responses with clinical process measures and health outcomes.
The authors thank Dale Shaller for facilitating access to the CAHPS Database.
1. Solomon LS, Hays RD, Zaslavsky AM, et al. Psychometric properties of a group-level Consumer Assessment of Health Plans Study (CAHPS®) instrument. Med Care. 2005;43:53–60
2. Hargraves JL, Hays RD, Cleary PD. Psychometric properties of the Consumer Assessment of Health Plans Study (CAHPS®) 2.0 Adult Core Survey. Health Serv Res. 2003;38(part I):1509–1527
3. Hays RD, Brown J, Brown L, et al. Classical test theory and item response theory analyses of multi-item scales assessing parents’ perceptions of their children’s dental care. Med Care. 2006;44(suppl 3):S60–S68
4. Hays RD, Shaul JA, Williams V, et al. Psychometric properties of the CAHPS™ 1.0 survey measures. Med Care. 1999;37(suppl 3):MS22–MS31
5. O’Malley AJ, Zaslavsky AM, Hays RD, et al. Exploratory factor analyses of the CAHPS® Hospital Pilot Survey responses across and within medical, surgical, and obstetric services. Health Serv Res. 2005;40(part II):2078–2095
6. Peterson RA. A meta-analysis of variance accounted for and factor loadings in exploratory factor analysis. Mark Lett. 2000;11:261–275
7. Brown TA. Confirmatory Factor Analysis for Applied Research. New York: The Guilford Press; 2006
8. Kenny DA. Measuring Model Fit. September 4, 2011. Available at: http://davidakenny.net/cm/fit.htm. Accessed July 18, 2012
9. Hu LT, Bentler PM. Cutoff criteria for fit indices in covariance structure analysis: conventional criteria versus new alternatives. Struct Equ Modeling. 1999;6:1–55
10. Muthen BO, Muthen LK. Mplus: User’s Guide. Los Angeles, CA: Muthen & Muthen; 1998
11. Julian M. The consequences of ignoring multilevel data structures in nonhierarchical covariance modeling. Struct Equ Modeling. 2001;8:325–352
12. Muthen BO, Satorra A. Complex sample data in structural equation modeling. In: Marsden P, ed. Sociological Methodology. San Francisco: Jossey-Bass; 1995:267–316
13. Nunnally JC, Bernstein IH. Psychometric Theory. New York: McGraw Hill; 1994
14. Raudenbush SW, Bryk AS. Hierarchical Linear Models. 2nd ed. Thousand Oaks, CA: Sage; 2002
15. O’Brien RM. A caution regarding rules of thumb for variance inflation factors. Qual Quant. 2007;41:673–690
16. Nelson EC, Gentry MA, Mook KH, et al. How many patients are needed to provide reliable evaluations of individual clinicians? Med Care. 2004;42:259–266
17. Browne K, Roseman D, Shaller D, et al. Measuring patient experience as a strategy for improving primary care. Health Aff. 2010;5:921–925