Literacy-Fair Measurement of Health-Related Quality of Life Will Facilitate Comparative Effectiveness Research in Spanish-Speaking Cancer Outpatients

Hahn, Elizabeth A. MA*; Du, Hongyan MS; Garcia, Sofia F. PhD*; Choi, Seung W. PhD*; Lai, Jin-Shei PhD*; Victorson, David PhD*; Cella, David PhD*

doi: 10.1097/MLR.0b013e3181d6f81b
Comparative Effectiveness

Background: Health-related quality of life (HRQL) assessment is frequently used in comparative effectiveness research, but low-literacy patients are often excluded. Appropriately translated and user-friendly HRQL measures are essential to ensure inclusion of low-literate and non-English-speaking patients in comparative effectiveness research.

Objectives: To compare HRQL responses across literacy levels in Spanish-speaking patients with cancer using a multimedia touch screen program.

Subjects: A total of 414 adult patients with cancer (213 with low literacy and 201 with high literacy).

Research Design: The touch screen system administered 3 questionnaires: The Functional Assessment of Cancer Therapy-General, the Short Form-36 Health Survey, and the Standard Gamble Utility Questionnaire. Measurement bias was evaluated using item response theory. Effects of literacy on HRQL were evaluated using regression models.

Results: Patients rated the touch screen easy to use and commented favorably on the multimedia approach. There was statistically significant item response theory measurement bias in 6 of 10 HRQL subscales; however, only 3 showed meaningful bias. Low-literacy patients had significantly lower mean scores on 3 of 4 Functional Assessment of Cancer Therapy-General subscales, before and after adjustment for patient characteristics. Low-literacy patients also had significantly lower mean scores on 5 of 6 Short Form-36 subscales; adjustment for patient characteristics attenuated or eliminated differences. Similar proportions of low- and high-literacy patients valued their current health as equivalent to perfect health.

Conclusions: This study demonstrates the feasibility of this multimedia touch screen program for low-literacy patients. The program will provide opportunities to evaluate the effectiveness of interventions in more diverse patient populations.

From the *Department of Medical Social Sciences, Feinberg School of Medicine, Northwestern University, Chicago, IL; and †Center on Outcomes, Research and Education, NorthShore University HealthSystem, Evanston, IL.

Supported by grant #TURSG-02–069–01-PBP from the American Cancer Society.

Presented, in part, at the 2nd Annual Scientific Conference, Critical Issues in eHealth Research: Toward Quality Patient-Centered Care, Bethesda, MD, September 2006; and at the Symposium on Clinical and Comparative Effectiveness Research Methods: II, Rockville, MD, June 2009.

Reprints: Elizabeth A. Hahn, MA, Department of Medical Social Sciences, Feinberg School of Medicine, Northwestern University, 710 N. Lake Shore Dr., Room 725, Chicago, IL 60611. E-mail:

Health-related quality of life (HRQL) is commonly assessed through patient-reported outcome measures of function or preference. Because HRQL assessment is now frequently used in clinical research, including comparative effectiveness research (CER), appropriate measures for diverse and vulnerable populations (eg, low-literate and non-English-speaking patients) are increasingly important. However, because most methods of administering HRQL questionnaires require reading skills, patients with low literacy are often excluded from participating.1,2 Similarly, despite efforts by the National Cancer Institute and other agencies, Hispanic/Latino and other racial/ethnic minority patients remain underrepresented in cancer research.3,4

In a recent report, a focus on populations known to have health disparities ranked sixth among the top 100 recommended CER priority areas.5 Low research participation rates among minority and medically underserved populations limit the generalizability of results and hinder the identification of disparities in care across racial, ethnic, linguistic, and socioeconomic subgroups.6 Appropriately translated study materials are essential for including non-English-speaking participants in research. Similarly, new administration methods can allow low-literate patients to be included in CER and other studies. Finally, valid measures are needed to ensure that differences in reported health between literacy groups do not reflect underlying measurement bias.7,8

The purpose of this study was to address all of these research needs in Spanish-speaking patients. We previously developed a multimedia touch screen program to assess HRQL, and validated its use in English-speaking patients with cancer.9,10 We adapted the program for this study with Spanish-speaking patients with cancer.11 We hypothesized that HRQL items would perform similarly across literacy levels, indicating essentially unbiased measurement, and that HRQL outcomes would not differ by literacy level after controlling for important sociodemographic and clinical characteristics.

Methods

Patient Enrollment and Literacy Assessment

Adult patients (age ≥18 years) were enrolled at Chicago-area cancer centers that provide care to underserved populations. Patients provided informed consent and received a $20 incentive, and all study materials were presented in Spanish by experienced bilingual study interviewers. Eligibility criteria were broad: any type or stage of cancer; any treatment status; Spanish language preference; and adequate visual, auditory and physical capabilities to use a multimedia touch screen. Prospective participants were informed that the study was testing a new touch screen with sound; they were not told that a literacy assessment would be conducted. This recruitment strategy was designed to minimize refusals due to shame about literacy skills, and was approved by each institutional review board. Literacy skills were measured after enrollment using the Spanish version of the Passage Comprehension subtest of the Woodcock Language Proficiency Battery.12,13 This test measures reading comprehension of short text passages. It has been extensively evaluated and standardized, and we have used it successfully in previous studies of English- and Spanish-speaking patients with cancer.10,14 We defined low literacy as a reading score below the seventh-grade level to correspond with the reading level of the HRQL questionnaires.

We employed purposeful sampling to achieve approximately equal numbers of low- and high-literacy patients. Data were collected from patients and from their medical records. We used the 4-item Short Acculturation Scale (SAS) to measure acculturation in terms of language preference.15 Item responses range from 1 (“Only Spanish”) to 5 (“Only English”), and an average score <3.0 reflects low acculturation.
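The SAS classification rule above reduces to averaging the item responses and comparing the mean with 3.0. A minimal sketch, assuming responses are stored as integers 1 to 5; the helper name `sas_acculturation` and the example responses are hypothetical:

```python
from statistics import mean

def sas_acculturation(responses):
    """Score the 4-item Short Acculturation Scale (hypothetical helper).

    Each response is coded 1 ("Only Spanish") to 5 ("Only English").
    Returns (mean score, True if low acculturation, ie, mean < 3.0).
    """
    if len(responses) != 4:
        raise ValueError("expected 4 SAS item responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("SAS responses must be integers from 1 to 5")
    score = mean(responses)
    return score, score < 3.0

# A respondent who answers mostly "Only Spanish"
print(sas_acculturation([1, 1, 2, 1]))  # (1.25, True)
```

Because the cutoff is a strict inequality, a respondent averaging exactly 3.0 would be classified as not low in acculturation.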

HRQL Questionnaires

The touch screen system was programmed to administer 3 widely used HRQL questionnaires.

The Functional Assessment of Cancer Therapy-General (FACT-G, version 4) is a 27-item questionnaire with 5 Likert-type response categories: Nada (Not at all), Un poco (A little bit), Algo (Somewhat), Mucho (Quite a bit), and Muchísimo (Very much).16,17 Scores are available for total HRQL and for the dimensions of physical, social/family, emotional, and functional well-being. Higher scores indicate better HRQL.
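Subscale scoring of Likert items of this kind can be sketched as below. This is a minimal illustration, assuming items coded 0 (Nada) to 4 (Muchísimo), negatively worded items rescored as 4 minus the response, and proration (sum × number of items in the subscale / number answered) when more than half of the items are answered, which is our understanding of the general FACIT convention; the official scoring guide specifies which items are reversed, and the helper `prorated_subscale` is hypothetical. The same arithmetic describes how the modified subscale scores mentioned under Statistical Considerations (biased items removed) could be rescaled to the original range.

```python
from typing import Optional, Sequence

def prorated_subscale(responses: Sequence[Optional[int]],
                      reverse: Sequence[bool],
                      n_items_full: Optional[int] = None) -> Optional[float]:
    """Prorated subscale score for items coded 0 (Nada) to 4 (Muchísimo).

    responses: one value per item, None if unanswered.
    reverse: True for negatively worded items, rescored as 4 - response.
    n_items_full: length of the full subscale; defaults to len(responses).
        A larger value rescales a reduced item set back to the full range.
    Returns None unless more than half of the items were answered.
    """
    if len(responses) != len(reverse):
        raise ValueError("responses and reverse flags must have equal length")
    n_full = len(responses) if n_items_full is None else n_items_full
    scored = [4 - r if rev else r
              for r, rev in zip(responses, reverse) if r is not None]
    if 2 * len(scored) <= len(responses):
        return None  # too many missing items to prorate
    return sum(scored) * n_full / len(scored)

# Hypothetical 7-item subscale, all items negatively worded, one missing:
# rescored items sum to 20 over 6 answers, prorated to 20 * 7 / 6.
print(round(prorated_subscale([1, 0, 2, None, 1, 0, 0], [True] * 7), 2))  # 23.33
```

For a modified score, passing the 5 remaining items of a 6-item subscale with `n_items_full=6` multiplies their sum by 6/5.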

The Short Form-36 Health Survey (SF-36, version 1) is a 36-item measure of 8 health concepts (physical functioning, role-physical, bodily pain, general health, vitality, social functioning, role-emotional, and mental health) and 2 higher-order summary dimensions (physical and mental health).18,19 It uses multiple response formats, including Sí/No (Yes/No) and Cierta/Falsa (True/False). Higher scores indicate better HRQL.
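SF-36 scale scores are conventionally reported on a 0 to 100 range by linearly rescaling the raw item sum, (raw − lowest possible) × 100 / (possible range). A minimal sketch, assuming item-level recoding has already been applied so that higher raw values mean better health (those item rules are not reproduced here); the helper name is hypothetical:

```python
def sf36_transform(raw: float, lowest: float, highest: float) -> float:
    """Rescale a raw SF-36 scale sum to 0-100 (higher = better health).

    Assumes item recoding has already been applied so that higher raw
    values mean better health; item-level rules are not shown here.
    """
    if not lowest <= raw <= highest:
        raise ValueError("raw score outside the possible range")
    return (raw - lowest) * 100 / (highest - lowest)

# Physical functioning: 10 items scored 1-3, so raw sums range from 10 to 30
print(sf36_transform(24, lowest=10, highest=30))  # 70.0
```

Placing every scale on the same 0 to 100 range is what allows the MIDs in Statistical Considerations to be stated in common units for most SF-36 subscales.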

The Standard Gamble Utility Questionnaire (SGUQ) is a preference-based HRQL measure that reflects the patient's value for her or his current health state.20 Patients choose between remaining in their current health state and undergoing a hypothetical treatment that could restore them to perfect health but carries a risk of immediate death. Utility scores range from zero (current health equivalent to death) to one (current health equivalent to perfect health). We translated the SGUQ into Spanish using our certified language translation team.21,22
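Why the utility equals a probability follows from the expected-utility framework of von Neumann and Morgenstern cited above. With perfect health anchored at 1 and death at 0, the utility of the current health state h is the probability of success p at which the patient is indifferent between keeping h and accepting the gamble. The sketch below shows only this identity; the procedure the SGUQ uses to locate p is not described in the text and is not modeled here.

```latex
U(h) = p \cdot U(\text{perfect health}) + (1 - p) \cdot U(\text{death})
     = p \cdot 1 + (1 - p) \cdot 0
     = p
```

For example, a patient who is indifferent at p = 0.90 has U(h) = 0.90, whereas a patient who would accept no risk of death at all has U(h) = 1, the value analyzed under Impact of Literacy on Health Utility Scores.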

La Pantalla Parlanchina (The Talking Touchscreen)

The multimedia “Pantalla Parlanchina” (PP) allows self-administration of questionnaires by patients with varying literacy levels and computer skills (Fig. 1).2,9,11 Each text element is read aloud while patients listen through a headset or speakers. After a 5-minute semi-scripted instructional tutorial by study interviewers, which included completion of 2 sample questions, patients used the PP to complete the FACT-G and SF-36 (in randomized order), followed by the SGUQ. Evaluation questions were presented on the touch screen and during a short debriefing interview. Questions included: “What did you think about using the touch screen?”; “How hard was it for you to use the touch screen?” (very easy, easy, hard, very hard); and “Would you be willing to do the surveys each time you visit the doctor?” (yes, sometimes, no).
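The administration sequence described above can be sketched as a small function. This assumes simple per-patient randomization of the two health status questionnaires (the text does not say how randomization was implemented), and the helper name is hypothetical:

```python
import random

def administration_order(rng: random.Random) -> list:
    """Return the questionnaire sequence for one patient (hypothetical helper).

    The FACT-G and SF-36 are shuffled independently for each patient
    (simple randomization, an assumption); the SGUQ always comes last.
    """
    health_status = ["FACT-G", "SF-36"]
    rng.shuffle(health_status)
    return health_status + ["SGUQ"]

# A seeded generator lets a given session's order be reproduced later
print(administration_order(random.Random(2009)))
```

Keeping the SGUQ last means the utility task always follows both health status questionnaires, while any order effect between the FACT-G and SF-36 is balanced across patients.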



Statistical Considerations

Characteristics were compared between low- and high-literacy groups using standard statistical tests for nominal, ordinal, or continuous variables. To evaluate psychometric equivalence of HRQL items between low- and high-literacy groups, we used item response theory (IRT) to investigate measurement bias (differential item functioning; DIF).23–25 One-parameter logistic (1-PL/Rasch) IRT modeling was performed for each FACT-G and SF-36 subscale that had at least 3 items. DIF was evaluated by computing a 95% confidence interval (CI) for the difference between low- and high-literacy IRT item calibrations.23 The minimal important difference (MID) between IRT calibrations is 0.3 logits.26 Modified FACT-G and SF-36 subscale scores were calculated by removing biased items and prorating to the original score range. Intraclass correlation coefficients were calculated to evaluate the extent of agreement between the original and modified scores.27
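The DIF decision rule described above can be sketched as follows. Under the Rasch (1-PL) model, the probability of endorsing a dichotomous item is exp(θ − b) / (1 + exp(θ − b)), where θ is the person's HRQL level and b is the item calibration; polytomous FACT-G and SF-36 items would use a rating-scale extension, which is not shown. This sketch assumes the calibrations were estimated separately in each literacy group, placed on a common logit scale, and are independent; `dif_contrast` and its inputs are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class DifResult:
    difference: float   # low-literacy minus high-literacy calibration (logits)
    ci_low: float
    ci_high: float
    significant: bool   # 95% CI excludes zero
    meaningful: bool    # |difference| exceeds the MID

def dif_contrast(b_low, se_low, b_high, se_high, mid=0.3, z=1.96):
    """Compare one item's Rasch calibrations from separate group analyses.

    Assumes both calibrations are on a common logit scale and independent,
    so the SE of the difference is sqrt(se_low**2 + se_high**2).
    """
    diff = b_low - b_high
    se = math.sqrt(se_low ** 2 + se_high ** 2)
    lo, hi = diff - z * se, diff + z * se
    return DifResult(diff, lo, hi,
                     significant=(lo > 0 or hi < 0),
                     meaningful=(abs(diff) > mid))

# Hypothetical calibrations: statistically significant, but below the MID
print(dif_contrast(0.20, 0.05, 0.00, 0.05))
```

The two flags are deliberately separate: with precise calibrations an item can show statistically significant bias that is smaller than 0.3 logits, which is how an item can be biased without the bias being meaningful.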

To evaluate equivalence of mean HRQL scores between low- and high-literacy groups, we tested whether the 95% CI for the mean difference on each subscale fell within acceptable limits, defined by the MID for that subscale.28 The MID for each FACT-G subscale is 2.0.29 The MID is 7.0 for most of the SF-36 subscales, and 14.0 for Role-Physical and Role-Emotional.30
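One way to operationalize this equivalence test is to check where the 95% CI for the mean difference falls relative to ±MID. A minimal sketch, assuming group summary statistics as input and a normal-approximation CI with unequal variances (reasonable for groups of about 200, but an assumption); the three-way verdict labels and the example means and standard deviations are hypothetical:

```python
import math
from statistics import NormalDist

def equivalence_verdict(mean_low, sd_low, n_low, mean_high, sd_high, n_high,
                        mid, level=0.95):
    """Classify the low-minus-high mean difference against +/- MID.

    Uses a normal-approximation CI with unequal variances (an assumption).
    """
    diff = mean_low - mean_high
    se = math.sqrt(sd_low ** 2 / n_low + sd_high ** 2 / n_high)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lo, hi = diff - z * se, diff + z * se
    if -mid < lo and hi < mid:
        return "equivalent"              # entire CI inside +/- MID
    if lo >= mid or hi <= -mid:
        return "meaningfully different"  # entire CI beyond one MID bound
    return "inconclusive"                # CI crosses a MID bound

# Hypothetical FACT-G subscale summaries (MID = 2.0), group sizes from enrollment
print(equivalence_verdict(20.0, 5.0, 213, 21.5, 5.0, 201, mid=2.0))  # inconclusive
```

In this example the point estimate (−1.5) is smaller than the MID, yet the CI extends past −2.0, so equivalence cannot be claimed.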

Separate multivariable linear regression models were constructed with each FACT-G and SF-36 subscale as the dependent variable and literacy as the primary independent variable. Covariates included recruitment site, sociodemographic factors (age, gender, Mexican ethnicity, acculturation [SAS<3 vs. ≥3], work status, marital status, living arrangement [alone vs. not alone], prior computer experience), and clinical factors (cancer diagnosis, stage at diagnosis, months since diagnosis, current chemotherapy treatment, performance status). Education was not included as a covariate.31 All covariates that met a screening criterion (P < 0.25 in bivariate regressions) were selected for a multivariable model and then removed using backward elimination (retention criterion, P < 0.05). Literacy was added to the final multivariable model to estimate the adjusted effect of literacy on each outcome. We followed recommended strategies to determine whether any apparent literacy effects were due to confounding (mediating) factors.32,33 First, we evaluated whether each covariate was significantly associated with literacy. Second, we evaluated whether the covariate was significantly associated with the outcome. Third, we ruled out the presence of interaction between the covariate and literacy. Fourth, we evaluated the magnitude of the literacy effect on the outcome before and after controlling for confounders. The same strategy was used to analyze the single overall health item on the SF-36 (fair/poor vs. excellent/very good/good)8,34 and the SGUQ utility score (1 vs. <1) using logistic regression. The primary goal of the regression analysis was to estimate the unadjusted and adjusted effects of literacy level on HRQL outcomes, and to determine whether these effects were meaningful. The 95% CI was calculated for the effect of literacy in each model. To evaluate the impact of measurement bias, the analyses were repeated using the modified subscale scores.
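The covariate selection strategy described above (bivariate screening at P < 0.25, backward elimination at P < 0.05, then adding literacy) can be sketched with NumPy. This is a minimal illustration under stated assumptions: covariates are numerically coded single columns (multi-level categorical factors would need dummy coding and joint tests, not shown), p-values use a normal approximation to the t distribution, and the helper names and synthetic data are hypothetical, not study data.

```python
import math
import numpy as np

def ols_pvalues(y, X):
    """Fit y ~ intercept + X by OLS; return (coefficients, two-sided p-values)
    for the columns of X, using a normal approximation to the t distribution."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (len(y) - A.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    p = [math.erfc(abs(b / s) / math.sqrt(2)) for b, s in zip(beta, se)]
    return beta[1:], p[1:]

def select_covariates(y, covariates, screen_p=0.25, keep_p=0.05):
    """Bivariate screening (p < screen_p), then backward elimination until
    every retained covariate has p < keep_p.
    covariates: dict of name -> numerically coded 1-D array."""
    kept = [name for name, x in covariates.items()
            if ols_pvalues(y, x)[1][0] < screen_p]
    while kept:
        _, p = ols_pvalues(y, np.column_stack([covariates[k] for k in kept]))
        worst = int(np.argmax(p))
        if p[worst] < keep_p:
            break
        kept.pop(worst)  # drop the least significant covariate and refit
    return kept

# Synthetic data for illustration only (not study data)
rng = np.random.default_rng(0)
n = 400
covs = {"performance_status": rng.normal(size=n),
        "unrelated": rng.normal(size=n)}
low_literacy = (rng.random(n) < 0.5).astype(float)
y = 70 - 5 * covs["performance_status"] + rng.normal(scale=5, size=n)

kept = select_covariates(y, covs)
# Literacy is added only after covariate selection, as described above
coefs, pvals = ols_pvalues(
    y, np.column_stack([covs[k] for k in kept] + [low_literacy]))
print(kept, round(coefs[-1], 2))
```

The last coefficient is the adjusted literacy effect; comparing it with the coefficient from a literacy-only model gives the unadjusted-versus-adjusted contrast the analysis reports.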

Results

Patient Characteristics

We approached 487 patients and enrolled 414. Of these patients, 213 had low literacy and 201 had high literacy. Of the 73 patients who were not enrolled, 32 were ineligible and 41 refused study participation. Self-reported reasons for refusal included being too ill (n = 6), not having enough time to participate (n = 16), or other reasons (n = 19).35

Low- and high-literacy groups were comparable on some sociodemographic characteristics and most clinical characteristics (Table 1). Low-literacy patients were slightly older than high-literacy patients, and had lower education, less experience with computers, and poorer performance status. All patients self-identified as Hispanic, and the majority specified their ethnicity as Mexican/Mexican-American/Chicano. When asked their race, they tended to respond with their ethnicity. Over 90% of the patients had low acculturation based on our measure, ie, spoke Spanish exclusively/nearly exclusively.



Acceptability of la Pantalla Parlanchina

Patients reported that the PP was easy to use and commented favorably on the multimedia approach. Specifically, 16% and 27% of low- and high-literacy patients, respectively, reported that it was “very easy,” 80% and 72% reported that it was “easy,” and only 4% and <1% reported that it was “hard” or “very hard.” The majority (93%) in each group said they would be willing to complete PP surveys when they visit the doctor in the future. Representative comments (translated from Spanish): “Easy to use. Good for patients who have reading problems.” “Interesting and it was fun to do.” “Easier than I thought it would be.” “It gives you more privacy.” “You don't have to rush to answer questions.” “It's a good way to collect patient information.”

Evaluation of Measurement Bias

Literacy group differences in the IRT calibrations for the FACT-G and SF-36 items (with 95% CIs) are shown in Figure 2. On the FACT-G, there was no statistically significant bias in the physical, social/family, or functional well-being subscales; ie, all 95% CIs included zero (Fig. 2A). One emotional well-being item (hope) was significantly less favorable for low literacy, and one (worry about dying) was significantly less favorable for high literacy. The item on hope had a difference large enough to be considered meaningful (ie, larger than the MID).



Of the 6 SF-36 subscales with at least 3 items, only 1 (Role-Emotional) demonstrated no statistically significant bias (Fig. 2B). On the remaining subscales, 3 items were less favorable for low literacy (sick easier, pep/life, peaceful) and 4 items were less favorable for high literacy (vigorous activities, cut down time, health excellent, blue/sad). Only 2 items (vigorous activities, cut down time) had differences large enough to be considered meaningful.

There was excellent agreement between each original FACT-G and SF-36 subscale score and its modified score (calculated without biased items); intraclass correlations ranged from 0.93 to 0.98.
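The agreement statistic can be sketched for paired original and modified scores. The text cites Shrout and Fleiss but does not state which intraclass correlation form was used; this sketch assumes ICC(2,1) (two-way random effects, absolute agreement, single measure), which, unlike a consistency form, penalizes a systematic shift between the two scores. The helper name and example pairs are hypothetical.

```python
def icc_2_1(pairs):
    """ICC(2,1): two-way random effects, absolute agreement, single measure
    (Shrout & Fleiss). pairs: list of (score_a, score_b), one per patient."""
    n, k = len(pairs), 2
    grand = sum(a + b for a, b in pairs) / (n * k)
    row_means = [(a + b) / k for a, b in pairs]
    col_means = [sum(p[j] for p in pairs) / n for j in range(k)]
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    sse = sum((p[j] - row_means[i] - col_means[j] + grand) ** 2
              for i, p in enumerate(pairs) for j in range(k))
    ms_err = sse / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

print(icc_2_1([(1, 1), (2, 2), (3, 3)]))            # 1.0 (identical scores)
print(round(icc_2_1([(1, 2), (2, 3), (3, 4)]), 3))  # 0.667 (constant offset)
```

The second example shows why the form matters: the scores are perfectly correlated, but a constant one-point shift lowers absolute agreement.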

Impact of Literacy on FACT-G and SF-36 Health Status Scores

Descriptive statistics for the HRQL outcomes are shown in Table 2, and mean group differences are shown in Figure 3. Low-literacy patients had significantly lower mean scores on FACT-G Physical, Social/Family, and Functional Well-being, before and after adjustment for other characteristics (Fig. 3A). Patient-rated performance status was an important covariate in the models. Absolute mean differences between literacy groups were nearly always less than the MID of 2.0, although most of the 95% CIs extended past the MID. There were no differences in Emotional Well-being, before or after adjustment for covariates and for measurement bias.





Low-literacy patients had significantly lower mean scores on SF-36 physical functioning, bodily pain, general health, vitality, and mental health; however, adjustment for other characteristics attenuated or eliminated these differences (Figs. 3B, C). Patient-rated performance status was an important covariate in the models. Most mean differences between literacy groups were smaller than the MID, although the 95% CIs often extended past the MID. There were no statistically significant or meaningful differences in Role-Physical, Social Functioning, or Role-Emotional.

Using the SF-36 item on overall health status, 50.5% and 45.4% of low- and high-literacy patients, respectively, reported their health as fair or poor (Table 2). The unadjusted odds ratio was 1.23 (95% CI, 0.83–1.82), and the covariate-adjusted odds ratio was 0.94 (95% CI, 0.60–1.47).
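The unadjusted odds ratio can be checked directly from the reported percentages. The sketch below reproduces only the point estimate: the CI would require the cell counts, which are not reported in the text, and the adjusted estimate requires the fitted logistic model. The helper name is hypothetical.

```python
def odds_ratio(p1: float, p2: float) -> float:
    """Odds ratio comparing proportion p1 (group 1) with p2 (group 2)."""
    if not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("proportions must lie strictly between 0 and 1")
    return (p1 / (1 - p1)) / (p2 / (1 - p2))

# Fair/poor overall health: 50.5% (low literacy) vs 45.4% (high literacy)
print(round(odds_ratio(0.505, 0.454), 2))  # 1.23
```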

Impact of Literacy on Health Utility Scores

Overall, 46.4% and 47.1% of low- and high-literacy patients, respectively, valued their current health as equivalent to perfect health (utility = 1; Table 2). The unadjusted odds ratio was 0.97 (95% CI, 0.65–1.45), and the covariate-adjusted odds ratio was 0.91 (95% CI, 0.61–1.38).

Discussion

This study demonstrates the feasibility and acceptability of la Pantalla Parlanchina, making this method of HRQL assessment a practical and well-received approach to measuring health outcomes in low-literacy patients.2,9–11 To our knowledge, this is the first Spanish-language study to measure multiple HRQL dimensions using a multimedia platform that enables low-literacy patients to self-administer questionnaires, and the first to evaluate HRQL measurement bias across Spanish literacy groups.

The majority of the items demonstrated no significant measurement bias, and there was excellent agreement between scores calculated with and without biased items. This suggests that the Rasch item calibrations were relatively stable and that there was no systematic literacy bias in reporting HRQL.

For several FACT-G and SF-36 subscales, low literacy appears to be a marker, and possibly a risk factor, for poorer HRQL. These differences were partly attenuated or eliminated after adjustment for other characteristics, and the mean differences were generally smaller than the MIDs. This study did not determine the reason for the observed disparities; it would be particularly useful to evaluate the extent to which literacy level itself carries some risk for poorer self-reported health. Individual items did not differ greatly between groups; rather, their aggregation into subscale scores tended to differentiate the groups. By contrast, in one area with no group difference (FACT-G Emotional Well-being), “hope” was significantly lower in the low-literacy group, whereas worry about death was significantly worse in the high-literacy group. In this case, the subscale score may have masked important distinctions between low- and high-literacy patients, distinctions that might reflect cultural differences in affect and perspective between groups.

Patient-rated performance status was included in all of the final multivariable models, and it appeared to mediate the effects of literacy, ie, it was associated with literacy and with the outcome, and its inclusion in the regression models attenuated the effects of literacy.36 Other covariates included in the final models differed across HRQL outcomes. Research is needed to evaluate the nature of these relationships and any potential causal pathways.37
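The attenuation pattern described above (a covariate associated with both literacy and the outcome, whose inclusion shrinks the literacy coefficient) can be illustrated on synthetic data. Everything in this sketch is invented for illustration, including the variable names, effect sizes, and sample; it does not reproduce the study's estimates, and it shows point estimates only, without the significance tests or the interaction check that the strategy in Statistical Considerations also requires.

```python
import numpy as np

def ols_coefs(y, *predictors):
    """OLS coefficients (intercept first) via least squares."""
    A = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

# Synthetic data for illustration only: low literacy is associated with
# worse performance status (higher = worse), which in turn lowers HRQL.
rng = np.random.default_rng(42)
n = 400
low_lit = (rng.random(n) < 0.5).astype(float)
perf_status = 0.8 * low_lit + rng.normal(size=n)
hrql = 20 - 3.0 * perf_status + rng.normal(size=n)

a = ols_coefs(perf_status, low_lit)[1]            # literacy -> covariate
total = ols_coefs(hrql, low_lit)[1]               # unadjusted literacy effect
adjusted = ols_coefs(hrql, low_lit, perf_status)  # [intercept, literacy, covariate]
direct, b = adjusted[1], adjusted[2]
attenuation = 1 - direct / total                  # share of the effect removed
print(f"a={a:.2f} b={b:.2f} total={total:.2f} adjusted={direct:.2f} "
      f"attenuation={attenuation:.0%}")
```

Because the synthetic outcome depends on literacy only through performance status, nearly all of the unadjusted literacy effect disappears after adjustment; real data would typically show partial attenuation.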

The psychometric measurement techniques that we used are well-established in educational testing and are now being implemented in health outcomes assessment.10,38,39 These techniques provide a useful way to identify specific items that may be problematic for measuring HRQL in certain patient groups. One advantage of using multiple items to measure a latent trait (such as HRQL) is that a problematic item can be removed from scoring, as long as certain assumptions are met.23–25 We believe it is important for researchers to evaluate the possibility of systematic measurement bias, and we implemented one approach that can be useful. In our study, item bias was relatively balanced across literacy groups; therefore, the removal of problematic items did not affect conclusions about literacy differences. If our study had found evidence of systematic directional differences (eg, if responses for low-literacy patients were consistently less favorable than those of high-literacy patients), then removal of items would have changed the measurement of HRQL. In addition, if an HRQL subscale contains a small number of items, then removal of items could have a substantial impact on measurement.

The results of our study suggest that the FACT-G and SF-36 health status questionnaires and a standard gamble utility questionnaire provide literacy-fair measurement of HRQL and can be used confidently with low-literacy Spanish-speaking patients with cancer. The psychometric analyses implemented here cannot be conducted with utility questionnaires; however, further research should be conducted to determine how well low- and high-literacy patients are able to understand the concepts of risk and trade-offs implicit in the questionnaire. Qualitative methods such as targeted interviews or focus groups with patients would also be useful to probe for possible reasons for measurement bias.

Low literacy is associated with health disparities such as reduced access to health information, poorer understanding of illness and treatment, less effective disease management, less understanding and use of preventive services, poorer physiological health markers, lower medication adherence, increased hospitalizations, and higher financial costs.8,37,40 However, little is known about the association between literacy and HRQL/health status. Similar to our previous study,10 the results of this study suggest that low literacy is not a meaningful independent risk factor for poorer HRQL outcomes. Nonetheless, some HRQL subscales did demonstrate statistically significant differences between groups.

Three previous studies reported poorer health status outcomes for low-literacy patients.8,34,41 There are several reasons that may explain why our findings differ from these earlier reports. All of those studies used interviewer-administered questionnaires, which had been necessary when gathering self-report data from low-literacy patients. Also, some studies used only a single item to measure general health status, rather than measuring comprehensive HRQL domains. It is worth noting that the most commonly used single item (general health rating on excellent-to-poor scale) did not demonstrate measurement bias in our study. This suggests that previous findings, and those reported here, based on a single global item were likely not affected by literacy-related measurement bias. For monitoring treatment and decision-making, however, multidimensional assessment of HRQL is clinically more useful than a single-item global rating. Another possibility is that different measures for assessing and categorizing literacy may lead to different interpretations of the impact of literacy on health outcomes. Finally, many results were reported only for a pooled group of English- or Spanish-speakers.

There are some limitations to our study. We enrolled only cancer outpatients who were well enough to participate. Results may not be generalizable to patients with greater disease severity and poorer HRQL. In addition, the majority of patients self-identified their ethnicity as Mexican; therefore, results may not be applicable to other Spanish-speaking patients. Another limitation is that we did not gather qualitative data to assess patients’ understanding of utilities or possible reasons for bias. Such qualitative data would have been very useful for interpreting some of our findings.

Because of barriers to self-administration of questionnaires, low-literacy patients are often excluded from assessment of patient-reported health outcomes. By overcoming these assessment barriers, our Pantalla Parlanchina will extend eligibility for future research and provide greater opportunities to measure patient-reported outcomes (PROs) to evaluate the effectiveness of interventions in more diverse patient populations. Because the software is programmed as a web-based research application that could be linked to an electronic medical record system in the future, it could facilitate the use of PRO data in primary and secondary CER, including cost-utility analyses. PROs also provide key information relevant to the delivery of individualized, patient-centered care. La Pantalla Parlanchina will allow for identification of potentially significant distinctions between low- and high-literacy patient groups, and new insight into previously undetected disease or treatment problems among low-literacy patients.

Acknowledgments

The authors thank Drs. Angel Galvez, Ahmad Jajeh, Elizabeth Marcus, Gail Shiomoto, Samuel Taylor, and Mala Vohra for assisting in recruitment of their patients; Patricia Diaz, Veronica Valenzuela and Maria Corona for recruiting and interviewing patients; Shaheen Khan for health information technology support; Dr. Deborah Dobrez for expertise on the standard gamble utility questionnaire; and Kathleen Richter for editorial guidance. The authors also thank all of the patients who participated in this study.

References

1. Ad Hoc Committee on Health Literacy for the Council on Scientific Affairs, American Medical Association. Health literacy: report of the Council on Scientific Affairs. JAMA. 1999;281:552–557.
2. Hahn EA, Cella D. Health outcomes assessment in vulnerable populations: measurement challenges and recommendations. Arch Phys Med Rehabil. 2003;84:S35–S42.
3. Christian MC, Trimble EL. Increasing participation of physicians and patients from underrepresented racial and ethnic groups in National Cancer Institute-sponsored clinical trials. Cancer Epidemiol Biomarkers Prev. 2003;12:277–283.
4. Giuliano AR, Mokuau N, Hughes C, et al. Participation of minorities in cancer research: the influence of structural, cultural, and linguistic factors. Ann Epidemiol. 2000;10:S22–S34.
5. Institute of Medicine. Initial National Priorities for Comparative Effectiveness Research. Washington, DC: Institute of Medicine of the National Academies; 2009.
6. Gibbons RJ, Gardner TJ, Anderson JL, et al. The American Heart Association's principles for comparative effectiveness research: a policy statement from the American Heart Association. Circulation. 2009;119:2955–2962.
7. DeWalt DA, Berkman ND, Sheridan S, et al. Literacy and health outcomes: a systematic review of the literature. J Gen Intern Med. 2004;19:1228–1239.
8. Baker DW, Parker RM, Williams MV, et al. The relationship of patient reading ability to self-reported health and use of health services. Am J Public Health. 1997;87:1027–1030.
9. Hahn EA, Cella D, Dobrez D, et al. The talking touchscreen: a new approach to outcomes assessment in low literacy. Psychooncology. 2004;13:86–95.
10. Hahn EA, Cella D, Dobrez DG, et al. The impact of literacy on health-related quality of life measurement and outcomes in cancer outpatients. Qual Life Res. 2007;16:495–507.
11. Hahn EA, Cella D, Dobrez DG, et al. Quality of life assessment for low literacy Latinos: a new multimedia program for self-administration. J Oncol Manag. 2003;12:9–12.
12. Woodcock RW. Examiner's Manual: Woodcock Language Proficiency Battery-Revised. Allen, TX: DLM Publisher; 1991.
13. Woodcock RW. Examiner's Manual: Woodcock Language Proficiency Battery-Spanish Form. Chicago, IL: Riverside Publishing Company; 1981.
14. Wan GJ, Counte MA, Cella DF, et al. The impact of socio-cultural and clinical factors on health-related quality of life reports among Hispanic and African-American cancer patients. J Outcome Meas. 1999;3:200–215.
15. Marin G, Marin BV. Research with Hispanic Populations. Newbury Park, CA: Sage Publications; 1991.
16. Cella DF, Tulsky DS, Gray G, et al. The functional assessment of cancer therapy scale: development and validation of the general measure. J Clin Oncol. 1993;11:570–579.
17. Cella D, Hernandez L, Bonomi AE, et al. Spanish language translation and initial validation of the functional assessment of cancer therapy quality-of-life instrument. Med Care. 1998;36:1407–1418.
18. Ware JE Jr, Sherbourne CD. The MOS 36-item short-form health survey (SF-36). I. Conceptual framework and item selection. Med Care. 1992;30:473–483.
19. Alonso J, Prieto L, Anto JM. La versión española del SF-36 Health Survey (Cuestionario de Salud SF-36): un instrumento para la medida de los resultados clínicos. Med Clin (Barc). 1995;104:771–776.
20. Von Neumann J, Morgenstern O. Theory of Games and Economic Behavior. Princeton, NJ: Princeton University Press; 1953.
21. Bonomi AE, Cella DF, Hahn EA, et al. Multilingual translation of the Functional Assessment of Cancer Therapy (FACT) quality of life measurement system. Qual Life Res. 1996;5:309–320.
22. Eremenco SL, Cella D, Arnold BJ. A comprehensive method for the translation and cross-cultural validation of health status questionnaires. Eval Health Prof. 2005;28:212–232.
23. Wright BD, Masters GN. Rating Scale Analysis: Rasch Measurement. Chicago, IL: MESA Press; 1985.
24. Holland PW, Wainer H. Differential Item Functioning. Hillsdale, NJ: Lawrence Erlbaum Associates; 1993.
25. van der Linden WJ, Hambleton RK. Handbook of Modern Item Response Theory. New York, NY: Springer-Verlag; 1997.
26. Hudgens S, Dineen K, Webster K, et al. Assessing statistically and clinically meaningful construct deficiency/saturation: recommended criteria for content coverage and item writing. Rasch Meas Trans. 2004;17:954–955.
27. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull. 1979;86:420–428.
28. Blackwelder WC. “Proving the null hypothesis” in clinical trials. Control Clin Trials. 1982;3:345–353.
29. Yost KJ, Eton DT. Combining distribution- and anchor-based approaches to determine minimally important differences: the FACIT experience. Eval Health Prof. 2005;28:172–191.
30. Ware JE, Snow KK, Kosinski M. SF-36 Health Survey: Manual and Interpretation Guide. Lincoln, RI: QualityMetric Incorporated; 2000.
31. DeWalt DA, Pignone MP. Reading is fundamental: the relationship between literacy and health. Arch Intern Med. 2005;165:1943–1944.
32. Anderson S, Auquier A, Hauck WW, et al. Statistical Methods for Comparative Studies: Techniques for Bias Reduction. New York, NY: John Wiley & Sons; 1980.
33. Evans GW, Lepore SJ, Moore G, et al. Moderating and mediating processes in environment-behavior research. In: Moore G, Marans RW, eds. Advances in Environment, Behavior and Design: Toward the Integration of Theory, Methods, Research, and Utilization. New York, NY: Plenum Press; 1997:255–285.
34. Gazmararian JA, Baker DW, Williams MV, et al. Health literacy among Medicare enrollees in a managed care organization. JAMA. 1999;281:545–551.
35. Du H, Valenzuela V, Diaz P, et al. Factors affecting enrollment in literacy studies for English- and Spanish-speaking cancer patients. Stat Med. 2008;27:4119–4131.
36. Baron RM, Kenny DA. The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. J Pers Soc Psychol. 1986;51:1173–1182.
37. Nielsen-Bohlman L, Panzer AM, Kindig DA; Committee on Health Literacy. Health Literacy: A Prescription to End Confusion. Washington, DC: The National Academies Press; 2004.
38. Hays RD, Morales LS, Reise SP. Item response theory and health outcomes measurement in the 21st century. Med Care. 2000;38:28–42.
39. Cella D, Chang CH. A discussion of Item Response Theory (IRT) and its applications in health status assessment. Med Care. 2000;38:1166–1172.
40. Berkman ND, DeWalt DA, Pignone MP, et al. Literacy and Health Outcomes. Evidence Report/Technology Assessment No. 87. Rockville, MD: Agency for Healthcare Research and Quality; 2004.
41. Wolf MS, Gazmararian JA, Baker DW. Health literacy and functional health status among older adults. Arch Intern Med. 2005;165:1946–1952.

Keywords: literacy; Hispanic health; patient-reported outcomes; computer testing; item response theory

© 2010 Lippincott Williams & Wilkins, Inc.