In the context of chronic disease, health-related quality of life (HRQL) has become an important measurement criterion of the impact of therapeutic strategies, including pharmaceuticals, on individual and public health.1–3 Recent recommendations relating to the development of new drugs specify that the impact on HRQL and patient-reported outcomes (PROs) should be carefully assessed 4–6 and in the field of HIV the need for contemporary HRQL instruments has been highlighted.7
Most HIV-HRQL instruments were developed in the pre–highly active antiretroviral therapy (HAART) era7 and in largely English-speaking white populations of men who have sex with men.8 They miss several issues important to people living with HIV/AIDS (PLWHA), such as lipodystrophy, sexual dysfunction, and sleep disturbance, and none addresses the impact of treatment.7,9 Moreover, pre-HAART measures may not be suitably discriminating in the current epidemic, where in many developing regions women are disproportionately affected.10 Low-income to middle-income countries in Asia, Latin America, and Sub-Saharan Africa account for 95% of the HIV/AIDS-infected population,11 and 42% now have access to HAART.10 Therefore, the experiences of most people receiving HAART, and semantic and conceptual equivalences in different cultures and languages, were not accounted for in the development of current instruments.
The impetus for the patient-reported outcomes quality of life HIV instrument (PROQOL-HIV) came in response to these pressing issues. International development of the preliminary instrument, including the conceptual model and item generation process, has been published elsewhere.12 This article presents the psychometric validation of the PROQOL-HIV instrument using data simultaneously collected in 8 countries. The selection of final items and factor composition are presented.
A total of 826 HIV patients were recruited from 9 centers in 8 countries between July and December 2008: Australia (AU), Brazil (BR), Cambodia (KH), China (CN), France (FR), Senegal (SN), Thailand (TH), USA. Thirty-four French patients participated in a test–retest study. Inclusion criteria were as follows: adults receiving HAART or treatment naive. Exclusion criteria were as follows: patients under the age of 18, hospitalized and/or suffering from impaired cognition. Data from a total of 791 participants were analyzed after exclusion of 24 participants for whom more than 15% of responses on PROQOL-HIV items were missing and 11 participants for whom information about HAART status was missing. The study was approved by Institutional Review Boards at each site and informed consent was obtained from participants.
Five questionnaires were used for the study: the pilot PROQOL-HIV, the EQ-5D, the Medical Outcomes Study (MOS)–HIV, an HIV symptom index, and a demographic survey. Clinical information (history and stage of HIV disease, comorbidity and treatment, CD4 T-cell count, and viral load) was obtained from participants' medical records. Questionnaires were administered face to face in Cambodia, China, and Senegal, and in cases where respondents had difficulty reading the documents. The EQ-5D and MOS-HIV were used to assess the construct validity of PROQOL-HIV. These measures were chosen due to their wide international usage, psychometric robustness, and availability across most study languages.13–16 The HIV symptom index and clinical data were used in the assessment of criterion validity. The symptom index was chosen due to its demonstrated comprehensiveness and comprehensibility.17
The pilot PROQOL-HIV questionnaire comprised 70 items. As described previously,12 the items and the themes they represent were derived from patient interviews performed across 9 countries in 11 languages. Sixty-seven items assessed 11 HIV HRQL themes as follows: general health perception (1 item), social relationships (12 items), emotions (7 items), energy/fatigue (5 items), sleep (2 items), cognitive functioning (2 items), physical and daily activity (4 items), coping (4 items), future (6 items), symptoms (9 items), and treatment (15 items). The remaining 3 non-HRQL items concerned satisfaction with HIV health care services, financial difficulties due to HIV, and concerns about having a child. Participants rated the extent to which they had experienced each of the items over the past 2 weeks on a 5-point scale ranging from 0 = ‘never’ to 4 = ‘always’. One item was an exception whereby response categories ranged from 0 = ‘very good’ to 4 = ‘very poor’. All items referred to the experience of “the last 2 weeks, because I am HIV positive”. Treatment items were applicable to, and therefore completed only by, participants receiving HAART.
The EuroQol13,14 generic health index comprises 5 items assessing mobility, self-care, usual activities, pain/discomfort, and anxiety/depression on 3-point scales, and a 100-point visual analog scale (VAS, from “best” to “worst imaginable health state”) of self-perceived general health.
The Medical Outcomes Study HIV Health Survey15,16 comprises 35 items that assess 11 dimensions as follows: general health perception, pain, physical functioning, role functioning, social functioning, mental health, energy/fatigue, health distress, cognitive function, quality of life, and health transition. Two composite measures can be computed as follows: physical health state (PHS) and mental health state (MHS). All scores are expressed as normed T scores, with mean 50 and SD 10. In previous research, the MOS-HIV has demonstrated adequate internal consistency (Cronbach alpha > 0.70), construct validity, and responsiveness.15,16 Linguistically validated versions were available in all study languages except Wolof and Khmer.
The symptom index17 comprised 32 common and HIV-specific symptoms scored in terms of presence/absence (1, 0) and severity on a 4-point scale (0 = “not at all” to 3 = “quite a bit”). The index has been shown to be comprehensive and comprehensible and has demonstrated adequate construct validity in previous research.17 The index was linguistically validated across all necessary languages for the purposes of this study.
Sample characteristics were assessed for differences between participants included and excluded from analysis on the basis of missing data and differences in sample profiles between countries.
Classical statistical summaries—frequency of responses in each category, mean and SD, cumulated proportions for the 2 extreme categories, interitem (linear and polychoric) correlations, and item-scale correlations corrected for overlap—were used to identify items exhibiting saturation effect (ceiling or floor effect above 75%), poor reliability due to high response variance, low item-score correlation (<0.6), or redundancy (pair-wise correlation >0.7). This initial item reduction stage also took into account the conceptual relevance of items to the domain they purported to assess. When pairs of highly similar items satisfied statistical retention criteria, the item deemed to be less conceptually relevant was discarded.
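The item screening rules described above can be illustrated with a minimal Python sketch. It is not the authors' code (the analyses were run in R); the function names `pearson` and `screen_item` and the data layout are hypothetical, and only the 75% saturation and 0.6 item-score thresholds come from the text. Linear (Pearson) correlation is used here; the polychoric correlations also mentioned in the text are not reproduced.

```python
from statistics import mean

def pearson(x, y):
    """Linear correlation; returns 0.0 when either variable is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    den = (sxx * syy) ** 0.5
    return sxy / den if den > 0 else 0.0

def screen_item(rows, j, low=0, high=4, saturation=0.75, min_item_rest=0.6):
    """rows: one list of item scores per respondent; j: column of the item to screen."""
    item = [r[j] for r in rows]
    # Scale total without the item itself ("corrected for overlap").
    rest = [sum(r) - r[j] for r in rows]
    floor = sum(v == low for v in item) / len(item)
    ceiling = sum(v == high for v in item) / len(item)
    r_rest = pearson(item, rest)
    return {
        "floor": floor,
        "ceiling": ceiling,
        "item_rest_r": r_rest,
        "saturated": max(floor, ceiling) > saturation,
        "weak": r_rest < min_item_rest,
    }
```

An item flagged as `saturated` or `weak` would only be a candidate for removal; as stated above, conceptual relevance was also weighed before an item was discarded.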
Concurrent orthogonal exploratory factor analyses and item cluster analysis18 with varying numbers of factors were used to identify relevant subsets of items and to verify the dimensions hypothesized in the conceptual model.12 As the sample included some naive patients (n = 109, 13.8%) who did not complete the 10 treatment-related items, separate analyses were performed for the whole questionnaire and the common set of items. The factor structure of PROQOL-HIV was delineated using principal component-based exploratory factor analyses on the polychoric correlation matrix with VARIMAX rotation. The internal consistency of the questionnaire was assessed with Cronbach alpha for all items and after iterative removal of each item.
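For readers unfamiliar with the internal consistency check, the following Python sketch computes Cronbach alpha and the "alpha if item deleted" values using the standard formula. The function names and data layout are illustrative assumptions, not the authors' implementation.

```python
from statistics import variance

def cronbach_alpha(rows):
    """rows: one list of item scores per respondent (complete cases, at least 2 items)."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def alpha_if_deleted(rows):
    """Alpha recomputed after removing each item in turn (needs at least 3 items)."""
    k = len(rows[0])
    return [
        cronbach_alpha([[v for i, v in enumerate(r) if i != j] for r in rows])
        for j in range(k)
    ]
```

An item whose removal raises alpha noticeably is one that fits poorly with the rest of its scale.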
Convergent and discriminant validity were assessed by computing between and within-scale linear correlations using Multitrait scaling. Scaling success was computed as the proportion of items having a larger correlation with their own dimension than with any other dimensions, with higher values indicating good convergent validity.19 Scaling success with Bonferroni-corrected significance tests was also considered, whereby only nonsignificant 1-tailed correlation tests were counted as scaling errors.
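The scaling success rate defined above can be sketched as follows. This is an illustrative Python version under stated assumptions: scale scores are simple item sums, the item's correlation with its own scale is corrected for overlap, and the names `pearson` and `scaling_success` are hypothetical. The Bonferroni-corrected variant is not reproduced.

```python
from statistics import mean

def pearson(x, y):
    """Linear correlation; returns 0.0 when either variable is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    den = (sxx * syy) ** 0.5
    return sxy / den if den > 0 else 0.0

def scaling_success(rows, scales):
    """rows: item scores per respondent; scales: dict of scale name -> item column indices."""
    hits = total = 0
    for name, items in scales.items():
        for j in items:
            item = [r[j] for r in rows]
            # Own-scale score excludes the item itself (corrected for overlap).
            own = pearson(item, [sum(r[i] for i in items if i != j) for r in rows])
            others = [
                pearson(item, [sum(r[i] for i in cols) for r in rows])
                for other, cols in scales.items()
                if other != name
            ]
            hits += all(own > o for o in others)
            total += 1
    return hits / total
```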
Summary scores were constructed for each dimension (factor scores) and for the questionnaire as a whole (global score). The global score was used to compare HRQL ranking of naive and treated patients, using Pearson linear and Spearman rank correlations. Scores were computed as the sum of corresponding items scores and were standardized on a 100-point scale (0 = “worst” to 100 = “best” HIV HRQL).
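A minimal Python sketch of the score standardization is given below. The text states only that scores are item sums rescaled to 0 to 100 with 100 as the best HRQL; the exact formula is not given, so the linear rescaling and the reversal of items (on the assumption that higher raw responses indicate more problems) are assumptions, and the function name is hypothetical.

```python
def standardized_score(item_scores, item_min=0, item_max=4, higher_is_worse=True):
    """Rescale a sum of item scores to 0-100, where 100 is the best HRQL.

    Assumption: with 0 = 'never' and 4 = 'always', a higher raw sum means a
    larger impact, so the rescaled value is reversed by default.
    """
    raw = sum(item_scores)
    lowest = item_min * len(item_scores)
    highest = item_max * len(item_scores)
    scaled = 100 * (raw - lowest) / (highest - lowest)
    return 100 - scaled if higher_is_worse else scaled
```

Under these assumptions, a respondent answering "never" to every item of a scale scores 100 and one answering "always" to every item scores 0.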
Concurrent and Criterion Validity
Concurrent validity was assessed by computing correlations between PROQOL-HIV factor scores and existing questionnaires (EQ-5D and MOS-HIV). To assess criterion validity, we computed correlations of PROQOL-HIV global scores with clinical data (CD4 cell count) and the HIV symptom index. For continuous outcomes, analyses of variance were used for multigroup comparisons. Two-way cross-classifications were tested using χ2 tests.
Test–Retest Reliability Analysis
Test–retest data were used to assess the temporal stability of the global score on a sample of 34 French patients in a stable condition. Analysis was based on a mixed-effects model.
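As an illustration of how temporal stability can be quantified, the sketch below computes a one-way random-effects intraclass correlation from test and retest scores. This is a simpler, swapped-in estimator based on analysis-of-variance mean squares, not the mixed-effects model used in the study; the function name and data layout are hypothetical.

```python
def icc_oneway(pairs):
    """pairs: list of (test_score, retest_score) tuples, one per patient."""
    n, k = len(pairs), 2
    grand = sum(a + b for a, b in pairs) / (n * k)
    means = [(a + b) / k for a, b in pairs]
    # Between-patient and within-patient mean squares.
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((a - m) ** 2 + (b - m) ** 2 for (a, b), m in zip(pairs, means)) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)
```

Values close to 1 indicate that most of the score variance lies between patients rather than between the two measurement occasions.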
Comparison of Scores by Country
Factor scores were compared across countries using multivariate analysis of variance with the intention of demonstrating the sensitivity of PROQOL-HIV to cultural differences.
One hundred patients per country (including 10% of naive patients) were needed to ensure reliable item statistics by country, perform group comparisons with adequate power, and assess within-country factor structure of the PROQOL-HIV questionnaire. The significance level considered for statistical comparisons was 0.05, with Bonferroni correction for multiple comparisons when needed. All analyses were done using the open-source R 2.8 statistical software.20
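The Bonferroni correction mentioned above amounts to dividing the significance level by the number of comparisons. A minimal Python sketch follows; the function name is hypothetical and the example P values in the usage note are invented for illustration only.

```python
def bonferroni(p_values, alpha=0.05):
    """Return Bonferroni-adjusted P values and whether each test stays significant."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    significant = [p <= alpha / m for p in p_values]
    return adjusted, significant
```

For three hypothetical tests with P values of 0.01, 0.04, and 0.5, only the first would remain significant at the corrected threshold of 0.05/3.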
Characteristics of the 791 participants are shown in Table 1. No significant differences were found between patients analyzed (n = 791) and those excluded from analysis (n = 35) (data not shown). Among the 791 patients, 286 (36%) were women of whom 16 were pregnant, 3 were transgender (2 males), and 502 (63%) were male. Gender distribution significantly differed among countries (P < 0.001), with males comprising two-thirds of participant samples in all countries except Senegal. Mean age also differed significantly between countries (P < 0.001) with Cambodia (M = 37 years) providing the youngest and Australia (M = 46 years) providing the oldest participant pools.
The proportion of naive patients (14%) varied between 0% and 27% within countries. Median time since first HIV diagnosis varied from 1 to 9 years (range: 1–27), with Asian and SN patients diagnosed more recently (P < 0.001). Distribution of HIV Centers for Disease Control and Prevention stages also varied from one country to the other (P < 0.001), with C stage reported more frequently in China (63%) and Cambodia (49%). Median (interquartile range) CD4 counts were 404 (246–552) cells per cubic millimeter, with significant differences across centers (P < 0.001), in particular lower median CD4 counts in Cambodia and China. Overall, 19% of all patients had a count below 200 cells per cubic millimeter. Participants reported a higher prevalence of HIV transmission among men who have sex with men in Western countries (47%) as compared with Asian countries, where heterosexual contact and intravenous drug use accounted for about 75% of reported transmission.
Less than 3% of missing responses were recorded on the validation data set. Items showing high saturation effect (11 items met this criterion), and items showing high variance (2 items), high pair-wise correlation with another item (12 items), and/or low item-score correlation (3 items) were removed. Only 2 items whose response profiles showed a strong discrepancy between countries were discarded.
A final set of 43 items was retained, including 4 items considered supplementary to HIV-HRQL because they cover concepts broader than HRQL. Regarding the factorial structure of the 39 primary items, the distribution of eigenvalues as examined through a modified parallel analysis21 showed that PROQOL-HIV was driven by 8 factors as follows: a first factor explaining 30% of the total variance, and 7 factors aggregating between 2 and 10 items each. These 8 factors accounted for 60% of total variance (Table 2). The 8 dimensions in decreasing order of explained variance were headed as follows: physical health and symptoms (PHS, 9 items), treatment impact (TI, 10), emotional distress (ED, 4), health concerns (HC, 4), body change (BC, 4), intimate relationships (IR, 3), social relationships (SR, 2), stigma (ST, 2). Internal consistency was in the acceptable range with Cronbach alpha ranging from 0.772 (HC) to 0.885 (PHS).
The item Global Health was found to belong to the large first component (PHS), but was scored separately as it assesses general health state (HIV and non-HIV). Three items related to the social impact of HAART cross-loaded onto the social relationships and TI factors, and they were kept in the latter which is completed only by treated patients. A replication study done on n = 91 French patients in April to May 2009 yielded a comparable 8-factor structure (data not shown).
Scaling success (Table 2) was 98% on average (range: 93%–100% for the 8 scales), with an average homogeneity index of 0.479 (range of interitem correlations, 0.298–0.623).
To provide comparable results between naive and treated patients, we computed 2 types of global scores as follows: (1) total score on the common set of items (29 items); and (2) total score including treatment-related items (39 items). Figure 1 summarizes the proposed scoring scheme. Imputation of missing responses with median individual score (if at least half of the items composing the scale were endorsed) did not change the interpretation of scores presented below. Correlation between the 2 standardized global scores with and without treatment dimension was very high, with linear correlation r = 0.968 [95% CI: (0.963 to 0.972)] and rank correlation ρ = 0.971. Factor scores were computed for the 8 dimensions. Four individual items are scored individually.
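The imputation rule mentioned above can be sketched in Python as follows. The interpretation is an assumption: "median individual score" is taken to mean the median of the respondent's own answered items on the scale, and the function name is hypothetical.

```python
from statistics import median

def impute_scale(responses):
    """responses: one respondent's item scores on one scale, with None for missing.

    Returns the responses with gaps filled by the respondent's own median,
    or None when fewer than half of the items were answered.
    """
    answered = [v for v in responses if v is not None]
    if len(answered) * 2 < len(responses):
        return None  # too few answers: the scale score stays missing
    fill = median(answered)
    return [fill if v is None else v for v in responses]
```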
Concurrent and Criterion Validity
Rank correlation of PROQOL-HIV global score with EQ-5D VAS was moderate [0.478 with 95% bootstrap CI (0.415 to 0.532)]. It was higher with MOS-HIV PHS [0.651, (0.594 to 0.699)] and especially the MHS [0.811, (0.775 to 0.839)]. For the first item assessing general health state from PROQOL-HIV, the correlation with EQ-5D VAS was moderate (r = 0.537). Likewise, when we restricted comparison to the PHS domains assessed by PROQOL-HIV and MOS-HIV, the correlation was 0.757 (0.720 to 0.789) with MOS-HIV PHS composite and 0.635 (0.585 to 0.681) with the physical functioning scale. This is indicative of good agreement between the 2 series of measures.
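The rank correlations and bootstrap confidence intervals reported above can be illustrated with the Python sketch below. The text does not specify the bootstrap variant, so a percentile bootstrap is assumed; the function names, the number of resamples, and the seed are all illustrative choices.

```python
import random
from statistics import mean

def pearson(x, y):
    """Linear correlation; returns 0.0 when either variable is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    den = (sxx * syy) ** 0.5
    return sxy / den if den > 0 else 0.0

def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    return pearson(ranks(x), ranks(y))

def bootstrap_ci(x, y, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the Spearman correlation (assumed variant)."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample respondents with replacement
        stats.append(spearman([x[i] for i in idx], [y[i] for i in idx]))
    stats.sort()
    lower = stats[int((1 - level) / 2 * n_boot)]
    upper = stats[int((1 + level) / 2 * n_boot) - 1]
    return lower, upper
```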
The 5 most reported symptoms (frequency, mean severity level) were fatigue (53%, 2.4), difficulty falling asleep or insomnia (44%, 2.6), headache (44%, 2.4), muscular pain (44%, 2.5), and dry skin or any other skin problem (40%, 2.4). There were significant variations in the frequency of self-reported symptoms between countries [χ2 (14) = 58.736, P < 0.001], but we found good agreement between PROQOL-HIV symptoms, MOS-HIV symptoms, and responses to the symptom index.
Subgroup analyses (adjusted for country effect) were performed by considering the effect of different 2-class predictors, such as CD4 cell counts (<200 or ≥200 cells/mm3), the number of reported symptoms (<5 or ≥5), and the existence or not of 1 or more comorbidities (psychiatric disorder, depression, cardiovascular disease, or diabetes), on global scores. For CD4, we found significant differences [F (1, 703) = 12.352, P < 0.001] between patients' mean scores in the high (62.2, SD = 18.5, n = 575) vs. low cell count groups (56.3, SD = 18.8, n = 137). Patients without comorbidity (n = 598) had significantly [F (1, 782) = 63.705, P < 0.001] higher mean scores (68.4, SD = 16.5) compared with patients reporting at least 1 comorbidity (59.5, SD = 15.3, n = 193). Finally, the frequency of reported symptoms also significantly influenced HRQL [F (1, 758) = 377.140, P < 0.001], with patients experiencing more than 5 symptoms (n = 477) having lower global scores (58.8, SD = 14.5) compared with other patients (77.8, SD = 12.8, n = 314).
The retest study done in France 52 days later (range: 34–62, SD = 7) showed that global scores were stable across time with an intraclass correlation coefficient estimated at 0.859 [95% CI: (0.710 to 0.960), n = 34].
Scores Distribution Across Countries
When comparing countries, significant differences in mean global scores were observed [F (7, 783) = 13.652, P < 0.001]: Chinese and Khmer patients had lower scores compared with patients from other countries including Thailand (Tukey HSD post hoc comparisons, P < 0.05).
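For reference, a one-way analysis of variance F statistic and its degrees of freedom can be computed as in the Python sketch below. The function name is hypothetical and the sketch is not the authors' R code. With 8 countries and 791 participants, the degrees of freedom of such a comparison are 8 − 1 = 7 and 791 − 8 = 783.

```python
def one_way_anova(groups):
    """groups: list of lists of scores, one list per country.

    Returns (F, df_between, df_within).
    """
    n = sum(len(g) for g in groups)
    k = len(groups)
    grand = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```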
Country-specific profiles on the 8 dimensions are displayed in Figure 2 in terms of mean scores and 95% CIs. A multivariate analysis of variance (MANOVA) indicated that mean factor scores were significantly different between countries (Wilks statistic, W = 0.436, P < 0.001). It is worthwhile noting that patients from Senegal were severely impacted on the Stigma scale, and that Chinese and Khmer patients had, on average, lower scale scores as compared with patients from other countries.
Similar between-country variations were observed on the EQ-5D and MOS-HIV questionnaires. Figure 3A shows the distribution of mean scores for each of the 5 EQ-5D items, together with the VAS health state score. The MOS-HIV T scores are summarized in Figure 3B, with PHS and MHS scores highlighted for each country.
We have demonstrated the sound psychometric properties of a new measure of HIV HRQL that, unlike instruments in current use, is attuned to treatment-related issues and is valid to use across a diverse range of countries and languages. From an initial pool of 70 items covering 11 themes, the pilot questionnaire was reduced to 38 items covering 8 dimensions, mostly overlapping with the hypothesized themes,12 plus 1 item assessing general health and 4 additional individual items. Those individual items encompass broader concepts than HRQL (eg, financial impact, satisfaction with care) or are more targeted toward specific populations (eg, maternity/paternity). Likewise, an item assessing spiritual beliefs does not contribute to factor or global scores. However, those issues remain important for PLWHA and might provide additional insights in predictive models. These items are scored separately. The questionnaire is short to complete (10 minutes), and the scoring procedure follows simple rules.
The analyses undertaken provide strong evidence for the psychometric qualities of PROQOL-HIV in terms of validity and reliability, compared with existing questionnaires like MOS-HIV,16 FAHI,22 HOPES,23 HAT-QOL,24 AIDS-HAQ,25 WHOQOL-HIV,26 and the MQOL-HIV.27 Although item reduction followed standard psychometric procedures for item analysis, equal attention was given to content validity to reflect different health care systems and cultures that were highlighted in the conceptual model. It follows that only 2 items were discarded because of country-specific asymmetry in response distributions: “I have felt ashamed” showed discrepancies between CN, KH, and SN patients; “I have felt stressed” showed more bother for CN and KH patients, notwithstanding the fact that this item also correlated with items related to sadness and anxiety. Several items were deleted for more than one reason. For example, “I have felt isolated” showed a floor effect and correlated with items related to family relationships and depression; “I have felt like stopping my HIV medicine” also exhibited saturation effects and was highly correlated with items dealing with difficulties taking treatment and forgetfulness.
Although we found consistent correlations with MOS-HIV physical and mental composites, the choice of comparators to assess concurrent validity in some countries was limited. As the MOS-HIV questionnaire was not available in Wolof, only the EQ-5D questionnaire was available for Senegal, though correlation analysis conformed to expectations with regard to general health state. Similar difficulties arose for criterion validity: CD4 count has been shown to correlate with health status and HRQL (below and above the threshold for opportunistic disease, 200 cells/mm3),28,29 but HRQL can also be related to dynamic changes in CD4 count.9 Correlations between depression status and HIV-related HRQL are in agreement with the literature.30
New HRQL Dimensions in the Era of HAART
The new HRQL dimensions which emerged during PROQOL-HIV's development highlight that existing HIV HRQL measures lack sensitivity to the impact of contemporary treatment. Indeed, treatment-related concerns coalesced to form the second largest factor and included size and number of pills and activity limitations imposed by treatment schedule. The importance of such concerns to HIV HRQL concurs with the work of others.31,32 Body change, which encompassed lipodystrophy, emerged as a dimension unique to PROQOL-HIV as well. The HRQL impact of morphologic changes is well known,33 though only previously covered by the lipodystrophy-specific ABCD questionnaire.34 The need for an HIV HRQL measure that incorporated body image, sexual function, and stigma had previously been stated.35 The MOS-HIV for instance does not encompass these issues nor sleep disturbance as emphasized by patients in.12 PROQOL-HIV does address these issues in addition to those relating to treatment and can be used by treated or treatment-naive patients.
Most instruments initially developed in English are unlikely to reflect with sufficient precision the cultural diversity in subjective perception of HIV-specific HRQL. Only the WHOQOL questionnaire has been simultaneously developed in several languages. But although an HIV-specific module26 was added to the generic WHOQOL, it missed several of the aforementioned issues (eg, TI, lipodystrophy).
Some PROQOL-HIV dimensions show the potential to capture relevant variation across cultures. For example, the Stigma dimension showed the greatest variation among countries, as observed during qualitative interviews. The TI dimension, new to PROQOL-HIV, showed consistent scoring, less variation across countries, and more homogeneity within countries. This suggests that the issues inherent to HAART are relatively consistent across countries. The exhaustive development steps preceding the psychometric validation phase ensured a balance between in-depth HRQL evaluation and scale length and eliminated the need for multiple questionnaires with differing content to assess cross-country HIV-HRQL.
Limitations and Perspectives
There were several study limitations. As a disease-specific questionnaire, the PROQOL-HIV questionnaire was not designed to facilitate direct HRQL comparisons between the general population and PLWHA, with the exception of the general health item. Nor does PROQOL-HIV measure more general quality of life issues not related to health such as those captured by the WHOQOL, for example, quality of housing, transportation problems, environmental issues. The reverse is true as well. Generic HRQL questionnaires such as the SF-36 fail to capture numerous important HIV-specific issues,8 and PROQOL-HIV was developed to overcome these shortcomings.
Although PROQOL-HIV is intended to be used as a self-reported HRQL measure, it can be administered face to face. In the present study, it was not possible to compare the 2 modes of administration due to limitations in sample size and participating countries. As data were collected in a cross-sectional study, it was not possible to assess the responsiveness of PROQOL-HIV scores. Test–retest reliability was limited to French patients, and the number of patients per country did not allow for reliable multigroup factor analyses, although ongoing data collection will help to strengthen the present results. Likewise, its usefulness in routine clinical practice will be assessed in future studies.
PROQOL-HIV clearly represents an advance in the field of HIV-HRQL assessment in the era of HAART. It is both a valid and reliable contemporary instrument for measuring HIV-specific HRQL. Moreover, it covers important and previously neglected dimensions such as TI and lipodystrophy. It is an appropriate new instrument for use in international studies and clinical trials.
This article was made possible by the following people: Christophe Misse, Helene Gilardi, AP-HP, Saint-Louis Hospital, Department of Clinical Research (PRO Unit), Paris, France; Simon Mallal, Noel Hyland, Institute for Immunology and Infectious Diseases, Murdoch University and Royal Perth Hospital, Australia; Mauro Schechter, Monica Barbosa de Souza, Jorge Marcio, Ana Lucia Weinstein, Sandra Barros Telles, Projeto Praça Onze, Hospital Escola São Francisco de Assis, Universidade Federal do Rio de Janeiro; Boroath Ban, Kim Bopiseth, Vutha Nhao, Sary Sar, Rotha Heng, Service de Maladies Infectieuses, Hôpital Calmette, Phnom Penh; Xiaoyou Su, Lu Yao, Liu Jun, Zhu Zhangping, Hy Tsui, School of Public Health and Primary Care, Faculty of Medicine, The Chinese University of Hong Kong, China; Corinne Taeron, Miguel Ange Garzo, Hughes Fischer, Patient Associations, France; Patrizia Carrieri, INSERM UMR 912 SE4S Marseille, France; Agnés Levy, Anne Persoz, Alioune Blondin-Diop, Patricia Assal, Olivier Ségeral, Sylvie Cheneau, Sophie Ismael, Juliette Baillon, Valérie Morin, Martine Mole, Marie Thérèse Rannou, Marie-Stéphane Nguessan, Sandrine Pottez, Cécile Goujard, Jean-François Delfraissy, Bicetre Hospital, Paris, France; Pascale Leclerc, CHU Grenoble, France; Social Science Working Group, ANRS, Paris, France; Veronica Noseda, Vincent Douris, Marc Dixneuf, Paola de Carli, Sidaction, Paris, France; Rewa Kohli, Sharvari Apte, Vinod Bhalerao, Pallavi Nimbalkar, Girish Rahane, Madhuri Khaire, Somnath Borude, National AIDS Research Institute, Pune, India; Assane Diouf, Bintou Dia, Adji Fatou, Abdoulaye Thiam, Moussa Diallo, Bineta Diallo, Cheikh Diop, Salif Sow, Regional Research and Training Centre for HIV and Associated Diseases, Department of Infectious Diseases, Fann Teaching Hospital, Dakar, Senegal; Virat Klinbuayaem, Suwalai Chalermpantmetagul, Surush Sununta, Institut de Recherche pour le Développement, Research Unit 174 (IRD/Faculty of Associated Medical Sciences, Chiang Mai University/Harvard, School of Public Health), Chiang Mai, Thailand; Institut d'Etudes Démographiques, Paris, France; Robert Murphy, Kimberly Saulsberry, Center for Global Health, Division of Infectious Diseases, Northwestern University, Chicago, IL; Christine and David Ellis Translation, Lyon, France; TransPerfect, New York; and ESTHER, Paris, France.