Otley, Anthony*; Smith, Claire†; Nicholas, David†; Munk, Marla†; Avolio, Julie†; Sherman, Philip M.†; Griffiths, Anne M.†
Assessment of health-related quality of life (HRQOL) is important for measuring the impact of chronic disease (1,2). HRQOL is determined not only by physical well-being, but also by mental state, the degree of social support, the effects of treatment, and the presence of complications (1,2). There has been increasing emphasis on the assessment of HRQOL in patients with chronic illness using generic or disease-specific measures. HRQOL measures provide a more global picture of health than do disease parameters alone, and can be used to identify patient needs and to assess efficacy of treatment as perceived by the patient.
The Inflammatory Bowel Disease Questionnaire (IBDQ) is a disease-specific measure of HRQOL that was developed through a rigorous process beginning with item generation using interviews with adult patients and clinician surveys (3). The IBDQ has been validated as an assessment tool that reflects the health status of adult patients with inflammatory bowel disease (IBD) (3), but it does not purport to reflect the impact of IBD on young patients. Existing generic measures of HRQOL in children, although important for facilitating comparisons across disease groups (2,4), are insensitive to IBD-specific issues.
Given the need perceived by pediatric gastroenterologists (5), we previously initiated the development of a disease-specific measure of HRQOL in pediatric IBD, beginning with a systematic exploration of the effects of Crohn's disease and ulcerative colitis on children and teenagers. The resulting questionnaire, herein called IMPACT, is intended to serve as both a descriptive and an evaluative tool. As a descriptive tool, it is intended to help identify, in individual patients, disparities between apparent IBD activity and severity (the organic, disease-related phenomena that physicians are accustomed to assessing) and emotional or functional disability. As an evaluative tool, it can be incorporated as an outcome measure in clinical trials. The item generation and item reduction stages of development of this multi-item measure have been reported previously (6). This manuscript describes the assessment of the feasibility, reliability, and validity of the IMPACT questionnaire as a disease-specific measure of HRQOL in pediatric IBD.
Overview of Questionnaire Development and Testing
A summary of the process of instrument development and testing is presented in Figure 1. The item generation, item reduction, and item selection stages were completed exclusively at the Hospital for Sick Children, Toronto. Twenty-five patients from the Izaak Walton Killam Hospital, Halifax, were also included in the feasibility, validity, and reliability testing of the instrument. The Human Subjects Review Boards of both participating Canadian centers approved all relevant phases of the study. All patients and their parents gave written informed consent.
In developing a multi-item measure comprising patient-generated concerns, we focused on children and adolescents with Crohn's disease and ulcerative colitis aged 10 to 17 years. Younger patients were excluded because of concern that systematic exploration of quality of life among very young children would require significantly modified methods. At the time the instrument development study was initiated, only 9% of patients in follow-up care through the Toronto pediatric IBD program were aged less than 10 years (6).
Item Generation, Reduction, and Selection
As previously reported, children and adolescents with Crohn's disease (n = 62) and ulcerative colitis (n = 20) were asked in private interviews to describe all the ways in which their bowel disease affected them (6). A 96-item item-reduction questionnaire incorporating these patient-generated concerns was then constructed. Selection of the concerns to be included in the IMPACT questionnaire was based on their ranking on the item-reduction questionnaire, which was administered to 117 children and adolescents (6). Mean scores and rankings of concerns among subgroups of patients (Crohn's disease versus ulcerative colitis; aged ≤12 years versus ≥13 years) have been summarized previously (6).
To approximate the range of scores of the widely used adult IBD-specific measure of HRQOL (3), the investigators agreed a priori that between 30 and 35 items would be retained in the final questionnaire. Items of greatest importance to all IBD patients were included, as were some items rated as very important by one subgroup of patients but not by others (6). Before the final selection of items, responses to the item-reduction questionnaire from another cohort of 25 adolescents with IBD at the University of Chicago were reviewed; no additional concerns ranked within the top third.
Description of the Instrument
IMPACT is a self-administered questionnaire that takes about 10 to 15 minutes to complete and contains 33 questions encompassing six domains: bowel (6 concerns), body image (3 concerns), functional/social impairment (11 concerns), emotional impairment (11 concerns), tests/treatments (3 concerns), and systemic impairment (2 concerns). Although a Likert scale with written response options was also formatted and pretested, a visual analogue scale proved easier for patients to complete and was therefore adopted. Each question is answered on a horizontal line with an anchor at each end, ranging from worst (0 cm) to best function (10 cm). As shown in Figure 2, patients are asked to mark the point on the line that best reflects their feelings in response to each question. The measured length (in cm) is multiplied by 0.7, giving a maximum score of 7 for each item, analogous to the range of item scores of the adult IBDQ (3). Individual items within IMPACT are equally weighted; hence total scores range from 0 to 231 (33 items × 7), with higher scores representing a better quality of life.
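The scoring arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' scoring software: the function names are hypothetical, and because the text does not say how blank items are handled, this sketch simply refuses to score an incomplete questionnaire (an assumption).

```python
# Hypothetical sketch of IMPACT total-score arithmetic as described in the text:
# each visual analogue mark is measured in cm from the "worst" anchor (0-10 cm),
# multiplied by 0.7 to give an item score of 0-7, and the 33 equally weighted
# item scores are summed (range 0-231). Blank-item handling is an ASSUMPTION:
# incomplete questionnaires are rejected rather than imputed.

N_ITEMS = 33
VAS_LENGTH_CM = 10.0
SCALE_FACTOR = 0.7  # 10 cm * 0.7 = 7, matching the adult IBDQ item maximum


def item_score(mark_cm: float) -> float:
    """Convert one VAS mark (cm from the 'worst' anchor) to a 0-7 item score."""
    if not 0.0 <= mark_cm <= VAS_LENGTH_CM:
        raise ValueError(f"mark must lie between 0 and {VAS_LENGTH_CM} cm")
    return mark_cm * SCALE_FACTOR


def total_score(marks_cm: list[float]) -> float:
    """Sum of equally weighted item scores; higher means better HRQOL."""
    if len(marks_cm) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} answered items, got {len(marks_cm)}")
    return sum(item_score(m) for m in marks_cm)


if __name__ == "__main__":
    print(total_score([10.0] * N_ITEMS))  # best possible response on every item
    print(total_score([0.0] * N_ITEMS))   # worst possible response on every item
```

Keeping the 0.7 factor as a named constant makes explicit that it exists only to align the item range with the adult IBDQ.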
Feasibility, Validity, and Reliability Testing
Children and adolescents aged 10 years or older who had had a diagnosis of Crohn's disease or ulcerative colitis for more than six months participated in the feasibility, validity, and reliability testing. Patients with isolated proctitis, an existing surgical ostomy, restorative pouch surgery, or a significant comorbid medical condition (e.g., diabetes, liver disease, or a psychiatric diagnosis) were excluded. Participating patients and their parents or caregivers were required to be fluent in English (minimum grade 4 reading level). Demographic data and disease characteristics were collected for all patients, including age, sex, school grade, height and weight, Tanner stage, diagnosis, disease location, current treatments, and treatments within the past year.
For the construct validity testing described below, generic measures of QOL and measures of disease activity were used.
Quality of life:
All patients completed a modified Cantril's Self-Anchoring Striving Scale (7), a vertical-ladder visual analogue scale on which patients provide global ratings of their quality of life. Patients were asked to indicate their QOL on the ladder for three different circumstances: “present life,” “life the best it has ever been,” and “life last year.” At the same study visit, the accompanying parent was asked to provide global ratings of the child's QOL using the same modified scale.
A subgroup of patients also completed two generic measures: the 87-item Child Health Questionnaire-Child Form (CHQ-87) (4) and the 80-item Piers-Harris Children's Self-Concept Scale (Piers-Harris) (8).
Current disease activity was assessed using the Pediatric Crohn's Disease Activity Index (PCDAI) (9) for Crohn's disease patients and the Colitis Symptom Score (10) for patients with ulcerative colitis. To facilitate analyses of all IBD patients together, PCDAI scores and colitis symptom scores were also converted to categorical groupings (“quiescent,” “mild,” and “moderate/severe”) using established cut-scores (9–11). The attending pediatric gastroenterologist (AMG/PMS/AO) made a global assessment of the patient's pattern of disease activity within the past year as either: “quiescent,” “mild symptoms only,” “exacerbations of at least moderate severity but remissions,” or “chronically active moderate/severe disease.”
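Converting a continuous activity index into the three categories used for the pooled analyses is a simple thresholding step. A minimal sketch follows. The cut-scores are passed in as parameters because the text cites them (9–11) without listing them. The PCDAI defaults shown (≤10 quiescent, >10 to 30 mild, >30 moderate/severe) are the values commonly quoted for that index, but they are an assumption here and should be checked against the cited sources; no Colitis Symptom Score cut-scores are assumed.

```python
# Hypothetical helper: map a continuous disease-activity score onto the three
# categories used in the text. Cut-scores are PARAMETERS, not study data.
# ASSUMPTION: PCDAI cut-scores of <=10 (quiescent) and <=30 (mild); verify
# against refs (9) and (11). Colitis Symptom Score cut-scores must be supplied.

PCDAI_CUTS = (10.0, 30.0)  # (upper bound of "quiescent", upper bound of "mild")


def categorize_activity(score: float, cuts: tuple[float, float]) -> str:
    """Return 'quiescent', 'mild', or 'moderate/severe' for one score."""
    quiescent_max, mild_max = cuts
    if quiescent_max >= mild_max:
        raise ValueError("quiescent upper bound must be below mild upper bound")
    if score <= quiescent_max:
        return "quiescent"
    if score <= mild_max:
        return "mild"
    return "moderate/severe"


if __name__ == "__main__":
    for pcdai in (7.5, 12.5, 30.0, 37.5):
        print(pcdai, categorize_activity(pcdai, PCDAI_CUTS))
```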
Feasibility assesses the ease with which subjects complete and researchers administer a questionnaire. Grammar and language difficulty of the IMPACT questionnaire were assessed with the Flesch Reading Ease Score and the Flesch-Kincaid Grade Level Score (12) using available word processing software (Microsoft® Word 97 SR-2). As a measure of data quality, we examined the number of questions left blank by respondents (13).
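Both readability indices are simple functions of average sentence length (words per sentence) and average word length (syllables per word). The sketch below uses the standard published coefficients for the Flesch Reading Ease and Flesch-Kincaid Grade Level formulas. The syllable counter is a crude vowel-group heuristic of our own (an assumption), so its counts, and therefore its scores, will not exactly match those produced by Microsoft Word.

```python
import re

# Standard Flesch Reading Ease and Flesch-Kincaid Grade Level formulas.
# ASSUMPTION: the tokenizer and syllable counter below are rough heuristics,
# not the algorithm used by any particular word processor.


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Approximate US school grade level needed to read the text."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def count_syllables(word: str) -> int:
    """Crude heuristic: count vowel groups, drop a silent final 'e'."""
    w = word.lower()
    n = len(re.findall(r"[aeiouy]+", w))
    if w.endswith("e") and not w.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)  # every word has at least one syllable


def text_counts(text: str) -> tuple[int, int, int]:
    """Return (words, sentences, syllables) for a passage of plain text."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    syllables = sum(count_syllables(w) for w in words)
    return len(words), sentences, syllables


if __name__ == "__main__":
    counts = text_counts("How often did you feel tired? Did pain stop you from playing?")
    print(counts, flesch_reading_ease(*counts), flesch_kincaid_grade(*counts))
```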
A measure is regarded as reliable if it is reproducible, i.e., if the same result is obtained when an unchanged entity is measured again (14). Using a standardized report form, all patients were contacted by telephone two weeks after they first completed the IMPACT questionnaire. If the patient perceived no change in disease activity and no new treatments had been added, he or she was asked to complete the questionnaire a second time and return it by mail.
Validity is concerned with whether a questionnaire actually measures what it purports to measure. In the absence of a clearly defined external criterion (“gold standard”) for HRQOL in pediatric IBD, construct validation is required (15). Construct validity assesses to what extent a measure “acts” the way one would predict based on the concept it represents (15). Table 1 summarizes the constructs developed a priori and the correlations hypothesized on the assumption that IMPACT scores reflect HRQOL in pediatric IBD.
Descriptive statistics were used to describe the demographic and disease-related data for the study participants. Statistical analyses were conducted using SPSS for Windows, Version 10.0 (16).
Cronbach's alpha was calculated as a measure of internal consistency (14). The intraclass correlation coefficient (ICC) was used to assess the test-retest reliability of the IMPACT questionnaire (17); an ICC >0.75 is considered excellent (17). Using the approach described by Donner and Eliasziw (18), a sample of approximately 30 patients was required to demonstrate an ICC >0.6, assuming that the underlying reliability is at least 0.80.
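The two reliability statistics can be sketched in plain Python. Cronbach's alpha compares the sum of the item variances with the variance of the total score. The text does not state which ICC form was used; the sketch assumes the two-way random-effects, absolute-agreement, single-measure form, ICC(2,1) in Shrout and Fleiss notation, which is one common choice for test-retest data. The function names are hypothetical, and no confidence interval is computed.

```python
from statistics import mean, variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """scores[i][j] = score of respondent i on item j (k >= 2 items).

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores)),
    using sample variances throughout.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def icc_2_1(ratings: list[list[float]]) -> float:
    """ASSUMED form ICC(2,1): ratings[i][j] = score of subject i on occasion j."""
    n, k = len(ratings), len(ratings[0])
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]
    # Two-way ANOVA sums of squares: subjects, occasions, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


if __name__ == "__main__":
    # A constant one-point shift at retest is penalized by absolute agreement.
    print(icc_2_1([[1, 2], [2, 3], [3, 4]]))
```

Under an absolute-agreement ICC, a systematic shift between test and retest lowers the coefficient even when subjects keep their rank order; a consistency form such as ICC(3,1) would not penalize it.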
Mean IMPACT scores were compared using analysis of variance (ANOVA) or Student's t test, as appropriate. A strong correlation was defined as r > 0.5, a moderate correlation as 0.4 < r ≤ 0.5, and a mild or weak correlation as 0.3 < r ≤ 0.4. Parametric techniques were used when testing confirmed that the assumption of a normal distribution held; where this assumption was not upheld, the corresponding nonparametric test was used.
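The correlation thresholds above can be applied mechanically once a correlation coefficient has been computed. The sketch below computes Pearson's r and labels it using the stated cut-points. Two points are assumptions: the labels are applied to the absolute value of r (so hypothesized negative correlations, for example between IMPACT scores and disease activity, are graded the same way), and values of 0.3 or below, which the text does not name, are returned as "not classified".

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_strength(r: float) -> str:
    """Label |r| with the cut-points defined in the text (ASSUMED: abs value)."""
    a = abs(r)
    if a > 0.5:
        return "strong"
    if a > 0.4:
        return "moderate"
    if a > 0.3:
        return "mild/weak"
    return "not classified"


if __name__ == "__main__":
    r = pearson_r([120, 150, 170, 190], [8, 6, 6, 3])  # made-up illustrative data
    print(round(r, 3), correlation_strength(r))
```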
Total IMPACT scores were analyzed as the dependent variable in a general linear model (GLM) regression analysis, fitted by backward elimination with main effects only (no interactions). The independent variables were the patient's age, disease type, current disease activity, pattern of disease activity within the past year, disease duration (in months), current steroid use, and current enteral nutrition use. A sample size of at least 100 subjects was needed for validity testing, which exceeds the number required to avoid overfitting the model.
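One way to see what a model of this size demands is to count the parameters it estimates. Each categorical predictor with L levels needs L - 1 coefficients, consistent with the numerator degrees of freedom reported for current disease activity (2) and pattern of disease activity (3). The sketch below applies a common rule of thumb of about 10 subjects per estimated parameter; that rule, and the assumption that age and disease duration enter as continuous terms, are ours and are not the authors' stated calculation.

```python
# Hypothetical parameter count for the regression model described in the text.
# ASSUMPTIONS: age and disease duration are continuous (1 df each); the
# "subjects per parameter" rule of thumb is illustrative, not from the source.


def categorical_df(n_levels: int) -> int:
    """A categorical predictor with n levels needs n - 1 dummy coefficients."""
    return n_levels - 1


PREDICTOR_DF = {
    "age (years, continuous)": 1,
    "disease type (Crohn's disease vs ulcerative colitis)": categorical_df(2),
    "current disease activity (quiescent/mild/moderate-severe)": categorical_df(3),
    "pattern of disease activity, past year (4 categories)": categorical_df(4),
    "disease duration (months, continuous)": 1,
    "current steroid use (yes/no)": categorical_df(2),
    "current enteral nutrition use (yes/no)": categorical_df(2),
}


def model_df(predictors: dict[str, int]) -> int:
    """Total number of slope coefficients in the full (pre-elimination) model."""
    return sum(predictors.values())


def min_sample_size(predictors: dict[str, int], subjects_per_parameter: int = 10) -> int:
    """Rule-of-thumb minimum n for the full model."""
    return model_df(predictors) * subjects_per_parameter


if __name__ == "__main__":
    print(model_df(PREDICTOR_DF), min_sample_size(PREDICTOR_DF))
```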
Demographic and disease characteristics of the 147 patients participating in the validation, reliability, and feasibility studies are shown in Table 2. Forty-nine of the 147 participants completed the two generic QOL questionnaires in addition to the other validation study questionnaires. Thirty-two patients whose disease status and treatment remained unchanged during the two weeks following the initial visit completed the test-retest portion of the reliability testing.
Readability statistics for the IMPACT questionnaire were excellent with a Flesch-Kincaid Grade level of 4.5, a Flesch Reading ease of 81.9, and 2% passive sentences. For the 147 participants, 33 (0.68%) of 4851 questions were left blank. Only one question was left blank by more than two participants: six participants (4%) left blank the question pertaining to being bothered by bleeding with a bowel movement.
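The data-quality figures above follow from simple counts: 147 respondents × 33 items gives 4851 possible answers, of which 33 were blank. The sketch below reproduces that arithmetic and shows how blanks might be tallied per item from a response matrix. The convention of representing a blank answer as `None` is an assumption for illustration.

```python
# Hypothetical data-quality helpers; a blank answer is ASSUMED to be None.

N_RESPONDENTS = 147
N_ITEMS = 33


def count_blanks(responses: list[list]) -> int:
    """Total number of blank answers in a respondents x items matrix."""
    return sum(1 for row in responses for x in row if x is None)


def percent_blank(n_blank: int, n_respondents: int, n_items: int) -> float:
    """Blank answers as a percentage of all possible answers."""
    return 100.0 * n_blank / (n_respondents * n_items)


def items_blank_more_than(responses: list[list], threshold: int) -> list[int]:
    """Indices of items left blank by more than `threshold` respondents."""
    n_items = len(responses[0])
    return [
        j for j in range(n_items)
        if sum(1 for row in responses if row[j] is None) > threshold
    ]


if __name__ == "__main__":
    print(N_RESPONDENTS * N_ITEMS)                                # possible answers
    print(round(percent_blank(33, N_RESPONDENTS, N_ITEMS), 2))   # percent blank
```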
Cronbach's alpha, reflecting internal consistency, was 0.96. In test-retest assessment, the intraclass correlation coefficient for the total score was 0.90 (95% CI, 0.80–0.95).
The a priori constructs and results are summarized in Table 1.
As shown in Figure 3, the total IMPACT score was significantly higher among patients with quiescent IBD (180 ± 32) than among those with active disease (146 ± 31 for mild, 133 ± 34 for moderate/severe) (P < 0.005). Mean total IMPACT scores tended to decrease with increasing disease activity, but the difference between mild and moderate/severe disease did not reach statistical significance with the number of patients involved (P > 0.05). As shown in Figure 4, patients whose pattern of disease activity within the preceding 12 months was categorized as quiescent had significantly higher total IMPACT scores (189 ± 24) than those whose pattern was chronically active moderate or severe disease (134 ± 32) (P < 0.005). Total scores among patients whose disease pattern was characterized by exacerbations and remissions (163 ± 32) or by mild symptoms only (161 ± 38) each differed significantly both from those of patients with a quiescent disease pattern and from those of patients with chronically active disease (P < 0.005). Total scores for patients with only mild symptoms did not differ from those for patients with moderate/severe exacerbations and remissions during the preceding year (P > 0.05).
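The group comparisons above rest on one-way ANOVA, which compares between-group variation in mean scores with within-group variation. A minimal sketch of the F statistic follows; the function name is hypothetical, and it cannot reproduce the reported P values because the group sizes are not given in the text. A P value would come from the upper tail of the F distribution with the returned degrees of freedom (for example `scipy.stats.f.sf(f, df_between, df_within)`), and pairwise group contrasts would need a post hoc procedure that the text does not specify.

```python
def one_way_anova_f(groups: list[list[float]]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    values = [x for g in groups for x in g]
    n_total, k = len(values), len(groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand = sum(values) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


if __name__ == "__main__":
    # Made-up total scores for three activity groups, for illustration only.
    print(one_way_anova_f([[185, 170, 192], [150, 140, 149], [130, 128, 141]]))
```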
The best-fit model for the regression analysis had R² = 0.330. The only significant factors in this model were the pattern of disease activity within the past 12 months (F(3,137) = 4.59, P = 0.004) and current disease activity (F(2,137) = 11.43, P < 0.0005).
Inflammatory bowel disease manifests during childhood or adolescence in up to 25% of patients. The chronic and often unpredictable gastrointestinal symptoms and required treatments impose psychologic and social stresses on young patients, which may be different from those experienced by adults. Just as the disease status of young patients is more accurately reflected by a pediatric multi-item measure than by the adult Crohn's disease activity index (CDAI) (11), HRQOL in pediatric IBD must be assessed using an instrument developed and validated among children and adolescents.
HRQOL encompasses the patient's perception of their health status, and hence items included in an evaluative tool must be patient-generated and patient-endorsed. The items included in the IMPACT questionnaire represent the end result of a rigorous process of patient interviews and systematic item reduction by a large number of young patients (6). Moreover, a recently reported cross-cultural comparison indicated that British children, who completed the item-reduction questionnaire used in IMPACT development, ranked issues similarly to the young Canadian patients (19). The current study provides evidence that the IMPACT questionnaire not only incorporates issues relevant to a large number of pediatric patients, but also reliably and validly reflects their HRQOL.
The readability statistics indicate that the questionnaire language and grammar are appropriate for the population studied. The low percentage of questions left blank by participants suggests that the questionnaire was not difficult to understand or complete. The number of questions and the range of total scores approximate those of the adult IBDQ, which should facilitate comparisons between studies, as advocated in a recent review of the conduct and interpretation of randomized controlled trials in IBD (20).
Reliability is a prerequisite of validity. Of the conventional approaches to assessing reliability, clinicians are most familiar with test-retest reliability: the likelihood that a test yields the same result on serial determinations when no change in health status has occurred. The excellent ICC of IMPACT compares favorably with that reported for the adult IBDQ (0.70) (21). A measure with good test-retest reliability allows a smaller sample size to be studied, because there is less variation between repeated measures. We selected a test-retest interval of two weeks, a commonly chosen interval for reliability assessment that is long enough to reduce the chance that participants will recall their previous responses, yet short enough to reduce the chance that disease activity will change (22).
It must be acknowledged, however, that the ICC obtained reflects the test-retest reliability in IBD patients with quiescent disease, as these inevitably comprise the major portion of patients with stable disease and unaltered treatment. Therefore, we can only presume that the IMPACT questionnaire is equally reliable for patients with varying degrees of active disease. This assumption is reasonable as patients with active but stable disease in general demonstrate better test-retest reliability (17).
Because of concerns about learning effects (i.e., patients remembering their previous responses, resulting in falsely high reliability) or changes in the patient's clinical state between measurements (falsely lowering reliability), psychometricians have not promoted the test-retest approach to reliability assessment (17). Instead, measures of internal reliability or consistency, such as Cronbach's alpha, are used. In a questionnaire composed of a homogeneous series of items, each item is considered a single measure of a common underlying characteristic, such as HRQOL in pediatric IBD; the questionnaire therefore functions as if the measurement were being repeated (17). The acceptable level of internal consistency depends on whether the measure is used to make decisions about groups of patients or about individual patients. Nunnally suggested that levels >0.90 are required for decisions about individual patients (23). The excellent internal consistency of the IMPACT questionnaire (alpha = 0.96) satisfies this more stringent criterion (23).
The results of the construct validity testing suggest that the IMPACT questionnaire does, as intended, measure HRQOL in pediatric IBD. As predicted, patients who rated their HRQOL as poor on the modified Cantril's Self-Anchoring Striving Scale had lower IMPACT scores. The correlation was strongest with the patient's global assessment of current HRQOL, which suggests that the IMPACT questionnaire reflects the patient's current health status more than his or her health status over the past year.
The results of the ANOVA comparing mean scores across the four categories of disease activity pattern in the preceding 12 months lead to the same conclusion. Patients with consistently quiescent disease had significantly higher QOL scores than patient groups with intermittently or chronically active disease. However, there was no significant difference between the mean scores of patients with mild symptoms only and those of patients with moderate/severe exacerbations and remissions. At the time the IMPACT questionnaire was completed, mean current disease activity did not differ between these two groups (P > 0.5) (data not shown). Although individual patients may have had periods of moderate/severe disease activity during the past 12 months, the current level of symptoms likely had the greatest influence on responses to the IMPACT questionnaire.
The strong influence of the patient's current health status on IMPACT scores is an important feature for the questionnaire's use in clinical trials. Responsiveness, a key property of any evaluative measure, is the sensitivity of the measure to a change in health status. If IMPACT scores were influenced more by the patient's health status over the preceding year than by current disease status, short-term responsiveness to changes in clinical status would be compromised.
Health-related quality of life as measured by the IMPACT questionnaire is a complex phenomenon that is not adequately captured by existing outcome measures for pediatric IBD. Although both current disease activity and the pattern of disease activity within the past 12 months account for some of the variability in total IMPACT scores, the modest explanatory power of the regression model (R² = 0.330) further highlights the complexity of QOL. If disease activity had predicted IMPACT scores too closely, this would imply that the two kinds of measure address the same phenomenon and would call into question the need for a separate HRQOL measure. Because diagnosis was not a significant variable, the regression model further supports the item selection process, which combined items of common importance to patients with Crohn's disease and ulcerative colitis with some items rated highly by only one of these groups. Age was also not significant in the model. Therefore, within the age group studied (10–17 years inclusive), which encompasses the majority of pediatric patients with IBD, different questionnaires or modes of administration do not appear to be required.
In 1997, the US Food and Drug Administration set forth a new mandate that facilitates licensure of new drugs and biologic agents whose clinical trials include a pediatric study arm (24). Given this mandate, an increasing number of pediatric clinical trials will likely be conducted. The choice of an appropriate outcome measure by which to judge the success or failure of a proposed new therapy is fundamental to the design of such trials.
We have demonstrated that the patient-generated IMPACT questionnaire is a valid and reliable measure of HRQOL in pediatric patients with Crohn's disease and ulcerative colitis. Responsiveness to short-term change can be inferred from the strong influence of current disease activity on IMPACT scores, but it should be studied further in the context of a prospective clinical trial.
The authors thank C. Smith, K. Green, H. Lomas, Drs. M. Rashid, M. Ste-Marie, and Barbara Kirschner for their assistance with patient recruitment; and Drs. H. Steinhart, C. Bombardier, and R. McLeod for their methodologic critiques of the study. This study was presented in part at the North American Society of Pediatric Gastroenterology and Nutrition 12th Annual Meeting, October 22–25, 1998, in Orlando, Florida. It was published in abstract form in The Journal of Pediatric Gastroenterology and Nutrition 1998;27(4):473.
1. Jenney MEM, Campbell S. Measuring quality of life. Arch Dis Child 1997; 77:347–54.
2. Garrett JW, Drossman DA. Health status in inflammatory bowel disease: biologic and behavioural considerations. Gastroenterology 1990; 99:90–6.
3. Guyatt G, Mitchell A, Irvine EJ. A new measure of health status for clinical trials in inflammatory bowel disease. Gastroenterology 1989; 96:804–10.
4. Landgraf JM, Ware J. The Child Health Questionnaire manual. Boston, MA, 1996.
5. Ferry G. Quality of life in inflammatory bowel disease: background and definitions. J Pediatr Gastroenterol Nutr 1999; 28:S15–S18.
6. Griffiths AM, Nicholas D, Smith C, et al. Development of a quality-of-life index for pediatric inflammatory bowel disease: dealing with differences related to age and IBD type. J Pediatr Gastroenterol Nutr 1999; 28:S46–S52.
7. Cantril H. The pattern of human concerns. New Brunswick: Rutgers University Press, 1965.
8. Piers EV, Harris DB. The Piers-Harris Children's Self-Concept Scale. Los Angeles, CA: Western Psychological Services, 1996.
9. Hyams JS, Ferry GD, Mandel FS, et al. Development and validation of a Pediatric Crohn's Disease Activity Index. J Pediatr Gastroenterol Nutr 1991; 12:439–47.
10. Beattie RM, Nicholls SW, Domizio P, et al. Endoscopic assessment of the colonic response to corticosteroids in children with ulcerative colitis. J Pediatr Gastroenterol Nutr 1996; 22:373–79.
11. Otley AR, Loonen H, Parekh N, et al. Assessing disease activity in pediatric Crohn's disease: which index to use? Gastroenterology 1999; 116:527–31.
12. Flesch RF. A new readability yardstick. J Appl Psychol 1948; 32:221–33.
13. McHorney CA, Ware JE, Lu JFR, et al. The MOS 36-item Short-Form Health Survey (SF-36): III. Tests of data quality, scaling assumptions, and reliability across diverse patient groups. Med Care 1994; 32:40–66.
14. Streiner DL, Norman GR. Health measurement scales: a practical guide to their development and use. Basic concepts, 2nd ed. New York: Oxford University Press Inc, 1995;4–14.
15. Cox J, Naylor D. The Canadian Cardiovascular Society Grading Scale for Angina Pectoris: is it time for refinements? Ann Intern Med 1992; 117:677–83.
16. SPSS for Windows, version 10.0.5. Chicago: SPSS Inc, 1999.
17. Portney LG, Watkins MP. Statistical measures of reliability (chapter 26). In: Foundations of clinical research: applications to practice. Norwich, CT: Appleton and Lange, 1993.
18. Donner A, Eliasziw M. Sample size requirements for reliability studies. Stat Med 1987; 6:441–48.
19. Richardson G, Griffiths AM, Miller V, et al. Quality of life in inflammatory bowel disease: a cross-cultural comparison of English and Canadian children. J Pediatr Gastroenterol Nutr 2001; 32:573–78.
20. Feagan BG, McDonald JWD, Koval JJ. Therapeutics and inflammatory bowel disease: a guide to the interpretation of randomized controlled trials. Gastroenterology 1996; 110:275–83.
21. Irvine EJ, Feagan B, Rochon J, et al. Quality of life: a valid and reliable measure of therapeutic efficacy in the treatment of inflammatory bowel disease. Gastroenterology 1994; 106:287–96.
22. Wright JG, Young NL. A comparison of different indices of responsiveness. J Clin Epidemiol 1997; 50:239–46.
23. Nunnally JC, Bernstein IH. Psychometric Theory. New York: McGraw Hill, 1994.
24. Federal Register. Regulations requiring manufacturers to assess the safety and effectiveness of new drugs and biologic products in pediatric patients; proposed rule. Fed Regist 1997; 62:43900–16.
© 2002 Lippincott Williams & Wilkins, Inc.