Psychological and physiological factors often contribute to the severity of labor pain and may confound the therapeutic effects of labor analgesia.1–3 Valid and reliable instruments guided by a health behavior theory are needed to elucidate the complex relationships between these behavioral factors and labor pain. Among health behavior theories, multiattribute utility (MAU) theory has the strength of breaking down a complicated decision into individual attributes within a hierarchical weighted utility structure and generating a unidimensional summation score that reflects the construct of interest.4 We previously developed a questionnaire based on MAU theory to evaluate postpartum women's attitudes toward labor epidural analgesia (ATLEA).5 However, that questionnaire, with 20 items and 10 rating categories on a scale between agree and disagree, was too complicated for practical use. Determining the weight of each item in the hierarchical framework to yield a summation score reflecting ATLEA is a further problem. Responses to questionnaire items are mainly determined by 3 factors: the person's aptitude for perceiving the questions pertinent to ATLEA, item difficulty, and the thresholds of the rating categories.6,7 To make the questionnaire easier to administer while still generating a valid measure of ATLEA, and to avoid weighting each item at different hierarchical levels, it may be worthwhile to simplify the MAU-based questionnaire by using the Rasch model to remove unnecessary rating categories and redundant items.
This model is a commonly used psychometric method for calibrating instruments8,9 that creates a unidimensional measurement locating items and persons on a common interval scale rather than on an ordinal scale.10 It has gained popularity in the medical field.11–14 The aim of our study was therefore to assess whether the Rasch model could be applied to simplify the original MAU-based questionnaire and convert multidimensional responses into a unidimensional measure of ATLEA. We also compared the reliability and validity of the simplified and full versions to assess the feasibility of using the Rasch model to simplify a questionnaire developed for health behavior research in anesthesia and analgesia.
This study was conducted at Taipei Veterans General Hospital, a tertiary medical center in Taiwan. We used data collected in a previous study5 to demonstrate the usefulness of Rasch analysis for simplifying a questionnaire originally developed to measure ATLEA. IRB approval was waived, but written informed consent was obtained from all participants in the initial study. Participants, enrolled from January to April 2006, were of mixed parity, were native speakers of Chinese, and had uncomplicated singleton pregnancies. All eligible parturients in this center during the study period were invited to participate after delivery. Exclusion criteria were elective cesarean delivery, emergency cesarean delivery without sufficient time to consider epidural labor analgesia, contraindications to epidural analgesia (e.g., bleeding diathesis, local infection), and a history of psychiatric disorders or substance abuse.
Measurement of ATLEA
The questionnaire used to measure ATLEA was developed in the previous study using MAU theory, and its reliability and validity were verified.5 The original questionnaire had 20 items, each representing a potentially influential factor in the decision to use labor epidural analgesia. It was administered between the 1st and 3rd postpartum days. Participants were asked to judge each item and indicate the extent to which they agreed or disagreed by marking 1 of the 10 “weights” on a “balance scale” between “for” and “against.” The more weights assigned, the more strongly they agreed or disagreed. Details of the original questionnaire are given in Supplemental Web Digital Content 1 (http://links.lww.com/AA/A318). A net weighted utility score on the ATLEA scale was calculated for each participant; an example of the summated utility score is illustrated in Supplemental Web Digital Content 2 (http://links.lww.com/AA/A319). Because the original scale divided the balance scale for each item into 10 portions, the ordinal responses to each item were recoded as integers from 0 to 9 (categories) for the Rasch analysis.
To remove unnecessary rating categories, we condensed the original rating scale of the MAU framework by merging response categories that were not used appropriately across the whole scale (e.g., categories with low frequency or overlapping with adjacent categories).15 The rating scale model was used to analyze item responses because our MAU questionnaire was designed with a rating scale structure common to all items. This Rasch model relates a person's latent trait, item difficulty, and category thresholds to the probability of selecting a specific category.6,9 Item difficulty is defined as the location parameter of an item on the same common scale as the person latent trait derived from the Rasch analysis. The mathematical formula of the Rasch rating scale model is presented in Supplemental Web Digital Content 3 (http://links.lww.com/AA/A320), and details of the method have been described elsewhere.7,16 Condensation was continued until the rating scale had the desired measurement properties according to Linacre's guidelines: more than 10 observations per category, a regular observation distribution, average measures that increase monotonically across categories, unweighted mean square (MSQ) <2, no disordering of step calibrations, coherence between ratings and measures, and step difficulties advancing by more than 1.4 and less than 5 logits.17 Category characteristic curves before and after condensation were plotted to depict the probability of selecting each category across ATLEA scores.
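The supplemental formula is not reproduced here, so the following sketch assumes the standard Andrich form of the rating scale model, in which the log-odds of choosing category k over category k − 1 equal θ − δ − τ_k (θ: person measure, δ: item location, τ_k: the k-th threshold shared by all items). The function names are hypothetical. It computes the category probabilities that a category characteristic curve plots for each person measure.

```python
import math


def rsm_category_probs(theta, delta, taus):
    """Category probabilities under the Andrich rating scale model (assumed form).

    theta: person measure in logits
    delta: item location (difficulty) in logits
    taus:  thresholds tau_1..tau_m shared by all items, in logits
    Returns [P(X = 0), ..., P(X = m)].
    """
    # Log-numerator of category k is sum_{j=1..k} (theta - delta - tau_j);
    # category 0 has log-numerator 0 by convention.
    log_num = [0.0]
    for tau in taus:
        log_num.append(log_num[-1] + (theta - delta - tau))
    # Subtract the largest term before exponentiating to avoid overflow.
    top = max(log_num)
    nums = [math.exp(v - top) for v in log_num]
    total = sum(nums)
    return [n / total for n in nums]


def category_curve(delta, taus, thetas):
    """Probability of each category over a grid of person measures (one row per theta)."""
    return [rsm_category_probs(t, delta, taus) for t in thetas]


if __name__ == "__main__":
    # Illustrative values: the common thresholds reported in the Results
    # (-1.02, -0.15, 1.16) and an item located at 0 logits.
    grid = [x / 2 for x in range(-8, 9)]  # -4.0 .. 4.0 logits
    for theta, probs in zip(grid, category_curve(0.0, [-1.02, -0.15, 1.16], grid)):
        print(f"{theta:5.1f}  " + "  ".join(f"{p:.3f}" for p in probs))
```

Adjacent curves cross exactly at θ = δ + τ_k. When the thresholds are in increasing order, each category is the most probable one over some range of θ, which is the property the condensation aimed to achieve.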
After rating category condensation, item fit statistics were checked, and misfit items were excluded from further analyses using the standardized fit statistic (ZSTD) criterion (outside the range −2 to 2) and the weighted MSQ criterion (outside the acceptable range 0.8 to 1.2).18 Individuals with unweighted ZSTD outside the range −5 to 5 were excluded from the analysis.19 Two reliability indices were reported: the reliability coefficient and the separation index. The reliability coefficient is analogous to Cronbach's α. Person separation indices of 1.50, 2.00, and 3.00 represent acceptable, good, and excellent separation ability, respectively.20
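The fit and reliability quantities above can be sketched for the rating scale model as follows. This is a simplified illustration under stated assumptions, not the Winsteps implementation: the outfit (unweighted) MSQ is taken as the mean of squared standardized residuals, the infit (weighted) MSQ as the sum of squared residuals divided by the sum of model variances, and the separation index G is derived from a reliability coefficient R as G = √(R / (1 − R)). The ZSTD transformation is omitted, and all function names are hypothetical.

```python
import math


def rsm_probs(theta, delta, taus):
    # Andrich rating scale model (assumed form): category k has
    # log-numerator sum_{j<=k} (theta - delta - tau_j); category 0 has 0.
    log_num = [0.0]
    for tau in taus:
        log_num.append(log_num[-1] + (theta - delta - tau))
    top = max(log_num)
    nums = [math.exp(v - top) for v in log_num]
    total = sum(nums)
    return [n / total for n in nums]


def item_fit(responses, thetas, delta, taus):
    """Return (infit MSQ, outfit MSQ) for one item; None marks a missing response."""
    sum_sq_resid = 0.0  # sum of squared raw residuals
    sum_variance = 0.0  # sum of model variances (information weights)
    sum_z2 = 0.0        # sum of squared standardized residuals
    n = 0
    for x, theta in zip(responses, thetas):
        if x is None:
            continue
        p = rsm_probs(theta, delta, taus)
        expected = sum(k * pk for k, pk in enumerate(p))
        variance = sum((k - expected) ** 2 * pk for k, pk in enumerate(p))
        resid_sq = (x - expected) ** 2
        sum_sq_resid += resid_sq
        sum_variance += variance
        sum_z2 += resid_sq / variance
        n += 1
    infit = sum_sq_resid / sum_variance  # weighted MSQ
    outfit = sum_z2 / n                  # unweighted MSQ
    return infit, outfit


def separation_from_reliability(r):
    """Separation index implied by a reliability coefficient: G = sqrt(R / (1 - R))."""
    return math.sqrt(r / (1.0 - r))


def reliability_from_separation(g):
    """Inverse relation: R = G^2 / (1 + G^2)."""
    return g * g / (1.0 + g * g)
```

Applied to the reliabilities reported in the Results, `separation_from_reliability(0.73)` gives about 1.64 and `separation_from_reliability(0.74)` about 1.69, consistent with the reported separation indices of 1.65 and 1.68 within the rounding of the reliability coefficients.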
After excluding misfit items and persons, we estimated individual ATLEA scores, item location (difficulty) parameters, and category thresholds for the final rating scale format. An item distribution map was constructed to show persons and items, with their rating category thresholds, on the same measurement scale. The correlation between ATLEA scores from the original MAU questionnaire and the simplified version was evaluated with the Pearson product–moment correlation coefficient. The area under the receiver operating characteristic (ROC) curve, with its 95% confidence interval (CI), was computed to compare the empirical validity of the ATLEA scale based on the simplified MAU questionnaire with that of the full version in predicting use of labor epidural analgesia. Rasch analyses were performed with Winsteps software, version 3.68 (Winsteps.com, Chicago, IL), and other analyses with SPSS 15.0 (SPSS Inc., Chicago, IL).
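The two validity analyses above can be sketched in plain Python. This is a minimal illustration, not the SPSS procedure used in the study: the area under the ROC curve is computed as the Mann–Whitney probability that a randomly chosen epidural user has a higher ATLEA score than a randomly chosen nonuser (ties counted as one half), and the 95% CI uses the Hanley–McNeil standard-error approximation, which is one common choice and an assumption here because the CI method is not specified. All names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def roc_auc(scores, used_epidural):
    """AUC as the Mann-Whitney probability; returns (auc, n_users, n_nonusers)."""
    pos = [s for s, u in zip(scores, used_epidural) if u]
    neg = [s for s, u in zip(scores, used_epidural) if not u]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a concordant pair
    return wins / (len(pos) * len(neg)), len(pos), len(neg)


def hanley_mcneil_ci(auc, n_pos, n_neg, z=1.96):
    """Approximate CI for the AUC using the Hanley-McNeil standard error."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    var = (auc * (1.0 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    se = math.sqrt(var)
    return max(0.0, auc - z * se), min(1.0, auc + z * se)
```

For example, `roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])` returns an AUC of 0.75 with 2 users and 2 nonusers, since 3 of the 4 user/nonuser pairs are correctly ordered.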
Two hundred five parturients were approached for study participation: 167 completed most of the questionnaire, and the remaining 38 declined to complete it. The baseline characteristics of participants are presented in Table 1.
Figure 1A shows category probabilities before condensation. Several categories had a low probability of selection and could be combined with adjacent categories. Figure 1B shows the result of condensing 10 categories into 4 by collapsing adjacent categories to resolve step disordering. The original categories 0 and 1 were combined as the new category “0,” 2 to 4 as “1,” 5 to 7 as “2,” and 8 to 9 as “3.” After condensation, each of the 4 remaining categories had a higher probability of being selected than the others within a specific range of ATLEA values. Table 2 describes the stepwise results of category condensation and exclusion of misfit items and persons leading to the final 9-item version. A table showing combinations of categories relative to Linacre's criteria can be found in Supplemental Web Digital Content 4 (http://links.lww.com/AA/A321). The person reliability and separation indices of the full version were 0.73 and 1.65, respectively. After removal of unnecessary rating categories, they increased to 0.74 and 1.68, respectively. The reliability remained unchanged after exclusion of 3 misfit persons and decreased to 0.68 after elimination of 11 misfit items. The correlation coefficient between ATLEA scores from the full version and the 9-item version was 0.89 (P < 0.001).
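The category condensation described above is a fixed recoding of each 0–9 response. The following sketch applies that mapping and tallies category frequencies, which feed the first of Linacre's criteria (more than 10 observations per category, as stated in the Methods). `COLLAPSE`, `collapse`, `category_counts`, and `sparse_categories` are hypothetical names, and missing responses are represented as `None`.

```python
# Original 10-point responses (0-9) -> condensed 4-point categories (0-3).
COLLAPSE = {0: 0, 1: 0,
            2: 1, 3: 1, 4: 1,
            5: 2, 6: 2, 7: 2,
            8: 3, 9: 3}


def collapse(responses):
    """Recode one item's responses; None (missing) stays None."""
    return [None if r is None else COLLAPSE[r] for r in responses]


def category_counts(responses, n_categories):
    """Number of observed responses in each category."""
    counts = [0] * n_categories
    for r in responses:
        if r is not None:
            counts[r] += 1
    return counts


def sparse_categories(counts, threshold=10):
    """Categories failing the 'more than `threshold` observations' guideline."""
    return [k for k, c in enumerate(counts) if c <= threshold]
```

Because the mapping only merges adjacent categories, the order of responses is preserved; the Rasch model is then refitted on the recoded data to check whether the thresholds are ordered.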
Table 3 shows item location parameters with their SEs and fit statistics on the common scale derived from the Rasch analysis. The locations of the remaining 9 items on the common scale ranged from −0.69 to 0.99, and their fit statistics were within the predetermined criteria. The item “tolerance to labor pain” (item 17) had the highest location parameter, and the item “active information collection” (item 8) had the lowest (Fig. 2). The common threshold parameters (± SE) for categories 1, 2, and 3 were −1.02 ± 0.09, −0.15 ± 0.07, and 1.16 ± 0.07, respectively. These threshold parameters can be used to predict which category of a specific item a person would select. For example, the 3 thresholds of the item “tolerance to labor pain” were −0.03 (common threshold 1 parameter [−1.02] + item location parameter [0.99]), 0.84 (common threshold 2 parameter [−0.15] + item location parameter [0.99]), and 2.15 (common threshold 3 parameter [1.16] + item location parameter [0.99]). Therefore, a parturient with an ATLEA score of 1.2 would be expected to select category “2” of this item, because her score is higher than the thresholds of categories “1” and “2” but lower than the threshold of category “3.” Figure 2 illustrates the item distribution map for the 9 items with their category thresholds. Four parturients had ATLEA scores below the first category threshold of the lowest located item (item 8), and 11 parturients had ATLEA scores above the 3rd threshold of the highest located item (item 17).
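The threshold arithmetic in the example above can be reproduced directly. The sketch below adds the reported common thresholds to an item's location and, assuming the thresholds are in increasing order as in the final scale, predicts the most probable category as the number of item thresholds that a person's ATLEA score exceeds. Function names are hypothetical.

```python
# Common thresholds for categories 1, 2, and 3 reported above (logits).
COMMON_THRESHOLDS = (-1.02, -0.15, 1.16)


def item_thresholds(item_location, common=COMMON_THRESHOLDS):
    """Item-specific thresholds: common threshold + item location parameter."""
    return [tau + item_location for tau in common]


def predicted_category(atlea, item_location, common=COMMON_THRESHOLDS):
    """Most probable category, valid only when thresholds are in increasing order.

    Under the rating scale model the odds of category k versus k - 1 exceed 1
    exactly when the person measure lies above threshold k, so the modal
    category is the count of thresholds the person measure exceeds.
    """
    thresholds = item_thresholds(item_location, common)
    if thresholds != sorted(thresholds):
        raise ValueError("rule assumes ordered thresholds")
    return sum(atlea > t for t in thresholds)


if __name__ == "__main__":
    # Item 17, "tolerance to labor pain", location 0.99
    print([round(t, 2) for t in item_thresholds(0.99)])  # [-0.03, 0.84, 2.15]
    print(predicted_category(1.2, 0.99))                 # 2
```

The same rule explains the extremes of the item distribution map: a parturient whose score lies below the first threshold of item 8 is expected to choose category “0” on every item, and one above the third threshold of item 17 is expected to choose category “3” on every item.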
Figure 3 depicts the ROC curves for the labor decision on epidural analgesia based on the ATLEA scale of the simplified MAU questionnaire (after exclusion of unnecessary items) and on the corresponding full-version ATLEA scale. The empirical validity of the ATLEA score for this decision, assessed by the area under the ROC curve, was 0.80 (95% CI: 0.74, 0.87) for the simplified version and 0.81 (95% CI: 0.75, 0.88) for the full version.
MAU theory has been used extensively to identify factors related to health behavior decisions (e.g., the decision to use epidural labor analgesia).21–23 However, collecting the empirical data required to construct the hierarchical decision framework and the “pro” and “con” statements is often not feasible if too many items are considered. Our previous study applied MAU theory to ascertain individual ATLEA and found that parturients were overwhelmed by superfluous items and rating categories in the original questionnaire.5 The questionnaire framework may be simplified by condensing rating categories using the Rasch model, which also takes item characteristics and person latent trait into account.15,17 This analytic approach has been applied extensively to rating category analysis. For example, Pesudovs and Noble used it to collapse redundant categories into adjacent ones to improve subjective scaling of pain,24 and Decruynaere et al. successfully applied Rasch analysis to reduce the number of response levels of face scales for pain assessment in children.12 In our study, the original 10 rating categories were collapsed into 4 without compromising validity and reliability. In addition, Rasch analysis yielded a simplified version of the questionnaire with fewer than half the items of the original. The simplified version has potential benefits over the full version, including less time and burden to complete and a possibly higher response rate, without compromising reliability and empirical validity.
Rasch analysis places items and persons on a common interval scale, and this ordering is of practical importance. For example, “active information collection” was the lowest item on the common scale, which implies that a parturient who strongly disapproved of active information collection for labor epidural analgesia would be expected to have a very negative ATLEA score. In contrast, “tolerance to labor pain” was the highest item on the scale, which implies that those who show little tolerance to labor pain would be expected to have a very positive ATLEA score. Therefore, a parturient's ATLEA score can be located easily on the common scale using relatively few items calibrated in this Rasch analysis. In addition, ATLEA scores from the Rasch analysis may be suitable for assessing changes in ATLEA under various conditions.
Rasch analysis had two additional benefits: it identified misfit persons who responded in an unexpected manner, and it could estimate the latent trait of interest for those who did not complete the questionnaire. It is not uncommon for some study participants to give false answers intentionally or to respond perfunctorily. Rasch analysis can help identify these participants through their fit statistics. For example, 1 of our participants gave the same answer to every question, and this parturient was easily identified by an abnormally high degree of misfit. Missing data are also extremely common in questionnaire surveys, and handling them is a serious problem. Excluding all respondents with missing data is clearly undesirable, because the remaining data set may be biased and distort the results. Although various imputation methods have been used to handle missing values, the validity of imputed data is still questionable.25 Given acceptable fit statistics, Rasch analysis may provide sound estimates based on the responses to the items that were answered.9
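The point about missing data can be illustrated with a maximum-likelihood estimate of one person's measure given fixed item locations and thresholds: unanswered items simply drop out of the likelihood. The sketch below uses Newton–Raphson on the rating scale model (assumed standard Andrich form), capping each step at 1 logit for stability; it is not Winsteps' estimation routine, and all names are hypothetical. Extreme scores (all lowest or all highest categories) have no finite maximum-likelihood estimate and are rejected.

```python
import math


def rsm_probs(theta, delta, taus):
    # Andrich rating scale model (assumed form): category k has
    # log-numerator sum_{j<=k} (theta - delta - tau_j); category 0 has 0.
    log_num = [0.0]
    for tau in taus:
        log_num.append(log_num[-1] + (theta - delta - tau))
    top = max(log_num)
    nums = [math.exp(v - top) for v in log_num]
    total = sum(nums)
    return [n / total for n in nums]


def estimate_person(responses, deltas, taus, max_iter=100, tol=1e-8):
    """ML person measure from answered items only; None marks a missing response."""
    answered = [(x, d) for x, d in zip(responses, deltas) if x is not None]
    if not answered:
        raise ValueError("no answered items")
    raw = sum(x for x, _ in answered)
    if raw == 0 or raw == len(taus) * len(answered):
        raise ValueError("extreme score: no finite maximum-likelihood estimate")
    theta = 0.0
    for _ in range(max_iter):
        expected = 0.0     # model-expected raw score at theta
        information = 0.0  # sum of response variances = -d2 logL / d theta2
        for _, d in answered:
            p = rsm_probs(theta, d, taus)
            e = sum(k * pk for k, pk in enumerate(p))
            expected += e
            information += sum((k - e) ** 2 * pk for k, pk in enumerate(p))
        # Newton step on the score equation raw - expected = 0, capped at 1 logit.
        step = max(-1.0, min(1.0, (raw - expected) / information))
        theta += step
        if abs(step) < tol:
            break
    return theta
```

A respondent who skipped an item thus receives a measure on the same scale as complete respondents, computed from the items she did answer; its standard error (approximately 1/√information) is larger because fewer items contribute.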
This study has several limitations. First, the reliability of the simplified questionnaire is only 0.68, so further validation is needed before this simplified version is applied more widely. Second, the original questionnaire was developed in Chinese, and its applicability to women from non-Chinese cultures is unknown; further revision and testing are necessary before it is used in other populations. Third, the original questionnaire was designed for MAU analysis, not for Rasch analysis, and the legitimacy of Rasch analysis in this setting is unclear. Fourth, the benefit of reducing categories with Rasch analysis requires further investigation. If a new 4-category version is studied in the future, it will need to be reanalyzed with the Rasch model, because parturients may not respond to 4-category items in the same way as to the original 10-category items. Fifth, because the questionnaire was completed after labor, women's ATLEA might have been affected by the labor process; ideally, it would be completed antepartum (before the onset of labor). Finally, the relatively small sample size limits the generalizability of the results to other populations without further validation.
In conclusion, we demonstrated the use of the Rasch model to simplify a questionnaire grounded in MAU theory by condensing rating categories and removing misfit items without compromising reliability and empirical validity. This result suggests that the methodology may be applicable to other questionnaires that use MAU theory to predict decisions on anesthetic procedures.
Name: Kuang-Yi Chang.
Contribution: Study design, conduct of the study, data collection, data analysis, and manuscript preparation.
Name: Mei-Yung Tsou, MD, PhD.
Contribution: Study design, conduct of the study, and manuscript preparation.
Name: Kwok-Hon Chan, MD.
Contribution: Study design, conduct of the study, and manuscript preparation.
Name: Hsiu-Hsi Chen.
Contribution: Study design, conduct of the study, data analysis, and manuscript preparation.
This manuscript was handled by: Cynthia A. Wong, MD.
1. Berentson-Shaw J, Scott KM, Jose PE. Do self-efficacy beliefs predict the primiparous labour and birth experience? A longitudinal study. J Reprod Infant Psychol 2009;27:357–73
2. McDonald RP. Test Theory: A Unified Treatment. Mahwah, NJ: Lawrence Erlbaum Associates, 1999
3. Niven C, Gijsbers K. A study of labour pain using the McGILL pain questionnaire. Soc Sci Med 1984;19:1347–51
4. Carter WB. Health behavior as a rational process: theory of reasoned action and multiattribute utility theory. In: Glanz K, Lewis FM, Rimer BK, eds. Health Behavior and Health Education: Theory, Research, and Practice. 1st ed. San Francisco: Jossey-Bass Publishers, 1990:63–91
5. Chang KY, Chan KH, Chang SH, Yang MC, Chen TH. Decision analysis for epidural labor analgesia with multiattribute utility (MAU) model. Clin J Pain 2008;24:265–72
6. Embretson SE, Reise SP. Item Response Theory for Psychologists. Mahwah, NJ: Lawrence Erlbaum Associates, 2000
7. Ostini R, Nering ML. Polytomous Item Response Theory Models. Thousand Oaks, CA: Sage Publications, 2006
8. Andrich D. Rasch Models for Measurement. Newbury Park, CA: Sage Publications, 1988
9. Bond TG, Fox CM. Applying the Rasch Model: Fundamental Measurement in the Human Sciences. 2nd ed. Mahwah, NJ: Lawrence Erlbaum Associates, 2007
10. Yen WM, Fitzpatrick AR. Item response theory. In: Brennan RL, ed. Educational Measurement. 4th ed. Westport, CT: Praeger Publishers, 2006:111–53
11. Chang KY, Tsou MY, Chan KH, Chang SH, Tai JJ, Chen HH. Item analysis for the written test of Taiwanese board certification examination in anaesthesiology using the Rasch model. Br J Anaesth 2010;104:717–22
12. Decruynaere C, Thonnard JL, Plaghki L. How many response levels do children distinguish on faces scales for pain assessment? Eur J Pain 2009;13:641–8
13. Hawthorne G, Densley K, Pallant JF, Mortimer D, Segal L. Deriving utility scores from the SF-36 health instrument using Rasch analysis. Qual Life Res 2008;17:1183–93
14. Lamoureux EL, Pesudovs K, Pallant JF, Rees G, Hassell JB, Caudle LE, Keeffe JE. An evaluation of the 10-item vision core measure 1 (VCM1) scale (the Core Module of the Vision-Related Quality of Life scale) using Rasch analysis. Ophthalmic Epidemiol 2008;15:224–33
15. Linacre JM. Investigating rating scale category utility. J Outcome Meas 1999;3:103–22
16. Wright BD, Masters GN. Rating Scale Analysis. Chicago: MESA Press, 1982
17. Linacre JM. Optimizing rating scale category effectiveness. J Appl Meas 2002;3:85–106
18. Sheu C, Chen C, Su Y, Wang W. Using SAS PROC NLMIXED to fit item response theory models. Behav Res Methods 2005;37:202–18
19. Wright BD, Stone MH. Best Test Design: Rasch Measurement. Chicago: MESA Press, 1979
20. Fisher WP. Reliability statistics. Rasch Meas Trans 1992;6:238
21. Carter WB, Beach LR, Inui TS. The flu shot study: using multiattribute utility theory to design a vaccination intervention. Organ Behav Hum Decis Process 1986;38:378–91
22. Petrou S, Morrell J, Spiby H. Assessing the empirical validity of alternative multi-attribute utility measures in the maternity context. Health Qual Life Outcomes 2009;7:40
23. Torrance GW, Feeny DH, Furlong WJ, Barr RD, Zhang Y, Wang Q. Multiattribute utility function for a comprehensive health status classification system. Health Utilities Index Mark 2. Med Care 1996;34:702–22
24. Pesudovs K, Noble BA. Improving subjective scaling of pain using Rasch analysis. J Pain 2005;6:630–6
25. Schafer JL, Olsen MK. Multiple imputation for multivariate missing-data problems: A data analyst's perspective. Multivar Behav Res 1998;33:545–71
© 2011 International Anesthesia Research Society