Simpson, Trefford L. DipOptom, PhD; Situ, Ping MB, MSc; Jones, Lyndon W. FCOptom, PhD, FAAO; Fonn, Desmond MOptom, FAAO
Some things are difficult to measure, and it is not always clear, particularly clinically, that what we “measure” is actually a measurement at all. Symptoms fall into this category: with no external reference, how do we know that the metrics we use are in any way real? Pointer1 defines soft metrology as the measurement of parameters that, either singly or in combination, correlate with attributes of human response. Finkelstein2,3 has defined a framework for measurement in the social and psychological sciences that includes this concept of soft (or weak) measurement. The framework acknowledges, among other things, the lack of precisely defined procedures and of a well-formed theoretical underpinning. Nevertheless, for these soft measurements to be measurements at all, they still need a sound basis. In addition, measurements should behave predictably when certain experimental manipulations occur (construct validity), and instruments that measure similar underlying phenomena should be associated (criterion validity).4 In this article, we examine whether “measurements” of symptoms of dryness satisfy one requirement necessary for them to be measurements (unidimensionality) and whether construct and criterion validity can be demonstrated when these “instruments” are used.
According to the published report from the International Dry Eye Workshop, dry eye is a multifactorial disease of the tears and ocular surface that results in symptoms of discomfort, visual disturbance, and tear film instability, with potential damage to the ocular surface.5 Dry eye is a highly prevalent disease and, by definition, a symptomatic disorder.6 The severity of the symptoms varies, and the irritative symptoms adversely affect the quality of life of dry eye patients.6
Symptom assessment is important in the diagnosis and monitoring of patients with dry eye.7–10 A number of questionnaires have been developed and employed in both epidemiological studies11–17 and clinical research, to serve as screening instruments or to assess the effects of treatments and grade disease severity.18–20 Questionnaires have been shown to deliver reliable information in dry eye diagnosis and may provide a more integrated view of the clinical condition over time.6
Among the symptom questionnaires available, the Ocular Surface Disease Index (OSDI), the Dry Eye Questionnaire (DEQ), and the McMonnies Dry Eye Questionnaire (MQ) have been widely used.18–20 These questionnaires vary in length, design focus, and extent of validation. They commonly involve a series of rating-scale items that are combined to produce a total raw score; the exception is the DEQ, for which no method to generate a total score is yet available. The scores from these questionnaires may not exhibit essential properties of measurement, including (among others) unidimensionality. There have been no reports of whether this essential attribute of measurement is a characteristic of, or is similar across, these questionnaires, or of how these measurements of dry eye symptoms relate to each other.
Rasch analysis has been applied to subjective measurement across medicine, including pain, health-related quality of life, and vision outcome assessment.21–24 It provides a method for testing scale assumptions and modifying scale structure to produce a linear scale. One of the assumptions that may be tested is unidimensionality, an essential component of a single metric that implies (among other things) that greater and lesser scores on a scale represent greater and lesser amounts of the attribute being scored. In a dry eye questionnaire, for example, if poor vision could increase the score as well as symptom intensity, two separable, independent dimensions would be contributing, and the score would therefore not be unidimensional. In addition, receiver (or relative) operating characteristic (ROC) curve analysis examines the accuracy25 of each questionnaire in terms of how well it separates the symptomatic and asymptomatic groups.
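To make the unidimensionality constraint concrete, the following Python sketch computes category probabilities under the Andrich rating scale model, one common Rasch formulation for rating-scale items. It is an illustration under stated assumptions, not a reproduction of the analysis performed in this study: the person location `theta`, item location `delta`, and thresholds `taus` are hypothetical inputs in logits, and the function names are ours. Because each item's response probabilities depend on the person only through the single parameter `theta`, a higher expected score can only reflect more of that one attribute.

```python
import math


def rsm_category_probs(theta, delta, taus):
    """Category probabilities under the Andrich rating scale model.

    theta: person location in logits (e.g., level of dryness symptoms)
    delta: item location in logits
    taus:  category thresholds tau_1..tau_m in logits, shared by all items
    Returns m + 1 probabilities for ratings 0..m.
    """
    # Rating k has log-numerator sum_{j<=k} (theta - delta - tau_j);
    # rating 0 has an empty sum, i.e. 0.
    log_numerators = [0.0]
    running = 0.0
    for tau in taus:
        running += theta - delta - tau
        log_numerators.append(running)
    # Normalize (softmax), subtracting the maximum for numerical stability.
    top = max(log_numerators)
    weights = [math.exp(v - top) for v in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(theta, delta, taus):
    """Model-expected rating for one person on one item."""
    probs = rsm_category_probs(theta, delta, taus)
    return sum(k * p for k, p in enumerate(probs))
```

For example, with hypothetical thresholds `[-1.5, -0.5, 0.5, 1.5]` on a 0 to 4 scale, `expected_score` rises steadily as `theta` increases, which is the behavior a unidimensional symptom scale is expected to show.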
In the present study, we aimed to establish the associations between the questionnaires (the OSDI, DEQ, and MQ), to apply Rasch analysis to examine explicitly the unidimensionality of these symptom measurements, and, using ROC curve analysis, to determine how well the tests separated symptomatic and asymptomatic subjects in a sample of non-contact lens wearers with and without dry eye symptoms. In addition, the methods adopted enabled us to test the construct and face validity of the simple Subjective Evaluation of Symptom of Dryness (SESoD).26
METHODS AND MATERIALS
This study was conducted in accord with the Declaration of Helsinki, 1997 and received clearance from the University of Waterloo, Office of Research Ethics (Waterloo, Ontario, Canada). Informed consent was obtained from all subjects.
Ninety-seven non-contact lens wearing subjects with and without ocular dryness symptoms were enrolled in the study. None had a history of systemic disease or was using any systemic or topical medication that would affect ocular health. Slit-lamp biomicroscopy was undertaken to exclude clinically significant lid, conjunctival, or corneal abnormalities other than the clinical signs of dry eye.
An assistant assigned eligible subjects to one of two groups (dry and non-dry) using the grouping criteria, based on the SESoD (see Appendix, available online at www.optvissci.com), outlined in Table 1: scores of “none (0)” to “trace (1)” = “non-dry” and scores of “mild (2)” to “severe (4)” = “dry.” A battery of standard clinical dry eye tests, including bulbar and limbal redness, tear function, and appearance of the ocular surface, was then performed. The OSDI, DEQ, MQ, and SESoD (the latter for a second time) were then self-administered in random order to reduce potential order bias. The symptom score for each questionnaire was calculated according to its own scoring system, except for the DEQ, for which we used either the sum of the scores for all the symptom questions (i.e., questions 4a–d to 12a–d) or an abbreviated sum of the scores for the first item (a) of each symptom question (i.e., questions 4a to 12a).
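The grouping rule and the two DEQ summary scores described above can be sketched in Python. This is a minimal illustration, assuming responses are stored in a dictionary keyed by hypothetical item labels such as `"4a"` and that each DEQ item response is already coded numerically; the DEQ's own response coding is not specified here, and the function names are ours.

```python
def sesod_group(score):
    """Assign a group from a SESoD dryness rating (0 = none ... 4 = severe)."""
    if score not in range(5):
        raise ValueError("SESoD rating must be an integer from 0 to 4")
    # none (0) or trace (1) -> non-dry; mild (2) to severe (4) -> dry
    return "dry" if score >= 2 else "non-dry"


DEQ_SYMPTOM_QUESTIONS = range(4, 13)  # symptom questions 4 to 12 (nine symptoms)
SUBPARTS = "abcd"                     # four items per symptom question


def deq_full_sum(responses):
    """Sum of all symptom items 4a-d through 12a-d (36 items)."""
    return sum(responses[f"{q}{s}"] for q in DEQ_SYMPTOM_QUESTIONS for s in SUBPARTS)


def deq_abbreviated_sum(responses):
    """Sum of the first (a) item of each symptom question, 4a through 12a."""
    return sum(responses[f"{q}a"] for q in DEQ_SYMPTOM_QUESTIONS)
```

Keeping both DEQ summaries as separate functions mirrors the two scoring options in the text, so each can be correlated with the other questionnaires independently.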
Statistical associations and differences were examined using Statistica 7.0 (StatSoft Inc., Tulsa, OK), and p ≤ 0.05 was considered statistically significant. χ², Fisher-exact, and Mann-Whitney tests were used for comparisons between the two groups. Spearman rank correlation was used to examine the associations among the four questionnaires.
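Because symptom ratings such as the SESoD take only a few values, many scores are tied, so the Spearman coefficient must handle ties. A common approach, sketched below in Python, assigns tied values the mean of their ranks and then computes the Pearson correlation of the ranks (`scipy.stats.spearmanr` also uses average ranks for ties). This is a generic illustration rather than the Statistica routine used in the study, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values given the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of tie-averaged ranks.

    Raises ZeroDivisionError if either input is constant (rho is undefined).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

Working on ranks means that any monotonic relation between two questionnaires' summary scores yields a coefficient of 1 in magnitude, whatever their individual weighting schemes.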
ROC curve analysis was done using SPSS 15.0 (SPSS, Chicago, IL). ROC curves are plots of hit rate (true positive rate, or sensitivity) against false-alarm rate (false positive rate, or 1 − specificity). Briefly, the analysis quantifies how well each questionnaire segregated the previously classified subjects into their respective symptomatic and asymptomatic groups. Perfect segregation (i.e., classification identical to that from the initially administered SESoD) would produce a summary statistic (here, the area under the ROC curve) of 1.0. A perfectly useless questionnaire (one that segregated subjects randomly into symptomatic and asymptomatic groups) would have a summary statistic of 0.50.
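The area under the ROC curve has a direct probabilistic reading that makes the 1.0 and 0.50 benchmarks above intuitive: it equals the probability that a randomly chosen symptomatic subject scores higher than a randomly chosen asymptomatic subject, with ties counted as one half (the Mann-Whitney U statistic divided by the product of the group sizes). The Python sketch below computes this empirical AUC directly. It is an illustration, not the SPSS procedure, it assumes higher scores indicate more symptoms, and it does not compute the standard error.

```python
def auc_mann_whitney(dry_scores, nondry_scores):
    """Empirical area under the ROC curve.

    Equals the probability that a randomly chosen dry (symptomatic) subject
    scores higher than a randomly chosen non-dry subject, counting ties as 1/2.
    This is the Mann-Whitney U statistic divided by n_dry * n_nondry.
    """
    if not dry_scores or not nondry_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for d in dry_scores:
        for n in nondry_scores:
            if d > n:
                wins += 1.0
            elif d == n:
                wins += 0.5
    return wins / (len(dry_scores) * len(nondry_scores))
```

Complete separation of the groups gives 1.0, and identical score distributions give 0.5, matching the perfect and perfectly useless questionnaires described above.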
Rasch analysis was conducted using Winsteps 3.63.2 (http://www.winsteps.com/). Whether the data met the unidimensionality criterion imposed by Rasch theory was evaluated for each questionnaire using the infit and outfit mean square statistics.27 These statistics allow us to determine whether scores differ from the (unidimensional) prediction that follows from Rasch’s theory by comparing what was observed with what would be expected and estimating the odds that the discrepancies occurred by chance.28 Broadly speaking, the two statistics are interpreted in the same way: each quantifies the variation between the observed data and the data predicted by the Rasch model and has an expected value of 1.0. So, for example, an infit (or outfit) mean square of 1.20 indicates 20% more variation in the data than predicted by theory.27
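The infit and outfit mean squares can be illustrated for the simplest (dichotomous) Rasch model. The Python sketch below is a simplification, since the questionnaire items are polytomous rating scales, and it assumes person and item locations have already been estimated; the function names are hypothetical. Outfit is the unweighted mean of squared standardized residuals; infit weights squared residuals by the model variance, which makes it less sensitive to unexpected responses from persons located far from the item. Under the model, both have an expected value of about 1.

```python
import math


def rasch_prob(theta, delta):
    """Dichotomous Rasch model: probability of a positive (1) response."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))


def item_fit(responses, thetas, delta):
    """Infit and outfit mean squares for one item.

    responses: observed 0/1 responses, one per person
    thetas:    person locations in logits (assumed already estimated)
    delta:     item location in logits (assumed already estimated)
    """
    sq_resid, variances, z_sq = [], [], []
    for x, theta in zip(responses, thetas):
        expected = rasch_prob(theta, delta)
        variance = expected * (1.0 - expected)  # model variance of the response
        r2 = (x - expected) ** 2
        sq_resid.append(r2)
        variances.append(variance)
        z_sq.append(r2 / variance)              # squared standardized residual
    outfit = sum(z_sq) / len(z_sq)              # unweighted mean square
    infit = sum(sq_resid) / sum(variances)      # variance-weighted mean square
    return infit, outfit
```

For example, if a high-symptom person (`theta = 2`) responds 0 and a low-symptom person (`theta = -2`) responds 1 on an item at `delta = 0`, both statistics are well above 1, flagging responses that a single dimension does not predict.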
RESULTS
Characteristics of the Symptoms of the Four Questionnaires
The prevalence of the scores from the three screening questions is shown in Fig. 1.
Dry Eye Questionnaire.
Histograms of the ocular symptoms of discomfort and dryness for the non-dry eye and dry eye participants are presented in Figs. 2a and 3a. The diurnal variation in moderate to very intense symptoms of discomfort and dryness for the non-dry eye and dry eye participants is shown in Figs. 2b and 3b.
Fisher-exact tests showed significant differences between the non-dry eye and dry eye participants with respect to the proportion who reported frequent to constant discomfort and dryness (df = 1, χ² = 42.15 and 43.12, respectively, both p < 0.001). In addition, when symptom intensity was grouped as low [intensity level 1 (not at all) to 2] and high [3 to 5 (very intense)], there were associations between the diurnal variation and the intensity of symptoms of discomfort and dryness in the dry eye participants (df = 2, χ² = 6.45, p = 0.04 and χ² = 6.71, p = 0.03 for discomfort and dryness, respectively) but not in the non-dry eye participants (df = 2, χ² = 2.89, p = 0.24 for discomfort and χ² = 2.18, p = 0.34 for dryness). The intensity of discomfort and dryness in the dry eye participants tended to increase in the evening (Figs. 2b and 3b).
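With intensity grouped into two levels and three times of day (an assumption consistent with the reported df = 2), each χ² statistic above comes from a 2 × 3 contingency table. The Python sketch below computes Pearson's χ² for any r × c table and uses the fact that, for exactly 2 degrees of freedom, the χ² upper-tail probability is exp(−χ²/2). That identity allows the reported p values to be checked from the reported χ² values: 6.45, 6.71, 2.89, and 2.18 give p ≈ 0.04, 0.03, 0.24, and 0.34, respectively. The function names are hypothetical, and no study counts are reproduced.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    table: rows of observed counts, e.g. rows = intensity (low, high),
           columns = time of day (hypothetical layout).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df


def p_value_df2(chi2):
    """Upper-tail chi-square probability for exactly 2 degrees of freedom."""
    return math.exp(-chi2 / 2.0)


# Check the reported p values (df = 2) from the reported statistics.
for reported in (6.45, 6.71, 2.89, 2.18):
    print(reported, round(p_value_df2(reported), 2))
```

The closed form applies only when df = 2; other degrees of freedom need the general chi-square survival function.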
McMonnies Dry Eye Questionnaire.
The distribution and frequency of ocular symptoms for the non-dry eye and dry eye participants are presented in Fig. 4a, b.
The proportion of participants who experienced ocular symptoms was significantly different between those grouped as non-dry eye and dry eye (Fisher-exact tests, all p ≤ 0.01). The proportion of “often” to “constant” symptoms was also different between the two groups of participants (Fisher-exact test, p < 0.001).
Ocular Surface Disease Index.
The OSDI scores [median (lower–upper quartile)], comprising the overall item score and the subscale scores, are presented in Table 2. There were significant differences between the two groups of participants for the OSDI item score as well as for all subscale scores (all p < 0.001). The dry eye participants had higher scores for all aspects of the OSDI.
The hit rate against the false-alarm rate for the different possible cutpoints of each questionnaire (the ROC curve) is plotted in Fig. 5. Each panel has an inset showing how accurately the test separated symptomatic and asymptomatic subjects, using the area-under-the-curve (AUC) estimate and its standard error. For a perfect diagnostic test (complete separation of the symptomatic and asymptomatic groups), AUC = 1.0.
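The curves in Fig. 5 are traced by moving a cutpoint along a questionnaire's summary score. The Python sketch below, a minimal illustration with hypothetical function names, calls a subject "dry" when the score is at or above the cutpoint (assuming higher scores mean more symptoms), collects the (false-alarm rate, hit rate) pair at every observed cutpoint, and integrates the resulting curve by the trapezoid rule. For an empirical ROC curve built this way, the trapezoid area equals the Mann-Whitney estimate of the AUC with ties counted as one half.

```python
def roc_points(dry_scores, nondry_scores):
    """(false-alarm rate, hit rate) pairs for every cutpoint on a summary score.

    A subject is called 'dry' when score >= cutpoint. Cutpoints run from the
    highest observed score down to the lowest, preceded by (0, 0) for a cutpoint
    above every score, so the curve runs from (0, 0) to (1, 1).
    """
    cutpoints = sorted(set(dry_scores) | set(nondry_scores), reverse=True)
    points = [(0.0, 0.0)]
    for c in cutpoints:
        hits = sum(s >= c for s in dry_scores) / len(dry_scores)
        false_alarms = sum(s >= c for s in nondry_scores) / len(nondry_scores)
        points.append((false_alarms, hits))
    return points


def trapezoid_auc(points):
    """Area under a piecewise-linear ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

For instance, with hypothetical scores of 3 and 4 in the dry group and 1 and 3 in the non-dry group, the points are (0, 0), (0, 0.5), (0.5, 1), and (1, 1), and the area is 0.875.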
Table 3 shows the non-parametric (Spearman ρ) correlation coefficients between the summary scores for each test. The scores for the SESoD are for the second (self-administered) test.
Rasch Analysis (Unidimensionality)
Table 4 illustrates the unidimensionality of each of the DEQ, MQ, and OSDI, assessed by the mean square infit and outfit statistics.27 All three questionnaires fit the Rasch model (i.e., each scale is unidimensional, or represents a single construct), with mean square infit and outfit statistics ranging from 0.87 to 1.11.
DISCUSSION
Arguably, the most dramatic advance in the understanding of dry eye disease has come from its symptomatology, an often misused word. Interest in dry eye symptoms stems both from the clarification of the defining role that the subjective aspects of the condition play5,29 and from substantial improvement in the tools for measuring those symptoms. The differences between these instruments set the point of departure for this study: can we at least tell whether these tools, each with a different level of complexity, measure a single thing? Rasch theory,30 among other things, provides a framework to examine just this, because it imposes a single continuum (latent variable) along which the measurements must operate for the theory to be valid. The data from each questionnaire were tested using the summary (infit and outfit) statistics to determine whether the responses could be understood as originating from a single underlying latent variable, in our instance the rather ill-defined feeling of ocular discomfort. The data support this view: although an occasional person or questionnaire item did not fit this notion, for both the symptomatic and asymptomatic strata each questionnaire robustly complied (Table 4) with the single-dimension constraint of Rasch theory, as all the fit statistics were approximately equal to 1.27,31
If each instrument varied along the same single dimension, one would expect the scores from them to be strongly associated. This was the case, with significant Spearman ρ for all pairwise comparisons (Table 3). These strong correlations, combined with unidimensionality, suggest that the questionnaires might each be measuring the same thing: ocular dryness symptoms. This raises the question of whether more than one of these tests should be administered in any study, because the data they provide covary linearly (at least when expressed as ranks) and might be considered redundant. One interpretation is that the questionnaires behave similarly because the symptoms they measure are the same. However, correlations should be interpreted with caution,32 and another possible interpretation is that the associations arise through another variable. Ignoring one of the questionnaires might then be a mistake, because in another experiment the association might not exist (for example, because the other variable is absent). The current experiment does not enable us to separate these interpretations. For the time being, it appears safe to say that each test measures something similar (or at least measures linearly related items) along a single continuum and, because the instruments are relatively different, each should be considered depending on the context. At least one questionnaire lends itself to screening, and the others provide more detail if required; the questionnaire (or questionnaires) to select in an experiment would therefore depend on the predictor variables. For example, if the emphasis was to be on change in symptoms during the day, the DEQ might be considered first, because it has detailed questions about symptom variation during the day.
The correlation results point to linear associations between the summary scores from each of these questionnaires. These relatively strong associations might be expected, because each questionnaire probes aspects of symptoms that arise from the ocular surface (as well as, perhaps, vision). The questionnaires are, however, different, not only in the language used but also in how the data from the questions are weighted and combined. The associations are therefore not self-evident; they demonstrate that, in spite of the different language and calculations used, the results from the questionnaires are broadly similar: high scores on one are typically associated with high scores on another (and vice versa). These correlations, of course, depend on how the summary score for each questionnaire was derived, and the strength of the associations will vary with how each questionnaire’s questions are weighted. In this experiment, using the published weighting schemes for the OSDI and MQ and novel simple sums for the DEQ and SESoD, there were strong associations between the symptom summaries. If a different summary of the DEQ were derived (for example, one using more of the questions than are used in our summary score), this would affect the strength of its association with the other questionnaire scores.
An additional aim of the study was to examine whether an easily administered, simple screening method (the SESoD) could effectively be used to segregate symptomatic and asymptomatic groups. Although this was not tested directly, it was supported by the data: the ROC curves and AUC summary statistics (Fig. 5) illustrate, first, how similarly each questionnaire’s overall summary statistic separated the two groups (defined using the SESoD) and, second, how effectively the MQ, DEQ, and OSDI matched this symptom stratification. Conversely, this illustrates the face and construct validity of the SESoD, in that experimental and control groups classified similarly by the other questionnaires were also discriminated by the SESoD’s three-question classifier. An additional test of the SESoD’s performance was its repeated administration along with the other three questionnaires. The SESoD was initially administered by a technician during screening and served as the initial symptom grouping questionnaire. It was re-administered on the same day, along with the other three questionnaires (in random order), during the experimental phase. This second assessment was almost perfectly concordant with the first (AUC = 0.97): on repeated administration, the symptom classification of only 3 of the 97 subjects would have changed. Considering that the questionnaire has only three questions and that relatively little time elapsed between its first and second administrations, this concordance is perhaps to be expected, but it does illustrate that the SESoD is repeatable and that the classification of symptomatic and asymptomatic subjects would typically not change after (slightly different) repeated testing.
In addition to using a single score for the SESoD, we used an untested method to summarize the results from the DEQ: the sum of the scores from the symptom questions. Although the validity of this method has not been demonstrated, we used it instead of scores derived from our earlier factor analysis, which is in part the subject of another report.33 Despite this less sophisticated method, the DEQ performed as well as the other two questionnaires, with their validated20,34 single scoring techniques (Fig. 5). In addition, the sum of the DEQ symptom scores was correlated with the scores from the other techniques at least as well as the factor-analysis-derived score was correlated with the other questionnaires.33 The data in Table 1 and Fig. 5 were based on an even simpler score than the sum of all 36 symptom-related questions in the DEQ: we summed just the first (intensity) question for each of the nine symptoms and omitted the three detailed follow-up questions about each symptom. Despite this abbreviation, the data show that this linear subscore performed well.
The general results of this experiment are similar to those of previous reports examining the performance of the instruments we used18–20,34 and show the external validity of these outcomes. For example, we were able to replicate the sensitivity of the OSDI in separating symptomatic and asymptomatic subjects.20 The same was true for both the MQ and DEQ, even though the data in the other studies were sampled from different populations.18,19 We were also able to reproduce more subtle effects, such as the diurnal variation in symptoms18 in the dry eye group (Figs. 2b and 3b). These replicated results, together with the novel findings, demonstrate both the utility of the instruments and the similarities between them.
In this article, we examined a number of issues that enabled us to investigate whether the numbers obtained from dry eye questionnaires are measurements. We showed, for example, for the first time, that each scale performed as though the numbers came from a single continuum. In addition, objectivity is a requirement of a measurement; although we could not address this directly, we examined whether subjects used the questionnaires differently. Generally, they did not, and the correlations between the data from the scales at least point to a level of objectivity in the measurements,2,3,35 inasmuch as individuals applied the questionnaires similarly, so that the measurements did not depend on the vagaries of individual subjects.
In summary, we demonstrate the unidimensionality of the data obtained from the MQ, the OSDI, and the DEQ; their numerical association and similarity in segregating symptomatic and asymptomatic subjects; and, finally, the utility of the three-question SESoD in separating these groups.
ACKNOWLEDGMENTS
The help of Dr. Joseph Vehige and Dr. Peter Simmons is greatly appreciated. This work was supported by Allergan LLC.
Trefford L. Simpson
Centre for Contact Lens Research
School of Optometry
University of Waterloo
200 University Ave. W
Waterloo, Ontario, Canada N2L 3G1
The appendix is available online at www.optvissci.com.
APPENDIX: SUBJECTIVE EVALUATION OF SYMPTOM OF DRYNESS (ALLERGAN) (SESoD)
Please evaluate your ocular discomfort due to the symptom of “Dryness” on a scale of 0 (none) to 4 (severe). You may use the following descriptions to assist in your score.
None (0) = I do not have this symptom.
Trace (1) = I seldom notice this symptom, and it does not make me uncomfortable.
Mild (2) = I sometimes notice this symptom, it does make me uncomfortable, but it does not interfere with my activities.
Moderate (3) = I frequently notice this symptom, it does make me uncomfortable, and it sometimes interferes with my activities.
Severe (4) = I always notice this symptom, it does make me uncomfortable, and it usually interferes with my activities.
1. Pointer MR. New directions—soft metrology—requirements for support from mathematics statistics and software. NPL Report CMSC 20/03. 2003.
2. Finkelstein L. Problems of measurement in soft systems. Measurement 2005;38:267–74.
3. Finkelstein L. Widely, strongly and weakly defined measurement. Measurement 2003;34:39–48.
4. Adcock R, Collier D. Measurement validity: a shared standard for qualitative and quantitative research. Am Polit Sci Rev 2001;95:529–46.
5. Lemp MA, Baudouin C, Baum J, Dogru M, Foulks GN, Kinoshita S, Laibson P, McCulley J, Murube J, Pflugfelder SC, Rolando M, Toda I. The definition and classification of dry eye disease: report of the Definition and Classification Subcommittee of the International Dry Eye Workshop (2007). Ocul Surf 2007;5:75–92.
6. Smith JA, Albietz J, Begley C, Caffery B, Nichols K, Schaumberg D, Schein O. The epidemiology of dry eye disease: report of the Epidemiology Subcommittee of the International Dry Eye Workshop (2007). Ocul Surf 2007;5:93–107.
7. Bron AJ, Abelson MB, Ousler G, Pearce E, Tomlinson A, Yokoi N. Methodologies to diagnose and monitor dry eye disease: report of the Diagnostic Methodology Subcommittee of the International Dry Eye Workshop (2007). Ocul Surf 2007;5:108–52.
8. Nichols KK, Nichols JJ, Mitchell GL. The lack of association between signs and symptoms in patients with dry eye disease. Cornea 2004;23:762–70.
9. Schein OD, Tielsch JM, Munoz B, Bandeen-Roche K, West S. Relation between signs and symptoms of dry eye in the elderly. A population-based perspective. Ophthalmology 1997;104:1395–401.
10. Korb DR. Survey of preferred tests for diagnosis of the tear film and dry eye. Cornea 2000;19:483–6.
11. Chia EM, Mitchell P, Rochtchina E, Lee AJ, Maroun R, Wang JJ. Prevalence and associations of dry eye syndrome in an older population: the Blue Mountains Eye Study. Clin Experiment Ophthalmol 2003;31:229–32.
12. Lee AJ, Lee J, Saw SM, Gazzard G, Koh D, Widjaja D, Tan DT. Prevalence and risk factors associated with dry eye symptoms: a population based study in Indonesia. Br J Ophthalmol 2002;86:1347–51.
13. McCarty CA, Bansal AK, Livingston PM, Stanislavsky YL, Taylor HR. The epidemiology of dry eye in Melbourne, Australia. Ophthalmology 1998;105:1114–9.
14. Moss SE, Klein R, Klein BE. Prevalence of and risk factors for dry eye syndrome. Arch Ophthalmol 2000;118:1264–8.
15. Schaumberg DA, Sullivan DA, Buring JE, Dana MR. Prevalence of dry eye syndrome among US women. Am J Ophthalmol 2003;136:318–26.
16. Schein OD, Munoz B, Tielsch JM, Bandeen-Roche K, West S. Prevalence of dry eye among the elderly. Am J Ophthalmol 1997;124:723–8.
17. Doughty MJ, Fonn D, Richter D, Simpson T, Caffery B, Gordon K. A patient questionnaire approach to estimating the prevalence of dry eye symptoms in patients presenting to optometric practices across Canada. Optom Vis Sci 1997;74:624–31.
18. Begley CG, Caffery B, Chalmers RL, Mitchell GL. Use of the dry eye questionnaire to measure symptoms of ocular irritation in patients with aqueous tear deficient dry eye. Cornea 2002;21:664–70.
19. McMonnies CW, Ho A. Responses to a dry eye questionnaire from a normal population. J Am Optom Assoc 1987;58:588–91.
20. Schiffman RM, Christianson MD, Jacobsen G, Hirsch JD, Reis BL. Reliability and validity of the Ocular Surface Disease Index. Arch Ophthalmol 2000;118:615–21.
21. Streiner DL, Norman GR. Health Measurement Scales: A Practical Guide to Their Development and Use, 3rd ed. New York: Oxford University Press; 2003.
22. Massof RW. The measurement of vision disability. Optom Vis Sci 2002;79:516–52.
23. Pesudovs K, Garamendi E, Elliott DB. The Quality of Life Impact of Refractive Correction (QIRC) Questionnaire: development and validation. Optom Vis Sci 2004;81:769–77.
24. Boeckstyns ME. Development and construct validity of a knee pain questionnaire. Pain 1987;31:47–52.
25. Swets JA. Measuring the accuracy of diagnostic systems. Science 1988;240:1285–93.
26. Simmons PA, Vehige JG, Carlisle C, Felix C. Comparison of dry eye signs in self-described mild and moderate patients. Invest Ophthalmol Vis Sci 2003;44:ARVO E-Abstract 2448.
27. Wright BD. Solving measurement problems with the Rasch model. J Educ Meas 1977;14:97–116.
28. Wright BD, Stone MH. Making Measures. Chicago: The Phaneron Press; 2004.
29. Lemp MA. Report of the National Eye Institute/Industry workshop on Clinical Trials in Dry Eyes. CLAO J 1995;21:221–32.
30. Andrich D. Rasch Models for Measurement. Newbury Park, CA: Sage Publications; 1988.
31. Bond TG, Fox CM. Applying the Rasch Model: Fundamental Measurement in the Human Sciences. Mahwah, NJ: L. Erlbaum; 2001.
32. Good PI, Hardin JW. Common Errors in Statistics (and How to Avoid Them), 2nd ed. Hoboken, NJ: Wiley; 2006.
33. Situ P, Simpson T, Jones L, Fonn D. Conjunctival and corneal sensitivity is associated with dry eye symptomatology. Invest Ophthalmol Vis Sci 2006;47:ARVO E-Abstract 262.
34. Nichols KK, Nichols JJ, Mitchell GL. The reliability and validity of McMonnies dry eye index. Cornea 2004;23:365–71.
35. Rossi GB. Measurability. Measurement 2007;40:545–62.