χ² tests showed significant differences between the non-dry eye and dry eye participants with respect to the proportion who reported frequent to constant discomfort and dryness (df = 1, χ² = 42.15 and 43.12, respectively, both p < 0.001). In addition, when the symptom intensity was grouped as low [intensity level 1 (not at all) to 2] and high (3 to 5 = very intense), there were associations between the diurnal variation and the intensity of symptoms of discomfort and dryness in the dry eye participants (df = 2, χ² = 6.45, p = 0.04 and χ² = 6.71, p = 0.03 for discomfort and dryness, respectively) but not in the non-dry eye participants (df = 2, χ² = 2.89, p = 0.24 for discomfort, and χ² = 2.18, p = 0.34 for dryness). The intensity of discomfort and dryness for the dry eye participants tended to increase in the evening (Figs. 2, 3b).
McMonnies Dry Eye Questionnaire.
The distribution and frequency of ocular symptoms for the non-dry eye and dry eye participants are presented in Fig. 4a, b.
The proportion of participants who experienced ocular symptoms was significantly different between those grouped as non-dry eye and dry eye (Fisher-exact tests, all p ≤ 0.01). The proportion of “often” to “constant” symptoms was also different between the two groups of participants (Fisher-exact test, p < 0.001).
Ocular Surface Disease Index.
The OSDI scores [median (lower-upper quartile)], comprising the item score and subscale scores, are presented in Table 2. There were significant differences between the two groups of participants for the OSDI item score as well as all subscores (all p < 0.001). The dry eye participants had higher scores for all aspects of the OSDI.
The hit rate against the false alarm rate for the different possible cutpoints of each questionnaire (ROC curve) is plotted in Fig. 5. Each figure has an inset showing how accurate the test was at separating symptomatic and asymptomatic subjects, using the area-under-the-curve (AUC) estimate (and its standard error). For a perfect diagnostic test (complete separation of the symptomatic and asymptomatic groups), AUC = 1.0.
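As an illustration only, the construction of a ROC curve from questionnaire scores can be sketched as follows. The function names and the example scores are hypothetical and are not the data or software used in this study; the sketch assumes that a higher summary score indicates more symptoms and that the AUC is estimated with the trapezoidal rule.

```python
from typing import List, Tuple


def roc_points(scores: List[float], symptomatic: List[bool]) -> List[Tuple[float, float]]:
    """Return (false alarm rate, hit rate) pairs, one per candidate cutpoint.

    A subject is flagged by the questionnaire when score >= cutpoint.
    Assumes both groups contain at least one subject.
    """
    n_pos = sum(symptomatic)
    n_neg = len(symptomatic) - n_pos
    points = [(0.0, 0.0)]  # cutpoint above every score: nobody is flagged
    for cut in sorted(set(scores), reverse=True):
        hits = sum(1 for s, y in zip(scores, symptomatic) if s >= cut and y)
        false_alarms = sum(1 for s, y in zip(scores, symptomatic) if s >= cut and not y)
        points.append((false_alarms / n_neg, hits / n_pos))
    return points


def auc_trapezoid(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical scores: complete separation of the two groups gives AUC = 1.0.
perfect = roc_points([1, 2, 3, 4, 5, 6], [False, False, False, True, True, True])
print(auc_trapezoid(perfect))
```

With overlapping groups the curve falls below the upper left corner and the AUC drops toward 0.5, the value expected for a test that does not discriminate.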
Table 3 shows the non-parametric (Spearman ρ) correlation coefficients between the summary scores for each test. The scores for the SESoD are for the second (self-administered) test.
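For readers unfamiliar with the statistic, a minimal sketch of the Spearman ρ calculation is given below: each set of scores is converted to ranks (tied values share the average rank) and the Pearson correlation of the ranks is taken. The function names and example values are hypothetical, and this is not the software used for Table 3.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values starting at i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of the ranks of x and y (undefined if either is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical summary scores from two questionnaires for five subjects.
print(spearman_rho([1, 2, 3, 4, 5], [2, 4, 6, 8, 100]))
```

Because only the ranks enter the calculation, any monotonic relationship between two summary scores yields a ρ near 1, regardless of how each score is scaled.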
Rasch Analysis (Unidimensionality)
Table 4 illustrates the unidimensionality of each of the DEQ, MQ, and OSDI assessed by mean square infit and outfit statistics.27 All items (questionnaires) fit the Rasch model (i.e., the scale is unidimensional or represents a single construct) with the mean square infit and outfit statistics ranging from 0.87 to 1.11.
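For reference, the mean square fit statistics are conventionally defined as below. The notation is introduced here for illustration and is an assumption, not taken from the original analysis: x_{ni} is the observed response of person n to item i, E_{ni} is the response expected under the Rasch model, W_{ni} is the model variance of that response, and N is the number of persons.

```latex
z_{ni} = \frac{x_{ni} - E_{ni}}{\sqrt{W_{ni}}}, \qquad
\text{Outfit MS}_i = \frac{1}{N} \sum_{n=1}^{N} z_{ni}^{2}, \qquad
\text{Infit MS}_i = \frac{\sum_{n=1}^{N} W_{ni}\, z_{ni}^{2}}{\sum_{n=1}^{N} W_{ni}}
                  = \frac{\sum_{n=1}^{N} \left(x_{ni} - E_{ni}\right)^{2}}{\sum_{n=1}^{N} W_{ni}}
```

Both statistics have an expected value of 1 when the data fit the model, which is why values close to 1 are read as support for unidimensionality.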
Arguably, the most dramatic increase in understanding of dry eye disease has come from an often misused word, its symptomatology. The study of the symptoms in dry eye stems from the clarification of the defining role the subjective aspects of the condition play,5,29 but also from a substantial improvement in the technology for measuring the symptoms. The differences between these instruments set the point of departure of this study: Can we at least tell whether these tools, each differing in levels of complexity, measure a single thing? Rasch Theory,30 among many things, provides a framework to examine just this, because it imposes a single continuum (latent variable) along which the measurements must operate for the theory to be valid. The data from the various questionnaires were tested using the summary (infit and outfit) statistics to determine whether the responses could be understood as though originating from a single underlying latent variable that was, in our instance, the rather ill-defined feeling of ocular discomfort. The data provide support for this view, and although there was an occasional person or questionnaire item that did not fit this notion, for both symptomatic and asymptomatic strata each questionnaire robustly demonstrated (Table 4) compliance with this Rasch Theory single-dimension constraint, as all the fit statistics were approximately equal to 1.27,31
If each instrument varied along the same single dimension, one would expect the scores from these instruments to be strongly associated. This was the case, with significant Spearman ρ for all pairwise groupings (Table 3). The combination of these strong correlations and unidimensionality suggests that the questionnaires might each be measuring the same thing, ocular dryness symptoms. This raises the question of whether more than one of these tests should be administered in any study, because the data they provide covary linearly (at least when expressed using the ranks) and might be considered redundant. One interpretation is that their generally similar behavior arises because the symptoms they measure are the same. However, correlations should be interpreted with caution,32 and another possible interpretation is that the associations arise through another variable. In that case, ignoring one of the questionnaires might be a mistake because, in another experiment in which that other variable is missing, the association might not exist. The current experiment does not enable us to separate these interpretations and, for the time being, it appears safe to say that each test measures something similar (or at least measures linearly related items) along a single continuum and, because the instruments are relatively different, each should be considered, depending on the context. At least one questionnaire lends itself to screening and the others provide more detail, if required, and therefore the questionnaire (or questionnaires) to select in an experiment would depend on the predictor variables. For example, if there was going to be emphasis on the change in symptoms during the day, perhaps the DEQ would be considered first because it has detailed questions about symptom variation during the day.
The correlation results point to the linear associations between summary scores from each of these questionnaires. These relatively strong associations might be expected because each questionnaire probes aspects of symptoms that arise from the ocular surface (as well as, perhaps, vision). The questionnaires are, however, different, not only in the language used but also in how the data from the questions are weighted and combined. The associations, rather than being self-evident, therefore demonstrate that in spite of the different language and calculations used, there is a general similarity in the results from the questionnaires; high scores on one are typically associated with high scores on another (and vice versa). These correlations are, of course, dependent on how the summary score for each questionnaire was derived, and the strength of the associations will vary depending on how each questionnaire’s questions are weighted. In this experiment, using the published weighting schemes for OSDI and MQ as well as novel simple sums for DEQ and SESoD, there are strong associations between the symptom summaries. If, for example, a different summary of the DEQ was derived (one using more of the questions than are used in our summary score), this would affect the strength of its association with the other questionnaire scores.
An additional intention of the study was to examine whether an easily administered simple screening method (the SESoD) could effectively be used to segregate symptomatic and asymptomatic groups. Although not tested directly, this was supported by the data: The ROC curves and AUC summary statistics (Fig. 5) illustrate, first, how similarly each questionnaire’s overall summary statistic separated the two groups (defined using SESoD) and, second, how effective MQ, DEQ, and OSDI were in terms of matching this symptom stratification. The converse of this provides an illustration of the face and construct validity of the SESoD, in that experimental and control groups similarly classified by the other questionnaires were also discriminated by the SESoD’s three-question classifier. An additional test of the SESoD’s performance was its repeated administration along with the other three questionnaires. The SESoD was initially administered during screening by a technician and served as the initial symptom grouping questionnaire. It was re-administered on the same day, along with the other three questionnaires (in random order), during the experimental phase. This second assessment was almost perfectly concordant with the first (AUC = 0.97) and, on the repeated administration, the symptom classification of only 3 of the 97 subjects would have changed. Considering that there are only three questions in this questionnaire and that a relatively short time elapsed between its first and second administration, this concordance is perhaps to be expected, but it does illustrate that the SESoD is repeatable and that the classification of symptomatic and asymptomatic subjects would typically not change after (slightly different) repeated testing.
In addition to using a single score for SESoD, we used an untested method to summarize the results from the DEQ. In our analyses, we used the sum of the scores from the symptom questions. Although the validity of this method has not been demonstrated, it was used instead of scores derived from factor analysis, which we have previously derived and which are the subject, in part, of another report.33 Despite this less sophisticated method, the DEQ performed as well as the other two validated20,34 single scoring techniques (Fig. 5). In addition, the sum of the DEQ symptom scores was correlated with those from the other techniques at least as well as the factor analysis derived score was correlated with the other questionnaires.33 The data in Table 1 and Fig. 5 were even simpler than a sum of all 36 questions related to symptoms in the DEQ, in that we summed just the first (intensity) question in each of the nine symptom questions and omitted the three detailed follow-up questions about each symptom. Despite this abbreviated score, the data show how well this linear subscore performed.
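The abbreviated DEQ summary described above can be sketched as follows. This is an illustration under stated assumptions, not the scoring code used in the study: the function name, the symptom labels, and the response values are hypothetical, and each symptom is assumed to be stored as a list of four answers with the intensity answer first.

```python
from typing import Dict, List


def deq_intensity_sum(responses: Dict[str, List[int]]) -> int:
    """Sum only the first (intensity) answer recorded for each symptom.

    The three follow-up answers stored after the intensity answer are ignored.
    """
    return sum(answers[0] for answers in responses.values())


# Hypothetical responses for three of the nine symptoms:
# [intensity, follow-up 1, follow-up 2, follow-up 3]
example = {
    "discomfort": [3, 2, 2, 4],
    "dryness": [4, 3, 3, 5],
    "soreness": [1, 1, 1, 1],
}
print(deq_intensity_sum(example))  # 3 + 4 + 1 = 8
```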
The general results of this experiment are similar to those in previous reports examining the performance of the instruments we used18–20,34 and show external validity of the outcomes of these experiments. For example, we were able to replicate the sensitivity of the OSDI in separating symptomatic and asymptomatic subjects.20 The same was true for both the MQ and DEQ despite the data in the other studies being sampled from different populations.18,19 Also, we were able to show more subtle effects, such as the diurnal variation in symptoms18 in the dry eye group (Figs. 2, 3b). These replicated results as well as the novel findings also demonstrate the utility of the instruments as well as the similarities between them.
In this article, we examined a number of issues that enable us to investigate the notion of the numbers obtained from dry eye questionnaires being measurements. We showed, for example, for the first time that each scale performed as though the numbers were from a single continuum. In addition, objectivity is a requirement of a measurement and, although we could not directly address this, we examined whether subjects used the questionnaires differently. Generally they did not, and the correlations between the data from each scale at least point to a level of objectivity to the measurements2,3,35 inasmuch as individuals apply the questionnaires similarly, so the measurements do not depend on the vagaries of the subjects.
In summary, we demonstrate the unidimensionality of the data obtained from the MQ, the OSDI, and the DEQ; their numerical association and similarities in segregating symptomatic and asymptomatic subjects; and, finally, the utility of the three-question SESoD to separate these groups.
The help of Dr. Joseph Vehige and Dr. Peter Simmons is greatly appreciated. This work was supported by Allergan LLC.
Trefford L. Simpson
Centre for Contact Lens Research
School of Optometry
University of Waterloo
200 University Ave. W
Waterloo, Ontario, Canada N2L 3G1
The appendix is available online at www.optvissci.com.
APPENDIX: SUBJECTIVE EVALUATION OF SYMPTOM OF DRYNESS (ALLERGAN) (SESoD)
Please evaluate your ocular discomfort due to the symptom of “Dryness” on a scale of 0 (none) to 4 (severe). You may use the following descriptions to assist in your score.
None (0) = I do not have this symptom.
Trace (1) = I seldom notice this symptom, and it does not make me uncomfortable.
Mild (2) = I sometimes notice this symptom, it does make me uncomfortable, but it does not interfere with my activities.
Moderate (3) = I frequently notice this symptom, it does make me uncomfortable, and it sometimes interferes with my activities.
Severe (4) = I always notice this symptom, it does make me uncomfortable, and it usually interferes with my activities.
1. Pointer MR. New directions—soft metrology—requirements for support from mathematics statistics and software. NPL Report CMSC 20/03. 2003.
2. Finkelstein L. Problems of measurement in soft systems. Measurement 2005;38:267–74.
3. Finkelstein L. Widely, strongly and weakly defined measurement. Measurement 2003;34:39–48.
4. Adcock R, Collier D. Measurement validity: a shared standard for qualitative and quantitative research. Am Polit Sci Rev 2001;95:529–46.
5. Lemp MA, Baudouin C, Baum J, Dogru M, Foulks GN, Kinoshita S, Laibson P, McCulley J, Murube J, Pflugfelder SC, Rolando M, Toda I. The definition and classification of dry eye disease: report of the Definition and Classification Subcommittee of the International Dry Eye Workshop (2007). Ocul Surf 2007;5:75–92.
6. Smith JA, Albietz J, Begley C, Caffery B, Nichols K, Schaumberg D, Schein O. The epidemiology of dry eye disease: report of the Epidemiology Subcommittee of the International Dry Eye Workshop (2007). Ocul Surf 2007;5:93–107.
7. Bron AJ, Abelson MB, Ousler G, Pearce E, Tomlinson A, Yokoi N. Methodologies to diagnose and monitor dry eye disease: report of the Diagnostic Methodology Subcommittee of the International Dry Eye Workshop (2007). Ocul Surf 2007;5:108–52.
8. Nichols KK, Nichols JJ, Mitchell GL. The lack of association between signs and symptoms in patients with dry eye disease. Cornea 2004;23:762–70.
9. Schein OD, Tielsch JM, Munoz B, Bandeen-Roche K, West S. Relation between signs and symptoms of dry eye in the elderly. A population-based perspective. Ophthalmology 1997;104:1395–401.
10. Korb DR. Survey of preferred tests for diagnosis of the tear film and dry eye. Cornea 2000;19:483–6.
11. Chia EM, Mitchell P, Rochtchina E, Lee AJ, Maroun R, Wang JJ. Prevalence and associations of dry eye syndrome in an older population: the Blue Mountains Eye Study. Clin Experiment Ophthalmol 2003;31:229–32.
12. Lee AJ, Lee J, Saw SM, Gazzard G, Koh D, Widjaja D, Tan DT. Prevalence and risk factors associated with dry eye symptoms: a population based study in Indonesia. Br J Ophthalmol 2002;86:1347–51.
13. McCarty CA, Bansal AK, Livingston PM, Stanislavsky YL, Taylor HR. The epidemiology of dry eye in Melbourne, Australia. Ophthalmology 1998;105:1114–19.
14. Moss SE, Klein R, Klein BE. Prevalence of and risk factors for dry eye syndrome. Arch Ophthalmol 2000;118:1264–8.
15. Schaumberg DA, Sullivan DA, Buring JE, Dana MR. Prevalence of dry eye syndrome among US women. Am J Ophthalmol 2003;136:318–26.
16. Schein OD, Munoz B, Tielsch JM, Bandeen-Roche K, West S. Prevalence of dry eye among the elderly. Am J Ophthalmol 1997;124:723–8.
17. Doughty MJ, Fonn D, Richter D, Simpson T, Caffery B, Gordon K. A patient questionnaire approach to estimating the prevalence of dry eye symptoms in patients presenting to optometric practices across Canada. Optom Vis Sci 1997;74:624–31.
18. Begley CG, Caffery B, Chalmers RL, Mitchell GL. Use of the dry eye questionnaire to measure symptoms of ocular irritation in patients with aqueous tear deficient dry eye. Cornea 2002;21:664–70.
19. McMonnies CW, Ho A. Responses to a dry eye questionnaire from a normal population. J Am Optom Assoc 1987;58:588–91.
20. Schiffman RM, Christianson MD, Jacobsen G, Hirsch JD, Reis BL. Reliability and validity of the Ocular Surface Disease Index. Arch Ophthalmol 2000;118:615–21.
21. Streiner DL, Norman GR. Health Measurement Scales: A Practical Guide to Their Development and Use, 3rd ed. New York: Oxford University Press; 2003.
22. Massof RW. The measurement of vision disability. Optom Vis Sci 2002;79:516–52.
23. Pesudovs K, Garamendi E, Elliott DB. The Quality of Life Impact of Refractive Correction (QIRC) Questionnaire: development and validation. Optom Vis Sci 2004;81:769–77.
24. Boeckstyns ME. Development and construct validity of a knee pain questionnaire. Pain 1987;31:47–52.
25. Swets JA. Measuring the accuracy of diagnostic systems. Science 1988;240:1285–93.
26. Simmons PA, Vehige JG, Carlisle C, Felix C. Comparison of dry eye signs in self-described mild and moderate patients. Invest Ophthalmol Vis Sci 2003;44:ARVO E-Abstract 2448.
27. Wright BD. Solving measurement problems with the Rasch model. J Educ Meas 1977;14:97–116.
28. Wright BD, Stone MH. Making Measures. Chicago: The Phaneron Press; 2004.
29. Lemp MA. Report of the National Eye Institute/Industry workshop on Clinical Trials in Dry Eyes. CLAO J 1995;21:221–32.
30. Andrich D. Rasch Models for Measurement. Newbury Park, CA: Sage Publications; 1988.
31. Bond TG, Fox CM. Applying the Rasch Model: Fundamental Measurement in the Human Sciences. Mahwah, NJ: L. Erlbaum; 2001.
32. Good PI, Hardin JW. Common Errors in Statistics (and How to Avoid Them), 2nd ed. Hoboken, NJ: Wiley; 2006.
33. Situ P, Simpson T, Jones L, Fonn D. Conjunctival and corneal sensitivity is associated with dry eye symptomatology. Invest Ophthalmol Vis Sci 2006;47:ARVO E-Abstract 262.
34. Nichols KK, Nichols JJ, Mitchell GL. The reliability and validity of McMonnies dry eye index. Cornea 2004;23:365–71.
35. Rossi GB. Measurability. Measurement 2007;40:545–62.
Keywords: dry eye; questionnaire; measurement of symptoms; Rasch analysis; ROC curve analysis
© 2008 American Academy of Optometry