Retest variability compromises the ability of standard automated perimetry (SAP) to detect true progression in glaucoma. Variability in successive tests may be caused by patient factors such as small fixational eye movements during testing,1 limited neural dynamic range,2 or inherent problems in some SAP test strategies such as spatial undersampling.3,4
Different test strategies available in the same instrument (e.g., Swedish Interactive Threshold Algorithm (SITA) and Standard Full Threshold in the Humphrey Field Analyser [HFA]) may also report different visual field defect extents,5 depths,5,6 and differing levels of retest variability.7,8
The HFA 30-2 and the Medmont M700 “Central 30° test” (hereafter referred to as the 30° test) differ substantially in the distribution and spacing of test points. The HFA has test points arranged in a square grid pattern, with each point separated by 6°, with the central 10° of the 30-2 test only having 12 test points. The M700 30° test points are arranged in radial patterns around fixation (Fig. 1A).
The M700 30° test has a significantly increased central sampling rate compared with the HFA 24-2 and 30-2 protocols, which may result in better detection of central functional loss given that small but deep retinal nerve fibre layer defects may not be picked up as a scotoma using a 6° grid.9,10 The central 10° stimulus layout in the M700 30° test is an easily accessible clinical tool for detecting central and peripheral field loss within a single wide field test.
The M700 “Macula 10° test” (hereafter referred to as the 10° test) stimulus layout only differs from the central 10° of the 30° test by the addition of four test points located 1° from fixation (Fig. 1B).
To facilitate detection of progression, we propose to establish event-based progression criteria for the M700 visual field indices of Pattern Defect (PD) and Overall Defect (OD) for the 30° test and the 10° test protocols.
We intend to examine the retest variance of the central 10° of the 30° test and compare this with the retest variance of the 10° test. Comparing the test-retest variability (TRV) obtained from each strategy will determine whether repeatability differs between the two strategies.
In healthy individuals tested with the HFA, retest variance has been reported to be larger at mid-peripheral points than at points closer to fixation.11,12 The higher-density central sampling of the M700 may result in a different profile of variance from that of the peripheral field.3,4 We will therefore also compare the retest variance of the central 10° fields with the outer 20° of the 30° test.
Twenty-four glaucoma subjects were enrolled in this study. The research conformed to the tenets of the Declaration of Helsinki. Informed consent was obtained from the subjects after the nature and possible consequences of the test protocols were explained to the subjects. The appropriate consent forms were signed under ANU/ACT-Health protocol 7/07.667.
The patients had been diagnosed as having glaucoma by an ophthalmologist. Mean age was 73.6 years (range, 52.3 to 87.8 years), mean visual acuity was 6/8.1 (range, 6/5+ to 6/15-), and mean refractive error was +1.1/-0.9D (range, sphere +3.50 to −1.50D; cylinder, 0 to −1.75D).
The data set consisted of 40 eyes of 21 patients (seven males) with 160 tests in total. Pearson’s correlation coefficient (using PD) was 0.297, indicating a low level of correlation between eyes. In any case, no statistics that depend on between-eye correlation were estimated.
No subjects were perimetrically naive. Mean time between the subjects’ most recent field test before the study and the first field test undertaken in the study was 6.1 months (standard deviation [SD] 4.7 months). No training test was performed on any subject.
A Medmont M600 bowl and Medmont M700 software (version 3.9.7; Medmont P/L, Nunawading, Victoria, Australia) were used. The M600 bowl uses Goldmann size III (0.43°) rear-projection light-emitting diodes (peak wavelength, 565 nm). The fixation target is a yellow light-emitting diode with a peak wavelength of 583 nm. The maximum stimulus brightness is 318 cd/m² (0 dB) and is attenuated in sixteen 3 ± 1–dB levels. The background illumination is 10 apostilb (3.2 cd/m²).
All tests used the fast threshold strategy, with automatic blind spot location. The fast threshold strategy uses the following test order of presentation points: the calibration points (“C” instead of a dot in Fig. 1, A and B) are fully thresholded to initialize the neighborhood prediction function. This strategy estimates the projected threshold value for a point based on the completed points adjacent to it. An age- and population-based probability function is applied to each new test point based on the results from the surrounding exposed points. The true threshold level is then assessed using repeated exposures and statistical techniques. The fast threshold strategy does not use any spatial smoothing because individual points are not averaged with their neighbors. Because the calibration points for the 30° test and the 10° test remain the same, there is no alteration to the fast threshold strategy for each test.
All tests were carried out in the same room, with the lights turned off, to ensure consistent background illumination, and all subjects were continuously monitored throughout their testing. The test order at each visit was as follows: 30° test on the right eye, followed by a break of approximately 4 min, with the room lights off. A 30° test was then carried out on the left eye, followed by a break of approximately 15 min, with the room lights turned on. The 10° test was then carried out on the right eye, followed by a break of approximately 4 min, with the room lights turned off. The 10° test was then carried out on the left eye.
The lights in the room remained on while preparing and advising each patient before the commencement of any right-eye test. The lights were then turned off by the examiner, who sat down and commenced the test. Because the same examiner performed all the tests in exactly the same manner, there should be no differences in right-eye sensitivities.
The two test sessions were carried out 1 week apart. The first test session was carried out at a randomly allocated time of day for each subject, but the second session was done at the same time of day to minimize circadian variations in test results. Before the first test, the height of the test lens holder and the canthus level indicator to the chinrest were measured for each subject. These figures were used to center the test lens (48-mm diameter) and position the subject for all subsequent tests. These positions were not adjusted during testing. Pupils were not dilated, and the standard M700 response time setting was used.
Some tests with reliability criteria outside M700 recommendations were included in the study. No subject had more than one reliability index exceed the recommended criteria in any test, although some subjects had a single unreliable index in more than one test used in the study. False-negative results exceeding the recommended criteria all occurred in significantly damaged fields,13 and only two false-positive results exceeded the recommended criteria. Tests with fixation losses outside the recommended reliability criteria were included after due consideration was given to the examiner’s assessment of patient reliability. There were 21 such tests, although that number was reduced to 6 if reliable fixation loss criteria were increased to 33%.14
The M700 incorporates an asterisk-based staging system on its visual field printouts (Table 1), indicating the degree of abnormality in OD or PD. The number of asterisks is not staged according to any percentile value. One asterisk (*), two asterisks (**), and three asterisks (***) indicate a mild, moderate, or severe deviation from normal, respectively.
The M700 users’ manual defines OD as the mean difference between the age-normal Hill of Vision (HoV) and the mean deviation (MD) (or patient-based HoV). The patient HoV deviation shows the difference between the patients’ test results and what the patients’ HoV would look like without any localized test defects present. An algorithm is used to fit an HoV to the patient’s visual field, ignoring outliers and defect areas. Values for the patient HoV are only displayed if variation out to 15° is ±4.5 dB and beyond 15° is ±6.5 dB. A positive number indicates better than normal vision, and a negative number indicates a depressed field. The OD is calculated using a trimmed mean of the total deviations, with the amount of trimming being influenced by the severity of disease present, the presence of diffuse losses, and a high number of false-positive results.15
The M700 uses an algorithm to calculate an estimated HFA MD value. The HFA MD value is calculated using an older Medmont global index, the Average Defect, from the M600. The Average Defect index is not displayed on any M700 test report.
The PD is a measure of the clustering and depth of defects. It is a scaled mean value of the product of a point’s HoV deviation and that of its neighbors, quantifying the extent to which deviations are spatially correlated or clustered. Randomly distributed deviations from the patient’s HoV result in a small PD, whereas clusters of deviations cause the index to increase.
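The behavior described above can be illustrated with a minimal sketch. It is not the Medmont algorithm: the neighbor map, the unit scale factor, the function name, and the deviation values are all assumptions made for illustration, because the text does not give the exact scaling or neighborhood definition. It only shows why a mean of neighbor products grows when deviations cluster.

```python
def clustering_index(deviations, neighbors, scale=1.0):
    """Scaled mean of products between each point's deviation and its neighbors'.

    deviations: HoV deviations in dB, one per test point (hypothetical values).
    neighbors:  dict mapping a point index to the indices of adjacent points
                (hypothetical layout; the real M700 neighborhoods are not given).
    scale:      assumed scale factor; the true Medmont scaling is not stated.
    """
    products = [
        dev * deviations[j]
        for i, dev in enumerate(deviations)
        for j in neighbors[i]
    ]
    return scale * sum(products) / len(products)


# Four points in a row; each point neighbors the points beside it.
line_neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

clustered = [-6.0, -6.0, 0.0, 0.0]   # two adjacent depressed points
scattered = [-6.0, 0.0, -6.0, 0.0]   # same depths, but not adjacent

clustered_index = clustering_index(clustered, line_neighbors)
scattered_index = clustering_index(scattered, line_neighbors)
print(clustered_index, scattered_index)
```

In this toy layout the two fields contain identical deviations, yet only the field with adjacent defects yields a nonzero index, which is the property the PD description relies on.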
The TRV may be reported using the limits of agreement (LoA), which are the mean difference between measurements ±1.96 times the SD of the difference between measurements. The coefficient of repeatability (CoR) can be derived from a Bland-Altman plot by subtracting the mean difference from the upper 95% LoA. The TRV in the current study will be reported as the CoR, a single figure that can be used to define the TRV for M700 test parameters.
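As a minimal sketch of the definitions above, the LoA and CoR can be computed from paired measurements as follows. The function name and the five pairs of values are hypothetical and serve only to show the arithmetic; they are not data from this study.

```python
from statistics import mean, stdev


def limits_of_agreement(visit1, visit2):
    """Return (mean difference, lower LoA, upper LoA, CoR) for paired values."""
    diffs = [b - a for a, b in zip(visit1, visit2)]
    bias = mean(diffs)                    # mean difference between visits
    half_width = 1.96 * stdev(diffs)      # 1.96 x sample SD of the differences
    lower, upper = bias - half_width, bias + half_width
    cor = upper - bias                    # upper 95% LoA minus the mean difference
    return bias, lower, upper, cor


# Hypothetical index values (dB) for five eyes at two visits.
visit1 = [2.0, 5.5, 8.1, 3.2, 10.4]
visit2 = [2.6, 5.0, 9.0, 3.1, 11.2]

bias, lower, upper, cor = limits_of_agreement(visit1, visit2)
print(f"mean difference = {bias:.2f} dB, LoA = {lower:.2f} to {upper:.2f} dB, CoR = {cor:.2f} dB")
```

Because the CoR is the upper LoA minus the mean difference, it reduces to 1.96 times the SD of the between-visit differences and is independent of any systematic shift between visits.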
Statistical analysis was carried out using Medcalc version 12.4 (Medcalc Software bvba, Ostend, Belgium) and Matlab (Matlab 6.1; The MathWorks, Inc., Natick, MA). For comparison of retest variance, all measures from the 10° test excluded the four central-most points (Fig. 1) to ensure congruity with the central 10° of the 30° test.
The 30° test has 99 test points (excluding the blind spot). The overall average number of presentations was 206 presentations/99 test points (2.08/point). Average test duration for the 30° tests was 5 min 44 s. Test duration was highly negatively correlated with MD (r = −0.82, p < 0.0001), indicating that increasing test time was associated with increasing levels of visual field damage.
The TRV in the outer 20° of the 30° test (Fig. 2) was lower than that for corresponding sensitivities (from 5 to 20 dB) in the central 10° of the 30° test and the 10° test. Reduced TRV at sensitivities close to 0 dB is caused by limitations in the dynamic range of the instrument.7
The percentiles of the test-retest results for the 30° test and the combined results of the central 10° of the 30° test and the 10° test are shown in Fig. 3. The TRV is lower for the 0 to 1 dB bins because of the dynamic range of the instrument. The anomalous results for 34 to 37 dB values are caused by small sample sizes. The TRV remains high for the 30° test for initial test values below 17 dB, which is consistent with the findings from other studies.2,7
To test for the effect of eccentricity on variance, TRV of points in the central 10° of the 30° test was compared with TRV of points in the outer 20° of the same test. For each subject, the mean sensitivity (from the two tests) at each point was calculated and compared with the absolute value of the difference between visits at each point. The effect of age and sex was also assessed.
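The per-point quantities used in this comparison can be sketched as below. The helper names and the sensitivity values are hypothetical, and the sketch stops at the raw pairs: it does not reproduce the model that also included age and sex.

```python
def pointwise_retest(visit1, visit2):
    """Pair each location's mean sensitivity with its absolute retest difference (dB)."""
    return [((a + b) / 2, abs(b - a)) for a, b in zip(visit1, visit2)]


def mean_abs_error(pairs):
    """Average absolute between-visit difference across locations (dB)."""
    return sum(err for _, err in pairs) / len(pairs)


# Hypothetical sensitivities (dB) at four central and four outer locations.
central_v1, central_v2 = [24, 26, 20, 12], [25, 24, 23, 8]
outer_v1, outer_v2 = [18, 15, 22, 6], [17, 18, 21, 9]

central = pointwise_retest(central_v1, central_v2)
outer = pointwise_retest(outer_v1, outer_v2)
print(mean_abs_error(central), mean_abs_error(outer))
```

Keeping the mean sensitivity alongside each absolute difference matters because retest error depends on sensitivity, so central and outer points are best compared at equal decibel values rather than as pooled averages.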
The proportion of variance (r²) accounted for in the above model is 0.381. Although sex did not have any significant effect on TRV, age was found to have a significant effect (0.77 dB/decade from the mean age of the study subjects, p < 0.01). Eccentricity seemed to have little effect on retest error (0.03 ± 0.02 dB/deg, p = 0.10), whereas mean sensitivity had a more significant effect (0.08 dB [error]/dB [sensitivity], p = 0.01). These results indicate that in glaucoma subjects, using the M700, eccentricity does not have a significant effect on TRV.
OD and PD Variability
The OD and estimated HFA MD are highly correlated (r = 0.97) (Fig. 4). Using Bland-Altman analysis (not shown), the mean difference between the HFA MD and M700 OD was found to be −3.70 dB. The LoA obtained was −3.70 ± 1.3 dB (−5.01 to −2.40 dB).
In Fig. 4B, Pearson’s correlation coefficient was r = −0.61. Despite this moderately negative correlation, there is a noticeable spread of values above and below the regression line for all levels of PD.
The average OD value of approximately +2.9 dB (Table 2) is almost 6 dB higher than the M700 would indicate as being mildly abnormal (Table 1) for the average age of this glaucoma cohort. The reproducibility of the OD values was relatively good, with the CoR obtained being 2.1 to 2.4 dB (Table 2).
Figs. 5 and 6 demonstrate the disparity between OD and estimated HFA MD.
Panels A and B of Fig. 7 demonstrate the LoA for M700 PD and OD. For PD, the LoA was 0.3 ± 2.2 (−1.9 to 2.5 dB). For OD, the LoA was 0.1 ± 2.2 (−2.0 to 2.3 dB). A reviewer queried the normality of the distribution of the between-visit differences as well as the validity of computing the LoA as 1.96 × SD. A Kolmogorov-Smirnov test for normality found both the OD and PD differences to be normally distributed. A bootstrap method, based on 10,000 rounds of sampling with replacement, found good agreement with the 1.96 × SD method (e.g., the PD 5th and 95th percentiles were −1.04 ± 0.16 and 2.81 ± 0.91 [mean ± SE]). Given the SE, these do not seem very different from −1.9 and 2.5 and, if anything, the 1.96 × SD method is conservative and may overestimate the error.
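A bootstrap check of this kind can be sketched as follows. This is an illustration under stated assumptions, not the analysis code used in the study: the function names, the linear-interpolation percentile rule, the fixed seed, and the sample differences are all hypothetical.

```python
import random
from statistics import mean, stdev


def percentile(sorted_vals, p):
    """Linearly interpolated percentile (p in 0..100) of a pre-sorted list."""
    k = (len(sorted_vals) - 1) * p / 100
    lo = int(k)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)


def bootstrap_percentiles(diffs, lower_p=5, upper_p=95, rounds=10_000, seed=0):
    """Resample the differences with replacement and summarize two percentiles.

    Returns ((mean, SE) of the lower percentile, (mean, SE) of the upper one),
    where SE is the SD of the percentile across bootstrap rounds.
    """
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatable output
    lows, highs = [], []
    for _ in range(rounds):
        sample = sorted(rng.choices(diffs, k=len(diffs)))
        lows.append(percentile(sample, lower_p))
        highs.append(percentile(sample, upper_p))
    return (mean(lows), stdev(lows)), (mean(highs), stdev(highs))


# Hypothetical between-visit differences (dB), for illustration only.
diffs = [0.6, -0.5, 0.9, -0.1, 0.8, 1.4, -1.2, 0.3, 2.1, -0.4]
low_summary, high_summary = bootstrap_percentiles(diffs, rounds=2_000)
print(low_summary, high_summary)
```

The resulting percentile estimates, each with its standard error, can then be set beside the mean ± 1.96 × SD limits to judge whether the parametric limits are reasonable for the observed distribution.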
The mean and the CoR of the OD plot (Fig. 7A) are reflections of the M700 method of calculation of OD. Points outside the age-normal HoV are not used in the calculation of OD. As points of higher sensitivity demonstrate lower variability (Fig. 3), the variability of the OD descriptor may thus be reduced. The PD results (Fig. 7B) show the relationship of increasing variability with increasing PD (Table 3).
Table 3 shows PD results for the cohort of subjects in the current study (cf. severity criterion, Table 1). The PD results are more reflective of the varying levels of glaucomatous field loss than were the OD values. Severely abnormal individual PD results were obtained, as opposed to no moderately or severely abnormal OD results. The CoR was found to vary according to the level of PD.
The form of the root mean square error (Fig. 2) for the 30° test is broadly similar to that reported for the HFA 30-2 test.7 Because average dB results from the M700 have been reported to be approximately 5 dB lower than the HFA,16 Fig. 2 needs to be shifted to the right by 5 dB to compare with the HFA.7 In Fig. 2, the outer 20° of the 30° test showed slightly lower retest variability than both results from the central 10° for decibel values from 5 to 20 dB and quite similar values for all other decibel values. Points in the central 10° of the visual field reportedly have less variability than peripheral points in healthy subjects,11 but this may not be the case for points of equal sensitivity for glaucoma patients using the M700.
Variability has been reported to increase rapidly as sensitivity decreases,17 so the results of the current study may be biased if there is decreased sensitivity in the central 10°. Analysis of the 30° tests shows that, for the central 10° points, the average point sensitivity was 22.8 dB (1757 points, excluding two 37 dB and one 38 dB outliers). The outer 20° of the same fields showed an average point sensitivity of 17.4 dB (2118 points, which excluded one 37 dB outlier and the points directly above and below the blind spot). The lack of increase in peripheral TRV does not seem to be caused by decreased sensitivity of the central points.
The TRV did not increase with eccentricity when comparing points of equal decibel values, and eccentricity was not found to have a significant effect on retest variance in the M700 30° test (Table 4). Changes in peripheral point sensitivities may therefore have the same diagnostic importance as the same level of change in more centrally located points of equal sensitivity, and this finding needs to be incorporated into the interpretation of M700 field test results.
OD and PD Variability
In this study, the OD disease staging (Table 1, Figs. 5 and 6) did not seem to correlate with the amount of field loss present. Although the OD index might be assumed to be the equivalent of MD in the HFA, HFA MD equivalents give a mildly damaged field (Fig. 6) and a moderately damaged field (Fig. 5), whereas OD criteria have Fig. 6 being normal and Fig. 5 only mildly abnormal.
The most usable aspect of OD in the M700 may be the observation of change greater than the CoR of 2.4 dB in the 30° test (Table 2). Although OD may decline monotonically because of its method of calculation (using only points on the patient’s HoV considered to be within normal limits), this figure needs to be applied carefully in clinical practice.
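An event-based check of this kind amounts to comparing the between-visit change with the CoR. The sketch below assumes a single fixed CoR of 2.4 dB for OD in the 30° test (Table 2); the function name and the example OD values are hypothetical, and the caution above about how OD is calculated still applies.

```python
COR_OD_30_DEG = 2.4  # dB, CoR reported for OD in the 30° test (Table 2)


def exceeds_cor(baseline, follow_up, cor=COR_OD_30_DEG):
    """Return True when the change between two tests is larger than the CoR."""
    return abs(follow_up - baseline) > cor


# Hypothetical OD values (dB) from a baseline test and a follow-up test.
print(exceeds_cor(1.0, -1.8))  # change of 2.8 dB exceeds 2.4 dB
print(exceeds_cor(1.0, -1.0))  # change of 2.0 dB is within retest variability
```

For PD the same comparison would need a CoR matched to the level of PD present, because the CoR was found to vary with PD (Table 3).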
Pattern deviation measures in extremely damaged eyes may return a normal probability plot using the HFA.18 Data from Landers et al.19 suggest that PD continues to increase even in severely damaged fields with the M600 (an earlier version of the M700), as evidenced by the large PDs for the severely damaged fields in Figs. 5 and 6. A study using the International Classification of Disease and Health-Related Problems Glaucoma Staging Codes found that pattern standard deviation in the HFA was significantly higher in the severe-stage group than in the moderate-stage group.20 The results of these studies19,20 seem to indicate that significantly damaged fields (up to the level encountered in the current study) may be able to use event-based criteria as reported in the current study. M700 CoR values obtained reflect the importance of considering the amount of PD present when considering what constitutes statistically significant change, as found with MD in the HFA by Tattersall et al.21
Although trend analysis may be used to determine statistically significant change across time without the requirement for a reliable baseline field, a sufficient number of fields required for this analysis may be difficult to obtain in many clinical situations. Event-based criteria may not always be the optimum method of determining progressive field loss, but they are an accessible diagnostic tool for clinicians who may only have two fields per year with which to detect statistically significant progressive field loss.
John Graham Pearce
Eccles Institute for Neuroscience
John Curtin School of Medical Research
Australian National University
Canberra ACT 0200
Supported by the Australian Research Council through the ARC Centre of Excellence in Vision Science (CE0561903) and the Federal Government through the Nursing and Allied Health Scholarship Support Scheme. The views expressed in this article do not necessarily represent those of the NAHSSS, its administrator, Services for Australian Rural and Remote Allied Health, or the Australian Government Department of Health. Medmont figures and tables have been used with the kind permission of Medmont. None of the authors of this article has any commercial associations that may result in any conflict of interest in connection with the submitted manuscript. Author TM received royalties for sales of the FDT Matrix perimeters until mid-2015 and has patents and equity in a possible new objective perimeter being developed by NuCoria Pty. Ltd.
Received January 18, 2015; accepted October 1, 2015.
1. Wyatt HJ, Dul MW, Swanson WH. Variability of visual field measurements is correlated with the gradient of visual sensitivity. Vision Res 2007; 47: 925–36.
2. Gardiner SK, Swanson WH, Goren D, Mansberger SL, Demirel S. Assessment of the reliability of standard automated perimetry in regions of glaucomatous damage. Ophthalmology 2014; 121: 1359–69.
3. Maddess T. The influence of sampling errors on test-retest variability in perimetry. Invest Ophthalmol Vis Sci 2011; 52: 1014–22.
4. Maddess T. Modeling the relative influence of fixation and sampling errors on retest variability in perimetry. Graefes Arch Clin Exp Ophthalmol 2014; 252: 1611–9.
5. Aoki Y, Takahashi G, Kitahara K. Comparison of Swedish interactive threshold algorithm and full threshold algorithm for glaucomatous visual field loss. Eur J Ophthalmol 2007; 17: 196–202.
6. Budenz DL, Rhee P, Feuer WJ, McSoley J, Johnson CA, Anderson DR. Comparison of glaucomatous visual field defects using standard full threshold and Swedish interactive threshold algorithms. Arch Ophthalmol 2002; 120: 1136–41.
7. Artes PH, Iwase A, Ohno Y, Kitazawa Y, Chauhan BC. Properties of perimetric threshold estimates from Full Threshold, SITA Standard, and SITA Fast strategies. Invest Ophthalmol Vis Sci 2002; 43: 2654–9.
8. Saunders LJ, Russell RA, Crabb DP. Measurement precision in a series of visual fields acquired by the standard and fast versions of the Swedish interactive thresholding algorithm: analysis of large-scale data from clinics. JAMA Ophthalmol 2015; 133: 74–80.
9. Asaoka R. Mapping glaucoma patients’ 30-2 and 10-2 visual fields reveals clusters of test points damaged in the 10-2 grid that are not sampled in the sparse 30-2 grid. PLoS One 2014; 9: e98525.
10. Burk RO, Tuulonen A, Airaksinen PJ. Laser scanning tomography of localised nerve fibre layer defects. Br J Ophthalmol 1998; 82: 1112–7.
11. Heijl A, Lindgren G, Olsson J. Normal variability of static perimetric threshold values across the central visual field. Arch Ophthalmol 1987; 105: 1544–9.
12. Asman P, Heijl A. Weighting according to location in computer-assisted glaucoma visual field analysis. Acta Ophthalmol (Copenh) 1992; 70: 671–8.
13. Bengtsson B, Heijl A. False-negative responses in glaucoma perimetry: indicators of patient performance or test reliability? Invest Ophthalmol Vis Sci 2000; 41: 2201–4.
14. Bickler-Bluth M, Trick GL, Kolker AE, Cooper DG. Assessing the utility of reliability indices for automated visual fields. Testing ocular hypertensives. Ophthalmology 1989; 96: 616–9.
15. Vingrys AJ, Zele AJ. Robust indices of clinical data: meaningless means. Invest Ophthalmol Vis Sci 2005; 46: 4353–7.
16. Pye D, Herse P, Nguyen H, Vuong L, Pham Q. Conversion factor for comparison of data from Humphrey and Medmont automated perimeters. Clin Exp Optom 1999; 82: 11–4.
17. Russell RA, Crabb DP, Malik R, Garway-Heath DF. The relationship between variability and sensitivity in large-scale longitudinal visual field data. Invest Ophthalmol Vis Sci 2012; 53: 5985–90.
18. Blumenthal EZ, Sapir-Pichhadze R. Misleading statistical calculations in far-advanced glaucomatous visual field loss. Ophthalmology 2003; 110: 196–200.
19. Landers J, Sharma A, Goldberg I, Graham S. A comparison of global indices between the Medmont Automated Perimeter and the Humphrey Field Analyzer. Br J Ophthalmol 2007; 91: 1285–7.
20. Parekh AS, Tafreshi A, Dorairaj SK, Weinreb RN. Clinical applicability of the International Classification of Disease and Related Health Problems (ICD-9) glaucoma staging codes to predict disease severity in patients with open-angle glaucoma. J Glaucoma 2014; 23: e18–22.
21. Tattersall CL, Vernon SA, Menon GJ. Mean deviation fluctuation in eyes with stable Humphrey 24-2 visual fields. Eye (Lond) 2007; 21: 362–6.
Keywords: M700; retest variability; overall defect; pattern defect; visual field indices
© 2016 American Academy of Optometry