Comparison of the Performance of a Novel, Smartphone-based, Head-mounted Perimeter (GearVision) With the Humphrey Field Analyzer

Supplemental Digital Content is available in the text. Precis: The agreement between a head-mounted perimeter [GearVision (GV)] and Humphrey field analyzer (HFA) for total threshold sensitivity was a mean difference of −1.9 dB (95% limits of agreement −5 to 1 dB). GV was the preferred perimeter in 68.2% of participants. Purpose: The purpose of this study was to compare reliability indices and threshold sensitivities obtained using a novel, smartphone-based, head-mounted perimeter (GV) with the HFA in normal, glaucoma suspect, and glaucoma patients. A secondary objective was to evaluate the subjective experience participants had with both perimeters using a questionnaire. Methods: In a prospective, cross-sectional study, 107 eyes (34 glaucoma, 18 glaucoma suspect, and 55 normal) of 54 participants underwent HFA and GV in random order. The main outcome measure was the agreement of threshold sensitivities using Bland and Altman analysis. Participants also completed a questionnaire about their experience with the devices. Results: Median false-positive response rate for GV was 7% (4% to 12%), while for HFA it was 0% (0% to 6%, P<0.001). Median false-negative response rate was similar for both tests. In all, 84 eyes with reliable HFA and GV results were included in the final analysis. Median threshold sensitivity of all 52 points on HFA was 29.1 dB (26.5 to 30.7 dB) and for GV was 30.6 dB (29.1 to 32.6 dB; P<0.001). Mean difference (95% limits of agreement) in total threshold sensitivity between HFA and GV was −1.9 dB (−5 to 1 dB). The 95% limits of agreement were fairly narrow (−8 to 2 dB) across the 6 Garway-Heath sectors. Most participants preferred to perform GV (68.2%) if required to repeat perimetry compared with HFA (20.6%, P<0.001). Conclusions: There was fairly good agreement between the threshold sensitivities of GV and HFA. GV was also preferred by most patients and could potentially supplement HFA as a portable or home perimeter.

Standard automated perimetry is the reference standard test for identifying glaucomatous visual field (VF) defects. It is performed on devices such as the Humphrey field analyzer (HFA) and the Octopus perimeter, which evaluate threshold sensitivities in the central VF. 1 However, these are bulky instruments available only in a hospital setting. In addition, these machines have several testing requirements, such as a dimly lit room and the need for patients to keep their head steady throughout the test. A qualitative study of glaucoma patients has shown that such factors cause patients to dislike VF testing. 2 Hence, there is a need for a smaller, user-friendly perimeter.
Recently, virtual reality (VR)-based methods using portable, head-mounted devices (HMDs) have been proposed as a means to estimate threshold sensitivities in VFs. Several such perimeters like imo (CREWT Medical Systems, Tokyo, Japan), VirtualEye, Kasha, and C3 field analyzer (CFA; Remidio Innovative Solutions, Bengaluru, Karnataka, India) have been developed but are not readily available. [3][4][5][6][7] Some of these are custom-made and hence expensive. [3][4][5] Others are low cost but have only reported their efficacy as a screening tool using suprathreshold programs. 6,7 GearVision (GV) is an app-based perimeter developed based on a commercially available VR headset and a compatible smartphone. 8 It enables a person to take the perimetry test at a convenient time and place, such as their own home. It is lightweight and can be carried to remote locations in telemedicine scenarios. It currently offers threshold and suprathreshold programs for VF mapping.
The purpose of our study was to compare the reliability indices and threshold sensitivities of GV with HFA in glaucoma, glaucoma suspect, and normal eyes. A secondary objective was to evaluate the subjective experience of patients with both perimeters using a questionnaire (Supplemental Digital Content 1, http://links.lww.com/IJG/A534).

METHODS
This was a prospective, observational, cross-sectional study conducted between March 2019 and January 2020. The methodology adhered to the tenets of the Declaration of Helsinki for research involving human participants. Written informed consent was obtained from all participants after approval by the Institute's Ethics Committee.
Study participants included glaucoma patients, glaucoma suspects, and healthy controls. Glaucoma patients had characteristic optic nerve head (ONH) changes (rim narrowing, notching) or retinal nerve fiber layer defects and correlating VF changes on HFA. Glaucoma suspects had suspicious ONHs (loss of the ISNT rule, neuroretinal rim thinning, or a cup-to-disk ratio asymmetry of ≥ 0.2 between the 2 eyes) as determined by glaucoma experts, but no correlating VF changes. Healthy controls had an intraocular pressure ≤ 21 mm Hg, normal ocular examination as determined by a glaucoma expert, and no characteristic VF defects. Glaucoma patients and glaucoma suspects were recruited from the glaucoma clinic. Controls were either hospital staff or patients who consulted the hospital for a routine eye examination or a refractive error. Inclusion criteria were age 18 years and above, best-corrected visual acuity (BCVA) of 20/40 or better, and refractive error within ± 5 D sphere and ± 3 D cylinder. Eyes with a history of trauma or inflammation were excluded, as were eyes which underwent intraocular surgery within the previous 6 months. Additional exclusion criteria were the presence of any retinal or neurological disease other than glaucoma. All participants underwent a comprehensive ocular examination, which included medical history, BCVA measurement, slit-lamp biomicroscopy, applanation tonometry, dilated fundus examination, and VF examination (in a random order) with HFA II (model 720i; Zeiss Humphrey Systems, Dublin, CA) and the GV perimeter. Both eyes of every participant were tested and, if eligible, were included in the study.
The technical specifications of the GV perimeter have been detailed elsewhere. 8 In brief, the hardware consists of a Samsung S8 smartphone (https://m.gsmarena.com/samsung_galaxy_s8-8161.php) and a Samsung GearVR HMD with a Samsung GearVR Controller (www.samsung.com/global/galaxy/gear-vr/). The entire setup weighs 565 g. A smartphone application has been developed for the Android platform using the Oculus Mobile SDK. The VR device renders the display for each eye separately and therefore allows each eye to be tested individually without an eye patch. Participants are required to wear their glasses during the test, and fine adjustments for refractive error may be made using an adjustment knob on the HMD. They can move their head during the test. The GV application also provides periodic breaks to allow the participant to rest if required.
The current version of GV provides reliability parameters, namely fixation losses (FLs), false-positive (FP), and false-negative (FN) response rates. FLs were detected using the Heijl-Krakau method. Before the full-threshold test, the patient undergoes a short demonstration test to determine the location of his or her physiological blind spot. During this test, 30 stimuli are presented: 15 (5 each) at the 3 potential blind spot locations and the rest distributed randomly over the VF. The blind spot location is taken as the one at which the patient detects the fewest stimuli. During the threshold test, 15 stimuli are presented at the identified blind spot location and an additional 3 stimuli at each of the other potential blind spot locations. FLs are calculated on the basis of the patient's responses to these stimuli. FPs were calculated by treating any response occurring faster than the minimum possible response time (200 ms) as an FP, which is similar to the strategy used to detect FPs in the SITA algorithm of the HFA. 9 GV measures FN responses using catch trials, similar to those used in the HFA full-threshold test.
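The reliability indices described above can be sketched as follows. This is a minimal illustration, not the GV implementation: all function names are hypothetical, the FL rate is assumed to be the fraction of catch trials at the selected blind spot that were seen (the classic Heijl-Krakau rule), and the FP rate is assumed to use all responses as its denominator. The text does not say how the 6 extra checks at the non-selected candidate locations (which bring the total to the 21 FL checks reported in the Results) are scored, so they are left out here.

```python
MIN_RESPONSE_MS = 200  # responses faster than this count as false positives (from the text)


def select_blind_spot(demo_seen_counts):
    """Pick the candidate location where the fewest of its 5 demo stimuli were seen.

    demo_seen_counts maps a hypothetical candidate-location label to the number of
    demo stimuli seen there (0-5). Ties go to the first candidate listed.
    """
    return min(demo_seen_counts, key=demo_seen_counts.get)


def fixation_loss_rate(blind_spot_seen):
    """Heijl-Krakau rule (assumed): fraction of blind-spot catch trials that were seen.

    blind_spot_seen holds one boolean per stimulus presented at the selected
    blind-spot location (15 per test in GV).
    """
    return sum(blind_spot_seen) / len(blind_spot_seen)


def false_positive_rate(response_times_ms):
    """Fraction of responses faster than the minimum possible response time.

    Assumption: the denominator is the total number of responses.
    """
    fast = sum(1 for t in response_times_ms if t < MIN_RESPONSE_MS)
    return fast / len(response_times_ms)


# Hypothetical patient: candidate "B" is chosen; 3 of 15 blind-spot checks are seen.
spot = select_blind_spot({"A": 3, "B": 0, "C": 4})
fl = fixation_loss_rate([True] * 3 + [False] * 12)
fp = false_positive_rate([150, 350, 420, 190])
print(spot, fl, fp)  # prints B 0.2 0.5
```

Because the FP rule depends only on response latency, it can flag a response to a real stimulus as an FP, which is one reason it may behave differently from the catch-trial approach of the HFA full-threshold program.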
The luminance on the HFA is measured in cd/m², while the brightness of mobile displays is measured in grayscale (0 corresponds to black and 255 to white). Therefore, grayscale values were converted to luminance (cd/m²) based on the conversion provided by DisplayMate (Samsung S8 Grayscale To Luminance. Available online: www.displaymate.com/Gamma_39.html). On the basis of the ISO standards for perimetry, background luminance is set to 10 cd/m². The maximum possible stimulus luminance (Lmax) depends on the device display: typically, Lmax is 3183 cd/m² for the HFA and 440 cd/m² for the Samsung Galaxy S8. Hence, the stimulus intensity calculated on GV can be converted to a Humphrey-equivalent value by adding a constant, and this Humphrey-equivalent value is reported for standardization and ease of interpretation by clinicians. Note that, because of the maximum brightness of the S8 display, GV can measure a minimum threshold (corresponding to its maximum stimulus intensity) of 8 dB. Hence, these 2 testing modalities are not directly comparable.
Threshold sensitivity is estimated in the same way as in the HFA full-threshold test, using a 4-2 staircase strategy with one reversal. During the thresholding procedure, a correlated neighborhood thresholding algorithm was used to reduce the test duration without compromising accuracy. Similar to SITA, correlated neighborhood thresholding first calculates threshold estimates at certain seed points (2 in each quadrant) using the 4-2 staircase, and then uses the estimate obtained at these points as the starting threshold estimate for each of the neighboring points. 9 Each participant performed both HFA and GV tests within a period of 2 weeks. The 24-2 full-threshold strategy with a Goldmann size III stimulus, which estimates threshold sensitivities at 54 test points, was used for all tests. VFs were considered unreliable if FP or FN response rates were > 33%.
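The decibel conversion and thresholding steps can be illustrated with a short sketch. It assumes that the "constant value" mentioned above is the ratio of the two devices' maximum luminances expressed in decibels, 10·log10(3183/440) ≈ 8.6 dB, which is close to the 8 dB floor quoted above; the exact constant used by GV is not stated. It also assumes an idealized, noise-free observer, a simple nearest-seed rule for choosing a neighbor's starting estimate, and that a location not seen even at the brightest displayable stimulus is returned as None. All function names are hypothetical.

```python
import math

HFA_LMAX = 3183.0  # cd/m^2, maximum HFA stimulus luminance (from the text)
S8_LMAX = 440.0    # cd/m^2, maximum Galaxy S8 luminance (from the text)


def db_attenuation(luminance, lmax):
    """Perimetric decibels: attenuation of a stimulus relative to a device's maximum."""
    return 10 * math.log10(lmax / luminance)


# Assumption: the brightest S8 stimulus expressed on the HFA decibel scale (~8.6 dB).
HFA_OFFSET_DB = db_attenuation(S8_LMAX, HFA_LMAX)


def gv_to_hfa_db(gv_db):
    """Shift a GV-native decibel value onto the Humphrey-equivalent scale."""
    return gv_db + HFA_OFFSET_DB


def staircase_4_2(start_db, sees, min_db=0.0, max_db=40.0):
    """4-2 staircase: 4-dB steps until the response flips, then one reversal
    with 2-dB steps until it flips again. Returns the last seen level in dB,
    or None if nothing was seen before the display limit was reached."""
    level = start_db
    response = sees(level)
    last_seen = level if response else None
    direction = 1 if response else -1  # seen -> dimmer (higher dB); unseen -> brighter
    for step in (4, 2):
        previous = response
        while response == previous:
            candidate = level + direction * step
            if not (min_db <= candidate <= max_db):
                return last_seen  # display limit reached before a crossing
            level = candidate
            response = sees(level)
            if response:
                last_seen = level
        direction = -direction  # reverse once the response has flipped
    return last_seen


def neighbour_start(point, seed_estimates):
    """Hypothetical rule: start a non-seed point at its nearest seed's estimate."""
    nearest = min(seed_estimates,
                  key=lambda s: (s[0] - point[0]) ** 2 + (s[1] - point[1]) ** 2)
    return seed_estimates[nearest]


# Idealized observer whose true threshold is 27 dB: 25 seen, 29 unseen, 27 seen.
print(staircase_4_2(25, lambda db: db <= 27))  # prints 27
```

With min_db set to HFA_OFFSET_DB, a location whose true threshold lies below the display floor returns None, which mirrors the poor agreement expected wherever HFA thresholds fall below about 8 dB.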
For the analysis, VFs were divided into 6 Garway-Heath sectors corresponding to structural regions of the ONH. 10 The test points above and below the blind spot were excluded from the analysis due to the variability in blind spot position.
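The exclusion of the 2 blind-spot points can be made concrete with the standard 24-2 layout (54 locations on a 6° grid offset 3° from the meridians, including a 2-point nasal step). The sketch below assumes a right eye, with the points just above and below the blind spot at (15°, ±3°); the Garway-Heath sector assignment of each point is not reproduced here and would need to be supplied from the published map. Function names are hypothetical.

```python
def grid_24_2_right_eye():
    """Return the 54 (x, y) test locations, in degrees, of a right-eye 24-2 pattern."""
    # x positions per row in the superior hemifield; the inferior rows mirror them.
    rows = {
        21: [-9, -3, 3, 9],
        15: [-15, -9, -3, 3, 9, 15],
        9: [-21, -15, -9, -3, 3, 9, 15, 21],
        3: [-27, -21, -15, -9, -3, 3, 9, 15, 21],  # -27 is the nasal step
    }
    return [(x, sign * y) for y, xs in rows.items() for x in xs for sign in (1, -1)]


# Assumed right-eye locations just above and below the blind spot.
BLIND_SPOT_POINTS = {(15, 3), (15, -3)}


def analysed_points(points, excluded=BLIND_SPOT_POINTS):
    """Drop the 2 blind-spot neighbours, leaving 52 points."""
    return [p for p in points if p not in excluded]


def total_sensitivity(thresholds_db, points):
    """Mean threshold (dB) over the given points; thresholds_db maps (x, y) -> dB."""
    return sum(thresholds_db[p] for p in points) / len(points)


grid = grid_24_2_right_eye()
kept = analysed_points(grid)
print(len(grid), len(kept))  # prints 54 52
```

For a left eye, the x coordinates (and the excluded points) would be mirrored.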
After participants had performed both tests, they were asked to complete a questionnaire about their experience with the devices. As this is a novel device, no previously validated questionnaire was available for the purpose of this study. Therefore, a short, self-administered questionnaire comprising 4 precise, closed-ended multiple-choice questions was designed. The pool of questions was reviewed by qualified experts to ensure that they were accurate and free of item construction problems. It comprised questions concerning discomfort during the test (eg, back, neck, eye). In addition, participants were asked which test (GV, HFA, or no preference) they would prefer if perimetry had to be repeated. During this pilot test, participants initially completed the questionnaire themselves, and the responses were then reviewed by a member of the research staff together with the participant. The final response of every participant to each question was determined after a member of the research staff had clarified each question and its response. Therefore, an initial validation of the questionnaire was performed during the study. The responses to the questionnaire were then compiled and analyzed.

Statistical Analysis
The Shapiro-Wilk test was used to check for normality of distribution. Descriptive statistics included the median and interquartile range for non-normally distributed continuous variables. Reliability parameters and global field parameters of the 2 instruments were compared using the Wilcoxon rank-sum test. Bland and Altman (BA) analysis, which provides the mean difference in measurements between the 2 instruments and the 95% limits of agreement (LoA), was used to assess the agreement between HFA and GV threshold sensitivities. 11 A multivariable linear regression analysis was performed to determine factors associated with the difference in threshold sensitivities between the perimeters. Statistical analyses were performed using commercial software (Stata, version 14.2; StataCorp, College Station, TX). A P-value ≤ 0.05 was considered statistically significant.
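For readers less familiar with BA analysis, the agreement statistics reduce to a few lines. This minimal sketch takes paired per-eye values, computes differences as HFA minus GV (the convention used in the Results), and reports the 95% LoA as the mean difference ± 1.96 standard deviations of the differences. The function name and the example values are illustrative only, not study data.

```python
import statistics


def bland_altman(hfa_db, gv_db):
    """Return (mean difference, lower LoA, upper LoA) for paired measurements."""
    if len(hfa_db) != len(gv_db):
        raise ValueError("measurements must be paired")
    diffs = [h - g for h, g in zip(hfa_db, gv_db)]  # HFA minus GV
    bias = statistics.mean(diffs)                  # systematic difference between devices
    sd = statistics.stdev(diffs)                   # sample SD of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Illustrative (made-up) total sensitivities for 4 eyes, in dB.
bias, lower, upper = bland_altman([28.0, 29.0, 30.0, 27.0], [30.0, 31.0, 31.0, 30.0])
print(f"bias {bias:.2f} dB, 95% LoA {lower:.2f} to {upper:.2f} dB")
# prints: bias -2.00 dB, 95% LoA -3.60 to -0.40 dB
```

A negative bias, as here and in the study, means GV reports higher sensitivities than HFA on average.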

RESULTS
A total of 107 eyes of 54 individuals underwent GV and HFA examinations. Of these, 27 (50%) were women. The mean age of all individuals was 46.4 ± 17.8 years (range: 19 to 83 y). Among these, 34 eyes (31.8%) were diagnosed as glaucoma, 18 (16.8%) as glaucoma suspects, and 55 (51.4%) eyes were healthy controls. The average mean deviation (MD) of the glaucoma group was −13.38 ± 9.6 dB (range: −31.01 to −1.28 dB). Of the 34 glaucomatous eyes that underwent perimetry, 12 had early glaucoma (MD: −6 to 0 dB), 5 had moderate glaucoma (MD: −12 to −6 dB), and 17 had advanced glaucoma (MD < −12 dB). Figure 1 shows an example of the HFA and GV reports for normal and glaucomatous eyes. Table 1 shows the test characteristics of HFA and GV. FL and FP rates were higher with GV than with HFA, but FN rates were comparable between the 2 machines. In addition, the mean number of FL checks was significantly higher in the GV test (21 ± 0) than in the HFA test (18.4 ± 2.5, P < 0.0001). The GV test took, on average, 1 minute longer than the HFA. Thirty-two participants (63 eyes) had prior experience of undergoing HFA examination, while the other 22 participants (44 eyes) were new to VF examination. In all, 30 participants underwent HFA testing followed by the GV test.
FIGURE 1. Case examples of the perimetry reports of the Humphrey field analyzer (HFA) and GearVision for normal and glaucomatous eyes. The raw data of threshold sensitivities and grayscale of the HFA report are shown on the left. The right column shows the threshold sensitivities and grayscale (superimposed) of the GearVision report. The HFA report of a normal eye (A) and the GearVision report of the same eye (B). The HFA report of an eye with moderate glaucoma (C) and the GearVision report of the same patient (D). The HFA report of an eye with advanced glaucoma (E) and the GearVision report of the same patient (F).
Six HFA tests and 13 GV tests were unreliable (P = 0.003, χ² test). Ninety-one eyes had reliable VFs on both HFA and GV. An additional 7 eyes were excluded from the final analysis for other reasons (eg, participants did not follow instructions or fell asleep during either test, the battery of the response clicker discharged during GV, or diabetic retinopathy was noted after dilatation). Therefore, 84 eyes with both reliable HFA and GV tests were used in the subsequent analysis. Of the 23 eyes not included in the study, 8 had never undergone prior VF testing, and each of these was excluded because of unreliability.
Of these 84 eyes of 42 participants, 21 eyes (25%) had glaucoma, 14 eyes (16.67%) were glaucoma suspects, and 49 eyes (58.33%) were normal. The median BCVA of these 84 eyes was 0 (0 to 0) on the LogMAR scale. The median threshold sensitivity of all 52 points was 29.1 dB (26.5 to 30.7 dB) for HFA and 30.6 dB (29.1 to 32.6 dB) for GV (P < 0.001). The median threshold sensitivity in the 6 Garway-Heath sectors is shown in Table 1. A correlation plot of the total sensitivity on HFA and GV is shown in Figure 2. Figure 3 shows the BA agreement analysis between HFA and GV for the 6 Garway-Heath sectors. Differences were calculated as HFA minus GV threshold sensitivity. The mean difference (95% LoA) in total threshold sensitivity between HFA and GV was −1.9 dB (−5 to 1 dB). Figure 4 shows the BA plot for the agreement in total threshold sensitivities (average of all 52 points) between HFA and GV. A multivariable linear regression analysis showed that age (r = 0.018, P = 0.148), sex (r = 0.04, P = 0.90), prior experience with HFA, and order of the perimetry tests (HFA first or GV first, r = 0.24, P = 0.48) did not affect the difference in threshold sensitivities between HFA and GV. The difference in threshold sensitivities between HFA and GV depended only on the MD of the HFA (r = 0.09, P = 0.008). This implies that for every 1 dB reduction in MD (increased glaucoma severity), the difference in threshold sensitivities between the 2 perimeters increased by ∼0.1 dB. Figure 5 displays these data graphically: the agreement between the devices was worse (greater difference in threshold sensitivities between the 2 perimeters) in eyes with more severe VF damage (lower MD). An additional analysis compared the agreement between HFA and GV at each point for every eye (52 points × 84 eyes), as shown in Figure 6. This also showed that the LoA were narrower at threshold sensitivities in the normal range and wider at points of low threshold sensitivity.
The responses to the questionnaire of all 54 individuals were analyzed. When asked which test resulted in more body discomfort (to the back and neck), 47.7% said HFA, 11.2% said GV, and 41.1% said both tests were similar. When asked which test resulted in more eye discomfort (eg, watering, burning), 59.8% said HFA, 14% said GV, and 26.2% said both tests were similar. If perimetry needed to be repeated, 68.2% preferred to perform the GV, 20.6% preferred the HFA, and 11.2% had no preference between the tests. There was no significant difference (P = 0.61) in the mean age of participants who preferred HFA (49.4 ± 18.5 y) and those who preferred GV (47.2 ± 17.0 y). Previous experience with the HFA also did not affect whether participants opted to repeat perimetry with the HFA or GV perimeters (Pearson χ² = 4.18; P = 0.12).

DISCUSSION
In this study, we compared the performance of the GV perimeter against the current standard for VF examination (HFA). GV had a significantly higher median FL rate than HFA (20% vs. 6%). This difference could arise because GV uses only blind spot monitoring, while the HFA uses blind spot monitoring together with a gaze tracker. It is important to note that the current study did not apply a cutoff for FL as part of its reliability criteria, because the literature has shown that in eyes with glaucoma, FL have a limited impact on reliability. [12][13][14] Even in healthy individuals, an incorrect diagnosis of glaucoma on VFs was associated with the FN response rate (odds ratio = 1.36, 95% confidence interval: 1.25-1.48, P < 0.001) but did not appear to be associated with FLs (odds ratio = 0.96; 95% confidence interval: 0.90-1.03, P = 0.30). 15 Analyzing the reliability indices, we found that FP response rates were slightly higher with GV than with HFA, while FN response rates were comparable between the 2 tests. This is probably because the method of FN detection was the same for both tests, whereas FPs were calculated differently for GV. In a traditional HFA full-threshold program, the test periodically withholds a stimulus at a time when one may be expected and records any response to the absent stimulus. In GV, FPs were estimated from responses occurring faster than the minimum possible response time (200 ms), which is similar to the SITA strategy. This method of FP detection appears more sensitive and, hence, may need some adjustment.
The threshold sensitivities with GV were statistically significantly higher (by 1 to 2 dB) than with HFA. This may be because the nontested eye is not occluded during the GV test, although each eye is tested separately. Other studies have shown that monocular sensitivities measured without occlusion of the nontested eye are significantly higher than sensitivities measured with fellow-eye occlusion. 16 This is because occlusion of the nontested eye may cause "blankout" due to the 2 eyes experiencing different levels of illumination simultaneously. Studies have found that the eye tested second progressively dark adapts under an opaque occluder, and the increasing difference in adaptation states between the eyes elevates the thresholds once the second eye is tested. This effect has been shown to be reduced with the use of a translucent patch for occlusion. 17 In contrast, not occluding the nontested eye can activate binocular interaction, resulting in binocular summation and higher sensitivities. Hence, the slightly higher threshold sensitivities on GV may be due to the more physiological manner of testing, and this may be a potential advantage of all perimetric systems that do not use occlusion. It is also important to note that the difference in threshold sensitivities between HFA and GV was not uniform across all threshold sensitivities. In eyes with more advanced glaucoma (lower MD), the difference in threshold sensitivities was greater, suggesting that some proportional bias may exist. This increased difference could be a result of the higher fluctuation seen in eyes with greater glaucoma severity. 18 It may also arise because GV can measure thresholds only down to 8 dB (its maximum stimulus intensity); areas with threshold values below this will show poor agreement between the devices.
There is a limited amount of literature on other HMD perimeters. [3][4][5][6][7] One of the earliest devices was the Kasha (2000), which used a suprathreshold test and has been compared with a bedside confrontation field test in neurosurgery patients. 6 In 2014, an eye-tracking perimeter (VirtualEye) that provided a 24-2 full-threshold test was described. 5 When compared with the HFA, its authors, too, found more unreliable tests with the VirtualEye (16 eyes) than with the HFA (5 eyes). In a point-wise agreement analysis, they found a systematic shift of around 5 dB in the sensitivities of the VirtualEye with respect to the corresponding HFA values. 5 One possible explanation for this difference could be that they compared their full-threshold program with the HFA SITA program (SITA standard or SITA Fast); average threshold sensitivity is 1.31 dB higher with SITA standard than with the Humphrey full-threshold program, and this difference increases to ∼3 dB for midrange sensitivities. 19,20 Other possible reasons for this shift toward lower sensitivities could be differences in display technology between the Humphrey system and their HMD. 5 Another extensively studied device is the imo (CREWT Medical Systems, Tokyo, Japan), which has a full-threshold test and a modified Zippy Estimation by Sequential Testing (ZEST) algorithm called AIZE (Ambient Interactive ZEST). 3,4 The agreement between imo (24plus 1-2 AIZE) and HFA (30-2 SITA standard) for MD showed a mean difference of −0.65 dB (−6.49 to 5.19 dB) for the right eye and 0.56 dB (−4.70 to 5.81 dB) for the left eye. Although the published analyses (correlations and BA agreement analysis) of global indices (MD and VF index) between imo and HFA are promising, a more detailed evaluation of point-wise/sector-wise data is not yet available. 3,4 This detailed, regional evaluation is important because field defects in glaucoma are often localized to sectors corresponding to retinal nerve fiber layer defects, and global indices do not completely reflect this information. The imo also has a pupil-tracking function; hence, FLs on this instrument were similar to those on the HFA. 3,4 However, adding a gaze tracker brings the drawback of increased weight; several elderly patients could find this 1.8 kg HMD too heavy. Hence, the inventors devised an i-F version in which the HMD is attached to a pillar and used as a regular perimeter, which defeats the purpose of a truly mobile perimeter. The newest HMD is the CFA, which provides a suprathreshold screening test and retails for USD 6000. 7 However, it was found to be a poor approximator of HFA results: participants with an 18 dB or worse deficit on HFA were found to have a corresponding scotoma on CFA only 38% of the time. 7
On evaluating the agreement between HFA and GV for threshold sensitivities, we found that the mean difference in sensitivities was small (−1.9 dB) and the LoA were fairly narrow, given that this is a subjective test with considerable fluctuation. These results are therefore encouraging, and the potential uses of GV are extensive. It could be used to evaluate the VFs of sick patients (bed-ridden or wheelchair-bound) for whom going to a hospital is cumbersome. It could also be used in physically healthy individuals who live in remote areas where transport to tertiary eye care centers is difficult. Another potential use, unique to this device, is "home perimetry." This is because most other portable devices are gadgets that can be used for perimetry alone and are still too expensive for individual purchase. GV is essentially a smartphone app, and the only additional hardware required is the GearVR HMD, which costs ∼INR 8200 or USD 107.
Given the social distancing norms and lockdowns in place worldwide due to the pandemic caused by the novel coronavirus (COVID-19), this device shows great promise. The pandemic has resulted in an increase in telemedicine consultations, and for a glaucoma specialist, the report of a perimetry test done by the patient at home could provide valuable additional information.
The patient questionnaire was useful in understanding the initial acceptance of the GV device. Almost 70% of patients preferred the GV test if required to repeat perimetry, despite this test being 1 minute longer than the HFA. This may be attributed to the postural discomfort in the neck and back reported during HFA by ∼50% of our study participants. In fact, as the GV test can be performed in any position convenient to the patient, it is advantageous in patients with diseases of the spine, such as ankylosing spondylitis, who cannot rest their head on the chin rest. In addition, almost 60% of our participants reported ocular discomfort (watering, burning) during the HFA, which may be attributed to the use of a patch to occlude the nontested eye (this occlusion is not required in GV). Although participants in a study often feel obliged to respond positively about a novel device, their views should be considered alongside the detailed, qualitative evaluation of patients' views on VF testing performed by Glen et al 2 using focus groups. They found that patients did not enjoy performing a VF test. Some of the most frequent reasons cited by patients included travel to the clinic, waiting times, and the testing environment. GV could address all of these issues and hence could be a real game changer in the long-term management of glaucoma patients.
One of the strengths of this study was the inclusion of perimetry-naïve participants. More than 40% of the participants had never undergone a VF test before the study. All the previous HMDs have been tested on experienced HFA test takers, which may have biased their results through selection. The main limitation of the study was that it compared the GV perimetry system with the HFA full-threshold strategy, which has largely been replaced by the SITA strategies in clinical practice. This was done because the current GV program uses the 4-2 bracketing strategy to determine threshold sensitivities, which is similar to the HFA full-threshold test, so that like was compared with like. SITA strategy models are based on distributions of normal threshold levels obtained using the original full-threshold program. Therefore, this is an indispensable first step in the evolving GV technology, and the encouraging initial results will allow us to generate a normative database. These data can lead to further development and validation of test strategies with a shorter test duration, as longer tests increase the chances of fatigue and can affect reliability. 13 Another limitation of the study was the inclusion of both eyes per participant. Also, as the number of eyes in each subgroup (glaucoma, glaucoma suspect, and controls) was small, a subgroup analysis was not performed.