The “visually guided hearing aid” (VGHA), consisting of a beamforming microphone array steered by eye gaze, is an experimental device being tested for effectiveness in laboratory settings. Previous studies have found that beamforming without visual steering can provide significant benefits (relative to natural binaural listening) for speech identification in spatialized speech or noise maskers when sound sources are fixed in location. The aim of the present study was to evaluate the performance of the VGHA in listening conditions in which target speech could switch locations unpredictably, requiring visual steering of the beamforming. To address this aim, the present study tested an experimental simulation of the VGHA in a newly designed dynamic auditory-visual word congruence task.
Ten young normal-hearing (NH) and 11 young hearing-impaired (HI) adults participated. On each trial, three simultaneous spoken words were presented from three source positions (−30, 0, and 30o azimuth). An auditory-visual word congruence task was used in which participants indicated whether there was a match between the word printed on a screen at a location corresponding to the target source and the spoken target word presented acoustically from that location. Performance was compared for a natural binaural condition (stimuli presented using impulse responses measured on KEMAR), a simulated VGHA condition (BEAM), and a hybrid condition that combined lowpass-filtered KEMAR and highpass-filtered BEAM information (BEAMAR). In some blocks, the target remained fixed at one location across trials, and in other blocks, the target could transition in location between one trial and the next with a fixed but low probability.
Large individual variability in performance was observed. There were significant benefits for the hybrid BEAMAR condition relative to the KEMAR condition on average for both NH and HI groups when the targets were fixed. Although not apparent in the averaged data, some individuals showed BEAM benefits relative to KEMAR. Under dynamic conditions, BEAM and BEAMAR performance dropped significantly immediately following a target location transition. However, performance recovered by the second word in the sequence and was sustained until the next transition.
When performance was assessed using an auditory-visual word congruence task, the benefits of beamforming reported previously were generally preserved under dynamic conditions in which the target source could move unpredictably from one location to another (i.e., performance recovered rapidly following source transitions) while the observer steered the beamforming via eye gaze, for both young NH and young HI groups.
From the 1Department of Speech, Language & Hearing Sciences, Boston University, Boston, MA
2EarLens Corporation, Menlo Park, CA.
Received April 25, 2017; accepted October 22, 2017.
This work was supported by a grant from NIH/NIDCD (to G.K.) and by a grant from DoD/AFOSR (to G.K.).
The authors report no conflicts of interest.
Address for correspondence: Elin Roverud, Department of Speech, Language & Hearing Sciences, Boston University, 635 Commonwealth Avenue, Boston, MA 02215. E-mail: email@example.com.