In noisy environments, listeners benefit from both hearing and seeing a talker, demonstrating audiovisual (AV) cues enhance speech-in-noise (SIN) recognition. Here, we examined the relative contribution of auditory and visual cues to SIN perception and the strategies used by listeners to decipher speech in noise interference(s).
Normal-hearing listeners (n = 22) performed an open-set speech recognition task while viewing audiovisual TIMIT sentences presented under different combinations of signal degradation including visual (AVn
), audio (An
V), or multimodal (An
) noise. Acoustic and visual noises were matched in physical signal-to-noise ratio. Eyetracking
monitored participants’ gaze to different parts of a talker’s face during SIN perception.
As expected, behavioral performance for clean sentence recognition was better for A-only and AV compared to V-only speech. Similarly, with noise in the auditory channel (An
V and An
speech), performance was aided by the addition of visual cues of the talker regardless of whether the visual channel contained noise, confirming a multimodal benefit to SIN recognition. The addition of visual noise (AVn
) obscuring the talker’s face had little effect on speech recognition by itself. Listeners’ eye gaze fixations were biased toward the eyes (decreased at the mouth) whenever the auditory channel was compromised. Fixating on the eyes was negatively associated with SIN recognition performance. Eye gazes on the mouth versus eyes of the face also depended on the gender of the talker.
Collectively, results suggest listeners (1) depend heavily on the auditory over visual channel when seeing and hearing speech and (2) alter their visual strategy from viewing the mouth to viewing the eyes of a talker with signal degradations, which negatively affects speech perception.