Efficient multisensory speech detection is critical for children who must quickly detect/encode a rapid stream of speech to participate in conversations and have access to the audiovisual cues that underpin speech and language development, yet multisensory speech detection remains understudied in children with hearing loss (CHL). This research assessed detection, along with vigilant/goal-directed attention, for multisensory versus unisensory speech in CHL versus children with normal hearing (CNH).
Participants were 60 CHL who used hearing aids and communicated successfully aurally/orally and 60 age-matched CNH. Simple response times determined how quickly children could detect a preidentified easy-to-hear stimulus (70 dB SPL, utterance “buh” presented in auditory only [A], visual only [V], or audiovisual [AV] modes). The V mode formed two facial conditions: static versus dynamic face. Faster detection for multisensory (AV) than unisensory (A or V) input indicates multisensory facilitation. We assessed mean responses and faster versus slower responses (defined by first versus third quartiles of response-time distributions), which were respectively conceptualized as: faster responses (first quartile) reflect efficient detection with efficient vigilant/goal-directed attention and slower responses (third quartile) reflect less efficient detection associated with attentional lapses. Finally, we studied associations between these results and personal characteristics of CHL.
Unisensory A versus V modes: Both groups showed better detection and attention for A than V input. The A input more readily captured children’s attention and minimized attentional lapses, which supports A-bound processing even by CHL who were processing low fidelity A input. CNH and CHL did not differ in ability to detect A input at conversational speech level. Multisensory AV versus A modes: Both groups showed better detection and attention for AV than A input. The advantage for AV input was facial effect (both static and dynamic faces), a pattern suggesting that communication is a social interaction that is more than just words. Attention did not differ between groups; detection was faster in CHL than CNH for AV input, but not for A input. Associations between personal characteristics/degree of hearing loss of CHL and results: CHL with greatest deficits in detection of V input had poorest word recognition skills and CHL with greatest reduction of attentional lapses from AV input had poorest vocabulary skills. Both outcomes are consistent with the idea that CHL who are processing low fidelity A input depend disproportionately on V and AV input to learn to identify words and associate them with concepts. As CHL aged, attention to V input improved. Degree of HL did not influence results.
Understanding speech—a daily challenge for CHL—is a complex task that demands efficient detection of and attention to AV speech cues. Our results support the clinical importance of multisensory approaches to understand and advance spoken communication by CHL.