A large body of research has demonstrated the critical role of audibility below 8 kHz for speech recognition in listeners with and without hearing loss, guiding the development of communication devices such as hearing aids. Omitting EHFs (frequencies above 8 kHz) appears to have essentially no detrimental effect on speech perception across a range of laboratory and clinical test conditions. However, our team recently demonstrated that EHFs are indeed useful for speech-in-speech listening when experimental conditions more closely emulate real-world listening environments.
Our interest in EHF audibility began from a theoretical standpoint: Biological resources are dedicated to supporting auditory processing at EHFs, so they are likely to provide functional benefits. Furthermore, many aspects of the human auditory system are tuned to the human vocal mechanism, likely because human vocalizations (i.e., speech) are among the most ecologically important acoustic signals for human beings. Although many speech cues are restricted to the low frequencies (e.g., vowel formants), others are high frequency in nature. The most striking examples are the fricative consonants (e.g., /s/ and /sh/), which are characterized by bands of energy that extend well above 8 kHz. Based on these observations, we hypothesized that audibility of cues at EHFs facilitates speech perception in realistic listening situations.
Our approach was to consider how EHFs might be useful in real-world listening environments. Consider a multitalker cocktail party scenario wherein the listener must recognize speech from a target talker in the context of other background talkers. In natural environments like this, the target talker is typically facing the listener, whereas the background talkers are not: they are facing other communication partners (Fig. 1A). EHF energy produced during speech is primarily emitted in front of a talker,9,10 whereas low-frequency energy radiates nearly omnidirectionally around a talker. This principle of speech acoustics means that a talker who is facing you emits acoustic energy toward you at all frequencies, but talkers who are not facing you emit only low- and mid-frequency energy in your direction (see Fig. 1A). These spectral differences, which depend on each talker's head orientation, could help a listener recognize speech in a face-to-face conversation: whereas the low and mid frequencies from the target talker may be masked by background speech, EHFs from the target talker are masked very little, if at all. Armed with this understanding, we hypothesized that EHFs would benefit speech recognition when the target talker is facing the listener but the masker talkers are facing away (as in Fig. 1A).
In a recent experiment, we measured sentence recognition in the presence of two background masker talkers.11 Stimuli were recorded from three women: a target talker with a microphone at 0° (directly in front) and two masker talkers with microphones located at 45° or 60° (to the side). This allowed us to simulate a scenario in which the target talker faces the listener and the maskers face away. An adaptive procedure was used to estimate the speech reception threshold (SRT), defined as the target-to-masker ratio at which the listener could recognize 50 percent of the target speech. Listeners were young adults screened to have normal hearing, including good sensitivity up to 16 kHz. To assess the utility of EHF speech cues, each listener recognized target speech in two experimental conditions: with access to EHFs (i.e., full-bandwidth speech) and without access to EHFs (i.e., speech low-pass filtered at 8 kHz). Listeners achieved significantly lower (better) thresholds in the full-bandwidth condition than in the low-pass-filtered condition; in other words, access to EHF energy improved their performance. The benefit was 1.6 dB for the 45° masker and 2.5 dB for the 60° masker (Fig. 1B). These differences in SRT correspond to improvements of 12 and 17 percentage points, respectively.
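The low-pass-filtered condition in a paradigm like this can be sketched in a few lines of signal-processing code. The sketch below removes energy above 8 kHz from a waveform; the specific filter design (an eighth-order zero-phase Butterworth filter) is an illustrative assumption, not the exact filter used in the study.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def remove_ehf(signal, fs, cutoff_hz=8000.0, order=8):
    """Low-pass filter a waveform to remove extended high-frequency
    (EHF) energy above cutoff_hz.

    A zero-phase Butterworth design is used here for illustration;
    the original study's exact filter characteristics are not
    specified in this summary.
    """
    sos = butter(order, cutoff_hz, btype="low", fs=fs, output="sos")
    return sosfiltfilt(sos, signal)

# Demonstration with broadband noise sampled at 44.1 kHz
fs = 44100
rng = np.random.default_rng(0)
x = rng.standard_normal(fs)  # 1 s of broadband noise
y = remove_ehf(x, fs)

# Compare spectral energy before and after filtering
spec_x = np.abs(np.fft.rfft(x))
spec_y = np.abs(np.fft.rfft(y))
freqs = np.fft.rfftfreq(len(x), 1 / fs)
ehf_ratio = spec_y[freqs > 10000].sum() / spec_x[freqs > 10000].sum()
print(ehf_ratio)  # energy well above the cutoff is nearly eliminated
```

A listener hearing `y` instead of `x` would retain all low- and mid-frequency cues but lose the EHF band, which is exactly the contrast between the two experimental conditions.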
Data from this study indicate that EHF cues can contribute to speech perception under natural listening conditions for listeners with normal hearing. These results have clear implications for the design of hearing aids, cell phones, and other communication systems that do not provide EHF cues. The opportunity to present audible EHF cues using these devices is particularly pertinent due to recent advances in technology that support a wider bandwidth of signal transmission and better feedback management. Results of this study could also motivate the inclusion of EHF threshold testing in standard clinical assessment. Speech-in-speech recognition is known to be an important component of functional communication,12 and any clinical measure that predicts this ability provides important information to guide intervention. Furthermore, our results highlight the need for more realistic speech stimuli recorded at sampling rates of at least 44.1 kHz with high fidelity microphones to faithfully represent EHFs. Finally, speech materials recorded at different positions in space around the talker's head could be used to simulate different real-world listening environments, providing more useful diagnostic information about functional hearing abilities than current clinical test materials.
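The sampling-rate point can be made concrete with a short calculation: a sampled signal can represent frequencies only up to half its sampling rate (the Nyquist frequency), so any recording or transmission chain running at 16 kHz or below cannot carry EHF content at all. The device rates below are illustrative assumptions, not specifications of any particular product or standard.

```python
# Nyquist frequency: the highest frequency a sampled signal can
# represent is half the sampling rate.
def nyquist_hz(fs_hz: float) -> float:
    return fs_hz / 2.0

# Illustrative sampling rates (assumed typical values only)
rates = {
    "narrowband telephony": 8_000,
    "wideband telephony": 16_000,
    "CD-quality recording": 44_100,
    "studio recording": 96_000,
}

EHF_LOWER_EDGE_HZ = 8_000  # EHFs are frequencies above 8 kHz

for label, fs in rates.items():
    ny = nyquist_hz(fs)
    print(f"{label:22s} fs = {fs:6d} Hz -> Nyquist = {ny:8.0f} Hz, "
          f"can carry EHF energy: {ny > EHF_LOWER_EDGE_HZ}")
```

This is why the text calls for stimuli recorded at 44.1 kHz or higher: a 44.1-kHz recording can represent content up to 22.05 kHz, spanning the full EHF range, whereas telephony-style rates cannot.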
1. Stiepan, S., Siegel, J., Lee, J., Souza, P., and Dhar, S., The association between physiological noise levels and speech understanding in noise. Ear Hear.
2. Stelmachowicz, P.G., Beauchaine, K.A., Kalberer, A., Kelly, W.J., and Jesteadt, W., High-frequency audiometry: test reliability and procedural considerations. J Acoust Soc Am, 1989. 85(2): 879-87.
3. Monson, B.B., Hunter, E.J., Lotto, A.J., and Story, B.H., The perceptual significance of high-frequency energy in the human voice. Front Psychol, 2014. 5: 587.
4. Moore, B.C., and Tan, C.T., Perceived naturalness of spectrally distorted speech and music. J Acoust Soc Am, 2003. 114(1): 408-19.
5. Lippmann, R.P., Accurate consonant perception without mid-frequency speech energy. IEEE Trans Speech Audio Process, 1996. 4(1): 66-69.
6. Vitela, A.D., Monson, B.B., and Lotto, A.J., Phoneme categorization relying solely on high-frequency energy. J Acoust Soc Am, 2015. 137(1): EL65-70.
7. Berlin, C.I., Wexler, K.F., Jerger, J.F., Halperin, H.R., and Smith, S., Superior ultra-audiometric hearing: a new type of hearing loss which correlates highly with unusually good speech in the “profoundly deaf”. Otolaryngol, 1978. 86: ORL-111-6.
8. Pittman, A.L., and Stelmachowicz, P.G., Hearing loss in children and adults: audiometric configuration, asymmetry, and progression. Ear Hear, 2003. 24(3): 198-205.
9. Monson, B.B., Lotto, A.J., and Story, B.H., Analysis of high-frequency energy in long-term average spectra of singing, speech, and voiceless fricatives. J Acoust Soc Am, 2012. 132(3): 1754-64.
10. Kocon, P., and Monson, B.B., Horizontal directivity patterns differ between vowels extracted from running speech. J Acoust Soc Am, 2018. 144(1): EL7.
11. Monson, B.B., Rock, J., Schulz, A., Hoffman, E., and Buss, E., Ecological cocktail party listening reveals the utility of extended high-frequency hearing. Hear Res, 2019. 381: 107773.
12. Phatak, S.A., Sheffield, B.M., Brungart, D.S., and Grant, K.W., Development of a test battery for evaluating speech perception in complex listening environments: effects of sensorineural hearing loss. Ear Hear, 2018. 39(3): 449-456.