Since December 2019, the COVID-19 pandemic originating in Wuhan, China, has spread all over the world (1). After a decline in new COVID-19 cases over the summer of 2020, several waves have hit many countries since fall 2020 (2,3). With growing knowledge of the new coronavirus and its routes of transmission, wearing a face mask and additional personal protective equipment has been shown to reduce the risk of infection (4).
Factors such as air pollution and various virus outbreaks had made face masks part of everyday life in many Asian countries, to protect oneself and others, even before the COVID-19 pandemic (5–7). This practice has now been adopted worldwide, and wearing a face mask on many occasions has become part of the daily routine.
Nevertheless, wearing face masks also has disadvantages. Besides a certain discomfort and minor potential health risks of long-term wearing (e.g., de novo headaches) (8), face masks strongly interfere with communication. Acoustic attenuation (9–12), loss of facial expressions (13), and altered speech memory (14) have been identified as relevant factors. Moreover, the loss of the ability to speechread, that is, to understand speech by using visual cues from the talker, is of great importance (15). Most established speech intelligibility tests use audio-only stimuli to examine the auditory system. A clinically well-established speech intelligibility test is the matrix sentence test, which is available in different languages (16). Recently, an audiovisual version of the German matrix test (Oldenburger Satztest, OLSA) has been introduced that incorporates a speechreading component (17). To this end, the original audio-only OLSA was supplemented with video recordings of the talker (Supplemental Figure S1, https://links.lww.com/MAO/B403). This is a valuable and necessary addition, since it is well known that hearing-impaired individuals are especially reliant on speechreading (18–20). The importance of this aspect is underlined by the fact that several countries (e.g., GB (21), AUS (22), CAN (23)) have adopted rules for face mask wearing when communicating with hearing-impaired individuals, which allow the mask to be removed as long as certain hygiene measures are followed. In everyday life, however, even normal-hearing individuals have difficulties understanding interlocutors who wear a face mask. So far, the acoustic attenuation of the masks has been considered the main mechanism behind this phenomenon (10–12).
In order to understand the repercussions of wearing a mask during communication, we tested speech reception in five different conditions with normal-hearing participants: two control conditions (audio-only and audiovisual), an audiovisual condition with simulated mask and unaltered audio, and two audiovisual conditions with a simulated mask and filtered audio (medical and cloth mask).
We show here that the majority of normal-hearing individuals can use speechreading for speech comprehension and that its absence is a major contributor to impaired speech comprehension when listening to individuals wearing face masks.
MATERIALS AND METHODS
The experimental protocol was approved by the institutional review board (Medizinische Ethikkommission) of the University of Oldenburg and all experiments were performed in accordance with the relevant guidelines and regulations. Informed consent was obtained from all participants. Informed consent for publication of videos and video stills was obtained from the female speaker of the audiovisual German Matrix test.
Fifteen normal-hearing native German speakers (8 female, 7 male) aged between 22 and 42 years (mean age: 30.6 years) participated in the study. Standard clinical audiometric tests (pure-tone thresholds, digits in quiet, speech intelligibility in noise) were performed. The pure-tone average at 0.5, 1, 2, and 4 kHz (PTA4) did not exceed 10 dB HL in either ear for any participant. Thresholds of 50% intelligibility for digits in quiet, measured with the Freiburg digit test, were 0 dB HL or better. Speech reception thresholds of 50% intelligibility in noise, measured with the male Oldenburg sentence test (24), ranged between −5.6 and −8.6 dB SNR with an average of −6.8 ± 0.9 dB SNR.
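As a small illustration of the inclusion criterion, the PTA4 is the arithmetic mean of the pure-tone thresholds at the four frequencies, evaluated separately for each ear. The sketch below assumes thresholds are given per ear as a mapping from frequency (Hz) to dB HL; the function names and the example thresholds are hypothetical and are not study data.

```python
# Frequencies (Hz) entering the four-frequency pure-tone average (PTA4).
PTA4_FREQUENCIES_HZ = (500, 1000, 2000, 4000)


def pta4(thresholds_db_hl):
    """Return the mean threshold (dB HL) at 0.5, 1, 2 and 4 kHz for one ear."""
    return sum(thresholds_db_hl[f] for f in PTA4_FREQUENCIES_HZ) / len(PTA4_FREQUENCIES_HZ)


def meets_pta4_criterion(left, right, limit_db_hl=10.0):
    """True if the PTA4 does not exceed the limit in either ear."""
    return pta4(left) <= limit_db_hl and pta4(right) <= limit_db_hl


# Hypothetical thresholds for illustration only (not study data).
# Frequencies outside the PTA4 set (250 and 8000 Hz) are simply ignored.
left_ear = {250: 5, 500: 5, 1000: 10, 2000: 5, 4000: 10, 8000: 15}
right_ear = {500: 0, 1000: 5, 2000: 5, 4000: 10}
print(pta4(left_ear), pta4(right_ear), meets_pta4_criterion(left_ear, right_ear))
# prints: 7.5 5.0 True
```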
The audiovisual version of the female German Matrix test was used as previously described by Llorach et al. (17). This version uses the audio material of the female German Matrix test (25,26) and video recordings of the talker's head (for details see Llorach et al. (17)).
An audiovisual mask condition was simulated by superimposing a mask-shaped object on the talker's mouth in the video (Supplemental Figure S1, https://links.lww.com/MAO/B403). In addition, the audio signal was filtered according to the attenuation patterns of a handmade two-layer cotton cloth mask and a medical mask type IIR (EN 14683). The filter parameters were determined as follows: a female speaker was recorded speaking 30 sentences in each of three conditions (without mask, with cloth mask, with medical mask type IIR). The recorded sentences contained the complete base word matrix of the German matrix test three times; they were cut into individual sentences and equalized in RMS level. The filter was based on the difference in third-octave spectra between speech produced without a mask and speech produced with the respective mask type. Figure 1 shows the average spectral differences between each mask condition and the no-mask condition for the recorded talker, including the SD at sentence level. The spectral differences of the cloth mask in this study were very similar to those described by Corey et al. (11) (see the reprinted curve of Corey et al. in Fig. 1). The spectral attenuation of the medical mask (type II) in Corey et al. (11) was slightly smaller at high frequencies than that observed in the current study. This might be due to the different types of medical masks used (type IIR in the current study vs. type II in the study of Corey et al.).
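The filter derivation described above can be sketched as follows. This is a minimal, stdlib-only illustration under stated assumptions: the third-octave band levels (in dB) of each sentence are assumed to have been computed already, after the RMS equalization, and the masked and unmasked recordings are assumed to be pairable sentence by sentence. The band analysis itself (e.g., a third-octave filter bank) is not shown, and all function names are hypothetical. Positive values mean that the mask attenuates the band.

```python
import statistics


def third_octave_centers(n_min, n_max):
    """Exact base-10 third-octave centre frequencies, 1000 * 10**(n/10) Hz.

    Nominal labels such as 1250 Hz or 8000 Hz are rounded versions of these.
    """
    return [1000.0 * 10 ** (n / 10) for n in range(n_min, n_max + 1)]


def band_attenuation(unmasked_db, masked_db):
    """Mean and SD (across sentences) of the attenuation in each band, in dB.

    unmasked_db[i][b] and masked_db[i][b] are the levels of sentence i in
    band b without and with the mask. Pairing sentence i across the two
    conditions is an assumption of this sketch.
    """
    means, sds = [], []
    for b in range(len(unmasked_db[0])):
        diffs = [u[b] - m[b] for u, m in zip(unmasked_db, masked_db)]
        means.append(statistics.mean(diffs))
        sds.append(statistics.stdev(diffs) if len(diffs) > 1 else 0.0)
    return means, sds


def attenuation_to_gain(attenuation_db):
    """Linear amplitude factors a filter would apply to mimic the attenuation."""
    return [10 ** (-a / 20) for a in attenuation_db]


# Hypothetical levels for two sentences in two bands (e.g., 1 kHz and 8 kHz).
unmasked = [[60.0, 50.0], [62.0, 52.0]]
masked = [[59.0, 44.0], [61.0, 48.0]]
mean_att, sd_att = band_attenuation(unmasked, masked)
# mean_att == [1.0, 5.0]; sd_att == [0.0, about 1.41]
gains = attenuation_to_gain(mean_att)
```

The per-band mean corresponds to the curves in Fig. 1 and the per-band SD to the sentence-level spread; a real implementation would then design a filter (e.g., FIR) whose magnitude response follows these gains.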
The listener was seated in a sound-treated examination room in front of a loudspeaker (8030C studio monitor, Genelec, Iisalmi, Finland) and a 23.8-inch screen (P2419H, DELL GmbH, Frankfurt, Germany). Screen and loudspeaker were placed 80 cm in front of the seated participant. The size of the head on the screen matched that of a real head at a distance of 1.3 m, representing a typical communication distance. The loudspeaker was adjusted to the ear height of an average listener. The experiments were programmed in MATLAB 2018 (The MathWorks Inc., Natick, Massachusetts, USA), and stimuli were reproduced using VLC media player 3.0.3 (videolan.org, GNU General Public License). The acoustic signal was routed through a sound card (Fireface UC, RME Audio AG, Haimhausen, Germany) to the loudspeaker. Acoustic signals were calibrated to a level of 85 dB SPL using a sound level meter (322A, PCE Deutschland GmbH, Meschede, Germany) placed at the listener's head position. Audio-video synchrony was calibrated using an external camera as described by Llorach et al. (17). The sentences were presented in the stationary test-specific noise. Participants were instructed using an instruction sheet. The noise level was fixed at 65 dB SPL. The speech level started at 60 dB SPL (i.e., an SNR of −5 dB) and was adjusted after each sentence according to the participant's response to converge on 80% intelligibility. The SRT80% was determined instead of the more usual SRT50%, since some individuals can understand 50% of the words by speechreading alone (i.e., independent of the acoustic signal), which would make the SRT50% undeterminable. For each condition, 20 sentences were presented in an open-set response format, that is, participants were asked to repeat the words they understood; guessing was permitted. The investigator then scored the number of correctly repeated words for each sentence. Participants were trained with two lists of 20 sentences in the audiovisual condition.
Afterwards, the following five conditions were measured in random order:
- 1. Audio only
- 2. Audiovisual
- 3. Audiovisual with simulated mask
- 4. Audiovisual with simulated mask and cloth mask audio
- 5. Audiovisual with simulated mask and medical mask audio
A total of 140 sentences were presented to each participant: 40 for training and 100 for the test conditions.
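The text does not specify the rule used to adjust the speech level toward 80% intelligibility. The following sketch shows one simple, hypothetical way such an adaptive rule could work: after each five-word matrix sentence, the speech level is lowered if the proportion of correct words exceeded the 80% target and raised if it fell below, by the deviation divided by an assumed psychometric slope. The slope and step factor are illustrative assumptions, and the clinical OLSA software uses its own procedure, which this sketch does not reproduce. Only the fixed 65 dB SPL noise level and the 60 dB SPL starting level are taken from the text.

```python
NOISE_LEVEL_DB_SPL = 65.0         # fixed noise level (from the text)
START_SPEECH_LEVEL_DB_SPL = 60.0  # initial speech level (from the text)
WORDS_PER_SENTENCE = 5            # matrix sentences consist of five words
TARGET = 0.8                      # target intelligibility (SRT80%)


def snr_db(speech_level_db_spl, noise_level_db_spl=NOISE_LEVEL_DB_SPL):
    """Signal-to-noise ratio in dB: speech level minus noise level."""
    return speech_level_db_spl - noise_level_db_spl


def next_speech_level(level_db_spl, words_correct, slope_per_db=0.1, step_factor=1.0):
    """Hypothetical proportional update toward 80% words correct.

    slope_per_db and step_factor are illustrative assumptions, not OLSA values.
    Scoring above the target lowers the speech level (harder);
    scoring below the target raises it (easier).
    """
    proportion = words_correct / WORDS_PER_SENTENCE
    return level_db_spl - step_factor * (proportion - TARGET) / slope_per_db


# Hypothetical responses (words correct per sentence), for illustration only.
level = START_SPEECH_LEVEL_DB_SPL
for correct in [5, 5, 4, 3, 5]:
    level = next_speech_level(level, correct)
print(round(level, 1), round(snr_db(level), 1))
# prints: 56.0 -9.0
```

A sentence scored at exactly 4 of 5 words leaves the level unchanged, so the track settles around the level that yields 80% correct; how the final SRT80% is estimated from the track is not modeled here.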
For statistical analysis, the data were tested for normality with the Kolmogorov–Smirnov test. If normality was not rejected, significance was tested by one-way ANOVA with Tukey's test to correct for multiple comparisons, and the data were plotted as mean with standard deviation (SD). If the normality test failed (data of the test lists, Fig. 5), Friedman's test with Dunn's correction for multiple comparisons was used, and the data were plotted as median with range. A p value of 0.05 or less was considered statistically significant. Statistical analysis was performed with Prism 9 (GraphPad Software, San Diego, CA).
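For the non-normal data, Friedman's test compares repeated measures by ranking the conditions separately within each subject. The sketch below computes the Friedman statistic without tie correction and, for the special case of three conditions, its p value from the chi-square distribution with 2 degrees of freedom, whose upper tail is exp(−Q/2). Treating the Fig. 5 data as three conditions (the two training lists and the audiovisual test condition) is an assumption, as are all function names. This is not the Prism implementation, it omits Dunn's post hoc comparisons, and library routines such as `scipy.stats.friedmanchisquare` may apply a tie correction, so results can differ slightly when ties occur.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def friedman_statistic(data):
    """Friedman chi-square statistic (no tie correction).

    data: one row per subject, one column per condition.
    """
    n, k = len(data), len(data[0])
    rank_sums = [0.0] * k
    for row in data:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


def friedman_p_three_conditions(q):
    """Upper-tail p value for chi-square with 2 df (valid only for k = 3)."""
    return math.exp(-q / 2.0)


# Hypothetical SRT80% values (dB SNR): 3 subjects x 3 conditions.
example = [[-7.0, -9.0, -9.5], [-6.5, -8.0, -8.5], [-7.5, -9.5, -9.0]]
q = friedman_statistic(example)
p = friedman_p_three_conditions(q)
```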
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
RESULTS
Acoustic Attenuation of Two Types of Masks
Medical and cloth masks attenuated the voice predominantly in the middle and high frequencies (Fig. 1). The cloth mask had a detectable effect of more than 1 dB from 1.5 kHz upward, with a maximum attenuation of about 8 dB at around 8 kHz (blue line in Fig. 1). The medical mask had more favorable acoustic properties, with a detectable effect above 2.5 kHz and a maximum attenuation of about 6 dB at around 8 kHz (red line in Fig. 1).
Speechreading With and Without Mask
There was a large improvement in speech perception of the normal-hearing subjects in this study when visual cues provided by the mouth region were available (Fig. 2): In the audio-only condition the average speech reception threshold at 80% word recognition (SRT80%) was −6.9 dB SNR (SD 1.0 dB). This indicates that 80% of the test words are correctly understood when the noise signal is 6.9 dB louder than the speech signal. In the audiovisual condition (face visible, unaltered audio), however, the average SRT80% was −9.4 dB SNR (SD 1.6 dB) indicating a statistically significant benefit of the visual cues of 2.5 dB (SD 1.5 dB, p < 0.001). In the audiovisual condition with visual mask (mouth region not visible, but unaltered audio) the SRT80% was almost equal to the audio-only condition with −6.8 dB SNR (SD 1.1 dB, p = 0.993; difference not significant). When the acoustic attenuation of the masks was added to the aforementioned condition, speech recognition further deteriorated: with the acoustic filter of the medical mask the SRT80% was −5.3 dB SNR (SD 0.9 dB) and with the acoustic filter of the cloth mask the SRT80% was −4.3 dB SNR (SD 1.0 dB).
Visual and Acoustic Effects of the Mask
In Figure 3, the audiovisual condition with visual mask and unaltered audio served as the baseline. Removing the mask improved the SRT80%, whereas adding the acoustic filter of the mask worsened it. The visual occlusion by the mask accounted for an SRT80% difference of 2.6 dB SNR (SD 1.4 dB); with the mouth covered, performance was almost equal to that in the audio-only condition (no visual information at all). The acoustic attenuation accounted for 1.6 dB SNR (SD 1.1 dB, medical mask) and 2.5 dB SNR (SD 1.0 dB, cloth mask). Thus, both the visual occlusion and the acoustic attenuation of the mask significantly worsened the SRT80%.
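The decomposition into visual and acoustic components amounts to subtracting SRT80% values between conditions. The sketch below applies it to the rounded group means reported in the previous subsection; this is only an approximation, because the text reports means of individual differences, which need not equal differences of rounded group means (for the medical mask, the group means give 1.5 dB, whereas the reported mean individual difference is 1.6 dB). The condition keys and the function name are hypothetical.

```python
# Group-mean SRT80% values (dB SNR) as reported in the Results (rounded).
SRT80_DB_SNR = {
    "audio_only": -6.9,
    "audiovisual": -9.4,
    "av_visual_mask": -6.8,
    "av_mask_medical_audio": -5.3,
    "av_mask_cloth_audio": -4.3,
}


def effect_db(condition, reference):
    """SRT80% of `condition` minus that of `reference` (dB); positive = worse."""
    return round(SRT80_DB_SNR[condition] - SRT80_DB_SNR[reference], 1)


visual_benefit = effect_db("audio_only", "audiovisual")
visual_loss = effect_db("av_visual_mask", "audiovisual")
medical_acoustic = effect_db("av_mask_medical_audio", "av_visual_mask")
cloth_acoustic = effect_db("av_mask_cloth_audio", "av_visual_mask")
cloth_total = effect_db("av_mask_cloth_audio", "audiovisual")
print(visual_benefit, visual_loss, medical_acoustic, cloth_acoustic, cloth_total)
# prints: 2.5 2.6 1.5 2.5 5.1
```

Because the baseline is the masked audiovisual condition, the visual and acoustic components share a reference and add up to the total cloth-mask effect (2.6 + 2.5 = 5.1 dB).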
Individual Audiovisual Benefit of Normal-Hearing Subjects
SRTs in the audiovisual condition (no mask and unaltered audio) varied widely among the normal-hearing subjects. Not surprisingly, almost all subjects performed better in the audiovisual condition without mask than in the audiovisual condition with visual mask and unaltered audio (Fig. 4). Only 2 of 15 subjects did not benefit from the visual cues of the mouth region, with SRT80% differences of −0.3 dB SNR and 0.2 dB SNR, respectively. The benefit of the other 13 subjects ranged from 1.3 dB SNR to 4.5 dB SNR.
We found a training effect of about 2 dB SNR, given as the difference between the two training lists of 20 sentences in the audiovisual condition that were presented prior to the actual measurements. The SRT of the second training list did not differ significantly from the SRT measured in the audiovisual condition within the randomized sequence of test conditions (Fig. 5).
DISCUSSION
Understanding speech against some degree of background noise is part of everyday life. Even for normal-hearing individuals this poses a challenge, and hearing-impaired individuals are even more affected by background noise. Speechreading in combination with auditory information can improve speech intelligibility. In our experiments, normal-hearing subjects showed improved speech understanding in noise when visual cues were available. This visual benefit disappeared completely when the mouth region was covered by a face mask, linking the effect to speechreading. As a result, communication with face masks was no more effective than communication without any visual information at all.
It has been shown before that normal-hearing individuals profit from visual cues when speech intelligibility is assessed, especially in noisy environments (27–29). Accordingly, we report a 2.5 dB improvement (i.e., decrease) in SRT80% in the audiovisual condition compared with the audio-only condition, which corresponds to a difference in speech intelligibility of about 30% when approximated with the slope of the intelligibility function of the female German matrix test as derived from Kollmeier et al. (16). This effect was smaller than that observed by Llorach et al. (17) for the same test material. One plausible explanation is that the participants in that study completed more training lists and more conditions and were thus able to improve their performance by getting used to the material and the talker. Other groups have reported no improvement in speech intelligibility in normal-hearing subjects when offered an audiovisual signal compared with an audio-only signal, but this seems to have been due to a signal-to-noise ratio that was not challenging enough, resulting in a ceiling effect (15). Consistent with Llorach et al. (17), our results show a wide range of audiovisual gain between subjects, ranging from −0.3 to 4.5 dB SRT80% improvement. Factors that influence audiovisual gain include the ability to speechread, the ability to encode auditory information, and the integration of both modalities (30). Higher cortical processes and different biological systems are also involved in audiovisual integration, and differences in the efficacy of these processes can at least partially explain inter-individual differences in normal-hearing subjects (31).
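The conversion from an SRT shift to a change in intelligibility can be sketched with a logistic psychometric function of a form commonly used for matrix tests, p(L) = 1 / (1 + exp(4·s·(SRT50 − L))), where s is the slope at the 50% point; close to that point, a shift of ΔL dB changes intelligibility by roughly s·ΔL. The slope of 0.12 per dB used below is an assumption inferred from the numbers in this section (2.5 dB corresponding to about 30%), not a value taken from Kollmeier et al. (16), and the linear approximation is only reasonable for small shifts because the logistic curve saturates.

```python
import math

# Assumption: slope inferred from the ratios in the text, not quoted from (16).
ASSUMED_SLOPE_PER_DB = 0.12


def psychometric(level_db, srt50_db, slope_per_db=ASSUMED_SLOPE_PER_DB):
    """Logistic proportion correct; its slope at level_db == srt50_db is slope_per_db."""
    return 1.0 / (1.0 + math.exp(4.0 * slope_per_db * (srt50_db - level_db)))


def linear_intelligibility_change(delta_srt_db, slope_per_db=ASSUMED_SLOPE_PER_DB):
    """First-order approximation of the change in proportion correct for an SRT shift."""
    return slope_per_db * delta_srt_db


audiovisual_gain = linear_intelligibility_change(2.5)   # about 0.30
medical_mask_loss = linear_intelligibility_change(1.6)  # about 0.19
```

With this assumed slope, the 1.6 dB medical-mask effect maps to roughly 19%, close to the approximately 20% stated for the medical mask in the following paragraph.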
In addition, we evaluated the attenuation properties of two types of face masks (cloth and medical). In general accordance with previous findings, a similar reduction in high-frequency sound levels was detected for both masks (10–12). For further studies, it is important to note that the acoustic attenuation of cloth masks varies and appears to depend strongly on the material and number of layers used (11,12). Consistent with these attenuation properties, filtering the speech signal according to the attenuation pattern of a medical or a cloth mask further deteriorated speech understanding. This effect depends on the type of mask and its attenuation in the mid and high frequencies, and it reached the size of the effect of masking the visual information (2.5 dB for the cloth mask vs. 2.6 dB visual loss). Muzzi et al. (32) investigated the effect of different types of face masks and face shields and found that different face masks affected auditory speech recognition thresholds and the speech intelligibility index in noisy environments by attenuating the acoustic speech signal. They described a decline of up to 6.4% in speech intelligibility index scores and a decline of more than 20% in speech recognition when a medical face mask was worn (32). This is in line with our finding of an average decline of 1.6 dB in SRT80% for the medical mask, which would correspond to an intelligibility loss of about 20% (16). The effect of the cloth mask would correspond to an auditory intelligibility loss of about 31% (16). In contrast, Magee et al. did not detect significant differences in speech intelligibility between no-mask and mask conditions in an audio-only analysis in quiet, but they discussed that speech intelligibility with masks could be reduced in noisy environments (33).
Other groups have shown that, in certain speech intelligibility assessments, surgical or medical face masks affect speech intelligibility less than N95 masks or air-purifying respirators (32,34–36). A probable cause is the larger attenuation of the latter, especially in the mid-frequency region (11), which is known to be most important for speech intelligibility (37,38).
Our study shows that both effects, the acoustic deterioration and the missing visual cues for speechreading, add up to a substantial loss in speech understanding in noise even in normal-hearing subjects by hindering the process of speech encoding (30). This combined effect was up to 5 dB in the worst acoustic condition (cloth mask), corresponding to an intelligibility difference of about 60% (16). This seems all the more relevant since similar masks are commonly used by the public in daily life during the COVID-19 pandemic (39–41).
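Assuming that the visual and acoustic effects add linearly, and using a slope of roughly 0.12 per dB (an assumption inferred from the ratios between the dB shifts and intelligibility estimates in this section, not a value quoted from (16)), the combined cloth-mask effect can be checked as follows. The visual and acoustic components are the ones reported in the Results; their sum equals the difference between the group means of the cloth-mask condition (−4.3 dB SNR) and the unmasked audiovisual condition (−9.4 dB SNR).

```latex
\begin{aligned}
\Delta\mathrm{SRT}_{\text{combined}} &\approx \Delta\mathrm{SRT}_{\text{visual}} + \Delta\mathrm{SRT}_{\text{acoustic, cloth}}
  = 2.6\ \text{dB} + 2.5\ \text{dB} = 5.1\ \text{dB},\\
\Delta p &\approx s\,\Delta\mathrm{SRT}_{\text{combined}}
  \approx 0.12\ \text{dB}^{-1} \times 5.1\ \text{dB} \approx 0.61 .
\end{aligned}
```

This is consistent with the "up to 5 dB" and "about 60%" stated above, within the limits of a linear approximation over a shift of this size.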
Some studies have evaluated modifications of and alternatives to face masks to overcome their adverse effects on communication. Raising the voice seems to at least partially compensate for the attenuation of the speech signal and the loss of the ability to speechread (42). Corey et al. discussed the use of transparent masks (11). Although transparent masks appear to have worse acoustic properties than medical masks, the addition of visual cues improves speech intelligibility, especially in hearing-impaired individuals (15). Further suggestions include amplifying the speech signal with lapel microphones to improve the signal-to-noise ratio (11). Smartphones, used alone or in combination with headphones, may be a more practical option today, as they can offer very effective noise reduction in everyday life.
Using only one speaker with a simulated face mask for the audiovisual German matrix test poses a limitation to this study. For further investigations of speech intelligibility with different masks, ideal conditions would include a speaker actually wearing different types of face masks (11,15). In addition, a more realistic approximation of speech intelligibility in everyday situations could be achieved by using different male and female speakers as described for other speech intelligibility tests before (43).
Future studies should include hearing-impaired listeners since it can be assumed that hearing loss has an additional impact on speechreading and audiovisual integration resulting in a greater audiovisual gain compared to normal-hearing subjects (19,44).
We demonstrate that audiovisual speech perception is strongly affected by face mask wearing. Interestingly, even in normal-hearing subjects, visual aspects play a major role in this phenomenon. Both visual and acoustic effects thus contribute to explaining the speech comprehension difficulties that normal-hearing individuals experience in everyday life.
1. Wu Y-C, Chen C-S, Chan Y-J. The outbreak of COVID-19: An overview. J Chin Med Assoc
2. Zuin M, Rigatelli G, Zuliani G, Roncon L. Widespread outbreak of the COVID-19 virus during the second wave pandemic: The double face of absolute numbers. Pathog Glob Health
3. World Health Organization. WHO coronavirus disease (COVID-19) dashboard. Available at: https://covid19.who.int/. Accessed January 2, 2021.
4. Chu DK, Akl EA, Duda S, et al. Physical distancing, face masks, and eye protection to prevent person-to-person transmission of SARS-CoV-2 and COVID-19: A systematic review and meta-analysis. Lancet
5. Xiong L, Li J, Xia T, et al. Risk reduction behaviors regarding PM2.5 exposure among outdoor exercisers in the Nanjing Metropolitan Area, China. Int J Environ Res Public Health
6. Chan EY, Cheng CK, Tam G, Huang Z, Lee P. Knowledge, attitudes, and practices of Hong Kong population towards human A/H7N9 influenza pandemic preparedness, China, 2014. BMC Public Health
7. Lau JTF, Yang X, Pang E, Tsui HY, Wong E, Wing YK. SARS-related perceptions in Hong Kong. Emerg Infect Dis
8. Ong JJY, Bharatendu C, Goh Y, et al. Headaches associated with personal protective equipment: A cross-sectional study among frontline healthcare workers during COVID-19. Headache J Head Face Pain
9. Llamas C, Harrison P, Donnelly D, Watt D. Effects of different types of face coverings on speech acoustics and intelligibility. York Pap Ling Series 2
10. Goldin A, Weinstein B, Shiman N. How do medical masks degrade speech reception? Hear Rev
11. Corey RM, Jones U, Singer AC. Acoustic effects of medical, cloth, and transparent face masks on speech signals. J Acoust Soc Am
12. Pörschmann C, Lübeck T, Arend JM. Impact of face masks on voice radiation. J Acoust Soc Am
13. Mheidly N, Fares MY, Zalzale H, Fares J. Effect of face masks on interpersonal communication during the COVID-19 pandemic. Front Public Health
14. Truong TL, Beck SD, Weber A. The impact of face masks on the recall of spoken sentences. J Acoust Soc Am
15. Atcherson SR, Mendel LL, Baltimore WJ, et al. The effect of conventional and transparent surgical masks on speech understanding in individuals with and without hearing loss. J Am Acad Audiol
16. Kollmeier B, Warzybok A, Hochmuth S, et al. The multilingual matrix test: Principles, applications, and comparison across languages: A review. Int J Audiol 2015; 54 (sup2):3–16.
17. Llorach G, Kirschner F, Grimm G, Zokoll MA, Wagener KC, Hohmann V. Development and evaluation of video recordings for the OLSA matrix sentence test. Int J Audiol 2021; 1–11. doi: 10.1080/14992027.2021.1930205.
18. Reis LR, Escada P. Effect of speechreading in presbycusis: Do we have a third ear? Otolaryngol Pol Pol Otolaryngol
19. Auer ET, Bernstein LE. Enhanced visual speech perception in individuals with early-onset hearing impairment. J Speech Lang Hear Res
20. Oliveira LN de, Soares AD, Chiari BM. Speechreading as a communication mediator. CoDAS
21. Government of the United Kingdom. Rules and restrictions during coronavirus—Face coverings: when to wear one, exemptions, and how to make your own. GOV.UK. Available at: https://www.gov.uk/government/publications/face-coverings-when-to-wear-one-and-how-to-make-your-own/face-coverings-when-to-wear-one-and-how-to-make-your-own. Accessed February 20, 2021.
23. Government of Canada. Coronavirus disease (COVID-19)—Prevention and risks—Non-medical masks: About. Canada.ca. Available at: https://www.canada.ca/en/public-health/services/diseases/2019-novel-coronavirus-infection/prevention-risks/about-non-medical-masks-face-coverings.html. Accessed February 20, 2021.
24. Wagener K, Brand T, Kollmeier B. Entwicklung und Evaluation eines Satztests für die deutsche Sprache Teil III: Evaluation des Oldenburger Satztests. Z Audiol
25. Wagener K, Hochmuth S, Ahrlich M, Zokoll M, Kollmeier B. Der weibliche Oldenburger Satztest. The female version of the Oldenburg sentence test. Proc 17th Jahrestag Dtsch Ges Für Audiol Oldenbg Ger; 2014. Published online.
26. Ahrlich M. Optimierung und Evaluation des Oldenburger Satztests mit weiblicher Sprecherin und Untersuchung des Effekts des Sprechers auf die Sprachverständlichkeit. Optimization and evaluation of the female OLSA and investigation of the speaker's effects on speech intelligibility. Bachelor Thesis; 2013. Published online.
27. Macleod A, Summerfield Q. Quantifying the contribution of vision to speech perception in noise. Br J Audiol
28. Drijvers L, Özyürek A. Visual context enhanced: The joint contribution of iconic gestures and visible speech to degraded speech comprehension. J Speech Lang Hear Res
29. Ross LA, Saint-Amour D, Leavitt VM, Javitt DC, Foxe JJ. Do you see what I am saying? Exploring visual enhancement of speech comprehension in noisy environments. Cereb Cortex
30. Grant KW, Walden BE, Seitz PF. Auditory-visual speech recognition by hearing-impaired subjects: Consonant recognition, sentence recognition, and auditory-visual integration. J Acoust Soc Am
31. Hall DA, Fussell C, Summerfield AQ. Reading fluent speech from talking faces: Typical brain networks and individual differences. J Cogn Neurosci
32. Muzzi E, Chermaz C, Castro V, Zaninoni M, Saksida A, Orzan E. Short report on the effects of SARS-CoV-2 face protective equipment on verbal communication. Eur Arch Otorhinolaryngol
33. Magee M, Lewis C, Noffs G, et al. Effects of face masks on acoustic analysis and speech perception: Implications for peri-pandemic protocols. J Acoust Soc Am
34. Thomas F, Allen C, Butts W, Rhoades C, Brandon C, Handrahan DL. Does wearing a surgical facemask or N95-respirator impair radio communication? Air Med J
35. Palmiero AJ, Symons D, Morgan JW, Shaffer RE. Speech intelligibility assessment of protective facemasks and air-purifying respirators. J Occup Environ Hyg
36. Radonovich LJ, Yanke R, Cheng J, Bender B. Diminished speech intelligibility associated with certain types of respirators worn by healthcare workers. J Occup Environ Hyg
37. Markham D, Hazan V. The effect of talker- and listener-related factors on intelligibility for a real-word, open-set perception test. J Speech Lang Hear Res
38. Krause JC, Braida LD. Acoustic properties of naturally produced clear speech at normal speaking rates. J Acoust Soc Am
39. Rahimi Z, Shirali GA, Araban M, Mohammadi MJ, Cheraghian B. Mask use among pedestrians during the Covid-19 pandemic in Southwest Iran: An observational study on 10,440 people. BMC Public Health
40. Gutiérrez-Velasco L, Liébana-Presa C, Abella-Santos E, Villar-Suárez V, Fernández-Gutiérrez R, Fernández-Martínez E. Access to information and degree of community awareness of preventive health measures in the face of COVID-19 in Spain. Healthcare
41. Tan M, Wang Y, Luo L, Hu J. How the public used face masks in China during the coronavirus disease pandemic: A survey study. Int J Nurs Stud
42. Hampton T, Crunkhorn R, Lowe N, et al. The negative impact of wearing personal protective equipment on communication during coronavirus disease 2019. J Laryngol Otol
43. Sheffert S, Lachs L, Hernandez LR. The Hoosier audiovisual multi-talker database. Res Spok Lang Process Prog Rep No 21
44. Rouger J, Lagleyre S, Fraysse B, Deneve S, Deguine O, Barone P. Evidence that cochlear-implanted deaf patients are better multisensory integrators. Proc Natl Acad Sci