Complex acoustic environments, such as train stations, supermarkets, and busy restaurants, are encountered frequently in everyday life. People with normal hearing can usually communicate without effort in these environments, but people with a hearing impairment often have difficulties, even when wearing hearing aids. Digital hearing aids are gradually becoming more powerful, though, and more advanced signal processing can be provided to help those who wear them. However, to develop these new processing methods it is important to perform listening tests in these difficult listening situations.
Traditional hearing aid testing in the laboratory normally focuses on how well speech is understood in noise. However, there is much more to the experience of sound in everyday life. Specifically, the spatial aspects of sound, such as from which direction a sound comes or how far away a sound object is, are also important. This spatial awareness is used, for example, when switching attention from one person to another during a meeting or a dinner conversation. It also plays a major role when trying to understand what someone is saying in a very reverberant room.
These aspects of hearing have to be taken into account to ensure that hearing aids maintain the full richness of the acoustical information available to listeners and to help them extract meaning from it. Therefore, there is a need to develop and test future hearing aids in a variety of real-life sound environments.
Recent developments in room acoustics modeling and sound reproduction have made it possible to create complex listening situations in the laboratory by creating a so-called virtual sound environment (VSE).1,2 In a VSE, sound scenes are constructed in a computer by modeling sound sources in a simulated (or virtual) room. The sound of these virtual sources is then played through a large array of loudspeakers.
A VSE can include many virtual sound sources around the listening position (in different directions and at different distances), and the dimensions and wall properties of the room can be changed. When placed in the middle of the loudspeaker array, the listener perceives all attributes of the sound as in a real physical environment. Thus, the listener can be “transported” to different listening environments, bridging the gap between the laboratory and real life.
Oticon recently constructed a VSE system at its head office in Denmark. The physical setup consists of 29 loudspeakers placed on a sphere around the listening position in a sound studio (see Figure 1). Many listening tests can be performed in this VSE system to gain insights into the perception of speech in complex acoustic environments. It also opens up the possibility of studying many other aspects of sound, such as sound localization, reverberation, and masking phenomena. The system allows for testing new advancements in hearing aid technology directly on users early in the development process. Thus, the needs of users can be clarified and the benefit of the hearing aids maximized.
VIRTUAL ROOM AURALIZATION
Since the advent of computers, numerical models have been used for calculating how sound behaves in rooms.3 State-of-the-art software programs can model the acoustics in very complicated spaces. They do so by defining the three-dimensional geometry of a specific room with appropriate absorbent materials on the walls. The room acoustic modeling is then used to simulate the propagation of sound from a source to a listening position. The contributions of sound from all directions at the listening position, after interactions with the room, are then accumulated.
Figure 2 illustrates this process. It shows how “particles” of sound propagate in an auditorium and how the sound waves are distributed and scattered as they interact with the room surfaces. A more detailed description of the modeling process is found in the next section.
Room modeling is used especially in architectural acoustics, for example, for predicting the acoustics of concert halls before they are built. Software programs can simulate the sound in the modeled room and predict acoustical parameters in order to optimize room geometry and wall absorption properties. It is also possible to listen to the simulated sound. This is referred to as “auralization” and can be done through either headphones or loudspeakers.
The most common way of auralizing sound is by playback through headphones. The goal is to reproduce sound exactly at the eardrums of the listener, which requires accurate modeling of the acoustical properties of the listener in the virtual environment. The modeling is typically done with head-related transfer functions (HRTFs), measured from all directions around the head. The precision of the HRTFs is crucial. Ideally, those of the actual listener are measured, but this is cumbersome and thus a disadvantage. Another disadvantage of headphone playback is that the listener is “locked” in the virtual space. This means that the virtual space moves with the listener's head. The most important disadvantage, however, is that playback through headphones is not suited for testing hearing aid performance. This is because the hearing aids are effectively bypassed when the sound is played directly in the ears.
For hearing aids to function in a virtual environment as they would in a real environment, a realistic sound field must be created around the listener's head. This can be done by using a large number of loudspeakers. When positioned in such a sound field, the listener experiences being in the virtual acoustical space. Since the acoustical sound field created with the loudspeakers is independent of the listener, free head movements are possible. Furthermore, it does not matter whether the listener is normal-hearing or hearing-impaired or whether or not the listener is wearing hearing aids. Thus, realistic, natural listening situations can be created in the laboratory by combining room acoustic modeling with loudspeaker-based auralization. This provides a convenient platform for testing different hearing aids, algorithms, and settings directly on listeners.
THE OTICON VSE
As mentioned above, Oticon constructed a VSE system with 29 loudspeakers, placed on a sphere around the listening position. An existing sound studio, with an adjacent control room, was used, as shown in Figure 1. In many ways the VSE system is similar to the loudspeaker-based room auralization system constructed at the Technical University of Denmark.1,2 The sound studio at Oticon is slightly larger, though, and the placement of the loudspeakers is somewhat better optimized.
The high-quality loudspeakers are placed on a sphere with a radius of 1.9 meters (to within a centimeter). Sixteen loudspeakers are in the horizontal plane, six are 45° below the horizontal plane, six are 45° above the horizontal plane, and one loudspeaker is directly above the listening position (reference point). The reference point is located at the average height of a standing person. However, for experimental purposes, the listener is seated on a hydraulic chair that can be raised to the correct height.
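The layout described above can be written down as a small sketch. The ring elevations, loudspeaker counts, and radius come from the text; the equal azimuth spacing within each ring and the starting azimuth of 0 degrees are assumptions made only for illustration, since the article does not give the azimuths.

```python
import math

RADIUS_M = 1.9  # sphere radius stated in the text

# (elevation in degrees, number of loudspeakers) for each ring, from the text
RINGS = [(0.0, 16), (-45.0, 6), (45.0, 6), (90.0, 1)]


def loudspeaker_positions(radius=RADIUS_M, rings=RINGS):
    """Return (x, y, z) positions in meters relative to the reference point.

    Assumption (not from the article): loudspeakers within a ring are
    equally spaced in azimuth, starting at 0 degrees.
    """
    positions = []
    for elevation_deg, count in rings:
        el = math.radians(elevation_deg)
        for k in range(count):
            az = 2.0 * math.pi * k / count
            x = radius * math.cos(el) * math.cos(az)
            y = radius * math.cos(el) * math.sin(az)
            z = radius * math.sin(el)
            positions.append((x, y, z))
    return positions


positions = loudspeaker_positions()
print(len(positions))  # total number of loudspeakers in the sketch
```

Every position in this sketch lies at the same distance from the reference point, which is what allows a single radius to describe the array.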
The sound studio has been specified to be very quiet. Ideally, an anechoic room (i.e., a room that does not reflect sound) is used. However, acoustically damped rooms with low reverberation times are also acceptable. The sound studio has absorbent materials on all the surfaces, including the doors and windows, giving the room a low reverberation time. The software for creating the virtual sound environments, as well as that for running listening experiments, is installed on computers in the control room.
Creating a virtual sound scene involves several steps. First of all, a room is modeled in three dimensions on a computer. All the surfaces, including walls, floor, ceiling, and furniture, in the acoustical space are defined in terms of their geometries and absorbing, reflecting, and diffusing properties. Then a sound source and a listening position are specified in the virtual room. The sound propagation from the source to the listening position is then calculated. Some sound reaches the listener directly through the air (direct sound), while other contributions of sound arrive after interactions with the room (reflections). The sound arrives at the listening position at different levels, at different times, and from different directions.
All the information about how sound travels from a source to the listening position can be described by a so-called room impulse response (RIR).4,5 Figure 3 shows how this is done in a rectangular room with a sound source (a loudspeaker) and a listener. As the figure shows, some sound reaches the listener directly through the air, while later contributions arrive from other directions after being reflected by the walls.
This process is described by the RIR shown in Figure 4. The graph shows that the direct sound arrives first and that it has a high level (amplitude). The early reflections arrive later at different times. The levels of the reflections depend on both the traveling time of the sound and the absorbency of the walls. The circular display indicates the direction from which each sound contribution arrives at the listening position.
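A minimal sketch of how the first few components of such an RIR could be computed for a rectangular room is given below. It uses the image-source idea (mirroring the source in each wall) for first-order reflections only. The speed of sound, the single frequency-independent reflection coefficient, the 1/distance level law, and all names are assumptions for illustration; the article does not describe the algorithm used by the modeling software.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def first_order_rir_taps(room, source, listener, reflection_coeff=0.8):
    """Return (delay_s, amplitude, label) for direct sound and 6 wall reflections.

    room: (Lx, Ly, Lz) in meters, with walls at 0 and L on each axis.
    Amplitude falls off as 1/distance; each reflection is additionally
    scaled by a hypothetical, frequency-independent reflection coefficient.
    """
    d = math.dist(source, listener)
    taps = [(d / SPEED_OF_SOUND, 1.0 / d, "direct")]
    for axis in range(3):
        for side, wall in (("low", 0.0), ("high", room[axis])):
            image = list(source)
            image[axis] = 2.0 * wall - source[axis]  # mirror source in the wall
            d = math.dist(image, listener)
            label = "wall_axis%d_%s" % (axis, side)
            taps.append((d / SPEED_OF_SOUND, reflection_coeff / d, label))
    return sorted(taps)  # ordered by arrival time


taps = first_order_rir_taps(room=(5.0, 4.0, 3.0),
                            source=(1.0, 1.0, 1.5),
                            listener=(4.0, 3.0, 1.5))
for delay, amplitude, label in taps:
    print("%-18s %6.2f ms  level %.3f" % (label, delay * 1000.0, amplitude))
```

In this sketch the direct sound is always the earliest and strongest component, and each reflection is weaker both because it travels farther and because the wall absorbs part of its energy, which matches the description of Figure 4.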
The same principle is used to calculate RIRs for complex three-dimensional models of rooms. The sound arrives from all directions in three dimensions, though, and many more reflections occur as the sound continues to be reflected off the walls. If more sound sources are present in the virtual room, a unique RIR is calculated for each source.
The RIRs are used as input to the auralization process. The goal of the auralization is to reproduce the sound field at the listening position in the virtual environment at the center of the loudspeaker sphere. This is done with a technique called high-order ambisonics (HOA), which is based on a spherical harmonics decomposition of three-dimensional sound fields.7,8
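As a rough illustration of how the number of loudspeakers limits a spherical harmonics decomposition, the sketch below applies a common rule of thumb: a three-dimensional decomposition of order N has (N + 1)^2 components, and the array should have at least that many loudspeakers. The rule and the function name are assumptions for illustration; the article does not state which order the Oticon system uses.

```python
import math


def max_ambisonic_order(num_loudspeakers):
    """Largest order N such that (N + 1)**2 <= num_loudspeakers.

    Rule of thumb only: an order-N 3D spherical harmonics decomposition
    has (N + 1)**2 components, and each needs at least one loudspeaker.
    """
    if num_loudspeakers < 1:
        raise ValueError("at least one loudspeaker is required")
    return math.isqrt(num_loudspeakers) - 1


order = max_ambisonic_order(29)
print(order, (order + 1) ** 2)  # order and number of components it needs
```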
In principle, all loudspeakers are used simultaneously to reconstruct the direct sound and each early reflection individually at the reference point, with their corresponding levels, arrival times, and directions. The late reflections are treated slightly differently. They represent the diffuse reverberation of the room, which carries important information for distance perception. Accurate reproduction of the directional characteristics of late reflections is, however, not needed. Therefore, the late reflections are reproduced with an algorithm that uses only energy envelopes of the late part of the RIR, which preserves its diffuseness and frequency spectrum.
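One way to picture an envelope-based treatment of the late reflections is sketched below: each loudspeaker receives its own independent noise sequence, shaped by the same energy envelope, so that the decay is preserved while no single direction dominates. This is an illustrative assumption rather than the algorithm used in the system, and it handles a single frequency band only; preserving the frequency spectrum would require repeating the step per band.

```python
import numpy as np


def late_reverb_tails(energy_envelope, num_loudspeakers=29, seed=0):
    """Return one noise tail per loudspeaker, shaped by a shared envelope.

    energy_envelope: 1-D array of non-negative energy values over time,
    taken from the late part of an RIR (hypothetical input).
    Independent noise per channel keeps the channels uncorrelated.
    """
    envelope = np.asarray(energy_envelope, dtype=float)
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((num_loudspeakers, envelope.size))
    return noise * np.sqrt(envelope)  # amplitude is the square root of energy


# Stand-in envelope: a simple exponential decay, for illustration only.
envelope = np.exp(-np.linspace(0.0, 6.0, 1000))
tails = late_reverb_tails(envelope)
print(tails.shape)
```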
The contributions from all the RIR components (direct sound, early reflections, and late reverberation) are then added to obtain a filter for each loudspeaker channel. These filters are applied to a monophonic signal (such as recorded speech) to get the 29 loudspeaker signals. This procedure is repeated for each RIR, i.e., for each sound source in the virtual space. By adding up all of the signals, a complex sound scene, with many simultaneous sound sources, can be built up. Thus, 29 sound files, one for each loudspeaker, are derived offline.
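The filtering and summing step can be sketched as follows, assuming the per-loudspeaker filters for each source have already been derived. The function name and array layout are hypothetical; only the operations (filter a monophonic signal for each loudspeaker, then add the results across sources) follow the description above.

```python
import numpy as np


def render_scene(sources, num_loudspeakers=29):
    """Mix several virtual sources into one signal per loudspeaker.

    sources: list of (mono_signal, filters) pairs, where mono_signal is a
    1-D array and filters has shape (num_loudspeakers, filter_length).
    Returns an array of shape (num_loudspeakers, n_samples).
    """
    length = max(len(sig) + filt.shape[1] - 1 for sig, filt in sources)
    out = np.zeros((num_loudspeakers, length))
    for sig, filt in sources:
        for ch in range(num_loudspeakers):
            y = np.convolve(sig, filt[ch])  # apply this channel's filter
            out[ch, :len(y)] += y           # add this source to the scene
    return out


# Toy example: two identical sources with trivial half-gain filters.
signal = np.array([1.0, 2.0, 3.0])
filters = np.zeros((29, 2))
filters[:, 0] = 0.5
scene = render_scene([(signal, filters), (signal, filters)])
print(scene.shape)
```

Each row of the result corresponds to one of the sound files that would be written out for playback.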
Before playing the sound files back it is necessary to equalize the loudspeakers to ensure that they have flat frequency responses. Furthermore, the time delays of the sound from each loudspeaker to the reference point have to be compensated for. This calibration ensures that the contributions of all the loudspeakers add constructively to the desired sound field at the reference point.
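The time-delay part of this calibration can be illustrated with a short sketch. It assumes the distance from each loudspeaker to the reference point has been measured, and it delays the closer loudspeakers so that all contributions arrive together with the farthest one. The speed of sound, the sample rate, and the rounding to whole samples are assumptions; equalization of the frequency responses is not covered here.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed


def compensation_delays(distances_m, sample_rate=48000):
    """Return the delay, in whole samples, to add to each loudspeaker channel.

    The farthest loudspeaker gets no extra delay; closer ones are delayed
    by the difference in travel time, rounded to the nearest sample.
    """
    farthest = max(distances_m)
    return [round((farthest - d) / SPEED_OF_SOUND * sample_rate)
            for d in distances_m]


# Hypothetical measured distances within a centimeter of the 1.9 m radius.
delays = compensation_delays([1.90, 1.89, 1.91])
print(delays)
```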
With high-order ambisonics, the sound at the reference point is reproduced as accurately as possible. However, the sound is also correct within some distance of the reference point: there is an area around the reference point where the curvature of the wave front is the same as for a real source. The size of this area depends on the number of loudspeakers in the VSE system. Figure 5 compares the wave fields due to a real and a virtual sound source. Because the sound field is correct over a certain area, listeners can move their heads freely, as they would in a normal sound field. Head movements help the listener extract spatial information from the sound, which contributes to the realism of the reproduction, since the virtual sound source behaves like a real sound source.
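To give a feeling for the size of this area, the sketch below uses a rule of thumb often quoted for ambisonic reproduction: the field is reproduced accurately roughly where k·r ≤ N, with k the wavenumber, r the distance from the reference point, and N the ambisonic order. The rule, the speed of sound, and the example order are assumptions for illustration and are not taken from the article.

```python
import math


def accurate_radius_m(order, frequency_hz, speed_of_sound=343.0):
    """Approximate radius (meters) of accurate reproduction.

    Rule of thumb: k * r <= order, with k = 2 * pi * f / c,
    which gives r = order * c / (2 * pi * f).
    """
    return order * speed_of_sound / (2.0 * math.pi * frequency_hz)


for f in (250.0, 1000.0, 4000.0):
    print("%6.0f Hz: about %.2f m" % (f, accurate_radius_m(4, f)))
```

Under this rule the area shrinks as frequency rises and grows with the order, which is consistent with the statement that its size depends on the number of loudspeakers.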
The described VSE system is largely custom-made. Therefore, it is not commercially available, even though some components can be bought “off the shelf.” The exact construction and the use of the different components have, however, been verified through several years of research at the Technical University of Denmark (DTU). In Oticon's VSE system, the RIRs are calculated with a commercially available room acoustic modeling software program called ODEON.9 On the other hand, the software for implementing the auralization is custom-made,1,2 and it has been adapted to the specific loudspeaker setup and listening room at Oticon.
With the VSE system, it is possible to create a great variety of acoustical spaces, such as living rooms, churches, restaurants, office spaces, concert halls, and train stations. In all these environments the acoustical properties of the walls and the furniture can be changed. Also, the listening position and the sound sources can be chosen freely in the virtual space. The directions and distances to the sound sources do not depend on the loudspeaker layout of the playback system. Furthermore, any signal, such as recorded speech, noise, musical instruments, or any other recording from real life, can be used as input to the sound sources in the virtual space. This gives many possibilities for creating sound scenes to be used in listening experiments.
In many respects, testing in a VSE is similar to traditional testing in the laboratory. Typically, the listener sits on a chair and listens to a sound scene. He/she then performs some task and responds either by speaking into a microphone or by pressing buttons on a response device, such as a touch screen. To test new algorithms in hearing aids the listener can be given prototype hearing aids that are connected by wires to a computer. The listener can then control the settings in the hearing aids with the response device. In this way, the listener can, for example, switch between two hearing aid settings and select which one he/she prefers. The sound scenes are controlled by the experimenter located in the control room.
One of the main advantages of using a VSE for performing listening tests is that very realistic real-life scenarios can be created under controlled conditions. Being able to change all aspects of the listening scenario without moving the listener is convenient for both the experimenter and the listener, as testing in real environments (such as a restaurant or a car) can be cumbersome. Since the tests are reproducible, the same scene can be presented again later, either to the same person or to another.
The availability of relevant listening scenarios makes it possible to test many aspects of hearing, such as speech intelligibility, listening preference, annoyance, listening effort, or even cognitive factors, such as working memory and attention.10 To investigate speech understanding under noisy conditions, a scene can be created with several simultaneous sound sources around the listener.
Another interesting situation to study is one in which the listener has to switch attention between several talkers.
It is well known that hearing-impaired listeners have difficulty understanding speech in very reverberant rooms, such as churches, empty halls, and small rooms with hard walls. The effect of reverberation can be studied by simulating these difficult acoustical spaces. Even though speech understanding is important, it is also crucial to study other types of sound that help give a complete picture of the acoustic world. By doing so, the clinician can ensure that the sound provided by the hearing aids is as transparent and natural as possible, and, especially, that the user's perception of direction and distance is correct.
It is important for a hearing aid manufacturer to test hearing aids under as many and as realistic listening conditions as possible. The ability of Oticon's new VSE system to simulate complex listening scenes in the laboratory makes it possible to study many aspects of spatial sound. The system can thereby play an important part in expanding our understanding of the needs of hearing aid users and of audiology in general.
The system is also very useful for demonstrating the richness of acoustical information available to listeners in everyday environments. However, its primary application is as a platform for the development of new hearing aid features that will ensure the maximum benefit for hearing aid users.