Using Virtual Reality to Assess Auditory Performance

Stecker, G. Christopher, PhD

doi: 10.1097/01.HJ.0000558464.75151.52
Virtual Reality

Dr. Stecker is the director of the Spatial Hearing Laboratory at Boys Town National Research Hospital, an adjunct associate professor at Vanderbilt University School of Medicine, and the founder of Auditory Space, LLC. He received his PhD in psychoacoustics from the University of California, Berkeley, and has conducted psychophysical, acoustical, and neuroimaging research in human spatial hearing. His interests include spatial perception in acoustically complex environments, audio for virtual and augmented reality, and the functions of the human auditory cortex.

Auditory assessments aim to quantify auditory abilities that impact real-world listening, such as replicating patient complaints of listening difficulty in specific situations or understanding how auditory systems perform in natural scenes. This goal suggests that assessments should be as realistic as possible. However, the need to quantify specific abilities requires greater experimental control than is achievable in most natural settings. This fundamental trade-off between realism and control has existed since the beginnings of perception science. But emerging technologies in virtual and augmented reality promise to bring the real world into the lab (or vice versa) and break this trade-off. Implemented correctly, these technologies offer new approaches to quasi-realistic auditory assessment with improved multisensory consistency, more natural tasks, and greater participant engagement, making assessment more reliable, valid, and fun.

Traditional laboratory-based testing remains the gold standard for auditory assessment in the clinic and in basic research. But several features of this approach limit the real-world validity of laboratory tasks. First, these tasks are rife with multisensory conflict. The booth walls and desktop displays of the testing environment are visually distracting and inconsistent with the auditory stimuli listeners are asked to judge. Real-world listening, by contrast, involves a consistent multisensory context that can reduce uncertainty and support task performance even when non-auditory cues are not explicitly informative about the task. Removing that support increases demands on participants’ memory and mental imagery, cognitive skills that can confound attempts to measure sensory abilities. Even in explicitly multisensory studies, the spatial arrangement of auditory and visual stimuli is often poorly matched: Visuals may appear on a desktop display, while sounds appear inside the head or from a distant loudspeaker. In contrast, real-world auditory and visual features are aligned in all directions and remain so as a person moves around, producing a natural experience of immersion within the sensory world.

Second, laboratory-based tasks are behaviorally unrealistic, typically presenting explicit feedback about an isolated task dimension such as localization or identification. But real-world listening combines multiple task dimensions and implicit feedback. Conversation in a crowded restaurant, for example, requires simultaneous localization, segregation, and identification of target speech, along with the evaluation of facial cues that might indicate which talker asked an important question but not the content of the question itself.

Virtual reality (VR) and other immersive technologies have been gaining attention as new modes for computer-based entertainment, gaming, communication, and work. VR head-mounted displays (HMDs) present video and audio content (typically a simulated 3D environment) customized for the individual user. Low-latency tracking of the user's position in space is used to update the display in close to real time, allowing direct interaction with immersive multisensory experiences.

At Boys Town National Research Hospital's Spatial Hearing Laboratory, we have been using commercial HMDs to explore the benefits of VR for auditory assessment. The focus has been on enhancing auditory assessment rather than on assessing the performance of virtual experiences (e.g., 3D audio) themselves. As such, our approach has been to add immersive multisensory elements without giving up the control made possible by existing unisensory methods and apparatus. For computer-based assessment, this is accomplished by software commands sent to and from a dedicated process that controls VR-based interactions with the participant. Across multiple investigations, we have noted several exciting potential benefits:
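As a rough illustration of this division of labor, the sketch below passes a trial command from experiment-control code to a separate VR-rendering process over a local UDP socket. The message fields, stimulus name, and loopback transport are hypothetical choices for the example, not the protocol of any particular lab software.

```python
import json
import socket

def make_trial_command(trial, azimuth_deg, stimulus):
    """Encode a JSON command telling the VR process what to present.
    (Field names here are illustrative, not a published protocol.)"""
    return json.dumps({
        "type": "start_trial",
        "trial": trial,
        "azimuth_deg": azimuth_deg,
        "stimulus": stimulus,
    }).encode("utf-8")

# Loopback demonstration: a stand-in "VR process" socket receives the command.
listener = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
listener.bind(("127.0.0.1", 0))  # let the OS pick a free port
vr_addr = listener.getsockname()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(make_trial_command(1, -30.0, "noise_burst"), vr_addr)

msg = json.loads(listener.recv(4096).decode("utf-8"))
print(msg["type"], msg["azimuth_deg"])  # start_trial -30.0
sender.close()
listener.close()
```

Keeping the psychophysical control loop and the VR renderer in separate processes, joined only by small messages like this, is one way to add immersive elements without rewriting validated unisensory test code.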

Improving Multisensory Consistency. Replacing the sound booth with even a simple virtual scene can eliminate distracting visual elements and provide better consistency between visual and auditory elements, which can help participants understand the auditory stimulus even when visual cues are not directly informative. Working in my lab at Vanderbilt University, Travis Moore, AuD, PhD, conducted a binaural localization study in which listeners indicated the direction of a perceived sound by turning their heads in a simple immersive scene—a circular array of 360 balloons (one per degree azimuth). On each trial, they turned to face the balloon closest to where they heard a target sound and pressed a button to pop the selected balloon with a simple silent animation. This VR-based approach (a) provided a more consistent and controlled visual background than the sound booth interior, (b) supported participants’ sense of target sounds’ spatial position relative to their own dynamic viewpoint, and (c) provided naturalistic feedback on the effects of the participants’ own actions rather than explicit indication of correct and incorrect responses.
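The head-turn response in such a task reduces to a simple computation: map the tracked head yaw onto the nearest balloon and score the angular error against the target, wrapping around the circle. A minimal sketch (the one-balloon-per-degree layout follows the study; the function names are mine):

```python
def wrap_deg(angle):
    """Wrap an angle in degrees into the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def nearest_balloon(head_yaw_deg):
    """Balloons sit at integer azimuths 0-359; return the one the head faces."""
    return int(round(head_yaw_deg)) % 360

def localization_error(response_deg, target_deg):
    """Absolute angular error, respecting circular wrap-around."""
    return abs(wrap_deg(response_deg - target_deg))
```

The wrap-around step matters: a response at 350 degrees to a target at 10 degrees is a 20-degree error, not 340.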

Bringing the Real World into the Lab. AuD student Steven Carter integrated VR-based experiences into a classic minimum audible angle (MAA) task from spatial psychoacoustics. The MAA task measures the smallest angular separation at which a listener can reliably judge the order of two successive sounds from different loudspeakers, reporting “left then right” or vice versa. The visibility of loudspeakers in an MAA task presents a potential confound that can bias a listener's judgment and is difficult to control. Carter's approach allowed us to display only the relevant pair of loudspeakers on each trial and compare data across trials with and without visible speakers or with invalid visual cues. As in previous MAA studies, this experiment collected responses using a simple two-button response box, but in this case, the response box was itself virtual, appearing only after the stimulus presentation and disappearing after the response. Altogether, the VR approach allowed us to (a) minimize distraction from non-task-relevant sensory information (the button box and unselected loudspeakers) and (b) specifically control multisensory elements as factors in the experimental design (visible loudspeaker positions).
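The logic of a single MAA trial can be sketched as follows; the symmetric-about-midline geometry and the `respond` callback are simplifying assumptions for illustration, not the exact design of the study described above.

```python
import random

def run_maa_trial(separation_deg, respond, rng=random):
    """One minimum-audible-angle trial: two successive sounds straddle the
    midline, separated by separation_deg, in a random order. `respond`
    stands in for the listener, returning "LR" (left then right) or "RL".
    Returns True when the presentation order was reported correctly."""
    order = rng.choice(["LR", "RL"])
    half = separation_deg / 2.0
    az1, az2 = (-half, half) if order == "LR" else (half, -half)
    return respond(az1, az2) == order

# An error-free listener gets every trial right, whatever the separation:
ideal = lambda az1, az2: "LR" if az1 < az2 else "RL"
print(all(run_maa_trial(2.0, ideal) for _ in range(100)))  # True
```

In a VR implementation, the azimuths returned here would also drive which pair of virtual loudspeakers is drawn, which is precisely the cue the experimenter can then validate or invalidate across conditions.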

Enabling Naturalistic Multi-Dimensional Tasks. By simulating persistent multisensory objects, VR-based approaches can support naturalistic multidimensional tasks that would be too complex to conduct, or would place too heavy a demand on mental imagery, in purely auditory approaches. One example is a task, developed in collaboration with Ramani Duraiswami and Dmitry Zotkin at Visisonics Corporation, that examines listeners’ sensitivity to reverberation differences among talkers in a multitalker environment. In this task, participants use head turning to explore a scene of four to six simultaneous familiar talkers. Audio is presented from a loudspeaker array or via head-tracked virtual 3D audio, with reverberation simulating a small classroom. One of the talkers (the target) is rendered with different reverberation than the others, and the participants’ task is to locate and identify the target talker. An HMD presents a blank gray background with identical human-sized gray capsules at each talker location. Although the target cannot be identified from the visual information alone, the capsules support the participants’ perception of the talker locations, helping them direct their attention to each as they search for the target. Once the target is located, participants turn to face it and press a button, which both indicates the target location and produces a virtual button box for selecting the target talker's identity. We refer to this as a Multisensory Identification, Segregation, and Localization (MISL) task because it naturally combines those elements rather than isolating a single task dimension. By asking the listener both where and who the target is, the MISL task accommodates assessment in users whose cognitive strategies emphasize different salient features (e.g., spatial versus voice differences), as they might in real-world listening. Even so, the design follows established psychophysical principles, in this case “odd one out” discrimination. Other MISL tasks explored in the lab employ multi-interval or cued match-to-sample tasks (e.g., find a particular talker in the scene).
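Because a MISL trial asks both where and who, each response yields two scores. A sketch of how one response could be evaluated (the 15-degree tolerance and the data layout are my assumptions, not parameters from the study):

```python
def score_misl(target, response, loc_tol_deg=15.0):
    """Score one MISL trial on both dimensions. `target` and `response`
    are (azimuth_deg, talker_id) pairs; localization counts as correct
    within loc_tol_deg, and identification must match exactly."""
    (t_az, t_id), (r_az, r_id) = target, response
    error = abs((r_az - t_az + 180.0) % 360.0 - 180.0)  # circular difference
    return {"localized": error <= loc_tol_deg, "identified": r_id == t_id}
```

Keeping the two scores separate is what lets the task reveal which strategy a listener leans on: a participant might reliably find the odd talker's location while misidentifying the voice, or vice versa.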

Enhancing Participant Engagement. For many users, immersive VR experiences are surprisingly rewarding and enjoyable. In collaboration with colleagues Erick Gallun, PhD, of the VA National Center for Rehabilitative Auditory Research, and Aaron Seitz, PhD, of the UC Riverside Brain Game Center, we are exploring whether VR-based approaches can make basic auditory assessments more fun and engaging. In one application, we added game elements to a classic two-interval, forced-choice procedure for auditory discrimination. A sequence of two sounds is presented to a listener, who is then asked to choose which interval contains a target, for example, a specific modulation or a tone in noise. Typically, the intervals are marked with lights on a handheld button box, and the participant presses one of two buttons to indicate the target. In our version, participants use an HMD to view a simple 3D arena. Each sound interval is accompanied by the appearance of a visual object, for example, a blue sphere or a green cube. The participant turns toward the object that appeared with the target sound and presses a button, launching a marker that provides feedback on the selected item (by proximity) and the correctness of the response (by marker color). Importantly, the underlying task does not differ from the classic psychophysical approach and can be used with any discrimination paradigm.
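The bookkeeping behind this gamified two-interval trial is slight: map the chosen object back to its interval and color the feedback marker. A sketch, assuming a fixed interval-to-object mapping (the object names follow the example above; the color scheme is hypothetical):

```python
# Hypothetical mapping from stimulus interval to the visual object that
# appears alongside it in the virtual arena.
OBJECTS = {1: "blue_sphere", 2: "green_cube"}

def score_trial(target_interval, chosen_object):
    """Map the object the participant turned toward back to an interval,
    then return (correct, marker_color) for the feedback marker."""
    chosen_interval = next(k for k, v in OBJECTS.items() if v == chosen_object)
    correct = chosen_interval == target_interval
    return correct, ("green" if correct else "red")
```

Because only the response surface changes, the same scoring feeds directly into whatever psychometric analysis the classic button-box version would use.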

Early feedback from adult and adolescent participants suggests better engagement with the VR task, which we anticipate could result in reduced fatigue, better compliance with task instructions, improved data quality, and greater data quantity due to longer on-task periods. Even greater benefits could be achieved by adding an appropriate visual narrative, for example, “stopping the broken robots” or “capturing rare insects.” The approach is also well suited to adaptive procedures, which mimic engaging video games by increasing the level of difficulty over time within a single session (to track a threshold) or across sessions (for training purposes).
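The adaptive procedures mentioned above are standard psychophysics regardless of the display. For instance, a two-down/one-up staircase (Levitt's rule) converges near 70.7% correct; a minimal sketch with a simulated listener follows. The starting level, step size, and threshold-from-reversals estimate are generic textbook choices, not the lab's specific settings.

```python
def two_down_one_up(start, step, n_reversals, respond):
    """Run a 2-down/1-up adaptive track: the level drops after two
    consecutive correct responses and rises after each error, so the
    track converges near 70.7% correct. `respond(level)` -> bool
    simulates (or collects) one response. Returns the mean of the
    levels recorded at the first n_reversals direction reversals."""
    level, run, direction, reversals = start, 0, None, []
    while len(reversals) < n_reversals:
        if respond(level):
            run += 1
            if run < 2:
                continue
            run, new_dir = 0, "down"
            level -= step
        else:
            run, new_dir = 0, "up"
            level += step
        if direction is not None and new_dir != direction:
            reversals.append(level)
        direction = new_dir
    return sum(reversals) / len(reversals)

# A deterministic listener who hears the target only above level 0:
print(two_down_one_up(5.0, 1.0, 6, lambda lvl: lvl > 0))  # 0.5
```

In a VR setting the same track simply drives the stimulus level behind the scenes while the game-like response interface stays unchanged, which is what lets difficulty ramp feel like game progression.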

In conclusion, VR-based approaches offer numerous advantages for auditory assessment. By replacing the sound booth with a more appropriate visual context, they enhance immersion, minimize distraction, and reduce conflicting multisensory information. In an important sense, they can bring the real world into the lab through naturalistic and complex tasks, such as MISL tasks, that tap into realistic multidimensional listening strategies. And they can create engaging experiences that reduce fatigue, enhance the quality and quantity of data, and appeal to younger listeners.

Notably, the use of these technologies is at a very early and exploratory stage. There is a clear need to develop and validate tools that make immersive technologies accessible to researchers, clinicians, and diverse user populations. Implemented correctly, these approaches can fundamentally reshape not only how we learn about our patients’ auditory abilities but also how our patients engage with future immersive technologies for communication, entertainment, education, and work.

Copyright © 2019 Wolters Kluwer Health, Inc. All rights reserved.