Selecting Scenarios for Hearing-Related Laboratory Testing : Ear and Hearing


Eriksholm Workshop: Ecological Validity

Selecting Scenarios for Hearing-Related Laboratory Testing

Smeds, Karolina1,2; Gotowiec, Sarah1,2; Wolters, Florian1,2; Herrlin, Petra1; Larsson, Josefina1; Dahlquist, Martin1

doi: 10.1097/AUD.0000000000000930



Extensive research has established that hearing loss is a disabling and growing individual and public health concern. A recent report documents that it is now a top-five contributor to the overall burden of disease in most European countries (Shield 2019). The report also describes the benefits of hearing aid use compared with untreated hearing loss, such as increased quality of life and overall health and a lower risk of depression and cognitive decline. For these benefits to be realized in hearing aid users’ everyday lives, hearing aid development and fitting depend on evaluation methods with a high degree of ecological validity, that is, methods that closely reflect real-life hearing-related function (Keidser et al., 2020, this issue, pp. 5S-19S).

Evaluation of hearing aid performance or benefit can be performed in research participants’ own environments or in laboratory settings. Several aspects of research design must be treated carefully for laboratory tests to produce ecologically valid findings. First, relevant test scenarios need to be selected. These scenarios should describe both the acoustical characteristics of the environment and the listening activities performed. Next, the selected test scenarios need to be carefully implemented in audio or audiovisual form. The scenarios should sound (and preferably also look) realistic, but it is also important that the hearing aids behave as they would in a corresponding real-life situation. Finally, the listening activities performed in the scenarios (e.g., speech conversation or music listening) should govern the outcome measures used in the laboratory test. All these design aspects directly affect a laboratory test’s ability to produce ecologically valid results.

The present article will focus on the selection of relevant test scenarios. Only a handful of studies have scoped the variety of listening situations that people commonly encounter in their daily lives and/or subsequently categorized them for laboratory use (Walden et al. 2004; Jensen & Nielsen 2005; Wagener et al. 2008; Wu & Bentler 2012; Wolters et al. 2016; Humes et al. 2018; Wu et al. 2018). Sometimes referred to as “prototype listening situations” (PLSs; Walden 1997; Walden et al. 2004; Wu et al. 2018), these listening situations have been described as “a set of situations that can represent a large proportion of the everyday listening situations experienced by individuals” (Wu et al. 2018, p. 294).

In the present article, the term “auditory reality” refers to the variety of everyday listening environments experienced by an individual, “listening situations” refers to situations encountered in everyday life, and “test scenarios” refers to the listening situations that are implemented in the laboratory or the clinic. The term PLS is used when researchers themselves have used the term for a suggested list of listening situations that could be appropriate to use for laboratory or clinical tests.

Keidser et al. (2020, this issue, pp. 5S-19S) listed four purposes of striving for high ecological validity. The development of a set of realistic scenarios that can be used for laboratory testing supports three of these four purposes. Purpose A (understanding) describes the need for knowledge about the role hearing plays in everyday life. Obtaining such knowledge constitutes a necessary first step in creating realistic laboratory scenarios. Carefully selected scenarios are also a prerequisite for developing protocols to assess everyday hearing ability and hearing-intervention benefit in the laboratory (Purpose B, development). Finally, a general set of test scenarios could be complemented with profession-specific lists of scenarios for occasions when the purpose of testing is to investigate an individual’s ability to function in a particular profession (Purpose C, assessment).

The present article explores three research aims related to laboratory test scenarios. The first research aim was to determine whether a cohesive set of scenarios for laboratory testing had been published in the previous literature. Part 1 of the article addresses this aim by reviewing the purposes, results, and limitations of prior research on listening situations and laboratory test scenarios. This section presents a literature review of research that investigates laboratory test scenarios and PLSs (Walden et al. 1984, 2004; Walden 1997; Wu et al. 2018), investigates and categorizes everyday listening situations (Jensen & Nielsen 2005; Wagener et al. 2008; Wu & Bentler 2012; Humes et al. 2018), and presents a listening situation framework that can be used when selecting realistic laboratory test scenarios (Wolters et al. 2016). Taken together, this body of literature shows wide variation in both methods and selection criteria for potential test scenarios.

After establishing an understanding of people’s auditory reality, the next step is to decide which listening situations should be selected for laboratory testing. Related to this, the second research aim was to consider possible criteria for the selection of laboratory test scenarios. Part 2 outlines one potential combination of selection criteria by presenting a field trial study that used ecological momentary assessments (EMA; Holube et al., 2020, this issue, pp. 79S-90S; Stone & Shiffman 1994) to study aspects of people’s everyday listening situations. The data analysis demonstrates different ways that one can filter EMA data depending on the selection criteria of interest, in this case self-reported occurrence, importance, and difficulty.

The third research aim was to explore the test scenarios that various selection criteria lead to. To do so, Part 3 compares the data presented in Part 2 with data from selected prior studies from Part 1. The subsequent Discussion section briefly discusses ways to implement laboratory test scenarios (with a focus on the listening activities or tasks performed) and finishes with a short discussion of the strengths and limitations of the various data collection methods used to investigate auditory reality. The article concludes with a summary of the collected studies.


The first step toward creating more realistic research materials is to investigate and characterize the auditory activities and acoustic environments encountered in individuals’ daily lives (Wu et al. 2018). Early research in this area by Walden et al. (1984) evaluated the feasibility of self-report measures of amplification success. Using data from a retrospective questionnaire about perceived benefits of hearing aid amplification, factor analysis identified four active listening situations that were categorized based on environmental factors. These were later characterized as four PLSs (Walden 1997). Three of these situations include speech (in quiet, in background noise, or with reduced information). The fourth listening situation is “nonspeech or not live speech” and was described as being of “relatively minor importance.” However, only 8 of the 64 questionnaire items asked about nonspeech situations, which makes it difficult to draw conclusions about the importance of these situations. Moreover, although the factor analysis was data-driven, the item generation was not. Therefore, it is unclear whether the listening situations in the questionnaire span the range of those encountered in people’s daily lives.

Another way to explore individuals’ everyday listening situations is to use interview methodology. The Client Oriented Scale of Improvement (COSI) by Dillon et al. (1997) was created as a rehabilitative tool to subjectively measure reduction in disability after hearing rehabilitation. Clients choose up to five listening situations in which they feel they need help with their hearing, and these self-reports are sorted into one of 16 predefined categories. Using data collected with the COSI, Dillon et al. (1999) reported that the two most frequently occurring situations in which participants wanted assistance with their hearing were “listening to TV/radio” and “conversation with one or two others in quiet.” The COSI taps directly into the situations that are self-perceived to be challenging, and therefore the highlighted listening situations are likely important and relevant.

One drawback of retrospective methods such as the COSI or various other questionnaires is the possibility of recall bias. In response to this, later studies have used EMA (Holube et al., 2020, this issue, pp. 79S-90S; Stone & Shiffman 1994) to sample participants’ listening situations as they happen. EMA has been shown to be a feasible and valid research tool for understanding the daily listening experiences of adults who use hearing aids (Galvez et al. 2012; Timmer et al. 2017). A small number of studies have used EMA to investigate everyday listening situations. Walden et al. (2004) asked participants for their preferred microphone configuration in experienced listening situations, which the participants then classified according to a categorization tree. Based on these categorizations, the authors suggested a model with 24 PLSs. The two most frequently occurring PLSs were “close speech face-to-face, with diffuse background noise in a low-reverberation environment” and “close speech face-to-face, with diffuse background noise in a high-reverberation environment.” This research has several limitations: participants were asked to report solely on active listening situations; the categorization tree includes only acoustic factors, in particular those important for evaluating microphone configurations; and the authors do not suggest a limited set of PLSs, which could be useful for creating laboratory-based tests.

In a paper-and-pencil EMA study, Wu and Bentler (2012) asked test participants to classify listening events according to five environment and six activity categories. The authors used the descriptive data, together with noise dosimeter recordings, to learn more about the participants’ everyday listening situations. The most frequently occurring situations were “no or little conversation or speech-listening” and “speech listening to media,” both in a home environment. The authors do not indicate how the environment and activity categories were chosen.

In two further studies, test participants made audio recordings of self-selected listening situations. In an EMA study by Jensen and Nielsen (2005), participants categorized their recordings into seven categories (in situ), and evaluated occurrence, importance, and performance in the recorded situations. The most frequently occurring situation was “everyday sounds,” followed by “conversation with several persons” and “conversation with one person.” In a study by Wagener et al. (2008), participants similarly made recordings in their everyday lives. They later returned to the laboratory and listened to their own recordings and rated occurrence, importance, problems, and concerns in the situations. The situations rated to occur most frequently were “work with machines/housework” and “reading/office work.”

Humes et al. (2018) used continuously collected hearing aid data to investigate the acoustic environments encountered by a large group of older hearing aid users. The hearing aids automatically classified the listening situations experienced by the participants into one of seven categories. The most commonly occurring environment was “quiet,” closely followed by “moderate-level speech in quiet.” No self-reported data were included, which meant that the acoustic classifications were limited to the combination of signal-processing parameters that the hearing aid could access, and no information about intent or activities was reported.

A recent study by Wu et al. (2018) noted a lack of research in which speech and noise levels, availability of visual cues, and speech and noise locations were examined within a single study. Their study aimed to collect empirical data with which to develop a set of PLSs that represents typical speech listening situations. Participants made all-day audio recordings and filled out in-situ surveys (prompted approximately every 2 hr) describing speech location, visual cues, noisiness, and location of noise. Speech and noise levels were derived from the recordings, and all data were analyzed using a cluster analysis from which 14 PLSs emerged. The PLSs include acoustical information about speech levels, noise levels, signal to noise ratios (SNRs), visual cues, and locations of speech and noise sources. The most frequently occurring PLSs were “speech face-to-face with visual cues in quiet” and “speech face-to-face with visual cues in diffuse background noise” (9 dB SNR). No information about activities or tasks was reported, and only speech situations were included in the analysis.

The results of these EMA studies may constitute an important step toward selecting a set of valid laboratory test scenarios that could be used to increase the ecological validity of hearing research. However, the reviewed EMA studies either focus on speech listening or represent situations that were self-selected by participants. Therefore, the reported listening situations may not cover the entire range of listening situations encountered by people in their everyday lives. A study by Wolters et al. (2016) suggests that common listening situations extend beyond those that include speech communication. Developed using a structured literature review, their common sound scenarios (CoSS) framework categorizes listening situations based on contextual classifications of intention and task. The CoSS framework is a tool that can be used when selecting realistic test scenarios. Three intention categories (speech communication, focused listening, and nonspecific) are divided into seven task categories. For each task category, two sound scenarios are described, creating 14 different sound scenarios that are further described according to occurrence, importance, and difficulty to hear. By including nonactive listening situations, CoSS spans a wider range of situations than most previous research and highlights the need to consider listening situations beyond those that include speech communication.

The studies reviewed here each investigate relevant everyday listening situations that can, in turn, be used by other researchers to develop scenarios for laboratory testing. However, taken together, the collection of research is limited by a lack of research consensus. For example, many of these studies focus on acoustics, and not listening and communication task demands or characteristics, which may be essential features of everyday listening situations. Furthermore, there is considerable variation in the criteria for inclusion as a test scenario or PLS, and in the approaches used to categorize them. Specifically, there has been no agreement on what criteria to prioritize when creating laboratory test scenarios.

Below, we present an EMA study on everyday listening situations, where three potential criteria for selecting laboratory test scenarios were examined. The test participants indicated (among other things) frequency of occurrence of the situation, importance of hearing well, and difficulty to hear in each situation that they reported. This type of data illustrates one path to selecting scenarios for laboratory testing.


A group of hearing-impaired test participants completed an EMA field trial conducted by the Office of Research in Clinical Amplification (ORCA) Europe laboratory. The EMA trial was part of a larger study that also included laboratory testing.


During the field trial, participants provided information about their auditory reality by filling in online questionnaires probing their momentary listening activity and sound environment. All data collection was performed in compliance with the Nuremberg Code (JAMA 1996) on ethical treatment of participants in medical research.

Participants and Hearing Aids

Nineteen experienced hearing aid users (mean age 74 years, range 42 to 90 years) with binaural sensorineural hearing loss within the boundaries for the standard audiograms N2–N4 and S2 (Bisgaard et al. 2010) were recruited from ORCA Europe’s test participant database. Participants were provided with a pair of Unique Fusion 440 hearing aids (Widex, Lynge, Denmark) and fitted according to the manufacturer’s recommended fitting procedure.

EMA Setup and Data Collection

A custom EMA solution was implemented on a Motorola G4 Play mobile phone by using a Google Forms online questionnaire and a locally installed reminder application. Participants described their current listening situation in a mandatory free-text item and categorized the listening situation into one of the seven CoSS task categories. They also described background noise (if present) and any associated annoyance, and then rated the perceived importance of hearing well in the situation, the difficulty of hearing, and how frequently the situation occurred in their everyday lives (see the table in Supplemental Digital Content 1, which gives an overview of the questionnaire content).

Data collection lasted 9.5 days on average (range 6–11 days). Participants were prompted seven times a day (at 2-hr intervals) to answer the EMA questionnaire. This interval was chosen with the aim of covering participants’ everyday listening situations in their entirety. To reduce predictability, the time of the first prompt of the day varied between weekdays (8:00, 8:30, or 9:00). Participants could also self-initiate questionnaires at any time without being prompted.
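The prompting protocol can be made concrete with a short scheduling sketch. It assumes, hypothetically, that the three first-prompt times simply rotate from one study day to the next; the article states only that the first prompt varied among 8:00, 8:30, and 9:00, and does not describe the reminder application’s actual logic. The function name `prompt_schedule` is illustrative.

```python
from datetime import date, datetime, timedelta

# First-prompt times named in the article; the day-by-day rotation is an assumption.
FIRST_PROMPT_TIMES = [(8, 0), (8, 30), (9, 0)]
PROMPTS_PER_DAY = 7
INTERVAL = timedelta(hours=2)


def prompt_schedule(start, n_days):
    """Return {day: [prompt datetimes]} for a hypothetical n-day field trial."""
    schedule = {}
    for i in range(n_days):
        day = start + timedelta(days=i)
        hour, minute = FIRST_PROMPT_TIMES[i % len(FIRST_PROMPT_TIMES)]
        first = datetime(day.year, day.month, day.day, hour, minute)
        # Seven prompts spaced two hours apart, starting at the first prompt.
        schedule[day] = [first + k * INTERVAL for k in range(PROMPTS_PER_DAY)]
    return schedule


if __name__ == "__main__":
    for day, times in prompt_schedule(date(2019, 3, 4), 3).items():
        print(day, [t.strftime("%H:%M") for t in times])
```

With seven prompts two hours apart, each day’s prompted window spans 12 hours, for example 8:00 to 20:00 on a day that starts at 8:00. Self-initiated reports fall outside such a schedule.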


The data analysis included 1131 EMA reports. Median compliance for responding to the EMA prompts was 85% across test participants (range 38 to 99%). The total proportion of self-initiated reports was 11%. Most reports were from an indoor home environment (64%), 16% of the reports were made indoors in a public environment, 8% on transportation, and 12% were made outdoors.
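Compliance and the self-initiated share are simple ratios, but they use different denominators. The sketch below assumes that compliance is computed per participant as answered prompts divided by delivered prompts, with self-initiated reports excluded, and that the summary is the median across participants; the function names and counts are hypothetical, not the study data.

```python
from statistics import median


def compliance(answered_prompts, delivered_prompts):
    """Share of delivered prompts that a participant answered (self-initiated reports excluded)."""
    if delivered_prompts <= 0:
        raise ValueError("delivered_prompts must be positive")
    return answered_prompts / delivered_prompts


def self_initiated_share(n_self_initiated, n_total_reports):
    """Share of all reports that were started without a prompt."""
    return n_self_initiated / n_total_reports


# Hypothetical per-participant counts: (answered prompts, delivered prompts).
counts = {"P01": (60, 70), "P02": (27, 70), "P03": (69, 70)}
rates = {pid: compliance(a, d) for pid, (a, d) in counts.items()}
median_compliance = median(rates.values())
print(f"median compliance: {median_compliance:.0%}")
```

A median summary is less sensitive than a mean to a single participant with low compliance, which matters when the range is as wide as 38 to 99%.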

The distribution of responses across the three CoSS intention categories (Fig. 1A) shows that approximately one-third of the responses were categorized as speech communication. Within this category, the most common task was speech communication with one other person (Fig. 1B). Approximately one-fourth of the responses were categorized as focused listening, the most common task being listening through media, usually TV or radio. Nonspecific listening constituted almost half of the responses, and here the main task was passive listening. Most passive listening situations occurred in indoor home environments and were described as happening in quiet. Common situations included computer work, reading, and eating. Passive listening situations were also encountered in public indoor environments, outdoors, and during transport. These situations were characterized by a lack of target sounds.

Fig. 1.
Distribution of EMA reports divided into the three CoSS intention categories (A) and the seven CoSS task categories (B). CoSS indicates common sound scenarios; EMA, ecological momentary assessment.

Common, Important, and Difficult Listening Situations

Participants rated how often each situation occurred, how important it was to hear well, and how difficult it was to hear in the situation. The responses to these self-reported criteria were used to filter the dataset. First, the CoSS task category distribution for “daily” situations is presented. Next, “very important” and “very difficult” situations are reported.
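The filtering described above amounts to selecting the reports that match a criterion and recomputing the task distribution within that subset. A minimal sketch, assuming a hypothetical record layout (`Report` with `task`, `occurrence`, `importance`, and `difficulty` fields) and placeholder task labels; only the response labels “daily,” “very important,” and “very difficult” come from the article.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    """One EMA report; field names and placeholder labels are hypothetical."""
    task: str        # CoSS task category chosen by the participant
    occurrence: str  # e.g., "daily"
    importance: str  # e.g., "very important"
    difficulty: str  # e.g., "very difficult"


def filter_reports(reports, **criteria):
    """Keep reports whose fields equal every given value (logical AND)."""
    return [r for r in reports
            if all(getattr(r, field) == value for field, value in criteria.items())]


def task_distribution(reports):
    """Percentage of reports per task category, relative to the given subset."""
    counts = Counter(r.task for r in reports)
    total = sum(counts.values())
    return {task: 100 * n / total for task, n in counts.items()} if total else {}


reports = [
    Report("passive listening", "daily", "not important", "not difficult"),
    Report("focused listening, media", "daily", "very important", "not difficult"),
    Report("conversation, one person", "weekly", "very important", "very difficult"),
    Report("passive listening", "daily", "somewhat important", "not difficult"),
]
print(task_distribution(reports))
print(task_distribution(filter_reports(reports, occurrence="daily")))
```

Because the percentages are recomputed within each subset, a task category can grow in a filtered chart simply because other categories were removed.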

Situations judged to occur “daily” (Fig. 2) comprised 67% of the dataset. Compared with the general CoSS task distribution, which includes all EMA reports (Fig. 1B), selecting only situations that occur “daily” decreased the proportion of speech communication situations with more than two people and increased the proportions of focused listening to media (mainly TV and radio) and passive listening situations.

Fig. 2.
CoSS task category distribution for the situations rated to occur “daily.” A total of 67% of the reported situations were rated to occur “daily.” CoSS indicates common sound scenarios.

Situations judged to be “very important” to hear well in (Fig. 3A) constitute 24% of the dataset. Compared with the general CoSS task distribution that includes all EMA reports (Fig. 1B), selecting only the “very important” situations increased the proportion of speech communication and focused listening to media situations, and correspondingly decreased the nonspecific situations. When the dataset was limited to “very important” situations that occur “daily” (Fig. 3B), focused listening to media constituted half of the reported situations.

Fig. 3.
CoSS task category distribution for the situations rated to be “very important” to hear well in (A), “very important” and occurring “daily” (B), “very difficult” to hear in (C), and “very difficult” and occurring “daily” (D). The proportion of responses (of the total 1131 reports) is indicated above each pie chart. CoSS indicates common sound scenarios.

Situations judged to be “very difficult” to hear in (Fig. 3C) constituted 8% of the dataset. In half of these situations, it was considered “very important” to hear. Compared with the general CoSS task distribution that includes all EMA reports (Fig. 1B), selecting only the “very difficult” situations increased the proportion of situations with speech communication and decreased the proportion of nonspecific situations. This subset of listening situations also showed an increase in situations with focused listening to live sounds. Compared with the “very important” situations (Fig. 3A), the proportion of nonspecific listening situations is considerably higher for the “very difficult” situations (Fig. 3C). In 24% (8 + 16) of the “very difficult” listening situations, participants were not engaged in a conversation or actively listening to a sound source of interest. All of these passive listening situations were described as taking place in some sort of noise. Transport was the most commonly reported location category, and the two most commonly reported noise types were speech and transportation noise.

When the dataset was limited to “very difficult” situations that occur “daily” (Fig. 3D), focused listening to media constituted more than a third of the reported situations, and situations with focused listening to live sounds were not represented. However, it is important to note that “very difficult” situations that occur “daily” only constituted 3% of all reported situations.
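When criteria are combined, two different percentages are in play: the subset’s share of all reports (the figure shown above each pie chart in Fig. 3) and the task distribution within the subset (the pie itself). The sketch below, using hypothetical tuples and illustrative function names rather than study data, keeps the two apart; a subset such as the “very difficult” and “daily” reports can show a clear task pattern while representing only 3% of all reports.

```python
from collections import Counter

# Hypothetical reports as (task, occurrence, difficulty) tuples; not study data.
REPORTS = [
    ("conversation, one person", "daily", "very difficult"),
    ("focused listening, media", "daily", "very difficult"),
    ("passive listening", "daily", "not difficult"),
    ("focused listening, media", "daily", "not difficult"),
    ("conversation, several persons", "monthly", "very difficult"),
    ("passive listening", "weekly", "very difficult"),
    ("passive listening", "daily", "not difficult"),
    ("focused listening, media", "weekly", "not difficult"),
]


def subset(reports, occurrence=None, difficulty=None):
    """Apply optional criteria jointly (logical AND); None means no constraint."""
    return [r for r in reports
            if (occurrence is None or r[1] == occurrence)
            and (difficulty is None or r[2] == difficulty)]


def share_of_total(sub, reports):
    """Percentage of all reports that fall in the subset."""
    return 100 * len(sub) / len(reports)


def within_subset(sub):
    """Task distribution inside the subset."""
    counts = Counter(task for task, _, _ in sub)
    return {task: 100 * n / len(sub) for task, n in counts.items()}


hard_daily = subset(REPORTS, occurrence="daily", difficulty="very difficult")
print(share_of_total(hard_daily, REPORTS), within_subset(hard_daily))
```

Reporting both numbers side by side guards against over-reading a pie chart that is built from a small subset.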

Noise Distribution

Test participants also rated the noisiness of the situations they encountered. The results for the five response alternatives (from questions 3a and 3b in Supplemental Digital Content 1) are presented in Figure 4. When all situations are included (Fig. 4A), no noise or no annoying noise was reported in more than 80% of the situations. The distribution for situations judged to be “very important” to hear well in (Fig. 4C) is almost identical, whereas the distribution for situations that occurred “daily” (Fig. 4B) shows an increase in situations without background noise and a slight decrease in situations with annoying noise. The distribution for the situations judged to be “very difficult” to hear in (Fig. 4D) is markedly different from the others. Here, noise was reported to be annoying in more than half of the situations, and a majority of the situations with annoying noise were described as very annoying.
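The five noise categories can be derived by combining the presence and annoyance answers. This sketch assumes, hypothetically, that question 3a asks whether background noise is present and that question 3b, asked only when noise is present, offers four annoyance levels (not, a little, moderately, very annoying); the exact wording is in Supplemental Digital Content 1, and the function names are illustrative. The grouping “no noise or no annoying noise” used in the text then corresponds to the sum of the first two categories.

```python
from collections import Counter

# Assumed response options for question 3b (asked only when noise is present).
ANNOYANCE_LEVELS = ("not annoying", "a little annoying", "moderately annoying", "very annoying")


def noise_category(noise_present, annoyance=None):
    """Combine presence (3a) and annoyance (3b) into one of five categories."""
    if not noise_present:
        return "no noise"
    if annoyance not in ANNOYANCE_LEVELS:
        raise ValueError(f"unexpected annoyance rating: {annoyance!r}")
    return annoyance


def noise_distribution(answers):
    """Percentage of reports per category; `answers` holds (present, annoyance) pairs."""
    counts = Counter(noise_category(p, a) for p, a in answers)
    total = sum(counts.values())
    return {cat: 100 * n / total for cat, n in counts.items()}


def share_without_annoying_noise(answers):
    """Percentage of reports with no noise or with noise rated 'not annoying'."""
    dist = noise_distribution(answers)
    return dist.get("no noise", 0.0) + dist.get("not annoying", 0.0)


answers = [(False, None), (False, None), (True, "not annoying"), (True, "very annoying")]
print(noise_distribution(answers), share_without_annoying_noise(answers))
```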

Fig. 4.
Background noise presence and annoyance for all situations pooled (A), situations occurring “daily” (B), situations judged to be “very important” to hear well in (C) and “very difficult” to hear in (D). The proportion of responses (of the total 1131 reports) is indicated above each pie chart.

Participants also indicated the main background noise source. The background noise types were subsequently categorized as speech, music, or environmental sounds. The latter included all noise types which were not speech or music, for example, nature sounds, traffic, and machine noises. When the dataset was filtered to include only situations where the noise was judged as annoying (either little, moderately, or very annoying), environmental sounds were most frequently the source of annoying background noise (60% of reports), followed by speech (30%) and music (10%). The pattern was the same for situations described as “very important” to hear well in, and situations described as “very difficult” to hear in.
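The article does not state how the reported noise sources were coded into the three types; the coding may well have been done manually. As a hypothetical stand-in, a lookup table can assign speech and music sources and treat everything else as environmental sounds, mirroring the catch-all definition above. The keyword sets, annoyance labels, and function names are illustrative only.

```python
from collections import Counter

# Illustrative keyword sets; any source not listed is treated as environmental sound.
SPEECH_SOURCES = {"speech", "talking", "babble", "conversation"}
MUSIC_SOURCES = {"music", "background music", "singing"}
ANNOYING_LEVELS = {"a little annoying", "moderately annoying", "very annoying"}


def noise_type(source):
    """Map a free-text noise source to 'speech', 'music', or 'environmental'."""
    s = source.strip().lower()
    if s in SPEECH_SOURCES:
        return "speech"
    if s in MUSIC_SOURCES:
        return "music"
    return "environmental"  # catch-all: nature sounds, traffic, machines, etc.


def annoying_noise_distribution(reports):
    """Percentage of each noise type among reports rated at least a little annoying.

    `reports` is an iterable of (noise_source, annoyance_label) pairs.
    """
    types = [noise_type(src) for src, annoyance in reports if annoyance in ANNOYING_LEVELS]
    if not types:
        return {}
    counts = Counter(types)
    return {t: 100 * n / len(types) for t, n in counts.items()}


example = [
    ("traffic", "very annoying"),
    ("babble", "moderately annoying"),
    ("music", "not annoying"),
    ("dishwasher", "a little annoying"),
    ("Music", "very annoying"),
]
print(annoying_noise_distribution(example))
```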


The literature on listening situations, laboratory test scenarios, and PLSs reviewed in Part 1 presents a varied picture. The research offers concrete, yet different, sets of PLSs (Walden et al. 1984, 2004; Wu et al. 2018) or suggestions for laboratory test scenarios (Wolters et al. 2016). Until now, none of the suggested sets has been widely accepted by the research community. A broader review of research that examines people’s auditory reality found studies that focus on the listening activities performed in various situations (Walden et al. 1984; Wu & Bentler 2012), studies that focus on acoustic factors (Walden et al. 2004; Wu et al. 2018), and studies that combine investigations of activities with (limited) information about acoustics (Jensen & Nielsen 2005; Wagener et al. 2008). The ORCA Europe data are presented in Part 2 as an example of how EMA data can be used to learn about various aspects of auditory reality.

The following section examines various criteria for selecting which real-life listening situations should be used when developing a set of laboratory test scenarios (or PLSs). As potential test scenarios should address common situations (Wu et al. 2018), the frequency of occurrence of everyday listening situations is an important selection criterion. Based on prior research (Jensen & Nielsen 2005; Wagener et al. 2008; Wolters et al. 2016), we suggest that importance to hear well and difficulty to hear may be other relevant selection criteria. Below, we show how these three criteria affect the selection of scenarios for laboratory testing, using the data reported from the ORCA Europe study together with data from prior research.

Commonly Occurring Situations

Wu and Bentler (2012) and Jensen and Nielsen (2005) show data on the occurrence of different types of everyday situations. To facilitate a comparison of their results to the ORCA Europe EMA data, the six activity categories used by Wu and Bentler and the seven activity categories used by Jensen and Nielsen were grouped based on listening intentions. The three CoSS intention categories (speech communication, focused listening, and nonspecific listening) were used. Details of how the original categories were grouped can be found in Table 1 (last three lines). Further, Wu and Bentler’s data were pooled across all environment categories. This allowed all results to be compared with the results from the present study (as displayed in Fig. 1A). The comparison is displayed in Table 1.
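The regrouping used for Table 1 can be written as two steps: pool counts across environment categories (needed for Wu and Bentler’s data), then map each original activity category onto a CoSS intention and convert the totals to percentages. The category-to-intention mappings below follow Table 1; the function names and the counts in the usage example are hypothetical.

```python
from collections import defaultdict

# Original category number -> CoSS intention, as listed in Table 1.
WU_BENTLER_MAP = {
    1: "speech communication", 2: "speech communication", 3: "speech communication",
    4: "focused listening", 5: "focused listening",
    7: "nonspecific listening",
}
JENSEN_NIELSEN_MAP = {
    1: "speech communication", 2: "speech communication",
    3: "focused listening", 4: "focused listening", 6: "focused listening",
    5: "nonspecific listening", 7: "nonspecific listening",
}


def pool_environments(counts_by_env_and_activity):
    """Collapse {(environment, activity): count} into {activity: count}."""
    pooled = defaultdict(int)
    for (_environment, activity), n in counts_by_env_and_activity.items():
        pooled[activity] += n
    return dict(pooled)


def group_by_intention(counts_by_category, mapping):
    """Sum counts per CoSS intention and return percentages.

    An unmapped category raises KeyError rather than being silently dropped.
    """
    totals = defaultdict(int)
    for category, n in counts_by_category.items():
        totals[mapping[category]] += n
    grand_total = sum(totals.values())
    return {intention: 100 * n / grand_total for intention, n in totals.items()}


# Hypothetical counts by (environment, activity category); not the published data.
wu_counts = {("home", 1): 10, ("home", 7): 20, ("work", 1): 5, ("work", 4): 15}
print(group_by_intention(pool_environments(wu_counts), WU_BENTLER_MAP))
```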

TABLE 1. Occurrence of everyday listening situations across three studies

| | Current ORCA Europe EMA (Figure 1, left panel) | Wu and Bentler (2012; data from Table 2) | Jensen and Nielsen (2005; data from Figure 6) |
|---|---|---|---|
| Data collection method | Prompted EMA reports using smartphone | Unprompted paper-and-pencil EMA reports for “listening conditions lasting more than 10 min” | Audio recordings and paper-and-pencil EMA reports of sound environments in daily life |
| Number of participants | 19 | 27 | 18 |
| Age | 42–90 yr (mean 74 yr) | 40–88 yr (mean 66 yr) | 39–72 yr (mean 58 yr) |
| CoSS intention categories | | | |
| Speech communication | 31% | 32% (original activity categories 1–3) | 34% (original categories 1 and 2) |
| Focused listening | 24% | 30% (original activity categories 4 and 5) | 34% (original categories 3, 4, and 6) |
| Nonspecific listening | 45% | 38% (original activity category 7) | 32% (original categories 5 and 7) |

For the current data, the distribution of EMA responses across the three CoSS intention categories is presented. For Wu and Bentler’s (2012) and Jensen and Nielsen’s (2005) data, the response alternatives were grouped to match the CoSS categories (described in the text).
CoSS, common sound scenarios; EMA, ecological momentary assessment.

The data from Humes et al. (2018) and Wagener et al. (2008) could not be compared with the other three datasets. Humes et al. used a hearing aid classification of listening situations that could indicate when speech was present but could not distinguish speech conversation from focused listening to speech or from speech in a passive listening situation. Wagener et al. reported frequency of occurrence for a list of “signal groups,” but one recording could belong to several signal groups. Because the occurrences across signal groups therefore did not sum to 100%, the results could not be compared with the datasets reported in Table 1.
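The comparability problem with multi-label data can be seen in a few lines: when one recording may carry several signal-group labels, per-group occurrence is computed against the number of recordings, so the percentages can sum to more than 100% and cannot be split into mutually exclusive shares like those in Table 1. The recordings, group labels, and function name below are hypothetical.

```python
from collections import Counter

# Hypothetical recordings, each tagged with one or more signal groups.
recordings = [
    {"speech", "music"},
    {"speech"},
    {"machines", "speech"},
    {"music"},
]


def occurrence_per_group(recs):
    """Percentage of recordings containing each signal group (multi-label)."""
    counts = Counter(group for rec in recs for group in rec)
    return {group: 100 * n / len(recs) for group, n in counts.items()}


occurrence = occurrence_per_group(recordings)
print(occurrence, sum(occurrence.values()))
```

Here four recordings produce per-group occurrences of 75%, 50%, and 25%, which sum to 150%.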

The proportions of intentions presented in Table 1 are similar across the three datasets. In particular, the occurrences of speech communication situations are remarkably similar. In contrast, the proportion of nonspecific listening appears higher in the current dataset than in the other two datasets. Because the samples are small and the proportions cannot be compared statistically, these differences should not be overinterpreted. However, some of the differences may be explained by the test methodologies used. In Wu and Bentler’s (2012) study, the reports were unprompted, and test participants were instructed to describe listening conditions that lasted longer than 10 minutes. Those instructions may mean that purely passive listening situations were not recorded at all (if they were not thought of as “listening conditions”) or were under-represented (since they may last for a long time but be described only once). In Jensen and Nielsen’s (2005) study, the audio recordings were also unprompted. In this study, test participants were instructed that all types of sound environments were relevant to record, but that only one recording of each type of situation was necessary. The prompted test paradigm used when collecting the ORCA Europe data probably captured a higher number of nonspecific listening situations.

Based on the current comparison, a focus on commonly occurring listening situations means that a set of laboratory test scenarios should include speech communication situations, focused listening to speech or other sounds, but also situations without focused listening.

Important Situations

Another criterion for choosing a set of test scenarios could be the rated importance to hear well in a listening situation. In the current dataset, 24% of the reported situations were judged to be “very important” to hear well in. Speech communication and focused listening to media (mainly TV and radio) emerged as the two most frequently occurring situations in which it was important to hear well (Fig. 3A, B).

In Wagener et al.’s (2008) study, the medians of the subjective ratings indicated that it was “important to hear” in most situations except for three nonspecific situations, which were rated “less important to hear in.” In the Jensen and Nielsen (2005) study, the researchers asked a slightly different question, focusing on the importance of the situation rather than the importance of hearing well in the situation. The most important situation in their study was “conversation with one person,” followed by “conversation with several persons.” When the results were weighted based on occurrence, “conversation with one person” was still judged to be most important, but it was closely followed by “TV/radio—informative programs,” a result similar to the ORCA Europe result.
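Weighting by occurrence can reorder an importance ranking. The article does not describe how Jensen and Nielsen (2005) weighted their results; the sketch below simply multiplies a hypothetical mean importance rating by a hypothetical occurrence proportion, one of several possible schemes, to show how a rarer situation can drop below a frequent one. Situation labels, numbers, and function names are illustrative only.

```python
# Hypothetical (mean importance rating on a 1-5 scale, proportion of occurrences); not study data.
situations = {
    "situation A (frequent, very important)": (4.6, 0.30),
    "situation B (rare, very important)": (4.4, 0.10),
    "situation C (frequent, important)": (4.0, 0.30),
}


def rank_by_importance(data):
    """Order situations by importance rating alone."""
    return sorted(data, key=lambda name: data[name][0], reverse=True)


def rank_by_weighted_importance(data):
    """Order situations by importance multiplied by occurrence (one possible weighting)."""
    return sorted(data, key=lambda name: data[name][0] * data[name][1], reverse=True)


print(rank_by_importance(situations))
print(rank_by_weighted_importance(situations))
```

Unweighted, situation B ranks second; after weighting, the more frequent situation C overtakes it, the same kind of reordering described in the text.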

In sum, based on the importance criterion alone, speech communication situations should have priority, but when importance is combined with occurrence, TV/radio situations should also be included in a set of laboratory test scenarios.
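The difference between ranking situations by importance alone and ranking them by importance weighted by occurrence can be made concrete with a short Python sketch. It assumes, for illustration only, that each situation category has a report count and a mean importance rating, and that “weighting by occurrence” means multiplying the mean rating by the category’s relative frequency; this is not necessarily the procedure used by Jensen and Nielsen (2005), and the function and field names are hypothetical.

```python
def rank_by_importance(categories):
    """Rank category names by mean importance rating alone (highest first).

    `categories` is a hypothetical mapping:
    name -> {"count": number of reports, "mean_importance": mean rating}.
    """
    return sorted(categories,
                  key=lambda name: categories[name]["mean_importance"],
                  reverse=True)


def rank_by_weighted_importance(categories):
    """Rank category names by mean importance times relative occurrence.

    Assumption: a simple product is used as the occurrence weighting.
    """
    total = sum(c["count"] for c in categories.values())
    if total == 0:
        raise ValueError("at least one report is needed")

    def score(name):
        c = categories[name]
        # Relative frequency of the category scales its mean importance.
        return c["mean_importance"] * c["count"] / total

    return sorted(categories, key=score, reverse=True)
```

With such a product, a category that is rated somewhat less important but occurs much more often (such as informative TV/radio programs) can move ahead of a rarer, highly rated category in the weighted ranking, which is the pattern described above.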

Difficult Situations

Yet another way of selecting test scenarios could be to focus on situations where it is difficult to hear. In the current dataset, only 8% of the total reported situations were judged to be “very difficult” to hear in, so the results should be interpreted with this limitation in mind. The low proportion of very difficult listening situations is similar to the results from Wagener et al. (2008), where all situations were rated as posing little or no problem.

In the ORCA Europe dataset, the proportion of situations described as noisy was higher in the “very difficult” subset than in any of the other data subsets (Fig. 4). Within our small subset of “very difficult” situations, speech communication situations again occurred most frequently, but a quarter of the reports fell in the nonspecific category. This suggests that speech situations are not the only ones that are difficult to cope with, and that it is often noise that makes a situation difficult, regardless of whether it involves speech communication.

These results are supported by the results from Jensen and Nielsen (2005). They did not specifically ask about difficulties, but instead included a rating of performance (with different performance criteria depending on the type of recording). The lowest performance was for the situation “other people’s speech or conversation,” followed by “other” and “everyday sounds.”

Taken together, these results suggest that using a difficulty criterion means that speech communication situations, but also noisy nonspecific situations, should be included in a set of laboratory test scenarios.

Selection Criteria Summary

Three criteria for selecting test scenarios for laboratory testing have been explored: occurrence, importance to hear well, and difficulty to hear. Many situations were judged to be important to hear well in, and data from the ORCA Europe study show that the characteristics of the “very important” subset were similar to those of the full dataset (for instance, the noise characteristics, Fig. 4). Few listening situations were described as very difficult, but these situations differed from the full dataset, with a high proportion of noisy situations (Fig. 4). Other potential selection criteria could also be relevant, for example, situations that are effortful (Timmer et al. 2018; Gablenz et al. 2019), worrisome (Wagener et al. 2008), or critical (Mansour et al. 2019).
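The three criteria can be thought of as different filters applied to the same set of everyday reports, after which category proportions and the share of noisy situations are compared across subsets, as in Fig. 4. The Python sketch below illustrates this under stated assumptions: the `Report` fields, rating labels, and category names are hypothetical and do not reproduce the ORCA Europe data format or analysis.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Report:
    """One hypothetical EMA report (illustrative fields, not the ORCA Europe schema)."""
    intention: str   # e.g. "speech communication", "focused listening", "nonspecific"
    importance: str  # e.g. "very important", "not important"
    difficulty: str  # e.g. "very difficult", "not difficult"
    noisy: bool      # whether the situation was described as noisy


def proportions(reports):
    """Return (share of each intention category, share of noisy reports)."""
    n = len(reports)
    if n == 0:
        return {}, 0.0
    counts = Counter(r.intention for r in reports)
    shares = {category: count / n for category, count in counts.items()}
    noisy_share = sum(r.noisy for r in reports) / n
    return shares, noisy_share


def subsets_by_criterion(reports):
    """Split reports into the subsets implied by the three selection criteria."""
    return {
        # Occurrence: every report counts, so frequent situations dominate.
        "occurrence": list(reports),
        # Importance: only situations where hearing well was "very important".
        "very important": [r for r in reports if r.importance == "very important"],
        # Difficulty: only situations rated "very difficult" to hear in.
        "very difficult": [r for r in reports if r.difficulty == "very difficult"],
    }
```

Calling `proportions` on each subset returned by `subsets_by_criterion` and comparing the noisy shares would, for example, show whether the “very difficult” subset is noisier than the full set of reports.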

The present study, along with the studies by Wu and Bentler (2012) and Jensen and Nielsen (2005), reports on everyday listening situations with a focus on the listening activity, intention, or task solved in the situation. In all three studies, around one-third of the situations involved speech communication. Reports of focused listening to speech or other sounds were nearly equally common in the three studies, but the proportion of nonspecific listening situations varied (32 to 45%). Taken together, the results suggest that situations with speech communication, TV/radio listening, and monitoring or passive listening (perhaps in noise) should be included in a valid set of laboratory test scenarios when the aim is to increase the ecological validity of research findings. The current findings indicate that a set of listening scenarios that includes only speech-in-noise situations does not adequately cover the listening situations that people encounter.

Some studies have focused on the acoustical characteristics of various listening situations. Three studies (Jensen & Nielsen 2005; Wagener et al. 2008; Wu & Bentler 2012) reported overall sound levels for several types of listening situations. More detailed acoustical information was reported for the 14 speech PLSs derived by Wu et al. (2018): measured acoustics (speech and noise levels) and self-reported acoustic markers (availability of visual speech cues, talker and noise location). This type of information can be very useful when creating more realistic acoustical implementations of speech scenarios in the laboratory or in the clinic. Future studies that combine information about listening activities or tasks with detailed acoustical descriptions of the listening situations encountered, for both speech and nonspeech situations, would be a useful addition to this line of research.


This section discusses possible ways to implement test scenarios in the laboratory or in the clinic, with a focus on the listening activities or tasks performed. The section concludes with a short discussion about strengths and limitations of the various methods used to investigate everyday listening situations.

Implementation of Laboratory Test Scenarios

After appropriate laboratory test scenarios have been selected, the next consideration is how to implement them so that the resulting laboratory tests are valid. Many research groups have investigated acoustical representations of everyday listening situations (Grimm et al. 2016; Oreinos & Buchholz 2016). Recently, attention has also turned to visual representations of everyday situations, and new test paradigms that include audiovisual presentation methods have been developed (Hadley et al. 2019; Hendrikse et al. 2019; Devesse et al. 2020; Hohmann et al., 2020, this issue, pp. 31S-38S).

The selection of outcome measures is also central to the implementation of studies aiming for highly ecologically valid findings. While the discussion of test scenario selection presented here suggests that speech communication is a common and important listening activity, real speech communication is seldom included in laboratory testing. Real conversations differ from speech listening in that conversations introduce social pressure to hear, understand, and participate. This increases the complexity of the task for test participants, and it also makes these qualities more complex to recreate in a test situation. However, understanding these qualities and creating valid implementations of them may be essential when developing test scenarios that represent people’s auditory reality (see articles by Carlile & Keidser, 2020, this issue, pp. 56S-67S, and Lunner et al., 2020, this issue, pp. 39S-47S, for further discussions).

In our research group, we have developed a novel laboratory test called Live Evaluation of Auditory Preference, which includes both real speech conversation and nonspecific listening in a paired-comparison test paradigm (Smeds et al. 2019). In the speech communication situations, the test participant and one or two test leaders engage in a real conversation. Scenarios in which the test participant watches TV or listens to the radio represent focused listening situations. We have also experimented with ways to tap into more nonspecific listening situations, for example, by including reading or vacuum cleaning. The method has been used to compare two hearing aid settings using a small set of mandatory test scenarios representing commonly occurring listening situations, with the option of including individually selected scenarios. The good correspondence between the laboratory-based Live Evaluation of Auditory Preference results and the results from an EMA study in the test participants’ everyday lives indicates a certain degree of validity in the laboratory results.

Methodological Strengths and Limitations

In the Introduction and in the Literature Review (Part 1), various methods for investigating everyday listening situations are presented. To summarize, EMA (Walden et al. 2004; Wu & Bentler 2012), questionnaires and interviews (Walden et al. 1984; Dillon et al. 1997), audio recordings (Jensen & Nielsen 2005; Wagener et al. 2008), dosimeters (Wu & Bentler 2012), and hearing aid logs (Humes et al. 2018) have all been used, either in isolation or in combination. Each of these methods has its own strengths and limitations.

By surveying test participants’ experiences in real time, rather than relying on retrospective reports, EMA can increase the ecological validity of findings by reducing memory bias. The advantage of prompted EMA responses, such as those used in the ORCA Europe data collection, is that information is collected on all types of listening situations, not only on the situations that test participants remember because they were associated with, for instance, difficulty. However, the method also has drawbacks. One limitation is that EMA might capture many everyday situations in which hearing ability is not central. Another potential limitation is that test participants might avoid responding in certain situations where responding to an EMA might be challenging or inappropriate (Schinkel-Bielefeld et al., accepted for publication in Am J Aud). These limitations may have skewed the data presented in Part 2.

One strength of retrospective questionnaire and interview methods is that they allow participants to report their own perceptions and experiences. In particular, the qualitative aspect of interviews allows researchers to gather subjective information that cannot otherwise be accessed through either categorical data collection or objective acoustic recordings (Rapport & Hughes, 2020, this issue, pp. 91S-98S). On the other hand, a notable limitation of these methods is that they may be influenced by self-presentation and/or recall biases. Researchers must therefore interpret findings from studies using interviews and questionnaires with this possible limitation in mind.

Audio recording is a method that provides a rich source of information when investigating listening situations. By listening to the recordings, researchers can often determine the listening task, and acoustical information can be derived from the recordings. This acoustical information can be either general, like overall sound levels (Jensen & Nielsen 2005; Wagener et al. 2008), or more detailed, like speech and noise levels and SNRs (Smeds et al. 2015; Wu et al. 2018). However, audio recordings raise privacy concerns, both regarding consent from nonparticipants and regarding test participants’ possible reluctance to record sensitive or challenging situations. In a recently developed open-source EMA system (Kowalk et al. 2017), this privacy issue is addressed by extracting acoustic features from the audio stream rather than storing audio data. Alternatively, if overall sound levels are considered sufficient, a noise dosimeter (Wu & Bentler 2012), various hearing aid logs (Humes et al. 2018), or recently developed hearing aid EMA applications (Timmer et al. 2017; Jensen et al. 2019; Schinkel-Bielefeld et al., accepted for publication in Am J Aud) can be used instead.
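The idea of storing acoustic features instead of audio can be illustrated with a minimal Python sketch that keeps only one RMS level per short frame and never stores the samples themselves. This is an illustration under assumptions, not the Kowalk et al. (2017) system, which defines its own feature set; the function name, the `frame_len` parameter, the assumption that samples are floats in [-1, 1], and the use of levels in dB relative to digital full scale (rather than calibrated sound pressure level, which would require a microphone calibration) are all choices made here for illustration.

```python
import math


def frame_levels_db(samples, frame_len, eps=1e-12):
    """Return one RMS level (dB re digital full scale) per non-overlapping frame.

    Only these summary levels are returned; the raw samples, which could contain
    intelligible speech, are not kept. A trailing partial frame is discarded.
    `eps` avoids taking the logarithm of zero for silent frames.
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # Root-mean-square amplitude of the frame.
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        levels.append(20 * math.log10(max(rms, eps)))
    return levels
```

Provided the frames are reasonably long (tens of milliseconds or more), a level track of this kind would still support summaries such as overall sound levels, while making it very unlikely that intelligible speech could be recovered from the stored data.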

The collective strengths and limitations of these methods indicate that future research could use a multimethod approach that draws on each of their strengths. For example, EMA data collection may capture certain situations that might not be considered in a retrospective interview, such as common unfocused listening situations that may sometimes be challenging for hearing aid users. On the other hand, a retrospective interview may capture listening situations that people avoid altogether because of their hearing difficulties, which EMA cannot capture. Moreover, self-described situations, whether captured through EMA, questionnaires, or interviews, can be complemented by audio recordings. A multimethod research design of this type would allow researchers both to build a fuller picture of the everyday listening situations that people experience and to better characterize their importance and difficulty. Such a rich description of people’s auditory reality would be an excellent starting point for selecting realistic and relevant laboratory test scenarios that are likely to increase the ecological validity of research findings.


In the present article, insights from auditory reality studies have been used to discuss and suggest relevant criteria for selecting laboratory test scenarios. These suggestions rely on the assumption that the ecological validity of research findings increases when laboratory test scenarios are representative of everyday listening situations. Further, we suggest that the selection of outcome measures should also be guided by a careful description of the selected test scenarios for increased ecological validity.

The article addressed three research aims. The first aim was to determine whether a cohesive set of scenarios for laboratory testing could be found in prior literature. The collected body of previous research on laboratory test scenarios, and more generally on auditory reality, shows little agreement on a list of test scenarios or PLSs, or on guidelines and criteria for choosing such a list. The second aim was to examine possible criteria for selecting these test scenarios, using data collected by ORCA Europe. The analyses illustrate how different selection criteria lead to different sets of listening situations.

The final aim was to use the ORCA Europe data, together with previous publications, to more broadly compare and discuss selection criteria for laboratory test scenarios. Substantiated by prior literature and the data presented here, we suggest that frequency of occurrence, importance to hear well, and rated difficulty to hear may be suitable selection criteria. We further suggest that the selection of a limited set of listening scenarios should be based on data collected in people’s everyday lives, using a multimethod approach, and should include both active and passive listening scenarios. It is also apparent that creating realistic test scenarios requires further research on new test paradigms in which audiovisual representations are combined with descriptions of the situations’ listening activities or tasks. Looking ahead, an agreed-upon set of carefully selected listening scenarios, based on studies of auditory reality, will contribute positively to the ecological validity of future research, development, and clinical studies.


The authors thank Volker Hohmann and Douglas Brungart for their constructive feedback on the article and Niels Søgaard Jensen and Arne Leijon for their input on an earlier version of the article.


    Bisgaard N., Vlaming M. S., Dahlquist M. Standard audiograms for the IEC 60118-15 measurement procedure. Trends Amplif, (2010). 14, 113–120
    Carlile S., Keidser G. Conversational interaction is the brain in action: Implications for the evaluation of hearing and hearing interventions. Ear Hear, (2020). 41(Suppl 1), 56S–67S
    Devesse A., van Wieringen A., Wouters J. AVATAR assesses speech understanding and multitask costs in ecologically relevant listening situations. Ear Hear, (2020). 41, 521–531
    Dillon H., Birtles G., Lovegrove R. Measuring the outcomes of a national rehabilitation program: Normative data for the Client Oriented Scale of Improvement (COSI) and the Hearing Aid User’s Questionnaire (HAUQ). J Am Acad Audiol, (1999). 10, 67–79
    Dillon H., James A., Ginis J. Client Oriented Scale of Improvement (COSI) and its relationship to several other measures of benefit and satisfaction provided by hearing aids. J Am Acad Audiol, (1997). 8, 27–43
    Gablenz P. v., Kowalk U., Bitzer J., Meis M., Holube I. Individual hearing aid benefit: Ecological momentary assessment of hearing abilities. In Kressner A. A., Regev J., Christensen-Dalsgaard J., Tranebjærg L., Santurette S., Dau T. (Eds.), International Symposium on Auditory and Audiological Research (ISAAR), (2019). 7, The Danavox Jubilee Foundation.
    Galvez G., Turbin M. B., Thielman E. J., Istvan J. A., Andrews J. A., Henry J. A. Feasibility of ecological momentary assessment of hearing difficulties encountered by hearing aid users. Ear Hear, (2012). 33, 497–507
    Grimm G., Kollmeier B., Hohmann V. Spatial Acoustic Scenarios in Multichannel Loudspeaker Systems for Hearing Aid Evaluation. J Am Acad Audiol, (2016). 27, 557–566
    Hadley L. V., Brimijoin W. O., Whitmer W. M. Speech, movement, and gaze behaviours during dyadic conversation in noise. Sci Rep, (2019). 9, 10451
    Hendrikse M. M. E., Llorach G., Hohmann V., Grimm G. Movement and gaze behavior in virtual audiovisual listening environments resembling everyday life. Trends Hear, (2019). 23, 2331216519872362
    Hohmann V., Paluch R., Krueger M., Meis M., Grimm G. The Virtual Lab: Realization and application of virtual sound environments. Ear Hear, (2020). 41(Suppl 1), 31S–38S
    Holube I., von Gablenz P., Bitzer J. Ecological momentary assessment (EMA) in audiology: Current state, challenges, and future directions. Ear Hear, (2020). 41(Suppl 1), 79S–90S
    Humes L. E., Rogers S. E., Main A. K., Kinney D. L. The acoustic environments in which older adults wear their hearing aids: Insights from datalogging sound environment classification. Am J Audiol, (2018). 27, 594–603
    JAMA. The Nuremberg Code. JAMA, (1996). 276, 1691
    Jensen N. S., Nielsen C. Auditory ecology in a group of experienced hearing-aid users: Can knowledge about hearing-aid users’ auditory ecology improve their rehabilitation? In Rasmussen A. N., Poulsen T., Andersen T., Larsen C. B. (Eds.), 21st Danavox Symposium, (2005). Danavox Jubilee Foundation. pp. 235–258
    Jensen N. S., Hau O., Lelic D., Herrlin P., Wolters F., Smeds K. Evaluation of auditory reality and hearing aids using an Ecological Momentary Assessment (EMA) approach. In Ochmann M., Vorländer M., Fels J. (Eds.), 23rd International Congress on Acoustics (ICA), (2019). Berlin, Germany: Deutsche Gesellschaft für Akustik.
    Keidser G., Naylor G., Brungart D., Caduff A., Campos J., Carlile S., Carpenter M., Grimm G., Hohmann V., Holube I., Launer S., Lunner T., Mehra R., Rapport F., Slaney M., Smeds K. The quest for ecological validity in hearing science: What it is, why it matters, and how to advance it. Ear Hear, (2020). 41(Suppl 1), 5S–19S
    Kowalk U., Kissner S., von Gablenz P., Holube I., Bitzer J. An improved privacy-aware system for objective and subjective ecological momentary assessment. In Santurette S., Dau T., Christensen-Dalsgaard J., Tranebjærg L., Andersen T., Poulsen T. (Eds.), International Symposium on Auditory and Audiological Research (ISAAR), (2017). 6, The Danavox Jubilee Foundation. pp. 25–30
    Lunner T., Alickovic E., Graversen C., Ng E.H.N., Wendt D., Keidser G. Three new outcome measures that tap into cognitive processes required for real-life communication. Ear Hear, (2020). 41(Suppl 1), 39S–47S
    Mansour N., Marschall M., Westermann A., May T., Dau T. Speech intelligibility in a realistic virtual reality. In Ochmann M., Vorländer M., Fels J. (Eds.), 23rd International Congress on Acoustics, (2019). Berlin, Germany: Deutsche Gesellschaft für Akustik.
    Oreinos C., Buchholz J. M. Evaluation of loudspeaker-based virtual sound environments for testing directional hearing aids. J Am Acad Audiol, (2016). 27, 541–556
    Rapport F., Hughes S. Frameworks for change in hearing research: Valuing qualitative methods in the real world. Ear Hear, (2020). 41(Suppl 1), 91S–98S
    Schinkel-Bielefeld N., Kunz P., Zutz A., Buder B. Evaluation of hearing aids in everyday life using ecological momentary assessment: What situations are we missing? Am J Audiol, (accepted for publication)
    Shield B. Hearing Loss – Numbers and costs. Evaluation of the social and economic costs of hearing impairment, (2019). London: Hear-It AISBL. Brunel University.
    Smeds K., Wolters F., Rung M. Estimation of signal-to-noise ratios in realistic sound scenarios. J Am Acad Audiol, (2015). 26, 183–196
    Smeds K., Dahlquist M., Larsson J., Herrlin P., Wolters F. LEAP, a new laboratory test for evaluating auditory preference. In Ochmann M., Vorländer M., Fels J. (Eds.), 23rd International Congress on Acoustics (ICA), (2019). Berlin, Germany: Deutsche Gesellschaft für Akustik.
    Stone A. A., Shiffman S. Ecological Momentary Assessment (EMA) in behavioral medicine. Annals Behav Med, (1994). 16, 199–202
    Timmer B. H. B., Hickson L., Launer S. Ecological momentary assessment: feasibility, construct validity, and future applications. Am J Audiol, (2017). 26(3S), 436–442
    Timmer B. H. B., Hickson L., Launer S. Do hearing aids address real-world hearing difficulties for adults with mild hearing impairment? Results from a pilot study using ecological momentary assessment. Trends Hear, (2018). 22, 2331216518783608
    Wagener K. C., Hansen M., Ludvigsen C. Recording and classification of the acoustic environment of hearing aid users. J Am Acad Audiol, (2008). 19, 348–370
    Walden B. E. Toward a model clinical-trials protocol for substantiating hearing aid user-benefit claims. Am J Audiol, (1997). 6, 13–24
    Walden B. E., Demorest M. E., Hepler E. L. Self-report approach to assessing benefit derived from amplification. J Speech Hear Res, (1984). 27, 49–56
    Walden B. E., Surr R. K., Cord M. T., Dyrlund O. Predicting hearing aid microphone preference in everyday listening. J Am Acad Audiol, (2004). 15, 365–396
    Wolters F., Smeds K., Schmidt E., Christensen E. K., Norup C. Common sound scenarios: A context-driven categorization of everyday sound environments for application in hearing-device research. J Am Acad Audiol, (2016). 27, 527–540
    Wu Y. H., Bentler R. A. Do older adults have social lifestyles that place fewer demands on hearing? J Am Acad Audiol, (2012). 23, 697–711
    Wu Y. H., Stangl E., Chipara O., Hasan S. S., Welhaven A., Oleson J. Characteristics of real-world signal to noise ratios and speech listening situations of older adults with mild to moderate hearing loss. Ear Hear, (2018). 39, 293–304

      Auditory Reality; Common Sound Scenarios; Ecological momentary assessments; Prototype listening situation; Test scenario


      Copyright © 2020 The Authors. Ear & Hearing is published on behalf of the American Auditory Society, by Wolters Kluwer Health, Inc.