Outcome evaluation is an essential component of the hearing aid fitting process (Joint Committee on Infant Hearing 2007; Bagatto et al. 2010; American Academy of Audiology 2013). The implementation of universal newborn hearing screening programs has significantly lowered the age of hearing loss diagnosis and, thereby, of hearing aid fittings (Sininger et al. 2009). Outcome evaluation using conventional behavioral tests presents a challenge in infants diagnosed early and fitted with hearing aids (under 6 months of age) because reliable behavioral responses are difficult to obtain. This concern has led to an increased interest in the use of objective outcome measures. Objective outcome measures refer to the assessment of hearing acuity, with or without hearing aids, obtained directly from the infant, while requiring minimal co-operation from the infant and parent. Suggested objective measures include transient auditory evoked potentials that represent synchronous activity in response to stimulus onsets or offsets such as the cortical auditory evoked potential (CAEP; Purdy et al. 2005; Golding et al. 2007). They also include steady-state responses elicited by an ongoing periodically changing stimulus (Picton 2011). Steady-state potentials include measures such as the envelope following response (EFR; Choi et al. 2013) or the frequency following response (Anderson & Kraus 2013). An example of a combination of transient and steady-state potentials is the speech-evoked auditory brainstem response (Anderson & Kraus 2013; Dajani et al. 2013). The present study proposes an objective test paradigm based on EFRs elicited by speech sounds.
Of these proposed measures, the EFR has several advantages for use as an aided outcome measure. The EFR, a response phase-locked to the stimulus envelope, is generated at the voice’s fundamental frequency (f0) when elicited by a vowel (e.g., Aiken & Picton 2006, 2008; Choi et al. 2013). The primary advantage of using EFRs as an outcome measure for hearing aid fittings is that the test stimulus can be running speech (Choi et al. 2013). Speech stimuli, or stimuli with temporal characteristics similar to running speech, are more likely to elicit nonlinear hearing aid function representative of that for speech during the test, compared with pure tones and complex sounds that are spectrally and temporally different from speech (Stelmachowicz et al. 1990, 1996; Scollie & Seewald 2002; Henning & Bentler 2005). Moreover, hearing aid function for speech stimuli in isolated contexts (e.g., a CAEP test protocol with long interstimulus intervals) may not be similar to that in running speech contexts (Easwar et al. 2012). Because one of the main goals of outcome evaluation is confirming audibility of speech sounds during conversational speech, accurate representation of hearing aid function for speech during the test is vital to its validity. A second potential advantage of using continuous stimuli is a possible gain in test time efficiency. Test paradigms that do not require interstimulus intervals and that have reasonably quick response detection times are likely to favor clinical feasibility (Aiken & Picton 2006; Choi et al. 2013). A third advantage is the use of statistical response detection methods that reduce tester bias and increase the objectivity of the test (Picton et al. 2003; Aiken & Picton 2006; Choi et al. 2013; Easwar et al. 2015). A fourth advantage relates to the putative generator sites. Because EFRs at higher modulation rates, in the range of the average male f0, are predominantly generated in the upper brainstem (Herdman et al. 2002a; Purcell et al. 2004), they are largely exogenous, stimulus-driven responses and hence are less affected by the attention or arousal state of the individual undergoing the test (Cohen et al. 1991; Purcell et al. 2004). Insensitivity to arousal state is also likely to be of clinical significance, as infants could be evaluated during natural sleep.
However, current approaches to elicit EFRs may be improved for use as an aided outcome measure. Although EFRs elicited by a vowel can be initiated by any two consecutive interacting harmonics within the cochlea (Aiken & Picton 2006), recent studies demonstrate dominance of the first formant (F1) region (Laroche et al. 2013; Choi et al. 2013). The dominance of F1 in EFRs elicited by a vowel likely limits inference of vowel representation to lower frequencies. Because the primary goal of an aided outcome measure is to assess audibility of speech sounds through a hearing aid, representation of the bandwidth of speech, and hence assessment of high-frequency audibility, becomes an important factor for several reasons. One, speech consists of phonemes that span a wide frequency range (e.g., Boothroyd et al. 1994). Two, several frequently occurring information-bearing segments, particularly consonants, have spectral peaks in the high-frequency regions (e.g., /s/, Tobias 1959). Audibility up to about 8 to 9 kHz is important for recognition of fricatives spoken by females and children (Stelmachowicz et al. 2001). Three, wider bandwidth leads to better word recognition (Gustafson & Pittman 2011), novel word learning (Pittman 2008), and perceived sound quality (Ricketts et al. 2008). Four, limited hearing aid bandwidth has been identified as a significant factor affecting a child’s ability to access and benefit from the aided speech signal (for a review, see Stelmachowicz et al. 2004). Limited high-frequency bandwidth in hearing aids is suspected to be the primary cause for the delayed acquisition of fricatives relative to low-frequency sounds (Stelmachowicz et al. 2004). In summary, EFRs to vowel stimuli alone may not be informative about audibility across the entire bandwidth of speech and, therefore, warrant modifications to be a more suitable aided outcome measure.
In addition to representing the bandwidth of speech, frequency specificity is an important factor in stimulus selection. Because hearing loss varies in configuration and severity, especially in children (Pittman & Stelmachowicz 2003), it is likely advantageous to use stimuli that represent a wide frequency range and that demonstrate frequency specificity to adequately reflect the impact of hearing loss and amplification in different spectral regions. As well, because all commercially available hearing aids provide adjustable frequency-dependent gain (Dillon 2012), frequency-specific information is likely to provide more specific guidance in re-evaluating hearing and/or hearing aid fittings.
This article proposes an EFR test paradigm specifically adapted for use as a hearing aid outcome measure based on factors discussed earlier. The Ling-6 sounds (/m/, /u/, /a/, /i/, /∫/ and /s/), commonly used for aided behavioral detection tasks (Ling 1989; Scollie et al. 2012), provide a simplified representation of a broad range of frequencies present in speech. The low to mid frequencies are represented by the vowels and the nasal /m/, whereas the high frequencies are represented by the fricatives. The proposed EFR test paradigm incorporates the vowels and fricatives of the Ling-6 sounds with specific modifications applied to improve the range of frequencies represented, frequency specificity, and test time efficiency while maintaining resemblance to running speech in temporal characteristics.
The utility or usefulness of an outcome measure is determined by its sensitivity to change or its responsiveness to treatment/intervention (Andresen 2000). In the case of hearing aids or amplification aimed to improve audibility, evaluating the sensitivity to change in audibility is critical in determining the utility or usefulness of a proposed outcome measure. In experiment I, we investigated whether the proposed EFR test paradigm can reliably represent changes in audibility due to stimulus level in a group of individuals with normal hearing. We hypothesized that an increase in stimulus level would result in higher amplitude EFRs and an increased number of EFRs detected. In experiment II, we investigated whether the proposed EFR test paradigm can reliably represent changes in audible bandwidth based on the elicited EFRs. A change in audible bandwidth, achieved by low-pass filtering of stimuli, affects audibility of specific spectral regions. Bandwidth manipulation using low-pass filtering and its impact on EFRs will allow inference of representation of multiple frequency regions in the proposed EFR stimulus. We hypothesized that an increase in the bandwidth of a stimulus carrier would result in an increase in EFR amplitude and the number of EFRs detected.
Auditory evoked potentials in general bear no causal relationship with hearing, and the detection of an evoked potential is influenced by factors unrelated to the stimulus and hearing (e.g., myogenic noise; Elberling & Don 2007). However, the detection of an evoked potential does confirm neural representation of one or more stimulus attributes, in at least part of the auditory pathway involved in perception (Hyde 1997; Elberling & Don 2007). In experiment II, we also evaluated the relationship between neural representation of stimulus features, as inferred from the number of EFR detections and EFR response amplitudes, and performance on behavioral measures such as speech discrimination and sound quality rating in multiple bandwidth conditions. Speech discrimination and sound quality rating are both psychophysical measures of hearing aid outcome (e.g., Jenstad et al. 2007; Parsa et al. 2013). Comparison of objective EFR parameters with these behavioral measures will test the convergent validity of the EFR paradigm in demonstrating sensitivity to audible bandwidth. Convergent validity is the degree to which two measures that are thought to represent similar attributes show similar results (Finch et al. 2002). Both behavioral measures vary systematically, albeit differently, as a function of stimulus bandwidth. Improvement in speech discrimination due to increasing stimulus bandwidth asymptotes at a lower frequency than does improvement in sound quality rating (Studebaker & Sherbecoe 1991; Studebaker et al. 1993; Cheesman & Jamieson 1996; Ricketts et al. 2008; Füllgrabe et al. 2010; Pittman 2010). The rate of growth in speech discrimination above 4 kHz is less than the rate of growth in sound quality rating in adults with normal hearing. An improvement in sound quality rating when stimulus bandwidth is increased beyond 4 kHz has been consistently demonstrated in individuals with normal hearing (Voran 1997; Moore & Tan 2003; Ricketts et al. 2008; Füllgrabe et al. 2010).
In summary, the present study aimed to evaluate the proposed speech-evoked EFR paradigm in adults with normal hearing. We chose adults with normal hearing for our preliminary evaluation to characterize the nature of responses in the absence of hearing loss. Experiment I evaluated the effect of stimulus level on EFR amplitudes and the number of EFR detections. Experiment II evaluated the effect of stimulus bandwidth on EFR amplitudes and the number of detections. In addition, the experiment evaluated the relationship between changes in objective EFR measures and behavioral measures related to bandwidth.
EXPERIMENT I: EFFECT OF STIMULUS LEVEL ON SPEECH-EVOKED EFRS
The study included 20 adults (15 females, 5 males) with normal hearing, aged between 19 and 28 years (mean age = 22.75 years, SD = 2.34 years). Participants were mostly students from Western University, London, Ontario. To be eligible, participants had to pass a hearing screen at 20 dB HL, tested using insert earphones at octave and interoctave frequencies between 250 and 8000 Hz in both ears. Audiometric screening was carried out using a GSI-61 audiometer. Routine otoscopy completed on all participants ruled out any contraindications such as occluding wax, discharge, or foreign bodies in the ear. Tympanograms obtained in the test ear using a MADSEN OTOflex 100 immittance meter indicated static compliance and tympanometric peak pressure within normal limits (0.3 to 1.6 mL; −100 to +50 daPa). Volunteers with any self-disclosed neurological disorders were not included in the study. All participants reported English as their first language and provided informed consent. The study protocol was approved by the Health Sciences Research Ethics Board, Western University, Canada. Participants were compensated for their time.
The stimulus included vowels and fricatives to represent a wide range of frequencies. The vowels /u/ (as in “school”), /a/ (as in “pot”), and /i/ (as in “see”) were chosen to represent a range of formant one (F1) and formant two (F2) frequencies. These vowels were produced by a 42-year-old male talker from Southwestern Ontario whose first language was English. The original token produced by the male talker was /usa∫i/. One of the many repetitions was chosen for further processing using the software packages Praat (Boersma 2001), GoldWave (version 5.58; GoldWave Inc., St. John’s, Newfoundland, Canada), and MATLAB (version 7.11.0 [R2010b]; MathWorks, Natick, MA, USA). The best version was chosen based on phoneme duration in the case of vowels and fricatives and, in addition, flatness of the f0 contour in the case of vowels. Voice recordings were made with a studio-grade microphone (AKG Type C 4000B) in a sound booth using SpectraPLUS software (version 184.108.40.206; Pioneer Hill Software LLC, Poulsbo, WA, USA). These tokens were recorded at a sampling rate of 44,100 Hz and later downsampled to 32,000 Hz using Praat. During the EFR recording, because the stimulus was presented repeatedly with no interstimulus interval, the naturally produced sequence was edited such that the fricative /s/ also preceded /u/. The final modified stimulus sequence was /susa∫i/. This modification was made to maintain a consonant-vowel context for all vowel stimuli and to avoid abrupt transitions between two repetitions of the stimulus sequence. The fricative /s/ was duplicated instead of having the speaker produce the token /susa∫i/ to avoid differences in characteristics of /s/ that may arise due to context (Boothroyd & Medwetsky 1992). The use of identical stimuli allows averaging of the EFRs elicited by the two iterations. The duration of a single polarity token was 2.05 seconds.
Stimulus phonemes were individually edited using Praat and MATLAB. Before editing, the wav file was high-pass filtered at 50 Hz to reduce low-frequency noise. Boundaries of phonemes were marked based on spectrograms and listening. Fricatives /∫/ and /s/ were high-pass filtered at 2 and 3 kHz, respectively, using Praat to improve frequency specificity of these carriers. The cutoff frequencies were chosen based on the lower edge of the spectral peak in the naturally produced versions. The filtered fricatives were then 100% amplitude modulated (AM) at 93.02 Hz using MATLAB and matched in root-mean-square (RMS) level before and after modulation. The durations of /∫/ and /s/ were 234 and 274 msec, respectively. The durations of analysis windows per iteration of /∫/ and /s/ were 215 and 258 msec, respectively. The levels of /∫/ and /s/ relative to the overall level of /susa∫i/ were −6.5 and −7.8 dB, respectively. There was an integer number of cycles of the modulation frequency within the analysis windows of both fricatives.
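The fricative processing described above (high-pass filtering, 100% amplitude modulation at 93.02 Hz, and RMS matching) can be sketched as follows. The fourth-order Butterworth filter and the white noise standing in for the /s/ token are illustrative assumptions, not the study's actual parameters:

```python
import numpy as np
from scipy.signal import butter, sosfilt

FS = 32000          # stimulus sampling rate used in the study (Hz)
FM = 93.02          # fricative modulation rate (Hz)

def process_fricative(x, highpass_hz):
    """High-pass filter, apply 100% amplitude modulation, and restore
    the pre-modulation RMS level (filter order is an assumption)."""
    sos = butter(4, highpass_hz, btype="highpass", fs=FS, output="sos")
    x = sosfilt(sos, x)
    rms_before = np.sqrt(np.mean(x ** 2))
    t = np.arange(len(x)) / FS
    x_am = x * (1.0 + np.sin(2 * np.pi * FM * t)) / 2.0   # 100% AM depth
    return x_am * rms_before / np.sqrt(np.mean(x_am ** 2))

# White noise standing in for the 274-msec /s/ token:
rng = np.random.default_rng(0)
s = rng.standard_normal(int(0.274 * FS))
s_am = process_fricative(s, highpass_hz=3000)
```

The RMS-matching step ensures that the modulated carrier retains the level of the filtered original, as described in the text.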
The vowel spectra were modified to elicit individual responses from the first formant (F1) and the higher formants (F2+) region. To enable recording of two simultaneous EFRs from the F1 and F2+ regions, vowel spectra were edited such that the original f0 was maintained in the F2+ region, and a lower f0 (original f0–8 Hz) was created in the F1 region. With these dual-f0 vowels, EFRs from the F2+ region were elicited at the original f0 frequency, and EFRs from the F1 region were elicited at original f0–8 Hz. In principle, this paradigm is similar to multiple auditory steady state responses (ASSR), in which the simultaneously presented AM tones are each modulated at unique modulation frequencies (Picton et al. 2003). To create the f0–8 Hz stimulus, the original vowels were lowered in f0 and then low-pass filtered to isolate the F1 frequency band. The lowering of f0 was accomplished using the “Shift frequencies” function in Praat. This function shifts the f0 by the specified amount (in Hz) while approximately maintaining the formant peak frequencies. The original vowels were also high-pass filtered to isolate the F2+ frequency band and combined with the f0-shifted F1 band. The level of the dual-f0 vowel was matched with that of the originally produced vowel. The dual-f0 structure of these processed vowels increases frequency specificity of broadband stimuli like vowels while maintaining the bandwidth of naturally occurring vowels. In addition, the use of dual-f0 vowels is likely to increase test time efficiency by increasing the information gained in a given test time. The cutoff frequencies of the F1 and F2+ bands were chosen to be close to the midpoint between the estimated F1 and F2 peak frequencies. The duration, analysis time window, formant frequencies, mean f0 in each band, cutoff frequencies, harmonics included in each band, and the level of carriers relative to the overall level of /susa∫i/ for the three vowels are provided in Table 1. 
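A minimal synthetic illustration of the dual-f0 construction, assuming an f0 of 100 Hz, an F1/F2+ boundary of 1 kHz, and equal-amplitude harmonics; the study instead derived both bands from a natural token using Praat's "Shift frequencies" function and band filtering:

```python
import numpy as np

FS = 32000
DUR = 0.3                 # sec (illustrative; the real vowels differed)
F0 = 100.0                # assumed original f0 of the male talker
CUTOFF = 1000.0           # assumed boundary between F1 and F2+ bands

t = np.arange(int(DUR * FS)) / FS

def harmonic_band(f0, lo, hi):
    """Equal-amplitude harmonics of f0 falling between lo and hi (Hz)."""
    ks = [k for k in range(1, int(hi / f0) + 1) if lo <= k * f0 <= hi]
    return sum(np.sin(2 * np.pi * k * f0 * t) for k in ks)

def amp_at(x, f):
    """Projected amplitude of x at frequency f (Hz)."""
    return 2.0 * np.mean(x * np.sin(2.0 * np.pi * f * t))

f1_band = harmonic_band(F0 - 8.0, 0.0, CUTOFF)    # F1 region at f0 - 8 Hz
f2_band = harmonic_band(F0, CUTOFF, 3000.0)       # F2+ region at original f0
dual_f0_vowel = f1_band + f2_band
```

Because the two bands carry distinct f0s (92 and 100 Hz here), EFRs from each band can be resolved at their respective response frequencies, analogous to multiple ASSRs.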
Processed phonemes were concatenated using GoldWave in the same sequence as the original production with an additional /s/ preceding /u/.
The processed stimulus sequence consisted of eight EFR carriers; two fricative bands and two formant carriers from each of the three vowels. Spectra of each EFR carrier are presented in Figure 1. To be spectrally consistent with the International Speech Test Signal (Holube et al. 2010), which is a standard hearing aid test signal, the /∫/ was attenuated between 2650 and 3650 Hz by 3 dB to fall within the 99th percentile of the International Speech Test Signal dynamic range at matched RMS levels. Each sweep consisted of the stimulus sequence presented in both polarities. The entire duration of a sweep was 4.1045 sec.
Each EFR elicited using the dual-f0 vowels could be influenced by the presence of the other carrier. This has been investigated often for multiple ASSRs, in which four carriers are presented simultaneously (e.g., John et al. 1998, 2002). When carrier frequencies within an octave apart are presented simultaneously, the presence of a low-frequency carrier can enhance responses elicited by the high-frequency carrier, whereas the presence of a higher frequency carrier can attenuate responses elicited by a low-frequency carrier (John et al. 1998). To evaluate such interaction effects, a pilot study comparing EFRs elicited by F1 and F2+ carriers presented simultaneously (using dual-f0 vowels) and individually was completed in 10 adults with normal hearing. Results indicated that EFRs elicited by simultaneously presented F1 and F2+ carriers were similar in amplitude to EFRs elicited by individually presented F1 and F2+ carriers (see Appendix for details, Supplemental Digital Content 1, Supplemental Digital Content 2, http://links.lww.com/EANDH/A192, http://links.lww.com/EANDH/A193).
Stimulus Presentation and Response Recording
Stimulus presentation was controlled by software developed using LabVIEW (version 8.5; National Instruments, Austin, TX, USA). Digital to analog conversion of the stimulus and analog to digital conversion of the electroencephalogram (EEG) were carried out using a National Instruments PCI-6289 M-series acquisition card. The stimulus was presented at 32,000 samples per second with 16-bit resolution, and the responses were recorded at 8000 samples per second with 18-bit resolution. The stimulus was presented using an Etymotic ER-2 insert earphone coupled to a foam tip of appropriate size. Test ear was randomized.
EEG was recorded using three disposable Medi-Trace Ag/AgCl electrodes. The noninverting electrode was placed at the vertex, the inverting electrode was placed at the posterior midline of the neck, just below the hairline, and the ground was placed on one of the collarbones. Electrode impedances, measured using an F-EZM5 Grass impedance meter at 30 Hz, were maintained below 5 kΩ, with interelectrode differences under 2 kΩ. A Grass LP511 EEG amplifier band-pass filtered the input EEG between 3 and 3000 Hz, and applied a gain of 50,000. An additional gain of two was applied by the PCI-6289 card, making the total gain 100,000.
The stimulus was presented at overall levels of 50 and 65 dB SPL for 300 sweeps. The total recording time for one condition was 20.5 min. The stimulus level was calibrated in an ear simulator (Type 4157 with ½-inch microphone) using a Bruel and Kjær Type 2250 sound level meter. The level was measured in Leq while the stimulus was presented for 30 sec. Participants were seated in a reclined chair in an electromagnetically shielded sound booth. A rolled towel was placed under their neck to help reduce neck tension, and a blanket was provided for comfort. The lights were switched off, and participants were encouraged to sleep during the recording.
EFR Analysis and Detection
Response analysis was carried out offline using MATLAB. Each sweep was divided into four epochs of approximately 1.5 sec each, and a noise metric was calculated for each epoch. The noise metric was the average EEG amplitude in each epoch between approximately 80 and 120 Hz. Any epoch that exceeded the mean noise metric plus two standard deviations was rejected before averaging. EFRs were analyzed between pre-selected boundaries (i.e., analysis time window) that were chosen such that the ramp-in and ramp-out sections at the beginning and the end of the phoneme were excluded.
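The epoch-rejection rule described above can be sketched as follows; the array shapes and the FFT-based computation of the 80 to 120 Hz noise metric are assumptions about implementation detail:

```python
import numpy as np

def reject_noisy_epochs(epochs, fs, lo=80.0, hi=120.0):
    """epochs: (n_epochs, n_samples) array of EEG. An epoch is rejected
    when its mean spectral amplitude between lo and hi Hz exceeds the
    mean noise metric plus two standard deviations across epochs."""
    n = epochs.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    band = (freqs >= lo) & (freqs <= hi)
    amps = np.abs(np.fft.rfft(epochs, axis=1)) * 2.0 / n
    metric = amps[:, band].mean(axis=1)                 # one value per epoch
    keep = metric <= metric.mean() + 2.0 * metric.std()
    return epochs[keep], keep

# Usage: 20 quiet epochs plus one contaminated by a large 100-Hz artifact.
fs, n = 8000, 8000
rng = np.random.default_rng(1)
epochs = 0.01 * rng.standard_normal((21, n))
epochs[5] += 10.0 * np.sin(2 * np.pi * 100.0 * np.arange(n) / fs)
clean, keep = reject_noisy_epochs(epochs, fs)
```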
EFRs from the 300-sweep average EEG were analyzed in the +- average (Aiken & Picton 2008) using a Fourier analyzer (FA; Choi et al. 2013; Easwar et al. 2015). The +- average refers to vector averaging of responses elicited by opposite stimulus polarities (Aiken & Picton 2008). For the FA, f0 tracks capturing changes in f0 over time were obtained for the analysis window of each vowel (Choi et al. 2013). Reference cosine and sine sinusoids were generated using the instantaneous f0 frequencies in the f0 track. An estimate of brainstem processing delay of 10 msec was used to correct for response delay (Aiken & Picton 2008; Choi et al. 2013; Easwar et al. 2015). The delay-corrected EEG was multiplied with the reference sinusoids to create the real and imaginary components of the EFR. The two components were each averaged over the analysis window to yield a single complex number for each response segment. Each complex number provided an estimate of EFR amplitude and phase.
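A sketch of the Fourier analyzer for a steady f0 track follows; the 10-msec delay correction is taken from the text, while the simulated EEG, its 100-nV response amplitude, and the steady 100-Hz f0 are illustrative:

```python
import numpy as np

def fourier_analyzer(eeg, f0_track, fs, delay=0.010):
    """Project delay-corrected EEG onto sinusoids that follow the
    instantaneous f0 track; returns a complex response estimate whose
    magnitude is the EFR amplitude."""
    eeg = eeg[int(delay * fs):]                  # 10-msec brainstem delay
    phase = 2.0 * np.pi * np.cumsum(f0_track[: len(eeg)]) / fs
    real = 2.0 * np.mean(eeg * np.cos(phase))
    imag = 2.0 * np.mean(eeg * np.sin(phase))
    return complex(real, imag)

# Usage: recover a simulated 100-nV EFR locked to a steady 100-Hz f0.
fs, n = 8000, 8000
f0_track = np.full(n, 100.0)
phase = 2.0 * np.pi * np.cumsum(f0_track) / fs
eeg = np.zeros(n)
eeg[80:] = 100e-9 * np.cos(phase[: n - 80])      # response delayed 10 msec
resp = fourier_analyzer(eeg, f0_track, fs)
```

Because the reference sinusoids follow the f0 track sample by sample, the analyzer tolerates the natural f0 variation within vowels that a fixed-frequency DFT bin would smear.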
To determine whether EFRs elicited by vowel carriers were detected, the background EEG noise was estimated using 14 frequency tracks surrounding the response tracks at f0 and f0–8 Hz. Because of simultaneous presentation of F1 and F2+ carriers at f0 frequencies 8 Hz apart, frequency tracks for noise estimates had to be chosen such that the response track of one band was not included as one of the noise tracks for the other. For both the F1 and F2+ response, there were six and eight noise tracks, below and above the response f0, respectively. Noise estimates from the 14 noise tracks were averaged to provide a single value to be used for an F test. For an EFR to be detected, the ratio of the EFR amplitude at the response frequency to the average noise estimate had to exceed the critical F ratio (2, 28 degrees of freedom) of 1.82 at an α of 0.05.
EFRs to fricatives were estimated using a discrete Fourier transform (DFT). Again, responses to fricatives of opposite polarities were averaged in the time domain. In addition, responses to the two iterations of /s/ were averaged. Noise estimates were obtained from three noise bins for /∫/, and four noise bins for /s/, on each side of the bin containing the modulation frequency (93.02 Hz). The number of noise bins varied for the two fricatives because of varying durations. The difference in duration leads to differences in DFT bandwidths and therefore, the number of bins that could be accommodated within approximately 15 Hz on each side of the modulation frequency (15 Hz was deemed sufficiently close to the response frequency). Consequently, the critical F ratio for an EFR to be detected was 1.97 and 1.91 for /∫/ and /s/, respectively. All EFR amplitude estimates used for the F test were corrected for possible overestimation of response amplitude due to noise (see Appendix of Picton et al. 2005).
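The detection rule can be sketched as follows. The critical values quoted above (1.97 for /∫/ with six noise bins and 1.91 for /s/ with eight) are consistent with treating the detection statistic as an amplitude ratio whose criterion is the square root of the critical F(2, 2m) value, m being the number of noise bins averaged; this framing is an inference from those values, not an explicit statement in the text:

```python
import numpy as np
from scipy.stats import f as f_dist

def detection_threshold(n_noise_bins, alpha=0.05):
    """Critical amplitude ratio: square root of the critical
    F(2, 2m) value for m averaged noise bins."""
    return np.sqrt(f_dist.ppf(1.0 - alpha, 2, 2 * n_noise_bins))

def efr_detected(resp_amp, noise_amps, alpha=0.05):
    """Detection when the response amplitude exceeds the RMS of the
    noise bins by more than the critical amplitude ratio."""
    ratio = resp_amp / np.sqrt(np.mean(np.asarray(noise_amps) ** 2))
    return ratio > detection_threshold(len(noise_amps), alpha)
```

Under this reading, `detection_threshold(6)` and `detection_threshold(8)` reproduce the 1.97 and 1.91 criteria reported for the two fricatives.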
A stimulus artifact check was completed using a no-stimulus-to-the-ear run in 10 participants. The placement of the transducer and recording electrode leads during the no-stimulus run was similar to the main experiment. The stimulus was routed using a foam tip to a Zwislocki coupler placed on each participant’s chest. Stimulus presentation level was 65 dB SPL, and recording time per participant was 20.5 min. The false-positive rate (the rate of significant detections in the absence of ear stimulation) was 3.75%, which is close to the assumed α of 5% during response analysis.
Estimation of Sensation Level
Although the EFR stimulus sequence was presented at 50 and 65 dB SPL, the relative levels of each carrier varied (see Table 1) and we therefore quantified sensation level (SL) for each carrier.
To estimate the SL of each EFR carrier during EFR recordings, the threshold of each carrier was obtained using an ER-2 insert earphone. Individual carriers were concatenated without gaps to form 1-min long tracks for each one. The fricatives used for behavioral detection were high-pass filtered as in the EFR stimulus but not amplitude modulated. The f0-shifted F1 bands and F2+ bands of the three vowels were presented individually. Stimulus presentation was controlled using the Child Amplification Laboratory experimental system. The stimuli were routed through an audiometer. The audiometer dial was used to seek thresholds using a 5-dB step size. Threshold was defined as the lowest level at which positive responses were obtained for two out of three presentations in an ascending run.
Estimation of Stimulus Presentation Level
The EFR stimulus sequence was presented in a looped manner through the ER-2 at a known audiometric dial level without gaps, as in the main experiments. The output of the ER-2 was recorded for 30 sec using the same setup used for calibration, at a sampling rate of 48,000 Hz and exported in the form of a wav file. The wav file was downsampled to 44,100 Hz in Praat, and individual EFR carriers were extracted. The vowels were filtered according to the cutoff frequencies specified in Table 1 to obtain F1 and F2+ bands. The RMS level of each carrier was obtained using SpectraPLUS software. The software was calibrated using a calibration tone of known level recorded by the sound level meter. Because the ER-2 is linear, the presentation level of each EFR carrier was extrapolated to test levels of 50 and 65 dB SPL. The level of /s/ was the average of the two iterations of /s/, which were nearly identical. The RMS level of each carrier at threshold was computed based on the audiometric dial reading at threshold. The SL was computed as the difference between the RMS level at the presentation level and at threshold.
Response Detection Time
Because responses in the main experiment were analyzed using a fixed number of sweeps, we also evaluated the time required to obtain detections in each participant. Responses were analyzed in 50-sweep increments starting with 50 sweeps. To counteract inflation of α error due to multiple comparisons, p values for determination of response detection were corrected using an adjusted Bonferroni correction that accounts for the correlation between the successive running averages used in the F test (Choi et al. 2011). The adjusted critical p values for detection were 0.05 at 50 sweeps, 0.042 at 100 sweeps, 0.038 at 150 sweeps, 0.036 at 200 sweeps, 0.034 at 250 sweeps, and 0.032 at 300 sweeps. For a given carrier, the first detection across these 50-sweep increments was recorded as its detection time.
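The sequential detection rule, with the adjusted critical p values listed above, can be sketched as:

```python
# Adjusted critical p values from the text (Choi et al. 2011 correction):
CRIT_P = {50: 0.050, 100: 0.042, 150: 0.038, 200: 0.036, 250: 0.034, 300: 0.032}

def first_detection(p_by_sweeps):
    """p_by_sweeps maps a sweep count to the F-test p value at that
    point; returns the first count at which p falls below its adjusted
    criterion, or None if the response is never detected."""
    for n in sorted(CRIT_P):
        if p_by_sweeps.get(n, 1.0) < CRIT_P[n]:
            return n
    return None
```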
Detection time for each carrier was expressed in two ways. In the first method, response detection time for a carrier was based on the sweep duration of the token /susa∫i/. Detection time was computed by multiplying the response detection time (in sweeps) by the sweep duration (4.1045 sec). This estimate reflects the detection time for a carrier when used in the /susa∫i/ stimulus context and therefore the testing time with the current stimulus. However, this estimate does not account for the differences in carrier durations. In the second method (henceforth referred to as “carrier recording time”), response detection time for a carrier was based on the carrier time within each sweep. In this method, detection time for each carrier was computed by multiplying the detection time for that carrier (in sweeps) by the duration of its respective analysis window, including iterations. This estimate reflects the stimulus duration required to obtain a significant detection for each carrier.
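The two detection-time estimates can be illustrated with hypothetical numbers (a detection at 150 sweeps for a carrier with 0.516 sec of analyzed signal per sweep; both values are assumptions for the example):

```python
SWEEP_DUR = 4.1045   # sec per sweep of /susa∫i/ in both polarities

def detection_times(detect_sweeps, carrier_window_per_sweep):
    """Return (test time, carrier recording time) in seconds for a
    detection at detect_sweeps sweeps; carrier_window_per_sweep is the
    total analysis-window duration of that carrier within one sweep."""
    return detect_sweeps * SWEEP_DUR, detect_sweeps * carrier_window_per_sweep

# Hypothetical detection at 150 sweeps:
test_time, carrier_time = detection_times(150, 0.516)
```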
Statistical analyses in this study were completed using SPSS (version 21; IBM, Armonk, NY, USA). Results of the main analyses were interpreted at an α of 0.05. In experiments I and II, we measured the effect of level and bandwidth on the number of EFRs detected (maximum of eight) and EFR amplitude. To compare the number of significant EFR detections at 50 and 65 dB SPL, a paired t test was completed. To compare the effect of level on response amplitudes for all eight carriers, a two-way repeated-measures analysis of variance (RM-ANOVA) was completed with carrier (/u/ F1, /a/ F1, /i/ F1, /u/ F2+, /a/ F2+, /i/ F2+, /∫/, and /s/) and level (50 and 65 dB SPL) as the two within-subject factors. The dependent variable was response amplitude estimated by the FA or DFT, irrespective of a significant detection. Because the test protocols in experiments I and II varied audibility of the EFR carriers, there were several carriers with nonsignificant EFR detections due to experimental manipulation. The use of response amplitudes from significant detections alone would have resulted in a large reduction in the study sample size due to exclusion of individuals with even one nonsignificant EFR. The use of response amplitudes, irrespective of a significant detection, was considered acceptable as these may contain small-amplitude EFRs (albeit not independent of the noise); hence, assessment of the growth of response amplitude due to experimental manipulation would remain valid. Greenhouse-Geisser corrected degrees of freedom were used for interpretation of the RM-ANOVA. For post hoc analyses, paired t tests were completed and results were interpreted using the false discovery rate method for multiple comparisons (Benjamini & Hochberg 1995).
Because we used response amplitudes estimated by the FA/DFT irrespective of a significant detection, a noise criterion was applied to exclude those participants who exhibited high residual noise estimates. Nonsignificant response amplitudes are close to, and not independent of, the noise floor. Therefore, differences in noise estimates across conditions under comparison could be erroneously interpreted as changes in amplitude. To minimize the effect of variable noise estimates between conditions being compared, participants with high noise estimates in >25% of conditions were excluded. A test condition was considered “high noise” if the residual noise estimated by the FA/DFT was greater than the mean plus two standard deviations of residual noise estimates computed across all 20 participants. Computed in this manner, data from one participant were excluded in experiment II.
Effect of SL on Response Amplitude
Figure 2 illustrates the relationship between estimated SL of EFR carriers and response amplitudes of significantly detected EFRs from 20 participants. For all eight carriers, the data show a positive relationship between estimated SL and response amplitudes. The SLs vary across carriers as the test levels were calibrated using the entire stimulus sequence /susa∫i/, and the level of each carrier in the sequence varied in the natural production. As well, hearing thresholds varied across carriers of different frequencies.
Effect of Level on the Number of Detections
Figure 3A illustrates the average number of significant EFRs, out of the eight recorded at each stimulus level. As expected, an increase in the number of detections with level is evident. Figure 3B illustrates that the increase in the average number of detections was mainly due to an increase in the number of detections for F1 and F2+ vowel carriers. EFRs to the fricatives were detected in all 20 participants at both test levels, 50 and 65 dB SPL. A paired t test showed a significant increase in the number of detections with increase in stimulus level from 50 to 65 dB SPL (t(19) = −4.92; p < 0.001; Cohen’s d = 1.10). The average number of detections increased from 6.0 (SD = 1.65) at 50 dB SPL to 7.4 at 65 dB SPL (SD = 0.94; 95% CI for mean difference [0.80, 1.99]).
Effect of Level on Response Amplitude
Figure 4 illustrates the effect of level on response amplitude for each carrier. The increase in average response amplitude is greater for vowel carriers compared with fricatives. The RM-ANOVA indicated a significant main effect of level (F(1, 19) = 47.57; p < 0.001; partial η² = 0.72), a significant main effect of carrier (F(3.14, 59.69) = 46.93; p < 0.001; partial η² = 0.71), and a significant interaction between level and carrier (F(4.39, 83.40) = 2.75; p = 0.029; partial η² = 0.13), suggesting a carrier-specific effect of level. Paired t tests, comparing response amplitudes at 50 and 65 dB SPL for each carrier, indicated that there was a significant increase in response amplitude with increase in stimulus level for all carriers (see Fig. 4). The difference in the change in amplitude across carriers could explain the significant interaction. The average change in response amplitude varied from a minimum of 14.21 nV for /∫/ to a maximum of 50.65 nV for /u/ F2+.
The rate of change of response amplitude (or slope in nV/dB) was calculated by dividing the difference in mean amplitude at 50 and 65 dB SPL by the change in stimulus level (also SL) of 15 dB. Only responses that were detected (i.e., significantly higher than the noise floor) were included in the average and therefore, the participants and the number of data points contributing to each average varied. To allow comparisons with estimates in the literature based on AM tones, the slopes were averaged across a few carriers based on their spectral ranges. The three F1 carriers represent low frequencies (average F1 frequency: 409 Hz), the two F2+ carriers /a/ and /u/ represent mid-frequencies (average F2+ frequency: 1241 Hz), and F2+ of /i/ and the fricatives (peak frequencies >2 kHz) represent high frequencies. Average slopes of 2.38, 2.70, and 1.54 nV/dB were obtained for the low-, mid-, and high-frequency carriers, respectively.
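The slope computation above reduces to simple arithmetic; a brief sketch (function name illustrative), checked against the amplitude changes reported in the Results:

```python
def amplitude_slope(mean_amp_50, mean_amp_65, level_step_db=15):
    """Rate of change of EFR amplitude (nV/dB): difference between mean
    amplitudes at the two test levels divided by the 15-dB level step."""
    return (mean_amp_65 - mean_amp_50) / level_step_db
```

For example, the 50.65 nV average change for /u/ F2+ corresponds to roughly 3.38 nV/dB, and the 14.21 nV change for /∫/ to roughly 0.95 nV/dB; the band averages reported above additionally pool such slopes across the carriers in each frequency region.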
Response Detection Time
Response detection times for each carrier in terms of sweep length and carrier recording time are illustrated in Figures 5A and 5B, respectively. The figure only includes detection times of responses that were detected (i.e., significantly higher than noise), hence the number of responses/participants contributing to each average varied. The figure indicates variable detection times across carriers and levels. Average detection times using the /susa∫i/ sweep length ranged from approximately 4 to 10 min across carriers. Detection times are lower for fricatives relative to vowel carriers and at 65 dB SPL relative to 50 dB SPL. Figure 5B illustrates that mean recording times needed for a significant detection are under 2 min for all carriers. EFRs elicited by /∫/ required the shortest recording time for a significant detection.
Response Detectability and Detection Patterns
The number of participants with significant EFR detections for the eight individual carriers varied at 50 and 65 dB SPL (Fig. 3B). Figure 1 and Table 1 illustrate that the spectra of carriers overlap in peak frequencies and frequency ranges. For example, F1 carriers of /u/ and /i/ have dominant energy under 500 Hz. If we were only interested in inferring neural representation, and hence audibility, of a certain frequency region, detection of an EFR elicited by either carrier may suffice. Therefore, we also examined detectability of low-, mid-, and high-frequency regions by pooling across a few carriers within a frequency region. Similar to slope computation, the three F1 carriers were grouped to represent low frequencies, the two F2+ carriers /a/ and /u/ were grouped to represent mid-frequencies, and F2+ of /i/ and the fricatives were grouped to represent high frequencies. If any one of the two or three EFRs within a frequency band was detected (using a Bonferroni correction), we inferred that the frequency band was represented at the level of the brain stem. For the low- and high-frequency bands with three carriers each, the Bonferroni-corrected critical p value was 0.017. For the mid-frequency band with two carrier frequencies, the critical p value was 0.025. The number of participants with a minimum of one detection per frequency band is presented in Figure 6A. It is evident that 15 or more participants had a minimum of one detection per frequency band. When individual carriers are considered (Fig. 3), the number of participants with a detection ranged from 8 to 20 across carriers at 50 dB SPL and from 16 to 20 at 65 dB SPL. However, when frequency bands are considered (Fig. 6), the number of participants with a detection ranged from 15 to 20 at 50 dB SPL and from 19 to 20 at 65 dB SPL. Therefore, the use of alternative detection rules improves detection rates.
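The any-one-carrier-per-band rule described above can be sketched as follows. The function name and example p values are illustrative; the Bonferroni-corrected critical p value is the family α of 0.05 divided by the number of carriers in the band (0.017 for three carriers, 0.025 for two), as in the text:

```python
def band_detected(carrier_p_values, family_alpha=0.05):
    """Infer that a frequency band is represented at the brain stem if any
    one of its carriers' EFRs is detected at the Bonferroni-corrected
    criterion (family alpha divided by the number of carriers in the band)."""
    critical_p = family_alpha / len(carrier_p_values)
    return any(p < critical_p for p in carrier_p_values)
```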
Stacked histograms in Figure 6B illustrate the distribution of participants with detections in one, two, or three bands. At 65 dB SPL, the majority of participants (19/20) had a significant detection in all three frequency bands. At 50 dB SPL, a large proportion (14/20) of participants had a significant detection in all three bands, and 19 of 20 participants had a significant detection in at least two bands.
EXPERIMENT II: EFFECT OF STIMULUS BANDWIDTH ON SPEECH-EVOKED EFRS AND BEHAVIORAL MEASURES
All participants who took part in experiment I also participated in experiment II.
Experiment II involved comparison of responses in four conditions that varied in bandwidth. The four bandwidth conditions were low-pass filtered 1 kHz (LPF1k), low-pass filtered 2 kHz (LPF2k), low-pass filtered 4 kHz (LPF4k), and full bandwidth (FBW) condition. The stimulus /susa∫i/ was low-pass filtered at cutoff frequencies of 1, 2, and 4 kHz with steep filter skirts using Praat (smoothing constant of 20 Hz; cutoff frequency 6 dB below the passband). The responses obtained for the original EFR stimulus sequence at 65 dB SPL in experiment I represented the FBW condition in this experiment. The low-pass filtered stimuli matched the original EFR stimulus sequence in spectral levels of their respective passbands. The attenuator settings for the low-pass filtered conditions were the same as the FBW condition presented at 65 dB SPL, and hence, each condition varied in overall level. The recording time for each condition was 20.5 min, and the presentation order was randomized across participants. Procedures for stimulus presentation, EFR recording, response analysis, and detection were the same as in experiment I.
The effects of bandwidth on the EFR paradigm were compared with the effects of bandwidth on behavioral measures of speech discrimination and sound quality rating.
Speech Discrimination Task
The University of Western Ontario Distinctive Features Differences (UWO DFD) was used as a measure of speech discrimination. The UWO DFD test evaluates the accuracy of consonant identification in fixed context, word-medial positions using a closed-set response task paradigm (Cheesman & Jamieson 1996). This test has been frequently used as an outcome measure of hearing aid fittings (e.g., Jenstad et al. 2007). A total of 42 items (21 consonant items spoken by two male talkers) was used in the study. Stimulus presentation, response recording, and scoring were controlled by the Child Amplification Laboratory experimental system. Once the test began, a list of the 21 consonants was displayed on a computer screen. The 42 items were presented in a randomized order, and participants were required to click on the consonant they identified. Percent correct scores were calculated and converted to rationalized arcsine units (Sherbecoe & Studebaker 2004) for statistical analysis. The results presented in figures and tables are in percent correct scores.
Similar to the EFR stimulus sequence, the 42 test items in the UWO DFD test were low-pass filtered at 1, 2, and 4 kHz. The original UWO DFD stimuli represented the FBW condition. All participants underwent a practice run with FBW stimuli before starting the test. The order of presentation of the four bandwidth conditions was randomized across participants. Stimuli were presented at a level of 65 dB SPL, calibrated using a concatenated list (with no gaps) of the 42 original (FBW) items used in the study. The level was measured as a flat-weighted Leq while the stimulus was presented for 30 sec. Attenuator settings were identical for all four bandwidth conditions. The time taken to complete this task ranged from 10 to 20 min.
Sound Quality Rating Task
The MUltiple Stimulus Hidden Reference and Anchors (MUSHRA) paradigm was used to obtain sound quality ratings (International Telecommunication Union [ITU] 2003). The task was implemented using custom software described by Parsa et al. (2013). The MUSHRA task uses a paired comparison paradigm to measure sound quality, allowing for grading of multiple stimuli in comparison with each other (ITU 2003) with good reliability (Parsa et al. 2013). In this task, a reference stimulus (the best quality), multiple test stimuli that need to be rated, and an anchor (typically of poor quality) are used. The reference and anchor stimuli act as examples for ratings at either end of the scale provided. In the visual display on a computer screen, a copy of the reference stimulus is always indicated in the top-left corner of the screen. The participant is instructed to make a pairwise comparison of each test stimulus with the reference stimulus in the top-left corner of the screen. The participant rates each test stimulus (one of which is another copy of the reference, or the hidden reference) by moving a slider along a continuous scale, arbitrarily labeled as “bad” on one end and “excellent” on the other. The presentation of these stimuli is controlled by the participant. The task allows the participant to listen to any stimulus any number of times, in any order, until they are comfortable with their ratings. The order of the stimuli across filter conditions, including the reference and the anchor, was randomized for each participant.
The sentence pair “Raise the sail and steer the ship northward. The cone costs five cents on Mondays” spoken by a male talker was used in this study. This pair was a subset of the four IEEE Harvard sentence pairs used in Parsa et al. (2013). Similar to the EFR stimulus and speech discrimination test, the sentence pair was low-pass filtered at 1, 2, and 4 kHz. The original sentence pair represented the FBW condition and served as the reference stimulus. The sentence pair was low-pass filtered at 0.5 kHz to serve as the anchor. For calibration, the original (FBW) sentence pair was presented in a looped manner without silent gaps, and the level was measured as a flat-weighted Leq while the stimuli were presented for 30 sec. The attenuator settings were identical for all four bandwidth conditions. The time taken to complete this task ranged from 4 to 10 min.
Individual RM-ANOVAs were completed to examine the effect of the within-subject factor bandwidth condition (four levels: LPF1k, LPF2k, LPF4k, and FBW) on four dependent variables, namely, the number of EFRs detected (maximum of eight), EFR composite amplitude, speech discrimination scores (in rationalized arcsine units), and sound quality rating. An additional level (anchor, low-pass filtered at 500 Hz) was added for sound quality rating. EFR composite amplitude was obtained by summing response amplitudes across all eight carriers. Composite amplitude was considered to allow incorporation of EFR amplitude as a dependent measure within the four levels of the within-subject factor in the ANOVA. Post hoc analysis was completed using paired t tests between all bandwidth conditions. The t tests were interpreted using the false discovery rate correction for multiple comparisons (Benjamini & Hochberg 1995).
Data from one participant were excluded from statistical analysis due to high noise estimates.
Effect of Stimulus Bandwidth on EFR and Behavioral Tasks
Figure 7 illustrates average change in EFR and behavioral measures with increase in bandwidth, superimposed on individual data. Composite response amplitude and behavioral scores increase with increase in bandwidth up to and beyond 4 kHz. The relative change in test scores between LPF4k and FBW is greater for sound quality rating compared with speech discrimination percent correct scores. Not indicated in the figure is the mean sound quality rating for the anchor (M = 10.11; SD = 8.72).
The main effect of bandwidth was significant for all four dependent variables: EFR number of detections [F(2.64, 47.45) = 134.08; p < 0.001; partial η² = 0.88], EFR composite amplitude [F(1.45, 26.15) = 83.51; p < 0.001; partial η² = 0.82], speech discrimination scores [F(2.35, 42.23) = 398.27; p < 0.001; partial η² = 0.96], and sound quality rating [F(2.72, 48.97) = 292.88; p < 0.001; partial η² = 0.94]. Significant differences due to bandwidth in post hoc analyses are indicated in Figure 7. The change between LPF4k and FBW was not significant for the number of detections. In contrast, increases in composite response amplitude, speech discrimination, and sound quality rating from LPF4k to FBW conditions were significant. As expected, the rating of the anchor stimulus was significantly lower than that of all wider bandwidth conditions, indicating that participants likely understood the MUSHRA task.
Effect of Stimulus Bandwidth on Response Amplitudes of Individual EFR Carriers
The analysis mentioned earlier indicated a significant change in EFR composite amplitude with increase in bandwidth. Because EFR carriers in the stimulus sequence were chosen to represent a range of frequency bands, the change in composite amplitude due to bandwidth may reflect changes in the response amplitudes of any one, or a combination, of individual EFRs. The mean response amplitude for each carrier in multiple bandwidth conditions is illustrated in Figure 8. The figure suggests that the change in EFR amplitude due to bandwidth varied across carriers. To assess the effect of bandwidth condition on response amplitudes for each individual EFR carrier, a RM-ANOVA was completed with bandwidth (four levels: LPF1k, LPF2k, LPF4k, and FBW) and carrier (eight levels: /u/ F1, /a/ F1, /i/ F1, /u/ F2+, /a/ F2+, /i/ F2+, /∫/, and /s/) as the two within-subject factors and response amplitude as the dependent measure. The results indicated a significant main effect of bandwidth [F(1.45, 26.15) = 83.51; p < 0.001; partial η² = 0.82], a significant main effect of carrier [F(2.49, 44.48) = 10.61; p < 0.001; partial η² = 0.37], and a significant interaction between bandwidth condition and carrier [F(5.33, 96.01) = 44.31; p < 0.001; partial η² = 0.71]. Post hoc analysis included paired t tests comparing response amplitudes across all possible pairwise combinations of bandwidth condition for each carrier. Significantly different conditions are indicated in Figure 8.
The /u/ and /a/ F1 carriers do not show a significant change in response amplitudes across the different bandwidth conditions. This is consistent with the low-frequency peaks for both carriers (see Table 1 and Fig. 1). The carrier /i/ F1 increases significantly in response amplitude between LPF1k and LPF4k, as well as LPF2k and LPF4k, although the dominant spectral peak in this carrier is around 300 Hz, and the low-pass filter cutoff was 1150 Hz (see Table 1). All three F2+ carriers show a significant increase in response amplitude between LPF1k and LPF2k bandwidth conditions. This is consistent with the location of F2 peak frequencies, carrier spectra, and cutoff frequencies used during stimulus preparation (see Table 1 and Fig. 1). The /a/ and /i/ F2+ carriers show a steady increase in response amplitude up to the FBW condition, consistent with their spectra. Both fricative carriers show a substantial increase in response amplitudes between LPF2k and LPF4k, as well as LPF4k and FBW. On average, the carriers /∫/ and /s/ show a change of 75.28 and 99.33 nV, respectively, between LPF4k and FBW. This relatively large change likely contributes substantially to the increase in composite amplitude beyond 4 kHz (Fig. 7B). A significant increase in amplitude was also found between LPF1k and LPF2k for the carrier /s/ but not for /∫/. The significant change for /s/ was unexpected because the carrier was high-pass filtered at 3 kHz.
Relationship Between Objective (EFR) and Subjective (Behavioral) Measures of Bandwidth
Figure 9 illustrates the behavioral test scores plotted against the number of EFR detections and EFR composite amplitude. The figure displays mean scores averaged across 19 participants along with individual scores. The overall positive trends suggest that an increase in bandwidth has a similar impact on behavioral and EFR measures. Although the asymptotic trend in the number of detections does not reflect the change in behavioral measures (Figs. 9A,B), changes in EFR composite amplitude parallel changes in both behavioral measures beyond LPF4k (Figs. 9C,D). A significant positive correlation was found between the number of detections and the speech discrimination scores [Pearson r = 0.87; t(17) = 18.36; p < 0.001], as well as the number of detections and sound quality rating [Pearson r = 0.74; t(17) = 14.59; p < 0.001]. In addition, a significant positive correlation was found between EFR composite amplitude and speech discrimination scores [Pearson r = 0.67; t(17) = 9.25; p < 0.001] and between EFR composite amplitude and sound quality rating [Pearson r = 0.60; t(17) = 10.88; p < 0.001]. The significant positive correlations between EFR parameters and behavioral scores support convergent validity of the EFR paradigm in demonstrating sensitivity to audible bandwidth.
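For reference, the Pearson product-moment correlations reported here are straightforward to compute directly; the following is a self-contained sketch (the function name is illustrative, and the example inputs are synthetic, not the study's data):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences:
    covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-deviation sum
    sxx = sum((a - mx) ** 2 for a in x)                   # sum of squares, x
    syy = sum((b - my) ** 2 for b in y)                   # sum of squares, y
    return sxy / sqrt(sxx * syy)
```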
The aims of experiments I and II were to assess sensitivity of the EFR paradigm to changes in audibility due to stimulus level and bandwidth. Changes in the number of EFRs detected and EFR amplitude were compared across test conditions. In general, both experiments demonstrated an increase in the number of EFRs detected and EFR amplitude with improved audibility.
Experiment I: Effect of Level
Results indicate that the 15-dB increase in test level, and thereby in the SL of the carriers, resulted in a significant increase in response amplitudes for individual EFR carriers, as well as in the overall number of detections at each test level. The increase in response amplitude leads to a higher probability of detection because the criterion for detection (F ratio) is based on the response amplitude relative to the noise amplitude. The level-dependent changes in the present study support previous findings that demonstrate a positive relationship between level and response amplitudes (Lins et al. 1995; Picton et al. 2003, 2005; Vander Werff & Brown 2005). The change in response amplitudes due to level can be explained by a combination of increased neural firing rates (Sachs & Abbas 1974) and spread of excitation in the peripheral auditory system (Moore & Glasberg 1987; Moore 2003). Spread of excitation recruits more sensory cells, and thereby nerve fibers and synapses, resulting in an increase in input to the brainstem generators of EFRs (Picton et al. 2003; Purcell & Dajani 2008).
The pattern of intensity-response amplitude slopes of speech-evoked EFRs is similar to that of AM tones (Lins et al. 1995; Vander Werff & Brown 2005); high-frequency carriers tend to show a shallower slope than lower frequency carriers. However, the absolute magnitudes of the slopes are higher than those for AM tones. The estimated slopes for low (500 Hz), mid (1 and 2 kHz), and high (4 kHz) frequency AM tones were approximately 1.4, 1.3, and 0.84 nV/dB, respectively (Lins et al. 1995; Vander Werff & Brown 2005). The steeper slope for this study’s broadband stimuli could be due to a larger excitation area on the basilar membrane compared with AM tones with only three frequency components. For example, in the case of vowels, it may be that the higher stimulus level allows for better interaction between many harmonics, causing a larger increase in response amplitude. The difference in slope across carrier frequencies could be explained, at least in part, on the basis of excitation patterns on the basilar membrane. With increase in level, excitation patterns show more basal spread of activity toward frequencies higher than the stimulus frequency (Moore & Glasberg 1987; Moore 2003). Because the basal spread is likely to be greater for low-frequency stimuli, the growth is probably steeper for low-frequency stimuli relative to high-frequency stimuli (Lins et al. 1995).
Response Detection Time
Response detection times based on sweep length and carrier recording time indicate lower detection times for fricatives, on average (Figs. 5A,B). In particular, the fricative /∫/ has the shortest carrier recording time (Fig. 5B), likely due to higher response amplitudes relative to other carriers (see Fig. 4). Average noise floors also vary across carriers (see Fig. 4); however, the differences in amplitude are larger than the differences in noise estimates. Variations in response amplitudes lead to differences in F ratios and therefore in the time to reach the critical F ratio. In addition, detection times tend to be longer at test levels of 50 dB SPL relative to 65 dB SPL. This can also be explained in terms of differences in response amplitudes while noise estimates are similar; shorter detection times at the higher test level are facilitated by higher amplitude responses at 65 dB SPL (see Fig. 4). The carrier recording times for vowels seem to be longer than detection times reported by Aiken and Picton (2006), possibly due to a floor effect in the present study, as responses were not analyzed under 50 sweeps.
A direct comparison with other aided protocols such as CAEPs is difficult to make due to limited data available on test time. Test time computed based on the stop criterion on “accepted” sweeps and the interstimulus interval may range from under 2 to 4 min per carrier (Van Dun et al. 2012; Carter et al. 2013). Average carrier recording times required for a significant detection (Fig. 5B) are comparable with these estimates. However, the effective testing time necessary using the current stimulus would likely be shorter due to simultaneous presentation of vowel carriers. For the testing conditions in the present study, the average test time necessary for a significant detection using the stimulus /susa∫i/ fell under 10 min for all carriers (see Fig. 5A). It is encouraging to note the clinically feasible test times of the EFR paradigm using the stimulus /susa∫i/ to infer neural representation of low-, mid-, and high-frequency dominant carriers in the auditory system. Note that the detection times are representative only of responses that were significantly higher than the noise. Due to the use of a fixed number of sweeps, detection times do not represent responses that may have taken over 300 sweeps to be detected.
In summary, results of experiment I suggest that the EFR paradigm shows changes in amplitude and detection with improvement in SL caused by increasing stimulus level. Detection times at suprathreshold levels demonstrate clinically feasible test times. Alternate scoring rules such as combining carriers within a frequency band improve detection rates. Although combining carriers within a frequency band reduces carrier-specific information, it may prove to be an advantage clinically to demonstrate that detection is possible in each band in possibly shorter test times. The test could continue for longer to obtain carrier-specific detections, if time permits.
Experiment II: Effect of Bandwidth
Effect of Bandwidth on Speech Discrimination and Sound Quality Rating
The effects of bandwidth on speech discrimination scores and quality ratings are consistent with findings in the literature. As the low-pass filter cutoff is increased, the number of consonants correctly identified increases in the UWO DFD (Cheesman & Jamieson 1996) and other tests of speech recognition (see review by Stelmachowicz et al. 2004). Similar trends are observed in other speech discrimination tests such as the NU-6 (Studebaker et al. 1993) and CID-W22 (Studebaker & Sherbecoe 1991), both of which are meaningful monosyllabic word lists. In the present study, the increase in speech discrimination score was substantial until the LPF4k condition. There was only a small (but significant) improvement in scores with further increase in bandwidth (Fig. 7C). This growth function is broadly consistent with patterns seen in the speech discrimination tests mentioned earlier and is supported by the band importance functions of various speech tests (Pavlovic 1994). Band importance functions represent contributions of different frequency bands to speech intelligibility: larger values indicate higher impact on speech intelligibility. These indices tend to be lower for frequency bands over 4 kHz relative to those between 1 and 4 kHz (Pavlovic 1994), explaining the small increase in speech discrimination scores beyond LPF4k in the present study.
Sound quality ratings show a steady increase until the FBW condition (Fig. 7D), even beyond 4 kHz where changes in speech discrimination scores tend to be small (Pittman 2010). The improvement in sound quality rating when stimulus bandwidth is increased beyond 4 kHz has been repeatedly demonstrated in individuals with normal hearing (Voran 1997; Moore & Tan 2003; Ricketts et al. 2008; Füllgrabe et al. 2010).
Effect of Bandwidth on EFRs
The number of detections shows an increase only until the LPF4k condition. However, the composite amplitude, which is the sum of amplitudes of individual carriers, continues to increase even beyond the LPF4k condition (Figs. 7A,B). All carriers used in this study have at least some stimulus energy below 4 kHz (see Fig. 1). This may have therefore led to the asymptotic trend in the number of detections beyond LPF4k. However, the increase observed in composite response amplitude up to the FBW condition supports sensitivity of the EFR paradigm to stimulus bandwidth even beyond 4 kHz. The effective change in stimulus bandwidth is likely to have been from 4 to 10 kHz, consistent with the flat frequency response of the ER-2 transducer up to 10 kHz and the presence of significant frication energy in the 4 to 10 kHz region (Fig. 1). The growth in response amplitude with stimulus bandwidth is likely due to concomitant increases in the area of excitation in the cochlea. Similar to the effects of level, the larger excitation area will involve recruitment of a larger number of sensory cells and neurons contributing to more robust inputs to the brain stem (Picton et al. 2003; Purcell & Dajani 2008).
Considering carriers individually, we observe that most carriers show a change in response amplitude in the expected direction based on their spectra (Fig. 8). Some changes that were unexplained by carrier spectra were the effects of bandwidth on EFRs elicited by /i/ F1 and /s/. In the case of /i/ F1, response amplitude in the LPF4k condition was significantly higher than in the LPF1k and LPF2k conditions by an average of 22 and 11 nV, respectively. It may be that the addition of higher frequencies enhances the EFR elicited by /i/ F1, but only by a small amount. The change in response amplitude of /i/ F1 is in a direction opposite to that observed in the case of multiple ASSRs, in which multiple carrier frequencies are presented simultaneously. At moderate to high stimulus levels, addition of a high-frequency tone an octave higher than the target stimulus frequency can suppress the response generated at the target stimulus frequency (John et al. 1998, 2002; McNerney & Burkard 2012). The interference effect is nonsignificant at lower levels (John et al. 1998, 2002; McNerney & Burkard 2012). The differences in phenomena observed between the present and previous studies may be due to the stimulus levels used and/or the level of the high-frequency interfering band.
In the case of /s/, the response amplitude in the LPF2k condition was significantly higher than the LPF1k condition. As suggested in the Results, this was an unexpected finding because /s/ was high-pass filtered at 3 kHz. The changes in response amplitude of /s/ may have been due to random variations in noise or a type I error.
Stimuli used in this study are naturally occurring broadband stimuli. The goals during stimulus preparation were to represent the broad range of frequencies present in speech, as well as to improve the frequency specificity of naturally occurring broadband phonemes. Although we attempted to improve the frequency specificity of these stimuli, they are nevertheless more broadband than stimuli commonly used for threshold estimation. Carrier-specific changes in response amplitude with bandwidth (Fig. 8) suggest that the stimuli are fairly frequency specific. However, it is important to note that frequency specificity does not ensure place specificity in terms of the extent of basilar membrane activation (Picton et al. 2003). Place specificity of EFR carriers could be evaluated using masking approaches (e.g., Herdman et al. 2002b).
SUMMARY AND CONCLUSIONS
The present study presented a novel test paradigm based on speech-evoked EFRs for use as an objective hearing aid outcome measure. The method uses naturally spoken speech tokens that were modified to elicit EFRs from spectral regions important for speech understanding and for speech and language development. The vowels /u/, /a/, and /i/ were modified to enable simultaneous recording of two EFRs, one from the low-frequency F1 region and one from the mid- to high-frequency F2+ regions. The fricatives /∫/ and /s/ were amplitude modulated to enable recording of EFRs from the high-frequency regions. The present study evaluated the sensitivity of the proposed test paradigm to changes in audibility caused by altering stimulus level in experiment I and manipulating stimulus bandwidth in experiment II, in young adults with normal hearing. The paradigm demonstrated sensitivity to changes in audibility, secondary to stimulus level and bandwidth changes, in terms of the number of EFR detections and response amplitude. In addition, the paradigm demonstrated convergent validity when compared with changes due to bandwidth in behavioral outcome measures such as speech discrimination and sound quality rating. In summary, this method may be a useful tool as an objective aided outcome measure, considering its running speech-like stimulus, representation of spectral regions important for speech understanding, level and bandwidth sensitivity, clinically feasible test times, and objective methods of response interpretation. The validity of the paradigm in individuals with hearing loss who wear hearing aids requires further investigation.
The authors thank Tracy Lee and Bedoor Al-Qenai for their assistance with data collection and manuscript preparation, Dr. Jong min Choi and Dr. Chris Lee for consultation, and Steve Beaulac for implementation of the MUSHRA task.
Portions of this article were presented at the Association for Research in Otolaryngology Mid-Winter meeting 2014, San Diego, USA, February 22–26, 2014, and at the American Academy of Audiology AudiologyNow! 2014, Orlando, USA, March 26–28, 2014.
Aiken S. J., Picton T. W. Envelope following responses to natural vowels. Audiol Neurootol. 2006;11:213–232
Aiken S. J., Picton T. W. Envelope and spectral frequency-following responses to vowel sounds. Hear Res. 2008;245:35–47
American Academy of Audiology. Pediatric Amplification Clinical Practice Guidelines. 2013. Retrieved May 25, 2014, from http://www.audiology.org/resources
Anderson S., Kraus N. The potential role of the cABR in assessment and management of hearing impairment. Int J Otolaryngol. 2013. [Epub ahead of print]
Andresen E. M. Criteria for assessing the tools of disability outcomes research. Arch Phys Med Rehabil. 2000;81(12 Suppl 2):S15–S20
Bagatto M. P., Scollie S. D., Hyde M. L., et al. Protocol for the provision of amplification within the Ontario infant hearing program. Int J Audiol. 2010;49(Suppl 1):S70–S79
Benjamini Y., Hochberg Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J Royal Stat Soc Ser B. 1995;57:289–300
Boersma P. Praat: A system for doing phonetics by computer. Glot Int. 2001;5:341–345
Boothroyd A., Erickson F. N., Medwetsky L. The hearing aid input: A phonemic approach to assessing the spectral distribution of speech. Ear Hear. 1994;15:432–442
Boothroyd A., Medwetsky L. Spectral distribution of /s/ and the frequency response of hearing aids. Ear Hear. 1992;13:150–157
Carter L., Dillon H., Seymour J., et al. Cortical auditory-evoked potentials (CAEPs) in adults in response to filtered speech stimuli. J Am Acad Audiol. 2013;24:807–822
Cheesman M., Jamieson D. Development, evaluation and scoring of a nonsense word test suitable for use with speakers of Canadian English. Canadian Acoustics. 1996;24:3–11
Choi J. M., Purcell D. W., Coyne J. A., et al. Envelope following responses elicited by English sentences. Ear Hear. 2013;34:637–650
Choi J. M., Purcell D. W., John M. S. Phase stability of auditory steady state responses in newborn infants. Ear Hear. 2011;32:593–604
Cohen L. T., Rickards F. W., Clark G. M. A comparison of steady-state evoked potentials to modulated tones in awake and sleeping humans. J Acoust Soc Am. 1991;90:2467–2479
Dajani H. R. R., Heffernan B. P., Giguere C. Improving hearing aid fitting using the speech-evoked auditory brainstem response. Conference Proceedings of the 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society. 2013 Piscataway, NJ IEEE:2812–2815
Dillon H. Hearing Aids. 2012, 2nd ed. New York, NY Thieme
Easwar V., Beamish L., Aiken S. J., et al. Sensitivity of envelope following responses to vowel polarity. Hear Res. 2015;320:38–50
Easwar V., Purcell D. W., Scollie S. D. Electroacoustic comparison of hearing aid output of phonemes in running speech versus isolation: Implications for aided cortical auditory evoked potentials testing. Int J Otolaryngol. 2012. doi: 10.1155/2012/518202
Elberling C., Don M. Detecting and assessing synchronous neural activity in the temporal domain (SNR, response detection). In Burkard R. F., Eggermont J. J., Don M. (Eds.), Auditory Evoked Potentials: Basic Principles and Clinical Applications. 2007 Philadelphia, PA Lippincott Williams and Wilkins:102–123
Finch E., Brooks D., Stratford P., et al. Physical Rehabilitation Outcome Measures: A Guide to Enhanced Clinical Decision Making. 2002, 2nd ed. Hamilton, ON B.C. Decker Inc.
Füllgrabe C., Baer T., Stone M. A., et al. Preliminary evaluation of a method for fitting hearing aids with extended bandwidth. Int J Audiol. 2010;49:741–753
Golding M., Pearce W., Seymour J., et al. The relationship between obligatory cortical auditory evoked potentials (CAEPs) and functional measures in young infants. J Am Acad Audiol. 2007;18:117–125
Gustafson S. J., Pittman A. L. Sentence perception in listening conditions having similar speech intelligibility indices. Int J Audiol. 2011;50:34–40
Henning R. W., Bentler R. Compression-dependent differences in hearing aid gain between speech and nonspeech input signals. Ear Hear. 2005;26:409–422
Herdman A. T., Lins O., Van Roon P., et al. Intracerebral sources of human auditory steady-state responses. Brain Topogr. 2002a;15:69–86
Herdman A. T., Picton T. W., Stapells D. R.. Place specificity of multiple auditory steady-state responses. J Acoust Soc Am. 2002b;112:1569–1582
Holube I., Fredelake S., Vlaming M., et al. Development and analysis of an International Speech Test Signal (ISTS). Int J Audiol. 2010;49:891–903
Hyde M. The N1 response and its applications. Audiol Neurootol. 1997;2:281–307
International Telecommunication Union [ITU]. Recommendation BS.1534-1: Method for the subjective assessment of intermediate quality levels of coding systems. 2003 Geneva, Switzerland International Telecommunications Union
Jenstad L. M., Bagatto M. P., Seewald R. C., et al. Evaluation of the desired sensation level [input/output] algorithm for adults with hearing loss: The acceptable range for amplified conversational speech. Ear Hear. 2007;28:793–811
John M. S., Lins O. G., Boucher B. L., et al. Multiple auditory steady-state responses (MASTER): Stimulus and recording parameters. Audiology. 1998;37:59–82
John M. S., Purcell D. W., Dimitrijevic A., et al. Advantages and caveats when recording steady-state responses to multiple simultaneous stimuli. J Am Acad Audiol. 2002;13:246–259
Joint Committee on Infant Hearing. Year 2007 Position Statement: Principles and Guidelines for Early Hearing Detection and Intervention Programs. 2007. Retrieved June 20, 2014, from http://www.asha.org/policy
Laroche M., Dajani H. R., Prévost F., et al. Brainstem auditory responses to resolved and unresolved harmonics of a synthetic vowel in quiet and noise. Ear Hear. 2013;34:63–74
Ling D. Foundations of Spoken Language for Hearing-Impaired Children. 1989 Washington, DC Alexander Graham Bell Association for the Deaf
Lins O. G., Picton P. E., Picton T. W., et al. Auditory steady-state responses to tones amplitude-modulated at 80-110 Hz. J Acoust Soc Am. 1995;97(5 Pt 1):3051–3063
McNerney K. M., Burkard R. F. The auditory steady state response: Far-field recordings from the chinchilla. Int J Audiol. 2012;51:200–209
Moore B. C. J. An Introduction to the Psychology of Hearing. 2003, 5th ed. Amsterdam, NL Academic Press
Moore B. C. J., Glasberg B. R. Formulae describing frequency selectivity as a function of frequency and level, and their use in calculating excitation patterns. Hear Res. 1987;28:209–225
Moore B. C. J., Tan C. T. Perceived naturalness of spectrally distorted speech and music. J Acoust Soc Am. 2003;114:408–419
Parsa V., Scollie S., Glista D., et al. Nonlinear frequency compression: Effects on sound quality ratings of speech and music. Trends Amplif. 2013;17:54–68
Pavlovic C. V. Band importance functions for audiological applications. Ear Hear. 1994;15:100–104
Picton T. W. Human Auditory Evoked Potentials. 2011 San Diego, CA Plural Publishing
Picton T. W., Dimitrijevic A., Perez-Abalo M. C., et al. Estimating audiometric thresholds using auditory steady-state responses. J Am Acad Audiol. 2005;16:140–156
Picton T. W., John M. S., Dimitrijevic A., et al. Human auditory steady-state responses. Int J Audiol. 2003;42:177–219
Pittman A. L. Short-term word-learning rate in children with normal hearing and children with hearing loss in limited and extended high-frequency bandwidths. J Speech Lang Hear Res. 2008;51:785–797
Pittman A. L. High-frequency amplification: Sharpening the pencil. Proceedings of the Sound Foundation through Early Amplification Chicago 2010. 2010 Stafa, Switzerland Phonak AG:179–185
Pittman A. L., Stelmachowicz P. G. Hearing loss in children and adults: Audiometric configuration, asymmetry, and progression. Ear Hear. 2003;24:198–205
Purcell D. W., Dajani H. R. R. The stimulus-response relationship in auditory steady-state response testing. In Rance G. (Ed.), The Auditory Steady-State Response: Generation, Recording, and Clinical Application. 2008 San Diego, CA Plural Publishing:55–82
Purcell D. W., John M. S., Schneider B. A., et al. Human temporal auditory acuity as assessed by envelope following responses. J Acoust Soc Am. 2004;116:3581–3593
Purdy S. C., Katsch R., Dillon H., et al. Aided cortical auditory evoked potentials for hearing instrument evaluation in infants. In Seewald R. C., Bamford J. (Eds.), Proceedings of the Sound Foundation Through Early Amplification Chicago 2004. 2005 Stafa, Switzerland Phonak AG:115–127
Ricketts T. A., Dittberner A. B., Johnson E. E. High-frequency amplification and sound quality in listeners with normal through moderate hearing loss. J Speech Lang Hear Res. 2008;51:160–172
Sachs M. B., Abbas P. J. Rate versus level functions for auditory-nerve fibers in cats: Tone-burst stimuli. J Acoust Soc Am. 1974;56:1835–1847
Scollie S., Glista D., Tenhaaf J., et al. Stimuli and normative data for detection of Ling-6 sounds in hearing level. Am J Audiol. 2012;21:232–241
Scollie S. D., Seewald R. C. Evaluation of electroacoustic test signals I: Comparison with amplified speech. Ear Hear. 2002;23:477–487
Sherbecoe R. L., Studebaker G. A. Supplementary formulas and tables for calculating and interconverting speech recognition scores in transformed arcsine units. Int J Audiol. 2004;43:442–448
Sininger Y. S., Martinez A., Eisenberg L., et al. Newborn hearing screening speeds diagnosis and access to intervention by 20-25 months. J Am Acad Audiol. 2009;20:49–57
Stelmachowicz P. G., Kopun J., Mace A. L., et al. Measures of hearing aid gain for real speech. Ear Hear. 1996;17:520–527
Stelmachowicz P. G., Lewis D. E., Seewald R. C., et al. Complex and pure-tone signals in the evaluation of hearing-aid characteristics. J Speech Hear Res. 1990;33:380–385
Stelmachowicz P. G., Pittman A. L., Hoover B. M., et al. Effect of stimulus bandwidth on the perception of /s/ in normal- and hearing-impaired children and adults. J Acoust Soc Am. 2001;110:2183–2190
Stelmachowicz P. G., Pittman A. L., Hoover B. M., et al. The importance of high-frequency audibility in the speech and language development of children with hearing loss. Arch Otolaryngol Head Neck Surg. 2004;130:556–562
Studebaker G. A., Sherbecoe R. L. Frequency-importance and transfer functions for recorded CID W-22 word lists. J Speech Hear Res. 1991;34:427–438
Studebaker G. A., Sherbecoe R. L., Gilmore C. Frequency-importance and transfer functions for the Auditec of St. Louis recordings of the NU-6 word test. J Speech Hear Res. 1993;36:799–807
Tobias J. V. Relative occurrence of phonemes in American English. J Acoust Soc Am. 1959;31:631
Van Dun B., Carter L., Dillon H.. Sensitivity of cortical auditory evoked potential (CAEP) detection for hearing-impaired infants in response to short speech sounds. Audiol Res. 2012;2:65–76
Vander Werff K. R., Brown C. J. Effect of audiometric configuration on threshold and suprathreshold auditory steady-state responses. Ear Hear. 2005;26:310–326
Voran S. Listener ratings of speech passbands. 1997 IEEE Workshop on Speech Coding for Telecommunications. 1997 Piscataway, NJ IEEE:81–82