For well over 150 years, neuroscience has advanced as new patient populations have become available for study. In this article, we describe results for the newest patient population to receive a cochlear implant (CI), i.e., individuals with single-sided deafness (SSD).
SSD patients report multiple deficits in health-related quality of life. These deficits include 1) difficulties in speech understanding in quiet and in noise, 2) difficulties in locating a sound source and judging the distance and movement of a sound source, 3) difficulties in segregating sounds, 4) an increase in listening effort, and 5) tinnitus (e.g., (1–4)). In addition, unilateral deafness can alter cortical organization and engender cross-modal cortical reorganization (5).
Multiple studies show that the provision of a CI for SSD patients improves speech understanding in noise, improves sound source localization, and reduces tinnitus (see (6) for a review; also (2–4,7,8)). Moreover, the provision of a CI can reverse cross-modal reorganization and can reinstate normal, bilateral cortical response to sound stimulation (5). In this report, we describe another domain in which SSD-CI patients have made a substantial new contribution to our knowledge of cochlear implants—the sound quality of CI stimulation.
In the laboratory, SSD-CI patients can rate the similarity between a clean signal presented to the CI ear, i.e., the sound quality of the CI, and candidate, CI-like signals presented to the ear with normal hearing. Consider two possibilities for CI sound quality as judged by SSD-CI patients. Note first that in everyday listening conditions, SSD-CI patients receive simultaneous stimulation from their normal hearing ear and from their CI. One possibility is that CI sound quality will be judged to be very poor because listeners have a high-resolution signal from the normal hearing ear as a reference. A second view notes that in normal listening situations cortical speech processing areas will, at each moment in time, have the "correct" description of the CI signal from the normal hearing ear. This could be the optimum condition for learning. On this view, cortical areas responding to the signal from the normal hearing ear could "teach" cortical areas responding to the signal from the CI what the signal should sound like. If this were the case, then sound quality could be judged to be very high.
Creating Signals That Might Sound Like an Implant
Loizou (9,10) noted the similarity in signal processing for cochlear implants and for channel vocoders, i.e., devices originally used with normal hearing listeners to study speech transmission systems for telephones (11). Both divide the speech signal into n contiguous frequency bands, estimate the energy in each band, and output a signal proportional to that energy. In a cochlear implant, stimulation is delivered via electrodes in the scala tympani to multiple, fixed frequency regions in the spiral ganglion. The number of frequency locations is determined by the number of electrodes, and the frequencies stimulated are a function of the distance of the electrodes from the round window and the spread of current. In a channel vocoder, the summed energy in the filter bands is delivered to the basilar membrane via headphones or loudspeakers. Most commonly, the output of a channel vocoder is a sum of noise bands, each the width of a given input filter. This produces a noisy or whispered voice quality. An alternative output is a set of sine waves, each at the center frequency of a filter band. This produces a tonal voice quality with a distinct flattening of the pitch contour.
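To make the shared analysis-resynthesis scheme concrete, a minimal channel vocoder can be sketched as follows. This is an illustrative sketch, not the processing of any clinical device; the filter order, the log-spaced band edges, and the use of the Hilbert transform for envelope extraction are our assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def channel_vocoder(signal, fs, n_channels=8, carrier="noise",
                    f_lo=200.0, f_hi=7000.0):
    """Minimal n-channel vocoder: band-pass analysis, envelope
    extraction, and resynthesis with noise or sine carriers."""
    # Log-spaced band edges spanning the analysis range
    edges = np.logspace(np.log10(f_lo), np.log10(f_hi), n_channels + 1)
    t = np.arange(len(signal)) / fs
    out = np.zeros(len(signal))
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, signal)
        env = np.abs(hilbert(band))  # temporal envelope of the band
        if carrier == "noise":
            # Noise band the width of the analysis filter
            carr = sosfiltfilt(sos, np.random.randn(len(signal)))
        else:
            # Sine carrier at the band's geometric center frequency
            carr = np.sin(2 * np.pi * np.sqrt(lo * hi) * t)
        out += env * carr
    return out
```

With `carrier="noise"` the output approximates the noisy, whispered quality described above; with `carrier="sine"` it approximates the tonal quality.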
The similarity in signal processing for cochlear implants and for channel vocoders has led to the use of vocoders with normal hearing individuals as listeners to study aspects of signal processing that are directly relevant to the design of cochlear implants, e.g., the minimum number of channels needed to achieve a high level of speech understanding (e.g., (12–14)) or the effect of offsets between frequencies in the input signal and the cochlear places (frequencies) to which that information is delivered (15,16).
Most generally, vocoder simulations were designed to reflect the information available for speech understanding from a small number of channels carrying envelope-based information. The simulations were not intended to convey the sound quality of an implant. Nonetheless, in the absence of other models, they have been used ubiquitously when researchers have been asked, "What does an implant sound like?"
For this exploratory experiment on the sound quality or "voice" of a CI, four types of stimuli were created: noise vocoded sentences; sine vocoded sentences; frequency shifted, sine vocoded sentences; and band-pass filtered, natural speech sentences. The noise and sine vocoder stimuli were created with 4, 6, 8, 10, and 12 output channels because many experiments have documented that the number of effective channels in an implant is far smaller than the number of electrodes (17). The sine vocoded, frequency shifted signals were created because an electrode array is rarely inserted deeply enough to prevent an upshift in place of stimulation (18,19). Finally, the band-limited, natural-speech signals were created because, in preliminary testing, a patient indicated that listening to her CI was like listening to someone who was speaking from behind a door. In this circumstance, the signal would be effectively low-pass filtered.
Eight SSD patients participated in this study. Biographical details are shown in Table 1. Listeners were fit with MED-EL CIs, except for listener 1, who was fit with a Cochlear Corporation CI, and listener 6, who was fit with an Advanced Bionics CI. In the United States, cochlear implantation for single-sided deafness is an off-label use.
Four types of stimuli were created: noise vocoded signals, sine vocoded signals, frequency shifted, sine vocoded signals, and band-pass filtered, natural speech signals. The noise and sine vocoded signals were created with 4, 6, 8, 10, and 12 channels. The frequency shifted signals were created with 5 sine channels and insertion depths of 25, 24, 23, and 22 mm were simulated. As the simulated insertion depth decreased, e.g., 22 mm, formant upshift increased. For details of these stimuli see (15). The band pass, natural speech signals were filtered between 200 and 1000, 200 and 2000, 200 and 3000, 200 and 4000, and 200 and 5000 Hz.
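A frequency-shifted sine vocoder in the spirit of these stimuli can be sketched by placing each sine carrier above its analysis band. The fixed shift ratio used here is a simplification of the insertion-depth-to-frequency mapping in (15) and is purely illustrative.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def shifted_sine_vocoder(signal, fs, n_channels=5, shift_ratio=1.3,
                         f_lo=200.0, f_hi=5000.0):
    """Sine vocoder whose carriers sit above their analysis bands,
    crudely mimicking the upshifted place of stimulation produced
    by a shallow electrode insertion."""
    edges = np.logspace(np.log10(f_lo), np.log10(f_hi), n_channels + 1)
    t = np.arange(len(signal)) / fs
    out = np.zeros(len(signal))
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        env = np.abs(hilbert(sosfiltfilt(sos, signal)))  # band envelope
        fc = np.sqrt(lo * hi) * shift_ratio              # upshifted carrier
        out += env * np.sin(2 * np.pi * fc * t)
    return out
```

A `shift_ratio` of 1.0 reproduces an unshifted sine vocoder; larger ratios stand in for shallower simulated insertion depths.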
A computer monitor displayed, near the top of the screen, a box labeled CI. A mouse press on this box delivered a clean (unprocessed) sentence to the patient's CI via a direct connect cable. At the bottom of the screen were six boxes. Mouse clicks on these boxes delivered candidate sentences, via an insert phone, to the patient's normal hearing ear. The patient could toggle between the clean signal presented to the CI and candidate, CI-like signals to the normal hearing ear as often as she/he wished. After hearing each candidate signal, the patient typed a number between 0 (not at all like the sound of the CI) and 10 (exactly like the sound of the CI).
The four types of stimuli, noise vocoder, sine vocoder, frequency shifted sine, and filtered natural speech, were presented in blocks. Within each block, listeners heard three different sentences, two male and one female. These were also blocked.
The individual ratings and median rating of the candidate signals are presented in Figure 1. Each data point represents the mean score for three judgements, one for each of the three speakers. Due to the spread in the data, median scores are reported for the group data. For the sine vocoded signals with 12, 10, 8, 6, and 4 channels, the median ratings were 2.7, 2.9, 2.9, 3.2, and 3.0, respectively. For the noise vocoded signals with 12, 10, 8, 6, and 4 channels, the median ratings were 1.9, 1.7, 1.9, 1.7, and 1.5, respectively. For the frequency shifted, sine signals, the median ratings for normal, 25, 24, 23, and 22 mm insertion depths were 1.4, 1.9, 1.3, 1.2, and 1.4, respectively. For the band-passed stimuli at 200 to 5000, 200 to 4000, 200 to 3000, 200 to 2000, and 200 to 1000 Hz, the median scores were 2.8, 3.5, 4.0, 3.8, and 5.5, respectively.
The results shown in Figure 1 indicate, most generally, that neither sine vocoded, noise vocoded, nor frequency-shifted sine vocoded signals capture the sound quality of a cochlear implant. Only one patient's responses to the noise vocoded sentences, indicated by the square symbols, suggested that the sound of his CI had a distinct noise component. Following this experiment, the output of the patient's lowest frequency channel was reduced in amplitude. By patient report, this eliminated the noise percept. However, we did not collect a second set of data with the new settings.
The responses to the bandpass-filtered, natural speech signals provided the most provocative data. Individual data from that experiment are shown in Figure 2. There are three clear response patterns. One patient, shown at the top of the figure, ranked sentences that were bandpass filtered from 200 to 5000 Hz, i.e., that were relatively wide band, as an 8 on the 10-point scale. When the high-frequency cutoff was progressively lowered, making the sound quality more muffled, the similarity of the quality to that of the CI was progressively reduced. Thus, for this patient the sound quality of her CI was similar to that of a natural speech signal with reduced amplitudes of frequencies below 200 Hz.
The four patients shown in the middle of Figure 2 needed the opposite manipulation to make a signal sound like their CI. For these patients reducing the band-pass, and making the signal more muffled, was the key to increasing the similarity of the signals to the CI signal.
Finally, as shown at the bottom of Figure 2, for three patients band-pass filtering had no effect on similarity judgements.
Overall, the results from Experiment 1 suggested that, to capture the sound quality of a CI for SSD-CI patients, in Experiment 2 we should 1) use natural speech signals, rather than noise or sine vocoded signals, and 2) explore alternative ways to create a muffled sound quality.
In Experiment 2 we used a new approach to creating CI-like signals using natural speech. Multitrack signal layering was used to create signals that varied in several dimensions. This was to acknowledge that the CI-percept is more likely to be complex than simple. For example, by mixing a signal with a flattened fundamental frequency (F0) with a low-pass filtered speech signal, a “machine-like” or “robotic” quality could be added to a “muffled” signal.
A new matching paradigm, similar to the procedure for fitting eye glasses, was also employed. For example, if a patient indicated that a filtered signal was a moderate match, but still lacked a quality of the CI, signals were added to the filtered signal one by one and the listeners indicated whether the new combined signal was closer to the sound of the CI than the previous signal.
Finally, in an effort to create a muffled percept by a means other than frequency filtering, a spectral smearing algorithm (20) was implemented. This algorithm broadens spectral peaks, a likely perceptual consequence of the spread of electrical excitation.
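The published algorithm (20) smears the spectrum in a way that mimics broadened auditory filters; a much cruder stand-in, smoothing each short-time magnitude spectrum across frequency while keeping the original phase, conveys the basic idea. The Gaussian kernel width and frame length below are arbitrary choices for illustration.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.signal import stft, istft

def spectral_smear(signal, fs, smear_bins=6.0, nperseg=512):
    """Broaden spectral peaks by smoothing each short-time magnitude
    spectrum across frequency while keeping the original phase."""
    _, _, Z = stft(signal, fs=fs, nperseg=nperseg)
    # Smooth magnitudes along the frequency axis (axis 0)
    mag = gaussian_filter1d(np.abs(Z), sigma=smear_bins, axis=0)
    Z_smeared = mag * np.exp(1j * np.angle(Z))
    _, y = istft(Z_smeared, fs=fs, nperseg=nperseg)
    return y[:len(signal)]
```

Increasing `smear_bins` flattens spectral contrast further, analogous to increasing the degree of smearing in the experiment.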
Subjects. As shown in Table 1, three young (10.7–15.4 yr) SSD-CI patients, who had participated in Experiment 1, participated in this experiment. Duration of deafness ranged from 1.5 to 5.8 years. Experience with a CI ranged from 2.5 to 6 months.
Stimulus parameters. The software allowed changes in the characteristics of clean signals along multiple dimensions. The manipulations and the rationale for those manipulations are listed below:
- The corner frequencies of low-, high-, and band pass filters. Both low-pass and band pass signals sound muffled.
- Spectral smearing. The effects of reduced frequency selectivity on the representation of spectral peaks were simulated by “smearing” the spectra of the stimuli (20). See Figure 3 for a description of the maximum smearing in this experiment.
- Pitch shifting and flattening. The absence of an appropriate “place” pitch in the range of the male voice pitch in the CI ear, due to electrodes not extending to the apex, should alter the perception of pitch (see (21,22)). The software allowed a shift in the mean pitch of an entire signal and a compression of the pitch range of a signal.
- Formant shifting. For the same reason that voice pitch may be shifted depending on electrode insertion depth and listening experience, the entire speech spectrum may be perceptually upshifted.
RESULTS AND DISCUSSION
The signal manipulations that produced the best matches for sound quality are shown in Table 2. As indicated in the next-to-last column, all three listeners ranked the sound quality of the matches as 10 on our scale of 0 to 10, i.e., the signals sounded like the CI signal. All three matches were created by spectral smearing, to one degree or another. For two of the three matches, the signals were also band pass filtered between 200 and 1000 Hz. Thus, as the results of Experiment 1 hinted, a muffled percept captures the sound quality of the CI for some SSD-CI listeners.
The results are internally consistent in the following manner: as shown in Table 2, listeners 1 and 3, who rated the unedited, original sound file as a poor match to the CI, needed significant alteration of the original file to make a match. These alterations included both low-pass filtering at 1 kHz and a high degree of spectral smearing. On the other hand, listener 2, who rated the unedited, original sound file as a very close match to the CI, i.e., 9.3 out of 10, needed only the smallest amount of spectral smearing to rate the signal as a 10.
The data shown in Table 2 indicate that speech quality and speech intelligibility are separable dimensions. Listener 1 matched to a highly filtered and smeared signal and had 95% speech understanding scores in quiet. Listener 2 matched to a very slightly altered signal and had much poorer speech understanding—74% correct. Finally, listener 3, who needed a similar degree of signal alteration as listener 1, had a very poor speech understanding score.
We speculate that children and adolescents are more willing to say that a candidate stimulus is an exact match for the sound of the CI than adults. Adults may attend to small differences between the sound of the CI and candidate signals and may give a score of 9 or 9.5 to a signal that a younger listener might rate as a 10. If this is the case, then the matches shown in Table 2 capture most, but not all, of the sound quality of the CIs.
The matches shown in Table 2 suggest that the CI sound quality experienced by young, relatively inexperienced, SSD-CI patients varies widely. Unpublished data from our laboratory indicate that adults with more CI experience match to a similar range of sound qualities.
The Value of a Normal Hearing Ear for Device Programming
In this article, we explored the sound quality of a cochlear implant by altering clean signals in an effort to make them sound like a CI. Of course, the test environment can be reversed and the CI signal can be manipulated to make it sound more like the clean signal directed to the normal hearing ear. In this test environment, SSD-CI patients have a clear advantage in providing feedback to a clinician on sound quality because they can refer to the sound quality of the signal in their normal hearing ear. Standard patients, with one or two CIs, do not have this reference signal. For these patients, the reference is the memory of a clean signal and for many patients the last time a truly clean signal was heard, i.e., one with high frequencies unaltered by hearing loss, was many years before implantation. For that reason, SSD-CI patients may be able to provide insights into signal quality as a function of variations in signal-processing strategy that standard patients are unable to provide.
1. Douglas SA, Yeung P, Daudia A, et al. Spatial hearing disability after acoustic neuroma removal. Laryngoscope
2. Arndt S, Aschendorff A, Laszig R, et al. Comparison of pseudo-binaural hearing to real binaural hearing rehabilitation after cochlear implantation in patients with unilateral deafness and tinnitus. Otol Neurotol
3. Arndt S, Laszig R, Aschendorff A, et al. The University of Freiburg Asymmetric Hearing Loss Study. Audiol Neurotol 2011; 16 (suppl 1):3–25.
4. Van de Heyning P, Vermeire K, Diebl M, et al. Incapacitating unilateral tinnitus in single-sided deafness treated by cochlear implantation. Ann Otol Rhinol Laryngol
5. Sharma A, Glick H, Campbell J, et al. Cortical plasticity and reorganization in pediatric single-sided deafness pre- and postcochlear implantation: A case study. Otol Neurotol
6. Vlastarakos P, Nazos K, Tavoulari EF, Nikolopoulos T. Cochlear implantation for single-sided deafness: the outcomes. An evidence-based approach. Eur Arch Otorhinolaryngol
7. Dorman M, Zeitler D, Cook S, et al. Interaural level difference cues (ILDs) determine sound source localization by single-sided deaf patients fit with a cochlear implant. Audiol Neurotol
8. Zeitler D, Dorman M, Cook S, et al. Sound source localization and speech understanding in complex listening environments by single-sided deaf listeners after cochlear implantation. Otol Neurotol
9. Loizou P. Mimicking the human ear: An overview of signal processing techniques for converting sound to electrical signals in cochlear implants. IEEE Signal Process Mag
10. Loizou P. Speech processing in vocoder-centric cochlear implants. In: Møller A, ed. Cochlear and Brainstem Implants. Adv Otorhinolaryngol. Vol 64. Basel: Karger; 2006:109–143.
11. Dudley H. The Vocoder. Bell Labs Rec
12. Shannon R, Zeng FG, Kamath V, et al. Speech recognition with primarily temporal cues. Science
13. Dorman M, Loizou P, Rainey D. Speech intelligibility as a function of the number of channels of stimulation for signal processors using sine-wave and noise-band outputs. J Acoust Soc Am
14. Loizou P, Dorman M, Tu Z. On the number of channels needed to understand speech. J Acoust Soc Am
15. Dorman M, Loizou P, Rainey D. Simulating the effect of cochlear-implant electrode insertion depth on speech understanding. J Acoust Soc Am
16. Fu Q-J, Shannon R. Recognition of spectrally degraded and frequency shifted vowels in acoustic and electric hearing. J Acoust Soc Am
17. Fishman K, Shannon R, Slattery W. Speech recognition as a function of the number of electrodes used in the SPEAK cochlear implant speech processor. J Speech Lang Hear Res
18. Ketten D, Skinner M, Wang G, et al. In vivo measures of cochlear length and insertion depth of nucleus cochlear implant electrode arrays. Ann Otol Rhinol Laryngol Suppl
19. Finley C, Holden T, Holden L, et al. Role of electrode placement as a contributor to variability in cochlear implant outcomes. Otol Neurotol
20. Baer T, Moore BCJ. Effects of spectral smearing on the intelligibility of sentences in the presence of interfering speech. J Acoust Soc Am
21. Eddington D, Dobelle W, Brackmann D, et al. Auditory prosthesis research with multiple channel intracochlear stimulation in man. Ann Otol Rhinol Laryngol 1978; 87 (6 part 2, S53):1–39.
22. Dorman M, Smith L, Smith M, et al. The pitch of electrically presented sinusoids. J Acoust Soc Am