
Special Edition on Cochlear Implants

Speech Perception Using Maps Based on Neural Response Telemetry Measures

Seyle, Keely; Brown, Carolyn J.


Abstract

The Food and Drug Administration recently approved cochlear implantation for children as young as 12 mo of age. Increasing numbers of children with multiple developmental delays are also now being considered for cochlear implantation. Programming the speech processor of the cochlear implant for these children can be challenging. Cochlear Corporation has recently introduced a cochlear implant that allows the response of the auditory nerve to electrical stimulation to be measured from an electrode inside the cochlea. Results of initial experiments comparing the auditory nerve response to the electrical stimulation levels used to program the speech processor of the cochlear implant have been promising (Brown, Hughes, Luk, Abbas, Wolaver, & Gervais, 2000; Hughes, Brown, Abbas, Wolaver, & Gervais, 2000). However, the physiologically based MAPs are not identical to MAPs constructed using standard behavioral measures. None of the studies published to date report measures of speech perception or ratings of the sound quality made using physiologically based MAPs. Studies have shown that relatively small changes in T- and C-levels can influence judgments of sound quality and/or speech perception with the implant (Dawson, Skok, & Clark, 1997; Skinner, Holden, Holden, & Demorest, 1999; Skinner, Holden, Holden, Demorest, & Fourakis, 1997). Other studies have shown consonant perception to be relatively constant despite fairly large changes in MAP T-levels (Zeng & Galvin, 1999). It is possible that the error in the physiologically based MAPs may be substantial enough to be detrimental to speech perception.

The purpose of this study is to assess whether speech perception measures obtained when subjects use MAPs that are based solely or primarily on physiologic thresholds differ significantly from speech perception measures obtained when MAPs created using standard behavioral techniques are used. A finding that performance with the physiologically based MAPs does not differ significantly from performance with the standard MAP would provide further evidence to support the use of physiologic potentials to program the speech processor for young cochlear implant users.

Methods

Subjects

Ten postlingually deafened, adult Nucleus CI24M cochlear implant users participated in this study. All 10 subjects were implanted at the University of Iowa Hospitals and Clinics and complete electrode insertion was achieved in all cases. Two subjects had binaural cochlear implants (I24–35b and I24–54b); however, they were tested monaurally, and results from only one ear were included in this study. All of the subjects participating in this study had used their cochlear implant for at least 3 mo before testing.

Physiologic Recording Procedures

Neural Response Telemetry (NRT) software (version 2.01) provided by Cochlear Corporation was used to record electrically evoked compound action potential (EAP) responses. This software implements an artifact subtraction technique to separate the neural response from the electrical artifact. This subtraction procedure requires presentation of two biphasic current pulses that are separated by a short interpulse interval. The purpose of the first pulse, or masker, is to drive the nerve into a refractory state. The second pulse in the two-pulse sequence is called the probe pulse. If the nerve is refractory at the time the probe is presented, a recording is made that contains probe stimulus artifact without an overlying neural response. A second recording is then made using a single probe pulse. The recording of the probe stimulus artifact is then subtracted from the recording made when the nerve was not refractory. The result of this subtraction is the EAP. This procedure has been described in detail elsewhere (Abbas et al., 1999; Brown, Abbas, & Gantz, 1990). The relevant settings for the masker, probe, and timing intervals used to record the EAP are described below.
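As an illustration of the subtraction described above, a minimal sketch follows (Python/NumPy). The function and buffer names are ours, and the sketch assumes sweep-averaged buffers of equal length; the released NRT software may use additional buffers and corrections not described here.

    import numpy as np

    def extract_eap(probe_alone, masker_probe):
        """Two-buffer artifact subtraction, as described in the text.

        probe_alone  : averaged recording of the probe presented alone
                       (stimulus artifact plus neural response)
        masker_probe : averaged recording of the probe presented while the
                       nerve is still refractory from the preceding masker
                       (stimulus artifact only)
        Returns the difference, i.e., the EAP estimate.
        """
        return np.asarray(probe_alone) - np.asarray(masker_probe)

    # Illustrative use with sweep buffers of shape (n_sweeps, n_samples):
    # eap = extract_eap(probe_buffer.mean(axis=0), masker_probe_buffer.mean(axis=0))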

EAP responses were collected using monopolar stimulation with the ball electrode in the temporalis muscle (MP1) serving as the stimulation ground. A second intracochlear electrode, typically located two electrodes apical to the stimulus electrode, was used to record the response relative to the second ground electrode on the case of the internal receiver/stimulator (MP2). The stimulus used to evoke the EAP was a series of 25 μsec/phase, biphasic current pulses presented at a nominal rate of 80 Hz. The interval between the masker and probe was 500 μsec. The recording system used a gain of 60 dB. The delay between the probe offset and the initiation of sampling was initially set to 60 μsec. The sampling delay and recording electrode were adjusted for some subjects to allow the EAP to be recorded without significant artifact contamination. A protocol for changing these parameters, described by Abbas et al. (1999), was followed.
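For reference, these recording settings can be collected in a single configuration mapping (Python); the key names are ours and are only meant to summarize the parameters listed above.

    # Summary of the EAP recording settings used in this study (key names are ours).
    NRT_RECORDING_SETTINGS = {
        "stimulation_mode": "MP1",           # ball electrode in temporalis muscle as ground
        "recording_reference": "MP2",        # ground on the receiver/stimulator case
        "recording_electrode_offset": 2,     # typically two electrodes apical to the stimulus
        "pulse_width_us_per_phase": 25,
        "stimulation_rate_hz": 80,           # nominal rate
        "masker_probe_interval_us": 500,
        "amplifier_gain_db": 60,
        "sampling_delay_us": 60,             # adjusted per subject to reduce artifact
    }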

EAP thresholds were obtained for a series of eight electrodes (3, 5, 7, 10, 13, 15, 17, and 20) spaced across the intracochlear electrode array. Masker level was set near the maximum comfortable level for each subject. Responses were then recorded for a series of probe levels. At high stimulation levels, 50 sweeps were used in the response average. Near threshold, the number of sweeps was increased to 100 or 200. The masker and probe current levels are expressed in stimulus units that range from 1 to 255; these units correspond to logarithmically spaced currents up to approximately 1.25 mA. The probe level was initially varied in steps of 5 to 10 programming units; near threshold, the step size was reduced to 2 to 5 programming units.

Generally, NRT testing took no longer than an hour and was accomplished while the subjects were seated in a reclining chair and reading or watching closed-captioned television.

Figure 1 shows a series of EAP waveforms recorded from subject I24–42. The EAP is characterized by a single negative peak (N1) that is recorded approximately 0.3 to 0.4 msec after the onset of the probe. As probe level is decreased, EAP amplitude decreases. EAP threshold was determined automatically using cross-correlation analysis. This technique requires that a high-level response waveform, or template, be identified. This waveform is then scaled and compared with the waveforms recorded at each of the lower stimulus levels. Threshold was defined as the lowest probe level that resulted in a correlation coefficient of at least 0.80. In Figure 1, the correlation threshold was found to be 190 programming units for the electrode shown. This method of determining EAP thresholds agrees well with visual detection thresholds and avoids bias introduced by the investigator (Hughes et al., 2000).

Figure 1: A series of EAP waveforms recorded from one subject (I24–42) are shown. In this example, electrode 5 was stimulated. The EAP was recorded from electrode 7. The level of the probe was systematically varied from 220 to 180 and is shown on this figure. The asterisk indicates the waveform identified as threshold.
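A minimal sketch of this correlation criterion is given below (Python/NumPy). It assumes the averaged waveforms have already been extracted for each probe level; because the Pearson correlation is insensitive to amplitude scaling, explicit scaling of the template is not required here. The function and variable names are ours, not part of the NRT software.

    import numpy as np

    def eap_threshold(waveforms, template_level, criterion=0.80):
        """Lowest probe level whose waveform correlates with the high-level
        template at or above `criterion` (0.80 in this study).

        waveforms      : dict mapping probe level (programming units) to the
                         averaged EAP waveform recorded at that level
        template_level : probe level of the high-level response used as template
        """
        template = waveforms[template_level]
        threshold = None
        for level in sorted(waveforms, reverse=True):   # descending probe levels
            r = np.corrcoef(template, waveforms[level])[0, 1]
            if r >= criterion:
                threshold = level                       # still meets the criterion
            else:
                break                                   # first sub-criterion level
        return threshold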

Speech Processing Strategies Evaluated

Three SPEAK MAPs were constructed for each subject. The goal was to simulate conditions that may apply to children. Consequently, all three MAPs were created using frequency table 6. The volume control was set to 8 and the sensitivity control to 7, and the subjects were not allowed to alter these settings during speech testing. The three MAPs differed in terms of how threshold (T-levels) and maximum comfort levels (C-levels) were determined.

One MAP was created using standard programming techniques. We refer to this MAP as the “Measured MAP.” T- and C-levels were measured using an adaptive procedure and were then balanced across the electrodes. The audiologist who created this MAP was experienced in programming cochlear implants and had no knowledge of the subjects’ EAP thresholds.

Five subjects used SPEAK MAPs on a daily basis. The remaining subjects used ACE MAPs routinely. None of the subjects who used the SPEAK strategy on a daily basis used the same volume, sensitivity and frequency table settings as used to create the Measured MAP in this study. Additionally, these subjects were told that the Measured MAP was a new experimental MAP, and the similarity to the program they routinely used was not emphasized.

Another MAP was created based only on the NRT data. We refer to this MAP as the “+10/−20 MAP.” No behavioral estimates of T- or C-levels were used to construct this MAP. The +10/−20 MAP was created by setting C-levels 10 programming units above the EAP thresholds and setting T-levels 20 programming units below the EAP thresholds. Because EAP thresholds were measured on only eight electrodes (electrodes 3, 5, 7, 10, 13, 15, 17, and 20), linear interpolation was used to set T- and C-levels on electrodes where EAP thresholds were not measured.
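The construction of the +10/−20 MAP can be sketched as follows (Python/NumPy). The assumption that the array spans electrodes 1–22 and that np.interp holds the end values constant beyond electrodes 3 and 20 is ours; the text does not state how the outermost electrodes were handled.

    import numpy as np

    MEASURED_ELECTRODES = [3, 5, 7, 10, 13, 15, 17, 20]  # electrodes with EAP thresholds
    ALL_ELECTRODES = list(range(1, 23))                   # assumed full 22-electrode array

    def plus10_minus20_map(eap_thresholds):
        """Build +10/-20 MAP levels (programming units) from eight EAP thresholds.

        eap_thresholds : list of eight EAP thresholds, ordered as in MEASURED_ELECTRODES
        Returns (t_levels, c_levels) for every electrode in ALL_ELECTRODES.
        """
        # Linearly interpolate the EAP thresholds across the array; np.interp
        # holds the end values constant outside the measured range (assumption).
        interp = np.interp(ALL_ELECTRODES, MEASURED_ELECTRODES, eap_thresholds)
        t_levels = interp - 20   # T-level = EAP threshold minus 20 units
        c_levels = interp + 10   # C-level = EAP threshold plus 10 units
        return t_levels, c_levels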

A final MAP was created from a combination of EAP thresholds and a limited amount of behavioral information. We refer to this MAP as the “Combined MAP.” This procedure has been described previously (Brown et al., 2000). According to this procedure, an electrode in the middle of the electrode array, typically electrode 10, is chosen as a reference. Standard behavioral techniques were used to establish T- and C-levels for electrode 10. The difference between the measured T-level and the EAP threshold recorded on electrode 10 was computed. T-levels were then set by subtracting this difference from the EAP thresholds measured on the other electrodes in the array. C-levels for the Combined MAP were determined by calculating the difference between the EAP threshold for electrode 10 and the measured C-level for electrode 10; that difference was then added to the EAP thresholds measured on the other electrodes. Again, linear interpolation was used to estimate T- and C-levels for electrodes where EAP thresholds were not measured.
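The Combined MAP construction can be sketched in the same way (Python/NumPy); as above, the treatment of the unmeasured end electrodes is an assumption rather than something reported in the text.

    import numpy as np

    MEASURED_ELECTRODES = [3, 5, 7, 10, 13, 15, 17, 20]
    ALL_ELECTRODES = list(range(1, 23))   # assumed full 22-electrode array

    def combined_map(eap_thresholds, t_ref, c_ref, ref_electrode=10):
        """Build Combined MAP levels from EAP thresholds plus behavioral T/C
        on one reference electrode.

        eap_thresholds : dict mapping electrode number to EAP threshold
        t_ref, c_ref   : behaviorally measured T- and C-levels on ref_electrode
        """
        t_offset = eap_thresholds[ref_electrode] - t_ref   # EAP minus behavioral T
        c_offset = c_ref - eap_thresholds[ref_electrode]   # behavioral C minus EAP

        levels = [eap_thresholds[e] for e in MEASURED_ELECTRODES]
        interp = np.interp(ALL_ELECTRODES, MEASURED_ELECTRODES, levels)
        t_levels = interp - t_offset   # shift EAP profile down to estimated T-levels
        c_levels = interp + c_offset   # shift EAP profile up to estimated C-levels
        return t_levels, c_levels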

Speech Recognition Testing Procedures

All three experimental MAPs were loaded into the memories of the SPrint speech processor. Subjects were given approximately 10 minutes to listen to the examiner’s voice with each of the three experimental MAPs before speech recognition testing began. Each subject was seated in a sound-treated booth. All speech material was presented from a speaker located directly in front of the subject at a distance of 3 feet. For eight of the subjects, the City University of New York (CUNY) sentences were used. The two remaining subjects were tested using Hearing in Noise Test (HINT) sentences. Care was taken to ensure that the subjects had not previously been tested with the lists used. HINT sentences were used instead of CUNY sentences for two subjects because these two subjects had a great deal of prior experience with the CUNY sentence lists. Both sentence tests were presented from the recordings on the Cochlear Corporation Test Battery compact disk. Subjects were instructed to repeat back each of the sentences and were encouraged to guess if necessary.

Testing was conducted at two different presentation levels: 70 and 55 dB SPL as calibrated using a sound level meter with an “A” weighting scale held at the approximate position of the subject’s head.

Initially, a practice list of sentences was administered at 70 dB SPL and performance was assessed using the Measured MAP. If performance on this practice list was 80% or greater in quiet, eight-talker babble was added and the signal-to-noise ratio was varied until performance on a practice list fell below 80%. This process allowed us to avoid ceiling effects on the speech recognition measures. The competing babble was presented from the same speaker as the sentences. Once the level of babble needed to avoid ceiling effects with the Measured MAP was determined, speech perception was assessed for each of the three MAPs using this same signal-to-noise ratio. Speech perception was measured for all three MAPs at the 70 dB SPL presentation level. The order in which the MAPs were tested was randomized to minimize fatigue and learning effects. The testing procedure was then repeated at the 55 dB SPL presentation level. At this lower presentation level, none of the subjects required presentation of competing eight-talker babble to avoid ceiling effects.
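For illustration, the ceiling-avoidance logic can be sketched as follows (Python). The candidate signal-to-noise ratios and the callable used to score a practice list are placeholders; the text does not report the SNR step sequence that was used.

    def find_test_snr(run_practice_list, snr_candidates=(20, 15, 10, 5, 0), ceiling=80.0):
        """Return the SNR (dB) used for formal testing with all three MAPs.

        run_practice_list(snr) : callable returning percent correct on one
                                 practice list with the Measured MAP;
                                 snr=None means quiet (no babble).
        """
        if run_practice_list(None) < ceiling:
            return None                     # quiet is hard enough; no babble needed
        for snr in snr_candidates:          # progressively more difficult SNRs
            if run_practice_list(snr) < ceiling:
                return snr                  # first SNR that avoids the ceiling
        return snr_candidates[-1]           # fall back to the hardest SNR tried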

This procedure resulted in a total of 36 lists of CUNY sentences (approximately 1224 items per condition) presented to eight subjects. The remaining two subjects (I24–42 and I24–20) were tested using HINT sentences. Eighteen HINT sentence lists (approximately 318 items per condition) were presented to these two subjects. All subjects were given breaks as needed to avoid fatigue.

Results

Figure 2 illustrates the relationship between EAP thresholds and the measured T- and C-levels for each subject. The thick and thin lines represent the C- and T-levels that were measured using traditional programming techniques. These T- and C-levels were used to construct the Measured MAP. EAP thresholds are represented on this figure with open circles. These graphs illustrate the considerable intersubject variability. For some subjects, EAP thresholds fall close to C-levels (I24–42, I24–35b, and I24–12). For other subjects, EAP thresholds fall closer to T-levels (I24–20 and I24–40). For this subject pool, none of the EAP thresholds exceed the C-levels or fall below the T-levels.

Figure 2: EAP thresholds and the measured T- and C-levels for each subject are plotted as a function of the stimulating electrode.

Figure 3 shows the relationship between the +10/−20 MAP T- and C-levels and the Measured MAP T- and C-levels. Again, the solid lines represent the measured T- and C-levels, whereas the closed and open circles represent the +10/−20 MAP T- and C-levels. For several subjects, the +10/−20 MAP came very close to either the measured T-levels or the measured C-levels, but generally not both. The difference between the +10/−20 MAP C-levels and the measured C-levels averaged 7.41 programming units and ranged from −4.57 to 16.57 programming units. The difference between the +10/−20 MAP T-levels and the measured T-levels averaged 9.55 programming units and ranged from −25.95 to 14.57 programming units.

Figure 3: T- and C-levels used to make the +10/−20 MAP are shown with filled and open symbols, respectively, for each subject. The solid lines represent the measured T- and C-levels determined using traditional behavioral techniques.

Figure 4 shows the relationship between T- and C-levels for the Combined MAP and T- and C-levels for the Measured MAP. Again, the two solid lines represent the measured T- and C-levels. The open and filled circles represent the Combined MAP C- and T-levels, respectively. In general, the agreement between the measured and predicted T- and C-levels is better for the Combined MAP than for the +10/−20 MAP. For five subjects (I24–51, I24–35b, I24–20, I24–33, and I24–12), the Combined MAP T- and C-levels approximated the Measured MAP T- and C-levels for the majority of electrodes. The difference between the Combined MAP C-levels and the measured C-levels averaged 3.65 programming units and ranged from −3.71 to 9.33 programming units. The difference between the Combined MAP T-levels and the measured T-levels averaged 3.28 programming units and ranged from −5.71 to 9.24 programming units. For some subjects (I24–54b, I24–15, and I24–42), the Combined MAP T-levels were substantially below the measured T-levels for a portion of the electrode array. For the remaining subjects, the T-levels used to make the Combined MAP closely approximated the measured T-levels.

Figure 4: T- and C-levels used to make the Combined MAP are shown with filled and open symbols, respectively, for each subject. The solid lines represent the measured T- and C-levels determined using traditional behavioral techniques.

Figure 5 shows the speech perception results for all 10 subjects with each of the three experimental MAPs. The top graph illustrates results obtained at the higher presentation level (70 dB SPL). The lower graph illustrates results obtained at the lower presentation level (55 dB SPL). The symbols indicate the mean percent correct for all lists in each condition. The error bars indicate ±1 SE around the mean. Although there is considerable variability, at the 70 dB SPL level these subjects tended to score more poorly when using the +10/−20 MAP or the Combined MAP than when using the Measured MAP. The lower graph in Figure 5 shows the results obtained at the 55 dB SPL presentation level for eight of the subjects tested at the 70 dB SPL level. Again, there is considerable variability. Four subjects showed less than a 10% difference in performance across the three MAPs. Two subjects (I24–35b and I24–40) performed substantially worse using the +10/−20 MAP relative to either the Measured MAP or the Combined MAP.

Figure 5: Average speech recognition scores measured at 70 and 55 dB SPL using the Measured MAP and both experimental MAPs. Error bars indicate ±1 SE around the mean for each condition.

The average speech recognition scores for the 10 subjects participating in this study are presented in Figure 6. The black bar represents the Measured MAP results, and the light gray and dark gray bars represent the +10/−20 and Combined MAPs, respectively. At both presentation levels there is a tendency for the average performance of this subject group to be better with the Measured MAP than with either the +10/−20 or the Combined MAP. A repeated measures analysis of variance (ANOVA) showed that the mean differences in speech recognition at the 70 dB SPL presentation level were significant (p = 0.012). A post hoc Tukey test revealed that the speech recognition results obtained with the +10/−20 and Combined MAPs were significantly lower than the mean performance obtained with the Measured MAP (p < 0.05). No significant difference in speech performance was found between the +10/−20 and Combined MAPs. A repeated measures ANOVA of the results obtained at the lower presentation level revealed no significant differences in speech recognition among the three MAPs (p = 0.147).

Figure 6: Average performance of speech recognition measures using each of the three different MAPs. Error bars indicate ±1 SE around the mean.
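For readers who wish to reproduce this style of analysis, a minimal sketch using pandas and statsmodels is shown below. The data frame contains placeholder scores, not the study's data, and the column and variable names are ours; the post hoc Tukey comparison used in the paper is only indicated in a comment.

    import pandas as pd
    from statsmodels.stats.anova import AnovaRM

    # Long-format scores: one row per subject x MAP at a single presentation level.
    # The numbers below are placeholders, not the study's data.
    scores = pd.DataFrame({
        "subject": ["S1", "S1", "S1", "S2", "S2", "S2", "S3", "S3", "S3"],
        "map":     ["Measured", "+10/-20", "Combined"] * 3,
        "pct_correct": [82, 70, 74, 65, 60, 63, 90, 81, 85],
    })

    # One-way repeated measures ANOVA with MAP type as the within-subject factor.
    result = AnovaRM(scores, depvar="pct_correct",
                     subject="subject", within=["map"]).fit()
    print(result)
    # A post hoc Tukey test, as used in the paper, would then identify which
    # MAPs differ significantly from the Measured MAP.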

Discussion

For the subjects in this study, EAP thresholds were found to be within the dynamic range of the stimulus used to program the speech processor (see Fig. 2). None of the EAP thresholds exceeded the measured C-levels, nor did they fall below the measured T-levels. Brown et al. (2000) found that for 38% of the electrodes tested in their study, EAP thresholds were obtained at levels that exceeded the measured C-levels. Franck (1999) also reported that raw EAP thresholds exceeded C-levels for a percentage of subjects. The subject pool in the Brown et al. study, however, was much larger than the subject pool in this study (44 subjects as compared with 10 subjects). It is possible that the differences between the present study and the Brown et al. (2000) study are due to differences in sample size.

In general, neither approach to using NRT data to program the speech processor of the cochlear implant was 100% accurate (see Figs. 3 and 4). The Combined MAPs came closer to the Measured MAPs than the +10/−20 MAPs did. For subjects with small dynamic ranges (e.g., subject I24–40), the +10/−20 procedure for setting the T- and C-levels was particularly inappropriate. Additionally, examination of Figure 4 reveals that the difference between the Combined MAPs and the Measured MAPs was greatest for electrodes located on the apical half of the array. Despite the fact that the C-levels in some of the NRT-based MAPs exceeded the measured C-levels for some electrodes on the array (e.g., Fig. 3, subject I24–35b), none of the subjects remarked that the MAPs were uncomfortably loud.

The primary goal of this study was not to compare EAP thresholds with MAP levels, but to compare speech perception scores obtained by subjects who used these physiologically based MAPs with scores obtained using more traditionally programmed MAPs. In this study we found that subjects tended to perform more poorly with the two EAP-based MAPs relative to the Measured MAP. This difference was statistically significant at the 70 dB SPL presentation level. The mean trend was the same at the 55 dB SPL presentation level, but the differences were not large enough to reach statistical significance (see Fig. 6). However, only eight subjects were tested at the 55 dB SPL level, and the power of the repeated measures ANOVA used to evaluate these results was low. It is possible that the trend we observed at the lower presentation level would also have reached significance with more subjects.

One limitation of the present study is that the subjects were given very little time to listen to speech using these experimental MAPs before testing. Additionally, the Measured MAP was similar to the MAP that five of these subjects used on a daily basis. It is possible that the results would have been different if the subjects had been given a chance to become more familiar with the experimental MAPs before testing began. We would expect, however, that additional experience with the NRT-based MAPs would tend to decrease the observed difference in performance between the NRT-based MAPs and the MAPs constructed with standard programming techniques.

Results of this study are generally consistent with the results of previous studies. Both Dawson et al. (1997) and Skinner et al. (1999) reported significant differences in performance on tests of speech recognition when subjects were tested using MAPs where T- or C-levels were varied systematically.

Zeng and Galvin (1999) did not find significant differences in performance despite making fairly large changes in T-levels. It is possible, however, that the differences between this study and that of Zeng and Galvin (1999) were due to differences in the speech materials used or in the specific procedures used to manipulate the MAP dynamic range.

For some subjects, the T-levels for the +10/−20 or the Combined MAPs were below the T-levels for the Measured MAP (for example, I24–40, I24–35b, I24–20, I24–15, I24–42, and I24–54b). If the T-levels are set below the subject’s actual threshold levels, some speech cues may be inaudible, especially for soft speech. This may explain why some subjects’ speech recognition scores decreased when using the experimental MAPs at the 55 dB SPL presentation level. As can be seen in Figure 5, subjects I24–40 and I24–35b had a marked decrease in speech recognition with the +10/−20 MAP. Figure 3 shows that all of I24–40’s T-levels for the +10/−20 MAP were below the measured T-levels, as were most of the T-levels for subject I24–35b’s +10/−20 MAP. The +10/−20 MAP T-levels were, on average, 14.57 and 4.57 programming units below the measured T-levels for I24–40 and I24–35b, respectively, and both subjects showed decreased performance using the +10/−20 MAP. The Combined MAP corrected most of the T-levels for subjects I24–40 and I24–35b, which may explain the improvement in speech recognition relative to the +10/−20 MAP for these two subjects.

For subjects I24–15 and I24–54b, approximately one-third of the electrodes on the +10/−20 MAP were set well below the T-levels obtained using standard behavioral techniques, whereas about two-thirds of the electrodes on the Combined MAP were below the T-levels in the Measured MAP. This could explain why these two subjects performed somewhat better with the +10/−20 MAP than with the Combined MAP. The subjects who performed best with the +10/−20 MAP at the 55 dB SPL presentation level (I24–51, I24–33, and I24–12) all had T-levels that were set above their measured T-levels. As Skinner et al. (1999) have suggested, perhaps for these subjects the effect of raising the T-levels in the NRT-based MAPs was to make more of the low-level speech information audible, thereby increasing performance.

This study reports data obtained from postlingually deafened adults. Clearly, however, the population of cochlear implant users most likely to have their speech processors programmed using physiologic rather than behavioral thresholds is infants and very young children. Hughes et al. (2000) found that electrical dynamic ranges for children are larger than adult dynamic ranges and that, for children, EAP thresholds are recorded at approximately 53% of the dynamic range. Brown et al. (2000) found that for adults, EAP thresholds are recorded at approximately 91% of the dynamic range. This difference is largely due to the fact that children tend to have higher C-levels, on average, than postlingually deafened adults do (Hughes et al., 2000). These data suggest that if the mapping techniques described in this study were applied to children, relatively few instances of overstimulation would occur. To date, no information is available that describes the speech perception skills of children fitted using NRT-based techniques. Future research exploring the efficacy of EAP-based MAPs in children would seem warranted.

Conclusion

This study has demonstrated that for this group of postlingually deafened adults, speech perception scores obtained using MAPs based solely or primarily on EAP thresholds tended to be lower than speech perception scores obtained using MAPs constructed using standard behavioral techniques. This difference, although statistically significant at one presentation level, was not large. The importance of this finding is that although MAPs for young children may not be perfect, they may be sufficient to sustain adequate levels of speech perception to allow for speech and language development.

Acknowledgments:

This study was funded by research grant DC00242 from the NIH/NIDCD; grant RR59, General Clinical Research Centers Program, NIH; and the Iowa Sight and Hearing Foundation. This paper is based on a master’s thesis completed at the University of Iowa by the first author. The authors wish to thank the patients who participated in this study, and Michelle Hughes, Abigail Wolaver, Beth Wahl, Anne Gehringer, Wendy Parkinson, Mary Lowder, and Aaron Parkinson for their assistance with NRT data collection and programming support.

References

1. Abbas, P. J., Brown, C. J., Shallop, J. K., Firszt, J. B., Hughes, M. L., Hong, S. H., & Staller, S. J. (1999). Summary of results using the Nucleus CI24M implant to record the electrically evoked compound action potential. Ear and Hearing, 20, 45–59.
2. Brown, C. J., Abbas, P. J., & Gantz, B. (1990). Electrically evoked whole-nerve action potentials: Data from human cochlear implant users. Journal of the Acoustical Society of America, 88, 1385–1391.
3. Brown, C. J., Abbas, P. J., & Gantz, B. J. (1998). Preliminary experience with neural response telemetry in the Nucleus CI24M cochlear implant. American Journal of Otology, 19, 320–327.
4. Brown, C. J., Hughes, M. L., Luk, B., Abbas, P.J., Wolaver, A., & Gervais, J. (2000). The relationship between EAP and EABR thresholds and levels used to program the Nucleus CI24M speech processor: Data from adults. Ear and Hearing, 21, 151–163.
5. Dawson, P. W., Skok, M., & Clark, G. M. (1997). The effect of loudness imbalance between electrodes in cochlear implant users. Ear and Hearing, 18, 156–165.
6. Franck, K. H. (1999). The electrically evoked whole-nerve action potential: Fitting applications for cochlear implant users. Unpublished doctoral dissertation, University of Washington.
7. Hughes, M. L., Brown, C. J., Abbas, P. J., Wolaver, A. A., & Gervais, J. P. (2000). Comparison of EAP thresholds to MAP levels in the Nucleus CI24M cochlear implant: Data from children. Ear and Hearing, 21, 164–174.
8. Skinner, M. W., Holden, L. K., Holden, T. A., & Demorest, M. E. (1999). Comparison of two methods for selecting minimum stimulation levels used in programming the Nucleus 22 cochlear implant. Journal of Speech, Language, and Hearing Research, 42, 814–828.
9. Skinner, M. W., Holden, L. K., Holden, T. A., Demorest, M. E., & Fourakis, M. S. (1997). Speech recognition at simulated soft, conversational, and raised-to-loud vocal efforts by adults with cochlear implants. Journal of the Acoustical Society of America, 101, 3766–3782.
10. Zeng, F., & Galvin, J. J.,III. (1999). Amplitude mapping and phoneme recognition in cochlear implant listeners. Ear and Hearing, 20, 60–74.