Cochlear implant (CI) sound processing strategies are usually evaluated in clinical studies involving experienced implant recipients. Metrics that estimate the capacity to perceive speech for a given set of audio and processing conditions provide an alternative means of assessing the effectiveness of processing strategies. The aim of this research was to assess the ability of the output signal-to-noise ratio (OSNR) to accurately predict speech perception. It was hypothesized that, compared with the other metrics evaluated in this study, (1) OSNR would have equivalent or better accuracy and (2) OSNR would be the most accurate when speech presentation levels varied.
In this retrospective study, the accuracy of OSNR as a predictor of speech intelligibility was compared, for the first time, with that of the input signal-to-noise ratio (ISNR) and the short-term objective intelligibility (STOI) metric. Because STOI measured audio quality at the input to a CI sound processor, a vocoder was applied to the sound processor output, and STOI was also calculated for the reconstructed audio signal (vocoder short-term objective intelligibility [VSTOI] metric). Three figures of merit were calculated for each metric: the Pearson correlation of the metric and a psychometric function fitted to sentence scores at each predictor value (Pearson sigmoidal correlation [PSIG]), the epsilon-insensitive root mean square error (RMSE*) between the psychometric function and the sentence scores, and the statistical deviance (D) of the fitted curve from the sentence scores. Sentence scores were taken from three existing data sets of Australian Sentence Test in Noise (AuSTIN) results. The AuSTIN tests were conducted with experienced users of the Nucleus CI system. The score for each sentence was the proportion of morphemes the participant repeated correctly. In data set 1, all sentences were presented at 65 dB sound pressure level (SPL) in four-talker babble noise. Each block of sentences used an adaptive procedure in which the speech level was fixed and the ISNR was varied. In data set 2, sentences were presented at 65 dB SPL in stationary speech-weighted noise, street-side city noise, and cocktail party noise, again with an adaptive ISNR procedure. In data set 3, sentences were presented at levels ranging from 55 to 89 dB SPL with two automatic gain control configurations and two fixed ISNRs.
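The three figures of merit can be illustrated with a minimal Python sketch. It is not the study's implementation: the two-parameter logistic form, the grid-search fit, the tolerance `eps = 0.05`, the number of items per score `n_items`, and the example data are all assumptions made for illustration. PSIG is taken here, as one plausible reading, to be the Pearson correlation between the sigmoid-mapped predictor values and the observed sentence scores.

```python
import math


def logistic(x, midpoint, slope):
    """Two-parameter sigmoid: predicted proportion correct at predictor value x."""
    return 1.0 / (1.0 + math.exp(-slope * (x - midpoint)))


def fit_logistic(xs, scores):
    """Coarse least-squares grid search over midpoint and slope (illustrative only)."""
    best = None
    for m10 in range(-200, 201):        # midpoint from -20.0 to 20.0 in 0.1 steps
        for s100 in range(5, 301, 5):   # slope from 0.05 to 3.00 in 0.05 steps
            m, s = m10 / 10.0, s100 / 100.0
            err = sum((logistic(x, m, s) - y) ** 2 for x, y in zip(xs, scores))
            if best is None or err < best[0]:
                best = (err, m, s)
    return best[1], best[2]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def eps_insensitive_rmse(predicted, observed, eps):
    """RMSE in which each absolute residual is first reduced by eps, floored at zero."""
    sq = [max(abs(p - o) - eps, 0.0) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(sq) / len(sq))


def binomial_deviance(predicted, observed, n_items):
    """Binomial deviance of observed proportions from predicted probabilities."""
    total = 0.0
    for p, y in zip(predicted, observed):
        p = min(max(p, 1e-9), 1.0 - 1e-9)  # keep the logarithms finite
        if y > 0.0:
            total += y * math.log(y / p)
        if y < 1.0:
            total += (1.0 - y) * math.log((1.0 - y) / (1.0 - p))
    return 2.0 * n_items * total


# Hypothetical data: predictor values (e.g. an SNR in dB) and mean sentence scores.
snr_db = [-6.0, -3.0, 0.0, 3.0, 6.0]
scores = [0.10, 0.30, 0.50, 0.80, 0.95]

midpoint, slope = fit_logistic(snr_db, scores)
predicted = [logistic(x, midpoint, slope) for x in snr_db]

psig = pearson(predicted, scores)
rmse_star = eps_insensitive_rmse(predicted, scores, eps=0.05)
rmse_plain = eps_insensitive_rmse(predicted, scores, eps=0.0)
deviance = binomial_deviance(predicted, scores, n_items=10)

print(f"midpoint={midpoint:.1f} dB, slope={slope:.2f}/dB")
print(f"PSIG={psig:.3f}, RMSE*={rmse_star:.3f}, D={deviance:.3f}")
```

Under these assumptions, a metric that predicts scores well yields a PSIG near 1 and small RMSE* and D. Setting `eps` to zero recovers the ordinary RMSE, so RMSE* can never exceed it.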
For data set 1, ISNR and OSNR were jointly the most accurate metrics. STOI differed significantly for deviance (p = 0.045) and RMSE* (p < 0.001), and VSTOI differed significantly for RMSE* (p < 0.001). For data set 2, ISNR and OSNR had equivalent accuracy, which was significantly better than that of STOI for PSIG (p = 0.029) and that of VSTOI for deviance (p = 0.001), RMSE*, and PSIG (both p < 0.001). For data set 3, OSNR was the most accurate metric and was significantly more accurate than VSTOI for deviance, RMSE*, and PSIG (all p < 0.001). ISNR and STOI were unable to predict the sentence scores for this data set.
The study results supported the hypotheses. For tests conducted at a fixed presentation level and variable ISNR, OSNR was as accurate as or more accurate than ISNR, STOI, and VSTOI. For tests with fixed ISNRs and variable presentation levels, OSNR was more accurate than VSTOI. Overall, OSNR was the most accurate metric across the three data sets. OSNR holds promise as a prediction metric that could improve the effectiveness of sound processor research and CI fitting.