
Research Articles

Factors Influencing Speech Production in Elementary and High School-Aged Cochlear Implant Users

Tobey, Emily A.1,2; Geers, Ann E.1,2; Sundarrajan, Madhu1; Shin, Sujin1

doi: 10.1097/AUD.0b013e3181fa41bb


Teaching a child with a profound sensorineural hearing loss (SNHL) of 90 dB HL or greater to use spoken language as the primary mode of communication represents a significant clinical challenge. When profound SNHL occurs within the early years of life, children are deprived of critical auditory stimulation during time periods associated with substantial development of the neural architecture of the auditory system. Auditory system development seems to lay the foundation not only for processing speech signals to form language constructs but also for producing speech signals that are understood by other listeners who interact with the speaker with SNHL (Kent et al. 1989; Tobey et al. 2000). Exactly how auditory feedback stimulates and refines the development of intelligible speech in children with profound SNHL remains elusive; yet diminished or absent auditory information during the early formative years results in a wide array of poor spoken outputs. Spoken outputs from children with profound SNHL range from utterances understood by familiar and unfamiliar listeners to communication supported by sign systems, such as Signing Exact English or Cued Speech, or by languages such as American Sign Language (Tobey & Geers 1995). The challenge of providing spoken communication to children with profound SNHL continues to command clinical attention because most children with SNHL have normal-hearing families who want their children to participate in their “family communities” (Geers & Brenner 2003).

In the early 1990s, cochlear implants (CIs) were approved by the Food and Drug Administration (FDA) as an intervention for children with profound SNHL. In the ensuing years, candidacy criteria for a CI shifted from very conservative considerations of residual hearing and chronological age to less stringent criteria allowing greater degrees of residual hearing and implantation at younger ages (Clark 2003). Yet questions remain regarding the long-term impact of CIs on spoken communication. Spoken communication involves a bidirectional interaction between a speaker and a listener. It is often evaluated using rather global methods such as judgments of speech intelligibility, the “degree to which a speaker's intended message is recovered by the listener” (Kent et al. 1989). Speech intelligibility in young children with severe-profound SNHL who use conventional hearing aids hovers around 20% (Smith 1975). Speech intelligibility in young children with severe-profound SNHL who use CIs improves with increasing experience, reaching plateaus of 80 to 90% intelligibility around 8 to 10 yrs postimplantation (Mondain et al. 1997; Chin et al. 2003; Peng et al. 2004a; Uziel et al. 2007). Greater gains in speech intelligibility are also associated with the evolving sophistication of CI technology: speech intelligibility is higher for children who use the newest technology and whose audiologists maximize auditory performance through appropriate mapping techniques (Tobey & Geers 1995; Tobey et al. 2000). Improvement in speech intelligibility occurs regardless of whether intelligibility is measured with minimal pair words (Monsen 1981), key words in sentences (McGarr 1981), rating scales (Nikolopoulos et al. 2005), or the total number of words correctly identified (McGarr 1981; Tobey et al. 2000; Chin et al. 2003).

Moreover, studies of children conducted in several languages have found that speech intelligibility improves postimplantation, suggesting that the electrical signal provided by the CI promotes the development of fundamental speech production actions and their associated feedback consequences (Dawson et al. 1995; Mondain et al. 1997; Vieu et al. 1998; Peng et al. 2004b; Nikolopoulos et al. 2005; Law & So 2006; Uziel et al. 2007). Changes in speech intelligibility postimplantation are fueled by the interaction of these feedback-driven changes in speech gestures with the scaffolding supplied by new auditory opportunities to engage with complex and rich linguistic environments. Inherent in the global speech intelligibility improvements are changes to discrete elements of the spoken message, including increased accuracy of consonant and vowel production (Serry et al. 1997; Serry & Blamey 1999; Chin 2003, 2007; Tobey et al. 2003; Peng et al. 2004; Law & So 2006; Tomblin et al. 2008), reductions in overall sentence durations toward more nearly normal values (Tobey et al. 2003; Uchanski & Geers 2003), and use of more nearly normal suprasegmental aspects of speech (Uchanski & Geers 2003). Children with CIs who have poorer speech intelligibility produce more substitutions of one sound for another, more omissions of sounds in all word positions, and longer sentences composed of elongated syllables and pauses.

Evaluating the rate of speech intelligibility improvement postimplantation is often confounded by the interaction of factors related to age at onset of deafness, duration of deafness, age at implantation, amount of residual hearing, and the “hearing” age of the child. Newborn hearing screening programs strive to identify hearing loss and provide intervention early within a child's first years. Short periods of deafness and early implantation are associated with higher speech intelligibility for children implanted within the first few years of life (Serry & Blamey 1999; Tobey et al. 2000; Chin et al. 2001; Uchanski & Geers 2003). Speech intelligibility is positively related to residual hearing: higher levels of speech intelligibility are evident in children with greater levels of residual hearing (Smith 1975; Tobey et al. 2000). Yet the rate of change in speech intelligibility after cochlear implantation is not as steep as that observed in children with typical hearing (Chin et al. 2003). Several investigators suggest using a “hearing” age rather than chronological age to tease apart the effects of age at implantation from chronological age effects (Tomblin et al. 2008; Warner-Czyz et al. 2010). Several studies suggest that the greatest growth in speech production occurs within the first 6 yrs of use and that performance plateaus at hearing ages of around 8 yrs in pediatric CI users (Blamey et al. 2001a,b; Tomblin et al. 2008).

The data reported here address a number of factors associated with how well children with profound SNHL who received CIs during the early FDA approval period use spoken communication. This report focuses on spoken communication measured at early elementary and high school ages in one of the first large groups of children to receive CIs after FDA approval. We focus on their ability, as teenagers, to be understood by unfamiliar listeners in quiet and noisy situations and to produce sounds correctly as judged by trained listeners. We examine these measures in relation to other critical factors thought to influence spoken communication performance. These factors include student characteristics such as duration of deafness, nonverbal performance intelligence quotient (PIQ), family socioeconomic status, family size, and gender. We also take into account their ability to hear with the CI, as indexed by their averaged aided thresholds, their auditory memory, and the extent to which their communication is enhanced by the addition of visually based signed communication. We extend previous investigations not only by examining a large population with more than 10 yrs of experience with CIs but also by evaluating an additional method of measuring speech intelligibility in adolescents. Because we anticipated high levels of speech intelligibility in the adolescents, we measured intelligibility first with our previous technique of presenting their spoken sentences in quiet to unfamiliar listeners and then by digitally embedding the same sentences in a multispeaker babble (MSB) background. Multispeaker backgrounds simulate situations in which a teenager must convey a message in a noisy setting such as a restaurant, sporting event, or social event. The MSB was added “off-line,” allowing us to evaluate the speech without “on-line” adjustments by individual speakers to their rate, intensity, and clarity. Previous studies indicate that speech intelligibility is reduced under these conditions, but the extent of the reduction remains unclear, as do the factors related to it¹ (Gould et al. 2001).



One hundred ten adolescent users of CIs (CI-HS) who participated in an earlier study conducted when the children were 8 and 9 yrs of age (CI-E test session) comprise the follow-up sample (Tobey et al. 2000, 2003). A detailed description of the CI-HS teenagers is contained in the article by Geers et al. (2011, this issue, pp. 2S–12S). The average age of the CI-HS participants was 16.7 yrs with a range of 15.0 to 18.5 yrs. Fifty-nine members of the CI-HS sample were female. Mean length of deafness of the CI-E participants was 3.1 yrs (SD 1.16 yrs) with a range between 4 mos and 5.4 yrs. Average experience with the CI at the CI-HS session was 13.3 yrs with a range of 10.8 to 15.7 yrs. The average family size at CI-E and CI-HS testing was 4.2 and 4.0 members, respectively. Socioeconomic status remained comparable for the participants across the two test sessions. All participants and their families signed consent and assent forms approved by the Institutional Review Board of the University of Texas at Dallas.

Average nonverbal PIQ, measured by the Wechsler Intelligence Scale for Children-Performance Scale (WISC) (Wechsler 1991), was 103.2 (SD 14) at the CI-E session and 103.1 (SD 15.97) at the CI-HS session. Speech perception scores on the Lexical Neighborhood Test (Kirk et al. 2000), presented at 70 dB SPL, averaged 50.3% in elementary school and 58.1% in high school. Average scores on the Bamford Kowal Bench sentences (Bamford & Wilson 1979), also presented at 70 dB SPL, were 63.2% at CI-E and 80.5% at CI-HS. Seventy-three percent of the CI-HS adolescents report using oral methods of communication (OC) in their environment, and 27% report incorporating signs into their communication (SC).

Speech Production Measures

Speech Intelligibility at CI-E and CI-HS

Digital audio tape and audio recordings were made of 36 sentences of three, five, and seven syllables (McGarr 1981). Each sentence contained a key word selected from a pool of words that predicted speech intelligibility in deaf children. The 18 words ranked lowest in intelligibility occurred in sentences designated as “low” context, and the 18 words ranked highest occurred in sentences designated as “high” context. Participants were shown a card with the sentence written out, prompted with a spoken or signed elicitation of the sentence, and encouraged to repeat the sentence orally. The microphone was placed a foot in front of the participant's mouth. Recordings were edited under computer control into individual files, from which the overall duration of each sentence was calculated. In addition, the individual sentences produced by the CI-HS participants were embedded in MSB produced by a male and a female speaker reading passages.¹ The duration of the babble varied across samples so that 3 secs of babble preceded the target sentence and 2 secs followed it. A brief 0.2-sec, 1-kHz tone was presented before the sentence. Measures derived without background babble are referred to as quiet (Q). Samples were collected during both the CI-E and CI-HS test sessions.
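The offline babble embedding described above can be sketched as follows. This is a minimal illustration, not the processing chain used in the study: the sampling rate, the signal-to-babble ratio (`snr_db`), the tone amplitude, and the placement of the tone at the very start of the file, ahead of the 3-sec babble lead-in, are all hypothetical choices, because the text does not report them.

```python
import numpy as np

FS = 16_000  # hypothetical sampling rate (Hz); not reported in the study


def rms(x):
    """Root-mean-square amplitude of a signal."""
    return float(np.sqrt(np.mean(np.square(x))))


def embed_in_babble(sentence, babble, snr_db, fs=FS,
                    lead_s=3.0, tail_s=2.0, tone_s=0.2, tone_hz=1000.0):
    """Embed a recorded sentence in multispeaker babble (MSB), offline.

    Layout (an assumption): 0.2 s 1 kHz tone, then 3 s of babble alone,
    then the sentence mixed with babble, then 2 s of babble alone.
    The babble is scaled so the sentence-to-babble RMS ratio equals the
    hypothetical snr_db.
    """
    sentence = np.asarray(sentence, dtype=float)
    lead, tail = round(lead_s * fs), round(tail_s * fs)
    total = lead + len(sentence) + tail
    if len(babble) < total:
        raise ValueError("babble track is shorter than the required stimulus")
    noise = np.asarray(babble[:total], dtype=float)
    # Scale babble so 20*log10(rms(sentence) / rms(noise)) == snr_db
    noise = noise * rms(sentence) / (rms(noise) * 10 ** (snr_db / 20))
    mix = noise.copy()
    mix[lead:lead + len(sentence)] += sentence
    t = np.arange(round(tone_s * fs)) / fs
    tone = 0.5 * np.sin(2 * np.pi * tone_hz * t)  # hypothetical tone level
    return np.concatenate([tone, mix])
```

Because the babble is mixed after recording, the speaker never hears it and cannot adjust rate, intensity, or clarity in response, which is the property the MSB condition relies on.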

Judgments of the stimuli were obtained from normal-hearing adults, each of whom heard a given sentence, and a given child, only once. Judges were instructed to write down as much of the sentence as they understood. Three judges responded to each sentence, yielding 108 judgments (36 key words × 3 judges) per child. Judges were recruited from the student population of the University of Texas at Dallas and from the Dallas community. All judges signed consent forms approved by the Institutional Review Board of the University of Texas at Dallas. Key words correctly identified across these judgments were totaled and served as the dependent variables.
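A minimal sketch of the key word scoring, assuming each listener's written response has already been reduced to a correct/incorrect decision for the sentence's key word. The record type and field names are hypothetical. With 36 sentences and three judges, the totals come to 108 judgments, 54 per context.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    """One listener's response to one recorded sentence (hypothetical record format)."""
    sentence_id: int
    context: str            # "high" or "low" context sentence
    judge: int              # 0, 1, or 2 (three judges per sentence)
    key_word_correct: bool


def key_word_scores(judgments):
    """Return (% key words correct per context, total correct, total judgments)."""
    correct = {"high": 0, "low": 0}
    total = {"high": 0, "low": 0}
    for j in judgments:
        correct[j.context] += int(j.key_word_correct)
        total[j.context] += 1
    percent = {ctx: 100.0 * correct[ctx] / total[ctx] for ctx in total if total[ctx]}
    return percent, sum(correct.values()), sum(total.values())
```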

Duration Measures at CI-E and CI-HS

Stimuli were initially analyzed using a waveform display to identify the initial and final zero crossings associated with the sentences. Dependent variables used in the analyses of this report were the average durations of the 12 seven-syllable sentences from the high- and low-context sentences.
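The study identified sentence boundaries by inspecting a waveform display. The sketch below is a hypothetical automation of that manual step, offered only to make the measurement concrete: it finds the first and last samples reaching a chosen fraction of the peak amplitude (`rel_threshold` is an assumed parameter) and widens the interval outward to the nearest zero crossing on each side.

```python
import numpy as np


def sentence_duration(x, fs, rel_threshold=0.05):
    """Estimate sentence duration (s) between bracketing zero crossings.

    Falls back to the threshold samples themselves when no zero crossing
    exists outside them (e.g., digital silence padding the recording).
    """
    x = np.asarray(x, dtype=float)
    peak = np.max(np.abs(x))
    if peak == 0:
        return 0.0  # silent file: nothing to measure
    active = np.flatnonzero(np.abs(x) >= rel_threshold * peak)
    first, last = active[0], active[-1]
    negative = np.signbit(x)
    # Index i is a crossing when the sign differs between samples i and i + 1
    crossings = np.flatnonzero(negative[:-1] != negative[1:])
    before = crossings[crossings < first]
    after = crossings[crossings >= last]
    start = before[-1] if before.size else first
    end = after[0] + 1 if after.size else last
    return (end - start) / fs
```

The dependent variable would then be the mean over a child's 12 seven-syllable sentences, for example `np.mean([sentence_duration(w, fs) for w in seven_syllable_waves])`, where `seven_syllable_waves` is a hypothetical list of recordings.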

Consonant Accuracy at CI-E and CI-HS

Speech-language pathologists transcribed the speech intelligibility sentences following previously reported procedures (Shriberg & Lof 1991; Serry & Blamey 1999; Tobey et al. 2003). Comparing the phonemic transcriptions with the target sentences yielded phoneme accuracy scores, computed in a software program, CASALA (Computerized Aided Speech and Language Analysis) (Blamey et al. 2001). Consonant correct dependent variables were generated using a percentage of consonants correct-revised (PCC-R) criterion (Campbell et al. 2007): substitutions and omissions were counted as errors, and allophonic variations or distortions were counted as correct. An analogous measure was adopted for vowels (percentage of vowels correct-revised), again scoring allophonic variations as correct. Transcribers were recalibrated periodically to reduce transcriber “drift.”
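The revised scoring rule can be illustrated with the short sketch below. It is not CASALA. It assumes the transcription has already been aligned to the target sentence and each target segment coded as correct, distortion, substitution, or omission; the code labels are hypothetical. The same function would serve for the vowel analogue.

```python
from collections import Counter

VALID_CODES = {"correct", "distortion", "substitution", "omission"}
# Revised criterion: distortions/allophonic variants count as correct
SCORED_CORRECT = {"correct", "distortion"}


def percent_correct_revised(codes):
    """Percent of target segments scored correct under the revised criterion.

    codes: one code per target consonant (or vowel), from an aligned
    transcription of the child's production.
    """
    counts = Counter(codes)
    unknown = set(counts) - VALID_CODES
    if unknown:
        raise ValueError(f"unexpected codes: {sorted(unknown)}")
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no target segments to score")
    return 100.0 * sum(counts[c] for c in SCORED_CORRECT) / n
```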

Use of Speech Questionnaire at CI-E

Parents of CI-E participants completed a “use of speech” questionnaire during the elementary school test session (Tobey et al. 2003). Parents indicated on a 5-point scale (ranging from completely understood to never understood) how well their child was understood by familiar and unfamiliar listeners. A speech usage score was acquired by averaging the item scores.

Measures of Sign Enhancement

Test results at CI-E and CI-HS were used to create metrics reflecting the extent to which a student's language improved when sign language was added to spoken language (i.e., sign enhancement). A complete description of these metrics is in the article by Geers et al. (2011, this issue, pp. 2S–12S).

Language Samples—CI-E

An estimate of sign enhancement for the CI-E group was derived from two spontaneous language samples obtained at the age of 8 or 9 yrs (Geers et al. 2003). Each child was videotaped in two semi-structured interviews with an unfamiliar adult female. One interview was conducted in an OC-only mode and the other in SC. Word-for-word transcriptions were created using the CHAT format developed by the CHILDES project (MacWhinney 2010). Four dependent variables were calculated. Number of words per utterance served as a measure of utterance length. Number of different words per minute served as a measure of lexical diversity. The Index of Productive Syntax (IPSyn) Noun Phrases and Sentence Complexity scores served as measures of syntactic complexity. Sign enhancement was estimated by computing the ratio of each language score obtained in the OC interview to the corresponding score obtained in the SC interview. The average of these ratios is referred to as the “CI-E Sign Enhancement Ratio.” Ratios <1.0 indicate better performance on the language measures in the SC than in the OC interview. Ratios close to or above 1.0 indicate either better language in the OC interview or no difference in language use across the two conditions.
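The ratio computation amounts to the sketch below, assuming each child's four language scores from each interview are available in a dictionary. The key names are hypothetical labels for the measures named in the text (words per utterance, different words per minute, and the two IPSyn scores).

```python
from statistics import mean

# Hypothetical keys for the four language measures described in the text
MEASURES = (
    "words_per_utterance",
    "different_words_per_minute",
    "ipsyn_noun_phrases",
    "ipsyn_sentence_structure",
)


def sign_enhancement_ratio(oc_scores, sc_scores):
    """Average of the OC/SC ratios across the four language measures.

    Values below 1.0 mean the child's language was better in the SC
    interview; values near or above 1.0 mean no benefit from adding sign.
    """
    ratios = [oc_scores[m] / sc_scores[m] for m in MEASURES]
    return mean(ratios)
```

For example, a child whose OC scores are 80% of the SC scores on three measures and equal on the fourth would have a ratio of 0.85.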

Peabody Picture Vocabulary Test-III (Dunn & Dunn 1997) Difference Score—CI-HS

The Peabody Picture Vocabulary Test-III (PPVT) was used to assess one-word receptive vocabulary at the CI-HS session. Examiners provided a label and the student selected the picture designating the label. Form IIIA was administered using standard administration of spoken stimulus words (OC administration). Form IIIB was administered using sign or finger spelling to accompany the spoken word (SC administration). Dependent variables were standard scores based on a normative sample of typically developing children (NS-TD). The standard scores from the PPVT administered with OC were subtracted from the standard scores from the PPVT administered with SC to form the Adolescent Sign Enhancement variable.

Working Memory

Digit Span Measure

Short-term information processing/storage capacity was measured with the digit span subtest of the WISC-III at both the CI-E and CI-HS sessions (Pisoni & Cleary 2003). Administration followed established guidelines for the WISC-III test procedures (Wechsler 1991). The task requires students to repeat lists of digits spoken by an experimenter at a rate of approximately one digit per second. The face of the clinician was visible to the student. The lists began with two digits and increased in length until a student was unable to correctly repeat two lists of a given length. Both forward and backward spans were obtained. A detailed description of these variables is in the article by Pisoni et al. (2011, this issue, pp. 60S–74S). Dependent variables were the longest series correctly repeated forward at least once, the total raw score for digits forward, the WISC scaled score relative to the normative sample, and total raw score for digits repeated forward and backward.
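The raw scores and longest span can be sketched as follows, assuming the WISC-III convention of two trials per list length, one raw-score point per correctly repeated trial, and discontinuation after both trials at a length are failed. The function and tuple layout are hypothetical, and the scaled score is omitted because it requires the published age norms.

```python
def score_digit_span(trials):
    """Score one digit span direction (forward or backward).

    trials: (list_length, passed_trial_1, passed_trial_2) tuples in
    administration order, starting at two digits.
    Returns (raw_score, longest_length_passed_at_least_once).
    """
    raw, longest = 0, 0
    for length, first, second in trials:
        raw += int(first) + int(second)
        if not (first or second):
            break  # discontinue rule: both trials at this length failed
        longest = max(longest, length)
    return raw, longest
```

The combined forward-and-backward raw score is then the sum of the two directions' raw scores.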

Speech Perception

Video Game Test of Speech Pattern Contrast Perception (Boothroyd 1997)—CI-E

This test provides a nonlinguistic measure of a child's ability to discriminate specific phoneme contrasts. The participant must detect a contrast between two syllables. One syllable serves as a standard stimulus that forms a repeating background, and the participant responds by pressing a key when the different stimulus occurs. The Video Game Test of Speech Pattern Contrast Perception (VIDSPAC) stimuli consisted of two vowel contrasts, tee/too and tee/taa; two place contrasts, daa/gaa and saa/shaa; two voicing contrasts, daa/taa and saa/zaa; and two manner contrasts, daa/zaa and saa/taa. VIDSPAC scores were adjusted for random responding, and the following corrected scores constituted the dependent variables: (a) total contrasts (i.e., consonants and vowels) correctly detected, (b) consonant contrasts correctly detected, (c) vowel contrasts correctly detected, and (d) consonant manner contrasts correctly detected.

Aided Threshold Average—CI-HS

Aided sound-field detection thresholds were obtained using frequency modulated tones at octave frequencies from 250 to 4000 Hz. A detailed description of the methods is contained in the monograph by Davidson et al. (2011, this issue, pp. 19S–26S).


Table 1 describes the mean, SDs, and range of performance for each dependent variable as a function of test session (CI-E versus CI-HS) and gender. Speech intelligibility in quiet at both the CI-E and CI-HS test sessions was higher for key words in high-context than in low-context sentences for both genders (F[1,214] = 14.35, p < 0.0001). The high-context advantage disappeared in the MSB condition at CI-HS, where intelligibility scores were equivalent for high and low contexts. Female participants performed better on each intelligibility measure at each time period (F[1,214] = 7.09, p < 0.008). Consonant correct performance was significantly higher at the CI-HS than at the CI-E session (F[1,214] = 122.12, p < 0.0001). Sign enhancement at CI-E yielded ratios of 0.8 to 0.96 for the language samples, whereas Adolescent Sign Enhancement was 3.7 for females and 3.1 for males. Parents reported similar ratings of speech use for both genders at the CI-E session. VIDSPAC scores were higher for females than for males for all CI-E values except vowels. As reported in detail elsewhere in the monograph by Davidson et al. (2011, this issue, pp. 19S–26S), average performance on the Lexical Neighborhood Test and Bamford Kowal Bench sentences presented at 70 dB SPL was higher at the CI-HS session than at the CI-E session. The average aided threshold was 30.4 (SD 9.6). Duration of the intelligibility sentences was significantly shorter at the CI-HS session than at the CI-E session, both for total durations and for the seven-syllable items (F[1,214] = 100.63, p < 0.0001). Forward digit span increased by two items, on average, at the CI-HS session; a detailed description is in the article by Pisoni et al. (2011, this issue, pp. 60S–74S).

Table 1. Means, SDs, and ranges of participants' performance at the CI-E and CI-HS test sessions as a function of gender

Table 2 provides the intercorrelation matrices summarizing five sets of measures. The high correlation coefficients within each set suggested that the measurements could be reduced to single standardized variables using principal component analysis (PCA). The first set comprised speech production measures acquired at the CI-E test session: normal-hearing listeners' judgments of high- and low-context key words in the sentences, consonant accuracy, and parental ratings of how their child used speech. As indicated by the values in Table 3, all coefficients were highly significant, suggesting that these variables could be reduced to a single variable, “Early Speech Production,” using PCA. We also examined how consonant accuracy and normal-hearing listeners' judgments of high- and low-context key words in the McGarr sentences, under optimal quiet conditions and under degraded listening conditions with MSB, were related at the CI-HS test session. The highly significant coefficients suggested that these variables could be reduced to a single variable, “Adolescent Speech Production,” using PCA.
Four measures representing the ratio of OC to SC performance at the CI-E session (words per utterance, different words per minute, noun phrases, and sentence structure scores) were also highly related and were reduced to a single variable, “Early Sign Enhancement.” The four measures acquired on the VIDSPAC at the CI-E session (total, vowel, consonant, and manner contrasts) were highly related and reduced to a single variable, “Early Speech Perception.” CI-E performance on the memory measures (raw score for forward span, total scaled score, longest forward digit span, and raw score for forward and backward spans) was also highly related and reduced to a single component, “Early Working Memory.” A single component, “Adolescent Working Memory,” was obtained from the same, highly related digit span measures collected at the CI-HS test session.
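To make the data reduction concrete, the sketch below extracts the first principal component of the correlation matrix of a set of measures and returns unit-variance component scores. It assumes complete data and measures coded so that higher values mean better performance (a measure such as sentence duration would need its sign reversed first). The study does not report its software or how component scores were computed, so the function name and scoring choice are assumptions.

```python
import numpy as np


def first_principal_component(X):
    """Reduce intercorrelated measures to one standardized component.

    X: participants x measures array. Returns (scores, loadings,
    proportion_of_variance) for the first principal component of the
    correlation matrix.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)       # correlation matrix of the measures
    eigvals, eigvecs = np.linalg.eigh(R)   # eigenvalues in ascending order
    lam, v = eigvals[-1], eigvecs[:, -1]
    if v.sum() < 0:  # eigenvector sign is arbitrary; orient toward "higher = better"
        v = -v
    loadings = v * np.sqrt(lam)            # correlation of each measure with the component
    scores = Z @ v / np.sqrt(lam)          # component scores with unit variance
    return scores, loadings, lam / R.shape[0]
```

Applied, for instance, to the four CI-E digit span measures, `scores` would play the role of an "Early Working Memory" variable, and `loadings` would correspond to the kind of factor loadings reported in Table 3.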

Table 2. Intercorrelations among variables used for the principal component analyses
Table 3. Principal component factor loadings

The Adolescent Speech Production variable was evaluated with multiple regression analyses using characteristics of the participants and their families as predictor variables. These included two variables that remained constant across the two test sessions, duration of deafness and gender, and three that may change between the CI-E and CI-HS sessions: nonverbal PIQ, family size, and socioeconomic status. In addition, we considered three further measures: the average aided thresholds and the Adolescent Sign Enhancement variable, both acquired at the CI-HS session, and the average duration of the seven-syllable sentences, measured at each test session.

Is Adolescent Speech Production Related to Participant, Family, and Performance Measures Observed During the Elementary School Age?

Ten variables were used as predictors of the Adolescent Speech Production variable: duration of deafness, gender, nonverbal PIQ, family size, socioeconomic status, Early Sign Enhancement, Early Speech Perception, Early Working Memory, Early Speech Production, and the duration of the seven-syllable sentences. First, we evaluated duration of deafness, gender, nonverbal PIQ, family size, and socioeconomic status. As shown in Table 4, these variables accounted for nearly 13% of the variance in the Adolescent Speech Production component, with family size and socioeconomic status providing significant independent contributions. We next added the PCA-generated variables Early Working Memory, Early Sign Enhancement, and Early Speech Perception. In combination, these variables accounted for nearly 56% of the variance in Adolescent Speech Production scores. Adding them eliminated the independent contributions of family size and socioeconomic status; however, gender, Early Sign Enhancement, and Early Speech Perception became significant independent contributors. Finally, we added the duration of the seven-syllable sentences and the Early Speech Production variable. Together, all variables accounted for 66% of the variance in Adolescent Speech Production. Once all variables were included, two components, Early Sign Enhancement and Early Speech Production, made robust independent contributions. The data indicated that higher Adolescent Speech Production was associated with smaller family size and higher socioeconomic status at the early elementary (CI-E) test session, and with being female.
Once these variables were accounted for, Adolescent Speech Production was strongly related to Early Speech Production and Early Speech Perception proficiencies. Children with the lowest Adolescent Speech Production scores showed a greater reliance on sign enhancement at the CI-E session.
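The blockwise (hierarchical) entry of predictors used in these analyses can be sketched as an ordinary least-squares fit repeated as each block is added, recording R² and its change at each step. This is an illustration under stated assumptions, not the authors' software: it does not reproduce the significance tests for individual coefficients (the "independent contributions") or for the R² change, and the function names are hypothetical.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (intercept added)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def hierarchical_r2(blocks, y):
    """Enter predictor blocks cumulatively; return [(R^2, delta R^2), ...]."""
    y = np.asarray(y, dtype=float)
    entered, steps, previous = [], [], 0.0
    for block in blocks:
        entered.append(np.asarray(block, dtype=float).reshape(len(y), -1))
        r2 = r_squared(np.hstack(entered), y)
        steps.append((r2, r2 - previous))
        previous = r2
    return steps
```

In the analysis above, the blocks would be the five child and family variables, then the three PCA components (Early Working Memory, Early Sign Enhancement, Early Speech Perception), then sentence duration with Early Speech Production.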

Table 4. Predicting CI-HS Adolescent Speech Production performance from CI-E measures

Is Adolescent Speech Production Predicted by Participant, Family, and Performance Measures Obtained at the Adolescent Test Session?

Nine variables obtained at the CI-HS session were considered as predictors of the Adolescent Speech Production component (see Table 5): the two variables that did not change between CI-E and CI-HS testing (duration of deafness and gender); the child and family measures of PIQ, family size, and socioeconomic status acquired at the CI-HS session; and the measures of aided thresholds, Adolescent Sign Enhancement, the Adolescent Working Memory component, and duration of the seven-syllable sentences at CI-HS. Paralleling the analyses described above, we first evaluated the child and family variables: duration of deafness, gender, and the CI-HS measures of PIQ, family size, and socioeconomic status. These variables accounted for 6.9% of the variance in the Adolescent Speech Production component, with gender serving as an independent contributor. In the second step, we evaluated the child and family variables in combination with the average aided thresholds, the Adolescent Sign Enhancement variable, and the Adolescent Working Memory component. These variables accounted for an additional 30.4% of the variance in the Adolescent Speech Production component, with gender and Adolescent Sign Enhancement making independent contributions. Finally, we added the duration of the seven-syllable sentences acquired in the CI-HS session to the child, family, average aided threshold, Adolescent Sign Enhancement, and Adolescent Working Memory variables. The seven-syllable duration variable accounted for a further 11.7% of the variance, with three variables (gender, Adolescent Sign Enhancement, and sentence duration at CI-HS) contributing independently. Overall, these variables accounted for nearly 50% of the variance in the Adolescent Speech Production component.
During adolescence, speech production is more accurate in females and in teenagers who produce sentences with shorter durations. Teenagers who rely less on sign for their receptive vocabulary also demonstrate higher Adolescent Speech Production performance.
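The step-wise increments reported for the CI-HS model sum to the overall figure as follows:

```latex
\begin{aligned}
R^2_{\text{final}} &= \Delta R^2_{\text{child/family}}
  + \Delta R^2_{\text{thresholds, sign, memory}}
  + \Delta R^2_{\text{duration}} \\
&= 0.069 + 0.304 + 0.117 = 0.490
\end{aligned}
```

That is, the full CI-HS model accounts for 49.0% of the variance in Adolescent Speech Production.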

Table 5. Predicting Adolescent Speech Production performance from CI-HS measures

Is Adolescent Speech Production Predicted by Participant, Family, and Performance Measures Obtained at Both the Elementary Age and Adolescent Test Sessions?

To evaluate how much variance the CI-HS variables accounted for beyond that accounted for by the CI-E variables, we examined eight child and family variables acquired at both test sessions in conjunction with average aided thresholds, Adolescent Sign Enhancement, Adolescent Working Memory, and duration of the seven-syllable sentences acquired during the CI-HS session. As shown in Table 6, duration of deafness and gender in combination with PIQ, family size, and socioeconomic status at the CI-E test session accounted for 12.5% of the Adolescent Speech Production variance, with family size and socioeconomic status at the CI-E session providing independent contributions. Adding the comparable CI-HS measures of PIQ, family size, and socioeconomic status accounted for an additional 5.9% of the variance. Family size measured at both CI-E and CI-HS remained an independent contributor to the variance in Adolescent Speech Production. Socioeconomic status at CI-E remained an independent contributor; however, socioeconomic status at CI-HS did not contribute independently. Considering the child and family variables together with the CI-HS measures of average aided thresholds, Adolescent Sign Enhancement, and Adolescent Working Memory accounted for an additional 21% of the variance. Family size and gender tended to remain important contributors. However, once child and family variables measured at the two sessions were accounted for, only the average aided thresholds and Adolescent Sign Enhancement variables provided independent contributions to the variance.
In the final step, we included the child and family variables, aided thresholds, Adolescent Sign Enhancement, Adolescent Working Memory, and the seven-syllable CI-HS duration of the sentences. Inclusion of the sentence duration variable accounted for an additional 10.3% of the total variance. Gender and family size at the CI-HS test session tended to move toward independent contributions to the variance but failed to reach significance. Strong independent contributions to the overall variance were made by the Adolescent Sign Enhancement and seven-syllable CI-HS duration variables.
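The cumulative variance for the combined CI-E and CI-HS model follows directly from the reported increments:

```latex
\begin{aligned}
R^2_{\text{child/family, both sessions}} &= 0.125 + 0.059 = 0.184 \\
R^2_{\text{final}} &= 0.184 + 0.210 + 0.103 = 0.497
\end{aligned}
```

The child and family variables from both sessions thus account for about 18% of the variance, and the complete model for nearly 50%.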

Table 6. Predicting Adolescent Speech Production performance from early (CI-E) and late (CI-HS) measures

The model incorporating the child and family variables from both test sessions accounted for approximately 18% of the overall variance. Adolescent Speech Production was higher for participants from smaller families and of higher socioeconomic status, but these variables were no longer significant once aided thresholds and Adolescent Sign Enhancement were introduced into the model. Higher aided thresholds and greater Adolescent Sign Enhancement were associated with less accurate Adolescent Speech Production. There was a tendency for females to achieve more accurate performance than males. Adolescent Working Memory and seven-syllable durations reduced the influence of aided thresholds. Higher Adolescent Speech Production was evident in teenagers who displayed little difference between their OC and SC receptive vocabulary scores and who produced seven-syllable sentences with short durations.


Continued experience with a CI into adolescence results in substantial improvements in several speech production measures. Consonant production improved from an average of 71% correct during the CI-E test session to 93.8% correct during the CI-HS session, a gain of more than 20 percentage points. Increased consonant accuracy was accompanied by increases of roughly 22% in the accuracy with which unfamiliar listeners identified keywords in high- and low-context sentences. Variability in speech intelligibility measures decreased by nearly 10% in the CI-HS test session. Contextual effects remained evident in the CI-HS values, with higher accuracy for keywords in high-context sentences. However, the difference between high- and low-context accuracy was smaller than in the elementary school test session, suggesting that sentential context played less of a role during adolescence than when the children were younger. These high levels of speech production performance are striking when compared with previous reports of speech intelligibility averaging 17 to 21% in children with profound SNHL using conventional hearing aids (Smith 1975; Monsen 1981). Improvements in consonant accuracy with increased auditory experience with a CI also agree with several previous reports examining smaller sets of participants' speech production after implantation (Mondain et al. 1997; Serry et al. 1997; Vieu et al. 1998; Serry & Blamey 1999; Blamey et al. 2001b; Chin 2003, 2007; Peng et al. 2004; Uziel et al. 2007; Tomblin et al. 2008; Warner-Czyz et al. 2010).

To model everyday speaking conditions in which a speaker must convey a message amid competing talkers, such as an active classroom discussion or a restaurant, we also evaluated the speech intelligibility of the teenagers by embedding their McGarr sentences in multispeaker background babble (MSB). Because the babble was added to the recorded sentences rather than presented while the teenagers spoke, this condition allowed us to assess the impact of MSB while controlling for any “on-line” adjustments the speakers might make to rate, intensity, and clarity. Speech intelligibility for keywords in MSB conditions was reduced by approximately 20%, a finding resembling previous reports.1 Moreover, intelligibility was equivalent for keywords in high- and low-context sentences. Thus, any advantage sentential context provided under quiet listening conditions was eliminated when listeners had to understand spoken communication in less favorable conditions. On the surface, this observation is perplexing, given the nearly 94% consonants-correct scores associated with the CI-HS productions. Closer examination, however, suggests several reasons for it. First, the MSB condition intentionally does not assess how speakers adjust their articulation and speech “on-line” when speaking against a noisy background. Controlling for this adjustment allows us to obtain a “baseline” measure of intelligibility under quiet and less ideal listening conditions. Second, to compare consonants correct across the two test sessions, we used a PCC-R measure that counted substitutions and omissions as errors but counted distortions and allophonic variants as correct. We initially chose this measure for the CI-E data because it seemed to adequately capture the types of errors (substitutions versus omissions) and the emerging consonants of young CI users.
The PCC-R averages observed in this study are similar to the 88 to 91% values reported for pediatric CI users with 8 to 10 yrs of device experience using similar broad transcription techniques. Narrow transcriptions used to evaluate phonetic inventories suggest that phonetic development slows between the fifth and sixth yrs postimplant (Blamey et al. 2001b). Such a slowdown may signal constraints on the underlying responsiveness of the nervous system as children grow and move through the sensitive periods associated with spoken communication. The consequences of using PCC-R to evaluate the CI-HS productions become evident in the MSB condition. In less ideal listening conditions, allophones or distortions that did not hinder a listener's ability to understand the teenagers in quiet seem to come into play and reduce keyword intelligibility to the point that any additional information conveyed by sentence context is neutralized. Thus, intelligibility is equivalent for keywords in high- and low-context sentences. Further evaluation is needed to determine how speakers adjust their speech “on-line” to MSB conditions and whether equivalent intelligibility persists in high- and low-context sentences.
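The difference between PCC-R and a stricter count can be made concrete with a small scoring sketch. It assumes each target consonant has already been transcribed and labeled as correct, substituted, omitted, or distorted; the labels, function names, and the 20-consonant sample are hypothetical illustrations, not the study's scoring materials.

```python
from enum import Enum


class Score(Enum):
    """Hypothetical transcription outcome for one target consonant."""
    CORRECT = "correct"
    SUBSTITUTION = "substitution"
    OMISSION = "omission"
    DISTORTION = "distortion"  # also used here for allophonic variants


def percent_correct(scores, errors):
    """Percentage of target consonants whose score is not in the `errors` set."""
    if not scores:
        raise ValueError("no target consonants were scored")
    correct = sum(1 for s in scores if s not in errors)
    return 100.0 * correct / len(scores)


def pcc_r(scores):
    # PCC-R: only substitutions and omissions count as errors;
    # distortions and allophonic variants are scored as correct.
    return percent_correct(scores, {Score.SUBSTITUTION, Score.OMISSION})


def strict_pcc(scores):
    # Hypothetical stricter count that also treats distortions as errors,
    # closer to what a narrow transcription would penalize.
    return percent_correct(scores, {Score.SUBSTITUTION, Score.OMISSION, Score.DISTORTION})


# Hypothetical sample of 20 target consonants.
sample = ([Score.CORRECT] * 16 + [Score.SUBSTITUTION, Score.OMISSION]
          + [Score.DISTORTION] * 2)
print(pcc_r(sample))       # 90.0
print(strict_pcc(sample))  # 80.0
```

The same productions score 10 percentage points higher under PCC-R simply because distortions are not penalized, which illustrates how high PCC-R values can coexist with reduced intelligibility once babble makes those distortions matter.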

Close examination of the factors associated with the high levels of Adolescent Speech Production performance reveals several interesting observations. First, we evaluated how characteristics measured at elementary school ages (child and family variables, Early Sign Enhancement, Early Speech Perception, Early Working Memory, Early Speech Production, and duration of seven-syllable sentences) predicted adolescent speech production. More accurate speech production in adolescence was associated with female speakers, with teenagers who had demonstrated higher speech production accuracy at elementary school ages, and with teenagers who had performed well on the VIDSPAC measures at the CI-E session. Once Early Working Memory was also taken into account, Adolescent Speech Production was most strongly predicted by the Early Sign Enhancement and Early Speech Production variables. Higher Adolescent Speech Production performance was associated with teenagers who had displayed higher performance during OC than SC language sampling in elementary school. These data suggest that early exposure to, and reliance on, listening and speaking continue to affect speech production accuracy in adolescence.

Adolescent Speech Production was also influenced by several factors measured in the CI-HS session. Consonant accuracy and speech intelligibility, both in quiet and in MSB, were higher for female than for male speakers. Higher Adolescent Speech Production was also associated with teenagers whose one-word receptive vocabulary scores were equivalent or higher in OC relative to SC administrations. Consonant accuracy and the ability of unfamiliar listeners to understand keywords were higher for teenagers who produced sentences with shorter durations. When all the variables from the CI-E and CI-HS analyses were combined, family size and gender remained important contributors to Adolescent Speech Production performance, with teenagers from smaller families demonstrating higher speech production skills. Although socioeconomic status is commonly reported as an influential variable, including in the initial report on the current study population at elementary school ages (Tobey et al. 2003), its influence diminished once similar variables acquired in adolescence were taken into account. Higher average aided thresholds were associated with lower Adolescent Speech Production, but their importance diminished once Adolescent Working Memory and the ability to produce seven-syllable sentences of more nearly normal duration were taken into account.

Participants whose communication relied more on OC than SC demonstrated higher speech production skills at both elementary and high school ages. As discussed in the article by Geers et al. (2011, this issue, pp. 2S–12S), we created a PC variable reflecting the OC-to-SC ratios of performance on words and syntax used in language samples collected during the CI-E test session. The Early Sign Enhancement results indicated that teenagers with higher levels of speech intelligibility were also children whose ratios approached or exceeded one during the elementary test session. Further support for the importance of early experiences incorporating speaking and listening is found in the Adolescent Sign Enhancement measures: when all other child, family, and perception measures were accounted for, Adolescent Sign Enhancement accounted for nearly 10% of the variance in production performance observed at adolescence. Together, these two findings underscore the continuing importance of early experiences for later speech production performance.

Overall, data from this study indicate that speech intelligibility continued to improve from elementary through high school, although the study is limited to assessing speech intelligibility at only two test sessions. This limitation restricts our ability to estimate the ages at which high levels of intelligibility are acquired and whether they represent a “plateau.” High levels of speech intelligibility are reduced in demanding listening situations with multiple talkers in the background. Such reductions in the presence of competing signals suggest that the high consonant accuracy scores assigned by trained listeners in quiet conditions may overstate accuracy, because a PCC-R measure underestimates distortions, allophonic variations, and the possible use of speech sounds from nonambient languages. Several previous reports describe the speech of CI children as highly intelligible but also perceived as having a foreign accent (Gulati 2003; Teoh & Chin 2009). As Teoh and Chin recently noted, “small subtle speech errors are the most challenging to address in therapy” (Teoh & Chin 2009, p. 389). Such small, subtle errors are also evident in the temporal characteristics of sentences produced by CI speakers: sentences with more nearly normal temporal relationships are judged more intelligible. Collectively, these observations reinforce the importance of balancing narrow and broad transcription techniques when monitoring speech production acquisition.
High correlations between parents' questionnaire responses about how well other listeners understand their child's speech, on the one hand, and consonant accuracy and keyword intelligibility, on the other, suggest that parent ratings of speech intelligibility might be useful for monitoring speech production acquisition in the early stages after cochlear implantation, in a manner similar to the commonly used Meaningful Auditory Integration Scale (Osberger et al. 1997). Speech intelligibility seems strongly associated with exposure to environments where speaking and listening are integral parts of the therapeutic regimen.
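An association between parent reports and listener-based measures of the kind described here is typically summarized with a correlation coefficient. A minimal Pearson correlation sketch follows; the rating scale and the six paired values are hypothetical, and this section does not state which correlation statistic or questionnaire scale the study used.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical illustration: parent rating (assumed 1-5 scale) vs keyword accuracy (%).
parent_ratings = [2, 3, 3, 4, 5, 5]
keyword_accuracy = [40.0, 55.0, 60.0, 72.0, 88.0, 91.0]
r = pearson_r(parent_ratings, keyword_accuracy)
print(f"r = {r:.2f}")
```

A high positive r only indicates that the two measures rank children similarly; whether a parent questionnaire is adequate for clinical monitoring would still depend on its reliability and on how closely it tracks change over time.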


The authors acknowledge the contributions of Dr. Michael J. Strube of Washington University and the support of Dr. Peter Roland, Department of Otorhinolaryngology–Head and Neck Surgery, The University of Texas Southwestern Medical Center.

This work was supported by the National Institute on Deafness and Other Communication Disorders (R01DC000581) and the Nelle C. Johnston Chair at the University of Texas at Dallas.


Bamford, J., & Wilson, I. (1979). Methodological Considerations and Practical Aspects of the BKB Sentence Lists. In J. Bench & J. M. Bamford (Eds.), Speech-Hearing Tests and the Spoken Language of Hearing-Impaired Children. London: Academic Press.
Blamey, P. J., Barry, J., Bow, C., et al. (2001a). The development of speech production following cochlear implantation. Clin Linguist Phon, 15, 363–382.
Blamey, P. J., Barry, J. G., Jacq, P. (2001b). Phonetic inventory development in young cochlear implant users 6 years postoperation. J Speech Lang Hear Res, 44, 73–79.
Boothroyd, A. (1997). Video Game Test of Speech Pattern Contrast Perception. New York, NY: Graduate Center, City University of New York.
Campbell, T. F., Dollaghan, C., Janosky, J. E., et al. (2007). A performance curve for assessing change in Percentage of Consonants Correct Revised (PCC-R). J Speech Lang Hear Res, 50, 1110–1119.
Chin, S. B. (2003). Children's consonant inventories after extended cochlear implant use. J Speech Lang Hear Res, 46, 849–862.
Chin, S. B. (2007). Variation in consonant cluster production by pediatric cochlear implant users. Ear Hear, 28, 7S–10S.
Chin, S. B., Finnegan, K. R., Chung, B. A. (2001). Relationships among types of speech intelligibility in pediatric users of cochlear implants. J Commun Disord, 34, 187–205.
Chin, S. B., Tsai, P. L., Gao, S. (2003). Connected speech intelligibility of children with cochlear implants and children with normal hearing. Am J Speech Lang Pathol, 12, 440–451.
Clark, G. (2003). Cochlear implants in children: Safety as well as speech and language. Int J Pediatr Otorhinolaryngol, 67(suppl 1), S7–S20.
Davidson, L., Geers, A., Blamey, P., et al. (2011). Factors contributing to speech perception scores in long-term pediatric CI users. Ear Hear, 32, 19S–26S.
Dawson, P. W., Blamey, P. J., Dettman, S. J., et al. (1995). A clinical report on speech production of cochlear implant users. Ear Hear, 16, 551–561.
Dunn, L. M., & Dunn, L. M. (1997). The Peabody Picture Vocabulary Test (3rd ed.). Circle Pines, MN: American Guidance Services.
Geers, A., & Brenner, C. (2003). Background and educational characteristics of prelingually deaf children implanted by five years of age. Ear Hear, 24, 2S–14S.
Geers, A., Brenner, C., Tobey, E. (2011). Long-term outcomes of cochlear implantation in early childhood: Sample characteristics and data collection methods. Ear Hear, 32, 2S–12S.
Geers, A. E., Nicholas, J. G., Sedey, A. L. (2003). Language skills of children with early cochlear implantation. Ear Hear, 24, 46S–58S.
Gould, J., Lane, H., Vick, J., et al. (2001). Changes in speech intelligibility of postlingually deaf adults after cochlear implantation. Ear Hear, 22, 453–460.
Gulati, S. (2003). Psychiatric Care of Culturally Deaf People. In Mental Health Care of Deaf People: A Culturally Affirmative Approach (pp. 33–107). Mahwah, NJ: Erlbaum.
Kent, R. D., Weismer, G., Kent, J. F., et al. (1989). Toward phonetic intelligibility testing in dysarthria. J Speech Hear Disord, 54, 482–499.
Kirk, K. I., Hay-McCutcheon, M., Sehgal, S. T., et al. (2000). Speech perception in children with cochlear implants: Effects of lexical difficulty, talker variability, and word length. Ann Otol Rhinol Laryngol Suppl, 185, 79–81.
Law, Z. W., & So, L. K. (2006). Phonological abilities of hearing-impaired Cantonese-speaking children with cochlear implants or hearing aids. J Speech Lang Hear Res, 49, 1342–1353.
MacWhinney, B. (2010). The CHILDES Project. Mahwah, NJ: Erlbaum.
McGarr, N. S. (1981). The effect of context on the intelligibility of hearing and deaf children's speech. Lang Speech, 24, 255–264.
Mondain, M., Sillon, M., Vieu, A., et al. (1997). Speech perception skills and speech production intelligibility in French children with prelingual deafness and cochlear implants. Arch Otolaryngol Head Neck Surg, 123, 181–184.
Monsen, R. B. (1981). A usable test for the speech intelligibility of deaf talkers. Am Ann Deaf, 126, 845–852.
Nikolopoulos, T. P., Archbold, S. M., Gregory, S. (2005). Young deaf children with hearing aids or cochlear implants: Early assessment package for monitoring progress. Int J Pediatr Otorhinolaryngol, 69, 175–186.
Osberger, M. J., Geier, L., Zimmerman-Phillips, S., et al. (1997). Use of a parent-report scale to assess benefit in children given the Clarion cochlear implant. Am J Otol, 18, S79–S80.
Peng, S. C., Spencer, L. J., Tomblin, J. B. (2004a). Speech intelligibility of pediatric cochlear implant recipients with 7 years of device experience. J Speech Lang Hear Res, 47, 1227–1236.
Peng, S. C., Weiss, A. L., Cheung, H., et al. (2004b). Consonant production and language skills in Mandarin-speaking children with cochlear implants. Arch Otolaryngol Head Neck Surg, 130, 592–597.
Pisoni, D. B., & Cleary, M. (2003). Measures of working memory span and verbal rehearsal speed in deaf children after cochlear implantation. Ear Hear, 24, 106S–120S.
Pisoni, D. B., Kronenberger, W., Roman, A., et al. (2011). Measures of digit span and verbal rehearsal speed in deaf children following more than 10 years of cochlear implant use. Ear Hear, 32, 60S–74S.
Serry, T., Blamey, P., Grogan, M. (1997). Phoneme acquisition in the first 4 years of implant use. Am J Otol, 18, S122–S124.
Serry, T. A., & Blamey, P. J. (1999). A 4-year investigation into phonetic inventory development in young cochlear implant users. J Speech Lang Hear Res, 42, 141–154.
Shriberg, L. D., & Lof, G. (1991). Reliability studies in broad and narrow phonetic transcription. Clin Linguist Phon, 5, 225–279.
Smith, C. R. (1975). Residual hearing and speech production in deaf children. J Speech Hear Res, 18, 795–811.
Teoh, A. P., & Chin, S. B. (2009). Transcribing the speech of children with cochlear implants: Clinical application of narrow phonetic transcriptions. Am J Speech Lang Pathol, 18, 388–401.
Tobey, E. A., & Geers, A. E. (1995). Speech production benefits of cochlear implants. Adv Otorhinolaryngol, 50, 146–153.
Tobey, E. A., Geers, A. E., Brenner, C., et al. (2003). Factors associated with development of speech production skills in children implanted by age five. Ear Hear, 24, 36S–45S.
Tobey, E. A., Geers, A. E., Douek, B. M., et al. (2000). Factors associated with speech intelligibility in children with cochlear implants. Ann Otol Rhinol Laryngol Suppl, 185, 28–30.
Tomblin, J. B., Peng, S. C., Spencer, L. J., et al. (2008). Long-term trajectories of the development of speech sound production in pediatric cochlear implant recipients. J Speech Lang Hear Res, 51, 1353–1368.
Uchanski, R. M., & Geers, A. E. (2003). Acoustic characteristics of the speech of young cochlear implant users: A comparison with normal-hearing age-mates. Ear Hear, 24, 90S–105S.
Uziel, A. S., Sillon, M., Vieu, A., et al. (2007). Ten-year follow-up of a consecutive series of children with multichannel cochlear implants. Otol Neurotol, 28, 615–628.
Vieu, A., Mondain, M., Blanchard, K., et al. (1998). Influence of communication mode on speech intelligibility and syntactic structure of sentences in profoundly hearing impaired French children implanted between 5 and 9 years of age. Int J Pediatr Otorhinolaryngol, 44, 15–22.
Warner-Czyz, A. D., Davis, B. L., MacNeilage, P. F. (2010). Accuracy of consonant-vowel syllables in young cochlear implant recipients and hearing children in the single-word period. J Speech Lang Hear Res, 53, 2–17.
Wechsler, D. (1991). Wechsler Intelligence Scale for Children (3rd ed.). San Antonio, TX: Psychological Corporation Harcourt Brace.


1. Tobey, E., Shin, S., Geers, A., et al. (in press). Spoken word recognition in adolescent cochlear implant users during quiet and multispeaker babble conditions. Otol Neurotol.
© 2011 Lippincott Williams & Wilkins, Inc.