

Activation of the left superior temporal gyrus of musicians by music-derived sounds

Matsui, Toshiea; Tanaka, Satomib; Kazai, Kojic; Tsuzaki, Minorud; Katayose, Haruhiroc

doi: 10.1097/WNR.0b013e32835c1e02



Musicians have been compared with nonmusicians on various levels and have served as a model of long-term acoustic training. Previous studies have suggested that musical training affects the comprehension of musical temporal structure. Harmonic progressions, melodic boundaries, and metric groupings are among the elementary units of music that depend on a specific temporal order or repetition of notes. Neuroscientists have compared neural responses between musicians and nonmusicians for stimuli with various musical temporal structures. For example, the degree of activation of the left planum temporale while listening to classical music is correlated with the age at which an individual began musical training 1; harmonic incongruity in a musical piece is processed in the right limbic areas in music experts 2; musicians demonstrate a positive electroencephalogram component, the music closure positive shift, only at melodic phrase boundaries rather than at discontinuities in the melodic input 3; and musicians show left lateralization and larger neural activation during metric rhythm perception compared with random rhythm perception 4. Despite the increasing number of investigations of this issue, the relationship between musical training and neural responses to musical structure remains unclear.

Currently, many professional and aspiring professional musicians appreciate and perform both traditional Western classical music, with its functional harmony, traditional forms (e.g. binary, sonata, and rondo), and simple or compound meter, and contemporary music that lacks these features. These individuals are often trained to listen to contemporary music and to comprehend nontraditional chordal and temporal structures. Musicians may be able to identify pieces of music in which the temporal structure has been destroyed, as long as the intensity, pitch, timbre, and chords are preserved 5, just as nonmusicians can judge similarity and categorize fragments of familiar environmental sounds without temporal coherence 6. An investigation of the neural activity involved in stimulus processing in musicians might elucidate the effects of musical training on the perception of musically interpretable stimuli, regardless of musical temporal structure.

We conducted a behavioral experiment to investigate the differences between musicians and nonmusicians in recognizing the characteristics of a stimulus without musical temporal structure. In addition, we carried out an fMRI experiment to examine differences in neural responses between musicians and nonmusicians for stimuli with and without musical temporal structure.

Materials and methods

Behavioral experiment


Two groups of participants, one consisting of six musicians and the other of six nonmusicians, took part in the experiment. All participants provided informed consent before the study. The musician group included students majoring in classical music performance and professional musicians (average age: 24.5 years; age range: 21–32 years). The nonmusician group included individuals who had attended music lessons for no more than 2 years on any instrument, excluding the music component of their compulsory education (average age: 20.6 years; age range: 18–22 years).


For experimental stimuli, we selected 15 classical pieces of very obscure piano music that would likely be unfamiliar even to those in the musician group (see PDF, Supplemental digital content 1, which describes the features of the stimuli).

Two types of stimuli, ‘music’ and ‘scrambled’, were created according to the method of Levitin and Menon 5. The musical stimuli consisted of one or two similar excerpts of 21 s clipped from each piece of piano music. The scrambled stimuli were created by randomly drawing variable-sized fragments of 250–350 ms from each musical stimulus and concatenating them with a 30 ms linear cross-fade between fragments. By applying the Auditory Image Model, a computational model of auditory processing 7, we verified a high coefficient of determination between the musical and scrambled stimuli both for the excitation pattern representing the auditory nerve firing pattern (related to the spectral envelope and timbre perception; mean r2=0.999) and for the autocorrelation function representing the time intervals of the sound wave (related to pitch perception; mean r2=0.997).
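The scrambling procedure described above (randomly drawn 250–350 ms fragments joined by 30 ms linear cross-fades) can be sketched as follows. This is not the authors' MATLAB code; it is a minimal Python illustration, and the function name and defaults are assumptions for the sketch.

```python
import numpy as np

def scramble(signal, sr=44100, frag_ms=(250, 350), fade_ms=30, rng=None):
    """Scramble a mono signal by concatenating randomly drawn
    250-350 ms fragments with a 30 ms linear cross-fade."""
    if rng is None:
        rng = np.random.default_rng()
    fade = int(fade_ms * sr / 1000)
    out = np.zeros(0)
    while len(out) < len(signal):
        # draw a fragment of random length from a random position
        n = int(rng.integers(frag_ms[0], frag_ms[1] + 1) * sr / 1000)
        start = int(rng.integers(0, len(signal) - n))
        frag = signal[start:start + n].astype(float).copy()
        if len(out) == 0:
            out = frag
        else:
            # linear cross-fade over the 30 ms overlap region
            ramp = np.linspace(0.0, 1.0, fade)
            out[-fade:] = out[-fade:] * (1.0 - ramp) + frag[:fade] * ramp
            out = np.concatenate([out, frag[fade:]])
    return out[:len(signal)]
```

The output has the same duration and preserves the local spectral and pitch content of the input, while the musical temporal structure is destroyed.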

All stimuli were created offline using MATLAB (Mathworks, Natick, Massachusetts, USA; 44.1 kHz sampling, 16-bit stereo). The loudness of all stimuli was set at a level determined to be comfortable by one of the authors.


This experiment consisted of an XAB-type task in which one musical stimulus X and two scrambled stimuli A and B were presented sequentially. Participants were asked to choose whether A or B had originated from X. One of the two scrambled stimuli was generated from the musical stimulus X; the other, which acted as a distractor, was selected from the scrambled stimuli generated from other musical stimuli. Participants performed two types of comparison: a similar-stimuli comparison and a different-stimuli comparison. In the similar-stimuli comparison, the distractor was generated from a different part of the same piece of piano music as the standard stimulus X; this part was not identical to X but very similar to it, so the distractor sounded very similar to the correct scrambled stimulus. The mean coefficient of determination (r2) between X and the similar distractor was 0.918 for the excitation pattern and 0.897 for the time interval. In the different-stimuli comparison, the distractor was generated from other pieces of music or from a part of the same piece that had very different pitches and chords. The mean coefficient of determination (r2) between X and the different distractor was 0.726 for the excitation pattern and 0.523 for the time interval 7. All tasks were presented using a GUI program constructed in MATLAB (Mathworks). The auditory stimuli were presented binaurally through headphones (HD650; Sennheiser, Wedemark, Germany). The experiment was carried out in a sound-insulated room at Kwansei Gakuin University.

Functional magnetic resonance imaging experiment


Two groups of female participants, including 12 musicians (average age: 29.1 years; age range: 20–48 years) and 15 nonmusicians (average age: 23.4 years; age range: 21–32 years) took part in the fMRI experiment. The participants were fully informed about all procedures and gave written consent, in accordance with the Declaration of Helsinki, and the local ethics committee approved the study. The criteria for musicians and nonmusicians were identical to those in the behavioral experiment. All participants were right-handed. None of the participants had taken part in the behavioral experiment.


The stimuli were generated from six pieces of classical piano music that were selected from the pieces used in the behavioral experiment (see PDF, Supplemental digital content 1, for details of stimuli). The stimulus parameters were identical to those in the behavioral experiment, except for the sampling rate (22.05 kHz, stereo).


Our aim was to compare the neural responses during passive listening between the two types of stimuli. It was necessary to encourage the participants to listen to the sounds during the experiment without paying them any specific attention. A previous study found that musicians were motivated to obtain a high score on a test measuring their musical aptitude 8. In the current study, it was possible that musicians would actively try to comprehend or analyze the novel musical sounds if they assumed that their brain activity would reflect their aptitude as musicians. To avoid this type of behavior, we included an active-listening task in the experiment so that participants could relax during the passive-listening task. In the active-listening task, participants were required to push a button when they detected phrasal boundaries in the musical stimuli. They were also instructed to press a button when they detected wide-band noise bursts in the scrambled stimuli; these noise bursts were inserted at the same time points as the subjective phrasal boundaries in the corresponding musical stimuli. In the passive-listening task, participants were instructed to relax and listen to both the scrambled and musical stimuli. All participants were asked to perform the tasks without unnecessary deliberation. They were told that this was especially important during the passive-listening task because the brain activity observed during this task would be used as a baseline for comparison with the active-listening task. Therefore, the results from the active-listening task were not analyzed in this study.

The stimuli were presented in blocks. Each run consisted of one stimulus preceded by a 21 s rest period. Twenty-four trials (six original pieces of music × two stimulus conditions × two tasks) were presented in eight blocks, each of which included three trials under one stimulus condition and one task. The order of the blocks was randomized for each half of the experiment, and the two halves contained the same number of musical stimuli, scrambled stimuli, and tasks.

All stimuli were presented using Presentation 11.0 (Neurobehavioral Systems, Albany, California, USA). The stimuli were presented binaurally through magnet-compatible headphones (AS-3000H; Hitachi Advanced Systems, Yokohama, Japan). A button controller (Lumina LSC-400; Cedrus, San Pedro, California, USA) was used to record the responses of the participants during the active-listening task.


For image acquisition, we used a 1.5 T MRI scanner (Signa; General Electric, Milwaukee, Wisconsin, USA) located at the Kobe Institute of Biomedical Research and Innovation. Functional T2*-weighted images were acquired using a spin echo-planar imaging sequence (repetition time: 3.0 s; echo time: 0.055 s; interleaved acquisition; slice thickness: 4.0 mm; 64×64 matrix; field of view: 25×25 cm). A total of 36 interleaved axial slices covering the entire brain were acquired, and the first scan of each run was discarded. Images were preprocessed using SPM8 (Wellcome Department of Cognitive Neurology, University College London). One anatomical image (T1; voxel dimensions: 1×1×1.5 mm) was acquired for each participant and normalized to the MNI template. Functional images were temporally and spatially realigned, spatially normalized to the normalized anatomical image, and smoothed with an 8×8×8 mm full-width at half-maximum Gaussian kernel. The images were analyzed in SPM8 using a general linear model employing a boxcar function convolved with a hemodynamic response function. High-pass filtering (cutoff period: 128 s) was carried out to reduce scanner and physiological artifacts, and an autoregressive model was used to correct for serial correlations. A fixed-effect analysis was employed with a regressor for each stimulus condition and task for each participant. The contrast images were then entered into a random-effect analysis (t-test) for the contrasts of interest. Results of the t-tests are reported at P<0.05, family-wise error corrected at the cluster level. Brodmann areas, indicated in Table 2, were based on each local maximum of the significant clusters.
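As a concrete illustration of the model described above (a boxcar over the stimulus blocks convolved with a hemodynamic response function), the following Python sketch builds one such regressor. The difference-of-gammas HRF parameters (peak at 6 s, undershoot at 16 s, undershoot ratio 1/6) are the commonly used SPM-style defaults, assumed here rather than taken from the article.

```python
import math
import numpy as np

def gamma_pdf(t, shape):
    """Gamma probability density (unit scale), zero for t <= 0."""
    out = np.zeros_like(t, dtype=float)
    pos = t > 0
    out[pos] = t[pos] ** (shape - 1) * np.exp(-t[pos]) / math.gamma(shape)
    return out

def canonical_hrf(tr=3.0, duration=32.0):
    """Difference-of-gammas HRF sampled at the repetition time."""
    t = np.arange(0.0, duration, tr)
    h = gamma_pdf(t, 6.0) - gamma_pdf(t, 16.0) / 6.0
    return h / h.sum()

def block_regressor(n_scans, onsets, block_len, tr=3.0):
    """Boxcar over stimulus blocks (in scans) convolved with the HRF."""
    box = np.zeros(n_scans)
    for onset in onsets:
        box[onset:onset + block_len] = 1.0
    return np.convolve(box, canonical_hrf(tr))[:n_scans]
```

With a 3.0 s repetition time, each 21 s stimulus spans seven scans; the resulting regressor enters the general linear model as one column of the design matrix.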


Results

Behavioral experiment

None of the participants was able to identify the title of any of the pieces of music used in the experiment, as indicated in the postexperimental questionnaire. In the different-stimuli comparison, both the musician and nonmusician groups had a nearly perfect rate of correct responses (Table 1). In the similar-stimuli comparison, the musician group had a correct response rate that was significantly higher than chance (Pearson’s χ2-test, χ2(1)=7.511, P=0.0061), whereas the nonmusician group did not exceed chance-level performance. These results suggest that the musicians in our study were able to recognize the features of a scrambled stimulus and match them to the features of the corresponding musical stimulus.
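For reference, the Pearson χ² test against the 50% chance level of a two-alternative XAB choice can be computed as below. This is our own illustration, not the authors' analysis script, and the trial counts in the example are hypothetical; the article reports only the resulting statistic.

```python
import math

def chi_square_vs_chance(correct, total):
    """Pearson chi-square (df=1) comparing a correct-response count
    with the 50% chance level of a two-alternative choice."""
    expected = total / 2.0
    chi2 = ((correct - expected) ** 2 / expected
            + ((total - correct) - expected) ** 2 / expected)
    # survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x/2))
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p
```

For example, 58 correct responses out of 90 hypothetical trials give χ²(1) ≈ 7.51, P ≈ 0.006.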

Table 1
Table 1:
Correct response rate for the XAB task

Functional magnetic resonance imaging experiment

None of the participants was able to identify the title of any of the pieces of music used in the experiment, as indicated in the postexperimental questionnaire. Neither the contrast image of the two stimulus conditions nor the contrast images of the two groups revealed a significant activation cluster. Figure 1 and Table 2 show the brain activation observed during the passive-listening task compared with the rest period for each group. We found a predominance of activity in the right hemisphere regardless of group or stimulus type, although the response to the stimuli differed between groups. For the musical stimuli, the musician group showed more lateral activation than the nonmusician group. For the scrambled stimuli, the nonmusician group showed activation of the right STG near the primary auditory cortex, whereas the musician group showed activation of the bilateral STG.

Fig. 1
Fig. 1:
Activity pattern for each group and stimulus during a passive-listening task compared with the rest period (cluster level, P<0.05; family-wise error corrected). Glass brain statistical parametric maps (SPM) were acquired using SPM8.
Table 2
Table 2:
Activated regions for each participant group and the type of stimulus in a passive-listening task


Discussion

We found no significant contrasts or interactions between the stimulus conditions and groups; therefore, it is impossible to directly compare the current results with the previous fMRI study by Levitin and Menon 5. This null result may be related to the low number of participants in the passive-listening task and the low signal-to-noise ratio of the scanner. The following discussion focuses on each contrast against the resting state for the stimulus conditions and groups.

Both the musician and nonmusician groups exhibited activation that was predominantly right-hemispheric for both the musical and scrambled stimuli. It is likely that pitch extraction was a main processing load in both stimulus conditions, as previous studies have implied that pitch extraction is automatic for any auditory stimulation 9–11.

Despite the auditory stimulation, the musician group showed no activity in Heschl’s gyrus (BA41) in either stimulus condition compared with the rest period. Although high-level sound-insulating headphones were used, they may not have completely prevented the participants from hearing the EPI scanning noise, so the participants may have been exposed to intermittent noise with a specific pitch during the rest period. It is possible that, for the musician group, activity in Heschl’s gyrus was saturated by scanning noise with an ambiguous pitch during the rest period, because musicians are known to have a short refractory period for sounds with a certain pitch 12. Therefore, activity of the primary auditory cortex during the listening task may not have been observed in the musician group.

The response to each type of stimulus showed different patterns between the two groups. For the musical stimuli, the activated area in the nonmusician group was larger than that in the musician group. If the STG activity potentially caused by the intermittent EPI noise during the rest period was smaller in nonmusicians than in musicians, a larger area would appear active in the contrast between the listening task and the resting state. It is also likely that listening to the musical stimuli imposed a greater load on nonmusicians because these stimuli were too unfamiliar and complex for this group.

For the scrambled stimuli, the musician group showed bilateral activation in the STG, whereas the nonmusician group showed activation in the right STG only. The results from the behavioral experiment indicate that the ability of musicians to identify acoustic features in the scrambled stimuli was higher than that of nonmusicians. In the postexperimental questionnaire given after the fMRI experiment, several musicians answered that they were able to guess the original musical stimuli from the scrambled stimuli. The bilateral activity observed in the musician group in response to the scrambled stimuli is likely related to the results of the behavioral experiment, that is, musicians are able to capture sound features without information about musical temporal structure.

Previous studies using mismatch negativity (MMN), which is an electric brain response to deviant stimuli, have demonstrated that musical experience changes the early stages of cortical processing for lower-level temporal features, such as pitch sequence. The MMN signals obtained from musicians while listening to deviants from ‘good continuation’ pitches 13 or from pitch contours 14 are larger than those found in nonmusicians. An fMRI and MMN study suggested that the areas responsible for the preattentive processing of pitch contour are located mainly in the left hemisphere in musicians 15. Furthermore, MMN is observed bilaterally in musicians for deviations in features of single tones and chords including pitch, intensity, duration, components of harmony 16, and consonance and dissonance of harmony 17. In these studies, MMN is elicited without conscious attention to the stimulus. Therefore, it is possible that the left lateralization in activation for the scrambled stimuli in the present fMRI study is a reflection of preattentive processing of deviant stimuli, because the present fMRI experiment did not require participants to pay attention to the stimulus (passive-listening task). If the activation of a neural system underlying deviant detection depends on the context of the experimental task and does not necessarily require a sequence of standard and deviant stimuli, then the observed response in the left STG for the scrambled stimuli may reflect the detection of deviance that musicians are capable of, given their knowledge of musical temporal structures acquired through long-term training.


Conclusion

The behavioral experiment demonstrated that musicians could identify pieces of music without information about musical temporal structure. The fMRI experiment revealed that musicians and nonmusicians show different patterns of activation in the left superior temporal gyrus during passive listening to intact and scrambled music stimuli. The left STG activity observed while musicians listened to a scrambled music stimulus appears to be related to the detection of deviance, aided by this population’s superior knowledge of musical temporal structures.


Acknowledgements

This research was supported by the CrestMuse project of the Japan Science and Technology Agency and by a Grant-in-Aid for Young Scientists (B) (No. 23730715) from the Japan Society for the Promotion of Science.

Conflicts of interest

There are no conflicts of interest.


References

1. Ohnishi T, Matsuda H, Asada T, Aruga M, Hirakata M, Nishikawa M, et al. Functional anatomy of musical perception in musicians. Cereb Cortex. 2001;11:754–760
2. James CE, Britz J, Vuilleumier P, Hauert CA, Michel CM. Early neuronal responses in right limbic structures mediate harmony incongruity processing in musical experts. Neuroimage. 2008;42:1597–1608
3. Neuhaus C, Knosche TR, Friederici AD. Effects of musical expertise and boundary markers on phrase perception in music. J Cogn Neurosci. 2006;18:472–493
4. Limb CJ, Kemeny S, Ortigoza EB, Rouhani S, Braun AR. Left hemispheric lateralization of brain activity during passive rhythm perception in musicians. Anat Rec A Discov Mol Cell Evol Biol. 2006;288:382–389
5. Levitin DJ, Menon V. Musical structure is processed in ‘language’ areas of the brain: a possible role for Brodmann area 47 in temporal coherence. Neuroimage. 2003;20:2142–2152
6. Aucouturier JJ, Defreville B. Judging the similarity of soundscapes does not require categorization: evidence from spliced stimuli. J Acoust Soc Am. 2009;125:2155–2161
7. Patterson RD, Allerhand MH, Giguère C. Time-domain modeling of peripheral auditory processing: a modular architecture and a software platform. J Acoust Soc Am. 1995;98:1890–1894
8. McAuley JD, Henry MJ, Tuft S. Musician advantages in music perception: an issue of motivation, not just ability. Music Percept. 2011;28:505–518
9. Zatorre RJ, Belin P. Spectral and temporal processing in human auditory cortex. Cereb Cortex. 2001;11:946–953
10. Patterson RD, Uppenkamp S, Johnsrude IS, Griffiths TD. The processing of temporal pitch and melody information in auditory cortex. Neuron. 2002;36:767–776
11. Hyde KL, Peretz I, Zatorre RJ. Evidence for the role of the right auditory cortex in fine pitch resolution. Neuropsychologia. 2008;46:632–639
12. Kuriki S, Kanda S, Hirata Y. Effects of musical experience on different components of MEG responses elicited by sequential piano-tones and chords. J Neurosci. 2006;26:4046–4053
13. Van Zuijen TL, Sussman E, Winkler I, Näätänen R, Tervaniemi M. Grouping of sequential sounds – an event-related potential study comparing musicians and nonmusicians. J Cogn Neurosci. 2003;16:331–338
14. Fujioka T, Trainor LJ, Ross B, Kakigi R, Pantev C. Musical training enhances automatic encoding of melodic contour and interval structure. J Cogn Neurosci. 2004;16:1010–1021
15. Habermeyer B, Herdener M, Esposito F, Hilti CC, Klarhöfer M, di Salle F, et al. Neural correlates of pre-attentive processing of pattern deviance in professional musicians. Hum Brain Mapp. 2009;30:3736–3747
16. Ono K, Nakamura A, Yoshiyama K, Kinkori T, Bundo M, Kato T, et al. The effect of musical experience on hemispheric lateralization in musical feature processing. Neurosci Lett. 2011;396:141–145
17. Minati L, Rosazza C, D'Incerti L, Pietrocini E, Valentini L, Scaioli V, et al. Functional MRI/event-related potential study of sensory consonance and dissonance in musicians and nonmusicians. Neuroreport. 2009;20:87–92

Keywords: auditory; functional magnetic resonance imaging; music; musician; superior temporal gyrus; temporal structure

Supplemental Digital Content

© 2013 Lippincott Williams & Wilkins, Inc.