Processing of Dynamic Spectral Properties of Sounds

NU author(s): Professor Adrian Rees


Full text for this publication is not currently held within this repository. Alternative links are provided below where available.


Dynamic changes in spectral content, where the frequency of one or more components of the sound varies with time, are important for sound perception. In many biologically significant sounds, these components distinguish one token from another. For example, in speech, upward or downward changes in frequency occurring over a few tens of milliseconds characterize the formants of consonants and diphthongs. On a longer time scale, changes in the fundamental frequency of the voice carry suprasegmental or prosodic information in many languages. These changes in stress and intonation signal not only the speaker's mood, but also important semantic content. For example, the change in pitch at the end of a sentence can signal a question, statement, or qualification, depending on the direction or pattern of the change. In contrast, pitch changes in tonal languages, like Mandarin Chinese, can distinguish totally different word meanings. Dynamic changes in frequency are not restricted to human vocalizations. Conspecific sounds in other species contain a wide range of glides or periodic changes in frequency, for example in monkeys (Ghazanfar and Hauser, 2001; Winter et al., 1966), cats (Brown et al., 1978), guinea pigs (Suta et al., 2003), rats (Kaltwasser, 1990), bats (Suga, 1988), and birds (Greenwald, 1968). A second role often postulated for frequency modulation (FM) is in auditory streaming or grouping. One of the most intriguing aspects of auditory processing, particularly when compared with machine-based sound recognition systems, is the capacity to separate and identify concurrent sound sources (see the chapter by Donal G. Sinex), particularly when those sources contain frequency components in common. The recovery of independent sources depends on a process that identifies components belonging to the same source and enables them to be perceived as a single stream (Bregman, 1990).
Several cues contribute to this process, and it is likely that their exploitation in auditory processing depends on their saliency in a particular context. One possible cue is the coherence of FM across frequency components. If a group of harmonics is played together, one perceives the sound as a whole and the individual components are difficult to resolve. However, if selected harmonics are modulated in frequency they can be clearly heard to stand out from the rest (Bregman, 1990; McAdams, 1989). Musicians have long exploited this phenomenon by using vibrato to give a vocal or solo instrumental line greater prominence against the accompaniment. Formal psychophysical studies suggest that coherent FM is relatively unimportant for grouping vowels or even inharmonic sounds (Carlyon, 1991; Culling and Summerfield, 1995; McAdams, 1989). Nevertheless, FM may be important when listeners are required to identify whole sentences under unfavorable conditions, as in the presence of competing sounds. Zeng et al. (2005) extracted amplitude modulation (AM) and FM from different frequency bands of sentences and then created stimuli in which these components modulated pure tones or noise bursts. In quiet, AM alone was sufficient for good recognition, but in the presence of a competing sentence subjects performed significantly better when both AM and FM were present. This suggests that when the task is sufficiently demanding, FM does contribute to the separation of auditory streams. The importance of FM as a distinguishing feature in animal sounds and speech suggests that the analysis of spectral modulations is a prerequisite for auditory perception. In this chapter we review experiments that have reported how single neurons at different levels of the auditory pathway respond to FM, and discuss what is known about the extraction and representation of this information.
Many of the stimuli used in these studies are abstractions of real vocalizations, being either periodic FMs with a single carrier and a simple modulation waveform, such as a sinusoid, or a frequency sweep. Such stimuli have the advantage of being more tractable because they allow the experimenter greater control over their parameters. However, it is important to remember they are considerably simpler than most of the natural sounds they seek to emulate. This review focuses on data from non-specialized mammals. This may seem surprising given that FM is an important component in the echolocation calls of many bat species and the subject of intensive investigation in these animals. However, the success of these investigations has resulted in a body of literature too voluminous to do justice to here, and, in any case, it has received excellent treatment elsewhere (e.g., Suga, 1988; O'Neill, 1995; Covey and Casseday, 1999). Our comments on processing in bats, therefore, are mainly restricted to points of comparison, or to aspects of bat hearing not specifically concerned with echolocation. © 2005 Elsevier Inc. All rights reserved.

Publication metadata

Author(s): Rees A, Malmierca MS

Publication type: Review

Publication status: Published

Journal: International Review of Neurobiology

Year: 2005

Volume: 70

Pages: 299-330

ISSN (print): 0074-7742

ISSN (electronic): 0091-5432


DOI: 10.1016/S0074-7742(05)70009-X

PubMed id: 16472638