One of the primary goals of my research is to understand how listeners extract linguistic units from the speech signal. A central question concerns whether listeners represent speech sounds in terms of continuous acoustic features or in terms of discrete phoneme categories. Early work in speech perception (Liberman, Harris, Hoffman, & Griffith, 1957) suggested that speech is perceived in terms of categories and that listeners largely ignore acoustic variation within a phoneme category.
On this view, known as categorical perception, listeners represent speech sounds as discrete phoneme categories. However, later work demonstrated that listeners can in fact perceive within-category differences, suggesting instead that speech sounds are encoded in terms of continuous acoustic cues (Pisoni & Lazarus, 1974).
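To make the contrast concrete, here is a minimal sketch (in Python) of the predictions the two classes of models make along a voice onset time (VOT) continuum. The VOT range, the 25 ms boundary, and the response scale are illustrative assumptions, not values taken from the studies cited above.

```python
import numpy as np

# Hypothetical VOT continuum (ms), e.g., spanning /b/ to /p/.
vot = np.linspace(0, 50, 11)
boundary = 25.0  # assumed category boundary (ms), for illustration only

# Categorical model: only the phoneme category matters, so the predicted
# response is a step function of VOT; within-category variation is discarded.
categorical = np.where(vot < boundary, 0.0, 1.0)

# Continuous model: the predicted response tracks the acoustic cue itself,
# changing gradually across the continuum, including within a category.
continuous = vot / vot.max()

for v, c, g in zip(vot, categorical, continuous):
    print(f"VOT {v:4.1f} ms   categorical: {c:.2f}   continuous: {g:.2f}")
```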
Distinguishing between continuous and categorical models requires examining how speech sounds are represented during perception. A major challenge, however, is that behavioral measures provide only indirect evidence about those representations. For this reason, researchers have turned to neurophysiological responses, which may reflect perceptual representations more directly. In particular, several studies have used the event-related brain potential (ERP) technique to examine brain responses to speech sounds.
Recently, we examined the effects of variation in an acoustic cue, voice onset time (VOT), on the auditory N1 ERP response (Toscano, McMurray, Dennhardt, & Luck, 2010). We found that the amplitude of this response varies continuously with VOT and is unaffected by listeners’ phoneme categories, a result consistent with continuous rather than categorical models of speech perception.
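The logic of this test can be illustrated with a small simulation. The sketch below generates hypothetical N1 amplitudes that are a linear function of VOT plus noise, then compares a linear (continuous) fit against a step-function (categorical) fit. Every number here, including the slope and the boundary, is invented for illustration and is not the data from Toscano et al. (2010).

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated N1 amplitudes (microvolts): a linear function of VOT plus noise.
vot = np.linspace(0, 50, 11)
n1 = -4.0 + 0.05 * vot + rng.normal(0.0, 0.2, vot.size)

# Continuous account: fit a line to amplitude as a function of VOT.
slope, intercept = np.polyfit(vot, n1, 1)
linear_pred = intercept + slope * vot

# Categorical account: fit the mean amplitude within each phoneme
# category, using an assumed 25 ms boundary.
voiced = vot < 25.0
step_pred = np.where(voiced, n1[voiced].mean(), n1[~voiced].mean())

# A smaller residual error for the linear fit is the signature of
# continuous encoding in this toy example.
print(f"RSS linear: {np.sum((n1 - linear_pred) ** 2):.3f}")
print(f"RSS step:   {np.sum((n1 - step_pred) ** 2):.3f}")
```

With recorded ERP data, the same comparison applies, with measured amplitudes in place of the simulated ones.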
However, a limitation of this approach is that ERP responses provide very good information about when a response is generated (temporal information), but not reliable information about where in the brain it is generated (spatial information). As a result, it is difficult to say with certainty that this result reflects differences in how speech sounds are represented in the part of the brain that performs the initial processing of auditory stimuli. Knowing where a brain response is generated would allow us to create a map of which populations of neurons respond most strongly to different sounds, which, in turn, would help us distinguish between continuous and categorical models.
Recently, researchers at Illinois (Profs. Monica Fabiani and Gabriele Gratton) have developed a non-invasive imaging method, the event-related optical signal (EROS), that provides both high temporal and high spatial resolution. This technique uses infrared light to detect changes in the optical properties of cortical neurons when they are active. The figure above shows a participant wearing the array of infrared sources and detectors used in the experiment. Because the technique measures neuronal activity (rather than hemodynamic responses, as fMRI does), it provides good temporal resolution. In addition, because the light is not scattered as it passes through the scalp (as electrical fields are with ERP measures), it provides good spatial resolution. EROS has also been used previously to study other aspects of language processing (Tse et al., 2007). We are currently using this approach to examine how speech sounds are represented in auditory cortex.
More information:
- Cognitive Neuroimaging Lab
- Toscano, J. C., McMurray, B., Dennhardt, J., & Luck, S. J. (2010). Continuous perception and graded categorization: Electrophysiological evidence for a linear relationship between the acoustic signal and perceptual encoding of speech. Psychological Science, 21, 1532-1540.
- Tse et al. (2007). Imaging cortical dynamics of language processing with the event-related optical signal. Proceedings of the National Academy of Sciences, 104, 17157-17162.