Measuring perceptual encoding and categorization of speech sounds using an ERP approach

Posted on 2 Jan 2012 by Joe Toscano

Toscano, J. C., & McMurray, B. (2012, January). Poster presented at the 6th Conference of the Auditory Cognitive Neuroscience Society, Tucson, AZ.

Abstract:

To recognize speech, listeners must map continuous acoustic features in the sound signal onto discrete units (e.g., phonemes, words). An important question is whether speech sounds are initially encoded in terms of continuous cues or whether listeners perceive them only in terms of categories. Behavioral data show that listeners are sensitive to within-category acoustic differences, but their responses are still generally shaped by phoneme categories (Liberman et al., 1961; Pisoni & Lazarus, 1974; Miller, 1997; McMurray et al., 2002). This suggests that either (1) encoding is based on categories rather than continuous cues, or (2) behavioral responses do not reflect initial encoding.

Recently, we have used the event-related brain potential (ERP) technique to examine cue encoding more directly (Toscano et al., 2010). We found that the amplitude of the auditory N1 component varies linearly with voice onset time, a cue to word-initial voicing, suggesting that listeners encode speech in terms of continuous cues. The later-occurring P3 component, in contrast, shows effects of both acoustic differences and phonological categories (Figure 1).

Here, we ask whether the N1 may provide a general tool for studying cue encoding by examining ERP responses to other types of speech sounds. Specifically, we ask whether differences in N1 amplitude can be observed for (1) naturally produced, rather than synthesized, sounds; (2) spectral, rather than temporal, differences that distinguish other classes of phonemes (e.g., formant frequencies for vowels); and (3) word-medial acoustic differences. We also examine whether the P3 provides a general measure of how listeners categorize speech sounds and of the effects of within-category acoustic differences.

The results show that differences in N1 amplitude can be clearly observed for some classes of speech sounds but are difficult to observe for others, though differences in P3 amplitude can still be seen. Thus, the N1 may serve as an index of cue encoding (in addition to other aspects of auditory processing identified by prior work). However, the specific speech sounds that we can study using this approach may depend on the complex link between the cues of interest and the neural generators of the N1.

