Cue Integration With Categories: Weighting Acoustic Cues in Speech Using Unsupervised Learning and Distributional Statistics

Toscano, J. C., & McMurray, B. (2010). Cognitive Science, 34, 434-464.

Abstract: During speech perception, listeners make judgments about the phonological category of sounds by taking advantage of multiple acoustic cues for each phonological contrast. Perceptual experiments have shown that listeners weight these cues differently. How do listeners weight and combine acoustic cues to arrive at an overall estimate of the category for a speech sound? Here, we present several simulations using mixture-of-Gaussians models that learn cue weights and combine cues on the basis of their distributional statistics. We show that a cue-weighting metric in which cues receive weight as a function of their reliability at distinguishing phonological categories provides a good fit to the perceptual data obtained from human listeners, but only when these weights emerge through the dynamics of learning. These results suggest that cue weights can be readily extracted from the speech signal through unsupervised learning processes.

MATLAB code for the mixture models is also available. Please email me if you’re interested in it.
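The paper's simulations were done in MATLAB, but the core idea can be illustrated briefly in Python: fit a two-category Gaussian mixture to each acoustic cue with unsupervised EM, then weight each cue by how reliably its learned categories are separated. This is only a sketch of that idea, not the paper's actual model; the synthetic cue data, the hand-rolled 1-D EM, and the d′-like reliability metric (distance between category means scaled by their spread) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_gmm_1d(x, n_iter=200):
    """Fit a two-component 1-D Gaussian mixture with EM (unsupervised)."""
    mu = np.array([x.min(), x.max()], dtype=float)  # spread-out initial means
    var = np.array([x.var(), x.var()])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point
        dens = pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixing weights, means, and variances
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return mu, var, pi

def reliability(mu, var):
    """d'-like separation: distance between category means scaled by spread."""
    return abs(mu[1] - mu[0]) / np.sqrt(var.sum())

# Synthetic data for one contrast: cue A (e.g., VOT-like) separates the two
# categories well; cue B overlaps heavily and is therefore less reliable.
cue_a = np.concatenate([rng.normal(0, 5, 500), rng.normal(40, 8, 500)])
cue_b = np.concatenate([rng.normal(0, 10, 500), rng.normal(8, 10, 500)])

rels = np.array([reliability(*fit_gmm_1d(c)[:2]) for c in (cue_a, cue_b)])
weights = rels / rels.sum()
print(weights)  # cue A should receive most of the weight
```

The key point this illustrates is that the weights are never supervised: they fall out of the distributional statistics of each cue after category learning, with more reliable cues earning larger weights.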

View article   |   PubMed Central manuscript

(If you are unable to access an article, please email me at jtoscano at

Posted in Journal Articles