The role of phoneme order and phonetic detail in spoken word recognition

Toscano, J. C., Anderson, N. D., & McMurray, B. (2011, October). Paper presented at the 17th Mid-Continental Phonetics and Phonology Conference, Urbana, IL.


A basic challenge in understanding spoken word recognition is that speech unfolds over time. This has led to a great deal of work in psycholinguistics on how listeners deal with temporary ambiguity (Allopenna, Magnuson, & Tanenhaus, 1998; Luce & Pisoni, 1998; Marslen-Wilson, 1987), demonstrating that during early time points in a word (when its identity is still ambiguous), listeners consider multiple lexical candidates that compete for recognition. Thus, after hearing /tæ/, listeners may consider tack, tap, and taxi (cohorts, words with the same onset) as possible completions. One consequence of this is that all models of spoken word recognition assume that the order of phonemes in a word is essential to the recognition process. This assumption predicts that cohorts compete for activation, but that phonemic anadromes (words with the same phonemes in the opposite temporal order; e.g., cat vs. tack) do not.

Is this assumption necessary? An alternative is that words may be defined by temporal order only loosely or not at all. Because phonemes vary acoustically with word position (e.g., the /k/ in cat is different from the /k/ in tack), listeners may be able to distinguish words without needing to track the order of phonemes in them. This idea is supported by work demonstrating that listeners are sensitive to fine-grained acoustic detail during word recognition (McMurray, Tanenhaus, & Aslin, 2002; Ranbom & Connine, 2007; Salverda, Dahan, & McQueen, 2003). In addition, words sharing the same phonemes will have some acoustic similarities, which, in contrast to the predictions of existing models, could lead listeners to consider anadromes.

We examined whether this was the case using a visual world eye-tracking experiment in which we measured participants’ eye movements to objects on a computer screen while they selected the object that corresponded to a spoken word. The display contained four objects comprising an item-set: a target (e.g., tack), its anadrome (cat), a cohort competitor (tap), and a phonetically unrelated item (mill). Listeners were more likely to look at anadromes than unrelated objects (Figure 1), indicating that they considered them even though the phoneme order did not match the auditory stimulus. We also ruled out the possibility that this effect was driven by temporal overlap with the vowel (e.g., tack and cat share the vowel /æ/ in the correct position) by examining fixations to competitors that shared a vowel and another phoneme (e.g., tap when the stimulus was cat). We found that fixations to these objects were not significantly different from fixations to the unrelated objects. In addition, later in the trial, there were more fixations to anadromes than to vowel-overlap objects. Finally, when we controlled for differences in fixations due to the visual properties of the objects, we found significantly more fixations to anadromes than to vowel-overlap or unrelated objects (Figure 2).
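The competitor types in an item-set can be made concrete with a short sketch. The code below is only an illustration under stated assumptions, not the procedure used to build the stimuli: the phoneme transcriptions, the `classify` function, and the two-phoneme onset criterion for cohorts are all hypothetical choices made for this example.

```python
# Illustrative sketch only: simplified, assumed transcriptions of the
# example item-set (tack, cat, tap, mill). Not the authors' stimulus code.
LEXICON = {
    "tack": ("t", "æ", "k"),
    "cat": ("k", "æ", "t"),
    "tap": ("t", "æ", "p"),
    "mill": ("m", "ɪ", "l"),
}
VOWELS = {"æ", "ɪ"}


def classify(stimulus, candidate, onset_len=2):
    """Label how `candidate` relates to the spoken `stimulus` word.

    Assumption: a cohort shares the first `onset_len` phonemes; the
    abstract only says cohorts share "the same onset".
    """
    s, c = LEXICON[stimulus], LEXICON[candidate]
    if s == c:
        return "target"
    if s[:onset_len] == c[:onset_len]:
        return "cohort"
    if c == s[::-1]:
        # Same phonemes in the opposite temporal order.
        return "anadrome"
    # Vowel-overlap: same vowel in the same position, plus at least one
    # other shared phoneme in any position.
    same_vowel = any(a == b and a in VOWELS for a, b in zip(s, c))
    shared_consonants = (set(s) - VOWELS) & (set(c) - VOWELS)
    if same_vowel and shared_consonants:
        return "vowel-overlap"
    return "unrelated"


if __name__ == "__main__":
    for stimulus, candidate in [("tack", "cat"), ("tack", "tap"),
                                ("tack", "mill"), ("cat", "tap")]:
        print(stimulus, candidate, classify(stimulus, candidate))
```

With these assumed transcriptions, cat is classified as the anadrome of tack, tap as its cohort, and mill as unrelated, while tap counts as a vowel-overlap competitor when the stimulus is cat.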

These results show that listeners consider anadromes during word recognition and that lexical activation is not completely determined by phoneme order. This presents a challenge to all existing models that make this assumption and suggests that we should reconsider how temporal order is implemented in models of spoken word recognition. It also suggests that one way forward may be to incorporate additional fine-grained acoustic detail into models.

PDF of abstract with figures
