OUTSTANDING DISSERTATIONS IN LINGUISTICS
Edited by Laurence Horn, Yale University
A ROUTLEDGE SERIES

OTHER BOOKS IN THIS SERIES:
MINIMAL INDIRECT REFERENCE: A Theory of the Syntax-Phonology Interface, Amanda Seidl
AN EFFORT BASED APPROACH TO CONSONANT LENITION, Robert Kirchner
PHONETIC AND PHONOLOGICAL ASPECTS OF GEMINATE TIMING, William H. Ham
GRAMMATICAL FEATURES AND THE ACQUISITION OF REFERENCE: A Comparative Study of Dutch and Spanish, Sergio Baauw
AUDITORY REPRESENTATIONS IN PHONOLOGY, Edward S. Flemming
THE SYNCHRONIC AND DIACHRONIC PHONOLOGY OF EJECTIVES, Paul D. Fallon
THE TYPOLOGY OF PARTS OF SPEECH SYSTEMS: The Markedness of Adjectives, David Beck
THE EFFECTS OF PROSODY ON ARTICULATION IN ENGLISH, Taehong Cho
PARALLELISM AND PROSODY IN THE PROCESSING OF ELLIPSIS SENTENCES, Katy Carlson
PRODUCTION, PERCEPTION, AND EMERGENT PHONOTACTIC PATTERNS: A Case of Contrastive Palatalization, Alexei Kochetov
RADDOPPIAMENTO SINTATTICO IN ITALIAN: A Synchronic and Diachronic Cross-Dialectical Study, Doris Borrelli
PRESUPPOSITION AND DISCOURSE FUNCTIONS OF THE JAPANESE PARTICLE MO, Sachiko Shudo
THE SYNTAX OF POSSESSION IN JAPANESE, Takae Tsujioka
COMPENSATORY LENGTHENING: Phonetics, Phonology, Diachrony, Darya Kavitskaya
THE EFFECTS OF DURATION AND SONORITY ON CONTOUR TONE DISTRIBUTION: A Typological Survey and Formal Analysis, Jie Zhang
EXISTENTIAL FAITHFULNESS: A Study of Reduplicative TETU, Feature Movement, and Dissimilation, Caro Struijke
PRONOUNS AND WORD ORDER IN OLD ENGLISH: With Particular Reference to the Indefinite Pronoun Man, Linda van Bergen
ELLIPSIS AND WA-MARKING IN JAPANESE CONVERSATION, John Fry
WORKING MEMORY IN SENTENCE COMPREHENSION: Processing Hindi Center Embeddings, Shravan Vasishth
INPUT-BASED PHONOLOGICAL ACQUISITION, Tania S. Zamuner
VIETNAMESE TONE: A New Analysis, Andrea Hoa Pham
ORIGINS OF PREDICATES: Evidence from Plains Cree, Tomio Hirose
CAUSES AND CONSEQUENCES OF WORD STRUCTURE by
Jennifer Hay
Routledge New York & London
Published in 2003 by
Routledge
29 West 35th Street
New York, NY 10001

Published in Great Britain by
Routledge
11 New Fetter Lane
London EC4P 4EE

Copyright © 2003 by Taylor & Francis Books, Inc. Routledge is an imprint of the Taylor & Francis Group.

This edition published in the Taylor & Francis e-Library, 2006. To purchase your own copy of this or any of Taylor & Francis or Routledge’s collection of thousands of eBooks please go to http://www.ebookstore.tandf.co.uk/.

All rights reserved. No part of this book may be reprinted or reproduced or utilized in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publisher.

10 9 8 7 6 5 4 3 2

Library of Congress Cataloging-in-Publication Data for this book is available from the Library of Congress.

ISBN 0-203-49513-6 (Master e-book)
ISBN 0-203-57912-7 (Adobe eReader Format)
ISBN 0-415-96788-0 (Print Edition)
Contents

List of Figures  viii
List of Tables  xi
Preface  xiv
Acknowledgments  xvi
1. Introduction  1
2. Phonotactics and Morphology in Speech Perception  17
3. Phonotactics and the Lexicon  32
4. Relative Frequency and Morphological Decomposition  60
5. Relative Frequency and the Lexicon  84
6. Relative Frequency and Phonetic Implementation  108
7. Morphological Productivity  122
8. Affix Ordering  134
9. Conclusion  162
Appendix A. Segmentation and Statistics  168
References  192
Index  201
List of Figures

1.1  Schematized dual route model  5
1.2  Schematized dual route model, indicating resting activation levels  8
1.3  Schematized dual route model, with fast phonological processor  10
2.1  Architecture of a simple recurrent network designed to identify word onsets  20
2.2  Average well-formedness ratings as predicted by the log probability of the best morphological parse  23
3.1  Phonotactics and prefixedness  40
3.2  Phonotactics and semantic transparency  42
3.3  Phonotactics and semantic transparency, with line fit only through relatively transparent words  44
3.4  Phonotactics and polysemy  49
3.5  Phonotactics and relative frequency  54
4.1  Schematized dual route model  63
4.2  Waveform and pitchtrack for “But today they seemed imperfect.”  79
4.3  Waveform and pitchtrack for “But today they seemed impatient.”  80
4.4  Pitch accent placement and relative frequency  81
5.1  Suffixed forms: derived and base frequency  86
5.2  Prefixed forms: derived and base frequency  87
5.3  Prefixed forms: relative frequency and polysemy  92
5.4  Prefixed forms: frequency and polysemy  94
5.5  Prefixed forms: frequency, relative frequency, and polysemy  97
5.6  Suffixed forms: frequency and polysemy  102
6.1  Waveform and spectrogram for “daftly”  112
6.2  Waveform and spectrogram for “softly”  113
6.3  Waveform and spectrogram for “swiftly”  114
6.4  Waveform and spectrogram for “briefly”  115
6.5  Boxplots of average /t/-ness ratings  119
7.1  Type frequency and productivity  128
7.2  Relative frequency and phonotactics  129
7.3  Relative frequency and productivity  130
7.4  Phonotactics and productivity  131
7.5  Productivity, relative frequency, and phonotactics  133
A.1  Probability of nasal-obstruent clusters morpheme internally, vs. across a morpheme boundary  170
A.2  Reanalysis of data from Hay et al. (2000)  172
A.3  Intra- and inter-word co-occurrence probabilities for transitions occurring in 515 prefixed words  174
A.4  Log(intra-word co-occurrence probabilities) vs. log(intra-word/inter-word probabilities) for transitions occurring in 515 prefixed words  174
A.5  Experiment 2: Intra- and inter-word co-occurrence probabilities for stimuli  177
A.6  Experiment 2: Log expected probability across a word boundary, vs. number of subjects rating the stimuli as “complex”  178
A.7  Experiment 2: Log expected value across a word boundary of the worst member of each pair, against the difference in judgments between pair members  180
A.8  Predictions made by an “absolute probability” decomposer, relative to a decomposer based on over- or under-representation in the lexicon  184
A.9  Intra- and inter-word co-occurrence probabilities for transitions occurring in 515 prefixed words  189
List of Tables

2.1  Experiment 2: Stimuli  26
3.1  Experiment 3: Suffixed Stimuli  33
3.2  Experiment 3: Prefixed Stimuli  34
3.3  Prefixed Forms: Junctural Phonotactics and Polysemy  46
3.4  Prefixed Forms: Junctural Phonotactics and Polysemy  47
3.5  Prefixed Forms: Junctural Phonotactics and Semantic Transparency  50
3.6  Prefixed Forms: Relative Frequency and Junctural Phonotactics  52
3.7  Suffixed Forms: Junctural Phonotactics and Semantic Transparency  55
3.8  Suffixed Forms: Junctural Phonotactics and Polysemy  56
3.9  Suffixed Forms: Relative Frequency and Junctural Phonotactics  56
4.1  Experiment 4: Prefixed Stimuli  74
4.2  Experiment 4: Suffixed Stimuli  74
4.3  Experiment 4: Stimuli  77
5.1  Prefixed Forms: Frequency and Relative Frequency  87
5.2  Suffixed Forms: Frequency and Relative Frequency  88
5.3  Prefixed Forms: Above/Below Average Frequency and Relative Frequency  89
5.4  Suffixed Forms: Above/Below Average Frequency and Relative Frequency  89
5.5  Prefixed Forms: Frequency and Polysemy  91
5.6  Prefixed Forms: Relative Frequency and Polysemy  93
5.7  Prefixed Forms: Frequency, Relative Frequency, and Polysemy  94
5.8  Prefixed Forms: Relative Frequency and Semantic Transparency  97
5.9  Prefixed Forms: Frequency and Semantic Transparency  98
5.10  Suffixed Forms: Relative Frequency and Semantic Transparency  99
5.11  Suffixed Forms: Frequency and Semantic Transparency  99
5.12  Suffixed Forms: Relative Frequency and Polysemy  100
5.13  Suffixed Forms: Frequency and Polysemy  100
5.14  Suffixed Forms: Frequency, Relative Frequency, and Polysemy  102
6.1  Experiment 6: Stimuli  110
6.2  Experiment 6: Average “/t/-ness” Rankings  117
7.1  Frequency of the 14 Irregular Plurals in -eren in Modern Dutch  126
8.1  -ionist Forms  144
8.2  -ionary Forms  145
8.3  -ioner Forms  146
8.4  Frequency Counts for Latinate Bases Suffixed with -ment  151
8.5  Experiment 7a: Stimuli  153
8.6  Experiment 7b: Stimuli  156
Preface
The work reported here originally appeared as my 2000 Northwestern University Ph.D. dissertation. This version of the text has undergone many minor revisions, and some of the statistics have been redone using more appropriate tests than those which were originally reported. The only major revision of the text was a folding together of the original chapters 8 (“Level-ordering”) and 9 (“The Affix Ordering Generalization”) into a single chapter, which appears here as chapter 8 (“Affix Ordering”). This does not reflect substantial revisions to the content or argumentation, but the result is a more concise and more coherently presented account of the proposed theory. No attempt has been made to update the text to reflect literature which has appeared since the completion of the dissertation, to incorporate discussion of research which has built on the ideas presented in the dissertation, or to respond to critiques of this work which have appeared in the literature. For extensive discussion and critique of the dissertation the reader is referred in particular to Baayen (2002) and Plag (2002). Plag (2002) has dubbed the affix-ordering account developed in chapter 8 “Complexity Based Ordering” (CBO), a term which I have subsequently adopted (Hay 2002, Hay and Plag to appear). Hay and Plag (to appear) have developed the CBO account further by examining co-occurrence restriction patterns of a set of English affixes. We find that these patterns provide strong support for a CBO account. Harald Baayen and I have devoted considerable energy to modeling the effects described in chapter 7 on a much larger scale. Hay and Baayen (2002) and Hay and Baayen (to appear) describe a large-scale investigation into the relationship between relative frequency, phonotactics, and morphological productivity. We find strong evidence for links between the phonotactics and frequency profile of an individual affix, and that affix’s productivity.
One significant development is the motivation of a ‘parsing threshold’. The division in the dissertation between derived forms which are more frequent than the bases they contain, and derived forms which are less frequent than the bases they contain, was a first approximation, and clearly overly simplistic. Hay and Baayen (2002) refine the notion of relative frequency, investigating how frequent a base form needs to be, relative to the derived form, in order to facilitate parsing. Also in joint work with Harald Baayen, the results of experiment 4 were replicated in Dutch in 2001. The details of this replication have yet to be published.
Since the completion of the dissertation, text from chapters 4 and 5 has appeared as Hay (2001), and much of chapter 8 has been published as Hay (2002). Thanks are due to Linguistics and Language for permission to reprint that material in this volume.
Acknowledgments
First and foremost, this dissertation would not have been possible without the relentless, multi-faceted support of my supervisor, Janet Pierrehumbert. Her invaluable contributions to this enterprise constituted a magical combination of friendship, advice, argument, opera, inspiration, and fish. Her constant enthusiasm and her penetrating mind made this experience something a dissertation is not supposed to be—enormous fun. Janet will forever be a role model for me, and I simply cannot thank her enough. The rest of my committee was remarkably patient with my sometimes long silences, yet always forthcoming when I turned up unannounced with questions. Mary Beckman generously hosted me in the lab at Ohio State for two consecutive summers—where early plans for this dissertation were hatched—and was a rich source of advice and ideas. Chris Kennedy is to be thanked for his strong encouragement of my interest in lexical semantics, and for the good-natured way in which he managed to ask the really hard questions, and force me to be precise in my claims. Lance Rips provided an invaluable psychology perspective, and an extremely thorough reading of the final draft. His sharp eye caught several contentful errors and omissions, for which I am extraordinarily grateful. In addition to my committee members, several people played a substantial role in capturing my imagination, and/or shaping my subsequent thinking about phonetics, phonology, morphology and lexical semantics. They are Laurie Bauer, Ann Bradlow, Mike Broe, Fred Cummins, Stefan Frisch, Stefanie Jannedy, Beth Levin, Will Thompson, J.D. Trout, Paul Warren and Yi Xu. Beth deserves special thanks for her ceaseless encouragement, and the constant flow of relevant references. Stefanie Jannedy, Scott Sowers, J.D. Trout and Saundra Wright are to be credited as excellent sounding boards, and were good-natured recipients of many graphs on napkins.
And Heidi Frank, Christina Nystrom, and Scott Sowers provided comments on an early draft; and beer. Many thanks to everyone in the linguistics department at Northwestern, and particularly Betty Birner and Mike Dickey, for allowing me access to their students; Tomeka White, for keeping everything running smoothly; Ann Bradlow for establishing the linguistics department subject pool; and Chris Kennedy and Lyla Miller for lending their voices as stimuli. Special thanks are also due to everyone in the lab at Ohio State for their hospitality and help, particularly Jignesh Patel, Jennifer Vannest and Jennifer Venditti. Much of the work in this dissertation would have been excruciatingly time-consuming if it weren’t for my ability to program. For this, and for teaching me to think about text in
an entirely new way, I am indebted to Bran Boguraev, Roy Byrd and everyone in the Advanced Text Analysis and Information Retrieval Group at IBM T.J. Watson Research. My stint at IBM was invaluable in a second way—by providing the financial freedom to concentrate fully on this dissertation during the all-important home straight. The Fulbright Foundation and Northwestern University also provided crucial financial support. The following people are to be credited with keeping me generally sane during my graduate career: Jennifer Binney, Lizzie Burslem, Heidi Frank, Stefanie Jannedy, Karin Jeschke, Mark Lauer, Norma Mendoza-Denton, Kathryn Murrell, Janice Nadler, Bernhard Rohrbacher, Scott Sowers, J.D. Trout, Lynn Whitcomb and Saundra Wright. Finally, I’d like to thank Janet Holmes for fostering my initial interest in linguistics, and then for sending me away. All my family and friends in NZ for their patience, love and support. And Aaron.

Evanston, Illinois
May 2000

I would like to take this opportunity to thank Harald Baayen and Ingo Plag for their extensive feedback on the work reported in this dissertation, and for the joint work which has grown out of our discussions. I should also (belatedly) acknowledge Cafe Express, in Evanston, where much of the original text was written. This version of the document has benefited from the outstanding LaTeX support of Ash Asudeh, and the careful proofreading of Therese Aitchison and Brynmor Thomas.

Christchurch, New Zealand
May 2003
CHAPTER 1 Introduction
This is a book about morphology. It tackles questions with which all morphologists are intimately familiar, such as the degree to which we can use affixes to create new words, and the possible orderings in which affixes can appear. However, it takes as its starting point questions which are generally considered well outside the domain of morphology, and even considered by many to be outside the domain of linguistics. How do listeners process an incoming speech signal? How do infants learn to spot the boundaries between words, and begin to build a lexicon? I demonstrate that fundamentals of speech processing are responsible for determining the likelihood that a morphologically complex form will be decomposed during access. Some morphologically complex forms are inherently highly decomposable; others are not. The manner in which we tend to access a morphologically complex form is not simply a matter of prelinguistic speech processing. It affects almost every aspect of that form’s representation and behavior, ranging from its semantics and its grammaticality as a base of further affixation, to the implementation of fine phonetic details, such as the production of individual phonemes and pitch accent placement. Linguistic morphology has tended to focus on affixes, and on seeking explanations for unexpected differences in their behavior. I argue that a different level of abstraction is necessary. Predictions about the behavior of specific affixes naturally follow when we focus on the behavior of individual words. In order to properly account for classically morphological problems such as productivity, stacking restrictions and cyclicity phenomena, we need to understand the factors which lead individual forms to become, and remain, decomposed. And in order to understand decomposition, we need to start at the very beginning.
1.1 Modeling Speech Perception

When listeners process an incoming speech signal, one primary goal is the recognition of the words that the signal is intended to represent. Recognizing the words is a general prerequisite to the higher-level goal of reconstructing the message that the speaker intended to convey.
The processing required to map incoming speech to stored lexical items can be broadly divided into two levels: prelexical processing and lexical processing. Lexical processing consists of the selection of appropriate lexical entries in the lexicon. Prelexical processing involves strategies exploited by listeners in order to facilitate access to these lexical entries. One primary role of such prelexical processing is the segmentation of words from the speech stream. Speech, unlike the written word, does not come in a form in which word boundaries are clearly marked. This fact makes the acquisition of language an impressive feat, as infants must learn to spot word boundaries in running speech in order to begin the task of acquiring a lexicon. Recent work has shed considerable light on the strategies used by adults and infants to segment words from the speech stream. Multiple cues appear to be simultaneously exploited. These include stress patterns (see e.g. Cutler 1994), acoustic phonetic cues (Lehiste 1972), attention to utterance boundaries (Brent and Cartwright 1996), and knowledge of distributional patterns (Saffran, Aslin and Newport 1996; Saffran, Newport and Aslin 1996). Many of these strategies may be associated with the prelexical level in speech perception, serving as a filter which hypothesizes word boundaries, facilitating access to lexical entries which are aligned with those boundaries (Pitt and McQueen 1998, van der Lugt 1999, Norris, McQueen and Cutler 2000). Of course, for adult listeners, segmentation may result as a by-product of recognition of words embedded within the signal. As van der Lugt (1999:24) eloquently observes: “if you know the word, you know the word boundaries.” Lexical entries are organized into a network, and compete with each other in access. There is now a large body of evidence supporting the claim that words compete, resulting from a variety of experimental tasks (see e.g. McQueen, Norris and Cutler 1994, Norris et al.
1995, Vitevitch and Luce 1998). Lexical competition is therefore incorporated into most current models of speech perception, including MERGE (Norris et al. 2000), TRACE (McClelland and Elman 1986), NAM (Luce and Pisoni 1998), SHORTLIST (Norris 1994) and ART (Grossberg 1986). One factor which is highly relevant to lexical competition is lexical frequency. In speech perception, ambiguous stimuli tend to be identified as high frequency words (Connine, Titone and Wang 1993), less acoustic information is required to identify high frequency words than low frequency words (Grosjean 1980), and lexical decision times are negatively correlated with lexical frequency (Balota and Chumbley 1984). Also, a highly schematized (sine-wave) replica of a list of sentences is recognized as speech earlier by naive listeners if the sentences contain only high-frequency words, and once this speech mode of listening is triggered, word-identification rates are significantly more accurate for high frequency words (Mast-Finn 1999). Frequency also affects speech production, with high frequency words accessed more quickly, produced more fluently, undergoing greater reduction, and being less prone to error (see, e.g. Whalen 1991, Levelt 1983, Dell 1990, Wright 1997, Hay, Pierrehumbert, Beckman and Faust West 1999; Hay, Jannedy and Mendoza-Denton 1999). Lexical and prelexical speech perception processes such as those described above have important consequences for the long-term representation of lexical items. Recognizing an item affects that item’s representation, and increases the probability that it will be successfully recognized in the future. Some models capture this process by raising the resting activation level of the relevant lexical entry—such a mechanism is implicit in
most of the models outlined above. Other models, based on exemplars, assume that identifying a word involves adding a new exemplar to the appropriate exemplar cloud (Johnson 1997a,b). However, in order to capture the fact that words encountered frequently have different properties from words encountered relatively infrequently, all models must assume that accessing a word in some way affects the representation of that word. All of the models discussed above have been primarily concerned with modeling the access of simple words. Central to this book is the argument that they also make extremely important predictions for the processing of affixed words. Assume that an affixed word can be accessed in two ways—a direct route, in which the entire lexical entry is accessed directly, or a decomposed route in which the word is accessed via its component parts (models which make this assumption are outlined in section 1.2). If accessing a simple word affects its representation, it follows that accessing a complex word also affects its representation. And, if there are two ways to access an affixed word, then there are two ways to affect its representation. Accessing it via the decomposed route reinforces its status as an affixed word made up of multiple parts. Accessing it via the direct route reinforces its status as an independent entity. Importantly, not all affixed words are equally likely to be accessed via a decomposed route. As outlined above, prelexical information is used to segment the speech stream in speech perception. This can be modeled by assigning lexical entries which are not well aligned with hypothesized boundaries less activation than lexical entries which are well aligned (cf. Norris et al. 1997). 
It follows that if an affixed word possesses properties leading to the hypothesis of a boundary at the morpheme boundary, this would significantly facilitate a decomposed route—and impede the direct route, which would not be aligned with the hypothesized boundary. If an affixed word possesses no such properties, however, no boundary will be hypothesized. This book examines this hypothesis in the context of two specific factors which the speech processing literature predicts to be relevant to the segmentation of words from the speech stream: one lexical, and one prelexical. At the prelexical level, we will investigate the role of distributional information, in the form of probabilistic phonotactics. At the level of lexical processing, we will investigate the role of frequency-based lexical competition. I demonstrate that these factors both exert an important influence on the processing of affixed words, and, consequently, many aspects of their representation. This has profound consequences for the semantic transparency and decomposability of individual affixed forms, and for the predicted behavior of specific affixes. By recognizing that decomposability is a continuum, and that it can be directly related to factors influencing segmentation in speech perception, we will acquire tremendous explanatory power in domains which have proven classically problematic in linguistic morphology, including morphological productivity, level-ordering phenomena, and affix ordering restrictions.
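The boundary-hypothesis idea can be illustrated with a toy calculation. This is my own sketch, not a model proposed in this book, and the transition probabilities are invented placeholders; a real implementation would estimate them from a lexicon or corpus, in the spirit of Appendix A. A boundary is posited at a phone transition that is better explained by a juncture than by morpheme-internal material.

```python
# Toy sketch: posit a boundary at a phone transition that is better
# explained by a juncture than by morpheme-internal material.
# All probabilities are invented placeholders for illustration.
INTERNAL = {("n", "s"): 0.002, ("s", "ey"): 0.030}  # P(pair | morpheme-internal)
ACROSS = {("n", "s"): 0.015, ("s", "ey"): 0.010}    # P(pair | across a juncture)

def boundary_hypothesized(pair, floor=1e-6):
    """True if the across-juncture probability of this transition
    exceeds its morpheme-internal probability."""
    return ACROSS.get(pair, floor) > INTERNAL.get(pair, floor)

# /n/+/s/ is rare morpheme-internally, so in+sane invites a parse;
# /s/+/ey/ is common inside morphemes, so no boundary is posited there.
```

Under this sketch, lexical entries aligned with a hypothesized boundary (such as in- and sane) would then receive an activation boost relative to misaligned candidates, such as the whole-word entry.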
1.2 Modeling Morphological Processing

While morphological processing is clearly part of speech processing, the two have not generally been treated together. Discussion in the context of models of speech processing
tends to deal exclusively with the treatment of simple words. Because affixed words present special problems, researchers interested in morphological processing have developed their own models in order to account for observed phenomena in this particular sub-domain. Recent models of morphological processing share many components in common with more general speech processing models, including concepts such as the arrangement of words in a lexical network, and frequency-based resting activation levels of lexical items (see, e.g. Frauenfelder and Schreuder 1992, Baayen and Schreuder 1999). Models of morphological processing must make some fundamental assumptions about the role of decomposition. Do we decompose affixed words upon encountering them, breaking them down into their parts in order to access lexical entries associated with the component morphemes? Or do we access affixed words as wholes, accessing an independent, holistic, lexical entry? Some researchers have argued that there is no decomposition during access (e.g. Butterworth 1983), and others have claimed there is a prelexical stage of compulsory morphological decomposition (e.g. Taft 1985). Laudanna, Burani and Cermele (1994) and Schreuder and Baayen (1994) argue that affixes are not an homogeneous set, and so it is hazardous to generalize over them as an undifferentiated category. Indeed, I will argue that it is hazardous to generalize even over different words which share the same affix. Most current models are mixed—allowing for both a decomposed access route, and a direct access, non-decomposed route. In many models the two routes explicitly compete, and, in any given encounter with a word, either the decomposed or the direct route will win (Wurm 1997, Frauenfelder and Schreuder 1992, Baayen 1992, Caramazza, Laudanna and Romani 1988). Direct competition does not necessarily take place, however. 
Baayen and Schreuder have recently argued that the two routes may interactively converge on the correct meaning representation (Schreuder and Baayen 1995, Baayen and Schreuder 1999). In this model too, however, there will necessarily be some forms for which the decomposed route dominates access, and others in which the direct, whole word representation is primarily responsible for access. As pointed out by McQueen and Cutler (1998), dual-route models appear to have met the most success in accounting for the range of empirical facts. For ease of explication, I will adopt a simple dual route “to the death” model throughout—assuming that, in any given encounter, a form was accessed via either a fully decomposed form, or a holistic direct route. I do this not because I am committed to its truth, but rather because it offers the best prospects for a straightforward illustration of the sometimes complex interaction of the various factors involved. It is important to note that the predictions made in this book regarding the interaction of speech processing strategies with morphological decomposition do not rely on this specific choice of model. The same predictions will follow from any model in which both decomposition and whole route access are available options, or in which the presence of the base word can be variably salient. Following the speech processing literature outlined in the previous section, we will assume that accessing a word via a whole word route affects the representation of that word. This occurs either by a raising of the resting activation level, or—in exemplar models, by increasing the number of relevant exemplars. We will also assume that complex words, even if accessed via a decomposed route, are subsequently stored in
memory. In an exemplar model, we would assume that such exemplars are stored in parsed form. In our more abstract dual route network, we can assume that the form is stored with strong links to the parts that were used to compose it. Figure 1.1 shows an idealization of the two routes which race to access the lexical entry for insane. The dashed line indicates the direct access route—on encountering insane, this represents the possibility to access it directly.
Figure 1.1: Schematized dual route model. The solid lines indicate the decomposed route. The dashed line indicates the direct route.

The solid lines indicate the decomposed route. The component parts are activated, and used to access the lexical item. If this route wins, the connection between the parts and
the derived form is reinforced. As such, any access via the whole word route will serve to reinforce the independent status of insane, whereas any access via the decomposed route will reinforce its decomposability—and its relation to in- and -sane. With this basic framework in place, we now introduce some factors known to be relevant to speech processing, in order to see how they are likely to impact the processing of morphologically complex words.
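As a concrete illustration of this framework, the race-and-reinforcement dynamic can be sketched in a few lines of code. This is my own sketch with arbitrary activation numbers, not an implementation from this book: the faster route wins the access, and winning strengthens the corresponding representations.

```python
class DualRouteWord:
    """Minimal 'to the death' dual-route sketch. Resting activations are
    arbitrary illustrative numbers standing in for frequency of access."""

    def __init__(self, whole, base, affix):
        self.whole = whole  # resting activation of the whole-word entry
        self.base = base    # resting activation of the base (e.g. sane)
        self.affix = affix  # resting activation of the affix (e.g. in-)

    def access(self):
        # The decomposed route is only as fast as its slowest component.
        direct = self.whole
        decomposed = min(self.base, self.affix)
        winner = "direct" if direct >= decomposed else "decomposed"
        # Whichever route wins is reinforced, so it is more likely
        # to win again on the next encounter.
        if winner == "direct":
            self.whole += 1
        else:
            self.base += 1
            self.affix += 1
        return winner

# A form accessed mostly via the direct route drifts toward autonomy;
# a form accessed via decomposition keeps its links to its parts.
insane = DualRouteWord(whole=20, base=10, affix=5)
insane.access()  # direct route wins; whole-word entry reinforced
```

The reinforcement step captures the rich-get-richer character of the model: each access makes the winning route slightly faster next time, which is what allows a form's decomposability to drift over its history of use.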
1.3 Lexical Effects

1.3.1 Phonological Transparency

One primary goal of speech processing is to map the incoming speech signal to appropriate lexical entries. In the example given above, the speech signal associated with insane is sufficiently similar to at least two entries in the lexicon—insane and sane—that both are activated. However, if the speech signal matched one candidate more poorly than another, then that candidate would not fare well in the competition. Insane contains sane (phonologically), and so both are well activated by the speech signal. Sanity, on the other hand, does not contain sane. As such, the contents of the speech stream will not map well to the access representation for sane. The whole word route has a high chance of winning whenever the base word is not transparently phonologically contained in the derived form. In keeping with this prediction, Cutler (1980, 1981) demonstrates that the acceptability of neologisms relies crucially on the degree to which they are phonologically transparent. Bybee (1985:88) has claimed that derived forms with low phonological transparency are more likely to become autonomous than forms which are phonologically close to the base form, and Frauenfelder and Schreuder (1992) explicitly build this into their dual route model of morphological access, with the parsing route taking more time for phonologically less transparent forms. The claim that phonological transparency facilitates access is not uncontroversial, however. For example, Marslen-Wilson and his colleagues (e.g. Marslen-Wilson et al. 1994, Marslen-Wilson et al. 1997, Marslen-Wilson and Zhou 1999) find that derived words prime their bases equally, regardless of phonological transparency. They argue that speech is directly mapped onto an abstract representation of a morpheme, which is underspecified for the properties of the base word which display variation.
While Marslen-Wilson et al.’s results demonstrate that equivalent priming takes place once a lexical item is accessed, the results do not explicitly exclude the possibility that access to that lexical item may vary with phonological transparency, either in manner or speed.

1.3.2 Temporality

Speech inherently exists in time; we encounter word beginnings before we encounter word ends. Consider this with reference to the model in figure 1.1. The acoustic signal which will activate the whole word insane precedes the acoustic signal which will activate the base, sane. If we assume that, in general, affixes tend to have relatively lower
Introduction
7
resting activation levels than free words, then this temporal asymmetry is likely to favor the whole word route for prefixed words. Conversely, in suffixed words, the temporal onset of the derived form and the base is simultaneous. Any bias we see towards whole word access in prefixed words should be reduced for suffixed forms. Cutler et al. (1985) argue that this temporal asymmetry—together with the fact that language users prefer to process stems before affixes—is responsible for the generalization that suffixes are much more frequent across the world’s languages than prefixes (Greenberg 1966). Indeed, there is good evidence that prefixed words tend to be treated differently in processing than suffixed words (Beauvillain and Segui 1992, Cole, Beauvillain and Segui 1989, Marslen-Wilson et al. 1994). In chapter 3 we will see evidence that this temporal asymmetry has long-term consequences for lexical representations. 1.3.3 Relative Frequency As outlined in section 1.1, lexical frequency affects speed of access. We must therefore assume that each of the nodes in figure 1.1 has a resting activation level which is a function of its frequency of access. Nodes associated with frequent words (or morphemes) will be accessed more quickly than nodes associated with infrequent words. We will display nodes with different line widths, with thick lines indicating high frequency, and so high resting activation levels. We can approximate the relative frequencies of the derived form and the base by consulting their entries in the CELEX Lexical Database (Baayen et al. 1995). This reveals that sane occurs at a rate of around 149/17.4 million, whereas insane occurs at a rate of around 258/17.4 million. This is shown graphically in figure 1.2. We will leave aside the resting activation level of the prefix. This requires some knowledge of how many words containing this affix are routinely accessed via decomposition.
In chapter 7, I will return to this issue, demonstrating that the resting activation level of an affix (as estimated by the proportion of forms which are accessed via decomposition) is highly predictive of that affix’s productivity level. When we now consider the two routes in figure 1.2, it is clear that the whole word route has an advantage. The higher relative frequency of insane speeds the whole word route, relative to the decomposed route. Insane can be compared with a word like infirm. Infirm is fairly infrequent (27/17.4 million), and, importantly, its base firm is highly frequent (715/17.4 million). As such, we predict the decomposed route should have a strong advantage
Causes and consequences of word structure
8
over the whole word access route. Importantly, because words compete, the absolute frequency of the derived form is not so important as its frequency relative to the base form with which it is competing.

Figure 1.2: Schematized dual route model. The solid lines indicate the decomposed route. The dashed line indicates the direct route. The line width of each node indicates the resting activation level—insane is more frequent than sane.

Many researchers have posited a relationship between lexical frequency and the decomposition of complex forms. However, this body of work has nearly exclusively concentrated on
the absolute frequency of the derived form—arguing that high frequency forms are not decomposed. The argument here is different. Insane is not a high frequency word. Chapter 4 steps through current models of morphological processing in some detail, arguing that all models which predict a frequency effect, in fact predict one of relative and not absolute frequency.
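The relative-frequency logic of this section can be summarized in a few lines of code. The counts are those quoted above from CELEX (per 17.4 million tokens); the helper function and the decision rule are simplifying assumptions of mine, not the book's model:

```python
CORPUS_SIZE = 17_400_000                # token count underlying the CELEX rates quoted above

counts = {                              # occurrences per 17.4 million, from the text
    "insane": 258, "sane": 149,
    "infirm": 27, "firm": 715,
}

def favored_route(derived, base):
    """Toy decision rule: relative frequency > 1 favors the whole word route."""
    rel = counts[derived] / counts[base]
    return ("whole word" if rel > 1 else "decomposed"), rel

for derived, base in [("insane", "sane"), ("infirm", "firm")]:
    route, rel = favored_route(derived, base)
    print(f"{derived}: relative frequency {rel:.2f} -> {route} route favored")
```

Insane (relative frequency 1.73) comes out favoring the whole word route, and infirm (0.04) the decomposed route, as the text predicts; the absolute rates play no role in the decision.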
1.4 Prelexical Effects As outlined above, there is good evidence that some aspects of speech processing take place at a prelexical level—incoming speech undergoes some preprocessing in order to facilitate mapping to appropriate lexical items. Many such processing strategies also appear to be prelexical in a second sense—they are acquired by infants before they have the beginnings of a lexicon. Morphologically complex words must be subject to the same types of preprocessing as morphologically simple words. Precisely because prelexical effects are prelexical, they cannot possibly filter out morphologically complex words and treat them as special cases. Prelexical processing affects complex words too. We now therefore consider the implications of adding a Fast Phonological Preprocessor to our idealized dual route model, as schematized in figure 1.3.

Figure 1.3: Schematized dual route model. The solid lines indicate the decomposed route. The dashed line indicates the direct route. A fast phonological preprocessor operates prelexically, facilitating access to lexical entries which are aligned with hypothesized word boundaries.

Information flows from the phonological preprocessor in a strictly bottom-up fashion (see Norris et al. 2000). One primary role of the preprocessor is to entertain hypotheses about possible segmentations of the incoming speech stream. These hypotheses are formed on the basis of knowledge of lexical patterns and regularities. There are at least three specific hypothesis-forming strategies which could impact the processing of morphologically complex words—the use of metrical structure, restrictions on what can be a possible word, and the use of local distributional cues. 1.4.1 Metrical Structure English speaking adults and infants exploit the stress patterns of words to aid segmentation. Over 90% of English content words begin with stressed syllables (Cutler and Carter 1987). There is strong evidence that English speaking adults and infants
take advantage of this during speech processing—positing word boundaries before strong syllables (Cutler 1990, Cutler and Butterfield 1992, Cutler and Norris 1988, Jusczyk, Cutler and Redanz 1993, Jusczyk, Houston and Newsome 1999). This Metrical Segmentation Strategy is evident in infants as young as 7.5 months (Jusczyk et al. 1999), and is clearly prelexical. In our dual-route model of morphology we attribute such prelexical effects to a phonological preprocessor—the preprocessor hypothesizes boundaries at the beginning of strong syllables and facilitates access to candidate lexical entries which are aligned with those boundaries. What implications does this have for morphologically complex words? Words in which a strong syllable directly follows the morpheme boundary will be more likely to be decomposed than words in which that syllable is weak. To see this, compare the processing of inhuman and inhumane. The first syllable of both derived words is unstressed, so, when encountered in running speech, the metrical segmentation strategy will do nothing to facilitate access via the direct route. However, the words differ in an important way. The first syllable following the morpheme boundary is stressed in inhuman, but not in inhumane. As such, the preprocessor should facilitate access to the base in the former, but not the latter. We expect words like inhuman to be more likely to be decomposed during processing than words like inhumane. The potential relevance of the Metrical Segmentation Strategy for the processing of complex words has been pointed out by Schreuder and Baayen (1994). The prediction being put forward here is somewhat more precise—because the Metrical Segmentation Strategy will affect only some prefixed words, we expect these words to remain more robustly decomposed than their non-optimally stressed counterparts. 
Words in the latter group should more often favor the direct access route, be more likely to become liberated from their bases, and undergo semantic drift. Some preliminary calculations over lexica lead me to believe that this will prove to be the case. However, no results on this point are reported in this book—it remains an empirical, testable question. 1.4.2 Possible Word Constraint Norris et al. (1997) claim that there is a Possible Word Constraint operative in the segmentation of speech. This constraint effectively suppresses activation of candidate forms which would lead to a segmentation resulting in impossible words. Their experimental results, for example, show that subjects have a harder time spotting apple inside fapple than inside vuffapple (because f is not a possible word). Similarly, sea is easier to spot in seashub than seash. We predict that this will have implications for the processing of affixes which themselves cannot be words. For example, Bauer (1988) points to abstract noun-forming -th as a textbook case of a non-productive affix. Example nouns with this affix include warmth, growth and truth. The affix -th is not itself a syllable, and is shorter than the minimal possible word in English. Thus, the processor is unlikely to posit a boundary before -th in truth, and so the direct route is likely to be advantaged. This can be contrasted with the processing of a word like trueness, where the Possible Word Constraint would not disadvantage access of the base. Thus, we predict that more words
containing word-like affixes will be decomposed during online processing than words containing affixes which themselves could not be phonological words of English. 1.4.3 Probabilistic Phonotactics Language-specific phonotactic patterns affect multiple aspects of speech perception. Importantly for the discussion here, they appear to be one cue used in the segmentation of speech. Saffran, Aslin and Newport (1996) show that, when presented with a string of nonsense words, eight-month-old infants are sensitive to transitional probabilities in the speech stream. This is also true of adults (Saffran, Newport and Aslin 1996). This result suggests that sensitivity to probabilistic phonotactics plays a role in the segmentation of speech. McQueen (1998) and van der Lugt (1999) provide further evidence that phonotactics are exploited for the task of locating word boundaries. Mattys et al. (1999) demonstrate that English-learning infants are sensitive to differences between inter- and intra-word phonotactics. Computer models demonstrate that probabilistic phonotactics can significantly facilitate the task of speech segmentation (Brent and Cartwright 1996, Cairns et al. 1997, Christiansen et al. 1998). Recent results reported by Pitt and McQueen (1998) and Vitevitch and Luce (1998) suggest that knowledge of probabilistic phonotactics must be represented independently of specific lexical entries. In brief, our phonological preprocessor depicted in figure 1.3 is likely to be sensitive to distributional cues—positing boundaries inside phoneme transitions which are unlikely to occur word-internally. This has important implications for morphologically complex words. Namely, if the phonology across the morpheme boundary is highly unlikely to occur morpheme internally, then the preprocessor is likely to posit a boundary, and so advantage the decomposed route. Inhumane is an example of such a word.
The /nh/ transition is highly unlikely to be found within a simple word, and so the processor will hypothesize the presence of a boundary. As such, the direct route will be disadvantaged in comparison with the decomposed route, because it does not align with hypothesized boundaries. We expect words like inhumane to be more likely to be decomposed than words like insincere where the transition across the morpheme boundary is well attested morpheme internally (e.g. fancy, tinsel). Similarly, pipeful should be more likely to be decomposed than bowlful (cf. dolphin).
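The preprocessor's use of junctural phonotactics can be sketched as a simple thresholding decision. The probabilities and the threshold below are invented placeholders, not figures from the book; in practice such values would be estimated over a lexicon of monomorphemes:

```python
# Invented within-morpheme transition probabilities; real values would be
# estimated from a monomorpheme lexicon such as the CELEX subset of chapter 2.
within_morpheme_prob = {
    ("n", "h"): 0.0001,   # /nh/: essentially unattested inside simple words
    ("n", "s"): 0.02,     # /ns/: well attested inside simple words (fancy, tinsel)
}

BOUNDARY_THRESHOLD = 0.001  # assumed cutoff for positing a juncture

def posits_boundary(transition):
    """The preprocessor posits a boundary at improbable transitions."""
    return within_morpheme_prob[transition] < BOUNDARY_THRESHOLD

print(posits_boundary(("n", "h")))  # True: inhumane, decomposed route advantaged
print(posits_boundary(("n", "s")))  # False: insincere, no boundary cue
```

Only the ordering matters here, not the particular numbers: any transition rarer morpheme-internally than the cutoff triggers a hypothesized juncture, advantaging the decomposed route.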
1.5 Consequences 1.5.1 Words In this book I concentrate on two of the predictions outlined above—one lexical (lexical frequency) and one prelexical (probabilistic phonotactics). I demonstrate that both factors are related to morphological decomposition. Because lexical access leaves its traces on the lexicon, this has profound implications, leading words which tend to be decomposed in access to acquire different properties than words which are more likely to be accessed via a direct route. Words which are more
prone to whole word access appear less affixed, undergo semantic drift, proliferate in meaning, and are implemented differently in the phonetics. They are effectively free to become phonologically and semantically liberated from their bases and acquire idiosyncrasies of their own. Words which are more prone to decomposition during access, on the other hand, remain more robustly decomposed. They tend to bear a regular and predictable semantic relation to their base word, and are highly decomposable, both impressionistically and in the phonetics. Close investigation of lexical frequency and phonotactics in morphologically complex words leads us to discover a number of lexicon-internal syndromes—remarkable co-occurrences of frequency-based, phonologically-based, and semantic factors. 1.5.2 Affixes As outlined above, I argue that affixed words display different levels of decomposability, and that the degree to which a word is decomposable can be well predicted by general facts about speech processing. This variation in decomposability has consequences not only for individual affixed forms, but also for the affixes they contain. Affixes represented by many highly decomposed forms will have higher activation levels than affixes which are represented by many forms which are accessed via a direct, non-decomposed route. Some properties inherent to the affixes themselves predict the degree to which words that contain them will be decomposable. One such property is the degree to which the affix has the form of a possible word. As outlined above, the Possible Word Constraint will make words containing the suffix -th less likely to be decomposed than words containing a more wordlike suffix like -ness. Similarly, the types of phonotactics an affix tends to create across the boundary are highly relevant.
An affix which creates consistently bad phonotactics across the morpheme boundary is likely to be represented by a higher proportion of decomposed forms than an analog which consistently creates words which have the phonological characteristics of monomorphemic words. I demonstrate that the productivity of an affix can be directly related to the likelihood that it will be parsed out during processing. Previous work has documented the remarkable co-occurrence patterns involving phonological patterns, phonological transparency, semantic transparency, productivity, and selectional restrictions of affixes in English. These co-occurrences have been used to motivate two distinct levels of affixation. In this book I abandon the idea that affixes can be neatly divided into two distinct classes, and demonstrate that the variable parsability of affixes can be linked to the variable decomposition of individual words. This affords us enormous explanatory power in arenas which have proven classically problematic in linguistic morphology—such as restrictions on affix ordering. While early accounts of affix ordering were overly restrictive, recent work which has discarded the idea that there are any restrictions on ordering (beyond selectional restrictions) misses a number of important generalizations about stacking restrictions in English. The gain comes from approaching the problem at a different level of abstraction.
The problem of restrictions on affix ordering in English can be essentially reduced to one of parsability: an affix which can be easily parsed out should not occur inside an affix which cannot. This has the overall result that the less phonologically segmentable, the less transparent, and the less productive an affix is, the more resistant it will be to attaching to already affixed words. This prediction accounts for the patterns the original affix-ordering generalization was intended to explain, as well as the range of exceptions which have been observed in the literature. Importantly, the prediction extends to the parsability of specific affixes as they occur in specific words. This accounts for the so-called dual-level behavior of many affixes. I demonstrate that an affix may resist attaching to a complex word which is highly decomposable, but be acceptable when it attaches to a comparable complex word which favors the direct route in access. Armed with results concerning the decomposability of words with certain frequency and phonology-based characteristics, we are able to systematically account for the range of allowable affix combinations in English. The temporal nature of speech perception also leads to powerful predictions regarding allowable bracketing paradoxes—unexpected combinations of prefixes and suffixes.
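The parsability account of affix ordering summarized above can be stated as a one-line constraint. The numeric parsability scores here are hypothetical stand-ins for the estimates developed later in the book:

```python
# Hypothetical parsability scores (higher = more easily parsed out); the book
# derives such estimates from frequency and phonotactics, not stipulation.
parsability = {"-th": 0.1, "-ness": 0.9}

def ordering_ok(inner, outer):
    """An easily parsed affix should not occur inside a less parsable one."""
    return parsability[inner] <= parsability[outer]

print(ordering_ok("-th", "-ness"))   # True: the less parsable affix may sit inside
print(ordering_ok("-ness", "-th"))   # False: predicted to be resisted
```

The asymmetry of the constraint is the point: the same pair of affixes is acceptable in one order and resisted in the other, with no appeal to level membership.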
1.6 Some Disclaimers All of the experiments and the claims put forward here are based on English derivational morphology. I set inflectional morphology aside. I do this not because I believe that all inflectional morphology is inherently different from derivational morphology—on the contrary. However, most inflectional morphology in English is of a much higher type frequency than is typically observed with derivational morphology. Thus, the models outlined above would predict it to acquire markedly different characteristics. One effect of high type frequency, for example, would be a high resting activation level of the affix, and so a high level of productivity. Because the incorporation of inflectional morphology in this study would require close attention to (and experimental control of) affixal type frequency, I have omitted it from the experiments, and the study. The degree to which the results presented here are relevant to inflection remains an empirical question. The use of the terminology dual-route should not be viewed as implicit support for dual-systems models in which rule-based morphology is fundamentally distinguished from memory-based morphology (see, e.g. Pinker and Prince 1988, Prasada and Pinker 1993, Ullman 1999). As the ‘dual-systems’ debate has been waged primarily on the turf of inflectional morphology, it should be clear that the phenomena discussed in this book do not speak directly to that debate. As the use of similar terminology opens the door for confusion, however, it is worth pointing out explicitly that the ‘dual-routes’ in the simple model outlined in figure 1.3 (as well as in the models on which it is loosely based) are intended to refer to different access routes to a single representation. This book is also not a crosslinguistic study. I argue that the speech segmentation strategies used by English speakers exert an important influence on English morphology. The consequences of this claim are not examined for other languages.
There are clearly
cross-linguistic differences both in speech segmentation processes (see, e.g. Vroomen, Tuomainen and de Gelder 1998), and in morphological patterns (see e.g. Spencer 1991). The specific organization of a given language, that is, the factors which facilitate speech segmentation, and the degree to which these are present in morphologically complex words, will affect the manner in which the effects described here play out in that language. While the range of results presented here certainly predicts that there will be an interaction between speech perception and morphological structure crosslinguistically, detailed cross-linguistic analysis remains for future research.
1.7 Organization of the Book The book is organized as follows. Chapter 2 outlines the evidence for the role of phonotactics in the segmentation of speech. I present a simple recurrent network which is trained to use this information to segment words, and demonstrate that this learning automatically transfers to hypothesizing (certain) morpheme boundaries. If phonotactic-based segmentation is prelexical, morphology cannot escape its effects. Results are then presented from an experiment involving nonsense words, demonstrating that listeners do, indeed, use phonotactics to segment words into morphemes. Chapter 3 investigates the consequences of this result for real words. Experimental results presented in this chapter demonstrate that listeners perceive real words containing low probability junctural phonotactics to be more easily decomposable than matched counterparts which contain higher probability junctural phonotactics. Calculations over lexica demonstrate that prefixed words with legal phonotactics across the morpheme boundary are more prone to semantic drift, more polysemous, and more likely to be more frequent than the bases they contain. These results are not replicated for suffixed words—a difference which can be ascribed to the inherently temporal nature of speech processing. In chapter 4 we turn to lexical aspects of speech processing, investigating the role of the relative frequency of the derived form and the base. I argue that, despite previous claims involving the absolute frequency of the derived form, all models which predict a relation between frequency and decomposability predict a relation involving relative frequency. Two experimental results are presented which confirm this prediction. First, subjects rate words which are more frequent than their bases as appearing less morphologically complex than matched counterparts which are less frequent than their bases. And second, words which are less frequent than the bases they contain (e.g.
illiberal) are more likely to attract a contrastive pitch accent to the prefix than matched counterparts which are more frequent than their bases (e.g. illegible). Chapter 5 further demonstrates the lexicon-internal effects of relative frequency, by presenting calculations over lexica. Derived forms which are more frequent than their bases are less semantically transparent and more polysemous than derived forms which are less frequent than their bases. This result proves robust both for prefixes and suffixes. In addition to being cognitively important, this result has methodological consequences for current work on morphological processing.
Some phonetic consequences of decomposability are examined in chapter 6, which presents experimental results involving /t/-deletion. Decomposable forms (such as softly—less frequent than soft) are characterized by less reduction in the implementation of the stop than forms which are not highly decomposable (e.g. swiftly—more frequent than swift). The remainder of the book examines the linguistic consequences of the reported results. The two factors we have examined provide us with powerful tools for estimating the decomposability of an affixed word. Armed with these tools, I demonstrate the central role of decomposability in morphological productivity (chapter 7) and the affix ordering generalization (chapter 8). The results are drawn together and discussed as a whole in chapter 9—the conclusion. Taken as a whole, the results in this book provide powerful evidence for the tight connection between speech processing, lexical representations, and aspects of linguistic competence. The likelihood that a form will be parsed during speech perception has profound consequences, from its grammaticality as a base of affixation, through to fine details of its implementation in the phonetics.
CHAPTER 2 Phonotactics and Morphology in Speech Perception
English speaking adults and infants use phonotactics to segment words from the speech stream. The goal of this chapter is to demonstrate that this strategy necessarily affects morphological processing. After briefly reviewing the evidence for the role of phonotactics in speech perception (2.1), I discuss the results of Experiment 1—the implementation of a simple recurrent network (2.3). This network is trained to use phonotactics to spot word boundaries, and then tested on a corpus of multimorphemic words. The learning transfers automatically to the spotting of (certain) morpheme boundaries. Morphologically complex words in English cannot escape the effects of a prelexical, phonotactics-based segmentation strategy. Having illustrated that segmentation at the word and morpheme level cannot be independent, I present the results of Experiment 2, which show that listeners can, indeed, use phonotactics to segment nonsense words into component “morphemes.”
2.1 Phonotactics in Speech Perception There is a rapidly accumulating body of evidence that language-specific phonotactic patterns affect speech perception. Phonotactics have been shown to affect the placement of phoneme category boundaries (Elman and McClelland 1988), performance in phoneme monitoring tasks (Otake et al. 1996), segmentation of nonce forms (Suomi et al. 1997), and perceived well-formedness of nonsense forms (Pierrehumbert 1994, Coleman 1996, Vitevitch et al. 1997, Treiman et al. 2000, Frisch et al. 2000, Hay et al. to appear, and others). Several of these results are gradient, indicating that speakers are aware of, and exploit, the statistics of their lexicon. Such statistics also appear to play a vital role in the acquisition process. Jusczyk et al. (1994) show that nine-month-old infants prefer frequent phonotactic patterns in their language to infrequent ones. Saffran, Aslin and Newport (1996) show that, when presented with a string of nonsense words, eight-month-old infants are sensitive to transitional probabilities in the speech stream. This is also true of adults (Saffran, Newport and Aslin 1996). This result is important because it suggests that sensitivity to probabilistic phonotactics plays a role in the segmentation of speech.
McQueen (1998) and van der Lugt (1999) provide further evidence that phonotactics are exploited for the task of locating word boundaries. While knowledge of segment-level distributional information appears to be important, it is certainly not the only cue which plays a role in segmentation (Bates and MacWhinney 1987, Christiansen, Allen and Seidenberg 1998). Other cues include the stress pattern (Cutler and Norris 1988, Jusczyk, Cutler and Redanz 1993), acoustic-phonetic cues (Lehiste 1972), prosody (Gleitman, Gleitman, Landau and Wanner 1988), knowing a substring (Dahan and Brent 1999) and attention to patterns at utterance boundaries (Brent and Cartwright 1996). In this chapter we concentrate on the role of junctural phonotactics.
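The transitional-probability evidence reviewed above can be illustrated with a toy familiarization stream. The three nonsense words are modeled on the Saffran et al. stimuli, but the stream and the counts below are my own construction:

```python
import random
from collections import Counter

random.seed(0)
words = ["bidaku", "padoti", "golabu"]           # nonsense words in the style of the stimuli
stream = [random.choice(words) for _ in range(300)]
syllables = [w[i:i + 2] for w in stream for i in (0, 2, 4)]

# Transitional probability: P(next syllable | current syllable).
pairs = Counter(zip(syllables, syllables[1:]))
firsts = Counter(syllables[:-1])
tp = {(a, b): n / firsts[a] for (a, b), n in pairs.items()}

# Within-word transitions are deterministic (TP = 1.0); across-word
# transitions hover around 1/3, so dips in TP mark the word boundaries.
print(tp[("bi", "da")])            # 1.0
print(round(tp[("ku", "pa")], 2))  # well below 1
```

A listener (or model) positing boundaries wherever TP dips recovers the word boundaries of the stream without any lexical knowledge, which is the sense in which this cue is prelexical.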
2.2 Neural Networks and Segmentation Neural network models have been used to demonstrate that distributional information related to phonotactics can inform the word segmentation task in language acquisition. Elman (1990) demonstrates that a network trained on a phoneme prediction task can indirectly predict word boundaries. Error is high word-initially, and declines during the presentation of the word. Based on this result, he claims that error distribution in phoneme prediction could aid in the acquisition process, with high error rates indicating word onsets. In keeping with this hypothesis, Christiansen et al. (1998), and Allen and Christiansen (1996) deploy a phoneme prediction task in which a network is trained to predict the next phoneme in a sequence, where one possible phoneme is a boundary unit. The boundary unit is activated at utterance boundaries during training. The input at any given time is a phoneme—how this is represented varies across different implementations, but most modelers use a distributed representation, with each node representing certain features. The correct output is the activation of the phoneme which will be presented next in the sequence. The sequences preceding the utterance boundary will always be sequences which are legal word endings. Thus, after training, the utterance boundary is more likely to be activated after phoneme sequences which tend to end words. Allen and Christiansen (1996) train their network on 15 tri-syllabic nonsense words, concatenated together into utterances ranging between 2 and 6 words. They demonstrate that the network was successful at learning this task when the word-internal probabilities were varied, so as to provide potential boundary information. However, when the word-internal probability distributions were flat, the network failed to learn the task. Christiansen et al.
(1998) scale the training data up, demonstrating that when the network is faced with input typical of that available to infants, phonotactics can also be exploited in the segmentation task. They train the model on utterances from the CHILDES database, and demonstrate that stress patterns, utterance boundaries and phonotactics all facilitate the phoneme prediction task, and so, indirectly, the identification of word boundaries. In a phoneme prediction task such as that deployed in Christiansen et al.’s network, the activation of a boundary depends on the phoneme(s) that precede it. Consider the trained network’s behavior on presentation of the word man. Each phoneme is presented to the network, and on the presentation of each phoneme, the network attempts to predict the next phoneme. On presentation of the /n/, then, a certain prediction will occur, and the
boundary marker will receive some degree of activation. The degree of activation of the boundary unit is unrelated to whatever phoneme actually occurs next. At the time of the presentation of /n/, the network does not know whether the next phoneme is /t/, /p/, or something else. As such, a network trained on a prediction task would make the same predictions regarding the presence of a boundary in strinty and strinpy. Given that adults do make different predictions regarding the decomposability of these strings (see section 2.4), we should assume that, at least in a mature listener, the phoneme following the potential juncture is also relevant to the task of segmentation. In the next section I describe a simple recurrent network designed to demonstrate that if phonotactics is deployed for the task of speech segmentation, this strategy must have direct consequences for morphological decomposition.
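The contrast between a left-context-only prediction score and a two-sided junctural score can be made concrete. All probabilities below are invented for illustration; neither scoring function is a model from the literature:

```python
# Invented probabilities for illustration only.
p_boundary_given_left = {"n": 0.3}       # prediction-style: left context only
p_within_morpheme = {
    ("n", "t"): 0.05,                    # /nt/ common morpheme-internally
    ("n", "p"): 0.001,                   # /np/ rare morpheme-internally
}

def prediction_score(left):
    """A forward-prediction network's boundary activation after `left`."""
    return p_boundary_given_left[left]

def two_sided_score(left, right):
    """Boundary evidence when the following phoneme is also consulted."""
    return 1.0 - p_within_morpheme[(left, right)]

# strinty vs. strinpy: identical for the prediction-style score...
print(prediction_score("n") == prediction_score("n"))        # True
# ...but distinguished once the right-hand phoneme is used.
print(two_sided_score("n", "p") > two_sided_score("n", "t")) # True
```

Any score computed purely from the left context must assign strinty and strinpy the same boundary evidence; only a two-sided score can reproduce the adult contrast reported in section 2.4.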
2.3 Experiment 1: A Simple Recurrent Network Because I was interested in demonstrating the inevitability of transfer from phonologically based segmentation to morphological decomposition, I decided to train a simple network on an explicit segmentation task, and provide it with the phonotactic information believed to be exploited in this task by adults—that is, provide it with information about the phonology on both sides of the potential juncture. The network was also provided with syllabification information, to allow it to make positionally sensitive generalizations. These strategies give the network the maximum chance of success at learning word boundaries. The network was trained on monomorphemic words of English, and then tested on multimorphemic words, to establish the degree of the transfer of learning to the task of morpheme spotting. 2.3.1 Network Architecture A simple recurrent network was designed, with a single output node, which should be activated if the current unit is the first unit of a word, and should remain unactivated elsewhere. The network was implemented using Tlearn (Plunkett and Elman 1997). The architecture is shown in figure 2.1. There are 27 input nodes, which are used for an essentially localist representation of units. The first three input nodes represent syllabic position—node one is on for onsets, node two for nuclei, and node three for codas. The remaining 24 nodes represent the different consonants of English. The network was not given information about distinctions between vowels: on every presentation of a vowel, node two was activated and no other. If the unit consists of a single consonant, just one of nodes 4–27 is activated. If the unit is a consonant cluster, the relevant component nodes are activated. This input strategy has the side effect that there are a small number of clusters that the network cannot distinguish between. The codas /ts/ and /st/, for example, have the same representation.
However, sonority constraints on sequencing ensure that the number of such “ambiguous” clusters in this representation is very small. The output layer consists of a single node, which is on if the current unit is the first unit in a word. In addition, the network has 6 hidden units and 6 context units. The small number of hidden units forces an intermediate distributed transformation of the localist
input. All links depicted in figure 2.1 are trainable, except the links between the hidden nodes and the context nodes, which are fixed one-to-one, so that the context nodes hold a copy of the hidden activation pattern from the previous unit.

Figure 2.1: Architecture of a simple recurrent network designed to identify word onsets.

2.3.2 Training Data

The training set was a subset of the CELEX lexical database (Baayen et al. 1995), identified as being monomorphemic. This set includes all words which CELEX tags as “monomorphemic.” CELEX also includes many words having “obscure morphology” or “possibly containing a root.” These two classes were examined independently by three linguists, and all forms identified as multimorphemic by any of the linguists were omitted from the corpus. After examining the resulting database, a number of forms identified as “monomorphemic” in CELEX were also subsequently excluded—these included reduplicative forms such as tomtom, and adjectives relating to places and ethnic groups such as Mayan. The resulting database is a set of 11383 English monomorphemes. Calculations based on this subset of CELEX have been reported in Hay et al. (in press) and Pierrehumbert (2001). Each word was parsed into a sequence of phonological units, consisting of onsets, nuclei and codas. The network was then trained on a single long utterance, consisting of one pass through this set of 11383 randomized monomorphemic English words. It therefore saw each word only once, so word frequency information is not encoded in the input at all. The total number of phonological units presented was 53454, with the output node turned on for the 11383 units which represented word onsets. The weights were initially randomized within the interval (−0.5, 0.5). The network was trained with a learning rate of .3 and a momentum of .9. After the presentation of each unit, the error was calculated and used to adjust the weights by back-propagation of error (Rumelhart et al. 1986).
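The training regime can be sketched as a minimal Elman-style network under the parameters stated above (27 inputs, 6 hidden units, 6 context units, one output node, learning rate .3, momentum .9, weights initialized in (−0.5, 0.5)). This is an illustrative reconstruction, not the original Tlearn implementation: as in Tlearn, the context units simply copy the previous hidden activations, and error is not propagated back through time.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_in, n_hid = 27, 6
W_ih = rng.uniform(-0.5, 0.5, (n_hid, n_in))   # input -> hidden
W_ch = rng.uniform(-0.5, 0.5, (n_hid, n_hid))  # context -> hidden (trainable)
W_ho = rng.uniform(-0.5, 0.5, (1, n_hid))      # hidden -> output
weights = (W_ih, W_ch, W_ho)
velocity = [np.zeros_like(w) for w in weights]
lr, momentum = 0.3, 0.9

def train_step(x, target, context):
    """Present one phonological unit: forward pass, backprop, update.
    Returns the hidden activations, which become the next context."""
    h = sigmoid(W_ih @ x + W_ch @ context)
    y = sigmoid(W_ho @ h)
    d_out = (target - y) * y * (1 - y)          # logistic output delta
    d_hid = (W_ho.T @ d_out) * h * (1 - h)      # hidden-layer delta
    grads = (np.outer(d_hid, x), np.outer(d_hid, context), np.outer(d_out, h))
    for i, (w, g) in enumerate(zip(weights, grads)):
        velocity[i] = momentum * velocity[i] + lr * g
        w += velocity[i]                        # in-place weight update
    return h.copy()

# Toy usage: one pass over 10 random binary input units, with only the
# first unit marked as a word onset.
context = np.zeros(n_hid)
for t in range(10):
    x = (rng.random(n_in) > 0.5).astype(float)
    context = train_step(x, 1.0 if t == 0 else 0.0, context)
```

The fixed one-to-one hidden-to-context copy appears here as the `return h.copy()` line: the context is never trained, only overwritten with the previous hidden state.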
2.3.3 Results and Discussion

After training on monomorphemes, the network was tested on 515 prefixed words. The corpus comprises all words containing one of 9 English consonant-final prefixes attached to a monomorphemic base. This is the same dataset used for the analysis in chapters 3 and 5, and described in detail in section 3.3.2. The words were concatenated together into a long utterance consisting of 3462 units. During testing, the output node displayed a range of activation from .000 to .975. In the statistics reported here, I assume a threshold of .5: in all cases in which the output node has an activation level above .5, it is assumed to have located a boundary. An examination of the word boundaries in the test corpus gives an indication of how well the network learned its task. Of 515 word-initial units, the network identified a boundary 458 times. That is, the network correctly identified 89% of the word boundaries, so it learned the task quite successfully. As predicted, this learning transferred to the spotting of morpheme boundaries. Of the 515 morpheme boundaries present in the test set, the network hypothesized a juncture at 314 (61%). In contrast, of the 2306 units which represented neither morpheme nor word onsets, the output node fired for only 127—that is, a boundary was falsely hypothesized on only about 5.5% of such units. The 61% identification rate of morpheme boundaries reflects the considerable variation in the type of phonotactics which occur across morpheme boundaries. Some transitions are fully legal word-internally, whereas others are more typical of inter-word, rather than intra-word, transitions. The variation in the behavior of the output node across the morpheme boundaries is significantly correlated with the phonotactics: the activation level of the output node can be significantly predicted by the probability of the transition occurring morpheme-internally, and the probability of it occurring across a word boundary (r²=.25, p<.0001).
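As a quick arithmetic check on the counts reported above (the counts themselves are from the text; note that 127 false alarms out of 2306 non-boundary units works out to roughly 5.5%):

```python
# Counts from the test corpus, as reported in the text (threshold .5).
word_onsets, word_hits = 515, 458
morph_boundaries, morph_hits = 515, 314
non_boundary_units, false_alarms = 2306, 127

print(round(100 * word_hits / word_onsets))               # -> 89
print(round(100 * morph_hits / morph_boundaries))         # -> 61
print(round(100 * false_alarms / non_boundary_units, 1))  # -> 5.5
```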
Prefix       LEGAL        LEGAL        ILLEGAL      ILLEGAL
             whole>base   base>whole   whole>base   base>whole
dis               8           86            0           11
un                9           58            1           20
in(sane)         22           82            0            3
em                8           47            1            8
up                0           20            0            8
mis               1           43            1            8
in(doors)         3           32            0            8
ex                0            6            0            2
trans             0            7            0           12
TOTAL        51 (12%)    381 (88%)       3 (4%)     80 (96%)

Table 3.6: Relative frequency of derived form and base for prefixed forms containing legal vs illegal phoneme transitions. Chi-square on TOTAL line: Chi-square=3.98, df=1, p<.05.