Gradience in Grammar
Gradience in Grammar Generative Perspectives
Edited by GISBERT FANSELOW, CAROLINE FÉRY, RALF VOGEL, AND MATTHIAS SCHLESEWSKY
Great Clarendon Street, Oxford OX2 6DP
Oxford University Press is a department of the University of Oxford. It furthers the University’s objective of excellence in research, scholarship, and education by publishing worldwide in Oxford, New York, Auckland, Cape Town, Dar es Salaam, Hong Kong, Karachi, Kuala Lumpur, Madrid, Melbourne, Mexico City, Nairobi, New Delhi, Shanghai, Taipei, Toronto. With offices in Argentina, Austria, Brazil, Chile, Czech Republic, France, Greece, Guatemala, Hungary, Italy, Japan, Poland, Portugal, Singapore, South Korea, Switzerland, Thailand, Turkey, Ukraine, Vietnam.
Oxford is a registered trade mark of Oxford University Press in the UK and in certain other countries.
Published in the United States by Oxford University Press Inc., New York
© 2006 organization and editorial matter Gisbert Fanselow, Caroline Féry, Ralf Vogel, and Matthias Schlesewsky
© 2006 the chapters their various authors
The moral rights of the author have been asserted
Database right Oxford University Press (maker)
First published 2006
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, or under terms agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above.
You must not circulate this book in any other binding or cover and you must impose the same condition on any acquirer.
British Library Cataloguing in Publication Data: Data available
Library of Congress Cataloging in Publication Data: Data available
Typeset by SPI Publisher Services, Pondicherry, India
Printed in Great Britain on acid-free paper by Biddles Ltd. www.biddles.co.uk
ISBN 019–927479–7 978–019–927479–6
Contents
Notes on Contributors
1 Gradience in Grammar Gisbert Fanselow, Caroline Féry, Ralf Vogel, and Matthias Schlesewsky
Part I The Nature of Gradience
2 Is there Gradient Phonology? Abigail C. Cohn
3 Gradedness: Interpretive Dependencies and Beyond Eric Reuland
4 Linguistic and Metalinguistic Tasks in Phonology: Methods and Findings Stefan A. Frisch and Adrienne M. Stearns
5 Intermediate Syntactic Variants in a Dialect–Standard Speech Repertoire and Relative Acceptability Leonie Cornips
6 Gradedness and Optionality in Mature and Developing Grammars Antonella Sorace
7 Decomposing Gradience: Quantitative versus Qualitative Distinctions Matthias Schlesewsky, Ina Bornkessel, and Brian McElree
Part II Gradience in Phonology
8 Gradient Perception of Intonation Caroline Féry and Ruben Stoel
9 Prototypicality Judgements as Inverted Perception Paul Boersma
10 Modelling Productivity with the Gradual Learning Algorithm: The Problem of Accidentally Exceptionless Generalizations Adam Albright and Bruce Hayes
Part III Gradience in Syntax
11 Gradedness as Relative Efficiency in the Processing of Syntax and Semantics John A. Hawkins
12 Probabilistic Grammars as Models of Gradience in Language Processing Matthew W. Crocker and Frank Keller
13 Degraded Acceptability and Markedness in Syntax, and the Stochastic Interpretation of Optimality Theory Ralf Vogel
14 Linear Optimality Theory as a Model of Gradience in Grammar Frank Keller
Part IV Gradience in Wh-Movement Constructions
15 Effects of Processing Difficulty on Judgements of Acceptability Gisbert Fanselow and Stefan Frisch
16 What’s What? Nomi Erteschik-Shir
17 Prosodic Influence on Syntactic Judgements Yoshihisa Kitagawa and Janet Dean Fodor
References
Index of Languages
Index of Subjects
Index of Names
Notes on Contributors
Adam Albright received his BA in linguistics from Cornell University in 1996 and his Ph.D. in linguistics from UCLA in 2002. He was a Faculty Fellow at UC Santa Cruz from 2002 to 2004, and is currently an Assistant Professor at MIT. His research interests include phonology, morphology, and learnability, with an emphasis on using computational modelling and experimental techniques to investigate issues in phonological theory.
Paul Boersma is Professor of Phonetic Sciences at the University of Amsterdam. He works on constraint-based models of bidirectional phonology and phonetics and its acquisition and evolution. His other interests include the history of Limburgian tones and the development of Praat, a computer program for speech analysis and manipulation.
Ina Bornkessel graduated from the University of Potsdam with a ‘Diplom’ (MA-equivalent) in general linguistics in 2001. In her Ph.D. research (completed in 2002 at the Max Planck Institute of Cognitive Neuroscience/University of Potsdam), she developed a neurocognitive model of real-time argument comprehension, which is still undergoing further development and is now being tested in a number of typologically different languages. Ina Bornkessel is currently the head of the Independent Junior Research Group Neurotypology at the Max Planck Institute for Human Cognitive and Brain Sciences in Leipzig.
Abigail C. Cohn is an Associate Professor in Linguistics at Cornell University, Ithaca, NY, where her research interests include phonology, phonetics, and their interactions. She has focused on the sound systems of a number of languages of Indonesia, as well as English and French. She received her Ph.D. in Linguistics at UCLA.
Leonie Cornips is Senior Researcher at the Department of Language Variation of the Meertens Institute (Royal Netherlands Academy of Arts and Sciences) and head of the department from 1 January 2006. Her dissertation (1994, Dutch Linguistics, University of Amsterdam) was about syntactic variation in a regional Dutch variety (Heerlen Dutch). Recently, she was responsible for the methodology of the Syntactic Atlas of the Dutch Dialects-project. Further, she examines non-standard Dutch varieties from both a sociolinguistic and generative perspective.
Matthew W. Crocker (Ph.D. 1992, Edinburgh) is Professor of Psycholinguistics at Saarland University, having previously been a lecturer and research fellow at the University of Edinburgh. His current research exploits eye-tracking methods and computational modelling to investigate adaptive mechanisms in human language comprehension, such as the use of prior linguistic experience and immediate visual context.
Nomi Erteschik-Shir is Associate Professor of Linguistics at Ben-Gurion University, Israel. Her research has concentrated on the focus–syntax–phonology interface. She is the author of The Dynamics of Focus Structure (Cambridge University Press, 1997), co-editor of The Syntax of Aspect (Oxford University Press, 2005) and is currently at work on a volume on the syntax–information structure interface, to be published by Oxford University Press.
Gisbert Fanselow received his Ph.D. in Passau (1985) and is currently Professor of Syntax at the University of Potsdam. Current research interests include word order, discontinuous arguments, wh-movement, and empirical methods in syntax research.
Caroline Féry is Professor of Grammar Theory (Phonology) at the University of Potsdam. Her research bears on different aspects of phonological theory, as well as on interface issues in which prosody plays the main role. She received her Ph.D. in Konstanz and her Habilitation in Tübingen. In recent years she has been involved in a large project on information structure in typological comparison.
Janet Dean Fodor has a BA in psychology and philosophy (Oxford University 1964) and a Ph.D. in linguistics (MIT 1970). She is Distinguished Professor of Linguistics, Graduate Center, City University of New York and President of the Linguistic Society of America since 1997. Her research interests are human sentence processing, especially prosodic influences and garden path reanalysis; and language learnability theory, especially modelling syntactic parameter setting.
Stefan Frisch studied psychology, philosophy, and linguistics (at the University of Heidelberg and the Free University Berlin). He was a research assistant at the Max-Planck Institute of Human Cognitive and Brain Sciences, Leipzig and at the University of Potsdam, where he got his Ph.D. in 2000. He is now a research assistant at the Day-Care Clinic of Cognitive Neurology, University of Leipzig.
Stefan A. Frisch received his Ph.D. in linguistics from Northwestern University, and is currently Assistant Professor in the Department of Communication Sciences and Disorders at the University of South Florida. He specializes in corpus studies of phonotactic patterns, experiments on the acceptability of novel word stimuli, and the phonetic study of phonological speech errors.
John A. Hawkins received his Ph.D. in linguistics from Cambridge University in 1975. He has held permanent positions at the University of Essex, the Max-Planck-Institute for Psycholinguistics, and the University of Southern California. He currently holds a chair at Cambridge University. His present research interests include language typology, processing, and grammar from a psycholinguistic perspective.
Bruce Hayes is Professor of Linguistics at the University of California, Los Angeles. He has published extensively in the fields of phonology and metrics, and is the author of Metrical Stress Theory: Principles and Case Studies.
Frank Keller is a Lecturer in the School of Informatics at the University of Edinburgh, where his research interests include sentence processing, linguistic judgements, and cognitive modelling. Before joining the School of Informatics, he worked as a postdoc at Saarland University, having obtained a Ph.D. in cognitive science from Edinburgh.
Yoshihisa Kitagawa is Associate Professor of Linguistics, Indiana University and has a Ph.D. in linguistics (University of Massachusetts at Amherst 1986). Current research interests are: information structure and syntax; the influence of prosody, processing, and pragmatics on grammaticality judgements; refinement of the Minimalist programme; interpretation of multiple-Wh-questions; economy in ellipsis; and anaphora.
Brian McElree received his Ph.D. in psychology in 1990 from Columbia University, where he studied psycholinguistics with Tom Bever and human memory with Barbara Dosher. He is currently Professor of Psychology at New York University. His research focuses on the cognitive structures and processes that enable language comprehension, as well as more general issues concerning basic mechanisms in human memory and attention.
Eric Reuland, who achieved his Ph.D. at Groningen in 1979, is Professor of Linguistics at Utrecht University. He conducts his research at the Utrecht Institute of Linguistics OTS. He is currently programme director of the Research Master Linguistics and European editor of Linguistic Inquiry. His research focuses on the ways anaphoric dependencies are linguistically represented, and includes the relation between grammatical and neuro-cognitive architecture.
Matthias Schlesewsky has a ‘Diplom’ in chemistry (MSc equivalent) from the University of Potsdam (1992). He subsequently moved to the field of theoretical linguistics, in which he obtained his Ph.D. in 1997 (Potsdam) with a dissertation on the processing of morphological case in German. From 1997 to 2002, he was a research assistant at the Department of Linguistics of the University of Potsdam, before becoming an Assistant Professor (‘Juniorprofessor’) of Neurolinguistics at the Philipps University Marburg in 2002. As documented by a wide range of international publications, his research interests focus primarily on the real-time comprehension of morphological case and arguments and its neurophysiological and neuroanatomical correlates.
Antonella Sorace is Professor of Developmental Linguistics at the University of Edinburgh. The common thread of her research is an interest in the developmental, synchronic, and experimental aspects of variation in language. Topics that she has investigated include grammatical representations and processing in early and late bilinguals, the interfaces between syntax and other domains, the psychology of linguistic intuitions, and the cognitive neuroscience of the bilingual brain.
Ruben Stoel received a Ph.D. from Leiden University in 2005. He is currently a research assistant at the University of Leiden. His interests include intonation, information structure, and the languages of Indonesia.
Ralf Vogel is Assistant Professor at the University of Bielefeld, having received a Ph.D. from Humboldt University Berlin in 1998. His research agenda involves the syntax of the Germanic languages, the development of Optimality theory syntax, both formally and empirically, including interdisciplinary interaction with computer scientists and psychologists. The development and exploration of empirical methods in grammar research has become a strong focus of his work.
1 Gradience in Grammar
GISBERT FANSELOW, CAROLINE FÉRY, RALF VOGEL, AND MATTHIAS SCHLESEWSKY
1.1 Introductory remarks
Gradience has become a topic to which more and more linguists are turning their attention. One can attribute this increased interest to a variety of different factors, but a growing methodological awareness certainly plays a role, as do the dramatically improved research possibilities in various domains such as the handling of very large corpora. However, applying these new methods borrowed from neighbouring disciplines such as psychology, sociology, or computer science rarely yields the kind of clear-cut categorical distinctions that most grammatical theories seem to work with.
While the increase of interest in gradience may be a fairly recent phenomenon, reflections on gradience can even be found in the very first treatise on generative grammar, viz. Chomsky (1955): different types of violations of syntactic principles, Chomsky observes, do not always lead to the same perception of ill-formedness. Sentences in which the basic laws of phrase structure are not respected (Man the saw cat a, Geese live in happily) appear much worse than those which merely violate selectional restrictions (John frightens sincerity). Such impressionistic judgements have been confirmed in controlled experiments (see Marks 1965), suggesting that there are indeed degrees of grammaticality, and Chomsky (1955) integrated an analysis of degrees of grammaticality in his early grammatical models.
The term gradience was introduced by Bolinger (1961a, 1961b), the first detailed work on the topic. Bolinger argued that, in contrast to what the structuralist tradition claimed and what the structuralist methodology implied, linguistic categories have blurred edges more often than not, and that apparently clear-cut categories often have to be replaced by non-discrete scales. Bolinger identified gradient phenomena in various domains of grammar,
such as semantic ambiguities, syntactic blends, and in phonological entities, including intensity and length, among others.
Such gradience in phonology can be illustrated by syllable structure constraints. How acceptable are the syllables pleal or plill? Are they possible English words? They violate a constraint which requires that the second segment of a complex onset should not be identical to the coda consonant (Davis and Baertsch 2005). On the other hand, plea, pea, lea, peal, eel, ill, and pill are English words, and nothing prohibits CCVC syllables in English. For this reason, pleal and plill are better syllables than the sequence tnoplt, which violates several well-formedness restrictions on the English syllable. With this simple example, we have already defined three degrees of acceptability of monosyllabic words for English: perfect (and attested), not so good (and unattested) and unacceptable (and unattested). This three-way acceptability scale can be further refined: tnopt, however badly ill-formed, is slightly better than tnoplt, since at least it does not violate the sonority hierarchy in addition to containing a prohibited onset. In attested syllables, more fine-grained wordlikeness exists as well: the attested words mentioned above are better than equally attested words like Knut or sour with a more marked syllable structure. In Knut, the onset [kn] is otherwise not attested in English, and in sour, the sequence of a diphthong plus an [r] is marginal.
The present book aims to represent the state of the art in dealing with gradient phenomena in the formal aspects of language, with a clearly visible emphasis on phonological and syntactic issues. The gradient data discussed in this book come from a wide array of phenomena: formant values, segmental and allophonic variations, morphological productivity, and tone and stress patterns, as well as word order variation, question formation, case matching in free relative clauses, and binding facts. 
The variety of empirical domains addressed in this book is matched by a similar richness of factors discussed in the chapters that might be made responsible for the gradient rather than categorical properties of the constructions in question: frequency of occurrence figures prominently, in particular in the phonological contributions, but the impact of processing difficulty, the production–perception distinction, and the fit into context and information structure are considered as well, among other topics. Several papers also explore the representation and explanation of gradience within grammar itself, following a line of research opened by the quantitative studies of Labov.
What makes dealing with gradience quite difficult is that it is linked to a number of central (and often somewhat controversial) issues in the theory of grammar and our understanding of language. Linguistic objects such as words or sentences may sound acceptable to various degrees, and one can collect
data on acceptability by asking speakers of a language for judgements on some scale, but all that we arrive at by this procedure is the (relative) mean acceptability of a particular linguistic object among the speakers of a language, dialect, or, at least, the participants of the judgement study. In a strict sense, the object generated by the experiment (viz. mean acceptability) thus no longer necessarily belongs to the scope of a generative grammar if such a grammar is meant to represent a psychological state of a native speaker, as Chomsky claims. A certain mean acceptability value n may arise because nearly all speakers consulted find the sentence acceptable to degree n, or because the mean of widely diverging judgements of the participants equals n, and the behaviour of the individual speaker may show high or low variance as well.
Is the gradience we observe thus a property of the mentally represented grammar, or does it reflect variation among speakers? If the latter is the case, our descriptions will have to cover geographical, social, and also temporal dimensions, to the extent that variation and language change are related. These issues have been discussed in detail in phonology, as we will see below. One crucial question is whether different dialects and registers should be described independently of each other (and if they are represented independently of each other in our brains); another question is whether it is justifiable at all to talk of clearly separate dialects or sociolects.
The problem of intermediate varieties is addressed by Leonie Cornips in her contribution ‘Intermediate Syntactic Variants in a Dialect–Standard Speech Repertoire and Relative Acceptability’. 
She describes how intermediate language varieties with their own syntactic characteristics emerge from a language contact situation which is typical for speakers in European societies, namely the contact between a regional dialect and the standard language variety spoken in the respective country. An intermediate variety is a variant of the standard language that is characteristic of a particular region. Cornips shows that in the intermediate variant Heerlen Dutch the inalienable possession construction has syntactic characteristics which can be found neither in the local dialect nor in standard Dutch. Speakers can effortlessly shift between the three variants, the regional Heerlen dialect, Standard Dutch, and Heerlen Dutch. However, Cornips and her colleagues found that it is no longer possible for these speakers to give clear-cut judgements about the local dialect. For instance, speakers tend to attribute to their local dialect all versions of the inalienable possession construction which are possible in the three varieties. Cornips argues that the intermediate variants form a continuum with the standard and local dialect varieties, which has arisen due to geographic, stylistic, and social factors. As a consequence, speakers can only make relative
judgements by comparing variants of a particular form. Gradient acceptability is here the result of uncertainty about their own dialects on the part of the speakers.
The study of aspects of variation thus leads us back to the question of what gradience means in terms of an individual speaker’s grammar. Controversial issues may arise in two further domains here. First, if one accepts that data such as response frequencies in a forced choice acceptability experiment are relevant for a linguist identifying the grammar of a language, the question arises as to why perception data should have a privileged status. The grammar, so one can argue, should also have something to say about production facts, such as frequencies in controlled production experiments, or frequencies in text corpora. The contributions by Boersma, Crocker and Keller, and Vogel, discussed in detail below, address the issue of how to cope with situations in which gradience as determined by production facts (corpus frequency) does not go hand in hand with gradience as measured in perception. In general, while there are undeniable positive correlations between frequency and gradient acceptability (see below), there are also clear cases where the two aspects are not in agreement, so that the relation between theories that are built on frequency data and those that rely on acceptability judgements is not always obvious. It would seem that phonological theories attribute a greater role to frequency effects than syntactic theories, which may be related to the fact that phonology is more concerned with stored and storable items than syntax. But the issues arising here are far from being resolved.
In any event, a narrow interpretation of the scope of generative grammar again implies that corpus frequency data are not the kind of object a grammar can explain, since corpus frequencies do not reflect a mental state of a speaker whose internal grammar one wants to describe. 
Of course, concrete corpora are shaped by a large number of linguistically irrelevant factors (among them, What are people interested in talking about?), and they are further influenced by a set of cognitive factors (relative production difficulty) that one does not necessarily have to cover in a grammar. However, the same is true for perception data, since, for example, processing difficulty influences acceptability judgements as well (see, e.g. Fanselow and Frisch, Ch. 15 this volume). In principle, everyone probably agrees that there are no privileged data, but in practice grammatical models differ quite substantially as to the type of data they are designed to capture.
The second aspect is much more controversial. As mentioned above, many factors influence the relative acceptability and the relative frequency of a linguistic item. When we develop a model for gradience, we must take all of them into account. The controversy, which comes in many different guises
such as phonetics versus phonology, grammaticality versus acceptability, competence versus performance, is whether it makes sense to keep at least some of these factors (say, working memory) outside of what one specifies in one’s grammar, and if so, whether one can keep all factors that introduce gradient properties external to grammar. The answers which can be found in the literature range from the claim that grammar itself is gradience-free to the position that the questions addressed here make no sense at all because their presuppositions are not fulfilled.
1.2 Theories of gradience in phonology
Just like its structuralist predecessor, generative phonology set out with the ideal of formulating an essentially categorical model. The aim of feature theory, segment inventories, and, of course, of rules and derivations has been to provide clear-cut categories such as the following: a system for describing all phonemes of all languages and a system of ordered rules that derive completely specified surface forms, not available for further variations. The re-write rules of Chomsky and Halle (1968) were conceived for categorical outputs, which implies that variations within a language were incompatible with the purely categorical approach encompassed in the generative format. It was generally accepted among generativists that phonology is categorical and phonetics gradient.
In her chapter ‘Is there Gradient Phonology?’, Abigail Cohn discusses this division between categorical phonology and gradient phonetics, and asks where the line between the two modules is to be drawn. The answer to this question proves to be more difficult than the previous generative phonology has suggested. Cohn’s main point is that there are grey areas in which sound patterns may be explained in terms of gradience or of categoricity, so that it is difficult to clearly separate phonology from phonetics. In other words, phonology, even if obviously categorical in some of its parts, also makes use of gradient patterns. She defines the term gradience: (a) as a change in phonetic space and time; (b) in the sense of variability (also in a diachronic dimension); and (c) in the sense of gradient well-formedness, concentrating on the first and marginally on the third interpretation. The first problem she addresses in her paper concerns contrast. She asks the question of whether contrast may be gradient, a situation which may arise when two phonemes contrast in some positions but not in others. 
Second, she looks at phonotactic generalizations, like Pierrehumbert’s generalizations on medial consonant clusters discussed below, which can also be considered gradient. The third and last question concentrates on alternations, divided into morphophonemic
and allophonic ones. Steriade’s phonetic paradigm uniformity effects and Bybee’s frequency effect in allophony are addressed in some detail. The well-foundedness of Steriade’s claims that paradigms retain some phonetic properties of their stem or of some specific inflected form (leading to over- or underapplication of phonological processes) is questioned. Similarly, Bybee’s suggestion that more frequent words are shorter and phonologically reduced as compared to less frequent ones is also scrutinized. Cohn observes that both effects may be less pervasive than claimed by their proponents. The conclusion of the paper is that phonology is both gradient and categorical, but that phonology is not to be confounded with phonetics: both are separate modules of the study of language.
Returning to the second sense of gradience in Cohn’s list, one observes that in sociolinguistic phonology, the discrete nature of transformational rules was questioned very early. Labov (1969), Cedergren and Sankoff (1974), and Sankoff and Labov (1979) proposed accounting for variation in spontaneous utterances by adding weighted contextual factors to rules (‘variable rules’). The treatment of t,d deletion in South Harlem English (Labov et al. 1968) was a seminal study in this domain (see also Fasold 1991), and we will use this example to show how variation and gradience are inherent to the phonological part of grammar. The introduction of variable rules into linguistic theory was severely criticized, mainly because of its alleged illicit blurring between different theoretical levels (see as an example Kaye and McDaniel 1978). In a series of studies, Bybee (Hooper 1976; Bybee 1994, 2000a, 2000b, 2001) has quantified t,d deletion in Standard English, among other lenition and reduction processes, and she shows convincingly that this process is an on-going diachronic change, agreeing in this with Labov (1994). 
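The idea of a rule carrying weighted contextual factors can be made concrete with a small sketch. It assumes, purely for illustration, that the weights combine on the logit scale, so that a weight of 0.5 is neutral, weights above 0.5 favour application of the rule, and weights below 0.5 disfavour it. The factor names and all numeric values are invented and are not taken from any of the studies cited above.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def application_probability(input_weight, factor_weights):
    """Probability that a variable rule applies in one context.

    Assumption: the overall input weight and the contextual factor
    weights are added on the logit scale (0.5 contributes nothing).
    """
    total = logit(input_weight) + sum(logit(w) for w in factor_weights)
    return 1.0 / (1.0 + math.exp(-total))


# Hypothetical weights for a t,d deletion rule (illustrative values only).
WEIGHTS = {
    "following_consonant": 0.7,
    "following_vowel": 0.3,
    "monomorphemic": 0.6,
    "past_tense": 0.4,
}

# Favouring context: monomorphemic word before a consonant.
p_favouring = application_probability(
    0.5, [WEIGHTS["following_consonant"], WEIGHTS["monomorphemic"]]
)

# Disfavouring context: past tense form before a vowel.
p_disfavouring = application_probability(
    0.5, [WEIGHTS["following_vowel"], WEIGHTS["past_tense"]]
)

print(round(p_favouring, 3), round(p_disfavouring, 3))
```

Under these invented weights the rule applies far more often in the favouring context than in the disfavouring one, which is the kind of graded, context-dependent outcome a categorical rewrite rule cannot express.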
Diachronic change seems to always be preceded by a phase of synchronic variation—although the reverse is not true, as will be shown below. Studying synchronic variation helps us to understand better how language changes diachronically, and why historical changes happen at all. Sociolinguists like Labov have been interested in external factors—social class, ethnicity, gender, age, and so on—which introduce variation into synchronic language. Other linguists have concentrated on the internal factors that trigger change, historical or not, an aspect of this line of research which is relevant for the issues in this book. Kiparsky (1982) and Guy (1980, 1981), for instance, use the framework of Lexical Phonology, which posits that morphology is organized in several derivational levels, each of them with their own phonological rules. They show that there is a strong correlation between the morphological structure and the rate of t,d deletion, and they use categorical aspects of the model to express the variability of
deletion. An additional factor for variation comes from frequency. Bybee (2000a, 2003) finds a significant effect of frequency on the rate of deletion. Also among past tense verbs, there is an effect of frequency, since high frequency verbs delete their final coronal stop more often than low frequency ones. These results are largely confirmed by Jurafsky et al. (2001) in a study using the Switchboard Corpus (Godfrey et al. 1992), a corpus of telephone conversations between monolingual American English speakers. They find that high frequency words delete final t or d twice as frequently as low frequency words. Jurafsky and colleagues are interested in the fact that words which are strongly related to each other (like grand piano) or which are predictable from their neighbours, as for example in collocations, are more likely to be phonologically reduced.
On the basis of all these facts, Bybee (2003) presents a differentiated account of t,d deletion. Whereas Labov (1994) regards the process as phonemic, in other words as a process in which a phoneme is always deleted entirely (categorically deleted), lexical diffusion is for her both lexically and phonetically gradient. This means that the coronal stop is not abruptly deleted but is lenited first. Eventually, a segment which is lenited more and more may disappear altogether (see also Kirchner 2001 for an account of lenition in Optimality Theory). Bybee (2003) uses Timberlake’s (1977) insight on the distinction between uniform and alternating environments to explain asymmetries in the pattern of t,d deletion, like the fewer occurrences of deletion in the regular past tense morpheme. The past tense morpheme t,d has an alternating environment both in the preceding and in the following segment. It could be that the environments retarding deletion (preceding vowel, as in played for instance, as contrasted with an obstruent, as in jumped, missed, or rubbed) have an overall effect on the pattern. 
Gradually, the more frequent occurrences impose their phonetic structure in more and more contexts, and this explains why words that occur in the context for a change more frequently undergo the change at a faster rate than those that occur less frequently in the appropriate context. In words like different or grand, t,d are always in the right context for deletion, given the syllable structure of the word in isolation, but the past tense morpheme is more often in a context where deletion does not trigger a better syllable structure. In the case of different or grand, the effects of the change are represented lexically before those in the case of the past tense morpheme. We have presented the t,d deletion facts in some detail because they illustrate the state of the art in phonological gradience: variation and change are not external to the grammar and lexicon but inherent to them, and the diffusion is not the result of random variation but rather stems from reduction
processes that occur in the normal automation of motor activity. Frequently used segment sequences are easier to articulate because the neuromotor activity controlling them is automatic. According to Lindblom (1990), lenition is due to hypoarticulation. He claims that speakers undershoot phonetic targets, but only to the point at which their utterances are still recoverable. Moreover, frequent words are more often in unstressed positions, which are associated with less articulatory effort. A frequent word is often used several times in a discourse and this pre-mentioning increases its hypoarticulation even more. Pierrehumbert (2001, 2002) proposes a model to explain the pattern of change, couched in the exemplar or episodic theory, originally a cognitive theory of perception and categorization. In the linguistic extension of the theory (Johnson 1997; Goldinger 2000), the mental lexicon consists of stored episodes, and not of abstract units, as has been assumed in generative phonology. During perception, a large number of traces residing in memory are categorized, and activated on the basis of what is heard. Traces cluster in categories as a function of their similarity. Moreover, frequent words leave more traces than infrequent ones. As a new word is encountered, it is categorized as a function of its similarity to existing exemplars, according to probabilistic computation. If one category is more probable than its competitors, the new item is categorized as an exemplar of this category, and in case of ambiguity, the more frequent label wins. Pierrehumbert interprets this theory as implicit knowledge about the probabilistic distribution of phonological elements, organized as a cognitive map. Frequency is not directly encoded but is just an artefact of the number of relevant traces. A word which is heard frequently possesses more traces and thus is activated more strongly than a rare word.
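The categorization step described above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the function name `categorize`, the one-dimensional acoustic value, the exponential similarity function, and the `bandwidth` parameter are all hypothetical choices and are not taken from Pierrehumbert, Johnson, or Goldinger. What the sketch does show is the point made in the text: frequency is not stored anywhere, yet a category with more traces is activated more strongly.

```python
import math


def categorize(token, exemplars, bandwidth=50.0):
    """Assign `token` (one acoustic value, e.g. F1 in Hz) to a category.

    `exemplars` maps each category label to its list of stored traces.
    All names and the exponential similarity function are illustrative
    assumptions, not taken from the models cited in the text.
    """
    activation = {}
    for label, traces in exemplars.items():
        # Each stored trace contributes according to its similarity to the
        # token, so a category with many traces accumulates more activation.
        activation[label] = sum(
            math.exp(-abs(token - trace) / bandwidth) for trace in traces
        )
    total = sum(activation.values())
    # Normalizing the activations gives a probability for each category.
    probabilities = {label: a / total for label, a in activation.items()}
    best = max(probabilities, key=probabilities.get)
    return best, probabilities


# Hypothetical memory: the frequent category has left more traces.
memory = {"frequent": [300.0, 300.0, 300.0], "rare": [400.0]}
label, probs = categorize(350.0, memory)
print(label, round(probs[label], 2))
```

In this toy memory the incoming token is equally distant from both categories, so the outcome is decided purely by the number of traces; a token much closer to the rare category would still be assigned to it.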
A positive aspect of this theory is that it explains the phonetic details that speakers of a certain language have to know in order to master the articulatory subtleties. Languages differ in their vocalic distribution for instance, and if speakers just have access to rough universal categories, as has been assumed in the generative approaches to phonology, this fact is difficult to understand. If their phonemic knowledge is based on real acoustic traces of words pronounced, then the native speaker competence can be understood as the accumulation of the large number of memory experiences. As an explanation of historical change, the perceptual memories of the lenited word forms may increase incrementally. High frequency words, which are lenited for the reasons mentioned above, are heard more often than low frequency ones, shifting the direction of historical change even more.
Throughout this book, we will see that frequency plays a crucial role in patterns of gradience. Frequency in phonology has also been examined from a different perspective, namely from the point of view of phonotactic patterns. Frisch (1996) and Frisch et al. (1997) model a gradient constraint combination to account for the phonotactics of the verbal roots in Arabic. In their chapter ‘Linguistic and Metalinguistic Tasks in Phonology: Methods and Findings’, Stefan A. Frisch and Adrienne M. Stearns demonstrate that probabilistic and gradient phonological patterns are part of the knowledge of a language in general. Evidence for this thesis comes from a variety of sources, including psycholinguistic experiments using metalinguistic and language processing tasks, as well as studies of language corpora. These results support theories that information about phonological pattern frequency is encoded at the processing and production levels of linguistic representation. Frisch and Stearns’s chapter first reviews the methodologies that have been used in phonological studies employing metalinguistic phonological judgements, primarily in the case of phonotactics. These studies have found that native speaker judgements closely reflect the phonotactic patterns of language. Direct measures include well-formedness judgements, such as acceptability judgements and wordlikeness judgements (Frisch et al. 2000), morphophonological knowledge (Zuraw 2000), influence of transitional probabilities on wordlikeness judgements for novel words (Hay et al. 2004), distance of novel CCVC words as measured by a phoneme substitution score (Greenberg and Jenkins 1964), and measures of similarity between words. Indirect measures reflect the grammatical linguistic knowledge through linguistic performance and thus provide evidence for the psychological reality of gradient phonological patterns.
They include elicitation of novel forms (wug tests), analysis of lexical distributions and of large corpora in general, as well as analysis of confusability in perception and production. These last tests show that lexical neighbourhood and phonotactic probability affect speech production. Frisch and Stearns’s case study shows that sonority restrictions in consonant clusters are gradient, the cross-linguistic preference being for onset consonant clusters that have a large sonority difference. Quantitative language patterns for thirty-seven languages were measured and compared to attested clusters. Metalinguistic judgements of wordlikeness were also gathered for English and compared to the attested and possible clusters, the results again providing evidence for the psychological reality of gradient patterns in phonology. Mean wordlikeness judgements correlated significantly with the type frequency of the CC sequences contained in the novel words.
The authors do not provide a grammatical model for their data. They even conjecture that it is unclear whether a distinct phonological grammar is required above and beyond what is necessary to explain patterns of phonological processing. Given the grounding of gradient phonological patterns in lexical distributions, they propose that exemplar models, based on frequency information and probabilities, explain generalization-based behaviour as a reflex of the collective activation of exemplars that are similar along some phonological dimension, rendering abstract representations obsolete. The contributions by Boersma and by Albright and Hayes propose anchoring the correlation between gradience and frequency in grammar. They use the Gradual Learning Algorithm (GLA) developed by Boersma (1998a) and Boersma and Hayes (2001), a stochastic model of Optimality Theory. In GLA, the variation comes from the possibility of a reordering of two or more constraints in the hierarchy, expressed by overlapping of the constraints’ ranges. In addition, the constraints have different distances to their neighbours. The likelihood of a reordering is thus not a function of the rank in the hierarchy, but rather of the stipulated distance between the constraints, which is encoded in the grammar by assigning numerical values to the constraints which determine their rank and their distance at the same time. Boersma and Hayes’s model thus allows us to deal with error variation as a source of gradience in a language-particular way. Adam Albright and Bruce Hayes’s chapter ‘Modelling Productivity with the Gradual Learning Algorithm: The Problem of Accidentally Exceptionless Generalizations’ addresses the modelling of gradient data in inflectional paradigms. Related to this is an empirical question about productivity: when language learners are confronted with new data, which weight do they assign to accuracy versus generality?
This problem arises in relationship to accidentally true or small-scale generalizations. These kinds of generalizations are confined to a small set of forms and correlate with unproductivity. This is a classic problem of inductive learning algorithms which are restricted to a training set: when confronted with new data, they might fail to make the right predictions. In a standard Optimality Theory approach, constraints deduced from the training set apply to the forms used for learning, but unfortunately they make wrong predictions for new forms. Reliability of rules or constraints, that is, how much of the input data they cover and how many exceptions they involve, is not the right property to remedy this problem. Generality may make better predictions, especially in the case of optionality between two forms. Children acquiring English, for instance, are confronted with several answers as to how to form the past tense, as exemplified by wing winged,
wring wrung, and sing sang, which are attested English forms. A subset of such verbs, composed of dig, cling, fling and sling build their past tense with [ʌ] simulation. Albright and Hayes (2003) find that for a newly coined verb like spling, English speakers rate past tense splung and splinged nearly equivalently high. Their conclusion is that general rules, like ‘form past tense with -ed’ are so general that they sometimes compete with exceptionless particular rules. Navajo sibilant harmony in affixation, the data set discussed in this chapter, exhibits a similar, although attested, optionality. If a stem begins with a [–anterior] sibilant ([č, č, čh, š, ž]), the s-perfective prefix [ši] is attached. If the stem contains no sibilant, the prefix [si] is the chosen form. If there is a [–anterior] sibilant later in the stem, both [si] and [ši] are possible. Albright and Hayes’s learning system is not able to cope with this pattern. The problem is that in addition to general and useful constraints, the system also generates junk constraints which apply without exception to a small number of forms, but which make incorrect predictions for new forms. To remedy the problem they rely on the Gradual Learning Algorithm (Boersma and Hayes 2001), which assumes a stochastic version of OT. Each pair of constraints is not strictly ranked, but rather assigned a probability index. The solution they propose is to provide each constraint with a generality index. Each rule is provided with a ranking index correlating with generality: the more general the constraint (in terms of the absolute number of forms which fulfil it), the higher it is ranked in the initial ranking. The junk constraints are ranked very low in the initial ranking and have no chance to attain a high ranking, even though they are never violated by the data of the learning set.
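The idea that a pair of constraints is ranked only with a certain probability can be illustrated with a short simulation. This is a minimal sketch under stated assumptions, not the Gradual Learning Algorithm itself: it covers only the stochastic evaluation step, in which each constraint's numerical ranking value is perturbed by noise before the constraints are ordered. The function name `prob_outranks`, the Gaussian noise, the value `noise_sd=2.0`, and the ranking values in the usage lines are hypothetical choices made for illustration; the learning step and the generality-based initial ranking are not modelled.

```python
import random


def prob_outranks(value_a, value_b, noise_sd=2.0, trials=20000, seed=0):
    """Estimate how often constraint A outranks constraint B at evaluation time.

    `value_a` and `value_b` are numerical ranking values (hypothetical here).
    The noise distribution and its standard deviation are assumptions.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        # Each evaluation perturbs both ranking values with Gaussian noise.
        a = rng.gauss(value_a, noise_sd)
        b = rng.gauss(value_b, noise_sd)
        if a > b:
            wins += 1
    return wins / trials


# Equal values: either order is about equally likely (free variation).
print(prob_outranks(100.0, 100.0))
# Close values: the ranges overlap, so reordering is still common.
print(prob_outranks(101.0, 100.0))
# Distant values: the ranking is effectively categorical.
print(prob_outranks(110.0, 90.0))
```

The simulation makes the point of the text visible: the likelihood of a reordering depends on the distance between the two ranking values, not on where the constraints sit in the hierarchy. A constraint given a very low initial value, as is proposed for the junk constraints, would almost never outrank a distant competitor.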
Turning now to Paul Boersma’s chapter ‘Prototypicality Judgements as Inverted Perception’, it must first be noticed that his goal is very different from that of the preceding chapter. Boersma presents an account of gradience effects in prototypicality tasks as compared to phoneme production tasks with the example of the vowel /i/. A prototype is more peripheral than the modal auditory form (the form they hear) in the listeners’ language environment, including the forms produced by the listeners themselves. The difference between the best prototype /i/ and the best articulated vowel [i] is implemented as a difference in the value of F1 and shows a discrepancy of 50 Hz between the two tasks. Boersma provides a model of production and comprehension which implements the main difference between the tasks in the presence of the articulatory constraints in the production task and their absence in the prototypicality task. More specifically, he proposes augmenting
the usual three-level grammar model (Underlying Form (UF) → Surface Form (SF) → Overt Form (OF)) with articulatory and auditory representations at OF (ArtF and AudF). Production consists of a single mapping from UF to ArtF, with obligatory considerations of SF and AudF. Comprehension, on the other hand, takes place in two separate steps: one process is called perception (AudF is mapped to SF) and is equivalent to the prelexical perception of McQueen and Cutler (1997), and the other process is recognition or lexical access (SF to UF). Boersma’s formal analysis is couched in a stochastic OT approach of the kind he has proposed in previous work (see above). In the case of perception (prototypicality task), auditory (sensorimotor) events are relevant. The mapping of an incoming F1 to a certain vowel is expressed by a series of negatively formulated cue constraints of the form ‘an F1 of 340 Hz is not /i/.’ The ranking of all such constraints determines for a specific value of F1 whether it is to be interpreted as an /i/, an /e/ or any other vowel. In the production task, both auditory and articulatory constraints are active. Auditory cue constraints are the same as before, in the same ranking, but now, articulatory constraints expressed as degrees of articulatory precision are also involved. Too much effort is avoided, which explains why the phoneme production task delivers a different candidate from the prototypicality task. We have discussed a number of approaches and papers that point at a correlation between frequency patterns and acceptability. The question arises as to whether this correlation is also attested in other parts of the phonology. In their chapter ‘Gradient Perception of Intonation’, Caroline Féry and Ruben Stoel address tonal contours as gradient phonological objects.
Since tone patterns only exist in their association with texts, they conducted an experiment in which sentences realized with different (marked and less marked) tonal patterns were presented to subjects in an acceptability rating task. The results of the experiment pointed to the existence of more generally accepted contours. These are the intonational patterns found in a large number of contexts. These contours are those which were originally produced as an answer to a question asking for a wide-focused (all-new) configuration, or for a topic-focus pattern. A tonal pattern corresponding to narrow focus on an early constituent was clearly deviant in most cases, and was attributed a low grade in most contexts except for the matching one. Intermediate grades were also obtained, thus revealing that tonal patterns are gradient objects. An additional result was that the more syntactically and semantically complex the sentences are, the less clear-cut the results. An interesting point mentioned in the chapter is that the results obtained by means of a scale are indistinguishable from those obtained by categorical judgements. All in all, the paper’s
conclusion is that intonational contours are gradient objects, as far as their acceptability is concerned, in the same way as segment clusters or word orders are, and that the acceptability of tonal contours correlates with frequency. This remark may point to the conclusion that what looks idiosyncratic at first glance may turn out to be the product of experience after all. What is heard more often is felt to be more acceptable. In their chapter ‘Prosodic Influence on Syntactic Judgements’, Yoshihisa Kitagawa and Janet Dean Fodor build a bridge from phonology to syntax. Their point of departure is that a construction which requires a non-default prosody is vulnerable to misjudgements of syntactic well-formedness when it is read, and not heard. Acceptability judgements on written sentences are not purely syntax-driven; they are not free of prosody even though no prosody is present in the stimulus. As the basis of their observations, they presuppose the Implicit Prosody Hypothesis (Fodor 2002b), which claims that readers project a default prosody onto the read sentences. They elicited grammaticality judgements on both Japanese and English sentences requiring a marked prosody in order to be grammatical. The Japanese target items were instances of constructions with wh-in-situ and long-distance scrambled wh. Each was disambiguated by its combination of matrix and subordinate complementizers toward what has been reported to be its less preferred scope interpretation: (a) subordinate wh-in-situ with forced matrix scope, and (b) wh scrambled from the subordinate clause into the matrix clause, with forced subordinate scope. The results of the experiments revealed that the target sentences were accepted more often in the listening task than in the reading task. The English sentences consisted of negative polarity items (NPIs) in high/low attachment.
In the first case, the prosody has to be strongly marked, whereas in the second case, the difference in attachment barely elicits a difference in prosody. Subjects accepted the NPI sentences more often when listening to them than when reading them. The best scores were obtained by a combination of reading and listening.
1.3 Theories of gradience in syntax

There is some irony in the fact (mentioned above) that the first work in generative syntax developed a model for degrees of grammaticality, while gradience never played a crucial role later in what may be called mainstream (generative) syntax. Leading current syntactic models such as Minimalism, OT, OT-LFG, HPSG, or categorial grammar seem disinterested in gradience, at least as evidenced by the ‘standard references’ to these models, and this was not much different ten years ago.
There are reasons for considering this negligence unfortunate. Some key domains of syntax show gradience to a considerable degree. The subjacency phenomena, superiority effects, and word order restrictions figure prominently in this respect. This high degree of gradience often makes it unclear what the data really are, and syntactic practice does not follow the golden rule of formulating theories on the basis of uncontroversial data only (and have the theory then decide the status of the unclear cases). We believe that theories formulated on the basis of clear-cut data only would not really be interesting in many fields of syntax, so it is necessary to make the ‘problematic’ data less controversial, that is, to formulate a theory of gradience in syntax. There are two types of approaches to syntactic gradience as a property of grammar. Chomsky (1955) allows an interpretation in which the gradience is coded as a property of the grammatical rules or principles. Simplifying his idea a bit, one can say that full grammaticality is determined by a set of very detailed, high precision rules. If a sentence is in line with these, it has the highest degree of acceptability. For deviant sentences, we can determine the amount (and kind) of information we would have to eliminate from the high precision, full detail rules in order to make the deviant sentence fit the rule. The less we have to eliminate, the less unacceptable the sentence is. Müller (1999) makes a related proposal. He refines standard OT syntax with the concept of subhierarchies composed of certain principles that are inserted into the hierarchy of the other syntactic constraints. For ‘grammaticality’, it only matters whether at least one of the principles of the subhierarchy is fulfilled, but a structure is ‘unmarked’ only if it is the optimal structural candidate with respect to all of the principles in the subhierarchy.
In the other tradition, represented, for example, by Suppes (1970), the rules and constraints of the grammar are directly linked to numerical values (not unlike the variable rules in phonology). In his chapter ‘Linear Optimality Theory as a Model of Gradience in Grammar’, Frank Keller introduces a theory of the second type, Linear Optimality Theory (LOT). In contrast to standard OT approaches, LOT is designed to model gradient acceptability judgement data. The author argues that the necessity for such an approach results from the observation that gradience in judgement data has different properties from gradience in corpus data and that, therefore, both types of gradience should be modelled independently. The basic idea of the model is represented in two hypotheses which are formulated (a) with respect to the relative ranking of the constraints and (b) regarding the cumulativity of constraint violations. Whereas the former states that the numeric weight of a constraint is correlated with the reduction
in acceptability to which it leads, the latter assumes that multiple constraint violations are linearly cumulated. By means of a comparison of LOT and other optimality theoretic models, such as Harmonic Grammar or Probabilistic Optimality Theory, the author demonstrates the advantages of LOT in modelling relative grammaticality and the corresponding judgements that include optimal as well as suboptimal candidates within one ranking. Keller therefore presents a necessary prerequisite for a successful understanding and modelling of gradient grammaticality from a formal perspective. The chapter by Matthew Crocker and Frank Keller, ‘Probabilistic Grammars as Models of Gradience in Language Processing’, is also concerned with the functioning of grammars of the second type, but they focus on gradience in human sentence processing, which, as the authors argue, can be understood as variation in processing difficulty (or garden path strength). Based on a number of empirical findings, for example modulation of relative clause attachment preferences via a short training period, they present an account of this phenomenon in terms of experience-based behaviour. From this perspective, the interpretation of a sentence is a function of prior relevant experience, with a ‘positive’ experience supporting a specific interpretation and suppressing alternative ones. In addition to the experimental results, the concept is motivated by other theoretical and probabilistically driven approaches in psycholinguistics and cognitive science in general. The authors also discuss the fine-grained influences and the scope of lexical and structural frequencies during the incremental interpretation of sentences.
Most importantly for the main focus of the current volume, (a) they claim that there is no straightforward relationship between the frequency of a sentence type and its acceptability and (b) this observation leads to the conclusion that a sentence-final judgement can only be derived as a result of an interaction of the frequency of experience with respect to this specific construction, more general linguistic knowledge, and cognitive constraints. The chapter ‘Degraded Acceptability and Markedness in Syntax, and the Stochastic Interpretation of Optimality Theory’ by Ralf Vogel represents a grammatical theory of the first of the two types introduced above. Vogel makes two central claims: (i) gradience in syntax is an intrinsic feature of grammar. There are cases of gradience which directly result from the interaction of the rules and constraints that make up the grammar; (ii) there is no need to import a quantitative dimension into grammar in order to model gradient grammaticality.
The example that illustrates the first claim is case conflicts in German free relative constructions (FRs). FRs without case conflict receive higher acceptability in acceptability judgement experiments and are more frequent in corpora than those with a case conflict. But the kind of conflict is also crucial: conflicting FRs in which the oblique case dative is suppressed are judged as less acceptable than those in which the structural case nominative is suppressed. Vogel demonstrates that a standard optimality theoretic grammar is already well-suited to predict these results, if one of its core features, the central role of markedness constraints, is exploited in the right way. Vogel further argues against the application of stochastic optimality theory in syntax, as proposed, for instance, in Bresnan et al. (2001). The relative frequencies of two alternative syntactic expressions in a corpus not only reflect how often one structure wins over the other but also how often the competition itself takes place, which here means how often a particular semantic content is chosen as input for an OT competition. If the influence of this latter factor is not neutralized, as in the model by Bresnan et al. (2001), then properties of the world become properties of the grammar, an unwelcome result. Vogel further provides evidence against another claim by Bresnan et al., which has become famous as the ‘stochastic generalization’: categorical contrasts in one language show up as tendencies in other languages. Typologically, FR structures are less common than semantically equivalent correlative structures. The straightforward explanation for this observation can be given in OT terms: FRs are more marked than correlatives. Nevertheless, a corpus study shows that in unproblematic cases like non-conflicting nominative FRs, FRs are much more frequent than correlatives in German.
Vogel argues that corpus frequency is biased by a stylistic preference to avoid overcorrect expressions which contain more redundant material than necessary, primarily function words. Including such a stylistic preference into an OT grammar in the form of a universal constraint would lead to incorrect typological predictions. Vogel opts for the careful use of a multiplicity of empirical methods in grammar research in order to avoid such method-induced artefacts. While these contributions highlight how syntactic principles can be made responsible for gradience, several of the other factors leading to gradience are discussed in detail in the following papers. That context and information structure are relevant for acceptability has often been noted. This aspect is addressed by Nomi Erteschik-Shir. Her ‘What’s What?’ discusses syntactic phenomena which have been argued to lead to gradient acceptability in the literature, namely the extraction of wh-phrases out of syntactic islands, as well as several instances of so-called superiority violations, where multiple
wh-phrases within one clause appear in non-default order (e.g., *What did who say?). The acceptability of the wh-extraction in ‘Who did John say/?mumble/*lisp that he had seen?’ seems to depend on the choice of the matrix verb. Previous accounts of this contrast explained it by assigning a different syntactic status to the subordinate clause depending on the verb, leading to stronger and weaker extraction islands. Erteschik-Shir shows that this analysis is unable to explain why the acceptability of the clauses improves when the offending matrix verb has been introduced in the preceding context. She argues that the possibility of extraction is dependent on the verb being unfocused. The difference between semantically light verbs like ‘say’ and heavier ones like ‘mumble’ and ‘lisp’ is that the latter are focused by default while the former is not. Erteschik-Shir develops a model of the interaction between syntax and information structure to account for this observation. The central idea in her proposal is that only the focus domain can be the source of syntactic extraction. If the main verb is focused, the subordinate clause is defocused and thus opaque for extraction. Erteschik-Shir’s account of superiority and exceptions from it (*What did who read? versus What did which boy read?) also refers to the information structural implications of these structures. Crossing movement does not induce a superiority violation if the fronted wh-phrase is discourse-linked and thus topical. Another crucial factor is that the in-situ wh-phrase is topical, which leads to focusing of the complement, out of which extraction becomes possible. Erteschik-Shir’s explanation for the degraded acceptability of these structures lies in her view of elicitation methods. Usually, such structures are presented to informants without contexts. The degraded structures rely on particular information structural conditions which are harder for the informants to accommodate than the default readings.
As this is a matter of imagination, Erteschik-Shir also predicts that informants will differ in their acceptance of the structures in question. Overall, the account of syntactic gradience offered here is processing-oriented, in the sense that it is not the grammar itself that produces gradience, but the inference of the information structural implications of an expression in the course of parsing and interpretation. In her chapter ‘Gradedness and Optionality in Mature and Developing Grammars’, Antonella Sorace argues that residual optionality, which she considers the source of gradience effects, occurs only in interface areas of the competence and not in purely syntactic domains. In that sense, her approach is quite in line with what Erteschik-Shir proposes. Sorace asks (a) whether gradedness can be modelled inside or outside of speakers’ grammatical representations, and (b) whether all interfaces between syntax and other domains of linguistics are equally susceptible to gradedness and optionality.
The underlying hypothesis for such an approach can be formulated in the following way: structures requiring the integration of syntactic knowledge and knowledge from other domains are more complex than structures requiring syntactic knowledge only. From this perspective, it can be argued that complex structures may lead to gradedness and variation in native grammars, may pose residual difficulties to near native L2 speakers, and may pose emerging difficulties to L1 speakers experiencing attrition from a second language because of their increasingly frequent failure to coordinate/integrate different types of knowledge. Sorace presents empirical support from all of these domains within the context of null-subject constructions, post-verbal subject constructions, and split-intransitivity in Italian. Based on these phenomena she assumes that, at the moment, gradedness can best be modelled at the interface between syntax and discourse, without excluding the very likely possibility that there are additional interface representations. The internal structure of the interface representations as well as their accessibility during language comprehension in native and non-native grammars is a subject for further research about gradedness and optionality. The role of processing in the sense of syntactic structure building for acceptability is the topic of the contributions by Hawkins and by Fanselow and Frisch. John Hawkins’ contribution, ‘Gradedness as Relative Efficiency in the Processing of Syntax and Semantics’, deals with the results of several corpus studies concerning the positioning of complements and adjuncts relative to the verb in English and Japanese. The two languages show clear selection preferences among competing structures which range from highly productive to unattested (despite being grammatical). Hawkins gives an explanation for the gradience in these data in terms of a principle of processing efficiency, Minimize Domains (MiD).
This principle states that the human processor prefers to minimize the connected sequences of linguistic forms and their conventionally associated syntactic and semantic properties. This preference is of variable degree, depending on the relations whose domains can be minimized in competing structures. In the paper, the MiD principle is used to explain weight effects among multiple constituents following the verb in an English clause. In general, between two NPs or PPs following the verb, it can be observed that the shorter NP/PP comes first. This preference is stronger the more the two phrases differ in size. The effect of MiD is that it tends to minimize the distance between the verb and the head of the non-adjacent phrase, thus favouring an order in which the shorter postverbal constituent precedes the longer one. In Japanese, a head-final language, we observe the reverse situation: the longer phrase precedes the shorter one,
the crucial factor being that the shorter phrase is preferred to be adjacent to the verb. An additional factor is whether (only) one of the two constituents is in a dependency relation with the verb. This factor strengthens the weight effect if the selected phrase is shorter, but weakens it if it is longer. Hawkins also suggests that MiD has shaped grammars and the evolution of grammatical conventions, according to the performance-grammar correspondence hypothesis: syntactic structures have been conventionalized in proportion to their degree of preference in performance, as evidenced by patterns of selection in corpora and by ease of processing in performance. Hawkins further argues that his account is superior to an alternative approach like stochastic Optimality Theory because it does not mix grammatical constraints with processing constraints, as a stochastic OT approach would have to do.
In their chapter ‘Effects of Processing Difficulty on Judgements of Acceptability’, Gisbert Fanselow and Stefan Frisch present experimental data highlighting an unexpected effect of processing on acceptability. Typically, it is assumed that processing difficulties reduce the acceptability of sentences. Fanselow and Frisch report the results of experiments suggesting that processing problems may make sentences appear more acceptable than they should be on the basis of their grammatical properties. This is the case when the sentence involves a local ambiguity that is initially compatible with an acceptable interpretation of the sentence material, but which is later disambiguated towards an ungrammatical interpretation. The findings support the view that acceptability judgements not only reflect the outcome of the final computation, but also intermediate processing steps.
Matthias Schlesewsky, Ina Bornkessel, and Brian McElree examine the nature of acceptability judgements from the perspective of online language comprehension in ‘Decomposing Gradience: Quantitative versus Qualitative Distinctions’. By means of three experimental methods with varying degrees of temporal resolution (speeded acceptability judgements, event-related brain potentials, and speed-accuracy trade-off), the authors track the development of speakers’ judgements over time, thereby showing that relative differences in acceptability between sentence structures stem from a multidimensional interaction between time sensitive and time insensitive factors. More specifically, the findings suggest that increased processing effort arising during the comprehension process may be reflected in acceptability decreases even when judgements are given without time pressure. In addition, the use of event-related brain potentials as a multidimensional measurement technique reveals that quantitative variations in acceptability may stem from underlying differences that are qualitative in nature. On the basis of these findings, the authors argue that gradience in linguistic judgements can only be fully described when
all component parts of the judgement process, that is, both its quantitative and its qualitative aspects, are taken into account.
What conclusions should be drawn from the insight that gradience results from domains such as processing difficulty or information structure is the topic of Eric Reuland’s chapter. In ‘Gradedness: Interpretive Dependencies and Beyond’ he defends a classic generative conception of grammar that contains only categorical rules and concepts. He identifies a number of grammar-external sources of gradience as it is frequently observed in empirical linguistic studies. The language that should be modelled by grammarians, according to Reuland, is the language of the idealized speaker/hearer of Chomsky (1965). In this Chomskyan idealization, most factors which are crucial for gradience are abstracted away from. Among such factors, Reuland identifies the non-discreteness of certain aspects of the linguistic sign, for instance the intonation contours which are used to express particular semantic and syntactic features of clauses, like focus or interrogativity. Reuland argues that it is only the means by which these features are expressed which are non-discrete, not the features themselves. But only the latter are subject to the theory of grammar. Reuland further separates differences in language, which do not exist despite the preference for one or the other expression within the (idealized) speech community, from differences in socio-cultural conventions, which may exist, but are irrelevant for the study of grammar. Nevertheless, non-discrete phenomena are expected to occur where grammar leaves open space for certain choices, for instance in the way the subcomponents of grammar interact. Another source of gradience is variation in acceptability judgements within a speech community, as dialectal or idiolectal variation, and even within speakers, using different idiolects on different occasions, or as the effect of uncertainty in a judgement task.
Apart from these extra-grammatical explanations for gradience, Reuland also sees grammar itself as a possible source of gradience. Current models of grammar include a number of subcomponents, each of which has its own rules and constraints, some perhaps violable, which interact in a non-trivial way. Any theory of language, Reuland concludes, that involves a further articulation into such subsystems is in principle well equipped to deal with ‘degrees’ of well- or ill-formedness. Reuland exemplifies his position with a comparative case study of the syntax of pronouns, mainly in Dutch and English. He shows that the syntactic properties of reflexives and pronominals depend on a number of further morphosyntactic properties of the languages in question, among which are the inventory of pronouns in the language, richness of case, the possibility of preposition stranding, the mode of reflexive marking on verbs, the organization of the syntax–semantics interface in thematic
interpretation, and pragmatics. These factors interact non-trivially; constraint violation cannot always be avoided, and thus leads to degraded acceptability. Admitting this, in Reuland’s view, in no way requires abandoning the categorical conception of grammar.
We have organized this book into four parts. The initial chapters have a certain emphasis on clarifying the nature of gradience as such, and give answers to the question of ‘What is gradience’ from the perspectives of phonology (Cohn; Frisch and Stearns), generative syntax (Reuland), psycholinguistics (Schlesewsky, Bornkessel, and McElree; Sorace) and sociolinguistics (Cornips). The following two parts address specific issues in phonology (Boersma; Albright and Hayes; Féry and Stoel) and syntax (Crocker and Keller; Hawkins; Keller; Vogel). The contributions to the final part of the book have in common that they look at a specific construction, namely long movement, from different methodological backgrounds (Erteschik-Shir; Fanselow and Frisch; Kitagawa and Fodor).
Part I The Nature of Gradience
2 Is there Gradient Phonology? ABIGAIL C. COHN
2.1 Introduction
In this chapter,1 I consider the status of gradient phonology, that is, phonological patterns best characterized in terms of continuous variables. I explore some possible ways in which gradience might exist in the phonology, considering the various aspects of phonology: contrast, phonotactics, morphophonemics, and allophony. A fuller understanding of the status of gradience in the phonology has broader implications for our understanding of the nature of the linguistic grammar in the domain of sound patterns and their physical realizations. In the introduction, I consider why there might be gradience in the phonology (Section 2.1.1). I then briefly discuss the nature of phonology versus phonetics (Section 2.1.2).
2.1.1 Is there gradient phonology?
Phonology is most basically a system of contrasts, crucial to the conveyance of linguistic meaning. This suggests that phonology is in some sense ‘categorical’. Central to most formal models of phonology is a characterization of minimally contrasting sound ‘units’ (whether in featural, segmental, or gestural terms) that form the building blocks of meaningful linguistic units. In what ways is phonology categorical, mirroring its function as defining contrast, and to what degree is phonology inherently gradient in its representation, production, perception, acquisition, social realization, and change over time?
1 A number of the ideas discussed in this chapter were developed in discussions in my graduate seminars at Cornell, Spring 2004 and Spring 2005. Some of these ideas were also presented in colloquia at the Universities of Buffalo and Cornell. Thanks to all of the participants in these fora for their insightful comments and questions. Special thanks to Mary Beckman, Jim Scobbie, and an anonymous reviewer for very helpful reviews of an earlier draft, as well as Johanna Brugman, Marc Brunelle, Ioana Chitoran, Nick Clements, Caroline Féry, Lisa Lavoie, Amanda Miller, and Draga Zec for their comments.
• The physical realization of sounds, understood (at least intuitively) as abstract units, is continuous in time and space, with the relationship between the specific acoustic cues and abstract contrasts often being difficult to identify.
• One crucial aspect of the acquisition of a sound system is understanding how phonetic differences are marshalled into defining abstract categories.
• Intraspeaker and interspeaker variation signal speaker identity, community identity, and attitude, while simultaneously conveying linguistic meaning through minimally contrasting elements.
• The results of many diachronic changes, understood to be ‘regular sound change’ in the Neogrammarian sense, are categorical, yet how do changes come about? Are the changes themselves categorical and abrupt or do the changes in progress exhibit gradience and gradual lexical diffusion?
A modular view of grammar such as that espoused by Chomsky and Halle (1968, SPE) frames our modelling of more categorical and more gradient aspects of such phenomena as belonging to distinct modules (e.g. phonology versus phonetics). While SPE-style models of sound systems have achieved tremendous results in the description and understanding of human language, strict modularity imposes divisions, since each and every pattern is defined as either X or Y (e.g. phonological or phonetic). Yet along any dimension that might have quite distinct endpoints, there is a grey area. For example, what is the status of vowel length before voiced sounds in English, bead [bi:d] versus beat [bit]? The difference is greater than that observed in many other languages (Keating 1985), but does it count as phonological? Bearing in mind how a modular approach leads to a particular interpretation of the issues, I consider the relationship between phonology and phonetics before exploring the question of gradience in phonology.
2.1.2 The nature of phonetics versus phonology
A widely held hypothesis is that phonology is the domain of abstract patterns understood to be discrete and categorical, and phonetics is the domain of the quantitative realization of those patterns in time and space. These relationships are sketched out in (2.1).
(2.1) The relationship between phonology and phonetics:
      phonology = discrete, categorical
      ≠ phonetics = continuous, gradient
For recent discussions of this consensus view, see for example Keating (1996); Cohn (1998); Ladd (2003), also individual contributions in Burton-Roberts et al. (2000) and Hume and Johnson (2001). See also Cohn (2003) for a fuller discussion of the nature of phonology and phonetics and their relationship.
For the sake of concreteness, consider an example of phonological patterns and their corresponding phonetic realization that are consistent with the correlations in (2.1). In Figure 2.1, we see representative examples of the patterns of nasal airflow in French and English (as discussed in Cohn 1990, 1993). Nasal airflow is taken here as the realization of the feature Nasal. In the case of a nasal vowel in French, here exemplified in the form daim ‘deer’ [dɛ̃] (Figure 2.1a), there is almost no nasal airflow on [d] and there is significant airflow throughout the [ɛ̃]. Here we observe plateaus corresponding to the phonological specifications, connected by a rapid transition. In English on the other hand, during a vowel preceding a nasal consonant, such as [ɛ] in den [dɛn] (Figure 2.1b), there is a gradient pattern, or a cline, following the oral [d] and preceding the nasal [n] (which are characterized by the absence and presence of nasal airflow respectively). This is quite different from the pattern of nasalization observed on the vowel in cases like sent [sɛ̃t] (Figure 2.1c), in which case the vowel is argued to be phonologically nasalized (due to the deletion of the following /n/) and we observe a plateau of nasal airflow during the vowel, similar to the pattern seen in French. The observed differences between French and English relate quite directly to the fact that French has nasal vowels, but English does not.
[Figure 2.1 panels: (a) French daim ‘deer’ /dɛ̃/; (b) English den /dɛn/; (c) English sent /sɛ̃t/. Segments are labelled −N or +N; scale bar 100 ms.]
Figure 2.1. Examples of nasal airflow in French and English following Cohn (1990, 1993)
If the correlations in (2.1) are correct, we expect to find categorical phonology, but not gradient phonology, and gradient, but not categorical, phonetics. Recent work calls into question this conclusion. In particular, it is evidence suggesting that there is gradience in phonology that has led some to question whether phonetics and phonology are distinct. Pierrehumbert et al. (2000) state the question in the following way:
this assertion [that the relationship of quantitative to qualitative knowledge is modular] is problematic because it forces us to draw the line somewhere between the two modules. Unfortunately there is no place that the line can be cogently drawn. . . . In short, knowledge of sound structure appears to be spread along a continuum. Fine-grained knowledge of continuous variation tends to lie at the phonetic end. Knowledge of lexical contrasts and alternations tend to be more granular. (Pierrehumbert et al. 2000: 287)
Let us consider the background of this issue in a bit more depth. Growing out of Pierrehumbert’s (1980) study of English intonation, gradient phonetic patterns are understood as resulting from phonetic implementation, through a mapping of categorical elements to continuous events. Under the particular view developed there, termed generative phonetics, these gradient patterns are the result of interpolation through phonologically unspecified domains. Keating (1988) and Cohn (1990) extend this approach to the segmental domain, arguing that phenomena such as long distance pharyngealization and nasalization can be understood in these terms as well. For example, the cline in nasal airflow seen in the vowel [ɛ] in [dɛn] in Figure 2.1b is interpreted as resulting from phonetic interpolation through a phonologically unspecified span. The phonology, then, is understood as the domain of discrete, qualitative patterns and the phonetics as the domain of the continuous, quantitative realization of those patterns. Intrinsic to this view is the idea that lexical entries and phonological patterns are represented in terms of distinctive features, taken to be abstract properties, albeit defined phonetically. These are then interpreted in a phonetic component, distinct from the phonological one. I refer to this as a mapping approach.
A modular mapping approach has been the dominant approach to the phonology–phonetics interface since the 1980s and has greatly advanced our understanding of phonological patterns and their realization. Such results are seen most concretely in the success of many speech-synthesis-by-rule systems in their modelling of both segmental and suprasegmental properties of sound systems. (See Klatt 1987 for a review.)
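The contrast between a plateau and a cline under the mapping approach can be illustrated with a small sketch. The Python below is an illustration only, not an implementation taken from Pierrehumbert, Keating, or Cohn. The function name `realize_nasal_airflow`, the coding of −N and +N as 0.0 and 1.0, the fixed number of samples per segment, and the choice of linear interpolation are all assumptions made for the example. Segments with a phonological specification are held at their target value, and unspecified segments are filled in by interpolating between the nearest specified neighbours.

```python
from typing import List, Optional, Tuple

# A segment is (label, nasal target): 0.0 for -N, 1.0 for +N,
# None for "phonologically unspecified". This coding is an assumption.
Segment = Tuple[str, Optional[float]]


def realize_nasal_airflow(segments: List[Segment],
                          samples_per_segment: int = 4) -> List[float]:
    """Map categorical nasal specifications to a continuous trace.

    Specified segments surface as plateaus at their target value.
    Runs of unspecified segments are linearly interpolated between the
    specified values on either side (hypothetical simplification).
    """
    targets = [target for _, target in segments]
    if all(t is None for t in targets):
        raise ValueError("at least one segment must be specified")

    trace: List[float] = []
    i = 0
    while i < len(targets):
        if targets[i] is not None:
            # Plateau: hold the specified value for the whole segment.
            trace.extend([targets[i]] * samples_per_segment)
            i += 1
            continue
        # Find the end of the run of unspecified segments.
        j = i
        while j < len(targets) and targets[j] is None:
            j += 1
        # Use the nearest specified neighbours; copy one side at an edge.
        left = targets[i - 1] if i > 0 else targets[j]
        right = targets[j] if j < len(targets) else left
        n = (j - i) * samples_per_segment
        for k in range(1, n + 1):
            # Cline: move gradually from the left value to the right value.
            trace.append(left + (right - left) * k / (n + 1))
        i = j
    return trace


# English "den": the vowel is unspecified, so it surfaces as a cline.
den = realize_nasal_airflow([("d", 0.0), ("ɛ", None), ("n", 1.0)])

# English "sent": the vowel is specified +N, so it surfaces as a plateau.
sent = realize_nasal_airflow([("s", 0.0), ("ɛ̃", 1.0), ("t", 0.0)])
```

Under these assumptions, the vowel samples in `den` rise steadily between the oral [d] and the nasal [n], while the vowel samples in `sent` stay at the +N value throughout. The rapid transitions between adjacent specified segments are idealized here as instantaneous steps.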
An alternative to the types of approaches that assume that phonology and phonetics are distinct and that there is a mapping between these two modules or domains is offered by approaches that assume that phonology and phonetics are one and the same thing, understood and modelled with the same formal mechanisms, what I term a unidimensional approach. A seminal approach in this regard is the theory of Articulatory Phonology, developed by Browman and Goldstein (1992 and work cited therein), where it is argued that the domains that are often understood as phonology and phonetics respectively can both be modelled with the same formalisms as constellations of gestures. Under this view, phonetics and phonology are not distinct and the apparent differences might arise through certain (never explicitly specified) constraints on the phonology. This gestural approach has served as fertile ground for advancing our understanding of phonology as resulting at least in part from gestural coordination. However, there are criticisms of this approach as a comprehensive theory of phonology, including arguments that Articulatory Phonology greatly overgenerates possible patterns of contrast. (See commentaries by Clements 1992 and Steriade 1990.)
More recently, there is a significant group of researchers (e.g. Flemming 2001; Kirchner 2001; Steriade 2001; see also Hayes et al. 2004) working within constraint-based frameworks, pursuing the view that there is not a distinction between constraints that manipulate phonological categories and those that determine fine details of the representation. This then is another type of unidimensional approach that assumes no formally distinct representations or mechanisms for phonology and phonetics. One type of argument in favour of this approach is that it offers a direct account of naturalness in phonology. However, the strength of this argument depends on one’s view about the source(s) of naturalness in language.
(See Blevins 2004 for extensive discussion of this issue.) Such unidimensional views of phonology and phonetics also need to offer an account of not only what I term ‘phonetics in phonology’, but also of the relationship between phonological units and physical realities, that is, ‘phonology in phonetics’. (See Cohn 2003 for a discussion of the distinct ways that phonology and phonetics interact with each other.) Independent of the account of naturalness, the question of whether one can adequately model the way that the phonetics acts on phonology still remains. Both Zsiga (2000) and Cohn (1998) have argued that such unidimensional approaches do not offer an adequate account. As documented by Cohn (1998), this is commonly seen in ‘phonetic doublets’, cases where similar but distinct effects of both a categorical and gradient nature are observed in the same language. These sorts of effects can be seen in the case of nasalization discussed above. In French, in
the realization of contrastive nasal vowels, there is nasal airflow resulting from the contrast and also from coarticulatory patterns, seen, for example, in the transition between oral vowels and nasal consonants. Both aspects need to be modelled. In the case of contextual nasalization in English, there are both long distance and more local effects seen in the physical patterns of nasal airflow that need to be accounted for.
The question of whether phonology and phonetics should be understood as distinct modules needs to be approached as an empirical question. What sort of approach gives us the best fit for the range of more categorical versus more gradient phenomena? There are clearly some grey areas, notably gradient phonology. Yet it is important to realize that just because it is difficult to know exactly where to draw the line (cf. Pierrehumbert et al. 2000), this does not mean there are not two separate domains of sound structure. The fact that it is difficult to draw a line follows in part from the conception of phonologization (Hyman 1976), whereby over time low-level phonetic details are enhanced to become phonological patterns. Phonologization by its very nature may result in indeterminate cases. As phonetic details are being enhanced, it will be difficult at certain stages to say that a particular pattern is ‘phonetic’ while another is ‘phonological’. It has been suggested, for example, that vowel lengthening before voiced sounds in English is currently in this in-between state. The difficulty of drawing a line also relates to the sense in which categoriality can only be understood in both rather abstract and language-specific terms. Recent work suggests that phonology and phonetics are not the same thing, but that the distinction might be more porous than assumed following strict modularity (e.g. Pierrehumbert 2002 and Scobbie 2004).
Pierrehumbert (2002: 103) states: ‘categorical aspects of phonological competence are embedded in less categorical aspects, rather than modularized in a conventional fashion.’ We return below to the nature of the relationship between phonology and phonetics, as the status of gradient phonology plays a crucial role in this question. In order to investigate gradience in phonology, we need a clearer understanding of what we mean by gradience and we need to consider how it might be manifested in different aspects of the phonology. I turn to these questions in the next section.
2.2 Aspects of gradience
Most basically, we understand gradient and gradience in opposition to categorical and categoriality. A gradient (n.) in its original sense is a mapping
from one continuous variable to another, that is, a slope. (In linguistic usage, we use the form gradience as a noun and gradient as an adjective.) It has also shifted to mean the continuous nature of a single variable.2 Thus we need to be clear on which sense of gradient we are talking about. Discrete is often equated with categorical and continuous with gradient (although there may be gradient patterns that are discrete). We need to consider both the question of what is gradient, as well as what is continuous. The terms gradient and gradience have been used in a number of different ways in the recent phonetic and phonological literature. To think more systematically about the nature of gradience in phonology, we need to tease apart these different usages (Section 2.2.1) before considering how these senses might apply to different aspects of what is understood to be phonology, that is, contrast (Section 2.2.2), phonotactics (Section 2.2.3), and alternations, both morphophonemics (Section 2.2.4) and allophony (Section 2.2.5).
2.2.1 Different uses of the term gradience
When we talk about sound patterns, there are at least three senses of gradience that have been prevalent in the recent literature: temporal/spatial gradience, variability, and gradient well-formedness.3
2.2.1.1 Temporal/spatial gradience
In work on the phonetic implementation of phonology, gradient/gradience is used in the sense of change in phonetic space through time. This is the sense of gradient versus categorical seen in the example of the realization of nasalization shown in Figure 2.1. In this case, there is a change in the amount of nasal airflow (space) through time, characterized as being a cline, distinct from more plateau-like cases (argued to obtain in cases of contrast). This is what I take to be the primary sense of gradience versus categoriality as it applies to the domain of sound patterns and their realization.
2.2.1.2 Variability
The term gradience is also often used to refer to variable realizations or outcomes of sound patterns, understood as unpredictable or as stemming from various sociolinguistic and performance factors. We might understand this as gradience in the sense of gradience across tokens.
2 Thanks to Mary Beckman (p.c.) for clarifying this question of usage.
3 There is an additional use of the term gradient in the recent phonological literature. Within Optimality Theory, gradient has also been used to refer to constraint satisfaction (e.g. McCarthy and Prince 1993; McCarthy 2003), where more violations of a particular constraint are worse than a single violation. This is different from the other senses discussed here and will not come into play in the present discussion.
Variability is sometimes understood in phonological terms as optional rule application, or freely ranked constraints, or as ‘co-phonologies’. These patterns have sometimes been modelled in statistical or stochastic terms. There are also approaches that model these factors directly as sociolinguistic or stylistic markers. (See Anttila 2002 and Coetzee 2004 for discussion of recent approaches to modelling phonological variation.)
Both variability and gradience in phonetic implementation are pervasive in phonetic patterns and both must ultimately be understood for a full understanding of phonology and its realization. What we sometimes interpret as variability may in fact result from methodological approaches that are not fine-tuned enough in their characterization of conditioning factors or prosodic context. For example, the realization of the contrast between so-called ‘voiced’ and ‘voiceless’ stops in English is highly dependent on segmental context, position in the word, position in the utterance, location relative to stress, etc. The nature of contrast may also vary systematically by speaker (Scobbie 2004). If these factors are not taken into consideration, one would conclude that there is enormous variability in the realization of these contrasts, while in fact much of the variation is systematic.
It is not necessarily the case that temporal/spatial gradience and variability go hand in hand. In fact, there are well documented cases where they do not, that is, cases of variability that involve quite distinct categorical realizations. For example, this is the case with the allophones of /t/ and /d/ in English as documented by Zue and Laferriere (1979). There are also patterns of temporal/spatial gradience that are highly systematic, as numerous studies of coarticulation and phonetic implementation show. These issues are also closely related to the question of sources of diachronic change and the issue of whether change is gradual.
The nature of variation as it is manifested in the social system and its relationship to diachronic change are very important issues, but not ones that I pursue here. (See work by Labov, Scobbie, Bybee, Kiparsky for recent discussions.)
2.2.1.3 Gradient well-formedness
There is gradience across the lexicon, or statistical knowledge, as documented in recent work by Pierrehumbert, Frisch, and others. (See Frisch 2000 for a review and Bod et al. 2003 for recent discussion.) Here we talk about gradient well-formedness, the idea that speaker/hearers make relative judgements about the well-formedness of various sound structures. In the case of phonotactics, this is understood as resulting from stochastic generalizations across the lexicon. Such gradient well-formedness judgements are observed in other aspects of
the phonology, as well as other domains including both morphology and syntax. (See other chapters, this volume.) In such cases, it is the judgement about well-formedness or grammaticality that is gradient, not a physical event in time and space such as in the first sense. We turn now to the question of how gradience might be manifested in the different facets of phonology, focusing primarily on temporal/spatial gradience and gradient well-formedness.
2.2.2 Contrast
Fundamental to a phonological system is the idea of lexical contrast: some phonetic differences in the acoustic signal result in two distinct lexical items, that is, minimal pairs. This is also the basis upon which inventories of sounds are defined. The term contrast is used in two rather different senses: underlying or lexical contrast, and surface contrast, that is, identifiable phonetic differences independent of meaning. The question of surface contrast sometimes arises when comparisons are made between phonological categories across languages. It also often arises in the discussion of phonological alternations that affect lexical contrasts in terms of neutralization or near-neutralization. Cases of complete phonological neutralization should result in no cues to underlying differences or contrast. Yet many cases of what are claimed to be complete neutralization exhibit subtle phonetic cues that differentiate between surface forms. (For a recent discussion and review of such cases involving final devoicing, see Warner et al. 2004.) Under one interpretation, such cases can be understood as gradient realization of contrast. Due to space limitations, I do not pursue the issue of near-neutralization here.
We might wonder if contrast is all or nothing, or whether it too might be gradient in the sense of exhibiting gradient well-formedness. Within generative grammar, we understand contrast in absolute terms. Two sounds are either in contrast or they are not. Many contrasts are very robust.
Yet, contrast can also be much more specific or limited. (See Ladd 2003 for a discussion of some such cases.) There are certain sounds that contrast in some positions, but not others (that is, positional neutralization). For example, even for speakers who maintain an /a/ – /ɔ/ contrast in American English, this contrast holds only before coronals and in open syllables. What is the nature of realization of these sounds before non-coronals? Do speakers produce the ‘same’ vowel in fog and frog? There are also some sounds that contrast in all positions in the word, but where the functional load of the contrast is very limited, such as in the case of /θ/ versus /ð/ in English (thigh vs. thy, ether vs. either, Beth vs. eth, that is [ð]). Is contrast realized the same way in these cases
as in the more robust cases? Or should contrast also be understood as a gradient property? I will not pursue this question here, but it might well be that contrast is more gradient in nature than often assumed and so robustness of contrast might well prove to be an interesting area for investigation. Lexical neighbourhood effects as well as phonological conditioning might both come into play.
2.2.3 Phonotactics
A second aspect of sound systems widely understood to constitute part of phonology is allowable sound combinations or sequences, that is, phonotactics. Some aspects of phonotactics appear to be defined by segmental context, especially immediately preceding and following elements; some aspects are defined by prosodic position, often best characterized in terms of syllable structure; and some aspects are defined by morpheme- or word-position. Under many approaches to phonology, phonotactic patterns are understood to be categorical in nature. Particular combinations of sounds are understood to be either well-formed or ill-formed. Following most generative approaches to phonology, both rule-based and constraint-based, phonotactic patterns are captured with the same formal mechanisms as phonological alternations. Typically, phonotactic and allophonic patterns closely parallel each other, providing the motivation for such unified treatments. It is argued that distinct treatments would result in a ‘duplication’ problem (e.g. Kenstowicz and Kisseberth 1977).
Recent work by a wide range of scholars (e.g. Pierrehumbert 1994, Vitevitch et al. 1997, Frisch 2000, Bybee 2001, and Hay et al. 2003) suggests that phonotactic patterns can be gradient, in the sense that they do not always hold 100 per cent of the time. Phonotactic patterns may reflect the stochastic nature of the lexicon and speaker/hearers are able to make judgements about the relative well-formedness of phonotactic patterns.
As an example, consider the phonotactics of medial English clusters, as analysed by Pierrehumbert (1994). Pierrehumbert asks how we can account for the distribution of medial clusters, that is, the fact that certain consonant sequences are well-formed but others are not, for example /mpr/, /ndr/ but not */rpm/ or */rdn/. A generative phonology approach predicts: medial clusters = possible codas + possible onsets. A stochastic syllable grammar makes different predictions: ‘the likelihood of medial clusters derived from the independent likelihoods of the component codas and onsets’ (1994: 174) and ‘The combination of a low-frequency coda and a low-frequency onset is expected to be a low-frequency occurrence’ (1994: 169). Pierrehumbert carried out a systematic analysis of a dictionary and found
roughly fifty monomorphemic medial clusters. In the same dictionary, there were 147 possible codas and 129 possible onsets. If these were freely combining, there would be predicted to be 18,963 medial clusters. With some expected restrictions, Pierrehumbert concludes that we would still expect approximately 8,708. Pierrehumbert observes: ‘It turned out that almost all the occurring triconsonantal clusters were among the 200 most likely combinations, and that a stochastic interpretation of syllable grammar effectively ruled out a huge number of possible clusters, eliminating the need for many idiosyncratic constraints in the grammar’ (1994: 169). Pierrehumbert then discusses the systematic restrictions that play a role in determining the particular fifty or so medial combinations that are attested among the 200 most likely. She concludes that a stochastic syllable grammar understood in the context of certain more traditional sorts of phonological constraints accounts for the observed patterns.

Recent work in psycholinguistics shows that speakers have access in at least some situations to very fine details including both speaker-specific and situation-specific information. (See Beckman 2003 and Pierrehumbert 2003 for reviews and discussion of this body of work.) Thus, it is not that surprising that speakers are sensitive to degrees of well-formedness in phonotactic patterns and that these parallel in some cases distributions in the lexicon. This leads us to two important issues. First, are phonotactic patterns and other aspects of phonology (contrast, morphophonemics, and allophony) as closely associated with each other as has been assumed in the generative phonological literature? Perhaps while similar and in some cases overlapping, phonotactics and other aspects of phonological patterning are not necessarily the same thing.
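The contrast drawn above between free combination and a stochastic syllable grammar can be sketched numerically. The following is a minimal illustration, not Pierrehumbert's actual procedure: the function names and the toy probabilities are assumptions introduced here, and only the counts 147 and 129 come from the text.

```python
from itertools import product


def free_combinations(n_codas, n_onsets):
    """Number of medial clusters if every coda could precede every onset."""
    return n_codas * n_onsets


def rank_clusters(coda_probs, onset_probs):
    """Score each (coda, onset) pair by the product of its independent
    probabilities and return the pairs from most to least likely."""
    scored = [
        ((coda, onset), p_coda * p_onset)
        for (coda, p_coda), (onset, p_onset)
        in product(coda_probs.items(), onset_probs.items())
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Counts reported in the text: 147 possible codas, 129 possible onsets.
print(free_combinations(147, 129))  # 18963

# Hypothetical toy probabilities, for illustration only (not dictionary data).
toy_codas = {"m": 0.5, "n": 0.4, "r": 0.1}
toy_onsets = {"pr": 0.6, "dr": 0.3, "sn": 0.1}
for (coda, onset), prob in rank_clusters(toy_codas, toy_onsets)[:3]:
    print(coda + onset, round(prob, 2))
```

On this view a cluster built from a rare coda and a rare onset receives a very low score without any cluster-specific constraint, which is the sense in which the stochastic interpretation is said to rule out a large number of possible clusters.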
This suggests that the standard generative phonology approach is reductionist in that it collapses distributional generalizations across the lexicon with other aspects of what is understood to be phonology. Second, evidence suggests that we have access to finer details in at least some situations/tasks and some of these finer details may play a role in characterizing lexical entries. Thus, it cannot be, as is often assumed following theories of underspecification in generative phonology, that lexical representations consist only of highly sparse contrastive information (e.g. pit /pit/, spit /spit/). We will not reach insightful conclusions about the nature of phonology if we just assume that lexical representations capture only contrast. These two widely held assumptions of generative phonology need to be revisited. However, there are two important caveats on the other side. Just because we are sensitive to finer details, does not mean that we cannot abstract across the lexicon. To assume that we do not is to fall prey to this duplication problem from the other side. Pierrehumbert (2003) argues that some phonotactic
knowledge is not tied to frequency and indeed is true abstraction across the lexicon, that is, there is phonological knowledge independent of statistical generalizations across the lexicon. ‘In light of such results, I will assume, following mainstream thought in linguistics, that an abstract phonological level is to be distinguished from the lexicon proper.’ (2003: 191). This suggests that we have access to both fine-grained and coarse-grained levels of knowledge and that they co-exist (see Beckman 2003 and Beckman et al. 2004). We would predict a (negative) correlation between the degree of gradience and the level of abstraction.

2.2.4 Alternations (morphophonemics)
In many ways, the core phenomena understood to constitute phonology are alternations. The most canonical types are morphophonemic alternations, where the surface form of a morpheme is systematically conditioned by phonological context. Alternation is also used to refer to allophonic alternation where particular phones are in complementary distribution and are thus argued to be variants of the same underlying phoneme. Positional allophones are argued to alternate in their distribution based on phonological context. We consider morphophonemic alternations in this subsection and allophony in Section 2.2.5. Assuming we can draw appropriate boundaries (delineating the cases that are phonologically conditioned, productive, and not morpheme-specific), morphophonemic alternations are at the very core of what most phonologists think of as phonology. Most alternations are understood to be quite categorical in nature, often involving the substitution of distinct sounds in particular environments. Following a Lexical Phonology approach (e.g. Kiparsky 1982), such alternations are understood to be part of the lexical phonology and are assumed to respect structure preservation.
If these sorts of cases are shown to involve gradience, this would strike at the core of our understanding of the phonology, since these are the least disputable candidates for ‘being phonology’. A widely cited claim arguing for gradience in phonology is that made by Steriade (2000). Parallel to phonological paradigm uniformity effects, which are taken to account for some ‘cyclic’ effects (e.g. Benua 1998; Kenstowicz 2002), Steriade argues that there are phonetic paradigm uniformity effects, where non-contrastive phonetic details may be marshalled to indicate morphological relatedness. Consider first a canonical example of so-called paradigm uniformity effects. Many languages show overapplication or underapplication of phonological patterns that result in phonological similarity within morphologically related forms, despite the absence of the relevant phonological conditioning
context. For example, in Sundanese there is a general pattern of vowel nasalization, whereby vowels become nasalized after a nasal consonant, unless blocked by a non-nasal supra-laryngeal consonant (Robins 1957). This is exemplified in (2.2a). There is overapplication of nasalization in infixed forms indicating plurality or distributedness (2.2b).

(2.2) Nasalization in Sundanese (Cohn 1990)
a. /ɲiar/ [ɲĩãr] ‘seek’ (active)
   /niis/ [nĩʔĩs] ‘relax in a cool place’ (active)
   /ŋatur/ [ŋãtur] ‘arrange’ (active)
   /ŋuliat/ [ŋũliat] ‘stretch’ (active)
b. Singular                           Plural
   /ɲiar/ [ɲĩãr] ‘seek’ (active)      /ɲ=al=iar/ [ɲãlĩãr]
   /niis/ [nĩʔĩs] ‘relax’ (active)    /n=ar=iis/ [nãrĩʔĩs]

In derivational approaches, this overapplication follows from a cyclic analysis, where vowel nasalization reapplies after infixation (e.g. Cohn 1990). However, such a solution is not available in non-serial approaches such as most Optimality Theoretic approaches. One account within Optimality Theory is that such patterns result from Output–Output constraints, reflecting the morphological relationships between words (Benua 1998). Such phonological parallels are enforced by paradigm uniformity. Steriade (2000) argues that not only phonological properties (those that are potentially contrastive) show such effects but that ‘paradigmatic uniformity is enforced through conditions that govern both phonological features and properties presently classified as phonetic detail, such as non-contrastive degrees in the duration of consonant constrictions, non-contrastive details in the implementation of the voicing contrast, and degree of gestural overlap.’ (2000: 314). She then goes on to say that ‘There is a larger agenda behind this argument: the distinction between phonetic and phonological features is not conducive to progress and cannot be coherently enforced.’ (2000: 314) This very strong claim rests on two cases.
The first case is schwa deletion in French, where paradigm uniformity is argued to be responsible for the subtle differences between forms such as pas d’role ‘no role’ and pas drôle ‘not funny’, where the syllable-initial character of [ʁ] is maintained in the first case, despite the deletion of schwa. The second is flapping in American English: the observation (made by Withgott 1983 and others) that in some cases where the phonological environment is met for flapping, flapping does not occur is argued to be due to subphonemic paradigm uniformity, for example cápiDalist : cápiDal, but mìlitarístic : mílitary.
In Steriade’s argument concerning flapping there are two crucial assumptions. First, ‘we suggest that PU [paradigm uniformity] (STRESS) should characterize not only stress identity between syllables but also the use of individual stress correlates (such as duration, pitch accents, vowel quality) to flag the stress profile of the lexical item to be accessed.’ (2000: 321). In effect, what Steriade hopes to conclude, that non-contrastive details can drive paradigm uniformity, becomes a working assumption, making the argument circular. Second, ‘The difference between [ɾ] and [t]/[d] is a function of closure duration . . . . The extra-short duration of [ɾ] is a candidate for a never-contrastive property’ (2000: 322). In fact, there are a number of other candidates for the difference between flap and [d/t], some of which are contrastive properties (such as sonority). Steriade conducted an acoustic study of twelve speakers uttering one repetition each of several pairs of words, with judgements based on impressionistic listening (which turns out to be rather unreliable in identifying flapping). Based on the results of the study, she concludes that PU (stress: duration) is responsible for observed base–derivative correspondence. However, a recent experiment designed to replicate Steriade’s finding by Riehl (2003a, 2003b) calls into question Steriade’s (2000) conclusions about the nature of the paradigm uniformity effect. Riehl recorded six speakers, with twelve repetitions of each form, using similar pairs to those in Steriade’s study. She undertook an acoustic analysis of the data (including measures of closure duration, VOT, presence or absence of burst, and voicing duration during closure) and also a systematic perceptual classification by three listeners, in order to compare the perception and consistency of perception with the acoustic realization.
In Riehl’s data, there were four relevant pairs of forms where phonologically one might expect a flap in one case and a stop in the other, as in the examples studied by Steriade. There were 24 possible cases of paradigm uniformity (4 forms × 6 speakers) where 12/12 forms could have shown both flaps or both stops. Since there was quite a bit of variation in the data, Riehl counted either 12/12 or 11/12 cases with the same allophone as showing ‘uniformity’. Out of the 24 cases, there were 7 that showed uniformity or near uniformity and 17 with variation within forms and within pairs. Thus the case for paradigm uniformity was weak at best. In cases of variation, stops were usually produced earlier in the recordings, flaps later, arguably showing a shift from more formal to more casual speech (highlighting the importance of looking at multiple repetitions). Moreover, Riehl found that the coding of particular tokens as flaps or stops was not as straightforward as often assumed and she found that the perception of flaps correlated best with VOT, not closure duration.
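The counting criterion just described, 12/12 or 11/12 tokens with the same allophone, can be made explicit. This is a minimal sketch under stated assumptions: the function names, the labels ‘flap’ and ‘stop’, and the sample token lists are hypothetical and are not Riehl's data or code; the 7 out of 24 reported above corresponds to roughly 29 per cent of cases.

```python
from collections import Counter


def is_uniform(labels, min_same=11):
    """True if at least `min_same` tokens carry the same allophone label.

    `labels` is a non-empty list such as ["flap", "stop", ...], one entry
    per repetition (twelve per case in the design described above).
    """
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count >= min_same


def count_uniform(cases, min_same=11):
    """Count the cases (one per speaker and form) that meet the criterion."""
    return sum(is_uniform(labels, min_same) for labels in cases.values())


# Hypothetical token codings for three speaker-by-form cases.
sample_cases = {
    ("speaker1", "pairA"): ["flap"] * 12,                # fully uniform
    ("speaker2", "pairA"): ["flap"] * 11 + ["stop"],     # near uniform
    ("speaker3", "pairA"): ["stop"] * 5 + ["flap"] * 7,  # variable
}
print(count_uniform(sample_cases))  # 2
```

The threshold is a parameter so that the effect of a stricter (12/12 only) or looser criterion on the uniformity count can be checked directly.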
This does not mean that there is no morphological influence on flapping, but suggests that the pattern may not be that strong. There is also a lack of compelling evidence to show that these effects are best understood as subphonemic paradigm uniformity. Steriade’s conclusions regarding French schwa are also not that secure. It is not clear whether these effects are really what we understand to be paradigm uniformity; rather, this interpretation seems to be driven by Steriade’s assumption that phonology and phonetics are not distinct. (Barnes and Kavitskaya 2002 also question Steriade’s conclusions in the case of schwa deletion in French.) Does this mean that there are not gradient effects in the domain of morphophonemics? A more convincing case of morphological influences on phonetic realization may be the degree of glottalization in English correlating with degree of morphological decompositionality, for example, realign versus realize as discussed by Pierrehumbert (2002). ‘The model predicts in particular the existence of cases in which relationship of phonetic outcomes to morphological relatedness is gradient.’ (2002: 132) The question is how close the correlation between morphological decompositionality and phonetic realization is and how best to model this correlation. I fully agree with Pierrehumbert that ‘More large-scale experiments are needed to evaluate this prediction.’ (2002: 132)

2.2.5 Allophony
The final aspect of phonology is allophony. Based on the definitions of SPE, allophony is understood to be part of phonology, due to its language-specific nature. There has been much discussion in the literature about whether allophony is necessarily categorical in nature or whether there are gradient aspects of allophony. There are also many cases of what was understood as allophony in categorical terms that have been shown, based on instrumental studies, to be gradient.
This is the case for anticipatory nasalization in English discussed in Cohn (1990, 1993) and for velarization of [l] in English as discussed by Sproat and Fujimura (1993). Such cases raise three issues.
1. Based on impressionistic description, much work on allophony suggests that allophony is quite categorical in nature. Yet, both the tools we use (careful listening) and the symbols available to us (phonetic transcription which is discrete in nature) bias our understanding of these patterns as categorical.
2. There has been a wide body of work arguing for a rethinking of the SPE definition of what is phonology and what is phonetics. Much work has identified the language-specific nature of phonetic patterns (e.g. Chen 1970; Keating 1985; Cohn 1990; Kingston and Diehl 1994), leading to a rethinking of where we draw the boundary between phonetics and phonology. Under these approaches many cases that have been thought of as phonological have been reanalysed as phonetic.
3. This still leaves us with the question of where to draw the line and whether we should draw a line. We return to this question in Section 2.3.
It has also been argued that there is gradience in allophony in a rather different sense. Bybee (2001) and Jurafsky et al. (2001) among others argue that lexical (token) frequency affects allophony in the sense that more frequent words are observed to be shorter and phonologically reduced. Bybee (2001 and earlier work) has argued that what is understood as allophony in generative phonology cannot follow from general rules or constraints, because there are frequency effects on the realization of non-contrastive properties. If what we think is allophony falls along a continuum rather than in two or three discrete categories, and if there is a strong correlation between the realization of a particular non-contrastive property and frequencies of particular lexical items in the lexicon, then this would be difficult to model in standard generative phonological models. One widely cited case in this regard is the case of schwa deletion in the context of a resonant in English. Bybee (2001) citing an earlier study based on speaker self-characterization (Hooper 1976, 1978, a.k.a. Bybee) argues that it is not the case that schwa is either deleted or present, but rather that there are degrees of shortening. She observes impressionistically a continuum from [Ø], to syllabic resonants, to schwa + resonant, for example every [Ø], memory [r̩], mammary [ər] (where a syllabic resonant is thought to be shorter than a schwa plus resonant). It is further argued that these different realizations correlate with the lexical (token) frequency, that is, there is complete deletion in the most common forms, schwa plus a resonant in the least frequent forms, and syllabic resonants in the cases which fall in between. This is understood to follow from the view that sound change is lexically and phonetically gradual, so that ‘schwa deletion’ is farther along in high frequency words. Lavoie (1996) tried to replicate Bybee’s finding with a more systematic study including instrumental data.
Her study included acoustic measurements of multiple repetitions of near-minimal triplets, sets that were similar in their phonological structure and differed in relative frequency both within the sets and in terms of absolute frequency across the data set, based on frequency from Francis et al. (1982). Crucially, when frequency was plotted against duration, no correlation was found. Rather, there was a robust subpattern of complete deletion of schwa in many forms independent of
lexical frequency and there was variation in duration independent of lexical frequency. Thus schwa deletion in English does not provide the kind of evidence that Bybee suggests for allophony being driven by lexical token frequency. (The other cases widely discussed by Bybee in this regard, such as aspiration of /s/ in Spanish and /t, d/ deletion in English also warrant careful reconsideration.) We need to consider the question of whether there are cases where gradient phonological patterns correlate with lexical (token) frequency. The short answer is yes, but the best documented cases in this regard are of a very different sort than those mentioned above. When token frequency differences correlate with function versus content word differences, indeed frequency in such cases has a major effect on the realization of sound patterns. Function words show much more reduced and variable realization than content words. See for example Lavoie’s (2002) study of the reduction and phonetic realization of for versus four and Jurafsky et al.’s (2001) study of both reduction and variability in function words. Bybee assumes that it is token frequency that differentiates function words and content words, yet these frequency effects can also be understood to follow from the prosodic difference between content and function words. (For recent discussion of the prosodic structure of lexical versus functional categories, see e.g. Zec 2002.) Unequivocal support for Bybee’s claim would come from duration differences strongly correlated with token frequency differences found within the same lexical category, with appropriate controls for discourse context, priming effects, and so forth. Cohn et al. (2005) investigate the phonetic durations of heterographic pairs of homophonous English nouns that differ in token frequency.
Homophonous pairs were grouped into three categories based on the magnitude of the frequency difference between the members of each pair, as determined by relative frequencies in five large corpora. This included Large Difference pairs (e.g. time/thyme, way/whey), Medium Difference pairs (e.g. pain/pane, gate/gait), and little or No Difference pairs (e.g. son/sun, peace/piece). Four native speakers of American English participated in two experiments. In the first experiment, the speakers were recorded reading four repetitions of a randomized list of the target words in a frame sentence. In the second experiment, a subset of these words was read in composed sentences with controlled prosodic structures. The phonetic duration of each target word was then measured in Praat, and the ratio more frequent/less frequent was calculated for each repetition of each pair. If the hypothesis that greater frequency leads to shorter duration is correct, then these ratios should systematically fall below one for the Large Difference and Medium Difference pairs, while those for the little or No Difference group should be approximately
one. No systematic differences were found for individual speakers or across speakers in either the frame sentences or the composed sentences. The lack of any systematic correlation between duration and token frequency calls into question the hypothesis that greater frequency leads to shorter duration. These results are interesting in light of Jurafsky’s (2003) observation that evidence for frequency effects is better documented in comprehension than production. On the production side, effects are much more robustly documented for latency in lexical access than in phonetic duration differences. These results and observations highlight the need for a better understanding of the locus of frequency effects in the lexicon and in speech production.
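The ratio measure used in the homophone study can be written out as a short calculation. The sketch below is illustrative only: the function names and the durations are hypothetical, not values or code from Cohn et al. (2005), and it assumes one duration in milliseconds per pair member per repetition.

```python
from statistics import mean


def duration_ratios(more_frequent_ms, less_frequent_ms):
    """Per-repetition ratio: duration of the more frequent member divided
    by the duration of the less frequent member of a homophone pair."""
    if len(more_frequent_ms) != len(less_frequent_ms):
        raise ValueError("need one duration per pair member per repetition")
    return [more / less for more, less in zip(more_frequent_ms, less_frequent_ms)]


def mean_ratio(more_frequent_ms, less_frequent_ms):
    """Mean ratio over repetitions; values clearly below 1 would support
    the hypothesis that greater frequency leads to shorter duration."""
    return mean(duration_ratios(more_frequent_ms, less_frequent_ms))


# Hypothetical durations (ms) for four repetitions of one Large Difference
# pair such as time/thyme; these are NOT measurements from the study.
time_ms = [310.0, 295.0, 305.0, 300.0]
thyme_ms = [305.0, 300.0, 298.0, 302.0]
print(round(mean_ratio(time_ms, thyme_ms), 3))
```

A mean ratio close to one for pairs with a large frequency difference, as in these invented numbers, is the pattern the text reports: no systematic shortening of the more frequent member.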
2.3 Conclusions and implications
Having considered the evidence for three cases of gradience in the phonology in Sections 2.2.3–5, we now return to the broader question: Is there gradience in the phonology? Not surprisingly, the answer seems to be yes and no. It depends on what we mean by gradience and it depends on which facets of the phonology we consider. The clearest evidence for gradience among the cases that we considered is gradient well-formedness, as documented in the case of phonotactics. It was less clear that there was a convincing empirical basis for the specific claims made by Steriade in terms of subphonemic effects in paradigm uniformity and those made by Bybee regarding frequency effects in allophony. However, the shakiness of the specific cases does not answer the question of whether there is gradience in phonology in the areas of morphophonology and allophony. In both cases, the conclusion about whether there is gradience in the phonology depends in part on the definition of phonology and how we understand phonology in relationship to phonetics. This leads us back to the question, discussed in Section 2.1.2, whether phonetics and phonology are distinct domains. A modular view of the grammar necessarily leads us to an approach based on a mapping between phonology and phonetics. On the other hand, focusing primarily on the grey area, cases that are particularly difficult to classify, and defining similarity as ‘duplication’ lead us to a unidimensional view. Let us return to the observation by Pierrehumbert et al. (2000) that knowledge of sound structure falls along a continuum, with more fine-grained knowledge tending to lie at the phonetic end and lexical contrast and alternations being more granular. This continuum is schematized in Figure 2.2a with phonetics versus phonology on the x-axis and degree of granularity on the y-axis. Consider the schematic distribution of the data: the modular approach suggests a distribution such as that in Figure 2.2b, with little or no grey area. The unidimensional approach suggests a distribution such as that in Figure 2.2c, with little correlation between the two dimensions. Yet the data clearly fall somewhere between these two views. How can we understand and model this distribution?

Figure 2.2. (a) Continuum between phonetics and phonology (x-axis) and fine-grained and granular (y-axis) dimensions of speech; (b) distribution of data, modular approach; (c) distribution of data, unidimensional approach

Two methodological issues contribute to the perceived cost of ‘duplication’ and to the tendency to avoid duplication through reductionism. The first is the nature of modularity. Hale and Reiss (2000: 162) state ‘The modular approach to linguistics, and to science in general, requires that we both model the interactions between related domains, and also sharply delineate one domain from another.’ But we need to ask the question: Is there strict modularity? Does modularity entail sharp delineation? Could there be modularity that is not rigid? The lack of strict modularity is implicit in approaches that understand the relationships between linguistic domains in terms of interfaces. If we do not subscribe to strict modularity between phonology and phonetics and between phonology and the lexicon, then it becomes an empirical question whether drawing a distinction is useful. Does a division of labour contribute to both descriptive adequacy and explanatory adequacy? The second is the status of Occam’s Razor, or the principle of parsimony. Perhaps Occam’s Razor does not hold as strongly as we believe. There is redundancy in language. Redundancy is widely observed in the domain of phonetics in terms of multiple and varied cues to the realization of particular phonological structures. Even cases of what we understand to be
a straightforward phonological contrast may involve multiple cues. Evidence suggests that lexical representations include multiple levels of details, including the kind of sparse abstract representations widely assumed in generative phonology and much more fine-grained levels of detail. (See Beckman et al. 2004 for discussion and a specific proposal in this regard.) Not only is there redundancy within domains, but there appears to be redundancy across domains, so ‘duplication’ is not a problem, but in fact an intrinsic characteristic of language. Increasingly there is agreement that unidimensional or reductionist views are not sufficient (see Pierrehumbert 2001: 196). Attempting to understand sound structure in only abstract categorical terms or in only the gradient details, or trying to understand the nature of the lexicon in exactly the same terms in which we try to understand phonology is insufficient.

In conclusion, the relationship between phonetics and phonology is a multifaceted one. It reflects phonetic constraints that have shaped synchronic phonological systems through historical change over time. Synchronically, phonological systems emerge as a balance between the various demands placed on the system, but the evidence suggests that phonology cannot be reduced to the sum of these influences. Phonetics and phonology also need to be understood in relationship to the lexicon. There are parallels and overlaps between these three areas, but none of them is properly reduced to or contained in the others. Language patterns are fundamentally fluid. There is evidence of phonologization, grammaticalization, lexicalization, and so forth. Similar patterns can be observed across these domains. To reach a fuller understanding of the workings of the sound system and the lexicon, we need to be willing to reconsider widely held assumptions and ask in an empirically based way what is the connection between these domains of the linguistic system.
3 Gradedness: Interpretive Dependencies and Beyond
ERIC REULAND
3.1 Introduction
During the last decades it has been a recurrent theme whether the dichotomy between grammatical versus ungrammatical or well-formed versus ill-formed would be better understood as a gradient property (cf. Chomsky’s (1965) discussion of degrees of grammaticality).1 If so, one may well ask whether gradedness is not an even more fundamental property of linguistic notions. The following statement in the announcement of the conference from which this book originated presupposes an affirmative answer, and extends it to linguistic objects themselves, making the suitability to account for gradedness into a test for linguistic theories: ‘The kind of grammar typically employed in theoretical linguistics is not particularly suited to cope with a widespread property of linguistic objects: gradedness.’2 This statement implies that we should strive for theories that capture gradedness. To evaluate it one must address the question of what ‘gradedness’ as a property of linguistic objects really is. The issue is important. But it is also susceptible to misunderstandings. My first goal will be to show that gradedness is not a unified phenomenon. Some of its manifestations pertain to language use rather than to grammar per se. Understanding gradedness may therefore help us shed light on the division of labour among the systems underlying language and its use. Showing that this is the case will be the second goal of this contribution.

1 This material was presented at the ‘Gradedness conference’ organized at the University of Potsdam 21–23 October 2002. I am very grateful to the organizers, in particular Gisbert Fanselow, for creating such a stimulating event. I would like to thank the audience and the two reviewers of the written version for their very helpful comments. Of course, I am responsible for any remaining mistakes.
2 This statement is taken from the material distributed at the conference.
The introductory material to this book connects the discussion of gradedness with the notion of idealization in grammar. As the text puts it:

The formulation of grammatical models is often guided by at least two idealizations: the speech community is homogeneous with respect to the grammar it uses (no variation), and the intuitive judgements of the speakers about the grammaticality of utterances are categorical and stable (no gradedness). There is a growing conviction among linguists with different theoretical orientations that essential progress could be made even in the classical domains of grammar if these idealizations were given up.
This assessment as such may well be correct, although it pertains to the sociology of the field rather than to linguistic theorizing itself. However, it is important to see that the idealizations that are being given up do not reflect the idealization underlying the competence-performance distinction formulated in Chomsky (1965):

Linguistic theory is concerned primarily with an ideal speaker–listener, in a completely homogeneous speech-community, who knows its language perfectly and is unaffected by such grammatically irrelevant conditions as memory limitations, distractions, shifts of attention and interest, and errors (random or characteristic) in applying his knowledge of the language in actual performance. . . . To study actual linguistic performance, we must consider the interaction of a variety of factors of which the underlying competence of the speaker–hearer is only one. (Chomsky 1965: 3)
This quote does not state that the speech community is homogeneous, nor that one should not study the nature of variation between speech communities. It also does not claim that intuitive judgements of the speakers about the grammaticality of utterances are categorical and stable. In fact one should not expect them to be. Linguistic data are like any empirical data. Whether one takes standard acceptability judgements of various kinds, truth value judgement tasks, picture identification tasks on the one hand, or eye-tracking data and neuro-physiological responses in brain tissue on the other, they all have the ‘ugly’ properties of raw empirical data in every field. Depending on the nature of the test, it may be more or less straightforward to draw conclusions about grammaticality, temporal or spatial organization of brain processes, etc. from such data. In fact, Chomsky says no more than that the ‘study of language is no different from empirical investigation of other complex phenomena’, and that we should make suitable idealizations in order to make effective empirical investigation feasible at all. For anyone who can see that watching a tennis match is not the best starting point for understanding the laws of motion, Chomsky’s point should be pretty straightforward.
Gradedness: Interpretive Dependencies and Beyond
Gradedness in linguistics may come from at least four sources:
1. linguistic processes could have an analogue character as opposed to being discrete;
2. properties of languages may be tendency based versus system governed;
3. in certain domains one may find individual variability (variability both across individuals and within individuals at different occasions, in judgement or in production/perception);
4. linguists may find systematic distinctions in acceptability judgements on a scale.
In the next section I will discuss these in turn.
3.2 Categorizing gradedness

3.2.1 Gradedness I: the analogue–discrete distinction

Much of our current thinking about language is intimately connected to what is called the computational theory of the mind. Language is a system allowing us to compute the systematic correlations between forms and their interpretations. Right from the beginning, generative models of language were based on computations using discrete algebras, reflecting properties of sets of strings defined over finite alphabets. The question of whether a certain step in a computation is admissible or not can only be answered with yes or no; the question of whether a certain string belongs to the language generated by a certain grammar admits only three answers: yes, no, or undecidable. It makes no sense to say that a certain string approximates membership of a language, just as the model has no room for saying that one item is slightly more of a member of the alphabet than another.

The question is whether this discrete model of linguistic computations may be fundamentally wrong. One could argue that biological processes are fundamentally analogue in nature, and that neural systems may encode differences in information states by differences in degree of excitation. Such a position appears to be implied by much work in connectionist modelling of language, since Rumelhart et al. (1986b). However, the situation is not that simple. As argued by Marantz (2000) one cannot maintain that biological systems are analogue: ‘Genetics, at the genetic coding level, is essentially digital (consider the four “letters” of the genetic code). In addition, neurons are “digital” – there are no percentages of a neuron; you have a cell or you don’t. Neuron firing can be conceived of as digital or analogue.’ That is, whether or not a neuron fires is a yes–no issue; but different firings may vary along certain physical dimensions.
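The discreteness of the membership question can be made concrete with a small recognizer. The following is only a sketch under stated assumptions: the toy grammar, its category names, and the function `generates` are hypothetical and not taken from the chapter, and the CYK procedure is one standard way of deciding membership for a context-free grammar in Chomsky normal form. For such a grammar the question is always decidable, so the third answer (‘undecidable’) does not arise here.

```python
from itertools import product

# Toy grammar in Chomsky normal form (hypothetical, for illustration only).
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
LEXICAL_RULES = {
    "the": {"Det"},
    "dog": {"N"},
    "cat": {"N"},
    "saw": {"V"},
}

def generates(words, start="S"):
    """Return True if the toy grammar generates `words`, else False (CYK)."""
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds the categories that span words[i : i + j + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        table[i][0] = set(LEXICAL_RULES.get(word, set()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            for split in range(1, span):
                left = table[i][split - 1]
                right = table[i + split][span - split - 1]
                for pair in product(left, right):
                    table[i][span - 1] |= BINARY_RULES.get(pair, set())
    return start in table[0][n - 1]

print(generates("the dog saw the cat".split()))
print(generates("the dog saw".split()))
```

The return type is a Boolean by construction: the recognizer has no way of reporting that a string is ‘almost’ in the language, which is the point made in the text.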
The Nature of Gradience
The issue is how the differences along those dimensions affect the transmission of information to other parts of the system. As we all may know from inspecting our watches, a digital system can mimic an analogue process and an analogue system can mimic a digital process, and in fact all watches are based on conversions from one type of process to another. In the brain it need not be different. And even if at some level the brain architecture had connectionist-type properties, this would not prevent it from emulating symbolic/discrete operations. So, regardless of the properties of the brain architecture, whether the system underlying language is best conceived as analogue or as discrete/digital is an independent empirical issue. If so, there is no escape from approaching the issue along the lines of any rational empirical inquiry. All models of language that come anywhere near a minimal empirical coverage are discrete. There are no analogue models to compare them with (see Reuland (2000) for some discussion). Yet, there are some potentially analogue processes in language:
• use of pitch and stress as properties of form representing attitudes regarding the message;
• use of pitch and stress representing relative importance of parts of the message.
These, however, typically involve indexical relationships such as the contours of intonation that one may use for various degrees of wonder, or surprise; or the heaviness of the stress, or the height of the pitch of an expression, where the relative position on a scale in some dimension of properties of the signal reflects the relative position on a scale of importance, surprise, etc., of what it expresses. One need not doubt that intensity of the emotions involved is expressed by properties of the linguistic signal in an essentially analogue manner. But this type of import of the signal must be carefully distinguished from properties of the signal that serve as grammatical markings in the sentence.
For instance, question intonation is the realization of a grammatical morpheme. One expression cannot be more or less a question than another one. So, the grammatical import of question intonation is as discrete as the presence or absence of a segmental question morpheme. As shown by studies of the expression of focus (Reinhart 1996), the role of intonation in focus marking is discrete as well.³ There is no gradedness of focus-hood as a function of gradedness of the stress. In a nutshell, there is ‘gradedness’ in the domain of prosody, but the relevance of this gradedness to the grammar is null.

³ In Reinhart’s system focus is determined by stress and two rules:
(i) Focus rule: the focus of IP is a(ny) constituent containing the main stress of IP as determined by the (general) stress rule
(ii) Marked focus rule: relocate the main stress on a constituent you want to be the focus

3.2.2 Gradedness II: tendency and system

Another face of gradedness we find in the contrast between tendency based and system governed properties of language(s). Two issues arise: What kind of phenomena do we have in mind when we speak of tendencies in the study of language? And what is the status of whatever represents them in our linguistic descriptions? Sometimes the issue is clear. There are striking differences between written cultures as to the average sentence length in their typical novels. A cursory examination of a standard nineteenth-century Russian novel, or a German philosophical treatise, will show that their average sentence length in words will surpass that of a modern American novel. But individual samples may show substantial variation. Clearly, this is a matter of tendency rather than law. What underlies such a difference? Few people will claim that differences in this dimension are differences in language. Rather they will be regarded as differences in socio-cultural conventions. One may expect language communities to converge in this respect as the relevant conventions converge, with stratification eventually running along genres rather than ‘languages’. Thus, average sentence length is a gradable property of texts. But its source is conventions of language use.

At the other extreme, we find properties such as the position of the article in a DP, or the position of the finite verb in the clause. No serious linguist would be content claiming that English has a tendency to put articles before the noun, and Bulgarian a tendency to put them after, or that Dutch has just tendencies to put the finite verb in second position in declarative root clauses, and to put it after the direct object in subordinate clauses. It is easy to think of other cases where system and use based accounts might conceivably compete.
Take, for instance, the cross-linguistic variation that is found in the expression of agents in passive. The standard view is that such differences are categorical, determined by the precise mechanics used to ‘suppress’ the agent role. Instead one could imagine a usage based approach, based on cultural variation in the role assigned to efficiency of communication. For instance, certain communities might entertain the conversational convention: if an argument is so unimportant that you consider demoting it, better go all the way and omit it entirely. A similar line could be attempted with respect to word order and its variations. Of course, the question is how to evaluate such positions.
As long as we do not have effective analogue models of language, discussion of such alternatives remains moot. We cannot evaluate analogue versus algebraic accounts of a certain phenomenon if the analogue version does not exist. Does this mean that discrete theories rule out that we find any phenomena only involving tendencies? In fact not. And we can go a bit further. From the discrete/grammatical perspective, one expects to find tendencies precisely there where principles of grammar leave choices open. Just as there are no linguistic principles determining what one will say, there are no linguistic principles determining how one will say it. I will return to this below.

3.2.3 Gradedness III: variability

It is a common experience for linguists that speakers of the same language show variation in their judgements on one and the same sentence, or that even one and the same speaker gives varying judgements at different occasions. Note that the second variability may also show up as the first, but for the discussion we will keep them separate. The question is what kind of theoretical conclusions one should draw from such variation. Do messy, variable data show that ‘gradedness’ is a concept that should be incorporated into linguistic theorizing? I will discuss some examples that illustrate the issue.

3.2.3.1 Inter-subject variability

One relevant type of variation is the inter-subject variation illustrated by the contrast between varieties of Dutch that do or do not require the expletive er in sentences like (3.1):

(3.1) a. A: Wie denk je dat *(er) komt
      b. B: Wie denk je dat (er) komt
            who think you that (there) comes

Unlike what was thought in the 1970s, there is no clear split along regional dialect lines. Rather there is variation at the individual level. Note, however, that it is not a matter of real gradedness. Individual speakers are quite consistent in their judgement. Hence, no insight would be gained by treating this as a ‘graded’ property of Dutch.
The same holds true of the variation regarding the that-trace filter in American English. Variation of this kind is easily handled by the one theoretical tool late GB or current minimalist theory has available to capture variation, namely variability in feature composition of functional elements. The contrast between (3.1a) and (3.1b) reflects a general contrast between Dutch speakers concerning the licensing of non-argument null-subjects, ultimately reducible to micro-variation in the feature composition of T.

Another instructive case is variation in operator raising in Hungarian, as in (3.2) (Gervain 2002):

(3.2) KÉT FIÚT mondtál hogy jön/jönnek
      two boys.sg.acc said.2sg that come.3sg/come.3pl
      ‘You said that two boys would come.’

The literature on operator raising in Hungarian gives conflicting judgements on case (nominative or accusative on the focused phrase) and agreement or anti-agreement on the downstairs verb. Gervain succeeds in showing that there actually is systematic variation between two basic dialects (but without regional basis). One ‘dialect’ allows both nominative and accusative on the focused phrase, but rejects anti-agreement on the embedded verb; the other prefers matrix accusative, but accepts both agreement and anti-agreement. The former employs a movement strategy, the latter a resumptive strategy. The source of the patterns can ultimately be reduced to a difference in the feature composition of the complementizer hogy ‘that’, one feature set allowing an operator phrase to pass along by movement, the other blocking it. This case is instructive because it shows that the right response to messy data is further investigation rather than being content with just stating the facts.

There are other sources of inter-subject variability. Some speakers may be daily users of words that other speakers do not know, or definitely find weird when they hear them. Such variation follows from differences in experience, but crucially, resides in the conceptual rather than the grammatical system. Variability in judgement may also ensue from the interaction of computationally complex operations. In fact this type of variation may occasionally cross the border from inter-subject to intra-subject variability.
Even in simple interactions world knowledge may be necessary to bring out object wide scope readings as in (3.3b) versus (3.3a):

(3.3) a. some student admired every teacher
      b. some glass hit every mirror

It is a safe bet that in on-line processing tasks few people will manage to get all the quantificational interdependencies in a sentence of the following type, and even on paper I have my doubts:
(3.4) which candidates from every class did some teacher tell every colleague that most mother’s favourites refused to acknowledge before starting to support

Carrying out complex computations with interacting quantifiers may easily lead to the exhaustion of processing resources. Overflow of working memory leads to guessing patterns. It is well-known that speakers differ in their processing resources (for instance, Just and Carpenter 1987; Gathercole and Baddeley 1993). So, varying availability of resources may lead to differential patterns of processing break-down. Simply speaking, one speaker may start guessing at a point of complexity where another speaker is still able to process the sentence according to the linguistic rules. This would yield differences in observable behaviour that do have gradient properties, but that are again grammar external, and that it would in fact be a mistake to encode in the grammatical system. Note that even simple sentences may sometimes be mind boggling, as in determining whether ‘every mother’s child adores her’ allows ‘every mother’ to bind ‘her’ or not. Of course, the ranking of processes in terms of resource requirements has a clear theoretical interest, since it sheds light on the overall interaction of processes within the language system, but it is entirely independent of the issue of gradedness of grammar.

3.2.3.2 Intra-subject variability

As we all know, speakers of a language may not always be able to give clear-cut categorical judgements of the grammatical status of sentences. Actually, we find this in two forms: (a) a speaker of a language may give different judgements on different occasions; (b) a speaker of a language may express uncertainty on one occasion. Once more, the issue is not so much whether such variability occurs, as what it means for the (theory of) grammar. And, again, the variability does not necessarily mean very much for our conception of grammar given what we know about possible sources.
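The resource-overflow argument can be illustrated with a deliberately crude toy model. Everything in it is an assumption made for illustration: complexity and capacity are hypothetical integer scores, and guessing is modelled as accepting with probability 0.5. The sketch shows only that a strictly categorical grammar, combined with speakers of differing capacity, yields graded acceptance rates at the group level.

```python
def judgement_probability(is_grammatical, complexity, capacity):
    """Probability that one speaker accepts a sentence in this toy model."""
    if complexity <= capacity:
        # Resources suffice: the categorical grammar decides.
        return 1.0 if is_grammatical else 0.0
    # Working memory overflows: the speaker guesses (assumed 50/50).
    return 0.5

def group_acceptance(is_grammatical, complexity, capacities):
    """Mean acceptance probability over speakers with the given capacities."""
    probs = [
        judgement_probability(is_grammatical, complexity, capacity)
        for capacity in capacities
    ]
    return sum(probs) / len(probs)

# Four hypothetical speakers with different processing capacities.
capacities = [3, 5, 7, 9]
for complexity in (2, 6, 10):
    print(complexity, group_acceptance(True, complexity, capacities))
```

In this sketch the grammar itself never returns anything but yes or no; the intermediate values arise entirely outside it, which is the sense in which such gradience is grammar external.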
If the interpretation of a sentence exerts a demand on processing resources to the point of overflow, variability is what we would expect. Depending on the level of sophistication, guessing may either result in (a), as it does with children on condition B tasks (Grodzinsky and Reinhart 1993), or in (b).

For another source of variation, note that people are perfectly able to master more than one variant of a language, for instance in the form of different registers, or a standard language and one or more dialects. This means that the mental lexicon must be able to host entries that are almost identical; just marginally different in some of their instructions to the sensori-motor system or the grammatical system. For instance Dutch dat, and Frisian dat are both complementizers; they differ slightly in the way the a is realized, more back in the case of Frisian. In Frisian, but not in Dutch, dat carries a grammatical instruction letting it cliticize to wh-words (actually, it does not matter whether this is a property of dat, or of the element in spec-CP). This property can be dissociated from its pronunciation, witness the fact that some Frisian speakers have this feature optionally in their Dutch, and use cliticization together with the Dutch pronunciation of the a. Optionality means that the mental lexicon of such speakers does contain the two variants, both being accessible in the ‘Dutch mode’. If so, there is no reason that this cannot be generalized to other cases of micro-variation. If the mental lexicon may contain close variants of one lexical item, one may expect retrieval of one or the other to be subject to chance. Of course, this does not mean that the phenomenon of variation and the mechanisms behind it are uninteresting. It does mean that the concept of gradedness does not necessarily help in understanding it.

Interpretation being dependent on perspective is another possible source of variation (as in the Necker cube: which edge of the cube is in front). Consider the following contrast from Pollard and Sag (1992):

(3.5) a. John_i was going to get even with Mary. That picture of himself_i in the paper would really annoy her, as would the other stunts he had planned.
      b. *Mary was quite taken aback by the publicity John_i was receiving. That picture of himself_i in the paper had really annoyed her, and there was not much she could do about it.

There is a clear difference in well-formedness between these two discourses. Yet, structurally the position of the anaphor himself is identical in both cases. The only relevant contrast is in the discourse status of the antecedent. In (3.5a) John’s viewpoint is taken, in (3.5b) Mary’s.
And, as noted by Pollard and Sag, the interpretation of anaphors that are exempt from a structural binding requirement (such as the himself in a picture NP) is determined by viewpoint. Hence, in (3.5b) John does not yield a proper discourse antecedent for himself. The following variant will probably give much messier results:

(3.6) (?) Mary was quite taken aback by the publicity. John_i was getting the upper hand. That picture of himself_i in the paper had really annoyed her, and there was not much she could do about it.
Example (3.6) allows for two discourse construals. One in which the viewpoint is Mary’s throughout, another in which the perspective shifts to that of John as soon as John is introduced. Clearly, judgements will be influenced by the ease with which the perspective shift is carried out. So, we have gradedness in some sense, but it is irrelevant for the system, since the relevant factor is still discrete. The judgement is determined by whether the shift in viewpoint is accessed in actual performance.

3.2.4 Gradedness IV: acceptability or grammaticality judgements on a scale

As noted above, Chomsky (1965) introduced the notion ‘degree of grammaticality’, indicating that such degrees should reflect the nature of violations of different subsystems of the grammar and the way these violations add up. They were expressed in a metric such that a violation of strict subcategorization as in *John arrived Mary comes out as worse than a violation of a selection restriction, as in ??sincerity hated John. Clearly, over the years not much progress has been made. In practice, researchers have been content to label violations impressionistically, from **, via *, *?, to ?? and ?. The question is, does this reflect on grammatical theory or just on grammatical practice? For a GB-style syntax it is easily seen that there is no problem in principle. For any full representation of a sentence, that is with indices, traces, cases, theta-roles, etc., it is easy to count the number of violations and devise a metric based on that number. Of course, one would need to decide whether violations of selection requirements count as violations of grammar or not, or whether all types of violations get the same value in the metric, but that’s a matter of implementation. Also a minimalist style grammar allows a metric for violations. It requires looking at a potential derivation from the outside and assessing what would have happened if it had not crashed.
More precisely, on the basis of each standard minimalist grammar G one can define a derivative grammar G′ such that derivational steps that crash in G are defined in G′ and associated with a marker of ungrammaticality. It is trivial to define a metric over such markings. A minimalist style grammar is, in fact, even better equipped for expressing differences in well-formedness than a GB-style grammar. Whereas a GB-style grammar does not make clear-cut distinctions between processes that belong to syntax proper, or to syntax external components such as semantics, pragmatics, or conceptual structure (since there aren’t many restrictions on the potential encoding mechanisms), a minimalist style grammar draws a fundamental line around the computational system of human language CHL (narrow syntax). It allows for a minimal set of operations (Merge, Attract, Check, Delete (up to recoverability)), defined over a purely morphosyntactic vocabulary, and triggered by the grammatical feature composition of lexical items. Possible derivations are restricted by the inclusiveness condition, which states that no elements can be introduced in the course of the derivation that were not part of the initial numeration (no lambdas, indices, traces, etc.). Clearly, the language system as a whole must allow for operations that do not obey inclusiveness. Otherwise, semantic distinctions like distributive versus collective could only be annotated, not represented.⁴

⁴ Take, for instance, the rules computing entailments. A sentence such as DP_plur elected Y does not entail that z elected Y, for z ∈ ‖DP_plur‖, whereas DP_plur voted for Y does entail that z voted for Y, for z ∈ ‖DP_plur‖. This is reflected in the contrast between we elected me and ??we voted for me. The latter is ill-formed since it entails the instantiation I (λx (x voted for x)), which is reflexive, but not reflexive-marked (Reinhart and Reuland 1993).

Let us place this issue in the broader perspective of the minimalist organization of the language system, as in Figure 3.1.

[Figure 3.1. A minimalist organization of the language system. Components shown: PF interface, C–I interface (interpretive system), inferences, sensori-motor system, CHL, lexicon, language of thought.]

Following Chomsky (1995) and subsequent work, CHL mediates between the sensori-motor system and the language of thought. The lexicon is a set of fixed triples, with p instructions to the sensori-motor system, l instructions to the language of thought, and g instructions to the computation as such. In this schema the interfaces play a crucial role, since they determine which of the properties of the systems outside CHL are legible for the purpose of computation. What is legible may in fact be only a very impoverished representation of a rich structure. For instance, verbs may have a very rich conceptual structure reflected in their thematic properties. Reinhart (2000b, 2003) shows that the computational system can read these properties only in so far as they can be encoded as combinations (clusters) of two features: [±c(ause)] and [±m(ental)]. Anything else is inaccessible to CHL. The coding of the concept subsequently determines how each cluster is linked to an argument.

Note furthermore that an interface must itself contain operations. That is, it must be a component sui generis. For instance, syntactic constituents do not always coincide with articulatory units. Furthermore, in preparation for instruction to the sensori-motor system, hierarchical structure is broken down and linearized. On the meaning side, it is also well-known that for storage in memory much structural information is obliterated. As Chomsky (1995) notes, at the PF side such operations violate inclusiveness. It seems fair to assume that the same holds for the interpretive system at the C–I interface. Given this schema, more can be ‘wrong’ with an expression than just a violation (crash) of the derivation in CHL. Differences in the ‘degree of ungrammaticality’ may well arise by a combination of violations in different components: PF, lexicon, narrow syntax, interpretive system, interpretation itself. Any theory of language with a further articulation into subsystems is in principle well equipped to deal with ‘degrees’ of well- or ill-formedness.
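A metric over violation markers of the kind discussed in this section could be sketched as a weighted count. This is a minimal illustration, not a proposal from the chapter: the function name, the marker labels, and the numerical weights are all hypothetical, chosen only so that a strict-subcategorization violation outranks a selectional one, as in the 1965 ordering. The same scheme would work with markers keyed by component (PF, lexicon, narrow syntax, interpretive system) instead of by violation type.

```python
from collections import Counter

# Hypothetical weights: the chapter fixes no values, only the ordering that a
# strict-subcategorization violation counts as worse than a selectional one.
WEIGHTS = {
    "strict subcategorization": 2.0,
    "selection restriction": 1.0,
}

def degree_of_ungrammaticality(markers, weights=WEIGHTS):
    """Return the weighted sum of the violation markers on one derivation."""
    counts = Counter(markers)
    return sum(weights[kind] * n for kind, n in counts.items())

# '*John arrived Mary' versus '??sincerity hated John'
worse = degree_of_ungrammaticality(["strict subcategorization"])
milder = degree_of_ungrammaticality(["selection restriction"])
print(worse > milder)
```

Whether all violation types should receive the same weight, and whether selectional violations should count at all, are exactly the implementation decisions the text leaves open; here they reduce to the choice of entries in the weight table.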
3.3 Issues in binding and co-reference

The domain of anaphora is well-suited for an illustration of these issues. The canonical binding theory (Chomsky 1981) represents a strictly categorical approach. However, in many languages anaphoric systems are more complex than the canonical binding theory anticipated. Moreover, not all cases where an anaphoric element is co-valued with an antecedent are really binding relations. Limitations of space prevent me from even remotely doing justice to the discussion over the last decades. Instead I will focus on one basic issue. Going over the literature one will find that sometimes judgements are clearly categorical, whereas in other cases we find judgements that are far more ‘soft’ and ‘variable’. In one and the same domain, languages may in fact vary, even when they are closely related. Below I will discuss some concrete examples. Recently, this property of anaphora has been taken as evidence that a categorical approach to anaphora must be replaced by an approach in which binding principles are replaced by soft, violable constraints along the lines proposed in optimality theoretic approaches to grammar. A useful discussion along these lines is presented in Fischer (2004). Yet, such a conclusion is not warranted. ‘Flexible’ approaches of various kinds may seem attractive at a coarsely grained level of analysis, taking observational entities at face value. The attraction disappears if one sees that more finely grained analyses make it possible to connect the behaviour of anaphoric elements to the mechanisms of the grammar, and to explain variation from details in their feature make-up, or from differences in their environment. Current research warrants the conclusion that the computational system underlying binding operates as discretely and categorically as it does in other areas of grammar. However, just as in the cases discussed in Sections 3.1 and 3.2, the computational system does not determine all aspects of interpretation; where it does not, systems of use kick in, evoking the air of flexibility observed.

Much work on binding and anaphora so far is characterized by two assumptions:
1. all binding dependencies are syntactically encoded (by indices or equivalent);
2. all bindable elements are inherently characterized as pronominals or anaphors (simplex, complex, or clitic).
These assumptions are equally characteristic of extant OT-approaches to binding. So, one finds different rankings of binding constraints on pronominals or subtypes of anaphors, across environments and languages. In Reuland (2001) I argued that both assumptions are false. The argument against (1) rests in part on the inclusiveness condition, and hence is theory-internal, but should be taken seriously in any theory that strives for parsimony. Clearly, indices are not morphosyntactic objects. No language expresses indices morphologically. Thus, if syntax is, indeed, the component operating on morphosyntactic objects, it has no place for indices. External validation of the claim that syntax has no place for indices rests on the dissociation this predicts between dependencies that can be syntactically encoded and dependencies that cannot be. In so far as such dissociations can be found, they constitute independent evidence (see Reuland 2003 for discussion). The argument against (2) is largely theory-independent.
It essentially rests on the observation that there are so many instances of free ‘anaphors’ and locally bound ‘pronominals’, that one would have to resort to massive lexical ambiguity in order to ‘maintain order’. Instead, we can really understand the binding behaviour of pronouns (to use a cover term for anaphors and pronominals) if we consider their inherent features, the way these features are accessed by the computational system, and general principles governing the division of labour between the components of the language system, in particular CHL, logical syntax, and discourse principles as part of the interpretive system.
By way of illustration I will briefly consider three cases: (a) free anaphors (often referred to as ‘logophors’); (b) reflexives in PPs; and (c) local binding of pronominals.

3.3.1 Free anaphors

There are two main issues any theory of binding must address:
1. Why is it that cross-linguistically reflexivity must be licensed?
2. Why is it that certain expressions, ‘anaphors’, must be bound?
Question (1) covers a substantial part of Condition B of the canonical binding theory, question (2) reflects the canonical condition A. Consider first question (1). More concretely it asks: Given that pronominals can generally be bound by DP, why cannot we simply use expressions of the form John admires him, or more generally, DP V pronominal, to express that the person who John admires is John? Simply invoking avoidance of ambiguity does not help. There are lots of cases where pronominals are used ambiguously, and no special marking is introduced. An ambiguity story cannot be complete without explaining why in all the other cases there is no marking. Moreover, languages differ as to where they require special marking (we will discuss one such case below). Again, an avoid-ambiguity story would not lead us to expect such variation.

As I argued in Reuland (2001) the essence of the answer to (1) is that reflexivization involves the identification of variables in the logical syntax representation of a predicate.⁵ Thus, limiting ourselves for simplicity’s sake to binary predicates, reflexivization effectively reduces a relation to a property, as in (3.7):

(3.7) λx λy (P x y) → λx (P x x)

This leads to a theta-violation since the computational system cannot distinguish these two tokens of x as different objects (occurrences). At the C–I interface syntactic structure is broken down and recoverable only in so far as reflected in conceptual structure.
This is parallel to what happens at the PF-interface; here syntactic structure is broken down as well, and recoverable in so far as reXected in properties of the signal. SpeciWcally, following the line of Chomsky (1995), a category such as V’, needed to express the speciWer– complement asymmetry in [VP Spec [V’ V Comp] ], is not a term, hence not 5 The notion ‘logical syntax’ is to be distinguished from ‘logical form’. Logical form is the output of the computational system (CHL); the operations yielding logical form are subject to the inclusiveness condition. Logical syntax is the Wrst step in interpretation, with operations that can translate pronominals as variables, can raise a subject leaving a lambda expression, etc., thus not obeying inclusiveness. For a discussion of logical syntax, see Reinhart (2000a) and references cited there.
Gradedness: Interpretive Dependencies and Beyond
59
visible to the interpretive system. Therefore, it is not translatable at the C–I interface, and the hierarchical information it contributes is lost. What about order? Syntax proper only expresses hierarchy, but no order. Order is imposed under realization by spell-out systems. As a consequence, the computational system cannot distinguish the two tokens of x in (3.7) on our mental ‘scratch paper’ by order. Hence, translating DP V pronominal at the C–I interface involves the steps in (3.8):

(3.8) [VP x [V’ V x ]] → ([VP V “x x” ]) → *[VP V x]
      1                  2                 3

The second step with the two tokens of x in ‘x x’ is virtual, hence it is put in brackets. It is virtual, since with the breakdown of structure, and given the absence of order, it has no status in the computation: eliminate V’ and what you get is the third stage. The transition from (3.8.1) to (3.8.3) does not change the arity of V itself. It is still a 2-place predicate, but in (3.7)/(3.8.3) it sees only one object as its argument. As a consequence, one theta-role cannot be assigned. Under standard assumptions about theta-role discharge a theta-violation ensues. (Alternatively, two roles are assigned to the same argument with the same result.)6

6 I am grateful to an anonymous reviewer for pointing out a problem in the original formulation. In the present version I state more clearly the empirical assumptions on which the explanation rests. Note that the problem of keeping arguments apart for purposes of theta-role assignment does not come up in the case of two different arguments, as in (i) (abstracting away from QR/λ-abstraction):
(i) [VP j [V’ V m ]] → ([VP V m j ]) (↛ [VP V m/j])
    1                  2              3
The objects remain distinct. There is no reason that theta-relations established in the configuration in (i.1) would not be preserved in (i.2). Hence, the issue of (i.3) does not arise. Note, that theta-roles are not syntactic features that modify the representation of the element they are assigned to. That is, they are not comparable to case features. This leaves no room for the alternative in (ii) where x_θ1 and x_θ2 are distinguishable objects by virtue of having a different ‘θ-feature’ composition.
(ii) [VP x_θ1 [V’ V x_θ2 ]] → [VP V x_θ1 x_θ2 ] ↛ [VP V x_θ1/x_θ2]
     1                        2                   3
Rather, with theta-assignment spelled out, but reading x_θ1 correctly as ‘as x is being assigned the role θ1’ we get (iii), which reflects the problem discussed in the main text:
(iii) [VP x_θ1 [V’ V x_θ2 ]] → ([VP V “x x”_θ1/θ2 ]) → *[VP V x_θ1/θ2]
      1                        2                       3

Languages employ a variety of means to obviate the problem of (3.8) when expressing reflexive relations (Schladt 2000). They may mark the verb, they may put the pronominal inside a PP, they may double the pronominal, add a body-part, a focus-marker, etc. Here I will limit discussion to SELF-marking in languages such as English and Dutch. Briefly, the minimum contribution that SELF must make in order to make a reflexive interpretation possible is to induce sufficient structure. That is, in a structure such as (3.9) the two arguments of P are distinct.

(3.9) λx (P x [x SELF])

This is independent of other effects SELF may have on interpretation. The semantics of SELF only has to meet one condition: [x SELF] should get an interpretation that is compatible with the fact that the whole predicate has to be used reflexively. That implies that whatever interpretation [x SELF] gets should be sufficiently close to whatever will be the interpretation of x. This is expressed in (3.10):

(3.10) λx (P x f(x)), with f a function that for each x yields a value that can stand proxy for x

It is this property of SELF that gives rise to the statue-reading discussed by Jackendoff (1992). A statue reading also shows up in Dutch, as is illustrated in (3.11b), where zichzelf despite being ‘reflexive’ refers to the Queen’s statue rather than to the Queen herself (Reuland 2001). Importantly, the statue-reading depends on the presence of SELF. If we replace zichzelf by zich, which is allowed in this environment, the statue reading disappears. So, zich expresses identity, zichzelf stands proxy for its antecedent.

(3.11) ‘Madame Tussaud’s’-context: Consider the following example in Dutch:
       De koningin liep bij Madame Tussaud’s binnen. Ze keek in een spiegel en
       a. ze zag zich in een griezelige hoek staan
       b. ze zag zichzelf in een griezelige hoek staan
       Translation: The queen walked into Madame Tussaud’s. She looked in a mirror and
       a. she saw SE in a creepy corner stand
       b. she saw herself in a creepy corner stand
       Interpretations: (a) zich = the queen: the queen saw herself; (b) zichzelf = the queen’s statue: the queen saw her statue.

The difficulty for our computational system to deal with different tokens of indiscernibles is in a nutshell the reason why reflexivity must be licensed. In other languages a ‘protecting element’ can behave quite differently from English himself.
In Malayalam, for instance, the licensing anaphor does not need to be locally bound at all (Jayaseelan 1997). Compare the sentences in (3.12):
(3.12) a. raamanᵢ tan-neᵢ *(tanne) sneehikunnu
          ‘Raman SE-acc self loves’
          ‘Raman loves himself’
       b. raamanᵢ wicaariccu [penkuttikal tan-neᵢ tanne sneehikkunnu ennə]
          ‘Raman thought [girls SE-acc self love Comp]’
          ‘Raman thought that the girls loved him(self).’
Example (3.12a) is a simple transitive clause. A simple pronominal in object position cannot be bound by the subject. A reflexive reading requires the complex anaphor tan-neᵢ tanne. In (3.12b) tan-neᵢ tanne is put in an embedded clause with a plural subject, which is not a possible binder for the anaphor. In English the result would be ill-formed. In Malayalam, however, the matrix subject raaman is available as a binder for the anaphor. Thus, the presence of tanne licenses but does not enforce reflexivity. Note, that tan-ne tanne is essentially a doubled pronoun, and tanne need not be identified with SELF. Hence a difference in behaviour is not unexpected. As illustrated in (3.5a), in English himself need not always be locally bound either. Reinhart and Reuland (1991, 1993) and Pollard and Sag (1992) present extensive discussion of such cases. Example (3.13) presents an illustrative minimal pair.

(3.13) a. Max boasted that the queen invited Mary and him/himself for a drink
       b. Max boasted that the queen invited him/*himself for a drink

In (3.13a) the reflexive can have a long-distance antecedent, in (3.13b) it cannot. In their Dutch counterparts we see a different pattern: in both cases, long-distance binding of zichzelf is impossible.

(3.14) a. Max pochte dat de koningin Marie en hem/*zichzelf voor een drankje had uitgenodigd
       b. Max pochte dat de koningin hem/*zichzelf voor een drankje had uitgenodigd
In Dutch the canonical counterpart of himself is zichzelf. The question is why an LD antecedent is possible in (3.13a) and not in (3.13b), and why it is never possible with zichzelf. If we only consider superficial properties of himself and zichzelf, no nonstipulative answer will be found. However, Reinhart and Reuland (1991) propose a more fine-grained analysis (also assumed in Reinhart and Reuland 1993). As we saw, a reflexive predicate must be licensed. In Dutch and English, the licenser is SELF. Unlike Malayalam tan-ne tanne a SELF-anaphor
obligatorily marks a predicate reflexive if it is a syntactic argument of the latter (Reinhart and Reuland 1993: a reflexive-marked syntactic predicate is reflexive). In (3.13b) this requirement causes a clash: the predicate cannot be reflexive due to the feature mismatch between the queen and himself. Hence (3.13b) is ill-formed. In (3.13a) himself is properly contained in a syntactic argument of invite, hence it does not impose a reflexive interpretation.

But why does the analysis have to refer to the notion syntactic argument/predicate? This falls into place if we understand reflexive marking to be a real operation in the computational system. We know independently that coordinate structures or adjuncts resist certain operations that complements easily allow. Movement is a case in point, as illustrated by the coordinate structure constraint. Assume now that there is a dependency between SELF and the verb that is sensitive to the same constraints as movement. For short, let us assume it is movement and that SELF-movement is really triggered by a property of the verb. Its effect in canonical cases is represented in (3.15):

(3.15) a. John admires himself
       b. John SELF-admires [him [-]]

SELF now marks the predicate as reflexive, and indeed nothing resists a reflexive interpretation of (3.15). Consider now (3.13b), repeated as (3.16):

(3.16) a. *Max boasted that [the queen invited himself for a drink]
       b. *Max boasted that [the queen SELF-invited [him[-]] for a drink]

SELF attaches to the verb, marks it as reflexive, but due to the feature mismatch between the queen and him it cannot be interpreted as reflexive, and the sentence is ruled out. Consider next (3.13a), repeated as (3.17):

(3.17) a. Max boasted that [the queen invited [Mary and himself ] for a drink]
       b. Max boasted that [the queen SELF-invited [Mary and [him[-]]] for a drink] X
In this case SELF-movement is ruled out by the coordinate structure constraint. As a consequence, the syntactic predicate of invite is not reflexive-marked, and no requirement for a reflexive interpretation is imposed. Hence, syntax says nothing about the way in which himself is interpreted, leaving its interpretation open to other components of the language system.
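The reasoning behind (3.15)–(3.17) can be read as a small decision procedure. The Python sketch below is only an illustration under simplifying assumptions: the names `Features`, `self_can_attach`, and `judge` are hypothetical, agreement is reduced to identity of a person/number/gender triple, and the coordinate structure constraint is reduced to a single Boolean flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Features:
    """Toy agreement features (a simplifying assumption, not part of the analysis)."""
    person: int
    number: str
    gender: str


def self_can_attach(inside_coordination: bool) -> bool:
    # Assumption: SELF-movement obeys the coordinate structure constraint,
    # so SELF cannot move out of a conjunct.
    return not inside_coordination


def judge(subject: Features, anaphor: Features, inside_coordination: bool) -> str:
    if not self_can_attach(inside_coordination):
        # The predicate is not reflexive-marked; syntax imposes no requirement.
        return "exempt"
    if subject == anaphor:
        # Reflexive-marked and interpretable as reflexive.
        return "reflexive"
    # Reflexive-marked, but the feature mismatch blocks a reflexive reading.
    return "ill-formed"


john = Features(3, "sg", "masc")
queen = Features(3, "sg", "fem")
himself = Features(3, "sg", "masc")

print(judge(john, himself, inside_coordination=False))   # pattern of (3.15)
print(judge(queen, himself, inside_coordination=False))  # pattern of (3.16)
print(judge(queen, himself, inside_coordination=True))   # pattern of (3.17)
```

Under these assumptions the three calls yield `reflexive`, `ill-formed`, and `exempt`. The exempt outcome deliberately says nothing about how himself is eventually interpreted, since that is left to other components of the language system.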
SELF also has another use, namely as an ‘intensifier’. Hence, its interpretation reflects that property precisely in those cases where its use is not regulated by the structural requirements of grammar (as we saw in the case of (3.5) and (3.6)). Note, that we do not find the same pattern in Dutch. This is because himself differs from zichzelf in features. The zich in zichzelf is not specified for number and gender. We know descriptively that zich must be bound in the domain of the first finite clause. So, any theory assigning zich a minimal finite clause as its binding domain will predict that Max is inaccessible as an antecedent to zich in both (3.14a) and (3.14b) (see Reinhart and Reuland 1991 for references, and Reuland 2001 for an execution in current theory without indices). So, the contrast between Dutch and English follows from independent differences in feature content between himself and zichzelf.

Why must anaphors be bound? The fact that himself must be bound only where it is a syntactic argument of the predicate, and is exempted where it is not, already shows that this is too crude a question. Also Icelandic sig may be exempted from a syntactic binding requirement, be it in a different environment, namely subjunctive.7 Similar effects show up in many other languages. Hence, there can be no absolute necessity for anaphors to be bound. Reuland (2001) shows that anaphors have to be bound only if they can get a free rider on a syntactic process enabling the dependency between the anaphor and its antecedent to be syntactically encoded. In the case of anaphors such as sig, or zich a syntactic dependency can be formed as in (3.18):

(3.18) DP      I      V      pronoun
           R1     R2     R3
R3 is the dependency between object pronoun and V, that is realized by structural case. R1 reflects the agreement between DP and inflection. R2 represents the dependency between verb and inflection (assumed to also be present if I is on an auxiliary). All three dependencies are syntactic and independent from binding. Dependencies can be composed. Composition of R1-R2-R3 yields a potential dependency between pronoun and DP. For reasons discussed in Reuland (2001) this dependency can only be formed effectively if the pronoun lacks a specification for grammatical number. Thus, zich allows it (lacking a number specification), its pronominal counterpart hem ‘him’ does not. Complementarity between pronominals and anaphors follows not from any local prohibition on binding pronominals, but from a very general principle concerning the division of labour between components of the language system: in order to interpret zich as a variable bound by the antecedent, syntactic processes suffice. The dependency between a pronominal and its antecedent cannot be established in the syntax (grammatical number prevents this), but requires that part of the computation is carried out via the C–I interface. Switching between components incurs a cost, so using zich is less costly than using a pronominal instead.

7 See the example in (i):
(i) María var alltaf svo andstyggileg. Þegar Olafurⱼ kæmi segði hún sérᵢ/*ⱼ áreiðanlega að fara . . . (Thráinsson 1991)
    Mary was always so nasty. When Olaf would come, she would certainly tell himself [the person whose thoughts are being presented, not Olaf] to leave

In the case of SELF the trigger must be different. Dutch zelf does indeed have a relevant property which so far has gone unnoticed. Consider the following contrast between zich and zichzelf in Dutch, which shows up with verbs allowing either. One such verb is verdedigen ‘defend’. Suppose a group of soldiers has been given the assignment to occupy a hill and subsequently the enemy attacks. After the battle a number of situations can obtain, two of which can be described as follows: (a) the soldiers kept the hill, but at the cost of most of their lives; (b) the soldiers lost the hill, they all stayed alive. In the first case one can properly say (3.19a), but not (3.19b). In the second case one can say either:

(3.19) a. De  soldaten verdedigden zich       met succes
          The soldiers defended    ‘them’     successfully
       b. De  soldaten verdedigden zichzelf   met succes
          The soldiers defended    themselves successfully
Zichzelf has a distributive reading (each of the soldiers must have defended himself successfully), whereas zich is collective. If the verbal projection has a position to mark distributivity (Stowell and Beghelli 1997), this is sufficient to warrant attraction of SELF. If so, we have on the one hand an independent account of the fact that SELF is attracted to the verb, we have an account for the meaning contrast in (3.19), and the account accommodates the fact that not all licensers are attracted and that reflexivity is not always enforced. This approach predicts correlations that go beyond being an anaphor or being a pronominal. For instance, given that in Malayalam reflexivity is not enforced by tan-ne tanne, one predicts that it does not mark any special property of the verb. Whether this prediction is in fact correct is a matter of further research.
3.3.2 Reflexives in PPs
The following contrast between French (Zribi-Hertz 1989) and Dutch illustrates how a small variation in grammar independent from binding may have a significant effect on binding.

(3.20) a. Jean est fier    de   lui/lui-même
          Jean is  proud   of   him/himself
       b. Jean est jaloux  de   *lui/lui-même
          Jean is  jealous of   him/himself
       c. Jean bavarde avec     *lui/lui-même
          Jean mocks   (of)     him/himself
       d. Jean parle   de       lui/lui-même
          Jean talks   (of)     him/himself
This pattern is sometimes taken to be a problem for a configurational approach to binding, and condition B in particular. Configurationally (3.20a) and (3.20b) are identical, and the same holds for (3.20c) and (3.20d). If so, how can a pronominal be allowed in the one case, and not in the other? Alternatively, it is argued, these examples show that the selection of anaphors is sensitive to semantic-pragmatic conditions. How ‘expected’, or ‘normal’, is the reflexivity of the relation expressed by the predicate? Let us assume the relevant factor in French is indeed semantic-pragmatic. Yet, this cannot be all there is to it, since in the corresponding paradigm in Dutch all have the same status and in all cases a complex anaphor is required.

(3.21) a. Jan is    trots   op   zichzelf/*zich
          Jan is    proud   of   him/himself
       b. Jan is    jaloers op   zichzelf/*zich
          Jan is    jealous of   him/himself
       c. Jan spot  met          zichzelf/*zich
          Jan mocks (of)         him/himself
       d. Jan praat over         zichzelf/*zich
          Jan talks (of)         him/himself
One, surely, could not seriously claim that Dutch speakers have a different pragmatics, or that trots means something really different than fier. A clearly syntactic factor can be identified, however. As is well-known, Dutch has preposition stranding, but French does not. Whatever the precise implementation of preposition stranding, it must involve some relation between P and the selecting predicate head that obtains in stranding languages like Dutch and does not in French. Let us assume for concreteness sake that this relation
is ‘allows reanalysis’. Thus, P reanalyses with the selecting head in Dutch, not in French (following Kayne 1981). We will be assuming that reanalysis is represented in logical syntax. If so, in all cases in (3.21) we have (3.22) as a logical syntax representation:

(3.22) DP [V [P pro]] → . . . [V-P] . . . → DP (λx ([V-P] x x))

We can see now, that at logical syntax level we have a formally reflexive predicate. Such a predicate must be licensed, which explains the presence of SELF in all cases. In French there is no V-P reanalysis. Hence, we obtain (3.23):

(3.23) DP [V [P pro]] → DP (λx (V x [P x]))

Here, translating into logical syntax does not result in a formally reflexive predicate. This entails that no formal licensing is required.8 Hence it is indeed expectations, or other non-grammatical factors that may determine whether a focus marker like même is required. In a nutshell, we see how in one language a grammatical computation may cover an interpretive contrast that shows up in another.

3.3.3 Locally bound pronominals
Certain languages allow pronominals to be locally bound. Frisian is a case in point. Frisian is instructive, since the literature occasionally reports that Frisian has no specialized anaphors (Schladt 2000). This is in fact incorrect, since in Frisian as in Dutch, reflexivity must be licensed by adding sels to the pronominal, yielding himsels.9 A rule of thumb is that Frisian has the bare pronominal him where Dutch has the simplex anaphor zich.10 So, in Frisian one finds the pattern in (3.24):

(3.24) a. Jan  wasket    him
          John washes    himself
       b. Jan  fielde    [him     fuortglieden]
          John felt      [himself slip away]
       c. Jan  bewûndere him*(sels)
          John admired   himself

8 Note, that by itself même is quite different from SELF (for instance in French there is no *même-admiration along the lines of English self-admiration, etc.).
9 The occasional claim that Frisian does not have specialized anaphors is reminiscent of the claim that Old English does not have them. For Frisian the claim is incorrect. For Old English I consider the claim as inconclusive. Going over Old English texts, it is striking that all the cases that are usually adduced in support of the claim are in fact cases that would make perfectly acceptable Frisian. What one would need are clear cases of sentences with predicates such as hate or admire to settle the point.
10 Note that him is indeed a fully-fledged pronominal that can be used as freely as English him or Dutch hem.
This is one of the cases where re-ranking of constraints might descriptively work. However, Reuland and Reinhart (1995) present independent evidence that pronominals such as him in Frisian are under-specified for structural case. Fully explaining the consequences of this difference in feature specification would lead us beyond the scope of this article. For current purposes a fairly crude suggestion suffices: whereas zich’s under-specification for number makes it possible for Dutch zich to form a syntactic dependency with its antecedent along the lines indicated in (3.18), the case system in Frisian does not enable an element in the position of the pronoun in (3.18) to enter a syntactic dependency with the antecedent, that is the link R3 is not effective for encoding in Frisian. This goes beyond just saying that Frisian happens to lack a zich-type anaphor. Even if Frisian had a zich-type anaphor, it would not be more economical than using him. In any case, the case correlation shows the importance of considering the fine grain of grammar to understand the syntactic processes of binding.

As discussed in Reuland and Reinhart (1995) German dialects also exhibit interesting variation in the local binding of pronominals. This variation is case related as well, although so far a detailed analysis has not been given. Yet, any theory that fails to take into account that it is case that links DPs, and therefore also anaphors and pronominals, to the fabric of the sentential structure is at risk of missing the source of the variation.

This paves the way for discussing an instance of grammar-based gradedness in Dutch. Since Dutch has a three-way system, with a simplex anaphor, a complex anaphor, and a pronominal, the Dutch counterpart of (3.24) has a couple more options than Frisian:

(3.25) a. Jan  wast        zich/1hem
          John washes      himself
       b. Jan  voelde      [zich/1hem wegglijden]
          John felt        [himself   slip away]
       c. Jan  bewonderde  zichzelf/1zich/2hem
          John admired     himself
In (3.25a) and (3.25b), using hem instead of zich only violates the principle that it is more economical to have an anaphor than a pronominal. In (3.25c), however, hem violates two principles, economy and the principle that reflexivity be licensed. And indeed, we do find different degrees of ill-formedness reflecting the number of violations.
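The violation counts behind (3.25) can be tallied mechanically. The following Python sketch is a toy illustration, not an implementation of the theory: the `FORMS` table, the function `count_violations`, and the Boolean `needs_licensing` (true for a predicate like bewonderen in (3.25c), false for the environments in (3.25a) and (3.25b)) are all hypothetical simplifications introduced here.

```python
# Hypothetical inventory of the three Dutch forms discussed in (3.25).
FORMS = {
    "zichzelf": {"anaphor": True, "has_self": True},   # complex anaphor
    "zich": {"anaphor": True, "has_self": False},      # simplex anaphor
    "hem": {"anaphor": False, "has_self": False},      # pronominal
}


def count_violations(form: str, needs_licensing: bool) -> int:
    """Count how many of the two principles a locally bound form violates."""
    props = FORMS[form]
    violations = 0
    if not props["anaphor"]:
        # Economy: an anaphor is cheaper than a pronominal.
        violations += 1
    if needs_licensing and not props["has_self"]:
        # Licensing: a reflexive predicate must be licensed by SELF.
        violations += 1
    return violations


# (3.25a, b): environments in which no SELF-licensing is required.
for form in ("zich", "hem"):
    print(form, count_violations(form, needs_licensing=False))

# (3.25c): bewonderen 'admire' requires licensing.
for form in ("zichzelf", "zich", "hem"):
    print(form, count_violations(form, needs_licensing=True))
```

On these assumptions hem incurs one violation in (3.25a) and (3.25b) and two in (3.25c), while zich incurs one in (3.25c), which is the ordering of ill-formedness described in the text.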
3.3.4 Variability and gradedness in binding relations
It is time for a summary of how real or apparent variation in binding arises. The collective–distributive contrast in the interpretation of zich versus zichzelf is a potential source of variation in judgements. Unless collectivity versus distributivity is systematically controlled for, an investigator may have the impression of finding arbitrary intra- and inter-subject variation where in fact there is underlying systematicity. Assessing binding into PPs, one must distinguish between categorical, grammatical factors, and non-categorical extra-grammatical factors. In Dutch PPs of the type Jan praat over zichzelf ‘Jan talks of himself’ grammatical factors overrule possible sources of variation from discourse: no variation between complex and simplex anaphors obtains. However, in French Jean parle de lui/lui-même ‘Jean talks of him/himself’ there are two options, since grammar leaves the choice of strategy open. Judgements will, therefore, be open to variation governed by intra-subject shifts in perspective, like Necker cube effects in visual perception. Something similar obtains in Dutch locative PPs, where the grammar leaves open the choice between zich and hem, and consequently, grammatically they are in free variation. Yet, the choice is not arbitrary: informally speaking, Jan keek onder zich ‘Jan looked under SE’ is presented from Jan’s perspective, whereas Jan keek onder hem ‘Jan looked under him’ is presented from the speaker’s perspective. Subtle though this is, this is another factor to be controlled for. As the contrasts in (3.5) and (3.6) showed, perspective with its concomitant variability is also involved in the licensing of exempt anaphors. Extragrammatical factors become visible where the categorical distinctions of grammar leave interpretive options open. I presented case as a factor in the local binding of pronominals.
In languages without a strong morphology, case by itself is a low-level variable factor in the grammar, and a typical area in which we find dialectal variation and variation across registers in language. Slight variations in the case system may have high level effects on binding possibilities of pronominals, as may other types of variation in feature composition. In addition we saw that wherever binding phenomena are determined by the interaction between several conditions, gradedness phenomena also can be explained from the satisfaction of some, but not all conditions.
3.4 By way of conclusion
I realize that the discussion in this article implies that no easy explanations are forthcoming in linguistics. Being a pronominal or being an anaphor are no longer characterizations that can be taken at their face value. What are the components of a pronominal or anaphoric expression? What grammatical features do they have? What are the properties of the case system? How is agreement effected? How are prepositions and verbs related? What lexical operations does a language have? What morphological operations are available to manipulate the argument structure of verbs? The answers to such questions are needed in order to assess whether putative instances of tendencies, or grading, are really in the purview of grammar. If certain issues of language have to remain outside our grammars, this simply reflects the fact that, whatever grammar does, it certainly does not tell us all of how to say what we want to say.
4 Linguistic and Metalinguistic Tasks in Phonology: Methods and Findings
STEFAN A. FRISCH AND ADRIENNE M. STEARNS
4.1 Introduction
Phonologists have begun to consider the importance of gradient phonological patterns to theories of phonological competence (e.g. Anttila 1997; Frisch 1996; Hayes and MacEachern 1998; Ringen and Heinamaki 1999; Pierrehumbert 1994; Zuraw 2000; and for broader applications in linguistics see the chapters in Bod et al. 2003 and Bybee and Hopper 2001). Gradient patterns have been discovered in both internal evidence (the set of existing forms in a language) and external evidence (the use and processing of existing and novel forms in linguistic and metalinguistic tasks). The study of gradient phonological patterns is a new and promising frontier for research in linguistics. Grammatical theories that incorporate gradient patterns provide a means to bridge the divide between competence and performance, most directly in the case of the interface between phonetics and phonology (Pierrehumbert 2002, 2003) and between theoretical linguistics and sociolinguistics (e.g. Mendoza-Denton et al. 2003). In addition, linguistic theories that incorporate gradient patterns are needed to unify theoretical linguistics with other areas of cognition where probabilistic patterns are the rule rather than the exception. A variety of methodologies have been used in these studies, and an overview of these methodologies is the primary focus of this paper. The methodologies that are reviewed either attempt to directly assess phonological knowledge (well-formedness tasks and similarity tasks) or indirectly reflect phonological knowledge through performance (elicitation of novel forms, corpus studies of language use, errors in production and perception). A case
study demonstrating an approach that combines both internal and external evidence is also presented.
4.2 Approaches to phonological judgements (data sources)
The evidence for gradient patterns as part of the phonological knowledge of a language comes from a variety of sources, including psycholinguistic experiments using metalinguistic and language processing tasks, and studies of language corpora. In this section, these methods are briefly reviewed. While the focus of this chapter is on tasks involving metalinguistic judgements, additional sources of evidence that indirectly reflect linguistic knowledge, such as corpus studies, are also reviewed. When these different methodological approaches are used together, strong evidence for the relevance of gradient phonological patterns can be obtained. For example, lexical corpus studies have revealed numerous systematic statistical patterns in the phonotactics of a variety of languages. If metalinguistic tasks where participants make judgements about novel forms that share those same patterns demonstrate that participants make probabilistic generalizations, then it must be the case that those statistical patterns are part of the phonological knowledge of the participants.

4.2.1 Direct measures
Direct measures of phonological judgements are tasks that explicitly ask participants to make a judgement about a lexical item or phonological form. These tasks probe the explicitly available linguistic knowledge of participants.

4.2.1.1 Well-formedness judgements
The most commonly used judgement tasks are well-formedness judgement tasks. Well-formedness judgement tasks attempt to directly probe the grammaticality of phonological forms. Two variants of the well-formedness judgement task are the acceptability task and the wordlikeness task. In the acceptability task, the participant is presented with a novel phonological form and given two choices, acceptable/possible or unacceptable/impossible. This task is equivalent to the prototypical grammaticality judgement task used elsewhere in linguistics.
While the acceptability task might, at first glance, appear to reduce the likelihood of collecting gradient data, gradience is still commonly found in acceptability judgement tasks in variation in the response to a stimulus item across participants. For example, Coleman and Pierrehumbert (1997) collected acceptability judgements for
novel multisyllabic words containing illegal onset consonant clusters. They found considerable variability across participants in the acceptability of these clusters, and this variability was predicted in part by the probability of the rest of the novel word. An illegal cluster in a high probability context was accepted by more participants than an illegal cluster in a low probability context. The wordlikeness task attempts to measure gradient well-formedness within individual participants. Participants are asked to judge the degree to which a novel word could be a word in their language. In Frisch et al. (2000) and Frisch and Zawaydeh (2001), these judgements were on a one to seven scale, where ‘one’ represents a word that could not possibly be a word in the language and ‘seven’ represents a word that could easily be a word in the language. Frisch et al. (2000) and Frisch and Zawaydeh (2001) collected wordlikeness judgements for novel multisyllabic words in English and Arabic respectively. As in the Coleman and Pierrehumbert (1997) study, one factor that was shown to strongly affect wordlikeness is the aggregate probability of the constituents in the novel word. An example of this task applied to novel monosyllabic words is given in the case study section of this chapter. Frisch et al. (2000) collected both acceptability judgements and wordlikeness judgements for the same novel word stimuli in English. In a comparison of the two sets of data, they found that participants appeared to perform a similar judgement task whether they were allowed just two options (acceptable/unacceptable) or seven (1–7). In particular, Frisch et al. (2000) found that the ‘unacceptable’ option was used in the acceptability task more frequently than the rating of ‘one’ in the wordlikeness task. Thus it appeared that participants applied the acceptable/unacceptable rating like a wordlikeness task on a scale from ‘one’ to ‘two’.
Overall, they found that phonotactic probability predicted participant judgements in both tasks, and that participant judgements were similar across the two tasks. Given that acceptability and wordlikeness tasks generated similar results, but the wordlikeness task allows participants to express more subtle distinctions between forms, researchers interested in gradient phonological patterns would likely benefit from using the wordlikeness task. Well-formedness judgement tasks have also been used to probe morphophonological knowledge. Zuraw (2000) investigated nasal substitution in Tagalog. Nasal substitution in Tagalog is a phonological process where a stem initial obstruent consonant is replaced by a homorganic nasal. For example, the stem /kamkam/ ‘usurpation’ surfaces with initial /ŋ/ in /ma-pa-ŋamkam/ ‘rapacious’. This process is lexically gradient, occurring with some stems but not others. The likelihood that a stem participates in nasal substitution depends on the obstruent at the onset of the stem. Substitution is
more likely for /p t s/ and less likely for /d g/. Zuraw (2000) used elicitation tasks to explore the productivity of this process, and she also used a well-formedness task where participants judged the acceptability of nasal substitution constructions using novel stems in combination with common prefixes. She compared wordlikeness judgements for the same novel stem in forms with and without nasal substitution, and found that ratings were higher for nasal substitution in cases where nasal substitution was more common in the lexicon (e.g. /p t s/ onsets to the novel words). Hay et al. (2004) examined the influence of transitional probabilities on wordlikeness judgements for novel words containing medial nasal-obstruent clusters (e.g. strimpy, strinsy, strinpy). Overall, they found that wordlikeness judgements reflected the probability of the nasal-obstruent consonant cluster. However, they found surprisingly high wordlikeness for novel words when the consonant cluster was unattested (zero probability) in monomorphemic words. They hypothesized that these high judgements resulted from participants’ analyses of these novel words as multimorphemic. This hypothesis was supported in additional experiments where subjects were asked to make a forced choice decision between two novel words as to which was more morphologically complex. Participants were more likely to judge words with low probability internal transitions as morphologically complex, demonstrating that participants considered multiple analyses of the forms they were given. Hay et al. (2004) proposed that participants would assign the most probable parse to forms they encountered (see also Hay 2003). The findings of Hay et al. (2004) highlight the importance of careful design and analysis of stimuli in experiments using metalinguistic tasks.
Although these tasks are meant to tap ‘directly’ into phonological knowledge, the judgements that are given are only as good as the stimuli that are presented to participants and there is no guarantee that the strategy employed by participants in the task matches the expectations of the experimenter. Another example of this problem appears in the case study in this chapter, where perceptual limitations of the participants resulted in unexpected judgements for nonword forms containing presumably illegal onset consonant clusters.

4.2.1.2 Distance from English

Greenberg and Jenkins (1964) is the earliest study known to the authors that examined explicit phonological judgements for novel word forms in a psycholinguistic experiment. They created novel CCVC words that varied in their ‘distance from English’ as measured by a phoneme substitution score. For each novel word, one point was added to its phoneme substitution score for every position or combination of positions for which phoneme substitution could make a word. For example, for the novel
word /druk/ there is no phoneme substitution that changes the first phoneme to create a word. However, if the first and second phonemes are replaced, a word can be created (e.g. /fluk/). For every novel word, substitution of all four phonemes can make a word, so each novel word had a minimum score of one. Greenberg and Jenkins (1964) compared this cumulative edit distance measure with participants’ ratings of the novel words for their ‘distance from English’, collected using an unbounded magnitude estimation task. In this task, participants are asked for a number for ‘distance from English’ based on their intuition, with no constraint given to them on the range of numbers to be used. Greenberg and Jenkins (1964) found a strong correlation between the edit distance measure and participants’ judgements of ‘distance from English’. They also found similar results when they used the same stimuli in a wordlikeness judgement task where participants rated the words on an 11-point scale. Given that the data from a wordlikeness task are simpler to collect and analyse, there appears to be no particular advantage to collecting distance judgements in the study of phonology.

4.2.1.3 Similarity between words

Another phonological judgement task is to ask participants to judge the similarity between pairs of real words or novel words (Vitz and Winkler 1973; Sendlmeier 1987; Hahn and Bailey 2003). This task has been used to investigate the dimensions of word structure that are most salient to participants when comparing two words. Presumably, these same dimensions of word structure would be the most salient to participants in making well-formedness judgements which, in some sense, involve comparing a word to the entire lexicon. These tasks have found that word initial constituents and stressed constituents have greater impact on similarity judgements than other constituents.
However, it is well known from the literature on similarity judgements in cognitive science that similarity comparisons are context dependent (Goldstone et al. 1991). Thus the most salient factors for a pairwise comparison of words might not be the most salient factors when a novel word is compared to the language as a whole. This is a research area where very little direct work has so far been done (see Hahn and Bailey 2003). Frisch et al. (2000) analysed a variety of predictors for their well-formedness judgements and found some evidence that word initial constituents and stressed constituents had a greater role in predicting well-formedness, at least in the case of longer novel words. Thus, it appears that similarity judgements are a tool that can investigate the same dimensions of lexical structure and organization from a different perspective. If a well-defined relationship between well-formedness and similarity judgements can be found, then that would suggest more explicitly that the
well-formedness task is somehow grounded in a similarity judgement to one or more existing lexical items.

4.2.2 Indirect measures

Indirect measures of well-formedness reflect grammatical linguistic knowledge through linguistic performance. Indirect measures have provided evidence for the psychological reality of gradient phonological patterns, as several studies have shown participant performance to be influenced by the phonological probability of novel words or sub-word constituents.

4.2.2.1 Elicitation of novel forms

The traditional linguistic approach to exploring competence experimentally is to elicit examples that demonstrate the linguistic process of interest. This procedure has been extended to elicitations that involve nonsense forms, demonstrating unequivocally that a process is productive. This type of experiment is sometimes referred to as ‘wug-testing’, in honour of one of the first uses of this technique by Berko (1958) to explore children’s knowledge of plural word formation using creatures with novel names (e.g. ‘this is a wug, and here comes another one, so now there are two of them; there are now two ____’). Elicitation of novel forms can be used to examine gradient phonological patterns. Variation across participants in their performance in an elicitation task provides evidence for an underlying gradient constraint. Alternatively, if participants are given a number of stimuli of roughly the same phonological form to process, variation within an individual can be observed (Zuraw 2000).

4.2.2.2 Analysis of lexical distributions

Linguistic corpora have also been used as a source of external evidence. Generally, these corpora have resulted from large, organized research projects such as CELEX (Burnage 1990) or the Switchboard corpus (Godfrey et al. 1992). Recently, however, web searches have been used to generate corpora from the large amount of material posted on the internet (e.g. Zuraw 2000).
Written corpora may be more or less useful for phonological study, depending on the phenomenon to be examined and the transparency of the writing system for the language involved. Written corpora have shown the greatest potential in the study of morphophonology. For example, Anttila (1997) and Ringen and Heinämäki (1999) examined the quantitative pattern of vowel harmony in Finnish suffixes using large written corpora. They found that harmony is variable, with the likelihood of harmony depending on the distance between the harmony trigger and target, and also the prosodic prominence of the trigger. In a subsequent psycholinguistic task, Ringen and Heinämäki (1999) found that type frequencies from their corpus predicted well-formedness judgements of native speakers.
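As an illustration of the kind of count that corpus studies of this sort rely on, the following minimal sketch contrasts type frequency (the number of distinct words showing a pattern) with token frequency (the number of occurrences of those words in running text). The word list, the matching on ordinary spelling, and the function name are hypothetical and merely stand in for a real corpus such as those cited above.

```python
from collections import Counter


def type_and_token_frequency(tokens, has_pattern):
    """Return (type frequency, token frequency) for words matching a pattern."""
    counts = Counter(tokens)
    matching = {word: n for word, n in counts.items() if has_pattern(word)}
    # Types: distinct matching words. Tokens: all occurrences of those words.
    return len(matching), sum(matching.values())


# Hypothetical toy corpus in ordinary spelling (not data from any cited study).
TOY_CORPUS = ["play", "plan", "play", "pray", "stop", "play", "plan", "tree"]

types, tokens = type_and_token_frequency(TOY_CORPUS, lambda w: w.startswith("pl"))
print(types, tokens)
```

In this toy corpus a pattern carried by a few very frequent words has a high token frequency but a low type frequency, which is why the two measures can make different predictions.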
4.2.2.3 Confusability in perception and production

Gradient well-formedness of phonological forms is also connected to speech perception and production processes. It is well documented that word perception is influenced by the number of phonologically similar words in the lexicon, known as the size of the word’s lexical neighbourhood (e.g. Luce and Pisoni 1998). However, lexical neighbourhood sizes are correlated with phonological probability and the relationship between these two measures has yet to be clearly established (Bailey and Hahn 2001; Luce and Large 2001). Overall, words that share sub-word constituents such as onsets, rimes, or nuclei with many other words are subject to competition during spoken word recognition. In addition, generalizations about more and less frequent sub-word sequences bias phonetic perception in favour of more frequent sequences (Pitt and McQueen 1998; Moreton 2002). Together, these findings suggest that lexical organization for language processing utilizes those sub-word phonotactic constituents.

Similar effects of phonotactic probability have been demonstrated in speech production. For example, Munson (2001) probed the relationship between phonological pattern frequency and repetition of nonwords by adults and children. The nonwords created for Munson’s study used consonant clusters embedded in a two-syllable construct (CVCCVC). The stimuli were first tested to determine if the frequency of constituents affected wordlikeness ratings by participants. The wordlikeness ratings were highly correlated with the phonological pattern frequency of the sound sequences within the novel words. Munson then used the same stimuli to study the ability of children and adults to accurately repeat nonwords and found that subjects were less accurate in their repetition of nonwords that contained less frequent phoneme sequences.
In addition, Munson (2001) found there was greater variability within participants’ productions for low probability items versus high probability items. Similar speech production effects have been found elsewhere for monolingual adults (e.g. Vitevitch and Luce 1998) and second language learners (e.g. Smith et al. 1969), although with additional complications that will not be discussed here. These results further support theories that information about phonological pattern frequency is encoded at the processing and production levels of linguistic representation.

4.2.3 Summary

Studies of language patterns, language processing, and metalinguistic judgements have found substantial evidence that probabilistic phonological patterns are part of the knowledge of language possessed by speakers. These
patterns are reflected in the frequency of usage of phonological forms, the ease of processing of phonological forms, and gradience in metalinguistic judgements about phonological forms. In the next section, a case study is presented that demonstrates probabilistic patterns in the cross-linguistic use of consonant clusters and gradience in metalinguistic judgements about novel words with consonant clusters that reflects the cluster’s probability.
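One of the measures reviewed in this section, the phoneme substitution score of Greenberg and Jenkins (1964) from Section 4.2.1.2, can be made concrete with a short sketch. It assumes that a substitution at a set of positions means replacing every phoneme in that set with a different phoneme while leaving the others unchanged, which is one reading of the description given there. The toy lexicon and the function name are hypothetical, and this is not a reconstruction of the original scoring procedure.

```python
def substitution_score(novel, lexicon):
    """Count the sets of positions whose substitution can yield a real word.

    Each real word of the same length contributes the set of positions at
    which it differs from the novel form; the score is the number of
    distinct non-empty sets obtained this way.
    """
    patterns = set()
    for word in lexicon:
        if len(word) != len(novel):
            continue
        differing = frozenset(
            i for i, (a, b) in enumerate(zip(word, novel)) if a != b
        )
        if differing:  # an identical word involves no substitution at all
            patterns.add(differing)
    return len(patterns)


# Hypothetical toy lexicon, one symbol per phoneme (not the authors' materials).
TOY_LEXICON = [
    ("f", "l", "u", "k"),  # differs from /druk/ in positions 1 and 2
    ("d", "r", "u", "p"),  # differs in position 4 only
    ("t", "r", "u", "θ"),  # differs in positions 1 and 4
    ("s", "t", "ɑ", "p"),  # differs in all four positions
]

score = substitution_score(("d", "r", "u", "k"), TOY_LEXICON)
print(score)
```

With a full dictionary in place of the toy lexicon, the same function would return a score between one and fifteen for any four-phoneme novel word, since four positions allow fifteen non-empty combinations.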
4.3 Case study

The example case study presented here focuses on initial consonant clusters. In a linguistic corpus study, the frequency of word onset consonant clusters across and within languages is examined. Cross-linguistic patterns of consonant clustering are reflected in the likelihood of use of a consonant cluster type within a language. The pattern appears to be gradient, and based on the scalar property of sonority. In an experimental study, wordlikeness for novel words based on the type frequency of the onset cluster is examined. Wordlikeness judgements are affected by consonant cluster frequency, showing that native speakers are sensitive to consonant cluster frequency, and thus are capable of encoding a gradient onset consonant cluster constraint.

4.3.1 Cross-linguistic patterns in onset clusters

It has been claimed that consonant cluster combinations are restricted within and across languages by sonority sequencing. Sonority is a property of segments roughly corresponding to the degree of vocal tract opening or acoustic intensity (Hooper 1976). Stop consonants have the lowest sonority and liquids and semi-vowels have the highest sonority. Analyses of sonority sequencing constraints have generally assumed that there is a parametric restriction on the degree of sonority difference required in a language, where larger sonority differences are allowed and smaller sonority differences are not allowed (e.g. Kenstowicz 1994). The current study shows that sonority restrictions are gradient, rather than absolute. A closer examination of sonority sequencing for onsets presented in this section shows that the scalar property of sonority is reflected quantitatively in the frequency of occurrence of consonant cluster types across languages. In addition, the data support an analysis of sonority constraints based on ‘sonority modulation’ (Ohala 1992; Wright 1996).
The cross-linguistic preference is for onset consonant clusters that have a large sonority difference, whether rising or falling. Consonant clusters with large sonority differences occur more frequently than consonant clusters with small sonority differences across a wide range of languages.
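To make the notion of sonority modulation concrete, the sketch below places four consonant classes on an assumed ordinal sonority scale and computes the direction and size of the sonority change across a two-consonant cluster. The numeric ranks and the function name are illustrative assumptions: only the ordering of the classes matters, and the relative ranking of fricatives and nasals follows common practice rather than anything stated in this chapter.

```python
# Assumed ordinal ranks, lowest to highest sonority (illustrative values only).
SONORITY_RANK = {"stop": 1, "fricative": 2, "nasal": 3, "liquid": 4}


def sonority_modulation(c1_class, c2_class):
    """Return the direction and magnitude of the sonority change from C1 to C2."""
    change = SONORITY_RANK[c2_class] - SONORITY_RANK[c1_class]
    if change > 0:
        direction = "rise"
    elif change < 0:
        direction = "fall"
    else:
        direction = "plateau"
    return direction, abs(change)


for cluster in [("stop", "liquid"), ("fricative", "stop"), ("stop", "stop")]:
    print(cluster, sonority_modulation(*cluster))
```

Under a sonority modulation account it is the magnitude, not the direction, that is expected to track how often a cluster type is attested.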
Evidence for gradient sonority modulation constraints can be found by looking at the distribution of consonant clusters within a language. Although a particular degree of sonority modulation may be permitted in a language, it is usually the case that not every conceivable cluster for a particular sonority difference is attested. For example, consider word final liquid+C clusters in English, as in pert, purse, fern, furl. Table 4.1 shows the number of attested word final liquid+C cluster types, the number that are theoretically possible given the segment inventory of English, and the percentage of possible clusters that are attested, for the four different levels of sonority difference. For example, attested cluster types include /rt/, /rs/, /rn/, and /rl/ as demonstrated in the examples above, as well as /lt/ (e.g. pelt, wilt, salt), /lb/ (bulb), /lv/ (delve, valve), /lm/ (elm, psalm). Unattested clusters include /lg/, /rŋ/, and /lr/. The number of possible consonant clusters is determined by the size of the segment inventory. For example, English has three nasal consonants and two liquids, so there are six theoretically possible liquid+nasal combinations. Notice in Table 4.1 that as the sonority difference between C and the preceding liquid decreases, the proportion of attested clusters decreases, even when the possible number of cluster combinations is taken into account.

Similar quantitative analyses of permissible consonant clusters were conducted for onset clusters in a sample of thirty-seven languages from a variety of language families. In this analysis, attested and possible cluster types were determined for all combinations of two stop, fricative, nasal, and liquid consonants as word onsets.
Table 4.1. Word final liquid+C clusters in English

C            Attested   Possible   % Attested
Stop         11         12         92%
Fricative    11         16         69%
Nasal        4          6          67%
Liquid       1          2          50%

The languages analysed were Abun, Aguatec (Mixtec), Albanian, Amuesha, Chatino, Chinantec (Quioptepec), Chinantec (Usila), Chontal (Hokan), Chukchee, Coeur d’Alene, Cuicatec, Dakota (Yankton), English, Greek, Huichol, Hungarian, Ioway-Oto, Italian, Keresan, Khasi, Koryak, Kutenai, Mazatec, Norwegian, Osage, Otomi (Mazahua), Otomi (Temoayan), Pame, Portuguese, Romanian, Takelma, Telugu, Terena, Thai, Totonaco, Tsou, and Wichita. For each language, a four-by-four table similar to Table 4.1 was created with the percentage of attested clusters for each combination of consonant type (e.g. stop-stop, stop-fricative, fricative-stop, etc.). Each four-by-four table was then analysed to determine whether larger sonority differences were reflected in greater numbers of attested clusters while smaller sonority differences were reflected in smaller numbers of attested clusters. For each four-by-four table, there are twenty-four such comparisons that could be made (e.g. stop-stop to stop-fricative, stop-stop to stop-nasal, stop-stop to fricative-stop, etc.). The mean across languages was 74 per cent of comparisons supporting sonority modulation (e.g. fricative-stop and stop-fricative more frequent than stop-stop and fricative-fricative). In other words, possible clusters with large sonority differences are more frequently attested than possible clusters with small sonority differences cross-linguistically. For these onset clusters, sonority modulation appeared to be equally robust for rises toward the syllable nucleus (75 per cent) as for falls toward the nucleus (73 per cent). The minimum for any particular language was 50 per cent of comparisons supporting sonority modulation (and so 50 per cent not supporting it) and the maximum was 100 per cent of comparisons supporting sonority modulation. Interestingly, non-modulation was never more common than sonority modulation for any of the thirty-seven languages, suggesting that sonority modulation is a universal gradient cross-linguistic constraint.

4.3.2 Wordlikeness judgements

Given the apparent presence of quantitative sonority modulation constraints, the next question to be examined is whether these quantitative constraints are part of the synchronic grammar of a native speaker. In this section, an experiment is presented that demonstrates that onset consonant cluster frequencies influence well-formedness judgements of native speakers of English.
In this experiment, native speakers of English rated novel monosyllabic words with onset consonant clusters for wordlikeness, as in previous studies (e.g. Frisch et al. 2000).

4.3.2.1 Stimuli

The stimulus list was created by randomly matching twenty different onset consonant clusters with varying frequencies of occurrence in the lexicon (type frequency) to rimes selected from the mid-range of rime frequencies. The occurrence frequency of each constituent was obtained from a computerized American English dictionary (Nusbaum et al. 1984). Among the consonant clusters were four clusters that do not occur in English (/sr, tl, dl, θl/) and the remaining sixteen ranged in frequency from clusters that occur very rarely (/gw, sf, dw/) or somewhat rarely (/tw, sm, θr, sn/) to clusters
that occur moderately frequently (/kw, dr, sl, fr/) or very frequently (/pl, sp, fl, gr, pr/) in the lexicon. The nonword list was analysed to avoid potential confounding factors for wordlikeness judgements such as violating a phonotactic constraint somewhere other than in the onset. Rime statistics were also compiled to examine the effects of rime frequency on judgements (as in Kessler and Treiman 1997). After discarding items that were not suitable, 115 novel words remained to be used in the experiment. The nonwords were recorded as spoken by the first author using digital recording equipment.

4.3.2.2 Participants

Thirty-five undergraduate students in an introductory communication sciences and disorders course participated in the experiment. Subjects were between 19 and 45 years of age, and three males and thirty-two females participated. All participants were monolingual native speakers of American English and reported no past speech or hearing disorders.

4.3.2.3 Procedure

The experiment was conducted using ECOS/Win experiment software. Participants were seated at individual carrels for the experiment in groups of one to four. Subjects listened to each of the stimuli presented one at a time through headphones at a comfortable listening level. The computer screen displayed a rating range between 1 (not at all like English) and 7 (very much like English) and the participants gave their responses by clicking with a mouse on the button that corresponded with their rating. The total experiment required approximately 15 minutes of the participants’ time to complete.

4.3.2.4 Results

The data were analysed based on the mean rating given to each stimulus word across subjects. Three instances where a participant gave no response to the stimulus were discarded; otherwise, all data were analysed. Correlations were examined for the type frequency of onset CC, C1, C2, rime, nucleus, and coda.
As expected, mean wordlikeness judgements correlated significantly with the type frequency of the CC sequences contained in the novel words. The CC frequency was the strongest predictor of how the participants judged wordlikeness (r = 0.38). The frequency of the nucleus was also correlated with the participants’ judgements of wordlikeness (r = 0.19). In a regression model of the wordlikeness data using these two factors, both factors were found to be significant (CC: t(112) = 4.1; p < .001; Nucleus: t(112) = 2.0; p < .05). The amount of variance explained by the CC frequency and the nucleus frequency is relatively small (cf. Munson 2001; Hay et al. 2004). Unexpectedly, we found that participant judgements of the novel words containing CC that do not occur in English were fairly high. Possible explanations for this finding
are being investigated. Based on preliminary data collected to date, it appears that participants do not consistently perceive the illegal CC that is presented, but instead regularly perceive a similar sounding CC that is allowed in English (e.g. /tl/ → /pl/). As in the study by Hay et al. (2004) discussed above, experiment participants appear to have assigned a more probable parse to the unattested sequences present in the stimuli. Setting aside the unattested clusters, the pattern for the remaining clusters is as expected. Mean ratings for the attested consonant clusters used in the experiment are shown in Figure 4.1 with a best fit line for the effect of log CC type frequency in these clusters. The data clearly show that the CC frequency was a strong indicator of the subjects’ ratings of stimulus words. Overall, then, this experiment demonstrates that English speakers are sensitive to the frequency of occurrence of onset consonant clusters, and thus have learned the patterns in the lexicon that would reflect a universal sonority modulation constraint. Thus it seems possible that English speakers (and presumably speakers of other languages) could learn a sonority modulation constraint.
Figure 4.1. Mean wordlikeness rating for occurring consonant clusters in English (averaged across stimuli and subjects) by consonant cluster type frequency [scatter plot of cluster labels with best fit line; x-axis: log CC type frequency, 1 to 1000; y-axis: mean rating, 1 to 7]
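The relationship summarized in Figure 4.1 is a linear fit of mean rating on the logarithm of type frequency. The sketch below shows how such a correlation and best fit line could be computed. It is not the authors' analysis script, and the frequency and rating values are invented placeholders chosen only so that the example runs; they are not values read from the figure.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def best_fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Placeholder values only: hypothetical type frequencies and mean ratings.
type_frequencies = [1, 10, 100, 1000]
mean_ratings = [2.0, 3.5, 3.5, 5.0]

log_frequencies = [math.log10(f) for f in type_frequencies]
r = pearson_r(log_frequencies, mean_ratings)
slope, intercept = best_fit_line(log_frequencies, mean_ratings)
print(round(r, 3), round(slope, 3), round(intercept, 3))
```

Taking the logarithm before fitting reflects the log-scaled frequency axis of the figure, so that each tenfold increase in type frequency corresponds to the same expected change in rating.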
4.4 Implications

Phonology provides an ideal domain in which to examine gradient patterns because there is a rich natural database of phonological forms: the mental lexicon. A growing number of studies using linguistic and metalinguistic evidence have shown that phonological structure can be derived, at least in part, from an analysis of patterns in the lexicon (e.g. Bybee 2001; Coleman and Pierrehumbert 1997; Frisch et al. 2004; Kessler and Treiman 1997; and see especially Hay 2003). In addition, it has been proposed that the organization and processing of phonological forms in the lexicon is a functional influence on the phonology, and that gradient phonological patterns quantitatively reflect the difficulty or ease of processing of a phonological form (Berg 1998; Frisch 1996, 2000, 2004).

For example, the storage of a lexical item must include temporal information, as distinct orderings of phonemes create distinct lexical items (e.g. /tɑp/ is different from /pɑt/ in English). The temporal order within a lexical item is reflected in some models of speech perception, such as the Cohort model and its descendants (e.g. Marslen-Wilson 1987), where lexical items compete first on the basis of their initial phonemes, and then later on the basis of later phonemes. Sevald and Dell (1994) found influences of temporal order on speech production. They had participants produce nonsense sequences of words and found that participants had more difficulty producing sequences where words shared initial phonemes. Production was facilitated for words that shared final phonemes (in comparison to words without shared phonemes). In general, models of lexical activation and access predict that lexical access is most vulnerable to competition between words for initial phonemes and less vulnerable for later phonemes, as the initial portions of the accessed word severely restrict the remaining possibilities.
This temporal asymmetry can be reflected quantitatively in functionally grounded phonological constraints. For example, phonotactic consonant co-occurrence constraints in Arabic are stronger word initially than later within the word (Frisch 2000). This asymmetry is compatible with the claimed grounding of this co-occurrence constraint in lexical access (Frisch 2004).

It has also been demonstrated in a wide range of studies that the phonological lexicon is organized as a similarity space (Treisman 1978; Luce and Pisoni 1998). The organization of the lexicon as a similarity space is reflected in processing differences based on activation and competition of words that share sub-word phonotactic constituents. The impact of similarity-based organization is most clearly reflected in cases of analogical processes between
phonologically similar words (Bybee 2001; Skousen et al. 2002). In related work, similarity has also been used to explain limitations on phonological processes. For example, Steriade (2000, 2001) claims that grammar is constrained to create maximally similar underlying and derived forms where similarity is defined over a perceptual lexical space.

Given the grounding of gradient phonological patterns in lexical distributions, it is unclear whether a distinct phonological grammar (phonological competence) is required above and beyond what is necessary to explain patterns of phonological processing (phonological performance). Ultimately, this is an empirical question. Arguments for and against a distinct phonological grammar have been made (e.g. Pierrehumbert 2003; Moreton 2002). However, different models of phonological processing and the lexicon make different predictions about the nature of phonological patterns. Traditionally, models of language processing hypothesize that abstract linguistic constituents, such as phonemes, onsets, rimes, and syllables, are used in speech perception, spoken word recognition, speech production, and lexical organization. These models assume that, at some stage of processing, an abstract representation is involved that is independent of specific instances (e.g. Dell 1986; Berent et al. 1999). This type of model uses generalizations that are naturally akin to the types of representations found in phonological grammars. These types of representations would encode gradient phonological patterns as frequency counts or probabilities for the symbols.

Recently, an alternative processing model based on instance-specific exemplars has received empirical support in language processing (e.g. Johnson 1997; Pierrehumbert 2002). Exemplar models can still explain generalization-based behaviour as a reflex of the collective activation of exemplars that are similar along some phonological dimension.
Thus, abstract phonological categories may not be needed, and consequently a grammar containing abstract phonological rules may be an illusion resulting from the stable behaviour that follows from the activation of a large number of overlapping exemplars. Exemplar models of language represent the frequency information in gradient phonological constraints in a completely straightforward and transparent way. The frequency information is a direct consequence of frequency of exposure and use. The symbolic and exemplar representation systems are not necessarily mutually exclusive. It is also possible that both abstract categories and instance-specific exemplars are part of a person’s phonological knowledge, creating a phonological system with redundancy and with competing analyses of the same phonological form (cf. Bod 1998; Pierrehumbert 2003).
4.5 Summary

Studies of gradience in phonology using linguistic and metalinguistic data have revealed a much closer connection between phonological grammar and the mental lexicon. These new dimensions of phonological variation could not have been discovered without corpus methods and data from groups of participants in psycholinguistic experiments. While the range of patterns that have been studied is still quite limited, the presence of gradient phonological constraints demonstrates that phonological knowledge goes beyond a categorical symbolic representation of possible forms in a language. In order to accommodate the broader scope of phonological generalizations, models of grammar will have to become more like models of other cognitive domains, which have long recognized and debated the nature of frequency and similarity effects for mental representation and processing. The study of phonology may provide a unique contribution in addressing these more general questions in cognition for two reasons. The scope of phonological variability is bounded by relatively well-understood mechanisms of speech perception and speech production. Also, phonological categories and phonological patterns provide a sufficiently rich and intricate variety of alternatives that the full complexity of cognitive processes can be explored.
5 Intermediate Syntactic Variants in a Dialect-Standard Speech Repertoire and Relative Acceptability

LEONIE CORNIPS
5.1 Introduction

Non-standard varieties under investigation, such as dialects throughout Europe, challenge research on the phenomenon of micro-variation in two ways.1 Within the framework of generative grammar, the linguist studies the universal properties of the human language in order to find out the patterns, loci, and limits of syntactic variation. Language is viewed essentially as an abstraction, more specifically, as a psychological construct (I-language) that refers primarily to differences between individual grammars within a homogeneous speech environment, that is to say, without consideration of stylistic, geographic, and social variation. Given this objective, a suitable investigative tool is the use of intuitions or native-speaker introspection, an abstraction that is quite normal within the scientific enterprise. Frequently, however, there are no sufficiently detailed descriptions available of syntactic phenomena that are of theoretical interest for investigating micro-variation in and between closely related non-standard varieties in a large geographical area (cf. Barbiers et al. 2002). Consequently, a complication emerges in that the linguist has to collect his own relevant data from speakers of local dialects who are non-linguists. The elicitation of speaker introspection often calls for a design of experiments in the form of acceptability judgements when the linguist has to elicit intuitions from these speakers (Cornips and Poletto 2005).

1 I would like to thank two anonymous reviewers for their valuable comments. Of course, all usual disclaimers apply.
Moreover, the standard variety may strongly interfere with local dialect varieties in some parts of Europe so that there is no clear-cut distinction between the standard and the local dialect. In this contact setting, a so-called intermediate speech repertoire (cf. Auer 2000), the speakers of local dialects may attribute all possible syntactic variants, that is, dialect, standard, and emerging intermediate variants, to their local dialect. Consequently, clear-cut judgements between the local dialect and the standard variety are not attainable at all. This is, among other factors, of crucial importance for understanding the phenomenon of gradedness in acceptability judgements.

This chapter is organized as follows. In the second part it is proposed that acceptability judgements do not offer a direct window into an individual’s competence. The third part discusses an intermediate speech repertoire that is present in the province of Limburg in the Netherlands. In this repertoire some constructions are difficult to elicit. Finally, acceptability judgements that are given by local dialect speakers in the same area are discussed. Using data from reflexive impersonal passives, reflexive ergatives, and inalienable possession constructions, it is argued that the occurrence of intermediate variants and the variation at the level of the individual speaker is not brought about by specific task-effects but is due to the induced language contact effects between the standard variety and the local dialects.
5.2 Relative acceptability

Bard et al. (1996: 33) discuss the important three-way distinction among grammaticality, a characteristic of the linguistic stimulus itself; acceptability, a characteristic of the stimulus as perceived by a speaker; and an acceptability judgement, which is the speaker’s response to the linguist’s enquiries. These authors note that relative grammaticality is an inherent feature of the grammar whereas relative acceptability reflects gradience in acceptability judgements. The former has a controversial status since it is not entirely clear how to deal with relative grammaticality in formal theory. According to Schütze, the best-known proponents of the view that grammaticality occurs on a continuum are Ross, Lakoff, and their followers in the late 1960s and early 1970s (see Schütze 1996: 62, 63 for more detail).

With respect to acceptability judgements, every elicitation situation is artificial: the speaker is being asked for a sort of behaviour that, at least on the face of it, is entirely different from everyday conversation (cf. Schütze 1996: 3). Moreover, Chomsky (1986: 36) argues that: ‘In general, informant judgments do not reflect the structure of the language directly; judgments of acceptability, for example, may fail to provide direct evidence as to grammatical status
because of the intrusions of numerous other factors’ (cf. Gervain 2002). ‘The intrusion of numerous other factors’ may lead to a crucial mismatch between the acceptability judgements of a construction and its use in everyday speech. One of these factors is that in giving acceptability judgements people tend to go by prescriptive grammar (what they learned at school, for instance) rather than by what they actually say (cf. Altenberg and Vago 2002). This is consistent with sociolinguistic research showing that prescriptive grammars usually equal standard varieties that are considered more ‘correct’ or have more prestige than the vernacular forms speakers actually use. Moreover, strong sociolinguistic evidence shows that a speaker may judge a certain form to be completely unacceptable but can, nevertheless, be recorded using it freely in everyday conversation (Labov 1972, 1994, 1996: 78). One way to diminish the prescriptive knowledge effect is to ask for indirect comparative acceptability judgements. Rather than eliciting direct intuitions by the formula: ‘Do you judge X a grammatical/better sentence than Y?’, speakers can be asked the more indirect: ‘Which variant Y or X do you consider to be the most or the least common one in your local dialect?’ Relative judgements can be administered by asking the speakers to indicate how uncommon or how common (for example represented by the highest/lowest value on a several-point scale, respectively) the variant is in their local dialect. Psychometric research shows that subjects are much more reliable on comparative ratings than on independent ones (cf. Schütze 1996: 79 and references cited there). These findings indicate that relative acceptability is an inevitable part of the speaker’s judgements. Relative acceptability is without doubt brought about by the complex relationship between I-language and E-language phenomena.
The opposition between these two types of phenomena is not necessarily watertight as is often claimed in the literature. Muysken (2000: 41–3) argues that the cognitive abilities which shape the I-language determine the constraints on forms found in the E-language and that it is the norms created within E-language as a social construct that make the I-language coherent. One example of this complex relationship may be that in a large geographical area two or more dialects may share almost all of their grammar (more objective perspective) but are perceived as different language varieties by their speakers (more subjective perspective). This can be due to the fact that dialects may differ rather strongly in their vocalism and that non-linguists are very sensitive to the quality of vowels. The perceived differences between dialects may be associated with different identities and vice versa. Another consequence may be that speakers actually believe that the norm created within their speech community (E-language; more subjective perspective) reflects their grammar
(I-language; more objective perspective). For example, in the Dutch Syntactic Atlas project (acronym SAND) we asked dialect speakers to ‘translate’ standard Dutch verbal clusters containing three verbs, as exemplified in (5.1), into their local dialect (cf. Cornips and Jongenburger 2001).2
(5.1) Ik weet dat Jan hard moet kunnen werken
      I know that Jan hard must can work
Quite a number of speakers told us that their dialects are simpler or more informal than standard Dutch. Therefore, sentences such as (5.1) are judged as ungrammatical or less grammatical than sentences containing two-verb clusters. However, there is not one Dutch dialect attested that excludes constructions as in (5.1). So, it is important to realize that the use of the dialect and the standard variety may be triggered by stylistic or social factors (in a specific setting, with specific interlocutors) if these varieties constitute a continuum. This information minimizes the risk that we are obtaining information about the prescriptive norms of the standard (or prestigious or formal) variety while our intention is rather to question speakers about their dialect (or vernacular non-standard) forms.
5.3 The intermediate speech repertoire
The so-called intermediate speech repertoire (cf. Auer 2000) as exemplified in the southeastern part of the Netherlands (province of Limburg) is presumably the most widespread in Europe today. In this repertoire, there is a structural or genetic relationship between the standard variety and the dialects (cf. Auer 2005). The influence of the standard variety on the dialects is quite manifest. There is no longer a clear-cut separation between the varieties, that is to say, speakers can change their way of speaking without a clear and abrupt point of transition between these varieties. This is of crucial importance to understanding relative acceptability. In general, syntactic elicitation poses no difficulties if structures are grammatical in the standard variety and ungrammatical in the dialects (Cornips and Poletto 2005). The speakers usually refuse these constructions, either by providing a grammatical alternative or by simply not being able to translate the sentence and, hence, giving no response. For instance, in the local dialect of Heerlen (a city in the south of the province of Limburg in the Netherlands, see Figure 5.1) negation agreement in (5.2a) is ungrammatical.
2 More information about the SAND-project can be found at: http://www.meertens.nl/projecten/sand/sandeng.html
The local dialect speaker easily provides an alternative, as shown in (5.2b) (taken from the Dutch Syntactic Atlas project):
(5.2) Instruction: ‘Translate into your local dialect’
   a. Er wil niemand niet dansen
      expl will no one not dance
      ‘No one wants to dance.’/‘Everyone wants to dance.’
   Translation:
   b. Gene wilt danse
      no one wants dance
      ‘No one wants to dance.’
Second, speakers may provide syntactic features that are obligatory in the local dialect even if the same phenomenon is banned from the standard variety. In this case speakers seem to be able to distinguish whether a given construction is grammatical without interference from prescriptive norms. In the dialect of Heerlen and standard Dutch, there is a very sharp contrast between the grammaticality of the impersonal passive with and without a reflexive, respectively. The local dialect speakers in Heerlen were asked whether they encounter the variant in (5.3a) in their local dialect. This variant is fully ungrammatical in the standard variety. The majority of the subjects (16 out of 24, 67 per cent) provide an affirmative answer, which is confirmed by their translation, as exemplified in (5.3b):3
(5.3) Local dialect of Heerlen
   Instruction: ‘Do you encounter this variant?’
   a. Er wordt zich gewassen
      there_expl is refl washed
   Answer: ‘Yes’ and translation
   b. ət weëd zich gewessje
      there_expl is refl washed
      ‘One is washing himself.’
Finally, in the case that the phenomenon is optional, speakers tend to reproduce the standard variety, because this is nonetheless grammatical in their dialect (Cornips and Poletto 2005). This issue is nicely illustrated by responses to constructions in which an aspectual reflexive zich (an optional element) is offered to the Limburg dialect speakers, as illustrated in (5.4) (cf. Cornips 1998):
3 However, the plain impersonal passive is also grammatical in the local dialect of Heerlen. In that case it has no reflexive interpretation: ‘One is washing (clothes).’
(5.4) d’r Jan had zich in twieë minute e beeke gedrònke
      the Jan had refl in two minutes a small beer drunk
The construction with the reflexive zich is fully ungrammatical in the standard variety but optional in the local dialect. The written questionnaire of the Dutch Syntactic Atlas project shows that in only two out of thirty-five possible locations in the province of Limburg and its immediate surroundings, an answer with the reflexive is given. Obviously, the interference with the standard variety is so strong that the reflexive does not appear in the answers. It seems that only a very good subject can provide optional structures or all the possibilities that come to mind.
5.3.1 Heerlen Dutch as an emerging intermediate regional standard variety
Without any doubt, every emerging intermediate speech repertoire is a result of an induced language contact situation and/or processes of standard–dialect and/or dialect–dialect convergence, that is to say, vertical and horizontal levelling, respectively (cf. Cornips and Corrigan 2005). The emergence of intermediate variants, which may result in a regional variety as a second standard in the area, is crucial to understanding the phenomenon of syntactic variation within the speech community and at the individual speaker level. A good example of an intermediate variety due to language contact effects is Heerlen Dutch (Cornips 1998). Heerlen Dutch is a regional standard Dutch variety in the Netherlands. Heerlen is a town of 90,000 inhabitants, situated in Limburg, a province in the southeast of the Netherlands near the Belgian and German borders (see Figure 5.1). As already discussed above, in the local dialect of Heerlen reflexives may occur in a much wider range of constructions than in standard Dutch. An example is the appearance of the reflexive zich in inchoative constructions. In standard Dutch (SD) the appearance of zich in inchoative verb constructions is far from regular.
Zich is required in (5.5a), is optional in (5.5b), and obligatorily absent in (5.5c) (cf. Everaert 1986: 83, SD = standard Dutch):
(5.5) a. SD Het gerucht verspreidt *(zich)
            the rumour spreads refl
      b. SD De suiker lost (zich) op
            the sugar dissolves refl part.
      c. SD De boter smelt (*zich)
            the butter melts refl
Figure 5.1. The location of Heerlen in the province of Limburg (map labels: Amsterdam, Netherlands, Germany, Belgium (Flemish part), Heerlen)
In the local dialect of Heerlen, the reflexive zich has to be present in (5.5b) and it may be present in (5.5c). Further, it also arises in some more inchoative constructions based on transitive verbs such as veranderen ‘change’, krullen ‘curl’, and buigen ‘bend’. All these inchoative constructions are fully ungrammatical with a reflexive in the standard variety. Heerlen Dutch, however, as a second standard variety in the area has regularized the presence of the reflexive throughout the whole verb class. It has an optional reflexive zich in the construction in (5.5c) and also in (5.6) which are ungrammatical in the local dialect (and in the standard variety) (HD = Heerlen Dutch):4
4 In Cornips and Hulk (1996), it is argued that the Heerlen Dutch constructions in (5.6) are ergative intransitive counterparts of transitive change of state verbs with a causer as external argument. Further, it is shown that in constructions such as (5.4) and (5.6), the reflexive marker zich acts as an aspectual marker, that is: the aspectual focus is on the end-point of the event.
(5.6) a. HD Dit vlees bederft zich
            this meat spoils refl
      b. HD De jurk sleept zich over de vloer
            the dress drags refl over the floor
      c. HD Het glas breekt zich
            the glass breaks refl
It is important to note that optionality arises as an induced contact outcome (see also later in Section 5.3.2). Another example of an emerging variant in Heerlen Dutch concerns the so-called dative inalienable possession construction in which the referent of the dative object is the possessor of the inalienable body-part(s) denoted by the direct object. Importantly, all (old) monographs of the Heerlen dialect and written dialect literature (see Blancquaert et al. 1962; Jongeneel 1884; Kessels 1883) show that in the local dialect the DP referring to the body-part(s) such as handen ‘hands’, illustrated in (5.7), is headed by the definite determiner de ‘the’. The possessive dative construction expressing inalienable possession is abundantly used in the eastern dialect varieties of Dutch, although extremely rare in standard Dutch (cf. Broekhuis and Cornips 1994; Cornips 1998). The inalienable possession construction, as far as possible, has an idiomatic reading in standard Dutch that is completely absent in this regional Dutch variety and in the dialects of Limburg (Hdial = Heerlen dialect):
(5.7) Hdial/?*SD Ik was Jan_dat/hem_dat de handen.
                 I wash Jan/him the hands
                 ‘I am washing Jan’s/his hands.’
Hence, in the standard variety the inalienable possession relation must be expressed by means of a possessive pronoun, namely zijn ‘his’, as illustrated in (5.8). The construction in (5.8) is in turn rare in the local dialect of Heerlen:
(5.8) SD/?*Hdial Ik was zijn/Jans handen.
                 I wash his/Jan’s hands
                 ‘I am washing his/Jan’s hands.’
Nowadays, Heerlen Dutch involves a large spectrum of intermediate lects varying between the local dialect and standard Dutch. As a result, in Heerlen Dutch we find both the inalienable dative construction in (5.7) and the possessive pronoun construction in (5.8).
Thus, the syntactic variation within a regional variety such as Heerlen Dutch corresponds to cross-linguistic differences between English and French, as in (5.9) and (5.10), respectively (Cornips 1998; Vergnaud and Zubizarreta 1992):
(5.9) a. Eng *I am washing him_dat the hands.
      b. Eng I am washing his hands.
(5.10) a. Fr *Je lave ses mains. (*inalienable reading)
       b. Fr Je lui_dat lave les mains.
Moreover, in spontaneous speech data we even find, although rarely, intermediate forms such as the dative object in combination with the possessive pronoun, as in (5.11b) (cf. Cornips 1998, HD = Heerlen Dutch):
(5.11) a. HD Ik was hem_dat de buik.
             I wash him the stomach
       b. HD Ik was hem_dat zijn buik. (intermediate form)
             I wash him his stomach
       c. HD Ik was zijn buik.
             I wash his stomach
It is important to point out that this intermediate variant was already present in 1962 in the neighbouring community of Heerlen, that is Kerkrade.5 Obviously, the inalienable possessive construction is not a binary variable since it allows more than two variants, as illustrated in (5.11). Spontaneous speech data of intermediate variants are presented in (5.12) (see also (5.19) in Section 5.3.2, Cornips 1998). In (5.12) the inalienable possession relation is expressed both by means of a possessive dative je ‘you’ and by means of the possessive pronoun je ‘your’. Note, however, that in all these examples the DP referring to an inalienable body-part is the complement of a PP and not a direct object:6
(5.12) a. HD want ze zeuren je_dat van alles naar je hoofd toe (19: Cor)
             because they nag you everything to your head part
             ‘They are nagging you about everything.’
       b. HD die had je_dat zo bij je neus staan (35: dhr Berk)
             they had you right away with your nose stand
             ‘They stand in front of you right away.’
5 Blancquaert et al. (1962) give the following translation in the local dialect of Kerkrade (in Dutch orthography):
   (i) De schipper likte zich zijn lippen af
       The skipper licked refl his lips part.
       ‘The skipper licked his lips.’
6 The constructions in (5.12) with a PP are more accepted in standard Dutch than the double object constructions (see Broekhuis and Cornips 1997 for more details about this type of construction).
In (5.13) the semi-copula krijgen ‘get’ cannot assign dative case to the possessor hij ‘he’, which is therefore nominative (cf. Broekhuis and Cornips 1994). However, the spontaneous speech data example in (5.13b) shows that the possessor is also realized by means of the reflexive zich, which is fully ungrammatical in the local dialect (as in standard Dutch):
(5.13) a. HD Hij krijgt de handen vies
             he gets the hands dirty
             ‘His hands are dirty.’
       b. HD Die heeft zich enorm op z’n donder gekregen (33: dhr Quint)
             he has refl enormously a beating got
             ‘He gets hell.’
5.3.2 The lack of characteristic properties of the dative inalienable possession construction
Interestingly, it is not only the case that intermediate variants emerge in an intermediate speech repertoire but they also lack characteristic properties (Vergnaud and Zubizarreta 1992: 598) of the ‘original’ dative inalienable possession construction. Two important properties are the strictly distributive interpretation and grammatical number. The former refers to the presence of a plural possessor combined with a singular inalienable argument as in (5.14a). If that is the case, the referent of the inalienable argument is nevertheless interpreted as referring to more than one body-part. The latter property refers to the fact that inalienable arguments are obligatorily singular when referring to body-parts of which the number per individual is limited to one such as ‘head’, regardless of whether they have a plural possessor or not. This is illustrated by the grammaticality contrast between (5.14b) and (5.14c) (note that there is no idiomatic reading involved):
(5.14) a. Ik was hun_dat/3pl het hoofd (i.e. more ‘heads’ involved)
          I wash them the head
          ‘I am washing their heads.’
       b. *Ik was hun_dat/3pl de hoofden_pl
          I wash them the heads
          ‘I am washing their heads.’
       c. Ik was hun_dat/3pl de handen_pl
          I wash them the hands
          ‘I am washing (both) their hands.’
Let us now compare (5.14) with the spontaneous speech data example in (5.15) that occurs very infrequently in the corpus:7
(5.15) HD Ze slaan mekaar_pl niet meteen de koppen_pl in (5: Stef)
          they hit each other not at once the heads in
          ‘They don’t hit each other immediately.’
Apparently, both a distributive interpretation and the property of grammatical number may no longer be characterizing properties of the intermediate forms in Heerlen Dutch, that is: the inalienable argument koppen ‘heads’ is plural although the number of the body-parts kop ‘head’ per individual is limited to one. Finally, the dative construction cannot be modified by just any attributive adjective, whereas there is no such restriction in the possessive pronoun constructions, as exemplified in (5.16a) and (5.16b), respectively (Vergnaud and Zubizarreta 1992):
(5.16) a. HD *Ik was hem_dat de vieze buik
              I wash him the dirty stomach
              ‘I am washing his dirty stomach.’
       b. HD Ik was zijn vieze buik.
             I wash his dirty stomach
             ‘I am washing his dirty stomach.’
I asked speakers of Heerlen Dutch to tell a short story containing the elements vuil ‘dirty’ and handen ‘hands’. In addition to (5.16b), they realize the intermediate variant in (5.17) which lacks, due to the presence of the possessive pronoun, any restriction on the presence of the adjective:
(5.17) HD Hij wast hem_dat zijn vuile handen.
          He washes him his dirty hands
          ‘He is washing his dirty hands.’
Further, an important aspect of emerging intermediate variants such as the ones described above is that optionality arises. A major characteristic of the dative inalienable possession construction is that the [spec TP] subject, or the agent, cannot enter into a possessive relation with the direct
7 One reviewer points out that the German counterpart of (5.15) with a plural inalienable argument is grammatical, whereas the German counterpart of (5.14a) with a singular inalienable argument is ungrammatical. However, (5.14a) is fully grammatical both in the local dialect of Heerlen and Heerlen Dutch.
object (or prepositional complement), not even if the indirect object is absent, as illustrated in (5.18a). Thus, a possessive relation between the subject and the direct object can only be expressed indirectly, namely by inserting a dative NP or a reflexive zich, as in (5.18b) (see Broekhuis and Cornips 1994; Vergnaud and Zubizarreta 1992).
(5.18) a. Hdial/SD *Hij_i wast de handen_i
       b. Hdial/?*SD Hij_i wast zich_i de handen.
                     he washes refl the hands
                     ‘He is washing his hands.’
However, in all the intermediate variants described so far in which the possessive relation is expressed by the possessive pronoun, the dative object or reflexive referring to a possessor is optional. Importantly, all constructions in (5.19), in contrast to the double object constructions, involve idiomatic readings:8
(5.19) a. HD ze zeuren (je_dat) van alles naar je hoofd toe (19: Cor)
             they nag you everything to your head part
             ‘They are nagging you about everything.’
       b. HD die had (je_dat) zo bij je neus staan (35: dhr Berk)
             they had you right away with your nose stand
             ‘They stand in front of you right away.’
       c. HD Die heeft (zich) enorm op z’n donder gekregen (33: dhr Quint)
             he has refl enormously a beating got
             ‘He gets hell.’
Taken together, these intermediate variants are of extreme importance with respect to the locus of syntactic variation, that is: whether the primitive of variation is located outside or inside the grammatical system. It becomes obvious that the facts in Heerlen Dutch indicate that syntactic variation can no longer exhaustively be described by binary settings or different values of a parameter (Cornips 1998). In an intermediate speech repertoire, this concept is a very problematic one and must be open to discussion. Different alternatives are possible but there are no satisfying answers yet. A more recent alternative is to argue that from a minimalist point of view, lexical elements
8 It might be the case that, eventually, the optional dative object will disappear or that it will gain emphatic meaning.
show minimal morphosyntactic differences, more specifically, whether they bear uninterpretable or interpretable features interacting with general principles.9 Another alternative is to place the notion of choice between syntactic variants into the grammatical system. Henry (1995) shows that imperatives in Belfast English allow optional raising of the verb to C and that inversion in embedded questions may or may not occur. She accounts for this optionality by arguing that a functional category such as C has both strong and weak features instead of different settings of a parameter. In sum, analyses differ with respect to the locus of syntactic variation (grammar versus lexicon) and whether individuals may have one or two (or more) grammars responsible for the various syntactic variants.
5.4 Acceptability judgements in an intermediate speech repertoire
The coming into existence of intermediate variants is of crucial importance for understanding the phenomena of relative acceptability and perhaps relative grammaticality. Variationist studies have convincingly shown that individual speakers do not show all possible alternatives that exist at the level of their community (Henry 2002; Cornips 1998). So, the behaviour of individual speakers with respect to acceptability judgements cannot be interpreted without knowledge of the community pattern. An individual speaker thus has a passive knowledge of more possible syntactic alternatives than he actually uses due to the fact that these possible alternatives, that is standard, dialect, and emerging intermediate variants, can be heard daily in his community. The intermediate variants form a continuum with the standard and local dialect varieties. This continuum arises not only from a geographic perspective but also from a stylistic perspective (for example the use of dialect and standard features in a more informal and formal setting, respectively) and a social perspective (age, gender, ethnicity, levels of education, and occupation of the speaker). I propose that a speaker may no longer be able to judge syntactic features as fully grammatical or ungrammatical. Instead, it is very likely that due to the effects of the standard–dialect contact situation the speaker can only make relative judgements by comparing those variants. Further, it might be the case that this gradience in acceptability judgements partly arises due to the relative
9 However, even if the syntactic variants are analysed as coming into existence as a result of two competing grammars (cf. Kroch 1989), then some lexical elements must still bear un- and interpretable features as well in order to account for the syntactic alternatives.
grammaticality of the intermediate variants in the community, namely the fact that intermediate variants no longer possess characterizing properties as discussed in 5.3.2. In the previous section, a regional standard variety was discussed. Let us now consider the local dialects in the same area. These dialects were investigated in the Dutch Syntactic Atlas project. The methodological design of the Dutch Syntactic Atlas project consisted of two phases, written elicitation and oral elicitation. The oral acceptability judgement tasks were administered in dialect rather than in the standard variety or some regiolect, in order to avoid accommodation, that is adjustment from the dialect in the direction of the standard-like varieties (cf. Cornips and Poletto 2005). In the phase of oral elicitation, 250 locations were selected throughout the Netherlands. We had a major problem in doing the fieldwork since the large majority of the fieldworkers and Ph.D. students speak only the standard variety. It is for this reason that we had to ask for the assistance of another dialect speaker from the same community speaking the same variety in order to be able to interview the subject in his own dialect. The fieldworker (speaking only standard Dutch) trained a local dialect speaker as an ‘assistant interviewer’. This ‘assistant interviewer’ was asked to translate a standard Dutch structured elicitation task into his or her local dialect. These translations were recorded. In a second session these recordings were played to the second local dialect speaker. In this session, the entire conversation was restricted to the two dialect speakers and the fieldworker did not interfere.
5.4.1 Oral elicitation: the local dialects
Two small case studies convincingly show how easily speakers switch between the (base) dialect and the standard variety in an oral task in the southern part of the province of Limburg where an intermediate speech repertoire exists.
One of the locations involved in the project was Nieuwenhagen (Landgraaf), a very small ‘rural’ village in the environs of Heerlen. In the local dialect of Nieuwenhagen proper names are obligatorily preceded by the definite determiner et or der ‘the’ depending on whether the proper name refers to a female or male, respectively. The presence of the definite determiner preceding a proper name, as in (5.20), is fully ungrammatical in standard Dutch:
(5.20) et Marie / der Jan is krank
       det Mary / det Jan is ill
The recording of the first session between the standard Dutch speaking fieldworker and the local ‘assistant interviewer’ translating standard Dutch into his own dialect shows that the definite article in his translation is absent: that is, the proper names Wim and Els show up without it, as illustrated in (5.21). These sentences were elicited in order to investigate the order in the verbal cluster (right periphery):
(5.21) 1st session (dialect–standard)
       Ø Wim dach dat ich Ø Els han geprobeerd e kado te geve
       Wim thought that I Els have tried a present to give
       ‘Wim thought I tried to give a present to Els.’
In the same interview session, the ‘assistant interviewer’ shows in another sentence that he may or may not use the definite article resulting in der Wim and Ø Els respectively in his local dialect. Note that the definite determiner precedes the subject DP whereas it is absent in front of the object DP:
(5.22) 1st session (dialect–standard)
       Der Wim dach dat ich Ø Els e boek han will geve
       det Wim thought that I Els a book have will give
       ‘Wim thought I wanted to give a book to Els.’
In the second session, however, in which the ‘assistant interviewer’ exclusively interviews the other dialect speaker in the local dialect, the latter utters the definite article both with the subject and object DP as ‘required’:
(5.23) 2nd session (dialect–dialect)
       Der Wim menet dat ich et Els e boek probeerd ha kado te geve.
       ‘Wim thought I tried to give a book to Els.’
Other indications for easily switching between the base dialect and standard Dutch can be found in (5.24). The infinitival complementizer has the form om and voor in standard Dutch and the local dialect, respectively (see Cornips 1996 for more details). In the first session, the ‘assistant interviewer’ in interaction with the standard Dutch speaking fieldworker uses the standard Dutch complementizer om whereas in the second session the dialect speaker utters voor, as illustrated in (5.24a) and (5.24b), respectively.
Moreover, note that in the first session the proper name Wendy lacks the definite article again whereas it is present in the second session, as presented in (5.24a) and (5.24b), respectively:
(5.24) a. Ø Wendy probeerdet om ginne pien te doe. 1st session (dialect–standard)
       b. Et Wendy hat geprobeerd voor ginne pien te doe. 2nd session (dialect–dialect)
          ‘Wendy tried not to hurt anyone.’
It is important to note that these functional elements were not explicitly mentioned to the dialect speakers as features we were interested in. From the above, it is obvious that in this linguistic repertoire, speakers can adjust to the standard variety (and surrounding varieties) without a noticeable effort. This might be due to the fact that speakers are sensitive to their (un)conscious awareness of social diagnosticity of syntactic features, namely that features belonging to the domain of standard Dutch are the prestige variants (Cornips 1996). It is for this reason that training interviewers who are native speakers of the local dialect is necessary although every design has to take into account that standard, non-standard, and intermediate variants represent the daily speech situation, that is: syntactic features from the local dialects and standard Dutch appear in a continuum and have become continuous (cf. Paolillo 1997). The emergence of new intermediate syntactic variants, too, points towards a direction in which it is no longer possible to make a clear-cut distinction between the standard variety and the local dialects from a syntactic point of view. In contrast, as already noted in the introduction, the experiences during fieldwork are that the Limburgian speakers perceive the local dialect and the standard variety as two different varieties and associate them with different identities although they share almost all of their grammar.
5.4.2 Written and oral acceptability judgements in the local dialect
In this section, written acceptability judgements about the inalienable possession construction are discussed. The first step in the design of the Dutch Syntactic Atlas project was an extensive written questionnaire containing 424 questions (including sub-questions and remarks to be made by the respondents) that were sent out to 850 respondents and a number of additional informants in Belgium (Cornips and Jongenburger 2001).
The grid of the written questionnaires of the Dutch Syntactic Atlas project contains, among others, ten neighbouring villages of Heerlen. In this questionnaire, local dialect speakers were offered the possessive pronoun construction, as in (5.8), repeated here for convenience as (5.25):
(5.25) Instruction: ‘Translate into your local dialect’
       Ik was zijn handen.
       I wash his hands
       ‘I am washing his hands.’
Example (5.26) and Figure 5.2 reveal the translations of (5.25) into the local dialect:
(5.26) variant type             location        ‘translation’
   a. standard variant          Beek            he hät zien hanj geweschen
   b. standard variant          Eijgelshoven    He had sieng heng gewesche
   c. standard variant          Maastricht      heer heet zien han gewasse
   d. standard variant          Vaals           Hae hat zieng heng jewaesje
   e. standard variant          Waubach         Heë hat zieng heng gewessje
                                                he has his hands washed
   f. intermediate variant      Eijgelshoven    Heë had zieg sieng heng gewesje
   g. intermediate variant      Valkenburg      Hae haet zich zien heng gewesje
   h. intermediate variant      Spekholzerhei   hea hat ziech zien heng jewesche
                                                he has REFL his hands washed
   i. dialect variant           Simpelveld      hea hat zich de heng gewesche
   j. dialect variant           Waubach         Hea haa zich de heng gewèsje
                                                he has REFL the hands washed
To begin with, the responses show that standard variants, dialect variants, and intermediate variants are among the answers. Further, all deviations from the input, for example intermediate and dialect variants as in (5.26f,g,h) and (5.26i,j) respectively, provide strong evidence that these variants are in the grammar of the speaker (Carden 1976). Moreover, variation arises within a local dialect, as is the case in the spontaneous speech data of Heerlen Dutch, which is a regional standard variety. Thus, two respondents in the locations of Eijgelshoven and Waubach give different responses. The former displays both the standard and the intermediate variant, in (5.26b) and (5.26f) respectively, whereas the latter yields the standard and the dialect variant, in (5.26e) and (5.26j) respectively. Finally, the majority of the respondents copy the standard Dutch variant into their local dialect, as illustrated in (5.26a–e).10 In order to control for this task effect, we also administered this type of construction in the oral acceptability task (see below). Taken together, the translations provide evidence that (a) the standard variety strongly interferes with the local dialect variety, (b) intermediate variants arise, and (c) in this part of the province of Limburg syntactic features from the local dialects and standard Dutch exist in a continuum both in a regional standard variety and in the local dialects (see also (5.11)).
10 Maastricht, in the western part of Limburg, showed the standard variant in 1962 (cf. Blancquaert et al. 1962). The translations in the atlas of Blancquaert seem to suggest that the dative inalienable possessive construction is more widely used in the eastern part of Limburg, i.e. Heerlen and surroundings.
Figure 5.2. Possible inalienable possession constructions as revealed by the responses to the written questionnaire in ten surrounding locations of Heerlen [map legend: possessive pronoun (5); intermediate forms (3); dative inalienable possession (2)]
Similar to the written translation task, the standard Dutch possessive pronoun construction in (5.25) above was offered in the oral elicitation task, which was the second step in the Dutch Syntactic Atlas project. The locations in the neighbouring villages of Heerlen where oral fieldwork was conducted are presented in Figure 5.3. In the first session, that is the standard–dialect interaction (see 5.3.2), the assistant interviewers were asked to translate (5.25) into their local dialect. Only 4 out of 12 respondents (33 per cent) immediately translate (5.25) into the dialect variant, which is the dative inalienable possession construction. On the other hand, the majority of the assistant interviewers (8 out of 12, 67 per cent) just copy the possessive pronoun construction into their local dialect. Hence, these respondents show interference from the
Figure 5.3. Grid of the oral interviews in Heerlen and neighbouring locations (map: © Meertens Inst)
standard variety in shifting towards the more prestigious variety in their response, as is the case in the written elicitation task. What is more, the majority of these respondents (6 out of 8, 75 per cent) show an implicational pattern: they copy the standard Dutch variant in the first session (dialect–standard repertoire) whereas they use the intermediate or the local dialect variant in the second session, which is the dialect–dialect repertoire, as illustrated in (5.27).
(5.27) Location of Vaals
Assistant interviewer:
a. Instruction:                      Oversetz: ‘Hij heeft zijn handen gewassen.’
                                     ‘Translate’: he has his hands washed
b. 1st session (standard–dialect):   ‘Her had zien heng gewasse’
                                     he has his hands washed
c. 2nd session (dialect–dialect):    ‘Her had sich sien heng gewasse.’
                                     he has refl his hands washed
Assistant interviewer:
d. ‘Komt disse satz ook veur?’       Her had zich de heng gewasse.
   ‘Do you also encounter this variant?’   He has refl the hands washed
e. Answer: ‘ja’ ‘yes’
Again, the interaction in (5.27) reveals that the speech repertoire in Limburg is a continuous one in which the distinction between standard and dialect varieties is blurred. Consequently, the dialect speaker judges all possible variants, that is to say, the standard possessive pronoun (5.27b), the dialect dative construction (5.27d,e), and the intermediate variant (5.27c) as acceptable. More evidence is presented by the fact that 9 out of 12 speakers (75 per cent) accept both the dative possessive construction and the intermediate form. Strikingly, 6 out of 12 speakers (50 per cent) argue that all forms in (5.27) are acceptable in the local dialect. Two of them give relative judgements without being asked: one considered (5.27c) as slightly more acceptable than (5.27d), the other speaker just considered (5.27d) slightly more acceptable than (5.27c). This small case study indicates (a) extensive variation at the level of the individual speaker such that half of the speakers show all possible syntactic alternatives that exist on the level of their community and (b) the existence of intermediate variants to such an extent that it blurs the distinction between the local dialect and the standard variety. This result is attested in spontaneous speech, and in both the written and oral elicitation data, so we can exclude the possibility that it is primarily due to task effects. In this intermediate speech repertoire the occurrence of intermediate variants is inevitable in the process of vertical standard–dialect and horizontal dialect–dialect convergence. These findings put a question mark on the central sociolinguistic proposal that only phonology is a marker of local identity whereas syntax is a marker of cohesion in large geographical areas.
Further, syntactic elicitation shows that speakers of local dialects are no longer able to reject syntactic variants as fully ungrammatical even if (a) these concern emerging intermediate variants and (b) they did not originally belong to their local dialect variety. Consequently, relative acceptability is the result.
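The proportions reported for the oral elicitation task can be recomputed from the raw counts given in the text. The following Python sketch is only an illustrative check: the dictionary labels are shorthand introduced here, and rounding to whole percentages (half up) is an assumption about how the reported figures were derived.

```python
from fractions import Fraction

# Counts as reported in the text for the oral elicitation task.
# The labels are shorthand introduced for this sketch, not terms from the study.
counts = {
    "dialect variant on first translation": (4, 12),
    "copied standard possessive pronoun": (8, 12),
    "copiers who shifted in the second session": (6, 8),
    "accept dative and intermediate forms": (9, 12),
    "accept all forms in (5.27)": (6, 12),
}


def percent(numerator, denominator):
    """Return the share as a whole-number percentage, rounded half up."""
    return int(Fraction(numerator * 100, denominator) + Fraction(1, 2))


percentages = {label: percent(n, d) for label, (n, d) in counts.items()}

for label, value in percentages.items():
    print(f"{label}: {value} per cent")
```

Exact fractions are used instead of floating-point division so that the rounding step cannot be affected by representation error.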
5.5 Conclusion
In this paper, I have discussed a so-called intermediate speech repertoire that results from a contact situation between standard Dutch, a regional Dutch variety (Heerlen Dutch), and local dialects in the southern part of the province of Limburg in the south of the Netherlands. This speech repertoire reveals syntactic differences along a continuum to such an extent that it blurs the distinction between the local dialect and the standard variety. It is demonstrated that in this speech repertoire clear-cut judgements are not attainable at all. Using case studies, it has been shown that speakers in this area are not able to judge syntactic features as fully grammatical or ungrammatical. Instead, all variants heard in the community, for example standard,
dialect, and intermediate variants are considered as acceptable. Moreover, it may be argued that in this speech repertoire dialect and standard varieties form a continuum also beyond the geographic level, that is to say, a continuum from a stylistic and social variation perspective. Subsequently, the findings in these case studies can be generalized beyond geographical variation. Moreover, these case studies show that syntax may also be a marker of local identity.
6 Gradedness and Optionality in Mature and Developing Grammars
ANTONELLA SORACE
6.1 Introduction
This paper focuses on specific patterns of gradedness and optionality in the grammar of three types of speakers: monolingual native speakers, speakers whose native language (L1) has been influenced by a second language (L2), and very fluent non-native speakers. It is shown that in all these cases gradedness is manifested in areas of grammar that are at the interface between the syntax and other cognitive domains. First, evidence is reviewed on the split intransitivity hierarchy (Sorace 2000b, 2003a), indicating that not only auxiliary selection but also a number of syntactic manifestations of split intransitivity in Italian and other languages are lexically constrained by aspectual properties. These constructions tend to show gradedness in native intuitions that cannot easily be accommodated by current models of the lexicon–syntax interface. Moreover, the mappings between lexical properties and unaccusative/unergative syntax are developmentally unstable, whereas the unaccusative/unergative distinction itself is robust and unproblematic in acquisition. Second, it is shown that residual optionality, with its entailed gradedness effects, occurs only in interface areas of the competence of near-native speakers, and not in purely syntactic domains. Sorace (2003b) indicates that the interpretable discourse features responsible for the distribution of overt and null subject pronouns are problematic in the L2 steady state of L1 English learners of Italian, whereas the non-interpretable syntactic features related to the null subject parameter are completely acquired. Third, it is argued that the differentiation between narrow computational syntactic properties and interface properties is also relevant in other domains of language development, such as L1 attrition due to prolonged exposure to a
second language (Sorace 2000b; Tsimpli et al. 2004; Montrul 2004). A clear parallelism exists between the end-state knowledge of English near-native speakers of Italian and the native knowledge of Italian advanced speakers of English under attrition with respect to null/overt subjects and pre/postverbal subjects. In both cases, the speakers’ grammar is/remains a null-subject language: for example null subjects, when they are used, are used in the appropriate contexts, that is when there is a topic shift. The purely syntactic features of grammar responsible for the licensing of null subjects are not affected by attrition. The generalization seems to be that constructions that belong to the syntax proper are resilient to gradedness in native grammars; are fully acquired in L2 acquisition; and are retained in L1 attrition. In contrast, constructions that require the interface of syntactic knowledge with knowledge from other domains are subject to gradedness effects; present residual optionality in L2; and exhibit emergent optionality in L1 attrition. The question of the interpretation of this generalization, however, is still open. There are (at least) two issues for further research. First, there is a lack of clarity about the nature of different interfaces. Are all interfaces equally susceptible to gradedness and optionality? Second, is gradedness inside or outside the speakers’ grammatical representations? The available evidence is in fact compatible both with the hypothesis that the gradedness is at the level of knowledge representations and with the alternative hypothesis that it arises at the level of processing. Possible approaches to these open issues are outlined.
6.2 The syntax–lexicon interface in native grammars: split intransitivity
According to the unaccusative hypothesis (Perlmutter 1978; Burzio 1986), there are two types of intransitive verbs, unaccusative and unergative, with distinct syntactic properties. The essential insight (variously expressed by different syntactic theories) is that the subject of unaccusative verbs is syntactically comparable to the object of a transitive verb, while the subject of an unergative verb is a true subject. Evidence for the distinction is both syntactic and semantic: in several European languages unaccusative verbs generally select be as a perfective auxiliary while unergative verbs select have; semantically, the subject of unaccusative verbs tends to be a patient while that of unergative verbs is an agent. However, it has proved difficult to fit many verbs unambiguously into one class or the other. On the one hand, there are verbs
that do not satisfy unaccusativity diagnostics in consistent ways, both within and across languages: so blush is unaccusative in Italian (arrossire, selecting be) but unergative in Dutch (blozen, selecting have); fiorire ‘blossom’ can take either have or be. On the other hand, there are verbs that can display either unaccusative or unergative syntax depending on the characteristics of the predicate: for example, all verbs of manner of motion (e.g. swim) select have in Dutch and German when they denote a process but take be in the presence of a prepositional phrase denoting a destination; verbs of emission (e.g. rumble; see Levin and Rappaport Hovav 1995 for extensive discussion) are unergative in their default case but in some languages may exhibit unaccusative behaviour when they receive a telic interpretation. One of the main challenges opened up by the unaccusative hypothesis is therefore how to account for the variable behaviour of verbs. A great deal of research in the last ten years has been devoted to explaining the complex mappings between a lexical-semantic level of representation and the level of syntactic structure. This effort has taken two broad and seemingly incompatible directions. Theories of argument structure (which, following Levin and Rappaport Hovav (1996), may be termed ‘projectionist’) assume that the verb’s lexical entry contains the necessary specification for the mapping of arguments onto syntactic positions. This approach posits fine-grained distinctions in lexical-semantic representations, singles out the syntactically relevant lexical-semantic components in different languages, and identifies a set of linking rules that deterministically project lexical-semantic features onto syntactic positions, hence determining the unaccusative or unergative status of verbs.
The second direction, named ‘constructional’ by Levin and Rappaport Hovav (1996), empties the lexical entries of verbs of any syntactic specification and makes semantic interpretation directly dependent on the syntactic configurations in which the verb can appear. If verbs are thus not tied to deterministic linking rules but have freedom of mapping, unaccusativity or unergativity become by-products of the verb’s compatibility with particular syntactic configurations, instead of inherent lexical properties of verbs (Borer 1994). However, both the projectionist and the constructional solutions to the lexicon–syntax puzzle have limitations; the most relevant is that the former allows for too little variation, because of the deterministic nature of its linking rules, whereas the latter allows too much variation, because of the lack of a mechanism that rules out impossible mappings. These limitations have been highlighted in particular by Sorace (1992, 1993a, 1993b, 1995, 2000b, 2003a), who has shown that there is systematic variation that cannot be explained by
CHANGE OF LOCATION            (categorical unaccusative syntax)
> CHANGE OF STATE
> CONTINUATION OF STATE
> EXISTENCE OF STATE
> UNCONTROLLED PROCESS
> MOTIONAL PROCESS
> NON-MOTIONAL PROCESS        (categorical unergative syntax)
Figure 6.1. The split intransitivity hierarchy
either approach.1 Instead, she proposes that intransitive verbs are organized along a hierarchy (the split intransitivity hierarchy (SIH), originally called the auxiliary selection hierarchy (ASH)) defined primarily by aspectual notions (telicity/atelicity), and secondarily by the degree of agentivity of the verb (Figure 6.1). The SIH is therefore an empirical generalization that identifies the notion of ‘telic change’ at the core of unaccusativity and that of ‘atelic non-motional
1 Optionality and gradedness at interfaces can be measured experimentally with behavioural techniques that are able to capture subtle differences in speakers’ performance. The informal elicitation techniques traditionally used in linguistics and language development research (such as binary or n-point acceptability judgement tests) are unlikely to be reliable for such data, because they can measure only very broad distinctions and typically yield ordinal scales, in which the distance between points cannot be evaluated (Sorace 1996). A suitable experimental paradigm that has gained ground in recent years is magnitude estimation (ME), a technique standardly applied in psychophysics to measure judgements of sensory stimuli (Stevens 1975). The magnitude estimation procedure requires subjects to estimate the perceived magnitude of physical stimuli by assigning values on an interval scale (e.g. numbers or line lengths) proportional to stimulus magnitude. Highly reliable judgements can be achieved in this way for a whole range of sensory modalities, such as brightness, loudness, or tactile stimulation (see Stevens 1975 for an overview). The ME paradigm has been extended successfully to the psychosocial domain (see Lodge 1981 for a survey) and recently Bard et al. (1996), Cowart (1997), and Sorace (1992) showed that it may be applied to judgements of linguistic acceptability. Unlike the n-point scales conventionally employed in the study of psychological intuition, ME allows us to treat linguistic acceptability as a continuum and directly measures acceptability differences between stimuli. Because ME is based on the concept of proportionality, the resulting data are on an interval scale, which can therefore be analysed by means of parametric statistical tests.
ME has been shown to provide fine-grained measurements of linguistic acceptability, which are robust enough to yield statistically significant results, while being highly replicable both within and across speakers. ME has been applied successfully to phenomena such as auxiliary selection (Bard et al. 1996; Sorace 1992, 1993a, 1993b; Keller and Sorace 2003), binding (Cowart 1997; Keller and Asudeh 2001), resumptive pronouns (Alexopoulou and Keller 2003; McDaniel and Cowart 1999), that-trace effects (Cowart 1997), extraction (Cowart 1997), and word order (Keller and Alexopoulou 2001; Keller 2000b).
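As a rough illustration of why magnitude estimation can yield comparable data from speakers who use different number ranges, the following Python sketch divides each raw estimate by the speaker's estimate for a reference sentence and takes the logarithm. This normalization is an assumption made for illustration, not the procedure reported in the studies cited; the function name `normalize`, the item labels, and all numerical values are hypothetical.

```python
import math
from statistics import mean


def normalize(ratings, reference):
    """Divide each raw estimate by the reference estimate and take log10.

    Hypothetical helper: the normalization scheme is an assumption,
    not taken from the studies cited in the footnote.
    """
    if reference <= 0 or any(r <= 0 for r in ratings.values()):
        raise ValueError("magnitude estimates must be positive")
    return {item: math.log10(r / reference) for item, r in ratings.items()}


# Invented raw estimates from two speakers who use different number ranges
# but express the same proportions between sentences.
speaker_a = normalize({"s1": 100.0, "s2": 50.0, "s3": 10.0}, reference=50.0)
speaker_b = normalize({"s1": 8.0, "s2": 4.0, "s3": 0.8}, reference=4.0)

# Mean normalized score per sentence across the two speakers.
scores = {item: mean([speaker_a[item], speaker_b[item]]) for item in speaker_a}

for item, score in scores.items():
    print(item, round(score, 3))
```

Because only ratios between estimates enter the computation, the two invented speakers end up with matching normalized scores even though their raw numbers differ by an order of magnitude.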
activity’ at the core of unergativity. The closer to the core a verb is, the more determinate its syntactic status as either unaccusative or unergative. Verbs that are stative and non-agentive are the most indeterminate. Sensitivity to contextual or compositional factors correlates with the distance of a verb from the core. Thus, the SIH helps to account both for variability and for consistency in the behaviour of intransitive verbs. In contrast to the constructionist view, where context is always critical, the SIH account prescribes that core verbs have syntactic behaviour that is insensitive to non-lexical properties contributed by the sentence predicate. On the other hand, peripheral verbs, which are neither telic nor agentive, do seem to behave according to the constructionist observation, with syntactic behaviour depending on the properties of the predicate in which they appear. The SIH substantiates the intuition that, within their respective classes, some verbs are ‘more unaccusative’ and ‘more unergative’ than others (Legendre, Miyata, and Smolensky 1991). Crucially, however, this does not mean that unaccusativity or unergativity are inherently gradient notions, or that the distinction is exclusively semantic, but rather that some verbs allow only one type of syntactic projection whereas other verbs are compatible with different projections to variable degrees. This is the reason why any approach that focuses exclusively on the syntactic or on the semantic side of split intransitivity is ultimately bound to provide only a very partial picture of the phenomena in this domain. While no formal model yet exists that can comprehensively account for the SIH, the SIH has given a new impetus to the search for such a model. Theoretical research inspired by the SIH has in fact been developed within different frameworks and for different languages (e.g.
Bentley and Eythórsson 2004; Cennamo and Sorace in press; Keller and Sorace 2003; Legendre in press; Legendre and Sorace 2003; Randall in press; Mateu 2003; among others). Developmental evidence for the SIH comes from research on second language acquisition (Montrul 2004, in press; Duffield 2003) and on first language attrition (Montrul in press). Core verbs are the first ones to be acquired with the correct auxiliary both in first and second language acquisition. Data from the acquisition of Italian as a non-native language show that the syntactic properties of auxiliary selection are acquired earlier with core verbs and then gradually extended to more peripheral verb types (Sorace 1993a, 1995), although L2 learners do not attain the same gradient intuitions as those displayed by native Italians. Moreover, Italian learners of French find it more difficult to acquire avoir as the auxiliary for verbs closer to the core than for peripheral verbs (Sorace
1993b, 1995), and do not completely overcome this difficulty even at the advanced level. A study by Montrul (in press) confirms this pattern for L2 learners of Spanish, who have determinate intuitions on the syntactic correlates of split intransitivity in this language, but only with core verbs. These developmental regularities suggest two things. First, the acquisition of the syntax of unaccusatives crucially depends on the internalization of both the hierarchical ordering of meaning components, and the lexicon–syntax mapping system instantiated by the target language. The pattern uncovered by these data is consistent with an enriched constructional model, equipped with a checking mechanism that is sensitive to the degree of lexical specification of verbs and rules out impossible mappings (see Mateu 2003). As it is the position of verbs on the SIH, rather than their frequency, which determines the order of acquisition, it seems that L2 learners do rather more than engaging in the kind of statistical learning envisaged by a basic constructional model. Second, and more generally, there are two sides to the split intransitivity question: a syntactic side (the structural configuration that determines unaccusativity or unergativity) and a lexicon–syntax interface side (the mapping system that decides the syntactic behaviour of any given verb). Gradedness and indeterminacy in native grammars, as well as learning difficulties and residual problems in non-native grammars, tend to be situated on the interface side: the syntactic distinction itself is categorically stable.
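The ordering in Figure 6.1 and the notion of distance from the core can be made concrete with a small Python sketch. It assumes, purely for illustration, that the seven classes are equally spaced and that distance can be counted in steps to the nearer end of the hierarchy; the SIH itself makes no such numerical claim, and the function names are hypothetical.

```python
# Verb classes in the order given in Figure 6.1, from categorical
# unaccusative syntax (first) to categorical unergative syntax (last).
SIH = [
    "change of location",
    "change of state",
    "continuation of state",
    "existence of state",
    "uncontrolled process",
    "motional process",
    "non-motional process",
]


def distance_from_core(verb_class):
    """Steps to the nearer end of the hierarchy (0 means a core class).

    Equal spacing between classes is an illustrative assumption.
    """
    i = SIH.index(verb_class)
    return min(i, len(SIH) - 1 - i)


def nearer_core(verb_class):
    """Name the core that the class is closer to, or 'equidistant'."""
    i = SIH.index(verb_class)
    last = len(SIH) - 1
    if i < last - i:
        return "unaccusative"
    if i > last - i:
        return "unergative"
    return "equidistant"


for verb_class in SIH:
    print(verb_class, distance_from_core(verb_class), nearer_core(verb_class))
```

On this toy encoding the stative classes in the middle of the list receive the largest distances, which mirrors the statement that stative, non-agentive verbs are the most indeterminate.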
6.3 The syntax–discourse interface in language development
Developmental research points to the conclusion that the same areas of grammar appear to be unstable in other domains of language development and change, regardless of the circumstances in which development takes place. Areas of grammar that have been found to be particularly vulnerable to variable crosslinguistic influence in different bilingual populations are those that involve the coordination of syntax and discourse. The obvious questions are why this should be so, whether crosslinguistic influence is the only cause of these phenomena, and whether the source of crosslinguistic influence is the same for each bilingual group. Before addressing these questions, the convergence between two developmental domains, L2 acquisition and L1 attrition, will be briefly illustrated.
6.3.1 Endstate grammars
One of the characteristics of L2 advanced grammars that has received attention recently is residual optionality, that is unsystematic L1 effects surfacing in
L2 speakers’ production.2 A much-discussed case is subject realization in null-subject languages spoken by non-native speakers. It is well established that null subjects in these languages are syntactically licensed but their distribution is governed by discourse-pragmatic features (Rizzi 1982; Cardinaletti and Starke 1999). In Italian, a typical agreement-licensed null-subject language, sentences such as (6.1a) are possible, whereas the equivalent sentence in English, (6.1b), is not:
(6.1) a. È partito
         is-3rd SG gone.
      b. *Is gone.
Moreover, in Italian the option of a null or overt subject is conditioned by pragmatic factors, such as the [topic-shift] and the [focus] feature (Grimshaw and Samek-Lodovici 1998). Thus in (6.2), an overt pronoun lui in the subordinate clause can be co-referential with the complement Pietro, or with an extralinguistic referent, but not with the matrix subject Gianni. In contrast, a null pronoun in the same context signals co-referentiality with the topic Gianni.
(6.2) Gianni_i ha salutato Pietro_k quando pro_i / lui_*i/k/j l’ha visto.
      Gianni has greeted Pietro when pro / he him-saw
      ‘Gianni greeted Pietro when he saw him.’
A characteristic feature of English near-native speakers of Italian (Sorace 2000a, 2003b; Filiaci 2003; Belletti et al. 2005) is that they optionally produce (6.3b), where a monolingual Italian speaker would produce (6.3c).
(6.3) a. Perchè Maria è andata via?
         why Maria is gone away?
      b. (perchè) lei ha trovato un altro lavoro.
         (because) she has found another job
      c. (perchè) ___ ha trovato un altro lavoro.
         (because) has found another job
In contrast, the same speakers never produce a null pronoun when there is a shift of topic, as in (6.4b), or when the subject is contrastive, as in (6.5b).
(6.4) a. Perchè Maria non ha parlato con nessuno?
      b. Perchè *Ø (= Gianni) non l’ha neanche guardata
         because Ø (= Gianni) didn’t even look at her
(6.5) a. Maria ha detto che andava da Paolo?
         Maria has said that was going to Paolo’s?
      b. *No, Ø (= Paolo) ha detto che andava da lei
         No, Ø said that he was going to her.
2 Optionality is regarded here as the pre-condition for gradedness: the term refers to the co-existence in the same grammar of two alternative ways of expressing the same semantic content, of which one appears to be preferred over the other by the speaker in production and comprehension, creating gradedness effects (Sorace 2000b).
A similar pattern obtains for the position of subjects with respect to the verb. In answer to an all-focus question, such as ‘what happened’, L1 English near-native speakers of Italian optionally place the subject in preverbal position (6.6b), whereas native Italians would naturally place it after the verb (6.6c). This also happens in a narrow-focus context (6.7b), in which Italian requires the topic to be in postverbal position (6.7c).
(6.6) a. Che cosa è successo? ‘What happened?’
      b. Gianni è partito.
         Gianni is-3s left
      c. È partito Gianni
         is-3s left Gianni
(6.7) a. Chi ha starnutito? ‘Who sneezed?’
      b. Gianni ha starnutito
         Gianni has-3s sneezed
      c. (Ha starnutito) Gianni
         (Has-3s sneezed) Gianni
These patterns are noticeably asymmetric: near-native speakers of Italian overgeneralize overt subject pronouns and preverbal subjects to contexts which would require null subjects and postverbal subjects in native Italian, but they do not do the reverse, namely they do not extend null and postverbal subjects to inappropriate contexts. In fact, when they use null pronouns and postverbal subjects, they use them correctly. These speakers therefore have acquired a null-subject grammar. The optionality in their grammar does not affect the syntactic licensing of null subjects, but is at the level of the discourse conditions on the distribution of pronominals and on the placement of subjects.3 It is worth noting that although the behaviour of native speakers is statistically different from that of near-natives, it is not categorical. In a small number of cases, native speakers also over-produce overt subjects and postverbal subjects in inappropriate discourse contexts. The significance of this detail lies in the fact that the options favoured by near-native speakers are (strongly) dispreferred by natives, but they are not illicit in their grammar.
3 The few existing studies on near-native L2 grammars point to a similar split between purely syntactic constraints, which are completely acquired, and interpretive conditions on the syntax, which may or may not be acquired. See Sorace (2003b, in press) for details.
6.3.2 L1 attrition
There is evidence that the same pattern of asymmetric optionality is exhibited by native speakers of null subject languages who have had prolonged exposure to English. Research on changes due to attrition from another language (Sorace 2000a, Tsimpli et al. 2004) indicates that native Italians who are near-native speakers of English exhibit an identical pattern of optionality as the English near-native speakers of Italian described above: these speakers overgeneralize overt subjects and preverbal subjects to contexts which require a null subject or a postverbal subject. The reverse pattern is not found. It is worth noting that the phenomenon is found both in production and in comprehension. For example, in the forward anaphora sentences in (6.8b), speakers under attrition are significantly more likely than monolingual Italians to judge the overt pronoun as coreferential with the matrix subject ‘Maria’; however, the null pronoun in (6.8a) is correctly interpreted as referring to the matrix subject. These speakers are also more likely to produce sentences such as (6.9a) regardless of whether the subject is definite or indefinite, whereas monolingual speakers would prefer a postverbal subject, as in (6.9b), particularly when the subject is indefinite.
(6.8) a. Mentre pro attraversa la strada, Maria saluta la sua amica
         while is crossing the street, Maria greets her friend
      b. Mentre LEI attraversa la strada, Maria saluta la sua amica
         while she is crossing the street, Maria greets her friend
(6.9) a. Hai sentito che un palazzo/il palazzo dell’ONU è crollato?
         Have-you heard that a building/the UN building collapsed?
      b. Hai sentito che è crollato un palazzo/il palazzo dell’ONU?
         Have-you heard that is collapsed a building/the UN building?
Thus, there is a parallelism between the end-state knowledge of English near-native speakers of Italian and the native knowledge of Italian near-native speakers of English under attrition with respect to null/overt subjects and
pre/postverbal subjects: the speakers’ grammar is and remains a null-subject language. The computational features of grammar responsible for the licensing of null subjects are acquired completely, and are not affected by attrition.4 More generally, changes induced by attrition in individual speakers primarily affect morphosyntactic features that are interpretable at the interface with conceptual systems. The affected features may become unspecified, giving rise to optionality. Thus, attrition is expected to affect the use of overt subjects in L2 Italian (given that this is regulated by the interpretable [topic-shift] and [focus] features). If these features become unspecified, overt subjects in Italian under attrition are not necessarily being used or interpreted as shifted topics or foci.5 The lexicon–syntax interface conditions governing the syntactic behaviour of intransitive verbs are also vulnerable to attrition. Montrul’s (in press) study on generational attrition in Spanish heritage speakers found that attrition affects the mappings of individual verbs onto unaccusative/unergative syntax: these speakers do not show a sensitivity to different subclasses represented on the SIH, and their determinate intuitions are restricted to core verbs. Montrul’s study shows that this is the same pattern obtained for L2 learners of Spanish. Once again, both bilingual groups have a robust syntactic representation of split intransitivity, but exhibit instability with respect to the lexicon–syntax interface conditions regulating the distribution of verbs into one syntactic class or the other.
6.4 Interpreting gradedness and optionality
At this point we need a generalization that describes these converging patterns of results. A first approximation might be the following:
4 Other cases of selective attrition at interfaces are discussed in Montrul (2002) with respect to the tense/aspect domain in Spanish; Polinsky (1995) with respect to the distinction between reflexive and possessive pronouns in Russian; Gürel (2004) on pronominals in Turkish.
5 Studies on bilingual first language acquisition converge with the results of research on L2 acquisition and L1 attrition. The syntax–pragmatics interface has been identified as a locus of crosslinguistic influence between the bilingual child’s syntactic systems (Müller and Hulk 2001). Bilingual children who simultaneously acquire a null-subject language and a non-null-subject language overproduce overt subjects in the null-subject language (see Serratrice 2004 on Italian–English bilinguals; Paradis and Navarro 2003 on Spanish–English bilinguals; Schmitz 2003 on Italian–German bilinguals). Thus, crosslinguistic effects obtain only from the non-null-subject language to the null-subject language and never in the other direction, regardless of dominance.
(6.10) ‘Narrow’ versus ‘Interface’ syntax:
• Non-interpretable features that are internal to the computational system of syntax proper and drive syntactic derivations are categorical in native grammars; are acquired successfully by adult L2 learners; and are retained in the initial stages of individual attrition.
• Interpretable features that ‘exploit’ syntactic options and belong to the interface between syntax and other domains, such as the lexicon, discourse, or pragmatics, may exhibit gradedness in native grammars; may present residual optionality in near-native grammars, due to the influence of the native language even at the most advanced competence stage; and are vulnerable to change in individual attrition.6
This generalization, which is compatible with theoretical assumptions in the minimalist programme (Chomsky 1995), assumes the existence of different ‘layers’ of syntactic knowledge and places these phenomena at the level of syntactic representations: hence, within the speaker’s grammatical competence. Interpretable features at interfaces are more vulnerable to underspecification, in both native and non-native grammars, and are therefore more prone to gradedness and optionality (Sorace and Keller 2005). This is the analysis adopted by Tsimpli et al. (2004) for L1 attrition: no evidence of attrition is found in the parameterization of purely formal syntactic features, whereas attrition is evident with respect to the distribution of subjects in appropriate pragmatic contexts, which is regulated by interpretable interface features. A similar differentiation between narrow syntax and interface properties may be found in the domain of split intransitivity. The unaccusative–unergative distinction is a syntactically represented, potentially universal property, and it belongs to narrow syntax. As argued in much recent research, the syntactic configuration that determines the unaccusativity of verbs contains a telicity aspectual projection (Borer 1994; van Hout 2000).
Core unaccusative verbs are inherently lexically specified for telicity and categorically project onto the unaccusative configuration: they are determinate, acquirable, and stable. Knowledge of split intransitivity, however, also involves mastery of the behaviour of non-core verbs and their compositional interpretation in the predicates in which they appear: this is acquired gradually through exposure to particular verbs in specific aspectual contexts; gives rise to variable intuitions; and is unstable in a situation of attrition. Recent proposals in syntactic theory further refine the distinction between narrow syntax and interface properties. The distinction between formal licensing of a null pro and the discourse-related conditions on postverbal subjects, as well as on the use/interpretation of subject pronouns, is highlighted by the ‘cartographic’ theoretical framework (Belletti 2004; Rizzi 2004). It is assumed within this theory that the low part of the clause contains discourse-related positions, labelled ‘Topic’ and ‘Focus’, which constitute a clause-internal VP periphery. Postverbal subjects fill one of these dedicated positions according to their interpretation in discourse contexts. For example, a sentence such as (6.11b) is associated with the representation in (6.12), where the subject fills the specifier of the (new information) low Focus projection and is therefore interpreted as conveying new information; the preverbal subject position is occupied by pro:
(6.11) a. Chi ha starnutito?
b. Ha starnutito Gianni
(6.12) [CP . . . [TP pro . . . ha starnutito . . . [TopP [FocP Gianni [TopP [VP . . . .] ] ] ] ] ]
According to this position, the formal syntactic licensing of pro is a necessary, but not sufficient condition for VS, since the postverbal subject also requires activation of the VP periphery.
6 At first sight, it may appear as if the generalization just presented contradicts decades of L2 acquisition research. In particular, early research showed that semantically more transparent properties are easier to learn than more abstract syntactic properties that do not correspond in any clear way to semantic notions (see e.g. Kellerman 1987). Moreover, studies of the ‘basic variety’ argued that early interlanguage grammars favour semantic and pragmatic principles of utterance organization (Klein and Perdue 1997). However, the argument here is NOT that syntactic aspects are easier than semantic aspects, but that aspects of grammar that require not only syntactic knowledge, but also the integration of syntactic knowledge with knowledge from other domains are late acquired, or possibly never completely acquired by L2 learners.
The experimental data illustrated earlier indicate that it is precisely this further condition that remains problematic in near-native L2 speakers of Italian: these speakers often fail to activate the VP-internal focus position required by focalization in Italian.7 The picture is further complicated by the existence of different phenomena that involve an interface between syntax and discourse. Do they all present gradedness and optionality? At a developmental level, there are theoretical and empirical arguments in favour of a distinction between discourse-related phenomena that are also relevant to LF, and those that are relevant only to the syntax–discourse interface. LF-relevant phenomena may pose developmental problems at intermediate stages, but they are ultimately acquired; LF-irrelevant phenomena raise persistent problems at all stages. Moreover, the former normally determine grammaticality effects, whereas degrees of preference are associated with the latter. For example, syntactic focusing in languages such as Hungarian and Greek involves the formation of an operator-variable structure at LF (cf. Kiss 1998; Szendroi 2004), which causes verb-raising to C/F and associated grammaticality effects, as opposed to degrees of preference. Focus movement of nonsubject arguments as well as adverbs and participles is unproblematic both in advanced L2 speakers of Greek and in native Greek speakers in an attrition situation. In contrast, both groups of speakers exhibit more variable performance on overt subject pronouns, a discourse-related phenomenon (Tsimpli and Sorace 2005). The differences observed between these structures may be due to the fact that discourse, in the sense of pragmatic conditions on the distribution of subject pronouns, is outside grammar proper, whereas the LF-interface is affected by modular computations within the language system. Even when L2 speakers attain native-like knowledge of properties relevant to LF representations, optionality and crosslinguistic effects remain possible at the discourse level where pragmatic and processing constraints affect L2 use. These theoretical refinements have begun to unravel the complexity of the factors that determine instability at interfaces. In doing so, however, they have also magnified the fundamental ambiguity of the notion of interface and thus the difficulty of establishing the origins of interface instability. Do interfaces give rise to indeterminacy at the representational level, or is gradedness a phenomenon external to syntactic representations? The ambiguity is apparent in many recent studies.
7 The result is the use of focus in-situ, namely an L1-based strategy that is more economical because it involves a DP-internal focus position (as the one overtly manifested in a sentence like ‘John himself sneezed’). It is worth noticing that L1 French speakers of L2 Italian often use clefting in the same context (Leonini and Belletti 2004), which is an alternative way of activating the VP-periphery (as shown in the example below) and is widely available in French.
(i) Ce . . . . [Top [Foc [Top [VP être [sc Jean [CP qui a éternué] ] ] ] ] ]
For example, Jakubowicz (2000) argues for the relevance of the notion of syntactic complexity in research on early normal and SLI child grammars, claiming that: (a) constructions requiring the integration of syntactic knowledge and knowledge from other domains are more complex than constructions requiring syntactic knowledge only; and (b) a syntactic operation is less complex if it is obligatorily required in every sentence; it is more complex if it is present only in some sentences because of semantic or pragmatic choices. The felicitous use of complex constructions, according to this definition, demands the simultaneous mastery of both the morphosyntactic properties of given constructions and of the discourse conditions governing their distribution and use.8 But is ‘complexity’ related to problems internal to the speaker’s representation of syntactic knowledge, or are these problems external to these representations and resulting from processing difficulties in integrating knowledge from different domains? L2 studies on other potentially problematic interfaces (e.g. the syntax–morphology interface (Lardiere 1998; Prévost and White 2000; White 2003)) point towards the latter explanation, suggesting that persistent problems with inflectional morphology in end-state grammars may in fact be ‘surface’ problems related to the retrieval of the correct morphological exponents for abstract syntactic features. The fact that learners’ problems tend to be with missing inflection, as opposed to wrong inflection, suggests the existence of difficulties at the level of access to knowledge, rather than with knowledge itself, which lead to the optional use of ‘default’ underspecified forms. The choice of referential pronouns in Italian qualifies as complex, since it demands the simultaneous mastery of both morphosyntactic properties and discourse conditions. In contrast, referential subject pronouns in English are less complex because there is no choice of different forms that is conditioned by discourse factors.9 Similar differentiations in terms of complexity can be made for some manifestations of split intransitivity. Perfective auxiliaries in Italian are more complex than in English, because only Italian requires a choice of auxiliaries governed by lexical-semantic and aspectual features of the verb. Auxiliary choice in Italian is also more complex than in French, because in French the only verbs that take BE are those inherently specified for telicity, and therefore there is no auxiliary selection dependent on the evaluation of the properties of the predicate.
8 Avrutin (2004) goes a step further and regards ‘discourse’ as ‘a computational system (my emphasis) that operates on non-syntactic symbols and is responsible for establishing referential dependencies, encoding concepts such as ‘‘old’’ and ‘‘new’’ information, determining topics, introducing discourse presuppositions, etc’. Investigating the interface between syntax and discourse necessarily requires going beyond ‘narrow syntax’.
Within Italian, auxiliary selection with core verbs is less complex than with non-core verbs, because selection with the latter has to take into account both the properties of the verb and other characteristics of the predicate. It is therefore possible to propose an alternative generalization on the nature of interfaces:
(6.13) Processing complexity
• Structures requiring the integration of syntactic knowledge and knowledge from other domains are more complex than structures requiring syntactic knowledge only.
• Complex structures may present gradedness and variation in native grammars; may pose residual difficulties to near-native L2 speakers; may pose emerging difficulties to L1 speakers experiencing attrition from a second language because of increasingly frequent failure to coordinate/integrate different types of knowledge.
9 As pointed out by a referee, the interface with discourse conditions obviously affects other aspects of pronominal use in English, such as the distribution of stress.
This hypothesis finds a fertile testing ground in recent research on L2 processing. For L2 speakers, recent evidence from on-line psycholinguistic (Felser et al. 2003; Kilborn 1992) and neuroscience experiments (particularly ERPs, see Hahne and Friederici 2001; Sabourin 2003) indicates that syntactic processing (i.e. access to ‘Narrow Syntax’) continues to be less than optimally efficient in non-native speakers even at advanced levels. If syntactic processing is less efficient in L2 speakers than in L1 speakers, the coordination of syntax with other domains is affected because speakers have insufficient processing resources to carry it out. When coordination fails, speakers resort to the most ‘economical’ option. Crosslinguistic influence from English may thus not be the only cause of the over-use of overt subject pronouns in Italian–English bilinguals. Rather, this behaviour is favoured by two concomitant factors: on the one hand, the availability of the English option, which is economical in processing terms; on the other, the speakers’ sub-optimal processing resources.10 Factors related to inadequate parsing resources also figure prominently in a recent proposal on the nature of language learners’ grammatical processing by Clahsen and Felser (in press). Accounting for L2 speakers’ divergent behaviour, according to this proposal, does not necessarily involve positing ‘representational deficits’: L2 speakers can, and indeed do, attain target representations of the L2, but may compute incomplete (‘shallow’) syntactic parses in comprehension. Such shallow processing is often accompanied by reliance (or overreliance) on lexical, semantic, and pragmatic information, which can lead to seemingly trouble-free comprehension in ordinary communication.
If the notion of shallow processing is extended to production (see Sorace, in press, for discussion), one may plausibly assume that shallow processing may result in non-native speakers’ lack of activation of the VP periphery in narrow focus contexts. Interpreting these phenomena in the light of Clahsen and Felser’s hypothesis allows us to identify their source in the persistence of an L1-based discourse ‘prominent’ strategy, employed to compensate for the failure to compute the required L2 syntactic representation, despite the potential grammatical availability of the latter. In comprehension, shallow processing may similarly involve the optional lack of activation of the VP periphery, which is necessary to the reading of the postverbal subject as carrying focus on new information. The result may be, for example, an ‘old information’ interpretation of an indefinite postverbal DP. Analogous considerations may be extended to the different distribution of overt subject pronouns in near-native Italian. The non-native production and interpretation of overt subject pronouns may be the result of shallow processing of the interface mapping governing the use of overt subjects (e.g. the obligatory presence of the feature ‘topic shift’; see Tsimpli et al. 2004) and the consequent assimilation of strong Italian pronouns to the corresponding weak English pronouns, which, unlike Italian overt pronouns, can refer to topic antecedents. The strategy used in these circumstances would be different from the use of overt pronouns in a default form to relieve processing overload due, for example, to insufficient knowledge of (or access to) agreement inflection (Bini 1993; Liceras et al. 1999; Sorace in press). This account crucially involves the optionality of shallow processing, that is, the L2 speakers’ ability to perform full processing, at least at the near-native level. Shallow processing, in this sense, would be a relief strategy that is available to all speakers but is relied on especially by bilingual speakers.11 For this reason, native speakers should not be immune from occasional interface coordination difficulties, for example in situations of competing processing demands. Indeed, a study by Serratrice (2004) shows that older monolingual Italian children (aged 8+) overproduce overt referential subjects, although not to the same extent as English–Italian bilingual children. This seems to suggest that the ‘interface’ conditions relating subject pronouns to discourse factors are late acquired because they are more demanding.
10 A similar argument is developed by Rizzi (2002), who accounts for the presence of null subjects in early child English grammars by assuming that this is an option structurally available to the child, which also happens to be favoured in terms of limited processing resources.
As already mentioned, even Italian adult monolingual control groups in bilingual studies (Tsimpli et al. 2004; Filiaci 2003; Belletti et al. 2005) do not show categorically correct behaviour with respect to subject pronouns; they (sporadically) make unidirectional mistakes that always involve the inappropriate use of overt subjects in contexts that would favour null subjects.12
11 One should not lose sight of the fact that these difficulties are resolved in ways that betray the influence of universal factors. Optionality favours the retention and occasional surfacing of unmarked options which are subject to fewer constraints, consistent with typological trends (see Bresnan 2000).
12 The extension of overt subject pronouns to null subject contexts is attested in another situation in which knowledge of English is not a factor. Bini (1993) shows that Spanish learners of Italian up to an intermediate proficiency level use significantly more overt subjects than monolingual Italians and monolingual Spanish speakers. Since the two languages are essentially identical with respect to both the syntactic licensing of null subjects and the pragmatic conditions on the distribution of pronominal forms, L1 influence is not a relevant factor here. This pattern is therefore likely to be due exclusively to coordination difficulties leading to the use of overt subjects as a default option.
6.4.1 The role of external ‘destabilizing’ factors
Finally, the quantitative and qualitative characteristics of the input to which speakers are exposed may play a role in accounting for the instability found at interfaces in different speaker populations. The quantitative factor is evident in bilingual use. What L2 near-native speakers and L1 speakers under attrition have in common is the fact that their total exposure to the language is reduced compared to that of monolingual speakers: in the case of L2 speakers, because they started the process of L2 acquisition in adulthood; in the case of L1 speakers under attrition, because they are no longer exposed to the L1 in a continuous way.13 Qualitative differences may be less obvious but are equally relevant: both the near-native speakers of Italian and the native Italian experiencing attrition are likely to receive input from native Italians in a situation of attrition and from other non-native Italian speakers; they may be exposed to non-native Italian from their spouse, and to ‘bilingual’ Italian from their children. Thus, these speakers receive qualitatively different input that is consistent with, and reinforces, their own grammar. Gradedness and indeterminacy in split intransitivity is also fed and maintained, both in native and non-native speakers, by the input, which is categorical and uniform (and therefore rich in terms of frequency) for core verbs and variable for non-core verbs.14 It is intriguing to ask exactly what ‘destabilizing’ effects may be brought about by the quantitative and qualitative differences in the input to which speakers are exposed, and grammars are affected in different ways. One possible hypothesis is that quantitative differences are likely to affect processing abilities, because speakers have fewer opportunities to integrate syntax and other cognitive domains in interpretation and production; qualitative differences, on the other hand, may affect representations, because speakers would receive insufficient evidence for interface mappings. Generally, it seems that exposure to consistent input up to a certain threshold level is necessary both for acquiring and maintaining an efficient syntactic system.
13 Clearly the quantitative factor is also a function of age of first exposure: it cannot be considered in absolute terms. Thus, an L2 speaker may have been exposed to the language for many decades and still exhibit non-native behaviour compared to a younger native speaker who has been exposed to the language for a shorter time, but since birth.
14 Variation at interfaces may be regarded as the motor of diachronic change, because it is at this level that ‘imperfect acquisition’ from one generation to the next is likely to begin. Sprouse and Vance’s (1999) study of the loss of null subjects in French indicates that language contact created a situation of prolonged optionality, that is competition between forms that make the same contribution to semantic interpretation, during which the null-subject option became progressively dispreferred in favour of the overt-subject option because it is the less ambiguous in processing terms. An analogous situation is experienced by the native Italian speaker after prolonged exposure to English: this speaker will be exposed both to null pronouns referring to a topic antecedent (in Italian) and to overt pronouns referring to a topic antecedent (in English, and also in the Italian of other native speakers in the same situation). Optionality, and competition of functionally equivalent forms, is therefore as relevant in this situation as in diachronic change. The diachronic loss of auxiliary choice in Romance languages may also be traced as beginning from non-core verbs and gradually extending to core verbs (Sorace 2000b; Legendre and Sorace 2003).
6.5 Conclusions
To conclude, I have presented evidence of gradedness and optionality in native and non-native grammars whose locus seems to be the interface between syntactic and other cognitive domains. There are two potential explanations for these patterns. One involves underspecification at the level of knowledge representations that involves the interaction of syntax and other cognitive domains, such as lexical-semantics and discourse-pragmatics. The other involves insufficient processing resources for the coordination of different types of knowledge. Furthermore, there are different kinds of interfaces, not all of which are susceptible to gradedness effects either in stable or in developing grammars. Behavioural and neuropsychological evidence suggests that syntactic processes are less automatic in L2 speakers than in L1 speakers, which in turn may increase coordination difficulties. L2 speakers may also have inadequate resources to carry out the right amount of grammatical processing required by on-line language use, independently of their syntactic knowledge representations. The processing and the representational explanations, however, do not necessarily exclude each other, and indeed seem to work in tandem, particularly in the case of bilingual speakers. Furthermore, syntactic representations and processing abilities may be differentially affected over time by quantitative and qualitative changes occurring in the input to which speakers are exposed. Future research is needed to ascertain the plausibility, and work out the details, of a unified account of gradedness and optionality in native and non-native grammars.
7 Decomposing Gradience: Quantitative versus Qualitative Distinctions
MATTHIAS SCHLESEWSKY, INA BORNKESSEL, AND BRIAN MCELREE
7.1 Introduction
Psycho- and neurolinguistic research within the last three decades has shown that speaker judgements are subject to a great deal of variability. Thus, speakers do not judge all sentences of a given language equally acceptable that are assumed to be grammatical from a theoretical perspective. Likewise, ungrammatical sentences may also vary in acceptability in rating studies conducted with native speakers. These findings stand in stark contrast to the classical perspective that grammaticality is categorical in that a sentence is either fully grammatical or fully ungrammatical with respect to a particular grammar. This apparent contradiction has, essentially, been approached from two different directions. On the one hand, it has been proposed that judgement variability, or gradience, results from extra-grammatical ‘performance factors’ and that it therefore has an origin distinct from linguistic ‘competence’ (Chomsky 1965). Alternatively, the gradience of linguistic intuitions has been described in terms of varying markedness of the structures in question. Rather than appealing to variation in grammaticality, this latter approach introduces and appeals to an additional grammar-internal dimension. The idea that structures can vary in acceptability for grammar-internal reasons has found expression in the use of question marks, hash marks, and the like to describe the perceived deviation from the endpoints of the grammaticality scale. Importantly, it must be kept in mind that judgements of acceptability, whether they are binary judgements or numerical ratings, represent unidimensional assessments of what is inherently a multidimensional signal. In essence, intuitions of acceptability reflect the endpoint of a complex sequence of processes underlying sentence comprehension or production. Consequently, it is possible that two sentence structures judged to be equally unacceptable may be unacceptable for rather different reasons. Consider, for example, the following three German examples:
(7.1) a. Dann hat der Lehrer dem Jungen den Brief gegeben.
then has [the teacher]NOM [the boy]DAT [the letter]ACC given
‘Then the teacher gave the letter to the boy.’
b. ??Dann hat dem Jungen den Brief der Lehrer gegeben.
then has [the boy]DAT [the letter]ACC [the teacher]NOM given
‘Then the teacher gave the letter to the boy.’
c. *Dann hat der Lehrer gegeben dem Jungen den Brief.
then has [the teacher]NOM given [the boy]DAT [the letter]ACC
Example (7.1a) illustrates the canonical argument order in German: nominative > dative > accusative. In (7.1b), two argument permutations have resulted in the order dative > accusative > nominative. Argument serializations of this type are typically analysed as grammatical, but are highly marked. Example (7.1c), by contrast, is ungrammatical because of the positioning of the participle, which should be clause-final. Interestingly, structures such as (7.1b) and (7.1c) are consistently judged to be equally (un-)acceptable in rating studies of various types, including questionnaire studies (Pechmann et al. 1996) and speeded acceptability ratings (Röder et al. 2002; Fiebach et al. 2004). Thus, it is not possible to discriminate between the two sentence types by relying on linguistic intuitions alone. Fortunately, however, other measures can effectively discriminate between the structures in (7.1). In a recent study using functional magnetic resonance imaging (fMRI) to map the brain areas involved in the processing of sentences such as (7.1), Fiebach et al. (2004) showed that the observed pattern of acceptability can be traced back to two distinct sources of neural activation (Figure 7.1). Whereas complex grammatical sentences (e.g. 7.1b) gave rise to an enhanced activation in Broca’s area (the pars opercularis of the left inferior frontal gyrus, BA 44), ungrammatical structures (e.g. 7.1c) engendered activation in the posterior deep frontal operculum. The data reported by Fiebach et al. (2004) thus provide a compelling demonstration that overt judgements of sentence acceptability (or grammaticality) may not provide an adequate means of determining the underlying differences in acceptability of various sentence structures.
Figure 7.1. A schematic illustration of the activations elicited by the complexity (scrambling) and the grammaticality manipulation in Fiebach et al. (2004). Panel labels: scrambling effect; ungrammaticality effect.
In this chapter, we draw upon studies of word order variation in German to examine how linguistic judgements emerge from the real-time comprehension processes. On the basis of a number of empirical observations, we argue that gradient data need not be interpreted as evidence against categorical grammars. Rather, gradience can arise from a complex interaction between grammar-internal requirements, processing mechanisms, general cognitive constraints1 and the environment within which the judgement task is performed. We begin by describing the critical phenomena from a behavioural perspective (Section 7.2), before turning to experimental methods providing more fine-grained data (event-related brain potentials, ERPs; speed-accuracy trade-off, SAT) (Section 7.3).
7.2 The phenomenon: argument order in German
Argument order variations in German are typically classified along several dimensions. First, a permuted argument may occupy either the sentence-initial position (the Vorfeld) or reside in a clause-medial position (in the Mittelfeld).2 Secondly, the type of permuted argument (wh-phrase, pronoun, etc.) is also of crucial importance. In this way, four permutation types are distinguished: topicalization (7.2a), wh-movement (7.2b), scrambling (7.2c), and pronoun ‘movement’ (7.2d).
1 It has been shown, for example, that the final interpretation of a sentence may vary interindividually as a function of general cognitive capacity. Thus, researchers have distinguished between fast and slow comprehenders (e.g. Mecklinger et al. 1995), good and poor comprehenders (e.g. King and Kutas 1995), high and low verbal working memory capacity as measured by the reading span test (King and Just 1991) and individual alpha frequency (Bornkessel et al. 2004b). However, a discussion of these factors is beyond the scope of this chapter.
2 The Mittelfeld is the region of the German clause that is delimited to the left by a complementizer (subordinate clauses) or finite verb in second position (main clauses) and to the right by a clause-final participle or particle.
(7.2) a. Topicalization (Vorfeld, -wh)
Den Arzt hat wahrscheinlich der Journalist eingeladen.
[the doctor]ACC has probably [the journalist]NOM invited
‘The journalist most likely invited the doctor.’
b. Wh-movement (Vorfeld, +wh)
Welchen Arzt hat wahrscheinlich der Journalist eingeladen?
[which doctor]ACC has probably [the journalist]NOM invited
‘Which doctor did the journalist most likely invite?’
c. Scrambling (Mittelfeld, non-pronominal)
Wahrscheinlich hat den Arzt der Journalist eingeladen.
probably has [the doctor]ACC [the journalist]NOM invited
‘The journalist most likely invited the doctor.’
d. Pronoun ‘movement’ (Mittelfeld, pronominal)
Wahrscheinlich hat ihn der Journalist eingeladen.
probably has [him]ACC [the journalist]NOM invited
‘The journalist most likely invited him.’
In addition to these four theoretically motivated permutation types, psycho- and neurolinguistic studies implicate an additional dimension, namely whether the permuted argument is unambiguously case marked (e.g. den Arzt, ‘[the (male) doctor]ACC’) or case ambiguous (e.g. die Ärztin, ‘[the (female) doctor]NOM/ACC’). From a comprehension perspective, the difference between unambiguous and ambiguous case marking lies in the fact that the former immediately signals the presence of an argument order variation, while the latter does not. Empirical evidence indicates that, when faced with an ambiguity, German speakers initially pursue a strategy in which the first ambiguous argument is analysed as the subject of the clause (e.g. Frazier and d’Arcais 1989; de Vincenzi 1991; Bader and Meng 1999). Only when information contradicting this analysis is encountered is a reanalysis of the clause initiated. In this way, the processes leading to the recognition of an argument order variation differ qualitatively in unambiguous and ambiguous situations.
The various types of argument order variations in German have been subject to a number of empirical investigations using different types of acceptability measurements. From these findings, the four central generalizations in (7.3) emerge.
The Nature of Gradience
(7.3) Generalizations with regard to the acceptability of argument order variations in German
(i) object-initial sentences are generally less acceptable than their subject-initial counterparts (Krems 1984; Hemforth 1993);
(ii) acceptability decreases with an increasing number of permutations (Pechmann et al. 1996; Röder et al. 2000);
(iii) the acceptability of object-initial sentences decreases when the permuted object is case ambiguous (Meng and Bader 2000a);
(iv) the acceptability difference between object- and subject-initial structures varies according to the following hierarchy: scrambling > topicalization > wh-movement > pronoun movement (Bader and Meng 1999).
The four generalizations summarized in (7.3) interact to produce the overt acceptability pattern seen in German argument-order variations, giving what appears to be a highly gradient set of linguistic intuitions. To cite just one example, Meng and Bader (2000b) report a 49 percent acceptability rate for the scrambling of ambiguous accusative objects. As participants were asked to provide yes–no judgements, this amounts to chance-level performance.3 However, the generalizations in (7.3) emerged almost exclusively from studies on the permutation of accusative objects in transitive sentences. By contrast, when the relative ordering between dative- and nominative-marked arguments is examined, at least two interesting exceptions to this general pattern become apparent. First, the severe drop in acceptability for scrambled (transitive) objects is attenuated when the object bears dative case marking: Schlesewsky and Bornkessel (2003) report an 86 percent acceptability rate for initially ambiguous dative-initial structures similar to those engendering a 49 percent acceptability rate for accusative-initial structures in the Meng and Bader (2000b) study.
Secondly, in sentences with dative object-experiencer verbs (which project an argument hierarchy in which the dative-marked experiencer outranks the nominative-marked stimulus), the acceptability decrease for object-initial orders is neutralized or even tends to be reversed (Schlesewsky and Bornkessel 2003; see Table 7.1). Dramatic differences of this sort call for an explanation. We believe that an adequate explanation requires, as a first step, an accurate, fine-grained characterization of the output signal that constitutes an acceptability judgement.
3 Note that this essentially amounts to the same level of performance that a non-human primate might be expected to produce when confronted with the same sentences and two alternative pushbuttons. As such, chance-level acceptability defies interpretation.
Table 7.1. Acceptability ratings for locally ambiguous subject- and object-initial sentences with dative active and dative object-experiencer verbs
Sentence type                              Mean acceptability in %
Subject-first, active verb                 92.3
Subject-first, object-experiencer verb     84.6
Object-first, active verb                  86.5
Object-first, object-experiencer verb      91.7
Source: Schlesewsky and Bornkessel 2003
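The contrast in Table 7.1 can be made explicit with a small worked calculation. The sketch below only restates the four reported means and subtracts them; the pairing of values with sentence types follows the row order of the table, and the helper name `object_first_cost` is a hypothetical label introduced for illustration.

```python
# Mean acceptability (%) as listed in Table 7.1 (Schlesewsky and Bornkessel 2003).
# The pairing of values and rows follows the order in which the table lists them.
ratings = {
    ("subject-first", "active"): 92.3,
    ("subject-first", "object-experiencer"): 84.6,
    ("object-first", "active"): 86.5,
    ("object-first", "object-experiencer"): 91.7,
}


def object_first_cost(verb_class):
    """Subject-first minus object-first acceptability, in percentage points.

    Hypothetical helper: a positive value means the object-first order is
    rated lower, a negative value means it is rated higher.
    """
    subject_first = ratings[("subject-first", verb_class)]
    object_first = ratings[("object-first", verb_class)]
    return round(subject_first - object_first, 1)


for verb_class in ("active", "object-experiencer"):
    print(verb_class, object_first_cost(verb_class))
```

On these figures the object-first order costs 5.8 percentage points with active verbs, whereas with object-experiencer verbs the sign flips (a difference of -7.1), which is the neutralization or reversal described in the text.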
Thus, we must examine how the judgement ‘emerges’ from the on-line comprehension process.
7.3 Sources of gradience
A natural first step in tracing the emergence of a linguistic judgement lies in the examination of the comprehension system’s initial response to the variation under consideration, that is, for present purposes, to the argument order permutation. How does the system react when it encounters an object before a subject and is there evidence for an immediate differentiation between the different permutation types? A methodological approach optimally suited to answering this question is the measurement of event-related brain potentials (ERPs; see Appendix 1). Because of their very high temporal resolution and their multi-dimensional characterization of neuronal activity, ERPs allow for an exquisitely precise differentiation of various cognitive processes, and, not surprisingly, many researchers have capitalized upon these properties to explore language processing.
Most of the argument order variations discussed above have been subjected to examinations using ERPs, but we restrict our discussion to order permutations in dative constructions and how these contrast with those in accusative structures. Moreover, we will focus primarily on findings for initially ambiguous structures, as these reveal the influence of processing considerations on acceptability ratings most clearly. From the perspective of a strong competence versus performance distinction, these structures might be considered a ‘worst case scenario’ and are as such well-suited to examining the limitations of supposedly time-insensitive linguistic judgements. The exploration of ambiguous sentences provides fruitful ground for investigating these issues. Consider, for example, the sentence fragment in (7.4):
(7.4) . . . dass Dietmar Physiotherapeutinnen . . .
. . . that [Dietmar]NOM/ACC/DAT [physiotherapists]NOM/ACC/DAT
When confronted with an input such as (7.4), the processing system initially analyses the first argument Dietmar as the subject of the clause (e.g. Hemforth 1993; Schriefers et al. 1995; Bader and Meng 1999; Schlesewsky et al. 2000). Accordingly, the second argument Physiotherapeutinnen, which does not contradict the initial assignment, is analysed as an object. If, however, the clause is completed by a plural verb such as beunruhigen (‘to disquiet’), the supposed subject of the clause no longer agrees with the finite verb. Thus, a reanalysis towards an object-initial order must be initiated in order for a correct interpretation to be attained. In terms of ERP measures, reanalyses are typically associated with a late (approximately 600–900 ms) positivity with a posterior distribution (P600; e.g. Osterhout and Holcomb 1992). Indeed, this component has also been observed for the reanalysis of argument order in German, for example in wh-questions (beim Graben et al. 2000), topicalizations (Frisch et al. 2002) and scrambled constructions (Friederici and Mecklinger 1996). Note, however, that all of these studies only manipulated the word order of accusative structures. A qualitative difference emerges when the reanalysis towards a dative-initial order is examined and compared to the reanalysis towards an accusative-initial order in an otherwise identical sentence (i.e. completing sentence fragments as in (7.4) with either an accusative or a dative verb, e.g. besuchen (‘to visit’) versus danken (‘to thank’)). While the reanalysis in
Figure 7.2. Grand average ERPs for object- and subject-initial structures at the position of the disambiguating clause-final verb (onset at the vertical bar) for sentences with accusative (A) and dative verbs (B). Negativity is plotted upwards. The data are from Bornkessel et al. (2004a) [Panels: (a) Accusative, (b) Dative; electrode CP3; conditions OBJECT−SUBJECT and SUBJECT−OBJECT.]
accusative structures gives rise to a P600 effect as discussed above, the revision towards a dative-initial word order elicits a centro-parietal negativity between approximately 300 and 500 ms post onset of the disambiguating element (N400; Bornkessel et al. 2004a; Schlesewsky and Bornkessel 2004). The difference between the two effects is shown in Figure 7.2. In accordance with standard views on the interpretation of ERP component differences (e.g. Coles and Rugg 1995), we may conclude from this distinction that reanalysis towards an object-initial order engages qualitatively different processing mechanisms in accusative and dative structures. The processes in question, which may be thought to encompass both conflict detection and conflict resolution, therefore reflect underlyingly different ways of resolving a superficially similar problem (i.e. the correction of an initially preferred subject-initial analysis).
Before turning to the question of whether the ERP difference between reanalyses towards accusative and dative-initial orders may be seen as a correlate of the different acceptability patterns for the two types of object cases (and thereby a source of gradience in this respect), we shall first examine a second exception to the generalizations in (7.3), namely the behaviour of dative object-experiencer verbs (e.g. gefallen, ‘to be appealing to’). As briefly discussed above, this verb class is characterized by an ‘inverse linking’ between the case/grammatical function hierarchy and the thematic hierarchy: the thematically higher-ranking experiencer bears dative case, while the lower-ranking stimulus is marked with nominative case. In the theoretical syntactic literature, it has often been assumed that these verbs are associated with a dative-before-nominative base order, which comes about when the lexical argument hierarchy (experiencer > stimulus) is mapped onto an asymmetric syntactic structure (e.g. Bierwisch 1988; Wunderlich 1997, 2003; Haider 1993; Haider and Rosengren 2003; Fanselow 2000).
These properties of the object-experiencer class lead to an interesting constellation for argument order reanalysis, which may again be illustrated using the sentence fragment in (7.4). As with the cases discussed above, the comprehension system initially assigns a subject-initial analysis to the input fragment in (7.4). Again, when this fragment is completed by a dative object-experiencer verb that does not agree with the first argument, reanalysis towards an object-initial order must be initiated. However, in contrast to the structures previously discussed, here the verb provides lexical information in support of the target structure, specifically an argument hierarchy in which the dative outranks the nominative. The ERP differences between reanalyses initiated by dative object-experiencer verbs and dative active verbs (which project a ‘canonical’ argument hierarchy) are shown in Figure 7.3 (Bornkessel et al. 2004a).
Figure 7.3. Grand average ERPs for object- and subject-initial structures at the position of the disambiguating clause-final verb (onset at the vertical bar) for sentences with dative active (A) and dative object-experiencer verbs (B). Negativity is plotted upwards. The data are from Bornkessel et al. (2004a) [Panels: (a) Active verbs, (b) Object-experiencer verbs; electrode P4; N400 window 0.350–0.550 s; conditions SUBJECT−OBJECT and OBJECT−SUBJECT.]
As Figure 7.3 shows, reanalyses initiated by a dative object-experiencer verb also engender an N400 component, rather than a P600. However, the N400 effect is less pronounced than for the analogous structures with dative active verbs. The difference between the two types of dative constructions is therefore quantitative rather than qualitative in nature. This suggests that reanalysis is more effortful in the case of dative active verbs, but that the same underlying processes may be assumed to take place with both verb classes. To a large extent, the ERP patterns mirror the acceptability judgements described above. First, there is a general difference between dative- and accusative-initial sentences: the former are not only more acceptable than the latter, they also engage qualitatively different processing mechanisms in reanalysis. Secondly, there is also a difference within the dative verbs themselves such that reanalysis towards a dative-initial order is less costly when it is triggered by an object-experiencer rather than an active verb. Nonetheless, reanalyses with both dative verb types appear to proceed in a qualitatively similar manner. However, despite this strong convergence of measures, it is unrealistic to expect a one-to-one mapping between the ERP data and the acceptability ratings. Indeed, not all the differences found in ERP measures are expressed in overt judgements. For example, the disadvantage for the object-initial
structures is measurable in ERPs even with dative object-experiencer verbs, while the difference between the two word orders is no longer visible in the acceptability rates. In order to precisely predict the relationship between the two types of measures we must fully understand how an overt judgement ‘emerges’ from the comprehension process. This requires tracing the development of the judgement from the point at which the problem is detected to later points when the system has settled on a final assessment of the acceptability of the structure. The speed–accuracy trade-off procedure (SAT; see Appendix 2) is one experimental method that allows for an examination of how a linguistic judgement develops over time. This method traces the emergence of an acceptability judgement from its beginnings (i.e. from the point at which the judgement departs from chance-level) up to a terminal point (i.e. a point at which the judgement no longer changes even with functionally unlimited processing time). Under the assumption that ERPs characterize the processing conflict and its resolution, while time-insensitive linguistic judgements reflect the endpoint of a multidimensional set of processing mechanisms, SAT procedures provide a bridge between the two measures. Let us first consider the SAT results for reanalysis towards a dative-initial order in sentences with dative active and dative object-experiencer verbs. The SAT functions for the four critical conditions are shown in Figure 7.4 (Bornkessel et al. 2004a). The SAT data shown in Figure 7.4 were best fit by an exponential approach to a limit function (Eq. 1) that assigned a distinct asymptote ($\lambda$)
Figure 7.4. SAT functions for object- and subject-initial structures with (dative) active and (dative) object-experiencer verbs. The data are from Bornkessel et al. (2004a) [Conditions: Subject-Object, Active; Object-Subject, Active; Subject-Object, Object-Experiencer; Object-Subject, Object-Experiencer. Axes: accuracy ($d'$) against processing time (lag plus latency) in seconds.]
to each of the four conditions and distinct intercepts ($\delta$) to the subject-initial and object-initial conditions, respectively.
(Eq. 1) $d'(t) = \lambda\,\bigl(1 - e^{-\beta(t-\delta)}\bigr)$ for $t > \delta$, otherwise 0
Here $\lambda$ is the asymptote, $\beta$ the rate of approach to the asymptote, and $\delta$ the intercept. The intercept difference between subject-initial and object-initial structures, with a longer intercept for object-initial structures, indicates that the final analysis of the object-initial sentences takes longer to compute than the final analysis of their subject-initial counterparts. This is the characteristic pattern predicted for a reanalysis operation: as reanalysis requires additional computational operations, the correct analysis of a structure requiring reanalysis should be reached more slowly than the correct analysis of an analogous structure not requiring reanalysis. The dynamics (intercept) difference occurs in exactly the same conditions as the N400 effect in the ERP experiment. The asymptotic differences appear to result from two sources. First, the object-initial structures are generally associated with lower asymptotes than the subject-initial controls. This difference likely reflects a decrease in acceptability resulting from the reanalysis operation required to interpret the object-initial sentences. A principled explanation for this pattern, one that is consistent with the concomitant dynamics differences, is that, on a certain proportion of trials, the processing system fails to recover from the initial misanalysis, thus engendering lower asymptotic performance for an initially ambiguous object-initial structure as compared to a subject-initial structure. More interesting, perhaps, are the differences in asymptote between the two object-initial conditions: here, the sentences with object-experiencer verbs were associated with a reliably higher asymptote than those with active verbs. This difference may directly reflect the differences in the accessibility of the object-initial structure required for a successful reanalysis.
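The shape of the function in Eq. 1 may be easier to see in code. The following is a minimal sketch of an exponential approach to a limit with asymptote, rate, and intercept parameters; the parameter values are hypothetical and chosen only to mimic the qualitative pattern described above (a later intercept and a lower asymptote for the object-initial condition). They are not the fitted estimates from Bornkessel et al. (2004a).

```python
import math


def sat_dprime(t, lam, beta, delta):
    """Exponential approach to a limit, as in Eq. 1.

    t     : processing time in seconds
    lam   : asymptote (terminal accuracy in d' units)
    beta  : rate of approach to the asymptote (1/s)
    delta : intercept, the time at which accuracy departs from chance (s)
    """
    if t <= delta:
        return 0.0
    return lam * (1.0 - math.exp(-beta * (t - delta)))


# Hypothetical parameter values, for illustration only.
subject_initial = dict(lam=3.5, beta=1.0, delta=0.4)
object_initial = dict(lam=3.0, beta=1.0, delta=0.7)

for t in (0.5, 1.0, 2.0, 4.0, 6.0):
    s = sat_dprime(t, **subject_initial)
    o = sat_dprime(t, **object_initial)
    print(f"t = {t:.1f} s  subject-initial d' = {s:.2f}  object-initial d' = {o:.2f}")
```

With these assumed values the object-initial curve is still at chance at 0.5 s while the subject-initial curve has already begun to rise, and the two curves level off at different heights, which is how an intercept difference and an asymptote difference show up in the function.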
Whereas the active verbs provide no specific lexical information in favour of such a structure, the object-experiencer verbs are lexically associated with an argument hierarchy calling for precisely this type of ordering. Thus, while a garden path results for both verb types, the object-experiencer verbs provide a lexical cue that aids the conflict resolution. Again, the correspondence to the ERP data is clear: the higher asymptote for the object-initial structures with object-experiencer verbs (which we have interpreted as arising from the higher accessibility of the object-initial structure in these cases) corresponds to the reduced N400 for this condition, which also reflects a reduction of the reanalysis cost. Two conclusions concerning acceptability patterns follow from this analysis. First, despite the presence of an almost identical processing conflict in both cases, dative-initial structures with object-experiencer verbs are more
acceptable than those with active verbs because only the former are lexically associated with an object-initial word order. Secondly, however, even the presence of an object-experiencer verb can never fully compensate for the cost of reanalysis, as evidenced by the fact that an initially ambiguous dative-initial structure never outstrips its nominative-initial counterpart in terms of acceptability. From a surface perspective, therefore, the observed acceptability patterns are the result of a complex interaction between different factors. The observed gradience does not result from uncertainty in the judgements, but rather from interactions between the different operations that lead to the final intuition concerning acceptability. Having traced the emergence of the acceptability judgements for the two types of dative structures, a natural next step appears to be to apply the same logic to the difference between accusative and dative structures and to thus examine whether similar types of parallels between the on-line and off-line findings are evident in these cases. Recall that, while the reanalysis of a dative structure generally engenders an N400 effect in ERP terms, reanalysis towards an accusative-initial order has been shown to reliably elicit a P600 effect. In addition, the surface acceptability is much lower for the accusative than for the dative-initial structures. How might these findings be related? Essentially, the different ERP patterns suggest that the two types of reanalysis take place not only in a qualitatively different manner, but also in different phases of processing: while the N400 is observable between approximately 300 and 500 ms after the onset of a critical word, the time range of the P600 effect is between approximately 600 and 900 ms.
Thus, the reanalysis of an accusative structure appears to be a later process than the reanalysis of a dative structure and, in terms of the SAT methodology, we might therefore expect to observe larger dynamics differences between subject- and object-initial accusative sentences than in the analogous contrast for dative sentences. As discussed above, dynamics differences can subsequently lead to differences in terminal (asymptotic) acceptability and the distinction between the dative and the accusative structures might therefore also, at least in part, stem from a dynamic source. The difference between subject- and object-initial dative and accusative structures as revealed by an SAT paradigm is shown in Figures 7.5a and 7.5b (Bornkessel et al. submitted). As the accusative and dative sentences were presented in a between-subjects design, model fitting was carried out separately for the two sentence types. While the accusative structures were best fit by a $2\lambda$–$2\beta$–$2\delta$ model (adjusted $R^2 = .994$), the best fit for the dative structures was $1\lambda$–$1\beta$–$2\delta$ (adjusted $R^2 = .990$). Estimates of the composite dynamics (intercept + rate) were
Figure 7.5. SAT functions for object- and subject-initial structures with accusative (a) and dative verbs (b). The data are from Bornkessel et al. (submitted) [Panels: (a) Accusatives, (b) Datives; conditions S-INITIAL and O-INITIAL. Axes: accuracy in $d'$ units against processing time in seconds.]
computed for each condition using the formula $\delta + 1/\beta$, which provides a measure of the mean time required for the SAT function to reach the asymptote. The dynamics difference between the two accusative structures was estimated to be 588 ms (4430 ms for object- versus 3842 ms for subject-initial sentences), while the difference for the dative sentences was estimated at 332 ms (3623 ms versus 3291 ms). Thus, while both dative- and accusative-initial sentences show slower dynamics than their subject-initial counterparts, the dynamics difference between the subject- and object-initial accusative structures is approximately 250 ms larger than that between the dative structures. This finding therefore supports the hypothesis that the large
acceptability disadvantage for the initially ambiguous accusative-initial structures (and the corresponding asymptote differences for the accusative sentences) results to a large extent from the highly pronounced dynamics difference between these structures and the corresponding subject-initial sentences. In other words, the likelihood that the correct analysis fails to be computed is much higher in the accusative-initial sentences because the computational operations required to obtain this analysis are much more complex for this sentence type. Consequently, accusative-initial sentences are rejected as unacceptable in a higher proportion of trials, thereby yielding a lower acceptability rate/asymptote. Again, SAT provides a principled means of establishing the correspondence between the ERP data and the acceptability ratings. The reanalysis mechanisms that are reflected in an N400 effect (those that enable reanalysis towards a dative-initial structure) are also associated with a smaller dynamics increase than those reflected in a P600 component (those that enable reanalysis towards an accusative-initial structure). Therefore, we might speculate that the difference in the underlying neural processes, which is reflected in the different ERP components, gives rise to the concomitant differences in SAT dynamics and, thereby, to the differences in surface acceptability. If both of the SAT studies discussed here are considered together, an interesting difference between the two becomes apparent. In the first experiment, in which only dative structures were examined in a design identical to that used to obtain the acceptability rates in Table 7.1, there were reliable asymptote differences between dative- and nominative-initial structures with dative active verbs. In the second study, there were comparable dynamics differences, but the asymptote difference, although apparent in visual inspection, failed to significantly improve the model fit.
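The dynamics comparison reported above for the accusative and dative sentences reduces to simple arithmetic, which the following sketch spells out. It uses only the composite dynamics estimates quoted in the text (Bornkessel et al. submitted); the dictionary layout and the function name `reanalysis_delay` are hypothetical choices made for illustration.

```python
# Composite dynamics estimates in milliseconds, as quoted in the text.
dynamics_ms = {
    "accusative": {"object_initial": 4430, "subject_initial": 3842},
    "dative": {"object_initial": 3623, "subject_initial": 3291},
}


def reanalysis_delay(case):
    """Object-initial minus subject-initial composite dynamics, in ms."""
    estimates = dynamics_ms[case]
    return estimates["object_initial"] - estimates["subject_initial"]


accusative_delay = reanalysis_delay("accusative")  # 588
dative_delay = reanalysis_delay("dative")          # 332

print("accusative delay (ms):", accusative_delay)
print("dative delay (ms):", dative_delay)
print("additional delay for accusatives (ms):", accusative_delay - dative_delay)  # 256
```

The last line gives 256 ms, the figure that the text rounds to "approximately 250 ms".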
How can we account for this variation or inter-experimental gradience? Assuming that the asymptotic acceptability measured using SAT reflects the endpoint of processing and, thereby, the time-independent acceptability of a given structure, one plausible explanation appears to lie in the different experimental environments in which the structures were presented. It is well-known that sentence judgements are influenced by various factors including context, filler sentence type, etc. (Bard et al. 1996; Schütze 1996). One crucial difference between the two SAT studies is that dative object-experiencer verbs were only included in the first experiment. It may therefore be the case that the acceptability ratings for the object-initial dative active sentences arise not only from a contrast with the corresponding subject-initial sentences but also with the object-initial sentences with
experiencer verbs. In the face of more acceptable object-initial structures, the acceptability disadvantage for the object-initial active sentences may be ‘overestimated’. If true, this observation suggests that terminal acceptability may result from the interaction of a variety of different factors. Whatever the source of this discrepancy, it serves to highlight the highly variable nature of acceptability judgements, and to contrast these measures with those that more directly reflect intrinsic properties of the underlying processing mechanisms, such as the dynamics difference between object- and subject-initial structures, which may be assumed to be more stable and less subject to environmental influences.
7.4 Final remarks
In this chapter, we have attempted to show how linguistic judgements arise from different facets of language comprehension. In particular, our data suggest the following caveats concerning the interpretation of acceptability judgements:
1. One-dimensional judgements often result from the interaction of a variety of factors, and hence are inherently multi-dimensional.
2. Superficially similar judgements may stem from qualitatively different sources, which should be disentangled. Differences that may appear quantitative, for example as different ‘strengths’ on a single dimension, may nevertheless have qualitatively different origins.
3. Acceptability decreases may be dynamic or non-dynamic in nature.
Concerning gradience in linguistic judgements, these findings indicate that a considerable amount of variation in judgements may be accounted for by carefully considering factors that interact to produce the end state that constitutes an acceptability judgement. The question thus arises whether gradience should indeed be attributed to linguistic competence, or whether it is better described as a product of the language in use, that is, of processing mechanisms (which may or may not be language specific) and of general cognitive factors (e.g. working memory, see footnote 1). From our perspective, the burden of the evidence rests with the advocates of gradient grammaticality, for it appears very difficult to mount a convincing argument in favour of grammar-internal gradience on the basis of acceptability judgements alone. Thus, when all possible alternative sources of surface gradience are considered, a categorical grammar still appears to be the simplest and therefore most appealing means of accounting for the data.
7.5 Appendix 1. Event-related brain potentials (ERPs)
Event-related brain potentials (ERPs) are small changes in the spontaneous electrical activity of the brain, which occur in response to sensory or cognitive stimuli and which may be measured non-invasively by means of electrodes applied to the scalp. The high temporal resolution of ERP measures is of particular importance for the examination of language comprehension. Furthermore, ERP patterns (‘components’) are characterizable in terms of the following parameters: polarity (negative versus positive); topography (at which electrode sites an effect is visible); latency (the time at which the effect is visible relative to the onset of a critical stimulus); and amplitude (the ‘strength’ of an effect). While a number of language-related ERP components have been identified (cf., for example, Friederici 2002), we will not introduce these here for the sake of brevity. For a more detailed description of the ERP methodology and how it has been applied to psycholinguistic domains of investigation, the reader is referred to the overviews presented in Coles and Rugg (1995), Garnsey (1993), and Kutas and Van Petten (1994).
Figure 7.6. Schematic illustration of the ERP methodology
The ERP methodology only provides relative measures, that is, an effect always results from the comparison of a critical condition with a minimally differing control condition. For example, at the position of socks in He spread the warm bread with socks in comparison to the position of butter in He spread the warm bread with butter, a negativity with a centro-parietal distribution and a maximum at 400 ms post critical word onset (N400) is observable (Kutas and Hillyard 1980). Thus, in the experiments presented here, we always compare the response to a critical condition with that to a control condition at a particular (critical) position in the sentence. A schematic illustration of the ERP methodology is shown in Figure 7.6.
7.6 Appendix 2. Speed–accuracy trade-off (SAT)
Reading time (eye-movement tracking or self-paced) procedures are often used as a natural and unintrusive measure of processing time. However, these measures do not provide an estimate of the likelihood that readers have successfully processed a sentence and, conversely, do not provide a direct estimate of the time it takes to compute an interpretation. A reading time difference can reflect the time needed to compute a particular interpretation, but it can also reflect the likelihood that readers can compute that interpretation or how plausible readers find the resulting interpretation (McElree 1993, 2000; McElree and Griffith 1995, 1998; McElree and Nordlie 1999; McElree et al. 2003). A standard solution to this problem is to derive a full time–course function that measures how the accuracy of processing varies with processing time (Wickelgren 1977). The response-signal, speed–accuracy trade-off (SAT) procedure provides the required conjoint measures of processing speed and accuracy. The response-signal speed–accuracy trade-off task requires subjects to make their judgement of acceptability at particular times. This serves to chart the full time–course of processing, measuring when discrimination departs from a chance level, the rate at which discrimination grows as a function of processing, and the asymptotic level of discrimination accuracy reached with (functionally) unlimited processing time. Figure 7.7 presents illustrative SAT functions derived from this procedure. The accuracy of discriminating acceptable from unacceptable sentences is measured in $d'$ units (the z-transform of the probability of correctly accepting an acceptable sentence minus the z-transform of the probability of falsely accepting an unacceptable sentence).
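The definition of the accuracy measure just given can be written out directly. The following is a minimal sketch that takes the z-transform to be the inverse of the standard normal cumulative distribution function, as is conventional in signal detection theory; the function name `d_prime` and the example rates are hypothetical and are not data from the studies discussed here.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """Discrimination accuracy in d' units.

    hit_rate         : probability of accepting an acceptable sentence
    false_alarm_rate : probability of accepting an unacceptable sentence

    Both rates must lie strictly between 0 and 1; NormalDist.inv_cdf raises
    StatisticsError otherwise, so rates of exactly 0 or 1 would need some
    correction before this function is applied.
    """
    z = NormalDist().inv_cdf  # inverse CDF of the standard normal distribution
    return z(hit_rate) - z(false_alarm_rate)


# Hypothetical rates: equal hit and false-alarm rates give chance performance.
print(round(d_prime(0.5, 0.5), 3))
print(round(d_prime(0.9, 0.1), 3))
```

When acceptable and unacceptable sentences are accepted equally often the measure is zero, which is the chance level from which the SAT function departs at the intercept.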
Typical SAT functions display three distinct phases: a period of chance performance ($d' = 0$), followed by a period of increasing accuracy, followed by an asymptotic period where further processing does not
improve performance. In a sentence acceptability task, the SAT asymptote provides a measure of the probability (across trials and materials) that readers arrive at an interpretation sufficient to support an ‘acceptable’ response. If two conditions differ in asymptote, as illustrated in Panel (a), it indicates that they differ in the likelihood that a meaningful interpretation can be computed or in overall acceptability/plausibility of the respective interpretation. The point at which accuracy departs from the chance level (the intercept of the function) and the rate at which accuracy grows over processing time are joint measures of the underlying speed of processing. If one type of structure
Figure 7.7. Illustrative SAT functions [Panel (a), probability of computing an acceptable interpretation: proportional dynamics, functions reach a given proportion of their asymptote at the same time. Panel (b), speed of computing an acceptable interpretation: disproportional dynamics, functions reach a given proportion of their asymptote at different times. Conditions A and B. Axes: accuracy ($d'$ units) against processing time (response time) in seconds.]
can be interpreted more quickly than another, the SAT functions will diVer in rate, intercept, or some combination of the two parameters. This follows from the fact that the SAT rate and intercept are determined by the underlying Wnishing time distribution for the processes that are necessary to accomplish the task. The time to compute an interpretation will vary across trials and materials, yielding a distribution of Wnishing times. Intuitively, the SAT intercept corresponds to the minimum of the Wnishing time distribution, and the SAT rate is determined by the variance of the distribution. Panel (b) depicts a case where the functions diVer in rate of approach to asymptote, leading to disproportional dynamics; the functions reach a given proportion of their asymptote at diVerent times. Dynamics (rate and/or intercept) diVerences are independent of potential asymptotic variation. Readers may be less likely to compute an interpretation for one structure or may Wnd that interpretation less acceptable (e.g. less plausible) than another; however, they may not require additional time to compute that interpretation (McElree 1993, 2000; McElree et al. 2003; McElree and Nordlie 1999).
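The three phases described above are commonly summarized in the SAT literature by an exponential approach to a limit, d′(t) = λ(1 − exp(−β(t − δ))) for t > δ and 0 otherwise, with asymptote λ, rate β, and intercept δ. This equation is not given in the text above, so the sketch below rests on an assumption about how such functions are usually parametrized, and all parameter values are hypothetical. It illustrates why a pure asymptote difference yields proportional dynamics, whereas a rate difference does not.

```python
import math


def sat_accuracy(t: float, asymptote: float, rate: float, intercept: float) -> float:
    """Accuracy (d') after t seconds of processing under the assumed SAT function."""
    if t <= intercept:
        return 0.0  # chance performance before the intercept
    return asymptote * (1.0 - math.exp(-rate * (t - intercept)))


def time_to_proportion(p: float, rate: float, intercept: float) -> float:
    """Time at which the function reaches proportion p (0 < p < 1) of its asymptote.

    The asymptote does not appear in the formula: it cancels out, which is
    exactly the proportional-dynamics property.
    """
    return intercept - math.log(1.0 - p) / rate


# Hypothetical parameters (asymptote in d' units, rate in 1/s, intercept in s).
t90_fast = time_to_proportion(0.9, rate=2.0, intercept=0.4)
t90_slow = time_to_proportion(0.9, rate=1.0, intercept=0.4)

# Panel (a) pattern: different asymptotes, same rate and intercept.
# Both conditions are at 90 per cent of their own asymptote at t90_fast.
a_high = sat_accuracy(t90_fast, asymptote=2.0, rate=2.0, intercept=0.4)
a_low = sat_accuracy(t90_fast, asymptote=1.2, rate=2.0, intercept=0.4)

# Panel (b) pattern: with a slower rate, 90 per cent is reached later
# (t90_slow is larger than t90_fast).
```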
Part II Gradience in Phonology
8 Gradient Perception of Intonation
CAROLINE FÉRY AND RUBEN STOEL
8.1 Introduction
Many phonologists associate the term ‘gradience’ with the distinction between phonology, which is supposed to be categorical, and phonetics, which is supposed to be gradient (see Cohn, this volume, for a review of the issues associated with this distinction).1 In recent years, a different role for gradience in phonology has emerged: the well-formedness of phonological structures has been found to be highly gradient in a way that correlates with their frequency. In their chapter, Frisch and Stearns (this volume) show that phonotactic patterns, like consonant clusters and other segment sequences, as well as morphophonology, word-likeness, etc. are gradient in this way. The examination of large corpora is a reliable indicator of relative frequency. Crucially, the less frequent sequences are felt by speakers to be less prototypical exemplars of their category. In grammaticality judgement tasks, word-likeness tasks, assessment of novel words, etc., less frequent items are likely to get lower grades than more frequent ones. In short, speakers reproduce in their judgements the pattern of relative frequency that they encounter in their linguistic environment. In light of this well-documented (see Frisch and Stearns, this volume, and references cited there) but controversial result, the question has arisen for some phonologists as to the need for a grammar operating with abstract phonological categories, like features and phonemes. In their opinion, if phonotactic distribution is learnable by executing probabilistic generalizations over corpora, the only knowledge we need in order to elaborate ‘grammars’ may turn out to be a stochastic one. But before we can take a stand on this important issue in a competent way, we need to be well-informed on other aspects of the phonology as well.
1 A pilot experiment for this paper was presented at the Potsdam Gradience Conference in October 2002 and some of the results discussed here were presented at the Syntax and Beyond Workshop in Leipzig in August 2003. Thanks are due to two anonymous reviewers, as well as to Gisbert Fanselow and Ede Zimmermann for helpful comments. Thanks are also due to Frank Kügler for speaking the experimental sentences, and to Daniela Berger, Laura Herbst, and Anja Mietz for technical support. Nobody except for the authors can be held responsible for shortcomings.
In this chapter, we take a first step and investigate the question of whether intonational contours are gradient in the same way that segment sequences are. Is it the case that more frequent tonal patterns are more acceptable than less frequent ones? We use the term gradience in the sense of gradient acceptability. Unfortunately, for a number of reasons, large corpora, at least in their present state, are useless for the study of tonal pattern frequencies. One of the reasons relates to the analysis and annotation of tonal patterns. Scholars disagree not only on the kinds of categories entering intonation studies but also on the annotation for ‘rising contour’, ‘fall–rise’, etc. Melodies, like Gussenhoven’s (1984, 2004) nuclear contours or the British school’s ‘heads’ and ‘nuclei’, may well exist as independent linguistic elements, but they are not transcribed uniformly. Even though autosegmental-metrical representations of tonal contours, like ToBI (Beckman and Ayers-Elam 1993; Jun 2005), are evolving to become a standard in intonation studies, they are not sufficiently represented in corpora. Most large corpora consist of written material anyway, and those which contain spoken material generally only display segmental transcription rather than tonal. In short, the development of corpora which are annotated in a conventional way for intonation patterns is an aim for the future; as of now, such corpora are simply not available for German. As a result, we must rely on the intuition of speakers. The questions we address in this chapter are: Which tonal contours are accepted most? Which are less accepted? We will see that the question must be made precise in the following way: given a certain syntactic structure, is there a contour which is accepted in the largest set of contexts?
And this is related to the question of pitch accent location. Which constituents are expected to be accented? Which accent structure is the least marked, in the sense of being accepted in the greatest number of contexts? Are some accent patterns (tonal patterns) ‘unmarked’ (more frequent, acquired earlier, but also accepted more easily) in the same sense as consonant clusters or other segment sequences are? Below, we present the results of a perception study bearing on tonal contours. But before we turn to the experiment, we first sum up some relevant issues in the research on prosody and situate our research in this broader context.
8.2 Prosody and intonation
Prosody plays a crucial role in communication. To begin with, we partition our utterances into prosodic chunks, like phonological phrases and intonation phrases, which correspond to syntactic constituents (Nespor and Vogel 1986; Truckenbrodt 1999) or information structural blocks (Vallduví 1992). These phrases, which help both speakers and hearers structure the discourse, are signalled phonetically by boundary tones, segmental lengthening, or some other phonological cues. A second factor playing a role in phonological patterning is the distribution and form of pitch accents, associated with prominent syllables. A syllable may be prominent if it is the bearer of the lexical stress of a word or of a larger constituent which is itself prominent. A speaker may decide to speak about some object in her surroundings or an object she knows about, and decide to focus on one property of this object. Or she may answer a question asked by a protagonist because she feels she has to deliver some bit of information. In other words, prominence may be assigned to some linguistic constituents for contextual or cognitive reasons (Bolinger 1972). The other reason to assign a pitch accent to a syllable is purely grammatical. An internal argument of a German predicate + argument complex, for example, may receive a pitch accent, and the verb may be unaccented. Still, the whole phrase may be prominent (see Bierwisch 1968; Schmerling 1976; Gussenhoven 1983, 1992; von Stechow and Uhmann 1986; Cinque 1993; Féry and Samek-Lodovici 2006, among others). In Standard German, nuclear accents (the final or most prominent accents of an intonation phrase) are either bitonally falling, HL, or rising, LH, whereas prenuclear accents can be rising or falling as well or monotonally high (H) or low (L) (see Féry 1993; Grabe 1998; Grice et al. 2003; Peters 2005 for phonological studies of the intonation of Standard German).
Prosodic phrases may be terminated with a boundary tone, which is written with a subscripted P for a phonological phrase, and a subscripted I for an intonation phrase (following Hayes and Lahiri’s 1991 notation). For the sake of illustration, two pitch tracks of a sentence used in the experiments described below are shown with their phonological tone structure. The first pitch track, in Figure 8.1, is equivalent to a wide-focused realization with two pitch accents, a rising one on the subject Ruderer, and a falling one on the object Boote. The verb, adverb, and particle mit are unstressed. This realization may be dubbed ‘unmarked prosodic structure’ (UPS, see below). It is expected to be the most frequent one, and thus the most widely accepted pattern for such a declarative sentence. In German, a topic-focus realization, in which the subject is topicalized and the remainder of the sentence is focused, is identical to a wide-focused realization. The second pitch track (Figure 8.2) shows a marked pattern, with just one pitch accent located early in the sentence. This kind of pattern is expected to be confined to special contexts, in particular those eliciting a narrow focus on the subject.

[Figure 8.1. Pitch track of Ruderer bringen immer Boote mit. ‘Oarsmen always bring boats.’ Axes: pitch (Hz) against time (s). Text tier: RUDERER bringen immer BOOTE mit. Tone tier: L*H (RUDERER), HP, H*L (BOOTE), LI.]

[Figure 8.2. Pitch track of Ruderer bringen immer Boote mit. Axes: pitch (Hz) against time (s). Text tier: RUDERER bringen immer Boote mit. Tone tier: H*L (RUDERER), LI.]

It is not possible to investigate the gradience of tonal patterns out of context. Tonal patterns do not exist as pure melodies: they need to be interpreted as linguistic units, thus as pitch accents or as boundary tones. This can only happen when tonal excursions are associated with text. Moreover, tonal contours are more or less marked only when they are associated with specific locations in a sentence, since accent locations are dependent on syntax and information structure. We introduce ‘focus projection’ briefly in the next section, but have no space to develop all arguments for this phenomenon (see Selkirk 1995; Schwarzschild 1999; Féry and Samek-Lodovici 2006 among others). We propose the concept of ‘Unmarked Prosodic Structure’ (UPS, Féry 2005) as the intonation used when the sentence is realized in a whole-focused environment. It refers to the phrasing and the tonal contour projected when the speakers have no clue about the context. Unmarked Prosodic Structure relies solely on the syntactic structure. A tonal contour compatible with unmarked prosody is expected to be acceptable in more environments than other, more marked contours.
8.3 Previous studies on gradient tone perception
Few studies, if any, have explicitly addressed the gradience of intonational contours, so we cannot base our work on a rich empirical basis. There are, however, quite a number of studies investigating the question of categories in intonational morphemes, which have found more or less gradient accents or boundaries.2 The most relevant studies for our aim have looked at the adequacy of pitch accent patterns in some specific contexts. The issue of the location of pitch accents and their role for the focus structure has been investigated for English by Gussenhoven (1983) and Birch and Clifton (1995), among others, who examine the role of prenuclear accents on the verb in a VP consisting of a verb plus an argument (or, in Gussenhoven’s case, an adjunct). Gussenhoven’s (1983) sentence accent assignment rules (SAAR) predict that in a focused predicate argument complex, only the argument needs to be stressed, but that a prenuclear accent can be added freely on a verb without impairing processing. In a verb phrase containing an adjunct, by contrast, both the verb and the adjunct need to be stressed. Gussenhoven himself finds confirmation of this prediction in experimental work. In mini-dialogues such as (8.1), there is a difference between the focus structure of the sentences answering (8.1a) and (8.1b). In (8.1a), the whole VP share a flat is focused, whereas in (8.1b) only the direct object is focused, the difference being elicited by the preceding question. The same kind of contrast is obtained in the dialogues in (8.2), which contain a verb followed by an adjunct.
(8.1) Verb and argument
  a. C: Do you live by yourself?
  b. C: I hate sharing things, don’t you?
  c. U: I share a flat. (the whole VP or the argument NP is focused)
(8.2) Verb and adjunct
  a. C: Where will you be in January?
  b. C: Where will you be skiing?
  c. U: We will be skiing in Scotland. (the whole VP or the adjunct PP is focused)
Gussenhoven cross-spliced questions and answers, spoken by native speakers, so as to obtain both answers in both contexts. Subjects then had the task of deciding which of the two answers was the more appropriate response to the preceding question. Gussenhoven found that the presence of an accent on the verb in addition to the expected accent on the object in (8.1) does not change the acceptability of the pitch accent structure, and that this held in both narrow and broad focused contexts. The speakers did no better than chance when required to choose between the two contexts on the basis of such an accent pattern. But in (8.2), the absence of a stress on the verb in (8.2a) was an indicator that the verb had to be given (and thus not focused), so that the speakers did better than in the predicate-argument condition in the same task. The reliability of the accent on the verb in deciding for the wide-focus context depended gradiently on the number of unstressed syllables intervening between the two accents.
2 Some have found categories in the domain of pitch accent realization; for example Pierrehumbert and Steele (1989) or Ladd and Morton (1987).
Birch and Clifton (1995) conducted similar experiments, but obtained slightly different results. They also prepared matched and mismatched pairs of questions and answers. An example of a dialogue set is reproduced in (8.3). Only the pairs QA/R1 and QB/R3 match perfectly; all others are predicted to be more or less deviant along the same lines as those just explained, although the authors acknowledge that QA/R2 could be as good as QA/R1 if SAAR make the right predictions.
(8.3) a. Questions
     QA: Isn’t Kerry pretty smart?
     QB: Isn’t Kerry good at math?
  b. Responses (capitals mark accented words)
     R1: Yes, she TEACHES MATH.
     R2: Yes, she teaches MATH.
     R3: Yes, she TEACHES math.
In judgement and decision tasks, Birch and Clifton found that as an answer to question QA, speakers prefer R1, with two accents, over R2, with just one accent on the argument NP. The difference was small but significant. And unsurprisingly, R3 was by far the preferred answer to QB. All other pairs obtained poorer scores. In a second experiment, speakers had to decide how well the pairs made sense. In this case, the results for QA were similar to those of Gussenhoven: there was no difference between a sentence with two accents (R1) and a sentence with just one accent on the argument (R2).3
These results, as well as other perception experiments bearing on the location of pitch accents conducted for Dutch (Nooteboom and Kruyt 1987; Krahmer and Swerts 2001) and for German (Hruska et al. 2001), show that, for these three languages at least, a prenuclear accent is readily acceptable, but that a postnuclear one is less easily accepted, and that accents on narrowly focused items in an otherwise non-nuclear position are more readily perceived than accents on words accented per default in their unmarked accent pattern. Nooteboom and Kruyt (1987) rightly explain the acceptability of a prenuclear accent in terms of topicalizing or thematicizing the bearer of such an accent, and observe that a sentence with a supplementary prenuclear accent can get an interpretation in which the prenuclear accent is information structurally prominent.
In psycholinguistic experiments studying the role of prosody in disambiguating syntactic structures (see for instance Lehiste 1973; Kjelgaard and Speer 1999; Schafer et al. 2000), garden path sentences or sentences with an ambiguous late or early closure/attachment have been tested. These experiments deliver gradient results correlating with the strength and the location of boundaries. Comparing the two realizations of the sentences in (8.4), there is no doubt that intonation can disambiguate the readings. Example (8.4a) is realized as one Intonation Phrase, but in (8.4b), an Intonation Phrase boundary is located after heiratet, which is then understood as an intransitive verb.
3 Birch and Clifton’s results also indicate that a single accent on the verb is readily accepted in a context eliciting broad focus (78 per cent of yes). The only situation where speakers accepted a pair less (with 54 per cent and not between 71 and 84 per cent as in the other pairs) was when the context was eliciting a narrow focus on the verb, and the answer had a single accent on the argument (QB/R2).
Much more subtle is the question of whether prosody can help with the sentence in (8.5). In one reading, it is the woman who lives in Georgia, and in the other reading, her daughter. The phrasing, in the form of a Phonological Phrase boundary, is roughly the same in both readings. Nevertheless, it is possible to vary the quantity and the excursion of the boundary tone in such a way that one or the other reading is favoured.
(8.4) a. Tone tier: L*H L*H H*L LI
     [Maria heiratet Martin nicht]I
     ‘Mary does not marry Martin.’
  b. Tone tier: L*H H*L LI L*H H*L LI
     [Maria heiratet]I [Martin nicht]I
     ‘Mary gets married. Martin does not.’
(8.5) [ [Ich treffe mich heute]P [mit der Tochter der Frau]P]I [ [die in Georgien lebt]P]I
  ‘I am meeting today with the daughter of the woman who lives in Georgia.’
We are only marginally interested in syntactic disambiguation in this chapter. Rather, our experiment aimed at testing the gradience of German intonational structures. This experiment differs from the ones conducted by Gussenhoven and by Birch and Clifton in a crucial way, in that several parameters were systematically varied: sentence type, context, and tonal contours. We were explicitly interested in finding out whether some kinds of intonation patterns are more acceptable than others and whether gradience can be observed in the domain of tonal contours.
8.4 Experiment
8.4.1 Background
The experiment reported in this section was intended to elucidate the question formulated above: How gradient are tonal contours? We wanted to understand what triggers broad acceptance for intonational patterns. To this aim, we used three different kinds of sentences, which were inserted in different discourse contexts, and cross-spliced. If an effect was to be found, we expected it to be of the following kind: the unmarked tonal contours should be generally better tolerated than the marked ones. The hypothesis can be formulated as in (8.6).
(8.6) Unmarked Prosodic Structure (UPS) Hypothesis
  An unmarked prosodic structure, i.e. a prosodic structure adequate in a broad focus environment, is readily accepted. It can be inserted successfully in more environments than a marked prosodic structure, which is appropriate in a restricted number of contexts only.
The topic-focus realization that we used in our experiment has the same contour as a broad focus one. Both have a rising pitch accent on the subject, and a falling accent on the focused word (the ‘focus exponent’). We chose a topic-focus environment instead of a broad focus one because of the slightly clearer accent pattern produced with a topic and a focus. Even though we did not include a broad focus context in our experiment, we are confident that the pattern we call TF would get high scores in it.
8.4.2 Material
Three different kinds of sentences served as our experimental material: six short sentences, six long sentences, and three sentences with ambiguous scope of negation and quantifier. Every sentence was inserted in three or four matching contexts (see below). In (8.7) to (8.9), an example of each sentence type is given along with its contexts. The remaining sentences are listed in the appendix.
(8.7) Short sentences
  Maler bringen immer Bilder mit.
  Painters bring always pictures with
  a. Narrow focus on the subject (NFS):
     Tom hat mir erzählt, dass Fotografen unserer Nachbarin immer Bilder mitbringen. Aber das stimmt nicht:
     ‘Tom told me that photographers always bring pictures to our neighbour. But this is not true:’
  b. Narrow focus on the object (NFO):
     Angeblich bringen Maler unserer Nachbarin immer Bücher mit. Aber das stimmt nicht:
     ‘It is said that painters always bring books to our neighbour. But this is not true:’
  c. Topic-focus (TF):
     Meine Nachbarin schmeißt oft große Partys, dafür bekommt sie aber auch viele Geschenke. Regisseure schenken ihr Filme, Schriftsteller Bücher und . . .
     ‘My neighbour often throws big parties, and therefore she also gets lots of presents. Movie directors give her movies, writers give her books and . . .’
(8.8) Long sentences
  Passagiere nach Rom nehmen meistens den späten Flug.4
  Passengers to Rome take mostly the late flight
  a. Narrow focus on the subject (NFS):
     Angeblich nehmen die Leute nach Athen meistens den späten Flug. Aber das stimmt nicht:
     ‘It is said that the people (flying) to Athens mostly take the late flight, but this is not true:’
  b. Narrow focus on the object (NFO):
     Mona sagt, dass Passagiere nach Rom meistens die frühe Maschine nehmen. Aber das stimmt nicht:
     ‘Mona says that passengers to Rome mostly take the early flight, but this is not true:’
  c. Topic-focus (TF):
     Pendler, die ziemlich weit von zuhause arbeiten, haben oft ähnliche Angewohnheiten. Geschäftsleute Richtung Paris fahren oft mit dem Auto, Reisende nach London nehmen den Zug aus Calais und . . .
     ‘Commuters who work far away from home often have similar habits. Business people who go to Paris often take their car, travellers to London take the train from Calais and . . .’
(8.9) Quantifier-negation sentences
  Beide Autos sind nicht beschädigt worden.
  Both cars were not damaged
  a. Two foci (‘two’):
     Es wäre schlimm gewesen, wenn Karl bei dem Unwetter seinen Jaguar und seinen Porsche auf einmal verloren hätte, aber glücklicherweise war es nicht so.
     ‘It would have been too bad if Charles had lost both his Jaguar and his Porsche because of the bad weather, but fortunately this was not the case.’
  b. Narrow focus on the quantifier (FQ):
     Ist nur Peters Auto nicht beschädigt worden? Nein, . . .
     ‘Has only Peter’s car not been damaged? No, . . .’
  c. Narrow focus on the negation (FN):
     Ich habe gesehen, dass Deine beiden Autos seit Wochen in der Garage stehen. Sind sie bei dem Unfall beschädigt worden? – Nein, ich habe Dir doch schon gesagt:
     ‘I have seen that both your cars have been sitting in the garage for ages. Were they damaged in the accident? – No, I already told you, . . .’
  d. Topic-focus (TF):
     Bei dem Unfall ist verschiedenes passiert. Drei Fahrräder sind jetzt Schrott, ein Fußgänger ist im Krankenhaus, aber bei den Autos, die dabei involviert waren, war es nicht dramatisch:
     ‘Several things happened at the accident. Three bikes are now ruined, a pedestrian is at the hospital, but nothing dramatic happened to the cars involved:’
4 As Ede Zimmermann (p.c.) observes, it is not undisputed whether there is a structural ambiguity between the temporal and the quantificational reading of meistens. We suspect that, even if confirmed, this ambiguity played no role in the experimental results.
Context and stimulus sentences were spoken by a trained speaker and recorded in a sound-proof booth on a DAT recorder. The speaker was instructed to speak naturally, in a normal tempo. He read the context-target pairs at once, first the context and then the stimulus sentences. There were 48 matching pairs for the three experiments altogether (six short sentences, six long sentences, and three quantifier-negation sentences in their contexts, thus 18 + 18 + 12 pairs). All pitch accents of a specific type were realized similarly (see Figures 8.1 to 8.3 for illustrations), and controlled carefully with the help of the speech analysis program PRAAT. Several recording sessions were necessary. The sentences were evaluated by three independent trained phonologists as to their naturalness. Context sentences and stimulus sentences were digitized into individual sound files, ready to be cross-spliced. No manipulation whatsoever was undertaken, so as not to endanger naturalness. We prepared 36 non-matching pairs for the short sentences, 36 for the long sentences, and 32 for the scope sentences, a total of 104 non-matching pairs. The sentences to be evaluated thus consisted of 48 matching and 104 non-matching pairs, an overall total of 152 pairs.
8.4.3 Subjects
Four non-overlapping groups of fifteen subjects (altogether sixty students at the University of Potsdam) took part in the experiment. They were native speakers of Standard German and had no known hearing or speech deficit. All were paid or acquired credit points for their participation in the experiment.
Two groups judged the sentences on a scale of 1 (very bad) to 8 (perfect), and two groups judged the same sentences in a categorical way: acceptable (yes) or non-acceptable (no). All sixty informants evaluated the scope sentences. In addition, the first and third groups also judged the short sentences, while the second and fourth groups judged the long sentences, thus thirty matching sentences plus sixty-eight non-matching ones each.
8.4.4 Procedure
The subjects were seated in a quiet room, where the experiment was presented with the DMDX experiment generator software developed by K. and J. Forster at the University of Arizona. The experimenter left the subject alone in the room after brief initial instructions as to beginning and ending the session. The subjects worked through the DMDX presentation in a self-paced manner. It led them through a set of worded instructions, practice utterances, and finally the experiment itself, consisting of 102 target sentences. No fillers were inserted, but three practice sentences started the experiment. This experiment was itself included in a set of experiments in which the subjects performed different tasks: production of read material, and dialogues. The instructions made it clear that the aim of the experiment was to test the intonation and stress structure of the sentences, and not their meaning or syntax. The stimuli were presented auditorily only: pairs of context and stimulus sentences were presented sequentially. The subject first heard a context and, after hitting the return key, the test sentence. The task consisted in judging the adequacy of the intonation of the sentence in the given context. Every recorded sentence of the groups of short and long sentences was presented nine times, in three different intonational and stress patterns, and each of these patterns in three different contexts. The scope sentences were presented sixteen times each, in all possible variants. The sentences were presented in a different randomized order for each subject. The set-up and the instructions included the option of repeating the context–stimulus pair for a particular sentence. Most subjects made occasional use of this possibility. Only the last repetition was included in the calculation of the reaction time (see Section 8.4.9).
8.4.5 Short and long sentences
There were six short sentences like the one illustrated in (8.7), consisting of a simple subject (an animate noun in plural), a verb (mitbringen ‘bring’), an adverb (immer ‘always’) and a simple object (an inanimate noun in plural).
The separable but unstressed particle mit was located at the end of the sentence, resulting in a non-final object. The sentences were inserted in three different contexts inducing the following information structures: narrow corrective focus on the subject (NFS), see Figure 8.2, narrow corrective focus on the object (NFO), see Figure 8.3, and topic-focus (TF), the unmarked prosodic structure, see Figure 8.1. The sentences with narrow focus were elicited by replacing a pre-mentioned element with another one. Our decision to use a corrective narrow focus was driven by the intention to have a very clear accentual structure. A topic-focus was elicited by pre-mentioning some pairs of elements with the same structure as the tested sentence.
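The cross-splicing design for the short sentences can be sketched as follows: each of the six sentences, realized with each of the three contours, is paired with each of the three contexts. The snippet is only an illustration with hypothetical variable names; it reproduces the counts of matching and non-matching pairs given for the short sentences in Section 8.4.2.

```python
from itertools import product

CONDITIONS = ("NFS", "NFO", "TF")  # labels used for both contexts and contours
N_SENTENCES = 6                    # six short sentences

# Every sentence is combined with every context and every contour.
pairs = [
    (sentence, context, contour)
    for sentence, context, contour in product(
        range(1, N_SENTENCES + 1), CONDITIONS, CONDITIONS
    )
]

# A pair matches when the contour was recorded in the context it is played in.
matching = [p for p in pairs if p[1] == p[2]]
# All remaining combinations are the cross-spliced, non-matching pairs.
non_matching = [p for p in pairs if p[1] != p[2]]

print(len(pairs), len(matching), len(non_matching))  # prints: 54 18 36
```

Each sentence thus occurs in nine context–contour combinations, in line with the nine presentations per sentence described in the procedure.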
[Figure 8.3. Pitch track of Ruderer bringen immer Boote mit. Axes: pitch (Hz) against time (s). Text tier: Ruderer bringen immer BOOTE mit. Tone tier: L*H (Ruderer), H*L (BOOTE), LI.]
Figure 8.3 displays a narrow focus on the object. The subject Ruderer has a rising prenuclear pitch accent with a much smaller excursion than in the unmarked topic-focus configuration. The object Boote carries the high-pitched nuclear accent.
8.4.6 Results and discussion
Table 8.1 displays the data for the first group of subjects, who had to give scalar judgements. Each cell shows the mean score of the six sentences having the same context-intonation pair. The second group of subjects judged the same sentences in a categorical way, and the mean scores for these subjects are given in Table 8.2. The correlation between the mean scores in Tables 8.1 and 8.2 is almost perfect (Pearson’s product-moment correlation = 0.984, p = 0.000). The interaction between context and intonation is displayed graphically in Figure 8.4. It presents the results of only the first group (i.e. scale answers), but a graph of the second group would look very similar due to the strong association between the two groups. All patterns were accepted best in their own matching context. The unmarked TF tonal contour, corresponding to the UPS, was also readily accepted in the NFO context, a result corresponding to our expectations. NFO had one pitch accent on the object and a reduced prenuclear accent on the subject. It was thus more similar to the TF contour (the realization of the UPS) than the NFS contour was, with its single pitch accent on the subject. NFO got intermediate scores in the TF context. The slight inadequacy that our informants felt can be safely attributed to the lack of a topical accent on the subject. By contrast, NFS is accepted in its matching context, but rejected in a non-matching context.

Table 8.1. Short sentences: mean judgement scores (on a scale from 1 to 8); rows are contexts, columns are intonation patterns
Context / intonation    NFS     NFO     TF
NFS                     7.7     1.5     2.0
NFO                     2.0     7.2     5.9
TF                      2.0     3.7     6.8
All contexts            3.9     4.1     4.9

Table 8.2. Short sentences: mean judgement scores (categorical); rows are contexts, columns are intonation patterns
Context / intonation    NFS     NFO     TF
NFS                     0.92    0.18    0.11
NFO                     0.22    0.89    0.66
TF                      0.07    0.32    0.87
All contexts            0.40    0.46    0.54

[Figure 8.4. Mean acceptability scores for short sentences (scale answers). Mean score (1 to 8) plotted against intonation (NFS, NFO, TF), with one line per context (NFS, NFO, TF).]
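As a rough check, the correlation can be recomputed from the nine cell means printed in Tables 8.1 and 8.2 (the ‘All contexts’ row is left out because it is derived from the other cells). This is a sketch, not the authors’ analysis: the published means are rounded, so the result should only come out close to the reported coefficient, and the significance test is not reproduced. The helper name `pearson_r` is hypothetical.

```python
import math

# Cell means listed context by context, each with intonation NFS, NFO, TF:
# context NFS, then context NFO, then context TF.
scale = [7.7, 1.5, 2.0, 2.0, 7.2, 5.9, 2.0, 3.7, 6.8]                 # Table 8.1
categorical = [0.92, 0.18, 0.11, 0.22, 0.89, 0.66, 0.07, 0.32, 0.87]  # Table 8.2


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(scale, categorical)
print(round(r, 3))
```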
Gradient judgements were obtained in two different ways, either directly, by letting the informants give their own gradient results, or indirectly, by counting categorical results. The very high correlation between the two groups of means suggests that it does not matter which method is used, as both methods give very similar results. It will be shown that this correlation was reproduced for all sentences.
In the six longer sentences, one of which is illustrated in (8.8), the subject and the object were syntactically more complex. We decided to include both short and long sentences in our experiment in order to verify the influence of length and complexity on the perception of tonal patterns. The difference between the two kinds of sentences, however, turned out to be minimal, as one can see from a comparison between Figure 8.4 and Figure 8.5. The only difference between these sentences and the short ones worth mentioning is that in the TF context, both NFS and NFO were now better tolerated. We do not have any explanation for the slightly better acceptance of the absence of a late accent in a TF context. As an explanation for the better acceptance of NFO in the TF context, we offer that it might not be so easy to perceive the difference between weak and strong prenuclear accents when the sentence is longer. Here also a very high correlation between the two groups of subjects was found, suggesting once more that both scalar and categorical methods are equally good for obtaining gradient judgements.
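The indirect route mentioned above, counting categorical answers, amounts to taking the proportion of ‘yes’ responses per condition. The following is a minimal sketch with invented responses and hypothetical helper names; the linear rescaling of the 1 to 8 scale is only one possible way of putting both measures on a common footing and is not a step described in the text.

```python
def acceptance_rate(responses):
    """Proportion of 'yes' answers in a list of 'yes'/'no' judgements."""
    return sum(1 for answer in responses if answer == "yes") / len(responses)


def rescale(score, low=1.0, high=8.0):
    """Map a score on the low..high scale linearly onto the range 0..1."""
    return (score - low) / (high - low)


# Invented judgements from fifteen hypothetical subjects for one condition.
responses = ["yes"] * 13 + ["no"] * 2
rate = acceptance_rate(responses)  # 13 of the 15 answers are 'yes'

# A scalar mean, here 7.7 on the 1-8 scale, expressed on the same 0..1 range.
rescaled = rescale(7.7)
```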
Figure 8.5. Mean acceptability scores for long sentences (scale answers). [Line plot: x-axis, intonation (NFS, NFO, TF); y-axis, mean score (1 to 8); one line per context (NFS, NFO, TF).]
Let us now relate our findings to those described in Section 8.2. First, the scores for matching context-intonation pairs were higher than for nonmatching pairs. Second, a missing nuclear accent and an added nuclear accent triggered lower scores than sentences with the expected accentuation. The same was true for both a missing prenuclear accent and an added prenuclear accent. As described by Hruska et al. (2001), adding a prenuclear accent on the subject in a situation where only a nuclear accent on the object is expected obtained higher scores than other non-matching pairs. In the same way, Gussenhoven, as well as Clifton and Birch, also found that an added prenuclear accent delivers better judgements than an added nuclear accent.
8.4.7 Scope sentences
The sentences in the third experiment, one of which is illustrated in (8.9), consist of a subject made up of a quantifier and a noun, an auxiliary, the negation nicht, and a past participle or an adjective (below called ‘the predicate’), and are characterized by variable scope of negation and variable scope of the quantifier. Four contexts were constructed, as illustrated in (8.9). The first context elicits two accents, one on the quantifier and one on the negation (called ‘two’ in the following). The second context elicits a narrow focus on the quantifier (FQ), the third context a narrow focus on the negation (FN), and the last context was a topic-focus one, eliciting two accents again, one on the quantifier, as in ‘two’, and the second one on the predicate (TF). All four contours are illustrated for example (8.9) in Figure 8.6.
The syntactic structure of the sentences in this experiment is simple, but their semantic structure is not. First, the negation can have scope over the quantifier or, vice versa, the quantifier can have scope over the negation. In the experiment, one context called unambiguously for wide scope of the negation (‘not both cars . . .’), and one unambiguously for wide scope of the quantifier (‘for both cars, it is not the case that . . .’). The first case (‘two’ context in (8.9)) is triggered by double accentuation on the quantifier and the negation, and the second case (FQ context in (8.9)) comes with a single accent on the quantifier.⁵ It is assumed here that the scope inversion reading elicited by the ‘two’ context can be explained by general properties of topicalization, visible in languages with resumptive pronouns. The topicalized quantifier in the sentences under consideration is in a position of extraposition to the left,
⁵ As a generalization, the negation may have wider scope when both the quantifier and the negation (or the negated constituent) are accented. This generalization holds only for this type of construction, but not for other sentences with inverted scope, such as those with two quantifiers discussed in Krifka (1998).
Figure 8.6. Four realizations of (8.9). [Four pitch tracks of ‘Beide Autos sind nicht beschädigt worden’, pitch (Hz) against time (s): (a) context ‘two’, (b) context ‘FQ’, (c) context ‘FN’, (d) context ‘TF’.]
but is nevertheless interpreted to be in the scope of the negation (see also Höhle 1991). All authors who have studied the scope inversion phenomenon in German (Höhle 1991; Jacobs 1997; Büring 1997; Krifka 1998) have insisted on the necessity of a rise–fall contour to get the interpretation aimed at, and this is the contour which was produced by our speaker as well. Crucially, an
independent phonological phrase is formed which contains the topicalized constituent, separate from the main clause. In a realization with only one accent on the quantifier, by contrast, both the quantifier and the negation are interpreted in situ and consequently, the quantifier has wide scope over the negation.⁶ Prosodically, the quantifier cannot be interpreted as being topicalized because it has the focal accent of the sentence. In our experiment, the context eliciting this accent pattern was one in which the quantifier was contrastively accented.
The other two patterns, a single accent on the negation (FN) and a double accent on the quantifier and on the predicate (TF), do not evoke clear scopal relationships. A unique accent on the negation contradicts the preceding sentence. In the experimental sentences, the predicate had been stressed in the preceding matching context. However, it was not possible to unambiguously reconstruct the context from the negated sentence alone. An accent on the quantifier, the noun or the predicate changes the pragmatics of the sentence, but in the realization with a single accent on the negation, these differences are cancelled. The hypothesis was thus that an accent on the negation would be tolerated in a variety of contexts. The TF context with accents both on the NP containing the quantifier and on the predicate is similar to the ‘two’ context. It can also have different readings, one being that the predicate is contrasted. Inverted scope is also not impossible in this case. To sum up, a realization with a single accent, especially when the accent is on the quantifier, seems to be more marked than a realization with two accents, in the sense that it is adequate in fewer contexts. With the third experiment, we wanted to verify this hypothesis.
8.4.8 Results and discussion
Tables 8.3 and 8.4 as well as Figure 8.7 present the mean values in both scalar and categorical judgements.
Once again, the correlation between the two groups of means is almost perfect (Pearson’s product-moment correlation = 0.973, p = 0.000). The results are not as clear cut as in the short and long sentences. For the ‘two’ context, the FN and the FQ, the matching pairs obtained better scores than the other ones. It should also be noted that the TF and ‘two’ contexts are nearly interchangeable. This can be attributed to the presence of two accents
⁶ Krifka (1998) explains scope inversion of sentences with two quantifiers by allowing movement of accented constituents at the syntactic component of the grammar. Both topicalized and focused constituents have to be pre-verbal at some stages of the derivation in order to get stress.
Table 8.3. Scope sentences: mean judgement scores (on a scale from 1 to 8)
Context / intonation   two   FQ    FN    TF
two                    6.1   3.6   5.1   6.1
FQ                     3.7   7.0   3.2   3.4
FN                     5.4   3.1   6.5   5.3
TF                     5.4   3.6   4.7   5.8
All contexts           5.1   4.3   4.9   5.1
Table 8.4. Scope sentences: mean judgement scores (categorical)
Context / intonation   two    FQ     FN     TF
two                    0.73   0.27   0.64   0.71
FQ                     0.32   0.90   0.26   0.36
FN                     0.62   0.18   0.90   0.57
TF                     0.76   0.39   0.54   0.72
All contexts           0.61   0.43   0.59   0.59
Figure 8.7. Mean acceptability scores for scope sentences (scale answers). [Line plot: x-axis, intonation (two, FQ, FN, TF); y-axis, mean score (1 to 8); one line per context (two, FQ, FN, TF).]
in both sentences, fitting both contexts requiring two accents. The same cannot be said for the realizations with one accent since the accent elicited in each case is at a different place. However, the FN sentences, with a late accent, elicited better scores in a non-matching environment than the FQ sentences with an early accent. The highly marked prosodic pattern found in FQ sentences obtained poor scores in all non-matching contexts, and the best results in the matching context. To sum up the results obtained for these sentences, it can be observed that the interchangeability of contexts and intonation pattern is higher in these sentences than in the short and long sentences. We explain this pattern of acceptability with the fact that the scope structure of these sentences, complex and subject to different interpretations, renders the accent patterns less rigid. Another interpretation could be that speakers concentrated more on understanding the scopal relationships and were thus less sensitive to slight variations in the tonal structure of the sentences they heard.
8.4.9 Reaction times
Additional information on the cognitive cost of the task was gathered by measuring reaction times. Table 8.5 shows that it took more time to process the long sentences and the scope sentences than the short ones. It can also be observed that making a decision on a scale needs more time than making a categorical decision (except for the long sentences, where no difference could be observed). We could not find any correlation between the number of keys available for responding and the reaction times, either in the scalar decision task when comparing the subjects who used all keys and those using only four to six keys (out of the eight at their disposal), or between the two tasks in comparison. In other words, it is not the case that using eight keys instead of two increases the time it takes to make a decision.
We conclude that the increase of reaction time that we observe is truly due to an increase of cognitive complexity.
Table 8.5. Mean reaction times
              Short sentences              Long sentences               Scope sentences
Scale         4.2 s (sd = 2.24; N = 810)   5.0 s (sd = 2.36; N = 810)   5.0 s (sd = 2.83; N = 1,440)
Categorical   3.7 s (sd = 2.22; N = 810)   5.0 s (sd = 2.27; N = 810)   4.7 s (sd = 2.80; N = 1,440)
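The close agreement between the two response formats reported for the scope sentences can be checked against the published cell means. The following Python sketch correlates the sixteen context-by-intonation means of Table 8.3 with those of Table 8.4. Treating these sixteen cells (without the ‘All contexts’ values) as the data points is an assumption made here for illustration, since the text does not state exactly which means entered the reported correlation; the names pearson_r, scale_means, and categorical_means are likewise illustrative.

```python
import math

# Cell means transcribed from Tables 8.3 and 8.4, without the "All contexts" values.
# Each inner list holds the scores of one intonation (two, FQ, FN, TF)
# in the four contexts two, FQ, FN, TF.
scale_means = [
    [6.1, 3.7, 5.4, 5.4],
    [3.6, 7.0, 3.1, 3.6],
    [5.1, 3.2, 6.5, 4.7],
    [6.1, 3.4, 5.3, 5.8],
]
categorical_means = [
    [0.73, 0.32, 0.62, 0.76],
    [0.27, 0.90, 0.18, 0.39],
    [0.64, 0.26, 0.90, 0.54],
    [0.71, 0.36, 0.57, 0.72],
]


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Flatten both tables in the same cell order so that values are paired correctly.
flat_scale = [v for group in scale_means for v in group]
flat_categorical = [v for group in categorical_means for v in group]

r = pearson_r(flat_scale, flat_categorical)
print(f"r = {r:.3f}")
```

Computed on the rounded table values, the coefficient comes out at roughly 0.97, in line with the correlation reported for the scope sentences.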
8.5 Conclusion
This chapter has investigated the gradient nature of the acceptability of intonation patterns in German declarative sentences. Three kinds of sentences elicited in different information structural contexts were cross-spliced and informants were asked to judge the acceptability of context-target pairs. The clearest results were obtained for the short sentences, although the long sentences delivered comparable results. Finally, the tonal patterns of scope sentences were much more difficult to interpret, because the scope behaviour of the negation and the quantifier was variable, depending on the accent structure of these sentences. For all sentences, we found that a prosody with two accents got better scores than a prosody with only one accent, and that a contour with a late accent was better accepted in non-matching environments. We dubbed the prosody with two accents, acceptable in a broad focus context or in a topic-focus context, UPS, for ‘unmarked prosodic structure’, and we observe that this contour is accepted in a non-matching context more readily than contours with only one accent, especially when this single accent is located early in the sentence.
The results of the short and long sentences, and, to a lesser extent, those of the scope sentences, point to a good correlation between context and prosodic structure. Speakers and hearers do use prosodic information such as presence versus absence of pitch accents, their form, and the phrasing to assess the well-formedness of context-target sentence pairs, and they do so consistently. Their performance improves when the syntactic and semantic structure of the sentence is very simple.
It can safely be claimed that in German, information structure plays an important role in the processing of prosody, whereas it has been shown for syntax that word order alone, presented in written form, does not have the same effect (see for instance Schlesewsky, Bornkessel, and McElree, this volume, and references cited there). The conclusion one could tentatively draw from this difference is that intonation encodes information structure better than syntax.
An interesting result is that in all three experiments the scores obtained for the two groups of subjects (scale and yes–no answers) were similar. In other words, the same gradient results can be obtained by using either gradient or non-gradient judgements. This is remarkable since the cognitive task executed in both groups was different. It could have been the case that in a sentence with a high score of acceptability the rating by scale would have been gradient, but the yes–no judgement categorical. However, if the groups of informants
are large enough, ‘intolerant’ subjects compensate for the degree of insecurity that remains in subjects asked to give a judgement on a scale. Although we offer no analysis of how our gradient data can be accounted for in a formal grammar, we conclude with the observation that a categorical grammar will not be adequate. Speakers are more or less confident in their judgements, and gradiently accept sentences intended to express a different information structure, depending on whether the sentences have a similar accent pattern. A gradient grammar, like stochastic OT, which uses overlapping constraints, can account much better for the observed variability. This is, however, a subject for future research.
8.6 Appendix
Short sentences (three contexts)
1. Maler bringen immer Bilder mit. ‘Painters always bring pictures.’
2. Lehrer bringen immer Hefte mit. ‘Teachers always bring notebooks.’
3. Sänger bringen immer Trommeln mit. ‘Singers always bring drums.’
4. Ruderer bringen immer Boote mit. ‘Oarsmen always bring boats.’
5. Geiger bringen immer Platten mit. ‘Violinists always bring records.’
6. Schüler bringen immer Stifte mit. ‘Students always bring pens.’
Long sentences (three contexts)
7. Passagiere nach Rom nehmen meistens den späten Flug. ‘Passengers to Rome mostly take the late flight.’
8. Reisende nach Mailand fahren oft mit dem schnellen Bus. ‘Travelers to Milan often travel with the express bus.’
9. Autofahrer nach Griechenland nehmen immer den kürzesten Weg. ‘Car drivers to Greece always take the shortest road.’
10. Schiffe nach Sardinien fahren meistens mit voller Ladung. ‘Ships to Sardinia mostly sail with a full cargo.’
11. Züge nach England fahren oft mit rasantem Tempo. ‘Trains to England often ride at full speed.’
12. Trekker nach Katmandu reisen meistens mit vollem Rucksack. ‘Trekkers to Katmandu mostly travel with a full backpack.’
Variable scope sentences (four contexts)
13. Alle Generäle sind nicht loyal. ‘All generals are not loyal.’
14. Beide Autos sind nicht beschädigt worden. ‘Both cars have not been damaged.’
15. Viele Gäste sind nicht gekommen. ‘Many guests did not come.’
9 Prototypicality Judgements as Inverted Perception
PAUL BOERSMA
In recent work (Boersma and Hayes 2001), Stochastic Optimality Theory has been used to model grammaticality judgments in exactly the same way as corpus frequencies are modelled, namely as the result of noisy evaluation of constraints ranked along a continuous scale. It has been observed, however, that grammaticality judgements do not necessarily reflect relative corpus frequencies: it is possible that structure A is judged as more grammatical than a competing structure B, whereas at the same time structure B occurs more often in actual language data than structure A. The present chapter addresses one of these observations, namely the finding that ‘ideal’ forms found in experiments on prototypicality judgements often turn out to be peripheral within the corpus distribution of their grammatical category (Johnson, Flemming, and Wright 1993). At first sight one must expect that Stochastic Optimality Theory will have trouble handling such observed discrepancies. The present chapter, however, shows that a bidirectional model of phonetic perception and production (Boersma 2005) solves the paradox. In that model, corpus frequency reflects the production process, whereas prototypicality judgements naturally derive from a simpler process, namely the inverted perception process.
9.1 The /i/ prototype effect: prototypes are peripheral
A notorious example of the difference between grammaticality judgements and corpus frequencies is the ‘/i/ prototype effect’ in phonology: if the experimenter asks a subject to choose the most /i/-like vowel from among a set of tokens that vary in their spectral properties, the subject tends to choose a very peripheral token, i.e. one with a very low first formant (e.g. 250 Hz) and a very high second formant (Johnson et al. 1993; Frieda et al. 2000). In actual speech, less extreme formant values (e.g. an F1 of 300 Hz) are much more common. Apparently, then, the token that the subject prefers is much more /i/-like than the average realization of the vowel /i/ is.
9.1.1 Why the /i/ prototype effect is a problem for linguistic models
The /i/ prototype effect has consequences for models of phonological grammar. The commonly assumed three-level grammar model, for instance, has trouble accounting for it. In this model, the phonology module maps an abstract underlying form (UF), for instance the lexical vowel |i|, to an equally discrete abstract surface form (SF), for instance /i/, and the phonetic implementation module subsequently maps this phonological SF to a continuous overt phonetic form (OF), which has auditory correlates, such as a value of the first formant, and articulatory correlates, such as a certain tongue height and shape. Such a grammar model can thus be abbreviated as UF→SF→OF.
The experimental prototypicality judgement task described above involves a mapping from the phonological surface form /i/ to an overt auditory first formant value, that is an SF→OF mapping. In the three-level grammar model, therefore, the natural way to account for this task is to assume that it shares the SF→OF mapping with the phonetic implementation process. If so, corpus frequencies (which result from phonetic implementation) should be the same as grammaticality judgements (whose best result is the prototype). Given that the two turn out to be different, Johnson et al. (1993) found the UF→SF→OF model wanting and proposed the model UF→SF→HyperOF→OF, where the additional intermediate representation HyperOF is a ‘hyperarticulated’ phonetic target. The prototypicality task, then, was proposed to tap HyperOF, whereas corpus frequencies reflect OF. The present paper shows, however, that if one distinguishes between articulatory and auditory representations at OF, the two tasks (production and prototypicality) involve different mappings, and the /i/ prototypicality effect arises automatically without invoking the additional machinery of an extra intermediate representation and an extra processing stratum.
9.2 A bidirectional constraint-based explanation of the /i/ prototype effect
This section presents a simple constraint-based model of the phonological grammar and of five phonological processes that are defined on this grammar. The account leads to an informal explanation for the /i/ prototype effect.
9.2.1 A grammar model with two phonological and two phonetic representations
The grammar model presented in Figure 9.1 is the Optimality-Theoretic model of ‘phonology and phonetics in parallel’ (Boersma 2005).
Figure 9.1 shows the four relevant representations and their connections. There are two separate phonetic forms: the auditory form (AudF) appears because it is the input to comprehension, and the articulatory form (ArtF) appears because it is the output of production. ArtF occurs below AudF because 9-month-old children can perceive sounds that they have no idea how to produce (for an overview, see Jusczyk 1997); at this age, therefore, there has to be a connection from AudF to SF (or even to UF, once the lexicon starts to be built up) that cannot pass through ArtF; Figure 9.1 generalizes this for speakers of any age.
9.2.2 Linguistic processes are defined on the grammar
Figure 9.1 is not a processing model. Rather, linguistic and paralinguistic tasks have to be defined as processes that travel the representations in Figure 9.1 and are evaluated by the constraints that are visited on the way. Normal language use consists of two linguistic tasks: that of the listener (comprehension) and that of the speaker (production). This section describes the implementation of these two linguistic tasks; three paralinguistic tasks (including the prototypicality task), which can be regarded as simplified versions of the linguistic tasks, are described in the next section.
Boersma (2005) proposes that in the model of Figure 9.1 the linguistic task of comprehension is implemented as two consecutive mappings (cf. McQueen and Cutler 1997), as shown on the left in Figure 9.2. The first mapping in comprehension is perception (also called prelexical perception or phonetic parsing). In general, perception is the process that maps continuous sensory information onto a more abstract mental representation. In phonology, perception is the process that maps continuous auditory (and sometimes visual) information, that is AudF, onto a discrete phonological surface representation, that is SF.
Figure 9.1. The grammar model underlying bidirectional phonology and phonetics. [Diagram: four representations from top to bottom, UF (lexical constraints), SF (structural constraints), AudF (auditory constraints?), and ArtF (articulatory constraints), linked by faithfulness constraints (UF–SF), cue constraints (SF–AudF), and sensorimotor constraints (AudF–ArtF).]
The shortest route from AudF to SF in Figure 9.1 determines what constraints evaluate this mapping: the relation between input (AudF) and output (SF) is evaluated with cue constraints
(Escudero and Boersma 2003, 2004), and the output (SF) is evaluated with the familiar structural constraints known since the earliest Optimality Theory (OT; Prince and Smolensky 1993). This AudF→SF mapping is language-specific, and several aspects of it have been modelled in OT: categorizing auditory features to phonemes and autosegments (Boersma 1997, 1998a et seq.; Escudero and Boersma 2003, 2004), and building metrical foot structure (Tesar 1997; Tesar and Smolensky 2000; Apoussidou and Boersma 2004).
The second mapping in comprehension, shown at the top left in Figure 9.2, is that from SF to UF and can be called recognition, word recognition, or lexical access. In this mapping, the relation between input (SF) and output (UF) is evaluated with faithfulness constraints such as those familiar from two-level OT (McCarthy and Prince 1995), and the output (UF) is evaluated with lexical access constraints (Boersma 2001).
Boersma (2005) proposes that in contradistinction to comprehension, production (shown at the right in Figure 9.2) consists of one single mapping from UF to ArtF, without stopping at any intermediate form as is done in comprehension. In travelling from UF to ArtF, the two representations SF and AudF are necessarily visited, so the production process must evaluate triplets of { SF, AudF, ArtF } in parallel. As can be seen from Figure 9.1, the evaluation of these triplets must be done with faithfulness constraints, structural constraints, cue constraints, sensorimotor constraints (which express the speaker’s knowledge of how to pronounce a target auditory form, and of what a given articulation will sound like), and articulatory constraints (which express minimization of effort; see Boersma 1998a, Kirchner 1998).
According to Boersma (2005), the point of regarding phonological and phonetic production as parallel processes is that this can explain how discrete phonological decisions at SF can be influenced by gradient phonetic considerations such as salient auditory cues at AudF (e.g. Steriade 1995) and articulatory effort at ArtF (e.g. Kirchner 1998).
Figure 9.2. The linguistic task of the listener, and that of the speaker. [Two panels, Comprehension and Production, each showing the representations UF, SF, AudF, and ArtF.]
Figure 9.3. Three laboratory tasks. [Three panels, Phoneme categorization, Prototypicality, and Phoneme production, each showing the representations UF, SF, AudF, and ArtF.]
9.2.3 Experimental tasks are paralinguistic processes
Experimental tasks in the phonetics laboratory are often designed to reflect only a part of one of the linguistic processes shown in Figure 9.2. The present section addresses the three tasks that are relevant for explaining the /i/ prototype effect, namely the phoneme categorization task, the phoneme production task, and the phoneme prototypicality task.
In the experimental task of phoneme categorization the participant is asked to classify a given stimulus as one of the phonemes of her language, for instance to classify a synthetic vowel with a known F1 of 360 Hz as either the vowel /ɪ/ or the vowel /i/. Such an experiment tends to be set up in such a way that the influence of the lexicon is minimized, for instance by presenting the response categories as semantically empty vowel symbols (e.g. “a”, “e”, “i”, “o”, “u” for Spanish listeners) or as equally accessible lexical items (e.g. “ship” and “sheep” for English listeners).¹ In the former case, UF may not be accessed at all; in the latter case, the SF→UF mapping may be equally easy for both categories; in both cases, the influence of the lexicon may be ignored, so that the task can be abbreviated as in Figure 9.3 (left). The only constraints that are relevant for this mapping are the cue constraints and the structural constraints.
¹ Unless the very point of the experiment is to investigate the influence of the lexicon on prelexical perception. By the way, if such influences turn out to exist (e.g. Ganong 1980; Samuel 1981), the comprehension model in Figure 9.2 will have to be modified in such a way that perception and recognition work in parallel. In that case, however, the phoneme categorization task will still look like that in Figure 9.3, and the results of the present paper will still be valid.
In the experimental task of phoneme production the participant is asked to pronounce either a nonsense word or a word with “no” phonology, that is where SF is identical to UF (no faithfulness violations), such as English hid or heed. In both cases, the influence of the lexicon can again be ignored, so the task can be abbreviated as in Figure 9.3 (right). The relevant constraints will
be the cue constraints, the sensorimotor constraints, and the articulatory constraints.
In the experimental task of prototypicality judgements the participant is given an SF, as in the phoneme production task, and asked to choose an AudF, similar to those in the phoneme categorization task. Since this task involves neither the lexicon nor any actual articulation, it can be abbreviated as Figure 9.3 (middle). The only relevant constraints are the cue constraints (if auditory constraints do not exist).
9.2.4 The informal explanation
The fact that the prototypicality task yields a different result than the phoneme production task can be attributed to the difference between the relevant two processes in Figure 9.3: in the production task, constraints on articulatory effort do play a role, in the prototypicality task they do not. This is a robust effect that seems to withstand conscious manipulation: even if listeners are asked to choose the auditory form that they would say themselves, they respond with the prototype, not with the form they would really produce (Johnson et al. 1993). The result of the involvement of articulatory constraints in the phoneme production task is that peripheral tokens of /i/ may be ruled out because they are too effortful, for example because they require too much articulatory precision, whereas tokens closer to the easiest vowel articulation, perhaps [ə], do not violate any high-ranked articulatory constraints.
9.3 A formalization in Optimality Theory
While the explanation presented informally in Section 9.2.4 would work for any constraint-based theory of bidirectional phonology and phonetics in parallel, this chapter formally shows that it works for the particular constraint-based framework of Stochastic Optimality Theory. The point of this exercise is not only to provide a rigorous illustrative example, so as to achieve descriptive adequacy, but also to propose an explanation of the acquisition of the relevant part of the grammar in terms of an initial state and a learning path, so as to achieve explanatory adequacy and to show that the resulting grammar is stable.
9.3.1 Formalizing phoneme categorization and its acquisition
As seen in Figure 9.3 (left), phoneme categorization can be seen as involving prelexical perception only, that is as a mapping from an auditory form to a phonological surface form. For the case of the /i/ prototype effect, it is relevant
Figure 9.4. Production distributions of three vowels. [Probability density curves for /i/, /e/, and /a/; x-axis, F1 (Hz), from 100 to 900; y-axis, probability density.]
to look at auditory events that are likely to be perceived as /i/ or as one of its neighbours in the vowel space, such as /ɪ/ or /e/. Thus, the auditory form (AudF) is a combination of an F1 and an F2 value, and the surface form (SF) is a vowel segment such as /i/ or /e/. This section shows how the AudF→SF mapping is handled with an Optimality-Theoretic grammar that contains cue constraints, which evaluate the relation between AudF and SF, and structural constraints, which evaluate the output representation SF.
For simplicity I discuss the example of a language with three vowels, /a/, /e/, and /i/, in which the only auditory distinction between these vowels lies in their F1 values. Suppose that the speakers realize these three vowels most often with F1 values of 700 Hz, 500 Hz, and 300 Hz, respectively, but that they also vary in their realizations. If this variation can be modelled with Gaussian curves with standard deviations of 60 Hz, the distributions of the speakers’ productions will look as in Figure 9.4.
Now how do listeners classify incoming F1 values, that is to which of the three categories /a/, /e/, and /i/ do they map a certain incoming F1 value x? This mapping can be handled by a family of negatively formulated Optimality-Theoretic cue constraints, which can be expressed as ‘if the auditory form contains an F1 of x Hz, the corresponding vowel in the surface form should not be y’ (Escudero and Boersma 2003, 2004).² These cue constraints exist for all F1 values between 100 and 900 Hz and for all three vowels. Examples are given in (9.1).
² There are two reasons for the negative formulation of these constraints. First, a positive formulation would simply not work in the case of the integration of multiple auditory cues (Boersma and Escudero 2004). Second, the negative formulation allows these constraints to be used in both directions (comprehension and production), as is most clearly shown by the fact that they can be formulated symmetrically as *[x]/y/ (Boersma 2005). The former reason is not relevant for the present paper, but the second is, because the same cue constraints are used in the next two sections.
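The production distributions of this three-vowel example can be written out explicitly. The Python sketch below assumes exactly the parameters of the running example (mean F1 values of 300, 500, and 700 Hz and a common standard deviation of 60 Hz) together with equal prior probabilities for the three vowels; the function names are hypothetical. It computes which vowel was most likely intended for a given F1 value. This is only a reference point for an ideal listener, not the constraint-based perception grammar itself.

```python
import math

# Assumed parameters of the running example: mean F1 (Hz) per vowel,
# a shared standard deviation, and equal prior probability for each vowel.
VOWEL_MEANS = {"/i/": 300.0, "/e/": 500.0, "/a/": 700.0}
SD = 60.0


def density(f1, mean, sd=SD):
    """Gaussian probability density of an F1 value for a vowel with the given mean."""
    z = (f1 - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior(f1):
    """Probability of each vowel given an F1 value, assuming equal priors."""
    densities = {vowel: density(f1, mean) for vowel, mean in VOWEL_MEANS.items()}
    total = sum(densities.values())
    return {vowel: d / total for vowel, d in densities.items()}


def most_likely_vowel(f1):
    """Vowel whose production distribution assigns the F1 value the highest density."""
    return max(VOWEL_MEANS, key=lambda vowel: density(f1, VOWEL_MEANS[vowel]))


for f1 in (320, 460, 680):
    print(f1, most_likely_vowel(f1), posterior(f1))
```

With equal variances and equal priors, the two category boundaries fall midway between adjacent means, at 400 Hz and 600 Hz, so that posterior(400) assigns /i/ and /e/ the same probability.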
(9.1) Cue constraints for mapping F1 values to vowel categories: ‘an F1 of 340 Hz is not /a/’ ‘an F1 of 340 Hz is not /e/’ ‘an F1 of 340 Hz is not /i/’ ‘an F1 of 539 Hz is not /i/’ The second type of constraints involved in the AudF!SF mapping are the structural constraints that evaluate the output SF. In the present case, they could be something like */a/, */e/, and */i/. I assume that all three vowels are equally perfectly licit phonemes of the language, so that these constraints must be ranked low. I ignore them in the rest of this paper, so that phoneme categorization is handled solely by the cue constraints.3 The ranking of the cue constraints results from lexicon-driven perceptual learning (Boersma 1997, 1998a; Escudero and Boersma 2003, 2004): the learner hears an auditory event drawn from the environmental distributions in Figure 9.4 and classiWes it as a certain vowel, and the lexicon subsequently tells her which vowel category she should have perceived. This type of learning assumes that the acquisition process contains a period in which the listener already knows that the language has three vowel categories and in which all her lexical representations are already correct. If such a learner misclassiWes a speaker’s intended /pit/ as /pet/, her lexicon, which contains the underlying form jpitj, will tell her that she should have perceived /pit/ instead. When detecting an error in this way, the learner will take action by changing the ranking of some constraints. Suppose that at some point during acquisition some of the constraints are ranked as in Tableau 9.1. The learner will then perceive an incoming F1 of 380 Hz as the vowel /e/, as indicated by the pointing Wnger in Tableau 9.1. We can also read from Tableau 9.1 that 320 Hz will be perceived as /i/, and 460 Hz as /e/. Tableau 9.1. Learning to perceive vowel height
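Input: [380 Hz] (UF = |i|)
Constraints (left to right, highest-ranked first): 320 Hz not /a/, 380 Hz not /a/, 460 Hz not /i/, 320 Hz not /e/, 460 Hz not /a/, 380 Hz not /i/, 380 Hz not /e/, 320 Hz not /i/, 460 Hz not /e/
Candidates:
   /a/   *! on ‘380 Hz not /a/’
☞  /e/   ←* on ‘380 Hz not /e/’
√  /i/   *!→ on ‘380 Hz not /i/’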
3 Auditory constraints, if they exist, evaluate the input and cannot therefore distinguish between the candidates.
If the lexicon now tells the learner that she should have perceived /i/ instead of /e/, she will regard this as the correct adult SF, as indicated by the check mark in Tableau 9.1. According to the gradual learning algorithm for stochastic OT (Boersma 1997, Boersma and Hayes 2001), the learner will take action by raising the ranking value of all the constraints that prefer the adult form /i/ to her own form /e/ (here only ‘380 Hz is not /e/’) and by lowering the ranking value of all the constraints that prefer /e/ to /i/ (here only ‘380 Hz is not /i/’). These rerankings are indicated by the arrows in Tableau 9.1.

To see what kind of final perception behaviour this procedure leads to, I ran a computer simulation analogous to the one by Boersma (1997). A virtual learner has 243 constraints (F1 values from 100 to 900 Hz in steps of 10 Hz, for all three vowel categories), all with the same initial ranking value of 100.0. The learner then hears 10 million F1 values randomly drawn from the distributions in Figure 9.4, with an equal probability of one in three for each vowel. She is subjected to the learning procedure exemplified in Tableau 9.1, with full knowledge of the lexical form, with an evaluation noise of 2.0, and with a plasticity (the amount by which ranking values rise or fall when a learning step is taken) of 0.01. The result is shown in Figure 9.5.

The figure is to be read as follows. F1 values below 400 Hz will mostly be perceived as /i/, since in that region the constraint ‘an F1 of x Hz is not /i/’ (the solid curve) is ranked lower than the constraints ‘an F1 of x Hz is not /e/’ (the dashed curve) and ‘an F1 of x Hz is not /a/’ (the dotted curve). Likewise, F1 values above 600 Hz will mostly be perceived as /a/, and values between 400 and 600 Hz mostly as /e/. For every F1 value the figure shows us not only the most often perceived category but also the degree of variation. Around 400 Hz, /i/ and /e/ perceptions are equally likely.
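The learning procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the original simulation: the environment is assumed to be Gaussian with means of 300, 500, and 700 Hz and a standard deviation of 60 Hz (the text gives only the /i/ mode and the standard deviation), and the names `train` and `perceive` are invented here.

```python
import random

F1_GRID = list(range(100, 901, 10))   # 81 F1 values, 100..900 Hz in 10 Hz steps
VOWELS = ("i", "e", "a")
# ASSUMPTION: Gaussian environment. Only the /i/ mode (300 Hz) and the
# 60 Hz standard deviation are stated in the text; 500 and 700 Hz are guesses.
MEANS = {"i": 300.0, "e": 500.0, "a": 700.0}
SD = 60.0

def perceive(ranking, f1, noise, rng):
    """Stochastic OT evaluation for one auditory input.

    Candidate /v/ violates only 'f1 Hz is not /v/', so the winner is the
    vowel whose constraint has the lowest noisy ranking value."""
    return min(VOWELS, key=lambda v: ranking[(f1, v)] + rng.gauss(0.0, noise))

def train(n_tokens, plasticity=0.01, noise=2.0, seed=0):
    """Lexicon-driven perceptual learning with the gradual learning algorithm."""
    rng = random.Random(seed)
    # 243 cue constraints, all starting at ranking value 100.0
    ranking = {(f1, v): 100.0 for f1 in F1_GRID for v in VOWELS}
    for _ in range(n_tokens):
        intended = rng.choice(VOWELS)                      # each vowel: 1 in 3
        f1 = int(round(rng.gauss(MEANS[intended], SD) / 10.0)) * 10
        if not 100 <= f1 <= 900:
            continue                                       # outside the grid
        perceived = perceive(ranking, f1, noise, rng)
        if perceived != intended:                          # lexicon signals an error
            ranking[(f1, perceived)] += plasticity         # raise 'f1 is not /perceived/'
            ranking[(f1, intended)] -= plasticity          # lower 'f1 is not /intended/'
    return ranking
```

Calling `train(10_000_000)` would follow the parameters in the text but is slow in pure Python; a shorter run with a larger plasticity, such as `train(100_000, plasticity=0.1)`, already shows the three curves separating near the category centres.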
Below 400 Hz it becomes more likely that the listener will perceive /i/, and increasingly so when the
Figure 9.5. The final ranking of ‘an F1 of x Hz is not /vowel/’, for the vowels /i/ (solid curve), /e/ (dashed curve), and /a/ (dotted curve). [Plot not reproduced; horizontal axis: F1 (Hz), 100 to 900; vertical axis: ranking value, 90 to 110.]
distance between the curves for /i/ and /e/ increases. This distance is largest for F1 values around 250 Hz, where there are 99.8 per cent /i/ perceptions and only 0.1 per cent perceptions of /e/ and /a/ each. Below 250 Hz, the curves approach each other again, leading to more variation in categorization. A detailed explanation of the shapes of the curves in terms of properties of the gradual learning algorithm (approximate probability matching between 250 and 750 Hz, and low corpus frequencies around 100 and 900 Hz) is provided at the end of the next section, where the shapes of the curves are related to the behaviour of prototypicality judges.

9.3.2 Formalizing the prototypicality task

As seen in Figure 9.3 (middle), the prototypicality task can be seen as a mapping from a phonological surface form to an auditory form, without the involvement of an articulatory form. This section shows how this SF→AudF mapping is handled with the same optimality-theoretic cue constraints as phoneme categorization. From Figure 9.1, we can see that auditory constraints might be involved in evaluating the output of this mapping, but given that we do not know whether such constraints (against loud and unpleasant noises?) are relevant at all for phonology, I ignore them here. The mapping from SF to AudF in the prototypicality task is thus entirely handled by the cue constraints. For the listener simulated in the previous section, these constraints are ranked as in Figure 9.5.

The ranking of the constraints for /i/ has been copied from the solid curve in Figure 9.5 to the top row in Tableau 9.2. In Figure 9.5, for instance, the bottom of the /i/ curve lies at an F1 of 250 Hz. In Tableau 9.2 this is reflected by the bottom ranking of ‘250 Hz is not /i/’. In Tableau 9.2 we also see that as the F1 goes up or down from 250 Hz, the constraint against perceiving this F1 as /i/ becomes higher ranked, just as in Figure 9.5.
With the ranking shown in Figure 9.5 and Tableau 9.2, and with zero evaluation noise, the listener will choose an F1 of 250 Hz as the optimal value for /i/. This is more peripheral (more towards the edge of the F1 continuum) than the most often heard /i/, which has an F1 of 300 Hz according to Figure 9.4. The size of the effect (50 Hz) is comparable to the effect found by Johnson et al. (1993) and Frieda et al. (2000). Of course, this simulated value of 50 Hz depends on several assumptions, such as an initial equal ranking for all the constraints, which is probably unrealistic (for a more realistic proposal based on a period of distributional learning before lexicon-driven learning, see Boersma et al. 2003). The parameter that determines the size of the effect in the present simulation is the standard deviation of the F1 values in the environmental distribution in Figure 9.4, which was 60 Hz. With different standard deviations, a different effect size is expected.
Tableau 9.2. The auditory F1 value that gives the best /i/
Input: /i/
Constraints (left to right, highest-ranked first): 320 Hz not /i/, 310 Hz not /i/, 170 Hz not /i/, 180 Hz not /i/, 300 Hz not /i/, 190 Hz not /i/, 290 Hz not /i/, 200 Hz not /i/, 280 Hz not /i/, 210 Hz not /i/, 270 Hz not /i/, 230 Hz not /i/, 220 Hz not /i/, 240 Hz not /i/, 260 Hz not /i/, 250 Hz not /i/
Candidates: [170 Hz], [180 Hz], [190 Hz], [200 Hz], [210 Hz], [220 Hz], [230 Hz], [240 Hz], [250 Hz], [260 Hz], [270 Hz], [280 Hz], [290 Hz], [300 Hz], [310 Hz], [320 Hz]
Violations: each candidate [x Hz] violates only ‘x Hz not /i/’. The violation is fatal (*!) for every candidate except [250 Hz], whose single violation (*) falls on the lowest-ranked constraint.
The result of 250 Hz in Tableau 9.2 was based on a categorical ranking of the constraints. In the presence of evaluation noise the outcome will vary. If the evaluation noise is 2.0, i.e. the same as during the learning procedure of the previous section, the outcome for the listener of Figure 9.5 and Tableau 9.2 will vary as in Figure 9.6, which was assembled by computing the outcomes of 100,000 tableaus like Tableau 9.2. The differences between environmental F1 values and prototypicality judgements seen when comparing Figures 9.4 and 9.6 are very similar to the production/perception differences in the experiments by Johnson et al. (1993) and Frieda et al. (2000).
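The effect of evaluation noise on the prototypicality task can be illustrated with a small sketch. Only the order of the constraints is taken from Tableau 9.2; the numerical ranking values (evenly spaced, 1.0 apart, from 105 down to 90) are an assumption made for illustration, as are the function names.

```python
import random
from collections import Counter

# Constraint order from Tableau 9.2, highest-ranked first ('x Hz not /i/').
ORDER = [320, 310, 170, 180, 300, 190, 290, 200,
         280, 210, 270, 230, 220, 240, 260, 250]
# ASSUMPTION: evenly spaced ranking values; the real values are those of Figure 9.5.
RANKING_I = {f1: 90.0 + (len(ORDER) - 1 - k) for k, f1 in enumerate(ORDER)}

def best_token(ranking, noise, rng):
    """One evaluation of a tableau like Tableau 9.2.

    Every candidate [x Hz] violates only 'x Hz not /i/', so the winner is the
    F1 whose constraint is lowest after adding evaluation noise."""
    return min(ranking, key=lambda f1: ranking[f1] + rng.gauss(0.0, noise))

def prototype_distribution(ranking, noise, n_trials, seed=0):
    """Tally the winners of n_trials noisy evaluations."""
    rng = random.Random(seed)
    return Counter(best_token(ranking, noise, rng) for _ in range(n_trials))
```

With `noise=0.0` the winner is always 250 Hz, as in Tableau 9.2; with `noise=2.0` the tally spreads over the neighbouring F1 values, which is the kind of distribution that Figure 9.6 reports for the simulated listener.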
Figure 9.6. Prototypicality distributions for the three vowels. [Plot not reproduced; horizontal axis: F1 (Hz), 100 to 900; vertical axis: probability density; one curve each for /i/, /e/, and /a/.]
The conclusion is that if the prototypicality task uses the same constraint ranking as phoneme categorization, auditorily peripheral segments will be judged best if their auditory values are extreme, because cue constraints have automatically been ranked lower for extreme auditory values than for more central auditory values.

The question that remains is: how has the /i/ curve in Figure 9.5 become lower at 250 Hz than at 300 Hz? The answer given by Boersma (1997) is the probability-matching property of the Gradual Learning Algorithm: the ultimate vertical distance between the /i/ and /e/ curves for a given F1 is determined (after learning from a sufficient amount of data) by the probability that that F1 reflects an intended /i/ rather than an intended /e/; given that an F1 of 250 Hz has a smaller probability of having been intended as /e/ than an F1 of 300 Hz, the vertical distance between the /i/ and /e/ curves grows to be larger at 250 Hz than at 300 Hz, providing that the learner is given sufficient data. With the Gradual Learning Algorithm and enough input, the prototypicality judge will automatically come to choose the F1 token that is least likely to be perceived as anything else than /i/.4

4 This goal of choosing the least confusing token was proposed by Lacerda (1997) as the driving force behind the prototypicality judgement. He did not propose an underlying mechanism, though. See also Section 9.4.

There are two reasons why the prototype does not have an even lower F1 than 250 Hz. The first reason, which can be illustrated with the simulation, is that there are simply not enough F1 values of, say, 200 Hz to allow the learner to reach the final state of a wide separation between the /i/ and /e/ curves; for the simulated learner, Figure 9.5 shows that even 10 million inputs did not suffice. The second reason, not illustrated by the simulation, is that in reality the F1 values are not unbounded. Very low F1 values are likely to be perceived as an approximant, fricative, or stop rather than as /i/. Even within the vowel
space, this effect can be seen at the other end of the continuum: one would think that the best token for /a/ would have an extremely high F1, but in reality an F1 of, say, 3000 Hz will be perceived as /i/, because the listener will reinterpret it as an F2 with a missing F1.

9.3.3 Formalizing phoneme production

Now that we have seen how inverted perception accounts for the /i/ prototype effect, we still have to see how it is possible that the same peripheral values are not used in the phoneme production task. Presumably, after all, the learner as a speaker will grow to match the modal F1 value of 300 Hz that she finds in her environment (if sound change can be ignored). The answer is shown in Figure 9.3 (right): the phoneme production task takes an SF as its input (as does the prototypicality task), but has to generate both an auditory form and an articulatory form as its output. Similarly to the prototypicality task, the production process will have to take into account the cue constraints, but unlike the prototypicality task, the production process will also have to take into account sensorimotor constraints and articulatory constraints.

Tableau 9.3 shows how the phonological surface form /i/ is produced phonetically. The cue constraints are still ranked exactly as in Tableau 9.2. Every candidate cell, however, now contains a pair of phonetic representations: articulatory and auditory. The articulatory part of each candidate shows the gestures needed for articulating [i]-like sounds. For simplicity I assume that the main issue is the precision with which the tongue has to be bulged towards the palate, and that more precision yields lower F1 values, for example a precision of ‘26’ yields an F1 of 240 Hz whereas a precision of ‘17’ yields an F1 of 330 Hz. These precision values are evaluated by articulatory constraints that are ranked by the amount of effort involved, i.e. the constraint ‘the precision should not be greater than 26’ has to outrank the constraint ‘the precision should not be greater than 17’.

The sensorimotor constraints are missing from Tableau 9.3. This is because for purposes of simplicity I assume here that the relation between articulatory and auditory form is fixed, that is the speaker has a fully proficient view of what any articulation will sound like and of how any auditory event can be implemented articulatorily. The candidate [240 Hz]Aud [prec=26]Art, for instance, occurs in Tableau 9.3 because it only violates the low-ranked sensorimotor constraint *[240 Hz]Aud [prec=26]Art, whereas candidates like [240 Hz]Aud [prec=22]Art and [270 Hz]Aud [prec=26]Art violate the high-ranked sensorimotor constraints *[240 Hz]Aud [prec=22]Art and *[270 Hz]Aud [prec=26]Art and are therefore ignored in Tableau 9.3. By making
this simplification, we can regard the relationship between AudF and ArtF as fixed, so that only the cue constraints and the articulatory constraints determine the speaker’s behaviour. The result of the ranking in Tableau 9.3 is that the auditory-articulatory pair [F1 = 300 Hz]Aud [prec=20]Art wins. Forms with a lower F1 are too effortful, whereas forms with a higher F1 are too confusable.

Tableau 9.3. Providing vowel height
Input: /i/
Constraints (left to right, highest-ranked first): prec not > 26, 320 Hz not /i/, prec not > 24, 310 Hz not /i/, prec not > 22, 170 Hz not /i/, prec not > 20, 180 Hz not /i/, prec not > 18, 300 Hz not /i/, prec not > 16, 190 Hz not /i/, 290 Hz not /i/, 200 Hz not /i/, 280 Hz not /i/, 210 Hz not /i/, 270 Hz not /i/, 230 Hz not /i/, 220 Hz not /i/, 240 Hz not /i/, 260 Hz not /i/, 250 Hz not /i/
Candidates: [170 Hz]Aud [prec=33]Art, [180 Hz]Aud [prec=32]Art, [190 Hz]Aud [prec=31]Art, [200 Hz]Aud [prec=30]Art, [210 Hz]Aud [prec=29]Art, [220 Hz]Aud [prec=28]Art, [230 Hz]Aud [prec=27]Art, [240 Hz]Aud [prec=26]Art, [250 Hz]Aud [prec=25]Art, [260 Hz]Aud [prec=24]Art, [270 Hz]Aud [prec=23]Art, [280 Hz]Aud [prec=22]Art, [290 Hz]Aud [prec=21]Art, [300 Hz]Aud [prec=20]Art, [310 Hz]Aud [prec=19]Art, [320 Hz]Aud [prec=18]Art
Fatal violations (*!): [170 Hz] to [230 Hz] on ‘prec not > 26’; [320 Hz] on ‘320 Hz not /i/’; [240 Hz] and [250 Hz] on ‘prec not > 24’; [310 Hz] on ‘310 Hz not /i/’; [260 Hz] and [270 Hz] on ‘prec not > 22’; [280 Hz] and [290 Hz] on ‘prec not > 20’. The winner [300 Hz]Aud [prec=20]Art violates only lower-ranked constraints.
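The contrast between the production task and the prototypicality task can be checked with a short evaluation sketch. The interleaved ranking below is read off the header of Tableau 9.3, the mapping from F1 to precision is inferred from the candidate pairs listed there, and the function names are invented for this illustration.

```python
# Ranking from Tableau 9.3, highest-ranked first.
# ("prec", n) stands for 'the precision should not be greater than n';
# ("cue", x) stands for 'x Hz is not /i/'.
RANKING = [("prec", 26), ("cue", 320), ("prec", 24), ("cue", 310),
           ("prec", 22), ("cue", 170), ("prec", 20), ("cue", 180),
           ("prec", 18), ("cue", 300), ("prec", 16), ("cue", 190),
           ("cue", 290), ("cue", 200), ("cue", 280), ("cue", 210),
           ("cue", 270), ("cue", 230), ("cue", 220), ("cue", 240),
           ("cue", 260), ("cue", 250)]

def precision(f1):
    """ASSUMPTION: linear F1-to-precision map inferred from the candidates
    in Tableau 9.3 (170 Hz -> 33, 240 Hz -> 26, 300 Hz -> 20)."""
    return 50 - f1 // 10

CANDIDATES = [(f1, precision(f1)) for f1 in range(170, 321, 10)]

def violates(constraint, candidate):
    kind, value = constraint
    f1, prec = candidate
    return f1 == value if kind == "cue" else prec > value

def evaluate(candidates, ranking):
    """Strict-ranking evaluation: each constraint in turn removes the candidates
    that violate it, unless every remaining candidate violates it."""
    survivors = list(candidates)
    for constraint in ranking:
        passing = [c for c in survivors if not violates(constraint, c)]
        if passing:
            survivors = passing
    return survivors
```

`evaluate(CANDIDATES, RANKING)` models the production task and leaves only the pair with 300 Hz and precision 20. Dropping the articulatory constraints, `evaluate(CANDIDATES, [c for c in RANKING if c[0] == "cue"])`, models the prototypicality task and leaves the candidate with 250 Hz.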
The result of the phoneme production task is thus very different from that of the prototypicality task. The difference between the two tasks can be reduced to the presence of the articulatory constraints in the production task and their absence in the prototypicality task.

9.3.4 The formal explanation for the /i/ prototype effect

With the tables and simulation in Sections 9.3.1 to 9.3.3 the /i/ prototype effect can now be explained in more formal terms than in Section 9.2.4. The simulation in Section 9.3.1 explains the fact that the F1 in the prototypicality task was 50 Hz lower than the modal F1 in the learner’s environment, while the difference between the tables in Section 9.3.2 and Section 9.3.3 explains the fact that the F1 in the production task was 50 Hz higher than in the prototypicality task. One can say that the prototypicality effect is –50 Hz and the articulatory effect is +50 Hz, and that the two effects cancel out.

The fact that the two effects cancel out in the example of Sections 9.3.1 to 9.3.3 is due to my arbitrary choices for the ranking of the articulatory constraints in Tableau 9.3. Had I ranked these constraints higher, the candidate [310 Hz] might have won in Tableau 9.3; had I ranked them lower, the candidate [290 Hz] might have won. Either alternative would have led to a predicted shift in F1 from one generation of speakers to the next. The actual ranking in Tableau 9.3 was chosen in such a way that the two effects cancel out exactly, so that the production distributions stay stable over the generations.

To sum up: three F1 values for /i/ have been addressed: the modal F1 of the first generation, the prototypical F1 for the second generation, and the modal F1 of the second generation. If the prototypicality and articulatory effects cancel out (as they do in reality, if there is no sound change), the first and third of these F1 values will be identical, and the prototype F1 will be the odd one out and the most conspicuous to researchers. Its difference from the two modal F1 values has been accounted for in Sections 9.3.2 and 9.3.3, respectively.

9.3.5 Stability over generations: a coincidence?

The really surprising observation is now no longer the fact that the prototypicality task leads to a different F1 than the modal F1 produced by the first and second generations, but the fact that the modal F1 of the second generation is identical to that of the first. The question of the stability of the production distribution could be answered with the help of Kirchner’s (1998: 288) proposal that the ranking of articulatory constraints is fixed, namely simply a function of articulatory effort alone. Imagine that this fixed ranking is the one in Tableau 9.3, but that
a learner is confronted with an environmental distribution with a modal F1 of 280 instead of 300 Hz. The F1 of the prototype /i/ will shift down, but not by 20 Hz, because the influence of /e/ tokens decreases; it may shift from 250 to, say, 235 Hz. The modal produced F1 will also shift, but not by 15 Hz, because the articulatory constraints do not shift; it may shift to, say, 290 Hz. Within one generation, therefore, the modal F1 for /i/ will rise from 280 to 290 Hz, and in a couple of generations more it will be very close to 300 Hz. An analogous thing will happen if the environmental distribution has a modal F1 of 320 Hz: it will move to 300 Hz in three generations or so. Given a fixed ranking of the articulatory constraints, therefore, every language will reach the same equilibrium: 300 Hz is the only stable F1 value possible, as long as everything else remains equal to the case of Figure 9.4. Other possible explanations for cross-generational stability involve the various learning algorithms for production in the parallel phonological-phonetic model (Boersma 2005), but these are far outside the scope of the present paper.
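The convergence argument can be made concrete with a toy iteration. The sketch assumes that each generation removes a fixed share of the distance to the 300 Hz equilibrium; the share of one half is read off the single worked example above (280 Hz to 290 Hz) and is not a result of the simulation. The function names are hypothetical.

```python
def next_generation(modal_f1, equilibrium=300.0, retention=0.5):
    """Modal F1 produced by the next generation.

    ASSUMPTION: a linear contraction towards the equilibrium, keeping
    `retention` of the current distance (280 -> 290 gives 0.5)."""
    return equilibrium + retention * (modal_f1 - equilibrium)

def trajectory(start, generations):
    """Modal F1 of the starting generation and of each later one."""
    values = [start]
    for _ in range(generations):
        values.append(next_generation(values[-1]))
    return values
```

Under this assumption `trajectory(280, 3)` gives 280, 290, 295, 297.5 and `trajectory(320, 3)` gives 320, 310, 305, 302.5, so either starting point is within a few hertz of 300 Hz after three generations.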
9.4 Comparison with earlier accounts

Tableaus 9.2 and 9.3 automatically predict that, if the child is given enough time to learn even the rare overt forms, the best auditory form is one that is less likely to be perceived as /e/ than the modal F1 value for /i/ is, and that articulatory constraints lead to a higher F1 value in production. This can be explained within grammar models in which ArtF can influence AudF, because in such models the resulting AudF will be different according to whether an ArtF has to be evaluated (as in the phoneme production task) or not (as in the prototypicality task). Such models include the one exemplified in the present paper, namely Boersma’s (2005) parallel model of phonology and phonetics, where production is modelled as UF → {SF, AudF, ArtF}, but they also include Boersma’s (1998) earlier listener-oriented grammar model, where production is modelled as UF → (ArtF→AudF→SF). They do not include forward modular models of production of the type UF → SF → AudF → ArtF, because in such serial models articulatory restrictions cannot influence the auditory form.

The prototypicality proposal by Johnson et al. (1993) is an example of a serial production model. Presumably, their model would abbreviate the prototypicality task as SF → HyperOF, and the phoneme production task as SF → HyperOF → OF. Since the presence versus absence of the ‘later’ representation OF has no way of influencing the form of the ‘earlier’ representation HyperOF, this ‘hyperarticulated phonetic target’ has to contain a peripheral F1 value of 250 Hz that is independent from the experimental task. The authors
provide no conclusive independent evidence for the existence of such a representation, whereas the representations AudF and ArtF proposed in the present paper are independently needed to serve as the input to comprehension and the output of production.

The prototypicality proposal by Frieda et al. (2000) invokes an extra representation as well, namely the ‘prototype’. For the existence of this level of representation Kuhl (1991) gave some independent evidence, namely the ‘perceptual magnet’ effect. However, this effect can be explained without invoking prototypes. This was first shown for lexically labelled exemplars by Lacerda (1995), and even in models of pure distributional learning without lexical labels, perceptual warping automatically emerges as an epiphenomenon, as has been shown for neural maps by Guenther and Gjaja (1996) and for Optimality Theory by Boersma et al. (2003). With Occam’s razor, explanations without these poorly supported prototypes have to be preferred.

The prototypicality proposal by Lacerda (1997) derives goodness judgements from the activations of categories in an exemplar model of phonology. However, the auditory token that generates the highest activation for /i/ is still the modal auditory form; the best prototype is only derived indirectly as the auditory form that has the highest activation for /i/ relative to its activation for other vowel categories. This proposal thus does choose the least confusable auditory form, but does not provide an automatic underlying mechanism such as the one that necessarily follows from the task in Figure 9.3 (middle).

Finally, the results derived in this paper could equally well have been derived by formalizing the grammar and task models in Figures 9.1 to 9.3 within a framework that does not rely on constraint ranking but on constraint weight addition, such as Harmonic Grammar (Legendre et al. 1990a, 1990b).
9.5 Conclusion

The present paper has offered formal explanations for two facts, namely that a prototype (by being less confusable) is more peripheral than the modal auditory form in the listener’s language environment, and that a prototype (by not being limited by articulatory restrictions) is more peripheral than the modal auditory form that the listener herself will produce. Given the representation-and-constraints model of Figure 9.1, the only assumptions that led to these formal explanations were that representations are evaluated only if they are necessarily activated (Figure 9.3) and that in production processes (Figure 9.2, right; Figure 9.3, right) the output representations are evaluated
in parallel, so that ‘later’ representations can influence ‘earlier’ representations. The explanations provided here for a phonetic example may well extend to other areas of linguistics. The place where grammaticality judgements have most often been investigated is that of syntactic theory. One can imagine that the corpus frequency of constructions that are informed by speaker-based requirements at Phonetic Form is greater than would be expected on the basis of grammaticality judgements in a laboratory reading task, which may only activate Logical Form. This, however, is worth a separate investigation.
10 Modelling Productivity with the Gradual Learning Algorithm: The Problem of Accidentally Exceptionless Generalizations
ADAM ALBRIGHT AND BRUCE HAYES
10.1 Introduction

Many cases of gradient intuitions reflect conflicting patterns in the data that a child receives during language acquisition.1 An area in which learners frequently face conflicting data is inflectional morphology, where different words often follow different patterns. Thus, for English past tenses, we have wing ~ winged (the most common pattern in the language), wring ~ wrung (a widespread [ɪ] ~ [ʌ] pattern), and sing ~ sang (a less common [ɪ] ~ [æ] pattern). In cases where all of these patterns could apply, such as the novel verb spling, the conflict between them leads English speakers to entertain multiple possibilities, with competing outcomes falling along a gradient scale of intermediate well-formedness (Bybee and Moder 1983; Prasada and Pinker 1993; Albright and Hayes 2003).

1 For helpful comments and advice we would like to thank Paul Boersma, Junko Ito, Armin Mester, Jaye Padgett, Hubert Truckenbrodt, the editors, and our two reviewers, absolving them for any shortcomings.

In order to get a more precise means of investigating this kind of gradience, we have over the past few years developed and implemented a formal model for the acquisition of inflectional paradigms. An earlier version of our model is described in Albright and Hayes (2002), and its application to various empirical problems is laid out in Albright et al. (2001), Albright (2002), and Albright and Hayes (2003). Our model abstracts morphological and phonological generalizations from representative learning data and uses
them to construct a stochastic grammar that can generate multiple forms for novel stems like spling. The model is tested by comparing its ‘intuitions’, which are usually gradient, against human judgements for the same forms.

In modelling gradient productivity of morphological processes, we have focused on the reliability of the generalizations: how much of the input data do they cover, and how many exceptions do they involve? In general, greater productivity is correlated with greater reliability, while generalizations covering few forms or entailing many exceptions are relatively unproductive. For English past tenses, most generalizations have exceptions, so finding the productive patterns requires finding the generalizations with the fewest exceptions. Intermediate degrees of well-formedness arise when the generalizations covering different patterns suffer from different numbers of exceptions.

The phenomenon of gradient well-formedness shows that speakers do not require rules or constraints to be exceptionless; when the evidence conflicts, they are willing to use less than perfect generalizations. One would expect, however, that when gradience is observed, more reliable generalizations should be favoured over less reliable ones. In this article, we show that, surprisingly, this is not always the case. In particular, we find that there may exist generalizations that are exceptionless and well-instantiated, but are nonetheless either completely invalid, or are valued below other, less reliable generalizations. The existence of exceptionless, but unproductive patterns is a challenge for current approaches to gradient productivity, which generally attempt to extend patterns in proportion to their strength in the lexicon. We offer a solution for one class of these problems, based on the optimality-theoretic principle of constraint conflict and employing the Gradual Learning Algorithm (Boersma 1997; Boersma and Hayes 2001). In the final section of the paper we return to our earlier work on gradience and discuss the implications of our present findings.
10.2 Navajo sibilant harmony

The problem of exceptionless but unproductive generalizations arose in our efforts to extend our model to learn non-local rule environments. The first example we discuss comes from sibilant harmony in Navajo, a process described in Sapir and Hoijer (1967). Sibilant harmony can be illustrated by examining the allomorphs of the s-perfective prefix. This prefix is realized as shown in (10.1) (examples from Sapir and Hoijer):2

2 We have rendered all transcriptions (including Sapir and Hoijer’s) in near-IPA, except that we use [č čh č’ š ž] for [tʃ tʃh tʃ’ ʃ ʒ] in order to depict the class of nonanterior sibilants more saliently.
(10.1) a. [šì-] if the first segment of the stem is a [–anterior] sibilant ([č, č’, čh, š, ž]), for example in [šì-čhìt] ‘he is stooping over’
       b. [šì-] or [sì-] if somewhere later in the stem is a [–anterior] sibilant, as in [šì-thé:ž] ~ [sì-thé:ž] ‘they two are lying’ (free variation)
       c. [sì-] otherwise, as in [sì-thK2] ‘he is lying’3
A fully realistic simulation of the acquisition of Navajo sibilant harmony would require a large corpus of Navajo verb stems, along with their s-perfectives. Lacking such a corpus, we performed idealized simulations using an artificial language based on Navajo: we selected whole Navajo words (rather than stems) at random from the electronic version of Young et al.’s dictionary (1992), and constructed s-perfective forms for them by attaching [šì-] or [sì-] according to the pattern described in (10.1).
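The construction of the artificial training data can be sketched as follows. The sketch simply restates pattern (10.1) as a function over segmented words; it is not the authors’ script. Segments are written in an ASCII transliteration chosen for this illustration ("ch", "ch'", "chh", "sh", "zh" for the nonanterior sibilants, tone and nasality omitted), and all names are hypothetical.

```python
# ASCII stand-ins for the [-anterior] sibilants of (10.1a).
NONANTERIOR_SIBILANTS = {"ch", "ch'", "chh", "sh", "zh"}
SH_PREFIX = "shi-"   # stands for the harmonized prefix
S_PREFIX = "si-"     # stands for the plain prefix

def s_perfective_prefixes(word):
    """Prefix variants licensed by pattern (10.1) for a non-empty list of segments."""
    if word[0] in NONANTERIOR_SIBILANTS:
        return [SH_PREFIX]                       # (10.1a): obligatory harmony
    if any(seg in NONANTERIOR_SIBILANTS for seg in word[1:]):
        return [SH_PREFIX, S_PREFIX]             # (10.1b): free variation
    return [S_PREFIX]                            # (10.1c): elsewhere

def inflect(word):
    """All s-perfective forms for a word, one string per variant."""
    return [prefix + "".join(word) for prefix in s_perfective_prefixes(word)]
```

For example, `inflect(["k", "a", "n"])` yields one form with the plain prefix, while `inflect(["th", "a", "sh"])` yields two variants, mirroring the treatment of free variation in the learning data.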
10.3 The learning model

Our learning system employs some basic assumptions about representations and rule schemata. We assume that words are represented as sequences of phonemes, each consisting of a bundle of features, as in Chomsky and Halle (1968). Rules and constraints employ feature matrices that describe natural classes, as well as variables permitting the expression of non-local environments: ([+F]) designates a single skippable segment of the type [+F], while ([+F])* designates any number of skippable [+F] segments. Thus, the environment in (10.2):

(10.2) / ___ ([+seg])* [–anterior]

can be read ‘where a non-anterior segment follows somewhere later in the word’ ([+seg] denotes the entire class of segments).

The model is given a list of pairs, consisting of bases and inflected forms. For our synthetic version of Navajo, such a list would be partially represented by (10.3):

(10.3) a. [pà:?]       [sì-pà:?]
       b. [č’ìæ]       [šì-č’ìæ]
       c. [čhò:jìn]    [šì-čhò:jìn]
       d. [kàn]        [sì-kàn]
       e. [k’àz]       [sì-k’àz]
       f. [khéškã̀:]    [šì-khéškã̀:], [sì-khéškã̀:]
       g. [sí:æ]       [sì-sí:æ]
       h. [thã̀š]       [šì-thã̀š], [sì-thã̀š]
       i. [thó:?]      [sì-thó:?]
       j. [t¸é:ž]      [šì-t¸é:ž], [sì-t¸é:ž]

3 Sapir and Hoijer specifically say (1967: 14–15): ‘Assimilation nearly always occurs when the two consonants are close together (e.g. šì-čà:?, from sì-čà:? ‘‘a mass lies’’; . . . but it occurs less often when the two consonants are at a greater distance.’
Where free variation occurs, the learner is provided with one copy of each variant; thus, for (10.3f) both [khéškã̀:] [šì-khéškã̀:] and [khéškã̀:] [sì-khéškã̀:] are provided. The goal of learning is to determine which environments require [sì-], which require [šì-], and which allow both. Learning involves generalizing bottom-up from the lexicon, using a procedure described below. Generalization creates a large number of candidate environments; an evaluation metric is later employed to determine how these environments should be employed in the final grammar.
(10.4) I. PREFIX [sì-]
          a. [pà:?]       [sì-pà:?]
          b. [kàn]        [sì-kàn]
          c. [khéškã̀:]    [sì-khéškã̀:]    (boxed)
          d. [t¸é:ž]      [sì-t¸é:ž]      (boxed)
          e. [thã̀š]       [sì-thã̀š]       (boxed)
          f. [k’àz]       [sì-k’àz]
          g. [sí:æ]       [sì-sí:æ]
          h. [thó:?]      [sì-thó:?]
       II. PREFIX [šì-]
          a. [čhò:jìn]    [šì-čhò:jìn]
          b. [č’ìæ]       [šì-č’ìæ]
          c. [khéškã̀:]    [šì-khéškã̀:]    (boxed)
          d. [t¸é:ž]      [šì-t¸é:ž]      (boxed)
          e. [thã̀š]       [šì-thã̀š]       (boxed)
Learning begins by parsing forms into their component morphemes and grouping them by the morphological change they involve. The data in (10.3) exhibit two changes, as shown in (10.4); the box surrounds cases of free variation. For each change, the system creates hypotheses about which elements in the environment crucially condition the change. It begins by treating each pair as a ‘word-specific rule’, separating out the changing part from the invariant part. Thus, the first three [šì-] forms in (10.4) would be construed as in (10.5):

(10.5) a. Ø → šì / [ ___ čhò:jìn]
       b. Ø → šì / [ ___ č’ìæ]
       c. Ø → šì / [ ___ khéškã̀:]

Next, the system compares pairs of rules that have the same change (e.g. both attach [šì-]), and extracts what their environments have in common to form a generalized rule. Thus, given the word-specific rules in (10.6a), the system collapses them together using features, as in (10.6b).
(10.6) a. Ø → šì / [ ___ thã̀š]
          Ø → šì / [ ___ t¸é:ž]
       b.   Ø → šì / [ ___ th ã̀ š ]
          + Ø → šì / [ ___ t¸ é: ž ]
          = Ø → šì / [ ___ [−sonorant, −continuant, +anterior] [+syllabic, −high, −round] [−sonorant, +continuant, −anterior, +strident] ]
In this particular case, the forms being compared are quite similar, so determining which segment should be compared with which is unproblematic. But for forms of different lengths, such as [čhò:jìn] and [č’ìæ] above, this is a harder question.4 We adopt an approach that lines up the segments that are most similar to each other. For instance, (10.7) gives an intuitively reasonable alignment for [čhò:jìn] and [č’ìæ]:

(10.7) [čh ò: j ì n] aligned with [č’ ì æ] [Alignment diagram not reproduced; three association lines, one for each segment of [č’ìæ].]

4 The issue did not arise in an earlier version of our model (Albright and Hayes 2002), which did not aspire to learn non-local environments, and thus could use simple edge-in alignment.
Good alignments have two properties: they match phonetically similar segments such as [čh] and [č’], and they avoid leaving too many segments unmatched. To evaluate the similarity of segments, we employ the similarity metric from Frisch et al. (2004). To guarantee an optimal pairing, we use a cost-minimizing string alignment algorithm (described in Kruskal 1999) that efficiently searches all possible alignments for best total similarity.

Seen in detail, the process of collapsing rules is based on three principles, illustrated in (10.8) with the collapsing of the rules Ø → šì / [ ___ khéškã̀:] and Ø → šì / [ ___ thã̀š].
(10.8)
∅ → šì / [ kh é š kã: ]
+ ∅ → šì / [ th ã š ]
1. Shared material is collapsed using the feature system.
2. Unmatched material is designated as optional, notated with parentheses.
∅ → šì / [ [−sonorant, −contin, +spread gl., −constr. gl.] [+syllabic, −high, −round] š (k) (ã:) ]
3. Sequential optional elements are collapsed into a single variable, encompassing all of their shared features (e.g. ([+F])*).
= ∅ → šì / [ [−sonorant, −contin, +spread gl., −constr. gl.] [+syllabic, −high, −round] š ([+seg])* ]
Paired feature matrices are collapsed by constructing a new matrix containing all of their shared features (step 1). Next, any material in one rule that is unmatched in the other is designated as optional, represented by parentheses (step 2). Finally, sequences of consecutive optional elements are collapsed together into a single expression of the form (F)*, where F is the smallest natural class containing all of the collapsed optional elements (step 3). The process is iterated, generalizing the new rule with the other words in the learning data; the resulting rules are further generalized, and so on. Due to memory limitations, it is necessary periodically to trim back the hypothesis set, keeping only those rules that perform best.5 Generalization terminates when no new ‘keeper’ rules are found. We find that this procedure, applied to a representative set of words, discovers the environment of non-local sibilant harmony after only a few steps. One path to the correct environment is shown in (10.9):
(10.9)
[čh ò: j ì n] + [č’ ì æ]
→ [−continuant, −anterior] ([+son, −cons, −nasal])* ì [−syllabic, +anterior]
+ [čh ì t í]
→ [−continuant, −anterior] ([+seg])*
+ [ž ì:]
→ [−anterior] ([+seg])*
+ [t í w ó ž ì: p á h í]
→ ([+seg])* [−anterior] ([+seg])*
Resulting rule: ∅ → šì- / [ ([+seg])* [−anterior] ([+seg])* ]
The result can be read: ‘Prefix [šì-] to a stem consisting of any number of segments followed by a nonanterior segment, followed by any number of segments.’ (Note that [−anterior] segments in Navajo are necessarily sibilant.) In more standard notation, one could replace ([+seg])* with a free variable X, and follow the standard assumption that non-adjacency to the distal word edge need not be specified, as in (10.10):
(10.10) ∅ → šì- / ___ X [−anterior]
We emphasize that at this stage, the system is only generating hypotheses. The task of using these hypotheses to construct the final grammar is taken up in Section 10.5.
5 Specifically: (a) for each word in the training set, we keep the most reliable rule (in the sense of Albright and Hayes 2002) that derives it; (b) for each change, we keep the rule that derives more forms than any other.
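The three collapsing principles can be sketched in a few lines. This is a minimal illustration under stated assumptions: the feature sets are toy values chosen for the example, the alignment is supplied by hand, and the "smallest natural class" of step 3 is approximated by the plain intersection of feature sets (an empty intersection is displayed as [+seg]).

```python
# Toy feature sets, invented for illustration only.
FEATURES = {
    "kh": {"-sonorant", "-contin", "+spread gl.", "-constr. gl.", "+dorsal"},
    "th": {"-sonorant", "-contin", "+spread gl.", "-constr. gl.", "+anterior"},
    "é": {"+syllabic", "-high", "-round", "-nasal"},
    "ã": {"+syllabic", "-high", "-round", "+nasal"},
    "š": {"-sonorant", "+contin", "-anterior", "+strident"},
    "k": {"-sonorant", "-contin", "+dorsal"},
    "ã:": {"+syllabic", "-high", "-round", "+nasal", "+long"},
}


def collapse(pairs):
    """Collapse two aligned environments into one generalized environment.

    pairs: list of (x, y) segment pairs; None marks an unmatched segment.
    Returns a list of (kind, features): "seg" is an obligatory position,
    "opt" a single optional element, "opt*" a starred optional variable."""
    out = []
    for x, y in pairs:
        if x is not None and y is not None:
            # Step 1: keep only the features the paired segments share.
            out.append(("seg", frozenset(FEATURES[x] & FEATURES[y])))
            continue
        # Step 2: unmatched material becomes optional.
        feats = frozenset(FEATURES[x if x is not None else y])
        if out and out[-1][0] in ("opt", "opt*"):
            # Step 3: merge consecutive optional elements into one variable
            # carrying only their shared features.
            out[-1] = ("opt*", out[-1][1] & feats)
        else:
            out.append(("opt", feats))
    return out


def show(env):
    """Render an environment in roughly the notation of (10.8)."""
    parts = []
    for kind, feats in env:
        body = "[" + ", ".join(sorted(feats)) + "]" if feats else "[+seg]"
        if kind != "seg":
            body = "(" + body + ")" + ("*" if kind == "opt*" else "")
        parts.append(body)
    return " ".join(parts)


# Hand-supplied alignment of [khéškã:] with [thãš], as in (10.8).
pairs = [("kh", "th"), ("é", "ã"), ("š", "š"), ("k", None), ("ã:", None)]
env = collapse(pairs)
print(show(env))
```

Under these toy features the result has three obligatory positions followed by ([+seg])*, mirroring the last line of (10.8). Iterating `collapse` over further words, with real feature matrices and automatic alignment, would be needed to approximate the full procedure.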
10.4 Testing the approach: a simulation
We now show that, given representative learning data, the system just described can discover the rule environments needed for Navajo sibilant harmony. As noted above, our learning simulation involved artificial Navajo s-perfectives, created by attaching appropriate prefix allomorphs to whole Navajo words (as opposed to stems). Selecting 200 words at random,6 we attached prefixes to the bases as follows, following Sapir and Hoijer’s characterization: (a) if the base began with a nonanterior sibilant, we prefixed [šì-] (there were nineteen of these in the learning set); (b) if the base contained but did not begin with a nonanterior sibilant, we made two copies, one prefixed with [šì-], the other with [sì-] (thirty-seven of each); (c) we prefixed [sì-] to the remaining 144 bases. Running the algorithm just described, we found that among the ninety-two environments it learned, three were of particular interest: the environment for obligatory local harmony (10.11a); the environment that licenses distal harmony (10.11b; note that this includes local harmony as a special case); and the vacuous ‘environment’ specifying the default allomorph [sì-] (10.11c).
(10.11) a. Obligatory local harmony: ∅ → [šì-] / ___ [−anterior]
b. Optional distal harmony (= (10.10)): ∅ → [šì-] / ___ X [−anterior]
c. Default [sì-]: ∅ → [sì-] / ___ X
The remaining eighty-nine environments are discussed below.
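The construction of the pseudo-data can be sketched as follows. The function name, the simplified list of nonanterior sibilants, and the base [táš] are assumptions made for this illustration ([šáp] and [táp] are the hypothetical roots of (10.12)); the counts are those reported in the text.

```python
# Assumed, simplified inventory of nonanterior sibilants for this sketch
# ("čh" and "č’" are covered because they begin with "č").
NONANTERIOR = ("š", "ž", "č")


def s_perfectives(base):
    """Return the artificial s-perfective form(s) for one base."""
    if base.startswith(NONANTERIOR):
        return ["šì-" + base]  # (a) obligatory local harmony
    if any(seg in base for seg in NONANTERIOR):
        return ["šì-" + base, "sì-" + base]  # (b) optional distal harmony
    return ["sì-" + base]  # (c) default


# Counts reported for the 200-word training sample.
bases = {"initial nonanterior": 19, "non-initial nonanterior": 37, "none": 144}
shi_forms = bases["initial nonanterior"] + bases["non-initial nonanterior"]
si_forms = bases["non-initial nonanterior"] + bases["none"]

print(s_perfectives("šáp"), s_perfectives("táš"), s_perfectives("táp"))
print(sum(bases.values()), shi_forms, si_forms, shi_forms + si_forms)
```

The arithmetic gives 200 bases, 56 [šì-] forms, 181 [sì-] forms, and 237 forms in all.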
10.5 Forming a grammar
These environments can be incorporated into an effective grammar by treating them not as rules, as just given, but rather as optimality-theoretic constraints of morphology (Boersma 1998b; Russell 1999; Burzio 2002; MacBride 2004). In this approach, rule (10.11a) is reconstrued as a constraint: ‘Use [šì-] / ___ [−anterior] to form the s-perfective.’ This constraint is violated by forms that begin with a nonanterior segment, but use something other than [šì-] to form the s-perfective. The basic idea is illustrated below, using hypothetical monosyllabic roots:
(10.12)
Morphological base | Candidates that obey Use [šì-] / ___ [−anterior] | Candidates that violate Use [šì-] / ___ [−anterior]
[šáp] | [šì-šáp] | *[sì-šáp], *[mù-šáp], etc.
[táp] | all | none
6 From the entire database of 3,023 words, we selected 2,000 words at random, dividing this set into ten batches of 200 words each. To confirm the generality of our result, we repeated our simulation on each 200-word training sample. Due to space considerations, we report here the results of only one of the ten trials; the remaining nine were essentially the same in that they all discovered the environments in (10.11). The primary difference between trials was the precise formulation of the other, unwanted constraints (Section 10.6), but in every case, such constraints were correctly ranked below the crucial constraints, as in (10.13).
It is straightforward to rank these constraints in a way that yields the target pattern, as (10.13) and (10.14) show:
(10.13) USE [šì-] / ___ [−ant] >> { USE [sì-] / ___ X, USE [šì-] / ___ X [−ant] } >> all others
(the two constraints in braces are ranked in free variation)
(10.14) a.
/sì-čìd/ | USE [šì-] / ___ [−ant] | USE [šì-] / ___ X [−ant] | USE [sì-] / ___ X
☞ šì-čìd | | | *
sì-čìd | *! | * |
b.
/sì-té:ž/ | USE [šì-] / ___ [−ant] | USE [šì-] / ___ X [−ant] | USE [sì-] / ___ X
☞ šì-té:ž | | | *
☞ sì-té:ž | | * |
For (10.14b), the free ranking of Use [šì-] / ___ X [−ant] and Use [sì-] / ___ X produces multiple winners generated in free variation (Anttila 1997).
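The way a freely ranked pair yields several winners can be illustrated by enumerating every total ranking compatible with (10.13) and collecting the optimal candidate under each. This is a sketch only: the constraint functions, the simplified sibilant inventory, and the base [táš] are assumptions for the example, and only the three constraints of (10.11) are included.

```python
from itertools import permutations, product

NONANTERIOR = ("š", "ž", "č")  # assumed, simplified inventory
PREFIXES = ("šì-", "sì-")


def use_shi_local(prefix, base):
    """Use [šì-] / ___ [-ant]: one violation if a base beginning with a
    nonanterior segment takes any prefix other than šì-."""
    return int(base.startswith(NONANTERIOR) and prefix != "šì-")


def use_shi_distal(prefix, base):
    """Use [šì-] / ___ X [-ant]: same, for a nonanterior segment anywhere."""
    return int(any(seg in base for seg in NONANTERIOR) and prefix != "šì-")


def use_si_default(prefix, base):
    """Use [sì-] / ___ X: one violation for any prefix other than sì-."""
    return int(prefix != "sì-")


# Strata of (10.13): constraints inside one stratum are freely ranked.
STRATA = [[use_shi_local], [use_si_default, use_shi_distal]]


def winners(base):
    """Collect the optimal outputs over all rankings compatible with STRATA."""
    found = set()
    for choice in product(*(permutations(stratum) for stratum in STRATA)):
        ranking = [constraint for stratum in choice for constraint in stratum]
        # Strict domination: compare violation profiles lexicographically.
        best = min(PREFIXES, key=lambda p: [c(p, base) for c in ranking])
        found.add(best + base)
    return found


print(winners("šáp"), winners("táš"), winners("táp"))
```

A base with an initial nonanterior sibilant gets only the [šì-] form, a base with a non-initial one gets both forms, and any other base gets only [sì-].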
10.6 Unwanted constraints
The eighty-nine constraints not discussed so far consist largely of complicated generalizations that happen to hold true of the learning data. One example is shown in (10.15):
(10.15) Use sì- / ___ ([−round])* [+anterior, +continuant] ([−consonantal])* ]
As it happens, this constraint works for all thirty-seven forms that meet its description in the learning data. However, it makes profoundly incorrect predictions for forms outside the learning data, such as the legal but nonexisting stem /čálá/ (10.16).
(10.16) USE sì- / ___ ([−round])* [+anterior, +continuant] ([−consonantal])* ]
Matched against sì-čálá: ([−round])* = č á; [+anterior, +continuant] = l; ([−consonantal])* = á
If ranked high enough, this constraint will have the detrimental effect of preventing [šì-čálá] from being generated consistently. We will call such inappropriate generalizations ‘junk’ constraints. One possible response is to say that the learning method is simply too liberal, allowing too many generalizations to be projected from the learning data. We acknowledge this as a possibility, and we have experimented with various ways to restrict the algorithm to more sensible generalizations. Yet we are attracted to the idea that constraint learning could be simplified, and rely on fewer a priori assumptions, by letting constraints be generated rather freely and excluding the bad ones with an effective evaluation metric. Below, we lay out such a metric, which employs the Gradual Learning Algorithm.7
10.7 The Gradual Learning Algorithm
The Gradual Learning Algorithm (GLA: Boersma 1997; Boersma and Hayes 2001) can rank constraints in a way that derives free variation and matches the frequencies of the learning data. The GLA assumes a stochastic version of optimality theory, whereby each pair of constraints {A, B} is assigned not a strict ranking, but rather a probability: ‘A dominates B with probability P.’ Thus, the free ranking given in (10.13) above would be captured by assigning the constraints Use [sì-] / ___ X and Use [šì-] / ___ X [−ant] a 50–50 ranking probability. Any such theory needs a method to ensure that the pairwise probabilities assigned to the constraints are mutually consistent. In the GLA, this is done by arranging the constraints along a numerical scale, assigning each constraint a ranking value. On any particular occasion that the grammar is used, a selection point is adopted for each constraint, taken from a Gaussian probability distribution centred on that constraint’s ranking value, with a standard deviation fixed for all constraints. The constraints are sorted by their selection points, and the winning candidate is determined on the basis of this ranking. Since pairwise ranking probabilities are determined by the ranking values,8 they are guaranteed to be mutually consistent.
7 A reviewer points out that another approach to weeding out unwanted generalizations is to train the model on various subsets of the data, keeping only those generalizations that are found in all training sets (cross-validation). Although this technique could potentially eliminate unwanted generalizations (since each subset contains a potentially different set of such generalizations), it could not absolutely guarantee that they would not be discovered independently in each subset. Given that such constraints make fatal empirical predictions, we seek a technique that reliably demotes them, should they arise.
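Stochastic evaluation of this kind can be sketched briefly. The noise standard deviation of 2.0 is an assumption (the text says only that it is fixed for all constraints); it is chosen because it is consistent with the figure quoted in Section 10.11, where a ranking-value difference of about 10.5 gives a ranking probability of .9999. The constraint labels and ranking values are taken from (10.21).

```python
import math
import random

NOISE_SD = 2.0  # ASSUMPTION: the chapter does not state the value


def pairwise_probability(value_a, value_b, sd=NOISE_SD):
    """Probability that A's selection point exceeds B's.

    The difference of two independent Gaussian selection points is Gaussian
    with mean (value_a - value_b) and standard deviation sd * sqrt(2)."""
    z = (value_a - value_b) / (sd * math.sqrt(2))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))


def sample_ranking(ranking_values, rng, sd=NOISE_SD):
    """Draw one selection point per constraint and sort, highest first."""
    points = {name: rng.gauss(value, sd) for name, value in ranking_values.items()}
    return sorted(points, key=points.get, reverse=True)


# Final ranking values reported in (10.21); the keys are shorthand labels.
final = {"local": 514.9, "default": 500.1, "distal": 499.9, "junk": 19.2}

print(round(pairwise_probability(final["default"], final["distal"]), 3))
print(round(pairwise_probability(10.5, 0.0), 4))
print(sample_ranking(final, random.Random(0)))
```

Because every pairwise probability is computed from the same scale of ranking values, the probabilities cannot contradict one another, which is the consistency property described above.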
10.8 The need for generality
Let us now consider the application of the GLA to Navajo. Naively, one might hope that when the constraints are submitted to the GLA, the junk will settle to the bottom. However, what one actually finds is that the junk constraints get ranked high. Although Use [šì-] / ___ [−ant] does indeed get ranked on top, the crucial constraints Use [šì-] / ___ X [−ant] and Use [sì-] / ___ X are swamped by higher-ranking junk constraints, and rendered largely ineffective. The result is a grammar that performs quite well on the training data (producing something close to the right output frequencies for every stem), but fails grossly in generating novel forms. The frequencies generated for novel forms are determined by the number of high-ranking junk constraints that happen to fit them, and do not respect the distribution in (10.11).
The problem is a classic one in inductive learning theory. If a learning algorithm excessively tailors its behaviour to the training set, it may learn a patchwork of small generalizations that collectively cover the learning data. This does not suffice to cover new forms, which, after all, is the main purpose of having a grammar in the first place!
Why does the GLA fail? The reason is that it demotes constraints only when they prefer losing candidates. But within the learning data, the junk constraints generally prefer only winners; that is precisely why they emerged from the inductive generalization phase of learning. Accidentally true generalizations thus defeat the GLA as it currently stands. What is needed is a way for the GLA to distinguish accidentally true generalizations from linguistically significant generalizations.
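The failure can be reproduced in miniature. The sketch below is a toy error-driven learner in the spirit of the GLA, not the authors' implementation: the symmetric promote/demote update, the plasticity of 0.1, the noise of 2.0, the training bases [páš] and [máp], and the ‘junk’ constraint (‘use sì- before t-initial bases’) are all assumptions made for the illustration.

```python
import random

NONANTERIOR = ("š", "ž", "č")  # assumed, simplified inventory
PREFIXES = ("šì-", "sì-")

# Each constraint maps (prefix, base) to a number of violations.
CONSTRAINTS = {
    "local": lambda p, b: int(b.startswith(NONANTERIOR) and p != "šì-"),
    "distal": lambda p, b: int(any(s in b for s in NONANTERIOR) and p != "šì-"),
    "default": lambda p, b: int(p != "sì-"),
    # Hypothetical accidentally true constraint: no t-initial training base
    # happens to contain a nonanterior sibilant.
    "junk": lambda p, b: int(b.startswith("t") and p != "sì-"),
}


def output(values, base, rng, sd=2.0):
    """Produce one output by stochastic evaluation (assumed noise sd = 2.0)."""
    points = {c: rng.gauss(v, sd) for c, v in values.items()}
    order = sorted(points, key=points.get, reverse=True)
    return min(PREFIXES, key=lambda p: [CONSTRAINTS[c](p, base) for c in order])


def train(data, cycles=20000, plasticity=0.1, seed=0):
    """Error-driven learning with all ranking values starting at 100."""
    rng = random.Random(seed)
    values = {c: 100.0 for c in CONSTRAINTS}
    for _ in range(cycles):
        base, observed = rng.choice(data)
        guess = output(values, base, rng)
        if guess == observed:
            continue  # no error, no adjustment
        for c, violations in CONSTRAINTS.items():
            # Promote constraints violated only by the wrong guess; demote
            # constraints violated only by the observed form.
            values[c] += plasticity * (violations(guess, base) - violations(observed, base))
    return values


data = [("šáp", "šì-"), ("páš", "šì-"), ("páš", "sì-"), ("táp", "sì-"), ("máp", "sì-")]
values = train(data)
print({c: round(v, 1) for c, v in values.items()})
```

In this toy run the junk constraint never prefers a losing candidate, so its ranking value is never adjusted and it stays level with the freely ranked pair, where it can distort the outcome for a novel base such as the hypothetical [táš].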
10.9 Initial rankings based on generality
Boersma (1998) suggested that for morphology, initial rankings should be based on generality: the more general the constraint, the higher it is ranked before learning takes place. It turns out that this insight is the key to solving the Navajo problem. What is needed, however, is a way to characterize generality in numerical terms. There are various possible approaches; Chomsky and Halle (1968), for example, propose counting symbols (fewer symbols = greater generality). Here, we adopt an empirical criterion: a morphological constraint is maximally general if it encompasses all of the forms that exhibit its structural change. We use the fraction in (10.17):
(10.17) $\dfrac{\text{number of forms that a constraint applies to}}{\text{total number of forms exhibiting the change that the constraint requires}}$
In the 200-word Navajo simulation discussed above, some representative generality values are shown in (10.18).
8 A spreadsheet giving the function that maps ranking value differences to pairwise probabilities is posted at http://www.linguistics.ucla.edu/people/hayes/GLA/.
(10.18)
Constraint | Relevant forms | Forms with this change | Generality
Use [šì-] / ___ [−anterior] | 19 | 56 [šì-] forms | .339
Use [šì-] / ___ X [−anterior] | 56 | 56 [šì-] forms | 1
Use [sì-] / ___ X | 181 | 181 [sì-] forms | 1
Constraint (10.15) (‘junk’ constraint) | 37 | 181 [sì-] forms | .204
The idea, then, is to assign the constraints initial ranking values that reflect their generality, with more general constraints on top. If the scheme works, all the data will be explained by the most general applicable constraints, and the others will remain so low that they never play a role in selecting output forms. In order to ensure that differences in initial rankings are large enough to make a difference, the generality values from (10.17) were rescaled to cover a huge probability range, using the formula in (10.19):
(10.19) For each constraint c,
$$\text{initial ranking value}_c = \frac{\text{Generality}_c - \text{Generality}_{\min}}{\text{Generality}_{\max} - \text{Generality}_{\min}} \times 500$$
where Generality_min is the generality of the least general constraint, and Generality_max is the generality of the most general constraint.
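The generality fraction (10.17) and the rescaling (10.19) can be worked through with the counts in (10.18). One input is not reported in the text: the generality of the least general constraint. The value 3/56 used below is purely an assumption, chosen because it happens to reproduce the initial ranking values of 150.9 and 79.7 reported for the simulation; the function names are likewise illustrative.

```python
from fractions import Fraction


def generality(relevant_forms, forms_with_change):
    """The fraction in (10.17)."""
    return Fraction(relevant_forms, forms_with_change)


def initial_rankings(generalities, top=500):
    """Rescale generality values to the range 0..top, as in (10.19)."""
    lowest = min(generalities.values())
    highest = max(generalities.values())
    return {
        name: float((g - lowest) / (highest - lowest) * top)
        for name, g in generalities.items()
    }


generalities = {
    "Use [šì-] / ___ [-anterior]": generality(19, 56),
    "Use [šì-] / ___ X [-anterior]": generality(56, 56),
    "Use [sì-] / ___ X": generality(181, 181),
    "junk constraint (10.15)": generality(37, 181),
    # ASSUMPTION: hypothetical least general constraint, not from the text.
    "least general (assumed)": generality(3, 56),
}
ranks = initial_rankings(generalities)

for name, g in generalities.items():
    print(f"{name}: generality {float(g):.3f}, initial ranking {ranks[name]:.1f}")
```

With a different minimum the two maximally general constraints would still start at 500; only the spacing of the less general constraints below them would change.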
10.10 Employing generality in a learning simulation
We implemented this scheme and ran it multiple times on the Navajo pseudo-data described above. For one representative run, it caused the relevant constraints (including here just one representative ‘junk’ constraint, (10.15)) to be ranked as follows:
(10.20) Generality, initial ranking, and final ranking (after 100,000 training cycles)
Constraint | Generality | Initial ranking | Final ranking
USE [šì-] / ___ X [−ant] | 1 | 500 | 499.9
USE [sì-] / ___ X | 1 | 500 | 500.1
USE [šì-] / ___ [−ant] | .339 | 150.9 | 514.9
‘Junk’ constraint (10.15) | .204 | 79.7 | 19.2
The final grammar is depicted schematically in (10.21), where the arrows show the probabilities that one constraint will outrank the other. When the difference in ranking value exceeds about 10, the probability that the ranking will hold is essentially 1 (strict ranking).
(10.21)
USE [šì-] / ___ [−ant], 514.9 (undominated local harmony)
↓ outranks, with probability 1
USE [sì-] / ___ X, 500.1, and USE [šì-] / ___ X [−anterior], 499.9, mutually ranked with probability .5 (free variation for non-local harmony)
↓ outranks, with probability 1
USE sì- / ___ ([−round])* [+anterior, +continuant] ([−consonantal])* ], 19.2 (potentially harmful constraints like (10.15) safely outranked)
This approach yields the desired grammar: all of the junk constraints (not just (10.15)) are ranked safely below the top three. The procedure works because the GLA is error-driven. Thus, if junk constraints start low, they stay there, since the general constraint that does the same work has a head start and averts any errors that would promote the junk constraints. Good constraints with specific contexts, on the other hand, like ‘Use [šì-] / ___ [−ant]’, are also nongeneral, but appropriately so. Even if they start low, they are crucial in averting errors like *[sì-šáp], and thus they are soon promoted by the GLA to the top of the grammar. We find, then, that a preference for more general statements in grammar induction is not merely an aesthetic bias; it is, in fact, a necessary criterion in distinguishing plausible hypotheses from those which are implausible, but coincidentally hold true in the learning sample.
10.11 Analytic discussion
While the Navajo simulation offers a degree of realism in the complexity of the constraints learned, hand analysis of simpler cases helps in understanding why the simulation worked, and ensures that the result is a general one. To this end, we reduce Navajo to three constraints, renamed as follows: (1) Use [sì-], which we will call Default, (2) the special-context Use [šì-] / ___ X [−ant], which we will call Contextual [šì-], and (3) the accidentally exceptionless (10.15), which we will call Accidental [sì-]. Accidental [sì-] is exceptionless because the relevant forms in the training data happen not to contain non-anterior sibilants.
Suppose first that all harmony is optional (50/50 variation). Using the normal GLA, all constraints start out with equal ranking values, set conventionally at 100. The constraints Contextual [šì-] and Default should be ranked in a tie to match the 50/50 variation. During learning (see Boersma and Hayes 2001: 51–4), these two constraints vacillate slightly as the GLA seeks a frequency match, but end up very close to their original value of 100. Accidental [sì-] will remain at exactly 100, since the GLA is error-driven and none of the three constraints favours an incorrect output for the training data that match Accidental [sì-] (Default and Accidental [sì-] both prefer [sì-], which is correct; and Contextual [šì-] never matches these forms). Thus, all three constraints are ranked at or near 100. This grammar is incorrect; when faced with novel forms like (10.16) that match all three constraints, Contextual [šì-] competes against two equally ranked antagonists, deriving [šì-] only a third of the time instead of half.
Initial rankings based on generality (Section 10.9) correct this problem. Given that Default and Contextual [šì-] cover all [sì-] and [šì-] forms respectively, they will be assigned initial ranking values of 500. Define the critical distance C as the minimum difference between two constraints that is needed to model strict ranking. (Informal trials suggest that a value of about 10.5, which creates a ranking probability of .9999, is sufficient.) It is virtually certain that the initial ranking value for Accidental [sì-] will be far below 500 − C, because accidentally true constraints cannot have extremely high generality, other than through an unlikely accident of the learning data. Ranking proceeds as before, with Default and Contextual [šì-] staying around 500 and Accidental [sì-] staying where it began. The resulting grammar correctly derives 50/50 variation, because Accidental [sì-] is too low to be active.
Now consider what happens when the data involve no free variation; that is, [šì-] is the outcome wherever Contextual [šì-] is applicable. When initial rankings are all equal, [šì-] forms will cause Contextual [šì-] to rise and Default to fall, with their difference ultimately reaching C (Contextual [šì-]: 500 + C/2; Default: 500 − C/2). Just as before, Accidental [sì-] will remain ranked where it started, at 500. The difference of C/2 between Contextual [šì-] and Accidental [sì-], assuming C = 10.5, will be 5.25, which means that when the grammar is applied to novel forms matching both constraints, [sì-] outputs will be derived about 3 per cent of the time. This seems unacceptable, given that the target language has no free variation. Again, the incorrect result is avoided under the initial-ranking scheme of Section 10.9, provided that Accidental [sì-] is initially ranked at or lower than 500 − C/2, which is almost certain to be the case.
In summary, schematized analysis suggests that the Navajo result is not peculiar to this case. The effect of accidentally true generalizations is strongest when optionality is involved, but they pose a threat even in its absence. Initial rankings based on generality avoid the problem by keeping such constraints a critical distance lower than the default, so they can never affect the outcome.
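The figures in this analysis can be checked numerically. The sketch below estimates by simulation how often [šì-] is derived for a novel form that matches all three constraints. The evaluation noise of 2.0 is an assumption (it is the value under which a difference of 10.5 corresponds to a ranking probability of about .9999), and the low value of 80 for Accidental [sì-] in the second scenario is an arbitrary illustrative choice.

```python
import random


def share_of_shi(contextual, default, accidental, sd=2.0, trials=50_000, seed=1):
    """Estimate how often [šì-] wins for a novel form matching all three
    constraints: it wins only if Contextual [šì-] outranks both others."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        c = rng.gauss(contextual, sd)
        d = rng.gauss(default, sd)
        a = rng.gauss(accidental, sd)
        wins += (c > d) and (c > a)
    return wins / trials


# Optional harmony, equal initial rankings: all three end near 100.
equal_start = share_of_shi(100.0, 100.0, 100.0)
# Optional harmony, generality-based initial rankings: Accidental stays low.
generality_start = share_of_shi(500.0, 500.0, 80.0)
# Obligatory harmony, equal initial rankings, C = 10.5.
obligatory = share_of_shi(500.0 + 5.25, 500.0 - 5.25, 500.0)

print(round(equal_start, 3), round(generality_start, 3), round(1 - obligatory, 3))
```

The three estimates come out near one third, one half, and 3 per cent [sì-] outputs respectively, matching the hand analysis.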
10.12 The realism of the simulation
In this section we address two possible objections to our model.
10.12.1 Phonological rules versus allomorph distribution
Navajo sibilant harmony is typically analysed as a phonological process, spreading [−anterior] from right to left within a certain domain. The grammar learned by our model, on the other hand, treats harmony as allomorphy ([sì-] versus [šì-]), and cannot capture root-internal harmony effects. Thus, it may be objected that the model has missed the essential nature of harmony. In this connection, we note first that harmony is often observed primarily through affix allomorphy, either because there is no root-internal restriction, or because the effect is weaker within roots, admitting greater exceptionality. For these cases, allomorphy may be the only appropriate analysis. For arguments that root-internal and affixal harmony often require separate analyses, see Kiparsky (1968). More generally, however, there still remains the question of how to unify knowledge about allomorphy and root-internal phonotactics. Even when affixes and roots show the same harmony patterns, we believe that understanding the distribution of affix allomorphs could constitute an important first step in learning the more general process, provided there is some way of bootstrapping from constraints on particular morphemes to more general constraints on the distribution of speech sounds. We leave this as a problem for future work.
10.12.2 Should arbitrary constraints be generated at all?
Another possible objection is that a less powerful generalization procedure would never have posited constraints like (10.15) in the first place. Indeed, if all constraints come from universal grammar (that is, are innate), the need to trim back absurd ones would never arise. Against this objection can be cited work suggesting that environments sometimes really are complex and synchronically arbitrary (Bach and Harms 1972; Hale and Reiss 1998; Hayes 1999; Blevins 2004). For instance, in examining patterns of English past tenses, we found that all verbs ending in voiceless fricatives are regular, and that speakers are tacitly aware of this generalization (Albright and Hayes 2003). Not only are such patterns arbitrary, but they can also be rather complex (see also Bybee and Moder 1983). Regardless of whether such generalizations are learned or innate, it seems likely that any model powerful enough to handle the full range of attested patterns will need a mechanism to sift through large numbers of possibly irrelevant hypotheses.
10.13 Modelling gradient productivity: the fate of reliability metrics
As noted above, one of our long-term goals is to understand how gradient productivity arises when the learner confronts conflicting data. The results above challenge our earlier views, and in this section we lay out ways in which our previous approach might be revised.
Earlier versions of our model evaluated contexts according to their accuracy, or reliability, defined as the number of forms a rule derives correctly divided by the total number of forms to which the rule is applicable. We have found in many cases that we could model native speaker intuitions of novel forms by using the reliability of the best rule that derives them (adjusted slightly, in a way to be mentioned below). However, the results of our Navajo simulations show that accuracy alone is not an adequate criterion for evaluation, since assiduous rule discovery can sometimes find accidentally true (and thus perfectly accurate) generalizations which nonetheless lead to disaster if trusted. The Navajo example illustrates why it is not enough to evaluate the accuracy of each generalization independently; we must also consider whether generalizations cover forms that are better handled by a different generalization.9
Another possible failing of the reliability approach is that it is ill-suited to capture special case/‘elsewhere’ relations (Kiparsky 1982). The environment for [šì-] in Navajo is difficult to express by itself, but easy as the complement set of the [sì-] environments. In optimality theory, ‘elsewhere’ is simply the result of constraint ranking: a context-sensitive constraint outranks the default. Unfortunately for the reliability-based approach, default environments such as (10.11c) often have fairly high reliability (181/237 in this case), but that does not mean that they should be applied in the special-allomorph context (e.g. of (10.11a)).
In light of this, it is worth considering why we adopted reliability scores in the first place. Ironically, the reason likewise involved accidentally true generalizations, but of a different kind. One of the phenomena that compelled us to use reliability scores was the existence of small-scale patterns for irregulars, seen, for example, in English past tenses. As Pinker and Prince (1988) point out, when a system includes irregular forms, they characteristically are not arbitrary exceptions, but fall into patterns, e.g. cling–clung, fling–flung, swing–swung. These patterns have some degree of productivity, as shown by historical change (Pinker 1999) and ‘wug’ (nonce-word) testing (Bybee and Moder 1983; Prasada and Pinker 1993; Albright and Hayes 2003).10
9 A related problem, in which overly broad generalizations appear exaggeratedly accurate because they contain a consistent subset, is discussed in Albright and Hayes (2002).
10 We restrict our discussion to phonological patterns; for discussion of patterns based possibly on semantic, rather than phonological similarities, see Ramscar (2002). In principle, the approach described here could be easily extended to include constraints that refer to other kinds of information; it is an empirical question what properties allomorphy rules may refer to.
The problem is that our algorithm can often find environments for these minor changes that are exceptionless. For example, the exceptionless minor change in (10.22) covers the four verbs dig, cling, fling, and sling.11
(10.22) ɪ → ʌ / X [+cor, +ant, +voice] ___ [+dorsal, +voice] ][+past]
The GLA, when comparing an exceptionless constraint against a more general constraint that suffers from exceptions, always ranks the exceptionless constraint categorically above the general one. For cases like Navajo, where the special constraint was (10.11a) and the general constraint was (10.11c), the default constraint for [sì-], this ranking is entirely correct, capturing the special/default relationship. But when exceptionless (10.22) is ranked categorically above the constraints specifying the regular ending for English (such as Use [-d]), the prediction is that novel verbs matching the context of (10.22) should be exclusively irregular (i.e. blig → blug, not *bligged). There is evidence that this prediction is wrong, from wug tests on forms that match (10.22). For instance, the wug test reported in Albright and Hayes (2003) yielded the following judgements (scale: 1 worst, 7 best):
(10.23)
Present stem | Choice for past | Rating
a. blig [blɪg] | blug [blʌg] | 4.17
| bligged [blɪgd] | 5.67
b. spling [splɪŋ] | splung [splʌŋ] | 5.45
| splinged [splɪŋd] | 4.36
The regular forms are almost as good as, or better than, the forms derived by the exceptionless rule. We infer that numbers matter: a poorly attested perfect generalization such as (10.22) is not necessarily taken more seriously than a broadly attested imperfect generalization such as Use [-d]. For Navajo, strict ranking is appropriate, since the special-environment constraint (10.11a) that must outrank the default (10.11c) is robustly attested in the language. In the English case, the special-environment constraint is also exceptionless, but is attested in only four verbs, yet the GLA (in either version) ranks it on top of the grammar, just as in Navajo.
11 This is the largest set of ɪ → ʌ verbs that yields an exceptionless generalization. There are other subsets, such as cling, fling, and sling, that also lead to exceptionless generalizations, and these are also generated by our model. The problem that we discuss below would arise no matter which set is selected, and would not be solved by trying to, for example, exclude dig from consideration.
It can now be seen why in our earlier work we avoided constraint interaction and employed reliability scores instead. With reliability scores, it is simple to impose a penalty on forms derived by rules supported by few data; following Mikheev (1997), we used a statistical lower confidence limit on reliability. Thus, for a wug form like blig, two rules of comparable value compete: the regular rule (has exceptions, but vast in scope) versus (10.22) (no exceptions, but tiny in scope). Ambivalence between the two is a natural consequence.
If reliability statistics are not the right answer to this problem, what is? It seems that the basic idea that rules based on fewer forms should be downgraded is sound. But the downgrade need not be carried out based on reliability scores; it might also be made part of the constraint ranking process. In particular, we propose that the basic principles of the GLA be supplemented with biases that exert a downward force on morphological constraints that are supported by few data, using statistical smoothing or discounting. As of this writing we do not have a complete solution, but we have experimented with a form of absolute discounting (Ney et al. 1994), implemented as follows: for each constraint C, we add to the learning data an artificial datum that violates C and obeys every other constraint with which C is in conflict. Under this scheme, if C (say, (10.22) above) is supported by just four forms, then an artificially added candidate would have a major effect in downgrading its ranking. But if C is supported by thousands of forms (for example, the constraint for a regular mapping), then the artificially added candidate would be negligible in its effect. We found that when we implemented this approach, it yielded reasonable results for the English scenario just outlined: in a limited simulation consisting of the regulars in Albright and Hayes (2003) plus just the four irregulars covered by (10.22), regular splinged was a viable competitor with splung, and the relationships among the competing regular allomorphs remained essentially unchanged.
There are many ways that small-scale generalizations could be downgraded during learning. We emphasize that the development of a well-motivated algorithm for this problem involves not just issues of computation, but an empirical question about productivity: when real language learners confront the data, what are the relative weights that they place on accuracy versus size of generalization? Both experimental and modelling work will be needed to answer these questions.12
12 An unresolved question that we cannot address here is whether a bias for generality can be applied to all types of phonological constraints, or just those that govern allomorph distribution. It is worth noting that for certain other types of constraints, such as faithfulness constraints, it has been argued that specific constraints must have higher initial rankings than more general ones (Smith 2000). At present, we restrict our claim to morphological constraints of the form ‘Use X’.
10.14 Conclusion
The comparison of English and Navajo illustrates an important problem in the study of gradient well-formedness in phonology. On the one hand, there are cases such as English past tenses, in which the learner is confronted with many competing patterns and must trust some generalizations despite some exceptions. In such cases, gradient well-formedness is rampant, and the model must retain generalizations with varying degrees of reliability. On the other hand, there are cases such as Navajo sibilant harmony, in which competition is confined to particular contexts, and the learner has many exceptionless generalizations to choose from. In these cases, the challenge is for the model to choose the ‘correct’ exceptionless patterns, and refrain from selecting an analysis that predicts greater variation than is found in the target language. We seek to develop a model that can handle all configurations of gradience and categoricalness, and we believe the key lies in the trade-off between reliability and generality. We have shown here how our previous approach to the problem was insufficient, and proposed a new approach using the GLA, modified to favour more general constraints. The precise details of how generality is calculated, and how severe the bias must be, are left as a matter for future research.
Part III Gradience in Syntax
11 Gradedness as Relative Efficiency in the Processing of Syntax and Semantics1
JOHN A. HAWKINS
11.1 Introduction
This paper presents a set of corpus data from English, a head-initial language, and some additional data from Japanese, a head-final language, showing clear selection preferences among competing structures. The structures involve the positioning of complements and adjuncts relative to the verb, and the preferences range from highly productive to unattested (despite being grammatical). These ‘gradedness effects’ point to a principle of efficiency in performance, minimize domains (MiD). Specifically, this chapter argues for the following:
(11.1) a. MiD predicts gradedness by defining the syntactic and semantic relations holding between categories, by enumerating the surface structure domains in which these relations can be processed, and by ranking common domains in competing structures according to their degree of minimization. Relative weightings and cumulative effects among different syntactic and semantic relations are explained in this way.
b. The same minimization preferences can be found in the preferred grammatical conventions of different language types, and a performance–grammar correspondence hypothesis is proposed.
c. Principles of performance can predict what purely grammatical principles of ordering can only stipulate, and can explain exceptions to grammatical principles. A model that appears to capture the desired performance–grammar correspondence, stochastic optimality theory, has the reverse problem: it stipulates and fails to predict the performance data.
d. What is needed in this and other grammatical areas is: a predictive and explanatory model of performance; an adequate description of the grammatical conventions that have been shaped by performance preferences; and a diachronic model of adaptation and change.
1 This paper is dedicated to Günter Rohdenburg on the occasion of his 65th birthday. Günter’s work discovering patterns of preference in English performance (see e.g. Rohdenburg 1996) has been inspirational to me and to many others over many years.
The order of presentation is as follows. In Section 11.2 I define the principle of minimize domains. Section 11.3 tests this principle on postverbal PPs in English and on preverbal NPs and PPs in Japanese and introduces a metric for quantifying multiple constraints and their interaction. Section 11.4 discusses minimize domains in grammars and in cross-linguistic variation, and Section 11.5 summarizes the conclusions.
11.2 Minimize domains
This principle is defined in (11.2):
(11.2) Minimize domains (MiD)
The human processor prefers to minimize the connected sequences of linguistic forms and their conventionally associated syntactic and semantic properties in which relations of combination and/or dependency are processed. The degree of this preference is proportional to the number of relations whose domains can be minimized in competing sequences or structures, and to the extent of the minimization difference in each domain.
A relation of combination is defined here as follows.
(11.3) Combination
Two categories A and B are in a relation of combination iff they occur within the same mother phrase and maximal projections (phrasal combination), or if they occur within the same lexical co-occurrence frame (lexical combination).
Clever is in combination with student in clever student, since both are in the same mother phrase (NP), read combines with the book in the VP read the book, and so on. These phrasal combinations are defined by general phrase structure rules. Subject and object arguments of a verb are in lexical combination with that verb and with one another, and more generally the ‘complements’ of a verb are listed alongside that verb in its lexical entry. For dependency I propose what is ultimately a processing definition:
(11.4) Dependency
Two categories A and B are in a relation of dependency iff the processing of B requires access to A for the assignment of syntactic or semantic properties to B with respect to which B is zero-specified or ambiguously or polysemously specified.
Theta-role assignment to an NP by reference to a verb can serve as an example of a dependency of B on A. Co-indexation of a reflexive anaphor by reference to an antecedent, and gap processing in relation to a filler, are others. Some dependencies between A and B also involve combination (theta-role assignments, for example), others do not. A ‘domain’, as this term is used in this context, is defined in (11.5):
(11.5) Domain
A combinatorial or dependency domain consists of the smallest connected sequence of terminal elements and their associated syntactic and semantic properties that must be processed for production and/or recognition of the combinatorial or dependency relation in question.
The domain sufficient for recognition of the VP and its three immediate constituents (V, PP1, PP2) is shown in square brackets in the following sentence (cf. 11.3.1 below): the old lady [counted on him in] her retirement. One prediction made by MiD (11.2) that will be significant here involves the preferred adjacency of some categories versus others to a head of phrase:
(11.6) Adjacency to heads
Given a phrase {H, {X, Y}}, H a head category and X and Y phrases that are potentially adjacent to H, then the more combinatorial and dependency relations whose processing domains can be minimized when X is adjacent to H, and the greater the minimization difference between adjacent X and adjacent Y in each domain, the more H and X will be adjacent.
11.3 Verbal complements and adjuncts
In a head-initial language like English there is a clear preference for short phrases to precede longer ones, the short ones being adjacent to the relevant head. This ‘weight effect’ has been analysed in Hawkins (1994, 1998, 2001) in terms of the efficiency with which phrasal combinations can be parsed (the theory of early immediate constituents, or EIC). This theory is now subsumed under the more general theory of MiD (11.2).2
2 See also Gibson’s (1998) theory of ‘locality of syntactic dependencies’, which makes similar predictions to minimize domains (11.2). See Hawkins (2004) for discussion of some differences between the two approaches.
11.3.1 Relative weight
The immediate constituents (ICs) of a phrase can typically be recognized on the basis of less than all the words dominated by that phrase, and some orderings reduce the number of words needed to recognize these ICs, resulting in faster phrase structure recognition.
11.3.1.1 Head-initial structures
Compare the alternations in (11.7a) and (11.7b) involving two PPs following an intransitive verb (PP2>PP1 in numbers of words):
(11.7) a. The man vp[waited pp1[for his son] pp2[in the cold but not unpleasant wind]]
(words 1–5 of the VP: waited, for, his, son, in)
b. The man vp[waited pp2[in the cold but not unpleasant wind] pp1[for his son]]
(words 1–9 of the VP: waited, in, the, cold, but, not, unpleasant, wind, for)
The parser can construct the three ICs of VP, V, PP1, and PP2, in a domain of five connected words in (11.7a), compared with nine in (11.7b), assuming that head categories like P (or other constructing categories) immediately project to mother nodes such as PP and render them predictable on-line.3 The greater efficiency of (11.7a), in which the same structure can be derived from less input, can then be captured in terms of its higher IC-to-word ratio within a VP constituent recognition domain (CRD).4
(11.7’) a. VP: IC-to-word ratio = 3/5 or 60%
b. VP: IC-to-word ratio = 3/9 or 33%
Hawkins (2000) analysed a set of English structures taken from a corpus in which exactly two PPs followed an intransitive verb and in which the PPs were permutable with truth-conditional equivalence, that is the speaker had a choice as in (11.7a) and (11.7b).5 Overall 82 per cent of the sequences with a length difference were ordered short before long (265/323), the short PP being adjacent to V, and the degree of the weight difference correlated precisely with the degree of preference for the short before long order, as shown in Table 11.1.

Table 11.1. English prepositional phrase orderings by relative weight
n = 323        PP2 > PP1 by 1 word   by 2–4      by 5–6     by 7+
[V PP1 PP2]    60% (58)              86% (108)   94% (31)   99% (68)
[V PP2 PP1]    40% (38)              14% (17)    6% (2)     1% (1)
PP2 = longer PP; PP1 = shorter PP
Proportion of short–long to long–short given as a percentage; actual numbers of sequences in parentheses
An additional 71 sequences had PPs of equal length (total n = 394)
Source: Hawkins 2000: 237

3 For detailed discussion of node construction and an axiom of constructability for phrase structure, see Hawkins (1993) and (1994: ch. 6). See also Kimball (1973) for an early formulation of the basic insight about phrasal node construction in parsing (his principle of New Nodes).
4 IC-to-word ratios are simplified procedures for quantifying what is technically an IC-to-non-IC ratio, which measures the ratio of ICs to all other terminal and non-terminal nodes in the domain. For explicit comparison of the two metrics, see Hawkins (1994: 69–83). For empirical testing of word-based and (structural) node-based complexity metrics and a demonstration that the two are highly intercorrelated, cf. Wasow (1997, 2002). See Lohse et al. (2004: 241) for a summary of, and references to, some of the issues that arise in actually defining the ‘weight’ of a constituent.
5 The corpus of Hawkins (2000) consisted of 500 pages of written English (200 pages of fiction, 225 pages of non-fiction and 75 pages of a diary) and of eight distinct texts.

As length differences increase, the efficiency (and IC-to-word ratio) of the long-before-short structure (11.7b) decreases relative to (11.7a), and (11.7a) is increasingly preferred, and predicted, by MiD (11.2). The data of Table 11.1 are from (written) production. Similar preferences have been elicited in production experiments by Stallings (1998) and Stallings et al. (1998). Domain minimization can be argued to be beneficial for the speaker, therefore, and not just an accommodation to the hearer’s parsing needs (cf. Hawkins 2004). I shall accordingly relabel a CRD as a phrasal combination domain (PCD), making it compatible with production and comprehension.
(11.8) Phrasal combination domain (PCD)
The PCD for a mother node M and its I(mmediate) C(onstituents) consists of the smallest string of terminal elements (plus all M-dominated non-terminals over the terminals) on the basis of which the processor can construct M and its ICs.
EIC can be generalized to make it compatible with both as follows:
(11.9) Early immediate constituents (EIC)
The human processor prefers linear orders that minimize PCDs (by maximizing their IC-to-word ratios), in proportion to the minimization difference between competing orders.
11.3.1.2 Head-final structures
For head-final languages EIC predicts a mirror-image preference. Postposing a heavy NP or PP to the right in English shortens PCDs. Preposing heavy constituents in Japanese has the same effect, since the relevant constructing categories (V for VP, P for PP, etc.) are now on the right (which is abbreviated here as VPm, PPm, etc.). In a structure like [{1PPm, 2PPm} V] the PCD for VP will proceed from the first
P(ostposition) encountered to the verb, and will be smaller if the shorter 1PPm is adjacent to the verb than if the longer 2PPm is adjacent. The preferred pattern overall should be long before short in Japanese, therefore, and the degree of this preference should increase with increasing weight differentials. In this way the time course from recognition of the first PP to the VP-final verb will be faster than if the long PP is adjacent to V, following the shorter PP.6 Consider some illustrative data collected by Kaoru Horie and involving orderings of [{NPo, PPm} V], where NPo stands for a direct object NP containing an accusative case particle o, in combination with a postpositional phrase, that is with right-peripheral construction of PP by P (PPm).7 I assume here that the o is the constructing category for this case-marked NP, parallelling the final postposition in PP and the final V in VP, and that the processing of VP proceeds bottom-up in an order such as [PPm NPo V]: daughter constituents of PPm are processed before the PP itself is recognized (by projection from P). The distance and time course from the first constructing category (P or o) to V is then shorter when the longer phrase precedes the shorter one. That is, [PPm NPo V] is preferred when PPm>NPo, and [NPo PPm V] is preferred when NPo>PPm. An example of the relevant sentence type is given in (11.10):
(11.10) Japanese
a. Tanaka ga [[Hanako kara] [sono hon o] katta]
Tanaka NOM Hanako from that book ACC bought
‘Tanaka bought that book from Hanako’
b. Tanaka ga [[sono hon o] [Hanako kara] katta]
Table 11.2 presents the relative weights of the two non-subject phrases and their correlated orderings. NPo and PPm are collapsed together for present purposes. The data for individual [PPm NPo V] versus [NPo PPm V] orders are presented in Table 11.5. Table 11.2 reveals that each additional word of relative weight results in a higher proportion of long before short orders, mirroring the short before long preference of Table 11.1. The overall preference for long before short in Table 11.2 is 72 per cent (110/153) to 28 per cent short before long (43/153). This long before short effect in Japanese has been replicated in Yamashita and Chang (2001) and in Yamashita (2002). It can also be seen in the widespread preposing preference for subordinate clauses with final complementizers in this and other head-final languages.
6 For quantification of this Japanese preposing preference in relation to EIC, cf. Hawkins (1994: 80–1).
7 The Japanese corpus analysed by Kaoru Horie consisted of 150 pages of written Japanese summarized in Hawkins (1994: 142), and of three distinct texts.
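The domain sizes behind (11.7) and (11.10) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the original analysis: the function names are hypothetical, every IC is assumed to be constructed by its first word in a head-initial phrase and by its last word in a head-final phrase, and the Japanese case particle o and postposition kara are each counted as one word.

```python
def pcd_size(ic_lengths, head_final=False):
    """Number of words in the phrasal combination domain (PCD) of a phrase.

    ic_lengths lists the word counts of the immediate constituents (ICs) in
    surface order. Assumption: every IC is constructed by its first word in a
    head-initial phrase and by its last word in a head-final phrase.
    """
    if head_final:
        # From the last word of the first IC through the end of the phrase.
        return 1 + sum(ic_lengths[1:])
    # From the start of the phrase through the first word of the last IC.
    return sum(ic_lengths[:-1]) + 1


def ic_to_word_ratio(ic_lengths, head_final=False):
    """ICs recognized per word of the PCD (higher means more efficient)."""
    return len(ic_lengths) / pcd_size(ic_lengths, head_final)


# (11.7a) waited [for his son] [in the cold but not unpleasant wind]
short_before_long = [1, 3, 7]
# (11.7b) waited [in the cold but not unpleasant wind] [for his son]
long_before_short = [1, 7, 3]
print(pcd_size(short_before_long), ic_to_word_ratio(short_before_long))
print(pcd_size(long_before_short), ic_to_word_ratio(long_before_short))

# (11.10a) [Hanako kara] [sono hon o] katta
print(pcd_size([2, 3, 1], head_final=True))
# (11.10b) [sono hon o] [Hanako kara] katta
print(pcd_size([3, 2, 1], head_final=True))
```

Under these assumptions the sketch reproduces the five-word and nine-word domains of (11.7a) and (11.7b), and it gives a smaller domain for (11.10b), where the longer NPo precedes the shorter PPm, than for (11.10a).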
Table 11.2. Japanese NPo and PPm orderings by relative weight
n = 153          2ICm > 1ICm by 1–2 words   by 3–4     by 5–8     by 9+
[2ICm 1ICm V]    66% (59)                   72% (21)   83% (20)   91% (10)
[1ICm 2ICm V]    34% (30)                   28% (8)    17% (4)    9% (1)
NPo = direct object NP with accusative case particle o
PPm = PP constructed on its right periphery by a P(ostposition)
ICm = either NPo or PPm
2IC = longer IC; 1IC = shorter IC
Proportion of long–short to short–long orders given as a percentage; actual numbers of sequences in parentheses
An additional 91 sequences had ICs of equal length (total n = 244)
Source: Hawkins 1994: 152; data collected by Kaoru Horie
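The percentages and totals reported for Tables 11.1 and 11.2 can be recomputed from the raw counts. The following Python sketch is illustrative only; the dictionary layout and function names are assumptions introduced here, while the counts are those given in the two tables.

```python
# Counts per weight-difference band: (preferred order, dispreferred order).
# Table 11.1 (English): preferred = short before long.
TABLE_11_1 = {"1": (58, 38), "2-4": (108, 17), "5-6": (31, 2), "7+": (68, 1)}
# Table 11.2 (Japanese): preferred = long before short.
TABLE_11_2 = {"1-2": (59, 30), "3-4": (21, 8), "5-8": (20, 4), "9+": (10, 1)}


def preferred_share(counts):
    """Proportion of sequences in the order predicted by EIC."""
    preferred, dispreferred = counts
    return preferred / (preferred + dispreferred)


def summarize(table):
    """Return per-band shares plus the overall (preferred, total) counts."""
    shares = [preferred_share(counts) for counts in table.values()]
    total_preferred = sum(counts[0] for counts in table.values())
    total = sum(sum(counts) for counts in table.values())
    return shares, total_preferred, total


for name, table in (("Table 11.1", TABLE_11_1), ("Table 11.2", TABLE_11_2)):
    shares, total_preferred, total = summarize(table)
    rounded = [round(100 * share) for share in shares]
    print(name, rounded, f"{total_preferred}/{total}")
```

In both tables the share of the EIC-preferred order rises with each band of weight difference, which is the graded pattern the text describes, and the overall counts come out as 265/323 and 110/153.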
The preference for long before short in Japanese is not predicted by current models of language production, which are heavily influenced by English-type postposing effects. Yet it points to the same minimization preference for PCDs that we saw in head-initial languages. For example, according to the incremental parallel formulator of De Smedt (1994), syntactic segments are assembled incrementally into a whole sentence structure, following message generation within a conceptualizer. Short constituents can be formulated with greater speed in the race between parallel processes and should accordingly be generated first, before heavy phrases. The theory of EIC, by contrast, predicts that short phrases will be formulated first only in head-initial languages, and it defines a general preference for minimal PCDs in all languages. The result: heavy ICs to the left and short ICs to the right in head-final languages.
11.3.2 Complements versus adjuncts
The adjacency hypothesis of (11.6) predicts a preference for adjacency that is proportional to the number of combinatorial and dependency relations whose domains can be minimized in competing orders, and in proportion to the extent of the minimization difference in each domain. This can be tested by re-examining the data from the last section to see whether the processing of additional relations between sisters and head has the predicted effect on adjacency. Some of these data went against weight alone and had non-minimal PCDs. Such orders deserve special scrutiny, since they are predicted here to involve some other syntactic or semantic link whose processing prefers a minimal domain. PPs with intransitive verbs in English exhibit considerable variation with respect to their precise relation to the verb. An important distinction that is
commonly drawn in the theoretical literature is between ‘complement’ PPs, which are lexically listed alongside the verbs that govern them, and ‘adjunct’ PPs, which are not so listed and which are positioned by general syntactic rules. The PP for John is a complement in wait for John, whereas in the evening is an adjunct in wait in the evening. The problem with this distinction is that there are several more primitive properties that underlie it, and there are examples that fall in between.8 Some complements are obligatorily required, others are optional, like adjuncts. A transitive verb requires its complement object NP, and an intransitive verb like depend requires a co-occurring PP headed by on (I depended on John, contrast *I depended). The intransitive wait, on the other hand, is grammatical without its PP complement (cf. I waited for John, and I waited). The intransitive count is also grammatical without its PP complement headed by on (I counted), but the meaning is different from that which is assigned in the presence of the complement (I counted on John). The interpretation of the preposition may also depend on the verb, even when the verb’s meaning is not dependent on the preposition. There are only a few intransitive verbs like depend that require a co-occurring PP for grammaticality, but many like count on or wait for that involve a dependency, so we might use dependency as the major criterion for distinguishing complements from adjuncts. I will propose two dependency tests here. They provide what is arguably a sufficient criterion for complementhood and for the co-occurrence of a PP in the lexical co-occurrence frame of an intransitive verb (cf. Hawkins 2000). One test, the verb entailment test, asked: does [V, {PP1, PP2}] entail V alone or does V have a meaning dependent on either PP1 or PP2? For example, if the man waited for his son in the early morning is true, then it is also true that the man waited, and so the former entails the latter. But the man counted on his son in his old age does not entail the man counted. Another test, the pro-verb entailment test, asked: can V be replaced by some general pro-verb or does one of the PPs require that particular V for its interpretation? For example, the boy played on the playground entails the boy did something on the playground, but the boy depended on his father does not entail the boy did something on his father. The results of applying these tests to the data of Table 11.1 were as follows. When there was a lexical-semantic dependency between V and just one of the PPs by one or both tests, then 73 per cent (151/206) had that PP adjacent to V.
8 See Schütze and Gibson (1999) and Manning (2003) for useful discussion of the complement/adjunct distinction in processing and grammar.
Recall that 82 per cent had a shorter PP adjacent to V and preceding a longer one in Table 11.1. For PPs that were both shorter and lexically dependent, the adjacency rate to V was almost perfect, at 96 per cent (102/106). This combined adjacency effect was statistically significantly higher than for lexical dependency and EIC alone. The processing of a lexical combination evidently prefers a minimal domain, just as the processing of phrasal combinations does. This can be explained as follows. Any separation of count and on his son, and of wait and for his son, delays recognition of the lexical co-occurrence frame intended for these predicates by the speaker and delays assignment of the verb’s combinatorial and dependent properties. A verb can be, and typically is, associated with several lexical co-occurrence frames, all of which may be activated when the verb is processed (cf. Swinney 1979; MacDonald et al. 1994). Accompanying PPs will select between them, and in the case of verbs like count they will resolve a semantic garden path. For dependent prepositions, increasing separation from the verb expands the domain and working memory demands that are required for processing of the preposition. We can define a lexical domain as follows:
(11.11) Lexical domain (LD)9
The LD for assignment of a lexically listed property P to a lexical item L consists of the smallest string of terminal elements (plus their associated syntactic and semantic properties) on the basis of which the processor can assign P to L.
11.3.3 The interaction of LD and PCD processing
When lexical processing and phrase structure processing reinforce one another, that is when the lexically listed PP is also the shorter PP, we have seen a 96 per cent adjacency to the verb. When the two processing domains pull in different directions, that is when the complement PP is longer, we expect variation.
What is predicted is a stronger effect within each domain in proportion to the minimization difference between competing sequences. For phrasal combinations this will be a function of the weight difference between the two PPs, as measured by EIC ratios. For lexical domains it will be a function of the absolute size of any independent PP (Pi) that intervenes between the verb and an interdependent or matching PP. Let us abbreviate a PP judged interdependent by one or both entailment tests as Pd. If Pi is a short two-word phrase, the difference between [V Pd Pi] and [V Pi Pd] will be just two words. But as Pi gains in size, the processing domain for the lexical dependency between V and Pd grows, and the minimization preference for [V Pd Pi] grows accordingly. EIC’s graded weight preferences and predictions were confirmed in the data of Table 11.1. For lexical dependency it is indeed the absolute size of the potentially intervening Pi that determines the degree of the adjacency preference between V and Pd, as shown in Table 11.3. As Pi grows in size, its adjacency to V declines.10 Multiple preferences therefore have an additive adjacency effect by increasing the number of processing domains that prefer minimization. They can also result in exceptions to each preference when they pull in different directions. Most of the fifty-eight exceptional long-before-short sequences in Table 11.1 do indeed involve a dependency between V and the longer PP (Hawkins 2000), applying in proportion to the kind of domain minimization preference shown in Table 11.3. Conversely V and Pd can be pulled apart by EIC, in proportion to the weight difference between Pd and Pi. This is shown in Table 11.4. When Pi > Pd and both weight (minimal PCDs) and lexical dependency prefer Pd adjacent to V, there is almost exceptionless adjacency (in the right-hand columns). When weights are equal and exert no preference, there is a strong (83 per cent) lexical dependency effect. When the two preferences conflict and the dependent Pd is longer than Pi (in the left-hand columns) EIC asserts itself in proportion to its degree of preference: for one-word differentials lexical dependency claims the majority (74 per cent) adjacent to V; for 2–4 word differentials the short-before-long preference wins by 67 per cent to 33 per cent; and for 5+ word differentials it wins by a larger margin of 93 per cent to 7 per cent. EIC therefore asserts itself in proportion to its degree of preference for minimal PCDs.
9 I make fairly standard assumptions here about the properties that are listed in the lexicon. They include: the syntactic category or categories of L (noun, verb, preposition, etc.); the syntactic co-occurrence frame(s) of L, i.e. its ‘strict subcategorization’ requirements of Chomsky (1965) (e.g. V may be intransitive or transitive, if intransitive it may require an obligatory PP headed by a particular preposition, or there may be an optionally co-occurring PP headed by a particular preposition, etc.); ‘selectional restrictions’ imposed by L, Chomsky (1965) (e.g. drink requires an animate subject and liquid object); syntactic and semantic properties assigned to the complements of L (e.g. the theta-role assigned to a direct object NP by V); the different range of meanings assignable to L with respect to which L is ambiguous or polysemous (the different senses of count and follow and run); and frequent collocations of forms, whether ‘transparent’ or ‘opaque’ in Wasow’s (1997, 2002) sense.
10 In corresponding tables cited in Hawkins (2000, 2001) I included five additional sequences, making 211 in all, in which both PPs were interdependent with V, but one involved more dependencies than the other. I have excluded these five here, resulting in a total of 206 sequences, in all of which one PP is completely independent while the other PP is interdependent with V by at least one entailment test.
Table 11.3. English lexically dependent prepositional phrase orderings
n = 206      Pi = 2–3 words   4–5        6–7        8+
[V Pd Pi]    59% (54)         71% (39)   93% (26)   100% (32)
[V Pi Pd]    41% (37)         29% (16)   7% (2)     0% (0)
Pd = the PP that is interdependent with V by one or both entailment tests
Pi = the PP that is independent of V by both entailment tests
Proportion of adjacent V-Pd to non-adjacent orders given as a percentage; actual numbers of sequences in parentheses
Source: Hawkins’ 2000 data
Table 11.4. Weight and lexical dependency in English prepositional phrase orderings
             Pd > Pi by                       Pd = Pi    Pi > Pd by
n = 206      5+         2–4        1                     1          2–4        5+
[V Pd Pi]    7% (2)     33% (6)    74% (17)   83% (24)   92% (23)   96% (49)   100% (30)
[V Pi Pd]    93% (28)   67% (12)   26% (6)    17% (5)    8% (2)     4% (2)     0% (0)
Pd = the PP that is interdependent with V by one or both entailment tests
Pi = the PP that is independent of V by both entailment tests
Proportion of adjacent V-Pd to non-adjacent orders given as a percentage; actual numbers of sequences in parentheses
Source: Hawkins 2000: 247
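The aggregate figures cited in the text (151/206 overall, 102/106 when weight and lexical dependency reinforce each other, and 71 conflict cases) can be recovered from the columns of Table 11.4. The sketch below is an illustration only; the tuple layout, the labels "conflict", "equal", and "reinforce", and the function names are assumptions introduced here.

```python
# Columns of Table 11.4, left to right:
# (column label, relation between the two preferences, [V Pd Pi] count, [V Pi Pd] count)
TABLE_11_4 = [
    ("Pd>Pi by 5+", "conflict", 2, 28),
    ("Pd>Pi by 2-4", "conflict", 6, 12),
    ("Pd>Pi by 1", "conflict", 17, 6),
    ("Pd=Pi", "equal", 24, 5),
    ("Pi>Pd by 1", "reinforce", 23, 2),
    ("Pi>Pd by 2-4", "reinforce", 49, 2),
    ("Pi>Pd by 5+", "reinforce", 30, 0),
]


def adjacency(rows):
    """Return (sequences with Pd adjacent to V, all sequences) for the rows."""
    adjacent = sum(row[2] for row in rows)
    total = sum(row[2] + row[3] for row in rows)
    return adjacent, total


def select(kind):
    """Pick the columns in which weight and dependency stand in the given relation."""
    return [row for row in TABLE_11_4 if row[1] == kind]


overall = adjacency(TABLE_11_4)
reinforcing = adjacency(select("reinforce"))
conflict = adjacency(select("conflict"))
print(overall, reinforcing, conflict)
```

The overall and reinforcing counts match the 73 per cent and 96 per cent adjacency rates reported above, and the three left-hand columns together contain the 71 sequences in which the two preferences conflict.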
For the Japanese data of Table 11.2, I predict a similar preference for complements and other lexically co-occurring items adjacent to the verb, and a similar (but again mirror-image) interaction with the long-before-short weight preference. A transitive verb contracts more syntactic and semantic relations with a direct object NP as a second argument or complement than it does with a PP, many or most of which will be adjuncts rather than complements. Hence a preference for NP adjacency is predicted, even when the NP is longer than the PP, though this preference should decline with increasing (relative) heaviness of the NP and with increasing EIC pressure in favour of long before short phrases. This is confirmed in Table 11.5 where NP-V adjacency stands at 69 per cent overall (169/244) and is as high as 62 per cent for NPs longer than PP by 1–2 words and 50 per cent for NPs longer by 3–4 words, that is with short PPs before long NPs. Only for 5+ word differentials is NP-V adjacency avoided in favour of a majority (79 per cent) of long NPs before short PPs. When EIC and complement adjacency reinforce each other in favour of [PPm NPo V] in the right-hand columns, the result is significantly higher NP adjacency (of 80 per cent, 84 per cent and 100 per cent). When weights are equal there is a strong (66 per cent) NP adjacency preference defined by the complement processing preference alone. And when EIC and complement adjacency are opposed in the left-hand columns, the results are split, as we have seen, and EIC applies in proportion to its degree of preference. Table 11.5 is the mirror-image of Table 11.4 with respect to the interaction between EIC and lexical domain processing.

Table 11.5. Weight and direct object adjacency in Japanese
               NPo > PPm by                     NPo = PPm   PPm > NPo by
n = 244        5+         3–4       1–2                     1–2        3–8        9+
[PPm NPo V]    21% (3)    50% (5)   62% (18)    66% (60)    80% (48)   84% (26)   100% (9)
[NPo PPm V]    79% (11)   50% (5)   38% (11)    34% (31)    20% (12)   16% (5)    0% (0)
NPo = see Table 11.2
PPm = see Table 11.2
Proportion of adjacent NPo-V to non-adjacent orders given as a percentage; actual numbers of sequences in parentheses
Source: Hawkins 1994: 152; data collected by Kaoru Horie

One further prediction that remains to be tested on Japanese involves the PP-V adjacencies, especially those in the right-hand columns in which adjacency is not predicted by weight. These adjacencies should be motivated by strong lexical dependencies, that is they should be lexical complements or collocations in Wasow’s (1997, 2002) sense, and more fine-tuned testing needs to be conducted in order to distinguish different PP types here.
11.3.4 Total domain differentials
The existence of multiple domains for the processing of syntactic and semantic properties requires a metric that can assess their combined effect within a given structure. The principle of MiD (11.2) predicts tighter adjacency, the more combinatorial and dependency relations there are, and in proportion to the differences between competing domain sizes in the processing of each relation. But can we make any more absolute predictions for when, for example, the [V Pi Pd] variant will actually be selected over [V Pd Pi]? The data in the left-hand columns of Table 11.4 suggest at first that weight is a stronger preference than lexical dependency since this latter claims a majority (of 74 per cent) only when the resulting long-before-short ordering involves a small one-word difference, and for all other relative weights of 2+ words the majority of orders are short-before-long, in accordance with EIC, with the dependent Pd non-adjacent to V.
But there is another possibility. Every PP, whether Pd or Pi, is at least two words in length. The reason why a relative weight difference of one word cannot generally assert itself in favour of the normally preferred short-before-long order, when the longer PP is a Pd, could be because the size of an intervening Pi will be longer (at 2+ words) than the relative weight difference. Hence the LD differential would always exceed the PCD differential when this latter stands at one word. As relative weights increase, the PCD totals will gradually equal or exceed the absolute size of the Pi, and the overall efficiency of the sequence will shift in favour of relative weight and minimal PCDs. In other words, we can make predictions for the relative strength of these two factors, the syntactic (phrasal combination) and the lexical, by measuring their respective domain minimizations in terms of words. We can then formulate a selection prediction based on an assessment of the degree of minimization that can be accomplished in each domain within each competing sequence. Whichever sequence has the highest overall minimization will be more efficient and should be the one selected. This approach makes the (probably incorrect) assumption that word minimizations in phrasal combination domains and in lexical domains count equally. But in the absence of a good theory motivating why one should be stronger or weaker than the other, it is worth exploring how far we can go without additional assumptions and stipulations. In the event that a principled reason can eventually be given for why word minimizations in one domain should exert a stronger influence than those in another, this can be factored into the predictions. In the meantime I shall run the tests assuming equality between different domains. Let us define a total domain differential (TDD) as in (11.12) and an associated prediction for performance in (11.13):
(11.12) Total domain differential (TDD) = the collective minimization difference between two competing sequences measured in words and calculated on the basis of the phrasal combination domains, lexical domains, or other domains required for processing the syntactic or semantic relations within these sequences.
(11.13) TDD performance prediction
Sequences with the highest collective minimization differences will be those that are preferably selected, in proportion to their relative TDDs.
For the data of Table 11.4 we are dealing with phrasal combination domains (EIC effects) and lexical domains (established on the basis of the entailment tests for lexical dependency). The TDD predictions can be set out as follows:
(11.14) TDD Predictions for Table 11.4

For Pi > Pd: Only [V Pd Pi] preferred
    [V Pd Pi]   Both PCDs and LDs prefer
    [V Pi Pd]   Neither prefer

For Pi = Pd: No PCD preference
    [V Pd Pi]   LD prefers (in proportion to Pi size)
    [V Pi Pd]   LD disprefers (in proportion to Pi size)

For Pd > Pi: PCD and LD conflict
    [V Pd Pi]   LD preference ≥ PCD preference (i.e. the size of Pi ≥ weight difference)
    [V Pi Pd]   PCD preference ≥ LD preference (i.e. the weight difference ≥ size of Pi)
These predictions are straightforward when Pi > Pd and when weights are equal, since a Pd adjacent to V is always preferred. But the two processing domains compete when Pd is the longer phrase, in examples such as count [on the lost son of my past] [in old age] versus count [in old age] [on the lost son of my past]. Pd has seven words here and Pi three. The weight difference is four, and the absolute size of Pi is three. This weight difference exceeds the size of Pi, so short before long is predicted, that is [V Pi Pd]. When the weight difference is less than Pi, then [V Pd Pi] is preferred, for example count [on him] [in old age] (weight difference = 1, Pi = 3). With weight differences equal to Pi, both orders are possible.

The results are set out in (11.15). I first give the total correct for the 206 sequences of Table 11.4 (a full 90 per cent), along with figures from the remaining 10 per cent that come within one word per domain of being correct. I then give separate figures for the conflict cases in which Pd > Pi. The success rate here is 83 per cent, and this jumps to 97 per cent for sequences that come within one word per domain of the preferred TDD.

(11.15) Results for Table 11.4
    Total with preferred orders = 90 per cent (185/206)
    Additional total within 1 word per domain of preferred TDD = 95 per cent (196/206)
    Total correct in conflict cases = 83 per cent (59/71)
    Additional total within 1 word per domain of preferred TDD = 97 per cent (69/71)

These figures provide encouraging support for this predictive multi-domain approach to adjacency and relative ordering.11

11 See Hawkins (2004) for further illustration and testing of these total domain differential predictions in a variety of other structural types.
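The arithmetic behind the predictions in (11.14) can be sketched in a few lines of Python. This is only an illustrative reading of the text, not a published implementation: it assumes, as the discussion above does, that the LD differential in favour of [V Pd Pi] equals the size of Pi in words, that the PCD differential equals the weight difference between the two PPs, and that words in both domains count equally. The function name and its return format are hypothetical.

```python
def tdd_prediction(pd_len, pi_len):
    """Return (tdd, predicted_order) for a [V PP PP] sequence.

    pd_len: length in words of the lexically dependent PP (Pd)
    pi_len: length in words of the independent PP (Pi)
    A positive TDD favours [V Pd Pi]; a negative one favours [V Pi Pd].
    """
    # Lexical domain: putting Pd next to V saves the whole of Pi (assumption from the text).
    ld_diff = pi_len
    # Phrasal combination domain: short-before-long saves the weight difference.
    # Positive when Pd is the shorter PP, negative when Pd is the longer one.
    pcd_diff = pi_len - pd_len
    tdd = ld_diff + pcd_diff
    if tdd > 0:
        order = "[V Pd Pi]"
    elif tdd < 0:
        order = "[V Pi Pd]"
    else:
        order = "either"
    return tdd, order


# count [on the lost son of my past] [in old age]: Pd = 7 words, Pi = 3 words
print(tdd_prediction(7, 3))
# count [on him] [in old age]: Pd = 2 words, Pi = 3 words
print(tdd_prediction(2, 3))
```

Under these assumptions the conflict case reduces to comparing the weight difference with the size of Pi, which is exactly the comparison stated in (11.14) for Pd > Pi.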
11.4 Minimal domains in grammars

Grammatical conventions across languages reveal the same degrees of preference for minimal phrasal combination domains that we saw in the performance data of Section 11.3.1. For example, the Greenbergian word order correlations show that the adjacency of lexical head categories is massively preferred over their non-adjacency (Greenberg 1963; Hawkins 1983; Dryer 1992). EIC predicts these correlations (cf. Hawkins 1990, 1994). Two of them are presented in (11.16) and (11.17), with IC-to-word ratios given for each order in (11.16).12 Example (11.16) shows a correlation between verb-initial order and prepositions, and between verb-final and postpositions (i.e. phrases corresponding to [went [to the store]] versus [[the store to] went]). Example (11.17) shows one between prepositions and nouns preceding possessive (genitive) phrases and between postpositions and nouns following (corresponding to [in [books of my professor]] versus [[my professor of books] in]).

(11.16)
    a. vp[V pp[P NP]] = 161 (41%)    IC-to-word: 2/2 = 100%
    b. [[NP P]pp V]vp = 204 (52%)    IC-to-word: 2/2 = 100%
    c. vp[V [NP P]pp] = 18 (5%)      IC-to-word: 2/4 = 50%
    d. [pp[P NP] V]vp = 6 (2%)       IC-to-word: 2/4 = 50%
    Assume: V = 1 word; P = 1; NP = 2
    EIC-preferred (16a) + (16b) = 365/389 (94%)

(11.17)
    a. pp[P np[N Possp]] = 134 (40%)
    b. [[Possp N]np P]pp = 177 (53%)
    c. pp[P [Possp N]np] = 14 (4%)
    d. [np[N Possp] P]pp = 11 (3%)
    EIC-preferred (17a) + (17b) = 311/336 (93%)

12 The quantitative data in (11.16) are taken from Matthew Dryer’s sample, measuring languages rather than genera (see Dryer 1992, Hawkins 1994: 257). The quantitative data in (11.17) come from Hawkins (1983, 1994: 259).

The adjacency of V and P, and of P and N, guarantees the shortest possible domain for the recognition and production of the two ICs in question (V and PP within VP, P and NP within PP). Two adjacent words suffice, hence 100 per cent IC-to-word ratios. In the non-adjacent domains of the (c) and (d) orders, ratios are significantly lower and exemplifying languages are significantly fewer. The preferred (a) and (b) structures collectively account for 94 per cent and 93 per cent of all languages respectively. Patterns like these have motivated the head-initial (or VO) and head-final (OV) parameter in both typological and generative research, see, for example,
Vennemann (1974), Lehmann (1978), Hawkins (1983), and Travis (1984, 1989). The two language types are mirror images of one another, and EIC provides an explanation: both (a) and (b) are optimally efficient.

Grammatical conventions also reveal a preference for orderings in proportion to the number of combinatorial and dependency relations whose processing domains can be minimized (recall (11.6)). Complements prefer adjacency to heads over adjuncts in the basic ordering rules of numerous phrases in English and other languages and are generated in a position adjacent to the head in the phrase structure grammars of Jackendoff (1977) and Pollard and Sag (1987). Tomlin’s (1986) verb object bonding principle supports this. Verbs and direct objects are regularly adjacent across languages and there are languages in which it is impossible or highly dispreferred for adjuncts to intervene between a verbal head and its subcategorized object complement. The basic reason I offer is that complements also prefer adjacency over adjuncts in performance (cf. 11.3.3), and the explanation for this, in turn, is that there are more combinatorial and/or dependency relations linking complements to their heads than link adjuncts to their heads. Complements are listed in a lexical co-occurrence frame defined by, and activated in on-line processing by, a specific head such as a verb, and processing this co-occurrence favours a minimal lexical domain (11.11). There are more productive relations of semantic and syntactic interdependency between heads and complements than between heads and adjuncts. A direct object receives its theta-role from the transitive verb, and so on.
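The IC-to-word ratios quoted in (11.16) can be reproduced with a short Python sketch. It is a simplified illustration rather than Hawkins's own procedure: it assumes the word counts stated in (11.16) (V = 1, P = 1, NP = 2), treats V and P as the words that construct the two ICs of VP, and measures the domain as the stretch of words from one of these to the other. The function and variable names are hypothetical.

```python
# Word lengths assumed in (11.16).
LENGTHS = {"V": 1, "P": 1, "NP": 2}


def ic_to_word_ratio(order, lengths=LENGTHS):
    """Ratio of ICs (here always 2: V and PP) to words in the VP domain.

    order: linear sequence of categories, e.g. ("V", "P", "NP").
    The domain runs from the constructing word V to the constructing
    word P, or vice versa, whichever comes first.
    """
    spans = {}
    pos = 0
    for cat in order:
        # Record the first and last word position occupied by this category.
        spans[cat] = (pos + 1, pos + lengths[cat])
        pos += lengths[cat]
    start = min(spans["V"][0], spans["P"][0])
    end = max(spans["V"][1], spans["P"][1])
    domain_words = end - start + 1
    return 2 / domain_words


orders = {
    "a. vp[V pp[P NP]]": ("V", "P", "NP"),
    "b. [[NP P]pp V]vp": ("NP", "P", "V"),
    "c. vp[V [NP P]pp]": ("V", "NP", "P"),
    "d. [pp[P NP] V]vp": ("P", "NP", "V"),
}
for label, order in orders.items():
    print(label, f"{ic_to_word_ratio(order):.0%}")
```

With these lengths the adjacent orders (a) and (b) need a two-word domain and the non-adjacent orders (c) and (d) a four-word domain, which matches the 100 per cent and 50 per cent figures given in (11.16).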
These considerations suggest that domain minimization has also shaped grammars and the evolution of grammatical conventions, according to the following hypothesis:

(11.18) Performance-grammar correspondence hypothesis (PGCH)
Grammars have conventionalized syntactic structures in proportion to their degree of preference in performance, as evidenced by patterns of selection in corpora and by ease of processing in performance.

It follows from the PGCH that performance principles can often explain what purely grammatical models can only stipulate, in this context adjacency effects and the head ordering parameter. Significantly, they can also explain exceptions to these stipulations, as well as many grammatically unpredicted regularities. For example, Dryer (1992) has shown that there are systematic exceptions to Greenberg’s correlations ((11.16)/(11.17)) and to consistent head ordering when the non-head is a single-word item, for example an adjective modifying a noun (yellow book). Many otherwise head-initial languages have non-initial heads here (English), while many otherwise head-final languages
have noun before adjective (Basque). But when the non-head is a branching phrase, there are good correlations with the predominant head ordering position. EIC can explain this asymmetry. When a head category like N (book) has a branching phrasal sister like Possp {of, the professor} within NP, the distance from N to the head category P or V that constructs the next higher phrase, PP or VP respectively, will be long when head orderings are inconsistent, see, for example, (11.17c) and (11.17d). If the intervening category is a non-branching single word, then the difference between pp[P [Adj N]np] and pp[P np[N Adj]] is small, only one word. Hence the MiD preference for noun initiality (and for noun finality in postpositional languages) is significantly less than it is for intervening branching sisters, and either less head ordering consistency or no consistency is predicted. When there is just a one-word difference between competing domains in performance, in for example Table 11.1, both ordering options are generally productive, and so too in grammars.

Many such universals can be predicted from performance preferences, including structured hierarchies of centre-embedded constituents and of filler-gap dependencies, markedness hierarchies, symmetries versus asymmetries, and many morphosyntactic regularities (Hawkins 1994, 1999, 2001, 2003, 2004).

A model of grammar that seems ideally suited to capturing this performance-grammar correspondence is S(tochastic) O(ptimality) T(heory), cf. Bresnan et al. (2001), Manning (2003). These authors point to the performance preference for first and second person subjects in English (I was hit by the bus) over third person subjects (the bus hit me), which has been conventionalized into an actual grammaticality distinction in the Salish language Lummi. SOT models this by building performance preferences directly into the grammar as a probability ranking relative to other OT constraints.
For English there is a partial overlap with other ranked constraints, and non-first person subjects can surface as grammatical. In Lummi there is no such overlap and sentences corresponding to the bus hit me are not generated. In the process, however, SOT stipulates a stochastic distribution between constraint rankings within the grammar, based on observed frequencies in performance. We could formulate a similar stochastic model for the phrase structure adjacencies and lexical co-occurrences of this paper. But there are good reasons not to do so. First, SOT would then stipulate what is predictable from the performance principle of MiD (11.2). Second, the grammatical type of the syntactic and semantic relation in question does not enable us to predict the outcome of the constraint
interaction. What matters is the size of the domain that a given relation happens to require in a given sentence and its degree of minimization over a competitor. One and the same grammatical relation can have different strengths in different sentences (as a function of the weight differences between sisters, for example). And phrasal combination processing can be a stronger force for adjacency than lexical dependency in some sentences, but weaker in others. In other words, it is processing, not grammar, that makes predictions for performance, and it would be unrevealing to model this as an unexplained stochastic distribution in a grammar, when there is a principled account in terms of MiD. And third, I would argue that performance preferences have no place in a grammar anyway, whose primary function is to describe the grammatical conventions of the relevant language. To do so is to conflate, and confuse, explanatory questions of grammatical evolution and synchronic questions of grammaticality prediction. The soft constraint/hard constraint insight is an important one, and it fits well with the PGCH (11.18), but hard constraints can be explained without putting soft constraints into the same grammar, and the soft constraints require a processing explanation, not a grammatical one.
11.5 Conclusions

The data considered in this paper lead to the conclusions summarized in (11.1). First, there are clear preferences among competing and grammatically permitted structures in the corpus data of English (Tables 11.1, 11.3, and 11.4) and Japanese (Tables 11.2 and 11.5). These preferences constitute a set of ‘gradedness effects’ and they can be argued to result from minimization differences in processing domains, cf. minimize domains (11.2). MiD defines a cumulative benefit for minimality when the same terminal elements participate in several processing domains. The English intransitive verbs and PPs that contract relations of both phrasal sisterhood and of lexical combination exhibit 96 per cent adjacency; those that involve only one or the other relation exhibit significantly less (cf. 11.3.3). The relative strengths in these cases reflect the degree of minimization distinguishing competing orders in the processing of each relation. These cumulative effects are captured in a quantitative metric that measures total domain differentials (or TDDs) across structures (11.12). The metric measures domain sizes in words, but could easily be adapted to quantify a more inclusive node count, or a count in terms of phrasal nodes only.
Experimental findings are generally correlated with corpus frequencies for these ordering phenomena, see, for example, Stallings (1998), Wasow (2002), and Yamashita and Chang (2001) for weight effects. Acceptability intuitions, on the other hand, appear capable of detecting only the extremes of preference and dispreference. My own judgements on English could not have produced these preferences in advance, and disagreements among linguists outside the extremes of clear grammaticality and ungrammaticality are legion.

Second, domain minimization preferences in performance have also shaped grammars and the evolution of grammatical conventions, according to the performance-grammar correspondence hypothesis (11.18). Computer simulations of the evolution of some of these conventions out of performance preferences can be found in Kirby (1999).

Third, this kind of ‘efficiency-based’ theory of performance can give a principled explanation for grammatical principles and universals, such as head ordering, subjacency hierarchies and numerous other phenomena. It can explain exceptions to these stipulations, such as the absence of consistent head ordering with single-word sisters of heads. And it can give a theoretical reason for expecting some typological variants rather than others, by observing patterns of preference in languages with variation, like the postverbal orderings of English and the preverbal orderings of Japanese, and formulating performance principles in conjunction with the PGCH. I have argued that S(tochastic) O(ptimality) T(heory), despite its attractive premise that hard constraints mirror soft constraints, stipulates the performance preferences that need to be explained, and since they are not explained, the explanation for grammars and grammatical variation is weakened. The frequency data of this paper are patterned and principled.
It remains to be seen whether other stochastic distributions can be similarly explained, or whether there are simply certain ‘conventions of use’ in different speech communities that have to be learned.13 The person ranking constraint of Bresnan et al. (2001) and Manning (2003) correlates with, and suggests an explanation in terms of, degrees of accessibility (Ariel 1990) in noun phrase references in performance. Interesting test cases here would involve languages with similar typologies but different distributions for, for example, passive versus active, or different scrambling or extraposition frequencies. Relative or absolute values for these frequencies may not be predictable, but you will not know this if you look only at the grammar of the preference (first and second person subjects are preferred over third persons, etc.) instead of its processing. Should such distributions turn out not to be predictable from efficiency principles of performance, they should still not be included in a grammar, if their stochastic ranking is not explainable by grammatical principles either, which it almost certainly will not be.

Fourth and finally, we need a genuine theory of performance and of the human processing architecture from which frequencies can be derived, and I have argued (Hawkins 2004) that we do not yet have the kind of general architecture that we need. MiD is an organizing principle with some predictiveness, but it too must be derivable from this architecture. We also need the best model of grammatical description we can get, incorporating relevant conventions in languages that have them, and defining the differences in grammaticality between languages in the best possible way. The further ingredient that is ultimately needed in the explanatory package is a diachronic model of adaptation and change, of the type outlined in Haspelmath (1999) and Kirby (1999).

13 See Hawkins (2003, 2004) for detailed discussion of frequency distributions and their grammatical counterparts in numerous areas, in terms of minimize domains (11.2), in conjunction with two further principles of efficiency, minimize forms, and maximize on-line processing.
12 Probabilistic Grammars as Models of Gradience in Language Processing
MATTHEW W. CROCKER AND FRANK KELLER
12.1 Introduction

Gradience in language comprehension can be manifest in a variety of ways, and have various sources of origin.1 Based on theoretical and empirical results, one possible way of classifying such phenomena is whether they arise from the grammaticality of a sentence, perhaps reflecting the relative importance of various syntactic constraints, or arise from processing, namely the mechanisms which exploit our syntactic knowledge for incrementally recovering the structure of a given sentence. Most of the chapters in this volume are concerned with the former: how to characterize and explain the gradient grammaticality of a given utterance, as measured, for example, by judgements concerning acceptability. While the study of gradient grammaticality has a long history in the generative tradition (Chomsky 1964, 1975), recent approaches such as the minimalist programme (Chomsky 1995) do not explicitly allow for gradience as part of the grammar.

1 The authors would like to thank the volume editors, the anonymous reviewers, and also Marshall Mayberry for their helpful comments. The authors gratefully acknowledge the support of the German Research Foundation (DFG) through SFB 378 Project ‘Alpha’ awarded to the first author, and an Emmy Noether fellowship awarded to the second author.

In this chapter, we more closely consider the phenomena of gradient performance: how can we explain the variation in processing difficulty, as reflected for example in word-by-word reading times? Psycholinguistic research has identified two key sources of processing difficulty in sentence comprehension: local ambiguity and processing load. In the case of local, or temporary ambiguity, there is abundant evidence that people adopt some preferred interpretation immediately, rather than delaying interpretation. Should the corresponding
syntactic analysis be disconfirmed by the sentence’s continuation, reanalysis is necessary, and is believed to be an important contributor to observable difficulties in processing. It is also the case, however, that processing difficulties are found in completely unambiguous utterances, such as centre embedded structures. One explanation of such effects is that, despite being both grammatical and unambiguous, such sentences require more cognitive processing resources (such as working memory) than are available.

While these phenomena have been well studied, both empirically and theoretically, there has been little attempt to model relative processing difficulty: why some sentences are more difficult than others, and precisely how difficult they are. Quantitative models, which can predict real-valued behavioural measures, are even less common. We argue, however, that one relatively new class of models offers considerable promise in addressing this issue. The common distinguishing feature of the models we discuss here is that they are experience-based. The central idea behind experience-based models is that the mechanisms which people use to arrive at an incremental interpretation of a sentence are crucially dependent on relevant prior experience. Generally speaking, interpretations which are supported by our prior experience are preferred to those which are not. Furthermore, since experience is generally encoded in models as some form of relative likelihood, or activation, it is possible for models to generate real-valued, graded predictions about the processing difficulty of a particular sentence.

We begin by reviewing some of the key psycholinguistic evidence motivating the need for experience-based mechanisms, before turning to a discussion of recent models. We focus our attention here on probabilistic models of human sentence processing, which attempt to assign a probability to a given sentence, as well as to alternative parse interpretations for that sentence.
Finally, we will discuss the relationship between probabilistic models of performance (gradient processing complexity), and probabilistic models of competence (gradient grammaticality). A crucial consequence of the view we propose is that the likelihood of a (partial) structure is only meaningful relative to the likelihood of competing (partial) structures, and does not provide an independently useful characterization of the grammaticality of the alternatives. Thus we argue that a probabilistic characterization of gradient grammaticality should be quite different from a probabilistic performance model.
12.2 The role of experience in sentence processing

People are continually faced with the problem of resolving the ambiguities that occur in the language they hear and read (Altmann 1998). Computational theories of human language comprehension therefore place much emphasis
on the algorithms for constructing syntactic and semantic interpretations, and the strategies for deciding among alternatives, when more than one interpretation is possible (Crocker 1999). The fact that people understand language incrementally, integrating each word into their interpretation of the sentence as it is encountered, means that people are often forced to resolve ambiguities before they have heard the entire utterance. While it is clear that many kinds of information are involved in ambiguity resolution (Gibson and Pearlmutter 1998), much attention has recently been paid to the role of linguistic experience. That is to say, to what extent do the mechanisms underlying human language comprehension rely on previous linguistic encounters to guide them in resolving an ambiguity they currently face? During his or her lifetime, the speaker of a language accrues linguistic experience. Certain lexical items are encountered more often than others, some syntactic structures are used more frequently, and ambiguities are often resolved in a particular manner. In lexical processing, for example, the influence of experience is clear: high frequency words are recognized more quickly than low frequency ones (Grosjean 1980), syntactically ambiguous words are initially perceived as having their most likely part of speech (Crocker and Corley 2002), and semantically ambiguous words are associated with their more frequent sense (Duffy et al. 1988). Broadly, we define a speaker’s linguistic experience with a given linguistic entity as the number of times the speaker has encountered this entity in the past. Accurately measuring someone’s linguistic experience would (in the limit) require a record of all the text or speech that person has ever been exposed to. Additionally, there is the issue of how experience is manifest in the syntactic processing mechanism. 
The impracticality of keeping such a record has led to alternative proposals for approximating linguistic experience, such as norming experiments or corpus studies. Verb frames are an instance of linguistic experience whose influence on sentence processing has been researched extensively in the literature. The frames of a verb determine the syntactic complements it can occur with. For example, the verb know can appear with a sentential complement (S frame) or with a noun phrase complement (NP frame). Norming studies can be conducted in which subjects are presented with fragments such as (12.1) and complete them to form full sentences.

(12.1) The teacher knew ____.

Subjects might complete the fragment using the answer (NP frame) or the answer was false (S frame). Verb frame frequencies can then be estimated as the frequencies with which subjects use the S frame or the NP frame (Garnsey
et al. 1997). An alternative to the use of completion frequencies is the use of frequencies obtained in a free production task, where subjects are presented only with a verb, and are asked to produce a sentence incorporating this verb (Connine et al. 1984). An alternative technique is to extract frequency information from a corpus, a large electronic collection of linguistic material. A balanced corpus (Burnard 1995; Francis et al. 1982), which contains representative samples of both text and speech in a broad range of genres and styles, is often assumed to provide an approximation of human linguistic experience. In our examples, all instances of know could be extracted from a corpus, counting how often the verb occurs with the NP and the S frame. Additionally, however, there is the issue of how experience is manifest in the syntactic processing mechanism. A simple frequentist approach would mean that all our experience has equal weight, whether an instance of exposure occurred ten seconds ago, or ten years ago. This is true for the kinds of probabilistic models we discuss here. Thus an interesting difference between corpus estimates and norming studies is that the former approximates the experience presented to a speaker, while the latter reflects the influence of that experience on a speaker’s preferences. Results in the literature broadly indicate that frame frequencies obtained from corpora and norming studies are reliably correlated (Lapata et al. 2001; Sturt et al. 1999). It should be borne in mind, however, that corpus frequencies vary as a function of the genre of the corpus (Roland and Jurafsky (1998) compared text and speech corpora) and also verb senses play a role (Roland and Jurafsky 2002). Once language experience has been measured using norming or corpus studies, the next step is to investigate how the human language processor uses experience to resolve ambiguities in real time. 
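As a rough illustration of the corpus-based estimate described above, the following Python sketch turns a list of annotated verb occurrences into relative frame frequencies. The observations are invented for the example and are not taken from any corpus or norming study; the function name and the (lemma, frame) representation are likewise assumptions. Every observation counts equally, as in the simple frequentist approach mentioned in the text.

```python
from collections import Counter


def frame_frequencies(observations, verb):
    """Relative frequency of each frame for one verb.

    observations: iterable of (verb_lemma, frame) pairs, e.g. ("know", "S").
    Every occurrence has equal weight, regardless of when it was encountered.
    """
    counts = Counter(frame for lemma, frame in observations if lemma == verb)
    total = sum(counts.values())
    if total == 0:
        return {}  # the verb was never observed
    return {frame: n / total for frame, n in counts.items()}


# Hypothetical annotated occurrences; not real corpus counts.
observations = [
    ("know", "S"),
    ("know", "NP"),
    ("know", "S"),
    ("know", "S"),
    ("shoot", "NP"),
]
print(frame_frequencies(observations, "know"))
```

Whether counts should be pooled over a lemma, as here, or kept separate for each verb form is itself an open modelling choice.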
A number of studies have demonstrated the importance of lexical frequencies. These frequencies can be categorical (e.g. the most frequent part of speech for an ambiguous word, Crocker and Corley 2002), morphological (e.g. the tendency of a verb to occur in a particular tense, Trueswell 1996), syntactic (e.g. the tendency of a verb to occur with a particular frame, as discussed above, Ford et al. 1982; Garnsey et al. 1997; Trueswell et al. 1993), or semantic (e.g. the tendency of a noun to occur as the object of a particular verb, Garnsey et al. 1997; McRae et al. 1998; Pickering et al. 2000). It has been generally argued that these different types of lexical frequencies form a set of interacting constraints that determine the preferred parse for a given sentence (MacDonald 1994; MacDonald et al. 1994; Trueswell and Tanenhaus 1994). Other researchers (Brysbaert and Mitchell 1996; Mitchell et al. 1996) have taken the stronger view that the human parser not only makes use of lexical
frequencies, but also keeps track of structural frequencies. This view, known as the tuning hypothesis, states that the human parser deals with ambiguity by initially selecting the syntactic analysis that has worked most frequently in the past (see Figure 12.1). The fundamental question that underlies both lexical and structural experience models is the grain problem: What is the level of granularity at which the human sentence processor ‘keeps track’ of frequencies? Does it count lexical frequencies or structural frequencies (or both), or perhaps frequencies at an intermediate level, such as the frequencies of individual phrase structure rules? The latter assumption underlies a number of experience-based models that are based on probabilistic context free grammars (see Figure 12.2 for details). Furthermore, at the lexical level, are frame frequencies for verb forms counted separately (e.g. know, knew, knows, . . . ) or are they combined into a set of total frequencies for the verb’s base form (the lemma KNOW) (Roland and Jurafsky 2002)?

[Figure 12.1: parse tree for Someone shot the servant of the actress who . . . , in which the relative clause (RC) can attach either to the servant (marked ‘Spanish’) or to the actress (marked ‘English’).]

Figure 12.1 Evidence from relative clause (RC) attachment ambiguity has been taken to support an experience-based treatment of structural disambiguation. Such constructions are interesting because they do not hinge on lexical preferences. When reading sentences containing the ambiguity depicted above, English subjects demonstrate a preference for low-attachment (where the actress will be further described by the RC who . . . ), while Spanish subjects, presented with equivalent Spanish sentences, prefer high-attachment (where the RC concerns the servant) (Cuetos and Mitchell 1988). The Tuning Hypothesis was proposed to account for these findings (Brysbaert and Mitchell 1996; Mitchell et al. 1996), claiming that initial attachment preferences should be resolved according to the more frequent structural configuration. Later experiments further tested the hypothesis, examining subjects’ preferences before and after a period of two weeks in which exposure to high or low examples was increased. The findings confirmed that even this brief period of variation in ‘experience’ influenced the attachment preferences as predicted (Cuetos et al. 1996).
12.3 Probabilistic models of sentence processing

Theories of human syntactic processing have traditionally down-played the importance of frequency (Fodor and Frazier 1978; Marcus 1980; Pritchett 1992), focusing rather on the characterization of more general, sometimes language universal, processing mechanisms (Crocker 1996). An increasing number of models, however, incorporate aspects of linguistic experience in some form or other. This is conceptually attractive, as an emphasis on experience may help to explain some of the rather striking, yet often unaddressed, properties of human sentence processing:

• Efficiency: The use of experience-based heuristics, such as choosing the reading that was correct most often in the past, helps explain rapid and seemingly effortless processing, despite massive ambiguity.
• Coverage: In considering the full breadth of what occurs in linguistic experience, processing models will be driven to cover more linguistic phenomena, and may look quite different from the toy models which are usually developed.
• Performance: Wide-coverage experience-based models can offer an explanation of how people rapidly and accurately understand most of the language they encounter, while also explaining the kinds of pathologies which have been the focus of most experimental and modelling research.
• Robustness: Human language processing is robust to slips of the tongue, disfluencies, and minor ungrammaticalities. The probabilistic mechanisms typically associated with experience-based models can often provide sensible interpretations even in the face of such noise.
• Adaptation: The human language processor is finely tuned to the linguistic environment it inhabits. This adaptation is naturally explained if processing mechanisms are the product of learning from experience.

Approaches in the literature differ substantially in how they exploit linguistic experience.
Some simply permit heterogeneous linguistic constraints to have ‘weights’ which are determined by frequency (MacDonald et al. 1994; Tanenhaus et al. 2000), others provide probabilistic models of lexical and syntactic processing (Crocker and Brants 2000; Jurafsky 1996), while connectionist
models present yet a further paradigm for modelling experience (Christiansen and Chater 1999, 2001; Elman 1991, 1993). Crucially, however, whether experience is encoded via frequencies, probabilities, or some notion of activation, all these approaches share the idea that sentences and their interpretations will be associated with some real-valued measure of goodness: namely how likely or plausible an interpretation is, based on our prior experience. The appeal of probabilistic models is that they acquire their parameters from data in their environment, offering a transparent relationship between linguistic experience and a model’s behaviour. The probabilities receive a cognitive interpretation; typically a high probability is assumed to correlate with a low processing effort. This suggests that the human sentence processor will prefer the structure with the lowest processing effort when faced with a syntactic ambiguity (see Figure 12.1 for an example). Before considering probabilistic models of human processing in more detail, we first quickly summarize the ideas that underlie probabilistic parsing.

12.3.1 Probabilistic grammars and parsing

A probabilistic grammar consists of a set of symbolic rules (e.g. context free grammar rules) annotated with rule application probabilities. These probabilities can then be combined to compute the overall probability of a sentence, or of a particular syntactic analysis of a sentence. The rule probabilities are typically derived from a corpus, that is, a large, annotated collection of text or speech. In cognitive terms, the corpus can be regarded as an approximation of the language experience of the user; the probabilities are a reflection of language use, that is, they provide a model of linguistic performance. Many probabilistic models of human sentence processing are based on the framework of probabilistic context free grammars (PCFGs, see Manning and Schütze 1999, for an overview).
PCFGs augment standard context free grammars by annotating grammar rules with rule probabilities. A rule probability expresses the likelihood of the left-hand side of the rule expanding to its right-hand side. As an example, consider the rule VP → V NP in Figure 12.2(a). This rule says that a verb phrase expands to a verb followed by a noun phrase with a probability of 0.7. In a PCFG, the probabilities of all rules with the same left-hand side have to sum to one:

(12.2) $\forall i \quad \sum_{j} P(N^i \to \zeta^j) = 1$

where $P(N^i \to \zeta^j)$ is the probability of a rule with the left-hand side $N^i$ and the right-hand side $\zeta^j$. For example, in Figure 12.2(a) the two rules VP → V NP and VP → VP PP share the same left-hand side (VP), so their probabilities sum to one.
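The normalization condition in (12.2) can be checked mechanically. The following is a minimal sketch in Python, assuming the rules of Figure 12.2(a) are transcribed into a dictionary; the names `PCFG`, `lhs_totals`, and `is_normalized` are illustrative choices and do not come from the chapter.

```python
import math
from collections import defaultdict

# Rules of the toy grammar in Figure 12.2(a), transcribed by hand:
# (left-hand side, right-hand side) -> rule probability.
PCFG = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Det", "N")): 0.6,
    ("NP", ("NP", "PP")): 0.2,
    ("NP", ("John",)): 0.2,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("VP", "PP")): 0.3,
    ("PP", ("P", "NP")): 1.0,
    ("V", ("hit",)): 1.0,
    ("N", ("man",)): 0.5,
    ("N", ("book",)): 0.5,
    ("P", ("with",)): 1.0,
    ("Det", ("the",)): 1.0,
}


def lhs_totals(grammar):
    """Sum the rule probabilities separately for each left-hand side."""
    totals = defaultdict(float)
    for (lhs, _rhs), prob in grammar.items():
        totals[lhs] += prob
    return dict(totals)


def is_normalized(grammar, tol=1e-9):
    """True if every left-hand side's rule probabilities sum to one, as in (12.2)."""
    return all(math.isclose(total, 1.0, abs_tol=tol)
               for total in lhs_totals(grammar).values())


print(is_normalized(PCFG))
```

The tolerance is there only because the probabilities are stored as floating point numbers; the condition itself is exact.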
The probability of a parse tree generated by a PCFG is computed as the product of its rule probabilities:

(12.3) $P(t) = \prod_{(N \to \zeta) \in R} P(N \to \zeta)$
where R is the set of all rules applied in generating the parse tree t (a rule that is applied more than once contributes one factor per application). It has been suggested that the probability of a grammar rule models how easily this rule can be accessed by the human sentence processor (Jurafsky 1996). Structures with greater overall probability should be easier to construct, and therefore preferred in cases of ambiguity. As an example consider the PCFG in Figure 12.2(a). This grammar generates two parses for the sentence John hit the man with the book. The first parse $t_1$ attaches the prepositional phrase with the book to the noun phrase (low attachment), see Figure 12.2(b). The PCFG assigns $t_1$ the following probability, computed as the product of the probabilities of the rules used in this parse:

(12.4) $P(t_1) = 1.0 \times 0.2 \times 0.7 \times 1.0 \times 0.2 \times 0.6 \times 1.0 \times 1.0 \times 0.5 \times 1.0 \times 0.6 \times 1.0 \times 0.5 = 0.00252$
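The product in (12.3) can be computed directly from a tree. The sketch below assumes trees are encoded as nested tuples and that the rule probabilities are those of Figure 12.2(a); the encoding and the names `RULE_PROB`, `rules_used`, and `tree_probability` are illustrative assumptions. It reproduces the value in (12.4) and also evaluates the high-attachment parse for comparison.

```python
import math

# Rule probabilities transcribed from Figure 12.2(a).
RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Det", "N")): 0.6,
    ("NP", ("NP", "PP")): 0.2,
    ("NP", ("John",)): 0.2,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("VP", "PP")): 0.3,
    ("PP", ("P", "NP")): 1.0,
    ("V", ("hit",)): 1.0,
    ("N", ("man",)): 0.5,
    ("N", ("book",)): 0.5,
    ("P", ("with",)): 1.0,
    ("Det", ("the",)): 1.0,
}

# Trees as nested tuples (label, child, ...); leaves are plain strings.
PP_WITH_THE_BOOK = ("PP", ("P", "with"), ("NP", ("Det", "the"), ("N", "book")))
NP_THE_MAN = ("NP", ("Det", "the"), ("N", "man"))

# Low attachment: the PP modifies the noun phrase.
T1 = ("S", ("NP", "John"),
      ("VP", ("V", "hit"), ("NP", NP_THE_MAN, PP_WITH_THE_BOOK)))

# High attachment: the PP modifies the verb phrase.
T2 = ("S", ("NP", "John"),
      ("VP", ("VP", ("V", "hit"), NP_THE_MAN), PP_WITH_THE_BOOK))


def rules_used(tree):
    """Yield one (lhs, rhs) pair per rule application, repeats included."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield (label, rhs)
    for child in children:
        if not isinstance(child, str):
            yield from rules_used(child)


def tree_probability(tree, rule_prob=RULE_PROB):
    """Product of the probabilities of all rule applications, as in (12.3)."""
    return math.prod(rule_prob[rule] for rule in rules_used(tree))


print(round(tree_probability(T1), 5), round(tree_probability(T2), 5))
```

Each tree contains thirteen rule applications, which is why thirteen factors appear in the worked product.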
The alternative parse $t_2$, with the prepositional phrase attached to the verb phrase (high attachment, see Figure 12.2(c)) has the following probability:

(12.5) $P(t_2) = 1.0 \times 0.2 \times 0.3 \times 0.7 \times 1.0 \times 1.0 \times 0.6 \times 1.0 \times 0.6 \times 1.0 \times 0.5 \times 1.0 \times 0.5 = 0.00378$

Under the assumption that the probability of a parse is a measure of processing effort, we predict that $t_2$ (high attachment) is easier to process than $t_1$, as it has a higher probability.

(a)
Rule            Probability
S → NP VP       1.0
NP → Det N      0.6
NP → NP PP      0.2
NP → John       0.2
VP → V NP       0.7
VP → VP PP      0.3
PP → P NP       1.0
V → hit         1.0
N → man         0.5
N → book        0.5
P → with        1.0
Det → the       1.0

(b) [S 1.0 [NP 0.2 John] [VP 0.7 [V 1.0 hit] [NP 0.2 [NP 0.6 [Det 1.0 the] [N 0.5 man]] [PP 1.0 [P 1.0 with] [NP 0.6 [Det 1.0 the] [N 0.5 book]]]]]]

(c) [S 1.0 [NP 0.2 John] [VP 0.3 [VP 0.7 [V 1.0 hit] [NP 0.6 [Det 1.0 the] [N 0.5 man]]] [PP 1.0 [P 1.0 with] [NP 0.6 [Det 1.0 the] [N 0.5 book]]]]]

Figure 12.2 An example for the parse trees generated by a probabilistic context free grammar (PCFG). (a) The rules of a simple PCFG with associated rule application probabilities. (b) and (c) The two parse trees generated by the PCFG in (a) for the sentence John hit the man with the book.

In applying PCFGs to the problem of human sentence processing, an important additional property must be taken into account: incrementality. That is, people face a local ambiguity as soon as they hear the fragment John hit the man with . . . and must decide which of the two possible structures is
to be preferred. This entails that the parser is able to compute prefix probabilities for sentence initial substrings, as the basis for comparing alternative (partial) parses. Existing models provide a range of techniques for computing and comparing such parse probabilities incrementally (Brants and Crocker 2000; Hale 2001; Jurafsky 1996). For the example in Figure 12.2, however, the preference for $t_2$ would be predicted even before the final NP is processed, since the probability of that NP is the same for both structures.

Note that the move from CFGs to PCFGs also raises a number of other computational problems, such as the problem of efficiently computing the most probable parse for a given input sentence. Existing parsing schemes can be adapted to PCFGs, including shift-reduce parsing (Briscoe and Carroll 1993) and left-corner parsing (Stolcke 1995). These approaches all use the basic Viterbi algorithm (Viterbi 1967) for efficiently computing the best parse generated by a PCFG for a given sentence.

12.3.2 Probabilistic models of human behaviour

Jurafsky (1996) suggests using Bayes’ rule to combine structural probabilities generated by a probabilistic context free grammar with other probabilistic information. The model therefore integrates multiple sources of experience into a single, mathematically founded framework. As an example consider again the fragment in (12.1). When a speaker reads or hears know, he or she has the choice between two syntactic readings, involving either an S complement or an NP complement. Jurafsky’s model computes the probabilities of these two readings based on two sources of information: the overall structural probability of the S reading and the NP reading, and the lexical probability of the verb know occurring with an S or an NP frame. The structural probability of a reading is independent of the particular verb involved; the frame probability, however, varies with the verb.
This predicts that in some cases lexical probabilities can override structural probabilities. Jurafsky’s model is able to account for a range of parsing preferences reported in the psycholinguistic literature. However, it might be criticized for its limited coverage, that is for the fact that it uses only a small lexicon and grammar, manually designed to account for a handful of example sentences. In the computational linguistics literature, on the other hand, broad coverage parsers are available that compute a syntactic structure for arbitrary corpus sentences with an accuracy of about 90 per cent (Charniak 2000). Psycholinguistic models should aim for similar coverage, which is clearly part of human linguistic performance.
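The way a lexical frame probability can override a structural probability, as described above for Jurafsky's model, can be illustrated with a deliberately simplified sketch. It assumes that the two sources are simply multiplied and renormalized over the competing readings; this is an assumption made for illustration and not a description of Jurafsky's implementation. All numbers are hypothetical and invented for the example, and the function name `rank_readings` is likewise illustrative.

```python
def rank_readings(structural, frame):
    """Multiply the two probability sources and renormalize over the readings."""
    joint = {reading: structural[reading] * frame[reading] for reading in structural}
    total = sum(joint.values())
    return {reading: prob / total for reading, prob in joint.items()}


# Hypothetical values, not taken from any corpus.
structural = {"NP": 0.7, "S": 0.3}    # verb-independent structural preference
frame_verb_a = {"NP": 0.2, "S": 0.8}  # a verb that mostly takes S complements
frame_verb_b = {"NP": 0.6, "S": 0.4}  # a verb that mostly takes NP complements

for name, frame in [("verb A", frame_verb_a), ("verb B", frame_verb_b)]:
    scores = rank_readings(structural, frame)
    print(name, max(scores, key=scores.get))
```

With these invented numbers the structurally dispreferred S reading wins for the first verb, because its frame preference is strong enough to outweigh the structural bias, while the NP reading wins for the second.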
This issue has been addressed by Corley and Crocker’s (2000) broad coverage model of lexical category disambiguation. Their approach uses a bigram model to incrementally compute the probability that a string of words $w_0 \ldots w_n$ has the part of speech sequence $t_0 \ldots t_n$ as follows:

(12.6) $P(t_0 \ldots t_n, w_0 \ldots w_n) \approx \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})$

Here, $P(w_i \mid t_i)$ is the conditional probability of word $w_i$ given the part of speech $t_i$, and $P(t_i \mid t_{i-1})$ is the probability of $t_i$ given the previous part of speech $t_{i-1}$. This model capitalizes on the insight that many syntactic ambiguities have a lexical basis, as in (12.7):

(12.7) The warehouse prices/makes . . .

These fragments are ambiguous between a reading in which prices or makes is the main verb and one in which it is part of a compound noun. After being trained on a large corpus, the model predicts the most likely part of speech for prices, correctly accounting for the fact that people understand prices as a noun, but makes as a verb (Crocker and Corley 2002; Frazier and Rayner 1987; MacDonald 1993). Not only does the model account for a range of disambiguation preferences rooted in lexical category ambiguity, it also explains why, in general, people are highly accurate in resolving such ambiguities.

More recent work on broad coverage parsing models has extended this approach to full syntactic processing based on PCFGs (Crocker and Brants 2000). This research demonstrates that when such models are trained on large corpora, they are not only able to account for human disambiguation behaviour, but they are also able to maintain high overall accuracy under strict memory and incremental processing restrictions (Brants and Crocker 2000).

Finally, it is important to stress that the kind of probabilistic model we outline here emphasizes lexical and syntactic information in estimating the probability of a parse structure. To the extent that a PCFG is lexicalized, with the head of each phrase being projected upwards to phrasal nodes (Collins 1999), some semantic information may also be implicitly represented in the form of word co-occurrences (e.g. head–head co-occurrences). In addition to being incomplete models of interpretation, such lexical dependency probabilities are poor at modelling the likelihood of plausible but improbable structures.
Probabilistic parsers in their current form are therefore only appropriate for modelling syntactic processing preferences. Probabilistic models of human semantic interpretation and plausibility remain a largely unexplored area of research.
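The bigram model in (12.6) can be sketched in a few lines. The example below makes several assumptions that are not in the chapter: the probability tables are tiny and entirely hypothetical, a start tag `<s>` plays the role of $t_0$, and the best tag sequence is found by brute-force enumeration rather than by the incremental procedure of the original model. The names `joint_probability` and `best_tags` are illustrative.

```python
from itertools import product

# Hypothetical probabilities, invented for illustration only.
# EMISSION[(word, tag)] stands for P(word | tag).
EMISSION = {
    ("the", "Det"): 0.6,
    ("warehouse", "N"): 0.001,
    ("prices", "N"): 0.002,
    ("prices", "V"): 0.0005,
    ("makes", "N"): 0.0001,
    ("makes", "V"): 0.003,
}
# TRANSITION[(previous_tag, tag)] stands for P(tag | previous_tag).
TRANSITION = {
    ("<s>", "Det"): 0.5,
    ("Det", "N"): 0.9,
    ("N", "N"): 0.3,
    ("N", "V"): 0.4,
}
TAGSET = ("Det", "N", "V")


def joint_probability(words, tags, start="<s>"):
    """Product of P(w_i | t_i) * P(t_i | t_{i-1}), following (12.6)."""
    prob, previous = 1.0, start
    for word, tag in zip(words, tags):
        prob *= EMISSION.get((word, tag), 0.0) * TRANSITION.get((previous, tag), 0.0)
        previous = tag
    return prob


def best_tags(words):
    """Exhaustively search all tag sequences; only feasible for short inputs."""
    return max(product(TAGSET, repeat=len(words)),
               key=lambda tags: joint_probability(words, tags))


print(best_tags(["the", "warehouse", "prices"]))
print(best_tags(["the", "warehouse", "makes"]))
```

With these invented tables, prices comes out as a noun and makes as a verb, mirroring the preference pattern reported for (12.7); with corpus-estimated tables the outcome would of course depend on the training data.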
12.3.3 Towards quantitative models of performance

So far, probabilistic models of sentence processing have only been used to account for qualitative data about human sentence processing (e.g. to predict whether a garden path occurs). By quantifying the likelihood of competing structural alternatives, however, such models in principle offer hope for more quantitative accounts of gradient behavioural data (e.g. to predict the strength of a garden path). In general terms, this would entail that the probability assigned to a syntactic structure is to be interpreted as a measure of the degree of processing difficulty triggered by this structure. Gradient processing difficulty in human sentence comprehension can be determined experimentally, for example by recording reading times in self-paced reading studies or eyetracking experiments. An evaluation of a probabilistic model should therefore be conducted by correlating the probability predicted by the model for a given structure with reading times (and other indices of processing difficulty).

This new way of evaluating processing models raises a number of questions. Most importantly, an explicit linking hypothesis is required, specifying which quantity computed by the model would be expected to correlate with human processing data. One possible measure of processing difficulty would be the probability ratio of alternative analyses (Jurafsky 1996). That is, in addition to predicting the highest probability parse to be the easiest, we might expect the cost of switching to a less preferred parse to be correlated with the probability ratio of the preferred parse with respect to the alternative. Hale (2003) suggests an alternative, proposing that the word by word processing complexity is dominated by the amount of information the word contributes concerning the syntactic structure, as measured by entropy reduction.
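The two candidate linking quantities can be made concrete with the parse probabilities from (12.4) and (12.5). The sketch below computes the probability ratio of the preferred to the dispreferred parse, and, as a toy stand-in for entropy reduction, the drop in Shannon entropy when a two-way ambiguity is fully resolved. The entropy part is a strong simplification: it treats the two parses as the only alternatives and is not Hale's actual computation. The function names are illustrative.

```python
import math


def probability_ratio(p_preferred, p_alternative):
    """Ratio of the preferred parse's probability to that of the alternative."""
    return p_preferred / p_alternative


def entropy_bits(weights):
    """Shannon entropy in bits of the distribution obtained by normalizing weights."""
    total = sum(weights)
    return sum(-(w / total) * math.log2(w / total) for w in weights if w > 0)


p_t1, p_t2 = 0.00252, 0.00378  # values from (12.4) and (12.5)

ratio = probability_ratio(p_t2, p_t1)
uncertainty_before = entropy_bits([p_t1, p_t2])  # both parses still available
uncertainty_after = entropy_bits([p_t2])         # only one parse remains
reduction = uncertainty_before - uncertainty_after

print(round(ratio, 2), round(reduction, 3))
```

For these values the ratio is about 1.5, and the toy entropy reduction is just under one bit, since the two parses are not far from equally likely.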
Hale’s model is thus in stark contrast with the previous probabilistic parsing accounts, in that he does not assume that switching from a preferred parse to an alternative is the primary determinant of processing cost. To date, Hale’s model has been evaluated on rather different kinds of structures than the probabilistic parsers discussed above. Reconciliation of the probabilistic disambiguation versus entropy reduction approaches, and an assessment of their ability to quantitatively model reading time data, remains an interesting area for future research.

12.3.4 Evidence against likelihood in sentence processing

Experience-based models often assume some frequency-based ambiguity resolution mechanism, preferring the interpretation which has the highest
likelihood of being correct, namely the higher relative frequency. One well-studied ambiguity is prepositional phrase attachment:

(12.8) John hit the man [PP with the book ].

Numerous on-line experimental studies have shown an overall preference for high attachment, that is for the association of the PP with the verb (e.g. as the instrument of hit) (Ferreira and Clifton 1986; Rayner et al. 1983). Corpus analyses, however, reveal that low attachment (e.g. interpreting the PP as a modifier of the man) is about twice as frequent as attachment to the verb (Hindle and Rooth 1993). Such evidence presents a challenge for accounts relying exclusively on structural frequencies, but may be accounted for by lexical preferences for specific verbs (Taraban and McClelland 1988). Another problem for structural tuning comes from three-site relative clause attachments analogous to that in Figure 12.1, but containing an additional NP attachment site:

(12.9) [high The friend ] of [mid the servant ] of [low the actress ] [RC who was on the balcony ] died.

While corpus analyses suggest a preference for low > middle > high attachment (although such structures are rather rare), experimental evidence suggests an initial preference for low > high > middle (with middle being in fact very difficult) (Gibson et al. 1996a, 1996b). A related study investigating noun phrase conjunction ambiguities (instead of relative clauses) for such three-site configurations revealed a similar asymmetry between corpus frequency and human preferences (Gibson and Schütze 1999). Finally, there is recent evidence against lexical verb frame preferences:

(12.10) The athlete realized [S [NP her shoes/goals ] were out of reach ].

Reading time studies have shown an initial preference for interpreting her goals as a direct object (Pickering et al. 2000), even when the verb is more likely to be followed by a sentence complement (see also Sturt et al.
2001, for evidence against the use of such frame preferences in reanalysis). These findings might be taken as positive support for the tuning hypothesis, since object complements are more frequent than sentential complements overall (i.e. independent of the verb). Pickering et al. (2000), building on previous theoretical work (Chater et al. 1998), suggest that the parser may in fact still be using an experience-based metric, but not one which maximizes likelihood alone.
12.4 Probabilistic models of gradient grammaticality

As argued in detail in the previous section, probabilistic grammars can be used to construct plausible models of human language processing, based on the observation that the disambiguation decisions of the human parser are guided by experience. This raises the question whether experience-based models can also be developed for other forms of linguistic behaviour, such as gradient grammaticality judgements. This issue will be discussed in this section.

12.4.1 Probabilities versus degrees of grammaticality

We might want to conjecture that probabilistic models such as PCFGs can be adapted so as to account for gradient grammaticality, with probabilities being reinterpreted as degrees of grammaticality. The underlying assumption of such an approach is that language experience (approximated by the frequencies in a balanced corpus) not only determines disambiguation behaviour, but also determines (or at least influences) the way speakers make grammaticality judgements. The simplest model would be one where the probability of a syntactic structure (as estimated from a corpus) is directly correlated with its degree of grammaticality. This means that a speaker, when required to make a grammaticality judgement for a given structure, will draw on his or her experience with this structure to make this judgement. Manning (2003) outlines a probabilistic model of gradient grammaticality that comes close to this view. (However, he also acknowledges that such a model would have to take the context of an utterance into account, so as to factor out linguistically irrelevant factors, including world knowledge.) Other authors take a more sceptical view of the relationship between probability and grammaticality.
Keller (2000b), for instance, argues that the degree of grammaticality of a structure and its probability of occurrence in a corpus are two distinct concepts, and it seems unlikely they can both be modelled in the same probabilistic framework. A related point of view is put forward by Abney (1996), who states that ‘[w]e must also distinguish degrees of grammaticality, and indeed, global goodness, from the probability of producing a sentence. Measures of goodness and probability are mathematically similar enhancements to algebraic grammars, but goodness alone does not determine probability. For example, for an infinite language, probability must ultimately decrease with length, though arbitrary long sentences may be perfectly good’ (Abney 1996: 14). He also gives a number of examples for
sentences that have very improbable, but perfectly grammatical readings. A similar point is made by Culy (1998), who argues that the statistical distribution of a construction does not bear on the question of whether it is grammatical or not. Riezler (1996) agrees that probabilities and degrees of grammaticality are to be treated as separate concepts. He makes this point by arguing that, if one takes the notion of degree of grammaticality seriously for probabilistic grammars, there is no sensible application to the central problem of ambiguity resolution any more. A probabilistic grammar model cannot be trained so that the numeric value assigned to a structure can function both as a wellformedness score (degree of grammaticality) and as a probability to be used for ambiguity resolution. Keller and Asudeh (2002) present a similar argument in the context of optimality theory (OT). They point out that if an OT grammar were to model both corpus frequencies and degrees of grammaticality, then this would entail that the grammar incorporates both performance constraints (accounting for frequency effects) and competence constraints (accounting for grammaticality effects). This is highly undesirable in an OT setting, as it allows the crosslinguistic re-ranking of performance and competence constraints. Hence such a combined competence/performance grammar predicts that crosslinguistic differences can be caused by performance factors (e.g. memory limitations). Clearly, this is a counterintuitive consequence. A further objection to a PCFG approach to gradient grammaticality is that assigning probabilities to gradient structures requires the grammar to contain rules used in ‘ungrammatical’ structures. It might not be plausible to assume that such rules are part of the mental grammar of a speaker. However, any realistic grammar of naturally occurring language (i.e. 
a grammar that covers a wide range of constructions, genres, domains, and modalities) has to contain a large number of low-frequency rules anyway, simply in order to achieve broad coverage and robustness. We can therefore assume that these rules are also being used to generate structures with a low degree of grammaticality.

12.4.2 Probabilistic grammars and gradient acceptability data

The previous section reviewed a number of arguments regarding the relationship between probabilities (derived from corpora) and degrees of grammaticality. However, none of the authors cited offers any experimental results (or corpus data) to support their position; the discussion remains purely
conceptual. A number of empirical studies have recently become available to shed light on the relationship between probability and grammaticality. Keller (2003) studies the probability/grammaticality distinction based on a set of gradient acceptability judgements for word order variation in German. The data underlying this study were gathered by Keller (2000a), who used an experimental design that crossed the factors verb order (initial or final), complement order (subject first or object first), pronominalization, and context (null context, all focus, subject focus, and object focus context). Eight lexicalizations of each of the orders were judged by a total of fifty-one native speakers using a magnitude estimation paradigm (Bard et al. 1996). The results show that all of the experimental factors have a significant effect on judged acceptability, with the effects of complement order and pronominalization modulated by context. A related experiment is reported by Keller (2000b), who uses ditransitive verbs (i.e. complement orders including an indirect object) instead of transitive ones. Keller (2003) conducts a modelling study using the materials of Keller (2000a) and Keller (2000b), based on the syntactically annotated Negra corpus (Skut et al. 1997). He trains a probabilistic context-free grammar on Negra and demonstrates that the sentence probabilities predicted by this model correlate significantly with acceptability scores measured experimentally. Keller (2003) also shows that the correlation is higher if a more sophisticated lexicalized grammar model (Carroll and Rooth 1998) is used. This result is not incompatible with the claim that there is a divergence between the degree of acceptability of a sentence and its probability of occurrence, as discussed in the previous section. The highest correlation Keller (2003) reports is .64, which corresponds to 40 per cent of the variance accounted for. 
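The step from a correlation coefficient to a proportion of variance accounted for is simply squaring. The sketch below shows that arithmetic together with a plain Pearson correlation, as one might use to compare model probabilities with judgement scores; the helper names `pearson_r` and `variance_explained` are illustrative, and no experimental data are reproduced here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


def variance_explained(r):
    """Proportion of variance accounted for by a linear relationship with correlation r."""
    return r ** 2


# A correlation of .64 corresponds to roughly 40 per cent of the variance.
print(round(variance_explained(0.64), 2))
```

The same arithmetic applies to any other reported correlation: the weaker the correlation, the faster the proportion of explained variance shrinks.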
However, this is achieved on a data set (experiment 1) which contains a contrast between verb final (fully grammatical) and verb initial (fully ungrammatical) sentences; it is not surprising that a PCFG trained on a corpus of fully grammatical structures (but not on ungrammatical ones) can make this distinction and thus achieves a fairly high correlation. On a corpus of only verb final structures that show relatively small differences in acceptability (experiment 2), a much lower (though still significant) correlation of .23 is achieved. This means that the PCFG only models 5 per cent of the variance. In other words, Keller’s (2003) results indicate that the degree of grammaticality of a sentence is largely determined by factors other than its probability of occurrence (at least as modelled by a PCFG). A related result is reported by Kempen and Harbusch (2004), who again deal with word order variation in German. They compare twenty-four word orders obtained by scrambling the arguments of ditransitive verbs (all
possible argument permutations, with zero or one of the arguments pronominalized). Frequencies were obtained for these twenty-four orders from two written corpora and one spoken corpus and compared against gradient grammaticality judgements from Keller’s (2000b) study. The results are surprising in that they show that there is much less word order variation than expected; just four orders account for the vast majority of corpus instances. Furthermore, Kempen and Harbusch (2004) demonstrate what they term the frequency–grammaticality gap: all the word orders that occur in the corpus are judged as highly grammatical, but some word orders that never occur in the corpus nevertheless receive grammaticality judgements in the medium range. This result is consistent with Keller’s (2003) finding: it confirms that there is only an imperfect match between the frequency of a structure and its degree of grammaticality (as judged by a native speaker). Kempen and Harbusch (2004) explain the frequency–grammaticality gap in terms of sentence production: they postulate a canonical rule that governs word order during sentence production. The judgement patterns can then be explained with the additional assumption that the participants in a grammaticality judgement task estimate how plausible a given word order is as the outcome of incremental sentence production (governed by the canonical rule). Featherston (2004) presents another set of data that sheds light on the relationship between corpus frequency and grammaticality. The linguistic phenomenon he investigates is object co-reference for pronouns and reflexives in German (comparing a total of sixteen co-reference structures, e.g. ihni ihmi ‘him.ACC him.DAT’, ihni sichi ‘him.ACC REFL.DAT’). In a corpus study, Featherston (2004) finds that only one of these sixteen co-reference structures is reasonably frequent; all other structures occur once or zero times in the corpus. 
Experimentally obtained grammaticality data show that the most frequent structure is also the one with the highest degree of grammaticality. However, there is a large number of structures that also receive high (or medium) grammaticality judgements, even though they are completely absent from the corpus. This result is fully compatible with the frequency–grammaticality gap diagnosed by Kempen and Harbusch (2004). Like them, Featherston (2004) provides an explanation in terms of sentence production, but one that assumes a two-stage architecture. The first stage involves the cumulative application of linguistic constraints; the second stage involves the competitive selection of a surface string. Grammaticality judgements are made based on the output of the first stage (hence constraint violations are cumulative, and there are multiple output forms with a similar degree of grammaticality). Corpus data, on the other hand, are produced as the output
of the second stage (hence there is no cumulativity, and only a small number of optimal output forms can occur).
12.5 Conclusion

There is clear evidence for the role of lexical frequency effects in human sentence processing, particularly in determining lexical category and verb frame preferences. Since many syntactic ambiguities are ultimately lexically based, direct evidence for purely structural frequency effects, as predicted by the tuning hypothesis, remains scarce (Jurafsky 2002). Probabilistic accounts offer natural explanations for lexical and structural frequency effects, and a means for integrating the two using lexicalized techniques that exist in computational linguistics (e.g. Carroll and Rooth 1998; Charniak 2000; Collins 1999). Probabilistic models also offer good scalability and a transparent representation of symbolic structures and their likelihood. Furthermore, they provide an inherently gradient characterization of sentence likelihood, and the relative likelihood of alternative interpretations, promising the possibility of developing truly quantitative accounts of experimental data.

More generally, experience-based models not only offer an account of specific empirical facts, but can also be viewed as rational (Anderson 1990). That is, their behaviour typically resolves ambiguity in a manner that has worked well before, maximizing the likelihood of correctly understanding ambiguous utterances. This is consistent with the suggestion that human linguistic performance is indeed highly adapted to its environment and the task of rapidly and correctly understanding language (Chater et al. 1998; Crocker to appear). It is important to note, however, that such adaptation based on linguistic experience does not necessitate mechanisms which are strictly based on frequency-based estimations of likelihood (Pickering et al. 2000). Furthermore, different kinds and grains of frequencies may interact or be combined in complex ways (McRae et al. 1998).
It must be remembered, however, that experience is not the sole determinant of ambiguity resolution behaviour (Gibson and Pearlmutter 1998). Not only are people clearly sensitive to immediate linguistic and visual context (Tanenhaus et al. 1995), some parsing behaviours are almost certainly determined by alternative processing considerations, such as working memory limitations (Gibson 1998). Any complete account of gradience in sentence processing must explain how frequency of experience, linguistic and non-linguistic knowledge, and cognitive limitations are manifest in the mechanisms of the human sentence processor.
An even greater challenge to the experience-based view is presented by gradient grammaticality judgements. A series of studies is now available that compares corpus frequencies and gradient judgements for a number of linguistic phenomena (Featherston 2004; Keller 2003; Kempen and Harbusch 2004). These studies indicate that there is no straightforward relationship between the frequency of a structure and its degree of grammaticality, which indicates that not only experience, but also a range of processing mechanisms (most likely pertaining to sentence production) have to be invoked in order to obtain a plausible account of gradient grammaticality data.
13 Degraded Acceptability and Markedness in Syntax, and the Stochastic Interpretation of Optimality Theory

RALF VOGEL
13.1 Introduction
Markedness plays a central role in optimality theoretic grammars in the form of violable well-formedness constraints.1 Grammaticality is understood as optimality relative to a constraint hierarchy composed of markedness constraints, which evaluate different aspects of well-formedness, and faithfulness constraints, which determine, by their definition and rank, which aspects of markedness are tolerated and which are not: grammaticality is dependent on and derived from markedness. An optimality grammar is an input–output mapping: marked features of the input have a chance to appear in the output if they are protected by highly ranked faithfulness constraints. Optimal expressions might differ in their markedness, which is reflected in the different constraint violation profiles that these expressions are assigned by the grammar. This chapter argues that violation profiles can be used to predict contrasts among expressions in empirical investigations, and that markedness is the grammar-internal correlate of (some) phenomena of gradedness that we experience in empirical studies. Contrary to other empirically oriented work within optimality theory (OT), I claim that a standard OT grammar is already well-suited for reflecting gradedness.
1 I want to thank my collaborators Stefan Frisch, Jutta Boethke, and Marco Zugck, without whom the empirical research presented in this paper would not have been undertaken. For fruitful comments and helpful suggestions I further thank Gisbert Fanselow, Caroline Féry, Doug Saddy, Joanna Blaszczak, Arthur Stepanov, and the audiences of presentations of parts of this work at the Potsdam University and the workshop on Empirical Syntax/WOTS 8 at the ZAS Berlin, August 2004. This work has been supported by a grant from the Deutsche Forschungsgemeinschaft for the research group ‘Conflicting Rules in Language and Cognition’, FOR-375/2-A3.
Section 13.2 reflects on the discussion about gradedness and categoricity within the tradition of generative grammar, especially in its relevance for the competence/performance distinction. Section 13.3 introduces a particular case of syntactic markedness: case conflicts in argument free relative constructions. I present data from an experimental study and show how an OT grammar interpreted in the way sketched above can predict the observed gradient data. In Section 13.4, I compare this account with two alternative ways of dealing with markedness within OT. Section 13.5 discusses stochastic OT, an enhancement of standard OT especially designed to deal with results from more advanced empirical investigations. On the one hand, a stochastic component is not necessary for an OT syntax grammar to derive empirical predictions. On the other hand, the stochastic OT model, as it is applied to syntactic problems, has one serious shortcoming. In stochastic OT, an expression wins a competition only with a certain probability. Corpus frequencies, however, mirror not only how often a candidate wins, but also the frequency of the competition itself. A rare structure might be a frequent winner of a rare competition, or a rare winner of a frequent competition. The underlying grammars in these seemingly indistinguishable cases would be radically different. I show in Section 13.6, with the results of a corpus study on German free relative constructions, that these two kinds of frequencies can indeed be observed, how they can be distinguished, and that they are both driven by markedness, which should, therefore, be defined in an independent way. This is also necessary to avoid the pitfalls of contradictory results from different empirical methods.
All methods create their own artefacts, and these should not enter the grammar.
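The difference between the frequency of a competition and the frequency of winning it can be made concrete with a small sketch. The counts below are invented purely for illustration and do not come from any corpus, and the function name `decompose` is hypothetical.

```python
def decompose(corpus_size, competition_count, wins):
    """Split a structure's corpus frequency into two factors.

    competition_rate: how often the relevant competition arises at all
    win_rate:         how often the structure wins when it does arise
    overall:          the raw corpus frequency of the structure
    """
    competition_rate = competition_count / corpus_size
    win_rate = wins / competition_count
    overall = wins / corpus_size
    return competition_rate, win_rate, overall


# Invented counts: both structures occur 45 times in 10,000 clauses.
frequent_winner_of_rare_competition = decompose(10_000, 50, 45)
rare_winner_of_frequent_competition = decompose(10_000, 4_500, 45)

print(frequent_winner_of_rare_competition)
print(rare_winner_of_frequent_competition)
```

Under these assumed counts the two structures have the same raw frequency, yet one wins nine out of ten of its competitions and the other one in a hundred. A raw frequency count alone cannot tell the two situations apart.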
13.2 Gradedness and categoricity in generative syntax
Josefsson (2003) reports a survey that she conducted with about thirty Swedish native speakers on the possibility of pronominal object shift in Swedish. She gave her informants a five-point scale with the values ‘o.k. – ? – ?? – ?* – *’. For the statistical analysis, she mapped the judgements onto natural numbers ranging from ‘o.k.’ = 4 to ‘*’ = 0. Josefsson further assumed that grammatical sentences have at least an average acceptability value of 1.5. This decision is not of particular importance in her analysis. However, it appears to be a purely
Gradience in Syntax
normative decision. She could as well have proposed 2.0 or 2.5 as the boundary. How can such a decision be justified independently? An answer to this question requires a theory of acceptability judgements. Theoretical linguists rarely explicate their point of view on this. Interpreting the ‘?’ as uncertainty could simplify the problems somewhat, as this allows the assumption of a categorical grammar. But we would still have to rule out that the gradedness that we observe results from inherent properties of the grammar, instead of being the result of ‘random noise’. If, on the other hand, phenomena of gradedness are systematically correlated with grammatical properties, then the whole categorical view on grammar is called into question. I think that this is indeed the case. More recent variants of ‘explanations’ in terms of non-grammatical factors attribute variation and gradedness in grammaticality judgements to ‘performance’. Abney (1996) remarked that such a line of argumentation takes the division between competence and performance more seriously than it should be taken:
Dividing the human language capacity into grammar and processor is only a manner of speaking, a way of dividing things up for theoretical convenience. It is naive to expect the logical grammar/processor division to correspond to any meaningful physiological division – say, two physically separate neuronal assemblies, one functioning as a store of grammar rules and the other as an active device that accesses the grammar-rule store in the course of its operation. And even if we did believe in a physiological division between grammar and processor, we have no evidence at all to support that belief; it is not a distinction with any empirical content. (Abney 1996: 12)
Gradedness can even be used as a criterion for determining whether a constraint belongs to competence or performance: constraints that belong to performance cause degraded acceptability, rather than plain ungrammaticality. This would immunize the competence/performance distinction against any empirical counter-evidence, which would make even clearer that the distinction is only made for theoretical convenience. Manning (2003) argues along very much the same lines. Emphasizing, like Abney, that the generative grammarian discourse centred around the notion of competence is very limited in its scope, he calls for the application of probabilistic methods in syntax:
Formal linguistics has traditionally equated structuredness with homogeneity . . . , and it has tried too hard to maintain categoricity by such devices as appeal to an idealised speaker/hearer. . . . The motivation for probabilistic models in syntax comes from two sides:
• Categorical linguistic theories claim too much. They place a hard categorical boundary of grammaticality where really there is a fuzzy edge, determined by many conflicting constraints and issues of conventionality versus human creativity. [ . . . ]
• Categorical linguistic theories explain too little. They say nothing at all about the soft constraints that explain how people choose to say things (or how they choose to understand them).
(Manning 2003: 296–7)
Sternefeld (2001) provides further arguments against the traditional competence/performance distinction within generative grammar. One of the issues he discusses is that structures with central embeddings are very often degraded in their acceptability. The problem for the competence/performance distinction is: why should the computational system of I-language, the competence system, be able to produce structures that the parser is unable to compute efficiently? Why is it impossible for the parser to use the computational system of I-language in processing? With Kolb (1997), Sternefeld claims that the description of I-language as a ‘computational system’ makes it impossible to distinguish it theoretically and empirically from the ‘processing system’, that is, performance. Both have the same ontological status as generative, procedural systems. Sternefeld therefore proposes with Kolb that competence should be understood as a declarative axiomatic system, comparable to formal logics. Computational procedures, however abstractly they may be conceived, are then part of the performance system. A derivation can be seen as a proof for a particular structure, interpreted as a theorem of the algebraic system of I-language. The performance system, however, includes not only these derivational procedures, but also, for instance, all the psychological restrictions that are known to influence linguistic behaviour, and anything else that is usually subsumed under the term ‘performance’. A research programme that restricts itself to an investigation of competence in this sense would not be able to formulate anything of empirical relevance. In other words: the linguists’ focus of interest is and has always been performance. Abney, Manning, and Sternefeld each argue from different perspectives for abandoning the competence/performance distinction in the traditional sense.
In particular, they all show that it is useless for the investigation of a number of empirical phenomena, including gradient acceptability. Perhaps the difficulty of relating the numerical, statistical results of psycholinguistic experiments, corpus studies, or other more advanced empirical methods to a categorical understanding of grammaticality is the reason why the results of such empirical studies only rarely find their way into the grammar-theoretical work of generative syntacticians.
13.3 Markedness in syntax
An important feature of many empirical methods is their relational way of gathering data about linguistic structures. A typical design for a psycholinguistic experiment uses minimal pairs. An example is the pair in (13.1): free relative clauses (FR) are clauses that stand for non-clausal constituents. They have the syntax of relative clauses, but lack a head noun. The initial wh-pronouns of argument FRs are sensitive to the case requirements of both the FR-internal verb and the matrix verb. When the two cases differ, we observe a conflict: one of the two cases cannot be realized. This leads to ungrammaticality in (13.1b):2
(13.1) Case matching in argument free relative clauses in German:
a. Wer uns hilft, wird uns vertrauen
   [Who-nom us-dat helps]-nom will us-dat trust
   ‘Whoever helps us will trust us’
b. *Wer uns hilft, werden wir vertrauen
   [Who-nom us-dat helps]-dat will we-nom trust
   ‘Whoever helps us, we will trust him’
Experiments usually test for contrasts between minimally different expressions. In our example, the theory of case matching in argument free relative clauses (Groos and van Riemsdijk 1981; Pittner 1991; Vogel 2001) is confirmed if (13.1b) is judged as grammatical less often than (13.1a) to a statistically significant degree. This is indeed the result of a speeded grammaticality judgement experiment by Boethke (2005). Structure (13.1b) was judged acceptable significantly less often than (13.1a).3 This result is unproblematic for a categorical grammar. However, the experiment contained two further conditions:
2 The case required by the matrix verb appears slanted and attached to the FR in the glosses.
3 For the sake of completeness, I will briefly describe the experiment design: each of the 24 participants (students of the University of Potsdam) saw eight items of each of the conditions. Test items were FRs with the four possible case patterns with nominative and dative. The experiment included four further conditions which will be introduced later, so the experiment had eight test conditions altogether. The test items of this experiment were randomized and mixed with the test items of three other experiments, which served as distractor items. The sentences were presented visually on a computer screen, one word at a time, each word for 400 ms. Subjects were asked to give a grammaticality judgement by pressing one of two buttons for grammatical/ungrammatical, within a time window of 2,500 ms.
Table 13.1. Acceptability rates for the structures in (13.1) and (13.2) in the experiment by Boethke (2005)

Case of wh-phrase    Case required by matrix verb
                     nominative       dative
nominative           87% (13.1a)      17% (13.1b)
dative               62% (13.2b)      71% (13.2a)
(13.2)
a. Wem wir helfen, werden wir vertrauen
   [Who-dat we-nom help]-dat will we-nom trust
   ‘Whoever we help, we will trust him’
b. Wem wir helfen, wird uns vertrauen
   [Who-dat we-nom help]-nom will us-dat trust
   ‘Whoever we help, he will trust us’
The acceptability rates for these two structures are between those for the two structures in (13.1). All contrasts except for the one between (13.2a, 13.2b) were statistically significant. A categorical grammar has the problem of mapping this observation to its dichotomous grammaticality scale. How can we independently justify where we draw the boundary? If we state that only (13.1b) is ungrammatical, then we state that the observed contrast between (13.2b) and (13.1b) is crucial, but all the others are not. Likewise, if we treat (13.2b) as grammatical, we ignore the contrast between (13.2a) and (13.1b). No matter how we decide, the difficult task is finding arguments for our decision to ignore some contrasts while using others. But most importantly, there is no way of accounting for all contrasts with the grammatical/ungrammatical dichotomy only. This shows that the decision between a categorical or a gradient conception of grammaticality is also an empirical matter.4 If empirical methods show an intermediate acceptability status under such controlled conditions, it is very likely that the factor that caused this intermediate status is grammar-internal. At least, this should be the null assumption.
I want to emphasize that this experiment led to gradient acceptability (see below) without asking for it. In questionnaire studies with multi-valued scales and experiments based on magnitude estimation, gradience is already part of the experimental design. One could argue that subjects only give gradient judgements because they have been offered this option. In the experiment described here, the gradience results from intra- and inter-speaker variation among the test conditions in repeated measuring.
4 Featherston (to appear) provides more arguments in favour of this position.
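The point that no single grammatical/ungrammatical boundary can respect every significant contrast can be checked mechanically. The following is a minimal sketch: the acceptability rates are those of Table 13.1, the only non-significant contrast is the one between (13.2a) and (13.2b) as reported in the text, and the function names are hypothetical.

```python
from itertools import combinations

# Acceptability rates (in %) from Table 13.1.
RATES = {"13.1a": 87, "13.2a": 71, "13.2b": 62, "13.1b": 17}

# The only contrast reported as not statistically significant.
NOT_SIGNIFICANT = {frozenset({"13.2a", "13.2b"})}


def significant_contrasts():
    """All pairs of structures whose contrast was significant."""
    return [pair for pair in combinations(RATES, 2)
            if frozenset(pair) not in NOT_SIGNIFICANT]


def collapsed(cutoff):
    """Significant contrasts that a binary cutoff fails to express.

    A structure counts as 'grammatical' if its rate is >= cutoff; a
    contrast is collapsed if both members land on the same side.
    """
    return [(a, b) for a, b in significant_contrasts()
            if (RATES[a] >= cutoff) == (RATES[b] >= cutoff)]


for cutoff in (80, 65, 40):
    print(cutoff, collapsed(cutoff))
```

Whatever cutoff between 0 and 100 is chosen, at least one significant contrast ends up with both members in the same category, which is the problem for a dichotomous scale described above.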
A theory of grammar that has the potential to deal with gradedness more successfully is optimality theory (Prince and Smolensky 1993). It departs in a number of ways from classical generative grammar. It is constraint-based, which is not strikingly different, but the constraints are ranked and violable. Different structures have different violation profiles. One important departure from traditional grammars is that the grammaticality of an expression cannot be determined for that expression in isolation. An expression is grammatical if it is optimal. And it is optimal if it performs better on the constraint hierarchy than all possible alternative expressions in a competition for the expression of a particular underlying input form. OT thus determines grammaticality in a relational manner. This is reminiscent of what is done in the empirical investigations described above. It should be possible to systematically relate observed gradedness to relative optimality of violation profiles.5 OT is based on two types of constraints, markedness and faithfulness constraints. Markedness constraints evaluate intrinsic properties of candidates, while faithfulness constraints evaluate how similar candidates are to a given input. As there are infinitely many possible input specifications, there are equally many competitions. Grammatical expressions are those that win, that is, are optimal, in at least one of these competitions. Candidates which are good at markedness, that is, relatively unmarked candidates, are not as dependent on the assistance of faithfulness constraints as relatively marked candidates. This is schematically illustrated in Tables 13.2 and 13.3.

Table 13.2. Grammar with low ranked faithfulness

Input: cand1     M1     M2     F
☞ cand1                 *
  cand2          *!            *

Input: cand2     M1     M2     F
☞ cand1                 *      *
  cand2          *!

Table 13.3. Grammar with highly ranked faithfulness

Input: cand1     F      M1     M2
☞ cand1                        *
  cand2          *!     *

Input: cand2     F      M1     M2
  cand1          *!            *
☞ cand2                 *

M1, M2: markedness constraints; F: faithfulness constraint; cand1, cand2 (in the table headers): input specifications; cand1, cand2 (in the rows): output candidates; * = constraint violation; *! = fatal violation; ☞ = winning candidate
5 The first author who explored this feature of OT systematically was Frank Keller (Keller 2000b, and further work). See below for a brief discussion of his approach.
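The logic of Tables 13.2 and 13.3 can be replayed in a few lines of code. This is only a sketch under stated assumptions: the violation marks (cand1 violates M2, cand2 violates M1, and F is violated whenever the output differs from the input) are one illustrative assignment consistent with the description in the text, and all function names are hypothetical. Strict domination is modelled as lexicographic comparison of violation tuples.

```python
# Hypothetical markedness violations per output candidate.
MARKEDNESS = {
    "cand1": {"M1": 0, "M2": 1},
    "cand2": {"M1": 1, "M2": 0},
}


def profile(candidate, input_form, ranking):
    """Violation counts listed in ranking order (highest constraint first)."""
    marks = dict(MARKEDNESS[candidate])
    # Faithfulness is violated iff the output differs from the input.
    marks["F"] = 0 if candidate == input_form else 1
    return tuple(marks[c] for c in ranking)


def winner(input_form, ranking):
    """Strict domination: the lexicographically smallest profile wins."""
    return min(MARKEDNESS, key=lambda cand: profile(cand, input_form, ranking))


def grammatical(ranking):
    """Candidates that win at least one competition."""
    return {winner(inp, ranking) for inp in MARKEDNESS}


def markedness_order(ranking):
    """Compare candidates on markedness alone, leaving faithfulness out."""
    m_only = [c for c in ranking if c != "F"]
    return sorted(MARKEDNESS,
                  key=lambda cand: tuple(MARKEDNESS[cand][c] for c in m_only))


low_f = ["M1", "M2", "F"]    # as in Table 13.2
high_f = ["F", "M1", "M2"]   # as in Table 13.3

print(grammatical(low_f), grammatical(high_f), markedness_order(high_f))
```

With low-ranked faithfulness only cand1 is grammatical; with highly ranked faithfulness both candidates are, but the comparison that ignores F still orders cand1 before cand2, which is the relative markedness the chapter relies on.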
Candidate cand1 performs better than cand2 in the hierarchy of markedness constraints ‘M1 ≫ M2’. Therefore, we can say that cand1 is less marked than cand2 in the language at issue. This does not tell us anything about the grammaticality of cand2, however. But we know that cand1 is grammatical, irrespective of the grammaticality of cand2, provided, as we assume for the sake of the example, that there are no further constraints and candidates to consider. The faithfulness constraint F, if ranked low, cannot assist candidate cand2, and so cand1 wins the competitions for both inputs ‘cand1’ and ‘cand2’. Highly ranked faithfulness gives higher priority to input preservation, and therefore cand2 wins its own competition. Irrespective of the fact that both cand1 and cand2 are grammatical under highly ranked faithfulness, we can still derive that cand2 is the more marked structure from the violation profiles of the two structures, when we abstract away from particular inputs, that is, leave out the faithfulness constraints. An OT grammar, interpreted this way, not only tells whether a structure is grammatical, it also determines its relative markedness compared to other structures. This second property is particularly interesting for the predictability of gradience. Markedness can be seen as the correlate of gradience within the OT grammar. Because markedness is one of the key concepts of OT, nothing substantial needs to be added to account for gradience. In our abstract example, the prediction would be that the less marked cand1 receives higher acceptability, is easier to process, is more frequent, etc. I will illustrate this with the linguistic example in (13.1) and (13.2). Simplifying my own account (Vogel 2001, 2002, 2003b), we can assume the following constraints to distinguish the four structures:
(13.3)
a. Realize Case (RC): An assigned case must be realized morphologically.
b. Realize Oblique (RO): An assigned oblique case (e.g. dative) must be realized morphologically.
c. Subject precedes Object (S 1To1 RO > RCr > F > RC > 1To1 RO > RCr > RC > F > 1To1
Vogel (2001), German A Pittner (1991) Groos and van Riemsdijk (1981)
motivation for the introduction of floating constraints is the same kind of problem observed here, variation within a speech community, or a family of closely related dialects. In our example, the variable rank of F can be interpreted as reflecting the individually varying level of ‘error tolerance’ within a speech community. In stochastic optimality theory, all constraints occupy a particular rank only with a certain probability, and can potentially be floating. It therefore provides even more flexibility to make adequate predictions for empirical investigations. This approach is the topic of Section 13.5.
13.4 Markedness in OT
Markedness constraints do most of the crucial work in OT grammars. One might object that markedness is only a reflection of typicality (just as one anonymous reviewer did): a certain expression is degraded in acceptability only because it is less frequently used or less prototypical. This objection does not carry over to the phenomenon we are exploring here, case conflicts in argument FRs. Most German native speakers agree that the following grammaticality contrast holds:
(13.9)
a. Ich besuche, [FR wem ich vertraue]
   I visit-[acc] who-dat I trust
b. *Ich vertraue, [FR wen ich besuche]
   I trust-[dat] who-acc I visit
This contrast was confirmed very clearly in a speeded grammaticality judgement experiment (Vogel and Frisch 2003). In a corpus investigation, however, neither one of these structures could be found in our samples (Vogel and Zugck 2003). The contrast in (13.9) is certainly not a typicality contrast: the structures are too rare. In our studies, argument FRs without case conflicts turned out to be both more frequent and more likely to be accepted than conflicting structures such as those in (13.9). The source for the observed contrasts lies in the expressions under examination: case conflicts are problematic in themselves, and some conflicts are more problematic than others, but they are a material property of the expressions, and thus represent grammatical markedness in the very traditional sense. Markedness shows up in two ways in an OT model, and these need to be distinguished. First, every candidate, even the winner of an OT competition, can violate markedness constraints. A comparison of the constraint violation profiles only of structures that are optimal in some competition, as sketched in Table 13.4, results in a relative markedness ranking of grammatical structures. This is all we need to predict the relative frequencies of different structures in a corpus, as ungrammatical structures do not occur to a measurable degree even in very large corpora. It can be used in the same way to predict relative acceptabilities in experiments. That an OT markedness grammar outputs the correct relative acceptabilities/frequencies in such a comparison could be a criterion for the empirical adequacy of a model. Secondly, suboptimal structures, the losers of single OT competitions, are more marked than the winners of these competitions. Many of these output candidates do not win in any competition of a language. Of course, it is the nature of OT that suboptimal structures also differ in their violation profiles. In the same way as with winning structures, the profiles can be used to predict the results of experiments which use these expressions. Keller (2000b, 2001) relates suboptimality to degraded acceptability.
13.4.1 Relative markedness of winners
An understanding of markedness in the first sense underlies the common sense usage of this term in the linguistic literature. Expressions are classified as grammatical, ungrammatical, and ‘marked’, which usually means, informally speaking, ‘not ungrammatical, but not perfect either’, but rarely ever ‘ungrammatical, but better than other ungrammatical structures’.8 The possibility of a comparison of all winners in the OT grammar of a particular language with respect to their relative markedness is an important feature that distinguishes OT from ordinary models of generative grammar. There, all grammatical structures are equal in the sense that the only criterion for grammaticality is the possibility of assigning them a well-formed structural description. This notion of ‘well-formedness’ is not abandoned in OT. All winning structures of single competitions are well-formed in this sense.
But the winning structures are not equal. They are assigned different violation profiles by the OT grammar, and these, in principle, are accessible for comparison. The result of such a comparison is a scale of relative markedness which should ideally conform to the gradedness that we observe. The first one who exploited this idea to account for gradedness in syntax, as far as I know, was Müller (1999). However, much of the crucial work in his proposal is done by a subsystem of constraints which works differently in accounting for grammaticality than in accounting for degraded acceptability. That is, he uses slightly different grammars for the two tasks.
8 To me, this formulation even has the flavour of a logical contradiction. Ungrammatical structures can by definition not be better than other structures.
This is not the case in the proposal that I developed above. I only use the constraint types that are already there, markedness and faithfulness constraints. Faithfulness plays a crucial role in selecting the winners of single competitions, but cannot, by definition, play a role in the relative comparison of these winners, as they are winners for different inputs. Müller, on the contrary, selects the constraints that are responsible for gradedness in an ad hoc manner from the set of markedness constraints. In a similar vein, Keller (2001) and Büring (2001) propose differences among markedness constraints. Roughly speaking, they should be distinguished by the effect of their violation. Irrespective of their rank in the constraint hierarchy, markedness constraints are claimed to differ in whether their violation leads to ungrammaticality or only to degraded acceptability. These three authors have in common that they propose that markedness in the traditional sense must be added to the OT model as a further dimension of constraint violation. They did not find a way of accounting for it within standard OT. This is surprising insofar as the traditional conception of markedness is the core of OT. However, I think that I showed a way out of this dilemma that can do without these complications.
13.4.2 Markedness as suboptimality
Markedness in the second sense that I mentioned above, as an artefact of the OT model, is a much more problematic concept, and one might wonder whether it has or should have any empirical consequences. A single OT competition only knows winners and losers. Müller (1999) already argues against the conception of suboptimality proposed by Keller (cf. Keller 2000): in many OT analyses, the second best candidate is simply the candidate that is excluded last, and very often this candidate is plainly ungrammatical and much worse than other candidates which have been excluded earlier.
Take the case of a candidate cand1 that is excluded early in competition A only because of highly ranked faithfulness, but wins another competition B that has the appropriate input. Such a structure would certainly be judged better in an experiment than a candidate cand2 that is excluded late in both competitions, but does not win any competition in the language at hand, and is therefore ungrammatical. Because of faithfulness, structures are assigned different violation profiles in different competitions. Being suboptimal in one competition does not mean being suboptimal in all competitions. Consider our example of free relative constructions. As I showed above, a correlative structure such as the one in (13.8) avoids the case conflict with an additional resumptive pronoun. Therefore, CORR structures do not violate constraints on case realization.
Consequently, CORR structures are less marked than FR structures.9 But how can an FR be grammatical at all, then? Simply because we specified in the input that we want an FR structure, and highly ranked faithfulness rules out the less marked CORR candidate, but only in this particular competition! The CORR candidate still wins the competition where CORR is specified in the input. The CORR structure performs worse than the FR structure in one competition, but better in the other one. On which of these two contradicting competitions shall we now base our empirical predictions? A competition in an OT model is a purely technical device which should not be identified with a comparison in a psycholinguistic experiment. The only possible way to derive empirical predictions from a standard OT model also for the comparison of ungrammatical structures seems to me to be the meta-comparison for markedness sketched above that abstracts away from single competitions, and therefore from faithfulness. A powerful enhancement of OT that tries to relate grammar theory and empirical linguistics is stochastic optimality theory, which will be discussed in the next section.
13.5 Stochastic optimality theory: how to make grammar fit observations
Stochastic OT has been developed by Boersma (1998b) and Boersma and Hayes (2001). The most important difference to classical OT is that constraints are ordered on an infinite numerical scale of ‘strictness’. The relative rank of constraints is expressed by their distance on this scale, rather than simply by domination. The ‘rank’ of a constraint, furthermore, is not a fixed value, but a probability distribution. A constraint has a particular rank only with a particular probability. At evaluation time, a certain amount of noise is added. The probability distributions of two constraints might overlap, so the grammar can have different rankings at different times, although these rankings might differ in their probabilities. The body of work of stochastic OT in syntax is still rather small, and most of it has been carried out by Joan Bresnan and her group at Stanford University. Let me introduce only one example. Bresnan et al. (2001) study the influence of person features of agent and patient on the choice of voice in English and Lummi. They analysed the parsed SWITCHBOARD corpus, a database of spontaneous telephone conversations spoken by over 500 American English speakers. The analysis revealed the absence of full passives (with by-phrases) if the agent of the transitive verb is first or second person, while they found a number, albeit small, of full passives with third person agents. This difference, although numerically small, is statistically significant. Table 13.6 displays their figures.10 English exhibits as a tendency what a language like Lummi has as a categorical rule: passives are avoided for structures with first and second person agents and they are more likely to occur with first and second person patients than with third person patients.11 Observations of this sort are evidence for a position that unites functional and formal linguistics within optimality theory under the slogan of the ‘stochastic generalization’:12
The same categorical phenomena which are attributed to hard grammatical constraints in some languages continue to show up as statistical preferences in other languages, motivating a grammatical model that can account for soft constraints. (Bresnan et al. 2001: 29)

Table 13.6. English person/role by voice (full passives) in the parsed SWITCHBOARD corpus, from Bresnan et al. (2001)

Action       #Active    #Passive    Active %    Passive %
1,2 → 1,2        179           0       100.0          0.0
1,2 → 3        6,246           0       100.0          0.0
3 → 3          3,110          39        98.8          1.2
3 → 1,2          472          14        97.1          2.9

9 This is also reflected in the typology of these two constructions. To the best of my knowledge, the languages which have FR constructions are a proper subset of those that have CORR constructions, as I also illustrated in my earlier work, cf. (Vogel 2002).
Bresnan et al. (2001) show that Stochastic OT ‘can provide an explicit and unifying theoretical framework for these phenomena in syntax.’ The frequencies of active and passive are interpreted to correspond to the probabilities of being the optimal output in a stochastic OT evaluation. The most important constraints that are used in that account are *Obl1,2, which is ranked highest and bans by-phrases with first and second person; *SPt, which bans patients from being subjects, that is, penalizes passives; *S3, which penalizes third person subjects; and *SAg, which penalizes agents as subjects. The latter two constraints are ranked on a par and overlap a bit with the higher ranked *SPt, which in turn overlaps a bit with the higher ranked *Obl1,2. The rarity of passives with first or second person agents is mirrored by the high rank of *Obl1,2. Is it really the case that the rarity of passives with first and second person by-phrases is the result of a grammatical constraint, or is it not rather the result of the rarity of the communicative situation in which such a passive would be appropriate? Not all instances of infrequency have a grammatical cause. It seems that a constraint system that is designed to directly derive frequency patterns runs into the danger of interpreting properties of the ‘world’ as properties of the grammar. I will discuss this problem in more detail below.13
10 In Table 13.6, the description of the Action is to be read as follows: ‘1,2 → 3’ means that a first or second person agent acts upon a third person patient.
11 In Lummi, sentences with first or second person objects and third person subjects are ungrammatical. Likewise, passive is excluded if the agent is first or second person.
12 The effect described here can be achieved without stochastic enhancements, just by exploiting the violation profiles in the way illustrated in Section 13.3.
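How noisy ranking values turn into output probabilities can be illustrated with a short simulation. Everything numerical here is an assumption made for the sketch: the ranking values are invented (they are not the values fitted by Bresnan et al.), the evaluation noise of 2.0 is merely a conventional choice, and the violation profiles are derived by hand from the informal constraint descriptions given above.

```python
import random

# Hypothetical ranking values on the strictness scale (invented).
RANKING_VALUES = {"*Obl1,2": 110.0, "*SPt": 100.0, "*S3": 94.0, "*SAg": 94.0}
NOISE_SD = 2.0  # assumed standard deviation of the evaluation noise

# Violations per input and output, read off the constraint descriptions.
VIOLATIONS = {
    "3->1,2": {
        "active": {"*S3": 1, "*SAg": 1},    # third person agent is subject
        "passive": {"*SPt": 1},             # patient is subject
    },
    "1,2->3": {
        "active": {"*SAg": 1},              # first/second person agent is subject
        "passive": {"*SPt": 1, "*S3": 1, "*Obl1,2": 1},
    },
}


def sample_ranking(rng):
    """Add Gaussian noise to every ranking value and sort, highest first."""
    noisy = {c: v + rng.gauss(0.0, NOISE_SD) for c, v in RANKING_VALUES.items()}
    return sorted(noisy, key=noisy.get, reverse=True)


def evaluate(action, ranking):
    """Ordinary OT evaluation under one sampled total ranking."""
    cands = VIOLATIONS[action]
    return min(cands, key=lambda name: tuple(cands[name].get(c, 0) for c in ranking))


def output_distribution(action, trials=10_000, seed=0):
    """Estimate how often each output wins across sampled rankings."""
    rng = random.Random(seed)
    counts = {name: 0 for name in VIOLATIONS[action]}
    for _ in range(trials):
        counts[evaluate(action, sample_ranking(rng))] += 1
    return {name: n / trials for name, n in counts.items()}


for action in VIOLATIONS:
    print(action, output_distribution(action))
```

Note what the sketch does not model: it yields the probability of winning a given competition, but says nothing about how often the input ‘1,2 → 3’ or ‘3 → 1,2’ arises in the first place, which is exactly the worry raised in the text.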
13.6 A case study, continued
One problem for stochastic optimality theory that has often been noticed (cf. Boersma, this volume) is that different tasks seem to require different stochastic OT grammars. In particular, corpus frequencies and relative acceptabilities might not always go hand in hand. Our studies on German argument free relative clauses that have been introduced briefly above are another case in point. The experiment by Boethke (2005) altogether included eight conditions, four FRs in the four different case configurations, and their correlative counterparts.14 One prediction is that the correlative structures have a higher acceptability rate than their FR counterparts, because they avoid case realization problems with the additional resumptive pronoun. This expectation is met, as can be seen in Table 13.7.

Table 13.7. Mean acceptabilities for FR and CORR in different case configurations (in %)

nom-nom        dat-dat        dat-nom        nom-dat
FR    CORR     FR    CORR     FR    CORR     FR    CORR
87    95       71    91       62    92       17    90

Table 13.8. Results of a corpus investigation (Vogel and Zugck 2003)

Case pattern    FR             CORR
nom-nom         274 (89.8%)    31 (10.2%)
dat-dat         1 (5.6%)       17 (94.4%)
dat-nom         33 (34.4%)     63 (65.6%)
nom-dat         0 (0%)         5 (100%)

13 See also Boersma (this volume) for more discussion of problems of this kind.
14 The abbreviations for the case patterns here and below have the following logic: in ‘case1-case2’, case1 is the case of the wh-pronoun, case2 is the case assigned to the FR by the matrix verb.
All contrasts are statistically significant, except for the least problematic context, nom-nom. This could be due to the fact that FRs in this context are too good already, and so there might in fact be a difference, but it cannot be detected with this method. Secondly, the contrast between the FRs in the contexts dat-dat and dat-nom was not significant either, contrary to all other contrasts. This is perhaps due to an equal rank of the constraints RC and S S4 . If we assume a naive extension of standard OT, then this order corresponds to the grammaticality order of the candidates. The naive extension assumes the strict domination of constraints, and therefore fails to model ganging-up effects. Under this approach, there is no possibility for a joint violation of C2 and C3 to be as serious as a single violation of C1, due to the ranking C1 ≫ C2 ≫ C3.
Hence the naive extension of standard OT fails to account for the ganging-up effects that were observed experimentally.

14.3.3 Ranking argumentation and parameter estimation

Optimality theory employs so-called ranking arguments to establish constraint rankings from data. A ranking argument refers to a set of candidate structures with a certain constraint violation profile, and derives a constraint ranking from this profile. This can be illustrated by the following example: assume that two structures S1 and S2 have the same constraint profile, with the following exception: S1 violates constraint C1, but satisfies C2. Structure S2, on the other hand, violates constraint C2, but satisfies C1. If S1 is acceptable but S2 is unacceptable, then we can conclude that the ranking C2 ≫ C1 holds (see Prince and Smolensky 1993: 106). In the general case, the fact that S1 is acceptable but S2 is unacceptable entails that each constraint violated by S1 is outranked by at least one constraint violated by S2. (See Hayes 1997, for a more extensive discussion of the inference patterns involved in ranking argumentation in standard OT.)

The LOT approach allows a form of ranking argumentation that relies on gradient acceptability data instead of the binary acceptability judgements used in standard OT. A ranking argument in linear optimality theory can be constructed based on the difference in acceptability between two structures in the same candidate set, using the following definition:

(14.10) Ranking argument
Let S1 and S2 be candidate structures in the candidate set R with the acceptability difference ΔH. Then the equation in (14.11) holds.

(14.11) $H(S_1) - H(S_2) = \Delta H$

This definition assumes that the difference in harmony between S1 and S2 is accounted for by ΔH, the acceptability difference between the two structures. ΔH can be observed empirically, and measured, for instance, using magnitude estimation judgements (Sorace and Keller 2005). Drawing on the definition of harmony in (14.5), under which harmony is the negated weighted sum of constraint violations, equation (14.11) can be transformed to:

(14.12) $\sum_i w(C_i)\,\bigl(v(S_1, C_i) - v(S_2, C_i)\bigr) = -\Delta H$
This assumes that S1 and S2 have the violation profiles v(S1) and v(S2) and are evaluated relative to the grammar signature ⟨C, w⟩. Typically, a single ranking argument is not enough to rank the constraints of a given grammar. Rather, we need to accumulate a sufficiently large set of ranking arguments, based on which we can then deduce the constraint
hierarchy of the grammar. To obtain a maximally informative set of ranking arguments, we take all the candidate structures in a given candidate set and compute a ranking argument for each pair of candidates, using definition (14.12). The number of ranking arguments that a set of k candidates yields is given in (14.13); note that this is simply the number of all unordered pairs that can be generated from a set of k elements.

(14.13) $n = \dfrac{k^2 - k}{2}$
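The pairwise construction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four-candidate set, its violation counts, and its acceptability scores are hypothetical, and the function name `ranking_arguments` is ours rather than part of the LOT literature.

```python
from itertools import combinations

# Hypothetical candidate set: violation counts for (C1, C2, C3) and an
# observed acceptability score per candidate. All numbers are illustrative.
candidates = {
    "S1": ((0, 1, 1), -4.0),
    "S2": ((0, 1, 2), -5.0),
    "S3": ((0, 0, 1), -1.0),
    "S4": ((1, 0, 0), -4.0),
}

def ranking_arguments(cands):
    """Build one ranking argument per unordered pair of candidates.

    Each argument records the two candidate names, the difference of their
    violation profiles, and the difference of their acceptability scores.
    """
    result = []
    for (a, (va, ha)), (b, (vb, hb)) in combinations(cands.items(), 2):
        violation_diff = tuple(x - y for x, y in zip(va, vb))
        result.append((a, b, violation_diff, ha - hb))
    return result

args = ranking_arguments(candidates)
k = len(candidates)
```

With k = 4 candidates the sketch yields (k² − k)/2 = 6 ranking arguments, one per unordered pair, as (14.13) states.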
Now we are faced with the task of computing the constraint weights of a grammar from a set of ranking arguments. This problem can be solved by regarding the set of ranking arguments as a system of linear equations. The solution for this system of equations will then provide a set of constraint weights for the grammar. This idea is best illustrated using an example. We consider the candidate set in Table 14.1 and determine all ranking arguments generated by this candidate set (here wi is used as a shorthand for w(Ci), the weight of constraint Ci):

(14.14)
\begin{align*}
S_1 - S_2 &: \; 0w_1 + 1w_2 + 1w_3 - 0w_1 - 1w_2 - 2w_3 = -((-4) - (-5)) = -1 \\
S_1 - S_3 &: \; 0w_1 + 1w_2 + 1w_3 - 0w_1 - 0w_2 - 1w_3 = -((-4) - (-1)) = 3 \\
S_1 - S_4 &: \; 0w_1 + 1w_2 + 1w_3 - 1w_1 - 0w_2 - 0w_3 = -((-4) - (-4)) = 0 \\
S_2 - S_3 &: \; 0w_1 + 1w_2 + 2w_3 - 0w_1 - 0w_2 - 1w_3 = -((-5) - (-1)) = 4 \\
S_2 - S_4 &: \; 0w_1 + 1w_2 + 2w_3 - 1w_1 - 0w_2 - 0w_3 = -((-5) - (-4)) = 1 \\
S_3 - S_4 &: \; 0w_1 + 0w_2 + 1w_3 - 1w_1 - 0w_2 - 0w_3 = -((-1) - (-4)) = -3
\end{align*}

This system of linear equations can be simplified to:

(14.15)
\begin{align*}
w_3 &= 1 \\
w_2 &= 3 \\
w_2 + w_3 - w_1 &= 0 \\
w_2 + w_3 &= 4 \\
w_2 + 2w_3 - w_1 &= 1 \\
w_3 - w_1 &= -3
\end{align*}

We have therefore determined that w2 = 3 and w3 = 1. The value of w1 can easily be obtained from any of the remaining equations: w1 = w2 + w3 = 4. This example demonstrates how a system of linear equations that follows from a set of ranking arguments can be solved by hand. However, such a manual approach is not practical for large systems of equations as they occur in realistic ranking argumentation. Typically, we will be faced with a large set
of ranking arguments, generated by a candidate set with many structures, or by several candidate sets. There are a number of standard algorithms for solving systems of linear equations, which can be utilized for automatically determining the constraint weights from a set of ranking arguments. One example is Gaussian elimination, an algorithm which delivers an exact solution of a system of linear equations (if there is one). If we are dealing with experimental data, then the set of ranking arguments derived from a given data set will often result in an inconsistent set of linear equations, which means that Gaussian elimination is not applicable. In such a case, the algorithm of choice is least square estimation (LSE), a method for solving a system of linear equations even if the system is inconsistent. This means that LSE enables us to estimate the constraint weights of an LOT grammar if there is no set of weights that satisfy all the ranking arguments exactly (in contrast to Gaussian elimination). LSE will find an approximate set of constraint weights that maximizes the fit with the experimentally determined acceptability scores. A more detailed explanation of LSE and its application to LOT is provided by Keller (2000b).
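The estimation step can be sketched as follows. This is a minimal illustration, not the procedure of Keller (2000b): it solves the normal equations with Gaussian elimination in exact arithmetic, the coefficient rows restate the six ranking arguments of the worked example (one row per pair, coefficients of w1, w2, w3), and the helper names `solve_square` and `least_squares` are assumptions of this sketch. The sketch also does not enforce LOT's requirement that all weights share one sign.

```python
from fractions import Fraction

# One row per ranking argument: coefficients of (w1, w2, w3) and the
# right-hand side. The rows restate the worked example (an assumption of
# this sketch if the underlying candidate set differs).
A = [
    [0, 0, -1],
    [0, 1, 0],
    [-1, 1, 1],
    [0, 1, 1],
    [-1, 1, 2],
    [-1, 0, 1],
]
b = [-1, 3, 0, 4, 1, -3]

def solve_square(M, y):
    """Gaussian elimination with row pivoting on an n x n system."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(M, y)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0:
            raise ValueError("system is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * p for x, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def least_squares(A, b):
    """Least square estimate via the normal equations (A^T A) w = A^T b."""
    cols = list(zip(*A))
    ata = [[sum(x * y for x, y in zip(ci, cj)) for cj in cols] for ci in cols]
    atb = [sum(x * y for x, y in zip(ci, b)) for ci in cols]
    return solve_square(ata, atb)

weights = least_squares(A, b)

# Perturbing one judgement makes the six equations inconsistent; the least
# square estimate still exists, whereas no exact solution does.
noisy_weights = least_squares(A, [-1, 3, 0, 4, 1.5, -3])
```

For the consistent system the estimate coincides with the exact solution w1 = 4, w2 = 3, w3 = 1 obtained by hand. In practice one would normally use a numerical library rather than hand-written elimination; the exact-arithmetic version is used here only to keep the sketch dependency-free.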
14.4 Comparison with other optimality theoretic approaches

14.4.1 Standard optimality theory

Linear optimality theory preserves key concepts of standard optimality theory. This includes the fact that constraints are violable, even in an optimal structure. As in standard OT, LOT avails itself of a notion of constraint ranking to resolve constraint conflicts; LOT’s notion of ranking is quantified, that is, richer than the one in standard OT. The second core OT concept inherited by LOT is constraint competition. The optimality of a candidate cannot be determined in isolation, but only relative to other candidates it competes with. Furthermore, LOT uses ranking arguments in a similar way to standard OT. Such ranking arguments work in a competitive fashion, that is, based on the comparison of the relative grammaticality of two structures in the same candidate set. As in standard OT, a comparison of structures across candidate sets is not well-defined; two structures only compete against each other if they share the same input.

The crucial difference between LOT and standard OT is the fact that in LOT, constraint ranks are implemented as numeric weights and a straightforward linear constraint combination scheme is assumed. Standard optimality theory can then be regarded as a special case of LOT, where the constraint weights are chosen in an exponential fashion so as to achieve strict domination (see the subset theorem in (14.16)). The extension of standard OT to LOT is crucial in
accounting for the cumulativity of constraint violations. The linear constraint combination schema also greatly simplifies the task of determining a constraint hierarchy from a given data set. This problem simply reduces to solving a system of linear equations, a well-understood mathematical problem for which a set of standard algorithms exists (see Section 14.3.3).

Another advantage is that LOT naturally accounts for optionality, that is, for cases where more than one candidate is optimal. Under the linearity hypothesis, this simply means that the two candidates have the same harmony score. Such a situation can arise if the two candidates have the same violation profile, or if they have different violation profiles, but the weighted sum of the violations is the same in both cases. No special mechanisms for dealing with constraint ties are required in linear OT. This is an advantage over standard OT, where the modelling of optionality is less straightforward (see Asudeh 2001, for a discussion).

An OT grammar can be formulated as a weighted grammar if the constraint weights are chosen in an exponential fashion, so that strict domination of constraints is assured. This observation is due to Prince and Smolensky (1993: 200) and also applies to linear optimality theory. Therefore, the theorem in (14.16) holds (the reader is referred to Keller (2000b) for a proof).

(14.16) Subset theorem
A standard optimality theory grammar G with the constraint set $C = \{C_1, C_2, \dots, C_n\}$ and the ranking $C_n \gg C_{n-1} \gg \dots \gg C_1$ can be expressed as a linear optimality theory grammar G′ with the signature ⟨C, w⟩ and the weight function $w(C_i) = b^i$, where $b - 1$ is an upper bound for multiple constraint violations in G.

Note that the subset theorem holds only if there is an upper bound b − 1 that limits the number of multiple constraint violations that the grammar G allows. Such an upper bound exists if we assume that the number of violations incurred by each structure generated by G is finite.
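The role of the upper bound can be checked numerically. The following sketch is an illustration only: it compares strict domination with weights b^i on every pair of violation profiles up to the bound, and then shows one profile pair beyond the bound where the two disagree. The function names and the choice b = 3, n = 3 are assumptions of the sketch.

```python
from itertools import product

def ot_better(v1, v2):
    """Strict domination: v1 beats v2 if it has fewer violations on the
    highest-ranked constraint where the two profiles differ.
    Profiles list violation counts from highest-ranked (C_n) to lowest (C_1)."""
    for a, b in zip(v1, v2):
        if a != b:
            return a < b
    return False

def weighted_cost(v, base):
    """Weighted violation sum with w(C_i) = base**i (i = 1 is the lowest rank).
    A lower cost corresponds to a higher harmony."""
    n = len(v)
    return sum(count * base ** (n - pos) for pos, count in enumerate(v))

b, n = 3, 3  # at most b - 1 = 2 violations per constraint, three constraints
profiles = list(product(range(b), repeat=n))

# Within the bound, exponential weights reproduce strict domination exactly.
agree = all(
    ot_better(p, q) == (weighted_cost(p, b) < weighted_cost(q, b))
    for p in profiles
    for q in profiles
)

# Beyond the bound the equivalence breaks: four violations of C2 cost more
# than one violation of C3, although strict domination prefers the former.
breaks = ot_better((0, 4, 0), (1, 0, 0)) and not (
    weighted_cost((0, 4, 0), b) < weighted_cost((1, 0, 0), b)
)
```

Both `agree` and `breaks` come out true for these settings, which is the numerical counterpart of the condition stated in the theorem.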
The assumption of a finite number of violations might not hold for all OT constraint systems.

14.4.2 Harmonic grammar

Harmonic grammar (Legendre et al. 1990a, 1990b, 1991; Smolensky et al. 1992, 1993) is the predecessor of OT that builds on the assumption that constraints are annotated with numeric weights (instead of just being rank-ordered as in standard OT). Harmonic grammar (HG) can be implemented in a hybrid connectionist-symbolic architecture and has been applied successfully to gradient data by Legendre et al. (1990a, 1990b). As Prince and Smolensky
(1993: 200) point out, ‘Optimality Theory . . . represents a very specialized kind of Harmonic Grammar, with exponential weighting of the constraints’. Linear optimality theory is similar to HG in that it assumes constraints that are annotated with numeric weights, and that the harmony of a structure is computed as the linear combination of the weights of the constraints it violates. There are, however, two differences between LOT and HG: (a) LOT only models constraint violations, while HG models both violations and satisfactions; and (b) LOT uses standard least square estimation to determine constraint weights, while HG requires more powerful training algorithms such as backpropagation. We will discuss each of these differences in turn.

LOT requires that all constraint weights have the same sign (only positive weights are allowed, see Section 14.3.1). This amounts to the claim that only constraint violations (but not constraint satisfactions) play a role in determining the grammaticality of a structure. In HG, in contrast, arbitrary constraint weights are possible, that is, constraint satisfactions (as well as violations) can influence the harmony of a structure. This means that HG allows a grammar to be defined that contains a constraint C with the weight w and a constraint C′ that is the negation of C and has the weight −w. In such a grammar, both the violations and the satisfactions of C influence the harmony of a structure.

The issue of positive weights has important repercussions for the relationship between standard OT and LOT: Keller (2000b) proves a superset theorem that states that an arbitrary LOT grammar can be simulated by a standard OT grammar with stratified hierarchies. The proof crucially relies on the assumption that all constraint weights are of the same sign.
Stratified hierarchies allow us to simulate the addition of constraint violations (they correspond to multiple violations in standard OT), but they do not allow us to simulate the subtraction of constraint violations (which would be required by constraints that increase harmony). This means that the superset theorem does not hold for grammars that have both positive and negative constraint weights, as they are possible in harmonic grammar.

The second difference between HG and LOT concerns parameter estimation. An HG model can be implemented as a connectionist network, and the parameters of the model (the constraint weights) can be estimated using standard connectionist training algorithms. An example is provided by the HG model of unaccusativity/unergativity in French presented by Legendre et al. (1990a, 1990b) and Smolensky et al. (1992). This model is implemented as a multilayer perceptron and trained using the backpropagation algorithm (Rumelhart et al. 1986a).
It is well known that many connectionist models have an equivalent in conventional statistical techniques for function approximation. Multilayer perceptrons, for instance, correspond to a family of non-linear statistical models, as shown by Sarle (1994). (Which non-linear model a given perceptron corresponds to depends on its architecture, in particular the number and size of the hidden layers.) The parameters of a multilayer perceptron are typically estimated using backpropagation or similar training algorithms. On the other hand, a single-layer perceptron (i.e. a perceptron without hidden layers) corresponds to multiple linear regression, a standard statistical technique for approximating a linear function of multiple variables. The parameters (of both a single-layer perceptron and a linear regression model) can be computed using least square estimation (Bishop 1995). This technique can also be used for parameter estimation for LOT models (see Section 14.3.3).

Note that LOT can be conceived of as a variant of multiple linear regression. The difference between LOT and conventional multiple linear regression is that parameter estimation is not carried out directly on the data to be accounted for (the acceptability judgements); rather, a preprocessing step is carried out on the judgement data to compute a set of ranking arguments, which then form the input for the regression.

To summarize, the crucial difference between HG and LOT is that HG is a non-linear function approximator, while LOT is a linear function approximator, that is, a variant of linear regression. This means that a different set of parameter estimation algorithms is appropriate for HG and LOT, respectively.

14.4.3 Probabilistic optimality theory

Boersma and Hayes (2001) propose a probabilistic variant of optimality theory (POT) that is designed to account for gradience both in corpus frequencies and in acceptability judgements. POT stipulates a continuous scale of constraint strictness.
Constraints are annotated with numerical strictness values; if a constraint C1 has a higher strictness value than a constraint C2, then C1 outranks C2. Boersma and Hayes (2001) assume probabilistic constraint evaluation, which means that at evaluation time, a small amount of random noise is added to the strictness value of a constraint. As a consequence, re-rankings of constraints are possible if the amount of noise added to the strictness values exceeds the distance between the constraints on the strictness scale. For instance, assume that two constraints C1 and C2 are ranked C1 ≫ C2, selecting the structure S1 as optimal for a given input. Under Boersma and Hayes’ (2001) approach, a re-ranking of C1 and C2 can occur at evaluation time, resulting in the opposite ranking C2 ≫ C1. This re-ranking might result
in an alternative optimal candidate S2. The probability of the re-ranking that makes S2 optimal depends on the distance between C1 and C2 on the strictness scale (and on the amount of noise added to the strictness values). The re-ranking probability is assumed to predict the degree of grammaticality of S2. The more probable the re-ranking C2 ≫ C1, the higher the degree of grammaticality of S2; if the rankings C1 ≫ C2 and C2 ≫ C1 are equally probable, then S1 and S2 are equally grammatical.

The POT framework comes with its own learning theory in the form of the gradual learning algorithm (Boersma 1998a, 2000; Boersma and Hayes 2001). This algorithm is a generalization of Tesar and Smolensky’s (1998) constraint demotion algorithm in that it performs constraint promotion as well as demotion. The gradual learning algorithm incrementally adjusts the strictness values of the constraints in the grammar to match the frequencies of the candidate structures in the training data. The fact that the algorithm relies on gradual changes makes it robust to noise, which is an attractive property from a language acquisition point of view.

There are, however, a number of problems with the POT approach. As Keller and Asudeh (2002) point out, POT cannot model cases of harmonic bounding, as illustrated in Table 14.2: candidate S2 is harmonically bound by candidate S1, which means that there is no re-ranking of the constraints that would make S2 optimal. As S2 can never be optimal, its frequency or acceptability is predicted to be zero (i.e. no other candidate can be worse, even if it violates additional constraints). An example where this is clearly incorrect is S3 in Table 14.2, which violates a higher ranked constraint and is less acceptable (or less frequent) than S2. A second problem with POT identified by Keller and Asudeh (2002) is cumulativity.
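Both problems can be made concrete with a small simulation of stochastic evaluation. The sketch below rests on several assumptions that are not taken from the text: the strictness values, the noise level, the use of Gaussian noise, and the four violation profiles (three candidates violating C2 once, twice, and three times, and one candidate violating C1 once) are all hypothetical.

```python
import random

# Hypothetical strictness values and noise level.
strictness = {"C1": 100.0, "C2": 98.0}
noise_sd = 2.0

# Hypothetical violation profiles: S1-S3 violate C2 one, two, and three
# times; S4 violates C1 once.
candidates = {
    "S1": {"C1": 0, "C2": 1},
    "S2": {"C1": 0, "C2": 2},
    "S3": {"C1": 0, "C2": 3},
    "S4": {"C1": 1, "C2": 0},
}

def evaluate_once(rng):
    """Add Gaussian noise to each strictness value, derive the ranking for
    this evaluation, and return the winner under strict domination."""
    noisy = {c: s + rng.gauss(0.0, noise_sd) for c, s in strictness.items()}
    ranking = sorted(noisy, key=noisy.get, reverse=True)
    return min(candidates, key=lambda s: [candidates[s][c] for c in ranking])

rng = random.Random(0)
trials = 10_000
wins = {name: 0 for name in candidates}
for _ in range(trials):
    wins[evaluate_once(rng)] += 1
```

Whatever ranking the noise produces, only S1 or S4 can win: S2 and S3 always lose to S1, so their predicted frequency is zero and the model cannot separate S2 from S3. This is the behaviour that the text identifies as problematic for graded data.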
Cumulativity can be illustrated with respect to Table 14.3: here, candidate S1 violates constraint C2 once and is more acceptable than S2, which violates C2 twice. S2 in turn is more acceptable than S3, which violates C2 three times. A model based on constraint re-ranking cannot account for this, as a re-ranking of C2 will not change the outcome of the competition between S1, S2, and S3. Essentially, this is a special case of harmonic bounding involving only one constraint.

Table 14.2. Data that cannot be modelled in probabilistic OT (hypothetical frequencies or acceptability scores)

/input/   C3    C1    C2    Freq./Accept.
S1              *           3
S2              *     *     2
S3        *                 1

Source: Keller and Asudeh (2002)

Table 14.3. Data that cannot be modelled in probabilistic OT (hypothetical frequencies or acceptability scores)

/input/   C1    C2     Freq./Accept.
S1              *      4
S2              **     3
S3              ***    2
S4        *            1

Source: Keller and Asudeh (2002)

There is considerable evidence that configurations such as the ones illustrated in Tables 14.2 and 14.3 occur in real data. Keller (2000b) reports acceptability judgement data for word order variation in German that instantiates both patterns. Guy and Boberg’s (1997) frequency data for coronal stop deletion in English instantiates the cumulative pattern in Table 14.3. Jäger and Rosenbach (2004) show that cumulativity is instantiated in both frequency data and acceptability data on genitive formation in English. None of these data sets can be modelled by POT, and thus they constitute serious counterexamples to this approach. In linear optimality theory, on the other hand, such cases are completely unproblematic, due to the linear combination scheme assumed in this framework.

In a recent paper, Boersma (2004) acknowledges that cases of harmonic bounding and cumulativity as illustrated in Tables 14.2 and 14.3 pose a problem for POT. In response to this, he proposes a variant of POT, which we will call POT′. In POT′, the acceptability of a candidate S is determined by carrying out a pairwise comparison between S and each of the other candidates in the candidate set; the acceptability of S then corresponds to the percentage of comparisons that S wins.3 As an example, consider Table 14.2. Here, S1 wins against S2 and S3, hence its acceptability value is 2/2 = 100%. S2 wins against S3 but loses against S1, so its acceptability is 1/2 = 50%. S3 loses against both candidates, and thus receives an acceptability value of 0 per cent. In POT′, the relative grammaticality of a candidate corresponds to its optimality theoretic rank in the candidate set.
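The pairwise scheme can be sketched as follows. This is a simplified, deterministic illustration: it uses a fixed ranking rather than the averaged POT probability of winning, and the ranking C3 ≫ C1 ≫ C2 and the three violation profiles are hypothetical, chosen only so that S2 is harmonically bounded by S1 and S3 violates the top-ranked constraint.

```python
# Constraints listed from highest- to lowest-ranked (assumed ranking).
ranking = ["C3", "C1", "C2"]

# Hypothetical violation profiles matching the verbal description above.
profiles = {
    "S1": {"C1": 1},
    "S2": {"C1": 1, "C2": 1},
    "S3": {"C3": 1},
}

def vector(name):
    """Violation counts ordered by the ranking, highest-ranked first."""
    return [profiles[name].get(c, 0) for c in ranking]

def wins(a, b):
    """True if candidate a beats candidate b under strict domination."""
    return vector(a) < vector(b)

def acceptability(name):
    """Share of pairwise comparisons that the candidate wins."""
    rivals = [other for other in profiles if other != name]
    return sum(wins(name, other) for other in rivals) / len(rivals)

scores = {name: acceptability(name) for name in profiles}
```

The resulting scores are 1.0 for S1, 0.5 for S2, and 0.0 for S3, matching the values of 100%, 50%, and 0 per cent given in the text.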
The idea of equating relative grammaticality with optimality theoretic rank is not new; in fact, it is equivalent to the definition of relative grammaticality in terms of suboptimality, initially proposed by Keller (1997). The only difference is that in POT′, suboptimality is determined based on a POT notion of harmony, instead of using the standard OT notion of harmony, as assumed by Keller (1997). However, there are a number of conceptual problems with this proposal (which carry over to POT′), discussed in detail by Müller (1999) and Keller (2000b).

In addition to that, there are empirical problems with the POT′ approach. POT′ correctly predicts the relative acceptability of the example in Table 14.2 (as outlined above). However, other counterexamples can be constructed easily if we assume ganging-up effects. In Table 14.4, the combined violation of C1 and C2 is as serious as the single violation of C3, which means that the candidates S2 and S3 are equally grammatical. Such a situation cannot be modelled in POT′, as S2 will win against S3 (because C3 outranks C1), hence is predicted to be more grammatical than S3. As discussed in Section 14.2, ganging-up effects occur in experimental data, and thus pose a real problem for POT′.

In contrast to POT and POT′, LOT can model ganging-up effects straightforwardly, as illustrated in Section 14.3.2. This is not surprising: the weights in LOT grammars are estimated so that they correspond in a linear fashion to the acceptability scores of the candidates in the training data. The strictness bands in POT (and POT′) grammars, on the other hand, are estimated to match the frequencies of candidates in the training data; it is not obvious why such a model should correctly predict acceptability scores, given that it is trained on a different type of data.

Table 14.4. Data that cannot be modelled in POT′ (hypothetical frequencies or acceptability scores)

/input/   C3    C1    C2    Freq./Accept.
S1              *           2
S2              *     *     1
S3        *                 1

3 More precisely, it is the POT probability of winning, averaged over all pairwise comparisons, but this difference is irrelevant here.

14.4.4 Maximum entropy models

The problems with POT have led a number of authors to propose alternative ways of dealing with gradience in OT.
Goldwater and Johnson (2003), Jäger (2004), and Jäger and Rosenbach (2004) propose a probabilistic variant of OT based on the machine learning framework of maximum entropy models,
which is state of the art in computational linguistics (e.g. Abney 1997; Berger et al. 1996). In maximum entropy OT (MOT) as formulated by Jäger (2004), the probability of a candidate structure (i.e. of an input–output pair (i,o)) is defined as:

(14.17) $P_R(o \mid i) = \dfrac{1}{Z_R(i)} \exp\Bigl(-\sum_j r_j\, c_j(i,o)\Bigr)$

Here, $r_j$ denotes the numeric rank of constraint j, while R denotes the ranking vector, that is, the set of ranks of all constraints. The function $c_j(i,o)$ returns the number of violations of constraint j incurred by input–output pair (i,o). $Z_R(i)$ is a normalization factor.

The model defined in (14.17) can be regarded as an extension of LOT as introduced in Section 14.3.1. It is standard practice in the literature on gradient grammaticality to model not raw acceptability scores, but log-transformed, normalized acceptability data (Keller 2000b). This can be made explicit by log-transforming the left-hand side of (14.6) (and dropping the minus and renaming the variable i to j). The resulting formula is then equivalent to (14.18).

(14.18) $H(S) = \exp\Bigl(\sum_j w(C_j)\, v(S, C_j)\Bigr)$

A comparison of (14.17) and (14.18) shows that the two models have a parallel structure: $w(C_j) = r_j$ and $v(S, C_j) = c_j(i,o)$ (the input–output structure of the candidates is implicit in (14.18)). Both models are instances of a more general family of models referred to as log-linear models.

There is, however, a crucial difference between the MOT definition in (14.17) and the LOT definition in (14.18). Equation (14.18) does not include the normalization factor $Z_R(i)$, which means that (14.18) does not express a valid probability distribution. The normalization factor is not trivial to compute, as it involves summing over all possible output forms o (see Goldwater and Johnson 2003, and Jäger 2004, for details). This is the reason why LOT assumes a simple learning algorithm based on least square estimation, while MOT has to rely on learning algorithms for maximum entropy models, such as generalized iterative scaling, or improved iterative scaling (Berger et al. 1996).

Another crucial difference between MOT and LOT (pointed out by Goldwater and Johnson 2003) is that MOT is designed to be trained on corpus data, while LOT is designed to be trained on acceptability judgement data.
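The effect of the normalization factor can be seen in a short sketch. Everything numeric here is hypothetical: the ranks, the four output candidates, and their violation counts are invented for illustration, and the sketch assumes the sign convention in which each violation lowers a candidate's score. For a real grammar the sum over outputs ranges over all possible output forms, which is what makes the factor costly to compute.

```python
import math

# Hypothetical numeric ranks r_j and violation counts c_j(i, o) for the
# outputs of a single input i.
ranks = [4.0, 3.0, 1.0]
violations = {
    "o1": [0, 1, 1],
    "o2": [0, 1, 2],
    "o3": [0, 0, 1],
    "o4": [1, 0, 0],
}

def unnormalized(counts):
    """Exponential of the negated weighted violation sum."""
    return math.exp(-sum(r * c for r, c in zip(ranks, counts)))

scores = {o: unnormalized(c) for o, c in violations.items()}

# Normalization factor: sum of the unnormalized scores over all outputs.
z = sum(scores.values())
probs = {o: s / z for o, s in scores.items()}
```

The unnormalized scores do not sum to one, so they do not form a probability distribution; dividing by `z` repairs this, at the cost of having to enumerate the competing outputs.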
14.5 Conclusions

This paper introduced linear optimality theory (LOT) as a model of gradient grammaticality. Although this model borrows central concepts (such as
constraint ranking and competition) from optimality theory, it differs in two crucial respects from standard OT. First, LOT assumes that constraint ranks are represented as numeric weights (this feature is shared with probabilistic OT and maximum entropy OT, see Sections 14.4.3 and 14.4.4). Secondly, LOT assumes that the grammaticality of a given structure is proportional to the sum of the weights of the constraints it violates, which means that OT’s notion of strict domination is replaced with a linear constraint combination scheme (this feature is shared with maximum entropy OT, see Section 14.4.4).

We also outlined a learning algorithm for LOT (see Section 14.3.3). This algorithm takes as its input a grammar (i.e. a set of linguistic constraints) and a training set, based on which it estimates the weights of the constraints in the grammar. The training set is a collection of candidate structures, with the violation profile and the grammaticality score for each structure specified. Note that LOT is not intended as a model of human language acquisition: it cannot be assumed that the learner has access to training data that are annotated with acceptability scores. The sole purpose of the LOT learning algorithm is to perform parameter fitting for LOT grammars, that is, to determine an optimal set of constraint weights for a given data set.

LOT is able to account for the properties of gradient structures discussed in Section 14.2. Constraint ranking is modelled by the fact that LOT annotates constraints with numeric weights representing the contribution of a constraint to the unacceptability of a structure. Cumulativity is modelled by the assumption that the degree of ungrammaticality of a structure is computed as the sum of the weights of the constraints the structure violates. Once ranking and cumulativity are assumed as part of the LOT model, other properties of gradient linguistic judgements follow without further stipulations.
Part IV Gradience in Wh-Movement Constructions
15 Effects of Processing Difficulty on Judgements of Acceptability
GISBERT FANSELOW AND STEFAN FRISCH
15.1 Introduction and overview

There is a certain tension between the role which acceptability judgements play in linguistics and the level of their scientific underpinning.1 Judgements of grammaticality form the empirical basis of generative syntax, but little is known about the processes underlying their formation and the factors different from grammar contributing to them. This paper illuminates the impact of processing difficulty on acceptability. Section 15.2 reviews evidence showing that parsing problems often reduce acceptability. That processing difficulty may increase acceptability is less obvious, but this possibility is nevertheless borne out, as Section 15.3 shows, which reports several experiments dealing with locally ambiguous sentences involving discontinuous NPs, NP-coordination, and VP-preposing. The preferred interpretation of a locally ambiguous construction can have a positive influence on the global acceptability of a sentence even when this reading is later abandoned. Our experiments focusing on long wh-movement in Section 15.4 confirm the existence of the positive effect of local ambiguities in a domain that goes beyond mere syntactic feature differences. The global acceptability of a sentence is thus influenced by local acceptability perceptions during the parsing process.
1 We want to thank Caroline Féry, Heiner Drenhaus, Matthias Schlesewsky, Ralf Vogel, Thomas Weskott, and an anonymous referee for helpful comments and critical discussion, and Jutta Boethke, Jörg Didakowski, Ewa Trutkowski, Julia Vogel, Nikolaus Werner, Nora Winter, and Katrin Wrede for technical support. The research reported here was supported by DFG-grant FOR375.

15.2 Decreased acceptability caused by processing problems

Generative syntax subscribes to two fundamental convictions: the notions of grammaticality and acceptability must be kept apart, and grammatical sentences may be unacceptable because of the processing difficulty they involve (Chomsky 1957; Chomsky and Miller 1963). The latter is exemplified by multiple centre embeddings such as (15.1). Their acceptability decreases with the number of self-embeddings, yet they are constructed according to the principles of English grammar, and should therefore be grammatical. Processing explanations for their low acceptability seem well-motivated, since they fit into theories of language processing (see Lewis 1993).

(15.1) the man who the woman who the mosquito bit loves kicked the horse

Strong garden path sentences such as the horse raced past the barn fell (Bever 1970) illustrate the same point: sentences not violating any of the constructional principles of English may have properties that make it close to impossible for human parsing routines to identify their correct grammatical analysis. This renders them unacceptable.

While it seems uncontroversial that effects of strong processing problems should not be explained as violations of grammatical principles, the interpretation of milder parsing difficulties is less uniform. Consider the fronting of objects in free word order languages such as German. Experimental studies have revealed that object-initial sentences such as (15.2b) are less acceptable than their subject-initial counterparts (15.2a) (Bader and Meng 1999; Featherston 2005; Keller 2000a).

(15.2) a. der Tiger hat den Löwen gejagt
          the.nom tiger has the.acc lion.acc chased
       b. den Löwen hat der Tiger gejagt
          ‘the tiger has chased the lion’
Keller (2000a) and Müller (1999) make syntactic constraints responsible for the lower acceptability of object-initial sentences, but a different explanation is at hand: object-initial sentences are more difficult to parse than subject-initial ones, and this may render them less acceptable. Processing difficulties of object-initial structures have been documented since Krems (1984), see also Hemforth (1993), Meng (1998), Schlesewsky et al. (2000), among others. Their low acceptability can be explained in terms of this additional processing load, and the latter can be shown to be grammar-independent. In a self-paced reading study, Fanselow et al. (1999) compared the processing of embedded German subject-initial, object-initial, and yes–no-questions. They found an increase in reading times for the object-initial condition (15.3b), beginning with the wh-phrase and ending at the position of the second NP (= the subject, in the object-initial condition).
Effects of Processing Difficulty on Judgements
(15.3) es ist egal ‘it does not matter’
       a. wer vermutlich glücklicherweise den Mann erkannte
          who.nom presumably fortunately the.acc man recognized
          ‘who fortunately presumably recognized the man’
       b. wen vermutlich glücklicherweise der Mann erkannte
          who.acc presumably fortunately the.nom man recognized
          ‘who the man presumably fortunately recognized’
       c. ob vermutlich glücklicherweise der Mann den Dekan erkannte
          whether presumably fortunately the.nom man the.acc dean recognized
          ‘if the man presumably fortunately recognized the dean’
Fanselow et al. (1999) interpret this result in terms of memory cost: a fronted object wh-phrase must be stored in memory during the parse process up to the point where an object position can be postulated. In an SOV-language such as German, this means that the object must be memorized until the subject has been recognized. This account is in line with recent ERP research. King and Kutas (1995) found a sustained anterior negativity for the processing of English object relative clauses (as compared to subject relative clauses), which Müller et al. (1997) relate to memory. Felser et al. (2003), Fiebach et al. (2002), and Matzke et al. (2002) found a sustained LAN in the processing of German object-initial wh-questions and declaratives, which is again attributed to the memory load coming from the preposed object. The claim that object-initial structures involve a processing difficulty is thus well supported. It is natural to make this processing difficulty responsible for the reduced acceptability of sentences such as (15.2b).
Subjacency violations as in (15.4) constitute another domain in which processing difficulty reduces acceptability. Kluender and Kutas (1993) argue that syntactic islands arise at ‘processing bottlenecks’ when the processing demands of a long distance dependency at the clause boundary add up on the processing demands of who or whether. This processing problem is reflected in dramatically reduced acceptability.
(15.4) ??what do you wonder who has bought
Processing accounts of the wh-island condition furthermore allow us to understand satiation (Snyder 2000) and training effects (Fanselow et al. to appear) that are characteristic of wh-island violations: repeated exposure facilitates the processing of sentences such as (15.4), and renders them more acceptable.
Gradience in Wh-Movement Constructions
Processing difficulty reduces acceptability in further areas. Müller (2004) shows that the low acceptability of CP-extrapositions from certain attachment sites follows from attachment preferences, and does not reflect low grammaticality. Experimental evidence (Featherston 2005) suggests that the acceptability of subject relative clauses involving a locally ambiguous relative pronoun decreases with an increase of the length of the ambiguous region. This may be explained in terms of the processing difficulties associated with locally ambiguous arguments (Frisch et al. 2001, 2002).
15.3 Increased acceptability linked to processing problems
15.3.1 General remarks
Processing difficulty can reduce the acceptability of a sentence. In principle, the reverse might also exist: some processing difficulty makes a sentence with low grammaticality fairly acceptable. For example, this should be the case when the factor making the structure ungrammatical is difficult to detect. Marks (1965: 7) shows that the position of a grammatical violation correlates with the degree of (un-)acceptability: violations coming early as in boy the hit the ball are less acceptable than late violations as in the boy hit ball the. Meng and Bader (2000b) and Schlesewsky et al. (2003) (among others) found chance performance (rather than outright rejection) in speeded acceptability rating tasks for ungrammatical transitive sentences such as (15.5) containing illegitimate combinations of two nominative NPs. Schlesewsky et al. explain such results with the assumption that the case marking of NPs tends to be overlooked in nominative-initial sentences.
(15.5) *welcher Gärtner sah der Jäger
        which.nom gardener saw the.nom hunter
The experiments reported here were carried out in order to systematically investigate such positive effects of processing difficulties on acceptability. In particular, we expected a mitigating influence of local ambiguities. That parsing problems can reduce the global acceptability of a sentence suggests that it not only reflects properties of the final analysis, but also the ‘local acceptability’ of intermediate processing steps. When these intermediate steps have ‘better’ grammatical properties than the final analysis of the string, one should expect that global acceptability is increased by the relatively well-formed intermediate parsing steps. In particular, we studied discontinuous NPs (experiment 1), subject verb agreement (experiment 2), VP-preposing (experiment 3), and long distance movement (experiment 4).
15.3.2 Experiment 1: Discontinuous noun phrases
NPs can be serialized discontinuously in German, as illustrated in (15.6c). See Fanselow (1988), Fanselow and Ćavar (2002), and Riemsdijk (1989) for different analyses of discontinuous NPs (DNP), and Bader and Frazier (2005) for offline experiments involving DNP.
(15.6) a. er liest [NP viele Bücher]
          he reads many books
       b. Viele Bücher liest er
          many books reads he
       c. Bücher liest er viele
          books reads he many
German DNP are subject to two grammatical constraints on number (cf. Fanselow and Ćavar 2002): an agreement constraint, and a ban against singular count nouns appearing in the construction. Apart from a few exceptional constellations, DNPs are grammatical only if the two parts agree in number (the Agreement constraint). While such a constraint holds for DNPs in many languages, German is exceptional in the other respect, viz. in disallowing singular DNPs for count nouns, as the contrast between (15.7a) and (15.7b) illustrates (the Singularity constraint). The constraint derives from a general requirement that articles may be absent in German in partial and complete NPs only if the NP is headed by a plural or mass noun. Some dialects repair the ungrammaticality of (15.7b) by ‘regenerating’ (Riemsdijk 1989) an article in the left part of the NP as shown in (15.7c); in other dialects, there is no grammatical way of expressing what (15.7b) tries to convey. For exceptions to these generalizations, see Fanselow and Ćavar (2002) and van Hoof (1997).
(15.7) a. alte Professoren liebt sie keine
          old.pl professors.pl loves she no.pl
          ‘she loves no old professors’
       b. *alten Professor liebt sie keinen
          old.sg professor.sg loves she no.sg
          ‘she loves no old professor’
       c. einen alten Professor liebt sie keinen
          an old professor loves she no
Many German nouns such as Koffer ‘suitcase’ do not distinguish singular and plural morphologically for nominative and accusative case. The left periphery of the DNP (15.8) is therefore (locally) compatible with a plural
interpretation, which is excluded when the singular determiner keinen is processed. Up to this point, however, the phonetic string allows an analysis in which singularity is not violated. Introspection suggests that this local number ambiguity increases acceptability as compared to other singular DNP: (15.8) sounds better than (15.7b).
(15.8) Koffer hat sie keinen
       suitcase.ambiguous has she no.singular
Experiment 1 tested several hypotheses on German DNP. Experiment 1a investigated whether DNP with matching number are more acceptable than those without (Agreement), and whether singular DNP are less acceptable than plural ones (Singularity). Experiment 1b addressed the question of whether local ambiguities of number increase the acceptability of singular DNP. Experiment 1b required that we compare DNP with and without adjectives contained in their left part. In experiment 1a, we therefore also tested whether the presence of an adjective has an influence on the acceptability of DNP.
15.3.3 Experiment 1a
15.3.3.1 Materials
Experimental items had the form exemplified in (15.9). In a sentence with a pronominal subject preceded by the verb and followed by an adverb, an object NP was split such that the left part (LP) preceded the verb, while the right part (RP) was clause final. The LP could consist of a single noun (simple) (15.9a, 15.9b, 15.9e, 15.9f), or of a noun preceded by an adjective (like alten, old) agreeing with the noun (15.9c, 15.9d, 15.9g, 15.9h). The LP and RP appeared in either singular (sg) or plural (pl) form (see below).
(15.9) a. Professor kennt sie leider keinen   simple_sg_sg
          professor.sg knows she unfortunately no.sg
       b. Professoren kennt sie leider keine   simple_pl_pl
          professor.pl knows she unfortunately no.pl
       c. Alten Professor kennt sie leider keinen   complex_sg_sg
          old.sg professor.sg knows she unfortunately no.sg
       d. Alte Professoren kennt sie leider keine   complex_pl_pl
          old.pl professor.pl knows she unfortunately no.pl
       e. Professor kennt sie leider keine   simple_sg_pl
          professor.sg knows she unfortunately no.pl
       f. Professoren kennt sie leider keinen   simple_pl_sg
          professor.pl knows she unfortunately no.sg
       g. Alten Professor kennt sie leider keine   complex_sg_pl
          old.sg professor.sg knows she unfortunately no.pl
       h. Alte Professoren kennt sie leider keinen   complex_pl_sg
          old.pl professor.pl knows she unfortunately no.sg
15.3.3.2 Method
Forty students of the University of Potsdam participated. They were paid for their participation, or received course credits. Participants rated 106 sentences in pseudo-randomized order for acceptability on a six point scale (1 = ‘very good’, 6 = ‘very bad’). There were four items per condition. Each participant saw 16 experimental items (2 per condition), 74 unrelated and 16 related distractor items (items of experiment 1b plus 4 fillers). A larger set of 128 sentences (16 sets of identical lexical material in each of the 8 conditions) was created and assigned to 8 between subjects versions in such a way that no subject saw identical lexical material in more than one sentence.
In the other experiments, we used a rating scale different from the one in experiment 1. In order to increase the comparability of the results, we will use transformed values for mean ratings in this results section: the ratings on the ‘1 = best/6 = worst’ scale are mapped to their equivalent on the ‘1 = worst/7 = best’ scale used later (using the equation: transformed_value = 8 − (real_value + (real_value − 1)/5)).
15.3.3.3 Results
Figure 15.1 shows the mean acceptability ratings per condition for all forty subjects. In an ANOVA with the factors MATCH (number match between LP and RP), NUMBER (number of LP: singular versus plural) and COMPLEXITY (with versus without adjective in LP), we found a main effect of MATCH (F1(1,39) = 35.02, p < .001) due to higher acceptabilities in matching compared to mismatching conditions. Furthermore, there was a main effect of NUMBER (F1(1,39) = 49.33, p < .001) because LP plurals were more acceptable than LP singulars. There was no main effect of complexity (F < 1). In addition, there was an interaction MATCH × NUMBER (F1(1,39) = 21.72, p < .001). Resolving this interaction by the factor NUMBER revealed a significant advantage for matching (compared to mismatching) number for LP plurals (F1(1,39) = 39.10, p < .001), but only a marginal one for LP singulars (F1(1,39) = 3.71, p = .06). No further interaction reached significance.
[Figure 15.1. Results of Experiment 1a (y-axis: mean rating, 1–7; x-axis: Simple, Complex; legend: SG/Match, SG/Mism., PL/Match, PL/Mism.)]
15.3.3.4 Discussion
Experiment 1a confirms the constraints Agreement and Singularity borrowed from the literature. DNP are not acceptable when the number of the LP and of the RP of the DNP do not match (15.9e–15.9h). Furthermore, only plural (15.9b, 15.9d) but not singular DNP (15.9a, 15.9c) are acceptable (if the construction is formed with a countable noun). In line with our expectations, the presence of an adjective in the LP of a DNP had no influence on the acceptability of the construction. The addition of an adjective could thus be employed as a disambiguating device in experiment 1b.
15.3.4 Experiment 1b
15.3.4.1 Materials
The six conditions of experiment 1b are exemplified in (15.10a) to (15.10f). All nouns were ambiguous with respect to number. The LP of DNP which just consisted of a noun was consequently number-ambiguous as well (15.10a, 15.10b). The addition of a number-marked adjective disambiguated the LP towards a singular (15.10c, 15.10d) or plural (15.10e, 15.10f) interpretation. The RP of the DNP was either singular (15.10a, 15.10c, 15.10e) or plural (15.10b, 15.10d, 15.10f).
(15.10) a. Koffer hatte er leider keinen   amb_sg
           suitcase.amb had he unfortunately no.sg
        b. Koffer hatte er leider keine   amb_pl
           suitcase.amb had he unfortunately no.pl
        c. Roten Koffer hatte er leider keinen   sg_sg
           red.sg suitcase had he unfortunately no.sg
        d. Roten Koffer hatte er leider keine   sg_pl
           red.sg suitcase had he unfortunately no.pl
        e. Rote Koffer hatte er leider keinen   pl_sg
           red.pl suitcase had he unfortunately no.sg
        f. Rote Koffer hatte er leider keine   pl_pl
           red.pl suitcase had he unfortunately no.pl
15.3.4.2 Method
There were four items per condition. Experiment 1b was included in the same questionnaire as experiment 1a (see above). Each participant saw 12 experimental items (2 per condition), 74 unrelated and 16 related distractor items (items of experiment 1a) plus 4 fillers. A larger set of 96 sentences (16 sets of identical lexical material in each of the 6 conditions) was created and assigned to 8 between subjects versions in such a way that no subject saw identical lexical material in more than one sentence.
15.3.4.3 Results
Figure 15.2 shows the mean acceptability ratings per condition for all forty subjects. We computed an ANOVA with the factors LP NUMBER (number of left part: ambiguous versus singular versus plural) and RP NUMBER (number of right part: singular versus plural). We found a main effect of LP NUMBER (F1(2,78) = 30.82, p < .001) which was due to the fact that LP singulars were less acceptable than both LP plurals (F1(1,39) = 31.69, p < .001) and ambiguous LP (F1(1,39) = 51.36, p < .001). However, ambiguous and plural LP did not differ from one another (F1(1,39) = 1.51, p = .34). Furthermore, there was a main effect of RP NUMBER (F1(1,39) = 24.67, p < .001) which was due to the fact that RP plurals were more acceptable than RP singulars. We also found an interaction between both factors (F1(2,78) = 13.77, p < .001). Resolving this interaction by the factor RP NUMBER, we found a main effect of LP number for both RP singulars (F1(1,39) = 6.66, p < .01) and RP plurals (F1(1,39) = 33.37, p < .001). Within RP singulars, ambiguous LP were better than both singulars (F1(1,39) = 14.38, p < .001) and plurals (F1(1,39) = 7.61, p < .01) whereas within RP plurals, ambiguous LP were better than singulars (F1(1,39) = 43.60, p < .001), but equally acceptable as LP plurals (F < 1).
[Figure 15.2. Results of Experiment 1b (y-axis: mean rating, 1–7; x-axis: Ambiguous, Singular, Plural; legend: SG, PL)]
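The mean ratings reported for experiments 1a and 1b are given on the transformed seven point scale. As a minimal sketch, the mapping stated in the Method section of experiment 1a, transformed_value = 8 − (real_value + (real_value − 1)/5), can be written out as follows. The function name and the range check are illustrative assumptions, not part of the original study.

```python
def to_seven_point(real_value: float) -> float:
    """Map a rating on the '1 = best / 6 = worst' scale
    onto the '1 = worst / 7 = best' scale.

    Implements transformed_value = 8 - (real_value + (real_value - 1) / 5).
    """
    # Assumption: ratings (or mean ratings) outside 1..6 are treated as errors.
    if not 1 <= real_value <= 6:
        raise ValueError("expected a rating between 1 and 6")
    return 8 - (real_value + (real_value - 1) / 5)


if __name__ == "__main__":
    # Print the image of every scale point of the six point scale.
    for rating in range(1, 7):
        print(rating, round(to_seven_point(rating), 2))
```

The mapping is linear: the best grade on the six point scale lands on 7 and the worst on 1, so means computed on either scale stay comparable after conversion.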
15.3.4.4 Discussion
When we confine our attention to DNPs with unambiguous LPs, experiment 1b is in line with what we saw in experiment 1a. German DNP are subject to the Singularity constraint ruling out (15.10c, 15.10d), so that only DNP with a plural LP can be acceptable. Furthermore, the Agreement constraint imposes a further restriction on acceptable DNP: the RP must be plural as well (15.10f).
The results for the ambiguous LP conditions reveal more interesting facts, in particular, when the RP is singular. Let us, however, consider ambiguous LPs with a plural RP first. When the right part of the DNP disambiguates the DNP towards a plural interpretation (15.10b), sentences beginning with an ambiguous left part are as acceptable as sentences with a plural left part (15.10f). This is not surprising: the human parser must interpret the morphologically ambiguous LP of the DNP as plural, since the Singularity constraint against singular DNP cannot be fulfilled otherwise. A right part of the DNP with a plural marking constitutes no reason for abandoning this plural hypothesis.
Interestingly, however, ambiguous LP are more acceptable than both singular and plural items when the RP bears a singular marking (15.10a). This is in line with the intuitive assessment of such structures mentioned at the outset, and it confirms our expectation that the presence of a local ambiguity can increase the acceptability of a sentence. How does the ambiguity effect in DNP with singular RPs come about? Note first that the presence of an adjective in the unambiguous LPs and its absence in the ambiguous LPs cannot be made responsible for the results, since the complexity of the LP has not had any effect on acceptability in experiment 1a. Rather, we can link the positive effect of local ambiguity to the processing difficulty that arguably arises when a locally ambiguous item figures differently in the computations related to two (or more) different constraints.
That the DNPs with an ambiguous LP are better than those with unambiguously singular LPs (15.10c) follows straightforwardly from the fact that the ambiguous item can be (temporarily) interpreted as plural. Thus, the Singularity constraint banning singular DNP can be considered fulfilled when the ambiguous item is processed (see above). That the ambiguous LPs are also better than DNP with plural left parts (15.10e) in turn seems to be related to the fact that the ambiguous item can also be interpreted as a singular, so that the Agreement requirement can also be taken to be fulfilled. There are two mitigating effects of local ambiguity, then, but they are based on two incompatible interpretations of the ambiguous noun. Experiment 1b has thus confirmed the expectation that the presence of a local ambiguity can increase global acceptability. In particular, the results
represented in Figure 15.2 are compatible with the view that intermediate acceptability assessments (in our case: concerning Singularity) influence global acceptability: the option of a plural interpretation for a locally ambiguous noun leads to a positive local acceptability value, because Singularity appears fulfilled. This positive local assessment contributes to the global acceptability of DNPs even when the plural interpretation is later abandoned because a singular right part is detected. In contrast to grammaticality, global acceptability does not only depend on the final structural analysis, but also on the acceptability of intermediate analysis steps.
This acceptability pattern can also be found with professional linguists. They are not immune to such ‘spillover’ effects increasing global acceptability, as survey 1c has revealed.
15.3.5 Survey 1c
By e-mail, we asked more than sixty linguists (nearly all syntacticians) with German as their native language for judgements of sixteen DNP constructions, among them the items (15.11a, 15.11b) illustrating DNP with a singular RP and a number ambiguous (15.11a) or singular (15.11b) LP constructed as in experiment 1b.
(15.11) a. Koffer hat er keinen zweiten
           suitcase.amb has he no.sg second.sg
        b. Roten Koffer hat er keinen zweiten
           red-sg suitcase has he no.sg second.sg
Of the remaining fourteen items, eight were DNP constructions with singular LP and RP, one was a DNP with plural LP and RP, and four DNP had a plural LP but checked for different grammatical parameters. There was a further item with an ambiguous LP. No distractor items were used in order to increase the likelihood of a reply. Forty-five linguists responded by sorting the sentences into the categories ‘*’, ‘?’, and ‘well-formed’. The results are summarized in Figure 15.3, showing the number of participants choosing a particular grade.
(15.12) Professoren kennt sie zwei
        professor-pl knows she two
As Figure 15.3 indicates, sentence (15.12) beginning with an unambiguous plural item was accepted by nearly all participants. Two-thirds of the participants rejected sentences that began with an unambiguous singular DNP (15.11b). Both results are in line with the constraint *Singularity. The reaction to the ambiguous item (15.11a) was different. Only fourteen of the forty-five linguists rejected this sentence. A statistical comparison between the number of rejections in (15.11a) versus (15.11b) revealed a significant difference (χ² = 5.82, p < .05). The result shows that local ambiguities can improve acceptability not only in the context of fast responses given by experimental subjects when filling in a questionnaire. The effect is also visible in the more reflected judgements of professional syntacticians and other linguists.
[Figure 15.3. Survey 1c (y-axis: number of participants, 0–45; x-axis: 15.11a, 15.11b, 15.12; legend: okay, ?, out)]
15.3.6 Experiment 2: Disjunctive coordination
Experiment 2 was carried out with two goals: first, we wanted to demonstrate a positive effect of processing difficulty in a domain other than DNP. A second purpose was to test whether local ambiguities can increase acceptability in syntactic constellations that do not completely fit into the ‘classical’ preferred reading/reanalysis constellation. We looked at a construction involving a syntactically unsolvable agreement problem.
Subject–verb agreement is mandatory in German. Coordinated subjects may thus lead to expressivity problems when the language offers no rules for computing the person–number features of the coordinated NPs. When two NPs are coordinated by and, plural agreement of the verb seems justified on semantic grounds, but there are no parallel conceptual arguments for picking any of two different values of the person feature. Timmermans et al. (2004) found a preference for choosing 3rd person agreement rather than 2nd person agreement for subjects consisting of a 2nd and 3rd person noun phrase coordinated by und ‘and’. The order of the two NPs within the coordination seemed unimportant.
(15.13) a. weil du und er gehen
           because you and he go.3pl
        b. weil du und er geht
           because you and he go.2pl
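A brief aside on the survey 1c comparison reported in section 15.3.5 (χ² = 5.82, p < .05). The text does not name the test that was used. As a hedged sketch, one computation that reproduces the reported value is a one-way chi-square on the two rejection counts, 14 for (15.11a) and 30 for (15.11b), against an equal split; the figure of 30 is what ‘two-thirds’ of forty-five respondents amounts to. The function names below are illustrative, and the choice of test is an assumption rather than the authors' documented procedure.

```python
import math


def chi_square_equal_split(counts):
    """Goodness-of-fit chi-square statistic against equal expected counts."""
    expected = sum(counts) / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)


def p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Rejections ('out') of (15.11a) and (15.11b) among the 45 respondents.
rejections = [14, 30]
chi2 = chi_square_equal_split(rejections)
p = p_value_df1(chi2)
print(round(chi2, 2), p < 0.05)
```

Under this assumption each count is compared with an expected 22 rejections, giving 64/22 + 64/22. A 2 × 2 contingency test on rejected versus not rejected responses would yield a larger statistic, so the match with 5.82 is suggestive but not conclusive about the procedure actually used.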
When two singular NPs are coordinated by oder ‘or’, choosing plural agreement for the verb is not (necessarily) semantically justified. Still, when one searches the web, plural agreement, as in (15.14), is one of the frequent options.
(15.14) Wer weiss, wie er oder ich in zwei Jahren denken
        who knows how he or I in two years think.3pl
        ‘who knows what I or he will think in two years’ time’
Of the first twenty-five occurrences of er oder ich ‘he or I’ found by Google in the German pages of the internet for which verbal agreement could be determined (included in the first 180 total hits for er oder ich), fourteen had a plural verb, and eleven a singular one. However, the plural is less often chosen when the addition of entweder ‘either’ comes close to forcing the exclusive interpretation of oder. Among the first twenty-five occurrences of entweder er oder ich ‘either he or I’ found by Google in the German pages of the internet for which verbal agreement could be determined (included in the first 100 total hits for the construction), only five were constructed with a plural verb.²
When one looks at the data extracted from the web showing singular agreement more closely, an interesting pattern emerges. Of the thirty-one examples, twenty-two involved a verb which was morphologically ambiguous between a 1st and 3rd person interpretation (this is true of past tense verbs, modal verbs, and a few lexical exceptions), and only nine bore an unambiguous person feature (present tense verbs apart from the exceptions mentioned), with a strong bias for 3rd person (7 of 9). This is in line with intuitions. Neither of the two verbal forms of schlafen ‘sleep’ sounds really acceptable in the present tense, in which the forms of 1st (15.15c) and 3rd person (15.15a) are distinguished morphologically. Examples (15.15b) and (15.15d) involve verb forms that are morphologically ambiguous, and sound much better.
(15.15) a. er oder ich schläft   ER, UNA
           he or I sleep.3sg
        b. er oder ich schlief   ER, AMB
           he or I slept.amb
        c. er oder ich schlafe
           he or I sleep.1sg
        d. er oder ich darf schlafen
           he or I may.amb sleep
2 The websearch was done on 26 January 2005 at 7pm GMT.
We conducted two questionnaire studies in order to test whether the use of person-ambiguous verbs increases acceptability.
15.3.7 Description of experiment 2a
15.3.7.1 Materials
The four conditions of experiment 2a are exemplified in (15.15a–b) and (15.16). The experimental items began with an NP consisting of the pronouns ich ‘I’ and er ‘he’ conjoined by oder ‘or’. In the ER-initial condition, er came first (15.15), in the ICH-initial condition, the NP began with ich (15.16). The verb form was always 3rd person singular. The verb could, however, either allow an additional 1st person singular interpretation (AMBiguous condition, 15.15b, 15.16b) or be confined to the 3rd person reading (UNAmbiguous condition).
(15.16) a. ich oder er schläft   ICH, UNA
           I or he sleeps.3sg
        b. ich oder er schlief   ICH, AMB
           I or he slept.amb
15.3.7.2 Method
Forty-eight students of the University of Potsdam participated. They were paid for their participation, or received course credits. Participants rated 120 sentences for acceptability, on a seven point scale (1 = very bad, 7 = very good). There were 16 experimental items (4 items/condition) in a within subject design, and 104 items not related to the experiment.
15.3.7.3 Results
Figure 15.4 represents mean judgements of acceptability in experiment 2a. The mean acceptability of structures beginning with er (ER-initial (15.15)) was 4.23, and was statistically indistinguishable from the 4.24 mean acceptability of sentences beginning with ich (ICH-initial, (15.16)) (F1 < 1, F2 < 1). However, there was a significant effect of ambiguity: the ambiguous structures (15.15b, 15.16b) (AMB) were rated better (4.5) than unambiguous ones (15.15a, 15.16a) (UNA) (4.0) (F1(1,47) = 6.79, p < .05; F2(1,15) = 50.74, p < .001). There was no interaction between both factors (F1(1,47) = 11, p = .30, F2 < 1).
[Figure 15.4. Experiment 2a (y-axis: mean rating, 1–7; x-axis: ER, ICH; legend: UNA, AMB)]
15.3.7.4 Discussion
The order in which er and ich appeared in the experimental items had no effect on acceptability. In this respect, experiment 2a is comparable to the results of Timmermans et al. (2004) involving and-coordination. The morphological ambiguity of the verb exerted an effect on acceptability, in the expected direction: whenever the morphological shape of the verb fits the person specification of both pronouns because of the verbal ambiguity, acceptability increases.
This ambiguity effect is in line with our expectations. The acceptability of a sentence depends on whether the verb agrees with the subject. In the unambiguous conditions, the verb visibly disagrees with one of the two pronouns. In (15.15b, 15.16b), however, the ambiguous verb appears to meet the requirements of both pronouns (but only relative to different interpretations of the verb), which makes a local perception of acceptability possible. The computations for pairwise agreement between the verb and the two pronouns yield positive results, which has a positive effect on global acceptability even though the two pairwise agreement computations cannot be integrated, since they work with different interpretations of the ambiguous verb.
One might object that the difference between the ambiguous and the unambiguous condition might also be explained in terms of grammatical well-formedness. The ambiguous verb form might have an underspecified grammatical representation, viz. [singular, –2nd person], which is grammatically compatible with both a 1st and a 3rd person subject. In contrast, the features of the unambiguous 3rd person form clash with those of the 1st person pronoun.
Thus, the higher acceptability of the ambiguous forms might only reflect the absence of a feature clash. Such an account would leave it open, however, why the sentences with ambiguous verb forms are not rated as fully grammatical, as they should be if no feature clash were involved. We also tested the plausibility of this alternative explanation in experiment 2b, in which we investigated the acceptability of sentences in which er ‘he’ and ihr ‘you, plural’ were conjoined by or. In the regular present tense paradigm, 3rd person singular and 2nd person plural forms fall together. There is no simple way in which this ambiguity can be recast as underspecification.³ If underspecification rather than local ambiguity was responsible for the findings in experiment 2a, there should be no benefit in acceptability in experiment 2b resulting from the use of homophonous forms.
3 In a paper written after the completion of the present article, Müller (2005) offers an underspecification analysis for (15.17a, 15.17b) within a distributed-morphology model, however.
15.3.8 Description of experiment 2b
15.3.8.1 Materials
The four conditions of experiment 2b are exemplified in (15.17). The experimental items began with an NP consisting of the pronouns er ‘he’ and ihr ‘you, plural’ conjoined by oder ‘or’. In the ER-initial condition, er came first (15.17a, 15.17c), in the IHR-initial condition, the NP began with ihr (15.17b, 15.17d). The verb form was always 2nd person plural. In the UNAmbiguous condition (15.17c, 15.17d), the verb appeared in past tense, in which it is distinct from the 3rd person singular form. In the AMBiguous condition, the present tense was used. Such verbs allow an additional 3rd person reading.
(15.17) a. er oder ihr kommt verspätet zu dem Treffen   ER, AMB
           he or you come late to the meeting
        b. ihr oder er kommt verspätet zu dem Treffen   IHR, AMB
        c. er oder ihr kamt verspätet zu dem Treffen   ER, UNA
           he or you came late to the meeting
        d. ihr oder er kamt verspätet zu dem Treffen   IHR, UNA
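The difference between the local ambiguity account and the underspecification account in experiments 2a and 2b can be illustrated with a small sketch. It represents each verb form by the set of person and number readings it allows, and checks agreement with each pronoun separately and then jointly. The dictionaries, function names, and the encoding of features as (person, number) pairs are illustrative assumptions, not an analysis proposed in the chapter.

```python
# Pronouns as (person, number) pairs.
PRONOUNS = {"ich": (1, "sg"), "er": (3, "sg"), "ihr": (2, "pl")}

# Illustrative lexicon: the readings each verb form is compatible with.
VERB_READINGS = {
    "schläft": {(3, "sg")},             # unambiguous, experiment 2a
    "schlief": {(1, "sg"), (3, "sg")},  # ambiguous, experiment 2a
    "kommt": {(3, "sg"), (2, "pl")},    # ambiguous, experiment 2b
    "kamt": {(2, "pl")},                # unambiguous, experiment 2b
}


def pairwise_agreement(verb, subject_pronouns):
    """Check each pronoun separately: does some reading of the verb agree with it?"""
    return {p: PRONOUNS[p] in VERB_READINGS[verb] for p in subject_pronouns}


def integrated_readings(verb, subject_pronouns):
    """Readings of the verb that agree with all pronouns at the same time."""
    return {
        reading
        for reading in VERB_READINGS[verb]
        if all(PRONOUNS[p] == reading for p in subject_pronouns)
    }


def underspecified_schlief(pronoun):
    """Alternative account for 2a: one feature bundle [singular, -2nd person]."""
    person, number = PRONOUNS[pronoun]
    return number == "sg" and person != 2


for verb, pronouns in [("schläft", ["er", "ich"]), ("schlief", ["er", "ich"]),
                       ("kommt", ["er", "ihr"]), ("kamt", ["er", "ihr"])]:
    print(verb, pairwise_agreement(verb, pronouns), integrated_readings(verb, pronouns))
```

For the ambiguous forms, every pairwise check succeeds while no single reading satisfies both pronouns, which mirrors the claim that the pairwise computations cannot be integrated. The single feature bundle in `underspecified_schlief` covers er and ich, but a comparably simple bundle is not available for the 3rd singular and 2nd plural readings of kommt.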
15.3.8.2 Method Thirty-two students of the University of Potsdam participated. They were paid for their participation or received course credits. Participants rated 96 sentences for acceptability on a seven-point scale (1 = very bad, 7 = very good). There were 16 experimental items (4 items per condition) in a within-subject design, and 80 items not related to the experiment.

15.3.8.3 Results Figure 15.5 represents mean judgements of acceptability in experiment 2b. The mean acceptability of structures beginning with er (ER-initial, 15.17a, 15.17c) was 4.03, and was statistically indistinguishable from the 3.92 mean acceptability of sentences beginning with ihr (IHR-initial, 15.17b, 15.17d) (F1 < 1, F2 < 1). However, there was a significant effect of ambiguity: the ambiguous structures (15.17a, 15.17b) (AMB) were rated better (4.58) than the unambiguous ones (15.17c, 15.17d) (UNA) (3.38) (F1(1,31) = 28.65, p < .001; F2(1,15) = 22.26, p < .001). There was no interaction between the two factors (F1 < 1, F2 < 1).

[Figure 15.5. Experiment 2b. Mean acceptability (scale 1–7): ER-initial, AMB 4.64, UNA 3.42; IHR-initial, AMB 4.52, UNA 3.33]

15.3.8.4 Discussion In line with previous findings, the order of the pronouns had no effect on acceptability. Acceptability was, rather, influenced by local ambiguity again. Structures with a visible clash between the 3rd person pronoun and the 2nd person verb form (15.17c, 15.17d) were less acceptable than sentences in which the ambiguity of the verb form made it seem compatible with both the 3rd singular and the 2nd person plural pronoun. To the extent that this particular constellation of features is difficult to represent in terms of some (plausible) underspecification of the grammatical features of verbs like kommt, we would possess an additional type of evidence for the claim that the acceptability difference found in experiment 2b is not due to a difference in grammaticality between the two conditions.

For the record, it may be added that we found a similar ambiguity effect for verb stems ending in –s in a further experiment. For such verbs (like reis-, ‘travel’), the absence of geminate consonants in German implies that the addition of the 3rd person singular –t ending has the same outcome as the addition of the 2nd person singular –st ending, viz. reist. Sentences such as (15.18a) with such ambiguous verb forms were again rated better (4.80 versus 4.41 on our 7-point scale) than those involving the unambiguous past tense form (15.18b) by 32 subjects in an experimental design identical to that of experiment 2b (F1(1,31) = 6.58, p < .05; F2(1,15) = 4.38, p = .05).

(15.18) a. er  oder  du   reist      nach  Amerika
           he  or    you  travel     to    America
        b. er  oder  du   reiste     nach  Amerika
           he  or    you  travelled  to    America
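As a quick consistency check on experiment 2b, the marginal means reported in its results section can be recovered from the four cell means plotted in Figure 15.5. The sketch below is only an illustration and not part of the original analysis. It assumes equally sized cells, and the pairing of each plotted value with a condition is inferred from the figure labels and the reported marginals; the name `marginal_mean` is hypothetical.

```python
# Cell means read off Figure 15.5 (experiment 2b). The pairing of each value
# with a condition is an inference from the figure labels and the marginal
# means given in the text, not something stated cell by cell.
CELL_MEANS = {
    ("ER", "AMB"): 4.64,
    ("ER", "UNA"): 3.42,
    ("IHR", "AMB"): 4.52,
    ("IHR", "UNA"): 3.33,
}


def marginal_mean(factor, level):
    """Average the cell means that share one level of one factor.

    factor 0 is pronoun order (ER/IHR), factor 1 is ambiguity (AMB/UNA).
    Assumes equally sized cells, so an unweighted mean is appropriate.
    """
    values = [mean for cell, mean in CELL_MEANS.items() if cell[factor] == level]
    return sum(values) / len(values)


if __name__ == "__main__":
    for factor, levels in ((0, ("ER", "IHR")), (1, ("AMB", "UNA"))):
        for level in levels:
            print(level, round(marginal_mean(factor, level), 3))
```

Under these assumptions the four averages agree with the reported values of 4.03, 3.92, 4.58, and 3.38 to within rounding.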
Local ambiguity thus seems to increase acceptability irrespective of the source of the ambiguity and the particular feature combinations involved.

15.3.9 Experiment 3: Fronted verb phrases
Experiment 3 investigated whether case ambiguities can also increase acceptability. German VPs can appear in the position immediately preceding the finite verb in main clauses, as (15.19a) illustrates. Such structures are grammatical when the NP in the fronted VP is an (underlying) object of the verb. The inclusion of an (underlying) subject (as in 15.19b) is taken to be much less acceptable.

(15.19) a. [VP einen  Jungen   geküsst]  hat  sie  nicht
               a.acc  boy.acc  kissed    has  she  not
           ‘she has not kissed a boy’
        b. ??ein  Junge    geküsst  hat  sie  nicht
           a.nom  boy.nom  kissed   has  her  not
           ‘a boy has not kissed her’
Feminine and neuter nouns do not distinguish morphologically between nominative and accusative. Unlike (15.19a–b), (15.19c–d) with feminine Frau involve a local ambiguity of the sentence-initial NP.

(15.19) c. [VP eine   Frau   geküsst]  hat  er       nicht
               a.amb  woman  kissed    has  he.nom   not
        d. [VP eine   Frau   geküsst]  hat  ihn      nicht
               a.amb  woman  kissed    has  him.acc  not
The grammatical restriction against the inclusion of subjects in preposed VPs implies a parsing preference for initially analysing eine Frau as the object of the verb geküsst in (15.19c–d). This analysis can be maintained in structures such as (15.19c), in which the second NP is nominative, but it must be abandoned in (15.19d) when the pronominal NP is parsed, because it bears accusative case, which identifies it as the object. Example (15.19d) should be less unacceptable than (15.19b) if global acceptability is influenced by temporary acceptability values: unlike (15.19b), (15.19d) initially appears to respect the ban against the inclusion of subjects in fronted VPs. We tested this prediction in a questionnaire and a speeded acceptability rating experiment.

15.3.10 Description of experiment 3a
15.3.10.1 Material The experimental items had the structure illustrated in (15.20).

(15.20) Fronted VP = ambiguous subject + verb
        a. Ein    schlaues    Mädchen  geküsst  hat  ihn  noch nie
           a.amb  clever.amb  girl     kissed   has  him  not yet
           ‘a clever girl has not yet kissed him’
        Fronted VP = ambiguous object + verb
        b. Ein    schlaues    Mädchen  geküsst  hat  er   noch nie
           a.amb  clever.amb  girl     kissed   has  he   not yet
           ‘he has not yet kissed a clever girl’
        Fronted VP = unambiguous subject + verb
        c. Ein    junger  Mann  besucht  hatte  ihn  erst gestern
           a.nom  young   man   visited  had    him  only yesterday
           ‘a young man visited him only yesterday’
        Fronted VP = unambiguous object + verb
        d. einen  jungen  Mann  besucht  hatte  er   erst gestern
           a.acc  young   man   visited  had    he   only yesterday
           ‘he visited a young man only yesterday’

All experimental items involved a preposed VP. The NP in this VP could either bear overt case morphology (unambiguous condition, 15.20c–d) or be unmarked for case (locally ambiguous condition, 15.20a–b). The second NP in the sentence was a pronoun that either bore the case not realized by the first NP (unambiguous condition) or disambiguated the initial NP towards an object or subject reading (ambiguous condition). A set of 64 sentences (16 sets of identical lexical material in each of the 4 conditions) was created and assigned to 4 between-subjects versions in such a way that no subject saw identical lexical material in more than one sentence.

15.3.10.2 Method Forty-eight students of the University of Potsdam participated. They were paid for their participation or received course credits. The 16 experimental items (4 per condition) were among the distractors of experiment 2a.

15.3.10.3 Results and discussion As figure 15.6 shows, fronted verb phrases that include a direct object are more acceptable than those that include a subject (F1(1,47) = 34.74, p < .001; F2(1,15) = 37.26, p < .001). Contrary to our expectation, there was no main effect of ambiguity (F1 < 1, F2 < 1) and no interaction between the two factors (F1(1,47) = 2.69, p = .11; F2 < 1). We used the same material in a speeded acceptability rating experiment.
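[Figure 15.6. Experiment 3a (questionnaire). Mean acceptability (scale 1–7) of fronted VPs containing a subject (SUB) or an object (OBJ), ambiguous (AMB) versus unambiguous (UNA); plotted values: 3.8 and 3.7 (SUB), 5.1 and 4.9 (OBJ)]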
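The distribution of the 64 sentences of experiment 3a over four questionnaire versions can be pictured as a Latin-square rotation. The sketch below is a minimal illustration under that assumption; the text only states the constraint that no subject saw the same lexical material twice, not the procedure used. The condition labels follow (15.20), and the function name `build_versions` is hypothetical.

```python
from collections import Counter

CONDITIONS = ("a", "b", "c", "d")  # the four conditions illustrated in (15.20)


def build_versions(n_sets=16, conditions=CONDITIONS):
    """Return one list of (lexical_set, condition) pairs per version.

    Rotating the condition by the version index yields a Latin square:
    every lexical set occurs exactly once per version, and across the
    versions it occurs exactly once in every condition.
    """
    k = len(conditions)
    return [
        [(s, conditions[(s + v) % k]) for s in range(n_sets)]
        for v in range(k)
    ]


if __name__ == "__main__":
    for number, version in enumerate(build_versions(), start=1):
        counts = Counter(condition for _, condition in version)
        print(number, dict(sorted(counts.items())))
```

With 16 lexical sets and 4 conditions, each version contains 16 sentences, 4 per condition, which matches the item counts given in the method section.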
15.3.11 Description of experiment 3b
15.3.11.1 Material The experimental items were the ones used in experiment 3a.

15.3.11.2 Method Twenty-six students were paid for their participation in a speeded acceptability judgement task. There were 64 experimental sentences (16 per condition) and 160 unrelated filler sentences. After a set of 16 training sentences (4 in each of the critical conditions), the sentences of the experiment were randomly presented word by word. Every word appeared in the centre of a screen for 400 ms (plus 100 ms ISI). 500 ms after the last word of each sentence, subjects had to judge its well-formedness within a maximal interval of 3000 ms by pressing one of two buttons. 1000 ms after their response, the next trial began.
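The timing parameters of the speeded task can be combined into a rough trial timeline. The sketch below is an illustration only. It assumes that the 500 ms pause starts when the last word disappears and takes the place of the ISI after that word, which the method description does not spell out; all names are hypothetical.

```python
WORD_MS = 400             # display time per word
ISI_MS = 100              # blank interval between successive words
PROMPT_DELAY_MS = 500     # pause after the last word before the judgement
RESPONSE_LIMIT_MS = 3000  # maximal interval for the judgement
NEXT_TRIAL_MS = 1000      # pause between the response and the next trial


def trial_timeline(n_words, response_ms=RESPONSE_LIMIT_MS):
    """Return (judgement window onset, trial end) in ms from first-word onset.

    Assumption: the 500 ms pause is counted from the offset of the last
    word and replaces the ISI that would otherwise follow it.
    """
    if n_words < 1:
        raise ValueError("a sentence needs at least one word")
    if not 0 <= response_ms <= RESPONSE_LIMIT_MS:
        raise ValueError("response time must lie within the judgement window")
    words_done = n_words * WORD_MS + (n_words - 1) * ISI_MS
    window_onset = words_done + PROMPT_DELAY_MS
    trial_end = window_onset + response_ms + NEXT_TRIAL_MS
    return window_onset, trial_end


if __name__ == "__main__":
    # An eight-word sentence such as (15.20a), judged after 800 ms.
    print(trial_timeline(8, response_ms=800))
```

Under these assumptions, an eight-word sentence reaches the judgement window 4400 ms after the first word appears, and the trial lasts at most 8400 ms.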
[Figure 15.7. Experiment 3b (speeded rating). Percentage of ‘acceptable’ responses: VP with subject, AMB 65.63, UNA 51.44; VP with object, AMB 94.71, UNA 96.15]
15.3.11.3 Results The results of experiment 3b are represented in Figure 15.7. There was a main effect of the grammatical function of the NP in the fronted VP: fronted VPs that include an object were rated acceptable in 89.5 per cent of cases, contrasting with 54.6 per cent for VPs including a subject (F1(1,25) = 85.65, p < .0001; F2(1,15) = 127.7, p < .001). Items in which the NP in VP was case-ambiguous were rated as acceptable more often (75.11 per cent) than items in which there was no ambiguity (69.0 per cent) (F1(1,25) = 6.98, p < .05; F2(1,15) = 14.87, p < .05). The interaction between the grammatical function of the NP in VP and its ambiguity was also significant (F1(1,25) = 10.25, p < .01; F2(1,15) = 29.14, p < .001): for VPs including objects, there was no ambiguity-related difference (ambiguous: 94.7 per cent; unambiguous: 96.1 per cent). VP-fronting that pied-pipes the subject was considered acceptable in 65.6 per cent of the trials when the subject was not overtly case marked, but only 51.4 per cent of the trials were rated as acceptable when the subject NP bore unambiguous case marking.

15.3.11.4 Discussion Experiments 3a and 3b show that there is a syntactic restriction against underlying subjects appearing in fronted VPs. They also show that local case ambiguities do not reduce acceptability. Furthermore, experiment 3b confirms our expectation that local ambiguities may increase acceptability: the temporary fulfilment of the constraint blocking subjects in preposed VPs in structures such as (15.20a) seems to render such structures more acceptable than examples such as (15.20c), in which the violation of the anti-subject restriction is obvious from the beginning. Experiment 3b is thus in line with experiments 1 and 2. However, locally ambiguous and unambiguous structures were equally acceptable in experiment 3a. We can only offer some speculations about the reasons for this difference from the other experiments.

The resolution of the local ambiguity in experiment 3 affects the assignment of grammatical functions and the interpretation of the sentence, while the number and person ambiguities had no such effect in experiments 1 and 2. The need for revising an initial interpretation may have negative consequences for acceptability that override positive effects of local ambiguity (experiment 3a). If this reduction of acceptability takes place in a time window later than the one used in experiment 3b, we would understand why the speeded acceptability rating task shows a positive impact of local ambiguity on acceptability.
15.4 Structural ambiguities
In experiments 1–3, the positive results of early computations increased the global acceptability of a sentence even when the outcome of these early computations had to be revised later. The experiments dealt with constructions in which some crucial item was morphologically ambiguous. In experiment 4, we investigated whether structural ambiguities in which morphological facts play no (decisive) role can also lead to increased acceptability.

Experiment 4 focused on structures that Kvam (1983) had used in his informal studies on long wh-movement constructions in German. He observed that the acceptability of such sentences depends on whether the grammatical features of the phrase having undergone long movement also match requirements imposed by the matrix verb on its arguments. Example (15.21) illustrates two structures in which the subject of a complement clause has been moved into the matrix clause. Example (15.21a) is more acceptable than (15.21b). When (15.21a) is parsed, was ‘what’ locally allows an analysis as an object of the matrix clause, while wer ‘who’ could function neither as the subject nor as the object of the matrix clause.

(15.21) a. was     denken    Sie,  dass  die  Entwicklung  beeinflusst  hat
           what    think     you   that  the  development  influenced   has
        b. wer     denken    Sie,  dass  die  Entwicklung  beeinflusst  hat
           who.sg  think.pl  you   that  the  development  influenced   has
           ‘who/what do you think influences the development’
The same logic underlies the contrast between the relative clauses in (15.22), in which the relative pronoun is extracted from an infinitival complement clause. In the more acceptable (15.22a), the relative pronoun die also fits the accusative case requirement of the predicate embedding the infinitive. In the less acceptable (15.22b), the dative case of the relative pronoun clashes with the case requirement of the embedding predicate.

(15.22) a. Eine  Kerze,  die      er  für  gut          hielt,  dem      Ludwig    zu  weihen
           a     candle  which    he  for  good         held    the.dat  Ludwig    to  dedicate
           ‘a candle which he considered good to dedicate to Ludwig’
        b. Eine  Frau,   der      er  für  angemessen   hielt,  ein      Geschenk  zu  geben
           a     woman   who.dat  he  for  appropriate  held    a        present   to  give
Such contrasts in acceptability are predicted by the hypothesis pursued here. The relatively low acceptability of the b-examples reflects the decrease in acceptability that long distance movement always seems to bring with it. This decrease is probably due to the processing problems which long distance movement creates. The a-examples, on the other hand, involve a local ambiguity: temporarily, the a-examples can be interpreted by the human parser as involving short distance movement only. Short distance movement is more acceptable than long distance movement. If the global acceptability of a clause reflects the status of intermediate parsing steps, long distance movement constructions that temporarily allow a short distance analysis (15.21a, 15.22a) should be more acceptable than those long distance movement constructions that do not involve such an ambiguity (15.21b, 15.22b).

15.4.1 Description of experiment 4
15.4.1.1 Material Experiment 4 consists of two subexperiments, one for wh-questions, the other for relative clauses. In the question subexperiment, the eight experimental items (4 per condition) had the structure illustrated in (15.21). The subject of a dass ‘that’ complement clause is moved into the matrix clause, which consists of a plural matrix verb and a pronominal subject. In the unambiguous wh-condition (15.21b), the subject extracted from the complement clause is nominative wer ‘who’. Because of its case, wer allows no intermediate analysis as the object of the matrix clause. Since it does not agree with the plural verb, wer cannot be analysed as the matrix subject either. In the ambiguous wh-condition (15.21a), the moved wh-pronoun was ‘what’ is case ambiguous. In its accusative interpretation, it could figure as the object of the matrix clause; in its (eventually mandatory) nominative interpretation, it is the subject of the complement clause.

The eight items of the relative clause subexperiment were constructed as illustrated in (15.22). A relative pronoun is extracted from an infinitival complement clause. In the unambiguous relative clause condition (15.22b), the dative case of the relative pronoun does not match the case requirements of the predicate embedding the infinitive. In the locally ambiguous relative clause condition (15.22a), the accusative relative pronoun is compatible with the case requirements of the embedding predicate.

15.4.1.2 Method Forty-eight students of the University of Potsdam participated.
They were paid for their participation or received course credits. The sixteen experimental items were among the distractor items of experiment 2a.

15.4.1.3 Results and discussion Figure 15.8 graphically represents the mean acceptability of locally ambiguous and unambiguous wh-questions. The acceptability of the locally ambiguous question is much higher than that of the unambiguous construction (F1(1,47) = 30.05, p < .001; F2(1,7) = 62.50, p < .001).

[Figure 15.8. Experiment 4–wh-questions. Mean acceptability (scale 1–7): unambiguous 3.15, ambiguous 4.7]

The wh-subexperiment of experiment 4 confirms that long distance wh-movement is not fully acceptable for speakers of Northern German (see also Fanselow et al. to appear). The fairly low acceptability value for the unambiguous wh-condition constitutes clear evidence for this. Furthermore, as in the preceding experiments, acceptability is significantly affected by the presence of a local ambiguity: if the sentence to be judged can temporarily be analysed as involving short movement, its acceptability goes up quite dramatically.

The initial segment of (15.21a) is locally ambiguous in more than one way. In addition to the possibility of interpreting was as a matrix clause object or an argument of the complement clause, was also allows for a temporary analysis as a wh-scope-marker in the German ‘partial movement construction’ illustrated in (15.23).

(15.23) was   denkst  Du   wen      Maria  einlädt
        what  think   you  who.acc  Mary   invites
        ‘who do you think that Mary invites?’

Therefore, we only know that the local ambiguity of (15.21a) increases its acceptability, but we cannot decide whether this increase is really due to the short versus long movement ambiguity. The relative clause subexperiment avoids this problem. In the grammatical context in which they appear in (15.22), the crucial elements die and der can only be analysed as relative pronouns. The only ambiguity is a structural one: long versus short movement of the relative pronoun. When the relative pronoun is temporarily compatible with a short movement interpretation, the structure is more acceptable than when the case of the relative pronoun clashes with the requirements of the matrix clause (F1(1,47) = 8.28, p < .01; F2(1,7) = 3.73, p = .10).

[Figure 15.9. Experiment 4–relative clauses. Mean acceptability (scale 1–7): unambiguous 4.17, ambiguous 4.76]

Both subexperiments thus show the expected influence of the local ambiguity on global acceptability: long distance movement structures are perceived as more acceptable when the wh-phrase or the relative pronoun involves a local ambiguity. The final subexperiment revealed that such effects show up even when the only ambiguity involved is the one between short and long movement.
15.5 Conclusions
The experiments reported in this paper have shown that the presence of a local ambiguity influences the overall acceptability of a sentence. If our interpretation of the results is correct, there is a spillover from the acceptability of the initial analysis of a locally ambiguous structure to the global acceptability of the complete construction. Structures violating some constraint may appear more acceptable if their parsing involves an intermediate analysis in which the crucial constraint seems fulfilled. Similar effects show up in further constructions, such as free relative clauses (see Vogel et al. in preparation).

At the theoretical level, several issues arise. First, the factors under which local ambiguities increase acceptability need to be identified. Secondly, means will have to be developed by which we can distinguish mitigating effects of local ambiguities from a situation in which the grammar accepts a feature clash if it has no morphological consequences. Thus, in contrast to what we investigated in experiment 2, plural NP coordinations such as the one in (15.24) that involve 1st and 3rd person NPs seem fully acceptable although they should involve a clash of person features. Perhaps the different status of (15.24) and the structures we studied in experiment 2 is caused by the fact that 1st and 3rd person plural verb forms are always identical in German, whereas the syncretisms studied above are confined to certain verb classes or certain tense forms. Similarly, the case clash for was in (15.25)4 has no negative effect on acceptability at all, in contrast to what happens in (15.26), and this is certainly due to the fact that inanimate pronouns never make an overt distinction between nominative and accusative case.

(15.24) entweder  wir  oder  die  Brasilianer  gewinnen  das  Spiel
        either    we   or    the  Brazilians   win       the  game

(15.25) was   Du   kaufst  ist  zu   teuer
        what  you  buy     is   too  expensive

(15.26) wer/wen  Du   mitbringst  ist  zu   schüchtern
        who      you  bring       is   too  shy
        ‘the person who you bring is too shy’

At a more practical level, our results certainly suggest that sentences involving a local ambiguity should be avoided when one tries to assess the acceptability of a construction one is interested in.

4 Kaufen assigns accusative case to was, while the matrix predicate requires nominative case.
16 What’s What?
Nomi Erteschik-Shir
16.1 What is the status of gradience?
My purpose in this chapter is to demonstrate that the source of graded acceptability judgements cannot be purely syntactic. Instead, such data are predicted by information structure (IS) constraints.1

1 Thanks to Gisbert Fanselow and an anonymous reviewer for their comments and to Tova Rapoport, Sofie Raviv, and the audience of the ‘Conference on Gradedness’ at the University of Potsdam for their feedback. This work is partially supported by Israel Science Foundation Grant #1012/03.

Since the early days of generative grammar, it has been observed that acceptability often patterns as squishes (e.g. Ross 1971). Ross’ explanation was couched in terms of the strength of the transformation, the strength of the construction (island), and the strength of the language. Danish was therefore considered to be a ‘strong’ language because it allowed extraction out of relative clauses, which were graded as ‘strong’ islands, by a ‘weak’ transformation such as wh-movement:

(16.1) Hvad for en slags is er der mange børn der kan li?
       what kind of ice cream are there many children who like

The idea that gradience is the result of the ‘strength’ of the processes (or constraints) involved has survived till today, particularly in some versions of optimality theory. Less sophisticated attempts at this type of theory were made in the 1990s: for example, the Empty Category Principle (ECP) was considered to be a stronger constraint and subjacency was considered to be a weak constraint. The violation of both these constraints together was predicted to result in a stronger grammaticality infraction than the violation of just one. This provided an explanation for the distinction between the examples in (16.2) and (16.3):

(16.2) a. ?This is the guy that I don’t know whether to invite t
       b. ?This is the guy that I don’t know whether I should ask t to come to the party.
       c. ?This is the guy that I asked whether Peter had seen t at the party.

(16.3) a. *This is the guy that I don’t know whether t should be asked to come to the party.
       b. *This is the guy that I asked whether t had seen Peter.

The hierarchy in (16.4) (Lasnik and Saito 1992: 88) illustrates that the strength of subjacency can be seen as depending on the number of barriers crossed. In the last of the three examples subjacency is doubly violated; in the others it is only singly violated.

(16.4) a. ??What did you wonder whether John bought?
       b. ?*Where did you wonder what John put?
       c. *Where did you see the book which John put?

This shows that in principle it is possible to have, as the output of the syntax, sentences of various levels of acceptability. The number of constraints violated, as well as the strength of the relevant constraint/s, will render different outputs.

However, as observed already in Ross (1971), the extraction data are complicated by the fact that not all processes of extraction render the same results. The sentences in (16.4a) and (16.4b) are worse than the ones in (16.2). An attempt to explain why wh-movement can render worse outputs than relativization is to be found in Cinque (1990). Cinque demonstrates that extracted phrases which can be interpreted as being d-linked render superior results to those in which the extracted phrases cannot be interpreted in this way. The type of extraction illustrated in (16.2) is more readily interpreted as being d-linked than simple wh-movement. D-linked wh-movement also improves the examples in (16.4a) and (16.4b), as shown in (16.5):

(16.5) a. ?Which book did you wonder whether John bought?
       b. ??Which place did you wonder what John put?

D-linking depends on whether a contextual referent for the wh-phrase is available.
Hence, in a context in which a set of relevant books (16.5a) or a set of relevant places (16.5b) is available, (16.4a) and (16.4b) should be as good as the examples in (16.5). Cinque builds this notion of referentiality into his syntax and predicts that when the context provides the required referent, the extraction should be perfectly acceptable. What is missing in Cinque’s approach is an explanation of why referentiality should interact with syntactic constraints, such as subjacency, in this way.

On the basis of squishy data of this type I argued in Erteschik-Shir (1973) that extraction is completely determined by IS constraints, in particular that only focus domains are transparent for purposes of extraction. The intuition behind this idea was that potential focus domains are processed differently from non-focus domains in that gaps are only visible in the former.2 In view of the fact that the availability of focus domains depends on context, the results will be graded according to the discourse into which the target sentence is embedded. Example (16.6) provides an illustration:

2 See Erteschik-Shir and Lappin (1983: 87) for this proposal, which also provides an explanation for why resumptive pronouns salvage islands.

(16.6) a. Who did John say that he had seen?
       b. ?Who did John mumble that he had seen?
       c. *Who did John lisp that he had seen?

Example (16.6b) is improved in a context in which ‘mumbling’ has been mentioned (e.g. following ‘At our meetings everyone always mumbles’). Example (16.6c) is acceptable in a context in which it is known that John lisps. This is because such a context enables the main verb to be defocused and consequently enables the subordinate that-clause to be focused. If intuitions are elicited out of context, judgements for sentences of this kind will depend on whatever context the informant happens to come up with. The examples in (16.7) illustrate other kinds of contextual factors that interact with focus assignment and the concomitant acceptability judgements.

(16.7) a. ??What did the paper editorialize that the minister had done?
       b. *What did you animadvert that he had done?

Example (16.7a) would sound much better if uttered by a member of an editorial board, and (16.7b) probably can’t be contextually improved due to the fact that highly infrequent items such as animadvert are necessarily focused. Contrastive contexts also interact with extraction judgements:

(16.8) a. ?Who did John SAY that he had seen? [= contrastive]
       b. Who did JOHN say that he had seen? [= contrastive]

Contrastive focus on the main verb, as in (16.8a), or on another constituent of the main clause, as in (16.8b), does not preclude focus on the subordinate clause. Therefore these sentences are fine with contrastive interpretations. The reason (16.8a) is slightly more degraded than (16.8b) is that it is harder to construe a likely context for it.
These examples illustrate that the positive response of informants is conditional on their ability to contextualize in such a way that the clause from which extraction has occurred is interpreted as a focus domain. In view of the fact that informants differ with respect to the contexts they are able to construct, the results across informants are predicted to be non-uniform.

A number of different syntactic solutions have been suggested over the years to account for such squishes in grammaticality. This type of solution does not, however, explain the gradience of the output, nor does it explain the contextual effects. Speakers’ judgements with respect to data of the kind illustrated in (16.6)–(16.8) are rarely stable: differences are found across speakers and sometimes the responses of the same speaker change. This type of instability occurs whenever grammaticality is context dependent because the judgements in such cases are also context dependent. Therefore, if a sentence of this type is presented to an informant out of context, it is judged good to the extent that the speaker can imagine a context in which the verb is defocused. The lowest grade will be assigned to a sentence for which the particular informant does not come up with a context which improves it. No syntactic account of data of the type in (16.6)–(16.8), even if it can predict gradience, will be able to predict the contexts which improve acceptability. Syntactic constraints will therefore always fail empirically.

Let us examine if this is indeed the case: it has been suggested that the that-clauses following verbs of manner-of-speaking, other than say, are adjuncts rather than complements (e.g. Baltin 1982) and adjuncts are, of course, islands. Extraction is therefore predicted to be blocked. Formulating the constraint on extraction out of that-clauses as a syntactic constraint on extraction out of adjuncts predicts that, with a particular verb, extraction will either be perfectly good, or totally bad.
Moreover, it will not allow for the influence of context in improving extractability. Supporting the ‘adjunct’ analysis is the fact that the that-clauses which are argued to be adjuncts are optional:

(16.9) John mumbled/lisped/*said.

This correlation, however, also follows from an analysis in terms of IS: the verbs that require a complement are light verbs which do not provide a focus. Sentences with such verbs without a complement are ruled out because they do not contain an informational focus, a minimal requirement for any sentence.

Another reason that the oddness of the sentences in (16.6)–(16.8) cannot be due to a syntactic constraint is that not only must a level of acceptability be assigned, but an account of the context dependency of the grammaticality judgements must also be given. Such an account is provided by a theory of IS which accounts for the contextual properties of sentences. I conclude that any phenomenon which varies with context among and across speakers cannot receive a syntactic account. An account in terms of IS is geared to predict this type of variation. It follows that syntactic constraints will always render ungraded results. A violation of a syntactic constraint will therefore be ungrammatical, whereas a violation of an IS constraint will be open to contextual variation and will therefore result in gradience. There will be no weak syntactic constraints, only strong ones. Examples of violations of such ‘real’ syntactic constraints are shown in (16.10):

(16.10) a. *John eat soup. (agreement)3
        b. *Eats John soup? (do)
        c. *John likes he. (case)

3 Timmermans et al. (2004) argue that agreement involves both a syntactic procedure and a conceptual-semantic procedure which affects person agreement with Dutch and German coordinated elements which differ in person features. The former, according to these authors, ‘hardly ever derails’. This is what I have in mind here. The fact that nonsyntactic procedures are also involved in certain agreement configurations is irrelevant to the point I’m making here.

If we adopt the proposal that violations of syntactic constraints cannot be graded, whereas violations of IS constraints can be, we can employ the presence of context-sensitive grammaticality squishes as a diagnostic for whether a syntactic or an IS constraint is involved: whenever context interacts with acceptability, the constraint cannot be syntactic. The answer to the question posed in the title is therefore that IS constraints can generate graded output whereas syntactic constraints cannot.

It is often assumed that IS constraints must be universal since they are based on universal concepts such as topic and focus. This is only partially true and depends on how the particular language codes these basic concepts. Danish is a language in which topics are fronted whenever possible. In English, however, topics are generally interpreted in situ. This difference triggers a difference in the application of island constraints in the two languages. Compare, for example, the following case of extraction out of a relative clause in the two languages (both are licensed by the IS constraint in (16.16) below):

(16.11) a. Den slags is er der mange der kan li.
           This kind icecream are there many who like
           ‘This kind of icecream there are many who like.’
        b. ?This is the kind of icecream that there are many people who like.
Example (16.11a) is perfect in Danish and sentences of this sort are common. Example (16.11b) is surprisingly good in English in view of the fact that it violates the complex NP constraint, yet it is not considered perfect by speakers of English. In Erteschik-Shir (1982) I offer more comparative data and illustrate that the acceptability squish in English is exactly the same as in Danish, yet all the examples in English are judged to be somewhat worse than their Danish counterparts.

In Erteschik-Shir (1997), I introduce a theory of IS, f(ocus)-structure theory. F-structure is geared to interact with syntax, phonology, and semantics and is therefore viewed as an integral part of grammar. Here I argue that this approach predicts gradience effects of various kinds. In section 16.2, I map out the theory of f-structure. Section 16.3 demonstrates the f-structure constraint on extraction. In section 16.4, I show that the same constraint which accounts for extraction also accounts for Superiority in English and the concomitant gradience effects. In section 16.5, I extend this account to explain different superiority effects in Hebrew, German, and Danish. Section 16.6 provides a conclusion.
16.2 Introduction to f(ocus)-structure theory
The primitives of f-structure are topic and focus.4 These features are legible to both interfaces: in PF this shows up across languages as intonational marking as well as f-structure motivated displacements. At the interpretative interface it allows for the calculation of truth values. Following Strawson (1964) and Reinhart (1981), I define topics as the ‘address’ in a file system under which sentences are evaluated (Erteschik-Shir 1997). If truth values are calculated with respect to the topic, it also follows that every sentence must have a topic. Topics are selected from the set of referents previously introduced in the discourse, which correspond to cards in the common ground. Topics are therefore necessarily specific: they identify an element in the common ground that the sentence is about. Whereas different types of focus have been defined in the literature (e.g. information focus, contrastive focus, broad focus, narrow focus), I propose only one type of focus which functions to introduce or activate discourse referents. The different types of focus are derived in this framework by allowing for multiple topic-focus assignments. As an illustration, examine the contrastive f-structure in (16.12):
(16.12) Itop read [{a BOOKfoc, a magazine}top]foc (not a magazine)
4 In Erteschik-Shir (1997) I assume that the output of syntax is freely annotated for topic and focus features. In Erteschik-Shir (2003), I introduce topic-focus features at initial merge on a par with φ-features in order to abide by the inclusiveness principle. The issue of how top/foc features are introduced into the grammar is immaterial to the topic of this paper.
In cases of contrast, a contrast set with two members is either discoursally available or else it is accommodated. In (16.12) ‘a book and a magazine’ form such a set. In view of the fact that this set is discoursally available, it provides a topic (as indicated by the top marking on the curly brackets). One of the members of this set is focused (in this case ‘a book’) and in this way, the set is partitioned, excluding the non-focused member of the set from the assertion. Since foci are stressed in English, stress is assigned to the contrasted element.5,6 Contrastive elements are thus marked as both topic (the discourse-available pair) and as focus (the selected element). Such an element can play the role of a focus in the sentence as a whole as it does in (16.12), and it can also function as a main topic, forming a contrastive topic; this is illustrated in (16.13):
(16.13) [{TOMfoc, Bill}top]top is handsome.
Example (16.13) asserts that Tom (and not Bill) is handsome. Not all f-structure assignments are equally good. Example (16.14) illustrates a well-known asymmetry: objects are harder to interpret as topics than subjects (in languages with fixed word order and no morphological marking of top/foc):7
(16.14) Tell me about John:
a. He is in love with Mary.
b. ??Mary is in love with him.
In view of the fact that this constraint figures prominently in languages such as English which have fixed word order, I propose that the reason for this asymmetry is that there is a preference for aligning f-structure with syntactic structure. The alignment is shown in (16.15):
5 See Erteschik-Shir (1997, 1999) for an account of intonation in which f-structure provides the input to a stress rule which assigns stress to foci.
6 Example (16.12) also illustrates that not all constituents of a sentence need be assigned either top or foc features. Here the verb is assigned neither and the sentence could provide an answer to a question such as ‘Did you read a book or a magazine?’
7 See, among others, Li and Thompson (1976); Reinhart (1981); Andersen (1991); and Lambrecht (1994).
(16.15) Canonical f-structure: SUBJECTtop [ . . . X . . . ]foc
In other words, an unmarked f-structure is one in which syntactic structure is isomorphic with f-structure: either the subject is the topic and the VP is the focus or there is a stage topic and the remaining sentence is the focus.8 It follows that a marked f-structure is one in which an object is the topic.
This section provided a brief outline of the discourse properties of f-structure.9 I now turn to f-structure constraints on syntax, constraints which provide a graded output.
16.3 F-structure constraints
As pointed out above, only focus domains are transparent for purposes of extraction. Erteschik-Shir (1997) argues that this constraint on extraction falls under a more general constraint on ‘I-(dentificational) dependencies’ which include anaphora, wh-trace dependencies, multiple wh-dependencies, negation and focus of negation, and copular sentences. What all these dependencies have in common is that the dependent is identified in the construction, either by its antecedent or by an operator. The constraint which governs I-dependencies is the Subject Constraint:
(16.16) An I-dependency can occur only in a canonical f-structure:
SUBJECTtop [ . . . X . . . ]foc
[I-dependency marked on X]
I-dependencies are thus restricted to f-structures in which the subject is the topic and the dependent is contained in the focus. The intuition behind this constraint is that dependents must be identified and that a canonical f-structure, in which f-structure and syntactic structure are aligned, enables the processing of this identification. In the case of wh-traces, for example, the trace must be identified with the fronted wh-phrase. The proposed constraint restricts such identification to canonical f-structures. The constraint is thus couched in processing terms in which f-structure plays a critical role.
8 Sentences uttered out-of-the-blue are contextually linked to the here-and-now of the discourse. I argue in Erteschik-Shir (1997) that such sentences are to be analysed as all-focus predicated of a ‘stage’ topic. The sentence It is raining, for example, has such a stage topic and is therefore evaluated with respect to the here-and-now. All-focus sentences also have a canonical f-structure in which the (covert) topic precedes the focus.
9 I have included only those aspects of f-structure strictly needed for the discussion in this chapter. See Erteschik-Shir (1997) for a more complete introduction to f-structure theory.
Let us first examine how the constraint applies to the graded extraction facts in (16.6)–(16.8). In Erteschik-Shir and Rapoport (in preparation), we offer a lexical analysis of verbs in terms of meaning components. We claim that verbs of speaking have a Manner (M) meaning-component. M-components are interpreted as adverbial modifiers, which normally attract focus. The M-component of ‘light’ manner-of-speaking verbs such as say is light, that is, there is no adverbial modification, and the verb cannot be focused. M-components can be defocused contextually, enabling focus on the subordinate clause, which then meets the requirement on extraction, since, according to the subject constraint, the dependent (the trace) must be contained in the focus domain. It follows that, out of context, only that-clauses under say allow extraction. All the other manner-of-speaking verbs require some sort of contextualization in order for the adverbial element of the verb to be defocused, thus allowing the subordinate clause to be focused. Extraction is judged acceptable in these cases to the extent that the context enables such a focus assignment. The subject constraint, which constrains dependencies according to whether the syntactic structure and the f-structure are aligned in a certain way, can, in cases such as this one, generate graded results. This is not always the case. Extraction out of sentential subjects is always ungrammatical and cannot be contextually ameliorated. Example (16.17) gives the f-structure assigned to such a case:
(16.17) *Who is [that John likes t]top [interesting]foc
In order to comply with the subject constraint, the subject, in this case a sentential one, must be assigned topic. Since dependents must be in the focus domain, they cannot be identified within topics and extraction will always be blocked. Although the subject constraint involves f-structure, it does not necessarily render graded results.
This is because the constraint involves not only f-structure but also the alignment of f-structure with syntactic structure. Sentential subjects are absolute islands because they are both IS topics and syntactic subjects.
16.4 Superiority
Superiority effects are graded as the examples in (16.18) show:
(16.18) a. Who ate what?
b. *What did who eat?
c. Which boy read which of the books?
d. Which of the books did which boy read?
e. ?What did which boy read?
f. ?*Which of the books did who read?
Superiority therefore provides a good test case to demonstrate how gradience is predicted by f-structure theoretical constraints. The answer to a multiple wh-question forms a paired list, as demonstrated in (16.19):
(16.19) Q: Who read what?
A: John read the Odyssey and Peter read Daniel Deronda.
Such an answer can be viewed as ‘identifying’ each object (answer to what) with one of the subjects (answer to who). In this sense the multiple wh-question itself forms an I-dependency in which one wh-phrase is dependent on the other. Superiority effects are the result of two I-dependencies in the same structure:
(16.20) *What did who read t
[diagram: two I-dependency links]
One I-dependency is between the fronted wh-phrase and its trace. The other one is between the two wh-phrases. As (16.20) illustrates, the dependent is identified in two different dependencies at once. This results in an interpretative clash, thus blocking the processing of the sentence. The subject constraint is not violated, however, since the subject wh-phrase can be assigned topic (the question ranges over a discourse-specified set; it is d-linked) and the trace can be analysed as within the focus domain. Since it is not f-structure assignment which rules out the sentence, context should not have an effect on cases of superiority. This prediction is false, however, as shown by the following well-known example:
(16.21) I know that we need to install transistor A, transistor B, and transistor C, and I know that these three holes are for transistors, but I’ll be damned if I can figure out from the instructions where what goes! (Pesetsky 1987, from Bolinger 1978).
The answer to this puzzle lies in a proper understanding of the distinction between d-linked and non-d-linked questions, though not the one proposed in Pesetsky (1987; see Erteschik-Shir 1986, 1997). Examples (16.22a) and (16.22b) illustrate a non-d-linked and a d-linked question, respectively, and the f-structure of each one:
(16.22) a. What did you choose?
[What] did youtop [choose t]foc
b. Which book did you choose?
[Which book]top [did you choose t]foc
In the non-d-linked question in (16.22a), the fronted wh-phrase and its trace form an I-dependency and the trace is interpreted as an anaphor. Such a question must therefore conform to the Subject Constraint. In (16.22b), however, the fronted wh-phrase functions as a topic in that it ranges over a contextually available set (of books). The trace can therefore be interpreted on a par with a coreferent pronoun, since the set over which it ranges is discoursally available. Since no I-dependency is defined, the subject constraint is not invoked, hence no superiority effects are predicted with which-phrases, which must be interpreted as d-linked. Questions with simple wh-phrases can be interpreted as being d-linked if the context provides a set over which they must range. That is why superiority violations such as (16.21) can be contextually ameliorated. They are always degraded, however. The reason is that both wh-phrases have to be interpreted as topics as shown in (16.23):
(16.23) [Where]top [[what]top [goes t]foc]foc
[I-dependency link between what and t]
The subject wh-phrase forms an I-dependency with the trace in order to render the pair-list reading. The subject constraint on I-dependencies requires the subject to be a topic. The fronted wh-phrase must be interpreted as a topic because otherwise it will form an I-dependency with the trace which will then be doubly identified as in (16.21). Bolinger’s detailed context allows for such an interpretation. The question will be viewed as degraded relative to whether the context forces a topic reading on both wh-phrases or not. Note that both wh-phrases must be interpreted as d-linked. Which-phrases are necessarily d-linked. Therefore, multiple wh-questions involving only which-phrases are perfect, as shown in (16.18d). When only one of the wh-phrases is a which-phrase, the other depends on context to receive a d-linked interpretation. This is why (16.18e) and (16.18f) are degraded.
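The reasoning about d-linking above can be restated as a small decision procedure. The following Python sketch is only an illustration and is not part of the chapter's formal apparatus: the function name, the three status labels, and the boolean `context_dlinks_rest` are assumptions introduced here, and the sketch deliberately collapses the finer distinction between ‘?’ and ‘?*’ in (16.18e) and (16.18f).

```python
def predicted_status(which_phrases: int, context_dlinks_rest: bool) -> str:
    """Coarse prediction for a superiority-violating question with two wh-phrases.

    which_phrases: how many of the two wh-phrases are which-phrases (0, 1 or 2).
    context_dlinks_rest: hypothetical flag, True if the context supplies a set
    for the remaining simple wh-phrases to range over.
    """
    if which_phrases not in (0, 1, 2):
        raise ValueError("expected 0, 1 or 2 which-phrases")
    if which_phrases == 2:
        # Which-phrases are necessarily d-linked, cf. (16.18d).
        return "acceptable"
    if which_phrases == 1 or context_dlinks_rest:
        # The remaining wh-phrase(s) depend on context, cf. (16.18e), (16.18f), (16.21).
        return "degraded"
    # No d-linking is available at all, cf. (16.18b).
    return "unacceptable"


for label, args in [("16.18b", (0, False)), ("16.18d", (2, False)),
                    ("16.18e", (1, False)), ("16.21", (0, True))]:
    print(label, predicted_status(*args))
```

The three-way output is a simplification: in the account above, the degree of degradation depends on how strongly the context forces a topic reading, which a boolean flag cannot express.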
The examples in (16.24) provide further evidence for the analysis of superiority effects proposed here. They illustrate that superiority effects also arise in single wh-questions when the subject is a nonspecific indefinite, that is a subject which cannot be interpreted as a topic:
(16.24) a. *What did a boy find?
b. (?)Which book did a boy find?
c. What did a certain boy find? (16.18e)
d. What did a BOY find?
e. What do boys like?
Example (16.24a) violates the Subject Constraint because the subject cannot be interpreted as a topic. In (16.24b), the fronted wh-phrase is d-linked and therefore does not form an I-dependency with its trace. It is degraded on a par with a sentence with an indefinite subject and a definite object as in (16.25):
(16.25) a. (?)A boy found the book.
b. A BOY found the book.
Example (16.25) is degraded because it is a non-canonical f-structure (cf. (16.15)) in which the object is the topic. Note that contrastive stress on the subject as in (16.25b) enables its interpretation as a topic, rendering a canonical f-structure. Examples (16.24c), (16.24d), and (16.24e) do not violate the subject constraint because specific, contrastive, and generic indefinite subjects are interpretable as topics. Kayne’s (1984) facts in (16.26) and (16.27) show that, surprisingly, an extra wh- improves superiority violations:
(16.26) What did [who]top [hide t where]foc
[two I-dependency links: what and t; who and where]
(16.27) Who knows what [who]top [saw t]foc
[two separate I-dependency links]
This is because the extra wh-phrase makes it possible to circumvent doubly identifying the trace. In (16.26), for example, the fronted wh-phrase forms an I-dependency with the trace. This dependency is licensed by the subject constraint since the subject is interpreted as a topic and the trace is embedded in the focus. Another I-dependency is formed between the two remaining wh-phrases. This I-dependency is also licensed by the subject constraint since the dependent is embedded in the focus. The presence of the extra wh-phrase enables the formation of two separate I-dependencies without forcing a double identification of the trace as in the classic case in (16.20). This is how the extra wh-phrase saves the construction. Although Kayne-type questions are an improvement on the classical case, they are still quite degraded. There are two reasons for this. First, the subject wh-phrase has to be contextualized as ranging over a topic set (due to the subject constraint). Second, the integration of the two separate dependencies poses a heavy processing load: one I-dependency in (16.26) is between who and where, allowing for the pair-list interpretation of these two wh-phrases. However, in order to process the question the fronted wh-phrase what must also be accommodated so that the interpretation of the question is that it asks for a ‘triple’ list-reading.10 The account of superiority effects proposed here thus affords an explanation of when context can improve acceptability and when it cannot and predicts the fine distinctions in acceptability evident in the English data.
16.5 Superiority in other languages
The account of the observed gradience in the English superiority data extends to other languages, once the nature of their canonical f-structures is determined. This section discusses Hebrew, German, and Danish data and shows that superiority effects are determined by the same considerations as in English. Differences are due to variation in the application of the subject constraint which is in turn determined by the particular canonical focus structure of the language in question.
16.5.1 Hebrew
The first observation concerning Hebrew is that although topicalization may result in OSV, superiority violations are licensed only in the order OVS, as shown in (16.28a) and (16.28b) from Fanselow (2004):
(16.28) a. ma kana mi?
what bought who
b. *ma mi kana?
c. mi kana ma?
Example (16.28a) is only licensed in a d-linked context in which a set of goods are contextually specified and (16.28c) requires a d-linked context in which a set of buyers are contextually specified. D-linking is not employed in Hebrew as a way to avoid double ID as it is in English.11 The fronted wh-phrase therefore does not form an I-dependency with its trace. It follows that only one I-dependency is at work in Hebrew multiple wh-questions, namely the one that renders the paired reading:
(16.29) a. mi kana ma
[I-dependency link between mi and ma]
b. ma kana mi
[I-dependency link between ma and mi]
10 Triple dependencies are not derivable in this framework, a desirable result since they do not render an optimal output.
11 There is no parallel to a ‘which-phrase’ in Hebrew. ‘eize X’ is best paraphrased as ‘what X’.
I conclude that the subject constraint is not operative in Hebrew as it is in English. This conclusion is also supported by the fact that adding a third wh-phrase not only does not help, as it does in English, but is blocked in all cases:
(16.30) a. *mi kana ma eifo?
b. *ma kana mi eifo?
c. *ma mi kana eifo?
The subject constraint constrains I-dependencies to the canonical f-structure of a particular language. In English, the canonical f-structure is one in which syntactic structure and f-structure are aligned. The fact that the OVS and SVO orders of (16.28a) and (16.28c) are equally good in Hebrew and that the OSV order of (16.28b) is ruled out may mean that it is the OSV word order which is the culprit. The difference between OSV and OVS in Hebrew is associated with the function of the subject when the object is fronted. When it is interpreted as a topic, it is placed preverbally and when it is focused, it is placed after the verb. The examples in (16.31)–(16.33) demonstrate that this is the case:
(16.31) a. et hasefer moshe kana.12
the-book Moshe bought
b. et hasefer kana moshe.
(16.32) a. *et hasefer yeled exad kana
the-book boy one bought
‘Some boy bought the book.’
b. et hasefer kana yeled exad.
(16.33) a. et hasefer hu kana.
the-book he bought
b. *et hasefer kana hu
Example (16.31) shows that a definite subject which can function as both a topic and a focus can occur both preverbally and postverbally. Example (16.32) shows that an indefinite subject which cannot be interpreted as a topic is restricted to the postverbal position. Example (16.33), in turn, shows that a subject pronoun, which must be interpreted as a topic, can only occur preverbally. Examples (16.31a) and (16.33a) also require contextualization in view of the fact that both the topicalized object and the preverbal subject are interpreted as topics. Since every sentence requires a focus, this forces the verb to be focused or else one of the arguments must be interpreted contrastively. In either case the f-structure is marked. To complete our investigation of the unmarked f-structure in Hebrew, we must also examine the untopicalized cases:
(16.34) a. moshe kana et hasefer/sefer
Moshe bought the-book/(a) book
b. ?yeled exad kana et hasefer
boy one bought the book
12 ‘et’ marks definite objects. mi (= ‘who’) in object position is most naturally marked with ‘et’ whereas ma (= ‘what’) is not. I do not have an explanation for this distinction.
The most natural f-structure of (16.34a) is one in which the subject is the topic and the VP or object is focused. Example (16.34b) with the definite object interpreted as a topic is marked.13 The results of both orders are schematized in (16.35):
(16.35) a. *Otop Sfoc V
b. ?Otop Stop V
c. Otop V Sfoc
d. *Otop V Stop
e. Stop V Ofoc
f. ?Sfoc V Otop
Examples (16.35c) and (16.35e) are the only unmarked cases. I conclude that the unmarked focus structure in Hebrew is one in which the topic precedes the verb and the focus follows it. Hebrew dependencies therefore do not depend on the syntactic structure of the sentence, but only on the linear order of topic and focus with respect to the verb. The (subject) constraint on I-dependencies which applies in Hebrew is shown in (16.36):
(16.36) An I-dependency can occur only in a canonical f-structure: Xtop V [ . . . Y . . . ]foc
Example (16.36) correctly rules out (16.28b) and predicts that both (16.28a) and (16.28c) are restricted to d-linked contexts (the initial wh-phrase must be a topic).
13 Example (i), in which both arguments are indefinite, is interpreted as all-focus:
(i) yeled exad kana sefer
boy one bought book
‘Some boy bought a book.’
In this chapter all-focus sentences are ignored. For an account of such sentences within f-structure theory, see Erteschik-Shir (1997).
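The linear condition in (16.36) is simple enough to check mechanically against the six orders in (16.35). The Python sketch below is offered only as an illustration under stated assumptions: the tuple encoding of constituents, the name `is_canonical`, and the `SCHEMA` dictionary are introduced here and are not part of the original analysis. The check is binary, so it separates the unmarked orders from the rest but does not distinguish ‘?’ from ‘*’.

```python
from typing import List, Optional, Tuple

# Each constituent is (label, mark); mark is "top", "foc", or None for the verb.
# This encoding is an assumption made for the illustration.
Constituent = Tuple[str, Optional[str]]


def is_canonical(order: List[Constituent]) -> bool:
    """True if a topic precedes the verb and a focus follows it, as in (16.36)."""
    labels = [label for label, _ in order]
    if "V" not in labels:
        return False
    v = labels.index("V")
    marks_before = [mark for _, mark in order[:v]]
    marks_after = [mark for _, mark in order[v + 1:]]
    return "top" in marks_before and "foc" in marks_after


# The six orders of (16.35), keyed by their example letters.
SCHEMA = {
    "a": [("O", "top"), ("S", "foc"), ("V", None)],
    "b": [("O", "top"), ("S", "top"), ("V", None)],
    "c": [("O", "top"), ("V", None), ("S", "foc")],
    "d": [("O", "top"), ("V", None), ("S", "top")],
    "e": [("S", "top"), ("V", None), ("O", "foc")],
    "f": [("S", "foc"), ("V", None), ("O", "top")],
}

canonical = [key for key, order in SCHEMA.items() if is_canonical(order)]
print(canonical)  # prints ['c', 'e']
```

Only (16.35c) and (16.35e) pass, matching the statement that these are the only unmarked cases.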
I conclude that multiple wh-questions in Hebrew are governed by the same considerations as they are in English. Differences between the two languages follow from their different canonical f-structures.
16.5.2 German
According to many authors, German lacks superiority effects. Wiltschko (1998) not only argues that this is not the case, but also explains why German superiority effects have been overlooked. One of the reasons she offers is that controlling for d-linking is difficult since ‘discourse-related contrasts are often rather subtle’ (1998: 443). Along these lines, Featherston (2005) performed an experiment in which informants were asked to grade the data according to an open-ended scale. His results showed that superiority effects are ‘robustly active’ in German. It turns out, then, that German does not differ significantly from English in this respect. Fanselow (2004), although aware of Featherston’s results, still distinguishes the status of English and German with respect to superiority effects. Fanselow points out that in German the superiority effect does appear when the subject wh-phrase is in Spec,IP (his (35)):
(16.37) a. wann hat’s wer gesehen
when has it who seen
b. ?*wann hat wer’s gesehen
‘who saw it when?’
In (16.37a) the subject follows the object clitic, indicating its VP-internal position. In (16.37b), it precedes the object clitic and so must be outside the VP. These data are reminiscent of the Hebrew facts just discussed: German subjects in Spec,IP must be interpreted as topics, whereas VP-internal subjects are interpreted as foci. D-linking is also required, as noted by Wiltschko. Fanselow (2004) gives the following illustration (his (42)):
(16.38) wir haben bereits herausgefunden
we have already found out
a. wer jemanden gestern anrief, und wer nicht
who.nom someone.acc yesterday called and who.nom not
b. wen jemand gestern anrief, und wen nicht
who.acc someone.nom yesterday called and who.acc not
Aber wir sind nicht eher zufrieden, bis wir auch wissen
But we are not earlier content until we also know
a’. wer WEN angerufen hat
who.nom who.acc called has
b’. wen WER angerufen hat
According to Fanselow, OSV order is licensed only if the object is discourse linked, but SOV order is also allowed in an out-of-the-blue multiple wh-question (his (43)):
(16.39) Erzähl mir was über die Party.
‘Tell me something about the party.’
a. Wer hat wen getroffen?
who.nom has who.acc met?
b. ??Wen hat wer getroffen
Fanselow’s example cannot, however, be considered out-of-the-blue. A party necessarily involves a set of participants. These are what the wh-phrases range over in the questions following the initial sentence. Since both wh-phrases range over the same set of party-participants, they are equivalent. Example (16.39a), in which no reordering has occurred, is therefore preferred. From this data I gather that the German canonical f-structure is similar to the one proposed for Hebrew, with only one small difference: German, too, requires that the first argument be the topic and the second be the focus, yet the status of the subject is determined differently: German subjects are interpreted as foci when they are VP-internal, and as topics when they are not, as shown in (16.37). The position of the subject is transparent only in the presence of adverbials or other elements that mark the VP boundary.14 In many of the examples in which such elements are absent, the linear position of the subject wh-phrase gives no clue as to its syntactic position. In those cases, the subject will be interpreted according to contextual clues. An I-dependency is licensed between two wh-phrases in German when the first one is interpreted as a topic and the second as a focus.
16.5.3 Danish
According to Fanselow (2004), Swedish does not exhibit superiority effects (his (12)):
(16.40) Vad koepte vem
what bought who
In Danish, the same question is degraded:
(16.41) a. Hvem købte hvad?
who bought what
b. ?Hvad købte hvem?
what bought who
14 See Diesing (1992) for this effect.
Overt d-linking significantly improves the question:
(16.42) Hvilken bog købte hvilken pige?
Which book bought which girl?
Danish may have a preference for overtly marking d-linked wh-phrases instead of just depending on contextual clues. Danish is like English in this respect, except that the preference in English is even stronger. Danish differs from English in that superiority effects in subordinate clauses are not ameliorated by overtly d-linked wh-phrases:
(16.43) a. *Jeg ved ikke hvad hvem købte
I know not what who bought
b. *Jeg ved ikke hvilken bog hvilken pige købte
I know not which book which girl bought
Danish generally marks the topic by fronting it to sentence-initial position. This is also the case if the topic is located in the subordinate clause. Topicalization within a subordinate clause is therefore excluded.15 It follows that whereas word order may signal the f-structure of the main clause, the order within subordinate clauses does not. This is the explanation I propose for the different behaviour of Danish main and subordinate clauses with respect to superiority effects. Scrambling languages such as German differ: scrambling positions topics outside the VP in subordinate clauses as well as main clauses. No difference between main and subordinate clauses is predicted in scrambling languages. This prediction is borne out for German.16 Fanselow (2004) rejects the idea that the availability of scrambling is what explains the lack of superiority effects because there are non-scrambling languages which also lack superiority effects. I would not be surprised if non-scrambling languages exhibit the same difference between main and subordinate clauses as Danish. Since the verb in Danish main clauses must appear in second position, the canonical f-structure is identical to the one proposed for Hebrew. The only difference between Hebrew and Danish is the preference for overtly d-linked wh-phrases.
What is common to the languages examined here is the need for d-linking of at least one of the wh-phrases in multiple wh-questions. That is why such questions are always sensitive to context and therefore exhibit gradience. Variation among languages follows from three parameters: the canonical f-structure, the availability of topicalization and scrambling processes, and the array of wh-phrases available in a particular language. As I have shown here, all three must be taken into account in order to predict the crosslinguistic distribution of superiority effects.
15 Topicalization is licensed in subordinate clauses under a few bridge-verbs such as think. In such cases the syntactically subordinate clause functions as a main clause.
16 Hebrew is like Danish in this respect. Since Hebrew is not a scrambling language, this is what is predicted. Since English is not a scrambling language, English should also exhibit a difference between main and subordinate clauses. This is not the case:
(i) Which book did which boy buy?
(ii) I don’t know which book which boy bought.
The difference between main and subordinate clauses in Danish arises because only in the former is f-structure marked by word order. English main clauses do not differ from subordinate clauses in this way. This may explain why no difference in superiority effects between main and subordinate clauses can be detected.
16.6 What’s what?
In this paper, I have shown that f-structure constraints, which are sensitive to context, generally result in gradient output. Speaker judgements, which are generally solicited out of context, depend on how likely it is for a given informant to contextualize the test sentence appropriately. This will be hard if the required f-structure is marked or if accommodation is necessary. The ability of speakers to contextualize appropriately will also vary. It follows that gradience within and across speakers is to be found whenever grammaticality is constrained by f-structure principles. I expect that the (subject) constraint on I-dependencies is universal and that its raison d’être is to enable the processing of the dependency. Sentences which exhibit a canonical f-structure are easy to process because they do not require complex contextualization. Dependencies also impose a processing burden. They are therefore restricted to structures which impose only a minimal processing burden themselves. Language variation follows from differences in canonical f-structure. The answer to my question ‘What’s what?’ is that gradience can only result when f-structure is involved. Violations of syntactic constraints necessarily cause strong grammaticality infractions, thus resulting in ungrammatical sentences. It follows that context-sensitive grammaticality squishes provide a diagnostic for whether a syntactic or a focus-structure constraint is involved: whenever context interacts with acceptability, the constraint cannot be syntactic. There is therefore no need for ‘weighted constraints’ in syntactic theory.
17 Prosodic Influence on Syntactic Judgements
YOSHIHISA KITAGAWA AND JANET DEAN FODOR
17.1 Introduction
It appears that there is a rebellion in the making, against the intuitive judgements of syntacticians as a privileged database for the development of syntactic theory.1 Such intuitions may be deemed inadequate because they are not sufficiently representative of the language community at large. The judgements are generally few and not statistically validated, and they are made by sophisticated people who are not at all typical users of the language. Linguists are attuned to subtle syntactic distinctions, about which they have theories. However, our concern in this paper is with the opposite problem: that even the most sophisticated judges may occasionally miss a theoretically significant fact about well-formedness. In the 1970s it was observed that in order to make a judgement of syntactic well-formedness one must sometimes be creative. It was noted that some sentences, such as (17.1), are perfectly acceptable in a suitable discourse context, and completely unacceptable otherwise (e.g. as the initial sentence of a conversation; see Morgan 1973).
(17.1) Kissinger thinks bananas.
Context: What did Nixon have for breakfast today?
Given the context, almost everyone judges sentence (17.1) to be well-formed. But not everyone is good at thinking up such a context when none is provided. That is not a part of normal language use. Hence out-of-context judgements are more variable, since they depend on the happenstance of what might or might not spring to the mind of the person making the judgement (see Schütze 1996: sect. 5.3.1). Our thesis is that prosodic creativity is also sometimes required in judging syntactic well-formedness when sentences are presented visually, that is when no prosodic contour is supplied. Consider sentence (17.2), which is modelled on gapping examples from Hankamer (1973). If (17.2) is read with a nondescript sort of prosody (the more or less steady fundamental frequency declination characteristic of unemphatic declarative sentences in English), it is likely to be understood as in (17.2a) rather than (17.2b).
(17.2) Jane took the children to the circus, and her grandparents to the ballgame.
a. . . . and Jane took her grandparents to the ballgame.
b. . . . and Jane’s grandparents took the children to the ballgame.
1 This work is a revised and extended version of Kitagawa and Fodor (2003). We are indebted to Yuki Hirose and Erika Troseth who were primarily responsible for the running of the experiments we report here, and to Dianne Bradley for her supervision of the data analysis. We are also grateful to the following people for their valuable comments: Leslie Gabriele, Satoshi Tomioka, three anonymous reviewers, and the participants of Japanese/Korean Linguistics 12, the DGfS Workshop on Empirical Methods in Syntactic Research, and seminars at Indiana University and CUNY Graduate Center. This work has been supported in part by RUGS Grant-in-Aid of Research from Indiana University.
If construed as (17.2a), ‘her grandparents’ in (17.2) is the object of the second clause, in which the subject and verb have been elided; this is clause-peripheral gapping. If construed as (17.2b), ‘her grandparents’ in (17.2) is the subject of the second clause, in which the verb and the object have been elided; this is clause-internal gapping. It demands a very distinctive prosodic contour, which readers are unlikely to assign to the word string (17.2) in the absence of any specific indication to do so. It requires paired contrastive accents on Jane and her grandparents, and on circus and ballgame, defocusing of the children, and a significant pause between the NP and the PP in the second clause. (See Carlson 2001 for relevant experimental data.) In a language with overt case marking, a sentence such as (17.2) would not be syntactically ambiguous. The (17.2b) analysis could be forced by nominative case marking on the ‘grandparents’ noun phrase. Case marking is not robust in English, but for English speakers who still command a reliable nominative/accusative distinction, the sentence (17.3a) can only be understood as peripheral gapping with ‘us grandparents’ as object, and (17.3b) can only be understood as clause-internal gapping with ‘we grandparents’ as subject.
(17.3) a. Jane took the children to the circus, and us grandparents to the ballgame.
b. Jane took the children to the circus, and we grandparents to the ballgame.
If these sentences are presented in written form, (17.3a) is very likely to be accepted as well-formed but (17.3b) may receive more mixed reactions. Readers are most likely to begin reading (17.3b) with the default prosody, and they may then be inclined to continue that prosodic contour through the
Gradience in Wh-Movement Constructions
second clause, despite the nominative ‘we’. If so, they might very well arrive at the peripheral-gap analysis and judge ‘we’ to be morphosyntactically incorrect on that basis. It might occur to some readers to try out another way of reading the sentence, but it also might not. The standard orthography does not mark the prosodic features required for gapping; they are not in the stimulus, but must be supplied by the reader, if the reader thinks to do so. Thus, grammaticality judgements on written sentences may make it appear that clause-internal gapping is syntactically unacceptable, even if in fact the only problem is a prosodic ‘garden path’ in reading such sentences. The way to find out is to present them auditorily, spoken with the highly marked prosody appropriate for clause-internal gapping, so that their syntactic status can be judged without interference from prosodic problems. The outcome of such a test might still be mixed, of course, if indeed not everyone accepts (this kind of) non-peripheral gapping, but at least it would be a veridical outcome, a proper basis for building a theory of the syntactic constraints on ellipsis. The general hypothesis that we will defend here is that any construction which requires a non-default prosody is vulnerable to misjudgements of syntactic well-formedness when it is read, not heard.2 It might be thought that reading, especially silent reading, is immune to prosodic influences, but recent psycholinguistic findings suggest that this is not so. Sentence parsing data for languages as diverse as Japanese and Croatian are explicable in terms of the Implicit Prosody Hypothesis (Fodor 2002a, 2002b): ‘In silent reading, a default prosodic contour is projected onto the stimulus. Other things being equal, the parser favors the syntactic analysis associated with the most natural (default) prosodic contour for the construction.’ In other words, prosody is always present in the processing of language, whether by ear or by eye.
And because prosodic structure and syntactic structure are tightly related (Selkirk 2000), prosody needs to be under the control of the linguist who solicits syntactic judgements, not left to the imagination of those who are giving the judgements. At least this is so for any construction that requires a non-default prosodic contour which readers may not be inclined to assign to it. We illustrate the importance of this methodological moral by considering a variety of complex wh-constructions in Japanese. In previous work we have argued that disagreements that have arisen concerning the syntactic
2 There is a fine line between cases in which a prosodic contour helps a listener arrive at the intended syntactic analysis, and cases in which a particular prosodic contour is obligatory for the syntactic construction in question. The examples we discuss in this paper are of the latter kind, we believe. But as the syntax–phonology interface continues to be explored, this is a distinction that deserves considerably more attention.
well-formedness of some of these constructions can be laid at the door of the non-default prosodic contours that they need to be assigned.
17.2 Japanese wh-constructions: syntactic and semantic issues
The constructions of interest are shown schematically in (17.4) and (17.5). For syntactic/semantic theory, the issues they raise are: (a) whether subjacency blocks operations establishing LF scope in Japanese; (b) whether (overt) long-distance scrambling of a wh-phrase in Japanese permits scope reconstruction at LF. For many years there was no consensus on these matters in the literature.
(17.4) Wh-in-situ: [ ------ [ ------ wh-XP ------ COMPSubord ] ------ COMPMatrix ]
(17.5) Long-distance scrambling: [ wh-XPi ------ [ ------ ti ------ COMPSubord ] ------ COMPMatrix ]
Consider first the situation in (17.4), in which a wh-phrase in a subordinate clause has not moved overtly. This wh-phrase could have matrix scope if appropriate covert operations are permitted: either movement of a wh-phrase to its scope position at LF, with or without movement of an empty operator at S-structure, or operator-variable binding of a wh-in-situ by an appropriate COMP. If subjacency were applicable to such scope-determining operations, it would prevent matrix scope when the subordinate clause is a wh-island, for example when the subordinate clause complementizer is -kadooka (‘whether’) or -ka (‘whether’). The scope of a wh-XP in Japanese must be marked by a clause-final COMPWH (-ka or -no).3 Thus, if the matrix clause complementizer were -no (scope-marker), and the subordinate clause complementizer were -kadooka, the sentence would be ungrammatical if subjacency applies to covert operations in Japanese: subordinate scope would be impossible for lack of a subordinate scope-marker, and matrix scope would be impossible because of subjacency. The applicability of subjacency to wh-in-situ constructions has significant theoretical ramifications (see discussion in Kuno 1973, Huang 1982, Pesetsky
3 The ambiguity of some complementizers will be important to the discussion below. For clarity, we note here that both -ka and -no are ambiguous. -ka can function as a wh-scope marker, COMPWH, in any clause, or as COMPWHETHER in subordinate clauses, and as a yes/no question marker Q in matrix clauses. -no can be an interrogative complementizer only in matrix clauses, where it can function either as COMPWH or as Q. For most speakers, -kadooka is unambiguously COMPWHETHER, although a few speakers can also interpret -kadooka as a wh-scope marker (COMPWH) in a subordinate clause.
1987, among others). It has been widely, although not universally, maintained that subjacency is not applicable to covert (LF) operations. Thus it would clarify the universal status of locality principles in syntax if this were also the case in Japanese. This is why it is important to determine whether sentences of this form (i.e. structure (17.4) where the subordinate complementizer is not a wh-scope marker) are or are not grammatical. We will argue that they are, and that contrary judgements are due to failure to assign the necessary prosodic contour. Example (17.5) raises a different theoretical issue, concerning the relation between surface position and scope at LF. Note first that the long-distance scrambling in (17.5) is widely agreed to be grammatical, even when the COMPSubord is -kadooka (‘whether’) or -ka (‘whether’). Thus, subjacency does not block scrambling (overt movement) from out of a wh-complement in Japanese (Saito 1985).4 What needs to be resolved is the possible LF scope interpretations of a wh-XP that has been scrambled into a higher clause. Does it have matrix scope, or subordinate scope, or is it ambiguous between the two? When a wh-XP has undergone overt wh-movement into a higher clause in a language like English, matrix scope is the only possible interpretation. But unlike overt wh-movement, long-distance scrambling in Japanese generally forces a ‘radically’ reconstructed interpretation, that is, a long-distance scrambled item is interpreted as if it had never been moved. (Saito 1989 describes this as scrambling having been ‘undone’ at LF; Ueyama 1998 argues that long-distance scrambling applies at PF.) If this holds for the scrambling of a wh-phrase, then subordinate scope should be acceptable in the configuration (17.5). However, there has been disagreement on this point.
We will maintain that subordinate scope is indeed syntactically and semantically acceptable, and judgements to the contrary are most likely due to a clash between the prosody that is required for the subordinate scope interpretation and the default prosody that a reader might assign. Thus, our general claim is that syntactic and semantic principles permit both interpretations for both constructions (17.4) and (17.5) (given appropriate complementizers), but that they must meet additional conditions on their PFs in order to be fully acceptable (see Deguchi and Kitagawa 2002 for details). We discuss the subjacency issue (relevant to construction (17.4)) in Section 17.3.1, and the reconstruction issue (relevant to construction (17.5)) in Section 17.3.2.
4 Saito argued, however, that subjacency does block overt scrambling out of a complex NP, and out of an adjunct. This discrepancy, which Saito did not resolve, remains an open issue to be investigated.
17.3 The prosody of wh-constructions in Japanese
Japanese wh-questions have a characteristic prosodic contour, called emphatic prosody (EPD) by Deguchi and Kitagawa (2002). The wh-XP is prosodically focused, and everything else in the clause which is its scope is de-focused. That is, there is an emphatic accent on the wh-item, and then post-focal ‘eradication’ (compression of pitch and amplitude range, virtually suppressing lexical and phrasal pitch accents) up to the end of the wh-scope.5 Importantly, this means that there is a correlation between the extent of the prosodic eradication and the extent of the syntactic/semantic scope of the wh-phrase. Subordinate wh-scope (= indirect wh-question) is associated with what Deguchi and Kitagawa called short-EPD, that is EPD which ends at the COMPWH of the subordinate clause. Matrix wh-scope (= direct wh-question) is associated with what Deguchi and Kitagawa called long-EPD, that is EPD which extends to the matrix COMPWH at the end of the utterance. (See Ishihara (2002) for a similar observation and see Hirotani (2003) for discussion of the role of prosodic boundaries in demarcating the wh-scope domain.) This is the case for all wh-constructions, regardless of whether the wh-phrase is moved or in situ, and whether or not it is inside a potential island.
17.3.1 Wh-in-situ
First we illustrate Deguchi and Kitagawa’s observation for wh-in-situ. In (17.6) and (17.7) we show a pair of examples which differ with respect to wh-scope, as determined by their selection of complementizers. In both examples the wh-phrase dare-ni (‘who-DAT’) is in situ and there is no wh-island, so there is no issue of a subjacency violation. What is of interest here is the relation between wh-scope and the prosodic contour. (In all examples below, bold capitals denote an emphatic accent; shading indicates the domain of eradication, accent marks indicate lexical accents that are unreduced, and " indicates a final interrogative rise.)
5 In this chapter we retain the term ‘eradication’ used in our earlier papers, but we would emphasize that it is not intended to imply total erasure of lexical accents. Rather, there is a post-focal reduction of the phonetic realization of accents, probably as a secondary effect of the general compression of the pitch range and amplitude in the post-focal domain. See Ishihara (2003) and Kitagawa (2006), where we substitute the term post-focal reduction. Also, we note that the utterance-final rise that is characteristic of a matrix question overrules eradication on the sentence-final matrix COMPWH. The prosodic descriptions given here should be construed as referring to standard (Tokyo) Japanese; there is apparently some regional variability.
(17.6) Short-EPD
Keesatu-wa [Mary-ga ano-ban DAre-ni denwasita-ka] ímademo sirabeteteiru.
Police-top Mary-nom that-night who-dat called-compwh even.now investigating
‘The police are still investigating who Mary called that night.’
(17.7) Long-EPD
Keesatu-wa [Mary-ga ano-ban DAre-ni denwasita-to] imademo kangaeteiru-no"?
Police-top Mary-nom that-night who-dat called-compthat even.now think-compwh
‘Who do the police still think that Mary called that night?’
When we gather acceptability judgements on these sentences we present them in spoken form with either the short-EPD or the long-EPD prosodic pattern. With the contours shown here the sentences are judged acceptable. If the two contours are exchanged, the sentences are judged to be extremely unnatural.6 (See Section 17.4 for some related experimental data.) Now let us consider examples that are similar to (17.6) and (17.7) but have a different selection of complementizers. In (17.8) and (17.9) the wh-phrase is in situ inside a wh-complement clause. The word strings are identical here, so (17.8) and (17.9) are lexically and structurally identical; only the prosodic contour differs between them. Observe that in both cases the complementizers (subordinate -kadooka, matrix -no) are compatible only with matrix scope. We thus predict that (17.8) with short-EPD will be judged unacceptable, while (17.9) with long-EPD will be judged acceptable. And this is indeed what informants’ judgements reveal when sentences are presented auditorily, with prosodic properties controlled. (We use # below to denote a sentence that is unacceptable with the indicated prosody.)
(17.8) Short-EPD
#Keesatu-wa [Mary-ga . . . ano-ban DAre-ni denwasita-kadooka] ímademo sirabeteteiru-no?
Police-top Mary-nom that-night who-dat called-compwh even.now investigating-q
a. ‘Who1 is such that the police are still investigating [whether Mary called him/her1 that night]?’
b. ‘Are the police still investigating [whether Mary called who that night]?’
6 Although this is generally true, Satoshi Tomioka notes (p.c.) that certain expressive modes (e.g. a strong expression of surprise) can disturb the prosody-scope correlation for long-EPD. This phenomenon needs further investigation. See also Hirotani (2003) for psycholinguistic data on the perception of long-EPD utterances.
(17.9) Long-EPD
Keesatu-wa [Mary-ga ano-ban DAre-ni denwasita-kadooka] imademo sirabeteteiru-no
Police-top Mary-nom that-night who-dat called-compwhether even.now investigating-compwh
‘Who1 is such that the police are still investigating [whether Mary called him/her1 that night]?’
Pronounced with long-EPD, (17.9) is acceptable and has matrix scope interpretation of the wh-phrase. Sentence (17.8) with short-EPD is not acceptable. It may be rejected on one of two grounds, as indicated in (a) and (b). Either a hearer attempts to interpret (17.8) with matrix wh-scope as in translation (17.8a), and would then judge the prosody to be inappropriate; or (17.8) is interpreted with subordinate wh-scope as in translation (17.8b), in line with the prosody, and the subordinate complementizer -kadooka (‘whether’) would be judged ungrammatical since it cannot be a wh-scope marker. As noted, however, there are some speakers who are able to interpret -kadooka as a wh-scope marker, and for them (17.8) is acceptable with subordinate scope, as expected. The fact that (17.9) is acceptable shows that matrix wh-scope is available when the sentence is pronounced with long-EPD. Thus it is evident that subjacency does not block scope extraction from a -kadooka clause. The unacceptability of (17.8) therefore cannot be due to subjacency. Only an approach that incorporates prosody can account for the contrast between the two examples. The confusion about the applicability of subjacency in Japanese is thus resolved. When appropriate prosody is supplied, grammaticality judgements show no effect of subjacency on the interpretation of wh-in-situ.7 The variable judgements reported in the literature are explicable on the assumption that when no prosody is explicitly provided, readers project their own prosodic contour. A reader of (17.8)/(17.9) who happened to project long-EPD would find the sentence acceptable on the matrix scope reading represented in (17.9). A reader who happened to project short-EPD would in effect be judging (17.8), and would be likely to find it unacceptable on the matrix scope reading (and also the subordinate scope reading). This judgement could create the impression that subjacency is at work.
As we discuss below, there are reasons why readers might be more inclined to project short-EPD than long-EPD for
7 See Deguchi and Kitagawa (2002) for evidence that long-EPD is not an exceptional prosody which permits scope extraction out of wh-islands by overriding subjacency.
wh-in-situ examples. If this is so (i.e. if short-EPD is the default prosody for this construction), it would encourage the misreading of this word string as (17.8) rather than as (17.9), and so would tilt readers toward a negative judgement.8
17.3.2 Long-distance-scrambled wh
The other data disagreement which needs to be resolved with respect to Japanese wh-constructions concerns the scope interpretation of a wh-XP that has undergone long-distance scrambling out of a subordinate clause. This was schematized in (17.5), repeated here, and is exemplified in (17.10).
(17.5) Long-distance scrambling: [ wh-XPi . . . [ . . . ti . . . COMPSubord ] . . . COMPMatrix ]
(17.10) Nani1-o John-wa [Mary-ga t1 tabeta-ka] siritagatteiru-no"?
what-acc John-top Mary-nom ate-compwhether/wh wants.to.know-compwh/-q
a. ‘Does John want to know what Mary ate?’
b. *‘What does John want to know whether Mary ate?’ (i.e. ‘Whati is such that John wants to know whether Mary ate iti?’)
As noted earlier, there is no evidence of any subjacency restriction on overt long-distance scrambling in this construction: a wh-phrase can be freely scrambled even out of a wh-island. But there has been disagreement in the literature concerning the LF scope of a long-distance-scrambled wh-XP. If scrambling of a wh-XP is subject to obligatory (or ‘radical’) reconstruction at LF, the scrambled wh-phrase in (17.10) would have to be interpreted in its underlying position, that is with the same scope possibilities as for an in-situ wh-phrase. We observed above that wh-in-situ can be interpreted with either subordinate-clause scope or matrix-clause scope, although with a preference for the former in reading, when prosody is not pinned down. However, Takahashi (1993) claimed to the contrary that only matrix wh-scope (i.e. interpretation (17.10b)) is acceptable in this construction.
8 In Kitagawa and Fodor (2003) we noted two additional factors that could inhibit acceptance of matrix scope for wh-in-situ: semantic/pragmatic complexity (the elaborate discourse presuppositions that must be satisfied); and processing load (added cost of computing the extended dependency between the embedded wh-phrase and a scope marker in the matrix clause). It seems quite likely that these conspire with the default prosody to create difficulty with the matrix scope reading. However, we will not discuss those factors here, because they cannot account for judgements on the wh-scrambling examples that we examine in the next section.
Unacceptability of the subordinate scope interpretation (17.10a) does not follow from subjacency or from any other familiar syntactic constraint. In order to account for it, Takahashi was driven to assume that sentences such as (17.10) are derived not by long-distance scrambling but by overt wh-movement, which (unlike scrambling) would not be ‘undone’ (i.e. would not be radically reconstructed) at LF. Although a clever notion, this does not mesh well with other observations about scrambling in Japanese and also in Korean (e.g. Kim 2000). That it is not the right approach is underscored by the observation (Deguchi and Kitagawa 2002) that when short-EPD is overtly supplied in spoken sentences, many speakers accept subordinate scope in examples such as (17.10). That is, (17.10a) is acceptable with short-EPD, although not with long-EPD; informants do, however, often sense a lingering awkwardness in (17.10a), for which we offer an explanation below. The mixed opinions on (17.10a) thus fall into place on an account that respects prosodic as well as syntactic constraints. The correlation of prosody and scope in informants’ judgements of spoken sentences is exactly as in the other examples noted above: short-EPD renders subordinate scope acceptable and blocks matrix scope, while with long-EPD matrix scope is acceptable and subordinate scope is not. Note, however, that to explain why it is (17.10a) rather than (17.10b) that raises disagreement when prosody is not specified, the prosodic account would have to assume that long-EPD is the prosody that readers naturally project onto the word string. However, we saw above that readers must prefer short-EPD if prosody is to provide an explanation for the mixed judgements on wh-in-situ. Apparently the phonological default flips between wh-in-situ constructions and wh-scrambled constructions. In the next section we consider why this would be so.
17.3.3 Which prosody is the default?
Many, perhaps most, judgements of syntactic well-formedness reported in the literature are made on written examples. No doubt this is largely for reasons of convenience, but perhaps also the intention is to exclude phonological factors from the judgement so that it can be a pure reflection of syntactic structure. However, if the implicit prosody hypothesis (Section 17.1) is correct, this is an unrealistic goal. Phonological factors cannot be excluded, because default prosody intrudes when no prosody is specified in the input. Thus judgements on visually presented sentences are not prosody-free judgements, but are judged as if spoken with default prosody. To provide a full explanation of why certain scope interpretations of Japanese wh-constructions tend to be disfavoured in
reading, we need prosodic theory to make predictions as to which prosody is the default for which construction. In particular, the observed preference for subordinate scope for wh-in-situ would be explained if readers tend to assign short-EPD rather than long-EPD to wh-in-situ constructions; and the observed preference for matrix clause scope for long-distance scrambled wh would be explained if readers tend to assign long-EPD in preference to short-EPD to scrambled wh constructions. Although it may have the flavour of a contradiction, this is in fact exactly what would be expected. Our proposal is that competition among various constraints at the PF interface yields a different prosodic default for scrambled wh than for wh-in-situ. In our previous work we have argued that short-EPD is phonologically more natural than long-EPD because the latter creates a long string of rhythmically and tonally undifferentiated material, which is generally dispreferred in natural language (see Selkirk 1984; Kubozono 1993). This implies that even where a grammar insists on prosodic eradication, the shorter it can be, the better it is. In support of this, Kitagawa and Fodor (2003) presented examples indicating that a sentence becomes progressively less natural as the extent of EPD is increased by adding extra material even within a single-clause construction. Comparable sentences but without wh and hence without EPD do not degrade in the same manner; thus the effect is apparently prosodic. A similar distaste for lengthy stretches of deprosodified material can be observed in English right dislocation constructions. The dislocated phrase requires prosodic eradication yet eradication that extends over more than a few words is disfavoured. This creates a clash which makes an example such as I really hated it, that fish that Mary tried to persuade me to eat at the French restaurant last night stylistically awkward.
A different sort of clash occurs when a long-distance-scrambled wh is to be pronounced with short-EPD in order to give it subordinate clause scope, as in (17.10a) (repeated below as (17.11a) with its prosody indicated). Although short-EPD is generally preferred, in the scrambled construction it traps an element of the matrix clause (John-wa; ‘John-TOP’ in (17.10)) between the scrambled XP and the rest of the subordinate clause.9 Prosodic
9 If the XP were scrambled to a position between any overt matrix items and the first overt element of the subordinate clause, no matrix item would be trapped. The resulting sentence would be ambiguous between local scrambling within the subordinate clause, and long-distance scrambling into the matrix clause, so it would provide no overt evidence that the scrambled phrase is located in the matrix clause in the surface form. In that case the example would not be useful for studying the prosodic and/or semantic effects of long-distance scrambling. Thus: any sentence that could be used to obtain informants’ judgements on the acceptability of subordinate clause scope for a long-distance scrambled wh would necessarily exhibit the entrapment which we argue favours long-EPD and hence matrix scope.
eradication proceeds from the focused wh phrase through to the end of the clause which is its scope. In the case of short-EPD, this will be from the surface position of the wh-XP to the end of the subordinate clause. Thus, the matrix topic John-wa in (17.10) will have its accent eradicated even though it is not in the intended syntactic/semantic scope of the wh-XP. This is represented in (17.11a).
(17.11) a. NAni1-o John-wa [Mary-ga t1 tabeta-ka] siritagátteiru-no"
what-acc John-top Mary-nom ate-compwh want.to.know-q
‘Does John want to know what Mary ate?’
b. NAni1-o John-wa [Mary-ga t1 tabeta-ka] siritagatteiru-no"
what-acc John-top Mary-nom ate-compwhether want.to.know-compwh
*‘What does John want to know whether Mary ate?’ (i.e. ‘Whati is such that John wants to know whether Mary ate iti?’)
Short-EPD as in (17.11a) is dispreferred. There is a mismatch between the inclusion of John-wa within the EPD domain, and the ending of the EPD at the subordinate COMP. This offends a very general preference for congruence between syntactic and prosodic structure, which encourages perceivers to assume a simple transparent relationship between prosody and syntax wherever possible. Thus we expect a preference for material in the prosodic eradication domain of a Japanese wh construction to be construed as being in the syntactic scope domain also. In the present case this preference for congruence can be satisfied only if the prosody assigned is long-EPD, which extends through both clauses, as in (17.11b). If (17.10) were presented in writing, a reader assigning implicit prosody, and having necessarily eradicated the accent in John-wa, would be likely to continue the eradication through the rest of the clause that includes John-wa. The result would be long-EPD, favouring a matrix scope interpretation. The advantage of long-EPD over short-EPD with respect to congruence for long-distance scrambled wh might outweigh the fact that long expanses of EPD are generally dispreferred.10 In support of this account of long-EPD as the preferred prosody for a scrambled-wh construction, we note that even for spoken sentences with
10 We noted above that semantic and processing factors may reinforce the prosodic default in the case of wh-in-situ. However, those factors would favour subordinate scope for scrambled wh as well as for wh-in-situ, as explained in Kitagawa and Fodor (2003). Thus, only the prosodic explanation makes the correct prediction for both contexts: a preference for subordinate scope for wh-in-situ and a preference for matrix scope for long-distance scrambled wh.
overt prosody, hearers (and even speakers!) sometimes complain that they can accept the subordinate scope interpretation only by somehow disregarding or ‘marginalizing’ the intervening matrix constituent. This is interesting. It can explain why subordinate scope is not always felt to be fully acceptable even with overt short-EPD, and it is exactly as could be expected given that the intrusion of this matrix constituent in the subordinate clause eradication domain is what disfavours the otherwise preferred short-EPD. The general conclusion is clear: when overt prosody is present, listeners can be expected to favour the syntactic structure congruent with the prosody and judge the sentence accordingly. When no overt prosody is in the input, as in reading, perceivers make their judgements on the basis of whatever prosodic contour they have projected. This is a function of various principles, some concerning the prosody–syntax interface, others motivated by purely phonological concerns (e.g. rhythmicity) which in principle should be irrelevant to syntax. However, a reader may proceed as if the mentally projected prosody had been part of the input, and then judge the syntactic well-formedness of the sentence on that basis. Although some astute informants may seek out alternative analyses, there is no compelling reason for them to do so, especially as the request for an acceptability judgement implies (contrary to the expectation in normal sentence processing for comprehension) that failure to find an acceptable analysis is a legitimate possibility. Therefore, any sentence (or interpretation of an ambiguous sentence) whose required prosodic contour does not conform to general prosodic patterns in the language is in danger of being judged ungrammatical in reading, although perceived as grammatical if spoken with appropriate prosody.
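The prosody–scope correlation described in Section 17.3 can be summarized as a small decision procedure. The Python sketch below is an illustration only and is not part of the original analysis. The names SUBORD_READINGS, MATRIX_READINGS, available_scopes, and predicted_acceptable are hypothetical, as is the label "decl" for a declarative matrix clause. The complementizer readings follow footnote 3 for the majority of speakers, for whom -kadooka is only ‘whether’. The sketch treats acceptability as binary, so it deliberately ignores gradient effects such as the lingering awkwardness of (17.11a).

```python
# Readings each complementizer allows, following footnote 3 (majority dialect).
# "decl" is a hypothetical label for a declarative matrix clause that carries
# no interrogative complementizer, as in (17.6).
SUBORD_READINGS = {
    "ka": {"WH", "WHETHER"},
    "kadooka": {"WHETHER"},
    "to": {"THAT"},
}
MATRIX_READINGS = {
    "ka": {"WH", "Q"},
    "no": {"WH", "Q"},
    "decl": set(),
}

# Short-EPD ends at the subordinate COMP; long-EPD runs to the matrix COMP.
SCOPE_FOR_EPD = {"short": "subordinate", "long": "matrix"}


def available_scopes(subord_comp, matrix_comp):
    """Return the wh-scopes licensed by the complementizers alone."""
    scopes = set()
    if "WH" in SUBORD_READINGS[subord_comp]:
        scopes.add("subordinate")
    if "WH" in MATRIX_READINGS[matrix_comp]:
        scopes.add("matrix")
    return scopes


def predicted_acceptable(subord_comp, matrix_comp, epd):
    """True if the scope signalled by the EPD type has a matching scope marker."""
    return SCOPE_FOR_EPD[epd] in available_scopes(subord_comp, matrix_comp)


if __name__ == "__main__":
    # (subordinate COMP, matrix COMP, EPD type) for the chapter's examples.
    examples = {
        "(17.6)": ("ka", "decl", "short"),
        "(17.7)": ("to", "no", "long"),
        "(17.8)": ("kadooka", "no", "short"),
        "(17.9)": ("kadooka", "no", "long"),
        "(17.11a)": ("ka", "no", "short"),
        "(17.11b)": ("ka", "no", "long"),
    }
    for label, args in examples.items():
        print(label, predicted_acceptable(*args))
```

On this sketch, (17.8) comes out unacceptable and (17.9) acceptable without any appeal to subjacency, which mirrors the argument of Section 17.3.1. The preference for one EPD type over the other in reading (Section 17.3.3) is a separate matter that the sketch does not model.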
17.4 Judgements for written and spoken sentences
17.4.1 Previous research
When we began this work on wh-scope interpretation in Japanese, we took it for granted that the relevance of prosody to acceptability, for some constructions at least, would be a familiar point and that the recent wave of psycholinguistic experiments on grammaticality judgements would have produced plenty of data in support of it. But we scoured the literature, most notably the volumes by Schütze (1996) and Cowart (1997), and found very few reports of grammaticality judgements on spoken sentences. Comments on using speech input for grammaticality judgements mostly concern differences in register between spoken and written language. Cowart also notes some practical
disadvantages of spoken input.11 Schütze cites a study by Vetter et al. (1979), which compared written presentation and auditory presentation, the latter with normal or monotone intonation. The sentence materials were diverse and the results concerning prosody were mixed; normal intonation had an effect in some cases only. The details may repay further investigation, but the sentence materials were not designed in a way that could shed light on our hypothesis that auditory presentation should aid judgements primarily for sentences needing non-default prosody.12 The only other study we know of that tested identical sentence materials in written and spoken form is by Keller and Alexopoulou (2001) on Greek word order, accent placement, and focus. This is a substantial investigation of six different word orders in declarative sentences, each in five different question contexts establishing a discourse focus. In the spoken sentences, accent position was also systematically varied. The magnitude estimation method (see Bard et al. 1996) was used to elicit judgements of ‘linguistic acceptability’, a term which was intentionally not defined for the participants. The results and conclusions are of considerable interest but are too numerous to review here. It is worth noting, however, that Keller and Alexopoulou underscore the significant contribution of prosody to acceptability for sentences involving focus, even in a language with considerable freedom of word order such as Greek. They write: ‘English relies on accent placement and only rarely on syntax . . . for discourse purposes. On the other hand, the literature on free word order languages . . . has emphasized the role of word order . . . We found that, at least in Greek, word order . . . plays only a secondary role in marking information structure; word order preferences can be overridden by phonological constraints’ (Keller and Alexopoulou 2001: 359–60).
Unfortunately for present purposes, no exact comparison can be made of the results for the reading condition and the listening condition, because there were other differences of method between the two experiments.

[11] We set aside here studies whose primary focus is judgements by second language learners; see Murphy (1997) and references there. Murphy found for English and French sentences that subjects (both native and L2 speakers) were less accurate with auditory presentation than with visual presentation, especially with regard to rejecting subjacency violations and other ungrammatical examples (cf. Hill’s observation noted below).

[12] Schütze also mentions an early and perhaps not entirely serious exploration by Hill (1961) of ten example sentences, eight of them from Chomsky (1957), judged by ten informants. For instance, the sentence ‘I saw a fragile of’ was accepted in written form by only three of the ten informants. In spoken form, with primary stress and sentence-final intonation on the word ‘of’, it was subsequently accepted by three of the seven who had previously rejected it. Some comments (e.g. ‘What’s an of?’) revealed that accepters had construed ‘of’ as a noun. Hill concluded, as we have done, that ‘intonation-pattern influences acceptance or rejection.’ However, his main concern, unlike ours, was over-acceptance of spoken examples. He warned that ‘If the intonation is right, at least enough normal speakers will react to the sentence as grammatical though of unknown meaning, to prevent convergent rejection.’ Our experimental data (see below) also reveal some tendency to over-accept items that are ungrammatical but pronounced in a plausible-sounding fashion, but we show that this can be minimized by simultaneous visual and auditory presentation.

17.4.2 Experimental findings: Japanese and English

17.4.2.1 Materials

Since the relevance of prosody to acceptability had not previously been broadly tested, we conducted an experiment on the two Japanese wh-constructions discussed above, with a related experiment on two constructions in English for purposes of comparison. In all four cases the target constructions were hypothesized to be fully acceptable only if assigned a non-default prosody (explicitly or implicitly). Our prediction was that they would be accepted more often when presented auditorily with appropriate prosody (the listening condition) than when presented visually without prosody (the reading condition).

The Japanese experiment was conducted by Kitagawa and Yuki Hirose. The target items were instances of constructions (17.4) and (17.5) above, with wh-in-situ and long-distance scrambled wh respectively. Each was disambiguated by its combination of matrix and subordinate complementizers toward what has been reported to be its less preferred scope interpretation: (a) subordinate wh-in-situ with forced matrix scope as in (17.12) below; (b) wh scrambled from the subordinate clause into the matrix clause, with forced subordinate scope as in (17.13).[13]

(17.12) Kimi-wa Kyooko-ga hontoowa [dare-o aisiteita-to] imademo omotteiru-no?
        you-top Kyooko-nom in.reality who-acc love-comp_that even-now thinking-comp_wh
        ‘Who do you still think that Kyoko in fact loves?’

(17.13) Nani_1-o aitu-wa [[Tieko-ga t_1 kakusiteiru-ka] boku-ga sitteiru-to] omotteiru-rasii-yo.
        what-acc that.guy-top Tieko-nom hiding-comp_wh I-nom know-comp_that thinking-seems-affirm
        ‘That guy seems to think that I know what Chieko is hiding.’
[13] An extra declarative clause was added in the sentences of type (17.13), structurally intermediate between the lowest clause, in which the wh-XP originated, and the highest clause, into which it was scrambled. The purpose of this was to prevent readers, at the point at which they encounter the -ka, from easily scanning the remainder of the sentence to see that no other possible scope marker is present. If they had at that point detected the absence of a scope marker in the matrix clause, they would inevitably have adopted a subordinate scope reading, and that would have inactivated any possible preference for the long-EPD/matrix scope reading.

In the listening test, the sentences were spoken with appropriate prosody: long-EPD for wh-in-situ examples such as (17.12), and short-EPD for fronted-wh examples such as (17.13). In the reading test, prosody was not mentioned, so readers were free to assign either prosody (or none at all). Our hypothesis that short-EPD is the default for wh-in-situ examples, and long-EPD the default for fronted-wh examples, predicted that the experimental sentences would be rejected more often when presented in written form than when spoken with appropriate contours.

We conducted a comparable experiment in English in order to provide some benchmarks for the Japanese study. The English experiment was conducted by Fodor with Erika Troseth, Yukiko Koizumi, and Eva Fernández. The target materials were of two types. One was ‘not-because’ sentences such as (17.14), with potentially ambiguous scope that was disambiguated by a negative polarity item in the because-clause, which would be ungrammatical unless that clause were within the scope of the negation.

(17.14) Marvin didn’t leave the meeting early because he was mad at anyone.

The second type of target sentence consisted of a complex NP (a head noun modified by a PP) and a relative clause (RC) as in (17.15), which was potentially ambiguous between high attachment to the head noun or low attachment to the noun inside the PP, but was disambiguated by number agreement toward high attachment.

(17.15) Martha called the assistant of the surgeons who was monitoring the progress of the baby.

For both of these constructions, as in the Japanese experiment, the disambiguation was toward an interpretation which has been claimed to require a non-default prosody. For the not-because construction, Frazier and Clifton (1996) obtained experimental results for written materials indicating that the preferred interpretation has narrow-scope negation, that is, the because-clause is outside the scope of the negation. (Unlike (17.14), their sentences had no negative polarity item forcing the wide-scope negation reading.)
That the dispreferred wide-scope-negation reading needs a special intonation contour is noted by Hirschberg and Avesani (2000). In their study, subjects read aloud contextually disambiguated examples, and the recordings were acoustically analysed. The finding was that the intonation contours for the (preferred) narrow-scope-negation ‘usually exhibit major or minor prosodic phrase boundaries before the subordinate conjunction’ and ‘usually were falling contours’. These are typical features of multi-clause sentences without negation. By contrast, Hirschberg and Avesani noted that the intonation contours for the (dispreferred) wide-scope-negation ‘rarely contain internal phrase boundaries’
and ‘often end in a “continuation rise”.’ This prosody, especially the sentence-final rise, is generally perceived to be highly marked for English. In our listening test this marked prosody was used. Thus, we predicted that the sentences would be perceived with wide-scope negation, which would license the negative polarity item, so that the sentences would be judged grammatical. If, instead, readers assigned the default prosody without these marked features, they might not spot the wide-scope-negation interpretation, and the negative polarity item would then seem to be ungrammatical.

For the RC construction in (17.15), experimental results by Cuetos and Mitchell (1988) have shown that the low-attachment reading is mildly preferred for ambiguous examples in English (although the opposite is true in Spanish). It has been suggested (Fodor 1998, 2002a, 2002b) that this is for prosodic reasons. It has been shown (Maynell 1999; Lovrič 2003) that a prosodic boundary before an RC promotes high attachment; but English (unlike Spanish) often has no prosodic break at the beginning of an RC. If English readers tend to assign a contour with no pre-RC break, that would encourage the low attachment analysis, so the verb in the RC in the experimental sentences would appear to have incorrect number agreement, and a judgement of ungrammaticality would ensue. In our listening test, we used the marked prosody with a prosodic break at the pre-RC position, to encourage high attachment.

The two English constructions tested in this experiment are useful because they differ considerably with respect to the degree of markedness of their less preferred prosodic contour: for (17.14) it is extreme; for (17.15) it is very slight. We chose these two constructions in the hope that they would allow us to bracket the sensitivity of the reading-versus-listening comparison, providing useful baselines for future research.
We predicted considerably lower acceptance rates in reading than in listening for the wide-scope interpretation of the not-because construction, but a much smaller difference, if any, for the high-attachment RC construction. The Japanese wh constructions were expected to fall between these end-points.

17.4.2.2 Method and presentation

In both experiments there were twelve of each of the two types of target sentence, and for each target type there were also twelve filler sentences (four grammatical, eight ungrammatical) for comparison with the targets; these ‘related fillers’ were superficially similar to the targets in structure but did not contain the critical ambiguity disambiguated to its non-preferred reading. In both experiments, the targets and their related fillers were presented in pseudo-random order among forty
assorted filler sentences (twenty grammatical, twenty ungrammatical) with completely different structures.

One group of subjects saw all sentences on a computer screen, one whole sentence at a time, with a timed exposure (nine seconds per sentence in the Japanese study; twelve seconds per sentence in the English one), and read them silently. Another group heard sound files of the same sentences, spoken with appropriate prosody by an instructed native speaker. For the English materials, there were twelve seconds between the onsets of successive spoken sentences as for the written sentences (although none of the spoken sentences occupied the full twelve seconds). For the Japanese spoken materials, the presentation time was from five to seven seconds, tailored to the length of the sentence. For English only, there was a third group of subjects who heard the sound files simultaneously with visual presentation, for twelve seconds per sentence.

Subjects in both experiments, thirteen in each presentation condition, were college students, native speakers of the language of the experimental materials. They made rapid grammaticality judgements by circling ‘YES’ or ‘NO’ (‘HAI’ or ‘IIE’ in the Japanese experiment) on a written response sheet. They were then allowed to revise this initial judgement if they wished to. This revision opportunity was afforded in order to prevent excessively thoughtful (slow) initial responses. In fact there were few revisions and we report only the initial judgements here.

17.4.2.3 Results

Acceptance rates (as percentages) are shown in Figures 17.1 to 17.5. What follows is a brief review of the experimental findings. We regard these results as preliminary, and plan to follow them up with more extensive studies, but we believe there are already outcomes of interest here, which we hope will encourage comparable studies on other constructions and in other languages.
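The design in section 17.4.2.2 can be tallied in a few lines. The following Python sketch is illustrative only: the names `Design`, `session_minutes`, and `acceptance_rate` are hypothetical, the item counts and per-sentence timings are those stated in the text, and the resulting durations are a rough lower bound on presentation time that ignores instructions, breaks, and revisions. The sample judgements at the end are invented to show the percentage calculation and are not data from the experiments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    """Item counts as stated in section 17.4.2.2 (names are hypothetical)."""
    target_types: int = 2               # two target constructions per language
    targets_per_type: int = 12
    related_fillers_per_type: int = 12  # 4 grammatical + 8 ungrammatical
    assorted_fillers: int = 40          # 20 grammatical + 20 ungrammatical

    def total_items(self) -> int:
        per_type = self.targets_per_type + self.related_fillers_per_type
        return self.target_types * per_type + self.assorted_fillers


def session_minutes(design: Design, seconds_per_item: float) -> float:
    """Presentation time only; an assumption-laden lower bound."""
    return design.total_items() * seconds_per_item / 60


def acceptance_rate(judgements: list[str]) -> float:
    """Percentage of 'YES' responses among initial judgements."""
    if not judgements:
        raise ValueError("no judgements supplied")
    return 100.0 * sum(j == "YES" for j in judgements) / len(judgements)


design = Design()
print(design.total_items())         # items per subject
print(session_minutes(design, 9))   # Japanese reading: 9 s per sentence
print(session_minutes(design, 12))  # English conditions: 12 s per sentence

# Invented responses, for illustration only (not experimental data).
print(acceptance_rate(["YES", "NO", "YES", "YES"]))
```

On these figures each subject judged 88 sentences, which is consistent with the short fixed exposure times the authors chose to keep initial responses rapid.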
Key to figures: In all the figures below, the percentage acceptance rates for target sentences (of each type named) are represented by horizontal stripes. The grammatical filler sentences that are related to the targets are represented by vertical stripes, and the ungrammatical fillers related to the targets are represented by dots. The assorted (unrelated) fillers are shown separately at the right.

In the Japanese data we see, as predicted, that the target sentences were accepted more often in listening than in reading (see the central bars for wh-in-situ and for matrix-scramble, across the two presentation conditions). The difference is not large but it is statistically significant (p < .01). Relatively speaking, the results are very clear: in the reading condition, the targets are intermediate in judged acceptability between their matched grammatical fillers and matched ungrammatical fillers, but in the listening condition they draw significantly closer to the grammatical fillers, supporting the hypothesis that the grammar does indeed license them, although only with a very particular prosody.

[Figure 17.1. Japanese reading, percent acceptance. Bar chart, scale 0 to 100; groups: wh-in-situ, matrix-scramble, assorted fillers.]

[Figure 17.2. Japanese listening, percent acceptance. Bar chart, scale 0 to 100; groups: wh-in-situ, matrix-scramble, assorted fillers.]

Aspects of the Japanese data that need to be checked in continuing research include the relatively poor rate of acceptance in reading for the matrix-scramble filler sentences,[14] and the lowered acceptance of all grammatical fillers in the listening condition. The general reduction in discrimination of grammatical versus ungrammatical filler items in listening is observed also in the English study and its cause is considered below.

[14] This result may dissolve in a larger-scale study. It was due here to only one of the four grammatical filler sentences related to the matrix-scramble experimental sentences. Unlike the other three, which were close to 100% acceptance, this sentence was accepted at an approximately 50% level. This one example had a matrix-scrambled/matrix-interpreted wh-phrase in a construction with three clauses, in which there were two intervening non-wh complementizers between the overt wh-phrase and its ultimate wh-scope marker. It is possible that in this multi-clause structure, the dispreference for very long-EPD outweighed the preference for syntax–prosody congruence, creating an apparent ungrammaticality.

We turn now to the English data, which show, as anticipated, that the benefit of spoken input depends on how marked the non-preferred prosody is.

[Figure 17.3. English reading, percent acceptance. Bar chart, scale 0 to 100; groups: not-because, RC-attachment, assorted fillers.]

[Figure 17.4. English listening, percent acceptance. Bar chart, scale 0 to 100; groups: not-because, RC-attachment, assorted fillers.]
[Figure 17.5. English simultaneous reading and listening, percent acceptance. Bar chart, scale 0 to 100; groups: not-because, RC-attachment, assorted fillers.]
For the not-because sentences, acceptance was extremely low in the reading condition, little better than for the matched ungrammatical fillers. In the listening condition there was a striking increase in acceptance for these sentences. It did not rise above 50 per cent, even with the appropriate prosody as described by Hirschberg and Avesani (2000). The reason for this was apparent in subjects’ comments on the materials after the experiment: it was often remarked that some sentences were acceptable except for being incomplete. In particular, the continuation rise at the end of the not-because sentences apparently signalled that another clause should follow, to provide the real reason for the event in question (e.g. Marvin didn’t leave the meeting early because he was mad at anyone; he left early because he had to pick up his children from school.)[15] This sense of incompleteness clearly cannot be ascribed in the listening condition to failure to assign a suitable prosodic contour. So it can be regarded as a genuine syntactic/semantic verdict on these sentences. Thus this is another case in which auditory presentation affords a clearer view of the syntactic/semantic status of the sentences in question. It seems that not-because sentences with wide-scope negation stand in need of an appropriate following discourse context, just as some other sentence types (such as (17.1) above) stand in need of an appropriate preceding discourse context.

[15] We have found that a suitable preceding context can obviate the need for the final rise, and with it the associated expectation of a continuation. For example, a final fundamental frequency fall on ‘at anyone’ is quite natural in: I have no idea what was going on that afternoon, but there’s one thing I do know: Marvin did not leave the meeting early because he was mad at anyone. However, it is still essential that there be no intonation phrase boundary between the not and the because-clause.

The RC-attachment sentences, on the other hand, showed essentially no benefit from auditory presentation. Acceptance in the reading condition was already quite high and it did not increase significantly in the listening condition. This could indicate that the prosodic explanation for the trend toward low RC-attachment in English is invalid. But equally, it might show only that this experimental protocol is not sufficiently discriminating to reveal the advantage of the appropriate prosody in this case where the difference is quite subtle. The familiar preference of approximately 60 per cent for low RC-attachment with written input is for fully ambiguous examples. For sentences in which the ambiguity is subsequently disambiguated (e.g. by number agreement, as in the present experiment), subjects may be able to recover quite efficiently from this mild first-pass preference once the disambiguating information is encountered. (See Bader 1998 and Hirose 2003 for data on prosodic influences on garden-path recovery in German and Japanese respectively.) In short: the present results for relative clause attachment do not contradict standard findings, although they also do not definitively support a prosody-based preference for low RC attachment in English reading. If prosody is the source of this preference, this experimental paradigm is not the way to show it. This is an informative contrast with the case of the not-because sentences, for which intuitive judgements are sharper and for which the prosodic cues in spoken sentences had a significant effect in this experimental setting.

An unwelcome outcome of the English study is that greater acceptance of the target sentences in the listening condition is accompanied by greater acceptance of the related ungrammatical filler sentences. It is conceivable, therefore, that these findings are of no more interest than the discovery that inattentive subjects can be taken in by a plausible prosodic contour applied to an ungrammatical sentence, as Hill (1961) suggested (see footnote 12).
However, it seems unlikely that this is all that underlies the considerable difference between reading and listening for the not-because sentences. A plausible alternative explanation is that listening imposes its own demands on perceivers, which may offset its advantages. Although auditory input provides informants with additional linguistically relevant information in the form of a prosodic contour, it also requires the hearer to perceive the words accurately and hold the sentence in working memory without the opportunity for either lookahead or review. Our methodology provided no independent assessment of whether errors of perception were more frequent for auditory than for visual input. It seems likely that this was so (although the converse might be the case for poor readers), since the distinction between grammatical and ungrammatical sentences often rested on a minor morphophonological contrast. In the English RC sentences the disambiguation turned on a singular versus plural verb, for example walk versus walks, which could have been misheard.
Although it may have a natural explanation, the ‘Hill effect’ is a potential disadvantage of auditory presentation for the purposes of obtaining reliable syntactic judgements, since it decreases the discrimination between grammatical and ungrammatical items. To the extent that it is due to persuasiveness of the prosodic contour, it cannot easily be factored out. But problems of auditory perceptibility and memory can be eliminated by presenting the sentence in written form while it is being heard. For the English sentences, the results for simultaneous visual and auditory presentation (see Figure 17.5) show that the mis-acceptance of ungrammatical sentences is substantially reduced, while the grammatical sentences are relatively unaffected or even improved. Thus, it appears that combined visual and auditory presentation optimizes both factors: perceptual accuracy and short-term memory are relieved of pressure, while the extra information in the auditory stimulus eliminates the need for prosodic creativity in reading sentences that require a non-default contour. Combined visual and auditory presentation will therefore be our next step in investigating the Japanese materials.
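One simple way to put a number on the ‘discrimination between grammatical and ungrammatical items’ discussed above is the difference between the two acceptance percentages for a given presentation mode. The Python sketch below is an assumption on our part, not the measure used in the chapter; the function name `discrimination` is hypothetical, and the percentages are placeholders invented for illustration, not the values plotted in Figures 17.3 to 17.5.

```python
def discrimination(gram_accept_pct: float, ungram_accept_pct: float) -> float:
    """Acceptance of grammatical fillers minus acceptance of ungrammatical
    fillers, in percentage points. Larger values mean the presentation mode
    separates the two classes of item more cleanly."""
    for value in (gram_accept_pct, ungram_accept_pct):
        if not 0.0 <= value <= 100.0:
            raise ValueError("percentages must lie between 0 and 100")
    return gram_accept_pct - ungram_accept_pct


# Placeholder percentages, invented for illustration only (NOT the study's data).
placeholder_rates = {
    "listening": (80.0, 40.0),
    "simultaneous": (85.0, 20.0),
}

for mode, (gram, ungram) in placeholder_rates.items():
    print(mode, discrimination(gram, ungram))
```

A measure of this kind makes the trade-off explicit: a mode that raises acceptance of targets but also raises acceptance of ungrammatical fillers may show no gain in discrimination at all.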
17.5 Conclusion

These experimental findings, although modest as yet, support the general moral that we were tempted to draw on the basis of informal judgements of written and spoken sentences. That is: acceptability judgements on written sentences are not purely syntax-driven; they are not free of prosody even though no prosody is present in the stimulus. This has a practical consequence for the conduct of syntactic research: more widespread use needs to be made of spoken sentences for obtaining syntactic well-formedness judgements. The ideal mode of presentation, as we have seen, provides both written and auditory versions of the sentence (e.g. in a PowerPoint file), to minimize perceptual and memory errors while making sure that the sentence is being judged on the basis of the prosody intended. We are sympathetic to the fact that this methodological conclusion entails more work for syntacticians (Cowart 1997: 64, warns that auditory presentation is ‘time-consuming to prepare and execute’), but it is essential nonetheless, at least for sentences whose prosody is suspected of being out of the ordinary in any way.
References

Abney, S. (1996) ‘Statistical methods and linguistics’, in J. Klavans and P. Resnik (eds), The Balancing Act: Combining Symbolic and Statistical Approaches to Language. Cambridge, MA: MIT Press, pp. 1–26. —— (1997) ‘Stochastic attribute-value grammars’, Computational Linguistics 23(4): 597–618. Albright, A. (2002) ‘Islands of reliability for regular morphology: Evidence from Italian’, Language 78: 684–709. —— and Hayes, B. (2002) ‘Modeling English past tense intuitions with minimal generalization’, in M. Maxwell (ed.), Proceedings of the 2002 Workshop on Morphological Learning. Philadelphia: Association for Computational Linguistics. —— and Hayes, B. (2003) ‘Rules vs. analogy in English past tenses: A computational/experimental study’, Cognition 90: 119–61. ——, Andrade, A., and Hayes, B. (2001) ‘Segmental environments of Spanish diphthongization’, UCLA Working Papers in Linguistics 7: 117–51. Alexopoulou, T. and Keller, F. (2003) ‘Linguistic complexity, locality and resumption’, in Proceedings of the 22nd West Coast Conference on Formal Linguistics. Somerville, MA: Cascadilla Press, pp. 15–28. Altenberg, E. P. and Vago, R. M. ms. (2002) ‘The role of grammaticality judgments in investigating first language attrition: A cross-disciplinary perspective’, paper presented at International Conference on First Language Attrition: Interdisciplinary Perspectives on Methodological Issues. Free University, Amsterdam, 22–24 August. Queens College and University of New York. Altmann, G. T. M. (1998) ‘Ambiguity in sentence processing’, Trends in Cognitive Sciences 2: 146–52. Andersen, T. (1991) ‘Subject and topic in Dinka’, Studies in Linguistics 15(2): 265–94. Anderson, J. R. (1990) The Adaptive Character of Thought. Hillsdale, NJ: Lawrence Erlbaum Associates. Antilla, A. (2002) ‘Variation and phonological theory’, in J. Chambers, P. Trudgill, and N. Schilling-Estes (eds), The Handbook of Language Variation and Change. Oxford: Blackwell, pp. 206–43. Anttila, A.
(1997) ‘Deriving variation from grammar’, in F. Hinskens, R. van Hout, and L. Wetzels (eds.), Variation, Change and Phonological Theory. Amsterdam: John Benjamins, pp. 35–68. Apoussidou, D. and Boersma, P. (2004) ‘Comparing two optimality-theoretic learning algorithms for Latin stress’, WCCFL 23: 29–42. Ariel, M. (1990) Accessing Noun Phrase Antecedents. London: Routledge.
Asudeh, A. (2001) ‘Linking, optionality, and ambiguity in Marathi’, in P. Sells (ed.), Formal and Empirical Issues in Optimality-Theoretic Syntax. Stanford, CA: CSLI Publications. Auer, P. (2000) ‘A European perspective on social dialectology’, talk presented at First International Conference on Language Variation in Europe (ICLaVE1). Barcelona, 1 July. —— (2005) ‘Europe’s sociolinguistic unity, or: a typology of European dialect/standard constellations’, in N. Delbecque, J. van der Auwera, D. Geeraerts (eds.), Perspectives on Variation. Sociolinguistic, Historical, Comparative. Trends in Linguistics. Studies and Monographs 163: Mouton de Gruyter, pp. 8–42. Avrutin, S. (2004) ‘Beyond narrow syntax’, in L. Jenkins (ed.), Variation and Universals in Biolinguistics. Amsterdam: Elsevier. Bach, E. and Harms, R. (1972) ‘How do languages get crazy rules?’, in R. Stockwell and R. Macauley (eds), Linguistic Change and Generative Theory. Bloomington, Indiana: Indiana University Press, pp. 1–21. Bader, M. (1996) Sprachverstehen. Syntax und Prosodie beim Lesen. Opladen: Westdeutscher Verlag. —— (1998) ‘Prosodic influences on reading syntactically ambiguous sentences’, in J. D. Fodor and F. Ferreira (eds), Reanalysis in Sentence Processing. Dordrecht: Kluwer, pp. 1–46. —— and Frazier, L. (2005) ‘Interpretation of leftward-moved constituents: Processing topicalizations in German’, Linguistics 43(1): 49–87. —— and Meng, M. (1999) ‘Subject–object ambiguities in German embedded clauses: An across-the-board comparison’, Journal of Psycholinguistic Research 28: 121–43. Bailey, T. M. and Hahn, U. (2001) ‘Determinants of wordlikeness: Phonotactics or lexical neighborhoods?’, Journal of Memory and Language, 43: 568–91. Baltin, M. R. (1982) ‘A landing site theory of movement rules’, Linguistic Inquiry 13(1): 2–38. Barbiers, S., Cornips, L., and van der Kleij, S. (eds) (2002) Syntactic Microvariation. Electronic publication of the Meertens Instituut. 
http://www.meertens.knaw.nl/projecten/sand/synmic/ Bard, E. G., Robertson, D., and Sorace, A. (1996) ‘Magnitude estimation of linguistic acceptability’, Language 72: 32–68. Barnes, J. and Kavitskaya, D. (2002) ‘Phonetic analogy and schwa deletion in French’, presented at the Berkeley Linguistic Society. Baum, L. E. (1972) ‘An inequality and associated maximization technique in statistical estimation for probabilistic functions of Markov processes’, Inequalities 3: 1–8. Beckman, M. (1996) ‘The parsing of prosody’, Language and Cognitive Processes 11: 17–67. —— (2003) ‘Input representations (inside the mind and out)’, in M. Tsujimura and G. Garding (eds), WCCFL 22 Proceedings. Somerville, MA: Cascadilla Press, pp. 101–25. Beckman, M. E. and Ayers-Elam, G. (1993) Guidelines for ToBI Labelling. http://www.ohio-state.edu/research/phonetics/E_ToBI/singer_tobi.html.
Beckman, M., Munson, B., and Edwards, J. (2004) ‘Vocabulary growth and developmental expansion of types of phonological knowledge’, LabPhon 9, pre-conference draft. beim Graben, P., Saddy, J. D., Schlesewsky, M., and Kurths, J. (2000) ‘Symbolic dynamics of event-related brain potentials’, Physical Review E 62: 5518–41. Belletti, A. (2004) ‘Aspects of the low IP area’, in L. Rizzi (ed.), The Structure of CP and IP. The Cartography of Syntactic Structures, Volume 2. Oxford: Oxford University Press, pp. 16–51. ——, Bennati, E., and Sorace, A. (2005) ‘Revisiting the null subject parameter from an L2 developmental perspective’, paper presented at the XXXI Conference on Generative Grammar, Rome, February 2005. Bentley, D. and Eythórsson, T. (2004) ‘Auxiliary selection and the semantics of unaccusativity’, Lingua 114: 447–71. Benua, L. (1998) ‘Transderivational Identity’, Ph.D. thesis, University of Massachusetts. Berent, I., Pinker, S., and Shimron, J. (1999) ‘Default nominal inflection in Hebrew: Evidence for mental variables’, Cognition 72: 1–44. Berg, T. (1998) Linguistic Structure and Change: An Explanation from Language Processing. Oxford: Clarendon Press. Berger, A., Della Pietra, S., and Della Pietra, V. (1996) ‘A maximum entropy approach to natural language processing’, Computational Linguistics 22(1): 39–71. Berko, J. (1958) ‘The child’s learning of English morphology’, Word 14: 150–77. Bever, T. G. (1970) ‘The cognitive basis for linguistic structures’, in J. R. Hayes (ed.), Cognition and the Development of Language. New York: John Wiley. Bierwisch, M. (1968) ‘Two critical problems in accent rules’, Journal of Linguistics 4: 173–8. —— (1988) ‘On the grammar of local prepositions’, in M. Bierwisch, W. Motsch, and I. Zimmermann (eds), Syntax, Semantik und Lexicon (= Studia Grammatica XXIX). Berlin: Akademie Verlag, pp. 1–65. Bini, M. (1993) ‘La adquisición del italiano: mas allá de las propiedades sintácticas del parámetro pro-drop’, in J.
Liceras (ed.), La linguistica y el analisis de los sistemas no nativos. Ottawa: Doverhouse, pp. 126–39. Birch, S. and Clifton, C. (1995) ‘Focus, accent, and argument structure: Effects on language comprehension’, Language and Speech 38: 365–91. Bishop, C. M. (1995) Neural Networks for Pattern Recognition. Oxford: Oxford University Press. Blancquaert, E., Claessens, J., Goffin, W., and Stevens, A. (eds) (1962) Reeks Nederlandse Dialectatlassen: Dialectatlas van Belgisch-Limburg en Zuid-Nederlands Limburg, 8. Antwerpen: De Sikkel. Blevins, J. (2004) Evolutionary Phonology: The Emergence of Sound Patterns. Cambridge: Cambridge University Press. Bod, R. (1998) Beyond Grammar: An Experience-Based Theory of Language. Stanford, CA: Center for the Study of Language and Information. ——, Hay, J., and Jannedy, S. (2003) Probabilistic Linguistics. Cambridge, MA: MIT Press.
Boersma, P. (1997) ‘How we learn variation, optionality, and probability’, Proceedings of the Institute of Phonetic Sciences of the University of Amsterdam 21: 43–58. —— (1998a) ‘Functional Phonology: Formalizing the Interactions between Articulatory and Perceptual Drives’, Ph.D. thesis, University of Amsterdam. —— (1998b) ‘Typology and acquisition in functional and arbitrary phonology’, MS, University of Amsterdam. http://www.fon.hum.uva.nl/paul/papers/typ_acq.pdf. —— (2000) ‘Learning a grammar in functional phonology’, in J. Dekkers, F. van der Leeuw, and J. van de Weijer (eds), Optimality Theory: Phonology, Syntax, and Acquisition. Oxford: Oxford University Press, pp. 465–523. —— (2001) ‘Phonology-semantics interaction in OT, and its acquisition’, in R. Kirchner, W. Wikeley, and J. Pater (eds), Papers in Experimental and Theoretical Linguistics, Vol. 6. Edmonton: University of Alberta, pp. 24–35. —— (2004) ‘A stochastic OT account of paralinguistic tasks such as grammaticality and prototypicality judgments’, unpublished manuscript, Rutgers Optimality Archive no. 648-0304. —— (2005) ‘Some listener-oriented accounts of hache aspiré in French’, MS, University of Amsterdam. Rutgers Optimality Archive 730. http://roa.rutgers.edu —— and Escudero, P. (2004) ‘Learning to perceive a smaller L2 vowel inventory: An optimality theory account’, Rutgers Optimality Archive 684. —— and Hayes, B. (2001) ‘Empirical tests of the gradual learning algorithm’, Linguistic Inquiry 32(1): 45–86. ——, Escudero, P., and Hayes, R. (2003) ‘Learning abstract phonological from auditory phonetic categories: An integrated model for the acquisition of language-specific sound categories’, Proceedings of the 15th International Congress of Phonetic Sciences, 1013–16. Boethke, J. (2005) ‘Kasus im Deutschen: Eine empirische Studie am Beispiel freier Relativsätze’, Diploma thesis, Institute of Linguistics, University of Potsdam. Bolinger, D. L. (1961a) Generality, Gradience and the All-or-None.
The Hague: Mouton. —— (1961b) ‘Syntactic blends and other matters’, Language 37: 366–81. —— (1972) ‘Accent is predictable (If you’re a mind-reader)’, Language 48: 633–44. —— (1978) ‘Asking more than one thing at a time’, in H. Hiz (ed.), Questions. Dordrecht: Reidel. Borer, H. (1994) ‘The projection of arguments’, in E. Benedicto and J. Runner (eds), Functional Projections. University of Massachusetts, Amherst: Occasional Papers 17. Bornkessel, I., Schlesewsky, M., McElree, B., and Friederici, A. D. (2004a) ‘Multidimensional contributions to garden-path strength: Dissociating phrase structure from case marking’, Journal of Memory and Language 51: 495–522. ——, Fiebach, C. J., Friederici, A. D., and Schlesewsky, M. (2004b) ‘Capacity reconsidered: Interindividual differences in language comprehension and individual alpha frequency’, Experimental Psychology 51: 279–89. ——, McElree, B., and Schlesewsky, M. (submitted). ‘On the time course of reanalysis: The dynamics of verb-type effects’.
Brants, T. and Crocker, M. W. (2000) ‘Probabilistic parsing and psychological plausibility’, in Proceedings of the 18th International Conference on Computational Linguistics. Saarbrücken/Luxembourg/Nancy. Bresnan, J. (2000) ‘The emergence of the unmarked pronoun’, in G. Legendre, J. Grimshaw, and S. Vikner (eds), Optimality-Theoretic Syntax. Cambridge, MA: MIT Press, pp. 113–42. ——, Dingare, S., and Manning, C. D. (2001) ‘Soft constraints mirror hard constraints; Voice and person in English and Lummi’, in M. Butt and T. Holloway King (eds), Proceedings of the LFG01 Conference. Stanford University, Stanford: CSLI Publications, pp. 13–32. Briscoe, T. and Carroll, J. (1993) ‘Generalised probabilistic LR parsing for unification-based grammars’, Computational Linguistics 19: 25–60. Broekhuis, H. and Cornips, L. (1994) ‘Undative constructions’, Linguistics 32(2): 173–89. —— and Cornips, L. (1997) ‘Inalienable possession in locational constructions’, Lingua 101: 185–209. Browman, C. and Goldstein, L. (1992) ‘Articulatory phonology: An overview’, Phonetica 49: 155–80. Brysbaert, M. and Mitchell, D. C. (1996) ‘Modifier attachment in sentence parsing: Evidence from Dutch’, Quarterly Journal of Experimental Psychology 49A: 664–95. Büring, D. (1997) The Meaning of Topic and Focus – The 59th Street Bridge Accent. London: Routledge. —— (2001) ‘Let’s phrase it!—Focus, word order, and prosodic phrasing in German double object constructions’, in G. Müller and W. Sternefeld (eds), ‘Competition in Syntax’, No. 49 in Studies in Generative Grammar. Berlin and New York: de Gruyter, pp. 101–37. Burnage, G. (1990) CELEX—A Guide for Users. Nijmegen: Centre for Lexical Information, University of Nijmegen. Burnard, L. (1995) Users Guide for the British National Corpus. British National Corpus Consortium, Oxford University Computing Service. Burton-Roberts, N., Carr, P., and Docherty, G. (2000) Phonological Knowledge: Conceptual and Empirical Issues. New York: Oxford University Press. 
Burzio, L. (1986) Italian Syntax: A Government-Binding Approach. Dordrecht: Foris. —— (2002) ‘Surface-to-surface morphology: When your representations turn into constraints’, in P. Boucher (ed.), Many Morphologies. Somerville, MA: Cascadilla Press, pp. 142–77. Bybee, J. L. (1994) ‘A view of phonology from a cognitive and functional perspective’, Cognitive Linguistics 5(4): 285–305. —— (2000a) ‘Lexicalization of sound change and alternating environments’, in M. B. Broe and J. B. Pierrehumbert (eds), Acquisition and the Lexicon: Papers in Laboratory Phonology V. Cambridge: Cambridge University Press, pp. 250–69. —— (2000b) ‘The phonology of the lexicon: Evidence from lexical diffusion’, in M. Barlow and S. Kemmer (eds), Usage-Based Models of Language. Palo Alto: CSLI Publications.
Bybee, J. L. (2001) Phonology and Language Use. Cambridge: Cambridge University Press. —— (2003) ‘Mechanisms of change in grammaticalization: The role of frequency’, in R. D. Janda and B. D. Joseph (eds), Handbook of Historical Linguistics. Oxford: Blackwell. —— and Hopper, P. (eds) (2001) Frequency and the Emergence of Linguistic Structure. Amsterdam: John Benjamins. —— and Moder, C. L. (1983) ‘Morphological classes as natural categories’, Language 59: 251–70. Carden, G. (1976) ‘Syntactic and semantic data: Replication results’, Language in Society 5: 99–104. Cardinaletti, A. and Starke, M. (1999) ‘The typology of structural deficiency. A case study of the three classes of pronouns’, in H. van Riemsdijk (ed.), Clitics in the Languages of Europe, Vol. 8 of Language Typology. Berlin: Mouton de Gruyter. Carlson, K. (2001) ‘The effects of parallelism and prosody in the processing of gapping structures’, Language and Speech 44: 1–26. Carroll, G. and Rooth, M. (1998) ‘Valence induction with a head-lexicalized PCFG’, in Proceedings of the Conference on Empirical Methods in Natural Language Processing. Granada, pp. 36–45. Cedergren, H. J. and Sankoff, D. (1974) ‘Variable rules: Performance as a statistical reflection of competence’, Language: Journal of the Linguistic Society of America 50: 333–55. Cennamo, M. and Sorace, A. (in press) ‘Unaccusativity at the syntax-lexicon interface: Evidence from Paduan’, to appear in R. Aranovich (ed.), Cross-linguistic Perspectives on Auxiliary Selection. Amsterdam: John Benjamins. Charniak, E. (2000) ‘A maximum-entropy-inspired parser’, in Proceedings of the 1st Conference of the North American Chapter of the Association for Computational Linguistics. Seattle, WA, pp. 132–9. Chater, N., Crocker, M., and Pickering, M. (1998) ‘The rational analysis of inquiry: The case for parsing’, in N. Chater and M. Oaksford (eds), Rational Models of Cognition. Oxford: Oxford University Press, pp. 44–468. Chen, M. 
(1970) ‘Vowel length variation as a function of the voicing of consonant environment’, Phonetica 22: 129–59. Chomsky, N. (1955) The Logical Structure of Linguistic Theory. Published (in part) as Chomsky (1975). —— (1957) Syntactic Structures. The Hague: Mouton. —— (1964) ‘Degrees of grammaticalness’, in J. A. Fodor and J. J. Katz (eds), The Structure of Language: Readings in the Philosophy of Language. Englewood Cliffs, NJ: Prentice-Hall, pp. 384–9. —— (1965) Aspects of the Theory of Syntax. Cambridge, MA: MIT Press. —— (1975) The Logical Structure of Linguistic Theory. New York: Plenum Press. —— (1981) Lectures on Government and Binding. Dordrecht: Foris.
Chomsky, N. (1986) Knowledge of Language. Its Nature, Origin, and Use. New York/ Westport/London: Praeger. —— (1995) The Minimalist Program. Cambridge, MA: MIT Press. —— and Halle, M. (1968) The Sound Pattern of English. New York: Harper and Row. —— and Miller, G. A. (1963) ‘Introduction to the formal analysis of natural languages’, in R. D. Luce, R. R. Bush, and E. Galanter (eds), Handbook of Mathematical Psychology, volume II. New York: John Wiley. Christiansen, M. H. and Chater, N. (1999) ‘Connectionist natural language processing: The state of the art’, Cognitive Science 23: 417–37. —— and Chater, N. (2001) ‘Connectionist psycholinguistics: Capturing the empirical data’, Trends in Cognitive Sciences 5: 82–8. Cinque, G. (1990) Types of A’-Dependencies. Cambridge, MA: MIT Press. —— (1993) ‘A null theory of phrase and compound stress’, Linguistic Inquiry 24: 239–97. Clahsen, H. and Felser, C. (in press) ‘Grammatical processing in language learners’, to appear in Applied Psycholinguistics. Clements, G. N. (1992) ‘Phonological primes: Gestures or features?’, Working Papers of the Cornell Phonetics Laboratory 7: 1–15. Coetzee, A. (2004) ‘What it Means to be a Loser: Non-Optimal Candidates in Optimality Theory’, Ph.D. thesis, University of Massachusetts. Cohn, A. (1990) ‘Phonetic and Phonological Rules of Nasalization’, Ph.D. thesis, UCLA, distributed as UCLA Working Papers in Phonetics 76. —— (1993) ‘Nasalization in English: Phonology or phonetic’, Phonology 10: 43–81. —— (1998) ‘The phonetics-phonology interface revisited: Where’s phonetics?’, Texas Linguistic Forum 41: 25–40. —— (2003) ‘Phonetics in phonology and phonology in phonetics’, paper presented at 11th Manchester Phonology Meeting, Manchester, UK. ——, Brugman, J., Clifford, C., and Joseph, A. (2005) ‘Phonetic duration of English homophones: An investigation of lexical frequency effects’, presented at LSA, 79th meeting, Oakland, CA. Coleman, J. and Pierrehumbert, J. B. 
(1997) ‘Stochastic phonological grammars and acceptability’, in Computational Phonology: Third Meeting of the ACL Special Interest Group in Computational Phonology. Somerset, NJ: Association for Computational Linguistics, 49–56. Coles, M. G. H. and Rugg, M. D. (1995) ‘Event-related brain potentials: An introduction’, in M. D. Rugg and M. G. H. Coles (eds), Electrophysiology of Mind: Event-Related Brain Potentials and Cognition. Oxford, UK: Oxford University Press, pp. 1–26. Collins, M. (1999) ‘Head-Driven Statistical Models for Natural Language Parsing’, Ph.D. thesis, University of Pennsylvania, Philadelphia, PA. Connine, C. M., Ferreira, F., Jones, C., Clifton, C., and Frazier, L. (1984) ‘Verb frame preferences: Descriptive norms’, Journal of Psycholinguistic Research 13: 307–19.
Corley, S. and Crocker, M. (2000) ‘The modular statistical hypothesis: Exploring lexical category ambiguity’, in M. Crocker, M. Pickering, and C. Clifton (eds), Architectures and Mechanisms for Language Processing. Cambridge: Cambridge University Press. Cornips, L. (1996) ‘Social stratification, linguistic constraints and inherent variability in Heerlen Dutch: The use of the complementizers om/voor’, in J. Arnold et al. (eds), Sociolinguistic Variation: Data, Theory, and Analysis. Selected papers from NWAVE- 23, CSLI Publications: Stanford University, pp. 453–67. —— (1998) ‘Syntactic variation, parameters and their social distribution’, Language Variation and Change 10(1): 1–21. —— and Corrigan, K. (2005) ‘Convergence and divergence in grammar’, in P. Auer, F. Hinskens, and P. Kerswill (eds), The Convergence and Divergence of Dialects in Contemporary Societies. Cambridge: Cambridge University Press. —— and Hulk, A. (1996) ‘Ergative reflexives in Heerlen Dutch and French’, Studia Linguistica 50(1): 1–21. —— and Jongenburger, W. (2001) ‘Elicitation techniques in a Dutch syntactic dialect atlas project’, in H. Broekhuis and T. van der Wouden (eds), Linguistics in The Netherlands 2001, 18. Amsterdam/Philadelphia: John Benjamins, pp. 57–69. —— and Poletto, C. (2005) ‘On standardising syntactic elicitation techniques. Part I’, Lingua 115(7): 939–57. Cowart, W. (1997) Experimental Syntax: Applying Objective Methods to Sentence Judgments. Thousand Oaks, CA: Sage Publications. Crocker, M. (1996) Computational Psycholinguistics: An Interdisciplinary Approach to the Study of Language. Dordrecht: Kluwer. —— (1999) ‘Mechanisms for sentence processing’, in S. Garrod and M. Pickering (eds), Language Processing. London: Psychology Press. —— (to appear) ‘Rational models of comprehension: Addressing the performance paradox’, in A. Cutler (ed.), Twenty-First Century Psycholinguistics: Four Cornerstones. Hillsdale: Lawrence Erlbaum. —— and Brants, T. 
(2000) ‘Wide-coverage probabilistic sentence processing’, Journal of Psycholinguistic Research 29: 647–69. —— and Corley, S. (2002) ‘Modular architectures and statistical mechanisms: The case from lexical category disambiguation’, in P. Merlo and S. Stevenson (eds), The Lexical Basis of Sentence Processing: Formal, Computational, and Experimental Issues. Amsterdam: John Benjamins, pp. 157–80. Cuetos, F. and Mitchell, D. C. (1988) ‘Cross-linguistic differences in parsing: Restrictions on the late closure strategy in Spanish’, Cognition 30: 73–105. ——, Mitchell, D. C., and Corley, M. M. B. (1996) ‘Parsing in different languages’, in M. Carreiras, J. García-Albea, and N. Sebastián-Gallés (eds), Language Processing in Spanish. Mahwah, NJ: Lawrence Erlbaum Associates, pp. 145–89. Culy, C. (1998) ‘Statistical distribution and the grammatical/ungrammatical distinction’, Grammars 1: 1–19.
Davis, S. and Baertsch, K. (2005) ‘The diachronic link between onset clusters and codas’, in Proceedings of the Annual Meeting of the Berkeley Linguistics Society, BLS 31. De Smedt, K. J. M. J. (1994) ‘Parallelism in incremental sentence generation’, in G. Adriaens and U. Hahn (eds), Parallelism in Natural Language Processing. New Jersey: Ablex. De Vincenzi, M. (1991) Syntactic Parsing Strategies in Italian. Dordrecht: Kluwer Academic Publishers. Deguchi, M. and Kitagawa, Y. (2002) ‘Prosody and Wh-questions’, in M. Hirotani (ed.), Proceedings of the Thirty-second Annual Meeting of the North-Eastern Linguistic Society, pp. 73–92. Dell, G. S. (1986) ‘A spreading activation theory of retrieval in sentence production’, Psychological Review 93: 283–321. Diesing, M. (1992) Indefinites. Cambridge, MA: MIT Press. Dryer, M. S. (1992) ‘The Greenbergian word order correlations’, Language 68: 81–138. Duffield, N. (2003) ‘Measures of competent gradedness’, in R. van Hout, A. Hulk, F. Kuiken, and R. Towel (eds), The Interface between Syntax and the Lexicon in Second Language Acquisition. Amsterdam: John Benjamins. Duffy, S. A., Morris, R. K., and Rayner, K. (1988) ‘Lexical ambiguity and fixation times in reading’, Journal of Memory and Language 27: 429–46. Elman, J. L. (1991) ‘Distributed representations, simple recurrent networks and grammatical structure’, Machine Learning 9: 195–225. —— (1993) ‘Learning and development in neural networks: The importance of starting small’, Cognition 48: 71–99. Erteschik-Shir, N. (1973) ‘On the Nature of Island Constraints’, Ph.D. thesis, MIT. —— (1982) ‘Extractability in Danish and the pragmatic principle of dominance’, in E. Engdahl and E. Ejerhed (eds), Readings on Unbounded Dependencies in Scandinavian Languages. Sweden: Umeå. —— (1986) ‘Wh-questions and focus’, Linguistics and Philosophy 9: 117–49. —— (1997) The Dynamics of Focus Structure. Cambridge: Cambridge University Press. 
—— (1999) ‘Focus structure theory and intonation’, Language and Speech 42(2–3): 209–27. —— (2003) ‘The syntax, phonology and interpretation of the information structure primitives topic and focus’, talk presented at GLOW workshop: Information structure in generative theory vs. pragmatics, The University of Lund, Sweden. —— and Lappin, S. (1983) ‘Dominance and extraction: A reply to A. Grosu’, Theoretical Linguistics 10: 81–96. —— and Rapoport, T. R. (to appear) The Atoms of Meaning: Interpreting Verb Projections. Ben Gurion University. Escudero, P. and Boersma, P. (2003) ‘Modelling the perceptual development of phonological contrasts with optimality theory and the gradual learning algorithm’,
in S. Arunachalam, E. Kaiser, and A. Williams (eds), Proceedings of the 25th Annual Penn Linguistics Colloquium. Penn Working Papers in Linguistics 8(1): 71–85. Escudero, P. and Boersma, P. (2004) ‘Bridging the gap between L2 speech perception research and phonological theory’, Studies in Second Language Acquisition 26: 551–85. Everaert, M. (1986) The Syntax of Reflexivization. Dordrecht: Foris. Fanselow, G. (1988) ‘Aufspaltung von NP und das Problem der “freien” Wortstellung’, Linguistische Berichte 114: 91–113. —— (2000) ‘Optimal exceptions’, in B. Stiebels and D. Wunderlich (eds), Lexicon in Focus. Berlin: Akademie Verlag, pp. 173–209. —— (2004) ‘The MLC and derivational economy’, in A. Stepanov, G. Fanselow, and R. Vogel (eds), Minimality Effects in Syntax. Berlin: Mouton de Gruyter. —— and Ćavar, D. (2002) ‘Distributed deletion’, in A. Alexiadou (ed.), Theoretical Approaches to Universals. Amsterdam: Benjamins, pp. 65–107. ——, Kliegl, R., and Schlesewsky, M. (1999) ‘Processing difficulty and principles of grammar’, in S. Kemper and R. Kliegl (eds), Constraints on Language. Aging, Grammar, and Memory. Kluwer: Boston, pp. 171–201. ——, Kliegl, R., and Schlesewsky, M. (in preparation) ‘Syntactic variation in German Wh-questions’, to appear in Linguistic Variation Yearbook 2005. Fasold, R. (1991) ‘The quiet demise of variable rules’, American Speech 66: 3–21. Featherston, S. (2004) ‘The decathlon model of empirical syntax’, in S. Kepser and M. Reis (eds), Linguistic Evidence. Berlin: Mouton de Gruyter. —— (2005) ‘Universals and grammaticality: Wh-constraints in German and English’, Linguistics 43 (4). —— (to appear) ‘Universals and the counter-example model: Evidence from Wh-constraints in German’, MS, University of Tübingen. Felser, C., Clahsen, H., and Münte, T. (2003) ‘Storage and integration in the processing of filler-gap dependencies: An ERP study of topicalization and Wh-movement in German’, Brain and Language 87: 345–54. 
——, Roberts, L., Marinis, T., and Gross, R. (2003) ‘The processing of ambiguous sentences by first and second language learners of English’, Applied Psycholinguistics 24: 453–89. Ferreira, F. and Clifton, C. (1986) ‘The independence of syntactic processing’, Journal of Memory and Language 25: 348–68. Féry, C. (1993) German Intonational Patterns. Tübingen: Niemeyer. —— (2005) ‘Laute und leise Prosodie’, in H. Blühdorn (ed.), Text-Verstehen. Grammatik und darüber hinaus. 41. IDS-Jahrbuch 2005. Berlin: Mouton De Gruyter, pp. 162–81. —— and Samek-Lodovici, V. (2006) ‘Focus projection and prosodic prominence in nested foci’, Language 82(1): 131–50. Fiebach, C., Schlesewsky, M., and Friederici, A. (2002) ‘Separating syntactic memory costs and syntactic integration costs during parsing: The processing of German Wh-questions’, Journal of Memory and Language 47: 250–72.
——, Schlesewsky, M., Bornkessel, I., and Friederici, A. D. (2004) ‘Distinct neural correlates of legal and illegal word order variations in German: How can fMRI inform cognitive models of sentence processing’, in M. Carreiras and C. Clifton, Jr. (eds), The On-line Study of Sentence Comprehension. New York: Psychology Press, pp. 357–70. Filiaci, F. (2003) ‘The Acquisition of Null and Overt Subjects by English-Near-Native Speakers of Italian’, M.Sc. thesis, University of Edinburgh. Fischer, S. (2004) ‘Optimal binding’, Natural Language and Linguistic Theory 22: 481–526. Flemming, E. (2001) ‘Scalar and categorical phenomena in a unified model of phonetics and phonology’, Phonology 18: 7–44. Fodor, J. D. (1998) ‘Learning to parse?’, Journal of Psycholinguistic Research 27: 285–319. —— (2002a) ‘Prosodic disambiguation in silent reading’, in M. Hirotani (ed.), Proceedings of the Thirty-second Annual Meeting of the North-Eastern Linguistic Society, pp. 113–37. —— (2002b) ‘Psycholinguistics cannot escape prosody’, Proceedings of the Speech Prosody 2002 Conference, Aix-en-Provence, pp. 83–8. —— and Frazier, L. (1978) ‘The sausage machine: A new two-stage parsing model’, Cognition 6: 291–325. Ford, M., Bresnan, J., and Kaplan, R. M. (1982) ‘A competence-based theory of syntactic closure’, in J. Bresnan (ed.), The Mental Representation of Grammatical Relations, Cambridge, MA: MIT Press, pp. 727–96. Francis, N., Kucera, H., and Mackie, A. (1982) Frequency Analysis of English Usage: Lexicon and Grammar. Boston: Houghton Mifflin. Frazier, L. (1978) ‘On Comprehending Sentences: Syntactic Parsing Strategies’, Ph.D. thesis, University of Connecticut. —— (1987) ‘Syntactic processing: Evidence from Dutch’, Natural Language and Linguistic Theory 5: 519–59. —— and Clifton, C. (1996) Construal. Cambridge, MA: MIT Press. —— and d’Arcais, G. F. (1989) ‘Filler-driven parsing: A study of gap-filling in Dutch’, Journal of Memory and Language 28: 331–44. —— and Rayner, K. 
(1987) ‘Resolution of syntactic category ambiguities: Eye movements in parsing lexically ambiguous sentences’, Journal of Memory and Language 26: 505–26. Frieda, E. M., Walley, A. C., Flege, J. E., and Sloane, M. E. (2000) ‘Adults’ perception and production of the English vowel /i/’, Journal of Speech, Language, and Hearing Research 43: 129–43. Friederici, A. D. (2002) ‘Towards a neural basis of auditory sentence processing’, Trends in Cognitive Sciences 6: 78–84. —— and Mecklinger, A. (1996) ‘Syntactic parsing as revealed by brain responses: First pass and second pass parsing processes’, Journal of Psycholinguistic Research 25: 157–76.
Frisch, S., Schlesewsky, M., Saddy, D., and Alpermann, A. (2001) ‘Why syntactic ambiguity is costly after all. Reading time and ERP evidence’, AMLaP Saarbrücken 2001. ——, Schlesewsky, M., Saddy, D., and Alpermann, A. (2002) ‘The P600 as an indicator of syntactic ambiguity’, Cognition 85: B83–B92. Frisch, S. A. (1996) ‘Similarity and Frequency in Phonology’, Ph.D. thesis, Northwestern University. —— (2000) ‘Temporally organized lexical representations as phonological units’, in M. B. Broe and J. B. Pierrehumbert (eds), Acquisition and the Lexicon: Papers in Laboratory Phonology V. Cambridge: Cambridge University Press, pp. 283–98. —— (2004) ‘Language processing and OCP effects’, in B. Hayes, R. Kirchner, and D. Steriade (eds), Phonetically-Based Phonology. Cambridge: Cambridge University Press, pp. 346–71. —— and Zawaydeh, B. A. (2001) ‘The psychological reality of OCP-place in Arabic’, Language 77: 91–106. ——, Broe, M., and Pierrehumbert, J. (1997) ‘Similarity and phonotactics in Arabic’. MS, Indiana University and Northwestern University. ——, Large, N. R., and Pisoni, D. B. (2000) ‘Perception of wordlikeness: Effects of segment probability and length on the processing of nonwords’, Journal of Memory and Language 42: 481–96. ——, Large, N., Zawaydeh, B., and Pisoni, D. (2001) ‘Emergent phonotactic generalizations’, in J. L. Bybee and P. Hopper (eds), Frequency and the Emergence of Linguistic Structure, Amsterdam: John Benjamins, pp. 159–80. ——, Pierrehumbert, J. B., and Broe, M. B. (2004) ‘Similarity avoidance and the OCP’, Natural Language and Linguistic Theory 22: 179–228. Ganong, W. F. III (1980) ‘Phonetic categorization in auditory word perception’, Journal of Experimental Psychology: Human Perception and Performance 6: 110–25. Garnsey, S. M. (1993) ‘Event-related brain potentials in the study of language: An introduction’, Language and Cognitive Processes 8: 337–56. ——, Pearlmutter, N. J., Myers, E. M., and Lotocky, M. A. 
(1997) ‘The contributions of verb bias and plausibility to the comprehension of temporarily ambiguous sentences’, Journal of Memory and Language 37: 58–93. Gathercole, S. and Baddeley, A. (1993) Working memory and language. Essays in Cognitive Psychology. Hove: Lawrence Erlbaum. Gervain, J. (2002) ‘Linguistic Methodology and Microvariation in Language: The Case of Operator-Raising in Hungarian’, unpublished M.A. thesis, Dept. of Linguistics, University of Szeged. Gibson, E. (1998) ‘Linguistic complexity: Locality of syntactic dependencies’, Cognition 68: 1–76. —— and Pearlmutter, N. J. (1998) ‘Constraints on sentence comprehension’, Trends in Cognitive Sciences 2: 262–8. —— and Schütze, C. T. (1999) ‘Disambiguation preferences in noun phrase conjunction do not mirror corpus frequency’, Journal of Memory and Language 40: 263–79.
Gibson, E., Pearlmutter, N., Canseco-Gonzalez, E., and Hickok, G. (1996a) ‘Crosslinguistic attachment preferences: Evidence from English and Spanish’, Cognition 59: 23–59. ——, Schütze, C. T., and Salomon, A. (1996b) ‘The Relationship between the Frequency and the Processing Complexity of Linguistic Structure’, Journal of Psycholinguistic Research 25: 59–92. Godfrey, J. J., Holliman, E. C., and McDaniel, J. (1992) ‘SWITCHBOARD: Telephone speech corpus for research and development’, in IEEE International Conference on Acoustics, Speech and Signal Processing 1992, pp. 517–20. Goldinger, S. D. (2000) ‘The role of perceptual episodes in lexical processing’, in A. Cutler, J. M. McQueen, and R. Zondervan (eds), Proceedings of SWAP (Spoken Word Access Processes), Nijmegen: Max Planck Institute for Psycholinguistics, pp. 155–9. Goldstone, R., Medin, D., and Gentner, D. (1991) ‘Relational similarity and the nonindependence of features in similarity judgments’, Cognitive Psychology 23: 222–62. Goldwater, S. and Johnson, M. (2003) ‘Learning OT constraint rankings using a maximum entropy model’, in J. Spenader, A. Eriksson, and Ö. Dahl (eds), Proceedings of the Stockholm Workshop on Variation within Optimality Theory, Stockholm University, pp. 111–20. Grabe, E. (1998) ‘Comparative Intonational Phonology: English and German’, Ph.D. thesis, Universiteit Nijmegen. Greenberg, J. H. (1963) ‘Some universals of grammar with particular reference to the order of meaningful elements’, in J. H. Greenberg (ed.), Universals of Language, Cambridge, MA: MIT Press, pp. 73–113. —— and Jenkins, J. J. (1964) ‘Studies in the psychological correlates of the sound system of American English’, Word 20: 157–77. Grice, M., Baumann, S., and Benzmüller, R. (2003) ‘German intonation in autosegmental phonology’, in S.-A. Jun (ed.), Prosodic Typology. Oxford: Oxford University Press. Grimshaw, J. (1997) ‘Projection, heads, and optimality’, Linguistic Inquiry 28: 373–422. —— and Samek-Lodovici, V. 
(1998) ‘Optimal subjects and subject universals’, in P. Barbosa, D. Fox, P. Hagstrom, M. McGinnis, and D. Pesetsky (eds), Is the Best Good Enough? Optimality and Competition in Syntax. Cambridge, MA: MIT Press, pp. 193–219. Grodzinsky, Y. and Reinhart, T. (1993) ‘The innateness of binding and coreference’, Linguistic Inquiry 24: 69–101. Groos, A. and H. van Riemsdijk (1981) ‘Matching effects with free relatives: A parameter of core grammar’, in A. Belletti, L. Brandi, and L. Rizzi (eds), Theories of Markedness in Generative Grammar. Pisa: Scuola Normale Superiore di Pisa, pp. 171–216. Grosjean, F. (1980) ‘Spoken word recognition processes and the Gating paradigm’, Perception and Psychophysics 28: 267–83.
Guenther, F. H. and Gjaja, M. N. (1996) ‘The perceptual magnet effect as an emergent property of neural map formation’, JASA 100: 1111–21. Gürel, A. (2004) ‘Selectivity in L2-induced L1 attrition: A psycholinguistic account’, Journal of Neurolinguistics 17: 53–78. Gussenhoven, C. (1983) ‘Testing the reality of focus domains’, Language and Speech 26: 61–80. —— (1984) On the Grammar and Semantics of Sentence Accents. Dordrecht: Foris. —— (1992) ‘Sentence accents and argument structure’ in I. Roca (ed.), Thematic Structure. Its Role in Grammar. Berlin: Foris, pp. 79–106. —— (2004) The Phonology of Tone and Intonation. Cambridge: Cambridge University Press. Guy, G. (1980) ‘Variation in the group and the individual’, in W. Labov (ed.), Locating Language in Time and Space. New York: Academic Press, pp. 1–36. —— (1981) Linguistic Variation in Brazilian Portuguese: Aspects of the Phonology, Syntax, and Language History, Ph.D. thesis, University of Pennsylvania. —— and Boberg, C. (1997) ‘Inherent variability and the obligatory contour principle’, Language Variation and Change 9: 149–64. Hahn, U. and Bailey, T. M. (2003) ‘What makes words sound similar?’, MS, Cardiff University. Hahne, A. and Friederici, A. (2001) ‘Processing a second language: Late learners’ comprehension mechanisms as revealed by event-related brain potentials’, Bilingualism: Language and Cognition 4: 123–41. Haider, H. (1993) Deutsche Syntax, generativ. Tübingen: Narr. —— and Rosengren, I. (2003) ‘Scrambling: Nontriggered chain formation in OV languages’, Journal of Germanic Linguistics 15: 203–67. Hale, J. (2001) ‘A probabilistic Earley parser as a psycholinguistic model’ in Proceedings of the 2nd Conference of the North American Chapter of the Association for Computational Linguistics, Pittsburgh, PA. —— (2003) ‘The information conveyed by words’, Journal of Psycholinguistic Research 32: 101–22. Hale, M. and Reiss, C. 
(1998) ‘Formal and empirical arguments concerning phonological acquisition’, Linguistic Inquiry 29: 656–83. —— and Reiss, C. (2000) ‘Phonology as cognition’, in N. Burton-Roberts, P. Carr, and G. Docherty (eds), Phonological Knowledge: Conceptual and Empirical Issues. New York: Oxford University Press, pp. 161–84. Hankamer, J. (1973) ‘Unacceptable ambiguity’, Linguistic Inquiry 4: 17–68. Haspelmath, M. (1999) ‘Optimality and diachronic adaptation’, Zeitschrift für Sprachwissenschaft 18: 180–205. Hawkins, J. A. (1983) Word Order Universals. New York: Academic Press. —— (1990) ‘A parsing theory of word order universals’, Linguistic Inquiry 21: 223–61. —— (1994) A Performance Theory of Order and Constituency. Cambridge: Cambridge University Press.
Hawkins, J. A. (1998) ‘Some issues in a performance theory of word order’, in A. Siewierska (ed.), Constituent Order in the Languages of Europe. Berlin: de Gruyter, pp. 729–81. —— (1999) ‘Processing complexity and filler-gap dependencies across grammars’, Language 75: 244–85. —— (2000) ‘The relative ordering of prepositional phrases in English: Going beyond manner–place–time’, Language Variation and Change 11: 231–66. —— (2001) ‘Why are categories adjacent?’, Journal of Linguistics 37: 1–34. —— (2003) ‘Efficiency and complexity in Grammars: Three general principles’, in M. Polinsky and J. Moore (eds), Explanation in Linguistics. Stanford University, Stanford: CSLI Publications. —— (2004) Efficiency and Complexity in Grammars. Oxford: Oxford University Press. Hay, J. (2003) Causes and Consequences of Word Structure. New York: Routledge. ——, Pierrehumbert, J. B., and Beckman, M. B. (2004) ‘Speech perception, wellformedness, and the statistics of the lexicon’, in J. Local, R. Ogden, and R. Temple (eds), Papers in Laboratory Phonology VI. Cambridge: Cambridge University Press, pp. 58–74. Hayes, B. (1997) ‘Four rules of inference for ranking argumentation’, MS, Department of Linguistics, University of California, Los Angeles. —— (1999) ‘Phonological restructuring in Yidi and its theoretical consequences’, in B. Hermans and M. van Oostendorp (eds), The Derivational Residue in Phonological Optimality Theory. Amsterdam: John Benjamins, pp. 175–205. —— and Lahiri, A. (1991) ‘Bengali intonational phonology’, Natural Language and Linguistic Theory 9: 47–96. —— and MacEachern, M. (1998) ‘Quatrain form in English folk verse’, Language 74: 473–507. ——, Kirchner, R., and Steriade, D. (2004) Phonetically Based Phonology. Cambridge: Cambridge University Press. Hemforth, B. (1993) Kognitives Parsing: Repräsentation und Verarbeitung grammatischen Wissens. Sankt Augustin: Infix. Henry, A. (1995) Belfast English and Standard English: Dialect Variation and Parameter Setting. 
Oxford: Oxford University Press. —— (2002) ‘Variation and syntactic theory’, in J. K. Chambers, P. Trudgill, and N. Schilling-Estes (eds), The Handbook of Language Variation and Change. Oxford: Blackwell, pp. 267–83. Hill, A. A. (1961) ‘Grammaticality’, Word 17: 1–10. Hindle, D. and Rooth, M. (1993) ‘Structural ambiguity and lexical relations’, Computational Linguistics 19: 103–20. Hirose, Y. (2003) ‘Recycling prosodic boundaries’, Journal of Psycholinguistic Research 32: 167–95. Hirotani, M. (2003) ‘Prosodic effects on the interpretation of Japanese Wh-questions’, Alonso-Ovalle, L. (ed.), University of Massachusetts Occasional Papers in Linguistics 27—On Semantic Processing, pp. 117–37.
Hirschberg, J. and Avesani, C. (2000) ‘Prosodic disambiguation in English and Italian’, in A. Botinis (ed.), Intonation. Dordrecht: Kluwer Academic Publishers. Höhle, T. (1982) ‘Explikation für “normale Betonung” und “normale Wortstellung”’, in Abraham, W. (ed.), Satzglieder im Deutschen: Vorschläge zur syntaktischen, semantischen und pragmatischen Fundierung. Tübingen: Narr, pp. 75–153. —— (1991) ‘On reconstruction and coordination’, in H. Haider and K. Netter (eds), Representation and Derivation in the Theory of Grammar. Dordrecht: Reidel. —— (1992) ‘Über Verum-Fokus im Deutschen’, in J. Jacobs (ed.), Informationsstruktur und Grammatik (= Linguistische Berichte, Sonderheft 4): 112–41. Hooper, J. B. (1976) ‘Word frequency in lexical diffusion and the source of morphophonological change’, in W. Christie (ed.), Current Progress in Historical Linguistics. Amsterdam: North Holland, pp. 95–105. —— (1978) ‘Constraints on schwa deletion in American English’, in J. Fisiak (ed.), Recent Developments in Historical Phonology. The Hague: Mouton, pp. 183–207. Hruska, C., Alter, K., Steinhauer, K., and Steube, A. (2001) ‘Misleading dialogues: Human’s brain reaction to prosodic information’, in C. Cave, I. Guaitella, and S. Santi (eds), Orality and Gesture. Interactions et comportements multimodaux dans la communication. Paris: L’Harmattan, pp. 425–30. Huang, C.-T. J. (1982) ‘Logical Relations in Chinese and the Theory of Grammar’, Ph.D. thesis, Massachusetts Institute of Technology. Hume, E. and Johnson, K. (2001) The Role of Speech Perception in Phonology. San Diego: Academic Press. Hyman, L. (1976) ‘Phonologization’, in A. Juilland (ed.), Linguistic Studies Offered to Joseph Greenberg. Vol. 2, Saratoga: Anma Libri, pp. 407–18. Ishihara, S. (2002) ‘Invisible but audible Wh-scope marking: Wh-constructions and deaccenting in Japanese’, Proceedings of the Twenty-first West Coast Conference on Formal Linguistics, pp. 180–93. 
—— (2003) ‘Intonation and Interface Conditions’, Ph.D. thesis, Massachusetts Institute of Technology. Jackendoff, R. (1977) X-bar Syntax: A Study of Phrase Structure. Cambridge, MA: MIT Press. —— (1992) ‘Mme. Tussaud meets the Binding Theory’, Natural Language and Linguistic Theory 10: 1–33. Jacobs, J. (1997) ‘I-Topikalisierung’, Linguistische Berichte 168: 91–133. Jäger, G. (2004) ‘Maximum entropy models and stochastic optimality theory’, MS, University of Potsdam. —— and Rosenbach, A. (2004) ‘The winner takes it all—almost: Cumulativity in grammatical variation’, MS, University of Potsdam and University of Düsseldorf. Jakubowicz, C. (2000) ‘Functional categories in (ab)normal language acquisition’, MS, Université Paris 5. Jannedy, S. (2003) ‘Hat Patterns and Double Peaks: The Phonetics and Psycholinguistics of Broad versus Late Narrow versus Double Focus Intonations’, Ph.D. thesis, The Ohio State University.
Jayaseelan, K. A. (1997) ‘Anaphors as pronouns’, Studia Linguistica 51(2): 186–234. Johnson, K. (1997) ‘Speech perception without speaker normalization: An exemplar model’, in K. Johnson and J. W. Mullennix (eds), Talker Variability in Speech Processing. San Diego: Academic Press, pp. 145–65. ——, Flemming, E., and Wright, R. (1993) ‘The hyperspace effect: Phonetic targets are hyperarticulated’, Language 69: 505–28. Jongeneel, J. (1884) Dorpsspraak van Heerle vormenleer en woordenboek. Heerlen: Van Hooren 1980. Josefsson, G. (2003) ‘Four myths about object shift in Swedish—and the truth . . .’, in L.-O. Delsing (ed.), Grammar in Focus II. Festschrift for Christer Platzack. Lund: Wallin + Dalholm, pp. 199–207. Jun, S. (ed.) (2005) Prosodic Typology. The Phonology of Intonation and Phrasing. Oxford: Oxford University Press. Jurafsky, D. (1996) ‘A probabilistic model of lexical and syntactic access and disambiguation’, Cognitive Science 20: 137–94. —— (2003) ‘Probabilistic modeling in psycholinguistics: Linguistic comprehension and production’, in R. Bod, J. Hay, and S. Jannedy (eds), Probabilistic Linguistics. Cambridge, MA: MIT Press, pp. 39–95. ——, Bell, A., Gregory, M., and Raymond, W. D. (2001) ‘Probabilistic relations between words: Evidence from reduction in lexical production’, in J. Bybee and P. Hopper (eds), Frequency and the Emergence of Linguistic Structure. Amsterdam: John Benjamins. Jusczyk, P. (1997) The Discovery of Spoken Language. Cambridge, MA: MIT Press. Just, M. A. and Carpenter, P. A. (1987) The Psychology of Reading and Language Comprehension. Boston, London, Sidney, Toronto: Allyn and Bacon Inc. Kay, P. and McDaniel, C. K. (1978) ‘The linguistic significance of the meanings of basic color terms’, Language 54: 610–46. Kayne, R. (1981) ‘On certain differences between French and English’, Linguistic Inquiry 12: 349–71. Kayne, R. S. (1984) Connectedness and Binary Branching. Dordrecht: Foris. Keating, P. 
(1985) ‘Universal phonetics and the organization of grammars’, in V. Fromkin (ed.), Phonetic Linguistics: Essays in Honor of Peter Ladefoged. Orlando: Academic Press, pp. 115–32. —— (1988) ‘The window model of coarticulation: Articulatory evidence’, UCLA Working Papers in Phonetics 69: 3–29. —— (1996) ‘The phonology–phonetics interface’, UCLA Working Papers in Phonetics 92: 45–60. Keller, F. (1997) ‘Extraction, gradedness, and optimality’, in A. Dimitriadis, L. Siegel, C. Surek-Clark, and A. Williams (eds), Proceedings of the 21st Annual Penn Linguistics Colloquium, no. 4.2 in Penn Working Papers in Linguistics, Department of Linguistics, University of Pennsylvania, pp. 169–86. —— (2000a) ‘Evaluating competition-based models of word order’, in L. R. Gleitman and A. K. Joshi (eds), Proceedings of the 22nd Annual Conference
of the Cognitive Science Society. Mahwah, NJ: Lawrence Erlbaum Associates, pp. 747–52. Keller, F. (2000b) ‘Gradience in Grammar: Experimental and Computational Aspects of Degrees of Grammaticality’, Ph.D. thesis, University of Edinburgh. —— (2001) ‘Experimental evidence for constraint competition in gapping constructions’, in G. Müller and W. Sternefeld (eds), ‘Competition in syntax’, No. 49 in Studies in Generative Grammar. Berlin and New York: de Gruyter, pp. 211–48. —— (2003) ‘A probabilistic parser as a model of global processing difficulty’, in R. Alterman and D. Kirsh (eds), Proceedings of the 25th Annual Conference of the Cognitive Science Society. Boston, pp. 646–51. —— and Alexopoulou, T. (2001) ‘Phonology competes with syntax: Experimental evidence for the interaction of word order and accent placement in the realization of information structure’, Cognition 79: 301–72. —— and Asudeh, A. (2001) ‘Constraints on linguistic coreference: Structural vs. pragmatic factors’, in J. D. Moore and K. Stenning (eds), Proceedings of the 23rd Annual Conference of the Cognitive Science Society. Mahwah, NJ: Lawrence Erlbaum Associates, pp. 483–8. —— and Asudeh, A. (2002) ‘Probabilistic learning algorithms and optimality theory’, Linguistic Inquiry 33(2): 225–44. —— and Sorace, A. (2003) ‘Gradient auxiliary selection and impersonal passivization in German: An experimental investigation’, Journal of Linguistics 39(1): 57–108. Kellerman, E. (1987) ‘Aspects of Transferability in Second Language Acquisition’, Ph.D. thesis, University of Nijmegen. Kempen, G. and Harbusch, K. (2004) ‘Why grammaticality judgments allow more word order freedom than speaking and writing: A corpus study into argument linearization in the midfield of German subordinate clauses’, in S. Kepser and M. Reis (eds), Linguistic Evidence. Berlin: Mouton de Gruyter. Kenstowicz, M. (1994) Phonology in Generative Grammar. Cambridge: Blackwell.
—— (2002) ‘Paradigmatic uniformity and contrast’, MIT Working Papers in Linguistics 42 Phonological Answers (and Their Corresponding Questions). —— and Kisseberth, C. (1977) Topics in Phonological Theory. New York: Academic Press. Kessels, M. J. H. (1883) Der koehp va Hehle. Ee Hehlisj vertelsel. Heerlen: Uitgeverij Winants. Kessler, B. and Treiman, R. (1997) ‘Syllable structure and the distribution of phonemes in English syllables’, Journal of Memory and Language 37: 295–311. Kilborn, K. (1992) ‘On-line integration of grammatical information in a second language’, in R. J. Harris (ed.), Cognitive Processing in Bilinguals. Amsterdam: Elsevier Science. Kim, A.-R. (2000) ‘A Derivational Quantification of ‘‘WH-Phrase’’ ’, Ph.D. thesis, Indiana University.
Kimball, J. (1973) ‘Seven principles of surface structure parsing in natural language’, Cognition 2: 15–47. King, J. and Just, M. (1991) ‘Individual differences in syntactic processing: The role of working memory’, Journal of Memory and Language 30: 580–602. —— and Kutas, M. (1995) ‘Who did what and when? Using clause- and word-related ERPs to monitor working memory usage in reading’, Journal of Cognitive Neuroscience 7: 378–97. Kingston, J. and Diehl, R. (1994) ‘Phonetic knowledge’, Language 70: 419–54. Kiparsky, P. (1968) ‘How abstract is phonology?’, Bloomington: Indiana University Linguistics Club. Reprinted 1982 in P. Kiparsky, Explanation in Phonology. Dordrecht: Foris, pp. 119–63. —— (1982) ‘Lexical morphology and phonology’, in The Linguistics Society of Korea (ed.), Linguistics in the Morning Calm: Selected Papers from SICOL-1981. Seoul: Hanshin Publishing Co., pp. 3–91. Kirby, S. (1999) Function, Selection and Innateness: The Emergence of Language Universals. Oxford: Oxford University Press. Kirchner, R. (1998) ‘Lenition in Phonetically-Based Optimality Theory’, Ph.D. thesis, UCLA. —— (2001) An Effort-Based Approach to Consonant Lenition. New York, NY: Routledge. (1998 UCLA Ph.D. thesis). Kiss, K. (1998) ‘Identificational focus versus information focus’, Language 74: 245–73. Kitagawa, Y. (2006) ‘Wh-Scope puzzles’, Proceedings of the Thirty-fifth Annual Meeting of the North-Eastern Linguistic Society. Connecticut, 22 October 2004. —— and Fodor, J. D. (2003) ‘Default prosody explains neglected syntactic analyses of Japanese’, Japanese/Korean Linguistics 12: 267–79. Kjelgaard, M. M. and Speer, S. R. (1999) ‘Prosodic facilitation and interference in the resolution of temporary syntactic closure ambiguity’, Journal of Memory and Language 40: 153–94. Klatt, D. (1987) ‘Review of text-to-speech conversion for English’, JASA 82(3): 737–93. Klein, W. and Perdue, C. (1997). 
‘The basic variety (or: Couldn’t natural languages be much simpler?)’, Second Language Research 13: 301–47. Kluender, R. and Kutas, M. (1993) ‘Bridging the gap: Evidence from ERPs on the processing of unbounded dependencies’, Journal of Cognitive Neuroscience 5: 196–214. Kolb, H.-P. (1997) ‘Is I-language a generative procedure?’, in ‘GB-blues: Two essays’, No. 110 in Arbeitspapiere des Sonderforschungsbereichs 340, Tübingen: University of Tübingen, pp. 1–14. Krahmer, E. and Swerts, M. (2001) ‘On the alleged existence of contrastive accents’, Speech Communication 34: 391–405. Krems, J. (1984) Erwartungsgeleitete Sprachverarbeitung. Frankfurt/Main: Lang. Krifka, M. (1998) ‘Scope inversion under the rise–fall contour in German’, Linguistic Inquiry 29: 75–112.
Kroch, A. S. (1989) ‘Reflexes of grammar in patterns of language change’, Language Variation and Change 1: 199–244. Kruskal, J. (1999) ‘An overview of sequence comparison’, in D. Sankoff and J. Kruskal (eds), Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison, 2nd edn. Reading, MA: Addison-Wesley, pp. 1–44. Kubozono, H. (1993) The Organization of Japanese Prosody. Tokyo: Kurosio Publishers. Kuhl, P. K. (1991) ‘Human adults and human infants show a ‘‘perceptual magnet effect’’ for the prototypes of speech categories, monkeys do not’, Perception and Psychophysics 50: 93–107. Kuno, S. (1973) The Structure of the Japanese Language. Cambridge, MA: MIT Press. Kutas, M. and Hillyard, S. A. (1980) ‘Reading senseless sentences: Brain potentials reflect semantic incongruity’, Science 207: 203–5. —— and Van Petten, C. (1994) ‘Psycholinguistics electrified: Event-related brain potential investigations’, in M. Gernsbacher (ed.), Handbook of Psycholinguistics. New York: Academic Press, pp. 83–143. Kvam, S. (1983) Linksverschachtelung im Deutschen und Norwegischen. Tübingen: Niemeyer. Labov, W. (1969) ‘Contraction, deletion, and inherent variability of the English copula’, Language 45: 715–62. —— (1972) Sociolinguistic Patterns. Philadelphia: University of Pennsylvania Press. —— (1980) Locating Language in Time and Space. New York: Academic Press. —— (1994) Principles of Linguistic Change. Internal Factors. Oxford: Blackwell. —— (1996) ‘When intuitions fail’, Papers from the 32nd Regional Meeting of the Chicago Linguistics Society 32: 76–106. ——, Cohen, P., Robins, C., and Lewis, J. (1968) A Study of the Non-Standard English of Negro and Puerto Rican Speakers in New York City. Philadelphia: US Regional Survey. Lacerda, F. (1995) ‘The perceptual magnet effect: An emergent consequence of exemplar-based phonetic memory’, Proceedings of the XIIIth International Congress of Phonetic Sciences 2: 140–7.
—— (1997) ‘Distributed memory representations generate the perceptual-magnet effect’, MS, Institute of Linguistics, Stockholm University. Ladd, D. R. (2003) ‘ ‘‘Distinctive phones’’ in surface representation’, written version of paper presented at LabPhon 8, to appear in the Proceedings. —— and Morton, R. (1987) ‘The perception of intonational emphasis: Continuous or categorical?’ Journal of Phonetics 25: 313–42. Lambrecht, K. (1994) Information Structure and Sentence Form. Cambridge: Cambridge University Press. Lapata, M., Keller, F., and Schulte im Walde, S. (2001) ‘Verb frame frequency as a predictor of verb bias’, Journal of Psycholinguistic Research 30: 419–35. Lardiere, D. (1998) ‘Case and tense in the ‘fossilized’ steady state’, Second Language Research 14: 1–26.
Lasnik, H. and Saito, M. (1992) Move α: Conditions on its Application and Output. Cambridge, MA: MIT Press. Lavoie, L. (1996) ‘Lexical frequency effects on the duration of schwa-resonant sequences in American English’, poster presented at LabPhon 5, Chicago, IL, June 1996. —— (2002) ‘Some influences on the realization of for and four in American English’, JIPA 32: 175–202. Legendre, G. (in press) ‘Optimizing auxiliary selection in Romance’, to appear in R. Aranovich (ed.), Cross-Linguistic Perspectives on Auxiliary Selection. Amsterdam: John Benjamins. Legendre, G. and Sorace, A. (2003) ‘Auxiliaires et intransitivité en français et dans les langues romanes’, in D. Godard (ed.), Les langues romanes; problèmes de la phrase simple. Paris: Editions du CNRS, pp. 185–234. ——, Miyata, Y., and Smolensky, P. (1990a) ‘Harmonic grammar—a formal multilevel connectionist theory of linguistic well-formedness: Theoretical foundations’, in Proceedings of the Twelfth Annual Conference of the Cognitive Sciences. Cambridge, MA: Lawrence Erlbaum, pp. 388–95. ——, Miyata, Y., and Smolensky, P. (1990b) ‘Harmonic grammar—A formal multilevel connectionist theory of linguistic well-formedness: An application’, in Proceedings of the Twelfth Annual Conference of the Cognitive Sciences. Cambridge, MA: Lawrence Erlbaum, pp. 884–91. ——, Miyata, Y., and Smolensky, P. (1991) ‘Unifying syntactic and semantic approaches to unaccusativity: A connectionist approach’, Proceedings of the 17th Annual Meeting of the Berkeley Linguistic Society. Berkeley: Berkeley Linguistic Society, pp. 156–67. Lehiste, I. (1973) ‘Phonetic disambiguation of syntactic ambiguity’, Glossa 7: 107–22. Lehmann, W. P. (1978) ‘The great underlying ground-plans’, in W. P. Lehmann (ed.), Syntactic Typology: Studies in the Phenomenology of Language. Austin: University of Texas Press, pp. 3–55. Leonini, C. and Belletti, A. (2004) ‘Subject inversion in L2 Italian’, in S. Foster-Cohen, M. Sharwood Smith, A. Sorace, and M.
Ota (eds), Eurosla Workbook 4: 95–118. Levin, B. and Rappaport Hovav, M. (1995) Unaccusativity at the Syntax–Semantics Interface. Cambridge, MA: MIT Press. —— and Rappaport Hovav, M. (1996) ‘From lexical semantics to argument realization’, MS, Northwestern University and Bar-Ilan University. Lewis, R. (1993) ‘An Architecturally-Based Theory of Human Sentence Comprehension’, Ph.D. thesis, Carnegie Mellon University. Li, C. and Thompson, S. (1976) ‘Subject and topic: A new typology’, in C. Li (ed.), Subject and Topic. New York: Academic Press, pp. 457–89. Liberman, M. and Pierrehumbert, J. (1984) ‘Intonational invariance under changes in pitch range and length’, in M. Aronoff and R. T. Oehrle (eds), Language Sound Structure. Cambridge, MA: MIT Press, pp. 157–233. Liceras, J., Valenzuela, E., and Díaz, L. (1999) ‘L1/L2 Spanish Grammars and the Pragmatic Deficit Hypothesis’, Second Language Research 15: 161–90.
Lindblom, B. (1990) ‘Models of phonetic variation and selection’, PERILUS 11: 65–100. Lodge, M. (1981) Magnitude Scaling: Quantitative Measurement of Opinions. Beverley Hills, CA: Sage Publications. Lohse, B., Hawkins, J. A., and Wasow, T. (2004) ‘Domain minimization in English verb-particle constructions’, Language 80: 238–61. Lovrič, N. (2003) ‘Implicit Prosody in Silent Reading: Relative Clause Attachment in Croatian’, Ph.D. thesis, CUNY Graduate Center. Luce, P. A. and Large, N. (2001) ‘Phonotactics, neighborhood density, and entropy in spoken word recognition’, Language and Cognitive Processes 16: 565–81. —— and Pisoni, D. B. (1998) ‘Recognizing spoken words: The neighborhood activation model’, Ear and Hearing 19: 1–36. MacBride, A. (2004) ‘A Constraint-Based Approach to Morphology’, Ph.D. thesis, UCLA, http://www.linguistics.ucla.edu/faciliti/diss.htm. MacDonald, M. C. (1993) ‘The interaction of lexical and syntactic ambiguity’, Journal of Memory and Language 32: 692–715. —— (1994) ‘Probabilistic constraints and syntactic ambiguity resolution’, Language and Cognitive Processes 9: 157–201. ——, Pearlmutter, N. J., and Seidenberg, M. S. (1994) ‘Lexical nature of syntactic ambiguity resolution’, Psychological Review 101: 676–703. Manning, C. D. (2003) ‘Probabilistic syntax’, in R. Bod, J. Hay, and S. Jannedy (eds), Probabilistic Linguistics. Cambridge, MA: MIT Press, pp. 289–341. —— and Schütze, H. (1999) Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press. Marantz, A. (2000) Class Notes. Cambridge, MA: MIT Press. Marcus, M. P. (1980) A Theory of Syntactic Recognition for Natural Language. Cambridge, MA: MIT Press. Marks, L. E. (1965) Psychological Investigations of Semi-Grammaticalness in English. Dissertation, Harvard: Harvard University. —— (1967) ‘Judgments of grammaticalness of some English sentences and semisentences’, American Journal of Psychology 20: 196–204. Marslen-Wilson, W.
(1987) ‘Functional parallelism in spoken word-recognition’, in U. Frauenfelder and L. Tyler (eds), Spoken Word Recognition. Cambridge, MA: MIT Press, pp. 71–102. Mateu, J. (2003) ‘Digitizing the syntax–semantics interface. The case of aux-selection in Italian and French’, MS, Universitat Autònoma of Barcelona. Matzke, M., Mai, H., Nager, W., Rüsseler, J., and Münte, T. F. (2002) ‘The cost of freedom: An ERP-study of non-canonical sentences’, Clinical Neurophysiology 113: 844–52. Maynell, L. A. (1999) ‘Effect of pitch accent placement on resolving relative clause ambiguity in English’, The 12th Annual CUNY Conference on Human Sentence Processing (Poster). New York, March. McCarthy, J. (2003) ‘OT constraints are categorical’, Phonology 20: 75–138. —— and Prince, A. (1993) ‘Generalized alignment’, in G. Booij and J. van Marle (eds), Morphology Yearbook 1993. Dordrecht: Kluwer, pp. 79–153.
—— and Prince, A. (1995) ‘Faithfulness and reduplicative identity’, in J. Beckman, L. W. Dickey, and S. Urbanczyk (eds), Papers in Optimality Theory. University of Massachusetts Occasional Papers 18. Amherst, MA: Graduate Linguistic Student Association, pp. 249–384. McDaniel, D. and Cowart, W. (1999) ‘Experimental evidence of a minimalist account of English resumptive pronouns’, Cognition 70: B15–B24. McElree, B. (1993) ‘The locus of lexical preference effects in sentence comprehension’, Journal of Memory and Language 32: 536–71. —— (2000) ‘Sentence comprehension is mediated by content-addressable memory structures’, Journal of Psycholinguistic Research 29: 111–23. —— and Griffith, T. (1995) ‘Syntactic and thematic processing in sentence comprehension’, Journal of Experimental Psychology: Learning, Memory and Cognition 21: 134–57. —— and Griffith, T. (1998) ‘Structural and lexical effects on filling gaps during sentence processing: A time-course analysis’, Journal of Experimental Psychology: Learning, Memory, and Cognition 24: 432–60. —— and Nordlie, J. (1999) ‘Literal and figurative interpretations are computed in equal time’, Psychonomic Bulletin and Review 6: 486–94. ——, Foraker, S., and Dyer, L. (2003) ‘Memory structures that subserve sentence comprehension’, Journal of Memory and Language 48: 67–91. McQueen, J. M. and Cutler, A. (1997) ‘Cognitive processes in speech perception’, in W. J. Hardcastle and J. Laver (eds), The Handbook of Phonetic Sciences. Oxford: Blackwell, pp. 566–85. McRae, K., Spivey-Knowlton, M. J., and Tanenhaus, M. K. (1998) ‘Modeling the influence of thematic fit (and other constraints) in on-line sentence comprehension’, Journal of Memory and Language 38: 283–312. Mecklinger, A., Schriefers, H., Steinhauer, K., and Friederici, A. D. (1995) ‘Processing relative clauses varying on syntactic and semantic dimensions: An analysis with event-related potentials’, Memory and Cognition 23: 477–94. Mendoza-Denton, N., Hay, J., and Jannedy, S. 
(2003) ‘Probabilistic sociolinguistics: Beyond variable rules’, in R. Bod, J. Hay, and S. Jannedy (eds), Probabilistic Linguistics. Cambridge, MA: MIT Press. Meng, M. (1998) Kognitive Sprachverarbeitung. Rekonstruktion syntaktischer Strukturen beim Lesen. Wiesbaden: Deutscher Universitätsverlag. —— and Bader, M. (2000a) ‘Mode of disambiguation and garden path strength: An investigation of subject–object ambiguities in German’, Language and Speech 43: 43–74. —— and Bader, M. (2000b) ‘Ungrammaticality detection and garden path strength: evidence for serial parsing’, Language and Cognitive Processes 15(6): 615–66. Mikheev, A. (1997) ‘Automatic rule induction for unknown-word guessing’, Computational Linguistics 23: 405–23.
Mitchell, D. C., Cuetos, F., Corley, M. M. B., and Brysbaert, M. (1996) ‘Exposure based models of human parsing: Evidence for the use of coarse-grained (nonlexical) statistical records’, Journal of Psycholinguistic Research 24: 469–88. Montrul, S. (2002) ‘Incomplete acquisition and attrition of Spanish tense/aspect distinctions in adult bilinguals’, Bilingualism: Language and Cognition 5: 39–68. —— (2004) ‘Subject and object expression in Spanish heritage speakers: A case of morphosyntactic convergence’, Bilingualism: Language and Cognition 7(2): 125–42. —— (in press) ‘Second language acquisition and first language loss in adult early bilinguals: Exploring some differences and similarities’, to appear in Second Language Research. Moreton, E. (2002) ‘Structural constraints in the perception of English stop-sonorant clusters’, Cognition 84: 55–71. Morgan, J. L. (1973) ‘Sentence fragments and the notion ‘‘Sentence’’ ’, in B. B. Kachru, R. B. Lees, Y. Malkiel, A. Pietrangeli, and S. Saporta (eds), Issues in Linguistics: Papers in Honor of Henry and Renee Kahane. Urbana, IL: University of Illinois Press. Müller, G. (1999) ‘Optimality, markedness, and word order in German’, Linguistics 37(5): 777–818. —— (2005) ‘Subanalyse verbaler Flexionsmarker’, MS, Universität Leipzig. Müller, H. M., King, J. W., and Kutas, M. (1997) ‘Event-related potentials elicited by spoken relative clauses’, Cognitive Brain Research 5: 193–203. Müller, N. and Hulk, A. (2001) ‘Crosslinguistic influence in bilingual language acquisition: Italian and French as recipient languages’, Bilingualism: Language and Cognition 4: 1–22. Müller, S. (2004) ‘Complex NPs, subjacency, and extraposition’, Snippets, Issue 8. Munson, B. (2001) ‘Phonological pattern frequency and speech production in adults and children’, Journal of Speech, Language, and Hearing Research 44: 778–92. Murphy, V. A. (1997) ‘The effect of modality on a grammaticality judgment task’, Second Language Research 13: 34–65. Muysken, P.
(2000) Bilingual Speech. A Typology of Code-Mixing. Cambridge: Cambridge University Press. Nagy, N. and Reynolds, B. (1997) ‘Optimality theory and variable word-final deletion in Faetar’, Language Variation and Change 9: 37–55. Narayanan, S. and Jurafsky, D. (1998) ‘Bayesian models of human sentence processing’, in M. A. Gernsbacher and S. J. Derry (eds), Proceedings of the 20th Annual Conference of the Cognitive Science Society. Mahwah, NJ: Lawrence Erlbaum Associates. Nespor, M. and Vogel, I. (1986) Prosodic Phonology. Dordrecht: Foris. Ney, H., Essen, U., and Kneser, R. (1994) ‘On structuring probabilistic dependencies in stochastic language modeling’, Computer Speech and Language 8: 1–28. Nooteboom, S. G. and Kruyt, J. G. (1987) ‘Accents, focus distribution, and the perceived distribution of given and new information: An experiment’, Journal of the Acoustical Society of America 82: 1512–24.
Nusbaum, H. C., Pisoni, D. B., and Davis, C. K. (1984) ‘Sizing up the Hoosier mental lexicon: Measuring the familiarity of 20,000 words’, Research on Speech Perception, Progress Report 10. Bloomington: Speech Research Laboratory, Indiana University, pp. 357–76. Ohala, J. J. (1992) ‘Alternatives to the sonority hierarchy for explaining the shape of morphemes’, Papers from the Parasession on the Syllable. Chicago: Chicago Linguistic Society, 319–38. Osterhout, L. and Holcomb, P. J. (1992) ‘Event-related brain potentials elicited by syntactic anomaly’, Journal of Memory and Language 31: 785–804. Paolillo, J. C. (1997) ‘Sinhala diglossia: Discrete or continuous variation?’, Language in Society 26, 2: 269–96. Paradis, J. and Navarro, S. (2003) ‘Subject realization and cross-linguistic interference in the bilingual acquisition of Spanish and English: What is the role of input?’, Journal of Child Language 30: 1–23. Pechmann, T., Uszkoreit, H., Engelkamp, J., and Zerbst, D. (1996) ‘Wortstellung im deutschen Mittelfeld. Linguistische Theorie und psycholinguistische Evidenz’, in C. Habel, S. Kanngießer, and G. Rickheit (eds), Perspektiven der Kognitiven Linguistik. Modelle und Methoden. Opladen: Westdeutscher Verlag, pp. 257–99. Perlmutter, D. (1978) ‘Impersonal passives and the unaccusative hypothesis’, Berkeley Linguistic Society 4: 126–70. Pesetsky, D. (1987) ‘Wh-in situ: Movement and unselective binding’, in E. Reuland and A. T. Meulen (eds), The Representation of (in)Definiteness. Cambridge, MA: MIT Press, pp. 98–129. Peters, J. (2005) Intonatorische Variation im Deutschen. Studien zu ausgewählten Regionalsprachen. Habilitation thesis. University of Potsdam. Pickering, M. J., Traxler, M. J., and Crocker, M. W. (2000) ‘Ambiguity resolution in sentence processing: Evidence against frequency-based accounts’, Journal of Memory and Language 43: 447–75. Pierrehumbert, J. (1980) ‘The Phonology and Phonetics of English Intonation’, Ph.D. thesis, MIT.
—— (1994) ‘Syllable structure and word structure: A study of triconsonantal clusters in English’, in P. Keating (ed.), Phonological Structure and Phonetic Form: Papers in Laboratory Phonology III. Cambridge: Cambridge University Press, pp. 168–88. —— (2001) ‘Stochastic phonology’, GLOT 5 No. 6: 195–207. —— (2002) ‘Word-specific phonetics’, in C. Gussenhoven and N. Warner (eds), Laboratory Phonology 7. Berlin: Mouton de Gruyter, pp. 101–39. —— (2003) ‘Probabilistic phonology: Discrimination and robustness’, in R. Bod, J. Hay, and S. Jannedy (eds), Probabilistic Linguistics. Cambridge, MA: MIT Press, pp. 177–228. —— and Steele, S. (1989) ‘Categories of tonal alignment in English’, Phonetica 46: 181–96. ——, Beckman, M. E. and Ladd, D. R. (2000) ‘Conceptual foundations in phonology as a laboratory science’, in N. Burton-Roberts, P. Carr, and G. Docherty (eds),
Phonological Knowledge: Conceptual and Empirical Issues. New York: Oxford University Press, pp. 273–304. Pinker, S. (1999) Words and Rules: The Ingredients of Language. New York: Basic Books. —— and Prince, A. (1988) ‘On language and connectionism: Analysis of a parallel distributed processing model of language acquisition’, Cognition 28: 73–193. Pitt, M. A. and McQueen, J. M. (1998) ‘Is compensation for coarticulation mediated by the lexicon?’, Journal of Memory and Language 39: 347–70. Pittner, K. (1991) ‘Freie Relativsätze und die Kasushierarchie.’, in E. Feldbusch (ed.), Neue Fragen der Linguistik. Tübingen: Niemeyer, pp. 341–7. Polinsky, M. (1995) ‘American Russian: Language loss meets language acquisition’, in W. Browne, E. Dornish, N. Kondrashova and D. Zec (eds), Annual Workshop on Formal Approaches to Slavic Linguistics. Ann Arbor: Michigan Slavic Publications, pp. 371–406. Pollard, C. and Sag, I. A. (1987) Information-Based Syntax and Semantics, Vol.1: Fundamentals. Stanford University, Stanford: CSLI Lecture Notes No. 13. —— and Sag, I. A. (1992) ‘Anaphors in English and the scope of the binding theory’, Linguistic Inquiry 23: 261–305. Prasada, S. and Pinker, S. (1993) ‘Generalization of regular and irregular morphological patterns’, Language and Cognitive Processes 8: 1–56. Prévost, P. and White, L. (2000) ‘Missing surface inflection or impairment in second language acquisition? Evidence from tense and agreement’, Second Language Research 16: 103–33. Prince, A. and Smolensky, P. (1993) ‘Optimality theory: Constraint interaction in generative grammar’. Technical Report TR-2, Rutgers University Center for Cognitive Science. Published as Prince and Smolensky (2004). —— and Smolensky, P. (1997) ‘Optimality: From neural networks to universal grammar’, Science 275: 1604–10. —— and Smolensky, P. (2004) Optimality theory: Constraint interaction in generative grammar. Oxford: Blackwell. Pritchett, B. L. (1992) Grammatical Competence and Parsing Performance.
Chicago: University of Chicago Press. Ramscar, M. (2002) ‘The role of meaning in inflection: Why the past tense does not require a rule’, Cognitive Psychology 45: 45–94. Randall, J. (in press) ‘Features and linking rules: A parametric account of auxiliary selection’, to appear in R. Aranovich (ed.), Cross-Linguistic Perspectives on Auxiliary Selection. Amsterdam: John Benjamins. Rayner, K., Carlson, M., and Frazier, L. (1983) ‘Interaction of syntax and semantics during sentence processing: Eye movements in the analysis of semantically biased sentences’, Journal of Verbal Learning and Verbal Behavior 22: 358–74. Reinhart, T. (1981) ‘Pragmatics and linguistics: An analysis of sentence topics’, Philosophica 27: 53–94.
—— (1996) ‘Interface economy—focus and markedness’, in C. Wilder, H. M. Gaertner, and M. Bierwisch (eds), The Role of Economy Principles in Linguistic Theory. Berlin: Akademie Verlag. —— (2000a) ‘Strategies of anaphora resolution’, in H. Bennis, M. Everaert, and E. Reuland (eds), Interface Strategies. Amsterdam: Royal Netherlands Academy of Arts and Sciences, pp. 295–324. —— (2000b) ‘The theta system: Syntactic realization of verbal concepts’, OTS working paper in linguistics 00, 01/TL, Utrecht Institute of Linguistics, OTS. —— (2003) ‘The theta system—an overview’, Theoretical Linguistics 28(3). —— and Reuland, E. (1991) ‘Anaphors and logophors: An argument structure perspective’, in J. Koster and E. Reuland (eds), Long Distance Anaphora. Cambridge: Cambridge University Press, pp. 283–321. —— and Reuland, E. (1993) ‘Reflexivity’, Linguistic Inquiry 24: 657–720. Reuland, E. (2000) ‘The fine structure of grammar: Anaphoric relations’, in Z. Frajzyngier and T. Curl (eds), Reflexives: Forms and Functions. Amsterdam: John Benjamins, pp. 1–40. —— (2001) ‘Primitives of binding’, Linguistic Inquiry 32(2): 439–92. —— (2003) ‘State-of-the-article. Anaphoric dependencies: A window into the architecture of the language system’, GLOT International 7(1/2): 2–25. —— and Reinhart, T. (1995) ‘Pronouns, anaphors and case’, in H. Haider, S. Olsen, and S. Vikner (eds), Studies in Comparative Germanic Syntax. Dordrecht: Kluwer, pp. 241–69. Reynolds, B. (1994) ‘Variation and Phonological Theory’, Ph.D. thesis, University of Pennsylvania. Riehl, A. (2003a) ‘American English flapping: Evidence against paradigm uniformity with phonetic features’, Proceedings of the 15th International Congress of Phonetic Sciences, 2753–6. —— (2003b) ‘American English flapping: Perceptual and acoustic evidence against paradigm uniformity with phonetic features’, Working Papers of the Cornell Phonetics Laboratory 15: 271–337. Riemsdijk, H. van (1989) ‘Movement and regeneration’, in P.
Benincà (ed.), Dialectal Variation and the Theory of Grammar. Dordrecht: Foris, pp. 105–36. Riezler, S. (1996) ‘Quantitative constraint logic programming for weighted grammar applications’, in Proceedings of the 1st Conference on Logical Aspects of Computational Linguistics. Berlin: Springer. Ringen, C. and Heinamaki, O. (1999) ‘Variation in Finnish vowel harmony: An OT account’, Natural Language and Linguistic Theory 17: 303–37. Rizzi, L. (1982) Italian Syntax. Dordrecht: Foris. —— (2002) ‘On the grammatical basis of language development: A case study’, MS, University of Siena. —— (2004) The Structure of CP and IP. The Cartography of Syntactic Structures, Volume 2. Oxford: Oxford University Press.
Robins, R. H. (1957) ‘Vowel nasality in Sundanese: A phonological and grammatical study’, Studies in Linguistics (special volume of the Philological Society). Oxford: Basil Blackwell, pp. 87–103. Röder, B., Schicke, T., Stock, O., Heberer, G., and Rösler, F. (2000) ‘Word order effects in German sentences and German pseudo-word sentences’, Zeitschrift für Sprache und Kognition 19: 31–7. ——, Stock, O., Neville, H., Bien, S., and Rösler, F. (2002) ‘Brain activation modulated by the comprehension of normal and pseudo-word sentences of different processing demands: A functional magnetic resonance imaging study’, NeuroImage 15: 1003–14. Rohdenburg, G. (1996) ‘Cognitive complexity and grammatical explicitness in English’, Cognitive Linguistics 7: 149–82. Roland, D. and Jurafsky, D. (1998) ‘How verb subcategorization frequencies are affected by corpus choice’, in Proceedings of the 17th International Conference on Computational Linguistics and 36th Annual Meeting of the Association for Computational Linguistics. Montréal, pp. 1122–28. —— and Jurafsky, D. (2002) ‘Verb sense and verb subcategorization probabilities’, in P. Merlo and S. Stevenson (eds), The Lexical Basis of Sentence Processing: Formal, Computational, and Experimental Issues. Amsterdam: John Benjamins, pp. 325–46. Ross, J. R. (1971) ‘Variable Strength’, MS, MIT. Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986a) ‘Learning internal representations by error propagation’, in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1: Foundations. Cambridge, MA: MIT Press, pp. 318–62. ——, McClelland, J. L., and the PDP Research Group (1986b) Parallel Distributed Processing. Explorations in the Microstructure of Cognition. Cambridge, MA: MIT Press. Russell, K. (1999) ‘MOT: Sketch of an optimality theoretic approach to morphology’, MS, http://www.umanitoba.ca/linguistics/russell/. Sabourin, L. (2003) ‘Grammatical Gender and Second Language Processing’, Ph.D.
thesis, University of Groningen. Saito, M. (1985) ‘Some Asymmetries in Japanese and Their Theoretical Consequences’, Ph.D. thesis, MIT. —— (1989) ‘Scrambling as semantically vacuous A’-movement’, in M. Baltin and A. Kroch (eds), Alternative Conceptions of Phrase Structure. Chicago: University of Chicago Press. Samuel, A. G. (1981) ‘The role of bottom-up confirmation in the phonemic-restoration illusion’, Journal of Experimental Psychology: Human Perception and Performance 7: 1124–31. Sankoff, D. and Labov, W. (1979) ‘On the uses of variable rules’, Language in Society 8: 189–222. Sapir, E. and Hoijer, H. (1967) The Phonology and Morphology of the Navajo Language. Berkeley: University of California Press.
Sarle, W. S. (1994) ‘Neural networks and statistical models’, in Proceedings of the 19th Annual SAS Users Group International Conference. Cary, NC: SAS Institute, pp. 1538–50. Schafer, A. J. (1997) ‘Prosodic Parsing: The Role of Prosody in Sentence Comprehension’, Ph.D. thesis, Amherst, MA: University of Massachusetts. ——, Carlson, K., Clifton, C., and Frazier, L. (2000) ‘Focus and the interpretation of pitch accent: Disambiguating embedded questions’, Language and Speech 43: 75–105. Schladt, M. (2000) ‘The typology and grammaticalization of reflexives’, in Z. Frajzyngier and T. Curl (eds), Reflexives: Forms and Functions. Amsterdam: John Benjamins. Schlesewsky, M. and Bornkessel, I. (2003) ‘Ungrammaticality detection and garden path strength: A commentary on Meng and Bader’s (2000) evidence for serial parsing’, Language and Cognitive Processes 18: 299–311. —— and Bornkessel, I. (2004) ‘On incremental interpretation: Degrees of meaning accessed during sentence comprehension’, Lingua 114: 1213–34. ——, Fanselow, G., Kliegl, R., and Krems, J. (2000) ‘The subject-preference in the processing of locally ambiguous Wh-questions in German’, in B. Hemforth and L. Konieczny (eds), German Sentence Processing. Dordrecht: Kluwer, pp. 65–93. ——, Fanselow, G., and Frisch, S. (2003) ‘Case as a trigger for reanalysis—some arguments from the processing of double case ungrammaticalities in German’, MS, University of Potsdam. Schmerling, S. (1976) Aspects of English Sentence Stress. Austin: University of Texas Press. Schmitz, K. (2003) ‘Subject omission and realization in German–Italian bilingual children’, MS, University of Hamburg. Schriefers, H., Friederici, A. D., and Kühn, K. (1995) ‘The processing of locally ambiguous relative clauses in German’, Journal of Memory and Language 34: 499–520. Schütze, C. T. (1996) The Empirical Base of Linguistics. Grammaticality Judgments and Linguistic Methodology. Chicago: The University of Chicago Press. —— and Gibson, E. 
(1999) ‘Argumenthood and English prepositional phrase attachment’, Journal of Memory and Language 40: 409–31. Schwarzschild, R. (1999) ‘GIVENness, AvoidF and other constraints on the placement of accent’, Natural Language Semantics 7: 141–77. Scobbie, J. (2004) ‘Flexibility in the face of incompatible English VOT systems’, written version of Lab Phon 8 paper. Selkirk, E. (1984) Phonology and Syntax. The Relation between Sound and Structure. Cambridge, MA: MIT Press. —— (1995) ‘Sentence prosody: Intonation, stress and phrasing’, in J. Goldsmith (ed.), Handbook of Phonological Theory. Cambridge, MA: Blackwell, pp. 550–69. —— (2000) ‘The interaction of constraints on prosodic phrasing’, in M. Horne (ed.), Prosody: Theory and Experiment. Amsterdam: Kluwer, pp. 231–62.
Sendlmeier, W. F. (1987) ‘Auditive judgments of word similarity’, Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung 40: 538–46. Serratrice, L. (2004) ‘Anaphoric interpretation of null and overt pronominal subjects in bilingual and monolingual Italian acquisition’, MS, University of Manchester. ——, Sorace, A., and Paoli, S. (2004) ‘Transfer at the syntax–pragmatics interface: Subjects and objects in Italian–English bilingual and monolingual acquisition’, Bilingualism: Language and Cognition: 183–207. Sevald, C. and Dell, G. S. (1994) ‘The sequential cueing effect in speech production’, Cognition 53: 91–127. Skousen, R., Lonsdale, D., and Parkinson, D. B. (2002) Analogical Modeling: An Exemplar-Based Approach to Language. Amsterdam: John Benjamins. Skut, W., Krenn, B., Brants, T., and Uszkoreit, H. (1997) ‘An annotation scheme for free word order languages’, in Proceedings of the 5th Conference on Applied Natural Language Processing. Washington, DC. Smith, J. (2000) ‘Positional faithfulness and learnability in optimality theory’, in R. Daly and A. Riehl (eds), Proceedings of ESCOL 99. Ithaca: CLC Publications, pp. 203–14. Smith, O. W., Koutstaa, C. W., and Kepke, A. N. (1969) ‘Relation of language distance to learning to pronounce Greenberg and Jenkins List-1 CCVCS’, Perceptual and Motor Skills 29: 187. Smolensky, P., Legendre, G., and Miyata, Y. (1992) ‘Principles for an integrated connectionist/symbolic theory of higher cognition’, Report CU-CS-600–92, Computer Science Department, University of Colorado at Boulder. ——, Legendre, G., and Miyata, Y. (1993) ‘Integrating connectionist and symbolic computation for the theory of language’, Current Science 64: 381–91. Snyder, W. (2000) ‘An experimental investigation of syntactic satiation effects’, Linguistic Inquiry 31: 575–82. Sorace, A. (1992) ‘Lexical Conditions on Syntactic Knowledge: Auxiliary Selection in Native and Non-Native Grammars of Italian’, Ph.D. thesis, University of Edinburgh. 
—— (1993a) ‘Incomplete vs. divergent representations of unaccusativity in nonnative grammars of Italian’, Second Language Research 9: 22–47. —— (1993b) ‘Unaccusativity and auxiliary choice in non-native grammars of Italian and French: asymmetries and predictable indeterminacy’, Journal of French Language Studies 3: 71–93. —— (1995) ‘Acquiring argument structures in a second language: The unaccusative/unergative distinction’, in L. Eubank, L. Selinker, and M. Sharwood Smith (eds), The Current State of the Interlanguage. Amsterdam: John Benjamins, pp. 153–75. —— (1996) ‘The use of acceptability judgments in second language acquisition research’, in T. Bhatia and W. Ritchie (eds), Handbook of Second Language Acquisition. San Diego: Academic Press. —— (2000a) ‘Syntactic optionality in L2 acquisition’, Second Language Research 16: 93–102.
Sorace, A. (2000b) ‘Gradients in auxiliary selection with intransitive verbs’, Language 76: 859–90. —— (2003a) ‘Gradedness at the lexicon–syntax interface: Evidence from auxiliary selection and implications for unaccusativity’, in A. Alexiadou, E. Anagnostopoulou, and M. Everaert (eds), The Unaccusativity Puzzle: Explorations in the Syntax–Lexicon Interface. Oxford: Oxford University Press, pp. 243–68. —— (2003b) ‘Near-nativeness’, in M. Long and C. Doughty (eds), Handbook of Second Language Acquisition. Oxford: Blackwell, pp. 130–51. —— (2005) ‘Syntactic optionality at interfaces’, in L. Cornips and K. Corrigan (eds), Syntax and Variation: Reconciling the Biological and the Social. Amsterdam: John Benjamins, pp. 46–111. —— (in press) ‘Possible manifestations of ‘‘shallow processing’’ in advanced L2 speakers’, to appear in Applied Psycholinguistics. —— and Keller, F. (2005) ‘Gradedness in linguistic data’, Lingua 115: 1497–1524. Speer, S., Warren, P., and Schafer, A. (2003) ‘Intonation and sentence processing’, in Proceedings of the International Congress of Phonetic Sciences 15. Barcelona: 95–106. Sproat, R. and Fujimura, O. (1993) ‘Allophonic variation in English /l/ and its implications for phonetic implementation’, Journal of Phonetics 21: 291–311. Sprouse, R. A. and Vance, B. (1999) ‘An explanation for the decline of null pronouns in certain Germanic and Romance languages’, in M. DeGraff (ed.), Language Creation and Language Change: Creolization, Diachrony, and Development. Cambridge, MA: MIT Press, pp. 257–84. Stallings, L. M. (1998) ‘Evaluating Heaviness: Relative Weight in the Spoken Production of Heavy-NP Shift’, Ph.D. thesis, University of Southern California. ——, MacDonald, M., and O’Seaghda, P. (1998) ‘Phrasal ordering constraints in sentence production: Phrase length and verb disposition in heavy-NP shift’, Journal of Memory and Language 39: 392–417. Stechow, A. v. and Uhmann, S. (1986) ‘Some remarks on focus projection’, in W. Abraham and S. 
de Meij (eds), Topic, Focus and Configurationality. Amsterdam/Philadelphia: John Benjamins, pp. 295–320. Steinhauer, K. (2000) ‘Hirnphysiologische Korrelate prosodischer Satzverarbeitung bei gesprochener und geschriebener Sprache’, MPI series in cognitive neuroscience 18. Steriade, D. (1990) ‘Gestures and autosegments: Comments on Browman and Goldstein’s Paper’, in J. Kingston and M. Beckman (eds), Papers in Laboratory Phonology II: Between the Grammar and Physics in Speech. Cambridge: Cambridge University Press, pp. 382–97. —— (1995) ‘Positional neutralization’, unfinished MS, UCLA. —— (2000) ‘Paradigm uniformity and the phonetics/phonology boundary’, in M. Broe and J. B. Pierrehumbert (eds), Acquisition and the Lexicon: Papers in Laboratory Phonology V. Cambridge: Cambridge University Press, pp. 313–35. —— (2001) ‘Directional asymmetries in place assimilation: A perceptual account’, in E. Hume and K. Johnson (eds), The Role of Speech Perception in Phonology. New York: Academic Press, pp. 219–50.
Sternefeld, W. (2001) ‘Grammatikalität und Sprachvermögen. Anmerkungen zum Induktionsproblem in der Syntax’, in J. Bayer and C. Römer (eds), Von der Philologie zur Grammatiktheorie: Peter Suchsland zum 65. Geburtstag. Tübingen: Niemeyer, pp. 15–44. Stevens, S. S. (1975) ‘On the psychophysical law’, Psychological Review 64: 153–81. Stolcke, A. (1995) ‘An efficient probabilistic context-free parsing algorithm that computes prefix probabilities’, Computational Linguistics 21: 165–201. Stowell, T. and Beghelli, F. (1997) ‘Distributivity and negation’, in A. Szabolcsi (ed.), Ways of Scope Taking. Dordrecht: Kluwer, pp. 71–107. Strawson, P. F. (1964) ‘Identifying reference and truth-values’, Theoria 30: 96–118. Sturt, P., Pickering, M. J., and Crocker, M. W. (1999) ‘Structural change and reanalysis difficulty in language comprehension’, Journal of Memory and Language 40: 136–50. ——, Pickering, M. J., Scheepers, C., and Crocker, M. W. (2001) ‘The preservation of structure in language comprehension: Is reanalysis the last resort?’ Journal of Memory and Language 45: 283–307. Suppes, P. (1970) ‘Probabilistic grammars’, Synthese 22: 95–116. Swinney, D. A. (1979) ‘Lexical access during sentence comprehension: (Re)consideration of context effects’, Journal of Verbal Learning and Verbal Behavior 18: 645–60. Szendroi, K. (2004) ‘Focus and the interaction between syntax and pragmatics’, Lingua 114: 229–54. Takahashi, D. (1993) ‘Movement of Wh-phrases in Japanese’, Natural Language and Linguistic Theory 11: 655–78. Tanenhaus, M. K., Spivey-Knowlton, M. J., Eberhard, K. M., and Sedivy, J. C. (1995) ‘Integration of visual and linguistic information in spoken language comprehension’, Science 268: 1632–4. ——, Spivey-Knowlton, M. J., and Hanna, J. E. (2000) ‘Modelling discourse context effects: A multiple constraints approach’, in M. Crocker, M. Pickering, and C. Clifton (eds), Architectures and Mechanisms for Language Processing. Cambridge: Cambridge University Press, pp. 90–118. 
Taraban, R. and McClelland, J. L. (1988) ‘Constituent attachment and thematic role assignment in sentence processing: Influences of content-based expectation’, Journal of Memory and Language 27: 597–632. Tesar, B. (1997) ‘An iterative strategy for learning metrical stress in optimality theory’, in E. Hughes, M. Hughes, and A. Greenhill (eds), Proceedings of the 21st Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla, pp. 615–26. —— and Smolensky, P. (1998) ‘Learnability in optimality theory’, Linguistic Inquiry 29(2): 229–68. —— and Smolensky, P. (2000) Learnability in Optimality Theory. Cambridge, MA: MIT Press.
Thráinsson, H. (1991) ‘Long-distance reflexives and the typology of NPs’, in J. Koster and E. Reuland (eds), Long-Distance Anaphora. Cambridge: Cambridge University Press, pp. 49–76. Timberlake, A. (1977) ‘Reanalysis and actualization in syntactic changes’, Linguistic Inquiry 8: 141–77. Timmermans, M., Schriefers, H., Dijkstra, T., and Haverkorth, M. (2004) ‘Disagreement on agreement: Person agreement between coordinated subjects and verbs in Dutch and German’, in Linguistics 42: 905–29. Tomlin, R. S. (1986) Basic Word Order: Functional Principles. London: Routledge (Croom Helm). Travis, L. (1984) ‘Parameters and Effects of Word Order Variation’, Ph.D. thesis, Department of Linguistics, MIT. —— (1989) ‘Parameters of phrase structure’, in M. R. Baltin and A. S. Kroch (eds), Alternative Conceptions of Phrase Structure. Chicago: The University of Chicago Press. Treisman, M. (1978) ‘Space or lexicon? The word frequency effect and the error frequency effect’, Journal of Verbal Learning and Verbal Behavior 17: 37–59. Truckenbrodt, H. (1999) ‘On the relation between syntactic phrases and phonological phrases’, Linguistic Inquiry 30: 219–55. Trueswell, J. C. (1996) ‘The role of lexical frequency in syntactic ambiguity resolution’, Journal of Memory and Language 35: 566–85. —— and Tanenhaus, M. K. (1994) ‘Toward a lexicalist framework for constraint-based syntactic ambiguity resolution’, in C. Clifton, L. Frazier, and K. Rayner (eds), Perspectives on Sentence Processing. Hillsdale, NJ: Lawrence Erlbaum Associates, pp. 155–79. ——, Tanenhaus, M. K., and Kello, C. (1993) ‘Verb-specific constraints in sentence processing: Separating effects of lexical preference from garden-paths’, Journal of Experimental Psychology: Learning, Memory, and Cognition 19: 528–53. Tsimpli, I. and Sorace, A. (2005) ‘Differentiating ‘‘interfaces’’: L2 performance in syntax/semantics and syntax/discourse phenomena’, MS, University of Thessaloniki and University of Edinburgh. 
——, Sorace, A., Heycock, C., and Filiaci, F. (2004) ‘First language attrition and syntactic subjects: A study of Greek and Italian near-native speakers of English’, International Journal of Bilingualism 8: 257–77. Ueyama, A. (1998) ‘Two Types of Dependency’, Ph.D. thesis, University of Southern California. Vallduví, E. (1992) The Informational Component. New York: Garland. Van Hoof, H. (1997) ‘On split topicalization and ellipsis’, Technical Report 112, Arbeitspapiere des Sonderforschungsbereichs 340, Tübingen. Van Hout, A. (2000) ‘Event semantics in the lexicon–syntax interface: Verb frame alternations in Dutch and their acquisition’, in C. Tenny and J. Pustejovsky (eds), Events as Grammatical Objects. Stanford: CSLI, pp. 239–82.
Vennemann, T. (1974) ‘Theoretical word order studies: Results and problems’, Papiere zur Linguistik 7: 5–25. Vergnaud, J. R. and Zubizarreta, M. L. (1992) ‘The definite determiner and the inalienable constructions in French and English’, Linguistic Inquiry 23: 592–652. Vetter, H. J., Volovecky, J., and Howell, R. W. (1979) ‘Judgments of grammaticalness: A partial replication and extension’, Journal of Psycholinguistic Research 8: 567–83. Viterbi, A. J. (1967) ‘Error bounds for convolutional codes and an asymptotically optimal decoding algorithm’, IEEE Transactions on Information Theory 13: 260–9. Vitevitch, M. and Luce, P. (1998) ‘When words compete: Levels of processing in perception of spoken words’, Psychological Science 9: 325–9. ——, Luce, P., Charles-Luce, J., and Kemmerer, D. (1997) ‘Phonotactics and syllable stress: Implications for the processing of spoken nonsense words’, Language and Speech 40: 47–62. Vitz, P. C. and Winkler, B. S. (1973) ‘Predicting judged similarity of sound of English words’, Journal of Verbal Learning and Verbal Behavior 12: 373–88. Vogel, R. (2001) ‘Case conflict in German free relative constructions. An optimality theoretic treatment’, in G. Müller and W. Sternefeld (eds), ‘Competition in syntax’, No. 49 in Studies in Generative Grammar. Berlin and New York: de Gruyter, pp. 341–75. —— (2002) ‘Free relative constructions in OT syntax’, in G. Fanselow and C. Féry (eds), ‘Resolving conflicts in grammars: Optimality theory in syntax, morphology, and phonology,’ in Linguistische Berichte Sonderheft 11, Hamburg: Helmut Buske Verlag, pp. 119–62. —— (2003a) ‘Remarks on the architecture of OT syntax’, in R. Blutner and H. Zeevat (eds), Optimality Theory and Pragmatics. Houndmills, Basingstoke, Hampshire, England: Palgrave Macmillan, pp. 211–27. —— (2003b) ‘Surface matters. Case conflict in free relative constructions and case theory’, in E. Brandner and H. Zinsmeister (eds), New Perspectives on Case Theory. 
Stanford: CSLI Publications, pp. 269–99. —— (2004) ‘Correspondence in OT syntax and minimal link effects’, in A. Stepanov, G. Fanselow, and R. Vogel (eds), Minimality Effects in Syntax. Berlin: Mouton de Gruyter, pp. 401–41. —— and Frisch, S. (2003) ‘The resolution of case conflicts. A pilot study’, in S. Fischer, R. van de Vijver, and R. Vogel (eds), Experimental Studies in Linguistics 1, vol. 21 of Linguistics in Potsdam. Institute of Linguistics, Potsdam: University of Potsdam, pp. 91–103. —— and Zugck, M. (2003) ‘Counting markedness. A corpus investigation on German free relative constructions’, in S. Fischer, R. van de Vijver, and R. Vogel (eds), Experimental Studies in Linguistics 1, vol. 21 of Linguistics in Potsdam. Institute of Linguistics, Potsdam: University of Potsdam, pp. 105–22. ——, Frisch, S., and Zugck, M. (in preparation) ‘Case matching. An empirical study.’ MS, University of Potsdam. To appear in Linguistics in Potsdam.
Warner, N., Jongman, A., Sereno, J., and Kemps, R. (2004) ‘Incomplete neutralization and other sub-phonemic durational differences in production and perception: Evidence from Dutch’, Journal of Phonetics 32: 251–76. Warren, P., Grabe, E., and Nolan, F. (1995) ‘Prosody, phonology and parsing in closure ambiguities’, Language and Cognitive Processes 10: 457–86. Wasow, T. (1997) ‘Remarks on grammatical weight’, Language Variation and Change 9: 81–105. —— (2002) Postverbal Behavior. Stanford University, Stanford: CSLI Publications. Welby, P. (2003) ‘Effects of pitch accent position, type and status on focus projection’, Language and Speech 46: 53–8. White, L. (2003) Second Language Acquisition and Universal Grammar. Cambridge: Cambridge University Press. Wickelgren, W. A. (1977) ‘Speed-accuracy tradeoff and information processing dynamics’, Acta Psychologica 41: 67–85. Wiltschko, M. (1998) ‘Superiority in German’, in E. Curtis, J. Lyle, and G. Webster (eds), WCCFL 16, the Proceedings of the Sixteenth West Coast Conference on Formal Linguistics. Stanford: CSLI, pp. 431–45. Withgott, M. (1983) ‘Segmental Evidence for Phonological Constituents’, Ph.D. thesis, University of Texas, Austin. Wright, R. (1996) ‘Consonant Clusters and Cue Preservation’, Ph.D. thesis, University of California, Los Angeles. Wunderlich, D. (1997) ‘Cause and the structure of verbs’, Linguistic Inquiry 28: 27–68. —— (2003) ‘Optimal case patterns: German and Icelandic compared’, in E. Brandner and H. Zinsmeister (eds), New Perspectives on Case Theory. Stanford: CSLI Publications, pp. 329–65. Yamashita, H. (2002) ‘Scrambled sentences in Japanese: Linguistic properties and motivations for production’, Text 22(4): 597–633. —— and Chang, F. (2001) ‘ ‘‘Long before short’’ preference in the production of a head-final language’, Cognition 81: B45–B55. Young, R. W., Morgan Sr., W., and Midgette, S. (1992) Analytical Lexicon of Navajo. Albuquerque: University of New Mexico Press. Zec, D. 
(2002) ‘On the prosodic status of function words’, Working Papers of the Cornell Phonetics Laboratory 14: 206–48. Zribi-Hertz, A. (1989) ‘A-type binding and narrative point of view’, Language 65: 695–727. Zsiga, E. (2000) ‘Phonetic alignment constraints: consonant overlap and palatalization in English and Russian’, Journal of Phonetics 28: 69–102. Zue, V. and Laferriere, M. (1979) ‘Acoustic study of medial /t, d/ in American English’, JASA 66: 1039–50. Zuraw, K. R. (2000) ‘Patterned Exceptions in Phonology’, Ph.D. thesis, University of California, Los Angeles.
Index of Languages
Abun 78–9
Aguatec 78–9
Albanian 78–9
Amuesha 78–9
Arabic 9
Basque 223
Bulgarian 49
Chatino 78–9
Chinantec 78–9
Chontal 78–9
Chukchee 78–9
Coeur D’Alene 78–9
Croatian 338
Cuicatec 78–9
Dakota 78–9
Danish 317, 321–2, 333–5
Dutch 20, 49–51, 53, 59, 61, 63–8, 88–105, 108, 151, 321–2,
English 2, 6–11, 13, 18, 26–8, 30, 32–4, 37–41, 49–54, 59–61, 63, 78–81, 92–3, 106–7, 112–15, 119–22, 149–52, 171, 185–6, 200–3, 207, 209–11, 222–5, 231–2, 259–60, 265, 284, 292, 321, 334, 336–7, 340, 349–53, 355–8
Finnish 75
French 27–30, 37, 39, 65–6, 68, 92–3, 110–11, 117, 119, 122, 281, 349
Frisian 53, 66–7
German 16, 20, 49, 67, 95, 108, 115, 125–8, 130, 146–8, 151–5, 161, 165–6, 242–3, 247, 249, 254–6, 261–4, 266–8, 283, 292–309, 312–15, 321, 332–4, 357
Greek 78–9, 118, 349
Hebrew 329–32, 334
Huichol 78–9
Hungarian 51, 78–9, 118
Icelandic 63
Ioway-Oto 78–9
Irish 97
Italian 18, 78–9, 106–8, 110–15, 117, 119–22
Japanese 13, 18–19, 207, 211–13, 217, 225, 338–48, 352–4, 357–8
Keresan 78–9
Khasi 78–9
Korean 345
Koryak 78–9
Lummi 259–60
Malayalam 60–1, 64
Mazatec 78–9
Navajo 11, 186, 191–3, 195–204
Norwegian 78–9
Osage 78–9
Otomi 78–9
Pame 78–9
Portuguese 78–9
Romanian 78–9
Russian 49, 115
Spanish 41, 111, 115, 121, 171, 231–2, 352
Sundanese 37
Swedish 247, 333
Tagalog 72
Takelma 78–9
Telugu 78–9
Terena 78–9
Thai 78–9
Totonaco 78–9
Tsou 78–9
Turkish 115
Wichita 78–9
Index of Subjects
adjacency hypothesis 209, 217–18, 220–2
alignment
  generally 189–90
  string alignment algorithm 190–1
allophony 39
ambiguous case marking 308–11
ambiguous verb form 302–8
anaphora 53, 56–8, 60–8
argument order variation (in German) 125–8
  argument order permutation types
    generally 127–9
    pronoun ‘movement’ 127
    scrambling 127
    topicalization 127
    wh-movement 127
  argument order reanalysis 130–8
argument structure theory
  constructional theory 108–9
  projectionist theory 108
  split intransitivity hierarchy (SIH) 109–11, 115–19, 122
attachment 231–2, 239
canonical binding theory 56–8
combination 208
confusability testing 76
constituent recognition domain (CRD) 210–11
constraints
  1to1 255, 263–4
  1to1 & S