Linguistic Evidence
Studies in Generative Grammar 85
Editors
Henk van Riemsdijk, Harry van der Hulst, Jan Koster
Mouton de Gruyter Berlin · New York
Linguistic Evidence
Empirical, Theoretical and Computational Perspectives
Edited by
Stephan Kepser, Marga Reis
Mouton de Gruyter Berlin · New York
Mouton de Gruyter (formerly Mouton, The Hague) is a Division of Walter de Gruyter GmbH & Co. KG, Berlin.
The series Studies in Generative Grammar was formerly published by Foris Publications Holland.
Printed on acid-free paper which falls within the guidelines of the ANSI to ensure permanence and durability.
Library of Congress Cataloging-in-Publication Data Linguistic evidence : empirical, theoretical, and computational perspectives / edited by Stephan Kepser, Marga Reis. p. cm. – (Studies in generative grammar ; 85) Includes bibliographical references. ISBN-13: 978-3-11-018312-2 (cloth : alk. paper) ISBN-10: 3-11-018312-9 (cloth : alk. paper) 1. Linguistics – Methodology. I. Kepser, Stephan, 1967- II. Reis, Marga. III. Series. P126.L48 2005 410.72-dc22 2005031124
Bibliographic information published by Die Deutsche Bibliothek Die Deutsche Bibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data is available in the Internet at <http://dnb.ddb.de>.
ISBN-13: 978-3-11-018312-2 ISBN-10: 3-11-018312-9 ISSN 0167-4331 © Copyright 2005 by Walter de Gruyter GmbH & Co. KG, D-10785 Berlin. All rights reserved, including those of translation into foreign languages. No part of this book may be reproduced in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher. Cover design: Christopher Schneider, Berlin. Printed in Germany.
Contents
Evidence in Linguistics Stephan Kepser and Marga Reis
1
Gradedness and Consistency in Grammaticality Judgments Aria Adli
7
Null Subjects and Verb Placement in Old High German Katrin Axel
27
Beauty and the Beast: What Running a Broad-Coverage Precision Grammar over the BNC Taught Us about the Grammar – and the Corpus Timothy Baldwin, John Beavers, Emily M. Bender, Dan Flickinger, Ara Kim, and Stephan Oepen
49
Seemingly Indefinite Definites Greg Carlson and Rachel Shirley Sussman
71
Animacy as a Driving Cue in Change and Acquisition in Brazilian Portuguese Sonia M. L. Cyrino and Ruth E. V. Lopes
87
Aspectual Coercion and On-line Processing: The Case of Iteration Sacha DeVelle
105
Why Do Children Fail to Understand Weak Epistemic Terms? An Experimental Study Serge Doitchinov
123
Processing Negative Polarity Items: When Negation Comes Through the Backdoor Heiner Drenhaus, Stefan Frisch, and Douglas Saddy
145
Linguistic Constraints on the Acquisition of Epistemic Modal Verbs Veronika Ehrich
165
The Decathlon Model of Empirical Syntax Sam Featherston
187
Examining Constraints on the Benefactive Alternation by Using the World Wide Web as a Corpus Christiane Fellbaum
209
A Quantitative Corpus Study of German Word Order Variation Kris Heylen
241
Which Statistics Reflect Semantics? Rethinking Synonymy and Word Similarity Derrick Higgins
265
Language Production Errors as Evidence for Language Production Processes – The Frankfurt Corpora Annette Hohenberger and Eva-Maria Waleschkowski
285
A Multi-Evidence Study of European and Brazilian Portuguese wh-Questions Mary Aizawa Kato and Carlos Mioto
307
The Relationship between Grammaticality Ratings and Corpus Frequencies: A Case Study into Word Order Variability in the Midfield of German Clauses Gerard Kempen and Karin Harbusch
329
The Emergence of Productive Non-Medical -itis: Corpus Evidence and Qualitative Analysis Anke Lüdeling and Stefan Evert
351
Experimental Data vs. Diachronic Typological Data: Two Types of Evidence for Linguistic Relativity Wiltrud Mihatsch
371
Reflexives and Pronouns in Picture Noun Phrases: Using Eye Movements as a Source of Linguistic Evidence Jeffrey T. Runner, Rachel S. Sussman, and Michael K. Tanenhaus
393
The Plural is Semantically Unmarked Uli Sauerland, Jan Anderssen, and Kazuko Yatsushiro
413
Coherence – an Experimental Approach Tanja Schmid, Markus Bader, and Josef Bayer
435
Thinking About What We Are Asking Speakers to Do Carson T. Schütze
457
A Prosodic Factor for the Decline of Topicalisation in English Augustin Speyer
485
On the Syntax of DP Coordination: Combining Evidence from Reading-Time Studies and Agrammatic Comprehension Ilona Steiner
507
Lexical Statistics and Lexical Processing: Semantic Density, Information Complexity, Sex, and Irregularity in Dutch Wieke M. Tabak, Robert Schreuder, and R. Harald Baayen
529
The Double Competence Hypothesis. On Diachronic Evidence Helmut Weiß
557
List of Contributors
577
Evidence in Linguistics Stephan Kepser and Marga Reis
As is well known, the central objects of linguistic enquiry – language, languages, and the factors/mechanisms systematically (co-)governing language acquisition, language processing, language use, and language change – cannot be directly accessed; they must be reconstructed from the accessible manifestations of linguistic behaviour. These manifestations constitute the realm of possibly usable linguistic data. Since they fall into many types – introspective data, corpus data, data from (psycho-)linguistic experiments, synchronic vs. diachronic data, typological data, neurolinguistic data, data from first and second language learning, from language disorders, etc. –, and since each type, apart from historical data, can be instantiated by infinitely many tokens, the linguist’s central task of building theories about the above-mentioned linguistic objects is invariably bound up with several empirical tasks as well: (i) collecting/selecting a representative as well as reliable database from one or more data types, (ii) evaluating the various data types as to how they reflect linguistic competence (recall that even so-called primary data from introspection as well as authentic language production are complex performance data involving different nonlinguistic factors), (iii) assessing the relationship between the various data types such that comparison between studies of the same issue based on different data types is possible, and potential conflicts in results can in principle be resolved. As will be obvious, the three empirical tasks are largely interdependent. However, they are to a considerable degree dependent on linguistic theorising as well: Task (i) must typically be solved for specific linguistic problems, the specific shape of which is determined by linguistic theory proper. Tasks (ii) and (iii) must be related to theories about the interaction of linguistic competence with nonlinguistic faculties and factors in performance. 
Thus, gaining relevant linguistic evidence from the mass of potentially available data is neither a trivial matter nor a purely methodological one that can be pursued in isolation from concrete linguistic enquiry and its theoretical concerns. Moreover, providing useful data collections (be it appropriately annotated corpora, collections of controlled speaker judgements, experimentally elicited data, etc.) is also a linguistically challenging ‘practical’ task. In short, linguistic evidence is an extremely important topic as well as a challenging problem for linguists of all persuasions. Given the fundamental nature of the problem, linguistic evidence is a remarkably new topic of linguistic discussion. Traditionally, concrete speech events, i.e., naturally occurring written or spoken utterances, were taken without further ado as the only relevant source of linguistic data, although the need for ‘abstracting’ the linguistically relevant traits from these data was by no means unknown (cf. Bühler 1932: 97, 1934: 14–15). Within structuralism, this tradition gained explicit methodological and theoretical status (‘distributionalism’). Thus the explicitly mentalistic turn of generative grammar, which claimed the priority of explanatory over descriptive goals and of introspective over corpus data, was bound to inspire a heated debate concerning the status of linguistics as an empirical science in general and the nature of proper linguistic evidence in particular. This debate, however, died down after the seventies with virtually no consequences for linguistic practice: generative linguists continued to rely more or less on introspective data gained in rather informal ways, while non-generative linguists continued to rely more or less on corpus data that were often just as informally obtained. In recent times, this has begun to change. Regarding the use of introspective data, an important turning point was the book by Schütze (1996), who was the first to argue forcefully for a systematic approach to the collection of speaker judgements. Since then, many authors have followed his lead and shown in various ways the necessity of controlling the many factors that influence speaker judgements in order to obtain more reliable data.
As a consequence, there is a growing awareness among generative linguists that it is imperative to collect introspective data in systematically controlled ways, and moreover useful to complement them with data from other sources, both of which increasingly influence their linguistic practice. Regarding corpus data, the importance of this source of evidence has grown significantly since about the mid-nineties, when really large amounts of language data of many types became electronically available and easily accessible for the first time. Frequently, these data were annotated in linguistically relevant ways, which made these sources even more valuable. At the same time, computational linguists developed methods of accessing and evaluating these corpora. Consequently, linguists now have access to corpora that are several orders of magnitude larger than before, and the size and number of such corpora are still growing rapidly. Hence the renaissance of corpus linguistics observed since the nineties is by no means a coincidence. Both developments, by dispelling mutual reservations concerning the solidity and practicability of method, have also paved the way for a rapprochement between introspective and corpus linguists, as evidenced by several recent publications in which the question of what should count as linguistic evidence is discussed from either perspective, on the whole opting for the use of corpus as well as introspective evidence (see, e.g., the recent special issues of Lingua (Borsley 2005a) and Studies in Language (Penka and Rosenbach 2004)). But an astonishing number of participants in the discussion are still trying to argue that one of these types of linguistic evidence is generally and significantly superior to the other (see, e.g., Lehmann (2004) and Borsley (2005b)). It is one of the main aims of this volume to overcome the opposition between corpus data and introspective data and to argue for a view that values and employs different types of linguistic evidence, each in its own right. Evidence involving different domains of data will shed different, but altogether more, light on the issues under investigation, be it that the various findings support each other or help with the correct interpretation, or that, by contradicting each other, they point to factors of influence so far overlooked. This ties in naturally with the fact we started out with, namely that there are more domains and sources of evidence to be taken into account than just corpus data and introspective data. These insights may sound simple, but, unfortunately, a look into the discussion on evidence in linguistics shows that they are not generally accepted. Apparently, it is not so much the origin of evidence that counts. What is more important is adequacy and the status of the data as true ‘evidence’. Adequacy means that the data put forward to support a certain claim actually do so. This can only be decided on an individual level, i.e., for the particular linguistic problem in question. It is therefore of no concern to us here.
Whether certain data can be regarded as true evidence touches on the key questions of reliability and reproducibility of data. Reproducibility is a basic requirement in all areas of science if data are to be considered true evidence for something. Typical counterexamples are example sentences held to be (un)acceptable by virtue of the linguist’s own judgement only (especially if fortified by the belief in individual ‘dialects’), or a single occurrence of a construction found on the World Wide Web (which some regard as the largest accessible corpus) quoted as support for this construction’s grammatical existence. Reliability encompasses reproducibility, but requires more: a proper analysis and control of the factors that influence the constitution of the data are necessary as well. With reproducibility and reliability secured, data can be fruitfully used as evidence for strengthening or refuting hypotheses. The contributions to the present book are examples of how this can be done in linguistic practice. An important aspect of this book, and a consequence of what we pointed out at the outset about the theoretical underpinnings of issues of linguistic evidence, is the absence of purely abstract discussions of methodology. Rather, all issues concerning linguistic evidence taken up in the various contributions are addressed in relation to specific linguistic research problems. The main reason for this is our belief that it is only with respect to concrete problems that the quality of the method and of the various types of evidence brought to bear on them can be evaluated. Apart from that, it is simply more convincing to see how using different types of evidence and different methods of obtaining it may in fact further our understanding of such concrete problems. It stands to reason, then, that a volume on ‘Linguistic Evidence’ should cover a wide range of data types (and methods for turning data into evidence) applied to an equally wide range of linguistic phenomena. The present volume does so. As for data types, many sources of evidence come into play: corpus data, introspective data, psycholinguistic data, data from computational linguistics, language acquisition data, data from historical linguistics, and sign language data. In several contributions, different data types are comparatively evaluated, which yields particularly insightful results. What is remarkably absent is any quarrel about the status of introspective vs. corpus data; both are recognised throughout as equally valid sources of evidence. We take this as a hopeful sign that the longstanding but fruitless either-or confrontation of these data types will finally be overcome. Different ways of gaining linguistic evidence are also well represented in this volume, papers applying or exploring psycholinguistic methods forming perhaps the largest group.
A good part of these papers is concerned with experimental data from language processing, exploring systematic ways of measuring and interpreting these data. But there are also papers exploring methods for collecting reliable as well as reproducible grammaticality judgements. These data types and methods are applied insightfully to phenomena from such diverse areas as syntax, semantics, phonology, morphology, psycholinguistics, historical linguistics, language acquisition, corpus linguistics, computational linguistics, and patholinguistics. For books, such diversity of topics is not always a virtue. But in this case, it serves to underline the fundamental importance that issues of linguistic evidence have for all fields of linguistics. It also indicates that awareness of these issues has by now reached almost all of these fields. The present book is based on the conference Linguistic Evidence. Empirical, Theoretical, and Computational Perspectives, which took place in Tübingen, January 29 – February 1, 2004. It was organised by the Collaborative Research Centre (SFB) 441 “Linguistic Data Structures. On the Relation between Data and Theory in Linguistics” at the University of Tübingen, which has supported in-depth studies of linguistic evidence in all its aspects since 1999. The contributions to this volume are elaborated versions of the conference presentations, plus a paper by H. Weiß designed to complement the historical section. Unfortunately, four papers presented at the conference were not submitted for publication. The editors of this volume wish to express their gratitude to the members of the Collaborative Research Centre (SFB) 441 on Linguistic Data Structures at the University of Tübingen for many interesting discussions on key issues of evidence in linguistics, and for their vigorous support when organising the above-mentioned conference. In this regard we owe particular thanks to Sam Featherston, Beate Starke, and Dirk Wiebel. We also want to thank the members of the conference programme committee for their excellent work. When preparing the present volume we again received generous support from many, to whom we are very grateful. In particular, we wish to thank the colleagues who reviewed the papers for publication for their extremely useful comments and criticisms, and the group of helpers without whom editing this volume might have become a mission impossible: Iris Banholzer, Ansgar Höckh, Chris Sapp, and Bettina Zeisler. We are also grateful to the German Science Foundation (DFG) for their generous support of the Collaborative Research Centre 441 and of the conference on Linguistic Evidence.
Stephan Kepser and Marga Reis
November 2005
References
Borsley, Robert D. (ed.) 2005a Data in Theoretical Linguistics, volume 115(11) of Lingua.
Borsley, Robert D. 2005b Introduction. Lingua, 115(11): 1475–1480.
Bühler, Karl 1932 Das Ganze der Sprachtheorie, ihr Aufbau und ihre Teile. In Bericht über den XII. Kongreß der Deutschen Gesellschaft für Psychologie, pp. 95–122. Fischer, Jena.
Bühler, Karl 1934 Sprachtheorie. Die Darstellungsfunktion der Sprache. Fischer, Jena. 2nd edition Stuttgart, 1965.
Lehmann, Christian 2004 Data in linguistics. The Linguistic Review, 21: 175–210.
Penka, Martina and Anette Rosenbach 2004 What counts as evidence in linguistics. Studies in Language, 28(3): 480–526.
Schütze, Carson 1996 The Empirical Base of Linguistics: Grammaticality Judgments and Linguistic Methodology. University of Chicago Press.
Gradedness and Consistency in Grammaticality Judgments Aria Adli
1 The importance of graded grammaticality judgments: a case study of que → qui in French
The methodological issue of the unreliability of certain introspective data circulating in the syntactic literature has already been raised by several authors (e.g. Schütze 1996; Adli 2004). One particularly problematic phenomenon is that questionable judgments are sometimes quoted in theoretical studies without prior critical empirical verification, contributing to the formation of “myths” in the literature. One case is the que → qui ‘rule’ in French. This rule, which was introduced into the literature solely on the basis of uncontrolled introspective data, is not confirmed by an experimental study in which a controlled process of data collection is applied to a whole sample of test subjects and which makes use of a graded concept of grammaticality. The que → qui rule essentially states that an ECP violation can be avoided in French if qui is used instead of the usual complementizer que in sentences where a wh-phrase has been extracted from the subject position (see Perlmutter 1971; Kayne 1977). This rule rests on the empirical ‘premise’ that there should be a clear difference in grammaticality between (2a) and (2b) (all four sentences are taken from Hulk and Pollock 2001).
(1)
a. Quel livre crois-tu que les filles vont acheter.
   which book think-you COMPque the girls will buy
b. *Quel livre crois-tu qui les filles vont acheter.
   which book think-you COMPqui the girls will buy
(2)
a. *Quelles filles crois-tu que vont acheter ce livre-là.
   which girls think-you COMPque will buy that book-there
b. Quelles filles crois-tu qui vont acheter ce livre-là.
   which girls think-you COMPqui will buy that book-there
The que → qui rule has been an often-used argument in syntactic theorizing.1 The assumption is that this rule is a sort of loophole to avoid ungrammaticality, or in Pesetsky’s words (1982: 308): “Qui does not occur freely as a complementizer, but only ‘when needed’ to avoid an NIC violation. [...] In other words, qui is a form of que which provides an ‘escape hatch’ from the effects of the NIC.” Chomsky (1977) compares it with free deletion in COMP in English. Rizzi (1990; 1997) supports his assumptions concerning the agreement process in the COMP system with this rule. He states that in cases of felicitous subject extraction in French the agreeing complementizer is not null (Ø), but the overt form qui. He assumes that an ECP violation arises if the agreeing form does not occur and C is in what he considers the unmarked form, que. He further states that this rule is a morphological reflex of Spec-head agreement between a trace and the head of COMP. Therefore Rizzi (1990: 56) assumes:
(3) qui = que + Agr
Rizzi (1990) accounts for the ungrammaticality of the object extraction (1b) by assuming that Spec-head agreement requires the extracted element to occupy a C-adjacent position. Furthermore, Rizzi (1990) assumes that the que → qui rule only applies when agreement holds between C0 on the one hand and both its specifier and its complement on the other. (Such double agreement had already been described for Bavarian German by Bayer 1984, for sentences like Wenn-st du kumm-st.) The result would be as shown in (4): t’ agrees with C0, t with I0 and, due to the identity of t and t’, C0 with the maximal projection of I0 (by transitivity).
(4) [t’ C0 [ t I0 ...
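The categorical predictions this account makes for (1) and (2) can be restated as a small decision procedure. The Python snippet below is a hypothetical toy encoding, not Rizzi's formalism: it assumes, as summarised above, that Agr on C arises only under subject extraction and that qui spells out que + Agr; the function names are invented for illustration.

```python
# Hypothetical toy encoding of the que -> qui predictions summarised above.
# Assumption: Agr on C arises only when the extracted element is C-adjacent,
# i.e. under subject extraction, and qui = que + Agr (cf. Rizzi 1990: 56).

def spell_out_comp(agr: bool) -> str:
    """Spell out C: 'qui' if C carries Agr, otherwise the unmarked 'que'."""
    return "qui" if agr else "que"


def predicted_grammatical(extraction_site: str, complementizer: str) -> bool:
    """Return True if the rule predicts the long extraction to be well-formed."""
    if extraction_site not in ("subject", "object"):
        raise ValueError("extraction_site must be 'subject' or 'object'")
    agr = extraction_site == "subject"
    return complementizer == spell_out_comp(agr)


# The four sentences in (1) and (2): (extraction site, complementizer).
EXAMPLES = {
    "(1a)": ("object", "que"),
    "(1b)": ("object", "qui"),
    "(2a)": ("subject", "que"),
    "(2b)": ("subject", "qui"),
}

for label, (site, comp) in EXAMPLES.items():
    mark = "" if predicted_grammatical(site, comp) else "*"
    print(f"{label} {mark}{site} extraction over {comp}")
```

Run as is, it stars (1b) and (2a) and nothing else, i.e. exactly the categorical contrast whose empirical basis the experiment reported here puts to a graded test.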
One aim of this paper is to test this assumption in an experimentally controlled process of data collection using a graded concept of grammaticality. Such a graded concept is assumed in Chomsky (1964), but it is already abandoned in Chomsky (1965) in favour of a distinction between grammaticality and acceptability. However, a rather pre-theoretic concept of gradedness persists in the syntactic literature, sometimes tacitly through the use of symbols like “?”, “??”, etc. Furthermore, some principles even make theoretical predictions in line with a graded concept (e.g. ECP vs. subjacency violations).
In order to measure graded grammaticality judgments, an instrument based on the principle of graphic rating (cf. Guilford 1954: 270; Taylor and Parker 1964) was developed. Part of the design is an extensive instruction and training phase. Judgments are expressed by drawing a line on a bipolar scale (and not by marking one of several boxes with a cross). Within the limits of a person’s capacity for differential judgment, a theoretically infinite number of gradations is therefore possible. The test was presented in an A4 ring binder containing two horizontally turned A5 sheets (see Figure 1).
Figure 1. Layout of the test binder: upper sheet (“comparaison”) with the reference sentence, lower sheet (“Jugement”) with the experimental sentence and its graphic rating scale.
The upper sheet contained the reference sentence, the lower sheet the experimental sentence. The sentence, with the graphic rating scale under it, was printed in the middle of each sheet. After the subject had rated the experimental sentence on the lower sheet, he or she turned this page to go on with the next sentence. The upper sheet with the reference sentence was not turned and remained visible during the whole test. The judgments were given relative to the reference sentence judged at the beginning by the subject himself, within both endpoints (obviously well-formed and obviously ungrammatical) given by the design. It was, therefore, a bipolar, anchored rating scale with the characteristic that the subjects choose the anchor themselves. The reference sentence was a suboptimal, but not extremely ungrammatical, sentence. The dependent variable was the difference between the judgment of a particular sentence and the judgment of the reference sentence. The test started, after the presentation of written instructions, with an interactive instruction and training phase of about 10 to 15 minutes. During this phase, two main concepts were introduced in a 9-step procedure: isolated grammaticality and gradedness (cf. Adli 2004: 85–88 for details). A pre-test revealed the importance of such an additional training phase. Although this was not immediately apparent, the concept of grammaticality was often confounded with extra-grammatical factors (e.g., the plausibility of the situation described by the sentence). Understanding the concept of isolated grammaticality is necessary to reduce interference from semantic and pragmatic effects. Furthermore, subjects had to replace the common distinction between grammatical and ungrammatical, or "good" and "bad", sentences with a truly graded notion of grammaticality. They were introduced to these two main concepts, among other things, by rating different training sentences and by explaining the reasons for their ratings to the experimenter, who could therefore adapt the instructions to each subject’s level of understanding. After instruction and training, the experimenter left the room. Given that reliability can generally be improved by the use of several items, each syntactic structure was presented in 4 lexical variants. Since the use of experimental methods in grammar research is recent and not much experience exists yet, evaluating the instrument with regard to its reliability is important.
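The dependent variable described above is each rating minus the subject's own rating of the reference sentence. A minimal Python sketch, assuming the line drawn on the scale has already been converted into a number (higher meaning closer to the well-formed endpoint; both the conversion and the sign convention are assumptions here), with hypothetical ratings for the 4 lexical variants of one structure:

```python
from statistics import mean


def difference_scores(ratings: list[float], reference: float) -> list[float]:
    """Express each rating relative to the subject's own reference rating.

    Positive values: judged better than the reference sentence;
    negative values: judged worse (sign convention assumed).
    """
    return [r - reference for r in ratings]


# Hypothetical data for one subject: the reference sentence and the four
# lexical variants of one syntactic structure (arbitrary scale units).
reference_rating = 10.0
variant_ratings = [25.0, 31.0, 18.0, 22.0]

diffs = difference_scores(variant_ratings, reference_rating)
print(diffs)        # [15.0, 21.0, 8.0, 12.0]
print(mean(diffs))  # 14.0
```

Because each subject anchors the scale on his or her own reference judgment, it is these differences, not the raw positions on the scale, that enter the analysis.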
A reliability analysis indicates the limits of an instrument concerning the precision of its measurements. Furthermore, the only three studies on the reliability of experimentally collected, graded grammaticality judgments I know of, namely Bard, Robertson and Sorace (1996: 61), Cowart (1997: 23) and Keller (2000: 215), rely on erroneous or improper calculations.2 Reliability is evaluated by Cronbach’s α, which is a measure of internal consistency (see Cronbach 1951). It indicates the consistency between the different lexical variants of a sentence without taking into consideration mean differences between the variants. Indeed, the reliability of the measurements turned out to be sufficiently high (Cronbach’s α = 0.85).
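For readers who want to recompute such a value: for k items, Cronbach's α is k/(k - 1) times (1 minus the sum of the item variances divided by the variance of the summed scores). The Python sketch below treats the four lexical variants as the items and each row as one case; how cases were defined in the original analysis is not stated here, so the matrix layout and the data are assumptions.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a cases-by-items matrix.

    Each row is one case; each column is one item (here: one lexical
    variant of the same syntactic structure).
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two cases")
    items = list(zip(*scores))  # transpose: one tuple per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical difference scores: 5 cases x 4 lexical variants.
data = [
    [15.0, 21.0, 8.0, 12.0],
    [-5.0, -2.0, -9.0, -4.0],
    [30.0, 28.0, 22.0, 35.0],
    [2.0, 6.0, -1.0, 3.0],
    [-20.0, -14.0, -25.0, -18.0],
]
print(round(cronbach_alpha(data), 3))
```

Adding a constant to one column leaves every item variance and the variance of the totals unchanged, so α ignores mean differences between the variants, which is the property noted above.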
Seventy-eight French native speakers participated in the experiment. Validity was ensured by means of a special index (called violation of trivial judgments), reflecting the subject’s capability to give graded grammaticality judgments (cf. Adli 2004: 89–91). By means of this criterion, those subjects who were deemed unable to perform this task could be identified and excluded; the data of 65 subjects could be used for the subsequent statistical analyses. Given that the measure of graded grammaticality does not reflect the categorical distinction between well-formed and ill-formed sentences, and given that such information is still important for theory-internal reasons, grammatical as well as ungrammatical constructions were included in the test design in order to provide comparative scale points for the interpretation process. The experiment covered not only subject-initial and object-initial interrogatives with long extraction over que and/or qui: the clearly felicitous constructions (5a) and (5b) with the PP-parenthetical “d’après vous” and the sentences (6a) and (6b) with the expression “croyez-vous” in the position after the wh-phrase were also included (some aspects of their syntax are discussed in section 3; see Adli 2004 for full details).3
(5)
a. Quel appache, d'après vous, méconnaît les obstacles de l'hiver?
   which Appache according you ignores the difficulties of the winter
b. Quel animal, d'après vous, rôtissent les esquimaux de l'igloo?
   which animal according you grill the Eskimos of the Igloo
(6)
a. (?)Quel architecte, croyez-vous, conçoit les demeures du président?
   which architect think you designs the residences of the president
b. (?)Quel argent, croyez-vous, investissent les organisateurs du bal?
   which money think you invest the organisers of the ball
(7)
a. ??Quel ingénieur, pensez-vous, qui conçoit la fusée de l'Aérospatiale?
   which engineer think you quiCOMP designs the rocket of Aérospatiale
b. *Quel idiot, pensez-vous, que perd les clefs de la maison?
   which idiot think you queCOMP loses the keys of the house
c. ?Quel appel, pensez-vous, que reçoivent les policiers du quartier?
   which call think you queCOMP receive the police officers of the district
The data were analysed with a two-way repeated measures ANOVA (variable A: “d’après vous” / “croyez-vous” / “pensez-vous qu-”; variable B: subject / object). I took into consideration not only the significance level, but also the effect size of the differences (in terms of partial η², cf. Cohen 1973; see also Keren & Lewis 1979: 119). The hypothesis was tested at α = 5%, which approximately allows for α = β.4 In the following, only the results relevant to the que → qui issue will be given. In order to take all details of the results into account, a complete set of orthogonal simple effects was tested for the subject interrogatives (cf. Bortz 1999: 254), contrasting (i) (5a) vs. (6a), (ii) (7a) vs. (7b), and (iii) (5a) and (6a) vs. (7a) and (7b).
Figure 2. Graded grammaticality (y-axis from [-]grammatical to [+]grammatical, ticks from -60 to +60) of subject questions and object questions in the conditions “...d'après vous...”, “...croyez-vous...”, “...pensez-vous qui...” and “...pensez-vous que...”.
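The three contrasts tested for the subject questions form a complete orthogonal set for the four conditions: each weight vector sums to zero, every pair has a zero dot product, and there are 4 - 1 = 3 of them. The Python sketch below checks this and also spells out the usual definition of partial η², SS_effect / (SS_effect + SS_error); the weight vectors are a direct transcription of (i) to (iii), while the sums of squares in the usage line are hypothetical, not the study's values.

```python
from itertools import combinations

# Weights over the four subject-question conditions, in the order
# (5a) d'après vous, (6a) croyez-vous, (7a) pensez-vous qui, (7b) pensez-vous que.
CONTRASTS = {
    "(i)": (1, -1, 0, 0),     # (5a) vs. (6a)
    "(ii)": (0, 0, 1, -1),    # (7a) vs. (7b)
    "(iii)": (1, 1, -1, -1),  # (5a) and (6a) vs. (7a) and (7b)
}


def is_contrast(weights: tuple[int, ...]) -> bool:
    """A contrast's weights must sum to zero."""
    return sum(weights) == 0


def are_orthogonal(w1: tuple[int, ...], w2: tuple[int, ...]) -> bool:
    """With equal cell sizes, contrasts are orthogonal if their dot product is zero."""
    return sum(a * b for a, b in zip(w1, w2)) == 0


def partial_eta_squared(ss_effect: float, ss_error: float) -> float:
    """Effect size: effect sum of squares relative to effect plus error."""
    return ss_effect / (ss_effect + ss_error)


all_contrasts = all(is_contrast(w) for w in CONTRASTS.values())
all_orthogonal = all(
    are_orthogonal(w1, w2) for w1, w2 in combinations(CONTRASTS.values(), 2)
)
print(all_contrasts, all_orthogonal)     # True True
print(partial_eta_squared(20.0, 80.0))  # 0.2
```

Orthogonality is what makes the set "complete": the three single-degree-of-freedom tests together partition the variation among the four subject-question conditions without overlap.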
The results show a partial η² of 0.183 (p or ), and that linguistic scales give rise to so-called scalar implicatures. Linguistic scales have two important properties. Logically, the truth of a stronger term always entails the truth of all weaker terms on a given scale; therefore, a sentence with might is true in all situations in which a corresponding sentence with must is true. Pragmatically, however, when a speaker uses a weaker term of a given scale, s/he normally denies that any of the stronger terms of the scale would describe the situation appropriately. That is, the use of might by a speaker implicates that the use of must would be inappropriate, even if, from a purely semantic point of view, both terms could be used to describe the situation. To sum up: the use of might implicates not must. Children who systematically ignore scalar implicatures will treat weak scalar terms like might logically. This will lead them to judge statements with might to be acceptable in situations that they know to be determinate and that normally require must. Noveck (2001) examined this possibility in three experiments. His first two experiments were designed to evaluate how children from 5;0 to 9;0 judge the appropriateness of sentences with might. His results suggest that even 9-year-olds accept sentences with might in situations that normally require must more often than adults do. In a follow-up experiment (not with the same children), he observed exactly the same type of reaction with the French quantifier certains ‘some’, which is also a weak scalar term. From this, Noveck (2001) concludes that children are more likely than adults to accept under-informative statements.
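The difference between the logical and the pragmatic reading of might can be made concrete with a toy possible-worlds model. In the hypothetical Python sketch below, a proposition is the set of worlds in which it is true and the hearer's information state is the non-empty set of worlds compatible with what he or she knows; might and must are modelled as existential and universal quantification over that set, and the enriched reading adds the implicature not must. The world names and the box scenario are invented for illustration.

```python
def _check(accessible: set[str]) -> None:
    # An empty information state would make "must p" vacuously true
    # while "might p" is false, so it is excluded here.
    if not accessible:
        raise ValueError("information state must be non-empty")


def might(p: set[str], accessible: set[str]) -> bool:
    """Logical reading: p holds in at least one accessible world."""
    _check(accessible)
    return bool(p & accessible)


def must(p: set[str], accessible: set[str]) -> bool:
    """Logical reading: p holds in every accessible world."""
    _check(accessible)
    return accessible <= p


def might_enriched(p: set[str], accessible: set[str]) -> bool:
    """Pragmatic reading: 'might p' plus the scalar implicature 'not must p'."""
    return might(p, accessible) and not must(p, accessible)


in_box_a = {"w_box_a"}  # worlds in which the object is in box A

known = {"w_box_a"}              # determinate: it is known to be in box A
unsure = {"w_box_a", "w_box_b"}  # indeterminate: box A or box B

print(might(in_box_a, known), must(in_box_a, known), might_enriched(in_box_a, known))
# True True False
print(might(in_box_a, unsure), must(in_box_a, unsure), might_enriched(in_box_a, unsure))
# True False True
```

In the determinate situation the logical reading accepts might while the enriched reading rejects it, which mirrors the difference between the children's and the adults' responses described above; for any non-empty information state, must p entails might p, as the scale requires.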
The results of the experimental studies presented above do not allow us to decide clearly whether children's late understanding of weak epistemic terms is due to a cognitive inability to understand epistemic uncertainty or to an inability to recognise scalar implicatures in the same way as adults. The typical reaction of the children tested in these studies can be taken as evidence for either of the hypotheses under review. What is needed, therefore, is further experimental research that tests both hypotheses. In fact, the only
126 Serge Doitchinov way to examine the role of both hypotheses in the acquisition of epistemic terms is to have the same sample of children carry out a set of tasks that allow us to assess their cognitive as well as their linguistic abilities. This is the aim of the two experiments reported below.

2 Experiment 1

The experiment consists of three tasks: (i) the modal expression task (ME task) investigates the children’s ability to understand weak epistemic expressions correctly; (ii) the implicature task was designed to assess the children’s understanding of scalar implicatures; and (iii) the inference task examines their ability to deal with epistemic uncertainty. In order to gain better insight into the relationship between the three abilities tested here, all three tasks were performed by the same sample of children. All tasks used the picture selection paradigm; the same testing method was used throughout to reduce potential biases caused by differing degrees of task complexity.

2.1 Participants
Eighteen 6-year-olds (from 5;7 to 6;6, mean age 6;2), twenty-eight 7-year-olds (from 6;7 to 7;6, mean age 7;1), twenty-eight 8-year-olds (from 7;7 to 8;6, mean age 8;1) and ten adults, all monolingual speakers of German, participated in the experiment. The children were recruited from elementary schools and kindergartens in Stuttgart and Tübingen (Germany). The adults were students at the University of Tübingen.

2.2 Method and material

2.2.1 The ME task

In the ME task, the children’s understanding of the German epistemic modals können ‘may’ and vielleicht ‘maybe/perhaps’ was tested. The following sentence types were used:

(1) a. Es kann sein, dass der Junge im Haus ist.
       it can be that the boy in-the house is
       ‘It may be the case that the boy is in the house.’
Why Do Children Fail to Understand Weak Epistemic Terms? 127
Figure 1a. The “certainty story”
Figure 1b. The “uncertainty story”
Figure 1. Picture stories presented with the sentence in (1a).
    b. Der Junge ist vielleicht im Haus.
       the boy is maybe in-the house
       ‘Maybe the boy is in the house.’
The children were tested three times with each sentence type, so six different sentences with können and vielleicht were used in this task. Following Doitchinov’s (2001) design, the children were presented with two picture stories at the same time. Each story consisted of two pictures (see Figure 1). The first story showed a boy performing some action. The actions shown were always of the same type: a boy moving from one location to another. At the end of the story, the boy could always be seen at the place described in the input sentence. This type of story was called the “certainty story” (see Figure 1a). The second story (the “uncertainty story”, see Figure 1b) always had the same beginning as the corresponding “certainty story”. The only difference was that at the end of the “uncertainty story” the location of the boy was left uncertain (i.e. nobody could be seen in the second picture of the “uncertainty story”). To prevent the participants from confusing the two stories, one showed a boy with blond hair and the other a boy with brown hair. The factors [blond/brown], [certain/uncertain] and [left/right position in front of the participant] were randomised throughout the trials.

The main test was preceded by a short warm-up session, in which the adults did not take part. The goal of this session was to ensure that the participants understood that the two pictures belonged to the same story. At the beginning of the warm-up session, the investigator told the participant that he wanted to show him/her a picture story about a boy (“look, this is the story of a boy with blond/brown hair!”) and showed a two-picture story similar to the “certainty story”. After the participant had been given enough time to look at the pictures, the investigator asked him/her to tell the story. Participants who were obviously unable to recognise that the two pictures formed one and the same story, or that the story was about one and the same boy, were excluded from the main test. At the beginning of the main test, the investigator told the participant that he wanted to play a little quiz game with him/her: “Here are two different picture stories. As you can see in the two pictures here, the first story is about a boy with blond/brown hair. And, as you can see in the two pictures here, the second story is about a boy with brown/blond hair. Please look at the stories carefully! Tell me when you are ready, and then I will ask you a question and you will have to guess which story is the right one. If you listen carefully to my question, you will be able to guess correctly.” At the beginning of each trial, the investigator waited until the participant said that s/he was ready. Then the investigator asked the following question: “Von welcher Geschichte spreche ich, wenn ich sage: INPUT SENTENCE?” (‘Which story am I talking about when I say ...?’). The participants were asked not only to choose one story but also to justify their choice.
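The text does not say how the three factors were randomised. The following is a minimal sketch under stated assumptions: an independent coin flip per trial for the hair colour and the side of the “certainty story”, plus a shuffled trial order. The trial labels and the names `MeTrial` and `build_me_trials` are hypothetical. Independent coin flips do not guarantee balanced counts, so an actual design might prefer explicit counterbalancing.

```python
import random
from dataclasses import dataclass


@dataclass
class MeTrial:
    sentence: str
    certainty_hair: str  # hair colour of the boy in the "certainty story"
    certainty_side: str  # side on which the "certainty story" lies

    @property
    def uncertainty_hair(self) -> str:
        # The "uncertainty story" always shows the other hair colour.
        return "brown" if self.certainty_hair == "blond" else "blond"

    @property
    def uncertainty_side(self) -> str:
        # ...and always lies on the other side.
        return "right" if self.certainty_side == "left" else "left"


def build_me_trials(sentences, seed=None):
    """Hypothetical randomisation: coin flips per trial, then a shuffled order."""
    rng = random.Random(seed)
    trials = [
        MeTrial(s, rng.choice(["blond", "brown"]), rng.choice(["left", "right"]))
        for s in sentences
    ]
    rng.shuffle(trials)  # randomise the order of the trials as well
    return trials


# Three können and three vielleicht trials (labels are placeholders).
sentences = [f"können-{i}" for i in (1, 2, 3)] + [f"vielleicht-{i}" for i in (1, 2, 3)]
for t in build_me_trials(sentences, seed=1):
    print(t.sentence, t.certainty_hair, t.certainty_side,
          "| uncertainty:", t.uncertainty_hair, t.uncertainty_side)
```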
A similar task (but with different weak epistemic sentences) was conducted in Doitchinov (2001). Its results showed that the children who correctly understood weak epistemic terms chose the “uncertainty story”. They argued either that they knew the boy was in the house in the “certainty story” or that nobody could be seen in the house at the end of the “uncertainty story”. Children who did not understand the weak epistemic terms correctly also behaved very consistently: they nearly always chose the “certainty story”, arguing that they could see the boy in the house. The same typical reactions were expected in the ME task of the present experiment.

2.2.2 The implicature task
The implicature task was designed to find out whether the participants preferred a pragmatic interpretation (i.e. one taking the scalar implicature into account) of the quantifier einige ‘some’ in a picture selection task. This task was very similar to the ME task. The participants were shown the beginning of a picture story whose ending was left open. The first picture always showed a group of five children who were about to perform some action (see Figure 2a). In each trial, the participants were shown two of the three possible outcomes of the story (see Figure 2b). The first possible outcome showed that only three children had performed the action at the end of the story (= the SOME case), the second that all children had performed the action (= the ALL case), and the third that none of the children had (= the NONE case). The understanding of sentences quantified with einige ‘some’, alle ‘all’ and kein ‘no’ was tested:

(2) Einige/alle/kein Kind(er) sind/ist im Boot.
    some/all/no child(ren) are/is in-the boat
    ‘Some/all/no child(ren) are/is in the boat.’

Figure 2a. Beginning of a story with an open end
Figure 2b. Three possible outcomes (panels: SOME case, ALL case, NONE case)
Figure 2. Picture story for the implicature task.
The rest of the procedure was similar to that of the ME task: the investigator introduced the first trial by saying: “Here is the beginning of a story with two pictures. The story is about a group of five children. But we don’t know the end of the story yet. Here are two possible outcomes for the story. I will ask
you a question, and you will have to guess which one of the two pictures is the right ending of the story.” After the participant had been given enough time to look at the pictures, the investigator asked the following question: “Welches ist das richtige Ende der Geschichte, wenn ich sage: INPUT SENTENCE?” (‘Which is the right ending of the story when I say ...?’). Subsequent trials were simply introduced as a new quiz of the same type. Three different stories were used, and six different combinations of input sentences and possible outcomes were each tested three times. In the critical trials, the participants were given a sentence with einige in combination with the SOME and the ALL cases. The order of the trials was randomised, but the three critical trials were always performed before the three trials with alle, in order to avoid the following bias: once an alle sentence had been associated with the ALL case, all following sentences with einige might be associated with the SOME case merely by contrast. The task was designed to find out whether children prefer the under-informative semantic reading (ALL case) or the more accurate pragmatic reading (SOME case) of einige in the critical trials. Because the ME and the implicature tasks are very similar, comparing their results makes it possible to assess whether a reluctance to take into account the scalar implicature with können/vielleicht is responsible for the choice of the “certainty story” in the ME task, as predicted by the implicature-based hypothesis. If this were the case, one would expect children who prefer the “certainty story” in the ME task to favour the ALL case in the critical trials of the implicature task.

2.2.3 The inference task
The goal of the inference task was to assess the children’s ability to deal with the concept of epistemic uncertainty. The task was a modified version of Somerville et al.’s (1979) first experiment and was introduced as a quiz game. The participants were shown a picture of a child with three toys (see the pictures on the left in Figure 3). They were told that this child always forgets one of his/her toys at the door of his/her house when going home. A second picture showed two houses, each with a toy at its door (see the pictures on the right in Figure 3). The investigator asked the participant the following question: “Kannst du mir sagen, in welchem Haus das Kind wohnt? Oder brauchst du Hilfe von mir?” (‘Can you tell me which house the child lives in? Or do you need help from me?’). Three possible cases were taken into account. In case A, the problem was determinate, because only one toy in front of the houses could belong to the child (see Figure 3a). In cases B and C, the problem was indeterminate, because either none of the toys (case B) or both toys (case C) belonged to the child (see Figures 3b and 3c). Case B was called the inference_ne task, because it contained no evidence about the house of the child, and case C was called the inference_pe task, because in this case there was ‘too much’ positive evidence. Cases B and C were indeterminate problems, because the available information did not allow a clear decision. The participants were tested three times for each case and also had to justify their answers. It was expected that participants who understand epistemic uncertainty would ask for help in case B, because none of the toys in front of the houses matched the child’s toys, and also in case C, because both toys matched the child’s toys. According to the inference-based hypothesis, participants who pass the inference task should be more likely to succeed in the ME task.

Figure 3a. Determinate case
Figure 3b. Indeterminate case with neg./no evidence
Figure 3c. Indeterminate case with pos. evidence
Figure 3. Pictures for the inference task.
2.3 Procedure

The young participants were tested individually in a separate room at their kindergarten or elementary school. The experiment was conducted in two sessions of about 15 minutes each. In the first session, the participants performed the ME task. In the second session, they performed first the implicature task and then the inference task. About 40% filler trials were added to each task. The answers were tape-recorded.

2.4 Results
In the ME task, an answer was counted as correct if the participant chose the “uncertainty story” and gave an appropriate justification (i.e. rejecting the “certainty story” because one could see the boy, or choosing the “uncertainty story” because the boy could not be seen). The results for vielleicht and können were counted separately. In the implicature task, only the three critical trials were taken into account (no participant had any difficulty with the other trials). To be successful in these trials, the participant had to show a preference for the SOME case. In the inference_ne and inference_pe tasks, an answer was counted as correct if the participant showed that s/he was not willing to choose one of the houses. Each correct answer counted as 1/3 point, so that each participant reached a score between 0 and 1 for each task.

Table 1. Proportion of correct answers in Experiment 1 (mean scores on a 0 to 1 scale; standard deviations in parentheses).

Age      können       vielleicht   implicature   inference_ne   inference_pe
6        0.09 (.25)   0.12 (.19)   0.89 (.32)    0.47 (.43)     0.22 (.35)
7        0.37 (.47)   0.40 (.47)   0.93 (.26)    0.66 (.45)     0.48 (.50)
8        0.64 (.47)   0.69 (.44)   1.00 (.00)    0.73 (.42)     0.68 (.46)
Adults   1.00 (.00)   1.00 (.00)   1.00 (.00)    1.00 (.00)     1.00 (.00)
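The scoring rules described before Table 1 can be written out as a minimal sketch. The function names, the response labels and the example participant are hypothetical; only the rules themselves come from the text: an ME answer counts only if the “uncertainty story” is chosen with an appropriate justification, an inference answer counts when the participant declines to pick a house, and each correct trial is worth 1/3 point.

```python
from fractions import Fraction
from statistics import mean


def me_correct(chosen_story: str, justified: bool) -> bool:
    # ME task: the "uncertainty story" must be chosen AND appropriately justified.
    return chosen_story == "uncertainty" and justified


def inference_correct(response: str) -> bool:
    # inference_ne / inference_pe: correct if the participant declines to pick a house.
    return response == "ask for help"


def task_score(trial_results: list[bool]) -> Fraction:
    # Three scored trials per task, each worth 1/3 point -> score between 0 and 1.
    if len(trial_results) != 3:
        raise ValueError("expected exactly three scored trials")
    return Fraction(sum(trial_results), 3)


# Hypothetical participant: two justified "uncertainty" choices, one "certainty" choice.
me_trials = [
    me_correct("uncertainty", True),
    me_correct("uncertainty", True),
    me_correct("certainty", False),
]
print(task_score(me_trials))  # 2/3

# A group mean (as reported in Table 1) is the mean of the individual scores.
print(mean([task_score(me_trials), task_score([False, False, False])]))  # 1/3
```

Exact fractions are used so that scores of 1/3 and 2/3 are not rounded before averaging.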
Table 1 shows the mean scores of correct answers for each task and group. A 2 (SEX) × 4 (AGE: 6, 7, 8, adult) MANOVA was conducted. For all tasks, the MANOVA showed no effect of SEX (F(1, 82) = 0.29, p = .59 for können; F(1, 82) = 0.09, p = .77 for vielleicht; F(1, 82) = 3.55, p = .06 for implicature; F(1, 82) = 1.33, p = .25 for inference_ne; and F(1, 82) = 0.35, p = .56 for inference_pe). By contrast, the analysis showed a significant effect of AGE in the ME task (F(3, 80) = 13.42, p < .001 for können and F(3, 80) = 14.11, p < .001 for vielleicht), in the inference_ne task (F(3, 80) = 3.42, p = .02) and in the inference_pe task (F(3, 80) = 8.27, p < .001). No effect of AGE was observed in the implicature task (F(3, 80) = 1.50, p = .22). Post-hoc Dunnett C tests (α = .05, two-tailed) for AGE were conducted for all tasks except the implicature task. These tests showed that the adults performed significantly better than the children in all tasks. In the ME task, the 8-year-olds understood both epistemic terms significantly better than the 6-year-olds; no other significant difference between the groups of children was observed. In the inference_ne task, there was no significant difference between the groups of children. In the inference_pe task, the Dunnett C test showed the same result as in the ME task: the 8-year-olds solved this task significantly better than the 6-year-olds, and no other significant differences between the groups of children were found. Wilcoxon tests (α = .05, two-tailed) were performed to assess whether the children understood vielleicht better than können, and whether they performed better in the inference tasks than in the ME task. Only one significant difference emerged: the 6- and 7-year-olds performed significantly better in the inference_ne task than in the ME task (6-year-olds: z = −2.76, p = .006 between vielleicht and inference_ne; z = −2.85, p = .004 between können and inference_ne. 7-year-olds: z = −2.6, p = .009 between vielleicht and inference_ne; z = −2.52, p = .012 between können and inference_ne). More important for testing the validity of the implicature-based and the inference-based hypotheses are the correlations between the results of the different tasks. Partial correlations controlling for AGE were computed on the children’s data only; the adults’ results were not taken into account.
Table 2 shows the size and significance of the partial correlations. No significant correlations were observed between the ME and the implicature tasks. By contrast, there are highly significant correlations between the children’s comprehension of the two epistemic terms in the ME task and their performance in judging epistemic uncertainty in both inference tasks. However, the correlations with the inference_pe task are somewhat higher than those with the inference_ne task. Not surprisingly, the two inference tasks correlate with each other. It is also worth noting that the highest correlation (0.91) is between the results for the two epistemic terms.
Table 2. Size and significance of partial correlations (controlling for AGE) in Experiment 1.

               können   vielleicht   implicature   inference_ne   inference_pe
können         –        0.91*        0.03          0.43*          0.52*
vielleicht     0.91*    –            0.00          0.43*          0.58*
implicature    0.03     0.00         –             −0.18          −0.01
inference_ne   0.43*    0.43*        −0.18         –              0.59*
inference_pe   0.52*    0.58*        −0.01         0.59*          –
* p < .001

2.5 Discussion
The results of the ME task indicate that an important developmental change toward full mastery of the meaning of epistemic terms takes place between 6 and 8 years of age. The 6-year-olds answered correctly only 9% (können) and 12% (vielleicht) of the time. Most of the time, the 6-year-olds chose the “certainty story” rather than the “uncertainty story”. This shows that children of this age are not yet able to understand reliably that vielleicht and epistemic können are weak epistemic terms. By contrast, the 8-year-olds were already quite confident in dealing with these terms: the children of this group mastered the ME task about two-thirds of the time, i.e. about six times as often as the youngest group. These results suggest that the most important part of the development of the understanding of weak epistemic terms occurs during this two-year period. Nevertheless, the results also suggest that even the 8-year-olds do not achieve the same competence in dealing with these terms as adults. In fact, the significant difference between the adults’ and the 8-year-olds’ performance cannot be explained solely in terms of stress in an experimental situation or difficulty in staying concentrated during the task. If this were the case, one would expect nearly all 8-year-olds to give exactly two correct answers out of three (which would lead to an average of exactly 2/3 correct answers, i.e. roughly the results observed). However, the analysis of the data of the ME task shows that about 30% of the 8-year-olds gave zero or only one correct answer. This means that about 30% of the 8-year-olds were still very poor at understanding weak epistemic terms. The justifications given by the participants for their choices in the ME task also support these claims. The typical reaction of the 6-year-olds in this task was to choose the “certainty story” and to justify this choice by
explaining that they could see the boy in the house at the end of the story. This reaction clearly shows that children of this age accept a weak epistemic statement as compatible with a situation that should be described with a strong epistemic statement or a simple assertion. In contrast, the 7- and 8-year-olds who preferred the “uncertainty story” gave two types of justifications: some of the children argued that the “certainty story” was inadequate because they could see the boy looking out of the window, and the rest argued that the “uncertainty story” was the right one because the boy had completely disappeared. Both the wrong justifications of the younger children and the right ones of the older groups indicate that the children’s choice of story was motivated not by random guessing but by their knowledge about the location of the boy. To sum up, the results of the ME task show that 6- and 7-year-olds (and to some extent even 8-year-olds) are more likely than adults to accept a weak epistemic term in a situation that normally requires the use of a stronger expression. However, the main goal of this study was to investigate whether this behaviour of young children is due to their inability to recognise epistemic uncertainty, as predicted by the inference-based hypothesis, or to a tendency to ignore scalar implicatures, as predicted by the implicature-based hypothesis. The following discussion of the results of the inference task and the implicature task will show that the children’s ability to solve the ME task is correlated only with their ability to recognise uncertainty, and not with a reluctance to take scalar implicatures into account. The results of the inference task show the same developmental trend that was already observed in the ME task. In general, the children of the two younger groups were not able to suspend their judgements when it was impossible to decide which house was the right one.
Only the 8-year-olds showed a reliable competence in making such judgements of uncertainty independently of the type (negative vs. positive) of evidence they were offered, although about 30% of them were still unable to do so. It is also important to note that the children of all groups recognised indeterminacy more easily when the evidence was negative than when it was positive: this difference was 25% for the 6-year-olds, 18% for the 7-year-olds, but only 5% for the 8-year-olds. The better performance of the children in the inference_ne task compared to the inference_pe task is probably due to the fact that, in the inference_ne task, the children used a try-and-eliminate strategy that automatically led to the conclusion that none of the presented houses was a possible correct answer. Thus, in the inference_ne task, no comparison between the two possible outcomes was needed to lead the participants to the conclusion that they needed help from the investigator: the elimination of both houses gave the participants enough information to infer that there was no solution to the problem. By contrast, in the inference_pe task, such a try-and-eliminate strategy did not eliminate any of the possible answers, because neither of the proposed solutions turned out to be unsatisfactory. At the end of such a fruitless elimination process, the participants could follow one of two strategies: (i) choose one house at random, or (ii) compare the two possibilities. Only the second strategy could lead to a correct answer in the inference_pe task. So, unlike the inference_ne task, the inference_pe task required the participants to compare the two possible answers in order to reach the right conclusion, i.e. that the task was indeterminate. As previous research by Morris and Sloutsky (2002) has shown, young children have difficulty drawing such a comparison. Their data suggest that children simply pick the first possibility that fits, without checking whether the problem may have another possible solution. It is therefore probable that many children stopped the elimination process as soon as a possible solution was found, without checking whether there could be another one. Following this procedure, the children who recognised the toy in front of the first house concluded that it was an acceptable solution to the problem and skipped the examination of the second house. As a possible explanation for this behaviour, Morris and Sloutsky (2002, 924) suggested that children use a ‘cut’ strategy to simplify the complexity of problems (i.e. they transform an indeterminate problem into a determinate one).
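The contrast between the two inference tasks can be illustrated with a minimal sketch. It is a simplified model, not a claim about the children’s actual processing: the toy names, the two-house layout and the function names are hypothetical. The exhaustive strategy compares both houses before answering; the cut strategy is a try-and-eliminate search that stops at the first house whose toy fits, as in Morris and Sloutsky’s account.

```python
from typing import Sequence

ASK_FOR_HELP = "ask for help"


def exhaustive_strategy(door_toys: Sequence[str], child_toys: set[str]) -> str:
    """Examine every house; choose one only if exactly one door toy belongs to the child."""
    matching = [i for i, toy in enumerate(door_toys, start=1) if toy in child_toys]
    if len(matching) == 1:
        return f"house {matching[0]}"
    # No match (case B) or several matches (case C): the problem is indeterminate.
    return ASK_FOR_HELP


def cut_strategy(door_toys: Sequence[str], child_toys: set[str]) -> str:
    """Try-and-eliminate that stops at the first house whose toy fits (the 'cut')."""
    for i, toy in enumerate(door_toys, start=1):
        if toy in child_toys:
            return f"house {i}"  # the remaining houses are never examined
    return ASK_FOR_HELP  # every house was eliminated


child_toys = {"ball", "teddy", "car"}  # hypothetical toys of the child
cases = {
    "A (determinate)": ["ball", "kite"],
    "B (no evidence)": ["kite", "drum"],
    "C (positive evidence)": ["ball", "car"],
}
for name, doors in cases.items():
    print(name, "|", exhaustive_strategy(doors, child_toys), "|", cut_strategy(doors, child_toys))
```

Both strategies give the expected answer in cases A and B, but only the exhaustive comparison asks for help in case C, which mirrors the lower scores in the inference_pe task.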
It is likely that the same ‘cut’ procedure was used in the inference_pe task. In line with Morris and Sloutsky (2002), Pieraut-Le Bonniec (1980) found that young children often add irrelevant information to an unsolvable task in order to make the problem more determinate. Furthermore, the results of the ME and inference tasks indicate that the children’s abilities to recognise uncertainty and to understand expressions of epistemic uncertainty are interrelated. It is therefore probable that the participants who favoured the “certainty story” in the ME task did not realise that the “uncertainty story” was indeterminate. Basically, the “uncertainty story” can be interpreted in three ways: (i) one may decide that the boy is not in the house at the end of the story, (ii) one may decide that the boy is definitely in the house, or (iii) one may decide that both (i) and (ii) are possible. Naturally, only interpretation (iii) leads to an uncertainty judgement. It is probable that
many of the children who failed the ME task gave interpretation (i) to the “uncertainty story”, because this interpretation automatically eliminates the “uncertainty story” as a possible answer. (Weak epistemic terms are, in fact, logically compatible with the “certainty story”, but never with a story in which one can be sure that the boy is not in the house.) By contrast, interpretation (ii) of the “uncertainty story” did not really help them to solve the task, because in this case both stories are equally good answers. Although the results of this experiment suggest that the late understanding of epistemic terms is linked, to some extent, to the children’s ability to recognise epistemic uncertainty, one should be careful not to conclude that children under 8;0 are unable to deal with epistemicity. As many previous studies show, children’s ability to recognise uncertainty may vary considerably depending on the complexity of the task they have to perform (cf. Byrnes and Beilin 1991). It is therefore possible that the children in this experiment tended to choose one of the houses in the inference_ne and inference_pe tasks even if they had some feeling that they did not really know which one was right. They may have expected a priori that the task was solvable, and may therefore have felt insecure about admitting that they did not know how to solve it. For the same reason, it is also possible that, in the ME task, many children were simply more cautious than adults, because they did not feel so secure in an experimental setting. This could have led some children to accept the weak statements in combination with a situation that would normally require the use of must or certainly, although they would reject them in circumstances that fit better with their everyday experience.
Despite these possible objections, the main result of this experiment still suggests that children’s understanding of epistemic terms interacts with the way they are able to deal with their own ignorance about certain facts, and not with their ability to recognise scalar implicatures, as the following discussion of the results of the implicature task will show. Whereas, in the ME task, nearly all 6-year-olds, many 7-year-olds and some 8-year-olds demonstrated a strong preference for the “certainty story” (i.e. the only story that fits a logical interpretation of weak epistemic terms), nearly all children preferred the more informative pragmatic reading of einige in the implicature task. Accordingly, no significant correlation was observed between the results of the two tasks. If a reluctance to take scalar implicatures into account were the reason for the children’s failure to understand the epistemic terms in the ME task, one would expect exactly the same pattern of answers
in the implicature task. This possibility is clearly inconsistent with the data of the experiment. The analysis of the justifications given by the children in the critical trials of the implicature task strongly supports this claim. In nearly all cases, the children argued that the ALL case did not fit the einige sentences because the latter required that only part of the group had performed the action. This is exactly the type of justification one expects when the choice is made according to the requirement of the scalar implicature. However, one should be careful with the interpretation of the results of the implicature task. In this experiment, the implicature task served only to compare the children’s reactions to epistemic können and vielleicht, on the one hand, and to the weak scalar term einige, on the other. It was designed to test the preferred reading of einige, not to assess whether the children accept both the semantic and the pragmatic reading of the quantifier. In other words, this task was not designed to assess children’s ability to recognise scalar implicatures directly. Therefore, it cannot be ruled out that children who do not recognise scalar implicatures at all might also prefer to associate einige with the SOME case for other reasons. One such reason could be, for example, that einige occurs more often with its pragmatic meaning in everyday conversation. In order to find out whether the preferred pragmatic reading of einige in the implicature task was really the result of a scalar implicature, a second experiment was carried out.

3 Experiment 2

3.1 Participants, method, material and procedure
Four 6-year-olds (from 5;7 to 6;6, mean age 6;2), four 7-year-olds (from 6;7 to 7;6, mean age 7;2), and four 8-year-olds (from 7;7 to 8;6, mean age 8;3), all monolingual speakers of German, participated in the experiment. They were recruited from elementary schools and kindergartens in Stuttgart and Tübingen (Germany). To assess the children’s ability to recognise scalar implicatures, two tasks were set up. The first task was an exact repetition of the implicature task from the first experiment; it was called the picture selection task (PST). The second task was a truth value judgement task (TVJT), inspired by Chierchia et al. (1998). The advantage of the TVJT is that it allows the acceptance of sentences with einige to be tested in the ALL and the SOME cases
separately. In the TVJT, the participants were shown the following scenario, acted out with small puppets: a group of five Smurfs (Smurfs are comic figures well known to German children) were observed by a sixth Smurf (the observer) while performing some action. For example, three of the five Smurfs wanted to cross a road, but two of them did not. Three possible outcomes were used for this situation: (i) in the ALL case, the three Smurfs crossed the road and, after a long discussion, the two reluctant ones joined them; (ii) in the SOME case, the reluctant Smurfs refused to cross the road, while the other three went across; (iii) in the NONE case, no Smurf performed the action. Of course, only one of the possible outcomes was presented in each trial. At the end of the scene, the observer told the participant that he knew what had happened. Then he described the scene with one sentence of the same type as in (2). Afterwards, the participant was asked whether the observer’s description was correct or not. As in Experiment 1, each combination of input sentence and possible outcome was tested three times. Three different scenes were used in this experiment. As in Experiment 1, about 40% filler trials were added. In the critical trials, it was expected that all participants would accept the einige sentences in combination with the SOME case, but that only the participants who took the implicature into account would reject this type of sentence in combination with the ALL case. The goal of this experiment was to determine what correlation can be observed between the children’s ability to recognise the implicature in the TVJT and their preferred reading of weak scalar terms in the PST. The participants were tested individually in a separate room at their kindergartens or schools in a single session of about 20 minutes. The order of presentation of the tasks was varied within the different groups.

3.2 Results
Because the participants' answers were very homogeneous across trials (always 3/3 or 0/3 correct answers), no mean score of correct answers was computed; each participant was simply classified as successful or unsuccessful. In the PST, a participant was successful if s/he preferred the SOME case when the input sentence contained einige and the choice was between the SOME case and the ALL case (cf. the critical trials in the implicature task of Experiment 1). In the TVJT, a participant was successful if s/he accepted the einige sentence in the SOME case and rejected it in the ALL case.
140 Serge Doitchinov
Eleven children passed the PST. This largely replicates the finding of the first experiment that 6- to 8-year-olds strongly prefer the pragmatic reading of einige to the semantic reading. The same eleven children also succeeded in the TVJT, which shows that the great majority of the children reliably took the scalar implicature into account when judging the appropriateness of einige. Only one child, a 6-year-old, failed to judge the einige sentences correctly in the TVJT. Importantly, this child not only failed the TVJT, showing that he did not recognise the implicature, but also had a strong preference for the semantic reading of einige in the PST. Because every child either passed both tasks or failed both, the results of the two tasks are perfectly correlated (r = 1.0, p = .000); no further statistical analysis is needed to support this claim.
3.3 Discussion
To the extent that a group of 12 children can be representative, the results of this experiment lead to two conclusions. First, the results of the TVJT indicate that children of this age almost always recognise and make use of implicatures in appropriate contexts. Second, the strong correlation between the results of both tasks in Experiment 2 shows that children who do not recognise implicatures strongly favour the semantic reading of scalar terms over their pragmatic reading, and vice versa. These two conclusions are critical for the interpretation of the data from the first experiment. They strongly support the claim that children who preferred the pragmatic reading of einige in the implicature task of the first experiment did so because they made use of a scalar implicature to solve the task, and not just because they relied on the most habitual meaning of einige in everyday conversation. The results of this study in general, and those of the second experiment in particular, (apparently) contradict Noveck's (2001) proposal that a certain reluctance by children to draw implicatures may be responsible for the late understanding of weak epistemic terms. How can this discrepancy be explained? It is possible that Noveck's design has to some extent masked children's capacity to recognise scalar implicatures. In his third task, for example, children and adults had to judge whether a sentence like Certaines girafes ont un long cou ('Some giraffes have long necks') was felicitous or not. No further context was given, so the participants had to rely on their world knowledge to judge the appropriateness of the sentence. Because there is no natural context in the world that supports a pragmatic reading of this sentence, the participants were expected to reject it as soon as they recognised the implicature. His results show that 7-year-olds did not take the implicature into account (nor did the adults, in 41% of the trials). However, it is important to keep in mind that the participants in Noveck's experiment could follow two distinct strategies to solve the task. First, they might conclude that the sentence was pragmatically incorrect. Second, they might assume that the goal of the study was to test their ability to understand the logical properties of scalar terms; following this strategy, they would simply accept the sentences as semantically correct. The lack of a context that could fit a pragmatic interpretation of the input sentences may have led the participants to decide that what was being tested was their ability to understand the logical properties of certains. Moreover, the lack of such a context might have led the participants to give the quantifier an existential meaning. Since some is a weak quantifier, it is ambiguous between a quantificational and an existential reading. For example, a sentence like Some children went to school can mean that some particular children went to school (quantificational, e.g. Marc, Martha and Nina [but not Adam and Susanna]), or simply that there are children who went to school (existential). While the quantificational reading always gives rise to an implicature in appropriate contexts, the existential reading leaves it completely open whether all or only some of the children went to school; it only excludes the possibility that none of them did.
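The contrast between the readings can be stated as truth conditions over the Smurf scenes of Experiment 2. The following is a minimal sketch, not part of the original study: it assumes a simple counting model (five Smurfs; five cross in the ALL case, three in the SOME case, none in the NONE case), and all names are illustrative. The semantic reading, which is also what the existential reading leaves open, only requires that at least one Smurf crossed; the pragmatic reading adds the scalar implicature "not all".

```python
from enum import Enum


class Reading(Enum):
    SEMANTIC = "at least one"       # logical reading; what the existential reading leaves open
    PRAGMATIC = "some but not all"  # reading enriched by the scalar implicature


# Hypothetical counting model: Smurfs (out of five) that crossed the road per outcome.
OUTCOMES = {"ALL": 5, "SOME": 3, "NONE": 0}
TOTAL = 5


def einige_true(crossed: int, total: int, reading: Reading) -> bool:
    """Truth value of 'einige (some) Smurfs crossed the road' under a given reading."""
    if reading is Reading.SEMANTIC:
        return crossed >= 1
    return 1 <= crossed < total


def predicted_judgments(reading: Reading) -> dict[str, bool]:
    """Accept (True) or reject (False) the observer's einige statement in each outcome."""
    return {name: einige_true(n, TOTAL, reading) for name, n in OUTCOMES.items()}


def passes_tvjt(judgments: dict[str, bool]) -> bool:
    """Success criterion of the TVJT: accept in the SOME case, reject in the ALL case."""
    return judgments["SOME"] and not judgments["ALL"]


for reading in Reading:
    judgments = predicted_judgments(reading)
    print(reading.name, judgments, "passes TVJT:", passes_tvjt(judgments))
```

Both readings accept the statement in the SOME case and reject it in the NONE case; only the ALL case separates them, which is why it serves as the critical trial.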
The settings of the present experiments differed from Noveck's design in an important respect: they always gave rise to a comparison between a semantic and a pragmatic context that potentially fits the tested weak scalar expressions. This was obvious in the first experiment and in the PST of the second experiment. However, the TVJT of the second experiment also provided a much more realistic context than Noveck's third task did. The einige statements were always embedded in an everyday situation (e.g. crossing a road) that strongly favoured a pragmatic reading. Furthermore, in the ALL case of the TVJT, two Smurfs were at first reluctant to cross the road and had a discussion before they finally did so. This may have increased the probability that the observer's statement was judged from a pragmatic rather than a purely logical point of view. The fact that at the beginning of the scene it was not certain that all of the Smurfs would cross the road, and that it always took some time before they finally did, may have led the children to pay more attention to the distinction between the pragmatic and the semantic reading of einige than in Noveck's third task. The children were also more likely to conclude that the observer who gave an einige statement in the ALL case had not noticed that the last two Smurfs had actually crossed the road, and therefore that he had made a pragmatically inadequate statement. Although the data of the present study clearly show that the choice of an adequate context influences children's willingness to take scalar implicatures into account, the data of Experiment 2 do not completely contradict Noveck's main assumption. Following Sperber and Wilson's (1995) Relevance Theory, Noveck suggests that the semantic reading is the basic reading of weak scalar terms, and that scalar implicatures are not generalised. Sperber and Wilson argue, against neo-Gricean accounts (cf. Levinson 2000), that the pragmatic meaning of weak scalar terms is not automatically taken into account (as a generalised implicature), but must be added every time the context requires it. Consequently, Noveck suggests that the children in his experiment found no reason to add an implicature because no appropriate context called for it, and therefore remained with the default (i.e. the logical) meaning of certains. The results of the present study do not contradict this claim, provided it is granted that the contexts of the two experiments conducted here gave the children enough reason to take the implicature into account when judging the input sentences. Furthermore, in the second experiment, the fact that the 6-year-old who did not recognise the scalar implicature in the TVJT also strongly favoured the semantic reading of einige in the PST fits well with the assumption that the default reading of weak scalar terms is the semantic and not the pragmatic one. However, the empirical basis of the second experiment is too weak to support any definitive claim about this.
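Because both tasks were scored pass/fail, the correlation reported for Experiment 2 is a phi coefficient, i.e. Pearson's r computed on a 2×2 table. The sketch below (function name hypothetical) reproduces r = 1.0 from the counts reported in section 3.2, where eleven children passed both tasks, one failed both, and none passed only one. It also shows, with an invented variant of the table, how much a single different outcome would change the value when only twelve children are tested.

```python
import math


def phi_coefficient(pass_both: int, pass_first_only: int,
                    pass_second_only: int, fail_both: int) -> float:
    """Phi coefficient (Pearson's r for two binary variables) from a 2x2 table.

    Rows: first task passed / failed; columns: second task passed / failed.
    """
    row_pass = pass_both + pass_first_only
    row_fail = pass_second_only + fail_both
    col_pass = pass_both + pass_second_only
    col_fail = pass_first_only + fail_both
    denominator = math.sqrt(row_pass * row_fail * col_pass * col_fail)
    if denominator == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return (pass_both * fail_both - pass_first_only * pass_second_only) / denominator


# Counts from Experiment 2 (first task = PST, second task = TVJT).
observed = phi_coefficient(pass_both=11, pass_first_only=0, pass_second_only=0, fail_both=1)

# Hypothetical variant: one of the eleven PST passers fails the TVJT instead.
one_discordant = phi_coefficient(pass_both=10, pass_first_only=1, pass_second_only=0, fail_both=1)

print(f"observed phi = {observed:.2f}, with one discordant child = {one_discordant:.2f}")
```

With the observed counts the function returns 1.0; moving a single child into a discordant cell lowers it to about 0.67.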
More data, especially from younger children, will be needed to confirm or refute this observation about the default reading.
4 Conclusion
The results of the two experiments of this study suggest that the course of the acquisition of epistemic terms depends to a great extent on the development of children's ability to process the information that underlies an epistemic inference. They also suggest that this ability is not yet fully mastered by eight years of age. This does not mean, however, that younger children are unable to use weak epistemic terms at all, but rather that their capacity to use such terms is limited by their partial inability to recognise epistemic uncertainty. Young children therefore probably use weak epistemic terms only rarely at first, and only in everyday situations they are very familiar with. In this sense, the data provided by this study do not fully contradict the findings of naturalistic studies (cf. Ehrich 2005). However, the results of this study also suggest that young children should have difficulties relying on linguistic evidence (i.e. epistemically modalised utterances of a speaker) to infer epistemic possibility, and that they should occasionally overgeneralise the use of strong epistemic terms in their speech. It remains an open question, though, whether such mistakes can be observed in naturalistic studies at all.
Acknowledgements
This research has been supported by the “Deutsche Forschungsgemeinschaft” within the Sonderforschungsbereich 441 “Linguistic Data Structures”. I wish to thank Veronika Ehrich, Ira Noveck and two anonymous reviewers for insightful comments on this paper. I am also very grateful to Barbara Dillenburger and Frank Schlosser, who helped me conduct the experiments.
References
Bartsch, Karen and Henry M. Wellman 1995 Children Talk About the Mind. Oxford University Press, Oxford, New York.
Byrnes, James P. and Harry Beilin 1991 The cognitive basis of uncertainty. Human Development, 34: 189–203.
Byrnes, James P. and Willis F. Overton 1986 Reasoning about certainty and uncertainty in concrete, causal and propositional contexts. Developmental Psychology, 6: 793–799.
Chierchia, Gennaro, Stephen Crain, Maria T. Guasti, and Rosalind Thornton 1998 “Some” and “or”: A study on the emergence of logical form. In Annabel Greenhill, Mary Hughes, Heather Littlefield, and Hugh Walsh (eds.), Proceedings of the 22nd Annual Boston Conference on Language Development, volume 1, pp. 97–108. Cascadilla, Somerville, MA.
Doitchinov, Serge 2001 ‘Es kann sein, daß der Junge ins Haus gegangen ist’. Zum Erstspracherwerb von können in epistemischer Lesart. In R. Müller and M. Reis (eds.), Modalität und Modalverben im Deutschen, pp. 111–134. Buske, Hamburg.
Ehrich, Veronika 2005 Linguistic constraints on the acquisition of epistemic modal verbs. This volume.
Horn, Laurence R. 1972 On the semantic properties of the logical operators in English. Mimeo: Indiana University Linguistic Club.
Levinson, Stephen C. 2000 Presumptive Meanings. The Theory of Generalized Conversational Implicature. MIT Press, Cambridge, MA.
Morris, Bradley J. and Vladimir Sloutsky 2002 Children’s solution of logical versus empirical problems: What’s missing and what develops. Cognitive Development, 16: 907–928.
Noveck, Ira A. 2001 When children are more logical than adults: experimental investigations of scalar implicature. Cognition, 78: 165–188.
Papafragou, Anna 2000 Modality: Issues in the Semantic-Pragmatic Interface. Elsevier, Amsterdam.
Pieraut-Le Bonniec, Gilberte 1980 The Development of Modal Reasoning: Genesis of Necessity and Possibility Notions. Academic Press, New York.
Somerville, Susan C., B. A. Hadkinson, and G. Greenberg 1979 Two levels of inferential behaviour in young children. Child Development, 50: 119–131.
Sperber, Dan and Deirdre Wilson 1995 Relevance: Communication and Cognition. Blackwell, Oxford, 2nd edition.
Stephany, Ursula 1986 Modality. In Paul Fletcher and Michael Garman (eds.), Language Acquisition, pp. 375–400. Cambridge University Press, Cambridge.
Wellman, Henry M. 1990 The Child’s Theory of Mind. MIT Press, Cambridge, MA.
Processing Negative Polarity Items: When Negation Comes Through the Backdoor
Heiner Drenhaus, Stefan Frisch, and Douglas Saddy
1 Introduction
Various lexical elements, such as the German negative polarity item jemals (ever), exhibit an interesting property: they can only occur in certain kinds of contexts. Negative polarity items must occur in a context in which the proper semantic/pragmatic properties are accessible, see (1a). If the context does not provide an (accessible) negator, the construction becomes unacceptable, see (1b).1
(1) a.  Kein Mann war jemals glücklich.
        no man was ever happy
        ‘No man was ever happy.’
    b. *Ein Mann war jemals glücklich.
        a man was ever happy
        ‘A man was ever happy.’
Another important observation is that the violation of the polarity construction is not due to a word category mismatch, see (2):
(2) a.  Kein Mann war gestern glücklich.
        no man was yesterday happy
        ‘No man was happy yesterday.’
    b.  Ein Mann war gestern glücklich.
        a man was yesterday happy
        ‘A man was happy yesterday.’
In (2), the polarity item jemals is replaced by the non-polarity adverb gestern (yesterday). Sentences (2a) and (2b) are equally acceptable, regardless of the presence of negation. Since the negative polarity item jemals (ever) is itself an adverb, the unacceptability of (1b) is not due to a violation of structural requirements. Rather, (1b) is unacceptable because of a conflict between the specific lexical demands of the polarity item and the properties of the context. Which lexical properties restrict the occurrence of a polarity item, however, is controversial in the theoretical linguistic literature. Linguistic descriptions of the distribution and interpretation of polarity items agree that the occurrence of polarity items is licensed by semantic (e.g. Horn 1997; Ladusaw 1980) or pragmatic (Chierchia 2001; Fauconnier 1980; Krifka 1995) properties, or by a combination of both (Baker 1970; Linebarger 1987). In addition, these properties must be accessible to the polarity item, where accessibility is determined by hierarchical constituency (Haegeman 1995; Laka 1994; Progovac 2000). A negative polarity item is only licensed if it occurs in the scope of a negator, as in (3a). As a consequence, linguistic theory predicts that a negative polarity construction is equally unacceptable whether the context provides no negation at all, as in (3b), or the negative polarity item is preceded, but not c-commanded, by a negator, as in (3c).
(3) a.  Kein Mann, der einen Bart hatte, war jemals glücklich.
        no man who a beard had was ever happy
        ‘No man who had a beard was ever happy.’
    b. *Ein Mann, der einen Bart hatte, war jemals glücklich.
        a man who a beard had was ever happy
        ‘A man who had a beard was ever happy.’
    c. *Ein Mann, der keinen Bart hatte, war jemals glücklich.
        a man who no beard had was ever happy
        ‘A man who had no beard was ever happy.’
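The structural licensing condition behind (3) can be made concrete with a small c-command check. The sketch below is illustrative only and rests on simplifying assumptions: the trees are hand-built and much flatter than any real analysis, negation is treated as a feature of the negative phrase (kein Mann, keinen Bart), and c-command is defined via the first branching node. All function names are hypothetical. Linear order plays no role in the check, so the negator in (3c) fails to license jemals even though it precedes it.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # compare nodes by identity, not by content
class Node:
    """A constituent in a deliberately simplified phrase-structure tree."""
    label: str
    children: list["Node"] = field(default_factory=list)
    negative: bool = False  # True for a negative phrase such as 'kein Mann'
    parent: Optional["Node"] = field(default=None, repr=False)


def node(label: str, *children: Node, negative: bool = False) -> Node:
    """Build a node and set the parent pointers of its children."""
    n = Node(label, list(children), negative)
    for child in children:
        child.parent = n
    return n


def dominates(a: Node, b: Node) -> bool:
    """True if a properly dominates b (a is an ancestor of b)."""
    p = b.parent
    while p is not None:
        if p is a:
            return True
        p = p.parent
    return False


def c_commands(a: Node, b: Node) -> bool:
    """a c-commands b if neither dominates the other and the first
    branching node above a dominates b."""
    if a is b or dominates(a, b) or dominates(b, a):
        return False
    p = a.parent
    while p is not None and len(p.children) < 2:
        p = p.parent
    return p is not None and dominates(p, b)


def npi_licensed(root: Node, npi: Node) -> bool:
    """The NPI is licensed if some negative phrase in the tree c-commands it."""
    stack = [root]
    while stack:
        n = stack.pop()
        if n.negative and c_commands(n, npi):
            return True
        stack.extend(n.children)
    return False


def example_3(subject_negative: bool, relative_negative: bool) -> tuple[Node, Node]:
    """Toy tree for '(K)ein Mann, der (k)einen Bart hatte, war jemals glücklich'."""
    jemals = node("Adv:jemals")
    obj = node("DP", node("D:keinen" if relative_negative else "D:einen"),
               node("N:Bart"), negative=relative_negative)
    relative = node("CP", node("C:der"), node("VP", obj, node("V:hatte")))
    subject = node("DP", node("D:kein" if subject_negative else "D:ein"),
                   node("N:Mann"), relative, negative=subject_negative)
    root = node("S", subject, node("VP", node("V:war"), jemals, node("A:glücklich")))
    return root, jemals


for name, flags in {"3a": (True, False), "3b": (False, False), "3c": (False, True)}.items():
    root, jemals = example_3(*flags)
    print(name, "licensed" if npi_licensed(root, jemals) else "not licensed")
```

Only the tree for (3a) licenses jemals; (3b) and (3c) both fail, which is the equal-unacceptability prediction that Experiment 1 puts to the test.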
In sum, both syntactic and semantic/pragmatic information play an important role in the licensing of negative polarity items in a sentence context. In order to shed light on the specific lexical properties of a negative polarity item like jemals (ever) and on the licensing conditions imposed by hierarchical constituency, we investigated negative polarity structures such as those in (3) in two studies. More specifically, we examined the influence of hierarchical constituency and of the linear order of the negator, using a speeded acceptability judgment task and event-related brain potentials (ERPs).
2 The judgment and processing of polarity items
From a psycholinguistic point of view, the properties of polarity items raise questions with respect to syntactic and semantic processing. More specifically, we want to know how the human language processor responds to the different types of demands imposed by a polarity item. This should shed light not only on the specific nature of polarity items but, more importantly, on how the specific properties of the polarity item interact with the restrictions provided by the context. Our experiments focused on the acceptability of negative polarity in three types of constructions such as (3). In (3a), the negator kein (no) appears in the same clause (the main clause) as the negative polarity item jemals (ever) and is therefore accessible. The ungrammatical structure in (3b) does not contain negation, which leaves jemals (ever) unlicensed. In the third structure (3c), the negator precedes the negative polarity item, but it is not structurally accessible because it is too deeply embedded in the relative clause. If the linguistic description is correct that negative polarity items need a structurally accessible negator in order to be licensed, we expect structures (3b) and (3c), where this condition is not met, to be rejected as ungrammatical significantly more often than structures such as (3a). However, linguistic theory provides no reason to assume that acceptability should differ depending on whether the negation is present but not accessible, as in (3c), or not present at all, as in (3b).
2.1 Experiment 1: Speeded acceptability judgment study
In a first experiment, we tested how structures such as (3) are judged by native speakers of German in a speeded grammaticality judgment task. This technique is believed to reflect online processing decisions, because it leaves little time for reflection that could contaminate the subjects' responses and obscure their grammatical intuitions.2
2.1.1 Methods
Participants 24 students from the University of Leipzig (mean age 21 years, 10 female) participated. They were all monolingual speakers of German and received course credits for their participation.
Materials 24 sets of lexical material were constructed in each of the three critical conditions (3a) to (3c), resulting in a total of 72 experimental sentences. Each subject saw a subset of 24 sentences (8 per condition). These sentences were intermixed with 24 related and 80 unrelated fillers.
Procedure After a set of 12 training sentences (4 per condition), the 128 sentences of the experiment were presented in a pseudo-randomized order in the center of a screen. The initial subject phrase, the nominal phrase within the relative clause, and each of the other words were presented in isolation for 300 ms each, separated by a 100 ms blank screen (interstimulus interval, ISI). 500 ms after the last word of each sentence, subjects had to judge the acceptability of the sentence within a maximum interval of 3000 ms by pressing one of two buttons. The next trial began 1000 ms after their response.
Data analysis We computed mean accuracy percentages, and mean response latencies for the correctly performed trials, per condition per subject as well as per condition per item. Subject and item data were analyzed in separate ANOVAs with a factor VIOLATION with three levels: correct (COR), violation without negation (VNO), and violation with inaccessible negation in the relative clause (VNR). To control for violations of sphericity, the correction proposed by Huynh and Feldt (1970) was applied. Because all pairwise comparisons were computed whenever the main effect was significant, the alpha level was adjusted according to Keppel (1991).
2.1.2 Results
Mean accuracy percentages and response latencies in the acceptability judgment task are displayed in Table 1.
Table 1. Mean accuracy rates (in percent) and reaction times (in ms) for all three conditions across all 24 subjects (standard deviations in parentheses)
Condition                                          Accuracy (%)    Reaction time (ms)
a. correct (COR)                                   85 (13.3)       540 (241)
b. violation without negation (VNO)                83 (17.2)       554 (237)
c. violation with inaccessible negation (VNR)      70 (30.3)       712 (314)
The statistical analysis of the accuracies revealed a main effect of CONDITION (F1 (2,46) = 4.38, p < .05; F2 (2,46) = 7.68, p < .01). This main effect was due to the fact that subjects made more errors in condition VNR, which they should have rejected, than in both VNO (F1 (1,23) = 6.11, p < .05; F2 (1,23) = 10.80, p < .01) and COR (F1 (1,23) = 5.11, p < .05; F2 (1,23) = 8.89, p < .01). VNO and COR, however, did not differ from one another (both F < 1). The analysis of response latencies revealed a similar picture. We found a main effect of CONDITION (F1 (2,46) = 8.75, p < .01; F2 (2,46) = 7.28, p < .01). Responses in condition VNR were slower than in both VNO (F1 (1,23) = 26.68, p < .001; F2 (1,23) = 11.95, p < .01) and COR (F1 (1,23) = 10.25, p < .01; F2 (1,23) = 8.35, p < .05). Again, VNO and COR did not differ from one another (both F < 1).
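The F1 (by-subject) and F2 (by-item) statistics above come from one-way repeated-measures ANOVAs. A minimal sketch in plain Python, assuming that cell means per subject (or item) and condition have already been computed; it omits the Huynh–Feldt correction and the Keppel-adjusted pairwise comparisons described in the data analysis, and the function name and toy data are hypothetical. Its main purpose here is to show where the reported degrees of freedom come from.

```python
def rm_anova_oneway(scores: list[list[float]]) -> tuple[float, int, int]:
    """One-way repeated-measures ANOVA without sphericity correction.

    scores[s][c] is the mean of unit s (a subject for F1, an item for F2)
    in condition c. Returns (F, df_condition, df_error).
    """
    n = len(scores)     # number of subjects or items
    k = len(scores[0])  # number of conditions
    grand = sum(sum(row) for row in scores) / (n * k)
    cond_means = [sum(row[c] for row in scores) / n for c in range(k)]
    unit_means = [sum(row) / k for row in scores]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_units = k * sum((m - grand) ** 2 for m in unit_means)
    ss_error = ss_total - ss_cond - ss_units  # condition-by-unit interaction

    df_cond = k - 1
    df_error = (k - 1) * (n - 1)
    f_value = (ss_cond / df_cond) / (ss_error / df_error)
    return f_value, df_cond, df_error


# Toy data: three units x three conditions (COR, VNO, VNR), invented values.
f_value, df1, df2 = rm_anova_oneway([[1.0, 3.0, 5.0], [2.0, 3.0, 4.0], [3.0, 3.0, 3.0]])
print(f"F({df1},{df2}) = {f_value:.2f}")
```

With 24 subjects (or 24 items) and three conditions, the error term has (3 - 1)(24 - 1) = 46 degrees of freedom, matching F1 (2,46) and F2 (2,46) above; comparing two conditions gives (1,23). The same formula yields (2,30) for 16 subjects and (2,78) for 40 items.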
2.1.3 Discussion
The results of the acceptability judgment experiment show that subjects rejected both violation conditions, as expected. However, the comparison between the incorrect conditions reveals that subjects more often accepted structures in which the negator precedes the negative polarity item without c-commanding it than structures without any negation. Differences in response latencies point in the same direction. The reduction in accuracy as well as the longer reaction times in condition VNR imply that it is difficult for the language processor to inhibit the influence of the negation in the relative clause. This suggests that the negator may be wrongly used to license the polarity item even though it is not in a c-commanding position. Nevertheless, the fact that 70% of VNR constructions were rejected indicates that hierarchical constituency and accessibility play a crucial role in determining the acceptability of negative polarity constructions.
2.2 Experiment 2: Event-related brain potentials (ERPs)
2.2.1 Some preliminary remarks on ERPs
In order to investigate the on-line processing of negative polarity constructions, we used event-related potentials (ERPs). Before describing the results of our ERP study in detail, we briefly review some language-related ERP effects that have been identified with this experimental technique.
Event-related potentials (ERPs) are an ideal tool for investigating language processing on-line because they are continuous and have a very high temporal (millisecond-by-millisecond) resolution (Kutas and van Petten 1994). Unlike purely quantitative measures such as reaction times, ERP effects (so-called components) are characterized by a quantitative parameter (peak latency) as well as by qualitative parameters (polarity, topography, experimental sensitivity). Distinct ERP patterns have been found in response to linguistically distinct experimental manipulations. With regard to language processing, four main markers have been identified in the literature. They are named according to their polarity (N for negativity versus P for positivity), post-stimulus peak latency and topographic distribution. First, the early left anterior negativity (ELAN) occurs between 120 and 220 ms with either a left or a bilateral anterior distribution. In several studies this component has been associated with phrase structure violations (cf. Friederici 2002; Hahne and Friederici 1999, 2002). Second, the left anterior negativity (LAN) is similar in topography and polarity to the ELAN but peaks later, between 300 and 500 ms, and occurs in response to morphosyntactic violations (Coulson, King, and Kutas 1998; Gunter, Schriefers, and Friederici 2000; Friederici and Frisch 2000). The third component is the so-called N400, a negativity typically peaking around 400 ms after the onset of a critical element. It has a centro-parietal bilateral distribution, often with a slight right-hemisphere focus. The N400 reflects the processing costs of semantic or thematic integration, since it has been found in response to semantic as well as thematic violations (Kutas and Hillyard 1980a; Friederici and Frisch 2000), either of verb argument structure or of thematic hierarchies between case-marked arguments (Frisch and Schlesewsky 2001).
Finally, the so-called P600 is a positivity peaking between 600 and 900ms with a centro-parietal distribution and has been associated with syntactic reanalysis and repair (Osterhout and Holcomb 1992). Additionally, this component has been found in response to enhanced syntactic complexity (Kaan, Gibson, Harris, and Holcomb 2000; Friederici, Hahne, and Saddy 2002) including ambiguity (Frisch, Schlesewsky, Saddy, and Alpermann 2002).
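ERP components such as the N400 and the P600 are commonly quantified as the mean voltage in a latency window, measured relative to a pre-stimulus baseline. The sketch below assumes that convention (the chapter names its analysis windows but not the exact dependent measure) and borrows the parameters given for Experiment 2: a 250 Hz sampling rate (4 ms per sample), a 200 ms pre-stimulus baseline and the 300–450 ms N400 window. The function names and the synthetic epochs are hypothetical.

```python
def window_mean_amplitude(epoch: list[float], srate_hz: float, baseline_ms: float,
                          start_ms: float, end_ms: float) -> float:
    """Baseline-corrected mean amplitude in [start_ms, end_ms) after stimulus onset.

    The epoch is assumed to begin baseline_ms before stimulus onset.
    """
    ms_per_sample = 1000.0 / srate_hz
    onset = round(baseline_ms / ms_per_sample)  # index of the first post-onset sample
    if onset <= 0:
        raise ValueError("a pre-stimulus baseline is required")
    baseline = sum(epoch[:onset]) / onset
    window = [v for i, v in enumerate(epoch)
              if start_ms <= (i - onset) * ms_per_sample < end_ms]
    if not window:
        raise ValueError("latency window lies outside the epoch")
    return sum(window) / len(window) - baseline


SRATE, BASELINE = 250.0, 200.0
N400_WINDOW = (300.0, 450.0)


def synthetic_epoch(n400_shift: float) -> list[float]:
    """Hypothetical averaged epoch (microvolts): flat at 1.0, shifted inside the N400 window."""
    samples = round((BASELINE + 1300.0) * SRATE / 1000.0)
    onset = round(BASELINE * SRATE / 1000.0)
    epoch = []
    for i in range(samples):
        t = (i - onset) * 1000.0 / SRATE
        epoch.append(1.0 + (n400_shift if N400_WINDOW[0] <= t < N400_WINDOW[1] else 0.0))
    return epoch


correct = window_mean_amplitude(synthetic_epoch(0.0), SRATE, BASELINE, *N400_WINDOW)
violation = window_mean_amplitude(synthetic_epoch(-3.0), SRATE, BASELINE, *N400_WINDOW)
print(f"N400 effect (violation - correct): {violation - correct:.1f} microvolts")
```

The effect of interest is the difference between conditions, so a negative value in the 300–450 ms window corresponds to an N400-like negativity for the violation.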
2.2.2 ERPs and polarity constructions
With regard to our study on negative polarity items, there are, to our knowledge, only two studies in the literature in which the processing of polarity items was tested using event-related potentials. In a study carried out by Shao and Neville (1998), the difference between a correct sentence (4a) and its violation (4b) was tested.
(4) a.  Max says that he has never been to a birthday party.
    b. *Max says that he has ever been to a birthday party.
In ungrammatical sentences like (4b), Shao and Neville (1998) found an anterior negativity between 300 and 500 ms on the polarity item ever, followed by a late positivity between 500 and 1000 ms, compared to never, which meets the context requirements in (4a). Surprisingly, they suggested that the negativity can be associated with specific types of semantic processing that are bound to polarity constructions, although they did not find an N400. Moreover, it is well known that lexical differences between two elements affect ERP correlates (Kutas and van Petten 1994). It therefore cannot be excluded that Shao and Neville's findings were influenced by lexical differences between the two items tested (ever versus never). Saddy, Drenhaus, and Frisch (2004) investigated the failure to license positive polarity items ((5c) versus (5d)) and negative polarity items ((5a) versus (5b)) in German.
(5) a.  Kein Mann, der einen Bart hatte, war jemals glücklich.
        no man who a beard had was ever happy
        ‘No man who had a beard was ever happy.’
    b. *Ein Mann, der einen Bart hatte, war jemals glücklich.
        a man who a beard had was ever happy
        ‘A man who had a beard was ever happy.’
    c. *Kein Mann, der einen Bart hatte, war durchaus glücklich.
        no man who a beard had was certainly happy
        ‘No man who had a beard was certainly happy.’
    d.  Ein Mann, der einen Bart hatte, war durchaus glücklich.
        a man who a beard had was certainly happy
        ‘A man who had a beard was certainly happy.’
Their results showed different processing reflexes associated with failure to license positive polarity items in comparison to failure to license negative polarity items. Failure to license both negative and positive polarity items elicited an N400 component reflecting semantic integration cost. Failure to license positive polarity items, however, also elicited a P600 component ((5c) versus (5d)). The additional P600 in the positive polarity violations was interpreted as a reflex of higher processing complexity associated with a negative operator. Their results thus suggest that the processing of negative and positive polarity items does not involve identical mechanisms.
2.2.3 The present study
In an ERP experiment, we addressed the question of how structures such as (3) are processed on-line. Since ERPs provide qualitatively different types of responses (components) associated with different types of linguistic information (see above), they can be used to answer the following questions. First, how do ERP patterns differ between acceptable structures (such as (3a)) and unacceptable ones (such as (3b) and (3c))? Second, what is the nature of the intrusion effect found in Experiment 1, i.e. the difference between (3b) and (3c)? Since the failure to license a negative polarity item elicited an N400 component in Saddy et al. (2004), we expect a similar pattern for the violation conditions in our study, indicating that such a violation induces semantic/pragmatic integration problems. We do not expect a P600 effect, since Saddy et al. found this component only in response to positive polarity violations. Beyond that, we are interested in the influence of the negator in the relative clause on the ERP pattern for the negative polarity item. As the acceptability judgments in Experiment 1 showed, a non-c-commanding negation enhances the acceptability of an unlicensed negative polarity item. If this effect already reflects an on-line process, we would expect the violation effect (that is, the N400 generally expected on the basis of Saddy et al. 2004) to decrease when the negator appears in the relative clause.
2.2.4 Methods
Participants 16 undergraduate students (mean age 21 years, 10 female) from the University of Potsdam participated in the ERP study after giving informed consent. All were right-handed and had normal or corrected-to-normal vision.
Materials Each subject read a total of 640 sentences in randomized order, in which 120 critical sentences (40 per condition) were intermixed with 120 related and 320 unrelated filler sentences. The material was presented to each subject in two sessions, one week apart. All sentences consisted of a main clause in which a relative clause was embedded. In the grammatical condition (6a), the negator appeared in the main clause (accessible to the negative polarity item); in the ungrammatical condition (6c), it appeared in the relative clause (not accessible to the negative polarity item); and the ungrammatical condition (6b) did not contain negation.
(6) a.  Kein Mann, der einen Bart hatte, war jemals glücklich.
        no man who a beard had was ever happy
        ‘No man who had a beard was ever happy.’
    b. *Ein Mann, der einen Bart hatte, war jemals glücklich.
        a man who a beard had was ever happy
        ‘A man who had a beard was ever happy.’
    c. *Ein Mann, der keinen Bart hatte, war jemals glücklich.
        a man who no beard had was ever happy
        ‘A man who had no beard was ever happy.’
Procedure After a set of 12 training sentences (4 in each of the critical conditions, see above), the 120 critical sentences were randomly presented in the center of a screen, with 400ms (plus 100ms interstimulus interval) for the initial subject phrase, the nominal phrase within the relative clause, and for each of the other words in isolation. 500ms after the last word of each sentence, subjects had to judge its well-formedness within a maximal interval of 3000ms by pressing one out of two buttons. 1000ms after their response, the next trial began. The EEG was recorded by means of 16 AgCl electrodes with a sampling rate of 250Hz (with impedances < 5kOhm) and were referenced to the left mastoid (re-referenced to linked mastoids offline). Electrode positions are based on the nomenclature proposed by the American Electroencephalographic Society (Sharbrough et al. 1991). The horizontal electro-oculogram (EOG) was monitored with two electrodes placed at the outer canthus of each eye and the vertical EOG with two electrodes above and below the right eye. Data analysis In order to see whether subjects judged the sentences in the way we expected them to do, the accuracy percentages of their judgments
Heiner Drenhaus, Stefan Frisch, and Douglas Saddy
as well as the response latencies were analyzed (see section 2.1.1 for explanation). For the ERP analysis, only trials with correct answers in the judgment task and without artifacts were selected (83% of all trials). To compensate for drifts, the data were filtered with a 0.4 Hz high-pass filter. Single-subject averages were computed in a 1300 ms window relative to the onset of the critical item (the polarity item jemals) and aligned to a 200 ms pre-stimulus baseline. ERPs were statistically analyzed in two time windows: 300–450 ms for the N400 and 500–800 ms for the P600 effects. ERP effects were assessed with repeated-measures analyses of variance (ANOVA) with two factors: a condition factor VIOLATION with three levels, correct (COR), violation without negation (VNO) and violation with inaccessible negation in the relative clause (VNR); and a topographical factor REGION with three levels, anterior (electrodes F3, FZ and F4), central (electrodes C3, CZ and C4) and posterior (electrodes P3, PZ and P4).
2.2.5 Results of the behavioral data
Table 2 shows the mean accuracy percentages and response latencies in the behavioral data of the ERP study (acceptability judgment task).
Table 2. Mean accuracy rates (in percent) and reaction times (in ms) for all three conditions across all 16 subjects (standard deviations in parentheses)
Condition                                        Accuracy (%)   Reaction time (ms)
a. correct (COR)                                 94 (4.6)       575 (203)
b. violation without negation (VNO)              95 (4.6)       529 (184)
c. violation with inaccessible negation (VNR)    89 (8.7)       595 (254)
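The F1 and F2 statistics reported in this section come from repeated-measures ANOVAs over subjects (16 subjects × 3 conditions, hence df = (2, 30)) and over items (40 items per condition, hence df = (2, 78)). A minimal sketch of the one-way repeated-measures F ratio in plain Python follows, assuming a complete table of cell means (one row per subject or item, one column per condition). It applies no sphericity correction (e.g., Huynh–Feldt), computes no p-value, and is not the authors' analysis script; the function name is hypothetical.

```python
def rm_anova_oneway(data):
    """One-way repeated-measures ANOVA.

    data[s][c] is the mean score of subject (or item) s in condition c.
    Returns (F, df_condition, df_error).
    """
    n = len(data)      # subjects (F1) or items (F2)
    k = len(data[0])   # conditions, e.g. COR, VNO, VNR
    grand = sum(x for row in data for x in row) / (n * k)
    cond_means = [sum(row[c] for row in data) / n for c in range(k)]
    subj_means = [sum(row) / k for row in data]

    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    # Residual = condition x subject interaction, used as the error term
    ss_error = ss_total - ss_cond - ss_subj

    df_cond = k - 1
    df_error = (n - 1) * (k - 1)
    f_ratio = (ss_cond / df_cond) / (ss_error / df_error)
    return f_ratio, df_cond, df_error


# Toy data: 2 subjects x 3 conditions (not the study's data)
print(rm_anova_oneway([[0, 2, 4], [2, 2, 8]]))  # (7.0, 2, 2)
```

With 16 rows the degrees of freedom come out as (2, 30), and with 40 rows as (2, 78), matching the df reported for F1 and F2. A p-value could then be read off the F distribution, for instance with `scipy.stats.f.sf(F, df1, df2)`.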
The statistical analysis of the accuracies showed a main effect of CONDITION (F1 (2,30) = 5.29, p < .05; F2 (2,78) = 8.28, p < .001). This main effect was due to the fact that subjects made more errors in condition VNR (i.e., failed to reject it more often) than in both COR (F1 (1,15) = 5.61, p < .05; F2 (1,39) = 8.04, p < .001) and VNO (F1 (1,15) = 10.57, p < .001; F2 (1,39) = 17.17, p < .0001). VNO and COR, however, did not differ from one another (both F < 1). The analysis of response latencies showed a different picture. We found a main effect of CONDITION in the item analysis (F2 (2,78) = 5.46, p < .001), which was only marginal in the subject analysis (F1 (2,30) = 2.82, p = .08). Responses in condition VNR were slower compared
Processing Negative Polarity Items
to VNO (F1 (1,15) = 5.43, p < .05; F2 (1,39) = 8.31, p < .001). Responses in condition VNO were faster than in COR in the item analysis only (F1 (1,15) = 2.69, p = .12; F2 (1,39) = 7.48, p < .01). VNR and COR, however, did not differ (F1 < 1, F2 < 1). The behavioral data thus show a pattern similar to that of Experiment 1. Subjects rejected both incorrect conditions, VNO and VNR. The comparison between the two ungrammatical conditions showed that subjects incorrectly accepted sentences more often when negation linearly preceded the negative polarity item (VNR) than when the sentence contained no negation (VNO).
2.2.6 Results of the ERP study
ERP patterns from the onset of the critical item (polarity item, onset at 0 ms) up to 1000 ms thereafter are displayed in Figure 1. As can be seen in Figure 1, ERPs in both incorrect conditions (b) and (c) show a broadly distributed negativity compared to the correct condition (a). Moreover, the amplitude of the negativity differs between the two ungrammatical conditions (b) and (c): the negativity appears weaker in the condition where negation is contained in the relative clause (c). Additionally, visual inspection reveals a positivity for both ungrammatical conditions (b) and (c) in comparison with the grammatical condition (a), especially at posterior sites. Statistical analysis for the N400 time window (300 to 450 ms) revealed a main effect of VIOLATION (F (2,30) = 9.91, p < .001), which was due to a negative-going pattern in both VNO (F (1,15) = 17.22, p < .001) and VNR (F (1,15) = 6.42, p < .05) compared to COR. In addition, there was a marginal negativity in VNO compared to VNR (F (1,15) = 4.29, p = .08). Furthermore, there was an interaction VIOLATION x REGION (F (4,60) = 9.58, p < .001). Resolving this interaction revealed main effects of VIOLATION in the anterior (F (2,30) = 4.61, p < .05), the central (F (2,30) = 10.58, p < .001) and the posterior region (F (2,30) = 13.00, p < .001). VNO was more negative than COR in all three regions (anterior: F (1,15) = 5.97, p < .05; central: F (1,15) = 17.59, p < .001; posterior: F (1,15) = 27.05, p < .001). The pattern in VNR was also more negative than in COR, but only in the central (F (1,15) = 8.82, p < .05) and the posterior region (F (1,15) = 11.25, p < .001), not at anterior sites (F < 1). Interestingly,
Figure 1. ERP effects on the negative polarity item jemals (ever) from onset up to 1000 ms thereafter at a subset of nine electrodes. Negativity is plotted upwards. The solid line displays the grammatical condition (a), the dotted line the incorrect condition without any negation (b), and the broken line the incorrect condition in which the relative clause contains negation (c). For presentation purposes only, ERPs were filtered offline with an 8 Hz low-pass filter.
VNO elicited a negativity compared to VNR in the anterior region (F (1,15) = 7.82, p < .05), but not in the other two regions (central: F (1,15) = 3.12, p = .15; posterior: F (1,15) = 2.51, p = .20). In the global ANOVA for the P600 time window (500 to 800 ms), we only found a significant interaction VIOLATION x REGION (F (4,60) = 2.63, p < .05). Resolving the interaction, we found a main effect of VIOLATION only in the posterior region (F (2,30) = 5.04, p < .05). This main
effect was due to a positivity in both VNR (F (1,15) = 5.68, p < .05) and VNO (F (1,15) = 9.25, p < .05) compared to COR. There was no difference between the two violation conditions VNO and VNR (F < 1).
3 Discussion
In the present study, we investigated the processing of negative polarity items in German using speeded acceptability judgment tasks and ERPs. The results show that the language processor is sensitive to whether the licensing conditions of a negative polarity item are met. Furthermore, they show that the processor is sensitive to licensing information even when the licensor of the negative polarity item is structurally inaccessible. From their linguistic description, we know that polarity items have interesting lexical properties in that they depend on a specific context. A negative polarity item like German jemals (ever) has to occur in the scope of an appropriate licensor, where scope is defined in hierarchical terms, namely as a c-command relation. These syntactic and semantic constraints have to be met; otherwise the structure is ungrammatical. In short, there are two licensing conditions for negative polarity items: first, a licensor has to be present (semantic/pragmatic condition), and second, it has to be structurally accessible (structural/syntactic condition). The linguistic characterization of this type of lexical element gives no reason to expect acceptability to differ depending on whether negation is present but not accessible, as in (6c), or absent altogether, as in (6b). Negation in a relative clause should not influence the unacceptability of structures with jemals (ever), so we would expect similar responses from the processing system whenever the licensing conditions are violated. This, however, does not seem to be the case. The results of the judgment study (Experiment 1), which are replicated in the behavioral control task of Experiment 2, show that subjects rejected both violations, but not in the same way.
Interestingly, accuracies were lower and reaction times longer in the violation condition in which negation appeared in the relative clause than in the violation condition without any negation. This suggests that the negator is (wrongly) used to license the polarity item even when it is not in a c-commanding position. Apparently, the language processor does not strictly adhere to structural restrictions. However, since judgement data cannot reveal the qualitative nature of this effect, we conducted an ERP study to determine whether this intrusion arises on-line.
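The contrast between linear precedence and hierarchical accessibility can be made explicit with a toy tree check. The sketch below assumes heavily simplified, hypothetical phrase-structure trees for (6a) and (6c) and a textbook definition of c-command (neither node dominates the other, and the first branching node above the licensor dominates the polarity item). It shows that the negative phrase precedes jemals in both sentences but c-commands it only in (6a). On this definition the determiner kein alone would not c-command jemals, so the sketch treats the whole negative subject DP as the licensor; this, like the node labels and tree shapes, is an illustrative assumption rather than the authors' syntactic analysis.

```python
class Node:
    """A phrase-structure node; leaves have no children."""

    def __init__(self, label, *children):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self


def dominates(a, b):
    """True if a properly dominates b (b lies somewhere below a)."""
    node = b.parent
    while node is not None:
        if node is a:
            return True
        node = node.parent
    return False


def c_commands(a, b):
    """a c-commands b: neither dominates the other, and the first
    branching node above a dominates b."""
    if a is b or dominates(a, b) or dominates(b, a):
        return False
    node = a.parent
    while node is not None and len(node.children) < 2:
        node = node.parent
    return node is not None and dominates(node, b)


def leaves(node):
    """Terminal nodes under node, left to right."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def precedes(root, a, b):
    """a linearly precedes b: a's last word comes before b's first word."""
    order = leaves(root)
    return order.index(leaves(a)[-1]) < order.index(leaves(b)[0])


def sentence(determiner, rel_object):
    """Hypothetical tree for '<Det> Mann, der <rel_object> hatte, war jemals glücklich'."""
    jemals = Node("Adv jemals")
    subject = Node("DP", Node("D " + determiner),
                   Node("NP", Node("N Mann"),
                        Node("CP", Node("C der"),
                             Node("VP", rel_object, Node("V hatte")))))
    root = Node("S", subject,
                Node("VP", Node("V war"),
                     Node("AP", jemals, Node("A glücklich"))))
    return root, subject, jemals


# (6a): the negative subject DP 'kein Mann, der ...' is taken as the licensor
root_a, licensor_a, npi_a = sentence("kein", Node("DP einen Bart"))
# (6c): the negator sits inside the relative clause
neg_c = Node("DP keinen Bart")
root_c, _, npi_c = sentence("ein", neg_c)

print(precedes(root_a, licensor_a, npi_a), c_commands(licensor_a, npi_a))  # True True
print(precedes(root_c, neg_c, npi_c), c_commands(neg_c, npi_c))            # True False
```

A processor that consulted only `precedes` would treat (6a) and (6c) alike; only the `c_commands` check separates them, which is the structural condition the judgement data suggest is not strictly applied.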
The results of the ERP study showed that a negative polarity item elicits an N400 negativity followed by a late positivity (P600), compared to its grammatical counterpart, both in the context without negation and in the context where negation is not accessible. Following the ERP literature described above (see Section 3.2.1), the N400 reflects enhanced cost of semantic integration, whereas the P600 can be seen as a marker of syntactic repair attempts. Although the N400 was predicted on the basis of the study by Saddy et al. (2004), the P600 was not, since Saddy et al. found a significant positivity only in response to a positive polarity violation, but not on unlicensed negative polarity items. How can we account for the difference between the Saddy et al. study and the present one? One possibility is that the P600 is sensitive to the saliency of a (syntactic) violation (Coulson, King, and Kutas 1998): the amplitude of this component has been argued to increase with the saliency of the violation. Saliency can be operationalized via the detectability of a violation (Osterhout and Hagoort 1999). On this view, the saliency of a violation should be inversely proportional to the number of errors subjects make when judging structures containing it (Osterhout and Hagoort 1999). Accordingly, we would expect the participants in the study of Saddy et al. (2004), which did not show a significant P600 component, to have significantly higher error rates than the participants in the present study, where a significant P600 was found. This seems in fact to be the case (see note 3). Moreover, Saddy et al. (2004) found that the absence of a P600 for a jemals violation contrasted with the violation of a positive polarity item (durchaus (certainly)), which elicited both an N400 and a P600. It would be in line with the general argument of Saddy et al. (2004) if the violation of the negative polarity item jemals (ever) in the present study affected the P600, but not as strongly as a violation of a positive polarity item. Again, there are indications that this is indeed the case (see note 4). Nevertheless, whether these interpretations satisfactorily explain the divergence between the two studies has to be addressed systematically in future experiments. Since the P600 did not distinguish between the two ungrammatical conditions, the negation in the relative clause does not seem to affect the parser's effort to repair a syntactically ill-formed structure. With regard to the N400, by contrast, we found differences in amplitude between the two ungrammatical conditions: in the condition where negation is not accessible to the negative polarity item, the N400 was weaker than in the condition without any negation. This result suggests that a preceding negator can erroneously promote semantic/pragmatic integration despite a lack of structural accessibility.
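The amplitude comparisons discussed here rest on the epoching described in the data analysis: a 250 Hz sampling rate, a 200 ms pre-stimulus baseline, and mean amplitudes in the 300–450 ms (N400) and 500–800 ms (P600) windows. A minimal sketch of baseline correction and window averaging for one averaged waveform follows. It assumes that the epoch starts exactly 200 ms before the onset of jemals and that a window includes samples with latencies in [start, end); the 0.4 Hz high-pass filter, artifact rejection and trial averaging are omitted, and all names are hypothetical.

```python
import math

SAMPLING_RATE_HZ = 250                    # from the recording parameters
MS_PER_SAMPLE = 1000 / SAMPLING_RATE_HZ   # 4 ms per sample
BASELINE_MS = 200                         # pre-stimulus baseline length


def ms_to_index(t_ms):
    """First sample at or after latency t_ms (0 ms = onset of the critical word)."""
    return math.ceil((t_ms + BASELINE_MS) / MS_PER_SAMPLE)


def baseline_correct(epoch):
    """Subtract the mean of the pre-stimulus samples from every sample."""
    n_base = ms_to_index(0)
    baseline = sum(epoch[:n_base]) / n_base
    return [v - baseline for v in epoch]


def window_mean(epoch, start_ms, end_ms):
    """Mean baseline-corrected amplitude for latencies in [start_ms, end_ms)."""
    corrected = baseline_correct(epoch)
    i, j = ms_to_index(start_ms), ms_to_index(end_ms)
    return sum(corrected[i:j]) / (j - i)


# Synthetic 1300 ms epoch (325 samples): flat at 1.0 µV, with a
# deflection to -4.0 µV covering the N400 window
epoch = [1.0] * 325
for k in range(ms_to_index(300), ms_to_index(450)):
    epoch[k] = -4.0
print(window_mean(epoch, 300, 450))  # -5.0
print(window_mean(epoch, 500, 800))  # 0.0
```

An N400 effect such as VNO vs. COR would then correspond to the difference between the window means of the two conditions' single-subject averages, which are the values entered into the VIOLATION x REGION ANOVA.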
In both violation conditions, the processor tries to integrate the polarity item by ‘looking for’ a negator in the sentence context. In the case of a structurally inaccessible but linearly preceding negator, the processor finds a suitable target and entertains a potential analysis. This elicits a weaker N400 than the context without negation, where the N400 is stronger because the processor does not find a target that allows semantic/pragmatic integration of the polarity item. The absence of an observable P600 difference between the two violation conditions suggests that negation in the relative clause affects only the semantic/pragmatic integration of the negative polarity item (N400). These results call for further systematic investigations of polarity constructions that will shed more light on the processing of these elements and the contexts they appear in. Here we have investigated the licensing conditions for negative polarity items in one specific type of construction; further research should investigate the role of other licensors. WH-operators, for example, are considered another licensor of negative polarity items. It would be interesting to test whether this kind of operator induces ERP effects similar to or distinct from those found for constructions with negation. This would give us more insight into the licensing relations and environments in which negative polarity items appear. More generally, the investigation of polarity constructions offers a way to understand the interaction of pragmatic, semantic, and syntactic phenomena.
4 Conclusion
The results of both the speeded acceptability judgement task and the ERP experiment revealed that unlicensed negative polarity items are unacceptable on both semantic and syntactic grounds. Furthermore, a linearly preceding but structurally inaccessible negator can, on the one hand, erroneously enhance the acceptability of the structure in the judgement data and, on the other hand, weaken the N400 effect in the ERP data, which reflects semantic integration problems. These results can be interpreted as follows: the mere presence of a potential licensor for a negative polarity item is sufficient to alter both the time course and the efficiency of processing. It follows that, at least for the examples investigated here, there is a competition between semantic/pragmatic information and hierarchical constituency. A theoretical approach based only on structural relations would not predict this distribution. Taken together, the results of the two experiments support an approach that combines semantic (pragmatic) properties and hierarchical constituency during the processing of negative polarity items.
Acknowledgements
The present research was supported by a grant from the Deutsche Forschungsgemeinschaft (DFG) to D.S. (FOR 375/1–4). We want to thank Angela D. Friederici, Joanna Błaszczak, among many others, and two anonymous reviewers for many helpful suggestions, and Heike Herrmann, Beate Müller and Kristin Wittich for their support in data acquisition.
Notes
1. For the purpose of our paper, we restrict ourselves to (constituent) negation. However, this characterization of the licensing contexts for negative polarity items is incomplete. Polarity items can also be licensed by verbs with negative properties (e.g., We doubted that Peter was ever happy), by downward-entailing weak quantifiers (e.g., Few men were ever happy), or in the context of questions (e.g., Who was ever happy?). Note that there are contexts in which a polarity item is licensed even if it is not overtly c-commanded by negation (e.g., A doctor who knew anything about acupuncture was not available).
2. The results of a questionnaire study used to determine the viability of potential stimulus sentences showed that subjects did not discriminate between (3b)- and (3c)-type constructions.
3. We compared, in a between-subjects ANOVA, the error rates in the correct condition and in the violation condition without negation in the relative clause (conditions that were identical in both studies) in the Saddy et al. study with those in the present experiment, and found an interaction VIOLATION x GROUP (F (1,30) = 7.99, p < .01). This interaction was indeed due to the fact that mean accuracies in the violation condition without negation in the relative clause were lower in the Saddy et al. study (86%) than in the present experiment (92%). This difference was significant (F (1,30) = 5.96, p < .05). Interestingly, there was no difference in accuracy between the correct conditions of the two studies (95% versus 94%, F < 1). This shows that there was no general difference in performance between the two subject samples; the difference was confined to the detection of the violation.
4. Since we had positive polarity constructions with durchaus (certainly) as fillers in the present study, we could directly compare negative and positive polarity violation effects. We therefore chose only structures without negation in the relative clause (COR and VNO in the present study and the respective durchaus counterparts), since these are identical to the conditions in Saddy et al. (2004). In a time window between 600 and 800 ms over all nine electrodes, the direct comparison revealed no significant difference between the two correct conditions (F < 1). However, we found a significant difference between the incorrect conditions (F (1,15) = 4.7, p < .05), which was due to the pattern in the incorrect durchaus condition being more positive-going than the pattern in the jemals violation condition.
References
Baker, Curtis L. 1970 Double negatives. Linguistic Inquiry, 1, 169–186.
Brown, Colin M. and Hagoort, Peter 1999 The Neurocognition of Language. Oxford: Oxford University Press.
Chierchia, Gennaro 2001 Scalar implicatures, polarity phenomena, and the syntax/pragmatics interface. Ms., University of Milan.
Coulson, Seana, King, Jonathan and Kutas, Marta 1998 Expect the unexpected: Event-related brain response to morphosyntactic violations. Language and Cognitive Processes, 13, 21–58.
Fauconnier, Gilles 1975b Polarity and the scale principle. Chicago Linguistic Society, 11, 188–199.
Fauconnier, Gilles 1980 Pragmatic entailment and questions. In J. R. Searle et al. (Eds.), Speech Act Theory and Pragmatics. Dordrecht: Reidel.
Friederici, Angela D. 2002 Towards a neural basis of auditory language processing. Trends in Cognitive Sciences, 6, 78–84.
Friederici, Angela D., Hahne, Anja and Saddy, Douglas 2002 Distinct neurophysiological patterns reflecting aspects of syntactic complexity and syntactic repair. Journal of Psycholinguistic Research, 31 (1), 45–63.
Friederici, Angela D. and Frisch, Stefan 2000 Verb argument structure processing: The role of verb-specific and argument-specific information. Journal of Memory and Language, 43, 476–507.
Frisch, Stefan and Schlesewsky, Matthias 2001 The N400 reflects problems of thematic hierarchizing. NeuroReport, 12, 3391–3394.
Frisch, Stefan, Schlesewsky, Matthias, Saddy, Douglas and Alpermann, Annegret 2002 The P600 as an indicator of syntactic ambiguity. Cognition, 85, 83–92.
Giannakidou, Anastasia 1998 Polarity Sensitivity as (Non)Veridical Dependency. Linguistik Aktuell/Linguistics Today, 23.
Gunter, Thomas C., Stowe, Laurie A. and Mulder, Gerben 1997 When syntax meets semantics. Psychophysiology, 34, 660–676.
Haegeman, Liliane 1995 The Syntax of Negation (Cambridge Studies in Linguistics 75). Cambridge: Cambridge University Press.
Hagoort, Peter, Brown, Colin M. and Groothusen, J. 1993 The syntactic positive shift as an ERP measure of syntactic processing. Language and Cognitive Processes, 8, 439–483.
Hahne, Anja 1998 Charakteristika syntaktischer und semantischer Prozesse bei der auditiven Sprachverarbeitung: Evidenz aus ereigniskorrelierten Potentialstudien. MPI Series in Cognitive Neurosciences 1.
Horn, Larry 1997 Negative polarity and the dynamics of vertical inference. In D. Forget, P. Hirschbühler, F. Martineau and M.-L. Rivero (Eds.), Negation and Polarity: Syntax and Semantics (pp. 157–182). Amsterdam: John Benjamins.
Huynh, Huynh and Feldt, Leonard S. 1970 Conditions under which the mean square ratios in repeated measurement designs have exact F-distribution. Journal of the American Statistical Association, 65, 1582–1589.
Kaan, Edith, Harris, Anthony, Gibson, Edward and Holcomb, Phillip 2000 The P600 as an index of syntactic integration difficulty. Language and Cognitive Processes, 15, 159–201.
Keppel, Geoffrey 1991 Design and Analysis. Upper Saddle River: Prentice Hall.
Krifka, Manfred 1995 The semantics and pragmatics of polarity items. Linguistic Analysis, 25, 209–257.
Kutas, Marta and Hillyard, Steven A. 1980a Reading senseless sentences: Brain potentials reflect semantic incongruity. Science, 207, 203–205.
Kutas, Marta and Van Petten, Cyma K. 1994 Psycholinguistics electrified. In M. A. Gernsbacher (Ed.), Handbook of Psycholinguistics (pp. 83–143). San Diego: Academic Press.
Ladusaw, William 1980 Polarity Sensitivity as Inherent Scope Relations. New York and London: Garland Publishing.
Laka, Itziar 1994 On the Syntax of Negation. New York and London: Garland Publishing.
Linebarger, Marcia 1987 Negative polarity and grammatical representation. Linguistics and Philosophy, 10, 325–387.
Osterhout, Lee and Hagoort, Peter 1999 A superficial resemblance does not necessarily mean you are part of the family: Counterarguments to Coulson, King, and Kutas (1998) in the P600/SPS-P300 debate. Language and Cognitive Processes, 14, 1–14.
Osterhout, Lee and Holcomb, Phillip J. 1992 Event-related brain potentials elicited by syntactic anomaly. Journal of Memory and Language, 31, 785–806.
Progovac, Ljiljana 2000 Negative and positive feature checking and the distribution of polarity items. In S. Brown and A. Przepiorkowski (Eds.), Negation in Slavic. Slavica Publishers.
Saddy, Douglas, Drenhaus, Heiner and Frisch, Stefan 2004 Processing polarity items: Contrastive licensing costs. Brain and Language, 90 (1–3), 495–502.
Shao, Jenny and Neville, Helen 1998 Analyzing semantic processing using event-related brain potentials. The Newsletter of the Center for Research in Language, 11 (5). University of California, San Diego, La Jolla, CA 92039. http://crl.ucsd.edu/newsletter.html
Sharbrough, Frank W., Chatrian, Gian-Emilio, Lesser, Ruth P., Lüders, Hans O., Nuwer, Mark R. and Picton, Terence W. 1991 American Electroencephalographic Society guidelines for standard electrode position nomenclature. Journal of Clinical Neurophysiology, 8 (2), 200–202.
Linguistic Constraints on the Acquisition of Epistemic Modal Verbs
Veronika Ehrich
1 Introduction
The form and meaning of modal verbs (MVs) have been under linguistic debate for many years. The ongoing discussion addresses the status of MVs (i) as auxiliaries or non-auxiliaries (functional or non-functional categories), (ii) as raising vs. control verbs, (iii) as sources of coherent infinitives, and (iv) as systematically polyfunctional items, interpretable with reference to either circumstantial or epistemic discourse backgrounds in the sense of Kratzer (1991). The main question at issue is whether the semantic polyfunctionality of MVs in general, and epistemic MV readings in particular, have a grammatical correlate in (one of) the properties (i–iii). Psycholinguistic studies of MV acquisition have mainly been concerned with the cognitive basis of modal reasoning and with the use of epistemic MVs as reflecting a developing Theory of Mind. In the present paper, I argue that the ontogenesis of epistemicity has a syntactic and a semantic basis as well, especially concerning the acquisition of the coherent infinitive and of different non-epistemic discourse backgrounds for MVs. I argue against monocausal accounts in acquisition research and try to demonstrate that different developmental paths in the domains of syntax, semantics and cognition converge in giving rise to epistemic meanings. The paper is structured as follows. In section 2, I review the relevant semantic and syntactic properties of German MVs and the main findings of MV acquisition research. Section 3 presents the results of a corpus study (the Caroline corpus in CHILDES) in relation to the competing (psycho-)linguistic approaches to epistemicity in language and language development.
2 Linguistic properties of German MVs
2.1 Semantic polyfunctionality
Modals relate a given state of affairs p, denoted by the subject-infinitive predication ‘Max swim’ in (1)–(2), to a given discourse background such as the subject’s desires, abilities or obligations (1), or the speaker’s evidence for p (2). According to Kratzer (1991), p is assessed as necessary if it follows from a given discourse background, and as possible if it is compatible with it. Kratzer defines three parameters determining the interpretation of a given MV: MODAL FORCE (necessity vs. possibility), MODAL BASE (circumstantial vs. epistemic) and ORDERING SOURCE, which distinguishes dispositional, deontic or realistic backgrounds for a circumstantial MODAL BASE (1), and strictly epistemic vs. quotative-evidential readings for an epistemic MODAL BASE (2).
(1) CIRCUMSTANTIAL MODAL BASE
a. Max muss / kann täglich schwimmen. Er braucht das einfach.
   ‘Max has to / is able to swim every day. He simply needs that.’
   (Desire / ability to bring about p: Dispositional Ordering Source)
b. Max muss / kann jetzt schwimmen. Ich verlange / erlaube es.
   ‘Max is obliged / permitted to swim now. I request / permit that.’
   (Obligation / permission with respect to p: Deontic Ordering Source)
c. Max muss / kann zur Insel schwimmen. Das Wasser ist recht tief.
   ‘Max must / may swim to the island. The water is quite deep.’
   (Pure necessity / possibility of p: Realistic Ordering Source)
(2) EPISTEMIC MODAL BASE
a. Max muss / kann täglich schwimmen. Sein Auto parkt immer beim See.
   ‘Max must / may swim every day. His car is always parked near the lake.’
   (Inference from speaker’s knowledge: Strictly Epistemic Ordering Source)
b. Max muss täglich schwimmen – nach dem was seine Freunde sagen.
   ‘Max must swim every day, according to his friends.’
   (Evidence from hearsay: Quotative-evidential Ordering Source)
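Kratzer's two definitions can be stated set-theoretically: if propositions are modelled as sets of possible worlds, a discourse background picks out the worlds in which all of its propositions are true; p is necessary if all of these worlds are p-worlds, and possible if at least one of them is. The sketch below encodes only this modal-base part, assuming a toy universe of four hypothetical worlds for example (2a); it ignores the ORDERING SOURCE, which in Kratzer's full account ranks the worlds of the modal base, and the world labels and evidence sets are invented for illustration.

```python
def background_worlds(background, universe):
    """Worlds in which every proposition of the discourse background holds."""
    worlds = set(universe)
    for proposition in background:
        worlds &= proposition
    return worlds


def necessary(p, background, universe):
    """p follows from the background: all background worlds are p-worlds."""
    return background_worlds(background, universe) <= p


def possible(p, background, universe):
    """p is compatible with the background: some background world is a p-world."""
    return bool(background_worlds(background, universe) & p)


# Hypothetical toy model for (2a); worlds are labels w1..w4
universe = {"w1", "w2", "w3", "w4"}
max_swims_daily = {"w1", "w2", "w3"}

# Strong evidence: the car is at the lake only in worlds where Max swims
strong = [{"w1", "w2"}]
# Weaker evidence: also compatible with a world where Max does not swim
weak = [{"w1", "w2", "w4"}]

print(necessary(max_swims_daily, strong, universe))  # True: supports 'muss'
print(necessary(max_swims_daily, weak, universe))    # False
print(possible(max_swims_daily, weak, universe))     # True: supports 'kann'
```

Swapping the epistemic background for a set of deontic or dispositional propositions leaves the two functions unchanged, which is the sense in which MODAL FORCE and MODAL BASE are independent parameters.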
2.2 Syntactic correlates of semantic polyfunctionality
2.2.1 Raising and control
It is commonly assumed in generative syntax that MVs are control verbs in circumstantial readings and raising verbs in epistemic readings (Ross 1969; v. Stechow and Sternefeld 1988; Roberts and Roussou 1999). A recent variant of the raising hypothesis, proposed by Wurmbrand (1999), assumes that all German MVs are raising verbs in all their readings, which implies that wollen (‘wish to’), being a control verb, does not in fact belong to the class of German MVs. But wollen shares all the features that define German MVs as a class: preterite-present morphology, the embedding of bare infinitives, the coherent construction type and, most importantly, semantic polyfunctionality allowing for bouletic as well as quotative-evidential readings. Hence, the assumption that wollen does not belong to the class of MVs is unconvincing. Raising verbs do not assign a theta-role to their matrix subject and thus leave the subject position empty at deep structure. The subject of the embedded verb must be raised to the subject position of the matrix clause, where it receives case. Control verbs, on the other hand, theta-mark their base-generated matrix subject, and the PRO subject of the embedded clause is controlled by it. These differences in derivational history are reflected in various distributional properties: raising verbs, as opposed to control verbs, embed impersonal constructions (including impersonal passives), allow for expletives as well as sentential subjects, and have truth-functionally equivalent active and passive variants. Applying these diagnostics to German MVs shows that the raising/control contrast cuts across the circumstantial/epistemic MODAL BASE distinction: können, müssen, sollen and dürfen show raising properties in epistemic as well as in most circumstantial readings, whereas wollen behaves as a control verb in all its readings (see Öhlschläger 1989; Reis 2001).
While the raising/control opposition is orthogonal to the distinction between circumstantial and epistemic MODAL BASEs, there is a systematic interaction between raising/control and ORDERING SOURCE. Behaviour with respect to the truth-functional equivalence of active and passive is a case in point. Bouletic readings of müssen (‘must’) in (3) and ability readings of können (‘can’) in (4) have non-equivalent active-passive variants, which corresponds to the control pattern.
(3)
a. Ich freue mich so, dich zu sehen, dass ich dich umarmen muss.
   I enjoy myself so you to see, that I you hug-Inf must
   ‘I am so glad to see you that I have to (urgently wish to) hug you.’
b. Ich freue mich so, dich zu sehen, dass du von mir umarmt werden musst.
   I enjoy myself so you to see, that you by me hugged get-Inf must
   ‘I am so glad to see you that you have to (urgently wish to) be hugged by me.’
(4)
a. Der Zauberer kann ein Kaninchen aus dem Hut zaubern.
   the wizard can a rabbit out the hat produce-Inf
   ‘The wizard is able to produce a rabbit out of his hat.’
b. Ein Kaninchen kann (*ist fähig) von dem Zauberer aus dem Hut gezaubert (*zu) werden.
   a rabbit can by the wizard out the hat produced be
   ‘A rabbit can (*is able to) be produced out of the wizard’s hat.’
Glosses in the examples are based on etymological (but not always semantically equivalent) English cognates. The appropriate English readings are given in the translations.
Deontic active-passive alternants, on the other hand, seem to be truth-functionally equivalent (5a,b).
(5)
a. Das Kind kann / muss / soll den Großvater küssen.
   the child can / must / shall the grandfather kiss-Inf
   ‘The child may / has to / is supposed to kiss the grandfather.’
b. Der Großvater kann / muss / soll von dem Kind geküsst werden.
   the grandfather may / must / shall by the child kissed be
   ‘The grandfather may / has to / is supposed to be kissed by the child.’
c. Der Großvater kann / muss / soll sich von dem Kind küssen lassen.
   the grandfather can / must / shall himself by the child kiss-Inf let
   ‘The grandfather can / must / shall allow himself to be kissed by the child.’
The passive sentence (5b) is, however, ambiguous: it may be interpreted in terms of an obligation addressed either to the child or to the grandfather, as in (5c). Active-passive equivalence holds only for the first interpretation (obligation addressed to the child) and thus depends on the assumption that the obligation is addressed to the referent of the logical rather than the grammatical subject in (5b). The target of the obligation is, of course, irrelevant where the embedded verb lacks a logical subject (6a), or where the ORDERING SOURCE is determined by a given course of events (6b,c) and thus forms a realistic discourse background. MVs clearly behave as raising verbs in these cases.
(6)
a. Es muss / kann heute regnen. (Bouletic, deontic)
   it must / can today rain-Inf
   ‘It must / may rain today.’
b. Wenn der Tank leer ist, kann / muss man ihn füllen.
   when the tank empty is can / must you it fill-Inf
   ‘When the tank is empty, you must / should fill it.’
c. Wenn der Tank leer ist, kann / muss er gefüllt werden.
   when the tank empty is can / must it filled get-Inf
   ‘When the tank is empty, it can / must be filled.’
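The pattern illustrated in (3)–(6) and summarized in the next paragraph can be encoded as a small lookup table. The sketch below is a hypothetical encoding of that summary: it records, for each ORDERING SOURCE, its MODAL BASE and the raising/control behaviour described in the text for the raising-type MVs (können, müssen, sollen, dürfen), and treats wollen as a control verb throughout. It shows that grouping by MODAL BASE mixes both patterns for circumstantial readings, while epistemic readings pattern uniformly with raising. The names and the table are illustrative, not part of the original analysis.

```python
RAISING, CONTROL = "raising", "control"

# ORDERING SOURCE -> (MODAL BASE, attested pattern(s)), following the
# summary of section 2.2.1 (dispositional control is stated for können/müssen)
ORDERING_SOURCES = {
    "dispositional": ("circumstantial", {CONTROL}),     # bouletic/ability readings
    "deontic": ("circumstantial", {CONTROL, RAISING}),  # depends on source/addressee
    "realistic": ("circumstantial", {RAISING}),
    "strictly epistemic": ("epistemic", {RAISING}),
    "quotative-evidential": ("epistemic", {RAISING}),
}


def patterns(verb, ordering_source):
    """Raising/control behaviour of one MV reading (returns a fresh set)."""
    if verb == "wollen":  # control verb in all its readings
        return {CONTROL}
    return set(ORDERING_SOURCES[ordering_source][1])


def patterns_for_modal_base(modal_base):
    """Union of patterns over all ordering sources of a given MODAL BASE."""
    result = set()
    for base, pats in ORDERING_SOURCES.values():
        if base == modal_base:
            result |= pats
    return result


print(sorted(patterns_for_modal_base("circumstantial")))  # ['control', 'raising']
print(sorted(patterns_for_modal_base("epistemic")))       # ['raising']
print(patterns("wollen", "quotative-evidential"))         # {'control'}
```

Keying the table on ORDERING SOURCE rather than MODAL BASE mirrors the conclusion that the raising/control interaction is located at the level of ORDERING SOURCE.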
The raising/control contrast, while cutting across the epistemic/non-epistemic opposition, is thus sensitive to differences in ORDERING SOURCE. Dispositional (bouletic or ability) readings of können and müssen are control readings; deontic readings have control as well as raising potential, depending on the source and the addressee of the obligation. Realistic readings show consistent raising behaviour, which they share with epistemic MV readings. While it is thus true that raising and control interact with the interpretation of MVs, this interaction takes place at the level of ORDERING SOURCE rather than MODAL BASE.
2.2.2 Strict coherence
The evidence presented so far shows that raising vs. control diagnostics do not clearly single out epistemic readings. For a child who has to determine the grammatical basis of epistemicity, raising diagnostics thus provide poor evidence. On the other hand, German MVs behave quite consistently with respect to the following syntactic requirements: MVs (i) uniformly govern the bare infinitive, and (ii) enter into obligatorily coherent
170
Veronika Ehrich
constructions. Reis (2001), therefore, argues that STRICT COHERENCE (defined by the combination of (i) and (ii)) is to be considered the syntactic correlate of semantic polyfunctionality. In a coherent construction, matrix verb and embedded verb are fused into a single verbal complex, with the effect that matrix and embedded clause are integrated into a single clause (7a). As a consequence, extraposition of the infinitival complement is ungrammatical (7b), whereas fronting of MV plus infinitive is possible (7c). (7)
a. ...dass er den Kuchen aufessen darf.
   ... that he the cake up-eat-Inf may
   ‘...that he may eat up the cake.’
b. ...,*dass er darf den Kuchen aufessen. (No extraposition of Inf.)
   ..., that he may the cake up-eat-Inf
   ‘..., that he may eat up the cake.’
c. Aufessen dürfen hat er den Kuchen.
   up-eat-Inf may-Inf has he the cake
   ‘He was allowed to eat the cake up.’
Coherence diagnostics besides extraposition, in particular scrambling and adverbial scope ambiguities (see Kiss 1995), present positive evidence to the child and may help her to recognize the essential features of German MVs on which their polyfunctional semantics is based. English MVs, of course, exhibit the same kind of polyfunctionality, although the coherence/non-coherence contrast does not exist in English. Why, then, should strict coherence and polyfunctionality be interdependent in German? Recall that MVs are auxiliary verbs in English but full verbs in German. Auxiliarization creates bondedness: auxiliary and full verb form a bonded complex. Strict coherence fulfils an analogous function: the integration of MV and bare infinitive into a single verbal complex creates a similarly bonded construction. In other words, strict coherence and auxiliarity are different parametrizations of the same underlying property of bondedness (see Reis 2004).

2.3 The ontogenesis of epistemicity
It is uncontroversial in acquisition research that children produce circumstantial MVs earlier than epistemic ones (Stephany 1995). The first MV productions reported in the literature have bouletic and ability
Linguistic Constraints on the Acquisition of Epistemic Modal Verbs
readings, followed by MVs in deontic interpretations. While first epistemic MVs have been documented for some children as young as 2;6, it is commonly assumed that epistemic readings emerge between age 3;5 and 6 or even later (see Doitchinov 2001 and this volume). Producing and understanding modal expressions in epistemic readings involves certain cognitive prerequisites (Shatz and Wilcox 1991), such as the ability to distinguish between actual and hypothetical states of affairs as well as between facts and mental representations of facts. On this account, the finding that non-epistemic MV readings are acquired earlier than epistemic ones has a straightforward explanation: epistemic reasoning requires the ability to relate the actual world to its possible alternatives, which, in turn, presupposes the availability of a THEORY OF MIND. A child is taken to have developed a THEORY OF MIND as soon as she is able to refer to mental states (desires, beliefs) of her own and to contrast them with actual states of affairs or with assumptions (mental states) of other persons (Shatz et al. 1983). The contrastive use of mental verbs therefore counts as evidence that a child has acquired the cognitive prerequisites for modal reasoning. Recent studies investigating the acquisition of MVs are primarily concerned with THEORY OF MIND as the possible basis of epistemicity (Papafragou 2002). While the case study presented here also reviews the use of mental terms, it focuses primarily on the linguistic prerequisites of epistemicity in relation to the growing command of MV syntax in child language.

3 The corpus study
3.1 The data
The case study to be presented here is based on data from the Caroline Corpus in CHILDES. It concentrates on Caroline’s MV use between age 2;3 and 3;0. This period has been chosen because by age 2;2 Caroline has produced each of the German MVs at least once, and because her first epistemic MVs occur between age 2;6 and 2;10. Caroline starts using MVs at age 1;8. Table 1 documents the order of her first MV productions and their respective semantic interpretations. In general, Caroline’s early MVs confirm what is known from the literature: will, first occurring at age 1;8, is the earliest MV, while kann and muss are acquired last, at age 2;2.
Table 1. Caroline’s first MVs

MV (age at first use)    Reading
will (1;8)               bouletic
darf (1;11)              deontic
mag nicht (2;0)          bouletic
soll nicht (2;0)         deontic
kann, muss (2;2)         ability, deontic
The data to be discussed in the following sections are presented in groups of three successive MLU stages (see Table 2). Caroline’s early MVs are restricted to bouletic and deontic interpretations at stage I, followed by deontic and realistic MVs at stage II. Her first epistemic modals occur by stage III.

Table 2. MLU stages and age groups

             Stage I       Stage II      Stage III
MLU          2.34          3.41          4.42
Age group    2;3 – 2;4     2;5 – 2;8     2;9 – 2;10
There are, in fact, only three clearly epistemic examples in the entire corpus. This raises doubts as to whether the data can be considered substantial evidence at all. It has to be admitted that evidence from child corpora is questionable for several reasons: (i) Audio and video recordings, even if taken at short intervals, present just a selection of a child’s linguistic production. Data from such recordings may be taken as representative of a child’s linguistic abilities if the behaviour under investigation is sufficiently frequent. Verb-placement studies are a case in point. Epistemic MVs are, however, quite infrequent even in adult language. A few instances of some infrequent linguistic type in a child corpus may thus be incidental rather than reflect a child’s systematic command of that type. (ii) Investigators can hardly avoid interpreting a child’s utterances on the basis of their own adult linguistic knowledge. It cannot be ruled out a priori that a child who in fact tries to convey a message m (and is thus cognitively capable of conceptualizing m) is simply unable to make herself unambiguously understood. The reverse failure is possible, too: a child may have encoded her message in a grammatical form suggesting the interpretation m
to the investigator, although m does not correspond to the communicative intention of the child. (iii) Spontaneous speech suffers from all kinds of disruptions and is highly elliptical. This is unproblematic for the investigation of adult language because adults may always be assumed to be competent speakers of their mother tongue. With child language, however, it is often difficult to decide whether an utterance lacks full grammatical shape because the child has not yet fully acquired the construction in question, or whether the deviation is caused by contingent performance restrictions affecting the spoken language of children and adults alike. With these reservations in mind, I will now present what I take to be Caroline’s first epistemic MVs. (8)
*** File "CHI020322.cha": line 260
*CHI: nage scheinlich eine reht [?] # .
‘nai[l] [prob]ably [entered?]’
*MOT: Nagel wahrscheinlich eingetreten ?
‘nail probably trodden in?’
*CHI: ja # scheinlich # vielleicht #1 soll sein # .
‘Yes, [prob]ably, perhaps, may be!’
(9)
*** File "CHI020708.cha": line 383
*CHI: steck die # das ## in Mun Mund ## in Mund Mund xxx # .
‘put the, that into mou[th] mouth, into mouth mouth xxx.’
*CHI: des muss ein #1 mal rund #1 gewesen sein ## weil dis #2 ein Knetgummi#3.
‘This must once have been round, because this (is) a kneading gum.’
(10)
*** File "CHI020918.cha": line 86
*MOT: so Caroline #1 jetzt sag mir noch mal # hast du Hunger #3 du # ?
‘so, Caroline, now tell me again, are you hungry?’
*CHI: gib mir noch mal ## xx xx eine gute Idee mach # .
*CHI: wo ist denn [?] ## ein einen muss da oben sein .
‘Where is the cake? One must be up there.’
File references like ‘CHI020322.cha: line 260’ are to be read as follows: the digits give the child’s age at the time of the utterance; the first two digits refer to years, the next two to months and the last two to days. Thus, Caroline was two years, three months and 22 days old when she produced line 260 of file 020322 in (8).
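The age-coding convention just described can be made explicit with a small decoder. The following Python sketch is illustrative only: it assumes that every age-coded file name ends in six digits (YYMMDD) followed by ".cha", and the names `AGE_CODE`, `age_from_filename` and `format_age` are hypothetical, not part of any CHILDES tool.

```python
import re

# Assumed pattern: any prefix (e.g. the speaker code "CHI"), then a six-digit
# age code YYMMDD, then the ".cha" extension at the end of the name.
AGE_CODE = re.compile(r"(\d{2})(\d{2})(\d{2})\.cha$")


def age_from_filename(filename: str) -> tuple[int, int, int]:
    """Decode e.g. 'CHI020322.cha' into (years, months, days) = (2, 3, 22)."""
    match = AGE_CODE.search(filename)
    if match is None:
        # Do not guess: names without a trailing YYMMDD code are rejected.
        raise ValueError(f"no YYMMDD age code in {filename!r}")
    years, months, days = (int(part) for part in match.groups())
    return years, months, days


def format_age(years: int, months: int) -> str:
    """Render an age in the years;months notation used in the text, e.g. '2;3'."""
    return f"{years};{months}"


for name in ["CHI020322.cha", "CHI020708.cha", "CHI020918.cha"]:
    y, m, d = age_from_filename(name)
    print(name, format_age(y, m), "day", d)
# CHI020322.cha 2;3 day 22
# CHI020708.cha 2;7 day 8
# CHI020918.cha 2;9 day 18
```

The decoder deliberately raises an error instead of guessing when a name does not fit the assumed pattern. A date-style name such as "90-02-17.cha", cited for example (17), is therefore rejected.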
In the following sections, I will discuss whether Caroline’s early epistemics can be explained in terms of one of the competing accounts of epistemicity sketched in section 2 above.

3.2 The auxiliarization hypothesis
Theoretical approaches as different as grammaticalization theory and generative syntax (often) analyze MVs as full verbs in circumstantial readings and (always) as auxiliary verbs (functional categories) in epistemic readings. Acquisition data seem to support this view. It has often been observed that German MVs, as opposed to ordinary full verbs, are generated in second position (V2) even in very early child language, when full verbs still occur mainly in final position. Caroline’s data confirm this picture. She produces MVs in V2 from very early on; see (11) for illustration: (11)
*** File "CHI020121.cha": line 493
*CHI: Ayche # dadze danzen ## .
*MOT: was ist mit der Katze ?
‘what about the cat?’
*CHI: die will tanzen.
this-fem. will dance-Inf
‘She wants to dance.’
The difference between MVs and FVs in terms of verb placement suggests that very young children misanalyze MVs as auxiliaries before recognizing them as full verbs in German (see Clahsen & Penke 1992 for a similar view). Assuming a (hypothetical) Aux-Parameter, initially set to ‘MV is Aux’ (a setting which needs to be reset for German), would explain the verb placement data. The hypothetical Aux-Parameter might be seen as the developmental counterpart of the Auxiliarization Hypothesis. This would imply that the parameter must be reset for circumstantial MVs only, since, under this hypothesis, epistemic MVs are auxiliaries anyway. This is, however, incompatible with the well-established fact that epistemic readings are acquired later than circumstantial ones. If (i) epistemic MVs were auxiliaries anyway, and if (ii) children took German MVs as auxiliaries from early on, they should produce epistemic MVs from early on (provided that auxiliarization is a sufficient condition for epistemicity). The fact that epistemics occur later in acquisition thus has no linguistic explanation under the Auxiliarization Hypothesis, and would call for a non-linguistic, cognitive account. While it is true that Caroline uses MVs in the finite V2 position quite regularly, it is also true that these MV occurrences are in fact finite. Due to
their preterite-present morphology, German modals are endingless in the 1st and 3rd person singular. The lack of inflectional markers on MVs probably facilitates the acquisition of finiteness, and MVs are generated in V2 simply because they are among the first finite verbs in child language (see Jordens 1990, 2002). We compared Caroline’s MVs with her use of non-modal full verbs (FVs) in search of further evidence. The comparison includes all of Caroline’s MVs for the period between age 2;3 and 2;10 (MLU stages I to III) and the FV occurrences from the first two and the last two files of each month in the same period. The result is straightforward: while only half of the FVs are finite, the finite ones tend to occur in finite (V2/V1) position even at stage MLU I.
Figure 1. Distribution of Finite MVs and FVs in Finite Positions (chart titled “Placement of Full Verbs (FV) and Modal Verbs (MV)”: number of occurrences by MLU stage I–III for MV total, MV in V2/V1, FV total, FV-fin, and FV-fin in V2/V1)
In other words, finite (non-modal) FVs and finite MVs do not behave differently with respect to verb placement. Evidence from word order thus does not provide much support for the auxiliarization hypothesis.

3.3 RAISING and CONTROL
The common RAISING/CONTROL diagnostics are not applicable to early child language: young children do not use impersonal constructions or expletives, and active/passive equivalence is hardly testable in corpus data
anyway. Accordingly, any attempt to show whether Caroline does or does not master RAISING by age 2;7, the age at which she produces her first epistemic MVs, will have to rely on alternative, more indirect tests. RAISING and CONTROL constructions differ mainly in the way they generate their matrix subject, and this difference might have a direct reflex in child language. We compared corpus occurrences of wollen (a systematic control verb) and können (a raising verb in most of its readings) with respect to their subjects. Caroline does in fact make a clear distinction between wollen and können in this respect: she omits the subject of wollen more than twice as often as the subject of können (Figure 2).
Figure 2. Percentages of Missing Subjects for Full Verbs (FV), Modal Verbs (MV) and for kann and will (chart titled “Missing Subjects”: percentage by MLU stage I–III for kann, will, MV total and FV)
The proportion of full verbs (FVs) without subjects decreases from 35% at MLU I to 22% at MLU III. During the same period, the proportion of MVs without subjects also decreases, from 51% to 32%. But whereas the percentage of omitted subjects remains almost constant for will, it decreases markedly from MLU I to MLU II for kann. This distribution has a straightforward pragmatic explanation: Caroline uses will mainly in reference to her own wishes, preferences, and needs. While the omission of subjects is common in early child language in general, it is even more frequent when a child refers to ego. In fact, almost 90% of the will-occurrences in
the data are used for reference to ego, as opposed to less than 50% of the kann-occurrences and 50% of the MV-total (see Figure 3).
Figure 3. Proportion of Ego-Reference in the Use of MVs (chart titled “Reference to Ego”: percentage by MLU stage I–III for MV-ego, kann-ego, will-ego and FV-ego)
The difference between kann and will occurrences with overt/non-overt subjects thus seems to have a purely pragmatic explanation. In adult language, subject omission is, however, not only pragmatically motivated, it is also syntactically constrained. A grammatical subject can only be omitted via TOPIC DROP: omission from the topic position (12a) is grammatical, omission from a non-topic position in the middle field (12b) is syntactically deviant. (12)
a. Wie geht es Max_i? (__i) hat gestern angerufen.
   how goes it Max? (__i) has yesterday called
   ‘How is Max? (__i) has called yesterday.’
b. Wie geht es Max_i? *Gestern hat (__i) angerufen.
   how goes it Max? Yesterday has (__i) called
   ‘How is Max? Yesterday has (__i) called.’
Topicalization moves a subject from its home position in Spec-I (or Spec-V) to the topic position. TOPIC DROP occurs when a topicalized phrase is not spelled out in that position. Subject topicalization is impossible where the subject position in Spec-I has not (yet) been filled. Therefore, TOPIC DROP applies to the matrix subject of a raising verb only after the subject of the embedded verb has been raised to the matrix subject position. In other words, RAISING is prior to TOPIC DROP. Control constructions, which base-generate their matrix subjects, impose no such priority constraint on TOPIC DROP. A child who has not yet acquired RAISING will therefore omit subjects of control verbs more easily than subjects of raising verbs. Since kann is a raising verb in most of its readings, the low percentage of subject omissions for kann may be taken to indicate that Caroline has not yet acquired RAISING by age 3. On this account, RAISING is not a necessary prerequisite for her first epistemic MVs. There is, however, an alternative explanation. Whereas the relative amount of missing subjects is almost equally distributed for kann and will at MLU I (50% of all kann / will occurrences are subjectless at MLU I, see Figure 2), there is a remarkable shift at the subsequent stages: the missing subject rate remains constant at 50% for will, but drops to 20% or less for kann as early as stage MLU II. This distribution has a straightforward semantic explanation: will is necessarily interpreted in terms of a subject-internal ORDERING SOURCE, whereas kann, which occurs not only in ability readings (13), but also in deontic (14) and realistic readings (15), allows for either an internal or an external ORDERING SOURCE.
(13)
*** File "CHI020622.cha": line 183
*CHI: guck mal ich schon wach ## xx mit ## aehm # mit # mit # sch # mit # mit # mit ## Hocke # sch #2.
*CHI: ich kann noch ein stehen und schaukeln #1.
I can another one stand and swing
*MOT: wie bitte ## ?
‘pardon?’
*CHI: schaukel ## auch im Stehn kann ich noch # kann ich schaukeln [...]
swing even when standing can I swing
(14)
*** File "CHI020628.cha": line 535 (Deontic Reading)
*CHI: wa kann ich malen ## wann kann ich malen ?
‘whe[n] can I paint, when can I paint?’
*MOT: erst wenn du dich hinsetzt #2.
‘only when you sit down.’
(15)
*** File "CHI020613.cha": line 136 (MOT and CHI are making puppets)
*CHI: da kann man des durchstecken # und ein Clown machen.
there can one that through-put and a clown make
‘You can put that through there and make a clown.’
*MOT: Clown ist schon ziemlich schwer # .
‘A clown is pretty difficult, though.’

The difference with respect to ORDERING SOURCE correlates with overt subject realization. We classified the MVs of the corpus by their ORDERING SOURCE and checked how many occurrences of each class were used with an overt subject. Figure 4 presents the results: deontic readings, which require an external ORDERING SOURCE, are relatively rare at MLU I (13% of all MV occurrences), but their frequency increases (from 25% at MLU II to 34% at MLU III). Realistic MVs, which also have an external ORDERING SOURCE, are the least frequent MVs (no occurrences at MLU I, 7% at MLU II, and 9% at MLU III), but they hardly ever occur without a subject.
Figure 4. Overt Subjects and Ordering Source (percentage of occurrences with an overt subject by MLU stage I–III for dispositional (Dispo-sub), deontic (Deo-sub) and realistic (Real-sub) MVs, all MVs (MV-sub) and FVs (FV-sub))
The distribution of overt subjects is quite different: MVs in general, as well as MVs in dispositional readings, are used with an overt subject in about 50% of their occurrences, whereas the frequency of overt subjects for deontic MVs (with external ordering source) increases from 63% at MLU I to 83% at MLU III. Caroline evidently has the capacity to produce either
full clauses with subjects or elliptical ones without, but she seems to avoid the effort of producing a full clause whenever the semantics of the MV ensures that the intended message will be recognized anyway. Subjects of FVs and dispositional MVs (with internal ordering source) are randomly omitted between MLU I and MLU III, whereas subjects of deontic and realistic MVs (with external ordering source) are spelled out quite regularly by stage MLU II. In other words, the increasing availability of different MV readings goes along with the development of a more elaborate syntax.

3.4 Bare infinitives and strict coherence
According to Reis (2001), STRICT COHERENCE is the syntactic correlate of epistemicity, its defining feature being the embedding of a bare infinitive. Thus, on the STRICT COHERENCE account, Caroline should have acquired the bare infinitive constraint on MVs before she produces her first epistemic MV readings. In fact, even some of her earliest MVs combine with a bare infinitive (see (11) above for illustration), and, by age 2;3, Caroline even distinguishes bare infinitives and zu-infinitives ((16)–(18)). But she is not very consistent in this respect: sometimes she omits the infinitive ending (18), and in 38% to 46% of her MV productions she fails to embed an infinitive at all. (16)
*** File "CHI020302.cha": line 418
*MOT: guck mal wer wohnt denn in dem schwarzen Haus?
‘look, who lives in the black house?’
*CHI: ja #2 ein Dach ## musst du malen #1.
yes a roof must you paint-Inf
‘yes, you must paint a roof.’
*CHI: ich mal #3 Dach # ein Dach #1.
‘I paint roof, a roof.’
(17)
*** File "90-02-17.cha": line 235
*CHI: ja #2 brauch keine Angst zu haben die Ente # .
yes need not be afraid the duck
*MOT: ja aber die Eulen die fressen nämlich manchmal Enten # .
‘yes, but owls do sometimes eat ducks, you know.’

(18)
*** File "CHI020325.cha": line 30
*MOT: aber du kannst zum Beispiel # ne Strumpfhose naehen.
‘but you can sew a pair of tights, for example.’
*CHI: strumpf # trumpf # Hose naeh kann doch nicht # .
panty panty hose sew can yet not
While she is able to obey the bare infinitive constraint in principle, she avoids the infinitive quite frequently. This suggests that integrating MVs and infinitives into a strictly coherent construction poses considerable difficulties for her (compare Table 3).

Table 3. Distribution of Infinitives. The first figure gives the absolute number of MVs in each group, the second figure gives the absolute number of MVs embedding bare infinitives in that group; percentages of embedded infinitives in parentheses.

            MV total          Dispositional     Deontic          Realistic
MLU I       81, 31 (38%)      46, 13 (10%)      10, 9 (90%)      0
MLU II      540, 263 (46%)    340, 123 (36%)    136, 83 (61%)    43, 42 (97%)
MLU III     282, 122 (43%)    127, 57 (44%)     71, 61 (81%)     27, 22 (81%)
Again, there are remarkable differences between the MV readings. Deontic and realistic MV occurrences, which are based on an external ordering source, occur with bare infinitives almost twice as often as dispositional MVs, which are based on an internal ordering source. Evidently, Caroline’s performance on the bare infinitive constraint and her performance on the overt subject requirement follow the same strategy: dispositional MVs occur in more elliptical constructions, whereas deontic and realistic MVs tend to be used in fully integrated structures with overt subjects plus embedded infinitives. We measured Caroline’s MV productions for their degree of integration. MV constructions containing a bare infinitive in addition to an overt subject count as fully integrated (Integration Factor = 1); MVs accompanied by either a bare infinitive or an overt subject count as partially integrated (= 0.5). A zero degree of integration (= 0) is assumed where an MV construction lacks both a bare infinitive and a subject. We calculated the Mean Integration Factor for each MLU stage by adding up the values obtained for the individual MVs at a given stage and dividing the sum by the total number of MVs occurring at that stage. Figure 5 shows that the degree of integration is lowest for MVs in dispositional readings and highest for MVs in realistic readings (0.79 at MLU III).
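The Mean Integration Factor defined above is simple arithmetic, and the following Python sketch restates it under explicit assumptions. Each MV occurrence is represented by a hypothetical `MVToken` record carrying a reading label and two yes/no annotations (overt subject, bare infinitive). The class and function names and the four toy tokens are invented for illustration; they are not Caroline’s data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class MVToken:
    """One modal-verb occurrence (hypothetical annotation scheme)."""
    reading: str           # e.g. "dispositional", "deontic", "realistic"
    overt_subject: bool    # is a subject spelled out?
    bare_infinitive: bool  # is a bare infinitive embedded?


def integration_factor(token: MVToken) -> float:
    """1 for subject + bare infinitive, 0.5 for exactly one, 0 for neither."""
    return (int(token.overt_subject) + int(token.bare_infinitive)) / 2


def mean_integration_factor(tokens: list[MVToken]) -> float:
    """Sum of the individual factors divided by the number of MV tokens."""
    if not tokens:
        raise ValueError("no MV tokens at this stage")
    return sum(integration_factor(t) for t in tokens) / len(tokens)


def mean_by_reading(tokens: list[MVToken]) -> dict[str, float]:
    """Mean integration factor computed separately for each reading."""
    groups: dict[str, list[MVToken]] = defaultdict(list)
    for t in tokens:
        groups[t.reading].append(t)
    return {reading: mean_integration_factor(ts) for reading, ts in groups.items()}


# Invented toy tokens for a single MLU stage (not corpus data).
toy_stage = [
    MVToken("deontic", overt_subject=True, bare_infinitive=True),          # 1.0
    MVToken("deontic", overt_subject=True, bare_infinitive=False),         # 0.5
    MVToken("dispositional", overt_subject=False, bare_infinitive=True),   # 0.5
    MVToken("dispositional", overt_subject=False, bare_infinitive=False),  # 0.0
]

print(mean_integration_factor(toy_stage))  # 0.5
print(mean_by_reading(toy_stage))          # {'deontic': 0.75, 'dispositional': 0.25}
```

Multiplying these values by 100 gives the scale used in Figure 5, where Integration Factor 1 = 100. Grouping by reading corresponds to the separate curves for dispositional, deontic and realistic MVs.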
Figure 5. Integration Factors for MVs in Different Readings (chart titled “Integration Factors”: mean integration factor by MLU stage I–III, scaled so that Integration Factor 1 = 100, for MV-total, MV-disp, MV-deo and MV-real)
These data, again, are evidence for a close interaction between the syntax and the semantics of MVs in child language. This does not necessarily entail that syntax is the source of MV semantics or vice versa. It may very well be the case that semantic and syntactic capacities, while having developed separately up to a certain time (each in its own way and temporal order), converge at a certain point in bringing about a growing variety of MV readings with their specific syntactic shapes.
3.5 Evidence for a developing THEORY OF MIND
In order to find out whether Caroline had developed a THEORY OF MIND when producing her first epistemic MVs, the corpus was checked for occurrences of mental verbs like wissen (‘know’), denken (‘think’), meinen (‘mean’), verstehen (‘understand’), glauben (‘believe’), finden (‘judge’) and vergessen (‘forget’). These verbs are used in reference to mental states in 40% of their overall occurrences (Benz 2004). See (19) for illustration:
(19)
*** File "CHI020413.cha": line 15
*MOT: sprichst du mit deiner Puppe #1?
‘are you talking to your doll?’
*CHI: ja # ja #1 [=! stoehnt] .
‘yes, yes [groans]’
*CHI: kann nich hinstelln # .
‘can’t stand (it) up’
*CHI: weisst du genau #6.
know you exactly
Caroline’s use of sentential adverbs like vielleicht (‘perhaps’) is further evidence of her capacity for modal reasoning. She uses vielleicht in a deliberative function at almost the same age at which she produces her first epistemic MVs (20). (20)
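*** File "CHI020814": line 81
*MOT: und da # hat er da sein Taschentuch #1?
‘and there, does he have his handkerchief there?’
*CHI: nein ein Baby #1. [...]
‘no, a baby.’
*MOT: und warum sitzt es da #1?
‘and why is it sitting there?’
*CHI: vielleicht # ist da #2 in Papis Bauch #2.
perhaps is there [a baby] in daddy’s belly
*MOT: aber Papis haben keine Babys im Bauch ##
‘but daddies don’t have babies in their bellies.’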
Caroline talks about consequences resulting from a possible action and produces her first conditionals. The one in (21), though connected to an ongoing action, shows that she starts reflecting on alternative futures resulting from her actions by age 2;7. (21)
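*** File "CHI020710.cha": line 237
*MOT: dis # ah ja ich probier es jetzt mal nur mit falten .
‘this, ah yes, I’ll just try it with folding now.’
*CHI: nein dis # brauchst du ## zum Kleben #3.
‘no, this you need for gluing.’
*CHI: wenn dis #gar nich geht ## machen wir ohne ##Klebe #1.
if this not works do we without glue
‘if this doesn’t work at all, we’ll do it without glue.’
*CHI: ich mach #1 [=! stoehnt].
‘I do [groans].’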
There is, thus, firm evidence that Caroline has acquired an elementary THEORY OF MIND by age 2;7. She has not only acquired inferencing capacities but is also able to express her reasoning in appropriate linguistic terms.
4 Conclusion
MV-acquisition studies of the last twenty years have been mainly concerned with cognitive constraints on the rise of epistemic meanings, whereas the form-meaning correlation has hardly been tackled. By contrast, the present study focuses on the interaction between the syntax and the semantics of modal verbs as evidenced by Caroline’s development. Caroline’s production of MVs shows that form and meaning of MVs are indeed tightly connected. This connection is not primarily a function of the MODAL BASE contrast between circumstantials and epistemics, but seems to depend on the contrast between INTERNAL (ability and bouletic readings) and EXTERNAL ORDERING SOURCES (deontic, realistic and epistemic readings). Before producing her first epistemic MVs at age 2;7, Caroline starts varying her syntax for circumstantial MVs. While elliptical constructions lacking a subject phrase, a bare infinitive, or both, are predominant in bouletic and ability readings of MVs even beyond age 2;10, Caroline uses a more elaborate syntax for deontic and realistic readings from age 2;4 onwards. The increase in semantic MV variation goes along with the production of more full-fledged syntactic structures. Caroline’s growing capacity for handling semantic polyfunctionality and her growing command of MV syntax converge in the period from age 2;4 to 2;10. This is also the age at which she produces her first epistemic MVs. But syntactic development in general, and Caroline’s growing command of strict coherence in particular, are probably not the only source of epistemicity. The fact that reference to mental states, first epistemic adverbs and conditionals temporally overlap with her first epistemic MVs indicates that the cognitive basis for modal reasoning develops across various grammatical categories. Obviously, syntactic progress, semantic diversification and cognitive development are all necessary prerequisites for the rise of epistemicity, but none seems to be sufficient by itself.
The data reported here do not support any monodirectional account in terms of syntactic vs. semantic bootstrapping, nor one in terms of strict cognitivism. Caroline’s first epistemic MV uses seem to arise from converging developments in syntax, semantics and cognition. She makes use of whatever evidence is available to her in order to gain access to the grammar of MVs, and of whatever capacity she has in order to make herself understood. This is, perhaps, just the way language development works.
Acknowledgements
I would like to thank the editors and two anonymous reviewers for their helpful comments and suggestions. I am greatly indebted to the members of the ‘Modal Verb Project’ in the SFB 441, especially to Marga Reis, who shared their ideas about the syntax, semantics, and acquisition of modal verbs with me.
References
Benz, Judith 2004 Epistemische Ausdrücke in der Kindersprache. Zulassungsarbeit, Tübingen: Deutsches Seminar.
Clahsen, Harald and Martina Penke 1992 The Acquisition of Agreement Morphology and its Syntactic Consequences: New Evidence on German Child Language from the Simone-Corpus. In Jürgen M. Meisel (ed.), The Acquisition of Verb Placement, 181–223. Dordrecht: Kluwer.
Doitchinov, Serge 2001 „Es kann sein, dass der Junge ins Haus gegangen ist“. Zum Spracherwerb von können in epistemischer Lesart. In Reimar Müller and Marga Reis (eds.), Modalität und Modalverben im Deutschen. Linguistische Berichte, Sonderheft 9: 109–134. Hamburg: Buske.
   2005 Why do children fail to understand weak epistemic terms? An experimental study. [This volume].
Jordens, Peter 1990 The Acquisition of Verb Placement in Dutch and German. Linguistics 28: 1407–1448.
   2002 The acquisition of verb placement in Dutch and German. Linguistics 40: 687–765.
Kiss, Tibor 1995 Infinitive Komplementation – Morphologie, thematische und syntaktische Relationen. Neue Studien zum deutschen verbum infinitum. Tübingen: Niemeyer.
Kratzer, Angelika 1991 Modality. In Arnim v. Stechow and Dieter Wunderlich (eds.), Semantik. Ein internationales Handbuch der zeitgenössischen Forschung, 639–650. Berlin/New York: de Gruyter.
MacWhinney, Brian B. 2000 The CHILDES project: Tools for Analysing Talk. Third edition. Mahwah, NJ: Lawrence Erlbaum Associates. http://childes.psy.cmu.edu
Müller, Reimar and Marga Reis (eds.) 2001 Modalität und Modalverben im Deutschen. Linguistische Berichte, Sonderheft 9. Hamburg: Buske.
Öhlschläger, Günther 1989 Zur Syntax und Semantik der Modalverben des Deutschen. Tübingen: Niemeyer.
Papafragou, Anna 2002 Modality and theory of mind. Perspectives from language development and autism. In Sjef Barbiers, Frits Beukema, and Wim v.d. Wurff (eds.), Modality and its interaction with the verbal system, 185–204. Amsterdam: Benjamins.
Reis, Marga 2001 Bilden Modalverben im Deutschen eine syntaktische Klasse? In Reimar Müller and Marga Reis (eds.), Modalität und Modalverben im Deutschen. Linguistische Berichte, Sonderheft 9: 287–318. Hamburg: Buske.
   2004 Modals, so-called Semi-Modals and Grammaticalization. Unpublished ms., Tübingen University.
Roberts, Ian and Anna Roussou 1999 A formal approach to grammaticalization. Linguistics 37: 1011–1041.
Ross, John 1969 Auxiliaries as main verbs. In William Todd (ed.), Studies in Philosophical Linguistics. Series I: 77–102. Evanston, Ill.: Great Expectations Press.
Shatz, Marilyn, Henry M. Wellman, and Sharon Silber 1983 The acquisition of mental verbs. A systematic investigation of the first reference to mental state. Cognition 14: 301–321.
Shatz, Marilyn and Sharon A. Wilcox 1991 Constraints on the acquisition of English modals. In Susan A. Gelman and James P. Byrnes (eds.), Perspectives on language and thought: interrelations in development, 319–353. New York: Cambridge University Press.
Stechow, Arnim v. and Wolfgang Sternefeld 1988 Bausteine syntaktischen Wissens. Ein Lehrbuch der generativen Grammatik. Opladen: Westdeutscher Verlag.
Stephany, Ursula 1995 Function and Form of Modality in First and Second Language Acquisition. In Anna Giacalone Ramat and Grazia Crocco Galeas (eds.), From Pragmatics to Syntax. Modality in Second Language Acquisition, 105–120. Tübingen: Narr.
Wurmbrand, Susanne 1999 Modal verbs must be raising verbs. In Proceedings of the 18th West Coast Conference on Formal Linguistics (WCCFL 18): 599–612. Somerville, MA: Cascadilla Press.
The Decathlon Model of Empirical Syntax
Sam Featherston
1 Introduction
This paper reports our investigations into the data base of syntactic theory, specifically addressing the similarities and differences between corpus data and experimentally obtained well-formedness judgements, and sketching the implications of our findings for the construct of grammaticality and the architecture of the grammar. The motivation for these studies was a dissatisfaction with the state of affairs in syntax, where two syntacticians can look at the same phenomenon and come up with widely differing analyses of what is going on. Another disappointment is the lack of any real forward movement in theory: alternative analyses seem to succeed each other more due to fashion than due to falsification. We might say that syntactic description, let alone syntactic explanation, is underdetermined by its data base. This is in part due to the nature of the available evidence: most data feeding into syntactic theory has significant flaws: it is fuzzy and it reflects multiple factors, only some of which are relevant to theory. These factors are difficult to identify and even more difficult to distinguish (e.g. Schütze 1996). Judgements have been particularly criticized as a data type, partly because of their inherent qualities, but partly for the way that they have been used (e.g. Labov 1996). One problem is that, faced with the imprecision of judgement intuitions, researchers have idealized this data type to a very great degree, reducing the scale to a binary opposition, with marginal values as unclear cases. In part as a response to this situation, some syntacticians have sought other data sources, such as corpus frequencies and processing studies; this has tended to split the field and has furthered the development of schools of syntax which have neither a common formalism nor a common data base that would permit rapprochement between them. Related to this diversification into schools, a range of different grammar architectures has arisen.
Generative syntacticians most commonly still use judgements, and assume a “live rail” grammar, in which any infringement of a grammatical rule causes a structure to be excluded absolutely. Those interested in competition models such as Optimality Theory (OT, Prince and Smolensky 1993) tend to use frequency data and allow some idealization, while those favouring probabilistic models tend to take a more fine-grained approach to frequency data and account for the variants using probabilities (eg Manning 2003). It need hardly be said that these models of the architecture of the grammar cannot all be right. Our view was that a more detailed study of data types and their characteristics might provide a way forward. Once we understand in more detail which factors each data type reflects, and thus how the different types relate to each other, we shall be in a better position to judge the evidence each provides for syntax. This should also allow us to establish a well-founded procedure for idealizing each. With more finely grained data, we should be better placed to determine how the grammar functions, and therefore which architecture is the correct one. In the following, we first sketch the sort of studies we have undertaken and outline the broad picture of the results, and then move on to the implications of these findings for the relationship between data and theory and for the nature of the grammar. It turns out that the grammar has a rather different architecture from what is generally assumed. Note that this article aims to provide an overview of our results and of our interpretation of their wider implications for theory; space does not permit discussion of the individual studies (see Featherston 2002, 2003, 2004, 2005a, 2005b).
2 Our studies
We have carried out many studies using both frequencies and judgements, aiming firstly to clarify these issues of data and data type, secondly to clear up outstanding questions in the syntax, and thirdly to clarify the nature of the grammar.
We have performed experiments on German and English, addressing a range of syntactic structures, among others island constraints, reflexives, reciprocals, word order, parenthetical insertions and echo questions. Our frequency data for German is drawn from the COSMAS I corpus of German (IDS, Mannheim), and for English from the British National Corpus (Oxford). We have generally elicited our judgement data using a variant of the magnitude estimation procedure (Bard et al. 1993). This method differs from standard judgement elicitation in three main respects. First, only relative judgements are gathered. Subjects are asked whether example A sounds more or less “natural” than example B, and by how much, but no absolute criterion of well-formedness is used. This distinction between relative and categorical judgements is important (see Section 5.2 below), but it also has the simple
practical advantage that it defuses the problem of having to define a cut-off point between well-formed and ill-formed. Second, to anchor the judgements, subjects give their judgements relative to reference items and to their own previous judgements. Third, there is no imposed scale: there is no upper or lower limit and no minimum division between scores. Judgements are expressed in numerical form, and decimal fractions are allowed. This method allows informants to express all the differences in “naturalness” that they perceive, without coercion to a given scale. When the limitation to a scale selected by the linguist is removed, the results exhibit more differentiation than conventional judgements are assumed to contain. But this additional information is an inherent part of grammaticality judgements and was always potentially present. Previous collection methods were insufficiently sensitive to reveal this detail, and deliberately so, since asking for categorical judgements is a form of idealization: it simplifies the explanatory task by reducing the amount of information gathered. In this paper we argue that this idealization has not only, as intended, simplified the job of explanation, but has also distorted the picture and led to some false conceptions of the way that the grammar and associated systems work.
3 Relative judgements
In this section we sketch out the general pattern shown by the results of the judgement studies. This is necessary, since these experimentally obtained judgements reveal a very different pattern from that often assumed (but see Keller 2000 and Cowart 1996 for discussion and much insight). Firstly, judged well-formedness is a continuum. Figure 1 shows the results of a typical experiment gathering judgements from informants who are not forced to use a particular scale.
In the graph, the four syntactic conditions tested appear along the horizontal axis and the mean judged well-formedness on the vertical axis, with higher scores reflecting “better” judgements. The error bars show the mean value and 95% confidence interval of the scores for each condition. Let us be clear that these mean judgements can show no differential effect of lexis, context or plausibility, since these factors are fully controlled for; the error bars show only effects of structure. This experiment looked at the effect of discourse linking (Pesetsky 1987), which one might loosely paraphrase here as the effect of wh-item type on the permissible order of wh-items in multiple wh-questions. The standard view of the data is that (1a), (1b), and (1d) are good, but (1c) is ungrammatical, so syntacticians look for a factor affecting (1c).
Figure 1. Given freedom of judgement scale, informants do not just distinguish ‘good’ and ‘bad’ structures, but also ‘good’ ones and ‘better’ ones.
(1)
a. Who ate what?
b. Which person ate which food?
c. *What did who eat?
d. Which food did which person eat?
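To make the measurement side concrete, here is a minimal Python sketch of how free-scale magnitude estimation scores might be normalized and then summarized as means with 95% confidence intervals, the quantities plotted as error bars in Figure 1. The log-ratio normalization against each participant's own reference score, the normal-approximation interval (1.96 standard errors) and all the numbers are assumptions for illustration, not the procedure or data of the studies reported here.

```python
import math
import statistics

def log_ratio_scores(raw_scores, reference_score):
    # Express each judgement relative to the participant's own score for the
    # reference item: 0 = as natural as the reference, >0 better, <0 worse.
    return {cond: math.log(score / reference_score) for cond, score in raw_scores.items()}

def mean_with_ci(values, z=1.96):
    # Mean with a normal-approximation 95% confidence interval (assumption;
    # with very few participants a t-based interval would be wider).
    m = statistics.mean(values)
    half = z * statistics.stdev(values) / math.sqrt(len(values))
    return m, m - half, m + half

# Hypothetical raw judgements on a free scale (decimals allowed); each
# participant chooses their own number for the reference item.
participants = [
    {"reference": 20, "scores": {"1a": 30, "1b": 45, "1c": 8, "1d": 28}},
    {"reference": 50, "scores": {"1a": 70, "1b": 110, "1c": 15, "1d": 60}},
    {"reference": 10, "scores": {"1a": 14.5, "1b": 22, "1c": 3, "1d": 12}},
]

# Pool the normalized scores by condition across participants.
by_condition = {}
for p in participants:
    for cond, value in log_ratio_scores(p["scores"], p["reference"]).items():
        by_condition.setdefault(cond, []).append(value)

summary = {cond: mean_with_ci(vals) for cond, vals in sorted(by_condition.items())}
for cond, (m, lo, hi) in summary.items():
    print(f"({cond}) mean={m:+.2f}  95% CI=[{lo:+.2f}, {hi:+.2f}]")
```

Because the log ratio is computed per participant, informants who chose very different numbers for the reference item still end up on a common scale.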
The results of this study confirmed that (1c) is plainly worse than the other conditions, but the data also reveals that (1b) is not only good, but also clearly better than (1a) and (1d). What is more, (1b) is just about as much better than (1a) as (1d) is better than (1c). The factor we should be looking for therefore applies to both (1b) and (1d), that is, to the two conditions with which-phrases. If we use a model of well-formedness idealized to a binary opposition, in which (1a), (1b), and (1d) are all simply good, not only do we do serious violence to the data, but we will also be looking in the wrong place for the correct syntactic account. To deal with this data, we need a model of well-formedness as a continuum, on which there is not only good and bad, but also good and better. A model with good, bad, and intermediate positions (such as the question-marked example structures familiar from syntax papers) will not suffice here. It follows that there are cases where the correct syntactic analysis of a structure can only be represented with a model of well-formedness as a continuum. In the following, we illustrate our points about the data with reference to a particular piece of work in which both judgements and frequencies were collected. Let us be clear that this is just one example study, but other studies of the same type show the same basic patterns. The focus of this work was the realizations of coreferent objects in the Mittelfeld in German. Example (2) shows just eight of the possibilities (four variants, each with and without selbst); we also tested full NPs as antecedents. Full linguistic details of this work are given in Featherston (2002), but they are not needed to follow the present paper. Note that we tested 16 conditions in the original study, but for clarity we shall sometimes report on just eight of them here.
(2)
a. weil ich ihmᵢ ihnᵢ (selbst) im Spiegel gezeigt habe
   as I him.DAT him.ACC self in.the mirror shown have
b. weil ich ihnᵢ ihmᵢ (selbst) im Spiegel gezeigt habe
   as I him.ACC him.DAT self in.the mirror shown have
c. weil ich ihmᵢ sichᵢ (selbst) im Spiegel gezeigt habe
   as I him.DAT REFL self in.the mirror shown have
d. weil ich ihnᵢ sichᵢ (selbst) im Spiegel gezeigt habe
   as I him.ACC REFL self in.the mirror shown have
Figure 2. The data pattern from judgement studies. Given the choice, informants do not choose to bunch structures as good or bad; instead they produce a continuum of well-formedness.
Figure 2 shows the results of this study, this time with eight conditions. Here we have ordered the conditions not by their linguistic features (for these, see Figure 3 below), but by their judged well-formedness, from best to worst. The judged well-formedness of these structures descends gradually. Given this continuum, any choice of a cut-off point between well-formed and ill-formed would be arbitrary. These examples straddle the putative location of the cut-off point, since the best among them would be regarded as well-formed and the worst ones as ill-formed. In fact, informants show no sign of using categorical well-formedness when given the option not to; instead they always use a continuum. We put forward our explanation of the intuition of categorical well-formedness in Section 5.2.
Figure 3. This graph shows the results of the same judgement study on eight structural variants defined by three binary features. All syntactic features have an effect upon the judgements, and these effects are cumulative.
We now turn to Figure 3, which shows the same data set but with the conditions ordered by their grammatical features. This was a 2x2x2 experimental design; that is, we tested eight structural variants differing on three binary parameters, a, b and c. The graph shows the conditions across the horizontal axis and the mean judged well-formedness on the vertical axis, with higher scores indicating better judgements. Each pair of error bars linked by a line is a minimal pair differing only in one of the three features a, b and c. In each case one member of the pair violates a constraint and the other does not. We label the conditions in the graph with the numbers 1 to 8 for easy identification, and the values of the three syntactic features for each condition are given on the baseline. For example, condition 1 has the values a:1, b:0 and c:0, which means that it violates constraints b and c, but not a. Let us look at the pattern of data. The clear finding is that well-formedness judgements directly correspond to the syntactic conditions: the conditions are judged well-formed to the extent that they do not violate the syntactic constraints, and any and every constraint which is violated affects the judgements. It is evident that each constraint differentiating a minimal pair has a consistent effect upon the judgements: the relationship between the scores assigned to each pair differentiated by a given constraint is the same. So the relationship between conditions 1 and 2 is the same as between 3 and 4, between 5 and 6, and between 7 and 8. Put differently, the relation between all 1bc and all 0bc is consistent. Notice also that this holds generally: for each pair ab0 and ab1, and a0c and a1c, the relationship is the same. Whether the structure was good or bad before does not matter: the application of these constraint violation costs is blind and automatic. Notice also that the pairs close to each other (eg 1 and 2, 3 and 4, ...), linked by the short broken line, differ in their ratings only moderately, which shows that this particular constraint has a relatively small violation cost. The other two sets of minimal pairs (1 and 3, 2 and 4, etc, and 1 and 5, 2 and 6, etc) have greater violation costs, and consistently so, but the systematicity is just as evident. The violation of a given linguistic constraint entails a given difference in judgements. We can say that each linguistic factor has a quantifiable and constraint-specific effect upon judged well-formedness. An additional important point (Keller 2000) is that these violation costs are cumulative. The violation of any constraint entails a violation cost in judged well-formedness, to which any further violation costs are added. It is thus systematically the case that more violations cause a structure to be judged worse. This raises the question whether any of these constraints could be regarded as a ‘hard’ constraint. The traditional definition of a ‘hard’ constraint is perhaps one which excludes a structure from being part of the language.
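The pattern just described (constraint-specific, cumulative, blindly applied costs with no floor) amounts to an additive model, sketched below in Python for the 2x2x2 design. The cost values and baseline are hypothetical; only the coding convention (1 = constraint satisfied, 0 = violated) and the relatively small cost of feature a are taken from the description of Figure 3.

```python
from itertools import product

# Hypothetical violation costs for the three binary features of Figure 3.
# Feature value 1 = constraint satisfied, 0 = constraint violated.
# a gets the smallest cost, mirroring the short broken lines in the figure.
COSTS = {"a": 0.2, "b": 0.6, "c": 0.9}
BASELINE = 1.0  # hypothetical score of a structure that violates nothing

def predicted_score(features):
    """Cumulative, blind cost model: every violated constraint subtracts its own
    cost, whatever else the structure violates. There is no floor, so another
    violation always makes a structure worse (survivability)."""
    return BASELINE - sum(COSTS[name] for name, value in features.items() if value == 0)

# All eight conditions of the 2x2x2 design, keyed by (a, b, c).
scores = {
    values: predicted_score(dict(zip("abc", values)))
    for values in product((0, 1), repeat=3)
}

def pair_differences(feature):
    """Score gain from satisfying `feature`, for each minimal pair differing only in it."""
    i = "abc".index(feature)
    diffs = []
    for key, score in scores.items():
        if key[i] == 1:
            partner = key[:i] + (0,) + key[i + 1:]
            diffs.append(round(score - scores[partner], 10))  # round away float noise
    return diffs

for feature in "abc":
    print(feature, pair_differences(feature))
```

Each feature yields four identical differences, equal to its cost, whether or not the pair also violates the other constraints; this is the consistency across minimal pairs visible in Figure 3.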
In judgement data we might expect a ‘hard’ constraint to cause a violating structure to drop to the bottom of the scale. One might predict that a ‘hard’ constraint would cause a structure to be judged so bad that no additional constraint violation could make it any worse. Perhaps surprisingly, experience with this data type has shown that there is apparently no such thing as a ‘hard’ constraint on this definition. The effect of a violation is only ever to make a structure worse, by an identifiable amount; no constraint violation makes a structure so bad that it cannot be made worse by an additional violation. We refer to this quality of linguistic constraints in judgement studies as survivability, which is best understood in contrast to the OT concept of violability. OT’s violability means that under certain circumstances constraints have no effect on the output, that is, they fail to apply. This is in part necessary because the only effect that a constraint can have in OT is to exclude the violating structures categorically; OT only has ‘hard’ constraints. Our
survivability means that all constraints always apply, exceptionlessly, and a given violation always has the same effect; there is no probabilistic element at all. The effect of a constraint violation is to cause a structure to be judged worse, but no violation excludes a structure. We lay out what can exclude a structure in Section 5.2 below. Notice that this is strong evidence that our informants are not using occurrence as a criterion when they give a judgement; they must be assumed to be responding to something else. Violation costs as measured in judgements of well-formedness are cumulative, survivable and blindly applied, but, as we shall see in Section 5.1 below, they are not directly related to output frequency. In the light of this, it seems reasonable to assume that these violation costs, and hence well-formedness as measured by judgements, are related to computational workload. This raises questions about what psycholinguistically plausible mechanism might allow us to convert cognitive workload into judgements, and why we have such an ability. We take up these questions in Section 5.2, but we now turn to consider the evidence of frequency data.
4 Judgement data and frequency data
Frequency data reveals a very different pattern. Table 1 contains the data pattern of the frequency study looking at the same variants of object coreference structures as those judged in Figures 2 and 3. The important point here is the distribution of forms found in the corpus: one structure is found fourteen times, another one is found once, but none of the others appear at all.
Table 1. Data from COSMAS, IDS, Mannheim (531 million word forms)
Sequence            Gloss                     Hits
ihnᵢ ihmᵢ           him.ACC him.DAT           0
ihmᵢ ihnᵢ           him.DAT him.ACC           0
ihmᵢ sichᵢ          him.DAT REFL.ACC          0
ihnᵢ sichᵢ          him.ACC REFL.DAT          1
ihnᵢ ihmᵢ selbst    him.ACC him.DAT SELF      0
ihmᵢ ihnᵢ selbst    him.DAT him.ACC SELF      0
ihmᵢ sichᵢ selbst   him.DAT REFL.ACC SELF     0
ihnᵢ sichᵢ selbst   him.ACC REFL.DAT SELF     14
Frequency data shows evidence of a competitive interaction of candidate forms, which would seem to indicate that the “best” structure of a comparison set usually wins through to be produced. Intuitively, this seems to be evidence
that there is a competition function in the grammar, which Optimality Theory in particular has raised to its central operating principle. Interestingly, slightly less “good” alternatives are sometimes produced, which would suggest that the competition for output functions probabilistically. This is the motivation for stochastic versions of OT (eg Boersma and Hayes 2001).
Figure 4. The contrast between COSMAS frequency data and experimental judgement data on the same phenomenon.
Figure 4 allows us to compare the two data patterns directly, as it superimposes the two different measures of the same sixteen structures on a single graph. The error bars show the mean normalized judgements obtained for the sixteen structures tested (left-hand scale). These can be seen to increase steadily from the very bottom to the very top, while the frequencies (right-hand scale), represented by the line without error bars, creep along the bottom at zero and only rise sharply at the right-hand end. The comparison of these two measurements of the same structures brings the contrast between the data patterns into sharp focus. The first point to notice is their similarity: the same structures come top in both data types. The highest-frequency structure is judged best and the next highest is judged second best, which makes it seem likely that the two data types are at least in part measuring the same underlying factor. But we should also note the key difference: the judgement data demonstrates that at least some part of the human linguistic computation mechanism is sensitive to differences among structures which are so bad that they would never be produced, for the structural variants on the left are surely so bad that they would never appear in any corpus, no matter how big. Since this is the case, it is plain that the two data types
are also in part not measuring the same factor. We can therefore categorically exclude the possibility that relative judgements merely reflect frequency or probability of occurrence in some way. The attested frequency and probability of occurrence of the worst two thirds of these structural variants is exactly the same: zero. These structures have in all likelihood never been used in all of human history, but our subjects can readily distinguish them in judgements, and do so very consistently. Our Decathlon Model of well-formedness and the architecture of grammar attempts to specify what process differentiates the two types of data, frequencies and relative judgements.
5 The Decathlon Model
The name of this model derives from the athletic discipline of the decathlon. In this event, competitors take part in ten different sub-disciplines, and their performances are converted into numerical form according to a set of standard scoring tables. The sum of these scores decides who wins the medals. But the scores are calculated not on the athletes’ relative performance in the sub-disciplines but on their absolute performances, which means that whether an athlete comes first, second, or third in a sub-discipline is of no significance; what matters is that they perform at their personal best. In a sense, therefore, they are not so much competing against each other at this stage as against themselves. Competition between competitors takes place at the second stage, where the ten numerical scores are totalled and the highest scorer takes the gold. Something similar seems to us to be happening in human linguistic processing, as will become clear in this section. The Decathlon Model is at once an outline architecture of a grammar and an account of the differences between data types.
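The two-stage logic of the analogy can be sketched in a few lines of Python. The athletes, the three events shown (out of ten) and the point values are hypothetical, and the conversion of times and distances into points by the official scoring tables is not modelled; the sketch only shows that placings within an event play no role and that competition happens over the totals.

```python
# Hypothetical points for three athletes in three of the ten events.
# Points come from absolute scoring tables, not from placings.
EVENTS = ["100 m", "long jump", "shot put"]
points = {
    "Athlete A": [950, 900, 800],
    "Athlete B": [940, 890, 850],
    "Athlete C": [960, 700, 860],
}

# Stage 1: each event is scored on absolute performance; who "wins" an event
# is computed here only to show that it does not decide the outcome.
event_winners = {
    event: max(points, key=lambda name: points[name][i])
    for i, event in enumerate(EVENTS)
}

# Stage 2: competition happens only over the totals.
totals = {name: sum(scores) for name, scores in points.items()}
overall_winner = max(totals, key=totals.get)

print("event winners:", event_winners)
print("totals:", totals)
print("overall winner:", overall_winner)
```

Here Athlete B wins no single event but has the highest total, which is the sense in which the first stage is a contest against oneself rather than against the other competitors.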
Our finding that gradience reflects a real psychological phenomenon related to constraint violation cost (see Section 3) demands that the architecture of syntax reflect this reality, which current models generally do not. To account for our judgement data, an empirically adequate and psychologically real grammar must have the following features: quantifiable violation costs, a continuum of well-formedness, and survivable constraints (ie no constraint violation necessarily results in the exclusion of the violating structure from the language). It must also generate output competitively and probabilistically, so as to reflect the data patterns observed in frequencies. The obvious way to achieve this is for our syntax model to distinguish between a grammatical module which applies syntactic constraints and another which selects output. Our Decathlon Model thus has a Constraint Application module, which applies constraints, assigns violation costs, and outputs form/meaning pairs weighted with violation costs. We know certain things about the internal functioning of this module: constraints are applied blindly and exceptionlessly, and violation costs are cumulative. We may think of this module as containing the grammar, though it also contains the other factors which affect well-formedness in judgements. The second module, Output Selection, functions quite differently. Its task is to select, from the possible form/meaning pairs, the form which is to be output (in production processing) or the interpretation to be assigned to an input (in receptive processing), and to exclude the others. It functions competitively and selects the best candidate on the basis of the weightings assigned by the Constraint Application module. This selection occurs probabilistically, however, which accounts for the occasional production of sub-optimal versions.
Figure 5. The Decathlon Model of the grammar and grammaticality.
In Figure 5 we see the computational steps which generate frequency data and judgements. In production we assume that an unformed message is delivered for formulation in the Constraint Application module, drawing on the resources of the lexicon. Incrementally, perhaps phrase by phrase, candidates for the linguistic representation are proposed, together with their weightings, to the Output Selection function, which selects the best, or one of the best. The arrows exiting the left-hand module show the candidate continuations of the structure passing to the selection module, their weightings represented by their offset positions. Sometimes two continuations will be roughly equally good, as in She turned the light off vs She turned off the light; in this case both will have about equal weightings and both will occur. Receptive processing makes use of the same two modules, with Output Selection choosing in this case what form/meaning pair to assign to a given form (the input), rather than
choosing what form/meaning pair to assign to a given meaning (the message the speaker wishes to convey). Giving judgements is a little different. The example is input and processed as usual to determine its structure and meaning, but instead of the output of the selection module being returned, relative judgements consist of the output of the Constraint Application function. Recall that Constraint Application outputs form/meaning pairs with a weighting. This of course requires the claim that the output of this module can be consciously accessed, as well as merely passed on as usual for selection. The capacity to be aware of fine-grained cognitive workload is not something we might have predicted, but it is nevertheless not implausible, since we are certainly aware of more coarse-grained thinking effort. The difference between frequency measures and relative judgements can therefore be attributed to their being the outputs of two different modules of linguistic processing, both of which are independently motivated. This model has a number of explanatory advantages. First, it is firmly based on the primary data of syntax. It accounts for the differences in outcome patterns between data types, an outstanding question in linguistics. Frequency data reflects the output of the Output Selection module, which is necessarily competitive, since we produce only one form of an utterance. Since this module uses the weightings output by the Constraint Application module, we account for the fact that judgements and frequencies agree in identifying the same forms as optimal. These weightings are themselves functionally motivated by their identification with computational complexity, an explanatorily economical association, since we know of the existence of workload effects from other sources, such as processing data.
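The division of labour between the two modules can be sketched as follows in Python. The constraint names, costs and candidates are hypothetical, and the softmax used to turn weightings into choice probabilities is an illustrative assumption; the model itself only requires that selection be competitive and probabilistic. Judgement-like data is read off the first module, frequency-like data off the second.

```python
import math
import random
from collections import Counter

# Hypothetical constraints with hypothetical violation costs.
COSTS = {"C1": 0.8, "C2": 0.3, "C3": 0.2}

def constraint_application(candidates):
    """Module 1: apply every constraint to every candidate, blindly and
    exceptionlessly, and sum the violation costs. The resulting weighting
    (0 = no violations, more negative = worse) is what relative judgements
    are taken to report."""
    return {form: -sum(COSTS[c] for c in violated) for form, violated in candidates.items()}

def selection_probabilities(weights, temperature=0.1):
    """Module 2, part 1: turn weightings into choice probabilities.
    A softmax is an illustrative assumption, not part of the model."""
    exps = {form: math.exp(w / temperature) for form, w in weights.items()}
    total = sum(exps.values())
    return {form: e / total for form, e in exps.items()}

def output_selection(weights, rng, temperature=0.1):
    """Module 2, part 2: competitively and probabilistically pick one output."""
    probs = selection_probabilities(weights, temperature)
    forms = list(probs)
    return rng.choices(forms, weights=[probs[f] for f in forms], k=1)[0]

# Hypothetical candidate forms and the constraints each one violates.
candidates = {
    "form_1": [],                  # violates nothing
    "form_2": ["C3"],
    "form_3": ["C2", "C3"],
    "form_4": ["C1", "C2", "C3"],
}
weights = constraint_application(candidates)   # judgement-like data
rng = random.Random(1)                         # seeded for reproducibility
counts = Counter(output_selection(weights, rng) for _ in range(10_000))  # frequency-like data

print("weightings:", weights)
print("frequencies:", dict(counts))
```

With these settings form_4 is almost never selected, yet its weighting still distinguishes it from form_3, mirroring the contrast between flat corpus counts and graded judgements in Figure 4; the selection probabilities also fall off in the same order as the weightings.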
The fact that output selection occurs probabilistically accounts for the occasional production of sub-optimal versions: rare but documented counterexamples in corpus data are thus no threat to grammatical generalizations in this model. Note that this is not an unprincipled method of accounting for awkward data; on the contrary, it makes strong and testable predictions: the most frequently occurring variant should be the one which is judged best, and the much lower frequencies of alternative variants should be strictly in the order of their judged well-formedness. Second, it ties the grammar in to evidence from sentence processing. There is a consensus that syntactic processing operates on-line and incrementally, and applies information from multiple sources in order to take decisions. It has often been suggested that the processor consists of a constraint component and a decision component which prunes less optimal interpretations or outputs
(see Featherston 2001 for discussion of parser types). Our model is compatible with the evidence that we readily understand structures with errors, for example. This makes it necessary that we be able to assign a structure to input which contains faults. Our Constraint Application module can account for this well-documented characteristic, since structures with constraint violations are not immediately excluded from the language but merely given more negative weightings. No model which equates a grammatical violation with exclusion from the language can do this. This fault-tolerant quality greatly extends the range of linguistic data that the grammar can account for. A third strength is that it provides some explanation of the wide variation in grammar architectures that we find competing in linguistic theory. Each of these captures a part of the fuller picture that we have sketched. Until the recent interest in competition in syntax (eg Müller and Sternefeld 2001), it was generally the case that all constraints were thought to apply to all structures, in no particular order, blindly, and automatically. Our judgement data confirms the empirical reality of this, and it is reflected in our model. OT, by contrast, is entirely committed to competition, motivated by the insight that it is generally the best of any competing set of structural alternatives which is produced. This too reflects a real aspect of the empirical data: the process of selecting a form to produce necessarily results in a competitive interaction, that is, in the non-occurrence of anything but the best. This is thus included in the Output Selection module in our model. Probabilistic grammars (cf Manning 2003) too have their motivation: there is indeed a probabilistic component in the linguistic production system, although our relative judgements suggest that it is no part of the grammar, which operates blindly and exceptionlessly, but is located further downstream at the selection stage.
Each of these grammar types can achieve some success because each reflects an aspect of the data: the Decathlon Model shows that they need not be contradictory, and includes all three features simultaneously. Our fourth and last explanatory advantage concerns the position of the grammar in the wider picture of evidence about the way language works. Our model allows the syntax to cover a much wider range of phenomena. Issues such as linguistic variation and language acquisition can be accounted for in a model with exceptionless constraint application but a parameter of violation cost strength. For example, Aissen and Bresnan’s (2002) Stochastic Generalization notes that similar constraints may be found cross-linguistically, but that they appear as categorical grammatical constraints in one language while being mere statistical tendencies in another. We have a ready account of these findings:
the same factors exist across languages, but their violation costs vary, due to interactions of constraints (for the superiority effect in German and English as an example of this, see Featherston 2005b). Not only differences between languages, but also regional, sociolinguistic and even idiomatic variation can be encoded as differences in violation cost amplitude. The learning of the language-specific parts of these violation strengths can thus be seen as part of the acquisition of syntax. Our model thus offers a far wider view of the linguistic environment than most approaches to syntax. In this it resembles the syntax of the sixties and seventies, when questions about the position of grammar in a more general cognitive setting were a standard issue for syntacticians. More recently, syntacticians have tended to see their role as developing grammars within a psycholinguistic framework which has in the meantime become not merely a consensus, but part of the set of basic assumptions of syntactic theory. Syntacticians now tend to devise syntactic analyses within this given conceptual space, rather than question the shape and extent of the space itself. In our work we have aimed to re-open this debate and to revisit these assumptions in the light of the new data available.
5.1 Well-formedness does not directly trigger occurrence
Our model is also supported by data from the interaction of well-formedness and occurrence. The standard assumption is that the functions of constraint application and output selection are not to be distinguished, and that they take place within the same module. In generative grammar, this would predict that any structure which is generated and does not violate any constraint on structure is grammatical and may be produced, while in OT the last candidate remaining, the only well-formed one, is produced. Both of these thus assume that production depends directly on the grammar, and that well-formedness directly determines occurrence. The Decathlon Model, however, claims that production competition determines output, so that there is no single level of well-formedness that triggers occurrence in the output. In the light of this, consider the results of the experiment in Figure 6. This figure shows the results of an experiment which contained three unrelated sub-experiments, with their mean judgements indicated by error bars as before, arranged in ascending order of well-formedness within each sub-experiment. Each group of error bars is thus a set of structural alternatives competing to represent given semantic contents. This is clearest in the set on the right-hand side, where all are competing to represent a single semantic content, whereas
the middle group are competing for three different semantic contents, and the left-hand group for four different semantic contents.
Figure 6. The mismatch of well-formedness and occurrence: Production is competitive.
In each set, those structures which were found to occur in the COSMAS I corpus (IDS, Mannheim) are above the line, while those which do not occur are below the line. It is striking that the structures which occur always appear in a solid block, from the top of the group. This alone is strong evidence of competition for production, based on the weighting information which we can access as judgements. However, notice that the best two structures from the right-hand group, which are those which occur in the language, are nevertheless judged worse than some of the lower structural alternatives in the other groups, which do not occur. Let us be clear that these judgements were given by the same participants in the course of the same experiment, whose items were ordered randomly. The implication is clear: occurrence is not directly dependent upon well-formedness, but rather upon a competition function based on these weightings. This finding supports the distinction between the grammar and the production function, as in the Decathlon Model, but it is not compatible with an architecture in which these two are merged.
5.2 Categorical judgements and relative judgements
This insight into human linguistic processing offers an account of another outstanding question: Why do judgements, elicited under strictly controlled conditions, show that informants, given a free choice of scale, do not use a
202 Sam Featherston
binary division or end points which might represent “fully grammatical” and “fully ungrammatical”? Our solution to this quandary is to distinguish the categorical judgements commonly used in syntactic work from the relative judgements obtained from our experimental studies. Our assumption of this dissociation is based upon several pieces of evidence. The strongest evidence for the reality of categorical judgements is quite simply our intuition that there are such things as “full grammaticality” (= “I would expect to hear this”) and “full ungrammaticality” (= “I would never expect to hear this”). Every speaker seems to have this, and neither its reality nor its relevance can be doubted: any naive informant, given a binary choice whether an example is good or bad, can immediately make sense of the question. It seems likely that the existence of this intuition is the reason for the standard linguistic assumption of dichotomous grammaticality. On the other hand, the results of carefully controlled experimental studies such as our own demonstrate conclusively that relative judgements exist too. Further evidence for the distinction is offered by a frequent comment in judgements of sets of sub-optimal structures: “I would never say it, but it is better than the other one”. The frequency of this type of reaction suggests that this intuition too is common to all speakers. With this response the informant is giving both types of judgement information: a categorical judgement and a relative one. This typical comment also gives us a clue about the difference between the two types: categorical judgements concern occurrence, while relative judgements reflect computational cost. Let us take these in turn. The categorical judgement, we argue, is an expression of the likelihood that a structure is good enough to occur in practice.
As such it is probably dependent on one or both of two factors. The first is our internal corpus of the language, made up of the effects of language exposure, which feeds information into every process which makes use of frequency. The question that the informant is internally answering (at least sometimes consciously) is: “Have I heard structures like this?” The second possible factor is our Output Selection function. The internal question here is: “Would this structure be produced, or is there a better alternative which would be chosen in preference to it?” Either way, categorical judgements reflect occurrence, and produce an essentially binary output in the same way that other occurrence-based data types do: a structure either does or does not occur. The relative judgement, on the other hand, reflects the cognitive workload in processing the form and semantic content of the structure, and relating the two. It reflects the function of the Constraint Application module, and consists of its standard output of a candidate form-and-meaning pair with an assigned weighting. This provides an account of why relative judgements can
distinguish between sets of structures which are all seriously ill-formed and none of which would ever occur. Such data cannot possibly reflect occurrence or frequency, since these are uniformly zero across all such structures, but the structures nevertheless differ in computational workload. Notice that this also explains why we found no reflection in our judgements of the intuition that certain linguistic constraints are ‘hard’ constraints (see Section 3). Constraints felt to be ‘hard’ are those which have such high violation costs that structures violating them will, in practice, tend not to win the competition for output and thus not occur. We find no correlate of ‘hard’ constraints in our relative judgements because their ‘hardness’ is a feature of occurrence, not of computational complexity. Short people do not tend to become professional basketball players; in fact there may be no short basketball players. A ‘hard’ restriction? Plainly not. Short people can become basketball players if their other qualities (agility, speed, good aim) make up for the disadvantage of shortness in this context. This may in practice never occur, but the restriction is nevertheless not a ‘hard’ one. Restrictions on syntactic structure work in the same way: certain violations may mean that violating structures will rarely or never be selected for output, but the link is not direct, and there is no ‘hard’ restriction.
6
The nature of well-formedness
If further work confirms that relative judgements reflect computational load and categorical judgements reflect possible occurrence, then a number of implications for the architecture and nature of the grammar would follow. First, at least a proportion of the restrictions on linguistic structure are ultimately functionally motivated, since they relate to ease of use. They are also ultimately emergent, in that the factors which drive the division into “better” and “worse” structures are themselves value-free.
It should be clear that this conception of the cognitive roots of grammar has little in common with approaches more generally associated with the label emergent (e.g. Bybee and Hopper 2001), which treat frequency as the causal factor in the emergence of structure. Our ambition here is to account for (among other things) occurrence frequencies, not to use them as explanations. We are associative and competition-driven thinkers: put differently, we are lazy thinkers, and we therefore prefer computationally easier processing tasks. But the processing of every word and every syntactic relation comes with a cost: this can be readily seen in judgement studies, where longer sentences are systematically judged worse than shorter sentences (more words
mean more computational load). There is of course nothing actually “wrong” with longer sentences: the interpretation of computational load as “badness” comes only at the stage when the production system has to deal with structural alternatives, and, at this stage but not before, forms which incur higher computational costs are dispreferred. Thus longer sentences are computationally more costly, but sometimes necessary, and so they occur. By contrast, a form for which a more economical structural alternative exists may be only a little more costly, but even this little additional cost is unnecessary, which makes the structure unlikely to be selected for output. This model of well-formedness can perhaps be understood as resembling the world of economics. All expenses are dispreferred. Nevertheless, we are prepared to pay more for a motorbike than for a bicycle, because a motorbike does more than a bicycle. We therefore produce long or complex structures when these are appropriate, even though they are computationally costly. On the other hand, we are not prepared to buy the more expensive of two objects which perform the same function. Equivalently, when two structural variants communicate the same semantic content, we choose the less computationally costly alternative. But there is nothing per se about any structure which makes it good or bad: computational workload is not bad until we lazy thinkers judge it so. This analysis of the nature of judged well-formedness accounts for cumulativity, violation costs, survivability, etc. At the same time, it goes some way towards explaining why there is evidence for the universality of grammatical restrictions (architecture-related factors are by their nature universal), and it does this within a psycholinguistically and empirically motivated framework.
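The economic analogy can be sketched in a few lines of Python. This is a minimal illustration, not the Decathlon Model's own formulation. It rests on several assumptions. First, a candidate's cost is a per-word processing cost plus the summed costs of its constraint violations. Second, output selection simply picks the cheapest candidate among those expressing the same semantic content. Third, the value of `WORD_COST`, the candidate names, lengths and violation costs are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical processing cost per word (illustrative assumption).
WORD_COST = 1.0


@dataclass
class Candidate:
    name: str
    meaning: str  # semantic content the form expresses
    words: int  # length of the form in words
    violations: list[float] = field(default_factory=list)  # hypothetical violation costs

    def cost(self) -> float:
        """Computational cost: length plus the summed violation costs."""
        return self.words * WORD_COST + sum(self.violations)


def select_outputs(candidates: list[Candidate]) -> dict[str, Candidate]:
    """Return the cheapest candidate for each semantic content.

    Only candidates expressing the same meaning compete with each other;
    there is no global well-formedness threshold.
    """
    winners: dict[str, Candidate] = {}
    for cand in candidates:
        best = winners.get(cand.meaning)
        if best is None or cand.cost() < best.cost():
            winners[cand.meaning] = cand
    return winners


candidates = [
    # A long structure with no cheaper way of expressing its content.
    Candidate("long", "complex content", words=14),
    # Two variants competing for the same simple content.
    Candidate("short-a", "simple content", words=5),
    Candidate("short-b", "simple content", words=5, violations=[1.5]),
]

winners = select_outputs(candidates)
for meaning, cand in winners.items():
    print(f"{meaning}: {cand.name} (cost {cand.cost()})")
```

The 14-word candidate wins its comparison set even though it costs more than the losing variant "short-b" (cost 6.5) in the other set. Occurrence therefore cannot be read off an absolute cost or well-formedness threshold, which is the pattern the chapter reports for Figure 6.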
7
Implications for data types and their relation to theory
It seems fair to state that a fundamental assumption underlying the use of frequencies as a source of evidence for syntax is that “good” structures are produced, and thus found in corpus data, while “bad” structures are not produced and thus not found. As a second step, we might generalize that there is an assumption that better structures are produced more often than less good structures. These assumptions are confirmed by our findings. However, our findings also reveal that this is not the full picture: frequencies correlate with well-formedness in judgements among the very “best” structures, but provide no information about “poorer” candidates, because these uniformly fail to occur. Or rather, they do not occur in corpora of the size to which we have access. If we are right in our suggestion that output competition is probabilistic, then,
in a big enough corpus, we should find not only the best and second-best candidates but also the third and fourth and so on. The fact that linguists keep finding structures in corpus data which they had assumed to be categorically excluded, but which do not appear to be mere slips of the tongue, strongly supports this suggestion (just search for “What did who” in Google). It would follow that frequency measures and judgement data are mathematically related, since we could predict the score of a given item in a comparison set on the basis of the set’s scores from the other data type. They are not practically related, however, since the corpus size required would increase exponentially as we proceeded down the order of preferredness. It follows from our arguments here that the data type of choice for syntax must be relative judgements. Frequency measures give us the same information as relative judgements about the best (couple of) structural alternatives in each comparison set, but they give us no information about any of the others. Since the interaction of linguistic constraints is demonstrably cumulative, this is a severe disadvantage, especially as it tends to make linguists interpret relative restrictions on structure as absolute restrictions. Put briefly: if you want to know what people say, choose frequencies, but if you want to know why, you are better off with relative judgements.
8
Implications for syntactic theory
These new but empirically founded perspectives on data types and their implications for the nature of grammaticality and the structure of the grammar are in some ways revolutionary, since they require a number of conventional assumptions to be abandoned or revised. Much can remain unchanged, however. In the past, linguists had only much more partial data available to them, yet they often correctly identified characteristics of the data set.
For example, with only an individual’s judgements and without the immediate access to corpus data that we have now, the abstraction to an essentially binary model of grammaticality was a reasonable step, which has in many ways served the field well. On the other hand, our findings should make it clear to every syntactician that the current model of syntax has significant weaknesses. We can well understand how these came about, but that cannot be a reason not to move on. In fact the necessary reformulation of syntactic theory requires only two major steps. Syntacticians must first recognize that production processing has a role in deciding what linguistic forms are produced, and that occurrence only indirectly reflects well-formedness. This entails that output selection and the grammar are two separate processes, and we must decide which of these we are modelling.
There are three possibilities. First, we can look specifically at the system of the constraints which apply to syntactic structures, our Constraint Application module, and disregard production factors. To do this we should use data types which exclude the effects of occurrence as far as possible, i.e. relative judgements, and refine our theory to reflect the attested data patterns more accurately. This is narrow syntax. Others will be more interested in the processing system: there is an extensive literature on sentence processing and numerous models, closely tied to empirical data, of how we go about using our embedded grammar. This work concerns what we have called here the Output Selection module. The third approach is to look at the cumulative effect on output of the two modules, Constraint Application and Output Selection. This is what many syntacticians are currently doing, assuming themselves to be looking at just one system. However, the mismatch in data patterns between frequencies and relative judgements reveals that this work treats two heterogeneous objects as one. Nevertheless, this is an interesting and worthwhile field of study in its own right. It is closely related to traditional descriptive linguistics, in which the issue is the occurring patterns of a language rather than their underlying causes. Frequencies will be the data of choice for this study, since they represent the selection made by the output processing system from the candidates made available and weighted by the grammar. The insight that we can and should distinguish between the functioning of Constraint Application and Output Selection should bring about a major improvement in the empirical adequacy of syntax models. Separating these two modules resolves at a stroke many of the inconsistencies which obscure the nature of the interaction of linguistic constraints.
Syntactic theory will be far closer to the data, and hypotheses about the grammar will be far more constrained, surely a welcome development. Having cleared the picture by factoring out the competitive effects of output selection, we can take a look at the module containing the grammar, which we have called Constraint Application. The second major step in the revision of theory applies here, and consists of the specification of constraint violation costs. Each violation must have a quantified cost, since some constraints are stronger than others and their violations are correspondingly more costly. The introduction of this parameter alone should bring about many of the changes in architecture which are needed to adjust current theory to gradient grammaticality; work such as Keller (2000) and Featherston (2005b) demonstrates that such an adjustment is necessary. As soon as violation costs are accepted as a real variable, the other adjustments (survivability of constraint violations, cumulativity of violation costs, dissociation of categoricity and grammar-relevance) follow automatically.
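Quantified violation costs can be made concrete with a minimal sketch, under several explicit assumptions. The constraint names `StrongC`, `MidC` and `WeakC` and their cost values are hypothetical. A candidate's cost is taken to be the sum of its violation costs, which gives cumulativity. Output selection is modelled as a softmax over negated costs; this is an illustrative assumption, since the chapter does not specify the competition function. For contrast, `ot_profile` implements OT-style strict domination, under which one violation of a higher-ranked constraint outweighs any number of lower-ranked violations.

```python
from __future__ import annotations

import math

# Hypothetical violation costs (illustrative values, not measured ones).
COSTS = {"StrongC": 3.0, "MidC": 1.0, "WeakC": 0.5}
# The same constraints ranked for an OT-style comparison, highest first.
RANKING = ["StrongC", "MidC", "WeakC"]


def cumulative_cost(violations: dict[str, int]) -> float:
    """Weighted sum of violations: every violation adds its cost."""
    return sum(COSTS[name] * count for name, count in violations.items())


def ot_profile(violations: dict[str, int]) -> tuple[int, ...]:
    """Violation counts in ranking order; comparing these tuples
    lexicographically implements strict domination."""
    return tuple(violations.get(name, 0) for name in RANKING)


def selection_probabilities(costs: list[float], temperature: float = 1.0) -> list[float]:
    """Assumed probabilistic competition: a softmax over negated costs."""
    weights = [math.exp(-c / temperature) for c in costs]
    total = sum(weights)
    return [w / total for w in weights]


one_serious = {"StrongC": 1}
many_mild = {"MidC": 2, "WeakC": 4}

# Summed costs are 3.0 vs 4.0, so the single serious violation is cheaper.
cum_winner = min([one_serious, many_mild], key=cumulative_cost)
# Strict domination compares (1, 0, 0) with (0, 2, 4), so the mild violations win.
ot_winner = min([one_serious, many_mild], key=ot_profile)

# Both candidates keep a non-zero share of the output (survivability).
p_serious, p_mild = selection_probabilities(
    [cumulative_cost(one_serious), cumulative_cost(many_mild)]
)

# A comparison set whose candidates are one cost unit apart. The number of
# utterances of this content needed to expect one token of the k-th best
# candidate is 1 / p_k, which grows by a constant factor per rank.
ranked = selection_probabilities([0.0, 1.0, 2.0, 3.0, 4.0])
expected_sample = [1 / p for p in ranked]
print(cum_winner is one_serious, ot_winner is many_mild)
print([round(n, 1) for n in expected_sample])
```

Under summed costs the candidate with the single serious violation wins, while strict domination prefers the candidate with many mild violations. Under the softmax assumption, the losing candidate still receives a non-zero share of output. Each step down the comparison set multiplies the required sample by the same factor, so the corpus needed to attest lower-ranked alternatives grows exponentially. This matches the argument about corpus size in Section 7, although that result depends on the assumed competition function.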
These then are the lessons which we argue that syntax theory needs to draw from the closer inspection of its database. First, we must redraw the boundary between grammar and production so as to distinguish between the effects of linguistic constraints and the effects of our need to select just one way of formulating each utterance. Second, we must add the additional parameter of violation cost to our models of syntax. Not words and rules, therefore, are the basic components of the grammar, but words, rules and sanctions.
Acknowledgements
This work was carried out in the project Suboptimal Syntactic Structures of the SFB 441 Linguistic Data Structures, supported by the Deutsche Forschungsgemeinschaft. Thanks are due to project leader Wolfgang Sternefeld, my colleague Tanja Kiziak and many other members of the SFB 441, as well as to Frank Keller for WebExp. All errors are mine.
References
Aissen, Judith and Joan Bresnan 2002 Categoricity and variation in syntax: The Stochastic Generalization. Talk at Potsdam Gradience Conference, 22.2.2002.
Bard, Ellen, Dan Robertson, and Antonella Sorace 1993 Magnitude estimation of linguistic acceptability. Language, 72: 32–68.
Boersma, Paul and Bruce Hayes 2001 Empirical tests of the gradual learning algorithm. Linguistic Inquiry, 32: 45–86.
Bybee, Jane and Paul Hopper 2001 Frequency and the Emergence of Linguistic Structure. Benjamins, Amsterdam.
Cowart, Wayne 1996 Experimental Syntax: Applying Objective Methods to Sentence Judgements. Sage, Thousand Oaks, California.
Featherston, Samuel 2001 Empty Categories in Sentence Processing. Benjamins, Amsterdam.
2002 Coreferential objects in German: Experimental evidence on reflexivity. Linguistische Berichte, 192: 457–484.
2003 That-trace in German. Lingua, 1091: 1–26.
2004 Bridge verbs and V2 verbs: The same thing in spades? Zeitschrift für Sprachwissenschaft, 23: 181–209.
2005a Magnitude estimation and what it can do for your syntax. Lingua, 115.
2005b Universals and grammaticality: Wh-constraints in German and English. Linguistics, 43.
Keller, Frank 2000 Gradience in grammar: Experimental and computational aspects of degrees of grammaticality. Ph.D. thesis, Edinburgh University, Edinburgh.
Labov, William 1996 When intuitions fail. In Lisa McNair, Kora Singer, Lise Dolbrin, and Michelle Aucon (eds.), Papers from the Parasession on Theory and Data in Linguistics 32, pp. 77–106. Chicago Linguistics Society, Chicago.
Manning, Christopher 2003 Probabilistic syntax. In Rens Bod, Jennifer Hay, and Stephanie Jannedy (eds.), Probabilistic Linguistics, pp. 289–341. MIT Press, Cambridge, MA.
Müller, Gereon and Wolfgang Sternefeld 2001 Competition in Syntax. Mouton de Gruyter, Berlin.
Pesetsky, David 1987 Wh-in-situ: Movement and unselective binding. In Eric Reuland and Alice ter Meulen (eds.), The Representation of (In)Definiteness, pp. 98–129. MIT Press, Cambridge, MA.
Prince, Alan and Paul Smolensky 1993 Optimality Theory: Constraint interaction in generative grammar. Technical Report No. 2, Center for Cognitive Science, Rutgers University, New Brunswick.
Schütze, Carson 1996 The Empirical Base of Linguistics: Grammaticality Judgments and Linguistic Methodology. University of Chicago Press, Chicago.
Examining the Constraints on the Benefactive Alternation by Using the World Wide Web as a Corpus
Christiane Fellbaum
1
Introduction
This paper asks whether data gathered from the web can provide new insight into speakers' grammars and serve as evidence for linguistic theories. Our case study examines the poorly understood English Benefactive Alternation and the various constraints that have been proposed to account for its distribution. Web data show these constraints to be soft and subject to frequent violation and extension, raising the possibility of a constraint with "fuzzy edges." Our case study argues for the need to replace or augment constructed data with web data, in order to avoid theoretical biases and capture the full range of rule-governed linguistic behavior. The Benefactive Alternation relates two syntactic variants for the expression of an argument that bears the semantic role of Beneficiary. This argument can be realized either in a PP headed by for, as in (1), or as the first of two direct objects in a double object construction, as in (2):
(1) Chris baked/bought/stole a cake for Kim.
(2) Chris baked/bought/stole Kim a cake.
The direct object alternant of the Benefactive is subject to constraints:
(3) Chris baked/bought/decorated/sliced a cake for Kim.
(4) Chris baked/bought Kim a cake.
(5) *Chris decorated/sliced Kim a cake.
Prior explanations for contrasts like that between (2, 4) and (5) have been formulated in terms of constraints on the verb's semantic class membership, the aspectual nature of the event, the Beneficiary and/or the Agent arguments, and the verb's morphophonological make-up.
210
Christiane Fellbaum
1.1
The data
Previous investigations of the ill-understood Benefactive alternation are based on introspection and the examination of data generated by the investigator. Such data are often constructed in order to highlight a particular phenomenon, and theoretical bias may well influence and limit the range of data that should be considered for a full account. Judgements can be unreliable and vary both across speakers and within a given speaker, depending on factors such as the context in which a specific structure is embedded. Given the current availability of naturally occurring data, it seems timely to re-examine empirically the alternation and the constraints that have been proposed.1
1.1.1 Using the web as a corpus
We used the World Wide Web as a corpus and a source of naturally occurring examples in order to examine a range of claims about how the Benefactive Alternation is semantically constrained. The web is a rich source of freely available linguistic data covering a wide range of speakers, topics, and styles, with much of the data generated spontaneously and unedited. The web is thus a logical alternative to conventional corpora, which tend to be hard to come by, small, or limited to a few domains. At the same time, there are no controls on web postings, and some linguistic data may not be suitable as evidence for linguistic theories. Web data are vulnerable to the charge of unreliability for several reasons. A major concern is data posted by non-native speakers. This is particularly worrisome in the case of English data, as English, unlike, say, Hungarian or Japanese, serves as the lingua franca for worldwide communication, for which the web is a preferred channel. To safeguard against non-native data, each URL needs to be examined, and those that are clearly of foreign origin must be excluded. Moreover, for each type of construction, we collected several data points.
Perhaps most importantly, the data discussed in this chapter were presented at several conferences and in colloquia where they went unchallenged by the native English speakers in the audience. Some of the data cited in this paper could be dismissed as non-standard. Certainly, quite a few examples reflect language that one might not find in official publications or written language at all. But the kind of language that is often spontaneously generated in postings to chat groups nevertheless reflects speakers' grammars. Our work shows that data generated "naturally" and outside the context of a linguistic investigation force a rethinking of
Examining the Constraints on the Benefactive Alternation
211
previously proposed constraints on the Benefactive alternation, which we argue are too narrowly formulated. Not just anything goes, as far as web data are concerned. The fact that we could not find web data like (5) indicates that the construction is constrained in a principled way that needs to be understood. Of course, not finding a web example of a Benefactive with a particular verb or noun argument does not mean that it is categorically ruled out and ungrammatical. First of all, our search was unsophisticated: we could not look for abstract syntactic patterns or entire verb classes, but had to search for specific strings, necessarily missing other, similar ones. But even the absence of hits for more exhaustive searches should not be overinterpreted. The asterisks in this chapter must therefore be interpreted not as strictly "ungrammatical," but as "unattested on the web." The motivation for this work was not so much a definitive formulation of the constraints on the Benefactive as a confrontation of the proposed constraints with attested data.
1.1.2 Searching for relevant structures
The PP alternant of the Benefactive, exemplified in (1) and (3), is essentially unconstrained, while (5) shows that the alternant where the Beneficiary is projected as the direct object (DO) is restricted. To determine the scope and nature of the constraints, we searched for examples of the DO alternants of the corresponding PP alternants. As this work was carried out prior to the development of the Linguist's Search Engine (Resnik and Elkiss 2004), we had no intelligent tool for targeted searches and had to rely on simple pattern-matching searches. Using Google, we formulated queries of the forms
(6) "she Ved me some"
(7) "he Vs her a"
where V was filled with specific verbs. We used a variety of verbs that occurred with high to medium frequency in the Brown Corpus, and whose semantic make-up was relevant to the constraints formulated. When testing hypotheses concerning the nature of the Agent or the Beneficiary, we looked for examples with specific nouns filling the argument slots. We excluded sexually explicit sites or other inappropriate data.
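The templates in (6) and (7) can be instantiated mechanically. The following is a minimal sketch, assuming a hand-supplied table of inflected forms; the four verbs below are illustrative and do not reproduce the study's actual verb sample. The sketch fills the V slot with the past tense for template (6) and the third-person singular for template (7), and wraps each string in double quotes so that the search engine treats it as an exact phrase.

```python
from __future__ import annotations

# Illustrative verb forms: (past tense, 3rd person singular present).
# These verbs are examples only, not the study's verb list.
VERB_FORMS = {
    "bake": ("baked", "bakes"),
    "buy": ("bought", "buys"),
    "decorate": ("decorated", "decorates"),
    "slice": ("sliced", "slices"),
}

# Templates (6) and (7); {v} marks the verb slot, the key names the form used.
TEMPLATES = [
    ("past", "she {v} me some"),
    ("3sg", "he {v} her a"),
]


def build_queries(verb_forms: dict[str, tuple[str, str]]) -> list[str]:
    """Return exact-phrase query strings for every verb and template."""
    queries = []
    for past, third_sg in verb_forms.values():
        forms = {"past": past, "3sg": third_sg}
        for form_key, template in TEMPLATES:
            # Surrounding quotes request an exact phrase match.
            queries.append('"' + template.format(v=forms[form_key]) + '"')
    return queries


for query in build_queries(VERB_FORMS):
    print(query)
```

Supplying the inflected forms by hand avoids relying on an automatic inflector for irregular verbs such as buy/bought. Each hit would still need the manual checks described above, for example excluding non-native sources.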
2
Beneficial events
We now examine some aspects of constructions that express events with a benefit.
2.1
What kinds of events can be beneficial?
The argument expressing a Beneficiary is always optional and is not part of the verb's theta grid. This suggests that verbs selecting for Beneficiaries do not inherently denote beneficial events, but can receive such an interpretation. A question that arises is whether there are any constraints on the class of verbs that can express benefits. Many transitive and intransitive verbs from a wide variety of semantic classes can add a Beneficiary as a PP adjunct; the actions expressed by these verbs are not inherently actions performed for someone's sake or benefit. The sentences below, found on the web with the Beneficiary, are just as good without it:
(8) There'll be an unloading zone at the transition area if you wish to have someone drop you off and park your car for you
home/att.net/~ata-jc/kaprules.html
(9) There is also a system-wide startup file which is run for you first
Orbit-net.nesdis.noaa.gov/ora/oraintranet/ctst/unix/c15.html
Similarly, many verbs permitting the DO alternant do not denote events with an inherent benefit:
(10) Peel me a grape
(11) Hurry, get my red shirt
(12) It feels as though someone had designed me a custom dress...
www.between-theshadows.com/shadows/fire/transformations/aboutme.html
(13) I asked Mom to wash me some clothes,
www.bad-krama.net/archive/arc39.html
(14) I ask Roberto if he can change me some money
www.newbury.net/deanwood/doc/Greece.htm
(15) And try to find me some aspirin while you're at it.
www.geocities.com/chocofeathers/ multifics/2ndc_chap8.htmll
In some cases, construing a beneficial reading is difficult. We could not find examples such as the following, though they are perfectly interpretable in the right context (e.g., where the Beneficiaries are a nurse and a stage director, respectively).
(16) I'll take a walk/swim/nap for you
(17) She fell down the steps for him
While these structures seem perfectly grammatical, given an appropriate context, the corresponding DO alternants do not:
(18) ??I take you a walk/swim/nap
(19) ??She fell him down the stairs
Unlike the unrestricted PP alternant, the DO alternant seems to be reserved for events where a beneficial reading can be constructed more easily and naturally.
2.2
No benefit
The alternation also occurs with events that have undesirable consequences for the DO:
(20) They have done nothing but ruin me my whole life
www.piedmont.tec.sc.us/worldlit/andr1.htm
(21) They have done nothing but ruin my whole life for me
(22) So they set you a trap
hot.ee/fanfic/thirteefull.html
(23) So they set a trap for you
There is a straightforward semantic contrast between benefactives and these "malefactives." It has often been observed that contrast is a particular kind of semantic similarity, as contrastive concepts tend to represent different values of a shared attribute or distinct points on the same scale. The fact that Benefactives and some "Malefactives" participate in the same alternation is consistent with a view in which they are semantically related.
2.3
Beneficiary or replaced Agent?
The for-phrase in the PP alternant is potentially ambiguous. Green (1974) cites, besides the Benefactive, the "instead-of" reading. In (24) and (25) respectively, the missionary taught the class, and Kahler gave the speech, in place of the writer:
(24) But on Tuesday, I stayed home, in bed part of the day! Another missionary taught the class for me.
www.jacklynes.com/russia/letter16.htm
(25) ...had developed a very hoarse sore throat. So with the approval of my hosts, Kahler gave the speech for me and did very well indeed.
www.nobel.se/noble/events/eyewitness/hench/
The distinction between the Benefactive and the "instead-of" readings is not always sharp, as the substitution seems to imply a benefit for the substituted Agent. This implication of benefit is avoided only when the PP is headed by instead of.
(26) Complete Grocery Shopping... We do all the shopping for you. There's no need for you to spend your valuable time ...
www.shadowlief.com/what_we_offer.htm
(27) E-mail Software does all the work for you.
www.homeuniverse.com/bulk.htm
In some cases, the context supplies world knowledge that disambiguates between the two readings of the for-phrase:
(28) ...pianist Vladimir Horowitz. After hearing two of John's compositions, which he played for the maestro one evening after dinner, Mr. Horowitz looked over to John ...
www.johnsciullo.com
Most likely, John is performing for Horowitz's benefit here, not in his place. Under the "instead-of" reading, for can receive heavy stress, so long as no other constituent in the VP is focused. Thus, (29) means (30) and not (31):
(29) I'll do it FOR you
(30) I'll do it in your place
(31) I'll do it for your benefit
Green notes that in the direct argument alternant, no replacement interpretation is possible, and the NP is always a Beneficiary. This can be seen in (32) and (33), where only the beneficial context seems felicitous:
(32) Mary played Mr. Horowitz two of her compositions
...and the maestro listened attentively
??...while he was away on tour
(33) Mom washed me some shirts
...so I'd look neat for the job interview
??because I don't know how to operate the washing machine
2.4
The Agent as Beneficiary
Some verbs, including consumption and perception verbs like eat, drink, watch, listen, etc., denote events with an inherent benefit for the subject, the "ingester." It is difficult to construct an additional Beneficiary argument for these verbs. Nevertheless, some dialects of American English can add an explicit DO Beneficiary, in constructions that Curme (1986) calls a "personal dative."2 The DOs here must be Beneficiaries and not Recipients, as the verbs do not denote transfers. These Beneficiaries are either reflexives or object pronouns, necessarily coreferent with the subject. Web examples are:
(34) I'll have myself a little snack before bed
(35) I'll eat me some potted meat
(36) gonna listen me some Guns and Roses
(37) gonna watch me some uneducational TV
(38) Have yourself a merry little Christmas
In contrast to the Benefactives where subject and object have distinct referents, the corresponding PP alternants seem ungrammatical:
(39) *I'll have a little snack before bed for myself
(40) *I'll eat some potted meat for me/myself
(41) *gonna listen to some Guns and Roses for me/myself
(42) *gonna watch some uneducational TV for me/myself
(43) *Have a merry little Christmas for yourself
Non-ingestion verbs can have a reflexive beneficiary in a for-phrase, but only when there is a contrast:
(44) I'll wash some shirts for myself (but not for you)
The reflexives therefore seem to constitute a different phenomenon with these ingestion verbs; we will return to these data later.
2.4.1 Some Spanish data
In what appear to be related cases, Nishida (1994) examines Spanish sentences with transitive verbs that have alternants with an additional Reflexive:3
(45) Juan SE tomó una copa de vino
John (REFL) drank a glass of wine
(46) Yo ME comí diez manzanas
I (REFL) ate ten apples
The Spanish data seem similar to the English ones. Interestingly, Nishida's examples cover the same semantic classes of verbs: verbs of consumption/ingestion (eat, drink, smoke, read), what could be described as semantically contrasting verbs (skip over, miss out), and verbs of acquisition (steal, gain, learn). Nishida claims that the reflexive clitic in cases like the above overtly marks quantitatively delimited events. The reflexive variants of (45) and (46) can thus be translated as drink up/eat up, whereas the non-reflexives do not have this completive aspect. This contrast is very clearly shown in (47). Only the sentence with the reflexive necessarily refers to an event where the entire book was read (the English gloss is underspecified with respect to aspect):
(47) Juan (SE) leyó el libro anoche
John REFL read the book last night
Given the meaning difference, it seems surprising that sentences like (46), which have a delimited object (as opposed to, say, a bare plural NP), allow both the reflexive and the non-reflexive form. Not surprisingly, Nishida states that the native speakers he consulted preferred the reflexive form.
Examining the Constraints on the Benefactive Alternation
Despite the superficial similarity, Nishida's explanation for the Spanish data does not hold for English, where the sentences with reflexive Beneficiaries like (35-37) have a DO with a partitive, which marks them as nondelimited, and where the event is necessarily open-ended. We will return to the question of aspect in Benefactives in Section 4.4.
3 Argument status of the Beneficiary
The Beneficiary is always optional; hence it cannot be considered a subcategorized argument of the verb. Several explanations have been offered to account for the licensing of the Beneficiary. Larson (1990) suggests a mechanism he calls Argument Augmentation, which adds a Beneficiary to the verb's theta grid. Marantz (1984) proposes an affix-mediated increase in the verb's valency. Baker (1988) suggests that a zero-affix allomorph of for attaches to the verb when the Beneficiary is in direct argument position.4 In support of the claim that a Beneficiary is not a true argument, it has been pointed out that it fails a standard test for argumenthood, passivization. In this respect, Beneficiaries contrast with Recipients:
(48) *Kim was selected/designed/sewn a wedding dress (Kim = Beneficiary)
(49) Kim was given/sent/mailed a present (Kim = Recipient)
However, a web search turned up numerous examples of passivized Beneficiaries (these may be more characteristic of British than of American English):
(50) I was made coffee and sat and talked to Ella
www.punternet.com/frs/fr_view.php?recnum=6798
(51) ...until early morn, when I was made tea and toast, ...
pws.prserv.net/usinet/declair.diary1.htm
(52) Today, the teachers were fixed breakfast
www.switzerland.k12.in.us/~pacerps/pdf
(53) "Oh," Nick said ...watching as he was poured a drink. "Brian?"
http://cgi.allihave.net/fiction/hom/short/ForMyWedding.shmtl
(54) He was built a house at pool-side, to keep him in the shade.
www.bronxbard.com/specials.html
(55) His 'friend' who came with him insisted that he was bought some trainers...
duvel.lowtem.hukudai.ac.jp/~jim.climbing/manda3_report/node15.html
We will not pursue the question of the argument status of the Beneficiary any further, but, clearly, passivization does not distinguish Beneficiaries from true arguments like Recipients. We will return to a comparison of Beneficiaries and Recipients later.
4 Constraints on the Benefactive alternation
A number of explanations have been proposed to account for the restrictions on the Benefactive alternation, specifically, the distribution of the DO alternant. These explanations have been formulated in terms of the verbs' lexico-semantic and morphophonological properties, the aspect of the beneficial event, and the semantics of the Agent and the Beneficiary arguments. We examine each of the proposed constraints in the light of pertinent web data.
4.1 Lexico-semantic constraints
Green presents the most extensive analysis of verbs that show the Benefactive alternation. She classifies these verbs into distinct groups: verbs of creation, selection, performance, and obtaining, as well as "symbolic actions." In parallel with Green's classification, benefits have been characterized as created entities (including entities created as a by-product of acting on another entity), prepared entities, (future) possessions, and obtained entities (Green 1974; Levin 1993; Larson 1990; Jackendoff 1990; inter alia). We will examine each of these classes in turn with respect to the alternation.
4.1.1 Verbs of creation
A creation usually requires effort, and efforts tend to be undertaken only when they are associated with a benefit. A created entity can therefore readily be interpreted as a Benefit, and creation verbs generally allow Beneficiaries in direct argument position:
(56) In 1818-1819, Benjamin Henry Latrobe built the couple a house, which came to be known as Decatur House, next to Lafayette ...
www.library.georgetown.edu/dept/speccoll.decatur
(57) My friend Ola fixed me a job.
www.trance.org/sensphere/
(58) She made them waffles
www.womenspace.ca/Fabrications/lorr3.htm
(59) and she bore him a son, Hasumat.
www.sacred-texts.com/oah/oah/oah360.htm
An interesting subclass is constituted by cases where the creation is a by-product or result of another event, which is not itself a creation event:
(60) only if you clean me some room on this desk to work, right?
www.geocities.com/dbzasuri/epics.itsoadc1.htm
The room is the result of the desk-cleaning event; even though room is the DO of the verb, what is cleaned is not the room but the desk.
4.1.2 Destruction verbs
It has been claimed that the Benefactive is not available for verbs of destruction (Wechsler 1991). However, we found quite a few web examples that refute this claim, including the following:
(61) ...kick the crap outta saint nick and burn him some pagans
www.geocities.com/s1xlet/apathy19.html
(62) ...an idealistic 18-year-old eager to go kill him some Redcoats
www.dvdjournal.com/reviews/p/patriot.shtml
These sentences show that the destruction of an entity (the Theme) may result in a benefit (for a Beneficiary that may or may not be coreferent with the Agent). Destruction and creation verbs are semantically related by virtue of the contrast between them, and an extension of the alternation from creation to destruction verbs could be attributed to this relatedness. In (63) and (64), by contrast, the destruction events do not appear to entail a benefit:
(63) Herons or other wild fowl shall destroy them their nest or eggs
www.nprwc.usgs.gov/resource/1999/eastblue/ebexotic.htm
(64) The white missionary is trying to ruin them their way of life
www.piedmont.tec.sc.us/worldlit/andr1.htm
Here, the DO seems to emphasize the "malefactive" effect of the destruction.
4.1.3 Verbs of preparation
Verbs expressing events where an Agent acts on an entity such that this entity is prepared for use or consumption readily take a Beneficiary argument and exhibit the alternation:
(65) Peel me a grape, Crush me some ice. Skin me a peach, save the fuzz for my pillow.
www.amyandfreddy.com/cd/track5.html
(66) I asked Mom to wash me some clothes
www.bad-karma.net/archive/arc39.html
(67) Honey, can you iron me a shirt?
www.epinions.com/hmgd-review-6689-32384DB-3A231D50-prod1
4.1.4 Verbs of performance
Verbs of performance can be described as denoting re-creations of a work of art such as a composition, a poem, or a song. Performances therefore resemble creations. The Beneficiary here is an Experiencer:
(68) and that's where I met Mel and Shaz, I played them some tunes
www.portowebbo.co.uk/nottinghilltv/faces-kgee.htm
(69) Anyways, Herman has sang me some of his banana-fried lyrics
members.aol.com/bumingler/set1/songs.html
(70) Morgane donned wooden clogs and danced us a dance
beyondthebrochure.homestead.com/britnorm.html
Note that some performance verbs take both for- and to-phrases in the PP alternant:
(71) Moses and his sister Miriam both sang a song to the Lord...
www.pcusa.org/ega/music/favoritesongs.htm
(72) That week at ending campfire, we sang a song for Cassie.
kidsaid.com/stories/cassie.html
(73) ...anyone who would like to can play a piece to the school.
www.childokeford.dorset.sch.uk/clubsandactivities/music.htm
(74) When I actually sit down to play a piece for others...
www.violinist.com/discussion/response.cfm?ID=3688
4.1.5 Symbolic actions
In addition to the classes listed above, Green notes that certain "symbolic actions" performed for someone's benefit can undergo the Benefactive alternation. Green does not explain the exact nature of "symbolic actions"; the benefit is specific to the context. Among the examples we found are these:
(75) God said to Abraham: Kill me a son
www.ieor.berkeley.edu/~goldberg/lecs/kierkegaard.html
(76) Baby open me your door
www.geocities.com/SunsetStrip/Pit/8508/songs/chameleo.html
The verbs here (and those cited by Green) do not fit neatly into any semantic class. Moreover, the symbolic actions show the distinction between Benefactive and Recipient most clearly, since these cases do not involve a created, prepared, performed, or obtained entity that is moved or transferred to the Beneficiary or that comes into the Beneficiary's possession.
4.1.6 (Future) possession
Many verbs of obtaining, which describe a resultant possession for the Beneficiary, participate in the alternation:
(77) He said he'd rather go out and grab him some food
www.dreamwater.net/art.jtdoc/hasil.html
(78) Retrieve me some cream cakes!
Home.talkcity.com/ekochap8.htm
A subclass of the future possession verbs are the verbs of selection, where a to-be-possessed entity is chosen for the Beneficiary:
(79) ...please select me a good singer for about twelve shillings
www.classicreader.com/read.php/sid3/bookid1506.
(80) I've written to Sylvia asking her to choose me a coat
www.gerty.ncl.ac.uk/letters/l1170.htm
Obtained entities, including abstract ones like emotions, can become the Beneficiary's possession:
(81) Radical Red: Get Me Some Self-esteem. by Laura Jones.
www.thebody.com/tpan/julaug_01/self_esteem.html
(82) I was bought loads of drinks and got quite drunk.
www.wrecked.co.uk/norames/ott3.html
4.2 Benefit and possession
Larson (1990), Daultrey (1997), and Krifka (1999), inter alia, attribute to the DO alternant the requirement of a created/prepared/obtained entity that becomes the Beneficiary's possession. The benefits in the different verb classes conform to the notion of possession to varying degrees. Future possession verbs clearly do. The products of creation events also straightforwardly become possessions of the Beneficiary:
(83) Anyone who can create me some copies in other formats, please give me a shout!
www.cunningham-king.freeserve.co.uk/YCC/Fornt%20Page.htm
(84) She read the recipes and cooked her husband some Spam
(85) composed me a few lovely haiku
www/ghostinthemachine.net/weblog4200.html
Performance verbs are a subclass of creation verbs that involve no physical entity; Green argues that the audience's perception constitutes a kind of possession:
(86) Henn Parn with his dancing partner performed us the professional ballroom dances
www.euroiniv.ee/evana/eng/ball2001.htm
The notion of possession is somewhat stretched here, as other ways of referring to a possessed entity seem infelicitous. With a possessive adjective, the noun in (87), interpreted as an activity rather than as a result, is odd compared with the created noun in (88):
(87) You/I/she watched your/my/her dances
(88) Here is your/my/his spam
Treating the product of a preparation or transformation event as a possession extends the notion even further, as the Theme is already in the possession of the Beneficiary prior to the preparation or transformation:
(89) Well, the rest is his story? Honey, can you iron me a shirt??
www.epinions.com/hmgd_review-6689-32484DB-3a231D60-prod1
(90) You're a good boy, Joe. Now get busy and wash me some dishes.
www2.xlibris.com/bookstore.book-excerpt.asp?bookid=902
(91) I asked Mom to wash me some clothes,
www.bad-karma.net/archive/arc39.html
Sentences like (92) and (93) might be described as referring to the repossession of an entity that the Beneficiary owns:
(92) The captain shouted to the first mate, "Hurry, go to my cabin and get me my red shirt!"
www.skywaystools.com/jokes1/html
(93) his segundo would fetch him his French hat, morning frock coat and a birch tree chair.
Collections.ic.ca/skeena/Cataline.htm
Clearly, equating Benefit with Possession stretches the notion of possession considerably in many cases. Moreover, such an equation would not account for why the Benefactive exists as a phenomenon distinct from the Dative Alternation and applies to a different set of verbs.
4.3 The Latinate constraint
As in the case of the Dative Shift, verbs of Latinate origin are said to be generally ineligible for the Benefactive Alternation (Levin 1993, inter alia).5 However, we found the following examples on the web, which include verbs from all the lexico-semantic classes associated with the Benefactive Alternation:
(94) Anyone who can create me some copies in other formats, please give me a shout
www.cunningham-king.freeserve.co.uk/YCC/Front%20Page.htm
(95) Please compose me a short piece.
www.uen.org/utahlink/activities/view_activity.cgi?activity_id=7511
(96) ...promised to procure me seeds
mnlg.com/gc/species/c/cau_pla.html
(97) I shall decline your invitation to purchase me a beverage
www.fabulamag.com/contest/august_html
(98) I am going to japan to acquire me a new slave
home1.gte.net/methnews1/GLA.txt
(99) this networking helped to secure me a position
www.geocities.com/SouthBeach.Jetty/9001/collegedays.html
(100) She produced me two gorgeous sows
cavyclub.tripod.com/satinperuvian.html
(101) Can anyone..photocopy me the manual.
www.driverforum.com/harddrive/1267.html
(102) Is there someone who could construct me a set of replicas
www.taxidermy.net/forums/FeerTaxiArticles/
(103) ... a group of students performed us sketches about their school
www.fast-trac.ofw.fi/report14.htm
All these examples have a clitic pronoun in direct argument position, which might suggest that the Latinate constraint is relaxed for clitics. Indeed, we found fewer Latinate verbs with a full noun phrase as DO Beneficiary than with a pronoun, but our searches nonetheless turned up quite a few examples, including the following:
(104) ... To secure our customers success in using our technology...
Support.reachin.se/Downloadable/brochures/Core_technology.pdf
(105) Her aggressive and well planned marketing concepts, combined with her personable selling skills, guarantee her customers outstanding results.
rearch_realtors.com/pennsylvania/wyomissing/Liz_Egner_94768886.html
(106) ...in order to ensure future generations an opportunity to appreciate and enjoy the West's rich heritage ...
www.wstpc.org/About/Facts/htm
(107) SA has obtained his clients recognition all over the world...
www.pontsoft.com/empresa/plasticm/eng/welcome.htm
We conclude that there is no restriction on the Benefactive alternation that can be formulated in terms of the etymological or morphophonological properties of the verb. Rather, speakers seem to use this construction with certain Latinate verbs just as they do with semantically similar verbs of Germanic origin. In this respect, too, the Benefactive alternation resembles the Dative Shift. The restrictions on the Dative Shift have been formulated in terms of the Latin vs. Germanic origin of verbs like donate/contribute on the one hand and give/hand on the other. But this explanation does not hold, as sentences like (108) and (109) with Latinate verbs show:
(108) at her death she bequeathed him her whole property
www.fordham.edu/halsall/pwh/plut_sull1.html
(109) he offered us some hope.
www.ucsfhealth.org/childrens/profiles/sieberSamuel.html
Offer and bequeath are verbs of future possession. Pinker (1989) attributes their apparently exceptional behavior with respect to the Dative Shift to this property. But this explanation does not account for the Dative Shift with a verb like render, which denotes a transfer contemporaneous with the event time but behaves syntactically like a verb of future possession:
(110) having rendered us the slightest service
www.wtj.com/archives/suchet/suchet03a.htm
Nor can the distinction between future possession and possession at the time of the event account for the distribution of the Benefactive alternation with non-Latinate verbs. While for many verbs, in particular the creation and preparation verbs, there is necessarily a delay between the event and the benefit derived from it, benefits derived from a performance are necessarily contemporaneous with the performance, and Pinker's proposed constraint for the Dative Shift would not work for verbs like sing, dance, and recite in sentences like these:
(111) She sang them a song
(112) He danced us a little jig
(113) Recite us your latest poem
4.4 Aspect
Green states that the Benefactive alternation is compatible only with accomplishments. But again, attested data appear to refute this claim. As we saw in connection with the reflexives and the Spanish data, DO Benefactives occur with events that have a stative character:
(114) I always keep me some balled up paper by the phone.
www.jolenestrailerpark.com/Storys/4htm
(115) We all loved the flavour and the development of this wine and I said "Keep me some of this for my seafood starter". ...
www.wineoftheweek.com/hist/food200806.html
In the following example, the event denotes an open-ended activity:
(116) I'm gonna have me some fun.
ww.atlyrics.com/quotes/p/predator.html
In fact, all the creation, preparation, and performance verbs can be turned into activities, as can be seen from their compatibility with temporal for-adjuncts:
(117) She baked them waffles for hours
(118) Mom washed me my shirts for years
(119) She sang us Christmas songs for weeks
It might be argued that (117)-(119) refer to repeated accomplishments; however, this is not the case in (114)-(116). We conclude that the restriction on the Benefactive alternation cannot be formulated in terms of the aspectual properties of the event. Indeed, it is not clear how such a constraint would be semantically motivated.6
4.5 Constraints on the noun arguments
Several restrictions have been formulated in terms of the semantics of the nouns expressing the Agent, the Benefit, and the Beneficiary.
4.5.1 Devotion or intention to please
Green states that the Benefactive construction expresses the Agent's devotion to, or desire to please, the Beneficiary: the Agent intends the event to benefit the Beneficiary. We already saw examples that involve no benefit but rather an undesirable consequence. Moreover, devotion and a desire to please presuppose the animacy (or, more precisely, the sentience) of both Agent and Beneficiary. The Agent must not only be the instigator of the event and in control of it, but also capable of intent and of the feeling or attitude of devotion; conversely, the Beneficiary must be capable of appreciation. While the majority of the attested Benefactive data conform to this assertion, we found cases with inanimate Beneficiaries, which are clearly incapable of being pleased. One could argue that in the examples below, the Agents are devoted to a doll or a car, which are anthropomorphized; the "devotion constraint" is extended here.
(120) Brandy found a shirt sleeve...and made her doll a dress
www.geocities.come/Haertland/Esates/3147/RPLOTN/julkids.html
(121) ... Bought my car some new boots...
www.strangely.org/diary/200008/
There are also cases where the subject of the Benefactive is an inanimate Cause or a causing event; these data invalidate the claim that the event necessarily demonstrates devotion or a desire to please.
(122) ...the mixture of sand and clay and then let it stand in summer, the sun bakes you a brick.
www.growise.com/Articles/sprhtm/bestsoilamendments.htm
(123) Luck Found Me a Friend in You. ...
www.gocollect.com/p/cherished-teddie/special-occasions.html
(124) That deal saved me $6
www.100megsfree4.dom/blahthings/2001/march/mar16.html
These data suggest that the devotion/appreciation constraint is not a hard one. While in (124) the Beneficiary might also have played an agentive role in the deal and thus have been at least partially responsible for the action and its consequences, no intention can be ascribed to the sun or to luck.
4.5.2 Contemporaneous existence
Green includes among her constraints on the Benefactive the contemporaneity of Agent and Beneficiary; the reasoning is that the Beneficiary must be able to benefit from the product or result of the event, or from the entire event. But a web search reveals that this constraint does not hold. People perform actions for the benefit of not-yet-born Beneficiaries:
(125) This again will save future generations time in collecting data
ilil.essortment.com/craftstimecaps_rlmd.htm
(126) ...industrious cultivators of Abbasid times save their descendants expense and labour by providing them with building materials
www.gerty.ncl.ac.uk/letters/1462htm
(127) agreement the Air Force and Raytheon and Hughes have negotiated on the Advanced Medium-Range Air-to-Air Missile will save future customers about $180 million. ..
www.aerotechnews.com/starc/102797/102997d.html
We also found examples of actions performed to benefit deceased, i.e., no longer existing, Beneficiaries:
(128) We will see what we can do to get him a gravestone marker
www.islesoford.com/idcgues.html
(129) if you don't buy a gravestone for Khveodor. You kept saying, it's winter, winter ... if you don't buy him one, he will come again,
www.geocities.com.cmcarpenter28/Works/3deaths.txt
(130) I've saved my last dime to buy him a casket.
amsterdam.nettime.org/Lists-Archives/nettime 9908/msg00038.html
Presumably, the speakers/writers of these sentences consider the deceased as still being among them, and they extend the "contemporaneity" constraint proposed by Green accordingly.
4.5.3 The Beneficiary as employer
Green further states that the DO alternant is unavailable when the referent of the subject is employed by the referent of the DO Beneficiary. Her (constructed) examples are (131)-(132), where Mr. Lubin pays the speaker:
(131) I baked cakes for Mr. Lubin.
(132) *I baked Mr. Lubin cakes.
But the "employment constraint" is more subtle. We found numerous examples on the web with DO Beneficiaries where the action is performed under conditions of employment:
(133) Happy Customers .. He built me a very fully loaded system
www.grovecomputerservices/com/happy.htm
(134) Tom Mullins Web Design Studio prepares you a bid
www.tommullinsdesign.com/flag-prices.htm
(135) In 1818-1819, Benjamin Henry Latrobe built the couple a house
www.library.georgetown.edu/dept/speccoll/decatur/
What seems to account for the contrast between Green's examples and the web data is not just the broader context of employment or the commercial setting of the event, but the difference between the roles of an employer and a customer (I am grateful to Anthony Krogh and Philippa Cook, who, on separate occasions, suggested this interpretation). When the DO is unambiguously an employer, as in (136)-(139), the Benefactive alternant seems ruled out and only the for-PP alternant is grammatical; when the DO is a customer (a kind of temporary employer), the DO alternant is felicitous. We could find no sentences of the following kind:
(136) ??She stacked Wal-Mart shelves
(137) ??We cleaned The Maids houses
(138) ??They sold AIG insurance (cf. They sold insurance for AIG)
(139) ??Maradona played the Naples club soccer
Interestingly, the customer is also the Recipient of the product or result of the event, while the employer, who presumably passes the product on to his customers, is not. These data show once again the close semantic relation between Recipient and Beneficiary.
5 Benefactive vs. Dative alternation
The Benefactive alternation resembles the Dative alternation both syntactically and semantically, and the two are often lumped together. We discuss some similarities and differences and argue for a distinction between the two constructions.
5.1 Beneficiary and Recipient
Both Recipients and Beneficiaries can occur freely in PP adjuncts and, with as yet ill-understood restrictions, as direct objects of many verbs. We already saw the semantic similarity between the two kinds of roles in cases where a Beneficiary is also a Recipient:
(140) ...where he bought him some meat and a big loaf of bread.
www.penguinreaders.com/downloads.Spreads/Olivertwist167.pdf
(141) I got her some cute summer dresses.
www.livejournal.com/users/piazza_rox31/
Oliver and she both receive something and benefit from it. The semantics of verbs like buy and get include both transfer and benefit, and this may account for the syntactic similarity of these verbs with respect to the PP/DO alternation. Only the PP alternants of verbs like buy and get, which must be headed by for, not to, reveal the difference between the two kinds of arguments. Some verbs of transfer do not strictly subcategorize for a Recipient. They may instead take an adjunct with either a Recipient or a Beneficiary:
(142) Kim mailed/sent/faxed a poem for John on his birthday. (John = Beneficiary)
(143) Kim mailed/sent/faxed a poem to John on his birthday. (John = Recipient)
Both arguments can co-occur as adjuncts in either order:
(144) Kim mailed/sent/faxed a poem to Mary for John.
(145) Kim mailed/sent/faxed a poem for John to Mary.
But when one of these arguments is in direct argument position, it must be the Recipient:
(146) Mary mailed/sent/faxed John a poem for Kim. (John = Recipient)
(147) *Mary mailed/sent/faxed John a poem to Kim. (John = Beneficiary)
This fact seems to suggest a kind of "primacy" of the Recipient over the Beneficiary with respect to argument status. Passivization data reinforce this intuition. Our search results indicate that verbs that can take both a Recipient and a Beneficiary can passivize only the Recipient:
(148) I sent/mailed flowers for/to John. (John = Recipient or Beneficiary)
(149) John was sent/mailed flowers (John = Recipient only)
By contrast, many verbs that do not (also) select for a Recipient can passivize the Beneficiary (cf. also the examples in (50)-(55)):
(150) The host poured drinks for us/*to us (us = Beneficiary)
(151) The host poured us drinks
(152) We were poured drinks
Nevertheless, some verbs with a strong transfer meaning component that undergo the Benefactive alternation apparently cannot passivize the Beneficiary argument:
(153) He got her shoes for her
(154) She fetched some clothes for him
(155) He got her her shoes
(156) She fetched him his clothes
(157) ??She was gotten her shoes
(158) ??He was fetched his clothes
Another difference between Recipients and Beneficiaries shows up in sentences where they are the sole argument. Some verbs that select for a Theme and a Recipient may delete the Theme when it is background knowledge shared among the discourse participants; here, the Recipient can be the sole DO:
(159) I paid him (the money)
(160) She served them (dinner)
(161) I trade you (my stamp collection)
(162) He showed me (the trick)
But verbs in double object Beneficiary constructions cannot delete the Theme:
(163) I'll cook you *(dinner)
(164) She prepared them *(lunch)
(165) We danced the children *(a folk dance)
(166) He bought her *(the ring)
Moreover, passives with Beneficiaries seem to be restricted to events denoting the preparation of an entity and/or a transfer of possession. We found many non-transfer verbs in active constructions with a Beneficiary argument (either in a PP or as a DO), but the web yielded no corresponding passives:
(167) he composed her a song
(168) ??she was composed a song
(169) create me a website
(170) ??I was created a website
(171) wash me a shirt
(172) ??I was washed a shirt
(173) kill me some Redcoats
(174) ??I was killed some Redcoats
(175) ruin them their way of life
(176) ??they were ruined their way of life
(177) strike me a fire
(178) ??I was struck a fire
Like ingestion verbs, verbs of transfer can also select for a reflexive Recipient in DO position:
(179) Mr Graham-Cumming sent himself the same message 10,000 times...
www.spamfo.co.uk/The_News/Scams_&_Fraud/How_to_make_spam_unstoppable/2/
(180) She had promised herself a night on the town
www.skaro.com/write/trish/trish27.html
(181) she granted herself permission to lie.
www.creativenonfiction.org/thejournal/articles/issue05/05editor.htm
For Recipients, the PP alternant is attested, too, in contrast to ingestion verbs with a reflexive Beneficiary:
(182) ...a copy he sent to himself turned up in his own spam folder. ...
www.careerjournal.com/jobhunting/resumes/20040413maher.htm
(183) But Lindo had promised to herself that she would never forget...
www.hh.shuttle.de/hh/gyha/Facher/Englisch/joyluckmarl.htm
(184) ...the little birthday present she'd granted to herself.
www.grandt.com/XanderZone/stories/read.php?story=Rejoined
These data show that there is some overlap between Beneficiaries and Recipients. First, a number of verbs select for both arguments. Second, the semantic role of a noun phrase often includes aspects of both Beneficiary and Recipient, and the two cannot always be clearly distinguished. Third, the passivization data indicate a kind of "competition" between the Beneficiary and the Recipient, but suggest that only the latter has full argument status. A possible interpretation of the data is that the Beneficiary is a kind of sub-role of the Recipient, semantically more specified and syntactically more constrained. The Benefactive may be reserved for those cases not covered by the broader Dative/Recipient, namely, cases where no change of possession is necessarily involved or where the emphasis is on the benefit rather than on a change of possession. Previous analyses of the Benefactive alternation, including Green's, have cast its semantics in terms of a change of possession, characterizing verbs of performance, creation, and preparation as denoting metaphorical possession transfer. But this does not account for the fact that the two alternations differ and are available for different verb classes.
6 Semantics of the alternants
We examined the constraints that have been proposed to account for the double object alternant of the Benefactive alternation. Web data demonstrate that many of the previously formulated constraints do not strictly hold and that speakers violate them regularly. However, the violations are not random but appear to be extensions, demonstrating the "softness" of the semantic constraints. Given the semantic and syntactic overlap between the Dative Shift and the Benefactive Alternation, one might ask whether the explanations proposed for the constraints on the former can also help in understanding the latter. Krifka (1999, 2003), in his study of a large number of Dative-shifting verbs, argues for distinct meanings associated with the two alternants in many cases. He proposes that the DO syntax expresses a change of possession, where an Agent causes a Goal (or Recipient) to be in a state of possessing the Theme. The DO construction does not provide for a movement event, in contrast to the PP alternant, which expresses an event where the Agent causes the motion of the Theme towards a Goal. Assigning specific semantics to these constructions, in the spirit of Goldberg (1995), seems to work well for the wide range of verbs showing the Dative alternation that Krifka discusses. An extension of this explanation to the Benefactive Alternation might be formulated roughly as follows. Analogously to Krifka's proposed analysis of the Dative Shift, the DO alternant expresses a change of state in the Beneficiary, namely one where the referent necessarily becomes a Beneficiary and incurs the benefit. The PP alternant, on the other hand, simply expresses an event where an Agent intends a benefit for a potential Beneficiary; intention here could be interpreted as a kind of metaphorical movement of the benefit.7 Interestingly, the data from the reflexive Beneficiaries, often considered substandard or dialectal, provide some support for this analysis. Recall that these sentences involve verbs of ingestion and perception, where the Agent or Experiencer is necessarily coreferent with the Beneficiary:
(185) I'll have myself a little snack before bed
(186) I'll eat me some potted meat
(187) gonna listen me some Guns and Roses
(188) gonna watch me some uneducational TV
Unlike the other verb classes that show the alternation, these verbs do not allow the PP alternant:
(189) *I'll have a little snack before bed for myself
(190) *I'll eat some potted meat for me/myself
(191) *gonna listen to some Guns and Roses for me/myself
(192) *gonna watch some uneducational TV for me/myself
Ingestion and perception events necessarily affect, and change the state of, the ingesting or perceiving entity. These data are therefore consistent with a theory in which the semantics of the double object alternant, but not those of the PP alternant, provide for a change of state. Green's constraint requiring the contemporaneous existence of Agent and Beneficiary is a prerequisite for this analysis, while the data we found in which the DO Beneficiaries are dead or not yet in existence (sentences (125)-(130)) would be counterexamples. It seems plausible, however, that the speakers of these sentences conceptualize the Beneficiaries as living entities, capable of benefiting and undergoing a change of state.
Examining the Constraints on the Benefactive Alternation
7
Restrictions on the Benefactive alternation consistent with the data
We saw that the previously proposed constraints on the Benefactive cannot fully account for the naturally occurring data found on the web. The web data indicate that the constraints, as they have been formulated, are too rigid, and that speakers regularly extend them. But restrictions on the DO Benefactive clearly do exist. The data we examined do not permit us to formulate any hard constraints; instead, we can state only one necessary but not sufficient condition for the DO Benefactive alternation. All the attested data appear to share at least one semantic feature: the subject's control over the event.
7.1
Control with transfer verbs
We saw that the Benefactive is allowed in many cases where the subject acts in order to bestow a benefit on another person (or entity). Some verbs of obtaining, like buy, get, fetch, and steal, imply that the Agent is also the Beneficiary or Recipient of the obtained entity when they are used as simple transitives. But this default Beneficiary or Recipient can be cancelled in the presence of another Beneficiary argument:
(193) ...santa said you guys gotta buy me my presents this year. . www.sassyandseksi.com/buystuff.htm
(194) He wants someone to fetch him his shoes... www.washingtonpost.com/wpsrv/style/books/features/11980621.htm
(195) Gabby's mom stole me some pants from the hospital www.dyve.com/springman/avi/art/artnav.htm
Other verbs, like receive, denote events where the potential benefit must remain with the subject and cannot be passed on to another (non-subject) Beneficiary. The subject here is not only a necessary but also a passive Beneficiary, and it is not in control of the event in which the possession changes ownership. We could not find any examples of these verbs with the Benefactive, in either the PP or the DO alternant:
(196) *I'll receive me/you/her a little present
The subject's control over the event thus appears to be one requirement for the Benefactive alternation, necessary but not sufficient. Further evidence comes from an interesting case of polysemy presented by the verb find. It shows the Benefactive alternation (and a Benefactive reading of the corresponding for-NP phrase) when it refers to the result of a search effort that implies a goal or intent. However, we could not find instances where it refers to an accidental or serendipitous finding, as in the constructed (200):
(197) Find Me My Property. www.marinatradingpost.com/form1.html
(198) My husband ...made it his mission to find me my pink shoes. www.epinions.com/sprt-Basketball-Adidas_superstarII
(199) Find me my Perfect Mate! ... www.cutecards.net/platinum/icq/funlovetest.html
(200) ??Find me a wallet in the street
7.2
Control with consumption and perception verbs
For verbs of consumption and perception, the subject (the ingester or perceiver) is necessarily the Beneficiary. An explicit Beneficiary, coreferent with the subject, can be added to emphasize the subject's causation of, or control over, the event:
(201) I'll have myself a little snack before bed www.dedecountryhome.com/BuddyBoy3.html
(202) I reckon I'll eat me some potted meat. www.math.gatech.edu/~mullikin/res/respics.htm
(203) gonna listen me some Guns and Roses www.angelfire.com/me3/NovaSparkle/xjournal02.html
(204) Gonna watch me some uneducational TV, damnit. www.champuru.com/08-2000/08-29-2000.html
Our web searches turned up no examples of Benefactives with verbs where the subject does not cause or control the perception event. Examples are hear and see, which do not imply intention, and hence control, on the part of the perceiver:
(205) ??I hear me some noises in the street
(206) ??I saw me an accident on the road
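The chapter does not describe its search procedure in detail. What follows is therefore only a hypothetical illustration of how candidate personal-dative strings might be pulled from raw text before manual inspection. It uses a regular expression for a word followed by an object pronoun and an indefinite determiner. The pattern and the function name `candidate_hits` are assumptions, not the author's method. The third hit shows why every match still needs a human judgment: an ordinary non-coreferent Recipient matches as well.

```python
import re

# Hypothetical search pattern (an assumption, not the chapter's method):
# a word, then an object pronoun, then an indefinite determiner,
# as in "eat me some potted meat" or "have myself a little snack".
PERSONAL_DATIVE = re.compile(
    r"\b(\w+)\s+(me|myself|him|himself|her|herself|us|ourselves)\s+(some|an?)\b",
    re.IGNORECASE,
)

def candidate_hits(sentences):
    """Return (sentence, verb, pronoun) triples for sentences matching the pattern."""
    hits = []
    for sentence in sentences:
        match = PERSONAL_DATIVE.search(sentence)
        if match:
            hits.append((sentence, match.group(1).lower(), match.group(2).lower()))
    return hits

sentences = [
    "I reckon I'll eat me some potted meat.",
    "gonna listen me some Guns and Roses",
    "She gave him a book.",        # ordinary Recipient, not coreferent: a false positive
    "The sun baked the bricks.",   # no pronoun, so no match
]
print([(verb, pron) for _, verb, pron in candidate_hits(sentences)])
# [('eat', 'me'), ('listen', 'me'), ('gave', 'him')]
```

Surface patterns like this one trade precision for simplicity. A syntactically aware search of the kind discussed in the conclusions would be needed to separate coreferent Beneficiaries from ordinary Recipients.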
7.3
Inanimate controllers
Sentences like (207)-(208) below show that an inanimate Cause can have control over an event, even though it is incapable of intention and volition:
(207) The sun baked you the bricks
(208) Still, the fact is the current budget only bought us time. ... www.americanprogress.org/site/
A Cause may control an event because of its specific properties, much as in middle constructions, where an entity's particular property enables a potential event. No sentience, volition, or intention is required to cause a benefit. Control is thus the one semantic component common to the wide range of subjects in the DO alternant; all other previously proposed constraints were shown to be violated by attested data. Control alone is not a fully satisfactory semantic characterization. We expect to understand the nature of the arguments in the alternation better as more sophisticated web searches yield pertinent data.
8
Conclusions and future work
The web data show that most of the constraints that have been proposed on the basis of constructed data are soft. Speakers frequently violate and extend them, although most data fall into the kinds of patterns that previous researchers have suggested. The work reported here raises the question of what constitutes the core of a constraint and what its "fuzzy edges" are. This case study highlights the need for attested data. Constructed contrastive data, often labeled either "grammatical" or "ungrammatical," fail to capture the fuzziness of real constraints and often reflect the theoretical biases of the investigators who construct them.8 Because our main goal here was to test proposed constraints against attested data, we are not able to offer a revised, full explanation of the alternation. The data are consistent with two observations. First, the DO alternant requires the subject to have the abilities or properties needed to bestow the benefit. Second, in the DO alternant, unlike in the PP alternant, a benefit is necessarily bestowed, resulting in a change of state of the affected entity, the Beneficiary. Traditionally, linguistic research had to rely largely on data based on the investigator's intuitions; attested, unsolicited, naturally occurring data could not be obtained in a systematic fashion. Corpora represent a first step toward research based on non-constructed data. In particular, the World Wide Web represents a very large and domain-independent corpus that can be mined easily and efficiently. Our method was clumsy, and we cannot claim to have found all the relevant data. We are therefore careful not to propose a definitive account of the Benefactive construction. We plan to re-examine the Benefactive construction, as well as other ill-understood grammatical phenomena, with the help of a sophisticated search tool (Resnik and Elkiss 2004). Resnik and Elkiss's Linguist's Search Engine allows the user to search data from the Internet Archive for specific syntactic structures and to build custom-tailored corpora with pertinent hits for the purposes of empirical investigation. This tool will allow the testing and possible refinement of linguistic theories, and permit their formulation in the light of relevant data that might not otherwise be considered.
Acknowledgements
This work was supported in part by NSF Grant Number IIS-0112429. I thank Mari Olsen, Philip Resnik, Usama Soltan, Manfred Krifka, Philippa Cook, Adele Goldberg, Artemis Alexiadou, Hans Kamp, Effi Georgialou, Anthony Kroch, and Ben Haskell for their critique and helpful comments.
Notes
1. Lapata's (1999) interesting study of the Dative and Benefactive alternations uses the British National Corpus. It investigates the relative frequencies of the two alternations, the preference of the alternating verbs for the DO vs. the PP alternant, and the representative members of the participating classes, based on Levin (1993). Her quantitative focus, however, is quite different from ours and does not directly challenge the proposed constraints on the Benefactive alternation.
2. These constructions are described by Christian (1991) in her study of Appalachian speech. Christian states that they carry a "light benefactive meaning," but offers no further evidence for this assertion.
3. I thank Manfred Krifka for pointing these data out to me.
4. In principle, non-argument status would preclude the occurrence in direct argument position.
5. Verbs like contribute and donate are generally considered to be restricted from the Dative Alternation. A web search showed up no sentences with contribute and DO Recipients. It did, however, return well over a thousand sentences like can anyone donate me some ice cubes? and you can donate me some money. Apparently, whatever semantic constraint blocks the alternation for contribute does not (or no longer) block it for donate.
6. If Benefit is equated with Change of Possession, then the aspectual constraint would be better motivated, as a change of possession tends to denote an accomplishment.
7. Such an explanation seems related to the holistic constraint, which is often assumed to account for the spray-load alternation. The argument projected as the direct object is fully (holistically) affected by the event, in contrast to the argument in the PP.
8. Bresnan and Nikitina (2003) examine the distribution of the Dative Shift on the basis of attested data. They claim that much data labeled "ungrammatical" is merely "improbable," and that the probability of its occurrence is linked to information structure.
References
Anderson, Stephen R. 1971. On the Role of Deep Structure in Semantic Interpretation. Foundations of Language 7: 387-396.
Baker, C. L. 1979. Syntactic theory and the projection problem. Linguistic Inquiry 10: 533-581.
Bresnan, Joan and Tatiana Nikitina 2003. On the Gradience of the Dative Alternation. Ms., Stanford University, Stanford, CA.
Christian, Donna 1991. The personal dative in Appalachian speech. In Dialects of English, Peter Trudgill and J. Chambers (eds.), 11-19. London: Longman.
Curme, George O. 1986. A Grammar of the English Language. Vol. II: Syntax. Essex, CT: Verbatim Printing.
Daultrey, Bethan 1997. The Structure of the Double Object Construction in English. www.ucd.ie/~pages/97/daultrey
Goldberg, Adele 1995. Constructions: A Construction Grammar Approach to Argument Structure. Chicago: University of Chicago Press.
Green, Georgia 1974. Semantics and Syntactic Regularity. Bloomington, IN: Indiana University Press.
Jackendoff, Ray 1990. Semantic Structures. Cambridge, MA: MIT Press.
Krifka, Manfred 1999. Manner in Dative Alternation. In Proceedings of the 18th West Coast Conference on Formal Linguistics, Sonya Bird, Andrew Carnie, Jason D. Haugen, and Peter Norquest (eds.). Tucson, AZ: University of Arizona.
Krifka, Manfred 2003. Semantic and pragmatic conditions for the Dative Alternation. Korean Journal of English Language and Linguistics 4: 1-32.
Lapata, Maria 1999. Acquiring Lexical Generalizations from Corpora: A Case Study for Diathesis Alternations. In Proceedings of the 37th Meeting of the Association for Computational Linguistics, 266-274. College Park, MD.
Larson, Richard 1990. On the Double Object Construction. Linguistic Inquiry 19: 335-391.
Levin, Beth 1993. English Verb Classes and Alternations: A Preliminary Investigation. Chicago, IL: University of Chicago Press.
Marantz, Alec 1984. On the Nature of Grammatical Relations. Cambridge, MA: MIT Press.
Nishida, Chiyo 1994. The Spanish reflexive clitic se as an aspectual class marker. Linguistics 23: 425-258.
Pinker, Steven 1989. Learnability and Cognition: The Acquisition of Argument Structure. Cambridge, MA: MIT Press.
Resnik, Philip and Aaron Elkiss 2004. The Linguist's Search Engine: Getting Started Guide. Technical Report LAMP-TR-108/CS-TR-4541/UMIACS-TR-2003-109. University of Maryland, College Park.
Wechsler, Stephen 1995. A Non-Derivational Account of the English Benefactive Alternation. Paper presented at the 65th LSA Annual Meeting, Chicago, IL.
A Quantitative Corpus Study of German Word Order Variation
Kris Heylen
1
Introduction
Word order variation in German is an area of syntactic research in which the limitations of the types of data traditionally used in theoretical linguistics have become apparent. In a case study, we present a quantitative corpus analysis as a possible way to overcome these shortcomings. Traditionally, theoretical linguists have not had to worry much about the validity of the data on which they based their theories. For most linguists, obtaining relevant data seemed fairly unproblematic and easy. In the generative tradition, linguists could rely on the introspective grammaticality judgments of any single native speaker (including themselves) to uncover the principles of grammar. Researchers taking a cognitive-functional approach offered hermeneutic interpretations of “encountered” examples of language use to show how discourse properties or general cognitive abilities shape the grammar. Most research into word order variation in German was also based on these types of linguistic evidence. Recently, however, there has been a growing awareness across theoretical linguistics that this kind of data has limited reliability and is insufficient to deal with the complexity of linguistic phenomena. Several options are being pursued to give grammar research a more solid empirical basis. One of them is the use of large electronic corpora to collect representative data samples in which theoretically interesting patterns of linguistic structure can be discovered and validated. This kind of corpus study is applied here to word order variation in the Mittelfeld of the German clause.
2
Word order variation in German
Word order variation has been a longstanding issue in the study of German syntax. Section 2.1 briefly outlines the major research questions involved. It also points out how problems with data reliability have played an important role in this area of research. Section 2.2 introduces the specific type of word order variation that we pursue in the case study that follows.
2.1
A challenge to traditional data types
German clause topology is characterized by the so-called Klammerconstruction. The German clause has two fixed positions, called Klammer (German for ‘brackets’). In main clauses they are typically occupied by elements of the verbal group; in subordinate clauses, by a complementizer and the verbal group. These fixed positions subdivide the clause into three main “fields”. Of interest here is that the field between the two fixed positions, called the Mittelfeld (middle field), can contain multiple constituents, and that these constituents do not always occur in the same order. In particular, the relative order of co-occurring verbal arguments in the Mittelfeld, such as the subject and the direct and indirect objects, has been the subject of a lively debate within the German linguistics community. In this debate, problems of linguistic evidence have played a major role. At the debate’s climax in the mid-1980s, linguists mainly disagreed about whether word order variation in the Mittelfeld was determined primarily by grammatical or by pragmatic factors.1 Both sides kept coming up with examples that seemed to confirm the importance of the factors they had put forward while refuting the effect of those suggested by the other side. Two main problems kept the debate from being settled. Firstly, a great many factors were involved simultaneously; secondly, each factor taken individually rarely had a categorical effect. This posed enormous problems for the traditional types of linguistic evidence on which the researchers relied. Without a categorical effect of the factors, grammaticality judgments were ambiguous and not at all stable across speakers. Moreover, there was no obvious scale on which to interpret the resulting graded differences in grammaticality.
Because multiple factors were involved, it was well-nigh impossible to control for all of them simultaneously in test sentences, let alone to assess the contribution of each factor to the graded grammaticality. Towards the beginning of the 1990s, there was an increasing awareness that the main problem was indeed a methodological one: traditional introspective data were unreliable and could not cope with the phenomenon’s complexity. As a consequence, several new empirical methods were tried out.
One approach used psycholinguistic experiments based on differences in processing time (e.g. Pechmann et al. 1996, Poncin 2001); a second type of study looked at corpus material (e.g. Primus 1994; Kurz 2000). A third, more recent approach uses a sophisticated version of grammaticality judgments with a strict design, taken from multiple test subjects and analyzed with advanced statistical techniques (e.g. Keller 2000). Yet these approaches have problems of their own. Both the psycholinguistic experiments and the corpus studies continued to struggle with the multifactorial complexity of the variation. The psycholinguistic studies had to limit the number of factors because data collection was time-consuming and costly: their results were highly reliable but dealt with only one or two factors at a time. The corpus studies could investigate the effect of multiple factors in large amounts of actual usage data, but they lacked the statistical apparatus to deal with multifactorial complexity. The third method, enhanced grammaticality judgments, can gather sufficient data relatively easily and has the appropriate statistics to deal with multifactoriality. However, the heuristic status of grammaticality judgments themselves is not unproblematic. They certainly give a reliable, reproducible estimate of speakers’ post-hoc introspective judgments of sentence acceptability. It is unclear, though, whether such judgments directly reflect speakers’ competence, i.e. the grammatical constraints used in online production, while filtering out “performance noise”,2 as is claimed. More probably, acceptability judgments relate to the grammatical system in a complex and indirect way, because assessing acceptability is a separate and complex cognitive activity that potentially introduces new biases. The case study presented below opts for corpus material as its data source, for several reasons.
Firstly, usage data as collected in corpora can be seen as the primary data in linguistics: actual usage is what can be directly observed about language in reality. For a usage-based approach to grammar,3 usage is primary because it fundamentally shapes the grammar. But even for a modular, autonomous grammar, usage data are no less primary than judgments. Both are affected by performance, and in both, performance noise can in principle be filtered out. Secondly, electronic corpora are getting larger by the day, so that gathering large amounts of data is relatively easy. Thirdly, corpus data handle gradient effects fairly naturally, because such effects show up as relative frequencies. Finally, the simultaneous effect of multiple factors can be studied directly in actual usage, and these effects can be explored given the appropriate statistical apparatus. It is in this last respect that the case study below tries to improve on previous corpus studies, which used only monofactorial analyses.
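Treating gradience as relative frequency can be made concrete. Instead of a binary grammatical/ungrammatical verdict, each level of a factor receives a proportion of object-first orders. The sketch below uses a handful of invented toy observations, not the study's data, and assumes a hypothetical `(word_order, clause_type)` tuple format. It cross-tabulates word order by a single factor and turns the counts into proportions, which gives the monofactorial view the text contrasts with multifactorial analysis.

```python
from collections import Counter, defaultdict

# Invented toy observations (NOT the study's data): (word_order, clause_type)
observations = [
    ("object-first", "main"), ("object-first", "main"), ("object-first", "main"),
    ("subject-first", "main"),
    ("object-first", "subordinate"), ("subject-first", "subordinate"),
]

def relative_frequencies(obs):
    """Map each factor level to the share of object-first orders at that level."""
    counts = defaultdict(Counter)
    for order, level in obs:
        counts[level][order] += 1
    return {
        level: c["object-first"] / sum(c.values())
        for level, c in counts.items()
    }

print(relative_frequencies(observations))  # {'main': 0.75, 'subordinate': 0.5}
```

A gradient effect then appears as a difference between proportions (here 0.75 vs. 0.5) rather than as a contrast between acceptable and unacceptable sentences.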
2.2
A specific type of word order variation
The case study focuses on a specific type of word order variation in the Mittelfeld, viz. the variation that occurs when both a full NP-subject and a pronominally realized object are present in the Mittelfeld. In this case, the pronominal object can either precede the full subject NP (as in ex. 1) or follow it (ex. 2).4 The variation occurs with both direct and indirect object pronouns.
(1) Ein paar Tage später nahm ihn der SED-Chef der Uni beiseite
    a few days later took him the SED-chief of the Uni aside
    ‘A few days later the university's SED-chief took him aside’
(2) Später, als die Kommission ihn entlassen hat, sagt er, ...
    later when the commission him dismissed has says he
    ‘Later, when the commission has dismissed him, he says ...’
Although most reference grammars of German consider the object-first order (ex. 1) to be more common, the two orders seem freely interchangeable, with no obvious difference in grammaticality or meaning. Because of this, traditional heuristic methods like grammaticality judgments cannot discriminate between the examples and thus cannot detect the effect of relevant factors. Even the method of enhanced grammaticality judgments does not detect a difference in acceptability between the two variants (Keller 2000: 108ff). The few other studies that discuss this type of variation (Lenerz 1994; Zifonun 1997: 1511ff) admit that influencing factors are hard to identify. Classifying any syntactic variation as “free” variation is explanatorily unsatisfying, and more often than not it probably means that our methods for studying the phenomenon are insufficient. Admittedly, this type of variation might seem a hard nut to crack and not an obvious starting point for investigating German word order variation. It does, however, have a methodological advantage: by keeping the object pronominal, we reduce the multifactorial complexity, because pronouns vary less than full NPs in, e.g., lexical form, length, or discourse given/new status. In what follows, we examine whether a quantitative corpus study can break the deadlock of traditional grammaticality judgments.
3
A quantitative corpus study
Section 3.1 discusses the corpus that was used and how the data were collected. Section 3.2 introduces the factors whose effect on word order we examine in section 4.
3.1
Collecting the data
The case study is based on data from the NEGRA corpus.5 The corpus was compiled at the University of Saarbrücken. It consists of 20,602 morphosyntactically annotated sentences (355,096 tokens) taken from the 1992 editions of a local Frankfurt-based daily newspaper (Frankfurter Rundschau). Strictly speaking, this means that our findings only hold for this specific type of language use, i.e. language use from this register (newspaper articles), location (the Frankfurt region), and time period (1992). Yet newspaper texts cover many topical domains and are written by multiple authors, often with different backgrounds, so the patterns we find in this type of data might well be representative of modern German usage in general. Whether there are regional, sociolinguistic, or register differences is, of course, a question open to further empirical investigation. For the data collection, relevant observations were defined as all clauses with a nominal subject and one pronominal object in the Mittelfeld. Taking advantage of NEGRA’s morphosyntactic annotation, we used a self-written Perl script to extract these observations automatically. We then manually checked precision on all retrieved observations and recall on 20% of the corpus. The script had found all relevant observations, but 15% of the clauses it retrieved were spurious. After removing the irrelevant clauses, we were left with a total of 995 observations. This means that the construction is quite common: it is present in about 5% of the corpus’ sentences. The observations were then annotated for the response variable word order (subject-first vs. object-first) and for a number of factors that are traditionally mentioned as relevant in the literature on word order variation. These factors had to be operationalized so that each observation could be assigned a unique value during a process of mostly manual annotation.
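The selection criterion can be sketched as a simple filter. Neither NEGRA's actual annotation format nor the original Perl script is reproduced here. The sketch assumes a hypothetical, already-parsed representation in which each clause lists its Mittelfeld constituents in order, each tagged with an illustrative grammatical function and category label. It keeps clauses with a full-NP subject and one pronominal object, records the order as the response variable, and recomputes the coverage figure from the counts reported in the text (995 observations out of 20,602 sentences).

```python
from dataclasses import dataclass

@dataclass
class Constituent:
    function: str   # hypothetical label: "subject", "object", ...
    category: str   # hypothetical label: "NP" (full noun phrase) or "PRON"

def word_order(mittelfeld):
    """Return 'subject-first' or 'object-first' for a relevant clause, else None.

    A clause counts as relevant if its Mittelfeld contains one full-NP
    subject and exactly one pronominal object.
    """
    subj = [i for i, c in enumerate(mittelfeld)
            if c.function == "subject" and c.category == "NP"]
    objs = [i for i, c in enumerate(mittelfeld)
            if c.function == "object" and c.category == "PRON"]
    if len(subj) != 1 or len(objs) != 1:
        return None
    return "subject-first" if subj[0] < objs[0] else "object-first"

# Mittelfeld of example (1): "... nahm [ihn] [der SED-Chef der Uni] beiseite"
ex1 = [Constituent("object", "PRON"), Constituent("subject", "NP")]
print(word_order(ex1))              # object-first

# Coverage from the reported counts: 995 observations in 20,602 sentences
print(round(995 / 20602 * 100, 1))  # 4.8
```

Automatic retrieval of this kind over-generates, which is why the text pairs it with manual precision and recall checks.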
Basically, this process takes the form of an iterative annotation loop. When a new observation does not fit the current operationalization of a factor, the operationalization has to be revised to accommodate it. All previously annotated observations then have to be checked again to see whether they still comply with the new operationalizations of the different factors. The loop continues until all observations are annotated adequately. It goes without saying that this is a very time-consuming process, but it is probably the price that has to be paid for reliable data. The final outcome of this annotation is a so-called data matrix, which states for every observation which word order was attested and which value was observed for each factor. This data matrix is then amenable to further statistical analysis. Before turning to this analysis, however, the next section discusses the factors that were investigated.
3.2
The factors
Because the differences in grammaticality and meaning between the word order variants in our study are so elusive, it is very hard to make an initial guess about which factors might have an influence. We therefore consulted the (vast) literature on word order and selected a set from the many factors mentioned there as relevant. Seven of them are discussed in this study (see Table 1 for an overview). Most of them pertain to properties of the subject, because its realization as a full NP allows for more diversity than the pronominal object.
Table 1. Factor overview
   FACTOR                                         VALUES
1  Case of the pronominal object                  [dative] [accusative]
2  Semantic role of the subject                   [agent] [recipient] [theme]
3  Length difference between subject and object   number of syllables
4  Given/new status of the subject                nine-point ordinal scale
5  Animacy of the subject                         [animate] [inanimate]
6  Pronoun type of the object                     [personal] [reflexive]
7  Clause type                                    [main] [subordinate]
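One row of the resulting data matrix can be pictured as a record that holds the response variable and the seven factors of Table 1. Several details are assumptions made for illustration: the field names, the Python types, the encoding of the given/new scale as an integer from 1 to 9, and the sign convention for the length difference (subject minus object syllables). The chapter does not specify a storage format, and the example row is invented.

```python
from dataclasses import dataclass, asdict
from typing import Literal

@dataclass(frozen=True)
class Observation:
    """One row of the data matrix: the response plus the seven factors of Table 1."""
    word_order: Literal["subject-first", "object-first"]   # response variable
    object_case: Literal["dative", "accusative"]           # factor 1
    subject_role: Literal["agent", "recipient", "theme"]   # factor 2
    length_difference: int  # factor 3: subject minus object, in syllables (assumed sign)
    subject_givenness: int  # factor 4: value on the nine-point given/new scale
    subject_animate: bool   # factor 5
    object_pronoun: Literal["personal", "reflexive"]       # factor 6
    clause_type: Literal["main", "subordinate"]            # factor 7

    def __post_init__(self):
        # Literal types are not enforced at runtime, so check the ordinal scale here.
        if not 1 <= self.subject_givenness <= 9:
            raise ValueError("given/new value must lie on the 1-9 scale")

# Invented illustrative row, NOT an annotation taken from the study.
row = Observation("object-first", "accusative", "agent", 7, 5, True, "personal", "main")
matrix = [asdict(row)]  # a list of dicts, ready for tabulation
print(matrix[0]["clause_type"], matrix[0]["length_difference"])  # main 7
```

Keeping every factor as a named field makes the later statistical steps straightforward. Cross-tabulating one factor against word order, or fitting a multifactorial model, then only requires selecting columns.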
3.2.1
Case of the pronominal object
Grammatical case is probably the most basic factor in word order phenomena. Indeed, most German grammars explicitly refer to a constituent’s case to describe preferred orderings. Grammatical case (nominative, accusative, dative) is mostly morphologically marked in German. Because our study holds the presence of a nominative subject constant, the only variation in case occurs with the pronominal object, which can be either accusative or dative.
3.2.2
Semantic role of the subject
The semantic role, or theta role, of a constituent is the role its referent plays in the action denoted by the verb. It is a factor that figures prominently in many generative accounts of word order. In this study we distinguish three roles for the subject referent. Agent: the referent performs an action or causes an action to take place. Recipient: the referent is the active recipient of objects or stimuli. Theme: the referent is itself inactive in the action or state denoted by the verb.
3.2.3
Length difference between subject and object
The idea that length difference has an effect on word order was introduced by Jacob Wackernagel in the late 19th century. Behaghel later restated it as his “law of growing constituents”, which states that shorter constituents tend to precede longer ones. More recently, Hawkins’ (1994) EIC (Early Immediate Constituents) theory claims that length difference is the main factor determining word order. In this study we measured the length difference between subject and object in syllables rather than in words, because in a compound-friendly language like German individual words can already differ greatly in length. Because the object is always a one- or two-syllable word, this measure mainly reflects the length of the subject.
3.2.4
Given/new status of the subject
Given/new status refers to whether or not a referent was previously mentioned in the discourse. Its influence on word order was a basic assumption of the Prague School. It is a factor that is notoriously difficult to operationalize, because of the many intermediate categories between completely new and totally given. The nine-point scale we use here for the subject referent was developed as an opportunistic tool by Grondelaers (2000), based on earlier work by Ellen Prince (1981) and Mira Ariel (1991). It has since proven its applicability in several analyses. The scale (Table 2) classifies a referent by its degree of accessibility, given the current discourse model.
Table 2. Given/new scale
VALUE  REFERENT ACCESSIBILITY IN CURRENT DISCOURSE
1      Not accessible and unconstrained
2      Not accessible but constrained by the context
3      Not accessible but constrained by an anchor referent
4      Accessible through encyclopaedic knowledge
5      Inferable from an anchor referent
6      Accessible in the wider linguistic context
7      Inferable from the near linguistic context
8      Accessible in the near linguistic context
9      Accessible in the immediate speech context
3.2.5
Animacy of the subject
The reference grammar by Zifonun et al. (1997) proposes animacy, i.e. whether a constituent’s referent is animate or inanimate, as the main determinant of German constituent order. In this study, we only look at the animacy of the subject referent. A substantial part of the objects, viz. those realized as reflexive pronouns, would merely mirror the subject’s animacy.
3.2.6
Pronoun type of the object
Although reflexive pronouns also participate in the variation, their semantics differ from those of personal pronouns, which could influence word order. Because of the reporting style of newspaper text, most pronouns are third-person pronouns. For reflexives, the third-person form sich is in fact the only one observed.
3.2.7
Clause type
In this study, clause type refers to the difference between main clauses and subordinate clauses. Main clauses in German have the finite verb in second position (occupying the first Klammer). Subordinate clauses have it in (near-)final position (the second Klammer). Hawkins’ (1994) theory predicts that this will weaken the tendency in subordinate clauses to put shorter constituents before longer ones. In our case, this would mean fewer short pronominal objects before long full-NP subjects in subordinate clauses.
4
Statistical exploration
After the annotation process described above, we had obtained a data matrix. For every observation, it states the order in which subject and object appeared and the value observed for each of the seven factors. Statistical analysis now allows us to examine the correlations between word order and the factors. This can be done from two perspectives. Suppose some theory of grammar has led us to formulate a hypothesis that makes an explicit prediction about the correlation between word order and some factor(s). We can then test whether the data confirm this hypothesis. This is called confirmatory analysis. Alternatively, we may have annotated our observations for a number of factors without yet having an explicit hypothesis about which factors determine word order, and simply want to learn more about their effects. In that case we can explore the correlations between factors and word order in the data. This is called exploratory analysis. Such exploration is meant to give better insight into the data, which may well lead to new theoretical understanding and explicit hypotheses; these hypotheses can in turn be tested. With the word order variation studied here, the main problem was precisely its elusive character, which prevented us from formulating an explicit hypothesis about what determines the variation. Instead, we chose to look at a number of factors suggested by the literature. An exploratory statistical analysis can now give us an idea of which of these factors are actually relevant, in what way, and to what extent. Note that a statistical analysis will not in itself provide an explanation; rather, it uncovers patterns that themselves need explaining. Such analyses help to expose empirical facts that are not apparent at first sight, and these facts should be the input for finding explanations and building theories.
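The confirmatory route can be illustrated with a single hypothesis. Take the prediction derived from Hawkins (1994) that subordinate clauses show fewer object-first orders. A minimal sketch could test it with a Pearson chi-square test of independence on a 2×2 table of clause type by word order. The counts below are invented, and the choice of test is an assumption: the chapter does not say which test it would use here. The p-value relies on the identity that, for one degree of freedom, the chi-square upper tail equals erfc(√(χ²/2)).

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence (no continuity correction).

    table = [[a, b], [c, d]]; rows = factor levels, columns = word orders.
    Returns (chi2, p) for 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # Upper tail of chi-square with df = 1: P(X > x) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Invented counts (NOT the study's): rows main/subordinate,
# columns object-first/subject-first
chi2, p = chi_square_2x2([[90, 10], [70, 30]])
print(round(chi2, 2), p < 0.05)  # 12.5 True
```

A test like this answers one pre-formulated question about one factor. The exploratory, multifactorial analyses described in the text are needed when no such hypothesis is available yet.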
250
Kris Heylen
A good theory will try to generalize and make predictions for cases other than the ones it started from. Whether these predictions are borne out by the “empirical facts” is then a question to be addressed in further analyses. The analyses presented below explore the data at increasing levels of complexity, using increasingly advanced statistical techniques. The main concern will be the kind of information these techniques provide, not their technical details.6 First, we look at the relative order of subject and object per se to see which order is dominant. Next, we investigate the effect of each factor on word order separately. Then, we examine the effect of one factor while controlling for a second factor. Finally, we assess the simultaneous effect of multiple factors on word order.
4.1 The proportion of object-first and subject-first
In studying syntactic variation, an obvious first question is: how much variation is there? Are both orderings of subject and object equally frequent, or is there a clear default order? In our data, 889 out of 995 observations have object-first, whereas only 106 have subject-first. This proportion of 89.3% object-first confirms what most grammars of German say, viz. that object-first is the default order. For a future theoretical interpretation, this probably means that subject-first will be considered the marked order and object-first the unmarked one. We might also want to know how reliable this proportion is: how sure can we be that the proportion of object-first in our data is a good estimate of the proportion of object-first in general? In fact, this is the basic question underlying all of statistics: how reliable are results obtained from a sample of observations as estimates for all possible observations? Intuitively, the more observations we take into consideration, the more reliable our results will be. Statisticians use this property to compute confidence intervals from a sample. A 95% confidence interval for a proportion is computed by a procedure that, over repeated sampling, yields intervals containing the true proportion in 95% of cases; informally, it is the range within which we can be 95% confident that the true proportion for all possible observations lies. The more observations we take into account, the narrower the interval becomes. The interval applies to observations made under conditions similar to those under which the sample was collected. In our case, these would be roughly all observations from articles in local newspapers from central Germany in the early 1990s. Under these conditions, the 95% confidence interval for the proportion of object-first lies between 87.1% and 91.0%. We can now reliably say that object-first is indeed the default order.
A Quantitative Corpus Study of German Word Order Variation
251
4.2 The effect of separate factors
Above, we introduced seven factors that we think might influence the relative order of a full subject NP and a pronominal object. In this section, we examine the effect of each of the seven factors on word order separately. For each factor, we look at two statistics: the χ² test7 tells us whether there is an association between the factor and word order; if there is one, a measure of association tells us how strong it is and in which direction it goes.
4.2.1 Case of the pronominal object
Table 3 makes clear that case has little effect on word order. The proportions of object-first and subject-first are virtually identical for observations with an accusative and with a dative pronoun, and the χ² test finds no significant association (p = 0.94).8 This lack of effect is somewhat unexpected, because nearly all reference grammars consider case relevant for word order. However, even if case per se is not important, it might still have an effect under more specific conditions, as we will see below (4.3).

Table 3. Word order by case of object
Case          OBJECT FIRST       SUBJECT FIRST
ACCUSATIVE    724 / 810 (89%)    86 / 810 (11%)
DATIVE        165 / 185 (89%)    20 / 185 (11%)
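As a minimal illustration of the χ² test used throughout this section, the Python sketch below recomputes a Pearson χ² statistic for Table 3. It assumes the plain Pearson test without continuity correction (footnote 7 may specify a different variant) and uses only the standard library, obtaining the 1-df p-value from the identity P(χ² > x) = erfc(√(x/2)). The function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test without continuity correction for a 2x2 table.

    `table` is [[a, b], [c, d]]: rows are the two factor levels,
    columns are the OBJECT FIRST and SUBJECT FIRST counts.
    Returns (chi2, p) for 1 degree of freedom.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of factor and word order.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 df, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Table 3: ACCUSATIVE and DATIVE rows; OBJECT FIRST and SUBJECT FIRST columns.
table3 = [[724, 86], [165, 20]]
chi2, p = chi_square_2x2(table3)
print(f"chi2 = {chi2:.4f}, p = {p:.2f}")  # chi2 = 0.0059, p = 0.94
```

With these counts the statistic is close to zero, and the p-value agrees with the 0.94 reported above.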
4.2.2 Semantic role of the subject
The semantic role of the subject has a significant effect on word order (χ² test, p < 0.01). Agent subjects precede the object relatively more often than recipient subjects, and recipient subjects in turn precede the object more often than theme subjects. Increased agentivity of the subject thus seems to favour subject-before-object order, as the literature leads us to expect. If we consider the three semantic roles as levels on a scale of agentivity, the so-called gamma index gives a measure of the strength of the association between agentivity and word order. The index ranges from -1 (perfect negative association) through 0 (no association) to 1 (perfect positive association). Here the gamma index is -0.49. The negative sign means that higher levels of agentivity correspond to relatively less object-first (i.e. relatively more subject-first). The absolute value of 0.49 indicates a moderate association.

Table 4. Word order by semantic role of subject
Role         OBJECT FIRST       SUBJECT FIRST
AGENT        466 / 547 (85%)    81 / 547 (15%)
RECIPIENT    104 / 116 (90%)    12 / 116 (10%)
THEME        319 / 332 (96%)    13 / 332 (4%)
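The gamma index can be computed directly from the counts in Table 4. The Python sketch below is a minimal implementation of Goodman and Kruskal's gamma for an ordered factor crossed with the two word orders. It assumes that agentivity is ordered THEME < RECIPIENT < AGENT and that OBJECT FIRST counts as the higher response category; tied pairs are ignored, as gamma requires. The function name is hypothetical.

```python
def goodman_kruskal_gamma(rows):
    """Goodman-Kruskal gamma for an ordered factor against a binary word order.

    `rows` holds (object_first, subject_first) counts, ordered from the lowest
    to the highest level of the factor. Object-first is the higher response.
    """
    concordant = 0
    discordant = 0
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            of_low, sf_low = rows[i]
            of_high, sf_high = rows[j]
            # Concordant: the higher factor level has the higher response.
            concordant += sf_low * of_high
            # Discordant: the higher factor level has the lower response.
            discordant += of_low * sf_high
    return (concordant - discordant) / (concordant + discordant)


# Table 4 ordered by increasing agentivity: THEME, RECIPIENT, AGENT
table4 = [(319, 13), (104, 12), (466, 81)]
print(round(goodman_kruskal_gamma(table4), 2))  # -0.49
```

Under these assumptions the result matches the reported -0.49. Ordering the rows of Table 5 from the largest to the smallest length difference gives -0.44 in the same way.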
4.2.3 Length difference between subject and object
Length difference between subject and object has a significant effect on word order (χ² test, p < 0.01). Smaller length differences lead to relatively more subject-first, as Table 5 also shows. Because the pronominal object is always short, length difference mainly reflects the length of the subject. This means that shorter subjects precede the object relatively more often than longer ones, which is what we expect from Behaghel’s “law of growing constituents”. The gamma index of -0.44 reflects a moderate inverse association between object-first and smaller length differences.

Table 5. Word order by length difference
Syllables   OBJECT FIRST       SUBJECT FIRST
0-3         299 / 362 (82%)    63 / 362 (18%)
3-6         212 / 233 (91%)    21 / 233 (9%)
>6          378 / 400 (95%)    22 / 400 (5%)
4.2.4 Given/new status of the subject
Table 6 shows that the given/new status of the subject referent, as measured by its degree of accessibility, does not have a perfectly linear effect.9 Empirical data do not always show the neat results we would like. However, more accessible subjects do seem to precede the object relatively more often, as we would expect from the theories of the Prague school. The Mantel-Haenszel (MH) χ² test confirms that there is a linear association (p < 0.01). A gamma index of 0.28 indicates that this linear association is relatively weak.

Table 6. Word order by given/new status of the subject
Values   OBJECT FIRST       SUBJECT FIRST
1        162 / 169 (96%)    7 / 169 (4%)
2        48 / 55 (87%)      7 / 55 (13%)
3        24 / 24 (100%)     0 / 24 (0%)
4        103 / 117 (88%)    14 / 117 (12%)
5        101 / 106 (95%)    5 / 106 (5%)
6        255 / 291 (88%)    36 / 291 (12%)
7        140 / 159 (88%)    19 / 159 (12%)
8        56 / 74 (76%)      18 / 74 (24%)
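The Mantel-Haenszel test for linear association can be sketched as M² = (N - 1)r², where r is the Pearson correlation between the ordinal scores and the binary word order over all N observations. The Python sketch below assumes equally spaced scores 1 to 8 for the values in Table 6, since the text does not say which scores were used; the function name is hypothetical.

```python
import math


def mantel_haenszel_trend(rows, scores):
    """Mantel-Haenszel chi-square for linear association: M2 = (N - 1) * r**2.

    `rows` holds (object_first, subject_first) counts per ordered level;
    `scores` holds one numeric score per level (assumed equally spaced here).
    Word order is coded y = 1 for subject-first and y = 0 for object-first.
    Returns (M2, p) for 1 degree of freedom.
    """
    totals = [of + sf for of, sf in rows]
    n = sum(totals)
    sum_x = sum(s * t for s, t in zip(scores, totals))
    sum_xx = sum(s * s * t for s, t in zip(scores, totals))
    sum_y = sum(sf for _, sf in rows)  # y is 0/1, so sum(y**2) == sum(y)
    sum_xy = sum(s * sf for s, (_, sf) in zip(scores, rows))
    sxx = sum_xx - sum_x ** 2 / n
    syy = sum_y - sum_y ** 2 / n
    sxy = sum_xy - sum_x * sum_y / n
    r_squared = sxy ** 2 / (sxx * syy)
    m2 = (n - 1) * r_squared
    p = math.erfc(math.sqrt(m2 / 2))  # upper tail of chi-square with 1 df
    return m2, p


# Table 6, values 1 to 8: (OBJECT FIRST, SUBJECT FIRST)
table6 = [(162, 7), (48, 7), (24, 0), (103, 14),
          (101, 5), (255, 36), (140, 19), (56, 18)]
m2, p = mantel_haenszel_trend(table6, scores=list(range(1, 9)))
print(f"M2 = {m2:.1f}, p = {p:.4f}")
```

With these assumed scores M² comes out at roughly 14.8 (p < 0.001), consistent with the reported p < 0.01; other scoring schemes would give somewhat different values.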
4.2.5 Animacy of the subject
In Table 7, animate subjects precede pronominal objects more often than inanimate subjects. This effect is statistically significant (χ² test, p < 0.01) and fits the effect of animacy that Zifonun (1997) predicts. Both word order and animacy have only two values, and the measure of association generally used in such cases is the odds ratio. Here, this is the odds in favour of subject-first with animate subjects divided by the odds in favour of subject-first with inanimate subjects, which gives a value of 2.33. The odds of subject-first with animate subjects are thus more than twice the odds with inanimate subjects, a moderately strong association.

Table 7. Word order by animacy of the subject
Animacy      OBJECT FIRST       SUBJECT FIRST
ANIMATE      532 / 614 (87%)    82 / 614 (13%)
INANIMATE    357 / 381 (94%)    24 / 381 (6%)
4.2.6 Pronoun type of the object
Personal pronouns follow the subject significantly more often than reflexive pronouns (χ² test, p < 0.01). Apparently, the fact that reflexive pronouns do not introduce a separate referent into the sentence’s meaning has consequences for their ordering. This finding from our data exploration calls for a theoretical interpretation. The odds ratio of 2.96 (the odds of subject-first with personal pronouns relative to those with reflexive pronouns) indicates a moderately strong association.

Table 8. Word order by pronoun type of the object
Pronoun type   OBJECT FIRST       SUBJECT FIRST
PERSONAL       141 / 179 (79%)    38 / 179 (21%)
REFLEXIVE      748 / 816 (92%)    68 / 816 (8%)
4.2.7 Clause type
The marked order, subject before object, is much more frequent in subordinate clauses than in main clauses. There is indeed a significant association between clause type and word order (χ² test, p < 0.01). This is another finding we will have to interpret further after completing our data exploration. The odds ratio of 7.41 (the odds of subject-first in subordinate clauses are more than seven times those in main clauses) indicates a very strong association.

Table 9. Word order by clause type
Clause type    OBJECT FIRST       SUBJECT FIRST
MAIN           646 / 674 (96%)    28 / 674 (4%)
SUBORDINATE    243 / 321 (76%)    78 / 321 (24%)
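For two-valued factors the odds ratio is simply a ratio of two odds computed from the cell counts. The short Python sketch below (function name hypothetical) recomputes the odds ratios reported for Tables 8 and 9, taking the odds of subject-first in the first-named group over those in the second.

```python
def odds_ratio(sf_a, of_a, sf_b, of_b):
    """Odds of subject-first in group A divided by the odds of subject-first in group B."""
    return (sf_a / of_a) / (sf_b / of_b)


# Table 8: PERSONAL (38 subject-first, 141 object-first) vs REFLEXIVE (68, 748)
print(round(odds_ratio(38, 141, 68, 748), 2))  # 2.96
# Table 9: SUBORDINATE (78, 243) vs MAIN (28, 646)
print(round(odds_ratio(78, 243, 28, 646), 2))  # 7.41
```

An odds ratio of 1 would mean no association; values further from 1 indicate stronger associations.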
4.3 Stratified analysis
In the one-by-one analysis of factors, a surprising result was the lack of an effect of object case on word order. Although there is no general effect, there might be an effect for specific types of pronominal objects. We therefore consider the effect of case for personal and reflexive pronouns separately. This is done in a so-called stratified analysis: we examine the effect of one factor (case) on word order while controlling for a second factor (pronoun type). Table 10 shows that there is a significant, moderately strong effect of case with personal pronouns (χ² test, p < 0.01, odds ratio = 2.92), but no such effect with reflexive pronouns (χ² test, p = 0.60). A further test statistic, the Breslow-Day test, checks whether the effect of case is significantly different for reflexive and personal pronouns. With a p-value of 0.02, we can reject the hypothesis that the effect of case is the same for both pronoun types. One reason might be case syncretism: the reflexive sich has the same form for dative and accusative, whereas personal pronouns have distinct forms for these cases. There may also be other reasons, but in any case, the stratified analysis has revealed an interesting difference that calls for a theoretical interpretation.

Table 10. Word order by pronoun case, stratified for pronoun type
Pron. type   Case         OBJECT FIRST       SUBJECT FIRST
PERSONAL     ACCUSATIVE   69 / 97 (71%)      28 / 97 (29%)
             DATIVE       72 / 82 (88%)      10 / 82 (12%)
REFLEXIVE    ACCUSATIVE   655 / 713 (92%)    58 / 713 (8%)
             DATIVE       93 / 103 (90%)     10 / 103 (10%)
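The Breslow-Day test compares each stratum table with the counts expected under a single common odds ratio, here estimated with the Mantel-Haenszel estimator. The Python sketch below is a minimal version for the two strata of Table 10 (so the statistic has 1 degree of freedom) and omits Tarone's correction; the text does not say whether that correction was used, and all function names are hypothetical.

```python
import math


def mh_common_odds_ratio(strata):
    """Mantel-Haenszel pooled odds ratio for 2x2 strata [[a, b], [c, d]]."""
    num = sum(a * d / (a + b + c + d) for (a, b), (c, d) in strata)
    den = sum(b * c / (a + b + c + d) for (a, b), (c, d) in strata)
    return num / den


def expected_a(table, psi):
    """Expected count of cell a given the table's margins and common odds ratio psi."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row1, col1 = a + b, a + c
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Solve x * (n - row1 - col1 + x) = psi * (row1 - x) * (col1 - x) for x.
    qa = 1 - psi
    qb = (n - row1 - col1) + psi * (row1 + col1)
    qc = -psi * row1 * col1
    if abs(qa) < 1e-12:  # psi == 1: the equation is linear
        return -qc / qb
    root = math.sqrt(qb * qb - 4 * qa * qc)
    for x in ((-qb + root) / (2 * qa), (-qb - root) / (2 * qa)):
        if lo <= x <= hi:  # keep the root compatible with the margins
            return x
    raise ValueError("no admissible root")


def breslow_day(strata):
    """Breslow-Day test (no Tarone correction) for two 2x2 strata; returns (stat, p)."""
    if len(strata) != 2:
        raise ValueError("this sketch handles exactly two strata (1 df)")
    psi = mh_common_odds_ratio(strata)
    stat = 0.0
    for table in strata:
        (a, b), (c, d) = table
        ea = expected_a(table, psi)
        eb, ec = a + b - ea, a + c - ea
        ed = d - a + ea  # equals n - row1 - col1 + ea
        var = 1 / (1 / ea + 1 / eb + 1 / ec + 1 / ed)
        stat += (a - ea) ** 2 / var
    return stat, math.erfc(math.sqrt(stat / 2))  # 1-df upper tail


# Table 10 strata; rows ACCUSATIVE/DATIVE, columns SUBJECT FIRST/OBJECT FIRST.
personal = [[28, 69], [10, 72]]
reflexive = [[58, 655], [10, 93]]
stat, p = breslow_day([personal, reflexive])
print(f"Breslow-Day = {stat:.2f}, p = {p:.3f}")
```

With the counts from Table 10 the statistic is roughly 5.6, with p close to the 0.02 reported above.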
4.4 Multifactorial analysis
In the previous sections, we looked at the effect of the seven factors separately, or at the effect of one factor while controlling for a second factor. In the actual data, however, these seven factors are at work simultaneously. To investigate simultaneous effects, multifactorial statistical techniques are used. They address questions such as the following. Considering all factors at the same time, which ones actually have an effect? What is their combined effect, and what is each factor’s contribution to it? Which factor is the most important one? And how well can we model the variation with the factors considered so far? First, we will look at a logistic regression model. Next, we discuss a Classification and Regression Tree (CART).
4.4.1 Logistic regression model
A logistic regression model is an advanced statistical technique that estimates the simultaneous effect of the factors on word order. First, a stepwise selection procedure determines which factors actually have an effect when all seven factors are considered simultaneously. The procedure selects factors in order of effect strength and adds them to the model until no remaining factor makes a significant contribution to the effect on word order. In Table 11, we see that five factors with a significant effect (p < 0.01) are selected for the model. Clause type has the strongest effect, followed by length difference, subject animacy, pronoun type and subject givenness. The procedure also selects one interaction, between clause type and pronoun type: apparently, the effect of pronoun type is not the same in main and subordinate clauses. The model then states the combined effect of all selected factors on the odds of having subject-first. Because a probability must lie between 0 and 1 and odds between 0 and infinity, the effect is modelled on the logarithmic scale of the log-odds, which can take any real value; exponentiating an estimate gives the corresponding odds ratio.
Table 11. Logistic regression model
Factor
DF
Estimate
Odds ratio
p
INTERCEPT
1
-5.114
1) Clause type (subordinate)
1
2.512
2) Length diff. (small)
1
0.731
2.078