Domains and Dynasties The Radical Autonomy of Syntax
Studies in Generative Grammar The goal of this series is to publish those texts that are representative of recent advances in the theory of formal grammar. Too many studies do not reach the public they deserve because of the depth and detail that make them unsuitable for publication in article form. We hope that the present series will make these studies available to a wider audience than has hitherto been possible.
Editors: Jan Koster, Henk van Riemsdijk
Other books in this series:
1. Wim Zonneveld A Formal Theory of Exceptions in Generative Phonology
2. Pieter Muysken Syntactic Developments in the Verb Phrase of Ecuadorian Quechua
3. Geert Booij Dutch Morphology
4. Henk van Riemsdijk A Case Study in Syntactic Markedness
5. Jan Koster Locality Principles in Syntax
6. Pieter Muysken (ed.) Generative Studies on Creole Languages
7. Anneke Neijt Gapping
8. Christer Platzack The Semantic Interpretation of Aspect and Aktionsarten
9. Noam Chomsky Lectures on Government and Binding
10. Robert May and Jan Koster (eds.) Levels of Syntactic Representation
11. Luigi Rizzi Issues in Italian Syntax
12. Osvaldo Jaeggli Topics in Romance Syntax
13. Hagit Borer Parametric Syntax
14. Denis Bouchard On the Content of Empty Categories
15. Hilda Koopman The Syntax of Verbs
16. Richard S. Kayne Connectedness and Binary Branching
17. Jerzy Rubach Cyclic and Lexical Phonology: the Structure of Polish
18. Sergio Scalise Generative Morphology
19. Joseph E. Emonds A Unified Theory of Syntactic Categories
20. Gabriella Hermon Syntactic Modularity
21. Jindrich Toman Studies on German Grammar
22. J. Guéron/H.-G. Obenauer/J.-Y. Pollock (eds.) Grammatical Representation
23. S.J. Keyser/W. O'Neil Rule Generalization and Optionality in Language Change
24. Julia Horvath FOCUS in the Theory of Grammar and the Syntax of Hungarian
25. Pieter Muysken and Henk van Riemsdijk Features and Projections
26. Joseph Aoun Generalized Binding. The Syntax and Logical Form of Wh-interrogatives
27. Ivonne Bordelois, Heles Contreras and Karen Zagona Generative Studies in Spanish Syntax
28. Marina Nespor and Irene Vogel Prosodic Phonology
29. Takashi Imai and Mamoru Saito (eds.) Issues in Japanese Linguistics
Jan Koster
Domains and Dynasties The Radical Autonomy of Syntax
1987 FORIS PUBLICATIONS Dordrecht - Holland/Providence - U.S.A.
Published by: Foris Publications Holland
P.O. Box 509
3300 AM Dordrecht, The Netherlands
Sole distributor for the U.S.A. and Canada: Foris Publications USA, Inc.
P.O. Box 5904
Providence RI 02903 U.S.A.
CIP-DATA
Koster, Jan
Domains and Dynasties: the Radical Autonomy of Syntax / Jan Koster. - Dordrecht [etc.] : Foris. - (Studies in Generative Grammar; 30)
With ref.
ISBN 90 6765 270 9 paper
ISBN 90 6765 269 5 bound
SISO 805.4 UDC 801.56
Subject heading: syntax; generative grammar
ISBN 90 6765 269 5 (Bound) ISBN 90 6765 270 9 (Paper)
© 1986 Foris Publications - Dordrecht
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission from the copyright owner. Printed in The Netherlands by ICG Printing, Dordrecht.
Contents
Preface
Chapter 1. The Invariant Core of Language
1.1. The research program
1.2. The configurational matrix
1.3. Domain extensions
1.4. Conclusion
Notes
Chapter 2. Levels of Representation
2.1. Introduction
2.2. D-structure
2.3. NP-structure
2.4. Logical Form
2.5. Conclusion
Notes
Chapter 3. Anaphoric and Non-Anaphoric Control
3.1. Introduction
3.2. Where binding and control meet
3.3. Some minimal properties of control
3.4. Infinitival complements in Dutch
3.5. Asymmetries between N and V
3.6. Conclusion
Notes
Chapter 4. Global Harmony, Bounding, and the ECP
4.1. Introduction
4.2. On the nature of local domains
4.3. The Cinque-Obenauer hypothesis
4.4. The parametrization of dynasties
4.5. Global harmony
4.6. The grammar of scope
4.7. Conclusion
Notes
Chapter 5. NP-Movement and Restructuring
5.1. Introduction
5.2. Passives and ergatives in Dutch
5.3. Case, agreement, and subject drop in Dutch
5.4. A difference between English and Dutch
5.5. Reanalysis and covalency
5.6. Against reanalysis
5.7. Transparency without reanalysis
5.8. Restructuring in French
5.9. Conclusion
Notes
Chapter 6. Binding and its Domains
6.1. Introduction
6.2. Reflexives in Dutch
6.3. The principles B and C in English and Dutch
6.4. Principle C effects in parasitic gap constructions
6.5. Conclusion
Notes
Chapter 7. The Radical Autonomy of Syntax
Bibliography
Index of names
General index
Preface
Linguistics, like any other field of inquiry, can only make progress through a certain diversity of viewpoints. Although there have been many challenges to "standard" theories of generative grammar, there have been relatively few major controversies within what is often referred to as the Theory of Government and Binding. The theory presented in this study accepts the major goals of Government and Binding, but differs from the standard view in a number of respects. The basic difference is that the theory of Domains and Dynasties entirely rejects the notion "move alpha" and, therefore, the idea of levels connected by "move alpha". Apart from Lexical Structure and Phonetic Representation, only one level is accepted, namely the level of S-structure. In my opinion the traditional level of D-structure can most appropriately be seen as a substructure of S-structure, while the notion of Logical Form is rejected altogether.
This study grew out of my reactions to Chomsky's Pisa lectures. Shortly before the Pisa lectures, I had published a version of Subjacency (the Bounding Condition) that appeared to be almost indistinguishable from principle A of the binding theory. This strongly suggested that a generalization was being missed. Currently, more than seven years after the Pisa lectures, a condition like the Bounding Condition also shows up in mainstream GB theories under the name 0-subjacency, and also in the idea that all traces are antecedent-governed in a strictly local domain. It seems to me that such a strict locality condition makes traditional Subjacency superfluous and that it brings back into focus what I consider one of the most important problems of the theory of grammar: how is the locality condition for the binding of traces related to the locality domains of other grammatical dependencies? The answer given here is that at an appropriate level of abstraction, there is a uniform locality condition for all grammatical relations of a certain type.
The idea of a uniform locality condition leads to the Thesis of Radical Autonomy. According to this thesis, core grammar is characterized by a configurational matrix of properties that are entirely construction-independent. A further perspective is that the configurational matrix determines the form of a computational faculty that is not intrinsically built for language. Grammar in the traditional generative sense is perhaps only an application of this computational module, in the same way that book-keeping is an application of arithmetic. Language in this view only originates through the interaction of the abstract computational module with our conceptual systems, whereas the lexicon can be considered the interface among these components. Rules like LF-movement cannot be fundamental computations from such a perspective since they are specific to certain conceptual contents, which belong to a different and presumably equally autonomous system.
Research for this book started in 1979 in a project (Descriptive Language) organized by the University of Nijmegen and the Max Planck Institute for Psycholinguistics and sponsored by the Netherlands Organization for the Advancement of Pure Research (Z.W.O.). The original versions of my theory were discussed with Angelika Kratzer of the Max Planck Institute, and with Dick Klein and John Marshall of the University of Nijmegen, among others. The many visitors to the Max Planck Institute, Robert May and Edwin Williams in particular, also contributed much to the development of my views. Also during this time, I had regular meetings with a group of linguists from the Federal Republic of Germany. This book would probably not exist without the many discussions of Chomsky's Pisa lectures I had with Tilman Höhle, Craig Thiersch, Jindra Toman, Hans Thilo Tappe, and many others. I have very good memories of the friendship and encouragement I experienced in this group.
Most of the work on this book was done after I joined the faculty of Tilburg University in 1981. Here, I worked under the excellent conditions created by Henk van Riemsdijk. As ever, I felt greatly stimulated by the harmonious combination of friendship and polemics dating back to our student days. Several aspects of this study were discussed with Henk, and also with my other colleagues at Tilburg, including Reineke Bok-Bennema, Norbert Corver, Jan van Eijck, Anneke Groos, Casper de Groot, Anneke Neijt, Rik Smits, and Gertrud de Vries.
Furthermore, I was able to discuss my work with several visitors, such as Ken Hale, Jean-Roger Vergnaud, and Maria-Luisa Zubizarreta.
More than anything else, the content of this study was inspired by the seminal work of Richard Kayne. I learned very much from our discussions and from the critical comments that Richie gave me on several parts of the text. Likewise, I was inspired by the work of Guglielmo Cinque and Hans-Georg Obenauer, as is clear from several chapters. In addition, I would like to thank Guglielmo Cinque for his detailed comments on large parts of the text. Other colleagues and friends I would like to thank for comments include Hans den Besten, Elisabet Engdahl, Ton van Haaften, Riny Huybregts, David Lebeaux, Robert May, Carlos Otero, Christer Platzack, Thomas Roeper, and Tarald Taraldsen.
I am grateful to Gaberell Drachman of Salzburg University, Austria, for giving me the opportunity to present parts of this book at the Salzburg International Summer Schools of 1982 and 1985. I was much encouraged and stimulated by the discussions and the friendship of the many participants. As for the 1982 Summer School, I would like to acknowledge the contributions of Sascha Felix, Wim de Geest, Liliane Haegeman, Hubert Haider, David Lebeaux, Anna Szabolcsi, and Dong-Whee Yang. Of the 1985 Summer School, I would like to mention Elena Benedicto, Clemens Bennink, Leonardo Boschetti, Anna Cardinaletti, Kirsti Christensen, Gunther Grewendorf, Willy Kraan, Martin Prinzhorn, Alessandra Tomaselli, and Gert Webelhuth.
The Netherlands Organization for the Advancement of Pure Research (Z.W.O.) gave me the opportunity to visit MIT and the University of Massachusetts at Amherst in the fall of 1983 (grant R30-191), which I hereby gratefully acknowledge. At MIT, I discussed parts of chapter 4 with Noam Chomsky, Danny Jaspers, Carlos Quicoli, Luigi Rizzi, and Esther Torrego, among others. At Amherst, I profited from the comments of David Pesetsky and Edwin Williams.
Charlotte Koster read the whole text and proposed many improvements of both content and style. Especially chapter 6 owes much to her ideas on learnability. I would like to thank her in more ways than one, as ever!
In preparing the final text, I received excellent editorial assistance from Rita DeCoursey of Foris Publications and technical assistance from the staff of my current department at the University of Groningen. In the department, Corrie van Os helped me with the bibliography and Wim Kosmeijer compiled the index. Versions of chapters 1 and 3 were published earlier, respectively in Theoretical Linguistic Research 2 (1985), 1-36, and Linguistic Inquiry 15 (1984), 417-459, and are reprinted here with kind permission of the publishers.
Jan Koster
Groningen, December 1986
Chapter 1
The Invariant Core of Language
1.1. The research program
Recently, Noam Chomsky appropriately characterized the goal of generative grammar as a contribution to the solution of "Plato's problem": how can we know so much given that we have such limited evidence?¹ Among the cognitive domains that confront us with this problem, our language is a particularly striking and important example. In studying human language, it is difficult not to be impressed by the richness, subtlety, and specificity of the system of knowledge acquired. Since only a fraction of this richness seems to be encoded in the evidence available to the language learner, much of the architecture of the acquired system must spring from the innate resources of the organism itself. Either the learning child possesses rich powers of abstraction and generalization (general learning strategies), or its inborn capacities involve an articulated and specific system that is only triggered and "finished" by the evidence.
There is, to my knowledge, no research program in linguistics that is based on general learning strategies and that is even beginning to come to grips with the richness of our knowledge of language. So far, only the second approach, i.e. the attempt to formulate a highly articulate initial scheme, has attained a promising degree of success. I therefore believe that this is the right approach to Plato's problem in the domain of natural language. This conclusion is sometimes called pretentious or unmotivated, but it is often hard to see what motivates the opposition beyond prejudice. On the one hand there is not the slightest evidence that the data available to the child, or "general learning mechanisms", are rich enough to account for the nature of the system acquired; on the other hand, the program based on the alternative, the assumption of an articulate initial scheme, has led to a very successful research program.
I fail to see how critics of the Chomskyan program can account for the total lack of success of the other theories and the continuous development and success of the program criticized. Even if one fully agrees with Chomsky's approach to Plato's problem, there are different ways to execute the research program based on it. Generative grammar in Chomsky's sense is a much more pluriform enterprise than it is sometimes believed to be. This pluralism is generally considered healthy and even necessary for progress, as in any other science. It is a truism that one of the most effective tools towards progress is criticizing existing theories by the formulation of challenging alternatives. Given the Chomskyan approach to Plato's problem, then, we can distinguish several largely overlapping but sometimes conflicting lines of research. The most common line of research has always stressed the importance of distinct levels of syntactic representation. Most of these levels are supposed to be connected by a special mapping, nowadays generally referred to as "move alpha". Chomsky, for instance, distinguishes lexical structure, D-structure, S-structure, Logical Form (LF), and Phonetic Form (PF). Van Riemsdijk and Williams (1981) add yet another level to this series, namely the level of NP-structure.
My own approach differs somewhat from this commonly assumed picture. It has always seemed to me that with the introduction of trace theory in Chomsky (1973), the original arguments for certain levels have lost their force. To a certain extent, this was also observed by Chomsky at the end of "Conditions on Transformations" (1973): as soon as you have traces there is an obvious alternative according to which traces are base-generated at S-structure. In this view, D-structure is not necessarily a separate level, but can also be interpreted as a substructure or a property of S-structure.² Chomsky has never been convinced of the meaningfulness of the alternative, mainly because of the alleged properties of "move alpha". In Chomsky's view, the alternative could only be formulated with interpretive rules at S-structure that duplicate the unique properties of "move alpha".³ Since I believe that this latter conclusion is false, I have been trying to develop the alternative in Koster (1978c) and subsequent papers.
These attempts have nothing to do with a general preference for frameworks without transformations or with a preference for context-free rules in the sense of Gazdar and others.⁴ I agree with Chomsky (1965) that the significant empirical dimension of the research program has little to do with the so-called Chomsky hierarchy. What is significant is the attempt to restrict the class of attainable grammars (perhaps to a finite class) in a feasible way. From this point of view, formulating grammars with or without transformations is not necessarily a meaningful question (apart from empirical considerations). My main argument is that I consider the attempts to isolate the properties of "move alpha" entirely unconvincing. "Move alpha" exists only to the extent that it can be shown to have properties. Neither attempts to establish properties of "move alpha" directly, nor attempts to establish movement indirectly by attributing special properties to its effects (traces) have been successful, in my opinion. At the same time, it is understandable that these attempts to isolate "move alpha" as something special have inhibited research into unified theories, i.e. theories that subsume movement and, for instance, anaphora under a common cluster of properties.
Functionally speaking, "move alpha" is insufficiently general for the job that it is supposed to do. Movement can be seen as a transfer mechanism: it connects certain categories with deep structure positions (which are also available at S-structure under trace theory) and transfers the Case- and θ-license of these positions to the moved categories. It is hardly controversial that not all transfer can be done by movement. A standard example demonstrating this is left dislocation:
(1) That book, I won't read it
Originally, such sentences were also derived by movement transformations (see Ross (1967)). But it is generally assumed now that (1) and many similar cases of transfer cannot be accounted for by "move alpha". An example like (1) shows that anaphors like it can transfer θ-roles to NPs (like that book) in non-θ-positions. This independently needed transfer mechanism makes "move alpha" superfluous. Obviously, we can do with only one general transfer mechanism from dependent elements to their antecedents. This transfer mechanism is instantiated by (1) and in a similar way by a "movement" construction like (2):
(2) Which book did you read t?
The trace t in (2) appears to behave like the pronominal it in (1) in the relevant aspects. The burden of proof is certainly on those who claim that we need an entirely new transfer mechanism ("move alpha") beyond what we need anyway for (1). Attempts have been made to meet this burden of proof, but the question is whether these attempts have been successful. If "move alpha" is superfluous from a functional point of view, it might still be argued that it can be recognized by its special properties. Chomsky (1981b, 56) argues that the products of "move alpha", traces, have the following three distinct properties:
(3) a. trace is governed
    b. the antecedent of trace is not in a θ-position
    c. the antecedent-trace relation satisfies the Subjacency condition
Note, however, that none of these properties uniquely distinguishes movement from other grammatical dependency relations. It is already clear from (1) that also the antecedents of lexical anaphors (or pronominals) can be in non-θ-positions (3b). Also, government (3a) is not a distinguishing property, because all lexical anaphors bear Case and must therefore be governed.⁵ The only plausible candidate for the status of distinguishing property has always been Subjacency (3c). It is for this reason that I have focused on this property in Koster (1978c) and elsewhere. The crucial question from my point of view, then, is whether Subjacency is really that different from, say, the locality principles involved in the binding theory of Chomsky (1981b). If we take a closer look at Subjacency, it can hardly be missed that the form it is usually given (and which is clearly distinct from the anaphoric locality principles) is entirely based on certain idiosyncrasies of English and a few other languages. Under closer scrutiny, Subjacency as a separate property appears to dissolve. The version originally proposed on the basis of English in Chomsky (1973) simply conflates a general locality principle with a small extension for limited contexts in English.
Before I demonstrate this with examples, I would like to stress that I consider Subjacency, or more generally, the idea that "unbounded" movement is built up from a succession of local steps, as one of the most important advances of generative grammar in the 1970s. Thanks to Subjacency, it has become clear for the first time that grammatical dependency relations that look wildly different at the surface might, contrary to appearances, be instantiations of a common underlying pattern. Subjacency has been a crucial conceptual step, and my own attempts at further unification only became possible because of Subjacency, which reduced a mass of seemingly unbounded relations to a simple local pattern. My criticisms do not concern Subjacency as a strict locality principle, but the particular form given to it in Chomsky (1973), which makes it unsuitable for further unification with other locality principles.
If we want a further unification, we have to get rid somehow of the differences between the locality format for movement (Subjacency) and, for instance, for anaphora (principle A of the binding theory). At first sight, this is not so easy because there seem to be some clear differences. These differences can be summarized as follows:
(4) a. Subjacency is often formulated as a condition on derivations, while principle A of the binding theory is a condition on representations
    b. Subjacency involves two domain nodes, while principle A only involves one node (the governing category)
    c. Contrary to Subjacency, principle A involves opacity factors like INFL or SUBJECT
Given the desirability of unification, these differences present themselves as a puzzle: how can we show that "move alpha" and anaphoric binding are governed by the same basic locality principle? Let us consider in turn the differences listed in (4). Originally, Subjacency was formulated as a condition on derivations. But Freidin (1978) and Koster (1978c) claimed that, with traces, it could just as well be formulated as a condition on representations. Also Chomsky (1985a) formulates Subjacency as a condition on S-structure. So, it is questionable whether this point is still controversial: we can simply formulate Subjacency as a condition on S-structure, just like principle A, as long as there is no evidence to the contrary.
There is also an easy solution to the second difference. In Koster (1978c) it was concluded that even for English, Subjacency could be replaced by a one-node domain statement (like the later principle A for anaphors) for all contexts except one. The standard two-node formulation was based on the peculiar postverbal context of English, which was a bad place to look to begin with. Thus, in general, the bounding facts of English can be formulated by specifying just one bounding node, S' or NP. Much of the subject condition of Chomsky (1973), for instance, follows from a condition that says that elements cannot be extracted from an NP:
(5) *Who did you say that [NP a picture of t] disturbed you?
The one-node format would have been sufficient for these cases, but it did not seem to be for contrasts like the following:
(6) a. Who did you see [NP a picture of t]
    b. *Who did you hear [NP stories about [NP pictures of t]]
Even from English alone, however, it is clear that (6b) is irrelevant for a choice between a one-node and a two-node Subjacency format. The reason is that standard two-node Subjacency is both too strong (7b) and too weak (7a) for English in this context:
(7) a. *Who did you destroy [NP a picture of t]
    b. Which girl did you consider [NP the possibility of [NP a game with t]]
As (7a) shows, one node can already lead to unacceptable sentences, while (7b) and many other examples show that extraction across two or even three bounding nodes may still yield acceptable sentences. In short, one node is sufficient for all contexts of English, except the postverbal context, in which we can find almost anything. The conclusion that Subjacency is a one-node condition was reinforced by the fact that even (6a) is ungrammatical in most languages, Dutch among them:
(8) *Wie heb je [NP een foto van t] gezien?
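The comparison drawn from (6) and (7) can be made concrete with a small sketch. The Python below is an illustration only, not part of the theory: it counts the NP brackets printed around the trace t in each example and checks what a two-node format would predict against the judgments reported in the text. The function names and the bracket-counting shortcut are assumptions made for this illustration, and only the NP brackets actually shown in the examples are counted.

```python
def np_depth_of_trace(sentence: str) -> int:
    """Count the [NP ...] brackets that enclose the trace 't' in a bracketed example."""
    tokens = sentence.replace("[", " [").replace("]", " ] ").split()
    stack = []
    for tok in tokens:
        if tok.startswith("["):
            stack.append(tok[1:])      # push the category label, e.g. "NP"
        elif tok == "]":
            stack.pop()
        elif tok == "t":
            return stack.count("NP")   # NP nodes dominating the trace
    raise ValueError("no trace found")


# Examples (6) and (7) with the acceptability judgments given in the text.
EXAMPLES = {
    "6a": ("Who did you see [NP a picture of t]", True),
    "6b": ("Who did you hear [NP stories about [NP pictures of t]]", False),
    "7a": ("Who did you destroy [NP a picture of t]", False),
    "7b": ("Which girl did you consider [NP the possibility of [NP a game with t]]", True),
}


def two_node_prediction(sentence: str) -> bool:
    """Toy two-node format: acceptable iff fewer than two NP nodes are crossed."""
    return np_depth_of_trace(sentence) < 2


def mismatches() -> list:
    """Labels of examples where the two-node prediction differs from the judgment."""
    return [label for label, (sentence, acceptable) in EXAMPLES.items()
            if two_node_prediction(sentence) != acceptable]
```

Under these assumptions the two-node format fits (6) but mispredicts both members of (7): it lets (7a) through and rules out (7b), which is the "too weak" and "too strong" pattern described above.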
It must therefore be concluded that one node is sufficient for Subjacency in almost all languages known to have "unbounded" movement in all contexts, and in some languages, like English, in all contexts but one. In the exceptional context, two-node Subjacency is just as irrelevant as one-node Subjacency.
On the basis of the facts, then, we are justified in also taking the second step towards unification: both bounding and binding involve local domains that specify only one node. Of course, we are left with the problem of how to account for cases like (6a) and (7b), but it seems at least plausible that this problem has nothing to do with Subjacency. Recently, I have tried to give a solution for this problem by adopting certain ideas formulated by Kayne (1983). According to this solution (Koster (1984b) and chapter 4 below), the basic bounding domain is a one-node domain, which can be extended under very specific and partially universal conditions. A bounding domain can be extended only if the last trace of a chain is structurally governed and if all domains up to the antecedent are governed in the same direction. With some qualifications, to which I will return, I believe that bounding is constrained by the one-node format in all other cases. This part of the puzzle is therefore solved by splitting standard Subjacency in two parts: a universal one-node domain specification, and a domain extension based on the language-particular fact that prepositions can be structural governors in English, together with the fact that the direction of government is rather uniform in English. As I will argue below, the one-node domain that we have split off from Subjacency forms the basis of a construction-independent and universal locality principle. With respect to this one-node locality principle, all languages are alike, while languages differ with respect to the extensions, which are also the loci of parametric variation. If this hypothesis solves the first two aspects of the unification puzzle, the next step is trying to solve the third aspect by splitting off the same universal domain from the binding conditions for anaphors. In the case of anaphoric domains, it is already generally assumed that the locality format involves only one node, the governing category. 
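The extension condition stated above (the last trace of a chain is structurally governed, and all domains up to the antecedent are governed in the same direction) can be written down as a simple check. The Python below is a hedged sketch under an assumed representation: a chain is reduced to one flag and a list of government directions, one per domain between trace and antecedent. The names `Chain` and `extension_licensed` are hypothetical conveniences introduced here; the actual formulation is the one developed in chapter 4.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chain:
    # Whether the last trace of the chain is structurally governed (assumed given).
    last_trace_structurally_governed: bool
    # Direction in which each domain on the path up to the antecedent is governed,
    # encoded here as "left" or "right" (an illustrative encoding).
    government_directions: List[str]


def extension_licensed(chain: Chain) -> bool:
    """Return True if the one-node bounding domain may be extended for this chain."""
    if not chain.last_trace_structurally_governed:
        return False
    # All domains must be governed in the same direction.
    return len(set(chain.government_directions)) <= 1


uniform = Chain(True, ["right", "right"])
mixed = Chain(True, ["right", "left"])
ungoverned = Chain(False, ["right"])
```

Here `extension_licensed(uniform)` holds, while `mixed` and `ungoverned` each fail one of the two conditions, so for them only the basic one-node domain is available.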
The big problem here is how to split off the opacity factors, such as INFL and SUBJECT. It seems to me that the solution is very similar to what we saw in the case of bounding: there is a basic one-node domain defined without opacity factors; these opacity factors only playa role in partially languagespecific domain extensions. As before, English is a poor choice to illustrate this because this language has a relatively impoverished system of anaphors. But in many languages clitics are used in the domain of V, while different pronouns are used for binding into PPs and other constituents. For the clitics, the opacity factors are usually irrelevant: the clitics are simply bound in the minimal Xmax (Sf) in which they are governed, just like traces. 6 Thus, a clitic governed by V is bound in its minimal Sf, just like a trace governed by V. Often clitics cannot be bound in any other environment. French, for instance, uses a reflexive se in the domain of a verb, but other forms, like /ui-meme, in the domain of P and other categories (see chapter 6 for a more elaborated account). Dutch forms a very interesting illustration of this point of view. This
language has at least two reflexives, zich and zichzelf. The crucial fact is that these reflexives overlap in the domain of V, but contrast in other contexts (i.e. in extended domains), for instance in the domain of P. The following examples illustrate this:
(9)
a. Jan wast zichzelf
   Jan washes himself
b. Jan wast zich
   Jan washes himself
It is not the case that both reflexives occur with all verbs in this context, which is probably a lexical fact. The point is that verbs that select both forms can have them in the same context, namely the domain of V. We can account for the sentences in (9) by a domain statement that does not refer to opacity factors like SUBJECT. We can simply say that both zich and zichzelf are bound in the minimal Xmax of the governor V (under the assumption that this domain is S'). I assume that in the unmarked case both Dutch reflexives are bound in their minimal Xmax (in practice only the minimal S') without any reference to opacity factors. Opacity factors only play a role in the marked case, under so-called "elsewhere" conditions. Thus, if the reflexives are not bound in their minimal Xmax, they contrast with respect to the notion subject: zichzelf must be bound in the minimal domain containing a subject, while zich must be free in this domain. The contrast is illustrated in the following examples, in which the reflexives are bound across a PP boundary (and therefore not bound in their minimal governing Xmax):
(10)
a. Jan schiet [PP op zichzelf]
   Jan shoots at himself
b. *Jan schiet [PP op zich]
Thus, in Dutch the distinction between the basic domain and the extended domain (which involves opacity factors) can be detected by the fact that the two reflexives overlap in the former domain while they are in complementary distribution with respect to the latter domain. There is much more to say about Dutch reflexives (see Koster (1985) and chapter 6 below), but the basic approach is clear from these simple examples. The path towards unification, then, can only be followed if we see that neither standard Subjacency (with its two nodes) nor binding principle A (with its opacity factors) formulates the primitive locality domain for the dependency relations in question. Both conditions conflate the common universal part with language-particular extensions. If we split off the extensions, it appears that bounding and binding are governed by exactly the same basic locality principle.
The approach taken here involves a theory of markedness. The unmarked locality principle for all local dependencies in all languages is a simple one-node domain principle that says that an element must be connected with its antecedent in the minimal Xmax in which it is governed. Beyond this, there are only marked extensions from which languages may or may not choose. Both directionality factors in the sense of Kayne (1983) and opacity factors in the binding theory belong to the theory of markedness. The theory of markedness is also the main locus of parametrization. The basic, unmarked domain might be part of all languages without parametrization; this certainly is the strongest possible hypothesis, one that we would like to maintain as long as possible. If all this is correct, the unmarked format for Subjacency (the Bounding Condition of Koster (1978c)) is indistinguishable from the unmarked locality format for binding. None of the properties in (3), then, distinguishes "move alpha" from any other dependency relation in the unmarked case. If "move alpha" can be detected neither by its functional role nor by its properties, then without new evidence, there is no reason to assume that "move alpha" exists.
1.2. The configurational matrix
The most fundamental notion of the theory of grammar is the dependency relation. Most grammatical relations are dependency relations of some kind between a dependent element δ and an antecedent α:
(11) ... α ... δ ...
         |__R__|
In anaphoric relations, for instance, the anaphors are dependent on their antecedent. Similarly, subcategorized elements that receive a θ-role or Case are dependent on some governor, usually the head of a phrase. There are many different types of dependency relations, but all have something in common, both functionally and formally. Functionally speaking, dependency relations have the following effect:
(12) share property
Any kind of property can be shared by two properly related elements. Antecedent and anaphor, for instance, share a referential index, which entails that they have the same intended referent. A "moved" lexical category and its trace share one lexical content (found at the landing site) and one set of licensing properties (found at the trace position). Formally speaking, all dependency relations have the same basic form, while some have their basic form extended in a certain way. As already indicated in the previous section, domain extensions are language-particular options that result from parameter setting, and which fall within the limits of a very narrow hypothesis space, which is defined by Universal Grammar. Domain extensions for empty categories involve chains of equally oriented governors, and domain extensions for other anaphors involve the opacity factors or chains of governors that agree with respect to some factor. More will be said on domain extensions in the next section. In this section, I will only define the basic, unextended form of dependency relations. First, I will mention and briefly illustrate the properties of the relation R (of (11)). Then, I will discuss the question to what extent the list of properties has some internal structure. I will conclude this section with a discussion of the scope of the properties in question. As I have discussed elsewhere, it seems to me that basic dependency relations of type R (in (11)) have at least the following four properties:7 (13)
a. obligatoriness
b. uniqueness of the antecedent
c. c-command of the antecedent
d. locality
The first property, obligatoriness, is almost self-explanatory. All dependency relations with the properties of (13) are obligatory in the sense that the dependent elements in the relation must have an antecedent. Thus, a reflexive pronoun does not occur without a proper antecedent:
(14) *I hate himself
A structure like (14), in which no antecedent for the reflexive can be found, is ill-formed, and if there is an appropriate antecedent, it cannot fail to be the antecedent:
(15) John hates himself
In this respect, the binding of reflexives differs from the binding of other pronouns, like the (optional) binding of him in:
(16) John thinks that Mary likes him
As is well known, we can optionally connect him with the possible antecedent John, but we may also leave the pronoun unbound. The second property, uniqueness, applies only to antecedents. Thus, we may connect an antecedent with more than one anaphor:
(17) They talked with each other about each other
But we can only have one antecedent for an anaphor; in other words, split antecedents are impossible:
(18) *John confronted Mary with each other
Again, this is not a necessary property of anaphoric connections. Pronominals differ from bound anaphors in that they can take split antecedents, as has been known since the 1960s:
(19) John told Mary that they had to leave
The third property, c-command, is so well known that it hardly stands in need of illustration here. In (20a), himself is not c-commanded by the antecedent John. For pronominals, c-command is not necessary, as shown by (20b):
(20)
a. *[NP The father of John] hates himself
b. [NP The father of John] thinks he is happy
The form of c-command that I have in mind is the more or less standard form proposed by Aoun and Sportiche (1983), according to which the minimal Xmax containing the antecedent must also contain the anaphor. The fourth property, locality, is illustrated by the following contrast:
(21)
a. John hates himself
b. *John thinks that Mary hates himself
Again, it can be observed that pronominals like him are not constrained by the locality principle in question:
(22) John thinks that Mary likes him
The standard form of locality for anaphors is given by principle A of the binding theory of Chomsky (1981b, ch. 3): anaphors must be bound in their governing category. A governing category is the minimal Xmax containing the governor of an anaphor and a SUBJECT (subject or AGR) accessible to the anaphor. The basic form of locality that I am assuming here differs from this standard format. Instead, I will assume that the Bounding Condition of Koster (1978c) is basic, not only for empty categories, but for all local dependencies:
(23) Bounding Condition
A dependent element δ cannot be free in:
... [β ... δ ... ] ...
where β is the minimal Xmax containing δ (and the governor of δ)
This locality principle accounts for the contrast between (24a) and (24b), under the assumption that S' is the relevant Xmax:
(24)
a. [S' John hates himself]
b. *[S' John thinks [S' that himself is sick]]
The following acceptable sentence is not accepted by the basic locality principle (23), because himself is not bound in the minimal PP in which it is governed:
(25) John depended [PP on himself]
This sentence is only accepted by adding a marked option to the basic locality principle. According to this "elsewhere" condition, a reflexive must be bound in the extended domain defined as the minimal Xmax that contains a subject. Thus, principle A of the binding theory is considered a marked, extended domain from this point of view. 8 Apart from this not unsubstantial modification, the properties listed under (13) are well known, especially c-command and locality. What has not received much attention, however, is the fact that the properties in question form a cluster: if a dependency relation involves locality it usually also involves c-command and uniqueness. The fact that these properties co-occur suggests that there might be some further structure to this collection. It seems to me that the relation R is in fact a function. According to the definition of a function, there is a unique value in the co-domain for each argument in the domain. Suppose now that we take dependent elements in a given structure as arguments. In that case, we can consider antecedents in the same structure as values. The function is not defined in structures without appropriate antecedents, and these structures are rejected. In this way, we account for the obligatoriness of R (property (13a)). Similarly, we account for the uniqueness property: a function always gives a unique value for a given argument, in this case a unique antecedent. Assuming that R is a function, the only two substantial properties are (13c) and (13d): c-command and locality, respectively. It seems to me that these two properties are not unrelated either. In fact, both properties are locality principles. C-command is locality seen from the perspective of the antecedent. It can be formulated as follows: (26)
C-command
A potential antecedent α cannot be free in:
... [β ... α ... ] ...
where β is the minimal Xmax containing α
This is very similar to the Bounding Condition (23), repeated here for convenience:
(27) Bounding Condition
A dependent element δ cannot be free in:
... [β ... δ ... ] ...
where β is the minimal Xmax containing δ (and the governor of δ)
The similarity between (26) and (27) is just too striking to be accidental. I assume therefore that R is a bilocal function, a function that gives a unique value (the antecedent) for each dependent element, in such a way that the antecedent is in the minimal domain of the dependent element (cf. (27)) and the dependent element in the minimal domain of the antecedent (cf. (26)). If this conclusion can be maintained, the list in (13) can be replaced by a simple function that shows a certain degree of symmetry with respect to the notion "locality".
An intriguing question that I will not pursue here is whether there is a counterpart to the notion of domain extension for (26). Recall that one of the most general domain extensions for (27) involves the notion "subject". Under this extension, a dependent element is not accessible in the domain of a subject. If there is full symmetry in this respect, we expect that there are also languages that define their antecedent domain as a similar extension of (26): in such languages potential antecedents are not accessible in the domain of a subject. I have argued elsewhere that it is exactly this situation that we find in languages like Japanese, Korean, and many others, in which only subjects can be antecedents for reflexives: if potential antecedents are not accessible in the domain of a subject, only the subject itself is accessible in the given domain (Koster (1982b)). If this conclusion is correct, then unrestricted c-command, as in English, is the unmarked condition for antecedents, while the subjects-only option for antecedents is a marked extension, not unlike the extensions that we find for anaphors in principle A of the binding theory. This would be a remarkable confirmation of the view that c-command is the antecedent counterpart of locality, as it is usually defined for the dependent element.
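The symmetry between (26) and (27) can be made concrete with a small illustrative sketch. It is not part of the theory itself: the `Node` class, the `maximal` flag, and the helper names are assumptions introduced only for this illustration, "containing" is read as proper dominance, VP is deliberately not counted as an Xmax (so that S' is the minimal Xmax for elements governed by V, as assumed in the text), and the complementizer in (24b) is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """A tree node; `maximal` marks nodes this sketch treats as Xmax (an assumption)."""
    label: str
    maximal: bool = False
    children: List["Node"] = field(default_factory=list, repr=False)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, *kids: "Node") -> "Node":
        for kid in kids:
            kid.parent = self
            self.children.append(kid)
        return self


def minimal_xmax(node: Node) -> Optional[Node]:
    """Closest maximal projection properly dominating `node`."""
    cur = node.parent
    while cur is not None and not cur.maximal:
        cur = cur.parent
    return cur


def dominates(upper: Node, lower: Node) -> bool:
    """True if `upper` properly dominates `lower`."""
    cur = lower.parent
    while cur is not None:
        if cur is upper:
            return True
        cur = cur.parent
    return False


def bounding_ok(dep: Node, ant: Node) -> bool:
    """(27): the antecedent lies inside the minimal Xmax of the dependent element."""
    dom = minimal_xmax(dep)
    return dom is not None and dominates(dom, ant)


def c_command_ok(dep: Node, ant: Node) -> bool:
    """(26): the dependent element lies inside the minimal Xmax of the antecedent."""
    dom = minimal_xmax(ant)
    return dom is not None and dominates(dom, dep)


def bilocal(dep: Node, ant: Node) -> bool:
    """Both locality conditions hold at once."""
    return bounding_ok(dep, ant) and c_command_ok(dep, ant)


def clause(subject: Node, verb: str, complement: Node) -> Node:
    """Build [S' subject [VP verb complement]]; VP is not an Xmax in this sketch."""
    vp = Node("VP").add(Node(verb), complement)
    return Node("S'", maximal=True).add(subject, vp)


# (24a) [S' John hates himself]
john_a, himself_a = Node("NP", True), Node("NP", True)
clause(john_a, "hates", himself_a)

# (24b) *[S' John thinks [S' that himself is sick]]
john_b, himself_b = Node("NP", True), Node("NP", True)
clause(john_b, "thinks", clause(himself_b, "is", Node("AP", True)))

# (20a) *[NP The father of John] hates himself
john_c, himself_c = Node("NP", True), Node("NP", True)
pp = Node("PP", True).add(Node("of"), john_c)
clause(Node("NP", True).add(Node("father"), pp), "hates", himself_c)

print(bilocal(himself_a, john_a))       # both conditions hold in (24a)
print(bounding_ok(himself_b, john_b))   # locality (27) fails in (24b)
print(c_command_ok(himself_c, john_c))  # c-command (26) fails in (20a)
```

In this toy model the two checks are the same computation with the roles of the two elements swapped, which is the sense in which c-command can be read as locality seen from the antecedent.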
In any case, it seems worthwhile to look not only for lists of correlating properties like (13) but also for the deeper structural principles from which these properties follow. The properties in (13) (and the principles from which they follow) define a configurational matrix for almost all grammatical dependency relations. There are surprisingly few relations that are not somehow characterized by the properties of this configurational matrix. In fact, there might be only one major class of exceptions, which I will briefly discuss in a moment. Furthermore, there are anaphoric systems, like the one for the reflexive zibun in Japanese, that seem to be characterized by locality on the antecedent (c-command) but not by locality on the dependent element (as in the case of English anaphors). The major exception that comes to mind is the class of dependencies
that seem to be characterized by principles of argument structure. Thus, control structures are not generally characterized by the properties in (13). There are control structures without obligatory antecedents (28a), with split antecedents (28b), with non-c-commanding antecedents (28c), and with nonlocal antecedents (28d) (see Koster (1984a) and chapter 3 below): (28)
a. It is impossible [PRO to help Bill]
b. John proposed to Mary [PRO to help each other]
c. It is difficult for Mary [PRO to help Bill]
d. John thinks [S it is impossible [S PRO to shave himself]]
In some cases, the antecedent of PRO must f-command it (in the sense of Bresnan (1982)). Similar observations can be made about anaphor binding in many languages. Even in English, c-command is not always necessary, as was observed by Jackendoff (1972):
(29) A book by John about himself
This does not mean that the configurational binding theory can be replaced for English by a theory based on argument structure. In languages like English and Dutch, possibilities like (29) are limited to certain prepositions, while c-command is much more generally usable. In control structures, principles of argument structure are more prominent in English, but even in the case of control these principles interact with the purely structural notions of (13) (see Koster (1984a) and chapter 3 below). One might argue that Universal Grammar defines two systems: a system based on argument structure, and a purely structural system. The former system might be the older system, while the latter system might be the result of a later evolutionary development. Whatever the merit of these speculations, it seems to me that nonconfigurational principles have a minority position in most natural languages. Most dependency relations fall within the limits of the configurational matrix characterized by (13). At least the following dependency relations have the form specified by (13): (30)
a. licensing relations: government, subcategorization, θ-marking, Case assignment
b. agreement: subject-verb, COMP-verb
c. anaphor binding
d. movement: NP-movement, Wh-movement
e. obligatory control
f. predication
g. gapping
For most of these dependencies, Chomsky (1981b, 1982a) postulates different modules, such as government theory, Case theory, binding theory, bounding theory, control theory, etc. Insofar as each of these subtheories has some characteristics of its own, I agree. But it would be a mistake to consider each subtheory a totally primitive structure. To a large extent, the subtheories are made from the same stuff, namely the properties of the configurational matrix (13). In many cases, the fact that the construction types in (30) have the properties listed in (13) needs little illustration. It is clear, for instance, that the licensing relations, (30a), have the four properties: a subcategorized element is obligatorily dependent (13a) on a unique head (13b). Furthermore, the head c-commands its complements (13c) in a local domain, i.e. the head does not govern into the domain of another governor (13d). Similarly, the agreement relations, (30b), and the predication relation, (30f), have the four properties in a rather perspicuous manner. The other relations are interesting in that they seem to contradict the uniformity hypothesis in one way or another. Obligatory control has already been briefly discussed: a well-defined subclass of control structures has the properties listed in (13), as has been argued in Koster (1984a) and chapter 3 below. Anaphor binding and movement are the most problematic from the point of view of a unified theory. Both seem to involve wildly varying domains, within one language, and also across languages. Some of this variation has already been discussed, and I will return to it in the next section. I will conclude the present section with some nonstandard applications of the configurational matrix.
First, I will give a brief review of the properties of the gapping construction, which is constrained by (13) in a nontrivial way.9 One problem with gapping is that it is not quite clear what kind of representation is appropriate for coordinate structures. Often, coordination has been treated in terms of normal tree structures. Accordingly, the gaps in the gapping construction were handled by the usual transformational or interpretive processes. Thus, in Ross (1967) the gap in (31b) is created by deleting the corresponding verb in (31a):
(31)
a. John reads a newspaper and Mary reads a book
b. John reads a newspaper and Mary a book
Using essentially the same type of representation, others (like Fiengo (1974)) have replaced the deletion transformation by interpretive rules.
More radical proposals do not consider coordinated structures as basic phrase markers but as the derivative product of a linearization rule. One of the earliest examples is Williams (1978), and more recently De Vries (1983) and Huybregts (to appear) have been exploring three-dimensional representations (based on set union of reduced P-markers in the sense of Lasnik and Kupin (1977)). For present purposes, I will assume representations in the spirit of Williams (1978), which is most readily accessible. In this kind of framework, conjuncts before linearization can be represented in columns:
(32)
| John |       | a newspaper |
|  NP  | reads |     NP      |
| Mary |       | a book      |
In this representation, elements in the same column have the same function. Thus, both John and Mary have the status of subject, and they receive the same θ-role. The conjuncts each occupy one row, and two conjuncts are properly coordinated if the minimal Xmax containing the column of the two conjuncts contains a conjunction. As before, we assume that S' can function as the minimal Xmax containing the elements governed by V (or INFL). Applied to (32), this means that both John and Mary and a newspaper and a book are properly coordinated. The column with John and Mary, for instance, is accepted by the conjunction and in its minimal S'. The same holds for the column with a newspaper and a book. In coordinate structures, then, the relation R of (11) is interpreted as a relation between conjunctions and columns of type Xi (where Xi is an element from the X-bar system). A special feature of (32) is that the gap of the second conjunct is not considered a deletion site or an empty V. The properties of the verb read are simply equally distributed over the members of the column to which the verb is related. Thus, in (32) both the book and the newspaper are governed by the verb read. If we assume that the relation between conjunctions and columns has the properties in (13), many facts about gapping are explained. Particularly, the local properties of gapping are explained if we assume that columns are only possible if they are licensed by a conjunction in the same local domain (in the sense of the Bounding Condition; see Koster (1978c, ch. 3)). For instance, the facts that Neijt (1981) seeks to explain in terms of Hankamer's Major Constituent Condition seem to follow. A relevant contrast is the following: (33)
a. *Peter was invited by Mary and John <was invited by> Bill
b. Peter was invited by Mary and John <was invited> by Bill
(material in angle brackets is gapped)
Contrary to (33b), the gap of the ungrammatical (33a) also includes the preposition by.
The explanation is straightforward, if we assume that gapping is constrained by (13). Consider the underlying representation of (33a):
(34) a.
*[S' and | Peter |                    | Mary |
         |  NP   | was invited [PP by |  NP  | ]]
         | John  |                    | Bill |
This sentence is ungrammatical because Mary and Bill are not properly coordinated, i.e. the maximal column containing these NPs is not licensed by a conjunction in the minimal local domain (which is the PP headed by by). The representation underlying (33b), however, is well-formed:
(34) b.
[S' and | Peter |             | by Mary |
        |  NP   | was invited |   PP    | ]
        | John  |             | by Bill |
In this case, Mary and Bill are part of the more inclusive PP column, thanks to the presence of the second occurrence of by. The PP conjuncts are properly coordinated because their column is licensed by the conjunction and in the minimal domain S'. These examples are representative of the local properties of gapping as described by the Major Constituent Condition of Neijt (1981). The facts straightforwardly follow from the Bounding Condition, which also determines all other local dependencies. Various other hitherto unexplained gapping facts follow from the hypothesis that gapping is constrained by the configurational matrix. So far, it is clear that the list in (30) covers an enormous mass of facts. Many entries are themselves abbreviations for large collections of constructions, "Wh-movement" for instance (see Chomsky (1977)). And yet the list is probably far too short, due to certain arbitrary limitations imposed on the relations considered. One such limitation is the fact that usually only those instantiations of R in (11) are considered in which α does not dominate δ. As soon as we drop this arbitrary limitation, the scope of the configurational matrix is considerably extended. Consider for instance the vertical relation in the X-bar system, and in phrase structure in general. All sister nodes depend on an immediately dominating mother node. The relation between mother and daughters has the properties in (13): the relation is obligatory (13a), there is always a unique mother to a given pair of daughters (13b), and clearly the relation is local (13d):
(35) [VP V [PP P NP]]
P is the head of PP and not of VP, which (for the P) is beyond the limits imposed by the Bounding Condition. It seems to me, then, that there is a close relationship between the Bounding Condition and X-bar theory. The nodes of a projection form a family within the domain (Xmax) defined by the Bounding Condition. Similarly, our modified concept of c-command applies (13c): not only are daughter nodes determined by the mother node within their minimal Xmax, but also the mother node determines the daughters within its minimal Xmax. It is somewhat accidental, perhaps, that vertical grammatical relations (like the relations between members of a projection) have hardly been studied from the same perspective as "horizontal" relations like anaphora and movement (an exception is Kayne (1982)). If we abstract away from the distinction related to dominance, it might appear that (13) simply sums up the properties of all local relations of grammar, including both those given in (30) and those implied by the X-bar system. In chapter 2, some applications of this perspective will be discussed. Henk van Riemsdijk has pointed out (personal communication) that scope relations can be seen as an instantiation of "vertical locality". Normally, quantified NPs are assigned a scope either by (an interpretation of) QR (May (1977)) or by relating the quantified element to an abstract morpheme Q (in the sense of Katz and Postal (1964)). Both procedures have the effect that the properties of the scope relation are given the format of a "normal" dependency relation, in which the dependent element is not dominated by its antecedent. If the dominance/nondominance distinction is irrelevant, we can assign scope to a quantified element without QR or an abstract morpheme. We can simply interpret the scope of a quantified element as a relation between this element and the minimal S that contains (i.e. dominates) it.
I will not pursue further the many intriguing consequences of interpreting (13) also as a property of vertical relations. Apart from the applications discussed in chapter 2, I consider the vertical dimension as a topic for future research.
1.3. Domain extensions
So far we have assumed that purely structural grammatical dependency relations have the same unparametrized form in all constructions in all languages (in the unmarked case). This form is determined by the properties in (13), which include the C-command Condition (26) and the Bounding Condition (27) as universal locality principles. For several constructions in several languages nothing further has to be said.
But in many languages the basic domain as determined by the Bounding Condition can be "stretched" in a certain manner. As mentioned before, domain stretching belongs to the theory of markedness. This conclusion is based on the fact that it is not universal and subject to parametric variation. A trace of Wh-movement, for instance, cannot be bound across a PP boundary in most languages. This fact follows from the Bounding Condition (27), which entails that a trace must be bound in the minimal PP (an Xmax) in which it is governed. In other words, the domain for Wh-traces cannot be stretched beyond the size of a PP (or any other Xmax) in most languages (with overt Wh-movement). English and the Germanic Scandinavian languages are among the very few languages with preposition stranding, which entails domain stretching beyond PP boundaries. But even in these languages, this marked phenomenon is limited to very narrowly defined conditions, to which I will return in a moment. Standard Subjacency blocks extraction from complex NPs (in the sense of Ross (1967)), but allows extraction from PPs. This shows that Subjacency, taken as a universal locality principle, is too permissive. It fails to indicate that extraction from PP is something rather exceptional, even in English. In retrospect, we can say that standard Subjacency conflates elements of the unmarked locality principle (27) with elements of the language-particular domain stretching that makes preposition stranding possible in certain contexts. In my opinion, one of the most interesting developments during the last few years has been the emergence of theories that try to describe exactly under what conditions domain stretching is possible. As mentioned in the first section, two types of domain stretching can be distinguished. According to the first type, a domain can be extended by specifying an extra category that the domain must contain. This option is probably limited to categories like subject, INFL, or COMP.
Thus, if a category is governed by a preposition, it must be bound within its minimal governing category (= PP) in the unmarked case. By stipulating that the minimal domain must also contain a subject, the minimal domain PP is extended to the first S containing the PP (this S being the first category up that contains a subject). For English, this is the domain extension chosen for bound anaphors (see Chomsky (1981b, ch. 3) for further details). In languages that do not select this option for certain anaphors, the anaphors in question cannot be bound across PP boundaries. Examples were given in section 1.1 above. Here, I will limit myself to the second type of domain extension, the one that allows violation of Wh-islands in certain languages, among other things. For this type of extension, the key insight was provided by Kayne (1983): the path from dependent element to antecedent must meet certain conditions (see also Nakajima (1982)). In particular, Kayne observed that the direction in which the successive projections (up to the antecedent) are governed plays a crucial role in domains the size of which exceeds the size
of the minimal Xmax. This insight led to some remarkable predictions; for instance, as to the (near) absence of parasitic gaps in SOV languages like German and Dutch (Bennis and Hoekstra (1984), Koster (1983, 1984b), and chapter 4 below). In addition to some minor modifications necessary for languages like Dutch, my interpretation of the directionality constraints differs somewhat from Kayne's. First of all, it seems to me that directionality plays no role in the assignment of scope (whether it is executed as LF movement or not). Second, directionality constraints belong entirely to the theory of markedness in my view. In the unmarked domain theory (entailed by the Bounding Condition (27)), directionality does not play a role (see chapter 4 for further details). It seems to me that Kayne's theory of path conditions can also be generalized for types of long distance dependencies other than Wh-movement. Many languages have long distance anaphora, for instance (see Yang (1984)). As in the case of Wh-movement, domain stretching in these cases often depends on the nature of the successive governors. In Icelandic, for instance, long distance reflexivization is possible if all Vs from the reflexive up to the domain of the antecedent are in the subjunctive mood (see Maling (1981) and the literature cited there, and furthermore chapters 4 and 6 below). Possibly, there are very similar conditions on long Wh-movement in certain languages. Alexander Grosu has informed me, for example, that in certain cases of Rumanian long Wh-movement, all verbs of the path from trace to antecedent must take the supine form if the verb of the top domain (containing the Wh-antecedent) has the supine form (see also Georgopoulos (1985) for uniform paths of the realis or irrealis). In general, then, long distance dependencies (other than successive cyclic Wh-movement) seem to require certain types of agreement among the successive domain governors.
These governors form a chain that we might call a dynasty (Koster (1984b) and chapter 4 below): (36)
A dynasty is a chain of governors such that each governor (except the last one) governs the minimal domain containing the next governor.
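The agreement requirement on dynasties can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Governor` record, its three features, and the function names are hypothetical, the list passed in is simply taken to satisfy the structural part of (36) already, and the feature values in the examples are schematic rather than an analysis of any particular language.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Governor:
    """A governor described by the features a dynasty may have to agree on."""
    category: str      # lexical category, e.g. "V" or "P"
    direction: str     # direction of government: "left" or "right"
    mood: str = ""     # verbal mood, e.g. "subjunctive"; empty for non-verbs


def agrees(dynasty: Sequence[Governor], feature: str) -> bool:
    """True if every governor in the chain has the same value for `feature`."""
    return len({getattr(g, feature) for g in dynasty}) <= 1


def extension_licensed(dynasty: Sequence[Governor], feature: str) -> bool:
    """A domain may be stretched along a non-empty dynasty that agrees on `feature`."""
    return len(dynasty) > 0 and agrees(dynasty, feature)


# Schematic chains: a trace governed by P inside the domain of a V.
uniform = [Governor("V", "right"), Governor("P", "right")]
mixed = [Governor("V", "left"), Governor("P", "right")]

# Schematic chains of verbs between a reflexive and its antecedent.
subjunctive_path = [Governor("V", "right", "subjunctive")] * 2
broken_path = [Governor("V", "right", "indicative"),
               Governor("V", "right", "subjunctive")]

print(extension_licensed(uniform, "direction"))       # uniform direction of government
print(extension_licensed(mixed, "direction"))         # direction changes along the chain
print(extension_licensed(subjunctive_path, "mood"))   # all verbs share one mood
print(extension_licensed(broken_path, "mood"))        # one verb breaks the agreement
```

The point of the sketch is that the check itself is the same for every kind of dynasty; only the feature on which the governors must agree differs.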
Thus, the governors that can stretch the domain for Icelandic reflexives must be in the subjunctive form. The governors that can stretch the domain for Wh-traces must govern in the same direction, and so on. Until evidence to the contrary is found, I assume that there are only very few kinds of dynasties, and that their nature is determined by Universal Grammar. In fact, I know of only three kinds of dynasties, determined by the following types of agreement: directionality (for Wh-movement), interclausal verb agreement (subjunctive, supine, etc.), and agreement of lexical category (see below). If dynasties are defined by UG, the nature of domain extensions is not
determined by data, and is not by itself a matter of parametric variation. Dynasties might just be dormant features of all grammars, which become available in certain cases if independent parameters are set. Thus, preposition stranding involves a certain type of domain extension (beyond the minimal PP containing the trace). It is presumably acquired by the language learner if certain data (for instance, stranded prepositions) show that the language under consideration has prepositions among its structural governors (see Kayne (1984, ch. 5)). Even if the domain extension is usually acquired on the basis of data, there is no reason to assume that the same holds for the nature of the dynasty, which determines where prepositions can be stranded and where not. Similarly, long distance reflexivization might be an option for all languages in which interclausal verb dependency is somehow expressed. What is a matter of parametric variation, then, is the nature of the verb-verb agreement, not the fact that it defines a domain extension. Data seem to play a role in the factors that trigger domain extensions, and not in the factors that determine their shape. If all this is correct, we have the following domain theory. The shape of grammatical domains is entirely determined by UG, by the Bounding Condition (27) in the unmarked case, and by a very limited number of dynasty-governed domain extensions in the marked case. Parameters play a precisely defined and limited role in this theory: they block or open the way to certain domain extensions. In other words, parameters do not play a role at all in the universal configurational matrix (13) that defines the basic shape of dependencies in all languages. In domain theory, parameters are the switches that separate the unmarked domain and its marked extensions.
It is not unlikely that parameters play other roles as well, but there can be little doubt that the theory of parameters can develop beyond a mere statement of differences among languages only if the use of parameters is somehow severely limited. I will now turn to the role and nature of dynasties in island violations. Until Chomsky (1977), generative grammar had a rather simple theory of islands. There were just a few, like the Complex NP Constraint (CNPC) and the Wh-island Condition, which were both explained by Subjacency. This theory was elegant and suggestive, but it was not entirely satisfactory for a number of reasons. Some reasons have already been mentioned, among others the stricter nature of island conditions in a language like Dutch. Other languages, like Italian and the Scandinavian languages, turned out to be more permissive with respect to island violations. But even within English, violations of island conditions vary strongly in acceptability. Some of these differences, such as the subject-object asymmetry in Wh-island violations, were explained in terms of the ECP, but others led to many theories but little agreement among linguists. One of the controversial theories is the directionality theory based on Kayne (1983), which was briefly mentioned before. So far, it is the only
available theory that explains why Dutch has only stranding of postpositions (not of prepositions), and why parasitic gaps are practically lacking in Dutch. This theory also explains the sharp difference between English and Dutch with respect to violations of the CNPC. Thus, certain violations of this condition are reasonably acceptable in English: (37)
[Which race did you express [NP a desire [to win t]]]
The trace is not bound within its minimal domain (expressed by the innermost brackets). So, it can only be bound in an extended domain, in this case the domain indicated by the outermost brackets. The domain extension is well-formed, because the governors of the dynasty all govern in the same direction: the three relevant governors, express, desire, and win, all govern to the right. This kind of directional agreement is required by the theory of Kayne (1983) and its offspring (like Bennis and Hoekstra (1984), Koster (1984b), and chapter 4 below). The Dutch equivalent of (37) is hopelessly ungrammatical: (38)
*[Welke race heb je [een verlangen [te t winnen] uitgedrukt]]
The explanation is straightforward: the N verlangen 'desire' governs to the right, but contrary to what we see in English, the two verbs govern to the left in an SOV language like Dutch. Since there is no dynasty of governors governing in the same direction, the domain extension is not well-formed. A theory based on directionality, though successful in many cases, does not work as an account for the variable acceptability of Wh-island violations, both within one language and across languages. For example, earlier attempts to explain the relative strictness of Wh-islands in Dutch dealt with examples like the following (Koster (1984b)): (39)
*Welk boek weet je [wie t gelezen heeft]
which book know you who read has
'Which book do you know who read?'
This fact seemed to be explained by the directionality constraints, under the assumption that the matrix verb governs the clausal complement to the right, while the object in the embedded clause (indicated by the trace) is leftward-governed by the verb. This is in accordance with the fact that tensed complement clauses must occur to the right of the verb, while NP-objects must occur to the left. This explanation is incorrect, as pointed out by Koopman and Sportiche (1985), who have given relatively acceptable violations of Wh-islands in Dutch: (40)
Met welk mes weet je niet hoe je dit brood zou kunnen snijden
with which knife know you not how you this bread could cut
'With which knife don't you know how you might cut this bread?'
Relatively acceptable Wh-island violations can be found in Dutch after all, contrary to the predictions made by the directionality theory. The fact that earlier studies claimed a stricter Wh-island behavior for Dutch than for English is probably due to two factors. First of all, Wh-island violations in English are often milder with relative pronouns extracted from dependent questions: (41)
?This is the boy that I know who kissed
In Dutch, such sentences are distinctly worse: (42)
*Dit is de jongen die ik weet wie kuste
This contrast is probably due to an independent factor, namely the fact that Dutch has so-called d-words (like die) in such cases, which are somewhat more difficult to extract, even in non-island contexts. Furthermore, Dutch has only a very limited supply of infinitival Wh-complements. In English, these are among the best examples of relatively acceptable Wh-island violations, while extractions from tensed clauses (like (39)) are often bad in both languages if subjects are crossed. Examples without Wh-subjects in COMP lead to relatively mild violations in Dutch: (43)
a. ?Welke boeken wil je weten aan wie hij gegeven heeft?
   which books want you know to whom he given has
   'Which books do you want to know to whom he gave?'
b. ?Aan wie wil je weten welke boeken hij gegeven heeft?
   to whom want you know which books he given has
   'To whom do you want to know which books he gave?'
Koopman and Sportiche claim a further contrast between examples like (43a) and (43b): extraction of a direct object is supposed to be worse (43a) than extraction of a subcategorized PP (43b). To my ear, however, (43a) and (43b) hardly differ in acceptability. It is really not a contrast to build a theory on. The directionality theory is of course also insufficient for contrasts within one language. In earlier work, I observed a contrast between the extractability of adjuncts and, for instance, direct objects on the basis of examples like the following (Koster (1978c, 195-198)):
(44) a. What don't you know how long to boil?
     b. *How long don't you know what to boil?
Huang (1982) sought to relate such differences between the extractability of complements and adjuncts to the ECP: complements are properly governed (in the sense of the ECP), while adjuncts are not. Koopman and Sportiche (1985) further developed this type of theory by stipulating that long extraction across Wh-islands is possible if and only if the long-moved Wh-element comes from a θ-position. An alternative theory has been developed by Hans Obenauer (1984, based on work presented in 1982) and Guglielmo Cinque (1984). According to this theory, extraction beyond the domains defined by Subjacency always involves pro. Since only NPs (and certain designated PPs) have the feature +pro, only these elements can be extracted from Wh-islands. This theory also explains the poor extractability of adjuncts in cases like (44b). In spite of success in cases like this one, neither the Huang-Koopman-Sportiche theory nor the Cinque-Obenauer theory explains all facts. The former theory, for instance, does not explain Adriana Belletti's observation that extraction of thematic PPs from certain islands is much worse than extraction of NPs:
(45)
*With whom did you express [a desire [to talk t]]
For the Cinque-Obenauer approach, such facts and many others (see Koster (1984b)) are unproblematic, because there is no overt pro-form corresponding to the PPs in question. The Cinque-Obenauer theory, on the other hand, does not account for the relative acceptability of (43b). This fact cannot be accounted for by Subjacency, as suggested for similar facts in Spanish by Obenauer (1984). Subjacency would have to be formulated with S' as bounding node for Dutch. But apart from all the other problems with Subjacency (some of which have been mentioned above), this solution would not account for the fact that the following sentence is still relatively acceptable in Dutch: (46)
?Aan wie wil je weten [S' welke boeken hij zegt [S' dat hij gegeven heeft]]
to whom want you know which books he says that he given has
'To whom do you want to know which books he says that he has given?'
This sentence is (43b) with one embedding added. The fronted PP comes from the most deeply embedded clause. Therefore, it has to pass two S's, which is a violation of Subjacency in the intended sense. And yet (46) is hardly less acceptable than (43b). Subjacency, in other words, cannot be
the factor that governs the extractability of PPs from islands in these cases. Summarizing, we have the following situation. Many facts, such as the nature of P-stranding in Dutch, the near absence of parasitic gaps in German and Dutch, and the strong contrast between English and Dutch with respect to the CNPC, can only be accounted for at the moment by a theory that incorporates Kayne's directionality constraints in some form. The nonextractability of adjuncts follows from the Huang theory and its further development by Koopman and Sportiche (1985). It also follows from the Cinque-Obenauer theory. The latter theory has the advantage that it also explains Adriana Belletti's observation of the nonextractability of complement PPs in almost all cases, other than (43b) or (46). At least for this reason, the Cinque-Obenauer theory must be accepted as an important supplement to a Kayne-type directionality theory (along with the qualifications made in chapter 4 below, in my opinion). The Koopman-Sportiche theory has one advantage, however. It is the only theory that does not exclude (46). As we have seen, both the application of the directionality theory to this type of example and the Cinque-Obenauer theory wrongly exclude (46). The question, then, is whether we can save this advantage of the Koopman-Sportiche theory in some form. In fact, examples like (43b) and (46) were given a special status in Koster (1984b) in a discussion of similar examples from Italian. In one of the well-known examples from Rizzi (1978), a PP is extracted from a Wh-island: (47)
Tuo fratello, a cui mi domando che storie abbiano raccontato t, era molto preoccupato
your brother to whom I wonder which stories they have told was very troubled
Like (46), this example is incompatible with the Cinque-Obenauer theory as interpreted in Koster (1984b). For this reason, I introduced an extra condition, the Extended Bounding Condition, for examples like (47). According to this condition, the unmarked domain (27) is stretched if there is a dynasty of only Vs. Contrary to the directionality-governed dynasty, which only allows extraction of NPs (= pro), this V-dynasty would allow Wh-fronting of all categories, just like in the unmarked domain (Wh-movement within a single clause). This view has the consequence that Italian counterparts of examples like (46) are predicted to be relatively acceptable, contrary to what the Subjacency account of Rizzi (1978) suggests. To my knowledge, this prediction is borne out. In spite of this, some other data from Koopman and Sportiche (1985) suggest that this formulation (in terms of the Extended Bounding Condition) is too permissive: the account permits extraction of categories of all types (including adjuncts) in domains determined by a pure V-dynasty. Adjuncts, however, cannot be extracted from Wh-islands within the domains in question: (48)
*Waarom wil je weten [wat hij t gelezen heeft]
why want you know what he read has
'Why do you want to know what he read t?'
It appears that the Koopman-Sportiche generalization is exactly right for extended domains with pure V-dynasties: in those domains only θ-marked categories (NPs or PPs) can be extracted. But as soon as we have dynasties with mixed categories, for instance N and V as in the CNPC, directionality constraints become relevant and only NPs can be extracted (in accordance with the Cinque-Obenauer approach). Both the Huang-Koopman-Sportiche approach and the Cinque-Obenauer approach, then, are right, be it that they concern slightly different domains. All in all, we have a three-way distinction for Wh-movement, one for the unmarked case (49a), and two for the marked case (49b and c), depending on the nature of the dynasty:
(49) a. all categories movable within basic domain (27) (no dynasty)
     b. only complements movable in a domain defined by a dynasty of Vs (no directionality)
     c. elsewhere: only NPs moved if there is a dynasty of equally oriented governors
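The three-way distinction in (49) can be read as a small decision procedure. The Python sketch below is only an illustration under simplifying assumptions: a dynasty is encoded as a list of hypothetical (category, direction) pairs, an empty list stands for the basic domain (27), and "complement" is approximated by a Boolean flag for θ-marking. The direction values in the usage lines are illustrative encodings, not claims about the full analysis of each sentence.

```python
def can_move_to_comp(category, theta_marked, dynasty):
    """Sketch of (49): may a Wh-phrase of this type be moved to COMP?

    category     -- "NP", "PP", or "ADJUNCT" (hypothetical labels)
    theta_marked -- True if the phrase comes from a theta-position
    dynasty      -- list of (governor_category, direction) pairs;
                    an empty list means movement within the basic domain
    """
    if not dynasty:
        # (49a): no dynasty, all categories are movable.
        return True
    if all(cat == "V" for cat, _ in dynasty):
        # (49b): pure V-dynasty, only complements; direction is irrelevant.
        return theta_marked
    # (49c): elsewhere, only NPs, and only with equally oriented governors.
    same_direction = len({direction for _, direction in dynasty}) == 1
    return category == "NP" and same_direction


v_dynasty = [("V", "right"), ("V", "left")]                   # Wh-island, as in (43b), (48)
cnpc_english = [("V", "right"), ("N", "right"), ("V", "right")]  # as in (37), (45)
cnpc_dutch = [("V", "left"), ("N", "right"), ("V", "left")]      # as in (38)

print(can_move_to_comp("PP", True, v_dynasty))        # True, cf. (43b)
print(can_move_to_comp("ADJUNCT", False, v_dynasty))  # False, cf. (48)
print(can_move_to_comp("NP", True, cnpc_english))     # True, cf. (37)
print(can_move_to_comp("PP", True, cnpc_english))     # False, cf. (45)
print(can_move_to_comp("NP", True, cnpc_dutch))       # False, cf. (38)
```

The function returns a categorical yes or no, whereas the text speaks of relative acceptability; graded judgments are outside the scope of this sketch.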
The contrast between (49b) and (49c) is not entirely unexpected: quite generally, the acceptability of extractions from islands is a function of the uniformity and simplicity of dynasties.¹⁰ The most important conclusion, however, is that the extraction facts from many languages confirm the reality of the (unmarked) Bounding Condition (27). To the best of my knowledge, the Bounding Condition defines the only domain (in all languages with Wh-movement) in which categories of all types can be moved to COMP. Domain extensions (which lead to Wh-island violations) are only possible under very limited conditions that can be met in some languages but not in others, depending on the fixing of certain parameters. A domain extension can be recognized not only by its dynasty conditions, but also by strict limitations on the type of category that can be moved to COMP.
1.4. Conclusion
In recent years, much attention has been paid to parametrized theories of grammar. On the one hand, this has given linguistic theory the necessary flexibility, but on the other hand, it has led to a rather unconstrained use
of parameters. This is somewhat reminiscent of the earlier unconstrained use of features. Like a theory of features, a theory of parameters must be constrained: it can only contribute to explanatory adequacy, beyond the mere description of differences among languages, if it indicates where parameters play a role and where not. A tentative effort towards this goal is the hypothesis of the previous section that parameters do not play a role in the unmarked core of grammar, but only as switches between this core and the marked periphery. The most important conclusion, however, is that there is an invariant core of language after all, in spite of the obvious need for parameters at some point in the theory. This invariant core is a configurational matrix, characterized by the four properties listed in (13), which plays a role in almost all local dependencies in (presumably) all languages. A crucial feature of (13) is that it incorporates a universal locality principle, the Bounding Condition (27), that is believed to hold for all constructions mentioned under (30). This locality principle is in a sense the minimally necessary locality principle for all languages in that it defines domains similar to the maximal projections of X-bar theory. Abstracting away from the dominance/nondominance distinction, we concluded that an obvious generalization can be made: the notion "maximal projection" not only defines the domain for vertical dependency relations, it also defines the unmarked domain for all other local dependency relations. Under the crucial assumption that S' (rather than VP) can be the minimal domain of V, the unmarked locality principle (27) characterizes many of the constructions in (30) without further problems.
The real challenge for the hypothesis of a universal unmarked locality principle comes from the fact that many constructions, particularly control, bound anaphora, and movement constructions, seem to require a domain definition that somehow deviates from the Bounding Condition. Control, for instance, seems to allow long distance dependency, and more generally, seems to involve principles of argument structure rather than a purely configurational theory. I have tried to show, however, that a well-defined subclass of control structures (namely, obligatory control in the sense of Williams (1980)) has exactly the properties in (13), including the Bounding Condition (27) (see chapter 3 for further details). The biggest problem has been the unification of bound anaphora and "move alpha" in terms of the Bounding Condition. The domain statement for bound anaphora, principle A of the binding theory of Chomsky (1981b, ch. 3), deviates from the Bounding Condition in that the minimal relevant Xmax must contain a SUBJECT (in the sense defined in Chomsky (1981b)). An even greater discrepancy exists between the Bounding Condition and the standard locality principle for "move alpha", i.e. Subjacency. Contrary to the Bounding Condition, Subjacency does not specify one, but two nodes of type Xmax (traditionally NP and S' (or S)). In short, both bound anaphora and movement seem to require domains
larger than the one specified by the Bounding Condition. The idea that bigger domains must be defined was reinforced by the study of long distance anaphora in languages like Icelandic (and from a different perspective, Japanese) and by reports concerning languages with permissive island behavior, like Romance and Scandinavian. It is fairly obvious now, I believe, that in many languages with phenomena that seem to require more extended domains, the minimal domain defined by the Bounding Condition (27) can still be detected somehow. In languages with long distance anaphora, different things often happen in the minimal domain. In Dutch, for instance, the two reflexives zich and zichzelf are usually in complementary distribution, but they are bound in the same way in the only minimal domain in which they can have an antecedent, namely the domain of V (= S'). As we saw in section 1.1, this domain is specified by the Bounding Condition (without reference to the notion subject). The notion subject only appears to play a role if the anaphors in question are not bound in their minimal Xmax: zichzelf must be bound in the domain of a subject (like English himself), while zich must be free in the minimal domain containing a subject. Similarly, clitics are usually bound in their minimal governing Xmax and cannot be bound across major phrase nodes. Again, the domain for these clitics can be defined by the Bounding Condition, without reference to the notion subject. The facts from Dutch suggest that the notion subject does not play a role in the basic domain, but only in an extended domain, which is not universal, as shown by the clitics in many languages. In short, bound anaphors are universally bound within their minimal Xmax. Outside this minimal domain, anaphors are bound in the minimal subject domain, free in the minimal subject domain, or not bound at all.
In comparing various languages, we observe that notions like subject, INFL, or COMP do not define basic domains, but only play a role as domain stretchers. Domain stretching is a marked option in this view. Another method of domain stretching, necessary for long distance anaphora and long movement, is based on the dynasty concept. According to this idea, a domain can be stretched if the governors in the path from dependent element to antecedent agree in some fashion (see chapter 6 for further details). "Move alpha" is the most important case, because its alleged deviant properties have always played a role in the defense of the traditional derivational perspective on grammar. "Move alpha" defines the mapping between various levels of representation. If the properties of "move alpha" cannot be defined, one argument for a particular multilevel approach collapses.¹¹ As we have seen, Subjacency is the only relevant distinguishing property of "move alpha". If "move alpha" is not characterized by Subjacency, but by the universal Bounding Condition, it loses its distinct character.
The evidence that "move alpha" is not characterized by Subjacency but by the Bounding Condition is very strong in my opinion. Even in English, the Bounding Condition, simpler than Subjacency, suffices for almost all contexts. The only exception is a certain class of postverbal extractions. But this context is clearly irrelevant because, on the one hand, Subjacency is both too weak and too strong for this context, and on the other hand, in many languages (Dutch, for instance) extraction in this context, just as in the other contexts, is perfectly characterized by the Bounding Condition (see Koster (1978c)). The peculiar permissiveness of movement from postverbal contexts in English and a few other languages derives from the possibility of preposition stranding, together with the uniform direction from which the successive projections from trace to antecedent are governed. Thanks to some independent structural features of English, this language allows for a domain extension in the very limited context in question, an extension determined by dynasties of uniformly oriented governors. Strong evidence for the Bounding Condition has come from the study of Wh-island violations in recent years. These violations differ much in strength, depending on the nature of the Wh-category moved to COMP. The relevant fact here is that in the domain defined by the Bounding Condition all categories (including adjuncts) can be moved to COMP, while there are severe limitations both on the type of category moved and on the dynasty conditions if a Wh-element is moved to COMP in an extended domain. The Bounding Condition, in other words, defines the domain in which all categories can be moved to COMP, relatively free of further conditions. This distinction between the unmarked domain and the extended domain can be observed in most (perhaps all) languages studied from this perspective, even in Italian, as shown by Huang (1982) (see chapters 4 and 5 for further details).
If all this is correct, the theory of the configurational matrix (which includes the Bounding Condition) is a step in the direction of a unified theory of grammatical dependency relations. The theory is not only universal in the sense that it applies to all languages, it is also universal in the sense that it applies to all constructions of a certain type. The hypothesis that the core properties of grammar are construction-independent, I will refer to as the Thesis of Radical Autonomy (see chapter 7). Needless to say, a theory with this scope is highly abstract. But the promising aspect of it is that in spite of this degree of abstractness, it makes very concrete predictions about a large number of constructions. It determines the locality properties of constructions as diverse as subcategorization, bound anaphora, control, and gapping. In the chapters that follow, I will demonstrate the reality of the configurational matrix in X-bar structures (chapter 2), control structures (chapter 3), structures involving Wh-movement (chapter 4) and NP-movement (chapter 5), and also in bound anaphora (chapter 6). If the
configurational matrix can be detected in all these different constructions, the Thesis of Radical Autonomy is confirmed, which ultimately entails that core grammar is not functionally determined but rather based on mental structures without an inherent meaning or purpose (chapter 7).
NOTES
1. Chomsky (1984).
2. See Sportiche (1983) for a lucid development of this idea.
3. See Chomsky (1981b).
4. See Gazdar (1982), for example.
5. See Bouchard (1984) for the fundamental similarities between empty categories and lexical anaphors in this respect.
6. I am assuming throughout this book that S' (rather than VP) is the minimal Xmax for V. This assumption is at variance with the usual assumption that the maximal projection of V is VP, and that INFL and/or COMP are the heads of new projections. I have never been quite convinced by this assumption, however. It might be useful to make a distinction between lexical projections (based on the categories V, N, P, and A) and auxiliary projections (based on Q, COMP, and INFL). For some purposes, then, S' might be the minimal domain for V (i.e. VP plus its auxiliary projections based on INFL and COMP), and for others VP might be the relevant domain (i.e. the lexical projection without its auxiliaries). Whatever the ultimate truth in this respect, it seems to me that S' often replaces VP as the minimal domain of V.
7. For earlier accounts, see for instance Koster (1982b) and (1984a).
8. Thus, the binding theory for English has the following form:
   A bound anaphor must be bound in:
   (i) its minimal Xmax, or elsewhere:
   (ii) in its minimal SUBJECT domain
   The first part, (i), is the universal Bounding Condition. The second part, (ii), is the language-particular extension for English. The status of (ii) can be derived from the fact that it is either lacking in other languages, or is a dimension of contrast, as we saw in section 1.1 for the Dutch reflexives.
9. The following discussion of gapping is from Koster (1984c), where these and other facts are somewhat more extensively discussed.
10. See Koster (1984b), for example.
11. It should be noted that I am not arguing against multilevel theories in general. Apart from S-structure (with its "D-structure" and "LF" properties), I am assuming LS (lexical structure) and PF. The mapping among these levels, however, does not have the properties of "move alpha".
Chapter 2
Levels of Representation
2.1. Introduction
The construction of levels of representation, like deep and surface structure, connected by movement transformations is the standard solution to a certain reconstruction problem. Thus, there are idiomatic expressions like to make headway, in which the idiomatic connection requires the adjacency of the verb make and the NP headway. Assuming that adjacency is a necessary condition for idiomatic interpretation, the following type of example, in which the idiomatic elements are "scattered", poses the classical problem:
(1)
Headway seems to be made
Since the necessary adjacency is lost here, it must be somehow reconstructed. Deep structure was the answer: there must be an underlying level at which make and headway are literally adjacent: (2)
seems to be made headway
The surface structure (1) is derived from the deep structure representation (2) by what is now called "move alpha". This solution was generalized to most situations in which a strictly locally defined relation must be reconstructed. Another example is subject-verb agreement: (3)
a. Mary thinks that the boys have lost
b. The boys think that Mary has lost
The number of the finite verb (have vs. has) is determined by the number of the subject that immediately precedes it. As in the idiom example, an element of the agreement relation (the subject in this case) can be indefinitely far away from the verb: (4)
Which boys do you think that Bill said that Mary thinks have lost
Since it is entirely obvious that number agreement depends on the local
subject of a verb, and since the relevant subject which boys is not occupying the relevant local position, it is again reasonable to reconstruct the deep structure in which the subject and the verb are adjacent: (5)
do you think that Bill said that Mary thinks which boys have lost
These examples, to which many others could be added, illustrate one of the fundamental problems that transformational-generative grammar has sought to solve. The standard solution, constructing a level of deep structure, seems very natural. In fact, it seems to be the only reasonable solution in a framework without traces. The standard solution to the reconstruction problem has been undermined by two developments. First, it was shown that the proposed solution was not sufficiently general in that there were similar cases that could not be solved by postulating a level of deep structure. Secondly, trace theory came to the fore, which suggested what in my opinion is a more promising alternative. To illustrate the first point, consider binding of the anaphor himself. Like idiom interpretation and number agreement, anaphor binding is a local relation: (6)
John thinks that the boy admires himself
Both antecedent and reflexive enter into the binding relation if they are within the same local domain. As before, the antecedent can be moved from the necessary local position: (7)
Which boy does John think admires himself
As before, it is clear that the local pattern can be restored by reconstructing the antecedent position of which boy: (8)
does John think which boy admires himself
It is also possible to reorder the reflexive instead of the antecedent: (9)
a. Himself I don't think he really likes
b. What he really likes is himself
It is my claim that in these cases the standard solution does not work. Neither in the case of topicalization (9a), nor in the case of pseudo-cleft (9b) is it possible to literally reconstruct himself in the local domain of the antecedent (the object position of like). I will return to topicalization in what follows. Here, I will briefly illustrate this point with the pseudo-cleft construction. In accordance with the standard solution to the reconstruction problem, it was originally
thought that the deep structure of (9b) literally has himself in the object position of the verb: (10)
[NP it [S' he really likes himself]] is -
Deriving (9b) from (10) is not easy. Himself has to be moved to the postcopular position indicated by -, and it must be replaced by what (see Chomsky (1970, 209) for a solution along these lines). This way of deriving pseudo-cleft sentences has been universally abandoned. Roger Higgins (1973) convincingly demonstrated that it does not work. In present terms, the movement of himself is impossible because it violates Subjacency. It would also violate the θ-criterion because himself, an argument, would fill a θ-position at D-structure which is filled by the variable (also an argument) bound by what at surface structure. Last but not least, the binding theory that relates himself to its antecedent does apply at S-structure (see Chomsky (1981b)), so that himself can only indirectly be linked to its antecedent. In short, (9b) is a clear example in which a local relation, the antecedent-reflexive relation, cannot be reconstructed in the standard way by stipulating that there is a deep structure like (10). Apparently, local relations may be reconstructed in a weaker way, namely by the mediating properties of anaphors. In the copular predication (9b), the reflexive himself is interpreted as the value of the pronominal what, which in turn binds a trace at the position where the antecedent-reflexive relation is normally locally determined. The consequences of the fact that the reconstruction problem cannot be solved by standard means in (9b) should not be underestimated. In fact, we can interpret (9b) as a counterexample to the standard approach if the latter is taken to have the following content: local relations can only be satisfied by elements in situ, i.e. by elements that literally occupy the positions involved in the local relations. It seems to me that this is one of the core ideas of the standard level approach; (9b) shows that the standard approach is untenable as a general solution to the reconstruction problem. A somewhat weaker principle is in order.
Suppose that local relations are defined for a local domain β. We then need a principle like:

(11) A dependent element δ and an antecedent α satisfy a local relation in a domain β if α and δ are in domain β, or if α or δ are respectively related to α' or δ' in β.
The standard approach requires "being in" a certain position; the revised approach (11), necessary in view of examples like (9b), says that "being in" the relevant positions is fine, but "being related" to the positions in question is sufficient. It is now clear why (11), in conjunction with trace theory, potentially undermines the standard approach. In a theory with traces, the S-structures of (1) and (4) are (12a) and (12b), respectively:

(12) a. Headway seems to be made t
     b. Which boys do you think that Bill said that Mary thinks t have lost
Headway is interpreted idiomatically if it is in the object position of make, but according to (11) it is also so interpreted if it is related to an element in the object position of make. The trace t in (12a) is precisely the "anchor" element to which headway can be related. Similarly, which boys in (12b) is linked to an element t in the relevant local domain, so that which boys satisfies the locally defined agreement relation. Given the necessity of (11), trace theory is not a complement to the standard approach, but an alternative to it: with traces represented at S-structure, it is not necessary to have a separate level of D-structure. In a sense, deep structure does not disappear, because its relevant aspects are now coded into S-structure. Chomsky (1973, sect. 17) realized that trace theory suggested the alternative just mentioned, but has never accepted it as the better theory. Since many of the standard arguments lose their force under the assumptions of trace theory, the motivation for a separate level of D-structure, related to S-structure by "move alpha", must be sought elsewhere. In principle, there are two ways to justify D-structure plus movement: either to show that there are properties that are naturally stated only at D-structure (and not at S-structure), or to demonstrate that "move alpha" has properties that cannot be identified as the properties of rules of construal at S-structure. Note that the second type of argumentation is indirect and weak in principle. The only point of this type of argumentation is that "move alpha" can be reformulated as a rule of construal at S-structure, but that such restatements are unsuccessful if the rules of construal still have the properties of "move alpha", which are distinct from the properties of other construal rules. The theory without "move alpha" would be a notational variant of the two-level theory, at best (Chomsky (1981b)).
Even if "move alpha" has distinct, irreducible properties, the derivational perspective is not really well established, because it is clear that different rules of construal can have different properties at S-structure. Thus, the alleged unique properties of "move alpha" give circumstantial evidence for a derivational approach, at best. If it can be shown, however, that there are no unique principles applying to "move alpha" and not to other rules of construal, a much stronger point can be made: "move alpha" becomes entirely superfluous. This is one of the central theses of this book: the (unmarked) configurational core of "move alpha" can also be found in a subclass of control structures, in bound anaphora constructions, and in many other constructions. In short, my argument against "move alpha" is essentially an argument of conceptual economy. I agree with Chomsky (1981b, 92) that there is no argument based on conceptual economy if the properties of "move alpha" are not shared by other rules of construal. But I will show that there is much evidence that there is a common core in "move alpha" and the rules of construal. One of the redundancies of the current GB approach is that it has two indexing procedures: free indexing for construal, and indexing by application of "move alpha". By generating S-structures directly, we can do with only one procedure, namely free indexing. The configurational matrix discussed in chapter 1 can be seen as a definition of possible coindexing configurations: coindexing is only permitted between a dependent element δ and a unique antecedent α within a local domain β. As we briefly indicated in chapter 1, coindexing can be interpreted in one and only one way:

(13) share property
This mode of interpretation is sufficient for both the antecedent-trace relation and the antecedent-anaphor relation. It is the central interpretive rule of grammar that these two forms of coindexing share with several other relations. Properties are only optionally shared. A category can derive properties from another category only if it does not yet have the properties in question. This is determined by the uniqueness property of the configurational matrix. Thus, an NP can only share the lexical content of another NP if it does not have a lexical content of its own. Similarly, θ-roles and referential indices can only be borrowed by categories that do not have a θ-role or a referential index of their own. Some examples may illustrate this:

(14) a. John_i saw himself_i
     b. John_i saw Bill_i
Suppose that all NPs in a tree except anaphors have an inherent referential index. Suppose furthermore that the indices in (14) do not indicate intended coreference but accessibility of rule (13) for the two elements in question. Then, Bill in (14b) cannot share a referential index with John by (13), because this would violate the uniqueness property: an NP can have one and only one referential index. As a consequence, John and Bill must have a different referential index in (14b), which is ultimately interpreted as "disjoint reference". An anaphor like himself, however, does not have an inherent referential index. This might be seen as the definition of the notion "anaphor". But since all NPs of a certain type must have a referential index, himself must borrow the index from its possible antecedent John, which is brought about by (13). Compare now (15a) with (15b):

(15) a. John_j was arrested t_j
     b. John_j saw himself_j

Again, we have two coindexed NPs in a local relation permitted by the configurational matrix. Again, then, whatever properties are lacking from one of the NPs can be transferred by (13). In the first case, (15a), a θ-role must be transferred. Since John stands in the proper relation to its trace t, it can borrow a θ-role from the trace by (13). Nothing blocks this transfer, because John is not in a position where it is assigned another θ-role. In (15b), we find two NPs that meet the same configurational criteria, but here it is not possible to transfer a θ-role from himself to John. The optional rule (13) allows this transfer, but the result would be filtered out by the uniqueness property (usually referred to as the θ-criterion): since John already has a θ-role it cannot share another θ-role with an element coindexed with it. In short, optional property transfer (13) in conjunction with independent principles like the uniqueness condition not only gives the results of the construal rules, but also the results of "move alpha". It should be said at the outset that I am not claiming that we find the same relation in (15a) and (15b). There is an obvious difference between the antecedent-trace relation found in (15a) and the antecedent-anaphor relation found in (15b). What I am claiming is something different: both (15a) and (15b) involve the same interpretive rule with the same configurational properties, namely (13). The result of this rule is different in these two cases because of independent factors, namely, the fact that John in (15b) already has a θ-role, while in (15a) John is in a non-θ-position. But clearly, this difference has nothing to do with the interpretive rule involved, which is (13) in both cases. What I am advocating here, in other words, is a more modular approach to the two different relations in (15a) and (15b), respectively: one interpretive rule together with two different antecedents (θ versus non-θ) yields two different relations.
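The selective behaviour of (13) in (14) and (15) can be pictured with a small Python sketch. It is only an illustration, not part of the theory's formal apparatus: the class `Category`, the function `share_property`, the property names, and the θ-role labels ("theme", "experiencer") are hypothetical choices made here. The sketch assumes that a category is a collection of properties and that the uniqueness property amounts to refusing a transfer whenever the receiving category already has a value for the property in question.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """A hypothetical syntactic category: a label plus the properties it has."""
    label: str
    props: dict = field(default_factory=dict)


def share_property(source: Category, target: Category, prop: str) -> bool:
    """Sketch of the optional rule (13) for two coindexed categories.

    The property is copied from source to target only if the target lacks it;
    a target that already has a value blocks the transfer (uniqueness).
    Returns True if something was transferred.
    """
    if prop not in source.props or prop in target.props:
        return False
    target.props[prop] = source.props[prop]
    return True


# (15a) John_j was arrested t_j: John sits in a non-theta position.
john_a = Category("John", {"ref_index": 1})
trace = Category("t", {"theta": "theme"})           # role label is an assumption
moved = share_property(trace, john_a, "theta")      # transfer succeeds

# (15b) John_j saw himself_j: John already has a theta-role of its own.
john_b = Category("John", {"ref_index": 1, "theta": "experiencer"})
himself = Category("himself", {"theta": "theme"})   # anaphor: no inherent index
blocked = share_property(himself, john_b, "theta")    # filtered out by uniqueness
bound = share_property(john_b, himself, "ref_index")  # anaphor borrows the index

# (14b) John saw Bill: Bill has an inherent index, so none can be borrowed.
bill = Category("Bill", {"ref_index": 2, "theta": "theme"})
disjoint = not share_property(john_b, bill, "ref_index")
```

On this picture the difference between the two relations lies entirely in what the categories already carry, not in the rule that relates them.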
The alternative approach sketched here gives a unified account of the common core of "move alpha" and other rules of construal. It not only accounts for the classical cases discussed at the beginning of this chapter but also for the problematic (9b), which was beyond the scope of "move alpha". Let us briefly consider, then, how these cases are accounted for. Take the S-structure representation of (1):

(16) Headway_j seems to be made t_j

The relevant idiomatic interpretation is forced upon this structure if the complement of made has the lexical content headway. Since the trace t_j does not have inherent lexical properties, they must be borrowed elsewhere. The trace and its antecedent headway meet the conditions of the configurational matrix, so that (13) applies. This entails that t_j has the required lexical properties, which it shares with its antecedent. Thanks to (13), this result can be derived without reconstruction of a level of D-structure in which headway actually occupies the position of the trace. Similar considerations hold for the agreement fact (12b), repeated here for convenience:

(17) Which boys_j do you think that Bill said that Mary thinks t_j have lost
The agreement relation requires the feature "plural" on the trace t_j. Traces never have such properties inherently, but thanks to (13) the feature can be borrowed from the antecedent which boys, which is inherently plural. Let us now have a closer look at the various levels that have been proposed in the literature:

(18) a. D-structure (Chomsky (1981b))
     b. NP-structure (Van Riemsdijk and Williams (1981))
     c. S-structure (Chomsky (1981b))
     d. Logical Form (Chomsky (1981b))
     e. surface structure (Chomsky (1981b))
There is some consensus about the idea that S-structure is the most fundamental level of syntactic representation. Given the strong and growing evidence for empty categories with their distinct properties, the existence of this abstract level seems well established. Naturally, surface structure is then also relatively unproblematic. It differs from S-structure by certain marginal deletions, and perhaps by certain stylistic rules. All the other levels are highly problematic. They are interrelated by "move alpha", a ghost device the properties of which have never been successfully identified. This can be seen by inspecting the properties of traces, the products of "move alpha". Chomsky (1981b, 56) gives the following distinguishing properties:

(19) a. trace is governed
     b. the antecedent of trace is not in a θ-position
     c. the antecedent-trace relation satisfies the Subjacency condition
None of these properties distinguishes traces from other things. Not only traces but also lexical anaphors are governed. There is strong evidence that PRO can be governed in a subclass of control structures (Koster (1984a) and chapter 3 below); pro (Chomsky (1982a)) is also governed, as a subject in pro-drop languages and also as a resumptive pronoun (chapter 4 below). The second property (19b) is shared by trace and overt resumptive pronouns. It is an error to consider this property the property of a rule ("move alpha"). It is clearly an independent property of certain antecedents. The fact that the subject position of verbs like seem and the subject position of passive constructions are non-θ has nothing to do with "move alpha". The subject positions in question have the same properties without "move alpha", as is clear from structures like it seems that ... and from passives like it is said that ...¹ It is very unfortunate that an independent property of the antecedent is confused with a property of the rule itself; as if the fact that anaphors can have plural or singular antecedents entails that there are two entirely different rules of bound anaphora. The third property (19c) is the only substantial property that has been attributed to "move alpha". It is one of the main theses of this book that Subjacency is not a distinguishing property either. The gaps that we find in movement constructions appear to be divided into two classes with entirely different properties. The dividing line is not Subjacency, but the Bounding Condition, which also characterizes locality in many other constructions (chapter 4 below). In other words, there is no rule with the properties of (19). Of course, there are relations with these properties. But these relations are not primitive; they are modularly built up from independent elements, such as the properties of antecedent positions, and the all-purpose property-sharing rule (13). This latter rule has the properties of the configurational matrix, which has nothing in particular to do with movement constructions. If "move alpha" is an artefact, it is hard to imagine what else could justify levels like D-structure, NP-structure, or LF. Apart from "move alpha", the standard approach is to isolate properties that can only be naturally stated at one level or another. But as noted before, such arguments are weak in principle because the relevant aspects of D- or NP-structure are represented at S-structure as subparts. Arguments for levels come down, then, to the idea that subparts of S-structure can be distinguished which have their own properties. This conclusion seems hardly controversial. Let us nevertheless have a closer look at the properties that are supposed to characterize the various levels.
2.2. D-structure

It is not easy to find out exactly what D-structure is. In Chomsky (1981b, 39) we find the following characterization:

(20) D-structure lacks the antecedent-trace relation entirely. At D-structure, then, each argument occupies a θ-position and each θ-position is occupied by an argument. In this sense, D-structure is a representation of θ-role assignment - though it has other properties as well, specifically, those that follow from X-bar theory and from parameters of the base (e.g. ordering of major constituents) in a particular language.
There are two aspects here: (i) D-structure has no traces, and, (ii) it is a pure representation of GF-θ (among other things). Note that these two aspects are independent of one another. In practice, D-structure is interpreted as a level without traces, but its significance is obviously based on the second aspect, i.e. its being a pure representation of GF-θ. That the two aspects are not interrelated can be seen from an example like the following:

(21) What_j did he see t_j?
If D-structure is defined as a level without traces, (21) is of course not a D-structure, but if it is only defined as a level at which each argument occupies a θ-position, (21) does qualify as a potential D-structure. In GB theory, a Wh-trace is considered a variable, i.e. an argument. So, the representation (21) contains two θ-positions that are both filled by an argument (he and t, respectively). If the essence of D-structure is the pure representation of GF-θ, movement to A'-positions is irrelevant: before and after the movement the A-chains have exactly one element, which is typical of D-structures. In practice, (21) is not interpreted as a D-structure, but this then depends on the extra stipulation that D-structure contains no traces, neither NP-traces nor Wh-traces. For Wh-traces, this has nothing to do with the essence of D-structure (its being a pure representation of GF-θ). If we drop the unmotivated stipulation, we can maintain the essence of D-structure and consider (21) a D-structure (which falls together with its S-structure, as in so many other cases). This is a welcome conclusion, because there are independent reasons to assume that Wh-phrases must be base-generated in COMP in certain cases. This is so in languages with overt resumptive pronouns (which are very marginal in English). I will show below that English has empty resumptive pronouns that cannot be related to their Wh-antecedent in COMP by "move alpha". So, the Wh-phrase in COMP in (21) is in one of its possible base positions, and its trace is an argument with a function chain of one member, which is in accordance with the definition of D-structure. Since it is not possible to exclude (21) as a D-structure on the basis of the argument-θ-role distribution, and since the Wh-phrase is also in a possible D-structure position, I see only one argument - apart from arbitrary stipulation - against its D-structure status: the properties of "move alpha".
If "move alpha" is a condition on derivations with specific properties, and if the antecedent-trace relation in (21) has these properties, then (21) is not a plausible D-structure. Chomsky has argued recently, however, that there are reasons to consider the traditional characteristic of "move alpha", Subjacency, a property of S-structure (LF movement does not obey Subjacency; class lectures, fall 1983, and Chomsky (1986a)). But if Subjacency is a property of S-structure, there are no significant reasons left to deny D-structure status to structures with only Wh-traces. This is a fortiori true for the theory presented here, according to which "move alpha" has no characteristic properties at all. We must conclude, then, that the D-structure/S-structure distinction is practically meaningless for the many constructions that only involve Wh-movement (see Chomsky (1977) for the scope of this rule). If the D-structure/S-structure distinction is significant at all, it must be based on NP-movement, because only this rule creates A-chains with more than one member. But here we meet other problems. If Wh-movement (for instance in (21)) exists, there must be a distinction between a category as a functional position in a structure and the lexical content of that category. This is clear from the fact that the alleged D-structure of (21) has the Wh-phrase in the position of the argument, the trace:

(22) COMP he PAST saw [NP what]_j
The θ-role can be assigned to the object NP only in abstraction from its lexical content. The reason is that this lexical content is moved to COMP, where it does not have a θ-role (Chomsky (1981b, 115)). The θ-role is left behind at the now empty NP position (the trace). It is therefore not necessary for "move alpha" to carry along θ-roles. What (22) and (21) have in common from the point of view of the θ-criterion and the Projection Principle is that in both cases there is one θ-role assigned to one argument position, i.e. the object position. In (22), this position has lexical content, and in (21) the lexical content has been moved. What remains constant is the θ-role assigned to the NP position, which then has this θ-role in abstraction from its lexical content. This is not what we see in the case of NP-movement:

(23) a. NP was arrested [NP John]_j
     b. John_j was arrested [NP t_j]
This case has been treated in different ways. One way is to assign the θ-role to the NP John in (23a); when John is moved to the subject position, the θ-role is carried along. The θ-role is then not assigned to the object position, in abstraction from its lexical content, as in (22). This is hardly a fortunate result, because θ-role assignment would be more or less dependent on the content of NPs: if the NP contains a (quasi-) quantifier, the θ-role is assigned to the position (22), and if the NP contains a referential expression, the θ-role is assigned to that expression (i.e. not to the position but to the content of the position: (23a)). The problem can be circumvented by assigning θ-roles to chains, which is more or less standard now (see Chomsky (1982a)). But this is also problematic, because now John no longer has a θ-role itself in (23b). At S-structure, then, the only way to see whether the conditions of the θ-criterion are met is by inspecting the chain. But this algorithm, which checks whether John is connected to a θ-position, practically mimics "move alpha". In short, both methods of transmitting a θ-role to a derived A-position lead to problems: either Wh-movement and NP-movement get a different treatment, or "move alpha" is duplicated. But even if these problems can be solved, the biggest conceptual problem remains: the derived structure (23b) seems to contain two arguments, a name (John) and an anaphor (the NP-trace). GB theory explicitly states that anaphors are arguments, which is only reasonable (Chomsky (1981b, 35)). Since NP-traces are anaphors for the binding theory (Chomsky (1981b, ch. 3)), a structure like (23b) contains two arguments. This is at variance with the θ-criterion and the Projection Principle, which require a one-to-one relation between θ-roles and arguments at all levels. In practice, therefore, NP-traces are supposed to be non-arguments in structures like (23b). This does not follow from the θ-criterion, which only entails that (23b) contains one argument, without telling which of the two NPs is the argument. If not only names, but also all anaphors are arguments, (23b) is in fact ruled out by the θ-criterion, unless it is guaranteed somehow that some anaphors (NP-traces) are non-arguments. This must be done by stipulation:

(24) Anaphors are arguments unless they are non-θ-bound in a non-Case-position
Even with this stipulation of the worst possible sort, the contradiction remains, because NP-traces must be arguments for binding purposes:

(25) a. They_i seem [t_i to like each other_i]
     b. They_i were confronted t_i with each other_i
In both cases, each other is A-bound by a trace of NP-movement. But if NP-traces can enter into a chain of coreference, they must be capable of some referential function themselves, and are therefore arguments by definition. There is also another reason to consider both they and its trace to be arguments in (25a). Both are followed by a VP; if the notion argument makes sense at all, it is reasonable to say that each NP in the predication relation par excellence, the [NP VP] relation, is an argument. It seems to me that the ugly stipulation and the contradiction that we observed form strong counterevidence against the second part of the θ-criterion (in bold type) (Chomsky (1981b, 36)):

(26) Each argument bears one and only one θ-role, and each θ-role is assigned to one and only one argument
If both the antecedent and the trace (after NP-movement) are arguments, we have one θ-role distributed over two arguments. This is a welcome conclusion, because, as we discussed in chapter 1, the configurational matrix requires a unique antecedent but not a unique dependent element. In other words, the core relations of grammar are not biunique. But this fact throws a new light on the θ-criterion (26). As mentioned in chapter 1, licensing relations meet the conditions of the configurational matrix. If this is the case, the first part of the θ-criterion need not be stipulated. It simply follows from the general uniqueness property of the configurational matrix: the θ-roles can depend on one and only one antecedent, the licensing governor in this case. This fact is completely analogous to what we observe for bound anaphors: they cannot have split antecedents:

(27) *John confronted Mary with themselves
A dependent element like a reflexive can receive only one referential index from one antecedent. Similarly, an argument can receive only one θ-role from one licensing category. But if the second part of the θ-criterion is false, the licensing relation is also in this respect like other core relations. Anaphors must have a unique antecedent, but a given antecedent can take more than one anaphor:

(28) They talked with each other about each other
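The asymmetry between (27) and (28) can be stated as a simple check, sketched here in Python. The function name `split_antecedents` and the representation of coindexing as (dependent, antecedent) pairs are assumptions made for illustration only; the sketch captures just the uniqueness requirement on antecedents and ignores locality.

```python
from collections import defaultdict


def split_antecedents(links):
    """Return the dependents that are linked to more than one antecedent.

    `links` is a list of (dependent, antecedent) pairs. An antecedent may
    recur freely; only a dependent with several antecedents is flagged.
    """
    antecedents = defaultdict(set)
    for dependent, antecedent in links:
        antecedents[dependent].add(antecedent)
    return {dep for dep, ants in antecedents.items() if len(ants) > 1}


# (27) *John confronted Mary with themselves: one anaphor, two antecedents.
example_27 = [("themselves", "John"), ("themselves", "Mary")]

# (28) They talked with each other about each other: two anaphors, one antecedent.
example_28 = [("each other (1)", "They"), ("each other (2)", "They")]

bad = split_antecedents(example_27)    # the anaphor in (27) is flagged
good = split_antecedents(example_28)   # nothing is flagged in (28)
```

The check is one-directional by design: it constrains how many antecedents a dependent element has, and places no bound on how many dependents an antecedent takes.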
All in all, it appears that the theory of grammar is considerably simplified if we drop the second part of the θ-criterion. It is no longer necessary at all to stipulate the θ-criterion, if licensing is a core relation. Together with the empirical evidence given earlier, this forms very strong evidence for the idea that NP-traces are in fact arguments. Consider now a relevant example:

(29) John_j seems [t_j to go]
If this S-structure contains two arguments (to one θ-role), its D-structure, by the Projection Principle, also contains two arguments. But then it becomes senseless to postulate a D-structure which is different from its S-structure for (29). For NP-movement, then, we come to the same conclusion as for Wh-movement: it does not make sense to remove traces from D-structure (= S-structure). In other words, it does not make sense to distinguish D-structure from S-structure. We have now also located our main difference with the standard GB theory. According to the standard approach, the θ-criterion is a biuniqueness condition that states that the relation between θ-role assigners and arguments is one to one. According to the present approach, the relation between θ-role assigners and arguments is one to one or one to many. As we have seen, this leads to three disadvantages for the standard approach: (i) part of the θ-criterion has to be stipulated, (ii) it must be stipulated that some anaphors are not arguments, (iii) this latter stipulation leads to a contradiction. I will now try to sketch the outlines of a theory without these three disadvantages. As already mentioned, the θ-criterion disappears, because its empirically relevant part follows from the general properties of core relations (in particular from the uniqueness property of the configurational matrix). Although it does not make sense to distinguish D-structure from S-structure in the alternative theory, the Projection Principle still makes sense. This is so because the existence of Lexical Structure, distinct from S-structure, is not disputed. Thus, if a verb selects an object, this object must always be represented at S-structure. In structures with fronted Wh-objects then, the gap in object position must contain an empty category (a trace in the standard theory). Nevertheless, I would like to slightly modify the Projection Principle, or rather its scope. Much of the standard theory is inspired by the desire to define syntactic structure as a projection from the lexicon. This has not been entirely successful, because of the obligatoriness of subjects. This has led to the Extended Projection Principle: syntactic structures consist of projections from the lexicon plus subjects (Chomsky (1982a, 10)). These are also the θ-positions. In the same spirit, I would like to define the possible θ-positions (argument positions):

(30) θ-roles are assigned by:
     a. heads (for complements) (to direct θ-positions)
     b. predicates (for subjects) (to indirect θ-positions)
The first part (30a) is in accordance with the standard Projection Principle. The second part (30b) is an extension that goes slightly beyond the standard extension of the Projection Principle. The standard extension concerns subjects in the sense of Chomsky (1965), i.e. subjects defined as [NP, S]. It seems to me that this is not sufficient, and that the extension must cover all subjects of subject-predicate relations in the sense of Williams (1980) and subsequent papers. According to this conception, a subject is an NP in the configuration [β NP XP], where XP stands for any maximal projection (including S'). The NP subject in this sense may receive a θ-role by indirect θ-marking (Chomsky (1981b, 38)), but also by binding an element in the predicate XP. Some possibilities are exemplified by (31):

(31) a. John broke his arm
     b. John_j [VP seems [t_j to go]]
     c. John_j [S' O_j [I don't really like t_j]]
In all three cases, the argument John is followed by a predicate. In (31a), John receives a θ-role by indirect θ-marking in the usual sense. In (31b), John receives a θ-role by binding an open place in the following predicate. The θ-role of the open place is transmitted by the property-sharing rule (13).
It seems to me that the subject-predicate relation is the only extension we need: it is the only place where direct projection of θ-roles from the lexicon fails. Ultimately, all θ-roles come from the lexicon, but they are only indirectly assigned to subjects. Since we gave up the one-to-one requirement between θ-roles and arguments, this indirect θ-marking by "property sharing" with another argument is unproblematic. Topicalized constructions like (31c) have always been very problematic for the standard approach. The open sentence is predicated over John in (31c) (Chomsky (1977)), so that John must be an argument according to any reasonable definition of this term. But if John is an argument, it must have a θ-role. Under the property-sharing approach, this is not a problem, because John is linked to the trace in (31c) by a construal chain. This trace, an argument, has a θ-role that may be shared by the other argument, John. A movement analysis, on the other hand, is impossible for topicalization. John would originate in the trace position, be moved to COMP, and from there it would be lifted to the topic position by Vergnaud-raising (see Van Haaften et al. (1983)). But as I will argue below, Vergnaud-raising is impossible for topicalization. In Dutch, topicalization may look like English topicalization, but it may also involve a so-called d-word in COMP position (Van Riemsdijk and Zwarts (1974), Koster (1978a)):

(32) Die man, die ken ik
     that man that know I
In this case, not only a θ-role is transferred, but also Case. In languages with rich overt Case-marking, like German, agreement in Case is normal (see Van Riemsdijk (1978)):

(33) Den Hans (acc.), den (acc.) mag ich nicht
     the John him like I not
     'John, I don't like him'
This example shows once again that in general there is no one-to-one relation between antecedents and dependent elements. There is always a unique antecedent (the Case assigner in this example), but there may be more than one dependent element (Case-bearing NPs). The Dutch and German cases definitely do not involve Vergnaud-raising, which would create the d-word with its Case ex nihilo (see also Cinque (1983a) and section 3 below for more arguments). So, here we have a crucial example: Case and θ-role assignment to the topic by movement is impossible, while the property-sharing rule may use the construal chain through the anaphoric d-words to transfer to the topic the licenses it needs. The examples with the d-words are particularly interesting because d-words do not usually link idiom chunks to their licensing position, as shown by Van Riemsdijk and Zwarts (1974):

(34) a. Ik geloof er de ballen van
        I believe there the balls of
        'I don't believe any of it'
     b. *De ballen, dat/die geloof ik er van
        the balls that believe I there of
Usually, "move alpha" can transfer at least three things: a θ-role, Case, and lexical content. If we compare (32) and (33) to (34), we see a discrepancy: in the first two examples, it appears that d-words can transmit a θ-role and Case, but from (34) it is clear that lexical content cannot be transmitted. This difference does not come as a surprise. As I argued before, the property-sharing rule transmits whatever properties can be transmitted. Normally, the uniqueness condition works as a filter. Thus, Case cannot be transmitted to NPs that already have Case. Similarly, lexical content cannot be transmitted to an NP position that already has lexical content. Thus, the representation of (34b) is as follows:

(35) *De ballen_j die_j geloof ik t_j er van
A Case-marked trace must have a unique lexical content as antecedent (antecedents are always unique). Die in (35) qualifies as the lexical content of the trace, but then it is impossible for the idiomatic NP de bal/en to also qualify as the lexical content of the trace position. Diej cannot be skipped, because according to the configurational matrix, an antecedent is obligatory within a local domain. The transfer of a-roles and Case is unproblematic, however, in such cases. For those, the licensing element (the assigner) is the antecedent. So, the trace tj in (35) has a unique antecedent within the local domain, the verb geloof. As noted before, the number of dependent elements is not constrained by a uniqueness condition, so that both the topic and the dword may depend on the assigner of Case and a-role. So, the rule "share property" works selectively, since its scope is "filtered" by independent principles, such as the uniqueness property of the configurational matrix. This approach solves a paradox about easy-to-please constructions (Chomsky (1981b, 308-314)): (36)
Johnj [VP is [AP easy [Oj [PRO to please tj]]]]
John seems to be in a non-θ-position because it can be replaced by it (it is easy to please John). Traditionally, it has also been assumed that John has its D-structure position in the trace position, from where it is moved to the matrix subject position (see Lasnik and Fiengo (1974), however, for a deletion approach, and also Chomsky (1977) for a similar approach). A movement analysis for (36) leads to a paradox, as noted by Chomsky (1981b, 309). The problem is that idiom chunks cannot be moved to the
matrix subject position, as one might expect under a movement analysis: (37)
a. *Good carej is hard to take tj of the orphans
b. *Too muchj is hard to make tj of that suggestion
It seems to me that this paradox cannot be solved under the standard assumptions. Chomsky (1981) assumes that the examples in (37) show that a movement analysis is not possible. I agree, but it must be concluded then that the standard assumptions are seriously undermined, because the standard approach crucially assumes that θ-roles are assigned directly, and not by linking. Moreover, Chomsky (1981b, 313) observes that a nonmovement analysis creates a new problem. If John is inserted in D-structure, the Projection Principle requires that its position be a θ-position, which it is not. Chomsky therefore weakens the assumptions about lexical insertion by assuming that John is inserted in S-structure in (36) (while such names are inserted in D-structure elsewhere). This is even interpreted as an argument in favor of D-structure, because the solution of the paradox crucially involves the distinction between S-structure and D-structure (Chomsky (1981b, 346, point (e))). It seems reasonable, however, to interpret the paradox as an argument against D-structure and the standard assumptions. Clearly, John is an argument in (36), which must receive its θ-role directly, if the standard assumptions are correct. For the alternative approach, however, (36) is unproblematic. John is inserted at S-structure like all other lexical items (the simplest theory) and it may receive a θ-role because it is a subject. In particular, it must receive a θ-role from its predicate according to (30b). Since there is a construal chain (indicated by the indices in (36)), this θ-role may be shared with the trace coindexed with it, a trace within the predicate as required. As we saw in the Dutch case, idiomatic lexical content is not necessarily transferred in construal chains. It is only transferred if the chain does not contain other lexical material. It is reasonable, however, to assume that the operator Oj in (36) has features.
Intermediate links in COMP do not necessarily have content, but a COMP-to-COMP chain always ends in an operator position, usually marked by the feature +WH (see Chomsky (1977)). It seems appropriate to assume, then, that the feature that makes a COMP position an operator is also present if the operator is not phonetically realized, as in (36). We can also consider these lexical features of the operator position the realization of the Case assigned to the trace. Under the alternative theory, there is nothing paradoxical about (36). There is a construal chain as indicated, and property sharing is filtered by the uniqueness condition as usual. A θ-role is transferred to John, because it is not in a direct θ-position. Case is not transmitted, however, because John is already in a Case position. Similarly, lexical content is not transmitted, because the lexical content of the trace position is already
satisfied by the features of the operator position. But since lexical content is not transmitted, idiom chunks cannot appear in the matrix subject position, as shown by (37). I will now give a brief review of all arguments in favor of D-structure that can be found in Chomsky (1981b), and that are summarized there on page 346. There is some consensus that S-structure is the basic level of syntactic representation. Chomsky notes that the arguments for D-structure (as a level distinct from S-structure) are "highly theory-internal". In particular, "[t]he existence of a level of D-structure, as distinct from S-structure, is supported by principles and arguments that are based on or refer to specific properties of this level, which is related to S-structure by the rule Move-α." The arguments in which D-structure plays a role are summarized as follows (page numbers of Chomsky (1981b) added): (38)
a. asymmetric properties of idioms (ch. 2, note 94)
b. movement only to non-θ-position ( ... and discussion ... of the distinction between NP-trace and PRO) (p. 46ff.)
c. restriction of an operator to a single variable (p. 203)
d. the requirement that AGR-subject coindexing be at D-structure, as distinct from government by AGR at S-structure, with its various consequences (p. 259)
e. the possibility of inserting lexical items either at D- or S-structure (p. 312)
We have just discussed argument (38e) and concluded that the facts in question form arguments against D-structure. We can therefore limit our attention to the first four arguments (38a-d). The idiom argument hinges on the fact that some idioms can be "scattered" at S-structure (good carei was taken ti of the orphans), while others cannot (*the bucketi was kicked ti). In other words, idioms of the first type can undergo movement (bind traces), while idioms of the second type cannot. The argument deserves to be quoted in full (Chomsky (1981b, 146, note 94)): Thus idioms in general have the properties of non-idiomatic structures, and appear either in D-structure or S-structure form, but not only in S-structure or LF-form. D-structure, not S-structure or LF, appears to be the natural place for the operation of idiom rules, since it is only at D-structure that idioms are uniformly not "scattered" and it is only the D-structure forms that always exist for the idiom (with marked exceptions), S-structures sometimes being inaccessible to idiomatic interpretation. Thus at D-structure, idioms can be distinguished as subject or not subject to Move-α, determining the asymmetry just noted.
It is true that there are idioms that only exist in their D-structure form, but there are also idioms that only exist in S-structure form (the marked exceptions mentioned in the quotation). Bresnan (1982), for instance, gives passive idioms like x's goose is cooked (meaning, x is in trouble and there
is no way out). But it is irrelevant whether there are many or few such examples, because the logic of the argument is unclear. What is an idiom rule? Presumably it is a rule that says that a V + NP combination, among others, has an idiomatic interpretation (make + headway, kick + the bucket, etc.). It seems to me that the most natural place for such interpretation (e.g. kick the bucket = 'die') is not D-structure but the lexicon. The crucial fact, then, is that some idioms can be scattered and some cannot. But of course, the most natural place for that information is also the lexicon. The question is how this information must be coded. It should be noted that the fact to be accounted for is not that no element of certain idioms can be moved. The NP part of certain V + NP idioms cannot be moved, but there is no direct evidence from English that the V part is also immobile. A language like Dutch has some obligatory V-movement rules, V-second (Koopman (1984)), and V-raising (Evers (1975)). It appears that the V part of all V + NP idioms in Dutch undergoes these rules, including idioms of the type kick the bucket. An example is de pijp uitgaan ('to die', lit. to go out of the pipe): (39)
a. dat hij de pijp uit ging (non-root order)
   that he the pipe out went
b. hij ging de pijp uit t (root order after V-second)
c. dat hij de pijp t scheen uit te gaan (after V-raising)
   that he the pipe seemed out to go
I conclude from these facts that the non-scattering of idioms is a fact of the NP, not of the V, in V + NP idioms. The question now is what the nature of this fact is. Chomsky (1981b, 146, note 94), assumes - and that is the crux of the argument - that the NP must be marked as not undergoing "move alpha". This marking can of course be done in the lexical specification of the idiom, but it remains a fact about certain idioms which cannot be moved, and therefore can only be inserted in D-structure. But note that under this interpretation the argument tacitly assumes what it must prove, namely that the crucial fact about certain idiomatic NPs is plus or minus "move alpha". It is not only possible but presumably even necessary to code the properties of the idiomatic NPs in the lexicon in a different way. The fact to be explained is that the bucket in kick the bucket cannot bind a trace at S-structure. Suppose now that we code this in the lexicon as follows: (40)
[V kick] [NP the bucket] = 'die'
         [-antecedent]
Idioms like care (to take care) and headway are not marked with
[-antecedent], a marking which presumably follows from a more general property, e.g. the property of being nonreferential in some sense. The marking with [-antecedent], as in (40), now no longer blocks insertion at S-structure, but the result is filtered out if the bucket binds something at S-structure, for instance a trace. This solution is presumably better than the marking with [-move alpha] at D-structure, because (40) also blocks (41) at S-structure: (41)
*He kicked the bucketj before he had paid for itj
The bucket cannot be the antecedent for the pronominal it either, a fact about binding stated at S-structure. Parts of idioms like care can sometimes be antecedents at S-structure (Chomsky (1981b, 327)): (42)
Carej was taken tj of the orphans, but itj was sometimes insufficient
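The effect of the [-antecedent] marking can be made concrete with a small sketch. It is only a toy model under stated assumptions, not part of the theory's formal apparatus: the table `CAN_BE_ANTECEDENT` and the function `passes_s_structure_filter` are hypothetical names introduced here, and the list `bound` simply names whatever elements (traces or pronouns) the NP binds at S-structure.

```python
from typing import Dict, Sequence

# Hypothetical toy lexicon: False encodes the marking [-antecedent] of (40).
CAN_BE_ANTECEDENT: Dict[str, bool] = {
    "the bucket": False,  # kick the bucket = 'die'
    "care": True,         # take care: not marked
    "headway": True,      # make headway: not marked
}


def passes_s_structure_filter(np: str, bound: Sequence[str]) -> bool:
    """Reject an S-structure in which a [-antecedent] NP binds anything."""
    # Insertion itself is never blocked; only binding is filtered.
    return CAN_BE_ANTECEDENT.get(np, True) or len(bound) == 0


print(passes_s_structure_filter("the bucket", []))         # He kicked the bucket: True
print(passes_s_structure_filter("the bucket", ["trace"]))  # *the bucket was kicked t: False
print(passes_s_structure_filter("the bucket", ["it"]))     # (41): False
print(passes_s_structure_filter("care", ["trace", "it"]))  # (42): True
```

The point of the sketch is that one and the same lexical marking excludes both trace binding and pronoun binding, which a marking in terms of [-move alpha] would not do.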
All in all, it can hardly be concluded that the idiom argument supports D-structure. Idioms surely differ from one another, a fact that is naturally expressed in the lexicon. But the differences in question are best interpreted as differences in S-structure behavior. The second argument (38b), "movement only to non-θ-positions", has to do with the θ-criterion. Again, we see that "movement" is already presupposed. But since part of the θ-criterion is preserved in the alternative account, the fact in question receives an explanation that does not substantially differ from the standard account: (43)
NPj, ..., NPj
 <------ θ
If two NPs are coindexed, property sharing, including sharing of the θ-role, is possible. But as we saw before, property sharing is filtered by the uniqueness condition: the second NP in (43) can transmit a θ-role to the first only if it does not have a θ-role of its own. This fact has nothing to do with D-structure, but is explained by the uniqueness property of the configurational matrix, which is a property of S-structure relations. Note that it is also guaranteed under the alternative account that in a function chain GF1, ..., GFn, it is always GFn that is directly licensed. Suppose it were otherwise, i.e. that a θ-role were indirectly assigned (transmitted) to the last NP in a chain: (44)
... NPn-1, ..., NPn
      θ ------>
Because of the c-command requirement, each link in a chain c-commands the next link; therefore, NPn-1 c-commands NPn. Suppose now that NPn
is not directly θ-marked, but that it receives its θ-role from NPn-1. According to (30), indirect θ-marking goes only from predicates to subjects. Consequently, NPn-1 must be contained in the predicate of which NPn is the subject. But this is only possible if NPn-1 does not c-command NPn (the predicate itself c-commands the subject, so that the material contained by the predicate does not c-command the subject). But if NPn-1 does not c-command NPn, these two NPs do not form a link of a chain. Therefore, it is impossible for the last element of a chain to get a θ-role indirectly. The last element must always be in a direct θ-position, and the other elements must be in non-θ-positions because of the uniqueness condition. The difference between trace and PRO will be the topic of the next chapter. The third argument (38c) concerns examples like (Chomsky (1981b, 203)): (45)
*Whoi did you give [pictures of ti] to ti?
This example is supposed to be ungrammatical because of the fact that it contains two variables. The idea is that D-structure cannot contain traces and whoi can fill only one variable position at D-structure, so that the D-structure for (45) always contains a non-argument, [NP e], at D-structure. This argument is without force, because, as we saw before, the definition of D-structure does not exclude a base-generated Wh-phrase binding two variables (unless it is stipulated that D-structure does not contain Wh-traces). More importantly, the intended explanation is completely overruled by the discovery (or rediscovery) of parasitic gaps:2 (46)
Which booki did you return ti before reading ei?
This structure contains two variables that cannot both be filled at D-structure by which book. It is therefore not surprising that the earlier explanation for the ungrammaticality of (45) is not maintained in Chomsky (1982a). The fourth argument (38d) has to do with the ungrammaticality of the following Italian sentence (Chomsky (1981b, 259)): (47)
*NPi AGRi sembra [S Giovanni leggere i libri]
          seems      to read the books
The intended explanation is based on the idea that assigning nominative Case involves a mechanism with two components: (48)
a. AGR is coindexed with the NP it governs
b. nominative Case is assigned to (or checked for) the NP governed by AGR
Clearly, (48b) applies at S-structure (as Chomsky notes), because Case must be checked after Raising. The argument, then, crucially involves the assumption that (48a) applies at D-structure (and not at S-structure). If this assumption is plausible, we might have some confirmation for D-structure. According to Chomsky, (48a) must apply at D-structure for the following reason. If it is assumed that in pro-drop languages the rule R (which adjoins AGR to V) applies in the syntax, AGR will govern Giovanni in (47): "If AGR could be coindexed with Giovanni by [(48a)], then both conditions for nominative Case assignment would be fulfilled: Giovanni would receive nominative Case in [(47)] and raising of the embedded subject would not be obligatory. But if the agreement phenomenon is determined at D-structure, then the structure [(47)] is barred as required" (Chomsky (1981b, 259-260)). It seems to me that this argument is based on questionable assumptions. A much simpler analysis of (47) assumes that the nominative Case assigner AGR (with or without the rule R) or INFL is not accessible to Giovanni. Nominative Case is assigned under government by AGR (or INFL), but in this case Giovanni cannot be governed by AGR because Giovanni is already in the domain of another governor, namely the verb sembra. As we discussed in chapter 1, government is determined by minimal c-command, i.e. a governor γ cannot govern an element δ in the domain of another governor γ'. Consequently, Giovanni is not governed by AGR in (47), so that nominative Case is not assigned to it and (47) is ruled out by the Case filter without Raising.3 There are various ways to account for pro-drop phenomena, as indicated by Chomsky (1982a, 78ff.), but there is no evidence that an account involving D-structure is somehow superior, or even plausible. We must conclude, then, that none of the arguments given in Chomsky (1981b) and summarized in (38) supports D-structure.
Let us now turn to some direct arguments against D-structure. The first argument is based on a phenomenon analyzed by Obenauer (1984). Obenauer has shown that there is a phenomenon in French which he calls "Quantification at a Distance", the binding of the empty ei by beaucoup in (49b): (49)
a. Il a rencontré [beaucoup de linguistes]
b. Il a beaucoupi rencontré [ei de linguistes]
The interesting property of beaucoup is that it can also occur in these contexts without binding an empty element, with a similar meaning, codetermined by the verb: (50)
Il a beaucoup rencontré Jean
There are similar quantifiers, like combien, that can undergo Wh-movement: (51)
a. [Combien de linguistes]j a-t-il rencontré tj
b. [Combien]j a-t-il rencontré [ej de linguistes]
It appears that this type of Wh-movement is not possible across a quantifier like beaucoup:
(52)
*[QP Combien]j a-t-il beaucoup rencontré [NP [QP e]j de linguistes]
Obenauer calls this pseudo-opacity. This phenomenon is problematic for a movement account, because nothing blocks the movement of combien in (52) (cf. (51b)), and beaucoup occurs in the preverbal position where it can normally occur without having a movement source (see (50)). Obenauer argues persuasively that beaucoup necessarily becomes an A'-binder of the trace, if there is a trace at S-structure. Under this assumption, (53) (= (52)) is straightforwardly ruled out: (53)
*Combienj a-t-il beaucoupj rencontré [tj de linguistes]
In the theory presented here, this structure is ruled out by the uniqueness principle at S-structure: a trace can have only one antecedent. The point is that this analysis is based on the fact that the relation between beaucoup and the trace is of the same nature as the relation between combien and the trace (the uniqueness condition constrains relations of a given type). But then the relation in question has nothing to do with "move alpha", because the relation between beaucoup and the trace is not created by "move alpha". In terms of a movement account, beaucoup only becomes a trace-binder after the unrelated movement of combien. In other words, (52) clearly involves a relation of the antecedent-trace type that is not created by "move alpha". As Obenauer rightly concludes, such examples favor a representational view of the antecedent-trace relation over a derivational view. A similar argument against the derivational approach can be based on island violations. In some languages, like Swedish, islands can relatively easily be violated (see Allwood (1976)), which might have to do with the productivity of the resumptive pronoun strategy in this language (see Engdahl (1984)). But even in English, relatively acceptable violations of the Complex NP Constraint can be produced (as we briefly discussed in chapter 1): (54)
Which racej did you express [NP a desire [s to win ej]]
This is not a universal phenomenon, because the Dutch equivalent is totally ungrammatical:
(55)
**Welke racej heb je het verlangen (om) ej te winnen uitgedrukt?
which race have you the desire COMP to win expressed
If Subjacency were discovered on the basis of Dutch, we could simply say
that this sentence is ruled out by Subjacency (or some equivalent of it; see chapter 4). It would be a reasonable conclusion, then, that (54), which as a Subjacency violation does not show the characteristic property of movement, cannot be derived by movement. If this conclusion is correct, it must be possible to generate Wh-phrases in COMP (as in (54)) without "move alpha". Since there is nothing in the definition of D-structure that precludes base-generation of variables (apart from stipulation), (54) must be a possible base structure. Despite the fact that (54) does not show the property of "move alpha", it is usually assumed that it is derived by "move alpha", perhaps under the further assumption that Subjacency is not an absolute principle, but an expression of the unmarked case. The Dutch example, however, shows that Subjacency is an absolute principle: if it is violated, the resulting sentence is very unacceptable. I will show in chapter 4 that relatively acceptable island violations do not involve "move alpha" in the traditional sense, but a resumptive pronoun strategy, as discovered by Cinque (1983b) and Obenauer (1984). A violation of island constraints (as in (54)) is not simply "movement" with less strict Subjacency. Gaps in such structures appear to have properties that are entirely different from the properties of standard traces. Standard traces can be of all categories (NPs, PPs, adjuncts, etc.). Gaps in islands (like ej in (54)), parasitic gaps among them, are usually exclusively of the category NP. Moreover, they can only be found in structures in which certain directionality constraints are met (global harmony, see chapter 4). The directionality constraints in question can be met in SVO languages like English, Romance, and Scandinavian, but not in SOV languages like Dutch or German; hence, the acceptability of (54) in the SVO languages, and the total unacceptability in the SOV languages (see (55)).
Gaps in islands, in other words, show a surprising clustering of properties: 4 (56)
a. Subjacency is violated
b. only NPs are possible
c. directionality constraints must be met
Gaps that are strictly locally bound, like "traces", lack these three properties. What this indicates, is that the violation of Subjacency in (54) is not an accidental property, but a characteristic property of the construction in question (i.e. the construction with the antecedent-resumptive
pronoun strategy): the properties (56b-c) are found only if Subjacency is violated. If these conclusions (worked out in chapter 4) are correct, (54) cannot be derived by "move alpha". There is, therefore, strong evidence that Wh-phrases can be generated in COMP without involvement of "move alpha". Like Obenauer's argument, this argument favors the representational view over the derivational view for the Wh-phrase-gap relation. We can now summarize our objections against D-structure and "move alpha". D-structure is essentially a reconstruction level for the direct assignment of θ-roles. As we have seen, it is not possible to construct such a level (distinct from lexical structure). Even if D-structure is assumed, θ-role assignment must be extended to subjects (indirect θ-marking in the sense of Chomsky (1981b, 38)). This is not sufficient, however, because arguments in topicalization and easy-to-please constructions are neither complements nor subjects that qualify as indirect θ-positions in the sense of the Extended Projection Principle. Indirect θ-role assignment must be extended to all subjects in subject-predicate constructions. If we do so, the biuniqueness property of the θ-criterion must be dropped, which is a welcome move for principled reasons. As soon as we drop biuniqueness (the one-to-one relation between θ-roles and arguments), the distinction between D-structure and S-structure becomes meaningless. "Move alpha" is essentially a transfer mechanism. Its status as a unique transfer mechanism, distinct from other construal rules, is not supported by the facts. First of all, it has the same configurational properties as other construal rules (see chapter 1 and the following chapters). And, secondly, it is a transfer mechanism filtered by the same uniqueness condition as other construal rules. This second aspect is very important, because in the standard GB approach much weight is given to the fact that movement is to non-θ-positions.
This is, however, totally determined by the uniqueness condition that also filters the transfer potential of other construal rules. θ-roles cannot be transmitted to positions that already have a θ-role (or that cannot have one in principle), which is just like the fact that referential indices cannot be transmitted to NPs that already have a referential index (such as nonanaphors), or that cannot have a referential index in principle (like A'-positions). That "move alpha" is just an instance of the general property-sharing rule (13), filtered by uniqueness and other independent factors, is clearly demonstrated by the fact that "move alpha" transmits different things in different circumstances: (i)
"move alpha" transmits Case (and lexical content), but no θ-role, as in Wh-movement
(57)
Whomj did you see tj?
This follows from (30): indirect θ-role assignment is only to arguments (subjects). This fact is significant because it instantiates a general aspect of the property-sharing rule: only those properties are transmitted that the "receiving" category can take independently. In this case, a θ-role is not transmitted, because non-arguments are inherently unable to take a θ-role. (ii)
"move alpha" transmits a θ-role (and lexical content) but no Case, as in NP-movement
(58)
Johnj was arrested tj
Naturally, a θ-role can be transmitted to a non-θ-position. Transfer of a θ-role to a θ-position would violate the uniqueness condition. Similarly, John in (58) receives Case independently (from INFL). Again, the uniqueness condition prohibits transport of a second Case to such positions. What Wh-movement in (57) and NP-movement in (58) have in common is the transfer of lexical content. This is not necessary, however: (iii)
"move alpha" transmits a θ-role but no lexical content:
(59)
Johnj wants [PROj to be arrested tj]
Again, we see that what "move alpha" transmits is dependent on the inherent properties of the "landing site". If the landing site cannot have lexical content for independent reasons (like the PRO position in (59)), no lexical content is transmitted. It must be concluded, then, that "move alpha" cannot be functionally defined: what it does is contextually determined. Since "move alpha" cannot be configurationally defined either, there is no evidence for the idea that it deserves an independent existence in the theory of grammar. It is just an artefact, the result of the combined properties of the property-sharing rule (13) (i.e. the properties of the configurational matrix) and the independent properties of the landing sites. Crucial evidence in favor of this view, apart from its obvious advantage in terms of conceptual economy, can be found in topicalization and other left dislocation structures, and in easy-to-please constructions: (60)
a. Billj, I don't like himj
b. Johnj is easy [Oj [to please tj]]
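The selectivity of property sharing in (57)-(60) can be summarized in a short sketch. It is a toy model only, written under explicit assumptions: the class names `LicensedPosition` and `ReceivingSite`, the function `shared`, and the role and Case labels ("theme", "accusative", "nominative") are hypothetical and serve illustration; nothing here is meant as the formal statement of rule (13).

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class LicensedPosition:
    theta: Optional[str]  # θ-role assigned directly to this position, if any
    case: Optional[str]   # Case assigned directly to this position, if any


@dataclass
class ReceivingSite:
    is_argument: bool         # non-arguments cannot take a θ-role
    own_theta: Optional[str]  # θ-role the site already has
    own_case: Optional[str]   # Case the site already has (e.g. from INFL)
    takes_lexical: bool       # False for positions like PRO


def shared(pos: LicensedPosition, site: ReceivingSite,
           other_lexical_in_chain: bool = False) -> Set[str]:
    """Properties shared in a construal chain, filtered by uniqueness."""
    out: Set[str] = set()
    if pos.theta and site.is_argument and site.own_theta is None:
        out.add("theta")
    if pos.case and site.own_case is None:
        out.add("case")
    # One position, one lexical content: other lexical material blocks sharing.
    if site.takes_lexical and not other_lexical_in_chain:
        out.add("lexical")
    return out


wh = shared(LicensedPosition("theme", "accusative"),
            ReceivingSite(False, None, None, True))             # (57)
np = shared(LicensedPosition("theme", None),
            ReceivingSite(True, None, "nominative", True))      # (58)
pro = shared(LicensedPosition("theme", None),
             ReceivingSite(True, None, None, False))            # (59)
topic = shared(LicensedPosition("theme", "accusative"),
               ReceivingSite(True, None, None, True), True)     # (60a)
easy = shared(LicensedPosition("theme", "accusative"),
              ReceivingSite(True, None, "nominative", True), True)  # (60b)
print(sorted(wh), sorted(np), sorted(pro), sorted(topic), sorted(easy))
```

In this sketch a single function covers both the "movement" cases (57)-(59) and the nonmovement cases (60a-b); the different outcomes come entirely from the properties of the receiving site and of the chain.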
As a transfer mechanism of θ-roles and Case, "move alpha" is neither necessary nor sufficient. We have already seen that "move alpha" does not always transmit a θ-role (57) or Case (58). The examples in (60) show that Case and θ-role can also be transmitted by other construal rules. What is crucial is that the transfer is dependent on the same filtering mechanism in these nonmovement construal rules. Thus, in (60a) both a θ-role and a
Case are transmitted to Bill, because Bill is an argument in a position to which Case and θ-role are not assigned independently. In (60b), John is in a position with inherent Case (assigned by INFL), but without a direct θ-license. As a consequence of the uniqueness condition, Case can be transmitted in (60a) but not in (60b). A θ-role must (and can) be transmitted in both cases. Nonmovement construal, in other words, is subject to the same functional selectivity as movement. Where movement and nonmovement construal sometimes differ is with respect to the transfer of lexical content (for instance, idiom chunks): (61)
*Good carej is hard to take tj of the orphans
But again, this is not a difference in the nature of the transfer mechanism itself, but the consequence of an independent filter mechanism. If an idiom chunk is related to its licensing position in a construal chain that involves other lexical material (like him in (60a) and the lexical features of the operator Oj in (60b)), the lexical content of the idiom chunk cannot be fully transmitted to (shared by) the licensing position. Again, this is a result of the uniqueness condition: a given position has one and only one lexical content. No wonder, then, that idiomatic lexical content is only optimally transferred in construal chains with no other lexical features. The difference in grammaticality between (60b) and (61) is usually interpreted as an indication that "move alpha" is not involved in the θ-role transfer of the trace tj in (60b) to John. The implicit assumption, then, is that the transfer of a θ-role and the transfer of lexical content always correlate in "move alpha". As we have seen, however, this assumption is false. In (59), for instance, a θ-role is transferred from tj to PROj, while lexical content is clearly not transferred. Given the contextually determined functional selectivity of "move alpha", the discrepancy in grammaticality between (60b) and (61) is not a sufficient argument against the involvement of "move alpha" in the θ-transfer from tj to Johnj in (60b). But there are other arguments against a movement analysis. That (60a) does not involve movement is quite uncontroversial. We explained the ungrammaticality of (61) by the hidden lexical features of the operator Oj (cf. (60b)). If John in (60b) inherits a θ-role by "move alpha" through the operator position in COMP, lexical features must be created ex nihilo, and we would have a chain with two different Cases (of John and the trace, respectively), which is not possible for chains in general (Chomsky (1981b, 334)).
A structure like (60b), then, forms a strong counterexample to the usual assumptions concerning D-structure and S-structure as a pair related by "move alpha". In any case, (60b) shows a structure with two arguments (John and the trace) sharing only one θ-role. There is no corresponding D-structure for such cases, according to standard assumptions. We can give (60b) a D-structure only by dropping the assumption of a one-to-one relation between θ-roles and arguments at D-structure. This brings us
once again to the essence of our argument: dropping the one-to-one relation in question comes down to giving up the idea that there is a significant distinction between D-structure and S-structure. It appears, then, that one of the oldest examples presented in favor of a level of deep structure, John is easy to please, forms one of the strongest arguments against what is now called D-structure. In conclusion, it is useful to give a summary of the kinds of positions that we find at S-structure: (62)
a. positions that are directly projected from the lexicon (basic positions)
b. positions that are related to (and share properties with) basic positions:
   i. Wh-positions in COMP
   ii. subjects in non-θ-positions
   iii. topics
c. adjuncts
It has never been controversial that there are positions like the basic positions of (62a), distinct from the positions summed up in (62b). I will maintain this aspect of D-structure by sometimes referring to basic positions as D-structure positions. In chapter 4, I will show that directionality constraints (global harmony) are computed from D-structure positions only. In this sense, D-structure survives as a substructure of S-structure with specific properties. This fact is compatible with a theory like the one presented in Koster (1978c). What was denied in that theory, and what seems even more untenable now, is that S-structure with the positions mentioned under (62b) and its lexical substructure (the D-structure positions) exist as two different levels that can be bridged by "move alpha".
2.3. NP-structure
Van Riemsdijk and Williams (1981) have proposed a level of representation distinct from both D- and S-structure. This level is situated between the application of NP-movement and Wh-movement: (63)
D-structure
    ↓ move NP
NP-structure
    ↓ move Wh
S-structure
The arguments for NP-structure are very much like the standard arguments for D-structure: certain facts are treated most elegantly, in the most revealing way, if certain elements are in certain positions, rather than related to certain positions. It should be noted that it is essentially an argument of elegance, because there is a certain consensus that the
arguments in question are not absolutely compelling if there are traces at S-structure. The point can best be illustrated with an example of predication that Van Riemsdijk and Williams give (p. 205): (64)
a. John ate the meat_j raw_j
b. How raw_j did John eat the meat t_j?
Both sentences represent a subject-predicate relation with the meat as subject and (how) raw as predicate. If we call the level at which the subject-predicate relation is fixed "predicate structure" (as in Williams (1980)), the c-command condition on the predication relation is presumably stated as follows: (65)
In predication structure, a subject must c-command its predicate or a trace of its predicate
Van Riemsdijk and Williams argue that if we assume that predicate structure is in fact the pre-Wh-movement NP-structure, the statement (65) can gain in elegance by dropping the reference to the trace of the predicate (in bold type in (65)). An argument of this type is weak in principle because the full representation of (64b) is as follows: (66)
[AP how raw]_j did John eat [NP the meat]_j [AP t]_j
Since the predication relation is defined for pairs [NP_i AP_j], it is simply false that the statement of the c-command relation has to refer to the notion "trace" at S-structure. The theory presupposed by (65) is typically a theory without traces, in spite of the fact that (65) mentions the notion "trace". With traces, the c-command condition can be given at S-structure without reference to the notion trace: (67)
A subject NP_i in a predicate structure NP_i XP_j must c-command its predicate XP_j
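The condition in (67) can be made concrete with a small sketch. The tree below is only an assumed, simplified bracketing of (66) (a flat VP, with the fronted AP attached under S'), and the names `Node`, `dominates` and `c_commands` are hypothetical helpers introduced for illustration; the definition of c-command used here (first branching node) is one common formulation, not necessarily the one the text presupposes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """A node in a toy constituent tree (illustrative only)."""
    label: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def __post_init__(self):
        # Record the mother of every daughter so we can walk upwards.
        for child in self.children:
            child.parent = self


def dominates(a: Node, b: Node) -> bool:
    """True if a properly dominates b."""
    node = b.parent
    while node is not None:
        if node is a:
            return True
        node = node.parent
    return False


def c_commands(a: Node, b: Node) -> bool:
    """a c-commands b if neither dominates the other and the first
    branching node above a dominates b."""
    if a is b or dominates(a, b) or dominates(b, a):
        return False
    node = a.parent
    while node is not None and len(node.children) < 2:
        node = node.parent
    return node is not None and dominates(node, b)


# Assumed bracketing of (66): [S' [AP how raw] [S John [VP eat [NP the meat] [AP t]]]]
ap_in_comp = Node("AP how raw")   # lexical content in COMP
np_meat = Node("NP the meat")     # subject of the predication
ap_trace = Node("AP t")           # functional predicate position
vp = Node("VP", [Node("V eat"), np_meat, ap_trace])
s = Node("S", [Node("NP John"), vp])
s_bar = Node("S'", [ap_in_comp, s])

print(c_commands(np_meat, ap_trace))    # subject vs. predicate position
print(c_commands(np_meat, ap_in_comp))  # subject vs. fronted lexical content
```

Under these assumptions the subject c-commands the AP position inside VP but not the material in COMP, which is the configuration that (67) evaluates at S-structure.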
Both in (64a) and (64b) (= (66)), this simple condition is fulfilled at S-structure. So, there is no element of elegance in this case. The argument is apparently based on the mistaken assumption that the whole predicate has been moved in (66). What has been moved in (66), however, is not the whole predicate but only the lexical content of the predicate. A given lexical content has a syntactic function (such as "being a predicate") only with respect to a functional position. The Wh-position in COMP is not a functional position in this sense. The situation is analogous to what we observe when a Wh-phrase is moved from an argument position:
(68)
What_j did you see t_j
It is generally assumed that what in COMP is not an argument in this case; only its "original" position, the position of the trace, is an argument position. Similarly, how raw is not a predicate in (66); only the trace position is. Basically, all the arguments that Van Riemsdijk and Williams give are of this "elegance" type, and basically all of these arguments show the same weakness, namely, that a functional position is not distinguished from its lexical content. I will return to the arguments in detail. But first I will show that predicate structure cannot be NP-structure in the intended sense. Consider a topicalization structure like (69):
(69)
Bill_j [S' O_j [I don't like t_j]]
Clearly, this is a predication structure with Bill as subject and the open sentence as predicate (see Chomsky (1977)). The topic Bill inherits its Case from the trace position t_j, and binding conditions are also transferred from this position: (70)
[Pictures of each other]_j [O_j [they_j don't like t_j]]
This reveals an inconsistency in the NP-structure model: according to Van Riemsdijk and Williams, NP-structure is the level at which the binding theory applies and at which Case is assigned. This means that the topic must have its NP-structure position in the position indicated by the trace. It is only here that Case is assigned directly and that the binding theory applies without extra transfer. It is for this reason that the NP-structure model derives topicalization by so-called Vergnaud-raising, a rule proposed by Vergnaud (1974). Applied to topicalization, this analysis assumes movement of the topic from the trace position t_j to the operator position O_j in (70), followed by Vergnaud-raising to the topic position. In this analysis, the predicate structure in the optimally elegant sense that a lexical subject c-commands a lexical predicate, is only formed after Wh-movement of the topic from t_j to O_j. I will show that Vergnaud-raising for topicalization is impossible for independent reasons. Here, the example suffices to establish that it is impossible to construct a level at which both Case assignment and predication apply in the intended sense. More generally, the arguments against NP-structure are of the same type as the arguments against D-structure: the properties of the mapping, "move alpha", cannot be isolated and established, and the properties attributed to NP-structure itself are not exclusive properties of that level. Let us therefore have a closer look at the properties in question. Van Riemsdijk and Williams (1981) give the following four properties of NP-structure:
(71)
a. the opacity condition (ultimately the binding theory) applies at NP-structure
b. (abstract) Case is assigned at NP-structure
c. contraction (i.e. to-contraction) operates at NP-structure
d. (certain) filters apply at NP-structure
I will now briefly discuss these arguments, beginning with the idea that the binding theory applies at NP-structure. That Wh-movement does not affect the binding possibilities in the same way as NP-movement is not very surprising, given the fact that the binding theory concerns the relations between arguments, i.e. elements in A-positions. Rules that move material to A'-positions, naturally, have no effect on relations between A-positions. But apart from this, (71a) cannot fulfill its promises. As we saw in (70), the binding theory can only apply at NP-structure if we assume Vergnaud-raising for topicalization. This cannot be right because, as I will show, Vergnaud-raising is impossible for topicalization. If this conclusion is correct, (71a) must be false. One aspect of (71a) deserves special attention. Van Riemsdijk and Williams see it as an advantage of their model that the binding theory does not have to refer to Wh-traces: at NP-structure the future Wh-traces are still filled by Wh-phrases, which, as nonanaphors (and nonpronominals) cannot be bound. This would explain the alleged binding properties of Wh-traces in other models without stipulating a difference between Wh-traces and other empty categories (1981, 174). I strongly agree with the spirit of this explanation, as I will argue below. But the proposed execution of the idea, with the Wh-phrases physically filling the future trace positions, again does not give what it promises. The reason is that there are gaps, like parasitic gaps, that are identified by Wh-phrases but that cannot be literally filled by these Wh-phrases at NP-structure. As I will show in detail in chapter 4, the relation between Wh-phrases and parasitic gaps is definitely not characterized by Wh-movement. Here we might add that in languages with rich overt Case-marking, like Finnish, the parasitic gaps can have a Case different from the Wh-phrase, which always agrees with the locally bound gap (the trace; see Taraldsen (1984)).
This confirms the idea that the relation between a Wh-phrase and a trace has the properties of what is usually called Wh-movement, but that parasitic gaps have different properties. In spite of the fact that parasitic gaps cannot "physically" be filled by their binding Wh-phrases, they have the properties of Wh-traces with respect to the binding theory (they must be A-free in every governing category; see chapter 6 for more details). This fact undermines the execution that Van Riemsdijk and Williams give to a certain explanatory idea with which I agree, i.e. the idea that the behavior of certain gaps is derived from the nature of their antecedent. In particular, the facts about parasitic gaps undermine the idea that NP-structure is necessary or even possible in carrying out the intended explanation.
In short, (71a) is not supported by the binding facts. It is, on the contrary, incompatible with the binding facts if a larger class of facts is considered, in particular the facts about topicalization and parasitic gaps. The second property, (71b), will be my main target for a somewhat more elaborated critique, so I will postpone its discussion until after a brief discussion of (71c) and (d). These two arguments are very similar and show the same weakness: they overlook the fact that the behavior of a position is the joint result of its functional status and its lexical content. The contraction facts on which (71c) is based are well known. A Wh-trace blocks contraction (72), while an NP-trace and PRO do not block contraction ((73) and (74), respectively): (72)
a. Who_j do you want t_j to beat Nixon
b. *Who_j do you wanna t_j beat Nixon
(73)
a. John_j is supposed t_j to leave
b. John_j is sposta t_j leave
(74)
a. I_j want PRO_j to leave
b. I_j wanna PRO_j leave
The argument is that it is already known that material that is "physically" present between want and to blocks contraction, and that it is only natural to assume that "physically present" material blocks contraction at PR (Phonetic Representation). If contraction applies at NP-structure, before Wh-movement, the Wh-phrase is literally present between want and to in (72), so that contraction is blocked in the most natural way. It seems to me that this approach adds nothing to the standard approach, which explains the difference by the fact that Wh-traces differ from NP-traces and PRO in that Wh-traces have Case. Case is what Wh-traces have in common with lexical NPs, which explains the similarity in contraction behavior. Since there are Case-marked gaps identified by a c-commanding Wh-phrase that are not the result of Wh-movement, namely parasitic gaps and gaps in islands, it is possible to give a crucial test. We have seen before that gaps in islands differ from traces (see (56) above); particularly, they miss the characteristic Subjacency property of the traces of Wh-movement. If these gaps are not created by Wh-movement (see chapter 4), the NP-structure theory and the standard theory make different predictions. Since the gaps in question are Case-marked, the standard GB theory predicts that contraction across these gaps is just as bad as in (72b). The NP-structure approach, however, predicts that contraction is possible because the gaps are not created by Wh-movement; particularly, the gaps have not been physically filled at any level by the Wh-phrase. Consider now the following data: (75)
a. ??Which man_j did you express [a desire [that you want e_j to succeed Reagan]]
b. *Which man_j did you express [a desire [that you wanna e_j succeed Reagan]]
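The two competing predictions can be laid side by side in a small sketch. It simply encodes, as assumptions taken from the discussion above, that the standard account lets Case block contraction, while the NP-structure account lets only a physically present Wh-phrase block it; the names `Gap`, `blocks_by_case` and `blocks_by_physical_presence` are hypothetical and serve illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gap:
    """An empty position between 'want' and 'to' (illustrative encoding)."""
    kind: str
    case_marked: bool
    # Assumption: True only if a Wh-phrase literally occupies the
    # position before Wh-movement, i.e. at NP-structure.
    filled_at_np_structure: bool


def blocks_by_case(gap: Gap) -> bool:
    """Standard account: a Case-marked gap blocks contraction."""
    return gap.case_marked


def blocks_by_physical_presence(gap: Gap) -> bool:
    """NP-structure account: only physically present material blocks it."""
    return gap.filled_at_np_structure


GAPS = {
    "(72) Wh-trace": Gap("wh-trace", case_marked=True, filled_at_np_structure=True),
    "(73) NP-trace": Gap("np-trace", case_marked=False, filled_at_np_structure=False),
    "(74) PRO": Gap("PRO", case_marked=False, filled_at_np_structure=False),
    "(75) gap in island": Gap("island-gap", case_marked=True, filled_at_np_structure=False),
}

# Cases on which the two accounts make different predictions.
disagreements = [
    name
    for name, gap in GAPS.items()
    if blocks_by_case(gap) != blocks_by_physical_presence(gap)
]
print(disagreements)
```

On this encoding the two accounts coincide for (72)-(74) and come apart only for the island gap of (75), which is why (75) serves as the test case.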
Naturally, such island violations yield less acceptable sentences. But to the extent that these data are clear, it seems to me that we find the same contrast as in (72). If this conclusion is correct, the standard approach is confirmed: Case suffices to block contraction, and "physical presence" of the Wh-phrase - as required by the NP-structure model - is not necessary. The fourth argument, (71d), is based on a filter proposed for certain Italian data by Longobardi (1980). In Italian, adjacent infinitives are bad under certain circumstances (Van Riemsdijk and Williams (1981, 177)): (76)
*Giorgio comincia ad amare studiare
         begins   to like  to study
Essentially, Longobardi's filter has the following form:
Again, Van Riemsdijk and Williams observe, the filter is blocked by a Wh-trace (in an argument position) but not by an NP-trace or PRO. Moreover, the filter still seems to apply if the second infinitive is preposed, for instance by clefting (78a) or topicalization (78b): (78)
a. *E [andare a Roma]_j che potrei desiderare t_j
   it-is to-go to Rome that I-might wish
b. *[Andare a Pisa]_j potrei preferire t_j
   to-go to Pisa I-might prefer
Since the configuration of the filter (77) seems to be destroyed in these structures, Van Riemsdijk and Williams assume that there must be a pre-Wh-movement level, NP-structure, at which the two infinitives are still adjacent. In this case, it might seem that it is less easy to dismiss the argument than in the predication case (66). In the predication case, the relevant information, the category AP, was still present after Wh-movement. In (78) something seems to be definitely lost, namely the internal structure of the trace. The categorial structure of the trace (presumably S') is irrelevant. What really matters is the internal structure of S', i.e. the fact that it contains an infinitive. One solution, proposed by Longobardi, would be layered traces, so that the S'-trace can have an empty V with the feature
inf. This solution is not particularly attractive, according to Van Riemsdijk and Williams. It seems to me that the solution of NP-structure is unattractive for the same reason: it does not seem very plausible that the
infinitives in (78) directly bind the trace, just as it is implausible that cleft sentences and topicalizations are derived by Vergnaud-raising, as Van Riemsdijk and Williams assume. It is not unlikely that the construal chain involves an extra step, as in:
(79)
What_j I prefer t_j is to go to Rome
As mentioned before, Higgins (1973) has argued that such pseudo-clefts cannot be derived by movement. (79) is a good example, because the trace is of type NP, while the focus constituent, the infinitive, is of type S'. Similarly, the cleft and topicalization structures might involve a Wh-trace of an NP, bound by an operator in COMP, which is in turn linked to a fronted S'. If an analysis along these lines is correct for (78), then the trace t_j is an NP, and both the solution based on layered traces and that based on NP-structure must be incorrect, because these solutions presuppose a trace of the categorial type of the infinitive, i.e. an S'. Later on, I will present a solution that avoids this problem in particular, and the impossible Vergnaud-type solution for topicalization in general. But first I will analyze (71b), the claim that NP-structure is the level at which Case is assigned. Case assignment is for NP-structure what θ-marking is for D-structure. Whereas the postulation of D-structure is inspired by the desire to construct a level where all θ-roles are directly assigned, NP-structure is postulated by a desire, among other things, to construct a level at which Case is directly assigned. The underlying assumption is the same in both cases: direct licensing is more natural than indirect licensing through traces and other links of construal chains. As we have seen in the case of D-structure, it is not possible to construct a level at which all θ-marking is direct. But at least it could be maintained that all θ-roles are ultimately derived from the properties of lexical items. For Case, not even that can be true. Consider first an example in which the Case of a topic is derived from the complement Case of a verb. A relevant example is the German example given earlier (Van Riemsdijk (1978, 167)) (see (33) above): (80)
Den Hans_j (acc.), den_j (acc.) mag ich t_j nicht
In a way, this example is already problematic for the idea that NP-structure is the level of Case assignment. The problem is that (80) contains two Case-marked NPs. It is not possible to create a level at which both NPs receive their Case directly in the object position of the verb. It is therefore again necessary to derive (80) by Vergnaud-raising. But note that this would be a very undesirable kind of chain formation in this case. First of all, "move alpha" would have to be complicated by giving it the ad hoc power of being able to create an extra lexical position. This would result in a chain with two Cases, which is an anomaly (cf. Chomsky (1981b, 334)).
Secondly, it is hard to maintain that den in (80) is a kind of "visible" trace of the moved topic den Hans. The point is that den can also occur independently (this is essentially Dougherty's anaporn principle): (81)
Den_i mag ich t_i nicht
that one like I not
'That one, I don't like'
Den is just an independent pronominal. So, we could already interpret (80) as a counterexample to the idea that there is a level at which all Case is assigned directly. Things become more problematic if we consider examples in which the Case of the topic is not derived from a complement (or subject) position. Thus, Van Riemsdijk (1978, 168) gives examples like the following: (82)
Der Hans (nom.), mit dem (dat.) spreche ich nicht mehr
the John with him talk I not more
'John, I don't talk to him any longer'
Here, the topic has obligatory nominative Case, while the bound d-word in COMP has dative Case. Case agreement leads to an ungrammatical sentence: (83)
*Dem Hans (dat.), mit dem (dat.) spreche ich nicht mehr
In (82), then, the nominative Case of the topic is not derived from a clause-internal position. This shows two things. First, it appears that not all Cases are ultimately derived from positions projected from the lexicon, as θ-roles are. Secondly, (82) shows that Vergnaud-raising is not a general solution to the Case-transfer problem. What we see in (82) is the selectivity of transfer that characterizes rules of construal in general. If a topic already has a Case (82) for some independent reason, Case is not transmitted. If the topic has no independent Case, it must be transmitted (80). Apparently, nominative Case is optionally assigned to topics (it is also possible as an option in (80)), as a default Case (as suggested by Jan Odijk (personal communication)), or as a generalization of the Case assignment to subjects (the topics in question are subjects in a subject-predicate relation). If the nominative option is not chosen, Case must be transmitted by a c-commanding d-word, as in (80). If the d-word does not c-command the topic, no Case is transmitted, and the nominative is obligatorily chosen because of the Case filter (cf. (82) and (83)). The idea that NP-structure is the level of Case assignment is motivated by examples like the following: (84)
Whom_j did you see t_j
The idea is that only the trace position is a position of direct Case assignment. Since direct Case assignment is the most natural Case assignment, and since whom is only in its natural Case position before Wh-movement, i.e. at NP-structure, NP-structure is the natural level of Case assignment. As in the case of θ-marking and D-structure, this view only makes sense if it can be established that there is a one-to-one correspondence between Case positions and Case-bearing NPs. If Case can be transmitted by (nonmovement) construal, so that the one-to-one correspondence breaks down, there is no evidence for NP-structure. It must be shown, in other words, why it is implausible that the Case of whom in (84) is derived from the trace position at S-structure. The view that Case can only be assigned to NPs in direct Case positions and not be transmitted by construal rules is plainly false. Practically all construal rules that connect NPs can transfer Case. We have already seen examples like (80), but also ordinary (non-d) pronouns transmit Case. Van Riemsdijk (1978, 175, note 27) gives examples like:
(85)
Den Hans (acc.), ich habe ihn (acc.) gestern gesehen
the John I have him yesterday seen
'John, I saw him yesterday'
There is almost full consensus that such cases of ordinary Left Dislocation are not transformationally derived by "move alpha" (see Van Riemsdijk and Zwarts (1974) for arguments). One reason is that epithets can also transfer Case (see Cinque (1983a)): (86)
John, Mary doesn't like that little bastard
Also, in many other examples of nonmovement construal, it appears that Case can be transmitted: (87)
a. What_j he really likes t_j (obj.) is himself_j (obj.)
b. He saw something awful_j (obj.): himself_j! (obj.)
c. What_j did he see t_j (obj.)? Himself! (obj.)
d. John saw Bill_j (obj.) and Peter himself_j (obj.)
Another clear case is Sluicing, discussed by Van Riemsdijk (1978, 231ff.). Van Riemsdijk convincingly argues that Sluicing structures are not derived by deleting the whole context as in: (88)
a. Someone has done the dishes, but I am not sure who [has done the dishes]
Sluicing falls into a domain of facts that Van Riemsdijk refers to as "connectedness of discourse" phenomena. What has already been shown
by (87), but what is also demonstrated in Sluicing constructions by Van Riemsdijk, is that "connectedness in discourse" is sufficient in most cases for Case transfer. Thus, Van Riemsdijk (1978, 244-245) gives the following German example: (88)
b. Er will jemandem (dat.) schmeicheln, aber sie wissen nicht wem (dat.)
   he wants someone flatter but they know not whom
   'He wants to flatter someone but they don't know whom'
The dative Case of wem, which is obligatory, can only be derived from jemandem in the preceding discourse. The Case cannot be derived from the deleted context, because, as Van Riemsdijk shows, there has never been a context to delete. But if even discourse rules can transmit Case, there is not the slightest reason to doubt that Case can be transmitted in local construals like ((84), repeated here for convenience): (89)
Whom_j did you see t_j
Since whom is construed with the Case-bearing trace, it can derive a Case without problems. In fact, there is direct evidence that Case can be transmitted in very similar situations without movement. As we have mentioned several times, gaps in islands do not have the properties of traces and cannot be the result of "move alpha" (see also chapter 4). Consider now the following island violation, which is reasonably acceptable: (90)
Whom_j did you express [a desire [to see e_j]]
Here, the NP-structure solution fails, because there is no underlying structure with whom in the place of the gap. Particularly, it is not possible to derive (90) from such an NP-structure by "move alpha". But if (90) does not involve NP-structure, I see no reason why (89) should involve NP-structure. It is fair to say, then, that Case assignment has nothing to do with NP-structure. Since the other alleged properties of NP-structure (71) do not support NP-structure either, we must conclude that there is no evidence for NP-structure. Furthermore, we have seen that there are predicate structures that involve Wh-movement (69), which contradicts the view that predicate structure is the pre-Wh-movement NP-structure. A more general objection is that models with NP-structure crucially involve "move alpha", a rule without known properties. A specific feature of the NP-structure hypothesis is that it leads to a Vergnaud-raising analysis for topicalization in a number of cases. I have
already indicated why I find this rule questionable. It leads, for instance, to chains with two Case positions. I will conclude this part of the discussion with two more arguments against a Vergnaud-type analysis for topicalization. The first argument is a familiar one by now. If gaps in islands are not created by movement, topics cannot originate in the gap position either (from which they are moved to an operator position from which they are raised to the topic position). If the analysis (of chapter 4) for the gaps in question is correct, the following example cannot be derived by movement (including Vergnaud-raising): (91)
That race_j [S' O_j [I did not express [a desire [to win e_j]]]]
In spite of the fact that such examples cannot be derived by movement, the construal chain suffices for the transfer of anaphoric relations: (92)
[Pictures of each other]_j [O_j [they_i did not express [a desire [to buy e_j]]]]
This example shows once again that NP-structure is not the level of the binding theory, in the sense that binding relations must directly be expressed there, without construal transfer. This leads us to a second argument against a Vergnaud-type analysis for topicalization. Consider the following example, in which the reflexive is inside an AP (or a small clause):
(93)
He_j was never [satisfied with himself]
This AP can be topicalized: (94)
[Satisfied with himself]_j [O_j [he_j never was t_j]]
In English, it is not obvious that this example is incompatible with Vergnaud-raising. In Dutch, however, it is possible to have a d-word in the operator position: (95)
[AP Tevreden over zichzelf_j]_j [S' dat_j [is hij_j nooit t_j geweest]]
satisfied about himself that has he never been
The crucial point, now, is that the d-word is an NP and not an AP (or a small clause). In other words, Vergnaud-raising would lead here to the transmutation of categories. The transmutation of water into wine is more credible than the transmutation of NPs into APs by the nonexisting rule "move alpha". I will now turn to a point in which I fundamentally agree with Van Riemsdijk and Williams (1981), and disagree with the standard GB
approach. The point can best be illustrated with strong crossover:
(96)
*Who_j did he_j say that Mary liked t_j
According to the standard GB approach, the ungrammaticality of this sentence is explained by the fact that the trace is defined as a variable, which can be clarified by the following LF paraphrase of (96): (97)
For which x, x a person, x said that Mary liked x
At S-structure, where the binding theory applies, the trace t_j in (96) is already defined as a variable. Under the "natural" assumption that variables have something in common with names ("variables are unspecified names", see Chomsky (1981b, 102)), variables are supposed to have the same binding properties as names. This is expressed by the well-known principle C of the binding theory: R-expressions (names and variables) must be free in every governing category. This principle is supposed to explain (96), in which the variable t_j is illegitimately bound by an antecedent he. This view is confusing for a number of reasons. To begin with, it has traditionally been assumed that pronouns, like he, are the natural language equivalent of variables. Thus, (98a) can be paraphrased as (98b):
(98)
a. Everyone thinks that he is happy
b. For every x, x a person, x thinks that x is happy
In this case, the pronoun he, which corresponds to the rightmost occurrence of x in (98b), cannot be free, but on the contrary, must be bound. From the point of view of Logical Form, this discrepancy between (97) and (98) seems paradoxical: in (97) a variable must be free, and in (98b) a variable in a similar position must be bound. But of course, the binding theory applies at S-structure, where we have two different categories corresponding to the variables in (97) and (98b). In (97) the variable corresponds to a trace at S-structure, and in (98b) the variable corresponds to a pronoun at S-structure. At S-structure, there is no paradox because traces differ from pronouns in their binding properties. But the problem is that an S-structure like (96) has always been seen as a precursor of the logical form (97). Without this assumption, it makes little sense to call something a variable at S-structure. If syntactic categories are precursors of LF categories, then it is hard to avoid he also being called a variable at S-structure, so that the paradox reappears at S-structure. Nevertheless, it is essential to the standard binding theory that a Wh-trace be called a variable at S-structure. Ultimately, this cannot be done
without stipulation. In other words, the standard approach is problematic in two respects. It leads to a paradox and it requires unnecessary stipulation. It is a fundamental merit of the analysis given by Van Riemsdijk and Williams (1981), that both problems are avoided. I will later show that their analysis (or some variant of it) avoids several other problems of the standard approach (see section 4 on Logical Form). According to the NP-structure analysis, the binding theory applies to (96) before Wh-movement: (99)
*He_j said that Mary liked who_j
The position of the object trace is still filled by the Wh-phrase who here. This structure is naturally ruled out by the binding theory because who is neither an anaphor nor a pronominal, so that it is accepted neither by principle A nor by principle B of the binding theory. This theory is not a notational variant of the standard theory because a (quasi-) quantified NP like who in (99) is not a variable. A quantified NP is neither an operator nor a variable, but a category that somehow combines the two aspects of quantification, a matter to which I will return. In fact, we can entirely dispense with principle C of the binding theory if we assume that only anaphors and pronominals can be bound. Since who is neither, (99) can be ruled out without reference to principle C or the notion variable. Although I consider this account a definite improvement over the standard theory, it does not quite work for all cases of strong crossover. Strange as it may seem, this is due to the very idea of NP-structure, the idea that Wh-phrases are literally present in the positions of the gaps at some level. Crucial evidence against this aspect of Van Riemsdijk and Williams' analysis comes from the anti-c-command condition on parasitic gaps.⁵ Thus, (100a) is relatively acceptable, while (100b), in which the first gap c-commands the second, is ungrammatical (see Chomsky (1982a)): (100)
a. Which book_j did you return t_j without reading e_j
b. *Which book_j t_j was returned before you could read e_j
It may seem that (100b) favors the standard account of strong crossover: if the rightmost gap (the parasitic gap) is a variable, (100b) is ruled out because the variable is not free. The NP-structure analysis, on the other hand, does not apply here because it is not possible to reconstruct a pre-Wh-movement level at which the Wh-phrase fills both gaps. Moreover, parasitic gaps are usually in islands, so that (100b) cannot be derived from NP-structure by "move alpha" without violating Subjacency. It seems to me that the essence of the Van Riemsdijk-Williams analysis can be preserved by changing the mode of execution, i.e. by giving up NP-structure. The reason why the NP-structure account breaks down in cases
like (100b) is that it is based on an untenable dogma that also led to problems in the case of D-structure. In order to see how the alternative execution works, we must make a short excursion into the domain of categories and their lexical content. In particular, we must sketch the outlines of a theory of identification of categories. All syntactic categories must be identified somehow. The most direct way of identifying a category is by giving it a lexical content. Case assignment is another identification strategy (see the discussion of "visibility" in Chomsky (1981b)). Thus, a category that has both Case and lexical content is doubly identified in a sense. Adjuncts, on the other hand, are only identified by their lexical content. I think that this is the ultimate reason why adjunct gaps have a much more limited distribution than NP gaps (see chapter 4). The Case-marked NPs can be identified by weaker means than the non-Case-marked adjuncts. NP gaps can, for instance, be identified by AGR in pro-drop languages. Furthermore, we have already indicated that NP gaps are the only gaps in English that occur in islands. Huang (1982) has discussed in detail how strict the island behavior of adjuncts is (see chapter 4 below). In English, Case is never a sufficient identifier. Contrary to what we see in languages like Japanese or Chinese (see Xu (1984)), a Case-marked NP gap must always be identified by a c-commanding category of some kind. It seems to me that many construals between categories are identification strategies. A bound anaphor, for instance, is incompletely identified because it is an argument without an inherent referential index. Binding, then, is an instantiation of the property-sharing rule (13). The missing identification is shared with some local antecedent. Similarly, the subject of a passive is insufficiently identified because it is in a position where it lacks an inherent θ-role.
I have used "dynamic" terminology for these phenomena, such as "inheriting a θ-role", or "transmitting Case, or a referential index". This usage is somewhat metaphorical, for expository reasons. What I really mean is that the two related categories literally share a certain property, without transfer in the "dynamic" sense. Thus, two NPs can be mapped onto one referential index, or one θ-role, etc.: (101)
a.
John saw himself
~./ I
b.
John was arrested t
~8~ I
As we have mentioned before, all these identification mappings have the form of the configurational matrix discussed in chapter 1. Let us now make a distinction between a category and its lexical content. This distinction has traditionally been expressed by the device of lexical insertion. Thus, the sentence John loves Mary is derived by inserting the lexical elements (102a) in the syntactic skeleton (102b):
(102)
a. [NP John]_i, [NP Mary]_j, [V love]_k
b. [[NP e]_i [[V e]_k [NP e]_j]]
In (102a), we find the lexical content of the categories given in (102b). I will assume now that also after lexical insertion, a filled NP position consists of two parts, the functional category and its lexical content. Thus, in a sentence like John loves Mary, we distinguish for instance the object category [NP e] and its lexical content [NP Mary]. We can also say that Mary identifies the functional object position. It is a fundamental aspect of the representational theory presented here that lexical content is considered part of the identification of a category, on a par with other identificational material, like referential indices, Case, and θ-roles. This entails that lexical material can be shared by two positions, just as Case or referential indices can. Thus, a sentence like who did you see can be represented like (101):
(103)
[[COMP NP] [you saw NP]]
[diagram: both NP positions are linked to the single lexical content who]
This representation would be the same for a sentence without overt Wh-movement, like you saw who (with the Wh-phrase in situ). Given the general possibility of property sharing (within the limits of the configurational matrix) it is immaterial in which position the lexical material actually occurs. In short, a functional category can be identified in two ways: by its lexical content in situ, i.e. when the category dominates its lexical content, or by binding, i.e. when the lexical content is the antecedent of the functional category in some domain. In both cases, the functional category shares its properties (such as Case and θ-role) with the information provided by the lexical content. I will refer to the former situation as isotopic property sharing, and to the latter as non-isotopic property sharing. It seems to me that generative grammar has always had a strong bias towards isotopic property sharing. Originally, for instance, anaphors were derived from full names in a sentence like John saw himself. We could interpret this by saying that himself already had an isotopic referential index in deep structure. This view turned out to be untenable (see, for instance, Jackendoff (1972)). Since then, it has generally been assumed that interpretation of anaphors is non-isotopic, i.e. the referential index is derived from (shared with) a binding category. Much of the foregoing discussion can be seen as a demonstration of the inevitable development in the direction of non-isotopic theories for Case and θ-roles as well. A continuing preference for D-structure can be seen as an attempt to maintain an isotopic theory for θ-role assignment. And the development of NP-structure can be seen as an attempt to construct an isotopic theory for
Case also. In both cases, these attempts are unsuccessful, in my opinion. One reason is that there can be a difference between isotopic property sharing and non-isotopic property sharing. With isotopic property sharing, the functional category and its lexical content share all the properties they have. Non-isotopic property sharing, however, can be partial because, as we have seen many times, it is filtered by independent properties of antecedents. Thus, an antecedent and an anaphor share only one identifier, the referential index. In general, we can say that the fewer properties an antecedent has, the more properties it can share. What has been called "move alpha" is a case of partial property sharing and is therefore most naturally treated as a case of non-isotopic property sharing. Under a movement analysis, some properties of the functional position (the D-structure position) must inevitably be "left behind" (like the θ-role in Wh-movement). We can now see why the NP-structure theory did not really work. The reason is that NP-structure requires isotopic property sharing for Wh-phrases, i.e. they are in a functional position at NP-structure, so that they must share all features with their NP-structure position. This is impossible in the case of parasitic gaps, as we saw in (100). It is still possible, under a non-Wh-movement analysis, to determine partial property sharing at S-structure. Thus both (96) and (100b) (repeated here) can be excluded if we drop the assumption of (rather) full property sharing presupposed by the movement account:
(104)
a. *Who_j did he_j say that Mary liked t_j
b. *Which book_j t_j was returned before you could read e_j
In (104a), the trace position is fully identified by the Wh-phrase in COMP, as in the NP-structure theory, i.e. the trace position shares the lexical features of the fronted Wh-phrase. This rules (104a) out, because binding by he requires anaphoric or pronominal identification. How can we rule out both (104a) and (104b) without movement or variables? Let us first consider some simpler cases, for instance French reflexivization:
(105)
Il_j se_j lave t_j
he himself washes
'He washes himself'
Il must bind se, because se is specified in the lexicon as an anaphor, i.e. as an incompletely identified element. Il cannot bind se directly, because se is not in an A-position: only elements in A-positions can be bound. The trace cannot be bound either. Here we make a crucial assumption different from the standard approach. According to the standard binding
theory, Case-marked empty elements have inherent binding properties. In contrast, I will assume that empty Case-marked NPs have no inherent binding properties at all: their binding properties depend on the properties of their identifiers. What this comes down to is that neither se nor t can be directly bound in (105). But se must be bound. The solution is property sharing. A reflexive can only be bound in relation to an A-position. The relation can be isotopic, i.e. if the reflexive is dominated by the NP in A-position. The identification relation can also be non-isotopic, as in (105). In both cases, the reflexive lexical content identifies the A-position. In (105), then, se can be bound because it identifies (shares the properties of) an A-position. Similar mechanisms rule out the following example, with a nonreflexive clitic:
(106)
*Il_i le_i lave t_i
he him washes
Le as a nonanaphor cannot be bound. In (106), it identifies an A-position, which rules the sentence out (the lexical content le must be free in relation to an A-position). This same mechanism is sufficient to rule out (104a): as such, the trace has no binding properties, but its identifier, the Wh-phrase, cannot be bound in relation to this A-position. Consider now an island violation, which involves an empty resumptive pronominal according to the theory referred to before:
(107)
Which race_i did you express [a desire [to win e_i]]
The empty pronominal is identified by a c-commanding phrase (the Wh-phrase), as required. The gap is not checked by the binding theory, because it is not necessary for resumptive pronouns to be A-bound, nor is there an A-binding antecedent in the sentence. Consider next a case in which the empty pronominal is A-bound:
(108)
*He_i said that Mary liked e_i
Why can a resumptive empty pro not be identified by an A-binder? Recall that an A-binder does not really bind an empty Case-marked NP. It only binds the identifier of this empty category in relation to this A-position. For (108) this would entail that he binds itself, which is impossible under the reasonable assumption that A-binding is a nonreflexive relation. Identification by a Wh-phrase in COMP is no problem in (107) because in this example there is no A-binder, so that the binding theory does not apply. Now consider (104b) again (repeated here):
(109)
*Which book_i t_i was returned before you could read e_i
Parasitic gaps are resumptive pros according to the Cinque-Obenauer theory to be discussed in chapter 4. As empty elements, they must be identified. In (109) this requirement is met, because there is a c-commanding Wh-phrase. But the empty e_i is also bound in (109), namely by the c-commanding trace (which is also identified by the Wh-phrase). It is now clear that (109) is ruled out in the same way as (108). The trace stands in a potential binding relation to the gap e_i. As before, this gap has no inherent binding properties. The c-commanding trace (an A-binder) only binds the identifier of this gap in relation to (the A-position of) the gap. But the identifier, which book, cannot be bound, because it is not an anaphor or a pronominal in relation to its functional position, the position of the trace. In other words, the trace cannot bind the parasitic gap because it cannot bind its identifier in relation to its own position. (Of course, the trace also cannot bind which book in relation to the other A-position, the position of the gap e_i.) In sum, if Case-marked empty positions always transfer binding relations to their identifier, which is then bound in relation to the empty position, we have a uniform account for the facts in (104), (105), (106), and (108). The analysis preserves the fundamental and desirable property of the Van Riemsdijk-Williams analysis of strong crossover. But by giving up the idea of movement and NP-structure, the analysis (based on property sharing now) can be extended to the crossover-like anti-c-command condition on parasitic gaps. Needless to say, the analysis based on property sharing can handle all the facts that the NP-structure hypothesis can handle. Thus, the transfer of binding relations after Wh-movement is accounted for (cf. property (71a)):
(110)
[Which pictures of each other_i]_j do they_i like t_j
They c-commands t_j, so that by property sharing the lexical content in COMP is also in the domain of they. Since each other is dominated by a phrase in the domain of they, each other is also in the domain of they, as required. This analysis avoids the problematic transmutation of categories that seemed to be required for a movement analysis of (95) (repeated here):
(111)
[AP Tevreden over zichzelf_i]_j [S' dat_j [S is hij_i nooit t_j geweest]]
Instead of Vergnaud-raising, we may of course have a binding relation between an AP (or a small clause) and an NP. There is independent evidence for this type of relation (see also Ross (1969)):
(112)
Peter is tevreden over zichzelf_i en Hans is dat_i ook
Peter is satisfied with himself and John is that also
'Peter is satisfied with himself, and so is John'
This is an anaphoric relation that does not involve transformations. Essentially, we find the same base-generated relation in (111) (recall Dougherty's anaporn principle: dat can also be interpreted without an AP topic in (111)). The anaphoric relation, a construal that normally leads to property sharing, appears to be sufficient for the transfer of the c-command relation. As we concluded before, a movement analysis (and therefore an NP-structure analysis) is impossible in (111). The contraction facts are already handled by the Case-feature on the Wh-trace. The facts also follow from property sharing:
(113)
*Who_j do you wanna t_j succeed Reagan?
It is not necessary to reconstruct who literally into the place of the trace, because the trace is identified by who, so that the trace position shares the lexical content with the COMP position. Case has been extensively discussed above. Let us therefore end with the Italian facts related to the Longobardi filter. The most problematic facts concern fronted infinitives, cases in which the configuration of the V_inf V_inf filter seems to be blocked:
(114)
*È [andare a Roma]_j che potrei desiderare t_j
it-is to-go to Rome that I-might wish
If the fronted infinitive is directly linked to the trace, the example is accounted for immediately by property sharing. If the filter says that an infinitive may not be followed by a category the lexical content of which begins with an infinitive, (114) meets the structural description of the filter without (pseudo-) reconstruction or layered traces. Fronted phrases simply specify the lexical content of their phrases. As in other cases, structural descriptions have an isotopic and a non-isotopic interpretation. The idea that (114) must involve some kind of reconstruction is based on the untenable dogma that structural descriptions can only be met isotopically. In sum, then, we see that NP-structure was one of the last attempts to give the priority of isotopic property sharing some weight. D-structure tried to maintain the dogma for θ-roles. NP-structure tried to revitalize it for Case positions. In both domains we have seen a development that started earlier, with the theory of bound anaphora. With respect to the property-sharing rule, all identifiers are equal: referential indices, θ-roles, Cases, and lexical content. In all cases, it must be concluded that non-isotopic property sharing cannot generally be derived from representations with only isotopic property sharing, i.e. from distinct levels of representation like D-structure or NP-structure.
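The binding mechanism used throughout this section (a Case-marked empty position has no binding properties of its own and passes the binding relation on to its identifier) can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not a formalization given in the text: the class `APosition`, the function `well_formed`, the three labels for binding types, and the hand-annotated `binder` field (which records A-binders only) are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels for the binding type of an identifier (lexical content).
ANAPHOR = "anaphor"
PRONOMINAL = "pronominal"
R_EXPRESSION = "r-expression"  # names and Wh-phrases


@dataclass(frozen=True)
class APosition:
    """An argument position together with the content that identifies it."""
    identifier: str        # lexical content, e.g. "se" or "who"
    kind: str              # binding type of the identifier
    isotopic: bool         # True if the content occurs in the position itself
    binder: Optional[str]  # A-binder: "local", "nonlocal", or None (annotated by hand)


def well_formed(pos: APosition) -> bool:
    """Check the identifier, not the empty position, against its A-binder.

    The `isotopic` flag is deliberately ignored: the check is the same
    whether the content is in situ or linked from another position.
    """
    if pos.kind == ANAPHOR:
        return pos.binder == "local"   # must be locally bound
    if pos.kind == PRONOMINAL:
        return pos.binder != "local"   # must be locally free
    return pos.binder is None          # must be free everywhere


examples = {
    "(101a) John saw himself": APosition("himself", ANAPHOR, True, "local"),
    "(105) Il se lave t": APosition("se", ANAPHOR, False, "local"),
    "(106) *Il le lave t": APosition("le", PRONOMINAL, False, "local"),
    "(104a) *Who did he say that Mary liked t":
        APosition("who", R_EXPRESSION, False, "nonlocal"),
    "(107) Which race did you express a desire to win e":
        APosition("which race", R_EXPRESSION, False, None),
    "(109) *Which book t was returned before you could read e":
        APosition("which book", R_EXPRESSION, False, "nonlocal"),
}

for sentence, position in examples.items():
    verdict = "ok" if well_formed(position) else "ruled out"
    print(f"{sentence} -> {verdict}")
```

The sketch leaves out the separate reasoning for (108), where the only available identifier would be the A-binder itself, and it treats locality as given rather than computing it from structure.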
2.4. Logical Form
Since the term Logical Form is used in different senses, I will first briefly mention what is not at issue in this section. The syntactic level of Logical Form (LF) has very little to do with what logicians call logical form. So, if I am criticizing LF, I am saying nothing whatsoever about logic or semantics in the logician's sense. Furthermore, I will be critical of LF but not of one of the ideas that led to it in the first place, namely the idea that the scope of quantified elements is, at least in part, determined by syntactic principles. The partial dependence on syntax of scopal phenomena is a well-established fact, something I will not take issue with in what follows. The syntactic level of LF differs only minimally from S-structure, particularly by so-called LF movement, an application of "move alpha" that adjoins certain quantified elements to a category containing them. In essence, this idea was first proposed by Chomsky (1973) for unmoved Wh-elements, so-called Wh-elements in situ (like what in Who saw what). This analysis was extended to other quantified elements, like everyone, by May (1977). Other applications are the movement of elements in focus, like the stressed JOHN in I saw JOHN (see Chomsky (1981b, 196)). LF movement is based on the analogy that is supposed to exist between the output of this rule and the output of overt Wh-movement at S-structure:
(115)
Who_i did he see t_i?
It has been assumed for some time (i.e. since Chomsky (1973)) that this is a representation of the (quasi-) quantifier-variable relation that can be paraphrased in this case as:
(116)
For which x, x a person, he saw x
This is even one of the ideas that stimulated the development of trace theory, the thought that natural language structures give direct evidence for quantifier-variable representation as the natural ("biologically real") expression of quantification. In discussing these matters, Chomsky (1980b, 165) comes to the following conclusions:
If these conclusions are correct, one might speculate that the familiar quantifier-variable notation would in some sense be more natural for humans than a variable-free notation for logic; it would be more readily understood, for example, in studying quantification theory and would be a more natural choice in the development of the theory. The reason would be that, in effect, the familiar notation is "read off of" the logical form that is the mental representation for natural language. The speculation seems to me not at all implausible.
These speculations have had a great influence on syntactic theory, in that
much effort has been invested in the development of a syntactic level of LF. Furthermore, this "logistic" view of syntax has also had much influence on the views of S-structure. Particularly, Wh-movement is usually seen as a rule that creates the precursor of Logical Form. From this point of view it is not incoherent to refer to Wh-traces as variables, and to Wh-phrases in COMP as operators. In the binding theory, as we saw in the preceding section, Wh-traces are treated as variables, and the fact that they behave like names is considered natural under the assumption that variables are unspecified names (Chomsky (1981b, 102)). Although this view is quite prominent in GB theory, it is not generally accepted, not even among those who accept GB theory in other respects. From the point of view defended here, for instance, the "logistic" approach to syntax is rejected entirely. It seems to me that it is extremely unlikely that Frege, by giving the foundations for quantifier-variable notation in his Begriffsschrift (1879) after centuries of logic, did nothing else than rediscover our deepest nature. I find it, on the contrary, more plausible that quantifier-variable notation was developed so late in the history of logic because natural language contains nothing that fully corresponds to the operators and variables in predicate calculus.⁶ Universal quantification, for instance, is not expressed in natural language by an operator and a variable, but by a quantified NP, which involves aspects of both. Thus, everyone in (117a) corresponds to the italicized part of the canonical representation (117b):
(117)
a. Everyone left
b. For every x, x a person, x left
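For comparison, the canonical representation (117b) can be written in ordinary first-order notation. The rendering below is the standard textbook treatment of restricted universal quantification and is an assumption added here for illustration; the predicate names person and left are simply taken over from the paraphrase.

```latex
\forall x \, \bigl( \mathrm{person}(x) \rightarrow \mathrm{left}(x) \bigr)
```

In this formula the quantifier ($\forall$), its restriction ($\mathrm{person}(x)$), and the variable ($x$) appear as separate symbols, whereas the single word everyone in (117a) carries all three at once.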
Thus, a quantified NP in natural language merges three elements of the canonical format: (i) a quantifier, (ii) a restriction on the quantifier, and (iii) a variable. Natural language expresses these three elements simultaneously, in one NP, and not analytically, as in standard predicate calculus. As for variables, it has traditionally been assumed that pronouns are the closest natural-language counterparts of these elements of the predicate calculus. But where the latter contains "pure variables", natural-language pronouns can only be "dressed" variables, i.e. elements that always have features like person and number. All in all, it seems to me that there are no direct counterparts to pure variables bound by pure operators in natural language and that it is an error to interpret the output of Wh-movement along these lines. The development of LF is based on the logistic interpretation of the output of Wh-movement. According to this view (developed in May (1977) and (1985)), the familiar rule "move alpha" picks up a quantified NP and adjoins it to S (this rule is called Quantifier Raising (QR)). Thus, QR gives the following output of (117a):
(118)
[S [everyone]_j [S t_j left]]
According to this view, the output of QR is logically transparent in the same sense as the output of Wh-movement at S-structure (115): both structures are supposed to be almost like the canonical representations (116) and (117). The word almost plays a somewhat underestimated role in this context. Not much attention has been paid to the fact that there is still a significant gap between (118) and the canonical representation (117b). In (117b), the restriction on the quantifier is also analytically represented, while in (118) the restriction is still implicit in everyone. Further rules are needed to "extract" the restriction. Usually, these rules are only hinted at by saying that (118) can be paraphrased as (117b). It is not easy to make these rules (which seem to involve the insertion of lexical material) explicit, as experience with Generative Semantics has shown. In short, quantified NPs contain three elements, a quantifier, a restriction, and a variable. QR makes the quantifier and the variable partially explicit, while the restriction is left implicit. The alternative, to be sketched in a moment, seems somewhat more natural in its underlying assumption that it is not the business of syntax at all to make explicit certain aspects of the content of lexical elements; or at least, "move alpha" is only an instance of the property-sharing rule, and not a device that analytically splits lexical items. The alternative view, according to which Wh-movement has nothing to do with logic, has already been anticipated in the preceding section. I will refer to it as the identification approach, as opposed to the standard logistic approach. According to the identification approach, empty Case-marked elements are not variables but dummies that must be identified by some element in an A'-position. As mentioned before, these dummies do not have independent binding properties. Their binding properties depend on the properties of their identifiers.
Thus, if the identifier is a reflexive clitic like French se, this element must be locally bound in relation to the dummy position (which as an argument position has the necessary features). If the identifier does not have anaphoric properties, it cannot be bound in the dummy position. This rules out the following example, in which he has the same index as the trace:
(119)
*Who_j did he_j see t_j
The ungrammaticality of this sentence according to the identification approach has nothing to do with the status of the trace as a variable. It is a simple consequence of the fact that the trace cannot be identified as an anaphor. If there is a construal chain that makes it possible to identify the trace with reflexive features, the sentence is grammatical, even if the trace is A'-bound from COMP:
(120)
Himself_j [O_j [he_j does not really like t_j]]
If the trace is interpreted as a variable, it is not easy to circumvent the binding theory for this case (the variable is not free). Under the alternative approach, there is no problem at all: himself is bound in position t_j, to which it is connected by construal, which entails property sharing. As already mentioned in the preceding section, there is no difference between (119) and its counterpart in a language without overt Wh-movement, which could be represented in English as:
(121)
*He_j saw who_j
The sentence would be ungrammatical for exactly the same reason: who is not an anaphor. Such sentences can be ruled out by the binding theory at S-structure, and it is not necessary to have recourse to the alleged properties of variables at LF. A sentence like (121) represents isotopically what (119) represents non-isotopically. In both cases, it is the combination of an A-position and its specific lexical content that makes the sentence ungrammatical. In neither case is there a direct relation with canonical representations like (116). In particular, it is meaningless in this approach to say that (115) is closer to the canonical representation than (121). Who is not an operator in (119) and t is not a variable. Together, these two elements express the content of a (quasi-) quantified NP who in a nonanalytic fashion; i.e. the operator part and the variable part (not to mention the restriction) are not separated by "move alpha". These three aspects remain indistinguishable in the quantified NP in question. What (119) "splits" in comparison with (121) is the argument and its identifying lexical content, not the variable part and the operator part of that lexical content. In this alternative view, Wh-movement might still be indirectly related to scope assignment. In chapter 4, I will assume that all cases of scope assignment for Wh-phrases involve an abstract scope marker Q, in the sense of Katz and Postal (1964) and Baker (1970). The representations of (119) and (121) are as follows in this view:
(122)
a. [S' Q_j [Wh-phrase]_j [S ... t_j ...]]   (119)
b. [S' Q_j [S ... [Wh-phrase]_j ...]]   (121)
In both cases, scope is assigned to the Wh-phrase by its relation with the scope marker Q. The difference is that the content of the Wh-phrase is adjacent to Q in (122a). It is not excluded in the alternative analysis that this adjacency somehow facilitates scope assignment. It might even be necessary in languages in which the scopal domain of Wh-phrases is S (instead of S') to move the content of the Wh-phrase first to the COMP position, if direct linking to Q from the S-internal position is not possible as a consequence of the domain restriction of Wh-phrases (see chapter 4). But saying that Wh-movement to COMP somehow marks scope is
something very different from saying that the Wh-phrase in COMP is an operator that binds a variable. If the alternative view is coherent, we must ask ourselves how it can be empirically distinguished from the standard view. Some evidence has already been given. In contrast with the standard view, all A'-binding of A positions has the same characteristics in the alternative theory: it identifies empty argument positions. It is not necessary, as in the standard theory, to make a distinction between A'-bound argument positions that are operator-bound, and A'-bound positions that are not operator-bound, and therefore not interpreted as variables (positions bound by clitics, stylistically dislocated material, etc.). Furthermore, nothing special has to be said about cases like (120). It seems to me, however, that a much stronger case can be made. I will show that the alternative theory can handle everything the operator-variable theory can handle, and that there is no reason to assume that the notion "variable" deserves a place in syntactic theory. What is more important is that there is crucial evidence that decisively favors the alternative theory. Pied Piping in general, and certain cases of Wh-movement in German, have a fully regular status in the identification theory, but are irreparably anomalous in the standard theory. The notion "variable" has played a crucial role in the explanation of the following contrasts:
(123) strong crossover
a. Who_j t_j said Mary kissed him_j
b. *Who_j did he_j say Mary kissed t_j
(124) weak crossover
a. His_j mother likes John_j
b. *His_j mother likes everyone_j
c. *Who_j does his_j mother like t_j
(125) COMP-to-COMP violation
a. *Who_j [S t_j thought [S' t_j [S John would see t_j]]]
(126) anti-c-command condition for parasitic gaps
a. Which book_j did you return t_j before you could read e_j
b. *Which book_j t_j was returned before you could read e_j
In all these examples, the rightmost t or e is considered a variable according to the logistic approach. Apart from weak crossover, the ungrammatical sentences are ruled out by principle C of the binding theory, which stipulates that variables may not be bound (as in the ungrammatical sentences).
The same facts follow immediately from the identification approach, which does not make reference to the notion "variable". In all ungrammatical cases, the rightmost empty element is identified by a Wh-phrase, a nonanaphor and nonpronominal, which may not be bound in the A-position in question. For strong crossover and the anti-c-command condition of parasitic gaps, this explanation was already demonstrated in the preceding section. It is clear that the same simple explanation will do for the COMP-to-COMP condition of Chomsky (1973), a fact for which complex logistic machinery has also been proposed (see May (1979)). Weak crossover is also optimally simple under an identification approach (see Koster (1983)). A pronoun can only be interpreted as a ("dressed") variable bound by a quantifier, if a quantified NP (in A-position) c-commands that pronoun at S-structure. Thus, in (124) there is no binding relation at all between who and his (as assumed by Koopman and Sportiche (1982)). Binding (in the sense of sharing of referential properties) is by definition a relation between A-positions (see Chomsky (1981b)). Consequently, his can only be bound by who in an A-position. The A-position in question, the trace position in (124c), does not c-command his, so that his cannot be bound by who (see Reinhart (1976) and especially Haïk (1984) for a similar explanation). This is the optimally simple and natural approach to weak crossover, while alternatives based on the notion "variable" only create new and difficult problems (as shown by Haïk (1984)). The fact that only (124a) is grammatical is not surprising, because this is the only case in which the two NPs can be coreferential without c-command (binding). Without binding there can only be accidental coreference, a possibility for referential NPs like names and not for quantifiers (see Lasnik (1976)).
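The c-command condition that carries the weak crossover account can be checked mechanically on small trees. The sketch below is only an illustration: it assumes the common "first branching node" definition of c-command, which the text does not spell out, and the `Node` class, the function `c_commands`, and the simplified bracketings (with the auxiliary omitted) are hypothetical choices made here.

```python
class Node:
    """A node in a simplified constituent tree."""

    def __init__(self, label, *children):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

    def dominates(self, other):
        """True if `other` is properly contained in this node."""
        node = other.parent
        while node is not None:
            if node is self:
                return True
            node = node.parent
        return False


def c_commands(a, b):
    """Assumed definition: neither node dominates the other, and the
    first branching node above `a` dominates `b`."""
    if a is b or a.dominates(b) or b.dominates(a):
        return False
    node = a.parent
    while node is not None and len(node.children) < 2:
        node = node.parent
    return node is not None and node.dominates(b)


# (124c) *Who does his mother like t
his = Node("his")
trace_wco = Node("t")
weak_crossover = Node(
    "S'", Node("who"),
    Node("S", Node("NP", his, Node("mother")),
         Node("VP", Node("like"), trace_wco)))

# (123a) Who t said Mary kissed him
him = Node("him")
trace_subj = Node("t")
subject_question = Node(
    "S'", Node("who"),
    Node("S", trace_subj,
         Node("VP", Node("said"),
              Node("S'", Node("Mary"),
                   Node("VP", Node("kissed"), him)))))

print(c_commands(trace_wco, his))   # trace in object position vs. his
print(c_commands(trace_subj, him))  # trace in subject position vs. him
```

On these trees the object trace in (124c) fails to c-command his, while the subject trace in (123a) does c-command him, which matches the contrast described above.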
It seems to me, then, that the standard cases are equally well or even more simply explained by the variable-free identification approach. Crucial evidence against the standard logistic approach has been around for some time but has not received the full attention it deserves. Curiously, one of the oldest arguments against the logistic interpretation of the output of Wh-movement can be found in Chomsky (1977, 83): "The error of identifying trace itself as the variable within the scope of the wh-quantifier, which is overcome by the much more natural theory just outlined, resulted from concentration on too narrow a class of wh-phrases." In what follows, Chomsky shows that the parallelism of Wh-trace and variable is only seemingly direct in cases like (127a), but not in cases with more inclusive Wh-phrases like (127b) and (c):
(127)
a. I wonder what John saw t
b. Whose book did Mary read t
c. Pictures of whom did Mary see t
The more complex examples (127b) and (c) show beyond reasonable doubt that Wh-phrases do not directly translate as operators. Nor do the traces correspond to variables, as is clear from the paraphrases that Chomsky gives:
(128)
a. For which x, x a person, Mary read [x's book]
b. For which x, x a person, Mary saw [pictures of x]
In these cases, the traces only correspond to phrases containing variables, not to the variables themselves. It is crucial, then, that strong crossover, etc. can also be observed in these cases:
(129)
*[Whose brother]_j did he_j say that Mary liked t_j
The trace can still be defined as a variable, but the notion loses its significance, because as Chomsky (1977) pointed out, the Wh-phrase does not correspond to an operator. Under the identification approach, (129) is ruled out in the same way as before: the trace cannot be bound by he because it is identified by a nonpronominal, the Wh-phrase. If a PP containing a Wh-word is preposed, we have a really crucial example:
(130)
*[With whom_j] did he_j say that Mary talked [PP t]
This is a normal case of strong crossover. It is not possible to construct a reading in which there is a binding relation between he and (the variable corresponding to) whom. In this case, the example is not ruled out by the binding theory because the binding theory says nothing about PP-traces. The identification approach, however, rules out (130) as it does all the other cases. The preposed Wh-phrase is the lexical content which identifies the trace. The preposed Wh-phrase can only be interpreted in relation to the functional position indicated by the trace. In this position, the NP whom is in the domain of he, contrary to what is possible for a Wh-phrase. In the logistic approach, (130) can presumably only be ruled out in the following way. First, the Wh-phrase is reconstructed in its original position. Then, Wh-movement applies again, this time only affecting the alleged operator part of the Wh-phrase. Only then is a variable created. The variable is then perhaps illegitimately in the domain of he, but we can only rule out the structure in question by applying the binding theory at LF. Since the binding theory must also apply at S-structure (see Chomsky (1981b)), this would lead to an undesirable duplication of the operations of the binding theory. In the identification approach, all these complications are unnecessary, because a functional category and its lexical content can share their properties not only isotopically but also non-isotopically. Non-isotopic
property sharing works "as if" the lexical content is literally reconstructed in the functional position. But the whole idea of literal reconstruction is foreign to the identification approach. Reconstruction in any form is only necessary if it is assumed that lexical content and functional position must be represented isotopically for certain purposes. It is this insistence on isotopic property sharing that causes the problems, as in many other cases. In Chomsky (1981b, 185) the problematic Pied Piping facts are mentioned, but not solved. From the present perspective, then, these facts are a serious anomaly for the logistic interpretation of Wh-movement. At the same time, these facts crucially support the alternative theory based on identification. Given the identification theory, it is also clear why some Wh-movements look like the creation of an operator-variable structure. In general, a Wh-phrase is a phrase of some size containing a Wh-word that corresponds to a logical operator. In the smallest possible Wh-phrase, this Wh-word falls together with the NP containing it (as in (127a)). If the Wh-phrase is preposed, it might then look as if an operator has been moved. Appearances can be deceiving, however, as we have seen by considering a broader class of Wh-phrases. The bigger Wh-phrases cannot be translated as variables, but only as the identifying lexical content of categories containing a variable. If we want to have a uniform interpretation of Wh-phrases, the smallest Wh-phrase must also be interpreted as the identifying lexical content of a category, a category with which it falls together in this smallest case. It seems to me that this interpretation is entirely consistent and is the only interpretation that avoids the Pied Piping anomaly. Although I consider even the English Pied Piping facts as decisive evidence against the logistic interpretation, it is useful to add some spectacular evidence from German discussed by Van Riemsdijk (1982) and (1983). 
German can "pied pipe" whole clauses, as first discussed by Ross (1967) (see also Longobardi (1980) and Cinque (1980) for related cases in Italian). Van Riemsdijk (1982) gives examples like the following: (131)
Jetzt hat er sich endlich den Wagen [S' den zu kaufen]_j
now has he (himself) finally the car which to buy
er t_j sich schon lange vorgenommen hatte, leisten
he (himself) already long planned had afford
können
been-able-to
'Now he has finally been able to afford the car which he had planned to buy for a long time'
Van Riemsdijk shows convincingly that the moved Wh-phrase (den zu kaufen) is of type S', and moreover, that the relative pronoun den has been moved internally, i.e. to the COMP of the pied-piped S', which is in COMP itself. Thus, the structure is as follows (his (34)):
(132)
[NP [NP den Wagen]
    [S' [COMP1 [S'_j [COMP2 den_i]
                     [S PRO [NP t_i] [V zu kaufen]]]]
        [S [NP er] [S' t_j] [NP sich] [VP [V vorgenommen hatte]]]]]
According to the logistic approach, Wh-movement in relative clauses is also considered the creation of an operator-variable structure. A structure like (132) dramatically deviates from this picture. The Wh-phrase in COMP1 is not an operator at all, but only the content of the rightmost trace t_j. If anything is an operator, it is den in COMP2. But COMP2 is not the operator COMP that is supposed to introduce English relative clauses. COMP2 introduces the complement of the matrix verb sich vornehmen, which strictly subcategorizes it. But contrary to COMP1, COMP2 is not a likely candidate for an operator COMP. In other words, neither COMP immediately dominates an operator in (132). COMP1 only contains a phrase containing a possible operator (den), and COMP2 is not an operator COMP, so that it does not make sense to interpret the word den that it dominates as an operator. Of course, (132) is fully compatible with the identification approach: COMP1 contains the material that identifies t_j, and COMP2 contains the material that identifies t_i. If it is not possible to give a coherent logistic interpretation to (132), we have another counterexample to the standard approach.

There is also direct evidence from German that a fronted Wh-phrase is not necessarily in an operator position. It appears that in languages with both Wh-movement and overt scope markers for questions, the trajectory from the D-structure position of a Wh-phrase to its point of scope marking can be shared by a series of movements and a series of scope markers. This is particularly clear in German (see De Mey and Marácz (1984) for a similar phenomenon in Hungarian). In German, the distance from an already moved Wh-phrase to its scope position can be bridged by a repetition of the scope marker was in the intermediate COMP positions: (133)
Was glaubst du, was Peter meint, mit wem_j Hans sagt,
what think you what Peter believes with whom Hans says
dass Klaus behauptet, dass Maria t_j gesprochen hat
that Klaus claims that Maria spoken has
According to Van Riemsdijk (1983a, 13), mit wem has the widest possible scope, i.e. over the matrix clause. In that case, scope is not computed from the point where the Wh-phrase has been moved to (in an intermediate COMP), but from the highest occurrence of was. Interestingly, Van Riemsdijk shows that the Wh-phrase can end up in any intermediate COMP, as long as the path to the matrix is filled by a series of was: (134)
a. Was glaubst du, was Peter meint, was Hans sagt, was Klaus behauptet, mit wem Maria gesprochen hat
b. Was glaubst du, was Peter meint, was Hans sagt, mit wem Klaus behauptet, dass Maria gesprochen hat
c. Was glaubst du, was Peter meint, mit wem Hans sagt, dass Klaus behauptet, dass Maria gesprochen hat
d. Was glaubst du, mit wem Peter meint, dass Hans sagt, dass Klaus behauptet, dass Maria gesprochen hat
e. Mit wem glaubst du, dass Peter meint, dass Hans sagt, dass Klaus behauptet, dass Maria gesprochen hat
These sentences all have the same logical form in that the Wh-phrase (or rather the Wh-word it contains) has matrix scope. Only in (134e) does the Wh-phrase also fill the COMP from where its Wh-part has scope over the whole sentence. In other words, (134) forms evidence of the most direct possible kind against the view that Wh-movement creates an operator-variable structure. Scope is only assigned to Wh-phrases by linking them to a scope marker (Q). In English, there is always the requirement that this Q be adjacent to at least one Wh-phrase (in Wh-questions). In other languages, this requirement is lacking if the Wh-phrase can be linked to Q by a series of intermediate scope markers.⁷ Given the fact that (134) is entirely unproblematic for the identification view on Wh-movement, it seems to me that (134) forms compelling counterevidence against the logistic interpretation of Wh-movement: Wh-movement has nothing to do with the creation of an operator-variable structure.

If Wh-movement is not the creation of an operator-variable structure, there is not much point to so-called LF movement, the rule (including QR) that creates LF on the analogy of overt Wh-movement. If overt Wh-movement does not serve any logical purpose, why would covert Wh-movement? In principle, there are two ways to test the hypothesis of LF movement. The first prediction is that it has the properties of "move alpha", in particular that it is constrained by Subjacency. A second possible prediction is that the gaps created by LF movement have the same properties as overt gaps. Both consequences have been proposed. Thus, May (1977) claimed that scope assignment is constrained by Subjacency. This prediction of the hypothesis of LF movement is generally believed to be false now (see Chomsky (1977), and especially Huang (1982); also May (1985)). The fact
that LF movement does not appear to have the properties of movement is, of course, an argument against LF movement. The second prediction, the similarity in the behavior of gaps, is the only one that is currently seriously defended. According to this hypothesis, there is a significant generalization about overt gaps and covert gaps, namely the ECP (see Lasnik and Saito (1984) for a recent discussion). I will try to show in chapter 4 that this second prediction is also false, at least with respect to Wh-elements in situ. Here, I will only add a brief discussion of QR. QR is the rule proposed by May (1977) that adjoins quantified NPs to the Ss containing them, on the analogy of overt Wh-movement. Thus, everyone saw Bill is represented as: (135)
[S [NP everyone_i] [S [NP t_i] [VP [V saw] [NP Bill]]]]
According to this view, everyone has scope over the whole sentence because it c-commands the nodes in the sentence. Scope is, therefore, believed to depend on c-command at LF. Furthermore, the trace in (135) is considered a variable, like the analogous traces of Wh-movement. Note, however, that there is no evidence that the scope of (non-Wh) quantified NPs must be expressed by c-command. Thus, the following sentence is usually supposed to be ambiguous between a reading in which everyone has wide scope and a reading in which someone has wide scope (see May (1977, 1985); also Haïk (1984)): (136)
Everyone loves someone
Everyone c-commands someone, but not the other way around. So, there is
no direct evidence that c-command is a necessary condition for scope. We might simply say that a quantified NP has scope over the minimal S that contains it (this is essentially command in the original sense; see Langacker (1969)). This statement, which seems empirically equivalent to QR + c-command, does not require the creation of an entirely new level by a rule without known properties. Moreover, it avoids the creation of traces that must be interpreted as variables. This would be ad hoc, because as we have just seen, there is no reason to interpret traces as variables at
S-structure. Moreover, since the binding theory applies at S-structure (see Chomsky (1981b)), the new level of LF would not serve any other purpose than the representation of scope. In fact, it would be created for the sole purpose of expressing scope by c-command rather than by S-command. To my knowledge, there is no empirical evidence that makes it plausible that scope must be assigned by c-command rather than by S-command. Binding relations require c-command, as the following example shows: (137)
a. Everyone thinks that he is happy
b. *The father of everyone thinks that he is happy
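The difference between c-command and S-command that is at stake in (137) can be made concrete with a small computational sketch. The Python fragment below is only an illustration under stated assumptions, not part of the analysis itself: the simplified tree shapes, the node labels, and the function names (c_commands, s_commands) are hypothetical, c-command is taken in the first-branching-node sense, and S-command is taken to mean that the minimal S containing the first node also contains the second.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)
class Node:
    """A node in a (simplified, assumed) phrase-structure tree."""
    label: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def __post_init__(self) -> None:
        # Link every child back to this node.
        for child in self.children:
            child.parent = self


def ancestors(node: Node) -> Iterator[Node]:
    """Yield the nodes dominating `node`, from the lowest upwards."""
    while node.parent is not None:
        node = node.parent
        yield node


def dominates(a: Node, b: Node) -> bool:
    return any(anc is a for anc in ancestors(b))


def c_commands(a: Node, b: Node) -> bool:
    """Assumed definition: neither node dominates the other, and the
    first branching node dominating a also dominates b."""
    if a is b or dominates(a, b) or dominates(b, a):
        return False
    for anc in ancestors(a):
        if len(anc.children) > 1:
            return dominates(anc, b)
    return False


def s_commands(a: Node, b: Node) -> bool:
    """Assumed definition: the minimal S dominating a also dominates b."""
    for anc in ancestors(a):
        if anc.label == "S":
            return dominates(anc, b)
    return False


def build(subject: Node):
    """Build '[S subject [VP thinks [S' that [S he is happy]]]]'."""
    he = Node("NP:he")
    embedded = Node("S", [he, Node("VP:is happy")])
    s_bar = Node("S'", [Node("COMP:that"), embedded])
    root = Node("S", [subject, Node("VP", [Node("V:thinks"), s_bar])])
    return root, he


# (137a): everyone is the matrix subject.
everyone_a = Node("NP:everyone")
_, he_a = build(everyone_a)

# (137b): everyone is embedded in the subject 'the father of everyone'.
everyone_b = Node("NP:everyone")
subject_b = Node("NP", [
    Node("Det:the"),
    Node("N'", [Node("N:father"), Node("PP", [Node("P:of"), everyone_b])]),
])
_, he_b = build(subject_b)

print(c_commands(everyone_a, he_a), s_commands(everyone_a, he_a))
print(c_commands(everyone_b, he_b), s_commands(everyone_b, he_b))
```

On these assumed structures, everyone c-commands he only in the configuration of (137a), whereas it S-commands he in both configurations. This is the asymmetry the text relies on: binding tracks c-command, while a scope statement in terms of S-command does not distinguish the two cases.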
Contrary to what we see in (137a), the quantified NP everyone cannot bind the pronoun he because it does not c-command it (see Lasnik (1976), Reinhart (1976), Culicover (1976), and recently, Haïk (1984)). The scope of the quantified NP, however, is not expressed by c-command but by S-command: (138)
a. Everyone thinks that someone loves him
b. The father of everyone thinks that someone loves him
In both cases, the reference of someone varies as a function of the fact that it is in the scope of everyone (see Haïk (1984) for this property of indefinites). Of course, we can convert (138) into structures in which everyone c-commands someone, but the question is: why should we? What is worse for the QR-hypothesis is that QR is not really able to disambiguate (136), given the transfer properties of traces. The original idea was that (136) can be disambiguated by applying QR in two different fashions (May (1977)): (139)
a. [Everyone]_i [someone]_j [t_i loves t_j]
b. [Someone]_j [everyone]_i [t_i loves t_j]
According to the hypothesis in question, (139a) represents the reading in which everyone has wide scope, and (139b) the reading with wide scope for someone. Given the transfer properties of traces, however, this does not follow. Thus, (139b), for instance, can also be interpreted with everyone having wide scope. Everyone c-commands the trace of someone, t_j. By transfer, then, everyone can also c-command someone. This interpretation cannot be avoided if traces keep their normal properties. Compare, for instance, the following S-structure: (140)
[Which pictures of each other_i]_j do they_i really like t_j
They does not c-command each other directly, but each other is dominated by a phrase the trace of which is in the domain of they. By transfer, then, each other may also be interpreted as being in the domain of they. This is the normal property of traces, as we have seen before in many contexts. In
(139) traces can only be deprived of their normal properties by stipulation, which annihilates the explanatory force of the analysis. Movement, especially movement to A'-positions, is in a sense configuration-preserving; i.e. the relations holding at the positions of origin may still hold after movement. This was also the observation that inspired Van Riemsdijk and Williams (1981) to postulate a pre-Wh-movement level, NP-structure, at which major properties are defined: Wh-movement leads to practically no changes in the basic relations. Even in (139), therefore, we still need a disambiguating procedure, for instance the kind of scope indexing introduced by Haïk (1984).⁸ If this conclusion is correct, it does not make sense to create structures like (139), because scope indexing can just as well be done at S-structure, as shown by Haïk. Arbitrary (ultimately stipulative) neglect of the transfer properties of traces undermines many other explanations based on QR. May (1985, ch. 1) observes, for instance, the following contrast: (141)
a. Dulles suspected everyone who Angleton did
b. *Dulles suspected Philby, who Angleton did
The explanation is based on the idea that VP-deletion is only possible if neither the missing verb nor its antecedent c-commands the other. The first example, (141a), involves a quantifier that can be extracted by QR: (142)
[[everyone who Angleton did]2 [Dulles suspected t2]]
In this structure, suspect no longer c-commands the missing verb, so that it can be reconstructed in accordance with the conditions on VP-deletion: (143)
[[everyone who Angleton suspected e2]2 [Dulles suspected t2]]
Since QR does not apply to (141b), which has a name instead of a quantifier, the anti-c-command condition cannot be circumvented in this case. Again, it is not clear what difference QR makes here, because the fronted quantifier phrase in (143) is still indirectly c-commanded by suspect, through the trace. Moreover, (143) is suspect for two other reasons. First of all, it involves Pied Piping, which is a highly idiosyncratic phenomenon that differs from language to language (Ross (1967)). It is likely that the idiosyncratic Pied Piping patterns of languages must be learned on the basis of evidence. But what evidence could there be concerning the nature of Pied Piping with respect to the "invisible" LF movement? Another problem with (143) is that it violates a well-formedness condition on coindexed NPs, Chomsky's i/i-filter (Chomsky (1981b, 212)): (144)
*[γ ... δ ...], where γ and δ bear the same index
In short, QR does not solve what it is supposed to solve in this case, and it creates new problems that could have been avoided without QR. It seems obvious that the contrast in (141) is caused by the fact that the NP to be reconstructed is bound by a quantifier in (141a) and by a name in (141b). We can distinguish quantified NPs from other NPs at S-structure by giving them two different kinds of indices, for instance i for nonquantified NPs and i/i for quantified NPs (see Haïk (1984)). This is sufficient to distinguish (141a) from (141b) under reconstruction at S-structure: (145)
a. Dulles suspected everyone_i/i who Angleton did [suspect e_i/i]
b. Dulles suspected Philby_i, who Angleton did [suspect e_i]
Whatever the explanation, the condition can clearly be stated at S-structure: VP-reconstruction under conditions of c-command is only possible in the scope of a quantifier. Another interesting case discussed by May (1985, ch. 1) is the following: (146)
Every pilot hit some Mig that chased him
As in previous examples, this sentence exhibits a scope ambiguity: either quantifier may be understood as having broader scope than the other. May observes that the construal of the pronoun him varies according to the scope relations: him can only be bound by every pilot if every pilot has broader scope than some Mig that chased him. This, again, would follow from the representation after QR: (147)
a. [Every pilot2 [some Mig that chased him3 [t2 hit t3]]]
b. [Some Mig that chased him3 [every pilot2 [t2 hit t3]]]
The idea is that only in (147a) is the pronoun c-commanded by every pilot, so that it can be construed as a bound variable. In (147b), in contrast, him is in a phrase with broader scope than every pilot. Consequently, him is not c-commanded by every pilot and thus cannot be interpreted as a bound variable. Again, the explanation does not seem to work. Also in (147b) the quantified NP every pilot c-commands the trace of the NP with broader scope, so that him can indirectly be construed as a variable. This indirect construal is, again, quite normal at S-structure: (148)
[Which of his_i sisters]_j does every pilot like most t_j
His can be construed as a bound variable, in spite of the fact that every pilot does not c-command it directly. The transfer properties of the trace are sufficient. Given the fact that the binding theory applies at S-structure, also for
pronouns bound to a quantified NP, and also given the fact that binding is a relation between A-positions, the explanation based on QR (147) is not intelligible. (For an alternative, see the analysis in terms of the Extended Name Constraint in Haïk (1984).) QR does not have the desired effects because it overlooks the transfer properties of traces.

Another problem has to do with the definition of c-command. Consider the structure of (139a), for instance: (149)
[S' [S everyone_i [S someone_j [S t_i loves t_j]]]]
This structure was supposed to solve the scope ambiguity because everyone (with wide scope) does c-command someone (with narrow scope), and not the other way around. This interpretation is inconsistent with the current definition of c-command, as noted by May (1985, ch. 2). According to the definition given by Aoun and Sportiche (1983), c-command is defined in terms of maximal projections: a node c-commands another node if each maximal projection dominating the first node also dominates the second node. Under the assumption that S' is a maximal projection in (149) (and not S), it follows from the Aoun-Sportiche definition of c-command that the quantifiers in (149) c-command one another. In other words, the asymmetry in c-command, necessary for disambiguating the structure, is now lost. It is not easy to preserve the desired properties of QR, if the Aoun-Sportiche definition of c-command is accepted. In an interesting attempt to solve this serious problem, May (1985) proposes the Scope Principle: (150)
Scope Principle
Σ-sequences are arbitrarily interpreted
A Σ-sequence is a class of operators Ψ, such that for any operators O_i, O_j ∈ Ψ, O_i governs O_j. The mutual governance is expressed by mutual c-command, so the two "operators" everyone and someone form a Σ-sequence in (149). The arbitrary interpretation that is allowed in such sequences according to (150) entails, among other things, that either everyone or someone has scope over the other. C-command, in other words, is no longer used as a disambiguating condition in (149). This is a rather radical departure from the assumptions of May (1977). To see how (150) works and interacts with other principles, consider the following examples (from May (1985, ch. 2)): (151)
a. [S' [S every student2 [S some professor3 [S e2 admires e3]]]]
b. [S' [S some professor3 [S every student2 [S e2 admires e3]]]]
In the theory of May (1977), these two representations served to disambiguate the sentence every student admires some professor. In the new
theory of May (1985), only (151b) represents the two readings. The two quantified NPs govern each other, so that the Scope Principle (150) applies: either some professor or every student has broader scope than the other. Interestingly, (151a) is supposed to be ruled out by the ECP at LF. According to this analysis, e2 is not properly governed because the antecedent-governor every student is not adjacent to e2. Contrary to e3, e2 is not lexically governed, so that the sentence is ruled out by the ECP. (Later, in chapter 5, May reformulates this alleged ECP fact in terms of the Path Containment Condition of Pesetsky (1982b).) This analysis is subsequently applied to an interesting ambiguity, observed in the following sentences: (152)
a. What did everyone buy for Max?
b. Who bought everything for Max?
The first sentence, (152a), has two readings. If what has wide scope, a possible answer is (153a), and if everyone has wide scope, a possible answer is (153b): (153)
a. Everyone bought Max a Bösendorfer piano
b. Mary bought Max a tie, Sally a sweater, and Harry a piano
In contrast, (152b) has only one reading, with who having wide scope. An appropriate answer would be: (154)
Mary bought everything for Max
According to May, the contrast between (152a) and (152b) as regards scope possibilities is accounted for by an interaction of QR, the Scope Principle (150), and the ECP. This can be seen from the representations: (155)
a. [S' What2 [S everyone3 [S e3 bought e2 for Max]]]
b. [S' Who3 [S everything2 [S e3 bought e2 for Max]]]
In both cases, the conditions of the Scope Principle are fulfilled, but (155b) is ruled out by the ECP. Consequently, only (155a) is well-formed as the representation of a scope ambiguity. This might seem to have the unwanted consequence that the grammatical (152b) does not have a representation at all. May, therefore, proposes another modification, based on earlier ideas of Williams (1977) and Sag (1976): QR can also adjoin material to VP. This yields the following representation for (152b): (156)
[S' Who3 [S e3 [VP everything2 [VP bought e2 for Max]]]]
The ECP is no longer violated, and relative scope is no longer determined
by the Scope Principle, because the two quantifiers who and everything do not govern each other. If the Scope Principle does not apply, scope is determined in the old way, by c-command: who has wider scope than everything, because who c-commands everything, and not the other way around. I find this explanation unconvincing for a number of reasons. First of all, if QR can adjoin an NP to a VP and to an NP (as May later on assumes), why can QR not adjoin everything to S', yielding the following representation for (152b): (157)
[S' Everything2 [S' who3 [e3 bought e2 for Max]]]
The conditions of the ECP are met, and the Scope Principle also applies (if it did not, everything would still have wide scope because of ccommand). It seems to me that (157) cannot be excluded without arbitrary stipulation. This is a serious objection, because (157) represents the reading that the analysis seeks to exclude. But apart from this analytical problem, there are also empirical reasons to reject the solution. Even in (152b) it is not quite clear whether the less accessible reading (with everything having wide scope) should be excluded. But even if there is a factor that rules this reading out, it is not obviously the structural factor related to the ECP. The problem is that there are structurally analogous cases in which it is perfectly natural for the universal quantifier to have wide scope: (158)
a. Which chemical gives each wine its own flavor?
b. Who knows the right medicine for each patient?
In these examples, the universal quantifiers can have wide scope, as is clear from the possible answers: (159)
a. Tannin gives Bordeaux its flavor, and sulphite all the rest
b. John knows the right medicine for this one, and Mary knows the right medicine for that one, etc.
Since these cases are structurally similar to (152b), we do not expect these readings if the ECP account is correct. Or in other words, the analysis does not seem to explain what it is supposed to explain. There is no clear evidence that everyone must be adjoined to the VP in certain cases. May mentions the following example from Williams (1977): (160)
Max saw everyone before Bill did
This example is ambiguous, according to Williams, depending on adjunction of the quantified phrase to the VP (the collective reading) or the S (the distributed reading):
(161)
a. [Everyone]_j Max [VP saw t_j] before Bill did [VP -]
b. Max [VP everyone_j [VP saw t_j]] before Bill did [VP -]
According to May, the difference in adjunction will correlate with whether the quantified phrase, or just the variable it binds, is "reconstructed" back into the position of the missing VP. He then goes on to say that (162) is not ambiguous: (162)
Who saw everyone before Bill did?
This sentence would only show the "collective" construal, while the "distributed" reading of (161) would be lacking. This would follow from the impossibility of S-adjunction in (162), due to the ECP (government of the subject trace of who would be blocked if everyone were adjoined to S). Again, it seems unlikely that (160) can be disambiguated by the two representations in (161). As before, the transfer properties of traces make these two representations equivalent. Both are non-isotopic representations of the same pair everyone (the lexical content in an A'-position) plus its functional position (the trace in an A-position). If the VP is reconstructed in (161a), the content of the antecedent is reconstructed. This means that not only saw but also the content of the trace is reconstructed. But apart from this, I detect no clear difference between (160) and (162). Both can have the "collective" and the "distributed" reading for everyone. This is perhaps clearer with verbs like kiss, which are somewhat easier to construe with the distributed than with the collective reading:
(163)
Which girl kissed everyone before Sally did?
The distributed reading comes more readily to mind here than the collective reading. Obviously, the availability of the two readings is independent of Wh-movement. If the disambiguation (161) is accepted, then (163) is in fact a counterexample against the ECP account. If we do not accept the disambiguation (161) (and we should not, in my opinion), there is no longer an argument for LF-adjunction of quantified NPs to the VP. The second argument for VP-adjunction involves the following pair: (164)
a. Which of his poems did every poet read?
b. Which of his poems were read by every poet?
In (164a), it is natural to construe his as a variable bound to every poet. In (164b), this construal is considerably less natural, although not entirely impossible, as May notes. This would follow from the analysis given, because only in (164a) can his be in the scope of every poet, in accordance with the Scope Principle.
Again, the analysis does not carry much weight. To begin with, the bound variable construal is not entirely impossible in (164b), as May suggests. If this reading is possible at all, (164b) is a counterexample against the intended analysis, because his would not be in the scope of every poet, which is adjoined to VP. The real point is that every poet indirectly c-commands his at S-structure, as required for binding. Binding from a by-phrase is not entirely impossible, as in a book by John about himself. Similarly, the trace in (164b) is not c-commanded by every poet, so that the bound variable construal depends on the extent to which one accepts binding from a by-phrase. A similar contrast can be found in: (165)
a. Everyone said that he was happy
b. It was said by everyone that he was happy
All in all, I see no argument, not even a subtle one, for adjunction to VP in (164). Another argument for adjunction of quantified NPs to VP, again based on earlier work by Williams (1977) and Sag (1976), is given in May (1985, ch. 3). A sentence like some student admires every professor is ambiguous in isolation, but in the following VP-deletion context, the ambiguity disappears (only a specific construal is available for some student): (166)
Some student admires every professor, but John doesn't
The explanation is, as before, based on the idea that only reconstruction of a VP containing the every-phrase will give rise to a well-formed logical representation. If every professor can be adjoined to the VP and to the S, the following structures are possible: (167)
a. [S Some student2 [S e2 [VP every professor3 [VP admires e3]]]], but John (does not) [VP every professor3 [VP admire e3]]
b. [S Every professor3 [S some student2 [S e2 [VP admires e3]]]], but John (does not) [VP admire e3]
(167b) is considered problematic because of the fact that the second conjunct contains a free variable (the scope of every professor is limited to the first conjunct). The other example, (167a), is not problematic because the reconstructed VP contains a quantifier that binds the variable, thanks to the fact that every professor is adjoined to the VP in the first conjunct. It is this possibility of adjunction either to S or VP, together with the alleged impossibility of (167b), that explains the fact that some student can only have the specific reading (in which some student has wide scope) according to May. This would entail another argument for LF-adjunction to VP. It seems to me, however, that the specific reading of some student in
(166) is not a fact about VP-deletion but a fact about coordination. In noncoordinated structures, the ambiguity is preserved under VP-deletion: (168)
a. Some student admires every professor who John does
b. Some student admired every professor at a time that John didn't yet
In both examples, the first part is identical to the first part of (166), and both examples involve VP-deletion in the same way as (166). In spite of this, both sentences are ambiguous: they also allow the nonspecific construal with some student in the scope of every professor. This is inconsistent with the explanation given for (166): we would expect the same exclusion of the nonspecific reading under the analysis given. If we look at other coordinate structures, however, we see the same pattern as in the coordinate structure (166): (169)
a. Some student and John admire every professor
b. Some student admires every professor and Bill only Quine
Both examples have nothing to do with VP-deletion, but both only allow the specific construal. I conclude from this fact that the specific reading in (166) has nothing to do with VP-deletion, so that it cannot be considered an argument in favor of adjunction to VP. De facto this was the last argument in favor of adjunction to VP in May (1985). The last argument presented is not a real argument, because it shows only that the evidence in question is compatible with the adjunction-to-VP analysis. It does not show that the evidence in question must involve adjunction to VP. The relevant fact is: (170)
Every pilot hit some Mig that chased him
The point of concern is that him can only be taken as a bound variable if the some-phrase containing it has narrower scope than every pilot. This well-known fact is not compatible with May's earlier analysis (QR as adjunction to S) in conjunction with certain assumptions about weak crossover. May shows that these assumptions are compatible with his new conception of QR (adjunction to VP). As said before, this compatibility does not give new support for the analysis. The fact in question is also compatible with a number of other analyses (for instance, in terms of the Extended Name Constraint in Haïk (1984)). This concludes my criticisms of the arguments in favor of QR-as-adjunction-to-VP. I will end with one last argument in favor of LF (and QR), one of the strongest arguments for such a level, according to May (1985, ch. 4). The argument is about the inverse-linking cases, discussed in May (1977):
96 (171)
Domains and Dynasties Somebody from every city despises it
May argues convincingly that it is not a donkey-pronoun (like the it in Every man who owns a donkey beats it), and that Haïk's concept of indirect binding is therefore problematic for these cases (Haïk (1984)). In inverse-linking cases like (171), every city has wide scope. It is thanks to this wide-scope property, expressed by c-command at LF, that it can be interpreted as a bound variable in (171). There are two claims involved: (i) it is a bound variable, bound by every city, (ii) the binding relation is expressed by c-command at LF. This, according to May, is one of the strongest arguments for LF: (171) clearly shows that c-command at S-structure does not work (as assumed by Reinhart (1976), and recently by Haïk (1984)). One could, of course, say that if c-command at S-structure does not work, something else at S-structure might work. It is after all not a priori certain that all prominence relations must be reduced to c-command at some level. But it is not necessary to make this move. The reason is that (171) does not seem to involve binding at all, precisely because every city does not c-command it at S-structure. The examples that May gives do not involve simple quantified NPs like someone and everyone, but noun phrases like each city and each pianist. In the latter class, the quantifier is expressed by a determiner, and the restriction on the quantifier is expressed by the head noun. There is a considerable difference between these two kinds of quantified NPs. Simple quantifiers practically require c-command for bound-variable interpretation (at S-structure). The other quantified NPs behave almost like descriptions or names. Thus, if we replace every city in (171) by everyone, the sentence becomes ungrammatical:
(172) *A brother of everyone hates him
We find a similar contrast among cases of weak crossover. All of the following examples are less than optimal, but (173b) is considerably better than (173c):
(173) a. ?His father hates John
      b. ??His father hates every pianist
      c. *His father hates everyone
We find a similar contrast among the following sentences:
(174) a. The father of John hopes that he will win prizes
      b. The father of every pianist hopes that he will win prizes
      c. *The father of everyone hopes that he will win prizes
In (174c), everyone does not c-command he at S-structure, which makes the sentence ungrammatical. Maybe English is a bad example because of
the deviant behavior of everyone in certain contexts. But in most languages, a simple quantified NP must c-command a bound pronoun at S-structure (see Higginbotham (1980) for Mandarin Chinese, and Koopman and Sportiche (1982) for French, Dutch, and other languages). Quantified NPs like each pianist, however, can even be embedded in a sentential noun complement:
(175) The fact that each pianist plays Mozart does not prove that he likes music
Such examples are also beyond the scope of QR, which is certainly not designed to move quantified NPs out of a complex NP, in spite of the fact that the canonical paraphrase of (175) is something like:
(176) For every x, x a pianist, the fact that x plays Mozart does not prove that x likes music
In short, with pronouns related to NPs like each pianist we find nothing like binding conditions at all, no c-command at S-structure or at LF. Rather, the quantified NPs behave like quasi-names, and anaphora seems to assume the character of free anaphora to some extent (see Lasnik (1976)). There is direct evidence that it is not in a bound position in Somebody from every city despises it. We have examples like:
(177) a. Somebody from every city hates the place
      b. The parents of each pianist want the fellow to be happy
Epithets and the like are never possible in strict binding positions, i.e. if the quantifier c-commands the A-position in question:
(178) *Each pianist thinks that the fellow is happy
The contrast between (178) and (177b) shows that the fellow is not bound at all by each pianist in (177b). The inverse-linking cases, therefore, do not challenge the view that binding is expressed by c-command at S-structure, nor do they form any evidence for a level of Logical Form. This concludes the discussion of a fair sample of arguments for Logical Form. However interesting the idea of LF has always been, I can only conclude that there is no evidence for it (the same conclusion will be drawn from the ECP evidence in chapter 4). In a way, this conclusion is more negative than in the case of D-structure and NP-structure. Nothing real corresponds to LF, there are no known properties of LF movement, and there are no known properties of the level itself (and the same can be said of the ECP, to which I will return). In the case of D-structure, the conclusion was different. In spite of the
fact that not all argument positions could be filled from lexical positions by "move alpha", we could still distinguish a substructure of basic θ-positions, projected from the lexicon. There is also good evidence that this substructure plays a role in grammar. There is nothing real, however, in the case of Logical Form. It is my belief that the inspiration from the quantifier-variable notation of the standard predicate calculus was wrong to begin with. The analogy between Wh-movement and the creation of quantifier-variable structure was very misleading in this respect. It is not the first time that generative grammar has been led astray by images from logic textbooks. Generative Semantics was inspired by the same illusion. In chapter 7, I will make some further remarks to motivate my scepticism about logicism in the study of grammar. Syntax, as I see it, has nothing to do with logic, natural or otherwise. The essence of syntax, the configurational matrix, defines a structure without meaning or inherent purpose. It might be an entirely accidental, epiphenomenal spin-off of our brain structure. In any case, it is in itself not a calculus for some inherent purpose, like the expression of meaning. The idea that syntax is a "natural" calculus to that end is, in my view, an obstacle to the scientific study of grammar.
2.5. Conclusion
Traditionally, the idea of distinct levels of representation has played an important role in generative grammar. According to some, it is even the most important idea in generative grammar. Also in the framework presented here, at least the following levels must be distinguished: lexical structure, surface structure, and most important, S-structure. Furthermore, it seems likely - although beyond the scope of grammar as such - that there are levels of representation in which aspects of syntax and meaning are integrated. Empirically speaking, little is known about these integrated levels. What we find in the current model-theoretic approaches to syntax/semantics cannot be the biologically real model we are looking for. The reason is that as yet syntax is often treated in these approaches as something that is already known, whereas only the dimmest outlines have been glimpsed. Integrated levels can only be constructed if the constituent elements are better known. They are at a different level of abstraction than what has been studied in generative grammar. The idea that distinct levels of representation are the essence of syntax has led to a true proliferation of levels in recent years. Some of these levels are based on elements that have been part of generative grammar since its inception. D-structure is an example. Others, like LF and NP-structure, are relatively new. The conclusion of this chapter is that there is no convincing evidence for D-structure, NP-structure, or LF.
S-structure is, according to Chomsky (1981b, 39), factored into two components: D-structure and "move alpha". The possibility of generating S-structure directly has been envisaged since Chomsky (1973). The standard GB approach to these matters, called "theory Ia" in Chomsky (1981b, 90), assumes that only D-structures are base-generated and that S-structures are derived from them by the rule "move alpha". The alternative defended here, called "theory Ib" by Chomsky, generates S-structures directly, without the rule "move alpha". According to Chomsky (1981b, 90), "[i]t is not easy to find empirical evidence to distinguish between Theories Ia and Ib. It may be that they are to be understood as two realizations of the same more abstract theory, which captures the essential properties of UG at the level of abstraction appropriate for linguistic theory." In particular, theory Ib has never been accepted as the better theory because it is thought that it would still have to distinguish D-structure as a substructure and, instead of "move alpha", interpretive rules with the properties of "move alpha". These rules would have properties distinct from other construal rules, so that there would be no support for theory Ib on empirical grounds (Chomsky (1981b, 92)). I disagree with this interpretation for two reasons. To begin with, we have shown that there are more arguments at S-structure in non-θ-positions than can be "filled" by a rule with the properties of "move alpha". This is particularly clear in topicalization structures and easy-to-please constructions. Furthermore, the discovery of empty resumptive pronouns in parasitic gap constructions and in islands has obliterated the idea that empty categories bound by a Wh-phrase in COMP must be generated by "move alpha". The gaps in question cannot be generated by "move alpha", because the antecedent-gap relation does not have the properties of "move alpha".
Most important of all, in spite of the fact that this issue has existed for almost 15 years, the proponents of theory Ia have not been successful in isolating distinctive properties of "move alpha". In this chapter, I have shown that "move alpha" cannot be functionally defined. It is, just like the other construal rules, an instance of the property-sharing rule. It is filtered by the same uniqueness principle (which accounts for the selectivity of property sharing) and it has the same configurational properties (in the unmarked case). In what follows, I will show that the bounding conditions of "move alpha" can be factored into two components. Unmarked bounding is defined by the Bounding Condition of Koster (1978c). According to this condition, empty categories are bound in their minimal governing category. In the unmarked case, the "minimal governing category" falls together more or less with the notion "maximal projection". Thus, an empty category must be bound in the minimal NP, PP, AP, or S' in which it is governed. I will show in chapter 3 that a subclass of control structures is characterized by exactly the same Bounding Condition. It concerns those control complements for which there is much evidence that they are
transparent (like S'-deletion complements), so that PRO must be governed. In chapter 4, I will show that in some languages, like Dutch, traces are characterized by the strict Bounding Condition just mentioned. Interestingly, gaps in nonstrict bounding contexts have entirely different properties from standard traces. They must be pro according to Cinque (1983b) and Obenauer (1984), and can only be of the category NP. That these gaps only occur in some languages, but not in others, is explained in terms of certain directionality constraints. The most important aspect of the conclusions of chapters 3, 4, and 5 is that there is no difference between the locality principle for governed PRO and the (unmarked) locality principle for "move alpha". In chapter 6, I will show that the unmarked locality principle, the Bounding Condition, also plays a role in reflexivization. In chapter 1, I indicated that all licensing relations are also characterized by the Bounding Condition, including the governance relation itself. In the same chapter, I also indicated that gapping is constrained by the Bounding Condition (for details, see Koster (1978c, ch. 3)). All in all, it appears that there is much evidence that Subjacency is an artefact. There is no evidence that "move alpha" is characterized by a locality principle distinct from what we find in rules of construal. It is not true, in other words, that theory Ib must mimic "move alpha" with a construal rule with the properties of "move alpha". It is rather the case that there is nothing at all that has the properties of "move alpha". If this conclusion is correct, then theories Ia and Ib are not notational variants. Theory Ib must, on the contrary, be the better theory and theory Ia is refuted. NP-structure was in part based on an assumption underlying the idea of D-structure.
Van Riemsdijk and Williams (1981) admitted that there were no compelling arguments for NP-structure (and D-structure) apart from considerations of elegance and naturalness. I will return to these considerations shortly. Empirically, the NP-structure model was not quite successful, particularly because of the existence of the nonstandard gaps, bound long-distance by a Wh-phrase. The content of these gaps cannot be reconstructed by assuming NP-structure and "move alpha". The reason is that "long distance" A'-binding does not have the properties that are attributed to "move alpha". (A deeper reason for scepticism is, of course, that "move alpha" itself is a dubious concept.) Nevertheless, the gaps-bound-at-a-distance often show the "reconstruction" properties of filled gaps at NP-structure. This shows that NP-structure, as a literal, "physical" reconstruction of the content of the gaps, cannot be right. Also in other respects, it was shown that NP-structure could not be defined as the level of certain properties, such as binding or Case-marking. In spite of these disagreements, I think that the NP-structure model is superior to the standard model in one respect, namely in its treatment of
Wh-traces as reconstruction sites (and not as variables). This has led to a certain scepticism about LF, shared by the theory presented here. The essence of the critique of LF is that the Wh-phrase-trace relation has nothing to do with the operator-variable structures of the standard predicate calculus. In this sense, the NP-structure model has led to what in my opinion is decisive evidence against the logistic interpretation of Wh-movement (see Van Riemsdijk (1982, 1983)). If Wh-movement does not create operator-variable structures, the idea that inspired LF and LF movement becomes meaningless. But empirically as well, the idea has not been convincing. In this case, LF movement is not even considered to have the properties normally attributed to "move alpha". And the properties that the level of LF is supposed to have remain in the dark. In chapter 4, I will show that there is no clear evidence for the ECP (all dependent elements must be governed, whether they are lexical or not). Also, attempts to establish the idea that LF is the level of anaphor binding have been very unsuccessful. This latter fact has something to do with the similar scepticism about other levels: even if "move alpha" existed, it would be configuration-preserving in a sense, because of the transfer properties of traces. The fact that traces have transfer properties, like anaphors, might be the clue to an understanding of the fact (if it is a fact) that the derivational approach is not successful in the long run. Since its inception, transformational grammar has been characterized by what we might call "derivational concepts" and "representational concepts". In the beginning, there was a heavy bias towards derivational concepts, such as "movement", and the cycle (see Chomsky (1965), the culmination point of this early development). Already at an early stage, there were also purely representational concepts.
The notion c-command, for instance, goes back to the notion "in construction with", found as early as Klima (1964). The other fundamental notion, locality, has had a more mixed career. A fundamental shift took place when Dougherty (1969), Jackendoff (1972), and others developed representational theories for bound anaphora. This led to a "mixed" perspective in Chomsky (1973), according to which locality principles for anaphors are representational (as in the current binding theory), and locality principles for traces are derivational (Subjacency). Up until recently, this has been the standard perspective (but see Jenkins (1976), Lightfoot (1977), Freidin (1975) and (1978), Koster (1978c), and more recently Cinque (1983b), Rizzi (1983), Sportiche (1983), and others). What has undermined the derivational perspective? Trace theory, ultimately the idea of Structure Preservingness (see Emonds (1970) and (1976)), in my opinion. Structure Preservingness has two aspects, the Emondsian idea that certain landing sites of moved categories are already base-generated, and trace theory. It is, incidentally, the case that other nontransformational approaches, like the ones developed by Brame and Bresnan, were also originally inspired by the work of Emonds. The
underlying idea is quite natural: if transformations only create structures that are already base-generated, what sense does it make to have transformations in the first place? This query was answered in two ways: (i) there must still be a link to the positions in which categories are licensed, and (ii) movement rules have different properties from construal rules. The first motivation for movement, the licensing problem, was entirely undermined by the second aspect of Structure Preservingness, the development of trace theory since Chomsky (1973). Curiously, traces were not accepted by some of the earlier proponents of base-oriented approaches, like Bresnan and Brame. It seems to me that in the traceless base-oriented approaches that they developed, the problem of strictly local licensing was never solved. This perhaps had to do with a misguided anti-abstractness bias, particularly with the idea that a syntactic position is defined exclusively by lexical content. Obviously, however, a syntactic position is defined by lexical content and (or) by context, i.e. by licensing relations. By simply assuming that lexical content must be distinguished from the functional position it fills (as in Aspects, and particularly as in Chomsky (1967)) we get trace theory practically for free (see also Chomsky (1981b, 85ff.)). Trace theory solves the problem of strictly local licensing, even if the lexical content associated with a functional position is "at a distance". The development of trace theory, which was indeed more implicit in earlier variants of transformational grammar than sometimes realized, brought, as we saw, theory Ib to the fore as a potential alternative to the derivational approach. The defense of this theory in Koster (1978c) was, naturally, directed against the only remaining motivation for "move alpha", the idea that movement rules have distinct properties.
From the same perspective, Freidin (1978) showed that another core notion of the derivational approach, the principle of the cycle, does not have an independent status in the theory of grammar (under the assumptions of trace theory). In retrospect, it is clear why it has been thought for such a long time that "move alpha" has distinct properties. Long Wh-movement is an accidental property of English (and some other well-studied languages). At first sight, this looks like a process with properties very different from what we see in the construal rules. This conception has been undermined by two facts. First of all, there are languages with local Wh-movement, but without long Wh-movement. In Dutch, for instance, most Wh-movement is of the strictly local variety. Apart from the exception discussed in chapter 1, long Wh-movement exists only as an iteration of local Wh-movement ("successive cyclic movement"). In other languages with Wh-movement, like certain varieties of German, long Wh-movement does not exist at all. On the other hand, there are languages with "long construal", like the long reflexivization found in Icelandic (see Thráinsson (1976), Maling (1981), Anderson (1983), and Everaert (1986)). A language like Dutch, with mostly strictly local movement and construal, is in a way, then, a case in which the similarity
between "movement" and "construal" is most obvious. It is interesting to see to what extent the idea of Structure Preservingness has undermined the derivational approach with its proliferation of levels. In the early stages of the Extended Standard Theory, the issue of levels was much connected with the question at what level semantic interpretation applied (see for instance Jackendoff (1972)). The basic idea was that certain aspects of interpretation must be stated at deep structure, while others must be stated at surface structure. Given the importance that was originally attached to the question which levels are the levels of semantic interpretation, it is significant that this whole issue has almost disappeared. There are very few linguists that refer to GB theory as "interpretive semantics" (also because of the disappearance of Generative Semantics). The issue has largely lost its significance because of Structure Preservingness, mainly because of trace theory. Chomsky (1975, 117), for instance, indicates that thanks to traces all semantic interpretation can be done at S-structure. It has been insufficiently realized that thanks to the Emondsian aspect of Structure Preservingness (pre-generated landing sites), one could just as well say that all (or most) semantic interpretation can be done at the level of D-structure. One of the important aspects of Van Riemsdijk and Williams (1981) is that they show that Wh-movement (and other forms of movement to A'-positions) has practically no effect on semantic interpretation. It is because of this that they consider the possibility of semantic interpretation before Wh-movement (i.e. at their level of NP-structure). But precisely the landing sites of NP-movement are pre-generated in Emonds's sense. Thus, one of the traditional arguments for semantic interpretation at S-structure is a sentence like the following (see Chomsky (1981b, 43)):
(179) They seem to each other [t to be happy]
The argument has always been that the binding rules for anaphors must apply at surface (or S-) structure, because only at this level is there a suitable antecedent (they) that c-commands the anaphor each other. This argument is far from compelling under current assumptions. Consider the D-structure of (179):
(180) NP_i seem to each other_i [they_i to be happy]
The binding theory says that each other must be A-bound. Clearly, this condition is fulfilled at D-structure: each other is bound by the empty subject NP_i, the future landing site of they. The binding theory, in short, could just as well be applied at D-structure. Of course, full interpretation requires the further information that the antecedent is they. But this, too, can be determined at D-structure. We can stipulate that lexical NPs are only interpreted in relation to a Case position. They in (180) is not in a
Case position, so it cannot be interpreted in situ. But it can be interpreted elsewhere in the pre-generated NP-chain, namely in its only Case position, the matrix subject position in (180). Given the correctness of the view that movement to A'-positions is semantically neutral (at least for bound anaphora), all relevant positions for semantic interpretation are already present at D-structure. It is Structure Preservingness in both senses, then, that ultimately undermines the derivational approach. Everything relevant for semantic representation is present at both D- and S-structure. Similarly, LF movement cannot change anything in principle, given the transfer properties of traces. Semantically, the distinction of three levels, S-structure, D-structure, and LF, is useless. As already mentioned, Van Riemsdijk and Williams (1981) realize that the arguments for NP-structure are not compelling in a theory with traces. Similarly, Chomsky (1981b) mentions several times that the arguments for D-structure are "highly theory-internal" in a theory with traces. It is important to note, then, that there is some consensus that S-structure is by far the best-established level of syntactic representation. A preference for the extra levels of D- and NP-structure is mainly based on considerations of elegance and naturalness. It is not immediately obvious what is meant by elegance and naturalness in this context. It seems to me that the underlying assumption of both Chomsky and Van Riemsdijk and Williams is that the natural relation between a functional category and its lexical content is isotopic, in the sense defined earlier. This is in my opinion the essence of the idea that certain things are most naturally represented at distinct levels. What all these levels appear to do, among other things, is to reconstruct relations of isotopic property sharing.
It is highly significant that Van Riemsdijk and Williams (1981) consider various possibilities of reconstruction only in this sense: it is either reconstruction of isotopic relations before Wh-movement (NP-structure) or reconstruction of isotopic relations after Wh-movement (at LF). Non-isotopic property sharing is somehow considered less basic, and therefore derived. Hence, the insistence on the concept of reconstruction, i.e. the reconstruction of isotopic relations. I will conclude this chapter by showing that there is no fundamental difference between isotopic and non-isotopic property sharing, and that there is therefore no reason to reconstruct anything by the postulation of levels. Ultimately, isotopic property sharing and non-isotopic property sharing are manifestations of exactly the same deeper principles, namely the properties of the configurational matrix. Note first that the preference for isotopic property sharing is not universally applied to all syntactic relations. For construal rules, it is generally assumed that property sharing is non-isotopic and not reducible to isotopic property sharing. Thus, a reflexive pronoun shares its referential index with its antecedent non-isotopically. Isotopic property sharing would mean in this case that the reflexive originates in a position
where the referential index is assigned directly, followed by "movement" to its surface structure position. This is impossible, hence the universal acceptance of what I call non-isotopic property sharing. It has been a fundamental idea in the approach originating with Koster (1978c) that lexical content must essentially be treated like a referential index: it can be shared non-isotopically, by categories in two different positions. At this point, it could be objected that there is still a difference: referential indices must be shared non-isotopically and lexical content can only be shared non-isotopically. Isotopic property sharing for lexical content and the functional category assigned to it could still be basic. I will show now that this apparent difference is deceptive. The fact that isotopic property sharing is not possible in the case of bound anaphora is entirely due to independent factors. This opens the perspective of a conceptually unified theory: all syntactic property sharing is either isotopic or non-isotopic. In fact, I will make a stronger claim: there is no need for the distinction between isotopic and non-isotopic property sharing. These terms were only used for purposes of exposition. Both types of relations are manifestations of the configurational matrix. Ultimately, then, there is only one type of syntactic relation, characterized by one set of properties. Consider the properties of the configurational matrix once again:
(181) a. obligatoriness
      b. uniqueness of the antecedent
      c. prominence of the antecedent
      d. locality
Compare next a representation of isotopic property sharing (182a) and a representation of non-isotopic property sharing (182b):
(182) a. [S' COMP [S [NP John] [VP [V saw] [NP_i what]]]]
      b. [S' [COMP [NP what]_i] [S [NP John] [VP [V saw] [NP_i t]]]]
In both cases, we can distinguish two aspects of the object (of saw): the functional position NP_i (i.e. [NP, VP] with its Case and θ-license) and the associated lexical content [NP what]_i. In (182a), the functional position dominates the lexical content (so that the two fall together). In that case, the properties of the two entities are shared isotopically, as we called it. In (182b), we find exactly the same two ingredients. In this case, the functional category does not dominate the lexical content. Note furthermore that the dominance relation is such that the lexical content also dominates
the functional category in (182b) (i.e. NP_i from the lexicon to which the features of what are assigned). As I mentioned, the isotopic representation (182a) has somehow been considered natural or basic, and the non-isotopic representation derived. I have already indicated in the discussion of LF that (182a) and (182b) are semantically equivalent representations. I will now show that it is meaningless to construe (182a) as the "reconstruction" of (182b). Nothing needs to be reconstructed, because both (182a) and (182b) represent a relation with the same properties, namely the properties of the configurational matrix. For (182b), this is nothing new. Let us therefore consider (182a). The first property, obligatoriness, is fulfilled: the functional position NP_i must be related to the lexical content [NP what]_i, and vice versa. Uniqueness is also a property of the isotopic relation. Thus, it is not possible (in noncoordinated structures) to assign two lexical contents, [NP John]_i and [NP Bill]_i, to one functional position NP_i:
(183) *[NP_i John Bill]
The fourth property, locality, is trivially fulfilled. Both the functional category and its lexical content are in the same domain. Let us now turn to the third property, prominence, which appears to be the crucial property. As discussed in chapter 1, the standard prominence property is c-command, usually defined as follows (see Aoun and Sportiche (1983)):
(184) α c-commands β =df every maximal projection dominating α dominates β, and α does not dominate β
What is surprising, from the present perspective, is the curious stipulation about dominance (the final clause of (184)). This ad hoc addition to an otherwise natural principle has accompanied the definition of c-command since Reinhart (1976). To my knowledge, it has not been noticed that the stipulation in question occurs twice in the current theory of grammar, at least partially. What I have in mind is Chomsky's i/i-filter (Chomsky (1981b, 212)):
(185) *[γ ... δ ... ], where γ and δ bear the same index
As Chomsky shows, this filter is independently motivated. It is also a very natural principle: a category cannot overlap in reference with a proper subpart of it. But it should be clear that the filter duplicates what the stipulation in the definition of c-command purports to do. Consider an example:
(186) [NP_i Det [N' N [PP P [NP_i each other]]]]
According to the stipulation in the definition of c-command, the containing NP_i cannot be the antecedent of the NP_i that it dominates, and the same is prevented by the filter (185). For related reasons, NP_i cannot be the antecedent of itself: the essence of anaphora is that it involves incomplete lexical items: the referential index cannot come from itself, but must be shared with an NP with a content that is complete in the desired sense. Given the independent status of the filter (185), we can drop the stipulation in the definition of c-command, thereby simplifying the concept of prominence. But note now that the stipulation concerned "dominance", i.e. the factor that differentiates isotopic property sharing from non-isotopic property sharing with respect to the properties of the configurational matrix. If we drop the reference to dominance, which is independently motivated, there is no longer a difference between isotopic and non-isotopic property sharing with respect to (181): both modes of property sharing are complete and equal realizations of the properties of the configurational matrix. The idea, then, that isotopic property sharing is more natural or basic has no theoretical foundation. Consequently, it is meaningless to reconstruct somehow a situation of isotopic property sharing, for instance by the postulation of distinct levels of representation. We have another reason, therefore, to be sceptical about a proliferation of levels: they reconstruct something that does not need to be reconstructed. Isotopic and non-isotopic property sharing are manifestations of one and the same relation with one set of defining properties. The difference between the mode of property sharing in anaphora and the position-content relation follows from independent factors. By giving up the idea of reconstruction levels we can simplify the theory of grammar in more ways than one: first of all the levels of NP- and D-structure are eliminated as artefacts.
Furthermore, "move alpha", also an artefact, can be seen as the manifestation of a rule with more general properties, the property-sharing rule with the properties of the configurational matrix. And third, we can unify the definition of property sharing by simplifying the definition of c-command. I will now conclude with the speculation that thanks to the simplification of the notion prominence, important aspects of the (so-called) base rules can also be reduced to the properties of the configurational matrix. By dropping the stipulation about dominance in the definition of c-command, the cluster of configurational properties is extended to vertical relations, i.e. relations in which one of the terms in a relation dominates the other. This is exactly what we want, because it is clear that "property sharing" is also vertical in this sense. Thus, the elements of an X'-projection share certain properties, which is indicated by the notation of X'-theory. There is also a long tradition of relations that extend beyond the boundaries of a projection. This is the "long" property sharing usually referred to as percolation. We might ask ourselves now whether "vertical" relations can be studied from the same perspective. The unified treatment of isotopic and non-isotopic property sharing described above suggests such an extension. It is indeed the case that the vertical relations in tree representations are characterized by the properties of the configurational matrix: all nodes (except the root) are obligatorily and locally dependent on a unique, more prominent node. In a way, then, I agree with Chomsky (1982a, 16) that it is better to reduce the rewriting rules of the base to simple things like X'-theory and "move alpha", than the other way around. In the same vein, I believe that the properties of the configurational matrix characterize the foundation of all syntactic dependency relations, both "horizontal" and "vertical". All the rest is filling it in.
NOTES
1. In fact, I will argue in chapter 5 that these structures involve movement of it. See also Bennis (1986).
2. According to some recent proposals, examples like (46) involve two chains (Chomsky (1986b)). For an argument against this idea, see chapters 4 and 6 below.
3. I am following standard assumptions here. For a different account of nominative Case-marking, see chapter 5.
4. For some qualifications, see chapters 1 and 4.
5. This condition will be further discussed in chapter 6.
6. The fact that operator-variable notation was developed so late in the history of logic may contradict Chomsky's view that this notation is somehow natural to the human mind.
7. Q is the scope marker of Katz and Postal (1964), which was mentioned in chapter 1. Ultimately, I believe that Q can be dispensed with. See the discussion of "vertical locality" in chapter 1.
8. This is one important reason why Haïk's scope indexing is not a notational variant of QR (as has been claimed by Hornstein (1985)).
Chapter 3
Anaphoric and Nonanaphoric Control
3.1. Introduction
According to our theory, most grammatical relations have a common core. Functionally, this common core can be characterized as free property sharing. Formally, this property sharing holds for those relations that are characterized by the configurational matrix that was extensively discussed in the preceding two chapters. The central question of the present chapter is whether control (of the PRO subject of infinitives) is also characterized by the configurational matrix. As usual, the most important issue concerns the locality property of the configurational matrix, the Bounding Condition. In chapters 4 and 5, I will continue the line of my earlier work (Koster (1978c)) by demonstrating that both Wh-movement and NP-movement are - in the unmarked case - characterized by the Bounding Condition, which is essentially a one-node version of Subjacency (interpreted as a condition on representations). In chapter 6, it will be claimed that the Bounding Condition is also the unmarked locality principle for the binding theory. Provided, then, that these conclusions are correct, a considerable degree of unification will be reached if it can be shown that control is also characterized by the Bounding Condition.
According to standard assumptions (Chomsky (1981b)), control is something rather different from movement or bound anaphora. I fully agree with this standard view as long as we consider the total of properties of control vis-à-vis the full set of properties of "movement" or binding. But as soon as we analyze the relations in question into their components, a rather different picture arises. Control, for instance, is often considered a matter of argument structure (whatever that is). I will not take issue with this view, but I will claim that argument structure is not the whole story.
What little we know about argument structure does not suffice to account for the sharp division we find in control structures between optional and obligatory control in the sense of Williams (1980). Both forms of control involve principles of argument structure, but obligatory control is also characterized by the principles of the configurational matrix. The latter situation only arises, as I will show, in transparent complements with governed PRO. Governed PRO behaves like an anaphor in that it is always strictly locally bound. There are, in other words, two kinds of control, namely anaphoric and nonanaphoric control.
If anaphoric control in the sense of this chapter exists, it is a phenomenon of great theoretical significance. It would be the case that there is a well-defined subclass of control structures that has all the properties of the configurational matrix in common with "movement" and bound anaphora. This would not only be a significant step in the direction of a more unified theory, it would also be a vindication of the Thesis of Radical Autonomy, according to which the core properties of grammar are entirely construction-independent.
3.2. Where binding and control meet
It is quite commonly assumed that infinitival complements are sentential in that they have a subject. In accordance with Koster and May (1982) I will assume that the embedded subject is an empty category, so that a sentence like John wants to go has the following structure: (1)
Johnj wants [ej to go]
The empty element ej is usually referred to as PRO. A crucial question is how this element is related to the binding theory. Are its properties totally independent from the configurational matrix, and is it in all cases governed by a separate theory, the theory of control? Or are there certain overlaps and interactions? One of my central claims in Koster (1978c) was that the following two sentences have the same configurational properties: (2)
a. John seems [e to go]
b. John tries [e to go]
Furthermore, I claimed there that the usual distinction between the empty elements (trace in (2a) and PRO in (2b)) is not based on the alleged fact that trace and PRO are two different primitives. I considered the θ-status of the matrix subject with respect to the two different verbs to be totally independent of the configurational properties of the relation between John and e. In a sense, then, a distinction was made in Koster (1978c) between trace and PRO, or at least a clear distinction was made between the antecedent-trace relation and the antecedent-PRO relation (see pp. 32-34). Any listing of differences between these two relations (Chomsky (1982b, 87)) is therefore an inadequate response to my original claim, as is clear from the references just given. What I really had in mind did not concern the relations but the primitive status of trace and PRO. These categories had been claimed to differ intrinsically (in feature content), whereas they had the same primitive status in Koster (1978c). More recently, Chomsky (1981b, 1982a) has also abandoned the position that there are intrinsic differences among empty categories. Thus, until recently at least, there was a growing consensus that the properties of empty categories are contextually determined (but see Chomsky (1986a)).
On the other hand, in Koster (1978c) I tended to underestimate the independent status of the theory of control. I now agree with Chomsky (1981a,b) that there is such a theory, independent from the binding theory. Here too, then, the different positions have become less sharply distinguishable. In spite of these convergences, I would like to claim that the standard analysis of the difference between (2a) and (2b) is not quite correct. I wish to maintain that (2b) is characterized not only by the independent theory of control, but also by the full binding theory. In other words, the binding theory and the theory of control overlap in certain cases (though not in others).
The arguments for an independent theory of control are familiar by now. In terms of the previous discussion we can say that all four properties of the configurational matrix (which are also the properties of anaphor binding) can be violated. Thus, there are control constructions without obligatory antecedents (3a), with split antecedents (3b), with non-c-commanding antecedents (3c), and with nonlocal ("long distance") antecedents (3d): (3)
a. It is impossible [e to help Bill]
b. John proposed to Mary [e to go to the movies]
c. It is difficult for Mary [e to help Bill]
d. John thinks [S it is impossible [S e to shave himself]]
It is true that we never find such deviant properties for the antecedent-trace relation, but from this fact it cannot be concluded that the antecedent-PRO relation is generally lacking the properties of the configurational matrix (which always characterize trace binding). The crucial point is that examples like (3a-d) form a precisely definable subclass of the antecedent-PRO relation. Williams (1980) has made the important observation that there are two distinct clusters of properties associated with PRO. Obligatory control (the control in complements of verbs that do not select for or a gerund) has roughly the properties of the configurational matrix, whereas the deviant properties exemplified in (3) occur only in complements that do select the complementizer for or a gerund. Thus, all the examples (3a-d), and in general all examples usually given to demonstrate the trace-PRO distinction, involve optional PRO:1 (4)
a. It is impossible [for Mary to help Bill]
b. John proposed to Mary [for Bill to go to the movies]
c. It is difficult for Mary [for John to help Bill]
d. John thinks it is impossible [for him to shave himself]
In other words, I have never seen an argument for the deviant properties of control on the basis of obligatory PROs in the complements of verbs like try, begin, condescend, etc. Try, for instance, has an obligatory PRO in its complement and never selects a for-complementizer: (5)
*John tried very hard [for Bill to go]
The relevant fact, now, is that so-called raising predicates (seem, be likely, etc.) are also forms that never select a for-complementizer or a gerund. Traces in embedded subject position (of infinitivals) are, so to speak, just as obligatory as obligatory PROs: (6)
a. *It seems [for Bill to go]
b. *It is likely [for Mary to help Bill]
These are the matrix predicates that select reduced S's (referred to as S'-Deletion in Chomsky (1981b)). Such reduced clauses are transparent for government from the matrix verb, which is also necessary for exceptional Case-marking in the complements of believe-type verbs. Note once again that believe-type verbs do not select for (see also Chomsky and Lasnik (1977)):
(7) *John believes [for Mary to go]
I believe that the phenomenon referred to as S'-deletion is in fact the absence of COMP:2 (8)
a. Full clauses: [S' COMP S]
b. Reduced clauses: [S' S]
Small clauses (in the sense of Chomsky (1981b)) can also be subsumed under (8b): if these structures are clauses at all, they are clauses without a complementizer. It is generally agreed that reduced clauses are transparent for government from the matrix verb. This can now be seen as an automatic consequence of the absence of COMP. Traces in the complements of raising verbs can be governed this way, as required. Let us assume now that government is the crucial factor that determines that empty categories are bound in accordance with the four properties of the configurational matrix. Are there reasons to exclude PRO from this pattern? The opposite appears to be true. If we assume that PRO can be the subject of reduced infinitival clauses, we can explain why these PROs are exactly the PROs that are bound in accordance with the four properties of the configurational matrix (just like traces). Consider, for example, the following structure:
(9) John tries [S' [S e to go]]
If we assume that try selects a reduced clause, an automatic consequence of the fact that try does not select for, we must conclude that tries governs the embedded subject e (as in the case of seem). This is perhaps forbidden by the standard government and binding theory, but that is a disadvantage of that theory, because governed PRO makes the right predictions in cases like (9): absence of COMP always triggers the pattern of the configurational matrix, not only in an antecedent-trace relation but also in an antecedent-PRO relation. In other words, whether the pattern of the configurational matrix is triggered or not does not depend on the intrinsic content of the relation (antecedent-trace or antecedent-PRO) but on construction-independent configurational factors. Once again, it appears that the pattern of the configurational matrix is a radically autonomous pattern.
3.3. Some minimal properties of control
One objection against the theory of governed PRO is the loss of an ECP explanation for the following case:
(10)
*John was tried [e to go]
According to the ECP explanation, this sentence is ruled out because the trace of John (the embedded e) cannot be governed by try, since this is not a verb that triggers S'-deletion. If we assume that try does select a reduced complement, this explanation is lost. This is, however, the point where the independent theory of control comes into play. Since it is this theory that explains (10), let us take a closer look at it. Although some minimal assumptions about control are in order, it is not my purpose here to give a theory of control (see Manzini (1983a)). My main concern is the overlapping properties of anaphor binding and a subclass of control constructions. I will therefore limit myself here to control of infinitival complements to verbs.3
The least that can be said about the theory of control in the sense intended here is that it involves argument structure (see Chomsky (1981b, 77)). Infinitival subjects (of verb complements) are usually controlled by one or more arguments of the matrix predicate. Since these arguments are minimally contained by the next higher clause, control appears to be a rather local process. That there are apparent cases of "long distance" control involves the fact that arguments can remain implicit. Therefore, let me first explain what I understand by implicit arguments. As is well known, the by-phrase of a passive construction is optional. Thus, we find both John was hit and John was hit by someone. Semantically speaking, the agent is still presupposed in the first case. This tacitly present agent remains part of the argument structure, and it is this kind of hidden argument that I will refer to as an implicit argument. Another example is John gave his money, where the indirect object is implicit. I would like to claim now that certain processes - in particular, processes that crucially depend on argument structure - do not always distinguish between explicit and implicit arguments. Control is an example of such a process. Thus, consider a verb like suggest. A person who suggests something has an addressee in mind: (11)
My teacher suggested to me to take another topic
In this case, I am the one who receives suggestions. In an appropriate context, the same content can be expressed by leaving the receiver implicit: (12)
My teacher suggested - to take another topic
The point here is that the (implicit) receiver remains the controller. A further claim is that, apart from some marginal exceptions, so-called long distance control involves an implicit controller in the immediately adjacent matrix clause of the infinitival complement: 4 (13)
It is difficult to take another topic
The understood subject of to take another topic is the same person (or set of persons) for whom it is difficult to take another topic. Difficult is a subjective modal: there is always someone for whom something is difficult. If this hidden argument is explicitly expressed, it must be the controller. Thus, in the following sentence Bill is the controller, and not Mary: (14)
Mary said it was difficult for Bill to take another topic
Thus, long distance control in such cases is possible only if the for-phrase is not made explicit, and only if it can implicitly be interpreted as the long distance controller: (15)
Mary said it was difficult to take another topic
In this case, Mary can be interpreted as the controller because Mary can be interpreted as the one for whom the particular action is difficult. The examples just given reveal another aspect of argument-structure binding - namely, the fact that the argument can be contained in a characteristic PP, in this case a for-phrase. This is why c-command can be violated: the relevant argument qualifies as a controller, no matter how it is structurally expressed. In general, this means that the controlling arguments can be left implicit, or can be couched in a characteristic PP. It is, of course, also possible that the controlling argument c-commands the embedded subject. As we have seen before, this third option is obligatory for those cases of control that are also subject to the binding theory, since the binding theory requires an explicit c-commanding antecedent (cases like try). What all three manifestations of the controlling argument have in common is that they belong to the lexical structure of the predicate in the adjacent matrix clause. This leads to the first property, a locality property, of an important class of control cases: (16)
The controller is an argument of the minimal argument structure containing the control complement.
A second restriction is that the controller is a designated argument (perhaps predictable on the basis of a more inclusive theory of control). Thus, for promise only the subject qualifies as a controller (under certain circumstances), whereas for persuade the object must be chosen. The common assumption here is that this information must be stipulated in the lexical structure of these verbs. There are also verbs for which two arguments are possible (see Chomsky (1968, 48) for such examples): (17)
a. John asked Bill to go
b. John asked Bill to be permitted to go
I suppose that in such cases both the matrix subject and the indirect object are lexically designated controllers, and that further choices are either pragmatically induced or determined by a future, more inclusive theory of control (which, again, is not our primary concern here). In any case, we can modify (16) as follows: (18)
The controller for an embedded subject (PRO) is a designated argument of the minimal argument structure containing the control complement.
These minimal assumptions about control primarily concern argument structures determined by a V (or a copula with an A). Quite similar observations can be made about NPs. Recall, for instance, the interesting examples given by Postal (1969):5 (19)
a. America's attempt to attack Cuba at night
b. the American attempt to attack Cuba at night
In both cases, the controller is the (understood) subject of attempt, i.e. America. Again, we see that in argument-structure processes c-command is not required.6 It does not matter how the relevant argument is structurally expressed. It can be inferred from an adjective, as in (19b); it can also be couched in a characteristic by-phrase, as in (20a), or be left implicit, as in (20b): (20)
a. the attempt by America to attack Cuba at night
b. the attempt to attack Cuba at night
Other familiar examples are: (21)
a. We found plans to kill the Ayatollah
b. We have plans to kill the Ayatollah
In both cases, the controller is an implicit argument of plans (someone's plans, our plans), the nature of which is again determined pragmatically. With this minimal, rather conventional theory of control in mind, we can return to our original problem, the ungrammaticality of (22) (= (10)): (22)
*John was tried [e to go]
The explanation appears to be quite simple. Try is not only a verb that selects reduced clauses (which makes the embedded PRO an anaphor subject to the binding theory); it is also, unlike seem, a verb of control. Since it is a control verb, it must have a designated argument that serves as the controller. For try, this designated element is the underlying subject, which is not explicitly expressed in (22). Why, then, can this argument not be left implicit or be expressed by an agentive by-phrase? It is here that the independent binding theory comes into play. Since the infinitival complement of try in (22) is a reduced clause, its subject e (PRO) is governed and must therefore be bound in its minimal governing category. This means that there must be an obligatory c-commanding antecedent. John in (22) is the only NP that fulfills these conditions. But John is not the controller according to the independent theory of control, so the sentence is ruled out. In other words, (22) is ungrammatical because the combined requirements of the theory of control and the binding theory cannot be met. 7 The explanation of the ungrammaticality of (22) is analogous to the explanation of the ungrammaticality of (23): (23)
*Bill was promised [e to go]
This sentence is ungrammatical because the designated controller, the underlying subject of promise, is absent. Again, the controller cannot be left implicit since promise is also a verb that selects reduced clauses (it does not select a for-complementizer). As before, this leads to government of the PRO-subject of the complement. Governed PRO is an anaphor subject to principle A of the binding theory. In other words, e must be bound by Bill in (23) (the only possible binder according to the binding theory). Again, this NP is not the underlying subject, which is the designated controller of promise. As in the case of (22), the requirements of the binding theory and the theory of control are not compatible, hence the ungrammaticality.8 In conclusion, it appears that the ECP is by no means needed to explain the ungrammaticality of (22). It follows just as well from the independently needed factors that rule out (23). In general, the type of account just given explains what Bresnan (1982, 402) calls "Visser's generalization": "... the observation that verbs whose complements are predicated of their subjects do not passivize". Bresnan gives the following examples (her (84) and (86)): (24)
a. He strikes his friends as pompous
b. The boys made Aunt Mary good little housekeepers
c. Max failed her as a husband
d. The vision struck him as a beautiful revelation
e. Mary promised Frank to leave
(25)
a. *His friends are struck (by him) as pompous
b. *Aunt Mary was made good little housekeepers (by the boys)
c. *She was failed (by Max) as a husband
d. *He was struck (by the vision) as a beautiful revelation
e. *Frank was promised to leave (by Mary)
Examples (24e) and (25e) correspond to (23). The other examples involve small clauses as complements (in the sense of Chomsky (1981b)). Small clauses lack a complementizer and are therefore transparent for government by the matrix verb. This entails that all PRO-subjects of the small clauses are governed anaphors that must be bound by a c-commanding NP. The designated controllers (the underlying subjects) are also c-commanding antecedents for binding in (24), but not in (25). In the latter set of examples, only the derived subjects are possible binders, which are not the designated controllers. In short, our theory of governed PRO in reduced clauses (without a complementizer) explains Visser's generalization where it operates in combination with lexical stipulations concerning the designated controller. These lexical stipulations (which can perhaps in part be reduced to more general principles, like Rosenbaum's Minimal Distance Principle),9 are of the same nature for cases of obligatory control (in the sense of Williams (1980)) and optional control. The difference is that in the case of obligatory control these lexical properties interact with the binding theory. It is this interaction that gives our account explanatory force, since it relates Visser's generalization to a general pattern - namely, the pattern of four properties of the configurational matrix that also characterizes many other core grammar dependencies that have nothing to do with control.
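The interaction between the two theories can be made concrete in a small illustrative sketch in Python. It is not part of the original analysis, only a toy restatement under simplifying assumptions: every name in it (ControlPredicate, selects_for, designated_controller, is_well_formed, and the three realization labels) is hypothetical, and c-command is reduced to a label on the way the designated argument is expressed.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical labels for how an underlying argument is expressed.
C_COMMANDING = "c-commanding"  # explicit NP that c-commands PRO
OBLIQUE = "oblique"            # e.g. inside a by-phrase or for-phrase
IMPLICIT = "implicit"          # not expressed at all


@dataclass
class ControlPredicate:
    name: str
    selects_for: bool           # can the complement have a 'for' complementizer?
    designated_controller: str  # underlying argument that must control PRO


def pro_is_governed(pred: ControlPredicate) -> bool:
    """Reduced clauses (no COMP) are transparent, so their PRO is governed."""
    return not pred.selects_for


def is_well_formed(pred: ControlPredicate, realization: Dict[str, str]) -> bool:
    """Check control theory, plus principle A when PRO is governed."""
    how = realization.get(pred.designated_controller, IMPLICIT)
    if pro_is_governed(pred):
        # Governed PRO is an anaphor: the designated controller must also
        # be an explicit, c-commanding binder.
        return how == C_COMMANDING
    # Ungoverned PRO: control theory alone, so any realization will do.
    return True


TRY = ControlPredicate("try", selects_for=False, designated_controller="subject")
PROMISE = ControlPredicate("promise", selects_for=False, designated_controller="subject")
DIFFICULT = ControlPredicate("difficult", selects_for=True, designated_controller="experiencer")

# John tried to go
active_try = is_well_formed(TRY, {"subject": C_COMMANDING})
# *John was tried to go, cf. (22): the underlying subject is implicit
passive_try = is_well_formed(TRY, {"subject": IMPLICIT})
# *Frank was promised to leave by Mary, cf. (25e): the subject is oblique
passive_promise = is_well_formed(PROMISE, {"subject": OBLIQUE})
# It is difficult to take another topic, cf. (13): implicit controller
implicit_difficult = is_well_formed(DIFFICULT, {"experiencer": IMPLICIT})
```

The design choice worth noting is that the predicate's lexical entry says nothing about passivization. The passive cases fail only because the designated controller is no longer realized as a c-commanding NP while PRO is governed, which is the shape of the argument given above for (22), (23) and (25).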
It is not clear whether the lexical-functional variant of generative grammar explains Visser's generalization. According to this alternative, control from the by-phrases in (25) is excluded by the stipulation that obligatory controllers must be subjects or objects in a nonoblique form (Bresnan (1982, 376)). Lexical stipulations in this framework may refer only to "semantically unrestricted" functions like subject and object in the case of control, and not to oblique functions like by-phrases. This is said to follow from "severe constraints on the lexical encoding of semantically restricted functions". But these "severe constraints" have no explanatory value, since they do nothing beyond stipulating that by-phrases occur in certain lexical frames but not in others. Bresnan mentions another empirical generalization, Bach's generalization, which also follows from our account: where the object of a verb is an obligatory controller, "intransitivization" is impossible (Bresnan (1982, 418)). Bresnan gives the following examples (her (122) and (123)):
(26)
a. Louise taught Tom to smoke
b. *Louise taught to smoke
(27)
a. Louise signaled Tom to follow her
b. Louise signaled to follow her
In (27b) the object can be omitted, but not in (26b). According to our previous account of such cases, controllers can be omitted (or given in oblique form) only if the complement is a full clause with a complementizer (for). This prediction is confirmed by the following examples from Bresnan (1982) (her (124) and (125)):
(28)
a. *Louise taught Tom for him to smoke
b. Louise signaled Tom for him to follow her
All in all, it seems that our hypothesis of governed PRO has a real explanatory advantage over the standard theory (which does not allow governed PRO) and the lexical-functional approach. In our theory, absence of the complementizer for (in D-structure) makes PRO accessible for the governing verb of the matrix clause. This makes PRO an anaphor according to the standard binding theory (principle A), so that it is predicted that the controller cannot be left implicit (as in (26)) or expressed with a by-phrase (as in (25)). The reason is that the binding theory requires an explicit, c-commanding antecedent. Our hypothesis therefore explains the generalizations made by Visser and Bach. The standard theory, on the other hand, has no obvious explanation for the fact that the impossibility of the for-complementizer is necessarily correlated with an explicit nonoblique antecedent. This salient fact remains entirely accidental. The same can be said about the lexical-functional approach, because, as we have seen, the impossibility of implicit or oblique controllers in certain cases is entirely a matter of stipulation in this framework. The facts of English, therefore, give strong support to the claim that PRO must be governed under certain circumstances. Even stronger support for this claim appears to come from the complement system of Dutch.
3.4. Infinitival complements in Dutch
There is an intriguing difference between English and Dutch concerning certain facts discussed by Williams (1980). Williams points out that the following type of construction is generally impossible with verbs that do not select a for-complementizer: (29)
*It was tried [e to see Bill]
Such constructions do occur, however, with verbs that select for. Williams makes the strong claim that such constructions are possible only with verbs that select for:10 (30)
It was arranged [e to see Bill]
It is easy to see that (the possibility of) such contrasts follows from the assumptions made so far. In (29) the complement is transparent because it is never introduced by for. PRO (e) is therefore governed in (29), which entails that it must be bound as an anaphor in its minimal governing category. The only available antecedent is the matrix subject it, which is not the designated controller for the control verb try. According to control theory, the underlying subject is the only possible controller, which conflicts with the requirements of the binding theory. In (30), however, the verb arrange selects a for-complementizer (which must be deleted if it is not followed by a lexical subject; see Chomsky and Lasnik (1977)). This makes the infinitival complement opaque for government from the matrix verb arrange. Consequently, e (the embedded PRO) does not have to be bound according to the binding principles. The only theory that applies in this case is the theory of control. Contrary to the binding theory, this theory allows an implicit argument as controller; hence the possibility of (30). This type of explanation receives remarkable support from certain facts in Dutch. In this language, the equivalent of try, the verb proberen (which has exactly the same meaning), differs from try only in that it can select an optional complementizer om. Interestingly, this produces a grammatical equivalent of the ungrammatical English example (29): (31)
Er werd geprobeerd [(om) e Bill te bezoeken]
there was tried COMP Bill to visit
In Dutch, an SOV language, these complements introduced by complementizers occur only to the right of (the underlying position of) the verb. The complement of a verb like proberen can also occur to the left of the verb, in which case the embedded verb is obligatorily adjoined to the matrix verb (Verb Raising in the sense of Evers (1975)). I will return to the Dutch complement system in more detail, but at this point it is only relevant to mention an exceptionless fact: infinitival complements on the left-hand side of the matrix verb (which undergo Verb Raising) never have a complementizer. This renders these complements transparent in many respects, as I will demonstrate. What is crucial at this point is that our hypothesis predicts that PRO in these complements is also accessible for government from the matrix verb, so that this PRO must be bound in accordance with principle A of the binding theory. In other words, it predicts that we will never find examples like (31) for transparent complements. This is indeed the case. If the complement of proberen is to its left, it patterns like English try: (32)
*Er werd [e Bill ti] geprobeerd te bezoekeni
there was PRO Bill tried to visit
These are representative examples. Extraposed complements with the possibility of the om-complementizer (like (31)) pattern like English examples with jor-complements, whereas Verb Raising complements, which never have a complementizer, pattern like English verbs that never have a jOI'-complementizer. What is so striking about the Dutch verb pl'obel'en is that we see the two distinct patterns with the same verb. I take this as strong evidence for the thesis of governed PRO. 11 The general pattern becomes even more perspicuous if we consider the Dutch complement system in more detail. The Dutch complement system differs from the English system in two respects. First, like English, Dutch has infinitives with and without the morpheme te (English to). Dutch, however, has a much wider use for infinitives without teo Roughly, Dutch has te-less infinitives not only where English has to-less infinitives (as in John saw Bill go), but also where English has gerunds. As I will show, te-Iess infinitives form a Dutch counterpart of the English gerund, which explains why the distribution of these infinitives deviates from that of other infinitival complements. The most important difference between Dutch and English, however, involves the underlying SOV structure of Dutch. The exact nature of the difference will become clear as we proceed, but for the moment it suffices to consider the major facts. In contrast with English, Dutch has infinitival complements on both sides of the matrix verb. Certain complements occur only in extraposed position, to the right of the matrix verb. I will refer to these complements as extra posed complements. Other complements occur only to the left of the matrix verb, in which case, as we saw in connection with (32), the verb of the complement is obligatorily adjoined to the
matrix verb. This process is called Verb Raising (VR), and I will refer to the complements in question as VR-complements. It is very interesting from the point of view of our present theoretical concerns to see which complements occur only in extraposed position, and which complements occur only as VR-complements, or as both VR-complements and extraposed complements. The entire pattern is rather complex and has been unravelled by Evers (1975). As I will show, several of the classical problems can be solved within the framework of the theory of government and binding. As described by Evers (1975), the properties of the Dutch infinitival system can be summarized as follows:12
(33)
a. Only extraposed complements can be introduced by COMP.
b. Raising (to subject position) occurs only from VR-complements.
c. Control is possible with both extraposed and VR-complements.
d. Infinitives without te occur only as VR-complements.
e. Exceptional Case-marking occurs only with VR-complements.
f. Obligatory control (in the sense of Williams (1980)) is a property of VR-complements.
g. Only VR-complements show certain transparencies (Verb Raising, R-movement, adverbial scope).
There is a beautiful pattern in these complex data, but this only becomes clear if the data themselves are crystal clear. Therefore, let me summarize the data in yet another way:

(34)
[Tree diagram: a VP in which the matrix V separates the two complement positions, with VR-complements to the left of V and extraposed complements (S', consisting of COMP and S) to its right]

                          VR-complements   Extraposed complements
a. COMP                         -                    +
b. Raising                      +                    -
c. Control                      +                    +
d. Without te                   +                    -
e. Exceptional Case             +                    -
f. Obligatory control           +                    -
g. Transparency                 +                    -

Before going on, it is useful to point out a preliminary generalization on
the basis of this summary. The analysis of English complementation in section 2 showed that there are reasons to correlate the absence of a complementizer with transparency phenomena such as government into embedded clauses, raising (and obligatory control), and exceptional Case-marking in believe-type verbs. My theory differs from the standard theory in considering obligatory control to fall in the same natural class as the other transparency phenomena (usually referred to as S'-deletion phenomena). What is so interesting about (34) is that in Dutch, transparent and opaque infinitival complements are "physically" separated by the matrix verb. All transparency phenomena are found in VR-complements, which typically lack COMP (see (34a)). What really strongly confirms our theory of obligatory control is that it patterns with raising and other transparency phenomena (cf. (34f) and (34b)), and not with control in general (cf. (34f) and (34c)).

It seems to me that much of (34) can be explained by the standard assumptions of government and binding theory such as the ECP (traces must be properly governed), together with two additional, well-motivated assumptions. The first rather crucial additional assumption is the claim made by several linguists that government of argument positions is directional.13 There must be a parameter that determines that languages are either SVO or SOV. A simple formulation of this parameter is the idea that government (of argument positions) is directional. In SVO languages the verb governs to the right, and in SOV languages it governs to the left. This simple assumption accounts for the fact that (35a) is grammatical in an SOV language like Dutch, whereas (35b) is not: (35)
a. Ik denk dat hij Mary zag
   I think that he Mary saw
b. *Ik denk dat hij zag Mary
   I think that he saw Mary
In (35a) Mary is governed by the verb zag, because the verb governs to the left. (35b) is ungrammatical because Mary is to the right of the verb, so that it remains ungoverned. In an SVO language like English, the opposite pattern holds. A second additional assumption is that te-less infinitives are θ-marked (and perhaps Case-marked), which accounts for the NP-like distribution of these clausal complements. I will return to this matter in the more general discussion of te-less infinitives.

Let us now turn to some illustrations of the facts listed under (34) and the principles that explain them. First, the possibility of COMP. We have seen that a verb like proberen 'try' can select a complementizer om if its infinitival complement is in extraposed position. The same holds for many other control verbs: they select om-complements only if the complements are extraposed but never if
they are to the left and undergo Verb Raising. The following facts illustrate this: (36)
a. Ik denk dat zij probeerde (om) het boek te lezen
   I think that she tried COMP the book to read
   'I think that she tried to read the book'
b. Ik denk dat zij (*om) het boek probeerde te lezen
   I think that she (COMP) the book tried to read
   'I think that she tried to read the book'
(37)
a. Ik denk dat hij weigerde (om) Mary te kussen
   I think that he refused COMP Mary to kiss
   'I think that he refused to kiss Mary'
b. Ik denk dat hij (*om) Mary weigerde te kussen
   I think that he (COMP) Mary refused to kiss
   'I think that he refused to kiss Mary'
It is sometimes claimed that the lack of a complementizer in VR-complements shows that these complements are in fact VPs. But this argument has no force. First of all, such a VP-analysis would entail that one verb (such as proberen, weigeren) has two clause-like complements instead of one: a VP-complement to the left of the matrix verb and an S'-complement to the right. This can hardly be considered an elegant conclusion. More important, even VP-analysts would have to stipulate that complements with complementizers do not occur to the left of the verb. The point is that tensed complements, which always have a complementizer in Dutch, cannot occur to the left of the verb either:
(38)
a. Ik denk dat hij zei dat hij zou komen
   I think that he said that he would come
   'I think that he said that he would come'
b. *Ik denk dat hij dat hij zou komen zei
   I think that he that he would come said
On the basis of these facts, which have nothing to do with infinitives, we can conclude that there is a general ban in Dutch against complementizers to the left of the verb. One would hope to find an explanation for this fact, but whatever rules out (38b) seems sufficient to rule out the ungrammatical variants of (36b) and (37b) as well.14 Moreover, there is nothing inherent to VP-analyses that requires VP-complements to be generated to the left of the matrix verb. One would have to stipulate this fact, which reduces the explanatory advantage of VP-analyses to zero.

This becomes even clearer if we consider the next case, raising complements. Dutch has many raising complements with te and without te preceding the infinitive. Both types of complements occur only to the left of the matrix verb, as VR-complements. For te-less complements, this also follows from an independent factor to which I will return shortly. Crucial
cases are therefore raising complements with te, like the complement of schijnen 'seem': (39)
a. Ik denk dat zij het boek schijnt te lezen
   I think that she the book seems to read
   'I think that she seems to read the book'
b. *Ik denk dat zij schijnt het boek te lezen
   I think that she seems the book to read
These facts follow from the standard assumption of the theory of government and binding that traces must be properly governed, in conjunction with the well-motivated assumption that government is directional for arguments (leftward in Dutch). This becomes clear if we consider the underlying structures: (40)
a. Ik denk dat zij [t het boek te lezen] schijnt (= (39a))
b. *Ik denk dat zij schijnt [t het boek te lezen] (= (39b))
(40a) is the structure underlying (39a) (before raising of the verb lezen). The trace t is properly governed (as required by the ECP) since the verb schijnen governs leftward. It is for this reason that (40b) (underlying (39b)) is ungrammatical: the trace is to the right of schijnen, where it cannot be governed. It is here that raising crucially differs from control. As we saw in (36) and (37), both proberen and weigeren select a te-complement that can occur on both sides of the matrix verb. This is allowed by the theory, as the structures underlying (36) demonstrate: (41)
a. Ik denk dat zij probeerde [(om) e het boek te lezen]
   I think that she tried COMP the book to read
b. Ik denk dat zij [e het boek te lezen] probeerde
   I think that she the book to read tried
The contrast between (41) (two grammatical sentences) and (39) (one grammatical sentence) provides crucial evidence in favor of the theory of government and binding and the assumptions made here. In (39) (see (40)), the antecedent zij and its trace t form a chain, and empty categories in chains must be governed. (41) involves control, which means that the antecedent zij and the embedded PRO (e) are in two distinct chains. This follows from the θ-criterion.15 Unlike traces, PROs are not necessarily governed. This is why (41a) is accepted: e is to the right of proberen, so that it is not governed. This has no consequences here, contrary to what we see with schijnen ((39b), (40b)). I have demonstrated above that PRO can be governed in transparent complements (like (41b)), but nowhere
have I claimed that PRO must be governed. Only traces must be governed. It should be noted here that, as in Koster (1978c) and Chomsky (1981b, ch. 6), I am not making a distinction between trace and PRO as primitives. There is only one type of empty category, its status being determined by the relations into which it enters. In a chain, with an antecedent belonging to the same chain, an empty category must be governed; this requirement does not exist when an empty category has an antecedent in a different chain.

In any case, no other theory that I know of explains the ungrammaticality of (39b) and the grammaticality of (41a). As in the lexical-functional framework, the theory advocated here makes it possible to group obligatory control together with raising (cf. the notion "functional control" in Bresnan (1982)). The overlap in binding properties follows, according to the present theory, from the possibility of governed PRO under certain circumstances. Thus, both the lexical-functional approach and the present approach differ from the standard approach in that they classify obligatory control with raising. In contrast with the lexical-functional approach, however, the present approach is like the standard approach in that it assumes empty categories, (directional) government, and the ECP. This has led to an explanation for the fact that raising sometimes differs from control (cf. (39b)). On the other hand, it is not clear how the ungrammaticality of (39b) follows from anything in the lexical-functional approach. One can of course stipulate that raising complements are of a specific type. But nothing in a lexically oriented theory forbids these complements of a specific type from occurring to the right of the matrix verb. Again, requiring that schijnen 'seem' select VPs that occur only to the left of the matrix verb would be mere stipulation.
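The contrast between raising and control complements with te can be restated as a small sketch, here in Python. It is only an illustration under simplifying assumptions: the names LEFT, RIGHT, Complement, is_governed and is_licensed are hypothetical labels introduced for this sketch, the verb is assumed to govern only one side (leftward in Dutch), and the only condition checked is that a trace, unlike PRO, must be governed.

```python
# Illustrative sketch only: all names here are hypothetical labels,
# not part of the notation of government and binding theory.
from dataclasses import dataclass

LEFT, RIGHT = "left", "right"


@dataclass(frozen=True)
class Complement:
    empty_subject: str  # "trace" (raising) or "PRO" (control)
    side: str           # side of the matrix verb on which the complement occurs


def is_governed(side: str, direction: str) -> bool:
    """Directional government: the verb governs only in one direction."""
    return side == direction


def is_licensed(c: Complement, direction: str = LEFT) -> bool:
    """A trace must be governed (ECP); PRO may be governed but need not be."""
    if c.empty_subject == "trace":
        return is_governed(c.side, direction)
    return True


# Dutch is SOV, so the default direction of government is leftward.
examples = {
    "(40a) schijnen, complement to the left": Complement("trace", LEFT),
    "(40b) schijnen, complement to the right": Complement("trace", RIGHT),
    "(41a) proberen, complement to the right": Complement("PRO", RIGHT),
    "(41b) proberen, complement to the left": Complement("PRO", LEFT),
}

for label, comp in examples.items():
    print(label, "->", "ok" if is_licensed(comp) else "*")
```

Under these assumptions only the (40b) configuration is excluded, which matches the judgments reported for (39) and (41). The sketch deliberately ignores te-less infinitives and Case-marking, which require an additional assumption about the government of the complement itself.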
The government and binding approach, on the other hand, explains the fact that raising complements in Dutch never occur in extraposed position. In this approach, this crucial fact follows from the ECP and the independently motivated assumption of directional government (see the references of note 13). If my assumptions are correct, then, the present approach has the advantage over both the standard and the lexical-functional approaches that it explains the properties of obligatory control (the properties of the configurational matrix). Besides, it has the extra advantage over the lexical-functional approach that it explains the contrast between raising and control with respect to the distribution of infinitival complements in Dutch. Dutch has an abundance of infinitival complements without te 'to'. This class of complements deserves special attention, because it involves some distributional peculiarities that confirm the idea of directional government. Infinitives without te come in three varieties: raising verbs (42), control verbs (43), and verbs with "exceptional Case-marking" (44):
(42)
a. Ik denk dat Peter zal vertrekken
   I think that Peter will disappear
b. Ik denk dat Mary moet blijven
   I think that Mary must stay
(43)
a. Ik denk dat Peter boeken leerde lezen
   I think that Peter books learned read
   'I think that Peter learned to read books'
b. Ik denk dat Mary een auto wilde kopen
   I think that Mary a car wanted buy
   'I think that Mary wanted to buy a car'
(44)
a. Ik denk dat Mary hem hoorde zingen
   I think that Mary him heard sing
   'I think that Mary heard him sing'
b. Ik denk dat Peter haar liet komen
   I think that Peter her let come
   'I think that Peter let her come'
The first class (42) includes the Dutch auxiliaries. There is no reason, however, to assume that there is a special category Aux in Dutch. All so-called auxiliary verbs are rather regular verbs and lack the defective paradigms and deviant distribution of the English auxiliaries. What the auxiliaries have in common is that they do not assign a θ-role to the subject. In other words, they are raising verbs (like seem). This means that a sentence like (42b) is derived as follows:16 (45)
a. Ik denk dat [s NP [s Mary blijven] moet]
   (NP Raising)
b. Ik denk dat [s Mary [s t blijven] moet]
   (Verb Raising)
c. Ik denk dat [s Mary [s t ] moet blijven]
All Dutch auxiliaries can be treated as normal verbs (Vs) that select a clausal complement. In this way, we can maintain the natural generalization that there is a one-to-one correspondence between subjects and verbs. Besides, it is by far the simplest solution because both NP Raising and Verb Raising are needed anyway. A solution that postulates a special category Aux for these verbs, or VP-complements, adds something to the grammar that is entirely superfluous. A control case like (43a) has the following underlying structure: (46)
Ik denk dat [s Peter_i [s e_i boeken lezen] leerde]
After Verb Raising we derive the structure underlying (43a): (47)
Ik denk dat [s Peter_i [s e_i boeken - ] leerde lezen]
In this case, the subject of the embedded clause (e_i) is a PRO controlled by Peter. A sentence like (44a) is derived from the following underlying
structure: (48)
Ik denk dat [s Mary [s hem zingen] hoorde]
After Verb Raising its structure is as follows: (49)
Ik denk dat [s Mary [s hem - ] hoorde zingen]
I will return to these cases of "exceptional Case-marking" later. What all te-less complements in (42)-(44) have in common is that they occur only as VR-complements. They never occur in extraposed position.17 (50)
*Ik denk dat Peter zal [s t het boek lezen]
I think that Peter will the book read
The only grammatical variant is (51): (51)
Ik denk dat Peter [s t het boek - ] zal lezen
I think that Peter the book will read
'I think that Peter will read the book'
The cases of control (43) and exceptional Case-marking (44) also lack a variant with an extraposed complement (Evers (1975)): (52)
a. *Ik denk dat [s Peter_i leerde [s e_i boeken lezen]]
b. *Ik denk dat [s Mary hoorde [s hem zingen]]
The raising case (50) is ungrammatical because of directional government. As with schijnen (cf. (39b)), the trace in the complement is not properly governed, because the matrix verb zal does not govern to the right. The same principle explains the ungrammaticality of (52b). Here we need exceptional Case-marking from the verb hoorde. But exceptional Case-marking is like regular Case-marking in that it only works under government. Since the verb hoorde in (52b) does not govern to the right, hem is not governed by this verb, so that it cannot receive Case from it.

The only new problem is (52a), the case of control. The ungrammaticality of this example does not follow from directional government in the same way, because here the embedded subject e_i is a PRO that need not be governed. Why, then, is (52a) ungrammatical? The answer again involves directional government, this time not with respect to the embedded subject, but with respect to the containing clause. Complements that have infinitives with te are not necessarily governed by a verb to their right, so that these complements occur on both sides of the matrix verb (cf. (36)-(37)). Te-less infinitives, however, differ in interesting ways. They behave like English gerunds in that they have NP-like
distribution. If te-less infinitives have the same distribution as NPs, the ungrammaticality of (52a,b) is what we expect: like NPs, the te-less complements do not occur to the right of the verb. This is not unlike English gerunds, which behave like NPs (Emonds (1972)) and do not undergo extraposition (Rosenbaum (1967, 45)). There is good evidence that te-less infinitives behave like NPs in other contexts. They are the only type of complement that can occur in subject position: (53)
Ik denk dat [boeken lezen] noodzakelijk is
I think that books read necessary is
'I think that reading books is necessary'
Note that the te-less infinitive in the Dutch example is naturally translated into English by using a gerund. Other types of complements (tensed clauses and infinitives with te) are impossible in subject position: (54)
a. *Ik denk dat [dat hij komt] noodzakelijk is
   I think that that he comes necessary is
b. *Ik denk dat [boeken te lezen] noodzakelijk is
   I think that books to read necessary is
Not only in embedded clauses, but also in root sentences in which the finite verb precedes the subject (such as questions; cf. English subject-aux inversion), te-less infinitives are the only possible type of complement: (55)
a. Is [boeken lezen] noodzakelijk?
   is books read necessary
   'Is reading books necessary?'
b. *Is [dat hij komt] noodzakelijk?
   is that he comes necessary
c. *Is [boeken te lezen] noodzakelijk?
   is books to read necessary
In an earlier paper (Koster (1978a)), I concluded on the basis of data like (54) and (55b,c) that "subject sentences don't exist". Full clauses appear only in peripheral positions: in extraposed positions and in topicalized positions, but never in typical NP-positions such as subject positions. Emonds (1972), on which Koster (1978a) was based, noted that gerunds in English are different from full clauses in that they do occur in typical NP-positions such as subject positions. From (53) and (55a) it is clear that te-less infinitives in Dutch have a status similar to that of gerunds in English.

The NP-like status of te-less infinitives is confirmed by several other facts. One important example will be given in the next section: nouns (Ns) have neither NP-complements nor te-less infinitival complements. Another relevant fact is that te-less infinitives (with the exception of the
exceptional Case-marking cases, to which I will return) are the only type of VR-complement that easily undergoes topicalization: (56)
a. [Boeken lezen] wil hij niet
   books read want he not
   'Reading books, he does not want'
b. [Piano spelen] leert zij nooit
   piano play learns she never
   'Playing the piano, she will never learn'
In this respect, te-less infinitives behave like NPs and not like some other clause types, which are often difficult to topicalize: (57)
a. *[Boeken te lezen] probeerde hij nooit
   books to read tried he never
b. *[De hond te vangen] heeft hij geweigerd
   the dog to catch has he refused
In short, there is plenty of evidence that te-less infinitives have an NP-like distribution, just like English gerunds. The question, then, is how to account for this fact. On the basis of their NP-like distribution, Emonds (1972) concluded that gerunds are in fact NPs. This would account for their distribution. However, the clause-like gerunds (and the equally clause-like te-less infinitives) are not NP-like in any other respects. They do not have a noun as their head, and they are in fact like regular clauses in most respects.

In principle, the theory of government and binding allows another solution. As mentioned earlier in connection with directional government, NPs owe their distribution to their governors. The nature of the governor determines where arguments can be governed and where they cannot. Furthermore, it is generally assumed that θ-marking and Case-marking depend on government. The latter two processes are concomitant features of the assignment of argument status to a category. Suppose now that, in general, clauses do not occur in θ-positions, but that there is an exception to this rule, namely, gerunds (and te-less infinitives). These two clause types would then have argument status, without being NPs. Their similarity to NPs in distributional character would follow from something that they share with NPs, i.e. θ-role assignment and Case-marking. Since these processes only occur under government, we can explain why the clauses in question occur in NP-positions without being NPs.

Technically, we can consider Case assignment to be a relation between a Case assigner and a head. Thus, normally a Case assigner affects the head N of the NP to which Case is assigned. Similarly, we can assume that in the case of te-less infinitives, the matrix verb affects the head V of the infinitival complement. Schematically, then, we would have the following situation:
(58)
... [s ... V ... ] ... V ...
[In the original diagram, an arrow labeled Case links the matrix V to the head V of the complement.]

If a V receives Case in this way, it must assume a particular morphological shape. In English, Case-marked Vs are realized as gerunds, in Dutch as te-less infinitives. Considering Case assignment as a relation between a head and a Case assigner that is external to its maximal projection has the advantage that the fact that Ns do not assign Case can be related to a general property of Ns, as I will show in the next section.

In conclusion, it seems to me that by incorporating the notion of directional government, the theory of government and binding explains two classical problems of Dutch syntax. First, it explains the fact that raising complements with te occur only to the left of the matrix verb (as VR-complements), whereas control complements with te occur on both sides of the matrix verb. Second, it explains the fact that te-less infinitives (involving raising or control) do not occur at all in extraposed position.

Let us now turn to exceptional Case-marking in Dutch. This phenomenon is found only with verbs of perception (horen 'hear', zien 'see', etc.) and causatives like laten 'let' (see Evers (1975, 4)):
(59)
a. Ik denk dat ik [haar het boek - ] zag lezen
   I think that I her the book saw read
   'I think that I saw her read the book'
b. Ik denk dat ik [Peter de auto - ] laat wassen
   I think that I Peter the car let wash
   'I think that I'll let Peter wash the car'
There are reasons to assume that these are cases of exceptional Case-marking, because there is no normal thematic relation between the matrix verb and the objective forms (haar in (59a) and Peter in (59b)) that depend on it. This was shown by De Geest (1972, 170), who gave examples like (60): (60)
Ik zag geloof overal ontbreken
I saw faith everywhere lack
'I saw faith lacking everywhere'
A sentence like (60) does not entail Ik zag geloof 'I saw faith'. Therefore, this might be a form of exceptional Case-marking. On the other hand, there is a well-known heresy, raising to object position, which cannot be excluded out of hand. It is true that most of Postal's (1974) arguments are unconvincing (see Bresnan (1976)), but I am not entirely convinced by the arguments for exceptional Case-marking either. One persistent problem is that complements of verbs like believe (in their exceptional Case-marking analysis) fail to pass constituency tests.
Bresnan (1982), for instance, mentions Postal's Right Node Raising argument: (61)
*Mary believes, but Catherine doesn't believe, Peter to be fat
The part that has undergone Right Node Raising (Peter to be fat) fails the test for constituency. This argument is not decisive for two reasons. First of all, it is not clear whether Right Node Raising is a valid test for constituency. In English, and definitely in Dutch, there are cases of Right Node Raising of nonconstituents.18 But even if it were a valid test, it would not be decisive since the tests in question do give sufficient, but not necessary, conditions for constituency. In spite of these objections, it must be said that a sentence like (61) constitutes a problem for a theory that incorporates the idea of exceptional Case-marking.

A better case can be made on the basis of examples like (59) in Dutch. Dutch is a so-called verb-second language, which means that in root sentences there is always one constituent preceding the finite verb.19 This provides us with an excellent test for constituency: only constituents can precede the finite verb. We have already seen that te-less infinitives can be preposed in Dutch (cf. (56)). Alleged exceptional Case-marking in Dutch always involves te-less infinitives (cf. (59)). It is a remarkable fact, then, that preposing the complements in question yields highly ungrammatical sentences: (62)
a. *[Haar boeken lezen] zag ik zelden
   her books read saw I rarely
b. *[Peter de auto wassen] liet ik nooit
   Peter the car wash let I never
These facts strongly suggest that the preposed complements (between brackets) do not form constituents (unless one can find an independent explanation for the ungrammaticality). The alternative seems to give an advantage here because the objective forms haar and Peter can be left behind (in fact must be left behind): (63)
a. [Boeken lezen] zag ik haar zelden
   books read saw I her rarely
b. [De auto wassen] liet ik Peter nooit
   the car wash let I Peter never
These facts provide compelling evidence for the constituency of the preposed complements.20 Presumably, these complements are clauses, just like in the other cases of preposed te-less infinitival complements (cf. (56)). Under the alternative analysis, it could be maintained that the preposed
constituents in (63) are VPs, the subjects of their clauses being left behind. This would be a highly problematic conclusion, however, since there is no independent evidence in Dutch for VP-preposing.21 In short, examples like (62) and (63) give some evidence for the following underlying structure for an example like (59a):
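(64)
[Tree diagram: the structure underlying (59a). The root S dominates an NP and a VP; the lower part of the tree contains the NP het boek 'the book' and the V lezen 'read'. The remaining labels of the diagram are not recoverable.]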
All in all, it seems to me that the debate "exceptional Case-marking versus raising to object position" is still open. Fortunately, the outcome of this debate is irrelevant for the issues that concern us here. What really matters in this context is that both analyses require the embedded clause to be transparent. Chomsky (1981b) assumes S'-deletion both for exceptional Case-marking and for raising. In both cases the embedded subject position in (64) must be governed by the matrix verb.

Thus far, we have used the notion of directional government and the essential transparency of VR-complements for the following facts: (65)
a. Passivization of VR-control verbs is impossible (32).
b. Raising complements are always VR-complements.
c. Te-less infinitives cannot be extraposed.
d. Exceptional Case-marking (or raising to object position) is possible in VR-complements.
In the remainder of this section, I will show that there are several other transparency phenomena that distinguish VR-complements from extraposed complements. Crucially, I will show that several of these transparency phenomena can be observed in control complements as long as they are to the left of the matrix verb. Often the same control verbs can also have their complements in extraposed position, in which case no transparency can be found.
First, Verb Raising itself requires transparency. All complements to the left of the matrix verb undergo Verb Raising. It is not known why this is necessary, but it is no doubt possible thanks to the fact that the clause that loses its verb is transparent because it lacks a complementizer. Extraposed complements never undergo Verb Raising.22

Second, there are certain transparency phenomena concerning reflexivization in Dutch that can only be observed in VR-complements (Koster (1985) and chapter 6 below). Consider the following contrast: (66)
a. Ik denk dat Peter [s Mary naar zich toe zag komen]
   I think that Peter Mary to himself prt. saw come
   'I think that Peter saw Mary come toward himself'
b. *Ik denk dat Peter Mary dwong [s' om naar zich toe te komen]
   I think that Peter Mary forced COMP to himself prt. to come
   'I think that Peter forced Mary to come to himself'
(66a) involves a VR-complement; (66b) has an extraposed complement, introduced by the complementizer om. In (66a) the reflexive zich can be bound by Peter across the specified subject Mary. This is not possible in (66b), where Mary controls the subject of the complement. There is considerable independent evidence in Dutch that zich cannot be bound across the boundaries of a full clause (see Koster (1985) and chapter 6 below for details). The minimal S in (66a) is not a full clause, so that zich can have an antecedent external to it. In (66b), however, the minimal clause is an extraposed complement (which can be introduced by a complementizer, as indicated). Extraposed complements are full clauses, which are absolute boundaries for the binding of zich. Thus, there is a clear difference between VR-complements and extraposed complements with respect to binding possibilities. This difference is explained by assuming that the VR-complement is not a full clause (as indicated by the impossibility of a complementizer). The third kind of transparency phenomenon yields a clear difference between VR-complements and extraposed complements of the same (control) verbs. It has been known since Evers (1975) that clitics in VR-complements can be moved across subjects: (67)
Ik denk dat hij het Peter t hoorde zingen
I think that he it Peter heard sing
'I think that he heard Peter sing it'
This is a striking fact because normally het cannot be moved across a subject. If the raising-to-object analysis is correct for such cases, het in (67) has been moved from its object position in the complement across the
raised constituent (Peter) in the matrix clause. This kind of "clitic climbing" is possible only from VR-complements. It is never possible to move het out of an extraposed complement: (68)
a. Ik denk dat Peter probeerde (om) het aan Mary te geven
   I think that Peter tried COMP it to Mary to give
   'I think that Peter tried to give it to Mary'
b. *Ik denk dat Peter het probeerde (om) aan Mary te geven
   I think that Peter it tried COMP to Mary to give
Other striking examples, which directly involve control complements, have to do with R-movement, extensively discussed by Van Riemsdijk (1978). In the following example, the particle er 'there' has been moved from its original position (69a) to a position in front of the subject (69b): (69)
a. Ik denk dat iemand er over schrijft
   I think that someone there about writes
   'I think that someone writes about it'
b. Ik denk dat er iemand t over schrijft
   I think that there someone about writes
It appears that VR-complements are no barrier for this kind of movement. Er can be moved from the complement across the matrix subject: (70)
a. Ik denk dat iemand [s er over] probeerde te schrijven
   I think that someone there about tried to write
   'I think that someone tried to write about it'
b. Ik denk dat er iemand [s t over] probeerde te schrijven
   I think that there someone about tried to write
This example constitutes direct and very strong evidence for the transparency of control complements as opposed to the opaqueness of extraposed complements, because er cannot be moved out of the latter: (71)
a. Ik denk dat iemand probeerde [S' om er over te schrijven]
   I think that someone tried COMP there about to write
   'I think that someone tried to write about it'
b. *Ik denk dat er iemand probeerde [S' om t over te schrijven]
   I think that there someone tried COMP about to write
A last, equally compelling argument involves adverbial scope. It seems reasonable to assume that sentence adverbials like waarschijnlijk 'probably' have as their scope the minimal full clause containing them: (72)
Mary says that John is probably crazy
Probably has scope over the embedded clause, and not over the matrix clause. Again, we find a striking contrast between the VR-complements of control verbs and their extraposed complements: (73)
Ik denk dat Peter [S het boek waarschijnlijk] probeerde te lezen
I think that Peter the book probably tried to read
'I think that Peter probably tried to read the book'
It is very likely that the adverbial waarschijnlijk is contained by the complement: it is probably internal to its VP because it has passed the object het boek. It is therefore striking that the scope of the adverbial is not the complement but the next higher clause. This follows from the assumption that the scope of such sentence adverbials is the minimal full clause containing them, together with the assumption that the VR-complement is not a full clause (it always lacks a complementizer). Limiting the scope to the complement does not even give a possible reading, and again we see that extraposed complements are opaque:
(74)
*Ik denk dat Peter probeerde [S' om waarschijnlijk het boek te lezen]
I think that Peter tried COMP probably the book to read
'I think that Peter tried to probably read the book'
If we try to limit the scope of waarschijnlijk to the complement, we get an impossible reading. The crucial point in this case is that the complement is opaque, so that we do not derive the reading where the adverbial has scope over the next clause up. In conclusion, it seems to me that all VR-complements show transparency phenomena. Transparency is expressed in Chomsky (1981b) by the device of S'-deletion. As we have seen, this phenomenon is correlated in English with the impossibility of the complementizer for. It is striking that VR-complements, which all show transparency phenomena in Dutch, can never have a complementizer. What is most crucial is that there is direct evidence that control complements also show these correlated phenomena:
absence of a complementizer plus transparency. But since control complements (in their VR-position) are so obviously transparent in Dutch, there is no reason to block government of PRO by the matrix verb. Since controlled VR-complements fall under the same generalization as other complements that show "S'-deletion"-like behavior, the conclusion seems inescapable: PRO must be governed under certain circumstances. Of course, this is not a frightening conclusion. It is, on the contrary, the only plausible explanation for the fact that PROs of a well-defined class behave like anaphors. In the next section, I will show that the nature of infinitival complements to nouns confirms the conclusion of this section. Ns are not proper governors and therefore select only opaque infinitival complements, both in English and in Dutch.
3.5. Asymmetries between N and V
There is considerable evidence that nouns and verbs do not have the same governance properties. Richard Kayne has investigated these differences and expressed them by stipulating that Vs are structural governors and that Ns are just governors (1981, 1983). Normally a governor governs the categories that it subcategorizes, but a structural governor can also govern elements from other projections under certain conditions. Schematically, this difference is as follows: (75)
a. ... N ... [Xⁿ ... Y ... ] ...   (no government of Y by N)
b. ... V ... [Xⁿ ... Y ... ] ...   (government of Y by V)
A V may govern across a maximal projection boundary Xⁿ, but an N never does. Although this difference has many interesting consequences, I will not explore them here, limiting myself instead to the consequences for infinitives. Nevertheless, it is important to bear in mind that there is considerable independent evidence for the distinction. Let me, therefore, give one example that has nothing to do with the infinitival complements that concern us here. Ross (1967) proposed a condition generally known as the Complex NP Constraint. This generalization entails that elements can never be extracted from the complement of an N. There are no such restrictions on the complements of Vs. In other words, there is no such thing as a Complex VP Constraint. This N-V asymmetry follows from the distinction shown in (75), in conjunction with the requirement that traces must be properly
governed. Assuming that Wh-phrases are extracted through COMP, we can express the distinction as follows: (76)
a. ... N [S' [COMP t] ... ]
b. ... V [S' [COMP t] ... ]
In (76a) the trace in COMP is not accessible for the potential governor N, because this N, not being a structural governor, cannot pass the maximal projection boundary S'. In (76b) there appears to be no problem. V is a structural governor, so that it can govern the trace in COMP. Extraction from an S'-complement of an N would, therefore, lead to an ECP violation, whereas extraction from a V-complement leaves the trace properly governed. In other words, the difference in governance properties of N and V, together with the ECP, explains why Vs can be bridges, whereas Ns are generally not. With this much evidence in mind, we can now turn to infinitival complements. First, we can explain why gerunds do not occur as the complements of Ns, and more generally, why Ns do not assign Case. Recall that we have decided to consider Case assignment to be a relation between a governor and a head. If this is the correct view, we can immediately explain why Vs assign Case, whereas Ns do not: (77)
a. [NP1 N1 [NP2 ... N2 ... ]]   (no Case assigned by N1 to N2)
b. [VP V [NP ... N ... ]]   (Case assigned by V to N)
In (77a) N2 cannot receive Case from N1 because N1 cannot govern N2 across the maximal projection NP2. Since government is a necessary condition for Case assignment, N1 cannot assign Case to N2. Again, a similar problem does not arise in (77b) because V, as a structural governor, can cross the maximal projection boundary NP. I will assume, however, that Vs can govern categories in other projections only if these categories are not governed by another governor,
such as INFL or COMP. Thus, if a verb selects a for-complement, this complement is "protected" from government by the matrix verb. This accounts for the fact that for-complements are opaque, whereas complements without a (D-structure) complementizer appear to be transparent for government from the matrix verb. The consequences of this difference have been extensively illustrated in the previous sections. In Dutch, the category V is also a structural governor, but in this language we must take into account the effects of directional government. VR-complements never have a complementizer, are therefore transparent, and are open to government from the matrix verb. Extraposed complements often have a complementizer, but even without a complementizer they are not accessible to the governing powers of the matrix V.23 In this way, we have been able to explain a whole array of remarkable differences between VR-complements and extraposed complements. Given the fact, then, that Ns do not govern into another projection, we can make a very strong prediction. Our theory predicts that infinitival complements to Ns are never transparent, but always opaque. This prediction is borne out in all details. In order to see this, we must briefly go through the list of properties (34). First, N-complements can select complementizers. In English the complementizer for can be selected; in Dutch the complementizer om is often preferred: (78)
a. de poging om Nicaragua aan te vallen
   the attempt COMP Nicaragua prt. to attack
   'the attempt to attack Nicaragua'
b. het verlangen om rijk te worden
   the desire COMP rich to get
   'the desire to get rich'
N-complements differ in this respect from VR-complements, which never select a complementizer. Second, raising is impossible in NPs, as was already shown in Chomsky (1970) (see also Williams (1982)): (79)
a. John appears [t to come]
b. *John's appearance [t to come]
Exactly the same holds for Dutch. There is no raising in NPs: (80)
a. John schijnt ziek te zijn
   John seems sick to be
   'John seems to be sick'
b. *John's schijn ziek te zijn
   John's appearance sick to be
As in the previous cases, this follows from the ECP and the fact that Ns do not govern across clause boundaries. As a consequence, the trace t is governed in (79a), whereas it remains ungoverned in (79b). Third, control is of course possible in NPs. Peter is the controller in the following example: (81)
Peter's attempt PRO to go home
Control in general does not distinguish opaque complements from transparent ones (cf. (34c)), so we will leave this property here. Fourth, te-less infinitives provide interesting confirmation for our analysis. In the previous section, they were analyzed as NP-like complements, which had to undergo Case-marking. Naturally, this can only happen under government. But Ns do not assign Case, as is well known. Under the assumption made before - that Case-marking involves government into an NP - we predict that Ns do not select NPs, gerunds, or te-less infinitives. This is indeed the case: (82)
a. Zij wil vertrekken
   she wants to leave
b. *haar wil vertrekken
   her will to leave
This generalization has no exceptions, a remarkable fact never considered before to my knowledge. Fifth, exceptional Case-marking occurs only in the domain of a V, and never in the domain of an N: (83)
a. Mary believes him to come
b. *Mary's belief him to come
Again, this fact is easily explained by the requirement that Case is only assigned under government. A related fact was noted by Williams (1982): passivization is not possible from the complement of belief. Compare: (84)
a. The book was believed t to be stolen
b. *the book's belief t to be stolen
The explanation is as before: only verbs govern into complements. Sixth, and most crucial, obligatory control (in the sense discussed before) does not occur systematically in NPs. Usually, it is not necessary to have an explicit, c-commanding antecedent: (85)
a. Bill's attempts to leave the country
b. the attempts (by Bill) to leave the country
c. the attempts for Bill to leave the country
The most interesting cases involve a contrast between a verb and a closely related noun. Refuse seems to be a verb of obligatory control: (86)
a. Bill refused to go
b. *I refused for Bill to go
c. *It was refused (by me) to go
The corresponding noun refusal has the two possibilities that refuse lacks: (87)
a. Bill's refusal to go
b. Bill's refusal for Mary to go
c. the refusal to go
The same can be said about Dutch. To my knowledge, there are no systematic examples of obligatory control. It is almost always possible to replace the subject-controller of an NP by an article. If this generalization is correct, it provides important evidence for the involvement of government (of PRO) in obligatory control. Without the assumption of governed PRO, there is no reason to expect that obligatory control patterns with raising and other transparency phenomena. As for the remaining transparency phenomena, N-complements are opaque, insofar as they can be tested in NPs. There is clearly no counterpart of Verb Raising in NPs: there is no rule that adjoins the V of the complement to the N of the NP. Clitics cannot be moved out of N-complements, but this does not tell us very much because there is no clitic position in NPs that can serve as a possible landing site. Sentence adverbials usually cannot occur in N-complements, which pattern like extraposed complements in this respect: (88)
*zijn poging om waarschijnlijk te vertrekken
his attempt probably to leave
It appears, therefore, that N-complements are opaque and behave like extraposed complements - and not like VR-complements - in Dutch. The relevant facts can be summarized by repeating (34) with the properties of N-complements added: (89)
Properties of infinitival complements in Dutch

                          VR    Extraposed    N
a. COMP                   -     +             +
b. Raising                +     -             -
c. Control                +     +             +
d. Without te             +     -             -
e. Exceptional Case       +     -             -
f. Obligatory control     +     -             -
g. Transparency           +     -             -
It is, of course, very remarkable that extraposed complements and N-complements have exactly the same properties, which in most cases sharply contrast with the properties of VR-complements. The explanation of the contrast rests, as we have seen, on the fact that VR-complements lack a complementizer (see (89a)), which makes them transparent with respect to government. Extraposed complements are either opaque because they have a complementizer, or in principle transparent, but not affected by government from the matrix V, since this V governs only in the other direction. N-complements are not affected by government either, since Ns do not govern into complements in general. It is the notion of government, therefore, that ultimately explains the very complex distribution of facts summarized in (89).
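The reasoning of this summary can be restated as a small Python sketch. It is only an illustration, not part of the analysis itself: the names TABLE_89, matrix_head_governs_into, and COMPLEMENT_TYPES are hypothetical, the "+" and "-" values follow the pattern described in the text, and the assumption that a Dutch V governs only a complement that precedes it is taken from note 23.

```python
# Table (89) as data: True stands for "+", False for "-".
PROPERTIES = [
    "COMP", "Raising", "Control", "Without te",
    "Exceptional Case", "Obligatory control", "Transparency",
]
TABLE_89 = {
    "VR":         [False, True,  True, True,  True,  True,  True],
    "Extraposed": [True,  False, True, False, False, False, False],
    "N":          [True,  False, True, False, False, False, False],
}


def matrix_head_governs_into(head, has_comp, complement_precedes_head):
    """Hypothetical helper: can the matrix head govern into its complement?

    Simplified assumptions, following the text:
    - N never governs into another projection;
    - a complementizer "protects" the complement from outside government;
    - a Dutch V governs only a complement to its left (note 23).
    """
    if head == "N":
        return False
    if has_comp:
        return False
    return complement_precedes_head


# Illustrative settings for the three complement types (assumed values).
COMPLEMENT_TYPES = {
    "VR":         dict(head="V", has_comp=False, complement_precedes_head=True),
    "Extraposed": dict(head="V", has_comp=True,  complement_precedes_head=False),
    "N":          dict(head="N", has_comp=True,  complement_precedes_head=False),
}


def predicted_transparency(kind):
    """Transparency is predicted exactly when the matrix head governs into the complement."""
    return matrix_head_governs_into(**COMPLEMENT_TYPES[kind])


if __name__ == "__main__":
    row = PROPERTIES.index("Transparency")
    for kind, column in TABLE_89.items():
        print(kind, "listed:", column[row], "predicted:", predicted_transparency(kind))
```

In this sketch an extraposed complement without a complementizer still comes out as not governed, because it follows the matrix V; that is the second of the two options mentioned for extraposed complements.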
3.6. Conclusion
In chapter 1, I specified a very general set of properties, the configurational matrix, that I claimed to be the essence of all core grammar dependencies. According to the Thesis of Radical Autonomy, there are no sets of such major properties that are specific to a certain type of construction (e.g. movement, anaphor binding, predication, control). Control has been a challenge from this point of view, because it is usually claimed that control has little in common with movement or even anaphor binding. According to the standard theory of government and binding, this deviant status of control is due, among other things, to the alleged fact that PRO is not governed. The purpose of this chapter has been to show that the standard view is false in this respect. The reason appears to be that the standard view ignores the fact that there are two kinds of PRO, as was stressed by Williams (1980) and, in a different framework, by Bresnan (1982). This chapter started from the assumption that what Williams calls "obligatory control" (and Bresnan "functional control") has properties that are remarkably similar to the properties of anaphor binding, and ultimately to the properties of movement and predication as well. This can only mean that obligatory PROs (in the intended sense) are in fact anaphors. Since PRO is an empty category, it can only be an anaphor if it is governed (principle A of Chomsky's binding theory). It was furthermore shown that PRO can only be governed (i.e. be anaphoric) if it is the subject of a transparent complement. In English and Dutch, at least, transparency follows from the absence of a complementizer in underlying structure. 24 Although it might seem somewhat strange and unorthodox to postulate the possibility of governed PRO, the consequences appear to overwhelmingly favor this assumption. First of all, it leads to an explanation of the correlation between the lack of a complementizer and obligatory control. Furthermore, it provides - in conjunction with some common assumptions about control - an explanation for some generalizations made by Visser and Bach. These generalizations are not explained by the standard theory, nor by the lexical-functional alternative, in spite of claims to the contrary. Nevertheless, the theory presented in this chapter can be seen as a slight modification of the standard theory. Much of this theory was adopted here, especially the S-analysis of infinitival complements plus the various possibilities of government in general, and of empty categories in particular. The advantages of this approach are particularly striking with respect to the complement system of Dutch. In this case, it is not even clear how governed PRO can be avoided (in VR-complements), and, on the positive side, a very intricate set of facts falls into place. To the extent that the analyses in this chapter can be maintained, the Thesis of Radical Autonomy is vindicated because the assumption of governed PRO entails that there is a class of control cases that shows the strict properties of anaphor binding. My theory differs in this respect from the standard theory, which still has the - in my opinion undesirable - tendency toward construction-specific properties. In a sense, the same can be held against the alternatives proposed by Bresnan and Williams. In the lexical-functional approach, there is no obvious way to explain the similarities between a "functionally" described phenomenon such as obligatory control and a purely nonfunctional, configurational phenomenon such as Wh-movement. In conclusion, Williams's (1980) approach was one of the sources of inspiration for this chapter. In this framework, much is reduced to the properties of predication; or at least obligatory control is reduced to predication.
The Thesis of Radical Autonomy suggests that this generalization might be somewhat misleading. The properties of predication themselves stand in need of clarification. What particularly calls for explanation is the fact that the configurational properties of the predication relation largely overlap with the properties of anaphor binding, movement, and many other dependencies. What is needed, in other words, is a deeper configurational principle that explains the similarities among all local dependencies. It is my hope that the properties of the configurational matrix provide the basis for such a deeper principle.
NOTES
1. For the sake of convenience, I will refer to all PROs in for-infinitival complements as optional PROs, in spite of the fact that such PROs are sometimes obligatory, as in John knows what PRO to do. The obligatoriness of PRO (and absence of for) in such cases follows from independent factors, as has been familiar since Chomsky and Lasnik (1977).
2. Apart from overt complementizers, I also assume the existence of null complementizers. This assumption seems necessary, because the relevant distinctions can also be found in languages without overt complementizers for infinitives.
3. For the sometimes rather different properties of other control structures, see Van Haaften (1982). See also Manzini (1983a).
4. For the old idea of an implicit controller, see Koster (1978b, 583) and Roeper (1983), among others.
5. Note that the infinitives in these examples are complements to Ns, which never contain obligatory PRO in the sense intended here (see note 1).
6. In fact, the scope of the c-command property is unclear and not well explored. Reflexivization, for instance, does not always involve c-command (see Koster (1985)). Jackendoff (1972) gives examples like a book by John about himself; see also E. Kiss (1981). McCloskey (1984) gives an interesting argument for raising into PPs in modern Irish. If McCloskey is right, even movement does not always involve a c-commanding antecedent.
7. Some verbs are more complex than try. The verb say, for instance, has two possibilities: John said PRO to go, with PRO ("arbitrarily") controlled by the x to whom John addresses himself, and John was said t to go, which does not involve control at all, but raising.
8. Guglielmo Cinque has reminded me of the well-known fact that passivization in the complement of promise may lead to corresponding passivization of the matrix verb: Bill was promised PRO to be allowed to go.
9. See Rosenbaum (1967). See also Koster (1978c, ch. 3) for discussion.
10. It is not possible at the moment to give necessary and sufficient conditions for such structures. As Williams (1980) points out, not all verbs with for-complements permit this construction (want, for instance, is an exception). But if these constructions are possible at all, the complement is usually a for-complement.
11. Hans den Besten and Riny Huybregts have pointed out to me that (32) is not necessarily a crucial example because, according to them, passivization is generally impossible with Verb Raising complements. I disagree, because in my speech the following example (which involves Verb Raising and passivization) is perfectly grammatical:
(i) dat hij de voorzitter geacht werd te zijn ...
    that he the president considered was to be
    'that he was considered to be the president ...'
Thus, if I am right, the account given here explains the incompatibility of Verb Raising and passivization for control verbs. There are also noncontrol verbs, such as verbs of perception, that seem to show the incompatibility of Verb Raising and passivization, but this is probably an independent matter that has nothing to do with Verb Raising. I base this conclusion on the fact that passivization is also impossible in the corresponding English examples, which certainly do not involve Verb Raising:
(ii) We saw Bill go
(iii) *Bill was seen t go (by us)
12. Henk van Riemsdijk has pointed out to me that there is an exception to (33b):
(i) dat Bill werd geacht t vergeten te zijn
    that Bill was considered forgotten to be
    'that Bill was considered to be forgotten'
This is not a productive pattern in Dutch, and it occurs only in a very limited number of cases with participles. As far as I know, all (nonparticipial) verbs that allow raising to subject are VR-verbs.
13. See Stowell (1981), Hoekstra (1982), and Kayne (1983). Henk van Riemsdijk (personal communication) has suggested that directionality is limited to argument positions. For the SOV character of Dutch and German, see Koster (1975), Den Besten (1977), and Thiersch (1978).
14. See Hoekstra (1984) for discussion of this matter.
15. I am assuming here that D-structure properties can be projected from S-structure (see Koster (1978c) and Sportiche (1983)). A further assumption is that (at S-structure) a θ-role assigned to an NP is optionally inherited by its antecedent. If the antecedent is assigned an independent θ-role (as in control structures), inheritance is blocked by the θ-criterion, which prevents double θ-marking of NPs.
16. Riny Huybregts has argued that Verb Raising is a two-step operation: reanalysis at D-structure and actual movement of the verb at the phonological level (PF); see chapter 5 below for discussion of this idea. Note that the surface order of (45b) also gives a grammatical sentence in Dutch. This does not mean, however, that Verb Raising is optional. Presumably, reanalysis (or rather, the alternative discussed in chapter 5) always applies, whereas the actual surface order is determined at PF by minor movements that vary somewhat from dialect to dialect in Germanic (especially with modals).
17. See Evers (1975). Note that the surface order of (50) is grammatical in Flemish dialects. This is not the result of extraposition, however, but a consequence of the fact that the dialects in question allow incorporation of one X" category under Verb Raising (see chapter 5 for references and some discussion).
18. A relevant example is the following:
(i) Ik geloof dat John met en Mary zonder een potlood werkte
    I believe that John with and Mary without a pencil worked
The words in italics definitely do not form a constituent.
19. Actually, it is possible for two constituents to precede the finite verb, if the second constituent is a so-called d-word. See Koster (1978a,c) and Thiersch (1978).
20. It is even possible in Dutch to insert a d-word between the preposed constituents and the finite verb in (63):
(i) [Boeken lezen] dat_i zie je haar t_i zelden
    books read that see you her rarely
The d-word dat is an NP, which usually has an NP or S' as antecedent. The D-structure position, indicated by the trace, is an NP-position. Haar remains the understood subject of boeken lezen, which indicates that this subject is not part of the preposed S' at S-structure.
21. VP-preposing is totally impossible without an auxiliary verb. But in Dutch auxiliary verbs are raising verbs, which always have an S'-complement that can be preposed. Consequently, there is no evidence for VP-preposing.
22. This is probably an independent matter. If Verb Raising involves reanalysis (see note 16), it probably also involves the typical condition on reanalysis, namely, adjacency of the reanalyzed items. After extraposition, the matrix verb and the complement are not adjacent. For an argument against reanalysis, see chapter 5.
23. This follows from the fact that Vs do not govern arguments to their right in Dutch.
24. Presumably, there are clauses with null complementizers. Guglielmo Cinque has pointed out to me that two classes of complements can also be distinguished in Italian, in spite of the fact that in this language the distinction does not depend on the presence or absence of an overt complementizer.
Chapter 4
Global Harmony, Bounding, and the ECP
4.1. Introduction
One of the most popular developments in the theory of grammar during the past five years, the Empty Category Principle (ECP) can be seen as a rapprochement between the grammar of gaps and the grammar of scope. According to most versions of the ECP, there is a level of Logical Form (LF) to which the condition applies. LF is derived from S-structure by the rule "move alpha", which adjoins, among other things, quantified phrases to some category containing them (see May (1977) and (1985) for details). In this way, the nature of scopal domains is determined in part at least by the properties of either move alpha or by the properties of gaps (empty categories) in general. Not all versions of the ECP apply to LF. In fact, one of the most elaborated versions, the Connectedness Condition (CC) of Kayne (1983), applies at the level of S-structure. But in one crucial aspect Kayne's version agrees with most other versions: it assumes the essential parallelism of the grammar of gaps and the grammar of scope (particularly, the grammar of Wh-elements in situ). The following elements can be distinguished in Kayne's CC: (1)
a. proper government (standard ECP)
b. bounding (Subjacency)
c. percolation (directionality)
According to the CC, an empty category must have an antecedent in one of its g-projections. An ec (empty category) has a g-projection if and only if it is properly governed. 1 The CC, therefore, incorporates some version of Chomsky's classical ECP (Chomsky (1981b)). Especially in an earlier version (Kayne (1981)), Kayne's ECP was also designed to incorporate the bounding theory (Subjacency). The current version, the CC, no longer entails the whole bounding theory, but there are still elements of this theory that the CC covers. The subject condition of Chomsky (1973), for instance, follows from the CC: (2)
*Who_i did [NP a picture of t_i] disturb you?
The trace of who, t_i, is properly governed, so that it has a g-projection. Assuming that canonical government is to the right in English, percolation goes up to the subject NP (indicated by the brackets in (2)). From here, there is no further percolation because the subject is on a left branch, contrary to the condition of canonical government, which requires g-projections on right branches all the way to the antecedent who. It is therefore not possible to include the antecedent who in any g-projection of the trace, from which it follows that (2) violates the Connectedness Condition. Other versions of the ECP also include proper government (1a) and elements of bounding theory (1b), but Kayne's CC is unique in that it crucially involves percolation (1c), particularly certain directionality constraints, as entailed by the notion "canonical government configuration". It is for this reason that Kayne's version of the ECP is the only version that has something to say about preposition stranding, among other things. In what follows, I will develop Kayne's directionality constraints and show that the nature of preposition stranding, the near absence of parasitic gaps, and the extreme marginality of island violations in an SOV language like Dutch follow from these directionality constraints. But in spite of my agreement with this particular aspect of Kayne's CC, I will show that the crucial assumption that inspired several versions of the ECP (including the CC), the idea of strong parallelism between the grammar of gaps and the grammar of scope, is unmotivated. In particular, I would like to argue that there is at best a weak parallelism in terms of (1a), i.e. in terms of the standard ECP effects. Not only the bounding conditions (i.e. (1b)) but also the directionality constraints (i.e. (1c)) fail to show the expected parallelism. 2 As for the bounding conditions, the lack of parallelism has been convincingly demonstrated by Huang (1982).
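The percolation reasoning applied to (2) above can be sketched in Python. This is a deliberately simplified illustration and not Kayne's formal definition of g-projection: the function name antecedent_reachable and the encoding of a path as a list of branch sides are assumptions made here, and the object-extraction path is a hypothetical contrast case that does not come from the text.

```python
def antecedent_reachable(branch_sides, properly_governed=True, canonical_side="right"):
    """Simplified check of whether percolation reaches the antecedent.

    branch_sides lists, from the gap upward, the side ("left" or "right")
    on which each successive maximal projection on the path sits.
    Percolation requires proper government of the gap and the canonical
    side at every step (assumed here: "right" for English).
    """
    if not properly_governed:
        # No g-projection at all without proper government.
        return False
    return all(side == canonical_side for side in branch_sides)


# (2): the PP "of t" is on a right branch inside NP, but the subject NP
# itself sits on a left branch, so percolation stops there.
SUBJECT_EXTRACTION = ["right", "left"]

# Hypothetical contrast: the same NP in object position, on a right branch.
OBJECT_EXTRACTION = ["right", "right"]

if __name__ == "__main__":
    print("subject extraction licensed:", antecedent_reachable(SUBJECT_EXTRACTION))
    print("object extraction licensed:", antecedent_reachable(OBJECT_EXTRACTION))
```

The canonical_side parameter is only meant to make the directionality assumption explicit; how the direction is set for an SOV language like Dutch is the topic of the following sections and is not modeled here.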
Recall that May's rule of Quantifier Raising (QR) was an instance of move alpha. May (1977) derived a strong prediction from this fact, namely that the grammar of scope (as determined by QR) would be constrained by the distinguishing property of move alpha, namely Subjacency. This prediction has turned out to be false. In fact, Chomsky (1977) had already argued that scopal domains are not really determined by Subjacency, but at best by a specificity constraint (see Fiengo and Higginbotham (1981)), as can be seen in the following examples (Chomsky (1977, 214)): (3)
a. We can't find books that have any missing pages
b. *We can't find the books that have any missing pages
In (3a), any can have wide scope in spite of the fact that it is in a complex NP. Specificity of the NP in (3b) blocks such an interpretation. Similarly, Chomsky showed that the scope of Wh-phrases in situ is not really subject
to island constraints. Chomsky (1981b, 235) repeats this conclusion and gives the following examples: (4)
a. Who remembers where we bought which book
b. I wonder who heard the claim that John had seen what
In (4a), the rightmost Wh-phrase, which book, can have wide scope, in violation of the Wh-island constraint. And in (4b), the violation of the Complex NP Constraint seems tolerable. Huang (1982) extensively demonstrates similar violations of island constraints for Chinese, and Lasnik and Saito (1984) give similar examples for another language without overt Wh-movement, namely Japanese. In short, the idea that the scope of Wh-in situ is constrained by Subjacency has been almost universally abandoned (also by May (1985)). What has not been universally concluded, however, is that this result undermines the concept of LF movement. LF movement exists to the extent that it has properties. If it does not have the properties of move alpha, what other properties does it have? Unless this question is answered, the concept of LF movement is more obscure than its popularity might suggest. In any case, the alleged parallelism between the grammar of scope and the grammar of gaps has effectively been undermined. In recent work, Chomsky has "institutionalized" the observed discrepancy by stipulating that Subjacency is the characteristic condition of gaps at S-structure, while the (standard) ECP is the characteristic condition of gaps at LF (Chomsky (1986a)). It is the main purpose of this chapter to show that there is a second major discrepancy between S-structure gaps and so-called LF gaps, namely in terms of the directionality constraints (1c). My elaboration of Kayne's directionality constraints is based on the more general theory of syntactic domains that was briefly introduced in chapter 1 and that I will discuss first.
4.2. On the nature of local domains
The general background assumption of the following discussion is the Thesis of Radical Autonomy. According to this thesis, the major properties of core grammar are construction-independent. Thus, the c-command property meets the criterion of radical autonomy because it can be found in most modules, such as predication theory, government theory (including licensing relations like θ-marking, Case-marking, and subcategorization), binding theory, and bounding theory. In other words, c-command is not the exclusive property of any of these subtheories. Locality conditions, on the other hand, are more or less supposed to be different for each subtheory. Thus, the locality property of binding is expressed by the notion governing category, while the locality condition of bounding is
expressed by the unrelated and different notion of Subjacency. The most important implication of the Thesis of Radical Autonomy is that what holds for c-command also holds for locality: locality conditions and their extensions are roughly of the same nature for all subtheories. As said before, it should be noted that the Thesis of Radical Autonomy does not entail that there are no differences between NP-movement and control (see Koster (1984a)), or between binding and bounding. Clearly, there are certain properties that differentiate the various subtheories of grammar. Radical autonomy only means that there is a common core to the subtheories, a core which is therefore totally construction-independent, and which includes (in my view) the locality conditions. In other words, the current subtheories are not atomic but molecular: they are to a large extent made from the same stuff. It is my purpose to determine what the common core is, and to maximize, for instance, the similarities between binding and bounding. Although it seems to me that such a goal would be fairly obvious in any science that is trying to expand its explanatory core, the momentum of linguistic theory has been almost in the opposite direction in recent years. Thus, the opacity conditions of Chomsky (1973) were supposed to hold for both movement and anaphor binding. In Chomsky (1981b), things are more strictly separated. Subjacency holds for movement only, while opacity (as incorporated in the binding theory) only holds for anaphor binding (including NP-traces, but not Wh-traces). Similarly, the NIC of Chomsky (1980a) applied to both Wh-traces and anaphors. In Chomsky (1981b) part of the NIC is subsumed under the binding theory (anaphors) while a residue was developed into the ECP (traces). There are several other examples of this development, and its main thrust is clear: the maximization of the differences between binding and bounding.
In my opinion, the results of this development have not always been convincing, and in any case it seems useful to counterbalance the development just mentioned by an attempt to stress the similarities between binding and bounding. If we look at a language like English, or at other SVO languages such as Italian, French, Norwegian, or Swedish, the differences between the properties of Wh-traces (bounding) and the properties of anaphors (binding) seem almost overwhelming at first sight. But if we look at the much stricter bounding conditions on Wh-traces in an SOV language like Dutch, the similarities with the binding properties of anaphors can hardly be overlooked. It is my claim that the more permissive nature of bounding in English and other SVO languages is (in part) the consequence of the nature of the directionality constraints that determine domain extensions. In a language like Dutch, the nature of directionality blocks the domain extensions, hence the strict character of bounding. On the other hand, there are many languages with a much more permissive type of binding than English. In English, the domain of binding is essentially the governing category of the
anaphor. But in many languages, the domain can be "stretched" under certain conditions. Long distance reflexivization, for instance, is by no means exceptional, as shown in a survey by Yang (1984). So, in fact, there could be a possible natural language which is more or less the mirror image of English with respect to the relative permissiveness of binding and bounding. Such a language would have the strict bounding conditions of Dutch and the more permissive binding conditions of Icelandic. In other words, the emphasis on the difference between binding and bounding has perhaps been determined by the accidental properties of English and some other languages that happen to have the same differences between binding and bounding. Assuming that this bias is accidental, I will now sketch a theory of local domains (and their extensions) that stresses the similarities between binding and bounding. According to the theory introduced in chapter 1, there is a simple prototypical domain that is relevant for a number of different construction types. Other domain definitions can be seen as simple extensions of this prototypical "Urform". I am assuming, then, that the following domain definitions are sufficient for most construction types:
(5) a. ... [β ... γ ... δ ... ] ...
    b. ... [β ... (ω) ... γ ... δ ... ] ...
    c. if ... [β ... (ω) ... γ ... δ ... ] ... is a domain, then ... [β' ... γ' ... [β ... (ω) ... γ ... δ ... ] ... ] ... is a domain
In these definitions, δ stands for "dependent category" (for instance, an anaphor or Wh-trace); γ is the minimal governor of δ, and β is the minimal maximal projection containing γ and δ. For some unexplained reason, it appears that the value of β is usually not VP, but S'. Thus, the values of β are: NP, PP, AP, and S'. ω stands for "opacity factor", i.e. the elements subject and AGR (SUBJECT in the sense of Chomsky (1981b)), and presumably INFL or COMP (see chapter 6). The third definition (5c) is recursive, and therefore the most interesting. In this definition, β' stands for the minimal category (with the same values as β) containing β and γ' (the governor of β). In principle, the recursive definition (5c) can lead to domains of unlimited size, usually if there is some sort of agreement between the successive governors γ and γ'. This agreement between successive governors is a very common property of this type of domain extension. The first type of domain (5a) is by far the most productive. It is part of the configurational matrix and the locality condition for the government relation itself, and therefore also for the licensing relations (θ-marking, Case-marking, subcategorization) that depend on government. As we saw in chapter 3, it is also the locality condition for obligatory control (see Koster (1984a)) and predication (see Williams (1980)). Most importantly, it is the locality condition that determines chain formation. It has exactly the same format as the Bounding Condition of Koster (1978c) (as
reformulated in Koster (1979) and chapter 1 above). This condition is a somewhat stricter locality principle than Subjacency. As an illustration of (5a), consider the government relation. Government is strictly local, as is clear from the following example: (6)
[ V [PP P NP ]]
In such configurations, the NP is governed by P and not by V. This follows from the fact that government involves locality principle (5a). The NP is the dependent element δ, which is dependent on the minimal governor P (the γ in (5a)). The minimal category β containing these elements is the PP. As a result, the NP cannot be governed by an element outside this PP. The V is therefore not a possible governor of the NP. The second type of domain (5b) clearly is a minimal extension of (5a). By adding the opacity factor ω, the values of β are reduced to those categories that can contain a SUBJECT or COMP or some other opacity factor (see chapter 6). This is typically the domain proposed for binding relations (the notion "governing category" in Chomsky (1981b, ch. 3)). The third type of domain (defined by (5c)) is a recursive extension of (5a) or (5b). This is the domain of long reflexivization in Icelandic (and many other languages; see Yang (1984)). It is also the domain for long Wh-movement and parasitic gaps in English, French, Italian, Spanish, Scandinavian, etc. As mentioned before, this domain extension often requires a kind of agreement between the governor of the lower domain and the governor of the superjacent domain. Long reflexivization and long Wh-movement differ in this agreement relation, but otherwise the nature of the domain extension seems to be similar. In both cases, the domain extension is triggered by a chain of successive governors. I will refer to such a chain as a dynasty:
(7) Dynasty =df a chain of successive governors (γ₁, ..., γₙ) such that for each i (1 ≤ i < n), γᵢ governs the minimal domain β containing γᵢ₊₁
Thus, if n = 5 for instance, we have the following situation, where each γ (except γ₅) governs the domain β of the next governor:
(8) ... γ₁ [β ... γ₂ [β ... γ₃ [β ... γ₄ [β ... γ₅ ... ]]]] ...
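As a rough illustration of definition (7), the following Python sketch models each minimal domain β as a record holding its governor γ and, optionally, the lower domain that this governor governs. The names Domain, dynasty, and the labels g1 to g5 are hypothetical and are not part of the theory; the sketch only collects the chain of successive governors for the case n = 5.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Domain:
    """A minimal domain (beta) together with its minimal governor (gamma)."""
    category: str                       # value of beta, e.g. "S'", "NP", "PP"
    governor: str                       # gamma, the governor inside this domain
    governs: Optional["Domain"] = None  # lower domain governed by gamma, if any


def dynasty(top: Domain) -> List[str]:
    """Return the chain (gamma_1, ..., gamma_n), starting from the highest domain.

    Each governor is followed by the governor of the domain it governs,
    mirroring the condition on successive governors in definition (7).
    """
    chain = [top.governor]
    current = top
    while current.governs is not None:
        current = current.governs
        chain.append(current.governor)
    return chain


# Build the n = 5 configuration from the bottom up (hypothetical labels).
d5 = Domain("S'", "g5")
d4 = Domain("S'", "g4", governs=d5)
d3 = Domain("S'", "g3", governs=d4)
d2 = Domain("S'", "g2", governs=d3)
d1 = Domain("S'", "g1", governs=d2)

print(dynasty(d1))
```

The sketch says nothing about what the governors have in common; that property is a separate, parametrized matter.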
As in all other good dynasties, the members of a syntactic dynasty have something in common. What they have in common is in part a matter of parametrization. One of the best known examples is the kind of domain extension we find for Icelandic reflexives (see Thráinsson (1976), Maling (1981), Anderson (1983)). In the simplest case, domain extensions for Icelandic reflexives are determined by a dynasty of Vs, such that each V (except the first one) is in the subjunctive mood. Anderson (1983) gives examples like the following:
(9) Jon segir að Maria viti að Haraldur vilji að Billi meiði sig
    John says that Mary knows(subj.) that Harold wants(subj.) that Bill hurts(subj.) himself
In this example, the reflexive sig is not bound in its minimal domain but in a higher domain. In fact, a reflexive can be bound in any higher domain as long as the intermediate verbs are in the subjunctive mood. Moreover, each domain must be governed by the next higher subjunctive verb, up to the domain of the antecedent. The dynasty for Icelandic reflexivization in an example like (9) is as follows:
(10) ... [S' Jon ... V₁ [S' ... V ... [S' ... V ... [S' ... V ... sig ... ]]]] ...
                                subj.         subj.         subj.
The chain of subjunctive verbs is dependent on the first verb (V₁) which is in the indicative mood. In general, dynasties for anaphors are formed if the chain of governors is somehow dependent on the first element. We can express this by stipulating that the elements of the dynasty have the feature [+dependent] as a common characteristic (see Yang (1984)). The feature [+dependent] appears to be a parameter. It is the property shared by the members of the relevant dynasties (for n ≥ 2) and it can have at least the following values (see chapter 6 for some problems and refinements):
(11) a. [+dep] = subjunctive (Icelandic)
     b. [+dep] = infinitive (Icelandic, Norwegian)
     c. [+dep] = reanalysis (Dutch, German)
In Norwegian, the domain of the reflexive seg is extended if the dependent verb is an infinitive (see Hellan (1980)):
(12) Ola bad oss [PRO snakke om seg]
     Ola asked us to talk about himself
The Dutch equivalent of (12) is ungrammatical:
(13) *Marie vroeg ons [om PRO over zich te praten]
     Mary asked us comp. about herself to talk
In Dutch (and German), the domain is only extended with accusative-with-infinitive verbs (Reis (1976)). These verbs undergo Verb Raising under conditions of "reanalysis" (here used only as a descriptive term; see chapter 5): (14)
Marie liet i [ons over i zich praten i ] Mary let us about herself talk
In short, the various Germanic languages that allow long reflexivization are parametrized with respect to the feature [+dep]. This feature [+dep] is the property which the dynasties that determine the domain extension have in common. I would now like to show that the same dynasty concept applies to empty categories (ec's). I will assume, in accordance with Koster (1978c) and chapter 1 above, that the Bounding Condition (5a) is the locality condition for traces in the unmarked case. Extensions of this minimal domain always have the form (5c) and are often only possible if a dynasty of a certain type can be formed. As in the case of reflexivization, dynasties can be formed only if the successive governors of the domain extension share a certain property. This property can be met in some languages, but not in others. If and only if the property in question can be met, does the language in question have long Wh-movement. For the time being, I will assume that in fact two conditions must be met:
(15) ec's have a dynasty iff
     (a) γₙ (of (7)) is a structural governor, and
     (b) each γᵢ has the same orientation
Whether something is a structural governor or not differs from language to language;4 so, let us assume that (15a) involves a parametrized notion. The second condition, however, might be invariant. So, let us make the strongest claim possible, namely, that it is an unparametrized principle of Universal Grammar. The relevant form of orientation is the direction of government, an important feature of grammars as argued by Stowell (1981), Kayne (1983), Hoekstra (1984), and others. The intuitive content of the directionality constraints (to be defined in the next section) can best be illustrated with an example: (16)
Whoᵢ did you see [a picture [of tᵢ]]
              →        →     →
The preposition of is a structural governor, so condition (15a) is fulfilled. The domain of this P, indicated by the brackets, is governed by the head
(N) of the next domain. In turn, the domain of this N (the NP indicated by the next pair of brackets) is governed by the verb see. So, in principle, we have a dynasty here, consisting of the elements V, N, and P. According to condition (15b), however, we can form a dynasty for the trace only if all these elements govern in the same direction. This happens to be the case in (16), as indicated by the arrows underneath the governors. My claim is that this sameness of the directionality of government is a necessary condition for grammaticality. If this claim is universally true, we make the strong prediction that in languages in which the orientation of government is not uniform (as it is in (16)), the equivalent of (16) is ungrammatical. This is indeed the case in a language like Dutch.
(17) *Wieᵢ heb je [een foto [van tᵢ]] gezien
                        →     →         ←
     who have you a picture of seen
As I will show in what follows, prepositions are structural governors in Dutch, so that the first condition of (15) is fulfilled. The second condition, however, is not fulfilled: the orientation of the P van 'of' and the N foto 'picture' is the same as in English (government to the right), but the orientation of the verb zien 'see' is in the opposite direction (government to the left). The nonuniformity of the direction of government, indicated by the arrows in (17), makes the sentence ungrammatical. There is no dynasty, which blocks the recursive domain extension (5c). Consequently, the Bounding Condition (5a) defines the maximal domain in which the trace can be bound. In general, traces are bound in their minimal maximal projection (NP, PP, AP, S'), unless the domain can be extended. The domain can be extended only if a dynasty of uniformly oriented governors can be formed. Before discussing the evidence for this hypothesis, I will first point out certain differences between ec's in (A')-chains and ec's in domain extensions as defined by (5c).
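The two conditions in (15) can be sketched as a small Python check. This is only an illustration under stated assumptions: the Governor record, the function name ec_has_dynasty, and the encoding of direction as the strings "right" and "left" are hypothetical, and the set of structural governors is passed in as a parameter because the text treats (15a) as parametrized. The chains are ordered from the highest governor down to the governor of the ec.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Governor:
    form: str       # the lexical item, e.g. "see"
    category: str   # "V", "N", "P", ...
    direction: str  # "right" or "left": the direction in which it governs


def ec_has_dynasty(chain: List[Governor], structural: Set[str]) -> bool:
    """Check (15a) and (15b) for a chain ordered (gamma_1, ..., gamma_n)."""
    if not chain:
        return False
    lowest = chain[-1]                     # gamma_n, the governor of the ec
    if lowest.category not in structural:  # condition (15a)
        return False
    directions = {g.direction for g in chain}
    return len(directions) == 1            # condition (15b)


# Assumption for this sketch: P and V are structural governors in both languages.
STRUCTURAL = {"V", "P"}

english_16 = [
    Governor("see", "V", "right"),
    Governor("picture", "N", "right"),
    Governor("of", "P", "right"),
]
dutch_17 = [
    Governor("zien", "V", "left"),
    Governor("foto", "N", "right"),
    Governor("van", "P", "right"),
]

print(ec_has_dynasty(english_16, STRUCTURAL))
print(ec_has_dynasty(dutch_17, STRUCTURAL))
```

On these inputs the English chain of (16) passes both conditions, while the Dutch chain of (17) fails (15b) because the verb governs to the left.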
4.3. The Cinque-Obenauer hypothesis
In an earlier paper, I made a distinction between an anaphoric strategy and a pronominal strategy for A'-bound ec's. The anaphoric strategy connects the links of an A'-chain, in accordance with the Bounding Condition (5a). The pronominal strategy can be found in constructions involving island violations and parasitic gaps (Koster (1983)). Recently, Cinque (1983b) and Obenauer (1984) have given much evidence that the notion pronominal strategy should be taken literally. Guglielmo Cinque, whose ideas I will follow here to a large extent, has proposed that the empty category with the features [-anaphor, +pronominal] is not only the little pro identified by AGR in pro-drop languages, but also the empty resumptive pronoun found in parasitic gap
constructions (among others). In the latter case, pro is not identified by AGR but by an operator in A'-position. Since this pro is considered a resumptive pronoun, it can only be an argument NP.⁵ This hypothesis explains several facts about so-called long Wh-movement. Consider for instance Belletti's well-known observation that violations of the Complex NP Constraint are much worse if the extracted Wh-phrase is a PP:
(18) a. ?Whoᵢ do you believe the claim that John spoke to eᵢ
     b. *To whomᵢ do you believe the claim that John spoke eᵢ
(19) a. ?This is the guy Oᵢ I heard about a plan to give a book to eᵢ
     b. *This is the guy to whomᵢ I heard about a plan to give a book eᵢ
Extraction of PPs from islands is consistently worse than extraction of NPs from islands. This fact is explained, according to Cinque, if we assume that the rightmost eᵢ in (18) and (19) is not a trace but an empty resumptive pronoun (pro). Obviously it must be determined then that the ec's in these sentences cannot be part of a chain. Why, in other words, are the ec's in (18) and (19) not traces? Cinque solves this problem by stipulating that there is a Well-formedness Condition on Chains (WCC) which entails, expressed in our terms, that a dynasty for traces contains only structural governors. Assuming that P and V are the structural governors (Kayne (1984)), it follows that the eᵢ in (18) and (19) cannot be a trace: in all cases, the dynasty that extends the domain of eᵢ to the antecedent contains the N head of the complex NP (claim, plan). Since N is not a structural governor, the WCC is not met; therefore, the ec's cannot be traces and must be pro. Since only NPs can be pro, the contrasts in (18) and (19) are explained. Although my own views differ somewhat with respect to the WCC, I think this approach is essentially correct. Or, at least, it is the only approach that explains the contrasts in (18) and (19) at all. Another much discussed problem that is explained by Cinque's approach is the following contrast (Chomsky (1982a, (71), (72)); see also Lasnik and Saito (1984) and Pesetsky (1984)):
(20) a. someone who John expected t to be successful though believing e to be incompetent
     b. *someone who John expected t would be successful though believing e is incompetent
     c. *someone who John expected t would be successful though believing that e is incompetent
In fact, we find a three-way contrast here: (20c), with undeleted that, is worse than (20b), which is in turn considerably worse than (20a). How can we account for these contrasts? Suppose that e is a resumptive pro in all cases. (20a) would be
unproblematic because e is structurally governed and there is a path of right branches (in the sense of Kayne's Connectedness Condition) up to the antecedent (contrary to traces, pro does not require a dynasty of only structural governors). The other examples, however, are problematic if e must be pro. The problem is that there is no structural governor for the e in nominative subject position in (20b,c). These ec's can at best be structurally governed by COMP or antecedent-governed by a local antecedent in the immediately preceding COMP. But if ec's are locally bound in this way, they cannot be pro, but must, instead, be anaphoric. In a GLOW presentation (Copenhagen, 1984), Cinque gave evidence that resumptive pronouns (like pro) are never bound by a close operator (if the pronouns are subjects). Even in English this can be demonstrated:
(21) a. *the man whoᵢ heᵢ died yesterday ...
     b. the man whoᵢ we didn't know who had invited himᵢ
So, it is clear that the e in (20b,c) cannot be pro. But then it must be a trace. Sentence (20c) is then immediately ruled out by the principles that account for the that-t effect. But (20b) is also ruled out; since e is not pro, there is no direct link to the antecedent. The only other option would be successive cyclic linking (as in chains). But there cannot be a chain from e to the antecedent because not all intermediate governors are structural (though is not) and thus Cinque's WCC is not met. The only remaining option, according to which the e (trace) is locally bound by an operator, which is itself pronominally linked to who, is also excluded:
(22) someone whoᵢ would be successful though believing [Oᵢ [tᵢ ... ]]
     (whoᵢ and Oᵢ are connected by pronominal linking)
This option is excluded because only arguments (NPs) can be resumptive pronouns. A third fact explained by the Cinque-Obenauer approach is the well-known fact, studied in detail by Huang (1982), that certain adjuncts, like why, cannot be extracted from islands:
(23) *Whyᵢ do you believe the claim that John left eᵢ
Again, the eᵢ cannot be a trace: since the intermediate N (claim) is not a structural governor, the WCC cannot be met. But it cannot be a pro either, because it is not an argument NP. So, this approach offers a very simple solution for Huang's adjunct facts (to the extent that they involve overt gaps). From the discussion of the examples in (20) it appeared that parasitic gaps must be considered to be pro. This view is at variance with the alternative view that parasitic gaps are in fact traces, bound by an abstract operator O. According to this view, parasitic gap constructions involve
two traces, and therefore two chains that are somehow compounded (Chomsky (1986b)):
(24) Which book did you return t without O reading t
     [diagram: chain 1 links which book and the first t; chain 2 links O and the second t; the two chains are joined by compounding]
This view is based on the apparent island-sensitivity of parasitic gaps, that is, the fact that parasitic gaps are usually bad in islands:
(25) This is the man I interviewed t
     a. before telling you to give the job to e
     b. *?before reading [NP the book you gave to e] (CNPC)
(26) This is the man I interviewed t
     a. before telling you to give the job to e
     b. *?before asking you [which job to give to e] (Wh-island)
If the parasitic gaps are traces, which obey Subjacency, these facts are accounted for. If parasitic gaps are resumptive pro, on the other hand, there is no obvious explanation for the ungrammatical (25b) and (26b), because it is almost a characteristic property of resumptive pronouns that they make it possible to circumvent island conditions. I will argue below that the latter generalization holds for overt resumptive pronouns but not for empty resumptive pronouns. But first it should be noted that the chain composition approach, though attractive, meets with some obvious problems. First of all, it has always been a remarkable feature of parasitic gaps that they do violate island conditions under certain circumstances, particularly in the so-called subject cases (This is a man who everyone who knows e admires t; see Taraldsen (1981)). The island effects cannot always be found in the subject cases, so the chain composition approach does not really carry over to these cases. Secondly, it is not clear how chain composition is brought about. Suggestions to the effect that the second chain would be connected to the tail of the first chain by a kind of predication mechanism are extremely problematic. In ordinary predication, the subject c-commands the predicate, and vice versa. With parasitic gap constructions we find, in contrast, an anti-c-command condition (Chomsky (1982a)). Thus, the tail of the first chain (like t in (24)) never c-commands the hypothetical second chain (see chapter 6 (section 6.4) for further details). But even if these two problems can be solved, chain composition cannot explain why the second chain usually contains only NP gaps. In a normal A'-chain, almost any category (NP, PP, AP, or adjunct) occurs. Parasitic gaps are always NPs, and non-NPs give a typically bad result:
(27) a. *This is the man to whom we gave a present t without talking e
     b. *This is the champion against whom we fought t without yelling e
Adjuncts like how also give impossible sentences:
(28) *How did you solve the problem t without knowing that you could do e
These facts are predicted by the hypothesis that parasitic gaps are resumptive pro, because only NPs can be pro. Chain composition, in contrast, leaves unexplained the fact that parasitic gaps are always NPs. But how can we account for the island-sensitivity of parasitic gaps? Note, first, that according to the pro analysis, parasitic gaps can be embedded in islands. In the subject case, this is obvious. But also in the adjunct cases, parasitic gaps are already in islands (if there is no hidden operator). If this is the case, the reason (25b) and (26b) are so bad is that the parasitic gaps are embedded in two islands: the adjunct island (in both cases), plus a complex NP in (25b) and a Wh-island in (26b). In languages like English, island constraints can be violated to some extent. As we will see later, it is possible to formulate the necessary conditions for these violations, conditions that can be met in English, Scandinavian, and Romance, but not in Dutch and German. It is, however, not possible to formulate sufficient conditions on the distribution of gaps in islands (pro). In English, for instance, it is very difficult to violate island conditions twice, even in cases that are quite tolerable with one island violation. Thus, the following island violations are not so bad:
(29) a. Which race did you express a desire to win t?
     b. What books don't you remember who borrowed t from you?
If a second island is embedded, these sentences become considerably worse:
(29) a'. *Which race did you express a desire to meet the man who won t?
     b'. *What books don't you remember who knew who had borrowed t from you?
Such cases are analogous to (25b) and (26b), but here it is not possible to have recourse to the idea of chain composition. Pro, in other words, does not have the same free distribution as lexical resumptive pronouns. Contrary to the latter, it must be entirely identified by its antecedent, which is only possible in contexts that are not too complex (compare the specificity constraints on scope assignment). It is at present not possible to give an exact formulation of the degree of
complexity of the context that pro can tolerate (see note 5). But it is clear from (29) that more than one island may be too many. The fact that parasitic gaps cannot be embedded in too many islands is hardly surprising, given the fact that pro cannot be so embedded in other cases either (29). Chain composition cannot solve the problem in general (certainly not in (29)), and therefore the ungrammatical (25b) and (26b) do not form evidence in favor of extra hidden chains. In short, island-sensitivity cannot decide the status of parasitic gaps (trace or pro). There is still another argument against chain composition. Cinque (1983b) has argued that it is not always absolutely necessary to have a licensing trace for ec's with parasitic-gap-like properties. The following example (from Chomsky (1982a, (99c))) is a case in point: (30)
the man that I went to England without speaking to e
This sentence is marginal, but relatively acceptable compared to (31), in which the gap is a PP: (31)
the man to whom I went to England without speaking e
This is a familiar phenomenon by now (compare Belletti's observation discussed before). The gap in this case has the same properties as parasitic gaps - not only PPs, but also adjuncts resist extraction: (32)
*Why did he go to England without speaking to Bill e
The adjunct cannot be construed with the sentence embedded under without. Cinque also observes that the third diagnostic criterion, apparent lack of successive cyclic movement, is met in these constructions:
(33) *The student that I went to England without saying (that) e is intelligent
As before (cf. (20)), the sentence is bad with or without deletion of that. In other words, the gaps in these constructions have the same properties as parasitic gaps and other gaps embedded in islands. But note that since there is no licensing chain, the gap in (30) must be directly linked to the operator position that follows the head of the relative clause:
(34) the man Oᵢ that I went to England without speaking to eᵢ
In this case, we cannot have a second operator following without (as in (24)):
(35) the man Oᵢ I went to England without Oᵢ speaking to eᵢ
If the rightmost Oᵢ completes the chain, as in (24), the leftmost Oᵢ is a vacuous operator, which is not permitted by any theory. So, example (30) obliterates the idea of chain composition. In conclusion, then, it seems to me that the Cinque-Obenauer approach is the only current theory that explains the systematic difference between traces in chains, and the gaps that we find in islands and adjuncts.
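The reasoning of this section can be summarized, very roughly, as a decision procedure. The Python sketch below is a simplification under stated assumptions, not a formalization of Cinque's or Obenauer's proposals: the function name classify_gap and its return labels are hypothetical, P and V are assumed to be the structural governors, and the sketch ignores the further restrictions discussed above (subject pro cannot be bound by a close operator, and pro does not tolerate more than one island).

```python
from typing import List, Set

# Assumption: P and V are the structural governors.
STRUCTURAL: Set[str] = {"V", "P"}


def classify_gap(gap_category: str, is_argument: bool, dynasty: List[str]) -> str:
    """Classify an A'-bound gap as "trace", "pro", or "excluded".

    dynasty lists the categories of the governors that connect the gap
    to its antecedent.
    """
    # WCC, in the terms used here: a dynasty for traces has structural governors only.
    if all(category in STRUCTURAL for category in dynasty):
        return "trace"
    # Otherwise the gap can only be an empty resumptive pronoun,
    # and only argument NPs can be pro.
    if gap_category == "NP" and is_argument:
        return "pro"
    return "excluded"


# (18a): NP gap inside a complex NP; the dynasty contains the N claim.
print(classify_gap("NP", True, ["V", "N", "V", "P"]))
# (18b): PP gap inside the same complex NP.
print(classify_gap("PP", True, ["V", "N", "V"]))
# (23): adjunct gap inside a complex NP.
print(classify_gap("AdvP", False, ["V", "N", "V"]))
```

The category label "AdvP" for why is merely a placeholder for a non-NP adjunct.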
4.4. The parametrization of dynasties
If we tentatively assume that there is a clear distinction between traces in chains and pro in other contexts, we have to solve a demarcation problem: where do we find traces, and where do we find pro? Cinque (1983b) has solved this problem in terms of his Well-formedness Condition on Chains and Kayne's Connectedness Condition. According to this view, traces are only acceptable if all g-projections that connect trace and antecedent are complements of structural governors. Pro only meets the weaker conditions on g-projections as expressed by Kayne's CC: the successive g-projections are not necessarily structurally governed. The only requirement is that the g-projections are on a left branch or a right branch, depending on the canonical government configuration of a language. Although I agree to a large extent with the view that pro depends on the CC, it seems to me that the WCC is too weak. Moreover, the CC and the WCC overlap too much. I would therefore like to propose a stronger condition on chains, one that does not overlap with the CC. The condition on chains that I would like to propose is the Bounding Condition of Koster (1978c, 1979), discussed in chapter 1 and repeated above as (5a). This view entails that in the unmarked case, each trace of a chain is bound in its minimal domain β (in the sense of (5a)). Thus, under normal successive cyclic movement this condition is met:
(36) Whoᵢ do you think [tᵢ that he said [tᵢ that he saw tᵢ]]
Assuming that the traces in COMP are accessible for government from the higher verbs, each trace is bound in its minimal domain, the minimal governing S'. Nothing further has to be stipulated with respect to the nature of the dynasty involved. In (36), there is a dynasty of Vs (structural governors in accordance with Cinque's WCC), but this fact does not have to be stipulated because it follows from the simplest domain definition, namely (5a). The Complex NP Constraint violations cannot involve traces (a chain) because it is not possible to provide an antecedent for each t in its minimal domain:
(37) a. Whoᵢ do you believe [NP the claim [S' tᵢ that [Bill saw tᵢ]]]
The rightmost trace is bound in its minimal domain S', but the intermediate tᵢ in COMP cannot be bound in its minimal domain. If nonstructural governors like N cannot penetrate another projection at all (see chapter 3), this trace remains ungoverned so that it does not have a domain. If the N does govern this trace, it is not bound in its minimal domain because the NP lacks a landing site (a COMP). The elements (whoᵢ, tᵢ, tᵢ) therefore do not form a chain in (37a). But then there is only one other way to connect the rightmost tᵢ with its antecedent whoᵢ, namely by interpreting it as pro. As we saw in the preceding section, the properties of pro are exactly the properties of gaps that we find in complex NPs. Cinque (1983b) gives only one kind of fact for which the WCC seems to give different predictions from the simpler and more general Bounding Condition (5a). Cinque notes that his WCC predicts the possibility of extraction from the following context in English:
(37) b. ... [VP V [PP P [S' [S ... e ... ]]]] ...
If a Wh-phrase is moved from the position e to a position outside of the VP, a chain is possible according to the WCC. This is so because the dynasty from e to its antecedent would only involve structural governors, namely the P and V in (37b). Chain formation of this kind would be inconsistent with the Bounding Condition, because an intermedilj.te trace in (37b) could not be bound in the PP in (37b). Given the fact that PP extraction always involves chains, the following examples (Cinque's (67)) might confirm the validity of the WCC:
(38)
a. the girl to whomj he was counting [PP on [S' [S PRO giving a present tj]]] ...
b. the man from whomj we were looking forward [PP to [S' [S PRO receiving a letter tj]]] ...
c. the man to whomj they insisted [PP on [S' [S PRO sending an invitation tj]]] ...
These examples are grammatical, and the WCC seems to be confirmed. Note, however, that these examples are not overwhelmingly convincing. In all cases, there is an alternative analysis that is consistent with the Bounding Condition. Suppose that the V and the adjacent P in (37) and (38) were to undergo "reanalysis", so that a complex verb would be formed (see chapter 5 for an alternative account of "reanalysis"):
(39) ... [VP (V P)V [S' tj [ ... tj ... ] ...
In that case, the complex V could govern the intermediate trace in COMP, and the Bounding Condition would be met as in (36). There are good reasons to assume that this is what we actually find in (38). Van Riemsdijk (1978) has argued that pseudo-passives, contrary to
other cases of preposition stranding, involve complete reanalysis of V and P. All Ps in (38) can be reanalyzed with the V as complex verbs in pseudo-passives:
(40)
a. It cannot be counted on t
b. That was looked forward to t
c. That course was insisted on t
In these cases, the t must be governed by the preceding "complex verb". Since the elements in question can undergo complete reanalysis, there is no reason to assume that this option is excluded for (38). Let us conclude therefore that the cases of chain formation considered so far are consistent with the Bounding Condition as a criterion that distinguishes chains from pro-binding.
Before giving positive evidence for this view (which is inconsistent with the WCC), I would first like to make a few remarks about the general nature of dynasties. Dynasties come in various degrees of perfection. A perfect dynasty would be a set of governors that are all of the same kind. Thus, the kind of dynasty that we find in Icelandic long reflexivization is perfect because it consists of Vs and nothing but Vs:
(41) ... V [ ... V [ ... V [ ... V [ ... V ... ]]]] ...
Even with only Vs there might be degrees of perfection, because there are different kinds of verbs, such as infinitives and finite verbs. A slightly lesser degree of perfection is exhibited by the kind of dynasty involved in Cinque's WCC. This type of dynasty involves structural governors only:
(42) ... V [ ... P [ ... V [ ... P [ ... V ... ]]]] ...
An even less perfect dynasty involves lexical governors only (Longobardi (1985)):
(43) ... V [ ... N [ ... A [ ... P [ ... V ... ]]]] ...
The least perfect kind of dynasty, a really decadent dynasty, is the dynasty that defines percolation in Kayne's Connectedness Condition. Here, the only requirement is uniformity of the direction of government (canonical government). But apart from this faint family resemblance, there is little uniformity; not even lexical government is required:
(44) ... V [ ... N~ [ ... V [ ... V~ [ ... V ... ]]]] ...
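The grading of dynasties in (41)-(44) can be made concrete with a minimal sketch. It assumes that a dynasty is represented simply as a list of governor labels; the function name, the label sets and the grade names are hypothetical, and the direction of government required by the Connectedness Condition is not modelled.

```python
# Hypothetical label sets; only the category names come from the text.
STRUCTURAL = {"V", "P"}          # structural governors, as in (42)
LEXICAL = {"V", "N", "A", "P"}   # lexical governors, as in (43)


def dynasty_grade(governors):
    """Return a rough 'perfection' grade for a sequence of governor labels."""
    kinds = set(governors)
    if not kinds:
        raise ValueError("a dynasty needs at least one governor")
    if len(kinds) == 1:
        return "perfect"        # all governors of the same kind, as in (41)
    if kinds <= STRUCTURAL:
        return "structural"     # structural governors only, as in (42)
    if kinds <= LEXICAL:
        return "lexical"        # lexical governors only, as in (43)
    return "directional"        # at best uniform in direction, as in (44)
```

Under these assumptions, `dynasty_grade(["V", "P", "V"])` yields `"structural"`, while a path that runs through a nonlexical governor such as VP falls into the weakest class.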
I would now like to argue that the acceptability of gaps is in part a function of the dynasty that connects them to their antecedents. It has, for instance, often been observed that CNPC violations are worse with
relative clauses than with N-complements. Assuming that relative clauses have the structure [NP NP S'], this can be seen as a function of the perfection of the dynasties: (45)
a. complement Ns: ... V [ ... N [ ... V t ... ] ...
b. relative clauses: ... V [ ... NP [ ... V t ... ] ...
Similarly, most parasitic gap constructions involve imperfect dynasties. The adjunct cases, for instance, might have the following structure, so that extraction from an adjunct always involves government of a g-projection by a nonlexical category (VP in this case):
(46) [Tree diagram; node labels recoverable from the original: NP, VP, Adjunct]
This accounts for the fact that adjunct parasitic gaps are always somewhat marginal. The same holds for constructions like the following: (47)
the man that I went to England without speaking to e
The upshot of this discussion is that the marginality of these constructions is not only due to the fact that they involve pro, but also due to the fact that their dynasty is far from perfect. Consider now simple cases of preposition stranding in English: (48)
[S' What [are you talking [PP about e]]]
It is with respect to these constructions that the WCC and the Bounding Condition make different predictions. According to the WCC, e is a trace because chain formation is possible: the dynasty involves the P about and the V talk, both structural governors. According to the Bounding Condition approach to the demarcation problem, the e is not a trace but a pro: the e cannot be bound in its minimal domain (PP). How can we test these predictions? Before giving some possibly crucial evidence, I would like to recall some facts discussed by Ross (1967). Elaborating certain observations by Kuroda (1964), Ross points out that certain nouns like time, way, manner, place, etc. cannot be pronominalized (Ross (1967, 4.203-204)): (49)
a. My sister arrived at a time when no buses were running, and my brother arrived at a time when no buses were running too
b. *My sister arrived at a time when no buses were running and my brother arrived at one too
(50)
a. Jack disappeared in a mysterious manner and Marian disappeared in a mysterious manner too
b. *Jack disappeared in a mysterious manner and Marian disappeared in one too
(51)
a. I live at the place where Route 150 crosses Scrak River and my dad lives at the place where Route 150 crosses Scrak River too
b. *I live at the place where Route 150 crosses Scrak River and my dad lives at it too
Ross then goes on to point out that in these cases prepositions cannot be stranded (p. 4.205): (52)
a. *What time did you arrive at e?
b. *The manner which Jack disappeared in e was creepy
c. *The place which I live at e is the place where Route 150 crosses Scrak River
On the basis of these facts, Ross (p. 4.206) comes to a conclusion that is very interesting from the present perspective (recall that NP = PP for Ross): (53)
No NP whose head noun is not pronominalizable may be moved out of the environment [P __ ]NP
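Ross's generalization (53) amounts to a simple lookup, sketched below. The set contains only the nouns named in the text (which ends its list with "etc."), so it is an incomplete, illustrative sample; the function name is hypothetical.

```python
# Nouns the text cites as resisting pronominalization; the list is open-ended.
NON_PRONOMINALIZABLE = {"time", "way", "manner", "place"}


def stranding_allowed(head_noun):
    """(53): an NP may leave the environment [P __ ] only if its head noun
    is pronominalizable."""
    return head_noun.lower() not in NON_PRONOMINALIZABLE
```

On this sketch, `stranding_allowed("time")` is false, in line with (52a), whereas an ordinary noun such as "man" is not blocked.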
In other words, ec's must have pronominal features in this context. But this is exactly what we expect if ec's after stranded prepositions are pro. A really crucial test would be the following. The Bounding Condition approach entails that gaps in PPs (like in (48)) are pro, while the WCC approach entails that they are traces. The Bounding Condition approach, then, predicts that PPs cannot be extracted from PPs (cf. Belletti's observation, discussed above). Thus, the following configuration is allowed by the WCC but excluded by the BC:
(54) Whj [ ... V ... [PP P [PP ej]] ... ] ...
The path from ej to Whj involves the structural governors P and V, so the linking can be considered a chain according to the WCC. The BC, on the other hand, excludes chains of this type because the ej is not bound in its minimal domain (PP). The divergent predictions can be tested with Ps that take a PP complement (PP → P PP). Although such PPs exist in English (Jackendoff (1977), Van Riemsdijk (1978)), the prediction is not easy to test, because there are not too many bona fide prepositions that can be stranded and that take PP complements. Particles must be excluded, and so must the possibility that the preposition is assimilated to the verb (as we saw in connection with (38)). But to the extent that these pitfalls can be circumvented, the prediction of the BC approach seems to be correct. Van Riemsdijk (1978, 146) gives examples like the following: (55)
a. They took a shot at him [PP from [PP behind the car]]
b. *[Behind the car]j they took a shot at him [PP from [PP ej]]
Other examples are:
(56)
a. They heard the noise [PP from [PP inside the barn]]
b. *[Inside which barn]j did they hear the noise [PP from [PP ej]]?
(57)
a. They sent him [PP down [PP into the valley]]
b. *[Into which valley]j did they send him [PP down [PP ej]]?
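The divergent predictions of the WCC and the BC for configurations like (54) can be laid out in a small sketch. It is a toy model under stated assumptions: the function name and its parameters are hypothetical, the WCC is reduced to "all governors on the path are structural", and the BC is reduced to a boolean saying whether the gap is bound in its minimal domain.

```python
STRUCTURAL = {"V", "P"}  # structural governors, as described for the WCC


def gap_status(theory, gap_category, governors, bound_in_minimal_domain):
    """Classify a gap as 'trace', 'pro' or 'excluded' in a toy reading
    of each theory.

    theory: "WCC" or "BC"
    gap_category: category of the empty element, e.g. "NP" or "PP"
    governors: governor labels on the path from the gap to its antecedent
    bound_in_minimal_domain: whether the antecedent sits in the minimal domain
    """
    if theory == "WCC":
        chain = all(g in STRUCTURAL for g in governors)
    elif theory == "BC":
        chain = bound_in_minimal_domain
    else:
        raise ValueError(f"unknown theory: {theory}")
    if chain:
        return "trace"
    # Only NPs can be pro, so a non-NP gap without a chain has no analysis.
    return "pro" if gap_category == "NP" else "excluded"
```

For a PP gap inside a PP, as in (55b)-(57b), the sketch returns `"trace"` under the WCC and `"excluded"` under the BC; for the NP gap of (48) the BC setting returns `"pro"`.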
So, to the extent that the evidence is representative, the BC approach seems to be vindicated. In the next section, I will show that evidence from Dutch points in the same direction. Let us tentatively conclude, therefore, that gaps in preposition stranding contexts are not traces but pro and that PPs are absolute barriers for "movement" (chain formation). Thus, both the following structures are cases not of movement but of pro-linking:
(58)
a. Whatj are you talking [about proj]
b. the man Oj that I went to England [without speaking to proj]
The relative acceptability of (58a) in comparison with (58b) has to do with the relative perfection of the dynasty in (58a): the dynasty in (58a) involves only structural governors, while the adjunct phrase (without ...) in (58b) is not even lexically governed.
Problematic for the BC approach are the well-known violations of the Wh-island condition in Italian, studied by Rizzi (1978b):
(59) Tuo fratello, a cuij mi domando che storie abbiano raccontato tj era molto preoccupato
your brother to whom I wonder which stories they have told was very troubled
The problem is that a PP has been extracted out of a Wh-island. Since the BC cannot be met (the tj is not bound in its minimal S'), we would expect a pro under our hypothesis. But since the trace is a PP it cannot be pro. The same can be observed in those dialects of English in which the following example is grammatical:
(60) the man to whomj I wondered [S' which storiesi [PRO to tell ti tj]]
If extraction of PP is a diagnostic criterion for chain formation, we must conclude that domains for traces can be bigger than the minimal domain of (5a). In fact, these problems suggest a new approach to parametrization, based on the dynasty concept. The basic idea is that not bounding nodes but dynasties form the primary locus of parametrization. According to the application of the BC sketched earlier, chains are characterized by the BC, the simplest domain principle of UG (5a). But it is clear, as we saw for Icelandic and other Germanic languages, that not only the binding of pro but also the binding of anaphors can involve extended domains with dynasties (in the sense of (5c)). As we saw before, the binding of pro conforms to the rather permissive dynasty of Kayne's Connectedness Condition. But nothing in our theory prohibits an independent domain extension for traces. It seems to me that the relevant domain extension is similar to the domain extensions that we see for anaphors in certain languages. Exactly as in the cases of long reflexivization, the domain extension for traces is a parametric option based on the category V. If we assume that the BC involves exactly one governor X⁰, for instance V (61a), we may hypothesize that several languages, including Italian and certain varieties of English or even Dutch (see chapter 1), can extend the bounding domain by a dynasty of Vs (61b):
(61)
a. Bounding Condition: (V)
b. Extended Bounding Condition: (V1, ..., Vi, ..., Vn)
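The contrast between (61a) and (61b) can be sketched as a check on the list of successive domain governors. The representation (a list of category labels, innermost first) and the function name are assumptions made for illustration; the optional `max_vs` cap is a hypothetical knob and is not part of (61) itself.

```python
def satisfies_bounding(governors, extended=False, max_vs=None):
    """Check a path of domain governors against (61a) or (61b).

    governors: labels of the successive domain governors, innermost first
    extended: False for the unmarked case (61a), True for the marked case (61b)
    max_vs: optional upper bound on the length of the dynasty (hypothetical)
    """
    if not governors:
        return False
    if not extended:
        return len(governors) == 1      # exactly one governor
    if any(g != "V" for g in governors):
        return False                    # (61b) admits a dynasty of Vs only
    return max_vs is None or len(governors) <= max_vs
```

A path `["V", "V"]` fails the unmarked setting but passes with `extended=True`; a path that includes a P fails both.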
This means that in the unmarked case, a bounding domain is defined by exactly one governor (V in (61a)). In the marked case, the domain is defined by a dynasty of Vs (61b). As it stands, (61b) is too permissive for the facts of Italian that we find in Rizzi (1978b). This problem can be solved in two ways. Either we could stipulate an upper bound on certain dynasties, or we could maintain (61b) and hypothesize that the limitations on the size of the dynasties in Italian are caused by complexity factors independent of the bounding theory. Although I tend to opt for the latter solution, I will show that Rizzi's major data can be handled by a dynasty of exactly two Vs. Let us first note that an adjunct cannot be extracted from the simplest Wh-islands in Italian, as observed by Huang (1982, ch. 7, data provided by Rita Manzini): (62)
a. *Questo è la ragione per la quale mi chiedo che cosa ho comprato tj
this is the reason for whichj I wonder what I bought tj
b. *Questo è il modo nel quale mi chiedo che cosa ho comprato tj
this is the way in whichj I wonder what I bought tj
If we assume with Huang that only governed complements trigger percolation, these facts are easily explained. With a dynasty of only Vs, the trace itself and each successively higher domain containing it must be governed by a V. This condition is not met in (62): the trace of the adjunct (tj) is not governed by V, and is definitely not a complement of the embedded verb. Note that the sentences in (62) would also be ruled out if we had pro instead of trace, since adjuncts do not have pros. But the pro-linking approach is irrelevant for cases like (62), because PPs can be extracted from the Wh-islands in question (see (59)). This shows that we have to do with a trace domain in (62) and not with a pro domain. As said before, most of Rizzi's data can be accounted for by stipulating that the Extended Bounding Condition involves a dynasty of exactly two Vs. What this would come down to is that traces in Italian must be bound in their domain-governing category in the sense of Manzini (1983a). Two sets of data are relevant in this respect. First of all, Rizzi has shown that violation of more than one Wh-island leads to ungrammaticality:
(63) *Questo incarico che non so proprio chi possa avere indovinato a chi affiderò, mi sta creando un sacco di grattacapi
this task that not I know really who might have guessed to whom I will entrust is getting me into trouble
The structure of such double Wh-island violations is as follows:
(64) Oj ... V [whj [tj ... V [whk [ ... V tj tk]]]]
Clearly, the path from tj to Oj involves a dynasty of three Vs, which is one more than the maximally permitted number of two. Another example discussed by Rizzi is the following. Extraction of a relative pronoun from a clause introduced by a Wh-phrase gives a reasonably acceptable result ((65a), corresponding to Rizzi's (17a)), while the sentence becomes ungrammatical if the Wh-phrase does not introduce the clause from which the relative pronoun is extracted, but the next clause up ((65b), corresponding to Rizzi's (17b)):
(65)
a. Oj [S ... V [S' tj [S ... V [S' whj [S ... V tj tj]]]]]
b. *Oj [S ... V [S' whj [S ... tj ... V [S' tj [S ... V tj ... ]]]]]
Sentences illustrating this contrast are the following (Rizzi's (18a) and (18b)) (although (66b) can be made virtually acceptable by slight stylistic
modifications according to Guglielmo Cinque (personal communication)): (66)
a. Il mio primo libro che credo che tu sappia a chi ho dedicato mi è sempre stato molto caro
my first book which I believe that you know to whom I have dedicated to me has always been very dear
b. *Il mio primo libro che so a chi credi che abbia dedicato mi è sempre stato molto caro
my first book which I know to whom you believe that I have dedicated to me has always been very dear
These facts are explained by Rizzi by stipulating that S' is the bounding node for Subjacency in Italian. In (65a), the path from the rightmost tj to Oj never passes more than one S', while in (65b) there is no such path: the linking of the tj in the rightmost COMP to Oj passes more than one S'. Note however that the same facts are explained by the hypothesis that a dynasty for traces in Italian involves at the most two Vs. In (65a), this is immediately obvious: the path connecting the two traces contains only two Vs. The second example (65b) looks somewhat more problematic at first sight, because the path connecting the two tjs contains only one V, while the path connecting the intermediate tj to the operator Oj contains two Vs. But note that the latter path does not contain a well-formed dynasty, if we assume in the spirit of Huang (1982) that this type of percolation can only be triggered by complements. Since the intermediate tj in COMP is not a complement, it cannot be the basis of a dynasty. Only the rightmost tj is a complement that can trigger a dynasty. But the path from this trace to Oj contains three Vs, one more than permitted.6
All in all, it seems to me that the basic contrasts in Italian are explained by the hypothesis that a trace is either bound in its minimal domain, or in a domain defined by a dynasty of not more than two Vs. In spite of the fact that stipulating an upper bound to dynasties for Italian presumably leads to a certain degree of descriptive adequacy, I have a preference for the second option mentioned above. According to this alternative, Italian traces are either locally bound, or bound in a domain defined by any number of Vs. In this way, we do not have to complicate the domain definitions given under (5): domains are either strictly local or recursively expanded to the extent that a dynasty (of some type) can be formed.
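The two-V hypothesis applied to (64) and (65) above (the stipulated upper bound, not the alternative the text prefers) can be sketched as a check on a single link. The encoding is an assumption made for illustration: a link is described only by the number of Vs on its path and by whether the lower element is a complement, and a path with at most one V is treated as binding in the minimal domain.

```python
def link_ok(n_vs, is_complement, max_vs=2):
    """Toy check for one link between a trace and its binder.

    n_vs: number of Vs on the path between the two elements
    is_complement: whether the lower element is a complement (only
        complements are taken to trigger a dynasty, following Huang)
    max_vs: stipulated upper bound on the dynasty (two in the text)
    """
    if n_vs <= 1:
        return True                      # treated as bound in the minimal domain
    return is_complement and n_vs <= max_vs


# (65a): rightmost trace to intermediate trace (two Vs), then on to O (one V).
ok_65a = link_ok(2, True) and link_ok(1, False)
# (65b): going via the intermediate trace fails at its link to O (two Vs,
# not a complement); going directly from the rightmost trace takes three Vs.
ok_65b = (link_ok(1, True) and link_ok(2, False)) or link_ok(3, True)
```

With these inputs `ok_65a` is true and `ok_65b` is false, and the three-V path of (64) is rejected in the same way.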
The fact that it is difficult to violate more than one Wh-island in Italian must then be due to complexity factors of various kinds. In fact, there is some evidence for this second option. First of all, decrease of acceptability with multiple Wh-island violations seems to be more gradual than the other approaches suggest (Guglielmo Cinque, personal communication). Furthermore, Rizzi observes that violations of more than one Wh-island lead to more acceptable results if the dynasty (in our terms) involves infinitives. This fact is not easily accounted for if there is
a sharp cut-off point after two bounding nodes or a dynasty of exactly two Vs. In general, it seems to me that domain extensions for traces (or for any other dependent element) are not defined in terms of bounding nodes, but in terms of the nature of the successive domain governors. If this is true, dynasties form one of the primary loci of parametrization among languages. This was already clear from the study of reflexivization in various languages. Thus, domains for reflexives can often be expanded if there is a dynasty of infinitives. Similarly, it has often been observed that Wh-islands can be relatively easily violated in many dialects of English if there is a dynasty of infinitives. Reinhart (1975), for instance, gives the following examples:
(67)
a. What don't you know when to file?
b. What don't you know how long to boil?
c. What don't you know where to put?
Reinhart even gives examples of Wh-island violations from tensed sentences that appear to be acceptable to many speakers: (68)
What books don't you remember who borrowed from you?
Given these facts, one might wonder what the difference between English and Italian really is. S/S' parametrization for Subjacency is obviously not sufficient. And if we accept the dynasty approach, we face the same problem: both English and Italian allow Wh-island violations in certain contexts, so, for both languages we have to define an extended domain. That the dynasty approach is adequate for examples like (67) appears from the fact that we cannot extract adjuncts (as observed in Koster (1978c, 198)). If we reverse the Wh-phrases in (67), the sentences become ungrammatical (cf. the Italian sentences in (62)):
(69)
a. *When don't you know what to file t?
b. *How long don't you know what to boil t?
c. *Where don't you know what to put t?
In short, there appears to be considerable overlap between the Wh-island behavior of English and of Italian. So, again, where are the differences? The following data from Chomsky (1973, 69) seem to be relevant for a potential answer to this question: (70)
a. *What booksi does John know to whomj to give ti tj?
b. *To whomj does John know what booksi to give ti tj?
The judgments given for (67)-(69) seem to hold for most varieties, but with
respect to (70) there might be a dialect split: some speakers accept (70) while others reject these sentences. Note also that the dynasty approach does not rule out (70) in an obvious way. The sentences in (69) were no problem from this point of view, because adjuncts are not governed by Vs so that there was no way to trigger a percolation domain. In (70), in contrast, all traces are governed by the verb give. So, why are the sentences in (70) bad in certain varieties of English? I would like to suggest the following answer to this question. The domain of empty categories is determined by a governor, or, in the case of a dynasty, by a chain of governors. Let us suppose now that there is a uniqueness condition that says that each governor can define the domain of exactly one empty category. If both traces in (70) are governed by the verb give, this uniqueness condition would be violated because the governor would determine the domain of more than one ec (ti and tj). Along similar lines, we would have an explanation for the following ungrammatical sentence: (71)
*Whatj do you wonder whomj to give tj to ej?
The ec corresponding to whom (i.e. ej) must be pro according to our earlier account of ec inside PPs. This entails that there must be a weak dynasty (in the sense of Kayne's CC) that connects this ec to its antecedent whom. This dynasty consists of the successive governors P (to) and V (give). So, the verb give defines (or co-defines) the domain of an ec. But the same verb give also defines - as its governor - the domain of tj. Clearly, this is a violation of the uniqueness condition, which says that a governor can determine the domain of only one ec. We are now able to account for all of the following contrasts (discussed by Chomsky in class lectures, fall 1983): (72)
a. This is a paper Oj that we really need to find someonej [whoj [tj understands ej]]
b. ?*This is a paper Oj that we really need to find someonej [Oj that [we can intimidate tj with ej]]
c. *This is the reason whyj we really need to find someonej [Oj that [we can intimidate tj with this paper ej]]
d. ?*This is a paper Oj that we really need to find someonej [whoj [we can all agree [tj [tj understands ej]]]]
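The uniqueness condition invoked above for (70) and (71) can be sketched as a check that no governor figures in the domain of more than one empty category. The encoding is an assumption made for illustration: each ec is mapped to the list of governors that define or co-define its domain, and the function name is hypothetical.

```python
from collections import defaultdict


def uniqueness_violations(domains):
    """Return the governors that define (or co-define) the domain of more
    than one ec.

    domains: mapping from an ec label to the governors in its domain or dynasty
    """
    users = defaultdict(set)
    for ec, governors in domains.items():
        for governor in governors:
            users[governor].add(ec)
    return {g: sorted(ecs) for g, ecs in users.items() if len(ecs) > 1}


# (71): the trace of 'what' is governed by 'give'; the dynasty of the ec
# after 'to' runs through 'to' and 'give'.
example_71 = {"t": ["give"], "e": ["to", "give"]}
print(uniqueness_violations(example_71))  # {'give': ['e', 't']}
```

An empty result would mean that every governor determines at most one domain, which is what the condition requires.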
The ungrammaticality of sentence (72c) is explained by the Cinque-Obenauer theory discussed before. Since the ej is not bound in its minimal domain, it must be a pro. But only NPs can be pro, so that the adjunct why does not qualify as an element that binds pro. The other two ungrammatical sentences are explained by the uniqueness condition. In (72b), the verb intimidate defines two domains, namely the domains of tj and ej (the dynasty of ej involves the verb, which also
defines the domain of tj). Similarly, in (72d) the verb agree co-determines the domains of the intermediate trace tj, and of ej. In contrast, the verb understand in (72a) only determines the domain of ej. The domain of tj is not determined by any lexical governor at all, but by the antecedent governor whoj in COMP or by INFL. It is, in other words, not necessary under the uniqueness explanation to adopt the idea of Leland George that who remains in its (D-structure) subject position at S-structure (followed only by movement to COMP at LF). Let us assume, then, that the domain of A'-bound empty categories is always determined by a unique governor. The uniqueness principle suggests a new solution for the English-Italian difference (to the extent that there is a difference). As we saw before, most varieties of English allow some form of Wh-island violation (see (67)-(68)). Problematic is the impossibility of the sentences in (70) in some varieties. In these sentences, the VP contains two traces, in apparent violation of the uniqueness principle. The same problem arises in the sentences discussed by Rizzi, particularly in examples like (59) (repeated here for convenience): (73)
Tuo fratello, a cuij mi domando che storiei abbiano raccontato ti tj era molto preoccupato
This type of sentence also has two traces in the VP. How can we reconcile this with the uniqueness principle? We would not like to parametrize the uniqueness principle itself. It looks natural, and it loses its force if it is not universal: something is unique or it is not. I would therefore like to propose an alternative solution that parametrizes not the uniqueness principle but the scope of governors. Consider the structure of the predicate in English: (74)
[Tree diagram: [[VP [V' V NP] PP] Adjunct], with NP = what, PP = to whom, Adjunct = when]
The adjunct is weakly governed by the adjacent VP; the PP is governed by V', and the NP by V. Alternatively, it might be the case that the V governs both the NP and the PP. Suppose now that this is a matter of parametrization: in some languages (like in some varieties of English) the V governs both the NP and the PP, while in other languages (like Italian) the NP is governed by V and the PP by V'. We have in principle, then, a way to circumvent the uniqueness principle if the domain extensions for traces are determined by a dynasty of Vs or V's. Take the following example again:
(75)
To whomj do you wonder [whati [to [VP [V' [V give] ti] tj]]]
If only the embedded V governs, the dynasty for tj is