SYNTAX AND SEMANTICS VOLUME 32
EDITORIAL BOARD
Series Editors BRIAN D. JOSEPH AND CARL POLLARD Department of Linguistics The Ohio State University Columbus, Ohio
Editorial Advisory Board JUDITH AISSEN University of California, Santa Cruz
PAULINE JACOBSON Brown University
PETER CULICOVER The Ohio State University
MANFRED KRIFKA University of Texas
ELISABET ENGDAHL University of Gothenburg
WILLIAM A. LADUSAW University of California, Santa Cruz
JANET FODOR City University of New York
BARBARA H. PARTEE University of Massachusetts
ERHARD HINRICHS University of Tübingen
PAUL M. POSTAL Scarsdale, New York
A list of titles in this series appears at the end of this book.
SYNTAX and SEMANTICS VOLUME 32 The Nature and Function of Syntactic Categories Edited by
Robert D. Borsley Department of Linguistics University of Wales Bangor, Wales
ACADEMIC PRESS San Diego London Boston New York Sydney Tokyo Toronto
This book is printed on acid-free paper. Copyright © 2000 by ACADEMIC PRESS All Rights Reserved. No parts of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the Publisher. The appearance of the code at the bottom of the first page of a chapter in this book indicates the Publisher's consent that copies of the chapter may be made for personal or internal use of specific clients. This consent is given on the condition, however, that the copier pay the stated per copy fee through the Copyright Clearance Center, Inc. (222 Rosewood Drive, Danvers, Massachusetts 01923), for copying beyond that permitted by Sections 107 or 108 of the U.S. Copyright Law. This consent does not extend to other kinds of copying, such as copying for general distribution, for advertising or promotional purposes, for creating new collective works, or for resale. Copy fees for pre-1998 chapters are as shown on the title pages, if no fee code appears on the title page, the copy fee is the same as for current chapters. 0092-4563/99 $30.00
Academic Press A Division of Harcourt, Inc. 525 B Street, Suite 1900, San Diego, CA 92101-4495 http://www.apnet.com Academic Press 24-28 Oval Road, London NW1 7DX http://www.hbuk.co.uk/ap/ International Standard Book Number: 0-12-613532-0 PRINTED IN THE UNITED STATES OF AMERICA 99 00 01 02 03 04 BB 9 8 7 6 5 4 3 2 1
CONTENTS
Contributors  ix
Introduction  1
ROBERT D. BORSLEY
    1. Some Background  1
    2. The Chapters  3
    References  6
Grammar without Functional Categories  7
RICHARD HUDSON
    1. Introduction  7
    2. Functional Categories  7
    3. Complementizer  10
    4. Pronoun  15
    5. Valency and Its Irrelevance to Classification  19
    6. Determiner  22
    7. FWCs as Classes of Function Words  25
    8. FWCs as Closed Classes  28
    9. Grammar without FWCs  30
    References  34
Functional versus Lexical: A Cognitive Dichotomy  37
RONNIE CANN
    1. Introduction  37
    2. Characterizing Functional Expressions  39
    3. The Psycholinguistic Evidence  51
    4. Categorizing Functional Expressions  59
    5. Conclusion  70
    References  74
Feature Checking under Adjacency and VSO Clause Structure  79
DAVID ADGER
    1. Introduction  79
    2. Feature Checking  80
    3. Subject Positions in Irish and Scottish Gaelic  87
    4. Conclusions  97
    References  99
Mixed Extended Projections  101
ROBERT D. BORSLEY AND JAKLIN KORNFILT
    1. Introduction  101
    2. A Proposal  102
    3. Some Constructions  104
    4. Some Alternative Approaches  117
    5. Some Impossible Structures  120
    6. Some Other Analyses  123
    7. A Further Issue  125
    8. Conclusions  126
    References  129
Verbal Gerunds as Mixed Categories in Head-Driven Phrase Structure Grammar  133
ROBERT MALOUF
    1. Introduction  133
    2. Properties of Verbal Gerunds  135
    3. Previous Analyses  140
    4. Theoretical Preliminaries  148
    5. A Mixed Lexical Category Analysis of Verbal Gerunds  152
    6. Conclusion  163
    References  164
English Auxiliaries without Lexical Rules  167
ANTHONY WARNER
    1. Introduction  167
    2. Auxiliary Constructions in Head-Driven Phrase Structure Grammar  168
    3. Negation  175
    4. Subject-Auxiliary Inversion  194
    5. Linear Precedence  201
    6. Postauxiliary Ellipsis  202
    7. Conclusion  211
    References  218
The Discrete Nature of Syntactic Categories: Against a Prototype-Based Account  221
FREDERICK J. NEWMEYER
    1. Prototypes, Fuzzy Categories, and Grammatical Theory  221
    2. Prototype Theory and Syntactic Categories  226
    3. Prototypicality and Paradigmatic Complexity  228
    4. The Nonexistence of Fuzzy Categories  242
    5. Conclusion  245
    References  247
Syntactic Computation as Labeled Deduction: WH a Case Study  251
RUTH KEMPSON, WILFRIED MEYER VIOL, AND DOV GABBAY
    1. The Question  251
    2. The Proposed Answer  256
    3. The Dynamics  264
    4. Crossover: The Basic Restriction  272
    5. Towards a Typology for Wh-Construal  278
    6. Conclusion  287
    References  291
Finiteness and Second Position in Long Verb Movement Languages: Breton and Slavic  295
MARIA-LUISA RIVERO
    1. PF Conditions on Tense  296
    2. Long Verb Movement versus Verb Second  297
    3. Breton  304
    4. South and West Slavic  312
    5. Summary and Conclusions  318
    References  321
French Word Order and Lexical Weight  325
ANNE ABEILLÉ AND DANIÈLE GODARD
    1. Introduction  325
    2. The Order of Complements in the VP  326
    3. A Feature-Based Treatment  329
    4. The Position of Adjectives in the NP  338
    5. Ordering Adverbs in the VP  346
    6. Conclusion  354
    References  358
Index  361
CONTRIBUTORS
Numbers in parentheses indicate the pages on which authors' contributions begin.
Anne Abeillé (325), IUF, Université Paris, UFRL, Paris, France
David Adger (79), Department of Language and Language Science, University of York, Heslington, York, United Kingdom, YO1 5DD
Robert D. Borsley (1, 101), Linguistics Department, University of Wales, Bangor, Wales, United Kingdom, LL57 2DG
Ronnie Cann (37), Department of Linguistics, University of Edinburgh, Edinburgh, United Kingdom, EH8 9LL
Dov Gabbay (251), Department of Computing, Kings College London, London, United Kingdom, WC2R 2LS
Danièle Godard (325), CNRS, Université Lille 3, Villeneuve d'Ascq, France
Richard Hudson (7), Linguistics Department, University College, London, London, United Kingdom, WC1E 6BT
Ruth Kempson (251), Department of Philosophy, Kings College London, University of London, London, United Kingdom, WC1H 0XG
Jaklin Kornfilt (101), Department of FLL, Syracuse University, Syracuse, New York 13244
Robert Malouf (133), Stanford University, Stanford, California and University of California, Berkeley
Frederick J. Newmeyer (221), Department of Linguistics, University of Washington, Seattle, Washington 98195
Maria-Luisa Rivero (295), Department of Linguistics, University of Ottawa, Ottawa, Ontario, Canada K1N 6N5
Wilfried Meyer Viol (251), Department of Computing, Kings College London, London, United Kingdom, WC2R 2LS
Anthony Warner (167), Department of Language and Linguistic Science, University of York, Heslington, York, United Kingdom, YO1 5DD
INTRODUCTION ROBERT D. BORSLEY Linguistics Department University of Wales Bangor, Wales
For any theory of syntax, major questions arise about its classificatory scheme. What sort of syntactic categories does it assume? What properties do they have? How do they relate to each other?1 The questions are prominent in different ways in two of the main contemporary theories of syntax, Principles and Parameters theory (P&P) and Head-driven Phrase Structure Grammar (HPSG), but they are also important in other theoretical frameworks. This book brings together ten chapters that discuss questions that arise in connection with the nature and function of syntactic categories. The book has its origins in a conference held at the University of Wales, Bangor, in June 1996, where earlier versions of all but three of the papers included here, those of Malouf, Rivero, and Warner, were presented.2 In this introduction I will sketch some background and then introduce each chapter.
1. SOME BACKGROUND
The idea that syntactic categories are complex entities related in various ways is implicit in traditional discussion of grammar, where labels like "masculine singular noun" and "feminine plural noun" are employed. In spite of this, the earliest work in generative grammar assumed simple atomic categories and had no theory of syntactic categories.3 The work of the 1960s began to lay the basis for a theory of syntactic categories. The idea that syntactic categories are complex entities was a central feature of Harman (1963), and it gained general acceptance after it was adopted by Chomsky (1965). However, Chomsky proposed not that all syntactic categories are complex but only that lexical categories like noun and verb are. He used features to provide a more refined classification of lexical items than is possible with labels like "N" and "V." In particular, he employed features to subclassify verbs, to distinguish, for example, between verbs like die, which take no complements, and verbs like kill, which take a noun phrase (NP) complement, marking the former as [+_#] and the latter as [+_NP]. He argued that "There is apparently no motivation for allowing complex symbols to appear above the level of lexical categories" (1965:188). Chomsky later abandoned this position (Chomsky, 1970) and proposed that all categories are "sets of features" (1970:49). In this work, he laid the foundations for X-bar theory, which was in part a theory of syntactic categories. Whereas Chomsky (1965) provided a way of recognizing subclasses of certain lexical classes, X-bar theory, by breaking up categories into a basic categorial component and a bar level, provided a way of recognizing certain superclasses of expressions. Thus, an NP like the picture of Mary, an N' like picture of Mary, and an N like picture are all identified as nominal expressions. Later work in X-bar theory analyzed nominal, verbal, adjectival, and prepositional expressions in terms of the features ±N and ±V and recognized further intersecting superclasses. Thus, all nominal and adjectival expressions are +N, and all nominal and prepositional expressions are −V. A rather different analysis of nominal, verbal, and so on was advanced in Jackendoff (1977). Thus, much was unclear. It was generally accepted, however, that syntactic categories are complex entities "going together" in various ways.
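The decomposition of categories into a categorial component and a bar level can be pictured with a small sketch. The Python below is purely illustrative and is not drawn from any of the works cited: the names `Category`, `MAJOR_CLASSES`, `category`, `same_categorial_component`, and `natural_class` are hypothetical, and the feature values assume the assignment implied by the text, in which nouns are [+N, −V], adjectives [+N, +V], prepositions [−N, −V], and verbs [−N, +V].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Category:
    """A category as a bundle of features rather than an atomic label."""
    n: bool    # True for +N, False for -N
    v: bool    # True for +V, False for -V
    bar: int   # 0 = head (N), 1 = intermediate (N'), 2 = full phrase (NP)


# Categorial component of the four major classes (assumed assignment).
MAJOR_CLASSES = {
    "noun": (True, False),
    "verb": (False, True),
    "adjective": (True, True),
    "preposition": (False, False),
}


def category(name, bar=0):
    """Build a Category for one of the four major classes."""
    n, v = MAJOR_CLASSES[name]
    return Category(n=n, v=v, bar=bar)


def same_categorial_component(a, b):
    """True when two expressions differ at most in bar level."""
    return (a.n, a.v) == (b.n, b.v)


def natural_class(n=None, v=None):
    """Sorted names of the major classes matching the given feature values.

    A value of None leaves that feature unspecified.
    """
    return sorted(
        name
        for name, (cn, cv) in MAJOR_CLASSES.items()
        if (n is None or cn == n) and (v is None or cv == v)
    )
```

On these assumptions, `natural_class(n=True)` picks out nouns and adjectives, `natural_class(v=False)` picks out nouns and prepositions, and `same_categorial_component(category("noun", 2), category("noun", 0))` holds because an NP and an N share the nominal component while differing in bar level.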
A number of important ideas about categories have developed since 1980. Within P&P, a distinction has been drawn between lexical categories like noun and verb and functional categories like complementizer and determiner. Chomsky (1986) proposed that functional categories head phrases in just the same way as lexical categories. Subsequent work proposed a large number of abstract phonologically empty functional categories. (See Webelhuth, 1995, for a useful list.) Such categories generally play a role in licensing certain kinds of morphology or act as landing sites for movement processes and thus account for certain word-order facts. For example, it is widely assumed that English aspectual auxiliaries precede the negative particle not because they are moved to a T(ense) functional category, whereas lexical verbs follow because they remain in situ. Similarly, it is assumed that the different position of finite verbs in main and subordinate clauses in German is a result of their movement to C in main clauses. In much the same way, Cinque (1994) proposes that Italian NPs have the order noun + adjective because nouns move to a functional head, whereas English NPs have the order adjective + noun because nouns remain in situ. Thus, functional categories account for differences in the distribution of members of the same broad category
either within a single language or across languages. Given the proliferation of functional categories in P&P work, it is natural to ask whether they can be classified in some way. An important idea here is Grimshaw's (1991) proposal that functional categories are associated with specific lexical categories. Thus, C(omplementizer) and T(ense) are verbal categories, whereas D(eterminer) and Num(ber) are nominal categories.4 One point that we should stress here is that P&P ideas in this area go well beyond the claim that there is a significant distinction between lexical and functional expressions. Hence one might accept this claim without accepting many of the other P&P ideas. Also since 1980 the descriptive and explanatory potential of complex categories has been explored first within Generalized Phrase Structure Grammar (GPSG) and then within HPSG. Thus, early work in GPSG showed how a category-valued feature SLASH permitted an interesting account of long-distance dependencies. Subsequent work in HPSG showed the value of features with feature groups and lists and sets of various kinds as their value. Hence, whereas P&P has assumed a large number of relatively simple categories, GPSG and HPSG have assumed a smaller number of more complex categories. The GPSG/HPSG conception of syntactic categories naturally leads to different analyses of many syntactic phenomena. In particular, it leads to rather different approaches to morphology and linear order. Clearly, there are major questions about the relation between these two different conceptions of syntactic categories. Other work since 1980 has focused on the relation between syntactic and semantic information. This has been a central concern for work in various versions of categorial grammar, which assumes a very close relation between syntactic and semantic categories. 
The syntax-semantics relation has also been central for cognitive linguistics, which also, although in a very different way, assumes a close connection between syntactic and semantic categories. The relation between syntactic and semantic information has also been an important concern for GPSG and HPSG. It is hoped that this brief sketch makes it clear that syntacticians have developed a rich body of ideas about syntactic categories and raised a variety of important questions. The chapters in this volume explore some of these ideas and address some of the questions.
2. THE CHAPTERS
A number of the chapters in the volume are concerned with functional categories. Chapters by Hudson and Cann consider whether there is a clear distinction between lexical and functional categories. Hudson argues against this idea, focusing in particular on determiners and complementizers. He argues that determiners
are pronouns and hence a subclass of nouns and not a functional category, and that the main putative examples of complementizers, that, for, and if, do not form a natural class, but are syncategorematic words, which are the sole member of a unique category. If his arguments are sound, they cast some doubt on an important element of P&P theorizing. Cann considers the distinction between lexical and functional categories from both a linguistic and a psycholinguistic point of view. He argues that there are no necessary or sufficient linguistic conditions that identify an expression as being of one type or the other. This suggests that the difference between them is not categorial. He argues, however, that evidence from processing, acquisition, and breakdown suggests that the distinction is categorial. He goes on to argue that the contrast between the linguistic and the psycholinguistic evidence reflects a difference between properties of E-language and properties of I-language and that the functional/lexical distinction holds of the former but not necessarily of the latter. He then develops a theory of categorization that incorporates this idea. For those who assume a distinction between lexical and functional categories, a variety of questions arise. For example, there are questions about what sort of functional categories should be assumed. In particular, a question arises about whether it is necessary to assume functional categories with no semantic import. Adger's chapter addresses this issue, and he argues that a proper consideration of the way that syntax interfaces with other components of grammar obviates this need. He illustrates this with a case study of subject licensing in Scottish Gaelic and Modern Irish. The consensus in the literature is that this requires the postulation of functional categories with no semantic import. 
Adger shows that an alternative approach, which licenses the subject via a morphological process, allows the elimination of such abstract functional categories. Questions also arise about the relation between lexical and functional categories. As we noted earlier, Grimshaw proposes that some functional categories are inherently nominal and others inherently verbal. She also proposes that there are no "mixed extended projections," in which a functional category occurs not with the associated lexical category but with some other lexical category. Borsley and Kornfilt argue against this claim. They consider a variety of constructions in a variety of languages that display a mix of nominal and verbal properties, and argue that these constructions should be analyzed as structures in which a verb is associated with one or more nominal functional categories instead of or in addition to the normal verbal functional categories. If this is right, then Grimshaw's claim is too strong. A very different approach to one of the constructions discussed by Borsley and Kornfilt, the English poss-ing construction, is developed by Malouf. This utilizes the hierarchical lexicon of HPSG to analyze verbal gerunds as both nouns, a category that also includes common nouns, and verbals, a category that also includes verbs and adjectives. He also shows how this approach can be extended to the English acc-ing construction.
The hierarchical lexicon of HPSG is also exploited in Warner's chapter. Warner develops a detailed HPSG analysis of English auxiliaries, dealing in particular with negation, inversion, and ellipsis. He shows that the complex array of data in this domain can be accommodated through inheritance and that there is no need for lexical rules. As noted above, an important question about syntactic categories is how they relate to semantic categories. This is the main concern of Newmeyer's chapter. He focuses on the idea central to cognitive linguistics that categories have "best case" members and members that systematically depart from the "best case" and that the optimal grammatical description of morphosyntactic processes makes reference to the degree of categorial deviation from the "best case." He argues that the phenomena that have been seen to support this view are better explained in terms of the interaction of independently needed principles from syntax, semantics, and pragmatics. The relation between syntax and semantics is a major concern in the area of wh-questions, which are the focus of Kempson, Meyer-Viol, and Gabbay's chapter. They are concerned with why wh-questions have the properties that they do: long-distance dependencies, wh-in situ, partial movement constructions, reconstruction, crossover, and so on. They argue that this array of properties can be explained within a model of natural language understanding in context, where the task of understanding is taken to be the incremental building of a structure over which the semantic content is defined. The model involves a dynamic concept of syntax rather different from that assumed in the other chapters. Questions also arise about the relation between syntactic and morphological information. The role of certain morphological features in syntax is a central concern for P&P work. Rivero's chapter focuses on one aspect of this. 
She is concerned in particular with the licensing of Tense in Breton and South and West Slavic languages, which have main clauses in which an untensed verb precedes a tensed auxiliary. She argues that this is the result of Long Verb Movement, which is triggered by a PF interface condition. A very different approach to word order phenomena is developed in Abeillé and Godard's chapter on French. They argue that a variety of French word-order facts cannot be captured using only functional or categorial distinctions but also require a distinction in terms of weight. In addition to the traditional heavy constituents (which have to come last in their syntactic domain), they propose to distinguish between light constituents (consisting of certain words used bare or in minor phrases) that tend to cluster with the head, and middle weight constituents (including typical phrases) that allow for more permutations. They capture these distinctions within HPSG with a ternary-valued WEIGHT feature. The chapters collected here obviously do not discuss all the questions that arise about syntactic categories, but they do discuss many of the most important issues.
Above all, they highlight the centrality of questions about syntactic categories for a number of different frameworks.
NOTES
1 I am grateful to Anne Abeillé, David Adger, Ruth Kempson, Fritz Newmeyer, Marisa Rivero, and Anthony Warner for helpful comments on this introduction.
2 I am grateful to the British Academy for help with the funding of the conference and to Ian Roberts for help with the organization.
3 Gazdar and Mellish (1989:141) trace the idea that syntactic categories are complex back to Yngve (1958).
4 A rather similar conception of functional categories is developed in Netter (1994).
REFERENCES
Chomsky, N. A. (1965). Aspects of the theory of syntax. Cambridge, MA: MIT Press.
Chomsky, N. A. (1970). Remarks on nominalization. In R. Jacobs and P. S. Rosenbaum (Eds.), Readings in English transformational grammar. Waltham, MA: Ginn and Co.
Chomsky, N. A. (1986). Barriers. Cambridge, MA: MIT Press.
Cinque, G. (1994). On the evidence for partial N-movement in the Romance DP. In G. Cinque, J. Koster, J.-Y. Pollock, L. Rizzi, and R. Zanuttini (Eds.), Paths towards universal grammar: Studies in honor of Richard S. Kayne. Washington, DC: Georgetown University Press.
Gazdar, G., and C. Mellish (1989). Natural language processing in PROLOG: An introduction to computational linguistics. New York: Addison Wesley.
Grimshaw, J. (1991). Extended projection. Unpublished manuscript, Brandeis University, Waltham, MA.
Harman, G. (1963). Generative grammar without transformational rules: A defense of phrase structure. Language, 39, 597-616.
Jackendoff, R. S. (1977). X'-syntax: A study of phrase structure. Cambridge, MA: MIT Press.
Netter, K. (1994). Towards a theory of functional heads: German nominal phrases. In J. Nerbonne, K. Netter, and C. Pollard (Eds.), German grammar in HPSG, CSLI, (297-340). Stanford, CA: Stanford University Press.
Webelhuth, G. (1995). X-bar theory and case theory. In G. Webelhuth (Ed.), Government and binding theory and the minimalist program, (15-95). Oxford: Blackwell.
Yngve, V. (1958). A programming language for mechanical translation. Mechanical Translation, 5, 25-41.
GRAMMAR WITHOUT FUNCTIONAL CATEGORIES RICHARD HUDSON Linguistics Department University College, London London, United Kingdom
1. INTRODUCTION
The chapter considers the notion functional category and concludes that, at least as far as overt words are concerned, the notion is ill founded. First, none of the definitions that have been offered (in terms of function words, closed classes, or nonthematicity) are satisfactory, because they either define a continuum when we need a sharp binary distinction, or they conflict with the standard examples. Second, the two most commonly quoted examples of word classes that are functional categories cannot even be justified as word classes. Complementizers (Comp) have no distinctive and shared characteristic, and Determiners are all pronouns that are distinguished only by taking a common noun as complement, a distinction that is better handled in terms of lexical valency than in terms of a word class.
2. FUNCTIONAL CATEGORIES
The notion functional category1 has played a major part in discussions of syntactic theory. For example, Chomsky introduces it as follows:
    Virtually all items of the lexicon belong to the substantive categories, which we will take to be noun, verb, adjective and particle, . . . The other categories we will call functional (tense, complementizer, etc.). (Chomsky, 1995:6)
He later suggests that only functional categories carry strong features (Chomsky, 1995:232), and that they "have a central place in the conception of language . . . primarily because of their presumed role in feature checking, which is what drives Attract/Move" (Chomsky, 1995:349). Similarly, it has been suggested that functional categories cannot assign theta-roles (Abney, 1987; Radford, 1997:328), and that they can constitute the "extended projection" of their complement's lexical category (Grimshaw, 1991; Borsley and Kornfilt, this volume). According to the Functional Parameterization Hypothesis, functional categories are the special locus of the parameters that distinguish the grammars of different languages (Atkinson, 1994:2942; Ouhalla, 1991; Pollock, 1989; Smith and Tsimpli, 1995:24), and Radford (1990) has suggested that they are missing from child language. Such claims have not been restricted to the Chomskyan school: In Head-driven Phrase Structure Grammar (HPSG) we find the suggestion that only functional categories may act as "markers" (Pollard and Sag, 1994:45), and in Lexical Functional Grammar (LFG) that functional categories always correspond to the same part of f-structure as their complements (Bresnan, this volume). Any notion as important as Functional Category2 should be subjected to the most rigorous scrutiny, but this seems not to have happened to this particular construct. Instead it has been accepted more or less without question, and has become part of mainstream theorizing simply through frequent mention by leading figures. I suggest in this chapter that the notion is in fact deeply problematic. The attempts that have been made to define it are flawed, and all the individual categories that have been given as examples present serious problems. The issues raised here should at least be considered by proponents of the notion.
If the criticisms are well founded, the consequences for syntactic theory are serious; but even if these worries turn out to be groundless, the debate will have made this key notion that much clearer and stronger. To avoid confusion it is important to distinguish three kinds of category, which we can call Word Category, Subword Category, and Position Category. Word categories are simply word classes: Noun, Determiner, and so on. Every theory accepts that there are words and that these fall into various classes, so Word Category is uncontroversial even if the validity of particular word categories is debatable. Subword categories are elements of syntactic structure that (in surface structure) are smaller than words: morphemes or zero. (Clitics are on the border between these types, but it makes no difference here whether we classify them as belonging to word or subword categories.) The obvious example of a subword category is
inflection (INFL), to the extent that this corresponds merely to the verb's inflection or to zero. It is a matter of debate whether subword categories have any place at all in syntactic theory, and most theories at least restrict their use (e.g., by Pullum and Zwicky's principle of Morphology-free Syntax—Zwicky, 1994:4477). This issue is orthogonal to the questions about functional categories that I wish to raise here, so I shall avoid it by focusing on word categories. Position categories are a further extension of word and subword categories, where the name of the category is used to label a structural position. For example, the standard Barriers analysis of clause structure recognizes C and I as positions in an abstract tree structure. The labels C and I are abbreviations of Comp (for Complementizer) and INFL (for Inflection), but the link to the original word and subword categories is broken because these positions may be either empty or filled by a verb—which is not, of course, classified inherently as a complementizer or inflection, even if the relevant feature structures overlap. Such position categories are also controversial and raise problems of both fact (Hudson, 1995) and theory that go beyond the scope of this chapter. The central question to be addressed, therefore, is the status of the construct Functional Word Category (FWC), rather than the more general question of functional categories. Given this focus it is important to acknowledge that subword and position categories are also central to the discussion of functional categories. The conclusion of this chapter is that FWC is not justified, but even if this conclusion is correct, it will still remain possible that some subword and position categories are functional. I shall argue, for example, that Complementizer is not a valid word category, but it could still be true that the position category C is valid. 
On the other hand, FWCs are part of the evidence that is normally adduced in support of the more abstract categories, so anything that casts doubt on the former must affect the credibility of the latter. This chapter moves from the particular to the general. Sections 3 and 6 will discuss the categories Complementizer and Determiner, which are among the most frequently quoted examples of FWC. The discussion of Pronoun in section 4 is needed as a preparation for the proposed analysis of determiners, as is the general theorizing about valency and classification in section 5. The conclusion of these sections will be that neither Complementizer nor Determiner is a valid word class, so (a fortiori) neither can be a FWC. Sections 7 and 8 will consider two standard definitions of FWC: as a class of function words and as a closed class. It will be argued that Function Word is indeed an important and valid construct with a good deal of empirical support, and similarly (but to a lesser extent) for Closed Class. However, I shall also show that neither of these two concepts is suitable as a basis for FWC. The conclusion, in section 9, will be that FWC plays no part in grammar, though there may be a small role for Function Word. Encouragingly, Cann (this volume) reaches a similar conclusion by a different route.
3. COMPLEMENTIZER
The following argument rests in part on a general principle of categorization that should be laid out before I proceed. The principle amounts to no more than Occam's razor, so it should be sufficiently bland to be acceptable regardless of theoretical inclinations.
(1) Principle 1: A word class should be recognized only if it allows generalizations that would not otherwise be possible.
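As a rough illustration of what the principle asks for, the sketch below contrasts a grammar that lists separately the words able to head a subject and those able to head an object with one that states both facts over a single class. It is written in Python with an invented five-word lexicon; every name in it (`WORD_CLASS`, `SUBJECT_HEADS`, `OBJECT_HEADS`, `ALLOWED_HEAD_CLASS`, `can_head`, `words_of_class`) is hypothetical and serves only to make the point concrete.

```python
# Toy lexicon; the words and class labels are illustrative assumptions.
WORD_CLASS = {
    "dog": "Noun",
    "idea": "Noun",
    "book": "Noun",
    "sing": "Verb",
    "happy": "Adjective",
}

# Without a word class, each rule needs its own list of words, and nothing
# in the grammar says that the two lists are the same.
SUBJECT_HEADS = {"dog", "idea", "book"}
OBJECT_HEADS = {"dog", "idea", "book"}

# With the class Noun, both rules refer to one category, so the identity
# of the two lists is stated once instead of being an accident.
ALLOWED_HEAD_CLASS = {"subject": "Noun", "object": "Noun"}


def can_head(word, function):
    """True if the word belongs to the class that the function requires."""
    return WORD_CLASS.get(word) == ALLOWED_HEAD_CLASS[function]


def words_of_class(word_class):
    """All words in the toy lexicon that carry the given class label."""
    return {w for w, c in WORD_CLASS.items() if c == word_class}
```

In this toy setting `words_of_class("Noun")` coincides with both hand-written lists, which is the kind of generalization a class must deliver to earn its place; a label whose members shared no such rule would do no work.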
The classic word classes satisfy this principle well. Take Noun, for example. Without it, one could say that some words can head a verb's subject, and that some words can head its object, but in each case one would have to simply list all the words concerned. Given the category Noun, however, we can express the generalization that the lists are the same—not to mention the lists needed for many other facts about distribution, morphology, and semantics. Similarly for Auxiliary Verb, a word class defined by a collection of characteristics that include negation, inversion, contraction, and ellipsis. Without this word class it would not be possible to show that these characteristics all applied to the same list of words. In contrast with these very well-established classes, some traditional word classes have a rather uncertain status, with Adverb as the classic case of a "dustbin" that has very few characteristics of its own, though probably enough to justify it among the major word classes. In short, every word class must earn its place by doing some work in the grammar. How does the word class Complementizer fare when tested against this principle? The history of this class is not encouraging, as its very existence escaped the notice of traditional grammarians; if it really does allow generalizations that would not otherwise be possible, how did traditional grammar manage without it? Even the name Complementizer suggests some uncertainty about the distinctive characteristics of its members: Do they introduce complement clauses or subordinate clauses in general? In English, the words concerned are (according to Radford, 1997:54) that, if and for. Every introductory book tells us that these form a class, with the possible addition of whether, but what precisely are the generalizations that this class allows? The answer seems to be that there are no such generalizations. 
This claim is controversial and requires justification, but before we consider the evidence I should reiterate that we are discussing the "word category," whose members are overt words, and not the "position category," which includes the structural position "C." I have argued elsewhere (Hudson, 1995) that this category is invalid as well, but that is a separate debate.3 What, then, do all the three core complementizers have in common? As Radford
points out (1997:54), they can all introduce a subordinate clause that is the complement of a higher verb or adjective. His examples are the following: (2) a. I think [that you may be right]. b. I doubt [if you can help]. c. I'm anxious [for you to receive the best treatment possible]. Radford's generalization is that complementizers: A. indicate that the following clause is a complement of some other word, B. show whether this clause is finite, and C. mark its semantic role in the higher clause (which Radford calls its illocutionary force). Unfortunately these characteristics do not justify Complementizer, as we shall now see.
• Claim A (indicating complement-hood) is false because the clause introduced by a complementizer need not be the complement of another word. That and for allow a subject link: (3) a. [That you may be right] is beyond doubt. b. [For you to receive the best treatment possible] is important. Moreover, for also allows an adjunct link: (4) a. I bought it [for you to wear]. b. A good book [for you to read] is this one. According to standard analyses of relative clauses, the same is even true of that, which is assumed to occur in combination with a zero relative pronoun (Radford, 1997:305): (5) He is someone [that we can identify with]. Furthermore, although it is true that all the complementizers may be used to introduce a complement clause, the same is also true of words that are not complementizers, most obviously the interrogative words. (6) a. I wonder [who came]. b. I know [what happened]. It is true that standard analyses assume a zero complementizer in addition to the interrogative word, but the claim is that complementizers indicate the clause's function, which must be a claim about overt words.
• Claim B (indicating finiteness) is true, but again not unique to complementizers. The same is in fact true of every word that can introduce a clause: there
is no word that allows a clause as its complement without placing some kind of restriction on its finiteness. For example, why requires a tensed clause, whereas how allows either a tensed clause or an infinitival: (7) a. I wonder [how/why he did it]. b. I wonder [how/*why to do it]. Similar remarks apply to all the traditional subordinating conjunctions, such as while, because, and unless, none of which are generally considered to be complementizers.
• Claim C (indicating semantic role) is only partially true, as Radford's own second example illustrates: after doubt either that or if is possible without change of meaning. (8) I doubt [if/that you can help]. Moreover, to the extent that it is true, this characteristic is again not peculiar to complementizers. Most obviously, the same is (again) true of interrogative pronouns.
Having considered and rejected Radford's generalizations, we should consider whether there are any other generalizations that might justify Complementizer. A plausible candidate concerns extraposition: all the complementizers allow extraposition. (9) a. It surprises me [that John is late]. b. It is unclear [if it rained]. c. It surprises me [for John to be late]. If Complementizer were valid, this should be the end of the possibilities, but it is not. The same is also true for TO (which is not a complementizer) and for all the interrogative words, including whether: (10) a. It surprises me to see John here. b. It is unclear whether/when it rained. Indeed, extraposition is even possible for some noun-headed phrases, such as those containing nouns like WAY (but not MANNER) and NUMBER: (11) a. It is astonishing the way/*manner she drinks. b. It is astonishing the number of beers she can drink. These nouns can only be extraposed if they are modified by what is at least syntactically a relative clause: (12) a. *It is astonishing the clear way. b. *It is astonishing the incredibly large number.
Once again Complementizer does not prove particularly helpful. If there is a single thread running through all the phrases that can be extraposed, it may be semantic rather than syntactic. In short, whatever all three core complementizers have in common does not distinguish them from interrogative words. This suggests that Radford's three claims can and should be handled without mentioning Complementizer. Let us consider how this can be done.
• Claim A. To the extent that complementizers do indicate a complement link between the following clause and some preceding word, this is because the latter selects it as the head of its complement. However, words that select complementizers always select specifically. This is illustrated in Table 1, which shows that think allows that or zero4 but not if or for, and so on. Furthermore, almost every verb that allows if also allows whether and the full range of interrogative pronouns. (The only exception is doubt.) In short, no valency statement will ever say, "such-and-such word takes as its complement a clause introduced by a complementizer."5
• Claim B. Precisely because different complementizers select different tenses, Complementizer as such will not help in constraining the choice of tense inflection. This selection must be handled separately for different complementizers: tensed or subjunctive6 after that, tensed after if, to after for. (13) a. I know that Pat is/*be/*to be leader. b. I recommend that Pat is/be/*to be leader. c. I wonder if Pat is/*be/*to be leader. d. I long for Pat to be/*is/*be leader.
• Claim C. The same logic applies here, too. Different complementizers indicate different semantic roles, so verbs will select specific complementizers rather than the general category Complementizer. As mentioned above, almost every verb that selects if also allows any interrogative word, which makes Complementizer even less relevant to semantic selection.

TABLE 1
SELECTIONAL DIFFERENCES AMONG COMPLEMENTIZERS

                    Complement clause
Verb      that/zero   if (whether, who ...)   for ... to
Think     +           0                       0
Wonder    0           +                       0
Long      0           0                       +
Know      +           +                       0
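The point of Claim A, that no valency statement ever refers to the class as a whole, can be checked mechanically against Table 1. The following is only a minimal sketch: the dictionary layout and the names SELECTS, COMPLEMENTIZER, and verbs_selecting_whole_class are illustrative assumptions, and the data are limited to the four verbs in the table.

```python
# Illustrative encoding of Table 1 (names and layout are assumptions, not the author's notation).
SELECTS = {
    "think":  {"that/zero"},
    "wonder": {"if"},
    "long":   {"for...to"},
    "know":   {"that/zero", "if"},
}

# The putative word class: the three core complementizers.
COMPLEMENTIZER = {"that/zero", "if", "for...to"}


def verbs_selecting_whole_class(selects, word_class):
    """Return the verbs whose valency could be stated as 'takes any member of word_class'."""
    return sorted(verb for verb, allowed in selects.items() if word_class <= allowed)


# No verb in the table accepts all three, so the class does no work in valency statements.
print(verbs_selecting_whole_class(SELECTS, COMPLEMENTIZER))  # []
```

On this encoding each verb's valency has to name the individual words it accepts, which is the situation the text describes.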
In short, Complementizer has no role to play in defining the use of the words that, if, and for. It should be noted that we arrived at this conclusion while considering only the "core" examples, so the status of Complementizer is not likely to be improved by including more peripheral examples like whether. On the contrary, in fact, since whether is even more like the interrogative words than if is. Unlike if, but like interrogative words, it allows a following infinitive and a subject link: (14) a. I wonder [whether/when/*if to go]. b. [Whether/when/*if to go] is the big question. However we analyze whether, it is unlikely that we shall gain by invoking Complementizer. The conclusion, therefore, must be that Principle 1 rules out Complementizer. If these words are not complementizers, what are they? We might consider assigning them individually to existing word classes; for example, Haegeman classifies for as a prepositional complementizer, or more simply as a preposition (1994:167), in recognition of the fact that it licenses an accusative subject NP. But even if this is the right analysis for for, it is certainly not right for that (nor for if, though this is less obvious), and in any case it raises other problems. If for is a preposition, its projection should presumably be a PP and yet it is said to head a CP. Its classification should explain why a for-clause can be used equally easily as complement, as subject, or as adjunct, but no single established category has this distribution. The problems of classifying that and if are similar, but if anything even more acute. The alternative to problematic classification is no classification at all—an analysis in which these words are each treated as unique ("syncategorematic"). This is my preferred analysis, as it reflects exactly the outcome of the earlier discussion in which we found that each word is, in fact, unique. 
Thus that is simply a word, and so are if and for; they are recognized as lexical items, but have no grammatical features and belong to no categories. When the grammar mentions them, it defines them simply as lexical items whose word class is irrelevant. The only complementizer whose classification is at all straightforward is whether, whose similarities to interrogative pronouns have already been pointed out. At least some linguists (e.g., Larson, 1985) argue that it is in fact a wh-pronoun, and I myself agree with this conclusion (Hudson, 1990:374). Even this analysis is problematic, however, because whether, not being a true pronoun, has no grammatical role within its complement clause. In this respect it is just like if and that, as well as all the subordinating conjunctions, so it is at best a highly unusual wh-pronoun. In conclusion, we have found no justification for Complementizer because there seem to be no generalizations that apply to all the core members. This means that it is not enabling the grammar to express any generalizations, so according to
Principle 1, Complementizer does not exist as a category, so (a fortiori) it is not an FWC.
4. PRONOUN

We now make a slight detour from the main discussion in order to establish the controversial claim that pronouns are nouns, which will play an important part in the next section's discussion of determiners. As it happens, Pronoun is itself claimed to be an FWC (Radford, 1997:48-49) on the grounds that pronouns are determiners and that Determiner is an FWC. The status of Pronoun as an FWC is thus tied up with that of Determiner, which is the topic for the next section. If, as I shall argue, Determiner is not an FWC, Pronoun cannot be either. However, the argument there presupposes a specific set of analytical assumptions about the classification of non-standard pronouns: that Pronoun is a subclass of Noun, and that determiners are pronouns. Before we can consider the status of Pronoun as an FWC, therefore, we must attend to these analytical questions. Why should we take Pronoun as a subclass of Noun? Radford's discussion considers only one kind of pronoun, personal pronouns, but it is uncontroversial that there are other subclasses, including reflexive, reciprocal, interrogative, relative, demonstrative, negative, distributive, and compound. These subclasses are presented in Table 2, a reminder that our Pronoun is the traditional word class, not the much smaller category that Chomsky (1995:41) calls Pronoun. His category excludes reflexive and reciprocal pronouns and is claimed always to refer, so it presumably excludes the indefinites as well.

TABLE 2
SUBCLASSES OF PRONOUN

Class           Definiteness   Examples
Personal        Definite       I/me, you, he/him, one(?)
Reflexive       Definite       Myself, yourself, himself
Reciprocal      Definite       Each other, one another
Relative        Definite       Who, which, whose, where, when
Demonstrative   Definite       This/these, that/those
Possessive      Definite       Mine, yours, his; -'sᵃ
Distributive    Definite       Each
Universal       Indefinite     All, both
Existential     Indefinite     Some, any, either
Negative        Indefinite     None, neither
Interrogative   Indefinite     Who, what, which, how, why
Compound        Indefinite     Someone, anybody, nothing, everywhere

ᵃ The analysis of possessive 's is controversial. I shall simply assume that it is a possessive pronoun; for evidence see Hudson (1990:277).

What all these words share is the ability to be used in the range of phrasal environments available for a full NP; for example, they/them has almost exactly the same overall distribution as the students: (15) a. They/the students have finished. b. Have they/the students finished? c. I like them/the students. d. We talked about them/the students. e. I saw them and the students.
There are obvious differences of detail that apply to specific subclasses (e.g., personal pronouns cannot be used before possessive -'s, reflexives cannot be used as subject) but the overall similarity between pronouns and NPs is striking. On the other hand, pronouns are different from common nouns in several ways, but in particular in their inability to combine with a preceding determiner (*the I, *a who, etc.). The range of possible modifiers is also strictly limited compared with common nouns, though some allow adjuncts (someone nice, who that you know), and depending on one's analysis, some allow complements (who came, this book).7 Traditional grammar recognizes Pronoun as a supercategory, one of the basic parts of speech alongside Noun, and links the two classes by the enigmatic statement that pronouns "stand for" nouns. In modern phrase-structure analyses the similarity is shown at the phrase level by giving the same label to the phrases that they head. This label is either DP or NP, depending on analysis, but this choice is crucial to the following argument so we shall keep it open by temporarily adopting the neutral term "Nominal Phrase." Thus they is not only a pronoun but also a nominal phrase, and the students is a nominal phrase. Suppose we accept this analysis, and also the general X-bar principle that a phrase's classification must be that of its head word. Given these two assumptions, what follows for the classification of they and students? (The next section will discuss the classification of the.) We have to choose between two answers: A1 (the standard analysis): They belongs to the same class as the, which (for the time being) we can call "determiner"; we shall revise the name in the next section. A2 (my preferred analysis): They belongs to the same class as students: Noun. The choice between the two analyses revolves around the analysis of the one-word phrase students, where there is no determiner: (16) a. I like students. b. I found students waiting for me.
If students really is the only word in this phrase (as I shall argue), its classification must project to the phrase, which must therefore be an NP, so they must also head an NP and must itself be a noun. If, on the other hand, the phrase students is headed by a covert determiner, it must be a determiner phrase and they must be a determiner. The standard analysis stands or falls with the covert determiner. We shall now consider some arguments for it, and an argument against it. One argument for the covert determiner is that it is required by the DP analysis (Abney, 1987), which is widely accepted. If the is the head of the students, the phrase the students must be a projection of the: DP. Therefore the one-word phrase students must be a DP, with a covert determiner. However, this argument rests on the assumption that the is not a noun. If it were, both phrases would be NPs and there would be no need for a covert determiner. The next section will be devoted to this claim, so I shall simply note at this point that (as far as I know) the category Determiner has always been taken for granted in discussions of the DP hypothesis so there is no "standard" evidence for it. I cannot prove that such evidence does not exist, but I shall prove that a coherent analysis can be built without this assumption. Radford gives some more direct evidence in support of the covert determiner (1997:152). He points out that (if it exists) it selects the same kind of complement as enough,8 namely a plural or mass common noun: (17) a. I found things/stuff/*thing. b. I found enough things/stuff/*thing. It also has an identifiable determiner-like meaning, which is either "existential," as in the above examples, or "generic," as in the following: (18) I collect things/stuff/*thing.
In short, the covert determiner is a normal determiner in its semantic and syntactic characteristics, so its only peculiarity is its lack of phonology. However, this argument is open to various empirical objections: • It is because of the semantics of enough that its complement must be either plural or mass; so we might predict that the word meaning enough in any other language will have the same restriction. In contrast, the restrictions on the hypothetical covert determiner vary between languages—in many languages it would not be as restricted as in English. So even if the covert determiner selects the same kind of complement as some determiners, it does not select in the same way. • The word sufficient imposes the same restriction on the semantics of the noun as does enough, and also, like enough, it excludes (other) determiners. (19) a. enough/sufficient things/stuff/*thing b. some/the *enough/*sufficient stuff
But to judge by the adverb sufficiently, sufficient is an adjective, which weakens the argument for a covert determiner wherever the noun must be plural or mass. It could equally be argued either that enough is an adjective, or that there is a covert adjective whose meaning and distribution are like those of sufficient. • The fact that the covert pronoun allows generic and existential reference only shows that it places no restrictions on that aspect of reference. In contrast, overt determiners typically do restrict it; for example, some excludes generic reference. • What the covert pronoun does exclude is "definite" reference—reference to an object that is already known to the addressee; but definiteness is one of the main differences between common nouns and proper nouns, which are inherently definite. This suggests that the indefiniteness of the one-word phrase students is inherent to the common noun, rather than due to a covert determiner. We now consider the alternative analysis of the one-word phrase students in which there is no covert determiner, my analysis A2. The syntactic restrictions can be reversed: instead of saying that the covert determiner selects plural or mass common nouns, we have Rule 1.9 (20)
Rule 1 Singular, countable common nouns must be the complement of a determiner.10
As for the indefinite meaning of students, we can follow the suggestion made above: just as proper nouns are inherently definite, so common nouns are inherently indefinite. In both cases the default meaning may be overridden, or in the terms of Pustejovsky (1991) "coerced," by the meaning imposed by a determiner; so a common noun may be coerced into definiteness (the students), and a proper noun into indefiniteness (a certain John Smith). In this analysis, a has a special status similar to that of the dummy auxiliary do. There are purely syntactic patterns such as subject inversion that are only available for auxiliary verbs; so if the meaning to be expressed does not require any other auxiliary verb, do can be used because it has no meaning of its own, and therefore does not affect the meaning. Similarly for a: Rule 1 sometimes requires a determiner, so if no other determiner is required by the meaning to be expressed, a "dummy" determiner is needed which will not affect the meaning. This is a, whose only contribution to meaning is to restrict the sense to a single individual (thereby excluding plural and mass interpretations). This analysis of a leads to a positive argument against the covert-determiner analysis. The argument involves predicative uses such as the following: (21) a. They seem (*some/*no/*the) good linguists. b. He is a/*any/*no/*the good linguist. One of the restrictions imposed by verbs like seem is that a complement nominal must not contain a determiner other than a.11 Ignoring the exception of a, this
restriction is quite natural if we assume, first, that determiners relate semantically to a nominal's referent rather than to its sense, and, second, that a predicative nominal has no referent (as witnessed by the oddity of the question Which good linguists do they seem?). On that assumption, it is natural to assume that good linguists has no determiner at all, rather than that it has a covert one: so in Radford's terms, it is an NP, not a DP. But in that case it is hard to explain the determiner a in the second example—why is a not only possible, but obligatory? The DP analysis forces a disjunction: the predicative complement of verbs like seem is either an NP or a DP headed by a. (Alternatively, the disjunction may be between a and the covert determiner as the head of the DP.) Now consider the no-determiner analysis. Suppose we assume, first, that seem requires its complement to have no referent, and second, that most determiners have a referent. These two assumptions immediately explain the basic fact, which is the impossibility of determiners. The other fact is the appearance of a, which is also explained by two assumptions that we have already made: that Rule 1 requires a determiner before singular countable common nouns, and that a does not have to have a referent—it is a semantically empty, dummy word like the auxiliary do. The result is that a is both obligatory and possible with linguist, but neither needed nor possible with linguists. Rather tentatively, therefore, we may be able to conclude that nominal phrases need not contain a determiner, so they must be projections of Noun. Therefore pronouns too must be nouns. We can still distinguish Pronoun from Common Noun and Proper Noun as subclasses of Noun. As shown in Table 2, Pronoun has its own subclasses, and no doubt the same is true for Common Noun. 
The hierarchical structure is shown (using Word Grammar notation) in Figure 1, and more details of the assumed analysis can be found in Hudson (1990:268-335). The conclusion of this section is that Pronoun is a subclass of Noun. This view is quite traditional and fairly widely held (Huddleston, 1984:96), but it is controversial. On the one hand the traditional part-of-speech system treats Pronoun as a separate superclass, and this tradition persists in most modern descriptive grammars (e.g., Quirk et al., 1985:67). On the other hand, the modern DP analysis treats it as a subclass of Determiner, which itself is a distinct supercategory (Radford, 1997:154). If the present section is right, at least this part of the DP analysis must be wrong.
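The no-determiner account of predicative nominals in (21) combines three assumptions: Rule 1, the claim that seem-type verbs reject a complement with a referent, and the claim that a is a dummy with no referent. A minimal sketch of how these interact is given below. The class CommonNoun, the set REFERENTIAL_DETERMINERS, and both function names are hypothetical, and only the determiners that appear in (21) are modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CommonNoun:
    form: str
    singular: bool
    countable: bool


# Assumption: the determiners in (21) that carry a referent; "a" is the referent-free dummy.
REFERENTIAL_DETERMINERS = {"the", "some", "no", "any"}


def rule_1_requires_determiner(noun: CommonNoun) -> bool:
    """Rule 1: a singular, countable common noun must be the complement of a determiner."""
    return noun.singular and noun.countable


def predicative_ok(det: Optional[str], noun: CommonNoun) -> bool:
    """Can 'det + noun' serve as the complement of a verb like seem?"""
    if det in REFERENTIAL_DETERMINERS:
        return False  # the predicative complement must have no referent
    if det == "a":
        # "a" restricts the sense to a single individual, so plural and mass are excluded
        return noun.singular and noun.countable
    if det is None:
        return not rule_1_requires_determiner(noun)
    raise ValueError(f"determiner not modelled in this sketch: {det}")


linguist = CommonNoun("linguist", singular=True, countable=True)
linguists = CommonNoun("linguists", singular=False, countable=True)

for det in (None, "a", "the"):
    print(det, predicative_ok(det, linguist), predicative_ok(det, linguists))
```

Under these assumptions a comes out as the only option with linguist and as excluded with linguists, without any reference to a covert determiner.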
Figure 1

5. VALENCY AND ITS IRRELEVANCE TO CLASSIFICATION

In preparation for the discussion of determiners we must establish another general principle, which is merely a particular application of Principle 1 (Occam's razor). It concerns the treatment of valency (alias subcategorization), the restrictions that words place on their complements.12 Various devices are available for stating these restrictions: subcategorization frames, Case-marking, SUBCAT lists, linking rules, and so on. However formulated, these restrictions can be stated on an item-by-item basis, as facts about individual lexical items. There is no need to recognize a word class for every valency pattern unless the pattern correlates with some other shared characteristic. In fact, more strongly, it would be wrong to recognize a word class because the class would do no work. As we can see in the following abstract example, it would actually make the grammar more complex without permitting any additional generalization. Given some valency pattern V, which is shared by words A, B, and C, the simplest grammar relates V directly to A, B, and C, giving at most13 three rules:
(22) a. A has V. b. B has V. c. C has V. Now consider the alternative grammar in which A, B, and C are grouped into a word class W, whose sole defining characteristic is that its members have valency pattern V. In this grammar there must be four rules, because the membership of A, B, and C in W must be stipulated:14
(23) a. A is a W. b. B is a W. c. C is a W. d. W has V.
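The rule counts behind (22) and (23) can be reproduced with a few lines of code. This is only a sketch under the simplifying assumption that every fact costs exactly one rule; the function names and the second shared characteristic "X" are hypothetical additions used to show when a class starts to earn its place.

```python
def rules_without_class(words, patterns):
    """One rule per word per shared characteristic, as in (22)."""
    return [f"{word} has {pattern}." for word in words for pattern in patterns]


def rules_with_class(words, patterns, cls="W"):
    """One membership rule per word plus one rule per characteristic of the class, as in (23)."""
    membership = [f"{word} is a {cls}." for word in words]
    generalizations = [f"{cls} has {pattern}." for pattern in patterns]
    return membership + generalizations


words = ["A", "B", "C"]

# Valency pattern V is the only shared characteristic: the class costs an extra rule.
print(len(rules_without_class(words, ["V"])), len(rules_with_class(words, ["V"])))  # 3 4

# With a second shared characteristic (hypothetical "X") the class pays for itself.
print(len(rules_without_class(words, ["V", "X"])), len(rules_with_class(words, ["V", "X"])))  # 6 5
```

The second comparison corresponds to the proviso in the text that a class is justified once the valency pattern correlates with some other shared characteristic.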
So long as V is the sole characteristic shared by these three words, the grammar with W is clearly inferior to the one without it. In short, valency facts have the same status as any other facts. The above conclusion follows directly from Principle 1, but I shall state it as a separate principle: (24) Principle 2 A word class should not be recognized if its sole basis is in valency/subcategorization. For example, if a verb's lexical entry shows that it needs a particle (e.g., give up), there is no point in duplicating this information by also classifying it as a "particle-taking verb" unless such verbs also share some other characteristic. So long as the complementation pattern is their only shared feature, the class Particle-taking Verb is redundant. In most cases this principle is quite innocuous. It does, however, conflict with the traditional Bloomfieldian idea that differences of "distribution" justify differences of word class. This idea is still widely accepted (or at least taught):

The syntactic evidence for assigning words to categories essentially relates to the fact that different categories of words have different distributions (i.e., occupy a different range of positions within phrases or sentences). (Radford, 1997:40; italics in original)
Principle 2 means that some distributional differences are not relevant to categorization, because they are best handled by means of lexical valency. Consider, for instance, the traditional classification of verbs as transitive or intransitive. These terms are simply informal descriptions of valency patterns that can be stated better in valency terms so far as they correlate with nothing else. Indeed, valency descriptions may be preferable to word classes even when there are other correlating characteristics. For example, as the classic Relational Grammar analyses showed (Blake, 1990), it is better to describe the facts of the French faire-faire construction in terms of valency than in terms of transitive and intransitive verbs. (25) a. Paul fait rire Marie. Paul makes laugh Mary. 'Paul makes Mary laugh.' b. Paul fait lire la lettre à Marie. Paul makes read the letter to Mary. 'Paul makes Mary read the letter.' Described in terms of verb classes, as in (26a), the facts appear arbitrary; but explanation (26b) allows them to follow from the assumption that a verb cannot have two direct objects.
(26) a. The direct object of faire is demoted to an indirect object if its infinitive complement is transitive. b. The direct object of faire is demoted to an indirect object if it also has a direct object raised from its infinitive complement. It should be noted that Principle 2 is not a wholesale "rejection of distributionalism" (as one reader complained), but simply a recognition that the syntactic distribution of a word has two separate components. One component involves its relations to "higher" structures, via the phrase that it heads. Thus, when seen as head, a preposition is used in "prepositional" environments, a noun in nominal environments, and so on. These are the distributional facts for which word classes are essential. However, the other component involves its valency, its relations to "lower" structures. Here word classes are less helpful because the facts concerned vary lexically and/or semantically: different members of the same word class allow complementation patterns that vary in complex ways that have little to do with word classes, but have a great deal to do with semantics.
6. DETERMINER

Turning then to Determiner, this is another class that Radford presents as a functional category (1997:45-48). In this case I shall use Principle 2 to argue that there is in fact no word class Determiner because Determiner would be a subclass that was defined solely by valency. As in other analyses, I shall classify determiners with pronouns, but I shall also argue that the superclass that contains both determiners and pronouns is actually Pronoun, not Determiner. The analysis will build on the idea of section 4 that pronouns are nouns. The first step in the argument is to establish that many determiners can also be used as pronouns. This overlap of membership has often been noticed, and has led to various analyses in which pronouns are treated as determiners (Radford, 1997:154). The earliest of these analyses was Postal (1966), who argued that pronouns are really "articles" generated in what would nowadays be called the Determiner position, and which may be followed by a noun, as in we linguists. The DP analysis of nominals continues the same tradition in which the classes of determiners and pronouns overlap, so the general idea is now very familiar and widely accepted. As Radford points out (1997:49), some of the words vary morphologically according to whether they are used as pronouns or as determiners; for example, none is the pronoun form that corresponds to the determiner form no, and similarly for mine/my, yours/your, hers/her, ours/our, and theirs/their. However the recent tradition assumes that these variations can be left to the morphology and ignored in the syntax. This seems correct (Hudson, 1990:269).
The second step involves the syntactic relationship between the determiner and the common noun. The DP tradition that Radford reviews takes the determiner as the head of the phrase. This is also the analysis that I have advocated for some time (Hudson, 1984:90), so I shall simply accept it as given. One of the benefits of the analysis is that it explains why the common noun is optional after some determiners but obligatory after the others (namely, the, a, and every): each determiner has a valency that not only selects a common noun but decides whether it is obligatory or optional, in just the same way that a verb's valency determines the optionality of its object. In other words, the lexical variation between determiners that I shall review is just what one would expect if the common noun is the determiner's complement. The final step is to show that this analysis is only partially right. Determiners are in fact pronouns, rather than the other way round. This may sound like a splitting of hairs, but it will make a great deal of difference to our conclusion. This alternative is not considered in the DP tradition, so there is no case to argue against. In favor of it, we can cite the following facts:
• When a determiner/pronoun occurs without a common noun, it is traditionally called a pronoun, not a determiner. It seems perverse to call she and me determiners in She loves me, as required by the DP analysis. In contrast the traditional analysis would (incorrectly) treat this in this book as an adjective, so it is no more perverse to call it a pronoun than to call it a determiner.
• Almost every determiner can also be used as a traditional pronoun, but most traditional pronouns cannot also be used as determiners. As mentioned earlier, the only determiners that cannot be used without a common noun are the articles the and a, and every. If we ignore the morphological variation discussed above, all the others can be used either with or without a common noun: (27) a. I like this (book). b. Each (book) weighs a pound. c. I found his (book). d. I found the *(book). e. Every *(book) weighs a pound.
Given that Determiner is almost entirely contained in Pronoun, it seems perverse to call the superset Determiner. It seems, therefore, that "determiner" may be just an informal name for a particular kind of pronoun, namely a pronoun whose complement is a common noun (or, in more orthodox terms, a phrase headed by a common noun). I believe I may have been the first to suggest this (Hudson, 1984:90), but others have arrived independently at the same conclusion (Grimshaw, 1991; Netter, 1994). As promised, this change of terminology has far-reaching consequences. The pronouns that allow nominal complements are scattered unsystematically across
the subclasses distinguished in Table 2, with representation in nine of the subclasses: personal (we, you), relative (whose), demonstrative (this/these, that/those), possessive (my, your, etc.), distributive (each, every), universal (all, both), existential (some, any, either), negative (no, neither), and interrogative (which, what). (The classification also needs to accommodate the articles the and a, but this is irrelevant here.) So far as I know there is no other characteristic that picks out this particular subset of pronouns. It is true, for example, that determiners are also distinguished by the fact of being limited to one per nominal phrase (e.g., unlike Italian, we cannot combine both the and my to give *the my house). But this simply reflects a more general fact about pronouns: they are not allowed as the complement of a pronoun. So long as the complement of a pronoun is limited to a phrase headed by a common noun, the one-per-NP restriction will follow automatically.15
What this observation suggests is that valency is the sole distinguishing characteristic of determiners; in short, that determiners are just the subset of pronouns that happen to be "transitive." If this is so, Determiner is ruled out by Principle 2 as redundant. Instead of classifying this as a determiner, therefore, we just say that it allows a complement, and similarly for all the other determiners. Admittedly, this misses an important shared characteristic of determiners, which is that their complements are common nouns; but this generalization can be captured in terms of Pronoun. Indeed, we can even generalize across subclasses of pronoun about their valency, but without invoking Determiner. The following mini-grammar suggests how determiners should be treated:16
(28) a. which is an interrogative pronoun.
     b. which allows a complement.
     c. this is a demonstrative pronoun.
     d. A demonstrative pronoun allows a complement.
     e. The complement of a pronoun is a common noun.
Rule (b) treats the valency of which as an arbitrary fact about this pronoun, whereas (d) generalizes across both the demonstrative pronouns; and (e) defines the possible complements for all pronouns (without, however, implying that all pronouns allow a complement). If determiners really are transitive pronouns, two things follow. First, so-called DPs must in fact be NPs because their head, the determiner, is a pronoun and pronouns are nouns. Therefore the head projects to NP, not DP. Second, and most important for present purposes, at least in English there is no category Determiner, so Determiner cannot be any kind of category, functional or otherwise. If correct, this conclusion should be worrying for those who advocate FWCs because I have now eliminated the two most quoted examples of FWC, Complementizer and Determiner.
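The division of labor among the rules in (28) can be made concrete with a small sketch. The Python fragment below is only an illustration under stated assumptions, not Word Grammar's own notation: the Category class, its attribute names, and the may_take helper are hypothetical, and the entry for who is an added example that does not appear in (28). Facts stated on a category are inherited by everything that "isa" that category unless something more specific is stated lower down.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Category:
    """A word or word class in a toy 'isa' hierarchy (hypothetical encoding)."""
    name: str
    isa: Optional["Category"] = None
    allows_complement: Optional[bool] = None   # None: nothing stated at this level
    complement_class: Optional[str] = None

    def lookup(self, attr: str):
        """Return the nearest stated value for attr, walking up the isa chain."""
        node = self
        while node is not None:
            value = getattr(node, attr)
            if value is not None:
                return value
            node = node.isa
        return None


# (28e) The complement of a pronoun is a common noun.
pronoun = Category("pronoun", complement_class="common noun")
interrogative = Category("interrogative pronoun", isa=pronoun)
# (28d) A demonstrative pronoun allows a complement.
demonstrative = Category("demonstrative pronoun", isa=pronoun, allows_complement=True)
# (28a, b) which is an interrogative pronoun; which allows a complement.
which = Category("which", isa=interrogative, allows_complement=True)
# (28c) this is a demonstrative pronoun.
this = Category("this", isa=demonstrative)
# Added illustration, not in (28): an interrogative pronoun with no stated valency.
who = Category("who", isa=interrogative)


def may_take(word: Category, complement_class: str) -> bool:
    """True if the word allows a complement and the complement is of the right class."""
    return (word.lookup("allows_complement") is True
            and word.lookup("complement_class") == complement_class)


print(may_take(which, "common noun"))  # licensed lexically by (28b)
print(may_take(this, "common noun"))   # licensed by inheritance from (28d)
print(may_take(who, "common noun"))    # nothing licenses a complement
print(may_take(this, "pronoun"))       # (28e) excludes a pronoun as complement
```

On this encoding no category Determiner is needed: this and which differ from who only in whether a complement is allowed, and the exclusion of a pronoun complement (as in *the my house) follows from the single statement corresponding to (28e).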
7. FWCs AS CLASSES OF FUNCTION WORDS I turn now to a more general consideration of FWC in terms of its defining characteristics: what is it about a category that makes it qualify as an FWC? A satisfactory answer will draw a boundary around FWCs which satisfies three criteria. First, the boundary should be "in the right place," corresponding to the general intuitions that Auxiliary Verb and Pronoun are candidates for FWC, but Full Verb and Common Noun are not. How can we decide where the right place is, other than by invoking these intuitions? Fortunately, we already have a criterion: Principle 1. The category FWC will be justified to the extent that it allows generalizations that are not otherwise possible. This means that we must look for distinct characteristics that correlate with each other, comparable to those that distinguish (say) Verb from Noun. Without such correlations, any choice of criteria is arbitrary; but with them it becomes a matter of fact. If it turns out that criteria A and B correlate, and both characterize categories X, Y, and Z, then it is just a matter of fact, not opinion, that X, Y, and Z belong to a supercategory. If A and B are among the standard characteristics of FWCs, then we can call this supercategory "FWC"—provided it satisfies the remaining criteria. Second, the boundary should have "the right degree of clarity." By this I mean that it should probably be an all-or-nothing contrast, and not a matter of degree. This criterion is important because of the kinds of generalizations that are expressed in terms of FWC—categorical generalizations such as the one quoted earlier about the links between FWC and feature checking. It would be impossible to evaluate such generalizations if categories had different degrees of "functionality." Third, FWC should have the "right kind of membership," because as a supercategory its members must be categories, not words. If it includes some members of a category, it must include them all. 
We start then with a widely accepted definition of FWC, which links it to the traditional notion Function Word (FW): The lexical/functional dichotomy is rooted in the distinction drawn by descriptive grammarians . . . between two different types of words—namely (i) contentives or content words (which have idiosyncratic descriptive content or sense properties), and (ii) function words (or functors), i.e. words which serve primarily to carry information about the grammatical properties of expressions within the sentence, for instance information about number, gender, person, case, etc. (Radford, 1997:45)
As Radford says, this distinction is often drawn by descriptive grammarians, and is beyond dispute in the sense that there is an obvious difference between the meaning of a word like book or run and that of a function word such as the or will.
Cann (this volume) surveys a wide variety of criteria by which Function Word (his "FE") has been defined, and which all tend to coincide. The criteria are semantic, syntactic, and formal, and in each case FWs are in some way "reduced" in comparison with typical words. Semantically they lack what is variously called the "denotational sense" or "descriptive content" of words like tomato, syntactically their distribution is more rigidly restricted, and formally they tend to be short. We could even add to Cann's list of formal distinctions that are relevant to English:
• In terms of phonology, only FWs may have /ə/ as their only vowel.
• In terms of spelling, only FWs may have fewer than three letters.17
• In terms of orthography, only FWs are consistently left without capital letters in the titles of books and articles.18
In short, there can be no doubt about the reality and importance of FW as the carrier of a large number of correlating characteristics.
Nevertheless, FW does not justify FWC because it fails the second and third criteria. As far as clarity of boundaries is concerned, there are too many unclear borderline cases that either have one of the characteristics only to some degree, or that have some characteristics of FWC but not all of them. These uncertainties and conflicts are well documented by Cann. For example, there is no clear cutoff between having descriptive content and not having it, so there are borderline cases such as personal pronouns. Unlike clear FWs, these each have a distinct referent, and they also involve "descriptive" categories such as sex, number, and person. Perhaps because of this they tend to be capitalized in book titles (contrary to the pattern for FWs mentioned above). Similarly, it is hard to see where to draw the line, on purely semantic grounds, between the FW and and what is presumably not an FW, therefore: where, for example, would so belong?
If even one of the characteristics of FW is indeed a matter of degree, determined by the "amount of meaning" carried, it cannot be mapped onto the binary contrast between functional and substantive categories. This is not an isolated case: formal brevity is even more obviously a matter of degree, and it is hard to imagine that there is a clear threshold for syntactic limitation. This applies similarly to the third criterion, concerning the membership of FWC. If it is indeed a set of word classes, then for any given word class either all of its members should belong to FWC, or none of them should. There should be no split classes (what Cann calls "crossover" expressions). And yet, as Cann points out, split classes do seem to exist. The clearest example of a split word class is Preposition. Some prepositions are very clear FWs—for example, of, at, in, to, and by all qualify in terms of both meaning and length. Indeed, all these prepositions have regular uses in which they could be said to have no independent meaning at all:
(29) a. the city of Glasgow
     b. at home
     c. believe in fairies
     d. take to someone
     e. kidnapped by gangsters
In each example the preposition is selected by the construction, and does not contrast semantically with any other. On the other hand, there are also prepositions like during and after that have just as much meaning as some adverbs—indeed, there are adverbs that are synonymous except for being anaphoric (meanwhile, afterwards). If the adverbs are content words, the same should presumably be true of their prepositional synonyms. But if this is right, some prepositions are content words and some are FWs. This should not be possible if it is whole word classes that are classified as FWCs.
A similar split can be found even among nouns and verbs, though in these classes there are very few members that have the "wrong" classification. The anaphoric one is an ordinary common noun (with the regular plural ones):
(30) a. He lost the first game and won the second one.
     b. He lost the first game and won the other ones.
One (in this sense) has no inherent meaning except "countable," since it borrows its sense from its antecedent by identity-of-sense anaphora. But it behaves in almost every other respect just like an ordinary common noun such as book—it accepts attributive adjectives, it inflects for number, and so on. Similarly for the British English anaphoric do, which is an ordinary nonauxiliary verb:
(31) a. He didn't call today, but he may do tomorrow.
     b. A. Does he like her? B. Yes, he must do—just look how he talks to her.
This too is completely empty of meaning—it can borrow any kind of sense from its antecedent, stative or active—and yet we use it syntactically exactly as we use an ordinary verb like run. Their lack of meaning suggests that both these words are function words—an analysis that is further supported by their shortness (do has only two letters, and one has a variant with /ə/, which is often shown orthographically as 'un: a big 'un). And yet they are clear members of classes whose other members are content words.
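The membership test applied above (a word class belongs to FWC only if all of its members are function words) can be stated as a short check. The sketch below is purely illustrative: the table simply records the judgments made in the text for the words discussed, and the function name split_classes is hypothetical.

```python
from collections import defaultdict

# (word, word class, judged to be a function word?), following the text's judgments.
JUDGMENTS = [
    ("of", "Preposition", True),
    ("at", "Preposition", True),
    ("in", "Preposition", True),
    ("to", "Preposition", True),
    ("by", "Preposition", True),
    ("during", "Preposition", False),
    ("after", "Preposition", False),
    ("one", "Common Noun", True),    # anaphoric one
    ("book", "Common Noun", False),
    ("do", "Full Verb", True),       # British English anaphoric do
    ("run", "Full Verb", False),
]


def split_classes(judgments):
    """Return the word classes containing both function words and content words."""
    seen = defaultdict(set)
    for _word, word_class, is_function_word in judgments:
        seen[word_class].add(is_function_word)
    return sorted(cls for cls, values in seen.items() if len(values) == 2)


print(split_classes(JUDGMENTS))
# ['Common Noun', 'Full Verb', 'Preposition']
```

Every class in this small sample turns out to be split, which is exactly the situation that the third criterion rules out for a supercategory whose members are whole classes.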
A similar problem arises with a widely accepted definition of FW in terms of "thematicity" (Radford, 1990:53, quoting in part Abney, 1987:64-65). According to Radford, FWs are "nonthematic," by which he means that even if they assign a theta role to their complement, they do not assign one to their specifier: for example, consider the auxiliary may in the following: (32) It may rain.
This has a thematic relationship to its complement "it . . . rain," but not to its subject. This may well be a general characteristic of FWs, but it does not apply in a uniform way to all members of the two main verb classes, Full Verb and Auxiliary Verb. On the one hand, as Radford recognizes, there are full verbs that are nonthematic (e.g., seem) and on the other, there are auxiliary verbs that are thematic. Perhaps the clearest example of a thematic auxiliary is the possessive auxiliary have, which for most British speakers can be used as in the following examples: (33) a. Have you a moment? b. I haven't time to do that. The inversion and negation show that have is an auxiliary, but it means "possess" and assigns a thematic role to both its subject and its complement. Some modal verbs also qualify as thematic auxiliaries. The clearest example is dare, but we might even make the same claim regarding can if we can trust the evidence of examples like the following: (34) a. Pat can swim a mile. b. Pat's ability to swim a mile is impressive. The words Pat's ability to swim a mile are a close paraphrase of example (a), so they should receive a similar semantic (or thematic) structure; but if the ability is attributed directly to Pat, as in (b), there must be a thematic link not only in (b) but also in (a). If this is true, can in (a) must be thematic, because it assigns a role to its subject as well as to its complement. In short, although most auxiliary verbs qualify as FWs, there are some that do not. This is not what we expect if Auxiliary Verb, as a whole, is an FWC. In conclusion, we cannot define FWC in terms of FW because the latter does not have suitable properties. As Cann says, FW is a "cluster concept," which brings together a range of characteristics that correlate more or less strongly with one another, but which does not map cleanly onto word classes. 
The boundary of FW runs through the middle of some word classes, and the criteria that define FW are themselves split when applied to word classes. There is no doubt that a grammar should accommodate FW in some way, but not by postulating FWC.
8. FWCs AS CLOSED CLASSES
Another definition that has been offered for Functional Category refers to the distinction between open and closed classes, which again is part of a fairly long tradition in descriptive linguistics (Quirk et al., 1985:71; Huddleston, 1984:120). For example, Haegeman (1994:115-116) invokes the contrast between closed and open classes when she first introduces the notion Functional Projection (her equivalent of Functional Category). It is also one of the criteria in Abney (1987:64). This distinction is different from the function-content distinction because it applies to classes rather than to their members. A class is open if it can accept new members, and closed if it cannot, regardless of what those members are like. This looks promising as a basis for the definition of FWC—at least it should satisfy our third criterion of having whole classes rather than individual words as members.
However, this definition fares badly in relation to the other two criteria. Once again one problem is that the closed-open distinction is a matter of degree, whereas categories must be either functional or substantive; in short, this criterion fails on the second test, clarity of the boundary. The distinction really belongs to historical linguistics because the addition of new vocabulary changes the language and involves two diachronic processes: creative word formation and borrowing. Borrowing is the most relevant of these because it is the one most usually mentioned in discussions of the closed-open distinction. The question, then, is whether there is, in fact, a clear distinction between word classes that do accept loans (or calques) from other languages, and those that do not. Among historical linguists the answer is uncontroversial (e.g., Bynon, 1977: 255; Hudson, 1996:58-59). There is no such distinction, only a gradient from the most "open" class, Noun, to the most closed ones (such as Coordinating Conjunction). Even the most closed classes do accept some new members.
For example, in English, the list of personal pronouns has seen some changes through time, with the recent addition of one, "people," and the much older addition of they, them, and their; and even Coordinating Conjunction has a penumbra of semimembers (yet, so, nor—see Quirk et al., 1985:920), which may presage a future change of membership. Another way to approach the closed-open distinction would be to consider the actual size of the membership, giving a distinction between "large" classes and "small" ones. However, this is obviously likely to be a matter of degree as well, and it is precisely in the classes of intermediate size that uncertainty arises. Preposition is a clear example, with about seventy clear single-word members (Quirk et al., 1985:665), several of which are loans (via, per, qua, circa, versus, vis-à-vis, save). Chomsky (1995:6) appears to classify Preposition as a functional category,19 but Radford does not (1997:45). Quirk et al. (1985:67) classify Preposition as a closed class, in spite of the evidence in their own list, and Haegeman (1994:115) recognizes that it is a "relatively closed class," but nevertheless classifies it as a substantive category. As we saw in the last section, Preposition is also a troublesome borderline case for the definition of FWC in terms of FW. The closed-class definition of FWC also fails on the first test by not putting the boundary in the right place. The problem is that it is easy to find examples of closed classes that are not FWC by any other criteria—and in particular, not FWs.
Cann lists a number of examples such as points of the compass and days of the week. These have some idiosyncratic syntactic characteristics, but in most respects they are ordinary common or proper nouns (to the north, on Wednesday). The membership of these classes is rigidly closed, so should we conclude that they are FWCs in spite of being nouns? This discussion has suggested that FWC is not the natural extension of Closed Class that it may seem to be. Classes are more or less closed, but categories are not more or less functional, and a closed class may be a subset of an open one.
9. GRAMMAR WITHOUT FWCs If the previous arguments are right, the notion FWC has never been defined coherently, so we cannot be sure which categories are functional and which are not. Moreover, we have found that two of the clearest examples, Complementizer and Determiner, are not even word classes, let alone functional word classes. It therefore seems fair to conclude (with appropriate reservations about subword and position categories) that there may not in fact be any functional categories at all. However, it would be wrong to end on such a negative note, because the discussion also has a positive outcome: the validity of the notion Function Word as a cluster concept defined by a combination of characteristics. Even if FW does not justify FWC, it deserves some place in a grammar, but what place should it have? The basis for FW is that words that have very little meaning tend also to have very little form and very little syntactic freedom. One possibility is that this is a fact that is available only to academic observers of language, comparable with the facts of history and worldwide variation; but this seems unlikely, as the raw data are freely available to every speaker, and the correlations are both obvious and natural—indeed, iconic. It seems much more likely that FW is part of every speaker's competence, but cluster concepts are a challenge for currently available theories of grammar,20 especially when some of the concepts are quantitative (amount of meaning, amount of form, amount of freedom).
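One way to picture a cluster concept with quantitative variation is as a set of defaults that individual words may override, as outlined in note 20. The following Python sketch is offered only as an informal illustration and not as an implementation of Word Grammar: the dictionary keys, the typicality measure, and the property values assigned to of and during are all assumptions made for the example. Only defaults (a) to (c) of note 20 are modeled, because (d) and (e) are permissions rather than requirements.

```python
# Default characteristics of the typical function word (after note 20, a-c).
FW_DEFAULTS = {
    "has_complement": True,
    "sense_from_complement": True,
    "syllables": 1,
}


def overridden(overrides):
    """List the defaults that a word overrides with a different value."""
    return sorted(
        key for key, value in overrides.items()
        if key in FW_DEFAULTS and FW_DEFAULTS[key] != value
    )


def typicality(overrides):
    """Share of defaults left intact: 1.0 for the most typical function word."""
    return 1 - len(overridden(overrides)) / len(FW_DEFAULTS)


# Illustrative judgments (assumptions): 'of' overrides nothing, whereas
# 'during' has two syllables and a sense of its own.
OF = {}
DURING = {"sense_from_complement": False, "syllables": 2}

print(overridden(OF), typicality(OF))
print(overridden(DURING), round(typicality(DURING), 2))
```

Counting overrides captures only one of the two kinds of variability mentioned in note 20; the size of each deviation (two syllables versus three, for instance) would need a graded measure that this sketch does not attempt.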
NOTES
1 This chapter has changed considerably since my presentation at the Bangor conference on syntactic categories in 1996. It has benefited greatly from discussion at that conference and at a seminar at University College London, as well as from the individual comments of And Rosta, Annabel Cormack, and, in particular, Bob Borsley. It also takes account of what two anonymous referees said about an earlier version. I am grateful to all these colleagues who have helped me along the way, and especially to Ronnie Cann for generously showing me several versions of his paper.
2 Grammatical terms may be used either as common nouns (e.g., a noun; two nouns) or as proper nouns (e.g., (The class) Noun is larger than (the class) Adjective). I shall capitalize them when used as proper nouns.
3 To give a flavor of the debate, consider the argument that the position C allows a simple explanation for word-order facts in Germanic languages. In V2 order, the verb is said to be in C, so if C is filled by an overt complementizer, the verb cannot move to C—hence clause-final verbs in subordinate clauses. Unfortunately for this explanation, it is not only complementizers that trigger clause-final verb position—the same is true of all traditional "subordinating conjunctions," relative pronouns, interrogative pronouns, and so on, none of which are assumed to be in C position. If they are (say) in Spec of C, why can't the verb move to C, as in a main clause?
4 The treatment of "zero" is irrelevant to our present concerns because any solution will pair zero with just one complementizer, that. My own preferred solution was suggested by Rosta (1997), and involves a special relationship "proxy." Verbs that allow either that or zero select a proxy of a tensed verb, which is either the tensed verb itself, or the instance of that on which it depends. In this way I avoid positing a "zero complementizer," while also avoiding the repeated disjunction "a tensed clause introduced by that or by nothing." As Rosta points out, another advantage of this analysis is that it allows us to refer directly to the finiteness of the lower verb. One part of finiteness is the contrast between indicative and "subjunctive," as in (1).
(1) I recommend that Pat be the referee.
A verb such as recommend may select the proxy of a subjunctive verb as its complement, which means that followed by a subjunctive verb. If it turns out that subjunctive verbs almost always occur with that, this fact can be built into the definition of "proxy of subjunctive verb," just as the optionality of that is built into that of "proxy of indicative verb."
5 A referee comments that the same is true of Preposition: no verb selects generally for PP, but many verbs select either individual prepositions (e.g., depend selects on) or some meaning which may, inter alia, be expressed by a PP (e.g., put selects a place expression such as on the table or here). This is true, but all it shows is that Preposition is not relevant to subcategorization. It is not a reductio ad absurdum of Principle 1, because Preposition can be justified in other ways (e.g., in terms of preposition stranding and pied-piping).
6 See footnote 4 for my preferred treatment of subjunctive selection.
7 According to the analysis for which I shall argue in section 5, determiners are pronouns that take common nouns as their complements. Wh-pronouns also take complements, though their complements are the finite verb in the clause that they introduce (Hudson, 1990:362). Given these two analyses, it follows that one pronoun may even have two complements, a common noun and a finite verb, as in which students came?
8 The word enough is a poor example of a determiner, as it is more like a quantity expression such as much and many—indeed, Rosta has suggested (1997) that the surface word enough corresponds to a pair of syntactic words, much/many enough.
9 Rule 1 ignores examples like the one Radford cites, our (8a):
(1) Pat is head of the department.
This shows that some singular countable common nouns can sometimes be used without a determiner, but this possibility depends both on the noun itself and on the containing phrase. It is possible for names of professions (as in French), but not generally; and it is possible after the verb be or become, but not generally:
(2) a. Pat is head/*bore/*admirer of the department.
    b. Pat is/became/*introduced/*looked for head of the department.
If anything, the pattern confirms Rule 1, because the exceptions also involve a complement noun selecting the word on which it depends:
Rule 1 (exception) A profession noun may be the complement of the verb be or become.
Similar remarks apply to other well-known examples like the following:
(3) a. We were at school/college/*cinema.
    b. He was respected both as scholar and as administrator.
10 This analysis reverses the usual relationship between complements and heads. In general, heads select complements, but in this analysis it is the complement (the common noun) that selects the head (the determiner). This relationship is not without precedent, however. In Romance languages, the perfect auxiliary is selected by its complement (e.g., unaccusative verbs such as "go," select "be," while other verbs select "have"). In English, the adjective same selects the determiner the (the/*a/*my same person), and own selects a possessive determiner (my/*the/*an own house).
11 The possibilities are different if the complement contains a superlative adjective:
(1) a. That seems the best solution.
    b. That seems my best option.
Thanks to Annabel Cormack for this detail. And Rosta also points out the possibility of no before certain adjectives:
(2) a. She seems no mean linguist.
    b. That seems no small achievement.
12 It could be argued that valency should also cover subjects/specifiers, but this is a separate issue.
13 It makes no difference to this argument whether valency patterns are stipulated or derived by general linking rules from argument structure. In either case, the links between the words and the word class have to be stipulated.
14 It makes no difference how the class-membership is expressed. I use Word Grammar terminology here, but the same objection would apply to a feature notation because the relevant feature [+W] would be stipulated lexically.
15 If the equivalent of *the my house is permitted in some other language, this must be because the valency of the article allows a disjunction: either a common noun or a possessive pronoun. Similar minor variations are found in English—for example, universal pronouns (all, both) allow a pronoun or a common noun (all (the) men, both (my) friends). This is to be expected if the complement is selected lexically by the determiner, as claimed here.
16 This little grammar speaks for itself, but illustrates some important principles of Word Grammar, such as the independence of rules about "possibility" and "identity," and generalization by default inheritance. (For more details see Hudson, 1990, 1998.)
17 The only nonfunction words whose only vowel is /ə/ are the interesting pair Ms. and Saint (pointed out by Piers Messun); and those that have fewer than three letters are go, do, and ox. The observation about spelling is in Albrow (1972) and Carney (1997).
18 For example, we all write Aspects of the Theory of Syntax, with the FWs of and the treated differently from the others. Even quite long FWs such as since and with may be treated in this way, but some short FWs tend to be capitalized, as in the following examples:
(1) a. But Some of Us Are Brave.
    b. What Do We Mean by Relationships?
    c. Up to You, Porky.
    d. Cosmetics: What the Ads Don't Tell You.
Usage is remarkably consistent across authors, but not completely consistent; for example I found its treated in both ways:
(2) a. Huddersfield and its Manufacturers: Official Handbook.
    b. The English Noun Phrase in Its Sentential Aspect.
19 More accurately, Chomsky lists just four substantive categories, including "particle," but gives an incomplete list of functional categories that does not include Preposition. His view of Preposition therefore depends on whether or not he intends it to be subsumed under Particle. To add to the uncertainty, in another place (1995:34) he includes Pre- or Postposition among the categories that are defined by the features [N, V], which presumably means that it is a substantive category, but Particle is not. He does not mention Adverb in either place.
20 I believe that Word Grammar offers as good a basis as any current theory for the treatment of cluster concepts. The "isa" relationship allows the members of FW to be either whole word classes (e.g., Auxiliary Verb) or individual words (e.g., that). Default inheritance allows individual FWs to lack any of the default characteristics, as required by any cluster concept. Functional definitions allow FW to have the same range of attributes as any specific word—a default meaning schema, a default valency pattern, even a default phonological and spelling schema. Thus the definition of FW might look something like this (where "whole" is the full inflected form—Creider and Hudson, 1999):
(1) a. FW has a complement.
    b. FW's sense and referent are the same as those of its complement.
    c. FW's whole is one syllable.
    d. FW's vowel may be /ə/.
    e. FW's written whole may be one letter.
This set of characteristics defines the typical FW, such as a, do, or of. Less typical FWs override one or more characteristics, so the more characteristics are overridden, the less typical they are. This captures one aspect of the quantitative variability of FWs. The other aspect involves the amount of variation on each of the individual characteristics; for example, a word that contains two syllables is clearly less exceptional than one with three, and a word whose sense supplies only one feature (e.g., "countable") is less exceptional than one that supplies two (e.g., "male," "singular").
REFERENCES
Abney, S. (1987). The English Noun Phrase in its Sentential Aspect. MIT dissertation.
Albrow, K. (1972). The English writing system: Notes towards a description. London: Longman.
Atkinson, M. (1994). Parameters. In R. Asher (ed.), Encyclopedia of Language and Linguistics (pp. 2941-2). Oxford: Pergamon Press.
Blake, B. (1990). Relational grammar. London: Routledge.
Bynon, T. (1977). Historical linguistics. Cambridge: Cambridge University Press.
Carney, E. (1997). English spelling. London: Routledge.
Crystal, D. (1995). The Cambridge encyclopedia of the English language. Cambridge: Cambridge University Press.
Chomsky, N. (1995). The minimalist program. Cambridge, MA: MIT Press.
Creider, C., and Hudson, R. (1999). Inflectional morphology in Word Grammar. Lingua 107: 163-87.
Grimshaw, J. (1991). Extended projection. Unpublished manuscript.
Haegeman, L. (1994). Introduction to Government and Binding Theory. (2nd ed.) Oxford: Blackwell.
Huddleston, R. (1984). An introduction to the grammar of English. Cambridge: Cambridge University Press.
Hudson, R. (1984). Word grammar. Oxford: Blackwell.
Hudson, R. (1990). English word grammar. Oxford: Blackwell.
Hudson, R. (1995). Competence without Comp? In B. Aarts and C. Meyer (eds.), The verb in contemporary English (pp. 40-53). Cambridge: Cambridge University Press.
Hudson, R. (1996). Sociolinguistics. Cambridge: Cambridge University Press.
Hudson, R. (1997). The rise of auxiliary DO: verb-non-raising or category-strengthening? Transactions of the Philological Society 95:1, 41-72.
Hudson, R. (1998). An encyclopedia of Word Grammar. Accessible via http://www.phon.ucl.ac.uk/home/dick/wg.htm
Larson, R. (1985). On the syntax of disjunction scope. Natural Language and Linguistic Theory 3:217-264.
Netter, K. (1994). Towards a theory of functional heads: German nominal phrases. In J. Nerbonne, K. Netter, and C. Pollard (eds.), German grammar in HPSG. CSLI Lecture Notes 46. Stanford: CSLI.
Ouhalla, J. (1991). Functional categories and parametric variation. London: Routledge.
Pollard, C., and Sag, I. (1994). Head-driven Phrase Structure Grammar. Chicago: University of Chicago Press.
Pollock, J.-Y. (1989). Verb-movement, universal grammar and the structure of IP. Linguistic Inquiry, 20, 365-424.
Postal, P. (1966). On so-called 'Pronouns' in English. Georgetown Monographs on Languages and Linguistics, 177-206.
Pustejovsky, J. (1991). The generative lexicon. Computational Linguistics, 17, 409-441.
Quirk, R., Greenbaum, S., Leech, G., and Svartvik, J. (1985). A comprehensive grammar of the English language. London: Longman.
Radford, A. (1990). Syntactic theory and the acquisition of English syntax. Oxford: Blackwell.
Radford, A. (1997). Syntactic theory and the structure of English: A minimalist approach. Cambridge: Cambridge University Press.
Rosta, A. (1997). English Syntax and Word Grammar Theory. London PhD.
Smith, N., and Tsimpli, I. (1995). The mind of a savant: Language learning and modularity. Oxford: Blackwell.
Trudgill, P. (1990). The dialects of England. Oxford: Blackwell.
Zwicky, A. (1994). Syntax and phonology. In R. Asher (ed.), Encyclopedia of language and linguistics (pp. 4476-81). Oxford: Pergamon Press.
FUNCTIONAL VERSUS LEXICAL: A COGNITIVE DICHOTOMY

RONNIE CANN
Department of Linguistics
University of Edinburgh
Edinburgh, United Kingdom
1. INTRODUCTION¹

A persistent tendency within the grammatical tradition has been to divide grammatical categories and parts of speech into two superclasses. The distinction appears, for example, in the differentiation made between "grammatical" (or functor) expressions and "contentive" ones (Bolinger, 1975). The former consist of those expressions, words and bound morphemes, that serve a purely grammatical function, whereas the latter provide the principal semantic information of the sentence. In recent years within transformational syntax, the distinction has (re-)surfaced as a contrast between "functional" and "lexical" categories (Chomsky, 1995; Kayne, 1994; Ouhalla, 1991; Stowell, 1981; etc.). This distinction shares properties with that made between grammatical and contentive expressions in that it applies to bound morphs as well as to independent words and reflects a primary semantic distinction between theta-assigning (contentive) categories and non-theta-assigning (functional) ones (Grimshaw, 1990). It also reflects the distinction made in the classical grammatical tradition between "accidence" and "substance." The former refers primarily to the grammatical (morphological) categories exhibited by a language (such as case, tense, etc.) that are the parochial characteristics of word formation of a particular language, whereas the substantives are the linguistically universal classes and properties. Hence, functional elements may be associated with the accidental morphological properties of a language and so implicated in parametric variation. Lexical expressions, on the other hand, provide the universal substance of the sentence through their semantic content.

The significance of this distinction has apparently received strong psycholinguistic support over recent years, with extensive evidence that the processing of functional expressions differs from that of contentive ones (see below for references). Evidence from aphasic breakdown, language acquisition, priming experiments, and so on all indicates that a small subset of words is processed differently from the majority of the basic expressions of a language. This difference in processing may be argued to reflect the different syntactic properties exhibited by the two macroclasses of elements and hence to provide a sound psychological underpinning to recent developments in linguistic theory.

However, despite the centrality of functional categories within current linguistic theory and the robustness of the psycholinguistic evidence for their significance in processing, there remains considerable vagueness about what exactly the term functional picks out from the expressions of a language, what constitutes a functional category, and what the relationship is between functional expressions, broadly construed, and the functional categories identified for a language, either specifically or universally. Within transformational grammar, the functional categories typically include complementizer, tense, and agreement, and are distinguished from the major categories of noun, verb, adjective, adverb, and (to a certain degree) preposition.
In the psycholinguistic literature, however, expressions such as there, here, and so on, within the major classes, and discourse markers, such as therefore, are often included in the set of functional elements, whereas certain expressions often considered to be members of functional classes (like certain quantifiers, e.g., many, several, and the numerals) are treated as nonfunctional. The relation between the experimental evidence and the theoretical distinction is thus more problematic than at first appears. In particular, the question arises as to whether the functional distinction is categorial, as has been suggested in certain studies of first language acquisition (see Morgan, Shi, and Alopenna, 1996). If it is, then the nature of this categorial split and the way that it interacts with further categorization becomes an important question. If it is not, then one must ask what the relation is between the set of functional expressions and the functional categories recognized within syntactic theory.

In this chapter, I explore these questions, beginning with a review of the general linguistic properties considered illustrative of the distinction and the psycholinguistic evidence for the nature of the functional-lexical divide. The main problem centers around whether the distinction should be made at the level of the expression or at some more abstract level of categorization. Noting that the evidence for a categorial distinction to be made between functional and lexical expressions comes principally from psycholinguistic studies, I argue that the distinction is best viewed in terms of Chomsky's (1986) differentiation between I-language and E-language. The discussion then moves to the nature of E-linguistic categorization and its relation to I-linguistic (universal) categories. The chapter ends by questioning the need to set up the specific functional categories independently of functional lexemes themselves and suggests a model of the grammar that attempts to reconcile the psycholinguistic properties of functional expressions and their position within theoretical syntax.
2. CHARACTERIZING FUNCTIONAL EXPRESSIONS

Within general linguistic theory, the identification of functional expressions and, especially, functional classes is controversial and problematic. Within transformational grammar, the syntactically significant functional classes include complementizer, determiner, and inflection (INFL), the latter of which is now often decomposed into Tense and Agr(eement), following Pollock (1989).² Other functional categories are regularly added to the list, most frequently verbal categories such as Neg(ation) (Pollock, 1989), Asp(ect) (Hendrick, 1991), and Focus (Tsimpli, 1995, inter al.), but also nominal categories such as Det(erminer) (Abney, 1987), Num(ber) (Ritter, 1991), and K (case) (Bittner and Hale, 1996). In frameworks such as Kayne (1994) and Cinque (1998), functional categories are set up independently of any morphophonological considerations, leading to a proliferation of such categories that are empty of all content, syntactic, semantic, and phonological (their content coming from contentive specifiers).

In this section, I am concerned with the general linguistic properties that have been proposed to characterize functional categories (see also Abney, 1987, for some discussion). The ones that interest me are defined over the functional expressions that instantiate the categories, rather than over more abstract properties (such as the ability to assign theta roles). In the discussion that follows, I shall be concerned only with the behavior of morphs or the observable characteristics of the classes they comprise. The syntactic properties discussed below are thus intended to be predicated of free and bound morphs such as articles, demonstratives, quantifiers, pronouns, complementizers, agreement affixes, tense, and reflexes of other inflectional elements, and not of the more abstract concepts with which they may be associated.
The abstract functional categories of Kayne (1994) are hence omitted from consideration as they cannot directly provide evidence for a macrofunctional category.

2.1. Closed versus Open

The distinction between functional and lexical parallels (and is often conflated with) that drawn between "closed" and "open" classes of expressions (Quirk et al., 1972:2.12-2.15). Functional classes such as pronoun, article, conjunction, and so on, form classes whose membership is fixed, whereas noun, verb, and adjective are open classes whose membership can be extended through borrowing or by transparent derivational means. Typically, functional classes are small and listable for any language, and the total number of all such elements within a language is considerably smaller than the number of open class expressions. Thus, the number of independent (wordlike) functional expressions within English has been said to be around 150 (Shillcock and Bard, 1993), which increases only slightly if bound morphemes are included in the total.

This criterion is not entirely straightforward, however. In the first place, a number of subgroups of the traditional open classes form closed subclasses. For example, auxiliary verbs form a closed subclass of verbs, and days of the week, months of the year, points of the compass, and so on, form closed subclasses of nouns that show idiosyncratic syntactic behavior [compare (1a) with (1b-1c) and (1d) with (1e)].³

(1) a. I'll see you Tuesday/on Tuesday/the first Tuesday after Easter.
    b. I'll see you tomorrow/*on tomorrow/*on the first tomorrow after Easter.
    c. I'll see you *breakfast/at breakfast/at the first breakfast after Lent.
    d. The exiles went North/to the North/to North London/North of Watford.
    e. The exiles went *London/to London/to London town/*London of Ontario.
Although the class of auxiliary verbs is usually taken to comprise a class of functional expressions, it is not normal to so classify the nominal subclasses indicated above, despite the fact that they clearly define a closed class of expression. Hence, the membership of some expression in a closed class is not by itself sufficient to make that expression (or the class that contains it) a functional one. Conversely, being identified as a functional expression may not always imply that the class it belongs to is closed. For example, the class of adverbs, generally construed as an open class, contains the expressions here and there, which are often classified with functional expressions, being essentially "pro-adverbs." Furthermore, there are closed classes, such as the pronouns, whose functional status is unclear and which are variously classified as reflexes of either major or functional categories (Noun, Agr, or Det). Thus, although there is a strong correlation between functional status and closed class, the property is neither necessary nor sufficient to distinguish functional classes from lexical ones.

2.2. Phonology and Morphology

A number of phonological differences between functional and lexical expressions have been noted. For example, evidence from English indicates that nonaffixal functional expressions typically lack metrical stress (see Cutler and Norris, 1988) and their vowels tend to be reduced and centralized (although this is unlikely to be true for all affixes in highly inflecting languages). For English, this phonological difference can also be seen in the general lack of initial strong syllables for functional expressions (9.5% of the 188,000 words in the London-Lund corpus), although it is common for lexical expressions (90%) (see Cutler and Carter, 1987). This reduced phonological status of functional expressions is reflected in their morphological structure. Functional expressions tend to be less independent than lexical expressions and are often encoded as bound morphs or clitics, as illustrated in (2).

(2) a. I'll work it out. (< will)
    b. *I'll the kettle. (< fill)
    c. We've arrived. (< have)
    d. *We've on Sunday. (< leave)
However, phonological reduction may also occur with lexical expressions in certain contexts. For example, it is likely to occur if a lexical expression is repeated or strongly predictable from the discourse context. In certain cases some expressions may even lose their lexical integrity (e.g., wanna < want to, gonna < going to), the latter contraction occurring in real lexical constructions (in British English, at any rate) as in I'm [gauna] London. On the other hand, it is possible in certain circumstances to accent functional expressions (e.g., in contrastive focus: I saw THE best book on Minimalism today, Cinderella HAS gone to the ball, etc.). Thus, although phonological and morphological reduction is indicative of functional status, it is not criterial.

Following the general tendency for functional expressions to form closed classes, we find that they do not generally undergo derivational or other word formation processes like compounding (unfair ~ *unmany, verbify ~ *himify, owner ~ *haver). It is certainly true that books about morphology discuss such processes only as they apply to content words, and there are few uncontroversial examples of derivation as applied to functional expressions. On the other hand, the lack of derivation is not a sufficient criterion for functional status, as many lexical expressions fail to undergo expected or possible derivational processes (e.g., unhappy ~ *unsad ~ *unmany). Hence, again, we see that the lack of derivational morphology associated with functional expressions is not a sufficient condition to distinguish functional expressions from lexical expressions.

2.3. Syntax

A number of syntactic differences have been said to distinguish lexical and functional expressions. In the first place, the latter appear in more restricted syntactic contexts than the former. For example, functional expressions usually appear in just a few syntactic contexts, and these are definitional of the class they belong to.
Thus, modals must appear in construction with⁴ a bare V (or zero proverb) (Kim may go/Kim may/*Kim may going/*Kim may a dog); articles all appear in construction with a following noun and nowhere else (the goose/*the ran, etc.); quantifiers appear independently (many/all) or in construction with a following noun (many geese/all sheep) or with a following of phrase (many of the sheep/all of the sheep), and so on. For lexical expressions, on the other hand, syntactic context varies widely and is not definitional of the class as a whole, or even of distinct subclasses. For example, lexical expressions may appear in various syntactic environments: verbs may appear with or without direct objects, with sentential or nominal complements, or with NPs in various cases (e.g., partitive for accusative in Finnish) (believe ∅/the story/that Kim is mad/Kim to be mad); nouns may appear with or without determiners (water/the water/the water of Leith); adjectives may appear predicatively or attributively; and so on. Thus, the fact that an expression is a verb says nothing about the number and class of its complements. However, identifying an expression as a (proper) quantifier (in English) automatically predicts that it may appear on its own, with a common NP, or with a following of phrase containing a definite NP.

Furthermore, if a functional expression can appear with an expression with a particular property, then it will appear with all expressions with the same property. Thus, the can appear with any common noun in English, a can appear with any singular count noun, be can appear with any transitive verb that has an en form, and so on. For lexical expressions, however, there is no guarantee that an expression can appear with all relevant complements. Thus, while transitive verbs all take NP direct objects (by definition), it is not the case that a particular transitive verb will appear with every NP because of selectional restrictions (e.g., kick the football/*kick many ideas).
It is also possible for lexical expressions to be so restricted in their distribution that they will appear with only one or two items in the language (e.g., addled in English, which can collocate only with the words eggs and brains). The possibility of restrictive collocation does not seem to hold of functional expressions and may be attributed to the fact that such expressions typically do not impose idiosyncratic semantic selectional restrictions on their complements.

Another aspect of the syntactic restrictedness of functional expressions, unlike lexical expressions, is that there are no processes that alter their selectional properties. Thus, there are no processes that apply to functional expressions⁵ that alter the status of their semantic arguments (as in passivization, raising, etc.), whereas such processes are common for lexical expressions and ensure that they appear in a wider range of syntactic contexts. Furthermore, it is not normally the case that long-distance dependencies alter the contexts in which functional expressions may be found. Question movement, topicalization, extraposition, and so on, which may radically alter the environments in which lexical expressions are found, do not generally apply to the complements of functional expressions. This is necessarily true of affixes, but also holds of more independent expressions, hence the ungrammaticality of expressions like *cats, Kim really liked the, parallel to The cats, Kim really liked.
This is not always true of all classes of functional expressions, however. For example, both auxiliaries and prepositions in English permit the extraction of their following complements (e.g., Who did Kim give the book to? What town did Kim send the cat to? Lou said he must go to town, and so go to town, he must.). However, such extractions are not common and are often subject to restrictions not apparent with lexical expressions. Thus, in English, the topicalization of a VP after a modal or auxiliary is strongly literary, whereas extraction from prepositional phrases is not completely free. It does not, for example, apply to clausal complements (assuming that complementizers like because are prepositions, see Emonds, 1976) (e.g., *Kim is mad, Jo is not happy because), nor to prepositional ones (*Through what did Kim go out? parallel to What did Kim go out through?). It is worth noting in this regard that auxiliaries and prepositions both have stronger semantic argument properties than many other functional expressions and, given the association often made between argument structure and extraction,⁶ it is possible that this property is responsible for such exceptions to the general rule.

Conversely, there are processes that apply to functional expressions that do not apply to lexical ones. An obvious example is provided by the auxiliary verbs in English, which may appear before the subject (Will Hermione sing?/*Sings Hermione?); host the negative clitic n't (Hermione won't sing/*Hermione singn't); and permit cliticization to a preceding element (Hermione'll sing soon). Although there are some verbs that occupy an awkward midway position between auxiliary and main verb in allowing some of these processes (such as need, dare, see Pullum and Wilson, 1977, inter al.), the majority of verbs show none of them.
Groups of functional expressions also tend to cluster together around a particular major class (e.g., determiners and quantifiers with nouns; tense, aspect, and agreement with verbs) and these groupings define syntactic domains of a particular type (an extended projection in the terminology of Grimshaw, 1991). Thus, in English any expression appearing after the must be interpreted as nominal, whereas any expression appearing with a modal must be verbal [e.g., (3a, 3b)]. Where functional expressions from different domains are combined, the result is generally gibberish [e.g., (3c)].

(3) a. the kill (N) ~ may kill (V)
    b. the killing of the whale (N) ~ may kill the whale (V)
    c. *the ran ~ *many bes ~ *may them

The same strict interpretation of syntactic domain does not hold of combinations of lexical expressions, and apparently anomalous combinations of expressions (e.g., adjective plus verb) do not necessarily lead to nonsense. Thus, the strings slow ball or cat killer may be used in different environments without being incomprehensible; compare (4a) with (4b) and (4c) with (4d).

(4) a. Kim hit a slow ball (N).
    b. Kim slow balled it into the back of the net (V).
    c. Felix was a cat killer (A/N).
    d. Felix cat killered it round the garden (V).

Another important property of functional expressions is that they can alter the categorial status of lexical expressions, whereas the latter cannot "coerce" functional expressions out of the domain that they define. Thus, tense morphemes are always verbal, articles are always nominal, whatever lexical expression they appear with.⁷ Looked at extensionally, once a functional expression has been assigned to a general domain (nominal, verbal, or whatever), then it always remains in that domain (although certain ones may be underspecified, like English -ing forms, which can appear in nominal or verbal contexts, see Borsley and Kornfilt, this volume). Lexical expressions, on the other hand, are freer to appear in different syntactic domains.⁸

Thus we have a situation where functional expressions generally exhibit a more restricted syntax, are more categorially determinate than lexical expressions, and are often also associated with syntactic positions that cannot be occupied by lexical expressions. Furthermore, they cannot be coerced out of their syntactic category in the same way as lexical expressions. These properties are more robustly and generally applicable to functional expressions than those discussed in previous sections. Again, however, they are neither fully necessary nor sufficient to guarantee that some expression is functional, as there are lexical expressions that have restricted syntax (e.g., addled, noted above) and that resist appearing as members of more than one category, and there are functional ones that appear in a wider range of contexts and as members of different categories (e.g., the participle forms in English) and that appear only in positions that can be occupied by lexical ones (e.g., pronouns).
2.4. Semantics

The most quoted semantic difference between the two classes of expression is that functional expressions have a "logical" interpretation, whereas lexical expressions have a denotative one. Thus, we find that major word classes have been traditionally defined in terms of their supposed semantic denotations. Nouns are notionally classed as expressions that name persons, places, or things; verbs are classed as expressions that denote actions, processes, states, and so on. Although structuralist linguists have denied the utility of such notional definitions of the parts of speech, the concept was defended in Lyons (1966) and has reentered the literature in terms of semantic sorts. Thus, many theoretical frameworks make use of Davidson's ontological distinction between events and individuals (see Davidson, 1967). Although the correspondence is not strictly parallel to the syntactic classification of verb versus noun (phrase), its recent appearance indicates a persistent tendency for lexical expressions to be defined in terms of their denotation (i.e., through the ontological properties of the sorts of thing they typically identify). (See also Anderson, 1997, for another recent notional theory of the parts of speech.)

Functional expressions, on the other hand, are said not to denote in the same way: they do not pick out sets of primitive elements, and ontological considerations do not have an effect on their classification. Instead, functional expressions typically semantically constrain relations between basic denotational classes or provide instructions for where to look for the denotation specified by an associated lexical expression. So, for example, quantifiers relate cardinalities and proportions of elements between nominal and verbal denotations; articles provide information about the discourse status of a referent; tense provides information about the relative time an event occurs; modals provide information about the status of an event or proposition (e.g., as possible, necessary, etc.).

However, such an approach begs many questions. Precisely what it means to have a logical interpretation is not easy to define, and the attempt at a characterization of the semantics of functional expressions in the previous paragraph is not easy to sustain. For example, although it is often true that functional expressions constrain relations between classes of primitive denotata, this does not hold of anaphoric expressions such as pronouns, pro-adverbs, and so on, which have a referential rather than a relational function. Furthermore, many lexical expressions denote relations that may, as in the case of verbs taking intensional complements like want, be as complex in semantic structure as more obviously functional expressions. Nor is it possible to maintain a view of functional expressions in which they typically convey less information (in some sense) than lexical ones.
The semicopular verbs in English such as seem, appear, and so on, are typically treated as lexical expressions despite the fact that the information they convey bears comparison with that conveyed by the modal auxiliaries like can, may, which are treated as functional. Moreover, certain apparently functional expressions (like quantifiers such as several and numerals) again appear to convey as much information as lexical nouns (such as number, mass, and so on). It appears, therefore, that although there does intuitively seem to be some content to the idea that the major lexical classes denote ontologically basic elements, a purely semantic characterization of the difference between functional and lexical expressions is unlikely to be sustainable.

A more robust semantic property that differentiates the two classes, however, can be found in the interrelations that hold between members of different subclasses of expressions. Lexical expressions are linked in complex arrays of sense relations and exhibit identifiable semantic relations with each other, in terms of synonymy, hyponymy, selectional restrictions, and so on (see Cruse, 1986, for an extended discussion). These properties constitute the subject matter of most work on lexical semantics and provide interesting insights into the way our experience of the world is structured. No such sense relations obtain between (subclasses of) functional expressions. Although classes of functional expressions do exhibit similarities in meaning, this always results from the defining characteristics of the class itself. Thus, the and a might be described as "opposites" (or co-hyponyms) of definiteness, but the relation between them is not one that is identifiable in groups of lexical expressions, nor is there ever a corresponding superordinate expression (i.e., no general purpose definite/indefinite marker) that can be transparently related to other subclasses of functional expressions.⁹ Quantifiers also form a class whose members exhibit a number of logical relations with each other, but these result from the basic semantics of the class in determining relations between sets, and the common characteristics are constrained by properties like permutation, conservativity, and so on (see van Benthem, 1986), which are hypothesized to be universal, unlike the parochiality exhibited by semantic fields in different languages. In other words, classes of functional expressions are semantically isolated, whereas lexical expressions are linked in complex arrays of meaning relations.

Another semantic property displayed by lexical expressions but not by functional ones involves "coercion," or the modification of the denotation of one lexical expression by that of another. A classic example of this involves the influence of a complement NP on the Aktionsart of a sentence (see Verkuyl, 1993, inter al.). Thus, a bounded NP object like three dinners with an essentially unbounded process verb like eat produces an interpretation of the event as bounded (i.e., as an accomplishment), whereas a semantically unbounded NP (mass or bare plural) induces a process interpretation (5a-5c). This does not happen with functional expressions, whose interpretation remains constant whatever semantic characteristics are displayed by the expression with which they combine.
Notice further that combining a distributive quantifier with a mass term (or vice versa) does not affect the basic interpretation of the quantifier, which remains distributive (or mass). So three wines is distributive/count in (5d) and much sheep remains mass in (5e), despite the normal denotation of the complement noun.

(5) a. Kim ate all day.                                  Unbounded/Process
    b. Kim ate three ice creams.                         Bounded/Accomplishment
    c. Kim ate ice cream all day.                        Unbounded/Process
    d. Kim drank three wines.                            Count
    e. Much sheep was eaten by the infected cattle.      Mass
The effects of semantic coercion go beyond Aktionsart, however. Because of the existence of selectional restrictions, different combinations of lexical expressions may give rise to metaphorical effects requiring inferencing to resolve apparent contradictions. In other words, attempts will be made to accommodate apparently anomalous combinations of lexical expressions, yielding metaphorical interpretations that may alter the basic type of object described by a phrase. Thus, in (6a) the event described is a physical one, whereas in (6b) an abstract event is described (see Pittock, 1992, for some discussion of this form of semantic coercion). Contradictions generated by combinations of functional expressions, on the other hand, lead to incomprehension/ungrammaticality (*all some books cf. all the books). In other words, the meaning of a functional element is not negotiable: there is no "inferential space" between a functional expression and the expressions with which it combines.

(6) a. The river flowed to the sea.          Physical
    b. Kim's thoughts flowed to Skye.        Abstract

A further semantic property of functional expressions that has been noted is that they may yield no semantic effect in certain environments. This is typically said of case or agreement, and in Chomsky (1995) a distinction is made between interpretable and noninterpretable features from which a number of theoretical consequences are derived. It is certainly the case that grammatical properties that are determined by other elements may not be semantically interpreted. Thus, the preposition to following a ditransitive verb like give is said not to have a semantic role but to act like a case-marker, in distinction from its use following an intransitive verb of motion like go. However, it is unlikely that any grammatical distinction that is not purely morphological (e.g., declensional class and the like) is entirely without interpretative capability. For example, agreement is often asserted (usually without discussion) to be an instance of a category without semantics, its sole role being to encode dependency relations. But this is shown to be false when one takes into account examples where agreement relations are broken (instances of grammatical disagreement). Where expected patterns are disrupted, the disagreeing feature signalled by the functional expression (usually an affix) induces an interpretation based on the interpretation of that feature.
We find examples of this in many languages that have a system of agreement, as illustrated in the examples from Classical (Attic) Greek in (7a-7b). In (7a), there is a disagreement in number on the verb that emphasizes the individual nature of the withdrawal, and in (7b) there is a disagreement in gender that signals the effeminacy of the subject (see Cann, 1984, ch. 6 for further discussion of such phenomena).10 (7) a. to stratopedon anekho:ru:n Thucydides 5.60 the army[sg] withdraw [3pl] 'The army are withdrawing (severally).' b. kle:sthene:s esti sophoitate: Kleisthenes[masc] is wise[superlative,fem] 'Kleisthenes is a most wise woman.' However, although it does not appear to be true that functional expressions always lack semantic effect, it is true that this is often suppressed or eliminated in normal environments. Such is not the case with lexical expressions, however. The meaning of a lexical expression is not fully suppressed, even in strongly idiomatic or
metaphorical environments, as can be seen in the ways in which metaphors and idioms can be felicitously extended. For example, (8a) makes a better extension of the figurative sentence in (6b) than (8b), whereas (8c) is a more informative statement than (8d), showing that the literal meaning of expressions is not completely suppressed in coerced (metaphorical or idiomatic) interpretations.

(8) a. and eddied around the poor cottage where her mother lived.
    b. ??and exploded beside the poor cottage where her mother lived.
    c. Tabs were kept on the victim but they kept blowing off.
    d. ??Tabs were kept on the victim, but they were very noisy.

Although an absolute distinction between the semantic properties of functional and lexical expressions probably cannot be made, semantic differences between the two classes do thus exist. Lexical expressions engage in rich semantic relations with others of the same sort, but their interpretation is subject to inferential manipulation in context. The semantics of functional expressions, on the other hand, may be suppressed in normal environments, but their interpretation cannot be coerced by other expressions with which they appear.

2.5. Diachrony, Polysemy, and Homonymy

If there were a strong categorial differentiation between functional and lexical expressions, this would imply that the sets of expressions that make up the two macroclasses are discrete. This requires formally identical morphs that have both functional and lexical manifestations to be treated as homonyms rather than polysemes, which typically do not involve different categories. Hence, the morph to in English in its grammatical usage as a case marker ought to be classed as functional, whereas its manifestation as a preposition of motion should be classed as a lexical homonym. However, it is far from clear that the two uses of the preposition are as distinct as homonymy implies.
For example, as a case-marker to only marks NPs whose relation to the event described by the main verb can be described as a goal. There are no examples of this preposition marking patient, theme, or source NPs, indicating that it is a semantically reduced variant of lexical to.11 This observation has led to the view, advocated in Adger and Rhys (1994), that case-marking prepositions (and other functional expressions in Chinese, see Rhys, 1993) mediate the thematic role assigned by a verb. Thus, although such prepositions, whose appearance is determined by a verb, do not themselves assign a full thematic role to their complement NPs, they provide bridges to help verbs assign the correct thematic roles to their arguments and so must be of the right sort to identify that role. If we accept this view, then we could hypothesize that there is only a single preposition to12 in English that has functional and lexical manifestations. Other evidence against homonymy comes from diachronic processes of grammaticalization. According to one theory (Hopper and Traugott, 1993), an expression develops into a grammatical homonym through a period of polysemy involving pragmatic enrichment (9).

(9) single item > polysemy (pragmatic enrichment) > homonymy ("bleaching")

It is the middle phase that poses problems for the idea that there is a discrete categorial difference between functional and lexical. The notion of polysemy requires there to be a single lexeme used in different contexts to give different but related meanings. If the dichotomy between functional and lexical is analyzed in terms of discrete categories, then it should be impossible for any expression to have polysemous uses that straddle the boundary between them. However, it is clear that this is precisely what does happen where a lexical expression is in the process of developing grammatical uses. An example of this sort of polysemy is given by the verb have in certain dialects of English.13 The different constructions involving have do not partition neatly in terms of their syntactic properties according to whether they are contentive (i.e., semantically "full") or functional (semantically bleached). Thus, from a semantic point of view the decrease of semantic effect goes from Process (Jo had a party), through Possessive (Jo has three books), to Causative (Jo had the cat cremated), Modal (Jo had to go home), and Perfect (Jo had gone home) (10a). However, classic tests for auxiliaryhood in English (see Pullum and Wilson, 1977) show a different pattern that cuts across the semantic development, with the possessive showing more auxiliary-like behavior than the causative or modal uses (10b).14

(10) a. Process > Possessive > Causative/Modal > Perfect
     b. Process/Causative > Modal > Possessive > Perfect

The mismatch between the auxiliary status of the verb in each construction and its semantic content seems to deny any clear distinction between the contentive and functional uses of this expression, thus undermining the idea that there is homonymy, and leading to the conclusion that have is a single polysemous expression in this English dialect. The fact that one must recognize functional/lexical polysemy, at least for certain stages in the grammaticalization of an expression, makes a strong categorial distinction between functional and lexical expressions problematic.

2.6. Discussion

From the above discussion it appears to be true that certain grammatical tendencies are related to the functional/lexical distinction. Functional expressions tend to form closed classes; to be phonologically and morphologically reduced; to appear in a restricted range of often idiosyncratic syntactic environments; to appear in general categorial domains from which they cannot be shifted; to have
meanings that may be fully suppressed in certain environments; and to allow the possibility of syntactically and semantically coercing lexical expressions. Lexical expressions, on the other hand, seem not to have these properties, but to form open classes, to be morphologically free, to appear in a wide range of syntactic environments, and to be categorially and semantically coercible. However, none of these linguistic characteristics is individually sufficient or uniquely necessary to determine whether a particular expression in some language is functional or lexical. Furthermore, the discussion in section 2.5 shows that, if the functional/lexical dichotomy is categorial, it cannot be discretely so, since a single expression may show behavior that combines both functional and lexical properties. This type of pattern, where grammatical properties cluster around groups of expressions but do not fully define them, and where there is not a discrete boundary separating one class of expressions from another, is typical of a number of linguistic notions like subject, head, and so on. Such "cluster concepts" characterize gradients from one type of expression to another depending on the number of properties exhibited but seem to reflect linguistically significant distinctions. There are four ways to approach a cluster concept of this sort. In the first place, one may deny the utility of the concept in linguistic description. Second, one may treat the concept as prototypical, allowing more or less determinable deviations from a putative norm. Third, one may restrict the set of properties indicative of the category to a potentially relevant subset in order to make the concept absolute. Finally, one may assume that the concept is essentially primitive and that variability in associated properties is explicable through other means. 
With regard to a categorial distinction between functional and lexical expressions, the first position is the one taken in Hudson (this volume, 1995), which accepts the importance of the notion of functional expression (Hudson's Function Word) but denies that Function Word Category has any linguistic significance. The fact that categories are only as useful as the generalizations that can be made using them makes the lack of any defining (and, therefore, predictable) properties of functional expressions strongly indicate that a category, functional, is not a linguistically useful one. However, the fact that there are strong tendencies for functional expressions to exhibit certain types of property supports the second position, which might be taken by proponents of Cognitive Grammar (Langacker, 1987; Taylor, 1989). In such a view, there would be a prototypical functional category that would be, for example, phonologically reduced, affixal, syntactically restricted to a single domain, and semantically impoverished in some sense. Instantiations of this category would more or less conform to this prototype and shade into the prototypical lexical category. The third approach that could be taken to the apparent cluster concept of functional category appears to be the one often taken in the Principles and Parameters literature. Here an abstract view of categorization is assumed that maps only imperfectly onto particular classes of (distributionally defined) expressions within a
particular language. Functional categories, for example, may be defined as ones that do not assign a theta-role, but that select a particular (often unique) type of syntactic complement (Grimshaw, 1990). These theoretically motivated properties abstract away from the directly observable properties of functional expressions and allow the categorial distinction to be made uniformly at a more remote level of analysis. The final view of the categorial divide is the least well supported by the linguistic data, but is the one that will be pursued in this chapter. In other words, I explore the idea that the functional-lexical distinction is useful at some level of description, is not a prototypical concept, is not abstract but is categorial. To provide evidence that this is the case, however, I do not intend to explore further the linguistic properties of such expressions. Instead I will examine the psycholinguistic evidence in favor of there being a significant difference in the processing of the two classes of expression. Although it is not common to resort to experimental or pathological evidence to support linguistic hypotheses, the growing body of psycholinguistic research into the distinction between lexical and functional expressions is too extensive and important to ignore. Although none of the evidence is uncontroversial, the picture that emerges is one where the psychological treatment of functional expressions differs significantly from that of lexical ones, lending credence to the idea that they instantiate a primary categorial distinction.
3. THE PSYCHOLINGUISTIC EVIDENCE

Evidence for the significance of the functional-lexical distinction from a psychological perspective comes from three principal sources: language processing, patterns of aphasic breakdown, and language acquisition. Exactly what the functional elements are within a language is not, however, clearly defined in the psycholinguistic literature, and the distinction between functional and contentive elements is often rather crudely drawn. Typically, such expressions are referred to as "closed class" items, even though, as pointed out in section 2.1, this is not a particularly good determinant of functional status. Fairly uncontroversially, however, such a view leads to classes of expressions such as determiners (especially articles, demonstratives, and certain quantifiers like every and all), auxiliary and modal verbs, prepositions, (certain) complementizers, and pronouns being treated as functional. More controversially, also included within this grouping are the "pro-adverbs" (here and there), clausal connectives (such as therefore), and intensifiers (such as so, very, etc.). Other possible functional expressions (such as certain quantifiers like several, many) may be excluded from consideration, as are expressions (such as the quasi-modals need, dare, etc.) that behave syntactically partly like functional expressions and partly like contentive ones. In the discussion
that follows, I shall be deliberately loose in my terminology, reflecting the looseness apparent in the psycholinguistic literature.

3.1. Processing

Experiments to test the psychological mechanisms underlying language processing provide strong support for there being a significant difference in the way certain functional elements behave. In the first place, there is evidence that functional expressions are not affected by speech errors. For example, spoonerisms only involve pairs of contentive expressions and never involve functional ones (Garrett, 1976, 1980). Thus, one gets errors like The student cased every pack but not Every student packed the case for The student packed every case or A student likes the lecturer for The student likes a lecturer. Processing models (e.g., Garrett, 1980) have tried to explain this effect by assuming a level at which lexical expressions are represented in the syntactic tree, prior to the insertion of the functional elements. Erroneous replacements and switches are then held to apply at this prior level, giving the observed errors. Second, normal adults show a frequency effect in lexical decisions with contentive expressions. In other words, normal adults respond more quickly in timing experiments to more frequent words. This does not apply to functional expressions, where response times for all expressions are similar, even if on a straight count the items differ in absolute frequency (e.g., between the and those) (Bradley, Garrett, and Zurif, 1980). These results are controversial, and Bradley's dual-access route to the lexicon has been challenged in Besner (1988), Gordon and Caramazza (1982, 1985), among others, who report work that indicates that there is a frequency effect with functional expressions, as well as with lexical expressions.
It may therefore be the case that both classes do show frequency effects, but that there is a limit to the effect with the most frequent expressions, a group that is dominated by functional expressions (Richard Shillcock, personal communication). More robust evidence comes from experiments that show that normal subjects take longer to reject nonwords based on lexical expressions than those based on functional expressions (e.g., thinage vs. thanage) (Bradley, 1978, and replicated by others, see Matthei and Kean, 1989). This implies that the linguistic-processing mechanism "knows" that a word is a functional expression and thus "knows" that it will not undergo any derivational processes. For lexical expressions, the processor appears to make a wider search for matching candidates within the lexicon. Thus, it appears that the linguistic processor is able to recognize instantly a derived form as based on a functional expression and reject the form without trying to identify whether the form is well formed and/or attested. Word priming experiments (see especially Shillcock and Bard, 1993) show that there is a difference in priming between certain functional expressions and lexical expressions. Lexical expressions prime lexical homophones (so, for example, the
verb arose primes the noun rose), and they also prime semantically related expressions (for example, wood also primes timber). Functional expressions, however, do not prime homophones (e.g., would does not prime wood), nor do they appear to prime semantically related expressions (e.g., may does not prime must or might). Further evidence for the distinction between functional expressions and lexical expressions is afforded by the informational encapsulation of lexical items during processing. Priming effects are independent of the syntactic structure within which a lexical item is embedded. So, rose primes both the noun (and semantically associated flower) and the verb (see Tannenhaus et al., 1989). Functional expressions, however, are affected by syntactic context: where the syntax strongly favors a functional expression, only the functional expression will be activated. Hence after an initial noun phrase [wud] does not prime wood (or timber), and so on. This connection between syntax and closed class items is further supported by evidence from bilinguals, where in code-switching situations the functional expressions used tend to come from the language that supplies the syntax (Joshi, 1985).

3.2. Acquisition and Breakdown

Evidence from first language acquisition and from different types of language breakdown resulting from brain trauma also shows distinctions in the behavior of functional and lexical expressions. Numerous studies have focused on the acquisition of grammatical elements (see, for example, Bloom, 1970; Bowerman, 1973; Radford, 1990, and the papers in Morgan and Demuth, 1996, among many others). The data from these studies are not uncontroversial, but they indicate that functional expressions typically appear later in child language production than lexical expressions, and that functional categories appear later than lexical ones. Crosslinguistically, however, this is probably not absolutely true.
For example, Demuth (1994) reports that Sesotho children produce a number of functional, or functionlike, elements from an early age (she cites passive morphology as an example), and claims of this sort for English tend to ignore the affix -ing, which is acquired and produced relatively early (de Villiers and de Villiers, 1978). Furthermore, studies like Gerken and McIntosh (1993) indicate that children who fail to produce function words are nevertheless sensitive to their appearance in input and suggest that therefore children may have representations of such expressions before they use them. Morgan et al. (1996) further hypothesize that the functional-lexical split is innate and that children use the phonological differences to group expressions into the two classes. This, they suggest, helps the identification of word-meaning mappings by cutting down the amount of utterance material that the child must attend to. Thus, children may indeed have some (possibly underdetermined) concept of the functional expressions in the language they are acquiring. This implies that any relative lateness in the production of functional expressions may be due to the
communicative needs of the learner, since lexical expressions carry greater information than functional expressions and therefore are likely to be fully represented and so produced earlier. It also implies that functional expressions that carry a lot of semantic information or are otherwise salient in the speech stream (e.g., because of regular morphology or phonological prominence) may be acquired relatively early, while less informative or salient elements will be acquired later.15 Whatever the precise characterization of first language acquisition, however, the importance and robustness of the functional-lexical divide is clear, and that the acquisition of syntax proper by first language learners is coincident with the production of functional expressions is an accepted fact. Because of the difficulties in interpreting what children are producing or comprehending and the problems and controversies that surround the nature of child language, patterns of aphasic breakdown are, in many ways, more interesting for our purposes, because we see in such cases what happens when damage occurs to a full adult grammar. The evidence can thus be taken as strongly indicative of the nature of the mature language faculty. Aphasias can be characterized broadly as fluent and nonfluent. Fluent aphasias (Wernicke's) are characterized by the use of functional expressions, control of syntactic operations (movement), production of speech at a normal rate of speed, and appropriate intonational patterns; but comprehension is disrupted and access to information associated with lexical expressions is deficient, particularly with regard to predicate-argument structure and lexical semantics. Agrammatic aphasia (Broca's), on the other hand, is characterized by slow or very slow speech, no control of sentence intonation, impaired access to functional expressions, no control of syntactic operations. 
Comprehension, provided syntactically complex sentences are avoided, is unimpaired and lexical expressions are generally used appropriately, indicating full access to semantic information (see Goodglass, 1976). What is interesting here is that in agrammatic aphasia, semantic processing appears to be intact, while syntactic processes are disrupted. Some representative examples of agrammatic speech (taken from Tait and Shillcock, 1992) appear in (11a-11f). (11a) and (11b) illustrate difficulties with participle formation (and one example of an omitted determiner); (11c) from Italian shows difficulty with gender agreement in both articles and verbs; in (11d) from Dutch there is a missing auxiliary; (11e) from German displays wrong case assignment (accusative for dative); and (11f) from French indicates difficulty with prepositions.

(11) a. burglar is open the window
     b. Little Red Hood was visit forest grandmother
     c. il, la bambina sono, e andata
        the.m, the.f girl have has gone
     d. ik nou 21 jaar gewerkt
        I now 21 years worked
     e. die Oma sperrt ihn auf
        the grandmother opens him.acc
     f. j'ai pris chemin de d'orthophoniste en voiture
        I have taken road from/of of speech-therapist in car

Of course, the syntactic impairment shown by such dysphasics is not an absolute, all-or-nothing affair affecting the whole of a subclass of functional expressions or all occasions of utterance [cf. (11a), where there is one omitted determiner and one overt one]. However, it is clear that there is difficulty in production16 and that this principally affects functional expressions, both words and affixes. There is also evidence that agrammatic aphasics have difficulty in interpreting noncanonical structures. For example, many agrammatics have difficulty understanding passive sentences that cannot be disambiguated through semantics alone. In experiments it has been shown that performance in understanding passives where the thematic roles are easily assignable is significantly better than comprehension of passives where no semantic clues are available (Saffran et al., 1980; Schwartz et al., 1980).

(12) a. The hunter shot the duck.
     b. The duck was shot by the hunter.
     c. The square shot the circle.
     d. The square was shot by the circle.
Furthermore, it is reported in Bradley et al. (1980) that agrammatic aphasics appear to show frequency effects for functional expressions. Although, as noted above, the conclusion drawn by Bradley that normals do not show such effects with functional expressions is controversial, the effect of frequency on the recognition of words by aphasic speakers is apparently more marked than for normal ones. Again, the test for recognition of nonwords based on closed and open class items is more robust and has been replicated. Broca's aphasics show no difference in reaction times between the two types of nonwords, indicating that their recognition of functional expressions is impaired.

3.3. Discussion

The psycholinguistic evidence points to a strong distinction in the processing of functional expressions and contentive (lexical) ones. In particular, the evidence from word priming indicates that functional expressions are not encapsulated from syntax since the syntactic context that surrounds functional expressions affects lexical access, whereas syntactic context has no effect on the lexical access of contentives. Furthermore, functional expressions are recognized quickly by the processor and do not appear to interact with the mechanisms that identify contentive expressions. The data from language acquisition and language breakdown also show that functional expressions are closely linked with syntactic operations
like passive, dative shift, and so on, while lexical expressions provide sufficient information for basic semantic processing to occur, even in the absence of coherent syntax. There is thus not only strong support for a significant distinction to be made between functional and lexical elements but also for the hypothesis that functional expressions are more closely associated with (local) syntactic processing than lexical ones, which themselves are more strongly implicated in semantic processing. Evidence from neurobiology further supports the significance of the distinction between functional and lexical expressions and the association of the former with syntactic processing, as it suggests that the two types may be stored in different parts of the brain. For example, the loss of the ability to manipulate syntactic operations in patients who have damage in the anterior portion of the left hemisphere, along the angular gyrus (Broca's area), indicates that the syntactically significant functional expressions may be located in this area. Patterns that emerge from fluent aphasias indicate that lexical expressions are less strongly localized, though a general tendency toward localization within the posterior portion of the left hemisphere is attested. Following left hemispherectomy, the right hemisphere may take over functions involving lexical expressions, with a remapping of activity to that hemisphere, but it cannot take over the functions of functional ones. Speech is possible, with normal comprehension and communication, but syntactic complexity is absent. There is also evidence from neurobiological studies that indicate differences in the storage of lexical and function items. 
It appears that neuronal assemblies corresponding to function words are restricted to the perisylvian language cortex, whereas those corresponding to content expressions include neurons of the entire cortex (see Pulvermüller and Preissl, 1991, and the discussion of neurobiological implications for language acquisition in Pulvermüller and Schumann, 1994). Unfortunately, as shown in section 2, the robust psycholinguistic evidence for the distinction is not reflected in the linguistic properties exhibited by the two macroclasses of expression. If the functional-lexical dichotomy is categorial, there should, as Hudson (this volume) notes, be "generalizations which would not otherwise be possible" without the categorization. In other words, the identification of an expression as functional will predict some subset of its grammatical properties. Furthermore, in a strict interpretation of the distinction between the two categories, there should be no expressions that are morphosyntactically attributable to both classes. If a lexeme is identified as a member of a contentive class by certain grammatical properties, then it should not exhibit properties centrally associated with functional ones (and vice versa). An implication of this is that grammaticalization processes, whereby contentive expressions become functional, should exhibit an instantaneous shift from one class to the other at some point in the diachronic development. This in turn implies that lexemes that appear to have both lexical and functional uses should behave as homonyms
and so should exhibit morphosyntactic properties that are entirely independent of each other. The fact that these properties do not appear to hold indicates that the important psycholinguistic notion of the functional-lexical distinction does not constitute a linguistic category. On the other hand, the notion does mirror the conceptual distinction between grammatical and contentive categories within linguistic theory, and there are clear connections between the psycholinguistic conception of functional expressions and their linguistic behavior. Thus, although there are no necessary and sufficient conditions that identify expressions as of one type or another, as noted in section 2.6, functional expressions are generally associated with restricted syntactic contexts and are not amenable to syntactic or semantic coercion, whereas lexical ones appear in a wider range of syntactic contexts and are syntactically and semantically coercible. This is reminiscent of the association of functional expressions with syntactic processing and lexical ones with semantic processing noted previously. We appear to have a situation, therefore, in which an important psycholinguistic distinction is not fully reflected in linguistic properties, but where there is a clear, but imprecise, relation between the processing properties associated with functional and lexical expressions and their general syntactic and semantic behavior.

3.4. E-language and I-language

The apparent contradiction between the categorial nature of the functional-lexical distinction implied by the psycholinguistic evidence and the noncategorial nature of the distinction implied by the lack of definitional linguistic properties can be usefully approached in terms of the distinction between E-language and I-language made in Chomsky (1986). The term E-language in that work is used to refer to the set of expressions that constitute the overt manifestation of a language in terms of actual utterances and inscriptions.
It is something that may be observed directly as the output of linguistic behavior, an extensional or ostensive view of language that may be equated with the structuralist and mathematically formal notion of a language as a set of strings of basic elements. Different from this is I-language, which is characterized as an internal representation of structures that gives rise to the external manifestation of a particular language. I-language may be construed as a metalanguage that generates (or otherwise characterizes) E-language and is equated in Chomsky (1986) with a parameterized state of Universal Grammar. I-language thus consists of grammatical elements that are universally available to humans and that are manipulable by universal linguistic principles. E-language, on the other hand, necessarily consists of language-particular elements (the expressions of the language) whose description at the level of the given phenomena must also be parochial and not necessarily amenable to analysis that is crosslinguistically generalizable.
Considerations of this sort led Chomsky (1986) to argue that it is I-language that is the proper object of inquiry for linguistics, because it is this that results from the operation of universal linguistic principles and is thus directly relevant for the understanding of Universal Grammar. E-language, on the other hand is, for Chomsky, relegated to the status of an epiphenomenon, a symptom of language rather than its substance. Leaving aside the ideological battle that informs much of the debate around this topic, we may question whether there are in fact no aspects of E-language that are best described on their own terms (i.e., for which an I-language explanation misses the point and fails to adequately characterize all the relevant properties). Indeed, it is precisely with respect to this question about the nature of the functional-lexical dichotomy that the potential drawbacks of having a purely I-linguistic characterization of the language faculty are thrown into focus. Psycholinguistic investigation into language processing is principally concerned with the investigation of human responses to E-language. Descriptions of aphasic behavior or first language acquisition relate to the linguistic expressions that are produced or, less frequently, comprehended by the people being studied. Priming and other sorts of psycholinguistic experimentation record reactions to written or spoken tokens of expressions that are (or are not) part of a particular E-language. 
We may, therefore, hypothesize that the functional-lexical dichotomy indicated by psycholinguistic evidence is an ostensibly E-language notion, and we may assume that at the level of E-language (the set of expressions, particularly basic expressions, that extensionally define a language), the distinction is categorial, because it does identify a significant grouping of expressions that show identifiable traits in parsing and production (functional expressions are not encapsulated in processing, are accessed quickly, etc.). This hypothesis is supported by the fact that the set of functional expressions within a particular language is always sui generis in the sense that different languages overtly manifest different types of functional expression. English, for example, has no overt manifestation of gender agreement, nominal case, or switch marking, whereas a language like Diyari (Austin, 1981) has morphemes that express these concepts but no person agreement or aspect marking. I-language relates principally to the need to account for universal properties of language, whereas the "accidence" of grammar has traditionally been viewed as a language-specific phenomenon, but one that determines the properties of a specific language independent of its "substance." Insofar as accidence and functional expressions coincide, we might expect the study and analysis of this aspect of grammar to be language specific. In current transformational grammar, of course, the variability associated with accidence is attributed to universal parameters and, as such, is in the domain of I-language rather than E-language. However, parameters are intended to determine variable properties of language that are linked together in some way. Arbitrary variations in the grammar of a language (e.g., a language has ejective consonants, fusional morphology, no overt WH-expressions, etc.) are relegated to the lexicon. What is not addressed is how significant such language-specific properties are and how much they contribute to the linguistic structures of a language beyond an epiphenomenal haze of arbitrary attributes. There is no a priori reason why external and nonuniversal properties cannot be linguistically significant. Aspects of E-language may determine certain aspects of grammaticality and interact with I-linguistic properties in interesting ways. In fact, the association of functional expressions with local syntactic processing and their independence from semantic processing implies a radical differentiation in the ways that functional and lexical expressions are represented. This hypothesis, that the functional-lexical distinction is an E-language phenomenon, will be pursued in the remainder of this chapter with a view to proposing a view of the grammar whereby extensional and external properties of language interact with intensional and internal ones and marry aspects of processing and theory in an interesting way.
4. CATEGORIZING FUNCTIONAL EXPRESSIONS
At the end of the previous section, the hypothesis was promoted that the functional-lexical distinction is categorial at the level of E-language. We will refer to this E-language category ("E-category") as "functional" and take it to apply to the set of expressions in any language that show the psychological properties illustrated in the last section. In other words, the hypothesis is that the categorization of functional expressions is determined for an individual language through properties of processing and frequency. It is possible that certain types of phonological cue may also help to define this category.17 In other words, such a categorization is determined by properties that clearly belong to E-language and, hence, it must be language specific and not determined by universal factors. This is not to deny that expressions with certain inferential or semantic properties (such as anaphors and tense) will tend to be encoded by functional expressions, but the categorization of the expressions of a language into functional and lexical is one that is determined by the external manifestation of that language, as discussed above. This primary categorization into the macroclasses, functional and lexical, induces a split in the vocabulary that permits further (E-)categorization to take place. In this section, I explore the nature of this further categorization and develop a view of the way a theory of syntax may be developed that utilizes the different types of information associated with the two types of expression.
4.1. Defining E-Categories
Although not much in vogue in many current approaches to syntax, the quintessential type of syntactic categorization has generally been determined through properties of distribution. This approach to categorization finds its most elaborated form in the writings of the European and American structuralists (see, for example, Harris, 1951; Hjelmslev, 1953; Hockett, 1954). Morphosyntactic classes are defined by the syntagmatic and paradigmatic properties of expressions, typically through the use of syntactic frames: expressions are grouped into classes according to their ability to appear in a position determined by a particular frame. Clearly again, this type of categorization is induced by properties of E-language, as it depends solely on the appearance of expressions with one another and not on more abstract linguistic properties. Hence, one might look for further subcategorization of the lexical and functional categories to be determined by such a process. There is, however, a well-known methodological problem with this type of classification: how to determine which distributional frames are significant and which are not. In the classical model, categorization is meant to be automatic and determined without reference to semantics so that any linguistic context can in principle be used to define a distributional frame (see particularly works by Harris). In practice, of course, this ideal is not (and cannot be) met for all expressions in a language. The semantics of an expression is often used to determine whether it should be identified as a preposition or an adverb, a pronoun or a proper name, a verb or an adjective, before any distributional analysis is carried out. More problematic is the selection of significant distributional frames.
For reasons to do with selectional restrictions, register, and other factors, if categorization is determined by distributional frames that are allowed to mention specific words, then almost all expressions in a language, including major class ones, will define unique word classes, since they will appear in a unique set of contexts. Clearly, if this applies to lexical as well as functional expressions, then this is problematic from the point of view of the grammar, since in the worst case it requires the same number of distributional (E-)categories as lexical expressions, preventing significant generalizations from being made. To get around this problem, structural linguists have tended to use broad, and sometimes arbitrary, syntagmatic frames to define word classes. This methodological problem is one that led to the move away from distributional theories of categorization to ones that rely on abstract or notional properties. In fact, however, the difficulty disappears if categorization is determined, not with respect to all basic expressions in a language, but only with respect to the functional ones. As noted in section 2.1, the number of functional expressions is itself small, so that even if every functional expression in a language appears in a unique set of contexts, the number of different categories that need to be recognized will still be small (no greater than the number of functional expressions). Furthermore, because there are no operations that alter the syntactic environment
of functional expressions in the same way as for lexical ones, the number of significant contexts for any single functional expression, abstracting away from individual lexical expressions, will be small. Moreover, since functional expressions can appear with all members of an associated lexical class, and they coerce lexical expressions to be of the appropriate class in context, we may further abstract away from individual lexical expressions and refer only to major class labels. Thus, instead of classifying articles in English in terms of an indefinite number of frames [_ dog], [_ student], [_ hamster in a cage], and so on, they are classified in terms of the single frame [_ N]. In order that the distributional definitions of functional categories are not circularly reapplied to the definition of the major parts of speech (e.g., by taking the frame [the _] to identify particular lexical expressions as nouns), labels like N and V must be taken to be a priori categories that the class of functional expressions define extensionally. Thus, in English, whatever expression appears in construction with the, some, and so on, is necessarily (headed by) a noun or with may, will, and so on, is necessarily (headed by) a verb. In other words, E-functional categories are defined over the class of functional expressions and a small set of major class labels like N and V, the latter of which are universally given and hence may be considered to form part of the vocabulary of I-language. A restricted vocabulary, of course, does not guarantee that the set of distributional frames that needs to be considered will also be small or even finite. However, it seems (again because of the restricted syntactic distribution of functional expressions) that significant distributional frames will be in the region of two to four words in length.
In general, increasing the size of the context used to identify classes of functional expressions will have no effect on the membership of those classes.18 For example, with respect to the illustrative set of frames for part of the functional system in English (13a-k),19 frames like [_ V+ed the N] or [_ has been V+ing] will pick out exactly the same class of expressions as (13h); frames like [_ N of the N], [_ A N], [_ A A N], and so on, will pick out the same class as (13a), and so on.
(13) a. [_ N] = {the, a, every, much, no, my, your}N
b. [_ N+s] = {the, some, many, few, all, no, those, my, your}N
c. [the _ N+s] = {many, few}N
d. [_ of the N] = {all, many, few, some, none}N
e. [_ the N] = {all}N
f. [_ V+s] = {he, she, it, this}N
g. [_ V] = {you, they, I, we, those, these, many, several, few}N
h. [_ V+ed] = {I, you, he, she, it, we, they}N
i. [N+_] = {-s}N
j. [V _] = {here, there}ADV
k. [A+_] = {-ly}ADV
(14) [Figure 1]
One of the interesting things to note about functional E-categories is that they cut very finely. For example, given the representative data about the functional expressions in the nominal field in (13a) to (13e), we find that the different distributional classes are not fully generalizable to all members of this subclass. Thus, although most of the expressions that satisfy the frame [_ of the N] (abstracting here away from number) also satisfy [_ N], at least one does not (i.e., none), and while most expressions that satisfy [_ V] satisfy [_ N] (and vice versa), not every relevant expression satisfies both (the personal pronouns satisfy the first but not the second, whereas the articles a, the, and possessive pronouns satisfy the second and not the first). However, some of the frames considered above do appear to be predictive: [_ of the N] predicts [_ V] and [the _ N] and [_ the N] predict [_ N] and [_ V] (when restricted to functional expressions, as we are doing). The intersection of the classes defined by [_ of the N] and [_ N] yields a further class. We can diagram these relations using the (subsumption) lattice in (14), where the nodes correspond to sets of expressions that can appear in a particular frame, to the intersection of classes defined by different frames, or to the complement of such intersections with respect to the two original sets. In this way, a complex array of distributional categories emerges. As one goes down the lattice, the categories (necessarily) become smaller, with all and none defining categories of their own. Indeed, if one cuts across the lattice with further properties (like syntactic number) then further differentiation occurs, with, for example, a and much being distinguished from the and no, and so on. Ultimately, the process leads to very small classes of expression, often containing only one member.
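The fine-grained classes described above can be illustrated computationally. The following is a minimal sketch, assuming only the illustrative data in (13a) to (13e); the names `FRAMES`, `distributional_classes`, `nominal`, and `partitive_only` are hypothetical and are not part of the original analysis. It groups expressions by the exact set of frames they satisfy, which is one simple way of approximating the lower nodes of a lattice like (14).

```python
from collections import defaultdict

# Frames (13a) to (13e), transcribed from the text; the data are illustrative only.
FRAMES = {
    "[_ N]": {"the", "a", "every", "much", "no", "my", "your"},
    "[_ N+s]": {"the", "some", "many", "few", "all", "no", "those", "my", "your"},
    "[the _ N+s]": {"many", "few"},
    "[_ of the N]": {"all", "many", "few", "some", "none"},
    "[_ the N]": {"all"},
}


def distributional_classes(frames):
    """Group expressions by the exact set of frames they satisfy."""
    signature = defaultdict(set)
    for frame, members in frames.items():
        for expr in members:
            signature[expr].add(frame)
    classes = defaultdict(set)
    for expr, satisfied in signature.items():
        classes[frozenset(satisfied)].add(expr)
    return dict(classes)


# Abstracting away from number: the classes for [_ N] and [_ N+s] are pooled.
nominal = FRAMES["[_ N]"] | FRAMES["[_ N+s]"]

# Expressions that satisfy [_ of the N] but not [_ N(+s)].
partitive_only = FRAMES["[_ of the N]"] - nominal

classes = distributional_classes(FRAMES)
singletons = sorted(next(iter(c)) for c in classes.values() if len(c) == 1)

print(partitive_only)
print(singletons)
```

On these data the number of classes cannot exceed the number of expressions, and several classes contain a single member, in line with the observation that the categories become very small toward the bottom of the lattice.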
This approach to the categorization of functional expressions thus yields a set of relations between individual morphemes that essentially treats each such morpheme as syncategorematic (or equivalently acategorematic), whose syntactic interpretation is given by its position within a distributional lattice as in those shown in (13). It comes as no surprise in such a
view of functional categories that there will be expressions which are entirely sui generis and appear not to relate directly to other functional expressions (like perhaps the complementizer that, see Hudson, 1995). If it is the case that basic syntactic environments define a (meet semi-)lattice in terms of the elements that appear in them, then one only has to know the point at which a particular element is attached to the lattice to know its distribution. This is, of course, equivalent to defining a set of grammatical rules (of whatever sort) and assigning expressions to particular labels introduced by those rules, in the normal structuralist mode.20 It is not here important how the syntactic relations between the nodes on the lattice are determined and with what generality. What is important is that a structuralist distributional approach directly induces the categorization of functional expressions, both at and below word level, and, because of the syntactically restricted nature of such expressions, such an approach can in principle provide an exhaustive characterization of the restricted environments in which functional expressions can appear.
4.2. I-categories and E-projections
As noted above, distributional classes such as those shown in (13) define E-categories, since they are extensionally defined over the vocabulary of English. Clearly, in such a categorization, the principal categorial distinction must be between functional and lexical, since this provides the restriction on the given data that makes distributional categorization possible. The functional expressions essentially then define the E-categories of the lexical expressions through the use of universal major class labels.21 Although such an approach is in principle capable of yielding an exhaustive characterization of the strictly local dependencies of the vocabulary of a language, as it stands it determines only subclausal constituents.
Functional expressions do not provide sufficient information to enable distributionally defined phrases to be combined. Something more is needed that can induce the set of permitted combinations and presumably account for general, putatively universal, linguistic processes like unbounded dependencies and suchlike. Within transformational grammar, universal syntactic processing is assumed to operate only over I-language entities and so the relation between E-categories and I-categories becomes an important issue. One of the features of classifying functional expressions in terms of their distribution is that, because of their strict association with particular domains (nominal, verbal), basic labeling of phrases that are the output of the distributional grammar discussed in the last section can be done with respect to these domains, as indicated by the subscripts around the classes in (13). Thus, the different classes labeled N and V above are functional classes related to the universal I-categories noun and verb, respectively.22 Note that the I-categorial label is not equivalent to the E-categorial label used in the distribution frames themselves. Thus, we cannot
substitute the (or the N, or any pronoun) for N in the frames (13a) to (13d). In fact, we can usefully here distinguish between the E-category N (or V) and its I-category counterpart N (or V). If we take the position that these latter labels are the ones that are visible to Universal Grammar, then we may understand the combination of functional expressions with a lexical expression as recursively defining the resulting (complex) expression as being of the appropriate I-category. Functional classes may thus be construed as defining E-projections (to slightly modify the concept of extended projection of Grimshaw, 1991) of the major class label they contain. This is illustrated in (15) below, where the complex expressions are defined by the distributional grammar associated with the functional expressions, and the categorial label gives the resultant I-category. Note that it is not important exactly how (or whether) the internal structure of such phrases is represented. What is important is that the phrases are constructed from information provided by the E-categories of the functional expressions within a given language and that they are labeled with the I-category associated with the major class of the lexical expression they contain.
(15) a. [cat+s]N
b. [the cat+s]N
c. [all the cat+s]N
Through their associated I-category, E-projections are visible to Universal Grammar (however construed) and so may be combined through the syntactic operations that the grammar permits. One of the universal aspects of syntactic combination assumed in all current theories of syntax is the combination of lexical predicates and their arguments. Information about lexical argument structure necessarily comes from the lexical expression in an E-projection, as in (16), and E-projections may be combined by some tree-forming operation (like "Merge" in Chomsky, 1995), as illustrated in (17).23
(16) a.
b. <[have kick+ed]V, <AGENT, PATIENT>>
c. <[may have kick+ed]V, <AGENT, PATIENT>>
(17) [Figure 2]
This view of the grammar, whereby the combination of a lexical expression and its associated functional structure is defined by distributional grammar, and further combination is done through the manipulation of major I-categories and argument structure, provides a way to accommodate properties of linguistic expressions that are indicated by the psycholinguistic evidence and provides a solution to a number of the problems of characterizing functional categories discussed in section 2. From the processing point of view, the fact that functional expressions are associated with syntactic frames in a different way from lexical ones, and that they are strictly associated with syntactic frames, can be an explanation for why only lexical expressions prime homonyms; why the rejection of nonwords based on functional expressions is faster than those based on lexical ones; and why the processing of functional expressions is not encapsulated from syntax, but that of lexical ones is. Furthermore, since the variables in distributional frames are associated with the major lexical classes, only these classes of expressions will be affected by spoonerisms. In terms of language breakdown, the association of functional expressions with local syntax means that the loss of such elements automatically entails the loss of their associated syntactic properties. Hence, in Broca's aphasia, what is left intact is the ability to manipulate argument structure, and so semantically coherent expressions can be constructed using only lexical expressions. In addition, if the representation of E-categories is essentially lexical, then particular functional expressions (and their associated syntax) may be lost, whereas other such expressions may be retained, giving rise to partial fluency. Hence, it is not necessary to assume (as did Ouhalla, 1991) that breakdown necessarily involves a complete functional class. The linguistic consequences of the approach also go some way to explaining the existence of expressions that show nonfunctional properties and why certain of the properties discussed in section 2 are not good indicators of functional status.
In the first place, the property of closed versus open classes becomes mostly irrelevant. All functional classes are necessarily closed (and small), given the nature of E-categorization. However, all such classes are associated with some lexical (I-)category, and so the fact that certain functional expressions have the distribution of lexical classes is nonproblematic and expected, since the null environment (within an E-projection) is a possible environment (e.g., [there]ADV, [she]N, etc.). Second, nothing in the model prevents certain expressions that have similar semantic functions to functional expressions from being treated as lexical. So perhaps certain quantifiers may appear in the grammar as lexical nouns with argument structure (e.g., perhaps several), whereas others (e.g., every) are only associated with the I-category noun through their position in an E-projection. Provided that the semantic force of the two expressions can be expressed (which it must be able to be), the difference in syntactic status is immaterial. Furthermore, expressions that have both lexical and functional properties are not disallowed. Such expressions can be assigned to a major E-category (through their semantic sort) but also can be associated with functional domains. So, have
may be a verb through its association with the sort event, but may also be associated with distributional frames like [_ V+ed]V and so on. This predicts that polysemous expressions that cross the functional divide will show syntactic behavior that is not determined by whether the expression is being used as a lexical or a functional element, hence the mixture of auxiliary and main verb uses of have whether or not it is used as a possessive verb or a causative marker. The general syntactic properties of functional expressions noted in section 2.3 also follow from this model. Because distribution is defined with respect to a major class label and not individual lexical items, a functional expression cannot differentiate between members of the class and so cannot select any subset of them to appear with. This property also predicts that coercion will always be to the class required by the functional expression, and not vice versa, and that functional expressions cannot coerce each other. Furthermore, if long-distance dependencies are determined by argument structure (as noted in footnote 6), then the extraction of parts of an E-projection will be impossible, predicting the ungrammaticality of *cats, Kim really thought Lou liked the.24 Finally, the difference in the syntactic operations that govern the construction of E-projections and their combination into clauses allows, but does not require, functional expressions to appear in syntactic contexts in which lexical expressions cannot. The strong differentiation made between functional expressions and lexical ones may also form the basis of an explanation for other properties noted above. For example, phonological and morphological reduction may be expected for functional expressions given the close association between expressions in an E-projection and their predictability, whereas lexical ones are not predictable.
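The division of labor described in this section, with functional expressions fixing the label of an E-projection as in (15) and the lexical expression supplying argument structure as in (16), can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the chapter's formalism: the tables `FUNCTIONAL_DOMAIN` and `LEXICON` and the function `e_projection` are hypothetical, each functional expression is assigned to a single domain, and internal structure and word order are ignored.

```python
# Hypothetical toy tables: the domain (N or V) of each functional expression,
# and the major class and argument structure of each lexical expression.
FUNCTIONAL_DOMAIN = {
    "the": "N", "all": "N", "-s": "N",
    "may": "V", "have": "V", "-ed": "V",
}
LEXICON = {
    "cat": ("N", ()),
    "kick": ("V", ("AGENT", "PATIENT")),
}


def e_projection(items):
    """Return (I-category label, argument structure) for an E-projection.

    The label comes from the major class of the single lexical expression;
    every functional expression must belong to the same domain.
    """
    lexical = [item for item in items if item in LEXICON]
    if len(lexical) != 1:
        raise ValueError("an E-projection needs exactly one lexical expression")
    label, arguments = LEXICON[lexical[0]]
    for item in items:
        if item not in LEXICON and FUNCTIONAL_DOMAIN.get(item) != label:
            raise ValueError(f"{item!r} does not belong to the {label} domain")
    return label, arguments


nominal = e_projection(["all", "the", "cat", "-s"])
verbal = e_projection(["may", "have", "kick", "-ed"])
print(nominal, verbal)
```

In this sketch a mismatch such as the with kick is simply rejected; the coercion of lexical expressions by functional ones that the text describes is not modeled.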
The proposal made above, which utilizes aspects of different syntactic theories in having the grammar partly defined by distributional rules and partly by more abstract properties of Universal Grammar, thus provides a potential basis of explanation for a whole range of phenomena that are problematic when approached from the viewpoint of a theory that envisages just one type of syntactic representation for all expressions in a language.
4.3. FUNCTIONAL I-categories
The picture of the grammar presented above, in which functional expressions define (distributionally determined) local domains over which universal grammatical principles operate, leaves out the relation between the E-categorial functional classes and the functional categories familiar from much recent syntactic theory. To relate the two notions, we might hypothesize the existence of an I-language category, FUNCTIONAL, which would consist of the nonmajor categories familiar from current transformational grammar, AGR, COMP, DET, TNS, and so on (i.e., a set of grammatical categories). The FUNCTIONAL categories are, however, independent of the language-particular morphs that somehow encode them,
since they are, by being objects in I-language, necessarily universal, whereas functional classes are language particular and defined solely through their distribution within the language and not according to their relation to some abstract linguistic property. The independence of I-language and E-language categories presents a particular problem for functional elements that is not apparent with lexical ones. The E-categorization of lexical expressions into nouns, verbs, and so on is determined by their co-occurrence with nominal and verbal functional elements (words or affixes). However, functional expressions do not classify lexical expressions into those classes, because the distributional definition of functional classes cannot, by hypothesis, refer to individual lexical items nor, as we have seen, can we classify lexical expressions according to distributional frames defined by the functional ones without circularity. Major class membership must thus be determined in some other way, presumably through basic ontological properties as suggested in notional definitions of the major parts of speech.25 The I-category associated with a lexical expression is thus determined by the I-category associated with a particular functional expression (or directly in the lexicon, if the expression can appear without any accompanying functional expression, such as adjectives and proper names in English). Its association with an E-category is, however, mediated by its semantic properties (such as its sort).26 Because of this, there is no particular problem in understanding the relation between the major E-language and I-language categories or relating lexical expressions with particular I-categories. However, this transparency of relatedness between I- and E-categories and between expressions and I-categories does not hold for the relationship between functional classes and FUNCTIONAL categories. 
Individual functional expressions, for example, typically encode more than one traditional grammatical category. Hence, while the article the in English could be considered to instantiate only the category of definiteness (18a), its indefinite counterpart encodes both (in)definiteness and number (being singular) (18b). The quantifier every encodes number (singular) and the fact that it is a quantifier (18c), whereas my encodes definiteness, agreement (pronominality), and possession (18d). However, distributionally these expressions form a functional class. What then is the relationship between this class and the FUNCTIONAL categories? Most obviously, the hypothesis should be that the functional class relates to the union of all FUNCTIONAL categories associated with its members (18e) or to their intersection (18f). Unfortunately, neither of these potential solutions tells us anything useful, since not all the members of the class exhibit all the properties indicated, and there is no one property shared by every member of the class.
(18) a. the: [DEF]
b. a: [DEF, NUM]
c. every: [NUM, QNT]
d. my: [DEF, POS, AGR]
e. {the, a, my, ..., every} = [DEF, POS, AGR, NUM, QNT]
f. {the, a, my, ..., every} = ∅
A further problem with the mapping between functional expressions and FUNCTIONAL categories has to do with the fact that certain expressions perform different grammatical functions according to their local syntactic context. For example, the morph -ed in English is interpreted either as perfect or passive (or adjectival) depending on whether it appears with the verb have or the copula be (or no verb at all). It is argued in Cann and Tait (1995) (and reiterated in Cann, 1999) that this morph is not homonymous between aspect and voice, but has a single interpretation (as an unaccusative state) whose other properties are determined by the elements with which it combines.27 If this is correct, then the mapping from individual functional expressions to FUNCTIONAL categories is not necessarily one-to-one and is thus nontransparent. It is not only the mapping from functional E-categories to FUNCTIONAL I-categories that is problematic, but so also is the reverse mapping from I-category to E-category. First, following from the observation above concerning the encoding of a number of grammatical categories by a particular functional expression, it is clear that a particular FUNCTIONAL category may be instantiated by a number of E-categories: agreement, for example, is distributed across nominal and verbal functional classes in many languages; definiteness may be distributed across articles, possessive pronouns and certain quantifiers; and so on. More importantly, FUNCTIONAL categories may be realized not only by functional expressions (affixes or semi-bound forms like the articles in English) but also by lexical ones, which may or may not be in the process of grammaticalization. For example, TENSE in English may be realized by affixes (-ed, -s), auxiliary verbs (will), or fully lexical verbs (go as in be going to).
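The point made by (18e) and (18f) can be checked mechanically. The following is a small sketch, assuming only the four feature sets in (18a) to (18d), with DEF read as the label for definiteness; the variable names are hypothetical.

```python
# Feature sets from (18a) to (18d).
FEATURES = {
    "the": {"DEF"},
    "a": {"DEF", "NUM"},
    "every": {"NUM", "QNT"},
    "my": {"DEF", "POS", "AGR"},
}

# (18e): the union of the categories encoded by the members of the class.
union = set().union(*FEATURES.values())

# (18f): the categories shared by every member of the class.
intersection = set.intersection(*FEATURES.values())

# For each category in the union, the members that fail to encode it.
lacking = {
    feature: sorted(expr for expr, feats in FEATURES.items() if feature not in feats)
    for feature in sorted(union)
}

print(union, intersection, lacking)
```

Every category in the union is missing from at least one member and the intersection is empty, which is why neither set characterizes the class.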
In Diyari, a number of TENSE and ASPECT distinctions are encoded by what appear to be full verbs followed by participles. For example, the habitual or intermediate past is indicated by the use of the verb wapa- meaning 'go', while pada- 'lie' indicates recent past, warn- 'throw' indicates immediate past, and wanti- 'search' indicates distant past (Austin, 1981:89). There is thus no direct correspondence between FUNCTIONAL category and functional expression. The Diyari example above also indicates a problem with FUNCTIONAL categories and their relation to functional classes that is part of a common concern for all universalist theories of linguistic categorization. As is well known, different languages often instantiate different values for a certain category (e.g., different types of past tense in Diyari), and no language morphosyntactically encodes every possible grammatical category. The question that arises is whether all the different values and all the different categories are to be considered universal. If so, then the theory of Universal Grammar requires every possible variation of a grammatical category to be at least immanently present in every human language, leading to further problems with regard to the representation of the nonovert categories within I-language. The position that all values of grammatical categories (or indeed all grammatical categories) are universal is not likely to be tenable, given the thousands of variations in the number and type of distinctions made crosslinguistically in all areas of the grammar. However, if categories like "distant past" are not universal, they must be represented as E-categories defined by the morphosyntax of the language concerned. Since I-categories and E-categories are defined independently of each other, this leads to the uncomfortable situation where some functional expressions within a language encode (universal) FUNCTIONAL categories, but others must contribute semantic information without the mediation of such an I-category. Whether or not it is possible to identify any "significant" universal grammatical categories that must exist independently of any sets of associated functional classes, the fact that at least some functional expressions remain unassociated with any FUNCTIONAL category raises the possibility that the content of such expressions is always input into the grammar without this sort of mediation. Considerations such as the one-to-many mapping between functional expressions and FUNCTIONAL categories, the failure of the latter to consistently map onto functional classes (or even functional expressions), and the problem of apparently language-specific functional categories lead to a view of the grammar where the latter have no independent syntactic status. Indeed, one might hypothesize that if FUNCTIONAL categories are dissociated from distributional criteria (and thus any direct connection with functional classes), then all that is left of their content are the semantic functions they perform.
Since such functions vary across functional expressions in a single language and across different languages, it may be that the categories themselves are not independently significant, and the content of functional expressions is projected directly into the semantic content of the expression without augmenting the syntactic information of the label of the expression (E-projection), as illustrated in (19). (19) a. b. c. d. e.
28 N[BAR:2, POSS: +], H[BAR: 1] The head of the phrase is only specified for the feature BAR. The HFC is a default condition that requires that the mother and the head daughter match on all features, so long as they do not conflict with any "absolute condition on feature specifications" (780). So, for instance, for the rule in (32), this will ensure that the head daughter will match the mother in its major category and that the phrase will be headed by an N. Given this background, Pullum observes that POSS-ing VGerPs can be accounted for by introducing a slightly modified version of the previous rule: (33) N[BAR:2] -> (N[BAR:2, POSS: +]), H[VFORM:prp]
This rule differs from the rule in (32) only in the feature specification on the head daughter: in (33), the head daughter is required to be [VFORM: prp]. An independently motivated Feature Co-occurrence Restriction (FCR) given in (34) requires that any phrase with a VFORM value must be verbal. (34) [VFORM] ⊃ [V: +, N: -] This constraint overrides the HFC, so the rule in (33) will only admit phrases with -ing form verb heads. However, the mother is the same as the mother in (32), so (33) will give VGers the following structure:
This reflects the traditional description of VGerPs as "verbal inside, nominal outside" quite literally by giving VGerPs a VP node dominated by an NP node. However, Pullum's (1991) analysis only applies to POSS-ing VGerPs and has nothing to say about ACC-ing VGerPs at all. He suggests that ACC-ing and POSS-ing
Robert Malouf
VGerPs "must be analyzed quite differently" (766), but by treating them as unrelated constructions, he fails to capture their similarities. This is not merely a shortcoming of the presentation. There does not seem to be any natural way to assimilate ACC-ing VGerPs to Pullum's analysis. The simplest way to extend (33) to cover ACC-ing VGerPs is to add the following rule: (36) N[BAR:2] -> (N[BAR:2]), H[VFORM:prp]
Since the default case for NPs is accusative, this rule will combine an accusative NP with an -ing form VP. This rule neatly accounts for the similarities between the two types of VGers, but not the differences. Following the direction of Hale and Platero's (1986) proposal for Navajo nominalized clauses, we might try (37) instead. (37)
N[BAR:2]->H[SUBJ:+,VFORM:prp]
The feature SUBJ indicates whether a phrase contains a subject and is used to distinguish VPs from Ss. A VP is V[BAR:2, SUBJ: -], whereas an S is V[BAR:2, SUBJ: +]. So, (37) would assign an ACC-ing VGerP the structure in (38).
It is plausible that this rule might account for some of the differences between the two types of gerund phrases. It is less clear, though, how it can account for the difference in pied piping, since nothing in the GPSG treatment of relative clauses rules out examples like (21b), repeated here (see Pollard and Sag, 1994:214ff): (39)
*The person (for) who(m) to be late every day Pat didn't like got promoted anyway.
This analysis cannot, however, properly account for PRO-ing VGerPs. Since the possessive NP in (33) is optional, it treats PRO-ing VGerPs as a subtype of POSS-ing VGerPs even though, as we have seen, PRO-ing VGerPs have more in common with ACC-ing VGerPs. Furthermore, I do not think it is possible to account for the control properties of PRO-ing VGerPs in this type of analysis. Some subjectless gerund complements, like some subjectless infinitive complements, must be interpreted as if their missing subject were coreferential with an argument of the higher verb:
Verbal Gerunds as Mixed Categories
(40) Chris tried {to find / finding} a Nautilus machine in Paris without success.
In both sentences in (40) the subject of find must be coindexed with the subject of tried, namely Chris. In GPSG control for infinitive complements is determined by the Control Agreement (AGR) Principle, which ensures that the AGR value of to in (41) is identified with the AGR value of try (Gazdar et al., 1985:121).
Other constraints identify the AGR value of try with its subject and the AGR value of to with the unexpressed subject of find. Although this works for infinitive complements, it cannot be extended to account for control in gerunds. The agreement FCR in (42) will block projection of the gerund's AGR value to the top-level NP node. (42)
[AGR] ⊃ [V:+, N:-]
Because complement control is mediated by AGR specifications, there will be no way to capture the parallel behavior of subjectless infinitives and gerunds. Finally, structures like (38) raise doubt as to whether the notion of head embodied by the HFC has any content at all. In this case, the only head specification shared by the mother and the head daughter is [BAR:2], and this match comes about not by the HFC but by the accidental cooperation of the rule in (37) with the FCR in (43). (43)
[SUBJ:+] ⊃ [V:+, N:-, BAR:2]
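The point about rule (37) can be checked mechanically. What follows is a minimal Python sketch under stated assumptions, not part of the GPSG formalism: feature bundles are modelled as plain dictionaries, N[BAR:2] is expanded to [N:+, V:-, BAR:2] on the usual assumption that N abbreviates [N:+, V:-], and the helper names `apply_fcrs` and `shared_head_features` are hypothetical.

```python
# Each FCR is a (test, consequent) pair over a feature dictionary.
FCRS = [
    # (34): any category with a VFORM value is [V:+, N:-]
    (lambda f: "VFORM" in f, {"V": "+", "N": "-"}),
    # (43): any [SUBJ:+] category is [V:+, N:-, BAR:2]
    (lambda f: f.get("SUBJ") == "+", {"V": "+", "N": "-", "BAR": 2}),
]


def apply_fcrs(features):
    """Return a copy of `features` extended by one pass over the FCRs."""
    result = dict(features)
    for applies, consequent in FCRS:
        if applies(result):
            for name, value in consequent.items():
                if result.get(name, value) != value:
                    raise ValueError(f"FCR clash on {name}")
                result[name] = value
    return result


def shared_head_features(mother, head_daughter):
    """Features on which the mother and the head daughter agree."""
    return {k: v for k, v in mother.items() if head_daughter.get(k) == v}


# Rule (37): N[BAR:2] -> H[SUBJ:+, VFORM:prp]
mother_37 = {"N": "+", "V": "-", "BAR": 2}
head_37 = apply_fcrs({"SUBJ": "+", "VFORM": "prp"})

print(head_37["BAR"])                            # 2, supplied by FCR (43)
print(shared_head_features(mother_37, head_37))  # {'BAR': 2}
```

In this sketch the head daughter's BAR value is introduced by the FCR rather than copied from the mother, which is the sense in which the match is accidental.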
I think it is fair to classify (37) as an exocentric rule. So, the only clear way to extend Pullum's analysis to account for ACC-ing VGerPs violates one of the theoretical desiderata that are the primary motivators for his analysis in the first place.1 Lapointe (1993) observes three problems with Pullum's analysis. The first problem is that, as discussed above, it vitiates the principle of phrasal endocentricity. Lapointe's second objection is that Pullum's proposal is much too general. It has no way of representing the fact that some types of mixed category constructions are much more common than others. Nothing in it prohibits outlandish and presumably nonattested rules such as:
(44) a. VP->H[NFORM:plur],PP b. N' -> (QP), H° [VFORM: psp] And, nothing in it explains why constructions parallel to the English POSS-ing VGer are found in language after language. To avoid these shortcomings of Pullum's analysis, Lapointe proposes a more conservative modification to standard notions of endocentricity. He proposes introducing dual lexical categories (DLC) like ; BAR:2] -*..., H[(X\Y); BAR:2],... b. No ID rule can have the form <X|X>->...,H[F,g],..., where F implies (X \ Y), unless g includes (X \ Y). However, Lapointe restricts himself to discussion of genitive subject VGerPs. As a consequence, his analysis suffers from the same problems as Pullum's.2 In addition, since Lapointe's necessarily brief presentation leaves some formal details unspecified, it is not at all clear that a rule like (37) would even be permissible under his system. Wescoat (1994) points out an additional problem with Pullum's analysis: in excluding articles and adjectives from gerunds, they "make no allowance for a variant grammar of English that admits archaic forms like [(47)], attested between the 15th and early 20th centuries" (588). (47) a. the untrewe forgyng and contryvyng certayne testamentys and last wyll [15th cent.] b. my wicked leaving my father's house [17th cent.] c. the being weighted down by the stale and dismal oppression of the rememberance [19th cent.]
Wescoat goes on to note that "such structures coexisted with all modern gerund forms, so it is only plausible that the current and former grammars of gerunds should be largely compatible, in a way that Pullum's approach cannot model" (588). Wescoat proposes to preserve phrasal endocentricity by modifying Kornai and Pullum's (1991) axiomatization of X' syntax to allow a single word to project two different unordered lexical categories and therefore two different maximal phrases. He proposes that VGers have a structure like (48a), parallel to the clause in (48b).
In these trees, the N and I nodes, respectively, are extrasequential. That is to say, they are unordered with respect to their sisters. This structure preserves syntactic projection, but at the cost of greatly complicating the geometry of the required phrase structure representations in ways that do not seem to be independently motivated. Even assuming Wescoat's formal mechanism can be justified, the analysis shown in (48a) runs into problems with POSS-ing VGerPs. In order to account for the nonoccurrence of adjectives and determiners with gerunds in Late Modern English, Wescoat adds a stipulation that the N node associated with a gerund must be extrasequential. Since adjectives and determiners must precede the N they attach to, this stipulation prevents them from occurring with gerunds. But, possessors also have to precede the head noun in their NP, so this stipulation should also prevent gerunds from occurring with possessors. As there is no way an ordering restriction could distinguish between adjectives and determiners on the one hand and possessors on the other, Wescoat has no choice but to treat possessors in POSS-ing VGerPs as subjects with unusual case marking, not as specifiers. In so
doing, he fails to predict that POSS-ing VGerPs, unlike ACC-ing VGerPs, share many properties of head-specifier constructions. For example, POSS-ing gerunds are subject to the same pied-piping constraints as NPs, while ACC-ing gerunds behave more like clauses. On the other hand, Wescoat's approach would extend to cover the ACC-ing VGerPs that are problematic for other analyses. A natural variant of (38) using lexical sharing would be
In this structure, both the N and the I nodes associated with painting are extrasequential. This tree seems to be fully consistent with all of Wescoat's phrase structure tree axioms. But, because it is not clear from his discussion how noncategorial features get projected, it is hard to say whether this kind of analysis could account for the differences between the two types of VGerPs. For instance, the contrast in (21), repeated in (50), is typically attributed to the fact that projection of wh-features is clause-bounded. (50) a. The person whose chronic lateness Pat didn't like got promoted anyway. b. *The person (for) who(m) to be late every day Pat didn't like got promoted anyway. This is what motivates the introduction of an S node in (38). However, it is not obvious that the introduction of an IP in (49) will prevent any features from projecting from the head painting directly to the top-most NP. If the N, I, and V nodes in (49) are really sharing the same lexical token, then the same head features should be projected to the NP, IP, and VP nodes. Otherwise, in what sense are the three leaf nodes "sharing" the same lexical token? Without further development of these issues, it is hard to evaluate Wescoat's analysis. Finally, Wescoat's approach runs into a fatal problem, pointed out by Wescoat (p.c.), when faced with coordinate gerund phrases. Take an example like (51). (51)
Pat's never watching movies or reading books
Since the adverb never is modifying the whole coordinated VP, the only plausible structure Wescoat could assign to this sentence is (52).
But this structure is clearly ruled out by Wescoat's constraints: the mapping from leaf nodes to lexical tokens need not be one-to-one, but it must still be a function. That is, while a lexical token may be linked to more than one leaf node, each leaf node must be linked to one and only one lexical token. Therefore Wescoat's approach cannot account for examples like (51), and there is no obvious way that it could be extended to handle this kind of construction. Despite their technical differences, these approaches share a common underlying motivation. Very similar proposals have been made by Hale and Platero (1986) for Navajo nominalized clauses, by Aoun (1981) for Arabic participles, by van Riemsdijk (1983) for German adjectives, and by Lefebvre and Muysken (1988) for Quechua gerunds. Although these analyses differ greatly in their technical details, they all involve a structure more or less like the tree in (53), and so require weakening the notion of head to allow a single lexical item to head both an NP and a VP simultaneously.
The assumptions underlying (53) are those mentioned above: that the basic categories are N, V, A, and P, and that the properties of a phrase are determined by the lexical category of its head. If we accept X' theory in general, then we would not expect to find an NP projected by a verb, and the "null hypothesis" should be that structures like (53) do not exist. If there were strong evidence that (53) was indeed the structure of an English VGer, then we would have no choice but to reject the hypothesis and revise the principles of X' theory. However, as shown in
section 2.2, there is no clear evidence that VGerPs include a VP. Therefore, an analysis that can account for the properties of VGers without violating the principles of X' theory is preferable a priori to one that posits a structure like (53). Borsley and Kornfilt (this volume) argue that a mixed extended projection similar to the structure in (53) provides insight into the cross-linguistic distribution of gerund-like elements, whereas the analysis presented here does not. However, it should be noted that the present analysis is compatible with Croft's (1991) functional explanation for the observed cross-linguistic patterns (see Malouf, 1998). In the next sections I will explore an analysis of VGers that takes into account the varying sources of syntactic information by exploiting HPSG's fine-grained categorial representations and thus calls into question the assumption underlying analyses involving categorial changeover.
4. THEORETICAL PRELIMINARIES
Recent work in Construction Grammar (Fillmore and Kay, in press; Goldberg, 1995) and HPSG (Pollard and Sag, 1994) provides the basis for an analysis of mixed categories that can account for their hybrid properties without the addition of otherwise unmotivated mechanisms. In this section, I will outline the theoretical devices that will play a role in the analysis. The basic unit of linguistic structure in HPSG is the sign. Signs are "structured complexes of phonological, syntactic, semantic, discourse, and phrase structural information" (Pollard and Sag, 1994:15) represented formally by typed feature structures (TFSs), as in (54):
This TFS represents part of the lexical entry for the common noun book. A sign consists of a PHON value and a SYNSEM value, a structured complex of syntactic and semantic information.
Every linguistic object is represented as a TFS of some type, so linguistic constraints can be represented as constraints on TFSs of a certain type. The grammar of a language is represented as a set of constraints on types of signs. In order to allow generalizations to be stated concisely, linguistic types are arranged into a multiple-inheritance hierarchy. Each type inherits all the constraints associated with its supertypes, with the exception that default information from higher types can be overridden by conflicting information from a more specific type.3 In addition to allowing generalizations to be expressed, the type hierarchy also provides a natural characterization of motivation, in the Saussurean sense discussed above. In Construction Grammar, default inheritance is used to give a formal characterization of such system-internal motivation: "A given construction is motivated to the degree that its structure is inherited from other constructions in the language. ... An optimal system is a system that maximizes motivation" (Goldberg, 1995:70). Thus, the type hierarchy reflects the way in which constructions are influenced by their relationships with other constructions in the language and allows what Lakoff (1987) calls the "ecological niche" of a construction within a language to be captured as part of the formal system. Considerable work in HPSG has focused on examining the hierarchical structure of the lexicon (e.g., Flickinger, 1987; Riehemann, 1993). More recent research has investigated applying the same methods of hierarchical classification to types of phrasal signs. Expanding on the traditional X' theory presented in Pollard and Sag (1994), Sag (1997) develops an analysis of English relative clauses based on a multiple-inheritance hierarchy of construction types, where a construction is some form-meaning pair whose properties are not predictable either from its component parts or from other constructions in the language.
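The idea of a multiple-inheritance hierarchy with overridable defaults can be illustrated with a small Python sketch. Everything in it is a simplifying assumption made for illustration: the type `toy-cx`, the features F, G, and H, and the decision to let non-default ("hard") constraints take precedence over all defaults are hypothetical and are not taken from any of the works cited.

```python
from typing import Any, Dict, List

# Hypothetical toy hierarchy: each type lists its immediate supertypes.
PARENTS: Dict[str, List[str]] = {
    "phrase": [],
    "headed": ["phrase"],
    "clause": ["phrase"],
    "toy-cx": ["headed", "clause"],  # inherits from two supertypes
}
# Non-default constraints contributed by each type (placeholder features).
HARD: Dict[str, Dict[str, Any]] = {
    "headed": {"G": 1},
    "clause": {"H": 2},
}
# Default constraints, which a more specific type may override.
DEFAULT: Dict[str, Dict[str, Any]] = {
    "phrase": {"F": "general"},
    "toy-cx": {"F": "specific"},
}


def linearize(type_name: str) -> List[str]:
    """List supertypes from most general to most specific, ending with the type."""
    order: List[str] = []
    for parent in PARENTS[type_name]:
        for ancestor in linearize(parent):
            if ancestor not in order:
                order.append(ancestor)
    order.append(type_name)
    return order


def resolve(type_name: str) -> Dict[str, Any]:
    """Collect inherited constraints, letting more specific defaults win."""
    order = linearize(type_name)
    features: Dict[str, Any] = {}
    for t in order:  # later (more specific) defaults overwrite earlier ones
        features.update(DEFAULT.get(t, {}))
    hard: Dict[str, Any] = {}
    for t in order:  # non-default constraints must not conflict
        for name, value in HARD.get(t, {}).items():
            if hard.get(name, value) != value:
                raise ValueError(f"conflicting constraints on {name}")
            hard[name] = value
    features.update(hard)  # assumption: hard constraints outrank defaults
    return features


print(resolve("headed"))  # {'F': 'general', 'G': 1}
print(resolve("toy-cx"))  # {'F': 'specific', 'G': 1, 'H': 2}
```

The sketch does not address what happens when two unrelated supertypes supply conflicting defaults; here the outcome would simply depend on the order in which supertypes are listed.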
A relevant part of the basic classification of constructions is given in (55).
Phrases can be divided into two types: endocentric headed phrases and exocentric nonheaded phrases. Since syntactic constraints are stated as constraints on
particular types of signs, the Head Feature Principle can be represented as (56), a constraint on all signs of the type headed. (56)
headed^
Headed phrases are also subject to the following constraint on valence features: (57)
headed
This constraint ensures that undischarged valence requirements get propagated from the head of a phrase. In the case of, say, a head/modifier phrase, the nonhead daughter [2] will not be a member of the SUBJ, SPR, or COMPS value of the head, and so the valence values will be passed up unchanged. In the case of, say, a head/complement phrase, [2] will be on the head's COMPS list B, so the mother's COMPS value is the head's COMPS value minus the discharged complement. Headed phrases are further divided into head-adjunct phrases and head-nexus phrases. Head-nexus phrases are phrases that discharge some grammatical dependency, either a subcategorization requirement (valence) or the SLASH value of an unbounded dependency construction (head-filler). Finally, valence phrases can be subtyped according to the kind of subcategorization dependency they discharge: subject, specifier, or complement. For example, head/specifier constructions obey the constraint in (58). (58)
head-spr
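The effect of the valence constraint in (57) on head/modifier and head/complement phrases can be sketched as follows. This is a minimal illustration under stated assumptions: valence lists are modelled as lists of tag strings, the function name `mother_valence` and the sample verb are hypothetical, and nothing else about the signs involved is represented.

```python
from typing import Dict, List

Valence = Dict[str, List[str]]


def mother_valence(head: Valence, nonhead_dtrs: List[str]) -> Valence:
    """Remove every non-head daughter from the head's SUBJ, SPR, and COMPS lists."""
    return {
        feature: [req for req in head[feature] if req not in nonhead_dtrs]
        for feature in ("SUBJ", "SPR", "COMPS")
    }


# Hypothetical transitive verb; "[1]" and "[2]" stand for tagged arguments.
verb = {"SUBJ": ["[1]"], "SPR": [], "COMPS": ["[2]"]}

# A modifier is not on any valence list, so nothing is discharged.
head_modifier = mother_valence(verb, ["[adv]"])
# A complement is on COMPS, so it is removed from the mother's COMPS list.
head_complement = mother_valence(verb, ["[2]"])

print(head_modifier)    # {'SUBJ': ['[1]'], 'SPR': [], 'COMPS': ['[2]']}
print(head_complement)  # {'SUBJ': ['[1]'], 'SPR': [], 'COMPS': []}
```

Because the same tag is removed from every list on which it appears, a single daughter can discharge more than one requirement at once, provided the requirements are identified with each other.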
In addition, constructions inherit constraints from the cross-cutting classification of phrases into either clauses or nonclauses. Among other things, clauses are subject to the following constraint (further constraints on clauses will be discussed in section 5.2):
(59)
clause
This constraint states that the SUBJ list of a clause must be a list of zero or more PRO objects. This ensures that either the clause contains an overt subject (and so the SUBJ list is empty) or the unexpressed subject (e.g., in control constructions) is PRO, a special type of SYNSEM object that at minimum specifies accusative case and pronominal semantics (either ppro or refl). Note that this PRO is quite unlike the homonymous empty category of Chomsky and Lasnik (1977). Its purpose is only to put constraints on the argument structure of a verb in a control structure, and it does not correspond to a phonologically unrealized position in the phrase structure. In addition, the constraint in (59) restricts the semantic type of their content: the CONT value of a clause must be a psoa object (i.e., a proposition). These two hierarchies define a set of constraints on phrasal signs. A syntactic construction is a meaningful recurrent bundle of such constraints. One way to think of constructions is as the syntactic equivalent of what in the lexical domain would be called morphemes. In terms of the theory of phrasal types presented here, a construction is a phrasal sign type that inherits from both the phrase hierarchy and the clause hierarchy. Because a construction licenses a type of complex sign, it must include information about how both the form and the meaning are assembled from the form and the meaning of its component parts. A construction may inherit some aspects of its meaning from its supertypes. In contrast to the strictly head-driven view of semantics presented by Pollard and Sag (1994), a construction may also have idiosyncratic meaning associated with it. Some of the basic constructions of English are shown in Figure 1. The fin-head-subj-cx and the nonfin-head-subj-cx constructions combine a subcategorized-for subject with a finite and nonfinite head, respectively. The finite version, for normal English sentences like They walk, requires a nominative subject.
The nonfinite version, for "minor" sentence types like absolutives or Mad magazine sentences (McCawley, 1988), requires an accusative subject. The noun-poss-cx construction combines a noun head with a determiner or possessive specifier to form a phrase with a nom-obj (i.e., an index-bearing unit) as the CONT value. To be more precise, the construction type noun-poss-cx is subject to the following constraint: (60)
noun-poss-cx —>
Figure 1. English construction types.
Here for convenience, I assume that the English genitive case marker 's is an edge inflection (see Zwicky, 1987; Miller, 1992; Halpern, 1995). The two head/complement constructions both combine a head with its selected for complements, but differ as to whether the resulting phrase can function as a clause and is subject to the constraint in (59).
5. A MIXED LEXICAL CATEGORY ANALYSIS OF VERBAL GERUNDS
Words in HPSG select for arguments of a particular category. Therefore, categorial information, projected from the lexical head following the Head Feature Principle, determines the external distribution of a phrase. Selectional information, from a lexical head's valence features, determines what kinds of other phrases can occur in construction with that head. Finally, constructional information, represented as constraints on particular constructions, controls the combination of syntactic units. Within each of these three domains, VGerPs show fairly consistent behavior. What is unusual about VGers is their combination of nounlike categorial properties with verb-like selectional properties. Given the theoretical background of the previous section, we can account for the mixed nominal and verbal properties of VGers that seem puzzling given many standard assumptions about syntactic structure. The categorial properties of VGers are determined by their lexically specified head value. Like all other linguistic objects, types of head values can be arranged into a multiple inheritance type hierarchy expressing the generalizations across categories. The distribution of VGers can be accounted for by the (partial) hierarchy of head values in (61).
Since gerund is a subtype of noun, a phrase projected by a gerund will be able to occur anywhere an NP is selected for. Thus, phrases projected by VGers will have the external distribution of NPs. Adverbs modify objects of category verbal, which include verbs, adjectives, and VGers, among other things. Since adjectives only modify c(ommon)-nouns, VGerPs will contain adverbial rather than adjectival modifiers. As a subtype of noun, gerunds will have a value for the feature CASE (although in English this is never reflected morphologically), but since gerunds are not a subtype of verb, they will not have a value for VFORM. The cross-classification in (61) directly reflects the traditional view of gerunds as intermediate between nouns and verbs. In this respect, it is nothing new: in the second century B.C. Dionysius Thrax analyzed the Greek participle as a "separate part of speech which '. . . partakes of the nature of verbs and nouns'" (Michael, 1970:75). But, by formalizing this intuitive view as a cross-classification of HEAD values, we can localize the idiosyncratic behavior of VGers to the lexicon. The position of gerund in this hierarchy of head values provides an immediate account of the facts in (17b) and (17c). The remaining two gerund properties in (17) can be accounted for most simply by the lexical rule in (62).
This rule produces a lexical entry for a VGer from the present participle form of the verb. The VGer differs syntactically from the participle in two ways: it is of category gerund and it selects for both a specifier and a subject. Because a VGer selects for the same complements as the verb it is derived from, the phrase formed by a VGer and its complements will look like a VP. And, since a gerund selects for both a subject and a specifier, it will be eligible to head either a nonfin-head-subj-cx construction, which combines a head with an accusative NP subject, or a noun-poss-cx construction, which combines a head with a genitive NP specifier. Here I assume that the gerund's external argument is lexically unmarked for case and that it is assigned either accusative or genitive case by the appropriate
construction. POSS-ing VGerPs will inherit all the constraints that apply to possessive constructions in general, for example, the restrictions on the specifier NP and on pied piping. Because the subject and specifier are identified with each other, no VGer will be able to combine with both a subject and a specifier. The combination of properties created by the lexical rule is unusual for English, but the properties themselves are all inherited from more basic types. This mixture of verbal and nominal characteristics reflects the VGer's intermediate position between nouns and verbs in the hierarchy of categories.
5.1. Some Examples
To see how these constraints interact to account for the syntax of VGers, it will be useful to consider an example of each type. First, consider the (partial) lexical entry for the present participle of the verb fold, in (63).
From this lexical entry, the Verbal Gerund Lexical Rule produces the corresponding entry in (64).
The two entries differ only in the shaded features. The output of the lexical rule is of category gerund, rather than verb, and the gerund selects for both a subject and
a specifier. All other information about the verb gets carried over from the input to the lexical rule. Now we turn to the constructions that this gerund is eligible to head. We will look at two cases: POSS-ing VGerPs and ACC-ing VGerPs. First we will look at the structure of the phrase Pat's folding the napkins, shown in Figure 2. The head of this phrase, folding, is a VGer formed by the lexical rule in (62). It combines with its complement NP (marked [3]) via the head-comp-cx construction. It then combines with a genitive specifier to form a noun-poss-cx construction. Note that the formulation of the Valence Principle in (57) allows Pat's to satisfy both the subject and the specifier requirement of the gerund simultaneously. However, since the construction this phrase is an instance of is a subtype of head-spr, Pat's will only have the properties of specifier.
Figure 2. Pat's folding the napkins.
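The interplay between the cross-classified head type, the lexical rule in (62), and the two constructions a gerund can head can be made concrete with a short Python sketch. It is an illustration under simplifying assumptions rather than an implementation of the analysis: head types are modelled as Python classes, only the subtype relations stated in the text are included, finiteness is not modelled, and the names `Word`, `verbal_gerund_rule`, `combine`, and the tag "[1]" for the external argument are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


# Partial hierarchy of head values: gerund is both a noun and a verbal.
class Head: pass
class Noun(Head): pass
class Verbal(Head): pass
class CNoun(Noun): pass
class Verb(Verbal): pass
class Adjective(Verbal): pass
class Gerund(Noun, Verbal): pass


@dataclass(frozen=True)
class Word:
    phon: str
    head: type
    vform: Optional[str]      # None for categories without VFORM
    subj: Tuple[str, ...]
    spr: Tuple[str, ...]
    comps: Tuple[str, ...]


def verbal_gerund_rule(participle: Word) -> Word:
    """Map a present participle to a gerund with identified SUBJ and SPR."""
    if participle.head is not Verb or participle.vform != "prp":
        raise ValueError("input must be a present participle")
    external = participle.subj
    return Word(participle.phon, Gerund, None, external, external, participle.comps)


# Assumed construction table: which valence list is discharged, which case
# the external argument receives, and whether the result is a clause.
CONSTRUCTIONS = {
    "noun-poss-cx": {"head": Noun, "valence": "spr", "case": "gen", "clause": False},
    "nonfin-head-subj-cx": {"head": Head, "valence": "subj", "case": "acc", "clause": True},
}


def combine(word: Word, cx_name: str) -> dict:
    """Combine a head with its external argument via the named construction."""
    cx = CONSTRUCTIONS[cx_name]
    selected = getattr(word, cx["valence"])
    if not selected or not issubclass(word.head, cx["head"]):
        raise ValueError(f"{word.phon} cannot head {cx_name}")

    def remaining(args):
        return tuple(a for a in args if a not in selected)

    return {"case": cx["case"], "clause": cx["clause"],
            "subj": remaining(word.subj), "spr": remaining(word.spr)}


participle = Word("folding", Verb, "prp", ("[1]",), (), ("[3]",))
gerund = verbal_gerund_rule(participle)

print(issubclass(gerund.head, Noun), issubclass(gerund.head, Verb))  # True False
print(combine(gerund, "noun-poss-cx")["case"])         # gen
print(combine(gerund, "nonfin-head-subj-cx")["case"])  # acc
```

In either case both the SUBJ and SPR lists come out empty, since the two requirements are one and the same argument; what differs is the case assigned and whether the resulting phrase counts as a clause.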
Figure 3. Pat folding the napkins.
An equivalent example with an accusative subject is given in Figure 3, for the phrase Pat folding the napkins. This example differs from the previous example only in the way the subject combines with the head. The nonfin-head-subj-cx combines a nonfinite head with an accusative subject. As before, Pat cancels both the subject and the specifier requirement of the head, but in this case it will have only subject properties.
5.2. Pied Piping
The pied-piping contrast between ACC-ing and POSS-ing VGerPs follows from the fact that the former are clauses while the latter are not. To show how this
result is achieved, I will first sketch the HPSG treatment of pied piping developed by Pollard and Sag (1994), Sag (1997), and Ginzburg and Sag (1998). The basic fact that needs to be accounted for is shown in (65). (65) a. Who failed the exam? b. Whose roommate's brother's neighbor failed the exam? In a wh-question, the leftmost constituent must contain a wh-word, but that wh-word can be embedded arbitrarily deeply. This dependency is encoded by the nonlocal feature WH. Question words are marked with a nonempty value for the feature WH, whose value is a set of interrogative parameters. Wh-words also introduce an interrogative parameter in the STORE of the verb that selects them. All parameters and quantifiers introduced into the STORE then must be retrieved somewhere in the sentence and assigned a scope by a constraint-based version of Cooper storage (Cooper, 1983; Pollard and Sag, 1994). Take a sentence like (65a). This is an instance of the construction wh-subj-inter-cl, which combines a subject and a head to form an interrogative clause. This construction is subject to the constraint in (66).4 (66)
wh-inter-cl
This constraint requires that the subject have somewhere inside it a wh-word that contributes an interrogative parameter. The presence of a wh-word is indicated by the phrase's nonempty WH-value. The position of the wh-word is not unconstrained, but it can be embedded arbitrarily deeply within the subject so long as its WH-value is passed up to the top of the phrase. In addition, this interrogative parameter must be a member of the PARAMS value of the interrogative clause, and all of the members of the clause's PARAMS value are removed from the store and given scope over the clause. As first proposed by Ginzburg (1992), the interrogative word who is optionally specified for a nonempty WH value, as in (67).
(67)
who
In addition, the lexical entries for all lexical heads obey Sag's (1997) WH Amalgamation Principle in (68).
(68)
word
This constraint ensures that the WH-value of a head is the union of the WH values of its arguments. These lexical constraints force the head of any phrase that contains a governed wh-word to have a nonempty WH-value reflecting that fact. Next, the WH Inheritance Constraint, in (69), ensures that the value of WH gets passed from the head daughter to the mother. (69)
head-nexus
Similar constraints amalgamate the STORE value of a word's arguments and pass up the STORE value of a phrase from its head daughter. Finally, to guarantee that only questions contain interrogative words, clauses are subject to the constraint in (70). (70)
clause [NONLOCAL | WH { }]
This requires all clauses to have an empty WH-value. This means that any WH-value introduced by the lexical entry of an interrogative word must be bound off by an appropriate interrogative construction, ruling out declarative sentences like Chris flunked which student. These constraints provide a completely general, head-driven account of pied piping in both relative clauses and questions. Consider first the nongerund examples in (71). (71) a. Whose failure was expected? b. *For whom to fail was expected? In (71a), failure will take on the nonempty WH-value of its specifier whose. The constraint in (69) passes the WH-value of failure (that is, an interrogative parameter whose) up to the entire phrase whose failure. The wh-subject interrogative clause construction forms the interrogative clause whose failure was expected? A similar chain of identities passes up the WH-value of whom in (71b) to the clause for whom to fail. But, this violates (70), and the example is ruled out. Now it should be clear how this theory of pied piping carries over to the VGer examples in (72).
(72) a. I wonder whose failing the exam surprised the instructor. b. *I wonder who(m) failing the exam surprised the instructor. The structure of these examples is given in Figures 4 and 5. In (72a), failing picks up the WH-value of whose and passes it up to the phrase whose failing the exam. Since this is an example of a POSS-ing VGerP, a type of noun-poss-cx construction, it is not subject to (70). In (72b), though, the subject of the question is a nonfinite head-subject clause, which by (70) must have an empty WH-value. This conflicts both with the constraints on WH-percolation and with (66), and the sentence is ungrammatical. The difference between POSS-ing and ACC-ing VGerPs with respect to pied piping follows directly from independently motivated constraints on construction types. Any analysis that treats the subject case alternation
Figure 4. Whose failing the exam surprised the instructor?
Figure 5. *whom failing the exam surprised the instructor?
as essentially free variation would be hard pressed to account for this difference without further stipulations. By adapting Ginzburg's (1992) theory of interrogatives to Sag's (1997) analysis of pied piping, we can also account for the behavior of VGerPs in multiple wh-questions. For multiple wh-questions, Ginzburg (1992:331) suggests "the need for syntactic distinctions between forms that are, intuitively, interrogative syntactically and semantically and forms that are declarative syntactically, but have interrogative contents." In a multiple wh-question, wh-words that are leftmost in their clause have both interrogative syntax and interrogative semantics. They pass up a nonempty WH-value in exactly the same way as in ordinary wh-questions. Wh-words that are not clause initial, on the other hand, have only interrogative semantics. While they introduce an interrogative parameter into the store, they have an empty WH-value. This is what accounts for the noncontrast in (73). (73) a. I wonder who was surprised by whose failing the exam. b. I wonder who was surprised by who failing the exam. The structure of (73) is given in Figure 6. Note that I have assumed Pollard and Yoo's (1998) head-driven STORE collection here, but I have crucially not adopted their analysis of multiple wh-questions. Unlike (72b), (73b) does not run afoul of (70), the constraint requiring clauses to have empty WH-values. Since the ACC-ing VGerP who failing the exam appears in situ, it is only interrogative semantically and its WH value is empty. No constraints prohibit an interrogative parameter from being passed up via the storage mechanism, and so both (73a) and (73b) are grammatical. The constraints outlined in this section also apply to restrictive relative clauses, and so predict the contrasts in (20) and (21). Similarly, under this analysis, PRO-ing VGerPs are instances of clauses. As Wasow and Roeper (1972) observe, PRO-ing VGerPs are parallel to subjectless infinitives in Equi constructions: (74) a. Lee hates loud singing. b. Lee hates singing loudly. c. Lee hates to sing loudly. In both (74b) and (74c), the understood subject of the embedded verb must be Lee. In (74a), though, the understood subject of the nominal gerund singing can be anyone. Since, by the constraint on clauses in (59), the unexpressed subject of a PRO-ing gerund phrase is a PRO, it will be governed by Pollard and Sag's (1994) semantic theory of complement control just like the unexpressed subjects of infinitive complements (see Malouf, 1998). Furthermore, since subjectless gerunds are clausal, pied piping out of a PRO-ing VGerP is also predicted to be ungrammatical: (75) a. Pat invited no one who(m) Chris hates talking to. b. *Pat invited no one talking to who(m) Chris hates.
One final point is that the constraints discussed here apply only to questions and restrictive relative clauses. As Levine (p.c.) observes, these predictions do not hold for nonrestrictive relatives or "pseudorelatives" (see McCawley, 1988):

(76) a. Sandy, even talking about whom Chris hates, won't be invited.
     b. Robin is one person even talking about whom gets my blood boiling.

These constructions place weaker constraints on pied piping than restrictive relatives do, and they generally allow pied piping of clauses:
Figure 6. Who was surprised by whose/whom failing the exam?
(77) a. Sandy, for someone to even talk about whom Chris hates, won't be invited.
     b. Robin is one person for someone to even talk about whom gets my blood boiling.

So, what these examples show is that pied piping in nonrestrictive relatives is not mediated by the feature WH and so is not subject to the constraint in (70).
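For the questions and restrictive relatives discussed in this section, the way constraint (70) separates the examples can be summarized in a small sketch. It is only an illustration under simplifying assumptions, not part of the analysis itself: the names `wh_value`, `satisfies_70`, and `judgments` and the Boolean flags are hypothetical, a WH-value is modeled as a plain set, and only the pied-piped phrase is checked (the binding off of WH by the interrogative construction is not modeled).

```python
# Toy model of WH percolation and constraint (70); all names are illustrative.

def wh_value(wh_word, in_situ):
    """WH-value a phrase inherits from the wh-word it contains.

    Following the text, an in situ wh-word is interrogative only
    semantically and contributes an empty WH-value; otherwise the
    wh-word's parameter is passed up to the phrase.
    """
    return frozenset() if in_situ else frozenset({wh_word})


def satisfies_70(is_clause, wh):
    """Constraint (70): a clause must have an empty WH-value."""
    return not (is_clause and wh)


# number: (phrase, is it a clause?, wh-word, is the wh-word in situ?)
EXAMPLES = {
    "71a": ("whose failure", False, "whose", False),
    "71b": ("for whom to fail", True, "whom", False),
    "72a": ("whose failing the exam", False, "whose", False),  # POSS-ing: noun-poss-cx
    "72b": ("whom failing the exam", True, "whom", False),     # ACC-ing: clause
    "73b": ("who failing the exam", True, "who", True),        # ACC-ing, in situ
}


def judgments():
    """Map each example number to True (allowed) or False (ruled out)."""
    return {
        number: satisfies_70(is_clause, wh_value(word, in_situ))
        for number, (_, is_clause, word, in_situ) in EXAMPLES.items()
    }
```

Under these assumptions `judgments()` rules out only (71b) and (72b), matching the starred examples; the nonrestrictive cases in (76) and (77) fall outside the sketch because their pied piping is not mediated by WH.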
6. CONCLUSION

The constructions that combine a VGer with its complements and its subject or specifier are the same constructions used for building NPs, VPs, and clauses. This reflects the traditional view that VGerPs are built out of pieces of syntax "reused" from other parts of the grammar. In one sense, under this analysis a VGer together with its complements really is like V. Both are instances of the same construction type and both are subject to any constraints associated with that construction. In the same way, a VGer plus an accusative subject really do form a clause, while a VGer plus a genitive subject really do form an NP. So, these two types of VGerPs inherit the constraints on semantic type and pied piping associated with the construction type of which they are an instance. However, in a more important sense, a VGer plus its complements forms a VGer', which combines with an accusative or genitive subject to form a VGerP. The analysis presented here allows this similarity to be captured without weakening HPSG's strong notion of endocentricity. By exploiting HPSG's hierarchical classification of category types and its inventory of elaborated phrase structure rules, we are able to account for the mixed behavior of English VGers without adding any additional theoretical mechanisms or weakening any basic assumptions. The analysis presented here does not require syntactic word formation and thus preserves lexical integrity. It also does not require any phonologically null elements of abstract structure, and it allows us to maintain the strong notion of endocentricity embodied by the HPSG Head Feature Principle. Finally, by making crucial reference to syntactic constructions, this analysis allows us to capture on the one hand the similarities among the subtypes of VGerPs and on the other their similarities to other English phrase types.
ACKNOWLEDGMENTS

I would like to thank Farrell Ackerman, Bob Borsley, Bob Levine, Carl Pollard, Ivan Sag, Gert Webelhuth, Michael Wescoat, and an anonymous reviewer for their helpful comments. This research was conducted in part in connection with the Linguistic Grammars Online (LinGO) project at the Center for the Study of Language and Information, Stanford University.
NOTES

1 In addition, there are quite general formal problems with the default nature of the GPSG Head Feature Convention (McCawley, 1988; Shieber, 1986).
2 Similarly, under the approach taken by Borsley and Kornfilt (this volume) it is difficult to account for the similarities between ACC-ing and POSS-ing VGerPs.
3 The details of default inheritance are not relevant to this chapter, but Lascarides and Copestake (1999) suggest how such a system might be formalized.
4 The contained set difference of two sets (X − Y) is the ordinary set difference as long as Y ⊆ X. Otherwise it is undefined. Likewise, the disjoint set union of two sets (X ⊎ Y) is the ordinary set union as long as the two sets are disjoint.
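The two partial operations defined in note 4 can be sketched as follows. This is a minimal illustration, not part of the chapter: the function names are hypothetical, and returning `None` for the undefined case is an assumption made here for convenience.

```python
def contained_difference(x, y):
    """X - Y, defined only when Y is a subset of X; None when undefined."""
    x, y = set(x), set(y)
    return x - y if y <= x else None


def disjoint_union(x, y):
    """Union of X and Y, defined only when they share no element; None otherwise."""
    x, y = set(x), set(y)
    return x | y if x.isdisjoint(y) else None
```

For instance, `contained_difference({"a", "b"}, {"a"})` gives `{"b"}`, while `contained_difference({"a"}, {"b"})` and `disjoint_union({"a"}, {"a", "b"})` are both undefined.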
REFERENCES

Abney, S. P. (1987). The English noun phrase in its sentential aspect. Ph.D. thesis, MIT. URL http://www.sfs.nphil.uni-tuebingen.de/~abney/Abney_87a.ps.gz.
Aoun, Y. (1981). Parts of speech: A case of redistribution. In A. Belletti, L. Brandi, and L. Rizzi (Eds.), Theory of markedness in generative grammar (pp. 3-24). Pisa: Scuola Normale Superiore di Pisa.
Aronoff, M. (1976). Word formation in generative grammar. Cambridge, MA: MIT Press.
Bouma, G. (1993). Nonmonotonicity and Categorial Unification Grammar. Ph.D. thesis, Rijksuniversiteit Groningen.
Briscoe, T., Copestake, A., and Lascarides, A. (1995). Blocking. In P. St. Dizier and E. Viegas (Eds.), Computational lexical semantics. Cambridge: Cambridge University Press. URL http://www.cl.cam.ac.Uk/ftp/papers/acquilex/acq2wp2.ps.Z.
Chomsky, N. (1970). Remarks on nominalizations. In R. Jacobs and P. Rosenbaum (Eds.), Readings in English transformational grammar (pp. 184-221). Waltham, MA: Ginn.
Chomsky, N., and Lasnik, H. (1977). Filters and control. Linguistic Inquiry, 8, 425-504.
Cooper, R. (1983). Quantification and syntactic theory. Dordrecht: Reidel.
Croft, W. (1991). Syntactic categories and grammatical relations. Chicago: University of Chicago Press.
Fillmore, C. J., and Kay, P. (in press). Construction grammar. Stanford: CSLI Publications. URL http://www.icsi.berkeley.edu/~kay/bcg/ConGram.html.
Flickinger, D. (1987). Lexical rules in the hierarchical lexicon. Ph.D. thesis, Stanford University.
Gazdar, G., Klein, E., Pullum, G., and Sag, I. (1985). Generalized Phrase Structure grammar. Cambridge: Harvard University Press.
Ginzburg, J. (1992). Questions, Queries, and Facts: A Semantics and Pragmatics for Interrogatives. Ph.D. thesis, Stanford University.
Ginzburg, J., and Sag, I. A. (1998). English interrogative constructions. Unpublished manuscript, Hebrew University and Stanford University.
Goldberg, A. (1995). Constructions: A construction grammar approach to argument structure.
Chicago: University of Chicago Press.
Hale, K., and Platero, P. (1986). Parts of speech. In P. Muysken and H. van Riemsdijk (Eds.), Features and projections (pp. 31-40). Dordrecht: Foris.
Halpern, A. (1995). On the placement and morphology of clitics. Stanford: CSLI Publications.
Jørgensen, E. (1981). Gerund and to-infinitives after 'it is (of) no use', 'it is no good', and 'it is useless'. English Studies, 62, 156-163.
Kornai, A., and Pullum, G. K. (1990). The X-bar theory of phrase structure. Language, 66, 24-50.
Lakoff, G. (1987). Women, fire, and dangerous things. Chicago: University of Chicago Press.
Lambrecht, K. (1990). 'What me worry?' Mad magazine sentences revisited. In Proceedings of the Berkeley Linguistics Society (vol. 16, pp. 215-228).
Lapointe, S. G. (1993). Dual lexical categories and the syntax of mixed category phrases. In A. Kathol and M. Bernstein (Eds.), Proceedings of the Eastern States Conference of Linguistics (pp. 199-210).
Lascarides, A., and Copestake, A. (1999). Default representation in constraint-based frameworks. Computational Linguistics, 25, 55-105.
Lefebvre, C., and Muysken, P. (1988). Mixed categories. Dordrecht: Kluwer.
Malouf, R. (1998). Mixed categories in the hierarchical lexicon. Ph.D. thesis, Stanford University. URL http://hpsg.stanford.edu/rob/papers/diss.ps.gz.
McCawley, J. D. (1988). The syntactic phenomena of English. Chicago: University of Chicago Press.
McCawley, J. D. (1982). The nonexistence of syntactic categories. In Thirty million theories of grammar. Chicago: University of Chicago Press.
Michael, I. (1970). English grammatical categories and the tradition to 1800. Cambridge: Cambridge University Press.
Miller, P. (1992). Clitics and constituents in phrase structure grammar. New York: Garland.
Pollard, C., and Sag, I. A. (1994). Head-driven phrase structure grammar. Chicago: University of Chicago Press, and Stanford: CSLI Publications.
Pollard, C., and Sag, I. A. (1987). Information-based syntax and semantics. Stanford: CSLI Publications.
Pollard, C., and Yoo, E. J. (1998). Quantifiers, wh-phrases, and a theory of argument selection. Journal of Linguistics, 34, 415-446.
Portner, P. H. (1992). Situation theory and the semantics of propositional expressions. Ph.D. thesis, University of Massachusetts, Amherst. Distributed by the University of Massachusetts Graduate Linguistic Student Association.
Pullum, G. K. (1991).
English nominal gerund phrases as noun phrases with verb-phrase heads. Linguistics, 29, 763-799.
Quirk, R., Greenbaum, S., Leech, G., and Svartvik, J. (1985). A comprehensive grammar of the English language. London: Longman.
Riehemann, S. (1993). Word formation in lexical type hierarchies. Master's thesis, Universität Tübingen. URL ftp://ftp-csli.stanford.edu/linguistics/sfsreport.ps.gz.
Ross, J. R. (1967). Constraints on variables in syntax. Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, MA.
Sag, I. A. (1997). English relative clause constructions. Journal of Linguistics, 33, 431-484.
Shieber, S. M. (1986). A simple reconstruction of GPSG. In Proceedings of the eleventh International Conference on Computational Linguistics (COLING-86) (pp. 211-215). Bonn, Germany.
Taylor, J. R. (1995). Linguistic categorization (2nd ed.). Oxford: Oxford University Press.
van Riemsdijk, H. (1983). A note on German adjectives. In F. Heny and B. Richards (Eds.),
Linguistic categories: Auxiliaries and related puzzles (pp. 223-252). Dordrecht: Reidel.
Wasow, T., and Roeper, T. (1972). On the subject of gerunds. Foundations of Language, 8, 44-61.
Webelhuth, G. (1992). Principles and parameters of syntactic saturation. New York: Oxford University Press.
Wescoat, M. T. (1994). Phrase structure, lexical sharing, partial ordering, and the English gerund. In S. Gahl, A. Dolbey, and C. Johnson (Eds.), Proceedings of the Berkeley Linguistics Society (vol. 20, pp. 587-598).
Williams, E. (1975). Small clauses in English. In J. Kimball (Ed.), Syntax and semantics (vol. 4, pp. 249-273). New York: Academic Press.
Zwicky, A. M. (1987). Suppressing the Z's. Journal of Linguistics, 23, 133-148.
Zwicky, A. M., and Pullum, G. K. (1996). Functional restriction: English possessives. Paper presented at 1996 Linguistics Society of America meeting.
ENGLISH AUXILIARIES WITHOUT LEXICAL RULES

ANTHONY WARNER
Department of Language and Linguistic Science
University of York
Heslington, York
United Kingdom
Syntax and Semantics, Volume 32: The Nature and Function of Syntactic Categories. Copyright © 2000 by Academic Press. All rights of reproduction in any form reserved. 0092-4563/99 $30

1. INTRODUCTION

English auxiliaries show a complex but systematic set of interrelationships between their characteristic construction types. It has often been assumed that, for its proper description, this requires the resources of movement defined over structures (as in the tradition stretching from Chomsky, 1957, to Pollock, 1989, and onwards). An alternative, within Phrase Structure Grammar, has been to appeal to the resources of lexical rules (Flickinger, 1987; Pollard and Sag, 1987) or their antecedent metarules. In this chapter I will give an account of the grammar of these characteristic constructions in Head-driven Phrase Structure Grammar (HPSG), without using lexical rules or movements interrelating structures, but relying solely on the organization of information within an inheritance hierarchy to make relevant generalizations. The demonstration that lexical rules are not required in this area substantially enhances the possibility that lexical rules could be banished from the armory of HPSG in favor of mechanisms of lexical inheritance (extending the kind of approach to valence alternations developed in Kathol, 1994, Bouma, 1997, and to inflectional and derivational relationships in, for example, Krieger and Nerbonne, 1993; Riehemann, 1994). The demonstration is the
more convincing because of the complexity of the interrelationships between auxiliary constructions, and of their interface with negation, which at first sight seems to require a more powerful device than simple inheritance. This chapter is a development of the lexicalist analysis of auxiliaries given in Warner (1993a). The structures posited and much of the argumentation for them are essentially carried over. But the desire to avoid the device of lexical rules, which played a major role in Warner (1993a), has led to an entirely new analysis. The interrelationships proposed between structures are radically different since they are constrained by the need to state them within a hierarchy of unifiable information, whereas lexical rules permit what looks to the practicing grammarian like a more potent ability to manipulate relationships between feature structures.1
2. AUXILIARY CONSTRUCTIONS IN HEAD-DRIVEN PHRASE STRUCTURE GRAMMAR

Head-driven Phrase Structure Grammar characterizes linguistic information (lexical or phrasal signs and their components) in terms of feature structures and constraints on those feature structures, where a constraint is, in effect, a partial description.2 Feature structures (or attribute value matrices) are themselves defined within a hierarchy of types. Appropriate features are defined for each type, and appropriate values for each feature. Thus the type category will be defined as having values for attributes HEAD and VALENCE. The values of HEAD correspond broadly to part of speech, and one of the subtypes involved here is verb. This is defined as having attributes AUX (with Boolean values {+, −}) and VFORM, with values corresponding to the major morphosyntactic subcategories of verbs: {fin, bse, etc.} (finite, base infinitive, etc.). In parallel fashion, VALENCE will be defined as having attributes SUBJ (subject), SPR (specifier), and COMPS (complements), which have as their values lists of synsem objects (that is, of feature structures which characterize syntactic and semantic information) corresponding to the subject, specifier, and complements of the category in question. Within this framework, I assume that auxiliaries (modals, BE and appropriate instances of DO and HAVE) share a type verb with nonauxiliary verbs, being distinguished from nonauxiliary verbs as [+AUX] versus [−AUX], that they occur in structures which are like (1) for the reasons argued in Gazdar, Pullum, and Sag (1982) and Warner (1993a), and that they head their phrase.
Then modals are subcategorized for a plain infinitive phrase; BE (when used as a copular verb) is subcategorized for a "predicative" phrase; DO is subcategorized for a plain infinitive phrase which cannot be headed by an auxiliary, and so on. Most English auxiliaries are "raising" verbs, requiring identity between their subject and the subject of their complement. This holds not only for verbal complements, but for predicative complements after BE. It is dealt with as token identity (that is, structure sharing) between two feature structures within the higher verb's VALENCE: the value of its attribute SUBJ and the value of the attribute SUBJ within the sole member of its COMPS list. This token identity is shown by tagging the feature structure's occurrence with a boxed numeral. The lexicon will then include such basic valence information as that given in (2), where PRD is a Boolean-valued feature which is positive in predicative phrases, and list values appear in angle brackets. (2)
Auxiliary category and subcategorization information in the lexicon

Category                     Selecting a phrase headed by      Corresponding value of VALENCE3
can, could, etc. (finite)    Plain infinitive                  SUBJ ⟨[1]⟩, COMPS ⟨[PRD −, SUBJ ⟨[1]⟩, VFORM bse]⟩
is (finite)                  Noninfinitive predicative4        SUBJ ⟨[1]⟩, COMPS ⟨[PRD +, SUBJ ⟨[1]⟩]⟩
do (finite)                  Nonauxiliary plain infinitive     SUBJ ⟨[1]⟩, COMPS ⟨[PRD −, SUBJ ⟨[1]⟩, AUX −, VFORM bse]⟩
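The token identity that characterizes raising verbs can be illustrated with a short sketch in which structure sharing is modeled as object identity. This is an informal illustration under stated assumptions, not an implementation of HPSG: the class `Synsem`, the function `raising_valence`, and the use of Python dictionaries for feature structures are all hypothetical choices made here.

```python
class Synsem:
    """Stand-in for a synsem object; only its identity matters here."""

    def __init__(self, label):
        self.label = label


def raising_valence(**complement_features):
    """VALENCE of a raising auxiliary.

    The single object `shared` appears both on the auxiliary's SUBJ list
    and on the SUBJ list of its sole complement, playing the role of the
    boxed tag [1].
    """
    shared = Synsem("NP")
    complement = dict(complement_features, SUBJ=[shared])
    return {"SUBJ": [shared], "COMPS": [complement]}


# The three rows of the table, with Boolean features as True/False.
modal = raising_valence(PRD=False, VFORM="bse")            # can, could, etc.
copula = raising_valence(PRD=True)                         # is
do = raising_valence(PRD=False, AUX=False, VFORM="bse")    # do
```

The point of the design is that `modal["SUBJ"][0] is modal["COMPS"][0]["SUBJ"][0]` holds: the two positions contain one and the same token, which is stronger than containing two separate objects that merely look alike.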
Together with some further distinctions and conventions which will be discussed immediately below, the feature structure for modal should in affirmative declaratives will include the information in (3). I omit the specification of a value for PHON, which will characterize its phonology, and for NONLOCAL, which will state the properties needed to deal with unbounded dependencies, such as the relationship between a fronted wh-word and a corresponding gap within the constituent headed by should. The empty list is designated "elist," and the complement's CONTENT value is abbreviated as a boxed integer after a colon (in accordance with the normal convention: Pollard and Sag, 1994:28).
The combination of this lexical information into phrases depends on schemata of immediate dominance which define local structures (themselves specified in terms of attribute value matrices in recent work), and principles of linear order. Beyond this we need to appeal to three particular principles. One, the Head Feature Principle of (4a), requires the value of a mother's HEAD feature to be identical to that of its head daughter, thus ensuring, for example, that a finite verb occurs within a finite verb phrase (VP), and that a finite VP occurs within a further finite verbal projection. The second, the Valence Principle of (4b), ensures a mismatch between mother and head daughter within the list-valued features in VALENCE, provided that the relevant synsem objects occur as sisters to the daughter: the effect is one of cancellation by combination, as illustrated in (5). Note that a hierarchy of levels is partly established by the occurrence of different values of SUBJ. At the highest level its list is empty; at lower levels it has a nonempty list. The third principle, the Semantics Principle (Pollard and Sag, 1994: 48, 323), has the effect that in headed structures the mother and head daughter have the same value for CONTENT in cases where neither adjuncts nor quantification are involved. So in (5) the CONTENT of the whole clause is token identical with that of its VP, and this is token identical with that of its head. This token is tagged . Should is treated here as a raising auxiliary, whose content is a semantic
relation which takes as its argument the content of its complement, here tagged [3], as also in (3).5 The principles in (4) are cited from Miller and Sag (1997:583).

(4) a. Head Feature Principle: A head-daughter's HEAD value is identical to that of its mother.
    b. Valence Principle: If a phrase consists of a head daughter and one or more arguments (complement(s), subject, or specifier), then its value for the relevant VALENCE feature F (COMPS, SUBJ, or SPR) is the head daughter's F value minus the elements corresponding to the synsem(s) of the nonhead daughter(s). Otherwise, a phrase's F value is identical to that of its head daughter.
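The combined effect of the two principles in (4) can be sketched procedurally. The following is a simplified illustration only: signs are modeled as Python dictionaries, synsem objects as plain strings, and the function name `build_phrase` is hypothetical. The Semantics Principle and linear order are not modeled.

```python
def build_phrase(head_daughter, feature, nonhead_daughters):
    """Return the mother of a head-argument phrase.

    (4b) Valence Principle: the mother's value for `feature` is the head
    daughter's value minus the realized arguments; other valence lists
    are carried over unchanged.
    (4a) Head Feature Principle: the mother's HEAD is the very same
    object as the head daughter's HEAD.
    """
    remaining = list(head_daughter[feature])
    for synsem in nonhead_daughters:
        remaining.remove(synsem)      # ValueError if the head does not select it
    mother = dict(head_daughter)      # shallow copy keeps HEAD token identical
    mother[feature] = remaining
    return mother


should = {
    "HEAD": {"POS": "verb", "AUX": True, "VFORM": "fin"},
    "SUBJ": ["NP"],
    "COMPS": ["VP[bse]"],
    "SPR": [],
}

vp = build_phrase(should, "COMPS", ["VP[bse]"])   # should go there
clause = build_phrase(vp, "SUBJ", ["NP"])         # John should go there
```

Here `vp` has an empty COMPS list but still requires a subject, while `clause` has an empty SUBJ list, which reflects the hierarchy of levels described in the text; `clause["HEAD"]` is the same object as `should["HEAD"]`, so the finite verb projects a finite VP and a finite clause.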
English auxiliaries differ from nonauxiliary verbs in that they occur not only in structures like (1) but also in the distinctive negated, inverted, and elliptical structures associated with (6b, c, d).
(6) a. John should go there.                     He loves opera.
    b. John should not go there.                 *He loves not opera.
    c. Should John go there?                     *Loves John opera?
    d. If John wants to go, he should. [sc. go]  *If he can go, he intends. [sc. to go]
       (OK with intends to, but to is an auxiliary.)
I will interpret these constructions in terms of modifications of the basic VALENCE of the auxiliaries involved, as in (7b, c, d).

(7) a. John should go there.
       should = [SUBJ ⟨[1]NP⟩, COMPS ⟨[VFORM bse, SUBJ ⟨[1]⟩]⟩]
    b. John should not go there.
       should = [SUBJ ⟨[1]NP⟩, COMPS ⟨not, [VFORM bse, SUBJ ⟨[1]⟩]⟩]
    c. Should John go there?
       should = [SUBJ elist, COMPS ⟨[1]NP, [VFORM bse, SUBJ ⟨[1]⟩]⟩]
    d. If John wants to go there, he should. [sc. go there]
       should = [SUBJ ⟨NP⟩, COMPS elist]

The interrelationships between these auxiliary constructions can be dealt with by lexical rules, as in Warner (1993a), where one rule maps the lexical information on should in (7a) into that on should in (7b), another into that on should in (7c), a third into that on should in (7d). But some papers have suggested that particular lexical rules may be avoided within HPSG, and that the interrelationships they encode should be reinterpreted within an inheritance hierarchy. Thus Riehemann (1994) shows how morphologically complex words which contain different layers of structure can be appropriately related to other lexical items within an inheritance framework which represents this structuring. This approach is not suitable for the present case, where there is no internal structuring to support the valence alternations. But Kathol (1994) pointed out that lexical rule interrelationships can be integrated into an inheritance hierarchy by the use of ad hoc features (his "proto features") which encode the shared information. A minor extension of this might be to propose that in each of (7a-d) the attribute value matrix for should contained a feature PROTOCOMPSh4}.
(23) Not
Now consider adding not as modifier of the auxiliary head to the ARG-ST list of that head, as in (24), which modifies the feature structure for could in (22) in just this respect. Here, not[MOD |CONTENT |KEY |HANDEL h2] (abbreviated not [MOD | KEY h2]) identifies the HANDEL value within KEY in the modified
category, could. This not is simply the normal modifier in an abnormal position, being distinct in that it does not make a constituent with the phrase it modifies. When not is added to the ARG-ST list, it will also appear on the COMPS list: this follows from the Argument Realization constraint (discussed below) which states that ARG-ST is the append of the valence lists. At the phrasal level, the Semantics Principle of Copestake, Flickinger, and Sag (1997) will specify values on the mother as follows: the value of LISZT will be the append of the LISZT values for could and its sisters, the value of CONDS will be the union of the daughters' values, and the value of KEY will be identical with that of the head daughter, as in (25). For convenience the indices of h correspond in these and subsequent examples, except that the h4 of not's feature structure in (23) is replaced by the relevant token identical handle (h2 in (25)). (24) Wide scope of negation: could in John could not leave
(25) Wide scope at phrase level: could not leave
The order of scopes imposed at phrase level by the unified set of conditions in CONDS is not > could > leave. So there is a straightforward integration of the
semantics of not, with appropriate results, where not is specified as the modifier of the auxiliary head. Now let us turn to narrow scope. Here we add not as modifier of the nonsaturated complement to the ARG-ST list of the auxiliary, as in (26), hence also to the COMPS list. At the phrasal level the Semantics Principle will (as before) integrate the values of LISZT and CONDS appropriately, as in (27). Indices again correspond for convenience, except that the MOD | CONTENT | KEY | HANDEL of not is h1, and the condition it supplies therefore h6 > h1. (26) Narrow scope of negation: should in John should not leave
(27) Narrow scope at phrase level: should not leave
Here, though, the feature structure for should parallel to (22) has been modified not just by the addition of not to its ARG-ST list, but by a further condition. The order of scopes imposed by the set of conditions derived from should in positive sentences and from not as modifier of the nonsaturated complement would be should > leave (h3>h1) and not > leave (h6>h1). In order to impose the required should > not the further constraint CONDS {h3>h5} has been added, stated on
should. So, here, too, there is a straightforward integration of the semantics of not, with appropriate results, provided that not is specified as the modifier of the auxiliary's nonsaturated complement, and a further scope condition is added. Note that the scope conditions here permit a quantifier to intervene between the modal and negation, as in the second of the three readings of You must not eat one cake: (must (not (one))) 'You must eat none'; (must (one (not))) 'There must be one that you don't eat'; (one (must (not))) 'There is (a particular) one that you must not eat.' See Copestake, Flickinger, and Sag (1997) for details of the assignment of quantifier scope within the HPSG implementation of Minimal Recursion Semantics.

3.3. Integration into the Lexical Hierarchy

How will the information in these lexical entries be organized in the lexical hierarchy? Auxiliaries are for the most part raising predicates, and they will inherit information from this general constraint. This includes predicative BE, which has a nonsaturated complement whose SUBJ value is token identical to that of BE. Davis (1996) discusses semantic regularities across the valences of different lexical items, seen as constraints on the relationship between values of CONTENT and those of the ARG-ST list (see also Wechsler, 1995). In the light of Davis's discussion it seems reasonable to suggest that a "linking type" which encodes the relevant constraints on this relationship for raising to subject categories will be roughly as follows: (28)
Linking type raising to subject
where the whole synsem is + AUX if the second member of ARG-ST is word. Here the relational predicate rel has an attribute ARG whose propositional argument's handle is identical to or outscopes that of the nonsaturated complement's KEY. The possibility of a not on the list in second position is provided for by the optional member word, with the constraint that this is only present if the whole synsem is +AUX.17 Now for the specification of wide scope negation all that is needed is the unification of this type word on the ARG-ST list with not, where not modifies the head auxiliary, i.e., unification with the following information, which places not in second position on an auxiliary's ARG-ST list, and identifies the auxiliary's KEY | HANDEL value with that of not's MOD feature. The rest of
the necessary information, including the condition specifying scope, is part of the lexical entry for not and need not be specified here. (29)
Wide scope auxiliary negation
This will also undergo unification with the types specified by Davis as underlying transitive verbs. These place the subject in initial position on the ARG-ST list, but allow the direct object to occur at some later point on that list, thus accounting for the typical intervention of the indirect object when both are NP. As in the raising to subject type, not will supply scope information, resulting in wide scope negation. So unification with a single statement for wide scope negation with not will account for all such negation in auxiliaries, including transitive auxiliaries (possessive HAVE and identificational BE) as well as raising auxiliaries. Narrow scope negation is only found with raising auxiliaries, not with transitives. I accounted for it above by placing not on the ARG-ST list as modifier of the nonsaturated complement, and adding to CONDS the condition h3>h5, which requires the modal to outscope negation but leaves open the possibility of a quantifier scoping between them. The relevant information to be unified with the linking type constraint for raising to subject is as follows. The other information required belongs to the lexical entry for not. (30)
Narrow scope auxiliary negation
Thus far we have a notably simple analysis, which makes full use of the lexical entry for not, and depends on general principles of combination by unification. We are in a position to set up a type negated, with a partition of subtypes: wide scope and narrow scope. Negated would establish the basic constraint (that not is added to the ARG-ST list of a finite auxiliary); unification with wide scope or
narrow scope would add information about what not modifies. Individual auxiliaries in particular senses would inherit either from wide scope or narrow scope. Both of these types would have a unification with raising to subject; the first also with transitive. Negated would also be a subtype of finite aux lex which restricts the partition to finite auxiliaries. But there are some considerations which imply a somewhat more complex hierarchy to permit an appropriate treatment of scope, and this will imply a modification of our treatment of not.

3.4. Other Considerations

3.4.1. SCOPE RESTRICTIONS ON INDIVIDUAL AUXILIARIES

Individual auxiliaries are restricted in their occurrence with wide and narrow scope negation. In general, each individual lexeme retains the same scope of negation whether it corresponds to epistemic, dynamic, or deontic modality (here using the distinctions of Palmer, 1979), though may and might are obvious exceptions.18 I shall therefore suppose that this is fundamentally an area of lexical idiosyncrasy, in which only partial generalizations are to be expected. The identification of appropriate scopes is not, however, straightforward, being dependent on the interpretation assigned to the auxiliary in question, itself dependent on the type of account being offered.19 I shall adopt an interpretation based essentially on writings on English grammar, principally Quirk et al. (1985) and Palmer (1979, 1988). I shall also suppose that the auxiliaries do, perfect have and predicative be occur with wide-scope negation; predicative be includes so called "progressive" and "passive" be, in which I argue that it is the complement of be that has "progressive" and "passive" properties, not be itself (Warner, 1993a). The major uses of the modals themselves can reasonably be subdivided as follows (see Quirk et al., 1985: §§10.67ff.):

1. wide scope of negation: can, could, may (deontic), need, dare, will, would.
2. narrow scope of negation: may (epistemic), might (epistemic), must, ought, shall, should.

Will and would are particularly difficult to interpret. Quirk et al. say that the distinction between wide and narrow scope is neutralized; I shall in the first instance interpret negation with will and would as having wide scope, taking epistemic uses such as there won't be a problem as (not (future)) rather than (predictable (not)). Similarly "subject-oriented" instances such as She won't behave, I will not surrender will be interpreted as "not willing to" rather than "intend not to," "be disposed not to" (cf. Perkins, 1983:47ff. for a different view). But in note 23 I allow for an account which assigns epistemic will and would either scope, and an account which assigns epistemic will, would narrow scope can readily be constructed. So little hangs on the assignment of particular scopes to these words.
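The classification above, together with the earlier observation that the narrow scope condition leaves room for an intervening quantifier, can be illustrated with a small sketch. It is a deliberately crude model and not the MRS machinery of the chapter: scope is treated as a linear order over operator names rather than as conditions on handles, will and would are placed in the wide scope class following the first-instance interpretation adopted here, and the names `WIDE`, `NARROW`, `negation_order`, and `scope_orders` are hypothetical.

```python
from itertools import permutations

# Major uses of the modals, as subdivided in the text.
WIDE = {"can", "could", "may (deontic)", "need", "dare", "will", "would"}
NARROW = {"may (epistemic)", "might (epistemic)", "must", "ought", "shall", "should"}


def negation_order(modal):
    """Return the pair (outer, inner) fixing the relative scope of modal and not."""
    if modal in WIDE:
        return ("not", modal)
    if modal in NARROW:
        return (modal, "not")
    raise KeyError(modal)


def scope_orders(operators, outscopes):
    """All linear scope orders in which every pair (a, b) has a outscoping b."""
    orders = []
    for order in permutations(operators):
        position = {op: i for i, op in enumerate(order)}
        if all(position[a] < position[b] for a, b in outscopes):
            orders.append(order)
    return orders


# "You must not eat one cake": must outscopes not; the quantifier is free.
must_readings = scope_orders(["must", "not", "one"], [negation_order("must")])
# must_readings == [("must", "not", "one"), ("must", "one", "not"), ("one", "must", "not")]
```

Under these assumptions the three orders returned for must correspond to the three readings of You must not eat one cake given earlier, with the quantifier intervening between modal and negation in the second.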
English Auxiliaries without Lexical Rules
The best overall account will be the one which most appropriately reduces the amount of lexical idiosyncrasy. From (2) above it is clear that narrow scope negation holds broadly of modals of necessity and obligation, and one might add two complementary constraints to the lexical hierarchy, representing a partial generalization across auxiliaries. The first ties together narrow scope and necessity/obligation, supposing that a more abstract relation oblig_rel will underlie the modals of necessity and obligation (need, must, ought, shall, should). The second associates wide scope with other relations. This substantially reduces the amount of lexical idiosyncrasy.20 The exceptions are epistemic may and might, which have narrow scope negation, and need, which has wide scope negation (and perhaps also epistemic will and would, if these may have narrow scope negation). But given the amount of apparent lexical variation in this area, any account must expect a small number of exceptions. I shall suppose that modals inherit from a set of relations which specifies them as epistemic or root (dynamic or deontic), so that epistemic may and might can be specified by reference to a relation may-epistemic_rel. Remember that what I have called syntactic constituent negation after a modal (as in (19b, c)) does not belong here. It has narrow scope because of the presence of not within a complement. This is not a property of the modal, but is a separate issue, which will be predicted from the syntactic combination of the modal with a constituent which happens to be negated.
3.4.2. INFLECTIONAL FORMS IN -N'T
A second set of facts which needs to be integrated concerns the forms in -n't. These are open to analysis as a series of negative forms (as traditionally), or, more recently, as carrying a negative inflection (Zwicky and Pullum, 1983), since only a proportion of -n't forms is phonologically predictable as the addition of a "cliticized" -n't to the positive.
They show both wide and narrow scope of negation. (31) illustrates a typical wide-scope negation. The information associated with -n't is the relation not_rel in LISZT, and the condition CONDS {h6>h2}.
(31) Wide scope: couldn't
Anthony Warner
(32) illustrates a typical narrow scope negation. Here the information associated with -n't is again the relation not_rel in LISZT, and the condition CONDS {h6>h1}, if we suppose that {h3>h5} is rather a property of should.
(32) Narrow scope: shouldn't
These forms also imply that scope facts should be treated separately from the realization of negation as not or -n't, because shouldn't is a word whose semantics includes negation, whereas the should in should not does not include negation. One combination is within the word, the other is syntactic.
3.4.3. GENERAL NATURE OF LEXICALLY RESTRICTED SCOPE RESTRICTIONS
Finally, it seems that a preference for wide or narrow scope is characteristic of each individual auxiliary (or sense of an auxiliary) whether negation is realized with not or with inflected -n't or with some other negative (see Palmer, 1979:26). So there is a relationship between the individual auxiliary and negation, not one set of distinct relationships for not and another for -n't. This also implies that scope should not be tied closely to the realization of negation. But here I will deal only with not and -n't, leaving for future work the wider implications of the scope facts with never and other negatives.
(33) a. Paul shouldn't / should not / should never eat anything uncooked. Paul should eat nothing uncooked. (narrow scope negation)
b. Paul couldn't / could not / could never eat anything uncooked. Paul could eat nothing uncooked. (wide scope negation)
3.5. The Inheritance Hierarchy
The considerations of section 3.4 imply that it makes sense to distinguish constraints determining scope from constraints which establish the valence alternations of auxiliaries. So I will analyze the facts of negation by setting up two partitions within the lexical hierarchy in a type finite aux lex from which all finite auxiliaries will inherit:
(i) NEG FORM, a partition of negative types which are basically (but not solely) concerned with valence facts. I have included semantic information about -n't in this hierarchy, though it might better be placed in a separate part of the hierarchy concerned with inflected forms.
(ii) NEG SCOPE, a partition with subtypes wide neg scope and narrow neg scope. All finite auxiliaries will also inherit from this, and I therefore place it alongside NEG FORM, recognizing that this location also may need revision in a more comprehensive treatment of scope.
The separation of scope facts from the realization of negation means that it is no longer appropriate to treat not as a modifier. In effect, the scope constraints are being treated as a property of the auxiliary head of the construction. Where not is concerned, this can be stated as the selection by the auxiliary head of a particular value of MOD | CONTENT | KEY | HANDEL within not. But as soon as scope is treated more generally, this becomes redundant. It can be stated, but at the cost of some further complexity. So I shall suppose that it is no longer necessary, and that the lexical entry for not should be revised by adding the possibility that it may be [MOD none, CONDS eset]. This attributes to not a syntactic property which is typical of members of the ARG-ST list, and this underscores its status as a complement of the verb. Now in order to make the separation of scope and valence effective, so that there is a unitary statement of scope conditions, I need to set up a list-valued feature NEG, defining it as one of the attributes of content. Its possible values will be elist and a singleton list containing the not_rel relation, which will be identical with one on a relevant LISZT. Its default value is elist. It is parallel to KEY, in that it identifies a specific relation within LISZT.
The point of this feature is that, like KEY, it permits reference to feature values within a specific relation, and it is necessary to do this in order to provide a unitary statement of the common scope properties of auxiliaries which have negation in different locations: within the head when inflected with -n't, or within the head's sister not. The partition for negation can then be set up as in (34). In NEGATION, the type negated contains CONTENT | NEG, which is common both to auxiliaries which combine with not and to their -n't forms. Not negated specifies that the second member of its ARG-ST list is a phrase: this prevents unification with the partition of raising to subject which has word in this position. In NEG FORM, not arg places not in the second position of the ARG-ST list and identifies its CONTENT | KEY not_rel relation as the value of the CONTENT | NEG of the auxiliary head. The type -n't form states the basic semantics of negation by placing not_rel on the auxiliary's LISZT value, where this not_rel is token identical to the value in the NEG list. Here I have assumed that the value of KEY is that of the first member of the LISZT list, and that a high-level default states that it is the only member; this default is superseded by the constraint in -n't form.21 An appropriate statement about the morphology of the form will also be needed.
(34) Part of the inheritance hierarchy for finite auxiliaries
(35) The types within the partition NEGATION
(36) The types within the partition NEG FORM 22
Finally in (37) I give the constraints on scope. NEG SCOPE is the partition into wide neg scope and narrow neg scope. In these, two relations within the finite auxiliary's CONTENT are isolated: CONTENT | KEY is the relation which corresponds to the auxiliary's meaning, and CONTENT | NEG contains not_rel. Then in (a) wide scope conditions are established for relations which are not oblig_rel, except for epistemic may and might; and in (b) narrow scope conditions are established for relations which are oblig_rel and for epistemic may and might. The (a) group covers all of the wide scope modals listed above: can, could, may (deontic), dare, will, would, but it omits need. The (b) group covers all the narrow scope modals listed above: may (epistemic), might (epistemic), must, ought, shall, should. It also includes need, but this actually has wide scope and needs separate statement as an exception. Narrow neg scope, which holds only for auxiliaries which are raising predicates, and which requires reference to the CONTENT | KEY handle of the nonsaturated complement for the statement of scope, refers additionally (as needed) to ARG-ST.23
(37) The types within the partition NEG SCOPE
(a) wide neg scope, with CONTENT | KEY ¬[oblig_rel] and ¬[may-epistemic_rel]
Members of type (a) are can, could, may,24 dare, will, would, epistemic uses of (can), could,25 will, would.
(b) narrow neg scope, with CONTENT | KEY [oblig_rel] or [may-epistemic_rel]
Members of type (b) are must, ought, shall, should (and need), and epistemic uses of may, might. Need requires a further exception statement. There is one default statement here, in the CONDS of (37b), where the back slash indicates a value which may be overridden. This permits the formulation of an exception statement for need, the only exception to the partition as formulated. Need is [oblig_rel] with wide scope negation.26 The lexical entry for need will allow for two possibilities: one with [NEG elist] will have no unification with negated; the other will have [NEG ([not_rel])] and a statement which will both affirm the wide scope condition (defining h2 as in 34a) and negate the relevant narrow scope condition, thereby taking precedence over the default in (37b): CONDS {h6>h2} and ¬CONDS {h3>h5}.27 The condition {h6>h1} need not be negated; it is consistent with wide scope.
There is one final (but major) wrinkle. In negated yes-no questions, the normal (perhaps invariable) scope of sentential negation with not/-n't is wide.28
(38) a. Should you not keep sober for once? Shouldn't you keep sober for once? ('Is it not the case that you should keep sober for once?')
b. Might there not have been a problem over his drinking? ('Is it not possible that there was a problem over his drinking?')
c. Won't there have been a problem over his passport? ('Is it not predictable that there was a problem over his passport?')
This looks like a different order of fact from the interaction between lexical item and scope considered here, and I will suppose that a subtype of a type yes-no-interrogative-clause will contain a constraint imposing wide scope negation on auxiliaries. This will unify straightforwardly with the constraint of (37a).
If it also negates the specification of narrow scope negation (as in the lexical entry for need just discussed), it will have a unification with the information in (37b), taking priority over the default, and thereby assigning wide scope of negation to should, must, etc. as appropriate. The information in the different partitions of this hierarchy unifies straightforwardly to give the feature structures which permit us to characterize such categories as "epistemic modal with wide scope negation with not," and so on. To exemplify this, here in (39) are two particular results of the unification of the information in the hierarchy of (34)-(37), with raising to subject (28) and aux lex (which supplies only +AUX). The first structure, (39a), corresponds to a wide scope use of not, the second, (39b), to narrow scope -n't.
(39) a. Information resulting from the unification of the types: raising to subject, aux lex, finite aux lex, negated, not arg, wide neg scope
and CONTENT | KEY ¬[oblig_rel] and ¬[may-epistemic_rel]
b. Information resulting from the unification of the types: raising to subject, aux lex, finite aux lex, negated, -n't form, narrow neg scope, and the information on type word given in note 21.
and CONTENT | KEY [oblig_rel] or [may-epistemic_rel]
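The idea that a category such as "finite auxiliary with wide scope negation" results purely from unifying the information in several types can be illustrated with a minimal sketch. It assumes feature structures encoded as nested Python dictionaries; the feature values below are hypothetical, heavily simplified stand-ins for the constraints named in the text, and the sketch covers neither defaults nor token identity between values.

```python
class UnificationFailure(Exception):
    """Raised when two descriptions carry incompatible information."""


def unify(a, b):
    """Combine two partial descriptions; fail on any clash of values."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for feature, value in b.items():
            if feature in result:
                result[feature] = unify(result[feature], value)
            else:
                result[feature] = value
        return result
    if a == b:
        return a
    raise UnificationFailure(f"{a!r} clashes with {b!r}")


def unify_all(*descriptions):
    """Unify any number of descriptions, left to right."""
    result = {}
    for description in descriptions:
        result = unify(result, description)
    return result


# Hypothetical, simplified stand-ins for types named in the text.
AUX_LEX = {"HEAD": {"AUX": "+"}}
FINITE_AUX_LEX = {"HEAD": {"VFORM": "fin"}}
NEGATED = {"CONTENT": {"NEG": ["not_rel"]}}
WIDE_NEG_SCOPE = {"CONDS": ["h6>h2"]}

wide_negated_aux = unify_all(AUX_LEX, FINITE_AUX_LEX, NEGATED, WIDE_NEG_SCOPE)
print(wide_negated_aux)

# An entry specified as [NEG elist] has no unification with negated.
try:
    unify({"CONTENT": {"NEG": []}}, NEGATED)
except UnificationFailure as failure:
    print("no unification:", failure)
```

The point of the sketch is only that nothing beyond the monotonic addition of information is involved: no description is rewritten, and incompatible descriptions simply fail to combine.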
3.6. Conclusion In effect I have presented two accounts of negation here. In the first, not is a modifier, and scope differences depend on the fact that what it modifies is selected by its head auxiliary. The second account makes a general statement about scope which covers -n't negatives as well as not. Under this account, the conditions which establish scope restrictions are stated directly as a property of the auxiliary head, and it is no longer necessary to treat not as a modifier. Both of these accounts of negation depend on the integration of simple properties: those of the semantics and syntax of auxiliaries, those of not and -n't, and those of statements of scope. There is some complexity in identifying the relevant scope for individual modals, but this is only to be expected when dealing with the lexical idiosyncrasies of this intricate group. The accounts are compositional in the very straightforward sense that when not is on the ARG-ST list, the semantics of not is unified into the analysis. This is a desirable treatment, which argues for the appropriateness of these analyses. The second analysis accounts not only for both wide and narrow scope negation, but also for the partially systematic distribution of auxiliaries with these different scopes, and can even accommodate the different behavior of yes-no questions. It therefore goes beyond the earlier analyses of Warner (1993a), Kim and Sag (1996a), and Kim (1995) in several respects. But what is most striking is the fact that these accounts depend almost solely on the simple addition or unification of information. Not even the restricted manipulations of lexical rules are required. They therefore provide a strong argument against the use of lexical rules in HPSG. 
Moreover, the fact that the unification of information can smoothly account for so much strengthens the position developed in Kim and Sag (1996b), which argues against the use of a more complex structure containing functional categories and employing movement to capture generalizations.
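The scope partition summarized in this section can be sketched procedurally. This is an illustration only, not the constraint-based statement itself: the function and class names are hypothetical, the relation name can_rel is an assumed label for a relation other than oblig_rel, and the default with its exception for need is modelled as a simple lexical override.

```python
from dataclasses import dataclass
from typing import Optional

OBLIG = "oblig_rel"
MAY_EPISTEMIC = "may-epistemic_rel"


@dataclass(frozen=True)
class AuxSense:
    form: str
    key_rel: str                          # relation in CONTENT | KEY
    scope_override: Optional[str] = None  # lexical exception, if any


def neg_scope(sense: AuxSense, yes_no_question: bool = False) -> str:
    """Return 'wide' or 'narrow' for sentential negation with this sense."""
    if yes_no_question:
        # Negated yes-no questions impose wide scope, overriding the default.
        return "wide"
    if sense.scope_override is not None:
        # Lexical exception statement (need).
        return sense.scope_override
    if sense.key_rel in (OBLIG, MAY_EPISTEMIC):
        # Default for necessity/obligation and epistemic may, might.
        return "narrow"
    return "wide"


should = AuxSense("should", OBLIG)
could = AuxSense("could", "can_rel")          # assumed relation name
need = AuxSense("need", OBLIG, "wide")
might_epistemic = AuxSense("might", MAY_EPISTEMIC)

for sense in (should, could, need, might_epistemic):
    print(sense.form, neg_scope(sense), neg_scope(sense, yes_no_question=True))
```

The ordering of the three tests mirrors the priorities described above: the clause-level constraint and the lexical exception both take precedence over the default narrow scope of (37b).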
4. SUBJECT-AUXILIARY INVERSION Inversion of subject and finite auxiliary occurs in main clause interrogatives, in tag questions, after a fronted negative with scope over the auxiliary, in and neither and and so tags, and restrictedly in conditionals and comparatives (Quirk et al., 1985: §§18.24, 15.36). (40) a. Could you see the horizon? b. At no point could I see the horizon. c. I could see the horizon, and so could Harry. Such clauses are best analyzed within the framework adopted here as having a "flat" structure in that the finite auxiliary, the subject phrase, and the complement phrase are all sisters, as in (41), though the feature content of V here is a matter for further discussion.
An alternative which has been proposed (e.g., by Gazdar, Pullum, and Sag, 1982) takes the subject and complement to form a constituent (as in: could [you see the horizon]?). But the arguments for this structure are unsatisfactory, and it involves difficulties, so it is better rejected. Thus the possibility of coordinations of the sequence subject + complement as in (42a, b) does not necessarily imply the constituency of subject + complement in (40), given that the possibility of coordination in ditransitive structures, such as (43), implies that we need a more general account of these phenomena. The same line of thinking leads to the rejection of instances of Right Node Raising of the subject + complement sequence as an argument for constituency (suggested in Borsley, 1989), since Right Node Raising is not an infallible test for constituency (Abbott, 1976).
(42) a. Will Paul sing and Lee dance? (Gazdar, Pullum, and Sag, 1982:612 [49a])
b. Is Paul beautiful and Lee a monster? (Gazdar, Pullum, and Sag, 1982:612 [49g])
(43) John gave a record to Mary and a book to Harry.
There would be evident difficulties, too, for dealing in a natural and motivated way with the facts of subject-verb agreement and nominative case assignment if the sequence subject + complement were a constituent, since we would naturally expect the mother of this "small clause" to be [SUBJ elist], and this would imply that the subject should be oblique and that the auxiliary should lack agreement. Moreover, a further argument against this constituency can be constructed from the survival of the subject in ellipsis. Suppose that the sequence subject + complement is indeed a constituent in inverted clauses. Then the inverted auxiliary must have the category of this "small clause" on its COMPS list. But ellipsis of phrasal complements after auxiliaries is free, and the clause complement of an auxiliary may undergo ellipsis, as in (44).29 So why should not the possibility of ellipsis be generalized to inverted instances? This would predict the occurrence of the auxiliary without subject or complement as an instance of elliptical inversion, as in (45), and this is not well formed.
(44)
Would they rather Paul came on Tuesday? —Yes, they would rather.
(45) Can we go to Disneyland? *Please can? [sc. we go to Disneyland] So it is best to analyze these inverted clauses as "flat" in structure: the finite auxiliary, the subject phrase, and the complement phrase are all sisters. The finite head auxiliary will also carry the head feature [+INV], which is justified on both syntactic and morphological grounds: syntactically by the restricted distribution of inverted clauses and morphologically from the uniqueness of aren't in aren't I?30 The most satisfactory way of generating this flat structure is to change the valence list membership of its head by placing the first member of the ARG-ST list on the COMPS list [so that the head auxiliary has its subject as the first item on its COMPS list, and is consequently SUBJ elist, as in (41)], and to use Schema 2, the HEAD-COMPLEMENT SCHEMA, which is also used to specify the structure of VPs. The case of the subject and subject-verb agreement will be specified by reference to the initial member of the auxiliary's ARG-ST list. The information which will appear in a type inverted, a subtype of finite aux lex, will be as follows: (46) inverted
here ⊕ is append, [1] is a synsem, and [2], [3] are lists of synsem (which may be empty). This immediately raises two questions. The first concerns the use of Schema 2; the second the details of this formulation. Pollard and Sag discuss two ways in which the flat structure might be generated (1987, 1994: §§1.5, 9.6).31 One is to specify a change of valence and use Schema 2, as just suggested.32 The other is to retain the same values of SUBJ and COMPS as in a noninverted structure, and to use Schema 3, the HEAD-SUBJECT-COMPLEMENT SCHEMA, in which both the subject and the complements of the lexical head are its sisters. This schema is apparently required for other languages; see Borsley (1995) for the suggestion that universal grammar must provide for Schema 3 to cope with the facts of Syrian Arabic (alongside Schema 2 which is required for Welsh). On this account, the types inverted and not inverted will differ only in that the former is [+INV], the latter [-INV], though it will be necessary to constrain the values of INV which occur with the structural schemata.33 Consequently, the demonstration that it is possible to account for constructions with auxiliaries within an inheritance hierarchy (which forms my more general concern in this chapter) goes through directly for this construction, since the feature content of these inverted and uninverted subtypes differs only in this particular respect. I prefer, however, to reject this account, on the ground that this would constitute the only use in English of Schema 3, and such an isolated use is unattractive unless there is a clearer rationale for its adoption.34 But the matter remains uncertain, though the adoption of Schema 2 is probably currently better motivated, and my general point can also be demonstrated with respect to this schema.
Proceeding then on the assumption that Schema 2 is appropriate, the other question is essentially as follows. Given that we need to ensure that the first member of the ARG-ST list appears on the COMPS list, what is the best way of doing this? The definition of inverted in (46) identifies the initial members of the ARG-ST and COMPS lists, and leaves the specification of SUBJ as elist to the Argument Realization constraint of (47), which defines the ARG-ST list as the append of the valence lists.
(47) Argument Realization
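The interaction between inverted and Argument Realization described here can be sketched by enumerating the ways an ARG-ST list may be divided among the valence lists. The sketch rests on assumptions not stated in this passage: that the append order is SUBJ, then SPR, then COMPS, and that SUBJ and SPR hold at most one item each. The names Synsem, realizations, and is_inverted are hypothetical.

```python
class Synsem:
    """A bare stand-in for a synsem object; identity matters, not content."""

    def __init__(self, label):
        self.label = label

    def __repr__(self):
        return self.label


def realizations(arg_st):
    """Yield every (SUBJ, SPR, COMPS) with SUBJ + SPR + COMPS == ARG-ST.

    Assumption: SUBJ and SPR contain at most one member each.
    """
    n = len(arg_st)
    for i in range(n + 1):
        for j in range(i, n + 1):
            subj, spr, comps = arg_st[:i], arg_st[i:j], arg_st[j:]
            if len(subj) <= 1 and len(spr) <= 1:
                yield subj, spr, comps


def is_inverted(arg_st, comps):
    """The first member of COMPS is token identical to the first of ARG-ST."""
    return bool(comps) and comps[0] is arg_st[0]


np, vp = Synsem("NP"), Synsem("VP")
arg_st = [np, vp]

inverted = [r for r in realizations(arg_st) if is_inverted(arg_st, r[2])]
print(inverted)
```

Under these assumptions the only division compatible with the identity requirement leaves SUBJ and SPR empty, which is the sense in which the value of SUBJ need not be stipulated in the type itself.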
But why not reverse this? Why not define the type inverted by constraining the value of SUBJ (and SPR) to be elist, thus forcing the subject onto the COMPS list via the Argument Realization constraint? This apparent point of detail matters because the formulations have different consequences. If the subject is forced onto the COMPS list via the Argument Realization constraint, then it is difficult to see why it should not behave like other members of COMPS and be subject to ellipsis, so that examples like the Please can? in (45) would be predicted grammatical, or extracted, so that sentences like John must leave would be assigned an additional filler-gap structure (John [+INV must leave]) alongside the subject-predicate structure.35 Given the formulation of (46), however, the initial member of the COMPS list cannot be subject to ellipsis under the account developed below, since this would require unification with a constraint requiring its absence from the COMPS list, and (46) specifies precisely that it is present. Nor can it be subject to extraction if we have an account in which complement extraction is defined by unification with a constraint permitting an appropriate mismatch between the COMPS list and the ARG-ST list of some lexeme (in the spirit of Bouma, Malouf, and Sag, 1997), and for the same reason: its presence on the COMPS list is specified in inverted, and there can be no unification of this with a constraint which specifies its absence from the list.
Given the important consequences of this minor difference of formulation, some justification for the adoption of (46) over the alternative is clearly in order. Two considerations are relevant. The first involves the role of specifiers, which have not so far been discussed, but which figure in the statement of Argument Realization. It seems reasonable to suggest that it would in fact be necessary to specify inverted as [+INV, SUBJ elist, SPR elist] to ensure that the subject was forced onto the COMPS list.36 This is not as intuitively appropriate as (46), and it is not more economical, since both constrain the values of three attributes. So at least the alternative carries no advantage. The rationale underlying this position is that it seems likely that auxiliaries may have specifiers, and that they are not restricted to being lexical items. Candidates for such status are floated quantifiers, which could reasonably be analyzed (at least in part) as specifiers of verbs and predicative items. The distribution of all, both, and nearly all (for example) is to a large extent consistent with this, since these readily precede overt VP, as illustrated in (48), but do not precede an ellipsis site, or follow VP. If they are treated as specifiers of V, this gives a better motivated account of these positional restrictions than does analyzing them as adverbs (as in Kim and Sag, 1996b).
(48) The old men (_) would (_) have (_) liked (_) to (_) fly (*_).
All may appear in any of the positions indicated except the last.
Second, the acquisition of the formalism of (46) is in accord with the thinking behind the Subset Principle (Berwick, 1985), which posits a preference for the most restrictive hypothesis as a factor in learning, thus requiring a grammar generating the smallest set of structures. This principle holds for very precisely defined cases. In the present instance we have two output languages, one of which is a subset of the other, since the formulations differ only in that one generates additional instances with an empty subject position in inverted structures. We might suppose that a learner is moving from an antecedent analysis in which inverted structures are fully specified. Of the two alternatives, the unmarked analysis (to be preferred in the absence of contrary evidence) is the conservative one, which retains the output of the antecedent analysis, whereas the marked analysis, requiring positive evidence for its adoption, goes beyond this in adding the possibility that the subject may be absent. This is exactly the kind of situation to which the Subset Principle is relevant. This is, moreover, surely an area with plentiful data: children will presumably be exposed to ample evidence of the subject's presence in tag questions and inverted elliptical structures, to set alongside the complete absence of data such as the Please can of (45). So this analysis is learnable in principle. On both of these fronts, then, it seems reasonable to adopt the formulation of (46).
On this account, finite auxiliaries must meet one of the two constraints inverted and not inverted. Inverted assigns [+INV], and it identifies the first item on the ARG-ST list with the first item on the COMPS list. Unification with Argument Realization will specify the value of SUBJ as elist. Not inverted assigns [-INV, SUBJ ] and auxiliaries of this type will be SUBJ, ARG-STh5} given default encoding,
alongside another (overriding) type encoding the absence of any relationship between these handles. I am grateful to Ann Copestake for pointing out both problem and solution.
28 Distinguish constituent negation with not, which of course has narrow scope. Palmer (1979:96; 1988:127) suggests that sentential negation may remain narrow in questions with mustn't, but qualifies the observation: "Although native speaker intuition is uncertain here." I have not included this possibility in my analysis.
29 This holds good whether would is the relevant auxiliary and rather is phrasal complement, or rather is itself an infinitival auxiliary; see note 46.
30 [+INV] must occur only in main clauses. This restriction can be dealt with by a constraint (which has some lexically controlled exceptions) that no member of an ARG-ST list may be [+INV]. There is no need to extend INV to all verbs.
31 The existence of twin possibilities within a framework containing the attribute SUBJ seems to have been first noted by Borsley (1986:83).
32 Pollard and Sag (1994: §9.6) pointed to two particular problems with this approach. One concerned difficulties avoiding unwanted extractions of the subject from the COMPS list. This will be discussed below. The second problem is the suggestion that if a subject is present in COMPS, SLASH categories will percolate into it without also percolating into the complement. This would falsely predict the grammaticality of (i) beside (ii).
(i) *Which rebel leader did rivals of e assassinate the British Consul?
(ii) Which rebel leader did rivals of e assassinate e?
But in such instances this will be ruled out by the "Subject Condition," as noted in Warner (1993a:249, note 20 to p. 84). Since in subject-raising structures information about the subject is shared by the valence statements for the lower and higher predicates, the lower verb (here assassinate) will have a value for SUBJ which contains SLASH. So the Subject Condition (which prevents subjects from having nonparasitic gaps) will hold for it. Hence assassinate must also have a SLASH within its COMPS list (cf. Pollard and Sag, 1994: §4.5, §9.2). There remains however the problem of identificational BE, as in (iii), which is surely not a raising predicate, but which apparently shows a (weak) restriction on extraction from its inverted subject; contrast (iv) and (v).
(iii) Is the first man the thief?
(iv) Of which book was the reviewer also the author?
(v) ?Of which book was the reviewer also the author of Gamelan Studies?
It is harder to tell whether there is a similar set of facts for the inverted "possessive" HAVE of British English, since this is formal and increasingly restricted in usage. There may be a difficulty here for the use of Schema 2 in English inversions, unless the "Subject Condition" facts can be made to fall out in a way which covers the data given above.
33 The type hierarchy under head will state that the type finite auxiliary has the Boolean attribute INV, so finite aux lex will automatically be marked for INV. There will, however, be the unwelcome complexity that Schema 3 will need to be specified +INV for English, while Schemas 1 and 2 will need to be specified ¬[+INV].
34 There is, however, some further possible evidence in favor of the use of Schema 3 in the operation of Linear Precedence statements, since subjects which are Ss or VPs are not distributed just like complements. Pollard and Sag (1987:181) formulate their Linear Precedence statement 2 so that the ordering of phrasal complements in English reflects the ordering of the ARG-ST list, except for synsem of type verb. This allows the freedom of order found in (i) beside (ii), but bars (iv) since AP must precede PP as in (iii).
(i) Kim appeared to Sandy to be unhappy ARG-ST
(ii) Kim appeared to be unhappy to Sandy ARG-ST
(iii) Kim appeared unhappy to Sandy ARG-ST
(iv) *Kim appeared to Sandy unhappy ARG-ST
We might expect to find this freedom in the case of subjects if they are indeed on the COMPS list, so that (vi) would be available beside (v). But this is not so.
(v) Is (for a journalist) to reveal sources legitimate? ARG-ST<S, AP>; ARG-ST
(vi) *Is legitimate (for a journalist) to reveal sources? ARG-ST<S, AP>; ARG-ST
It is, however, not clear to me what the most satisfactory account of this restriction is, nor whether it will indeed constitute an argument for the use of Schema 3 within a more detailed account of Linear Precedence principles of English grammar.
35 Pollard and Sag (1994: §9.6) discuss the problem of assigning a double structure to John must leave. More recently, Bouma, Malouf, and Sag (1997) have proposed a filler-gap account of Who visits Alcatraz? But this does not imply that a subject may be extracted from [+INV] structures since the phrase visits Alcatraz is not [+INV], and Who will visit Alcatraz? can be dealt with as the extraction of the subject from [-INV]. Indeed, if subject extraction from [+INV] is not permitted, this will avoid the assignment of a further [+INV] structure to Who will visit Alcatraz?
36 Then the interrogative Would the old men all have liked to fly? corresponds most directly to the declarative The old men would all have liked to fly in that in both all is the specifier of have liked to fly, and there is no interrogative corresponding directly to The old men all would have liked to fly. Note that adopting (46) has the same result, since the unification of (46) and (47) has SPR elist.
37 Here and in (54) below, HEAD designates the head daughter of an ID schema; it is not the attribute HEAD.
38 Quirk et al. (1985:809) observe that "some speakers accept" the "rather formal" construction of Is not history a social science? and that this order is especially likely in formal contexts where the subject is lengthy. Some further principle (perhaps weight ordering) will be needed to account for the greater difficulty of the order with pronoun subjects, which are more marginal (e.g., Should not you talk to him about it?) and for the absolute impossibility of this order in tags: aren't they?, are they not? *are not they?
39 This covers the central area of what has been called VP Deletion, but both the term and the implied analysis are inappropriate (Warner, 1993a: 5f.). 40 If I follow the system of Bouma, Malouf, and Sag (1997), but substitute the ARG-ST list for their DEPS list, the complement would appear on the ARG-ST list as gap-ss, and its SLASH value would be re-entrant with that of the auxiliary's (TO-)BIND feature: will in (55b) would be [ARG-ST,
English Auxiliaries without Lexical Rules
SLASH{VP}, BIND{VP}], where all VP are token identical feature structures of type local. 41 Note that Pseudogapping, illustrated in (i), should be generalized with Postauxiliary Ellipsis, as argued (among others) by Miller (1990) and Warner (1993a). A syntactic account of this might treat it as the ellipsis of a synsem with non-null SLASH within a gap-filler structure [i.e., as parallel in major respects to the examples of (56)]. The missing complement of will in (i) would be VP/NP. (i) We're agreed then: You will try to persuade your father this weekend and I will _ your mother. 42
The proform so with auxiliaries is regrettably narrowly distributed, and it is necessary carefully to distinguish the connective adverb. But note that the construction of (i), in which so is a proform for AP/PP, is ungrammatical, unlike (ii) which has both ellipsis and extraction and is grammatical. The contrast between (iii) and (iv) also implies that so is not a proform which may contain a non-null SLASH, if we suppose that (iv) is a gap-filler structure: so [VP/NP] ...the ham[NP]. (i)
*She was fond of Harry, and of George she will be so too.
(ii)
She was fond of Harry, and of George she will be too.
(iii)
If you order Harry to eat up all his food, then so he will!
(iv)
*If you order Harry to eat up all his food, then so he will the ham!
43
The arguments in Milsark (1976) and those noted in Lumsden (1988:51) seem to me to show that we should reject the analysis of there sentences as containing a single NP complement with an internal adjunct, as proposed by Williams (1984), even for instances with AP or participial phrase, in favour of that adopted by Pollard and Sag (1994). 44 One problem of analysis (which I shall not pursue) is that of establishing the relationship of instances of ellipsis of the first complement with retention of the second to Pseudogapping. But see note 47. 45 There may be some evidence for the ellipsis of locatives as distinct from their optional status in restricted constructions such as: (i) I have some minor matters on my conscience. *?But there are no real misdeeds [sc. on my conscience]. 46
Another construction which might show two complements with an auxiliary is found with would rather, had better. The restricted optionality of rather and better shown in (i) and (ii) is interesting. (i)
I don't know whether he would rather leave early, or whether he would *(rather) leave late.
(ii) You would rather leave early? —Yes, I would (rather). —No, I would *(rather) leave late. The simplest solution is that rather and better are themselves [+AUX, bse], that they take a plain infinitive complement (or a clause in the case of rather), and that they provide a context for deletion, but can only themselves be deleted along with their complements.
Anthony Warner
47 I have not here discussed the relationship between Postauxiliary Ellipsis and Pseudogapping, although examples like (61) might readily be referred to Pseudogapping. Instances like (63b), (65b), however, seem most naturally interpreted as the Postauxiliary Ellipsis of only one complement phrase (given that there constructions have two phrases within their complement, see note 43). This may open an alternative of analysis in which Postauxiliary Ellipsis affects only the final element (or elements) of the complement, and clauses with ellipsis of nonfinal phrases are analyzed as showing Pseudogapping. But if John reads to his children, Mary cooks for the family show ellipsis of a nonfinal complement, there is no simple generalization with such ellipsis in transitives. 48 Alternatively, if lexical items may not be extracted, then it may be appropriate to restrict the value of [3] on word so that it cannot include synsem of type word. Then not could not be extracted or removed in ellipsis. 49 A generalization of this will be required if a similar statement (with a mismatch between ARG-ST and COMPS lists) is to be made for transitive verbs with a "deleted" object (eat, read, etc.), as proposed in Davis (1996). 50 I have not here discussed the peculiarity of the distribution of do, that unstressed affirmative do is absent, or the facts of imperative do, which has a distinct distribution. For an account of these see Warner (1993a:86ff.), where I suggest an informal "blocking" (or default) account of the relationship between the realization of tense as the word do and as verbal affix, and analyze imperative do, don't as unique finites.
REFERENCES
Abbott, B. (1976). Right node raising as a test for constituenthood. Linguistic Inquiry, 7, 639-642.
Berwick, R. (1985). The acquisition of syntactic knowledge. Cambridge, MA: MIT Press.
Borsley, R. D. (1986). A note on HPSG. Bangor Research Papers in Linguistics, 1, 77-85.
Borsley, R. D. (1987). Subjects and complements in HPSG. Report No. CSLI-87-107. Stanford: Center for the Study of Language and Information.
Borsley, R. D. (1989). Phrase-structure grammar and the Barriers conception of clause structure. Linguistics, 27, 843-863.
Borsley, R. D. (1995). On some similarities and differences between Welsh and Syrian Arabic. Linguistics, 33, 99-122.
Borsley, R. D. (1996). Modern phrase structure grammar. Oxford: Blackwell.
Bouma, G. (1997). Valence alternation without lexical rules. Unpublished manuscript, Rijksuniversiteit.
Bouma, G., Malouf, R., and Sag, I. A. (1997). Satisfying constraints on extraction and adjunction. Unpublished manuscript, Stanford University.
Carpenter, B. (1992). The logic of typed feature structures with applications to unification grammars, logic programs and constraint resolution. Cambridge: Cambridge University Press.
Carpenter, B. (1993). Skeptical and credulous default unification with applications to templates and inheritance. In T. Briscoe, A. Copestake, and V. de Paiva (Eds.), Inheritance, defaults, and the lexicon (13-37). Cambridge: Cambridge University Press.
Chomsky, N. (1957). Syntactic structures. The Hague: Mouton.
Copestake, A. (1993). Defaults in lexical representation. In T. Briscoe, A. Copestake, and V. de Paiva (Eds.), Inheritance, defaults, and the lexicon (223-245). Cambridge: Cambridge University Press.
Copestake, A., Flickinger, D., and Sag, I. A. (1997). Minimal Recursion Semantics: an introduction. Unpublished manuscript, Stanford University.
Davis, T. (1996). Lexical semantics and linking in the hierarchical lexicon. Ph.D. dissertation, Stanford University.
Flickinger, D. (1987). Lexical rules in the hierarchical lexicon. Ph.D. dissertation, Stanford University.
Gazdar, G., Pullum, G. K., and Sag, I. A. (1982). Auxiliaries and related phenomena in a restrictive theory of grammar. Language, 58, 591-638.
Hankamer, J., and Sag, I. A. (1976). Deep and surface anaphora. Linguistic Inquiry, 7, 391-428.
Huddleston, R. (1969). Review of Madeline Ehrman (1966). The meanings of the modals in present-day American English. Lingua, 23, 165-176.
Kathol, A. (1994). Passives without lexical rules. In J. Nerbonne, K. Netter, and C. Pollard (Eds.), German in Head-driven Phrase Structure Grammar (237-272). Stanford: CSLI.
Kim, J.-B. (1995). English negation from a non-derivational perspective. Proceedings of the 21st Annual Meeting, Berkeley Linguistics Society, 186-197.
Kim, J.-B., and Sag, I. A. (1996a). The parametric variation of French and English negation. Proceedings of the 14th Annual Meeting of the West Coast Conference on Formal Linguistics, 303-317.
Kim, J.-B., and Sag, I. A. (1996b). French and English negation: a lexicalist alternative to head movement. Unpublished manuscript, Stanford University.
Klima, E. (1964). Negation in English. In J. A. Fodor and J. J. Katz (Eds.), The structure of language (246-323). Englewood Cliffs, NJ: Prentice-Hall.
Krieger, H.-U., and Nerbonne, J. (1993). Feature-based inheritance networks for computational lexicons. In T. Briscoe, A. Copestake, and V. de Paiva (Eds.), Inheritance, defaults, and the lexicon (90-136). Cambridge: Cambridge University Press.
Lappin, S. (1997). An HPSG account of antecedent contained ellipsis. SOAS Working Papers in Linguistics and Phonetics, 7, 103-122.
Lascarides, A., Briscoe, T., Asher, N., and Copestake, A. (1996). Order independent and persistent typed default unification. Linguistics and Philosophy, 19, 1-90.
Lumsden, M. (1988). Existential sentences: Their structure and meaning. London: Croom Helm.
Manning, C., Sag, I. A., and Iida, M. (1996). The lexical integrity of Japanese causatives. In T. Gunji (Ed.), Studies on the universality of constraint-based phrase structure grammars (9-37). Report of International Scientific Research Program Project 06044133. Osaka.
Manning, C., and Sag, I. A. (1997). Dissociations between argument structure and grammatical relations. Unpublished manuscript, Stanford University.
Miller, P. H. (1990). Pseudogapping and do so substitution. Proceedings of the 26th Meeting of the Chicago Linguistics Society (293-305). Chicago: Chicago Linguistics Society.
Miller, P. H., and Sag, I. A. (1997). French clitic movement without clitics or movement. NLLT, 15, 573-639.
Milsark, G. L. (1976). Existential sentences in English. Bloomington: IULC.
Palmer, F. R. (1979). Modality and the English modals. London: Longman.
Palmer, F. R. (1988). The English verb (2nd ed.). London: Longman.
Perkins, M. R. (1983). Modal expressions in English. London: Francis Pinter.
Pollard, C., and Sag, I. A. (1987). Information-based syntax and semantics, vol. I: Fundamentals. Stanford: Center for the Study of Language and Information.
Pollard, C., and Sag, I. A. (1994). Head-driven Phrase Structure Grammar. Stanford: CSLI and Chicago: University of Chicago Press.
Pollock, J.-Y. (1989). Verb movement, Universal Grammar, and the structure of IP. Linguistic Inquiry, 20, 365-424.
Quirk, R., Greenbaum, S., Leech, G., and Svartvik, J. (1985). A comprehensive grammar of the English language. London and New York: Longman.
Riehemann, S. (1994). Morphology and the hierarchical lexicon. Unpublished manuscript, Stanford University.
Sag, I. A. (1979). The nonunity of anaphora. Linguistic Inquiry, 10, 152-164.
Sag, I. A. (1997). English relative clause constructions. Journal of Linguistics, 33, 431-483.
Sag, I. A., and Wasow, T. (1997). Syntactic theory: A formal introduction. Unpublished manuscript, Stanford University. (Partial draft of Sept 1997)
Warner, A. R. (1993a). English auxiliaries: Structure and history. Cambridge: Cambridge University Press.
Warner, A. R. (1993b). The grammar of English auxiliaries: An account in HPSG. York Research Papers in Linguistics, YLLS/RP 1993-4. York: Department of Language and Linguistic Science, University of York.
Wechsler, S. (1995). The semantic basis of argument structure. Stanford: CSLI Publications.
Williams, E. S. (1984). There-insertion. Linguistic Inquiry, 15, 131-153.
Zwicky, A. M., and Pullum, G. K. (1983). Cliticization vs. Inflection: English n't. Language, 59, 502-13.
THE DISCRETE NATURE OF SYNTACTIC CATEGORIES: AGAINST A PROTOTYPE-BASED ACCOUNT FREDERICK J. NEWMEYER Department of Linguistics University of Washington Seattle, Washington
Syntax and Semantics, Volume 32: The Nature and Function of Syntactic Categories. Copyright © 2000 by Academic Press. All rights of reproduction in any form reserved. 0092-4563/99 $30
1. PROTOTYPES, FUZZY CATEGORIES, AND GRAMMATICAL THEORY
1.1. Introduction
There are many diverse approaches to generative grammar, but what all current models share is an algebraic approach to the explanation of grammatical phenomena.1 That is, a derivation consists of the manipulation of discrete formal objects drawn from a universal vocabulary. Foremost among these objects are the syntactic categories: NP, V, S, and so on. The inventory of categories has changed over the years and differs from model to model. Likewise, their distribution has been constrained by proposals such as X-bar theory, feature subcategorization schemes, and the current (albeit controversial) distinction between lexical and functional categories. Nevertheless, what has remained constant, for the past two decades at least, is the idea that among the primitives of grammatical theory are discrete categories whose members have equal status as far as grammatical processes are concerned. That is, the theory does not regard one lexical item as being "more of a noun" than another, or restrict some process to apply only to the "best sorts" of NP.2 This classical notion of categories has been challenged in recent years by many
working within the frameworks of functional and cognitive linguistics (see especially Comrie, 1989; Croft, 1991; Cruse, 1992; Dixon, 1977; Heine, 1993; Hopper and Thompson, 1984, 1985; Langacker, 1987, 1991; Taylor, 1989; Thompson, 1988). In one alternative view, categories have a prototype structure, which entails the following two claims for linguistic theory:
(1) Categorial Prototypicality:
a. Grammatical categories have "best case" members and members that systematically depart from the "best case."
b. The optimal grammatical description of morphosyntactic processes involves reference to degree of categorial deviation from the "best case."
Representatives of both functional linguistics and cognitive linguistics have taken categorial prototypicality as fundamental to grammatical analysis, as the following quotes from Hopper and Thompson (leading advocates of the former) and Langacker (a developer of the latter) attest (I have added emphasis in both passages):
It is clear that the concept of prototypicality (the centrality vs. peripherality of instances which are assigned to the same category) has an important role to play in the study of grammar. Theories of language which work with underlying, idealized structures necessarily ignore very real differences, both crosslinguistic and intra-linguistic, among the various degrees of centrality with which one and the same grammatical category may be instantiated. (Hopper and Thompson, 1985:155)
How then will the theory achieve restrictiveness? Not by means of explicit prohibitions or categorical statements about what every language must have, but rather through a positive characterization of prototypicality and the factors that determine it.... The theory will thus incorporate substantive descriptions of the various kinds of linguistic structures with the status of prototypes. (Langacker, 1991:513-514)
These approaches attribute prototype structure to (virtually) all of the constructs of grammar, not just the syntactic categories (see, for example, the treatment of the notion "subject" along these lines in Bates and MacWhinney, 1982; Langendonck, 1986; Silverstein, 1976; and Van Oosten, 1986). However, this chapter will focus solely on the syntactic categories. Another position that challenges the classical approach to grammatical categories is that they have nondistinct boundaries:
(2) Fuzzy Categories: The boundaries between categories are nondistinct.
My impression is that the great majority of functionalists accept categorial prototypicality, and a sizable percentage accept fuzzy categories. Comrie (1989) and Taylor (1989), for example, are typical in that respect. However, Langacker
Syntactic Categories: Against Prototype
(1991), while accepting an internal prototype structure for categories, rejects the idea that the boundaries between them are nondistinct, arguing that syntactic categories can be defined by necessary and sufficient semantic conditions. Wierzbicka (1990) accepts this latter conception, but rejects prototypes. She writes:
In too many cases, these new ideas [about semantic prototypes] have been treated as an excuse for intellectual laziness and sloppiness. In my view, the notion of prototype has to prove its usefulness through semantic description, not through semantic theorizing. (p. 365)
And Heine (1993), on the basis of studies of grammaticalization, was led to accept fuzzy categories, but to reject categorial prototypicality. In his view, the internal structure of categories is based on the concept of "degree of family resemblance" rather than "degree of prototypicality."
The specific goal of this chapter is to defend the classical theory of categories. First, it will provide evidence against categorial prototypicality by rebutting (1b), namely the idea that descriptively adequate grammars need to make reference to the degree of prototypicality of the categories taking part in grammatical processes. To the extent that it is successful, it will thereby provide evidence against (1a) as well. Since grammatical behavior gives us the best clue as to the nature of grammatical structure, any refutation of (1b) ipso facto presents a strong challenge to (1a).
To be sure, it is possible to hold (1a), but to reject (1b). Such a view would entail the existence of judgments that categories have "best-case" and "less-than-best-case" members, without the degree of "best-casedness" actually entering into grammatical description. Does anybody hold such a position? It is not clear. George Lakoff seems to leave such a possibility open. He writes that "prototype effects ... are superficial phenomena which may have many sources" (1987:56) and stresses at length that the existence of such effects for a particular phenomenon should not be taken as prima facie evidence that the mind represents that phenomenon in a prototype structure (see in particular his discussion of prototype effects for even and odd numbers in chapter 9). On the other hand, his discussion of strictly grammatical phenomena suggests that he does attribute to grammatical categories a graded structure with inherent degrees of membership, and the degree of membership is relevant to syntactic description (see his discussion of "nouniness" on pages 63-64, discussed in section 3.4.3 below).
In any event, in this chapter I will be concerned only with theories advocating the conjunction of claims (1a) and (1b). That is, I will attend only to approaches in which descriptively adequate grammars are said to make reference (in whatever way) to graded categorial structure. Limitations of space force me to ignore a number of topics that are relevant to a full evaluation of all facets of prototype theory. In particular, I will not address the question of whether nonlinguistic cognitive categories have a prototype
structure. Much has appeared in the psychological literature on this topic, and a wide variety of opinions exist (see, for example, Armstrong, Gleitman, and Gleitman, 1983; Dryer, 1997; Fodor and Lepore, 1996; Kamp and Partee, 1995; Keil, 1989; Lakoff, 1987; Mervis and Rosch, 1981; Rosch and Lloyd, 1978; and Smith and Osherson, 1988). However, the evidence for or against a prototype structure for grammatical categories can, I feel, be evaluated without having to take into account what has been written about the structure of semantic, perceptual, and other cognitive categories. The question of whether grammatical categories have a prototype structure is, to a degree, independent of whether they can be defined notionally, that is, whether they can be defined by necessary and sufficient semantic conditions. The arguments put forward to support notional definitions of categories will be addressed and challenged in Newmeyer (1998). Second, I will argue against fuzzy categories. Nothing is to be gained, either in terms of descriptive or explanatory success, in positing categorial continua.
The remainder of section 1 provides historical background to a prototype-based approach to syntactic categories. Section 2 discusses how prototype theory has been applied in this regard and discusses the major consequences that have been claimed to follow from categories having a prototype structure. Section 3 takes on the evidence that has been adduced for (1b) on the basis of the claim that prototypical members of a category manifest more morphosyntactic complexity than nonprototypical members. I argue that the best account of the facts makes no reference, either overtly or covertly, to categorial prototypicality. Section 4 argues against fuzzy categories and is followed by a short conclusion (section 5).
1.2. Squishes and Their Legacy
Prototype theory was first proposed in Rosch (1971/1973) to account for the cognitive representation of concepts and was immediately applied to that purpose in linguistic semantics (see Lakoff, 1972, 1973).3 This work was accompanied by proposals for treating syntactic categories in nondiscrete terms, particularly in the work of J. R. Ross (see especially Ross, 1973a, 1973b, 1975). Ross attempted to motivate a number of "squishes," that is, continua both within and between categories, among which were the "Fake NP Squish," illustrated in (3), and the "Nouniness Squish," illustrated in (5). Consider first the Fake NP Squish:
(3) The Fake NP Squish (Ross, 1973a):
a. Animates
b. Events
c. Abstracts
d. Expletive it
e. Expletive there
f. Opaque idiom chunks
Progressing downward from (3a) to (3f) in the squish, each type of noun phrase was claimed to manifest a lower degree of noun phrase status than the type above it. Ross's measure of categorial goodness was the number of processes generally characteristic of the category that the NP type was able to undergo. Consider the possibility of reapplication of the rule of Raising. The "best" sort of NPs, animates,4 easily allow it (4a), "lesser" NPs, events, allow it only with difficulty (4b), while "poor" NPs, idiom chunks, do not allow it at all (4c):
(4) a. John is likely to be shown to have cheated.
b. ?The performance is likely to be shown to have begun late.
c. *No headway is likely to have been shown to have been made.
Ross proposed the "Nouniness Squish" to illustrate a continuum between categories. Progressing from the left end to the right end, the degree of sententiality seems to decrease and that of noun phrase-like behavior to increase:
(5) The "Nouniness Squish" (Ross, 1973b: 141): that clauses > for to clauses > embedded questions > Acc-ing complements > Poss-ing complements > action nominals > derived nominals > underived nominals
Ross put forward the strong claim that syntactic processes apply to discrete segments of the squish. For example, preposition deletion must apply before that and for-to complements (6a), may optionally apply before embedded questions (6b), and may not apply before more "nouny" elements (6c):
(6) a. I was surprised (*at) that you had measles.
b. I was surprised (at) how far you could throw the ball.
c. I was surprised *(at) Jim's victory.
Given the apparent fact that different processes apply to different (albeit contiguous) segments of the squish, Ross was led to reject the idea of rigid boundaries separating syntactic categories. In other words, Ross's approach involved hypothesizing both categorial prototypicality and fuzzy categories.
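The claim that a process applies to a contiguous segment of a squish can be made concrete with a small sketch. The Python below is only an illustration, not Ross's own formalization: the ordering follows (5), but the two boundary constants and all function names are hypothetical, read off the judgments in (6).

```python
# Ross's Nouniness Squish (5), ordered from most sentence-like to most noun-like.
NOUNINESS_SQUISH = [
    "that clause",
    "for-to clause",
    "embedded question",
    "Acc-ing complement",
    "Poss-ing complement",
    "action nominal",
    "derived nominal",
    "underived nominal",
]

# Hypothetical boundaries read off the judgments in (6); not Ross's notation.
LAST_OBLIGATORY = "for-to clause"
LAST_OPTIONAL = "embedded question"


def preposition_deletion(complement_type: str) -> str:
    """Classify preposition deletion before a given complement type."""
    position = NOUNINESS_SQUISH.index(complement_type)
    if position <= NOUNINESS_SQUISH.index(LAST_OBLIGATORY):
        return "obligatory"
    if position <= NOUNINESS_SQUISH.index(LAST_OPTIONAL):
        return "optional"
    return "blocked"


def segments() -> dict:
    """Group squish positions by the status of preposition deletion."""
    grouped = {}
    for position, complement_type in enumerate(NOUNINESS_SQUISH):
        status = preposition_deletion(complement_type)
        grouped.setdefault(status, []).append(position)
    return grouped


def is_contiguous(positions) -> bool:
    """True if the positions form an unbroken run along the squish."""
    return list(positions) == list(range(positions[0], positions[-1] + 1))


if __name__ == "__main__":
    for status, positions in segments().items():
        members = [NOUNINESS_SQUISH[p] for p in positions]
        print(status, members, is_contiguous(positions))
```

On this encoding each status occupies one unbroken stretch of the squish, which is the property Ross took to argue against rigid category boundaries.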
By the end of the 1970s, however, very few syntactic analyses were still being proposed that involved squishes. Ross's particular approach to categorial continua was problematic in a number of ways. For one thing, it did not seek to provide a more general explanation for why categories should have the structure that he attributed to them. Second, his formalization of the position occupied by an element in the squish, the assignment of a rating between 0 and 1, struck many linguists as arbitrary and unmotivated. No reasonable set of criteria, for example, was ever proposed to determine if an abstract NP merited a rating of, say, .5 or .6 on the noun phrase scale. Third, Ross dealt with sentences in isolation, abstracted away from their extralinguistic context. Since at
that time those linguists who were the most disillusioned with generative grammar were the most likely to take a more "sociolinguistic" approach to grammar, Ross's silence on the discourse properties of the sentences he dealt with seemed to them to be only a slight departure from business as usual. And finally, some doubts were raised about the robustness of the data upon which the squishes were based. Gazdar and Klein (1978) demonstrated that one of them (the "Clausematiness Squish" of Ross, 1975) did not exhibit statistically significant scalar properties that would not show up in an arbitrary matrix. But while Ross's particular approach was abandoned, the central core of his ideas about grammatical categories has lived on. In particular, many linguists continued to accept the idea that they have a prototype structure and/or have fuzzy boundaries. The 1980s saw the development of alternatives to generative grammar that have attempted to incorporate such ideas about categorial structure into grammatical theory. It is to these approaches that we now turn, beginning with an examination of prototypes within functional linguistics.
2. PROTOTYPE THEORY AND SYNTACTIC CATEGORIES
Among linguists who take a prototype approach to syntactic categories there is considerable disagreement as to how to define the prototypical semantic and pragmatic correlates of each category. Just to take the category "adjective," for example, we find proposals to characterize its prototypical members in terms of a set of concepts such as "dimension," "physical property," "color," and so on (Dixon, 1977); their "time-stability" (Givon, 1984); their role in description, as opposed to classification (Wierzbicka, 1986); and their discourse functions (which overlap with those of verbs and nouns respectively) of predicating a property of an existing discourse referent and introducing a new discourse referent (Thompson, 1988). This lack of consensus presents a bit of a dilemma for anyone who, like this author, would wish to evaluate the success of prototype theory for syntax without undertaking an exhaustive critique of all positions that have been put forward as claiming success in this matter. My solution will be to adopt for purposes of discussion what I feel is the best motivated, most elaborate, and most clearly explicated proposal for categorial prototypicality, namely that presented in Croft (1991). His proposals for the prototypical semantic and pragmatic properties of noun, adjective, and verb are summarized in Table 1. In other words, the prototypical noun has the pragmatic function of reference, it refers to an object with a valency of 0 (i.e., it is nonrelational), and it is stative, persistent, and nongradable. The prototypical verb has the pragmatic function of
TABLE 1 PROTOTYPICAL CORRELATIONS OF SYNTACTIC CATEGORIES
Syntactic category    Noun           Adjective       Verb
Semantic class        Object         Property        Action
Valency               0              1               ≥1
Stativity             state          state           process
Persistence           persistent     persistent      transitory
Gradability           nongradable    gradable        nongradable
Pragmatic function    Reference      Modification    Predication
From Croft, 1991:55,65.
predication, it refers to an action, it has a valency of 1 or greater, and is a transitory, nongradable process. The links between semantic class and pragmatic function are, of course, nonaccidental (see Croft, 1991:123), though I will not explore that matter here. Table 1 characterizes the most prototypical members of each category, but not their internal degrees of prototypicality. Most prototype theorists agree that definite human nouns are the most prototypical, with nonhuman animates less prototypical, followed by inanimates, abstract nouns, and dummy nouns such as it and there. As far as adjectives are concerned, Dixon (1977) finds that words for age, dimension, value, and color are likely to belong to the adjective class, however small it is, suggesting that adjectives with these properties make up the prototypical core of that category. Words for human propensities and physical properties are often encoded as nouns and verbs respectively, suggesting that their status as prototypical adjectives is lower than members of the first group. Finally, Croft notes that it is difficult to set up an elaborated prototypicality scale for verbs. However, there seems to be no disagreement on the point that causative agentive active verbs carrying out the pragmatic function of predication are the most prototypical, while nonactive verbs, including "pure" statives and psychological predicates are less so. It is important to stress that the approach of Croft (and of most other contemporary functional and cognitive linguists) differs in fundamental ways from that developed by Ross in the 1970s.5 Most importantly, it adds the typological dimension that was missing in Ross's squishes. Prototypes are not determined, as for Ross, by the behavior of particular categories with respect to one or more grammatical rules in a particular language. Rather, the prototypes for the syntactic categories are privileged points in cognitive space, their privileged position being
determined by typological grammatical patterns. Hence, no direct conclusion can be drawn from the hypothesized universal (cognitive) properties of some prototypical syntactic category about how that category will behave with respect to some particular grammatical process in some particular language. Indeed, it is consistent with Croft's approach that there may be languages in which the category Noun, say, shows no prototype effects at all.
Another difference has to do with the structure of categories themselves. Ross assumes that all nonprototypical members of a category can be arranged on a one-dimensional scale leading away from the prototype, that is, hierarchically. Croft, on the other hand, assumes a radial categorial structure (Lakoff, 1987). In such an approach, two nonprototypical members of a category need not be ranked with respect to each other in terms of degree of prototypicality. Croft's theory thus makes weaker claims than Ross's. One might even wonder how the notion "prototypicality" surfaces at all in grammatical description. Croft explains:
These [markedness, hierarchy, and prototype] patterns are universal, and are therefore part of the grammatical description of any language. Language-specific facts involve the degree to which typological universals are conventionalized in a particular language; e.g. what cut-off point in the animacy hierarchy is used to structurally and behaviorally mark direct objects. (Croft, 1990:154)
In other words, grammatical processes in individual languages are sensitive to the degree of deviation of the elements participating in them from the typologically established prototype.
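Croft's notion of a language-particular cut-off point on a universal hierarchy can be pictured with a short sketch. The Python below is a simplified illustration under stated assumptions: the three-step animacy scale is a reduced version of the hierarchy, and "Language A" and "Language B" are hypothetical languages, not data from Croft.

```python
# Simplified animacy hierarchy, from most to least animate.
# The three-way division is an assumption made for illustration.
ANIMACY_HIERARCHY = ["human", "nonhuman animate", "inanimate"]

# Hypothetical languages, each conventionalizing a different cut-off point:
# direct objects at or above the cut-off receive special marking.
CUTOFFS = {
    "Language A": "human",
    "Language B": "nonhuman animate",
}


def marks_direct_object(language: str, noun_class: str) -> bool:
    """True if the language marks direct objects of this noun class."""
    cutoff = ANIMACY_HIERARCHY.index(CUTOFFS[language])
    return ANIMACY_HIERARCHY.index(noun_class) <= cutoff


if __name__ == "__main__":
    for language in CUTOFFS:
        marked = [c for c in ANIMACY_HIERARCHY if marks_direct_object(language, c)]
        print(language, "marks:", marked)
```

The hierarchy itself is shared by both languages; only the stored cut-off differs, which is the division of labor between universal pattern and language-specific fact that the quoted passage describes.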
3. PROTOTYPICALITY AND PARADIGMATIC COMPLEXITY
The most frequently alluded to morphosyntactic manifestation of prototypicality is that it correlates with what might be called "paradigmatic complexity." That is, more prototypical elements are claimed to have a greater number of distinct forms in an inflectional paradigm than less prototypical elements or to occur in a larger range of construction types than less prototypical elements. In this section, I will challenge the idea that any correlation other than the most rough sort holds between paradigmatic complexity and prototypicality. My conclusion will serve to undermine the grammatical evidence for the idea that categories are nondiscrete. In section 3.1, I review the evidence that has been adduced for the correlation between categorial prototypicality and paradigmatic complexity. Section 3.2 outlines the various positions that could be, and have been, taken to instantiate this correlation in a grammatical description. Section 3.3 shows that for three well-studied phenomena, the postulated correlation is not robust, while section 3.4 presents alternative explanations for phenomena that have been claimed to support a strict correlation and hence nondiscrete categories.
3.1. Paradigmatic Complexity and Prototypes
Croft (1991:79-87) defends the idea that the prototypical member of a category manifests more distinct forms in an inflectional paradigm than the nonprototypical and occurs in a broader range of construction types. As he notes (p. 79), each major syntactic category is associated with a range of inflectional categories, though of course languages differ as to which they instantiate:
Nouns: number (countability), case, gender, size (augmentative, diminutive), shape (classifiers), definiteness (determination), alienability;
Adjectives: comparative, superlative, equative, intensive ("very Adj"), approximative ("more or less Adj" or "Adj-ish"), agreement with head;
Verbs: tense, aspect, mood, and modality, agreement with subject and object(s), transitivity.
Croft argues that there is systematicity to the possibility of a particular category's bearing a particular inflection. Specifically, if a nonprototypical member of that category in a particular language allows that inflection, then a prototypical member will as well. Crosslinguistically, participles and infinitives, two nonpredicating types of verbal elements, are severely restricted in their tense, aspect, and modality possibilities. (Nonprototypical) stative verbs have fewer inflectional possibilities than (prototypical) active verbs (e.g., they often cannot occur in the progressive). Predicate Ns and (to a lesser extent) predicate As are often restricted morphosyntactically. Predicate Ns in many languages do not take determiners; predicate As do not take the full range of adjectival suffixes, and so on.
The same can be said for mass nouns, incorporated nouns, and so on—that is, nouns that do not attribute reference to an object. Furthermore, nonprototypical members of a syntactic category seem to have a more restricted syntactic distribution than prototypical members. As Ross (1987: 309) remarks: "One way of recognizing prototypical elements is by the fact that they combine more freely and productively than do elements which are far removed from the prototypes." This point is amply illustrated by the Fake NP Squish (3). Animate nouns are more prototypical than event nouns, which are more prototypical than abstract nouns, which are more prototypical than idiom chunks. As degree of prototypicality declines, so does freedom of syntactic distribution. The same appears to hold true of verbs. In some languages, for example, only action verbs may occur in the passive construction.
3.2. Paradigmatic Complexity and Claims about Grammar-Prototype Interactions

The idea that inflectional and structural elaboration declines with decreasing categorial prototypicality has been interpreted in several different ways. Four positions can be identified that express this idea. In order of decreasing strength, they are "Direct Mapping Prototypicality," "Strong Cut-off Point Prototypicality," "Weak Cut-off Point Prototypicality," and "Correlation-only Prototypicality." I will now discuss them in turn.

According to Direct Mapping Prototypicality, morphosyntactic processes make direct reference to the degree of prototypicality of the elements partaking of those processes. In other words, part of our knowledge of our language is a Prototypicality Hierarchy and a grammar-internal mapping from that hierarchy to morphosyntax. Ross's squishes are examples of Direct Mapping Prototypicality. As I interpret his approach, the correlation in English between the position of a noun on the Prototypicality Hierarchy and its ability to allow the reapplication of Raising (see 4a-c) is to be expressed directly in the grammar of English.

In Strong Cut-off Point Prototypicality, the effects of prototype structure are encoded in the grammar of each language, but there is no language-particular linking between gradations in prototypicality and gradations in morphosyntactic behavior. To repeat Croft's characterization of this position:

These [markedness, hierarchy, and prototype] patterns are universal, and are therefore part of the grammatical description of any language. Language-specific facts involve the degree to which typological universals are conventionalized in a particular language; e.g., what cut-off point in the animacy hierarchy is used to structurally and behaviorally mark direct objects. (Croft, 1990:154)
One can think of Strong Cut-off Point Prototypicality as a constraint on possible grammars. For example, it would prohibit (i.e., predict impossible) a language in other respects like English, but in which the reapplication of Raising would be more possible with nonprototypical NPs than with prototypical ones. Weak Cut-off Point Prototypicality allows a certain number of arbitrary exceptions to prototype-governed grammatical behavior. Thus it would admit the possibility that the reapplication of Raising could apply to a less prototypical NP than to a more prototypical one, though such cases would be rather exceptional. I interpret the analyses of Hungarian definite objects in Moravcsik (1983) and English there-constructions in Lakoff (1987) as manifesting Weak Cut-off Point Prototypicality. The central, prototypical, cases manifest the phenomenon in question, and there is a nonrandom, yet at the same time unpredictable, linking between the central cases and the noncentral ones. Correlation-only Prototypicality is the weakest position of all. It simply states
that there is some nonrandom relationship between morphosyntactic behavior and degree of prototypicality.

3.3. On the Robustness of the Data Supporting Cut-off Point Prototypicality

In this section, I will demonstrate that for three well-studied phenomena, Cut-off Point Prototypicality, in both its strong and weak versions, is disconfirmed. At best, the data support Correlation-only Prototypicality.

3.3.1. THE ENGLISH PROGRESSIVE

English is quite poor in "choosy" inflections, but it does permit one test of the correlation between prototypicality and paradigmatic complexity. This is the marker of progressive aspect, -ing. Certainly it is true, as (7a-b) illustrates, that there is a general correlation of categorial prototypicality and the ability to allow progressive aspect (note that both verbs are predicating):

(7) a. Mary was throwing the ball.
    b. *Mary was containing 10 billion DNA molecules.

However, we find the progressive with surely nonprototypical temporary state and psychological predicate verbs (8a-b), but disallowed with presumably more prototypical achievement verbs (9):

(8) a. The portrait is hanging on the wall of the bedroom.
    b. I'm enjoying my sabbatical year.

(9) *I'm noticing a diesel fuel truck passing by my window.
Furthermore, we have "planned event progressives," where the possibility of progressive morphology is clearly unrelated to the prototypicality of the verb (cf. grammatical 10a and ungrammatical 10b):

(10) a. Tomorrow, the Mariners are playing the Yankees.
     b. *Tomorrow, the Mariners are playing well.

In short, the English progressive falsifies directly the idea that there is a cut-off point on the scale of prototypicality for verbs and that verbs on one side of the cut-off point allow that inflection, while those on the other side forbid it. Furthermore, the exceptions (i.e., the verbs of lesser prototypicality that allow the progressive) do not appear to be simple arbitrary exceptions. Therefore, the facts do not support Weak Cut-off Point Prototypicality either.
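The logic of the cut-off point prediction can be made concrete with a small sketch. It is only an illustration under stated assumptions: the numeric prototypicality ranks are invented for the example (only the relative ordering of action, achievement, and temporary state or psychological verbs follows the discussion above, and the low rank for contain is a further assumption), and the function name is hypothetical. A cut-off point exists only if no more prototypical verb rejects the progressive while a less prototypical verb allows it.

```python
# Each verb maps to (assumed prototypicality rank, allows progressive?).
# Ranks are illustrative assumptions; higher means more prototypical.
VERBS = {
    "throw":   (4, True),   # action verb, cf. (7a)
    "notice":  (3, False),  # achievement verb, cf. (9)
    "hang":    (2, True),   # temporary state verb, cf. (8a)
    "enjoy":   (2, True),   # psychological predicate, cf. (8b)
    "contain": (1, False),  # stative verb, cf. (7b); rank is an assumption
}


def cutoff_violations(data):
    """Return sorted (higher, lower) pairs in which the more prototypical
    verb rejects the progressive but the less prototypical verb allows it.
    An empty result means the data are compatible with a single cut-off."""
    violations = []
    for higher, (rank_h, ok_h) in data.items():
        for lower, (rank_l, ok_l) in data.items():
            if rank_h > rank_l and not ok_h and ok_l:
                violations.append((higher, lower))
    return sorted(violations)


if __name__ == "__main__":
    for higher, lower in cutoff_violations(VERBS):
        print(f"no cut-off: *{higher} (more prototypical) vs. {lower}")
```

Under these assumed ranks the check reports notice against both hang and enjoy, which is the pattern the section describes; with only throw and contain in the data it reports nothing.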
One could, of course, attempt to by-pass this conclusion simply by exempting English progressive inflection from exhibiting prototype effects in any profound way. One might, for example, appeal to some semantic or pragmatic principles that account for when one finds or does not find progressive morphology. Indeed, I have no doubt that such is the correct way to proceed (for discussion, see Goldsmith and Woisetschlaeger, 1982; Kearns, 1991; Smith, 1991; Zegarac, 1993; and Swart, 1998). But the point is that degree of verbal prototypicality fails utterly to account for when one finds progressive morphology in English. Therefore the facts lend no support to Croft's claim that the prototypical member of a category manifests more distinct forms in an inflectional paradigm than the nonprototypical member.6

3.3.2. ADJECTIVES

Dixon (1977:22-23) cites two languages that distinguish a certain subset of adjectives morphologically. Rotuman (Churchward, 1940) has an open-ended adjective class, but only (the translations of) the following 12 have distinct singular and plural forms: big; long; broad; whole, complete; black; small; short; narrow, thin; old; white; red; female. Acooli (Crazzolara, 1955) has a closed class of about 40 adjectives, 7 of which have distinct singular and plural forms: great, big, old (of persons); big, large (of volume); long, high, distant (of place and time); good, kind, nice, beautiful; small, little; short; bad, bad tasting, ugly. The remaining adjectives translate as new; old; black; white; red; deep; shallow; broad; narrow; hard; soft; heavy; light; wet; unripe; coarse; warm; cold; sour; wise.

These two languages, then, refute Strong Cut-off Point Prototypicality. While 11 of the 12 Rotuman adjectives fall into the prototypical adjective classes of age, dimension, value, and color (female would appear to be the exception), any number of adjectives in these classes do not have distinct singular and plural forms.
Since there is an open-ended number of adjectives in the language, and there is no reason to think that old is more prototypical than new or young, or female more prototypical than male, there is no cut-off point separating the prototypical from the nonprototypical. And in Acooli there are even more putatively prototypical adjectives in the class with only one form for number than in the class with two forms. Weak Cut-off Point Prototypicality does not fare much better. It is true that no nonprototypical adjectives (except for the word for female in Rotuman) have two number possibilities. But in this language the exceptions turn out to be the norm: 12 forms out of an open-ended number and 7 out of 40 do not provide very convincing support for what is put forward as a universal statement about prototypicality. Weak Cut-off Point Prototypicality is in even more serious trouble for Turkish. Croft (1991:128), citing work by Lewis (1967), mentions that only a subset of
Turkish adjectives allow reduplication for intensification. These include "basic color terms, 'quick,' 'new,' and 'long,' as well as less prototypical adjectives."7

3.3.3. ENGLISH VERBAL ALTERNATIONS

As we have noted, Strong Cut-off Point Prototypicality predicts that there should be no grammatical processes applying only to nonprototypical forms. Levin (1993) has provided us with the means to test this hypothesis with respect to English verbal alternations. Depending on how one counts, she describes from several dozen to almost one hundred such alternations. Significantly, many of these are restricted to nonprototypical stative and psychological predicates. The following are some of the alternations that illustrate this point:

Various subject alternations:
(11) a. The world saw the beginning of a new era in 1492.
     b. 1492 saw the beginning of a new era.
(12) a. We sleep five people in each room.
     b. Each room sleeps five people.
(13) a. The middle class will benefit from the new tax laws.
     b. The new tax laws will benefit the middle class.

There-insertion:
(14) a. A ship appeared on the horizon.
     b. There appeared a ship on the horizon.

Locative inversion:
(15) a. A flowering plant is on the window sill.
     b. On the window sill is a flowering plant.

3.4. Some Explanations for Prototypicality Effects

We have seen several examples that seem to falsify cut-off point prototypicality. And yet, there are undeniable correlations between prototypicality and the possibility of morphosyntactic elaboration. In general, actives do progressivize more easily than statives; it would seem that in general certain adjective classes allow more structural elaboration than others; and undeniably there are more verbal alternations in English involving active verbs (or an active and its corresponding stative) than statives alone. Any theory of language should be able to explain why this correlation exists.
This section will examine a number of English syntactic processes that manifest such correlations and have thereby been invoked to suggest that categories are
nondiscrete. For each case, I will argue that the facts fall out from a theory with discrete categories and independently needed principles.
3.4.1. MEASURE VERBS AND PASSIVE

An old problem is the fact that English measure verbs (cost, weigh, measure, etc.) do not passivize:

(16) a. The book cost a lot of money.
     b. John weighed 180 pounds.
(17) a. *A lot of money was cost by the book.
     b. *180 pounds was weighed by John.

The earliest work in generative grammar attempted to handle this fact by means of arbitrarily marking the verbs of this class not to undergo the rule (Lakoff, 1970). But there has always been the feeling that it is not an accident that such verbs are exceptional: there is something seemingly less "verb-like" about them than, say, an active transitive verb like hit or squeeze. Translated into prototype theory, one might suggest that measure verbs are "on the other side of the cut-off point" for passivization in English. And, in fact, such an analysis has been proposed recently in Ross (1995) (though Ross refers to "defectiveness" rather than to "lack of prototypicality"). I will now argue that these facts can be explained without recourse to prototypes.8

I believe that it was in Bresnan (1978) that attention was first called to the distinction between the following two sentences (see also Bach, 1980):

(18) a. The boys make good cakes.
     b. The boys make good cooks.

Notice that the NP following the verb in (18a) passivizes, while that following the verb in (18b) does not:

(19) a. Good cakes are made by the boys.
     b. *Good cooks are made by the boys.

Bresnan noted that the argument structures of the two sentences differ. Good cakes in (18a) is a direct object patient, while good cooks in (18b) is a predicate nominative. Given her theory that only direct objects can be "promoted" to subject position in passivization, the ungrammaticality of (19b) follows automatically. Turning to (16a-b), we find that, in crucial respects, the semantic relationship between subject, verb, and post-verbal NP parallels that of (18b).
"A lot of money" and "180 pounds" are predicate attributes of "the book" and "John" respectively. In a relationally based framework such as Bresnan's, the deviance of
(17a-b) has the same explanation as that of (19b). In a principles-and-parameters approach, a parallel treatment is available. Since "a lot of money" and "180 pounds" are predicates rather than arguments, there is no motivation for the NP to move, thereby accounting for the deviance of the passives.9 Crucially, there is no need for the grammar of English to refer to the degree of prototypicality either of the verb or of the NP that follows it.

3.4.2. THERE AS A NONPROTOTYPICAL NP

Recall Ross's Fake NP Squish, repeated below:

(3) The Fake NP Squish
    a. Animates
    b. Events
    c. Abstracts
    d. Expletive it
    e. Expletive there
    f. Opaque idiom chunks

Expletive there occupies a position very low on the squish. In other words, it seems to manifest low NP-like behavior. First, let us review why one would want to call it a NP at all. The reason is that it meets several central tests for NP status. It raises over the verb seem and others of its class (20a); it occurs as a passive subject (20b); it inverts over auxiliaries (20c); and it can be coindexed with tags (20d):

(20) a. There seems to be a problem.
     b. There was believed to be a problem.
     c. Is there a problem?
     d. There is a problem, isn't there?
The null hypothesis, then, is that there is an NP, with nothing more needing to be said. Now let us review the data that led Ross to conclude that rules applying to NPs have to be sensitive to their categorial prototypicality. He gives the following ways that there behaves like less than a full NP (the asterisks and question marks preceding each sentence are Ross's assignments): it doesn't undergo the rule of "promotion" (21a-b); it doesn't allow raising to reapply (22a-b); it doesn't occur in the "think of . . . as X" construction (23a-b) or the "what's . . . doing X" construction (24a-b); it doesn't allow "being-deletion" (25a-b); it doesn't occur in dislocation constructions (26a-b); it doesn't undergo tough-movement (27a-b), topicalization (28a-b), "swooping" (29a-b), "equi" (30a-b), or conjunction reduction (31a-b). In each case an example of a more prototypical NP is illustrated that manifests the process:
Promotion:
(21) a. Harpo's being willing to return surprised me. / Harpo surprised me by being willing to return.
     b. There being heat in the furnace surprised me. / *There surprised me by being heat in the furnace.

Double raising:
(22) a. John is likely _ to be shown _ to have cheated.
     b. ?*There is likely _ to be shown _ to be no way out of this shoe.

Think of . . . as NP:
(23) a. I thought of Freud as being wiggy.
     b. *I thought of there as being too much homework.

What's . . . doing X?:
(24) a. What's he doing in jail?
     b. *What's there doing being no mistrial?

Being deletion:
(25) a. Hinswood (being) in the tub is a funny thought.
     b. There *(being) no more Schlitz is a funny thought.

Left dislocation:
(26) a. Those guys, they're smuggling my armadillo to Helen.
     b. *There, there are three armadillos in the road.

Tough movement:
(27) a. John will be difficult to prove to be likely to win.
     b. *There will be difficult to prove likely to be enough to eat.

Topicalization:
(28) a. John, I don't consider very intelligent.
     b. *There, I don't consider to be enough booze in the eggnog.

Swooping:
(29) a. I gave Sandra my zwieback, and she didn't want any. / I gave Sandra, and she didn't want any, my zwieback.
     b. I find there to be no grounds for contempt proceedings, and there may have been previously. / *I find there, which may have been previously, to be no grounds for contempt proceedings.
Equi:
(30) a. After he laughed politely, Oliver wiped his mustache. / After laughing politely, Oliver wiped his mustache.
     b. After there is a confrontation, there's always some good old-time head-busting. / *After being a confrontation, there's always some good old-time head-busting.

Conjunction reduction:
(31) a. Manny wept and Sheila wept. / Manny and Sheila wept.
     b. There were diplodocuses, there are platypuses, and there may well also be diplatocodypuses. / *There were diplodocuses, are platypuses, and may well also be diplatocodypuses.

I wish to argue that all of these distinctions follow from the lexical semantics of there and the pragmatics of its use. What does expletive there mean? The tradition in generative grammar has been to call there a meaningless element, or else to identify it as an existential quantifier with no intrinsic sense. Cognitive linguists, on the other hand, following earlier work by Dwight Bolinger (1977), have posited lexical meaning for it. To Lakoff (1987), for example, expletive there designates conceptual space itself, rather than a location in it. To Langacker (1991:352), there designates an abstract setting construed as hosting some relationship.

In fact, we achieve the same results no matter which option we choose. Meaningless elements / abstract settings / conceptual spaces are not able to intrude into one's consciousness, thus explaining (21b) and (23b). (24b) is bad because abstract settings, and so on, cannot themselves "act"; rather they are the setting for action. Furthermore, such elements are not modifiable (29-30) nor able to occur as discourse topics (26-28). (25b) is ungenerable, given the uncontroversial requirement that there occur with a verb of existence. In my opinion and that of my consultants, (22b) and (31b) are fully acceptable. In short, the apparent lack of prototypical NP behavior of expletive there is a direct consequence of its meaning and the pragmatics of its use.
Nothing is gained by requiring that the rules that affect it pay attention to its degree of prototypicality. Syntax need no more be sensitive to prototypicality to explain the examples of (21-31) than we need a special syntactic principle to rule out (32): (32) The square circle elapsed the dragon. As a general point, the possibility of syntactic elaboration correlates with the diversity of pragmatic possibilities. Concrete nouns make, in general, better topics, better focuses, better new referents, better established referents, and so on than do abstract nouns. We can talk about actions in a wider variety of discourse contexts and for a greater variety of reasons than states. The syntactic
accommodation to this fact is a greater variety of sentence types in which objects and actions occur than abstract nouns and states. There is no reason to appeal to the prototypicality of the noun or the verb.

3.4.3. ENGLISH IDIOM CHUNKS

Notice that in the Fake NP Squish, idiom chunks occupy an even lower position than expletive there. Lakoff (1987:63-64), in his presentation of cognitive linguistics, endorsed the idea that their behavior is a direct consequence of their low prototypicality and even went so far as to claim that idiom chunk NPs can be ranked in prototypicality with respect to each other. Drawing on unpublished work by Ross (1981), he ranked four of them as follows, with one's toe the highest in prototypicality, and one's time the lowest:

(33) a. to stub one's toe
     b. to hold one's breath
     c. to lose one's way
     d. to take one's time
Lakoff (for the most part citing Ross's examples) argued that each idiom was more restricted in its syntactic behavior than the next higher in the hierarchy. For example, only to stub one's toe can be converted into a past participle-noun sequence:

(34) a. A stubbed toe can be very painful.
     b. *Held breath is usually fetid when released.
     c. *A lost way has been the cause of many a missed appointment.
     d. *Taken time might tend to irritate your boss.
To stub one's toe and to hold one's breath allow gapping in their conjuncts:

(35) a. I stubbed my toe, and she hers.
     b. I held my breath, and she hers.
     c. *I lost my way, and she hers.
     d. *I took my time, and she hers.
Pluralization possibilities distinguish to stub one's toe from to hold one's breath, and both of these from to lose one's way and to take one's time. When to stub one's toe has a conjoined subject, pluralization is obligatory; for to hold one's breath it is optional; and for the latter two it is impossible:

(36) a. Betty and Sue stubbed their toes.
     b. *Betty and Sue stubbed their toe.
     c. Betty and Sue held their breaths.
     d. Betty and Sue held their breath.
     e. *Betty and Sue lost their ways.
     f. Betty and Sue lost their way.
     g. *Betty and Sue took their times.
     h. Betty and Sue took their time.
Finally, Lakoff judges all but to take one's time to allow pronominalization:

(37) a. I stubbed my toe, but didn't hurt it.
     b. Sam held his breath for a few seconds, and then released it.
     c. Harry lost his way, but found it again.
     d. *Harry took his time, but wasted it.
Lakoff concludes: In each of these cases, the nounier nouns follow the general rule . . . while the less nouny nouns do not follow the rule. As the sentences indicate, there is a hierarchy of nouniness among the examples given. Rules differ as to how nouny a noun they require. (Lakoff, 1987:64).
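Lakoff's claim that "rules differ as to how nouny a noun they require" amounts to saying that each test draws a single cut-off somewhere along the ranking in (33). The sketch below, offered only as an illustration, encodes the judgments he reports in (34), (35), and (37) and computes where each cut-off would fall. The dictionary layout and the function name are assumptions made for the example, and the three-way pluralization pattern in (36) is left out because it is not a simple accept/reject test.

```python
# Lakoff's ranking (33), from most to least "nouny".
IDIOMS = ["toe", "breath", "way", "time"]

# Judgments as reported by Lakoff: True = acceptable, False = starred.
JUDGMENTS = {
    "participle":        {"toe": True, "breath": False, "way": False, "time": False},  # (34)
    "gapping":           {"toe": True, "breath": True,  "way": False, "time": False},  # (35)
    "pronominalization": {"toe": True, "breath": True,  "way": True,  "time": False},  # (37)
}


def cutoff(rule):
    """Return how many idioms, counting from the top of the ranking, pass
    the given test, or None if the judgments do not split the ranking
    into an acceptable top part and an unacceptable bottom part."""
    row = [JUDGMENTS[rule][idiom] for idiom in IDIOMS]
    passed = sum(row)
    expected = [True] * passed + [False] * (len(row) - passed)
    return passed if row == expected else None


if __name__ == "__main__":
    for rule in JUDGMENTS:
        print(rule, cutoff(rule))
```

On Lakoff's judgments every test yields a clean cut-off (after one, two, and three idioms respectively), which is what makes the hierarchy look compelling; whether the judgments themselves hold up, and whether nouniness is what explains them, is a separate question.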
In all cases but one, however, I have found an independent explanation for the range of judgments on these sentences. Beginning with the participle test, we find that (for whatever reason) held and taken never occur as participial modifiers, even in their literal senses:

(38) a. *Held cats often try to jump out of your arms.
     b. *The taken jewels were never returned.

I confess to not being able to explain (34c), since lost does occur in this position in a literal sense:

(39) A lost child is a pathetic sight.
Turning to Gapping, sentences (40a-d) show that the facts cited by Lakoff have nothing to do with the idioms themselves:10

(40) a. I lost my way, and she her way.
     b. I took my time, and she her time.
     c. ?I ate my ice cream and she hers.
     d. In the race to get to the airport, Mary and John lost their way, but we didn't lose ours (and so we won).
(40a-b) illustrates that the idioms lose one's way and take one's time do indeed allow gapping in their conjuncts. The correct generalization appears to lie in discourse factors. Gapping apparently requires a contrastive focus reading of the gapped constituent. Hence (40c) seems as bad as (35c-d), while (40d) is fine. In the examples involving plurals, what seems to be involved is the ability to individuate. We can do that easily with toes and less easily, but still possibly, with
breaths. But we cannot individuate ways and times. So, nonplural (41a) is impossible (legs demand individuation), while plural (41b) is impossible as well, for the opposite reason. Rice in its collective sense is not individuated:

(41) a. *Betty and Sue broke their leg.
     b. *My bowl is full of rices.

Finally, (42a) and (42b) illustrate that time in take one's time is not resistant to pronominalization. (42a) is much improved over (37d) and (42b) is impeccable:

(42) a. Harry took his time, and wasted it.
     b. Harry took his time, which doesn't mean that he didn't find a way to waste it.

Again, there is no reason whatever to stipulate that grammatical processes have to be sensitive to the degree of prototypicality of the NP. Independently needed principles (none of which themselves crucially incorporate prototypicality) explain the range of acceptability.

3.4.4. EVENT STRUCTURE AND INFLECTIONAL POSSIBILITIES

Let us further explore why there is in general a correlation between degree of prototypicality and inflectional possibilities. Since Vendler (1967) it has been customary to divide the aspectual properties of verbs (and the propositions of which they are predicates) into four event types, generally referred to as "states," "processes," "achievements," and "accomplishments." States (know, resemble) are not inherently bounded, have no natural goal or outcome, are not evaluated with respect to any other event, and are homogeneous. Processes (walk, run) are durative events with no inherent bound. Achievements (die, find, arrive) are momentary events of transition, while Accomplishments (build, destroy) are durative events with a natural goal or outcome.

There have been a number of proposals for the representation of event structure (Dowty, 1979; Grimshaw, 1990; Pustejovsky, 1995). The following are the proposals of Pustejovsky (1991):

(43) a. States: [diagram not reproduced]
     b. Processes: [diagram not reproduced]
     c. Achievements and Accomplishments have the same schematic structure (both are called "transitions"), though the former are nonagentive and the latter agentive: [diagram not reproduced]
Two observations are in order. The first is that there is a general increase in the complexity of event structure from states to accomplishments. The second is that this increase in complexity corresponds roughly to the degree of prototypicality for verbs. From these observations, we may derive the reason for the correlation between prototypicality and inflectional possibilities to hold for verbs as a general tendency. There is clearly a mapping between the event structure of a proposition and those aspects of morphosyntactic structure in which tense, aspect, modality, and so on are encoded. In standard varieties of principles-and-parameters syntax, again, this is the "functional structure" representation of the sentence. Now, the more complex the event structure of a proposition, the more aspectual possibilities that it allows. Hence, the more complex (in terms of number of projections) the functional structure can be. And, of course, it follows that the possibilities for inflection will be greater. In other words, we have derived a general correlation between degree of prototypicality and potential richness of verbal inflection without any reference to prototypicality per se.

It should be pointed out that this approach demands that functional projections exist only where they are motivated (i.e., that there can be no empty projections). Otherwise, there would be no difference in functional structure between states and accomplishments. In other words, it presupposes the principle of Minimal Projection, proposed and defended in Grimshaw (1993). According to this principle, a projection must be functionally interpreted, that is, it must make a contribution to the functional representation of the extended projection of which it is part.11

The correlation between semantic complexity and inflectional possibilities holds for nouns as well. Objects (e.g., people, books, automobiles, etc.)
can be individuated and specified in a way that abstract nouns such as liberty and dummy nouns like there cannot. So it follows that the semantic structure of concrete nouns will have the general potential to map onto more layers of nominal functional structure than that of abstract nouns. There are two problems, however, that are faced by an exclusively semantic account of restrictions on inflection. The first is that some languages have inflections that are restricted to some forms but not others, even though there appears to be no semantic explanation for the restriction. For example, Croft (1990:82) notes that process verbs in Quiche take tense-aspect inflectional prefixes, while stative verbs do not and writes: There is no apparent reason for this, since there is no semantic incompatibility between the inflectional prefixes ... and in fact in a language like English stative predicates do inflect for tense. It is simply a grammatical fact regarding the
expression of stative predicates in Quiche. As such, it provides very strong evidence for the markedness of stative predicates compared to process predicates.
Although I do not pretend to have a full explanation for cases such as these, I would venture to guess that pragmatic factors are overriding semantic ones. Both process verbs and stative verbs can logically manifest tense, aspect, and modality, though in many discourse contexts such distinctions are irrelevant for the latter. Thus pragmatic factors have kept the grammars of Quiche and languages manifesting similar phenomena from grammaticalizing tense, aspect, and modality for stative verbs. No appeal to prototypicality is necessary. A second problem with a semantic approach to inflection is that inflections that appear to be semantically empty also manifest prototype effects. Take agreement inflections, for example. As is noted in Croft (1988), where languages have a "choice" as to object agreement, it is always the more definite and/or animate (i.e., more prototypical) direct object that has the agreement marker. Does this fact support prototype theory? Not necessarily: In the same paper Croft argues that agreement has a pragmatic function, namely to index important or salient arguments. So, if agreement markers are not "pragmatically empty," then their presence can be related to their discourse function and need not be attributed to the inherently prototypical properties of the arguments to which they are affixed.
4. THE NONEXISTENCE OF FUZZY CATEGORIES

We turn now to the question of whether categories have distinct boundaries, or, alternatively, whether they grade one into the other in fuzzy squish-like fashion. I examine two phenomena that have been appealed to in support of fuzzy categories, English near (section 4.1) and the Nouniness Squish (section 4.2), and conclude that no argument for fuzzy categories can be derived from them.

4.1. English Near

Ross (1972) analyzes the English word near as something between an adjective and a preposition. Like an adjective, it takes a preposition before its object (44a) and like a preposition, it takes a bare object (44b):

(44) a. The shed is near to the barn.
     b. The shed is near the barn.

So it would appear, as Ross concluded, that there is a continuum between the categories Adjective and Preposition, and near is to be analyzed as occupying a position at the center of the continuum. A prototype theorist would undoubtedly
conclude that the intermediate position of near is a consequence of its having neither prototypical adjectival nor prototypical prepositional properties. In fact, near can be used either as an adjective or a preposition. Maling (1983) provides evidence for the former categorization. Like any transitive adjective, it takes a following preposition (45a); when that preposition is present (i.e., when near is adjectival), the degree modifier must follow it (45b); it takes a comparative suffix (45c); and we find it (somewhat archaically) in prenominal position (45d):

(45) a. The gas station is near to the supermarket.
     b. Near enough to the supermarket
     c. Nearer to the supermarket
     d. The near shore
Near passes tests for Preposition as well. It takes a bare object (46a); when it has a bare object it may not be followed by enough (46b), but may take the prepositional modifier right (46c):

(46) a. The gas station is near the supermarket.
     b. *The gas station is near enough the supermarket.12
     c. The gas station is right near (*to) the supermarket.

It is true that, as a Preposition, it uncharacteristically takes an inflected comparative:

(47) The gas station is nearer the supermarket than the bank.

But, as (48) shows, other prepositions occur in the comparative construction; it is only in its being inflected, then, that near distinguishes itself:13

(48) The seaplane right now is more over the lake than over the mountain.

Thus I conclude that near provides no evidence for categorial continua.

4.2. The Nouniness Squish

Recall Ross's Nouniness Squish (5), repeated below:

(5) The "Nouniness Squish": that clauses > for to clauses > embedded questions > Acc-ing complements > Poss-ing complements > action nominals > derived nominals > underived nominals

This squish grades nouns (or, more properly, the phrases that contain them) along a single dimension: their degree of nominality. Subordinate and relative clauses introduced by the complementizer that are held to be the least nominal; underived nominals (i.e., simple nouns) are held to be the most nominal. But, according to Ross, there is no fixed point at which the dominating phrase node
ceases to be S and starts to be NP; each successive element on the squish is held to be somewhat more nominal than the element to its left. As Ross is aware, demonstrating a fuzzy boundary between S and NP entails (minimally) showing that syntactic behavior gradually changes as one progresses along the squish; that is, that there is no place where S "stops" and NP "starts." We will now examine two purported instances of this graduality. As we will see, the facts are perfectly handlable in an approach that assumes membership in either S or NP.
First, Ross claims that "the nounier a complement is, the less accessible are the nodes it dominates to the nodes which command the complement" (Ross, 1973b: 174). That is, it should be harder to extract from a phrase headed by an underived N than from a that clause. He illustrates the workings of this principle with the data in (49) and concludes that "the dwindling Englishness of [these] sentences supports [this principle]" (p. 175):
(49) a. I wonder who he resented (it) that I went steady with.
b. I wonder who he would resent (it) for me to go steady with.
c. *I wonder who he resented how long I went steady with.
d. ?I wonder who he resented me going out with.
e. ??I wonder who he resented my going out with.
f. ?*I wonder who he resented my careless examining of.
g. ?*I wonder who he resented my careless examination of.
h. ?*I wonder who he resented the daughter of.
But one notes immediately that, even on Ross's terms, we do not find consistently "dwindling Englishness": (49c) is crashingly bad. Furthermore, he admits (in a footnote) that many speakers find (49h) fine. In fact, the data seem quite clear to me: (49a-b, d-e, h) are acceptable, and (49c, f-g) are not. The latter three sentences are straightforward barriers violations, given the approach of Chomsky (1986), while the others violate no principles of Universal Grammar. The degree of "nouniness" plays no role in the explanation of these sentences.
Second, Ross suggests that the phenomenon of pied piping (that is, wh-movement carrying along material dominating the fronted wh-phrase) is sensitive to degree of nouniness. He cites (50a–f) in support of this idea. It would appear, he claims, that the more nouny the dominating phrase, the more pied piping is possible:
(50) a. *Eloise, [for us to love [whom]] they liked, is an accomplished washboardiste.
b. *Eloise, [us loving [whom]] they liked, is an accomplished washboardiste.
c. *Eloise, [our loving [whom]] they liked, is an accomplished washboardiste.
d. ?*Eloise, [our loving of [whom]] they liked, is an accomplished washboardiste.
e. ?Eloise, [our love for [whom]] they liked, is an accomplished washboardiste.
f. Eloise, [a part of [whom]] they liked, is an accomplished washboardiste.
Again, there is no support for a categorial continuum in these data. For to clauses, Acc-ing complements, and Poss-ing complements are all dominated by the node S, which can never pied pipe. Hence (50a-c) are ungrammatical. (50d-e), on the other hand, are both fully grammatical, though this is masked by the stylistic awkwardness of the loving... liked sequence. By substituting was a joy to our parents for they liked, some of the awkwardness is eliminated and both sentences increase in acceptability.
5. CONCLUSION
The classical view of syntactic categories assumed in most models of generative grammar has seen two major challenges. In one, categories have a prototype structure, in which they have "best-case" members and members that systematically depart from the "best case." In this approach, the optimal grammatical description of morphosyntactic processes is held to involve reference to degree of categorial deviation from the "best case." The second challenge hypothesizes that the boundaries between categories are nondistinct, in the sense that one grades gradually into another. This chapter has defended the classical view, arguing that categories have discrete boundaries and are not organized around central "best cases." It has argued that many of the phenomena that seem to suggest the inadequacy of the classical view are best analyzed in terms of the interaction of independently needed principles from syntax, semantics, and pragmatics.
NOTES
1 An earlier version of this chapter was presented at Universidade Federal de Minas Gerais, Universidade de Campinas, Universidade Federal do Rio de Janeiro, the University of California at San Diego, and the University of Washington, as well as at two conferences: The International Conference on Syntactic Categories (University of Wales) and the Fifth International Pragmatics Conference (National Autonomous University of Mexico). It has benefited, I feel, from discussion with Paul K. Andersen, Leonard Babby, Robert Borsley, Ronnie Cann, William Croft, John Goldsmith, Jeanette Gundel, Ray Jackendoff, Jurgen Klausenburger, Rob Malouf, Pascual Masullo, Edith Moravcsik, Elizabeth Riddle, Margaret Winters, and Arnold Zwicky. I have no illusions, however, that any of these individuals would be entirely happy about the final product. For deeper discussion of many of the issues treated here, see Newmeyer (1998).
2 In most generative approaches, categories have an internal feature structure, which allows some pairs of categories to share more features than others and individual categories to be unspecified for particular features. Head-driven Phrase Structure Grammar (HPSG) goes further, employing default-inheritance mechanisms in the lexicon. These lead, in a sense, to some members of a category being "better" members of that category than others (see, for example, the HPSG treatment of auxiliaries in Warner (1993a, b)). A similar point can be made for the "preference rules" of Jackendoff and Lerdahl (1981) and Jackendoff (1983). Nevertheless, in these (still algebraic) accounts, the distance of an element from the default setting is not itself directly encoded in the statement of grammatical processes.
3 For more recent work on prototype theory and meaning representation, see Coleman and Kay (1981); Lakoff (1987); Geeraerts (1993); and many of the papers in Rudzka-Ostyn (1988) and Tsohatzidis (1990). For a general discussion of prototypes, specifically within the framework of cognitive linguistics, see Winters (1990).
4 Ross's work did not employ the vocabulary of the then nascent prototype theory. However, as observed in Taylor (1989:189), his reference to "copperclad, brass-bottomed NP's" (p. 98) to refer to those at the top of the squish leaves no doubt that he regarded them as the most "prototypical" in some fundamental sense.
5 I am indebted to William Croft (personal communication) for clarifying the differences between his approach and Ross's.
6 Along the same lines, Robert Borsley informs me (personal communication) that Welsh and Polish copulas provide evidence against the idea that prototypical members of a category necessarily have more inflected forms than nonprototypical members. One assumes that the copula is a nonprototypical verb, but in Welsh it has five (or six) tenses compared with three (or four) for a standard verb, and in Polish it has three tenses, compared with two for a standard verb. On the other hand, one might take the position expressed in Croft (1991) that copulas are categorially auxiliaries, rather than verbs.
7 And it should be pointed out that Dixon says that words in the semantic field of "speed" (e.g., quick) tend to lag behind the four most prototypical classes in their lexicalization as adjectives.
8 I would like to thank Pascual Masullo and Ray Jackendoff (personal communication) for discussing with me the alternatives to the prototype-based analysis.
9 Adger (1992, 1994) offers a treatment of measure verbs roughly along these lines. In his analysis, measure phrases, being "quasi-arguments" (i.e., not full arguments), do not raise to the specifier of Agreement, thereby explaining the impossibility of passivization. Indeed, thematic role-based analyses, going back at least to Jackendoff (1972), are quite parallel. For Jackendoff, measure phrases are "locations." They are unable to displace the underlying "theme" subjects of verbs such as cost or weigh, since "location" is higher on the thematic hierarchy than "theme." Calling them "locations," it seems to me, is simply another way of saying that they are not true arguments.
10 I am indebted to Ronnie Cann for pointing this out to me.
11 Grimshaw (1997) derives Minimal Projection from the principles of Economy of Movement and Obligatory Heads.
12 Maling (1983) judges sentences of this type acceptable, and on that basis rejects the idea that near is a P. I must say that I find (46b) impossible.
13 Presumably the inflectional possibilities of near are properties of its neutralized lexical entry, not of the ADJ or P branch of the entry.
REFERENCES
Adger, D. (1992). The licensing of quasi-arguments. In P. Ackema and M. Schoorlemmer (Eds.), Proceedings of ConSole I (pp. 1-18). Utrecht: Utrecht University.
Adger, D. (1994). Functional heads and interpretation. Unpublished Ph.D. thesis, University of Edinburgh.
Armstrong, S. L., Gleitman, L., and Gleitman, H. (1983). What some concepts might not be. Cognition, 13, 263-308.
Bach, E. (1980). In defense of passive. Linguistische Berichte, 70, 38-46.
Bates, E., and MacWhinney, B. (1982). Functionalist approaches to grammar. In E. Wanner and L. Gleitman (Eds.), Language acquisition: The state of the art (pp. 173-218). Cambridge: Cambridge University Press.
Bolinger, D. (1977). Meaning and form. English Language Series 11. London: Longman.
Bresnan, J. W. (1978). A realistic transformational grammar. In M. Halle, J. Bresnan, and G. Miller (Eds.), Linguistic theory and psychological reality (pp. 1-59). Cambridge, MA: MIT Press.
Chomsky, N. (1986). Barriers. Cambridge, MA: MIT Press.
Churchward, C. M. (1940). Rotuman grammar and dictionary. Sydney: Australasian Medical Publishing Co.
Coleman, L., and Kay, P. (1981). Prototype semantics. Language, 57, 26-44.
Comrie, B. (1989). Language universals and linguistic typology (2nd ed.). Chicago: University of Chicago Press.
Crazzolara, J. P. (1955). A study of the Acooli language. London: Oxford University Press.
Croft, W. (1988). Agreement vs. case marking and direct objects. In M. Barlow and C. A. Ferguson (Eds.), Agreement in natural language: Approaches, theories, descriptions (pp. 159-179). Stanford, CA: Center for the Study of Language and Information.
Croft, W. (1990). Typology and universals. Cambridge: Cambridge University Press.
Croft, W. (1991). Syntactic categories and grammatical relations. Chicago: University of Chicago Press.
Cruse, D. A. (1992). Cognitive linguistics and word meaning: Taylor on linguistic categorization. Journal of Linguistics, 28, 165-184.
Dixon, R. M. W. (1977). Where have all the adjectives gone? Studies in Language, 1, 1-80.
Dowty, D. R. (1979). Word meaning and Montague grammar. Dordrecht: Reidel.
Dryer, M. S. (1997). Are grammatical relations universal? In J. Bybee, J. Haiman, and S. A. Thompson (Eds.), Essays on language function and language type (pp. 115-143). Amsterdam: John Benjamins.
Fodor, J. A., and Lepore, E. (1996). The red herring and the pet fish: Why concepts still can't be prototypes. Cognition, 58, 253-270.
Gazdar, G., and Klein, E. (1978). Review of Formal semantics of natural language by E. L. Keenan (ed.). Language, 54, 661-667.
Geeraerts, D. (1993). Vagueness's puzzles, polysemy's vagaries. Cognitive Linguistics, 4, 223-272.
Givon, T. (1984). Syntax: A functional-typological introduction (vol. 1). Amsterdam: John Benjamins.
Goldsmith, J., and Woisetschlaeger, E. (1982). The logic of the English progressive. Linguistic Inquiry, 13, 79-89.
Grimshaw, J. (1990). Argument structure. Cambridge, MA: MIT Press.
Grimshaw, J. (1993). Minimal projection, heads, and optimality. Technical Report 4. Piscataway, NJ: Rutgers Center for Cognitive Science.
Grimshaw, J. (1997). Projection, heads, and optimality. Linguistic Inquiry, 28, 373-422.
Heine, B. (1993). Auxiliaries: Cognitive forces and grammaticalization. New York: Oxford University Press.
Hopper, P. J., and Thompson, S. A. (1984). The discourse basis for lexical categories in universal grammar. Language, 60, 703-752.
Hopper, P. J., and Thompson, S. A. (1985). The iconicity of the universal categories 'noun' and 'verb.' In J. Haiman (Ed.), Iconicity in syntax (pp. 151-186). Amsterdam: John Benjamins.
Jackendoff, R. (1972). Semantic interpretation in generative grammar. Cambridge, MA: MIT Press.
Jackendoff, R. (1983). Semantics and cognition. Cambridge, MA: MIT Press.
Jackendoff, R., and Lerdahl, F. (1981). Generative music theory and its relation to psychology. Journal of Music Theory, 25, 45-90.
Kamp, H., and Partee, B. H. (1995). Prototype theory and compositionality. Cognition, 57, 129-191.
Kearns, K. S. (1991). The semantics of the English progressive. Unpublished Ph.D. dissertation, MIT.
Keil, F. C. (1989). Concepts, kinds, and cognitive development. Cambridge, MA: Bradford Books.
Lakoff, G. (1970). Irregularity in syntax. New York: Holt, Rinehart, and Winston.
Lakoff, G. (1972). Hedges: A study in meaning criteria and the logic of fuzzy concepts. Chicago Linguistic Society, 8, 183-228.
Lakoff, G. (1973). Fuzzy grammar and the performance / competence terminology game. Chicago Linguistic Society, 9, 271-291.
Lakoff, G. (1987). Women, fire, and dangerous things: What categories reveal about the mind. Chicago: University of Chicago Press.
Langacker, R. W. (1987). Nouns and verbs. Language, 63, 53-94.
Langacker, R. W. (1991). Foundations of cognitive grammar: volume 2; descriptive application. Stanford, CA: Stanford University Press.
Langendonck, W. van (1986). Markedness, prototypes, and language acquisition. Cahiers de l'institut de linguistique de Louvain, 12, 39-76.
Levin, B. (1993). English verb classes and alternations: A preliminary investigation. Chicago: University of Chicago Press.
Lewis, G. L. (1967). Turkish grammar. Oxford: Oxford University Press.
Maling, J. (1983). Transitive adjectives: A case of categorial reanalysis. In F. Heny and B. Richards (Eds.), Linguistic categories: Auxiliaries and related puzzles 1: Categories (pp. 253-289). Dordrecht: Reidel.
Mervis, C. B., and Rosch, E. (1981). Categorization of natural objects. Annual Review of Psychology, 32, 89-115.
Moravcsik, E. A. (1983). On grammatical classes: the case of "definite" objects in Hungarian. Working Papers in Linguistics, 15, 75-107.
Newmeyer, F. J. (1998). Language form and language function. Cambridge, MA: MIT Press.
Pustejovsky, J. (1991). The syntax of event structure. Cognition, 41, 47-81.
Pustejovsky, J. (1995). The generative lexicon. Cambridge, MA: MIT Press.
Rosch, E. (1971/1973). On the internal structure of perceptual and semantic categories. In T. E. Moore (Ed.), Cognitive development and the acquisition of language (pp. 111-144). New York: Academic Press.
Rosch, E., and Lloyd, B. B. (Eds.) (1978). Cognition and categorization. Hillsdale, NJ: Erlbaum.
Ross, J. R. (1972). The category squish: Endstation Hauptwort. Chicago Linguistic Society, 8, 316-328.
Ross, J. R. (1973a). A fake NP squish. In C.-J. N. Bailey and R. Shuy (Eds.), New ways of analyzing variation in English (pp. 96-140). Washington: Georgetown.
Ross, J. R. (1973b). Nouniness. In O. Fujimura (Ed.), Three dimensions of linguistic theory (pp. 137-258). Tokyo: TEC Company, Ltd.
Ross, J. R. (1975). Clausematiness. In E. L. Keenan (Ed.), Formal semantics of natural language (pp. 422-475). London: Cambridge University Press.
Ross, J. R. (1981). Nominal decay. Unpublished ms., MIT.
Ross, J. R. (1987). Islands and syntactic prototypes. Chicago Linguistic Society, 23, 309-320.
Ross, J. R. (1995). Defective noun phrases. Chicago Linguistic Society, 31, 398-440.
Rudzka-Ostyn, B. (Ed.) (1988). Topics in cognitive linguistics. Amsterdam: John Benjamins.
Silverstein, M. (1976). Hierarchy of features and ergativity. In R. M. W. Dixon (Ed.), Grammatical categories in Australian languages (pp. 112-171). Canberra: Australian Institute of Aboriginal Studies.
Smith, C. (1991). The parameter of aspect. Dordrecht: Kluwer.
Smith, E. E., and Osherson, D. N. (1988). Conceptual combination with prototype concepts. In A. Collins and E. E. Smith (Eds.), Readings in cognitive science: A perspective from psychology and artificial intelligence (pp. 323-335). San Mateo, CA: M. Kaufman.
Swart, H. de (1998). Aspect shift and coercion. Natural Language and Linguistic Theory.
Taylor, J. R. (1989). Linguistic categorization: Prototypes in linguistic theory. Oxford: Clarendon.
Thompson, S. A. (1988). A discourse approach to the cross-linguistic category 'adjective.' In J. Hawkins (Ed.), Explaining language universals (pp. 167-185). Oxford: Basil Blackwell.
Tsohatzidis, S. L. (Ed.) (1990). Meanings and prototypes: Studies in linguistic categorization. London: Routledge.
Van Oosten, J. (1986). The nature of subjects, topics, and agents: A cognitive explanation. Bloomington, IN: Indiana University Linguistics Club.
Vendler, Z. (1967). Linguistics in philosophy. Ithaca, NY: Cornell University Press.
Warner, A. R. (1993a). English auxiliaries: Structure and history. Cambridge: Cambridge University Press.
Warner, A. R. (1993b). The grammar of English auxiliaries: An account in HPSG. York Research Papers in Linguistics Research Paper (YLLS/RP 1993-4), 1-42.
Wierzbicka, A. (1986). What's in a noun? (Or: How do nouns differ in meaning from adjectives?). Studies in Language, 10, 353-389.
Wierzbicka, A. (1990). 'Prototypes save': On the uses and abuses of the notion of 'prototype' in linguistics and related fields. In S. L. Tsohatzidis (Ed.), Meanings and prototypes: Studies in linguistic categorization (pp. 347-367). London: Routledge.
Winters, M. E. (1990). Toward a theory of syntactic prototypes. In S. L. Tsohatzidis (Ed.), Meanings and prototypes: Studies in linguistic categorization (pp. 285-306). London: Routledge.
Zegarac, V. (1993). Some observations on the pragmatics of the progressive. Lingua, 90, 201-220.
SYNTACTIC COMPUTATION AS LABELED DEDUCTION: WH A CASE STUDY
RUTH KEMPSON*
WILFRIED MEYER VIOL†
DOV GABBAY†
*Department of Philosophy, Kings College London, University of London, London, United Kingdom
†Department of Computing, Kings College London, University of London, London, United Kingdom
1. THE QUESTION
Over the past 30 years, long-distance dependence has become one of the best-studied phenomena in linguistics. Requiring as it does correlation between some position in a string and the c-commanding operator which determines its interpretation, it is uncontroversially assumed across different theoretical frameworks to involve an operator-variable binding phenomenon as in standard predicate logics (cf. Chomsky, 1981; Morrill, 1994; Pollard and Sag, 1991; Johnson and Lappin, 1997). However, it is known to display a number of properties which distinguish it from the logical operation of quantifier variable binding, and these discrepancies are taken to be indicative of the syntactic idiosyncrasy of
natural language formalisms. Investigation of these properties has led to the postulation of increasing numbers of discrete phenomena. There has been little attempt until recently to ask why the overall cluster of wh-processes exists (for recent partial attempts, cf. Cheng, 1991; Muller and Sternefeld, 1996).1 The primary purpose of this chapter is to propose an answer to this question. Having set out an array of largely familiar data in section 1, in section 2 we develop the LDSNL framework, within which the analysis is set. This is a formal deductive framework being established as a model of the process of utterance interpretation. Then in section 3 we present a unified account of the crossover phenomenon, and in sections 4-5 we briefly indicate analyses of wh-in situ, multiple wh-questions, and partial wh-movement phenomena, showing how a typology of wh-variation emerges. In all cases, the solution will make explicit reference to the discrete stages whereby interpretation is incrementally built up in moving on a left-to-right basis from the initial empty state to the completed specification of a logical form corresponding to the interpretation of the string in context. The account is thus essentially procedural, in the sense of focusing not just on properties of some resulting interpretation, but on how it is established stepwise. In closing we reflect on the direction which this conclusion suggests: that the boundaries between syntax, semantics, and pragmatics need to be redrawn, with syntax redefined as the dynamic projection of structure within an abstract parsing schema.
1.1. Failure to Display Scopal Properties in Parallel with Quantifying Expressions
As is well known, wh-expressions fail to display scopal properties in parallel with quantifying expressions. An initial wh-expression may take narrow scope with respect to any operator following it as long as that operator precedes the position of the gap. Hence (1) allows answers in which the wh-expression has been construed as having scope relative to the expression every British farmer in the subordinate clause.
(1) What is the Union insisting that every British farmer should get rid of?
Answer: At least 1000 cattle.
Answer: His cattle.
On the assumption that scope is displayed in the syntactic structure assigned to the string, questions such as these appear to require an LF specification which displays the relative scope of the two expressions in contravention of the structure associated with the surface string. This phenomenon is quite unlike quantifiers in logical systems. A given quantifier may bind free variables if and only if these variables are within its scope, where this is defined by the rule of syntax that introduces that quantifier, hence by definition guaranteeing a configuration equivalent to c-command. Furthermore, other natural language quantifiers behave
much more like logical quantifiers, and in the main must be interpreted internally to the clause in which they are contained.2 Thus (2)-(3) are unambiguous. Neither can be interpreted with the quantified expression in the subordinate clause taking scope over the matrix subject:
(2) Every British farmer is complaining that most countries of the EU fail to appreciate the problem.
≠ 'For most countries of the EU x, every British farmer is complaining that x fails to appreciate the problem.'
(3) Most countries of the EU are responding that every British farmer fails to appreciate the seriousness of the problem.
≠ 'Of every British farmer y, most countries of the EU are responding that y fails to appreciate the seriousness of the problem.'
This phenomenon can be analyzed by defining wh-expressions to be a complex higher-type quantifier simultaneously binding two positions, one of which is an invisible pronominal element (Chierchia, 1992), but this technical solution fails to provide any basis for explaining other phenomena associated with wh-expressions. Crossover phenomena in particular, though an essential piece of supporting evidence for this analysis, become a mere syntactic stipulation.
1.2. Crossover
Pretheoretically, the crossover phenomenon is simply the interaction between wh and anaphora construal. Within the Government-Binding (GB) paradigm, this has been seen as dividing into at least three discrete phenomena (Chomsky, 1981; Lasnik and Stowell, 1991; Postal, 1993). The data are as follows:
(4) *Who_i does Joan think that he_i worries e_i is sick?
(5) *Who_i does Joan think that his_i mother worries e_i is sick?
(6) *Whose_i exam results_j was he_i certain e_j would be better than anyone else's?
(7) Who_i does Joan think e_i worries his_i mother is sick?
(8) Who_i does Joan think e_i worries that he_i is sick?
(9) Whose_i exam results_j e_j were so striking that he_i was suspected of cheating?
(10) *John_i, who Sue thinks that he_i worries e_i is sick unnecessarily, was at the lecture.
(11) John, who_i his_i mother had ignored e_i, fell ill during the exam period.
(12) John, whose_i exam results he_i had been certain e_i would be better than anyone else's, failed dismally.
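As a reading aid, the linear configurations in (4)-(12) can be tabulated mechanically. The following is a minimal sketch, not part of the authors' analysis: it records, for each example, only whether the coindexed pronoun precedes the gap and whether the example is starred, and then lists the examples whose judgment is not predicted by linear order alone. The class name `CrossoverExample`, the field names, and the function `order_mismatches` are hypothetical, introduced purely for illustration; structural notions such as c-command are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrossoverExample:
    number: int               # example number as given in the text
    pronoun_before_gap: bool  # does the coindexed pronoun precede the gap e?
    starred: bool             # judgment reported in the text (* = True)


# Hand-coded from examples (4)-(12); an illustrative simplification only.
EXAMPLES = [
    CrossoverExample(4, True, True),
    CrossoverExample(5, True, True),
    CrossoverExample(6, True, True),
    CrossoverExample(7, False, False),
    CrossoverExample(8, False, False),
    CrossoverExample(9, False, False),
    CrossoverExample(10, True, True),
    CrossoverExample(11, True, False),
    CrossoverExample(12, True, False),
]


def order_mismatches(examples):
    """Return the numbers of examples whose judgment differs from what a
    naive 'pronoun before gap means ill-formed' heuristic would predict."""
    return [ex.number for ex in examples if ex.pronoun_before_gap != ex.starred]


print(order_mismatches(EXAMPLES))
```

Under this deliberately crude heuristic, only the relative-clause examples (11) and (12) come out as exceptions, which is one way of seeing why a purely linear statement cannot cover the full data set.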
The need to distinguish discrete subclasses of phenomena arises from the analysis of the gap as a name, subject to Principle C of the A-binding principles (Chomsky, 1981). A strong crossover principle is said to preclude a gap (as a name) being coindexed with any c-commanding argument expression, hence precluding (4), (10), and possibly (6), while licensing (7)-(9) on the grounds that the relation between gap and wh-operator is a relation of A'-binding and not of A-binding. ((6) has been dubbed "extended strong crossover" because the wh-expression, being a possessive determiner, doesn't, strictly speaking, bind the gap, but a subexpression within it.) Such a restriction, however, fails to preclude (5) and (6), for which a separate restriction of weak crossover is set up. There are several versions of this principle (Higginbotham, 1981; Koopman and Sportiche, 1982; Chomsky, 1981; Lasnik and Stowell, 1991); the simplest is that a pronoun which does not c-command a given trace may nevertheless not be coindexed with it if it is to the left of the trace (and right of the binding operator). This restriction in turn, however, fails to predict that in some circumstances it may be suspended, as in (11)-(12), and an alternative analysis, in which the traces in this position are not names but pronominal-like "epithet" expressions, is advocated. The phenomenon is thereby seen as a cluster of heterogeneous data, not amenable to a unified analysis. No explanation is proffered for why the data should be as they are, and Postal (1993) describes the phenomenon as a mystery.
1.3. Subjacency Effects and the Wh-initial versus Wh-in Situ Asymmetry
There are also the familiar island restrictions associated with wh-initial expressions, which are alien to quantification in formal systems, and so, like the other data, differentiate long-distance dependency effects from regular operator-variable binding. However, more striking is that wh-in situ expressions, despite commonly being said to be subject to the same movement as wh-initial expressions but at the level of LF (Reinhart, 1991; Aoun and Li, 1991; Huang, 1982), are characteristically not subject to these same restrictions. So, unlike (13), (14) allows an interpretation in which the wh-expression is construed, so to speak, externally to the domain within which the wh-expression is situated:
(13) *Which document did the journalist that leaked to the press apologize?
(14) The journalist that leaked which document to the press became famous overnight?
The phenomenon of wh-in situ is arguably peripheral in English, but in languages where this is the standard form of wh-question, the distribution of wh-in situ, unless independently restricted (cf. the data of Iraqi Arabic below), is characteristically not subject to the same constraints as wh-movement (Chinese, Japanese, Malay) (data from Simpson, 1995):
(15) Ni bijiao xihuan [[ta zenmeyang zhu] de cai]? CHINESE
You more like how cook REL food
'What is the means x such that you prefer the dishes which he cooks by x?'
1.4. Multiple Wh-Structures
Paired with this phenomenon are multiple wh-questions, of which the initial wh-expression is subject to island restrictions, but the wh-expression in situ is not:
(16) Who do you think should review which book?
(17) *Who_i did the journalist leak the document in which Sue had criticized e_i to which press?
(18) Who reported the journalist that leaked which document to the press?
1.5. Partial Wh-Movement
Of the set of data we shall consider, there is finally the phenomenon in German dubbed "partial wh-movement," in which apparently expletive wh-elements anticipate full wh-expressions later in the string, but are not themselves binders of any gapped position.
(19) Was glaubst du was Hans meint mit wem Jakob gesprochen hat?
'With whom do you think Hans thought/said Jakob had spoken?'
Such expletive elements must invariably take the form was in all complementizer positions between the initial position and the wh-expression they anticipate, but subsequent to that full wh-expression, the complementizer selected must be dass. This gives rise to a number of discrete forms, with identical interpretation:
(20) Was glaubst du mit wem Hans meint dass Jakob gesprochen hat?
(21) Mit wem glaubst du dass Hans meint dass Jakob gesprochen hat?
'With whom do you think Hans thought/said Jakob had spoken?'
This phenomenon, with minor variations, is widespread in languages in which the primary structure is the wh-in situ form. Iraqi Arabic, for example, has a reduced wh-expression which is affixed to the verb, indicating the presence of a wh-expression in a subordinate clause. However, unlike German, the subordinate clause contains the full wh-expression in situ. Also unlike German, this affix sh-, a reduced form of sheno (= 'what'), must precede the verb in each clause between the initial clause carrying the first instance of the expletive and the clause within which the full wh-expression itself occurs. Without sh-, the presence of the wh-in situ in a tensed clause is ungrammatical (data from Simpson, 1995):
(22) Mona raadat [tijbir Su'ad tisa'ad meno]?
Mona wanted to force Suad to help who
'Who did Mona want to force Suad to help?'
(23) *Mona tsawwarat [Ali ishtara sheno]?
Mona thought Ali bought what
(Intended: 'What did Mona think that Ali bought?')
(24) Sheno_i tsawwarit Mona [Ali ishtara e_i]?
What thought Mona Ali bought
'What did Mona think Ali bought?'
(25) sh-'tsawwarit Mona [Ali raah weyn]?
Q-thought Mona Ali went where
'Where did Mona think that Ali went?'
(26) sh-'tsawwarit Mona [Ali ishtara sheno]?
Q-thought Mona Ali bought
'What did Mona think Ali bought?'
These phenomena have only recently been subject to serious study, and their analysis in all frameworks remains controversial (McDaniel, 1989; Dayal, 1994; Simpson, 1995; Johnson and Lappin, 1997; Muller and Sternefeld, 1996). Faced with this apparent heterogeneity, it is perhaps not surprising that these phenomena are generally taken in isolation from each other, requiring additional principles. Of those who provide a general account, Johnson and Lappin (1997) articulate an account within the Head-driven Phrase Structure Grammar (HPSG) framework which involves three distinct operators: a binding operator for wh-expressions, a discrete operator for wh-in situ, and yet a further operator to express the expletive phenomena. The primary task in the various theoretical paradigms seems to have been that of advocating sufficient richness within independently motivated frameworks to be able to describe the data. Little or no attention has been paid to why wh-expressions display this puzzling array of data.
2. THE PROPOSED ANSWER
The answer we propose demands a different, and more dynamic, perspective. Linguistic expressions will be seen to project not merely some logical form mirroring the semantic content assigned to a string, but also the set of steps involved in monotonically building up that structure. This dynamic projection of structure is set within a framework for modeling the process of utterance interpretation, which is defined as a left-to-right goal-directed task of constructing a propositional formula. Two concepts of content are assumed within the framework, both reflected in lexical specifications: the content associated with the logical form which results from the output of the structure-building process, and the content associated with the process itself. The process is modeled as the growth of a tree representing the logical form of some interpretation. The initial state of the growth process is merely the imposition of the goal to establish some propositional formula as interpretation. The output state is an annotated tree structure whose root node is annotated with a well-formed propositional formula compiled from annotations to constituent nodes in the tree. The emphasis at all stages is on the partial nature of the information made available at any one point: a primary commitment is to provide a representational account of the often observed asymmetry between the content encoded in some given linguistic input and its interpretation in context (cf. Sperber and Wilson, 1986, whose insights about context dependence this framework is designed to reflect).
Wh-initial expressions will be seen as displaying such asymmetry. As clause-initial expressions they do not from that position project a uniquely determined position in the emergent tree structure, and this poses a problem to be resolved during the interpretation process as it proceeds from left to right through a string. Wh-in situ constructions are the mirror image of wh-initial constructions, their tree relation with their sister nodes being uniquely determined. Finally, partial movement constructions will emerge as a direct consequence of combining this underspecification analysis of the characterization of wh with the dynamics of the goal-directed parsing task. The consequence of this shift to a more dynamic syntactic perspective is a much closer relation between formal properties of grammars and parsers, a consequence which we shall reflect on briefly in closing.
2.1. The Framework: A Labeled Deductive System for Natural Language (LDSNL)
The general framework within which the analysis is set is a model of the process of natural language interpretation, where the goal is to model the pragmatic process of incrementally building up propositional structure from only partially specified input. The underlying aim is to model the process of understanding, reflecting at each step the partial nature of the information encoded in the string and the ways in which this information is enriched by choice mechanisms which fix some particular interpretation. The process is driven by a mixed deductive system: type-deduction is used to project intraclausal structure (much as in Categorial Grammar; cf. Moortgat, 1988; Morrill, 1994; Oehrle, 1995), but there is in addition inference defined over databases as units for projecting interclausal (adjunct) structure (cf. Joshi and Kulick, 1997, for a simple composite type-deduction system). The background methodology assumed is that of Labeled Deductive Systems (LDS) (Gabbay, 1996). According to this methodology, mixed
logical systems can be defined, allowing systematically related phenomena to be defined together while keeping their discrete identity. A simple example is the correlation between the functional calculus and conditional logic (known as the Curry-Howard isomorphism), with functional application corresponding to Modus Ponens and lambda abstraction corresponding to Conditional Introduction. Thus we might define Modus Ponens for labeled formulae as:
(27) Modus Ponens for labeled formulae
In the system we adopt here, intraclausal structure is built up by steps of type deduction, much as in Categorial grammar, but, since the primary task is that of building up a propositional structure, we define the formula to be the expression being established, and the labels to be the set of specifications/constraints which drive that process. (28) provides the simplest type of example, displaying how type deduction (and its twinned operation of function application) drives the process of projecting a representation of propositional content which duly reflects the internal mode of combination:
The interpretation process is formalized in an LDS framework in which labels guide the parser in the goal-directed process of constructing labeled formulas, in a language in which these are defined together. Declarative units consist of pairs of sequences of labels followed by a content formula. The formula of a declarative unit is the side representing the content of the words supplied in the course of a parse. The labels annotate this content with linguistic features and control information guiding the direction of the parse process. Since the aim is to model the incremental way information is built up through a sequence of linguistic expressions, we shall need a vocabulary that enables us to describe how a label-formula constellation is progressively built up. With this in mind, declarative units are represented as finite sets of formulas (cf. section 2.2.1):
In the course of a parse these feature sets grow incrementally. The dynamics of the parse is given by a sequence of parse states, each of which is a partial description of a tree. The task is goal-driven, the goal to establish a formula of type t using information as incrementally provided on a left-right basis by a given input string. At each state of the parse, there is one node under
development which constitutes the current task state. Each such task state has a database and a header. The database indicates the information established at that node so far. The header indicates the overall goal of that task—SHOW X for some type X; the tree node of the particular task being built up; the subgoal of the given task—what remains TO DO in the current task; and a specification of which task state it is. (29) displays the general format:
(⟨d⟩P = "P holds at a daughter of me") In the initial state, the goal SHOW and the subgoal TO DO coincide: establish a formula of type t, for node m (the putative top node). In the final state, the goal of node m is fulfilled, and a propositional formula established. Each subtask set in fulfilling that task is assigned a task number, and described according to a
tree-node vocabulary which enables trees to be defined in terms of properties holding at their nodes, and relations between them. Successive steps of introduction rules introduce subtasks, which once completed combine in steps of elimination to get back to the initial task and its successful completion. Thus, for example, the opening sequence of states given presentation of a subject NP introduces the subtasks, TODO, of building first a formula of type e, and then a formula of type e → t (enabling a formula of type t to be derived). Correspondingly, the last action in the sequence of parse states is a step at which, both these subtasks having been completed, a step of Modus Ponens applying to the completed tasks establishes the goal of deducing a formula of type t at the root node m which constitutes the initial task state. The result is a sequence of task states each completed, with no subtasks TODO remaining outstanding. Notice that these progressively completed task states reflect the anticipation of structure corresponding to the semantic interpretation of the string, and they are not a semantically blind assignment of syntactic structure. As the prototype sketch in (29) suggests, the system is an inferential building of a feature-annotated tree structure. One of its distinguishing properties is that this articulation of the process of building a tree structure is itself the syntactic engine, as driven by lexical specifications. There is no externally defined syntactic mechanism over and above this. Unlike other syntactic models, the system combines object-level information pertaining to the structure and interpretation of the resulting formula with metalevel information about the process of establishing it. So there is DECLARATIVE structure, which indicates what the content is (type plus formula plus tree node), and there is IMPERATIVE structure, which indicates what remains to be done.
2.2. The Logical Language
To express this degree of richness, we need a formal language with a number of basic predicates (a formula predicate, a label predicate, a tree-node predicate), each a monadic predicate of the form P(a) with values a. A composite language combines these various formulae in defining the structure within which the annotated tree node is built up.
2.2.1. THE FORMULA PREDICATE
The values of the formula predicate Fo are expressions of an extended quantifier-free lambda calculus LC. Terms are predicate constants sing, see, smile, and so on and a range of lambda expressions; individual constants John, Mary, and so on.3 The quantifier-variable notation of predicate logic is replaced by epsilon (equivalent to ∃) and tau (equivalent to ∀) terms, each with a restrictive clause nested within the term itself. For example, a man is projected as εx(x, man(x)). In
addition, there is a range of specialized metavariables, "m-variables." These are annotated to indicate the expression from which they are projected: e.g., wh (to be read as 'gap'), upro. In all cases such expressions are taken as placeholders of the appropriate kind, and operations map these expressions onto some expression of the formula language which replaces them.4
2.2.2. THE LABEL PREDICATES
Labels present all information that drives the combinatorial process. These include:
(i) The Type predicate, with logical types as value represented as type-logical formulae e, t, e → t, e → (e → t), . . . , corresponding to the syntactic categories DP, IP, intransitive verb, transitive verb, and so on. These are displayed as: Ty(e), Ty(e → t), etc. We may also allow Ty(cn) as a type to distinguish nouns from intransitive verbs.
(ii) The Tree node predicate, with values identifying the tree position of the declarative unit under construction, from which its combinatorial role is determined (see below).
(iii) Additional features as needed. We shall, for example, distinguish discrete sentence types such as +Q associated with questions. We might also add a further range of features such as case or tense features, for example defining tense as a label to a formula of type t (following Gabbay, 1994). All issues of case and tense we leave to a later occasion, here allowing the set of label-types to be open-ended, assuming some additional syntactic feature +Tense (cf. 5.3.1).
Boolean combinations of such atomic formulae are then defined in the standard way.
2.2.3. MODAL OPERATORS FOR DESCRIBING TREE RELATIONS
Relations between nodes in a tree are described by a tree-node logic, LOFT (Logic of Finite Trees), a propositional modal language with four modalities (Blackburn and Meyer-Viol, 1994):
⟨u⟩P   P holds at my mother
⟨d⟩P   P holds at a daughter of current node
⟨l⟩P   P holds of a left-sister of current node
⟨r⟩P   P holds of a right-sister of current node
In addition to the operator ⟨x⟩, x ranging over {u, d, l, r}, its dual [x] is defined:
We extend this language with an additional operator ⟨L⟩, which describes a link relation holding between an arbitrary node of a tree and the root node of an independent tree. This relation enables us to express a relation between pairs of trees, which we shall use to characterize adjunction. This modal logic allows nodes of a tree to be defined in terms of the relations that hold between them. For purposes of this chapter, the system can be displayed by example. In the following tree, for example, from the standpoint of node n, where Ty(t) holds, ⟨d⟩(Fo(John) & Ty(e)) & ⟨d⟩Ty(e → t) hold; from the standpoint of n', where Fo(John) & Ty(e) holds, ⟨u⟩Ty(t) and ⟨r⟩Ty(e → t) hold; and from the standpoint of n'', where Ty(e → t) holds, ⟨u⟩Ty(t) and ⟨l⟩(Fo(John) & Ty(e)) hold:
The language can have constants which may be defined as required (e.g., [u]⊥ for the root node: "nothing is above me"). Note the use of the falsum. Also manipulated are Kleene star operators for defining the reflexive transitive closure of the basic relations:
For example:
⟨d⟩*X   Some property X holds either here or at a node somewhere below here.
⟨u⟩*Tn(m)   'Either here or somewhere above me is Tn(m)'. This property is true of all nodes dominated by the node m.
This use of the Kleene star operator provides a richness to syntactic description not hitherto exploited (though cf. Kaplan and Zaenen, 1988, for its use in defining a concept of functional uncertainty defined over LFG f-structures): it provides the capacity to specify a property as holding either at some given node or at some node elsewhere in the tree in which it is contained. It is this relatively weak disjunctive specification which we shall use to characterize wh or other expression initial in a string, whose properties and their projection within the string are not dictated by its immediate neighbors. The effect will be that not all expressions in a sequence fully determine their structural role in the interpretation of the string from that position in the string.
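The modal vocabulary of section 2.2.3 can be tried out on the small John tree described above. The following is only a minimal sketch under stated assumptions: the class `Node`, the helper names (`has`, `up`, `down`, `left`, `right`, `box`, `down_star`, `up_star`) and the ASCII label strings are illustrative inventions, and the dual [x]P is assumed to be the usual modal dual "not ⟨x⟩ not P".

```python
class Node:
    """A tree node decorated with atomic labels such as 'Ty(t)'."""

    def __init__(self, *labels):
        self.labels = set(labels)
        self.parent = None
        self.children = []

    def add(self, child):
        """Attach child as the rightmost daughter and return it."""
        child.parent = self
        self.children.append(child)
        return child


def has(label):                      # atomic property of the current node
    return lambda n: label in n.labels

def both(p, q):                      # conjunction P & Q
    return lambda n: p(n) and q(n)

def up(p):                           # <u>P: P holds at my mother
    return lambda n: n.parent is not None and p(n.parent)

def down(p):                         # <d>P: P holds at a daughter
    return lambda n: any(p(c) for c in n.children)

def _sisters(n):
    """Return (left sisters, right sisters) of n."""
    if n.parent is None:
        return [], []
    sibs = n.parent.children
    i = sibs.index(n)                # identity-based: Node defines no __eq__
    return sibs[:i], sibs[i + 1:]

def left(p):                         # <l>P: P holds at a left sister
    return lambda n: any(p(s) for s in _sisters(n)[0])

def right(p):                        # <r>P: P holds at a right sister
    return lambda n: any(p(s) for s in _sisters(n)[1])

def box(modality, p):                # [x]P, assumed to be not <x> not P
    return lambda n: not modality(lambda k: not p(k))(n)

def down_star(p):                    # <d>*P: here or somewhere below
    return lambda n: p(n) or any(down_star(p)(c) for c in n.children)

def up_star(p):                      # <u>*P: here or somewhere above
    return lambda n: p(n) or (n.parent is not None and up_star(p)(n.parent))

falsum = lambda n: False
is_root = box(up, falsum)            # [u] falsum: "nothing is above me"

# The tree from the text: n (type t) with daughters n1 (John, type e) and n2 (type e -> t).
n = Node("Tn(m)", "Ty(t)")
n1 = n.add(Node("Fo(John)", "Ty(e)"))
n2 = n.add(Node("Ty(e->t)"))
```

With this tree, `up_star(has("Tn(m)"))` holds at `n`, `n1` and `n2`, which is the weak, disjunctive kind of address that the text goes on to use for string-initial wh-expressions.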
2.2.4. THE LANGUAGE OF DECLARATIVE UNITS
The language of declarative units is a first-order language with the following nonlogical vocabulary:
1. a denumerable number of sorted constants from Lab_i for i ≤ n, where L_i = ⟨Lab_i, R_1, . . . , R_i⟩ structures the set of feature values in Lab_i,
2. monadic predicates Fo ('Formula'), Ty ('Type'), Tn ('Tree node'), C_i, i ≤ n, and identity '=',
3. modalities ⟨u⟩ (up), ⟨d⟩ (down), ⟨l⟩ (left), ⟨r⟩ (right) and their starred versions ⟨u⟩*, ⟨d⟩*, ⟨l⟩*, ⟨r⟩*; ⟨L⟩ for pairing some completed node in a tree and the root node of a new tree (for adjuncts); and two further modalities which are the analogues of two of the starred modalities, each defined over the union of a pair of these relations.
Formulas:
1. If j ∈ LC then Fo(j) is an (atomic) DU-formula. If k ∈ Lab_1 then Ty(k) is an (atomic) DU-formula. If k ∈ Lab_2 then Tn(k) is an (atomic) DU-formula. If k ∈ Lab_i, 2 < i ≤ n, then C_i(k) is an (atomic) DU-formula. If t, t' are variables or individual constants, then t = t' is an (atomic) DU-formula.
2. If φ and ψ are DU-formulas then φ # ψ is a DU-formula for # ∈ {∧, ∨, →, ↔}. If x is a variable and φ a DU-formula, then ∀xφ and ∃xφ are DU-formulas. If M is a modality and φ a DU-formula, then Mφ is a DU-formula.
With this composite language system, inference rules which characterize the transition between input state and final outcome can now be set out. All inference operations are defined as metalevel statements in terms of DU-formulas and relations between them. For instance, applications of the rule of Modus Ponens for declarative units become a metalevel statement licensing the accumulation of information at nodes in a tree structure represented as:
"Modus Ponens" for DU-formulas:
An item has Type Feature t and Formula Feature ψ(φ) if it has daughters with Type Features e → t and e and Formula Features ψ and φ, respectively. Controlled Modus Ponens is then a straightforward generalization.5 For instance,
where Modus Ponens is restricted to daughters with features X and Y. In general, in this modal logic, a rewrite rule Y_1, . . . , Y_n ⇒ X gets the form
3. THE DYNAMICS
A parse is a sequence of parse states, each parse state an annotated partial tree. A parse state is a sequence of task states, one for every node in the partial tree so far constructed. A task state is a description of the state of a task at a node in a partial tree. A task is completely described by a location in the tree, a goal, what has been constructed, and what still has to be done (TODO). So the four Feature dimensions of a task state are
1. Goal (G). Values on this dimension are the semantic types in the label set Ty. This feature tells us which semantic object is under construction.
2. Tree Node (TN). Values are elements of the label set Tn. The 'top-node' in Tn will be denoted by 1. This feature fixes the location of the task in question within a tree structure.
3. Discrepancy (TODO). Values are (finite sequences of) DU-formulas. This dimension tells us what has to be found or constructed before the goal object can be constructed.
4. Result (DONE). Values are lists, sequences, of DU-formulas. These values will be the partial declarative units of the Incremental Model.
We will represent the task state TS(i) by
We can distinguish three kinds of task states:
1. Task Declarations
The Task Declaration. Nothing has yet been achieved with respect to the goal G. Everything is still to be done. Analogously to the description of declarative units we can represent the above task state as a list of feature-value statements as follows.
2. Tasks in Progress
In the middle of a task. If things are set up right, then αβ ⇒ G. The value of DONE gives an element of the domain C of the Incremental Model, a partial declarative unit. The value of TODO gives a demand associated with this element that still has to be satisfied.
3. Satisfied Tasks
A Satisfied Task. There is nothing left to be done. Soundness of the deductive system amounts to the fact that the goal G can be computed, derived, from α in case TODO is empty. From a different perspective we can consider the state
as constituting an association between a node in a tree and a labeled object decorating that node. Notice that this can be seen as a tree node decorated by some feature structure plus an unsatisfied demand. In the course of a parse we may have a Task State with the Tree Node feature undefined
In this case we are dealing with a decoration in search of a node to decorate.
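The four feature dimensions and the three kinds of task state can be pictured as a small record type. This is a sketch only, not the authors' formalization: the class name `TaskState`, its field names, and the use of plain strings for types, tree-node addresses and DU-formulas are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TaskState:
    """One task state: Goal, Tree Node, Result (DONE) and Discrepancy (TODO)."""
    goal: str                                        # G: the type to SHOW, e.g. "t"
    tree_node: Optional[str]                         # TN: None models an undefined Tree Node
    done: List[str] = field(default_factory=list)    # DONE: what has been constructed
    todo: List[str] = field(default_factory=list)    # TODO: what is still outstanding

    def kind(self) -> str:
        """Classify the state as one of the three kinds distinguished in the text."""
        if not self.todo:
            return "satisfied"        # nothing left to be done
        if not self.done:
            return "declaration"      # nothing achieved yet, everything to do
        return "in progress"

    def is_unplaced(self) -> bool:
        """True for a decoration still in search of a node to decorate."""
        return self.tree_node is None


declared = TaskState(goal="t", tree_node="1", todo=["Ty(t)"])
midway = TaskState(goal="t", tree_node="1", done=["alpha"], todo=["beta"])
floating = TaskState(goal="e", tree_node=None, done=["Fo(WH)", "Ty(e)"])
```

Here `floating` is a completed task of type e whose Tree Node feature is undefined, the configuration that the analysis of initial wh-expressions relies on.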
3.1. Dynamics: The Basic Transition Rules
The dynamics of the parse process consists then of a sequence of partial descriptions of a tree. Concretely, the dynamics of the parsing process is the dynamics of demand satisfaction. This sequence of parse states can be seen as a sequence of tree descriptions in which nodes are introduced which must subsequently be completed to derive a formula corresponding to an interpretation of the string. The tree corresponds to a skeletal anticipation of the internal semantic structure of the resulting propositional formula, and not to a tree structure for the input sequence of words. Indeed there is no necessary one-to-one correspondence between the individual linguistic expressions in the string and nodes of the tree.
3.1.2. BASIC TRANSITION RULES
In the following the symbols X, Y, Z, . . . will range over individual DU-formulas, the symbols α, β, . . . will range over (possibly empty) sequences of such formulas, Δ, Δ', . . . will range over (possibly empty) sequences of tasks, and w_i, w_{i+1}, . . . will range over words. The start of a parsing sequence is a single task state, the Axiom state. The last element of such a sequence is the Goal state. The number of task states in a parse state grows by applications of the Subgoal Rule. Tasks become satisfied by applications of the Scanning and the Completion Rules.
1. Axiom ax
Goal go
where all elements of Δ are satisfied task states.
2. Scanning
The expression LEX(w) = Y refers to the lexical entry for the word w. This is a set of DU-formulas possibly containing U, the required element in the TODO box.
3. Mode of Combination
(a) Introduction
Notice that the premises Y_i are indexed as daughters of the task p. The rule Y_0, . . . , Y_n ⇒ Z stands for an arbitrary rule of combination in general. We can see it as an application of Modus Ponens, a syntactic rewrite rule, or interpret ⇒ as logical consequence.
(b) Elimination
Notice that this rule effects the converse of Introduction. This inverse relation guarantees that an empty TODO compartment corresponds to a DONE compartment which can derive the goal.
4. Subordination
(a) Prediction
where R_d is the relation holding between a node and its daughter.
(b) Completion
These are the basic set of general rules driving the parsing process, exemplified by returning to our earlier display (29), filling out the leaves and annotating the tree structure to show how the rules have applied in projecting an interpretation for John saw Mary:
In this example and the following ones, we use the formula ⟨u⟩m, abbreviating ⟨u⟩Tn(m), to stand for an element k of the label set Tn such that R_u(m, k) (that is, m is the mother of k) and, analogously, we will use the other modalities, including starred ones such as ⟨u⟩*, as relative addresses for elements of Tn. There are other rules to add to this set. In particular there are the rules associated with subordination, and rules specifically associated with wh-expressions. We also presume rules which relate sequences of task states as units. These give rise to linked task-sequences, to which we shall return in considering adjunction.
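The goal-directed character of the derivation for John saw Mary can be illustrated with a toy procedure. It is a deliberately simplified sketch, not the rule system itself: task-state bookkeeping is collapsed into the call stack, types are encoded as nested tuples, and the names `LEXICON`, `modus_ponens`, `show` and `parse` are hypothetical. What it keeps is the order of events: the goal of type t introduces subtasks of type e and e → t, words are scanned left to right, and completed subtasks combine by Modus Ponens paired with function application.

```python
# Types: "e", "t", and (argument, result) pairs for functional types.
LEXICON = {
    "John": ("e", "John"),
    "Mary": ("e", "Mary"),
    "saw": (("e", ("e", "t")), "see"),          # e -> (e -> t)
}


def modus_ponens(fn, arg):
    """From psi of type a -> b and phi of type a, derive psi(phi) of type b."""
    (fn_type, psi), (arg_type, phi) = fn, arg
    if not (isinstance(fn_type, tuple) and fn_type[0] == arg_type):
        raise TypeError(f"cannot apply {psi} to {phi}")
    return (fn_type[1], f"{psi}({phi})")


def show(goal, words, trace):
    """Establish a formula of type `goal`, consuming words left to right."""
    trace.append(("SHOW", goal))
    if goal == "t":
        # Introduction: a formula of type t needs daughters of type e and e -> t.
        subject, rest = show("e", words, trace)
        predicate, rest = show(("e", "t"), rest, trace)
        trace.append(("ELIMINATION", goal))
        return modus_ponens(predicate, subject), rest
    if not words:
        raise ValueError("input exhausted with a task still to do")
    unit, rest = LEXICON[words[0]], words[1:]   # Scanning
    trace.append(("SCAN", words[0]))
    while unit[0] != goal:
        # The scanned type is not yet the goal: it must be a function whose
        # argument is set up as a further subtask and then eliminated.
        if not isinstance(unit[0], tuple):
            raise ValueError(f"{unit[1]} cannot satisfy the current task")
        arg, rest = show(unit[0][0], rest, trace)
        unit = modus_ponens(unit, arg)
    return unit, rest


def parse(sentence):
    """Return the (type, formula) pair for the whole string and the trace."""
    trace = []
    unit, rest = show("t", sentence.split(), trace)
    if rest:
        raise ValueError("words left over after the goal was established")
    return unit, trace


result, steps = parse("John saw Mary")          # result == ("t", "see(Mary)(John)")
```

The formula `see(Mary)(John)` mirrors the mode of combination: the verb first combines with its object to give a formula of type e → t, which then applies to the subject.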
3.2. Modeling the Partial Nature of Natural Language Content
Before increasing the complexity with additional rules, we indicate how we model the gap between lexically specified content and its assigned interpretation within the context of some given string, as this is the heart of any account of how utterances are interpreted in context.6 The most extensively studied phenomenon involving such asymmetry is anaphora. In this model, we take the underspecification of the input specification associated with pronominal anaphora a step further than in many other analyses. We assume that pronouns are invariably inserted from the lexicon with a single specification of content, and that any bifurcation into bound-variable pronoun, indexical pronoun, E-type, etc., is solely a matter of the nature of the contextually made choice, context here being taken to include logical expressions already established within the string under interpretation. Accordingly, pronouns are projected as m-variables with an associated procedure (not given here) which imposes limits on the pragmatic choice of establishing an antecedent expression from which to select the form which the pronoun is to be taken as projecting:
Lex(he) = {Ty(e), Fo(upro), Gender(male), ⟨u⟩Ty(t), . . .}
(Notice how the condition on the mother of the formula projected from he is in effect a feature-checking device, licensing the occurrence of the pronoun within a particular frame.) Instantiation of the m-variable upro is generally on-line, as the m-variable is inserted from the lexicon into the tree. It must be selected only from identified formulae, where an identified formula is either a formula in some satisfied task (i.e., in the DONE box of a task with empty TODO and identified tree node) or a formula which has been derived elsewhere in the discourse. For a pronominal of type e, there is a further restriction that the formula selected as providing its value may not occur within the same r-domain within which upro is located. This is expressed as a side condition, given here only informally.7 Note the metalevel status of this characterization of pronouns. Anaphora resolution is defined not as a task of reference assignment defined in semantic terms, but as a selection process defined over available representations. The nature of this choice will determine whether the denotative content of the pronoun relative to the assigned interpretation is that of a variable, a referential term, and so on.
3.2.1. UNDERSPECIFICATION OF TREE CONFIGURATION
More unorthodox than the recognition that a single pronoun has a single lexical specification, which by enrichment of its input specification becomes a bound variable, constant, and so on, is the claim that expressions in a string may also underspecify the role the expression is to play in the compilation of interpretation for the string. Taking up the potential of LOFT to express disjunctive characterizations of node properties, there are rules which allow individual lexical items to
project tree descriptions which do not fully determine all branches of the tree. This is the primary distinguishing feature of our analysis of initial wh-expressions. Wh-expressions, we claim, project the following state:
(31) displays the projection of a task with the goal of showing Ty(t) at some node m identified as a wh-question, but with everything still to do, except that a completed e task has been added, lacking merely the specification of where in the tree it holds. WH, which is the value of the Formula predicate, is an m-variable, either retained as a primitive term to be resolved by the hearer and so incomplete in content, or in relative clauses resolved by replacing the m-variable WH with the formula projected by the adjoined head. ⟨u⟩*m is an abbreviation for ⟨u⟩*(Tn(m)):
'The root node is either here or somewhere above me'
In other words, the structural position of Fo(WH) and Ty(e), hence its function-argument role in the propositional formula under construction, is not fixed at this juncture in the parsing process. The structure is merely defined as having such a node. The +Q feature is an indication by feature specification of a propositional formula which is to be open with respect to at least one argument. Seen as an on-line target-driven parsing task, by a single step of inference we can add a conclusion at the current node m about the presence of the WH:
The ⟨d⟩* form of specification holds because somewhere in the tree dominated by m is a node with the properties listed. (This inference is not a necessary part of the specification projected, but, as we shall see in section 5.3, is the form of characterization that brings out its parallel with wh-expletives.) We have not yet added any account of why the properties of wh might get carried down from one clausal domain to another. This transfer follows from the recursive definition of ⟨#⟩*X. Consider the evaluation of the DU-formula holding at some node ⟨u⟩*m. By definition, this property holds either at m or at a daughter of m or at a daughter of a daughter of m, and so on (cf. section 2.2.3). Given that information at all nodes must be locally consistent, the mismatch between TODO Ty(t) and Fo(WH) & Ty(e) will lead to the DU-formula annotating the node so
far unfixed being evaluated with respect to some daughter, and then successively through the tree until resolution is possible. This resolution is achieved at node i by some TODO specification associated with a task state being taken as satisfied by the presented floating constituent, whose node characterization is thereby identified (WH-RESOLUTION):
(33) WH-RESOLUTION
Provided R_u*(m, i) and if Ty(x) is in the lexical specification of the current word, then x ≠ X → . . . → e and x ≠ e.
The side condition is a restriction that the type of the current word in the string must neither meet the TODO specification directly, nor set up a type specification which a sequence of Introduction and Elimination steps would satisfy. This guarantees that such resolution only takes place when there is no suitable input. With the information from the unfixed node copied into the tree, the underspecification intrinsic to ⟨u⟩*m as a tree-node identifier is resolved with i = ⟨u⟩*m, and the unfixed node is deleted. We now set out two examples. First is the specification of input state and output state for the string Who does John like?:
(34) Who does John like? INPUT TASK STATE
Notice how the lexical specification of who simultaneously projects information both about its mother node (that it is a question) and about some unplaced
constituent. The finally derived state with the t target duly completed no longer has this unfixed node as part of the tree description. (35), our second sample derivation, specifies a characterization of the parse state following the projection of information following think. It displays the disjunctive specification associated with who being carried down to information projected by the string making up the subordinate clause, through inconsistency between the type of the wh-element and that assigned to each intermediate right-branching daughter, with the point at which the information projected by who holds still not fixed:
(35) Who do you think Bill likes?: PARSE STATE following entry of think:
Notice that we are in effect abandoning the assumption that a wh-initial expression takes scope over the remaining nodes defined over the subsequent string, for the formula projected by the wh-expression has a fixed position only at the point at which its tree relation to the remaining structure is fixed, viz. the "gap." Hence we shall have a basis from which to characterize the scope idiosyncrasy of initial wh-expressions, namely that they freely allow narrow scope effects with respect to expressions which follow them (listed as problem (1)).
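The side condition on WH-RESOLUTION in (33) can be read as a simple test applied at each node that the unfixed constituent is evaluated against. The sketch below is an illustration under stated assumptions rather than the rule itself: the names `UnfixedNode`, `wh_resolution` and `result_type`, the string node addresses, and the tuple encoding of types are all hypothetical, and a missing current word (end of string) is assumed to satisfy the condition vacuously, since (33) only constrains the word if there is one.

```python
from dataclasses import dataclass


def result_type(ty):
    """Return the final result of a curried type, e.g. e -> (e -> t) gives t."""
    while isinstance(ty, tuple):
        ty = ty[1]
    return ty


@dataclass
class UnfixedNode:
    """The floating decoration projected by an initial wh-expression."""
    formula: str = "WH"
    ty: str = "e"
    address: str = "<u>*m"      # somewhere at or below node m
    fixed: bool = False


def wh_resolution(unfixed, node, todo_type, word_type=None, below_m=True):
    """Fix the unfixed node at `node` if the side condition in (33) is met."""
    if unfixed.fixed or not below_m or todo_type != unfixed.ty:
        return False
    # The current word must not supply the required type, either directly (x = e)
    # or as the eventual result of elimination steps (x = X -> ... -> e).
    if word_type is not None and result_type(word_type) == unfixed.ty:
        return False
    unfixed.address = node
    unfixed.fixed = True
    return True


# "Who does John like?", with hypothetical node names.
who = UnfixedNode()
at_subject = wh_resolution(who, "subject", "e", word_type="e")              # John supplies e
at_verb = wh_resolution(who, "predicate", ("e", "t"), word_type=("e", ("e", "t")))
at_object = wh_resolution(who, "object", "e", word_type=None)               # no word left
```

Resolution fails at the subject (the word John itself meets the requirement) and at the predicate (the requirement is not of type e), and succeeds only at the object, where `who.address` is updated from the underspecified `<u>*m` to a fixed position.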
4. CROSSOVER: THE BASIC RESTRICTION
We are now in a position to present the basic crossover restriction, to wit that in questions, pronouns can never be interpreted as dependent on the preceding wh-expression unless they also follow the gap. This restriction is uniform, and runs across strong and weak crossover configurations (cf. examples (4)-(9) repeated here):
(36) *Who_i does Joan think that he_i worries e_i is sick?
(37) *Who_i does Joan think that his_i mother worries e_i is sick?
(38) *Whose_i exam results_j was he_i certain e_j would be better than anyone else's?
(39) Who_i does Joan think e_i worries his_i mother is sick?
(40) Who_i does Joan think e_i worries that he_i is sick?
(41) Whose_i exam results_j e_j were so striking that he_i was suspected of cheating?
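Read procedurally, and given the condition in section 3.2 that a pronoun selects its value only from identified formulae, the restriction can be checked against the left-to-right order of wh-expression, pronoun and gap in (36)-(41). The following is a minimal sketch: the three-way event encoding, the dictionary `EXAMPLES` and the function name are assumptions made for illustration, and everything about the strings other than this linear order is ignored.

```python
def pronoun_can_depend_on_wh(events):
    """Process 'wh', 'pro' and 'gap' events left to right.

    The wh-formula counts as identified, and so as a possible antecedent,
    only once the gap has fixed the position of the unfixed node.
    """
    identified = set()
    unfixed_pending = False
    for event in events:
        if event == "wh":
            unfixed_pending = True            # node projected, position unknown
        elif event == "gap" and unfixed_pending:
            unfixed_pending = False           # wh-resolution fixes the node
            identified.add("WH")
        elif event == "pro":
            return "WH" in identified         # antecedent must be identified
    return False


# Linear order of the relevant elements in examples (36)-(41).
EXAMPLES = {
    36: ["wh", "pro", "gap"],
    37: ["wh", "pro", "gap"],
    38: ["wh", "pro", "gap"],
    39: ["wh", "gap", "pro"],
    40: ["wh", "gap", "pro"],
    41: ["wh", "gap", "pro"],
}

judgments = {num: pronoun_can_depend_on_wh(ev) for num, ev in EXAMPLES.items()}
```

On this encoding the dependent reading is excluded for (36)-(38) and available for (39)-(41), matching the starred and unstarred judgments.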
This is directly accounted for by the concept of identification associated with anaphora resolution. As long as the underspecified node is not fixed, by definition it cannot serve as an antecedent for the pronoun (cf. section 3.2). The effect of wh-resolution when it later applies is indeed to determine the position within the configuration at which the properties projected by the wh-expression should be taken to hold. Such features become available for pronominal resolution only after the gap has been projected. In this way, the system is able to characterize the way in which wh-expressions in questions do not provide an antecedent for a following pronominal until the gap (= Fo(WH)) is constructed in a fixed position. Hence the primary crossover restriction (for weak, strong, and extended strong crossover data alike).8
4.1. Crossover and Relative Clauses
With relative clauses, we face data that are apparently problematic for this restriction, as indeed for all accounts in terms of operator-gap binding, as these demonstrate that the crossover phenomenon is context-sensitive (Lasnik and Stowell, 1991; Postal, 1993). In some contexts, the primary crossover restriction is suspended altogether; in others it remains in force. In relatives, the crossover restriction against a sequence wh . . . pronominal . . . gap, all interpreted as picking out the same entity, does not hold if either the pronoun "crossed over" is a determiner (the primary weak crossover cases), or if the wh-expression, likewise, is contained within some larger noun phrase (the "extended strong crossover" cases; Postal, 1993):
(42) *John_i, who_i Sue thinks he_i said e_i was sick, has stopped working.
(43) John_i, who_i Sue thinks e_i said he_i was sick, has stopped working.
(44) John_i, who_i Sue said his_i mother is unnecessarily worried about e_i, has stopped working.
(45) John_i, whose_i mother_j Sue said he_i was unnecessarily worried about e_j, has stopped working.
This contrast between (42) and (44)-(45) is less marked in restrictive relatives, but many English speakers report a difference between (46) and (47)-(48), with (46) being unacceptable on an interpretation in which the pronoun is construed as identical to the head nominal, but (47)-(48) to the contrary allowing an interpretation in which wh, pronoun, and gap position are all construed as picking out the same individual:9
(46) Every actor who the press thinks he said e was sick, has stopped working.
(47) Every actor who the press thinks his director is unnecessarily worried about e, has stopped working.
(48) Every actor whose director the press said he was fighting with e, has stopped working.
This asymmetry between relatives and questions is inexplicable given an analysis of the primary crossover restriction solely in terms of the relative positions of the three elements wh, pronominal, and gap, as any binding precluded by one such configuration should continue to be excluded no matter what environment the configuration is incorporated into as a subpart. On the other hand, if the crossover restriction in questions is due to some intrinsic property of a wh-word in questions, say the weakness of description provided by the wh-expression, then we have some means of explaining the difference between questions and relatives, as long as we can provide a means of distinguishing the way the wh-expression is understood in the context of relative constructions. In particular, if there is some externally provided means of enriching the very weak description of content of wh-expressions in relatives, then this source of information may also provide a means of resolving the underspecification intrinsic to the following pronoun and, through it, indirectly identifying a fixed node for the newly enriched formula, so providing potential for interaction between the processes of node fixing and anaphora resolution. This is the approach we shall take, using the property of the head to which the relative is adjoined as the external source of information. It turns on the account of relative clauses provided within this framework.
4.1.1. RELATIVE CLAUSES AS INDEPENDENT, LINKED TREE STRUCTURES
The point of departure for our account of relative clause construal is the observation that relative clause construal involves constructing two propositional formulae that have some element in common. Furthermore, given that English is a head-initial language, the information as to what that shared element is is given ab initio: it is the formula projected by the head to which the relative is adjoined.
The starting point in constructing a representation for the relative clause is, then,
the requirement that this second structure must have a copy of the formula of the node from which this tree has been induced: (49)
John, who Sue's mother thinks is sick, is playing golf.
It is this set of observations which our account directly reflects. In (49) the occurrence of who is defined as signaling the initiation of a second structure which is required to contain a copy of the formula Fo(John) and Ty(e). As so far set out, the system only induces single trees for a system of type deduction which annotates the nodes of the tree once set out, plus a process of projecting and subsequently resolving initially unfixed nodes. To reflect the informal observation, this process of fixing an initially unfixed node is combined with a rule for transferring information from one tree to a second tree suitably linked to the first. The concept of linked tree is defined to be a pair of trees sharing some identical expression:10 (50)
For a pair of trees T1, T2, RLINK(T1, T2) iff T1 contains at least one node n with the DU-formula {Fo(A), Ty(X), Y}, and T2, with root node Tn(n), contains at least one node with {Fo(A), Ty(X), Z}.
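Definition (50) can be read as a simple check over a pair of annotated trees. The following is a minimal sketch in Python, under stated assumptions: the `Node` class, the string encoding of addresses, formulae, and types, and the function name `r_link` are all hypothetical conveniences introduced here, and the further labels Y and Z of the DU-formulae are ignored. It is meant only to make the two conditions of the definition concrete, not to reproduce the formal system.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Node:
    """One tree node with a subset of its DU-formula annotations (illustrative only)."""
    address: str                    # stands in for the Tn(...) label
    formula: Optional[str] = None   # Fo(...) annotation, if any
    type_: Optional[str] = None     # Ty(...) annotation, if any
    children: List["Node"] = field(default_factory=list)

    def walk(self) -> Iterator["Node"]:
        """Yield this node and every node below it."""
        yield self
        for child in self.children:
            yield from child.walk()


def r_link(t1: Node, t2: Node) -> bool:
    """Check the two conditions of (50): T2 is rooted at Tn(n) for some node n
    of T1, and T2 contains a node sharing n's Fo and Ty annotations."""
    for n in t1.walk():
        if n.address != t2.address or n.formula is None:
            continue
        if any(m.formula == n.formula and m.type_ == n.type_ for m in t2.walk()):
            return True
    return False


# Example (49): "John, who Sue's mother thinks is sick, is playing golf."
host = Node("0", type_="t", children=[
    Node("00", formula="John", type_="e"),
    Node("01", formula="play_golf", type_="e->t"),
])
# The linked tree is rooted at the address of the John node and holds a copy
# of Fo(John) at a node whose position is not yet fixed.
linked = Node("00", type_="t", children=[
    Node("00.*", formula="John", type_="e"),
])
# A tree rooted at the same address but lacking the shared formula.
unrelated = Node("00", type_="t", children=[
    Node("00.0", formula="Sue", type_="e"),
])

print(r_link(host, linked))     # True
print(r_link(host, unrelated))  # False
```

The sketch treats the shared formula as plain string identity; in the framework itself the copy is a requirement imposed on the new tree and met during parsing, which this static check does not model.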
To project such a structure incrementally, we define a LINK Introduction rule. This rule applies to some completed node n in a tree T1, induces a new tree T2 with root node n, and imposes on the new tree T2 a requirement of having a node annotated with the relevant formula Fo(A) (described as * n). For the case of nonrestrictive relative construal, we assume this LINK Introduction process applies to the node of type e, carrying over the formula which annotates this node into the new linked tree T2.11 (51)
LINK Introduction:
Note that the unfixed node is described as an arbitrary node along a chain of daughter relations, with the requirement on that node that it be filled by the formula filling the node to which the tree is linked. This requirement will be satisfied only if there is some subsequent expression projecting the necessary formula onto the node. The consequence of meeting this requirement is that all pairs of linked trees will contain a common occurrence of the variable copied over, thus meeting the characterization of what it means for a pair of trees to satisfy the relation RLINK. In the case of a language such as English, the relativizing complementizer is defined as carrying a copy of the head formula by substitution of the metavariable
Fo(WH), and this requirement of a common formula in the two structures is met immediately. The new tree is thus "loaded" with an occurrence of the formula occurring within the node from which the LINKed structure was projected: (52)
INPUT TASK STATE
In the process defined as (52), the content a is carried from the host task state into the independent task state with its goal of SHOW t. The value of wh is therefore identified as identical with that of its head by definition. However, the specification of the presence of this formula is characterized as annotating a node whose tree position in the new tree is as yet unfixed, and this form will as before give rise to percolation of the specification Fo(a) and Ty(e) down through the tree, checking at each node whether the DU-formula is to hold at that node. It is this anaphoric property of wh, being replaced by a substituend in relatives, which provides the basis for explaining the asymmetry between crossover phenomena in questions and in relatives. The contrast between questions and relatives can be seen as arising from an interaction between two different forms of update: pronoun construal and the identification of a tree position for some unfixed node. In relatives in English, the wh-expression projects a formula Fo(a) of Ty(e) identified with the head, in virtue of the anaphoric properties of the relative "pronoun." All that needs to be established in the subsequent development of the tree is where the unfixed node annotated with Fo(a) and Ty(e) should fit into the tree. The interference caused by anaphora resolution is then as follows. If by a free pragmatic process, a pronominal of the same type as the dislocated constituent happens to be assigned as value the same formula Fo(a) as the head noun before the gap is reached, then a fixed position within the tree for the DU-formula {Fo(a), Ty(e)} will have been found. This will automatically lead to update of the tree (there is at this juncture an occurrence of Fo(a) at a fixed node in the tree), and there will be nothing to fill any subsequent gap where the words themselves provide inadequate or conflicting input.
If, that is, the words that follow fail to project a full set of annotations to fulfil whatever requirements are set up in the subsequent projection, then at that later stage there will no longer be any outstanding unfixed node whose position has to be resolved, and hence no successful completion of the tree. So, should the pronoun be identified as being a copy of the formula inhabiting the head, there will be no successful completion of the tree if
a "gap" follows. Indeed, the only way to ensure that the initially unfixed node is used to resolve some such outstanding requirement is to interpret the intervening pronoun as disjoint from the head nominal. This is the strong crossover phenomenon, exemplified by (42). Following on from this, should a pronominal have served the purpose of identifying a fixed tree position for a hitherto unfixed node, all subsequent references to the same entity will have to be made through anaphora: (53)
John, who Sue said he was worried was sick, has gone to the hospital.
(54)
John, who Sue said was worried he was sick, has gone to the hospital.
(55)
John, who Sue said he was worried he was sick, has gone to the hospital.
Thus (53) precludes any interpretation of he as dependent on John (the strong crossover data); (54) allows an interpretation for the occurrence of he as dependent on John, possibly via its identification with whatever occurs in the gap (the two will not be distinguishable, because the supposed gap contains an occurrence of the variable associated with the nominal); and (55) allows such an interpretation for both occurrences of he, and moreover also allows an interpretation in which the first but not the second pronoun is interpreted as dependent on John. The interaction of anaphora resolution and gap resolution said to underpin strong crossover has turned on the fact that the pronoun and the wh-expression are of the same type, Ty(e). Should there, then, be any reason why identifying the pronoun will not lead to fixing the tree position for the dislocated expression, the pronoun cannot be used to fix the position of the unfixed node; there will still be a role for a subsequent gap; and choice of pronoun as identified with the head nominal will not interfere with the Gap Resolution process. This happens in two types of case: (i) if the pronoun is a determiner and so not of type e (weak crossover effects), and (ii) if the wh-expression is contained within some larger expression and it is this larger expression which is unfixed within the emergent tree configuration (extended strong crossover effects). Both weak crossover effects (44) and extended strong crossover effects (45) are thus predicted to be well formed in relatives with the pronoun construed as dependent on the head, as the mere complement of the type of case that is precluded. None of these means of updating the tree through the occurrence of the pronoun will be available in questions, for there is no independent identification of the wh-expression in questions.
It remains an unidentified formula with but a placeholding wh metavariable, and, prior to Gap Resolution, without even a fixed position in a tree structure. Hence the asymmetry between questions and relatives.12 This account has the advantage over a number of accounts (including that of Kempson and Gabbay, 1998) that it provides a natural basis for distinguishing languages such as Arabic with a resumptive pronoun strategy from a language such as English, which uses pronouns resumptively only for marked effect. Arabic
displays no crossover effects, either strong or weak in relative clauses:13 (56)
irra:gil illi nadja ftakkarit innu qe:l innu aiya:n
the man who Nadya thought he said he was sick
(57)
irra:gil illi nadja ftakkarit inn umuhe qelqa:neh minn-u
the man who Nadya thought that his mother was worried about
All that is required to explain this difference between English and Arabic relatives is to propose that in Arabic, the relativizing complementizer has less rich anaphoric properties than in English. All that it encodes is the very requirement on the LINK structure definitional of LINK structures, an expletive which requires further lexical input. In consequence, the requirement of creating a copy of the nominal formula in the relative clause structure is not satisfied by the complementizer itself, and so the requirement for the copy will only be met through an ensuing pronoun identified anaphorically with the formula inhabiting the head. There is never any question of "gap" positions occurring subsequent to some pronominal, for there is no successfully annotated unfixed node for which only its position in the tree remains to be identified. Correctly, the analysis leads us to expect a total lack of observable crossover data in relative clauses in Arabic. The difference between the two languages thus reduces to a lexical difference between their relativizing complementizers. Notice that in all cases, the account, and its context sensitivity, makes critical use of the way information has been accumulated previous to the projection of the pronoun, and is not solely defined in terms of the configuration in which the pronoun occurs. It is thus sensitive to linear order, partiality of information at intermediate steps in the interpretation process, and the way information is accumulated through the interpretation process. In particular, the dynamics involved in interpretation of wh-expressions follow from the goal of seeking to resolve the weak tree description initially projected. The context sensitivity of "weak" and "extended strong" crossover but not "strong" crossover is thus predicted from the proposed characterizations of wh-expressions, pronouns, and relative clauses, without additional construction-specific stipulation.
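The interaction between pronoun construal and node fixing on which this section relies can be illustrated with a small simulation. What follows is a deliberately crude sketch, not part of the formal system: a clause is reduced to a left-to-right sequence of pronoun and gap events, formulae are plain strings, and the outcome is a single yes/no answer to whether the tree can be completed on the intended construal. The function name `completes`, the event tuples, and the labels "e" and "det" are assumptions introduced here purely for illustration, and the Arabic case is not modeled.

```python
from typing import Iterable, Optional, Set, Tuple

Event = Tuple[str, ...]  # ("pronoun", type, value) or ("gap",)


def completes(unfixed: str, context: Set[str], events: Iterable[Event]) -> bool:
    """Return True if the clause can be completed on the given construal.

    unfixed : formula annotating the initially unfixed node
    context : formulae already available as antecedents when the clause starts
              (the head formula in relatives; nothing usable in questions)
    events  : pronouns and gaps in linear order
    """
    available = set(context)
    pending: Optional[str] = unfixed        # unfixed node still awaiting a position
    for event in events:
        if event[0] == "pronoun":
            _, ty, value = event
            if value not in available:
                return False                # no antecedent for this construal yet
            if pending is not None and ty == "e" and value == pending:
                pending = None              # the pronoun fixes the unfixed node
        elif event[0] == "gap":
            if pending is None:
                return False                # nothing left to fill the gap
            available.add(pending)          # once fixed, it can serve as antecedent
            pending = None
        else:
            raise ValueError(f"unknown event: {event!r}")
    return pending is None


head = {"John"}
# Relatives: the wh-expression copies the head formula, so it is available at once.
print(completes("John", head, [("pronoun", "e", "John"), ("gap",)]))    # (42) False
print(completes("John", head, [("gap",), ("pronoun", "e", "John")]))    # (43) True
print(completes("John", head, [("pronoun", "det", "John"), ("gap",)]))  # (44) True
print(completes("mother_of(John)", head,
                [("pronoun", "e", "John"), ("gap",)]))                  # (45) True
# Questions: the wh metavariable is not an antecedent until the gap fixes it.
print(completes("WH", set(), [("pronoun", "e", "WH"), ("gap",)]))       # False
print(completes("WH", set(), [("pronoun", "det", "WH"), ("gap",)]))     # False
print(completes("WH", set(), [("gap",), ("pronoun", "e", "WH")]))       # True
```

On these assumptions the only difference between relatives and questions is the initial content of `context`, which mirrors the claim that the asymmetry follows from how much information has accumulated before the pronoun is parsed rather than from the configuration alone.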
5. TOWARDS A TYPOLOGY FOR WH-CONSTRUAL
In the face of the presented evidence, one might grant the need for some form of incrementality in the projection of interpretation for wh-structures, but nevertheless argue that the slash mechanism of HPSG, with its percolation of wh features progressively up a tree through feature unification, captures just the right dynamic element, without abandoning the overall declarative formalism. What, one might ask, does this disjunctive specification approach have to offer, over and
above that, more conservative, form of specification? It is furthermore extremely close to the functional uncertainty analysis of LFG (Kaplan and Zaenen, 1988). Kaplan and Zaenen indeed analyze long-distance dependencies in terms of the Kleene * operator, and so constitute a genuine precursor of the present analysis. However, in that case, the disjunction is defined over string sets and the f-structure specifications, not, as here, over structural specifications. The advantage specific to this account, in reply to such a charge, is the dynamic parsing perspective within which the account is set. It is this dynamic perspective that provided an account of the crossover phenomenon. And it is this same perspective that also provides the basis for a general typology of wh-constructions, explaining why they occur as they do, rather than simply defining distinct mechanisms for each new set of data (as do Johnson and Lappin, 1997). The unifying form of explanation that we set out is not available to the more orthodox frameworks, in which syntax is defined purely statically. We take in order wh-in situ constructions, multiple wh-constructions, and partial movement constructions. (In all cases we shall restrict attention to full NP wh-expressions such as who, what.)
5.1. Wh-in situ Constructions
In the framework adopted, there is a near symmetry between wh-initial and wh-in situ constructions. The in situ form is the fixed variant of the node which the initial wh-expression projects as unfixed. The only additional difference is that wh-in situ constructions lack the additional +Q feature indicating a formula with one open position. We specify together the result of processing a wh-initial expression, and the effect of processing a wh-in situ expression. (58)
Wh-initial:
(59)
Wh-in-situ:
The wh-initial expression encodes an instruction that its formula and type are satisfied at some lower point in the tree, together with the specification that the
node currently under construction has the property of being a wh-question, hence a formula open with respect to at least one argument. The wh-in situ expression, conversely, encodes an instruction that it is the premise Fo(WH) and Ty(e) which is projected into the current task state. There is no tree with unfixed node position, as the wh-in situ projects information to a fixed node of the tree; and the feature +*P provides the basis for expressing natural linguistic generalizations, in particular not only characterizing the structural properties of wh-initial sentence forms, but also providing a principled basis from which to elaborate a whole family of generalizations about wh-structures. Wh-initial effects, wh-in situ effects, and the required array of partial movement effects are correctly predicted, as is the array of otherwise puzzling crossover phenomena. Each of the phenomena standardly taken to require independent characterization has been explained from the same set of assumptions about the process of tree growth and its role in the interpretation process. Essential to the account have been two properties:
1. The asymmetry between the input provided by any individual expression on the one hand and its interpretation/structural role in interpretation on the other;
2. A specification of how such encoded input information is incrementally enriched as part of a left-right process of building up some propositional form as interpretation for a string.
The novelty of this account lies in the claim that natural-language expressions may not only specify relatively weakly the content to be assigned to them in context, but may also fail to project a fully defined tree relation to the constituents projected by the items to which they are adjacent in the string.
Expressions in a string must therefore in part be interpreted by a process of enrichment which involves not merely fixing the content of the expression relative to context, but also establishing tree relations which may hitherto not have been uniquely fixed. The significance of this account of long-distance dependency and related phenomena lies in two properties. First it is presented in terms of the "pragmatic" process of utterance interpretation—building up an interpretation in context. Second, there is no concept of syntactic structure over and above the structure in terms of which the incremental process of interpretation is modeled. The model of the parsing process itself provides the structural framework, indeed this is the syntax, and it is in terms of this framework that all linguistic explanations are couched. In this articulation of a single level of representation, it is quite unlike Discourse Representation Theory, despite obvious parallels between the resulting tree structure configurations and discourse representation structures.21 Furthermore, the projection of structure from wh-expressions is taken to be part of the process of resolving such relatively weak input specifications, defined, like anaphora resolution, in terms of the (primarily) left-right projection of information from
preceding linguistic input. The explanation therefore falls within a family of explanations which might loosely be called parsing explanations (cf. earlier attempts by Erteschik-Shir, 1973; Marcus, 1980; Berwick and Weinberg, 1984). It should however be stressed that this explanation of the data departs from earlier conceptions of the relation between competence and performance, or semantics and pragmatics, in which the competence model is defined in terms of an independently defined body of syntactic/semantic/phonological axioms which performance/pragmatic explanations take as input. We are not proposing a pragmatic model that takes a fixed, semantically interpreted structural configuration as input, with pragmatic principles applying to this input to yield a set of contextually fixed values. And we are not proposing an explanation of wh-phenomena in terms of parsing strategies merely to come to the conclusion that wh-binding, crossover, wh-in situ, and partial wh-movement effects fall outside the remit of the natural-language computational system, leaving the assumption of a computational system specific to the language faculty reduced but intact. We are, to the contrary, elaborating a model of the parsing process which purports to provide the total vocabulary for explaining structural (syntactic) properties of natural language. Despite the procedural flavor of this framework, many commonalities with other frameworks remain. Together with all other linguistic frameworks, we are assuming that the lexicon provides the input on the basis of which interpretation is projected, and that such encoded information provides all the information needed to characterize idiosyncrasies of individual languages. Together with other frameworks (HPSG, Categorial Grammar) we assume that lexical specifications include type-logical information fixing the combinatorial properties of individual expressions.
Together with others, we assume that these lexical specifications also include representation of concepts, which in some cases fix the denotational content of some individual expressions. However, unlike other frameworks, we assume that such lexical specifications may include procedural instructions on the process of parsing itself, and that a unitary characterization of lexical specifications requires the definition of all such specifications as procedures that provide input to the incremental projection from a string onto some logical form. Furthermore, we claim that this incremental projection of structure is the only level of representation required in characterizing natural-language interpretation. The overall framework in terms of which these lexical specifications are defined, is, then, the metalevel theory which defines the inferential, goal-directed process that constitutes the activity of parsing. What we are proposing is that the human faculty for natural language is a capacity for parsing, a specialized inferential capacity for pairing linguistic expressions with logical forms which they are taken to express, these logical forms themselves being vehicles for inference of an orthodox sort. The shift of perspective has consequences. First, it suggests that the study of syntax has, following the lead of semantics, to become dynamic, defined in terms
of the ongoing projection of information on a left-right basis (cf. Johnson and Moss, 1994, for the proposal that current models of grammar might revealingly be recast as dynamic algebras). Second, our concept of competence in opposition to some concept of performance has to be revised. We no longer envisage the systems underpinning natural language according to the static pattern imposed by classical Fregean logics, with strings assigned denotational content direct, and some ancillary and entirely separate largely unknown theory of performance explaining how these systems are manipulated in communication. Rather, we envisage natural languages as systems specifically developed for the dynamic enterprise of projecting infinite variety of interpretative content from a finite lexicon. Seen in this light, the underspecification of natural language content is no longer an embarrassing divergence from formal language systems, to be patched up in the analysis to approximate as closely as possible to those systems. Such specifications are, to the contrary, indicative of the purpose for which natural languages are designed. Natural languages are metalevel devices for the projection of vehicles of thought/inference, encoding procedures whereby the intended content can most effectively be retrieved. There is no longer a dichotomy between the perspectives provided by theories of competence and theories of performance. Theories of linguistic competence are indeed theories about the language faculty, and these are theories about the abstract formal properties of the framework which we put to use in parsing. Such theories are complemented by theories of pragmatics. The burden of pragmatic theories, and, more generally, performance theories, is to articulate the general constraints imposed by the cognitive system which determine how the choices made available by the competence system are actually realized in context (Sperber and Wilson, 1986). 
The two together combine to yield a theory of linguistic knowledge and use.
NOTES
1 This paper was stimulated by J. Aoun, who posed this question in a talk at the School of Oriental and African Studies, London, in February 1996. We are grateful to Andrew Simpson, Shalom Lappin, and Abbas Benmamoun for conversations over many months, and to the audience at the Bangor conference on Syntactic Categories for comments.
2 Indefinites are a systematic exception to this. Compare Reinhart (1997), Winter (1997), Farkas (1997), Abusch (1994), Meyer-Viol et al. (in press) for recent attempts to account for this phenomenon.
3 We leave open the question of whether the arity of predicates should include an argument for an event variable, but do not include such an argument position in what follows.
4 Scope effects are also projected from such m-variables, with each determiner projecting an m-variable. Such determiner m-variables may be indexed as dependent on some other
term, the choice of the term on which some variable is dependent being an anaphoric-like choice which has to be made during the setting out of the annotated tree structure (cf. Meyer-Viol et al., in press; for a detailed account of the epsilon calculus, cf. Meyer-Viol, 1995).
5 There are specialized function-application rules in case the formula of the argument contains indexed variables (cf. Meyer-Viol et al., in press).
6 This point has been emphasized both in the semantic and pragmatic literature for over a decade now. Compare Kamp, 1981; Kamp and Reyle, 1994; Barwise and Perry, 1983; Sperber and Wilson, 1986; and the articles within these paradigms which have followed these.
7 The formal specification involves defining an additional Locality predicate, the value of which is shared for all nodes within a domain intervening between a node of Ty(t) annotated with a feature +Tense and some dominated node of Ty(t) also annotated with a feature +Tense.
8 Data such as His mother ignored every student with his not able to be construed as bound by every student is, on the account to be given here, an independent phenomenon to be explained in terms of linear order (cf. Williams, 1994, for a similar view).
9 The literature reports differences in judgments of acceptability with restrictive relative clause crossover data, but in recent months we have been unable to find a single speaker who consistently provides judgments in which (46)-(48) all preclude an interpretation in which the pronominal is bound by the quantifier. So, at least initially, we presume that the restrictive and nonrestrictive relatives alike display only strong crossover effects (cf. note 10).
10 This is in effect equivalent to the in situ form having an associated restriction (Labels(+Q)). In many languages (e.g., Japanese), the indication of the associated +Q feature is independently projected by a sentence-final particle.
11 The co-sharing of a formula expression in the two trees is most transparently displayed in nonrestrictive relative clauses. However, on the assumption that nominals project a pair of a variable (of type e) and a common noun, the same account can be extended to restrictive relative clause construals.
12 For those speakers for whom weak crossover effects persist in restrictive relative clauses, it appears that the variable projected by the nominal (the interpretation of the nominal in which the variable is contained being as yet incomplete) lacks sufficient denotational value to serve as an antecedent for the pronoun. (Cf. Kempson and Gabbay, 1998, for a discussion of this property within a different account of crossover phenomena in terms of locality.)
13 Wh-questions in Arabic display crossover effects much as in English if the wh is interpreted strictly as an indefinite. Should the wh-expression be interpreted quasi-referentially as picking out some specific but not fully identified individual, then some speakers license the use of resumptive pronouns as a means of resolving the unfixed tree position. This independent means of enriching the wh-formula inhabiting the unfixed node provides a basis for identifying the following pronoun, which then, as in English relative-clause crossover phenomena, provides a means of identifying the position in the tree for the initially unfixed formula, thus precluding any following gap.
14 This is in effect equivalent to the in situ form having an associated restriction (Labels(+Q)) (= "a node bearing the annotation Labels(+Q) is somewhere above me
in the tree description so far compiled"). In many languages (e.g., Japanese) in which the in situ form is the regular position for the wh-expression, the indication of the associated +Q feature is independently projected by a sentence-final particle.
15 In addition to this rule is the projection of the Topic position, associated with a clause-internal position through the use of a resumptive pronoun. We leave this process on one side here, as not pertinent to a wh-typology. Two alternatives present themselves. Either topic structures are an additional form of LINK structure, or they project an additional option for expansion from a node m requiring Ty(t), allowing a new node being introduced to be characterized as having a tree-node identified as m. Cf. note 13.
16 A restriction licensing only one unfixed constituent per task state, which is standardly imposed across languages, needs to be independently imposed. In some languages there is no such restriction, at least for wh-expressions (e.g., Bulgarian and Czech). In these languages, in which all wh-expressions may occur preverbally, the characterization of wh-expressions as simultaneously projecting a +Q feature at a task projecting a node of Ty(t) and a wh-formula at an unfixed node, with no preverbal restriction to a projection of but a single unfixed node, is sufficient to allow the preverbal position of all wh-expressions. Simpson (1995) reports this preverbal array of wh-expressions as obligatory, but he informs us that if construed as D-linked, the second wh-form may remain in situ in a postverbal position.
17 The problems posed by so-called partial wh-movement are especially acute for the minimalist program, within which no unitary account appears to be possible (cf.
Beck and Berman, 1996; Horvath, 1997, for recent advocacy of the two opposing "direct dependency" and "indirect dependency" accounts, both granting the necessity of the other form of account for some languages, and Simpson, 1995, for detailed evaluation of these problems). For a more detailed account of this phenomenon within this framework, compare Kempson et al. (in press).
18 Should it prove possible to argue that the expletive form is a VP clitic as in Iraqi Arabic (cf. section 5.3.1) this stipulation would not be needed.
19 We ignore here the extra complexity associated with the preposition mit and all additional complications needed to predict constructions in which the wh-expression is contained within a larger fronted constituent, as in the English pied-piping construction.
20 For dialects in which sequences of was expletives are obligatory, the phenomenon has to be defined as locally inducing a complement clause node with appropriate properties. Cf. Kempson et al. (in press) for discussion of this and the expletives using a non-was form of wh.
21 It is also quite unlike dynamic predicate logic, whose characterization of anaphoric dependency and wh-questions involves a characterization of content exclusively in model-theoretic terms projected from some discrete syntactic configuration defined over the syntactic string (about which the semantic formalism has nothing to say).
REFERENCES
Abusch, D. (1994). The scope of indefinites. Natural Language Semantics, 3, 88-135.
Aoun, J., and Li, A. (1991). Wh elements: syntax or LF? Linguistic Inquiry, 24, 199-238.
Barwise, J., and Perry, J. (1983). Situations and attitudes. Cambridge, MA: MIT Press.
Beck, I., and Berman, S. (1996). Wh-scope marking: direct vs indirect dependency. In U. Lutz and G. Müller (Eds.), Papers on Wh-scope marking: Proceedings of a workshop on The Syntax and Semantics of Wh-scope marking 1995 (pp. 59-83). University of Stuttgart.
Berwick, R., and Weinberg, A. (1984). The grammatical basis of linguistic performance. Cambridge, MA: MIT Press.
Blackburn, S., and Meyer-Viol, W. (1994). Linguistics, logic and finite trees. Bulletin of Interest Group in Pure and Applied Logics, 1, 3-29.
Cheng, L. (1991). On the Typology of wh Questions. Doctoral dissertation, Massachusetts Institute of Technology, Cambridge.
Chierchia, G. (1992). Questions with quantifiers. Natural Language Semantics, 1, 181-234.
Chomsky, N. (1981). Lectures on government and binding. Dordrecht: Foris.
Dayal, V. (1994). Scope marking as indirect wh-dependency. Natural Language Semantics, 2, 137-170.
Erteschik-Shir, N. (1973). On the nature of island constraints. Doctoral dissertation, Massachusetts Institute of Technology, Cambridge, MA.
Farkas, D. (1997). Indexical Scope. In A. Szabolcsi (Ed.), Ways of scope-taking. Dordrecht: Kluwer.
Gabbay, D. (1996). Labeled deductive systems. Oxford: Oxford University Press.
Gabbay, D. (1994). Classical vs. non-classical logics (the universality of classical logic). In D. Gabbay, C. Hogger, and J. Robinson (Eds.), Handbook of Logic in Artificial Intelligence and Logic Programming: Vol. 2 Deductive Methodologies (pp. 359-500). Oxford: Clarendon Press.
Higginbotham, J. (1981). Pronouns as variables. Linguistic Inquiry, 11, 679-708.
Horvath, J. (1997). The status of wh-expletives and partial wh movement construction of Hungarian. Natural Language and Linguistic Theory, 15, 507-571.
Huang, J. (1982). Logical relations in Chinese and the theory of grammar. Doctoral dissertation, Massachusetts Institute of Technology, Cambridge, MA.
Johnson, D., and Lappin, S. (1997). A critique of the minimalist programme. Linguistics and Philosophy, 20, 272-333. Johnson, D., and Moss, L. (1994). Grammar formalisms Viewed as Evolving Algebras. Linguistics and Philosophy, 17, 537-560. Joshi, A., and Kulick, S. (1997). Partial proof trees as building blocks for a categorial. Linguistics and Philosophy, 20, 637-667. Kamp, H. (1981). A theory of truth and semantic representation. In J. Groenendijk, R. Janssen, and M. Stokhof (Eds.), Formal methods in the study of language, mathematical centre tract 135 (pp. 277-322). University of Amsterdam. Kamp, H., and Reyle, U. (1994). From discourse to logic. Dordrecht: Kluwer Academic Publishers. Kaplan, R., and Zaenen, A. (1988). Long-distance dependencies, constituent structure, and functional uncertainty. In M. Baltin and A. Kroch (Eds.), Alternative conceptions of phrase structure (pp. 17-43). Chicago: University of Chicago Press. Kempson, R., and Gabbay, D. (1998). Crossover: a dynamic perspective. Journal of Linguistics, 34, 73-124.
FINITENESS AND SECOND POSITION IN LONG VERB MOVEMENT LANGUAGES: BRETON AND SLAVIC

MARIA-LUISA RIVERO
Department of Linguistics
University of Ottawa
Ottawa, Ontario, Canada
In this chapter,¹ I argue that Long Verb Movement (LVM) languages are characterized by (a) a PF interface condition on Tense (T) that mentions a Head-Complement configuration, and (b) a LVM process that fronts a nonfinite verb, and applies in PF to satisfy this condition. This has two typological consequences. The first is that unrelated languages such as Breton and Bulgarian share identical second-position effects for tensed Auxiliaries (Aux), and LVM constructions with an untensed V preceding a tensed Aux. The second is that LVM and Verb Second (V2) languages can both be said to exhibit second-position effects in main clauses, but nevertheless differ. The received view is that V2 involves two fronting operations that are syntactic and hence check features. I propose that LVM is a hierarchical fronting process of the PF branch that satisfies an interface condition on T (or a stylistic rule), and not a checking or syntactic operation.

The chapter is organized as follows. Section 1 outlines the system to satisfy the requirements of T in PF in LVM languages, which consists of two parts. Section 2 contrasts LVM and V2 languages on the basis of this system. Sections 3 and 4 discuss similarities and differences between Breton and Slavic languages with LVM, and between these LVM languages and Polish, a Slavic language without the LVM process.

Syntax and Semantics, Volume 32: The Nature and Function of Syntactic Categories
Copyright © 2000 by Academic Press All rights of reproduction in any form reserved. 0092-4563/99 $30
1. PF CONDITIONS ON TENSE

The central idea of this chapter is that in LVM languages the functional category T is subject to a bare output condition or PF requirement. This condition is configurational or hierarchical and not linear, does not mention formal features, and can be satisfied via two core syntactic structures: the Head-Complement configuration in (1a), or the Checking Configuration in (1b).
On this view, T may satisfy its output condition when it heads a TP that is the Complement of a C with certain PF characteristics. That is, T must be in the structure depicted in (1a), or in the Internal Domain in the sense of Chomsky (1995) of a C that is for now overt, but more precisely visible. This condition is formulated in (2), with H standing for head.

(2) H-Internal Domain Condition
    Satisfy the PF condition of T in the internal domain of a C visible in PF.

Alternatively, T may satisfy its PF requirement by appearing in a Checking configuration in the sense of Chomsky (1995), as when an overt V adjoins to T in the structure depicted in (1b). This condition is formulated in (3).

(3) H-checking Domain Condition
    Satisfy the PF condition of T in its H-checking domain.

The intuitive idea is that T requires overt support at the PF interface, and that this support can be supplied under two structural conditions: (a) by a head that is the sister of TP, the maximal projection of T, or (b) one that is the sister of T itself. The choice between these two structures gives rise to parametric variation, which distinguishes Breton from Slavic, and Slavic languages from one another.

First, differences between Breton and Slavic languages with LVM can be attributed to contrasts in the quantitative use of these two options, as follows. In Breton, T is usually licensed with condition (2) and the Head-Complement configuration in (1a), while condition (3) with the checking configuration in (1b) is used with just a few verbs. These two licensing options for T are also found in Slavic languages with LVM, but under different circumstances. The checking configuration in (1b) is used to license T with verbs and with lexical auxiliaries; condition (2) with the Head-Complement configuration in (1a) is used with just the subset of auxiliaries that are functional.
Breton and these Slavic languages, then, are similar in licensing T under condition (2) and the Head-Complement configuration, and
in sharing a LVM process with parallel properties that applies in PF triggered by this condition. In sum, LVM-languages use two structures to license T in PF under different circumstances. Breton uses condition (2) in general and (3) only exceptionally, while the Slavic languages with LVM use condition (2) just for functional auxiliaries. Second, the two options to license T in PF are also at the source of a parametric variation that distinguishes Polish from LVM languages in Slavic, and also Breton. Borsley and Rivero (1994) argue that Polish lacks LVM and displays raising of a nonfinite V to finite Aux (i.e., Incorporation forming a morphological complex). I suggest here that what makes Polish different is that it uses a variant of condition (3) to license T in PF with functional auxiliaries, which does not trigger LVM but an Incorporation or word formation rule that is stylistic (i.e., a hierarchical operation in the PF branch that does not check the formal features of the target against those of the raising V).
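The two licensing options in (2) and (3) can be pictured with a small sketch. The Python below is only an illustration under simplifying assumptions: the class `Clause`, its two boolean fields, and the function `t_is_licensed` are hypothetical names introduced here, not part of the chapter's formalism, and visibility of C is reduced to a single flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clause:
    """Toy PF snapshot of a tensed clause (hypothetical encoding)."""
    c_visible: bool        # TP is the complement of a C visible in PF, as in (1a)
    v_adjoined_to_t: bool  # an overt V has adjoined to T, as in (1b)


def t_is_licensed(clause: Clause, allowed_conditions) -> bool:
    """Return True if T meets at least one condition the language allows.

    `allowed_conditions` is a set containing 2 and/or 3, standing for the
    H-Internal Domain Condition (2) and the H-checking Domain Condition (3).
    """
    checks = {2: clause.c_visible, 3: clause.v_adjoined_to_t}
    return any(checks[n] for n in allowed_conditions)


# A tensed auxiliary restricted to condition (2):
bare_tp = Clause(c_visible=False, v_adjoined_to_t=False)
after_lvm = Clause(c_visible=True, v_adjoined_to_t=False)
print(t_is_licensed(bare_tp, {2}))    # False: no visible C above TP
print(t_is_licensed(after_lvm, {2}))  # True: LVM has filled C
```

Passing the allowed conditions as a parameter is a design choice meant to mirror the parametric variation described above: the same clause can be licensed in one language and not in another, depending on which condition is available for the tensed item.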
2. LONG VERB MOVEMENT VERSUS VERB SECOND

This section compares LVM and V2, deriving differences from the PF condition on T and the PF process to satisfy it in LVM languages.

2.1. Root Clauses

V2 languages exhibit a word-order asymmetry that distinguishes main from embedded clauses, as in (4).

(4) a. Dieses Buch hat Hans gelesen.   German
       'Hans has read this book.'
    b. Ich bedauere [dass Hans dieses Buch gelesen hat].
       'I regret that Hans has read this book.'
    c. Hat Hans dieses Buch gelesen?
       'Has Hans read this book?'

The classic analysis is that the (finite) head raises to C in (4a), while it does not in (4b) (den Besten, 1977, to Branigan, 1996), the V-in-C part of V2. Second, if V is in C, Spec-of-C must also be filled, and a common assumption is that a covert operator fills that position in (4c). LVM-languages contrast with V2-languages in two respects. First, a nonfinite V raises to C in LVM. Consider (5-6) from this perspective.

(5) a. Lavaret en deus [he deus desket he c'henteliou].   Breton
       said 3S have [3S have learned her lessons]
       'He has said that she has learned her lessons.'
    b. *En deus lavaret [he deus desket he c'henteliou].
    c. *Lavaret en deus [desket he deus he c'henteliou].

(6) a. Spytal som sa [chi si napisal list].   Slovak
       asked have+1S REFL [if have+2S written letter]
       'I have asked if you have written a letter.'
    b. *Som sa spytal [chi si napisal list].
    c. *Spytal som sa [chi napisal si list].
In (5a-6a), V precedes Aux in the main clause, and follows it in the subordinate clause. The proposal is that V fronts to C in main clauses through a LVM process² that does not apply in the majority of subordinate clauses, which is similar to V-to-C in V2. Second, Spec-of-C must be phonologically empty if the nonfinite V is in C (Rivero, 1993a), which can be illustrated with multiple Wh-movement and LVM, as in (7).

(7) a. Koga kakvo e kupil?   Bulgarian (Rudin, 1986:(81b))
       when what have+3S bought?
       'When has he bought what?'
    b. *Koga kakvo kupil e?
    c. Kupil li e knigata?
       bought Q have+3S book+the
       'Has he bought the book?'
For Rudin (1988), Wh-phrases in Bulgarian move to Spec-of-C, which can hold several phrases as in (7a). However, (7b) shows that V cannot front with those phrases, and (7c) indicates that it fronts in interrogatives with no overt Spec-of-C (Rivero, 1993b). Thus, if Spec-of-C is filled with one or more phrases, C cannot hold the untensed V. The same is found in declaratives, as Breton (8-9) illustrate.

(8) a. *Al levr lennet en deus Tom.   Breton
        the book read 3S have Tom
    b. *[CP NPₖ [C′ [C⁰ Vᵢ] [TP Aux [VP ... tᵢ ... tₖ]]]]

(9) a. Al levr en deus lennet Tom.
       the book 3S have read Tom
       'Tom has read the book.'
    b. [CP NPⱼ [C′ [C⁰ ∅] [TP Aux [VP ... tⱼ ... ]]]]
    c. Lennet en deus Tom al levr.
       'Tom has read the book.'
    d. [CP [C′ [C⁰ Vᵢ] [TP Aux [VP ... tᵢ ... ]]]]

For Borsley, Rivero, and Stephens (1992), (8) combines Topicalization and LVM, and hence has material in both Spec-of-C and C, while (9a-c) contain only one overt constituent in CP. They also argue that lennet in (9c) is in C, and distinguish between LVM and VP-topicalization as in (10). Borsley and Stephens (1989: sect. 6) argue that (finite) auxiliaries or verbs are in I=T.

(10) O lenn al levr n' eman ket Tom.   Breton
     PRT read the book Neg is Neg Tom
     'Tom is not reading the book.'
     'Reading the book, Tom is not.'
To conclude, a hallmark of V2 languages is root clauses where C is filled by a finite V, and Spec-of-C is also filled. The hallmark of LVM-languages is (a subset of) main clauses with a C filled with the nonfinite V, and a Spec-of-C that is empty of phonological material.

2.2. Nonroot Clauses

Germanic embedded V2 is not homogeneous, which has attracted much attention (Iatridou and Kroch, 1992, for an overview). In standard Dutch, V2 is limited to main clauses. In German and mainland Scandinavian, it is possible in a limited range of subordinate clauses. In Icelandic and Yiddish, it is acceptable in a wide range that includes adjunct and subject clauses. For some, including Vikner (1991), embedded V2 results from CP-recursion as in (11a): C1 takes CP2 as complement, Spec2 holds a phrase, and C2 holds the embedded tensed V. For others, including Diesing (1990), embedded V2 can also result from V in I, and any type of phrase in Spec-IP, as in (11b).

(11) a. V [CP1 C1 [CP2 Xᵐᵃˣᵢ [C′ [C2 Vⱼ] [IP ... tⱼ ... tᵢ ... ]]]]
     b. V [CP C [IP Xᵐᵃˣᵢ [I′ [I Vⱼ] [VP ... tⱼ ... tᵢ ... ]]]]

Iatridou and Kroch (1992) relate the restricted embedded V2 of mainland Scandinavian to CP-recursion licensed by a governing V as in (11a). Unrestricted embedded V2, as in Yiddish and Icelandic, involves IP as in (11b).

Embedded LVM also exists but is not homogeneous. There are two groups of languages. One includes Serbo-Croatian and Czech and resembles standard Dutch, with LVM only in main clauses; in these languages, untensed Vs do not precede tensed Aux in subordinate clauses. A second type resembles mainland Scandinavian and includes Old Spanish. Lema and Rivero (1991:254-257) show that this language allows LVM in the complement of bridge Vs, and argue that this results from CP-recursion: the untensed V fronts to C2 in a structure like (11a).
I know of no language with LVM in a wide range of subordinate clauses, so the third group with unrestricted embedded LVM corresponding to Icelandic and Yiddish seems not to exist. If the landing site of LVM is C, this situation can be accounted for. On this view, LVM is restricted to the type of embedded clause with CP-recursion, which contains a landing site for the untensed V. Then, the absence of embedded LVM in Serbo-Croatian or Czech means that there is no
CP-recursion as in standard Dutch, and restricted embedded LVM in Old Spanish indicates that there is CP-recursion, as in mainland Scandinavian or German. In sum, some languages show LVM in declarative complements embedded under bridge verbs, which fits well with the idea that V raises to C in LVM, and other languages disallow embedded declaratives with LVM because they disallow CP-recursion.

Breton and Bulgarian lack LVM embedded under bridge Vs, but display interrogative complements with this process, as in (12).

(12) a. N' ouzon ket [ha lennet en deus Tom al levr].   Breton
        Neg know+1S Neg [Q read 3S have Tom the book]
        'I do not know whether Tom has read the book.'
     b. Ne znam [prochel li e Petur knigata].   Bulgarian
        Neg know+1S [read Q have+3S Peter book+the]
        'I do not know whether Peter has read the book.'

These patterns may look surprising, as embedded V2 is not usually restricted to interrogative complements, and most Germanic languages disallow V2 under question Vs. I discuss this in section 3, and argue that the hypothesis that LVM is a process that satisfies a PF interface condition of a functional category can provide an account of these data.

2.3. Explanations for V2 and LVM

Let us recall the main account of V2 to contrast it with the one proposed here for LVM. For Vikner (1991:2.2 and references), the classical view is that a finite feature in C triggers V-in-C in V2, an idea found most recently in Branigan (1996). A topic/wh-feature is often seen as the reason to fill Spec-of-C overtly. The two aspects of V2, then, result from internal conditions of the syntax or feature checking. Finiteness is checked via raising to C, and topichood via raising to Spec-of-C.

In my view, LVM phenomena are of a different nature. The core idea is that LVM-raising contrasts with V-raising in V2 in that it operates to satisfy a condition of the PF-interface (what Chomsky (1995) calls a bare output condition), not to check a formal feature.
As LVM satisfies an external condition and is not a checking operation, it can (a) apply in the PF branch and (b) have an output that does not establish a checking configuration in the sense of Chomsky (1995). However, LVM is a hierarchical not a linear process, and comparable to the operations called stylistic in Chomsky (1995) and Chomsky and Lasnik (1977), and the PF-driven rules in Zubizarreta (1995) and Reinhart (1995). In other words, LVM applies in PF but has "syntactic" and not "phonological" characteristics.

With this proposal in mind, let us look at differences between LVM and V2 beginning with V-in-C. My claim is that LVM resembles V2 in that finiteness, or T, constitutes a trigger for both processes: the untensed V raises to C to satisfy a condition of T in TP. The differences are (a) that LVM is not a checking operation, and (b) that LVM can establish a Head-Complement configuration, and thus is not limited to the checking configuration required of syntactic Move. Consider the analysis proposed for (13), which repeats (9) with TP replacing IP.

(13) a. *En deus lennet Tom al levr.
         3S have read Tom the book
     b. Lennet en deus Tom al levr.
     c. [CP [C′ [C⁰ Vᵢ] [TP Aux [VP ... tᵢ ... ]]]]

The auxiliary en deus (T) imposes the interface condition (2) that mentions the structure in (1a): it must appear in PF in the complement of a head that is visible. This PF condition can be met in a variety of ways examined in section 3, but the LVM process satisfies it: it fills C with V, which makes TP the complement of an overt C. On this view, LVM results in the Head-Complement configuration depicted in (1a), which is similar to the output of Merge in the computation, and contrasts with the output of Move. That is, LVM does not establish a checking configuration between the verb and the auxiliary containing T, in contrast to what is required for syntactic Move. In my view, LVM can establish the output in (1a) because it does not check features, and in Rivero (in prep) I develop the idea that this output corresponds to the case where the category affected by the process (V) keeps on projecting once it raises, and the category that serves as target/attractor in the movement (Aux) does not further project. By contrast, in the case of syntactic Move, the target projects and the moved category does not, and this ensures that a checking configuration is established where formal features can be checked in the sense of Chomsky (1995).

The availability of two configurations to license T in PF suggests that interface conditions use the same structures as the conditions of the internal system.
On the one hand, the interface condition in (3) is based on the configuration used to license (i.e., check) formal features in the system, which most often results from Move and not Merge. On the other hand, the interface condition in (2) is based on the Head-Complement configuration, which in the computation is used for Theta relations and Selection and results from Merge. In sum, PF offers a choice because the two configurations for internal requirements are also used for interface or external requirements.

The hypothesis that LVM satisfies an interface condition on T based on the Head-Complement configuration derives a major syntactic difference between yes-no questions in LVM and V2-languages. We saw that in V2 languages finite Auxiliaries can be string initial in yes-no questions. LVM languages offer several syntactic strategies for such questions, but disallow string-initial Auxiliaries, as Slovak illustrates. In Slovak, LVM applies in the same way in declaratives and yes-no questions.
(14) a. *Si napisal list?   Slovak
         have+2S written letter
     b. Napisal si list?
        written have+2S letter?
        'Have you written the letter?'

The difference between German and Slovak results from the PF condition on T. Fronting the untensed V to C is independent of illocutionary factors, but ensures that at PF the tensed Aux heads the Complement of an overt C. Yes-no questions may contain an unpronounced operator in Spec-of-C, but I show in section 3 that condition (2) can be satisfied only if overt material and not unpronounced material fills Spec-of-C.

The second aspect of LVM is that Spec-of-C must be phonologically empty if the nonfinite V is in C. This means that V-raising in V2 applies in all main clauses, while LVM operates only in a subset of them, and is in complementary distribution with movement to Spec-of-C. This difference follows from the proposal that LVM is a PF process that applies to satisfy an interface condition, and not to check formal features. LVM becomes unnecessary and in fact is blocked if Topicalization or wh-movement establish a configuration that is a legitimate PF object that satisfies the requirements of T. Under the standard assumption that Topicalization/wh-movement apply for feature checking, these rules are obligatory when the appropriate formal feature is present in the structure; thus they block LVM, which is not triggered by a feature. LVM is triggered by the need to satisfy an interface condition of T, plays no internal role in the system, and will not operate once feature-checking frontings succeed in establishing the configuration that satisfies T's requirement. In this way LVM has a "last resort" flavor in the sense of Chomsky (1991) and Epstein (1992), but at the same time is a process that violates the Last Resort Principle in the sense of Chomsky (1995). In section 3, I argue that this difference as to feature checking between LVM in PF and syntactic rules accounts for contrasts as to second- and third-position effects in Breton.
The core idea is that feature-checking rules each check a (different) formal feature and can co-occur, and the auxiliary in T may end in third position. By contrast, LVM is a process which usually leaves an auxiliary in second position; it applies in PF only if fronting in the syntax does not establish the configuration to license T. Chart (15) summarizes the differences between LVM and V2 languages. First, with LVM finiteness is located in T and imposes a PF interface condition, but with V2 finiteness is located in C and imposes an internal condition. V raises to C in both V2 and LVM, but in V2 finite raising is a checking or syntactic operation, and in LVM nonfinite raising is a PF or stylistic rule with a configurational output that moves V to C to satisfy the external condition of T in TP. Second, V-to-C in V2 combines with other feature-checking processes that fill Spec-of-C, with each rule playing a different role in the system. By contrast, LVM is incompatible with
feature-checking operations that fill Spec-of-C in the syntax, given that those rules establish the appropriate configuration to license T in PF.

(15)
                                  LVM languages          V2 languages
    T in Complement of visible C  Yes                    No
    V-to-C trigger                PF condition           Last Resort
    Content of C in PF            Untensed V             Tensed V
    Content of Spec-of-C in PF    Empty (if C filled)    Usually filled
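The complementary distribution of LVM and fronting to Spec-of-C described in this section can be sketched as a simple decision procedure. This is a minimal illustration, not the chapter's own formalization: the function name `pf_outcome` and its parameters are hypothetical, and visibility of C is equated with overt material, as the text does at this stage.

```python
def pf_outcome(spec_c_overt: bool, nonfinite_v: bool) -> str:
    """Toy model of how T's condition (2) is met in an LVM language.

    spec_c_overt: a feature-checking rule (Topicalization, wh-movement)
                  has filled Spec-of-C in the syntax.
    nonfinite_v:  the clause contains an untensed V that LVM could front.
    """
    if spec_c_overt:
        # Syntax already supplies the configuration; LVM is blocked.
        return "licensed in syntax; LVM blocked"
    if nonfinite_v:
        # Nothing overt above TP: LVM fronts the untensed V to C in PF.
        return "LVM applies; licensed in PF"
    # No fronted phrase and no untensed V to front.
    return "T unlicensed"


print(pf_outcome(spec_c_overt=True, nonfinite_v=True))   # pattern of (9a)
print(pf_outcome(spec_c_overt=False, nonfinite_v=True))  # pattern of (9c)
```

Checking the syntactic option first is meant to reflect the "last resort" flavor of LVM: the PF process is considered only when the feature-checking frontings have not already produced a configuration that satisfies T.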
2.4. Differences in LVM Languages

LVM languages are not identical, and two differences related to T deserve mention in this section. The first is that Breton is VtensedSO like the Celtic languages (Anderson and Chung, 1977; Borsley and Stephens, 1989; Schafer, 1992), while other LVM languages are not. The contrast is observed in embedded clauses where the rigid VtensedSO of Breton contrasts with flexible order in Slavic, including SVO as in Bulgarian (16b).

(16) a. Mona a lavar [e oar Yann ar respont].   Breton
        Mona PRT says PCL knows Yann the answer
        'Mona says that Yann knows the answer.'   (Schafer, 1992: (4))
     b. Petur znaex [che decata vidjaxa knigata].   Bulgarian
        Peter knew that children+the saw book+the
        'Peter knew that the children saw the book.'

Clause-initial finite Vs characterize VSO languages, as illustrated with Welsh (17). Breton belongs to the VSO type, so it may appear surprising that tensed items cannot be first in root clauses in this language, as illustrated in (18).

(17) Gwelodd Emrys ddraig.   Welsh
     saw Emrys dragon
     'Emrys saw the dragon.'

(18) *Lenn Anna al levr.   Breton
      reads Anna the book
      '*Anna reads the book.'
I attribute the Breton restriction to the PF requirement on T, which is absent in other VtensedSO Celtic languages: TP must be the Complement of a visible head. For the moment visible is equivalent to overt. As a result, VtensedSO order is reserved in Breton for environments where this PF condition is satisfied, which include the embedded clause in (16a). In brief, Breton is V-initial and the Slavic languages are not, but the restriction on T makes the Celtic language resemble the Slavic languages.
A second difference concerns finite verbs versus auxiliaries. We just saw that Breton disallows sentence-initial auxiliaries and verbs: (13a) and (18). In West and South Slavic, by contrast, auxiliaries traditionally known as clitics cannot be sentence-initial (14), but tensed verbs and lexical auxiliaries can be, which is discussed in section 4:

(19) Vidjaxa (decata) knigata (decata).   Bulgarian
     saw (children+the) book+the
     'The children saw the book.'
I attribute this contrast to the PF licensing system for T summarized in (20). (20)
Tense-licensing in PF in LVM Languages

                V-raising to T      TP as complement
    Language    V        Aux        V        Aux
    Breton      No       No         Yes      Yes
    Slavic      Yes      No         No       Yes
In Breton T is licensed in the Head-Complement configuration in almost all cases. In Slavic, this structure is reserved for auxiliaries, and the Checking configuration in (1b) is operative with verbs. On this view, Breton T must appear in PF in a complement, which means that finite verbs and auxiliaries cannot be sentence initial. In Slavic, T in verbs is licensed in the configuration in (1b), which means that tensed Vs can be sentence-initial as in (19), and hence need not be in a projection that is a complement.
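The licensing system in (20) can be read as a lookup from language and tensed item to the configuration that licenses T, from which the first-position facts follow. The sketch below is a hedged illustration: the dictionary `LICENSING` and the function `can_be_sentence_initial` are hypothetical names, and the encoding deliberately ignores the few Breton verbs that use the checking configuration and the Slavic lexical auxiliaries mentioned earlier.

```python
# Configuration that licenses T in PF, following table (20).
LICENSING = {
    ("Breton", "V"): "TP as complement",
    ("Breton", "Aux"): "TP as complement",
    ("Slavic", "V"): "V-raising to T",
    ("Slavic", "Aux"): "TP as complement",
}


def can_be_sentence_initial(language: str, item: str) -> bool:
    """A tensed item may be clause-initial only if T is licensed by
    V-raising to T; otherwise TP must be the complement of a visible head."""
    return LICENSING[(language, item)] == "V-raising to T"


for key in LICENSING:
    print(key, can_be_sentence_initial(*key))
```

On this encoding only Slavic tensed verbs come out as possible in first position, which matches the contrast between Bulgarian (19) and the Breton and Slovak patterns in (13a), (14a), and (18).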
3. BRETON

In this section, I argue that Breton T is licensed mainly via the internal domain condition in (2) and hence in the configuration in (1a), which is the cause of second- and third-position effects with most tensed items. The checking configuration in (1b) and (3) is used with a few Vs, leading to some first position effects. The preference for condition (2) and LVM in PF make Breton contrast with other Celtic languages and resemble the Slavic languages.

3.1. The H-Internal Domain and Second Position

The H-internal domain condition in (2) accounts for the second position of most tensed items in Breton. This principle requires that tensed root clauses have a layer above TP, which is identified here with CP, and frontings in syntax or PF ensure the projection of this level.
First consider perfect auxiliaries. Affirmative root constructions may satisfy the H-internal domain condition by a phrase in Spec-of-C, or a V that fills C through LVM.³

(21) a. Al levr en deus lennet Tom.   Breton
     b. [CP NP [C′ [C⁰ ] [TP A
     c. Lennet en deus Tom al levr.
     d. [CP [C′ [C⁰ Vⱼ] [TP Aux [VP NP tⱼ NP]]]]
        'Tom has read the book.'
If a phrase moves in the overt syntax to Spec-of-C, as in (21a), it signals that C must be projected before Spell-Out, which makes the position visible in PF. If it is assumed that a covert/LF movement raises an abstract operator in yes-no questions, under our approach this will not make C visible. If V moves to C as in (21c-d), this head contains overt material.

Now consider synthetic Vs, as in (18) repeated in (22a). With just a tensed V, the option with V in C fails to occur. If V moves to C, it does not head a complement and violates the internal domain condition. A V in T is also impossible with no overt material in CP, as TP is not a complement, or complements a C that is not visible in PF. The derivation with fronting of a phrase, as in (23), complies with the requirement on C.

(22) a. *Lenn Anna al levr.   Breton
     b. *[CP [C′ [C⁰ Vᵢ] [TP tᵢ NP NP]]]
     c. *[TP Vᵢ [VP NP tᵢ NP]]

(23) a. Al levr a lenn Anna.
        the book PCL read+PRESENT Anna
        'Anna reads the book.'
     b. [CP [Spec-C NPⱼ] [C′ [C⁰ ∅] [TP a Vₖ [VP NP tₖ tⱼ]]]]

(22a) is deviant and Stump (1984:298) notes that sentences with sentence-initial particles are also ill formed: *A lenn Anna al levr. Following Stephens (1982), Stump (1984, 1989), and Schafer (1992), I assume that the particle is in T, not in C (contra Hendrick, 1991), which means that sentences with initial particles are ruled out for the same reason as (22): they lack a filled (i.e., overt) C.

Consider ober 'do' as tense carrier (Borsley, Rivero, and Stephens, 1986; Schafer, 1994; Stephens, 1982; Wojcik, 1976). Ober is inflected for Tense and takes a VP complement with V raising to C by LVM, as in (24a). The untensed V is the visible head required by the H-internal domain condition to license T.

(24) a. Lenn a ra Anna al levr.   Breton
        read PCL do+PRESENT Anna the book
        'Anna reads the book.'
     b. [CP ∅ [C Vᵢ] [TP a Aux [VP NP tᵢ NP]]]
Thus, Breton sentences may come in analytic or synthetic versions. In the synthetic construction in (23), the lexical V is inflected for Tense, and Xᵐᵃˣ-fronting to Spec-of-C applies and makes C visible. In the analytic construction as in (24), Aux is inflected for Tense, and the lexical V is the head of its nonfinite complement. In this situation, LVM or Xᵐᵃˣ-fronting, as in Al levr a ra Anna lenn, are the two alternative options to license T.

Consider now negation. Initial ne in Breton ne ... ket counts for first position, as in (25):

(25) a. N' en deus ket lennet Tom al levr.   Breton
        neg 3S have neg read Tom the book
        'Tom has not read the book.'
     b. Ne lenn ket Anna al levr.
        neg reads neg Anna the book
        'Anna does not read the book.'

Here Breton resembles other Celtic languages and some of the Slavic languages, as in (26).

(26) a. Nid ydyw Megan ddim yn cysgu.   Welsh
        neg is Megan neg in sleep
        'Megan is not sleeping.'
     b. Ne sum prochel (az) knigata.   Bulgarian
        Neg have+1S read (I) book+the
        'I have not read the book.'

For some, Celtic Neg occupies C (Chung and McCloskey, 1987; Rouveret, 1991), an idea adopted for Breton in Hendrick (1988, 1991) and Borsley, Rivero, and Stephens (1996). Given this assumption, Neg licenses T as the overt head in C that takes TP as complement, as in (27).

(27) [CP [C Ne] [TP [T lenn] [ket Anna al levr]]]
Now consider subordinate clauses, as in (28). These do not contain overt complementizers, have the Aux in first position, and disallow LVM.

(28) a. Lavaret he deus Anna [en deus lennet Tom al levr].   Breton
        said 3S have Anna 3S have read Tom the book
        'Anna said that Tom had read the book.'
     b. V [CP [C′ [C⁰ ∅] [TP en deus [VP lennet Tom al levr]]]]

If such clauses are CPs with a null C as in (28b), C is visible in PF due to selection. That is, the main V contains a feature encoding the type of clause it selects (i.e., declarative), and a similar feature is found in the embedded C. When V is combined with a clause via Merge, the feature in V and the one in C must match, and this information is available in the remainder of the derivation including PF, and makes C visible. In (28), then, TP is the complement of a nonovert but visible C. If V in fact takes a TP complement and there is no CP, V is the visible head that licenses T in PF.

The Internal Domain Condition is an interface principle for Tense, as the following cases demonstrate. First, the LVM construction in (29) shows that nonfinite auxiliaries can be initial, so there is no prohibition against auxiliaries in first position.

(29) Bet am eus kavet al levr.   Breton
     had 1S have found the book
     'I have found the book.'

Second, imperatives lack T (Beukema and Coopmans, 1989; Zanuttini, 1991; Rivero, 1994a), show Agr (Person and Number), and can be initial as in (30). This shows that verbs with Agr and no T need not be in a complement.

(30) Sent ouzh da vamm!   Breton
     obey+2S to your mother!   (Schafer, 1992: (38))
     'Obey your mother!'

Third, finite Vs show T and Agr with null subjects, but only T with overt subjects (Anderson, 1982; Borsley and Stephens, 1989; Hendrick, 1988; Schafer, 1992; Stump 1984, 1989). Regardless of Agr, however, finite verbs must appear in a complement, as in (31), and cannot be initial. This shows that the Internal Domain Condition is sensitive to T and disregards Agr.

(31) a. Levriou a lennont.   (Stump, 1984)
        books PCL read+3P
        'They read books.'
     b. Levriou a lenn (*lennont) ar vugale.
        books PCL read+3S (*read+3P) the children
        'The children read books.'

In sum, in Breton T is licensed when TP is the complement of a visible head. A head is visible (a) if filled overtly, as in LVM, (b) if its Spec is filled overtly, as in Topicalization, or (c) if its projection is selected, if we assume that embedded declaratives have a null C.

3.2. The H-Internal Domain and Third Position

Two types of third-position effects result from the internal domain condition, as when (a) feature-checking rules co-occur, or (b) an initial constituent is in a projection that does not have TP as its internal domain.
Let us begin with checking operations by recalling that LVM applies to license
T, and will not operate when a Spec-of-C is filled in the syntax: LVM does not co-occur with Topicalization. Checking operations must apply obligatorily to check features, so they may combine in ways that leave the auxiliary in third position, which can be illustrated with Neg-fronting and Topicalization. Under the assumption that Neg originates within the clause (Borsley et al., 1996), Neg-fronting operates to check a strong [+neg] feature in C. The standard assumption is that Topicalization checks a Top/Focus feature in Spec-of-C. When these two rules combine, as in (32), Aux is in third position.

(32) a. Al levr n' en deus ket lennet Tom. Breton
        the book neg 3S have neg read Tom
        'Tom has not read the book.'
     b. [CP Al levr [C' [C° n'] [IP en deus ket lennet Tom]]]

The purpose of LVM in PF is to make C visible. Raising Neg to C in the syntax also has this effect, which is why LVM does not combine with Neg-fronting (Stephens, 1982), as in (33). In other words, LVM is usually seen with an auxiliary in second position.

(33) *Lennet n' en deus ket Tom al levr. Breton
      read neg 3S have neg Tom the book
One exception to this is the auxiliary in third position in left-dislocations and yes-no questions. First consider left-dislocated phrases, which do not usually "count" for first position, in contrast with topicalized phrases, which usually do. This characteristic is illustrated in Breton (34): the dislocated phrase is followed by a V fronted by LVM and the auxiliary in third position. That topicalized phrases differ is seen in (35): they disallow LVM or count as first position:

(34) a. Yann, roet meus al levr dezhan. Breton
        Yann, given 1S+have the book to+him (Schafer, 1992: (44b))
        'Yann, I've given the book to him.'
     b. [TOPP NP [CP [C' [C° Vi] [TP Aux [VP ti NP PP]]]]]

(35) a. *Al levr lennet en deus Tom. Breton
         the book read 3S have Tom
     b. *[CP NPk [C' [C° Vi] [TP Aux ti ... tk]]]

I account for this contrast using the Left Dislocation analysis of Chomsky (1977) updated in Lasnik and Saito (1992:76ff): such constructions contain a projection called here TOP(ic)P with the base-generated dislocated phrase as Spec and a null head.

(36) [TOPP Xmax [TOP' [TOP ∅] [CP [C' C° TP]]]]
On this analysis, C prevents TOP from having TP as its internal domain. TP is in the complement domain of TOP, but not its minimal complement. In (34), then,
LVM places V in C to license T. Topicalization, by contrast, is a movement to Spec-of-C that establishes in the syntax the internal domain for T, so it blocks LVM.

Now let us consider yes-no questions such as (37), where the question particle ha is followed by sonjal 'think' fronted by LVM, the particle a, and Aux 'do', which means that they are similar to Left-Dislocations. Wh-phrases are like topic phrases and incompatible with LVM as in (38) (ha is not used in wh-questions).

(37) Ha sonjal a raint er bleuniou? (Desbordes, 1983:84)
     Q think PCL do+Fut+3PL the flowers
     'Will they think of the flowers?'

(38) a. Piv en deus lennet al levr? Breton
        who 3S have read the book?
        'Who has read the book?'
     b. *Piv lennet en deus al levr?

Ha heads a phrase dubbed Q(uestion)P as in (39) and takes a CP-complement that Borsley et al. (1996) assume it does not L-mark.
Alternatively, ha as functional head does not select a specific type of complement. This means that the question particle can be merged in the computation with a complement that does not share an equivalent feature standing for, roughly, illocutionary force: ha is [+Q], but the C that projects CP in (39) does not contain this feature. The assumed lack of selection between ha and C prevents C from being visible in PF unless either it or its Spec is filled overtly. In structures such as (39), then, LVM fills C so that TP can be the complement of a visible head.

We noted in section 2 that embedded LVM is found in ha-interrogative complements, as in (12) repeated now as (40a). Such complements also allow Topicalization, as in (40b).

(40) a. N' ouzon ket [ha lennet en deus Tom al levr.] Breton
        Neg know+1S Neg [Q read 3S have Tom the book]
     b. N' ouzon ket [ha al levr en deus lennet Tom.]
        Neg know+1S Neg [Q the book 3S have read Tom]
        'I do not know whether Tom has read the book.'
The analysis given for main clauses can account for these embedded questions. If ha is merged with an unselected CP-complement, Xmax-fronting to Spec-of-C in the syntax (Topicalization) or LVM to C in PF are two procedures that make C visible to license T in PF. Thus, the interface condition on T that mentions the Head-Complement configuration is ultimately responsible for LVM in clauses that contain the question particle ha, whether they are embedded or not, and accounts for verb fronting to C in syntactic environments where, as noted in section 2, V2 phenomena are usually absent in Germanic.⁴

In sum, third-position effects result in Breton from checking operations that combine, as when Topicalization co-occurs with Neg-fronting. These two operations block LVM in PF. They also result when a first constituent is in a projection that does not have TP as its internal domain, as with dislocations and ha-questions, in a structural situation that allows LVM.

3.3. The H-Checking Domain and First Position

In Breton root clauses, first-position effects with tensed Vs are lexically determined. Aspectual mont 'go' and eman 'be' are the two Auxiliaries that can be sentence-initial, or head clauses that are not complements. Eman takes either a PP or a progressive complement, as in (41).

(41) Eman Yann o lenn al levr. Breton
     Is Yann PROG read the book
     'Yann is reading the book.'

I suggest that these entries have a feature that allows them to satisfy all the requirements of T, including its PF-condition. One way to implement this idea is that mont and eman raise from an internal position in the clause, adjoin to T, and satisfy the PF-interface condition of T via the H-checking Domain Condition, so they can be sentence-initial.

3.4. Breton Clitic Pronouns

In some languages, an internal domain condition similar to (2) serves to license functional categories like D (clitic pronouns); this gives rise to pronouns in second position, as in Slavic.
In Breton condition (2) is restricted to T and does not apply to D. Breton clitic pronouns do not impose an interface condition, but adopt the characteristics of the verb they are attached to. Consider (42).

(42) a. E desket en deus Yann. Breton
        him taught 3S have Yann
        'Yann has taught him.'
     b. [CP [C' [C° [V° CL [V° V]]i] [TP Aux [VP NP ti]]]]
     c. *E deskas Yann. Breton
        him taught Yann
        '*Yann taught him.'
     d. *[TP [T° [V° CL [V° V]]i] [VP NP ti]]
When clitic pronouns are attached to untensed Vs as in (42a), they can be initial, appearing in a projection that is not a complement. This sentence involves LVM of the untensed V with the pronoun. This verb does not contain T, and since the pronoun imposes no interface condition of its own, the two share the initial position. By contrast, when clitics attach to tensed Vs they must be within a complement projection and cannot be initial: (42c). The finite verb deskas hosts a preceding clitic and contains T, which means that it must head a complement, but it does not, so the sentence is deviant. In brief, clitic pronouns can be initial only when attached to nonfinite verbs.

Breton shows a dichotomy between T and D that has no counterpart in Slavic. In Breton, T must be in an internal domain, but D need not be, which means that Wackernagel phenomena arise with finite verbs and auxiliaries, but not with clitic pronouns. In Slavic, Wackernagel phenomena appear with some finite auxiliaries and with clitic pronouns. To bring these Breton and Slavic phenomena under a common umbrella, I suggest that the fundamental idea is that certain functional categories (as opposed to clitics) must satisfy PF interface conditions and hence that this variation can be attributed to functional categories. This means that I do not emphasize the notion "clitic" when discussing second position phenomena. Under my view, Breton and Slavic share the licensing system for the functional head T but not the licensing system for functional D, and I maintain that tensed Aux in Slavic or Breton can be considered "clitics" because they contain a T that imposes a bare output condition, while untensed Aux are not clitics because they lack the relevant T. On this view, which allows one to unify Breton and Slavic phenomena, "clitic" becomes a derivative, not a primitive, notion.
If "clitic" were the crucial notion, the curious conclusion would be that all Breton tensed Vs and Aux are "second-" position clitics, and that only those Breton pronouns that are attached to tensed Vs are second-position clitics. As stated, my proposal is that T in Breton and Slavic must satisfy similar interface conditions, and those resemble the ones imposed by D in Slavic.

3.5. Summary

Breton is head-initial, which means that no constituent precedes T within TP, but the PF interface condition called the H-Internal Domain Condition requires that TP be the complement of a visible C. This provides a unitary account for the "second-" and "third-" position of finite heads in Breton root clauses. We have seen that the finite auxiliary/verb in TP is in second position in (a) affirmative wh-questions, which have the wh-phrase in Spec-C, (b) affirmative topicalizations
with the topic in Spec-C, (c) negative clauses with ne in C, and (d) LVM constructions with the untensed V in C. Finite auxiliaries/verbs are third when feature-checking rules co-occur as in (e) negative wh-questions with the wh-phrase in Spec-C and ne in C, and (f) negative topicalizations, with the same characteristics. Third position is also possible with an initial constituent above CP as in (g) dislocations with the dislocated phrase in TOPP, and (h) questions with ha, which resemble dislocations. The verbs in first position are mont and eman, and they license T in PF via the Checking Configuration or (3).
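The visibility conditions and position effects summarized for Breton can be restated as a small executable sketch. This is only an illustration under simplifying assumptions, not part of the analysis itself: the names `CHead`, `c_is_visible`, `t_is_licensed`, and `finite_head_position` are hypothetical, and the position count simply adds up overt items that precede the finite head.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CHead:
    """Hypothetical record of the C head that takes TP as its complement."""
    head_filled: bool = False  # overt material in C (fronted V under LVM, or Neg)
    spec_filled: bool = False  # overt phrase in Spec-of-C (topic or wh-phrase)
    selected: bool = False     # CP selected by a higher V (embedded declarative)


def c_is_visible(c: CHead) -> bool:
    # A head is visible if it is filled, its Spec is filled, or it is selected.
    return c.head_filled or c.spec_filled or c.selected


def t_is_licensed(c: CHead, checking_domain: bool = False) -> bool:
    # T is licensed either in a checking domain (assumed here for mont and eman)
    # or when TP is the complement of a visible C.
    return checking_domain or c_is_visible(c)


def finite_head_position(c: CHead, overt_items_above_cp: int = 0) -> int:
    # Toy linear count: every overt item before TP pushes the finite head back.
    return 1 + overt_items_above_cp + int(c.spec_filled) + int(c.head_filled)


if __name__ == "__main__":
    cases = {
        "LVM": (CHead(head_filled=True), 0),
        "Topicalization + Neg-fronting": (CHead(spec_filled=True, head_filled=True), 0),
        "Left dislocation + LVM": (CHead(head_filled=True), 1),
        "Embedded declarative": (CHead(selected=True), 0),
        "Bare finite Aux in root clause": (CHead(), 0),
    }
    for label, (c, above) in cases.items():
        print(label, t_is_licensed(c), finite_head_position(c, above))
```

On these assumptions, the bare root clause is the only case in which T is not licensed, and the dislocation and ha-question cases come out as third position because one overt item sits above CP.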
4. SOUTH AND WEST SLAVIC

The aim of this section is to establish that South and West Slavic share with Breton the PF licensing system for T. Slavic languages that participate in this system, which need not be identical in other respects, include Bulgarian, Czech, Serbo-Croatian, and Slovak. Polish participates in this licensing system in a way that differs, and is discussed as one example of parametric variation on the Head-Complement and Checking configurations.

We have seen that in Breton, T is licensed via the H-internal Domain Condition, with the exception of mont and eman, which invoke the H-checking Domain Condition. In this section, I argue that Slavic uses these two licensing modes, but for different lexical items, which leads to quantitative differences. The H-checking Domain Condition and (tensed) V-to-T apply with verbs and lexical auxiliaries, and the H-internal Domain Condition affects the Slavic Aux traditionally labeled clitics, and triggers LVM (but not in Polish). Slavic LVM has the characteristics discussed for Breton: it moves the untensed V to C to make TP the internal domain of a visible C, which licenses T. As a result, Slavic shares with Breton Wackernagel phenomena for tensed Aux. I mentioned earlier that Breton and Slavic contrast as to systems for the functional category D (clitic pronouns), which will not be discussed.

4.1. Verbs versus Auxiliaries

T-licensing conditions distinguish Vs and Aux in Slavic. Slavic Vs need not head complements, as in (16a) repeated as (43).

(43) Vidjaxa decata knigata. Bulgarian
     saw children+the book+the
     'The children saw the book.'
T is satisfied via the H-checking Domain Condition as in (44) through (tensed) V-to-T. Slavic Vs differ from most Breton Vs, but not from mont and eman, which use (44), so the difference reduces to lexical variation:
(44) [TP [T° Vi [T° T]] [VP ... ti ...]]
VSO in questions with dali in C (Rudin, 1986; Rivero, 1993b) shows that V is not in C:

(45) a. Dali vidjaxa decata knigata?
        Q saw children+the book+the
        'Did the children see the book?'
     b. [CP [C' [C° dali] [TP Vi [VP NP ti NP]]]]

According to Lema and Rivero (1990) and later work, Auxiliaries fall into two classes. When Aux are positionally free, they resemble Vs in various ways and belong to the lexical class, with a good example being the modal equivalent to must. By contrast, Aux that are positionally restricted do not resemble Vs in the same way, and show properties of temporal affixes, so they are dubbed functional. In this chapter, I propose that in Slavic the functional class uses the internal domain condition to license T, while the lexical class is identical to V and uses the checking configuration (I offer no explanation as to why this is so). Arguments for the lexical/functional dichotomy are given for Slavic and Old Romance LVM languages in Rivero (1994b), and are repeated here only partially, with Czech as the case in point.

Let us begin by introducing these two classes in view of positional restrictions. The Czech perfect Aux in (46) is functional, and must head a complement. Future Aux as in (47) is lexical, is free from this restriction, and behaves like a tensed V.

(46) a.  Tehdy jsem pisal knihy. Czech
         then have+1S written books
         'Then I {have written/was writing} books.'
     a'. [CP Tehdy [C° ∅] [TP jsem pisal knihy]]
     b.  Pisal jsem knihy tehdy.
         'I {have written/was writing} books then.'
     b'. [CP [C° pisali] [TP jsem [VP ti knihy]]]
     c.  *Jsem pisal knihy tehdy.
     d.  *Tehdy pisal jsem knihy.
(47) a. Budu pisat knihy.
        will+1S write books
        'I {will write/will be writing} books.'
     b. [TP Budu [VP pisat knihy]]

Similarities with Breton arise with Slavic functional Aux, but not lexical Aux nor ordinary Vs. Slavic is similar to Breton when auxiliaries must be in the complement of a visible head, as in (46). Slavic shares strategies with Breton to place tensed Aux in a complement. That is, Xmax may fill the Spec of the superordinate category, as in (46a), or an X° may fill the head of that category, as in (46b), through LVM. Slavic is like Breton, as LVM fails to apply if Spec-of-C is filled: (46d). In
Slavic, LVM is also absent from embedded clauses, which usually have a C that is overt, as in Slovak . . . chi si napisal list '. . . if he has written the letter.' In brief, Breton and Slavic are parallel in that they share the H-internal Domain Condition for T, but contrast because they use it for different lexical items. In addition, Breton and Slavic are similar in that they share the LVM process.

Example (48) serves to show that the H-internal Domain Condition mentions Tense and not Agr, as established for Breton in section 3.

(48) Bil sam chel knigata. Bulgarian
     had have+1S read book+the
     'I have read the book.' Lit. I have had read the book

In this sentence, an auxiliary in participle form is fronted by LVM. This auxiliary shows Agr (masculine, singular) but lacks T, and can head a root projection, unlike sam, which shows both Agr and T. So I conclude that the H-internal Domain Condition is sensitive to T not Agr, and that auxiliary be/have must appear in an internal domain when finite (sam), but not when nonfinite (bil). In traditional terms, the Slavic perfect is a "clitic" only when tensed, not when untensed, which means that the bare-output condition that this auxiliary is subject to resides in T.

In my analysis, positional restrictions, or their inability to license T, are a sign of the functional nature of some Aux, but additional factors reviewed next separate functional from lexical Aux. This dichotomy has more predictive power than the traditional division between clitic and nonclitic Aux and offers the advantage of capturing similarities between Breton and Slavic through the idea that functional categories such as T (and not necessarily clitics) display interface conditions. My approach does not deny the distinction between clitics and nonclitics, but considers it one manifestation of the more fundamental contrast between functional versus lexical categories.

For Lema and Rivero (1991), lexical and not functional Aux license VP-Preposing.
Thus, VP-Preposing is fine with the Czech future Aux but not with perfect Aux (Rivero, 1991: (4)):

(49) a. [Kupovat knihy] budu.
        buy+INF books will+1S
        'Buy books I will.'
     b. *[Koupil knihy] jsem.
        bought books have+1S
        '*Bought books I have.'
Analyses for Slavic VP-Preposing await development, but the idea that lexical Aux may establish a Theta-role-type relation with its VP-complement could explain why this complement shows extraction properties similar to those of an NP object.
Another difference is that functional Aux precedes lexical Aux when the two combine (Rivero, 1994b). In (50), the perfect precedes the modal, which is usual in Slavic.

(50) Tehdy jsem musel chtit vedet. Czech
     then have+1S must want know
     'Then I must have wished to know.'
If functional categories form extended projections with lexical categories, as in Grimshaw (1991), this order follows if the Modal is lexical and the Perfect functional. Modals are not second position items, and allow VP-Preposing (Rivero, 1991: n. 4), so these different properties coexist.

In Czech (as in Slovak and Polish), the position of Neg establishes an interesting contrast between lexical and functional Aux (Rivero, 1991):

(51) a. Tehdy jsem ne-pisal knihy. Czech
        then have+1S not-written books
        'Then I have not written books.'
     b. *Ne jsem pisal knihy tehdy.
     c. Ne-pisal jsem knihy tehdy.
     d. *Jsem ne-pisal knihy tehdy.
(52) a. Ne budu pisat knihy.
        neg will+1S write books
        'I will not write books.'
     b. *Budu ne-pisat knihy.

Examples (51a) and (52b) show that ne follows perfect Aux and cannot attach to this item. In (51c) LVM moves the untensed V to C with ne 'not' attached to it; therefore, ne fails to immediately precede the functional Aux. By contrast, ne immediately precedes future Aux in (52a), and tensed Vs as in (53). If future Aux belongs to the lexical class, this distribution is principled.

(53) Ne-pishem knihy. Czech
     neg-write+PRES+1S books
     'I am not writing books.'

The traditional view that attributes positional restrictions to clitics also establishes two classes of auxiliaries, but cannot account for word-order differences other than second position. Thus the functional/lexical distinction is superior to the clitic-nonclitic view both from the contrastive perspective that seeks to relate Breton and Slavic, and from a point of view internal to Slavic.

I propose that tensed lexical Aux are merged in the lexical layer of the clause like ordinary Vs, and raise to T like them. This raising establishes a checking configuration that licenses T in PF under the Checking Domain Condition. This
accounts for why the Czech future Aux escapes position restrictions, and patterns like a V when negated. Recall that for Rivero (1991), Czech T takes NegP as complement, and Neg takes VP as complement:

(54) [TP T [NegP Neg [VP V]]]
The ordinary V amalgamates with ne, and the complex raises to T resulting in ne+V in (53). Future Aux is lexical, generated under V with a VP-complement (not represented), so it amalgamates with ne to form the complex that raises to T, which results in ne+Aux in (52a).

On this view, tensed functional Aux are items merged in T, take a VP-complement headed by V, and disallow V to raise to T. No V or V-like Aux raises to the functional Aux from the VP, so no H-checking domain for T is established. Thus, this case involves the same H-internal domain condition as Breton: Aux must be the head of a complement of a C visible in PF. The Czech perfect Aux in (54) is generated under T with NegP as complement, and is positionally restricted, only appearing in an internal domain. In root clauses, fronting of a constituent must apply so that T can be licensed, as in (51a) and (51c).

In sum, Slavic and Breton root sentences in the perfect contain a T that must be satisfied via the H-internal Domain Condition, so they look similar. However, Breton and Slavic root patterns with Verbs and Modals differ since they fall under the H-internal Domain Condition in Breton, and the H-checking Domain Condition with raising to T in Slavic.

4.2. Parametric Variation in Slavic: Polish

We saw above that the two options for licensing T in PF are the source of the parametric variation that distinguishes Breton from the LVM Slavic languages as to verbs and auxiliaries. The same dichotomy is the source of the parametric variation that differentiates Polish from the LVM languages of the Slavic family and from Breton at the same time. To see how this works, first recall that Borsley and Rivero (1994) argue that Polish lacks LVM and displays raising of a nonfinite V to a finite Aux, resulting in a morphological complex. That is, Polish has Incorporation as in Baker (1988).
What I suggest here is that this process is "stylistic" and thus applies to establish an H-Checking Domain to license T in PF, and not to check features, and this is why it can be optional in some varieties of this language. In the other Slavic languages examined above and in Breton, Incorporation is unavailable as a licensing option in PF, which leads to a type of parametric variation attributable to the functional category T.

Consider here sentences with perfect auxiliaries as in (55), where Polish looks identical to Bulgarian, Czech, and Breton. A functional Aux when finite cannot head a projection that is not a complement, and Wh-movement to Spec-of-C licenses the structure in PF. In other words, the Head-Complement structure is used to license T in PF in a way that is reminiscent of the LVM languages.

(55) a. Kiedy-s widzial ten film? Polish
        when-have+2S seen this film?
        'When have you seen this film?'
     b. [CP kiedyi [C' [C° ∅] [TP [T° s] [VP widzial ten film] ti]]]
     c. *S widzial ten film.
        have+2S seen this film
        '*You have seen this film.'
     d. *[TP [T° s] [VP widzial ten film]]
In addition, certain varieties of Polish have an optional incorporation process with the Participle moving to the tensed Aux, as in (56), and this process also serves to license T in PF as in (56c-d). This is where the differences with LVM Slavic languages arise.

(56) a. Kiedy widzial-es ten film?
        when seen-have+2S this film
     b. [CP kiedyi [C' [C° ∅] [TP [T° widzialk -es] [VP tk ten film] ti]]]
     c. Widzial-es ten film.
        seen-have+2S this film
        'You have seen this film.'
     d. [TP [T° widzialk -es] [VP tk ten film]]

If we think of the T-node as the locus for a Tense affix as in Chomsky (1981), the ordinary inflected V results from V-raising to the affix in T, reminiscent of English I am (not) in Paris. In such a situation, V is usually precluded from moving to the T that is filled by a lexical item, as in I will (not) be in Paris. From this perspective, Polish is a language that allows V-raising if T is filled by an affix or by a full word. Thus, the untensed V raises to T if this node contains a functional auxiliary, as in (56).

Incorporation leads to a distributional difference with the LVM languages. Recall that sequences formed by wh-phrase(s), a Participle, and a tensed Aux are ungrammatical in Slavic languages with LVM, and in Breton: Bulgarian *Kakvo kupil e? versus Kakvo e kupil? 'What has he bought?'. The reason is that LVM does not co-occur with syntactic movement to Spec-of-C, as the participle will only move to C if the licensing configuration for T has not been established in the syntax. In Polish, by contrast, the assumption is that the Participle raises to T and not to C, which explains why incorporated forms do not display the root
characteristics of LVM constructions, and may appear in all types of embedded clauses including relatives (Borsley and Rivero, 1994). The next question is why Incorporation applies in Polish. Sequences such as (55a) indicate that in some variants the process need not always apply, which suggests that it is not a checking operation. My proposal thus is that Polish uses the H-internal Domain and the H-checking Domain Conditions as two alternative ways to satisfy the PF requirements of T in functional Aux. If V raises to Aux, the process establishes the H-checking domain where T is licensed, as in (56c-d), and this makes Polish contrast with other Slavic languages and with Breton, which necessarily appeal to the H-internal Domain Condition in the context of functional Aux. Polish functional Aux that are not targeted by V-incorporation are licensed by the H-internal Domain Condition, as in (56a), similar to what happens in Breton and other Slavic languages. As a consequence of this double choice, the Polish perfect Aux is positionally restricted in the absence of incorporation, as in (55c-d) (i.e., it cannot be sentence-initial). In this case, Polish auxiliaries most closely resemble Bulgarian auxiliaries, which cannot be first, but need not be second or adjacent to C. However, if incorporation applies as in (56b), it has the properties of the ordinary tensed V in Slavic: it can head a projection that is not a complement, as in (56c-d). Finally, Polish does not differ from other Slavic languages when it comes to true verbs and lexical auxiliaries; these use the checking configuration to license T in PF. On this analysis Polish uses, under slightly different circumstances, the two PF licensing principles for T that are also found in Breton and the other Slavic languages. 
This provides additional support for the idea that the Head-Complement and the Checking configurations are available in Universal Grammar to satisfy PF interface conditions, not just internal conditions. The chart in (57) summarizes the proposal by adding Polish to the chart in (20).

(57) Tense-licensing in PF

                           Checking Domain     Internal Domain
     Language              V        Aux        V        Aux
     Breton                No       No         Yes      Yes
     Slavic LVM languages  Yes      No         No       Yes
     Polish                Yes      Yes        No       Yes
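For readers who prefer to query the chart in (57) rather than read it, the following sketch encodes the same Yes/No cells as a lookup table. The names `CHART_57`, `licensing_options`, and `must_head_complement` are hypothetical and introduced only for illustration; like the chart itself, the table abstracts away from the lexical exceptions mont and eman in Breton.

```python
# Each cell lists the PF licensing configurations marked "Yes" in chart (57).
CHART_57 = {
    "Breton": {"V": ("internal",), "Aux": ("internal",)},
    "Slavic LVM languages": {"V": ("checking",), "Aux": ("internal",)},
    "Polish": {"V": ("checking",), "Aux": ("checking", "internal")},
}


def licensing_options(language: str, item: str) -> tuple:
    """Return the licensing configurations available for V or Aux."""
    return CHART_57[language][item]


def must_head_complement(language: str, item: str) -> bool:
    """True when the internal domain is the only option, so that the finite
    head is positionally restricted and cannot be sentence-initial."""
    return licensing_options(language, item) == ("internal",)


if __name__ == "__main__":
    for language in CHART_57:
        for item in ("V", "Aux"):
            print(language, item, licensing_options(language, item),
                  must_head_complement(language, item))
```

Read this way, the Polish Aux cell is the only one with two options, which is the sense in which Polish uses both configurations for functional auxiliaries.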
5. SUMMARY AND CONCLUSIONS

In LVM-languages, T is a functional category that imposes a PF-interface condition. This condition can be satisfied in two structural ways based on sisterhood. TP can be the sister of the head that licenses its T (i.e., the licensing configuration is the Head-Complement structure). Alternatively, T can be the sister of the licensing head (i.e., the licensing structure is a Checking Configuration). In addition, these languages share a verb-fronting process dubbed LVM with an output that establishes a Head-Complement structure, which is reminiscent of computational Merge, not Move. This process applies in PF to satisfy the interface condition of T and not to check features: V becomes the sister of TP when it moves to C.

Parametric differences between LVM languages result from the interaction of the two structural options to license T in PF and entries in the lexicon. In Breton, T is most often licensed when TP is a sister or complement, but with a few verbs, T is licensed by being itself a sister. A consequence of this is that Breton is a VSO language where V1 patterns are almost nonexistent in root clauses, and second-position restrictions on finite heads are pervasive. In Slavic the option to license T when TP is a complement/sister is found with functional auxiliaries traditionally called clitics, while with verbs T is licensed by being itself a sister, so second-position effects on finiteness are less pervasive than in Breton. Besides their interface requirements on T, Slavic and Breton are characterized by an LVM process. This process applies in PF to satisfy T, establishes a Head-Complement configuration reminiscent of Merge in the computation, and does not check formal features.

The same PF system to license T serves to distinguish Polish from LVM languages, including Breton. Polish uses Incorporation of V to Aux to license T in PF, hence a Checking Configuration, which distinguishes it from the other languages. The Head-Complement configuration where TP is the sister of C is also used to license T in Polish. This language lacks LVM, so the Head-Complement option is used not when V is in C but when a phrase is in Spec-of-C.
NOTES

1 The first version of this paper dates from 1993; this updated version owes much to helpful comments by R. D. Borsley, P. Hirschbühler, M. Suñer, A. Terzi, and two anonymous reviewers for the present volume. Unless otherwise indicated, Breton examples are from Borsley, Rivero, and Stephens (1996). I owe thanks to many colleagues and friends for data and discussion through the years: for Breton, R. Borsley and J. Stephens; for Bulgarian, G. Alexandrova, O. Arnaudova, M. Dimitrova-Vulchanova, and E. Savov; for Czech, F. Bakes and J. Sedivy; for Polish, R. Borsley, E. Jaworska, and J. Witkos; for Serbo-Croatian, W. Browne, A. Donskov, D. Stojanovic (formerly Kudra), and L. Progovac; for Slovak, H. Briestenska. I acknowledge support from SSHRCC Grants 410-91-0178 and 410-94-0401 and the Eurotyp Project of the ESF.

2 This process, known as Long Head Movement, first proposed in Rivero (1994a [written
in 1988]) has attracted much attention, leading to a variety of counterproposals whose discussion falls well beyond the scope of this chapter. Rivero (1996) lists some of the existing alternatives, and sketches a critique of (pure) Morphological Merger/Prosodic Inversion analyses, as proposed most notably in Halpern (1992, 1995); the basic idea is that LVM can account for phenomena that fall outside the scope of linear operations like Prosodic Inversion/Morphological Merger (and see note 4). Rivero (in prep) discusses LVM in the context of stylistic rules affecting verbs, which are the hierarchical movements in the PF branch with a Head-Complement or a Checking Configuration output that have in common the application to satisfy interface conditions and not to check formal features.

3 For Schafer (1994), Breton is both an LVM and a V2 language, albeit in different constructions. Schafer views LVM essentially as in Borsley, Rivero, and Stephens (1996), not as a feature-checking operation. By contrast, for Topicalizations, Wh-questions, and negative sentences she proposes a V2 analysis with tensed V-to-(I-to)-C for feature-checking. In my analysis, finite Vs and Aux are in T in all constructions, and finite raising to C is not viable; if it applied in the syntax, T would not head a complement at PF, and its interface condition would not be satisfied. On this view, Breton cannot be a V2 language, interpreting V2 to imply finite raising to C for feature checking.

4 In Bulgarian, verb raising in PF applies with li in main and embedded questions, as in (i), (ii), and (iii), which repeats (12b).

(i) Vidjaxme li knigata? Bulgarian
    saw+1P Q book+the
    'Did we see the book?'
(ii) Prochel li e knigata?
     read Q have+3S book+the
     'Has he read the book?'
(iii) Ne znam [prochel li e Petur knigata.]
      Neg know+1S [read Q have+3S Peter book+the]
      'I do not know whether Peter has read the book.'
In my view, functional li (=Q) is in C and imposes an interface, not a feature-checking requirement: it must have overt material in its checking domain. This PF requirement triggers stylistic V-fronting in both root and nonroot clauses: verbs adjoin to li to licence it and thus come to precede it. LVM has the same external distribution in Bulgarian and Breton in yes-no questions, but Breton ha precedes the fronted V as in ha lennet en deus because LVM applies in the CP-complement of ha to satisfy the interface condition of T. That is, T and not Q triggers LVM in Breton in these questions, another parametric contrast between these two languages explored in Rivero (in prep). Rivero (1996) uses embedded interrogatives as in (iii) to argue against Prosodic Inversion/Morphological Merger. Examples of type (iv) from Borsley, Rivero, and Stephens (1996:66) also favor a hierarchical process, and argue against a linear operation. Prosodic Inversion, for instance, inverts a string-initial Aux with a following prosodic word, and wrongly predicts that the auxiliary should follow the first verb in these sequences: (iv) a. Lennet ha komprenet en deus Yann al levr. Breton read and understood 3S have Yann the book 'Yann has read and understood the book.'
b. Vidjal i prochel e knigata. (Bulgarian)
seen and read have+3S book+the
'He has seen and read the book.'
REFERENCES
Anderson, S. (1982). Where's Morphology? Linguistic Inquiry, 13, 571-612.
Anderson, S., and Chung, S. (1977). On grammatical relations and clause structure in verb-initial languages. In P. Cole and J. Sadock (Eds.), Syntax and semantics 8: Grammatical relations (pp. 1-26). New York: Academic Press.
Baker, M. C. (1988). Incorporation. Chicago, IL: University of Chicago Press.
den Besten, H. (1977). On the interaction of Root transformations and Lexical Deletive rules. Published in 1983 in W. Abraham (Ed.), On the formal syntax of the Westgermania (pp. 47-131). Amsterdam: John Benjamins.
Beukema, F., and Coopmans, P. (1989). A government-binding perspective of the imperative in English. Journal of Linguistics, 25, 417-436.
Borsley, R. D., and Rivero, M. L. (1994). Clitic auxiliaries and incorporation in Polish. Natural Language and Linguistic Theory, 12, 373-422.
Borsley, R., Rivero, M. L., and Stephens, J. (1996). Long Head Movement in Breton. In R. Borsley and I. Roberts (Eds.), The syntax of the Celtic languages (pp. 53-74). Cambridge: Cambridge University Press.
Borsley, R. D., and Stephens, J. (1989). Agreement and the position of subjects in Breton. Natural Language and Linguistic Theory, 7, 407-428.
Branigan, P. (1996). Verb-second and the A-bar syntax of subjects. Studia Linguistica, 50, 50-79.
Chomsky, N. (1977). On Wh-movement. In P. Culicover et al. (Eds.), Formal syntax (pp. 71-132). New York: Holt, Rinehart and Winston.
Chomsky, N. (1981). Lectures on Government and Binding. Dordrecht: Foris.
Chomsky, N. (1991). Some notes on economy of derivations and representations. In R. Freidin (Ed.), Principles and parameters in comparative grammar (pp. 417-454). Cambridge, MA: MIT Press.
Chomsky, N. (1995). The Minimalist Program. Cambridge, MA: MIT Press.
Chomsky, N., and Lasnik, H. (1977). Filters and control. Linguistic Inquiry, 8, 425-504.
Chung, S., and McCloskey, J. (1987). Government, barriers and small clauses in modern Irish. Linguistic Inquiry, 18, 173-237.
Desbordes, Y. (1983). Petite grammaire du Breton moderne. Mouladuriou hor Yezh.
Diesing, M. (1990). Verb movement and the subject position in Yiddish. Natural Language and Linguistic Theory, 8, 41-79.
Epstein, S. D. (1992). Derivational constraints on A'-chain formation. Linguistic Inquiry, 23, 235-261.
Grimshaw, J. (1991). Extended projections. Unpublished manuscript, Brandeis University.
Halpern, A. (1992). Topics in the placement and morphology of Clitics. Ph.D. Dissertation, Stanford University. Revised version published as (Halpern 1995).
Halpern, A. (1995). On the placement and morphology of Clitics. Stanford, CA: CSLI Publications.
Hendrick, R. (1988). Anaphora in Celtic and Universal Grammar. Dordrecht: Kluwer.
Hendrick, R. (1991). The morphosyntax of aspect. Lingua, 85, 171-210.
Iatridou, S., and Kroch, A. (1992). The licensing of CP-recursion and its relevance to the Germanic verb second phenomenon. Working Papers in Scandinavian Syntax, 50, 1-24.
Lasnik, H., and Saito, M. (1992). Move alpha. Conditions on its application and output. Cambridge, MA: MIT Press.
Lema, J., and Rivero, M. L. (1990). Long Head Movement: ECP vs. HMC. NELS 20, 333-347. GLSA, University of Massachusetts, Amherst, MA.
Lema, J., and Rivero, M. L. (1991). Types of verbal movement in Old Spanish: Modals, futures, and perfects. Probus, 3, 237-278.
Reinhart, T. (1995). Interface strategies. OTS Working Paper, Utrecht.
Rivero, M. L. (1991). Long Head Movement and negation: Serbo-Croatian vs. Slovak and Czech. The Linguistic Review, 8, 319-351.
Rivero, M. L. (1993a). Long Head Movement vs. V2, and Null Subjects in Old Romance. Lingua, 89, 113-141.
Rivero, M. L. (1993b). Bulgarian and Serbo-Croatian Yes-No Questions. V°-raising to -li vs. Li-Hopping. Linguistic Inquiry, 24, 567-575.
Rivero, M. L. (1994a). Clause Structure and V-movement in the languages of the Balkans. Natural Language and Linguistic Theory, 12, 63-120.
Rivero, M. L. (1994b). Auxiliares funcionales y auxiliares léxicos. In V. Demonte (Ed.), Gramática del Español (pp. 107-138). Publicaciones de la Nueva Revista de Filología Hispánica VI. CELL, El Colegio de México, Mexico.
Rivero, M. L. (1996). Verb Movement and Economy: Last Resort. Papers from the First Conference on Formal Approaches to South Slavic Languages. Plovdiv. October 1995. University of Trondheim Working Papers in Linguistics 28. 211-228. Revised version to appear in Benjamins volume.
Rivero, M. L. (in prep). Verb syntax and interface conditions. New York: Oxford University Press.
Rouveret, A. (1991). Functional categories and agreement. The Linguistic Review, 8, 353-387.
Rudin, C. (1986). Aspects of Bulgarian syntax: Complementizers and Wh constructions. Columbus, OH: Slavica Publishers.
Rudin, C. (1988). On multiple questions and multiple Wh-fronting. Natural Language and Linguistic Theory, 6, 445-501.
Schafer, R. J. (1992). Negation and verb second in Breton. Working Paper 92-02, Syntax Research Centre, University of California, Santa Cruz. Revised version published in NLLT in 1994.
Schafer, R. J. (1994). Nonfinite predicate initial constructions in Breton. Ph.D. Dissertation, University of California, Santa Cruz.
Stephens, J. (1982). Word order in Breton. Ph.D. Dissertation, University of London.
Stump, G. (1984). Agreement vs. Incorporation in Breton. Natural Language and Linguistic Theory, 2, 289-348.
Stump, G. (1989). Further remarks on Breton Agreement. Natural Language and Linguistic Theory, 7, 429-472.
Vikner, S. (1991). Verb movement and the licensing of NP-positions in the Germanic languages. Unpublished manuscript, University of Stuttgart. Published by Oxford University Press in 1995.
Wojcik, R. (1976). Verb-fronting and auxiliary do in Breton. NELS, 6, 259-278.
Zanuttini, R. (1991). Syntactic properties of sentential negation: A comparative study of Romance languages. Ph.D. Dissertation, University of Pennsylvania.
Zubizarreta, M. L. (1995). Prosody, focus, and word order. Ms. University of Southern California. Revised version published by MIT Press.
FRENCH WORD ORDER AND LEXICAL WEIGHT
ANNE ABEILLÉ* DANIÈLE GODARD†
*IUF, Université Paris 7, UFRL, Paris, France
†CNRS, Université Lille 3, Villeneuve d'Ascq, France
1. INTRODUCTION1
As usual with complex phenomena, progress in the comprehension of word order can only be made by isolating and studying each factor in turn. We concentrate our attention here on the syntactic constraints governing the order of complements and adjuncts in French, leaving aside discursive, pragmatic, and stylistic factors. Accordingly, the grammatical judgments we provide are to be taken with an unmarked intonation, some of the sentences given as ungrammatical here being acceptable with a special prosodic pattern. The study of word order requires great attention to the detail of the data. Nevertheless, we think it possible to arrive at generalizations that are both empirically accurate and theoretically interesting. Recently, the relation between constituency and word order has been taken up in the form of two questions. Can word order be reduced to the hierarchical structure (Kayne, 1994; Cinque, 1977), or does it constitute a separate component (Gazdar et al., 1985; Pollard and Sag, 1987)? And, in the second case, do the constituency and the ordering domains coincide, or does word order have a domain of its own, and, if so, how is it related to constituency (Reape, 1994; Kathol, 1995)? The word-order facts we look at are not readily amenable to structural distinctions, and point to the existence of a separate word-order component,
but do not seriously challenge the view that the constituency and the word-order domain coincide. Our main finding is classificatory: we bring to light a new syntactic factor that plays a role in word order, building on suggestions in Sadler and Arnold (1993, 1994) for the English NP, and Sells (1994) for certain Korean facts. We show that certain constituents, which consist of a word, obey much stricter constraints than their phrasal counterparts or other such constituents. Roughly, they must occur first in the phrase or adjacent to the head. This suggests a weight constraint symmetrical to the well-known heaviness constraint, which tends to order heavy elements last in their domain. Leaving heavy constituents aside, we contrast "light" constituents with ordinary "middle-weight" ones, using a two-value (lite vs. nonlite) feature WEIGHT, which characterizes both lexical items (they can be lite, nonlite, or unspecified) and phrases (usually nonlite). Adopting the Head-driven Phrase Structure Grammar framework (HPSG, Pollard and Sag, 1987, 1994), we formalize order rules as constraints on the daughters in a phrasal type.2 In this framework, we build on our empirical findings to propose a mixed theory of word order, which results from the interplay of the grammatical function and the weight of the daughters. We begin with an examination of the order of complements in the VP, showing a systematic difference between bare complements and the others (section 2), which we describe using the WEIGHT (WGT) feature in conjunction with phrasal constraints and LP rules for French (section 3). We then apply the theory to account for the position of adjectives in the NP (section 4). Finally, we go back to the adverbs in the VP, to give a fuller account of ordering in French (section 5).
2. THE ORDER OF COMPLEMENTS IN THE VP
We contrast phrasal complements (which we call "nonlite," anticipating the weight feature), which occur freely to the right of the head in French, with bare complements (called "lite"), which must precede phrasal complements and are strictly ordered among themselves.
2.1. Free Order among Phrasal Complements
As has often been observed, complements in French are not ordered with respect to one another (leaving discursive factors aside).3 An indirect object may precede or follow a direct object (1); a predicative adjective may precede or follow a direct object (2):
(1) Paul donne un livre à son fils / donne à son fils un livre. 'Paul gives a book to his son.'
(2) Cette musique rend mon fils fou de joie / rend fou de joie mon fils. 'This music makes my son really happy.' (lit: crazy of joy)
Similarly, a sentential or infinitival complement may precede or follow a nominal complement, with some preference for the second position due to heaviness:
(3) Paul dit à Marie de venir / dit de venir à Marie. 'Paul says to Marie to come.'
(4) Paul dit à sa fille qu'il fait beau / dit qu'il fait beau à sa fille. 'Paul tells his daughter that it is nice weather.'
The same mobility can be observed with complements of nouns:
(5) La destruction de Rome par les Barbares / par les Barbares de Rome. 'The destruction of Rome by the Barbarians.'
(6) La volonté de lutter de Jean / de Jean de lutter. 'Jean's wish to fight' (lit: The wish of fight(ing) of Jean).
2.2. Lite Complements before Nonlite Complements
Bare proper names and predicative adjectives have the same mobility as phrasal complements:
(7) a. Paul présente Géraldine à son fils / à son fils Géraldine. 'Paul introduces Geraldine to his son.'
b. Cette musique rend mon fils fou / rend fou mon fils. 'This music makes my son really happy.'
On the other hand, bare common nouns exhibit ordering constraints not observed with phrasal complements. First, they precede phrasal complements. Light verbs provide numerous instances of bare nominal complements, which invariably occur immediately after the verbal head:
(8) a. La course donne soif à Jean / * donne à Jean soif. 'The race makes Jean thirsty.' (lit: gives thirst to Jean)
b. Ce livre fait plaisir à Marie / * fait à Marie plaisir. 'This book gives pleasure to Marie.' (lit: makes pleasure to Marie)
However, when the same N has a complement or a determiner, it becomes as free as a phrasal complement:
(9) a. La course donne une grande soif à Jean / donne à Jean une grande soif. 'The race makes Jean very thirsty.' (lit: gives a great thirst to Jean)
b. Ce livre fait le plaisir de sa vie à Marie / fait à Marie le plaisir de sa vie.
'This book gives the pleasure of her life to Marie'
Modification by an adverb or conjunction of these N has a similar effect:4
(10) La course donne [vraiment soif] à Jean / donne à Jean [vraiment soif]. 'The race makes Jean really thirsty.' (lit: gives really thirst to Jean)
(11) La marche donnera [faim ou soif] à Marie / ? donnera à Marie [faim ou soif]. 'A walk will make Marie hungry or thirsty.' (lit: will-give hunger or thirst to Marie)
(12) La vitesse fait [peur et plaisir] à Marie / fait à Marie [peur et plaisir]. 'Speed gives fear and pleasure to Marie.'
The same observation extends to another case of bare complements, the past participle in tense auxiliary constructions and the infinitive in causative constructions. We analyze tense auxiliaries and faire as the head of a flat VP, which takes as complements the participle or the infinitive and its complements (cf. Abeillé et al., 1997). The tree structure representations of these constructions are given in (13), where the function of the daughters is represented as an annotation on the branches:
In this analysis, the auxiliary or the causative faire is the morphosyntactic head (H) of the construction, which inherits all the complements (C) of the bare participle or infinitive. Like other bare complements, the participle or infinitive must precede all nonlite complements.
(14) a. Paul a acheté des pommes / * a des pommes acheté. 'Paul has bought apples.'
b. Cette musique fait pleurer mon fils / * fait mon fils pleurer. 'This music makes my son cry.' (lit: makes cry my son)
However, unlike the N in light verb constructions, these verbal complements must precede the other complements even when modified or conjoined:
(15) a. Paul a [acheté et mangé] des pommes / * a des pommes [acheté et mangé]. 'Paul has bought and eaten apples.'
b. Paul fait [beaucoup rire] son fils / * fait son fils [beaucoup rire]. 'Paul makes his son laugh a lot.'
As explained below, this difference between N and V does not depend on the category but on the requirement made by the predicate of which they are a complement. To account for the difference between (10-12) and (15), and in light of additional data on adjectives and adverbs (see sections 4 and 5), we will analyze
coordination or modification of lite categories as potentially ambiguous between lite and nonlite.
2.3. Rigid Ordering of Lite Complements
Unlike phrases, the bare complements mentioned above are rigidly ordered in the following way (leaving bare quantifiers aside):5
(16) Head < Past Part < Vinf < Bare Noun
The past participle must precede the other lite complements. It precedes the bare N in (17) and the bare V[inf] in (18):
(17) La course a donné soif à Marie / * La course a soif donné à Marie. 'The race has made Marie thirsty.' (lit: has given thirst to Marie)
(18) Paul a fait tomber le vase / * a tomber fait le vase. 'Paul made the vase fall.' (lit: has made fall the vase)
Similarly, the V[inf] precedes the lite nominal complements:
(19) Le Président fera rendre hommage aux victimes / * fera hommage rendre aux victimes. 'The President will make one pay tribute to the victims.' (lit: will-make pay tribute to the victims)
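The ordering facts of sections 2.2 and 2.3 can be restated procedurally. The following Python sketch is an illustration only and not part of the analysis proper: the names Daughter, LITE_RANK, and well_ordered are hypothetical, adjuncts are ignored, and each complement is assumed to arrive already labelled for weight.

```python
from dataclasses import dataclass

# Rank of each lite complement type in (16); the head itself comes first.
# These labels are illustrative assumptions, not notation from the chapter.
LITE_RANK = {"past_part": 1, "v_inf": 2, "bare_noun": 3}


@dataclass
class Daughter:
    form: str    # surface string, for readability only
    kind: str    # "head", "past_part", "v_inf", "bare_noun", or "phrase"
    weight: str  # "lite" or "nonlite"


def well_ordered(daughters):
    """Check a flat head-complement sequence against two generalizations:
    lite complements precede nonlite ones (section 2.2), and lite
    complements follow the rigid order in (16) (section 2.3)."""
    if not daughters or daughters[0].kind != "head":
        return False
    seen_nonlite = False
    last_rank = 0
    for d in daughters[1:]:
        if d.weight == "nonlite":
            seen_nonlite = True
            continue
        if seen_nonlite:
            # A lite complement after a nonlite one, as in *donne à Jean soif.
            return False
        rank = LITE_RANK.get(d.kind, last_rank)
        if rank < last_rank:
            # Violates (16), as in *a soif donné à Marie.
            return False
        last_rank = rank
    return True


# (17): La course a donné soif à Marie
ok_17 = well_ordered([
    Daughter("a", "head", "lite"),
    Daughter("donné", "past_part", "lite"),
    Daughter("soif", "bare_noun", "lite"),
    Daughter("à Marie", "phrase", "nonlite"),
])
```

On this sketch, free order among nonlite complements, as in (9), follows from the fact that no check at all applies to them.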
3. A FEATURE-BASED TREATMENT
Before presenting our analysis with the feature WEIGHT, we briefly show why alternative analyses based on morphological incorporation or syntactic distinctions completely independent from word-order properties are inappropriate.
3.1. Alternative Analyses
The existence of bare complements has seldom been recognized as a syntactically relevant phenomenon. Some analyses have been proposed to deal with them in the morphology. Auxiliary constructions are traditionally handled in the same chapter as verbal inflection in descriptive or school grammars (e.g., Bescherelle, 1980; Grevisse and Goosse, 1988); there are also attempts to account for the position of the infinitive in causative constructions by postulating morphologically complex predicates (e.g., Zubizarreta, 1985). However, a morphologically based solution is not consistent with the data, because adverbs and PPs, which do not belong to the same word as the verbal head, can always occur between the head and the bare complements:6
(20) a. Paul a évidemment acheté des pommes. 'Paul has of course bought apples.'
b. La musique fait depuis toujours pleurer mon fils. 'Music always makes my son cry a lot.' (lit: music makes always a lot cry my son)
c. Le livre fera sans doute plaisir à Marie. 'The book will no doubt give pleasure to Marie.'
If the past participle, the infinitival, or the bare noun in (20) were part of the same word as the head V, the adverb would be too; it is not clear how such a proposal could be justified. As an alternative, one might think that some categories are adjoined to the V rather than at the same level as the regular complements. But many light V constructions (faire plaisir and faire un grand plaisir, rendre hommage and rendre un vibrant hommage, avoir faim and avoir une faim de loup) do not specify whether the complement is a bare N or an NP (with a determiner). The complementation of such light Vs would be radically different, depending on whether the N has or doesn't have a determiner. While not impossible, this structural difference would require independent justification.7
Another hypothesis is to use categorical distinctions. Distinguishing between V and VP (or S) complements could account for the contrast between (14) and (4-6). One could simply say that V complements, but not VP (or S) complements, must precede other complements. But a similar distinction is more problematic for nominal complements. One could contrast bare nouns as NPs with "maximal" nominal phrases or proper names as DPs, only the second being referential (e.g., Abney, 1987; Longobardi, 1994), and we would simply say that NPs must precede DPs in French. But if bare soif is an NP, one cannot see how the adjunction of an adverbial modifier (vraiment soif) would turn it into a DP; analogously, it is difficult to have coordination of nominal complements ("NPs") such as (11)-(12) recategorized as DPs. A category-based account will have even more difficulty with the potentially ambiguous behavior of certain modified or conjoined phrases (see sections 4 and 5).
A feature-based account seems more appropriate for this kind of underspecification. Another categorical distinction would make use of bar-level distinctions. This is Sells's proposal to account for similar word-order restrictions in Korean, where certain bare complements and adverbs resist scrambling and must immediately precede the head (Sells, 1994). Assuming a binary phrase structure for Korean, Sells contrasts X0 categories, which must combine with an X0 head, with X1 and X2 categories that can combine with an X1 head; only the phrases with an X1 head can scramble. The analysis can be summarized as follows:
(i) words, rather than maximal projections only, can be complements or modifiers;
(ii) certain words, but not all, are prevented from projecting X1 or X2 phrases by themselves;
(iii) certain syntactic phrases must be defined as X0 categories (the negation-verb syntactic combination for instance), while others are X1 or X2.
The effects of this proposal are very similar to what we also want for the French data. However, we find that X-bar theory is not the most appropriate tool. Proposals (ii) and (iii) represent a real difficulty for a bar-level representation, particularly when adverbs are taken into account. The word-order phenomena under investigation reflect properties of the lexical items; because they cannot be reduced to valence requirements, and because the combinatorics is not different when the phrase behaves ambiguously and when it behaves only as a usual maximal projection, a bar-level distinction is not appropriate. Anticipating the following discussion, certain adverbs in the VP must be adjacent to the lexical head, like common nouns, and others are more mobile, like proper names or maximal projections (see section 5). While we might associate the difference between common nouns and proper names with the fact that the second but not the first is valence saturated, this does not make sense with adverbs. Turning to the ambiguous phrases (vraiment soif, faim et soif), it is impossible to get both X0 and X1 or X2. Again, as soon as the need for underspecification and sharing of value is recognized, a feature-based approach is more appropriate than one based on distinct categories.
Bratt (1990) uses two features to get three levels of structure. Analyzing the sequence made of a causative verb and its infinitival verb complement in French (faire rire in Paul fait rire son fils 'Paul makes his son laugh,' lit: makes laugh his son), as a verbal complex, she notes this category with the two features: [LEX±] and [PHRAS±].8 While a word usually is [LEX+, PHRAS-] and a (usual) phrase [LEX-, PHRAS+], this complex is [LEX-, PHRAS-].
Proper names could be specified as [LEX+, PHRAS+] in the lexicon (her suggestion); our problematic combinations (vraiment soif, faim et soif), could then be underspecified ([LEX-, PHRAS±]), and the ordering constraints would say that [PHRAS-] come before [PHRAS+] constituents. However, the empirical justification for PHRAS is not clear, as soon as some words (proper names, but also most adverbs) have to be [PHRAS+], while some syntactic combinations are [PHRAS-], and others would have to be ambiguous. We conclude that, in the same way as word-order phenomena are not reducible to matters of constituency, the appropriate notation for them requires the use of a feature that is not reducible to other independent features.
3.2. The Feature WEIGHT
In a way analogous to the heaviness constraint (which says that heavy phrases tend to come last, cf. Wasow, 1996), we propose that a constraint holds for light weight words or phrases that tend to come first in the phrase (just before or just
after the lexical head). We call them "lite" to make the point that lite is not just the contrary of heavy, the usual phrases being in fact "middle-weight." Lite constituents cluster with the head V. Ignoring heaviness phenomena here, we speak of a contrast between lite and nonlite constituents. The feature WEIGHT, present both in the lexicon and phrases, aims at capturing a general theory of word order. First, not all lexical items have the same weight value: they may be [WGT lite], [WGT nonlite], or unspecified (with a general constraint that words are not heavy). Thus, we distinguish between common nouns, which are lite, and proper names, which are nonlite. Usually, predicates require their arguments to be nonlite; however, light verbs may allow (or require) that they be lite or unspecified. Second, while most phrases are nonlite, we allow certain phrases to be lite, such as acheté et mangé in (15a) or acheté et lu in (23):
(23) a. Paul a acheté et lu La Recherche.
b. *Paul a La Recherche acheté et lu.
'Paul has bought and read the Recherche.'
In (23) the coordination of participles is lite, because tense auxiliaries obligatorily take a lite V complement, that is, a participle which is unsaturated for all of its subcategorized complements. This subcategorization is represented in (24), as the value of the syntactic attribute ARG-ST whose first element corresponds to the subject and the others to complements; the identity of the integers means identity of the value for the lists (which is left unspecified), and ⊕ the concatenation of lists (Abeillé and Godard, 1994; Abeillé et al., 1997):
(24)
avoir: ARG-ST [2]
The first complement of the auxiliary is the lite participle, and the second is identified with the list of complements that this participle itself subcategorizes for. Accordingly, the conjunction acheté et lu must be lite when it is a complement of the auxiliary. Sentence (23b) is out because the [WGT lite] constraint on the coordination of past participles conflicts with the constraint that orders lite complements before nonlite ones.9 The question that must be raised, then, is whether we can or should dispense with head-only phrases. Given that the occurrence of lite and nonlite arguments depends on the subcategorization of predicates, which does not say whether they are words or phrases, do we need to build, or do we have arguments against building, a head-only phrase? It turns out that we can dispense with head-only phrases, at least regarding the data under consideration here. Since the weight distinction is what counts for subcategorization as well as word order, we get the right results if we accept combining words in the syntax. On the other hand, we have no argument which shows the head-only phrase to be inconsistent with our findings. The head-only phrase can give the right results if its description is identical to that of the head, in particular regarding weight and valence, and if syntax combines only phrases. In this chapter, we will explore a representation that does not use head-only phrases, in order to keep constituency as simple as possible. The reader should keep in mind that this is a matter of representation, and can replace our representation combining words by head-only phrases, if it suits his or her taste better.
3.3. Liteness in Phrasal Descriptions
The basic idea of the HPSG representation of linguistic expressions, or signs, is that all signs can be classified in types (noted with italics), which are associated with feature structures meeting certain constraints (Pollard and Sag, 1987; Sag, 1997). Signs divide into words (the unit for syntax) and complex constituents (phrases), which have daughters (hence the attribute DTRS). We examine here the consequences of the proposed WEIGHT feature for the representation of the relevant constituents. Let us first present the organization of phrases we assume:
This hierarchy is identical to that in Sag (1997), except for the hd-marker-phrase, and the hd-adj-comp-phrase which we propose for French, containing the complements and the adjuncts at the same time.10 As regards weight, we propose a general constraint, such that all head-nexus-phrases are nonlite:
(26) head-nexus-phrase → [WEIGHT nonlite]
In order to account for lite phrases, illustrated in (15a) and (23) by the coordination of participle complements, we propose the following constraints on head-adjunct phrases and coordinated phrases:11
Constraints (27a) and (27b) allow such phrases to be lite if all the daughters are lite. The daughters are not required to have the same weight ([1], [2], and [n] may
be different); however, the values can unify only if they are identical. Accordingly, the first disjunct in the value for the phrase is equivalent to lite if the daughters are all lite, and to nonlite if they are all nonlite; since unification fails if the daughters do not have the same weight value, the value for the phrase in this case is given by the second disjunct (nonlite). Because both words and phrases can be lite or nonlite, the introduction of the WEIGHT feature leads to a more complex classification of signs, cross-classifying them for the two dimensions of weight and phrasality:
(28) The hierarchy of signs and the feature WEIGHT
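The weight computation just described for head-adjunct and coordinated phrases can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions: the function names unify and phrase_weights are hypothetical, an underspecified daughter is modelled as None, and the result is the set of weight values the phrase may take (the first disjunct, if unification succeeds, together with the second disjunct, nonlite).

```python
from functools import reduce

LITE, NONLITE = "lite", "nonlite"
FAIL = object()  # marks a failed unification (hypothetical helper value)


def unify(a, b):
    """Unify two weight values; None stands for an underspecified weight."""
    if a is FAIL or b is FAIL:
        return FAIL
    if a is None:
        return b
    if b is None:
        return a
    return a if a == b else FAIL


def phrase_weights(daughter_weights):
    """Possible weights of a head-adjunct or coordinated phrase: the
    unified weight of the daughters (first disjunct) or nonlite
    (second disjunct)."""
    options = {NONLITE}
    first = reduce(unify, daughter_weights, None)
    if first is FAIL:
        # Daughters disagree: only the second disjunct is available.
        return options
    if first is None:
        # Every daughter is underspecified, so the first disjunct
        # could resolve either way; this reading is an assumption.
        options.add(LITE)
    else:
        options.add(first)
    return options


# A coordination of two lite participles may be lite or nonlite; a
# predicate that requires a lite complement then selects the lite reading.
coordination = phrase_weights([LITE, LITE])
can_be_lite = LITE in coordination
```

Returning a set rather than a single value mirrors the disjunctive statement of the constraint: the context in which the phrase appears, such as a tense auxiliary requiring a lite complement, is what picks one value.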
In the lexicon, nouns and adverbs are unambiguously specified as lite or nonlite: all proper names are nonlite, and all common nouns lite in French. Most adverbs are nonlite, while some are lite (see section 5). Verbs and adjectives can be underspecified for weight: most verbs are underspecified and can behave either as lite or nonlite. Adjectives may be lite, nonlite, or underspecified, depending on their pre- or postnominal position in the NP (see section 4). Words that are underspecified for weight are specified in context, given the constraint on weight in the phrase in which they appear. As an example, we represent in (29a) the analysis of the sentence Paul viendra according to the hypotheses presented in this section, and we give in (29b) the description of the head-subject phrase to which it corresponds:12
Although somewhat unusual in phrase structure frameworks, the representation in (29b) is perfectly in keeping with the formal apparatus for categories in HPSG. The notation "VP" has no theoretical status in this framework; it is an abbreviation for a phrasal constituent whose lexical head is a V, which is (normally) saturated for its complements, but is missing a subject. Similarly, an "NP" abbreviates a phrase whose head is a lexical N, and which is saturated for its complements and specifier; it is nonlite and also "maximal" to use the usual parlance, while the VP is not maximal, since the verb is considered the head of the sentence. Thus, if one does not want to use head-only phrases, the only phrase in the sentence Paul viendra is a hd-subj-phrase. There is no VP because the verb has no complement, and we have no head-only schema. There is no NP either, since the subject is a proper name. Both the subject and the head are nonlite words; the verb viendra is nonlite because most V's are lexically unspecified for weight, and the constraint on hd-subj phrases requires the head to be nonlite. The subject daughter is not so constrained and can be lite (as in Hommage sera rendu aux victimes 'tribute will be paid to the victims'); it is nonlite in (29b) because proper names are lexically nonlite. Two Linear Precedence constraints are associated with phrasal descriptions, making use of the function of the daughters and independent of weight ('