LANGUAGE ACQUISITION AND THE FORM OF THE GRAMMAR
This book was originally selected and revised to be included in the World Theses Series (Holland Academic Graphics, The Hague), edited by Lisa L.-S. Cheng.
DAVID LEBEAUX NEC Research Institute
JOHN BENJAMINS PUBLISHING COMPANY PHILADELPHIA/AMSTERDAM
The paper used in this publication meets the minimum requirements of American National Standard for Information Sciences — Permanence of Paper for Printed Library Materials, ANSI Z39.48-1984.
Library of Congress Cataloging-in-Publication Data

Lebeaux, David.
Language acquisition and the form of the grammar / David Lebeaux.
p. cm.
Includes bibliographical references and index.
1. Language acquisition. 2. Generative grammar. I. Title.
P118.L38995 2000
401'.93--dc21
00-039775
ISBN 90 272 2565 6 (Eur.) / 1 55619 858 2 (US)
© 2000 – John Benjamins B.V. No part of this book may be reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publisher. John Benjamins Publishing Co. • P.O.Box 75577 • 1070 AN Amsterdam • The Netherlands John Benjamins North America • P.O.Box 27519 • Philadelphia PA 19118-0519 • USA
Table of Contents

Acknowledgments  xi
Preface  xiii
Introduction  1

Chapter 1. A Re-Definition of the Problem  7
  1.1 The Pivot/Open Distinction and the Government Relation  7
    1.1.1 Braine's Distinction  7
    1.1.2 The Government Relation  9
  1.2 The Open/Closed Class Distinction  11
    1.2.1 Finiteness  13
    1.2.2 The Question of Levels  14
  1.3 Triggers  16
    1.3.1 A Constraint  16
    1.3.2 Determining the base order of German  17
      1.3.2.1 The Movement of NEG (syntax)  24
      1.3.2.2 The Placement of NEG (Acquisition)  26

Chapter 2. Project-α, Argument-Linking, and Telegraphic Speech  31
  2.1 Parametric variation in Phrase Structure  31
    2.1.1 Phrase Structure Articulation  31
    2.1.2 Building Phrase Structure (Pinker 1984)  32
  2.2 Argument-linking  38
    2.2.1 An ergative subsystem: English nominals  41
    2.2.2 Argument-linking and Phrase Structure: Summary  45
  2.3 The Projection of Lexical Structure  47
    2.3.1 The Nature of Projection  51
    2.3.2 Pre-Project-α representations (acquisition)  56
    2.3.3 Pre-Project-α representations and the Segmentation Problem  60
    2.3.4 The Initial Induction: Summary  65
    2.3.5 The Early Phrase Marker (continued)  66
    2.3.6 From the Lexical to the Phrasal Syntax  75
    2.3.7 Licensing of Determiners  84
    2.3.8 Submaximal Projections  86

Chapter 3. Adjoin-α and Relative Clauses  91
  3.1 Introduction  91
  3.2 Some general considerations  93
  3.3 The Argument/Adjunct Distinction, Derivationally Considered  94
    3.3.1 RCs and the Argument/Adjunct Distinction  94
    3.3.2 Adjunctual Structure and the Structure of the Base  98
    3.3.3 Anti-Reconstruction Effects  102
    3.3.4 In the Derivational Mode: Adjoin-α  104
    3.3.5 A Conceptual Argument  110
  3.4 An Account of Parametric Variation  112
  3.5 Relative Clause Acquisition  120
  3.6 The Fine Structure of the Grammar, with Correspondences: The General Congruence Principle  126
  3.7 What the Relation of the Grammar to the Parser Might Be  136

Chapter 4. Agreement and Merger  145
  4.1 The Complement of Operations  146
  4.2 Agreement  149
  4.3 Merger or Project-α  154
    4.3.1 Relation to Psycholinguistic Evidence  154
    4.3.2 Reduced Structures  157
    4.3.3 Merger, or Project-α  165
    4.3.4 Idioms  169
  4.4 Conclusion  181

Chapter 5. The Abrogation of DS Functions: Dislocated Constituents and Indexing Relations  183
  5.1 "Shallow" Analyses vs. the Derivational Theory of Complexity  184
  5.2 Computational Complexity and The Notion of Anchoring  188
  5.3 Levels of Representation and Learnability  192
  5.4 Equipollence  199
  5.5 Case Study I: Tavakolian's results and the Early Nature of Control  203
    5.5.1 Tavakolian's Results  204
    5.5.2 Two Solutions  207
    5.5.3 PRO as Pro, or as a Neutralized Element  208
    5.5.4 The Control Rule, Syntactic Considerations: The Question of C-command  213
    5.5.5 The Abrogation of DS functions  220
  5.6 Case Study II: Condition C and Dislocated Constituents  224
    5.6.1 The Abrogation of DS Functions: Condition C  226
    5.6.2 The Application of Indexing  229
    5.6.3 Distinguishing Accounts  234
  5.7 Case Study III: Wh-Questions and Strong Crossover  239
    5.7.1 Wh-questions: Barriers framework  240
    5.7.2 Strong Crossover  242
    5.7.3 Acquisition Evidence  245
    5.7.4 Two possibilities of explanation  248
    5.7.5 A Representational Account  249
    5.7.6 A Derivational Account, and a Possible Compromise  251

References  259
Index  273
There are two ways of painting two trees together. Draw a large tree and add a small one; this is called fu lao (carrying the old on the back). Draw a small tree and add a large one; this is called hsieh yu (leading the young by the hand). Old trees should show a grave dignity and an air of compassion. Young trees should appear modest and retiring. They should stand together gazing at each other.

Mai-mai Sze, The Way of Chinese Painting
Acknowledgments
This book had its origins as a linguistics thesis at the University of Massachusetts. First of all, I would like to thank my committee: Tom Roeper, for scores of hours of talk, for encouragement, and for his unflagging conviction of the importance of work in language acquisition; Edwin Williams, for the example of his work; Lyn Frazier, for an acute and creative reading; and Chuck Clifton, for a psychologist's view. More generally, I would like to thank the faculty and students of the University of Massachusetts, for making it a place where creative thinking is valued. The concerns and orientation of this book are very much molded by the training that I received there.

Further back, I would like to thank the people who got me interested in all of this in the first place: Steve Pinker, Jorge Hankamer, Jane Grimshaw, Annie Zaenen, Merrill Garrett and Susan Carey. I would also like to thank Noam Chomsky for encouragement throughout the years.

Since the writing of the thesis, I have had the encouragement and advice of many fine colleagues. I would especially like to thank Susan Powers, Alan Munn, Cristina Schmitt, Juan Uriagereka, Anne Vainikka, Ann Farmer, and Ana-Teresa Perez-Leroux. I am also indebted to Sandiway Fong, as well as Bob Krovetz, Christiane Fellbaum, Kiyoshi Yamabana, Piroska Csuri, and the NEC Research Institute for a remarkable environment in which to pursue the research further. I would also like to thank Mamoru Saito, Hajime Hoji, Peggy Speas, Juergen Weissenborn, Clare Voss, Keiko Muromatsu, Eloise Jelinek, Emmon Bach, Jan Koster, and Ray Jackendoff.

Finally, I would like to thank my parents, Charles and Lillian Lebeaux, my sister, Debbie Lebeaux, and my sons, Mark and Theo. Most of all, I would like to thank my wife Pam, without whom this book would have been done badly, if at all. This book is dedicated to her, with love.
Preface
What is the best way to structure a grammar? This is the question that I started out with in writing my thesis in 1988. I believe that the thesis had a marked effect on the answering of this question, particularly in the creation of the Minimalist Program by Chomsky (1993) a few years later. I attempted real answers to the question of how to structure a grammar, and the answers were these:

(i) In acquisition, the grammar is arranged along the lines of subgrammars. These grammars are arranged so that the child passes from one to the next, and each succeeding grammar contains the last. I shall make this clearer below.

(ii) In addition, in acquisition, the child proceeds to construct his/her grammar from derivational endpoints (Chapter 5). From the derivational endpoints, the child proceeds to construct the entire grammar. This construction may be forward or backward, depending on what the derivational endpoint is. If the derivational endpoint, or anchorpoint, is DS, then the construction is forward; if it is S-structure or the surface, then the construction proceeds backwards.

The above two proposals were the main proposals made about the acquisition sequence. There were many proposals made about the syntax. Of these, the main architectural proposals were the following.

(iii) The acquisition sequence and the syntax — in particular, the syntactic derivation — are not to be considered in isolation from each other, but rather are tightly yoked. The acquisition sequence can be seen as the result of derivational steps or subsequences (as can be seen in Chapters 2, 3, and 4). This means that the acquisition sequence gives unique purchase on the derivation itself, including the adult derivation.

(iv) Phrase structure is not given as is, nor derived top-down, but rather is composed (Speas 1990). This phrase structure composition (Lebeaux 1988) is not strictly bottom-up, as in Chomsky's (1995) Merge, but rather involves
(a) the intermingling of units; (b) grammatical licensing, rather than a simply geometrical (bottom-up) character (in a way which will become clearer below); and (c), among other transformations, the transformation Project-α (Chapter 4).

(v) Two specific composition operations (and the beginnings of a third) are proposed. Adjoin-α (Chapter 3) is proposed, adding adjuncts to the basic nuclear clause structure (Conjoin-α is also suggested in that chapter). This is quite similar to the Adjunction operation of Joshi and Kroch, and the Tree Adjoining Grammars (Joshi 1985; Joshi and Kroch 1985; Frank 1992), though the proposals are independent and not exactly the same. The second new composition operation is Project-α (Chapter 4), which is an absolutely new operation in the field. It projects open class structure into a closed class frame, and constitutes the single most radical syntactic proposal of this book.

(vi) Finally, composition operations, and the variance in the grammar as a whole, are linked to the closed class set: elements like the, a, to, of, etc. In particular, each composition operation requires the satisfaction of a closed class element, and a closed class element is implicated in each parameter.

These constitute some of the major proposals made in the course of this thesis. In this preface I would like both to lay out these proposals in more detail, and to compare them with some of the other proposals that have been made since the publication of this thesis in 1988. While this thesis played a major role in the coming of the Minimalist Program (Chomsky 1993, 1995), the ideas of the thesis warrant a renewed look by researchers in the field, for they have provocative implications for the treatment of language acquisition and the composition of phrase structure.
Let us start to outline the differences of this thesis with respect to later proposals, not with respect to language acquisition, but with respect to syntax. In particular, let us start with parts (iv) and (v) above: that the phrase marker is composed from smaller units. A similar proposal is made with Chomsky's (1995) Merge. However, here, unlike Merge:

(1) The composition is not simply bottom-up, but involves the possible intermingling of units.

(2) The composition is syntactically triggered, in that all phrase structure composition involves the satisfaction of closed class elements (Chapters 3 and 4), and is not simply the geometric putting together of two units, as in Merge.

(3) The composition consists of two operations among others (these are the only two that are developed in this thesis), Adjoin-α and Project-α.
With respect to the idea that all composition operations are syntactically triggered by features, let us take the operation Adjoin-α. This takes two structures and adjoins the second into the first.

(1) s1: the man met the woman
    s2: who loved him
    → Adjoin-α → the man met the woman who loved him

This shows the intermingling of units, as the second is intermeshed with the first. However, I argue here (Chapter 4) that it also shows the satisfaction of closed class elements, in an interesting way. Let us call the wh-element of the relative clause, here who, the relative clause linker. It is a proposal of this thesis that the adjunction operation itself involves the satisfaction of the relative clause linker (who) by the relative clause head (the woman), and that it is this relation, the relation of Agreement, which composes the phrase marker. The relative clause linker is part of the closed class set. It is satisfied in the course of Agreement; thus the composition operation is put into a one-to-one relation with the satisfaction of a closed class head. (This proposal, so far as I know, is brand new in the literature.)

(2) Agree (relative head / relativizer) ↔ Adjoin-α
This goes along with the proposal (Chapter 4), which was taken up in the Minimalist literature (Chomsky 1992, 1995), that movement involves the satisfaction of closed class features. The proposal here, however, is that composition, as well as movement, involves the satisfaction of a closed class feature (in particular, Agreement). In the position taken here, and taken up in the Minimalist literature, the movement of an element to the subject position is put into a one-to-one correspondence with agreement (Chapter 4 again).

(3) Agree (subject / predicate) ↔ Move NP (Chapter 4)
The proposal here is thus more thoroughgoing than that in the minimalist literature, in that both the composition operation, and the movement operation are triggered by Agreement, and the satisfaction of closed class features. In the minimalist literature, it is simply movement which is triggered by the satisfaction
of closed class elements (features); phrase structure composition is done simply geometrically (bottom-up). Here, both are done through the satisfaction of Agreement. This is shown below.

(4)
                                   Minimalism                Lebeaux (1988)
    Movement                       syntactic (satisfaction   syntactic (satisfaction
                                   of features)              of features)
    Phrase Structure Composition   asyntactic (geometric)    syntactic (satisfaction
                                                             of features)
This proposal (Lebeaux 1988) links the entire grammar to the closed class set: both the movement operations and the composition operations are linked to this set. The set of composition operations discussed in this thesis is not intended to be exhaustive, merely representative.

Along with Adjoin-α, which Chomsky-adjoins elements into the representation (Chapter 3), let us take the second, yet more radical phrase structure composition operation, Project-α. This is not equivalent to Speas' (1990) Project-α, but rather projects an open class structure into a closed class frame. The open class structure also represents pure thematic structure, and the closed class structure, pure Case structure. This operation, for a simple partial sentence, looks like (5) (see Lebeaux 1988, 1991, 1997, 1998 for further extensive discussion). The operation projects the open class elements into the closed class (Case) frame. It also projects up the Case information from Determiner to DP, and unifies the theta information from the theta subtree into the Case Frame, so that it appears on the DP node.

The Project-α operation was motivated in part by the postulation of a subgrammar in acquisition (Chapters 2, 3, and 4), in part by the remarkable speech error data of Garrett (Chapter 4; Garrett 1975), and in part by idioms (Chapter 4). This operation is discussed at much greater length in further developments by myself (Lebeaux 1991, 1997, 1998). I will discuss the subgrammar underpinnings of the Project-α approach in more detail later in this preface. For now, I would simply like to point to the remarkable speech error data collected by Merrill Garrett (1975, 1980), the MIT corpus, which anchors this approach.
(5) Theta subtree (open class):
        (V (N[agent] man) (V (V see) (N[patient] woman)))
    Case Frame (closed class):
        (VP (DP[+nom] (Det[+nom] the) (NP e)) (V′ (V see) (DP[+acc] (Det[+acc] a) (NP e))))

    → Project-α →

    (VP (DP[+nom, +agent] (Det[+nom] the) (NP[+agent] man))
        (V′ (V see) (DP[+acc, +patient] (Det[+acc] a) (NP[+patient] woman))))
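The mechanics of (5) — open class stems inserted into the open slots of a closed class frame, with Case and theta features unified on the filled nodes — can be sketched computationally. The following is a toy model only; the encodings (lists of pairs, feature dictionaries, a "predicate" placeholder role for the verb) are my own illustrative assumptions, not Lebeaux's formalism.

```python
# Toy Project-alpha: fill the open slots of a closed-class Case frame
# with open-class stems from the theta subtree, unifying Case and
# theta features on the filled nodes. Encodings are illustrative.

def project_alpha(theta_subtree, case_frame):
    """theta_subtree: (stem, theta_role) pairs in linear order.
    case_frame: (item, case) slots; an item of None marks an open slot."""
    stems = iter(theta_subtree)
    projected = []
    for item, case in case_frame:
        if item is None:                      # open slot: fill from theta subtree
            stem, role = next(stems)
            projected.append((stem, {"case": case, "theta": role}))
        else:                                 # closed-class item stays in place
            projected.append((item, {"case": case}))
    return projected

# (5): (man (see woman)) projected into (the __ (see (a __)))
theta = [("man", "agent"), ("see", "predicate"), ("woman", "patient")]
frame = [("the", "nom"), (None, "nom"), (None, None), ("a", "acc"), (None, "acc")]
for word, feats in project_alpha(theta, frame):
    print(word, feats)
```

The closed class items carry the Case slots and the open class items carry the theta roles; unification puts both feature sets on the same surface node, as in the output tree of (5).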
Garrett and Shattuck-Hufnagel collected a sample of 3400 speech errors. Of these, by far the most interesting class is the so-called “morpheme-stranding” errors. These are absolutely remarkable in that they show the insertion of open class elements into a closed class frame. Thus, empirically, the apparent “importance” of open class and closed class items is reversed — rather than open class items being paramount, closed class items are paramount, and guide the derivation. Open class elements are put into slots provided by closed class elements, in Garrett’s remarkable work. A small sample of Garrett’s set is shown below.
(6) Speech errors (stranded morpheme errors), Garrett (personal communication); permuted elements underlined in the original:

    Error                               Target
    my frozers are shoulden             my shoulders are frozen
    that just a back trucking out       a truck backing out
    McGovern favors pushing busters     favors busting pushers
    but the clean's twoer               ... two's cleaner ...
    his sink is shipping                ship is sinking
    the cancel has been practiced       the practice has been cancelled
    she's got her sets sight            ... sights set ...
    a puncture tiring device            ... tire puncturing device ...
As can be seen, these errors can only arise at a level where open class elements are inserted into a closed class frame. The insertion does not take place correctly — a speech error — so that the open class elements end up in permuted slots (e.g. a puncture tiring device). Garrett summarizes this as follows:

    ... why should the presence of a syntactically active bound morpheme be associated with an error at the level described in [(6)]? Precisely because the attachment of a syntactic morpheme to a particular lexical stem reflects a mapping from a "functional" level [i.e. "grammatical functional", i.e. my theta subtree, D. L.] to a "positional" level of sentence planning ...
This summarizes the two phrase structure composition operations that I propose in this thesis: Adjoin-α and Project-α. As can be seen, these involve (1) the intermingling of structures (they are not simply bottom-up), and (2) the satisfaction of closed class elements.

Let us now turn to the general acquisition side of the problem. It was said above that this thesis was unique in that the acquisition sequence and the syntax — in particular, the syntactic derivation — were not considered in isolation, but rather in tandem. The acquisition sequence can be viewed as the output of derivational processes. Therefore, to the extent to which the derivation is partial, the corresponding stage of the acquisition sequence can be seen as a subgrammar of the full grammar. The yoking of the acquisition sequence and the syntax is therefore the following:

(7) Acquisition: subgrammar approach
    Syntax:      phrase structure composition from smaller units
The subgrammar approach means that children literally have a smaller grammar than the adult. The grammar increases over time by adding new structures (e.g. relative clauses, conjunctions), and by adding new primitives of the representational vocabulary, as in the change from pure theta composed speech to theta and Case composed speech. The addition of new structures, e.g. relative clauses and conjunctions, may be thought of as follows. A complex sentence like that in (8) may be thought of as a triple: the two units, and the operation composing them (8b).

(8) a. The man saw the woman who loved him.
    b. (the man saw the woman (rooted), who loved him, Adjoin-α)
Therefore a subgrammar lacking the operation joining the units may be thought of as simply taking one of the units — let us say the rooted one — and letting go of the other unit (plus letting go of the operation itself). This is possible and necessary because it is the operation itself which joins the units: if the operation is not present, one or the other of the units must be chosen. The subgrammar behind (8a), but lacking the Adjoin-α operation, will therefore generate the structure in (9) (assuming that it is the rooted structure which is chosen).

(9) The man saw the woman.
This is what is wanted. Note that the subgrammar approach (in acquisition) and the phrase structure composition approach (in syntax itself) are in perfect parity. The phrase structure composition approach gives the actual operation dividing the subgrammar from the supergrammar. That is, with respect to this operation (Adjoin-α), the grammars are arranged in two concentric circles: Grammar 1 containing the grammar itself, but without Adjoin-α, and Grammar 2 containing the grammar including Adjoin-α.

(10) Grammar 1 ⊂ Grammar 2 (= Grammar 1 + Adjoin-α)
The above is a case of adding a new operation. The case of adding another representational primitive is yet more interesting.
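The containment in (10) can be read as set inclusion over outputs: everything the subgrammar generates, the supergrammar generates too. A minimal sketch, under my own encoding assumptions (strings for the two units, a named set for the operations):

```python
# Toy illustration of the subgrammar nesting in (10): Grammar 2 is
# Grammar 1 plus Adjoin-alpha, so Grammar 1's output is properly
# contained in Grammar 2's. The encoding is an illustrative assumption.

def generate(rooted, adjunct, operations):
    """Outputs of a (sub)grammar over one (rooted, adjunct) pair:
    without the composing operation, only the rooted unit survives."""
    outputs = {rooted}
    if "Adjoin-alpha" in operations:
        # Simplification: the adjunct attaches at the right edge.
        outputs.add(rooted + " " + adjunct)
    return outputs

g1 = generate("the man saw the woman", "who loved him", set())
g2 = generate("the man saw the woman", "who loved him", {"Adjoin-alpha"})
print(g1 < g2)   # proper containment: the subgrammar nesting of (10)
```

Removing the operation removes exactly the composed forms, which is what makes (9) the output of the subgrammar behind (8a).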
Let us assume that the initial grammar is a pure representation of theta relations. At a later stage, Case comes in. This is the hypothesis of the "layering of vocabulary": one type of representational vocabulary comes in, and does not displace, but rather is added to, another.

(11) Stage I: theta → Stage II: theta + Case
The natural lines along which this representational addition takes place are precisely given by the operation Project-α. The derivation may again be thought of as a triple: the two composing structures, one a pure representation of theta relations and one a pure representation of Case, and the operation composing them.

(12) ((man (see woman)), (the __ (see (a __))), Project-α)

     (The "see" in the theta tree and in the Case frame each contain partial information, which is unified in the Project-α operation.)

The subgrammar is one of the two representational units: in this case, the unit (man (see woman)). That is a sort of pure theta representation, or telegraphic speech. The sequence from Grammar 0 to Grammar 1 is therefore given by the addition of Project-α.

(13) Grammar 0 ⊂ Grammar 1 (= Grammar 0 + Project-α)
The full pattern of stage-like growth is shown in the chart below:

(14) A: Subgrammar Approach
     Add construction operations to simplified tree:   relative clauses, conjunction (not discussed here)
     Add primitives to representational vocabulary:    theta → theta + Case
As can be seen, the acquisition sequence and the syntax — syntactic derivation — are tightly yoked. Another way of putting the arguments above is in terms of distinguishing
accounts. I wish to distinguish the phrase structure operations here from Merge, and the acquisition subgrammar approach here from the alternative, which is the Full Tree, or Full Competence, Approach. (The full tree approach holds that the child does not start out with a substructure, but rather has the full tree at all stages of development.) Let us see how the accounts are distinguished, in turn.

Let us start with Chomsky's Merge. According to Merge, the (adult) phrase structure tree, as in Montague (1974), is built up bottom-up, taking individual units and joining them together, and so on. The chief property of Merge is that it is strictly bottom-up. Thus, for example, in a right-branching structure like "see the big man", Merge would first take big and man and merge them together, then add the to big man, and then add see to the resultant.

(15) Application of Merge:

     (N′ (Adj big) (N man))
     → (DP (Det the) (NP (Adj big) (N man)))
     → (VP (V see) (DP (Det the) (NP (Adj big) (N man))))

The proposal assayed in this thesis (Lebeaux 1988) would, however, have a radically different derivation. It would take the basic structure as being the basic government relation: (see man). This is the primitive unit (unlike with Merge). To this, the the and the big may be added, by separate transformations, Project-α and Adjoin-α, respectively.
(16) a. Project-α:

        Theta subtree:  (V (V see) (N man))
        Case Frame:     (V′ (V (see)) (DP (Det the) (NP e)))

        → Project-α → (V′ (V see) (DP (Det the) (NP man)))

     b. Adjoin-α:

        (V′ see (DP the man)) + (ADJ big)
        → Adjoin-α → (V′ see (DP the (NP big man)))
How can these radically distinct accounts (Lebeaux 1988 and Merge) be empirically distinguished? I would suggest two ways. First, conceptually, the proposal here (as in Chomsky 1955/1975, 1957, and Tree Adjoining Grammars, Kroch and Joshi 1985) takes information nuclei as its input structures, not arbitrary pieces of string. For example, for the structure "The man saw the photograph that was taken by Stieglitz", the representation here would take the two clausal nuclear structures, shown in (17) below, and adjoin them. This is not true for Merge, which does not deal in nuclear units.

(17) s1: the man saw the photograph
     s2: that was taken by Stieglitz
     → Adjoin-α → the man saw the photograph that was taken by Stieglitz

Even more interesting nuclear units are implicated in the transformation Project-α, where the full sentence is decomposed into a nuclear unit, which is the theta subtree, and the Case Frame.
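The intermingling of the two nuclear units in (17) can be sketched as an operation over bracketed phrase markers. The sketch below is a toy model under my own encoding assumptions (nested Python lists with the node label first); it is not Lebeaux's formalism, and the node labels are simplified.

```python
# Toy Adjoin-alpha over bracketed phrase markers: the adjunct is
# Chomsky-adjoined at a target node inside the rooted structure,
# intermingling the two nuclear units. Encodings are illustrative.

def adjoin_alpha(tree, is_target, adjunct):
    """Chomsky-adjoin `adjunct` at the node satisfying `is_target`:
    [X ...] becomes [X [X ...] adjunct]."""
    if not isinstance(tree, list):
        return tree
    if is_target(tree):
        return [tree[0], tree, adjunct]
    return [tree[0]] + [adjoin_alpha(c, is_target, adjunct) for c in tree[1:]]

def leaves(tree):
    """Yield the terminal string of a phrase marker, left to right."""
    if isinstance(tree, list):
        for child in tree[1:]:
            yield from leaves(child)
    else:
        yield tree

s1 = ["S", ["DP", "the", "man"],
           ["VP", "saw", ["DP", "the", "photograph"]]]
s2 = ["CP", "that", "was", "taken", "by", "Stieglitz"]

# Target the object DP (the relative clause head), as in (17):
result = adjoin_alpha(s1, lambda n: n[0] == "DP" and "photograph" in n, s2)
print(" ".join(leaves(result)))
# → the man saw the photograph that was taken by Stieglitz
```

Note that the relative clause lands inside the rooted structure, not at its edge: the output is not an extension of s1's frontier but a structure in which s2 is intermeshed with s1.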
(18) The man saw the woman
     theta subtree: (man (see woman))
     Case Frame:    (the __ (see (a __)))
The structure in (18), the man saw the woman, is composed of a basic nuclear unit, (man (see woman)), which is telegraphic speech (as argued in Chapter 2). No such nuclear unit exists in the Merge derivation of "the man saw the woman": that is, in the Merge derivation, (man (see woman)) does not exist as a substructure of ((the man) (saw (the woman))). This is the conceptual argument for preferring the composition operation here over Merge.

In addition, there are two simplicity arguments, of which I will give just one here. The simplicity argument has to do with a set of structures that children produce which are called replacement sequences (Braine 1976). In these sequences, the child is trying to reach (output) some structure which is somewhat too difficult for him/her. To make it, therefore, he or she first outputs a substructure, and then the whole structure. Examples are given below: the first line is the first outputted structure, and the second line is the second outputted structure, as the child attempts to reach the target (which is the second line).

(19) see ball        (first output)
     see big ball    (second output and target)

(20) see ball        (first output)
     see the ball    (second output and target)
What is striking about these replacement sequences is that the child does not simply first output random substrings of the final target, but rather that the first output is an organized part of the second. Thus in both (19) and (20), what the child has done is first isolate out the basic government relation, (see ball), and then added to it: with “big” and “the”, respectively. The particular simplifications chosen are precisely what we would expect with the substructure approach outlined here, and crucially not with Merge. With the substructure approach outlined here (Chapter 2, 4), what the child (or adult) first has in the derivation is precisely the structure (see ball), shown in example (21).
(21) (V (V see) (N[+patient] ball))
To this structure other elements are then added, by Project-α or Adjoin-α. Thus, crucially, the first structure in (19) and (20) actually exists as a literal substructure of the final form (line 2), and thus could help the child in deriving the final form. It literally goes into the derivation. By contrast, with Merge, the first line in (19) and (20) never underlies the second line. It is easy to see why. Merge is simply bottom-up: it extends the phrase marker. Therefore, the phrase structure composition derivation underlying (20), line 2, is simply the following Merge derivation.

(22) Merge derivation underlying (20), line 2:
     (N ball)
     (DP (D the) (N ball))
     (see (DP (D the) (N ball)))
However, this derivation crucially does not have the first line of (20) — (see ball) — as a subcomponent. That is, (see ball) does not go into the making of (see (the ball)) in the Merge derivation, but it does in the substructure derivation. This is a strong argument against Merge. For the first line of the outputted sequence of (20), (see ball), is presumably helping the child in reaching the ultimate target (see (the ball)). But this is impossible with Merge, for the first line in (20) does not go into the making of the second line, according to the Merge derivation. That is, Merge cannot explain why (see ball) would help the child get to the target (see (the ball)), since (see ball) is not part of the derivation of (see (the ball)) in the Merge derivation. It is part of the sub-derivation in the substructure approach outlined here, because of the operation Project-α.

The above (see Chapters 2, 3, and 4) differentiates the sort of phrase structure composition operations found here from Merge. This is in the domain of syntax — though I have used language acquisition argumentation. In the domain of language acquisition proper, the proposal of this thesis — the hypothesis of substructures — must be contrasted with the alternative, which holds that the child is outputting the full tree even when the child is potentially just in the one-word stage: this may be called the Full Tree Hypothesis. These
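The contrast can be made concrete by enumerating the intermediate constituents each derivation builds. A toy sketch under my own encoding assumptions (tuples for constituents); it is only meant to exhibit which units arise as derivational steps in each account:

```python
# Which intermediate units does each derivation of "see the ball"
# contain? Tuples encode constituents; the encoding is illustrative.

def merge_derivation():
    """Strictly bottom-up Merge: each step extends the previous result."""
    ball = ("ball",)
    the_ball = ("the", ball)
    see_the_ball = ("see", the_ball)
    return {ball, the_ball, see_the_ball}

def substructure_derivation():
    """Substructure approach: start from the government relation
    (see ball) -- line 1 of (20) -- then add the closed-class frame."""
    see_ball = ("see", ("ball",))
    see_the_ball = ("see", ("the", ("ball",)))
    return {see_ball, see_the_ball}

# (see ball) is a derivational step only in the substructure account:
print(("see", ("ball",)) in merge_derivation())         # False
print(("see", ("ball",)) in substructure_derivation())  # True
```

Both derivations reach the same target, but only the substructure derivation passes through the child's first output in (20), which is the crux of the replacement-sequence argument.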
xxv
PREFACE
differential possibilities are shown below. (For much additional discussion, see Lebeaux 1991, 1997, 1998, in preparation.) (23)
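The derivational contrast between Merge and the substructure approach can be made concrete in a toy sketch (my own illustration, not the author's formalism; the tuple-trees and stage lists are assumptions for expository purposes):

```python
# Toy illustration: the intermediate structures of a bottom-up Merge
# derivation of "see the ball" never include [see [ball]], whereas the
# substructure derivation, via Project-alpha, does.

# Trees are nested tuples: (label, child, ...).
merge_stages = [
    ("N", "ball"),
    ("DP", ("D", "the"), ("N", "ball")),
    ("V", "see", ("DP", ("D", "the"), ("N", "ball"))),
]

# Substructure derivation: the bare theta tree [see [ball]] is built
# first; Project-alpha then fuses in the closed-class (determiner) frame.
substructure_stages = [
    ("V", "see", ("N", "ball")),
    ("V", "see", ("DP", ("D", "the"), ("N", "ball"))),
]

theta_tree = ("V", "see", ("N", "ball"))  # the early child form "see ball"
print(theta_tree in merge_stages)         # False
print(theta_tree in substructure_stages)  # True
```

The early child utterance thus appears as a literal derivational stage only on the substructure approach.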
(23)
                            Lebeaux (1988)                  Distinguished from
     Syntax                 phrase structure composition    both: (1) no composition; (2) Merge
     Language acquisition   subgrammar approach             Full Tree Approach

Let us now briefly distinguish the proposals here from the Full Tree Approach. In the Full Tree Approach, the structure underlying a child sentence like "ball" or "see ball" might be the following in (24). In contrast, the substructure approach (Lebeaux 1988) would assign the radically different representation given in (25).

(24)
Full Tree Approach (all nodes empty (e) except the noun):

     [IP [DP [D e] [NP e]] [TP [T e] [AgrSP [AgrS e] [AgrOP [AgrO e] [VP [DP e] [V′ [V e] [DP [D e] [NP ball]]]]]]]]
(25) Substructure Approach:

     [V V [N, +patient ball]]
How can these approaches be distinguished? That is, how can a choice be made between (25), the substructure approach, and (24), the Full Tree Approach? I would suggest, briefly, at least four ways (for full argumentation, see Lebeaux 1997, to appear; Powers and Lebeaux 1998).

First, the subgrammar approach, but not the Full Tree Approach, has some notion of simplicity in representation and derivation. Simplicity is a much-used notion in science, for example in deciding between two equally empirically adequate theories. The Full Tree Approach has no notion of simplicity: in particular, it has no account of how the child would proceed from simpler structures to more complex ones. The substructure theory, on the other hand, has a strong proposal to make: the child proceeds over time from simpler structures to those which are more complex. Thus the subgrammar point of view makes a strong proposal linked to simplicity, while the Full Tree Hypothesis makes none.

A second argument has to do with the closed class elements, and may be broken up into two subarguments. The first is that, in the Full Tree Approach, there is no principled reason for the exclusion of closed class elements in early speech (telegraphic speech). That is, both the open class and closed class nodes exist, according to the Full Tree Hypothesis, and there is no principled reason why initial speech would be simply open class, as it is. Since the full tree is present, lexical insertion could take place just as easily in the closed class nodes as in the open class nodes. The fact that it does not leaves the Full Tree Approach with no principled reason why closed class items are lacking in early speech. The second subargument has to do with the special role that closed class items have in structuring an utterance, as shown by the work of Garrett (1975, 1980) and Gleitman (1990). Since the Full Tree Approach gives open and closed class items the same status, it has no explanation for why closed class items play a special role in processing and acquisition. The substructure approach, with Project-α, on the other hand, faithfully models the difference, by having open class and closed class elements initially on different representations, which are then fused (for additional discussion, see Chapter 4, and Lebeaux 1991, 1997, to appear).

A third argument against the Full Tree Approach has to do with structures like "see ball" (natural) vs. "see big" (unnatural), given below.

(26) see ball (natural and common)
     see big (unnatural and uncommon)
Why would an utterance like "see ball" be natural and common for the child — maintaining the government relation — while "see big" is unnatural and uncommon? There is a common-sense explanation for this: "see ball" maintains the government relation (between a verb and a complement), while "see" and "big" have no natural relation. Obvious as this fact is, it cannot be accounted for with the Full Tree Approach. The reason is that the Full Tree Approach has all nodes potentially available for use, including the adjectival ones. Thus there would be no constraint against lexically inserting "see" and "big" (rather than "see" and "ball"). On the substructure approach, on the other hand, there is a marked difference: "see" and "ball" are on a single primitive substructure — the theta tree — while "see" and "big" are not.

A fourth argument against the Full Tree Approach and for the substructure approach comes from a paper by Laporte-Grimes and Lebeaux (1993). In this paper, the authors show that the acquisition sequence proceeds almost sequentially in terms of the geometric complexity of the phrase marker. That is, children first output binary branching structures, then doubly binary branching, then triply binary branching, and so on. This complexity result would be unexpected on the Full Tree Approach, where the full tree is always available.

This concludes the four arguments against the Full Tree Approach, and for the substructure approach, in acquisition. The substructure approach (in acquisition) and the composition of the phrase marker (in syntax) form the two main proposals of this thesis. Aside from these main lines of argumentation, which I have just given, there are a number of other proposals in this thesis. I simply list them here.

(1) One main proposal, which I take up in all of Chapter 5, is that the acquisition sequence is built up from derivational endpoints.
In particular, for some purposes, the child’s derivation is anchored in the surface, and only goes part of the way back to DS. The main example of this can be seen with dislocated constituents. In examples like (27a) and (b), exemplifying Strong Crossover and a Condition C violation respectively, the adult would not allow these constructions, while the child does.
(27) a. *Which man_i did he_i see t?          (OK for child)
     b. *In John's_i house, he_i put a book t. (OK for child)
It cannot simply be said, for cases like (27b), that Condition C does not apply in the child's grammar, because it does apply in nondislocated structures (Carden 1986b). The solution to this puzzle — and there exist a large number of similar puzzles in the acquisition literature; see Chapter 5 — is that Condition C in general applies over direct c-command relations, including at D-Structure (Lebeaux 1988, 1991, 1998), and that the child analyzes structures like (27b) as if they were dislocated at all levels of representation, thus never triggering Condition C (a similar analysis holds of Strong Crossover, construed as a Condition C-type constraint at DS; van Riemsdijk and Williams 1981). That is, the child derivation, unlike the adult's, does not have movement, but starts out with the element in a dislocated position and indexes it to the trace. This explains the lack of Condition C and Crossover constraints (shown in Chapter 5). It does so by saying that the child's derivation is shallow: anchored at SS or the surface, with the dislocated item never treated as if it were fully back in the DS position. This is the shallowness of the derivation, anchored in SS (discussed in Chapter 5).

(2) A number of proposals are made in Chapter 2. One main proposal concerns the theta tree. In order to construct the tree, one takes a lexical entry, and does lexical insertion of open class items directly into it. This is shown in (28).
(28)
                 V
               /   \
      man →  N      V
                   /  \
                  V    N, +patient
                 see        ← woman
This means that the passage from the lexicon to the syntax is in fact a continuum: the theta subtree constitutes an intermediate structure between those usually thought to be in the lexicon and those in the syntax. This is a radical proposal.

A second proposal made in Chapter 2 is that X′ projections project up only as far as they need to. Thus if one assumes the X′-theory of Jackendoff (1977), as I did in this thesis — recall that Jackendoff had three X′ levels — then an element might project up to the single-bar level, the double-bar level, or all the way up to the triple-bar level, as needed.
(29)  N′′′
       |
      N′′
       |
      N′
       |
      N

This was called the hypothesis of submaximal projections.

A final proposal of Chapter 2 is that the English nominal system is ergative. That is, a simple intransitive noun phrase like those in (30), with the subject in the subject position (of the noun phrase), is always derived from a DS in which the subject is a DS object. Crucially, this includes not simply unaccusative verbs (i.e. nominals from unaccusative verbs) but unergative verbs as well (such as sleeping and swimming).

(30)
     a. John's sleeping    derived from:  the sleeping of John (subject internal)
     b. John's swimming    derived from:  the swimming of John (subject internal)
This means that the English nominal system is actually ergative in character — a startling result.

Some final editorial comments. For space reasons in this series, Chapter 5 of the original thesis has been deleted, and Chapter 6 has been renumbered Chapter 5. Second, I have maintained the phrase structure nodes of the original trees, rather than trying to "update" them with the more recent labels. The current IP is therefore generally labelled S (sentence), the current DP is generally labelled NP (noun phrase), and the current CP is sometimes labelled S′ (S-bar, the old name for CP). Third, the term dislocation in Chapter 5 is intended to be neutral by itself between moved and base-generated; the argument of that section is that wh-elements which are moved by the adult are base-generated in dislocated positions by the child. Finally, I would like to thank Lisa Cheng and Anke de Looper for helpful editorial assistance.
Introduction

This work arose out of an attempt to answer three questions:

I.   Is there a way in which the Government-Binding theory of Chomsky (1981) can be formulated so that the leveling in it is more essential than in the current version of the theory?
II.  What is the relation between the sequence of grammars that the child adopts and the basic formation of the grammar, and is there such a relation?
III. Is there a way to anchor Chomsky's (1981) finiteness claim that the set of possible human grammars is finite, so that it becomes a central explanatory factor in the grammar itself?

The work attempts to accomplish the following:

I.   To provide for an essentially leveled theory, in two ways: by showing that DS and SS are clearly demarcated, positing operations additional to Move-α which relate them; and by suggesting that there is an ordering in addition by vocabulary, the vocabulary of description (in particular, Case and theta theory) accumulating over the derivation.
II.  To relate this syntactically argued-for leveling to the acquisition theory, again in two ways: by arguing that the external levels (DS, the surface, PF) may precede S-structure with respect to the induction of structure; and by positing a general principle, the General Congruence Principle, which relates acquisition stages and syntactic levels.
III. To give the closed class elements a crucial role to play: with respect to parametric variation, they are the locus of the specification of parametric difference; and with respect to the composition of the phrase marker, it is the need for closed class (CC) elements to be satisfied which gives rise to phrase marker composition from more primitive units, and which initiates Move-α as well.

In terms of syntactic content, Chapters 2–4 deal with phrase structure — both the acquisition and the syntactic analysis thereof — and Chapter 5 deals with the interaction of indexing functions, Control and Binding Theory, with levels of representation, particularly as it is displayed in the acquisition sequence.
Thematically, a number of concerns emerge throughout. A major concern is with closed class elements and finiteness. With respect to parametric variation, I suggest that closed class elements are the locus of parametric variation. This guarantees the finiteness of possible grammars in UG, since the set of possible closed class elements is finite.[1] With respect to phrase structure composition, it is the closed class elements, and the necessity for their satisfaction, which require the phrase marker to be composed, and which initiate movement as well (e.g. Move-wh is in a one-to-one correspondence with the lexical necessity: satisfy the +wh feature). The phrase marker composition has some relation to the traditional generalized transformations of Chomsky (1957), and it may apply (in the case of Adjoin-α) after movement. But the composition that occurs is of a strictly limited sort, where the units are demarcated according to the principles of GB. Finally, closed class elements form a fixed frame into which the open class (OC) elements are projected (Chapters 1, 2, and 4). More exactly, they form a Case frame into which a theta subtree is projected (Chapter 4). This rule I call Merger (or Project-α).

A second theme is the relation of stages in acquisition to levels of grammatical representation. Given the apparent difficulty of any theory which involves the learning of transformations,[2] the precise nature of the relation of the acquisition sequence to the structure of the grammar has remained murky, without a theory of how the grammatical acquisition sequence interacts with, or displays, the structure of the grammar, and with, perhaps, many theoreticians believing that any correspondence is otiose. Yet there is considerable reason to believe that there should be such a correspondence.
On theoretical grounds, this would be expected for the following reason: the child, in his or her induction of the grammar, is not handed information from all levels of the grammar at once, but rather from particular picked-out levels — the external levels of Chomsky (class lectures, 1985): DS, LF, and PF or the surface. These contrast with the internal level, S-structure. Briefly, information from the external levels is available to the child: about LF, because of the paired meaning interpretation; from the surface, in the obvious fashion; and from DS, construed here simply as the format of lexical forms, which are presumably given by UG. As such, the child's task (still!) involves the interpolation of operations and levels between these relatively fixed points. But this then means
[1] Modulo the comments in Chapter 1, footnote 1.
[2] Because individual transformations are no longer sanctioned in the grammar. I do not believe, however, that the jury is yet in on the type of theory that Wexler and Culicover (1980) envisage.
that the acquisition sequence must build on these external levels, and display the structure of the levels, perhaps in a complex fashion.

A numerical argument leads in the same direction: namely, that the acquisition theory, in addition to being a parametric theory, should contain some essential reference to, and reflect, the structure of the grammar. Suppose that, as above, the closed class elements and their values are identified with the possible parameters. Let us (somewhat fancifully) set the number at 25, and assume that they are binary. This would then give 2²⁵ target grammars in UG (≈ 34 million), a really quite small finite system. But consider the range of acquisition sequences involved. If parameters are independent — a common assumption — then any of these 25 parameters could be set first, then any of the remaining 24, and so on. This gives 25! possible acquisition sequences for the learning of a single language (≈ 1.5 × 10²⁵), a truly gigantic number. That is, the range of acquisition sequences would be much larger than the range of possible grammars, and children might be expected, given independence, to display widely divergent intermediate grammars on their path to the final common target. Yet they do nothing of the sort: acquisition sequences in a given language look remarkably similar. All children pass through a stage of telegraphic speech, and similar sorts of errors are made in structures of complementation, in the acquisition of Control, and so on. There is no wide fecundity in the display of intermediate grammars.

The way that has been broached in the acquisition literature to handle this is the so-called linking of parameters, where the setting of a single parameter leads to another being set. This could restrict the range of acquisition sequences. But the theories embodying this idea have tended to have a rather idiosyncratic and fragmentary character, and have not been numerous.
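The arithmetic behind this argument can be checked directly (the figure of 25 binary parameters is the text's illustrative assumption):

```python
import math

n_params = 25  # the text's illustrative number of binary parameters

# Target grammars: each binary parameter is set independently.
n_grammars = 2 ** n_params
print(n_grammars)  # 33554432, roughly 34 million

# Acquisition sequences, if parameters are independent and may be set
# in any order: 25 choices for the first parameter, 24 for the second,
# and so on.
n_sequences = math.factorial(n_params)
print(f"{n_sequences:.3e}")  # 1.551e+25
```

The gap of some eighteen orders of magnitude between the two figures is the point of the argument.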
The suggestion in this work is that there is substructuring, but that it is not in the lexical-parametric domain itself (conceived of as the set of values for the closed class (CC) elements), but in the operational domain with which this lexical domain is associated. An example of this association was given above with the relation of wh-movement to the satisfaction of the +wh feature; another example would be the satisfaction of the relative clause linker (the wh-element itself), which either needs or does not need to be satisfied in the syntax. This gives rise either to languages in which the relative forms a constituent with the head (English-type languages), or to languages in which it is splayed out after the main proposition (correlative languages).

(1)  Lexical Domain                          Operational Domain
     +wh must be satisfied by SS             Move-wh applies in syntax
     +wh may not be satisfied by SS          Move-wh applies at LF
     Lexical Domain                                        Operational Domain
     Relative clause linker must be satisfied by SS        English-type language
     Relative clause linker may not be satisfied by SS     Correlative language
The theory of this work suggests that all operations are dually specified, in the lexical domain (requiring satisfaction of a CC lexical element) and in the operational domain. The acquisition sequence reflects the structure of the grammar in two ways: via the General Congruence Principle, which states that the stages in acquisition are in a congruence relation with the structure of parameters (see Chapter 3 for discussion), and via the use of the external levels (DS, PF, LF) as anchoring levels for the analysis — essentially, as the inductive basis. The General Congruence Principle is discussed in Chapters 2–4; the possibility of broader anchoring levels, in Chapter 5. The latter point of view is somewhat distinct from the former, and (to be frank) the exact relation between them is not yet clear to the author. It may be that the General Congruence Principle is a special case, when the anchoring level is DS, or it may be that these are autonomous principles. I leave this question open.

The third theme of this work has to do with levels, or precedence relations, in the grammar; in particular, with respect to two issues: (a) Is it possible to make an argument that the grammar is essentially derivational in character, rather than in the representational mode (cf. Chomsky's 1981 discussion of Move-α)? (b) Is there any evidence of intermediate levels, of the sort postulated in van Riemsdijk and Williams (1981)? I believe that considering a wider range of operations than Move-α may move this debate forward. In particular, I propose two additional operations of phrase structure composition: Adjoin-α, which adjoins adjuncts in the course of the derivation, and Project-α, which relates the lexical syntax to the phrasal syntax.

With respect to these operations, two types of precedence relations do seem to hold. First, operation/default organization holds within an operation type: in the case of Adjoin-α and its corresponding default, Conjoin-α, two of the types of generalized transformations in Chomsky (1957) are organized as a single operation type, with an operation/default relation between them. The other precedence relation is vocabulary layering, and this holds between different operations, for example Case and theta theory (see Chapters 2, 3, and 4 for discussion). Further, operations like Adjoin-α may follow Move-α, and this explains the anti-Reconstruction facts of van Riemsdijk and
Williams (1981); such facts cannot easily be explained in the representational mode (see Chapter 3).

In general, throughout this work I will interleave acquisition data and theory with 'pure' syntactic theory, since I do not really differentiate between them. Thus the proposal having to do with Adjoin-α was motivated by pure syntactic concerns (the anti-Reconstruction facts, and the attempt to get a simple description of licensing), but was then carried over into the acquisition sphere. The proposal having to do with the operation Project-α (or Merger) was formulated first in order to give a succinct account of telegraphic speech (and, to a lesser degree, of speech error data), and was then carried over into the syntactic domain. To the extent that this type of work is successful, the two areas, pure syntactic theory and acquisition theory, may be brought much closer together, perhaps identified.
Chapter 1

A Re-Definition of the Problem
1.1 The Pivot/Open Distinction and the Government Relation

For many years language acquisition research has been a sort of weak sister in grammatical research. The reason for this, I believe, lies not so much in its own intrinsic weakness (for a theoretical tour de force, see Wexler and Culicover 1980; see also Pinker 1984), but rather, as in other unequal sibships, in the relation. This relation has not been a close one; moreover, the lionizing of the theoretical importance of language acquisition as the conceptual ground of linguistic theorizing has existed in uneasy conscience alongside a real practical lack of interest. Nor is the fault purely on the side of theoretical linguistics: the acquisition literature, especially on the psychological side, is notorious for having drifted further and further from the original goal of explaining acquisition, i.e. the sequence of mappings which take the child from G₀ to the terminal grammar Gₙ, to the study of a different sort of creature altogether, Child Language (see Pinker 1984 for discussion and a diagnostic).

1.1.1 Braine's Distinction
Nonetheless, even in the psychological literature, especially early on, there were a number of proposals of quite far-reaching importance which would, or could, have had a direct bearing on linguistic theory, and which pointed the way to theories far more advanced than those available at the time. One example is Braine's (1963a) postulation of pivot-open structures in early grammars. Braine essentially noticed and isolated three properties of early speech. First, for a large number of children, the vocabulary divided into two classes, which he called pivot and open. The pivot class was "closed class", partly in the sense in which that term applies in the adult grammar (e.g., containing prepositions, pronouns, etc.), but partly also in a broader sense: it was a class that contained a small set of words which could not be added to, even though these words corresponded to
those which would ordinarily be thought of as open class (e.g. “come”); these words operated on a comparatively large number of open class elements. An example of the Braine data is given below. (1)
Steven's word combinations:
     want baby, want car, want do, want get, want glasses, want head, want high, want horsie, want jeep, want more, want page, want pon, want purse, want ride, want up, want byebye car
     it ball, it bang, it checker, it daddy, it Dennis, it X, etc.
     that box, that Dennis, that X, etc.
     there ball, there book, there doggie, there doll, there high, there momma, there record, there trunk, there byebye car, there daddy truck, there momma truck
     get ball, get Betty, get doll
     see ball, see doll, see record, see Stevie
     whoa cards, whoa jeep
     more ball, more book
     here bed, here checker, here doll, here truck
     bunny do, daddy do, momma do
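The distributional logic behind the pivot/open split can be sketched computationally (a toy illustration under my own assumptions, not Braine's procedure; the small sample and the threshold of three partners are arbitrary choices for the sketch):

```python
from collections import defaultdict

# Pivot candidates: words that recur in a fixed position with several
# different partners. Sample drawn from Steven's combinations above.
utterances = [
    ("want", "baby"), ("want", "car"), ("want", "jeep"),
    ("see", "ball"), ("see", "doll"), ("see", "record"),
    ("there", "ball"), ("there", "book"), ("there", "doggie"),
    ("bunny", "do"), ("daddy", "do"), ("momma", "do"),
]

first_pos = defaultdict(set)   # word -> partners when it occurs first
second_pos = defaultdict(set)  # word -> partners when it occurs second
for w1, w2 in utterances:
    first_pos[w1].add(w2)
    second_pos[w2].add(w1)

# P1 pivots occur first with many partners; P2 pivots occur second.
p1 = {w for w, ps in first_pos.items() if len(ps) >= 3}
p2 = {w for w, ps in second_pos.items() if len(ps) >= 3}
print(sorted(p1), sorted(p2))  # ['see', 'there', 'want'] ['do']
```

Note that each pivot is recovered together with its own position, which anticipates the point below that the ordering is specific to the pivot element rather than to a general rule.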
The second property of the pivot/open distinction noticed by Braine was that pivot and open are positional classes, occurring in a specified position with respect to each other, though the positional ordering was specific to the pivot element itself (P₁ Open, Open P₂, etc.) and hence not to be captured by a general phrase structure rewrite rule: S → Pivot Open. This latter fact was used by critical studies of the time (Fodor, Bever, and Garrett 1974, for example) to argue that Braine's distinction was somehow incoherent, since the one means of capturing such a distinction, phrase structure rules, required a general collapse across elements in the pivot class which was simply not available in the data. The third property of the pivot/open distinction was that the open class elements were generally optional, while the pivot elements were not.

1.1.2 The Government Relation
What is interesting from the perspective of current theory is just how closely Braine managed to isolate analogs not of the phrase structure rule descriptions popular at that time, but of the central relational primitives of the current theory. Thus the relation of pivot to open classes may be thought of as that between governor and governed element, or perhaps more generally that of head to complement; something like a primitive predication or small clause structure (in the extended sense of Kayne 1984) appears to be in evidence in these early structures as well:

(2)
Steven's word utterances:
     it ball, it bang, it checker, it X, etc.
     that box, that Dennis, that doll, that Tommy, that truck
     there ball, there book, there doggie, there X, etc.
     here bed, here checker, here doll, here X, etc.

Andrew's word combinations:
     boot off, light off, pants off, shirt off, shoe off, water off
     clock on there, up on there, hot in there, X in/on there, etc.
     airplane all gone, Calico all gone, Calico all done, salt all shut
     all done milk, all done now
     all gone juice, all gone outside, all gone pacifier

Gregory's word combinations:
     byebye plane, byebye man, byebye hot
     allgone shoe, allgone vitamins, allgone egg, allgone lettuce, allgone watch
     etc.
The third property that Braine notes, the optionality of the open constituent with respect to the pivot, may also be regularized to current theory: it is simply the idea that heads are generally obligatory while complements are not.

The idea that the child, very early on, is trying to determine the general properties of the government relation in the language (remaining neutral for now on whether this is Case or theta government) is supported by two other facts as well: the presence of what Braine calls "groping patterns" in the early data, and the presence of what he calls formulas of limited scope. The former can be seen in the "allgone" constructions in Andrew's speech. The latter refers simply to the fact that in the very early two-word grammars, the set of relations possible between the two words appears limited in terms of the semantic relations which hold between them. This may be thought of as showing that the initial government relation is learned with respect to specific lexical items, or cognitively specified subclasses, and is then collapsed across them. See also later discussion.

The presence of "groping patterns", i.e. the presence, in two-word utterances, of patterns in which the order of elements is not fixed for lexically specific elements, corresponds to the original experimentation in determining the directionality of government (Chomsky 1981, Stowell 1981). The presence of groping patterns is problematic for any theory of grammar which gives a prominent role to phrase structure rules in early speech, since the order of elements must be fixed for all elements in a class. See, e.g., the discussion in Pinker (1984), which attempts, unsuccessfully I believe, to naturalize this set of data. To the extent that phrase structure order is considered a derivative notion, and the government-of relation the primitive one, the presence of lexically specific order differences is not particularly problematic, as long as the
directionality of government is assumed to be determined at first on a word-by-word basis.
1.2 The Open/Closed Class Distinction

Braine's prescient analysis was attacked in the psychological literature on both empirical and especially theoretical grounds; it was ignored in the linguistic literature. The basis of the theoretical attack was that the pivot/open distinction, being lexically specific with respect to distribution, could not be accommodated in a general theory of phrase structure rules (as already mentioned above); moreover, the particular form of the theory adopted by Braine posited a radical discontinuity in the form of the grammar as it changed from a pivot/open grammar to a standard Aspects-style PS grammar. This latter charge we may partly defuse by noting that there is no need to suppose a radical discontinuity in the form of the grammar as it changes over time: the pivot/open grammar is simply contained as a subgrammar in all the later stages. However, we wish to remain neutral, for now, on the general issue of whether such radical discontinuities are possible.

The proponents of such a view, especially the holders of the view that the original grammar was essentially "semantic" (i.e. thematically organized), held the view in either a more or a less radical form. The more extreme advocates (Schlesinger 1971) held not simply that there was a radical discontinuity, but that the primitives of the later stages — syntactic primitives like case and syntactic categories like noun or noun phrase — were constructed out of the primitives of the earlier stages: a position one may emphatically reject. Other theoreticians, however, in particular Melissa Bowerman (Bowerman 1973, 1974), held that there was such a discontinuity, but without supposing any construction of the primitives of the later stages from those of the earlier. We return, in detail, to this possibility below.
More generally, however, the charge that the pivot/open class stage presents a problem for grammatical description appears to dissolve once the government-of relation is taken to be the primitive, rather than the learning of a collection of (internally coherent) phrase structure rules. However, more still needs to be said about Braine's data. For it is not simply the case that a rudimentary government relation is being established, but that this is overlaid, in a mysterious way, with the open/closed class distinction. Thus it is not simply that the child is determining the government-of and predicate-of relations in his or her language, but also that the class of governing
elements is, in some peculiar way, associated with a distributional class: namely, that of closed class elements. While the central place of the government-of relation in current theory gives us insight into one half of Braine's data, the role of the closed class/open class distinction, though absolutely pervasive both in Braine's work and in all the psycholinguistic literature (see Garrett 1975, Shattuck-Hufnagel 1974, Bradley 1979, for a small sample), has remained totally untouched. Indeed, even the semantic literature, which has in general paid much more attention to the specifier relation than transformational-generative linguistics, does not appear to have anything to say that would account for the acquisition facts. What could we say about the initial overlay of the closed class elements and the set of governors? The minimal assumption would be something like this:

(3) The set of canonical governors is closed class.
While this is an interesting possibility, it would involve, for example, including prepositions and auxiliary verbs in the class of canonical governors, but not main verbs. Suppose that we strengthen (3), nonetheless:

(4) Only closed class elements may govern.
What about verbs? Interestingly, a solution already exists in the literature: in fact, two of them. Stowell (1981) suggests that it is not the verb per se which governs its complements, but rather the theta grid associated with it. Thus the complements are theta-governed under coindexing with positions in the theta grid. And while the class of verbs in a language is clearly open class and potentially infinite, the class of theta grids is equally clearly finite: each is a member of a closed, finite set of elements.

Along the same lines, Koopman (1984) makes the interesting, though at first glance odd, suggestion that it is not the verb which Case-governs its complements, but the Case-assigning features associated with the verb. She does this in the context of a discussion of Stowell's Case adjacency requirement for Case assignment: a proposal which appears to be immediately falsified by the existence of Dutch, a language in which the verb is VP-final, but the accusative-marked object is at the left periphery of the VP. Koopman saves Stowell's proposal by supposing that the Case-assigning features of the verb are at the left periphery, though the verb itself is at the right. This idea that the two aspects of the verb are separable in this fashion will be returned to, and supported, below. What is crucial for present purposes is simply to note that the Case-governing properties of the verb are themselves closed class, though the set of verbs is not. Thus both the Case-assigning and theta-assigning properties of the
A RE-DEFINITION OF THE PROBLEM
verb are closed class, and we may assume that it is these, rather than some property of the open class itself, that enter into the government relation.
There is a second possibility, which is less theory-dependent. This is simply that, as has often been noted, there is within the “open” part of the vocabulary of a language a subset which is potentially closed: the so-called basic vocabulary of the language, used in the teaching of Basic English and of other languages. The verb say, along with its translations, would presumably be part of this closed subset, but not the verb mutter. The child’s task may be viewed as centering on the closed class elements in the less abstract sense of lexical items, if these are included in the set.

1.2.1 Finiteness
While the syntactic conjecture that it is the Case features on the verb which govern its object has often enough been made, the theoretical potential of such a proposal has not been realized. In essence, this proposal reduces a property of an open class of elements, namely verbs, to a property of a closed class of elements (the Case features on verbs). Insofar as direction of government is treated as a parameter of variation across languages, reducing government directionality to a property of a closed class set joins the two sorts of finiteness, lexical and syntactic, together. The finiteness of syntactic variation (Chomsky 1981) is tied, in the closest possible way, to the necessary finiteness of a lexical class (and the specifications associated with it). Let us take another example. English allows wh-movement in the syntax; Chinese, apparently, apportions it to LF (Huang 1982). This is a parametric difference in the level of derivation at which a particular operation applies. However, this may well be reducible to a parametric difference in a closed class element. Let us suppose, following Chomsky (1986), that wh-movement is movement into the specifier position of C′. Ordinarily it is assumed that lexical selection (by the complement-taking verb) is of the head. Let us assume likewise — the matrix verb must select for a +/− wh feature in Comp. This, in turn, must regulate the possible appearance of a wh-word in the specifier position of C′. We may assume that some agreement relation holds between these two positions, in direct analogy to the agreement relation which exists generally between specifier and head positions, e.g. with respect to case. Thus the presence of the overt wh-element in Spec C′ is necessary to agree with, or saturate, the +wh feature which is base-generated in Comp. What then is the difference between English and Chinese? Just this: the agreeing element in Comp must be satisfied at S-structure in English,
LANGUAGE ACQUISITION AND THE FORM OF THE GRAMMAR
while it need only be satisfied at LF in Chinese. This difference, in turn, may be traced, we might hope, to some intrinsic property of agreement in the two languages.

(5)
I wonder [C′′ who [C′ Comp [I′′ [NP John] [I′ I [VP [V saw] [NP e]]]]]]
If this sketch of an analysis — or something like it — is correct, then the parametric difference between English and Chinese with respect to wh-movement is reduced to a difference in the lexical specification of a closed class element.¹ Since the possible set of universal specifications associated with a closed class set of elements is of necessity finite, the finiteness conjecture of Chomsky (1981) would be vindicated in the strongest possible way: the finiteness of parametric variation would be tied, and perhaps only tied, to the finiteness of a necessarily finite set of lexical elements and the information associated with them.

1.2.2 The Question of Levels
There is a different aspect of this which requires note. The difference between Chinese and English with respect to wh-movement is perhaps associated with features on the closed class morpheme, but this shows up as a difference in the appearance of the structure at a representational level. I believe that this is in
1. I should note that the term closed class element here is being used in a somewhat broader sense than usual, to encompass elements like the +wh feature. The finiteness in the closed class set cannot be that of the actual lexical items themselves, since these may vary from language to language, but in the schema which defines them (e.g. definite determiner, indefinite determiner, Infl, etc.).
general the case: namely, that while information associated with a closed class element is at the root of some aspect of parametric variation, this difference often evidences itself in the grammar by a difference in the representational level at which a particular operation applies. We may put this in the form of a proposal:

(6)
The theory of UG is the theory of the parametric variation in the specifications of closed class elements, filtered through a theory of levels.
I will return throughout this work to more specific ways in which the conjecture in (6) may be fleshed out, but I would like to note at this point two aspects which seem relevant. First is the observation made repeatedly by Chomsky (1981, 1986a) that while the set of possible human languages is (at least conjecturally) finite, those languages appear to have a wide “scatter” in terms of surface features. Why, we might ask, should this be the case? If the conjecture in (6) is correct, it is precisely because of the interaction of the finite set of specifications associated with the closed class elements with the rather huge surface differences which follow from having different operations apply at different levels. The information associated with the former would determine the latter; the latter would give rise to the apparently huge differences in the description of the world’s languages, but would itself be tied to parametric variation in a small, necessarily finite set. How does language acquisition proceed under these circumstances? Briefly, it must proceed in two ways: by determining the properties of the lexical specifications associated with the closed class set, the child determines the structure of the levels; by determining the structure of the levels, he or she determines the properties of the closed class morphemes. The proposal that the discovery of properties associated with closed class lexical items is central obviously owes a great deal to Borer’s (1985) lexical learning hypothesis: that what the child learns, and all that he/she learns, is associated with properties of lexical elements. It constitutes, in fact, a (fairly radical) strengthening of that proposal, in the direction of finiteness. Thus while the original lexical learning hypothesis would not guarantee finiteness in parametric variation, the version adopted in (6) would, and it may thus be viewed as providing a particular sort of grounding for Chomsky’s finiteness claim.
However, the proposal in (6) contains an additional claim as well: that a difference in the specifications of closed class elements cashes out as a difference in the level at which various operations apply. It thus provides an outline of the way that the gross scatter of languages may be associated with a finite range.
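To make the intended division of labor concrete, the English/Chinese wh-contrast discussed above can be rendered as a toy computational sketch. This is purely illustrative and not part of the proposal itself; all names (CLOSED_CLASS_SPECS, front_wh, derive, the "wh:" prefix) are invented for the example.

```python
# Toy sketch of proposal (6): parametric variation is located in the
# specifications of closed class elements, and surfaces as a difference
# in the level at which an operation applies. All names are invented.

# Each "language" differs only in one closed class specification: the
# level at which the +wh feature in Comp must be satisfied.
CLOSED_CLASS_SPECS = {
    "English": {"wh_satisfied_at": "S-structure"},
    "Chinese": {"wh_satisfied_at": "LF"},
}

def front_wh(words):
    """Move any wh-word (marked here with a 'wh:' prefix) to the front."""
    wh = [w for w in words if w.startswith("wh:")]
    rest = [w for w in words if not w.startswith("wh:")]
    return wh + rest

def derive(language, base):
    """Return (S-structure, LF) for a base order with the wh-word in situ."""
    level = CLOSED_CLASS_SPECS[language]["wh_satisfied_at"]
    # Overt (S-structure) fronting applies only if the closed class
    # feature demands satisfaction there; fronting at LF applies either way.
    s_structure = front_wh(base) if level == "S-structure" else list(base)
    lf = front_wh(base)
    return s_structure, lf

base = ["John", "saw", "wh:who"]
print(derive("English", base))  # wh-word fronted already at S-structure
print(derive("Chinese", base))  # wh-word in situ at S-structure, fronted at LF
```

The large apparent surface difference (overt fronting versus wh-in-situ) reduces to a single closed class specification, which is the point of (6).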
1.3 Triggers

1.3.1 A Constraint
The theory of parametric variation or grammatical determination has often been linked with a different theory: that of triggers (Roeper 1978b, 1982, Roeper and Williams 1986). A trigger may be thought of, in the most general case, as a piece of information in the surface string which allows the child to determine some aspect of grammatical realization. The idea is an attractive one, in that it suggests a direct connection between a piece of surface data and the underlying projected grammar; it is also in danger, if left further undefined, of becoming nearly vacuous as a means of grammatical description. “Trigger”, as the term is commonly used, may apply to virtually any property of the surface string which allows the child to make some determination about his or her grammar. There is, as is usual in linguistic theory, a way to make an idea more theoretically valuable: by constraining it. The constraint may turn out to be right or wrong, but it should, in either case, sharpen the theoretical issues involved. In line with the discussion earlier in the chapter, let us limit the content of “trigger” in the following way:

(7)
A trigger is a determination of a property of a closed class element.
Given the previous discussion, the differences in the “look” of the output grammar may be large, given that a trigger has been set. The trigger-setting itself, however, is aligned with the setting of the specification of a closed class element. There are a number of instances of “triggers” in the input which must be reexamined given (7) above. There are, however, at least two very good instances of triggers in the above sense which have been proposed in the literature. The first is Hyams’ (1985, 1986, 1987) analysis of the early dropping of subjects in English. Hyams suggests that children start off with a grammar which is essentially pro-drop, and that English-speaking children then move to an English-type grammar, which is not. These correspond to developmental stages in which children initially allow subjects to drop, filter out auxiliaries, and so on (the first step), and then no longer do so (the second step). The means by which children pass from the first grammar to the second, Hyams suggests, is the detection of expletives in the input. Such elements are generally assumed not to exist in pro-drop languages; the presence of such elements would thus allow the child to determine the type of the language that he or she was facing.
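Hyams’ expletive trigger can be sketched in the same toy style: the initial (pro-drop) setting of a closed class element is revised upon detection of an expletive in the input. This is a sketch under strong simplifying assumptions, not Hyams’ own formalism; the function and feature names are invented.

```python
# Toy sketch of a trigger in the sense of (7): detecting an expletive
# subject in the input sets a property of a closed class element (here,
# a pro-drop licensing feature on Infl). Names are invented.

EXPLETIVES = {"there", "it-expletive"}

def set_pro_drop(input_sentences):
    """Start from the initial pro-drop setting; switch it off as soon as
    an expletive subject is detected (expletives being assumed absent
    from genuine pro-drop languages)."""
    grammar = {"Infl": {"pro_drop": True}}
    for sentence in input_sentences:
        if sentence and sentence[0] in EXPLETIVES:
            grammar["Infl"]["pro_drop"] = False
            break
    return grammar

# An English-type input contains expletive subjects; the trigger fires.
print(set_pro_drop([["John", "left"], ["there", "is", "a", "problem"]]))
```
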
1.3.2 Determining the Base Order of German
The other example of a trigger, in the sense of (7) above, is found in Roeper’s (1978b) analysis of German. While German sentences are underlyingly verb-final (see Bierwisch 1963, Bach 1962, Koster 1975, and many others), the verb may show up in either second or final position.

(8)
a. Ich sah ihn.
   I saw him.
b. Ich glaube dass ich ihn gesehen habe.
   I believe that I him seen have.
Roeper’s empirical data suggest that the child analyses German as verb-final at a very early stage. However, this leaves the acquisition question open: how does the child know that German is verb-final? Roeper proposes two possible answers:

(9)
i. Children pay attention to the word order in embedded, not matrix, clauses.
ii. Children isolate the deep structure position of the verb by reference to the placement of the word “not”, which is always at the end of the sentence.
At first, it appears that solution (i) is far preferable. It is much more general, for one thing, and it also allows a natural tie-in with theory — namely, Emonds’ (1975) conception that various transformations apply in root clauses which are barred from applying in embedded contexts. However, work by Safir (1982) suggests that Emonds’ generalization follows from other principles, in particular that of government, and even if Safir’s particular proposal were not correct, it would certainly be expected, in the context of current theory, that the difference between root and embedded clauses would not be stated as part of the primitive basis, but would follow from more primitive specifications. A different line of deduction, not available to Roeper in 1974, appears to be more promising: namely, for the child to deduce the DS position of the verb from a property of the government relation. Given Case Adjacency (Stowell 1981), given a theory which states that Case assignment applies prior to verb movement, and given the assumption that the accusative-marked element has not moved (all these assumptions are necessary), the presence of the accusative-marked object, with other material preceding it in the VP, would act as a legitimate “marker” for the presence of the DS verb following it:
(10)
a. Ich habe dem Mann das Buch gegeben.
   I have (to) the man the book given.
b. Ich gebe dem Mann das Buch t.
   I give the man the book.
This is one way out of the problem. However, certain of these assumptions appear questionable, or at least not easily determinable by the child on the basis of surface evidence. For example, the accusative will appear adjacent to the phrase-final verb if both objects in the double object construction are definite full NPs, but if the direct object is a pronoun, the order of the two complements obligatorily reverses (Thiersch 1978).

(11)
a. *Ich hatte dem Mann es gegeben.
    I had the man it given
b. Ich hatte es dem Mann gegeben.
   I had it the man given
Assuming that accusative is a Case assigned by the verb but that dative is not, the interposition of the dative object between the verb and the direct object would create learnability problems for the child; in particular, the presence of the accusative object would not be an invariable marker of the presence (at DS) of the verb beside it. Of course, additional properties or rules (e.g. with respect to the possibility of cliticization) may be added to account for the adult data, but this would complicate the learnability problem, in the sense that the child would already have to have access to this information prior to his/her positing of the verb-final base. A second and equally serious difficulty with using the presence of the accusative object as a sure marker of the presence (at DS) of the verb (under Case Adjacency) is simply that in quite closely related languages, e.g. Dutch, such a strict adjacency requirement does not seem to hold. Thus in Dutch, the accusative object appears at the beginning of the verb phrase, while the verb is phrase-final.

(12)
a. Jan plant bomen in de tuin.
   John plants trees in the garden.
b. Jan be-plant de tuin met bomen.
   John plants the garden with trees.
Of course, it is possible to take account of this theoretically, along the lines that Koopman (1984) suggests, where the Case-assigning features are split off from the verb itself. But the degree of freedom necessitated in this proposal, while quite possible from the point of view of a synchronic description of the grammar,
makes it unattractive as a learnability trigger (in the sense of (7) above). In particular, the abstract Case-assigning features, now separated from the verb, could no longer allow the presence of an accusative-marked object to be the invariable marker of the verb itself, and thus allow the child to determine the deep structure position of the verb within the VP. While not unambiguously rejecting the possibility that the presence of accusative Case may act as the marker for the verb in conjunction with other factors for the child (since, in current theory, it is the interaction of theories like Case and Theta theory which allows the properties of a construction to be determined: why should it be any different for the child?), let us turn to the third option for learnability, the second option outlined by Roeper (1974). This is that the position of Neg, or the negative-like element, marks the placement of the verb for the child. Not following (yet) from any general principle, this may appear to be the most unpromising proposal of the lot. Let us first, however, suitably generalize it:

(13)
The child locates the position of the head by locating the position of the closed class specifier of the head; the latter acts as a marker for the presence of the former.
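The marker idea in (13) can be given a minimal computational sketch for the German case: treating the closed class element nicht as a fixed point, the child can place the moved verb-second verb back into its DS slot. This is a deliberately crude illustration (only simple matrix clauses; the function names are invented), not a claim about the actual learning procedure.

```python
# Toy sketch of (13): the closed class specifier ("nicht") acts as a
# fixed point marking the DS position of the moved verb-second verb.
# Handles only a simple matrix clause; names are invented.

NEG = "nicht"

def reconstruct_ds(surface):
    """Given a verb-second surface string containing 'nicht', return the
    verb-final DS order by moving the verb back after the fixed Neg."""
    subject, verb, *rest = surface       # verb-second: verb in slot 2
    i = rest.index(NEG)                  # Neg is the fixed point
    return [subject] + rest[:i + 1] + [verb] + rest[i + 1:]

# "Ich sah ihn nicht" → DS order "ich ihn nicht sah" (verb-final)
print(reconstruct_ds(["ich", "sah", "ihn", "nicht"]))
```
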
If we assume that Neg or not is in the specifier of V′ or V″, the generalization in (13) is appropriate. Does (13) hold? Before turning to specifically linguistic questions, it should be noted that there does exist a body of evidence in the psycholinguistic literature, dating from the mid-sixties, which bears on this question (Braine 1965, Morgan, Meier, and Newport 1987). This is the learning of artificial languages by adults. Such languages may be constructed to have differing properties and, in particular, to either have or not have a closed class subset in them. Such languages are “learned”, not by direct tuition, but rather by direct exposure to a large number of instances of well-formed sentences in the language, large enough so that the subject cannot have recourse to nonlinguistic, general problem-solving techniques. What is interesting about this line of research is that the closed class morphemes seem to play an absolutely crucial role in the learnability of the language (Braine 1965, Morgan, Meier, and Newport 1987). In particular, with such morphemes the grammar of the language is fairly easily deducible, but without them the deduction is very much more difficult. Certain questions relevant to the issue at hand have not been dealt with in this literature — for example, in a language in which the surface string may have the head in one of two places, how is this placement determined? — but the general upshot of this line of research seems clear: such elements are crucial in the language’s acquisition. Of course, it must still be noted that this is language-learning by
adults, not children, but the fact that the learning occurs under conditions of input-flooding, rather than direct tuition, makes it correspond much more closely to the original conditions under which learning takes place. Let us return to (13). The claim in (13) is that the closed class specifiers associated with a head are better markers of that head’s DS position than the head itself. Given that the child must in general determine whether movement has taken place, it is necessary that there be some principle, gatherable from the surface itself, which so determines it. With respect to direct complements of a head, we may assume that the detection of movement (i.e. the construction of the DS and SS representations from the surface) is done with respect to characteristics of the head: for example, that the head is missing one of its obligatorily specified complements. What if the head itself is moved? In this case, any of three sorts of information would suffice: the stranding of a complement (one not governed on the surface), the stranding of a specifier, or, if the head is itself subcategorized by another head, the fact that this higher head is left without a complement. The last would be the case, in English, in instances where a full clause was fronted. With respect to the movement of the verbal head in German, the first possibility, that the DS position of the verb is determined with respect to an obligatorily present complement, corresponds to the proposal that the DS position of the verb is detected by reference to the accusative-marked object. The second possibility, that the stranding of the specifier marks the DS position of the head, is essentially Roeper’s proposal with respect to the placement of Neg. What if the specifier itself is moved? This could be detected by the placement of the head, if the head itself were assumed to be rigid in such a configuration.
A grave difficulty would arise, however, if both the specifier and the head-complement complex were moved from a single DS position, since the DS position would not be determinable.
(14) [XP Spec [X′ X YP]]
We are left with the following logical space of possibilities. [Recall that at the time this work was written, the specifier was viewed differently than it is now. It constituted the set of closed class elements associated
with an open class head, among other things. For example, the was considered the specifier in the picture of John, and an auxiliary verb like was was considered the specifier of the verb phrase in a verb phrase like was talking to Mary. Thus the and was would be considered specifiers, rather than independently projecting heads. The following discussion only makes sense with this notion of specifier in mind. D. L.]

(15)
Moved element    Detected by
complement       head
head             complement/specifier/subcategorizing head
specifier        head/subcategorizing head (?)
By “subcategorizing head” I mean the head which, if it exists, selects for the embedded head and its arguments. The idea that the subcategorizing head may determine the presence of the specifier, as well as the head of the selected phrase, may be related to the proposal, found in Fukui and Speas (1986) as well as in categorial grammar, that, for some purposes at least, the specifier may act as the head of a given element (e.g. of an NP). I return in detail to this possibility below. The chart in (15) gives the logical space in which the problem may be solved, but leaves almost all substantive issues unresolved; more disturbingly, the process of “detection”, as it is faced by the child in (15), does not bear any obvious and direct relation to current linguistic theory. The linking of dislocated categories and their DS positions takes place, in current theory, under two relations: antecedent government and lexical government (Lasnik and Saito 1985, Chomsky 1986, Aoun, Hornstein, Lightfoot and Weinberg 1987). Let us go further and, in the spirit of Aoun, Hornstein, Lightfoot, and Weinberg (1987), associate lexical government with the detection of the existence of the null element (and perhaps its category), while antecedent government determines the properties of that element: both constitute, in the broadest sense, a sort of recoverability condition with respect to the placement of dislocated elements. We might take the detection of existence to take place at a particular level (e.g. PF or the surface), while the detection of properties takes place at another (e.g. LF). It was suggested earlier that, in spite of its theoretical attractiveness, the possibility that the child detects the DS position of the verb via the position of the accusative-marked object and Case Adjacency seemed unlikely, as too difficult an empirical problem (given the possible splitting off of Case-assigning features from the verb, etc.).
Let us suppose that this difficulty is principled, in the sense that in the child grammar, as in the adult grammar, the movement of
heads is never “detected” by the governed element of the head, but rather by the governor. Thus the child, even though he is constructing the grammar, is using the same principles as the adult. This radically reduces the logical space of (15) to that in (16):²

(16)
Type of element moved    Detected by
complement               governing head
head                     governing head
specifier                ?/governing head
The question mark next to the specifier in (16) is partly because of an embarrassment of riches — it could be either the higher head or the category head itself which governs the specifier — and partly because of a lack: it is not clear that either governs the specifier in the same way that a complement (e.g. a subcategorized NP) is governed by a head. One thing seems quite clear: closed class specifiers are fixed, with respect to certain movement operations, in a way that other elements are not. There is, for example, no way to directly question the cardinality of the specifier in English without moving the entire NP along with it:

(17) *A did you see (e man)?

Nor, as Chierchia (1984) points out, is there a way to directly question prepositions, suggesting a similar constraint:

(18) *To did you talk (e the man)?
     (“Was it to that you talked to the man?”)

While prepositions are not normally thought of as specifiers of NP, but rather as heading their own X′ system (see Jackendoff 1977), there is a strong and persistent undercurrent that certain PPs, even in English, are simply a spell-out of Case-marking and perhaps theta-marking features, something which is arguably part of the specifier and which later gets spelled out onto the head. If this is the case, then the data in (17) and (18), which seem to fall together in terms of pre-theoretical intuitions, may ultimately be collapsed. But why should (17) and (18) be so bad? The chart in (16), which suggests essentially that all elements are detected (i.e. lexically governed, from the point of view of recovery) by their governor, gives no clue. While one might argue that
2. Recall again that the notion of specifier used is that current in 1988 [D.L. in 1999]. It includes elements like the determiner the in the picture of Mary, and was in was talking to John: closed class items specifying the head.
there is some sort of constraint such that attempting to extract a head necessarily drags other material along with it, and that this accounts for the ungrammaticality of (18), there is no way to extend such a constraint to (17) under normal assumptions about the headedness of the phrase. However, even in its own terms such a constraint is dubious, since in, e.g., German and Dutch there is verb movement without the complement of the verb being moved as well. Chierchia himself suggests that the ungrammaticality of sentences like (17) and (18) is due to a (deep) property of the type system: namely, that the system is strictly second-order (with a nominalization operator), and that no variable categories exist of a high enough type to correspond to the traces of the determiner and preposition. While Chierchia’s solution is coherent, and indeed exciting, from the point of view of the theory that he advocates, there are obvious problems in transposing the solution to any current version of GB. Indeed, even if the constraint did follow from some deep semantic property of the system, we would still be licensed in asking whether there is some constraint in the purely syntactic system which corresponds to it. To the extent that constraints are placed over purely syntactic forms, as well as (perhaps) the semantic system corresponding to them, we arrive at a position of an autonomous syntax which, while perhaps constructed over a semantic base, retains its own set of properties distinct from the conceptual core on which it was formed. For discussion, from quite different points of view, see Chomsky (1981), where it is argued that the core relation of government is “grammaticalized” in a way which might not be determinable from its conceptual content alone, and that this sort of formal extension is a deep property of human language; see also Pinker (1984), where the notion of semantic bootstrapping plays a similar role.
Returning to the problem posed by the ungrammaticality of (17) and (18), we would wish to propose a syntactic constraint which would bar the unwanted sentences, and at the same time help the acquisition system to operate: (19)
Fixed specifier constraint: The closed class specifier of a category is fixed (may not move independently from the category itself).
It is clear why something like (19) would be beneficial from the point of view of the acquisition system. The problem for that system, under conditions of extensive movement, is that there is no set of fixed points from which to determine the D-structure. Of course, a trace-enriched S-structure would be sufficient, but the child himself is given no such structure, only the surface. The fixed specifier constraint suggests that there is a set of fixed points, loci from
which the structure of the string can be determined. Further, this seems to be supported by the grammatical evidence of lack of extractability. The alternative is that the D-structure, and the fact that movement has taken place, is determinable from a complex set of principles to which the child has reference (Case theory, theta theory, etc.), but without any particular set of elements picked out as the fixed points around which others may move. This possibility cannot be rejected out of hand, but a system which contained (19) instead would quite clearly be simpler. (19) itself is a generalization of Roeper’s initial proposal that it is the element Neg which plays a crucial role; we will see later on that there is overt evidence for the role of this element in the acquisition of English. The learnability point notwithstanding, it may be asked whether a strong constraint such as (19) can empirically be upheld. The closed class specifiers of NP clearly do not move in English, and it is curiously supportive as well that, if we do think of prepositions as in some sense reanalyzed as the specifier of the NP that they precede, particle movement applies just in case the preposition does not have an NP associated with it (i.e. cannot be reanalyzed as a specifier). There do, however, seem to be instances in which specifiers move at the sentential level: not may apparently move in Neg-raising contexts (20a), and do fronts in questions, sometimes in conjunction with a cliticized not (20b).

(20)
a. I don’t believe that he is coming.
   (= I believe that he isn’t coming.)
b. Didn’t he leave already?
Thus, while it is in general the case that complements are more mobile than heads, and heads are more mobile than specifiers, it is by no means clear that specifiers form the “grid” necessary to determine the basic underlying structure of the language for the child.

1.3.2.1 The Movement of NEG (Syntax)

The syntactic problem posed by (20) for the general idea that specifiers constitute a fixed grid from which the child posits syntactic order is a difficult one, but perhaps not insuperable. The status of the Neg-raising case is in any case unclear, e.g. as to whether movement has taken place at all. The problem posed by (20b) is more difficult. Given that movement has taken place, examples such as (20b) would seem to provide a straightforward violation of the fixed specifier constraint (and thus leave the learnability problem untouched). However, the example given involves a movement operation which affects both Neg and the auxiliary verb; examples such as (21) are ungrammatical.

(21) *Not he saw Mary.
That is, the movement operation does not move Neg per se, but rather the category under which it is adjoined. If we consider this category itself to be Infl, which is not a specifier category but rather the head of I′, then it is not the case that the closed class specifier itself has been moved to the front of the sentence, but rather I, a head constituent. The fixed specifier constraint is therefore not violated by Subject/Aux inversion.

(22)
[S [NP he] [I′ [Infl [Infl is] [Neg not]] [VP going]]]
Derivation: (He ((is not) (going))) → (Isn’t he ((e) (going)))

(23)
Movement types.
a. Move-α: potentially unbounded; applies to major categories, maximal projections.
b. Hop-α: string-adjacent; applies to minor categories, closed class elements, minimal projections.
The set of properties listed under the movement types is intended as a pre-theoretical characterization only, with the formal status of this division to be determined in detail. We might include other differing properties as well: e.g., perhaps Hop-α, but not Move-α, is restricted to particular syntactic levels. Further, the exact empirical extension of Hop-α is left undetermined. In the original account (Chomsky 1955/1975, 1957), Hop-α was restricted (though not in principle) to the movement of affixes, i.e. closed class selected morphemes, onto the governed element, in particular the governed verb. In Fiengo’s interesting extension, Hop-α may be applied to other string-adjacent operations involving closed class elements.³ Assuming a division of movement types such as that given in (23), the Neg movement operation adjoining not to Infl may be considered a movement
3. For a somewhat different view of Hop-α, see Chomsky (1986b).
operation of a particular type: namely, an instance of Hop-α, not Move-α. As such, the fixed specifier constraint may still be retained, but modified in the following way:

(24)
Fixed specifier constraint (modified form): A closed class specifier may not be moved by Move-α.
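Purely as an illustration (the feature bundles, flag values, and operation labels below are toy assumptions of mine, not part of the theory's formalism), the modified constraint in (24) can be stated as a simple predicate that bars Move-α, but not Hop-α, from affecting a closed class specifier:

```python
# Toy rendering of the modified fixed specifier constraint (24):
# a closed class specifier may not be moved by Move-alpha,
# though Hop-alpha (string adjacent, closed class) is exempt.

def violates_fixed_spec_constraint(operation, element):
    """operation: 'move-alpha' or 'hop-alpha'.
    element: dict with 'closed_class' and 'is_specifier' flags."""
    return (operation == "move-alpha"
            and element["closed_class"]
            and element["is_specifier"])

neg = {"name": "not", "closed_class": True, "is_specifier": True}
infl = {"name": "Infl", "closed_class": True, "is_specifier": False}

# (21) *Not he saw Mary: fronting Neg itself by Move-alpha is barred.
print(violates_fixed_spec_constraint("move-alpha", neg))   # True
# Subject/Aux inversion in (22) moves the head Infl (with Neg adjoined).
print(violates_fixed_spec_constraint("move-alpha", infl))  # False
# Hopping Neg onto Infl is an instance of Hop-alpha, so it is licit.
print(violates_fixed_spec_constraint("hop-alpha", neg))    # False
```

On this toy encoding, Subject/Aux inversion and Neg-hopping both go through, while only the direct fronting of the specifier not is excluded — exactly the division the modified constraint is meant to draw.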
1.3.2.2 The Placement of NEG (Acquisition)
While the revision of the fixed specifier constraint in (24) allows the syntactic system to retain a set of nearly fixed points, and thus simplify the acquisition problem at a theoretical level, a very interesting body of literature on the acquisition sequence appears to directly undermine the claim that this information is in fact used by the child. This is the set of papers by Ursula Bellugi and Edward Klima from the mid-1960s (Klima and Bellugi 1966), recently re-investigated by Klein (1982), in which it appears that Neg is initially analyzed by the child as a sentential, not a VP, operator, and hence appears prior to the modified clause. Of course, if the child himself allows negation to move in his grammar, it can hardly be the case that he is using it as a fixed point from which to determine the placement of other elements. Bellugi and Klima distinguish three major stages in the acquisition of negation.

(25) a. Stage 1: Negation appears in pre-sentential position.
     b. Stage 2: Negation appears contracted with an auxiliary, at a stage prior to that at which the auxiliary appears alone.
     c. Stage 3: Negation is used correctly.
Bellugi and Klima suggest that in the intermediate stage the auxiliary does not have the same status that it has in the adult grammar, since it appears only in case negation also occurs contracted on it. They suggest, rather, that the negation and the auxiliary themselves form a constituent headed by Neg: (NEG can (NEG not)). Thus the fact that the negated auxiliary appears prior to any occurrences of the non-negated auxiliary is accounted for by supposing that no independent auxiliary node exists as such; the initial negative auxiliary is a projection of Neg. The data corresponding to Stages 1–3 above is given in (26).

(26) a. Stage 1: no see Mommy
              no I go
              no Bobby bam-bam
              etc.
     b. Stage 2: I no leave
              I no put dress
              me can't go table
              etc.
     c. Stage 3: comparable to adult utterances
The problematic utterances from the current point of view are given in (26a). Given the assumption that the structure of these utterances is that in (27), the child appears to be lowering the negation into the VP in the transition between Stage 1 and Stage 2. This, in turn, is problematic for any view on which such elements are fixed for the child.

(27) Klima and Bellugi (1966) analysis of Stage 1 negation:

                S
             /     \
           Neg      S
            |     /   \
           no   NP     VP
                |       |
               me   like spinach
The analysis given in (27), however, is the sentential analysis of Bellugi and Klima. Recently, a new analysis has been given for the basic structure of S (Kitagawa 1986; Sportiche 1988; Fukui and Speas 1986). Kitagawa, Sportiche, and Fukui and Speas argue, on the basis of data from Italian and Japanese, that the basic D-structure of S has the subject internal to the VP, though outside the V′. The D-structure of (28a) is therefore given in (28b).

(28) a. John saw Mary.
     b.          S
              /     \
            NP       I′
            |      /    \
            e     I      VP
                       /    \
                     NP      V′
                     |        |
                   John   saw Mary
The internal-to-VP analysis allows theta assignment to take place internal to the maximal projection VP; it also provides for a variable position in the VP, since a predication-type structure results: NPi (VP ei (saw Mary)) (predication-type analysis). Kitagawa also argues that certain differences in the analysis of English and Japanese follow from this analysis. En route to S-structure, the DS subject is moved out of the VP to the subject position. Note that such a movement is necessary if the subject is to be assigned Case by Infl. A Kitagawa/Sportiche/Fukui-Speas type analysis of the basic structure of S receives striking confirmation from the acquisition data, if we assume that negation is fixed throughout the acquisition sequence, and that, throughout Stage 1 speech, it is direct theta role assignment, rather than assignment of (abstract) Case, which regulates the appearance of arguments. That is, in Stage 1 speech the negation is, as expected, adjoined off of VP as the Spec of VP. However, the subject is internal to the VP: that is, in its D-structure position. The relevant structure is then that in (29).

(29)         VP
          /      \
       Spec       VP
         |      /    \
        no    NP      V′
              |      /   \
             me     V     NP
                    |      |
                   go    mommy
In this structure, if we assume that theta role assignment to the subject is via the V′, and further, that abstract Case has not yet entered the system (see later discussion), then the resultant structure is precisely what would be expected, given the fixity of the specifier and the lack of subject raising. The apparent sentential scope is actually VP scope, with a VP-internal subject. At the point at which abstract Case does enter the system, the subject must be external, and appears, moved, prior to negation: Stage 2. I will return to this analysis, and to a fuller analysis of the role of Case and theta assignment in early grammars in later chapters. For the moment, we may simply note two properties of the above analysis: first, that the (very) early system is regulated purely by theta assignment, rather than the assignment of abstract Case. This is close to the traditional analysis in much of the psycholinguistic literature (e.g. Bowerman 1973) that the early grammar is “semantic”,
i.e. thematic. The second property of this analysis lies in the relation of syntactic levels, in the adult grammar, to stages in the acquisition sequence. Namely, there is a very simple relation: the two stages in the acquisition sequence correspond to two adjacent levels of representation in the synchronic analysis of the adult grammar. That is, the “geological” pattern of surface forms L1 → L2 across adjacent grammars in the child’s acquisition sequence corresponds to adjacent levels of representation in the adult grammar. This sort of congruence, while natural, is nonetheless striking, and suggests that rather deep properties of the adult grammar may be projected from the acquisition sequence, i.e. from the fact of development.
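To make the Stage 1 → Stage 2 transition concrete, here is a minimal sketch (entirely illustrative: the list-based bracketing and the case_active flag are my assumptions, not the author's formalism) of how the entry of abstract Case into the system forces the VP-internal subject of (29) to raise past negation:

```python
# Toy sketch of the Stage 1 -> Stage 2 analysis: in Stage 1, only theta
# assignment is active and the subject stays in its D-structure,
# VP-internal position below Neg; once abstract Case enters the system,
# the subject must raise past Neg to be Case-marked by Infl.

def surface_order(case_active):
    # Stage 1 structure (29): [VP no [VP me [V' go mommy]]]
    neg, subj, v_bar = "no", "me", ["go", "mommy"]
    if not case_active:
        # Theta assignment inside VP suffices; the subject stays low.
        return [neg, subj] + v_bar          # Stage 1 order
    # Abstract Case forces the subject out of VP, above negation.
    return [subj, neg] + v_bar              # Stage 2 order

print(" ".join(surface_order(case_active=False)))  # no me go mommy
print(" ".join(surface_order(case_active=True)))   # me no go mommy
```

Toggling the single flag reproduces the observed word-order change, mirroring the claim that the two child stages correspond to two adjacent levels of representation in the adult grammar.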
Chapter 2
Project-α, Argument-Linking, and Telegraphic Speech
2.1 Parametric variation in Phrase Structure

In the last chapter, I suggested that the range of parametric variation across languages was tied to the difference in the specifications associated with closed class elements. This strengthened the finiteness claim of Chomsky (1981) by linking the finiteness of variation in possible grammars with another sort of finiteness: that of the closed class elements and their specifications. However, this very small range of possible parametric variation still had to be reconciled with a very different fact: the apparent “scatter” of the world’s languages with respect to their external properties, such that radically different surface types appear to occur. It was suggested that this scatter was due to the interaction of the (finite and localized) theory of parametric variation in lexical items with a different aspect of the theory: that of representational levels. Slightly different parametric settings of the closed class set would give rise to different, perhaps radically different, organizations of grammars. This would include the possibility that different operations might apply at different grammatical levels crosslinguistically, as in the earlier discussion which suggested that the different role that wh-movement plays in English and Chinese (Huang 1982) — a levels difference — should be tied to some property of the agreement relation holding between the fronted wh-element and the +/−wh feature in Comp, and that this, in turn, could be related to the differing status of agreement, a closed class morpheme, in the two languages. If this general sort of approach is correct, it may be supposed that large numbers of differences may be traced back in this way.

2.1.1 Phrase Structure Articulation
One difference cross-linguistically, then, would be traceable to a difference in the level at which a particular operation applied. If the foregoing is correct, then
this should not simply be stated directly, but instead in terms of the varying property of some element of the closed class set. What about differences in phrase structure? In the general sort of framework proposed in Stowell (1981), and further developed by many others, phrase structure rules should not have the status of primitives in the grammar, but should be replaced by, on the one hand, lexical specifications (e.g. in government direction), and, on the other, general licensing conditions. Within a theory of parametric variation, one would therefore expect that languages would differ in these two ways. On the other hand, really radical differences in phrase structure articulation cross-linguistically may be possible, at least if the theory of Hale (1979) is correct. Even if one did not adopt the radical bifurcationist view implicit in the notion of W* languages (see Hale’s appendix), one might still adopt the view that degrees of articulation are possible with respect to the phrase structure of a language. The flatter and less articulated a language is in its phrase structure, the more closely it would approximate a W* language. Of course, the question still arises of how a child would learn these cross-linguistic differences in degree of articulation, particularly if true W* languages existed alongside languages which were not W*, but which exhibited a large degree of scrambling.

2.1.2 Building Phrase Structure (Pinker 1984)
Pinker and Lebeaux (1982) and Pinker (1984) made one sort of proposal to deal precisely with this problem: how might the child learn the full range of phrase structure articulation, in the presence of solely positive evidence. The answer given relied on a few key ideas. First, following Grimshaw (1981), relations between particular sets of primitives were assumed to contain a subset of canonical realizations. The possibility of such realizations was assumed to be directly available to the child, and in fact used by him/her in the labelling of the string. Thus, in the first place, the child has access to a set of cognitively based notions: thing, action, property, and so on. These correspond, in a way that is obviously not one-to-one, to the set of grammatical categories: NP, Verb, Adjective Phrase, and so on. What is the relation, if not one-to-one? According to Grimshaw, the grammatical categories, while ultimately fully formal in character, are nonetheless “centered” in the cognitive categories, so that membership in the latter acts as a marker for membership in the former: a noun phrase is the canonical grammatical category corresponding to the cognitive category “thing”; a verb (or verb phrase) is the canonical grammatical category corresponding to the cognitive category “action”; a clause is the canonical grammatical category corresponding to the cognitive category “event” or “proposition”; and so on. This
assumes an ontology rich enough to make the correct differentiations; see Jackendoff (1983) for a preliminary attempt to construct such an ontology. Crucially, the canonical realizations are implicational, not bidirectional; further, once the formal system is constructed over the cognitive base, it is freed from its reliance on the earlier set of categories. Consider how this would work for the basic labelling of a string. The simple three-word sentence in (1) would have a cognitive category associated with each of its elements.

(1)  thing   act   thing   (cognitive category)
     John    saw   Mary
These would be associated with their canonical structural realizations in phrasal categories.

(2)  NP      V     NP      (canonical grammatical realization)
     thing   act   thing   (cognitive category)
     John    saw   Mary
On the other hand, sentences to which the child was exposed which did not satisfy the canonical correspondences would not be assigned a structure:

(3)  ?        ?          ?                 (cognitive category)
     This situation resembles a morass.
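A minimal sketch of this labelling step (the category inventory below is a truncated stand-in for the full set of canonical correspondences, and the function name is invented for illustration): input satisfying the correspondences is labelled, while non-canonical input, as in (3), is simply left unanalyzed by the learner.

```python
# Illustrative sketch of Grimshaw-style canonical structural realization:
# cognitive categories act as an entry wedge for grammatical labels;
# strings that do not satisfy the correspondences receive no structure.

CANONICAL = {"thing": "NP", "action": "V", "property": "AP"}

def label_string(words, cognitive_tags):
    labels = [CANONICAL.get(tag) for tag in cognitive_tags]
    if None in labels:
        return None  # as in (3): no structure assigned to the string
    return list(zip(words, labels))

# (2): canonical input is labelled NP V NP.
print(label_string(["John", "saw", "Mary"], ["thing", "action", "thing"]))
# (3): "This situation resembles a morass" -- no usable tags, no parse.
print(label_string(["this", "situation", "resembles", "a", "morass"],
                   [None, None, None, None, None]))  # None
```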
A number of questions arise here, as throughout. For example, could the child be fooled by sequences which not only failed to satisfy the canonical correspondences, but positively defied them? Deverbal nominals would be a good example:

(4)  VP      (canonical grammatical realization)
     event   (cognitive category)
     the examination of the patient

In (4), the deverbal nominal recognizably names an event. Given the canonical correspondences, this should be labelled a VP, or some projection of V. But this, in turn, would severely hamper the child in the correct determination of the basic structure of the language. One way around this problem would be to simply note that deverbal nominals are not likely to be common in the input. A more principled solution, I believe, would be to further restrict the base on which the canonical correspondences are drawn. For example, within each language there is not simply a class of
nouns, roughly labelling things, but a distinguished subset of proper names. Of course, if the general theory in Lebeaux (1986) is correct, then derived nominals of the sort in (4) — i.e. nominalized processes or actions — actually are projections of V: namely nominalized V′s or V″s, with -tion acting as a nominalizing affix, though they achieve this category only at LF, after affix raising (see Lebeaux 1986 for such an analysis).

The second property of the analysis, along with the idea that one set of primitives is related to another by being a canonical realization of it in the different system, is that the second set is constructed over the first and is itself autonomous, freed from its reliance on the first in the fully adult system. It is this which allows strings like that given above (e.g. “This situation resembles a morass”), which do not obey the canonical correspondences, to be generated by the grammar. We may imagine a number of possibilities in how the systems may be overlaid: it may be that the original set of primitives, while initially used by the grammatical system in the acquisition phase, is entirely eliminated in the adult grammatical system. This presumably would be the case in the above labelling of elements as “thing”, “action”, etc., which would not be retained in the adult (or older child’s) grammar. On the other hand, certain sets of primitives might be expected to be retained in the adult system. In the framework of Pinker (1984), this would include the set of grammatical relations, which were used to build up the phrase structure.
In Pinker and Lebeaux (1982) and Pinker (1984), the labelled string allowed the basic structure of S to be built over it in the following way: (i) particular elements of the string were thematically labelled in a way retrievable from context (Wexler and Culicover 1980); (ii) particular grammatical functions corresponded to the canonical structural realizations of thematically labelled elements (agent → subject, patient → object, goal → oblique object, etc.); (iii) grammatical relations were realized, according to their Aspects definition, as elements in a phrase marker: subject (NP, S), object (NP, VP), oblique object (NP, PP), and so on; (iv) the definitions in (iii) were relaxed as required, to avoid crossing branches in the PS tree. The proviso in (iv) was intended to provide for languages exhibiting a range of hierarchical structuring. Rather than specifically including each degree of hierarchical structuring as a setting in UG as a substantive universal (i.e. a possible setting for a substantive universal), the highly modular approach of (i)–(iv) allows the interaction of the substantive universal in (iii) and the formal universal in (iv) to introduce the degree of hierarchical relaxation necessary, without specific provision having to be made for a series of differing grammars.
The provisions (i)–(iv) may be put in a “procedural” format:

(5) Building phrase structure:
    a. Thematic labelling:
       (i) Label agent of action: agent
       (ii) Label patient of action: patient
       (iii) Label goal of action: goal
       etc.
    b. Grammatical Functional labelling:
       (i) Label agent: subject
       (ii) Label patient: object
       (iii) Label goal: oblique object
       etc.
    c. Tree-building:
       (i) Let subject be (NP, S)
       (ii) Let object be (NP, VP)
       (iii) Let oblique object be (NP, XP), XP not VP, S
       etc.
    d. Tree-relaxation: If (a)–(c) require crossing branches, eliminate offending nodes as necessary, from the bottom up. Allow default attachment to the next highest node.
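The steps in (5a–c) can be rendered as a toy pipeline (tree relaxation, (5d), is omitted, and the mini-lexicons and list-based tree encoding are illustrative assumptions of mine rather than the author's implementation):

```python
# Toy pipeline for (5a-c): theta labels -> grammatical functions ->
# Aspects-style structural positions, producing a maximal PS tree.

THETA_TO_GF = {"agent": "subject", "patient": "object",
               "goal": "oblique object"}            # (5b)

def build_tree(labelled):
    """labelled: list of (word, theta_role-or-None) pairs;
    the role-less word is taken to be the verb."""
    subject = obj = verb = None
    for word, role in labelled:
        gf = THETA_TO_GF.get(role)   # (5b) grammatical functional labelling
        if gf == "subject":
            subject = word           # (5c-i): subject = (NP, S)
        elif gf == "object":
            obj = word               # (5c-ii): object = (NP, VP)
        elif role is None:
            verb = word
    # (5c): assemble maximal structure S -> NP VP, VP -> V NP
    return ["S", ["NP", subject], ["VP", ["V", verb], ["NP", obj]]]

tree = build_tree([("John", "agent"), ("hit", None), ("Bill", "patient")])
print(tree)  # ['S', ['NP', 'John'], ['VP', ['V', 'hit'], ['NP', 'Bill']]]
```

The output corresponds to the completed tree in (11), from which the general PS rule (e.g. S → NP VP) could then be entered into the grammar.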
The combination of (c) and (d) assumes maximum structure, and then relaxes that assumption as necessary. The principle (5a) meshes with the general cognitive system, as does the node-labelling mentioned earlier. The other principles are purely linguistic, but even here the question arises of whether they are permanent properties of the linguistic system (i.e. UG as it describes the adult grammar), or are localized in the acquisition system per se, as Grimshaw (1981) suggests. We leave this question open for now. I have considered above how the basic labelling would work; consider now the general analysis. The string in (6) is segmented by the child.

(6) John hit Bill.
From the general cognitive system, the segmented entities may be labelled for their semantic content.

(7)  thing (name)   action   thing (name)
     John           hit      Bill
These cognitive terms take their canonical grammatical realization in node labels.
(8)  NP            V        NP
     thing(name)   action   thing(name)
     John          hit      Bill
These nodes, in turn, may be thematically labelled.

(9)  NP            V        NP
     thing(name)   action   thing(name)
     agent                  patient
     John          hit      Bill
The canonical realization of these theta roles is as particular grammatical relations.

(10) NP            V        NP
     thing(name)   action   thing(name)
     agent                  patient
     subject                object
     John          hit      Bill
These grammatical relations have, as in the Aspects definition, particular structural encodings.

(11)            S
             /     \
           NP       VP
           |       /   \
         John     V     NP
                  |      |
                 hit    Bill

     NP John: thing(name), agent, subject
     NP Bill: thing(name), patient, object
And the phrase structure tree is complete. As Pinker (1984) notes, once the structure in (11) is built, the relevant general grammatical information (e.g. S → NP VP) may be entered in the grammar. The PS rule is available apart from its particular instantiating instance,
and the system itself is freed from its reliance on conceptual or notional categories. Rather, it may analyze new instances which do not satisfy the canonical correspondences: any pre-verbal NP, regardless of its conceptual content (e.g. naming an event, rather than a thing), will be analyzed as an NP. It is precisely this property, representing the autonomous character of the syntactic system after its initial meshing with the cognitive/semantic system, that gives the proposal its power.

I have included, in the phrase structure tree, not purely grammatical information, e.g. the node labels, but also other sorts of information: thematic and grammatical relational, as well as cognitive. Should such information be included? There is some reason to believe not, at least for the cognitive categories above. Thus, while grammatical processes require reference to node labels like NP, they do not seem to require reference to cognitive categories like “thing” or “proper name”. Giving such labels equal status in the grammatical representation implies counterfactually that grammatical processes may refer to them. This suggests, in turn, that they should not be part of the representation per se, but part of the rules constructing the representation. The situation is more complex with respect to the other information in the phrase structure tree. Thus in the tree in (11), it is assumed that thematic information (the thematic labels “agent”, “patient”, etc.) is copied onto the node labels directly, as is grammatical-functional information (Subj, Obj, etc.). Presumably, in a theory such as GB, the latter intermediate stage of grammatical functions would be discarded in favor of a theory of abstract Case. The question of whether the thematic labels are copied directly onto the phrasal nodes also cannot be answered a priori, and is associated with the question of how exactly thematic assignment occurs in the adult grammar.
In traditional analyses, theta roles were thought of as being copied directly onto the associated NP: i.e. the relevant argument NP was considered to have the thematic role as part of its feature set directly. In the theory of Stowell (1981), the NP does not have the theta role copied onto it, but rather receives its theta role by virtue of a (mini-)chain which coindexes the phrasal node with the position in the theta grid.

(12)          VP
            /    \
           V      NPj
           |       |
    see (Ag, Patj) Mary
While Stowell’s system is in certain respects more natural — in particular, it captures the fact that case, but not theta roles, seems to show up morphologically on the relevant arguments, and hence may be viewed as a direct “spell-out” of the feature set — the acquisition sequence given here suggests that theta roles actually are part of the feature set of the relevant NP. Since I will assume here, as throughout, that the representations adopted in the course of acquisition are very closely aligned with those in the adult grammar, this suggests that theta roles actually are assigned to the phrasal NP, at least in the case of agent and patient (which are the central roles for this part of the account).
2.2 Argument-linking

The plausibility and efficacy of the above approach in learning cross-linguistic variation in phrase structure depends in part on the outcome of unresolved linguistic questions. In particular: (i) to what degree do languages actually differ in degree of articulation; (ii) to what degree may elements directly associated with the verbal head or auxiliary, the pronominal arguments of the head (Jelinek 1984), be construed as the direct arguments, with the accompanying lexical NPs considered to be simply adjuncts or ad-arguments; and (iii) what is the precise characterization of the difference between nominative/accusative and ergative/absolutive languages, or of the range of languages that have partially ergative/absolutive properties (e.g. split ergative languages). The existence of so-called “true ergative” languages has, in particular, been used to critique the above notion that there are canonical (absolute) syntactic/thematic correspondences, and that these may be used universally by the child to determine the Grammatical Relational or abstract Case assignment in the language. Thus it is often noted that while nominative/accusative languages use the mapping principles in (13a), ergative/absolutive languages use those in (13b) (Marantz 1984; Levin 1983).

(13) a. subject of transitive    → nominative
        subject of intransitive  → nominative
        object of transitive     → accusative
     b. subject of transitive    → ergative
        subject of intransitive  → absolutive
        object of transitive     → absolutive
And in fact the “true ergative” language Dyirbal is assumed to have the following alignment of theta roles and grammatical relations (Marantz 1984; Levin 1983):

(14) agent    →  object
     patient  →  subject
The mapping principles in (13) are stated in terms of grammatical relations, but it is clear that even if the principles were re-stated in other terms (e.g. those relating to abstract Case), a serious problem would arise for the sort of acquisition system envisioned above, if no more were said. The reason is that the set of canonical correspondences must be assumed to be universal, with one set of primitives (case-marking) centered in another set (thematic), perhaps through the mediation of a third set (abstract Case). The linking rules must be assumed to be universal, since if they were not so assumed, they would not give a determinate value in the learning of any language, for the child faced with a language of unknown type. It is easy to see why. Let us call the (canonical) theta role associated with the subject position of intransitives t1, the canonical theta role associated with the subject position of transitives t2, and the canonical theta role associated with the object position of transitives t3. Then nominative/accusative languages use the following grouping into the case system:

(15)  t1, t2  →  nominative
      t3      →  accusative

      (theta system → case system)
The ergative/absolutive languages use the following grouping:

(16)  t1, t3  →  absolutive
      t2      →  ergative

      (theta system → case system)
This is perfectly adequate as a description, but if the child is going to use theta roles to determine the (grammatical relation and) case system in the language in which he has been placed, then the position is hopeless, since the child would have to know what language-type community he was in antecedent to his postulation of where subjects were in the language. Otherwise he will not be able
to tell where, e.g., subjects are in the language. But it is precisely this that the radical break in the linking principles evidenced in (15) and (16) would not allow. The division of language types in this fashion would thus create insuperable difficulties for the child, since there would be no coherent set of theta–Grammatical Relational or theta–Case mappings that he or she could use to determine the position of the subject in the language. Of course, once the language-type was determined, the mapping rules themselves would be as well, but it is precisely this information that the child has to determine. There is, however, an unexamined assumption in this critique. It is that the situation is truly symmetrical: that is, that the child is faced with the choice of the following two linking patterns:

(17)              G0
               /      \
     nom/acc pattern   erg/abs pattern
Given such an assumption, there is no way for this acquisition system to proceed. However, if the situation is not truly symmetrical — i.e. if there are other differences between the languages exhibiting ergative and those exhibiting nominative/accusative linking patterns — and if these differences are determined by the child prior to his/her determination of the linking pattern in (17), then the critique itself is without force. We would wish to discover, rather, how this prior determination occurs, and how it and the adoption of a particular linking pattern mesh. In fact, there appears to be evidence for just this asymmetry: evidence that the majority (and perhaps the vast bulk) of ergative/absolutive languages are associated with a different sort of argument structure. I rely here on the work of Jelinek (1984, 1985), and associated work. Jelinek proposes a typological difference between broadly configurational languages, which take their associated NPs as direct arguments (e.g. English, French), and languages which she designates “pronominal argument languages”. In the latter type (she argues), the pronominal “clitics” associated with the verbal head or auxiliary are actually acting as direct arguments of the main predicate, and the lexical noun phrases are adjuncts or ad-arguments, further specifying the content of the argument slot. The sentential pattern of phrasally realized arguments in such languages, then, would roughly resemble the nominal pattern in English picture-noun phrases, where all arguments may be considered to be adjuncts in relation to the head. While it is not the case that all ergative languages reveal this sort of optionality of arguments (and in particular Dyirbal does not), it does seem that
the bulk do (Jelinek 1984). If we take this as the primary characteristic of these languages, then the choice for the child is no longer the irresolvable choice of (17), but rather the following:

(18)                  G0
                  /        \
      arguments             arguments optional
      obligatory            (i.e. not arguments but ad-arguments)
          |                      |
      nominative/            ergative/
      accusative             absolutive
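The prior cut in (18) can be caricatured in a few lines of code (a sketch only: the function and flag names are invented for illustration, while the t1–t3 labels follow the text). Once the child has determined whether phrasal arguments are obligatory, the linking pattern of (13a) or (13b) follows, so the symmetric choice in (17) never has to be made blindly:

```python
# Toy version of the "choice matrix" in (18): the argument status of
# phrasal NPs is determined first, and that prior cut then fixes the
# theta -> case linking pattern, (13a) or (13b).

def linking_pattern(arguments_obligatory):
    if arguments_obligatory:
        # (13a)/(15): t1, t2 -> nominative; t3 -> accusative
        return {"t1": "nominative", "t2": "nominative", "t3": "accusative"}
    # (13b)/(16): t1, t3 -> absolutive; t2 -> ergative
    return {"t1": "absolutive", "t2": "ergative", "t3": "absolutive"}

print(linking_pattern(True))   # nominative/accusative grouping, as in (15)
print(linking_pattern(False))  # ergative/absolutive grouping, as in (16)
```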
The “choice matrix” in (18) is undoubtedly very much simplified. Nonetheless, it appears to be the basic cut made in the data. If this is so, however, then the original decision made by the child is not the choice of linking pattern, but rather the determination of the argument status of the phrasal arguments; this, in turn, may cue the child into the sort of language that he/she is facing.

2.2.1 An ergative subsystem: English nominals
The general pattern suggested in (18) gets support from a rather remarkable source: English nominals. It was noted above that simple nominals in English (e.g. picture) have pure optionality of arguments. What has been less commented on is that deverbal English nominals have an ergative linking pattern as well, in the sense of (13b) above. Deverbal nominals from a transitive stem base have the same linking patterns as their verbal counterparts (Chomsky 1970):

(19) John’s destruction of the evidence
     (John destroyed the evidence)
I assume, with Chomsky (1970), that in cases in which the DS object is in subject position (the city’s destruction) Move-α has applied. What about deverbal nominals from an intransitive base? Here the data is more complex. Appearance, from the unaccusative verb appear, allows the argument to appear in either the N′-internal or the N′-external position, with genitive marking on the latter.

(20) the appearance of John shocked us all
     John’s appearance shocked us all
The second example in (20) is actually ambiguous between a true argument reading and a relation-R-type reading (the fact that John appeared vs. the way he looked, respectively); the latter reading does not concern us here. What about other intransitives? Surprisingly, the “internal” appearance of the single argument of intransitive verbs is not limited to unaccusatives, but occurs with other verbs as well. For example, sleep and swim, totally unexceptionable unergative (i.e. pure intransitive) verbs, allow their argument to appear both internal and external to the N′.

(21) the sleeping of John
     John’s sleeping

(22) the swimming of John
     John’s swimming
This possibility of taking the “subject” argument as internal is not limited to deverbal nominals taking just one argument, but extends to those taking other internal arguments as well, as long as those arguments would be realized as prepositional objects (rather than direct objects) in the corresponding verbal form.

(23) a. the talking of John to Mary (John talked to Mary)
     b. the reliance of John on Bill (John relied on Bill)
     c. the commenting of John on her unkindness (John commented on her unkindness)
Of course, deverbal nominals formed from simple transitive verbs do not allow the subject of the verbal construction to appear in the internal-to-N′ position in an of-phrase:

(24) a. *the destruction of John of the city (John destroyed the city)
     b. *the examination of Bill of the students (Bill’s examination of the students)
It appears, then, that the linking pattern for subjects in nominals differs, according to whether the underlying verbal form is transitive or not. In all deverbal nominals formed from intransitive verbs — not simply those formed from unaccusatives — the subject of the corresponding verb may be found in the nominal in internal-to-N′ position, in the of-phrase which marks direct arguments (see (23)), and without genitive marking in that position. This is totally impossible in deverbal nominals formed on a transitive verbal base.
This clearly is an ergative-like pattern, but before drawing that conclusion directly, a slightly broader pattern of data must be examined. It is not simply the case that nominals formed from intransitive verbal bases allow their (single) direct argument to appear internal to the N′; it may appear in the external position as well, as noted above. (25)
a. the sleeping of John / John’s sleeping (in derived nominal, not gerundive, reading)
b. the talking of John to Mary / John’s talking to Mary
c. the reliance of John on Mary’s help / John’s reliance on Mary’s help
d. the love of John for his homeland / John’s love for his homeland
Second, genitives may appear post-posed, in certain cases: (26)
a. John’s pictures of Mary / the pictures of Mary of John’s
b. John’s stories / the stories of John’s
However, we may note that when the subject is postposed, it must appear with genitive marking (or in a by-phrase). (27)
a. the pictures of Mary of John’s
   *the pictures of Mary of John
b. John’s examination
   the examination of John’s (possessive reading)
   the examination of John (OK, but only under patient/theme reading)
A reasonable possibility for analysis is that the NP with genitive marking which appears post-head is postposed from the subject/genitive slot: this accounts for the genitive marking (see Aoun, Hornstein, Lightfoot, and Weinberg 1987, for a different analysis, in which the genitive is actually associated with a null N′ category). What appears to be the case, however, is that elements which have moved into the subject position may no longer postpose. (28)
a. John’s picture (ambiguous between possessor and theme reading, the latter derived from movement)
b. the picture of John’s (only possessor reading)
(29)
a. John’s examination (ambiguous between possessor and theme reading)
b. the examination of John’s (only possessor reading)
Thus the nominal in (28a) is ambiguous between the moved and unmoved interpretations, while the postposed genitive in (28b) is not; similarly for (29a) and (29b). Thus the following constraint appears to be true, whatever its genesis (see Aoun, Hornstein, Lightfoot, and Weinberg 1987, which makes a similar empirical observation, though it gives a different analysis). (30)
Posthead genitives may only be associated with the deep structure subject interpretation; derived genitive subjects may not appear posthead with genitive marking.
The constraint in (30) may now be used to help determine the argument structure of the intransitive nominals under investigation. It was earlier noted that the subject in such nominals appeared on either side of the head. (31)
a. the appearance of John / John’s appearance
b. the sleeping of John / John’s sleeping
Which position is the DS position? Given (30), the genitive subject should count as a deep structure subject if it may postpose with genitive marking, but not otherwise. In fact, it cannot postpose: (32)
a. the appearance of John (startled us all)
   John’s appearance
   *the appearance of John’s
b. the sleeping of John
   John’s sleeping
   *the sleeping of John’s
c. the talking of John to Mary
   John’s talking to Mary
   *the talking to Mary of John’s
The inability of the genitive subject of the intransitive to postpose then constitutes evidence that the subject position is not the DS position of the single direct argument, but rather the internal-to-N′ position, and the constructions in which that element does appear in subject position are themselves derived by preposing. The argument-linking pattern for English nominals would then not be some sort of mixed system, but a true ergative-style system, given in (33).
(33) English nominals:
     internal position: t1 (subject of intransitives), t2 (object of transitives)
     external position: t3 (subject of transitives)
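The pattern in (33) amounts to a simple two-way mapping from argument type to linking position. As a toy restatement only (function and label names below are my own, not part of the text's formalism):

```python
# Toy restatement of the ergative-style linking pattern in (33):
# in English nominals, the subject of an intransitive patterns with the
# object of a transitive (internal position), against the subject of a
# transitive (external position). Names are illustrative only.

def nominal_position(role, transitive):
    """Linking position of an argument in an English deverbal nominal."""
    if role == "subject":
        return "external" if transitive else "internal"
    if role == "object":
        return "internal"
    raise ValueError(f"unknown role: {role!r}")

# "the sleeping of John": intransitive subject links internally.
assert nominal_position("subject", transitive=False) == "internal"
# "the destruction of the city": transitive object links internally.
assert nominal_position("object", transitive=True) == "internal"
# "John's destruction of the city": transitive subject links externally.
assert nominal_position("subject", transitive=True) == "external"
```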
This, in turn, strongly suggests that the argument-linking split found between nominative/accusative languages and ergative/absolutive languages should not itself be the primary “cut” faced by the child, but rather that between languages (and sub-languages) which obligatorily take arguments, and those which do not. Even in English, a strongly nominative/accusative language, a sub-system exists which is basically ergative in character: this, not coincidentally, corresponds to the subsystem in which the arguments are optional, the nominal system.

2.2.2 Argument-linking and Phrase Structure: Summary
To summarize: I have suggested (following Pinker 1984) the need for maximally general linking rules, associating thematic roles with particular grammatical functions, or abstract case. Pinker has suggested the term “semantic bootstrapping” to apply to such cases; it would be preferable, perhaps, to consider this a specific instance of a more general concept of analytic priority.

(34) Analytic priority: A set of primitives a1, a2, … an is analytically dependent on another set b1, b2, … bn iff bi must be applied to the input in order for ai to apply.

(35) The set of theta-theoretic primitives is analytically prior to the set of Case-theoretic primitives.
Semantic bootstrapping would thus be a particular instance of analytic priority; another such example would be Stowell’s derivation (1981) of phrase structure ordering from principles involving Case. In order for a set of primitives to be analytically prior in the way suggested above, and for this to aid the child in acquisition, the analytic dependence must be truly universal: if it were not, the child would not be able to determine the crucial features of his or her language (in particular, the alignment of the analytically dependent primitives a1, a2, … an) on the basis of the analytically prior set. It is for this reason that the existence of ergative languages constitutes an important potential counterexample to the idea of analytic priority, and to its particular instantiation here, since it would mean that there were no linking regularities to which the child could antecedently have access.
However, there is fairly strong evidence that the languages differ not only in their linking patterns, but in their argument structure in general: in particular, that ergative/absolutive languages are of a “pronominal argument” structure type, with elements in the verb itself satisfying the projection principle (Jelinek 1984). The choice faced by the child would then be that given in (18), repeated here below. (36)
                    G0
                  /    \
       obligatory        optional
       arguments         arguments
           |                 |
      nominative/       ergative/
      accusative        absolutive
The association of optionality with ergativity was supported by the fact that such languages do in general show optionality in the realization of arguments, and, further, in the extensive existence of ergative splits. Most remarkably of all, the English nominal appears to show an ergative-like pattern in argument-linking: that is, the ergative pattern shows up precisely in that sub-system in which arguments need not be obligatorily realized. This suggests, then, that the idea of analytic priority, and in general the priority of a given set of primitives over another set, is viable. This general idea, however, has a range of application far more general than the original application in terms of acquisition; moreover, there is some reason to believe that while the original proposals involving semantic bootstrapping or analytic priority were made in an LFG framework, they would accommodate themselves, in a more interesting form, to a multi-leveled framework like Government-Binding theory. While the proposal that the grammar — and in particular, the syntax — is levelled has been common at least since the inception of generative grammar (Chomsky 1955/1975), the precise characterization of these levels has remained somewhat vague and underdetermined by evidence (van Riemsdijk and Williams 1981). Within the Government-Binding theory of Chomsky (1981), this problem has become more interesting and yet more acute, since while it is crucial that particular subtheories (e.g. Binding Theory) apply at particular levels (e.g. S-structure), the general principles which would force a particular module to be tied to a particular level are by no means clear, nor is there any general characterization which would lead one to expect that a particular module or subtheory should apply at a single level (Control theory, Binding theory), while another subtheory must be satisfied throughout the derivation (theta theory, according to the Projection Principle).
The problem is intensified by the fact that, given that
a single operation, Move-α, is the only relation between the levels, and the content of this operation is for the most part — perhaps entirely — retrievable from the output, the levels themselves may be collapsed, at least conjecturally, so that the entire representation is no longer in the derivational mode, but rather contains all its information in a single level representation, S-structure (the representational mode). Chomsky (1981, 1982), while opting ultimately for a derivational description, notes that in a framework containing only Move-α as the relation between the levels, the choice between these modes of the grammar is quite difficult, and appears to rest on evidence which is hardly central. This indeterminacy of representational style may appear to be quite unrelated to the other problem noted in Chapter 1, the lack of any general understanding of how acquisition, and the acquisition sequence in its specificity, fits into the theory of the adult grammar. I believe, however, that the relation between the two “problem areas” is close, and that an understanding of the acquisition sequence provides a unique clue to the theory of levels. In particular, at a first approximation, the levels of representation simply correspond, one-to-one, to the stages of acquisition. That is: (37)
General Congruence Principle: Levels of grammatical representation correspond to (the output of) acquisitional stages.
We will return to more exact formulations, and general consequences, throughout. For the present, we simply note that if the General Congruence Principle is correct, then the idea of analytic priority, and the possibility that the Case system in some way “bootstraps” off of the thematic system would be expected to be true not only of the acquisition sequence, but reflected in the adult grammar as well. Before turning to the ramifications of (37), and its fuller specification, a different aspect of phrase structure variance must be considered.
2.3 The Projection of Lexical Structure

In the section above, one aspect of grammatical variance was considered from the point of view of a learnability theory: namely, the possibility that languages use radically different argument-linking patterns. Such a possibility would run strongly against Grimshaw’s (1981) hypothesis that particular (sets of) primitives are “centered” in other sets, being their canonical structural realizations in a different syntactic vocabulary (e.g. cognitive type: phrasal category, thematic role: grammatical relation, and so on). It was argued, however, that Grimshaw’s
hypothesis could be upheld upon closer examination, and that the divergence in linking pattern was not the primary cut in the data: that being rather tied to the difference in optionality and obligatoriness of arguments. Since this divergence could be learned by the child antecedently to the linking pattern itself, the learnability problem would not arise in its most extreme form, for this portion of the data. Moreover, bootstrapping-type accounts were found to be a subcase of a broader set: those involving analytic priority, one set of primitives being applied to the data antecedently to another set. This notion would appear to be quite natural from the point of view of a multi-leveled theory like Government-Binding theory. We may turn now to another aspect of the structure building rules in (5), where structure is assumed to be maximal, and then relaxed as necessary to avoid crossing branches. The structure-building and structure-labelling rules are repeated below: (38)
Building phrase structure:
a. Thematic labelling:
   i) Label agent of action: agent
   ii) Label patient of action: patient
   iii) Label goal of action: goal
   (Note: the categories on the left are cognitive, those on the right are linguistic.)
b. Grammatical Functional labelling:
   i) Label agent: subject
   ii) Label patient: object
   iii) Label goal: oblique object
   etc.
c. Tree-building:
   i) Let subject be (NP, S)
   ii) Let object be (NP, VP)
   iii) Let oblique object be (NP, XP), XP not VP, S
d. Tree-relaxation: If (a)–(c) requires crossing branches, eliminate offending nodes as necessary, from the bottom up. Allow default attachment to the next highest node.
This degree of relaxation may be assumed to occur either on a language-wide or on a construction-by-construction basis (Pinker 1984). To the extent to which the rules in (38) and their interaction are accurate, they would allow the learner to determine a
range of possible structural configurations for languages, on the basis of evidence readily accessible to the child: surface order. Finally, while it is not the case that the grammars at first adopted would be a direct subset of those adopted later on (assuming some “relaxation” has occurred), it would be the case that the languages generated by these grammars would be in a subset relation. These attractive features notwithstanding, there are potential empirical and theoretical complications, if a theory of the above sort is to be fleshed out and made to work. The problem is perhaps more acute if the theory is to be transposed from an LFG to a GB framework. This is because LFG maintains grammatical functions as nonstructurally defined primitives; the lack of articulation in phrase structure in “flat” languages is still perfectly compatible with a built-up and articulated functional structure. Relevant constraints may then be stated over this structure. This possibility is not open in GB (though see Williams 1984, for a discussion which apparently allows “scrambling” in the widest sense — perhaps including flattening — as a relation between S-structure and the surface). This difficulty reaches its apex in the syntactic description of nonconfigurational languages. In the general GB framework followed by Hale (1979), and also adopted here, no formal level of f (unctional)-structure exists. Nonetheless, the descriptive problem remains: if there is argument structure in such languages, and rules sensitive to asymmetries in it, then there must be some level or substructure which has the necessary degree of articulation. 
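The procedure in (38) is essentially a pipeline from cognitive categories to thematic labels to grammatical functions to configurational positions. As a rough illustration only (the table and function names below are my own, not Pinker's or the text's, and the relaxation step (38d) is omitted):

```python
# A minimal sketch of the labelling pipeline in (38): cognitive category ->
# thematic label -> grammatical function -> configurational position.
# The tree-relaxation step (38d) is omitted. All names are illustrative.

THEMATIC = {
    "agent of action": "agent",
    "patient of action": "patient",
    "goal of action": "goal",
}
GRAMMATICAL_FUNCTION = {
    "agent": "subject",
    "patient": "object",
    "goal": "oblique object",
}
POSITION = {
    "subject": ("NP", "S"),          # (NP, S)
    "object": ("NP", "VP"),          # (NP, VP)
    "oblique object": ("NP", "XP"),  # (NP, XP), XP not VP or S
}

def build(cognitive_roles):
    """Run rules (38a)-(38c) over a list of cognitive categories."""
    result = []
    for c in cognitive_roles:
        theta = THEMATIC[c]                       # (38a) thematic labelling
        gf = GRAMMATICAL_FUNCTION[theta]          # (38b) GF labelling
        result.append((theta, gf, POSITION[gf]))  # (38c) tree-building
    return result

assert build(["agent of action", "patient of action"]) == [
    ("agent", "subject", ("NP", "S")),
    ("patient", "object", ("NP", "VP")),
]
```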
If we assume with Hale that there are languages in which this degree of syntactic articulation does not exist at any level of phrasal representation (though this set of languages need not necessarily include languages such as Japanese, where it may be the case that Move-α has applied instead; see Farmer 1984; Saito and Hoji 1983 for relevant divergent opinions), then this articulation must still be present somewhere. Hale’s original proposal (1979) was that the necessary structure was available at a separate level, lexical structure, at which elements were coindexed with positions in the theta structure of the head. It was left unclear in Hale’s proposal what precisely the nature of lexical structure would be in configurational languages. Because of this, in part, the proposal was sharply questioned by Stowell (1981, 1981/1982), essentially on learnability grounds. Stowell noted the difficulty in supposing that languages represented their argument structure in two radically different ways, phrase-structurally or in lexical structure: how could the child tell the difference? Would lexical structure, as Hale defined it, just remove itself in languages which didn’t use it? Let us sketch one way through this, then step back to consider the consequences. We may recast Hale’s original proposal in such a way as to escape
some of these questions, using a mechanism introduced by Stowell himself: the theta grid. Suppose we identify Hale’s lexical structure simply with the theta grid itself. Whatever degree of articulation is necessary to describe languages of this type would then have to be present on the theta grid itself; the deployment of the phrasal nodes would themselves be flat. Rather than representing the lexicalthematic information on a grid, as Stowell suggests, let us represent it on a small (lexical) subtree. (39)
hit: (lexical structure)
     [V [N agent] [V [V hit] [N patient]]]

It is these positions, not positions in a grid, which are linked to different arguments in a full clausal structure: (40)
[S [NPi The man] [VP [V [Ni agent] [V [V hit] [Nj patient]]] [NPj the boy]]]

With respect to theta assignment, this representation and Stowell’s would behave identically. However, there are key differences between the two representations. First, the representation of thematic positions as actual positions on a subtree gives them full syntactic status: this is not clearly the case with the grid representation. Second, the tree is articulated in a way that the grid is not: in particular, the internal argument is represented as inside a smaller V than the external argument. This means that the internal/external difference is directly and configurationally represented, at the level of the “grid” itself (no longer a grid, rather, a subtree). There is some reason to think that theta positions may have the “real” status given to them here. It would allow a very clear representation of
clitics, for example: they would simply be lexicalized theta tree positions (perhaps binding phrasal node empty categories). Similarly, the “pronominal arguments” of pronominal argument languages would be lexically realized elements of these positions on the small theta subtree. And, finally, the possibility of such a lexical subtree provides a solution to a puzzle involving noun incorporation. In the theory of Baker (1985), noun incorporation involves the movement of the nominal head (N) of an NP into the verb itself, incorporating that head but obligatorily not retaining specifiers, determiners, and other material. (41)
a. I push the bark.
b. I bark-push. (incorporated analysis)
c. *I the bark-push.
Assuming the basics of Baker’s theory, we might ask why the incorporated noun is restricted to the head, and specifically the N0 category: why, for example, it cannot include specifiers. Baker suggests that this is because movement is restricted to either maximal or minimal projections. This representation, however, suggests a different reason: the noun incorporation movement is a substitution operation, rather than an adjunction operation. As such, the moved category must be of the right type to “land” inside the word: in particular, it must not be of a bar-level greater than that available as a landing site. Since the landing site is a mere N, not an N′ or NP, the only possible incorporating elements are of the X0 level. I will henceforth use the theta subtree rather than the theta grid, except for reasons of expository convenience.

2.3.1 The Nature of Projection
The Grimshaw (1981) and Pinker (1984) proposal contained essentially two parts. The first was that certain aspects of argument-linking are universal, and it is this which allows the child to pick out the set of subjects in his or her language. I have attempted above to adopt this proposal intact, and in fact expand on it, so that the principles figure not simply in the mapping principles allowing the child to determine his or her initial grammar, but in the synchronic description of the grammar as well. The second part of the proposal is concerned not so much with argument-linking, but with the building of phrase structure. Here I would like to take a rather different position than that considered in the Grimshaw/Pinker work. In particular, I would like to argue that the Projection Principle, construed as a continually applying rule Project-α, plays a crucial role. Recall the substance of the proposal in (38), where arguments are first
thematically labelled, then given grammatical functions, and then given PS representations, relaxed as necessary to avoid crossing branches. From the point of view of the current theory (GB), there are aspects of this proposal which are conceptually odd or unexpected, and perhaps empirical problems as well. First, the tree-building rules given above require crucial reference to grammatical relations. In most versions of Government-Binding Theory, grammatical relations play a rather peripheral role, e.g. as in the definition of function chains (Chomsky 1981). Second, the tree representation itself is given an unusual prominence, in that it is precisely the crossing of branches which forces a flattening of structure. This extensive reliance on tree geometry runs against the general concept of theories like Stowell (1981). Further, there is an empirical problem. Given the possibility that two different sorts of languages exist, those which are truly flat, and those which are not but have extensive recourse to Move-α — such at least would be the conclusion to be drawn by putting together the proposals of Hale (1979), Jelinek (1984), on the one hand, and Saito and Hoji (1983) on the other — the simple recourse to flattening in the case in which the canonical correspondences are not satisfied cannot possibly be sufficient. Finally, it is odd again that the Projection Principle, which plays such a large role in the adult grammar, should play no role at all in the determination of early child grammars. We may pose the same questions in a more positive way. The child starts out with a one-word lexical-looking grammar. From that, he or she must enter into phrasal syntax. How is that done? Let us give the Projection Principle, construed as an operation as well as a condition on representations, central place. In particular, let us assume the following rule: (42)
Project-α
Project-α holds at all levels of representations; it is a condition on representations. However, it is also a rule (or principle) which generates the phrasal syntax from the lexical syntax, and links the two together. Thus, looking at the representation from the head outward, we may assume that the phrasal structure enveloping it actually is projected from the lexical argument structure. This relies crucially, of course, on the theta subtree representation introduced above.
(43) lexical form:
     [V [N agent] [V [V hit] [N patient]]]

         ↓ Project-α

(44) phrasal projection:
     [Vx NP1 [Vx [V [N1 agent] [V [V hit] [N2 patient]]] NP2]]
In the representation in (43)–(44), the lexical representation of the verbal head has projected itself into the phrasal syntax, retaining the structure of the lexical entry. In effect, it has constructed the phrasal representation around it, “projecting out” the structure of the lexical head. Into this phrasal structure, lexical insertion may take place. I have left it intentionally vague, for now, what the level of the phrasal projections which are the output of “Project-α” is, calling them simply Vx. Similarly, the question of whether the thematic role (agent, patient, etc.) projects has been left undetermined (no projection is shown above). There are four areas in which the above representation may be queried, or its general characteristics further investigated:

i) Given that the phrase structure is already articulated, does not the addition of an articulated lexical representation, in addition to the PS tree, introduce a massive redundancy into the system?
ii) Assuming that Project-α does occur, is all information associated with the lexical structure projected, or just some of it? Further, are there any conditions under which Project-α is optional, or need not be fully realized?
iii) Given that Project-α is an operation, as well as a condition on ultimate representations, is there any evidence for pre-Project-α representations, either in acquisition or elsewhere?
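Treating the theta grid as a subtree, the operation in (42)-(44) can be given a toy rendering: a lexical subtree for hit is wrapped in phrasal structure whose NP slots correspond to its theta positions. The class, function, and label names here are illustrative assumptions, not the text's formalism:

```python
# A toy rendering of Project-alpha over theta subtrees, cf. (42)-(44):
# the lexical subtree of the head is projected into phrasal structure,
# with NP slots corresponding to its theta positions. Names illustrative.

class Node:
    def __init__(self, label, children=(), role=None):
        self.label = label
        self.children = list(children)
        self.role = role

    def bracket(self):
        """Render the (sub)tree as a labeled bracketing."""
        core = self.label + (f"-{self.role}" if self.role else "")
        if not self.children:
            return core
        return "[" + core + " " + " ".join(c.bracket() for c in self.children) + "]"

def lexical_entry(head, external, internal):
    """e.g. hit: [V [N agent] [V [V hit] [N patient]]]"""
    inner = Node("V", [Node(head), Node("N", role=internal)])
    return Node("V", [Node("N", role=external), inner])

def project_alpha(lex):
    """Wrap the lexical subtree in phrasal structure, as in (44):
    [Vx NP [Vx <lexical subtree> NP]] (NP slots link to theta positions)."""
    inner = Node("Vx", [lex, Node("NP")])
    return Node("Vx", [Node("NP"), inner])

hit = lexical_entry("hit", "agent", "patient")
phrase = project_alpha(hit)
assert hit.bracket() == "[V N-agent [V hit N-patient]]"
assert phrase.bracket() == "[Vx NP [Vx [V N-agent [V hit N-patient]] NP]]"
```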
iv) How might the typology of natural languages, in particular, Hale’s (1979) conjecture that nonconfigurational languages are flatter in structure (i.e. less articulated with respect to X′-theory), be associated with differences in Project-α? In the following sections, I will be concentrating on questions (ii)-(iv). The question — or challenge — posed in (i), I would like to consider here. It requires a response which is essentially two-pronged. The first part concerns the nature of the lexical representation that is employed: that is, as a subtree with articulated argument structure, including potential order information (e.g. the internal argument is on the right), rather than the more standard formats with the predicate first, and unordered sets of arguments (Bresnan 1978, 1982; Stowell 1981). In part, the problem posed here is simply one of unfamiliarity of notation — an unordered set of arguments may be converted into a less familiar tree structure with no loss of information — but in part the most natural theoretical commitments of the notations will be different, and the notation adopted here requires a defense. I have suggested above one line of reasoning for it: namely, by allowing the theta grid real syntactic status, the placement of elements like clitics (and perhaps noun-incorporation structures) can be accounted for. Two other consequences are: the notion of “Move-α in the lexicon” (Roeper and Siegel 1978; Roeper and Keyser 1984) would be given a natural format if tree structures are assumed; perhaps less so in the more standard Bresnan-Stowell type notation. Moreover, making the usual assumptions about c-command and unbound traces (namely, that traces must be bound by a c-commanding antecedent), the tree notation makes a prediction: externalization rules in the lexicon should be possible, but internalization rules should not. This is because the latter rule would be a lowering rule, leaving an unbound trace; the externalization rule would not. (45)
Externalization:
[V [Ni agent] [V [V melt] [Ni patient]]]
(46) Internalization:
     *[V [Ni agent] [V [V melt] [Ni patient]]]
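The asymmetry between (45) and (46) reduces to c-command: upward movement leaves a bound trace, downward movement an unbound one. A toy checker, under a simplified sisterhood-based definition of c-command (all encoding choices here are my own, for illustration only):

```python
# Toy check of the prediction in (45)-(46): movement in the lexical subtree
# must leave a c-commanded (bound) trace, so externalization (raising) is
# licit but internalization (lowering) is not. Trees are nested tuples
# (label, left, right) with leaf strings; the encoding is illustrative.

def nodes(tree):
    """Yield every node (including leaves) dominated by tree."""
    yield tree
    if isinstance(tree, tuple):
        for child in tree[1:]:
            yield from nodes(child)

def c_commands(tree, a, b):
    """a c-commands b iff a is one daughter of some node and the sister
    branch dominates b (a simplified sisterhood-based notion)."""
    if isinstance(tree, tuple):
        _, left, right = tree
        if left == a and any(n == b for n in nodes(right)):
            return True
        if right == a and any(n == b for n in nodes(left)):
            return True
        return c_commands(left, a, b) or c_commands(right, a, b)
    return False

# Externalization (melt): the moved element Ni sits above its trace ti.
external = ("V", "Ni", ("V", "melt", "ti"))
# Internalization (*): the moved element Ni sits below the trace position ti.
internal = ("V", "ti", ("V", "melt", "Ni"))

assert c_commands(external, "Ni", "ti") is True    # trace bound: licit
assert c_commands(internal, "Ni", "ti") is False   # trace unbound: ruled out
```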
Aside from this empirical consequence, which falls out automatically from the notation and awaits verification, the tree notation has a further consequence, bearing on a theoretical proposal of Travis (1984) and Koopman (1984). Travis and Koopman suggest that there are two distinct processes, Case assignment (or government) and theta assignment (or government), and that each of these processes is directional. Thus, in the theory of Travis (1984), these processes may be opposite in directionality, for some categories (e.g. V in Chinese), with Move-α occurring obligatorily as a consequence. The notion that case-assignment is a directional process is a natural one; the notion that theta-assignment is directional is perhaps less so, at least under the version of theta assignment outlined in Stowell (1981), where theta roles are assigned under coindexing with positions in the grid. Note, however, that given the tree-type representation of lexical entries above, where order information is included, the idea that theta government is directional becomes considerably more natural. This is because, to the extent to which Project-α faithfully projects all the information associated with the lexical entry, the order information encoded in the lexical entry would be expected to be projected as well. Thus the Koopman-Travis proposal, and the proposal for the format of lexical entries suggested here, mutually reinforce each other, if it is assumed that Project-α is a true projection of the information present in the lexical representation. This partially answers the question posed by i): the apparent “redundancy” between the chosen lexical representation and that given in the syntax. While the two representations are close to each other in format and information given, this is precisely what would be expected under interpretations of the Projection Principle in which the projected syntactic information is faithfully projected from the lexicon.
In fact, we may use the syntactic information — so the Projection Principle would tell us — to infer information about the structure of the lexical entry, given a strict interpretation of that principle. There is, however, a different aspect of the problem. The above argument
serves to weaken criticisms about redundancy based on the similarity between the lexical representation and the syntactic: in fact, just such redundancy or congruence might be expected, given the Projection Principle. It might still be the case that both sorts of information, while present in the grammar, are not present in the syntactic tree itself. This would represent a different sort of redundancy. The output of Project-α given above includes both the lexical information, and the syntactic (as does Stowell’s theta grid).
(47) lexical form:
     [V [N agent] [V V [N patient]]]

         ↓ Project-α

     phrasal projection:
     [VP NP [V′ [V [N agent] [V V [N patient]]] NP]]
The arguments for this sort of reduplication of information are essentially empirical. They depend ultimately on the role that implicit arguments play in the grammar. In many recent formulations (e.g. Williams 1982), implicit arguments are conceived of as positions in the theta grid not bound by (coindexed with) any phrasal node. As such, they may be partially active in a syntactic representation: coindexed with the PRO subject of a purposive, or an adverbial adjunct (following Roeper 1986), even though the position associated with them is not phrasally projected. (48)
a. The boati (vp was (v agj (v sunk patienti)) ti PROj to collect the insurance).
b. The pianoi (vp was (v agj (v played themei)) ti (PROj nude)).
Of course, if such positions are available in the syntax, even partially, they must be present in the representation.

2.3.2 Pre-Project-α representations (acquisition)
The theory outlined above has two central consequences. First, the Pre-Project-α representation has a structure which is fully syntactic, a syntactically represented subtree. Second, this subtree is available in principle prior to the application of Project-α, since the latter is interpreted both as an operation, and as a constraint
on the outputted representations. In the normal course of language use, no instances of Pre-Project-α representations would be expected to be visible, since one utters elements which are sentences, or at least part of the phrasal syntax, and not bare words. However, this need not be the case with the initial stages of acquisition, if we assume, as in fact seems to be the case, that the child starts out with a “lexical-looking” syntax, and only later moves into the phrasal syntax. The idea that the child may be uttering a single lexical item, even when he or she is in the two (or perhaps three) word stage, becomes plausible if we adopt the sub-tree notation suggested earlier. It was suggested above that a verb like want, in its lexical representation, consists of a small subtree, with the positions intrinsically theta-marked.
(49) want: (lexical structure)
     [V [N agent] [V [V want] [N theme]]]

There were shown above a number of constructions from Braine (1963, 1976) from very early speech. Among them were the want constructions from Steven: (50)
want baby
want car
want do
want get
want glasses
want head
want high
etc.
These appear directly after the one-word stage. One possibility of analysis is that these are (small) phrasal collocations. Suppose we assume instead that they are themselves simply words: the lexical subtree in (49), with the terminal nodes (or the object terminal node) filled in.
(51) [V [N agent] [V [V want] [N theme baby]]]
Then the child is, at this point, still only speaking words, though with the terminals partly filled in. Can other constructions be analyzed similarly? It appears so, for the noun-particle constructions (Andrew word combinations), if we assume that a particle subcategorizes for its subject.
(52) [P [N boot] [P off]]   (boot fills the theme position)
A different, interesting set is the collocations of a closed class referring element and a following noun. These have often been called "structures of nomination":

(53) that Dennis        it bang
     that doll          it checker
     that Tommy         etc.
     that truck
     etc.
A similar set involves a locative closed class subject, followed by various predicative nouns.

(54) Steven, word combinations:
     there ball         here bed
     there book         here checker
     there doggie       here doll
     etc.               etc.
From the point of view of the current proposal, these examples are potentially problematic on at least two grounds. First, we have been assuming, essentially following Braine, that the initial structures were pivot-open (order irrelevant), where this distinction is close to that of head-complement or subcategorizer-subcategorized. The set of pivots (heads) is small compared to the set of elements that they "operate on". However, under normal assumptions, it is the predicate of a simple copular sentence (which the sentences in (53) and (54) correspond to) which is the head and subcategorizer, at least in the semantic sense (though INFL may be the head of the whole clause). (55)
[S [NP John] [H is a fool]]
The data in (53) and (54), however, suggest the reverse. In all these cases it is the subject which is closed class and fixed (the pivot), while the predicative element varies freely. Trusting the acquisition data, we must then say that it is the subject which acts as the head or operator, "operating on" the open element, for simple subject-copula-predicative phrase constructions, though not generally: (56)
i) here __
ii) there __
iii) that __
iv) it __
This suggests in turn that simple sentences may differ with respect to their semantic and syntactic "headedness", depending on whether they are copular or eventive (headed by a main verb predicate). In the latter structures (e.g. want X), the main verb seems to act as the pivot or functor category; in the former (e.g. here X, it X), the subject seems to act in a similar manner. This syntactic fact from early speech is in accord with a semantic intuition: namely, that eventive sentences are in some sense "about" the event that they designate, while copular sentences are about the subject. Precisely how these semantic intuitions may be syntactically represented, I leave open. In effect, the structures in (56) are like small clause structures, without the copula. In the literature, two sorts of small clauses are often distinguished: simple small clauses and resultatives. The former are shown in (57a), the latter in (57b).
(57) a. John ate the meat (PRO nude)
        I consider that a man
     b. I painted the barn black
        I want him out
Interestingly, both of these structures are available in the acquisition data above, but with different classifications of what counts as the pivot. Corresponding to the simple small clauses are the structures of nomination in (58), where the subject acts as pivot. Corresponding to the resultatives are the noun-particle constructions given in (52). (58)
a. that __
   it ___
   here __
b. __ off
The former have their pivot (head) to the left: it is the subject. The latter have their pivot (head) to the right. This suggests that, semantically, the subject "operates on" (or syntactically licenses) the predicate in the constructions in (58a) in the child grammar (and, ceteris paribus, in the adult grammar as well), while the reverse occurs in (58b). This is additional evidence for the speculations directly above about differences in argument structure in these types of sentences, both at a broad semantic level and syntactically.
2.3.3  Pre-Project-α Representations and the Segmentation Problem
The section above provides a plausibility argument that the initial representations are lexical in character, and that the pivot is to be identified with the head of a Pre-Project-α representation, while the open element is a direct realization of the position in the argument structure. I will consider here some additional data, and a constraint relevant to that determination. One aspect of the analytic side of the child's acquisition task in early speech must be the problem of segmentation. The child is faced with a string of elements, largely undetermined as to category and identity, and must, from this, segment the string sufficiently to label the elements. Some part of this task may be achieved by hearing individual words in isolation (and thus escaping the segmentation task altogether), but it is clear that the vast majority of the child's vocabulary must be learned in syntactic context, through some sort of segmentation procedure. It has often been assumed that the segmentation task requires extensive reliance on phrase structure rules. Thus, given the phrase structure rule in (59), and given the string in (60) with the verb analyzed but the NP object not yet analyzed by the child, the application of the rule in (59) to the partially labelled string in (60) would be sufficient to determine the category of the object.

(59) VP → V NP
(60) [V see] [? the cat]  →  [V see] [NP the cat]
Similarly, given the phrase structure rule in (61), and the partially analyzed string in (62), the identity of the second element may be determined.

(61) NP → Det N
(62) [Det an] [? albatross]  →  [Det an] [N albatross]
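The inference in (59)-(62) can be sketched as a small procedure (my own illustration; the rule and string encodings are hypothetical): the rule's obligatory daughters are aligned with the partially analyzed string, and any unlabeled element receives the category in the corresponding rule position.

```python
def label_by_rule(daughters, string):
    """Align a rule's obligatory daughter categories with a partially
    analyzed string of (word, category-or-None) pairs, filling in the
    None categories position by position."""
    if len(daughters) != len(string):
        return None  # the rule does not fit; no inference is possible
    return [(word, cat if cat is not None else rule_cat)
            for rule_cat, (word, cat) in zip(daughters, string)]

# (59)-(60): VP -> V NP applied to "see [the cat]", object unlabeled
print(label_by_rule(["V", "NP"], [("see", "V"), ("the cat", None)]))
# (61)-(62): NP -> Det N applied to "an albatross"
print(label_by_rule(["Det", "N"], [("an", "Det"), ("albatross", None)]))
```

With obligatory daughters only, the alignment is unique and the inference determinate; the next examples show how this breaks down once daughters are optional.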
While these simple examples seem to allow for a perspicacious identification of categories on a phrase structure basis, both empirical and theoretical complications arise with this solution, when the problem is considered at a more detailed level. The empirical problems are of two sorts: the indeterminacy of category labelling given the optionality of categories in the phrase structure rule, and the mislabeling of categories, with potentially catastrophic results. An example of the former can be seen very easily at both the VP and NP level. A traditional, Jackendoffian expansion of the VP might be something like the following: (63)
VP → V (NP) (NP) (PP) (S′) …
And that of the NP is given in (64).

(64) NP → (Det) (Adj) N (PP) (PP) (S′) …
Consider the child's task. He or she has isolated the verb in some construction. A complement appears after it. What is the complement's type? The expansion rule in (63) is nearly hopeless for this task. Suppose that the child has a relatively complete VP expansion. This is then laid over the partially segmented string:

(65) V (NP) (NP) (PP) (S′)
     see  beavers
(66) beavers: NP? PP? S′?
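The indeterminacy can be made concrete (again my own sketch, with a hypothetical rule encoding): once the daughters of the expansion are optional, aligning the rule with "see beavers" licenses several distinct labels for the unknown complement.

```python
def possible_labels(optional_daughters, unknown):
    """Return every optional category that could, on its own, host the
    single unknown complement: one candidate analysis per category."""
    candidates = []
    for cat in optional_daughters:
        if cat not in candidates:       # NP listed twice still yields one guess
            candidates.append(cat)
    return [(unknown, cat) for cat in candidates]

# VP -> V (NP) (NP) (PP) (S') laid over "see beavers": V is matched,
# leaving one unanalyzed complement.
print(possible_labels(["NP", "NP", "PP", "S'"], "beavers"))
# [('beavers', 'NP'), ('beavers', 'PP'), ('beavers', "S'")]
```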
The result, given in (66), is nowhere near determinate. In fact, the unknown element may be placed as any of the optional categories, and this extreme indeterminacy makes the outcome of the procedure very unclear. The same holds for the nominal expansion. Suppose that the child hears the following: (67)
American arrogance
This has two analyses under his/her grammar, perhaps with additional specification of the determiner in (68b). (68)
a. ((American)A (arrogance)N)NP
b. ((American)DET (arrogance)N)NP
Again, the category assigned by the phrase structure rule is indeterminate, and perhaps misleading. It may be objected at this point that the above criticism, while correct as far as it goes, does not take account of the fact that additional evidence may be brought to bear to lessen the indeterminacy of the analysis. For example, in (66), the category of beavers may be initially placed as (NP, PP, S′), but the choice between these is drastically narrowed by the fact that beavers name things, and the canonical structural realization of things is as NPs. However, while this assertion is correct, it allows a mode of analysis which is too powerful, if the phrase structure rule itself is still assumed to play a central role. If the child is able to segment the substring beavers from the larger string, and if the child knows that this labels a thing, and is able to know that things are NPs (in general), then the category of beavers can itself be determined by this procedure, without reference to the phrase structure analysis at all. In fact, things are even more difficult for the phrase structure analysis than has so far appeared. We have been assuming thus far that the child has a full phrase structure analysis including all optional categories, and has applied that to the input. Suppose that, as seems more likely, the initial phrase structure grammar is not complete, but only has a subset of the categories. The VP, let us say, has only the object position; the NP has a determiner, but no adjective. Let us further assume that these categories have so far been analyzed as obligatory — i.e. no intransitive verbs (or an insufficient number of them) have appeared in the input, similarly for determinerless nouns. (69)
VP → V NP
NP → Det N
While it is obviously not the case that every child will have the phrase structure rules in (69) at an early stage, the postulation of these rules would not seem out of place for at least a subset of the children. Consider now what happens when the child is faced with the following two instances. (70)
a. go to Mommy
b. Tall men (are nice)
The application of the PS rules in (69) to the data in (70) gives the following result:

(71) ((go)V ((to)D (Mommy)N)NP)
(72) ((tall)D (men)N)NP
To is mislabeled as a determiner, and to Mommy as an NP; tall is also mislabelled as a determiner. Of course, this misidentification of categories is not recoverable on the basis of positive evidence (the category would just be considered ambiguous); worse, the misidentification of a category would not be local, but would insinuate itself throughout the grammar. Thus the misidentification of to as a determiner might result in the following misidentification of the true determiner category following it, and in the following misidentification of the selection of other verbs:

(73) a. to this store
     b. ((to)D (this)A (store)N)NP
(74) ((talk)V ((to)D (Mary)N)NP)
It is precisely this indeterminacy of analysis, and possibly misleading analyses, which led Grimshaw (1981) and Pinker (1984) to adopt the notion that structural canonical realizations play a crucial role in early grammars, with the analytic role of phrase structure rules reduced. However, the Grimshaw/Pinker analysis does not go far enough. For the later stages of grammars, the optionality of elements in the phrase structure rule would make the output of analytic operations involving them useless (giving multiple analyses), while in the early stages, such analyses could be positively misleading, with erroneous analyses triggering other misanalyses in the system. Let us therefore eliminate their role entirely: (75)
No analytic operations occur at the phrase structure level.
A generalization such as (75) would be more or less expected, given Stowell's (1981) elimination of phrase structure rules from the grammar. It is nonetheless satisfying that the elimination of a component in the adult grammar is not paired with its ghostly continuance in the acquisition system. The segmentation problem, however, still remains. Here I would like to adopt a proposal, in essence a universal constraint, from Lyn Frazier (personal communication): (76)
Segmentation Constraint: All analytic segmentation operations apply at the word level.
Let us suppose that (76) is correct. Suppose that the child hears a sentence like the following: (77)
I saw the woman.
Suppose that the child has already identified the meaning of see, and thus the lexical thematic structure associated with it. According to the discussion above, this is a small lexical sub-tree.
(78) [V [N agent] [V [V see] [N theme]]]
The subtree in (78) is applied to the input. In the resultant, the closed class determiner, which causes the N to project up to the full maximal node (see later discussion), is dropped out.
(79) [V [N agent] [V [V saw] [N woman]]]
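The step from (78) to (79) can be sketched procedurally (my own illustration; the closed class stop list and verb-form table are hypothetical): open class words are matched into the frame's theta positions, while closed class material is simply left unanalyzed and drops out.

```python
CLOSED_CLASS = {"the", "a", "an", "is", "to"}   # illustrative stop list

def apply_frame(verb_forms, sentence):
    """Recover (agent, verb, theme) from the input, skipping closed class
    items; verb_forms maps surface forms (saw) to the head (see)."""
    words = [w for w in sentence.lower().split() if w not in CLOSED_CLASS]
    agent = verb = theme = None
    for w in words:
        if verb is None and w in verb_forms:
            verb = verb_forms[w]      # the head anchors the frame
        elif verb is None:
            agent = w                 # open class word before the verb
        else:
            theme = w                 # open class word after the verb
    return agent, verb, theme

print(apply_frame({"saw": "see"}, "I saw the woman"))   # ('i', 'see', 'woman')
```

The determiner never enters the analysis at all, which is the effect the segmentation constraint is meant to capture.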
I have included the subject as optional in (78) partly for syntactic reasons (it is by no means clear that the subject is part of the lexical representation in the same sense as the object; see Hale and Keyser 1986a, 1986b), and partly because of the acquisition data (the subject appears to be optional in child language, but we may conceive of this as due to some property of the lexical entry, or the case system, rather than to the existence of pro-drop: see later discussion). The dropping of the determiner element in the child's speech falls out immediately if we assume that the child is speaking words (which contain simple N-bar-level nominal constituents); this assumption is carried over from the discussion of the initial utterances. The isolation of the nominal category (woman, here) occurs at the word or word-internal level. Frazier's segmentation constraint therefore fits in well with the fact that the early grammars drop determiner elements. If such elements are part of the phrasal syntax (i.e. project up to the XP, or cause the head to so project), and if such syntax is not available in these very early stages, then the dropping of the determiner elements is precisely what would be expected. Further, the segmentation of early strings becomes a morphological operation, which is surely natural.
2.3.4  The Initial Induction: Summary
I will discuss further aspects of the analytic problem, and the role that the open class/closed class distinction may play in the identification of categories, and in the composition of phrase structure itself, further below. For the moment, I would like to summarize the differences between the proposals above and the Grimshaw/Pinker proposals with which the chapter began, since the current comments suggest a difference in orientation not only with respect to the building and relaxation of phrase structure (the second part of the proposal), but with respect to the nature of the generalization involved in argument-linking as well. In the Grimshaw/Pinker system, the following holds:

i)  The subject and object are identified by virtue of their associated thematic relations, agent and patient; similarly for other thematic roles. This identification is universal (the assumption of canonical grammatical-functional realizations).
ii) Cross-linguistic variation in the articulation of phrase structure follows from a constraint on the phrase structure system: no crossing branches are allowed. Given this constraint, and given the apparent permutation of subject and object arguments, the phrase structure tree of the target language will be flattened.
In the proposal above, the Projection Principle, in the form of Project-α, is given pride of place. In addition, it is claimed i) that lexical representations are articulated (sub-)tree representations, whose information Project-α faithfully represents, and ii) that very early representations are lexical in character. The proposals corresponding to the Grimshaw/Pinker proposals would then take the following form. First, the primitives subject and object would be replaced with Williams' (1981) categorization of internal and external argument. This would be present structurally in terms of the articulation of the phrase structure tree, with the external argument more external than the internal argument, though perhaps not external in Williams' sense of external to a maximal projection (at least at D-structure; see Chapter 1, discussion of Kitagawa and Sportiche). The external/internal distinction would also be present in the lexical sub-tree. Proposal i) of Grimshaw/Pinker would therefore correspond to the following: (80)
Agents are more external than patients (universally).
If we assume (80) to hold both over lexical representations, and, by virtue of the Projection Principle, over syntactic representations as well, then the child may, upon isolation of elements carrying the thematic roles agent and patient, determine the following sub-tree.
(81) [V [(N) agent] [V V [N patient]]]
This has the same content as the original proposal in terms of the articulation of the initial trees, but eliminates the primitives subject and object from the system.
2.3.5  The Early Phrase Marker (continued)
I have suggested above that the initial grammar is lexical, in the sense of its content being determined by X0 elements. In general, the phrasal system is entered into at the point at which determiner elements, i.e. closed class specifier elements, are met in the surface string. This general picture, together with the fact that the lexicon and the syntax use the same type of formal representation, namely tree-structures (acyclic, directed, labelled graphs, with precedence defined), suggests that the demarcation between the lexicon and the syntax is not as sharp as has sometimes been claimed. Let me now expand a bit on the type of the representation, since the claim that the child is still operating with lexical representations at (say) the two-word stage is unnecessarily dramatic, though still significant. The actuality is that the representation is pre-Project-α, and that more than one level characterizes representations of this type.
Let us divide the pre-Project-α representation into two relevant levels (there may be others, cf. Zubizarreta 1987; they do not concern us now). The first will be called the lexical representation proper; the second is the thematic representation. Both of these are tree representations. The difference is that OC (open class) lexical insertion has taken place in the latter, but not in the former. (82)
Lexical representation:    [V [N agent] [V [V hit] [N patient]]]
        ↓  OC insertion
Thematic representation:   [V [N man] [V [V hit] [N dog]]]
        ↓  Project-α
                           [S [NP The man] [VP [V hit] [NP the dog]]]
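The mapping sequence in (82) can be sketched as two functions over bracketed strings (my own illustration; the string encoding and function names are hypothetical):

```python
import re

def oc_insertion(lexical, agent, patient):
    """OC insertion: replace the theta-role placeholders in the lexical
    representation with open class items, yielding the thematic representation."""
    return lexical.replace("agent", agent).replace("patient", patient)

def project_alpha(thematic, subj_det, obj_det):
    """Project-alpha: project the thematic subtree into phrasal syntax,
    adding phrasal (XP) nodes and the closed class determiners."""
    m = re.match(r"\[V \[N (\w+)\] \[V \[V (\w+)\] \[N (\w+)\]\]\]", thematic)
    subj, verb, obj = m.groups()
    return f"[S [NP {subj_det} {subj}] [VP [V {verb}] [NP {obj_det} {obj}]]]"

lexical = "[V [N agent] [V [V hit] [N patient]]]"
thematic = oc_insertion(lexical, "man", "dog")
print(thematic)                               # [V [N man] [V [V hit] [N dog]]]
print(project_alpha(thematic, "The", "the"))  # [S [NP The man] [VP [V hit] [NP the dog]]]
```

The intermediate value is exactly the thematic representation: open class material is in place, but no phrasal nodes or determiners exist yet.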
In the sequence of posited mappings, open class lexical insertion (and perhaps other operations) distinguishes the thematic representation from the lexical representation. The rule Project-α projects the thematic representation into the phrasal syntax. The claim that the child is speaking "lexical representations" in telegraphic speech may now be recast: the child is speaking thematic representations. This is a pre-Project-α representation, but it does not correspond to any single accepted level in current theories (e.g. GB or LFG); it has rather an intermediate character between what are generally assumed to be lexical and syntactic levels.

Let us turn now back to the segmentation problem. We may simply view this as the determination of the set of primitives that the child employs at PF (where the segmentation occurs). One obvious fact about acquisition is that at the stage where the child commands telegraphic speech, he or she is able to isolate the open class parts of the vocabulary from parental speech. Assuming that Motherese does not play a crucial role in this process (though it may speed it up; Gleitman, Gleitman, and Newport 1977), we may view this as involving the child parsing the initial full sentence which is heard with a lexical representation of the head, a tree with precedence and externality/internality defined, giving rise to the thematic representation. For the case of (83), this would be the following:
(83) Lexical representation: [V [N] [V [V see] [N]]]
     Input: The bureaucrat saw the monster
     Retrieved representation: (V bureaucrat (V see (N monster)))

The representation which would be applied to the input would be the lexical representation in the sense suggested above: a headed subtree, with precedence and externality structurally defined. The non-head nodes correspond to lexical, not phrasal, categories. As such, only part of the input string would be analyzed: that corresponding to the full lexical categories. There is thus no need for a separate parsing structure here apart from that already implicated in the grammar itself: the parsing is done with a representation which is part of the grammar, the lexical representation (and it returns the thematic representation). A second example would be the following: the structures of nomination. These are given in (84), where, as already noted above, the subject appears to act as a fixed functor category, taking a series of predicates as arguments. These occur in copular constructions, not elsewhere. (84)
a. that ball     that ___
b. it mouse      it ___
These may be formally represented as a tree structure with a fixed subject and an open predicate, applied successively to different inputs. The resultant is a partially analyzed string.
(85) lexical/thematic representation: [ [N that] [V ___ ] ]

(86) that is a ball  →  [ [N that] [N ball] ]
The copular element and the determiner drop out of the analyzed representation; as in the case of the headed eventive structures, it is the lexical/thematic structures themselves which parse the string. In the adult grammar, these initial representations are licensing conditions: the that ___ above licenses the predicate that follows it, at the thematic level of representation (the thematic representation exists in the adult grammar as well). This mode of analysis suggests that the complexity of initial structures should be viewed as a function of the complexity of the primitive units, and of the relations between them. As such, they may be used to gain knowledge of what the primitive units are. As noted by Bloom (1970), children's ability to construct sentences does not proceed sequentially, as the following hypothetical paradigm might suggest:

     bridge
     big bridge
     a big bridge
     build a big bridge

This suggests that the child's grammar, while built of more primitive units, is not built purely bottom-up; see also the discussion in the next chapter. More natural sequences are those given below:

(87) I see
     see bridge
     see big bridge

(88) like Mary
     I like Mary
These sequences tend to isolate the government relation, and then add elements to it (e.g. in (87)). One prediction of the account above would be that in the two word stage, the government relation, and perhaps other types of licensing relations (e.g. modification), would be paramount; two word utterances not evidencing these relations would be relatively rare. That this is so is hardly news in the acquisition literature (see Bloom 1970; Brown 1973, and a host of others), but the difficulty has been to model this fact in grammatical theory. Phrase structure representations do not do well; the idea of primitive licensing relations, and compounds of these (see also Chomsky 1955/1975), would do much better. This would give one notion of the complexity of the initial phrase marker. Another point of reference would be the differences in complexity between subjects and objects. Much early work in acquisition (by Bloom and others) suggested that initial subjects differed from objects in category type: less articulated, and much less likely to be introduced with a determiner or modifier. This was encoded in Bloom (1970) by introducing subjects with the dominating node Nom rather than NP in the phrase structure rule, since the subject showed different distributional characteristics. This is shown in (89).
(89) S → Nom VP
     VP → V NP
If there is a distinction like this (though see the critical discussion in Pinker 1984), then it would follow quite naturally from the sort of theory discussed in this chapter. Given a rule of Project-α, one might ask: is there an ordering in how arguments are projected? In particular, are arguments projected simultaneously, or one by one? Projection here would be close to the inverse of the operation of "argument-taking". The Bloom data suggest an answer, assuming, as always, a real convergence between the ordering of stages in acquisition and the ordering of operations in the completed grammar. Suppose the following holds: (90)
The projection of internal arguments applies prior to the projection of external arguments (in both the completed grammar and in acquisition).
Then the Bloom data and the difference between the subject and object would follow. I leave this open. Let us turn back to the segmentation problem (which may be simply identical to determining the set of primitive structures at PF). I suggested earlier that the parsing of initial strings was done by application of the thematic representation, a pre-Project-α representation, to the phrasal input.
(91) [V [N] [V [V] [N]]]  applied to:  the bureaucrat saw the monster

     Returns: [V [N bureaucrat] [V [V see] [N monster]]]
This returns the open class representation given in (91). Ultimately, the category of the determiner (and its existence) must be determined as well. I suggested above that all segmentation takes place at the lexical level. This means that for the determiner to be segmented out, it must first be analyzed as part of a single word with the noun that it specifies. Thus one of two representations of it must hold. (92)
a. [N [Det the] [N man]]   (at PF)
or
b. [Det [Det the] [N man]]   (at PF)
For the time being, I will assume the first of these representations, the more traditional one. This goes against the position of Abney (1987a), for example. There is a reason for this, however. With the notion that the noun is the head of the NP, we are able to keep a coherent semantic characterization of the specifier element (in general): it contains closed class elements, and not other elements. The notion that all elements are projecting heads in the NP no longer allows us to keep this semantic characterization.
The notion of functor category that I am using here, deriving from Braine’s work on pivots, is different from that of Abney (1987a), Fukui and Speas (1986), and different as well from the notion of functor in categorial grammar. I have taken the following categories as functors, in the child grammar and presumably in the adult grammar as well: verbs in eventive structures, prepositions in constructions like “boot off”, and the (deictic) subject in sentences like “that ball”. This is clearly not a single lexical class, nor is it all closed class elements. It is closer to the notion of governing head, but here as well differences appear: that in “that ball” would not normally be taken as a governing head. Let us define a functor or pivot as follows: (93)
G is a functor or pivot iff there is a lexical representation containing G and an open variable position.
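Definition (93) can be restated as a check over stored lexical representations (my own sketch; the frame inventory is illustrative, drawn from the text's examples):

```python
# Each stored frame: (set of fixed elements, number of open variable positions)
FRAMES = [
    ({"want"}, 1),   # want __   (eventive verb as pivot)
    ({"off"}, 1),    # __ off    (particle as pivot)
    ({"that"}, 1),   # that __   (deictic subject as pivot)
    ({"here"}, 1),   # here __
]

def is_functor(g, frames=FRAMES):
    """(93): G is a functor (pivot) iff some lexical representation
    contains G together with an open variable position."""
    return any(g in fixed and open_slots > 0 for fixed, open_slots in frames)

print(is_functor("want"))   # True
print(is_functor("the"))    # False: no stored frame "the __", cf. (95)/(97)
```

On this formulation, functorhood is a property of the stored frame inventory, not of lexical class, which is exactly why its extension is an empirical matter.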
As such, the extension of the term functor is an empirical matter, to be determined by the data. In these terms, there is some reason to think that closed class determiners are not functors or pivots in the sense of (93). While the child does exhibit replacement sequences like those in (94) (with the verbal head taking the nominal complement), he/she does not exhibit structures like those in (95).

(94) see ball
     see man
     see big ball

(95) the ball      (not present in output data)
     the man
     the table
The presence of the former type of sequences may be viewed as a result of the child inserting a nominal into the open position in the lexical frame in (96).
(96) [V [V see] [N ___ ]]

The fact that the latter does not occur may be taken to suggest that there is no comparable stored representation such as (97) in the grammar.
(97) [Det [Det the] [N ___ ]]  or  [N [Det the] [N ___ ]]
Note that a pragmatic explanation, namely that such lists are unnecessary for the child in the cases in (97), would be incorrect. The child needs to identify the set of cardinal nouns in his language, and the formation in (97) would be a way to do it. Curiously, the phenomenon of lexical selection, and its direction, may have some light shed on it by second language acquisition, particularly by the format of grammatical drills. A common approach is to retain a single head, and vary the complement. (98)
a. (ich) sah den Mann.        (verb repeated, object alternated)
   I saw the man.
b. (ich) sah das Mädchen.
   I saw the girl.
c. (ich) sah die Frau.
   I saw the woman.
etc.
Much less common would be lists in which the object is held constant and the complement-taking head varied. (99)
a. (ich) sah den Mann.        (object repeated, verb alternated)
   I saw the man.
b. (ich) tötete den Mann.
   I killed the man.
c. (ich) küsste den Mann.
   I kissed the man.
etc.
This may be viewed as a means of inducing the head-centered thematic representations suggested above, while replacing the complement with a variable term. Induced representation for (98): (100)
[V [V sah] [N ___ ]]
Note that this answers the question asked in Aspects (Chomsky 1965): should the subcategorization frame be a piece of information specified with the verbal head, or should the subcategorization (e.g. __ NP) be freely generated with the complement, and a context-sensitive rule govern the insertion of the verb? Only the former device would be appropriate, given the data here. (The Projection Principle derives the result from first principles.) Another type of grammatical drill is one in which the subject is kept constant, and the copular predicate varied:

(101) a. Der Tisch ist grau.       (subject constant, copular predicate varied)
         The table is gray.
      b. Der Tisch ist blau.
         The table is blue.
      c. Der Tisch ist grün.
         The table is green.
Much less common is one in which the subject of an intransitive is kept constant and the verb varied:

(102) a. Der Mann schläft.
         The man sleeps.
      b. Der Mann tötet.
         The man kills.
      etc.
This suggests again that there is a difference in thematic headedness, with the simple copular predication structure being "about" the subject and forming a nominal-variable structure like that in (103), while the eventive intransitive does not form a subject-headed structure (104).

(103) [N [N Der Tisch] [A ___ ]]

(104) [V [N Der Mann] [V ___ ]]
Finally, we note that lists in which the determiner is held constant and the nominal varied are not common at all.

(105) der Mann
      der ___
      etc.

This suggests, again, that these do not form the sort of headed lexical structures (with the determiner as head) noted above. The data from second language drill and first language acquisition (first stages) are therefore much in parity. While it is true that second language learning grammars have the disadvantage of attempting to teach language by direct tuition, the particular area of language drills is not one for which a (possibly misleading) prescriptive tradition exists. Rather, linguistic intuitions are tapped. It is therefore striking that the variable positions in such drills correspond to what may be viewed as the open position in a functor-open analysis (or governed-governee in a broad sense). I have suggested, then, the following: there is a set of lexical-centered frames which are applied by children to the input data; the head or functor element corresponds to a fixed lexical item. These are in addition part of the adult grammatical competence. Let us return now to the question of how closed class elements are analyzed in the initial parse.
2.3.6  From the Lexical to the Phrasal Syntax
The result of applying the verbal lexical entry to the full phrase structure string is an instance of parsed telegraphic speech.

(106) [V [N bureaucrat] [V [V saw] [N monster]]]
(see, in this case) is projected over the entire structure, parsing the open class part of the structure. The problem with the closed class elements is that they must be analyzed in the phrase marker antecedent to their identification as markers of categories of given types (this would follow if telegraphic speech really is a type of speech, and shows the primary identification of the open class heads). I will assume at this point a uniform X0 for all the closed class elements. That is, they must be attached first on simply structural grounds. Let us attach determiners and closed class elements into the structure as follows.

(107) a. Segment the independent closed class elements.
      b. Identify the Principle Branching Direction of the language (Lust and Mangione 1984; Lust 1986).
      c. Attach each closed class element in accordance with the Principle Branching Direction (and the already established segmentation).
Proceeding from right-to-left, the child would, apart from the strictures in (107), be allowed either of the two attachments for the, actually associated with the object (assuming binary branching as a first approximation):

(108) Two possible attachments:
      a. Right attachment (the groups with the following noun):
         [V [N bureaucrat] [V [V saw] [the [N monster]]]]
      b. Left attachment (the groups with the preceding verb):
         [V [N bureaucrat] [V [V [V saw] the] [N monster]]]

Given (107), only the structure in (108a) would be the appropriate one. Three questions: How is the Principle Branching Direction determined by the child? Why is the parsing done right-to-left? Why are the elements only added in a binary-branching fashion?
Three answers: 1) The Principle Branching Direction is either given to the child on phonological grounds, or is perhaps identified with the branching direction given by the transitive verb (with the internal/external division). 2) Parsing of the closed class elements is done in the direction opposite to the Principle Branching Direction (so, right-to-left in English). 3) While it need not be the case that all structures in the language are binary branching, I will assume that there is necessary binary branching in one of the following two situations: a) where the selecting category is a functional head in the sense of Abney (1987a, b); b) possibly, in cases in which the language is uniformly right-headed (as in the case of Japanese, Hoji 1983, 1985).

Continuing with the parse, the left-peripheral element is arrived at. In line with (107c), this would give the following (erroneous) parse (if the were attached low, there would be a left branch).

(109) [V The [V [N bureaucrat] [V [V saw] [the [N monster]]]]]

Let us exclude this in principle, in the following way:

(110) a. α is extra-syntactic iff it is an unincorporated element on the periphery of the domain, on the border opposite from that with which the parsing has begun.
      b. Extra-syntactic elements need not be parsed in the initial parsing.
The result of (107) together with (110) would therefore be the following, with the initial the extrasyntactic.

(111) [V [N bureaucrat] [V [V saw] [the [N monster]]]]
      (The) bureaucrat saw the monster
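The right-to-left attachment procedure in (107), together with the extrasyntacticity clause in (110), can be sketched procedurally. The following is a minimal illustration and not part of the text's proposal; the list encoding of constituents, and the treatment of every closed class item as attaching to the material on its right, are simplifying assumptions of mine:

```python
def attach_closed_class(words, closed_class):
    """Sketch of (107)/(110): walk the string right-to-left (opposite the
    Principle Branching Direction of English), attaching each closed class
    item to the constituent on its right; a closed class item stranded on
    the left periphery is left extrasyntactic."""
    parsed = []
    for i in range(len(words) - 1, -1, -1):
        w = words[i]
        if w in closed_class:
            if i == 0 or not parsed:
                # (110): unincorporated element on the opposite border
                parsed.insert(0, ("extrasyntactic", w))
            else:
                # (107c): right attachment, binary branching
                sister = parsed.pop(0)
                parsed.insert(0, (w, sister))
        else:
            parsed.insert(0, w)
    return parsed

print(attach_closed_class("the bureaucrat saw the monster".split(), {"the"}))
```

On this input the object the groups with monster, as in (108a), while the initial the is returned unattached, matching the parenthesized (The) of (111).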
The parentheses indicate extra-syntacticity. The tree in (111) must now be labelled. I have suggested earlier that the closed class elements, while segmented, have not been labelled for distinguishing form class. Let us therefore label them all X0 or XCC (X closed class). In addition, there is the question of what the constituent is that is composed of an unspecified X0 and an N. Let us adopt the following labelling conventions or principles:

(112) Labelling:
      a. If a major category (N, V, A) is grouped with a minor element, the resultant is part of the phrasal syntax.
      b. The major category projects over an unspecified minor category, or
      c. the adjoined category never projects up.
      d. No lexical category may dominate a phrasal category.

(112a) is the labelling principle corresponding to the suggestion above that it is precisely when the closed class determiner elements are added that the phrasal syntax is entered into: this part of the parsing procedure corresponds, then, to the rule of Project-α. (112d) is an unexceptional assumption of much current work (though problems arise potentially for the analysis of idioms). (112b) is more tentatively assumed than the others; see later discussion. None of these are intended to be special “parsing principles”; all are simply part of the grammar. The addition of these elements to the phrase marker stretches it, rather than building it up. The attachment operations are roughly equivalent to the sort of transformations envisioned in Joshi, Levy, and Takahashi (1975), Kroch and Joshi (1985), and Joshi (1985). The resultant of applying (112) is the following tree:
(113) [VP [N bureaucrat] [VP [V saw] [NP [X0 the] [N monster]]]]
      the bureaucrat saw the monster
The second the is labelled X0 and attached to a constituent with the following noun. By (112a) this forms a phrasal category; by (112b) or (c) this phrasal category is an NP. Since a lexical category may not dominate a phrasal category ((112d)), the type of the category dominating the verb and the following elements
is not a V, but a VP. By (112c), though not (112b), the entire category would be labelled VP. Finally, the initial the is still not incorporated. This initial representation in fact makes a prediction: that the child may initially know that there is a slot before the head noun, and in other places, but have this slot unspecified as to content and category. Interestingly, in production data, it often appears that the overt competence of closed class determiner items, and other sorts of items, is preceded by a stage in which a simple schwa appears in the position in which the determiner would be due. This seems to be the case (from Bloom 1970).

(114) a. This ə slide.
      b. That ə baby.
      c. ə take ə nap.
      d. ə put ə baby.
      e. Here comes ə ’chine.
More strikingly, however, the schwa appears in two other places: in a preverbal (or perhaps copular) position, and in a position at the beginning of the sentence (from Bloom 1970).

(115) a. This ə tiger books. (could be determiner or copula)
      b. This ə slide. (could be determiner or copula)
      c. ə pull. (sentence initial or subject)
      d. ə pull hat. (sentence initial or subject)
      e. ə write. (sentence initial or subject)
      f. ə sit. (sentence initial or subject)
      g. ə fit. (sentence initial or subject)
Bloom makes the following comment (pg. 74–75):

    The transformation for placing the /ə/ in preverbal position accounted for the 31 occurrences of /ə/ before verbs — for example, “ə try”, “ə see ball” — and for the fact that the schwa was not affected by the operation of the reduction transformation. The preverbal occurrence of /ə/ was observed in the texts of all three children and deserves consideration — particularly since such occurrence precluded the specification of /ə/ in the texts as an article or determiner exclusively. The problem of ordering the /ə/ placement transformation in relation to the reduction rules [the set of deletion rules that Bloom assumes deriving the child’s phrase marker from a relatively more expansive adult-like one, D. L.] has not been solved satisfactorily. The order in which they appear in the grammar was chosen because /ə/ was not affected by reduction; that is, it was not deleted with increased complexity within the sentence, and its occurrence did not appear to operate to reduce the string in the way that a preverbal noun (as sentence-subject) did. This order is attractive because /ə/ can be viewed as ‘standing in’ for the deleted element and appears to justify the /ə/ as a grammatical place holder.
The reduction transformation that Bloom posits is of course not carried over into the present theory. Interestingly, the set of places in which the schwa occurs is precisely where one would expect the X0 categories to appear in the current theory.

Three questions again arise: How and when is the sentence initial the incorporated into the representation? What about the inflection on the verb (children’s initial verbs are uninflected)? And, crucially, what constitutes the domain over which extra-syntacticity is defined? The second of these questions also has repercussions for what the category of the entire resultant is (VP or IP).

For the first of these questions, let us assume that the answer is the same that is given in phonology for extrametricality: If α is extrasyntactic, group it locally with the most nearly adjacent category. This would give rise to the initial the being grouped, correctly, in the initial NP.

The question of Infl is more complex, and partly hangs on the domain of extrasyntacticity. Let us assume, as I believe we must, that the child initially is able to analyze the verb as composed of the verb and some additional element. I will assume that this additional element is simply analyzed as an X0, i.e., an additional closed class element of uncertain category, and is simply factored out of the produced representation. That is, the child’s telegraphic speech is of bare infinitival stems (or the citation form), not of correct (or incorrect!) inflected verbs. If we assume that Infl is also adjoined to the VP in accordance with the principle branching direction of the language, we arrive at a representation which is, except for categoricity, identical to the representation of the phrase marker given (for example) in Chomsky (1986). This corresponds, presumably, to the DS representation.
On the other hand, if we assume that the domain of extrasyntacticity is the VP, as well as the S, and that Infl is therefore regarded as extrasyntactic in its domain, its attachment site would be low, with the verb. This would correspond to the representation after affix-hopping, in the standard theory (though see, again, Chomsky 1986).
(116) a. Placement of Infl, if S is the only extrasyntactic node:

         [S [NP [X0 The] [N bureaucrat]] [X0 Infl] [VP [V see] [NP [X0 the] [N monster]]]]

      b. Placement of Infl, if S and VP are extrasyntactic nodes:

         [S [NP [X0 The] [N bureaucrat]] [VP [V [X0 Infl] [V see]] [NP [X0 the] [N monster]]]]
These representations correspond not only to two different representations of the phrase marker, but to what might be assumed to be two different levels of the phrase marker: at DS, Infl takes scope over the entire VP, while at PF, it is attached to the verb. This process itself may be simply a subcase of a more general process in the grammar where a closed class element appears to take wider scope semantically than it does at PF: in morphology, where constructions like transformational grammarian have the -ian take wide scope semantically, but comparatively narrow scope syntactically; and perhaps relative clause constructions, where the determiner must appear outside the N′ or N″ for a perspicacious construal of scope relations (the Det-Nom analysis), while for other purposes it appears that the NP-S′ analysis may be preferable (where the determiner has been lowered).

With respect to Infl, in earlier work (Lebeaux 1987) I took the position that the initial Infl was adjoined to V, hence did not govern the subject position, and hence the latter could stay unrealized in early speech. This was in contradistinction
to Hyams’ analysis of early “pro-drop” (Hyams 1985, 1986, 1987). The present representation suggests other alternatives for the early representation of subjects. One alternative, which I will simply mention without taking a position on, is that in the early grammar the subject position is, initially, purely a thematic position (a pure instantiation of theta relations). It might be assumed that such positions, if external, need not be realized on purely thematic grounds. This would account for the non-obligatoriness of “deep structure subjects” in passive constructions; the actual obligatoriness of subjects in active constructions would then have to be due to some other principle. More strikingly, extra-syntacticity may give an account for the placement of early subjects. I suggested, again in Lebeaux (1987), that early pronominal subjects may not be in subject position, but rather adjoined to the verb, as verbal clitics (see also Vainikka 1985). This would follow if such elements were extrasyntactic, and adjoined low, late.

(117) [VP [V [N my] [V did]] [NP [N it]]]
      my did it
Other elements, in particular agreement elements in Italian, noted by Hyams (1985, 1986, 1987) in her discussion of the early phrase marker, would presumably have the same analysis, and hence not be counterexamples to the thesis of this chapter (see also Chapter 4).

In the representation above, I have begun with a thematic representation, i.e., a projection of the pure thematic argument structure of the head, and then incorporated the closed class determiner elements by “stretching” the representation to incorporate them, rather than building strictly bottom-up. This is what the operation corresponds to in the analytic mode. In the derivational mode, this corresponds to the basic rule of Project-α: these are simply aspects of the same operation. This is the means by which the phrasal syntax is entered into, from the lexical syntax. Since the closed class elements are not initially analyzed as to category, and their attachment direction is also not given, these choices must initially be made blindly, in accordance with the Principle Branching Direction of the language (or perhaps the governing direction of eventive verbs), and the
projection of categorial information. Crucial as well was the notion of extrasyntacticity at the periphery of a domain: it was this, and only this, which allowed the appropriate attachment of the initial determiner. One might wonder how far these initial assumptions would go in building the phrase marker by the child, and how they are revised. For example, a sentence like that in (118a) would have by the child an initial (telegraphic) analysis like that in (118b).

(118) a. [N [N mommy] [Nloc roof]]
         mommy roof

      b. [NP [N Mommy] [NP [X0 is] [NP [X0 on] [NP [X0 the] [Nloc roof]]]]]
         Mommy is on the roof
The representation in (118b), while correct so far as constituency is concerned, is obviously incorrect so far as categoricity: the PP is misanalyzed, and worse, given the uniform analysis of the closed class elements as all being of the form X0, one would expect, wrongly, that they would freely permute in the child grammar. One might of course immediately reply that while the initial grouping is given by the X0 term, the more specific categoricity is determined later, far prior to the point at which the closed class determiner elements play a role in the grammar, and thus that the representation in (118b) should not play any formal role in the analysis, being merely a stepping off stage: in the child grammar, as well as in the adult. Nonetheless, it has shed light before — and will continue to do so in the rest of this work — to suppose that the child’s representations have a real status both in themselves, and in their relations to the adult grammar. The question would then be what representations of the form in (118b) would mean.
2.3.7 Licensing of Determiners
One advantageous aspect of the analysis in (118) is that it supposes the closed class elements to be of a uniform category. While this is false with respect to distributional characteristics in the syntax, it may well be close to true in the phonology: i.e., at PF. To the extent to which the acquisition system in its analytic mode is operating on PF structures (see Chapter 5), this assumption is correct.

A deeper question, with respect to the format of the adult grammar, is the following. In the traditional phrase structure rule approach (Chomsky 1965), categories were licensed in relation to their mother node, via the phrase structure rules.

(119) VP → V NP

The rewrite rule in (119) may be taken to mean that the node VP licenses the dominated categories, V and NP, in that order. In Chomsky (1981) this view is revised in a number of ways, most particularly by assuming that the basic complement is a projection of the lexical entry (of heads), and the element is licensed in that way. This leaves open a number of licensing possibilities for those elements which are not projections of a head: for example, adjuncts and determiners (or specifiers). The situation with adjuncts is discussed in the following chapter. Insofar as the current literature is concerned (see especially the suggestive and important work of Fukui and Speas 1986; Abney 1987a), a particular position has been taken with respect to the specifier-of relation. Namely, in the case of nominals at least, these have been taken to be a projection of the determiner, with the determiner taking the (old) Noun Phrase as a complement.

(120) DetP → Det NP

This answers, though obliquely, a particular question that might be raised — namely, what is the relation between the specifier and the head (or head plus complement) — and it answers it by assuming that that relation is modelled in the grammar in essentially the same way that a simple head (e.g. a verbal head) takes its complement.
In essence, both are types of complementation, the head projecting up. I would like to suggest here that the specifier-of relation should be modelled in a way which is different than complementation, as has already been suggested by the treatment of the closed class categories in acquisition above. This involves (at least) two specifications: i) what is the projecting head?, and ii) how is the
licensing done? The projecting head is the major category. The licensing is done in the following way:

(121) a. Projection: Let M be a major category; project up to any of the bar levels associated with it (for concreteness, I will assume three: thus, N, N′, N″, N″′).
      b. Attachment: Attach X0 to the appropriate bar level.
It is the attachment rule here which is distinguished from the type of licensing which occurs for direct complements. Categories traditionally analyzed as determiners would therefore have different attachment sites associated with them. If Jackendoff (1977) is correct, and definite and indefinite determiners differ in their attachment site in English, the former being N″′, and the latter being N″, then their lexical specifications would be the following.

(122) a. the: X0, (X0, N″′)
      b. a: X0, (X0, N″)
The notation on the left is the category; the notation on the right is simply the structural definition of the element, in terms of the mother-daughter relation (cf. the structural definition of grammatical relations in Aspects).

It was suggested in earlier work of mine (Lebeaux 1987, see also Lebeaux 1985) that NP elements may be licensed in more than one way: via their relation to sister elements, which is associated with what is called structural case in Chomsky (1981), and via their relation to their mother nodes. An example of the latter would be the licensing of the genitive subject of nominals in English: that is licensed via its relation (NP, NP). In that article, I argued that this differential type of licensing was associated with a different type of case assignment, namely what I called phrase-structural case, and this is associated with a different type of theta assignment as well (where the NP genitive element is not given its theta role by the N′ following it, but rather a variable relation, relation R, is supplied at the full NP node, relating both to each other; see Lebeaux 1985).

By adopting the format in (122) for closed class determiner elements, and other closed class elements, I am adopting the position that these elements are licensed not via their relation to the N′ with which they are associated, but rather via their relation to the mother element (the N″ or N″′ or whatever). The category which they close off (the variable of) is the one which they are licensed by. They neither license the following N′ nor are licensed by it: that is, a complement-of type relation with respect to licensing is not adopted in either direction. Rather, a different type of licensing relation, that of the element to the mother node, is adopted.
By supposing that the NP-genitive and the determiner are both licensed in a common way, i.e. via the structural condition on insertion with respect to the mother node, an intuition behind the position of Chomsky (1970) is perhaps explained. In that work, Chomsky supposed that the genitive element was generated directly under the specifier, in the same position as the closed class determiner. This is not obviously correct. Nonetheless, if the above is correct, these elements are licensed in a common way.

2.3.8 Submaximal Projections
As the foregoing suggests, categories may be built up not simply to the maximal projection, but to what might be called submaximal projections. By a submaximal projection, I mean a projection which is maximal within its phrase structure context (i.e., its dominating category is of a different type), but which is not the maximal projection allowed for that category in the grammar. For nominals, for example, assuming provisionally the three-bar system of Jackendoff (1977) (nothing hangs on the number of bar levels here), a nominal might be built up to simply N, or to N′, or to N″, or to N″′: N″′ would be maximal for the grammar, the others would be submaximal (though perhaps maximal for that particular instantiation). So far as I know, the syntactic proposal was first made in an unpublished paper by myself (Lebeaux 1982); it had earlier been suggested in acquisition work by Pinker and Lebeaux (1982) and Pinker (1984), and has recently been independently formulated, in a rather different way, by Abney, Fukui, and Speas. Though some of the terminology and many of the concerns will be similar to those of Abney, Fukui, and Speas, there are differences both of principle and detail; I will build here on my own work (Lebeaux 1982, 1987).

Some consequences of assuming submaximal projections are the following:

I. Subcategorization is not in terms of a particular phrasal category (e.g. N″′), but rather in terms of a category of a given type, e.g. a nominal category Nx, where x may vary over bar levels.

II. Crucially, the definite/indefinite contrast may be gotten without reference to features, or to percolation from the nonhead node (which would be necessary otherwise if one assumes that N is the head of the full category NP). This is done as follows: the nominal is built up as far as necessary. Jackendoff (1977) argues quite forcefully that the definite determiner is attached under N″′ in English, while the indefinite is attached under N″. The two representations are therefore the following.
(123) [N″ [Det a] [N′ [N picture] [PP of Mary]]]

(124) [N″′ [Det the] [N″ [N′ [N picture] [PP of Mary]]]]
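Since the contrast is carried by the projected node itself, a process sensitive to definiteness need only inspect the bar level of the maximal projection. The following toy encoding is mine, purely for illustration (the numeric bar levels stand in for N″ and N″′, following the lexical specifications in (122)):

```python
# Attachment sites as in (122): the definite determiner attaches under
# N''' (level 3), the indefinite under N'' (level 2).
ATTACHMENT_LEVEL = {"the": 3, "a": 2}

def max_projection(determiner):
    # The nominal is built up only as far as its determiner requires.
    return ATTACHMENT_LEVEL[determiner]

def is_definite(determiner):
    # Definiteness is read off the maximal projected node (N''' vs N''),
    # with no +/-definite feature anywhere in the representation.
    return max_projection(determiner) == 3

print(is_definite("the"), is_definite("a"))
```

On this encoding, there-insertion or the binding contrasts discussed below would consult `max_projection` rather than a feature.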
Thus, the semantics need not refer to a feature such as +/−definite: this is encoded in the representation. Other types of processes sensitive to the definite/indefinite contrast, e.g., there-insertion in English and dative shift in German, would presumably refer to this same structural difference.

III. Other types of binding and bounding processes may refer to the maximal projection, rather than to the presence or absence of the definite determiner. Here, the relevant contrasts are those in (125) and (126):

(125) a. Who would you like a picture of t?
      b. Who do you like pictures of t?
      c. *Who do you like the picture of t?

(126) a. Every man_i thinks that a picture of him_i would be nice.
      b. Every man_i thinks that pictures of him_i would be nice.
      c. ?*Every man_i thought that the picture of him_i was nice.

Assuming the same general theory as that above, these contrasts would be specified not by the feature +/−definite, but by the maximal projected node. Of course, the means of capturing the generalization here is simply as strong as the original generalization, and there is some reason to think that it is “namehood” or “specificity” which matters in determining these binding relations
(Fiengo and Higginbotham 1981), not the presence or absence of the determiner itself. If so, there is no purely structural correlate of the opacity.

(127) a. *Which director do you like the picture of t?
      b. What director do you like the first picture of t?

IV. Assuming that there is the possibility of submaximal projections, we find a curious phenomenon: for each major category, there are cases which appear to be “degenerate” with respect to projection (Lebeaux 1982). That is, they simply project up to the single bar-level, no further. These are the following.

(128) Basic category    Degenerate instance
      verb              auxiliary verb
      noun              pronoun
      preposition       particle
      adjective         (prenominal adjective in English)
The first three of these are clear-cut, the fourth is more speculative. The sense in which a pronoun would be a degenerate case of a noun is that it does not allow complements, nor specifiers.

(129) a. *the him
      b. *every him
      c. every one
      d. the friend of Mary, and the one/*him of Sue

This may be accounted for if we assume that the pronoun is inherently degenerate with respect to the projections that these would be registered under. Similarly, while the issues and data are complex — see the data in Radford (1981) for interesting puzzles — an auxiliary verb does not appear to take complements, specifiers, and adjuncts in the same way that verbs do. Indeed, to account for some of the mixed set of data in Radford, it may make sense to allow for both a Syntactic Structures type analysis and one along the lines of Ross (1968) or Akmajian, Steele, and Wasow (1979), where the auxiliary takes the different structures at different levels of representation. Aside from facts having to do with complement-taking, there are those having to do with cliticization, and the rhythm rule (Selkirk 1984), which in general support the hypothesis of bare, nonprojecting lexical categories. In general, the rhythm rule does not apply across phrasal categories. However, it may apply in prenominal adjectival phrases.

(130) a. That solution is piecemeál.
      b. A piécemeal solution.
(131) a. The tooth is impácted.
      b. An ímpacted tooth.
In (130b) the main adjectival stress has retracted onto the first syllable; in (131b) it has done likewise, optionally. Assuming that the rhythm rule does not apply across phrasal boundaries, this means that the structure must be one in which the adjective is non-phrasally projecting, i.e., an A, not an A″ or A″′:

(132) [NP [Det an] [A impacted] [N tooth]]
This would also account for the impossibility of phrasal APs in prenominal position in English: it suggests that it is the degeneracy of this category in this position which is the point to be explained, rather than something along the lines of the Head Final Filter of Williams.

The other area in which phonological evidence is relevant is with respect to cliticization. In English, a pronoun, but not a full noun, nor the pro-nominal one, may lose its word beat and retract onto the governing verb.

(133) a. I saw ’m.
      b. I saw one.
      c. *I saw ’n.

Similarly, the auxiliary may do so, and cliticize onto the subject, or onto each other.

(134) a. I c’n go.
      b. He may’ve left.
The generalization in these cases seems to be the following.

(135) If i) α governs β, and ii) β is an X0 category, then β may cliticize onto α.

But this generalization requires the presence of submaximal nodes: i.e., degenerate projections.
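The generalization in (135) is a two-condition predicate, and can be sketched as such. This sketch is illustrative only; the inventory of degenerate (bare X0) categories is taken from the clear-cut cases in (128):

```python
# Degenerate, nonprojecting instances of the major categories, per (128).
BARE_X0 = {"pronoun", "auxiliary", "particle"}

def may_cliticize(beta, alpha_governs_beta):
    """(135): beta may cliticize onto alpha iff (i) alpha governs beta and
    (ii) beta is a bare X0 (submaximal, nonprojecting) category."""
    return alpha_governs_beta and beta in BARE_X0

# "I saw 'm": the governed pronoun cliticizes; a full noun does not.
print(may_cliticize("pronoun", True), may_cliticize("full noun", True))
```

Both conditions are necessary: a full noun governed by the verb fails (ii), and an ungoverned pronoun fails (i), matching the contrasts in (133)–(134).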
V. Finally, we may note that the device of submaximal projections gives the possibility of differentiating null categories, without assuming a primitive feature set of +/−anaphor, +/−pronominal. Namely, the categories may be differentiated by the node to which they project up. These would be as follows.

(136) Null category    Category type
      PRO, pro         N
      wh-trace         N′
      NP-trace         N″
The particular assignments are given for the following reasons: PRO and pro because of their similarity to simple pronominals in interpretation (though not, in the case of PRO, in their dependency domains), NP-trace because it may also simply be regarded semantically as a surrogate for the NP which binds it: this is not true for either wh-trace or PRO (or pro). The reason that wh-trace is identified with N′ will be apparent in the next chapter. This identification of null categories with nominal projection levels here is intended as suggestive: a full proposal along these lines would go far beyond the scope of this thesis.
Chapter 3

Adjoin-α and Relative Clauses
3.1 Introduction

In the previous chapter I dealt with some aspects of argument structure, phrase structure, and the realization rule, Project-α, which relates lexical representations to the syntactic system strictly speaking. Aside from particular points that were made, a few central conclusions were reached: i) that Project-α is a relation relating the lexical entry to the phrase marker in a very faithful way — in particular, the internal/external distinction used in the syntax, and even directionality information as to theta marking (Travis 1984; Koopman 1984), is found in the lexical entry; ii) that telegraphic speech was a representation of pure theta relations; and iii) that there was a close relation between acquisition stages and representational levels, a relation of congruence (the General Congruence Principle).

In this chapter, I take up the third point more carefully, modifying it as necessary. What is the nature of the relation between acquisitional stages and representational levels? Are there true levels, and, if not, are there at least ordered sequences of operations (where these sequences do not, however, pick out levels)? If there are levels, are they ordered: i.e. are DS, SS, PF, LF, and perhaps others, to be construed as a set, or is there a (partial) ordering relation between them? Finally, in what way may levels be ordered: on what grounds?

Note that to the extent to which there are real acquisitional stages, and these are in a correspondence relation with representational levels, a strong syntactic result would be possible: that the grammar is essentially leveled, and the leveling follows from, is “projected from”, the course of development. A geological metaphor is apt: the sedimentation pattern over a period of time is essentially leveled. The sedimentation layers are distinct and form strata; moreover they have distinct “vocabularies”.
The course of the geological history is projected into the final structure: a cross-section reveals the geological type of the resultant.
In the next three chapters, I would like to discuss three areas of grammatical research. These are intended to shed light on the question of representational mode. One of these areas is well mapped out in current syntactic work, though the appropriate analysis is still not clear: the argument/adjunct distinction. This I will discuss in this chapter. Here I will be looking at, in particular, the representation of relative clauses — the paramount case of an element in an adjunctual relation to a head. The general conclusion will be that the adjunct relation should be modeled in a derivational way: namely, by supposing that adjuncts are “added into” the representation in the course of a derivation. Both syntactic and acquisition evidence will be presented to support this view.

A second question has been asked more extensively in the acquisition literature than in syntactic research per se, though with notable exceptions (Chomsky 1980, on Case assignment; Marantz 1982, on early theta assignment). It divides into two parts. First, is there a “semantic stage” in acquisition, such as suggested by much early work in psycholinguistics (Bowerman 1973, 1974; Brown 1973)? Such a stage would use purely semantic, i.e. thematic, descriptors in its vocabulary, without reference to, e.g., grammatical relations or abstract Case. The same question may be asked with respect to Case assignment: are there different types of Case assignment, are these ordered in the derivation, and may they be put into some correspondence with acquisitional stages? N.
Chomsky, in "On Binding", answered the first and second questions in the affirmative for case assignment (though this was before work on abstract Case), and in earlier work (Lebeaux 1987), I suggested that Hyams' data on the dropping of early subjects might fall under the rubric not of pro-drop, but of the lack of analysis of the verb + Infl combination, together with two types of Case assignment (structural and phrase-structural) having a precedence relation in the acquisition sequence and operating in different manners. My analysis, then, answered the third question in a positive manner, with respect to Case. I will not discuss the possibility of precedence within types of Case assignment in this book, but I will discuss the thematic issue, in Chapter 4.

A third question has to do with the nature of the open class/closed class distinction, and how this might be modelled in a grammatical theory. This question, again, has received more attention in the psycholinguistic literature (see Shattuck-Hufnagel 1974; Garrett 1975; Bradley 1979; Clark and Clark 1977, and a panoply of other references) than in grammatical theory proper. I propose, in Chapter 4, the beginnings of a way to accommodate the two literatures. The current chapter, however, will be concerned with the argument/adjunct distinction.
ADJOIN-α AND RELATIVE CLAUSES
3.2 Some general considerations

As has often been noted (Koster 1978; Chomsky 1981), given a particular string, say that in (1), there are two ways of modelling that string, and the dependencies within it, within a GB-type theory.

(1) Who_j did_i John e_i see e_j?
On the one hand, these informational dependencies may be viewed as an aspect of a single level of representation. Thus in Chomsky (1981) it is suggested that the operation Move-α may be viewed as a set of characteristics of the S-structure string, involving boundedness, single assignment of theta role, and so on. On the other hand, the string in (1) may be viewed not as a representation at a single level, but as being the output of a particular derivation, that in (2).

(2) DS: [C′′ [C′ C [IP [NP John] [I′ [I Infl] [VP [V see] [NP who]]]]]]
    SS: [C′′ Who [C′ C [IP [NP John] [I′ I [VP [V see] [NP e]]]]]]
    (DS is mapped to SS by Move-α)
Under this view, the sentence in (1) is just a sectioning of a larger figure. The full figure is multi-leveled. The representation in (1) retains a good deal — and perhaps all — of the information necessary to derive the full representation in (2) back again. It is precisely this character, forced in part by the Projection Principle, that makes the distinction between "representational" and "derivational" modes noted in Chomsky (1981) so difficult. Indeed, in the old-style Aspects derivations, where no traces were left, such a question could not arise, since the surface (corresponding to the present S-structure) patently did not contain all the information present in the DS form: it did not define, for example, the position from which the movement had taken place. However, to the extent to which constancy principles hold — i.e. principles
like the Projection Principle which force information to be present at all levels of the same derivation — the problem of the competition in analysis between the representational and derivational modes becomes more vexed. It is therefore natural, and necessary, to see what sort of information might in principle help decide between them. Basically, the information may be of two types: either i) there is information present in the representational mode which is not present in the derivational mode, or can only be present under conditions of some unnaturalness, or ii) there is information present in the derivational mode which is not present in the representational mode — or, again, may only be represented under conditions of some unnaturalness. It is possible to conceive of more complex possibilities as well. For example, it may be that the grammar is stored in both modes, and is used for particular purposes in either one. I do not wish to examine this third, more complex, possibility here.
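The point that a trace-annotated S-structure retains the information needed to recover the underlying structure can be made concrete with a toy model. The sketch below is purely illustrative: the tuple encoding of phrase markers, the `undo_move` helper, and the simplification of keeping "did" in its IP-internal position throughout (only the wh-dependency of (1)/(2) is modeled) are my own assumptions, not part of the theory.

```python
# Toy phrase markers as nested tuples: (label, child, ...); leaves are strings.
# S-structure of "Who did John see?" (cf. (2)): the wh-phrase sits in
# Spec-CP and a co-indexed trace "e" sits in object position.
SS = ("C''",
      ("NP", "who"),
      ("C'",
       ("C",),
       ("IP",
        ("NP", "John"),
        ("I'",
         ("I", "did"),
         ("VP", ("V", "see"), ("NP", "e"))))))

def undo_move(tree, filler, trace):
    """Invert Move-alpha: delete the fronted filler phrase and restore
    the filler in the position of its trace."""
    if isinstance(tree, str):
        return filler if tree == trace else tree
    label, *kids = tree
    new_kids = [undo_move(k, filler, trace)
                for k in kids if k != ("NP", filler)]
    return (label, *new_kids)

def leaves(tree):
    """Terminal string of a phrase marker, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [w for k in tree[1:] for w in leaves(k)]

DS = undo_move(SS, "who", "e")
```

Because the trace records the launching site, `leaves(DS)` comes out in the pre-movement order ("John did see who"); in an Aspects-style surface structure without traces, no such reconstruction would be possible, which is exactly the contrast drawn in the text.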
3.3 The Argument/Adjunct Distinction, Derivationally Considered

In the next few sections, I would like to argue for a derivational approach, both from the point of view of the adult system, and from the point of view of acquisition. The issue here is the formation of relative clauses and the modelling of the argument/adjunct distinction in a derivational approach.

3.3.1 RCs and the Argument/Adjunct Distinction
Let us consider the following sentences:

(3) a. The man near Fred joined us.
    b. The picture of Fred amazed us.
    c. We enjoyed the stories that Rick told.
    d. We disbelieved the claim that Bill saw a ghost.
    e. John left because he wanted to.
The following examples give the same sentences with the adjuncts in italics (marked here with asterisks).

(4) a. The man *near Fred* joined us.
    b. The picture of Fred amazed us.
    c. We enjoyed the stories *that Rick told*.
    d. We disbelieved the claim that Bill saw a ghost.
    e. John left *because he wanted to*.
I have differentiated in the sentences above between the modifying phrases near Fred and that Rick told in (4a, c), and the phrases of Fred and that Bill saw a ghost in (4b, d), which have intuitively more the force of direct arguments. See Jackendoff (1977) for structural arguments that the two types of elements should be distinguished. It is sometimes claimed that the latter phrases are adjuncts as well (Stowell 1981; Grimshaw 1986), but it seems clear that, whatever the extension of the term "adjunct", there is some difference between the complements in (4b, d) vs. those in (4a, c). It is likely, therefore, that there is a three-way difference between pure arguments like the obligatory arguments in verbal constructions ("I like John"), the optional arguments in instances like picture-noun constructions and in the complement of denominals like claim ("the picture of John", "the claim that Bill saw a ghost"), and true adjuncts like relative clauses and locative modifiers ("near Fred" and "that Rick told" above). For present purposes, what matters is the distinction between the second type of complement and the third, and it is here that I will locate the argument/adjunct distinction.

There is no unequivocal way to determine the adjunctual status of a given phrase, at least pre-theoretically. One commonly mentioned criterion is optionality, but that will not work for the complements above, since all the nominal complements are optional — yet we still wish to make a distinction between the picture-noun case (as nominal arguments), and the locative phrases or relative clauses (as adjuncts). Nonetheless, if the intuition that linguists have is correct, the property of optionality is somehow involved.
Note that there is still a difference in the two nominal cases: namely, that if the nominal construction is transposed to its verbal correlate in the cases that we would wish to call "argument", then the complement is indeed obligatory, while the locative complement remains not so.

(5) a. the photograph (of Fred)
    b. the photograph (near Fred)

(6) a. We photographed Fred.
       #We photographed. (not same interpretation)
    b. We photographed near Fred.
       We photographed. (same interpretation)
This suggests that the difference between (5a) and (6a) may not reside so much in theta theory as in Case theory (Norbert Hornstein, p.c.). Let us therefore divide the problem in this way. There are two sorts of optionality involved. The first is an optionality across the subcategorization frames of an element. The
nominal head of a construction like photograph, or picture, is optional across subcategorization frames, while the corresponding verbal head is not.

(7) a. photograph (V): __ NP
    b. photograph (N): __ (NP)
It is this sort of optionality which may, ultimately, following Hornstein, be attributed to the theory of Case: for example, that the verbal head photograph assigns case and hence requires an internal argument at all levels, while the nominal head photograph does not. Over against this sort of optionality, let us consider another sort: namely, that of licensing in a derivation. Since the work of Williams (1980), Chomsky (1982), Rothstein (1983), Abney and Cole (1985), and Abney (1987b), as well as traditional grammatical work, it is clear that elements may be licensed in a phrase marker in different ways. In particular, there exist at least two different sorts of licensing: licensing by theta assignment and licensing by predication. Let us divide these into three subcases: the licensing of an object by its head, which is a pure case of theta licensing; the licensing involved in the subject-predicate relation, which perhaps involves both theta licensing and licensing by predication; and the relation of a relative clause to its head, which is again a pure instance of predication licensing (according to Chomsky 1982). These sorts of combinatorial relations may themselves be part of a broader theory of theta satisfaction along the lines sketched by Higginbotham (1985), which treats things like the integration of adjectival elements into a construction; the refinements of Higginbotham's approach are irrelevant for the moment.

(8) a. John hit *Bill*. (licensed by theta theory)
    b. *John* hit Bill. (licensed by theta theory and predication theory)
    c. the man *that John saw* (licensed by predication theory)
Let us now, following the basic approach of Chomsky (1982) and Williams (1980), note that predication licensing — i.e. predication indexing in Williams' sense — need not take place throughout the derivation, but may be associated with a particular level, Predication Structure in Williams' theory. Let us weaken Williams' position that there is a particular level at which predication need apply, and adopt instead the following division, which still maintains an organizational difference between the two licensing conditions:

(9) a. If α is licensed by theta theory, it must be so licensed at all levels of representation.
    b. If α is not licensed by theta theory, it need not be licensed at all levels of representation (but only at some point).
Predication licensing in Chomsky's (1982) broad sense (and possibly in Williams' (1980) sense as well) would fall under (9b), while the licensing of direct internal arguments would fall under (9a). However, (9a) itself is just a natural consequence of the Projection Principle, while (9b) simply reduces to the instances over which the Projection Principle holds no domain, which need no special statement. The strictures in (9) may therefore be reduced to (10), which is already known.

(10) a. The Projection Principle holds.
     b. All categories must be licensed.
In terms of the two types of "optionality" noted above, the optionality of (9) is the optionality in licensing conditions for adjuncts at DS.

(11) Arguments must be licensed at DS; adjuncts are optionally licensed at DS.
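The asymmetry in (11) can be restated as a simple well-formedness check over a derivation. The following sketch is illustrative only: modeling levels as sets of licensed elements, and the function name `well_formed`, are my own assumptions, not part of the theory.

```python
def well_formed(levels, arguments, adjuncts):
    """Check (11) over a derivation (a sequence of levels, DS first):
    arguments must be licensed at every level, per the Projection
    Principle (9a); adjuncts need only be licensed at some level (9b)."""
    args_ok = all(all(a in level for a in arguments) for level in levels)
    adjs_ok = all(any(x in level for level in levels) for x in adjuncts)
    return args_ok and adjs_ok

# "We liked the stories that Rick told": the object is an argument of
# "liked"; the relative clause is an adjunct.
ds = {"we", "liked", "the stories"}                    # adjunct absent at DS
ss = {"we", "liked", "the stories", "that Rick told"}  # adjunct licensed here

ok = well_formed([ds, ss], arguments={"the stories"},
                 adjuncts={"that Rick told"})
bad = well_formed([{"we", "liked"}, ss], {"the stories"},
                  {"that Rick told"})
```

Here `ok` comes out true, since the adjunct may be absent at DS, while `bad` comes out false: an argument unlicensed at DS violates the Projection Principle.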
With respect to the constructions discussed earlier, the picture-noun complements and complements of claim, this means that the complements in such constructions, as arguments, must be assigned a theta role and licensed at DS, when they appear.

(12) the picture [of Mary]
                 (theme; licensed at DS)

(13) the claim [that Rick saw a ghost]
               (theme; licensed at DS)
These complements need not appear; they are optional for the particular head (picture, claim). However, when they appear, they must be licensed and theta marked at DS, by the Projection Principle. This distinguishes them from true adjuncts, which need not be licensed at DS. The optionality in the licensing of adjuncts at DS, but not arguments, is one way of playing out the argument/adjunct distinction which goes beyond a simple representational difference such as is found in Jackendoff (1977), where arguments and adjuncts are attached under different bar-levels. However, there is a more profound way in which the argument/adjunct distinction, and the derivational optionality associated with it may enter into the construction of the grammar. It is to this that I turn in the next section.
3.3.2 Adjunctual Structure and the Structure of the Base
In the sentences in (4) above the adjuncts were italicized, picking them out. Suppose that, rather than considering the adjuncts in isolation, we consider the rest of the structure, filtering out the adjuncts themselves. (The (b) sentences are after "adjunct filtering".)

(14) a. Bill enjoyed the picture of Fred.
     b. Bill enjoyed the picture of Fred.

(15) a. He looked at the picture near Fred.
     b. He looked at the picture.

(16) a. We disbelieved the claim that John saw a ghost.
     b. We disbelieved the claim that John saw a ghost.

(17) a. We liked the stories that Rick told.
     b. We liked the stories.

(18) a. John left because he wanted to.
     b. John left.
Comparing the (a) and (b) structures, what is left is the main proposition, divested of adjuncts. Let us suppose that we apply this adjunct filtering operation conceptually to each string. The output will be a set of structures, in which the "argument-of" relation holds in a pure way within each structure (i.e. the subject-of, object-of, or prepositional-object-of relation is purely instantiated), but the relation of adjunct-of holds between structures. In addition, one substructure is specially picked out as the root.

(19) (15a) after adjunct filtering:
     Argument structure 1: He looked at the picture.
     Argument structure 2: near Fred
     The rooted structure is 1.

(20) (16a) after adjunct filtering:
     Argument structure 1: We disbelieved the claim that John saw a ghost.
     The rooted structure is 1.

(21) (17a) after adjunct filtering:
     Argument structure 1: We liked the stories.
     Argument structure 2: that Rick told
     The rooted structure is 1.
(22) (18a) after adjunct filtering:
     Argument structure 1: John left.
     Argument structure 2: because he wanted to
     The rooted structure is 1.
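The filtering illustrated in (19)-(22), together with an Adjoin-α style operation that puts the skeletons back together, can be made concrete in a toy model. The sketch below is illustrative only: the tuple encoding of trees, the "ADJ" wrapper marking adjunct attachment sites, and the function names are my own assumptions, not part of the theory.

```python
def filter_adjuncts(tree, skeletons=None):
    """Adjunct filtering (cf. (19)-(22)): return the rooted argument
    skeleton with all adjuncts removed, plus the list of filtered-out
    adjunct skeletons."""
    if skeletons is None:
        skeletons = []
    if isinstance(tree, str):
        return tree, skeletons
    label, *kids = tree
    new_kids = []
    for k in kids:
        if isinstance(k, tuple) and k[0] == "ADJ":
            skeletons.append(k[1])          # set the adjunct aside
        else:
            sub, _ = filter_adjuncts(k, skeletons)
            new_kids.append(sub)
    return (label, *new_kids), skeletons

def adjoin_alpha(skeleton, adjunct, target):
    """An Adjoin-alpha sketch: rebuild the skeleton with `adjunct`
    added as a daughter of the (unique) node equal to `target`."""
    if isinstance(skeleton, str):
        return skeleton
    if skeleton == target:
        return (*skeleton, ("ADJ", adjunct))
    label, *kids = skeleton
    return (label, *[adjoin_alpha(k, adjunct, target) for k in kids])

# (17a) "We liked the stories that Rick told"
tree = ("IP", ("NP", "we"),
        ("VP", ("V", "liked"),
         ("NP", ("Det", "the"), ("N", "stories"),
          ("ADJ", ("CP", "that Rick told")))))

root, adjuncts = filter_adjuncts(tree)
host = ("NP", ("Det", "the"), ("N", "stories"))
rebuilt = adjoin_alpha(root, adjuncts[0], host)
```

Here `root` is the rooted argument skeleton ("We liked the stories"), `adjuncts` holds the relative-clause skeleton, and `rebuilt` is identical to the original tree: on the derivational view, the full phrase marker is the output of adjunction over the skeletons rather than a base-generated object.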
Each of the separate argument structure elements is a pure representation of the argument-of relation; no adjuncts are included. They may be called the Argument Skeletons of the phrase marker. In this sense, each phrase marker is composed of a set of argument skeletons, with certain embedding relations between them (which haven't been indicated above), and one element picked out as the root.

(23) Phrase marker = {argument skeleton 1, argument skeleton 2, ...}, with one skeleton picked out as the root
     [diagram: a phrase marker decomposed into its argument skeletons]

Can anything be made of such a conceptual device? Before considering data, let us note one aspect of current formulations of the base. According to Stowell (1981), there is no independent specification of the base. Rather, its properties follow from those of other modules: the theory of the lexicon, Case theory, theta theory, and so on. Let us take this as a point of departure: all properties of the base follow from general principles in the grammar.

What about the actual content of the base: of the initial phrase marker? Here we note (as was noted above) that a duality arises in licensing conditions: elements may either be directly licensed by selection by a head (i.e. subcategorized, perhaps in the extended sense of theta selection), or they may not be obligatorily licensed at all, but may be optionally present, and, if so, need not be licensed at DS, but simply at some point in the derivation: the case of adjuncts (Chomsky 1982, and others). Suppose that we adopt the following constraint on D-structures:

(24) (Every) D-structure is a pure representation of a single licensing condition.
Then the duality noted in the licensing conditions would be forced deeper into the grammar. The consequence of (24) would be that arguments, licensed by a head, and adjuncts, licensed in some other way, would no longer be able to both
be present in the base.1 The base instead would be split up into a set of sub-structures, each a pure representation of a single licensing condition ("argument-of" or "assigned-a-theta-role-by"), with certain adjoining relations between them. That is, if (24) is adopted, the argument skeletons above (arg. skeleton 1, arg. skeleton 2, etc.) are not simply conceptual divisions of the phrase marker, but real divisions, recorded as such in the base. Ultimately, they must be put together by an operation: Adjoin-α. By adopting a position such as (24), we arrive, then, at a position in some ways related to that of Chomsky (1957) (see also Bach 1977): there is a (limited) amount of phrase marker composition in the course of a derivation. Yet while phrase markers are composed (in limited ways), they are not composed in the manner that Chomsky (1957) assumes. Rather, the Projection Principle guides the system in such a way that the substructures must respect it.

There is, in fact, another way of conceiving of the argument structures picked out in (19)-(23). They are the result of the Projection Principle operating in the grammar, and, with respect to the formulation of the base, only the Projection Principle. If the Projection Principle holds, then the argument structures recorded in (19)-(23) must be present, at all levels of representation. However, there need not be other elements in the base; in particular, there need not be adjuncts. If we assume that the Projection Principle holds, and (with respect to this issue) only the Projection Principle, then it would require additional stipulation to actually have adjuncts present in the base: the Projection Principle requires that arguments be present, but not adjuncts. It is simpler to assume that only the Projection Principle holds, and that adjuncts need not be present.

The sort of phrase structure composition suggested above differs both from the sort suggested in varieties of categorial grammar (e.g.
Dowty, Wall, and Peters 1979), and from the domains of operation of traditional cyclic transformations (Chomsky 1965). With respect to categorial grammar, since the ultimate phrase marker or analysis tree is fully the result of composition operations, there are no subunits which respect the Projection Principle. The analysis may start off with the transitive verb (say, of category S/NP/NP), compose it with an object, creating a transitive verb phrase, and compose that TVP with a subject. The original verb, however, starts out "naked", unlike the representation in Chapter 2 and the argument skeletons above. And the composition operation, being close to the inverse of the usual phrase structure rule derivation (with the possible exception of extensions like Right- and Left-Wrap, Bach 1979), would not add adjuncts like relative clauses in the course of the derivation, but rather would compose a relative clause directly with its head, and then let the resultant be taken as an argument: exactly the reverse of the order of expansion in the phrase marker. The operation above, however, takes two well-formed argument skeletons and embeds one in the other.

1. This is a stronger condition than that which simply follows from the Projection Principle, because it requires that something actually force the adjuncts in for them to be present. In other words, elements cannot be "hanging around" in the structure before they are licensed there (or if they are not licensed there).

The difference between the domains scanned in the theory proposed above and those scanned in standard early versions of cyclic theories is perhaps more subtle. Cyclic theories (pre-Freidin 1978) scan successive sequences of sub-domains, the least inclusive sub-domains first. A partial ordering exists between the sub-domains, where the possibility of multiple branching requires that the ordering be merely partial rather than complete. This is also true of the argument skeleton approach above. However, the domains which stand in such an inclusion relation are different. This is shown in (25) below.

(25)
[diagram: cyclic domains (an S containing embedded NPs), with the ordering relations among those domains]
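The inclusion relation that orders cyclic sub-domains, and the reason the ordering is merely partial, can be sketched in a toy model. The encoding below is my own illustrative assumption: a sub-domain is modeled simply as the frozenset of the terminals it dominates, and one domain precedes another just in case it is properly included in it.

```python
def inclusion_order(domains):
    """Pairs (d1, d2) such that d1 is properly included in d2, i.e.
    d1 must be scanned before d2 on a least-inclusive-first cycle."""
    return {(a, b) for a in domains for b in domains if a < b}

# "We liked the stories that Rick told": the relative clause is
# included in the object NP, which is included in the root clause.
rc = frozenset({"that", "Rick", "told"})
np = frozenset({"the", "stories"}) | rc
s = frozenset({"we", "liked"}) | np

order = inclusion_order([rc, np, s])

# Sister domains dominate disjoint material, so neither includes the
# other: the ordering is partial, not total.
subj = frozenset({"John"})
obj = frozenset({"Bill"})
sisters = inclusion_order([subj, obj])
```

Here `order` contains the pairs (rc, np), (np, s), and (rc, s), reflecting the least-inclusive-first cycle, while `sisters` is empty: branching leaves sister domains mutually unordered, as the text notes.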