Minimality Effects in Syntax
Studies in Generative Grammar 70
Editors
Henk van Riemsdijk Harry van der Hulst Jan Koster
Mouton de Gruyter Berlin · New York
Minimality Effects in Syntax
edited by
Arthur Stepanov Gisbert Fanselow Ralf Vogel
Mouton de Gruyter Berlin · New York
Mouton de Gruyter (formerly Mouton, The Hague) is a Division of Walter de Gruyter GmbH & Co. KG, Berlin.
The series Studies in Generative Grammar was formerly published by Foris Publications Holland.
Printed on acid-free paper which falls within the guidelines of the ANSI to ensure permanence and durability.
Library of Congress Cataloging-in-Publication Data
Minimality effects in syntax / edited by Arthur Stepanov, Gisbert Fanselow, Ralf Vogel. p. cm. – (Studies in generative grammar ; 70) Includes bibliographical references and index. ISBN 3-11-017961-X (cloth : alk. paper) I. Stepanov, Arthur. II. Fanselow, Gisbert. III. Vogel, Ralf, 1965– IV. Series. 2004019085
ISBN 3-11-017961-X
Bibliographic information published by Die Deutsche Bibliothek
Die Deutsche Bibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data is available on the Internet at <http://dnb.ddb.de>.
© Copyright 2004 by Walter de Gruyter GmbH & Co. KG, D-10785 Berlin. All rights reserved, including those of translation into foreign languages. No part of this book may be reproduced in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher. Cover design: Christopher Schneider, Berlin. Printed in Germany.
Contents
Introduction
    Arthur Stepanov, Gisbert Fanselow and Ralf Vogel
On clitics, feature movement and double object alternations
    Elena Anagnostopoulou
PF merger in stylistic fronting and object shift
    Željko Bošković
The MLC and derivational economy
    Gisbert Fanselow
Stylistic Fronting: A contribution to information structure
    Susann Fischer
The superiority conspiracy: Four constraints and a processing effect
    Hubert Haider
Minimal links, remnant movement, and (non-)derivational grammar
    John Hale and Géraldine Legendre
Extending and reducing the MLC
    Winfried Lechner
Minimality in a lexicalist Optimality Theory
    Hanjung Lee
Phrase impenetrability and wh-intervention
    Gereon Müller
MLC violations: Implications for the syntax/phonology interface
    Geoffrey Poole and Noel Burton-Roberts
Ergativity, Case and the Minimal Link Condition
    Arthur Stepanov
Correspondence in OT syntax and Minimal Link effects
    Ralf Vogel
Index
Introduction
Arthur Stepanov, Gisbert Fanselow and Ralf Vogel
1. Minimality and the Minimal Link Condition
One of the central assumptions in modern syntactic theory is that syntactic relations must be local. A major way of instantiating the fundamentally local character of human language is Minimality, a general concept restricting syntactic dependencies to some sufficiently ‘minimal’ (in a sense to be defined) domains. The concept of Minimality figures prominently in virtually every major instantiation of generative syntactic theory, and plays an increasingly important role in its most recent developments.
In the Government and Binding framework, Minimality is a central notion that enters various definitions both of government and movement. As for the former, Minimality restricts government to local domains by precluding government across an intervening element, a potential governor. In the context of syntactic movement, Minimality implies, informally, that a moving element must not skip another element of the relevant kind – the intervenor – on its way to its target position, that is, a resulting dependency must be minimal in length. In the later instantiations of GB, especially starting from Barriers (Chomsky 1986), Minimality plays a key role in defining the way movements may proceed. Rizzi’s (1990) fundamental investigation of Relativized Minimality further explored the empirical depth of this principle and brought its explanatory potential to full light.
It has been widely maintained in various theories that some version of Minimality is relevant in explaining the ill-formedness of the starred examples in (1)–(3), involving non-local head movement, A-movement and A’-movement, respectively:
(1) a. *How fix John will t the car?
    b. cf. How will John fix the car?
(2) a. *John seems that it is certain t to win the race
    b. cf. It seems that John is certain to win the race
(3) a. Who do you expect t to say what?
    b. *What did you expect who to say t?
Sentences such as (1)–(3) provided a particularly powerful impetus for investigating various properties of Universal Grammar that make use of syntactic Minimality (the Empty Category Principle, the Head Movement Constraint etc.). In particular, the contrast in (3) has been attributed to the Superiority Condition (Chomsky 1973), which already encompasses the spirit of Minimality. In the Minimalist Program, Minimality is stated in terms of the Minimal Link Condition (MLC), itself incorporated into the definition of movement understood in terms of Attract. Chomsky (1995: 297, 356) formulates Attract as follows:
(4) K attracts F if F is the closest feature that can enter into a checking relation with a sublabel of K.
    If β c-commands α and τ is the target of raising, then β is closer to K than α unless β is in the same minimal domain as (a) τ or (b) α.
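Read procedurally, the closeness clause of (4) works as a filter over the candidates that bear the relevant feature. The following Python sketch is offered only as an illustration and is not part of Chomsky's formulation. It assumes that the candidates sit on a single spine, so that list order can stand in for c-command, and that minimal domains can be compared through simple labels. The names `Item`, `attractable` and all domain labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str
    features: frozenset  # features the item can check, e.g. {"wh"}
    domain: str          # hypothetical label for the item's minimal domain


def attractable(items, feature, target_domain):
    """Return the items that count as closest for `feature`.

    `items` is ordered from highest to lowest, so every item is assumed
    to c-command all items that follow it (a simplifying assumption).
    """
    candidates = [it for it in items if feature in it.features]
    closest = []
    for i, lower in enumerate(candidates):
        # A higher candidate intervenes unless it shares a minimal domain
        # with the target (clause a) or with the lower candidate (clause b).
        blocked = any(
            higher.domain != target_domain and higher.domain != lower.domain
            for higher in candidates[:i]
        )
        if not blocked:
            closest.append(lower)
    return closest


# Example (3): the domain labels are illustrative, not an analysis.
who = Item("who", frozenset({"wh"}), "embedded subject domain")
what = Item("what", frozenset({"wh"}), "embedded object domain")
print([it.name for it in attractable([who, what], "wh", "matrix C domain")])
```

Under these assumptions only who is returned for the configuration in (3). If two candidates were given the same domain label, both would be returned, which is the equidistance case allowed by clause (b).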
Thus, for instance, in the ungrammatical (3b) movement of what does not obey the MLC since, from the point of view of the target of movement (interrogative C⁰), there is a closer candidate, who. In the grammatical (3a), in contrast, the closest candidate who is Attracted. The MLC is thus behind the Minimalist interpretation of Superiority. Similar considerations obtain with respect to the “super-raising” case (2) (as for (1), minimalism is somewhat reluctant to incorporate head movement into the syntax proper, thus it is not obvious that considerations of purely syntactic Minimality apply here; see also below).
The Minimalist Program, which concerns itself, in addition to purely empirical matters, with the overall architecture of the language faculty, also provides a conceptual motivation for Minimality in general, and the Minimal Link Condition in particular. The motivation comes from the requirement to reduce the “search space” for syntactic dependencies (Chomsky 2000, Chomsky 2001), a consideration from computational complexity that ultimately has to do with possible memory limitations on the human mechanisms responsible for language.
On the empirical side, new data brought to light in the last decade or so have led to an increased focus on Minimality/MLC and its proper formulation in syntactic theory. Perhaps it is not an exaggeration to say that a major impetus for investigating the role and principles behind Minimality was provided by languages that have multiple wh-movement. Observe that the MLC as formulated in (4) predicts a) Superiority effects across the board; b) ‘nestedness’ of multiple movement (see e.g. Pesetsky 1982). However, multiple wh-movement in a number of Slavic languages such as Bulgarian, Russian and Serbo-Croatian (the latter in certain contexts) presents a challenge to the MLC in that these languages seem to allow patterns of movement excluded by a) Superiority; b) nestedness. Although an impressive amount of work has been done to reconcile these languages with the minimalist version of the MLC (including works by Bošković, Pesetsky, Richards and many others), a number of mysteries remain in this area. Another relevant phenomenon is the pattern of pronoun placement in German, which also sometimes seems to violate the MLC. This situation raises the need to expand the empirical base of the MLC in order to establish a wider range of phenomena potentially problematic for it, and to investigate to what extent the challenge is real, and to what extent it can be reduced to the existing properties of the grammar.
Another interesting question is whether Minimality/MLC should be formulated as a constraint on representations or a derivational condition, and furthermore, whether a principle like the MLC may itself be derivable from more primitive notions. The first part of the issue bears on the overall structure of the theory. While standard minimalist views tend to regard the MLC as a derivational condition (which can also be inferred from (4)), representational formulations have also been advanced within minimalism (in fact the recent idea not to compute the MLC upon completion of a ‘phase’ until the next higher phase is built (Chomsky 2001) also gives the MLC a representational flavor) as well as within representational frameworks such as Optimality Theory. The question thus arises as to which version has a greater descriptive and explanatory potential.
The second part – whether the MLC is derivable from other principles – has only recently attracted researchers’ attention and, in our view, promises to yield a novel and interesting line of discovery. The MLC can reasonably be seen as a constraint comparing syntactic objects: the absolute length of a movement path is not per se relevant for its grammaticality; rather, what matters is whether it is shorter than its competitors. The MLC thus belongs to the realm of grammatical generalizations that motivated the formulation of Optimality Theory (OT, Prince and Smolensky 1993, Grimshaw 1997): the relative degree by which constraints are satisfied determines grammaticality. In OT, the MLC may in fact be derivable from more principled constraints and the overall architecture of determining grammaticality. If there is a constraint that excludes movement crossing barriers (and if each XP is a barrier), then, ceteris paribus, the grammatical structure will be the one that violates this constraint less often than its competitors and therefore has the shortest movement path (see Legendre, Smolensky and Wilson 1998). In an OT syntax, one thus will not necessarily find a constraint that looks like the minimalist MLC.
OT offers two further interesting ways of dealing with MLC-effects. Müller (2001) argued that the multiple wh-movement data and similar facts suggest a recasting of the MLC in terms of principles that require that c-command relations among syntactic elements be identical at all levels of representation. Going one step further, one might also deny that more than the basic architecture of OT is needed to capture MLC effects: for contrasts such as the one in (3) it suffices to assume that a constraint that guarantees that the subject-theta-role be realized in a position c-commanding the object-theta-role is violable in case a higher principle (wh-clauses must have a wh-phrase in their specifier position) makes that violation unavoidable.
While it may thus seem that OT is particularly adequate for dealing with MLC-phenomena, it is not entirely obvious that the MLC is, in fact, a violable principle. Chomsky (1995) argues that it is not, focusing on A-movement phenomena, and the HMC also seems to be quite resistant to modulation by further principles.
Overall, recent research has led to the following general questions concerning Minimality/MLC:
(5)
I. How can apparent violations of syntactic Minimality/MLC be accounted for?
II. What is the theoretical status of the MLC? Is it a primitive or a theorem in the grammar?
III. Can the computation of the MLC be affected by the interface properties (e.g. LF interface), perhaps in a manner similar to Reinhart (1995)? Is there an MLC at PF?
IV. Can Minimality phenomena provide evidence in favor of a derivational or a representational framework?
Investigation of these issues is the task of the present volume.
2. The volume
This volume comprises twelve original articles, eight of which were presented at the Workshop “Minimal Link Effects in Minimalist and Optimality Theoretic syntax” organized at the University of Potsdam on March 21–22, 2002 in the context of project A3 of the Research Group FOR 375 “Conflicting Rules and Conflict Resolution Strategies in Cognitive Science”, funded by the German Research Foundation (DFG). The articles in this volume explore to varying extents the questions posed in (5) above on a solid empirical basis, reevaluating the data already existing in the literature as well as bringing novel data and generalizations into the domain of discussion. The empirical phenomena considered by the authors of the volume include: Superiority effects in multiple wh-questions, including those with ‘D-linked’ wh-phrase(s); Stylistic Fronting in Germanic and Romance; A-movement and Case phenomena; word order ‘freezing’ effects; and remnant constituent displacement in German and Japanese. Nine of the proposed accounts are couched along the lines of the Minimalist framework (Chomsky 1995, 2000, 2001), three in the framework of Optimality Theory (Prince and Smolensky 1993). The spectrum of defended views and conclusions concerning the role and status of the MLC is also quite diverse. While some papers suggest retaining the MLC as it is formulated in the Minimalist Program, others suggest various reformulations, or propose to abandon it altogether, attributing its known effects to other grammatical (and extra-grammatical) mechanisms. (Dis)advantages of formulating the MLC in Minimalist and Optimality Theoretic syntax are also discussed. This broad range of opinions reflects, in our view, a growing interest among syntacticians in the MLC and Minimality as a prominent property of human grammar, as well as the maturity of syntactic thought, now capable of investigating this property in impressive depth.
2.1. Stylistic fronting and related phenomena
Stylistic Fronting (SF) is a phenomenon observed in Icelandic whereby an element such as a participle, an adverb or a particle is preposed in front of the finite verb.
The phenomenon has been a focus of syntactic research since at least Maling (1990) and continues to generate a plethora of fruitful investigations. Of special interest in the context of this volume is the fact that even though SF can generally be thought of as observing the MLC (this can be inferred, for instance, from one interpretation of Maling’s hierarchy of elements undergoing SF, which appears to be structural), certain cases of SF seem to violate the MLC, as for example in (6), where the participle skrifuð crosses the finite auxiliary hefur and the participle verið:
(6) Þetta er versta bók sem skrifuð hefur verið __
    this is the worst book that written has been
    ‘This is the worst book that has (ever) been written’
This and other properties of SF are investigated in detail in the articles by Bošković, Fischer, and Poole and Burton-Roberts. Bošković develops a PF merger analysis of the phenomenon. He argues that an SF construction involves leftward head-adjunction of the element affected by SF to a phonologically null head lexically specified as a verbal affix. Being a verbal affix, this head must merge at PF with the verb (as its host) under adjacency, much along the lines of the Affix Hopping analyses of the 1950’s, also revived in recent works. The adjacency-based account straightforwardly captures the well-known ‘subject gap’ restriction on SF (the subject in SF sentences cannot be lexically realized in its canonical position, e.g. Spec-IP) and other properties including the impossibility of inserting adverbs between the SF-ed element and the verb. Bošković further extends this account to cases of object shift in Scandinavian and wh-movement in Bulgarian, both of which, he shows, involve the adjacency effect typical of SF constructions. He also provides an argument for Multiple Spell-out based on Scandinavian object shift.
Fischer addresses the diachronic aspect of SF. On the basis of her original data from Old Catalan, Fischer demonstrates that this language in fact featured a fronting process that had most of the properties of Icelandic SF (with the exception of the subject-gap restriction). Similarly to Bošković, Fischer postulates a functional projection hosting an SF-ed element, which for her is ΣP in the sense of Laka (1990). In Old Catalan, she argues, SF is an effect of checking the strong V-feature of Σ by the element undergoing SF. In addition, Fischer argues that SF in Old Catalan serves information-structural needs, as it was used in sentences that express “emphasis”.
She makes the interesting suggestion that the loss of SF in the transition from Old to Modern Catalan is correlated with the loss of postverbal clitics in embedded sentences, and suggests a direction for accounting for this correlation.
Poole and Burton-Roberts are directly concerned with apparent MLC violations in SF constructions (cf. Icelandic (6)) as well as in related constructions involving long head movement in Breton. For them, SF is a “deed” of the phonological component of grammar. However, phonology itself is given a status in their article that is different from its status in the traditional minimalist model. The authors point out that the logic of the traditional model, in which PF interprets (or “realizes”) the result of syntactic computation, leads one to regard a given phenomenon either as strictly syntactic or strictly phonological. With respect to “stylistic” phenomena such as SF, the traditional model thus faces a paradox: since SF is not a purely syntactic phenomenon (it allows for MLC violations), the traditional model would then imply that it is strictly phonological, yet other evidence indicates that SF is still sensitive to syntax (e.g. to the distinction between main and auxiliary verbs). The authors resolve the paradox by proposing the “Representational Hypothesis”, which eliminates PF as a ‘post-syntactic’ component. The computation is oriented exclusively toward LF, and lexical items manipulated in the course of the derivation do not contain any phonological features. Rather, the correspondence between syntax and its phonetic output is achieved by virtue of ‘representational conventions’ that differ from language to language, and do not necessarily imply a direct correspondence between structural relations and linear word order (as for instance in Kayne’s Linear Correspondence Axiom). According to the authors, SF in Icelandic, long head movement in Breton and the verb second phenomenon are all effects of particular ‘representational conventions’ operating in a given language.
2.2. A-movement and case phenomena
The articles by Anagnostopoulou and Stepanov are devoted to investigating apparent MLC violations with respect to A-movement and Case assignment. Both works aim to reconcile cases of apparent violations of Minimality with the spirit of the MLC.
Anagnostopoulou investigates various patterns of A-movement in double object constructions in Greek. The goal/experiencer argument in this language can be expressed as either a) a genitive DP, b) a PP or c) a clitic or a clitic-doubled DP. In A-movement contexts (passives, unaccusatives and raising) the DP receiving Nominative (the theme) can ‘skip’ over the goal/experiencer argument on its way to Tense, just in case the goal/experiencer argument is a clitic or a clitic-doubled DP, but not if it is a full (genitive) DP. If the goal/experiencer is a PP, the Nominative DP can skip over it in passives and unaccusatives, but not in raising constructions. Anagnostopoulou proposes an account of this distribution making use of the notions of minimal domain and equidistance, as well as the special properties of clitics. According to Anagnostopoulou, the intervening EPP feature blocks raising of the Nominative when the dative and Nominative are in different minimal domains, but not when they are in the same minimal domain. The former is the case when the goal/experiencer is a genitive DP (in all A-movement contexts); the latter is the case when it is a PP in a passive/unaccusative structure. In addition, Anagnostopoulou argues that clitics never intervene in raising of the Nominative because the clitic itself moves to Tense prior to movement of the Nominative. Consequently, at the time of movement of the Nominative nothing intervenes on its way.
The focus of Stepanov’s paper is another case of ‘skipping’, observed in ergative languages. In Hindi, for instance, the Nominative (Absolutive) case of the object can be assigned by Tense via Agree across the ergative subject, even though the latter is base-generated higher than the object and hence is closer to Tense than the object. Stepanov’s solution reconciling the ergative-nominative pattern of Case assignment with the MLC is very close in essence to Anagnostopoulou’s in that it capitalizes on a particular timing of derivational steps. Stepanov proposes a modification of the minimalist derivational system in which DPs bearing an inherent Case are Merged into the structure postcyclically. Thus the subject carrying an inherent ergative Case in Hindi is Merged into the structure postcyclically, specifically, after the Case dependency between Tense and the Nominative object has been established and deactivated. The solution is extended to another case of intervention involving inherent Case, namely, raising across the experiencer in seem-constructions.
2.3. OT Syntax: Word order, movement and “freezing” effects
The three Optimality Theoretic contributions argue for reconstructing the Minimal Link Condition in a non-derivational fashion within OT.
Hale and Legendre’s paper is a critical review of an argument in favour of a derivational syntax that has been put forward by Müller (1998). A remnant XP, by Müller’s generalization, cannot undergo a movement of type Y if the antecedent of the trace inside the remnant XP has also undergone Y-movement. Müller derives this generalization from the principle of Last Resort and the barriers condition. He further suggests that only a derivational theory of syntax is able to express this empirical generalization. Hale and Legendre show that this is not the case. Their non-derivational analysis reconstructs the Minimal Link Condition with a family of violable constraints in a representational version of OT syntax. The central constraints evaluate the lengths of movement chain links by counting the maximal projections that they cross. Hale and Legendre further show with the example of Japanese that Müller’s generalization about German does not hold universally, and thereby further motivate their approach in terms of violable constraints.
The main topic of Lee’s paper is the phenomenon of word order freezing: movement of an object NP in front of a subject NP is blocked if the two NPs are of the same type. In Hindi, object-subject order becomes impossible if subject and object are in the same case. For instance, some verbs in Hindi assign nominative to both their subject and their object. An explanation in terms of the MLC suggests itself: the subject can be seen as an intervener blocking the movement of the object. Lee argues that such an account would have to appeal to surface, i.e. PF, properties of the NPs in question, which is an “extremely unnatural situation” in a derivational model like the Minimalist Program, where syntax feeds PF. Lee’s alternative account is based on an OT version of Lexical Functional Grammar (LFG). The constraints evaluate the correspondence of parallel structures. Freezing occurs when an object-subject structure is indistinguishable from a subject-object structure at the surface, while such structures are possible elsewhere. The OT grammar therefore includes an interpretive perspective. Grammaticality is not only a matter of which structures can be generated, but also of which underlying structures can be recovered from a surface form. This is reflected in a “bidirectional” version of OT syntax which takes both generation and interpretation perspectives (Smolensky 1996). Lee further discusses the problem of recoverability in two head-marking languages, Chamorro and Tzotzil. These languages display another example of a recoverability phenomenon: passive is avoided unless the intended meaning cannot be recovered from the active form. This generalization can be derived in a specific OT model Lee calls “Medium Strength OT”.
Vogel’s contribution complements Lee’s argumentation and observations in many respects. His topic is the exploration of an architecture for OT syntax that is based on Jackendoff’s (1990) conception of correspondence: the OT constraints evaluate the mapping relations between a semantic (M), a syntactic (S) and a phonological (P) representation. Vogel shows how such a system of constraints is able to derive the difference between topicalization and wh-movement in English, as well as their non-difference in German. The analysis is then extended to classical superiority effects in English and German, and to word order freezing effects in German, which occur with two ambiguous NPs of the same type, e.g. proper names, but not with an ambiguous wh-NP and an ambiguous proper name. That word order freezing in German follows such a pattern of “relativized minimality” (Rizzi 1990) is another piece of evidence for the position defended by Lee and Vogel that freezing effects result from grammatical constraints. They cannot be reduced to mere performance. Bidirectionality is again a crucial property of the OT grammar proposed by Vogel.
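The idea shared by these contributions, that Minimal Link effects can emerge from comparing candidates against ranked violable constraints, can be made concrete with a small Python sketch. It is a generic illustration of OT evaluation and should not be read as the constraint system of Hale and Legendre, Lee, or Vogel. The constraint names `WH-SPEC` and `CROSS-XP`, the candidate names, and the lists of crossed projections are all hypothetical choices made for the example.

```python
def crossed_xps(links):
    """Count the maximal projections crossed by all links of a chain."""
    return sum(len(link) for link in links)


def violations(candidate, ranking):
    """Violation profile, ordered from the highest-ranked constraint down."""
    return tuple(candidate["marks"].get(c, 0) for c in ranking)


def optimal(candidates, ranking):
    """Return the names of the candidates with the best profile.

    Profiles are compared lexicographically: one extra violation of a
    higher-ranked constraint outweighs any number of lower-ranked ones.
    """
    best = min(violations(c, ranking) for c in candidates)
    return [c["name"] for c in candidates if violations(c, ranking) == best]


# Hypothetical candidates for a multiple question like (3).
# The crossed projections are illustrative labels, not an analysis.
candidates = [
    {"name": "who fronted",
     "marks": {"CROSS-XP": crossed_xps([["VP", "TP"]])}},
    {"name": "what fronted",
     "marks": {"CROSS-XP": crossed_xps([["VP", "TP", "VP", "TP"]])}},
    {"name": "nothing fronted",
     "marks": {"WH-SPEC": 1}},
]

print(optimal(candidates, ["WH-SPEC", "CROSS-XP"]))
```

With `WH-SPEC` ranked above `CROSS-XP`, the candidate with the shorter chain wins, so no constraint resembling the minimalist MLC has to be stated. Reversing the ranking makes the candidate without fronting optimal, which shows that the outcome depends on the ranking rather than on any absolute path length.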
2.4. Understanding the nature of MLC
A large block of articles is devoted to examining various conceptual issues with respect to the MLC, thus exploring its theoretical status and role in the grammar. These articles address to varying extents questions II, III, and IV posed in Section 1 of this Introduction.
Lechner proposes to refine the definition of the MLC in two important ways. First, his modification of the MLC entails a widening of the domain of Attract to include not only candidates for Attraction already in the tree but also elements from the numeration, thus regulating certain aspects of Merge. Coupled with a revised definition of closeness, the modified version of the MLC captures the well-known contrast between There seems to be someone in the garden and *There seems someone to be in the garden, which originally motivated Chomsky’s (1995) economy condition favoring Merge over Move. Lechner’s account also extends to “Case freezing” effects (cf. *Someone seems that _ is in the garden) as well as Superraising (cf. (2a)) without additional provisos. Second, Lechner argues that certain effects of the MLC pertaining to movement can be relegated to Kayne’s (1994) Linear Correspondence Axiom (LCA) operating in the context of the Phase Impenetrability Condition (Chomsky 2001), and proposes a new account of Superiority phenomena in the context of the LCA so revised. His study suggests that the operations Merge and Move are indeed very closely interrelated in that they are both subject to the same derivational conditions (MLC and LCA).
Müller suggests that the MLC is not a principle of UG – rather, he shows how to derive MLC-effects from two independent assumptions, viz. the Phrase Impenetrability Condition and the cyclic nature of movement. According to Phrase Impenetrability, material in the domain of the head of a phase XP is not accessible to operations outside XP.
E.g., a wh-object can leave the verb phrase only if it has undergone prior movement to the specifier position of the verb phrase. This movement can be a consequence of the principle of Phrase Balance, which licenses (untriggered) cyclic F-movement steps of an XP to intermediate positions unless the workspace of the derivation still contains a YP that could undergo F-movement. Thus, a wh-object may be moved to the specifier position of the verb phrase (and thus become available for further movement to Spec,CP) just in case the workspace of the derivation contains no further wh-phrase that would later be merged in a higher position (e.g., as a subject). In this way, standard superiority effects can be derived. Müller shows that his model not only captures MLC-effects for English – it is also able to neatly predict where German shows superiority effects, and where it does not. He furthermore extends his approach to intervention effects in which there is no c-command relation between the elements involved and which therefore could never be accounted for in terms of a simple MLC.
Haider concurs with Müller in the conviction that there is no need for an independent MLC in grammar, but he explains the MLC away in quite a different fashion: there is a “conspiracy” of four independent constraints that yields the data normally discussed under the label “Superiority”. For example, Haider observes that wh-subjects in situ are penalized by grammar quite independently of whether they have been crossed by a lower wh-phrase (as in Superiority violations) or whether they are c-commanded by the wh-phrase that moves (as in *it is unclear who thinks that who saw us). Likewise, higher type wh-phrases (such as why) in Spec,CP cannot license another higher type wh-phrase, even when no intervention relation holds between the relevant items. Therefore, the pertinent licensing principle must be independent of the MLC. There is a residue of superiority effects that cannot be reduced to these and the other two principles motivated by Haider. For this residue, Haider proposes a processing account.
Fanselow’s contribution tries to defend the idea that the MLC is an economy principle. It is concerned with the well-known fact that the MLC exerts a blocking effect only on those structures the meaning of which can be expressed in a different way. Fanselow tries to identify the scope of this phenomenon, and shows that this concern for expressivity involves information structure as well. He argues against a representational and in favor of a derivational concept of an economy-based MLC.
The cumulative contribution of this volume to the discussion of Minimality and the MLC is thus both empirical and theoretical.
We believe the articles in the volume elucidate many aspects of the fundamental questions posed in Section 1, even though none of these questions can be said to have received a final and exhaustive answer. We thus hope that these and related questions will continue to motivate fruitful collaborative research in this area, which can only benefit our understanding of the mechanisms of Minimality, and, more generally, of locality in natural language.
Acknowledgements
We would like to express our gratitude to our external reviewers for their excellent job evaluating the papers for this volume: Peter Ackema, Artemis Alexiadou, Hans Broekhuis, Lina Choueiri, Hans-Martin Gaertner, Anoop Mahajan, Martha McGinnis, Yukiko Morimoto, Ad Neeleman, Vieri Samek-Lodovici, Peter Sells, Halldór Ármann Sigurðsson, Chris Wilder. Our thanks also to Ines Mauer for her generous help in organizing the workshop, Marco Zugck for creating the index for this volume and Ursula Kleinhenz, the in-house editor at Mouton, for the very productive cooperation during the publication process.
References
Chomsky, Noam
1973 Conditions on transformations. In A Festschrift for Morris Halle, ed. Stephen Anderson and Paul Kiparsky, 232–286. New York: Holt, Rinehart and Winston.
1986 Barriers. Cambridge, Mass.: MIT Press.
1995 The minimalist program. Cambridge, Mass.: MIT Press.
2000 Minimalist inquiries: The framework. In Step by step: Essays in minimalist syntax in honor of Howard Lasnik, ed. Roger Martin, David Michaels and Juan Uriagereka, 89–155. Cambridge, Mass.: MIT Press.
2001 Derivation by phase. In Ken Hale: A life in language, ed. Michael Kenstowicz, 1–50. Cambridge, Mass.: MIT Press.
Grimshaw, Jane
1997 Projection, heads and optimality. Linguistic Inquiry 28: 373–422.
Jackendoff, Ray
1990 Semantic structures. Cambridge, Mass.: MIT Press.
Kayne, Richard
1994 The antisymmetry of syntax. Cambridge, Mass.: MIT Press.
Laka, Itziar
1990 Negation in syntax: on the nature of functional categories and projections. Doctoral dissertation, MIT, Cambridge, Mass.
Legendre, Geraldine, Paul Smolensky, and Colin Wilson
1998 When is less more? Faithfulness and minimal links in wh-chains. In Is the best good enough? Optimality in syntax, ed. Pilar Barbosa, Danny Fox, Paul Hagstrom, Martha McGinnis and David Pesetsky, 249–289. Cambridge, Mass.: MIT Press.
Maling, Joan
1990 Inversion in embedded clauses in Modern Icelandic. In Modern Icelandic syntax (Syntax and Semantics 24), ed. Joan Maling and Annie Zaenen, 71–91. San Diego: Academic Press.
Müller, Gereon
1998 Incomplete category fronting: A derivational approach to remnant movement in German. Dordrecht: Kluwer.
2001 Order preservation, parallel movement and the emergence of the unmarked. In Optimality-theoretic syntax, ed. Geraldine Legendre, Jane Grimshaw and Sten Vikner, 279–313. Cambridge, Mass.: MIT Press.
Pesetsky, David
1982 Paths and categories. Doctoral dissertation, MIT, Cambridge, Mass.
Prince, Alan, and Paul Smolensky
1993 Optimality Theory: Constraint interaction in generative grammar. Technical report, Rutgers University.
Reinhart, Tanya
1995 Interface strategies. Ms., Research Institute for Language and Speech, Faculty of Letters, University of Utrecht.
Rizzi, Luigi
1990 Relativized minimality. Cambridge, Mass.: MIT Press.
Smolensky, Paul
1996 On the comprehension/production dilemma in child language. Linguistic Inquiry 27: 720–731.
On clitics, feature movement, and double object alternations
Elena Anagnostopoulou
In this paper I establish the following Generalization for Greek:
(1) When a lower nominative argument moves to T over a higher argument, the higher argument must move to T as well, as a clitic.
Specifically, I investigate the distribution of dative goals and experiencers in Greek, and I show that while in transitive constructions dative arguments can be DPs with morphological genitive Case, PPs or clitic doubled/cliticized genitives, in NP movement constructions certain forms are systematically missing. Genitive DPs are illicit in all NP movement contexts (passives, unaccusatives and raising) and PPs are ruled out in raising constructions. On the other hand, clitics and clitic doubled DPs are always licit. On the basis of this distribution, I argue that clitic doubling/cliticization is a way to circumvent Attract Closest effects with NP-movement. Nominatives may not cross over higher dative DPs and PPs when the latter are located in a different minimal domain than nominatives (Chomsky 1995: 356, see also McGinnis 1998 for a related though different account). Under cliticization and clitic doubling, the formal features of a dative DP move to T before a lower nominative DP moves and, therefore, the dative does not count as an intervener for the movement of the nominative. Under the proposed analysis, clitic doubling constitutes feature raising without phrasal pied piping (Chomsky 1995).
1. The forms of dative constructions in Greek
1.1. The Dative Alternation
Greek has a dative alternation. The indirect object can be realized as a PP or a DP with morphological genitive case:
(2) a. Edosa to vivlio s-ton Janni (PP-dative)
       Gave-I the book(Acc) P-Det(to-the) John(Acc)
       ‘I gave the book to John’
    b. Edosa tu Janni to vivlio (Genitive DP)
       Gave-I the John(Gen) the book(Acc)
       ‘I gave John the book’
Greek has lost the distinction between morphological dative and genitive case and has generalized the use of genitive. The Genitive construction illustrated in (2b) is a double object construction. There is extensive evidence for this discussed in Markantonatou (1994) and Anagnostopoulou (1998). For example, there is an animacy restriction on genitive goals which generally characterizes double object constructions crosslinguistically, as is well known:
(3) *Estila tis Gallias to gramma
    Sent-I the France(Gen) the letter(Acc)
    ‘*I sent France the letter’
Moreover, there are constraints on the semantic types of predicates that license the Genitive construction in Greek, like English (Pinker 1989, Pesetsky 1995). For example, (4) shows that the Genitive construction is disallowed with verbs expressing communication of propositions, like the double object construction in English:
(4) *Ipostiriksa tu dikasti tin athootita mu
    Claimed/asserted-I the judge(Gen) the innocence-my(Acc)
    ‘*I asserted the judge my innocence’
Finally, in the Genitive construction the goal asymmetrically c-commands the theme. The examples below illustrate this with respect to quantifier-variable binding (Barss & Lasnik 1986):
(5) ?Estila tu kathe ipallilu_i tin epitagi tu_i
    Sent-I the every employee(Gen) the paycheck his(Acc)
    ‘I sent every employee his paycheck’
(6) *Estila tu katoxu tu_i to kathe check_i
    Sent-I the owner its(Gen) every check(Acc)
1.2. Indirect object clitic doubling
When the indirect object is a Genitive DP, it can be doubled by a pronominal clitic:
(7) Tu-edosa tu Janni to vivlio
    Cl(Gen)-gave-I the John(Gen) the book(Acc)
    ‘I gave John the book’
When it is a PP, doubling is not possible (cf. Markantonatou 1994, Dimitriadis 1999):
(8) *Tu-edosa to vivlio s-ton Janni
    Cl(Gen) gave-I the book(Acc) P-Det(to-the) John(Acc)
    ‘I gave the book to John’
Doubling is possible with indirect object Genitive QPs. The genitive asymmetrically c-commands the accusative:
(9) ?Tu estila tu kathe ipallilu_i tin epitagi tu_i
    Cl(Gen) sent-I the every employee(Gen) the paycheck his(Acc)
    ‘I sent every employee his paycheck’
(10) *Tu estila tu katoxu tu_i to kathe check_i
     Cl(Gen) sent-I the owner its(Gen) every check(Acc)
After having introduced the various types of datives in Greek, I will now look at their distribution in transitive and intransitive contexts.
2. The distribution of datives
The distribution of dative phrases in Greek is summarized in Table 1:

Table 1.
                          Genitive DPs   PPs   Doubled DPs/Clitics
Transitives               ok             ok    ok
Passives/Unaccusatives    *              ok    ok
Raising                   *              *     ok
Starting with transitives, we saw already that when the general preconditions for the double-object construction are met, i.e. the goal is animate and the predicate is of an appropriate semantic type, goals can either be PPs or genitive DPs. In these contexts, clitic doubling or cliticization of the goal argument may optionally take place. On the other hand, in NP-movement constructions there are systematic restrictions. Genitive DPs may not break up an A movement Chain when there is NP movement of themes or raising of subjects to a higher [Spec, TP]. In (11) it is shown that genitive DPs are not allowed in passives:
(11) *?To vivlio dothike tu Janni apo tin Maria
     The book(Nom) was given the John(Gen) by the Mary
     ‘The book was given to John by Mary’
The same holds for goals and experiencers in non-alternating unaccusatives, alternating unaccusatives and experiencer object predicates that belong to Belletti & Rizzi’s (1988) ‘piacere’ class, a class which is uncontroversially unaccusative (Pesetsky 1995):
(12) a. *?To gramma irthe tis Marias me megali kathisterisi (non-alternating)
        The letter(Nom) came the Mary(Gen) with a big delay
        ‘The letter came to Mary with a big delay’
     b. *O ipopsifios parusiastike tis Marias (alternating)
        The candidate(Nom) appeared the Mary(Gen)
        ‘The candidate appeared Mary’
     c. *Ta vivlia aresoun tu Petru (experiencer-object)
        The books(Nom) please-3pl to-the Peter
        ‘Peter likes books’
Finally, with the raising verb fenete ‘seem’ genitive DP experiencers are not licensed. This is shown in (13a) with a small clause complement and (13b) with a subjunctive complement:1
(13) a. *O Jannis fenete tis Marias eksipnos
        The Jannis seems the Mary(Gen) intelligent
        ‘John seems to Mary to be intelligent’
     b. *Ta pedhia dhen fenonte tis Marias na dhiavazoun
        The children not seem-3pl the Mary(Gen) SUBJ read-3pl
        ‘The children do not seem to Mary to study’
PPs are allowed when there is NP movement of ‘deep objects’ to Spec,TP, i.e. in passives and unaccusatives, but not when there is raising of lower subjects to a higher [Spec, TP]:2
(14) a. To vivlio dothike s-tin Maria (passive)
        The book was given to-the Mary
     b. To gramma eftase s-tin Maria (unaccusative)
        The letter got to-the Mary
     c. To vivlio aresi s-tin Maria (experiencer-object)
        The book appeals to-the Mary
     d. ?*O Jannis fenete s-tin Maria eksipnos (raising)
        The John seems to-the Mary intelligent
     e. *Ta pedhia dhen fenonte s-tin Maria na dhiavazoun
        The children not seem-3pl to the Mary SUBJ read-3pl
        ‘The children do not seem to Mary to study’
On the other hand, there are no restrictions on clitic doubled or cliticized genitives. They are allowed in all NP movement contexts, passives, unaccusatives and raising constructions alike:
(15) a. To vivlio tis dothike (tis Marias) (passive)
        The book Cl(Gen) was given the Mary(Gen)
     b. To gramma tis eftase (tis Marias) (unaccusative)
        The letter Cl(Gen) got the Mary(Gen)
     c. To vivlio tis aresi (tis Marias) (experiencer-object)
        The book Cl(Gen) appeals the Mary(Gen)
     d. O Jannis tis fenete (tis Marias) eksipnos (raising)
        The John Cl(Gen) seems the Mary(Gen) intelligent
     e. Ta pedhia dhen tis fenonte (tis Marias) na dhiavazoun
        The children not Cl(Gen) seem-3pl the Mary(Gen) SUBJ read-3pl
In the next sections, I will argue that the complex interaction between cliticization/clitic doubling and A-movement shown in the patterns above can be derived from a theory according to which locality is sensitive to intervening features (Chomsky 1995, 1998). I will also argue that clitic doubling is a case in which Move raises just formal features leaving the rest of the category unaffected (as in Chomsky 1995).
3. Analysis
3.1. Assumptions
Following Chomsky (1995, 1998), I assume that a set of universal features is manipulated by the computational system by Feature-Attraction and Move to generate expressions. Attraction affects the appropriate phrase closest to the target. “Appropriateness” depends on whether or not a feature F of the moved constituent may enter into a matching relation with a feature of the target. Mismatch of features cancels the derivation. “Closeness” is defined in terms of c-command and equidistance. The definitions I assume are given in (16). They are taken from Chomsky (1995):
(16) a. α can raise to target K only if there is no legitimate operation Move β targeting K, where β is closer to K.
     b. K attracts F if F is the closest feature that can enter into a checking relation with a sublabel of K.
     c. If β c-commands α, and τ is the target of movement, then β is closer to τ than α unless β is in the same minimal domain as (i) τ or (ii) α.
Following Alexiadou & Anagnostopoulou (1998a,b) and Anagnostopoulou (1998), I further assume that there are two features associated with I (cf. Chomsky 1995, Collins 1997): an EPP feature and a Case feature. Both are formal features of the same type, i.e. [-interpretable] nominal features on functional heads, and both are responsible for the movement operations performed in the computational system. Unlike Chomsky (1995, 1998), I assume that EPP is not necessarily satisfied by Move / Merge XP. Move X⁰ and Move F can also check EPP.
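The definitions in (16) can be made concrete with a small sketch. It is only an illustration under simplifying assumptions that are not part of the text: c-command is approximated by relative depth on a single clausal spine, minimal domains are plain string labels, and the names `Node`, `is_closer` and `can_raise` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A position on a single clausal spine (a hypothetical simplification)."""
    name: str
    depth: int        # 0 is highest; a smaller depth c-commands a larger one
    min_domain: str   # label of the minimal domain containing the position


def is_closer(beta, alpha, target):
    """(16c): beta is closer to the target than alpha if beta c-commands alpha,
    unless beta is in the same minimal domain as the target or as alpha."""
    c_commands = beta.depth < alpha.depth
    equidistant = beta.min_domain in (target.min_domain, alpha.min_domain)
    return c_commands and not equidistant


def can_raise(alpha, target, competitors):
    """(16a): alpha can raise only if no attractable competitor is closer."""
    return not any(is_closer(beta, alpha, target) for beta in competitors)


target = Node("T", 0, "T")
low = Node("lower argument", 2, "V")
high_other = Node("higher argument", 1, "v")   # different minimal domain
high_same = Node("higher argument", 1, "V")    # same minimal domain as `low`

print(can_raise(low, target, [high_other]))  # False: the higher argument intervenes
print(can_raise(low, target, [high_same]))   # True: the two are equidistant
```

The `competitors` list is meant to contain only elements bearing a feature the target could attract, which is how the sketch approximates the "legitimate operation Move β" clause of (16a).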
3.2. Differences in the distribution of DPs and PPs follow from Equidistance
Recall that there is an asymmetry in the distribution of genitive DPs and PPs. The former are ruled out in monoclausal and biclausal NP-movement constructions alike. The latter are ruled out only in biclausal NP-movement constructions. I argue that this asymmetry can be naturally accommodated
in the system outlined above. Specifically, (i) dative phrases have an EPP feature that can be attracted by T irrespective of their categorial status, whether they are PPs or DPs. As a result, both DPs and PPs may block movement of lower nominatives. (ii) The intervening feature blocks attraction of the nominative argument when the nominative and the dative are in different minimal domains. Genitive DPs are always in a different minimal domain than nominatives because they are introduced by a light applicative verb (Marantz 1993). PPs are in a different minimal domain than nominatives only in raising constructions because they are arguments of the main verb while the subject raises out of the embedded clause. In monoclausal constructions (passives, unaccusatives) PPs and nominative themes are both in the minimal domain of the lexical verb.
Collins (1997) and Alexiadou & Anagnostopoulou (1998b) argue that locative and dative PPs are visible for EPP-Attract. Evidence for this comes from the fact that they may undergo EPP-driven movement in constructions like locative inversion (Bresnan & Kanerva 1989 among many others) and dative inversion (den Dikken 1995). In the present system, this means that dative PPs have a D (EPP) feature that can be attracted by T. When the PP moves, it checks the EPP feature of T. Case and the φ-features of T are checked by the other argument.
Dative DPs also provide evidence that they can be attracted to T. In many languages, dative arguments must become subjects in passive ditransitives.3 Depending on whether a language has a distinction between a dative (lexically specified) and an accusative morphological case, dative arguments become subjects retaining their lexically specified dative when the language has a three-way case/agreement system, while they exchange their accusative (or object agreement) with nominative morphology (or subject agreement) when the language has a two-way case/agreement system.
In the former class of languages, the other argument surfaces with nominative case. In the latter class of languages, the other argument surfaces with a case which by some has been characterized as inherent/oblique accusative (Larson 1988, Pesetsky 1995), by others as no Case (Baker 1988, 1996) and by others as structural accusative (McGinnis 1998). In the present framework, this means that dative DPs with morphologically specified case have a D feature that can be attracted to T, while the nominative argument checks Case and the φ-features of T. On the other hand, indirect object DPs which become nominative under passivization check all formal features of T. Greek is a language with a three-way distinction and for this reason, I will assume that genitive DPs have a D feature that can be attracted by T, like PPs and like quirky datives in Icelandic. Genitive DPs are not attracted
for Case/φ-features since they never surface with nominative and they never agree with the verb:
(17) *I Maria dothike to vivlio
     The Mary(Nom) was given the book(Acc)
     ‘Mary was given the book’
Romero & Ormazabal (1998) have made the important observation that languages with a two-way case system have unaccusatives that do not license the double object construction, while languages with a three-way case system have unaccusatives that license the double object construction. In the former group of languages, unaccusatives differ from passives in permitting neither NP-movement of the goal nor NP-movement of the theme; that is, unaccusatives in languages with a two-way case system are obligatorily related to the PP-dative construction (Baker 1993). In the latter group of languages, unaccusatives behave syntactically like passives. Greek is a language with a three-way system, and unaccusatives behave exactly like passives. In what follows, I will concentrate on Greek and I will discuss passives and unaccusatives on a par.
With these in mind, I now turn to the distribution of Genitive DPs and PPs in monoclausal and biclausal NP-movement constructions. Recall that the alternation between the PP-construction and the Genitive construction in Greek corresponds to the dative alternation in English. Greek se-datives are the counterparts of English to-datives, and Greek genitive goals are the counterparts of double object goals in English.
Even though there is controversy in the literature as to whether the two constructions are transformationally related or not, and what the correct representation is for the double object construction (see the various proposals discussed in Barss & Lasnik 1986, Larson 1988, Baker 1988, 1996, Marantz 1993, Pesetsky 1995, den Dikken 1995), there is a growing consensus that the double object construction involves a zero affix introducing the goal argument,4 which explains, among other things, why nominalizations (18) and adjectival passives (19) related to the double object construction are ungrammatical (Myers’s Generalization effect,5 see Pesetsky 1995 and Marantz 1993):
(18) a. *Sue’s gift of Mary of a book
     b. Sue’s gift of a book to Mary
(19) a. hand-made cookies
     b. *flower-given boss
The fact that double object constructions and applicative constructions found in e.g. Bantu languages have identical syntactic properties, and that in applicative constructions this affix is overt (the applicative affix), further supports this analysis. Marantz (1993) argues that this applicative affix is a light v introducing the goal argument, which is merged on top of the lexical V introducing the theme, resulting in a “stacked VP” structure in which the theme is introduced by V, the goal by an applicative v and the agent by a causative v (Chomsky 1995 building on Hale & Keyser 1993, Kratzer 1994):
(20)
            vP
           /  \
       John    v’
              /  \
             v    vP
             |   /  \
          gave Mary  v’
                    /  \
                vAPPL   VP
                       /  \
                      V    a book
In such a structure, it is clear why the theme cannot move across the goal to T in passives and unaccusatives given the definition of equidistance in (16c). The goal is neither in the same minimal domain as the target (Spec,TP) nor in the same minimal domain as the theme since the goal is the specifier of vAPPL and the theme is the complement (or specifier, Marantz 1993) of V. Thus, this structure6 together with the assumption that Genitive DPs have a D feature that can be attracted by T, accounts for the fact that they are ruled out in Greek passives and unaccusatives (Table 1) in terms of Attract Closest. In (11) and (12) the nominative cannot raise to T across an intervening dative which is closer to T than the nominative and has a D feature that can be attracted by T. The same analysis can be extended to the raising examples in (13), though we will see that it is not even necessary to appeal to vAPPL in order to account for the ungrammaticality of (13). Coming to PPs, the well-formedness of passives and unaccusatives related to the PP construction in (14a)–(14c) is due to the fact that PPs and nominatives are equidistant from T since they are in the same minimal domain.
Crosslinguistic evidence for the fact that PP-goals and themes are equidistant from T comes from the fact that PP datives in English permit optional movement of either the theme or the goal argument in passives, a fact which can be explained in terms of Local Economy (Collins 1997):7
(21) a. A book was given to Mary
     b. To Mary was given a book
Moreover, there is no evidence from nominalizations and adjectival passives that there is an extra head in PP dative constructions. In this account, the grammaticality of PP datives is expected regardless of whether the correct structure is one in which the theme commands the goal (22a), as proposed by Larson (1988), or one in which the goal commands the theme underlyingly (22b), as argued for in Pesetsky (1995) on the basis of backward anaphora:
(22) a.
      T-max
      /   \
     T     VP
          /  \
     DPnom    V’
             /  \
            V    PP
     b.
      T-max
      /   \
     T     VP
          /  \
        PP    V’
             /  \
            V    DPnom
And in fact, both structures are compatible with the facts in Greek.8 As shown in (23), an indirect object quantifier can precede the direct object and bind a pronoun in it, and, conversely, a direct object quantifier can precede the indirect object and bind a pronoun in it. Similar results are obtained by the Each … the Other test (Barss & Lasnik 1986), but the relevant facts are omitted here due to space limitations:
(23) a. Estila se kathe ipallilo_i tin epitagi tu_i (PP > DP)
        Sent-1sg to every employee(PP) the paycheck his(Acc)
        ‘I sent to every employee his paycheck’
     b. ??Estila ston katoxo tu_i kathe check_i (PP > DP)
        Sent-1sg to the owner its(PP) every check(Acc)
        ‘I sent to its owner every check’
     c. Estila kathe check_i ston katoxo tu_i (DP > PP)
        Sent-1sg every check(Acc) to-the owner his
        ‘I sent every check to his owner’
     d. ??Estila tin epitagi tu_i se kathe ipallilo_i (DP > PP)
        Sent-1sg the paycheck his(Acc) to every employee
        ‘I sent his paycheck to every employee’
In raising constructions, the PP is an optional argument of the matrix verb fenete ‘seem’, while the subject raises out of the embedded clausal complement (small clause or subjunctive):9
(24)
        TP
       /  \
   spec    T’
          /  \
         T    VP
             /  \
   PP (ston Petro)  V’
                   /  \
        V (fenete)     IP/SC
                      /     \
               Jiannis       I’/X’
                               |
                              I/X
Since in (24) the nominative argument and the c-commanding PP are in different minimal domains, the raising examples in (14d) and (14e) are correctly predicted to be ungrammatical. Note that the asymmetry between monoclausal and biclausal constructions in the case of PPs is a strong argument for locality in terms of minimal domains and equidistance rather than c-command, especially if the underlying structure of PP-datives is (22b). With these in mind, I will now turn to an analysis of clitic constructions.
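The reasoning of this section, for undoubled genitive DPs and PPs, can be summarized as a comparison of minimal domains. The sketch below is only an illustration: the dictionaries and the function name are hypothetical, the string labels stand in for the minimal domains discussed above, and it assumes throughout that the dative bears a D feature attractable by T.

```python
# Minimal domain of the dative and of the nominative in each construction,
# following section 3.2. The string labels are illustrative, not the author's.
DATIVE_DOMAIN = {
    ("genitive DP", "passive/unaccusative"): "vAPPL",  # specifier of applicative v
    ("genitive DP", "raising"): "matrix clause",
    ("PP", "passive/unaccusative"): "V",               # argument of the lexical verb
    ("PP", "raising"): "matrix clause",                # argument of fenete 'seem'
}
NOMINATIVE_DOMAIN = {
    "passive/unaccusative": "V",        # theme of the lexical verb
    "raising": "embedded clause",       # subject of the small clause or subjunctive
}


def nominative_can_raise(dative, construction):
    """The dative bears an attractable D feature, so the nominative may raise
    past it only if both are in the same minimal domain (equidistance)."""
    return DATIVE_DOMAIN[(dative, construction)] == NOMINATIVE_DOMAIN[construction]


for construction in NOMINATIVE_DOMAIN:
    for dative in ("genitive DP", "PP"):
        mark = "ok" if nominative_can_raise(dative, construction) else "*"
        print(f"{construction:22} {dative:12} {mark}")
```

Under these assumptions the loop marks only the PP in passives and unaccusatives as "ok", which matches the Genitive DP and PP columns of Table 1 for NP-movement contexts.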
3.3. Dative clitics move to T before lower nominatives
In the previous section, I argued that the distribution of dative arguments in NP-movement constructions is determined by Attract Closest. In passives and unaccusatives, DPs are ruled out because they are introduced by a light applicative head, thus blocking NP movement of the lower theme argument to T. PPs, on the other hand, are licit because they are merged in the same minimal domain as themes. In raising constructions, both types of datives are ungrammatical since they occur in the main clause while the nominative raises out of the embedded clause.
As shown in (15), clitics and clitic doubled DPs are grammatical in all NP movement contexts. Given that undoubled genitives are ungrammatical, the result is that while cliticization and doubling of genitives are optional in transitive contexts, they are obligatory in passives, unaccusatives and raising constructions. The well-formedness of the examples in (15), with clitic doubling and cliticization, suggests that cliticized and clitic doubled DPs in Greek are always ignored for the purposes of Move/Attract. Even in raising constructions, where dative arguments are clearly higher than the arguments undergoing NP-movement and where PPs are impossible, cliticized/clitic doubled DPs are well formed. This is surprising since cliticized/clitic doubled datives are clearly DPs having a D-feature, which are at least as high as their non-doubled counterparts, or even higher. And yet, we must conclude that Attract Closest is not violated. This leads to a re-interpretation of the Generalization in (1) as in (25):
(25) When a lower nominative argument moves to T over a higher argument and the higher argument moves to T as well, as a clitic, there is no violation of locality.
To account for (25), I propose that in clitic constructions, the clitic moves to T before the lower nominative.
Consider (26), where the nominative argument cannot move to T across the genitive DP because the genitive has a D feature which can be attracted to T and is closer to T than the nominative:
(26) [Diagram: the nominative is blocked from moving to T across the intervening genitive DP]
If the genitive moves to T before the nominative, as is illustrated in (27), then Attract Closest is respected:
(27) [Diagram: Step I, the genitive moves to T; Step II, the nominative moves to T]
I argue that the well formed examples with clitic doubling and cliticization involve the derivation in (27). Starting from simple clitics, according to several analyses of cliticization (Kayne 1991 and much subsequent literature), pronominal clitics are attached to a functional head position in which the verb is found, as a result of V-to-I movement. Assuming this head to be T, clitics attach to the complex v-T head, resulting in (28):10
(28)
        T
       / \
     cl   T
         / \
        V   T
Nominative arguments also move to T to check phi-features and Case. This means that in a construction containing a nominative DP and a pronominal clitic both arguments target a single head T. Since both arguments move to the same head, the order of their movements is determined by Attract Closest. T prefers to attract the argument which is closer to it first. Movement of the lower argument will follow anyway, but on the assumption that the grammar cannot look ahead in the derivation, this is irrelevant to the choice of which of the two arguments will move first. In a construction without an external argument the pronominal dative clitic is merged higher than the nominative and it moves first followed by the nominative argument. T attracts the clitic first because it is closer to it. Once it is in T, the clitic no longer interferes with the movement of the nominative. Such a derivation is crucially different from a derivation without cliticization. At the point where the nominative moves, the dative argument no longer is in its base position. It is in T and therefore, it does not block movement of the nominative to T. This accounts for the difference between NP movement constructions in which the dative surfaces as a clitic, which are well formed, and NP movement constructions in which the dative surfaces as a Genitive DP, which are ungrammatical. I further extend this analysis to clitic doubling constructions. I argue that the clitic is a spell-out of formal features of the full argument it doubles. Thus, even though the genitive phrase is in a position between the nominative and T, its D feature has moved “out of the way” of the nominative argument.11 Clitic doubling is, on this view, a “sign” of D-feature movement without phrasal pied piping. Evidence for this analysis comes from the fact that, as noted and discussed in detail in Alexiadou & Anagnostopoulou (1999), the presence of doubling clitics affects binding relationships among DPs. 
Specifically, clitic doubling systematically obviates Weak Crossover effects. Though Greek has WCO effects, they are systematically absent when the lower phrase undergoes clitic doubling. The basic contrast is illustrated below:
(29) a. Kathe mitera sinodepse to pedhi tis sto sxolio
        Every mother accompanied the child hers at school
        ‘Every mother accompanied her child to school’
     b. ?*I mitera tu sinodepse to kathe pedhi sto sxolio
        The mother his accompanied the every child at school
        ‘?*His mother accompanied every child to school’
(30) a. Kathe mitera to sinodepse to pedhi tis sto sxolio
        Every mother cl-acc accompanied the child hers at school
        ‘Every mother accompanies her child at school’
     b. I mitera tu to sinodepse to kathe pedhi sto sxolio
        the mother his cl-acc accompanied the every child at school
        ‘His mother accompanied each child at school’
(30a) shows that the subject binds into the clitic doubled object and (30b) shows that the clitic doubled object also binds into the subject. Crucially, in the absence of a doubling clitic in (29) the usual subject-object asymmetry arises. The subject can bind into the object while the reverse is not possible. Alexiadou & Anagnostopoulou (1999) argue that the mutual binding effects in (30) are due to the fact that there is movement of the object across the subject, which is signified by the clitic, and optional reconstruction of the preverbal subject to its VP-internal position, which is lower than the moved object. On this account, the backward variable binding effects found in Greek clitic doubling constructions are assimilated to comparable effects found in English raising constructions as opposed to control constructions discussed in Fox (1998):12
(31) a. His father seems to every boy [t to be a genius]
     b. Every woman seems to her son [t to be a genius]
(32) a. ??His father wrote to every boy [PRO to be a genius]
     b. Every father wrote to his boy [PRO to be a genius]
Chomsky (1995: 272–275) suggests that if feature movement exists, we expect it to show binding effects because binding involves a relation between formal features (D and phi-features) of DPs.
He argues that this is correct on the basis of binding evidence found in ECM constructions and control evidence found in there-type expletive constructions. However, Lasnik (1996) points out that in expletive-associate chains feature movement does not affect binding, on the basis of the contrasts in (33) and (34) with anaphora and WCO effects respectively:
(33) a. *There seem to each other [t to have been many linguists given good job offers]
     b. Many linguists seem to each other to have been given good job offers
(34) a. *There seems to his lawyer to have been some defendant at the scene
     b. Some defendant seems to his lawyer to have been at the scene
Lasnik (1996) concludes that expletive constructions provide no evidence that feature movement affects binding, a conclusion which, I believe, is valid. Clitic doubling in Greek, however, has a clear and systematic binding effect. Thus Greek provides evidence that feature movement creates new binding configurations. I propose that the difference between expletive constructions and clitic doubling constructions is that the former involve just phi/N-feature movement while the latter involve D feature movement. That expletive constructions do not involve D feature movement is uncontroversial. This is the standard way of analysing Definiteness Restriction effects: it is assumed that the expletive has a D feature satisfying the EPP and needs to combine with an N feature, this being the reason why the associate cannot be definite or headed by a strong determiner. That clitic doubling constructions involve raising of a D feature is exactly what we need to assume in order to account for the lack of intervention effects in NP-movement constructions. I conclude that binding is affected only when there is D feature movement, not otherwise. This means that the D feature is the formal feature relevant for binding relations.13
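The ordering argument of section 3.3 can be illustrated with a short sketch of the two-step derivation. It is a simplification under stated assumptions, not the author's formalism: the function `derive`, the set `CLITIC_KINDS` and the string labels are hypothetical, and the only properties modelled are whether the D feature of the dative moves to T as a clitic and whether dative and nominative share a minimal domain.

```python
# Kinds of datives whose D feature moves to T (as a head or as a set of features).
CLITIC_KINDS = {"clitic", "clitic-doubled DP"}


def derive(dative_kind, same_minimal_domain):
    """Return the order of movements to T for a dative above a nominative.

    Raises ValueError if the nominative would have to cross a dative whose
    D feature is still in place and which is not equidistant from T.
    """
    moves = []
    if dative_kind in CLITIC_KINDS:
        # Step I: T attracts the closer D feature first; afterwards the
        # dative no longer intervenes between T and the nominative.
        moves.append("D feature of dative -> T")
    elif not same_minimal_domain:
        raise ValueError("Attract Closest: dative intervenes")
    # Step II: the nominative moves to T to check Case and phi-features.
    moves.append("nominative -> T")
    return moves


print(derive("clitic", False))   # both steps, in that order
print(derive("PP", True))        # equidistant PP: only the nominative moves
try:
    derive("genitive DP", False)
except ValueError as err:
    print("derivation blocked:", err)
```

With `same_minimal_domain` set to `False` for every dative in raising contexts and to `True` only for PPs in passives and unaccusatives, the sketch yields the NP-movement rows of Table 1.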
Acknowledgements
I would like to thank Artemis Alexiadou, David Embick, Martin Everaert, Danny Fox, Martin Hackl, Sabine Iatridou, Tony Kroch, Winfried Lechner, Alec Marantz, Martha McGinnis, Norvin Richards, Henk van Riemsdijk, Philippe Schlenker, and especially David Pesetsky for discussion, comments and suggestions. This paper has also been published in the proceedings of NELS 29, Papers from the Poster Session. I have changed nothing in the original paper.
Notes
1. In Anagnostopoulou (1998) it is argued in detail that Greek has raising across subjunctive na-complements. Greek also has control subjunctives (Iatridou 1988/1993, Terzi 1992 among others). It would lead us too far afield to address the issue here.
2. Similar facts are found in French and Italian. See McGinnis (1998) for a detailed analysis in terms of featural locality.
3. As is well known, the situation is very complicated with double object constructions crosslinguistically. Languages differ with respect to whether they only allow goal passivization (asymmetric double object languages) or both goal and theme passivization (symmetric double object languages). A further division is between languages that do not have a morphological distinction between dative and accusative case and languages that do have such a distinction. Among languages that have a morphological distinction between dative and accusative case, there are languages like Icelandic in which the dative argument becomes the subject in passives retaining its morphological case (quirky subject) and languages like Albanian in which the theme argument becomes subject. It is not possible to discuss all these cases here due to space limitations. The reader is referred to Baker (1988), Marantz (1993), McGinnis (1998) and Anagnostopoulou (1998) among many others for detailed discussion.
4. Pesetsky (1995) argues that the zero affix introduces the theme argument but his arguments crucially rely on the assumption that the Case of the theme-argument is exceptional. This might be correct for asymmetric double object languages which lack a morphological distinction between a dative and an accusative (English) but cannot be extended to symmetric double object languages and, especially, dative-accusative languages.
5. According to Myers’s Generalization, zero-derived verbs do not permit affixation of further derivational morphemes (Pesetsky 1995: 128). On the assumption that double object constructions are formed on the basis of a zero affix, it follows that derivational processes like nominalizations and adjectival passive formation cannot take as basis a double object construction because e.g. the nominalizing affix will attach to the verb plus the zero applicative affix.
6. In passives and unaccusatives the structure is identical, except that the causative v is not projected, and either there is no v at all or there is an intransitive v. Collins (1997) argues on the basis of the position of the verb in English unaccusatives that there is an intransitive v to which the lexical verb raises, and Marantz (1997) argues on independent grounds for the same. I believe they are right. However, in the structures to follow I abstract away from intransitive v for reasons of space.
7. Note that the grammaticality of the examples in (21) is an argument that double object constructions involve an applicative head introducing the goal. The fact that in the double object construction the goal blocks NP-movement of the theme cannot be accounted for simply in terms of c-command, or else we would incorrectly predict either (21a) or (21b) to be ungrammatical.
8. There are some reasons to believe that (22b) is the correct underlying structure, as discussed in detail in Anagnostopoulou (1998). For present purposes, however, both structures would do.
9. In (24) the head and the category of the small clause are left vague. In an Agr-based system it could be an AgrP (Chomsky 1995: 353). In a system that does away with Agr projections, it could either be an AP (Stowell 1983, Chomsky 1995: 353–4), or even a VP with a V mediating the relation between the subject and the predicate (Hale & Keyser 1997). It doesn’t matter which one as long as the SC is headed.
10. In (28) I abstract away from clitic clusters. Following Richards (1997), I assume that when two clitics move they target the same head T, resulting in crossing paths, which are analysed in terms of “tucking in”. The higher clitic moves first to T because it is closer to it (Attract Closest), and the lower clitic moves second, “tucking in” to a position beneath the first one as a result of Shortest Move. This analysis gives the correct result that in Greek (as in many other languages) the order of clitics is strictly genitive>accusative. Note that, as pointed out by Bonet (1991), Greek is one of the languages in which the order of clitics does not appear to be determined by morphological factors such as sensitivity to person features. Thus, Greek can be plausibly claimed to be a language in which the order of clitics reflects their syntax and is not altered by requirements imposed by the morphological component. Note that the surface structure resulting from (27) does not reflect the order in which the arguments have moved. This is so because the nominative moves as an XP while the dative moves as a head or as a set of features. It appears that the base order among arguments is preserved when all arguments undergo the same type of movement: they uniformly undergo XP or clitic movement.
11. Pesetsky (1998) interprets these facts in terms of Richards’ (1997) Principle of Minimal Compliance.
12. Fox argues that the control sentences are deviant because of WCO under the assumption that QR involves A’-movement. The raising sentences, on the other hand, are acceptable. This is explained if we assume that QR is not necessary to get scope for the universal quantifier (because then we would expect a WCO effect to obtain). In turn, this suggests that the well-formed raising examples involve Scope Reconstruction.
13. Another possibility is to suggest that expletive constructions are instances of Agree (Chomsky 1998) and do not involve feature movement to a higher position. In expletive constructions, the associate remains in its base position and, therefore, a new binding configuration cannot be created because c-command remains the same. On the other hand, in clitic doubling the clitic undergoes actual raising, which has a clear PF reflex, and therefore it does c-command an originally higher DP.
References
Alexiadou, Artemis & Elena Anagnostopoulou
1998a Parametrizing AGR: Word Order, Verb-movement and EPP checking. Natural Language and Linguistic Theory 16: 491–539.
1998b The subject in situ generalization, and the role of Case in driving computations. Paper presented at the 21st GLOW Colloquium in Tilburg.
1999 Clitic phenomena and (non-)configurationality. To appear in D. Jung and J. Helmbrecht (eds) Pronominal Arguments: Morphology and Syntax. John Benjamins.
Anagnostopoulou, Elena
1998 On dative alternations and clitics, ms., University of Crete.
Baker, Mark
1988 Incorporation. Chicago, Illinois: University of Chicago Press.
1993 Why Unaccusative Verbs cannot Dative-Shift. In Proceedings of the North East Linguistic Society 23, University of Ottawa. A. J. Schafer (Ed.), Graduate Linguistic Student Association: 33–47.
1996 The Polysynthesis Parameter. Oxford: Oxford University Press.
Barss, Andrew & Howard Lasnik
1986 A Note on Anaphora and Double Objects. Linguistic Inquiry 17: 347–354.
Belletti, Adriana & Luigi Rizzi
1988 Psych verbs and Theta-Theory. Natural Language and Linguistic Theory 6: 291–352.
Bonet, Eulalia
1991 Morphology after Syntax: Pronominal Clitics in Romance. Ph.D. Dissertation, Cambridge, MA: MIT.
Bresnan, Joan & J. Kanerva
1989 Locative Inversion in Chichewa: a Case study of Factorization in Grammar. Linguistic Inquiry 20: 1–50.
Chomsky, Noam
1995 The Minimalist Program. Cambridge, MA: MIT Press.
1998 Minimalist Inquiries: the Framework, ms., MIT.
Collins, Chris
1997 Local Economy. Cambridge, MA: MIT Press.
Dikken, Marcel den
1995 Particles: on the Syntax of Verb-particle, Triadic and Causative constructions. Oxford: Oxford University Press.
Dimitriadis, A.
1999 On Clitics, Prepositions and Case Licensing in Standard and Macedonian Greek. In A. Alexiadou, G. Horrocks and M. Stavrou, eds., Studies in Greek Syntax: 95–113. Dordrecht: Kluwer.
Fox, Danny
1998 Economy and Semantic Interpretation. Ph.D. Dissertation, Cambridge, MA: MIT. To appear as a joint publication of MIT Press and MITWPL.
Hale, Ken & Samuel J. Keyser
1993 On Argument Structure and the Lexical Expression of Syntactic Relations. In K. Hale & S. J. Keyser (Eds.), The View from Building 20: 53–110. Cambridge, MA: MIT Press.
1997 The Basic Elements of Argument Structure. Ms. MIT.
Iatridou, Sabine
1988/1993 On Nominative Case assignment and a few related things. In MITWPL 18 (1993). Papers on Case & Agreement II, Cambridge, MA: MIT.
Kayne, Richard
1991 Romance Clitics, Verb Movement, and PRO. Linguistic Inquiry 22: 647–686.
Kratzer, Angelika
1994 On External Arguments. In E. Benedicto & J. Runner, eds., Functional Projections: 103–130. Amherst: GLSA.
Larson, Richard
1988 On the Double Object Construction. Linguistic Inquiry 19: 335–392.
Lasnik, Howard
1996 On Certain Structural Aspects of Anaphora. Paper presented at the First Linguist On-Line Conference: Geometric and Thematic Structure in Binding, November 1996.
Marantz, Alec
1993 Implications of asymmetries in double object constructions. In S. A. Mchombo, ed., Theoretical Aspects of Bantu Grammar: 113–150. Stanford: CSLI Publications.
1997 No escape from syntax: Don’t try morphological analysis in the privacy of your own lexicon. Paper presented at the 21st Penn Linguistics Colloquium. To appear in the University of Pennsylvania Working Papers in Linguistics, 4.2. University of Pennsylvania.
Markantonatou, S.
1994 Diptota Rimata: Mia Lexico-Semasiologiki Prosegisi. Proceedings of the 15th Annual Meeting of the Department of Linguistics of the University of Thessaloniki, Thessaloniki.
McGinnis, Martha
1998 Locality in A-Movement. Ph.D. Dissertation, Cambridge, MA: MIT.
Pesetsky, David
1995 Zero Syntax. Cambridge, MA: MIT Press.
1998 Handout distributed at MIT Class Lectures, Fall 1998. (joint work by D. Pesetsky & E. Torrego).
Pinker, Steven
1989 Learnability and cognition: The acquisition of argument structure. Cambridge, MA: MIT Press.
Richards, Norvin
1997 What Moves Where When in Which Language. Ph.D. Dissertation, Cambridge, MA: MIT.
Romero, Juan & Javier Ormazabal
1998 Morphological restrictions and other syntactic matters. Paper presented at the Linguistics Colloquium at the University of Pennsylvania. November 13, 1998.
Stowell, Tim
1983 Subjects across categories. The Linguistic Review 2: 285–312.
Terzi, Arhonto
1992 PRO in Finite Clauses. A study of the Inflectional Heads of the Balkan Languages. Ph.D. Dissertation, New York: CUNY.
PF merger in stylistic fronting and object shift
Željko Bošković
In this paper I examine stylistic fronting and object shift in Scandinavian. I show that several otherwise puzzling properties of Scandinavian stylistic fronting and object shift, most notably, the subject gap restriction on the stylistic fronting construction and the saving effect of V-topicalization on object shift in auxiliary+participle constructions, discussed by Holmberg (1999), can be accounted for in a principled way under PF merger analyses of these constructions. For the object shift construction, I essentially follow Bobaljik’s (1994, 1995) PF merger analysis. For the stylistic fronting construction, I provide a new PF merger analysis, which is extended to the adjacency effect in Bulgarian wh-questions. The PF merger analyses of the Scandinavian constructions in question are shown to provide evidence that adverbs interfere with PF merger and to provide an argument for the multiple spell-out hypothesis. In section 1 of the paper I discuss stylistic fronting. In section 2 I discuss object shift. Section 3 is the conclusion.
1. Stylistic fronting
1.1. The PF merger analysis of stylistic fronting
Stylistic fronting in Icelandic affects a variety of different elements, including participles, adjectives, adverbs, particles, and prepositions. ((1b-c) are taken from Maling 1980/1990 and (1d-g) from Jónsson 1991. The elements undergoing stylistic fronting are underlined. Stylistic fronting is also found in Faroese and Old Scandinavian.)
(1)
a. Þetta er maður sem ekki hefur leikið nítíu leiki.
   this is a man that not has played ninety games
   ‘This is a man that has not played ninety games.’
b. Það var hætt að rigna þegar komið var þangað.
   it was stopped to rain when arrived was thither
   ‘It had stopped raining when they/we arrived there.’
c. Þetta er bærinn þar sem fæddir eru margir frægustu menn þjóðarinnar.
   this is the town where born are many most-famous men the nation(gen)
   ‘This is the town where many of the most famous men of the nation were born.’
d. Þetta eru tillögurnar sem um var rætt.
   these are the proposals that about was discussed
   ‘These are the proposals that were discussed.’
e. Þegar fram fara kosningar er alltaf mikið fjör.
   when forth go elections is always a lot action
   ‘When elections are held, there is always a lot of action.’
f. Sá sem fyrstur er að skora mark fær sérstök verðlaun.
   he that first is to score goal gets special prize
   ‘The first one to score a goal gets a special prize.’
g. Þetta er versta bók sem skrifuð hefur verið.
   this is the worst book that written has been
   ‘This is the worst book that has been written.’
Maling (1980/1990) observes a curious restriction on stylistic fronting: the subject in sentences involving stylistic fronting cannot be lexically realized in its canonical position (SpecIP). Thus, (1a), where the subject is a wh-trace (see also (1d,f,g)), contrasts with (2a-b) with respect to the possibility of stylistic fronting of the negative element, whose base-generated position is given in (2c).
(2)
a. *Ég held að Halldór ekki hafi séð þessa mynd.
   I think that Halldor not has seen this film
b. *Ég held að ekki Halldór hafi séð þessa mynd.
c. Ég held að Halldór hafi ekki séð þessa mynd.
   ‘I think that Halldor has not seen this film.’
The null subject of the stylistic fronting construction can also be an expletive, as shown by (1b). (The alternative is that the embedded clause does not have a subject at all.) A lexical subject can also appear in a stylistic fronting construction if located to the right of SpecIP, which is then presumably also filled by a null expletive. This is illustrated in (1c,e).
Several authors (see Maling 1980/1990, Otósson 1989, Platzack 1987, Rögnvaldsson and Thráinsson 1990, Holmberg 2000, and Hiraiwa 2001, among others) have tried to account for the subject gap restriction by assuming that the landing site of stylistic fronting is the subject position (SpecIP). This analysis is obviously problematic. Given the kind of elements that are affected by stylistic fronting (see (1)), it seems implausible that its landing site is the subject position, SpecIP. Also, it is far from clear that SpecIP would be free for, for example, the negative marker to move to in constructions like (1a). In fact, SpecIP should be filled by a trace of the null operator/ relative head. Notice also that the analysis in question rests on the assumption that heads can move to specifiers, which is standardly assumed not to be allowed. For another very serious problem with the analysis the reader is referred to fn. 13. Several authors (see Holmberg and Platzack 1995, Jónsson 1991, Poole 1992, 1996, Santorini 1994, among others) have proposed that stylistic fronting involves adjunction to I, where the finite verb is located. This analysis cannot account for the subject gap restriction (for relevant discussion, see Fischer and Alexiadou 2001 and Holmberg 2000, among others). The above considerations strongly argue against both the movement to SpecIP and the adjunction to I analyses of stylistic fronting. I conclude therefore that we need a new analysis of the phenomenon. 
I will now show that the subject gap restriction on stylistic fronting can be accounted for in a principled way if the stylistic fronting construction involves a phonologically null head which is lexically specified as being a verbal affix.1 The analysis will be based on probably the oldest surviving analysis of generative syntax, Chomsky’s (1957) mechanism of affix hopping, revived recently in Halle and Marantz (1993), Bobaljik (1994, 1995), Lasnik (1995), Bošković (in press b) and Bošković and Lasnik (2003), among others. In the recent instantiations, the mechanism is treated as a morphophonological rule that involves merger between an affix and its host in PF under adjacency. Merger is blocked by intervening phonologically realized elements, but not by phonologically null elements such as traces and pro. To illustrate how the mechanism works, consider (3a-c), whose structures before PF merger and do-support are given in (4).
(3)
a. John laughed.
b. *John not laughed.
c. John did not laugh.
(4)
a. [IP John_i I (ed) [VP t_i laugh]]
b. [IP John_i I (ed) [NegP not [VP t_i laugh]]]
Assume that English I is a verbal PF affix, hence must merge with a verbal element in PF under adjacency. The adjacency requirement is not met in (4b) due to the intervening negative head, which blocks PF merger. Do-support, a last resort operation, then takes place to save the stranded affix, deriving (3c). In (4a), the merger is not blocked since no phonologically realized element intervenes between I and the verb. I then merges with the verb, deriving (3a). Returning now to the subject gap restriction on stylistic fronting, one way to look at it is to consider it an instance of an adjacency (i.e. affixation) relation with the verb. In other words, the target of stylistic fronting must be adjacent to a verb, the most straightforward interpretation of which is that it is a verbal affix. I therefore propose that elements affected by stylistic fronting move to a functional projection right above IP (as discussed below, the movement involves leftward head-adjunction), whose head, call it F, is a verbal affix.2 Being a verbal affix, F must merge under PF adjacency with a verb.3 It follows then that a lexically realized subject cannot intervene between a stylistically fronted element and the verb. Nothing, however, prevents phonologically null subjects from doing so. The relevant structures for (1a-b) and (2b) are given in (5). (5)
   a. Þetta er maður sem ekki F t hefur leikið nítíu leiki.
                              |___|
      this is a man that not has played ninety games
   b. *Ég held að ekki F Halldór hafi séð þessa mynd.
                       |____*____|
      I think that not Halldor has seen this film
   c. Það var hætt að rigna þegar komið F pro var þangað.
                                        |_____|
      it was stopped to rain when arrived was thither
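The adjacency logic behind (4) and (5) can be made concrete with a small sketch. It is only an illustration under simplifying assumptions: the linear string is encoded as a flat list, the names `Item` and `pf_merger_succeeds` are hypothetical, and the only property that matters for intervention is whether an element is phonologically overt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One terminal in the linear string handed to PF (hypothetical encoding)."""
    form: str
    overt: bool = True      # False for traces, pro, and null heads
    verbal: bool = False    # True for verbs and auxiliaries
    affix: bool = False     # True for heads lexically specified as verbal affixes


def pf_merger_succeeds(items):
    """Return True if every verbal affix can merge with a verb under adjacency.

    Assumption of this sketch: adjacency is checked on the linear string, and
    only phonologically overt items can intervene between affix and host.
    """
    for position, item in enumerate(items):
        if not item.affix:
            continue
        # First overt item to the right of the affix, skipping null elements.
        host = next((x for x in items[position + 1:] if x.overt), None)
        if host is None or not host.verbal:
            return False  # the affix is stranded
    return True


F = Item("F", overt=False, affix=True)       # null head of the stylistic fronting construction
INFL = Item("I(-ed)", affix=True)            # English I as a verbal PF affix
TRACE = Item("t", overt=False)
PRO = Item("pro", overt=False)

examples = {
    "(4a)": [Item("John"), INFL, TRACE, Item("laugh", verbal=True)],
    "(4b)": [Item("John"), INFL, Item("not"), TRACE, Item("laugh", verbal=True)],
    "(5a)": [Item("ekki"), F, TRACE, Item("hefur", verbal=True)],
    "(5b)": [Item("ekki"), F, Item("Halldór"), Item("hafi", verbal=True)],
    "(5c)": [Item("komið"), F, PRO, Item("var", verbal=True)],
}

results = {label: pf_merger_succeeds(items) for label, items in examples.items()}
for label, ok in results.items():
    print(label, "merger succeeds" if ok else "merger blocked")
```

In this encoding merger is blocked exactly in (4b) and (5b), where an overt element (not, Halldór) stands between the affix and the verb. The failure in (4b) corresponds to the configuration that do-support repairs in English.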
The subject gap restriction on the stylistic fronting construction is thus accounted for. This is done in a rather straightforward manner without assuming any theoretically anomalous mechanisms, which alternative accounts are quite generally forced to do.4 The current analysis, which treats stylistic fronting as syntactic movement but holds PF responsible for the subject gap restriction, also resolves a serious problem that the apparent optionality of stylistic fronting
raises for the current theoretical framework, which has no natural place for truly optional syntactic movement. (For discussion of optionality of stylistic fronting, see also Poole 1996.) (6)
a. Þetta er maður sem ekki_i+F hefur t_i leikið nítíu leiki.
   this is a man that not has played ninety games
   ‘This is a man that has not played ninety games.’
b. Þetta er maður sem hefur ekki leikið nítíu leiki.
Under the current analysis, there is no need to take (6) to indicate that stylistic fronting is a syntactically optional operation. The options in (6a-b) can be treated as a result of different lexical choices: If F is inserted into the structure, as in (6a), it obligatorily triggers stylistic fronting. When F is not inserted into the structure, which I assume is the case in (6b), stylistic fronting does not, and cannot, take place. There is then nothing optional syntactically about stylistic fronting, which is conceptually desirable from the current theoretical point of view. It is also worth noting that, as discussed in Delsing (2001), in Old Scandinavian stylistic fronting appeared to be obligatory with the relative marker sum prior to 1350, and with the relative pronoun hvilkin even after 1350. The apparent obligatoriness of stylistic fronting in the constructions in question can be readily captured in the current analysis by assuming that the head of the relative clauses in question obligatorily took FP as its complement.
1.2. Extension to Bulgarian questions
The PF merger analysis of the subject gap restriction on Icelandic stylistic fronting can be readily extended to a similar restriction on Bulgarian wh-questions. It is well-known (see Izvorski 1993, Kraskow 1994, Rudin 1986, and Bošković 2001, among others) that, as illustrated in (7), a subject cannot intervene between a wh-phrase located in SpecCP and the verb in wh-questions in Bulgarian, although, as shown convincingly in Izvorski (1993), the verb in such questions does not move to C.
(7)
a. *Kakvo Ana dade na Petko?
   what Ana gave to Petko
   ‘What did Ana give to Petko?’
b. Kakvo dade Ana na Petko?
Izvorski observes that if Bulgarian were to have I-to-C movement in wh-questions, (8b) should be acceptable, with the auxiliary moving to C across the subject in SpecIP, as in its English counterpart What has Maria forgotten about? (Notice that the auxiliary in (8) is not a proclitic on the verb, which several other auxiliary forms in Bulgarian are.)
(8)
a. Maria beše zabravila za sreštata.
   Maria was forgotten for the meeting
   ‘Maria had forgotten about the meeting.’
b. *Za kakvo beše Maria zabravila?
   for what was Maria forgotten
   ‘About what had Maria forgotten?’
c. cf. Za kakvo beše zabravila Maria?
Also, if Bulgarian were to have verbal movement to C in questions, which means that the subject following the verb can be located in SpecIP, the adverb in (9b) should have both the low, manner reading, and the high, sentential subject-oriented adverb reading, just like the adverb in (9a) and English constructions of this type. (Izvorski gives What did John carefully read, where the adverb can have either the manner or the subject-oriented adverb reading.) However, the expectation is not borne out. Based on these data, Izvorski concludes that Bulgarian wh-questions do not involve verbal movement to C. Rather, the verb is located lower than C in Bulgarian questions. Still, a subject cannot intervene between the interrogative C and the verb, as (7a) shows.5
(9)
a. Petko pravilno otgovori na vŭprosa im.
   Petko correctly answered to the question theirs
   ‘Petko did the right thing when he answered their question.’
   ‘Petko gave a correct answer to their question.’
b. Na kakvo otgovori Petko pravilno?
   to what answered Petko correctly
   ‘*What was Petko right to answer?’
   ‘What did Petko give a correct answer to?’
I conclude that Bulgarian wh-questions exhibit a subject gap restriction, similar to the subject gap restriction on Icelandic stylistic fronting. In fact, under the current analysis of stylistic fronting, the parallel is complete. In both Bulgarian wh-questions and Icelandic stylistic fronting constructions a
null head has to be adjacent to a finite verbal element located in a lower head position. I propose to account for the subject gap restriction on Bulgarian wh-questions in the same way as the subject gap restriction on the Icelandic stylistic fronting construction. In particular, I propose that the phonologically null interrogative C in Bulgarian is a verbal affix, hence must merge with a verb under PF adjacency. This straightforwardly explains the adjacency effect in (7). Although lexical subjects in Bulgarian can either move to SpecIP or stay in SpecVP overtly, the subject in wh-questions has to remain in SpecVP.6 If it moves to SpecIP, as it does in (7a), it blocks merger of the interrogative C and the verb. As a result, the affix requirement of the interrogative C cannot be satisfied.
(10) a. [CP Kakvo C [IP dade Ana na Petko]]
                  |_____|
     b. [CP Kakvo C [IP Ana dade na Petko]]
                  |____*____|
Being forced to remain in SpecVP, the subject must follow the participle zabravila in (8), and the adverb pravilno, which follows the subject, can have only the low, manner reading in (9b). (To have the high, subject-oriented adverb reading, the adverb would have to precede the verb. Notice that the participle undergoes overt movement outside of VP in Bulgarian; see Izvorski 1993 and Bošković 1997b.) The contrast between the wh-question in (7a) and the yes-no question in (11) provides a confirmation of the current analysis.
(11) Dali Ana dade na Petko knigata?
     Q Ana gave to Petko the book
     ‘Did Ana give Petko the book?’
Dali, the complementizer in yes-no questions, is clearly not a verbal affix. It is a prosodic word bearing stress and therefore is not expected to be subject to the adjacency requirement the null interrogative C is subject to under the current analysis.
The PF merger analysis of the adjacency effect in Bulgarian fits well with a conclusion concerning interrogative C-insertion in Bulgarian and Serbo-Croatian (SC) reached in Bošković (2002a), where it is argued that wh-movement must take place overtly in Bulgarian, but not in SC.7 I attribute
the difference to the timing of the interrogative C-insertion in Bulgarian and SC: the C, whose presence triggers immediate wh-movement, must be inserted in overt syntax in Bulgarian, but not in SC, hence wh-movement must take place overtly in Bulgarian, but not in SC. Why is there a difference in the timing of C-insertion between the two languages? In Bošković (2000) I suggest that the same difference exists between French and English and attribute it to a PF requirement on the interrogative C which is present in English, but lacking in French.8 In particular, I suggest that the interrogative C is a PF verbal affix in English, but not in French. As a result, the C must be inserted into the structure in overt syntax in English, but not necessarily in French. If the interrogative C were to be inserted into the structure in LF in English, which I argue is a possibility in French and results in wh-in-situ questions, the PF affix requirement could not be satisfied and the derivation would crash.9 Independent evidence for the difference between English and French is provided by the fact that S-Aux Inversion is obligatory in English, but not in French questions, as illustrated in (12). (More precisely, the fact that the interrogative C must be adjacent to a verb in PF in English, but not in French indicates that the C is a verbal affix in English, but not in French. See also Bošković 2000 for explanation why S-Aux Inversion does not take place in English embedded questions.)
(12) a. Qui tu as vu?
        whom you have seen
        ‘Who did you see?’
     b. *Who you have seen?
Bulgarian and SC differ in the same way. Thus, the counterpart of Bulgarian (7), repeated in (13a), is acceptable in SC, as illustrated in (13b).
(13) a. *Kakvo Ana dade na Petko?
        what Ana gave to Petko
        ‘What did Ana give to Petko?’
     b. Šta Ana dade Ivanu?
        what Ana gave Ivan
        ‘What did Ana give to Ivan?’
The difference between Bulgarian and SC can be accounted for if the interrogative C is a verbal affix in Bulgarian, but not in SC. The PF merger analysis thus provides a uniform account of the different behavior of the
two languages with respect to the adjacency effect in questions and the obligatoriness of overt wh-movement. It is worth noting here that, as discussed in Izvorski (1993), the adjacency effect is not present in Bulgarian relative clauses and in questions with the question word zašto ‘why’.
(14) a. Pismoto, koeto deteto napisa, e na masata.
        the letter which the child wrote is on the table
        ‘The letter which the child wrote is on the table.’
     b. Zašto Ivan napusna universiteta?
        why Ivan left the university
        ‘Why did Ivan leave the university?’
This can be interpreted as indicating that the relative C is not a verbal affix and that zašto at least can occur in C. (Notice in this respect that što can serve as a complementizer in SC (see Bibović 1971 and Browne 1980), a closely related language, and used to be able to do so in Bulgarian up to the beginning of the 20th century.) To summarize the section on Bulgarian, we have seen that the PF merger analysis explains the otherwise mysterious subject gap restriction on Bulgarian wh-questions, which is treated on a par with the subject gap restriction on the Icelandic stylistic fronting construction. In both languages, the reason for the restriction is phonological. More precisely, in both languages, a phonologically null head which is lexically specified as a verbal affix takes as its complement the IP where the verb with which it needs to merge is located. If a lexical subject is located in SpecIP the merger is blocked, resulting in a Stranded Affix Filter violation.
The fact that the current analysis provides a uniform account of the subject gap restriction on Icelandic stylistic fronting and the subject gap restriction on Bulgarian wh-questions should be taken as an argument for the analysis, especially in light of the fact that the PF merger analysis of the subject gap restriction on Bulgarian wh-questions has independent motivation, as demonstrated above.10 The above discussion also provides an illustration of the ease with which the PF merger/affix hopping mechanism captures adjacency relations between elements that belong to different syntactic projections, which syntax itself is hard pressed to deal with. In fact, the natural conclusion of the above discussion is that the PF merger/affix hopping analysis should be considered the null hypothesis when faced with adjacency relations between elements belonging to different syntactic projections, one of which is phonologically weak.
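The cross-linguistic reasoning of this section, in which a single lexical property of the interrogative C yields two correlated effects, can be summarized in a short sketch. The dictionary and function names are hypothetical, the settings are those argued for in the text for matrix wh-questions, and nothing beyond those four languages is assumed.

```python
# Assumed encoding of the lexical property discussed in the text: whether the
# phonologically null interrogative C is a PF verbal affix in each language.
INTERROGATIVE_C_IS_AFFIX = {
    "Bulgarian": True,
    "English": True,
    "Serbo-Croatian": False,
    "French": False,
}


def predictions(language):
    """Derive the two correlated properties from the single affix setting."""
    is_affix = INTERROGATIVE_C_IS_AFFIX[language]
    return {
        # An affixal C has to be present when the structure reaches PF,
        # so it cannot be inserted as late as LF.
        "C_must_be_inserted_in_overt_syntax": is_affix,
        # An overt subject between C and the verb would block PF merger.
        "C_must_be_adjacent_to_verb_at_PF": is_affix,
    }


for language in INTERROGATIVE_C_IS_AFFIX:
    print(language, predictions(language))
```

The point of the encoding is that both properties are read off one setting, so a language cannot, on this analysis, show the adjacency effect without also requiring overt insertion of C.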
1.3. PF Merger and adverbs
Returning to stylistic fronting, the PF merger analysis of stylistic fronting also gives us an insight into the effect of adverbs on PF merger. Consider (15).
(15) Mary quickly left.
On the basis of constructions such as (15) Bobaljik (1994, 1995) proposes that adverbs or, more generally, adjuncts, do not count for the purpose of PF adjacency relevant to merger. The assumption is obviously problematic. Ochi (1999), however, gives a deduction of Bobaljik’s assumption. He follows Lebeaux (1988), Chomsky (1993), Bošković and Lasnik (1999), and Stepanov (2001a,b) in assuming that adverbs (more precisely, adjuncts) can be inserted into the structure acyclically and shows that given the assumption and the multiple spell-out hypothesis, according to which the phonology has multiple derivational access to the syntax (see Bresnan 1971, Chomsky 1999, 2000, Epstein 1999, Epstein et al 1998, Uriagereka 1999, and section 2), the adverb adjacency problem disappears. (It is important to bear in mind that both multiple spell-out and late adverb insertion are crucially needed in Ochi’s account.) For example, the adverb quickly in (15), which Ochi assumes intervenes between I (more precisely, Tense) and the verb, can be inserted into the structure acyclically after the structure, with I and the verb adjacent, has already been sent to the phonology. PF merger can then take place prior to adverb insertion. The structure is sent again to the phonology after adverb insertion. However, the presence of the adverb is now irrelevant since the merger has already taken place. The derivation in question is given in (16).11
(16) a. Send John Infl leave to PF, merge Infl and leave into left.
     b. Insert the adverb in the syntax and send the structure again to PF.
Lasnik (in press) suggests an alternative analysis of constructions like (15) that does not need to say anything special about adverbs.
He suggests that adverbs like quickly (the analysis is extendable to other ‘intervening’ adverbs in English) can be located above Tense so that they do not interfere with the merger of Tense and the verb. Evidence that quickly can occur above Tense is provided by (17), given that do is located under Tense.12
(17) John said that he would leave, and he quickly did.
This analysis removes the main reason for making adverbs special when it comes to PF merger. The current analysis of stylistic fronting suggests that this is the right way to proceed. As illustrated in (18), adverbs cannot occur between a stylistically fronted element and the verb (cf. (1a)).13
(18) *Þetta er maður sem ekki í gær hefur leikið nítíu leiki.
     this is a man that not yesterday has played ninety games
     ‘This is a man that has not played ninety games yesterday.’
If we adopt the Bobaljik/Ochi analysis of the failure of adverbs to block PF merger in (15), which exempts adverbs from PF adjacency, we would have to stipulate that the adverb cannot be inserted between the stylistically fronted element and the verb (i.e., that there is no proper position for the adverb between the two). The desired result can be achieved in a more principled way under Lasnik’s analysis of the lack of the adjacency effect in (15), which does not exempt adverbs from adjacency relevant to PF merger and accounts for the lack of the adjacency effect in constructions like (15) by placing the adverb above the null head undergoing merger. Most authors (see Holmberg and Platzack 1995, Jónsson 1991, Poole 1992, 1996, Santorini 1994, among others) assume that Icelandic stylistic fronting involves head movement, which under the current analysis is instantiated as left-adjunction to F, in accordance with Kayne’s (1994) LCA. There is considerable evidence that Icelandic stylistic fronting indeed involves head movement. Thus, it is generally restricted to heads. Notice, for example, that the participle and the adjective alone undergo stylistic fronting in (19), taken from Jónsson (1991), leaving their complements behind.
(19) a. Þetta er maður sem leikið_i hefur t_i nítíu leiki.
        this is a man that played has ninety games
        ‘This is a man that has played ninety games.’
     b. Þeir sem ánægðir_i eru t_i með kaupið kvarta ekki.
        those who content are with the pay complain not
Notice also that stylistic fronting does not seem to have any semantic or pragmatic effects (see Holmberg 2000). This is not surprising under the head movement analysis since head movement generally lacks such effects (see Chomsky 1999). The fact that stylistic fronting is clause-bound also fits well with the head-movement analysis.14 Returning to (18), given that stylistic fronting in Icelandic involves head movement (i.e. leftward head adjunction to F), there is simply no space
between the stylistically fronted element and the null head undergoing merger with the verb for the adverb to intervene. No Spec position or XP/X’-adjoined position is available, as in English (15), where the adverb can be either X’-adjoined, XP-adjoined, or even located in SpecXP, depending on which assumptions concerning the split I hypothesis and adverb placement (see Cinque 1998 on the latter) are adopted. We thus have here evidence that adverbs do count for the purpose of PF adjacency relevant to merger, i.e. that they block PF merger, just like other phonologically realized elements. This is certainly the null hypothesis (see section 2, in particular, example (34) for additional evidence that adverbs interfere with PF merger). This means that the analysis that accounts for the grammaticality of (15) by placing the adverb above the null head undergoing merger is more adequate than the analysis that accounts for such constructions by making adverbs irrelevant to PF merger.15
2. Object shift
In this section I show that certain otherwise puzzling properties of Scandinavian object shift can also be accounted for in a principled way under a PF merger analysis. The analysis is also shown to have important consequences for the syntax-phonology interface. In particular, it provides an argument for the multiple spell-out hypothesis (see Bresnan 1971, Chomsky 1999, 2000, Epstein et al. 1988, Franks and Bošković 2001, and Uriagereka 1999, among others), on which syntax and phonology interact derivationally, with the syntax sending information to the phonology at more than one point, that is, throughout the derivation.
2.1. Participle movement and object shift in auxiliary+participle constructions
It is well-known that, as discussed by Holmberg (1986), object shift in Scandinavian depends on V-movement. As illustrated by Swedish (20), object shift can take place in main verb V-2 clauses, but not in aux+participle and embedded clauses, where the main verb does not undergo movement.16
(20) a. Jag kysste_j [AgroP henne_i [VP inte [VP t_j t_i]]]
        I kissed her not
        ‘I didn’t kiss her.’
     b. *Jag har [AgroP henne_i [VP kysst t_i]]
        I have her kissed
        ‘I have kissed her.’
     c. Jag har [AgroP [VP kysst henne]]
     d. *…att [IP jag [AgroP henne_i [VP kysste t_i]]]
        that I her kissed
     e. …att [IP jag [AgroP [VP kysste henne]]]
Holmberg (1999), however, makes a very interesting observation that object shift can take place even in aux+participle constructions if the participle undergoes movement to SpecCP. (Holmberg argues that only the verbal head moves to SpecCP in (21) and calls this movement V-topicalization. The alternative Holmberg argues against is remnant VP-fronting, which would have to follow object shift. The issue is addressed below.)
(21) a. Kysst har jag henne inte (bara hållit henne i handen).
        kissed have I her not only held her by the hand
        ‘Kissed her I haven’t (only held her by the hand).’
     b. Sett har han mej kanske (men han vet inte vad jag heter).
        seen has he me perhaps but he knows not what I am-called
        ‘Seen me he may have done (but he doesn’t know my name).’
As Holmberg shows, this type of construction invalidates Chomsky’s (1993) equidistance account of the dependency of object shift on V-movement (see also Bobaljik and Jonas 1996). Under Chomsky’s account, in order for the object to be able to skip the subject in SpecVP, and for the subject to be able to skip the shifted object when moving to SpecTP, it is necessary for the main verb to move not only to Agro but also to T. This clearly cannot take place in (21). The auxiliary rather than the participle moves to T in (21). To account for the saving effect of V-topicalization on object shift in aux+participle constructions, Holmberg proposes an analysis that treats object shift as a phonological operation and stipulates a locality condition which prevents object shift from applying across a phonologically visible category asymmetrically c-commanding the object position except for adjuncts. (As noted above, the negative marker is considered to be an adjunct.)17 Given this, V-movement in (21) must precede object shift. Since, according to Holmberg, V-movement (more generally, movement to SpecCP) is a syntactic operation, object shift then must be a phonological operation.
If it were to take place in the syntax, the cycle would be violated in constructions like (21). As discussed in Chomsky (1999), Holmberg’s analysis is problematic in several respects. The proposed locality condition is rather strange and does not fall together with locality conditions on other putative cases of PF movement. The exception for adjuncts is also obviously problematic. Given that, as shown convincingly in Diesing (1996), object shift is a semantically “loaded” operation,18 another problem is the semantics/phonology interaction that is necessary under Holmberg’s analysis. Such an interaction cannot be established under the standard conception of the grammar, where semantic effects are restricted to narrow syntax, the post-spell out PF derivation not having effect on semantics.19 Another problematic aspect of Holmberg’s analysis is his stipulation that [-focus] elements (elements that undergo object shift are specified as [-focus] according to Holmberg) must be governed by a [+focus] element. This is so especially in light of the fact that Holmberg’s [+focus] elements represent an arbitrary collection of categories that does not fit into any of the standard conceptions of focus.20 (See the discussion below for another problem with Holmberg’s analysis which has to do with the phrase structure status of the element undergoing topicalization in (21). For problems with Holmberg’s analysis as well as alternative proposals, see also Erteschik-Shir 2001 and Josefsson 2001.) Given all of these problems, I conclude that though very interesting, Holmberg’s analysis cannot be maintained. So, how can we explain the saving effect of V-topicalization on object shift in aux+participle constructions? 
In the next section I will show that Bobaljik’s (1994, 1995) PF merger analysis of the impossibility of object shift in (20b) can provide a straightforward account of the acceptability of (21) if we adopt the multiple spell-out hypothesis, on which the syntax sends information to the phonology throughout the derivation.
2.2. Object shift and multiple spell-out
Bobaljik (1994, 1995) argues that object shift is ruled out in Scandinavian embedded and aux+participle clauses for morphophonological reasons, namely, due to a violation of the requirement that an affix which is to be phonetically realized on a stem must be adjacent to it in PF. As a result, even if a verb in Swedish does not move to I overtly, the verb and I still must be adjacent in PF, that is, they must undergo PF merger. In (20d), the PF adjacency requirement cannot be satisfied due to the intervening shifted
object. The problem does not arise in (20e), where the object remains in situ. As for (20b-c), Bobaljik posits a participial affix, located above the shifted object (PartP, headed by the affix, is the complement of the auxiliary), which must merge under adjacency with the participle in PF. The account of (20d-e) then readily extends to (20b-c). The relevant structures are given in (22).
(22) a. *Jag har [PartP Part [AgroP henne_i [VP kysst t_i]]]
        I have her kissed
     b. Jag har [PartP Part [AgroP [VP kysst henne]]]
As one argument for his analysis, Bobaljik points out that in head-final Germanic languages, object shift can take place even in embedded and aux+participle clauses, i.e. in the absence of V-movement. This is expected under his analysis since, due to the head-final nature of these languages, the relevant verbal elements and I and Part remain linearly adjacent even if object shift takes place overtly. (The following Dutch example is taken from Bobaljik 1995.)
(23) …dat veel mensen [PartP [AgroP dat boek [VP gisteren gekocht]] Part] hebben.
     that many people that book yesterday bought have
     ‘… that many people bought that book yesterday.’
Let us now consider the contrast between (20b) and (21). As discussed above, under Bobaljik’s analysis, (20b) is ruled out because the shifted object intervenes between the participle and the null head (Part) the participle is required to merge with (see the structure in (22a)). I will show now that we can account for (21) under Bobaljik’s analysis without any additional assumptions if we adopt the multiple spell-out hypothesis, which allows the phonology to have access to intermediate syntactic structures by having the syntax send structures to the phonology throughout the derivation. Consider how the saving effect of V-topicalization on object shift in aux+participle constructions, illustrated in (21), would be treated under the PF merger+multiple spell-out analysis. Suppose that the verb undergoes successive cyclic movement to SpecCP and that during the movement, it lands at some point in a position that is adjacent to the null head that it is required to merge with, both of which are reasonable assumptions. (For discussion concerning what the position in question is, see (26) below.) If the structure can be sent to the phonology at this point, certainly a possibility in
the multiple spell-out model, the participle and the null head will be adjacent in the phonology so that the merger will be able to take place.21 The participle will proceed with movement to SpecCP. I assume that the morphological combination of the null affix head and the participle is licensed at the point of merger during the derivation. The multiple spell-out hypothesis thus makes it possible to account for the saving effect of V-topicalization on object shift in aux+participle constructions without assuming that object shift is a phonological operation, a problematic assumption as discussed above, and without requiring phonology and semantics to interface. Furthermore, in contrast to Holmberg’s analysis, where object shift takes place acyclically after movement to SpecCP in constructions like (21), under the current analysis object shift precedes movement to SpecCP, obeying the cycle. This removes Holmberg’s main reason for pushing object shift outside of narrow syntax. Notice also that the merger is blocked in (20b) even if multiple spell-out is adopted. Given the cycle, the object must move in front of the participle before Part is merged into the structure. At no point in the derivation are then the participle and Part adjacent in (20b) (see in this respect the more detailed structure in (22a)).22 23 Consider now the phrase-structure status of the element located in SpecCP in (21). For Holmberg, it is crucial that the element has X0 status, i.e., we have to be dealing here with head movement to SpecCP. The alternative analysis, remnant VP fronting, cannot be adopted under Holmberg’s set of assumptions since this analysis requires object shift to precede topicalization. This cannot happen if topicalization is syntactic movement and object shift phonological movement, as Holmberg assumes. Under the multiple spell-out analysis, it is not necessary to adopt the non-standard assumption that heads can move to specifiers. 
More precisely, the multiple spell-out analysis makes it possible to treat movement to SpecCP in (21) as an instance of remnant VP fronting rather than fronting of an X0 element (on remnant VP fronting, see Den Besten and Webelhuth 1987, Huang 1993, and Müller 1998, among many others). Holmberg points out a potential problem for the remnant phrasal preposing analysis. He observes that it is impossible to save an object shift derivation for an aux+participle construction by topicalizing a VP containing a small clause.
(24) *Hört hålla föredrag har jag henne inte.
     heard give talk have I her not
The ungrammaticality of (24) is surprising given that topicalizing a VP containing a small clause is otherwise possible, as shown by (25). (The phrase undergoing topicalization in (25) could actually be larger than VP. I leave open what the phrase is and refer to it as VP for ease of exposition.)
(25) Hört henne hålla föredrag har jag inte.
     heard her give talk have I not
Holmberg accounts for the data under consideration by assuming that we are dealing here with V-movement to SpecCP, rather than remnant VP movement. The assumption is unnecessary under the multiple spell-out analysis. Recall that the reason why the shifted object does not interfere with the merger of the participle and the null head in (21) is that the participle is placed in a position adjacent to the null head during movement to SpecCP. Suppose now that the position in question precedes the null head. (The position may in fact be SpecPartP.) In other words, the relevant configuration is (26a) rather than (26b). This amounts to assuming that there is no position for the element moving to SpecCP to move through between the shifted object and Part, a plausible assumption.
(26) a. …[VP participle] Part [AgroP object…
     b. …Part [VP participle] [AgroP object…
The small clause following the participle in (24) now disturbs the adjacency between the participle and Part, blocking the merger.24 The problem does not arise in (25), where the participle is adjacent to the null head at least prior to VP-fronting to SpecCP. (I return to the placement of negation in (24)–(25) below, where I argue that the negation can be located above Part. For the moment, I disregard it.)
(27) …[PartP Part [AgroP [VP hört henne…]]]
                         heard her
If the structure is sent to the phonology at this point, the merger can take place. Recall that in (24), the participle and the null head are not adjacent prior to the movement because of the shifted object.
The multiple spell-out analysis thus accounts for the contrast between (24) and (25) without the assumption that a head moves to SpecCP in (21). The analysis blames the ungrammaticality of (24) on the impossibility of merger of the participle and the null head. Strong confirmation that doing
this is on the right track is provided by the fact that constructions like (24) are acceptable in German, as noted in Holmberg (1999). (The observation is attributed to Gert Webelhuth.)
(28) Rauchen gelassen hat er seine Tochter nicht.
     smoke allowed has he his daughter not
     ‘He hasn’t allowed his daughter to smoke.’
The merger problem does not arise in (28). As discussed in Bobaljik (1995), German being head-final, a shifted object does not interfere with the merger of the participle and Part. The heads in question are adjacent at one point in German regardless of whether the participle is moved to SpecCP even when the object undergoes object shift. Merger can then be licensed in (28) if the structure is sent to the phonology prior to remnant preposing of the VP, when the relevant part of the structure is as shown in (29).
(29) [PartP [VP …rauchen gelassen] Part…]
                 smoke allowed
The contrast between (24) and (28) thus receives a straightforward account under the multiple spell-out analysis. Considering the movement that places a participle in SpecCP to be VP preposing rather than V-preposing is desirable in light of the ungrammaticality of constructions like (30).
(30) a. ?*Sett har jag honom inte röka (men jag har känt hans andedräkt).
          seen have I him not smoke (but I have smelled his breath)
     b. *Sett har jag inte Per röka (men jag har känt hans andedräkt)
         seen have I not Per smoke (but I have smelled his breath)
The ungrammaticality of these constructions strongly indicates that we are not dealing with V-movement. Under the remnant VP preposing analysis of (21), the constructions in (30) can be readily accounted for if the small clause predicate cannot move outside of the VP, which is a prerequisite for remnant VP preposing. (In fact, there seems to be no proper motivation for
this movement.) I therefore conclude that the saving effect of topicalization of a constituent containing the participle on object shift in aux+participle constructions can be accounted for without undesirable consequences concerning the status of the saving movement (the movement can be considered remnant phrasal movement) if multiple spell-out and Bobaljik’s PF merger analysis are adopted.
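The reasoning of this section can be made concrete with a toy computation. The following is only a rough sketch under simplifying assumptions that are not part of the analysis as stated: each spell-out point is encoded as a flat list of terminals, traces (written `t`) are taken to be invisible for PF adjacency, and merger counts as licensed if the affix and its host are string-adjacent at some spell-out point. The function names, the encoding, and the particular choice of spell-out points are hypothetical and introduced purely for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# Assumption: traces are phonologically null and do not count for adjacency.
NULL_ELEMENTS = {"t"}


def adjacent(stage: Sequence[str], affix: str, host: str) -> bool:
    """True if affix and host are string-adjacent in this stage,
    once phonologically null elements are ignored."""
    overt = [w for w in stage if w not in NULL_ELEMENTS]
    if affix not in overt or host not in overt:
        return False
    return abs(overt.index(affix) - overt.index(host)) == 1


def merger_licensed(stages: Sequence[Sequence[str]], affix: str, host: str) -> bool:
    """Multiple spell-out: merger is licensed if adjacency holds at any
    point at which the structure is sent to the phonology."""
    return any(adjacent(stage, affix, host) for stage in stages)


# Hypothetical spell-out points: (participle, list of intermediate strings).
EXAMPLES: Dict[str, Tuple[str, List[List[str]]]] = {
    # (20b)/(22a): given the cycle, the object shifts before Part is merged.
    "(20b)": ("kysst", [["henne", "kysst", "t"],
                        ["Part", "henne", "kysst", "t"]]),
    # (20c)/(22b): the object stays in situ.
    "(20c)": ("kysst", [["Part", "kysst", "henne"]]),
    # (21a): the remnant VP passes through the position in (26a).
    "(21a)": ("kysst", [["Part", "henne", "kysst", "t"],
                        ["kysst", "t", "Part", "henne"]]),
    # (24): the small clause separates the participle from Part in (26a).
    "(24)": ("hört", [["Part", "henne", "hört", "t", "hålla", "föredrag"],
                      ["hört", "t", "hålla", "föredrag", "Part", "henne"]]),
    # (25)/(27): no object shift, so adjacency holds before fronting.
    "(25)": ("hört", [["Part", "hört", "henne", "hålla", "föredrag"]]),
    # (28)/(29): head-final order keeps the participle next to Part.
    "(28)": ("gelassen", [["seine Tochter", "t", "rauchen", "gelassen", "Part"]]),
}

results = {label: merger_licensed(stages, "Part", host)
           for label, (host, stages) in EXAMPLES.items()}

for label, ok in results.items():
    print(label, "merger licensed" if ok else "merger blocked")
```

On this encoding the sketch separates (20b) and (24), where Part and the participle are never adjacent, from (20c), (21a), (25), and (28), where they are adjacent at some intermediate point. Nothing hinges on the flat-string representation; it merely makes the adjacency condition easy to inspect.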
2.3. Where is inte located?
Before concluding the discussion of object shift in Scandinavian I will address one issue that Holmberg raises as a problem for Bobaljik’s PF merger analysis. Bobaljik assumes that elements like inte mark the left edge of the VP. More precisely, he assumes that they are left-adjoined to VP. Furthermore, he assumes that both the landing site of object shift and the null head that merges with the participle are higher than inte, the null head being higher than the shifted object. Holmberg observes that these assumptions are untenable for Mainland Scandinavian based on constructions like (31), which indicates that inte is higher in the structure than the auxiliary, which on Bobaljik’s analysis is supposed to be higher than the shifted object and the null head the participle merges with. (Recall that the auxiliary remains in its base-generated position in Swedish embedded clauses.) We thus appear to have a contradiction at hand.
(31) a. Det är möjligt [att Per inte har kysst henne].
        it is possible that Per not has kissed her
     b. *Det är möjligt [att Per har inte kysst henne].
The problem is actually even more serious. Recall that Bobaljik assumes that adjuncts like inte are invisible to the operation of merger and therefore do not disrupt the adjacency necessary for merger to take place. However, we have seen in the discussion of stylistic fronting in section 1 that the assumption is not only conceptually, but also empirically problematic. The facts discussed in that section indicate that, as would be expected, adjuncts are visible in PF and interfere with merger. Given this conclusion, even (32a) becomes problematic if the negation is adjoined to VP since the negation should block the merger of the participle and Part, as shown in the structure in (32b). (Bobaljik would deal with such constructions by assuming that adjuncts do not block merger.)
(32) a. Per har inte kysst henne.
        Per has not kissed her
     b. Per har [PartP Part [AgroP [VP inte [VP kysst henne]]]]
I conclude, therefore, that the negation must be higher than Part in (32a). (32a) can be readily accounted for if we assume that the negation can be adjoined not only to the main verb VP, as Bobaljik does, but also to the VP headed by the auxiliary, a rather natural assumption. The negation can then be located in this higher position in (32a).
(33) Per har_i [VP inte [VP t_i [PartP Part [AgroP [VP kysst henne]]]]]
However, this may not be enough to account for (31a). If the embedded clause auxiliary needs to merge with I, which is what Bobaljik assumes, the negation would intervene between the two elements even if it is adjoined to the VP headed by the auxiliary. Inte in (31a) in fact raises the same kind of problem as quickly does in (15). I therefore suggest that (31a) should be accounted for in the same way as (15). This means that inte would be attached in (31a) wherever quickly is attached in (15). (See the discussion in section 1. In fact, inte in (32a) might also be located in this position, instead of being adjoined to the higher VP.) As for (31b), there is in principle nothing wrong with the position of the negation, which occupies the lower neg position (adjoined to the main verb VP) in (31b). (Recall that the auxiliary does not move in (31b), which means that (31b) cannot be analyzed in the same way as (32a).) The problem is that as a result of being placed in the lower position, the negation intervenes between the participle and Part, blocking the merger of the two heads.
(34) *Det är möjligt [CP att Per [VP har [PartP Part [AgroP [VP inte [VP kysst henne]]]]]]
Under this analysis, nothing prevents us from locating the negation in the lower position in (20a) and (21).
Notice also that in constructions like (35a), whose structure is given in (35b), both the shifted object and the negation now interfere with the merger of the participle and Part.
(35) a. *Per har henne inte kysst.
        Per has her not kissed
     b. Per har_i [VP t_i [PartP Part [AgroP henne [VP inte [VP kysst]]]]]
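The role of the negation in (32)–(35) can be sketched in the same toy terms. The following is a minimal illustration, assuming that the structures can be flattened into strings of terminals (traces are simply omitted) and that PF merger requires string adjacency of Part and the participle. The `invisible` parameter is a hypothetical device for comparing the assumption that adjuncts such as inte do not count for adjacency with the assumption adopted here that they do; none of the names below come from the literature.

```python
from typing import Dict, Iterable, List, Sequence


def adjacent(string: Sequence[str], affix: str, host: str,
             invisible: Iterable[str] = ()) -> bool:
    """True if affix and host are string-adjacent once the elements
    listed in `invisible` are removed from the string."""
    skip = set(invisible)
    visible = [w for w in string if w not in skip]
    return abs(visible.index(affix) - visible.index(host)) == 1


# Flattened versions of the structures in the text (traces omitted).
STRUCTURES: Dict[str, List[str]] = {
    "(32b)": ["Per", "har", "Part", "inte", "kysst", "henne"],
    "(33)": ["Per", "har", "inte", "Part", "kysst", "henne"],
    "(34)": ["att", "Per", "har", "Part", "inte", "kysst", "henne"],
    "(35b)": ["Per", "har", "Part", "henne", "inte", "kysst"],
}

# Assumption adopted in the text: adjuncts are visible in PF.
adjuncts_visible = {label: adjacent(s, "Part", "kysst")
                    for label, s in STRUCTURES.items()}

# Alternative assumption: adjuncts like inte are ignored for adjacency.
adjuncts_invisible = {label: adjacent(s, "Part", "kysst", invisible=["inte"])
                      for label, s in STRUCTURES.items()}

for label in STRUCTURES:
    print(label,
          "| inte visible:", adjuncts_visible[label],
          "| inte invisible:", adjuncts_invisible[label])
```

With inte visible, only the string corresponding to (33) satisfies adjacency, so the low position of the negation in (32b) and (34) blocks merger. With inte invisible, the merger condition by itself no longer separates (34) from (33), while (35b) remains blocked by the shifted object in either case.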
3. Conclusion
I have shown that the subject gap restriction on stylistic fronting and the saving effect of V (i.e. VP)-topicalization on object shift in aux+participle constructions receive a principled account under the PF merger analysis of these constructions. Furthermore, this is accomplished without positing any kind of phonology/semantics interaction. The analysis of the subject gap restriction on stylistic fronting is extended to the adjacency effect in Bulgarian questions. The PF merger analysis of Scandinavian stylistic fronting and object shift is shown to provide empirical evidence that adverbs do interfere with PF merger. The PF merger analysis of object shift also provides evidence for the multiple spell-out hypothesis. The argument for multiple spell-out from Scandinavian object shift is straightforward: PF needs to have access to intermediate syntactic representations, which is possible under the multiple spell-out model, but not under the standard Y-model. The argument for multiple spell-out is at the same time an argument for a derivational model of the grammar and therefore represents a serious challenge for non-derivational theories like Optimality Theory.
Acknowledgements
For insightful comments and questions, I thank Christer Platzack, Mamoru Saito, anonymous reviewers, and the participants of my seminars at the University of Connecticut and the University of the Basque Country at Vitoria-Gasteiz. Earlier versions of the paper appeared in Working Papers in Scandinavian Syntax 68 and Gengo Kenkyu 123. The paper was originally written in the summer of 2000. As a result, I was not able to consider in sufficient detail works that have appeared since then.
Notes
1. See Poole (1997) for a somewhat related PF account, which I became aware of only after this paper was originally written. The crucial aspect of Poole’s account is his claim that the finite verb in stylistic fronting constructions is an enclitic which undergoes rightward movement in PF in order to get proper prosodic support (Prosodic Inversion in Halpern’s 1995 terminology). The analysis faces several serious problems. First, as Poole himself notes, there are stylistic fronting constructions in which the finite verbal element is clearly not a clitic. Second, in languages in which an enclitic follows a complementizer in the syntax, the enclitic typically encliticizes to the complementizer (see Bošković 2001). Given this possibility, Poole’s enclitic should not, hence could not (under Poole’s assumptions), undergo the necessary rightward PF movement.
2. Note that F is phonologically weak and must be adjacent to a verb, which is the definition of a verbal affix.
3. More precisely, finite verb, given that stylistic fronting cannot occur in infinitives. Examples (ia-b) from Holmberg and Platzack (1995: 117-118) show this. (Notice that the infinitival marker að is in C and the verb is in I in Icelandic infinitival clauses.)
(i) a. María lofaði að (*ekki/alltaf) lesa (ekki/alltaf) bókina.
       Mary promised to not/always read not/always the book
    b. *María lofaði að tekið hafa út peninga úr bankanum á morgun.
       Mary promised to taken have out money from the bank tomorrow
    c. cf. María lofaði að hafa tekið út peninga úr bankanum á morgun.
Notice that Mainland Scandinavian lost stylistic fronting when it lost agreement (see Falk 1993 and Holmberg and Platzack 1995), which indicates that F, whose precise nature I leave open, is (or can be) involved in the agreement system, possibly through selection for AgrsP or by being hosted by an agreeing verb. (The fact that stylistic fronting cannot occur in infinitives may be related to this.)
It is worth noting that Anderson (1993) also suggests that stylistic fronting is movement to a position above the subject. (Notice that this accounts for (2a).) Anderson’s analysis, however, does not seem to leave room for the optionality of the process, discussed below with respect to (6a-b). The reader is also referred to Fischer and Alexiadou (2001), who also argue that stylistic fronting constructions introduce an additional functional projection.
4. It is worth noting that positing a phonologically null affix does not represent a departure from standard assumptions concerning what kind of elements can function as affixes. Thus, it is standardly assumed that English (i) contains a phonologically null Tense affix, hence the necessity of do-support in (ii) (not intervenes between the affix and the verb).
(i) They work all the time.
(ii) a. They do not work all the time.
     b. *They not work all the time.
5. Note that given that the verb is located lower than C in questions, the subject that follows the verb, as in (7b) and (9b), must be located lower than SpecIP, namely in SpecVP (see the discussion below).
6. In Bošković (2001) I suggest that the subject actually moves to SpecIP in questions, but is pronounced in SpecVP to satisfy the affix requirement on the C. (Bobaljik 1995 gives a similar analysis of Scandinavian object shift; see fn. 22.) The analysis is based on the assumption that a lower copy of a non-trivial chain can be pronounced iff this is necessary to avoid a PF violation (see Franks 1998). As discussed in Bošković (2001), the assumption enables us to re-analyze a number of constructions that were previously argued to involve PF movement without any PF movement.
7. Among other things, the analysis accounts for the contrast between SC (i) and Bulgarian (ii) with respect to Superiority. (See Bošković 1999, 2002a for additional arguments. Notice that both SC and Bulgarian move all wh-phrases. However, I show that none of the fronted wh-phrases in SC (i) and (13b) has to be located in SpecCP overtly, which is not the case with their Bulgarian counterparts.)
(i) a. Ko koga voli?
       who whom loves
       ‘Who loves whom?’
    b. Koga ko voli?
(ii) a. Koj kogo običa?
        who whom loves
     b. *Kogo koj običa?
8. The difference is correlated with the possibility of wh-in-situ in the two languages. In both languages insertion of the interrogative C triggers wh-movement. Since, in contrast to English, the interrogative C does not have to be inserted overtly in French, wh-movement does not have to take place overtly in French (see (i)). Under this analysis, the different behavior of the two languages with respect to wh-in-situ is correlated with the different behavior of the two languages with respect to S-Aux inversion, discussed directly below.
(i) a. Tu as vu qui?
       you have seen who
       ‘Who did you see?’
    b. *You saw who?
9. More precisely, the presence of phonological information in LF would cause a crash. (The same thing would happen if, for example, John were to be inserted into the structure in LF.) If English interrogative C (or John for that matter) is inserted into the structure overtly, the phonological information from its lexical entry is stripped off when the structure is sent to PF, so that it does not enter LF.
10. The analysis of the adjacency effect in Bulgarian wh-questions might be extendable to the widely discussed adjacency effect in Spanish wh-questions (see Torrego 1984 for an early discussion and Suñer 1994 for arguments that the verb
in (ia) does not raise to C), which is similar to the adjacency effect in Bulgarian. The Spanish construction, however, raises several questions that do not arise in Bulgarian which I will not attempt to deal with here.
(i) a. Qué dijo Juan?
       what said Juan
       ‘What did Juan say?’
    b. *Qué Juan dijo?
11. Questions raise a potential problem for Ochi’s analysis.
(i) a. What did John buy?
    b. *What John bought?
Suppose that spell-out applies before the CP projection is inserted (i.e. before wh- and I-to-C movement). The affix property of I could then be satisfied. The question then arises why do-support must take place in (i). To account for this, Ochi suggests, following Bošković (2000) (see the discussion of (12) above), that the interrogative C in (i) is also a verbal affix.
12. According to Lasnik, all potentially intervening adverbs pattern with quickly with respect to (17), even the adverbs, such as quickly itself, that normally occur below auxiliaries (cf. Ochi’s *Peter quickly will leave and Peter will quickly leave). Lasnik’s examples involve the adverb completely.
(i) a. John will completely lose his mind.
    b. *John completely will lose his mind.
    c. John partially lost his mind, and Bill completely did.
It thus appears that certain adverbs under certain circumstances (ellipsis and avoiding blocking PF merger) can occur higher in the structure than they normally do, a rather curious state of affairs which I leave open here. For much relevant discussion, see Oku (1998).
13. Jónsson (1991) observes that the same holds for parentheticals.
(i) Ég hélt að byrjað, (*eins og María hafði sagt), yrði að opna
    I thought that started like Mary had said would-be to open
    pakkana strax eftir kvöldmatinn.
    the presents right after supper
    ‘I thought that, like Mary said, he would start opening the presents right after the supper.’
Significantly, as observed by Jónsson, a parenthetical can occur between a subject and the verb.
(ii) Ég hélt að Jón, eins og sannur skáti, myndi hjálpa gömlu
     I thought that Jón like a true Boy Scout would help old
     konunni að komast yfir götuna.
     the lady to get across the street
The above contrast strongly argues against analyses that place stylistically fronted elements in SpecIP. Holmberg (2000) offers an account of (i) based on his claim that stylistic fronting affects the closest element with a phonological matrix to I, the parenthetical
being closer to I than byrja#. However, it seems that this should not matter since Holmberg claims that only elements that themselves can undergo stylistic fronting function as interveners in the relevant sense. (Thus, for Holmberg, auxiliaries vera ‘be’ and hafa ‘have’, which in most cases cannot undergo stylistic fronting, do not count as interveners. Note, however, that Holmberg makes contradictory claims concerning which elements count as interveners in the relevant respect.) It should be also pointed out that Holmberg’s intriguing claim that stylistic fronting picks the closest element with a phonological matrix faces a host of empirical problems, as discussed by Sigur§sson (1997) and Holmberg himself, and thus at present seems to be empirically unmaintainable. 14. One question that arises under the current head movement analysis is whether head movement to F violates locality restrictions on movement. Strictly speaking, the movement does violate the Head Movement Constraint. However, the movement does not raise any problems with respect to locality under the featurechecking approach to locality, as long as the intervening heads do not possess the feature that drives stylistic fronting. (Notice in this respect that elements that can in principle undergo stylistic fronting observe a hierarchy with respect to which of them can undergo stylistic fronting that appears to be structural (see Maling 1980/1990 and Jónsson 1991, among others). It appears that the hierarchy can be readily captured under the feature-checking approach to locality (i.e. under Chomsky’s 1995 Attract Closest).) The movement to F is also consistent with Roberts’ (1992) and Rivero’s (1991) relativized minimality version of the Head Movement Constraint if, for example, F in (5a) is an A’-head and the heads ekki crosses are A-heads, certainly plausible assumptions. (For discussion of the stylistic fronting hierarchy, see Jónsson 1991, Sigur§sson 1997, and Holmberg 2000, among others.) 
It is worth noting here that stylistic fronting in Old Scandinavian could involve phrasal movement (see Falk 1993 and Delsing 2001), which I take to land in SpecFP. Unambiguous phrasal stylistic fronting is not completely excluded in Icelandic either (see Sigurðsson 1997 and Holmberg 2000). However, it appears to be severely restricted, i.e. it is completely unavailable in most cases, hence I ignore it here. The reader is also referred to Fischer and Alexiadou (2001), who argue that there is crosslinguistic variation concerning whether stylistic fronting involves head or phrasal movement, i.e. F⁰ or SpecFP in our terms.

15. We now need an alternative account of the Irish data Bobaljik (1994, 1995) analyzed by exempting adverbs from interfering with PF merger. (For a potential line on these data, see the discussion concerning (17) above, including fn. 12.) Notice also that, as shown by Izvorski (1993), in contrast to a subject, an adverb can intervene between a wh-phrase and the verb in Bulgarian wh-questions, as demonstrated in (i).

(i) Kakvo včera kupi Petko?
    what yesterday bought Petko
    ‘What did Petko buy yesterday?’

Given the above discussion, the adverb in (i) should be analyzed as being located above the interrogative complementizer, so that it does not intervene
Željko Bošković
between the complementizer and the verb. It could be located in an additional (lower) SpecCP or C’-adjoined in a more traditional structure, neither of which is a possibility for the adverb in Icelandic (18), since the element preceding the adverb in (18) is adjoined to the head undergoing merger with the verb, and not located in its Spec, as in the Bulgarian construction. Bulgarian (i) could in fact be another example where an adverb occurs higher in the structure than it normally does in order not to interfere with PF merger (see in this respect fn. 12). This could explain the contrast between (i) and English *What yesterday did Peter buy? (It is worth noting, however, that some speakers of Bulgarian allow adverbs to intervene even between fronted wh-phrases, which are standardly analyzed as all being located in SpecCP.)

16. Unless otherwise indicated, all the data discussed in this section are from Swedish and taken from Holmberg (1999). (Some of the data are slightly modified.) The precise positions of the lexical items in (20), including the shifted object, do not affect the analysis about to be given. That is, the gist of the analysis would not be affected by changing the labels of the phrasal nodes in (20). For ease of exposition, I am following more or less standard assumptions concerning where the relevant elements are located. Negation is standardly assumed to be VP-adjoined and therefore to mark the left edge of the VP (see, however, section 2.3.). Most relevant literature assumes that the landing site of object shift is SpecAgroP (SpecvP in Chomsky’s 1995 system). See, however, Bošković (1997a: 211–212, in press c), Holmberg and Platzack (1995), and Vikner (1995), among others, for problems with the standard assumption.

17. Holmberg presents three other cases which he argues are also covered by his generalization regarding when object shift can take place.
As Holmberg himself notes, it is standardly assumed in the literature that the cases in question, which involve a blocking effect of non-verbal elements on object shift, and the blocking effect of verbs on object shift should not be treated in the same way. This seems quite clear for two of the cases Holmberg gives, illustrated in (i). (The third case is discussed in fn. 23.) (ia) illustrates the blocking effect of prepositions on object shift and (ib) the blocking effect of indirect objects on the object shift of direct objects. The latter disappears with A’-movement of the indirect object (ic). The contrast between (ib) and (ic) seems to parallel the contrast between (20b) and (21).

(i) a. *Jag talade henne_i inte med t_i.
       I spoke her not with
    b. *Jag gav den_i inte Elsa t_i.
       I gave it not Elsa
    c. Vem gav du den_i inte t_i?
       who gave you it not
       ‘Who didn’t you give it to?’

However, there is an independent account of all the data in (i). Given that object shift involves movement to (or through) a Case-checking position, (ia) must involve movement from a Case-checking to a Case-checking position,
which is disallowed. As for (ib), Collins and Thráinsson (1996) account for it in terms of a morphological constraint that prevents the Agr which hosts the shifted direct object from having a strong N feature unless the Agr which hosts the shifted indirect object (the latter Agr is higher than the former Agr) has a strong N feature. The constraint in question is violated in (ib), where only the direct object Agr has a strong N feature. (If the indirect object Agr had a strong N feature, the indirect object would also have to undergo object shift, which would place it above the direct object.) The problem does not arise in (ic), where the indirect object could be undergoing object shift on its way to SpecCP (see in this respect Bošković 1997b, where it is shown that object wh-NPs must pass through SpecAgroP on their way to SpecCP), hence the indirect object Agr could also be strong.

(ii) Vem_j gav du t_j den_i inte t_j t_i?

Another way of accounting for the contrast between (ib) and (ic) would be to rule out (ib) by appealing to locality restrictions on movement (more precisely, relativized minimality; see Vikner 1989 for a relativized minimality account of (ib)). Depending on the precise structure of the constructions in question, a controversial issue, (ic) could then plausibly be accounted for by assuming (see Chomsky 1995: 304, 1999) that traces are invisible to Move (more precisely, they do not have a blocking effect on movement) and that locality relevant to movement is computed only at the phase level (see Chomsky 1999: 23).

18. Diesing shows that referential, specific, non-contrastive definite NPs undergo object shift, while non-specific indefinite NPs cannot undergo object shift. Notice also that object shift can affect Binding Conditions (see Holmberg and Platzack 1995).

19. To deal with this issue, Holmberg considerably enriches the standard model.
It is worth noting in this respect that Chomsky (1999) presents an alternative to Holmberg’s analysis that also faces the problem of phonology-semantics interaction. In particular, Chomsky (1999: 28) proposes a rule that makes the assignment of a particular interpretation sensitive to the notion of phonological border. Another problem with Chomsky’s analysis is his adoption of the assumption (p. 29) that the feature driving object shift can be present in the structure only if it will eventually have an effect on the interpretation of the sentence (I am ignoring here constructions involving successive cyclic movement through the object shift position), an assumption which results in considerable globality.

20. Holmberg does not leave sufficient room for contextual effects on focus assignment, since he assumes that certain categories, for example, main verbs, prepositions, verb particles, in fact all lexical predicate heads, are inherently specified as [+focus]. The assumption seems unmaintainable. For example, neither the verb nor the particle is focused in Mary turned on the radio if the sentence is a response to the question What did Mary turn on? Holmberg also assumes that certain elements, in particular, adverbs, negation, and in general predicate adjuncts, are not marked for the focus feature. The assumption is also
problematic. To illustrate, the adverb is focused (in fact, it is the only focused element) in Mary left yesterday if the sentence is a response to the question When did Mary leave?

21. The null hypothesis is that each phrase is “shipped” to the phonology. However, Chomsky (1999, 2000) suggests that certain phrases are privileged in this respect. He develops a rather stipulatory notion of phase and suggests that the syntax sends information to the phonology phase by phase. The notion of phase is empirically motivated largely on grounds independent of multiple spell-out. In fact, it removes one of the arguments for multiple spell-out that Chomsky offers. Chomsky (1995: 385) observes that in the single point of spell-out model, uninterpretable features are often checked and erased before spell-out even though they have a phonetic effect. The question arises how to ensure that such features remain in the structure until spell-out. Chomsky (2000: 131) observes that the multiple spell-out hypothesis resolves the question since under this hypothesis, each relevant feature can be sent to the phonological component along with the rest of the structure before being erased; it does not need to “hang around” in the structure upon checking. However, this is still necessary in some cases if spell-out applies only at the phase level. (See Chomsky 1999 for a slightly different take on the issue. The point made here, however, remains.) The main empirical motivation for the notion of phase for Chomsky is to make a distinction between CP and IP with respect to several phenomena independent of multiple spell-out, essentially by making IP special in a way that CP is not (see, however, Bošković 2002b, in press a for critical discussion). Franks and Bošković (2001) provide empirical evidence from Bulgarian cliticization that IP is special even with respect to multiple spell-out.
In particular, Franks and Bošković provide evidence that the syntax cannot send IPs to the phonology even in the multiple spell-out system. Making IP special in this respect would not affect the current analysis of the saving effect of V-topicalization on object shift in aux+participle constructions. (Notice also that nothing changes with respect to the current account of stylistic fronting regardless of whether FP is considered to be a phase, given that successive cyclic wh-movement would proceed through SpecFP if FP is a phase.) It is also worth noting here that I do not adopt Chomsky’s (2000) assumption that X must have an uninterpretable feature to be visible for movement. The assumption is obviously very problematic conceptually due to its stipulatory/arbitrary nature and the proliferation of features needed to implement movement under the assumption (in other words, it is a very non-minimalist assumption), and is unnecessary on empirical grounds (see Bošković in press a and Saito 2000). This means that the participle, which eventually moves to SpecCP, does not have to contain any uninterpretable features in the intermediate representation which is sent to the phonology under the current analysis. (Note that in Chomsky’s 1995 system, which does not rely on the visibility approach discussed above, the only feature that the participle would need to have for it to be able to move to SpecCP is the interpretable topic feature.)
22. Bobaljik (1995) suggests that object shift takes place overtly even in (20c). However, a lower copy of the shifted object is pronounced in order not to disturb adjacency between the null head and the participle so that the merger can take place. For Bobaljik (1994), on the other hand, object shift simply does not take place overtly in (20c).

23. The account of the contrast between (20b) and (21) can be extended to the following constructions, discussed in Holmberg (1999), if we assume that (ia) contains a null head, located above the shifted object, with which the particle needs to merge. (ia) can then be accounted for on a par with (20b) and (ib) on a par with (21).

(i) a. *Dom kastade mej (inte) ut.
       they threw me not out
       ‘They didn’t throw me out.’
    b. Ut_i kastade dom mej t_i inte (bara ned för trappan).
       out threw they me not only down the stairs

However, I emphasize that it is not clear how much importance should be attached to the ungrammaticality of (ia) when examining object shift. As noted by Holmberg (1999), we are dealing here with a quirk of Swedish. Such constructions are acceptable in Danish, Faroese, Icelandic, and Norwegian. (ii) illustrates this for Norwegian.

(ii) De kastet meg (ikke) ut.
     they threw me not out

We therefore probably do not want a very deep account of (ia), which the account hinted at above is not. (The account, however, should not be taken too seriously due to its sketchiness. I leave a thorough examination of (i) for future research.) Holmberg’s account of (ia), based on the revised Holmberg’s generalization (see the discussion above), seems too deep and raises a question as to why (ia) is acceptable in all other Scandinavian languages. (Holmberg in fact leaves (ii), which raises a serious problem for his analysis since it clearly violates his generalization concerning when object shift can take place, unaccounted for.)
It is also worth noting here that, as pointed out by Mamoru Saito (personal communication), under the current analysis we can account for the ungrammaticality of English VP fronting constructions like (iiia), where the main verb and I are adjacent at one point of the derivation, by appealing to the old intuition that only lexical I can license the trace of VP (or, more generally, null VP – see Lasnik 2002, Lobeck 1990, Zagona 1988, among others), which requires lexicalization of I in VP fronting constructions.

(iii) a. *Left, he.
      b. Leave, he did.

24. Notice that Part must merge with the participle of its own clause since the participle from another clause would already be merged with its clause-mate Part. Providing another, more deeply embedded participle for Part in (24) to merge with therefore would not help.
References

Anderson, Stephen
1993 Wackernagel’s revenge: clitics, morphology, and the syntax of second position. Language 69: 68–98.
Besten, Hans den, and Gert Webelhuth
1987 Remnant topicalization and the constituent structure of VP in the Germanic SOV languages. GLOW Newsletter 18: 15–16.
Bibović, Ljiljana
1971 Some remarks on the factive and non-factive complements in English and Serbo-Croatian. In The Yugoslav Serbo-Croatian-English Contrastive Project Studies 3, Rudolf Filipović (ed.), 37–48. Zagreb: Institute of Linguistics, University of Zagreb.
Bobaljik, Jonathan
1994 What does adjacency do? MIT Working Papers in Linguistics 22: 1–32.
1995 Morphosyntax: The syntax of verbal inflection. Ph.D. diss., MIT.
Bobaljik, Jonathan, and Dianne Jonas
1996 Subject positions and the roles of TP. Linguistic Inquiry 27: 195–236.
Bošković, Željko
1997a The syntax of nonfinite complementation: An economy approach. Cambridge, Mass.: MIT Press.
1997b On certain violations of the Superiority Condition, AgrO, and economy of derivation. Journal of Linguistics 33: 227–254.
1999 On multiple feature-checking: Multiple wh-fronting and multiple head-movement. In Working minimalism, Samuel D. Epstein and Norbert Hornstein (eds.), 159–187. Cambridge, Mass.: MIT Press.
2000 Sometimes in SpecCP, sometimes in situ. In Step by step: Essays on minimalism in honor of Howard Lasnik, Roger Martin, David Michaels, and Juan Uriagereka (eds.), 53–87. Cambridge, Mass.: MIT Press.
2001 On the nature of the syntax-phonology interface: Cliticization and related phenomena. Amsterdam: Elsevier Science.
2002a On multiple wh-fronting. Linguistic Inquiry 33: 351–383.
2002b A-movement and the EPP. Syntax 5: 167–218.
in press a On left branch extraction. In Investigations into Formal Slavic Linguistics. Contributions to the 4th European Conference on Formal Description of Slavic Languages – FDSL IV, Peter Kosta, Joanna Błaszczak, Jens Frasek, Ljudmilla Geist, and Marzena Żygis (eds.), 543–577. Frankfurt am Main: Peter Lang.
in press b On the clitic switch in Greek imperatives. In Topics in Balkan Syntax and Semantics, Olga Mišeska Tomić (ed.). Amsterdam: John Benjamins.
in press c Be careful where you float your quantifiers. Natural Language and Linguistic Theory.
Bošković, Željko, and Howard Lasnik
1999 How strict is the cycle? Linguistic Inquiry 30: 691–703.
2003 On the distribution of null complementizers. Linguistic Inquiry 34: 527–546.
Bresnan, Joan
1971 Contraction and the transformational cycle. Unpublished ms., MIT.
Browne, Wayles
1980 Relativna rečenica u hrvatskom ili srpskom jeziku u poredjenju sa engleskom situacijom. [The relative clause in Serbo-Croatian in comparison with English.] Ph.D. diss., University of Zagreb.
Chomsky, Noam
1957 Syntactic structures. The Hague: Mouton.
1993 A Minimalist Program for Linguistic Theory. In The view from building 20: Essays in linguistics in honor of Sylvain Bromberger, Ken Hale and S. Jay Keyser (eds.), 1–52. Cambridge, Mass.: MIT Press.
1995 Categories and transformations. In The Minimalist program, 219–394. Cambridge, Mass.: MIT Press.
1999 Derivation by phase. MIT Occasional Papers in Linguistics 18.
2000 Minimalist inquiries. In Step by step: Essays on minimalism in honor of Howard Lasnik, Roger Martin, David Michaels, and Juan Uriagereka (eds.), 89–155. Cambridge, Mass.: MIT Press.
Cinque, Guglielmo
1998 Adverbs and functional heads: A cross-linguistic perspective. Oxford: Oxford University Press.
Collins, Chris, and Höskuldur Thráinsson
1996 VP-internal structure and Object Shift in Icelandic. Linguistic Inquiry 27: 391–444.
Delsing, Lars-Olof
2001 Stylistic fronting. Evidence from Old Scandinavian. Working Papers in Scandinavian Syntax 68: 147–171.
Diesing, Molly
1996 Semantic variables and object shift. In Studies in comparative Germanic syntax, Höskuldur Thráinsson, Samuel D. Epstein, and Steve Peter (eds.), Vol. 2, 66–84. Dordrecht: Kluwer.
Epstein, Samuel D.
1999 Un-principled syntax and the derivation of syntactic relations. In Working minimalism, Samuel D. Epstein and Norbert Hornstein (eds.), 317–345. Cambridge, Mass.: MIT Press.
Epstein, Samuel D., Erich Groat, Ruriko Kawashima, and Hisatsugu Kitahara
1998 A derivational approach to syntactic relations. Oxford: Oxford University Press.
Erteschik-Shir, Nomi
2001 P-syntactic motivation for movement: Imperfect alignment in object shift. Working Papers in Scandinavian Syntax 68: 49–73.
Falk, Cecilia
1993 Non-referential subjects in the history of Swedish. Ph.D. diss., Lund University.
Fischer, Susann, and Artemis Alexiadou
2001 On stylistic fronting: Germanic vs Romance. Working Papers in Scandinavian Syntax 68: 117–145.
Franks, Steven
1998 Clitics in Slavic. Paper presented at the Comparative Slavic Morphosyntax Workshop, Spencer, Ind.
Franks, Steven, and Željko Bošković
2001 An argument for multiple spell-out. Linguistic Inquiry 32: 174–183.
Halle, Morris, and Alec Marantz
1993 Distributed morphology and the pieces of inflection. In The view from building 20: Essays in linguistics in honor of Sylvain Bromberger, Ken Hale and S. Jay Keyser (eds.), 111–176. Cambridge, Mass.: MIT Press.
Halpern, Aaron Lars
1995 On the placement and morphology of clitics. Stanford, Calif.: CSLI.
Hiraiwa, Ken
2001 EPP and object shift in Scandinavian: Deriving parametric differences. Unpublished ms., MIT.
Holmberg, Anders
1986 Word order and syntactic features in the Scandinavian languages and English. Ph.D. diss., University of Stockholm.
1999 The true nature of Holmberg’s generalization. Studia Linguistica 53: 1–39.
2000 Scandinavian stylistic fronting. Linguistic Inquiry 31: 445–483.
Holmberg, Anders, and Christer Platzack
1995 The role of inflection in Scandinavian syntax. Oxford: Oxford University Press.
Huang, C.-T. James
1993 Reconstruction and the structure of VP: Some theoretical consequences. Linguistic Inquiry 24: 103–138.
Izvorski, Roumyana
1993 On wh-movement and focus-movement in Bulgarian. Paper presented at CONSOLE 2, University of Tübingen.
Jónsson, Jóhannes Gísli
1991 Stylistic fronting in Icelandic. Working Papers in Scandinavian Syntax 48: 1–43.
Josefsson, Gunlög
2001 The true nature of Holmberg’s generalization revisited-once again. Working Papers in Scandinavian Syntax 67: 85–102.
Kayne, Richard
1994 The antisymmetry of syntax. Cambridge, Mass.: MIT Press.
Kraskow, Tina
1994 Slavic multiple questions: Evidence for wh-movement. Paper presented at the 68th Annual Meeting of the Linguistic Society of America, Boston, Mass.
Lasnik, Howard
1995 Verbal morphology: Syntactic structures meets the Minimalist Program. In Evolution and revolution in linguistic theory: Essays in honor of Carlos Otero, Héctor Campos and Paula Kempchinsky (eds.), 251–275. Washington, D.C.: Georgetown University Press.
2002 From features to remnants. In Dimensions of movement, Artemis Alexiadou, Elena Anagnostopoulou, Sjef Barbiers, and Hans-Martin Gärtner (eds.), 189–208. Amsterdam/Philadelphia: John Benjamins.
in press Patterns of verb raising with auxiliary be. Proceedings of the 1995 UMass Conference on African American English.
Lebeaux, David
1988 Language acquisition and the form of the grammar. Ph.D. diss., University of Massachusetts, Amherst.
Lobeck, Anne
1990 Functional heads as proper governors. Proceedings of NELS 20: 348–362.
Maling, Joan
1980 Inversion in embedded clauses in Modern Icelandic. Íslenskt mál 2: 175–193. Also in Modern Icelandic Syntax. Syntax and Semantics 24, Joan Maling and Annie Zaenen (eds.), 71–91. San Diego: Academic Press.
Müller, Gereon
1998 Incomplete category fronting: a derivational approach to remnant movement in German. Dordrecht: Kluwer.
Ochi, Masao
1999 Multiple spell-out and PF adjacency. Proceedings of NELS 29: 293–306.
Oku, Satoshi
1998 A theory of selection and reconstruction in the minimalist perspective. Ph.D. diss., University of Connecticut, Storrs.
Ottósson, Kjartan
1989 VP-specifier subjects and the CP/IP distinction in Icelandic and Mainland Scandinavian. Working Papers in Scandinavian Syntax 44: 89–100.
Platzack, Christer
1987 The Scandinavian languages and the null subject parameter. Natural Language and Linguistic Theory 5: 377–401.
Poole, Geoffrey
1992 The Case Filter and stylistic fronting in Icelandic. Harvard Working Papers in Linguistics 1: 19–31.
1996 Optional movement in the Minimalist Program. In Minimal ideas, Werner Abraham, Samuel David Epstein, Höskuldur Thráinsson, and C. Jan-Wouter Zwart (eds.), 199–216. Amsterdam: John Benjamins.
1997 Stylistic fronting in Icelandic: A case study in prosodic X⁰ movement. Newcastle and Durham Working Papers in Linguistics 4: 249–283.
Rivero, María-Luisa
1991 Long head movement and negation: Serbo-Croatian vs. Slovak and Czech. The Linguistic Review 8: 319–351.
Roberts, Ian
1992 Verbs and diachronic syntax. Dordrecht: Kluwer.
Rögnvaldsson, Eiríkur, and Höskuldur Thráinsson
1990 On Icelandic word order once more. In Modern Icelandic Syntax. Syntax and Semantics 24, Joan Maling and Annie Zaenen (eds.), 3–40. San Diego: Academic Press.
Rudin, Catherine
1986 Aspects of Bulgarian syntax: Complementizers and wh-constructions. Columbus, Ohio: Slavica.
Saito, Mamoru
2000 Scrambling in the Minimalist Program. Unpublished ms., Nanzan University.
Santorini, Beatrice
1994 Some similarities and differences between Icelandic and Yiddish. In Verb movement, David Lightfoot and Norbert Hornstein (eds.), 87–106. Cambridge: Cambridge University Press.
Sigurðsson, Halldór
1997 Stylistic fronting. Paper presented at the Second Annual International Tromsø Workshop on Linguistics, University of Tromsø.
Stepanov, Arthur
2001a Cyclic domains in syntactic theory. Ph.D. diss., University of Connecticut, Storrs.
2001b Late adjunction and minimalist phrase structure. Syntax 4: 94–125.
Suñer, Margarita
1994 V-movement and the licensing of argumental wh-phrases in Spanish. Natural Language and Linguistic Theory 12: 335–372.
Torrego, Esther
1984 On inversion in Spanish and some of its effects. Linguistic Inquiry 15: 103–129.
Uriagereka, Juan
1999 Multiple spell-out. In Working minimalism, Samuel D. Epstein and Norbert Hornstein (eds.), 251–282. Cambridge, Mass.: MIT Press.
Vikner, Sten
1989 Object shift and double objects in Danish. Working Papers in Scandinavian Syntax 44: 141–155.
1995 Verb movement and expletive subjects in the Germanic languages. Oxford: Oxford University Press.
Zagona, Karen
1988 Proper government of antecedentless VP in English and Spanish. Natural Language and Linguistic Theory 6: 95–128.
The MLC and derivational economy

Gisbert Fanselow
Introduction

There is a certain tension between the role which the Minimal Link Condition (MLC, (1)) plays in at least the minimalist theories of syntax, and the existence of numerous (apparent or real) counterexamples such as (2) that arise in multiple questions. For such questions, the MLC seems to imply strict superiority effects. In particular, wh-objects should not be able to cross wh-subjects on their way to Spec,CP. More often than not, this prediction fails to be borne out. Put differently, the question arises as to why the MLC is respected strictly by head movement, and more or less so by A-movement, while it is a fairly poor predictor of grammaticality when the proper way of carrying out operator movement is at stake.

(1)
Minimal Link Condition (MLC)
α cannot move to γ if there is a β that can also move to γ and is closer to γ than α
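The condition in (1) can be read as a simple check over competing candidates for movement. The following is only an illustrative sketch under stated assumptions: the class `Candidate`, the integer `distance` (smaller means closer to the target), and the function `mlc_allows` are hypothetical names introduced here, and reducing structural closeness to a single number is a simplification that is not part of the formulation in (1).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    label: str
    distance: int    # assumed closeness measure: smaller means closer to the target
    can_move: bool   # whether the element could move to the target at all


def mlc_allows(mover: Candidate, candidates: list[Candidate]) -> bool:
    """Return True if no other movable candidate is strictly closer to the target."""
    if not mover.can_move:
        return False
    return not any(
        other.can_move and other.distance < mover.distance
        for other in candidates
        if other is not mover
    )


# Toy multiple question: the wh-subject is closer to Spec,CP than the wh-object.
who = Candidate("who (subject)", distance=1, can_move=True)
what = Candidate("what (object)", distance=2, can_move=True)

print(mlc_allows(who, [who, what]))
print(mlc_allows(what, [who, what]))
```

On this toy encoding, the wh-subject passes the check and the wh-object does not, which is the strict superiority pattern that (1) leads one to expect for multiple questions.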
(2) Constructions violating the superiority condition
    a. which book did which person read?
    b. was hat wer gelesen (German)
       what has who read
       “what was read by whom?”
If correct, this characterization of the problem already suggests a solution: the MLC must be interpreted as a principle that is sensitive to interpretation/expressivity (cf. also Kitahara 1993, 1994; Reinhart 1995; Sternefeld 1997). Whenever it does not make a semantic difference whether the MLC is respected or not, the MLC must be obeyed strictly. However, the MLC is never (by itself) able to block a movement operation that is inevitable for expressing a certain meaning. Consequently, to the extent that head movement does not have any semantic effects, the MLC governs head movement
in a strict and exceptionless way. To the extent that different ways of carrying out operator movement are crucial in establishing different semantic relations, the MLC effects we observe in this domain are modulated by considerations of interpretation. Originally, the idea that the MLC decides only between those structural alternatives that have identical meanings was motivated by data involving different scope assignments to wh-operators (see sect. 1). The present paper argues that the required meaning identity must also involve distinctions of information structure (sections 3 and 4), which explains why many (if not most) languages are like German in not showing simple superiority effects at all. Languages like English and Bulgarian fit into such a picture as well – there is no variation among languages in this respect. Furthermore, we concur with Sternefeld (1997) in the claim that the MLC must be applied in a cyclic rather than global fashion (section 2.4), and we argue that it involves reference to LF-identity rather than meaning identity in a broad sense.
1. The MLC and wh-phrase scope

The MLC is a core principle of current syntactic theorizing, and has been made responsible for a wide variety of syntactic generalizations, such as the Head Movement Constraint of Travis (1984), the intervention effects restricting A-movement to subject position (Chomsky 1993, 1995, Stepanov 2001, this volume), and the superiority effect governing the formation of multiple questions. In spite of the important role it plays in determining whether syntactic computations are formally correct, some aspects of multiple questions require that the MLC be sensitive to the interpretation of the structures or derivations that it compares. Before we discuss this fact, let us consider some simple superiority effects in English. Object wh-phrases cannot cross c-commanding subject wh-phrases (3), as was observed by Kuno and Robinson (1972). Haider (this volume) argues that the contrast in (3) involves a grammatical constraint that bans wh-phrases occupying the subject position of finite clauses (such as the Empty Category Principle of Chomsky 1981).

(3) Simple subject-object asymmetry
    a. (It does not matter) who bought what
    b. (It does not matter) *what who bought _
Independently of whether such a factor contributes to making (3b) worse than (3a), the special status of the subject position cannot be the only source for superiority effects: wh-objects must not cross wh-subjects even when the latter are lexically governed, as in (4). Likewise, a wh-object from a lower clause cannot cross a wh-object from a higher clause on its way up to Spec,CP (5). The interaction of clausemate objects yields identical intervention effects, as evidenced by the contrasts in (6).

(4) Subject-object asymmetry not involving proper government
    a. who do you expect _ to do what?
    b. *what do you expect who to do _

(5) Biclausal object-object-asymmetry
    a. who do you persuade _ to do what
    b. *what do you persuade who to do _

(6) Superiority effects among objects
    a. *what did you give who _
    b. who did you give _ what
    c. what/which check did you send _ to who
    d. *who(m) did you send what/which check to _
As Hendrick and Rochemont (1982) correctly point out, data such as (4) – (6) are incompatible with the view that the superiority effect can be completely reduced to the ECP or a similar principle. What is called for is an account along the lines originally proposed by Kuno and Robinson (1972): A wh-DP a cannot cross a structurally higher wh-DP b when moving to Spec,CP. This generalization derives from the MLC in (1) straightforwardly. One notorious difficulty of purely formal accounts of the superiority condition derives from the fact that pairs of wh-phrases that take different semantic scope need not obey the MLC, as (7) illustrates (see, e.g., Huang 1982, Lasnik and Saito 1992). If the lower occurrence of who in (7a) takes matrix scope, the sentence is fine, although the movement of what across who fails to obey the MLC. If the lower who takes scope over the complement clause only, (7a) is as ungrammatical as (3b). The effect is not confined to clausemate wh-phrases. Of ten English native speakers (all linguists) that I consulted, seven accepted (7c), and five did not even find (7d) objectionable.
(7) Absence of superiority effects for wh-phrases with different scope
    a. who wonders what who bought?
    b. who wonders who bought what?
    c. who wonders what John persuaded who to buy __ ?
    d. who wonders what John told who that he should buy __ ?
According to Golan (1993), Kitahara (1993), and Reinhart (1995, 1998), such facts suggest that the MLC must be interpreted as an economy constraint related to LF-outputs (meanings). Whenever there is no other way to express a certain meaning, the MLC need not be respected. Let us consider (7a) in more detail. Overt movement of a wh-phrase to Spec,CP fixes its scope. A wh-phrase merged in a complement clause can thus take matrix scope under two conditions only: it moves to the Spec,CP position of the matrix clause, or it stays in situ, and gets scope-bound by an element in the matrix clause. It must not, however, be placed into the Spec,CP position of the complement clause, and still take matrix scope. Thus, the subject of the complement clause who can take matrix scope in (7a) only if it stays in situ. In other words, it can take matrix scope only if crossed by the lower wh-phrase what targeting the complement clause Spec,CP position. The meaning (8a) of (7a) simply cannot be expressed differently – (7b) means something else (viz. (8b)). Whether the MLC is respected or not is irrelevant when the structural alternatives differ in interpretation.

(8) a. For which persons x,y: x wonders what y bought
    b. For which person x, and for which z: x wonders who bought z
In contrast to what holds for (7), the two derivational alternatives in (3) do not yield different interpretations: there is only one scope option available for the two wh-phrases. In such a situation (and only in such a situation), the MLC filters out derivations that are not in line with it. Further English constructions illustrating that the application of the MLC depends on the interpretation arrived at will be presented in sections 2.1. and 2.3. Given that the wellformedness of (7a) is of some theoretical importance, it is surprising that little evidence from other languages has entered the discussion of the interpretation-sensitivity of the MLC. According to one of my informants (Koyka Stoyanova, p.c.), (9a,b) are as fine in Bulgarian as they are in English if the second occurrence of koj is stressed, but in her dialect, the order kakvo koj is grammatical in simple multiple questions, too.
Penka Stateva, my second Bulgarian informant, does not accept the order kakvo koj in a simple clause, and rejects (9) as well. No contrast such as the one between (3) and (7) exists in Bulgarian. The absence of this contrast will be explained in section 4.1: we argue there that the ordering restrictions among Bulgarian wh-phrases are not caused by the MLC. The ungrammaticality of (9) in some dialects therefore does not bear on the issue of the interpretation sensitivity of the MLC.

(9) Anti-superiority in Bulgarian
    a. #koj se chudi, kakvo koj kupi?
        who wonders what who bought
        “who wonders what who bought?”
    b. #na kogo kaza, kakvo koj kupi?
        who.dat you-tell what who bought
For other languages, it is not much easier to construct relevant evidence, because the simple superiority effect exemplified in (3) is not a widespread phenomenon. The following data from German, however, provide further evidence for the interpretation sensitivity of the MLC. (2b) has already shown that the formation of multiple questions is not affected by the MLC in German (at least superficially) when clausemates are involved, but it has been claimed frequently that a wh-phrase from a lower clause cannot cross a matrix wh-word.

(10) Superiority for non-clausemates in standard German
    a. *wen hat wer gehofft, dass Irina einlädt
        who.acc has who.nom hoped that Irina invites
    b. wer hat gehofft, dass Irina wen einlädt
        “who has hoped that Irina will invite who?”
There are reasons to doubt, however, that the ungrammaticality of (10a) (in the standard dialect) is caused by the MLC. Superiority effects disappear when the wh-phrases are discourse-linked in the sense of Pesetsky (1987). However, (10a) does not improve in the standard language when d-linked wh-phrases are used. Thus, what rules out (10a) must be different from the MLC.
(11) *welchen Studenten hat welcher Professor gehofft, dass Irina einlädt
      which.acc student.acc has which.nom professor hoped that Irina invites
      “which professor has hoped that Irina invites which student?”

In less restrictive dialects (such as the one spoken by the author), all sentences in (12) are acceptable up to a certain degree, but (12a) and (12b) have different interpretations. If (12a) is completely wellformed at all, the sentence allows a single-pair interpretation only. A pair-list-reading is available for (12b) only, i.e., for the structure which violates the MLC. In addition the “scope-marking” construction (12c) allows the pair-list-reading as well.

(12) Nonstandard German: Subordinate clause wh-elements crossing matrix wh-phrases
    a. (?)wer hat gehofft, dass Irina wen einlädt
        who has hoped that Irina who invites
    b. wen hat wer gehofft, dass Irina einlädt
        who.acc has who.nom hoped that Irina invites
    c. was hat wer gehofft, wen Irina einlädt
        what has who hoped who.acc Irina invites
How can these data be understood?1 In quite a number of languages, in situ wh-phrases cannot take scope out of the minimal (finite) clause they are contained in. Hindi is a case in point (see Mahajan 1990). The scope of an in-situ wh-phrase must be determined by linking it to a higher wh-phrase, or to a scope marker. The linking might be arrived at in various ways (binding, covert movement), but the important observation concerning Hindi and other languages is that linking is subject to strong locality requirements. In contrast to what holds for overt movement (= wh-scrambling in the case of Hindi), finite CPs are barriers for the linking relation. Consequently, (13) is ungrammatical because the lower occurrence of kis-ko must be linked to a wh-phrase or a scope marker, but cannot be so because it is embedded in an island for linking.
(13) Clauseboundedness of the binding of in situ wh-phrases in Hindi
    *Raam-ne kis-ko kahaa ki Sitaa-ne kis-ko dekhaa
     Raam-erg who.dat told that Sita-erg who saw?
     “who did Ram tell that Sita saw who?”
Let us now come back to (12). First, we want to explain why (12a) is out with a pair-list interpretation. This follows if (the relevant version of) German resembles Hindi in that finite CPs are barriers for the scope linking of in situ wh-phrases. Consequently, wen cannot be scope-linked to wer in (12a), which renders the structure ungrammatical under the intended interpretation. Finite clauses are not, however, barriers for overt movement. Therefore, there is a way of constructing a Logical Form for (12) in which both wh-phrases take matrix scope, viz. by moving the wh-element from the complement clause into the matrix-Spec-CP position, and by scope-linking the matrix subject to the matrix Spec,CP position. This is what has happened in (12b). None of the relations established there is in conflict with locality requirements – but the MLC is violated. Apparently, this MLC-violation is licensed because the relevant Logical Form cannot be arrived at in a different way – the structure (12a) respecting the MLC is incompatible with the locality of the licensing of wh-phrases in situ. (12) illustrates the same phenomenon as (7), but in a rather different context. The other examples in (12) illustrate two further points. (12c) shows that German is like Hindi in having a wh-scope-marking construction, in which a scope marker (was) rather than the real wh-phrase appears in Spec,CP. (12c) is well-formed in all dialects of German, and expresses a pair-list interpretation. A minor point illustrated by this example is that finite clauses are islands for scope taking in German only for wh-phrases that do not occupy a Spec,CP position (note that the lower wh-phrase is fronted in the complement clause).
There are various ways of analysing the construction (see, e.g., the contributions in Lutz, Müller and von Stechow 2000), but details are irrelevant for the more important point: long wh-movement in (12b) and wh-scope marking in (12c) yield the same interpretation, but the wh-scope-marking construction (12c) avoids an MLC violation, in contrast to (12b). This shows that the sensitivity of the MLC to interpretation cannot involve a simple, “global” concept of meaning identity. If it did, the wellformedness of (12c) should imply that the MLC is able to rule out (12b). Given (12c), no MLC-violation is necessary for expressing the “meaning” of (12b). The MLC must therefore not be sensitive to “meaning identity” in a global sense. Rather, the identity of interpretation that is relevant for the applicability of the MLC must be a matter of identical (or close-to-identical) Logical Forms. The LF of (12c) is different from the one of (12b) (see in particular Fanselow and Mahajan (2000) for arguments), and therefore, (12c) does not count when the grammaticality of (12b) is established. Haider (1997: 221) exemplifies the claim that complement clause wh-phrases may cross matrix wh-phrases in German with examples such as (14). To me, (14) invites a single-pair answer only, so that (14) is not fully comparable to the multiple questions discussed so far. Furthermore (14) involves apparent movement from a V2-complement clause, and the theoretical status of such an operation is quite unclear, see Reis (1996, 1997) for arguments that the construction is parenthetical. I therefore refrain from discussing such examples in more detail.

(14) Superiority violations in a construction with extraction out of a V2 complement
    wemᵢ hat wer gesagt [eᵢ habe sie eᵢ ein Bild verkauft]?
    who.dat has who.nom said has.subjunctive she a picture sold
    “who said she had sold a picture to whom?”

Our argumentation presupposes that single-pair interpretations of multiple questions (for which (12a) seems marginally acceptable) have a derivation different from the one for multiple questions with a pair-list reading. This claim is supported by the observation that further constructions are ungrammatical with a pair-list reading, but acceptable under a single-pair interpretation. E.g., most native speakers of German (including the author) reject (15) as a question asking for pair-lists, but the single pair interpretation is fine.

(15) Multiple adjunct question with a single pair interpretation
    wie hat er es warum geschrieben
    how has he it why written
    “how did he write it, and why”

Examples such as (7) show that the applicability of the MLC depends on the interpretation of the structure that it would block. German data such as (12) constitute further evidence for this. At the same time, the data in (12) show that the MLC is not sensitive to “meaning” in a global sense – rather, it is the nature of the LF that a movement operation creates that determines whether the MLC must be respected.
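The economy logic of this section can be made concrete in a small sketch. The Python fragment below is an illustration only and not part of the analysis itself: the class `Derivation`, the function `mlc_filter`, and the LF labels are hypothetical names, and the flags attached to each example merely encode the judgements reported in the text (for instance, that the pair-list reading of (12a) fails for independent locality reasons). A derivation is filtered out only if it violates the MLC while a convergent, MLC-respecting derivation of the same LF exists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Derivation:
    name: str               # example label from the text, e.g. "(7a)"
    lf: str                 # hypothetical label for the Logical Form derived
    violates_mlc: bool      # does a movement step cross a closer wh-phrase?
    converges: bool = True  # False if an independent LF condition rules it out


def mlc_filter(candidates):
    """Keep a derivation unless it violates the MLC while a convergent,
    MLC-respecting derivation of the same LF is available."""
    survivors = []
    for d in candidates:
        if not d.converges:
            continue
        cheaper_rival = any(
            c.converges and c.lf == d.lf and not c.violates_mlc
            for c in candidates
        )
        if d.violates_mlc and cheaper_rival:
            continue
        survivors.append(d.name)
    return survivors


# Flags below are assumptions that restate the judgements given in the text.
candidates = [
    Derivation("(3a)", "LF-3", violates_mlc=False),
    Derivation("(3b)", "LF-3", violates_mlc=True),
    Derivation("(7a)", "LF-8a", violates_mlc=True),
    Derivation("(7b)", "LF-8b", violates_mlc=False),
    Derivation("(12a), pair-list", "LF-12-long", violates_mlc=False, converges=False),
    Derivation("(12b)", "LF-12-long", violates_mlc=True),
    Derivation("(12c)", "LF-12-scope-marking", violates_mlc=False),
]

surviving = mlc_filter(candidates)
print(surviving)
```

Because competition is restricted to candidates with the same LF label, (12c) neither blocks nor rescues (12b), which mirrors the point that the comparison is not based on a global notion of meaning identity.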
2. The MLC and expressivity
The strongest conclusion one can draw from the discussion in the preceding section is that requirements of semantic expressivity always override the MLC. A structure violating the MLC is ungrammatical only if the Logical Form it would express can be arrived at with a structure respecting the MLC. In this section, we defend this strong conclusion against potential counterexamples, and discuss how the MLC can be applied in local fashion. First, we discuss the interaction of the MLC with the that-trace filter. Section 2.2 focuses on argument-adjunct asymmetries, while section 2.3 is dedicated to nestedness effects, which have been related to the MLC. Finally, we will briefly discuss what a cyclic application of the MLC might look like.
2.1. Interactions with the ECP

As one of the anonymous reviewers has pointed out, the absence of a contrast in (16) might pose a problem for the idea that the MLC applies only if that does not prevent a certain interpretation from being expressed:

(16) Two wh-phrases merged in a finite complement clause
    a. *who do you think that _ bought what
    b. *what do you think that who bought _

(16b) violates the MLC, so its ungrammaticality is expected. However, the constellation that respects the MLC, viz., (16a), is ungrammatical as well because of a that-trace-filter violation. In contrast to what we saw in section 1, the MLC violation of (16b) is not tolerated by the grammatical system of English, in spite of the fact that this renders the interpretation of (16b) inexpressible. The absence of a contrast in (16) does not show, however, that the MLC is able to block structures even if the competing structure respecting the MLC violates a further condition on LF. Aoun et al. (1987) and others have argued that the principle Q responsible for the that-trace effect applies at PF, and not at LF. Consequently, Q cannot interact with the MLC: the MLC applies to LFs, and compares derivations that yield (close-to-) identical LFs. It is blind to what happens in other branches of the derivation. A structure that has an optimal LF and is accepted by the MLC need not be in line with further PF-requirements, rendering the LF unpronounceable. Given this relevance of PF-constraints, (16) does not exclude an interpretation of the MLC that compares different ways of arriving at essentially the same LFs – while it falls in line with another conclusion arrived at in section 1: the MLC is not a principle that takes care of “expressivity” in a literal sense. The following observation leads to a modification of our analysis of (16), which leaves the crucial point intact, however: the MLC responds to the need of respecting further LF-constraints, but it is blind to what happens in the PF-branch of grammar. Haider (this volume) argues that there is an extra constraint banning wh-phrases occupying the specifier position of a finite IP in English. The constraint is independent of the MLC, since it shows its force even in constructions that do not involve a crossing wh-dependency, as was already observed by Chomsky (1981). Interestingly, as (17) illustrates, the relative degree of (un-)acceptability involves dimensions such as discourse-linking (see 17b), and, as Bresnan (1972) has observed, in situ wh-subjects are much better when they appear in subjunctive clauses.

(17) Wh-subjects in situ
    a. *who believes that who loves Irina?
    b. ?who believes that which man loves Irina?
    c. ?who demands that who be arrested?

The ungrammaticality of (16b) might therefore also be caused by the presence of an in situ wh-subject in a finite clause quite independent of the MLC.
Given the contrasts in (17), one would expect that structures like (16b) improve if, e.g., the complement clause appears in the subjunctive mood. In such a construction, the overt movement of the subject of the complement clause still implies a that-trace filter violation, but the additional ban against in situ wh-subjects is now much less strict. According to Anthony Green and Sue Olsen (p.c.), (18a) is indeed much better than (16b).
(18) Missing superiority effect for extraction out of a subjunctive complement
    a. (?)what do they require that who buy?
    b. *who do they require that buy what
    c. *what do you expect who to buy
    d. who do you expect to buy what
If the contrast between (16b) and (18a) generalizes, we have a further example from English that shows that the MLC does not block a construction (viz., (18a)) if the structure that conforms to the MLC (viz., (18b)) violates a different principle. The contrast between (18a) and (18b) would force upon us the assumption that the that-trace filter banning overt subject movement in fact applies at LF, and not at PF. Otherwise, its effects would not be visible to the MLC, as necessary for (18a). Consequently, the PF-located constraint that is invisible to the MLC (as required for (16)) is rather the further ban against in situ wh-subjects argued for by Haider (this volume) and not the that-trace filter. It should finally be noted that the contrast between (18a) and (18c) is due to the fact that the MLC-respecting competitor is well-formed in the case of (18c), viz. (18d), but not in the case of (18a), viz. (18b).
2.2. Adjuncts
Multiple questions with adjunct wh-pronouns constitute a second domain that is relevant for the status of the MLC as an economy constraint. None of the structures in (19) is grammatical – although there is no other (monoclausal) way of expressing the intended interpretations.

(19) Adjunct effects in English
    a. *who came why
    b. *why did who come
    c. *who spoke how
    d. *how did who speak?
The MLC clearly picks (19a,c) rather than (19b,d), and correctly so in the light of (20). (19a,c) are blocked by some requirement (see, e.g., Haider, this volume, Reinhart 1995, Hornstein 1995, among many others) that excludes the adjuncts how and why in any position but Spec,CP.

(20) a. who spoke when?
    b. who spoke in what way?

Again, the question arises as to why the MLC cannot be overridden in this context – yielding (19b,d), which do not violate the strong constraint against how and why appearing in situ. Note that (21) is ungrammatical: only one out of some twenty linguists with English as a native language whom I consulted accepted the sentence with a downstairs interpretation of how. Unlike what we saw in the preceding section in the context of (18), the ungrammaticality of (19b,d) cannot be explained in terms of an additional constraint filtering out wh-phrases in the subject position of non-subjunctive clauses.

(21) *how does the police demand that who be treated _

We will propose two accounts of (19) that allow us to maintain that the MLC is ignored when a certain LF cannot be constructed otherwise. As Haider (this volume) has pointed out, adjunct effects of the sort exemplified in (19) are absent in OV languages, as (22) illustrates. This observation excludes the idea that (19a,c) are ungrammatical on simple semantic grounds.

(22) Missing Adjunct Effects in OV-languages
    a. wie het hoe gedaan heeft (Dutch complement question)
    b. wer es wie gemacht hat (German complement question)
        who it how done has
Haider suggests that higher-order wh-operators such as how and why must c-command the head of the phrase they are applied to. Higher order adverbs range over events, so how and why should c-command the element that situates the proposition in time, i.e., how and why must c-command the (finite) verb. This condition is fulfilled in (22a,b), but not in (19a,c). Movement of the finite verb to Comp does not render wh-adjuncts in situ ungrammatical in Dutch or German. This is in line with the general observation that verb second movement is invisible at the level of Logical Form, either because it is reconstructed, or because it applies in the phonological component of grammar.
(22) Missing Adjunct Effects in OV-languages
    c. wie heeft het hoe gedaan (Dutch matrix question)
    d. wer hat es wie gemacht (German matrix question)
        who has it how done
The account suggested by Haider (this volume) cannot be fully correct, however, because Swedish is not in line with it. All of my five informants accepted (23a), and three of them found (23b) grammatical, in spite of the VO-nature of Swedish.

(23) Missing Adjunct Effects in Swedish
    a. vem skrattade varfoer
        who laughed why
    b. Det spelar ingen roll vem som skrattade varfoer
        it plays no role who that laughed why
        “it does not matter who laughed for what reason”

Similarly, Richards (2001: 18–19) reports adjunct effects for the SOV language Tibetan. Therefore, a different solution is called for. Rizzi (1990: 47) has proposed that certain wh-adjuncts (corresponding to sentence-level adverbs) are base-generated in Comp. One way of translating this proposal into the current discussion consists of the assumption that certain wh-elements are required to appear at the left periphery of clauses on a language-particular and item-specific basis. Because of (22) – (23), this idiosyncrasy of how and why (and French pourquoi) cannot be reduced to semantic considerations alone. One way of spelling this idea out lies in the assumption that the MLC applies cyclically (see below for details), while the constraints forcing how and why into Spec,CP are representational principles checking the wellformedness of completed Logical Forms. The MLC would therefore apply prior to the constraints affecting higher order wh-phrases, with the desired effect: the MLC picks (19a,c), and these sentences are blocked at too late a point in the derivation for undoing the impact of the MLC. The account sketched so far predicts the data as judged in (19) and (24). The MLC forces the subject to move to Spec,CP in a multiple question involving subjects and adjuncts, but the resulting structure is blocked because why and how cannot appear in any position but Spec,CP. On the other hand, when adjuncts interact with objects, the MLC will make (24b) block (24a). (24b) is also in line with the requirement that English wh-adjuncts appear at the left periphery.
(24) Adjunct-object interaction in multiple questions
    a. *what did Bill buy why
    b. why did Bill buy what

Hornstein (1995: 147–149) reports further data such as (25) that may in fact lead to a simpler analysis. If his judgements are correct, why (unlike its Dutch, German and Swedish counterparts) cannot appear at all in multiple questions, quite independent of the position it appears in.

(25) wh-adjuncts blocked in multiple questions
    a. *I wonder why Bill left when
    b. *I wonder why Bill lives where
    c. *I wonder why which person came
    d. *I wonder why you bought what
    e. *why does John expect who to win

If Hornstein is correct, wh-adjuncts come in two varieties. German wie “how” and warum “why” are linked to a semantic representation that makes them eligible for multiple questions, whereas how and why cannot appear there. Under such an account, all sentences in (19) are simply gibberish, and we need not care about what the MLC would predict for them. Whether this simplification is tenable or not depends on the status of (24b). If grammatical, this sentence is incompatible with the idea that why cannot appear in multiple questions. The simplification thus presupposes that (24b) involves an “illusion of acceptability” (Hornstein 1995: 148). We need not settle the issue here, because the idea that the MLC is an economy constraint can be maintained in the account discussed earlier as well.
2.3. Nestedness

A third domain sheds light on the question of whether the MLC is sensitive to LF-identity or not: nestedness effects. It has been suggested that the nestedness effect can be derived from the MLC, see Richards (2001) for a detailed proposal. If this suggestion is correct, the application of the MLC could not be confined to structural candidates yielding the same LF.
In English, the interaction of two wh-phrases moving to two different Spec,CP positions is governed by a nestedness effect (see Fodor 1978, Pesetsky 1982): the dependencies formed by the two wh-chains must not cross – one path must be embedded in the other. The nestedness condition is respected, even when it blocks the expression of a certain interpretation, as it does in (26b) and (27b).2

(26) Nestedness Effects
    a. ?Which violinⱼ do you wonder which sonataᵢ to play _ᵢ on _ⱼ
    b. *Which sonataⱼ do you wonder which violinᵢ to play _ᵢ on _ⱼ

(27) a. ?Whatⱼ did you decide [whoᵢ [to persuade tᵢ [to buy tⱼ]]] (Oka 1993: 255, (2a))
    b. *Whoᵢ did you decide [whatⱼ [to persuade tᵢ [to buy tⱼ]]] (Oka 1993: 255, (2b))

The constraint responsible for nestedness is respected even though the meanings of (26b, 27b) are different from the one expressed by (26a, 27a). Such observations are relevant for the present discussion to the extent that claims made by Richards (2001) and others are correct that the nestedness condition reduces to the MLC. If it does, (26) and (27) would not be in line with the idea that the MLC is ignored when a certain LF could not be formulated otherwise. Under what conditions does the MLC imply nestedness effects? Consider an abstract representation such as (28a), with two wh-phrases that both could be attracted by either of CompA and CompB. When the derivation reaches the point at which CompB attracts a wh-phrase (at which the specifier of CompB must be filled by a wh-phrase), a “blind” application of the MLC implies that only wh1 can move, forming (28b). At a later stage in the derivation, CompA attracts (the specifier of CompA must be filled by a wh-phrase). Let us confine our attention to a situation in which wh1 has already reached its scope position in (28b). Therefore, it cannot undergo further movement. What will happen in such a situation?

(28) a. [CompA … [CompB [ .. wh1 .. [ wh2 … ]]]]
    b. [CompA … wh1 [CompB [ .. wh1 .. [ wh2 … ]]]]
    c. [wh2 CompA … wh1 [CompB [ .. wh1 .. [ wh2 … ]]]]
If the MLC applies blindly irrespective of whether the LF it generates is wellformed or not, then only wh1 can move to CompA (wh2 cannot move because of intervening wh1), which implies that the derivation breaks down, because a wh-phrase is required to move that must not do so. In this way, aspects of the wh-island condition might be derived, see Chomsky (1995). This would constitute a case in which the MLC rules out a meaning that cannot be expressed otherwise. It is not advisable, however, to derive the wh-island effect from the MLC. In spite of the fact that it respects the superiority condition, English is sometimes quite liberal with respect to wh-islands, as the status of e.g. what do you wonder how to fix suggests. German respects the wh-island condition, but fails to show superiority effects. The two phenomena simply are not correlated with each other. If wh1 is frozen in its position in (28b), i.e., if it cannot move further, and if that is taken into account in the computation of MLC effects, then wh1 does not constitute a β in the sense of (1), repeated below, that could go to CompA.

(1) Minimal Link Condition (MLC)
    α cannot move to γ if there is a β that can also move to γ and is closer to γ than α
Therefore, wh2 can move to CompA (as in (28c)). The derivation leading to (28c) is well-formed, yielding a nested structure, because the lower of two Comps (which attracts first) only attracts the higher of two wh-phrases if movement respects the MLC. In this way, the nestedness condition is derivable from the MLC. Obviously, this reduction of the nestedness condition to the MLC presupposes that the applicability of the MLC does not depend on the existence of a different way of constructing the intended Logical Form. The LF (28d) is different from (28c), so that the fact that (28c) cannot be arrived at in a derivation respecting the MLC is irrelevant for the wellformedness of (28d).

(28) d. [wh1 CompA … wh2 [CompB [ .. wh1 .. [ wh2 … ]]]]
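The step from (28a) to (28c) can be traced in a short sketch. The Python fragment below is a simplified illustration under stated assumptions, not a formalisation proposed in the text: the function name `attract_cyclically` is hypothetical, list order stands in for c-command (highest phrase first), both wh-phrases are assumed to sit below both Comps as in (28a), and a wh-phrase counts as frozen as soon as it has been attracted.

```python
def attract_cyclically(wh_phrases, comps):
    """Let each Comp attract the closest wh-phrase that can still move.

    wh_phrases: wh-phrases ordered from structurally highest to lowest.
    comps:      Comps in the order in which they attract (lowest Comp first).
    Returns a dict mapping each Comp to the wh-phrase it attracts.
    """
    frozen = set()   # phrases that have reached their scope position
    landing = {}
    for comp in comps:
        for wh in wh_phrases:          # closest candidate comes first
            if wh not in frozen:       # a frozen phrase is not a competitor
                landing[comp] = wh
                frozen.add(wh)
                break
    return landing


# (28a): CompB is the lower Comp and attracts first; wh1 c-commands wh2.
landing = attract_cyclically(["wh1", "wh2"], ["CompB", "CompA"])
print(landing)
```

Under these assumptions the lower Comp takes the higher wh-phrase and the higher Comp takes the lower one, which is the nested pattern of (28c). The crossing pattern (28d) is never produced by this procedure, regardless of whether its LF could be built in any other way.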
If one wants to stick to the idea that the MLC triggers nestedness effects, one has to offer alternative accounts of the data presented in sections 1 and 2.1 that suggest an interpretation-sensitivity of the MLC. There are, however, good reasons for not endorsing such an MLC-based account of nestedness. Superiority and nestedness do not go hand in hand, as one would expect if the two phenomena were due to the same principle of UG.
For example, Swedish respects the nestedness condition (see Maling and Zaenen 1982: 238f) although it fails to show superiority effects, see (29) and (ii) in endnote 3. Thus, at least in Swedish, nestedness cannot be reduced to the MLC.

(29) Absence of superiority effects in Swedish
    Vad koepte vem
    what bought who
At least certain varieties of Spanish (see (30)) and Catalan exemplify what appears to be an anti-nestedness effect for extractions from wh-clauses: the wh-phrase that is merged in the higher position must also be the one moved to the higher of the two Spec,CP slots. Thus, wh-subjects and wh-indirect objects may cross wh-objects, but not vice versa.

(30) Anti-nestedness in Spanish
    a. *qué libros no sabes quién ha leido
        which books not you know who has read
    b. quién no sabes qué libros ha leido
        who not you know which books has read
    c. a quién no sabes qué libros ha devuelto Celia
        to who not you know which books has returned Celia
    d. *qué libros no sabes a quién ha devuelto Celia

Likewise, Richards (2001: 27) claims that there is an anti-nestedness effect in Bulgarian. Again, the constraint seems uncorrelated with superiority, since simple superiority effects are observed in Bulgarian only, and not in Spanish.

(31) Anti-nestedness in Bulgarian
    a. Koj1 se opitvat da razberat kogo2 t1 e ubil t2
        Who self try to find out whom is killed
    b. *Kogo1 se opitvat da razberat koj2 t2 e ubil t2
In any event, it is hard to draw firm theoretical conclusions from such contrasts, since there is considerable individual variation among speakers of Bulgarian (see Richards 2001: 28) and of Spanish (at least among the speakers we consulted). This variation suggests that processing factors contribute to generating (anti-)nestedness effects (see also Fodor 1978). Furthermore, nestedness effects have properties that are different from those of superiority. Norwegian shows nestedness effects, but only if three (or more) dependencies are involved (Maling & Zaenen 1982). This is unexpected from an MLC perspective: the addition of a third wh-phrase eliminates superiority effects in English. Likewise, at least in English, there is no discourse-linking influence on nestedness: (26b) is bad although both wh-phrases are d-linked in the sense of Pesetsky (1987). Superiority effects fail to show up, however, when the wh-phrases are d-linked. To sum up, there are a number of reasons for not deriving (anti-)nestedness from the MLC.
2.4. Cyclic application of the MLC

Our discussion corroborated the view that the MLC is an economy constraint: it does not apply when the relevant LF cannot be generated without violating it. The MLC is, however, insensitive to the issue of whether other components of grammar (such as PF) might prevent the structure selected by it from surfacing. The target LFs that the MLC compares must be very similar to each other. Otherwise, we could not understand the data discussed in section 1: the availability of a wh-scope-marking construction was shown to be irrelevant for the applicability of the MLC in a structure involving long wh-movement. From a conceptual point of view, the MLC should be a derivational principle that applies when a phrase moves, or when a phase is completed. A cyclic application of the MLC may be called for on empirical grounds as well: if the principles that block in situ wh-subjects in English non-subjunctive clauses, and wh-adjuncts in non-left peripheral positions do not apply at PF, but rather at LF, then we must guarantee that the application of the MLC is not affected by them. This would hold if the MLC is applied cyclically, while the two constraints are representational restrictions on completed LFs. The simplest (but insufficient) way of applying the MLC cyclically and capturing interpretation effects at the same time works with the assumption that attracting Comps come with some index that must be shared by the wh-phrase to be attracted. The index indicates the target scope of the wh-phrases. Comp can attract a wh-phrase only if the indices borne by the two elements are identical. Therefore, under a strict interpretation of (1), a wh-phrase can skip another wh-phrase if they have different indices. See, e.g., Sternefeld (1997) for a discussion. What3 can move across the wh-subject in (32a), since who bears the scope index A of the matrix Comp.

(32) a. whoA CompA wonders whatB CompB whoA bought _
    b. whoA CompA wonders whoB CompB bought whatA
    c. whoA CompA wonders whoB CompB bought what
    d. *who CompA wonders whatB CompB whoB bought

In such a model, the MLC can be hard-wired into the definition of movement (as proposed by Chomsky 1995): the attracting Comp always triggers the movement of the closest wh-phrase with the same index. Exceptions to the MLC such as (32a) are more apparent than real: whoA cannot be attracted by CompB at all. While being attractive from a conceptual point of view, this model does not account for a number of data we have considered. In (33), the wh-phrases must bear the same index, because they take scope over the same proposition (viz., the whole sentence). Therefore, if Comp attracts the closest wh-phrase with the same index, the sentences in (33) cannot be generated at all – contrary to what is necessary.

(33) a. ?what do they require that who buy?
    b. wen hat wer gehofft, dass Irina einlädt
        who.acc has who.nom hoped that Irina invites
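The index-based model and the difficulty raised by (33) can be sketched as follows. This Python fragment is only an illustration under stated assumptions: the function name `attract` is hypothetical, each wh-phrase is represented as a pair of its form and its scope index, and list order stands in for closeness to the attracting Comp (closest first).

```python
def attract(comp_index, wh_phrases):
    """Return the closest wh-phrase whose scope index matches the Comp.

    comp_index: scope index borne by the attracting Comp, e.g. "A".
    wh_phrases: (form, scope_index) pairs, closest to Comp first.
    Returns the form of the attracted phrase, or None if nothing matches.
    """
    for form, index in wh_phrases:
        if index == comp_index:
            return form
    return None


# (32a): CompB skips who (index A) and attracts what (index B).
moved_32a = attract("B", [("who", "A"), ("what", "B")])

# (33b): both phrases bear the matrix index A, so the closer matrix
# subject wer is attracted and fronting of wen is never generated.
moved_33b = attract("A", [("wer", "A"), ("wen", "A")])

print(moved_32a, moved_33b)
```

The first call reproduces the apparent MLC exception in (32a). The second call shows why the model is too strict: with identical indices it can only front the closer phrase, so the order attested in (33b) has no derivation.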
For (33a), it might suffice to assume that the that-trace effect is hard-wired into the definition of movement as well, so that who is invisible to the attracting matrix Comp in (33a). Such a solution cannot be applied for (33b), however, since matrix subjects easily reach Spec,CP in German questions. (33b) and (to a lesser extent) (33a) thus show that a local version of comparing different derivations cannot be avoided in a successful theory of the MLC. This can be made precise as follows. Let us assume that wh-phrases in situ receive their scope by being bound (as was first suggested by Baker (1970), see Dayal (2003) for an overview of non-movement theories of wh-phrases in situ), and that the binding process is itself cyclic. When the syntactic object (34a) has been constructed, a wh-phrase must move to the specifier of CompA if CompA has a feature attracting a wh-phrase. There are four derivations to be considered,
then: either wh1 or wh2 moves to the specifier of CompA, and the wh-phrase remaining in situ may or may not be scope-bound by Comp or the other wh-phrase.
(34) a. [CompA - - - [wh1 - - - [ wh2 - - - ]]]]
Suppose that the in-situ wh-phrase is not scope bound after movement within (34a). Then we arrive at the representations (34b,c), which are different from each other. Therefore, the MLC-respecting structure (34b) cannot block (34c), if an application of the MLC presupposes that the relevant LF can be generated otherwise.
(34) b. [wh1 [CompA - - - [wh1 - - - [ wh2 - - - ]]]]]
c. [wh2 [CompA - - - [wh1 - - - [ wh2 - - - ]]]]]
Wh-phrases that are not scope-bound at all are illegal at LF. Consequently, the two partial derivations in (34b,c) will end up as grammatical only if the wh-phrase left unbound so far is later bound by a higher Comp, or by a higher wh-phrase. This is exactly what happens in (32a,b). These examples show that neither of (34b,c) should be able to block the other. If the derivation proceeds beyond (34b,c), the cyclic nature of wh-binding implies that the scope of the in situ wh-phrase must not be confined to the domain of CompA. Suppose now that the in situ wh-phrase is scope bound after movement. This yields the representations (34d,e):
(34) d. [wh1 [Y CompA - - - [wh1A - - - [ wh2A - - - ]]]]]
e. [wh2 [Y CompA - - - [wh1A - - - [ wh2A - - - ]]]]]
The two syntactic objects in (34d,e) are certainly not identical, but they differ in a specific way only. The presence or absence of a phonetic matrix should be irrelevant for the constitution of a Logical Form. If we abstract away from the distribution of phonetic features in a syntactic object (and call the result a “partial Logical Form”), then the categories Y are fully identical in (34d,e). Consequently, the MLC is applicable if it is sensitive to the identity of the partial Logical Forms under construction, and if it selects the most economical one of the legal derivations. Normally, the MLC will pick (34d) and block (34e) because the closest phrase must be attracted. However, if there is a factor that applies cyclically and renders (34d) illegal, the MLC will let (34e) pass, since there is no better competing structure left. The that-trace filter
(33a) and the locality requirements for binding (33b) are examples of factors that imply a vacuous application of the MLC.
3. Pragmatic effects
In the majority of languages, there are no simple superiority effects for clausemate wh-phrases. The purpose of this section is to integrate the description of these languages into our interpretation of the MLC. Section 3.1 presents the core facts, discusses potential processing influences, and contains further remarks on argument-adjunct asymmetries. Section 3.2 refutes the idea that the absence of simple superiority effects is due to a relaxed definition of closeness, while section 3.3 argues that we also cannot be content with the proposal that the superiority violations are absent because scrambling may precede wh-movement. The economy account envisaged here is discussed in section 3.4.
3.1. The absence of simple superiority effects: some general remarks
In a surprisingly large number of languages, intervention effects of the kind exemplified in (3) do not show up in single clauses. Consider, e.g., the examples given in (35), all illustrating (apparent) violations of (1). Other languages belonging to this group are Mohawk, Kashmiri, Malayalam, and the Slavic languages except Bulgarian.
(35) Apparent violations of the MLC for clausemate arguments
a. Vad koepte vem (Swedish)
what bought who
b. hvað keypti hver (Icelandic)
what bought who
c. qué dijo quién (Spanish)
what said who
d. co kto robił (Polish)
what who did
e. nani-o dare-ga tabeta no (Japanese)
what who ate
f. was hat wer gesagt (German)
what has who said
Two remarks are in order before we can discuss possible analyses for (35). First, it is often hard to determine whether a language tolerates superiority violations or not. When I asked 22 Dutch linguists via the internet to rate (36), five accepted it and seven found it questionable, while ten speakers rejected the sentence. It is not very plausible that this judgment pattern lends support to the claim that there is a categorial difference between, say, Dutch and German with respect to superiority. Likewise, it is not obvious what the marginality of (37) implies for the status of superiority in French.
(36) Dutch superiority
#ik weet niet wat wie gekocht heeft
I know not what who bought has
“I do not know who has bought what”
(37) French superiority
?Je me demande à qui a parlé qui
I me wonder to whom has talked who
“I wonder who has talked to whom”
Instead of forcing (36) and (37) into one or the other category, the graded nature of such MLC violations should figure in the analysis of the construction.3 This is particularly true in the light of experimental findings concerning judgements by linguistically naive informants. We compared structures such as (38a) and (38b) in a questionnaire study and found a highly significant difference between multiple questions that respect the MLC and those that do not. Structures violating the MLC were rated worse than those respecting it (4.8 vs. 2.34) on a 1-6 scale (1: perfect, 6: completely ungrammatical) by linguistically naive informants.
(38) a. Wer besucht wen in der Villa? (mean rating: 2.34)
who visited whom in the villa
b. wen besucht wer in der Villa? (mean rating: 4.80)
Given that the syntax literature states more or less unanimously that German lacks simple superiority effects, such findings are a bit surprising at first glance, but they are in line with those obtained by Featherston (2002a,b), and they reappeared in a very similar shape in our questionnaire studies concerning Polish and Russian.
The key to an understanding of this difference between the syntacticians’ wisdom and empirical findings lies in the observation that acceptability judgements are influenced by a variety of factors, among them processing difficulty. Object-initial structures are harder to process than their subject-initial counterparts (as was already shown by Krems 1984 and Frazier and Flores d’Arcais 1989, see also Hemforth 1993, among many others), and it seems to be for exactly this reason that object-initial structures are in general rated worse than subject-initial ones in German, irrespective of whether a potential superiority violation is involved or not (see Featherston 2002b). The rating difference between (38a) and (38b) is thus not a proof that there is some underlying MLC-based superiority effect in German, but if this line of reasoning is correct, it is hard to see on what basis one would have to assume a grammatical rather than a processing account for the rating profile for Dutch (36).
The second remark concerns the reappearance of argument-adjunct asymmetries in structures violating superiority. Wh-objects may cross wh-subjects in Swedish (35a), but wh-adjuncts do not have such a freedom: my five informants unanimously rejected (39b), and accepted (39a) only.
(39) Swedish adjunct superiority
a. Vem skrattade varfoer
who laughed why
b. *Varfoer skrattade vem
German, on the other hand, imposes no real restrictions on multiple questions involving warum, ‘why’. We asked 17 non-linguist native speakers of German to rate the grammaticality of (40). 15 of these accepted (40a), and 10 found (40b) grammatical as well.
(40) Absence of superiority effects for German adjuncts
a. wer lachte warum
who laughed why
b. warum lachte wer
Presumably, this contrast is related to a further difference between Swedish and German. Multiple questions involving two adjuncts were unanimously rejected by the Swedish informants. German shows something reminiscent of a superiority effect in such multiple questions: (42a) was accepted by 9
of 17 informants, while (42b) was judged as grammatical by three informants only. To my ears, (42a) allows a pair-list reading, while (42b) is restricted to a single pair/echo interpretation. See Haider (this volume) for an analysis of languages (not necessarily true for German) in which multiple questions must not involve two adjuncts. Below, we will comment on the apparent superiority effect in (42).
(41) Swedish multiple questions involving two adjuncts
a. *Varfoer betedde sig barnen hur
why behaved refl the children how
b. *Hur betedde sig barnen varfoer
(42) German multiple questions involving two adjuncts
a. Warum benahmen sich die Kinder wie?
why behaved refl the children how?
b. *Wie benahmen sich die Kinder warum?
3.2. The absence of simple superiority effects: caused by low subject positions?
At least two types of formal accounts for the absence of superiority effects in (35) can be found in the literature, and we will discuss them in turn before we consider a pragmatic explanation. First, the definition of “closeness” central to the MLC might be modified, so that two phrases can be “equidistant” from a target position even if one of them asymmetrically c-commands the other. Second, additional movement operations might reverse the c-command relations between wh-phrases before wh-movement. Whether a wh-phrase α may cross another wh-phrase β c-commanding α depends on the definition of closeness in (1): If the MLC is defined as in (43), crossing is excluded in general, but if closeness is made precise in a more liberal way, as in (44), the MLC does not restrict the movement of phrases within the same maximal projection.
(43) MLC: Strict Version
α cannot move to γ if there is a β that can also move to γ and that c-commands α
(44) MLC: Liberal Version
α cannot move to γ if there is a β that can also move to γ and that asymmetrically m-commands α
Suppose, then, that MLC effects are computed relative to (44). Whether a wh-object may be moved across a wh-subject then depends on the hierarchical position of the subject. Subjects are base-generated in the VP. If the subject moves to Spec,IP as in (45a), it asymmetrically m-commands the object. Therefore, an object cannot pass it on its way up to Spec,CP. If the subject stays in VP, as in (45b), the condition for the application of (44) is not met, so that the presence of a wh-subject does not interfere with the preposing of a wh-object.
(45) a. [ IP subject [verb phrase [V object]]
b. [ IP [verb phrase subject [V object]]
(44) thus links the presence or absence of simple superiority effects to an independent parameter, viz., the location of the subject. Indeed, subjects need not move to Spec,IP in many of the languages (among them Spanish or German) that disrespect superiority. The “free inversion” of subjects and verbs in Spanish has always been taken as evidence that Spec,TP can be filled by an empty pleonastic pro, which allows the subject to stay in the verbal projection.
(46) Free Inversion in Spanish
le regalaron los estudiantes un libro
her gave the students a book
“the students gave her a book as a present”
The view that thematic subjects need not leave the VP in German either is corroborated by constructions in which the VP precedes the second position auxiliary, as was noted by Haider (1986, 1990, 1993): The subject can be part of such VPs (47b,c), a fact suggesting that it need not move to Spec,IP in overt syntax.
(47) a. [Mädchen geküsst] hat er noch nie
girls kissed has he not yet
“he has not yet kissed any girls”
b. [Häuser gebrannt] haben hier noch nie
houses burnt have here yet never
“houses have never burnt here”
c. [Mädchen geküsst] haben ihn noch nie
girls kissed have him yet never
“girls have not kissed him yet”
Thus, there is independent evidence that (45b) is a legal constellation of German and Spanish. An MLC formulated as in (44) will not prevent the object from moving across the subject in (45b). In contrast, subjects must go to Spec,IP in English. Here, (45a) is the only constellation that can underlie multiple questions such as (3). Even in its liberal version (44), the MLC prevents an object from crossing a subject. The choice between (45a) and (45b) is thus a good candidate for an explanation of crosslinguistic variation concerning simple superiority effects (see, e.g., Haider, this volume).
While such an approach successfully captures basic superiority facts, more complex data are not readily explained along these lines. Consider Icelandic first. One may want to relate the absence of superiority effects in this language to (44), since the existence of so-called transitive expletive constructions in Icelandic suggests that thematic subjects may be placed into lower positions than in English (see, e.g., Bobaljik and Jonas (1996)). Haider (2000, this volume) observes that movement to Spec,IP is an option for thematic subjects in Icelandic, German, and Spanish, and notes that an explanation of the absence of superiority effects in terms of a low subject position predicts that one should observe English-type asymmetries whenever the position of adverbial material makes it clear that the thematic subject occupies a high position. Haider cites contrasts such as (48) (which he attributes to Ottósson (1989), and H. Sigurdsson, p.c.) as evidence for the claim that this prediction is borne out:
(48) Superiority effects in Icelandic and different subject positions
a. Hvað hefur hver gefið börnunum?
what has who given the-children?
b. *Hvað hefur hver oft gefið börnunum?
what has who often given the-children?
“who has often given what to the children?”
The availability of a low position for the subject hver in the verbal projection explains the grammaticality of (48a). In (48b), however, the subject precedes
oft ‘often’, i.e., it precedes an element adjoined to VP, and occupies a high position in the clause. The ungrammaticality of (48b) suggests, then, that the position of the thematic subject is the crucial factor governing superiority effects, as Haider argues. The argument presupposes, however, that (48b) becomes perfect when the order of the subject and the adverb is reversed. According to my informant (Þorsteinn Hjaltason), this expectation is not fulfilled. Rather, we get the following array of relative judgements:
(49) Superiority effects in Icelandic and different subject positions
a. Hvað hefur hver gefið börnunum?
what has who given the children
b. ?Hvað hefur hver oft gefið börnunum?
what has who often given the children
c. ?Hvað hefur hver oft gefið hverjum?
what has who often given whom?
d. *Hvað hefur oft hver gefið börnunum?
e. Hvað hefur hvaða faðir oft gefið börnunum?
what has which father often given the children
f. *Hvað hefur oft hvaða faðir gefið börnunum?
(49a) is grammatical because Icelandic shows no superiority effects. (49b) is less acceptable, but this effect is not eliminated by the addition of a third wh-phrase (49c), as it should be if the phenomenon is related to the MLC. Most importantly, the structure becomes fully ungrammatical when the order of the subject and the adverb is reversed. The status of (49d) is quite unexpected, because the order of subject and adverb seems to imply a low position for the former. The ungrammaticality of (49d) is matched by that of (49f), which involves a d-linked wh-phrase. Whatever may be responsible for the contrasts in (48) and (49), the MLC is not likely to come into play.
Similarly, the grammaticality of (50a) might be related to the low position occupied by the unaccusative subject in this example.4 However, the structure does not degrade dramatically when the subject is placed into the slot preceding the object pronoun (that is, when it presumably moves to Spec,IP): five out of a total of eight Dutch linguists I consulted found (50b) completely unobjectionable.
(50) Different subject positions and Dutch superiority
a. wanneer is hem wat overkomen
when is him what happened
b. wanneer is wat hem overkomen
The position of the subject is also not completely irrelevant for the wellformedness of multiple questions in German. Consider the contrasts in (51)5, in which a non-subject has been placed in front of a wh-subject. Such constructions fail to be fully grammatical (to different degrees) when the subject precedes a clitic object pronoun (51b,d), an unstressed object pronoun (51f), or one of the particles like denn which have been claimed to mark the VP boundary in German (51h) (see Diesing 1992, Meinunger 1995, for a discussion of VP boundaries).
(51) Different subject positions in German multiple questions
a. wann hat’s wer gesehen
when has it who seen
b. ?*wann hat wer’s gesehen
“who saw it when?”
c. wem hat’s wer gegeben
who.dat has it who given
d. ?*wem hat wer’s gegeben
e. wem hat es wer gegeben
f. ?*wem hat wer es gegeben
“who gave it to whom”
g. was hat denn wer gesagt
what has ptc. who said
h. ?*was hat wer denn gesagt
“who said what”
Multiple questions are less grammatical when a wh-phrase crosses a wh-subject that has moved to Spec,IP. It is tempting to explain such contrasts in terms of the assumption that the wh-subject asymmetrically c-commands the trace of the wh-object in the ungrammatical examples, so that the MLC (44) would block the bad structures.
Such an analysis is not convincing, however. It does not take into account the fact that the same or similar contrasts show up in constructions for which the MLC cannot be relevant. German wh-words are ambiguous between an interrogative and an indefinite interpretation. The restrictions on the placement of interrogative wh-subjects exemplified in (51) are exactly mirrored by comparable restrictions on the placement of indefinite wh-subject pronouns, as (52) shows. The indefinite subjects in (52) share the distribution of wh-phrases, but they do not interact with any other element in the clause in terms of the MLC. Therefore, the MLC cannot explain (52), and it would be strange if it accounted for the same distribution of data in (51). (51) and (52) show that German syntax imposes restrictions on the placement of subjects that are not definite. The MLC is not responsible for these.
(52) Effects of the subject position for indefinite pronouns
a. dann hat’s wer gesehen
then has it someone seen
“then, someone saw it”
b. ??dann hat wer’s gesehen
c. dem hat’s wer gegeben
him.dat has it someone given
“someone gave it to him”
d. ?*dem hat wer’s gegeben
e. dem hat es wer gegeben
f. ?*dem hat wer es gegeben
g. hat denn wer angerufen
has ptc. someone called
“did someone call?”
h. ?*hat wer denn angerufen?
A central prediction of an account of (absent) superiority effects that exploits differences in the placement of subjects is not borne out: in a number of languages that fail to show superiority effects (German, Icelandic, and perhaps Dutch), the actual position of the subject does not influence the grammaticality of multiple questions in the expected way.
3.3. The absence of simple superiority effects: caused by scrambling?
A second attempt at capturing (35) assumes that the object in fact c-commands the subject at the point of the derivation when movement to Spec,CP is carried out. Under this circumstance, the MLC does not have to be relaxed in order to explain (35): Given that the order object > subject is in principle always grammatical in German (53a,b), the question arises whether (53c) really is not in line with even the strictest version of the MLC. After all, (53c) might be derived from (53d) rather than (53e). In the former case, the highest wh-phrase is moved to Spec,CP in (53c), as predicted by the MLC.
(53) Object-subject order in German and the MLC
a. dass fast jeden jemand angerufen hatte
that nearly everyone.acc someone.nom called had
“that someone had called nearly everyone”
b. dass fast jeden wer angerufen hatte
that nearly everyone.acc someone.nom called had
“that someone had called nearly everyone”
c. wen hat wer eingeladen
who.acc has who.nom invited
“who has invited whom?”
d. hat [wen [wer eingeladen]]
e. hat [wer [wen eingeladen]]
In other words, (53c) might be grammatical because additional movement operations (scrambling) can change the c-command relations established by Merge.6 If the object can in general be placed in front of the subject, structures such as (53d) can be derived in which the wh-object c-commands the wh-subject. Even in its strictest version, the MLC cannot block the subsequent movement of the wh-object to Spec,CP. See, e.g., Fanselow (1998, 2001), Haider (1986), Wiltschko (1998), among others, for different versions of this account.
According to Fanselow (1998), the contrasts in (54) corroborate the view that apparent violations of superiority are licensed by scrambling. Certain wh-phrases such as wen von den Studenten (54a) or was für Frauen (54d) can either move to Spec,CP as a whole, or be split up in simple and multiple questions (54b,e). In the latter case, only the wh-part of the phrase undergoes fronting, whereas the remaining part is stranded. The stranded material indicates the position from which the phrase has been attracted to Spec,CP. The ungrammaticality of (54c,f) suggests, then, that a wh-phrase cannot cross another one in German, either. Objects may undergo overt wh-movement in multiple questions, but only if movement starts in a position c-commanding a wh-subject.
(54) Superiority and Splitting
a. wen von den Studenten hat heute wer eingeladen?
who.acc of the students has today who.nom invited
b. wen hat [von den Studenten] heute wer heute eingeladen?
c. *wen hat heute wer abends von den Studenten eingeladen
who has today who in the evening of the students invited
“who has invited which of the students today (in the evening)”
d. was für Frauen hat wer heute eingeladen
what for women has who.nom today invited
“who has invited which kind of women today”
e. was hat für Frauen wer heute eingeladen
f. ??was hat wer für Frauen heute eingeladen
Pesetsky (2000) points out that contrasts such as the ones in (54) find an explanation in terms of the intervention effects analysed by Beck (1996), see also Mathieu (2002). (55) shows that the parts of a discontinuous wh-phrase must not be separated by any kind of operator in German. An intervention account can explain (54) and (55) at the same time, while the MLC-based explanation for (54) cannot be easily extended to (55).
(55) Intervention effects and Split noun phrases
a. was hat er für Frauen nicht getroffen
what has he for women not met
“what kind of woman did he not meet?”
b. *was hat er nicht für Frauen getroffen
Pesetsky’s observation certainly establishes that data such as (54) cannot be used to show that object wh-movement cannot originate below a wh-subject
in a multiple question. Notice, however, that (54b,e) still show that wh-extraction of an object may start in a position c-commanding a wh-subject. Reference to (53b), i.e., to the grammaticality of structures in which the object occupies a higher position than the subject, thus seems to be in general a sufficient7 (though not a necessary) condition for the absence of simple superiority effects in a language. Unfortunately, the scrambling solution for (35) cannot be applied in all languages in which superiority effects are absent, because quite a number of them (Swedish, Icelandic, French, Dutch) do not have free constituent order generated by scrambling!8
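The logic of the scrambling account, and the reason it undergenerates, can be summarized in a deliberately simple sketch. The function `frontable_wh` and the boolean `has_scrambling` are hypothetical, and the sketch ignores the information-structural licensing of scrambling as well as intervention effects: scrambling is modelled as an optional reversal of the base order of the wh-phrases, after which the strict MLC (43) attracts the highest one.

```python
from typing import List, Set


def frontable_wh(base_order: List[str], has_scrambling: bool) -> Set[str]:
    """Wh-phrases that can reach Spec,CP under the strict MLC (43).

    `base_order` lists the wh-phrases from highest to lowest after Merge.
    """
    orders = [list(base_order)]
    if has_scrambling:
        # Scrambling may place the lower phrase above the higher one
        # before wh-movement applies, as in (53d).
        orders.append(list(reversed(base_order)))
    # The strict MLC attracts only the highest wh-phrase of each structure.
    return {order[0] for order in orders}


# German-type language with scrambling: either phrase may be fronted.
print(sorted(frontable_wh(["subject", "object"], has_scrambling=True)))
# Language without scrambling: only the subject may be fronted.
print(sorted(frontable_wh(["subject", "object"], has_scrambling=False)))
```

For a language without scrambling the sketch admits only subject-initial questions, which is at odds with object-initial examples such as Swedish (35a) and is exactly the shortcoming noted above.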
3.4. Pragmatics
In spite of its shortcomings, the scrambling account has an attractive feature: it implies that the choice between the object- and the subject-initial versions of a multiple question is never arbitrary in the languages that tolerate (35). Scrambling can place an object in front of a subject only if the latter is more focal than the former. Therefore, the scrambling account of missing superiority predicts that apparent superiority violations are acceptable under certain pragmatic circumstances (those that would license scrambling) only. This prediction is borne out. The pragmatic conditions of use of (56a) and (56b) are different. They require different “sorting keys” (Comorovski 1996). Answers to (56b) are well-formed if the object of the clause represents a contrastive topic. There are no comparable restrictions on the wellformedness of (56a).
(56) Absence of superiority in German
a. wer hat wen gesehen
who.nom has who.acc seen
b. wen hat wer gesehen
“who has seen whom?”
This pragmatic dependency becomes evident when one considers the minitexts in (57). The a.- and b. examples introduce the referents of the subject and the object, respectively, as known to the speaker. These referents constitute the “sorting keys” for the multiple questions a’ and b’; they are discourse-linked (see Pesetsky 1987). (57a) can only be continued by (57a’), and (57b) only by (57b’).
(57) Discourse influence on superiority violations in German
wir haben bereits herausgefunden
we have already found out
a. wer jemanden gestern anrief, und wer nicht
who.nom someone.acc yesterday called and who.nom not
b. wen jemand gestern anrief, und wen nicht
who.acc someone.nom yesterday called and who.acc not
Aber wir sind nicht eher zufrieden, bis wir auch wissen
But we are not earlier content until we also know
a’. wer WEN angerufen hat
who.nom who.acc called has
b’. wen WER angerufen hat
In other words, a wh-object can precede a wh-subject in German if the former is more topical than the latter. Out-of-the-blue wh-questions allow subject > object order only. This is particularly clear when the predicate is symmetric (such as treffen, “meet”) as in (58), so that discourse-linked differentiations of subjects and topics are very hard to imagine.
(58) Superiority effects in out of the blue contexts
Erzähl mir was über die Party.
“Tell me something about the party”
a. Wer hat wen getroffen?
who.nom has who.acc met?
b. ??Wen hat wer getroffen
“who met who?”
Steinitz (1969) was the first to observe that modal or sentence level adverbs resist reordering in the interest of information structure. The adverbial “superiority” effects discussed in (42) can be accounted for in these terms. The languages that lack simple superiority effects do not differ in this respect: constituent order reflects information structure. Different types of operations conspire to guarantee that focal information is preceded by topical information: scrambling (German, Japanese, Polish), topicalization to Spec,CP (Swedish, Icelandic, German), or subject placement in Spec,IP or VP (Spanish, German). In the most parsimonious account, these operations are driven by a constraint C-INF that requires that topical material c-command
focal elements (but more luxurious theories of information structure would have the same effect). If C-INF plays a role in determining the well-formedness of partial LFs in multiple questions as well, then (34d) (repeated here for convenience) is able to block (34e) only if this does not prevent a particular distribution of focality/topicality among the wh-phrases from being expressed within the limits imposed by C-INF. If the higher degree of topicality of wh2 must be expressed, (34e) can be chosen. Information structure overrides the MLC.
(34) d. [wh1 [Y CompA - - - [wh1A - - - [ wh2A - - - ]]]]]
e. [wh2 [Y CompA - - - [wh1A - - - [ wh2A - - - ]]]]]
4. The nature of exceptions
While the absence of simple superiority effects in the interest of information structure is a widespread phenomenon, it is far from being universal, as evidenced, e.g., by the relevant English data. The contrast between English and German in the formation of multiple questions might be indicative of the different importance the languages attribute to C-INF: in German, its effects are stronger than the MLC, while it is the other way round in English. The interaction of the MLC and the constraints of information structure would thus be reminiscent of an optimality theoretic framework (for OT accounts of MLC-effects, see, e.g., Müller 2001, and the contributions by Hale and Legendre, Lee, and Vogel, this volume). We will argue, however, that such a conclusion is not warranted. The MLC is never stronger than C-INF.
4.1. Bulgarian
Roumanian and Bulgarian are languages cited frequently when one wants to substantiate the claim that superiority effects are not confined to English.
(59) Simple superiority effects in Roumanian and Bulgarian
a. cine ce cumpara
who what buys
b. *ce cine cumpara
c. koj kogo vizda
who whom sees
d. *kogo koj vizda
The Slavic languages other than Bulgarian such as Czech, Polish or Russian allow superiority violations, however. The seminal study of Rudin (1988) initiated an impressive series of studies that try to account for this and other differences among the Slavic languages, cf. Błaszczak and Fischer (2002) for an overview. The proposal advanced by Bošković (2002) (see also Bošković 1997, Stepanov 1998) is the most interesting one in the context of the preceding section. According to him, wh-phrases move to specifier positions defined in terms of information structure (focus) in Polish, Russian, or German, while movement targets a pure [+wh] specifier in Bulgarian or English. This might fit into the preceding discussion in the following way: when phrases move to a [+wh] specifier, only the attracting feature is grammatically visible, so that additional features of information structure will not interfere with the application of the MLC. However, when XPs are attracted to heads defined in terms of information structure, it is the distribution of the pertinent features that determines how attraction is carried out.
It is doubtful, however, that a model drawing a sharp line between Bulgarian and the other Slavic languages is adequate. The intuition represented in (59c,d) is not shared by all native speakers of Bulgarian: two of the five native speakers that I have consulted accept a sentence such as kakvo koj pravi? “what who did” provided that koj “who” is stressed. It is not obvious, then, that the judgement pattern for Bulgarian multiple questions is qualitatively different from the one for German or Dutch. Even if we disregard the empirical issue of whether (59d) is really ungrammatical in Bulgarian (and not just rejected by some speakers, similar to what holds for German or Polish superiority violations), the contrast in (59c,d) is not identical with the one we find in English. A number of differences from English come out clearly.
In Bulgarian, strict superiority effects can be found for animate subjects only. When the subject is inanimate, and the object animate, both orders are fine, as Billings and Rudin (1996: 38) have observed.
(60) Absence of superiority effects with inanimate subjects of transitive verbs
a. Kogo kakvo e udarilo?
whom.acc what.nom CL hit
b. Kakvo kogo e udarilo?
“What hit whom?”
No superiority effects show up with psychological predicates, as (61) illustrates. Sometimes, subject-initial sentences even seem worse than sentences beginning with the dative wh-phrase:
(61) Absence of superiority effects with psychological predicates
a. Koj na kogo mu xaresva?
who.nom whom.dat CL-dat.3.sg is-pleasing
(literally) “Who is likeable to whom?”
b. Na kogo koj mu xaresva?
c. ??Kakvo na kogo mu xaresva?
what.nom to whom.dat CL-dat.3.sg is-pleasing
(literally) “what is likeable to whom?”
d. Na kogo kakvo mu xaresva?
Superiority effects are therefore restricted to external arguments of transitive verbs, and even for them, the only defensible generalization is the one offered by Billings and Rudin (1996: 46): “If the wh-external argument is human (i.e., koj), then it must appear first in the wh-cluster.” That such a constraint on wh-clusters may be necessary quite independent of any considerations of superiority is suggested by the fact that Bulgarian differs from English with respect to ternary questions as well. Kayne (1983), Hornstein (1995), Pesetsky (2000) and others have observed that superiority need not be respected in ternary questions: even the lowest wh-phrase can be fronted.
(62) Absence of superiority effects in English ternary questions
a. what did who buy where?
b. what did who persuade who to buy
Cancellation effects due to the addition of a third wh-phrase exist in Bulgarian, too (see (63)), but the examples used in the literature and the intuitions of my informant Penka Stateva suggest that the liberalizing effect never affects subject koj.
(63) Restricted liberalization of superiority in Bulgarian
a. Koj kogo kakvo e pital?
   who whom what is asked
   “Who asked whom what?”
b. Koj kakvo kogo e pital?
c. Koj kogo kak e tselunal?
   who whom how is kissed
   “Who kissed whom how?”
d. Koj kak kogo e tselunal?
e. Koj kogo kŭde e vidjal?
   who whom where is seen
f. Koj kŭde kogo e vidjal?
Recall also that the ordering restrictions of Bulgarian koj do not show the interpretation-sensitivity of the English superiority effect. The judgements for (9) – repeated for convenience – seem to correlate with the judgements for simple kakvo koj kupi. If the MLC were responsible for the ungrammaticality of (59d), it would be unclear why the condition is not interpretation-sensitive in Bulgarian, whereas it is in English and German.
(9) Anti-superiority in Bulgarian
a. #koj se chudi, kakvo koj kupi?
   who wonders what who bought
   “who wonders what who bought?”
b. #na kogo kaza, kakvo koj kupi?
   who.dat you-tell what who bought
The MLC is thus not a likely cause for the ordering restrictions in Bulgarian. A simple account can, however, be formulated in terms of the fact that Bulgarian is a multiple fronting language. One of the crucial insights of Rudin (1988) was that the peculiarities in the behavior of Bulgarian (as compared to other Slavic languages) can be related to the fact that Bulgarian is a “multiple filler” language: all wh-phrases must be preposed in a multiple question (unless they are discourse-linked). Suppose that sequences of wh-pronouns form a cluster, and that the morphophonological realization of this cluster is subject to the kind of rules that also govern the linear arrangement
of sequences of clitics and inflectional affixes. As Bonet (1991), Halle (1992) and Noyer (1992) show, the order of elements in such clusters cannot be exclusively predicted from syntax. Rather, independent principles of morphology are needed, a view that is well established nowadays in the theory of Distributed Morphology (Halle and Marantz 1993, 1994). There is independent evidence that the composition of wh-phrases in clusters is governed by non-syntactic principles in Bulgarian. Billings and Rudin (1996: 43) suggest that (64b) is ungrammatical because *na kogo kogo ‘to whom whom’ violates a ban against consecutive wh-homophones. In colloquial Bulgarian, na kogo can be replaced by na koj, in which case both orders of the objects are fine:
(64) Phonological restrictions in wh-clusters in Bulgarian
a. Koj kogo na kogo e pokazal?
   who.nom whom.acc to whom.dat CL showed
b. *Koj na kogo kogo e pokazal?
c. Koj kogo na koj e pokazal?
   who.nom whom.acc to who.dat CL showed
d. Koj na koj kogo e pokazal?
   “Who pointed out whom to who?”
It is natural to assume that further templatic constraints determine the arrangement of wh-pronouns in the cluster, among them a requirement that koj must come first in a truly transitive construction. This requirement implies the contrast in (59). Since it is a PF constraint, considerations of expressivity will not play a role, as required.
Some observations from other languages lend support to the view that cluster formation is crucial in establishing ordering restrictions that resemble (but fail to be) superiority effects. Languages in which cluster formation is optional are of particular relevance here. In Yiddish, multiple fronting of wh-phrases to a position preceding the verb is possible, but not mandatory. In wh-clusters, word order is strict (65a,b), but it is free when only one wh-phrase is placed into preverbal position (65c,d); see Hoge (2000) for discussion. Likewise, in Hebrew, superiority can be violated only if the verb is placed between the two wh-phrases, although inversion in wh-questions is not necessary as such (66):
(65) Multiple questions in Yiddish
a. ver vemen hot kritikirt?
   who whom has criticised
b. *vemen ver hot kritikirt?
c. ver hot vemen kritikirt?
   who has whom criticised
d. vemen hot ver kritikirt?
   whom has who criticised
   “who criticised whom?”
(66) Superiority in Hebrew
a. ma kana mi
   what bought who
b. *ma mi kana
For obvious reasons, wh-pronouns cannot form a continuous cluster when they are separated by a verb. The data in (65) and (66) can be captured easily in a model that allows for templatic ordering restrictions of wh-phrases which apply when syntax is spelt out. Grewendorf (1999, 2001) and Hoge (2000) account for superiority in Bulgarian by cluster formation as well, but in a fairly different way.
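The templatic view of wh-cluster ordering can be made concrete with a small sketch. The Python below is only an illustration under simplifying assumptions, not an implementation of any published analysis: the function names, the flag `koj_is_transitive_external_argument`, and the representation of a cluster as a list of wh-phrases are all hypothetical, and only the two constraints mentioned in the text are encoded (koj first in a truly transitive construction, no consecutive wh-homophones).

```python
def has_adjacent_homophones(cluster):
    """Return True if two identical wh-words end up next to each other.

    The cluster is a list of wh-phrases such as ["koj", "na kogo", "kogo"];
    it is flattened into single words before adjacent pairs are compared.
    """
    words = " ".join(cluster).split()
    return any(first == second for first, second in zip(words, words[1:]))


def cluster_is_well_formed(cluster, koj_is_transitive_external_argument=True):
    """Check a contiguous wh-cluster against two assumed PF templates.

    Template 1 (assumed): if koj is the external argument of a truly
    transitive verb, it must be the first phrase of the cluster.
    Template 2 (assumed): no consecutive wh-homophones.
    """
    if (koj_is_transitive_external_argument
            and "koj" in cluster
            and cluster[0] != "koj"):
        return False
    return not has_adjacent_homophones(cluster)


if __name__ == "__main__":
    # (64a) vs. (64b): the order "na kogo kogo" violates the homophone ban.
    print(cluster_is_well_formed(["koj", "kogo", "na kogo"]))
    print(cluster_is_well_formed(["koj", "na kogo", "kogo"]))
    # (61b): koj is not a transitive external argument, so it need not be first.
    print(cluster_is_well_formed(["na kogo", "koj"],
                                 koj_is_transitive_external_argument=False))
```

Because the check inspects only a contiguous cluster, it is silent about orders in which a verb separates the wh-phrases, as in (65c,d) and (66a).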
4.2. English
English superiority effects are difficult to account for in the model we propose. This is not necessarily a negative aspect: superiority effects in English are distributed in a very complex way, and it is not at all clear how this distribution could be captured in a simple MLC account. Intervention effects disappear in English when the wh-phrases allow a context-related interpretation. Pesetsky (1987) shows that (67a) is fine because it has a “discourse-linked” interpretation: a wh-phrase is discourse-linked if its interpretation relates to a contextually given set of objects and persons, from which one tries to pick a relevant one with the wh-phrase. Thus, the d-linked wh-phrase in (67a) generates a contrastive topic for the answers, as it does in German. As Bolinger (1978) observes, proper contexts even license the absence of intervention effects for wh-pronouns, as in (67b).
(67) Absence of superiority effects in certain contexts in English
a. which book did which person read
b. I know what everyone was supposed to do. But what did who actually do?
However, reference to (67) alone does not explain why (68a) sounds bad to the English ear, while its one-to-one translation into German (68b) is grammatical.
(68) a. *what will who see
     b. was wird wer sehen
The key to an understanding of this contrast lies in the observation that who is a topic in (68a), while wer can be focal in (68b), and bear focal stress. If wh-pronouns are inherently indefinite, and constitute bad topics, the different status of (68a,b) can be understood. A number of facts support this view. First, the acceptability of a crossing structure depends on the degree to which the subject wh-phrase can be interpreted as a referential category, as discourse-linked, as a potential topic.
(69) Crossing effects as a function of the potential topicality of the subject
a. what did a friend of who say to Bill?
b. what did whose friends say to Bill?
c. *what did each friend of who say to Bill (Hornstein 1995: 147)
d. *what did how many men buy?
Second, Erteschik-Shir (1997: 190) observes that object-initial questions are in general relatively bad in English when the subject is an indefinite, a weak quantifier. Obviously, the unacceptability of (70b) (with a non-generic, non-specific reading of a boy) cannot be explained in terms of the MLC. But if indefinites are bad as such in the subject position of questions, one does not need to additionally invoke the MLC.
(70) Non-referential subjects in wh-questions in English
a. what did two boys find?
b. *what did a boy find?
c. which book did two boys find?
d. ?which book did a boy find?
Summing up, there is reason to believe that the difference between (68a) and (68b) stems from the fact that a wh-subject must be topical in English when it is in situ, while this does not hold for German. Zubizarreta (1998) develops a prosodic theory for accent and focus placement in English which implies that the predicate will be in focus in double questions of English in which the subject is left in situ. Erteschik-Shir (1997) proposes a model of the syntax-information structure interface which also implies topichood for the subject when certain formal dependencies are built up in a clause. In the interest of space, I will not try to assess the merits of these approaches, but confine myself to pointing out that the connection between topichood and in situ wh-subjects apparently need not be stipulated for English.
Explanations borrowed from Zubizarreta and Erteschik-Shir may help explain the status of (68a) – but do they also fit the general model we try to defend here, viz. that the MLC is an interface economy constraint that blocks structures only if their (partial) LF can be arrived at in a more economical way? What is the proper way of expressing questions in which an object wh-pronoun is the sorting key for answers? It is worthwhile to compare the constellations which lead to crossing effects with wh-pronouns in English with those that do not:
(71) Structural constellations leading to crossing effects: passive
a. who bought what?
a’. *what did who buy?
a”. what was bought by whom?
b. who did you give _ what
b’. *what did you give who _
c. what did you give _ to whom
c’. *who did you give what to _
The contrasts in (71) are related to the fact that English expresses information structure distinctions in a way different from scrambling and topicalization. (71) shows that English bans crossing wh-pronouns primarily in those contexts in which it offers an alternative way of making a lower (wh-) phrase more topical than the higher one. For subjects and objects, this alternative way is the passive construction. The conditions of information structure that license counterparts to (71a’) in German are therefore not inexpressible in English. Rather, they imply the use of a passive. (71b-c) illustrate that one can front both the direct and the indirect object in a multiple question, but the options (related to information structure) are linked to the dative alternation. (71b) is unobjectionable because it is in line with the MLC. The MLC-violation in (71b’) would have to be motivated on grounds of information structure (who being more focal than what), but in a dative shift construction, the inner object (who) must be more topical than the outer object. Therefore, (71b’) is ill-formed on pragmatic grounds. (71c) is grammatical since it conforms to the MLC. The information structure requirements that would license the MLC-violation in (71c’) are those that trigger the dative shift alternation. (71c’) is illicit because the proper way to express its information structure is (71b). In an OT-framework, one may feel tempted to explain the data in (71) by assuming a grammatically visible competition between active and passive sentences, or between the constructions V NP PP and V NP NP at the point when the MLC is evaluated, but a more conservative solution is also at hand: we can assume that the information structure constellation needed to override the MLC in (71a’, b’, c’) cannot be linked to the construction in question in English (because of the structural alternatives passive and dative shift). Other constellations do not yield a crossing effect. 
English has no special way of expressing information structure interactions of objects and adverbs and adverbial PPs. There being no restrictions on the distribution of topicality, the information structure needed to override the MLC in either (72a) or (72a’) can easily be linked to the sentences, so that both ways of formulating the multiple question are well-formed.
(72) Constellations without crossing effects
a. what did you see where?
a’. where did you see what?
b. to whom did you give what?
b’. what did you give to whom?
The absence of a contrast between (72b) and (72b’) forces upon us the assumption that the construction V NP [to NP] comes in two varieties: to may be a dative marker, or the head of a PP. If information structure restrictions favoring the dative alternation affect the former version only, the absence of a contrast is predicted. Alternatively, we may assume that wh-PPs may always cross wh-DPs. For English, the approach just sketched implies that the topical nature of in situ subjects in multiple questions must be the blocking factor for sentences with wh-pronouns in subject position.
(73) a. who arrived when?
     b. *when did who arrive
5. Concluding remarks
In the theory defended here, the MLC is a constraint that applies cyclically in a derivation: if more than one category can be attracted to a certain position P, only the one closest to P can move. However, the MLC cannot prevent a movement operation from applying if that movement step is inevitable in generating the (partial) LF-representation in question. Given that considerations of information structure play a role in this context, the fact that the MLC decides only between syntactic objects with the same partial LF renders the principle quite weak in the domain of operator movement. The predictions are quite different for head movement, if head movement does not have semantic effects. In that case, the two syntactic objects in (74) (with A and B being heads attracted to X) do not yield different partial LFs, because they differ in the location of the phonetic matrix of A and B only. In the model advocated here, this is equivalent to saying that nothing will prevent the MLC from blocking (74b).
(74) a. [[X A ] [ … A … [… B … ]]]
     b. [[X B ] [ … A … [… B … ]]]
Phrasal A-movement has semantic consequences in many theories, and the pragmatic implications of different options of filling the subject position are obvious. The current proposal therefore implies that MLC-effects should be influenced by considerations of interpretation in the domain of A-movement as well, i.e., one should be able to observe apparent MLC-violations. This prediction is borne out. E.g., Hestvik (1986) observes that both objects can
be attracted to the subject position in the passive version of double object constructions in Norwegian:
(75) Passive formation in Norwegian double object constructions
a. det ble gitt ham en gave
   there was given him a present
b. han ble gitt en gave
   he was given a present
c. en gave ble gitt ham
The standard assumption concerning English is that the direct object must not cross the indirect one in the passive of a double object construction, but this does not characterize all dialects of the language. After all, McCawley (1988: 79) observes that (76) sounds acceptable to speakers of British English.
(76) a car was sold my brother __ for $200 by Honest Oscar
Phrasal A-movement thus seems to have properties comparable to those of operator movement with respect to the MLC. One needs to identify the interpretive conditions that license (75c) or (76), and offer an account as to why information structure does not seem to modulate MLC-effects in certain languages or dialects of languages (such as American English). German shows that additional formal aspects come into play that do not figure in operator movement: both objects may be promoted to subject status in a passive construction, but different auxiliaries are used for the promotion of direct and indirect objects:
(77) Passive formation in German double object constructions
a. jemand stiehlt dem Kind einen Schlüssel
   someone.nom steals the.dat child a.acc key
b. ein Schlüssel wird dem Kind gestohlen
   a.nom key is the.dat child stolen
c. das Kind bekam einen Schlüssel gestohlen
   the.nom child got a.acc key stolen
   “someone stole a key from the child”
Similarly, noun phrases with an oblique Case must not move to the subject position in many languages, and they may be skipped by A-movement to Spec,IP (see, e.g., Stepanov, this volume). There is no comparable array of facts with A-bar movement. The data show that the application of the MLC is not only sensitive to questions of identity of (partial) Logical Forms, but also constrained by purely formal factors. A discussion of these is beyond the scope of the present paper.
I have argued that the MLC must be considered an economy constraint that compares (partial) derivations and selects the one that fulfils checking requirements with the shortest movements possible. However, the set of candidate derivations which the MLC compares is constrained by formally encoded expressivity conditions: a derivational step B leading from structure S* to a partial LF S is blocked by the MLC only if S can also be reached from S* in a way that respects the MLC.
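The blocking condition just stated can be paraphrased as a toy procedure. The following Python sketch is an illustration only, under strong simplifying assumptions: a candidate movement step is reduced to the moved element, its distance from the attracting position, and a label standing in for the partial LF it yields. The class `Step`, the function `mlc_blocks`, and the LF labels are hypothetical names introduced for this sketch.

```python
from typing import NamedTuple


class Step(NamedTuple):
    """A candidate movement step (hypothetical, simplified representation)."""
    mover: str        # the category attracted to position P
    distance: int     # structural distance between the mover and P
    partial_lf: str   # label standing in for the resulting partial LF


def mlc_blocks(step, candidates):
    """Return True if `step` is blocked under the assumed reading of the MLC.

    A step is blocked only if some competing step reaches the same
    partial LF with a strictly shorter movement.
    """
    return any(
        other.partial_lf == step.partial_lf and other.distance < step.distance
        for other in candidates
    )


if __name__ == "__main__":
    # Head movement as in (74): both options yield the same partial LF,
    # so the longer movement of B is blocked.
    heads = [Step("A", 1, "LF-74"), Step("B", 2, "LF-74")]
    print([mlc_blocks(step, heads) for step in heads])

    # Operator movement: the two orders are assumed to express different
    # information structures, so neither step blocks the other.
    wh = [Step("wer", 1, "subject-sorted"), Step("was", 2, "object-sorted")]
    print([mlc_blocks(step, wh) for step in wh])
```

The design choice worth noting is that distance is compared only within a set of steps sharing a partial LF, which is how the sketch mirrors the claim that the MLC is weak for operator movement but strict for head movement.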
Acknowledgements
The research reported here was supported by grants of the Deutsche Forschungsgemeinschaft to the Forschergruppe Konfligierende Regeln (FOR 375), and to the Innovationskolleg Formale Modelle kognitiver Komplexität (INK 12). I want to thank Joanna Błaszczak, Eva Engels, Susann Fischer, Stefan Frisch, Hans-Martin Gärtner, Andreas Haida, Hubert Haider, Gereon Müller, Doug Saddy, Matthias Schlesewsky, Penka Stateva, Arthur Stepanov, Koyka Stoyanova, Ralf Vogel, and the two anonymous referees for helpful comments.
Notes
1. Superiority can be violated in similar contexts in Swedish, too. (ii) was accepted by two of my five informants, two rejected it, one found it questionable. All five informants considered (i) grammatical.
(i) Vem tror att Johan gjorde vad
    who believes that John did what
(ii) Vad tror vem att Johan gjorde
2. Contrasts between structures rated as “?” and others rated as “*” may not be too impressive, but examples involving different kinds of A-bar-movement yield clearer contrasts:
(i) which violin-1 is this sonata-2 easy to play t-2 on t-1
(ii) *which sonata is this violin easy to play on
3. And the model proposed below does so by linking the acceptability of a crossing constellation to the expression of a non-standard information structure.
4. As suggested by Hubert Haider, p.c.
5. The relevance of such examples has been brought to my attention by Gereon Müller.
6. One may wonder why scrambling is able to create structures incompatible with a simple MLC. Fanselow (2001) suggests that this problem is part of an argument in favor of the base-generation of scrambling structures.
7. Tibetan shows at least some of the contrasts one is familiar with from English (Seele, p.c., Chungda Haller, p.c.), in spite of the fact that it is a free constituent order language. I have no explanation for this.
(i) a. su ga re nyos pa red?
       who what bought
    b. *ga re su nyos pa red
8. One might claim that these languages nevertheless allow scrambling, but only as an intermediate step followed by further movements. It is difficult to assess, however, which data could possibly refute such an account. Its empirical force is thus limited, and we refrain from considering it.
References
Aoun, Joseph, Norbert Hornstein, David Lightfoot and Amy Weinberg
1987 Two Types of Locality. Linguistic Inquiry 18: 537–577.
Aoun, Joseph, and Audrey Li
1989 Scope and constituency. Linguistic Inquiry 20: 141–172.
1993 The Syntax of Scope. Cambridge, Mass.: MIT Press.
2002 Essays on the Representational and Derivational Nature of Grammar. Cambridge, Mass.: MIT Press.
Baker, C.
1970 Notes on the description of English questions: the role of an abstract question morpheme. Foundations of Language 6: 197–219.
Beck, Sigrid
1996 Quantified structures as barriers for LF movement. Natural Language Semantics 4: 1–56.
Billings, Loren and Catherine Rudin
1996 Optimality and Superiority: A new approach to overt multiple-wh ordering. In: Proceedings of Annual Workshop on Formal Approaches to Slavic Linguistics. The College Park Meeting 1994, Jindrich Toman (ed.), 35–60. Ann Arbor: Michigan Slavic Publications.
Błaszczak, Joanna and Susann Fischer
2002 Multiple Wh-Konstruktionen im Slavischen. Linguistics in Potsdam 14.
Bobaljik, Jonathan and Diane Jonas
1996 Subject Positions and the Role of TP. Linguistic Inquiry 27: 195–236.
Bolinger, Dwight
1978 Asking more than one thing at a time. In: Questions, Henry Hiz (ed.), 107–150. Dordrecht: Reidel.
Bonet, Eulalia
1991 Morphology after syntax: Pronominal clitics in Romance. Doctoral dissertation, MIT, Cambridge, Mass.
Bošković, Željko
1997 Superiority effects with multiple wh-fronting in Serbo-Croatian. Lingua 102: 1–20.
2002 On multiple wh fronting. Linguistic Inquiry 33: 351–383.
Bresnan, Joan
1972 Theory of Complementation in English Syntax. Doctoral dissertation, MIT, Cambridge, Mass.
Chomsky, Noam
1981 Lectures on Government and Binding. Dordrecht: Foris.
1993 A Minimalist Program for Linguistic Theory. In: The View from Building 20, Ken Hale and Samuel Keyser (eds.), 1–58. Cambridge, Mass.: MIT Press.
1995 The minimalist program. Cambridge, Mass.: MIT Press.
Comorovski, I.
1996 Interrogative Phrases and the Syntax-Semantics Interface. Dordrecht: Kluwer Academic Publishers.
Dayal, Veneeta
2003 Multiple wh-questions. Case 66, The Syntax Companion.
Diesing, Molly
1992 Indefinites. Cambridge, Mass.: MIT Press.
Erteschik-Shir, Nomi
1997 The dynamics of focus structure. Cambridge: Cambridge UP.
Fanselow, Gisbert
1998 Minimal Link Effects in German (and Other Languages). Paper presented at the 1998 MLC conference, Potsdam.
2001 Features, θ-roles, and free constituent order. Linguistic Inquiry 32, 3.
Fanselow, Gisbert and Anoop Mahajan
2000 Towards a minimalist theory of wh-expletives, wh-copying, and successive cyclicity. In: Wh-scope marking, Uli Lutz, Gereon Müller and Arnim von Stechow (eds.). Amsterdam: Benjamins.
Featherston, Sam
2002a Grammaticality and Universals. Wh-constraints in German. To appear in Linguistics.
2002b Magnitude estimation and what it can do for your syntax: some wh-constraints in German. Ms., Tübingen.
Fodor, Janet Dean
1978 Parsing strategies and constraints on transformations. Linguistic Inquiry 9: 427–473.
Frazier, Lyn and Giovanni Flores d’Arcais
1989 Filler-driven parsing: A study of gap filling in Dutch. Journal of Memory and Language 28: 331–344.
Golan, Yael
1993 Node crossing economy, superiority and D-linking. Ms., Tel Aviv University.
Grewendorf, Günther
1999 The additional-wh effect and multiple wh-fronting. In: Specifiers, D. Adger, S. Pintzuk, B. Plunkett and G. Tsoulas (eds.), 146–162. Oxford: Oxford University Press.
2001 Multiple wh-movement. Linguistic Inquiry 32: 87–122.
Haider, Hubert
1986 Deutsche Syntax – Generativ. Habilitation thesis. Vienna.
1990 Topicalization and other puzzles of German syntax. In: Scrambling and Barriers, Günther Grewendorf and Wolfgang Sternefeld (eds.), 93–112. Amsterdam: Benjamins.
1993 Deutsche Syntax – generativ. Tübingen: Narr.
1997 Economy in syntax is projective economy. In: The Role of Economy Principles in Linguistic Theory, Chris Wilder, Hans-Martin Gärtner and Manfred Bierwisch (eds.), 205–226. Berlin: Akademie Verlag.
2000 Superiority Revisited – Dutch, English, German, Icelandic Contrasts. A representational account. Ms., University of Salzburg.
This vol. The Superiority Conspiracy – Four Constraints and a Processing Effect.
Halle, Morris
1992 Latvian declension. In: Yearbook of Morphology, Geert Booij and Jaap van der Marle (eds.), 33–47. Dordrecht: Kluwer.
Halle, Morris and Alec Marantz
1993 Distributed Morphology and the pieces of inflection. In: The View from Building 20, Ken Hale and S. Jay Keyser (eds.), 111–176. Cambridge, Mass.: MIT Press.
1994 Some key features of Distributed Morphology. In: MITWPL 21: Papers on Phonology and Morphology, Andrew Carnie and Heidi Harley (eds.), 275–288. Cambridge, Mass.: MIT Press.
Hemforth, Barbara
1993 Kognitives Parsing: Repräsentation und Verarbeitung sprachlichen Wissens. Sankt Augustin: Infix.
Hendrick, R. and Michael Rochemont
1982 Complementation, multiple wh, and echo questions. Ms., University of North Carolina, Chapel Hill, N.C. and University of California, Irvine, California.
Hestvik, Arild
1986 Case Theory and Norwegian Impersonal Constructions: Subject-Object Alternations in Active and Passive Verbs. Nordic Journal of Linguistics 9: 181–197.
Hoge, Kerstin
2000 Superiority. Doctoral dissertation. Oxford.
Hornstein, Norbert
1995 Logical Form. Cambridge, Mass.: Blackwell.
Huang, James T.
1982 Logical relations in Chinese and the theory of grammar. Doctoral dissertation, MIT, Cambridge, Mass.
Kayne, Richard
1983 Connectedness. Linguistic Inquiry 14: 223–249.
Kitahara, Hisatsugu
1993 Deducing ‘superiority’ effects from the Shortest Chain Requirement. Harvard Working Papers in Linguistics 3: 109–119.
1994 Target α. Doctoral dissertation, Harvard University.
Krems, Josef
1984 Erwartungsgeleitete Sprachverarbeitung. Frankfurt/Main: Lang.
Kuno, Susumu and J. Robinson
1972 Multiple wh-questions. Linguistic Inquiry 3: 463–487.
Lasnik, Howard and Mamoru Saito
1992 Move α: Conditions on its Application and Output. Cambridge, Mass.: MIT Press.
Lee, Hanjung
This vol. Minimality in a Lexicalist OT.
Lutz, Uli, Gereon Müller and Arnim von Stechow (eds.)
2000 Wh-scope marking. Amsterdam: Benjamins.
Mahajan, Anoop
1990 The A/A-bar distinction and movement theory. Doctoral dissertation, MIT, Cambridge, Mass.
Maling, Joan and Annie Zaenen
1982 A phrase structure account of Scandinavian extraction phenomena. In: The Nature of Syntactic Representations, Pauline Jacobson and Geoffrey Pullum (eds.), 229–282. Dordrecht: Reidel.
Mathieu, Eric
2002 The Syntax of Non-Canonical Quantification: A Comparative Study. Doctoral dissertation. London.
McCawley, James
1988 The Syntactic Phenomena of English. Chicago: The University of Chicago Press.
Meinunger, André
1995 Discourse dependent DP (de-)placement. Doctoral dissertation, University of Potsdam.
Müller, Gereon
2001 Order preservation, parallel movement, and the emergence of the unmarked. In: Optimality Theoretic Syntax, Geraldine Legendre, Jane Grimshaw, and Sten Vikner (eds.), 279–313. Cambridge, Mass.: MIT Press.
Noonan, Maire
1988 Superiority Effects: How do antecedent government, lexical government and V2 interact. McGill Working Papers in Linguistics 1988: 192–214.
Noyer, Robert
1992 Features, Positions and Affixes in Autonomous Morphological Structure. Doctoral dissertation, MIT, Cambridge, Mass.
Oka, T.
1993 Shallowness. MIT Working Papers in Linguistics 19: 255–320.
Pesetsky, David
1982 Paths and categories. Doctoral dissertation, MIT, Cambridge, Mass.
1987 Wh-in-situ: movement and unselective binding. In: The Representation of (In)definiteness, Eric Reuland and Alice ter Meulen (eds.), 98–129. Cambridge, Mass.: MIT Press.
2000 Phrasal Movement and Its Kin. Cambridge, Mass.: MIT Press.
Reinhart, Tanya
1995 Interface Strategies. OTS working papers in Linguistics.
1998 Wh-in-situ in the Framework of the Minimalist Program. Natural Language Semantics 6: 29–56.
Reis, Marga
1996 Extractions from Verb-Second Clauses in German? In: On Extraction and Extraposition in German, Uli Lutz and Jürgen Pafel (eds.), 45–88. Amsterdam: Benjamins.
1997 Zum syntaktischen Status unselbständiger Verbzweit-Sätze. In: Sprache im Fokus, Christa Dürscheid, Karl Heinz Ramers, and Monika Schwarz (eds.), 121–144. Tübingen: Niemeyer.
Richards, Norvin
2001 Movement in Language. Oxford & New York: Oxford University Press.
Rizzi, Luigi
1990 Relativized Minimality. Cambridge, Mass.: MIT Press.
Rudin, Catherine
1988 On multiple questions and multiple wh fronting. Natural Language and Linguistic Theory 6: 445–501.
Steinitz, Renate
1969 Adverbialsyntax. Berlin: Akademie-Verlag.
Stepanov, Arthur
1998 On Wh-Fronting in Russian. In: Proceedings of NELS 28, Pius N. Tamanji and Kiyomi Kusumoto (eds.), 453–467.
2001 Cyclic domains in syntactic theory. Doctoral dissertation, University of Connecticut, Storrs.
This vol. Ergativity, Case, and the Minimal Link Condition.
Sternefeld, Wolfgang
1997 Comparing Reference Sets. In: The Role of Economy Principles in Linguistic Theory, Chris Wilder, Hans-Martin Gärtner and Manfred Bierwisch (eds.), 81–114. Berlin: Akademie Verlag.
Travis, Lisa
1984 Parameters and Effects of Word Order Variation. Doctoral dissertation, MIT, Cambridge, Mass.
Vogel, Ralf
This vol. Correspondence in OT syntax and minimal link effects.
Wiltschko, Martina
1998 Superiority in German. In: Proceedings of the Sixteenth West Coast Conference on Formal Linguistics, E. Curtis, J. Lyle and G. Webster (eds.), 431–445. Stanford, Cal.: CSLI Publications.
Zubizarreta, Maria Luisa
1998 Prosody, Focus and Word Order. Cambridge, Mass.: MIT Press.
Stylistic fronting: a contribution to information structure
Susann Fischer
Standardly, Stylistic Fronting (SF) is understood as a rule which moves a category to a position in front of the finite verb in those sentences where the position in front of the verb (SpecIP) is not occupied by an overt subject NP. SF has often been claimed to represent a mere “MLC-effect” without any information structural consequences. The aim of this paper is to show that the Catalan inverted elements are similar to what we see in the Germanic languages but do not obey one of the formulated constraints of SF: they also invert when an overt subject is present. In order to account for this fact, I will propose that in Old Catalan the trigger for SF is not the checking of an EPP feature, but the checking of a strong V-feature in an additional category placed between CP and IP. More specifically, I will propose that SF in Old Catalan – contrary to what has been claimed for SF in Germanic – contributes to information structure.
1. Introduction
The aim of this paper is to investigate a case of inversion in Old Catalan which is referred to by Maling (1980/1990) and others – with respect to the Germanic languages – as Stylistic Fronting. Stylistic Fronting (SF) was claimed to be operative in all Old Scandinavian languages (Falk 1993, Platzack 1987). Among the Modern Germanic languages, it is still operative in Icelandic and Faroese, and it has also been discussed for Yiddish (Diesing 1990, Santorini 1989). With respect to other languages, examples of SF have been presented for Old French (Cardinaletti and Roberts 1991) and Old Spanish (Fontana 1993, 1996).1 The examples in (1) below illustrate instances of Icelandic SF. (1a) represents the canonical word order, while (1b) illustrates the order after SF has applied. The examples in (2) illustrate the same phenomenon with respect to Old Catalan.
(1) a. [Sá sem er fyrstur að skora mark] fær sérstök verðlaun
       he that is first to score goal gets special prize
       ‘The first one to score a goal gets a special prize’
    b. [Sá sem fyrstur er að skora mark] fær sérstök verðlaun
       he that first is to score goal gets special prize
       (Jónsson 1991)
(2) a. Longament considerà lo hermitá en la demanda que li hac feta Fèlix.
       long considered the hermit in the question that him has.3sg made F.2
       ‘For a long time the hermit considered the question that Felix had asked him.’
    b. com no li responia a la demanda [que feta li havie ]
       how not him answered to the question [that made him had.3sg ]
       (Fischer 2002)
Standardly, Stylistic Fronting (SF) is understood as a rule which moves a category to a position in front of the finite verb in those sentences where the position in front of the verb (SpecIP) is not occupied by an overt subject NP. SF has often been claimed to represent a mere “MLC-effect” without any information structural consequences. I will show that the Catalan inverted elements are similar to what we see in the Modern Germanic languages but do not obey one of the formulated constraints of SF: they also invert when a non-pronominal overt subject is present.3 In order to account for this fact, I will propose that – at least in Old Catalan – the trigger for SF is the checking of a strong V-feature in an additional category between CP and IP. More specifically, I will propose that SF in Old Catalan – contrary to what has been claimed for SF in Germanic – contributes to information structure.
The paper is organized as follows: In section 2 the different properties of SF in contrast to topicalisation will be discussed and examined with respect to Old Catalan. I find the discussion of the different properties indispensable in order to show that what we observe in Old Catalan is really SF and not something else. Section 3 discusses the most prominent analyses that have been proposed for SF in the Germanic languages, and it will be shown that these proposals cannot explain SF in Old Catalan. In section 4 I present my proposal for SF in Old Catalan, arguing that SF contributes to information structure. Section 5 will discuss some of the problematic data with respect to the MLC and will point out the contradictions within the existing analyses for Germanic SF.
Stylistic fronting: A contribution to information structure
2. The properties of stylistic fronting

SF is subject to various constraints, as was first discussed by Maling (1980, 1990) and subsequently claimed in most of the literature on Icelandic (Barnes 1987, Jónsson 1991, Holmberg 1997, 2000, Poole 1992, 1996, 1997, and Sigurðsson 1997 with a more critical view). In particular, Maling (1980, 1990) suggested that a distinction can be made between topicalisation and what is now called Stylistic Fronting; consider the dichotomy in (3).

(3) Topicalisation                 Stylistic Fronting
    – applies to XPs               – applies to X°
    – is unbounded                 – is clause-bounded
    – does require focus           – does not require focus
    – no subject gap required      – requires a subject gap
These properties have recently been supplemented by a locality constraint: SF, but not topicalisation, observes minimality. Taken together, these properties are regarded as the salient properties of SF in Icelandic. In the following subsections I will examine them one by one with respect to the Old Catalan data that I have collected.4

2.1. Head movement

There is still some controversy in Icelandic as to whether SF involves movement of a head, movement of a phrase, or even both. With respect to Old Catalan one can observe at this point of the discussion that it displays movement of a lexical head. Notice particularly that SF of the past participle in (4a) must strand the direct object. This is usually taken as convincing evidence that SF is an instance of head movement as opposed to XP movement.

(4) a. que feita aviets la corona del Emperi,
       that made had.3pl the crown of the emperor,     participle (Desclot/309)
    b. que molt es noble cavaler,
       that much is.3sg noble man,     adverb (Desclot/275)
    c. qui demanar li vengés ,
       who to-ask him came.3sg     infinitive (Desclot/154)
    d. que corporal és e composta,
       that corporal is.3sg and compound,     adjective (Metge/191)
Susann Fischer
2.2. Clause-boundedness

SF is taken to be clause-bounded, in contrast to topicalisation, which can apply across clause boundaries. The examples under (5) suggest that SF is clause-bounded in Old Catalan as well. In the texts used, I did not find any example of SF which crossed a clause boundary; it therefore seems correct to assume that SF is clause-bounded in Old Catalan.5

(5) a. E dix que anat se n’era ja la nuit …
       and said.3sg that gone ref. there’was.3sg already the night
       ‘and he said that the night had already gone …’ (Desclot/284)
    b. NA E anat dix que se n’era ja la nuit
          and gone said.3sg that ref there’was.3sg already the night
2.3. Focus not required

Usually SF is taken to be a stylistic variant of the canonical word order with no meaning difference associated. Crucially, it has been claimed that, in contrast to topicalisation, no focus or emphasis is required in SFed contexts (Maling 1980, 1990 among many others).6

(6) … eins og frá /*FRÁ hefur verið sagt
    … as about has been told
Since we cannot ask any speaker to pronounce a sentence with a SFed element we need to leave the question open at this point of discussion.
2.4. Relativized minimality or the MLC

SF is subject to locality. The locality condition observed can be accounted for in terms of Relativized Minimality (Rizzi 1990) or the Minimal Link Condition (MLC; cf. Holmberg 1997). Out of several elements which could in principle undergo SF, only the structurally highest one can actually be fronted. This condition seems to hold for the SFed elements in Old Catalan.
(7) a. Lo scuder fo molt meravellat de la demanda que el caveller li hac feta.
       the squire was very surprised of the question that the cavalier him had made (Llull/60)
    b. molt fo meravellat lo rey de les peraules del pagès
       very was surprised the King of the words of-the page (Llull/190)
    c. NA meravellat fo molt lo rey de les peraules del pagès
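
The locality condition just described can be made concrete with a small sketch. The Python fragment below is purely illustrative and is not part of any of the analyses cited here: it assumes that the candidates for SF are listed from structurally highest to lowest and that the MLC simply selects the first frontable one. The names `Candidate` and `closest_frontable`, as well as the `frontable` flag, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    form: str               # surface form, e.g. "molt"
    category: str           # e.g. "adverb", "participle"
    frontable: bool = True  # assumption: some items may be excluded from SF


def closest_frontable(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Return the structurally highest frontable candidate, if any.

    Assumes `candidates` is ordered from highest to lowest in the clause.
    """
    for cand in candidates:
        if cand.frontable:
            return cand
    return None


# Example (7): 'molt' is taken to be structurally higher than 'meravellat'.
clause = [Candidate("molt", "adverb"), Candidate("meravellat", "participle")]
fronted = closest_frontable(clause)
print(fronted.form if fronted else None)
```

Under these assumptions the sketch selects molt rather than meravellat, which corresponds to the attested order in (7b) and to the unattested order in (7c).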
2.5. The subject gap condition

SF appears to induce a restriction on the distribution of elements that can occur in the subject position of the finite clause when Stylistic Fronting has taken place. In other words, Stylistic Fronting, in contrast to topicalisation, has been claimed to be possible only if the sentence displays a subject gap. As in Icelandic, we find SF in Old Catalan in all types of finite clauses that contain a subject gap.

(8) subject relative
    qui demanar li vengés .
    who to-ask him came.3sg (Desclot/154)
(9) impersonal construction
    E el senyor del hosta dix que anat se n’era ja la nuit
    and the landlord said that gone ref there’was.3sg already the night (Desclot/284)
(10) complement clause
    , que fet m’hajats ,
    that made me’have.2sg (Eiximenis/197)
(11) matrix clause
    Dit has més
    said have.2sg more (Metge/179)
What we perceive in Old Catalan looks identical to SF in Modern Icelandic. However, in a language with referential pro, next to expletive pro in main
and embedded contexts, we do not really expect a difference between topicalisation and SF concerning the empty subject position. It is therefore rather difficult to differentiate between topicalisation (12) and SF (13) when a subject gap is taken as the distinctive feature, especially since we do find SF together with a subject (13b).7

(12) a. a aquestes peraules lo rey no li respòs .
        to these words the king not him answered (Llull/150)
     b. és veritat que en algun lloc s’ha pus ardentment
        is.3sg truth that in some place ref’has put ardently (Metge/60)
(13) a. del gran honrament que feyt nos avets ,
        of the great honour that made us have.3pl (Desclot/316)
     b. e adonchs con amà Deu e serví Déu de ço que Déus donat li havia ___ ,
        and so with love.3sg God and serve.3sg of that that God given him had.3sg (Llull/36)

(13b) contains a complementizer and a subject NP, and in spite of the overt subject NP the past participle has been stylistically fronted.
3. Existing analyses
3.1. With respect to Modern Scandinavian

The properties discussed above have been used to corroborate the different analyses of Stylistic Fronting. In the literature, we find different explanations that seek to account for SF in Modern Icelandic and Modern Faroese. One group treats SF as an optional movement, whereas the other group takes SF to be a kind of obligatory process that fills a subject gap or guarantees a V2 structure. Among the first group we find Rögnvaldsson and Thráinsson (1990), who treat SF as a subcase of topicalisation that targets the topic position in a sentence, and Poole (1997), who sees SF as a rightward movement of the auxiliary as part of the phonological computation. Poole derives the optionality of SF by assuming two different auxiliaries: one which is enclitic and needs to be protected whenever it would otherwise end up in initial position, and one which is a full form that does not need a host to lean on and can thus appear in initial position. The second group can be further divided into those who seek to explain why SF only applies when there is a subject gap (Maling 1980, 1990, Holmberg 1997) and those who change the perspective and seek to explain why subjects cannot appear together with SF (Jónsson 1991). I will not be concerned with the details of these approaches but will only discuss those examples that are problematic not only for SF in Old Catalan but also for SF in the Germanic languages.

The most intriguing problem for all analyses of SF is why it should only apply in clauses without an overt subject. Jónsson (1991) explains this fact by suggesting that the requirement for a non-overt subject is due to an adjacency requirement on Case assignment in Icelandic. Nominative Case is supposed to be assigned by the finite verb in I°, and this assignment is blocked if an item occurs between the finite verb and the subject in SpecIP. Several problems arise with this proposal. First, he is forced to assume that the subject position in finite clauses is not obligatorily Case-marked. Instead, empty subjects are of two types: one type without Case, namely PRO and pro in Stylistic Fronting, and one type with Case, namely pro in finite clauses where Stylistic Fronting has not applied. Second, the assumption of an adjacency requirement on Case assignment does not hold in Icelandic. In sentence (14) we see that the NP is separated from I° by the parenthetical like a true scout and is still assigned Nominative Case.
(14) Ég hélt að Jón, eins og sannur skáti, myndi hjálpa gömlu konunni
     I thought that Jon like a true scout would help the old lady
     ‘I thought that Jon, like a true scout, would help the old lady …’ (Poole 1992: 24)

Furthermore, the observation that SF is prohibited in clauses with an overt nominative subject is not general enough, since SF is ungrammatical in clauses with oblique subjects as well, as has been shown by Sigurðsson (1997).
(15) a. sem sagt hefur að
        who said has that
     b. *sem hann sagt hefur
        who he said has
     c. *sem honum sagt hefur verið
        who him (Dat) told has been (Sigurðsson 1997)
Maling (1980, 1990) and Holmberg (1997, 2000), taking the other perspective, are not confronted with this problem, since they have no need to explain why subjects are ungrammatical in sentences with SF; instead they need to explain why SF has to apply in sentences that display a subject gap. Maling treats SF as actual movement of the SFed element to SpecIP in order to satisfy the verb-second structure of the clause, whereas Holmberg (1997, 2000) takes SF to be movement of phonological features in order to satisfy a “phonological EPP”. The landing site in Holmberg’s analysis is taken to be SpecTopP (AgrSP), and the moved element is attracted by a feature [P]. [P] attracts the closest phonological matrix, in accordance with the MLC and feature movement economy. In Icelandic the finite verb in Top° cannot check [P]. In this theory the EPP has two parts, a categorial one [D] and a phonological one [P]. The feature [D] is checked by a nominal category moved to or merged in SpecTopP°, or by a finite verb adjoined to Top° (cf. Alexiadou & Anagnostopoulou 1998), whereas [P] in this approach is checked by the SFed element in SpecTopP°.

The question that immediately arises with respect to these analyses is why SF does not always apply in subjectless clauses. Does Icelandic not always have a verb-second clause structure? Or does pro check the phonological EPP only in some but not in all clauses? In fact, this is exactly what has been proposed with respect to the optionality of SF. However, many problems remain with respect to SF in Modern Scandinavian. In the following I will show that those analyses that make crucial reference to the verb-second character of the language and in which SFed elements are situated within IP do not hold for Old Catalan either.
3.2. Applied to Old Catalan

Different scholars have argued that the Old Romance languages are symmetric verb-second languages (e.g. Cardinaletti & Roberts 1991, Fontana 1993, Benincà 1995 among many others). Consequently, inverted elements
in embedded and matrix sentences have been analyzed analogously to the Germanic inverted elements displaying SF: targeting a position within IP and filling the subject gap. Two types of verb-second analyses can be distinguished. One type treats matrix and embedded clauses differently, in that the verb moves up to C° in main clauses and ends up in I° in embedded sentences (e.g. Benincà 1995). The other type argues that the verb in main and embedded sentences only moves up to I° (e.g. Fontana 1993). One of the main arguments in favor of a verb-second analysis was the claim that postverbal clitics could not be attested in a dependent clause introduced by a complementizer or a wh-pronoun. Benincà even claims: “(…) we never have enclisis8 in a clause introduced by a complementizer, in any Romance variety of the Middle Ages” (Benincà 1995: 335). The fact that postverbal clitics are never found in a sentence introduced by a complementizer (which is a designated occupier of the head of CP) suggests to these scholars that verb movement to C° feeds postverbal clitic placement.

(16) e donà-la per muler a l’emperador de Castela;
     and gave.3sg-her as wife to the.Emperor of Castille;
     ‘and he gave her as wife to the Emperor of Castille;’ (Desclot/1)
Then, if verb movement to C° is blocked, it is predicted that the order verb-clitic does not occur. However, consider the following examples:

(17) E d’aquí avant lo rey féu-li donar tot ço …
     and from here in front the King made-him give all that …
     ‘and from now on the King forced him to give all that…’ (Desclot/13)
(18) E diu que lo primer respòs-li hòrreament e ab males paraules …
     and said.3sg that the first answered.3sg-him horrified and with bad words …
     ‘And he said that the first answered him horrified and with abusive words…’ (Desclot/288)
(19) … lo dit bon hom hac totes les vestedures pobres e mesquines que la dita infanta portà-li
     the said good man had.3sg all the clothing poor and shabby that the said Infant carried.2sg him
     ‘… that the good man had all the poor and shabby clothing that the Infant had given to him’. (Metge/101)
As for sentence (17), even though this is not a classical verb-second structure, since the verb surfaces in third position, one could argue that the adverb is located above CP, the subject in SpecCP and the verb in C°. So it would still be correct to assume that verb movement to C° feeds postverbal clitic placement. The argumentation is a little harder for sentence (18). However, one could argue that (18) displays a bridge verb, as has been suggested by Adams (1987) for Old French. She convincingly argues that embedded V2 is possible in the complements to bridge verbs, and that the relationship between clauses in sentences of this type is paratactic, in that both clauses are main clauses. The class of bridge verbs of Old French is comparable to the class of those that in V2 Germanic languages typically allow complements with matrix properties, and, as in German, in medieval French the complementizer is not present in these constructions, suggesting that these are cases of German-style embedded V2. In Old Catalan the complementizer is always present, which seems to strengthen the case for subordination, but for the sake of argument I will assume that the relationship between the two clauses is one of parataxis. Sentence (19), however, straightforwardly reveals that it would not be the right analysis to assume that the verb adjoins to C°. Even if the relative pronoun were in SpecCP, the verb-clitic sequence is separated from the relative pronoun by a subject, which needs to be in a Spec position below C°. Thus, we need to conclude that there is a further category between CP and IP.
4. Towards an explanation of SF in Old Catalan
4.1. Sentence structure in Old Catalan

In order to explain postverbal clitics in embedded sentences, Fischer (2002) makes use of the old assumption that affirmation and negation are generated in the same position in the phrase structure (Chomsky 1957). Building on Laka (1990), I took this position to be the functional category ΣP9, which hosts different sentence operators: negation, “emphatic” and “neutral” affirmation. With respect to Old Catalan clause structure I propose that different realizations of Σ° are available, depending on what is expressed: negation vs. affirmation or emphasis.

(20) a. Σ° [-V]    “neutral” affirmation
     b. Σ° [+V]    “emphatic” affirmation
     c. Σ° [no]    negation
Within a Minimalist framework (Chomsky 1993), Σ is, like I, an inflectional head with V-features, either strong or weak, that need to be checked. Thus, if the V-feature is strong it must be checked in overt syntax; if the V-feature is weak, checking is delayed until LF. Under the analysis proposed, the difference between the verb-clitic and the clitic-verb sequence results in a difference in semantic interpretation, i.e., word order contributes to information structure. The clitic-verb sequence represents a “neutral” affirmation, whereas the verb-clitic sequence emphasizes something that interrupts the routine of what has been told, i.e., something unexpected, unusual or outstanding.10 This analysis is corroborated by the fact that in the whole of my corpus no verb-clitic sequence is attested in negated sentences. Negative sentences differ from emphatically affirmative clauses in that Σ is lexicalised; Σ therefore does not attract the verb to check off any strong feature, since the feature has already been checked off via Merge.11
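
The correspondence between the three realizations in (20) and the attested clitic orders can be summarized in a short sketch. The Python fragment below is only a schematic restatement of the prose under simplifying assumptions, not an implementation of the analysis; the names `Realization` and `surface_order` are hypothetical, and the sketch deliberately ignores matrix contexts in which the verb may move to C°.

```python
from enum import Enum


class Realization(Enum):
    NEUTRAL = "[-V]"   # weak V-feature: checking delayed until LF
    EMPHATIC = "[+V]"  # strong V-feature: checked in overt syntax
    NEGATION = "[no]"  # head lexicalised by negation, feature checked via Merge


def surface_order(realization: Realization) -> str:
    """Return the clitic/verb order predicted for a given realization of the head."""
    if realization is Realization.EMPHATIC:
        # Only a strong V-feature attracts a verbal element overtly.
        return "verb-clitic"
    # Weak feature or lexicalised head: no overt attraction of the verb.
    return "clitic-verb"


for r in Realization:
    print(r.value, "->", surface_order(r))
```

On this simplified view, verb-clitic order arises only with the emphatic realization, which is in line with the absence of verb-clitic sequences in negated sentences in the corpus.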
4.2. Stylistic fronting in Old Catalan

Maintaining the above sentence structure, the finite verb and all other verbal elements, e.g. stylistically fronted elements, would target the same position in sentences that express emphasis, namely Σ° [+V]. The trigger for SF and for finite verb movement in front of the clitic would be the strong V-feature on Σ°. The difference between a sentence with a SFed element and one without would be a difference in interpretation, i.e. SF is not obligatory in the sense argued for by some analyses of SF in Icelandic (in order to guarantee V2 or to substitute for a subject), but applies in order to express something that needs to be emphasized, something which is unexpected, unforeseen or outstanding in the development of the text, cf. (21) and (22).

(20) Longament considerà lo hermitá en la demanda que li hac feta Fèlix.
     long considered the hermit in the question that him has.3sg made Fèlix.
     ‘For a long time the hermit considered the question that Felix had asked him.’ (Llull/24)
(21) Fèlix se meravellà del hermitá com no li responia a la demanda [que feta li havie ]
     Fèlix ref surprised.3sg of the hermit how not him answered.3sg to the question [that made him had.3sg ]
     ‘Felix was surprised that the hermit did not answer him to that question which he had asked him.’ (Llull/25)
The story of Fèlix is a story about a boy who walks through the world in order to find out who or what God is, so that he can understand why people love God so much. He meets all kinds of people and asks them all kinds of questions, all concerning God, Trinity, Sin, Hell, Heaven etc. In the examples above Felix has been sent to the hermit by someone who has told him that this hermit has been living in the woods for a very long time, always serving and loving God. The man sending Felix to the hermit was sure that the hermit could answer all the questions about God that Felix wanted to ask. So Felix goes to the hermit and talks to him. After a long conversation Felix finally asks the hermit què és Déus? (what is God?). Sentence (20) above describes that the hermit takes a long time to consider the answer. Sentence (21) denotes that Felix is really surprised about the fact that the hermit is not able to answer him. The preposed element indicates this. Consider also the next sentence (22).

(22) E si vós no sabets ço que Déus és, … com lo podets tant amar sens conexença
     and if you not know that what God is … how him can so love without knowledge
     ‘But if you don’t know what God is, … how can you love him so strongly without knowing him?’

If this approach is correct, then there should never be a SFed element together with a postverbal clitic or a negation in the same sentence. And this is in fact the case. During the whole period in which SF was possible, not one instance of SF together with a postverbal clitic is attested.
4.3. Explaining the loss of SF in Catalan

With respect to the Scandinavian languages it has been claimed that SF disappeared when the verb stopped moving to I° (cf. Falk 1993, Platzack 1987). In Modern Catalan SF is no longer an option either, even though the verb still moves overtly to I°. To account for the loss of SF in Modern Romance I will propose an account in terms of grammaticalization. In the emergence of Catalan different realizations of Σ° were available depending on what needed to be expressed: negation vs. affirmation or emphasis, i.e. Σ° existed in three variants: with a lexical head Neg°, with a strong [+V] feature and with a weak [–V] feature. In the case of the negation
a lexical head was merged to Σ°, yielding the sequence clitic-verb. No change can be attested with respect to negated sentences. These have always shown the ordering clitic-verb, and no SFed elements. The change obviously took place with respect to the [+V] feature. With respect to affirmation the evidence for the language learner was very complex: she was exposed to the canonical word order and to SF or verb-clitic sequences. The different orders were connected to different semantic interpretations, in main as well as in subordinate sentences: clitic-verb orders signified a neutral sentence, whereas the order SF or verb-clitic signified an emphatic sentence.12 Additional evidence that the verb was the one to check the [+V] feature on Σ° in emphatic sentences was provided by the minimal affirmative answer to a yes-no question (see again footnote 11), which is straightforward under the assumption (as a principle of universal grammar) that in replies to yes-no questions Σ has to be strong. Both orders, the canonical (clitic-verb) and the non-canonical word order (SF or verb-clitic), are attested with the different interpretations until the 14th century in main and in subordinate sentences. The order verb-clitic was first lost in subordinate clauses. I suggest that this is due to a semantic motivation: the intuition is that speakers are more strongly committed to what is affirmed or denied in matrix clauses than in an embedded clause. However, as long as the sequence SF or verb-clitic was also attested in the subordinate phrase, the evidence for the different parameter settings was robust, since in subordinate sentences the verb-clitic sequence always denoted emphasis. After the sequence verb-clitic was lost in subordinate sentences, the order verb-clitic could not easily be identified with the Σ° category, i.e., the evidence for the learner was not categorical anymore, since in matrix sentences the verb could also have moved to C°.
Thus, in matrix sentences that could not be identified as a clear instance of Narrative Inversion both orderings verb-clitic and clitic-verb were identified as presenting the same meaning: neutral affirmation. The learner was thus exposed to two competing structures (cf. Clark & Roberts 1991), a marked structure and an unmarked structure and thus chose the default option: the order without verb-movement to Y° (cf. Fischer 2002). There are many other effects of this change, two important ones for the analysis of SF are proposed here: the absence of SF goes together with the absence of postverbal clitics in finite sentences in Modern Catalan, and SF rapidly decreased together with the loss of postverbal clitics in embedded sentences. The data presented indicates that SF in Old Catalan does not meet the assumptions for SF in Icelandic. On one side we find SF together with an overt subject, on the other side the need to fill the subject gap seems more
than dubious in a language that displays referential and expletive pro in main and embedded clauses. What we see in Old Catalan is not a mere MLC effect that rescues a verb-second structure or fills a subject gap because the derivation would otherwise not meet the conditions of syntax. Rather, both derivations, the one with SF and the one without, meet the conditions of syntax. Although SF in Old Catalan clearly obeys the MLC, I argue that the movement of the verbal element in front of the finite verb is not a mere “MLC effect”, i.e. semantically vacuous, but contributes to information structure. Whether SF in Old Catalan applies or not depends on what the speaker wanted to express.
5. Some problems for the MLC within Icelandic SF

Especially with respect to Icelandic SF it has been argued that it is always the closest element (phonological matrix) that moves in front of the auxiliary in order to satisfy, in accordance with the MLC, the feature that needs to be checked. In other words, SF applies in order to rescue a structure that would otherwise crash in the derivation. However, in the following I will point out several problems with this claim and close by proposing that SF in Icelandic should perhaps rather be analyzed along the lines of Old Catalan. Conjoined heads illustrate that SF does not simply pick the closest phonological matrix.

(23) a. eins unnt væri og vert
        as possible were and desirable
     b. eins unnt og vert væri
(24) a. eins sagt hefur verið og skrifað
        as said has been and written
     b. eins sagt og skrifað hefur verið (Sigurðsson 1997)

In sentences (23b) and (24b) both heads and the conjunction are moved to the front, even though sentences (23a) and (24a) are grammatical as well; the feature that needs to be checked is obviously checked in both cases, since the sentences occur.
Additionally, it has been noted by Maling (1990) and others that it is not always the closest element that undergoes SF. If clauses contain more than one element that may be stylistically fronted, blocking effects emerge, e.g. negation always blocks SF of another head. However, in some cases closeness does not play a role, as any element can move. Data such as those in (25) raise some questions. It is not immediately transparent how (25) respects the MLC.

(25) a. Þeir sem verið hafa veikir þurfa að fara til læknis
        those who been have sick must see a doctor
     b. Þeir sem veikir hafa verið þurfa að fara til læknis
        those who sick have been must see a doctor (Jónsson 1991: 7)
Consider also the Faroese data discussed in Barnes (1987):

(26) a. Hon spurdi, hvat vanligt hevði verið úti á bygd
        she asked, what usual had been out on village
        “what had been usual in rural communities”
     b. Hon spurdi, hvat verið hevði vanligt úti á bygd
        she asked, what been had usual out on village (Barnes 1987: 32)

Maybe sentences (25) and (26) could be rescued by arguing that the two elements that can undergo SF are equidistant from the checking head. However, in sentence (27) it is the adjunct that has been moved over the auxiliary and the past participle; sisterhood therefore cannot explain these data.

(27) a. tey, sum í Danmark hava verið
        those who in Denmark have been
     b. tey, sum verið hava í Danmark. (Barnes 1987: 33)
Barnes even notes that not all permutations have been included and that the acceptability of the stylistic inversion of verbal particles or objects in clauses containing both varies considerably; it seems to depend to some extent on emphasis and the particular lexical items involved (Barnes 1987: 33).
Another argument in favor of the view that SF may not be a mere MLC effect but somehow depends on the interpretation of the sentence can be drawn from example (28), which indicates that “pure” auxiliaries, i.e. hafa (have) and passive vera (be) (as opposed to modals), are blocked from moving.

(28) a. *Þetta er versta bók sem verið hefur skrifuð
        this is the worst book that been has written (Jónsson 1991: 7)
     b. *Þetta er sú ákvörðun sem hafa mun verið tekin siðast
        this is the decision that have will been taken latest (Sigurðsson 1997)
Additionally, it has been argued by Jónsson (1991) that progressive be is unable to move by SF even in the absence of any other candidate (29).

(29) ??þeir sem verið höfðu að mála voru orðnir þreyttir
       those who been had painting had become tired (Jónsson 1991: 7)
The above data clearly indicate that not just any phonological matrix can rescue a sentence; in the above cases an element is needed that carries more semantic content than a “pure” auxiliary. In order to explain the above data we need to differentiate between predicate be and passive be. The latter, a “pure” auxiliary, never undergoes Stylistic Fronting.
6. Conclusion

It has been argued that in Old Catalan SF contributes to information structure. When going through the contexts in which SF has applied, the sentences clearly need to be interpreted differently than the sentences without SF. In the literature on Icelandic it has been shown that SF is optional with respect to the grammaticality of the clauses, i.e. the sentences without SF are fully grammatical as well, and SF also applies in those cases where there is another candidate to check the phonological EPP. It has been shown that SF has other effects as well: it is not always the closest element that moves, and not all elements are allowed to move. As it seems, the elements that are allowed to move need to carry more semantic content than pure auxiliaries; furthermore, it seems to depend to some extent on the
emphasis involved. I thus suggest reconsidering our view that SF in Icelandic represents a mere “MLC effect”. In order to find out what SF in Modern Germanic really is, we need to clarify what SF represented in the Old Scandinavian languages. Maybe what we will find is that SF in Old Scandinavian also contributed to information structure.
Acknowledgements

Preliminary versions of this paper have been presented at the Workshop on the Minimal Link Condition in Potsdam and at the VII Diachronic Generative Syntax Conference (DIGS) in Girona. Thanks to the audiences for their comments and remarks. I would particularly like to thank Joanna Blaszczak, Eva Engels, Gisbert Fanselow, Andreas Haida, Geoffrey Poole, Arthur Stepanov, Ralf Vogel and two anonymous reviewers. All remaining errors are entirely my own. I would also like to acknowledge the DFG grant DO544/1–1 for financial support of my research.
Notes

1. Unfortunately, Cardinaletti & Roberts (1991) only present the two examples listed below, of which example (ia) is ambiguous.
   (i) a. Por l’esperance qu’an lui ont , …
          for the hope which in him have ,
          ‘For the hope which they have in him, …’
       b. et si ne sait que faire puisse
          and so not knows what do can
          ‘and therefore he does not know what he can do.’
   Fontana (1993) gives more examples, but does not decide whether the examples are instances of SF or remnant movement.
2. With respect to Old Catalan it needs to be mentioned that clitic pronouns in the Old Romance languages precede and follow the finite verb. However, a clitic that precedes the finite verb is, contrary to pronouns in Icelandic, not considered to be a SFed element, but a Wackernagel clitic that needs to be in second position, regardless of any subject gap (cf. Cardinaletti & Roberts 1991, Fontana 1993, Fischer 2002, see also the discussion in section 3.3).
3. Old Catalan resembles Old Icelandic, which also had SF with an overt subject. Unlike in Old Catalan, however, in Old Icelandic the subject seems to follow the finite verb and the SFed element, whereas in Old Catalan the subject is allowed to precede or follow the finite verb and the SFed element.
   (ii) Sagt hefi eg það er eg mun segja (Svarfdæla saga p. 1812)
        said have I that which I will say
        ‘I have said what I will say…’
   I would like to thank the anonymous reviewer who provided me with the above Old Icelandic data.
4. The texts from which data are presented here are: Llull, Ramon, date of composition around 1288, Llibre de Meravelles; Desclot, Bernard, date of composition between 1283 and 1288, Crònica de Bernat Desclot; Eiximenis, Francesc, date of composition between 1373 and 1386, Contes i Faules; and Metge, Bernard, date of composition 1399, Lo Somni. The references follow the pattern used in my database (cf. Fischer 2002).
5. However, when using diachronic data one is always confronted with the lack of negative evidence. One cannot ask any speaker or use judgements based on one’s intuitions as to whether a certain grammatical construction would have been acceptable or not. Furthermore, we can never be sure whether a certain construction was only used in spoken language and might therefore very well have existed. That is why NA is used instead of * for the following examples that have not been attested.
6. However, as pointed out by Sigurðsson (1997), the word or phrase moved by SF can be contrastively focused.
   (iii) a. … sem hafa GERT eitthvað, en ekki bara talað.
            … that have DONE something, and not only talked
         b. … sem GERT hafa eitthvað, en ekki bara talað.
7. Recall that Old Icelandic also allows overt subjects in SF constructions. However, in Icelandic full subjects are not allowed to precede the SFed construction (see also the discussion in Fischer to appear).
8. Enclisis is what I call postverbal clitics, i.e. the sequence verb-clitic.
9. This analysis is cast within the framework of Chomsky (1993), where a distinction was made between strong and weak features, and nominal and verbal features, both triggering displacement of heads and XPs. The analysis could be embedded within the framework of Chomsky (1995, 2000) along the following lines. Σ° is, like I°, an inflectional head with a generalized EPP feature, which either triggers overt displacement of XPs or is satisfied via Agree. Unlike Chomsky (1995, 2000), however, I could assume that such features are not necessarily satisfied by Merge/Move XP. Merge/Move X° can also check them (see Alexiadou & Anagnostopoulou 1998 for detailed argumentation).
10. For a detailed discussion of the information structural implications of postverbal clitics in Old Catalan see Fischer (2002).
Stylistic fronting: A contribution to information structure
11. As has been shown in Laka (1990), replies to yes/no questions crucially involve the Σ projection. Latin did not have a word for “yes” (cf. Pinkster (1988) and the references therein). Instead, expressions like ita est, sane, vero etc. are used next to the bare inflected verb. The minimal affirmative answer in Latin involves the bare inflected verb (iva), while the negative answer involves the affix non together with the bare inflected verb (ivb).
(iv) Legistine librum?            a. legi       b. non legi
     read.2sg book                   read.1sg      no read.1sg
     ‘Have you read the book?’       ‘yes’         ‘no’
The same effect is still observed in European Portuguese:
(v) Comeste o pãozinho que te trouxe?               a. comi       b. não
    eat.2sg the sandwich that you brought.1sg          eat.1sg       no
    ‘Did you eat the sandwich I brought you?’          ‘yes’         ‘no’
European Portuguese obviously shows a conservative feature that goes back to Latin, and which is also attested in Old and Modern Provençal (Jensen 1994). I assume that in the medieval period this feature was shared by more Romance languages. In my corpus no minimal answers to yes-no questions are attested. But Catalan developed directly from Latin and was in constant contact with Provençal during the medieval period. It therefore seems plausible to suggest that Old Catalan behaved similarly to Latin in that the affirmative answer involved the bare inflected verb, i.e., the strong V-feature on Σ was checked by the verb (cf. Fischer 2002: 163ff).
12. In subordinate sentences the order verb-clitic always signified an emphatic sentence, i.e. Σ was always involved. In matrix sentences, however, the order verb-clitic was not always connected to emphasis, e.g. in narrative inversion or questions; in these structures the verb moved to C° (cf. Fischer 2002).
References
Adams, Marianne
  1987 Parametric Change: Empty Subjects in Old French. In Advances in Romance Linguistics, David Birdsong (ed.), 1–32. Dordrecht: Foris Publications.
Alexiadou, Artemis and Elena Anagnostopoulou
  1998 Parametrizing AGR: Word-Order, V-Movement, EPP checking. Natural Language and Linguistic Theory 16: 491–539.
Barnes, Michael
  1987 Some Remarks on Subordinate-Clause Word-order in Faroese. Scripta Islandica 38: 3–35.
Benincà, Paola
  1995 Complement Clitics in Medieval Romance: the Tobler-Mussafia Law. In Clause Structure and Language Change, A. Battye and I. Roberts (eds.), 325–344. New York: Oxford University Press.
Cardinaletti, Anna and Ian Roberts
  1991 Clause Structure and X-Second. Ms., Università di Venezia.
Chomsky, Noam
  1957 Syntactic Structures. (Janua linguarum 4). ’s-Gravenhage: Mouton.
  1993 A Minimalist Program for Linguistic Theory. In The View from Building 20, Kenneth Hale and Samuel Jay Keyser (eds.), 1–52. Cambridge, Mass.: MIT Press.
  1995 The Minimalist Program. Cambridge, Mass.: MIT Press.
  2000 Minimalist Inquiries: the framework. In Step by Step, R. Martin, D. Michaels, and Juan Uriagereka (eds.), 89–155. Cambridge, Mass.: MIT Press. [Also available as MIT Working Papers in Linguistics no. 15, 1998. Department of Linguistics and Philosophy, MIT, Cambridge, Mass.]
Clark, Robin and Ian Roberts
  1993 A Computational Model of Language Learnability and Language Change. Linguistic Inquiry 24 (2): 299–345.
Diesing, Molly
  1990 Verb movement and the subject position in Yiddish. Natural Language and Linguistic Theory 8: 41–79.
Falk, Cecilia
  1993 Non-Referential Subjects in the History of Swedish. Ph.D. dissertation, University of Lund.
Fischer, Susann
  2002 The Catalan Clitic System: A Diachronic Perspective on its Syntax and Phonology. Berlin: Mouton de Gruyter.
  to appear The diachronic relationship between Quirky Subjects and Stylistic Fronting. In Non-Nominative Subjects, K.V. Subbarao and Peri Bhaskararao (eds.). Amsterdam: John Benjamins.
Fischer, Susann and Artemis Alexiadou
  2001 Stylistic Fronting: Germanic vs. Romance. Working Papers in Scandinavian Syntax 68: 1–34.
Fontana, Josep
  1993 Phrase Structure and the Syntax of Clitics in the History of Spanish. Ph.D. dissertation, University of Pennsylvania.
  1996 Some problems in the analysis of non-finite verb-fronting constructions. In Language Change and Generative Grammar, Ellen Brandner and Gisella Ferraresi (eds.) (Linguistische Berichte Sonderheft 7 (1995–1996)). Wiesbaden: Westdeutscher Verlag.
Holmberg, Anders
  1997 Scandinavian Stylistic Fronting: Movement of Phonological Features in the Syntax. Working Papers in Scandinavian Syntax 60: 81–124.
  2000 Scandinavian Stylistic Fronting. Linguistic Inquiry 31: 445–483.
Jensen, Frede
  1994 Syntaxe de l’ancien occitan. Tübingen: Niemeyer.
Jónsson, Jóhannes Gísli
  1991 Stylistic Fronting in Icelandic. Working Papers in Scandinavian Syntax 48: 1–43.
Laka, Itziar Miren
  1990 Negation in Syntax: On the Nature of Functional Categories and Projections. (MIT Working Papers in Linguistics). Cambridge, Mass.: MIT Press.
Maling, Joan
  1980 Inversion in Embedded Clauses in Modern Icelandic. Íslenskt mál og almenn málfræði, 175–193.
  1990 Inversion in Embedded Clauses in Modern Icelandic. In Syntax and Semantics: Modern Icelandic Syntax, Joan Maling and Annie Zaenen (eds.), 71–91. London.
Platzack, Christer
  1987 The Scandinavian Languages and the Null-Subject parameter. Natural Language and Linguistic Theory 5: 377–401.
Pinkster, Harm
  1988 Lateinische Syntax und Semantik. Tübingen: Francke.
Poole, Geoffrey
  1992 The Case Filter and Stylistic Fronting in Icelandic. Harvard Working Papers in Linguistics 1: 19–32.
  1996 Optional Movement in the Minimalist Program. In Minimal Ideas: Syntactic Studies in the minimalist framework, Werner Abraham, Samuel David Epstein et al. (eds.), 199–219. Amsterdam.
  1997 Stylistic Fronting in Icelandic: A case study in prosodic X° Movement. Newcastle and Durham Working Papers in Linguistics 4: 249–283.
Rizzi, Luigi
  1990 Relativized Minimality. Cambridge, Mass.: MIT Press.
Rögnvaldsson, Eiríkur
  1984 Icelandic Word order and það-insertion. Working Papers in Scandinavian Syntax 8.
Rögnvaldsson, Eiríkur and Höskuldur Thráinsson
  1990 On Icelandic Word Order Once More. In Modern Icelandic Syntax, Joan Maling and Annie Zaenen (eds.), 3–40. San Diego.
Santorini, Beatrice
  1989 The Generalization of the Verb-Second constraint in the History of Yiddish. Ph.D. dissertation, University of Pennsylvania.
Sigurðsson, Halldór
  1997 Stylistic Fronting. Paper presented at “Workshop on Subjects, Expletives and the EPP”, University of Tromsø.
The superiority conspiracy: Four constraints and a processing effect
Hubert Haider
English wh-in-situ restrictions are commonly analyzed in terms of locality and minimality type constraints, in minimalist theories as well as in optimality-oriented approaches. The ‘Minimal Link Condition’ (MLC) is a present day rendering of Chomsky’s (1973) original concept of ‘superiority’.1 The wh-item closest to the top spec-position (‘spec C’) receives priority for movement to this position. A comparative look at Germanic languages, however, tells us that the wh-in-situ patterns do not follow a simple concept of locality or distance-based economy. From a cross-linguistic vantage point, English ‘superiority’ will be argued to be epiphenomenal of independently motivated grammar constraints, plus a processing restriction. The main claim of this contribution is as follows: MLC is inadequate for capturing the core patterns of wh-in-situ. The crosslinguistic distribution patterns of wh-in-situ are determined by at least four independent grammatical factors: i) obligatory operator status of a wh-item in situ in spec-positions; ii) semantic type (individual level vs. higher type) of the licensing wh-element; iii) domain requirement for semantic integration of an adverbial wh-item; iv) strictly binary licensing relation for an in-situ wh-element. The residue of superiority-like cases not covered by these constraints seems to invite an account in terms of a processing restriction, rather than a structural constraint.
1. Revisiting wh-in-situ
In Haider (2000), the common and the contrasting properties of German and English wh-in-situ constructions were derived as effects of semantic type and domain conditions (for adverbial wh-items) on the one hand and the different structural positions of the subject wh-elements on the other, plus a minimality condition relativised to the level of case features. This analysis is not fully adequate, however. First, it remains silent on some intricate empirical issues (see examples 1 and 2); second, it does not satisfactorily capture the facts of Icelandic – as the crucial testing ground for OV/VO-based accounts of crosslinguistic contrasts – and third, as it seems, it elevated a processing restriction to the level of a grammatical constraint (see the discussion of the examples in 3).
Before going into details, let me enumerate the problem areas for the standard locality approach to superiority phenomena that will be discussed in this paper by assigning them to the constraints that will be shown to be responsible for these phenomena in the rest of the paper.
i) Obligatory operator constraint: The acceptability of an in-situ wh-element in a functional spec-position depends on its ability to function as an operator that binds a variable. This is not the case for an in-situ wh-item in its VP-internal argument position [see the ‘amnesty’ phenomenon illustrated by examples (1)].
ii) Semantic type constraint: Licensor and licensee must not both be operators ranging over ‘higher-than-individual’ semantic types [see examples (2)].
iii) Domain-mapping constraint (for semantic integration of adjuncts): Operators need to c-command their (semantic) domain. For higher order types (e.g. why, how), their domain of semantic integration is the domain of eventualities, whose structural counterpart is the tense domain.
iv) Minimal binding constraint: The licensing relation between a dependent, in-situ wh-element and its licensing wh-element is biunique: The dependent wh-element is licensed by a minimally c-commanding licensed wh-item.
As for the ‘amnesty’ effect, a satisfactory account of wh-in-situ constructions should provide insight into constructions in which a wh-violation is partially legalized by a binding relation (see Hornstein 1995:144). This is the case if the illegal wh-in-situ element binds a variable (1b) or if the illegal wh-element c-commands (and locally licenses) another licit wh-element (1c). Why should ‘superiority violations’ become much less severe if the ‘illegal’ in-situ wh-element is followed by a ‘legal’ one, or if it binds a pronominal variable?
The answer will be: an in-situ wh-element whose argument position is a spec-position is an obligatory operator, whose operator function must be satisfied (by having it bind a variable). The obligatory operator status of a subject wh-element is a crucial and independent factor for ruling out dependent in-situ wh-subjects in English.
(1) a. * I’d like to know where who hid it
    b. (?) I’d like to know where who_i hid his_i papers (see Hornstein 1995:144)
    c. (?) I’d like to know where who hid it when (Chomsky 1981: 238, Kayne 1983: 235)
Second, a hitherto unnoticed restriction on the licensing relation for the in-situ wh-element can be easily identified in OV languages like German and Dutch. Both languages seem at first sight to defy a universal restriction – adverbial wh-operators like why and how cannot license each other – in the context of wh-in-situ constructions with more than two wh-items. But on closer inspection, this apparent violation provides a more precise insight into a biuniqueness property of the licensing relation between an in-situ wh-item and the wh-element it depends on. The fact that why and how cannot license each other (see 2a for English) seems to hold universally. But German and Dutch appear to allow exceptions (2b-d) in cases of an intervening, properly licensed in-situ wh-element (see the contrast in 2b,c for German and 2d for Dutch). Upon closer inspection, it will turn out, however, that they are not exceptional at all. Note, first, that this case is different from the ‘amnesty’ case in (1). In the amnesty case, the illicit wh-element needs to c-command a licit in-situ one. In (2), however, the illicit wh-item is c-commanded by a licit in-situ wh-item. The licit in-situ wh-element must precede (see 2e) the wh-adverbial. What this tells us is that the licensing relations are binary and that the third in-situ element is not licensed by the moved wh-item.2
(2) a. *Why did he organize it how? – *How did he organize it why?
    b. Warum hat man was/*das wie organisiert?
       why has one what/that how organized
    c. Wie hat man was/*das warum organisiert? 3
       how has one what/*that why organized
    d. Waarom heeft men wat/*dat hoe geregeld?
       why has one what/that how organized
    e. *Wie hat man warum was organisiert?
       how has one why what organized
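The contrast between (2b) and (2e) can be made concrete with a small sketch. It is only an illustration under stated assumptions, not part of the analysis itself: the class `Wh`, the two type labels, and the function names are hypothetical, and the c-command order of the wh-items is simply equated with their left-to-right order in the example. The sketch compares a bi-unique (chain-like) licensing relation with a one-to-many relation in which the moved wh-item licenses every in-situ item directly.

```python
from dataclasses import dataclass
from typing import List

# Assumption: each wh-item is reduced to its form and a coarse semantic type.
# "higher" stands for why/how-type adverbials, "individual" for arguments.
@dataclass(frozen=True)
class Wh:
    form: str
    sem_type: str  # "individual" or "higher"

def pair_ok(licenser: Wh, licensee: Wh) -> bool:
    """Two higher-type wh-items may not license each other."""
    return not (licenser.sem_type == "higher" and licensee.sem_type == "higher")

def ok_biunique(chain: List[Wh]) -> bool:
    """Each wh-item is licensed only by the next higher wh-item."""
    return all(pair_ok(a, b) for a, b in zip(chain, chain[1:]))

def ok_one_to_many(chain: List[Wh]) -> bool:
    """The moved (first) wh-item licenses every in-situ wh-item directly."""
    moved, in_situ = chain[0], chain[1:]
    return all(pair_ok(moved, w) for w in in_situ)

warum = Wh("warum", "higher")
wie = Wh("wie", "higher")
was = Wh("was", "individual")

ex_2b = [warum, was, wie]  # Warum hat man was wie organisiert?
ex_2e = [wie, warum, was]  # *Wie hat man warum was organisiert?

print(ok_biunique(ex_2b), ok_one_to_many(ex_2b))  # True False
print(ok_biunique(ex_2e), ok_one_to_many(ex_2e))  # False False
```

Only the bi-unique variant separates the acceptable (2b) from the unacceptable (2e); the one-to-many variant wrongly excludes both, which is the reasoning behind treating licensing as a strictly binary relation.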
Finally, the superiority characteristics of long distance extractions need to be considered. The examples (3a,c) appear to be clear cases of a minimality effect: Moving the lower wh-item across a wh-item in the matrix clause requires long distance movement whereas moving the other potential candidate would amount to local overt movement in the matrix clause only. Is this indeed the source of the unacceptability? The answer is most likely yes, but at least in the case of (3b,c), the unacceptability seems to be not the reflex of a grammaticality violation (i.e. violation of a minimal link condition) but rather a reflex of processing clashes, as shall be argued below. In Haider (2000), the patterns illustrated in (3b,c), originally discussed in Fanselow (1991: 330) and Müller (1995: 323), are accommodated in terms of a minimality requirement. The acceptability of (3b), however, would require relativizing the range of minimality-relevant features to the lexical level of an animacy feature, since was (3rd sg., n.) and wen (3rd sg., m.) differ only with respect to animacy, but not in person, number, or case. A more adequate account of the contrast between (3b) and (3c) and similar contrasts to be discussed later in the paper is one in terms of a processing restriction: in structure processing (i.e. parsing), antecedent-gap resolution crashes if a non-distinct element (belonging to another chain) interferes in the computation of an antecedent-gap path. This is the case if a potential source position of a moved item contains a non-distinct element that depends on the head of the chain. English is a crucial case since the superiority effect is not restricted – as in German – to occurrences of non-distinct wh-items but seems to hold for DP-type wh-elements in general:
(3) a. *What_i /*who(m)_i did you ask who(m) [to fix e_i]
    b. Was_i hast du wen gebeten [dir e_i zu zeigen]?
       what did you who ask [you-Dat to show]
    c. *Wen_i hast du wen gebeten [dir e_i zu zeigen]?
       who did you who ask [you-Dat to show]
       ‘Who did you ask who to show to you’
If the examples (3a) and (4a,b) are representative,4 then they indicate that in English, the intervention effect is not triggered by the morphological non-distinctness of wh-elements, although it differentiates between categorially different items (see 4c,d).
(4) a. *Who(m)_i did you give what [to e_i]? (Fiengo 1980: 123 ex. 16a)
    b. *Who(m)_i did you introduce who(m) [to e_i]?
    c. [For whom]_i did you build what e_i? (Fiengo 1980: 123 ex. 17b)
    d. [To whom]_i did you introduce who(m) e_i?
It seems, however, that at least Fiengo’s judgment for the pattern (4a) is not generally shared, given Culicover’s (1997: 220, ex. 2b) characterization of (4a) as regular and not deviant at all.
2. Observational inadequacies and shortcomings of minimality-based accounts
Contrary to widely held beliefs, concepts of minimality or locality are neither empirically nor descriptively fully adequate. The empirical coverage is poor in a cross-linguistic perspective, and even for English there are some problematic cases:
(5) a. Whom_i did you promise [to timely phone up e_i] when/where?
    b. When_i/where_i did you promise [to timely phone up whom] e_i?
    c. Whom_i did you confess [to have tried [to timely phone up e_i]] when/where?
In a strictly minimal link-driven derivation, either (5a) or (5b) ought to violate a minimality requirement for the matrix construal of the adverbials, given that in s-structure, one of the two wh-elements is obviously more deeply embedded than the other. In terms of a minimal-link condition (Chomsky 1995: 295, 311), the matrix adjunct wh-element in (5b) is necessarily closer to the spec-position of the root than the wh-element in the embedded clause. Since infinitival embeddings can be iterated, there is no principled limit for the depth of embedding (5c).
In a cross-linguistic perspective, contrasts like those between English and German call for a principled explanation. In German, a subject wh-expression may stay in situ, so the well-known English subject-object asymmetry for non-‘d-linked’ (Pesetsky 1987) wh-pronouns is not found (Haider 1984). The German examples (6d-f) correspond to the English examples (6a-c), but the German ones are all fully acceptable and attested.5
(6) a. Who saw what?             d. Wer hat was gesehen?
    b. *What did who see?        e. Was hat wer gesehen?
    c. *When did who see it?     f. Wann hat es wer gesehen?
In fact, data6 that show the irrelevance of a minimal link condition in accounting for the subject-in-situ cases are noted and discussed already in Chomsky’s Pisa lectures (1981: 236f.). The relevant contrasts are illustrated in (7). A wh-subject is ungrammatical in situ, independent of superiority contexts:
(7) a. It is unclear who thinks (that) we saw whom
    b. *It is unclear who thinks (that) who saw us
    c. I don’t know who would be happy if he/*who won the prize (Chomsky 1981: 236)
Whatever principle accounts for the ungrammaticality of wh-in-situ in (7b,c) will account for the ungrammaticality of (6b,c). (7b,c) are unaccounted for under superiority, and under current accounts in terms of shortest move (or MLC) as well, simply because they do not involve a competition between a shorter and a less short move.
In the following sections I shall review the contexts and conditions that in sum produce an apparent ‘superiority conspiracy’ for English. Cross-linguistic comparisons will show that the superiority conspiracy is the combined result of several independent factors, some of which are structural and some of which are interface effects (at the syntax-semantics interface).
Let me add a final critical remark on methodology: Not only is the minimalist approach too narrowly tailored to English; other approaches to multiple wh-constructions, as for instance Erteschik-Shir’s (1997, sect. 6.2) ‘focus structure’-based one, suffer from the same defect, namely too small (and therefore not sufficiently representative) a database. Her constraint (subject constraint; Erteschik-Shir 1997: 191–193) would incorrectly rule out wh-in-situ subjects in German, Dutch and other OV languages on a par with English. But her approach can be saved if the notion of ‘SUBJECT’ in Erteschik-Shir’s subject constraint is construed not as ‘argument licensed under agreement with the finite verb’ but as ‘argument licensed in the spec of a functional projection of agreement with the finite verb’. In this case, her subject constraint and the obligatory operator constraint produce the same outcome.
3. Wellformedness conditions for wh-in-situ
3.1. The obligatory operator status of wh-elements in spec
As illustrated above (see 7), the grammatical source of the deviance of dependent wh-subjects in English is independent of superiority contexts and it is a VO-type phenomenon. It will be argued that the deviance is triggered by the structural configuration of the subject position in a VO language as a functional spec position. In German (and in other OV languages), the subject remains in its VP-internal subject position (see section 5), whence the contrast with English in the wh-in-situ contexts. An in-situ wh-element gains obligatory operator status by virtue of being in a functional spec-position whose head checks an argument feature of this wh-element (to be elaborated in section 3.1.1). This is the basis of the account for the ungrammatical sentences in (7): a wh-subject in situ that does not bind a variable is a vacuous operator and therefore ungrammatical. If the variable binding requirement is met, however, the ungrammaticality status changes. This is the source of the amnesty phenomena mentioned in the introductory section. For the sake of convenience, I repeat the examples:
(8) a. (?) I’d like to know where who_i hid his_i papers
    b. (?) I’d like to know where who hid it when
In (8a), the wh-subject binds a pronominal variable; in (8b), it binds a wh-variable and thereby satisfies the operator requirement.7 It will be demonstrated in the next section that in (8b), who is indeed the licenser for when. A wh-element in situ is licensed in a strictly minimal environment, that is, in the minimal c-command domain of a potential licenser. This will be made clear in the discussion of German data in the following section. German provides independent evidence for the obligatory operator status of a wh-element in a functional spec-position since in-situ wh-elements can be interpreted either as indefinite pronouns or as wh-expressions:
(9) a. Wie oft hat wer angerufen? (ambiguous)
       how often has who phoned-up
       ‘How often did someone call?’ – ‘Who called how often?’
    b. Wer hat oft angerufen *(?)
       who has often called
       ‘Who has called often?’ vs. *‘Someone has often called’
If the wh-pronoun is moved to spec-C, it cannot be interpreted as an indefinite pronoun. By virtue of being in the spec-C position, it is bound to function as an operator. Let us call this the obligatory operator constraint of the superiority conspiracy: Fully licensed (see below) wh-elements in a functional spec-position are operators. Therefore, variable binding may satisfy their operator status. Note that the VP-internal trace of the subject (in ‘spec-VP’) in English does not qualify as a variable because the subject is not fully licensed as an argument in the VP-internal position.
3.1.1. Remarks on ECM and subjunctive subjects, and on the vacuous movement thesis
The particular reference to checking in the first paragraph of the preceding section is motivated by the difference between a nominative subject on the one hand and an exceptionally case-marked subject (examples 10a,b from Chomsky 1981: 236) or the subject of an English ‘subjunctive’ construction 8 (Bresnan 1977) on the other hand. These subjects (10a,c) are not subject to the obligatory operator constraint:
(10) a. Who believes [whom to have read the book]
     b. *Who believes (that) [who has read the book]
     c. Who recommended that [who be fired]
For (10a) the standard ECM-account is sufficient: If the governing verb is the source/licenser of case of the ECM-subject, the ECM-subject is not checked by a functional head. A similar consideration applies to (10c). In the subjunctive construction, agreement is suspended. So agr-S does not check the subject, hence the in-situ position is not checked by the functional head.9 Let me add that this is of course but a descriptive characterization of the respective contexts in which the obligatory operator effect is operative.
My final remark concerns clause-initial wh-subjects and the vacuous movement hypothesis: Note that the operator constraint is not at variance with the assumption that a clause-initial subject wh-element stays in its functional subject position in English and is not string-vacuously moved to Spec-C. The reason is that wh-elements are licensed in one of two settings, namely directly or dependently. Direct licensing applies in Spec-C. In the main clause, the wh-feature is checked by clausal typing; in dependent wh-clauses, the wh-feature is checked by subcategorization. The operator constraint is trivially fulfilled if a wh-element is construed as the clausal typing wh-element, that is, the wh-element in the highest functional position available. In this case, the wh-element is construed as the wh-operator of the clause. So, a wh-subject may remain in its subject functional spec position (if there is no higher overt functional projection targeted by another wh-phrase) without violating the obligatory operator constraint. Only in case the wh-subject is a dependent wh-element (with another wh-element in Spec-C) does the obligatory operator constraint apply, and it becomes responsible for a secondary effect: The operator property must be fulfilled, but the wh-element cannot be the wh-operator of the clause (since this function is assigned to the wh-element in spec-C). So, a secondary, wh-unspecific operator function becomes crucial, namely the operator-variable binding function.
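The conditions under which the obligatory operator constraint applies, and the ways it can be satisfied, can be summarized in a short sketch. This is a deliberately coarse illustration, not the analysis itself: the class `DependentWh` and its field names are hypothetical, the graded ‘(?)’ judgements are flattened to a plain pass/fail value, and whether a position counts as a checked functional spec is stipulated per example rather than computed from a structure.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DependentWh:
    """A wh-element that is not the clause-typing wh-item in Spec-C."""
    in_checked_functional_spec: bool  # e.g. an English nominative subject
    binds_pronoun: bool = False       # binds a pronominal variable
    licenses_lower_wh: bool = False   # c-commands and licenses an in-situ wh-item

def satisfies_operator_constraint(wh: DependentWh) -> bool:
    # Outside a checked functional spec the constraint does not apply
    # (ECM subjects, subjunctive subjects, VP-internal subjects in OV).
    if not wh.in_checked_functional_spec:
        return True
    # Otherwise the element is an obligatory operator and must bind a variable.
    return wh.binds_pronoun or wh.licenses_lower_wh

examples = {
    "(1a) where who hid it": DependentWh(True),
    "(1b) where who_i hid his_i papers": DependentWh(True, binds_pronoun=True),
    "(1c) where who hid it when": DependentWh(True, licenses_lower_wh=True),
    "(10a) who believes [whom to have read the book]": DependentWh(False),
    "(6e) Was hat wer gesehen?": DependentWh(False),
}

for label, wh in examples.items():
    print(label, "->", satisfies_operator_constraint(wh))
```

Under these stipulations only the (1a) pattern fails, which matches the reported contrast between the plain subject-in-situ case and the ‘amnesty’, ECM, and German cases.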
3.2. Semantic type restrictions on in-situ licensing
3.2.1. Why and how as higher type operators
Why and how differ from other adverbials like temporal or local ones with respect to the semantic type of the phrase they are applied to. Argumental, temporal or local wh-elements are operators that range over individual type variables. Bare reason and manner adverbs range over higher order types, namely properties of eventualities. This has been noted frequently (Aoun & Li 1993: 153; Szabolcsi & Zwarts 1993, Hornstein 1995, Reinhart 1998).10 Reinhart (1998: 45) claimed on the evidence of patterns such as (11a,b) and (12a,b) that the type property would disqualify why and how for in-situ usage in general. But German (12c,d), and corresponding data from Dutch (see 15), Yiddish 11 (Diesing 2001), Japanese (15b,c from Saito 1994: 195), or Hungarian (Kiss 1993: 191), to name just a few languages, show that the case of (11a,b) must be distinguished from the case of (12a,b). In all these languages the pattern corresponding to (12) is fine,12 but (11) is out, and (11) is out on universal grounds.
(11) a. *Why did he fix it how?     c. *Weshalb hat er es wie repariert?
     b. *How did he fix it why?     d. *Wie hat er es weshalb repariert?
(12) a. *Who fixed it how?          c. Wer hat es wie repariert?
     b. *Who fixed it why?          d. Wer hat es weshalb repariert?
The adequate descriptive generalization formulated in Haider (2000) is this: A wh-element denoting a wh-operator that does not quantify over individual terms does not license an in-situ wh-element that does not quantify over individual terms. The split was-für construction in German provides independent evidence for this generalization: 13
In the split construction (13a),14 was – as the split-off wh-element – does not quantify over individuals, but rather over higher order entities (namely kinds; cf. Beck 1996, Pafel 1996). So, (13a) is predicted to be ill-formed. The fact that (13c) is grammatical shows that the ungrammaticality of (13a) cannot be attributed to an intervention effect triggered by the in-situ wh-item preceding the phrase with the extraction site. (13b) is the variant without splitting off the wh-element, that is, with pied piping of the complete DP instead.
(13) a. *Was_i hat er wie/warum [e_i für Autos] repariert?
        what has he how/why [for cars] repaired
        ‘What kind of cars did he repair why/how?’
     b. [Was für Autos] hat er wie/warum repariert?
        [what for cars] has he how/why repaired
     c. Was_i hat er denn wem/wann [e_i für Fragen] gestellt?
        what has he PRT whom-Dat/when [for questions] put
        ‘What kind of questions did he ask whom/when’
The type restriction on higher type in-situ wh-expressions is operative in all ‘how-x’ constructions (e.g. how many, how expensive, how big, …), since these expressions all denote higher order predicates. The following German examples illustrate the expected restriction:
(14) a. ?? Warum hast Du wie große Kartoffel ausgewählt?
        why have you how big potatoes chosen
     b. ?? Wie große Kartoffel hast Du warum ausgewählt?
        how big potatoes have you why chosen
In sum, this is the second constraint of the superiority conspiracy against wh-in-situ items, and it is a semantic type restriction that becomes effective at the syntax-semantics interface as a semantic type constraint for dependent wh-elements. The semantic type constraint forbids higher type wh-elements to license each other. Therefore, a higher-type wh-element in situ is ill-formed if its licensing wh-element is also a higher-type wh-element. This constraint is likely to be cross-linguistically invariant, that is, there are no language-specific variant factors that restrict its application. Note that this constraint is not a syntactic one, but one of semantic construction.15
3.2.2. VO/OV and a domain requirement for adverbials
A typological factor (namely VO vs. OV, or, head-initial vs. head-final) narrows the range of the third constraint of the superiority conspiracy so that it applies in VO but not in OV. It is a domain mapping constraint. Note that the semantic type constraint discussed above cannot be made responsible for the ban against higher order postverbal in-situ wh-elements in English (as in 12a,b above), because the corresponding patterns are grammatical in OV languages, as the German examples (12c,d) or the Dutch one (15a) or the Japanese examples (15b,c) illustrate. A higher order type wh-adjunct can be licensed in situ by a licit individual level type wh-operator.
(15) a. Wie heeft het waarom/hoe gedaan? (Dutch)
        who has it why/how done
        ‘Who did it why/how?’
     b. John-ga nani-o naze katta no (Japanese; Saito 1994: 195)
        John-Nom what-Acc why bought Q
     c. Dare-ga naze kita no
        who-SUB why came Q-PRT
        ‘Who came why?’
The ungrammaticality of the pattern in (12a,b) in VO languages like English reflects an independent constraint: higher order type wh-adverbials cannot be successfully licensed in postverbal positions. This is confirmed also by the German examples in (16), Yiddish (17), French (18a) and Portuguese (18b).
(16) a. (?)Er hat dagegen protestiert deshalb
        he has it-against protested therefore
     b. *Wogegen hat er protestiert weshalb?
        what-against has he protested why
     c. Wogegen hat er weshalb protestiert?
In German, the extraposed position is not licit for the wh-variant (16b), although extraposition is possible to a certain extent with the declarative form (16a). Independent confirmation is abundantly available in Yiddish. Yiddish provides an excellent testing ground because arguments and adjuncts may occur either in post- or in preverbal positions. But wh-adjuncts like ‘why’ and ‘how’ occur only in preverbal positions (17a), and not in postverbal ones (17b). Note, however, that the preverbal position in (17a) is the position that is not available in English.
(17) a. ver hot vi azoy/farvos ongeklungen der mamen
        who has how/why called the mother
        (Diesing 2001; ex. 8 and 9)
     b. *ver hot ongeklungen der mamen vi azoy/farvos
        who has called the mother how/why
What the above examples illustrate is the effect of a domain condition for the semantic integration of adverbials: A wh-element denoting a wh-operator that ranges over higher order entities (e.g. predicates or propositions) needs to c-command (the head of) the phrase it is applied to as an operator [i.e. the (head of the) VP or its functional extension]. Assuming that c-command maps to precedence (cf. Haider 1992, Kayne 1994), postverbal positions are necessarily embedded positions. An adjunct wh-operator can only satisfy the domain mapping constraint in a position preceding the verb. But a VO clause structure does not leave room for such a position in between the subject and the VP either. This leads to ineffability if a wh-subject and a higher order adjunct would have to co-occur. Analogously, in French and in Portuguese, a reason wh-adjunct is not allowed in the postverbal position.
(18) a. *Tu es venu pourquoi? (from Aoun 1986: 97)
        you have come why
     b. *O que fizeste porquê? (from Costa 2002,16 ex. 84a)
        what you-did why
The examples above illustrate the third constraint of the superiority conspiracy, the domain mapping constraint. In Haider (2000), it is identified as a domain condition for the semantic integration of adverbials: A wh-element denoting a wh-operator that ranges over higher order entities (e.g. predicates or propositions) must c-command (the head of) the phrase it is applied to as an operator (i.e. the (head of the) VP or its functional extension). Higher order adjuncts operate on events and propositions. Hence they are expected to c-command the element that situates the event variable. This element is the verb that carries the T-markings. Given that postverbal positions are necessarily embedded positions (cf. Haider 1992, Kayne 1994), c-command entails precedence.
But – and here the OV/VO factor becomes crucial – preverbal positions, that is, positions in the range between the functional subject position and the left edge of the VP, are highly restricted for independent reasons to the extent that phrasal adverbials cannot occur in these positions (see Haider 2000a) in VO languages in contrast to OV languages (see 19a,b). A radical but insufficient solution has been proposed by Rizzi (1990: 47), who suggests that sentential adverbs are base-generated in spec CP on the evidence that why does not show a negative island effect (19c). But this fact simply shows that the base position of why cannot be in the scope of negation, as German clearly shows in (19d). How, however, cannot be base-generated above the negation, since it must be predicated on a domain that denotes an event, that is, a V-projection.17
(19) a. * Who has when/why left?
     b. Wer ist wann/weshalb weggegangen?
     c. Why/*how didn’t you come?
     d. Wer ist weshalb/*wie nicht gekommen?
        who is why/how not come
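The domain condition discussed in this section can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: a clause is encoded as a flat list of words, c-command is approximated by precedence (as assumed in the text), and the role labels `wh_high`, `v_head` and `other` as well as the function name are hypothetical conveniences, not part of the analysis.

```python
from typing import List, Tuple

# (word, role); role is "wh_high" for a higher-order wh-adjunct,
# "v_head" for the head of the VP, and "other" for everything else.
Token = Tuple[str, str]


def satisfies_domain_condition(clause: List[Token]) -> bool:
    """True if every higher-order wh-adjunct precedes the head of the VP."""
    verb_positions = [i for i, (_, role) in enumerate(clause) if role == "v_head"]
    if not verb_positions:
        raise ValueError("clause has no element marked as v_head")
    v_head = verb_positions[0]
    return all(i < v_head for i, (_, role) in enumerate(clause) if role == "wh_high")


# (16b) *Wogegen hat er protestiert weshalb?
ex_16b = [("wogegen", "other"), ("hat", "other"), ("er", "other"),
          ("protestiert", "v_head"), ("weshalb", "wh_high")]
# (16c) Wogegen hat er weshalb protestiert?
ex_16c = [("wogegen", "other"), ("hat", "other"), ("er", "other"),
          ("weshalb", "wh_high"), ("protestiert", "v_head")]
# (17b) *ver hot ongeklungen der mamen farvos
ex_17b = [("ver", "other"), ("hot", "other"), ("ongeklungen", "v_head"),
          ("der", "other"), ("mamen", "other"), ("farvos", "wh_high")]
# (18a) *Tu es venu pourquoi ?
ex_18a = [("tu", "other"), ("es", "other"), ("venu", "v_head"),
          ("pourquoi", "wh_high")]

for label, clause in [("16b", ex_16b), ("16c", ex_16c),
                      ("17b", ex_17b), ("18a", ex_18a)]:
    print(label, satisfies_domain_condition(clause))
```

The check models a necessary condition only. A string like (19a) would pass it, since the adjunct precedes the verb; on the account given above, (19a) is excluded because VO clause structure provides no preverbal position for phrasal adverbials in the first place.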
3.2.3. Evidence for a strictly binary licensing function
The type constraint discussed in 3.2.1 provides insight into the licensing relation between the moved wh-item and the in-situ items in general. In principle, the licensing relation could be a one-to-many relation (like operator binding) 18 or a strictly one-to-one relation. In the first case the moved element licenses all in-situ elements in its domain as bound variables. In the second case, licensing is a strictly minimal and bi-unique relation. The moved element as the highest wh-element licenses only the highest in-situ wh-element which in turn licenses the next lower one, and so on. In this case, licensing would be a strictly binary function. The following data (20) provide the empirical basis for a decision between the two options. The bi-unique one is the empirically correct one. If licensing were a one-to-many relation, a clause with a higher order type wh-element in the top spec-position (‘spec C’) is predicted to be deviant if one of the in-situ wh-elements is of a higher type too. This is a consequence of the semantic type constraint discussed in section 3.2.1 under the specified licensing conditions. If, however, licensing is a biunique relation, it matters whether the two higher order wh-elements are in a direct licensing relation
or in an indirect one. In the latter case, the constraint would not apply. The following data support a bi-unique licensing function:
(20) a. Warum hat man was/*das wie organisiert ? (German)
        why has one what/that how organized
     b. Waarom heeft men wat/*dat hoe geregeld ? (Dutch)
        why has one what/that how organized
     c. * Wie hat man weshalb was organisiert ? (German)
        how has one why what organized
     d. Wie hat man was weshalb organisiert ? (German)
        how has one what why organized
(20) illustrates two key facts. First, a clause with two higher-order wh-items is ungrammatical unless an additional wh-element occurs, and second, the additional wh-element must c-command the second, higher order wh-element. In (20a,b), the intervening object wh-element saves the otherwise ungrammatical clause. (20a,b) and (20c,d) form a minimal pair with respect to the relative order of the in-situ wh-elements. In the acceptable patterns, the moved wh-element licenses the in-situ wh-object, which in turn is the licenser for the following in-situ wh-adjunct. This strict biuniqueness restriction (the minimal binding constraint) on the licensing relation is the fourth constraint for in-situ wh-elements, referred to in the title of this paper. The next section addresses the discourse linking effect that has been hypothesized to account for the licensing of DP-internal wh-elements: it will be demonstrated that this effect can be explained as a subcase of the semantic type constraint.
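The two licensing options compared in this section can be contrasted in a small Python sketch. It rests on assumptions that go beyond the text: the wh-items of a clause are given as a list ordered by c-command (moved element first), every item c-commands the next one, and the set `HIGHER` and both function names are hypothetical. The only constraint modeled is the semantic type restriction, namely that a higher-type wh-element cannot license another higher-type wh-element.

```python
from typing import List

# Reason and manner wh-adverbials, treated here as higher-order wh-elements.
HIGHER = {"warum", "weshalb", "wie"}


def is_higher(wh: str) -> bool:
    return wh.lower() in HIGHER


def biunique_ok(wh_items: List[str]) -> bool:
    """Chain licensing: each wh-item licenses only the next lower one.

    The clause is deviant only if a higher-type item directly licenses
    another higher-type item.
    """
    return not any(is_higher(a) and is_higher(b)
                   for a, b in zip(wh_items, wh_items[1:]))


def one_to_many_ok(wh_items: List[str]) -> bool:
    """Operator-binding style: the moved item licenses all in-situ items."""
    head, rest = wh_items[0], wh_items[1:]
    return not (is_higher(head) and any(is_higher(w) for w in rest))


examples = {
    "20a": ["warum", "was", "wie"],    # acceptable
    "20c": ["wie", "weshalb", "was"],  # unacceptable
    "20d": ["wie", "was", "weshalb"],  # acceptable
}
for label, items in examples.items():
    print(label, "biunique:", biunique_ok(items),
          "one-to-many:", one_to_many_ok(items))
```

Under these assumptions, only the chain-based check separates (20c) from (20a) and (20d); the one-to-many check would rule out all three, which is the argument made above for a bi-unique licensing relation.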
3.3. D-linking – a dispensable condition ?
Pesetsky’s (1987) d-linking hypothesis19 is based on the contrast between a bare and a DP-internal wh-element in subject position (21a,b), and on the contrast between ‘which’ and ‘how many’ (21b, 22b):
(21) a. * Mary asked what_i who read e_i (Pesetsky 1987: 104)
     b. Mary asked which book_i [which men] read e_i ?
Evidently, the wh-subject in (21b) does not violate the constraint violated by (21a). Pesetsky attributes this to a context relation. (21b) presupposes that
there is a set of men, some of whom read some books. This contrast had been noted before (e.g. Fiengo 1980, Gueron & May 1984) and construed as a structural effect, given that the wh-element in (21b) is embedded in a DP. Pesetsky takes examples as in (22) as evidence against a specifier-related source of the contrast in (21).
(22) a. I need to know how many people voted for whom
     b. * I need to know whom_i how many people voted for e_i ? (Pesetsky 1987: 107)
According to Pesetsky, how many cannot be d-linked, and so it is subject to the same operator constraint as a bare wh-pronoun is. This assumption is not self-evident, however. If we replace ‘people’ by ‘men’ in (22), the discourse common ground for (21b) is identical with that of (22), with respect to the presupposed set of men. Nevertheless the two sentences differ in acceptability. The essential difference seems to be not one in terms of d-linking but rather one in terms of the semantic type of the wh-operators. The wh-operator denoted by ‘which’ ranges over individual type variables; ‘how many’, however, does not range over individual type variables, but over cardinalities, that is, properties of sets. The ‘which’ operator binds the referential variable (‘role R’) of the NP and thereby satisfies its operator requirement for a wh-element in the spec (of DP). ‘How many’ is a higher type operator, so its binding requirements as an operator are not met DP-internally. Therefore it turns the DP into an operator phrase, and, since the DP is in a spec-position itself, it must have a variable to bind. In short, this type difference rather than ‘d-linking’ apparently is the crucial difference between (21) and (22). German provides independent evidence for the type distinction. As expected, a phrase that denotes a higher type operator cannot license a higher type wh-phrase in situ. ‘How many’ (as in 23a) is a higher order operator (ranging over cardinalities) that, unlike ‘which’, cannot be satisfied DP-internally.
So it is predicted that a ‘how many’-phrase cannot depend on a higher order wh-item. This is confirmed by (23a).
(23) a. */??Weshalb/wie sind wieviele Gäste abgereist ?
        ‘why/how did how many guests leave’
     b. Wann/*wie sind wieviele Gäste abgereist ?
        ‘when/how are how many guests departed’
     c. Welche Gäste sind weshalb/wie abgereist ?
        ‘which guests are why/how departed’
As shown above, German allows wh-subjects in-situ (because they do not require overt checking in a functional spec-position, and therefore are not subject to the operator constraint). So the deviance of example (23a) cannot be due to the wh-phrase being a subject; it must rather be due to its semantic type properties. In sum, the semantic type of the wh-element in combination with its structural position is sufficient for modeling the distribution of phrases with DP-internal wh-elements. D-linking is thus a dispensable concept. There is no need to additionally immunize some in-situ wh-expressions against general well-formedness requirements for in-situ wh-phrases.
4. A residue of superiority or a processing effect ?
If superiority (or a minimal link condition, or a shortest move requirement) is a genuine structural constraint, one of its clearest contexts of application ought to be one in which the wh-elements are contained in different clausal domains. Data as in (24) are usually cited as cardinal evidence for a superiority phenomenon.
(24) a. Who persuaded who(m) [to visit you]?
     b. * Who_i did you persuade who(m) [to visit e_i ]?
     c. Who_i did you persuade her [to visit e_i ] ?
     d. Who_i did you persuade e_i [to visit who(m)] ?
In a derivational perspective, the offending property of (24b) is this: a wh-element from a lower clause is moved across a wh-element in the higher clause. In other words: The more distant wh-item is moved, and the wh-element closer to the target position is left in situ. In a representational view, the trace of the moved wh-phrase is c-commanded by a closer potential antecedent (i.e. the in-situ wh-element). So, in any case, a minimal link requirement seems to provide the empirically correct distinctions. But there are facts that are less easy to handle. These have been known at least since Fiengo’s (1980) detailed study (and led Pesetsky to the d-linking proposal):
(25) a. Who_i did you introduce [which people] to e_i ?
     b. * Who_i did you introduce who to e_i ?
     c. What_i did you tell [which people] about e_i ?
     d. * What_i did you tell who about e_i ? (Fiengo 1980: 126)
The contrasts between (25a,c) and (25b,d), respectively, are hard to reconcile with a minimal link condition on movement or on the antecedent-trace relation. (25a,c) obviously violate superiority, because there is an alternative that obeys the minimal link condition. The crucial factor for (25) seems to be the category difference between the wh-phrases involved. Fiengo (1980: 125) not only contrasts bare wh-pronouns with DP-internal ones (as in 25) but also with PP-internal ones in contrast with P-stranding (as in (26)).
(26) a. What_i did you give e_i to whom ?
     b. [To whom]_i did you give what e_i ?
     c. * Who_i did you give what [to e_i]? 20
The result is identical: If the wh-elements are of different categories, there is no superiority effect attested. This is unexpected in a minimal link scenario. Category distinctions should not matter because the crucial property, namely minimal link, is category independent. What matters for a minimal link algorithm is just the property of being a wh-phrase, that is, a phrase with a wh-feature that needs to be checked or licensed. German is particularly instructive because the distinctions are much more fine-grained, as mentioned already at the end of section 1.21 Apparent superiority effects are found only with wh-elements that are non-distinct in form.
(27) a. * Wen_i hat er denn wen gebeten [davon e_i abzuhalten] ?
        ‘who-Acc has he PRT whom-Acc asked [to keep away from it]’
     b. Was hat er denn wen gebeten, für ihn zu erledigen ?
        what-Acc has he PRT whom-Acc asked for him to carry out
     c. Wohin_i hat er denn wem versprochen [die Schlüssel e_i zu legen]?
        where-PP has he PRT whom-Dat promised [to put the keys]
     d. Wem_i hat er denn wen gebeten [die Nachricht e_i zu übermitteln] ?
        whom-Dat has he PRT whom-Acc asked [to transmit the message (to)]
     e. Wer_i glaubt er/*wer wohl daß e_i ihm seine Arbeit hier bezahlen werde?
        who thinks he/who PRT that e_i him his work here pay will
The wh-elements in the unacceptable examples (27a) and (27e) are of the same form. In (27b), only the form (animate vs. inanimate) differs, but the case is the same as in (27a). This is sufficient, as the acceptability contrast shows. As expected, category differences (27c) or case-driven form differences (27d) suffice, too. Note, however, that a case difference is not enough if the form does not differ. For the neuter pronoun there is no case distinction between nominative and accusative in German, and the result is unacceptable (28b). Note that this is independent of the wh-status of the ‘was’ in the matrix. (28b) is unacceptable even with ‘was’ construed as an indefinite pronoun. The sheer identity of form in a potential trace position of the fronted wh-element seems to be a sufficient obstacle for the parser in the successful scouting for the wh-movement path.
(28) a. Wen/was_i hat dieser Umstand dich bewogen [für diese Aufgabe e_i auszuwählen] ?
        whom-Acc/what-Acc has this circumstance you prompted [for this task to choose]
     b. Wen/*was hat was dich bewogen [für diese Aufgabe auszuwählen] ?
        whom-Acc/what-Acc has what you prompted [for this task to choose]
        ‘whom/what did what/something prompt you to choose for this task’
What these (un)acceptability patterns point to is not a grammar-based constraint but a processing restriction. Identity of form (despite differences in grammatical functions) is not a plausible parameter of a principle of grammar but a factor that is obviously relevant for processing. Hence, under these circumstances, the apparent superiority effects can be reconstructed as the breakdown or the impediment of a processing routine. The antecedent-gap computation algorithm blocks at the moment in which a wh-element is encountered that is identical in form with the wh-element whose antecedent-gap relation is still under computation. If the trace of the antecedent is conceived of as a copy of the moved antecedent, blocking under identity is the crash at the decision whether the identical element is a copy of the moved one or an independent wh-element bound by the moved wh-element. The antecedent-trace relation and the binding relation between a moved wh-element and the in-situ one are formally not distinct, whence the breakdown of the processing algorithm. In other words, the processing conflict is this: When the processor has an uncompleted chain in its buffer, it cannot assign a second, identical item it encounters to a different chain, but is bound to automatically analyze the second item as a trace copy belonging to the current chain, with the result that, in the examples discussed above, the chain structure is deviant.
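The processing conflict just described can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions, not a parser: the fronted wh-element is held as an open chain, the words between it and its gap are scanned left to right, and the routine blocks as soon as it meets an item identical in form to the fronted one. The function names and the word-list encoding are hypothetical.

```python
from typing import List, Optional


def first_blocking_item(fronted: str, intervening: List[str]) -> Optional[str]:
    """Return the first item between filler and gap that is identical in
    form to the fronted wh-element, or None if there is no such item."""
    for item in intervening:
        if item.lower() == fronted.lower():
            return item
    return None


def chain_is_parsable(fronted: str, intervening: List[str]) -> bool:
    """True if the open chain can be completed without meeting a form-identical item."""
    return first_blocking_item(fronted, intervening) is None


# Words between the fronted wh-element and its gap.
ex_27a = ("wen", ["hat", "er", "denn", "wen", "gebeten", "davon"])
ex_27b = ("was", ["hat", "er", "denn", "wen", "gebeten", "für", "ihn"])
ex_27d = ("wem", ["hat", "er", "denn", "wen", "gebeten", "die", "Nachricht"])
ex_28a = ("was", ["hat", "dieser", "Umstand", "dich", "bewogen",
                  "für", "diese", "Aufgabe"])
ex_28b = ("was", ["hat", "was", "dich", "bewogen", "für", "diese", "Aufgabe"])

for label, (fronted, words) in [("27a", ex_27a), ("27b", ex_27b),
                                ("27d", ex_27d), ("28a", ex_28a),
                                ("28b", ex_28b)]:
    print(label, chain_is_parsable(fronted, words))
```

The comparison is on surface form only, which is the point of the German data: case or grammatical function differences do not help in (28b), while a mere form difference suffices in (27b) and (27d).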
5. Accounting for cross-linguistic variation
With respect to wh-in-situ constructions, English and German are two extremes in a system space that allows for parameterizations. The German clause structure provides more room for in-situ wh-elements than the English one, for the following reason: German nominative checking is, as in Icelandic, not constrained to a specific structural position, so subjects remain in their VP-internal position, and hence the operator constraint for wh-elements in a spec-position does not become active.22 In English, in-situ wh-subjects are excluded by the operator constraint. Postverbal manner and reason adverbials in English are too low in structure to c-command their linking domain, that is, the domain of the finite verb. German is an OV language. Adverbs, and in particular manner and reason adverbs, precede and c-command the finite verb or its trace and thereby meet the c-command requirement. Icelandic is a good testing ground. It shares the VO properties with English, but the case checking system with German. In particular, spec I is not the only and obligatory licensing position for the nominative in Icelandic, as the quirky subject phenomenon shows. Since nominative is licensed in the VP-internal position, the relation between the subject in the spec I position and its VP-internal base position is different from English. In Icelandic, a fully licensed DP is moved to spec I (with nominative or any other case, if it is a quirky subject). In English, the nominative needs to be licensed by spec-head agreement, so the subject gets licensed in the spec-position. As a consequence, the chain relation between the DP in spec and its base position in English and Icelandic is different. In English, the head of the NP-movement chain is in the licensing position. In Icelandic, an already licensed subject DP is moved to the spec-position. This difference becomes crucial if the element in spec is a wh-element.
In Icelandic, this element is the antecedent of a trace in a case-licensed position. Hence, the trace position qualifies as a variable. In English, however, the trace in the VP-internal subject position is not case-licensed, hence it is not a variable, and hence an English wh-subject in Agr-s cannot satisfy its
operator property simply by binding its VP-internal trace. So, the prediction for Icelandic is this: In-situ wh-subjects do not violate the operator constraint and should be acceptable, ceteris paribus. An example is given in (29a). (29b) confirms that postverbal higher order adjunct wh-elements behave as in English. They are excluded because of the domain mismatch.
(29) a. Hvað hefur hver gefið börnunum? (Ottósson 1989, Halldór Sigurðsson, p.c.)
        what has who given children-the
     b. * Hvað hefur hveri gefið börnunum hvers vegna?
        what has who given children-the why
Dutch and German are siblings with respect to OV properties, but differ in the case checking system. Dutch does not have overt morphological case distinctions in the non-pronominal paradigms. Moreover, subtle differences in the clause internal organization indicate that in Dutch at least transitive subjects are in a VP-external position. I shall briefly mention two contrasts: First, Dutch unlike German does not allow pronoun fronting across transitive subjects (30b), and second, Dutch requires an expletive in impersonal passives (31b) vs. (31e):
(30) a. daß ihm jemand Bücher gab (German)
        that him someone books gave
     b. * dat hem iemand boeken gaf (Dutch) (= 30a)
     c. dat iemand het (aan) Jan gaf (Dutch)
        that someone it (to) Jan gave
Given that the left boundary of the VP is the barrier for pronoun fronting, the different fronting options indicate different boundary conditions: In German the VP spans the whole ‘mittelfeld’; in Dutch, the ‘mittelfeld’ provides a structural subject position. Dutch requires an expletive, just like English, if there is no other element available for the structural subject position, as for instance in the case of locative preposing. (31a) and (31c) are analogous in this respect. The incompatibility of do-support with locative preposing in English demonstrates that the locative is indeed in relation with the subject position.
(31) a. In deze hoek werd (er) volgens mij gefluisterd (Dutch) (Paardekooper 1963: 55)
        in this corner is (there) according-to me whispered
     b. Werd *(er) gefluisterd (in deze hoek) (Dutch) (Paardekooper 1963: 55)
     c. On this spot (there) will stand a huge tower – On which spot stood a huge tower ?
     d. Will *(there) stand a huge tower on this spot ?
     e. Wurde (*es) geflüstert in dieser Ecke ? (German)
        was there whispered in this corner
German does not allow an expletive. This cannot be blamed on (expletive) pro-drop, because German does not allow dropping pronouns, neither referential ones nor quasi-arguments, nor expletive ones, as (32) illustrates:
(32) a. Hat *(es) geregnet in der Nacht? (quasi argument)
        has it rained in the night
     b. Hier lebt *(es) sich gut (intransitive middle construction)
        here lives it itself well (‘here one lives well’)
     c. War *(es) sehr unangenehm, daß kein Taxi kam? (extraposition)
        was it very unpleasant that no taxi arrived
     d. Wurde (*es) getanzt? – Es wurde getanzt23 (Ge.) (intr. passive)
        was it danced – It was danced
     e. daß (*es) getanzt wird (Ge.) vs. dat *(er) gedanst wordt (Du.) (intransitive passive)
        that it danced is vs. that it danced is
In combination, these two sets of facts show that there is good reason to assume that the Dutch clause structure provides an overt structural subject position. But Dutch shares with German and Icelandic the property of relational case checking for the nominative. In sum, given these structural properties of the Dutch clause, Dutch should allow in-situ wh-subjects since they are either VP-internal or they are in a configuration like in Icelandic, binding a variable in the form of the trace of a case-licensed DP. The judgments 24 for the object-subject order are less uniform than one would expect. Higher adverbials in-situ are judged acceptable, as expected (33c).
(33) a. Ik weet niet wat wie gekocht heeft (perfect: 4; ?: 7; *: 8)
        I know not what who bought has
     b. Wie weet wat wie gekocht heeft voor zijn zusje ? (perfect: 8; ?: 3; *: 8)
        who knows what who bought has for his sister
     c. Wie heeft het waarom gedaan ? (perfect: 19)
        who has it why done
The fact that in Fanselow’s survey there is a substantial minority, namely 8 informants out of 19, who object to (33a,b) shows that further investigation is necessary in order to identify intervening factors that account for the variation. Hungarian multiple wh-questions (Kiss 1993: 98–101) support the basic tenet of the discussion above, too. Superiority, minimal links, or shortest moves are not primitive constraints of universal grammar. They are epiphenomenal. Hungarian wh-patterns do not reflect a simple superiority condition:
(34) a. Kinek mit hozott János (Kiss 1993: 98)
        who-Dat what-Acc brought John
     b. Mit kinek hozott János
        what-Acc who-Dat brought John
     c. Mit ki akar? (Kiss 1993: 101)
        what who wants?
The constraints that Kiss (1993) found to be operative in Hungarian are not structural locality conditions but conditions that rest on the semantic type and content of the elements involved.
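The parameter space discussed in this section for English, German, Icelandic and Dutch can be summarized in a short Python sketch. It encodes only two of the factors named above, head-final versus head-initial VP and whether the nominative must be licensed in a spec position, and derives the two predictions from them. The class, the field names and the two predicate functions are hypothetical simplifications; in particular, the mixed Dutch judgments reported for (33a,b) are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Language:
    name: str
    word_order: str           # "OV" or "VO"
    nominative_in_spec: bool  # True if nominative must be licensed in a spec position


def allows_in_situ_wh_subject(lang: Language) -> bool:
    """Operator constraint: a wh-subject licensed in spec is an obligatory operator."""
    return not lang.nominative_in_spec


def allows_in_situ_higher_adjunct(lang: Language) -> bool:
    """Domain condition: only OV provides a preverbal position for 'why'/'how'."""
    return lang.word_order == "OV"


LANGUAGES = [
    Language("English", "VO", True),
    Language("German", "OV", False),
    Language("Icelandic", "VO", False),
    Language("Dutch", "OV", False),
]

PREDICTIONS = {
    lang.name: (allows_in_situ_wh_subject(lang),
                allows_in_situ_higher_adjunct(lang))
    for lang in LANGUAGES
}

for name, (subject_ok, adjunct_ok) in PREDICTIONS.items():
    print(f"{name}: in-situ wh-subject={subject_ok}, "
          f"in-situ higher-order adjunct={adjunct_ok}")
```

On this encoding Icelandic patterns with German for subjects, as in (29a), and with English for adjuncts, as in (29b), which is the cross-classification the section argues for.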
6. The grammar of wh-in-situ in wh-movement constructions
A wh-element in the topmost spec-position of a clause (‘spec C’) is directly licensed. Either it is checked in a spec-head configuration of a selected head (subcategorization, in case of an indirect question), or it is directly interpreted by clausal typing, as a mood indicator25 (in the case of an interrogative main clause). An in-situ wh-element is indirectly licensed: It must be bound by a licensed wh-element. If it is not bound, it cannot be interpreted as a wh-element, with the exception of an echo-question interpretation, of course,
which is an erotetic26 utterance, but its form (Satzmodus = sentence mood) is not interrogative. Note, finally, that an account of in-situ licensing in terms of binding rather than covert wh-movement does not get into the well-known difficulties with constructions that defy the standard constraints for wh-movement. Covert wh-movement ought to be blocked (given the standard constraints on movement) when the in-situ element is contained in an adverbial clause (35a), or in an indirect question (35b), or in a predicative phrase (35c), since overt wh-movement is ungrammatical in all these contexts. Binding, however, is unproblematic, of course.
(35) a. Wer ist abgereist [bevor man wen vermisst hat] ?
        who has left [before one who missed has]
     b. Wer hat gesagt, [wieviel wer bezahlt hat] ?
        who has said [how much who paid has]
     c. Wer ist [womit unzufrieden] weggegangen ?
        who has [what-with discontent] left ?
7. Summary and conclusion
Let me summarize the claims made above: In-situ wh-items are licensed indirectly. Indirect licensing is subject to various constraints, some of which apply to any human language, and some of which are type dependent. A cross-linguistically universal constraint is the semantic type restriction. Higher-type wh-elements cannot license a wh-element of a higher type. This is the constraint that covers the universally exceptional behavior of why and how. The additional restrictions for why and how in VO languages, which are also type driven, are the effect of a domain condition. In VO languages, in contrast to OV languages, the c-command requirement for higher order wh-adjuncts cannot be satisfied, because the preverbal position is not available, whence the contrast between OV and VO languages. Note that the domain condition in itself is a universal constraint, but the contexts where it can apply are provided only by VO languages and not by OV languages. The third constraint (obligatory operator constraint) is structurally geared. A wh-element in-situ that is licensed in a spec-position is an obligatory operator. This accounts both for the ungrammaticality of a nominative wh-in-situ in VO-languages and the ‘amnesty phenomenon’, when the in-situ nominative binds a variable, either in form of another in-situ wh-element or in form of a bound pronoun.
The final constraint of the four constraints discussed in this paper (minimal binding constraint) is the relative minimality restriction for indirect licensing that accounts for the phenomenon that an interrogative with a higher-type wh-element in spec may contain an indirectly licensed higher-type wh-element, but only if this is c-commanded by an indirectly licensed wh-element of the appropriate type. Note that this is a minimality requirement, but crucially not a bijection requirement,27 because a wh-element in spec may well license more than one wh-phrase in situ, if the in-situ elements are not in a c-command relation, as illustrated by the well-formed construction (36):
(36) Welcher Frau hast du gesagt, als sie wen suchte, daß du ihn wo vermutest ?
     which woman-Dat have you told, when she whom looked-for, that you him where suspected
In this example, the in-situ wh-elements are contained in extraposed clauses, and none of them c-commands the other. So they must both be indirectly licensed by the moved wh-element in the matrix clause. In conclusion, the wh-in-situ constructions do not call for an account in terms of MLC. In a cross-linguistic perspective the set of English ‘superiority’ data turns out to be conditioned by independent constraints, most of which belong to the syntax-semantics interface. There is only a small residue of data that the MLC could be made responsible for, namely the general resistance to extracting a wh-element from an embedded clause into a higher clause, across a wh-element in the higher clause, as illustrated in (3a). But (3b), the German example, already suffices to cast doubt on an MLC solution for (3a). If (3a) is ruled out by a universal restriction on minimal links, the very same restriction ought to apply to German. Hence it seems more profitable to look for a processing solution for the English data, on a par with the German data discussed in section 4.
Acknowledgements
I am extremely grateful to Gisbert Fanselow and an anonymous reviewer for their thorough, generously detailed, constructive, and extremely helpful critical comments to the benefit of the reader. Remaining shortcomings are of course mine.
Notes
1. Chomsky 1973, p. 246: “No rule can involve X,Y in the structure […X…[…Z…WYV…]…], where the rule applies ambiguously to Z and Y and Z is superior to Y.” ‘Superior’ is defined as follows: “Category A is superior to category B in the phrase marker if every major category dominating A dominates B as well but not conversely.”
2. Note that this has consequences for theories that take in-situ wh-elements to be licensed by covert movement to Spec-C. In this case it is not obvious why a third wh-element may lift the ban against the co-occurrence of two higher-type adverbial wh-elements.
3. Example from the web [www.oif.ac.at/aktuell/news_05_2002_wedding.html]:
   i) Warum wer wie heiratet.
      why who how marries
4. Judgments vary across informants between more severe and less severe deviance.
5. Examples are easy to find in the media:
   i) Wo wer im Schwimmbad hingehört, weiß offensichtlich jede
      ‘where who in-the swimming-bath belongs-to knows obviously everyone’ (source: DIE ZEIT #32, 1988, p.41)
   ii) (Woran wir würgen oder:) Wie wird wer Akademiker?
      how becomes who (an) academic? (source: PRESSE, 1996, edition 18.5.)
   iii) Aber was hat wer der Leiche zuvor aus dem Mund entfernt?
      But what has who the corpse before out the mouth removed (source: http://www.berliner-lesezeichen.de/lesezei/Blz00_06/text18.htm)
   iv) Welcher W-Frage die größte Bedeutung zukommt ist Ermessenssache:
      which wh-question the greatest importance deserves is a matter of opinion:
      Wer hat was getan? Was hat wer getan?
      Who has what done? What has who done? (source: www.bgvnet.de/gwm/webdbs/xc/xc9007.nsf/6ab7553e18b eec34c12569f9006091af/ 715586135da51aac1256c8300527ae9?)
   v) Wie war das in der Raumfähre, was hat wer gesprochen?
      How was it in the space shuttle, what has who said (source: www.timewarp-news.de/Raum/Archiv-Raumf_/Apollo/ hauptteil_apollo.html)
6. The source of this observation, as acknowledged by Chomsky (1981: 27), is Jane Grimshaw, Richard Kayne, and Leland George.
7. In addition to this type of amnesty phenomenon, it is predicted that the patterns in (7) improve if the in-situ wh-subject is able to bind a variable:
   i) It is unclear who thinks that who_i needs to consult his_i doctor.
   ii) It is unclear who thinks that who_i saw whom_i.
8. I am grateful to Gisbert Fanselow for this information.
9. Note that checking for covert features (e.g. an EPP-feature) or checking by obligatorily empty functional heads (e.g. agr-object) either does not exist or does not interfere with the obligatory operator property of subjects in spec-positions. I prefer the first alternative.
10. Kiss (1993) draws the line in terms of specificity. In her analysis, ‘why’ and ‘how’ are inherently non-specific.
11. i) ver hot vi azoy/farvos gezungen (Diesing 2001, ex. 8,9)
       who has how/why sung
12. Here is one of the numerous cases found in the web:
   i) Es ging darum, was Medienleute und Meinungsforscher meinten, wer wie rüberkam.
      It was a question of what media people and pollsters thought, who how came-across (source: http://www.spiegel.de/jahreschronik/0,1518,225790,00.html)
13. The potential relevance of the ‘was-für’ construction was suggested to me by Gereon Müller (p.c.).
14. ‘Was-für split’ is an optional variant of a wh-pied piping construction of a DP of the form ‘was für (ein) NP’ (what-for a NP; = ‘what kind of NP’). Instead of pied-piping the whole DP, the wh-element ‘was’ is fronted. In semantic terms, this element does not quantify over individuals but over sets (namely kinds).
15. Note that the constraint rules out distributive readings (i.e. multiple pair list answers) generated by higher type operators. It does not necessarily rule out a single pair answer (no distributive reading), and it does not apply if ‘how’ is interpreted as ‘in which manner/way’, that is, if how is type-shifted to an individual type (i.e. a set of manners/ways). This may be responsible for variations in informant judgements as reported to me by Gisbert Fanselow: In a survey by Susan Fischer and Joanna Blaszczak in Potsdam (research group FOR 375, project A3), warum>wie (why>how) got starred less often than wie>warum (how>why).
16. Costa (2002, ex. 83a), however, notes that porquê may appear postverbally in a single-wh question: disseste isso porquê? – you said that why?
17. Costa (2002, sect. 3.2) adduces evidence in favor of Rizzi’s base generation proposal from Kuno & Takami (1993), who note that why and how can be followed by a parenthetical expression, but not when and where: “Why/how/*when/*where, man, did you meet this lady?”. Costa suggests that why and how are base-generated as CP-adjuncts. This account, however, is contradicted by (19c): If how were base generated it would not show a negative island effect.
18. As is well known, a quantifier can bind any number of pronominal variables in its domain.
19. Pesetsky (1987): Wh-phrases are quantifiers (p.100). D-linked wh-phrases are not quantifiers (p.108).
20. Note: This is Fiengo’s judgement (1980: 123).
21. For a more detailed discussion see Fanselow (1991: 330), Müller (1995: 323f.), and Haider (2000).
22. This formulation intentionally leaves room for two possibilities: the possibility I hold, that German subjects never leave the VP, except when they are moved to spec C, and the possibility that German subjects do not need to leave the VP-internal subject position for a higher functional spec position, but that they have the possibility.
23. Note that the ‘es’ in this example (i.e. the third one in (32d)) is an expletive, but in the spec C position, and crucially not in the subject position. This shows that German has an expletive, but must not use it as a structural subject expletive. The natural conclusion is: If there is no expletive, there is no structural subject position.
24. The report on the judgements I owe to Gisbert Fanselow: In a questionnaire-based inquiry among 19 native Dutch linguists he got mixed results, given in brackets for each example in (33).
25. Clausal mood or force is: interrogative, declarative, or imperative.
26. I.e. the illocutionary force is that of a question.
27. Bijection would mean that a licensed wh-element could license not more than one wh-element: Licenser and licensee would have to be in a 1-to-1 relation.
Hubert Haider
References

Aoun, Josef
1986 Generalized Binding. Dordrecht: Foris.
Beck, Siegrid
1996 Wh-Constructions and Transparent Logical Form. Ph.D. Diss., Univ. Tübingen. (SfS-Report #4, 1996, Seminar für Sprachwissenschaft, Univ. Tübingen)
Bresnan, Joan
1977 Variables in the theory of transformations. Part I: Bounded versus unbounded transformations. In: Culicover, Peter and Wasow, T. and A. Akmajian eds. Formal Syntax. New York: Academic Press, pp. 157–196.
Chomsky, Noam
1973 Conditions on Transformations. In: Steven Anderson & P. Kiparsky eds. A Festschrift for Morris Halle. New York: Academic Press, pp. 232–286.
1995 The Minimalist Program. Cambridge, Mass.
Costa, Joao
forthc. A multifactorial approach to adverb placement: assumptions, facts, and problems. Ms. Universidade Nova de Lisboa. (To be published in a volume on adverbials by A. Alexiadou).
Diesing, Molly
2001 Multiple questions in and about Yiddish. (Ms. Cornell University; to appear in: Ji-Yung Kim & Adam Werk eds. Proceedings of SULA. GLSA-Publications).
Erteschik-Shir, Nomi
1997 The dynamics of focus structure. Cambridge: Cambridge University Press.
Fanselow, Gisbert
1991 Minimale Syntax. Habilitationsschrift, Univ. Passau. Published as: GAGL #32 (Groninger Arbeiten zur Germanistischen Linguistik). Rijksuniversiteit Groningen.
Fiengo, Robert
1980 Surface Structure. Harvard University Press.
Haider, Hubert
1998 Form Follows Function Fails – as a Direct Explanation for Properties of Grammars. In: Paul Weingartner & Gerhard Schurz & Georg Dorn eds. The Role of Pragmatics in Contemporary Philosophy. Vienna: Hölder-Pichler-Tempsky, pp. 92–108.
2000 Towards a superior account of superiority. In: Lutz, Uli & Gereon Müller & Arnim von Stechow eds. Wh-scope marking. Amsterdam: Benjamins, pp. 231–248.
2000a Adverb Placement – convergence of structure and licensing. Theoretical Linguistics 26: 95–134.
Hornstein, Norbert
1995 Logical Form. From GB to Minimalism. Oxford: Blackwell.
Kayne, Richard
1983 Connectedness. Linguistic Inquiry 14: 223–249.
Kiss, Katalin É.
1993 Wh-movement and specificity. Natural Language and Linguistic Theory 11: 85–120.
Kitahara, Hisatsugu
1993 Deducing ‘Superiority’ effects from the Shortest Chain Requirement. Harvard Working Papers in Linguistics 3: 109–119.
Kuno, Susumu & Kenichi Takami
1993 Grammar and Discourse Principles. Functional Syntax and GB Theory. Chicago & London: The University of Chicago Press.
Müller, Gereon
1995 A-bar Syntax. Berlin: Mouton de Gruyter.
Ottósson, Kjartan G.
1989 VP-specifier subjects and the CP/IP distinction in Icelandic and Mainland Scandinavian. Working Papers in Scandinavian Syntax 44: 89–100.
Paardekooper, P. C.
1963 Beknopte ABN-Syntaxis. 7th edition. Eindhoven: Uitgave in eigen beheer.
Pafel, Jürgen
1996 Die syntaktische und semantische Struktur von was-für-Phrasen. Linguistische Berichte 161: 37–67.
Pesetsky, David
1987 Wh-in-situ: Movement and Unselective Binding. In: E. Reuland & A. ter Meulen eds. The Representation of (In)definiteness. Cambridge, Mass.: MIT Press, pp. 98–129.
2000 Phrasal Movement and its Kin. Cambridge, Mass.: MIT Press.
Reinhart, Tanja
1998 Wh-in-situ in the framework of the Minimalist Program. Natural Language Semantics 6: 29–56.
Rizzi, Luigi
1990 Relativized Minimality. Cambridge, Mass.: MIT Press.
Saito, Mamoru
1994 Additional wh-effects and the adjunction site theory. Journal of East Asian Linguistics 3: 195–240.
Szabolcsi, Anna & Frans Zwarts
1993 Weak islands and an algebraic semantics for scope-taking. Natural Language Semantics 1: 235–285.
Minimal links, remnant movement, and (non-)derivational grammar

John Hale and Géraldine Legendre
1. Introduction

The Minimal Link Condition of Chomsky (1995, 311) exemplifies a particular perspective on the role of optimization in grammar.

(1) Minimal Link Condition (MLC)
    K attracts α only if there is no β, β closer to K than α, such that K attracts β
On this view, optimization plays a role in selecting operations that apply to intermediate stages of grammatical derivations. The role is inherently local in character, suggesting a “derivational” view of grammar as a kind of machine taking linguistic rules as instructions. Chomsky has adopted this view as part of the Minimalist Program (MP, 1999, 12):

    One might construe L as a step-by-step procedure for constructing EXPs, suggesting that this is how things work as a real property of the brain, not temporally but as part of its structural design. Assumptions of this nature constitute a derivational approach to L.
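The local character of (1) can be illustrated with a minimal sketch of a single derivational step, in which an attracting head K picks the closest phrase bearing the relevant feature. The integer encoding of closeness and the names `Candidate` and `attract` are assumptions made for this illustration only; they are not part of Chomsky's formulation.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Candidate:
    label: str                 # hypothetical name of the phrase
    features: FrozenSet[str]   # features the phrase could check
    distance: int              # assumed measure of closeness to K (smaller = closer)


def attract(feature: str, candidates: List[Candidate]) -> Optional[Candidate]:
    """One local step under the MLC (1): K attracts the closest phrase
    bearing `feature`; a more distant bearer of the feature is never chosen."""
    eligible = [c for c in candidates if feature in c.features]
    if not eligible:
        return None
    return min(eligible, key=lambda c: c.distance)


# Toy configuration: a VP bearing [top] that is closer to K than an NP bearing [scr].
vp = Candidate("VP", frozenset({"top"}), 1)
np = Candidate("NP", frozenset({"scr"}), 2)
```

In this toy setting, `attract("scr", [vp, np])` returns the NP even though the VP is closer, because the VP does not bear the attracted feature; the decision uses only the current stage, not the completed structure.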
An alternative view, which has been popular at different times throughout the history of generative grammar, places the explanatory burden on constraints that apply to complete structural descriptions (see especially McCawley (1968), Jacobson (1974), as well as Chomsky (1981), Bresnan (1982), Rizzi (1990), and Brody (1995)). This “representational” perspective does not appeal to intermediate structures, and it has proved difficult to distinguish the two empirically. However, evidence in favor of a derivational approach has been brought to bear by Gereon Müller (1998) in a penetrating “remnant movement” analysis of incomplete category fronting in German. The idea of remnant movement (Thiersch 1985, den Besten and Webelhuth 1987) is to analyze a
sentence such as (2) below as the product of independently available scrambling and topicalization operations.1

(2) [VP t1 Gelesen ]2 hat das Buch1 keiner t2
    read has the book no-one
    “No one read the book”
A remnant movement analysis posits a derivational stage at which das Buch moves outside the underlying verb phrase, leaving a trace behind. At some later point in the derivation, another movement brings the incomplete “remnant” verb phrase [VP t1 gelesen ] to its surface position. Müller’s analysis is executed in the derivational mode characteristic of the MP, and crucially employs Chomsky’s formulation of the MLC stated in (1). Müller remarks that the facts of incomplete category fronting in German strongly support a derivational, but not a representational grammar.2 While his analysis is successful in accounting for the German facts, it nevertheless deserves to be scrutinized in view of the conclusions he draws regarding the architecture of the grammar. In this paper we argue that the distribution of remnant versus non-remnant movement in German is also compatible with a non-derivational account. Given this alternative, it becomes untenable to claim that grammar is derivational on the basis of examples such as (2). Moreover, when juxtaposed against subtly different facts from Japanese, this non-derivational alternative provides enlightening cross-linguistic insight that the original lacks. Müller’s analysis is reviewed in section II. The non-derivational analysis proposed in section III is expressed in the Optimality Theory (OT) formalism (Prince and Smolensky 1993, Legendre, Grimshaw and Vikner 2001). It employs a constraint that superficially resembles the MLC but in fact differs radically by presupposing global rather than local optimization.3 The success of the analysis bolsters the idea that the MLC is properly interpreted on the Harmonic Parallelism view, where optimization is global and constraints refer to completed representations. Even more fundamentally, the status of the MLC as a constraint in Con rather than as part of Gen is supported (contra Broekhuis 2000).
The two analyses of German incomplete category fronting are critically compared along dimensions of simplicity and cross-linguistic perspicuity in section V. Finally, section VI concludes that a derivational approach to grammar may not, in fact, be compelled by considerations of remnant movement.
2. Classical analysis
2.1. The phenomenon

It is well known that German and Dutch allow both topicalization and scrambling. The following data4, from Müller (2000), illustrate these options. The first is VP topicalization, shown in (3) below.

(3) [VP Das Buch gelesen ]1 hat keiner t1
    the book read has no-one
    “No one read the book”
Example (3) depicts an analysis of topicalization as WH-movement (Chomsky 1977) in which the verb phrase Das Buch gelesen has landed in the specifier of a functional projection, perhaps CP or TopP.5 The second independently available option is NP scrambling, shown in (4) below.

(4) daß das Buch1 keiner [VP t1 gelesen ] hat
    that the book no-one read has
    “that no one read the book”
Example (4) is an embedded clause headed by the overt complementizer daß. The NP das Buch has moved in (4) from its base position as a complement of the verb gelesen to a position still to the right of C. Taking for granted that CP (or TopP) specifiers come to the left of their heads, the movement of das Buch is evidently distinct from the one in (3) since (at least) the landing sites differ. Such NP scrambling is standardly analyzed as left-adjunction to IP or VP (Müller 1995, chapter 3). Both phenomena can be found in the same sentence, as (2) – repeated here as (5) – demonstrates.6

(5) [VP t1 Gelesen ]2 hat das Buch1 keiner t2
    read has the book no-one
    “No one read the book”
In a main clause, additional movement following the scrambling seen in example (4) leaves trace t2 and is termed “remnant” because the moved constituent itself contains a trace, t1. This analysis is attractive because the feeding relation between the two transformations provides cross-linguistic insight. Such a double-movement analysis immediately predicts that languages that do not independently possess the first kind of movement can never show incomplete category fronting. This prediction is confirmed by the contrast between the scrambling languages German and Dutch, which do display incomplete category fronting, and all other Germanic languages, which are non-scrambling and do not display incomplete category fronting (den Besten and Webelhuth 1987, 15). In a completed theory, remnant movement might serve as the empirical basis for a scrambling parameter whose setting is independent of the topicalization parameter.

Despite being attractive from a cross-linguistic perspective, remnant movement analyses raise theory-internal problems for a Principles and Parameters (P&P) approach. One problem is that the trace t1 in (5) is unbound. This contradicts the presumably inviolable Proper Binding Condition (Fiengo 1977, 45), which requires that moved elements precede their traces.7 Another problem is the violation of the Frozen Structure Constraint (Ross 1967, 173; Wexler and Culicover 1980, 120), which prohibits constituents from moving after subparts have moved. The topicalized VP in (5) is a counterexample to such a principle. Müller argues that while these considerations are problematic for a representational P&P approach, they are natural consequences of a derivational MP analysis. The next section examines some of the facts that underlie this conclusion.
2.2. The generalization

Remnant movement can occur with a variety of movement type pairs, but not, it appears, when the two types are the same. Example (6) from Müller (1996, 360) shows that scrambling a remnant VP, rather than topicalizing it, results in ungrammaticality (cf. example (5)).

(6) * daß [VP t1 Gelesen ]2 das Buch1 keiner t2 hat
      that read the book no one has
      “that no one has read the book”
As noted earlier, the movement in example (6) must not be topicalization, since the landing site is to the right of the complementizer. The same pattern shows up with other combinations of movement. Example (7) from Müller (1996, 363, 374) contrasts the acceptability of the scrambling/WH-movement combination (7-a) with the unacceptability of another scrambling/scrambling combination (7-b).

(7) a. [NP Welches Buch ti ]j hat [PP über die Liebe ]i niemand tj gelesen?
       which book has about the love no one read
       “Which book about love did no one read?”
    b. *daß niemand [NP ein Buch ti ]j gestern [PP über die Liebe ]i tj gelesen hat
       that no one a book yesterday about the love read has
       “that no one read a book about love yesterday”
The generalization that emerges in Müller (1996) is stated in (8). Equivalent formulations have been independently proposed by several researchers.8

(8) Remnant XPs cannot undergo Y-movement if the antecedent of the unbound trace has also undergone Y-movement.
Müller (1998) undertakes to derive this generalization from MP principles, thereby solving the problems of Proper Binding and Freezing that thwart a GB-type approach.
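Generalization (8) amounts to a simple check on pairs of movement types, which can be sketched as follows. The function name and the string labels for movement types are hypothetical conveniences; the classification of examples (5) to (7) follows the discussion in the text.

```python
def remnant_movement_allowed(remnant_type: str, antecedent_type: str) -> bool:
    """Generalization (8): a remnant XP cannot undergo Y-movement if the
    antecedent of its unbound trace has also undergone Y-movement."""
    return remnant_type != antecedent_type


# Each example is encoded as (movement of the remnant, movement of the antecedent).
examples = {
    "(5)": ("topicalization", "scrambling"),
    "(6)": ("scrambling", "scrambling"),
    "(7-a)": ("wh-movement", "scrambling"),
    "(7-b)": ("scrambling", "scrambling"),
}

judgments = {
    number: remnant_movement_allowed(*types) for number, types in examples.items()
}
```

The resulting `judgments` mark (5) and (7-a) as allowed and (6) and (7-b) as excluded, matching the starred examples above. The sketch only restates the descriptive generalization; it says nothing about why the pattern holds.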
2.3. Derivational account of the basic facts

The essence of the derivational account lies in two principles: Last Resort and the Barriers condition. The Barriers condition primarily addresses Freezing.

(9) Barriers Condition: Movement must not cross a barrier. (Müller 1998, 31)

(10) Barrier: XP is a barrier for α iff there is an X^n (0 ≤ n ≤ P) such that (a), (b) and (c) hold
     a. X^n includes α
     b. X^n is not a complement
     c. X^0 is distinct from Y^0, where XP is the complement of Y^0

The import of this definition is that moved constituents become barriers when they move to a specifier or adjunct position; since all derived positions
are specifiers or adjuncts, this implements the Freezing generalization. The other key principle is Last Resort. Last Resort (Müller 1998, 273) is construed to include Chomsky’s MLC (1) as part ‘b’ of its definition.

(11) Last Resort: α is raised to a position β only if (a) and (b) hold
     (a) β is a typical checking position for the lowest-ranked unchecked morphological feature F of α
     (b) There is no γ with an unchecked F feature that is closer to β

It is the MLC, part b of Last Resort, which is primarily responsible for explaining the distribution of remnant movement in German. It does this by ensuring that a subpart of a constituent is never raised if, later in the derivation, the entire constituent would also raise to check the same kind of feature.9 These two principles interact to yield a derivation of the acceptable example (2) [schematically repeated here as (12)] but no derivation for the unacceptable example (6) [repeated here as (13)].

(12) a. … Top[top] … Y[scr] … [α[top] … NP[scr] … ] …
     b. [VP t1 Gelesen ]2 hat das Buch1 keiner t2
        read has the book no-one
        “No one read the book”

(13) a. … X[scr] … Y[scr] … [α[scr] … NP[scr] … ] …
     b. * daß [VP t1 Gelesen ]2 das Buch1 keiner t2 hat
          that read the book no one has
          “that no one has read the book”

In both cases, the inviolable MLC requires that α move to a specifier of the head with the closest checkable feature (Y). Example (12-b) is grammatical because it has a derivation that does not violate any principles. In particular, the lowest NP moves to Y, checking the scrambling feature [scr], then the remnant α moves to Top, checking [top]. The unacceptable scrambling/scrambling derivation would have to proceed from a derivational stage containing two of the same features. Consider some of the most likely scenarios. Part b of Last Resort favors movement of α to the specifier of Y. But such a movement cannot derive (13-b) because, by failing to evacuate [NP das Buch], α is precluded from becoming a remnant.
Even if α does move to the specifier of Y, under the Barriers Condition, α then freezes and becomes a barrier to further movement, preventing [NP das Buch] from escaping. All possible (feature-driven) movement options result in a violation, and so the ungrammaticality of (13-b) is obtained. All in all, (13-b) is ungrammatical because it has no legitimate derivations.
2.4. Character of the explanation

Why do Last Resort and the Barriers Condition suffice to explain the generalization in (8)? A closer look at the formal principles of this analysis reveals that the issue of derivationality is surprisingly absent. Rather, the explanation derives entirely from the link-minimization property expressed locally in the MLC. Consider an input with two of the same features such as (14) below.

(14) scr1 [M scr2 [N [α[scr] … β[scr]]

In underlying structure (14) there are two scrambling features which need to be checked, separated from each other by a number of maximal projections M, and from the largest potential checker by N maximal projections. Given (14) as input, the preference for non-remnant output is derived solely on the basis of link-minimization, where links are measured in terms of the number of maximal projections spanned. From an input like (14), the cost to reach the remnant movement configuration is N+1 for β (perhaps a verbal argument) and N+M for α (the verb phrase), for a total of 2N+M+1. To reach a non-remnant configuration in which α checks scr2 and β checks scr1 costs M+1 for β and N for α, totaling N+M+1. If N>0, which is to say, if scrambling is analyzed as adjunction to VP, then given (14) as input, the non-remnant surface structure necessarily has shorter links. Should the input contain two different features, as in (15),

(15) top [M scr [N [α[top] … β[scr]]

in which only α can check the topic feature [top], then β must move on its own. A remnant movement structure is then optimal. From the vantage point of Last Resort, the exception posed by German remnant movement to Proper Binding and Freezing is explained chiefly by part b – the MLC. The explanation is one of feature faithfulness: the nearest phrase (favored by the MLC) cannot check the relevant feature, so the grammar settles for
feature checking by a phrase marker whose chains are not minimal. The MLC penalizes chains of greater lengths, some of which nonetheless do figure in grammatical sentences. Couching this style of explanation in OT terms, the MLC functions as a gradient markedness constraint on chain lengths which is surface-violated in the case of remnant movement. Evidently, the derivational character of the explanation subsists solely in the presentation of Last Resort.
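The link-cost comparison for an input like (14) can be checked with a few lines of arithmetic. The sketch below simply encodes the counts given in this section (N+1 and N+M for the remnant configuration, M+1 and N for the non-remnant one); the function names are hypothetical.

```python
def remnant_cost(n: int, m: int) -> int:
    """Total link length for the remnant configuration from input (14):
    the sub-constituent crosses N+1 projections, the containing phrase N+M."""
    return (n + 1) + (n + m)


def nonremnant_cost(n: int, m: int) -> int:
    """Total link length for the non-remnant configuration from input (14):
    the sub-constituent crosses M+1 projections, the containing phrase N."""
    return (m + 1) + n


def cheaper_configuration(n: int, m: int) -> str:
    """Name the configuration with the shorter links, or report a tie."""
    r, s = remnant_cost(n, m), nonremnant_cost(n, m)
    if r == s:
        return "tie"
    return "remnant" if r < s else "non-remnant"
```

For any values of N and M, the two totals differ by exactly N, so the non-remnant structure is cheaper whenever N>0 and the two tie when N=0, as stated above.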
3. Non-derivational proposal
The previous section reviewed an MLC-based account of the fact that incomplete category fronting is possible when movement types are different, but apparently impossible when they are the same. The essential insight of this account was found to be neutral between derivational and representational grammars. So, in principle there is no reason why a notational variant could not be constructed to cover the same data. However, if the essential insight is correct, it should extend to other languages. The next section considers some further data gathered with a view toward such extension. Then a unified, non-derivational analysis is provided.
3.1. Typological considerations

A remnant is a moved phrase which itself contains a trace of movement. But what is a non-remnant? This question is really a typological question about how languages that exhibit remnant movement differ from other languages. To facilitate this sort of comparison, it will be necessary to fix some dimensions of typological variation. For purposes of cross-linguistic comparison, let the number of movements under consideration be restricted to two. Then the general form of a remnant movement structure is given in (16).

(16) … [α … ti … ]j … [M … βi … [N … tj …

In words, a remnant movement structure (16) is a phrase structure that has some moved phrase αj separated from a moved sub-constituent βi. The launching and landing sites are separated by some number of projections indicated by M and N. What is the corresponding non-remnant structure that is most naturally comparable to (16)? An attractive counterpart is (17).
(17) … βi … [M … [α … ti … ]j … [N … tj …

Structure (17) differs from (16) only in that it eliminates the property of α being a remnant. These structures have been named “diving” (16) and “surfing” (17) by Pesetsky (1982). The terminological shift is necessary to make the concept of “non-remnant” movement less vague. More fundamentally, these two structures provide a kind of ruler for examining cross-linguistic variation. Applying these rulers, one can immediately observe (2) that German exhibits diving movement when given input containing different features, such as scrambling and topicalization. Equally immediate is the observation (6) that German does not permit diving movement when two of the same features are present. Do any languages generate surfing configurations from these inputs? Japanese does. In Japanese, surfing is attested in connection with (long-distance) scrambling (Saito 1985, Saito 1992, Saito and Fukui 1998). A simpler example (18) from Sauerland (1996, 224) makes the same point.

(18) Booru-oj matigatte [CP Susi-ni tj watasoo to ]i Kazuko-ga ti kokoromita
     ballACC falsely Susi-to to give that KazukoNOM tried
     “Kazuko tried to give a ball to Susi by mistake”

Namely, when given two scrambling features, the Japanese result is not ungrammaticality, but instead, the surfing movement configuration. Now consider contrasting movement types in Japanese. Here, diving or remnant movement is indeed possible. Example (19) from Tsujioka (2000) citing Kurafiji (1995) illustrates this for the scrambling/WH movement combination.10

(19) [CP [SC t1 donna-ni kirei-ni ]2 [TP biyoosi ga [DP Mary o ]1 t2 sita ] no]
     how much beautiful beautician NOM Mary ACC did Q
     “How beautiful did the beautician make Mary?”

These findings are summarized in Table 1 below.

Table 1. Patterns of diving and surfing

            different   same
German      diving      *
Japanese    diving      surfing
3.2. An OT approach to remnant movement in German and Japanese
This subsection reanalyzes the generalization (8) “remnant XPs cannot undergo Y-movement if the antecedent of the unbound trace has also undergone Y-movement” (Müller 1998, 240) from the perspective of Harmonic Parallelism.11 After motivating the constraints, the inputs, and the candidate set, the bulk of the observations in Table 1 are straightforwardly explained as a Strict Domination interaction between a preference for short chains and a preference for fully-checked features. Adding in the notion that feature checking itself is violable explains the full range of phenomena in Table 1 and predicts new languages.
3.2.1. Constraints

Section II identified the crucial role of the MLC in Müller’s (1998) grammar of German remnant movement. A similar, but not identical function is served by the OT MINLINK constraint family (Legendre et al. 1995, 1998).

(20) BAR^k: A chain link must not cross k barriers. [where a barrier is a maximal projection that is not L-marked] (Chomsky 1986)

(21) MINLINK (universal): BAR^n >> … >> BAR^2 >> BAR^1 [Local conjunction power hierarchy of BAR^k]

BAR functions only as a building block for MINLINK, which registers one violation for each link of each chain in a candidate. For example, BAR^2 is the Local Conjunction (Smolensky 1997) of BAR^1 with itself, indicated by the symbol &1. This conjunction can be recursively applied to derive a markedness constraint against chain links of any length: MINLINK. MINLINK is motivated by data from Chinese, Bulgarian and English. For example, it derives the suboptimality of English super-raising.
Tableau 1. English super-raising (Legendre et al. 1998)
Input: LF = seems(likely(win(John)))

     Candidate                                              BAR^2 (= BAR^1 &1 BAR^1)   BAR^1
a. ☞ It seems [ that [ Johni is [likely [ ti to [win                                   *
b.   Johni seems [ that [ it is [likely [ ti to [win        *!
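The comparison in Tableau 1 can be sketched as a lexicographic comparison of violation profiles over the hierarchy in (21). This is only an illustration under stated assumptions: each chain link is reduced to the number of barriers it crosses, a link crossing k barriers is recorded as one violation of BAR^k (as the tableau displays it), and the function name `minlink_profile` is hypothetical.

```python
from typing import Dict, List, Tuple


def minlink_profile(link_barriers: List[int], n: int) -> Tuple[int, ...]:
    """Violation counts ordered BAR^n, ..., BAR^2, BAR^1.
    Assumption: a link crossing k barriers counts as one violation of BAR^k."""
    counts = [0] * n
    for k in link_barriers:
        if not 0 <= k <= n:
            raise ValueError("link length outside the hierarchy considered")
        if k > 0:
            counts[n - k] += 1
    return tuple(counts)


# Tableau 1: candidate a has a link crossing one barrier, candidate b one crossing two.
candidates: Dict[str, List[int]] = {"a": [1], "b": [2]}
profiles = {name: minlink_profile(links, n=2) for name, links in candidates.items()}

# Higher-ranked constraints come first in the tuple, so tuple comparison
# mirrors the universal ranking BAR^2 >> BAR^1.
winner = min(profiles, key=profiles.get)
```

Because the profile lists higher-ranked constraints first, ordinary tuple comparison selects the candidate whose worst link is shortest.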
Tableau 1 displays the reason why super-raising (candidate b) is ungrammatical in English: shorter chains are preferred to longer ones. Candidate b incurs a violation of BAR^2 compared with a violation of BAR^1 for candidate a. Therefore a is optimal. That MINLINK must be stated as a constraint family – more precisely as a universal hierarchy of individual BAR constraints – rather than a single constraint like the MLC is motivated in Legendre et al. 1998 by the necessity of intercalating faithfulness constraints with the BAR hierarchy. See the referenced article for details.

While MINLINK has been applied in contexts where it subsumes much of the role of the Condition on Extraction Domain (CED) by penalizing intermediate traces, the two are not equivalent. In particular, MINLINK is crafted with violability in mind whereas the CED was not. However, it appears that the CED, in its inviolable version, is not quite on the right track, as freezing evidence from German suggests. This is evident in counterexamples to the Barriers condition defined earlier. This CED-like condition makes explicit the conditions under which the “properly governed” clause of Huang’s original CED actually obtains. If the CED or Barriers condition were inviolably surface-true, then (22) ought to be ill-formed (de Kuthy and Meurers 1999, 5).

(22) Worüber kann [einen Südkurier-Artikel ] selbst Peter nicht am Strand verfassen?
     about what can a Südkurier-Artikel even Peter not at the beach write
     “For which topic is it the case that even Peter cannot write an article about it for the Südkurier when he is at the beach?”

Here, assuming the NP einen Südkurier-Artikel has scrambled from object position, the CED or Barriers condition ought to rule out any further extraction at the derived position. But in (22), it appears that the WH-element worüber has been moved in just this way.
Another case is where again the object precedes the subject, suggesting scrambling, but additional movement of a different kind is still possible (Fanselow 2000, 14).
(23) Wasi hätte denn [DP ti für Aufsätze ] selbst Hubert nicht rezensieren wollen
     what had particle for papers even Hubert not review wanted
     “What kind of paper would even Hubert not have wanted to review?”

The implication of these counterexamples is that some kind of violability is at work. In order to preserve the substantive CED insight, let the universal barrier constraints be of two types, BAR_COMPL and BAR_NCOMPL, sensitive to extraction from complement and non-complement positions respectively. By Local Conjunction, MINLINK_COMPL and MINLINK_NCOMPL are immediately obtained (cf. (10) part b). Another important constraint, pending similar refinement, is FEATURE FAITHFULNESS.

(24) Feature Faithfulness (FF): Features must not be deleted.

FEATURE FAITHFULNESS forms the basis of any neutralization account in which unchecked features are simply ignored by the grammar because it would be too expensive to check them. Such a constraint loosely follows Chomsky (1995) in assuming that checker -F and check-ee +F features annihilate one another in a privileged structural configuration such as spec-head agreement, whereby they become invisible to FF.
3.2.2. Inputs

The input is “the substructure two candidates must share in order that they be competitors” (Legendre et al. 1998, 5). In many formulations this substructure is a semantic form; however, to compare German and Japanese along the dimensions mentioned earlier, the input need only be expressive enough to represent

1. items with checking power
2. items that need to be checked
3. base positions of these items
In a more complete grammar, such base orderings might be weakly constrained to match canonical information structures, perhaps via “imperfect correspondence” (Bresnan 2000, section 4). The grammar’s function is to define the inventory of acceptable chains, so it is natural to assume the input also contains “underived” phrase structure, which might affect the chain-formation decision. Characteristic of Harmonic Parallelism, outputs are full structural descriptions of surface forms.
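One way to picture such an input is as a small record of checkers, check-ees and base positions. The sketch below is an assumption-laden illustration: the class names, the integer encoding of base positions and the example values are all hypothetical and are not taken from the analysis itself.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Checker:
    feature: str    # feature the head can check, e.g. "top" or "scr"
    position: int   # hypothetical base position, counted from the top of the clause


@dataclass(frozen=True)
class Checkee:
    label: str      # the phrase that needs checking
    feature: str    # feature the phrase needs to have checked
    position: int   # hypothetical base position


@dataclass(frozen=True)
class Input:
    checkers: Tuple[Checker, ...]
    checkees: Tuple[Checkee, ...]

    def same_features(self) -> bool:
        """True if all checking heads bear the same feature type."""
        return len({c.feature for c in self.checkers}) == 1


# Inputs with two different features and with two identical features.
different = Input(
    checkers=(Checker("top", 0), Checker("scr", 1)),
    checkees=(Checkee("VP", "top", 2), Checkee("das Buch", "scr", 3)),
)
same = Input(
    checkers=(Checker("scr", 0), Checker("scr", 1)),
    checkees=(Checkee("CP", "scr", 2), Checkee("Booru-o", "scr", 3)),
)
```

The `same_features` helper corresponds to the “different” versus “same” columns of Table 1; nothing else about the input needs to be represented for that comparison.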
3.2.3. Candidate sets

The grammar specifies Harmony maxima that represent the best-formed chains that are as compatible as possible with the input. For this reason, the relevant aspects of Gen are the ways it defines chains. Since the present analysis adopts MINLINK, candidate sets can be restricted to a maximum chain-length n where n is the number of syntactic features in the input. In the presence of MINLINK, chains longer than n will always be suboptimal: these structures are “harmonically-bounded” by candidates whose chains are shorter.
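Harmonic bounding can be stated independently of any ranking: one candidate bounds another if it is no worse on every constraint and strictly better on at least one. The following sketch uses hypothetical violation counts purely to illustrate the relation; the function name is an assumption.

```python
from typing import Dict


def harmonically_bounds(a: Dict[str, int], b: Dict[str, int]) -> bool:
    """True if candidate `a` harmonically bounds candidate `b`: `a` has no more
    violations than `b` on any constraint and fewer on at least one, so `b`
    cannot be optimal under any ranking of these constraints."""
    constraints = set(a) | set(b)
    no_worse = all(a.get(c, 0) <= b.get(c, 0) for c in constraints)
    strictly_better = any(a.get(c, 0) < b.get(c, 0) for c in constraints)
    return no_worse and strictly_better


# Hypothetical profiles: same feature checking, but longer chains in `long_chains`.
short_chains = {"FF": 0, "MINLINK": 3}
long_chains = {"FF": 0, "MINLINK": 5}
# A candidate that trades a feature violation for shorter chains is not bounded.
neutralized = {"FF": 1, "MINLINK": 2}
```

Here `short_chains` bounds `long_chains`, while neither `short_chains` nor `neutralized` bounds the other, so the choice between them depends on the ranking.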
3.2.4. Tableaux for basic German and Japanese patterns

The analysis is simply that, in German, faithfulness to input features overrides the preference for shorter chains, licensing remnant movement – the diving configuration of (16), which is the structural description for example (2). This argument, played out in Tableau 2, preserves the insight of Müller (1998, chapter 5.7).12 Since two different chains are present and neither of them originates in non-complement position, MINLINK_COMPL and MINLINK_NCOMPL cannot be distinguished – an instance of encapsulated constraints (Prince and Smolensky 1993, chapter 8; Legendre et al. 1998).

Tableau 2. Two different features – German
Input: Top [M Scr [N [VP[+Top] … das Buch[+scr] … ]

     Candidate                                       FF     MINLINK
a. ☞ [VP ti gelesen ]k hat das Buchi keiner tk              i=N+1, k=N+M
b.   [VP das Buch gelesen ]k hat Scr keiner tk       *!     k=N+M
c.   Top Scr hat keiner [VP das Buch gelesen ]       **!
d.   das Buchi hat [VP ti gelesen ]k keiner tk       **!    k=N+1, i=N+M
Given an input that specifies topicalization and scrambling, in that order, the optimal candidate (a) has two chains. One chain, headed by das Buch and having ti as its tail, crosses N+1 maximal projections: one for the verb phrase itself and N as determined by an underlying configurational theory of German phrase structure. The other chain, subscripted k, crosses N+M maximal projections. Eval deals with MINLINK just as any other markedness constraint, by adding up all violations to produce a single integer number of stars. For notational convenience we will continue to write pseudo-equations of the form “chain-index = sum of maximal projection counts” to indicate where the violations are coming from. Unchecked features, sensitive to FF, are in bold. The last candidate (d) in Tableau 2 suffers two FF violations because the wrong constituents have checked incompatible features. In Tableau 2 any alternative to remnant movement requires some form of feature neutralization, violating FF. Although topicalizing the verb phrase das Buch gelesen results in a less-marked structure (b) with only one chain rather than two, Strict Domination of MINLINK by FF rules it out. The same constraint conflict derives the preference for surfing over diving in Japanese. This is depicted in Tableau 3 (examples from Sauerland 1996).

Tableau 3. Two of the same features – Japanese
Input: Scr [M Scr [N [CP[+scr] … Booru-o[+scr] … ]

     Candidate                                                                    FF     MINLINK
a. ☞ Booru-oj matigatte [CP Susi-ni tj watasoo to ]i Kazuko-ga ti kokoromita             i=N, j=M+1
b.   [CP Susi-ni tj watasoo to ]i matigatte Booru-oj Kazuko-ga ti kokoromita             i=N+M, j=N+1 !
c.   Scr Scr matigatte Kazuko-ga [CP Susi-ni Booru-o watasoo to ]                 **!
Tableau 3 applies the constraint motivated by German to the case in which two input features are presented to the Japanese grammar. In such a case, both surfing (a) and diving structures (b) succeed at canceling all input features, so MINLINK becomes active. As in German, diving movement is always more expensive than surfing movement. Tableau 4 illustrates a third application, again to Japanese, in the case where an input specifies two different features. As in German, Feature Faithfulness is the decisive factor when input features differ.
Tableau 4. Two different features – Japanese
Input: WH [M=1 Scr [N=0 [SC[+wh] … Mary-o[+scr] … ]

     Candidate                                                                          FF     MINLINK
a. ☞ [CP [SC ti donna-ni kirei-ni ]j [TP biyoosi ga [DP Mary-o ]i tj sita ] no]                i=1, j=1
b.   [CP [SC [DP Mary-o ] donna-ni kirei-ni ]j [TP biyoosi ga Scr tj sita ] no]         *!     j=1
c.   WH [TP biyoosi ga Scr [SC [DP Mary-o ] kirei-ni ] sita ]                           **!
d.   [CP [DP Mary-o ]i [TP biyoosi ga [SC ti donna-ni kirei-ni ]j tj sita ] no]         **!    i=2, j=0
Across three cells of Table 1, repeated below, a broadly painted picture of markedness-faithfulness conflict between FF and MINLINK suffices to characterize both the German and Japanese situation. The intuition of Müller, Kitagawa, Takano, Fukui and others has found its place in a Harmonic Parallelism analysis13.

Table 1. (repeated)
            different    same
German      diving       *
Japanese    diving       surfing
The remaining German cell, however, motivates a re-ranking account of the difference between German and Japanese.
3.2.5. The difference between German and Japanese
One well-known difference lies in the fact that German scrambling, unlike Japanese scrambling, cannot cross a finite clause boundary (Grewendorf 1993, 1308). But this is exactly the property which would allow one to observe either surfing or diving configurations in which the “first” movement is VP scrambling. It is therefore not obvious what sort of structure corresponds to the Japanese input in Tableau 3. The appropriate comparison would be to cases in which same-feature-driven movement is ‘almost’ observable. Indeed there are said to be some cases where predicative scrambling is marginally acceptable in German (Müller 1996, 362).
(25) a. *daß [VP dem Peter tj gegeben ]k die Claudia einen Kußj tk hat
        that ART PeterDAT given ART Claudia a kiss has
        “that Claudia gave Peter a kiss”
     b. ??daß [VP dem Peter einen Kuß gegeben ]k die Claudia gestern tk hat
        that ART PeterDAT a kiss given ART Claudia yesterday has
        “that yesterday Claudia gave Peter a kiss”
What happens when the German grammar is presented with an input containing two features of the same type? Example (25-b) suggests that diving is not the result14. But some sort of neutralization is occurring. One possibility is that some scrambling features in German do not delete after checking. Sauerland (1996, 1998) observes that Chomsky’s (1995) distinction between interpretable and non-interpretable features is borne out in the difference between Japanese, where surfing is possible (Tableau 3), and German, where it is not (26).
(26) a. *?weil [den Ball]1 vergeblich [der Susi t1 zu geben] die Kazuko t2 versucht hat
        since the ball unsuccessfully the Susi to give the Kazuko tried has
     b. *?weil gestern [von Chomsky]1 in Frankfurt [das neue Buch t1]2 niemand t2 gekauft hat
        since yesterday of Chomsky in Frankfurt the new book nobody bought has
Adopting this key idea offers the possibility that it is the correspondence between check-er and check-ee that is neutralized in this cell of Table 1. In fact, Sauerland’s proposal can be stated without appeal to feature strength in OT terms, by splitting FF into a pair of violable constraints conspiring to promote one-to-one feature checking.
(27) *MANYTOONE: Violated if the same +F feature checks multiple –F features.
(28) *ZEROTOONE: Violated if a –F feature goes unchecked.
On a parallel correspondence theory of feature checking, *MANYTOONE is familiar from phonology as the two constraints UNIFORMITY and INTEGRITY
in which the correspondents are now syntactic features (McCarthy and Prince 1995, appendix A). In OT terms, Sauerland’s finding that +F features are re-checkable in German but not in Japanese amounts to the assertion that *MANYTOONE plays a greater role in Japanese than in German. Trading places with all of MINLINK, however, will not work. Rather, what is called for is the sensitivity of particular members of the MINLINK family, which can discern between complement extraction and noncomplement extraction. A descriptively adequate ranking for German is
(29) German: *ZEROTOONE >> MINLINKNCMPL >> MINLINKCOMPL >> *MANYTOONE
Tableau 5 suggests that the winning candidate when the German grammar is presented with an input containing two of the same features is a compromise like (25-b) where a single VP checks two scrambling features, avoiding both surfing and diving.

Tableau 5. Two of the same features – German
Input: Scr [M Scr [N [VP[+scr] … einen Kuß[+scr] … ]
Columns: *Z-1 | MLNCMPL | MLCMPL | *M-1
a. ☞ daß [VP dem Peter einen Kuß gegeben ]i die Claudia tvi gestern ti hat
   *Z-1: (none) | MLNCMPL: (none) | MLCMPL: i=N+M | *M-1: *
b. daß einen Kußj [VP dem Peter tj gegeben ]i die Claudia ti hat
   *Z-1: (none) | MLNCMPL: j=M+1! | MLCMPL: i=N | *M-1: (none)
c. daß [VP dem Peter tj gegeben ]i einen Kußj die Claudia ti hat
   *Z-1: (none) | MLNCMPL: (none) | MLCMPL: i=N+1, j=N+M! | *M-1: (none)
d. Scr die Claudia Scr gestern [VP dem Peter einen Kuß gegeben ] hat
   *Z-1: *!
The winning candidate in Tableau 5 is the partially acceptable predicative scrambling item (25-b). In this item, the VP[+scr] dem Peter einen Kuß gegeben checks both scrambling features. This is notated with an intermediate trace tvi suggesting the place where the first instance of checking took place. The overall idea is that German cannot permit multiple scrambling of constituents and their subparts, and would rather bend the “rules” of feature checking to avoid it. This idea is implemented by *Z-1’s outranking *M-1: while it is still important to enforce the ‘interface’ condition that all –F features
must be eliminated, it is less important that this be done in a completely uniform way. The explanation of the clause-boundedness of German scrambling is that it is always cheaper to violate *M-1 using a single VP-chain than it is to break a subpart out of VP. For Japanese, a descriptively adequate ranking is
(30) Japanese: *ZEROTOONE >> *MANYTOONE >> MINLINKCOMPL >> MINLINKNCMPL
Tableau 6 verifies the compatibility of the previously-derived Japanese data with this ranking.

Tableau 6. Two of the same features – Japanese
Input: Scr [M Scr [N [CP[+scr] … Booru-o[+scr] … ]
Columns: *Z-1 | *M-1 | MLCMPL | MLNCMPL
a. [CP Susi-ni Booru-o watasoo to ]i matigatte tvi Kazuko-ga ti kokoromita
   *Z-1: (none) | *M-1: *! | MLCMPL: i=N+M | MLNCMPL: (none)
b. ☞ Booru-oj matigatte [CP Susi-ni tj watasoo to ]i Kazuko-ga ti kokoromita
   *Z-1: (none) | *M-1: (none) | MLCMPL: i=N | MLNCMPL: j=M+1
c. [CP Susi-ni tj watasoo to ]i matigatte Booru-oj Kazuko-ga ti kokoromita
   *Z-1: (none) | *M-1: (none) | MLCMPL: i=N+1, j=N+M! | MLNCMPL: (none)
d. Scr Scr matigatte Kazuko-ga [CP Susi-ni Booru-o watasoo to ]
   *Z-1: *!
These rankings also work for both languages when inputs contain different features. The reason is that such competitions are always and essentially a conflict between *ZEROTOONE and MINLINKCOMPL whose relative ranking is the same in both German and Japanese. It is in this respect that the new rankings are instantiations of the broad finding that FF >> MINLINK.
3.2.6. Typological predictions
Assuming that *ZEROTOONE has a fixed, high ranking (perhaps in virtue of being an interface condition), the factorial typology is reduced in size from 24 languages to 6, two of which (German and Japanese) have been analyzed in this section. This leaves four other predicted languages.
1. MINLINKCOMPL >> MINLINKNCMPL >> *MANYTOONE
2. MINLINKCOMPL >> *MANYTOONE >> MINLINKNCMPL
3. MINLINKNCMPL >> *MANYTOONE >> MINLINKCOMPL
4. *MANYTOONE >> MINLINKNCMPL >> MINLINKCOMPL
The input containing two identical features discriminates these predicted languages. Languages 1 and 2 pattern with German, opting for many-to-one feature checking. Languages 3 and 4 predict a diving movement path when presented with this input, a prediction which diverges from the Minimalist approach of section II. The larger implication is that the generalization (Takano 2000)
In a derivation yielding the configuration …[A…ti…]j …Bi…tj… movement of A and movement of B may not be of the same type.
may in fact be language-particular.
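The size of the factorial typology can be checked by enumeration. The sketch below only counts and lists rankings; it does not evaluate any candidates, and the hyphenated constraint names are spellings chosen here for readability.

```python
from itertools import permutations

# Constraint names as plain strings (spelling is an illustrative choice).
CONSTRAINTS = ["*ZEROTOONE", "*MANYTOONE", "MINLINK-COMPL", "MINLINK-NCMPL"]

# Full factorial typology: every total ranking of the four constraints.
all_rankings = list(permutations(CONSTRAINTS))

# Restricted typology: *ZEROTOONE fixed at the top of the hierarchy.
fixed_top = [r for r in all_rankings if r[0] == "*ZEROTOONE"]

# The two rankings analyzed in the text, (29) and (30).
ANALYZED = {
    ("*ZEROTOONE", "MINLINK-NCMPL", "MINLINK-COMPL", "*MANYTOONE"): "German (29)",
    ("*ZEROTOONE", "*MANYTOONE", "MINLINK-COMPL", "MINLINK-NCMPL"): "Japanese (30)",
}
remaining = [r for r in fixed_top if r not in ANALYZED]

print(len(all_rankings), len(fixed_top), len(remaining))
for ranking in fixed_top:
    print(" >> ".join(ranking), "|", ANALYZED.get(ranking, "predicted, not analyzed"))
```

The four rankings left over after removing German and Japanese are the four predicted languages listed above, each with *ZEROTOONE prefixed.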
4. Empirical basis
The analysis in section III is predicated on the generalization (8), repeated here as (31), which has been widely assumed in the literature.
(31) Remnant XPs cannot undergo Y-movement if the antecedent of the unbound trace has also undergone Y-movement.
However, survey data collected by Gisbert Fanselow indicates that the situation is somewhat more complex. Fanselow finds that, indeed, predicative VP scrambling is possible under some conditions, as illustrated in (32) and (33) below.
(32) dass man ja [VP sein Essen sich zusammenstehlen]i nur
     that one one’s food REFL steal-together only
     [PP in höchster] ti Not darf
     in greatest need is-allowed
(33) dass [VP die Maria geküsst]i dann DOCH keiner ti hatte
     that the Maria kissed then PTC no-one had
What’s more, with the correct intonation, items such as (34) are acceptable.15
(34) dass [VP tj geküsst] [NP die Marie]j dann DOCH keiner ti hatte
     that kissed the Marie then PTC nobody had
Sentence (34) involves a scrambled VP containing a scrambling trace. It is a counterexample to generalization (8) where Y is scrambling. A precise characterization of the factors that license exceptions to generalization (8) (such as intonation, the particle doch et cetera) remains an important challenge to any descriptively adequate account of German word order. On the view expounded in section III, though, the situation is quite clear: sometimes German is Japanese, as regards recursive scrambling. In OT terms, sometimes the *MANYTOONE constraint can outrank (both instantiations of) MINLINK. If this does indeed happen in the idiolect of a single speaker, it can be viewed as a case of floating constraints (Reynolds 1994, Antilla 1996, Legendre et al. 2002).

5. Comparison
Any analysis of the difference between German and Japanese will have to assign some formal element that differs across those two languages, be it a lexical element, additional derivational complexity or a parameter, more or less abstract, that is set differently across languages. The proposal of section III is that only the relative priorities of universally-given principles are different across German and Japanese. At present it is unclear what mechanism for cross-linguistic diversity will replace parameter setting in the Minimalist program16. By contrast, the principles required in the new proposal are well-established and very basic: a notion of barrier sensitive to the property of being a complement, and a pair of constraints that tend towards the notion of one-to-one feature checking that has always been assumed in discussions of feature-driven movement – but are crucially violable. Occasionally, in discussions of OT syntax, the issue of computational complexity is brought up17.
Indeed, this general issue is as central for theoretical linguistics as it is for natural language processing (see, e.g. Rounds 1991). However, mention of it is often prompted by a concern that because the set of candidates specified by Gen is infinite, the Harmonic Parallelism view of OT is unworkable – or at least, unimplementable in the brain or some other variety of computer.
Given that generative grammar has always dealt with infinite sets (through recursion) this is a curious criticism. We feel it is out of place in the context of a competence grammar18 whose role is taken to be the specification of the set of grammatical expressions in an explicit way, analogous to the notion ‘generate’ in algebra. To the extent that the competence/performance distinction allows work on the specification of the human language faculty to proceed independently from work on implemented models of that faculty (e.g. Hale and Smolensky 2001) it is our own feeling that theoretical linguistics can constructively engage other cognitive sciences while still remaining insulated from the algorithmic exigencies of search space, temporal sequencing and memory.
6. Conclusion
The main results of this study, however, are purely grammatical. An investigation of the role of the MLC in accounting for German incomplete category fronting (Müller 1998) challenges a prominent argument for derivational grammar. In fact, the key insight of a remnant analysis turns out to be one whose expression in a non-derivational OT framework is quite natural: FEATURE FAITHFULNESS dominates MINLINK. A cross-linguistic comparison of German with Japanese confirms the violable nature of feature checking. This leads to an account of remnant movement in German which – rather than being hindered by violability – is made possible by it. To the extent that the proposed Harmonic Parallelism analysis is successful, it vitiates one of the few empirical arguments for derivational grammar, and supports the view of the MLC as a global, rather than local optimization principle.
Acknowledgements
The authors wish to thank Ralf Vogel, Arthur Stepanov, Fabian Heck, Yuji Takano, Colin Wilson, Paul Hagstrom, Luigi Burzio and Paul Smolensky for discussion that has been valuable in the preparation of this paper. We also thank both Florian Wolf and Julia Hockenmaier for German acceptability judgments and valuable discussion. Special thanks to Gisbert Fanselow for his survey results, and to our anonymous reviewers.
Notes
1. Non-remnant movement analyses have also been proposed which appeal to reconstruction (e.g. Grewendorf and Sternefeld 1990) or deny the existence of scrambling on which remnant movement rests (Fanselow 2000). We abstract away from these alternatives because the primary issue of interest here is to compare current derivational and representational approaches and not to compare the merits of all possible accounts of incomplete category fronting in German.
2. “On a more general note, it has often been observed that representational analyses of syntactic phenomena can usually be rephrased in derivational terms (and vice versa) without too much difficulty, so most syntactic phenomena do not force a decision as to what the overall organization of a grammar (representational or derivational) should look like. Therefore, as soon as we find a phenomenon that strongly suggests preferring one approach over the other, this result may have important repercussions that go beyond the question of an empirically adequate account of the phenomenon itself. Accordingly, I would like to contend that, on the conceptual side, the analysis developed here should be viewed as an argument for a type of grammar in which all constraints are either derivational or transderivational (and constraints that can be viewed as filters on surface representations do not exist at all).” (Müller, 1998, 323)
3. Müller and his collaborators have recently proposed an OT analysis of remnant movement but it crucially relies on a MP-style derivational approach. The conclusions we reach carry over to those analyses.
4. Müller’s data is re-assessed in section IV.
5. We follow Vikner (2001, chapter 3) in assuming the auxiliary hat is in C.
6. Examples of remnant movement analyses in other languages, such as English and Hungarian, can be found in Kayne (1994) and Koopman and Szabolcsi (1998).
7. We follow the original formulation of Proper Binding in terms of precedence (rather than c-command) for expository convenience.
8. Takano (2000,144) remarks that “Fukui 97, Kitahara 94,97, Müller 96 and Takano 94 have independently proposed to account for the differences between licit and illicit remnant movement by paying attention to the types of movement that affect remnant movement. The basic intuition that we attempt to capture is this: In a derivation yielding the configuration …[A…ti…]j …Bi…tj… movement of A and movement of B may not be of the same type.”
9. Müller (1998, 276) puts it this way regarding the configuration: ...`...[a..._...]... “The crucial observation now is that in this context a is invariably closer to ` (in terms of length of path) than _ is, the simple reason for this being that _ is dominated by a.”
10. The appropriateness of separate Scrambling, Topicalization and WH-movement rules for Japanese is a matter of some controversy, and distinguishing them is an active area of research. We are sympathetic with the position of Takahashi (1993, 664) that at least some kinds of long-distance scrambling qualify as WH-movement since they exhibit superiority effects
(i) John-ga dare-ni [ Mary-ga nani-o tabeta to] itta no?
    NOM who DAT what-ACC ate COMP said Q
    Who did John tell that Mary ate what?
(ii) ?? Nani-o John-ga dare-ni [Mary-ga ti tabeta to ] itta no?
    What did John tell who that Mary ate?
and have fewer scope possibilities than other kinds of dislocation.
11. One might well begin from a violable constraint like Vikner’s (2002) ProperBinding, perhaps adducing *FROZEN. The goal of the analysis in section III is to provide satisfactory definitions for such constraints.
12. Feature unfaithfulness is but one possible account of these phenomena. Others, perhaps based on bi-directional optimization (cf. Vogel, this volume) may offer similar cross-linguistic insight.
13. The Harmonic Parallelism approach, unlike the Clash and Crash architecture of Broekhuis (2001), does not presuppose that Gen incorporates any economy principles; instead, the MLC resides entirely in the constraint component Con.
14. A native speaker of German informs us that items such as (25-b) may be more acceptable in dialects originating in what was formerly Bohemia. This suspicion is strengthened by the following observations about modern Czech. Taking (i) as a base,
(i) Petr dnes nebude [VP libat Klaudii ruku]
    “Peter will not kiss Claudia’s hand today”
the VP can be fronted:
(ii) [VP libat Klaudii ruku]i Petr dnes nebude ti
and the NP argument of VP can be scrambled out:
(iii) Petr dnes [NP Klaudii ruku]i nebude [VP libat ti ]
but the VP-fronting of (ii) is preferable to the double-movement in (iv)
(iv) * [VP libat ti ]j Petr dnes [NP Klaudii ruku]i nebude tj
All four items mean essentially the same thing, so it is reasonable to assume they have the same inputs. Thus, Czech presents a case in which a single movement appears to neutralize two features in a way that two separate movements cannot.
15. Item (34) was judged acceptable by 23 native speakers out of 31 who had an opinion.
16. Müller (1998) remarks “a bit more would have to be said about Japanese” (fn p252).
17. While the complexity of both parsing and generation has been a popular topic in OT phonology, correspondingly less work has pursued these questions in OT syntax. Notable exceptions include Wartena (2000) and Kuhn (2001).
18. Chomsky (1968) writes: “In general, a set of rules that recursively define an infinite set of objects may be said to generate this set. Thus a set of axioms and rules of inference for arithmetic may be said to generate a set of proofs and a set of theorems of arithmetic (last lines of proofs). Similarly a (generative) grammar may be said to generate a set of structural descriptions, each of which, ideally, incorporates deep structure, a surface structure, a semantic interpretation (of the deep structure) and a phonetic interpretation (of the surface structure).” (footnote 12, page 126).
References
Antilla, A. 1997 Variation in Finnish phonology and morphology. Ph.D. dissertation, Stanford University.
Bresnan, J. (ed.) 1982 The Mental Representation of Grammatical Relations. MIT Press.
Bresnan, J. 2000 Optimal syntax. In Optimality Theory: Phonology, Syntax and Acquisition. J. Dekkers, F. van der Leeuw, J. van de Weijer (eds.). Oxford: Oxford University Press.
Broekhuis, H. 2000 Against feature strength: The case of Scandinavian object shift. Natural Language and Linguistic Theory 18: 673–721.
Chomsky, N. 1968 The Formal Nature of Language. In Language and Mind. New York: Harcourt Brace Jovanovich.
  1977 On Wh-movement. In Formal Syntax, P. Culicover, T. Wasow and A. Akmajian (eds.), 71–132. New York: Academic Press.
  1981 Lectures on Government and Binding: the Pisa Lectures. Dordrecht: Foris.
  1986 Barriers. Cambridge, Mass.: MIT Press.
  1995 Categories and transformations: Chapter 4 of The Minimalist Program. Cambridge: MIT Press.
  1999 Minimalist Inquiries: The framework. Ms, MIT.
de Kuthy, K. and Meurers D. 1999 On partial constituent fronting in German. In Tübingen Studies in Head-driven Phrase Structure Grammar. V. Kordoni (ed.), 22–73.
den Besten, H. and Webelhuth, G. 1987 Adjunction and remnant topicalization in the German SOV-languages. Paper presented at GLOW, Venice.
Fanselow, G. 2000 Features, theta-roles and free constituent order. Linguistic Inquiry 32 (3): 405–437.
Fiengo, R. 1977 On trace theory. Linguistic Inquiry 8: 35–61.
Grewendorf, G. 1993 Syntactic sketch of German. In Syntax: An International Handbook of Contemporary Research, J. Jacobs, A. von Stechow, W. Sternefeld and T. Vennemann (eds.). Volume 2 of Handbooks of Linguistics and Communication Science. Berlin: Walter de Gruyter.
Grimshaw, J. 1997 Projection, heads and optimality. Linguistic Inquiry 28: 373–422.
Hale, J. and Smolensky P. 2001 A Parser for Harmonic Context-Free Grammars. In Proceedings of the 23rd Annual Conference of the Cognitive Science Society. Johanna D. Moore and Keith Stenning (eds.), 427–432.
Jacobson, P. 1974 How a nonderivational grammar might (or might not) work. In Berkeley Studies in Syntax and Semantics, 1–28. Department of Linguistics and Institute of Human Learning, University of California, Berkeley.
Kayne, R. 1994 The Antisymmetry of Syntax. Cambridge, Mass.: MIT Press.
Koopman, H. and Szabolcsi A. 2000 Verbal Complexes. Cambridge, Mass.: MIT Press.
Kuhn, J. 2001 Generation and Parsing in Optimality Theoretic Syntax: issues in the formalization of OT-LFG. In Formal and Empirical Issues in Optimality Theoretic Syntax, P. Sells (ed.), 313–366.
Kurafuji, T. 1995 An interpretation of wh-phrases that induce a proper binding violation. In J. Costa, R. Goedemans, and R. van de Vijver (eds.), Proceedings of ConSOLE 4, 189–408.
Legendre, G., Hagstrom P., Vainikka A. and Todorova, M. 2002 Partial constraint ordering in child French syntax. Language Acquisition 10(3): 189–227.
Legendre, G., J. Grimshaw, & S. Vikner (eds.) 2001 Optimality-theoretic Syntax. Cambridge, Mass.: MIT Press.
Legendre, G., Smolensky, P. and Wilson C. 1998 When is less more? Faithfulness and minimal links in wh-chains. In Is the Best Good Enough: Optimality in Syntax, P. Barbosa, D. Fox, P. Hagstrom, M. McGinnis, and D. Pesetsky (eds.). Cambridge, Mass.: MIT Press.
McCarthy and Prince. 1995 Faithfulness and reduplicative identity. In Papers in Optimality Theory, J. Beckman, L. Dickey and S. Urbanczyk (eds.). Amherst, Mass.: GLSA, 249–384. Rutgers Optimality Archive number 60.
McCawley, J. 1968 Concerning the base component of a transformational grammar. Foundations of Language 4: 33–81.
Müller, G. 1995 A-bar syntax: a study in movement types. Mouton de Gruyter.
  1996 A constraint on remnant movement. Natural Language and Linguistic Theory 14: 355–407.
  1998 Incomplete Category Fronting: A Derivational Approach to Remnant Movement in German. Dordrecht: Kluwer.
  2000 Shape Conservation and remnant movement. In M. Hirotani, A. Coetzee, N. Hall, J.-Y. Kim (eds.), Proceedings of NELS 30, Amherst, Mass.: GLSA. 525–539.
Pesetsky, D. 1982 Paths and categories. Ph.D. dissertation, MIT.
Prince, A. and Smolensky, P. 1993 Optimality Theory: Constraint Interaction in Generative Grammar. Technical Report, Rutgers University.
Reynolds, B. 1994 Variation and phonological theory. University of Pennsylvania dissertation.
Ross, J. 1967 Constraints on variables in syntax. Ph.D. dissertation, MIT.
Rounds, W. C. 1991 The relevance of computational complexity theory to natural language processing. In Foundational Issues in Natural Language Processing, P. Sells, S. M. Shieber and T. Wasow (eds.), MIT Press, Cambridge, Massachusetts, 9–30.
Sauerland, U. 1996 The interpretability of scrambling. In Formal Approaches to Japanese Linguistics 2, M. Oishi, M. Koizumi and U. Sauerland (eds.), Volume 29 of MITWPL, 213–234.
  1998 Erasability and interpretation. Syntax 3: 161–188.
Smolensky, P. 1997 Constraint Interaction in Generative Grammar II: Local Conjunction (or, random rules in Universal Grammar). Paper presented at the Hopkins Optimality Theory Workshop/University of Maryland Mayfest.
Takano, Y. 1998 Illicit remnant movement: An argument for feature-driven movement. Linguistic Inquiry 31: 141–156.
Thiersch, C. 1985 VP and scrambling in the German Mittelfeld. Ms, University of Tilburg.
Tsujioka, T. 2001 Improper remnant A-movement. In Proceedings of NELS 31, Min-Joo Kim and Uri Strauss (eds.), Amherst, Mass.: GLSA, 483–500.
Vikner, S. 1995 Verb Movement and Expletive Subjects in the Germanic Languages. Oxford: Oxford University Press.
  2001 V-to-I movement and do-insertion in optimality theory. In Optimality-theoretic Syntax, Legendre, Grimshaw and Vikner (eds.), 427–464.
Wartena, C. 2000 A note on the complexity of optimality systems. Rutgers Optimality Archive Number 385.
Extending and reducing the MLC Winfried Lechner
1. Introduction
According to a widely shared set of assumptions, the economy metric determining the structure of (overt1) syntactic tree representations consists of two components, each of which regulates competition in a different domain. On the one hand, various conditions on movement have been postulated in the literature which minimize the length2 of movement paths. The MINIMAL LINK CONDITION (MLC) probably represents the most prominent exponent of this family of constraints, but similar intuitions have also been expressed in terms of principles such as Shortest Move, Shortest Attract, Shortest, Stay!, etc. (see e.g. Chomsky 1993, 1995, 2000, 2001). On the other hand, it has been hypothesized that contexts in which movement competes with Merge at the root node – and not with another instance of movement – need to satisfy the MERGE OVER MOVE condition, which dictates that insertion from the numeration is preferred over dislocation in the tree (see e.g. Chomsky 1995; Wilder and Gärtner 1996). The present study specifies two methods for simplifying parts of this economy metric. The first objective consists in laying out a strategy for eliminating Merge over Move. It will be argued that a redefinition of the MLC extends its operative scope in such a way that it can also capture the effects of Merge over Move. In addition, the reformulation of the MLC generates a new analysis of Case Freezing. Second, the paper explores prospects and consequences of reducing certain core properties of the MLC to facets of an independent condition on tree representations, the Linear Correspondence Axiom (LCA; Kayne 1994). In particular, I will propose a new analysis of local MLC phenomena – specifically, a proper subset of Superiority – which rests on the assumption of a derivational, phase-based implementation of the antisymmetry condition encoded in the LCA.
The two objectives of extending the MLC to Merge over Move and deriving parts of the MLC from aspects of the LCA do not only serve the narrow goal of reducing the complexity of the system. In addition, they can also be seen as contributing to the larger enterprise of assessing to which
extent the properties which are generally held to be characteristic of Move and Merge are discrete in nature, i.e. mutually exclusive, and to which degree they overlap. More specifically, the extension and reduction of the MLC to be defended below also encapsulate the heterodox claim that the properties of Move and Merge intersect more extensively than is commonly believed (for discussion of similarities between Move and Merge see Bobaljik 1995; Gärtner 1999, 2002; Starke 2001). Such an overlap, it is maintained, materializes in two areas. First, the discussion of Merge over Move in section 2 indicates that the domain of the Merge relation (the numeration), and the domain of the Attract relation (usually believed to contain only syntactic objects in the tree) need not necessarily be construed as two disjunct sets. Second, section 3 presents support for the idea that the LCA, a condition which is widely held to be responsible for the way in which symbols are merged at the root, also has an impact on how movement relations are organized during the derivation.
2. Extending the MLC
Chomsky (1995: 334ff) suggests that it is possible to detect in the English expletive construction a local economy condition that favors (root) Merge over movement. It is, according to Chomsky, this inherent imbalance between Move and Merge which underlies the contrast in (1):
(1) a. Therei seems [TP ti T° to be someone in the room]    (Merge expletive in SpecTP)
    b. *There seems [TP someonei T° to be [vP ti in the room]]    (Move subject into SpecTP)
In particular, Merge over Move posits that it is more economical to merge an expletive in the embedded SpecTP in (1)a, than it is to move the lower subject someone to SpecTP, as in (1)b. Intuitively, the assumption that Merge wins in direct competition with Move should follow from a member of the same family of economy constraints which prefer shorter movement paths over longer ones. In (1)a, insertion of the expletive in the lower SpecTP yields the trivially local relation between an attractor (or Probe) and an attractee (or Goal or target), one in which checking takes place in the base position, whereas attraction of the subject in (1)b results in movement across at least one maximal projection (vP). The intuition that direct insertion from the numeration constitutes the most economic mode of attraction can however
not be captured by standard versions of conditions on chains such as the MLC, a definition of which is provided in (2) (Chomsky 1995; top-down/bottom-up distinction adopted from van de Koot 1996):3
(2) MINIMAL LINK CONDITION
    K attracts α iff
    a. there is no β, β closer to K than α, such that K attracts β, and    (Top-down)
    b. there is no L, L a target for α, such that α is closer to L than to K    (Bottom-up)
(3) CLOSENESS 4
    β is closer to K than α is to K iff
    a. K c-commands β and
    b. β c-commands α
The MLC does not apply to (1)b for two reasons. First, the Attract relation does not range over symbols in the numeration, and (2) therefore only regulates competition among different instances of movement. Second, (2)a defines closeness in such a way that a category blocks attraction only if the attractor/Probe c-commands the intervener. Thus, even if the expletive were accessible to the Attract relation, it would fail to qualify as an intervener, because the attractor T° in (1)a does not c-command the expletive in SpecTP. Adopting the assumption that a unified analysis of Merge over Move effects and standard MLC phenomena is both conceptually desirable and not empirically counterindicated5, I would like to suggest the modified version of the MLC and the relevant locality constraints provided by (4) and (5), respectively. This Generalized MLC incorporates two crucial changes. It entails (i) a widening of the domain of Attract and (ii) a liberalization of the definition of closeness:6
(4) GENERALIZED MINIMAL LINK CONDITION
    For any α, β ∈ DTree ∪ Numeration and for any heads K, L
    K ATTRACTSDef α iff
    a. K potentially checks α and
    b. there is no β such that K potentially checks β and β can be merged later than α, and    (Top-down)
    c. there is no L such that L potentially checks α and L can be merged earlier than K    (Bottom-up)
(5) K POTENTIALLY CHECKSDef α iff
    a. K and α share (a relevant subset of) features and
    b. If α is merged, K and α satisfy the structural requirements for feature checking (c-command or Spec-head relation)
More specifically, the definition in (4) allows Attract to select symbols directly from the numeration, to the effect that expletive insertion now competes with subject raising in (1). In addition, the closeness relation is now expressed in terms of a version of derivational c-command (see Epstein et al. 1998). On this conception, a symbol c-commands all and only those nodes in a tree whose root it has been merged with (see Epstein et al. 1998 for details). As schematically illustrated by (6), the definitions in (4) and (5) in turn match these two cases to two different types of interveners: A head K can attract _ neither across a traditional intervener ` of the right feature composition, nor in contexts where K can be merged with a suitably specified `’ in SpecKP. In the former case, the traditional intervener ` can be merged later than _, i.e. derivationally c-commands _, vetoing movement from the position occupied by _. In the latter case, the definition (5)b establishes that K potentially checks `’ because if `’ is merged in SpecKP, `’ and K satisfy the structural requirements for feature checking. Moreover, given that `’ can be merged later than _ – i.e. `’ derivationally c-commands _ – `’ qualifies as an intervening category: (6)
TOP-DOWN
[YP β″(non-intervening) [KP β′(intervening) [ K(Attractor/Probe) [XP β(intervening) [ … α(Attractee/Goal) … ]]]]]
(arrow: α is attracted by K)
All higher categories, such as β″ in (6), do not meet the structural requirements for feature checking with K, and therefore also fail to qualify as interveners. The generalized version of the MLC naturally accounts for the contrast in (1). Following standard practice, the analysis includes the axiom that T° bears an EPP-feature which probes for a target with appropriate features, as illustrated in (7). Expletive insertion is now preferred over subject raising from a lower subject position because (i) the widening of the domain of Attract allows expletive insertion to compete with subject raising, (ii) there can check the EPP feature of T°, rendering it a suitable target for attraction by T°, and (iii) the expletive can be merged at a later point in the derivation than the lower subject someone. (7)
a. *There seems someone to be in the room (= (1)b)
b. MERGE OVER MOVE
[TP β (= there) [ T°[EPP] [ … α (= someone) … ]]]
(arrow: raising of α to SpecTP is blocked)
The system also entails as a desirable consequence that the set of symbols which can be attracted directly from the numeration is restricted to simple expressions such as expletives. Internally complex categories have to be composed in the syntactic working space, and are no longer part of the numeration.7 Together with the Theta Criterion, which blocks insertion of arguments in functional projections, this ensures that movement never competes with insertion of anything but expletives. The Generalized MLC (4) maintains the bipartition of the original version (2) into a top-down and a bottom-up clause. And just as in (2), the two clauses (4)b and (4)c are symmetric in that the structural relations which underlie the characterizations of potential interveners are kept constant. Moreover, the definition of intervention in (4)b has been liberalized. It is now sufficient for a potential intervener β to (derivationally) c-command the target α; it no longer needs to be (derivationally) c-commanded by the attracting head. Given the symmetry of the b- and c-clauses of (4), one is now led to expect that this looser structural requirement for interveners also carries over to the bottom-up clause (4)c. In what follows, I will comment on some corollaries of this prediction.
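The competition between expletive insertion and subject raising can be made concrete with a minimal Python sketch. It models only clauses (4)a and (4)b (the top-down part) together with a simplified reading of (5); the bottom-up clause (4)c is left out. The class `Symbol`, the integer `merged_at` (a derivation step, with `None` standing for "still in the numeration") and the use of a single shared label `"EPP"` as the relevant feature are illustrative assumptions, not part of the original proposal.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Symbol:
    name: str
    features: FrozenSet[str]
    # Hypothetical bookkeeping: derivation step at which the symbol was merged.
    # None means the symbol is still in the numeration.
    merged_at: Optional[int] = None


def can_be_merged_later(b: Symbol, a: Symbol) -> bool:
    """True if b can be merged later than a (b derivationally c-commands a)."""
    if a.merged_at is None:
        # Simplification: numeration symbols are not ordered among themselves.
        return False
    return b.merged_at is None or b.merged_at > a.merged_at


def potentially_checks(k: Symbol, a: Symbol) -> bool:
    """Simplified version of (5): shared features plus a checking configuration."""
    shares_features = bool(k.features & a.features)
    # (5)b, simplified: a can still be merged in SpecKP (numeration symbol),
    # or a was merged before the head K and is therefore c-commanded by K.
    in_configuration = a.merged_at is None or (
        k.merged_at is not None and a.merged_at < k.merged_at
    )
    return shares_features and in_configuration


def attracts(k: Symbol, a: Symbol, symbols: Iterable[Symbol]) -> bool:
    """Clauses (4)a and (4)b only; the bottom-up clause (4)c is not modeled."""
    if not potentially_checks(k, a):
        return False
    return not any(
        potentially_checks(k, b) and can_be_merged_later(b, a) for b in symbols
    )


# Configuration (7): 'someone' is already in the tree, T° has just been merged,
# and the expletive 'there' is still in the numeration.
T = Symbol("T", frozenset({"EPP"}), merged_at=2)
someone = Symbol("someone", frozenset({"EPP"}), merged_at=1)
there = Symbol("there", frozenset({"EPP"}))
```

Under these assumptions, `attracts(T, there, [someone, there])` holds while `attracts(T, someone, [someone, there])` does not, which mirrors the Merge over Move preference; once the expletive is absent from the numeration, `someone` becomes attractable.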
The tree in (8) depicts possible intervention configurations which fall under the bottom-up clause (4)c of the modified version of the MLC: (8)
BOTTOM-UP
[KP K(Attractor/Probe) [ L′(intervening attractor) [ … [LP α(Attractee/Goal) [ L(intervening attractor) … ]]]]]
(arrow: α is attracted by K)
This second part of the MLC determines the actual scope of movement in contexts which contain more than one attractor/probe and accordingly supply more than one possible landing site (most prominently represented by Relativized Minimality and Superraising). For these cases, the new, looser definition of closeness generates the prediction that movement8 should not only be excluded if it proceeds across a c-commanding intervener (L′ in (8)), but should be equally impossible if the target is located in the specifier position of a head L with the appropriate feature specification. L blocks movement of α to K because L can be – and in fact has been – merged earlier in the derivation with the root node than K, and also observes the structural requirements for entering a potential checking relation with α. Can traces of such ‘lower blocking effects’ also be detected empirically? There is good reason to believe that this is indeed the case, at least given plausible assumptions about feature checking. I will briefly expand on two consequences of this idea below. First, the fact that the new version of the MLC predicts lower blocking effects as in (8) makes it necessary to fine-tune the definition of potential checking in order to maintain the analysis of the contrast in (1). If a potential checking relation obtained between the expletive and T1° in (9)a, the embedded T1° would serve as the closest attractor for the expletive, and the analysis would incorrectly predict that the expletive should not be eligible for further raising to the matrix SpecTP in (9)c: (9)
a. [TP1 there T1°[EPP] [to be someone in the room]]   EPP is checked
b. [TP1 there T1° [to be someone in the room]]   EPP is erased
c. [TP2 Therei T2° seems [TP1 ti T1° [to be someone in the room]]]   Raising of expletive
Suppose therefore that the EPP-feature on T1° is erased and rendered invisible after expletive insertion in (9)a (for details see discussion below (11)). In (9)b, T1° accordingly no longer supports a potential checking relation with its specifier. Hence, T1° does not function as an intervening head in (9)c, facilitating raising of there to the specifier of T2°. There is another set of configurations which needs to be considered in this context, though. In (9), the expletive was inserted into the specifier of a non-finite T° ((9)a). If, on the other hand, T° is finite, and checks other features in addition to EPP, such as Case and/or Phi-features, further movement of a term from SpecTP is generally prohibited. Case freezing restrictions of this type are for instance widely held to be responsible for the ill-formedness of raising out of finite clauses, as in (10) (Chomsky 2000, 2001; Frampton 1996): (10) *Someonei seems [CP that [TP ti T°[+fin] is in the room]] (11)
CASE FREEZING
[TP2 T2° [vP seems [CP [TP1 someone [ T1°[+fin] … ]]]]]
TP2 ➞ Point at which lower phase is sent to Spell-Out
CP ➞ Case/Phi features of T° remain visible
TP1 ➞ Phase boundary
(arrows: Case/Phi-feature checking between T1°[+fin] and someone; raising of someone to SpecTP2 is blocked)
So far, Case Freezing has resisted a satisfactory explanation. The definition of the MLC adopted here makes it possible, though, to embed the phenomenon into the theory of locality. The tree in (11) exposes the relevant components of the analysis. Assume, following Pesetsky and Torrego (2001), that feature checking does not entirely obliterate Case and/or Phi-features of a finite T°, but that these features on T°[+fin] and the DP in SpecTP remain – in contrast to EPP on infinitival T° – visible at least until the derivation reaches the edge of the CP phase.9 Intuitively, this imbalance between EPP and Case/Phi-features might be thought to correlate with the fact that a residue of Case and/or Phi-feature checking remains visible in overt morphology at later
stages of the derivation, whereas the EPP merely serves the strictly derivational requirement to fill certain nodes in the tree, whose effects cannot be recovered later on. Now, assuming that the edge of the phase is accessible to operations in the minimally containing phase (Chomsky 2000), the matrix T° in (10)/(11) has access to the information that the embedded, finite T° is in a potential checking relation with the category in its specifier, even though the Case/Phi-features on both T°[+fin] and SpecTP[+fin] have been checked.10 It follows that the MLC prohibits the matrix T° from attracting the subject in the lower SpecTP, because the lower T°[+fin] qualifies as the closest potential attractor.11 On this analysis, Case Freezing instantiates a violation of the bottom-up clause of the Generalized MLC, and can accordingly be subsumed under the theory of locality. The preceding discussion considered implications which result from applying the two clauses of the Generalized MLC ((4)b and (4)c) in isolation. On the one hand, it was argued that the revised top-down clause (4)b results in a common analysis of Merge over Move and locality effects. On the other hand, the bottom-up clause was seen to offer a new and potentially promising perspective on Case Freezing. Interestingly, it is also possible to find contexts in which the top-down and the bottom-up clause are operative simultaneously, confirming the internal consistency of the definition in (4). To begin with, the unavailability of Superraising constructions such as (12)b demonstrates that the Merge over Move generalization is not absolute, and can in certain environments even be reversed. Raising of the subject to the intermediate SpecTP in (12)a is preferred over insertion of the expletive, followed by long raising, as in (12)b:
(12) a. It seems that [TP someonei T°[+fin] [is likely ti to win]]   Move subject into SpecTP
b. *Someonei seems that [TP it T°[+fin] [is likely ti to win]] Merge expletive in SpecTP
That is, given a numeration for (12) which includes one expletive and one DP, the only way to satisfy locality and the Theta Criterion simultaneously is to proceed as in (12)a, where expletive insertion is delayed in favor of subject raising. The current analysis attributes the ill-formedness of (12)b to violations of both clauses of the MLC. Assume first that Merge over Move applies, resulting in the derivational step (13)a which locates the base position of it in the intermediate SpecTP1. In this scenario, the bottom-up part (4)c prohibits further raising of it to the superordinate T2° as in (13)b, because the finite intermediate T1° constitutes a closer potentially checking head (Case Freezing): (13) a.
[TP2 T2°[+fin] seems that [TP1 it T1°[+fin] [VP someone is likely to win]]]
b. *[TP2 Iti T2°[+fin] seems that [TP1 ti T1°[+fin] [VP someone is likely to win]]]
c. *[TP2 Someonei T2°[+fin] seems that [TP1 it T1°[+fin] [ti is likely to win]]]
Moreover, the closest target for the matrix T2° is the expletive. Thus, the top-down clause (4)b fails to sanction long subject raising of someone as in (13)c. To resolve this conflict, the order between Merge of the expletive and subject raising must be reversed, so that raising to SpecTP1 precedes expletive insertion, as in (12)a. Crucially, this reversal of Merge over Move falls out from the Generalized MLC without further additions.12 To summarize, the generalized revision of the MLC in (4) and (5) represents an attempt at contributing to a more inclusive characterization of syntactic intervention effects. In particular, it was argued that generalizations from three different empirical domains (Merge over Move, Case Freezing and Superraising) attest to the higher descriptive adequacy of the Generalized MLC. The specific account advocated in section 1 also generates potentially significant theoretical consequences. More precisely, it entails that the domains of Merge and Move intersect more extensively than standardly assumed. Symbols can now be attracted from the syntactic tree (or subarray or working space) as well as from the numeration, removing the stipulation that the Attract relation may only target contexts which have already been manipulated by the syntactic system.13 This in turn indicates that the properties defining these two structure building operations are not discrete, i.e. mutually exclusive, but might be partially unified, leading to a reduction of redundancies.
3. An LCA-reduction of the MLC
In the previous section, it was claimed that a reformulation of the MLC which subsumes Merge over Move effects furnishes support for the conjecture that the properties of Move and Merge partially overlap. The second part of this study discusses a further argument to this end. More precisely, it will be argued that certain restrictions on movement – minimally local violations of the MLC – can be interpreted as a reflex of aspects of the Linear Correspondence Axiom (LCA; Kayne 1994), a principle which regulates possible ordering relations among terminals, and thereby serves as a restriction on Merge. I will henceforth refer to the hypothesis that a proper subset of the effects generally attributed to the MLC can be linked to a specific component of the LCA as the LCA-REDUCTION of the MLC. (The qualifications in italics above are instrumental in avoiding a potential confound. It is neither my intention to reduce the entire MLC to the LCA, nor will I attempt to relate the MLC to the LCA as a whole.) Section 3.1 introduces some definitions, and outlines the LCA-reduction of the MLC, while section 3.2 presents an implementation based on Superiority effects in English. The last subsection addresses various challenges for the analysis.
3.1. The LCA, copies and cyclic Spell-Out
The LCA represents a filter on tree representations which limits possible precedence relations among terminals. It can be decomposed into two parts: First, a set is constructed which collects pairs of terminals. In particular, these terminals need to fulfill the condition that they are dominated by non-terminals in an asymmetric c-command relation. (15) provides the definition of a relation T which generates such a set (Stabler 1997):
(14) DOMAINS:
N = set of nodes
N_T = set of terminal nodes
N_NT = set of non-terminal nodes
(15) RELATION T (N_NT A N_T)
For any a, b ∈ N_T, aTb =Def (∃x, y ∈ N_NT)(x W* a ∧ y W* b ∧ xAy)
(where W* denotes the dominance relation and A denotes asymmetric c-command)
Second, the LCA demands that the T-relation minimally meet the formal requirements listed under (16). In essence, (16) has the effect of excluding sets which contain symmetric pairs, and sets which do not exhaust the domain of terminals. The first case yields conflicting word order, whereas the second case results in configurations in which one or more terminals cannot be ordered at all.
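The two failure modes just described (symmetric pairs and terminals that remain unordered), together with transitivity, can be checked mechanically once a T-set is given as a finite set of ordered pairs. The following Python sketch does so; the function name `lca_violations` and the representation of terminals as strings are illustrative assumptions, and the T-set is taken as given rather than computed from a tree.

```python
from itertools import combinations
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def lca_violations(terminals: Iterable[str], t_set: Set[Pair]) -> List[str]:
    """List the ways in which t_set fails to linearly order the terminals."""
    problems: List[str] = []
    # Totality and antisymmetry, checked for every pair of distinct terminals.
    for x, y in combinations(sorted(set(terminals)), 2):
        if (x, y) in t_set and (y, x) in t_set:
            problems.append(f"symmetric pair: {x}, {y}")
        elif (x, y) not in t_set and (y, x) not in t_set:
            problems.append(f"unordered pair: {x}, {y}")
    # Transitivity: xTy and yTz require xTz. Cases with x == z are skipped,
    # since they are already reported as symmetric pairs above.
    for x, y in sorted(t_set):
        for y2, z in sorted(t_set):
            if y == y2 and x != z and (x, z) not in t_set:
                problems.append(f"transitivity failure: {x}, {y}, {z}")
    return problems
```

A T-set such as `{("a", "b"), ("b", "a")}` is reported as containing a symmetric pair, whereas `{("a", "b")}` over the terminals `a`, `b`, `c` leaves two pairs unordered.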
(16) LINEAR CORRESPONDENCE AXIOM
For any x, y, z ∈ N_T, the following conditions hold:
a. x ≠ y ➝ xTy ∨ yTx (TOTALITY/LINEARITY)
b. xTy ∧ yTz ➝ xTz (TRANSITIVITY)
c. xTy ∧ yTx ➝ x = y (ANTISYMMETRY)
(“d(A) [i.e. T] is a linear order of the terminals of the tree”; see Kayne 1994)
It has been observed at various points in the literature that the LCA as formulated in (16) has the consequence that members of a movement chain cannot be associated with copies at the level at which the LCA is operative (Kayne 1994; Nunes 1996; Richards 2001, among others). If movement were allowed to leave copies, the T-set, which lists the extension of the T-relation, would also include symmetric pairs. The schematic derivation in (17)14 exemplifies that the combination of the asymmetric c-command configurations between attracting head and attractee/target prior and subsequent to movement introduces a symmetric pair into the T-set. Prior to movement, the attractor ❶ in (17) is dominated by a non-terminal which asymmetrically c-commands the lower copy of the attractee ②; subsequent to raising of ② to the left of ❶, the higher occurrence of ② is dominated by a non-terminal asymmetrically c-commanding ❶. Since the T-set now contains <❶, ②> as well as <②, ❶>, the terminals embedded inside the movement copies cannot be ordered, inducing an LCA violation. (17)
T = {<❶, ②>, <②, ❶>, …}   ✗ LCA
[ ②(higher copy) [ ❶(Attractor) [ … ②(lower copy) … ]]]
(arrow: movement of ② from the lower to the higher position)
If the LCA is regarded as a Spell-Out condition, which regulates the distribution of overt symbols in the tree, it derives the requirement that only one movement copy can be pronounced (Epstein 1998; Nunes 1996; Richards 2001). On a more conservative interpretation (Kayne 1994), according to which the LCA serves as a general restriction on structure building to be observed at least throughout the overt part of the derivation, movement may not even be associated with unpronounced LF-copies. There is also a third
position, though, which embodies the claim that the LCA applies to overt as well as to covert categories, but which is at the same time directly compatible with the well-supported tenets of the copy theory of movement. This alternative approach, which will be adopted here, will moreover be seen to have the additional virtue of offering a new account of some ‘core effects’ of the MLC. More specifically, suppose that apart from regulating licit base-generated configurations (Kayne 1994), the LCA also applies to movement chains in accordance with the two hypotheses in (18): (18) H1. The LCA evaluates copies in movement chains relative to a given movement-inducing feature (e.g. [wh]). H2. The computation of the LCA proceeds cyclically. Each phase boundary induces the formation of a new T-set, triggering the LCA to ‘start over’.15 The main consequence of this apparatus consists in prohibiting phase-internal movement across symbols with identical16 feature specification. Crucially for present purposes, such a prohibition also subsumes important aspects of the MLC. In what follows, I will supply the details of two sample derivations first, proceeding from there to a transposition of the analysis to Superiority phenomena in English. The two schematic derivations in (19) exemplify the implementation of the phase-based, feature-relativized version of the LCA. The trees in (19) contain one attractor (❶) and two potential attractees (② and ③) each, differing minimally in that (19)a tracks movement of the higher attractee, while (19)b documents attraction of a category across a phase boundary. Since the derivational LCA algorithm only applies to symbols with identical features (H1 in (18)), other symbols in the tree such as non-attracting heads or phrasal interveners can be ignored. 
In (19)a, the T-set, which contains all pairs of the T-relation, satisfies the feature-based LCA at the lower phase level, because the two attractees ② and ③ are parsed into an asymmetric c-command relation. When the derivation reaches the phase boundary, information about c-command relations inside the lower phase is eradicated, and a new T-set is established (alternatively, T is reset). Moreover, since movement of ② to the specifier of ❶ introduces only a single pair into the new T-set, the derivation proceeds in conformity with the LCA. In contrast to (19)a, the derivation in (19)b includes an additional, phase-internal movement step which turns out to be fatal from the perspective of
the LCA. Assume that extraction out of phases needs to proceed via phase-peripheral intermediate landing sites. This requirement can for instance be derived from the Phase Impenetrability Condition (PIC; Chomsky 2000, 2001), and a principle which generates an attracting (EPP) feature at the edge of the phase for each overt movement operation (for a possible motivation behind this principle see Heck and Müller 2000). Then, movement of the lower attractee ③ has to strand a copy to the left of ②. But this additional short movement now induces an LCA violation inside the lower phase.17
(19) a. Licit Movement
T_lower phase = {<②, ③>}   ✓ LCA
T_higher phase = {<②, ❶>}   ✓ LCA
[ ② [ ❶(Attractor) [ (Phase Boundary) [ ② [ … ③ … ]]]]]
(arrow: movement of ② to the specifier of ❶)
b. MLC Violation
T_lower phase = {<②, ③>, <③, ②>}   ✗ LCA
T_higher phase = {<③, ❶>}   ✓ LCA
[ ③ [ ❶(Attractor) [ (Phase Boundary) [ ③ [ ② [ … ③ … ]]]]]]
(arrows: movement of ③ to the edge of the phase (PIC), followed by movement of ③ to the specifier of ❶)
Thus, the phase-based, feature-relativized LCA ensures that a phase-external head can only attract the highest suitable category in the subordinate phase (② in (19)). But this prohibition on non-local dependency formation constitutes nothing else but the core of (the top-down clause of) the MLC, which
equally demands attraction of the closest feature-compatible candidate. On this view, the core of the MLC represents an epiphenomenal generalization that can be reduced to a derivational implementation of the antisymmetry condition included in the LCA. Since the LCA is widely thought to regulate the ordering of terminals in base-generated structures,18 this result also implies that the properties characterizing Merge and Move cannot be discrete, but need to overlap partially. Before proceeding to empirical ramifications of the LCA-reduction of the MLC in the next subsection, notice that the approach delineated above shares certain aspects with Richards (2001). Richards (2001) designs an LCA-based algorithm which limits each category type (NP, PP, …) to a single occurrence inside each phase. As a commonality, both analyses achieve a liberalization of the effects of an LCA-related condition by relativizing the LCA to phases. That is, the phase boundaries mark the domains within which a certain well-formedness condition remains operative. But there are also significant disparities. First, for Richards (2001), the LCA serves as a well-formedness condition on base-generated structures, while the present account assumes that the LCA operates on movement chains. Second, LCA-violations can be repaired by moving the offending category out of the local phase in Richards (2001). In contrast, on the present conception, LCA violations cannot be avoided by extraction out of the local phase. This difference follows from the fact that movement copies are visible to the LCA only in the present account. Finally, phase boundaries here serve the purpose of permitting movement across an attractor in the first place, while in Richards (2001), the movement component is entirely disassociated from the LCA. Thus, the two approaches generate substantially distinct – although hopefully not incompatible – sets of predictions.
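The phase-based, feature-relativized computation described in this subsection can be sketched in Python as follows. The sketch assumes, purely for illustration, that each phase is represented as a list of the feature-bearing occurrences it contains, ordered from highest to lowest, so that every earlier occurrence asymmetrically c-commands every later one; the labels `"A1"`, `"X2"` and `"X3"` stand in for ❶, ② and ③, and the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]


def t_set(phase: List[str]) -> Set[Pair]:
    """Pairs <x, y> such that an occurrence of x asymmetrically c-commands
    an occurrence of y inside a single phase (occurrences listed highest first).
    Two copies of the same symbol do not form a pair."""
    return {
        (x, y)
        for i, x in enumerate(phase)
        for y in phase[i + 1:]
        if x != y
    }


def phase_ok(phase: List[str]) -> bool:
    """A phase satisfies the relativized LCA if its T-set has no symmetric pair."""
    pairs = t_set(phase)
    return not any((y, x) in pairs for (x, y) in pairs)


def derivation_ok(phases: Dict[str, List[str]]) -> bool:
    """T is reset at each phase boundary, so every phase is evaluated separately."""
    return all(phase_ok(p) for p in phases.values())


# (19)a: X2 is the highest attractee and moves to the specifier of A1.
licit = {"lower": ["X2", "X3"], "higher": ["X2", "A1"]}

# (19)b: X3 first strands a copy at the phase edge, to the left of X2.
illicit = {"lower": ["X3", "X2", "X3"], "higher": ["X3", "A1"]}
```

On this representation `derivation_ok(licit)` holds, while `derivation_ok(illicit)` fails because the lower phase contains both `("X3", "X2")` and `("X2", "X3")`. The flat-list encoding deliberately ignores all structure other than relative height, which is all the relativized condition inspects in these two derivations.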
3.2. Superiority in English
The LCA-reduction of the MLC entails as a corollary a new analysis of Superiority phenomena, which will be expanded on in the present subsection (for cross-linguistic differences see e.g. Haider, this volume and Müller, this volume). To begin with, the partial derivation in (20) demonstrates that licit subject questions comply with the LCA. Subsequent to merging the vP in Step 1 and Step 2, the subject can be fronted to SpecTP in Step 3 without crossing another wh-phrase.
(20) Who bought what
Step 1: [VP buy what]
Step 2: [vP who [VP buy what]]
T_vP = {<who, what>} ➞ ✓ LCA
Step 3: [TP whoi .... [vP whoi [VP buy what]]]   Move subject to SpecTP
Note in passing that the LCA account demands that wh-subjects surface in SpecTP19 and do not move on to SpecCP, as in (21).
(21) Step 4: [CP whoi C°[wh] [TP whoi .....]]   Move subject to SpecCP
T_CP = {<C°[wh], who>, <who, C°[wh]>} ➞ ✗ LCA
(21) is not sanctioned by the LCA due to the presence of a [wh]-feature on C° and the two occurrences of the subject. It follows that the [wh]-features on the wh-subject and on C° in (20) cannot be eliminated in a specifier-head configuration in CP, but are checked by the licensing relation Agree (Chomsky 2000, 2001). If the wh-object occupies the initial position, as in (22), the PIC dictates that the object needs to land in an intermediate position at the edge of the vP-phase to the left of the wh-subject before leaving the vP (Step 1). As a result, the higher and the lower occurrence of the object enter a symmetric c-command relation with the wh-subject, in violation of the LCA (Step 2).20
(22) *What did who buy
Step 1: [vP who [VP buy what]]   Merge vP
Step 2: [vP whati [vP who [VP buy whati]]]   Move object to edge of vP
T_vP = {<who, what>, <what, who>} ➞ ✗ LCA
At first sight, this specific implementation of the LCA appears to conflict with the conjecture that movement to the edge of the phase is triggered by an EPP feature (Chomsky 2000, 2001, Heck and Müller 2000). Since object extraction creates an intermediate copy at the edge of vP, even formation of simple object questions as in (23) should conflict with the LCA:
(23) What did you buy
[vP whati you v° [VP buy whati]]   Move object to edge of vP
T_vP = {<v°, what>, <what, v°>} ➞ ✗ LCA
However, this result only obtains if the EPP-feature is taken to be present both on the attracting head (v°) and on the target. If, on the other hand, the features on the attracting head and the target are not identical, the LCA, which by definition only regulates movement dependencies based on feature identity, will not apply. There is indeed good reason to believe that EPP-driven movement is not encoded by one and the same feature on the attracting head and on the target. First, while Case/Phi-feature checking is a symmetric operation, in that the relevant features potentially surface morphologically on the head as well as on the DP, EPP-checking is asymmetric insofar as the moved category satisfies a purely derivational requirement of the head bearing the EPP-feature, but not vice versa. In addition, on an influential view, the EPP-requirement can also be satisfied by other categories, such as verbs, aside from DPs (Alexiadou and Anagnostopoulou 1998). Again, this indicates that the features of the attracting head and the target are not strictly identical. If identity were indeed the relevant notion, the EPP should manifest itself as an N-feature in some contexts, but as a V-feature in others. Rather, the EPP-feature appears to be underspecified for category (and possibly other properties). As a result, EPP-driven movement is exempt from the LCA, resolving the apparent complication posed by successive cyclic movement to the edge (on this issue, see also discussion below (36)). Returning to Superiority, it is a well-known fact that multiple questions involving locative or temporal wh-adjuncts display a certain degree of immunity to the MLC, as witnessed by the absence of a contrast in (24) (Oka 1993; Williams 1994; Hornstein 1995; Huang 1995):
(24) a. Where/when did you buy what
b. What did you buy when/where
(25) details why, on the present analysis, the wh-object in (24)a may remain in-situ. Given that temporal and locative modifiers originate above the object, the adjunct does not have to pass the object on its way to SpecCP, yielding an LCA-conforming derivation.
(25) Where did you buy what
Step 1: [VP [VP buy what] where]   Merge VP and adjunct
Step 2: [vP wherei you [VP [VP buy what] wherei]]   Move adjunct to edge of vP
T_vP = {<where, what>} ➞ ✓ LCA
As for the integration of modifiers into the syntactic tree, I assume that locative and temporal adjuncts are right-adjoined to VP, and that objects obtain
scope over these adjuncts by covert, Case-driven movement to SpecvP (Chomsky 1995; Lechner 2003). This approach is fully consistent with the derivational LCA-based algorithm advocated above. It is at odds, though, with the traditional formulation of the LCA (Kayne 1994), which generates all adjuncts as left-branch specifiers. However, in light of the considerable body of evidence in support of the existence of right-adjoined adjuncts, this consequence constitutes an asset rather than a shortcoming of the present analysis (Lechner 1999; Nissenbaum 2001; Sauerland 1998). I will accordingly adopt a slightly weaker hypothesis on the syntax of modification, one which adds the assumption that adjuncts are exempted from the LCA (in its original form; on the exceptional phrase structural status of adjuncts see e.g. Bobaljik 1995). On this view, adjuncts do not partake in the computation of the well-formedness of base-generated strings of terminals, but are nonetheless visible to the phase-based, feature-relativized metric which derives core aspects of the MLC. Above, it was seen that the adjunct-first variant of multiple interrogatives (24)a can be directly derived from the LCA-reduction of the MLC. However, if it is the object which occupies the topmost position, as in (26), the analysis at first sight seems to deliver the wrong results. In particular, the PIC demands that a second occurrence of the object copy has to be stranded at the left edge of the vP, above the VP-adjunct (Step 2), in violation of the LCA:
(26) What did you buy where
Step 1: [VP [VP buy what] where]   Merge VP and adjunct
Step 2: [vP whati [VP [VP buy whati] where]]   Move object to edge of vP
T_vP = {<where, what>, <what, where>} ➞ ✗ LCA in vP
There is a natural strategy to avoid this complication, though, which rests on the hypothesis that adjuncts can be inserted counter-cyclically (Lebeaux 1988, 1990; see Nissenbaum 2001 on late adjunct merger in a phase-based model). (27) details the relevant steps in the alternative derivation for (24)b. In Steps 1 to 3, the object moves successive cyclically to SpecCP, this time in line with the LCA. Then, in Step 4, the adjunct is inserted counter-cyclically. But since the T-set of the lower phase vP has at this point already been reset – each phase level triggers the LCA-computation to start over – late adjunct merger does not introduce a symmetric pair of terminals in the lower T-set. Symmetric relations cannot be established retroactively, once a phase has been completed. As a consequence, the derivation in (27) converges:
(27) What did you buy where
Step 1: [VP buy what]   Merge VP
Step 2: [vP whati [VP buy whati ]]   Move object to edge of vP
T_vP = {} ➞ ✓ LCA
Step 3: [CP whati C°[wh] [vP whati [VP buy whati]]]   Move object to SpecCP
T_CP = {<what, C°[wh]>} ➞ ✓ LCA
Step 4: ...[vP whati [VP [VP buy whati] where]]   Merge adjunct; T_vP already computed
T_vP = { } ➞ ✓ LCA
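The contrast between the early-merger derivation (26) and the late-merger derivation (27) turns on the point at which the T-set of vP is fixed. The Python sketch below illustrates this under simplifying assumptions: a phase is a list of [wh]-bearing occurrences ordered from highest to lowest, its T-set is computed exactly once when the phase is completed, and later (counter-cyclic) insertions leave that T-set untouched. The class `Phase` and the helper functions are hypothetical names introduced only for this illustration.

```python
from typing import List, Optional, Set, Tuple

Pair = Tuple[str, str]


class Phase:
    def __init__(self, name: str) -> None:
        self.name = name
        self.occurrences: List[str] = []        # highest occurrence first
        self.t_set: Optional[Set[Pair]] = None  # fixed when the phase is completed

    def merge(self, label: str, position: int = 0) -> None:
        """Insert an occurrence; position 0 is the top of the phase."""
        self.occurrences.insert(position, label)

    def complete(self) -> None:
        """Compute the T-set once, at the phase boundary."""
        occ = self.occurrences
        self.t_set = {
            (x, y) for i, x in enumerate(occ) for y in occ[i + 1:] if x != y
        }

    def lca_ok(self) -> bool:
        if self.t_set is None:
            raise ValueError("phase has not been completed")
        return not any((y, x) in self.t_set for (x, y) in self.t_set)


def early_merger() -> Phase:
    """(26): the adjunct is present when the object moves to the edge of vP."""
    vp = Phase("vP")
    vp.merge("what")    # object in its base position
    vp.merge("where")   # VP-adjunct, above the object
    vp.merge("what")    # object copy at the edge of vP
    vp.complete()
    return vp


def late_merger() -> Phase:
    """(27): the adjunct is merged only after the vP phase has been completed."""
    vp = Phase("vP")
    vp.merge("what")    # object in its base position
    vp.merge("what")    # object copy at the edge of vP
    vp.complete()
    vp.merge("where", position=1)  # counter-cyclic merger; T-set stays as computed
    return vp
```

Both derivations end with the same occurrences inside vP, but only `early_merger()` yields a T-set with a symmetric pair; in `late_merger()` the T-set was already fixed as empty before the adjunct entered the structure.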
The interaction between counter-cyclic adjunct merger and the phase-based LCA leads to an interesting prediction.21 One is led to expect that Superiority effects triggered by overt adjunct wh-movement should re-emerge if the phase-mate wh-argument is generated above, and not below the adjunct. Before expanding on the reasoning behind this claim, notice that the prediction indeed appears to receive empirical support. Contexts which combine wh-subjects with locative wh-adverbs display sensitivity to Superiority (Williams 1994: 191):
(28) a. *Where did who go
b. Who went where
The adjunct in (28)a is trapped in its base position because it needs to cross over the subject phase-internally, but cannot do so, regardless of whether the adjunct is merged cyclically or non-cyclically. The details of the two potential derivations for (28)a are provided in (29) and (30), respectively. In (29), the adjunct is inserted before the subject joins the derivation, whereas adjunct insertion is delayed until the vP has been completed in (30):
(29) *Where did who go   Cyclic derivation (early adjunct insertion)
Step 1: [VP [VP go] where]   Merge VP
Step 2: [vP who [VP [VP go] where]]   Merge vP
Step 3: *[vP wherei [vP who [VP [VP go] wherei]]]   Move adjunct to edge of vP
T_vP = {<who, where>, <where, who>} ➞ ✗ LCA
(30) *Where did who go   Counter-cyclic derivation (late insertion)
Step 1: [vP who go]   Merge vP
Step 2: [CP where did who [VP go]]   Merge adjunct in SpecCP
Step 3: *[CP wherei [vP who [VP [VP go] wherei]]]   Counter-cyclic chain formation
Assuming that (i) subjects originate above locative and temporal adverbs, and that (ii) arguments cannot be merged counter-cyclically, overt adjunct movement to the edge of the phase in (29) strands a copy to the left and to the right of the subject, inducing an LCA violation. Derivation (30) is, on the other hand, excluded as an instance of counter-cyclic movement. In (30), the higher adjunct copy is inserted in SpecCP (Step 2) prior to merger of the lower, vP-internal copy (Step 3). Despite the fact that (30) is now sanctioned by the LCA – the T-set for vP has already been computed once the derivation reaches Step 3 – late insertion of the lower adjunct copy results in counter-cyclic chain formation, which in turn is banned by whatever principle prohibits counter-cyclic movement.22 That is, (30) illustrates that even though adjuncts may be merged counter-cyclically, adjuncts which have been inserted late cannot undergo movement. To recapitulate, although neither the cyclic nor the counter-cyclic derivation of (28)a succeeds, they fail to do so for different reasons. Whereas cyclic merger (29) is excluded by the LCA, late adjunction in (30) violates the cyclicity requirement on movement/chain formation. A note of clarification is in order at this point which is intended to remove a potential confound. It was postulated above that the collection of the members of the T-set ‘starts over’ at each phase boundary, yielding a system in which the LCA assesses movement chains cyclically. Crucially, this specific approach does not subscribe to the claim, though, that the LCA-reduction is a surface linearization condition that is verified at each phase boundary, as e.g. in Richards (2001). 
Such a model, which equates (PF) Spell-Out domains both with phases and the domains relevant for the computation of T-sets, is conceivable, but would, in conjunction with the present assumptions, wrongly entail, for one, that entire phases are linearized as a unit, rendering the edge of a phase inaccessible to heads in higher domains. Rather, the LCA is employed here as a filter on possible movement operations which applies throughout a phase, including its edge, and is reset at the phase boundary. Furthermore, heads in higher phases have access to categories that have reached the edge of the phase by whatever mechanism is assumed to standardly license such relations (a proviso in the PIC, as in Chomsky 2000; delayed Spell-Out, as in Chomsky 2001: 14; or partial Spell-Out of the phase, as in Nissenbaum 2001).
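The phase-wise evaluation described above can be made concrete with a small sketch. It is only an illustration under simplifying assumptions that are not part of the analysis itself: each phase is given as a left-to-right list of the relevant copies (categories with identical features), linear order on a right-branching spine stands in for asymmetric c-command, and the function names are hypothetical.

```python
from itertools import combinations


def t_set(occurrences):
    """Collect ordered pairs <a, b> where some copy of a precedes some copy of b.

    `occurrences` lists the copies inside ONE phase from left to right.
    Assumption: on a right-branching spine, "precedes" stands in for
    "asymmetrically c-commands".
    """
    pairs = set()
    for i, j in combinations(range(len(occurrences)), 2):
        a, b = occurrences[i], occurrences[j]
        if a != b:
            pairs.add((a, b))
    return pairs


def violates_lca(occurrences):
    """True if the T-set of the phase contains a symmetric pair."""
    pairs = t_set(occurrences)
    return any((b, a) in pairs for (a, b) in pairs)


def derivation_violates_lca(phases):
    """The T-set is reset at each phase boundary, so phases are checked one by one."""
    return any(violates_lca(phase) for phase in phases)


# (29), Step 3: where ... who ... where inside the vP phase
cyclic_vp = ["where", "who", "where"]
# Hypothetical control: at most one copy of each category per phase
control = [["who", "where"], ["who"]]

print(sorted(t_set(cyclic_vp)))
print(violates_lca(cyclic_vp), derivation_violates_lca(control))
```

Because the check runs phase by phase, copies that are separated by a phase boundary never contribute a symmetric pair, which is the property the text relies on.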
Winfried Lechner
3.3. Two open issues
Finally, two sets of problematic cases need to be addressed. First, the LCA reduction of the MLC is not equipped to handle Superiority contexts involving manner or reason adverbs. As has been pointed out repeatedly in the literature, such ontologically more complex adjuncts fail to license the full range of overt permutations which are attested with temporal and locative adverbs (Huang 1982, 1995: 153; Hornstein 1995: 147; Williams 1994). For one, manner and reason modifiers induce Superiority effects with objects ((31)), as well as among each other ((32)):
(31) a. Why/How did he fix what
     b. *What did he fix why/how
(32) a. *How did she sing why
     b. *Why did she sing how   (Hornstein 1995: 234, fn. 42)
Very speculatively, manner and reason adjuncts might be distributed less freely because they fail to license the functional readings underlying the interpretation of multiple interrogatives.23 This might be due to type and/or sortal incompatibilities (reasons/manners vs. individuals) between the fronted wh-expression and the implicit variable it binds inside the in-situ wh-phrase. If such a line of reasoning can be further substantiated, these additional restrictions on wh-adverbs should not be related to an inherent property of the MLC, a result which would align well with the present proposal.
Second, the LCA reduction of the MLC is challenged by environments involving long-distance argument Superiority, in which the two competing wh-phrases arise in different phases, exemplified by the paradigm in (33) (Hendrick and Rochemont 1982):
(33) a. Whoi did John persuade ti [CP to buy what]
     b. *Whati did John [vP persuade who [CP to buy ti]]
     c. Whati did John persuade Mary [CP to buy ti]
(33)b resists an LCA analysis because in neither of the two phases (vP and CP) do occurrences of who and what enter into a symmetric c-command relation. Postulating for (33)b the ill-formed derivation (34), in which movement of what proceeds via two intermediate landing sites to the left and to the right of who inside the vP-phase, would moreover lead to a number of dubious side effects.
(34) Whati did John [vP whati persuade who whati [CP that Bill bought ti]]
If, for instance, successive cyclic movement into the lower domain of a higher phase were a general requirement, simple object questions such as (35) could not be derived. In (35), the attracting head C° is sandwiched in between the two copies of what inside the CP-phase, introducing two symmetric pairs into the T-set:
(35) What did you see
     [CP What C°[wh] did you what [vP-Phase boundary see what]]
     TCP = {<C°[wh], what>, <what, C°[wh]>} ⇒ ✗ LCA
Cross-phasal manifestations of locality also show up in other environments, though, to be touched upon briefly in §4.1. There, I will adopt the orthodox position that all non-phase-internal locality restrictions, including (33)b, fall under the reign of the Generalized MLC.
Sections 2 and 3 presented two independent proposals. The first one led to a widening of the empirical coverage of the MLC, expanding its scope to Merge over Move, Case Freezing and Superraising, whereas the second one made it possible to eliminate certain core cases of the MLC by linking them to the LCA. It has not been specified so far, however, whether these two ideas are compatible, and how they interact. The last section will take up these important issues, aiming at a synthesis of the Generalized MLC and the LCA-reduction of the MLC.
4. Combining the generalized MLC and the LCA-reduction
Two questions emerge once the reformulation of the MLC of section 2 is combined with the LCA-reduction of section 3. First, are the two proposals compatible, i.e. is the synthesis internally consistent? And second, do the two parts of the analysis indeed make independent contributions, or does one of them subsume the other? The results of this section indicate that a unification into a model with two partially independent subcomponents seems feasible.
4.1. Independence I: The generalized MLC
It is easy to see that the Generalized MLC is independent from the LCA-reduction. Consider again Case Freezing, repeated from §1 (the same point could be made on the basis of Superraising):
(10) *Someonei seems [CP that [TP ti T°[+fin] is in the room]]
As was seen in section 1, the Generalized MLC in (4) and (5) correctly predicts that the potential checking relation between the lower, finite T° and the subject inhibits raising. Since the lower CP in (10) constitutes a phase boundary, and phase boundaries reset the T-set, movement of someone across the higher T° nevertheless observes the LCA-reduction. Neither the lower nor the higher T-set contains the symmetric pairs <Ti, someone> and <someone, Ti>, with the index i ranging over occurrences of T° nodes. It follows that (10) is excluded by the Generalized MLC, but cannot be captured by the LCA.
In a similar vein, Merge over Move effects are only amenable to an analysis in terms of the Generalized MLC. In (1), repeated from above, competition between expletive insertion and movement from the lower subject position is resolved by the Generalized MLC:
(1) a. There seems [TP t to be someone in the room]
    b. *There seems [TP someonei to be ti in the room]
The contrast in (1) cannot, on the other hand, be related to the LCA. First, the intuition expressed by the LCA simply cannot be carried over to (1), since (1) does not include crossing categories of the right shape and feature specification. Second, raising across non-transitive vPs as in (1)b – i.e. vPs which fail to qualify as phases – has to be exempted from the LCA more generally, because these configurations invariably include an attracting head and a target inside a single phase. (36) provides the relevant parts of the derivation:
(36) [CP [TP Someone T°[+fin] [vP arrived]]]   Raising across vP, vP is not a phase
     TCP = {<T°[+fin], someone>, <someone, T°[+fin]>} ⇒ ✗ LCA
There are two strategies to avoid this undesired result, though, one of which has already been employed in the previous section. On the one hand, the set of phases can be redefined to include intransitive vP as well; I will not comment any further on this option here, which seems viable but comes at the cost of substantial changes (see Legate, to appear, for arguments that passive and unaccusative vPs are also phases). Alternatively, the LCA could be made to disregard the features and categories implicated in subject raising. In section 3, I maintained that the operative domain of the LCA is limited to categories with identical features, and that EPP-driven movement, which is generally thought to motivate raising, does not involve strict feature identity. From this
it follows that the T-set for (36) does not contain offending pairs in the first place, and EPP-triggered raising ceases to pose a problem.24 Thus, the observation that feature-driven movement is also licit phase-internally does not necessarily entail insurmountable complications for the LCA-reduction of the MLC.
A second group of contexts which makes it possible to discriminate between the predictive capacities of the Generalized MLC and the LCA-reduction is represented by non-local violations of Relativized Minimality. For instance, the attractees/targets in (37) start out in two different phases, and therefore never enter into a phase-internal symmetric c-command relation.25
(37) *Wherek do you wonder [CP who said [CP that we had met tk]]
Thus, the analysis of (37) has to be relegated to the Generalized MLC, which in fact correctly prohibits attraction across the intervening wh-phrase who.
4.2. Independence II: The LCA-reduction
In section 4.1 it was argued that the Generalized MLC is independent from the LCA-reduction of the MLC. But does the reverse also hold, i.e. are there cases which can only be captured by the LCA-reduction? Although the answer is contingent on the specific assumptions one adopts about the interaction between the Generalized MLC and phases, it seems to be positive. It can therefore be concluded that both the LCA-reduction and the Generalized MLC control at least partially distinct domains.
A first indication for the independence of the LCA-reduction comes from considerations pertaining to successive cyclic movement in multiple questions such as (38). In (38), the EPP-feature on v° attracts the object to SpecvP, resulting in (38)b:26
(38) *What did who buy
     a. Step 1: [vP who v° [VP buy what]]   (EPP on v° attracts what)
     b. Step 2: [vP whati [vP who v° [VP buy whati]]]
        TvP = {<who, what>, <what, who>} ⇒ ✗ LCA
Observe to begin with that the relation between v° and what neither falls under the reign of the LCA, which does not apply to EPP-movement (see discussion below examples (9), (23) and (36)), nor is it regulated by the
MLC. In (38)b, the object is the only category that can be attracted by v° in the first place, because the subject cannot check features on the head of the projection it originates in. But since there is no other potentially intervening DP which could eliminate the EPP-feature on v°, the MLC cannot apply, for lack of competition. (38)b will, however, be filtered out by the LCA-reduction, as the three vP-internal occurrences of the two wh-phrases introduce a symmetric pair in the T-set. Thus, the vP-internal relations in (38) are captured by the LCA, but cannot be accounted for by the Generalized MLC, suggesting that the former is independent from the latter.
The argumentation above is not conclusive, though. As it turns out, at a later stage of the derivation, (38)b can equally be excluded by the MLC. After the wh-subject has raised to SpecTP ((39)a) and the interrogative C° has been merged ((39)b), the wh-subject in SpecTP is closer to C° than the VP-internal object. It follows that C° can only enter an Agree27 relation with the subject, accounting for the contrast.
(39) Who bought what
     a. [TP whoi [vP what [vP whoi v° [VP buy what]]]]
     b. [CP C°[wh] [TP whoi [vP what [vP whoi v° [VP buy what]]]]]
        (Agree relation between C°[wh] and whoi in SpecTP)
Consequently, a more reliable diagnostic for differentiating between the predictions generated by the LCA and the MLC has to be based on contexts that do not involve overt subject shift to SpecTP. Such environments are provided by subject in-situ languages such as German. German is generally described as a language which lacks Superiority in simple clauses (see (40)a). However, as observed by Wiltschko (1998), there is an intriguing exception to this generalization. If the subject has to be parsed into a vP-internal position, as in (40)b, it becomes all of a sudden possible to detect robust reflexes of Superiority (the control in (40)c is marked for some speakers, but clearly preferable to (40)b)28: (40) a.
Wen hat [TP wer [vP oft gesehen]]
        whoACC has whoNOM often seen
        ‘Who saw who often?’
     b. *?Wen hat [vP oft [vP wer gesehen]]
        whoACC has often whoNOM seen
     c. Wer hat [vP oft [vP wen gesehen]]
        whoNOM has often whoACC seen
This finding is directly compatible with the LCA-account, but does not fall out from the MLC. As illustrated by (41), object movement to the edge of vP violates the LCA (Step 2). In line with what has been said about (39), the MLC does not apply to vP-internal movement, because the subject and the object do not compete for the phase-initial position. Crucially, (41) differs from (39) in that the ill-formedness of (41) cannot be captured by the MLC at a later point in the derivation. Subsequent to merger of the interrogative C° at Step 3, the object at the edge of vP is closer to C° than the in-situ subject, and nothing should accordingly block fronting of the object:
(41) *?Wen hat [vP oft [vP wer gesehen]]
     whoACC has often whoNOM seen
     ‘Who saw who often?’
Step 1: [vP oft [vP wer wen gesehen]]
Step 2: [vP wen [vP oft [vP wer wen gesehen]]]
        TvP = {<wer, wen>, <wen, wer>} ⇒ ✗ LCA
Step 3: [CP C° ..... [vP wen [vP oft [vP wer wen gesehen]]]] ⇒ ✓ MLC
Thus, the Superiority violation in (41) must be computed locally, inside the vP. This result matches the predictions of the LCA, but cannot be derived from the MLC-account.29
To recapitulate, in sections 4.1 and 4.2 it was argued that the combination of the Generalized MLC and the LCA-reduction of the MLC yields a consistent system. An initial survey indicated that the two principles make different contributions, supporting the view that they are independent. On the one hand, the Generalized MLC proves to be broader than the LCA-reduction in some domains (Merge over Move, Case Freezing, Superraising, long distance Superiority and Relativized Minimality), and subsumes its effects in others (local Superiority). On the other hand, there is at least one group of contexts – Superiority in German – in which the LCA imposes requirements that cannot be expressed by the MLC.
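The division of labour between the two conditions in (39) and (41) can be sketched as follows. This is a rough illustration, not the formal system: sequences list copies from left to right below C°, linear order is assumed to track c-command, and the helper names and the small wh-lexicon are hypothetical.

```python
WH_ITEMS = {"who", "what", "wer", "wen"}  # illustrative lexicon, not from the text


def closest_wh(below_c):
    """Top-down MLC stand-in: the first wh-item in the c-command domain of C°."""
    for item in below_c:
        if item in WH_ITEMS:
            return item
    return None


def has_symmetric_pair(vp_copies):
    """LCA stand-in: some a precedes b and some b precedes a inside the vP phase."""
    seen_before = set()
    pairs = set()
    for item in vp_copies:
        for earlier in seen_before:
            if earlier != item:
                pairs.add((earlier, item))
        seen_before.add(item)
    return any((b, a) in pairs for (a, b) in pairs)


# (39): subject raised to SpecTP, object copy at the vP edge, then lower copies
english_39 = ["who", "what", "who", "what"]
# (41): subject stays in situ below the adverb, object copy at the vP edge
german_41 = ["wen", "oft", "wer", "wen"]
german_41_vp = ["wen", "wer", "wen"]  # wh-copies inside the vP phase only

print(closest_wh(english_39))   # goal the MLC selects for C° in (39)
print(closest_wh(german_41))    # goal the MLC selects for C° in (41)
print(has_symmetric_pair(german_41_vp))
```

On these assumptions the MLC alone already blocks object fronting in (39), since the subject is the closest goal, whereas in (41) the object is the closest goal and only the vP-internal symmetric pair rules the derivation out.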
4.3. Dominating Islands
This final subsection addresses environments which prove recalcitrant in particular for the Generalized MLC, and specifies a direction to remove this obstacle.
According to a well-supported conjecture, Superiority effects are not only induced by c-commanding, but also by dominating interveners (Takano 1995; Kitahara 1995; Müller 1998):
(42) a. ??What did who buy t?
     b. ??Who did you buy [which book of t]?   (Sauerland 2000)
The LCA analysis fails to exclude such contexts in which one target dominates the other, because it functions as a filter for possible c-command relations among attractors/attractees only. The problem carries over to the Generalized MLC, repeated below:
(4) GENERALIZED MINIMAL LINK CONDITION
    For any α, β ∈ Tree ∪ Numeration and for any heads K, L
    K ATTRACTSDef α iff
    a. K potentially checks α and
    b. there is no β such that K potentially checks β and β c-commands α, and   (Top-down)
    c. there is no L such that L potentially checks α and K c-commands L   (Bottom-up)
(5) K POTENTIALLY CHECKSDef α iff
    a. K and α share (a relevant subset of) features and
    b. If α is merged, K and α satisfy the structural requirements for feature checking (c-command or Spec-head relation)
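To see how the three clauses of (4) interact, the definition can be approximated in a few lines of code. The sketch makes several assumptions that go beyond the text and are flagged in the comments: the tree is flattened to a list in which earlier items c-command later ones, an item still in the numeration is treated as if it had been merged with the root, interveners in the top-down clause are restricted to non-heads, the structural requirement (5)b is taken to be satisfied, and all class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    features: frozenset
    is_head: bool = False


def potentially_checks(k, a):
    # (5)a only: shared features; (5)b is assumed to hold (simplification)
    return k != a and bool(k.features & a.features)


def c_commands(x, y, spine, numeration):
    if x == y:
        return False
    if x in numeration:
        # Assumption: a numeration item is evaluated as if merged with the root
        return y in spine
    if y in numeration:
        return False
    return spine.index(x) < spine.index(y)


def attracts(k, alpha, spine, numeration):
    if not potentially_checks(k, alpha):                      # clause (4)a
        return False
    for beta in spine + numeration:                           # clause (4)b, top-down
        if (beta != alpha and not beta.is_head                # non-heads only (assumption)
                and potentially_checks(k, beta)
                and c_commands(beta, alpha, spine, numeration)):
            return False
    for other in spine:                                       # clause (4)c, bottom-up
        if (other.is_head and other != k
                and potentially_checks(other, alpha)
                and c_commands(k, other, spine, numeration)):
            return False
    return True


D = frozenset({"D"})
someone, there = Node("someone", D), Node("there", D)
t_emb = Node("T_embedded", D, True)
t_matrix, t_fin = Node("T_matrix", D, True), Node("T_finite", D, True)

# Merge over Move, (1): the expletive in the numeration wins over the subject
merge_over_move = ([t_emb, someone], [there])
# Case Freezing, (10): the lower finite T° blocks attraction by the matrix T°
case_freezing = ([t_matrix, someone, t_fin], [])

print(attracts(t_emb, there, *merge_over_move), attracts(t_emb, someone, *merge_over_move))
print(attracts(t_matrix, someone, *case_freezing))
```

The first configuration exercises the top-down clause (an expletive that can still be merged counts as closer than the subject already in the tree), the second the bottom-up clause (a lower head that potentially checks the subject pre-empts the higher one).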
It is however possible to integrate into the Generalized MLC a suitable definition of closeness which is broad enough to generalize to dominating islands. Richards (2002, fn.1) provides such a statement, which, when combined with a standard version of the MLC, straightforwardly excludes both Superiority violations in (42).30
(43) α is closer to a potential attractor a than β is if there is a node dominating β which does not dominate α.   (Richards 2002, fn.1)
(43) can also be transposed to fit the Generalized MLC, for instance by reformulating the top-down clause (4)b as given below:
(4)b’ there is no β such that K potentially checks β and β can be merged with the root later in the derivation than α has been merged in the tree   (Top-down)
In the original version (4)b, α and β were taken to range over nodes which have been merged with the root. Since symbols that are merged this way are immediately dominated by the (new) root node, they c-command categories which have been joined by the same strategy earlier in the derivation. The new definition (4)b’ liberates α from this root Merger requirement, making it possible for α to be contained inside β.31 As similar considerations do not arise for the bottom-up clause – an attracting head can, on standard assumptions, never dominate another attracting head – the other components of (4) can be kept in their original form. Thus, a minor modification of the Generalized MLC extends the scope of the MLC to intervention effects induced by dominating interveners.
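The dominance-based notion of closeness in (43) lends itself to a direct check. The following sketch assumes that domination is irreflexive (a node does not dominate itself) and encodes trees as a child-to-parent mapping; the node labels and function names are hypothetical and chosen only for illustration.

```python
def ancestors(node, parent):
    """All nodes properly dominating `node` in a child -> parent mapping."""
    found = set()
    while node in parent:
        node = parent[node]
        found.add(node)
    return found


def closer(alpha, beta, parent):
    """(43): alpha is closer than beta if some node dominates beta but not alpha."""
    return bool(ancestors(beta, parent) - ancestors(alpha, parent))


# (42)b: 'who' is contained inside the DP [which book of who]
dominating = {"who": "PP_of", "PP_of": "DP_which_book", "DP_which_book": "VP", "VP": "CP"}

# Configuration (i) of note 30: neither wh-phrase dominates or c-commands the other
disjoint = {"who_a": "DP_a_book", "DP_a_book": "VP",
            "who_b": "PP_to", "PP_to": "VP", "VP": "CP"}

print(closer("DP_which_book", "who", dominating), closer("who", "DP_which_book", dominating))
print(closer("who_a", "who_b", disjoint), closer("who_b", "who_a", disjoint))
```

In the dominating configuration only the containing DP comes out as closer, so extraction of the contained wh-phrase is blocked; in the disjoint configuration each wh-phrase counts as closer than the other, matching the prediction discussed in note 30.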
5. Conclusion
The present study explored some consequences which result from an extension and a reduction of aspects of the MLC. It was argued that a widening of the domain of the Attract relation together with a loosening of the definition of closeness leads to a unified analysis of a variety of phenomena (Merge over Move, Case Freezing and Superraising) in terms of a generalized version of the MLC. Apart from its extended empirical coverage, the generalized MLC was also seen to be supported by its ability to remove an imbalance in the definition of the domains of Move and Merge.
The second part of the paper considered the prospects of reducing a subset of MLC-effects to an antisymmetry algorithm inspired by the LCA. At least at first sight, this reinterpretation of some generalizations about movement not only seems to offer a promising variety of new analytical options, but also might help to prepare the foundation of a system for the formation of syntactic trees which is no longer informed by the impermeable dichotomy Move vs. Merge. The LCA-reduction also raises various new questions, among them the proper taxonomy of locality violations (local vs. long-distance), and the detailed nature and structural organization of the features involved in movement.
Acknowledgements
I would like to thank Elena Anagnostopoulou, David Pesetsky, Norvin Richards, Wolfgang Sternefeld, and audiences at McGill University, MIT, the University of Potsdam, the University of Thessaloniki, the University of Urbino, and the University of Tübingen for suggestions, discussion and comments. This work was supported by an APART grant by the Austrian Academy of Sciences.
Notes
1. Economy conditions on optional operations, such as QR, Quantifier Lowering and possibly scrambling, which co-determine interpretation but do not affect the well-formedness of the output string will be disregarded here. See Fox (2000) and Sauerland (2000), among many others, for discussion.
2. For discussion of principles which minimize the number of movement steps, such as Fewest Steps, see e.g. Collins (1996) and Zwart (1996).
3. A reviewer points out that (2) involves two different notions of “attract”, one referring to an operation triggering movement (left side of biconditional), the other to the licensing condition (right side of biconditional, i.e. (2)a and (2)b). The problem disappears in the revised version below.
4. On the relativization of the closeness relation to minimal domains see Anagnostopoulou (2003), Chomsky (1995) and Collins (1996). An anonymous reviewer points out that (3), which defines closeness in terms of possibly symmetric c-command, entails that K can attract neither α nor β, if α and β c-command each other. Note however that in contexts of symmetric c-command, α and β also differ in bar-level. If α is a head and β is its complement, then β is an XP. (In adjunction structures, the nodes structurally differ w.r.t. the inclusion/exclusion relation.) Assume, as seems natural, that the Attract relation is defined for features on K, and not for the head itself, and that K’s features are specified as to whether they attract heads or phrases (but see Alexiadou and Anagnostopoulou 1998). It follows that if α and β symmetrically c-command each other, a given feature on K may either attract α or β, but not both. Turning to the crucial step, (2)a postulates that the MLC only considers potential interveners (β’s) which can be attracted by K. Thus, in contexts in which α and β symmetrically c-command each other, (2)a disregards β because α and β are distinct in bar-level and categories of distinct bar-level do not constitute potential interveners in the first place. Hence, clause (3)a of the definition of closeness does not have to be supplemented by an antisymmetry requirement.
5. The poor current understanding of the scope of Merge over Move necessitates further research in order to substantiate these assumptions. There are e.g. various
contexts involving expletives which fail to lend themselves to an analysis in terms of Merge over Move. For one, Frampton (1996) and Wilder and Gärtner (1996) discuss pairs as in (i), in which Merge over Move at first sight appears to undergenerate. (i)b should block (i)a, since there insertion precedes movement only in the former case.
   (i) a. There was [a rumor that a unicorni is ti in the garden] in the air.   Move subject
       b. [A rumor that there is a unicorn in the garden] was in the air.   Merge expletive
   There is an important difference between (i) and the classical examples (1), though. Only the sentences in (i) contrast in interpretation, as can be inferred from the fact that different clauses need to satisfy the definiteness restriction in (i)a and (i)b, respectively. Given that the economy metric only compares synonymous derivations (Reinhart 1997), (i) does not pose a problem since (i)a and (i)b fail to compete. Furthermore, a reviewer cautions that (ii) contradicts the Merge over Move condition:
   (ii) a. There has been [XP a booki put ti on the table]   Move object into SpecXP
        b. *Therei has been [XP ti put a book on the table]   Merge expletive in SpecXP
   (ii) might lend itself to an analysis in terms of subnumerations (given that the subnumeration for the XP phase does not include the expletive; for arguments that passive vPs are phases see Legate, to appear).
6. The definitions are only an approximation which still requires further refinements. It must e.g. be ensured that every step in the derivation adheres to the Theta Criterion, resulting in cyclic evaluation as in Epstein et al. (1998); otherwise, MLC-conforming derivations which violate the Theta Criterion would block their well-formed competitors. Another way to express the intuition that a head needs to attract the ‘freshest category’ was kindly suggested to me by Wolfgang Sternefeld:
   (i) K attracts α from the tree or the numeration iff
       a. K potentially checks α and
       b. if there is a β such that K potentially checks β, it must hold that the tree with β in SpecKP does not contain longer chains than the tree including α.
   (ii) K potentially checks α iff K and α share (a relevant subset of) features
7. I assume that symbols are combined with their features already in the numeration.
8. If the MLC is taken to restrict the relation Attract (Chomsky 2000, 2001), the same considerations carry over to conditions on Attract. On locality differences between Attract and Move, see Bobaljik and Wurmbrand (2002) and Ochi (1998).
9. The edge of a phase α, where α ∈ {vP, CP}, consists of the head of α and its specifier(s) (see Chomsky 2000, 2001).
10. Given that the features on T° and on the DP in SpecDP remain accessible to heads in the next phase up, it is plausible to assume that the potential checking relation between T° and DP does so, too.
11. Aspects of this proposal are reminiscent of the ‘defective intervention effect’ discussed in Chomsky (2000, 2001). According to Chomsky, a DP still functions as an intervener after its Case feature has been checked and deleted, while on the present view, a head with a checked Case feature still may enter into a potential checking relation, and therefore block further movement. Thus, the present account generalizes the defective intervention effect from DPs to heads.
12. The conflict could also be resolved by the assumption that the expletive and the DP subject never compete because they are distributed into two different subnumerations (for TP1 and TP2).
13. The present conception differs from theories in which Move/Attract is conceived of as multiple application of Merge (Epstein et al. 1998). In the latter models, the terms which are merged are internally complex copies, and therefore need to be syntactically derived in the working space first, while the present analysis rests on the assumption that terms can be directly attracted from the numeration.
14. For reasons of space preservation, the non-terminals dominating the terminals (❶, ②, etc…) are not graphically represented.
15. On the interaction between phases and the LCA see Richards (2001) and discussion below example (30).
16. On the relevance of feature identity see discussion of EPP below (23) and below (36).
17. The current account is not compatible with the position that all operations at the edge of the lower phase are visible to the higher phase. Otherwise, the T-set of the higher phase would also contain the pair , and movement would be blocked in general.
18. But see also Epstein (1998), Nunes (1996) and Richards (2001) for applications of the LCA to movement.
19. For a proposal which locates wh-subjects in SpecTP see Pesetsky (1989). Such an analysis is e.g. supported by the absence of do-support in subject questions.
20. In languages which lack subject-object Superiority, such as German, it has been assumed that the object moves out of the vP prior to fronting to SpecCP (see Haider, this volume, and Müller, this volume). An analysis along these lines is fully compatible with the LCA-based account, given that movement out of the vP is triggered by a feature distinct from [wh]. For discussion see §4.2 below.
21. As pointed out by Norvin Richards (p.c.), this prediction is not contingent on the use of the LCA, but would equally arise on the standard definition of the MLC.
22. In principle, it is conceivable that chain formation and movement are subject to different conditions, among them the cycle. In the absence of evidence to the contrary, I will assume, however, that chain formation as well as movement observe strict cyclicity.
23. For a proposal to relate Superiority to Weak Crossover with functional readings (Chierchia 1992), see Hornstein (1995). On functional readings see e.g. Engdahl (1986), Groenendijk and Stokhof (1984), Hagstrom (1998), Reinhart (1997), Tsai (1994) and Winter (2003).
24. Case/Phi-features on subjects and finite T° do not induce LCA violations on the following two assumptions: (i) Case/Phi-features are embedded below EPP in a hierarchical structure (in analogy to phonological feature geometries; Clements 1985). (ii) Only the topmost feature is visible to the LCA.
25. Note that symmetric c-command relations cannot be introduced by successive cyclic adjunction, as in (i). This hypothesis creates problems for simple object questions, discussed in connection with (35).
   (i) ....[CP where who [vP where said [CP that we had met where]]]
26. I assume that movement to the edge of vP does not tuck in (McGinnis 1998; Anagnostopoulou 2003; see also Ura 1996). Support for this claim comes from Icelandic, where floated quantifiers associated with the subject need to surface to the right of object shifted categories (Holmberg and Platzack 1995).
27. Recall that wh-subjects surface in SpecTP, since movement to SpecCP is prohibited by the LCA (see (21)). Thus, the MLC licenses an Agree relation between C° and SpecTP in (39)b.
28. At first sight, (40)b might be taken to indicate that oft/‘often’ can only be generated as a VP-adjunct, and therefore has to follow the subject, which originates in SpecvP. This attractively simple analysis fails in other domains, though. In German, all subjects undergo short movement to the left of sentential negation (see (i)). Moreover, sentential negation follows temporal adjuncts such as oft (see (ii)):
   (i) weil {wer/jemand} nicht {*wer/*jemand} gekommen ist
       since someone/someone not someone/someone arrived is
       ‘since someone didn’t come’
   (ii) weil sie {*nicht} oft {nicht} geschlafen hat
        since she not often not slept has
        ‘since she has often not slept’
   Adopting the null-hypothesis that negation needs to combine in semantics with a node denoting a proposition, the lowest possible node it can attach to is vP. It follows that oft/‘often’ must be generated at least as high as a vP-adjunct. Thus, the simple explanation for (40)b in terms of a base-order conflict cannot be maintained.
29. I follow Wiltschko (1998) in assuming that subject raising to SpecTP obviates Superiority because vP-external subjects are D-linked.
30. For contexts in which α and β do not c-command each other, as in (i), the definition in (43) predicts that both α and β should qualify as targets. This is so as in (i), there is a node dominating β, but not α, as well as a node dominating α, but not β:
   (i) [a ... [[..[..α..]..] ... [β ...]]]
   Thus, a base order such as (ii)a should feed (ii)b as well as (ii)c:
   (ii) a. Base: She showed [a book about whoα] to whoβ
        b. (About) whoα did she show [a book (about) tα] to whoβ
        c. (To) whoβ did she show [a book about whoα] (to) tβ
   Judgements about (ii)b and (ii)c appear to be unstable and speaker dependent (David Pesetsky, p.c.), a result which might be taken to support the correctness of (43).
31. If the data discussed in fn. 30 is taken to be representative, the root Merger condition could also be dropped for β.
References
Alexiadou, Artemis and Elena Anagnostopoulou
1998 Parametrizing AGR: Word Order, V-Movement and EPP-Checking. Natural Language and Linguistic Theory 16 (3): 491–539.
Anagnostopoulou, Elena
2003 The Syntax of Ditransitives: Evidence from Clitics. Berlin/New York: Mouton de Gruyter.
Bobaljik, Jonathan
1995 In Terms of Merge: Copy and Head Movement. In Papers on Minimalist Syntax, Robert Pensalfini and Hiroyuki Ura (eds.), 41–64. Cambridge: MITWPL.
Bobaljik, Jonathan and Susi Wurmbrand
2002 Relativized Phases. Talk University of Tübingen. Handout available at: http://www.arts.mcgill.ca/programs/linguistics/faculty/wurmbrand/research/index.html/
Chomsky, Noam
1993 A Minimalist Program for Linguistic Theory. In The View From Building 20, Ken Hale and Jay Keyser (eds.), 1–52. Cambridge, Mass.: MIT Press.
1995 The Minimalist Program. Cambridge, Mass.: MIT Press.
2000 Minimalist Inquiries: The Framework. In Step by step: Essays in honor of Howard Lasnik, Roger Martin, David Michaels, and Juan Uriagereka (eds.), 89–155. Cambridge, Mass.: MIT Press.
2001 Derivation by phase. In Ken Hale: A life in language, Michael Kenstowicz (ed.), 1–52. Cambridge, Mass.: MIT Press.
Chierchia, Gennaro
1992 Questions with Quantifiers. Natural Language Semantics 1 (2): 181–234.
Clements, Nick
1985 The Geometry of Phonological Features. Phonology Yearbook 2: 225–252.
Collins, Chris
1996 Local Economy. Cambridge, Mass.: MIT Press.
Engdahl, Elisabet
1986 Constituent Questions. Dordrecht: Reidel.
Epstein, Samuel D., Eric Groat, Ruriko Kawashima, and Hisatsugu Kitahara
1998 A Derivational Approach to Syntactic Relations. Oxford: Oxford University Press.
Frampton, John
1996 Expletive Insertion. In The Role of Economy Principles in Linguistic Theory, Chris Wilder, Hans-Martin Gärtner and Manfred Bierwisch (eds.), 36–57. Berlin: Akademie Verlag.
Fox, Danny
2000 Economy and Semantic Interpretation. Cambridge, Mass.: MIT Press.
Gärtner, Hans-Martin
1999 Phrase-Linking meets the Minimalist Program. In Proceedings of WCCFL XVIII, S. Bird et al. (eds.), 159–169. Stanford: CSLI Publications.
2002 Generalized Transformations and Beyond. Berlin: Akademie Verlag.
Groenendijk, Jeroen and Martin Stokhof
1984 Studies on the Semantics of Questions and the Pragmatics of Answers. Doctoral diss., University of Amsterdam.
Hagstrom, Paul
1998 Decomposing Questions. Ph.D. diss., Department of Linguistics, MIT.
Heck, Fabian and Gereon Müller
2000 Successive cyclicity, long-distance superiority, and local optimization. In Proceedings of WCCFL XIX, 101–114. Stanford: CSLI Publications.
Hendrick, Randall and Michael Rochemont
1982 Complementation, multiple wh and echo questions. Ms., University of North Carolina.
Holmberg, Anders and Christer Platzack
1995 The Role of Inflection in Scandinavian Syntax. Oxford: Oxford University Press.
Hornstein, Norbert
1995 Logical Form: From GB to Minimalism. Cambridge, Mass.: Basil Blackwell.
Huang, C.-T. James
1982 Logical Relations in Chinese and the Theory of Grammar. Ph.D. diss., Department of Linguistics, MIT.
1995 Logical Form. In Government and Binding Theory and the Minimalist Program, Gert Webelhuth (ed.), 127–173. Oxford: Blackwell.
Johnson, Kyle
1991 Object Positions. Natural Language and Linguistic Theory 9 (4): 577–636.
Winfried Lechner
Kayne, Richard S.
1994 The Antisymmetry of Syntax. Cambridge, Mass.: MIT Press.
Kitahara, Hisatsugu
1995 Target alpha: Deducing Strict Cyclicity from Derivational Economy. Linguistic Inquiry 26 (1): 47–78.
van de Koot, Hans
1996 Strong features, pied-piping and the overt/covert distinction. UCL Working Papers in Linguistics 8: 315–328.
Koizumi, Masatoshi
1995 Phrase Structure in Minimalist Syntax. Ph.D. diss., Department of Linguistics, MIT.
Lebeaux, David
1988 Language Acquisition and the Form of the Grammar. Ph.D. diss., Department of Linguistics, University of Massachusetts, Amherst.
1990 Relative Clauses, Licensing, and the Nature of the Derivation. Proceedings of NELS 20, 318–332. Amherst: Graduate Linguistic Student Association.
Lechner, Winfried
1999 Comparatives and DP-Structure. Ph.D. diss., Department of Linguistics, University of Massachusetts, Amherst.
2003 Phrase Structure Paradoxes, Movement and Ellipsis. In K. Schwabe and S. Winkler (eds.), The Interfaces. Berlin/New York: Mouton de Gruyter.
Legate, Julie
to appear Some Interface Properties of the Phase. To appear in Linguistic Inquiry.
McGinnis, Martha
1998 Locality in A-Movement. Ph.D. diss., Department of Linguistics, MIT.
Müller, Gereon
1998 Incomplete Category Fronting. Dordrecht: Kluwer Academic Publishers.
Nissenbaum, John
2001 Investigations of covert phrase movement. Ph.D. diss., Department of Linguistics, MIT.
Nunes, Jairo
1996 On Why Traces Cannot Be Phonetically Realized. In Proceedings of NELS 26, Harvard University and MIT, Kiyomi Kusumoto (ed.), 211–226. Amherst: Graduate Linguistic Student Association.
Ochi, Masao
1998 Move or Attract? In Proceedings of WCCFL XVI, 319–333. Stanford: CSLI Publications.
Oka, Toshifusa
1993 Minimalism in syntactic derivation. Ph.D. diss., Department of Linguistics, MIT.
Pesetsky, David
1989 Language-Particular Processes and the Earliness Principle. Ms., MIT.
Pesetsky, David and Esther Torrego
2001 T-to-C Movement: Causes and Consequences. In Ken Hale: A Life in Language, Michael Kenstowicz (ed.), 355–426. Cambridge, Mass.: MIT Press.
Reinhart, Tanya
1997 Quantifier Scope: How Labor is Divided between QR and Choice Functions. Linguistics and Philosophy 20 (4): 335–397.
Richards, Norvin
2001 A Distinctness Condition on Linearization. Ms., MIT. Paper available at: http://web.mit.edu/norvin/www/home.html
2002 Against Bans on Lowering. Ms., MIT.
Sauerland, Uli
1998 The Meaning of Chains. Ph.D. diss., Department of Linguistics, MIT.
2000 Syntactic Economy and Quantifier Raising. Ms., Univ. of Tübingen. Paper available at: www2.sfs.nphil.uni-tuebingen.de/home/uli/www/
Stabler, Edward
1997 Derivational Minimalism. Ms., UCLA. Paper available at: www.humnet.ucla.edu/humnet/linguistics/people/stabler/epspub.htlm
Starke, Michal
2001 Move dissolves into Merge: a Theory of Locality. Ms., New York University. Paper available at: http://theoling.auf.net/papers/starke_michal/move_is_merge.pdf
Takano, Yuji
1995 Predicate Fronting and Internal Subjects. Linguistic Inquiry 26 (2): 327–340.
Tsai, Wei-tien Dylan
1994 On Economizing the Theory of A-Bar Dependencies. Ph.D. diss., Department of Linguistics, MIT.
Ura, Hiroyuki
1996 Multiple Feature Checking. Ph.D. diss., Department of Linguistics, MIT.
Wilder, Chris and Hans-Martin Gärtner
1996 Introduction. In The Role of Economy Principles in Linguistics, Chris Wilder, Hans-Martin Gärtner and Manfred Bierwisch (eds.), 1–35. Berlin: Akademie-Verlag.
Williams, Edwin
1994 Thematic Structure in Syntax. Cambridge, Mass.: MIT Press.
Wiltschko, Martina
1998 Superiority in German. In Proceedings of WCCFL XVI, Stanford. E. Curtis, J. Lyle and G. Webster (eds.), 431–445. Stanford: CSLI Publications.
Winter, Yoad
2003 Functional Quantification. To appear in Research on Language and Computation.
Zwart, C. Jan-Wouter
1996 “Shortest Move” versus “Fewest Steps”. In Minimal Ideas, Werner Abraham, Samuel D. Epstein, Höskuldur Thráinsson and C. Jan-Wouter Zwart (eds.), 329–346. Amsterdam: John Benjamins.
Minimality in a lexicalist Optimality Theory
Hanjung Lee
1. Introduction

A classic problem in generative syntax has been to explain minimality effects in syntactic relations, i.e., the requirement that movements or dependencies be as short as possible. Recent research within the Minimalist Program (MP; Chomsky 1993, 1995) has attempted to explain minimality effects in movement by appealing to the Minimal Link Condition (MLC) (Chomsky 1995) in (1) (the notion of closeness here is to be understood in terms of c-command).

(1) Minimal Link Condition (MLC)
    K attracts α only if there is no β, β closer to K than α, such that K attracts β.
Chomsky (1995) sees the MLC as part of the definition of legitimate movement, i.e., movement that does not obey the MLC, like the derivation shown in (2b), is not defined as a possible derivation step.

(2) a. who1 do you expect t1 to say what?
    b. *what1 do you expect who to say t1?
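The blocking logic of (1) can be sketched procedurally. The following Python fragment is only an illustration and not part of Chomsky's formulation: it assumes that the potential attractees are given as a list ordered from closest to K to farthest from K (list position standing in for c-command), and the names `may_attract` and `CLAUSE_2` are hypothetical.

```python
# Minimal sketch of the MLC in (1); the data layout and names are illustrative assumptions.

def may_attract(elements, feature, alpha):
    """Return True if K may attract `alpha` for `feature` under the MLC.

    `elements` lists (word, features) pairs ordered from closest to K to
    farthest from K; list position stands in for c-command here.
    """
    for word, features in elements:
        if feature in features:
            # The first matching element is the closest one K can attract;
            # any more distant matching element is blocked by it.
            return word == alpha
    return False  # nothing bears the feature, so K attracts nothing


# Elements of (2): the subject `who` is closer to the matrix C than the object `what`.
CLAUSE_2 = [("who", {"wh"}), ("what", {"wh"})]
```

Under these assumptions, `may_attract(CLAUSE_2, "wh", "who")` holds, in line with (2a), whereas `may_attract(CLAUSE_2, "wh", "what")` fails, in line with the deviance of (2b).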
Most MLC-based accounts of minimality effects focus on the basic contrast in (2), involving subject and object wh-phrases. Similar minimality effects found in A-movement such as Case-driven NP movement have received little attention in the literature, despite their important implications for syntactic theory relating to the formal mechanisms for capturing minimality. The main goal of this paper is to give an account of this largely ignored kind of minimality effect, focusing on an analysis of the phenomenon called word order freezing. In many languages with flexible word order, NP movement of, for example, an object NP is blocked when a subject NP of the same type intervenes between its base-generated position and the target position. Consider the Hindi example in (3), which contains subject and object NPs bearing identical case markings (nominative). As noted by Mohanan (1992), the order of the two nominative NPs is fixed as subject-object order (without strong contextual licensing of the reverse word order).

(3) patthar ṭhelaa toḍegaa.
    stone-NOM cart-NOM break-FUT
    (i) ‘The stone will break a cart.’
    (ii) *‘The cart will break a stone.’
Although object-subject order is generally possible in Hindi, it becomes impossible if subject and object are in the same case. Such word order “freezing” effects could be accounted for as an MLC effect: the nominative object cannot be moved, because a similar element intervenes between its base-generated position and the target position. However, this line of analysis comes up against two serious obstacles. First, it seems impossible to formulate the MLC with no counterexamples. Second, for the MLC to be applied to the word order freezing effect, it has to be formulated in such a way that it makes direct reference to the specific PF-property of two movable elements (e.g., surface ambiguity or identity in case form), which perhaps plays no role elsewhere in the grammar. This is an extremely unnatural situation in a derivational model of grammar like MP where LF feeds PF, and not vice versa. In this paper, I present an account of minimality effects in A-movement, exemplified by the word order freezing effect in Hindi, without problematic recourse to the MLC. Specifically, the analysis is developed within the framework of Optimality Theoretic Lexical-Functional Grammar (OT-LFG; Bresnan 2000; Kuhn 2001a; Sells 2001a, 2001b), which embeds LFG’s nonderivational system of correspondence between parallel structures within the OT theory of constraint interaction. Crucially, I will show that minimal link effects can be derived as an emergent property of the interaction of violable, universal constraints that are independently motivated in a specific version of OT which takes both generation and interpretation perspectives (Smolensky 1996b, 1998). Though the focus of this paper is the non-standard case of minimality effects, the analysis presented here has interesting implications for the nature of a proper account of standard MLC phenomena and economy effects in syntax.
2. Interaction of case and word order in Hindi
In this section, I will present data from Hindi illustrating word order freezing in clauses which contain arguments bearing identical case markings, after discussing the overall case and word order patterns in Hindi.
2.1. Case patterns in Hindi

Hindi is a language with an aspectually-based split ergative case system, such that ergative case is restricted to the agentive subject in a perfective clause. Otherwise, the subject is nominative.1 Conditions on ergative marking in Hindi make crucial reference to the semantic property of agency or volitionality (which Mohanan (1994) refers to as ‘conscious choice’) (Mohanan 1994; Butt and King 2002). The verb uṭhaa ‘lift’ in (4) takes only an ergative subject, given the required aspectual condition. The action referred to by such verbs must be deliberate (Mohanan 1994: 73).2

(4) a. raam-ne bacce-ko uṭhaayaa.
       Ram-ERG child-ACC lift-PERF
       ‘Ram lifted the/a child.’
    b. *raam bacce-ko uṭhaayaa.
       Ram-NOM child-ACC lift-PERF
       ‘Ram lifted the/a child.’
Direct objects in Hindi bear either accusative case (marked with -ko) or nominative case, which has no phonological realization. The choice between accusative and nominative is independent of perfectivity and is instead determined by both animacy and definiteness. According to the literature on object case in Hindi, Hindi distinguishes three categories of direct objects:3 (i) those which must be accusative, (ii) those which are either nominative or accusative, and (iii) those which can only be nominative but not accusative. Obligatorily accusative objects are those object NPs referring to humans. The categories of objects that can be either nominative or accusative are human-referring non-specifics and inanimate definites; inanimate-referring non-specifics and specifics can only be nominative. Hindi has various other verb classes showing different case patterns. The discussion in this paper is restricted to the class of transitive verbs which exhibit aspectually conditioned split ergativity and two-dimensional split accusativity. For detailed analyses of case patterns in different verb types, see Mohanan (1994), Butt and King (2002) and Lee (2003a), among others.
2.2. Word order freezing in Hindi

Hindi is a right-headed language with SOV canonical order. However, unlike Japanese and Korean, the surface order of elements is not strictly head-final. The possible permutations of a simple Hindi sentence are shown in (5).

(5) a. anuu-ne caand dekhaa.
       Anu-ERG moon-NOM see/look at-PERF
       ‘Anu saw the moon.’
    b. caand Anuu-ne dekhaa.
    c. anuu-ne dekhaa caand.
    d. caand dekhaa Anuu-ne.
    e. dekhaa Anuu-ne caand.
    f. dekhaa caand Anuu-ne.
Sentence (5a) reflects the ‘basic’, ‘canonical’ or ‘unmarked’ order, and the other orders are deviations from this canonical order (Gambhir 1981; Mohanan 1992, 1994; Mohanan and Mohanan 1994). Such deviations are used to mark a special information structure and are generally associated with shifts in prominence, emphasis and semantic effects (e.g., definiteness effects). In the competition-based approach to word order variation taken here, the word orders found prominently in a particular language are interpreted as language-specific solutions to the conflict arising among the different forces affecting linearization patterns. This idea is made explicit in recent optimization-based studies by Grimshaw and Samek-Lodovici (1998), Samek-Lodovici (2001), Costa (1998, 2001) and Choi (1999), who suggest that different word orders are not optional but the result of different functional specifications in the input. Under this view, basic word order (i.e., the word order found in sentence-focus contexts) is not to be understood as underived word order (i.e., the basic phrase structural position in which arguments receive their ‘theta role’) but rather as a consequence of optimization for inputs in which no elements are specified for topic or focus or in which the entire clause is specified for focus. This order is what is called ‘canonical’ or ‘unmarked’ word order (Choi 1999).
Despite a high level of word order freedom in this language, under certain circumstances, free word order freezes into a fixed, canonical order. One environment for restricted word order variation occurs when a sentence contains subject and object NPs bearing identical case markings.4 Let us first consider examples of the double nominative construction in (3), repeated in (6) below.5

(6) patthar ṭhelaa toḍegaa.
    stone-NOM cart-NOM break-FUT
    (i) ‘The stone will break a cart.’
    (ii) *‘The cart will break a stone.’
The subject patthar ‘stone’ in (6) is nominative because the transitive verb toḍ ‘break’ is not in perfective aspect, and the inanimate object ṭhelaa ‘cart’ is also nominative.6 The examples in (6) and (7) show that the order of the two nominative constituents is “frozen” as SOV. This happens in a null context and in certain special discourse contexts (e.g., in an all-focus context). Reversing the order of the two arguments in (6) yields a new sentence in (7) in canonical SOV order rather than maintaining its meaning with an OSV order.

(7) ṭhelaa patthar toḍegaa.
    cart-NOM stone-NOM break-FUT
    (i) ‘The cart will break a stone.’
    (ii) *‘The stone will break a cart.’
This contrasts with the sentences in (5), which allow all six possible orderings of subject, object and verb without change in the basic sentential meaning. This difference reflects a well-known crosslinguistic generalization: languages with rich morphological resources for grammatical function specification (case marking in dependent-marking languages and agreement/ pronominal incorporation in head-marking languages) tend to make less use of fixed phrase structures, whereas languages poor in morphology overwhelmingly tend to have rigid phrase structures. Word order freezing occurs in various other kinds of constructions in Hindi. This is not surprising given the fact that many of the case markings are used with more than one meaning. For example, the marking -se indicates instrument, source, path, the demoted subject of a passive, and so on. In (9), the passive of (8), both the demoted agent and the source bear the case marker -se. 7
(8) coor-ne kal ravii-se paise curaae.
    thief-ERG yesterday Ravi-from money-NOM steal-PERF
    ‘The/a thief stole money from Ravi yesterday.’

(9) a. coor-se kal ravii-se paise curaae gae.
       thief-INST yesterday Ravi-from money-NOM steal-PERF go-PERF
       (i) ‘Money was stolen from Ravi by the/a thief yesterday.’
       (ii) *‘Money was stolen from the thief by Ravi yesterday.’
    b. ravii-se kal coor-se paise curaae gae.
       Ravi-INST yesterday thief-from money-NOM steal-PERF go-PERF
       (i) ‘Money was stolen from the/a thief by Ravi yesterday.’
       (ii) *‘Money was stolen from Ravi by the thief yesterday.’
    c. ravii-se kal paise coor-se curaae gae.
       Ravi-INST yesterday money-NOM thief-from steal-PERF go-PERF
       (i) ‘Money was stolen from the/a thief by Ravi yesterday.’
       (ii) *‘Money was stolen from Ravi by the thief yesterday.’
Grammatical function and thematic role are often closely aligned in Hindi. Therefore, it is difficult to distinguish which of the two influences ordering. However, the examples in (9) provide justification for the proposal made by Mohanan and Mohanan (1994) and others that it is in fact thematic role, rather than grammatical function, that determines canonical order. In spite of the fact that the initial -se marked NP in (9a,b,c) is a passive agent and hence an ADJUNCT function, which is lower on the grammatical function hierarchy (10) 8 than the nominative subject, it canonically precedes the subject. As indicated by the glosses, the initial -se marked NP is considered an agent, and the second a source OBLIQUE in accordance with the thematic role hierarchy.

(10) Grammatical Function Hierarchy
     SUBJ > OBJ > SEC. OBJ > OBL > ADJ
     (Bresnan 2001; Mohanan and Mohanan 1994)

Therefore, following Mohanan and Mohanan (1994) and also Sharma (1999), I assume that the canonical or unmarked word order in Hindi conforms to the thematic role hierarchy:

(11) Thematic Role Hierarchy (Bresnan and Kanerva)
     agent > beneficiary > experiencer/goal > instrument > patient/theme > locative
The effect of thematic roles on canonical word order is further manifested in the contrast between (12a) and (12b). Notice that the locative-OBL in (12b) is closer to the verb than the theme-OBJ: if canonical order were determined by grammatical functions, we would expect the object to be adjacent to the verb:

(12) a. ilaa-ne anuu-ko ek haar bhejaa.
        Ila-ERG Anu-DAT a necklace-NOM send-PERF
        ‘Ila sent a necklace to Anu.’
     b. ilaa-ne anuu-ko ek šahar bhejaa.
        Ila-ERG Anu-DAT a city-LOC send-PERF
        ‘Ila sent Anu to a city.’

The dative case marker and the accusative case marker in Hindi are also identical: they are both -ko. The main verb sikh ‘teach’ in (13), when it is on its own, takes ergative or nominative case on the subject, depending on whether or not the verb is in perfective aspect. When the verb is combined with an indirect case inducing modal, however, the subject takes dative case, as shown by (13). In this construction too, reversing the order of the two nominals marked with -ko generates a different meaning in (ii).

(13) raam-ko ilaa-ke bacce-ko gaanaa sikhaanaa hai.
     Ram-DAT Ila’s child-ACC music-NOM teach-NF be-PRES
     (i) ‘Ram has to teach music to Ila’s child.’
     (ii) *‘Ila’s child has to teach music to Ram.’

The examples above demonstrate that grammatical functions and thematic roles do not necessarily coincide in their predictions, so that one cannot assume only one canonical word order even within a single language. Furthermore, as typological word order studies have shown, the relative strength of the grammatical factors affecting canonical word order varies from language to language. These facts entail that there must be a language-particular system defining the balance of competing linearization factors.
The Hindi examples above reveal the following generalization, based on Mohanan (1992):

(14) Generalization: Canonical word order determined by the thematic role hierarchy becomes fixed if the case markings on two nominal arguments of a single predicate are identical under two alternative thematic role interpretations of the nominals.
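The descriptive content of (14) can be sketched as a small procedure. The Python fragment below is a hedged illustration only, not the OT analysis developed later in the paper: it assumes that a predicate is described by a list of (role, marker) requirements, that zero marking is written `"ZERO"`, and that the function names `readings` and `interpret` and the two frames are hypothetical. Contextual licensing of marked orders is ignored.

```python
from itertools import permutations

# Thematic role hierarchy as given in (11), highest role first.
THEMATIC_HIERARCHY = ["agent", "beneficiary", "experiencer/goal",
                      "instrument", "patient/theme", "locative"]


def readings(nominals, frame):
    """All role assignments whose required markers match the surface markers.

    `nominals`: (word, marker) pairs in surface order.
    `frame`: (role, marker) pairs required by the predicate (same length).
    """
    found = []
    for assignment in permutations(frame):
        if all(marker == required
               for (_, marker), (_, required) in zip(nominals, assignment)):
            found.append(tuple(role for role, _ in assignment))
    return found


def interpret(nominals, frame):
    """Apply (14): if case marking leaves the roles ambiguous, only the
    reading in which linear order follows the thematic hierarchy survives."""
    options = readings(nominals, frame)
    if len(options) <= 1:
        return options
    rank = THEMATIC_HIERARCHY.index
    return [r for r in options if list(r) == sorted(r, key=rank)]


# Assumed frames: non-perfective 'break' as in (6), perfective 'see' as in (5).
BREAK_FUT = [("agent", "ZERO"), ("patient/theme", "ZERO")]
SEE_PERF = [("agent", "-ne"), ("patient/theme", "ZERO")]
```

With these assumptions, the double nominative string of (6) receives only the agent-before-theme reading, while the object-initial order of (5b) is still interpretable because the ergative marker identifies the agent regardless of position.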
Quite independently of the relevance of freezing effects to the MLC, developing a syntactic account of word order freezing is motivated by the observation that it mirrors a broad crosslinguistic generalization about the ‘inverse’ relation between the amount of information about grammatical function expressed by case marking and the amount expressed by phrase structures. The correlation between case marking and word order flexibility in scrambling languages seems intuitive. However, to date, pre-OT generative approaches to word order variation have not been successful. This is due to the basic architectural properties of generative models of syntax. In GB theories of syntax (Chomsky 1981, 1986), which view order as an abstract underlying property of sentences, the problem of accounting for surface orderings is handled together with other aspects of structure such as Case and Agreement. As Lee (2001b) points out, such theories can easily account for word order variation within a particular language in terms of various movement processes, but lack any principled explanation of the coexistence of the flexibility and invariance of word order within languages. The word order freezing effect widely observed in scrambling languages does not follow naturally from frameworks which do not employ explicit transformational movement, either.9 In order to capture the generalization that ‘morphology competes with syntax’ both within and across languages formally, a mechanism like the candidate evaluation in OT is required.
3. A parallel correspondence-based OT model

OT is a grammar formalism developed by Prince and Smolensky in the early 1990s. This new framework brings certain general connectionist computational principles into generative grammar. The connectionist concept of intuitive knowledge as a set of conflicting soft constraints, interacting via optimization of well-formedness, combines with linguistic representations and universal constraints from generative grammar. According to OT, a grammar is a system of conflicting universal constraints which are violable and ranked in a dominance hierarchy. Variation across languages reflects the resolution of conflicts among violable universal constraints. A surface form is ‘optimal’ in the sense that it incurs the least serious violations of a set of violable universal constraints, ranked in a language-specific hierarchy. OT has been applied to a number of areas of linguistic research since its extraordinary success in the domain of phonology. Since the pioneering work of Grimshaw (1997), a growing body of work has shown that many of the motivations for the OT approach to phonology are paralleled in syntax and semantics. OT is a theory of interactions of grammatical constraints, not a theory of representations. Thus, it is compatible with a wide range of representational formats. In this study, I assume the formal framework of Lexical-Functional Grammar (LFG: Bresnan and Kaplan 1982; Bresnan 2001) recast within the OT framework (OT-LFG: Bresnan 2000; Choi 1999; Kuhn 2001a, 2001b; Sells 2001a, 2001b). The main advantage of using LFG as a representational basis is that its very basic architecture is based on simultaneous competition between parallel, co-present structures. Hence, a theory like OT-LFG which sees constraint ranking as the means to resolve conflicts among a set of constraints in parallel structures provides an appropriate framework in which to approach the problem of word order freezing. Moreover, due to its parallel, correspondence-based architecture, OT-LFG allows a coherent and explicit formulation of markedness and faithfulness constraints in syntax without having to refer to framework-internal mechanisms of the transformational GEN. The general form of an OT grammar that I assume in this work is illustrated in (15).

(15) OT-LFG Framework
In OT (Prince and Smolensky 1993), a grammar is a function mapping each linguistic input to its correct structural description or output. Within the OT-LFG framework (Bresnan 2000), the input is taken to be a (possibly underspecified) feature structure representing some given morphosyntactic content independently of its language-particular forms of expression, and the universal input is modelled by sets of f(unctional)-structures. Given an underspecified input f-structure, a set of output candidates (i.e., possible types of formal realizations of that input that are available across languages) is generated by the GEN(ERATOR). The essential property of GEN is its universality: input and candidate set are the same for all languages (this assumption is called ‘richness of the base’ (Smolensky 1996a)). For learnability, the input must be recoverable from the output (Tesar and Smolensky 1998). Therefore, the candidate set cannot simply be surface forms of expressions alone. In OT-LFG the recoverability of the abstracted f-structure input from the output containing the overt forms of expressions is ensured by taking GEN to be a universal LFG which generates the possible types of candidate c(onstituent)-structures and their corresponding f-structures. More accurately, the candidates are thought of as quadruples consisting of c-structures (lexical strings and trees [T]), f-structures and m(orpholexical)-structures (\ and h) and their correspondence functions. Following Kuhn (2001a, 2001b), I assume that all candidates satisfy certain basic inviolable principles (e.g., Uniqueness, Coherence, Completeness, Extended Coherence, Argument-Function Uniqueness, X′-Theory; see Bresnan (2001) for a detailed discussion), which can be encoded in an LFG grammar. That is, in the OT-LFG framework, the candidates for a given input can be defined as the structures generated by the LFG grammar encoding the inviolable principles, which are subsumed by the input in their f-structure. The evaluation of these candidate structures is the function of EVAL(UATOR), the component of ranked, violable constraints. The following section illustrates this approach with an analysis of case patterns and constituent ordering in Hindi.
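The role of EVAL under strict domination can be made concrete with a short sketch. The Python fragment below is an illustration under stated assumptions, not a piece of the OT-LFG formalism itself: candidates are represented simply by their violation counts, and the function name `evaluate`, the tableau, and the abstract constraints C1 and C2 are hypothetical.

```python
def evaluate(tableau, ranking):
    """Return the optimal candidate(s) of an OT tableau.

    `tableau` maps each candidate to a dict {constraint name: violations};
    `ranking` lists constraint names from highest- to lowest-ranked.
    """
    def profile(candidate):
        marks = tableau[candidate]
        return tuple(marks.get(name, 0) for name in ranking)

    # Tuples compare lexicographically, which mirrors strict domination:
    # a lower-ranked constraint matters only if all higher-ranked ones tie.
    best = min(profile(c) for c in tableau)
    return [c for c in tableau if profile(c) == best]


# Hypothetical tableau with two abstract constraints C1 and C2.
TABLEAU = {
    "cand_a": {"C1": 0, "C2": 2},
    "cand_b": {"C1": 1, "C2": 0},
}
```

With the ranking C1 >> C2, `evaluate(TABLEAU, ["C1", "C2"])` selects `cand_a` even though it violates C2 twice; reversing the ranking selects `cand_b`. This is the sense in which variation across languages is modelled as reranking of the same constraints.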
4. OT constraints on case and constituent ordering

In this and the following sections, I develop an OT account of word order freezing effects in Hindi. The key to the successful account of this problem is how to formally explicate the relation between the two alternative means of encoding the same grammatical relations, i.e., word order and case morphology. In this section, I present major constraints on case (section 4.2) and constituent ordering (section 4.3), after characterizing morpholexical representations for the case markers (section 4.1).
4.1. Lexical representations of case markers

To get the analysis of the Hindi facts off the ground, we first need to characterize the content of the lexical representations for the case markers. In the OT-LFG approach adopted here, case markers are not viewed as being licensed by any particular formal feature or lexical item. Rather, they are seen as active elements which contribute to the construction of a clausal analysis by carrying lexically specified information, in line with the recent proposals for Constructive Case formulated by Nordlinger (1998). For instance, the universal content of the ergative case, independently of its forms of expression, can be characterized as follows: 10

(16) Ergative case marker:
     (CASE) = ERG
     (L-SUBJarg–str)
     (sem–str VOL) = +
     (sem–str CAUS) = +

In addition to the regular case feature information, the ergative case marker carries two pieces of information: (i) information about the higher a(rgument)-structure within which it is contained, i.e., L(OGICAL)-SUBJ (the highest argument role),11 via inside-out function application (Nordlinger 1998); (ii) the semantic properties of the nominal to which it belongs (the agentive properties associated with volitional agents/causers (Dowty 1991)). The information in (16) thus corresponds to the characterization of ergative case as the lexical marker of agentivity. 12 Not all languages have an ergative case which expresses all the content shown in (16). In some languages (e.g., the Australian language Dyirbal (Dixon 1979)), for instance, the use of the ergative is purely structural and is not tied to the semantic features listed in (16). This non-semantic use of the ergative can be modeled as a case of unfaithfulness in a way analogous to the expletive do in English (Grimshaw 1997; Bresnan 2000). It is possible that case morphemes are unspecified for the grammatical and semantic features, as in (17), and unspecified case is what is conventionally called ‘nominative’. This featureless default case is called “absolutive” if it contrasts with ergative case. Postulation of “absolutive” as a separate case, though useful at a descriptive level, has for some time been recognized as inadequate, because it obscures generalizations that cut across ergative/absolutive and nominative/accusative paradigms. Goddard (1982), who shows that positing two separate case oppositions (ergative vs. absolutive and nominative vs. accusative) is incorrect for Australian languages, proposes to eliminate “absolutive” as a category.
Following this proposal, I treat so-called “absolutive case” as the same case as nominative. In the present analysis, the nominative is the least marked case, being most general in the featural specification. Its meaning and distribution arise from constraint interaction. Having characterized the content of the lexical representations of the case morphemes, let us next define our constraint set.

(17) Nominative case marker: (CASE) = NOM
4.2. OT Constraints on case

Before we move on to the discussion of OT constraints on case, it is essential to clarify the distinction between the case system and the inventory of case markers (Wierzbicka 1981; Goddard 1982; Mohanan 1994; Blake 2001). Case features conventionally labelled as nominative, ergative, and so on are entities drawn from a universal inventory and crosslinguistically characterizable in terms of their basic distribution and their interaction with other features. Case markers are the actual morphological realizations of the abstract case features. In Hindi, the features nominative, ergative, accusative and dative are associated with the following case markers:

(18) Association between case features and case markers in Hindi

     FEATURE   MARKING   SYNTAX AND SEMANTICS
     NOM       ø         subject; inanimate direct object
     ERG       -ne       agentive subject with verb in perfective aspect
     ACC       -ko       primary object
     DAT       -ko       goal
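The many-to-one relation between case features and case markers in (18) can be written down as a simple lookup. The following Python fragment is a minimal sketch for illustration; the dictionary layout (with the empty string standing for zero marking) and the function name `features_of` are assumptions, not part of the analysis.

```python
# Table (18) as a mapping from case feature to case marker ("" = zero marking).
CASE_MARKERS = {"NOM": "", "ERG": "-ne", "ACC": "-ko", "DAT": "-ko"}


def features_of(marker):
    """Return the case features that a given surface marker can realize."""
    return sorted(f for f, m in CASE_MARKERS.items() if m == marker)
```

Here `features_of("-ko")` returns both ACC and DAT, which is the kind of identity in case form that sets the stage for freezing in examples like (13), whereas `features_of("-ne")` returns ERG alone.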
Within the conception of linguistic structure in this work, the universal case features are part of the subsystem of grammatical features represented by candidate f-structures, while the case markings associated with them belong to the morphophonological component of the grammar. In this paper, I will be concerned only with case features, rather than with different realizations of them. The set of OT constraints on case features that I will assume here builds on the work of Lee (2001b, 2002, 2003a) on the case systems of Hindi and Korean. A leading idea of Lee’s approach to case is that case patterns can be treated as the result of interaction of two types of constraints fundamental to OT: faithfulness constraints, which minimize mismatches between feature specifications in two output feature structures, and markedness constraints, which prohibit certain case features and forms. Following Kuhn (2001a, 2001b), I assume that the output has faithfulness relations between the global f-structure and m-structure (featural specifications projected from the preterminal node of c-structure): grammatical information associated with the word in the c-structure must match the information in the global f-structure. This is the relation that constitutes FAITH-OfOm.
The faithfulness constraints relevant for the present discussion are given in (19). These constraints implement the OT-LFG mechanism for the licensing of “semantic” case:

(19) OfOm-Faithfulness constraints I
     a. IDENT-OfOm(SEM): If the output f-structure and morphological feature structure both have a SEM feature (VOL, CAUS), the values are identical (e.g., IDENT-OfOm(VOL), IDENT-OfOm(CAUS)).
     b. MAX-OfOm(SEM): If the output f-structure has a SEM feature, its corresponding morphological feature structure also has a SEM feature (e.g., MAX-OfOm(VOL), MAX-OfOm(CAUS)).
     c. DEP-OfOm(SEM): If the morphological feature structure has a SEM feature, its corresponding f-structure also has a SEM feature (e.g., DEP-OfOm(VOL), DEP-OfOm(CAUS)).

The IDENT-OfOm(SEM) constraints check the semantic compatibility between a case marker and the meaning of a verb or a clause, and the MAX-OfOm(SEM) and DEP-OfOm(SEM) constraints check the specificity of semantic features. Generally, marking of an argument by a less specific case or a more specific case leads to MAX and DEP violations respectively, whereas the use of a semantically incompatible case marker leads to IDENT violations. Since semantic incompatibility is generally not allowed, the IDENT-OfOm(SEM) constraints should not be violated and are therefore among the highest-ranked constraints. These constraints are relevant for the choice between ergative and dative cases (IDENT-OfOm(SEM)) and between nominative and more specific cases (MAX-OfOm(SEM) and DEP-OfOm(SEM)). The OO-faithfulness constraints mentioned above are in conflict with the markedness constraints which penalize the featural complexity of candidates: 13

(20) Markedness constraints (*Feature)
     *ERGATIVE, *DATIVE, *ACCUSATIVE >> *NOMINATIVE

These constraints were proposed in Woolford (2001), who suggests the universal hierarchy of *ERG, *DAT >> *ACC >> *NOM.
My approach here also uses the notion of a case subhierarchy; in this analysis, however, I simply assume that nominative is universally the least marked case, without assuming an absolute markedness hierarchy for other cases.
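The three faithfulness constraints in (19) can be read as simple checks on a pair of feature structures. The following Python fragment is a minimal sketch of that reading and is not part of the original analysis: it assumes that the SEM features of an argument in the f-structure and of its case marker in the m-structure can be encoded as dictionaries from feature names to values. The function name `faith_violations` and the value "+" are hypothetical.

```python
def faith_violations(f_sem, m_sem):
    """Count IDENT, MAX and DEP violations for one case-marked argument.

    f_sem: SEM features of the argument in the output f-structure
           (hypothetical dictionary encoding, e.g. {"VOL": "+"}).
    m_sem: SEM features in the m-structure of the case marker.
    """
    shared = set(f_sem) & set(m_sem)
    return {
        # IDENT: a feature present in both structures must have the same value.
        "IDENT": sum(1 for k in shared if f_sem[k] != m_sem[k]),
        # MAX: each f-structure feature needs a correspondent in the m-structure.
        "MAX": len(set(f_sem) - set(m_sem)),
        # DEP: each m-structure feature needs a correspondent in the f-structure.
        "DEP": len(set(m_sem) - set(f_sem)),
    }


# A volitional agent marked with the (unspecified) nominative.
print(faith_violations({"VOL": "+"}, {}))
# A non-volitional argument marked with the [VOL]-bearing ergative.
print(faith_violations({}, {"VOL": "+"}))
```

Under this encoding, the less specific marker incurs a MAX violation and the more specific marker a DEP violation, in line with the description above.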
Finally, we come to the contextual markedness constraints on the ergative case:

(21) Contextual Markedness constraints:
a. ERGperf: The highest argument role in a perfective clause must be in the ergative.
b. ERGtrans: The highest argument role of a transitive verb must be in the ergative.

In functional terms, these positive markedness constraints can be viewed as ‘naturalness’ constraints stating the relation between marking of the prominent argument (ergative) and the clausal contexts in which it is most preferred (perfective and transitive contexts). The markedness constraint in (21a) is abundantly supported by functional and typological observations. Much work on perfective-split languages suggests that the prevalence of perfective splits in ergative marking is not accidental. Masica (1991: 646), for example, suggests that such splits reflect a universal tendency towards greater transitivity and more complete affectedness of the patient in perfective predications (see also Hopper and Thompson (1980) and Givón (1984)). DeLancey (1981) addresses the issue of the interaction of aspect with case marking. Specifically, he proposes an account based on deixis and a spatial metaphor. The agent-patient relation can be viewed metaphorically as motion from the agent to the patient. Likewise, perfective aspect focuses on the endpoint of the event and its effect on the patient, while imperfective focuses on the activity of the agent rather than its outcome. DeLancey (1981) interprets the correlation between the marking of agents and the perfective aspect as representing the speaker’s viewpoint on the event: moving from agent to patient in the event is metaphorically interpreted as moving away from the speech situation. The imperfective aspect is therefore associated with an unmarked A argument, and the perfective aspect is associated with an unmarked P argument:

(22) a. subject (A)      object (P)
     b. imperfective     perfective

Woolford (2001) develops an OT account of aspectually based split ergativity in Hindi, based on faithfulness constraints that are contextually restricted to transitives and the perfective aspect in conjunction with markedness constraints and general faithfulness constraints. In her view, aspectually based splits in Hindi result when the markedness constraint against ergative case is ranked between the contextually restricted faithfulness constraint and the general faithfulness constraint. Using this approach, Woolford (2001) gets very interesting results in both split ergativity and split dativity. But there are two major problems with her approach. The first concerns the treatment of ergative case as “lexical/quirky”. Woolford’s account assumes, along with much work in the Minimalist Program, a structural distinction between structural Case and inherent Case. Structural Case is assumed to be licensed on an argument in a purely configurational way, i.e., in the proper structural relationship with the licensing head, whereas inherent Case (also called lexical or quirky case) is assigned to arguments associated with a particular theta-role in possible dependence on the governing predicates’ lexical properties. Thus, Woolford regards ergative (associated with agents), dative (associated with goals and experiencers), and accusative (associated with themes) as inherent Cases licensed by verbs that carry the specification of inherent Case licensing feature (called ‘lex’ in Woolford (2001)) in their lexical entry. Woolford carries over this conception of lexically determined inherent Case to her OT approach to Case. Let me make a few brief remarks about the way the licensing of inherent Case works. In Woolford’s system, licensing inherent Case involves faithfulness to lexical requirements, specifically inherent Case features of lexical items that are present in OT inputs.
While it is true that the association of the ergative case with volitionality or conscious choice and of the dative case with sentience/perception, for example, is not absolute and exceptionless, the view that ergative and dative case as lexical cases are “quirky” and therefore must be specified in the lexical entry of each verb is problematic. It is clearly undesirable to treat cases that are predictable on the basis of semantic information as involving faithfulness to the inherent Case licensing feature ‘lex’ of the verb, which is a purely abstract diacritic feature without substantive basis. Such an approach that relies on lexical stipulation is not extendable to other instances of semantic case that are sensitive to the aspectual property of the VP (e.g., the Finnish partitive), or quantificational properties. The conception of lexically determined inherent Case also departs somewhat from the spirit of OT that the cross-linguistic variation in surface realization of underlying arguments must be derived (as much as possible) as an effect of constraint interaction. In the present approach, a clear distinction is made between such lexical exceptions (“quirky” or “lexical” case) and a semantically meaningful use of case. Only the former kind of case is truly idiosyncratic in that its distribution is indeed not predictable from general constraints governing correspondences between the semantics of a verb or clause and case morphology, and therefore must be lexically stipulated. Semantic case, in contrast, is viewed as having semantics of its own, rather than being licensed by any particular feature or lexical item. 14
4.3. Prediction of types of ergative languages

Now let us examine the different types of ergative languages that are predicted, based on the possible rankings of violable markedness and faithfulness constraints. First, the classic type of ergative system, in which ergative is restricted to transitive clauses, can be described by the ranking in (23). If ERGtrans is ranked higher than *ERG, as in (23), the ergative must occur in transitives, because ERGtrans requires the use of the ergative feature only for the highest argument role of transitives. The relative ranking of *ERG with respect to other constraints makes ergative case unavailable in other contexts (i.e., intransitives).

(23) Classic type of ergative languages
ERGtrans >> *ERG >> FAITH-OfOm(SEM) >> *NOM, ERGperf

This purely structural (non-semantic) use of ergative is a case of unfaithfulness. More precisely, this use of ergative incurs a violation of DEP-OfOm(SEM), since the semantic features ([VOL] and [CAUS]) present in the m-structure of the ergative case marker have no correspondent in the output global f-structure (reflecting the clausal meaning). Promotion of FAITH-OfOm(SEM) above *ERG as in (24) would yield the active-stative type ergative language, where the ergative surfaces in intransitives (taking agentive subjects) as well as transitives.

(24) Active-stative type of ergative languages
FAITH-OfOm(SEM) >> *ERG >> *NOM, ERGtrans, ERGperf

If *ERG dominates all other constraints under consideration here, as in (25), then the ergative will be eliminated from the inventory of cases altogether.

(25) Non-ergative
*ERG >> all other constraints
Lastly, the aspectually based split ergative pattern in Hindi can be described by the ranking in (26):

(26) Aspectually based split ergative case system
ERGperf >> *ERG >> FAITH-OfOm(SEM), *ACC >> *NOM

The sandwiching of the constraint against ergative case between the contextual markedness constraint ERGperf and the general faithfulness constraint captures the widely observed case preemption pattern: when more than one case is available for one argument, indirect or semantic case often takes priority over direct or structural case. The ranking in (26) makes the correct prediction that ergative case surfaces in perfective clauses due to the priority of the contextual markedness constraint ERGperf (favoring semantic case) over *ERG; but outside that context, featural markedness takes over, and the subject is nominative, producing the aspectually conditioned split ergative system.
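The typology in (23)-(26) can be checked mechanically. The sketch below is an illustration under simplifying assumptions rather than the author's implementation: only the subject's case is evaluated, the two candidates are ERG and NOM, FAITH-OfOm(SEM) is collapsed into a single MAX/DEP check on [VOL], constraints separated by commas in the rankings are linearized left to right, and *ACC is left out because it concerns objects. All function and variable names are hypothetical.

```python
def violations(case, ctx):
    """Violation profile of a subject case ('ERG' or 'NOM') in a clause context."""
    return {
        "ERGtrans": int(ctx["transitive"] and case != "ERG"),
        "ERGperf": int(ctx["perfective"] and case != "ERG"),
        "*ERG": int(case == "ERG"),
        "*NOM": int(case == "NOM"),
        # FAITH(SEM): MAX violation if a volitional subject is nominative,
        # DEP violation if an ergative ([VOL]) marks a non-volitional subject.
        "FAITH": int(ctx["volitional"] != (case == "ERG")),
    }


def subject_case(ranking, ctx):
    """Return the optimal subject case under a strict ranking (list of names)."""
    def profile(case):
        marks = violations(case, ctx)
        return tuple(marks[c] for c in ranking)
    return min(("ERG", "NOM"), key=profile)


# Rankings (23)-(26), linearized left to right (an assumption for unranked sets).
CLASSIC = ["ERGtrans", "*ERG", "FAITH", "*NOM", "ERGperf"]
ACTIVE_STATIVE = ["FAITH", "*ERG", "*NOM", "ERGtrans", "ERGperf"]
NON_ERGATIVE = ["*ERG", "ERGtrans", "ERGperf", "FAITH", "*NOM"]
SPLIT = ["ERGperf", "*ERG", "FAITH", "*NOM"]

trans_perf = {"transitive": True, "perfective": True, "volitional": True}
trans_imperf = {"transitive": True, "perfective": False, "volitional": True}
intrans_agentive = {"transitive": False, "perfective": False, "volitional": True}

for name, ranking in [("classic", CLASSIC), ("active-stative", ACTIVE_STATIVE),
                      ("non-ergative", NON_ERGATIVE), ("split", SPLIT)]:
    row = [subject_case(ranking, c) for c in (trans_perf, trans_imperf, intrans_agentive)]
    print(name, row)
```

With these toy contexts, the classic ranking selects the ergative only for transitive subjects, the active-stative ranking also selects it for the agentive intransitive subject, and the split ranking selects it only in the perfective clause.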
4.4. OT constraints on constituent ordering

The flexibility in the order of a verb’s arguments in Hindi can be modeled by assuming competing sets of alignment constraints in OT:15 the grammatical function (GF)-based alignment constraints (27), the semantic role-based alignment constraints (28) and discourse-based alignment constraints (29).

(27) AlignGF:
a. SUBJ-L: Subject aligns left in the clause.
b. OBJ-L: Object aligns left in the clause.

(28) Alignθ:
a. PA-L: Proto-Agent aligns left in the clause.
b. PP-L: Proto-Patient aligns left in the clause.

Information structuring constraints (Choi 1999; Costa 2001; Samek-Lodovici 2001) can also be stated as alignment constraints, as in (29). The discourse motivation for locating background information at one end of the clause and other discourse information at the other seems transparent.
(29) Discourse-based Alignment Constraints (AlignDF):
a. TOP-L: Topic aligns left in the clause.
b. FOC-L: Focus aligns left in the clause.
c. BCK-R: Background (BCK) information aligns right in the clause.

Lee (2001a, 2001b) has motivated the following ranking of the alignment constraints for Hindi:

(30) TOP-L, BCK-R >> ALIGNθ >> ALIGNGF >> FOC-L

Crucially, the ranking in (30) predicts that when the arguments do not differ in discourse status, the ALIGN constraints will take effect, leading to the canonical or unmarked order in Hindi (SUBJECT-OBJECT or agent-patient); when there are differences, however, the canonical order will violate higher-ranking information structuring constraints, such that competitors with a different ordering can win out. Let us combine the ranking in (30) and that proposed in section 4.1 for a full picture of the interaction of two alternative morphosyntactic devices for specifying grammatical function information, case and word order. The full array is presented in (31), and the predictions of this ranking are discussed in the next section.

(31) Ranking for Hindi:
ERGperf >> *ERG >> MAX-OfOm(VOL) >> TOP-L >> SUBJ-L >> OBJ-L,
*ACC >> *NOM
5. Limitations of generation-based unidirectional OT

As shown in section 2, arguments bearing identical case markings can be restricted to unmarked word order position in Hindi, and if their ordering is reversed, the meaning of the sentence cannot be maintained. However, the freezing effect of this type does not yet follow from the standard generation-based design of OT. The obvious problem is overgeneration of ungrammatical scrambling in sentences with ambiguous case marking, and more generally the generation of structures from which the (original) meaning is not recoverable. Section 5.1 discusses the form of the inputs and the candidates in the grammar model that I assume, before examining this problem in detail.
5.1. The input-candidate relation and faithfulness evaluation

I follow Bresnan (2000) and Kuhn (2001a, 2001b) in assuming the input to be partially underspecified f-structure, stripped of grammatical function labels. I further assume that a-structure, representing, among other things, the Proto-Agent and Proto-Patient properties of arguments (e.g., volitionality, causation, affectedness, etc.), is also part of the OT-LFG input for syntax (Asudeh 2001). Therefore, the input representation is a pair of underspecified f-structure and a-structure. As an illustration, the input for the Hindi sentence in (32) (repeated from (4a)) would be (33). This abbreviated format represents only the part of the f-structure and a-structure that is relevant for my analysis.

(32) raam-ne bacce-ko uṭhaayaa
     Ram-ERG child-ACC lift-PERF
     ‘Ram lifted the/a child.’

(33) Input f-structure
     [ PRED  ‘lift<arg1, arg2>’
       GF1   [ PRED ‘Ram’
               SEM  VOL ]
       GF2   [ PRED ‘child’ ]
       ASP   PERF ]
As discussed in section 3, the candidates and outputs will be quadruples consisting of c-structures, fully specified f-structures, m-structures and their correspondence functions. As for formal representation of the relationship between these, I adopt Kuhn’s (2001a, 2001b) formalization of the input-candidate relation, whereby each candidate f-structure contains the same and more non-conflicting information relative to the input. Under this view, the licensing of semantic case can be modeled as a faithfulness relation between the output m-structure and the syntactic f-structure (Lee 2001b, 2003a). The correspondence between these two output structures can be checked by comparing the relevant semantic feature in a candidate’s syntactic f-structure and that in the m-structure of a case-marked NP. This idea is illustrated for the candidate analyses in example (34).
(34) a. Ergative subject
b. Nominative subject: Violation of MAX-Of Om (SEM)
Candidate (a) above is the optimal way of expressing the input (33) in Hindi. Here the feature [VOL] (lexically associated with the ergative case marker) belonging to the SUBJ argument of the candidate f-structure is present in the m-structure of that argument; but we do not find the [VOL] specification in the m-structure of candidate (b). In other words, the nominative marking of a subject which is a volitional agent is unfaithful to the input, leading to the MAX-Of Om(VOL) violation.
5.2. Interaction of case and word order in a unidirectional OT

We are now ready to see how the constraints proposed in the previous section interact. Let us begin with an example of a perfective clause. We first take an input in which the lower argument (GF2) is specified as a TOPIC; the input information is shown in the top left corner of the tableaux. Tableau 1 in (37) below shows some competing candidates for perfective transitive clauses.
Candidates (a) and (c) have an ergative subject, and they differ only in the relative order of the subject and the object, whereas in candidates (b) and (d) the subject is nominative. These are associated with the following syntactic f-structures, which contain the input together with case features. (35) F-structures for (a) and (c)
(36) F-structures for (b) and (d)
In other words, the semantically interpreted parts of the input and candidate syntactic f-structures are identical. At the morpholexical level, on the other hand, candidates may differ from the input in their featural specifications: as discussed in the previous section, at the morpholexical level, the ergative case marker is specified for ‘VOL’, whereas the nominative case marker is left unspecified. In the tableaux that follow, I will indicate morpholexical featural specifications that are relevant to faithfulness evaluations within [ ]. As we are interested in the interaction between the order of nominal arguments and their case marking, we will consider only candidates which have a verb in the clause-final position; also, although I just list the candidates with accusative (human) or nominative (inanimate) direct objects, we must assume that candidates with a dative object, plus many more, are generated by GEN. In Tableau 1, due to the high ranking of the contextual markedness constraint ERGperf, candidates (b) and (d), where the highest argument is not in the ergative case but in the nominative case, are ruled out immediately. This leaves candidates (a) and (c), and the relatively high ranking of TOP-L favors (c) with the TOP (OBJ)-SUBJ order. If we add further alignment constraints (e.g., FOC-L and BCK-R) to the constraint ranking and change the input specification of each argument’s discourse status, we would make the correct prediction that all six orderings of subject, object and verb are possible expressions for the content ‘lift<arg1, arg2>, ASP = PERF’.
(37) Tableau 1. Generation of word order variants (perfective)
Now let us move to a nonperfective clause. Tableau 2 below schematically represents only candidates with the future form of the verb toḍ ‘break’. Though they compete against each other in the universal candidate set, perfective clauses and nonperfective clauses each are more faithful to different inputs (specifically, perfective candidates will be ruled out by the faithfulness constraint on aspect, when the input is specified as [ASP FUT]).

(38) Tableau 2. Generation of word order variants (nonperfective)

As Tableau 2 shows, the contextual markedness constraint ERGperf is irrelevant, since the four candidates all have the future verb form. So the decision on case is made entirely by the markedness constraints, and candidates (b) and (d) with the nominative subject win because they survive after candidates with the more marked ergative case are eliminated by *ERG. Of these, the S-O candidate (b) is eliminated by TOP-L, leaving (d) as the winner. The analysis in Tableau 2 shows that under the standard generation-based form of OT, in which syntactic structures are optimized with respect to semantic input, the constraint ranking for scrambling languages with rich case marking predicts that all possible orders of argument phrases are available both for clauses containing ambiguously case-marked arguments and for clauses where two arguments bear different case markings. Hence, no difference in word order flexibility between the two cases is predicted.
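The outcome of the two production tableaux can be reproduced with a small evaluator. What follows is a sketch under stated assumptions, not the author's tableaux: candidates are reduced to a subject case and an order of the two arguments before the clause-final verb, only the main chain ERGperf >> *ERG >> MAX-OfOm(VOL) >> TOP-L >> SUBJ-L >> OBJ-L is evaluated, and each alignment constraint is assumed to incur one mark per argument preceding the aligned element. The names `profile` and `produce` are hypothetical.

```python
from itertools import product

RANKING = ["ERGperf", "*ERG", "MAX(VOL)", "TOP-L", "SUBJ-L", "OBJ-L"]


def profile(cand, inp):
    """Violation tuple, ordered by RANKING, for one production candidate."""
    case, order = cand  # order is ("SUBJ", "OBJ") or ("OBJ", "SUBJ")
    marks = {
        "ERGperf": int(inp["perfective"] and case != "ERG"),
        "*ERG": int(case == "ERG"),
        "MAX(VOL)": int(inp["volitional"] and case != "ERG"),
        # Alignment: one mark per argument preceding the aligned element.
        "TOP-L": order.index(inp["topic"]) if inp["topic"] else 0,
        "SUBJ-L": order.index("SUBJ"),
        "OBJ-L": order.index("OBJ"),
    }
    return tuple(marks[c] for c in RANKING)


def produce(inp):
    """Return the most harmonic (subject case, argument order) for an input."""
    cands = product(("ERG", "NOM"), (("SUBJ", "OBJ"), ("OBJ", "SUBJ")))
    return min(cands, key=lambda c: profile(c, inp))


# Perfective clause, volitional subject, object as topic (cf. Tableau 1).
print(produce({"perfective": True, "volitional": True, "topic": "OBJ"}))
# Nonperfective clause, non-volitional subject, object as topic (cf. Tableau 2).
print(produce({"perfective": False, "volitional": False, "topic": "OBJ"}))
```

In both clause types the object-initial order wins when the object is the topic, which is the overgeneration problem: production alone does not distinguish the double nominative clause from the clause with distinct case markings.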
6. Deriving Minimality Effects in bidirectional OT

In the previous section, we have seen that word order freezing does not yet follow from the standard generation-based OT grammar. Intuitively, if we are going to rule out winners in standard generation-based optimization associated with the interpretation that does not match the preferred reading of the string, then we need to have a formal method for allowing the output of generation-based optimization to be checked against the string corresponding to the syntactic parse. This can be achieved by extending optimization to comprehension (or interpretation) as well as production (or generation) directions (Smolensky 1996b, 1998). Recently a class of bidirectional OT models has been proposed to handle various shortcomings in unidirectional OT. Section 6.1 provides a brief overview of versions of bidirectional OT models. The models to be discussed are the strong bidirectional OT and weak bidirectional OT of Blutner (2000), the medium strength OT of Beaver (to appear), and the asymmetric OT of Wilson (2001). In section 6.2, we will consider application of these models to the problem of word order freezing in Hindi.
6.1. Bidirectional optimality

Given that production-based and interpretation-based optimization are both well motivated, a question immediately arises as to how the two directions of optimization can be combined into a coherent theory of language structure and interpretation. One option is to combine them conjunctively, producing a model which Blutner (2000) calls the strong bidirectional OT model. The idea is that in order to be grammatical, a form-meaning pair ⟨f, m⟩ has to be optimal in both directions of optimization. That is, a form-meaning pair is strong OT optimal iff the form produces the meaning in interpretation and the meaning produces the form in production. So we arrive at the following definition of bidirectional optimality (the connective “≻” is read as “more harmonic than” or “more economical than”):

(39) ⟨f, m⟩ is strong OT optimal iff
a. ⟨f, m⟩ ∈ GEN,
b. there is no ⟨f′, m⟩ ∈ GEN such that ⟨f′, m⟩ ≻ ⟨f, m⟩, and
c. there is no ⟨f, m′⟩ ∈ GEN such that ⟨f, m′⟩ ≻ ⟨f, m⟩.
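Definition (39) translates directly into a filter over a finite candidate set. The following sketch assumes, purely for illustration, that GEN is given as a finite set of form-meaning pairs and that harmony can be summarized by a numeric cost (lower is more harmonic); the toy forms, meanings and costs are hypothetical and stand in for an actual constraint evaluation.

```python
def strong_optimal(gen, cost):
    """Return the pairs (f, m) in gen that satisfy clauses (39a-c).

    gen:  iterable of (form, meaning) pairs (a finite stand-in for GEN).
    cost: function from a pair to a number; lower means more harmonic.
    """
    gen = set(gen)
    winners = set()
    for f, m in gen:
        here = cost((f, m))
        # (39b): no pair with the same meaning and a better form.
        better_form = any(m2 == m and cost((f2, m2)) < here for f2, m2 in gen)
        # (39c): no pair with the same form and a better meaning.
        better_meaning = any(f2 == f and cost((f2, m2)) < here for f2, m2 in gen)
        if not better_form and not better_meaning:
            winners.add((f, m))
    return winners


# Toy example: f1 and m1 are unmarked, f2 and m2 are marked (assumed costs).
MARKEDNESS = {"f1": 0, "f2": 1, "m1": 0, "m2": 1}
GEN = [(f, m) for f in ("f1", "f2") for m in ("m1", "m2")]
print(strong_optimal(GEN, lambda pair: MARKEDNESS[pair[0]] + MARKEDNESS[pair[1]]))
```

In this toy setting only the pair of the unmarked form and the unmarked meaning survives, so the marked meaning m2 is left without any form.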
Strong OT removes form-meaning pairs that are only optimal under one direction. In this way, it produces strictly fewer form-meaning pairs than either unidirectional production or interpretation OT would with the same constraint ranking, and consequently it can model word order freezing effects, as I will show in section 6.2. However, Strong OT undergenerates form-meaning pairs in certain cases. As noted by Blutner (2000), this poses an empirical problem for partial blocking phenomena: what happens when a specialized or lexicalized expression rules out some (usually analytic or productive) expression for a particular (usually normal or stereotypical) subrange of interpretations, but not for the entire range. An example from McCawley (1978) is that of causatives. The observation is that the existence of a lexical causative “kill” blocks “cause to die” from having its canonical meaning. “Cause to die”
comes to denote a non-canonical killing, for instance one where the chain of causation is unusually long or unforeseeable. Blutner’s (2000) weak notion of optimality, which I refer to simply as Weak OT, is an iterated variant of Strong OT that produces partial blocking instead of strict blocking. In Weak OT, sub-optimal candidates in a strong bidirectional competition can become winners in a second or later round of optimization. I will illustrate how Weak OT predicts partial blocking using the example of lexical and periphrastic causatives “kill”/“cause to die”, which we assume are matched on the meaning side by two possible interpretations, direct causation (canonical killing) and indirect causation (non-canonical killing). The following three diagrams illustrate three phases of weak optimization. In the first diagram, all the unidirectionally optimal links are shown. In addition to the optimal links, two links are shown with dashed lines. Both of these links are unidirectionally sub-optimal at this stage, beaten by other candidates. PHASE 1 – NAIVE INTERPRETATION AND PRODUCTION:
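F: “kill”, “cause to die”
M: direct causation, indirect causation
[Diagram: links between the forms (F) and the meanings (M); the unidirectionally optimal links are drawn as solid lines and the two sub-optimal links as dashed lines.]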
In phase 2 of Weak optimization, two unidirectionally optimal links are blocked, leaving a single bidirectionally optimal link, that between the form “kill” and the meaning corresponding to direct causation. PHASE 2 – PRUNING:
F: “kill”, “cause to die”
M: direct causation, indirect causation
[Diagram: a single remaining link, between “kill” and direct causation.]
Now we graft the originally sub-optimal links between “cause to die” and the indirect causation meaning back into the picture, since the candidates which originally beat them have been removed by blocking. This gives us two bidirectionally optimal links. In the resulting happy picture, all the candidate meanings are uniquely expressible and all the candidate forms are uniquely interpretable: PHASE 3 – GRAFTING:
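F: “kill”, “cause to die”
M: direct causation, indirect causation
[Diagram: two links, one between “kill” and direct causation and one between “cause to die” and indirect causation.]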
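The pruning and grafting phases can be simulated by iterating the strong bidirectional filter. The sketch below is a simplified rendering of that procedure and not Blutner's own formulation: it assumes a finite candidate set, a numeric cost obtained by adding an assumed form markedness and an assumed meaning markedness, and no ties among competing pairs. After each round, all pairs that share a form or a meaning with a winner are treated as blocked and removed. The function names and cost values are hypothetical.

```python
def strong_optimal(pairs, cost):
    """Pairs not beaten by a pair with the same meaning or the same form."""
    pairs = set(pairs)
    return {
        (f, m) for f, m in pairs
        if not any((m2 == m or f2 == f) and cost((f2, m2)) < cost((f, m))
                   for f2, m2 in pairs)
    }


def weak_optimal(gen, cost):
    """Iterate pruning and grafting until no candidate pairs remain."""
    remaining, winners = set(gen), set()
    while remaining:
        round_winners = strong_optimal(remaining, cost)  # pruning
        winners |= round_winners
        used_forms = {f for f, _ in round_winners}
        used_meanings = {m for _, m in round_winners}
        # Grafting: blocked pairs drop out, so the survivors compete again.
        remaining = {(f, m) for f, m in remaining
                     if f not in used_forms and m not in used_meanings}
    return winners


# Assumed markedness values: the lexical causative and direct causation are unmarked.
COST = {"kill": 0, "cause to die": 1, "direct": 0, "indirect": 1}
GEN = [(f, m) for f in ("kill", "cause to die") for m in ("direct", "indirect")]
print(weak_optimal(GEN, lambda p: COST[p[0]] + COST[p[1]]))
```

With these assumed costs the first round keeps only the link between “kill” and direct causation, and the second round adds the link between “cause to die” and indirect causation, matching the three phases described above.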
Blutner (2000) argues that Weak OT captures the essence of the pragmatic generalization that “unmarked forms tend to be used for unmarked situations and marked forms for marked situations” (Horn 1984: 26). As Beaver and Lee (2003, 2004) point out, however, Weak OT suffers from a serious problem of over-generation. Specifically, the process of adding extra links will eventually provide links for every form (if there are at least as many forms as meanings), or every meaning (if there are at least as many meanings as forms). Beaver and Lee (2003) discuss a variant system of Weak OT which models partial blocking without leading to such great overgeneration. This variant system, which they refer to as Medium Strength OT, performs only one iteration of the Weak OT process, pruning once and grafting once. As a result, it maintains some of the properties of Weak OT, but lacks Weak OT’s “everyone’s a winner” profligacy. So far, we have considered three symmetric notions of bidirectional optimality. These do not exhaust possible options of combining the two directions of optimization: one may want to apply them in sequence, such that the first optimization affects the candidate set for the second. Wilson (2001) discusses a model in which interpretation precedes production. I will refer to this as Asymmetric OT.16 In more detail, the idea of Asymmetric OT is as follows:
(i) Interpretation: Given any form-meaning pair ⟨f, m⟩, find the most harmonic semantic interpretation of f.
(ii) Production: Given input meaning m, take as candidate outputs the set of forms f such that ⟨f, m⟩ is optimal in stage one, and perform standard OT production optimization with this restricted candidate set.
Note that the set of optimal form-meaning pairs in production is a subset of the optimal form-meaning pairs in interpretation. The set of meanings which are in some optimal pair is the same in interpretation and production, although the number of forms would, for constraint sets which are of interest, be smaller in production than in comprehension. It is the reduced set of forms in production, those which result from the two-stage process, which are to be considered grammatical, even though there are others which are interpretable. Asymmetric OT has an interesting range of strengths and weaknesses. In particular, it predicts partial blocking in some cases but does not model the full range. This problem will be discussed in more detail in section 7.
6.2. A bidirectional OT analysis of word order freezing in Hindi

This section presents a bidirectional OT account of the word order freezing effects in Hindi. Building on Lee (2001a, 2001b, 2002), I will argue that bidirectional optimization is necessary to account for word order freezing effects one finds within particular languages: they reflect principles of grammar and can be explained in terms of the interactions of grammatical constraints and the direction of optimization.17 For sentences like (32) containing unambiguously case-marked arguments, change in the order of these NPs does not change their syntactic function interpretation. This is because a candidate that does not interpret the grammatical function of argument NPs in line with the unmarked grammatical function/case association (e.g., SUBJ/ergative and OBJ/accusative) violates the high-ranked contextual markedness and OO-faithfulness constraints and these violations lead to unrecoverability of the original input. Let us see what happens if we apply comprehension-based optimization to a sentence with ambiguously case-marked arguments. Tableau 3 below shows comprehension-directed optimization based on the phonological string ṭhelaa patthar toḍegaa (in (7)), the overt part of the winner in the production-based optimization in Tableau 2 above. As shown in Tableau 3, unlike in production, in comprehension, only the candidates which share the same string are competing structures and hence different interpretations of this string compete. (In Tableau 3 the first four candidates (a)-(d) mean ‘The stone (S) will break the cart (O)’, whereas candidates (a′)-(d′) mean ‘The cart (S) will break the stone (O)’.)

(40) Tableau 3. Comprehension-based optimization

Here the highest-ranking markedness constraint ERGperf is inapplicable, because none of the candidates contains the perfective verb form. The faithfulness constraint MAX-OfOm(VOL) has no effect either, because in the candidates there is no [VOL] feature. So the decision on the grammatical roles of ambiguously case-marked arguments is made entirely by the lower-ranking alignment constraints. The result is the SO interpretation of potentially ambiguous strings (candidate (d′)). The marked OS interpretation (candidate (d)) is eliminated not because it violates high-ranking markedness and faithfulness constraints but because it violates low-ranking alignment constraints. We can illustrate this graphically as follows: 18
PRODUCTION
F: arg1-NOM arg2-NOM pred; arg2-NOM arg1-NOM pred
M: pred′(arg1′, arg2′); pred′(arg2′, arg1′)

INTERPRETATION
F: arg1-NOM arg2-NOM pred; arg2-NOM arg1-NOM pred
M: pred′(arg1′, arg2′); pred′(arg2′, arg1′)

STRONG OT OPTIMAL = PROD. ∩ INT.
F: arg1-NOM arg2-NOM pred; arg2-NOM arg1-NOM pred
M: pred′(arg1′, arg2′); pred′(arg2′, arg1′)

[Diagrams: in each panel, links connect the forms (F) to the meanings (M).]
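The freezing effect can be made concrete by running production and interpretation over the two double nominative forms. The sketch is a simplification under explicit assumptions and not the author's Tableau 3: both arguments are nominative, so only TOP-L >> SUBJ-L >> OBJ-L is evaluated; each alignment constraint incurs one mark per argument preceding the aligned element; and the string is assumed not to fix discourse status, so interpretation is evaluated with no topic. All names are hypothetical.

```python
FORMS = [("arg1", "arg2"), ("arg2", "arg1")]     # order of the two NOM arguments
MEANINGS = [("arg1", "arg2"), ("arg2", "arg1")]  # (SUBJ, OBJ) assignment


def marks(form, meaning, topic=None):
    """Violations of TOP-L >> SUBJ-L >> OBJ-L for a double nominative clause."""
    subj, obj = meaning
    top_l = form.index(topic) if topic else 0
    return (top_l, form.index(subj), form.index(obj))


def produce(meaning, topic=None):
    """Production: pick the most harmonic form for a given meaning."""
    return min(FORMS, key=lambda f: marks(f, meaning, topic))


def interpret(form):
    """Interpretation: pick the most harmonic meaning for a given string."""
    return min(MEANINGS, key=lambda m: marks(form, m))


def strongly_optimal(form, meaning, topic=None):
    """A pair is grammatical only if it wins in both directions."""
    return produce(meaning, topic) == form and interpret(form) == meaning


meaning = ("arg1", "arg2")            # arg1 is SUBJ, arg2 is OBJ
scrambled = produce(meaning, topic="arg2")
print(scrambled)                      # production fronts the topical object
print(interpret(scrambled))           # but the string is read as SUBJ-OBJ
print(strongly_optimal(scrambled, meaning, topic="arg2"))
```

Production fronts the topical object, but interpretation of the resulting string assigns the subject function to the first NP, so the pair of the scrambled form and the original meaning fails the bidirectional check.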
Thus, by bidirectional optimization we correctly predict that the most harmonic meaning for the string ṭhelaa patthar toḍegaa is the SO interpretation. What we have here is the emergence of the unmarked (see McCarthy and Prince (1994)) in comprehension grammar: the alignment constraints favoring canonical word order become decisive when faithfulness is no longer a determining factor. The losing candidate (d), even if optimal in production-based optimization (see Tableau 2 above), is blocked (i.e., made suboptimal and thus ungrammatical) by (d′) on markedness grounds. The three other models of bidirectional OT discussed in section 6.1 above all ensure this result: in Weak OT and Medium Strength OT, the Hindi double nominative construction is predicted to have only an SO interpretation. Consider in the abstract the two forms ‘arg1-NOM arg2-NOM pred_imperf’ and ‘arg2-NOM arg1-NOM pred_imperf’: both of these forms will be paired with meanings (i.e., SO interpretation) in the first phase of Weak/Medium optimization, so neither will enter into later competitions, and neither will become associated with incorrect argument-function mappings. This is predicted in Wilson’s Asymmetric OT as well: both of the Hindi double nominative forms will be paired with the SO interpretation in the first, interpretation stage of optimization. So the pairs of these forms and the OS interpretation will not be included in the candidate set for the second, production optimization, and we derive the effect of freezing. Before closing this section, I discuss further advantages of the present bidirectional OT account. First, in this account, there is no need to posit a separate relativized minimality constraint, say, the version given in (41). Rather, the apparent relativized minimality effect is a pure consequence of the relative ranking of independently motivated constraints, which are not in any sense “economy” or “minimality” constraints.19

(41) RELATIVIZED MINIMALITY: Mark ‘*’ for each chain link across a ‘similar’ element.

The system of non-derivational correspondence constraints has further important consequences. First, as demonstrated by Sells (2001b), the nonderivational alignment-based account captures, as a consequence, certain other recurrent properties of movement that are not naturally covered by the MLC, such as order preservation effects, whereby underlying linear relationships are preserved under movement operations. Under any account involving movement, this generalization can only be stipulated using extra devices or constraints (as in Müller’s (2001) account). Further, the system of alignment constraints can account for the way grammatical functions are interpreted in languages which do not show word order freezing effects: as Lee (2000) shows, ambiguity in grammatical function interpretation falls out from the variable ranking of the same alignment constraints that are responsible for the freezing effect. Finally, the parallel correspondence-based account is attractive, because it makes it possible to explain standard MLC phenomena also in terms of constraints on the correspondence between different structures (e.g., semantic structure, LF and PF) (Vogel, this volume), without recourse to mechanistic (framework-internal) constraints.
7. Bidirectional optimization and recoverability in head-marking languages

The previous section presented a bidirectional OT account of a particular minimality effect, word order freezing in Hindi. We have seen that minimality effects in Hindi are derived as a consequence of independently motivated universal, violable constraints on case and constituent ordering and their interaction (the ranking CASE >> ALIGNDF >> ALIGNGF). The relation between case and word order in dependent-marking languages like Hindi is paralleled by the relation between agreement, word order and voice in head-marking languages.20 In a number of head-marking languages, the distribution of voice displays a highly structured pattern of interpretational preferences and blocking effects. In this section, I argue that this pattern can be formally modeled in bidirectional OT as a specific case of partial blocking, building on the insights of Aissen (2002). In this way, bidirectional OT suggests a much more general account of the relation between alternative morphosyntactic devices for encoding (or recovering) grammatical relations than unidirectional OT or minimalism.
7.1. Recoverability and blocking effects in Chamorro and Tzotzil
In a number of head-marking languages, agreement and voice have a relation similar to that between case and word order in dependent-marking languages. In Chamorro (Western Austronesian; Chung 1981, 1989, 1998), for example, active voice is excluded and passive obligatory when both agent and patient are 3rd person, and the patient is more prominent in animacy than the agent:
Minimality in a lexicalist Optimality Theory
(42) Disharmonic GF / ANIMACY association in Chamorro
a. *Ha-ispanta i ekspiriensianñiha i palao’an. [*Active]
   3SG-frighten DET their.experience DET woman
   ‘Their experience frightened the woman.’ (Chung 1998)
b. In-ispanta i palao’an ni ekspiriensianñiha. [√Passive]
   PSV-frighten DET woman OBL their.experience
   ‘The woman was frightened by their experience.’ (Aissen 2002a)

This is also true in various Mayan languages and Salish languages. For example, in Tzotzil (Mayan; Aissen 1997, 2002), active transitive clauses in which 3rd person patient is more prominent in animacy than 3rd person agent are excluded. Such blocked active clauses can only have the inverse, harmonic interpretation:

(43) Disharmonic GF / ANIMACY association in Tzotzil
a. Ixpoxta Xun li poxe. [Active]
   cured(3-3) Juan DET medicine
   ‘Juan cured the medicine.’ (*‘The medicine cured Juan.’) (Aissen 2002)
b. Ipoxta-at ta pox li Xune. [√Passive]
   cured-PSV by medicine DET Juan
   ‘Juan was cured by (the) medicine.’ (Aissen 2002)

Interestingly, marked active clauses with local person objects are fully grammatical:

(44) Ha-na’ma’a’ñao yo’ i estoria. [Chamorro]
     3SG-frighten 1SG DET story
     ‘The story frightened me.’ (Cooreman 1987)
(45) Lixpoxta li poxe. [Tzotzil]
     cured(3-1) DET medicine
     ‘The medicine cured me.’ (Aissen 2002)
Why is active voice excluded only when the agent and patient are 3rd person but not when the patient is local person? The difference between these two clause types lies in the form of agreement. In the languages under discussion here, both subject and object are obligatorily marked as morphemes prefixed to the verb stem in active transitive sentences. These agreement markers
and their order in the verb clearly distinguish subject from object in most person and number combinations. When the subject and object are 3rd person and of the same number, however, the problem of ambiguity can arise, since one cannot tell from the form of agreement alone which is the subject and which is the object. In such a case, Chamorro and Tzotzil avoid this potential ambiguity by appealing to a rigid hierarchy of animacy: the choice of subject is restricted only to the more prominent argument in the dimension of animacy. This is another clear case of the ‘emergence of the unmarked’ effect in interpretation: the harmonic association of the hierarchies of GF and animacy emerges as the unmarked case in the absence of other reliable cues for disambiguation. Note, however, that there is an interesting difference between Hindi and the head-marking languages under discussion here with respect to the possible mechanisms used to disambiguate subject from object. As noted in section 2, in Hindi, word order determined by the hierarchies of grammatical function and thematic roles is the primary means used to distinguish the argument roles in potentially ambiguous sentences. In Chamorro and Tzotzil, on the other hand, semantic features of arguments and voice play key roles (Nichols 1986; Aissen 1997).
7.2. Aissen’s bidirectional OT account
Aissen (2002) offers a bidirectional OT account of the facts of Chamorro and Tzotzil discussed above. She assumes three classes of conflicting markedness constraints: a set of BIAS constraints which penalize subjects and objects with marked semantic features (46a) (Aissen 2003), a constraint which penalizes passive (46b), and (encapsulated) constraints which determine agreement (46c). Here we assume a specific version of the AGR constraint which requires that a verb must not agree with obliques:

(46) a. BIAS (e.g., *SUBJ/INAN(IMATE), *OBJ/HUM(AN))
     b. *PSV (e.g., *SUBJ/PP)
     c. AGR (e.g., *AGRobl)

The ranking that Aissen assumes for Chamorro and Tzotzil is:

(47) AGR >> *PSV >> BIAS
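The strict domination implied by a ranking such as (47) can be illustrated with a small sketch. The Python below is only an illustration under stated assumptions: the candidate names and violation counts are hypothetical and are not taken from Aissen’s tableaux. It shows that a single violation of a higher-ranked constraint outweighs any number of violations of lower-ranked ones.

```python
# Illustrative sketch of strict domination under the ranking AGR >> *PSV >> BIAS.
# Candidate names and violation counts are hypothetical, chosen only for illustration.
RANKING = ["AGR", "*PSV", "BIAS"]


def profile(violations, ranking=RANKING):
    """Return violation counts ordered from highest- to lowest-ranked constraint."""
    return tuple(violations.get(constraint, 0) for constraint in ranking)


def optimal(candidates, ranking=RANKING):
    """Return all candidates whose violation profile is lexicographically smallest."""
    best = min(profile(v, ranking) for v in candidates.values())
    return [name for name, v in candidates.items() if profile(v, ranking) == best]


candidates = {
    "cand_a": {"*PSV": 1},  # one violation of the higher-ranked constraint
    "cand_b": {"BIAS": 3},  # three violations of the lowest-ranked constraint
}
print(optimal(candidates))  # cand_b wins although it has more violations in total
```

Comparing tuples lexicographically is one simple way to encode strict ranking; nothing in the sketch depends on the particular constraint names.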
This ranking ensures that active transitive clauses are optimal if subject and object differ in agreement features. When subject and object have the same agreement features, however, the AGR constraint is vacuously satisfied and the lower-ranking constraints, *PSV and BIAS, become active, deriving the emergence of the unmarked effect. Aissen (2002) presents an OT treatment of this effect based on Wilson’s Asymmetric OT model. Recall that in Wilson’s model, interpretation optimization applies first to limit the candidate set for the second, production optimization, such that only winning candidates in interpretation enter into the production optimization. For the blocking effect in the voice system of Chamorro and Tzotzil under discussion here, the consequence of this is as follows: in the interpretation optimization with the passive form as the input, both harmonic and disharmonic interpretations are recoverable in different optimizations, as shown in the Chamorro examples in (42b) and (48). Any alternative interpretation would violate the higher-ranked AGR constraint, and hence is eliminated from the competition.

(48) Pärä u-ni-na’gasgas i täpbla ni chi’luhu. [Chamorro]
     FUT 3SG-PSV-clean DET floor OBL my.sibling
     ‘The floor was cleaned by my sister.’

In the interpretation optimization with the active form as the input, on the other hand, the disharmonic interpretation (exemplified in (42a) and (43a) above) loses out to the harmonic interpretation of the same string. Consequently, the active form is excluded from the candidate set for the production competition with the disharmonic interpretation as the input. As a result, the passive candidate wins, and the disharmonic interpretation is predicted to be realized as passive voice.
In the production competition with the harmonic interpretation as the input, however, active and passive would compete together (they were both winners in the interpretation optimization), and Aissen’s constraints (*PSV >> BIAS) correctly predict that the active is the winner, i.e., active blocks passive. The optimization processes described above are pictured in the following diagram, where candidates are marked “°” in those competitions in which they do not participate:
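The two-step procedure just described can also be sketched procedurally. The following Python is a minimal sketch, not Aissen’s or Wilson’s own formalization. The violation profiles are simplifying assumptions made here for the configuration with two 3rd person arguments: AGR is taken to be vacuously satisfied, the disharmonic active violates BIAS, and both passive pairs are assumed to incur only a *PSV violation, which is what allows both to survive interpretation.

```python
# Minimal sketch of Asymmetric OT: interpretation first, then production restricted
# to the interpretation winners. All violation profiles below are assumptions.
RANKING = ["AGR", "*PSV", "BIAS"]
FORMS = ["active", "passive"]
MEANINGS = ["harmonic", "disharmonic"]
VIOLATIONS = {
    ("active", "harmonic"): {},
    ("active", "disharmonic"): {"BIAS": 1},
    ("passive", "harmonic"): {"*PSV": 1},
    ("passive", "disharmonic"): {"*PSV": 1},
}


def profile(pair):
    """Violation counts of a form-meaning pair, ordered by constraint rank."""
    return tuple(VIOLATIONS[pair].get(c, 0) for c in RANKING)


def best(pairs):
    """Keep the pairs with the lexicographically smallest profile."""
    pairs = list(pairs)
    if not pairs:
        return []
    top = min(profile(p) for p in pairs)
    return [p for p in pairs if profile(p) == top]


def asymmetric_ot():
    # Step 1 (interpretation): for each form, keep its optimal meanings.
    survivors = []
    for form in FORMS:
        survivors += best((form, meaning) for meaning in MEANINGS)
    # Step 2 (production): for each meaning, compete only among the survivors.
    production = {}
    for meaning in MEANINGS:
        winners = best(p for p in survivors if p[1] == meaning)
        production[meaning] = [form for form, _ in winners]
    return survivors, production


survivors, production = asymmetric_ot()
print(survivors)
print(production)
```

Under these assumptions the active form is absent from the production competition for the disharmonic meaning, so passive is selected there, while active wins for the harmonic meaning.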
Thus Aissen’s Asymmetric OT account successfully captures the key generalization that passive is avoided unless the intended meaning cannot be recovered from active in Chamorro and other head-marking languages which use a rigid hierarchy for disambiguation. This prediction does not follow from Strong OT. As the following diagram shows, under the constraints assumed, Strong OT incorrectly predicts that Chamorro passives are uninterpretable in the given configuration, and that there is no way of expressing the input corresponding to the disharmonic interpretation:
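The Strong OT prediction can be checked with a small computation. The Python below is a sketch under assumed violation profiles (AGR vacuously satisfied, the disharmonic active violating BIAS, both passive pairs violating only *PSV); these profiles are simplifications introduced here, not figures from the original diagram. A pair counts as a winner only if it is optimal in both directions over the full candidate set.

```python
# Sketch of Strong (symmetric) bidirectional OT over assumed violation profiles.
RANKING = ["AGR", "*PSV", "BIAS"]
FORMS = ["active", "passive"]
MEANINGS = ["harmonic", "disharmonic"]
VIOLATIONS = {
    ("active", "harmonic"): {},
    ("active", "disharmonic"): {"BIAS": 1},
    ("passive", "harmonic"): {"*PSV": 1},
    ("passive", "disharmonic"): {"*PSV": 1},
}


def profile(pair):
    """Violation counts of a form-meaning pair, ordered by constraint rank."""
    return tuple(VIOLATIONS[pair].get(c, 0) for c in RANKING)


def best(pairs):
    """Keep the pairs with the lexicographically smallest profile."""
    pairs = list(pairs)
    top = min(profile(p) for p in pairs)
    return [p for p in pairs if profile(p) == top]


def strong_ot():
    """Return pairs that are optimal both for their form and for their meaning."""
    winners = []
    for form in FORMS:
        for meaning in MEANINGS:
            pair = (form, meaning)
            in_interpretation = pair in best((form, m) for m in MEANINGS)
            in_production = pair in best((f, meaning) for f in FORMS)
            if in_interpretation and in_production:
                winners.append(pair)
    return winners


print(strong_ot())
```

With these profiles only the active form paired with the harmonic meaning survives, so no passive pair is bidirectionally optimal and the disharmonic meaning has no expression.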
The pattern of (near) complementary distribution of active and passive in Chamorro is an example of partial blocking: optimality of active under harmonic interpretation leads to partial blocking of passive, such that passive is used to express the meaning that cannot be expressed by the active. The general tendency of partial blocking is that “unmarked forms tend to be used for unmarked situations and marked forms for marked situations” (Horn 1984: 26) – a tendency that Horn terms the division of pragmatic labor. In the case of the Chamorro voice system, the marked form is passive voice, and the marked meaning is the input corresponding to m2 in the diagrams above. Asymmetric OT successfully deals with partial blocking in the case of the Chamorro voice system. It should be noted, however, that it fails to model the standard cases of partial blocking discussed earlier in this paper. An important aspect of the Chamorro data is that both ⟨marked form, unmarked meaning⟩ and ⟨marked form, marked meaning⟩ survive in interpretation and that ⟨marked form, unmarked meaning⟩ is blocked by ⟨unmarked form, unmarked meaning⟩ in the production competition. This blocking process is what makes ⟨marked form, marked meaning⟩ available. The standard cases of partial blocking differ from partial blocking in the Chamorro case in one crucial way. As Tableau 4 below shows, the two pairs
⟨marked form, unmarked meaning⟩ and ⟨marked form, marked meaning⟩ do not have the same constraint profile. (In Tableau 4, ECONOMY is a formal markedness constraint, a preference for short forms, and CANON is a semantic markedness constraint, a preference for the canonical mode of causation.)
(49) Tableau 4. Interpretation (Asymmetric OT)
As can be seen, the constraints above yield a preferred interpretation of “cause to die” as involving canonical direct causation. Therefore, in the production competition with indirectly caused death as input meaning, “cause to die” is not even amongst the candidate outputs, and cannot be the winner. We can see the difference between the two cases, and how they are treated in Asymmetric OT and Blutner’s Weak OT, graphically. Diagram (i) shows the results of applying Weak OT to either the case of causatives or the Chamorro case: the marked form becomes uniquely associated with the marked meaning in both directions of optimization, while the unmarked form and unmarked meaning continue to be a bidirectionally optimal pair, as they were in the original cases. Asymmetric OT does not achieve the harmonious situation depicted in (i) for either case of partial blocking. What it does achieve is represented in (ii) and (iii). Diagram (ii) shows the results of applying Asymmetric OT to the Chamorro case. Here we see that the division of labor depicted in (i) is almost achieved, except that there remains the possibility of interpreting the marked form as the unmarked meaning. This is a result of the fact that Wilson’s proposal does not go beyond unidirectional interpretation OT as regards interpretation. When Asymmetric OT is applied to the classic “cause to die” situation, what results is (iii). Wilson’s system does not succeed in creating any link between the marked form and the marked meaning, so we can see that it does not provide a very general model of partial blocking. In these cases we might better describe what it does as “almost blocking”.
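The outcome for the causative case can be reproduced in a short sketch. The Python below is an illustration only: the form “kill” as the unmarked competitor, and the violation profiles, are assumptions made here (the marked form violates ECONOMY, the indirect meaning violates CANON). Since the relative ranking of ECONOMY and CANON is not fixed by the discussion above, the function takes the ranking as a parameter.

```python
# Sketch of Asymmetric OT applied to the causative case. The unmarked form "kill"
# and all violation profiles are assumptions made for illustration.
FORMS = ["kill", "cause to die"]
MEANINGS = ["direct", "indirect"]
VIOLATIONS = {
    ("kill", "direct"): {},
    ("kill", "indirect"): {"CANON": 1},
    ("cause to die", "direct"): {"ECONOMY": 1},
    ("cause to die", "indirect"): {"ECONOMY": 1, "CANON": 1},
}


def asymmetric_ot(ranking):
    """Run interpretation first, then production over the interpretation winners."""

    def profile(pair):
        return tuple(VIOLATIONS[pair].get(c, 0) for c in ranking)

    def best(pairs):
        pairs = list(pairs)
        if not pairs:
            return []
        top = min(profile(p) for p in pairs)
        return [p for p in pairs if profile(p) == top]

    survivors = []
    for form in FORMS:
        survivors += best((form, meaning) for meaning in MEANINGS)
    return {
        meaning: [f for f, _ in best(p for p in survivors if p[1] == meaning)]
        for meaning in MEANINGS
    }


print(asymmetric_ot(["ECONOMY", "CANON"]))
print(asymmetric_ot(["CANON", "ECONOMY"]))
```

Under either ranking the indirect meaning ends up with no candidate form at all, which corresponds to the missing link between marked form and marked meaning noted above.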
7.3. An alternative Medium OT account
It was noted in section 6.1 that Weak OT models partial blocking only at the expense of massive overgeneration. This makes it untenable as a model of online interpretation or production. Can we model the full range of partial blocking phenomena in a unifying way without such great overgeneration? The possibility I will consider here is the system of Medium Strength OT discussed by Beaver (to appear) and Beaver and Lee (2003). As noted earlier in this paper, this variant of Weak OT performs only one iteration of the Weak OT process, pruning once and grafting once. As a result, it maintains some of the properties of Weak OT, but lacks Weak OT’s “everyone’s a winner” profligacy. In more detail, Medium Strength OT operates as follows:
(i) Starting with a set of production links and a set of interpretation links, find the strong bidirectionally optimal form-meaning pairs.
(ii) Mark form-meaning pairs that have identical form or meaning to a bidirectionally optimal pair, but worse constraint violations.
(iii) Recalculate production and interpretation links for the remainder to get a new set of strong bidirectionally optimal pairs.
The set of medium strength winners is just the union of the winning sets from each round. Stage (ii) corresponds loosely to the pruning phase (phase 2) of Weak OT. In Medium Strength OT, the recoverability condition on optimality (Smolensky 1998) is built into the model as a meta-linguistic constraint that acts as a blocking mechanism in the pruning phase. Let us term this *BLOCK, defined as follows:

(50) *BLOCK: A form-meaning pair may not be dominated by (i.e., lose out to) a bidirectionally optimal candidate in either direction of optimization in the tableau consisting of all constraints except *BLOCK.

In what follows I will illustrate how Medium Strength OT predicts partial blocking in the Chamorro case discussed in section 7.1. Consider first the following bidirectional tableau, in which the *BLOCK column is blank, but other constraint violations are marked. Candidate (a) (⟨f1, m1⟩ in the diagrams in section 7.2) emerges immediately as a bidirectionally optimal form-meaning pair:21
(51) Tableau 5. Partial blocking in Medium OT I
Now let us consider how violations of *BLOCK are evaluated. Of the three candidates that are originally non-optimal, candidates (b) and (c) have identical form or meaning to the bidirectionally optimal candidate (candidate (a)), but worse violations of the standard constraints. Hence they are marked with a star in the *BLOCK column, as shown in Tableau 6: (52) Tableau 6. Partial blocking in Medium OT II
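The three stages of Medium Strength OT and the evaluation of *BLOCK can be sketched as follows. This Python is a minimal sketch under assumptions introduced here: f1 stands for active and f2 for passive, m1 for the harmonic and m2 for the disharmonic meaning, the violation profiles are simplified (AGR vacuously satisfied, ⟨f1, m2⟩ violating BIAS, both f2 pairs violating only *PSV), and the “remainder” in stage (iii) is read as the pairs that are neither first-round winners nor blocked.

```python
# Sketch of Medium Strength OT: one round of strong optimization, one round of
# blocking, one further round of strong optimization. Profiles are assumptions.
RANKING = ["AGR", "*PSV", "BIAS"]
VIOLATIONS = {
    ("f1", "m1"): {},            # active, harmonic
    ("f1", "m2"): {"BIAS": 1},   # active, disharmonic
    ("f2", "m1"): {"*PSV": 1},   # passive, harmonic
    ("f2", "m2"): {"*PSV": 1},   # passive, disharmonic
}


def profile(pair):
    """Violation counts of a form-meaning pair, ordered by constraint rank."""
    return tuple(VIOLATIONS[pair].get(c, 0) for c in RANKING)


def strong_optimal(pool):
    """Pairs that are optimal for both their form and their meaning within pool."""
    winners = []
    for pair in pool:
        same_form = [p for p in pool if p[0] == pair[0]]
        same_meaning = [p for p in pool if p[1] == pair[1]]
        if (profile(pair) == min(profile(p) for p in same_form)
                and profile(pair) == min(profile(p) for p in same_meaning)):
            winners.append(pair)
    return winners


def medium_ot(pairs):
    # Stage (i): strong bidirectional optimization over all pairs.
    round1 = strong_optimal(pairs)
    # Stage (ii): mark pairs sharing a form or meaning with a winner but doing worse.
    blocked = [
        p for p in pairs
        if p not in round1 and any(
            (p[0] == w[0] or p[1] == w[1]) and profile(p) > profile(w)
            for w in round1
        )
    ]
    # Stage (iii): recompute strong optimality over the remainder.
    remainder = [p for p in pairs if p not in round1 and p not in blocked]
    round2 = strong_optimal(remainder)
    return round1 + round2, blocked


winners, blocked = medium_ot(list(VIOLATIONS))
print(winners)
print(blocked)
```

Under these assumptions the pairs sharing a form or a meaning with ⟨f1, m1⟩ are the ones marked as blocked, and ⟨f2, m2⟩ is left as the only competitor in the second round.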
Thus Medium Strength OT produces two bidirectionally optimal candidates, ⟨f1, m1⟩ and ⟨f2, m2⟩. The same result occurs with the standard cases of partial blocking, so no tableau will be shown here.
In summary, I have presented a bidirectional OT account of disambiguation patterns in head-marking languages in which hierarchy-driven voice plays a key role in distinguishing subject from object in potentially ambiguous sentences. In such languages, the distribution of voice displays a highly structured pattern of interpretational preferences and blocking effects. I have argued that in Medium Strength OT, this pattern can be formally modeled as a specific case of partial blocking, building on the insights of Aissen (2002). Thus bidirectional OT suggests a much more general account of the relation between alternative morphosyntactic devices for encoding (or recovering) grammatical relations than unidirectional OT or minimalism does.
8. Conclusion
The present study has shown that the apparent MLC effect in the phenomenon of word order freezing is derivable in bidirectional OT from the interaction of violable, ranked correspondence constraints that are independently motivated for an account of crosslinguistic variation in case patterns and constituent ordering. I have also discussed the problem of recoverability in the two head-marking languages, Chamorro and Tzotzil, and argued that the key generalization that passive is avoided unless the intended meaning cannot be recovered from active can be formally modeled in Medium Strength OT as a specific case of partial blocking. These results thus provide a strong argument for a grammar model in which recoverability plays a key role in determining the class of optimal structures in human language.
Acknowledgements
I am grateful for critical feedback to participants at the Workshop on Minimal Link Effects in Minimalist and Optimality Theoretic Syntax (University of Potsdam, March 21–22, 2002), two anonymous reviewers, and to the editors of the volume, Gisbert Fanselow, Arthur Stepanov and Ralf Vogel, who gave feedback and advice all along the way. Any errors of fact or interpretation are due to my own shortcomings. This work was in part supported by the National Science Foundation grant BCS-9818077.
Notes
1. Ignoring oblique case-marked subjects here.
2. The abbreviations used in the glosses: ACC ‘accusative’, DAT ‘dative’, DET ‘determiner’, ERG ‘ergative’, FUT ‘future’, INST ‘instrumental’, NF ‘non-finite’, NOM ‘nominative’, PERF ‘perfective’, PRES ‘present’, PSV ‘passive form’.
3. See Masica (1982, 1991), Butt (1993), Mohanan (1994), Aissen (2003) and references cited therein for further details and examples.
4. The other environment for word order freezing occurs when a sentence contains highly marked types of subject and object. In this paper I limit myself to the word order freezing arising from morphological ambiguity, excluding the type of word order freezing that occurs in the case of extreme markedness. For OT works developing an approach to the latter type of word order freezing, see Lee (2001a, 2001b, 2003b).
5. As is well-known, displacement from canonical position in Hindi creates definiteness effects. For example, the preferred interpretation of the object ṭhelaa ‘cart’ in (3) is as indefinite. However, displacing this object from its canonical position immediately preceding the verb tends to restrict the interpretation to definite. In contrast, the subject patthar ‘stone’ in (3) can be marginally interpreted as indefinite in its canonical clause-initial position, but it can be interpreted only as definite in the position immediately preceding the verb.
6. Evidence that objects without overt marking are in the nominative rather than the zero-marked accusative comes from verb agreement. As argued in Mohanan (1994, section 4.3.3.2), verbs in Hindi can agree with the object in the nominative, if the subject is not nominative. Verbs do agree with inanimate objects not overtly marked, suggesting that they are in the nominative. That objects without overt marking must be treated as nominative rather than accusative is further supported by facts of modifier agreement and coordination. See Mohanan (1994: 87–90) for further details and examples.
7. If not marked otherwise, all Hindi examples are taken from Mohanan (1992), Mohanan (1994), Mohanan and Mohanan (1994).
8. Subjects and objects are the core argument functions that map to the central participants of the eventuality expressed by the verb. They are usually formally distinguished from obliques, which are noncore argument functions, and adjuncts, which are not argument functions.
9. See Lee (2001b, section 3.1) for a detailed discussion of limitations of pre-OT generative approaches to syntax.
10. In some languages case affixes are not separated from the stem. For these languages, the information in (10) would be part of the morphological structure of case-inflected words.
11. Whether this argument is mapped onto the grammatical subject (syntactically accusative languages) or other function (syntactically ergative languages) is determined by the constraint ranking of particular languages.
12. The model of constructive case as formulated by Nordlinger (1998) makes crucial reference to a morphemic approach to morphology: morphemes are directly associated with functional descriptions which state that they define or constrain the larger syntactic environment within which they appear. In this work, I will assume a simple morpheme-based approach much like that assumed in much work on constructive morphology. It should be noted, however, that nothing in the general OT approach to case developed here hinges crucially on such a view of morphology, and the central idea of constructive morphology could be translated into a word-based morphology, e.g., an inferential-realizational approach to morphology (Stump 2001; Börjars, Vincent and Chapman 1997; Sadler and Spencer 2001). See Sadler and Nordlinger (2003) for further discussion.
13. It should be emphasized that the constraints in (13) refer to case features, not language-particular forms that realize them.
14. A reviewer points out that “quirky” or “lexical” case is not necessarily used at random. This criticism is warranted. There might be systematicity in the use of lexical case that is the result of a grammar’s evolution. A well-known example illustrating this is lexical case in Icelandic. Consider, for example, the following example of an Icelandic double accusative construction:
(i) Drengina vantar mat.
    boys-ACC lacks food-ACC
    ‘The boys lack food.’
Interestingly, it has been observed in the literature that double accusative verbs in Icelandic like vantar ‘lack’ are now turning into dative-accusative verbs. This phenomenon, referred to as ‘dative sickness’ by Smith (1994), could be predicted under the assumption that ‘lexical’ or ‘quirky’ case (which is stipulated in the lexical entry of a verb) is more marked than structural or grammatical case, and thus historical change proceeds in a direction in which lexical stipulations are simplified. Although the framework of OT has been increasingly applied to lexical inventories, it is still an open issue how an OT account can provide for the whole set of case patterns, including both lexical and structural cases, found in particular languages (see Wunderlich (2001) for some insightful suggestions on this issue).
15. See McCarthy and Prince (1993) and Kager (1999, Ch. 3) for an overview of alignment in OT. For recent work on OT syntax applying alignment constraints to syntactic positioning, see Choi (1999), Costa (1998, 2001), Grimshaw (1997, 2001a, 2001b), Legendre (2001), Samek-Lodovici (2001), Sells (2001b), among others.
16. For discussion of different asymmetric models, see Zeevat (2000), Jäger (2004) and Vogel (2004).
17. A similar account of word order freezing has been proposed by Kuhn (2001a, 2001b), within a broader consideration of the development of a parsing and generation algorithm for OT-LFG. Application of bidirectional optimization to standard MLC phenomena is found in Vogel (this volume).
18. Here discourse-based alignment constraints such as TOP-L are inapplicable because there is no previous context which can make one argument more topical than others. In order to capture the effects of context on disambiguation, however, a representation of the contextual information for previous sentences needs to be assumed as part of the input to comprehension-directed optimization. In other words, when context is supplied to comprehension-directed optimization, the string input to comprehension is enriched with interpretational features, just as the f-structure input to the production direction contains information about each element’s discourse status. This additional information then will play a role in selecting the optimal analysis of the string, by activating discourse-based alignment constraints.
19. Compare Legendre, Smolensky and Wilson (1998) for a similar result with respect to MLC effects in multiple wh-questions. More recently, Grimshaw (2001a) has argued that there should not be any “economy constraints” as such, but that economy of structure should be a by-product of a set of constraints on phrase structure.
20. A bidirectional OT account of freezing effects in head-marking Bantu languages is presented in Lee (2000).
21. I use the victory symbol (✌) to mark a bidirectionally optimal form-meaning pair.
References
Aissen, J.
1997 On the syntax of obviation. Language, 73, 705–750.
1999 Markedness and subject choice in Optimality Theory. Natural Language and Linguistic Theory, 17, 673–711.
2002 Bidirectional optimization in non-configurational contexts. Paper presented at NELS 33, MIT.
2003 Differential object marking: Iconicity vs. economy. Natural Language and Linguistic Theory, 21, 435–483.
Asudeh, A.
2001 Linking, optionality and ambiguity in Marathi: An Optimality Theory analysis. In P. Sells (Ed.), Formal and Empirical Issues in Optimality Theoretic Syntax, (pp. 257–312). Stanford: CSLI Publications.
Beaver, D. I.
to appear The optimization of discourse anaphora. Linguistics and Philosophy.
Beaver, D. I. & Lee, H.
2003 Form-meaning asymmetries and bidirectional optimization. In J. Spenader, A. Eriksson, & Ö. Dahl (Eds.), Proceedings of the Stockholm Workshop on Variation within Optimality Theory, (pp. 138–149). Department of Linguistics, Stockholm University.
2004 Input-output mismatches in OT. In R. Blutner & H. Zeevat (Eds.), Optimality Theory and Pragmatics, (pp. 112–153). Palgrave.
Blake, B. J.
2001 Case (Second Edition). Cambridge: Cambridge University Press.
Blutner, R.
2000 Some aspects of optimality in natural language interpretation. Journal of Semantics, 17, 189–216.
Börjars, K., Vincent, N., & Chapman, C.
1997 Paradigms, periphrasis and pronominal inflection: A feature-based account. In G. Booij & J. van Marle (Eds.), Yearbook of Morphology 1996, (pp. 155–180). Dordrecht: Kluwer Academic Publishers.
Bresnan, J.
2000 Optimal syntax. In J. Dekkers, F. van der Leeuw, & J. van de Weijer (Eds.), Optimality Theory: Phonology, Syntax, and Acquisition, (pp. 334–385). Oxford: Oxford University Press.
2001 Lexical-Functional Syntax. Oxford: Blackwell.
Bresnan, J. & Kanerva, J.
1989 Locative inversion in Chicheŵa: A case study of factorization in grammar. Linguistic Inquiry, 20, 1–50.
Bresnan, J. & Kaplan, R.
1982 Lexical-Functional Grammar: A formal system for grammatical representation. In J. Bresnan (Ed.), The Mental Representation of Grammatical Relations, (pp. 173–281). Cambridge: MIT Press.
Butt, M.
1993 Object specificity and agreement in Hindi/Urdu. In Proceedings of the 29th Chicago Linguistic Society, (pp. 80–103). Chicago: Chicago Linguistic Society.
Butt, M. & King, T. H.
2002 The status of case. In V. Dayal & A. Mahajan (Eds.), Clause Structure in South Asian Languages. In press. Dordrecht: Kluwer Academic Publishers.
Choi, H.-W.
1999 Optimizing Structure in Context: Scrambling and Information Structure. Stanford: CSLI Publications.
Chomsky, N.
1981 Lectures on Government and Binding. Dordrecht: Foris.
1986 Barriers. Cambridge: MIT Press.
1993 A minimalist program for linguistic theory. In K. Hale & S. J. Keyser (Eds.), The View from Building 20: Essays in Linguistics in Honor of Sylvain Bromberger, (pp. 1–52). Cambridge: MIT Press.
1995 The Minimalist Program. Cambridge: MIT Press.
Chung, S.
1981 Transitivity and surface filters in Chamorro. In J. Hollyman & A. Pawley (Eds.), Studies in Pacific Languages and Culture: In Honor of Bruce Biggs, (pp. 311–332). Auckland: Linguistic Society of New Zealand.
1989 On the notion of “null anaphor” in Chamorro. In O. Jaeggli & K. Safir (Eds.), The Null Subject Parameter, (pp. 143–184). Dordrecht: Kluwer Academic Publishers.
1998 The Design of Agreement. Chicago: University of Chicago Press.
Cooreman, A.
1987 Transitivity and Discourse Continuity in Chamorro Narratives. Berlin: Mouton de Gruyter.
Costa, J.
1998 Word Order Variation: A Constraint-Based Approach. Ph.D. thesis, Leiden University.
2001 The emergence of the unmarked word order. In G. Legendre, J. Grimshaw, & S. Vikner (Eds.), Optimality-Theoretic Syntax, (pp. 171–203). Cambridge: MIT Press.
DeLancey, S.
1981 An interpretation of split ergativity and related patterns. Language, 57, 626–658.
Dixon, R. M. W.
1979 Ergativity. Language, 55, 59–138.
Dowty, D.
1991 Thematic proto-roles and argument selection. Language, 67, 547–619.
Gambhir, V.
1981 Syntactic Restrictions and Discourse Functions of Word Order in Standard Hindi. Ph.D. thesis, University of Pennsylvania.
Givón, T.
1984 Syntax: A Functional-Typological Introduction, volume 1. Amsterdam: John Benjamins.
Goddard, C.
1982 Case systems and case markings in Australian languages: A new interpretation. Australian Journal of Linguistics, 2, 167–196.
Grimshaw, J.
1997 Projection, heads, and optimality. Linguistic Inquiry, 28, 373–422.
2001a Economy of structure in OT. Online, Rutgers University: Rutgers Optimality Archive. ROA-434-0601, http://roa.rutgers.edu.
2001b Optimal clitic positions and the lexicon in Romance clitic systems. In G. Legendre, J. Grimshaw, & S. Vikner (Eds.), Optimality-Theoretic Syntax, (pp. 205–240). Cambridge: MIT Press.
Grimshaw, J. & Samek-Lodovici, V.
1998 Optimal subjects and subject universals. In P. Barbosa, D. Fox, P. Hagstrom, M. McGinnis, & D. Pesetsky (Eds.), Is the Best Good Enough? Optimality and Competition in Syntax, (pp. 193–219). Cambridge: MIT Press.
Hopper, P. & Thompson, S. A.
1980 Transitivity in grammar and discourse. Language, 56, 251–299.
Horn, L.
1984 Toward a new taxonomy of pragmatic inference: Q-based and R-based implicature. In D. Schiffrin (Ed.), Meaning, Form, and Use in Context, (pp. 11–42). Washington, DC: Georgetown University Press.
Jäger, G.
2004 Learning constraint sub-hierarchies: The bidirectional gradual learning algorithm. In R. Blutner & H. Zeevat (Eds.), Optimality Theory and Pragmatics, (pp. 251–287). Palgrave.
Kager, R.
1999 Optimality Theory. Cambridge: Cambridge University Press.
Kuhn, J.
2001a Formal and Computational Aspects of Optimality-theoretic Syntax. Ph.D. thesis, Institut für maschinelle Sprachverarbeitung, Universität Stuttgart.
2001b Generation and parsing in Optimality Theoretic syntax – issues in the formalization in OT-LFG. In P. Sells (Ed.), Formal and Empirical Issues in Optimality Theoretic Syntax, (pp. 313–366). Stanford: CSLI Publications.
Lee, H.
2000 Bidirectional optimality and ambiguity in argument expression. Longer version of the paper presented at the LFG2000 Conference, UC Berkeley.
2001a Markedness and word order freezing. In P. Sells (Ed.), Formal and Empirical Issues in Optimality Theoretic Syntax, (pp. 63–127). Stanford: CSLI Publications.
2001b Optimization in Argument Expression and Interpretation: A Unified Approach. Ph.D. thesis, Stanford University.
2002 Crosslinguistic variation in argument expression and intralinguistic freezing effects. In T. Ionin, H. Ko, & A. Nevins (Eds.), MIT Working Papers in Linguistics 43, (pp. 103–123). Cambridge: MIT Linguistics Department.
2003a Parallel optimization in case systems. In M. Butt & T. H. King (Eds.), Nominals: Inside and Out, (pp. 15–58). Stanford: CSLI Publications.
2003b Prominence mismatch and markedness reduction in word order. Natural Language and Linguistic Theory, 21, 617–680.
Legendre, G.
2001 Masked second-position effects and the linearization of functional features. In G. Legendre, J. Grimshaw, & S. Vikner (Eds.), Optimality-Theoretic Syntax, (pp. 241–277). Cambridge: MIT Press.
Legendre, G., Smolensky, P., & Wilson, C.
1998 When is less more?: Faithfulness and minimal links in wh-chains. In P. Barbosa, D. Fox, P. Hagstrom, M. McGinnis, & D. Pesetsky (Eds.), Is the Best Good Enough? Optimality and Competition in Syntax, (pp. 249–289). Cambridge: MIT Press.
Masica, C. P.
1982 Identified object marking in Hindi and other languages. In O. N. Koul (Ed.), Topics in Hindi Linguistics, (pp. 16–50). New Delhi: Bahri Publications.
1991 The Indo-Aryan Languages. Cambridge: Cambridge University Press.
McCarthy, J. & Prince, A.
1993 Generalized alignment. In G. Booij & J. van Marle (Eds.), Yearbook of Morphology, (pp. 79–153). Dordrecht: Kluwer Academic Publishers.
1994 The emergence of the unmarked: Optimality in prosodic morphology. In M. Gonzàlez (Ed.), Proceedings of NELS 24, (pp. 333–379). Amherst: GLSA, University of Massachusetts.
McCawley, J.
1978 Conversational implicature and the lexicon. In P. Cole (Ed.), Syntax and Semantics 9: Pragmatics, (pp. 245–259). New York: Academic Press.
Mohanan, K. P. & Mohanan, T.
1994 Issues in word order in South Asian languages: Enriched phrase structure or multidimensionality? In M. Butt, T. H. King, & G. Ramchand (Eds.), Theoretical Perspectives on Word Order in South Asian Languages, (pp. 153–184). Stanford: CSLI Publications.
Mohanan, T.
1992 Word order in Hindi. Paper presented at the Syntax Workshop. Stanford University.
1994 Argument Structure in Hindi. Stanford: CSLI Publications.
Müller, G.
2001 Order preservation, parallel movement and the emergence of the unmarked. In G. Legendre, J. Grimshaw, & S. Vikner (Eds.), Optimality-Theoretic Syntax, (pp. 279–313). Cambridge: MIT Press.
Nichols, J.
1986 Head-marking grammar and dependent-marking grammar. Language, 62, 56–119.
Nordlinger, R.
1998 Constructive Case: Evidence from Australian Languages. Stanford: CSLI Publications.
Prince, A. & Smolensky, P.
1993 Optimality Theory: Constraint Interaction in Generative Grammar. RuCCS Technical Report #2, Center for Cognitive Science, Rutgers University, Piscataway.
Sadler, L. & Nordlinger, R.
2003 Relating morphology to syntax. In L. Sadler & A. Spencer (Eds.), Projecting Morphology. To appear. Stanford: CSLI Publications.
Sadler, L. & Spencer, A.
2001 Syntax as an exponent of morphological features. In G. Booij & J. van Marle (Eds.), Yearbook of Morphology 2000, (pp. 71–96). Dordrecht: Kluwer Academic Publishers.
Phrase impenetrability and wh-intervention
Gereon Müller
1. Introduction and overview
This paper takes as its starting point the observation that something is wrong with a Minimal Link Condition (MLC) in a derivational grammar. Brody (2001) argues that a derivational approach to syntax should minimize search space, its representational residue; thus, the amount of structure that is visible and accessible to syntactic operations at any given step should be as small as possible. Given this tenet, it follows that constraints that minimize search space should be strengthened in a derivational grammar; in contrast, constraints that presuppose search space should be abandoned. A constraint that minimizes search space is the Phase Impenetrability Condition (PIC – see Chomsky 2000, 2001b); in contrast, the Minimal Link Condition (MLC – see Fanselow 1991; Ferguson and Groat 1994; Chomsky 1995, 2000, 2001b – among many others) is a constraint that presupposes search space. In line with this, I will argue that wh-intervention effects usually attributed to the MLC (more specifically, superiority effects as they arise with wh-movement in German and English), as well as certain superiority-like wh-intervention effects that the MLC has nothing to say about, can be derived from a strengthened version of the PIC – one that holds for phrases rather than phases.
I will proceed as follows. Section 2 provides some background assumptions, introduces standard versions of the PIC and the MLC, and lays out conceptual arguments against the MLC, and for a version of the PIC that is based on a more local domain. Section 3 develops an approach to syntactic movement operations that dispenses with the MLC and relies on a more restrictive version of the PIC. The resulting approach is then shown to account for standard superiority effects in English, the absence of standard superiority effects in German, as well as a priori unexpected instances of superiority and superiority-like effects in both languages. Finally, section 4 draws a conclusion.
2. Phase impenetrability
2.1. The standard approach
Throughout this paper, I presuppose an incremental-derivational approach to movement as developed in Chomsky (2000) and Chomsky (2001b). In this kind of approach, two constraints prove particularly relevant; they reduce derivational search space by imposing strong restrictions on what counts as an active, accessible part of the derivation. First, the Strict Cycle Condition (SCC), arguably indispensable in any derivational approach to syntax, restricts possible positions for the probe (i.e., features of a head that drive movement operations and create the target for movement); second, the PIC significantly reduces the positions in which the derivation can look for a goal (i.e., the item that is to be moved). For present purposes, the SCC can be formulated in a classical way, as in (1) (see Chomsky 1973; Perlmutter and Soames 1979).¹
(1) Strict Cycle Condition (SCC): Within the current XP α, a syntactic operation may not target a position that is included within another XP β that is dominated by α.
A first version of the PIC is given in (2) (see Chomsky 2000: 108, 2001b: 13).²
(2) Phase Impenetrability Condition1 (PIC1): The domain of a head X of a phase XP is not accessible to operations outside XP; only X and its edge are accessible to such operations.
The notions of (i) “edge” and (ii) “phase” need to be clarified. (i) The edge of a head X is the left-peripheral minimal residue outside of X′; it includes specifiers of X, of which there can in principle be arbitrarily many (irrelevantly for the purposes of this paper, it also comprises adjuncts to XP); see Chomsky (2001b: 13). (ii) The propositional categories CP and vP are phases; other XPs (except perhaps for DP) are not.
With this in mind, let us look abstractly at syntactic derivations, and determine the search space available to the derivation at any given point. Thus, suppose that ZP, XP, and UP are phases in (3). Then, in (3a), an operation can have a probe only in YP (because of the SCC), and an operation can look for a goal only in YP or in the residue or head of XP (because of the PIC1). In the subsequent step (3b), the probe must be in ZP, and the search space for a goal grows as indicated.
(3) Search space under PIC1: [tree diagrams (3a) and (3b) not reproduced]
Crucially, the PIC1 does not allow an operation involving Y and an item in WP. Chomsky (2001b) argues that such operations are in fact attested, though, and he gives the following example: Suppose that YP = TP, XP = vP, and WP = VP. The PIC1 then precludes an operation involving T and NP in VP; but such an operation must arguably be legitimate for instances of long-distance agreement with VP-internal nominative NPs, attested in a number of languages. Chomsky’s solution is to weaken the phase impenetrability requirement in such a way that a phase is evaluated with respect to the PIC at the next phase level; PIC1 is accordingly replaced by PIC2 (see Chomsky 2001: 14).
(4) Phase Impenetrability Condition2 (PIC2): The domain of a head X of a phase XP is not accessible to operations at ZP (the next phase); only X and its edge are accessible to such operations.
As a consequence, the derivational search space is enlarged: Operations in YP can now look for a goal in YP, in XP, in WP, or in the residue or head of UP. This is shown in (5). Agreement operations involving T and VP-internal nominative NPs are now predicted to be legitimate.
(5) Search space under PIC2: [tree diagram not reproduced]
Since the empirical focus of the present paper will be on superiority (-like) effects with wh-movement, let me now address the mechanics of wh-movement in the SCC/PIC-based approach. Movement in general is viewed as an agreement relation that is accompanied by an EPP feature on the probe; checking is deletion under matching. Both PIC1 and PIC2 require successive-cyclic wh-movement to proceed via phase edges, i.e., Specv and SpecC. However, the need for successive-cyclic movement does not automatically provide a trigger for such movement (given that the grammar is not equipped with look-ahead capacity). If we assume that all movement operations must be triggered by certain kinds of features, it is clear that there must be such features on heads of phases that trigger intermediate movement steps to phase edges. These features must be optional (so as to prevent derivations without wh-movement or other unbounded dependencies from crashing); ideally, they should only occur when they are needed. To this end, the requirement in (6) is proposed in Chomsky (2000: 109, 2001: 34); I will refer to this as the Optional EPP Feature Condition.
(6) Optional EPP Feature Condition: The head X of phase XP may be assigned an EPP-feature (after the phase XP is otherwise complete), but only if that has an effect on outcome.
It is by no means evident how “having an effect on outcome” can be understood in a strictly local way, without look-ahead. However, for the moment, I will simply presuppose here that the Optional EPP Feature Condition can indeed be checked locally.³ On this basis, consider the (simplified) derivation of a wh-question involving clause-bound wh-movement in English. EPP features show up obligatorily on T and on C marked [wh]; in addition, there is an optional EPP feature on v that is inserted in accordance with the Optional EPP Feature Condition.
(7) (I wonder) what John read
a. [VP read3 what1 ]
b. [vP what1 John2 read3 [VP t3 t1 ]] (EPP on v)
c. [TP John2 T [vP what1 t2 read3 [VP t3 t1 ]]] (EPP on T)
d. [CP what1 C [TP John2 T [vP t′1 t2 read3 [VP t3 t1 ]]]] ([wh], EPP on C)
A further assumption that is usually made in this kind of approach is that syntactic operations like movement are subject to a Minimal Link Condition (MLC), as in (8) (see Chomsky 2000: 123, 2001b: 27).
(8) Minimal Link Condition (MLC): If β and γ both match a probe α and β asymmetrically c-commands γ, a syntactic operation cannot involve α and γ.
The MLC is essentially a feature-based version of the Superiority Condition in Chomsky (1973); in cases of potential ambiguity where two items could act as goals for a given probe, only the higher one can in fact participate in the operation. The MLC has a number of interesting consequences (for superiority and other effects); but there are also several well-known problems with a simple version of this constraint. An obvious problem is that subject raising from a vP-internal position to SpecT is wrongly expected to be blocked by the MLC if object movement to Specv has occurred.⁴ Thus, what1 is closer to T in (7c) than t2, and should therefore have precluded movement of John2 to SpecT. Several solutions to this problem have been proposed. Chomsky (1995) envisages a way out in terms of the concept of “equidistance,” which plays a role instead of the notion of “asymmetrical c-command” in the formulation of the MLC. The equidistance approach is abandoned again in Chomsky (2000, 2001b) in favour of the stricter formulation of the MLC in (8). The problem that the MLC poses for subject raising in (7) is then addressed by observing that after wh-movement of what1 to SpecC, the subject NP is the closest goal for T after all (the intervening object having left its position). At first sight, it seems that an execution of this idea implies giving up the SCC: Movement in TP would have to follow movement in CP, in violation of strict cyclicity. Still, Chomsky suggests that there is a way out of this dilemma that respects both the SCC and the MLC in strict versions: The idea is that the MLC is not evaluated at each step of the derivation; rather, it is only evaluated at the phase level. Thus, subject raising in (7c) would indeed violate the MLC; but TPs are not phases, and the MLC is therefore not operative at this stage. The MLC does apply to the output in (7d) because CP is a phase.
However, at this point, there is no overt NP in Specv left that would separate the subject trace and T, and, given some obvious adjustments, it follows that the MLC is respected. Of course, there is now a change of perspective that is non-trivial: The MLC cannot be conceived of as a derivational constraint on operations anymore; it acts as a representational constraint on certain kinds of structures (viz., trees with phases at the root).
This concludes the sketch of movement operations in the incremental-derivational approach developed in Chomsky (2000, 2001b). In the next subsection, I will argue that both the MLC and the PIC1,2 emerge as suboptimal from a point of view that takes the task of reducing derivational search space seriously; and I will argue that the MLC should be dispensed with completely in favour of a more restrictive version of the PIC.
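The difference between a step-by-step and a phase-level evaluation of the MLC in (8) can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the names `Item` and `mlc_ok` are hypothetical, a flat top-down list stands in for asymmetric c-command, and treating traces as non-interveners is just one possible reading of the "obvious adjustments" mentioned above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    name: str
    features: frozenset
    overt: bool = True  # assumption of this sketch: traces (overt=False) never intervene

def mlc_ok(probe_feature, items, goal_name):
    """Check (8): no overt matching item asymmetrically c-commands the goal.

    `items` lists what the probe c-commands, highest first; list order
    stands in for asymmetric c-command.
    """
    for item in items:
        if item.name == goal_name:
            return True   # reached the goal without meeting an intervener
        if item.overt and probe_feature in item.features:
            return False  # a higher matching item blocks the operation
    raise ValueError(f"{goal_name} is not in the probe's domain")

D, WH = "D", "wh"

# Derivational evaluation at step (7c): T probes for [D];
# what1 sits in Specv above the subject John2.
step_7c = [Item("what1", frozenset({D, WH})), Item("John2", frozenset({D}))]

# Phase-level evaluation of the output (7d): what1 has moved on to SpecC,
# so only its trace separates T from the subject trace t2.
phase_7d = [Item("t'1", frozenset({D, WH}), overt=False),
            Item("t2", frozenset({D}), overt=False)]

print(mlc_ok(D, step_7c, "John2"))  # False: subject raising is blocked step by step
print(mlc_ok(D, phase_7d, "t2"))    # True: the same relation passes at the phase level
```

The second call only succeeds because the check inspects a finished structure, which is exactly the representational character of the re-interpreted MLC.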
2.2. Conceptual considerations
It is an attractive feature of incremental-derivational approaches to syntax that complexity can be reduced, compared to representational approaches. Such reduction of complexity becomes manifest in three different domains. First, the system does not permit look-ahead: At any given stage of the derivation, operations in later cycles and their effects cannot be considered. Second, the system relies on cyclicity: At any given stage of the derivation, the SCC makes it impossible to target a position (i.e., locate a probe) by a syntactic operation that is not included in the minimal XP. And third, the system incorporates a phase impenetrability requirement (PIC1,2) that significantly reduces the search space for the goal of an operation. In effect, all syntactic material in the domain that the PIC renders opaque can (and must) be ignored for the remainder of the derivation.⁵
So far, so good. However, closer inspection reveals conceptual problems with both the MLC and the two versions of the PIC: First, the MLC inherently depends on a certain amount of search space to work on. And second, it turns out that the PIC1 and, in particular, the PIC2 could reduce search space even more radically. More specifically, given the overall goal of search space reduction, the MLC/PIC1,2-based approach to movement creates three conceptual problems.
2.2.1. Weak and strong representationality
In his comparison of derivational and representational approaches to syntax, Brody (2001) observes that a representational approach can be strictly non-derivational. In contrast, a derivational approach is usually representational to some extent, by adhering to the very concept of syntactic structure. Brody calls a derivational approach weakly representational if “derivational stages are transparent (i.e., representations), in the sense that material already assembled can be accessed;” and he calls it strongly representational if it “is weakly representational and there are constraints on the representations.” On this view, the approach sketched in the previous subsection is strongly representational: This is not the fault of the SCC or the PIC (in either version); these are derivational constraints on operations. In the formulation given in (8), the MLC is also a derivational constraint; however, this is not the case anymore if we re-interpret the MLC in the way suggested at the end of the previous section to account for the existence of subject raising in examples like (7). Here, the MLC is a representational constraint that is evaluated at the phase level; it checks the legitimacy of structures rather than operations. Brody concludes from this (and from related observations) that a representational approach has an inherent advantage over a derivational approach in this domain. Let us assume that the argument is correct. Then, given a derivational approach, the task will be to reduce its representational residue – ideally, a derivational theory should not even be weakly representational. This implies abandoning all constraints that presuppose too much structure (in a sense to be made precise); a good candidate for exclusion then is the MLC.
2.2.2. A redundancy
Interestingly, a simultaneous adoption of the MLC and the PIC leads to redundancies: As noted by Chomsky (2001b: 47, fn. 52), “the effect on the MLC is limited under the PIC, which bars ‘deep search’ by the probe.” Thus, the MLC can only become relevant in the relatively small portions of structure permitted by PIC1/PIC2; it thus loses much of its original empirical coverage. Against the background of Brody’s argument involving (weak or strong) representationality of derivational approaches, this can be viewed as further evidence that derivational approaches should dispense with the MLC in toto. I would like to contend that, in a derivational approach, minimality effects should not be covered by a constraint that accesses a significant amount of syntactic structure, i.e., a representation, and then chooses between two items that may in principle participate in a given operation (as is done by the MLC). Rather, minimality effects should emerge as epiphenomena of constraints that reduce the space in which the derivation can look for items that may participate in an operation (as is done by the PIC); ideally, all competition among items (that a priori qualify for some operation) that must be resolved is in fact independently resolved if the search space is sufficiently small.
2.2.3. An asymmetry
The SCC and the PIC have complementary tasks and look like two sides of the same coin. Therefore, it is a potentially suspicious property of the system laid out above that the two constraints rely on syntactic domains of such a different size. In one case (SCC), it is the phrase, in the other, it is the phase (PIC). In an optimally designed system, we would expect more symmetry in domains for probe and goal localization: Either the local domain of the SCC should be the phase (not the phrase), or the local domain of the PIC should be the phrase (not the phase).
My goal in what follows is to develop a derivational approach that evades these three conceptual problems by exhibiting the following properties: First, the material that can be accessed at any given step of the derivation is an extremely small bundle of categories with virtually no internal structure that can hardly be called a representation anymore. Hence, the approach to be developed will not even be weakly representational.⁶ Second, the MLC is dispensed with in favour of a strengthened version of the PIC. Third, the new version of the PIC has the same kind of local domain as the SCC: the phrase.
3. Phrase impenetrability
3.1. Assumptions
Following Sternefeld (2000), I assume a system in which two types of features participate in movement operations. On the one hand, there are [*F*] features that trigger movement as probes (to specifier positions, for the cases considered in this paper, and directly, without recourse to additional generalized EPP features). On the other hand, there are corresponding [F] features on items that turn them into goals for a movement operation triggered by [*F*]. The constraint that brings about movement is the Feature Condition; the constraint that requires all movement to be feature-driven is Last Resort.
(9) Feature Condition: An [*F*] feature on X requires movement of an item marked [F] to the edge of X.
(10) Last Resort: Movement requires matching [F] and [*F*] at an edge.
The SCC remains the same; (1) is repeated here as (11). However, the PIC is now restricted to phrases; see the PIC3 in (12).⁷
(11) Strict Cycle Condition (SCC): Within the current XP α, a syntactic operation may not target a position that is included within another XP β that is dominated by α.
(12) Phrase Impenetrability Condition3 (PIC3): The domain of a head X of a phrase XP is not accessible to operations outside XP; only X and its edge are accessible to such operations.
A comparison of the abstract derivations in (3) (under PIC1) and (5) (under PIC2) with the abstract derivation in (13) shows that the new PIC3 is more restrictive in the sense that derivational search space is minimized.
(13) Search space under PIC3: [tree diagram not reproduced]
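Since the tree diagrams are not available here, the comparison of search spaces under PIC1, PIC2, and PIC3 can be approximated by a short Python sketch. Everything in it is an assumption made for illustration: the function name `search_space`, the linear spine ZP > YP > XP > WP > UP (inferred from the discussion of (3) and (5)), and the labels "full" (whole phrase searchable) versus "edge" (only head and edge searchable).

```python
def search_space(spine, phases, root, pic):
    """Map each phrase visible from `root` to 'full' or 'edge'.

    spine:  phrase labels from the highest to the lowest projection
    phases: labels that count as phases (irrelevant under PIC3)
    root:   the phrase currently being built (home of the probe, by the SCC)
    pic:    1, 2, or 3
    """
    start = spine.index(root)
    visible = {}
    for i in range(start, len(spine)):
        xp = spine[i]
        if pic == 1:    # opaque: any phase below the root
            opaque = i > start and xp in phases
        elif pic == 2:  # opaque: a phase with a higher phase already built
            opaque = xp in phases and any(p in phases for p in spine[start:i])
        elif pic == 3:  # opaque: any phrase below the root
            opaque = i > start
        else:
            raise ValueError("pic must be 1, 2, or 3")
        visible[xp] = "edge" if opaque else "full"
        if opaque:
            break       # nothing below an opaque domain can be reached
    return visible

SPINE = ["ZP", "YP", "XP", "WP", "UP"]  # assumed hierarchy, highest first
PHASES = {"ZP", "XP", "UP"}

for root in ("YP", "XP"):
    for pic in (1, 2, 3):
        print(f"probe in {root}, PIC{pic}:", search_space(SPINE, PHASES, root, pic))
```

With the probe in YP, the sketch returns YP plus the edge of XP under PIC1, and YP, XP, WP plus the edge of UP under PIC2, matching the prose descriptions of (3a) and (5). Under PIC3 the result never contains more than the root and the edge of its complement, whatever the choice of phases.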
Finally, recall that so far, intermediate movement steps required by the PIC were triggered by optional EPP features demanded by the Optional EPP Feature Condition. I would now like to suggest that the role of the Optional EPP Feature Condition is played by the constraint Phrase Balance. This constraint is a straightforward adaptation of the constraint Phase Balance developed in Heck and Müller (2000). Ph(r)ase Balance arguably captures the underlying idea of the Optional EPP Feature Condition; and it does so without running into the danger of invoking look-ahead – the pieces of information that must be taken into account for the purposes of Phrase Balance at any given stage of the derivation are locally available, either in the present tree, or in the workspace of the derivation, which must be accessible throughout (I will address this concept immediately).
(14) Phrase Balance: Every XP has to be balanced: For every feature [*F*] in the numeration there must be a potentially available feature [F] at the XP level.
The concept of potential availability of a feature remains to be defined; this can be done as in (15).
(15) Potential availability: A feature [F] is potentially available if (i) or (ii) holds:
(i) [F] is on X or edgeX of the present root of the derivation.
(ii) [F] is in the workspace of the derivation.
The workspace of a derivation D comprises the numeration N and material in trees that have been created earlier (with material from N) and have not yet been used in D.
Phrase Balance triggers movement without feature matching in cases where the Feature Condition does not force movement (viz., to intermediate positions). However, Last Resort clearly prohibits such movement. In view of this state of affairs, I will assume (following again Heck and Müller 2000) that Last Resort is minimally violable if this is the only way to fulfill the inviolable constraints Feature Condition, SCC, PIC3, and Phrase Balance.⁸ As a consequence of Phrase Balance, wh-movement must proceed via every XP on the way to its ultimate target position (the C[*wh*] node that attracts it, because of the Feature Condition).⁹ The reason is this: As long as there is a C bearing the feature [*wh*] in the numeration, and no [wh] feature on either another item in the numeration, or in a tree that has been formed earlier, a root XP of the current derivation can only be balanced if non-feature-driven wh-movement takes place to its specifier. The derivation of a simple wh-question under these assumptions is given in (16) (compare (7)); material that is crossed out has been rendered inaccessible by the PIC3, and is thus not available anymore for further operations in the derivation.¹⁰
(16) (I wonder) what John read
a. [VP what1 read3 t1 ]
workspace: {C[*wh*], John, T[*D*], v}
b. [vP what1 John2 v+read3 [VP t′1 t3 t1 ]]
workspace: {C[*wh*], T[*D*]}
c. [TP what1 John2 T [vP t″1 t2 v+read3 [VP t′1 t3 t1 ]]]
workspace: {C[*wh*]}
d. [CP what1 C [TP t‴1 John2 T [vP t″1 t2 v+read3 [VP t′1 t3 t1 ]]]]
workspace: {–}
Given that Phrase Balance forces intermediate, non-feature-driven wh-movement only if there is otherwise no potentially available [wh] feature, and given that Last Resort can only be violated if this is the only way to fulfill constraints like Phrase Balance, the prediction is that presence of an accessible [wh] feature in the workspace should make non-feature-driven wh-movement of a wh-phrase impossible, and the PIC3 should then block any further operations applying to this wh-phrase. The next subsection shows that this prediction is borne out, and that it offers a simple account of superiority effects in English, without recourse to a constraint like the MLC.
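Because Phrase Balance (14) and potential availability (15) refer only to locally available information, they can be stated as two small predicates. The following Python sketch is a simplification offered for illustration: features are plain strings, the function and variable names are hypothetical, and (as a further assumption) only subjects are given the [D] feature that T[*D*] attracts.

```python
def potentially_available(feature, edge_features, workspace_features):
    """(15): [F] is on X or the edge of X of the current root, or in the workspace."""
    return feature in edge_features or feature in workspace_features

def is_balanced(pending_probes, edge_features, workspace_features):
    """(14): every [*F*] still in the numeration has a potentially available [F]."""
    return all(
        potentially_available(f, edge_features, workspace_features)
        for f in pending_probes
    )

# Probes still waiting in the numeration at the VP level: C[*wh*] and T[*D*].
pending = {"wh", "D"}

# VP in (16), before any movement: what1 is in complement position, so the edge
# of V is empty, and the only goal feature in the workspace is [D] on John.
vp_16_in_situ = is_balanced(pending, edge_features=set(), workspace_features={"D"})

# VP in (16a), after what1 has moved to SpecV: [wh] is now on the edge.
vp_16_moved = is_balanced(pending, edge_features={"wh"}, workspace_features={"D"})

# Same VP, but with a second wh-phrase still waiting in the workspace.
vp_with_wh_in_workspace = is_balanced(
    pending, edge_features=set(), workspace_features={"wh", "D"}
)

print(vp_16_in_situ, vp_16_moved, vp_with_wh_in_workspace)
```

The first value is False, so movement to SpecV is needed and the Last Resort violation is tolerated. The third value is True, so the same movement would be an unforced Last Resort violation, and the in-situ wh-phrase is then sealed off by the PIC3.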
3.2. Superiority effects in English
Examples illustrating superiority effects in English are shown in (17) (for subject and object) and in (18) (for two objects): Given two wh-phrases that qualify in principle as goals for movement operations targeting a single C[*wh*] probe, only the higher wh-phrase can undergo movement to the target position.
(17) a. (I wonder) who1 bought what2
b. *(I wonder) what2 who1 bought t2
(18) a. Who1 did you persuade t1 [CP to read what2 ] ?
b. *What2 did you persuade who1 [CP to read t2 ] ?
These superiority effects can be derived under the assumptions adopted so far. In both cases, the lower wh-phrase NP2 has a chance to leave the (right-peripheral) complement position of the VP that it is merged in only if it first moves to the (left-peripheral) SpecV position; this is so because of the PIC3. A priori, there are two conceivable ways to move NP2 to SpecV. First, the Feature Condition might trigger [*F*]-driven movement to SpecV (Specv, SpecT, …). This is not an option in English, which has neither object shift nor scrambling. Second, movement of NP2 to SpecV might be triggered by Phrase Balance, as in (16). However, this is not an option either in (17) and (18): VP is balanced because there is another wh-phrase in the workspace, viz., NP1. The vP and TP categories that dominate this VP are balanced in (17) because the wh-phrase NP1 occupies the respective specifiers (as a result of Merge and [*D*]-driven movement, respectively); they are balanced in the embedded clause in (18) because the wh-phrase NP1 is still in the workspace. Thus, any attempt to derive a sentence like (17b) will automatically result in a sentence like (17a); as the derivation in (19) shows, the decision against wh-movement of the object NP2 is made very early, at the first stage, where NP2 cannot move to SpecV (similarly for (18a) vs. (18b)).
(19) a. [VP bought3 what2 ]
workspace: {C[*wh*], who1[wh], T[*D*], v}
b. [vP who1 v+bought3 [VP t3 what2 ]]
workspace: {C[*wh*], T[*D*]}
c. [TP who1 T [vP t1 v+bought3 [VP t3 what2 ]]]
workspace: {C[*wh*]}
d. [CP who1 C [TP t′1 T [vP t1 v+bought3 [VP t3 what2 ]]]]
workspace: {–}
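The derivations in (16) and (19) can be replayed mechanically. The Python sketch below is a toy model and not the analysis itself: the function `derive`, its input format, and the feature inventory are hypothetical, head movement and Last Resort bookkeeping are ignored, and only subjects are assumed to bear the [D] feature attracted by T[*D*].

```python
def derive(items, phrases):
    """Build a clause bottom-up under PIC3, the Feature Condition, and Phrase Balance.

    items:   name -> set of goal features, e.g. {"who1": {"wh", "D"}}
    phrases: bottom-up list of (label, probe, merged); `probe` is the head's
             [*F*] feature or None, `merged` lists items first merged in that
             phrase as ("comp" | "spec", name).
    Returns (item attracted by the last probe, items sealed off by PIC3).
    """
    unmerged = set(items)  # items still waiting in the workspace
    accessible = set()     # edge of the previous phrase (all PIC3 lets us see)
    trapped = set()
    checker = None
    for i, (label, probe, merged) in enumerate(phrases):
        edge = set()
        for position, name in merged:
            unmerged.discard(name)
            (edge if position == "spec" else accessible).add(name)
        # Feature Condition: [*F*] attracts an accessible item bearing [F].
        if probe is not None:
            goals = sorted(n for n in accessible if probe in items[n])
            if not goals:
                raise ValueError(f"crash at {label}: nothing checks [*{probe}*]")
            checker = goals[0]
            accessible.discard(checker)
            edge.add(checker)
        # Phrase Balance: consider only probes still waiting in the numeration.
        pending = {p for _, p, _ in phrases[i + 1:] if p is not None}
        for feature in sorted(pending):
            if any(feature in items[n] for n in edge | unmerged):
                continue   # balanced: no movement without feature matching
            movers = sorted(n for n in accessible if feature in items[n])
            if not movers:
                raise ValueError(f"crash at {label}: cannot balance [{feature}]")
            accessible.discard(movers[0])
            edge.add(movers[0])
        if i + 1 < len(phrases):
            trapped |= accessible  # left in the domain: opaque from now on
        accessible = edge
    return checker, trapped

def clause(obj, subj):
    """Transitive clause skeleton: V takes the object, v introduces the subject."""
    return [("VP", None, [("comp", obj)]), ("vP", None, [("spec", subj)]),
            ("TP", "D", []), ("CP", "wh", [])]

# (16): a single wh-phrase moves through every edge up to SpecC.
single_wh = derive({"what1": {"wh"}, "John2": {"D"}}, clause("what1", "John2"))

# (19): with a wh-subject in the workspace, the wh-object never leaves VP.
double_wh = derive({"who1": {"wh", "D"}, "what2": {"wh"}}, clause("what2", "who1"))

print(single_wh)
print(double_wh)
```

In the second run what2 ends up in the trapped set at the VP level, before who1 is even merged, which mirrors the observation that the decision against object wh-movement is made at the first stage of (19).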
Double object constructions provide an interesting testing ground for approaches to superiority effects. Constructions with ditransitive verbs basically come in two varieties in English: the prepositional object construction, and the dative shift construction. As far as the prepositional object construction is concerned, it has been observed by Chomsky (1973: 246) and Fiengo (1980: 123) that either object (NP or PP) can move if both are wh-phrases (see 20a/b); however, preposition stranding (which is legitimate if the remaining NP object is not a wh-phrase) becomes impossible in this context (see 20c).
(20) a. What1 did you give t1 to whom2 ?
b. To whom3 did you give what1 t3 ?
c. *Who2 did you give what1 [PP3 to t2 ] ?
The situation is different in dative shift constructions with two wh-objects. Barss and Lasnik (1986: 349) note that the second object NP can never move in this context; the first, dative-shifted object NP can marginally move.¹¹
(21) a. (?)Who2 did you give t2 what1 ?
b. *What1 did you give who2 t1 ?
To account for these data, we need to say something about the structure of double object constructions in English, and about pied piping in wh-PPs. Modifying the proposal in Larson (1988), I assume that the direct (i.e., Theme) object is merged in a complement position of V, where it remains in both types of double object constructions (unless it undergoes movement to the clausal periphery). The indirect (i.e., Goal) object is at the edge of V if it has undergone dative shift (see 22b), and in a right-peripheral V′ sister position if it is prepositional (see 22a).¹² Note that this right-peripheral position does not belong to the edge of V.¹³
(22) a. [VP [V′ [V′ V NP1 ] [PP3 P NP2 ]]]
b. [VP NP2 [V′ V NP1 ]]
With respect to pied piping, I assume that there is optional percolation of the feature [wh] in wh-PPs; for present purposes, this percolation operation can be conceived of as an actual feature displacement.¹⁴
Consider now first the examples involving prepositional object constructions in (20). Given that feature percolation of [wh] from NP to PP is optional, we have to take into account two possibilities. First, suppose that [wh] percolation has taken place, and PP bears [wh]. The two objects are merged in VP-internal non-edge positions (see 22a). Hence (given that there is no [wh] waiting in the workspace), Phrase Balance forces movement of one wh-phrase to SpecV so as to balance the VP (there is a [*wh*] on C in the numeration). It does not matter which of the two wh-phrases moves to SpecV, but whichever wh-phrase moves first forces the other wh-phrase to stay in situ, to avoid an unforced violation of Last Resort. The wh-phrase in SpecV is then passed on through further cycles of the derivation, until CP is reached and [*wh*] on C is checked. This way, (20a) and (20b) can both emerge as grammatical.
Consider now the second option: [wh] percolation from NP to PP has not taken place. Then, PP cannot move to SpecV – if it moves, Phrase Balance will not be satisfied because [wh] is not potentially available at the VP level since it is not part of the edge of V (it is dominated by an edge element – PP – but not on an edge element itself). NP2 in PP cannot move either, though: To leave PP, NP2 must move to SpecP, given the PIC3. However, this operation is not legitimate because there is no [*F*] that might trigger it (English does not have an independent PP-internal preposing operation), and because Phrase Balance is independently satisfied (with another wh-phrase in the workspace). Therefore, the superiority effect in (20c) is correctly predicted.
Turning next to dative shift constructions as in (21), it follows from (22b) that NP2 is in SpecV for independent reasons.¹⁵ Hence, Phrase Balance can be fulfilled without a Last Resort violation, and any such violation incurred by movement of the lower wh-phrase will be fatal. Consequently, NP2 can undergo wh-movement (see 21a), but NP1 cannot undergo such movement, because of the PIC3 (see 21b).
The analysis makes a further prediction: If both wh-phrases are embedded in PPs, preposition stranding is predicted to be blocked throughout: It is impossible for an embedded wh-phrase to move to SpecP in this context because Phrase Balance is always satisfied without such movement. Here is why. For the first wh-phrase NPi that is merged with P (be it NP1 or NP2), the PP is balanced without local inversion of NPi to the edge of P because there is a [wh] feature on another wh-item left in the numeration. For the second wh-phrase NPj that is merged with P, movement to the edge of PP will also be blocked because there is now invariably a tree in the workspace that contains (or is) a wh-phrase bearing a [wh] feature. Consequently, no wh-phrase can move to SpecP in this context, and subsequent movement of such a wh-phrase from its base position will fatally violate the PIC3. By and large, this prediction seems to be tenable, as the data in (23) illustrate.¹⁶
(23) a. ?*Who2 did you give [NP pictures of t2 ] [PP to whom1 ] ?
b. ?*Who1 did you give [NP pictures of whom2 ] [PP to t1 ] ?
c. ?*Who2 did you talk [PP to t2 ] [PP about whom1 ] ?
d. ?*Who1 did you talk [PP to whom2 ] [PP about t1 ] ?
To sum up this subsection, the present approach accounts both for standard superiority effects in English, and their absence in certain kinds of double object constructions, without invoking the MLC, by the interaction of Phrase Balance and the PIC3. I will now turn to the situation in German.
3.3. The lack of superiority effects in German
It has often been observed that German does not exhibit superiority effects with wh-phrases that are clause-mates; see Haider (1983, 1993, 2000), Grewendorf (1988), and Bayer (1990), among many others. A relevant pair of examples involving a wh-subject NP and a wh-object NP is given in (24).
(24) a. (Ich weiß nicht) wer1 C t1 was2 gesagt hat
        I know not whonom whatacc said has
     b. (Ich weiß nicht) was2 C wer1 t2 gesagt hat
        I know not whatacc whonom said has
Similarly, German does not exhibit superiority effects with control infinitives; see Fanselow (1991), Kim and Sternefeld (1997), and Haider (2000).17 This is shown in (25):
(25) a. (Ich weiß nicht) wen1 er t1 überzeugt hat [ was2 zu kaufen ]
        I know not whomacc he convinced has whatacc to buy
     b. (Ich weiß nicht) was2 er wen1 überzeugt hat [ t2 zu kaufen ]
        I know not whatacc he whomacc convinced has to buy
Phrase impenetrability and wh-intervention
Various accounts of the lack of superiority effects with two wh-phrases that share a minimal finite clause have been given in the literature. I will here adopt an analysis that has been suggested by Fanselow (1996) and Grohmann (1997) (who assume that the MLC underlies superiority effects): 18 German has scrambling: A lower wh-phrase can independently be moved to a higher position, by wh-scrambling. Thus, a lower wh-phrase cannot move across another wh-phrase merged in a higher cycle by wh-movement, given Phrase Balance; but it can do so by scrambling. To implement this analysis in the present approach, I assume that scrambling is triggered by a designated optional feature (or feature bundle) that we can refer to as [*Y*]; accordingly, scrambled items bear [Y] features (see Müller 1998; Sauerland 1999; Grewendorf and Sabel 1999). For our present concerns, it is immaterial whether Y is a formal feature that is not interpreted, or can in fact be shown to be related to contentful notions that are sometimes viewed as triggers for scrambling (definiteness, specificity, animacy, focus, and the like). The derivation of a sentence like (24b) can then proceed as in (26), where the wh-object NP2 first undergoes Phrase Balance-driven movement to SpecV (because of [*Y*] on v, not because of [*wh*] on C), and then Feature Condition-driven movement to Specv (because of [*Y*]). At the vP level, both wh-phrases show up at the edge; hence, NP1 and NP2 are both in principle eligible for further movement (given the PIC3); such further movement is triggered by Phrase Balance on the TP cycle, and by the Feature Condition in the final step (CP).19
(26) a. [VP was2,[Y] [V′ t2 gesagt ]]
        workspace: {C[*wh*], wer1[wh], T, [v hat ][*Y*]}
     b. [vP was2,[Y] wer1 [VP t′2 [V′ t2 gesagt ]] [v hat ]]
        workspace: {C[*wh*], T}
     c. [TP was2,[Y] [vP t″2 wer1 [VP t′2 [V′ t2 gesagt ]] [v hat ]] T ]
        workspace: {C[*wh*]}
     d. [CP was2,[Y] C [TP t‴2 [vP t″2 wer1 [VP t′2 [V′ t2 gesagt ]] [v hat ]] T ]]
        workspace: {–}
This concludes the account of the lack of superiority effects in German in the present approach.20 The prediction is that the PIC3 should give rise to superiority effects after all in German if the lower wh-phrase cannot be moved to the domain occupied by the higher wh-phrase because scrambling is not available (for whatever reason). The following three subsections highlight three contexts where wh-scrambling is impossible in German; and it is in these contexts that superiority effects do indeed occur.
3.4. Superiority effects with long-distance movement in German
The first such context is well known: As observed by Frey (1993), Büring and Hartmann (1994), Fanselow (1996), Heck and Müller (2000), Pesetsky (2000), and others, German does exhibit superiority effects with long-distance movement. This is shown by the contrast in (27).
(27) a. Wer1 hat t1 geglaubt [CP dass der Fritz wen2 mag ] ?
        whonom has believed that the Fritz whomacc likes
     b. *Wen2 hat wer1 geglaubt [CP dass der Fritz t2 mag ] ?
        whomacc has whonom believed that the Fritz likes
The analysis is straightforward. First, as before, NP2’s [wh] feature in (27) does not permit movement: Phrase Balance is satisfied by the presence of NP1 in the workspace; therefore, movement of NP2 for the purposes of [wh] will fatally violate Last Resort. Second, and more importantly in the present context, NP2 cannot move by scrambling either: Scrambling cannot leave a finite CP in German. Consequently, an embedded wh-phrase is correctly predicted to be stuck in the embedded clause if there is another wh-phrase in the workspace that is eventually merged in the matrix clause.21
3.5. Superiority effects with subject raising in German
The second context in which there are a priori unexpected superiority effects in German involves subject raising.22 NP raising to subject position is optional in German (see Diesing 1992). In the present approach, this implies that the EPP feature [*D*] is optional on T. However, as shown in Haider (1993: ch. 8), the evidence cited in Diesing (1992) and much related work in favour of subject raising to SpecT (based on phenomena like particle placement) is far from conclusive. As far as I can see, there is only one context where it is clear that subject raising to SpecT must have occurred in German (see Müller 2001: 296): Unstressed pronouns must be at the phonological border of vP (in the sense of Chomsky 2001b: 34), i.e., they cannot be preceded by non-pronominal material within TP (in contrast, stressed pronouns behave like non-pronominal NPs). There is but one exception: The subject NP, and only the subject NP, can optionally precede these pronouns within TP. This strongly suggests a special position that is available only for subject NPs. Hence, we can conclude that if a subject NP precedes unstressed pronouns, it must have undergone optional movement to SpecT.
Interestingly, there is a clear superiority effect in exactly this context. Since we need an unstressed object pronoun to ensure that subject raising has taken place, relevant examples involve ditransitive verbs. The contrast in (28) shows that a dative wh-object NP cannot undergo wh-movement to SpecC if a wh-subject occurs in front of an unstressed accusative object pronoun.
(28) a. Wem2 hat [vP es t′2 wer1 t2 gegeben ] ?
        whomdat has itacc whonom given
     b. ?*Wem2 hat wer1 [vP es t′2 t1 t2 gegeben ] ?
        whomdat has whonom itacc given
The contrast in (29) shows the same for an accusative wh-object NP and an unstressed dative object pronoun.
(29) a. Was2 hat [vP ihm t′2 wer1 t2 gegeben ] ?
        whatacc has himdat whonom given
     b. ?*Was2 hat wer1 [vP ihm t′2 t1 t2 gegeben ] ?
        whatacc has whonom himdat given
This superiority effect follows under present assumptions: Suppose that a subject NP[wh] and an object NP[wh] are both in Specv at some stage of the derivation, and that T has an optional [*D*] feature. Then, TP is balanced (for [*wh*]) by feature-driven subject raising, and movement of the object NP incurs a fatal Last Resort violation.23 If this analysis is on the right track, we expect that a non-wh-subject NP should, ceteris paribus, not block movement of an object wh-phrase. This is the case: Only a wh-subject NP in SpecT blocks wh-movement of an object NP; see (30).
(30) a. Wem2 hat t″2 der Fritz1 [vP es t′2 t1 t2 gegeben ] ?
        whomdat has the Fritznom itacc given
     b. Was2 hat t″2 der Fritz1 [vP ihm t′2 t1 t2 gegeben ] ?
        whatacc has the Fritznom himdat given
Similarly, replacing the unstressed object pronoun with a non-pronominal object NP should void the superiority effect (other things being equal). The reason is that the wh-subject does not have to be in SpecT in this context (non-pronominal NPs do not have to be at the phonological border of vP). (31) shows that this prediction is borne out, too.
(31) a. Wem2 hat [vP t′2 wer1 t2 das Buch gegeben ] ?
        whomdat has whonom the bookacc given
     b. Was2 hat [vP t′2 wer1 dem Fritz t2 gegeben ] ?
        whatacc has whonom the Fritzdat given
3.6. Superiority effects with scrambling from wh-XP in German
A third context in which superiority effects arise in German has been noted in Fanselow (1996). The construction involves a configuration where the two wh-phrases are initially not in a c-command relation (as in all the examples discussed thus far); rather, one dominates the other. More specifically, suppose that a wh-phrase PP1 is dominated by a wh-phrase NP2 (it has been merged with NP2’s head), as in wieviele Bücher über wen (‘how many books about whom’). Suppose furthermore that PP1 can be moved out of NP2 without violating locality constraints. This implies that NP2 is in object position when extraction takes place (otherwise, the Condition on Extraction Domain (CED), which permits extraction from XP only if XP occupies a complement position, would be violated). It also implies that NP2 is embedded by a certain kind of verb (verbs like lesen (‘read’) permit extraction from NP, verbs like zerstören (‘destroy’) do not); that NP2 is sufficiently non-specific (highly specific NPs like welches Buch (‘which book’) tend to block extraction, non-specific NPs like wieviele Bücher (‘how many books’) do not); etc. Then, a wh-PP1 can be scrambled from a wh-NP2 if there is a [Y] feature on PP1 and a [*Y*] feature on a higher head (V or v). PP1 moves to the edge of V, driven either by the Feature Condition (if [*Y*] is on V) or by Phrase Balance (if [*Y*] is on v). This stage of the derivation is depicted in (32).
(32) [VP [PP1 über wen ] [V′ [NP2 t′1 wieviele Bücher t1 ] lesen ]]
         about whom how many books read
Here, PP1 occupies an edge position of VP, and NP2 a complement position. As shown by the contrast in (33), it is indeed the case that only PP1 can undergo further movement, as one might expect: Movement of NP2 on the next (vP) cycle will have to violate the PIC3, or so it seems.
(33) a. (Ich weiß nicht) [PP1 über wen ] er [NP2 wieviele Bücher t1 ] lesen will
        I know not about whom he how many books read wants
     b. ?*(Ich weiß nicht) [NP2 wieviele Bücher t1 ] er [PP1 über wen ] t2 lesen will
        I know not how many books he about whom read wants
However, there is a gap in this reasoning: (32) closely mirrors the situation found with two wh-phrases that are co-arguments in German, which do not normally exhibit superiority effects. Thus, if NP2 can undergo scrambling in (32), it should be able to undergo further wh-movement to SpecC after all. Fanselow (1996) solves this problem by showing that a derivation of (33b) that involves intermediate scrambling of NP2 will invariably violate another constraint: the principle of Unambiguous Domination (see Müller 1998: 271). Unambiguous Domination is essentially a constraint on the movement of remnant XPs, i.e., XPs from which movement has taken place. This constraint states that α-traces must not be α-dominated (in the domain of the head of the chain). For the case at hand, this means: A scrambling trace like t1 must not be dominated by a category that has itself undergone scrambling. This precludes intermediate scrambling of NP2 in (32). Consequently, any derivation of (33b) will have to violate either Unambiguous Domination or PIC3, depending on whether intermediate scrambling of NP2 does or does not take place. No such effect is predicted to occur if PP1 is not a wh-phrase. Now, NP2 can (in fact, must, given Phrase Balance) move to SpecV in (32); this movement is not an instance of scrambling because there is no [*Y*] involved (be it directly or indirectly). Compare (33b) with (34).
(34) (Ich weiß nicht) [NP2 wieviele Bücher t1 ] er [PP1 über die Liebe ] t2 lesen will
     I know not how many books he about love read wants
To sum up, German does exhibit superiority effects in certain contexts.
These contexts have in common that intermediate scrambling of the second wh-phrase is not available, for independent reasons (scrambling in German cannot leave a finite clause, cannot target TP, and cannot apply to XPs from which scrambling has taken place). The effects are then derivable from the PIC3.
3.7. Superiority-like effects with remnant movement in German
Let me make a brief digression at this point. Recall that it is a major goal of this paper to show that the MLC can be dispensed with in a derivational grammar because typical MLC effects follow straightforwardly from a strict version of the PIC that is independently motivated by conceptual considerations. As we have seen in the last subsection, a constraint like Unambiguous Domination proves necessary to account for one such effect (whether we adopt the MLC, as in Fanselow (1996), or the PIC3). Interestingly, however, it has been argued that Unambiguous Domination can itself be derived from a version of the MLC (defined in terms of closeness rather than asymmetric c-command; see Takano 1994; Koizumi 1995; Kitahara 1997; Müller 1998; Sauerland 1999). In a nutshell, the idea is this: In a configuration … [β … γ …], where β and γ both qualify as a goal for a β-external probe α, the MLC forces movement of the item that is closer to α; and that is β, not γ. Hence, β must move first, and subsequent movement of γ must incur a violation of the CED (because γ-extraction takes place from β in a non-complement position, which β must be in after movement), and, if γ-movement is to a position that follows β, an additional violation of the general ban on lowering (which is arguably derivable from the SCC, given some minor modification; see Müller 1998). Thus, Unambiguous Domination effects (as they show up in (35a) vs. (35b) in German) turn out to be derivable from the MLC. On this view, the only relevant difference between typical Unambiguous Domination configurations (as in 35a) and typical superiority configurations is that the two items that compete for movement (because they have the same [F] feature attracted by a higher [*F*]) are in a dominance relation in the first case, and in a c-command relation in the second.
(35) a. *dass [vP [VP2,[Y] t1 zu lesen ] [NP1,[Y] das Buch ] keiner t2 versucht hat ]
        that to read the bookacc no-onenom tried has
     b. dass [vP [VP2,[Y] das Buch1 zu lesen ] keiner t2 versucht hat ]
        that the bookacc to read no-onenom tried has
The question arises of whether the present system based on the PIC3 also directly accounts for dominance-related MLC effects, in addition to the c-command-related MLC effects discussed so far. The answer is no: The ill-formedness of (35a) does not follow from the PIC3. To see this, suppose that
there are two [*Y*] features, one for NP1, one for VP2. Then, there should be a well-formed derivation for (35a), with NP1 undergoing Phrase Balance-driven movement to SpecV first, followed by feature-driven movement of NP1 to Specv, and then of VP2 to Specv – both movements are compatible with PIC3. However, this does not imply that Unambiguous Domination must be stated as such. Its effects can be derived from a more general constraint: a simple version of the A-over-A Condition.
(36) A-Over-A Condition:
     If [*F*] can be checked either with a head, or with an edge element, it must be checked with the head.
This version of the A-Over-A Condition forces VP2 movement to apply first in (35a); subsequent NP1 lowering then violates (at least) the CED.24
3.8. Intervention without c-command in German
The three types of superiority effects in German that were discussed in subsections 3.4, 3.5, and 3.6 as such do not differentiate between PIC-based and MLC-based analyses. However, it is worth noting that, in stark contrast to what is the case with an MLC account, there is nothing in the PIC3-based account that would tie the intervention effect incurred by a wh-phrase wh1 for another wh-phrase wh2 to a c-command (or dominance) relation between the two. All that is needed for an intervention effect to arise in the PIC3-based analysis is that wh1 enters the derivation that wh2 is part of at a later stage, and wh2 cannot end up in the same edge domain as wh1 by some independently motivated movement operation. Consequently, we expect that there should be wh-intervention effects without c-command.
As noted in Heck and Müller (2000), it is indeed the case that non-c-commanding wh-phrases in a matrix clause block long-distance wh-movement in German. This superiority-like effect without c-command is exemplified by the contrast in (37). In (37a), there is clause-bound wh-movement of NP1 across an adverbial CP that contains another wh-phrase NP2, and that is merged later; here, an intervention effect can be avoided because NP1 can reach a position in the same edge domain as the adverbial CP by scrambling. However, the option of intermediate scrambling is not available for long-distance wh-movement; scrambling must stop in the embedded vP domain. Thus, the presence of NP2 in the workspace blocks Phrase Balance-driven movement of NP1, and (37b) emerges as ungrammatical because of the PIC3. In (37c), it is NP2 rather than NP1 that undergoes wh-movement; the result is also ill formed. As in (37b), a PIC3 violation cannot be avoided here: No matter whether the adverbial CP is created before or after the object CP, Phrase Balance cannot trigger successive-cyclic movement of NP2 because NP1’s [wh] feature is potentially available for C[*wh*] in the workspace. In addition, sentences like (37c) are ruled out by the CED: Movement of NP2 takes place from an adverbial CP that does not occupy a complement position.25 The overall result is that the numeration underlying (37b) and (37c) cannot yield a well-formed output. (If NP2 is not a wh-phrase, (37b) is well formed, as expected.)
(37)
Exactly the same reasoning applies in (38), where NP2 shows up in a relative clause CP that is in turn dominated by an NP: (38)
Yet another set of examples that illustrates the same pattern is given in (39); here the intervening wh-phrase that blocks long-distance wh-movement is embedded in a simple NP. (39)
The prediction is that the same kind of superiority-like effect without c-command should be detectable in subject raising constructions. The contrast between (40a) and (40b) may not be one of perfect well-formedness vs. absolute ungrammaticality; but the tendency is clear enough, and conforms to expectations: (40a) is much better than (40b). In (40a), the subject NP does not have to be in SpecT, and the wh-object NP can therefore move to a position in front of it by scrambling; this option is not available in (40b), where the subject NP must be in SpecT (because of the presence of the unstressed object pronoun), i.e., in a domain that cannot be reached by scrambling. (40c) is also excluded by the PIC3; in addition, it is blocked by the CED because the subject NP does not occupy a complement position.26
(40)
3.9. Intervention without c-command in English
The system developed so far makes yet another prediction: There is no clause-bound intervention effect in the examples in (37a), (38a), (39a), and (40a) because German has scrambling. Since English does not have scrambling, we expect clause-bound intervention effects with non-c-commanding wh-phrases to occur. At first sight, this seems to contradict the standard view that argument wh-in situ in English does not obey any island constraints (see Chomsky 1981; Huang 1982, 1995; Lasnik and Saito 1992; Hornstein 1995 – among others). However, it is worth noting that most of the pertinent examples in the literature do not involve intervention without c-command: The typical kind of multiple wh-question that is taken to argue for non-island-sensitivity of wh-in situ in English has one wh-phrase embedded in an island, and a second wh-phrase merged in a higher position. This latter wh-phrase then undergoes movement to SpecC, as in the examples in (41), where a wh-phrase dominated by an object NP or by an adjunct PP does not block wh-movement of a subject wh-phrase merged later.
(41) a. Who1 t1 saw [NP the man that bought what2 ] ?
     b. Who1 t1 likes [NP books that criticize who2 ] ?
     c. Who1 t1 bought [NP the books on which table2 ] ?
     d. Who1 t1 met [NP friends of whom2 ] ?
     e. I wonder who1 t1 heard [NP the claim that John had seen what2 ]
     f. I wonder who1 t1 heard [NP John’s stories about what2 ]
     g. Who1 t1 left [PP despite which warning2 ] ?
Similarly, a wh-phrase that is part of a subject NP does not block movement of a wh-phrase that is merged in a higher clause, as in (42).
(42) Who1 t1 thinks that [NP pictures of who2 ] are on sale ?
All this is expected under present assumptions: NP2 cannot undergo Phrase Balance-driven movement in (41) or (42) early in the derivation because there is another item bearing [wh] left for [*wh*] of C in the workspace, and the PIC3 precludes Feature Condition-driven movement of NP2 at the end of the derivation. NP1, in contrast, undergoes movement from the edge of v to the edge of T in accordance with Phrase Balance, and is then forced to end up at the edge of C by the Feature Condition. However, consider now the case where wh-intervention without c-command does occur. In the examples in (43), an object wh-phrase that is merged
first (NP2) moves across a subject NP containing another wh-phrase (NP1). Such movement results in significantly reduced acceptability, as predicted under the present PIC3-based approach.27
(43) a. ?*Who2 did [NP the man that bought what1 ] see t2 ?
     b. ?*Who2 did [NP books that criticize who1 ] impress t2 ?
     c. ?*What2 did [NP the books on which table1 ] cost t2 ?
     d. ?*Who2 did [NP friends of whom1 ] meet t2 ?
     e. *Who2 did [NP friends of whom1 ] say that we should invite t2 ?
The explanation is completely analogous to that given for the ungrammatical German examples involving intervention without c-command in the last subsection: Non-feature-driven movement of NP2 to the edge of V at an early stage in the derivation is not forced by Phrase Balance (because the VP is balanced anyway, with NP1’s [wh] feature matching the matrix C’s [*wh*] feature in the workspace), and therefore excluded by Last Resort. Consequently, any movement operation applying to NP2 at later stages of the derivation fatally violates the PIC3.28
3.10. Further refinements
The approach developed in this paper imposes severe restrictions on wh-movement; as a matter of fact, it turns out to be slightly too restrictive in two domains.
3.10.1. Multiple C[*wh*] domains and intervention
The first problem concerns sentences like (44) in German (see Heck and Müller 2000).
(44)
(44) is clumsy, but well formed. Consider the underlying numeration. There are two C heads bearing [*wh*] features (C6 and C5), and there are three wh-pronouns bearing [wh] features – hence, one of the two C[*wh*] heads will
have to give rise to a multiple question. The example has been designed in such a way that CP6 is the multiple question, and CP5, which is merged earlier, is a simple question. The task now is to ensure that the wh-phrase wie (‘how’) can undertake steps of successive-cyclic movement until it reaches the edge of T of CP5, where it is attracted by C[*wh*]. Unfortunately, successive-cyclic movement of wie3 turns out to be blocked at the very first stage under present assumptions: At the point where it must be decided whether wie3 can move by violating Last Resort, the phrase is wrongly predicted to be balanced: There are two C heads bearing [*wh*] features, and there are two remaining items in the workspace that bear corresponding [wh] features. Closer inspection reveals that the same kind of problem also shows up in simpler sentences in German (see 45a) and in English (see 45b). (45)
As it stands, the wh-phrase NP3 (was/what) cannot reach the edge of T, from where it can be attracted by C4 bearing [*wh*] in accordance with the PIC3. The highest position that NP3 can be in prior to wh-movement to SpecC is the edge of v in (45a) in German (due to this language’s scrambling options), and the complement position of V in English (due to this language’s lack of scrambling options). Intuitively, the problem with (44), (45a), and (45b) is clear: A wh-phrase that is part of the workspace must not interact with a wh-phrase in a given derivation if the two wh-phrases target different C[*wh*] domains. Following Heck and Müller (2000), this problem can be solved by minimally enriching the representation of wh-features. Thus, suppose that both [*wh*] and [wh] features are accompanied by scope indices in the numeration, and that wh-phrases can only be interpreted with a given C node if they share a scope index. Under this assumption, a feature [wh]i on a wh-item can never count as potentially available for a feature [*wh*]j on a C in the workspace, due to feature mismatch. For the cases at hand, this means that the [wh] feature of NP2 must be accompanied by the same scope index as the [*wh*] feature of C6 in (44), and of C5 in (45a) and (45b), in order to be interpretable as part of the multiple question. Therefore, at the point where the question of non-feature-driven movement of NP3 must be decided, NP2 does not intervene anymore: A feature like [wh]5 on NP2 in (45a/b) can never satisfy Phrase Balance for a feature like [*wh*]4 on C4. Only a feature like [wh]4 on NP3
can do so; accordingly, Phrase Balance forces successive-cyclic movement of NP3.
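The effect of scope indices on Phrase Balance can be illustrated with a small sketch. It rests on assumptions that go beyond the text: features are modelled as plain values, each probe is taken to need its own matching feature, the function name `is_balanced` is a hypothetical label, and the workspace contents (two remaining wh-items, both with index 5) are chosen only for illustration.

```python
from collections import Counter


def is_balanced(probes, workspace, edge):
    """Each probe needs its own matching feature in the workspace or on the edge."""
    available = Counter(workspace) + Counter(edge)
    return all(available[f] >= n for f, n in Counter(probes).items())


# Without scope indices: two [*wh*] probes, two [wh] items left in the workspace.
plain = is_balanced(["wh", "wh"], workspace=["wh", "wh"], edge=[])

# With scope indices (hypothetical values 4 and 5): both workspace items
# carry index 5, so nothing is available for the probe with index 4.
indexed = is_balanced(
    [("wh", 4), ("wh", 5)],
    workspace=[("wh", 5), ("wh", 5)],
    edge=[],
)

print(plain)    # True: the phrase counts as balanced, so the wh-phrase is stuck
print(indexed)  # False: the phrase is unbalanced, so movement to the edge is forced
```

On this way of modelling things, the index mismatch alone is what keeps a [wh]5 item from counting as potentially available for [*wh*]4; no further stipulation is needed.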
3.10.2. D-linking and intervention
Wh-phrases that qualify as D(iscourse)-linked behave differently from other wh-phrases in a number of respects; see Pesetsky (1987) and much subsequent work. One well-known peculiarity of D-linked wh-phrases is that they do not induce intervention effects in English if they show up in situ; see the contrast in (46).29
(46) a. *I know [NP2 which books ] who1 read t2
     b. I know what2 [NP1 which people ] read t2
The present analysis can accommodate standard accounts of this phenomenon straightforwardly. Thus, assume that D-linked wh-phrases in English can optionally lack a (proper) [wh]-feature that would make them accessible for a [*wh*] feature on C. Then, if a D-linked wh-phrase lacks the [wh] feature in the workspace, Phrase Balance can only be fulfilled by movement of the remaining wh-phrase, and an intervention effect can be avoided. (Of course, a [wh] feature must be present in those cases where the D-linked wh-phrase itself undergoes wh-movement.)
4. Conclusion
Let me summarize the main results of this paper. First, I have argued that there are independent reasons for strengthening the standard PIC in a derivational grammar, from a condition on phases (PIC1,2) to a condition on phrases (PIC3). Second, it follows from this move that the PIC3 accounts for typical MLC effects in English without further ado. The MLC can therefore be dispensed with (except for a residue, the A-Over-A Condition). Third, given that German has scrambling of wh-phrases, superiority effects are predicted to be absent, except under those circumstances where scrambling is independently excluded (long-distance effects, subject position effects, remnant movement effects). Fourth and finally, unlike the MLC, the system based on Phrase Balance and the PIC3 predicts superiority-like intervention effects without c-command, which are indeed attested.
Needless to say, the PIC3 has important consequences for many other phenomena outside the domain of wh-constructions, especially if we adopt the following hypothesis, which the approach assumed here lends itself to:
(47) Once rendered inaccessible by the PIC3, syntactic structure does not become accessible again when the syntactic derivation terminates (“at LF”). Hence, there can be no constraints on representations (“bare output conditions”).
Hypothesis (47) effectively implies a derivational approach to semantic interpretation, i.e., cyclic semantic spell-out (see note 5; and Sternefeld (1996) and Adger and Svenonius (2003) for sketches of such a model of interpretation). (47) also suggests that there is no reason left to assume the existence of traces (neither as t, nor as a copy): Given the PIC3, these are not accessible for semantic interpretation, and there are no derivational constraints that apply to them. The hypothesis also raises interesting problems for binding of anaphors (at least for those cases that are not strictly local, and that therefore cannot be covered by the reflexivity constraints of Reinhart and Reuland 1993) and pronouns; for control; for long-distance agreement; etc. In general, apparently non-local relations must be decomposed into a succession of local steps, as proposed in Gazdar et al. (1985). More specifically, non-local relations could be accounted for by successive-cyclic local [F] feature movement from head to head (required by constraints of the Phrase Balance type or motivated by independent features; see Pesetsky (2000) on the viability of feature movement). [F] must encode the relevant properties of the in-situ element; e.g.: anaphor, PRO.
For binding, this strategy would be a natural extension of proposals like LF movement of anaphors (see Chomsky 1986).30 For (obligatory) control, the strategy would amount to a decomposition of Landau’s (2000) Agree relation into small steps of feature movement (or, indeed, a version of Hornstein’s (2001) A-movement approach). However, carrying out such analyses is beyond the scope of the present paper.
Acknowledgements
For helpful comments and discussion, I would like to thank Chris Collins, Gisbert Fanselow, Silke Fischer, Hubert Haider, Fabian Heck, Michal Starke, two careful reviewers, and the participants of the workshops on Minimal Link Effects at Universität Potsdam (March 2002), and on Tools in Linguistic Theory (TiLT) at Universiteit Utrecht (April 2002). This paper was inspired by a talk by Michael Brody at Universität Tübingen (February, 2001).
Notes
1. For more recent versions of the SCC, see Chomsky (1995, 2001b), Collins (1997), Kitahara (1997), Bošković and Lasnik (1999), and Freidin (1999), among others.
2. Here and henceforth, I write “PICn” when I refer to a specific version of this constraint (there will be three all in all), and “PIC” when I do not discriminate between the different versions.
3. Following Heck and Müller (2000), I will suggest a slightly different constraint to replace the Optional EPP Feature Condition in subsection 3.1 below; this latter constraint can be locally evaluated.
4. At least, this holds as long as we assume that object movement must end up in a position in vP that is higher than the base position of the subject; but see Richards (2001) for a different view.
5. This consequence is particularly obvious if we assume the concept of cyclic spell-out, according to which domains that have been rendered inaccessible via the PIC are immediately sent off to the phonological and semantic interfaces; see Chomsky (2001a: 4).
6. Whether or not one still insists on calling these objects representations is no more than a terminological issue. What counts is the extreme reduction of representations to small, virtually unstructured objects, which leads to a system in which Brody’s conceptual objection loses its force.
7. As such, it closely resembles the Head Constraint developed by van Riemsdijk (1978) (see also the Bounding Condition proposed by Koster 1978). Note that this denies a special role of CP and vP for the purposes of movement theory (contra Chomsky 2000, 2001b; Fox 2000; Nissenbaum 2000; Bruening 2001; Barbiers 2002; and others). However, the revised approach is of course compatible with all the evidence suggesting that SpecC and Specv are used by successive-cyclic movement. Moreover, the concept of phase does not necessarily have to be abandoned: Phases are independently motivated (semantically, as propositional objects), and may or may not figure as special derivational units in other parts of the theory. Note finally that the present approach is therefore not as radical as the one pursued in Epstein and Seely (2002) (where the relevant move is not from phase to phrase, but from phase to derivational step).
8. This can be encoded in an optimality-theoretic manner by a ranking {Feature Condition, SCC, PIC3, Phrase Balance} >> C >> Last Resort. Note that an additional constraint C would be needed to ensure that the higher-ranked constraints are in fact never violable in a well-formed output: C punishes the candidate derivation that derives absolute ungrammaticality/ineffability, e.g., an empty output (null parse), or an unfaithful output that removes an offending property and leads to neutralization of different input specifications. See Müller (2000) and Fanselow and Féry (2002) for discussion of these and further options in optimality-theoretic syntax.
9. The resulting system is thus close to analyses in Sportiche (1989), Sportiche (1998), Takahashi (1994), and Agbayani (1998), among others. It also bears a certain resemblance to GPSG analyses that rely on Slash feature percolation (Gazdar 1981; Gazdar et al. 1985), to the approach in terms of gap marker percolation developed by von Stechow and Sternefeld (1981), and to Koster’s (2000) analysis based on feature percolation in gap phrases.
10. Note that Phrase Balance forces movement of the wh-phrase within VP already, so as to displace [wh] to the edge of V.
11. The marginality is due to a general weak ban on A-bar movement of dative-shifted objects in English and thus independent of superiority; see Stowell (1981: ch. 4) and Larson (1988), among others.
12. To accommodate evidence from binding theory, we must then assume that linear order is relevant; see Barss and Lasnik (1986) and Jackendoff (1990).
13. However, if we follow Chomsky (2002: 133–136), this position will invariably be a specifier (i.e., non-first Merge) position. Thus, as will be shown momentarily, at this point it is crucial that (15) refers to edgeX rather than to SpecX.
14. Heck (2001, 2004) develops a more elaborate theory of pied piping that does without feature percolation. This approach can be reconciled with the present analysis, but I will refrain from doing so, for reasons of space and coherence.
15. There are two possibilities: Either NP2 is merged in SpecV, or it is moved there because of some [*F*] feature that triggers dative shift to that position. The present analysis is compatible with both a base-generation and a movement approach to dative shift constructions.
16. It should be noted, however, that there is some disagreement about the status of these examples. Sentences like (23d) are classified as ill formed in Jackendoff (1990: 433), and as well formed in Fiengo (1980: 124). Furthermore, (23a) and (23c) are classified as acceptable by Jackendoff; but note that these examples are in fact expected to involve an additional violation of the Clause Nonfinal Incomplete Constituent Constraint; see Kuno (1973: 379), Lasnik and Saito (1992: 91). This constraint is operative independently of multiple-wh (superiority) contexts; see (i-a) vs. (i-b).
    (i) a. Who2 did you give [NP pictures of Mary ] [PP to t2 ] ?
        b. ?*Who1 did you give [NP pictures of t1 ] [PP to John ] ?
17. However, see Haider (2000: 239) for an additional dissimilarity requirement on the two wh-phrases.
18. The basic idea can already be found in the analysis of the lack of weak crossover effects in German that is developed in Grewendorf (1988: 320). Other accounts of the lack of superiority effects in German include Haider (1983), Noonan (1988), Bayer (1990), Haider (1993), Müller (1995), Richards (2001), Kim and Sternefeld (1997), Haider (2000), Pesetsky (2000), and Grewendorf (2001). Fanselow (1991), Wiltschko (1997), Grohmann (1998), and Featherston (2001) suggest re-evaluations of the empirical evidence.
19. As noted in Fanselow (1990), Müller and Sternefeld (1993), and elsewhere, wh-scrambling often leads to reduced acceptability (but not strict ungrammaticality), which is not attested in cases like (24b) and (25b). However, reduced acceptability may result not from the application of wh-scrambling as such, but from the surface position of a scrambled wh-phrase. Since wh-scrambling is subsequently undone in a derivation like (26), this restriction will not apply.
20. English does not have scrambling; but it does exhibit topicalization. Thus, it has to be ensured that the account of superiority effects in English is not undermined by intermediate wh-topicalization. Indeed, wh-topicalization is independently excluded in English (and other languages); see Epstein (1992) and Müller and Sternefeld (1996) for analyses and further references.
21. As a matter of fact, only v and V can tolerate [Y] in their edge domains; no other kind of head provides a scrambling domain (see Müller 1995). Hence, there can be no [*Y*] feature on T or C that could trigger movement beyond vP. Still, something extra will ultimately have to be said to derive the ban on long-distance scrambling in German in toto: It must be ensured that a [*Y*] feature on a matrix V or v cannot attract an XP bearing [Y] in the lower clause. There are various ways of achieving this; but I will not pursue the matter here.
22. To the best of my knowledge, this observation is new; an informal survey suggests that the data are quite robust.
23. Haider (2002) argues that Icelandic has optional subject raising, and that it exhibits superiority effects with subject NPs only when the subject NP is in SpecT, not when it is in Specv. This generalization can be derived in the same way.
24. Note that the distinction between head and edge element in the definition of the A-Over-A Condition is the only case where a minimal structural differentiation of the bundle of categories accessible for further operations seems necessary; recall the discussion in subsection 2.2.
25. This means that, if nothing else is said, examples of the type in (37c), where a wh-phrase XP2 is embedded in some other phrase that c-commands the wh-phrase XP1, are predicted to be ill formed even if movement of XP2 to SpecC[*wh*] does not violate the CED or another locality constraint, as long as XP2 cannot reach the main branch by some other movement operation like scrambling. Relevant examples are hard to find, though. In most pertinent cases, XP2 will have to cross an island, and in the few well-formed constructions where locality constraints can be respected, XP2 can usually undergo scrambling first (given some proviso concerning the coherence/incoherence distinction with control infinitives). Also recall the discussion of the examples in (23) in English. However, should there turn out to be clear cases of well-formed instantiations of the structure in (i) (where α is not an island and wh2 cannot reach a by an independently available non-wh-movement operation), the present approach would be in need of a modification.
    (i) … wh2 … [a [α … t2 … ] … [β … wh1 … ] … … ] …
    One possibility would be to make the definition of workspace of a derivation sensitive to the distinction between main and minor branches (such that features on the main branch would not count as potentially available when a derivation proceeds in a minor branch). Then, structures like (37c) would not (have to) violate the PIC3 anymore, and wh-movement in (i) would be predicted to be legitimate if α is not an island. – It might also be worth noting at this point that replacing requirement (ii) in the definition of potential availability in (15) by the stricter requirement (ii)′ would lead to an approach that is empirically very close to an MLC-based system, with wh-intervention effects reduced to c-command environments.
    (ii)′ [F] is on X or edgeX of a root in the workspace of the derivation (lexical items are trivial roots).
26. In principle, one would expect the same kind of superiority-like effect to also occur with examples involving scrambling from wh-XPs, as in (33). However, relevant examples that would show this are difficult to construct because they would have to involve multiple embedding within NP, which creates difficulties of various kinds in multiple questions, for (presumably) independent reasons.
27. The data in (43) were checked with various native speakers, who unanimously declared them to be ill formed, and who all found a sharp contrast in the minimal pairs that can be formed on the basis of (41) and (43). However, I am aware of one exception to the apparent general neglect of constructions like those in (43) in the literature: Such examples are discussed in Fiengo et al. (1988) and, following them, Fitzpatrick (2002), and judged grammatical. I have nothing to say here about the source of the diverging judgements, except for the observation that Fiengo et al. (1988) are primarily concerned with contrasting the construction in (43), with a wh-phrase embedded in a subject NP and an object wh-phrase ending up in front of it, with one in which the subject NP-internal wh-phrase undergoes movement (in violation of the CED) and the object wh-phrase stays in situ – and not with one in which a wh-phrase is embedded in an object NP and a subject wh-phrase undergoes movement. In other words: One might speculate that judgement differences arise in this domain because different kinds of minimal pairs are taken into account, and judgements are taken to be relative rather than absolute.
28. In contrast, the MLC would not make the right predictions. If the MLC is defined in terms of asymmetric c-command (see (8)), all sentences in (43) are ceteris paribus predicted to be well formed; if it is defined in terms of closeness (see section 3.7), it will also wrongly permit wh-movement of NP2 in (43), at least in those cases where NP1 is deeply embedded.
29. The situation is different in German, where D-linking does not seem to have such effects; see, e.g., the examples in (40).
30. Compare the account of A-chain condition effects in Reuland (2001). Also see Fischer (2004) for a derivational analysis of binding phenomena along these lines.
References
Adger, David and Peter Svenonius
  2003 Beyond the Interface: Domains of Semantics. GLOW Newsletter 50, Spring 2003.
Agbayani, Brian
  1998 Feature Attraction and Category Movement. Ph.D. thesis, UC Irvine.
Barbiers, Sjef
  2002 Remnant Stranding and the Theory of Movement. In: Dimensions of Movement. Benjamins, Amsterdam, pp. 47–67.
Barss, Andrew and Howard Lasnik
  1986 A Note on Anaphora and Double Objects, Linguistic Inquiry 17, 347–354.
Bayer, Josef
  1990 Notes on the ECP in English and German, Groninger Arbeiten zur Germanistischen Linguistik 30, 1–51.
Bošković, Željko and Howard Lasnik
  1999 How Strict is the Cycle? Linguistic Inquiry 30, 691–703.
Brody, Michael
  2001 Some Aspects of Elegant Syntax. Ms., University College London.
Bruening, Benjamin
  2001 Syntax at the Edge: Cross-Clausal Phenomena and the Syntax of Passamaquoddy. Ph.D. thesis, MIT, Cambridge, Mass.
Büring, Daniel and Katharina Hartmann
  1994 The Dark Side of Wh-Movement, Linguistische Berichte 149, 56–74.
Chomsky, Noam
  1973 Conditions on Transformations. In: S. Anderson and P. Kiparsky (eds.), A Festschrift for Morris Halle. Academic Press, New York, pp. 232–286.
  1981 Lectures on Government and Binding. Foris, Dordrecht.
  1986 Knowledge of Language. Praeger, New York.
  1995 The Minimalist Program. MIT Press, Cambridge, Mass.
  2000 Minimalist Inquiries: The Framework. In: R. Martin, D. Michaels and J. Uriagereka (eds.), Step by Step. MIT Press, Cambridge, Mass., pp. 89–155.
  2001a Beyond Explanatory Adequacy. Ms., MIT, Cambridge, Mass.
  2001b Derivation by Phase. In: M. Kenstowicz (ed.), Ken Hale. A Life in Language. MIT Press, Cambridge, Mass., pp. 1–52.
  2002 On Nature and Language. Cambridge University Press, Cambridge.
Collins, Chris
  1997 Local Economy. MIT Press, Cambridge, Mass.
Diesing, Molly
  1992 Indefinites. MIT Press, Cambridge, Mass.
Epstein, Samuel David
  1992 Derivational Constraints on Ā-Chain Formation, Linguistic Inquiry 23, 235–259.
Epstein, Samuel David and T. Daniel Seely
  2002 Rule Applications as Cycles in a Level-Free Syntax. In: S. D. Epstein and T. D. Seely (eds.), Derivation and Explanation in the Minimalist Program. Blackwell, Oxford, pp. 65–89.
Fanselow, Gisbert
  1990 Scrambling as NP-Movement. In: G. Grewendorf and W. Sternefeld (eds.), Scrambling and Barriers. Benjamins, Amsterdam, pp. 113–140.
  1991 Minimale Syntax. Habilitation thesis, Universität Passau.
  1996 The Proper Interpretation of the Minimal Link Condition. Ms., Universität Potsdam.
Fanselow, Gisbert and Caroline Féry
  2002 Ineffability in Grammar. In: G. Fanselow and C. Féry (eds.), Resolving Conflicts in Grammars. Buske, Hamburg, pp. 265–307.
Featherston, Sam
  2001 Universals and Grammaticality: Wh-Constraints in German and English. Ms., Universität Tübingen.
Ferguson, Scott and Erich Groat
  1994 Defining ‘Shortest Move’. Ms., Harvard University.
Fiengo, Robert
  1980 Surface Structure. Harvard University Press, Cambridge, Mass.
Fiengo, Robert, Cheng-Teh James Huang, Howard Lasnik and Tanya Reinhart
  1988 The Syntax of Wh-in-situ. In: H. Borer (ed.), Proceedings of WCCFL 7. CSLI Publications, Stanford, pp. 81–98.
Fischer, Silke
  2004 Towards an Optimal Theory of Reflexivization. Doctoral dissertation, Universität Tübingen.
Fitzpatrick, Justin
  2002 On Minimalist Approaches to the Locality of Movement, Linguistic Inquiry 33, 443–463.
Fox, Danny
  2000 Economy and Semantic Interpretation. MIT Press, Cambridge, Mass.
Freidin, Robert
  1999 Cyclicity and Minimalism. In: S. D. Epstein and N. Hornstein (eds.), Working Minimalism. MIT Press, Cambridge, Mass., pp. 95–126.
Frey, Werner
  1993 Syntaktische Bedingungen für die Interpretation. Akademieverlag, Berlin.
Gazdar, Gerald
  1981 Unbounded Dependencies and Coordinate Structure, Linguistic Inquiry 12, 155–184.
Gazdar, Gerald, Ewan Klein, Geoffrey Pullum and Ivan Sag
  1985 Generalized Phrase Structure Grammar. Blackwell, Oxford.
Grewendorf, Günther
  1988 Aspekte der deutschen Syntax. Narr.
  2001 Multiple Wh-Fronting, Linguistic Inquiry 32, 87–122.
Grewendorf, Günther and Joachim Sabel
  1999 Scrambling in German and Japanese, Natural Language and Linguistic Theory 17, 1–65.
Grohmann, Kleanthes
  1997 German Superiority, Groninger Arbeiten zur Germanistischen Linguistik 40, 97–107.
  1998 Syntactic Inquiries into Discourse Restrictions on Multiple Interrogatives, Groninger Arbeiten zur Germanistischen Linguistik 42, 1–60.
Haider, Hubert
  1983 Connectedness Effects in German, Groninger Arbeiten zur Germanistischen Linguistik 23, 82–119.
  1993 Deutsche Syntax – generativ. Narr, Tübingen.
  2000 Towards a Superior Account of Superiority. In: U. Lutz, G. Müller and A. von Stechow (eds.), Wh-Scope Marking. Benjamins, Amsterdam, pp. 231–248.
  2002 Superiority Revisited: Dutch, English, German, Icelandic Contrasts. A Representational Account. Ms., Universität Salzburg.
Heck, Fabian
  2001 Pied Piping Without Feature Percolation. Ms., Universität Stuttgart.
  2004 A Theory of Pied Piping. Doctoral dissertation, Universität Tübingen.
Heck, Fabian and Gereon Müller
  2000 Successive Cyclicity, Long-Distance Superiority, and Local Optimization. In: R. Billerey and B. D. Lillehaugen (eds.), Proceedings of WCCFL. Vol. 19, Cascadilla Press, Somerville, Mass., pp. 218–231.
Hornstein, Norbert
  1995 Logical Form. Blackwell, Oxford.
  2001 Move. A Minimalist Theory of Construal. Blackwell, Oxford.
Huang, Cheng-Teh James
  1982 Logical Relations in Chinese and the Theory of Grammar. Ph.D. thesis, MIT, Cambridge, Mass.
  1995 Logical Form. In: G. Webelhuth (ed.), Government and Binding Theory and the Minimalist Program. Blackwell, Oxford, pp. 125–175.
Jackendoff, Ray
  1990 On Larson’s Account of the Double Object Construction, Linguistic Inquiry 21, 427–454.
Kim, Shin-Sook and Wolfgang Sternefeld
  1997 Superiority vs. Crossover. Ms., Universität Tübingen.
Kitahara, Hisatsugu
  1997 Elementary Operations and Optimal Derivations. MIT Press, Cambridge, Mass.
Koizumi, Masatoshi
  1995 Phrase Structure in Minimalist Syntax. Ph.D. thesis, MIT, Cambridge, Mass.
Koster, Jan
  1978 Locality Principles in Syntax. Foris, Dordrecht.
  2000 Variable-Free Grammar. Ms., University of Groningen.
Kuno, Susumu
  1973 Constraints on Internal Clauses and Sentential Subjects, Linguistic Inquiry 4, 363–385.
Landau, Idan
  2000 Elements of Control. Kluwer, Dordrecht.
Larson, Richard
  1988 On the Double Object Construction, Linguistic Inquiry 19, 335–391.
Lasnik, Howard and Mamoru Saito
  1992 Move α. MIT Press, Cambridge, Mass.
Müller, Gereon
  1995 A-bar Syntax. Mouton/de Gruyter, Berlin.
  1998 Incomplete Category Fronting. Kluwer, Dordrecht.
  2000 Elemente der optimalitätstheoretischen Syntax. Stauffenburg, Tübingen.
  2001 Order Preservation, Parallel Movement, and the Emergence of the Unmarked. In: G. Legendre, J. Grimshaw and S. Vikner (eds.), Optimality-Theoretic Syntax. MIT Press, Cambridge, Mass., pp. 279–313.
Müller, Gereon and Wolfgang Sternefeld
  1993 Improper Movement and Unambiguous Binding, Linguistic Inquiry 24, 461–507.
  1996 A-bar Chain Formation and Economy of Derivation, Linguistic Inquiry 27, 480–511.
Nissenbaum, Jon
  2000 Covert Movement and Parasitic Gaps. In: M. Hirotani, A. Coetzee, N. Hall and J.-Y. Kim (eds.), Proceedings of NELS 30. GLSA, Amherst, Mass., pp. 542–555.
Noonan, Máire
  1988 Superiority Effects: How do Antecedent Government, Lexical Government and V/2 Interact?, McGill Working Papers in Linguistics, pp. 192–214.
Perlmutter, David and Scott Soames
  1979 Syntactic Argumentation and the Structure of English. The University of California Press, Berkeley.
Pesetsky, David
  1987 Wh-in-Situ: Movement and Unselective Binding. In: E. Reuland and A. ter Meulen (eds.), The Representation of (In)Definiteness. MIT Press, Cambridge, Mass., pp. 98–129.
Pesetsky, David
  2000 Phrasal Movement and Its Kin. MIT Press, Cambridge, Mass.
Reinhart, Tanya and Eric Reuland
  1993 Reflexivity, Linguistic Inquiry 24, 657–720.
Reuland, Eric
  2001 Primitives of Binding, Linguistic Inquiry 32, 439–492.
Richards, Norvin
  2001 Movement in Language. Oxford University Press, Oxford.
Sauerland, Uli
  1999 Erasability and Interpretation, Syntax 3, 161–188.
Sportiche, Dominique
  1989 Le Mouvement Syntaxique: Contraintes et Paramètres, Langages, pp. 35–80.
  1998 Partitions and Atoms of Clause Structure. Routledge, London.
Stechow, Arnim von and Wolfgang Sternefeld
  1981 A Modular Approach to German Syntax. Ms., Universität Konstanz.
Sternefeld, Wolfgang
  1996 A Minimalist Semantics for Questions. Ms., Universität Tübingen.
  2000 Syntax. Eine merkmalsbasierte Analyse. Book ms., Universität Tübingen.
Stowell, Tim
  1981 Origins of Phrase Structure. Ph.D. thesis, MIT, Cambridge, Mass.
Takahashi, Daiko
  1994 Minimality of Movement. Ph.D. thesis, University of Connecticut.
Takano, Yuji
  1994 Unbound Traces and Indeterminacy of Derivation. In: M. Nakamura (ed.), Current Topics in English and Japanese. Hituzi Syobo, Tokyo, pp. 229–253.
van Riemsdijk, Henk
  1978 A Case Study in Syntactic Markedness: The Binding Nature of Prepositional Phrases. Foris, Dordrecht.
Wiltschko, Martina
  1997 Scrambling, D-linking and Superiority in German, Groninger Arbeiten zur Germanistischen Linguistik 41, 107–142.
MLC violations: Implications for the syntax/phonology interface
Geoffrey Poole and Noel Burton-Roberts
Introduction
In this paper, we examine several word-order effects associated with Stylistic Fronting (SF) in Icelandic and Long Head Movement (LHM) in Breton.1 In these constructions, there is an apparent conflict with the universal and inviolable Minimal Link Condition (MLC) on syntactic movement. We argue that this conflict is created by the generally assumed ‘realizational’ nature of the syntax/phonology relationship, in which phonology ‘realizes’, or ‘spells out’, the (overt) syntactic derivation. Burton-Roberts (2000) and Carr (2000) have suggested an alternative view in which phonetic phenomena stand in a relation of representation to syntactic objects rather than being realizations of them. This distinction may seem rather subtle. However, it has far-reaching implications. In particular, it entails a broader notion of phonology than is possible under the realizational view. This broader view, we argue, allows us to explain SF and LHM word-order effects in phonological rather than syntactic terms. We can therefore conclude that SF and LHM effects, being phonological rather than syntactic, do not undermine the universality or inviolability of the MLC.
1. The MLC and ‘realizational’ phonology
Chomsky (1995: 311) gives the following statement of the Minimal Link Condition:
(1) The Minimal Link Condition
    K attracts α only if there is no β, β closer to K than α, such that K attracts β.
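The logic of (1) can be made concrete with a minimal Python sketch. It is only an illustration under stated assumptions, not part of Chomsky's formulation: the `Item` class, the single feature label `F`, and the use of an integer `distance` as a stand-in for structural closeness are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """A potential attractee (hypothetical structure for illustration)."""
    label: str
    features: frozenset  # features borne by the item
    distance: int        # assumed stand-in for closeness to K (smaller = closer)


def attractable(feature, items):
    """Items that K could attract in principle: those bearing K's feature."""
    return [item for item in items if feature in item.features]


def closest_attractee(feature, items):
    """Return the only item K may attract under (1), or None if nothing matches."""
    candidates = attractable(feature, items)
    return min(candidates, key=lambda item: item.distance) if candidates else None


def violates_mlc(feature, items, moved):
    """True if some attractable item is closer to K than the item that moved."""
    return any(
        other.distance < moved.distance
        for other in attractable(feature, items)
    )


beta = Item("beta", frozenset({"F"}), distance=1)
alpha = Item("alpha", frozenset({"F"}), distance=2)
gamma = Item("gamma", frozenset({"G"}), distance=0)  # closer, but lacks F

print(closest_attractee("F", [alpha, beta, gamma]).label)  # beta
print(violates_mlc("F", [alpha, beta, gamma], alpha))      # True
print(violates_mlc("F", [alpha, beta, gamma], beta))       # False
```

In this toy setting, `gamma` is nearest to K but is ignored because K does not attract it, whereas attracting `alpha` across `beta` counts as a violation. This is the configuration that apparent 'skipping' effects instantiate.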
Violations of the MLC are particularly problematic given the increasingly central role that this constraint plays. Chomsky (1991; 1995, ch. 2) views the MLC as an Economy condition (see Chomsky 1995: 89–90). On this conception, MLC violations are possible in order to ‘save’ a derivation which would otherwise not converge. Chomsky later strengthens the MLC (1995: 268–269; 294–297), however, adducing conceptual and empirical arguments for incorporating it into the very definition of a syntactic operation. If this radically stronger version of the MLC is correct, we expect the MLC to be inviolable. Any operation which failed to create the minimal link would be on a par with an illegal move in a game of chess. It is simply not a possible syntactic operation.
On the traditional view, in which phonology ‘realizes’ or ‘spells out’ the syntactic computation, word order effects systematically observable at PF are the epiphenomenal consequences of syntactic operations. But if SF and LHM instantiate syntactic operations, then they must be viewed as tolerating violations of the MLC, thus calling into question the assumption that the MLC is part of what defines a syntactic operation. Consider (2) and (3):
(2) Þetta er versta bók sem skrifuð hefur verið _____ [Icelandic]
    This is the worst book that written has been
    ‘This is the worst book that has (ever) been written.’
(3) Lennet en deus Yann ___ al levr [Breton]
    Read 3SG has Yann the book
    ‘Yann read the book.’
In (2), it appears as though the participle skrifuð has moved from its base-generated position to some higher position, passing over the participle verið and the finite auxiliary hefur. In (3), the participle lennet appears to have ‘skipped over’ the finite verb en deus in undergoing LHM. We could avoid the problem if the SF and LHM effects could be treated as purely phonological effects. The problem here is that realizational phonology gives one a narrow view of phonology. On realizational terms, it is not sufficient – in arguing that an effect is phonological – to show that it is not syntactic. Phonology is not a ‘wastebasket’ for syntax. Aspects of the PF representation that are not simply realizational of syntax must, more positively, be shown to have phonological motivation, narrowly construed in terms of the traditional scope of phonology – syllable structure, phonological features, prosody, etc.2 In Section 2, we examine the SF and LHM effects in more detail, focusing on data like (2) and (3) above, in which the MLC appears to be violated. This leads into Sections 3 and 4, in which we consider previous responses to the problems posed by SF and LHM respectively.
2. MLC violations in Icelandic and Breton
2.1. SF in Icelandic: Implications for the MLC
SF is exemplified by word-order contrasts such as those in (4).
(4) a. Þetta er maður sem hefur leikið níutíu leiki.
       This is a man that has played ninety games
    b. Þetta er maður sem leikið hefur ____ níutíu leiki.
       This is a man that played has ninety games
There is no meaning difference associated with SF. (4b) is simply a stylistic variant of (4a). SF targets not only past participles, as in (4), but also negation, verbal particles, and adjectives. SF effects can be seen in both main and embedded clauses, but SF is strictly clause-bounded.3
(5) *Þetta er maðurinn sem séð spurði [hvort ég hefði ___ myndina].
    This is the.man that seen asked whether I had the.film
    ‘This is the man that asked whether I had seen the film.’
In general, SF appears to obey the MLC. As first noted by Maling (1990), SF is governed by an accessibility hierarchy. SF of an element lower on the hierarchy is blocked if a higher element in the hierarchy is present.4
(6) negation > predicate adjective > {past participle, verbal particle}
Thus, for example, the presence of negation blocks SF of a predicate adjective which could otherwise undergo SF, as illustrated by (7) and (8).
(7) a. Þetta er nokkuð sem ekki er ___ hægt að gera við.
       This is something that not is ___ possible to fix PRT
    b. *Þetta er nokkuð sem hægt er ekki ___ að gera við.
       This is something that possible is not to fix PRT
(8) Þetta er nokkuð sem hægt er ____ að gera við.
    This is something that possible is ____ to fix PRT
The facts in (7) and (8) would seem to be transparently explained under the assumption that SF is subject to the MLC. The head hægt ‘possible’ may not undergo SF if it would cross another head (the negation ekki). However, as discussed above, there are cases such as (2) above which appear to violate the MLC. In (2), we see that the past participle skrifuð has been moved by SF, despite the intervening passive participle verið and the finite verb hefur.
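The blocking pattern that the hierarchy in (6) imposes on (7) and (8) can be sketched in a few lines of Python. This is a hedged illustration only: the category labels, the numeric ranks, and the function name `sf_candidates` are assumptions introduced here, and the sketch encodes nothing beyond (6) itself. In particular, it says nothing about finite verbs or about the problematic case in (2).

```python
# Ranks mirror the hierarchy in (6); a smaller number means higher on the hierarchy.
# The labels and numeric encoding are illustrative assumptions.
HIERARCHY = {
    "negation": 0,
    "predicate adjective": 1,
    "past participle": 2,
    "verbal particle": 2,
}


def sf_candidates(categories_present):
    """Return the categories that the hierarchy allows to undergo SF.

    A category is blocked whenever a higher-ranked category is also present,
    so only the highest-ranked categories present survive.
    """
    ranked = [c for c in categories_present if c in HIERARCHY]
    if not ranked:
        return []
    top = min(HIERARCHY[c] for c in ranked)
    return [c for c in ranked if HIERARCHY[c] == top]


# (7): negation and a predicate adjective are both present, so only negation may front.
print(sf_candidates(["negation", "predicate adjective"]))  # ['negation']
# (8): no negation, so the predicate adjective may front.
print(sf_candidates(["predicate adjective"]))              # ['predicate adjective']
```

Past participles and verbal particles share a rank in this encoding, reflecting the braces in (6); the sketch therefore returns both when they co-occur and leaves the choice between them open.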
2.2. LHM in Breton
Breton, like other Celtic languages, has VSO as the default word order in both main and embedded clauses. Clauses with SVO order are taken to be instances of topicalization (see Borsley and Stephens (1989) for discussion). However, Breton in addition seems to have the requirement that the finite verb may not in general be sentence initial. It must be preceded by an element such as negation or a topicalized NP. (Examples are from Borsley, Rivero, and Stephens (1996) (henceforth BRS).)5
(9) *Lenn Anna al levr.
    read Anna the book
    ‘Anna reads the book.’
(10) a. Al levr a lenn Anna.
        the book PRT read Anna
        ‘Anna reads the book’
     b. Ne lenn ket Anna al levr.
        neg read neg Anna the book
        ‘Anna didn’t read the book.’
Breton also has sentences like (11), which instantiate Long Head Movement (see particularly BRS in the case of Breton) and are not found in the other Celtic languages.
(11) Lennet en deus Yann ___ al levr.
     read 3SG has Yann the book
     ‘Yann read the book.’
In these types of sentences, we see a non-finite verb, followed by the finite auxiliary, followed by the subject. Just as in other languages that exhibit LHM, it takes place in Breton in main clauses only. LHM in embedded clauses is impossible. As noted by BRS, this process is not an instance of Topicalization (or Remnant Topicalization): LHM is clause-bounded, while Topicalization is not.
(12) a. *Desket am eus klevet he deus Anna he ____ c’hentelioù.
        learned 1sg have heard 3sg.F have Anna 3sg.F lessons
        ‘I have heard that Anna has learned her lessons.’
     b. Al levr a lavaras Yann e lennas.
        the book PRT said Yann PRT read
        ‘The book, Yann said that he read.’
Just as with SF in Icelandic, however, we seem to have a problem with respect to the MLC. The most obvious analysis of LHM in Breton is that the non-finite verb is moving to C. However, it seems equally obvious that the finite verb is in I. This requires that the non-finite verb, in moving from V to C, ‘skip over’ the intervening finite verb in I, thereby violating the MLC.
3. SF in Icelandic: Two ‘realizational’ responses
In this section, we consider two previous approaches to SF in Icelandic: Holmberg (2000) and Poole (1997). Holmberg analyzes the SF effect as being the realization of a syntactic operation. He therefore has a violation of the MLC to explain, and we argue that his analysis fails to do so. Poole, on the other hand, attempts to argue that the SF effect is caused by a phonological operation. This disposes of the MLC violation, as the MLC does not constrain phonological operations. We argue that Poole’s analysis faces empirical problems; in particular, it fails in its attempt to provide the requisite phonological motivation.
3.1. Holmberg’s (2000) analysis of Icelandic
Holmberg (2000) presents a syntactic approach to SF. Unlike many previous authors, Holmberg also attempts to reconcile the facts discussed in section 1 above with the MLC. Holmberg’s central claim is that SF arises because of requirements imposed by the Extended Projection Principle. He assumes, however, that the EPP effect does not derive from a single feature in Infl, but rather from two features working in tandem. The first feature is a ‘D-feature’, which is checked by an element in Spec-IP which has nominal features. The second feature is a ‘P-feature’, which is checked by an element in Spec-IP which has phonological features. In the majority of cases (those in which SF is not observed), the two features are checked off by the same single element in Spec-IP. Consider (13):
(13) Jón hefur barið Guðmund.
     John has hit Guðmundur
When Jón raises from Spec-VP to Spec-IP, it checks off the D-feature of I. Since Jón has phonological content, it also checks the P-feature of I. However, when the subject does not have phonological content, it fails to check the P-feature, so SF is forced to apply. Consider (14):
(14) Ef gengið er eftir Laugaveginum…
     If walked is along the-Laugavegur…
     ‘If one walks along the Laugavegur…’
The subject of (14) is proarb, which cannot check off the P-feature of I, having no phonological content. Therefore, to keep the derivation from crashing, the phonological features of the nearest element, in this case the past participle gengið, are attracted to Spec-IP. Thus, in the case of SF, Spec-IP is occupied by the formal and semantic features of proarb, but by the phonological features of the participle gengið.6
With Holmberg’s general explanation in hand, consider now the MLC-violating instance of SF from section 1 above:
(15) Þetta er versta bók sem skrifuð hefur verið _____.
     This is the worst book that written has been
     ‘This is the worst book that has (ever) been written.’
From the point of view of the MLC, the central question is why the P-feature in I can attract skrifuð instead of verið. Holmberg suggests that the key to understanding (15) is the fact that verið is itself an auxiliary. When verið is more copula-like, it undergoes SF and acts as a blocker, just as expected.
(16) a. Þeir sem verið hafa veikir þurfa að fara til læknis.
        Those who been have sick must see a doctor
     b. ??Þeir sem veikir hafa verið þurfa að fara til læknis.
        Those who sick have been must see a doctor
Holmberg argues that the contrast between (15) and (16) provides a conclusive argument that the SF effect is the realization of a syntactic movement operation. Whatever the explanation, it will need to refer to the syntactic distinction between ‘copula’ and ‘auxiliary’. But when auxiliary verið and copula verið are spelled out, all that is left are their phonological/phonetic properties, and at this level the two elements are presumably identical. Since phonology cannot distinguish them, the operation must be syntactic. What Holmberg suggests is that auxiliaries are unable to undergo SF because they lack semantic features. In the light of this, he claims that, for the sentence to receive its proper interpretation, the SF’ed element must undergo LF reconstruction. But, by hypothesis, auxiliary verbs do not have sufficiently ‘salient’ formal and/or semantic features to undergo LF movement (cf. Chomsky’s (1995: ch. 3) discussion of have and be raising in English). Therefore, any auxiliary verb which underwent SF would be ‘stranded’ at LF, and unable to be reconstructed.
However, it is not clear that SF’ed elements actually need to undergo reconstruction under Holmberg’s analysis. Reconstruction is only necessary under the assumption that all features (semantic, formal and phonological) of a lexical item move as a unit. If that were the case with SF, the semantic features, for example, could be ‘out of place’ and it might then be necessary to restore them to their original position. But the semantic and formal features do not move under Holmberg’s analysis of SF. He is quite clear that it is only the phonological features of the SF’ed element that move. The other features remain in the original position (see, for example, his discussion of (29) on p. 460). Therefore any semantic features possessed by the SF’ed element are entirely unaffected by SF, remaining in the position in which they are interpreted.
There is therefore no motivation for any reconstruction in the LF component, and it thus remains to explain why veri› should undergo SF as a copula but not as an auxiliary. In the absence of an account of this generalization, we are still left with the violation of the MLC. However, even if Holmberg’s explanation of the auxiliary/copula contrast were not internally problematic, a difficulty would still remain with respect to the MLC. As it must if it is a syntactic movement operation, SF operates on features – albeit in this case phonological features. Therefore, the MLC should require that the SF operation attract the closest set of phonological features, whatever category they are associated with. If this causes the derivation to crash (because of a ‘stranded’ auxiliary, say), then the derivation crashes. The MLC is no longer an Economy condition, selecting from among only convergent derivations. It is a defining property of Move/Attract operations. As such, it is not conditional. Attracting anything other than the
Geoffrey Poole and Noel Burton-Roberts
closest set of (in this case phonological) features is akin to making an illegal move in a game of chess. It is simply not a possible operation.
3.2. Poole’s (1997) phonological analysis of Icelandic
Explicitly to avoid the problems with accounting for the SF effect as the realizational product of a syntactic movement operation, Poole (1997) explores another possibility – that the SF effect is the result of a purely phonological operation. This approach would seem to be consistent with the ‘stylistic’ character of the SF effect. Moreover, it obviates the need to account for the MLC ‘violation’, since the MLC does not constrain phonological operations. However, Poole’s approach does not in fact explain certain Icelandic word order data related to the ‘MLC-violating’ sentence. Furthermore, the strictly phonological motivation for the effect that Poole offers in support of his phonological account is not persuasive. Poole (1997) suggests that SF in Icelandic is not leftward movement of negation, past participles, etc., as traditionally assumed, but rather rightward movement in the phonology of just finite verbs.
(17) a. Þetta er tilboð sem er ekki hægt að hafna.
        This is an offer that is not possible to reject
     b. Þetta er tilboð sem ekki er hægt að hafna.
Poole offers the following phonological motivation for this operation: he claims that the auxiliary verb in (17b) is a prosodically deficient left-leaning clitic. The auxiliary is ‘stranded’ as the only member of its phonological phrase when there is no phonologically realized subject (thus capturing Maling’s (1990) ‘Subject Gap’ condition on SF). Being prosodically deficient, the auxiliary is unable to head a phonological phrase. This triggers a prosodic repair rule which ‘flips’ the auxiliary verb with the element to its right (the negation ekki in (17)). The auxiliary is then assimilated into the phonological phrase of the element it flips with, giving the auxiliary the requisite leftward host to cliticize to. Thus the SF word-order effect is not achieved by any leftward syntactic movement, but instead by a rightward movement of the auxiliary as part of a phonological restructuring. Although there is no potential MLC violation since the operation is phonological, the piece of data in (18) still needs to be accounted for.
(18) Þetta er versta bók sem skrifuð hefur verið.
     This is the worst book that written has been
     ‘This is the worst book that has (ever) been written.’
What Poole suggests is that both the auxiliaries hefur and verið are prosodically deficient, and they both undergo the ‘prosodic flip’ rule. The specific order is accounted for under the assumption that the two elements form a clitic group at a lower level of phonological organization and therefore flip as a unit. However, this explanation does not account for cases like (19), in which, as we have seen above, auxiliary verið is unable to undergo SF.
(19) *Þetta er versta bók sem verið hefur skrifuð.
      This is the worst book that been has written
      ‘This is the worst book that has ever been written.’
It is not clear from Poole’s account why (19) isn’t generable, under the assumption that auxiliary hefur is a clitic, but auxiliary verið is not. As Poole notes, the lexicon must contain both clitic and non-clitic versions of the auxiliaries, in order to account for sentences like (20), in which SF does not take place.
(20) Þetta er versta bók sem hefur verið skrifuð.
     This is the worst book that has been written
     ‘This is the worst book that has (ever) been written.’
In order to generate (20), neither hefur nor verið can be a clitic. Otherwise the prosodic repair rule would be triggered. Therefore, it seems as though (19) ought to be generable by selecting the clitic version of hefur from the lexicon and the non-clitic version of verið. This option must be blocked, but it is not at all clear how Poole’s system does so. Furthermore, as noted by Burton-Roberts and Poole (in press), the claim that Icelandic auxiliaries can be prosodically deficient clitics, and therefore the phonological motivation necessary to analyze the phenomenon as nonsyntactic, is not completely persuasive. Clitics, as normally understood, are unfooted syllables. But the Icelandic auxiliaries in question, hefur and verið, are both disyllables, with a strong-weak prosodic contour. The fact that the first syllable is metrically strong implies that the whole form constitutes a foot – a syllabic trochee (for Icelandic as a syllabic trochee language, see, e.g., Hayes 1995), in which case they cannot be clitics.
It therefore seems as though Poole’s analysis faces the empirical problem of accounting for (19). More importantly, though, it seems as though the phonological motivation that Poole provides does not work. This is crucial because, under a realizational approach to the syntax/phonology interface, any aspects of the PF representation which are not realizational must be explained in strictly phonological or phonetic terms. To argue that a phenomenon is phonological, one must show more than merely that it is not syntactic.7
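The over-generation problem raised by (19) can be made concrete with a small Python sketch. It is only an informal illustration, not Poole’s own formalization: the function name prosodic_flip, the list-of-words encoding, and the assumption that a clause-initial run of clitics always flips as one group are all simplifications introduced here.

```python
def prosodic_flip(words, clitics):
    """Toy version of the 'prosodic flip' repair rule (a simplification).

    words   -- the forms following the relativizer 'sem', in base order
    clitics -- the forms drawn from the lexicon in their clitic version
    """
    # Assumption: the maximal clause-initial run of clitics forms one
    # clitic group and flips as a unit.
    n = 0
    while n < len(words) and words[n] in clitics:
        n += 1
    # No stranded clitic, or nothing to its right: no repair applies.
    if n == 0 or n == len(words):
        return list(words)
    # Flip the group with the single element to its right.
    return [words[n]] + list(words[:n]) + list(words[n + 1:])


base = ["hefur", "verið", "skrifuð"]
both_clitic = prosodic_flip(base, {"hefur", "verið"})  # order of (18)
no_clitic = prosodic_flip(base, set())                 # order of (20)
mixed = prosodic_flip(base, {"hefur"})                 # order of the ungrammatical (19)
```

On these assumptions nothing in the rule itself excludes the mixed lexical choice, which is the point made above: the order of (19) is derived as readily as the orders of (18) and (20).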
4. LHM in Breton: Two ‘realizational’ responses
In this section, we turn to the LHM effect in Breton, and two representative analyses. Borsley, Rivero, and Stephens (1996), like Holmberg (2000), take the LHM effect as the realization of a syntactic movement operation. They therefore need to explain away the apparent violation of the MLC, and we argue that their approach fails in this. Schafer (1997) postulates ‘multiple insertion’ rather than movement. We argue that, despite appearances, this approach also fails to explain away the MLC violation.
4.1. BRS’s analysis of LHM in Breton
Recall from Section 1 the canonical example of LHM in Breton.
(3) [CP [C Lennet ] [IP [I en deus] [VP Yann ___ al levr ] ] ]
    read 3SG has Yann the book
    ‘Yann read the book.’
As mentioned, BRS assume that the LHM effect is the realization of a syntactic operation. The mechanics of their analysis are relatively straightforward: the verb is raising to C from its base-generated position in V in order to license Tense. BRS are assuming Relativized Minimality, rather than the Minimal Link Condition. However, even on these terms, the grammaticality of (3) is still unexpected. In moving from V to C, the participle has ‘skipped over’ the finite verb in I. If the licensing of Tense is the driving force behind the operation, we might expect that the finite verb in I is an intervenor for purposes of Relativized Minimality, blocking movement of the participle in V. In order to explain why the finite verb in I does not count as an intervenor, BRS appeal to Roberts’ (1994) distinction between L-related heads and non-L-related heads. L-related heads are those which contain a feature
of a lexical head (as opposed to a functional head). In (3), therefore, I is an L-related head, since it contains the auxiliary. Prior to movement, however, C is a non-L-related head since it contains only the features of the functional head C. Therefore, BRS contend, if Relativized Minimality is sensitive to the L-related distinction, then the finite verb in I will not block movement of the participle in V. The intervenor is in an L-related position, and does therefore not block movement to a non-L-related position. Raising from V to C is therefore licensed. As BRS are assuming that movement is constrained by Relativized Minimality, there is no particular problem with incorporating L-relatedness into its definition. But it is not clear how to translate their Relativized Minimality analysis into one incorporating the Minimal Link Condition (under the assumption that the latter now subsumes the former). The MLC operates on features; the question is: what would the relevant feature be in the case of LHM? C must have some feature which I lacks that results in the participle being attracted/moved. It cannot be a purely verbal feature, otherwise we would predict that the finite verb en deus should be the element that undergoes movement. It therefore seems that BRS’s analysis of (3) fails to explain what looks like an MLC violation.8
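The feature-based reasoning can be pictured with a short Python sketch. It is a toy model under stated assumptions: the function attract, the ordered list standing in for structural closeness, and the feature labels ("V", "fin", "prt") are hypothetical placeholders, not proposals about the actual feature involved in LHM.

```python
def attract(feature, heads):
    """Return the closest head bearing `feature`, or None.

    heads -- (form, features) pairs ordered from closest to the
             attracting head to most distant (a toy stand-in for
             structural closeness).
    """
    for form, features in heads:
        if feature in features:
            return form  # closest match only; no other choice is possible
    return None


# Heads below C in (3); the feature labels are hypothetical.
below_c = [
    ("en deus", {"V", "fin"}),  # finite auxiliary in I
    ("lennet", {"V", "prt"}),   # participle in V
]

verbal = attract("V", below_c)         # a purely verbal feature finds the auxiliary
participial = attract("prt", below_c)  # only a feature that I lacks finds the participle
```

The sketch merely restates the difficulty: an attracting feature shared by the finite verb picks out en deus, so an MLC-based version of the analysis would have to identify some feature of the participle that the element in I lacks.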
4.2. Schafer’s (1997) ‘multiple insertion’ analysis of Breton
Schafer (1997) regards the LHM effect in Breton – and the verb-second effect with which it interacts – as a stylistic effect. For this and other reasons, she claims that neither effect should be accounted for in terms of syntactic movement. We are sympathetic to this general approach and in due course will argue that, properly implemented, it can provide the basis of a principled account that avoids any violation of the MLC. Nevertheless, we argue that Schafer’s implementation of it fails to avoid MLC violations. Rather than postulating syntactic movement (of lennet from V to C), Schafer proposes that two identical elements are selected from the lexicon and inserted into the derivation – one in V and one in C, as illustrated in (21).
(21) [CP Lennet [IP Yann en deus [VP lennet al levr ] ] ]
     Read Yann 3SG has read the book
     ‘Yann read the book.’
Then, within what she refers to as a ‘stylistic component’ of the phonological computation, two operations are triggered which derive the word order seen on the surface. The first is that the inflected verb en deus is adjoined to C°. This has the effect of creating a verb-second order, enforcing what Schafer calls a ‘verb-second linearization requirement’, resulting in the intermediate representation in (22).
(22) [CP Lennet en deus [IP Yann [VP lennet al levr ] ] ]
Second, Schafer claims that it is Full Interpretation at PF which requires that one of the two instances of lennet ‘read’ be deleted. Since deletion of the higher instance would result in a clause-initial inflected verb (which is prohibited in Breton), it is the lower copy which is deleted at PF, resulting in the word order seen on the surface.
(23) Lennet en deus Yann ___ al levr.
     Read 3SG has Yann the book
     ‘Yann read the book.’
Since Schafer’s analysis derives the effect of LHM via multiple insertion rather than syntactic movement, it might be thought that her analysis does not conflict with the MLC or its status as a defining part of Move/Attract. On closer analysis, though, it seems the MLC must indeed constrain the relation between these two identical elements. This is not unexpected, as this relation would appear to be a chain in all but name. However the relation between the two elements is effected (whether derivationally or, as for Schafer, representationally) the MLC must surely constrain it if the condition is to have any content. Schafer’s explicit claim (p. 191) is that the numeration literally includes two instances of the same lexical form in sentences like (23). These are selected and merged in the appropriate positions (V and C). However, if two separate Vs are selected from the lexicon and are not chain-mates, we would expect each to have θ-roles which must be discharged. Since there is no obvious way for the upper verb in C to accomplish this, sentences displaying LHM should have the status of a θ-Criterion violation. The fact that sentences like (23) are grammatical, and do not trigger θ-Criterion violations, suggests that the multiple instances of the non-finite verb are treated by the computation as a single element. There is therefore only one θ-role to discharge. But this in turn suggests that these elements do form a syntactic chain. The non-appearance of MLC effects then re-emerges as a problem. Consider (24), which exhibits a wh-island effect.
(24) *Where_j do you wonder who_i t_i bought the book t_j?
If duplicate elements can be freely picked from the lexicon but treated as a unit by the computational system, and if elements constructed in this way are not subject to the MLC, then (24) is incorrectly predicted to be grammatical. The two instances of where could be two instances of the same lexical item picked from the lexicon, generated in the same way as Schafer’s LHM sentences. If this is supposed to avoid an MLC violation, then the MLC would in fact be left virtually without content. We conclude that, whatever the status of these objects constructed via multiple insertion, they must be subject to the MLC, in which case Schafer’s analysis does not in fact avoid an MLC violation in Breton sentences exhibiting the LHM effect. Effectively we have been arguing that, contra Schafer, and indeed contra Borsley and Kathol’s (2000) understanding of Schafer, hers is not in fact a wholly phonological analysis, but is, in crucial respects, syntactic. Phonology is still realizational of the syntactic computation in that its only role is to determine which of the candidates (the two occurrences of lennet) offered by the syntactic computation is to be realized.9,10
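For concreteness, the two ‘stylistic component’ steps that take (21) to (23) can be sketched in Python. This is an informal rendering under simplifying assumptions, not Schafer’s formalism: the function name schafer_pf, the flat token lists, and the identification of the two instances by identical form are all introduced here for illustration.

```python
def schafer_pf(tokens, inflected, doubled):
    """Toy rendering of the two 'stylistic component' steps.

    tokens    -- forms in the order of (21)
    inflected -- the inflected verb
    doubled   -- the form inserted twice (given in lower case)
    """
    # Step 1: place the inflected verb right after the first form,
    # mimicking adjunction to C and yielding the order in (22).
    rest = [t for t in tokens[1:] if t != inflected]
    step1 = [tokens[0], inflected] + rest

    # Step 2: delete one instance of the doubled form. Deleting the
    # higher one would leave the inflected verb clause-initial, so
    # that candidate is rejected and the lower instance is deleted.
    candidates = []
    for i, t in enumerate(step1):
        if t.lower() == doubled:
            candidates.append(step1[:i] + step1[i + 1:])
    legal = [c for c in candidates if c[0] != inflected]
    return step1, legal


step1, legal = schafer_pf(
    ["Lennet", "Yann", "en deus", "lennet", "al levr"], "en deus", "lennet"
)
```

Note that the deletion step can only apply because the two instances of lennet are treated as occurrences of one element, which is the relation argued above to be a chain in all but name.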
5. The syntax/phonology interface and the problem of realizational phonology
In previous sections, we have examined several responses to both the SF effect in Icelandic and the LHM effect in Breton within a ‘realizational’ context. We have argued that they are problematic, both conceptually and empirically. However, it is interesting that three of the four analyses have a ‘phonological’ flavor. We suggest that this ‘leakage’ between syntax and phonology reflects an implicit unease with a realizational view of the syntax/phonology interface, and an acknowledgement that stylistic, semantically vacuous effects like SF and LHM do not find a natural home in either syntax or realizational phonology. Several authors have suggested that the SF and LHM effects are related to ‘verb-second’ – e.g. Maling (1990) and Anderson (1993) for SF in Icelandic and Borsley and Kathol (2000) and Legendre (2001) for LHM in Breton.11 We will develop this idea and argue that the problems posed by realizational phonology for a natural account of the SF and LHM effects pose similar problems for any analysis of ‘verb-second’.
5.1. The problem for SF and LHM
Current generative theory incorporates the idea that language relates sound and meaning, and that it is the computational system which articulates the relation between the two (as discussed, for example, by Chomsky (1995: 2, 265)). On these terms, lexical items are constituted not only by formal and semantic features, but also phonological features. In operating on the semantic and formal features of lexical items, the syntactic computation also operates on the phonological features (by pied-piping). From this ‘double interface’ conception of the language faculty it follows that word order is a realizational ‘epiphenomenon’ of the syntactic computation. That is, the arrangement of (the phonological features of) lexical items at PF is entirely determined by what syntactic operations have applied during the derivation from the lexicon to Spell-Out. It is in this sense that word order at PF is a realization of the syntax.12
Even leaving aside the fact that, as syntactic operations, they would violate the MLC in certain cases, neither SF nor LHM looks like the result of a syntactic operation. Crucially, they both have a ‘stylistic’ character. Neither contributes to a difference in interpretation. Jónsson (1991) notes that the scope of negation is unaffected when it undergoes SF. Schafer (1997) discusses LHM in the context of the fact that “utterances with the same propositional content vary in form” (p. 172). This is what motivates her appeal to a “stylistic component” (p. 171). The fact that these effects involve apparent violations of the MLC seems all of a piece with their stylistic, rather than strictly syntactic, character. However, if the effects of SF and LHM are not to be explained by means of a syntactic operation, the only possible alternative is that they are some kind of purely phonological phenomenon. But this seems equally implausible.
Strictly speaking, an account can only be considered to be phonological – that is, strictly, narrowly phonological – if it makes reference to phonological concepts and constructs and not to syntactic ones. It looks as though SF and LHM fail on both counts. As we have seen, the ‘phonological’ aspects of the accounts discussed in Sections 3 and 4 do not invoke phonological concepts as traditionally understood. This is not surprising as there seems nothing obviously phonological which unites the (phonologically) disparate elements which undergo SF and LHM. Poole’s claim that prosody is relevant cannot be maintained, as we have seen. Furthermore, Schafer explicitly denies (p. 170) that prosody could be relevant for LHM in Breton. Her appeal to a ‘stylistic component’ intermediate between the syntax and the phonology proper seems tantamount to a concession that the LHM effect is neither syntactic nor phonological, strictly understood. Furthermore, any account of SF or LHM will have to refer to syntactic properties. As noted by Holmberg, the SF effect is sensitive to the distinction between ‘auxiliary’ and ‘main’ verbs. Only the latter undergo SF. But this is not a distinction which a phonological operation can have access to under the standard, ‘realizational’ conception of phonology. This is why, for Holmberg, it must be syntactic. Similarly, the LHM effect crucially involves a specific syntactic category, viz. participle. But this information should not be accessible to the phonological computation since the relevant syntactic/formal features are simply not available there. A realizational conception of phonology thus presents a generalized barrier to any account of these effects. The central dilemma is this: They are not syntactic; yet they are sensitive to syntax; if they are not syntactic, that suggests that they must be phonological; yet if they are sensitive to the syntax then they cannot be strictly phonological.13
5.2. The problem for ‘verb-second’
A very similar dilemma is presented by the ‘verb-second’ effect, and accounting for ‘verb-second’ has been a persistent problem within Chomskyan syntax since the mid-1970s. The dilemma is inherent in the very conjunction of ‘verb’ and ‘second’. ‘Verb’ is a syntactic property. This would suggest that the phenomenon must be syntactic, since syntactic operations have access to such categorial information, while phonological operations surely do not. Against this, ‘second’ suggests it must be phonological since counting and linear order play no role in syntax, but crucially do in phonology (see, e.g., Chomsky (1995: 334)). This problem is usually resolved by claiming that the ‘second’ part of verb-second is the merely epiphenomenal result of order-independent syntactic considerations. The finite verb obligatorily raises to C while another XP obligatorily raises to Spec CP (following a line of research often attributed to den Besten (1981)). This ensures that the finite verb is spelled out as the second element. However, as noted particularly by Anderson (1993), this introduces ‘engineering’ problems. To achieve second position, there must be a syntactic operation which raises the finite verb to C. But this operation requires syntactic motivation; some syntactic reason for the verb’s raising to C must therefore be engineered. The problem is compounded by Icelandic, where
verb-second is observed not only in main but also in embedded clauses, where the complementizer presumably fills the C position. This requires some further qualification of the verb-second account and therefore further syntactic ‘engineering’. Does the verb raise to C in German/Dutch but only to I in Icelandic? Or do embedded clauses in Icelandic have a recursive CP structure, thus accommodating both an overt complementizer and a verb in second position? (See Rögnvaldsson and Thráinsson (1990) for discussion.)
5.3. The general problem
If phonology is regarded as the realizational component of grammar, then any effect systematically observable at PF should either (i) be the realizational effect of a syntactic property or operation (a property ‘fed’ to phonology by overt syntax), or (ii) be motivated in strictly phonological terms narrowly construed. Otherwise, ‘phonology’ becomes a mere wastepaper basket to which we consign all phenomena which are problematic for syntax. Given realizational assumptions, the problem presented by stylistic effects such as SF and LHM, and indeed by verb-second, is that they seem to be neither syntactic (i.e., exhibiting property (i)) nor purely phonological (as understood in (ii)). What is needed is a concept of phonology and its relation to syntax which, while (a) maintaining a radical distinction between them, nevertheless (b) allows phonology full access to the syntax. In what follows, we develop the idea that if the role of phonology is thought of as representational rather than realizational of the syntactic computation, this yields, on a conceptually principled basis, a much broader notion of phonology consistent with (a) and (b). We argue that this representational approach provides an appropriate locus for the characterization of the SF, LHM and V2 effects observed at PF.
6. A representational approach to the syntax/phonology interface
There is a certain tension between the goals of minimalism and realizational phonology. If phonology involves ‘spelling out’ the syntactic computation up to that point (Spell Out), phonological features must be present during syntactic computation. But their presence in the syntactic computation is something of an anomaly. They serve no purpose there and are not interpretable at the interface the computation serves (namely LF). They are there on sufferance, as it were, simply pied-piped along with formal and semantic features.14 This tension between minimalist goals and the need to serve the sensory-motor interface is reflected in Chomsky’s distinction between ‘core’ and ‘periphery’. Phonological features are ‘peripheral’ to the ‘core’ function of the computation, which is to serve the LF interface. This contrast seems to hint implicitly that the minimalist ideal would in fact be a language faculty that served just the LF interface. The Representational Hypothesis (RH) offers a way to pursue the hinted-at ideal, explicitly and without qualification. On the RH view, ‘lexical items’ do not contain phonological features and phonological features are never present in the syntactic derivation. The syntactic computation is oriented exclusively towards the LF-interface and manipulates exclusively syntactico-semantic elements. This means there can be no operation of Spell-Out, or therefore any distinction between overt and covert syntax. The computation is wholly ‘covert’. Put another way, it is wholly internal (mentally constituted), and is in fact radically internal. By this we mean [a] that it is not ‘externalizable’, not capable of being realized or instantiated in, or converted to, any ‘overt’ mind-external form (e.g., sound) and furthermore [b] that no aspect of the computation or lexicon is ‘internalized’ (from the external).15 This rigorous exclusion of all PF considerations from the computation is motivated by more than simply minimalist ideals. It rests on the central claim of the Representational Hypothesis: that phenomena capable of being assigned a phonological description – certain phonetic phenomena – stand in a relation of conventional representation to the internal syntactic computation. To explain why this means that PF considerations should be excluded from the computation, we need to explain what we mean by ‘representation’.
By ‘representation’, we emphatically mean a relation – an asymmetric relation between R (Representans) and O (representatum, Object of representation). The distinction between R and O is essential (R ≠ O). It is, crucially, the distinction between a perceptual sign (R) and what it is a sign of (O).16 To emphasize the relational nature of ‘representation’ intended here, Burton-Roberts (e.g., 2000) coins the term ‘m-representation’ – ‘m’ for Magritte, in honor of his painting ‘La Trahison des Images’, in which the image of a smoker’s pipe is accompanied by the words ‘Ceci n’est pas une pipe’. The point is that, in looking at the representation(-of-a-pipe) (R), we are not looking at any pipe (O). We assume that, since the syntactico-semantic computation is radically mind-internal, it does not share, and is not resembled by, properties of any mind-external (e.g., phonetic) phenomenon. What is needed in order to effect a representational relation between the internal syntactico-semantic computation and phonetic phenomena is a system of representational conventions. This is what a phonological system is, by the Representational Hypothesis. Phonological systems do not serve or participate in the internal computation itself. They are, quite distinctly, systems of conventions for the m-representation, in a mind-external acoustic medium, of that radically internal computation. The function and rationale of a phonological system is to harness phonetic phenomena (which have exclusively acoustic properties) to the enterprise of m-representing the syntactic computation, primarily for the purpose of communication. Given the distinction between the m-representation (R) and what it is a representation of (O), phonology pertains to R not O. In characterizing a phonological system, then, we are not characterizing any aspect of the nature of the computation itself (what-is-represented) but only how it is represented in the phonetic medium. On these terms, while the computation itself is not – and being radically internal, could not be – for communication, it enters into communication as representatum, by being m-represented in a perceptual medium. As the locus of representational convention, phonology mediates between what is represented on the one hand and non-linguistic acoustic phenomena on the other. Relevant phonetic phenomena count as ‘speech’, not because they realize or instantiate any property of the (universal, radically internal) computation, but because they are produced in conformity with a system of conventions that harnesses them to the enterprise of m-representing that computation. Particular spoken languages such as Icelandic and Breton, on this view, are not instantiations of that universal computation. They relate to the computation as systems of conventions for its m-representation in the phonetic medium.17 As indicated, this representational conception of the nature and role of phonological systems suggests that the computation is indeed phonology-free.
On these terms, in contrast to Chomsky (1995:8, 221), phonology is not ‘peripheral’, nor does it have an ‘epiphenomenal’ role with respect to the computation. Within its own (representational, communicative) domain, phonology is essential. Parsing, on these terms, is an activity performed by speakers that puts acoustic events (which lack linguistic structure and properties) into representational correspondence with objects that have linguistic structure and properties. Parsing of relevant phonetic phenomena is necessary precisely because they do not possess the syntactico-semantic structure they are m-representational of. This is a central aspect of the conventionality of the relation.
6.1. The Representational Hypothesis and linear order
This change from a realizational to a representational view of the syntax/phonology interface has important consequences for the understanding of linear order. Within Chomsky’s double interface conception, linear precedence is not taken to be a property of generated expressions. Nevertheless order must be assigned as part of the ‘realization’ process that ‘converts’ expressions generated by the syntactic computation into a form that external systems can use. Performed by the language faculty, this is a process of ‘linearization’. On a representational conception of phonology, by contrast, objects created by the syntactic computation are not, and in no sense become, linear. Nor is any process of linearization involved. Linearity and precedence are unambiguously temporal properties of, and only of, phonetic m-representations of expressions. Linear precedence is not necessary for interpretation at LF (as argued by, e.g., Gazdar et al. (1985), Chomsky (1995: 334)), precisely because, in the RH, it pertains, not to the object created by the syntactic computation, but quite distinctly to m-representations of it in the phonetic medium. A central feature of the conventionality of m-representation consists in the fact that linguistic representata are exclusively hierarchical, and are m-represented by events that are linear.
6.2. The nature of Representational phonology
To illustrate the broader notion of phonology which the Representational Hypothesis (RH) provides, consider how so-called head-initial and head-final languages would be treated. First, it is not syntactic expressions like heads and complements themselves that are or could be ‘initial’ or ‘final’. Relevant linear order is exhibited only by phonetically constituted m-representations of heads and complements. Notationally, it is not H and C that are ordered, but the phonetic objects ½[H] and ½[C]. This order is determined by the representational conventions that constitute the particular language. As should be clear from this simple example, the RH provides a principled basis for making a much more radical distinction between the syntactic-semantic computation on the one hand and phonological systems on the other. It is the distinction between what is represented (O) and how it is represented (R). This absolute distinction is nevertheless fully consistent with phonological systems having access to the syntactico-semantic computation. More than merely consistent, such access is conceptually necessary in representational terms: since the rationale of phonological systems is m-representational with respect to the computation, they must be thought of as having access to it – including, for example, access to head-complement relations, as just illustrated. This necessary access to the syntax is bound up with the broader character of representational phonology. What in a realizational framework is treated as the ordering of syntactically defined heads and complements themselves – and therefore as syntactic – is, in the RH, a matter of the ordering just of their phonetic m-representations. As representational, it is wholly determined by phonological systems. A representational approach thus succeeds in effecting a radical separation of phonology and syntax, while simultaneously allowing that phonological systems necessarily have access to the syntactic computation, thereby implying a broader notion of what a phonological system is.
6.3. Representational phonology and ‘stylistic’ effects
This representational approach to phonology provides a natural domain for the description of ‘stylistic’ effects such as SF and LHM. The representational conventions of a language determine the form of phonetic expressions (PFs) in that language. Many, but not all, aspects of these PFs are strictly representational. By ‘strictly’ representational, we mean that they are exploited specifically for the purpose of m-representing aspects of the wholly internal syntactic computation. In the above example, conventions governing the ordering of m-representations of heads and complements exploit linear order for the purpose of m-representing the sister relation between them and thus hierarchical structure. However, not all aspects of all PFs have such a strictly representational function, even when order is determined by conventions.18 In such a case, word order can be exploited to serve a range of other – not strictly representational – functions, for example the expression of information structure and empathy, the maintenance of ‘end weight’, or ease of parsing.
This would seem to be precisely the characterization of ‘stylistic’ phenomena – aspects of word order that are not harnessed, specifically and strictly, to the representational enterprise.

7. A Representational (phonological) account of SF in Icelandic and LHM in Breton

Having briefly outlined the main ideas of representational phonology, we show in this section how it might apply to the SF and LHM effects in Icelandic and Breton. Before dealing with these effects directly, we introduce
MLC violations: Implications for the syntax/phonology interface
certain representational conventions of Icelandic and Breton, particularly a convention that yields a ‘verb second’ effect.
7.1. ‘Verb-second’ in the Representational Hypothesis

We have suggested that the realizational approach to phonology presents a dilemma in capturing the descriptive generalization ‘verb-second’. ‘Verb’ suggests it is syntactic, not phonological, but ‘second’ suggests it is phonological, not syntactic. The distinction between a radically internal syntactic computation (O) and an external, phonetic representation (R) of that computation offers a solution. Corresponding to the two domains – the O domain and the R domain – there are two quite distinct notions of ‘position’. ‘O-position’ is a syntactic (labelled) node in hierarchical structure generated by the computation. ‘R-position’ is a precedence relation between phonetic forms within representational strings. R-position in PF strings is wholly determined by a (representational) phonological system. Under the RH, there is no question of ‘verb-second’ being stated as a syntactic constraint, that is, as an ‘O-position’ effect. ‘Second’ is simply not a concept relevant to O-position. Arguments that linear order plays no role in the syntactic computation would seem to have even more force within the radically internal view of the syntax made possible by the Representational Hypothesis. On the other hand, ordering generalizations find a natural place within the R domain. Crucially, however, it can only be representational entities that are ordered by representational conventions. Assuming ‘verb’ is an O-property, it is not verbs themselves that are ordered second, only their m-representations. Notationally: it is not the linear position of [Vfin] but of the quite distinct phonetic object ½[Vfin] that is at issue. We are dealing then not with some realizational epiphenomenon of syntax, [Vfin]/2, but with a first order representational phenomenon ½[Vfin]/2, to be captured directly and as such by a declarative representational convention.
The representational system constituted by a ‘verb-second language’ thus includes:

(25) The Representational ‘Verb Second’ Convention – ½[Vfin]/2:
The m-representation of a finite verb – ½[Vfin] – must appear as the second element in the m-representation of its clause (½[IP]).

The dilemma presented by ‘verb-second’ on realizational assumptions simply does not arise on representational assumptions. Convention (25) neither
manipulates nor even applies to any syntactic object (O); it applies to an entirely separate entity, a phonetic object (or event), which counts as ½[Vfin] only given a (phonologically constituted) representational convention. Notwithstanding this separation of phonology and syntax in the RH, the so-called ‘verb second’ effect can and must be regarded as phonological precisely because, having a representational role with respect to syntax, a phonological system must be regarded as having access to properties of what it is representational of. There is no (literal or metaphorical) movement – either syntactic in O or phonological in R – associated with the above ½[Vfin]/2 convention. Although the RH by no means eliminates the possibility of syntactic movement in O (or chains, at least) provided there is an identifiable effect on interpretation at LF, the mere fact of apparent ‘displacement’ in PF is not enough under the RH to justify postulating a syntactic operation. Nevertheless, if only for reasons of caution at this stage, we will depart as little as possible from current assumptions and will assume that finite verbs do raise from V to I for ‘narrow syntax’ (i.e. O) reasons. What we reject is the assumption that any further raising of Vfin is required simply to capture ½[Vfin]/2. On this assumption, ½[Vfin], irrespective of its R-position relative to other representational elements, is the m-representation of an O-element, [Vfin], in the hierarchical O-position defined by the syntactic node I, the position in which it is interpreted, by assumption. More generally, the RH allows us to insist that, whatever linear (R-)position a phonetic form ½[α] has, the (O-)position of its representatum (the syntactic object [α] itself) must be that in which it ([α]) is interpreted at LF.
7.2. Further representational conventions

Here we introduce two further conventions that characterise the representational system constituted by Icelandic and Breton. These are merely representational analogues of stipulations which need to be made for those languages irrespective of framework if aspects of their word order are to be accounted for. We take both languages to be ‘configurational’, ones in which order is relatively constrained. From the perspective of the RH, these properties follow from the fact that these languages, to a large extent, do exploit order for strictly representational purposes. In particular, the left-right relation in R is m-representational of (a) higher-lower hierarchical relations in O and (b) head-complement relations in O, all other things being equal. Formally, we have:
(26) Default Precedence Convention for M-Representations
a. Asymmetric c-command between elements α and β of O is m-represented by the m-representation of α (½[α]) preceding the m-representation of β (½[β]).
b. Symmetric c-command between a head α and its complement β in O is m-represented by the m-representation of α (½[α]) being adjacent to and preceding the m-representation of β (½[β]).

We express these as ‘default’ because, as we shall see, they can be overridden by other conventions, particularly by the ½[Vfin]/2 convention. In keeping with our attempt to depart minimally from current assumptions, we adhere to the assumption that in the syntactic computation itself the subject ‘moves’ from Spec-VP to Spec-IP (or equivalent), i.e. that there is a chain, the highest (head, α) member of which is in Spec-IP and the lowest (tail, β) in Spec-VP. In Icelandic and Breton, as in English, chains are generally m-represented by m-representing just their heads.

(27) Default Convention for the M-Representation of Chains:
A chain in O with head α and tail β is m-represented by ½[α].

As a result, given Convention (26) above, chains in these languages are generally m-represented in the leftmost possible position in R.
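The joint effect of conventions (26) and (27) can be illustrated with a small Python sketch. It is an expository simplification, not part of the analysis: the names `Item`, `Phrase` and `m_represent` are hypothetical, c-command is approximated by the hierarchical roles specifier, head and complement, and the English structure used as input is an invented example.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class Item:
    """A terminal of O; form is None when the item has no m-representation."""
    form: Optional[str]
    chain_tail: bool = False  # lower member of a chain, cf. convention (27)


@dataclass(frozen=True)
class Phrase:
    """A phrase of O, stored by hierarchical role only (no linear order)."""
    label: str
    head: Union["Phrase", Item]
    spec: Optional[Union["Phrase", Item]] = None
    comp: Optional[Union["Phrase", Item]] = None


def m_represent(node: Union[Phrase, Item, None]) -> List[str]:
    """Linearize O as in (26): specifier before head, head before complement."""
    if node is None:
        return []
    if isinstance(node, Item):
        # (27): only the head of a chain is m-represented.
        if node.form is None or node.chain_tail:
            return []
        return [node.form]
    return m_represent(node.spec) + m_represent(node.head) + m_represent(node.comp)


# Invented example: [IP Mary [I has] [VP <Mary> read [DP the book]]]
dp = Phrase("DP", head=Item("the"), comp=Item("book"))
vp = Phrase("VP", head=Item("read"), spec=Item("Mary", chain_tail=True), comp=dp)
ip = Phrase("IP", head=Item("has"), spec=Item("Mary"), comp=vp)

print(" ".join(m_represent(ip)))  # Mary has read the book
```

Because the structure records only roles, linear order appears nowhere in the input; it is introduced solely by the convention encoded in `m_represent`, which is the point of treating precedence as a property of R.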
7.3. ‘Verb-second’ and the SF effect in Icelandic

Bringing together much of the previous discussion, we argue that the rationale of the SF effect is to subserve ½[Vfin]/2. We call it ‘the SF effect’ because on our analysis it involves no movement, and in fact no operation as such. It is merely the linear order dictated by the interaction of ½[Vfin]/2 with other representational conventions. Consider the m-representation (28):

(28) Ef gengið er eftir Laugaveginum…
If walked is along the-Laugavegur…
‘If one walks along the Laugavegur…’

The syntactic expression m-represented by (28) is (29) (details omitted):19

(29) [CP ef [IP proarb [I er] gengið eftir Laugaveginum ] ]
By the conventions of Icelandic, proarb is not m-represented. This means that, although proarb is in the O-position Spec-IP, there is no m-representation of that O-position in (28). Given this, blind adherence to the Default Precedence Convention (26) will result in a phonetic form not fully consistent with the conventions of Icelandic.

(30) *Ef er gengið eftir Laugaveginum…

By contrast, (28) is an m-representation fully consistent with those conventions, including ½[Vfin]/2. By that convention, ½[er] must immediately follow whatever m-representation is leftmost in ½[IP]. Assuming that gengið is a directional verb, taking eftir Laugaveginum as complement, ½[gengið] must precede ½[eftir Laugaveginum], by convention (26b). Together with the fact that the subject proarb is not m-represented, this means that, apart from ½[er], no other expression is m-represented in the m-representation of er’s IP. Hence, ½[gengið] is first and ½[er] therefore follows it. On the Representational Hypothesis view, it does not follow that gengið itself is in a syntactic position higher than I, syntactically displaced there from the lower position in which it is interpreted. Neither does it involve any rightward phonological movement of ½[er] over ½[gengið] (as Poole (1997) would claim, for example). The difference between (28) and (30) pertains wholly to R, not O. It is exhaustively described by the statement that the order in (28) (½[gengið] > ½[er]) conforms to convention (25), while that in (30) (½[er] > ½[gengið]) does not and hence is ill-formed. Insofar as (30) can be parsed as the m-representation of an expression at all, despite not conforming to the conventions, it will be parsed as m-representing the same expression as that m-represented by (28). It therefore constitutes a mis-representation of it.

Consider now the example of SF in Icelandic which appeared to violate the Minimal Link Condition:

(31) Þetta er versta bók sem skrifuð hefur verið.
This is the worst book that written has been
‘This is the worst book that has (ever) been written.’

The SF effect is exclusively a phenomenon in the R domain; it is in fact a property of the representation that is not strictly representational. That is, we assume that the (R-)position of ½[skrifuð] is not strictly representational of the (O-)position of [skrifuð] itself. This follows from the fact that the sole rationale of the SF effect is to support ½[Vfin]/2. ½[Vfin]/2 is itself not strictly
representational. That is, the appearance of ½[Vfin] in 2nd position is not representational of the O-position of [Vfin] itself (I, by assumption) but is, rather, determined by the ½[Vfin]/2 convention (25), which here overrides the strictly representational Default Precedence Convention (26). Thus, irrespective of the R-position of its m-representation, skrifuð itself is in the O-position in which it is interpreted. There is no need to postulate a syntactic operation that moves the participle to a higher position simply in order to engineer a ‘verb second’ linear order. The SF phenomenon is phonological, understood within the wider conception of ‘the phonological’ provided by the Representational Hypothesis. The phonologically constituted representational conventions of Icelandic require that the O-object in (32) be represented by the string (31) above.

(32) Þetta er versta bók [CP sem [IP hefur verið skrifuð ] ]

Since the SF effect involves no movement, either syntactic or phonological, it involves no violation of the Minimal Link Condition.

(31) does pose a slightly different question, however. It is not an RH-specific question, but rather one which arises irrespective of framework. This is the question why, as discussed in Section 4, the m-representations of non-finite auxiliary verbs never participate in the SF effect.20 Why can’t ½[verið] be the first element in ½[IP], as in (33)?

(33) *Þetta er versta bók sem verið hefur skrifuð.
This is the worst book that been has written
‘This is the worst book that has (ever) been written.’

We are not aware of any wholly satisfactory explanation of the (33)/(31) contrast and have little to offer ourselves.
Nevertheless, the Representational Hypothesis at least has the following advantage over the traditional realizational approach to phonology: phonology, consisting as it does of representational conventions, is the locus of the relation between purely syntactic objects and purely phonetic objects/events – defining how the latter are deployed in the phonetic m-representation of the former. On these terms, it does not immediately follow from the (31)/(33) contrast, as it does in a realizational framework, that SF must be a syntactic phenomenon, with all of the problems that that entails. There is no problem with stating a generalization which distinguishes ‘main’ verbs from ‘auxiliaries’ within phonology, if this is seen in representational terms.21
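The contrast between (28) and (30) turns on a purely declarative check on strings, which can be sketched in Python as follows. The function name `satisfies_v2` is hypothetical, and treating the PP eftir Laugaveginum as a single element of the string is a simplifying assumption of the sketch, not a claim of the analysis.

```python
from typing import List


def satisfies_v2(ip_string: List[str], finite_verb: str) -> bool:
    """Convention (25): the finite verb's m-representation is the second element."""
    return len(ip_string) >= 2 and ip_string[1] == finite_verb


# m-representations of the IP in (28) and (30); the PP is treated as one element.
ex28 = ["gengið", "er", "eftir Laugaveginum"]
ex30 = ["er", "gengið", "eftir Laugaveginum"]

print(satisfies_v2(ex28, "er"))  # True
print(satisfies_v2(ex30, "er"))  # False
```

The check inspects only the string in R and never consults or alters the structure in O, which mirrors the claim that the difference between (28) and (30) pertains wholly to R.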
7.4. LHM in Breton

The above account of SF in Icelandic in terms of ‘verb-second’ carries over naturally into an account of LHM in Breton. However, there is an important aspect to the analysis, specific to Breton, which calls for discussion. Recall the central example:

(34) Lennet en deus Yann ___ al levr.
read 3SG has Yann the book
‘Yann read the book.’

With Schafer (1995, 1997), Borsley and Kathol (2000) and Legendre (2001), we assume that Breton is a ‘verb-second’ language. In our terms, Breton includes the Representational Verb-Second Convention (25).22 Contra Schafer (1997), however, we believe that (34) does instantiate a ½[Vfin]/2 effect. As discussed above, ‘O-position’ – that is, position in the unique, invariant syntactic computation itself – can only mean ‘the syntactically defined hierarchical position in which a syntactic element is interpreted’. In other words, O-position is ‘universal’, regardless of how it is variously m-represented in, e.g., Icelandic or Breton.23

(35) [CP [IP Yann [I en deus] [VP Yann lennet al levr ] ] ]

There is in fact a crucial difference between the two ‘languages’: as mentioned, Breton is a VSO language. SVO orders are instances of topicalization or clefting. Therefore, representing (35) using the Default Precedence Convention (26) and the Default Chain Convention (27) will result in a representation which is interpreted as containing a topicalized element (Yann).

(36) Yann en deus lennet al levr.
‘Yann read the book.’

In order to have a neutral (non-topicalized) interpretation, the representation must have ½[V] in initial position. But since Breton also contains the ½[Vfin]/2 Convention (25), the representation of the finite verb en deus cannot be clause initial.
The only m-representation of the syntactic expression (35) which satisfies ½[VSO] while satisfying ½[Vfin]/2 is one in which the m-representation of the participle, ½[lennet], is initial, followed by the m-representation of the finite verb, ½[en deus]:

(37) Lennet en deus Yann al levr.
Just as in the Icelandic case just discussed, it is crucial that there is no syntactic operation which moves the participle to a higher position in O, and therefore no potential violation of the MLC. Neither is there any need to ‘engineer’ a syntactically defined landing site (i.e., an O-position) for the participle nor to postulate any adjunction operations. Furthermore, a representational approach to the syntax/phonology interface obviates the need to postulate a ‘stylistic component’ à la Schafer (1997). The RH independently motivates a phonologically constituted representational domain in which stylistic considerations find a natural home. By the RH there is no inevitable isomorphism between syntactic properties interpretable at LF and properties of PF, as assumed in a realizational framework (in which syntactic expressions are possessed of phonological features, which must be assumed to pied-pipe with those expressions). Such ‘isomorphism’ as there might be is effected by convention – e.g., the strictly representational conventions (26) and (27) – but not all conventions give rise to such isomorphism. Presumably, a radically ‘non-configurational language’ is one in which no convention imposes such representational isomorphism. Stylistic effects are reconstructed in terms of conventions that are not ‘strictly’ representational in that they dictate that the PF m-representations of certain elements appear in certain (R-)positions regardless of the hierarchical O-position occupied by their representata (thereby overriding conventions (26) and (27) in languages that have the latter).24 Such conventions are included in, and interact with, the set of conventions that constitute a particular language, without the need for a special stylistic component.
7.5. The satisfaction of ½[Vfin]/2 and constraints on SF and LHM

Finally, it is worth observing that the above approach to the SF and LHM effects in Icelandic and Breton provides a natural account of some of the core descriptive facts about the distribution of those effects.
7.5.1. The domain of SF and LHM

At the broadest level, an account of the SF and LHM effects in terms of the ½[Vfin]/2 Convention explains why the Icelandic SF effect is evident in both main and embedded clauses, while the Breton LHM effect is a main-clause phenomenon only. Icelandic is a ‘symmetric verb-second’ language: in representational terms, the ½[Vfin]/2 convention applies in the m-representation of both main and embedded clauses. Since the SF effect subserves ½[Vfin]/2 on our analysis, the distribution is explained. Breton, by contrast, displays ½[Vfin]/2 only in m-representations of main clauses (although both main and embedded clauses are normally VSO). If the sole rationale of the LHM effect in Breton is to subserve ½[Vfin]/2, we again derive a simple explanation of why the LHM effect is observed only in main clauses.
7.5.2. Default precedence, ½[Vfin]/2, and ‘minimal disturbance’

The Default Precedence Convention (26) is a convention whereby, we claim, Icelandic and Breton harness linear order in the representation R in aid of m-representing hierarchical relations in the syntactic computation O. But (26) is a default convention: blindly conforming to (26) can result in ill-formed representations in which ½[Vfin] is in 1st position. Nevertheless, the fact that (26) can be over-ridden is not a license to m-represent anything anywhere. Hierarchical relations in O must still be m-represented to the greatest extent possible, as dictated by strictly representational conventions. That is, the Default Precedence Convention (26) still applies, excepting only that it can be over-ridden by ½[Vfin]/2. This results in what might be called a ‘minimal disturbance’ effect. That is, if other conventions (e.g., the Default Precedence Convention) determine the order given in (38a), then the ½[Vfin]/2 convention is to be satisfied by the order in (38b) rather than that in (38c), for example.

(38) a. ½[A > …B… > …C…]
b. ½[A > Vfin > …B…C…]
c. *½[B > Vfin > …A…C…]

This reconstructs, without special stipulation, Maling’s SF accessibility hierarchy discussed in Section 2, in which the presence of a ‘closer’ SF-able element prevented any SF-able element further away from being used to satisfy ½[Vfin]/2. Notice also that this ‘minimal disturbance’ effect automatically and very simply predicts that both the SF effect in Icelandic and the LHM effect in Breton will be clause-bounded.25
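One way to make ‘minimal disturbance’ concrete is sketched below in Python. The sketch rests on assumptions that go beyond the text: ‘disturbance’ is measured as the number of pairs whose relative order departs from the default order, the default string is taken to be finite-verb-initial in the abstract case (38), and the names `inversions`, `least_disturbed` and the parameter `may_be_first` are hypothetical. It also assumes that all forms in a string are distinct.

```python
from itertools import permutations
from typing import Callable, List, Optional, Sequence


def inversions(order: Sequence[str], default: Sequence[str]) -> int:
    """Count pairs whose relative order differs from the default order."""
    rank = {form: i for i, form in enumerate(default)}
    return sum(
        1
        for i in range(len(order))
        for j in range(i + 1, len(order))
        if rank[order[i]] > rank[order[j]]
    )


def least_disturbed(
    default: Sequence[str],
    finite_verb: str,
    may_be_first: Optional[Callable[[str], bool]] = None,
) -> List[str]:
    """Pick the verb-second order that departs least from the default order."""
    candidates = [
        list(p)
        for p in permutations(default)
        # Keep only strings with the finite verb second and a licit first element.
        if p[1] == finite_verb and (may_be_first is None or may_be_first(p[0]))
    ]
    return min(candidates, key=lambda p: inversions(p, default))


# Abstract case (38): the closest element A, not B, precedes Vfin.
print(least_disturbed(["Vfin", "A", "B", "C"], "Vfin"))
# ['A', 'Vfin', 'B', 'C']

# Breton (35)-(37): a neutral reading additionally requires a verb-initial string.
breton_default = ["Yann", "en deus", "lennet", "al levr"]
print(least_disturbed(breton_default, "en deus", lambda form: form == "lennet"))
# ['lennet', 'en deus', 'Yann', 'al levr']
```

The optional `may_be_first` predicate is a placeholder for whatever independently restricts the initial element; as noted in connection with (33), non-finite auxiliaries never occupy that position, and the sketch offers no account of that restriction.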
8. Implications for other Minimal Link effects

A central thesis of this paper has been that, under representational phonology, there are two different aspects to the linear order of phonetic objects
(PF). As discussed in Section 6, there are aspects of the linear order which are strictly representational – that is, they are exploited for the representation of aspects of the internal syntactic computation. For example, both Icelandic and Breton include the strictly representational Default Precedence Convention (26), which determines that many of the left-to-right asymmetries among word-level phonetic objects in R are m-representational of higher-lower hierarchical relationships in O. However, there are other aspects of the linear order in R that are not strictly representational. Under our analysis, ½[Vfin]/2 is one such aspect. It follows from this distinction among properties of the representation that linear order in representational PFs is not “isomorphic” to the syntax. The fact that an R-element appears in an R-position other than that determined by the Default Precedence Convention (and thus as ‘displaced’) is not necessarily an indication that a syntactic operation applied to its representatum.

This has important implications for the understanding of other phenomena that are relevant for the Minimal Link Condition. Superiority phenomena, for example, seemingly present a challenge to the view defended in this paper that the MLC is universal and inviolable. On the one hand, there are well-known cases which seem to obey the MLC:

(39) a. Who_i t_i bought what_j?
b. *What_j did who_i buy t_j?

The contrast in (39) is straightforwardly explained under the assumption that the MLC constrains the wh-movement/attraction operation. The wh-feature in C must attract the subject who in preference to the direct object what because who is a hierarchically closer element which contains a wh-feature.
Within representational phonology, this syntactic operation applying to who is reflected in the ordering of the m-representation of who in initial position by the Default Precedence Convention.26

On the other hand, as is well-known, Superiority also seems to be sensitive to a variety of non-syntactic ‘influences’. As noted by Pesetsky (1987), superiority effects can be obviated when the wh-phrases are ‘discourse-linked’.

(40) Which book_j did which person_i read t_j?

Under a realizational phonology, (40) would seem to be identical to (39b). The word order (seen realizationally) indicates that the direct object wh-phrases in both (39b) and (40) have moved syntactically from their base-generated positions to Spec CP. However, there is a marked contrast in grammaticality between (39b) and (40), which seems inconsistent with the claim that the MLC is inviolable.

But just as with SF and LHM, this dilemma is in fact rooted in realizational phonology, we believe. It is realizational phonology, and its attendant isomorphism, that implies that the “displacement” of which book in (40) must be the realizational effect of a syntactic operation. Representational phonology, on the other hand, provides no such compulsion. It is quite possible for which book to appear in initial position in the linear order in R without its representatum having undergone any syntactic operation, if it were there as the result of a representational convention that overrides the Default Precedence Convention (in which case its occurrence there would be not strictly representational). This raises the possibility of accounting for the (39b)/(40) contrast without “weakening” the MLC. In other words, (39b) might instantiate “true” wh-movement, and hence an MLC violation, but (40) might not.

It is beyond the scope of the present section to provide a systematic account of the (39b)/(40) contrast within representational phonology. However, it is worth noting that discourse and information-structure considerations are precisely the sort of phenomena that we might expect to be served by conventions that result in linear orders that are not strictly representational. The representational is an external domain, and as such is the locus of functional, communicative, and behavioral considerations. It would therefore be natural if discourse-based considerations were relevant there.

Some supporting evidence for the non-syntactic nature of unexpectedly grammatical MLC violations comes from Fanselow (this volume), who notes that judgements about the extent to which a language tolerates Superiority violations can be difficult to assess.
He discusses the following sentence of Dutch:

(41) Ik weet niet wat wie gekocht heeft.
I know not what who bought has

From a sample of 22 Dutch linguists consulted by Fanselow, 5 found the sentence acceptable, 7 found it questionable and 10 rejected it. If aspects of word order in certain Superiority configurations were conditioned by purely representational considerations, as we suggest, some variation is expected. In contrast to the internal syntactic computation O, which is a universal, invariant and subpersonal computation, the conventions for the representation of that computation must be learned on the basis of individual experience.
Under these acquisition conditions, a certain amount of variation in the conventional system acquired is to be expected, as it will be based on the particular data that a speaker was exposed to and on the particular inductive generalizations made by him or her. Certain speakers may have acquired a conventional system in which these superficially MLC-violating word orders are licensed by a discourse-based word order convention, but others may not have. One prediction that arises from this view is that, at least in configurational languages, apparent “violations” should always have some marked discourse or communicative function.

In summary, we believe that the facts surrounding other MLC effects, including the well-known fact that discourse considerations may obviate Superiority violations, do not necessarily threaten the claim that the MLC is universal and inviolable. Furthermore, whatever the ultimate account of sentences like (40) turns out to be, we believe that a representational, phonological approach to the discourse-based exceptions to the MLC is superior to a syntactic approach.27 If the syntactic computation is ‘informationally encapsulated’ (i.e. modular in Fodor’s and Chomsky’s terms), it would be surprising if it were sensitive to discourse distinctions. The representational domain, by contrast, is a natural one for these effects to manifest themselves in.
9. Conclusion: On the status of the MLC

In this paper, we have developed an approach to Stylistic Fronting in Icelandic and Long Head Movement in Breton which is based on a representational, rather than realizational, approach to the relation between syntax and phonology. We have argued that the traditional realizational approach gives rise to a dilemma in accounting for the properties that these constructions display. In particular, ‘realizational’ phonology strongly suggests that the SF effect in Icelandic and the LHM effect in Breton are syntactic phenomena (that is, PF realizations of syntactic operations). But the fact that these phenomena violate the Minimal Link Condition runs counter to the current interpretation of this principle as a defining part of the Move/Attract operation. Classifying these phenomena as phonological is in keeping with their stylistic character and obviates any need to weaken the MLC to allow for ‘violations’ evident at PF. However, a realizational approach to phonology provides no obvious means by which this could be accomplished, since, in a realizational phonology, effects systematically observable at PF must be either a realizational epiphenomenon of syntactic operations or be explicable in strictly
phonological/phonetic terms. The Representational Hypothesis provides an account of the distinction and relation between syntax and phonology which implies, in a conceptually principled way, a broader concept of the scope of phonology. We have argued that this broader concept of phonology is anyway needed for the description of the SF and LHM effects in Icelandic and Breton – and has the crucial advantage of being consistent with a view of the Minimal Link Condition as a defining (and thus inviolable) property of Move/Attract operations in the syntactic computation.
Appendix: A note on two further analyses of LHM effects in Breton

There is some resonance between our account of LHM in Breton in Section 7 and two accounts within other frameworks: that of Borsley and Kathol (2000) within HPSG and that of Legendre (2001) within Optimality Theory. They too see LHM in Breton as arising from a ‘verb-second’ effect. Furthermore, and more generally, their analyses both seem to depend on respecting the distinction between hierarchical properties and linear precedence properties. However, unlike the representational approach offered here, neither HPSG nor OT has an explicit account of the nature of the relation between syntax (hierarchical facts) and phonology (linear precedence facts). As a result, both analyses of the LHM effect wrongly allow PF considerations to drive assumptions about the syntax. This in turn introduces conceptual problems into their respective frameworks.

Within HPSG, the problem relates to linear order. The HPSG combinatorial system is explicitly order-free. Kathol (2000: 42) claims, for example, that “no linear relations can be deduced from the way the combinatorial ingredients are given….” However, semantic modification relationships generally do correlate with linear order in configurational verb-second languages like Breton.28 In order to filter out the many impermissible orders that their ‘order domains’ (over)generate, Borsley and Kathol are forced to introduce various processes to ensure that the correct linear order is obtained. ‘Compaction’ (p. 682-3) is one example, in which the PHON features of certain units (e.g., determiners and nouns) are arbitrarily fused together solely for linearization purposes. But this would seem to undermine what is supposedly a central feature of HPSG, namely that hierarchical properties and linear properties are strictly separated.
We believe that this problem arises because properties of the phonology (linear order) are being misanalyzed as properties of the syntax (order domains), and that this conflation is
engendered by the lack of an account within HPSG of the relation that phonology bears to syntax.

Within OT, the problem is not to do with linear order per se, but rather with the extent to which syntactic structure is taken by the theory to be determined by PF factors. Legendre, following Grimshaw (1993), assumes that MinimalProjection (MINPROJ) is one of the constraining filters on candidate inputs. The effect of MINPROJ is that “only as much structure is built as is required by the input” (p. 252). But properties of the syntactic structure are being determined by properties of PF, as is clear from her claim that MINPROJ entails that “if null subjects do not exist … then a null subject clause is simply a Vv” (ibid). As far as we can tell, the only reason for picking out null subjects in this context is the fact that nothing at PF corresponds to them. Although Legendre might be technically correct to say (p. 255) that “nothing hinges on this assumption” (in the sense that the assumption has no empirical impact on her analysis of Breton), the claim does raise serious and fundamental questions about the nature of syntax and the grammar of Breton within Optimality Theory. The geometric relations that presumably express syntactic relations (such as agreement, Case assignment, binding, θ-role assignment, etc.) will necessarily be radically different for different inputs if they have the radically different structure entailed by MINPROJ. A crucial question then arises as to how syntactic relations are expressed in the OT grammar of Breton and whether any predicted differences are confirmed by the data. This is surely not a desirable result. Just as with HPSG, we argue that an ‘output’ property (phonetic existence) is being conflated with an ‘input’ property (syntactic existence) because of the lack of a clear account of the nature of both the distinction and the relation between syntax and phonology.
It is precisely this distinction and relation that the Representational Hypothesis seeks to address.
Acknowledgements
Thanks to the audience at the Universität Potsdam and two anonymous reviewers for many helpful comments. This research was supported by project grant F/00125A from the Leverhulme Trust.
Notes
1. See, among many others, Maling (1990), Poole (1996) and Holmberg (2000) on Stylistic Fronting and Borsley, Rivero, and Stephens (1996), Schafer (1997) and Borsley and Kathol (2000) on Long Head Movement in Breton.
2. Some researchers, e.g. Chomsky (1999), have suggested that there might be ‘displacement’ operations which take place as part of the phonological computation. We believe that this in fact reflects an unease about the ‘realizational’ approach to the syntax/phonology interface. See section 5 and note 13.
3. (5) is from Thráinsson (1993).
4. Participles and particles are hierarchically equal in that, when they co-occur, either may undergo SF (provided no other SF-able element is present). This fact presents problems for attempts to explain the hierarchy in terms of phrase-structural superiority.
5. The exception to the generalization that a finite verb may not be sentence-initial consists of the copula when it takes a PP or progressive complement. For discussion see Borsley and Roberts (1996) and BRS.
6. For discussion of the implications of this division for the traditional ‘double interface’ property of language (Chomsky 1995: 2), see Burton-Roberts and Poole (2003; in prep).
7. For further discussion of Poole’s (1997) approach, see Burton-Roberts and Poole (in press).
8. Borsley does have a more recent analysis (Borsley and Kathol (2000)) within the framework of HPSG. See the Appendix for some discussion.
9. In addition to the problems created by the insertion of duplicate lexical items, there also would seem to be problems with the ‘deletion’ mechanism that Schafer postulates. She claims that what forces deletion of one of the ‘duplicate’ elements in LHM constructions is Full Interpretation applying at PF. However, Full Interpretation merely requires that every element present at a given interface (whether LF or PF) be interpreted (see, e.g., Chomsky 1995: 27). It is not clear how this principle forces deletion of one of the duplicate elements in the way that Schafer suggests.
10. We also believe that Schafer’s proposal faces at least one crucial empirical problem. In order to account for the clause-boundedness of LHM, Schafer introduces the notion ‘predicate-indexing’, by which “tense and aspect markers are interpreted with respect to the state or event named in their clause” (p. 194). ‘Clause’ would seem to have to mean ‘CP’ in order to correctly block ‘long-distance’ LHM, otherwise the higher predicate will not receive the undesired ‘extra’ index from the participle in the higher C. However, this would seem to incorrectly predict that LHM should be possible in modal and ‘bridge verb’ contexts. These structures have no embedded CP, and we therefore expect that the LHM’ed participle should be able to be indexed with the lower predicate.
11. Schafer (1997), however, explicitly denies that LHM in Breton can be interpreted as a reflex of the verb-second effect. The central fact for her claim, however, is
12.
13.
14. 15.
16.
17. 18.
19.
20. 21.
that LHM moves X°s, and therefore cannot be moving to Spec CP. This objection cannot be raised against the view which we will develop. See below for further discussion. See Burton-Roberts and Poole (in press) for further discussion. Note that frameworks in which phonological features are not present in the syntax but still assigned to syntactic expressions (e.g., Distributed Morphology) will also have this property. See also footnote 13. The ‘double interface’ property of language is part of the Saussurian legacy within generative grammar. A lexical item contains both formal and phonological features in the same way that the sign contains both signifié and signifiant. As will become clear below in Section 6, however, ours is a radically anti-Saussurian proposal, in which a sign is completely disassociated from what it is a sign of. As Schafer notes (p. 178), others, particularly Chomsky (1995), have suggested that there are operations which derive ‘rearrangements’ which do not arise from the checking of morphological features. Schafer lists topic-focus structures, extraposition, and VP-adjunction, among others. However, this is to virtually concede that the strictly realizational view of the syntax/phonology interface cannot be maintained and that a broader notion of phonology is called for. Tellingly, Chomsky (1995: 221) writes “If humans could communicate by telepathy, there would be no need for a phonological component”. This entails that no aspect of the computation is acquired or learned. It is wholly innate, biologically determined and invariant, across individuals of the species and across stages in their maturation (see Burton-Roberts (2000) and Burton-Roberts and Poole (in press) for further discussion). Our view is thus in contrast to Chomsky, who consistently uses ‘representation’ in a non-relational fashion (see Chomsky 1995: 135). For Chomsky, a PF representation is not a representation of anything.
It simply is a structure created during the course of the overt syntactic and the phonological computations. Burton-Roberts (2000) calls them Conventional Systems for the Phonetic Representation of L – CSPR(L)s. So-called ‘non-configurational’ languages are ones that comprehensively fail to exploit word order for the strictly representational purpose of m-representing syntactic hierarchy and relations. We put aside here the distinction made in Burton-Roberts and Poole (in press) between ‘lexical items’ (O-elements) and ‘vocabulary items’ (R-elements). The lexical items of O, not having phonological features of any kind, are not words of Icelandic, Breton, or any other language. However, we will illustrate a syntactic object O using words from the language in which it is subsequently m-represented, with the understanding that this is simply because of the need to physically represent the syntactic object O on the page. Recall that Holmberg adduced this fact as a conclusive argument for SF as a syntactic operation. See Burton-Roberts and Poole (in press) for a suggestion. SF has other properties besides the two we mention here. See Burton-Roberts and Poole for further
discussion as to how they might be accounted for within the framework presented here.
22. Unlike Icelandic, however, Breton is not a ‘symmetric’ verb-second language, so the ½[Vfin]/2 convention holds only in the m-representation of main clauses.
23. A reminder that elements of O are not language-specific. See footnote 19.
24. This is precisely what Holmberg attempts to achieve by allowing the syntactic computation to move phonological features independently of the formal/semantic features of the expressions they belong to. For the paradoxes and problems posed by Holmberg’s proposal for the very idea of realisational phonology, see Burton-Roberts and Poole (2003; in prep).
25. One issue which we have not addressed in this paper is the question of variation among Representational systems (i.e., “languages”). As is well-known, SF is not exhibited by the mainland Scandinavian languages, while older versions of some Romance languages have been claimed to exhibit an SF or LHM-like process (for an overview see Fischer and Alexiadou (2001)). Given our claim that SF/LHM is a reflex of a representational verb-second convention, the explanation for the existence of SF/LHM is not, on our view, to be sought in the synchronic syntax, as would be the case under a realizational approach.
26. In fact, since the Representational Hypothesis claims that the syntax is wholly covert and oriented towards LF, we assume that both wh-elements must have moved in the syntax to the matrix Spec CP in (39). We assume that the representational conventions of English dictate that the first chain created by movement to Spec CP is represented by m-representing the head, but that subsequent chains are represented by m-representing the tail. This correctly rules out (39b), because the wh-chain headed by the m-representation of what cannot have been the first to move to Spec CP without violating the MLC.
27. Pesetsky proposes, for example, that discourse considerations determine what kind of syntactic scope-assignment operations apply (wh-movement for non-discourse-linked elements, simple indexation for discourse-linked ones).
28. This is captured within the Representational Hypothesis by the Default Linearity Convention (26).
References
Anderson, Stephen
1993 Wackernagel’s revenge: Clitics, morphology, and the syntax of second position. Language 69: 68–98.
Besten, Hans den
1981 Government, syntaktische Struktur und Kasus. In M. Kohrt and J. Lenerz (eds.), Sprache: Formen und Strukturen, Akten des 15. Linguistischen Kolloquiums Münster 1980 (1). Tübingen: Verlag, 97–107.
Borsley, Robert and Andreas Kathol
2000 Breton as a V2 language. Linguistics 38 (4): 665–710.
Borsley, Robert and Ian Roberts
1996a Introduction. In Borsley and Roberts (eds.), 1–52.
1996b The Syntax of the Celtic Languages. Cambridge: Cambridge University Press.
Borsley, Robert and Janig Stephens
1989 Agreement and the position of subjects. Natural Language and Linguistic Theory 7: 407–427.
Borsley, Robert, Maria-Luisa Rivero and Janig Stephens
1996 Long Head Movement in Breton. In Borsley and Roberts (eds.), 53–74.
Burton-Roberts, Noel
2000 Where and what is phonology? In Noel Burton-Roberts et al. (eds.), 38–66.
Burton-Roberts, Noel and Geoffrey Poole
2003 Syntax, sound and minimalist goals: The case of Icelandic Stylistic Fronting. Paper presented at the 26th GLOW Colloquium, Lund, Sweden.
in press Syntax vs. phonology. To appear in Phonological Knowledge: Perspectives from Phonology and from Syntax, Patrick Honeybone and Ricardo Bermúdez-Otero (eds.), Lingua special issue.
in prep The paradox of realizational phonology, ms., University of Newcastle upon Tyne.
Burton-Roberts, Noel, Philip Carr, and Gerard Docherty
2000 Phonological Knowledge. Oxford: Oxford University Press.
Carr, Philip
2000 Scientific realism, sociophonetic variation, and innate endowments in phonology. In N. Burton-Roberts et al. (eds.), 67–104.
Chomsky, Noam
1991 Some notes on Economy of Derivation and Representation. In Principles and Parameters in Comparative Grammar, Robert Freidin (ed.), Cambridge, Mass.: MIT Press.
1995 The Minimalist Program. Cambridge, Mass.: MIT Press.
1999 Derivation by phase. MIT Occasional Papers in Linguistics 18, Cambridge, Mass.: MITWPL, Department of Linguistics and Philosophy, MIT.
Fischer, Suzanne and Artemis Alexiadou
2000 On Stylistic Fronting: Scandinavian vs. Romance. Working Papers in Scandinavian Syntax 68: 117–145.
Gazdar, Gerald, Ewan Klein, Geoffrey Pullum and Ivan Sag
1985 Generalised Phrase Structure Grammar. Oxford: Blackwell.
Grimshaw, Jane
1993 Minimal Projection, heads and optimality, ms., Rutgers University, New Brunswick, N.J.
Hayes, Bruce
1995 Metrical Stress Theory: Principles and Case Studies. Chicago: University of Chicago Press.
Holmberg, Anders
2000 Scandinavian Stylistic Fronting: How any category can become an expletive. Linguistic Inquiry 31: 445–483.
Jónsson, Jóhannes G.
1991 Stylistic Fronting in Icelandic. Working Papers in Scandinavian Syntax 48: 1–44.
Kathol, Andreas
2000 Linear Syntax. Oxford: Oxford University Press.
Legendre, Géraldine
2001 Masked second-position effects and the linearization of functional features. In Optimality-Theoretic Syntax, Géraldine Legendre, Jane Grimshaw and Sten Vikner (eds.), 241–275. Cambridge, Mass.: MIT Press.
Maling, Joan
1990 Inversion in embedded clauses. In Maling and Zaenen (eds.), 71–91.
Maling, Joan and Annie Zaenen
1990 Syntax and Semantics vol. 24: Modern Icelandic Syntax. New York: Academic Press.
Pesetsky, David
1987 Wh-in-situ: Movement and unselective binding. In The Representation of (In)definiteness, Eric Reuland and Alice ter Meulen (eds.), 98–129. Cambridge, Mass.: MIT Press.
Poole, Geoffrey
1996 Optional movement in the Minimalist Program. In Minimal Ideas, Werner Abraham, Samuel David Epstein, Höskuldur Thráinsson, and C. Jan-Wouter Zwart (eds.), 199–216. Amsterdam: John Benjamins Publishing Company.
1997 Stylistic Fronting in Icelandic: A case study in prosodic X°-movement. In Newcastle/Durham Working Papers in Linguistics 4, Philip Carr and William McClure (eds.), 249–285. Newcastle/Durham: Centre for Research in Linguistics, University of Newcastle upon Tyne and Department of Linguistics and English Language, University of Durham.
Roberts, Ian
1994 Two types of head movement in Romance. In Verb Movement, David Lightfoot and Norbert Hornstein (eds.), 207–42. Cambridge: Cambridge University Press.
Rögnvaldsson, Eiríkur and Höskuldur Thráinsson
1990 On Icelandic word order once more. In Maling and Zaenen (eds.), 3–40.
Schafer, Robin
1995 Negation and verb second in Breton. Natural Language and Linguistic Theory 13: 429–471.
1997 Long Head Movement and information packaging in Breton. Canadian Journal of Linguistics 4: 169–203.
Thráinsson, Höskuldur
1993 On the structure of infinitival complements. In Harvard Working Papers in Linguistics 3, Höskuldur Thráinsson, Samuel D. Epstein, and Susumu Kuno (eds.), 181–214. Cambridge, Mass.: Department of Linguistics, Harvard University.
Ergativity, Case and the Minimal Link Condition
Arthur Stepanov
1. The issue
In a number of ergative languages, such as Hindi, objects of transitives (and subjects of some or all intransitives) bear a morphologically unmarked case, often termed Absolutive, whereas subjects of transitives (and some or all intransitives) are assigned Ergative case. This is illustrated below (from Mahajan 1990):
(1) raam-ne roTii khaayii thii.
    Ram(m)-erg bread(f.) eat(perf.f.) be(past.f.)
    ‘Ram had eaten bread.’
I follow the widespread view that Absolutive is a structural Case. For Hindi, I also adopt the assumption, extensively motivated in a number of works, that Absolutive is parallel to Nominative, in that it is assigned in a similar syntactic environment, namely, by Infl/T(ense) (Bittner and Hale 1996a, 1996b; Bok-Bennema 1991; Campana 1992; Johns 1992; Murasugi 1992, 1995; Nash 1996; Phillips 1995; Woolford 1997). Under the currently held minimalist assumption that structural Case is a reflex of agreement (Chomsky 2001), the view that in Hindi the object, but not the subject, is assigned Nominative is supported by the fact that Hindi shows object agreement, but not subject agreement, in (1).1 Subject agreement obtains in the nominative-accusative pattern in Hindi (Hindi is a ‘split’ ergative language, which shows ergativity only in certain contexts; see, e.g., Mahajan 1990). Even the highest auxiliary enters agreement in both (1) and (2), which suggests that the licensor of agreement is high enough in the structure, namely, T:
(2) raam roTii khaataa thaa.
    Ram(m) bread(f.) eat(imp.m.) be(past.m.)
    ‘Ram (habitually) ate bread.’
In transitive sentences, the ergative subject c-commands the object, as in ‘accusative’ languages like English (see, e.g., Anderson 1976; Bobaljik 1993; Mahajan 2000; Nash 1996). For instance, an ergative subject can bind into the object, as shown in the following Hindi example (from Mahajan 2000):
(3) Salmaa-ne apne ghar kaa niriikshan kiyaa.
    Salma-erg. self’s house gen examination do(perf.m.)
    ‘Salma examined her house.’
Given that the subject c-commands the object in ergative languages, and that T is the locus of Nominative (see Chomsky 1995, 2000, also Pesetsky and Torrego 2001), one is led to suppose that the Nominative dependency is established across the ergative subject, even though the latter is the closest candidate for this dependency. In other words, Nominative is assigned in apparent violation of the MLC: there is a closer ‘goal’ for establishing the dependency with Tense, namely, the subject.
(4) T … SubjERG … ObjNOM
    └─────────────────┘
In Hindi, the dependency is likely not to involve overt movement (see Section 4). It can then be stated in terms of pure Attract, or Agree (Chomsky 2000). In some ‘syntactically ergative’ languages, however, the dependency involves overt movement to Spec-TP. This is suggested, in particular, in Bittner and Hale (1996b) for Dyirbal. This paper focuses on the mechanism of Case assignment in transitive clauses in ergative languages under the configuration in (4). Its main claim is that the derivation of transitive ergative sentences proceeds in a manner that actually does not violate the MLC, even though the final representation in (4) gives an illusion of such a violation. I will then motivate this proposal and discuss its consequences. Before I begin to discuss the details of the proposal, I want to clarify a number of additional background assumptions concerning the structure of ergative languages. Chomsky (1995: 186) explicitly rules out object raising across subject in languages like English, in the system in which structural Case is checked by (covert or overt) raising to the domain of Agr projections.
(5) *AgrS … AgrO SubjACC … ObjNOM
      │      └─────┘          │
      └───────────────────────┘
Given that Accusative is checked in AgrO, and Nominative in the higher AgrS, Chomsky’s system universally precludes the ‘nesting’ pattern of checking in (5). Rather, both Cases must be checked in a ‘crossing’ manner. Similarly, in the Agr-less vP framework, it is not possible for the object to check the Case of Tense (Absolutive), because of the MLC considerations: the subject, having checked Accusative, would function as an intervenor for such checking (see Chomsky 1995, 2000, 2001 for discussion). Chomsky’s (1995) theory obviously requires a different approach to ergativity which does not imply the ‘nesting’ pattern. Indeed, Chomsky (1995: 176) does not assume that Absolutive is assigned by Infl/AgrS. This view is encoded in Chomsky’s suggestion concerning the ‘activeness’ of a particular Agr. Following Bobaljik (1993) (see also Laka 1993), Chomsky suggests that the parameter distinguishing ergative and accusative languages dictates which Agr is always present, or ‘active’, even in clauses involving intransitive verbs, in the respective language types. In ‘accusative’ languages, AgrS is active (so Nominative is always assigned with all predicate types). In ergative languages, AgrO is active. That is, both the subjects of intransitives and the objects of transitives receive Absolutive in Spec-AgrO, in the latter case instantiating the legitimate ‘crossing’ pattern. Under this approach, the MLC problem does not arise. However, there are independent reasons to doubt the correctness of this approach. For one thing, at least for languages like Hindi, the licensor of Object Case is higher than a licensor of Accusative would be (see above). Furthermore, both Absolutive and Nominative are often morphologically unrealized Cases, whereas ergative and accusative Cases are usually marked with overt morphology (cf. Dixon 1994). Additionally, there is a strong conceptual reason for not adopting the Agr parameter. 
As pointed out by Nash (1996), the absence/inactiveness of AgrO in intransitive clauses in accusative languages – and the corresponding ‘activeness’ of AgrS – can, in reality, be predicted from the lexico-selectional properties of a given predicate. For instance, the fact that the unaccusative verb come does not instantiate AgrO follows from the information encoded in the verb itself (viz. selection). This suggests that the first ‘half’ of the parameter is not really parametric in the usual sense, but rather ‘principled’. At the same time, the systematic behavior of AgrO leaves one with a need for a similar explanation of the absence/inactiveness of AgrS in ergative languages. But that does not seem to follow from anything, and, with the possibility of parameterizing no longer available, remains stipulative. In addition, in a framework without separate Agr projections, it becomes even more difficult to state the difference between ergative and accusative languages as a parameter on the ‘activeness’ of a Case assigning
head (both Case licensing heads, T and v, are ‘active’ for independent reasons such as their semantic content). Consequently, I will not assume such a parameter. A number of authors (Nash 1996, Bok-Bennema 1991, Alexiadou 2001, Mahajan 1994, among others) explore in various forms the hypothesis that the structure of verbal projection in ergative languages is unaccusative. This approach makes ergative languages compatible with the hypothesis that Absolutive, like Nominative, is assigned by Infl/Tense/AgrS. In unaccusative structures, the single argument of an unaccusative verb checks Nominative in the domain of Tense. For the present purposes, I will adopt the unaccusative hypothesis. The unaccusative proposal raises an obvious concern as to how the Case of the ergative subject in transitive clauses is licensed, given that an unaccusative structure usually gives no possibility for checking the second structural Case.2 The key to a solution lies in the idea that Ergative is an inherent Case, that is, its licensing is not dependent on particular structural configurations, unlike structural Cases like accusative. Strong empirical and conceptual support in favor of this idea has been put forth by Nash (1996), Oyharçabal (1992), Woolford (1997), Alexiadou (2001), among others. In particular, Woolford (1997) points out an apparent gap in the inventory of lexical/inherent Cases available in UG: there is no lexical case associated with an external theta-role (e.g. agent). Ergative, the case normally assigned to agents in ergative languages, naturally fills in the gap. Cross-linguistically, ergative markers are usually the same markers used to indicate, e.g., instrumentals, locatives, and possessors (cf. e.g. Dixon 1994). This suggests that ergative markers may be associated with theta-roles, a natural state of affairs if Ergative is an inherent Case.
Woolford (1997) hints at further distributional similarities between ergative Case NPs and those marked with other inherent Cases, to some of which I will return in Section 6. The hypothesis that Ergative is an inherent Case allows one to reconcile the Nominative on the object with the hypothesis that Tense is a Nominative licensor. However, it still does not resolve the MLC problem.3 The approach that I would like to undertake is that, despite the appearance, the ergative subject actually does not intervene between Tense and the object for the purposes of Case assignment. A solution along these lines requires a more fine-grained look at the derivation of an ergative transitive sentence. In order to make the guiding idea behind the proposed account clear, observe first a similar kind of apparent violation of the MLC in a completely unrelated construction type, namely, raising across the experiencer in English (Chomsky 1995):
(6) John_i T seems to Mary t_i to be smart.
In (6) John raises into the checking domain of T to check Nominative, whereas a closer candidate, Mary, assigned an inherent case, is available; hence it should block raising of John, where closeness is defined in terms of c-command. That the experiencer c-commands into the lower clause, despite its embedding inside the PP, can be seen, e.g., from a Condition C violation in (7):
(7) *John seems to her_i to like Mary_i
What is common between the apparent violations of MLC in ergative languages like Hindi and the raising construction in (6) is that in both cases, raising takes place across an inherently Case marked NP. This suggests that the apparent obviation of the MLC effect in ergative languages is due precisely to the inherent nature of Case of the intervenor – ergative subject. This is the intuition I would like to explore. Chomsky (1995), (2000), McGinnis (1998b), who discuss raising across the experiencer and other relevant cases, suggest that items marked for lexical/inherent Case do not count as intervenors, for the purposes of the MLC: inherent Case is ‘inactive’ in this sense.4 I believe the suggestion is also applicable to the Case assignment pattern in ergative languages, under the above assumptions, thus removing the MLC problem. The ergative subject has inherent Case, hence is ‘invisible’ for establishing a dependency between the object and T. This is of course only the beginning of a solution. One would like to know what is special about inherent Case in constructions at issue, which makes it ‘inactive’, or ‘transparent’ for attraction across the element bearing it. However, the reason why inherent Case makes the element ‘transparent’ has so far remained unclear.5 I suggest that inherently Case marked ergative subjects, as well as experiencers in English, are not Merged into the structure until after the dependency between Tense and object has been established. More precisely, I suggest that inherently Case marked NPs are Merged post-cyclically.
Consequently, at the time Nominative is assigned by Tense, there is no intervention effect simply because the potential intervenor – ergative subject – is not yet introduced into the structure. At the same time, when the ergative subject is Merged into the structure, the Tense-object dependency is no longer active,
as the object has already checked its features against Tense. On the surface, postcyclic insertion of the ergative subject creates an illusion of violating the MLC, whereas in reality there is none. The rest of the paper is devoted to building an argumentation in support of the ‘late Merger’ proposal. In the course of the argumentation, I also refine the structure of transitive clauses in ergative languages.
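The timing argument can be made concrete with a small sketch. The Python below is an illustration only, under simplifying assumptions introduced here for exposition: the c-command domain of T is flattened to a list ordered from closest to farthest, Case licensing is reduced to filling in a string, and the names `NP`, `closest_goal`, `cyclic_derivation` and `late_merger_derivation` are hypothetical labels, not part of the proposal itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NP:
    name: str
    case: Optional[str] = None  # filled in once Case is licensed


def closest_goal(domain: List[NP]) -> Optional[NP]:
    """Simplified MLC: a probe relates only to the closest NP it c-commands.

    `domain` lists the NPs c-commanded by the probe, closest first.
    """
    return domain[0] if domain else None


def cyclic_derivation() -> Optional[NP]:
    """The ergative subject is already in the structure when T probes."""
    subj = NP("Subj", case="ERG")  # inherent Case (assumed to come with the theta-role)
    obj = NP("Obj")
    domain = [subj, obj]           # T > Subj > Obj
    return closest_goal(domain)    # the subject is closer than the object


def late_merger_derivation() -> List[NP]:
    """The ergative subject is Merged only after T has licensed the object."""
    obj = NP("Obj")
    domain = [obj]                 # T > Obj: no potential intervenor yet
    goal = closest_goal(domain)
    assert goal is not None
    goal.case = "NOM"              # Tense-object dependency established
    subj = NP("Subj", case="ERG")
    domain.insert(0, subj)         # post-cyclic Merger above the object
    return domain                  # final order: T > Subj(ERG) > Obj(NOM)
```

In the first derivation the closest goal for T is the subject, so a Tense-object dependency would cross an intervenor. In the second, the object is the only candidate at the point where T probes, and the final representation nevertheless has the shape of (4).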
2. Late merger of adjuncts
2.1. The algorithm
In Stepanov (2001b), (2002), I argue that one class of syntactic objects must enter the structure post-cyclically, namely, XPs that are Merged by adjunction. In contrast, XPs that are Merged by substitution must enter the structure cyclically. This theory formalizes a stronger version of the thesis previously advanced in Lebeaux (1988). I will outline the basic idea of the algorithm of structure building proposed in these works, and the major points of its formal implementation. The theory utilizes the strong form of Chomsky’s (2000) economy condition of ‘Least Tampering’, according to which structure building operations tend to preserve basic relations inside the existing structure as much as possible. The relevant relation is assumed to be c-command. I restate Least Tampering as in (8), with the definition of c-command and domination in (9) and (10), respectively:
(8) Least Tampering (modified from Chomsky’s 2000 version)
    Given a choice of operations applying to a syntactic object labeled α, select one that does not change @(α).
    @(X) – a set of c-command relations in a syntactic object labeled X.
(9) α c-commands β iff neither α nor β dominates the other and the first branching node that dominates α dominates β.
(10) α is dominated by β only if it is dominated by every segment of β.
Suppose a derivational space contains a phrase marker labelled β, formed by Merging α and non-minimal X, so β is either α (α a head) or X (α a specifier) depending on which one projects.
(11) a.   β
         / \
        α   X
     b. @(β) = {⟨α,X⟩, ⟨X,α⟩, …}
A substitution Merger of some γ with β (again, γ a head or a specifier) creates a new object; let us label it δ. This Merger adds new c-command relations, those involving the Merged element γ. At the same time, this Merger does not change the set of c-command relations inside the (existing) object labelled β: this set remains the one in (12b):
(12) a.   δ
         / \
        γ   β
           / \
          α   X
     b. @(β) = {⟨α,X⟩, ⟨X,α⟩, …}
Further substitutions would take place along the same pattern. That is, substitution Merger of heads or specifiers at the root is in accord with the conception of Least Tampering in (8). Suppose now that γ is Merged acyclically to X in (11a), resulting in the object in (13a) (for ease of exposition, take γ to be a head). The set of c-command relations now is shown in (13b):
(13) a.   β
         / \
        α   γmax
            / \
           γ   X
     b. @(β) = {⟨α,γmax⟩, ⟨γmax,α⟩, ⟨α,γ⟩, ⟨γ,X⟩, ⟨X,γ⟩, …}
The object in (13a) is still labeled β, but the set of c-command relations in it has changed, as can be seen from comparison of (13b) with (11b). New c-command relations – those involving γ – have been added in the set. A similar situation arises if γ has been Merged by adjunction. Thus, the derivation reaching the stage in (11) can compare the option of Merging something at the root, or not at the root, and determine that the option of Merging at the root leads to preservation of the existing set of c-command relations.6 Consider now an adjunction Merger of γ with β in (11a). This adjunction creates a segmented object, as in (14a). Note that inside this newly created segmented object, the set of c-command relations remains as in (11b): by definitions in (9) and (10), neither γ nor ⟨β,β⟩ c-commands the elements of the other (the “the first branching node…” clause of (9) does not apply), so c-command between γ and ⟨β,β⟩ is undefined.
(14) a.   β
         / \
        γ   β
           / \
          α   X
     b. @(⟨β,β⟩) = {⟨α,X⟩, ⟨X,α⟩, …} = @(β)
Nothing in principle precludes adjunction at the point in (14a). In fact, further adjunctions may take place, creating more than two segments of β: @(⟨β,β⟩) would still not change. Such cyclic adjunctions are allowed if no further structure building by substitution is to take place. If further substitution Merger of some ε takes place, a new object, labelled δ, is created:
(15)      δ
         / \
        ε   β
           / \
          γ   β
             / \
            α   X
Merger of ε adds new relations inside the newly created δ. Crucially, this Merger also changes the set of c-command relations inside the existing segmented object ⟨β,β⟩: by definitions in (9) and (10), γ now c-commands the elements of ⟨β,β⟩, and ⟨β,β⟩ c-commands γ (the “the first branching node…” clause of (9) now applies).
(16) @(⟨β,β⟩) = {⟨α,X⟩, ⟨X,α⟩, ⟨γ,α⟩, ⟨γ,X⟩, …}
This situation arises because an instance of adjunction was followed by a (cyclic) instance of substitution. Thus, even though a cyclic adjunction is in itself a permissible derivational step, the next cyclic substitution induces tampering inside the existing structure, by redefining the set of c-command relations in it. The system tends to avoid this kind of ‘tampering’ by not resorting to adjunction (when possible). Thus, given a choice of items to be Merged by substitution or adjunction, the system chooses to use substitution, as a more ‘economical’ choice.7 Upon the end of the cycle, structural adjuncts (that is, objects that must enter the structure by adjunction) are inserted inside the existing structure
(in cases of multiple adjunctions, the system imposes no particular ordering of adjunction). At this stage, even though adjunctions induce tampering inside the existing structure, they are still allowed, in order to ensure that the derivation converges: the system no longer has a choice, and considerations of economy do not arise. I also assume, on natural grounds, that adjuncts nevertheless must enter the structure in overt syntax, given that their placement has visible PF effects.
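The comparison between Merger at the root and acyclic Merger in (11)–(13) can be checked mechanically. The following Python sketch is illustrative only and rests on assumptions introduced here: trees are strictly binary, node labels are unique strings (α, β, γ, δ for the objects in (11)–(13)), and segmented categories created by adjunction are not modelled, so only definition (9) is implemented, not (10). The names `Node`, `descendants` and `c_command_set` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Set, Tuple


@dataclass(eq=False)
class Node:
    label: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def descendants(node: Node) -> Iterator[Node]:
    """Yield every node properly dominated by `node`."""
    for child in (node.left, node.right):
        if child is not None:
            yield child
            yield from descendants(child)


def c_command_set(root: Node) -> Set[Tuple[str, str]]:
    """Return @(root): all pairs (a, b) such that a c-commands b inside root.

    In a strictly binary tree the first branching node dominating a daughter
    is its mother, so each daughter c-commands its sister and everything the
    sister dominates.
    """
    relations: Set[Tuple[str, str]] = set()
    for mother in [root, *descendants(root)]:
        if mother.left is None or mother.right is None:
            continue  # terminal node: nothing to relate
        for a, sister in ((mother.left, mother.right), (mother.right, mother.left)):
            relations.add((a.label, sister.label))
            for b in descendants(sister):
                relations.add((a.label, b.label))
    return relations


# (11a): β immediately dominates α and X
beta = Node("β", Node("α"), Node("X"))
before = c_command_set(beta)

# (12a): γ Merged at the root; β itself is untouched
delta = Node("δ", Node("γ"), beta)
after_root = c_command_set(beta)

# (13a): γ Merged acyclically with X inside β
beta_acyclic = Node("β", Node("α"), Node("γmax", Node("γ"), Node("X")))
after_acyclic = c_command_set(beta_acyclic)
```

Under these assumptions `before` and `after_root` are the same set, whereas `after_acyclic` contains additional pairs involving γ, which is the sense in which only root Merger leaves @(β) unchanged.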
2.2. Inherently Case marked NPs as structural adjuncts
The view I would like to pursue here is that the class of objects that are merged by adjunction (hence, postcyclically) is not equivalent to the class of theta-theoretic adjuncts (see also Chametzky 2000). This follows given that the mode of Merger need not be, contrary to what is sometimes assumed (for instance, in connection with the Projection Principle in the GB era), regulated by theta-theoretic properties of the syntactic object to be Merged, that is, by the distinction between theta-theoretic arguments and non-argument modifiers of a head (= adjuncts). It is reasonable to suppose in this respect that the theta theory, arguably part of the semantic component, cannot impose its requirements on the narrow syntactic mechanism of structure building. Rather, the factor dictating a particular mode of syntactic Merger is most naturally expected to belong to narrow syntax itself. Divorcing the theta-theoretic distinction from the substitution/adjunction distinction yields four logical possibilities: both theta-theoretic arguments and theta-theoretic adjuncts can in principle be Merged by substitution or adjunction (the theta-theoretic and structure building status of syntactic objects may coincide, but does not have to). Furthermore, this move sharpens the need for a formal criterion determining whether each instance of Merge is a substitution or an adjunction. Stepanov (2001b) formulates this criterion in terms of uninterpretable feature(s), and defines adjunction and substitution as ‘mirror images’ of each other:
(17) A non-projecting syntactic object α is Merged with a syntactic object β by adjunction iff the label of α contains no active (‘unchecked’) uninterpretable feature(s).
(18) A non-projecting syntactic object α is Merged with a syntactic object β by substitution iff the label of α contains active (‘unchecked’) uninterpretable feature(s).
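The criterion in (17) and (18) amounts to a single test on the label of the non-projecting object. The sketch below, in Python, is a minimal illustration under assumptions made here for exposition: a label is reduced to a name plus a set of active uninterpretable features, the feature names `"Case"` and `"wh"` and the feature sets assigned to the sample phrases are illustrative, and `Label`, `merge_mode` and `schedule` are hypothetical names.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Label:
    name: str
    active_uninterpretable: FrozenSet[str] = frozenset()


def merge_mode(label: Label) -> str:
    """(17)/(18): substitution iff the label carries an active uninterpretable feature."""
    return "substitution" if label.active_uninterpretable else "adjunction"


def schedule(items: List[Label]) -> Tuple[List[str], List[str]]:
    """Split items into those Merged in the cycle and those Merged post-cyclically."""
    cyclic = [i.name for i in items if merge_mode(i) == "substitution"]
    postcyclic = [i.name for i in items if merge_mode(i) == "adjunction"]
    return cyclic, postcyclic


items = [
    Label("John", frozenset({"Case"})),  # structural Case still unchecked (assumed)
    Label("how", frozenset({"wh"})),     # wh-feature on the label (assumed)
    Label("with a hammer"),              # PP: no active feature on its label (assumed)
    Label("on the shelf"),               # PP, whatever its theta-theoretic status
]
cyclic, postcyclic = schedule(items)
```

Note that theta-theoretic status plays no role in `merge_mode`: the classification depends only on the feature content of the label, which is the point of divorcing the two distinctions.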
It should be kept in mind that the definition in (18) does not necessarily imply checking uninterpretable feature(s) immediately upon Merger. Rather, both (17) and (18) imply an additional property of uninterpretable features, namely, that they trigger projection of a full category. They make no new claims about the mode of checking. Let us now become more specific with respect to the kinds of uninterpretable features involved. Adopting the terminology of Chomsky (2000, 2001), we must be dealing here with uninterpretable features of ‘goals’. Following Chomsky, I concentrate on two such features. The first is structural Case. This feature belongs in the A-system, in the traditional sense. For Chomsky, structural Case is not a feature per se; rather, it is a ‘flag’ that makes the goal (the phi-features of an NP) visible for a probe (say, T or v). Nevertheless, Case can conceivably be regarded as an uninterpretable feature, given that it must be eliminated before the derivation reaches the interface level. I take this to be true. Another uninterpretable feature is the wh-feature of wh-phrases undergoing overt wh-movement. This feature belongs in the A’-system. In taking this feature to be uninterpretable I essentially follow Chomsky (2000, 2001), while departing from Chomsky (1995).8 In languages with overt wh-movement, this feature participates in establishing a dependency with the corresponding ‘probe’, such as the interrogative Q feature of the complementizer. Thus even in examples like How did John fix the car?, how, a theta-theoretic adjunct, is Merged into the structure by substitution, not adjunction.9 For the present purposes, I will not be concerned with the empirical effects of the wh-feature being in the label of a phrase. See Stepanov (2002) for a more detailed discussion. On the other hand, a non-wh-counterpart of how, e.g. with a hammer, is a PP, which presumably has neither a wh-feature nor a Case feature in its label (the Case of a hammer being checked off, presumably, by the preposition). Consequently, such PPs enter the structure by adjunction, as assumed in the traditional, pre-Larsonian VP structure. In fact, by our definition such PPs will be Merged by adjunction regardless of their theta-theoretic status. In the present terms, on the shelf in I saw a book on the shelf and I put a book on the shelf enters the structure by adjunction, that is, postcyclically, in both cases, even though it is included in the argument structure of put, but not of saw. Generally, in the present system optionality of a phrase (again related to its theta-theoretic status) does not dictate its mode of Merger into the structure. In this regard, consider the example in (19):
(19) The man likes Mary.

I adopt the proposals in Chomsky (2000, 2001), according to which both the man and Mary have an uninterpretable structural Case property in their label. In the present system, then, they both must enter the structure by substitution (as specifiers). It follows that they enter the structure cyclically (as always assumed), later checking their Case feature against corresponding functional heads (T and v). On the other hand, the after-phrase in (20) has no uninterpretable features (in particular, no structural Case feature) in its label. Consequently, it must enter the structure by adjunction, hence postcyclically.

(20) John went to bed [after Peter fixed the car].

The view that the after-phrase is inserted postcyclically accounts for the impossibility of wh-extraction out of it, well known as an Adjunct Condition effect (Huang 1982):

(21) ?*What/??Which car did John [VP [VP go to bed] [after Peter fixed t_i]]?

The fact that the after-phrase contains an uninterpretable wh-feature residing in what is immaterial, since this feature is ‘buried’ sufficiently deep in the structure so as not to figure in the label. Under (17), the after-phrase must be Merged with the rest of the structure [vP … go to bed] by adjunction. The late adjunction algorithm forces it to be Merged postcyclically. In particular, it cannot yet be Merged at the point where the matrix complementizer bearing the interrogative feature Q is Merged with the IP John go to bed. Consequently, under the fairly standard view confining movement to a single phrase marker, the relevant feature of the matrix C cannot be satisfied by wh-movement, since the only available candidate for such movement, namely, what, is not part of the same phrase marker. As a result, this feature remains unchecked, which is responsible for the ungrammaticality of (21) (see Stepanov 2001a for further discussion of Adjunct Condition phenomena from this perspective).
In the framework outlined above, ‘true’ inherently Case marked NPs do not have a structural Case feature in their label. By (17), it follows that such NPs (non-wh versions) enter the structure by adjunction. Consequently, the algorithm of Section 2 forces those NPs to enter the structure postcyclically.
3. A late Merger solution to the ‘ergative’ MLC problem

The theory of postcyclic adjunction and the formal definition of structural adjuncthood in (17) provide the tools needed to formalize the intuition behind the ‘late insertion’ of ergative subjects in sentences such as (1) from Hindi, repeated below:

(1) raam-ne roTii khaayii thii.
    Ram(m)-erg bread(f.) eat(perf.f.) be(past.f.)
    ‘Ram had eaten bread.’
I continue to adopt the unaccusative analysis of transitives in ergative languages, in line with the authors mentioned above. I assume that the argument structure of transitives in ergative languages involves an unaccusative light verb v, which is ‘defective’ in the sense that it is neither a theta-role assigner nor a structural Case checker (cf. Harley’s 1995 ‘non-causative Event verb’, also Alexiadou 2001, Nakamura 1998).10 The subject is Merged with the vP headed by this light verb, while the object is generated as a complement of V. Importantly, since Ergative is an inherent Case, ergative subjects do not have an (unchecked) structural Case feature in their label. In that sense, they are similar to PPs, as considered in the previous section, since those too crucially lack such a feature. In fact, independent evidence confirms the view treating ergative subjects as PPs. The intuition goes back at least to Hale (1970), who considers ergative subjects on a par with by-phrases in English passives, and also finds its place in Bittner (1994) and Bittner and Hale (1996a, 1996b). Mahajan (1997) provides strong empirical arguments for the adpositional status of the ergative Case marker. He observes that in Hindi this marker can be separated from the NP by an emphatic marker (22). In addition, it appears after a coordinate NP (23):

(22) Raam-hii-ne / us bacce-hii-ne
     Ram-emph-erg / that boy-emph-erg
     ‘Ram/that boy’

(23) a. Raam or siitaa-ne / us bacce or us baccii-ne
        Ram and Sita-erg / that boy and that girl-erg
        ‘Ram and Sita/that boy and that girl’
     b. Uske pitaa yaa bhaaii-ne
        her/his father or brother-erg
        ‘Her/his father or brother’
As (22) and (23) show, the ergative marker displays a more remote relation to the nominal stem than usual Case markers. Further evidence comes from the fact that Ergative appears to behave like a P in that it assigns/checks a case feature of its nominal complement. This becomes transparent in the morphological class of nominal stems ending in –a: thus baccaa “child” but bacce-ne “child-erg” (I am grateful to a reviewer for reminding me of this fact). It is then reasonable to suppose that the Ergative marker instantiates a category P. Any ergative subject, in effect, has a PP structure in which P takes an NP as its complement. The internal structure of the ergative subject is therefore similar to other instances of PP, in particular, the experiencer phrase in English raising constructions (Section 1).11 Being an NP marked with inherent Case, the ergative subject will be Merged into the structure by adjunction, by (17). In turn, the late adjunction theory will force it to be Merged postcyclically. Since the ergative subject is inserted postcyclically, it does not intervene in establishing the (cyclic!) dependency T-object, because at the point of establishing this dependency, the ergative subject is not yet in the structure. It is inserted after the dependency is ‘deactivated’ by virtue of checking Absolutive (in the case of embedding, this insertion takes place after the higher portion of the structure is built). At the time of raising, there is no closer candidate to be Attracted to the matrix T than the lower subject. The relevant derivational steps are:

(24) – Create: [TP T roTii khaayii thii]
     – Check Absolutive by T (cf. Attract/Agree): [TP T roTii khaayii thii] (dependency between T and roTii)
     …
     – Insert subject (postcyclically): [T raam-ne roTii khaayii thii]
As (24) demonstrates, the MLC is not violated upon checking Absolutive, since the object is the only candidate to enter a dependency with T during the cyclic portion of the derivation.12 This solution to the ‘transparency’ of the ergative subject favors a ‘single cycle’ model such as that based on Agree (Chomsky 2000, 2001). In the Agree-based system, the probe (Tense) finds the goal (Object) and establishes the matching relation with it, by virtue of which the phi-features of Tense are valued, and the (uninterpretable) Case property of the object is inactivated. After that, Agree is no longer relevant: the dependency as such ceases to exist. Consequently, inserting the ergative subject in between Tense and the object does not affect the dependency, as there is no longer a dependency as such. A ‘two-cycle’ approach in terms of pure Attract F (Chomsky 1995) presupposes the existence of a covert component in which feature movement takes place. Given that the ergative subject must be Merged by adjunction in overt syntax, ceteris paribus, it should induce an intervention effect in the covert component, incorrectly predicting transitive sentences in ergative languages to be ungrammatical. If this line of analysis is on the right track, it serves as an argument for the ‘single-cycle’ model.
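The role of timing in the derivation in (24) can be made concrete with a small Python sketch. It is a toy model under loud simplifying assumptions: linear order in a list stands in for c-command, the MLC is reduced to "the probe picks the closest phi-bearing element", and the function names `closest_goal` and `derive` as well as the dictionary encoding are hypothetical. The cyclic-subject branch is a counterfactual included only to show what intervention would look like.

```python
def closest_goal(structure):
    """Return the closest phi-bearing element to the probe T.

    Index 0 is taken to be structurally closest to T; this ordering is a
    crude stand-in for c-command and hence for the MLC.
    """
    for item in structure:
        if item["phi"] is not None:
            return item
    return None


def derive(late_subject):
    """Build the Hindi example (1) and report which element T agrees with."""
    subject = {"form": "raam-ne", "phi": "m"}
    obj = {"form": "roTii", "phi": "f"}
    structure = [obj,
                 {"form": "khaayii", "phi": None},
                 {"form": "thii", "phi": None}]

    if not late_subject:
        # Counterfactual: the subject is Merged cyclically, above the object.
        structure.insert(0, subject)

    goal = closest_goal(structure)  # T probes; the dependency ends here

    if late_subject:
        # Postcyclic adjunction: the subject enters after Agree is over.
        structure.insert(0, subject)

    return goal["form"], [item["form"] for item in structure]


print(derive(late_subject=True))
print(derive(late_subject=False))
```

Both calls yield the same final word order, but only the late-subject derivation has T agreeing with roTii, which matches the feminine agreement in (1).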
4. Syntactic inertness of ergative subject

Given that the ergative subject does not participate in cyclic dependencies, and especially in light of the fact that it does not have a structural Case feature in its label, it is not surprising that it does not show subject agreement in Hindi (1), repeated here:

(1) raam-ne roTii khaayii thii.
    Ram(m)-erg bread(f.) eat(perf.f.) be(past.f.)
    ‘Ram had eaten bread.’
Furthermore, the present approach makes a clear prediction with respect to the syntactic behavior of ergative subjects, stated in (25): (25) Ergative subjects do not participate in (overt) cyclic operations The reason for that is obvious: as adjuncts, ergative subjects do not participate in the cyclic portion of (overt) structure building; hence, they are not expected to interact with other parts of the cycle. The prediction is borne out, for the large majority of cases discussed in the literature. Baker (1997), Bittner and Hale (1996a), (1996b), among others, argue that the ergative subject does not raise from its base generated position in the vP (see above). In addition, Mahajan (1990) shows that an ‘A-scrambled’ object in Hindi is positionally higher than the subject, as it can bind into the subject. This supports the view that the subject stays in-situ:
(26) a. ? mohan-ko_i apne_i baccoN-ne ghar se nikaal diyaa.
          Mohan-DO self’s children-erg house from throw give-perf
          lit. ‘Self’s children threw out Mohan from the house.’
     b. cf. *apne_i baccoN-ne mohan-ko ghar se nikaal diyaa.
          self’s children-erg Mohan-DO house from throw give

Anand and Nevins (2002) observe that ergative subjects differ from nominative subjects in Hindi with respect to the possibility of scope reconstruction. In particular, while a nominative subject and an accusative object show scopal ambiguity allowing both surface and inverse scope, as in English, the inverse reading disappears in the ergative pattern. This is shown in (27) (from these authors):

(27) a. koi kumhaar har tarah-ka gharaa banaata hai. (∃ > ∀, ∀ > ∃)
        some potter-nom every kind-gen vessel-acc make-imp aux-pres
        ‘Some potter makes every type of vessel’
     b. koi kumhaar-ne har tarah-ka gharaa banaaya hai. (∃ > ∀, *∀ > ∃)
        some potter-erg every kind-gen vessel-abs make-perf aux-pres
        ‘Some potter makes every type of vessel’

Inverse scope is otherwise possible for perfective constructions (e.g. intransitives involving locative PPs). Anand and Nevins take this fact to show that the ergative subject does not reconstruct into its base-generated position from Spec-TP, where this subject is assumed to be located. These authors correlate the impossibility of such reconstruction with the EPP being the only driving force for movement of the ergative subject to Spec-TP. This approach raises the question of why the EPP should have the ‘no reconstruction’ effect. A different interpretation of these data is available, under which this question does not arise. If the ergative subject is Merged postcyclically, as maintained in the present proposal, it simply cannot raise in principle (at least in overt syntax), since raising is a cyclic dependency.
The lack of reconstruction then naturally follows.13 Moving beyond Hindi, the ‘inertness’ of ergative subjects is also implied in Marantz’s (1991) generalization:

(28) Marantz’s (1991) ergative generalization
     Even when ergative case may go on the subject of an intransitive clause, ergative case will not appear on a derived subject
Marantz’s concern was primarily the difference in Cases assigned to subjects of intransitive verbs, along the unaccusative/unergative distinction. He observes that in the ergative pattern, the subject of an unergative verb bears ergative Case, but the subject of an unaccusative is necessarily Nominative (see also Laka 1993). The fact that unaccusatives do not get ergative Case can be seen most dramatically in split ergative languages, in which a single factor like a particular tense series or aspect gives rise to the ergative pattern. Thus, in Georgian the nominative-accusative pattern obtains in the present and future tense series, and the ergative pattern in the aorist, as shown in (29) (examples here and below from Harris 1981, via Marantz 1991):

(29) a. vano pikr-ob-s marikaze.
        Vano-nom think-infl Marika-on
        ‘Vano is thinking about Marika.’
     b. vano-m i-pikr-a marikaze.
        Vano-erg think-infl Marika-on
        ‘Vano thought about Marika.’
Interestingly, the aorist tense, the trigger for split ergativity in Georgian, does not lead to Ergative assignment exactly in those structures which involve derived subjects, namely simple unaccusatives and psych verbs. The simple unaccusative example is shown below:

(30) a. es saxl-i ivane-s a=u-šendeb-a.
        this house-nom Ivan-dat built-Infl
        ‘This house will be built for Ivan.’
     b. es saxl-i ivane-s a=u-šend-a.
        this house-nom Ivan-dat built-Infl
        ‘This house was built for Ivan.’
These facts confirm that ergative is not a structural Case, since it cannot be assigned in a non-thematic (moved) position. This is, of course, in accord with the view on Ergative as inherent Case, assumed in the present work.14 Furthermore, Marantz’s generalization confirms the expectation made by the late adjunction account of the ergative pattern. Under the late adjunction account, we do not expect ergative to be assigned on a derived subject as a result of establishing a cyclic dependency (see Hale and Keyser 1993), simply because the subject is not yet in the structure. I also note, in passing, that under the standardly assumed internal subject hypothesis, Marantz’s
generalization indirectly confirms that ergative subjects of transitives do not undergo overt movement to Spec-TP which for the relevant purposes qualifies as a derived position.15
5. Alternative: An equidistance solution

One possible alternative way to account for the lack of MLC effects with ergative subjects is to suppose that the Ergative subject enters the structure cyclically, and, furthermore, that the Ergative subject and the Absolutive object are in the checking domain of the same head, in the sense of Chomsky (1995). Under the unaccusative hypothesis, the idea is naturally implemented if this head is V. This structure for transitive ergatives is entertained in Nash (1996), who suggests it for independent reasons (she does not discuss the MLC issue). For Nash, the ergative subject is generated in Spec-VP, and the object as the complement of V. The subject is thematically licensed by V. Being in the same checking domain, the subject and object qualify as equidistant from T. Hence, the subject would not count as an intervenor for the Nominative dependency T-object. To avoid potential complications concerning Burzio’s generalization (the Spec-VP is assigned an external theta-role, but the complement does not check Accusative), Nash places this proposal in the ‘vP-shell’ framework, in which little v, not V, is the head relevant for Burzio’s generalization.16 In ergative languages, Nash argues, the category v is not present in the lexicon; consequently, the subject is licensed in Spec-VP, which results in the ergative pattern. There are three problems with the equidistance solution, or more precisely, with its premises. First, allowing the external theta-role to be assigned also in a specifier of lexical V makes the system somewhat redundant, since this adds a second option in UG for assigning this theta-role, in addition to the usual one, via v. Postulating that v is absent from the lexicon of ergative languages supposedly leaves only the lexical V option for assigning the theta-role in these languages.
However, it also rules in the lexical V option for ‘accusative’ languages like English, in cases when one simply chooses not to draw v in the numeration, in transitive sentences. This wrongly predicts an ergative pattern in these languages. Second, the proposal that there is no v in the lexicon of ergative languages is problematic in light of the phenomenon of split ergativity, whereby a single language displays both an accusative pattern in some cases, and an ergative pattern in others.
The third argument against the equidistance solution is drawn from the absence of ‘lethal ambiguity’ effects in ergative languages, in the sense of McGinnis (1998a, 1998b). McGinnis shows in detail that when two arguments are in the same checking domain at some derivational point, for instance as two specifiers of the same head, no anaphoric dependency can be established between them. ‘Lethal ambiguity’ lies behind the well known restrictions on anaphoric clitic placement in Romance, analyzed in terms of the ‘Chain condition’ (Rizzi 1986; see Bošković to appear for further evidence for ‘lethal ambiguity’). For McGinnis, ‘lethal ambiguity’ is responsible for the ungrammaticality of (31b), involving passivizing direct objects in Albanian (Massey 1990):

(31) a. secili djale iu tregua babes te tij.
        each boy-nom show-nonact father his-dat
        ‘Each boy was shown to his father.’
     b. * Drita iu tregua vetes prej artistit.
        Drita-nom show-nonact self-dat by the-artist
        ‘Drita was shown to herself by the artist.’

Observe now that under the equidistance solution, presupposing that both the subject and object must be in the same checking domain, we expect ‘lethal ambiguity’ at the point the subject is Merged into the structure (on the assumption of formal non-distinctness of complements and specifiers, common in bare phrase structure). The prediction is that it should never be possible to have constructions analogous to John likes himself in the ergative languages under discussion. This prediction is not borne out, as shown below for Georgian (cf. Harris 1981):

(32) vanom tavisi tavi daircmuna.
     Vano-erg self’s self-nom convince-aor
     ‘Vano convinced himself.’
Finally, the equidistance solution does not capture an important correlation observed with respect to the inherent Case. In particular, ergative subjects (more generally, inherently Case marked NPs) not only are ‘transparent’ for the purposes of the MLC (they do not render an intervention effect), but they are also syntactically ‘inert’, as discussed in the previous section. I conclude that an equidistance analysis is implausible.17 I now turn to discussion of ergative languages that show subject agreement.
6. On subject-agreement languages

The account of ergativity presented above capitalizes on the absence of ergative subject agreement in ergative languages like Hindi. Under the standard view that agreement signals a cyclic dependency, the lack of subject agreement confirms that the subject does not establish a cyclic dependency with any element in the structure. This is expected given that the subject enters the structure late (postcyclically), after all cyclic dependencies have been established. In light of this view, I would like to sketch, if only in a somewhat rough manner, a possible account of ergative languages that do show subject agreement. Basque is one such language. Oyharçabal (1992) argues that the ergative subject in Basque does enter a cyclic dependency, as it shows ergative agreement with the verb, similarly to Nominative subjects:

(33) Liburuak amari nik ekarri *zitzaizkion / nizkion.
     book-abs mother-dat I-erg brought 3abs.aux.pl.3dat / (3abs)1erg.aux.pl.abs.3dat
     ‘I brought the books to (my) mother.’
Inuit is another subject agreement language (see, e.g. Johns 2000). I assume, following many authors (cf. e.g. Bittner and Hale 1996b), that the ergative subject in these languages checks a structural Case and agreement against Infl/Tense.18 This proposal entails, among other things, that Tense is unavailable to check the Case of the Absolutive object. Two questions arise: 1) How is the Absolutive object assigned its Case, and does the mechanism assigning Absolutive in this case observe the MLC? 2) What is responsible for the parametric variation between the subject-agreement type and the no-subject-agreement (Hindi) type languages? As a first step in approaching these issues, it is necessary to reconcile the fact that Ergative can check a structural Case in languages like Basque with our view that Ergative is an inherent Case. More specifically, it would appear that the Ergative subject in languages like Basque or Inuit checks both an inherent and a structural Case. I would like to argue that this is indeed true, and that the observed duality of Ergative mirrors the duality of Icelandic ‘quirky’ Cases. Following Chomsky (2000), Freidin and Sprouse (1991), Frampton and Gutmann (1999), Zaenen et al. (1985), among others, I assume that the Dative in (34) checks an additional structural Case feature in T (see these authors for empirical evidence to this effect):
(34) Joni var hjálpað t_i.
     John-dat was helped
     ‘John was helped.’
Since there is an additional structural Case feature in the label of Joni, the algorithm in Section 2 forces it to enter the structure cyclically. As a result, it is able to participate in cyclic dependencies, such as passivization in (34), and more importantly, to induce an MLC effect in contexts analogous to (6) (see Sigurðsson 2000; Boeckx 2000; Stepanov 2002; among others, for extensive discussion and examples). Similarly, if the Ergative subject in Basque (33) has an additional structural Case feature, it is expected to be Merged cyclically. The suggested parallel between Icelandic and Basque-type languages implies that the Dative in Icelandic (34) must also be in some agreement relation with T. This agreement relation is not seen on the surface. However, Boeckx (2000) argues that the agreement, though morphologically invisible, still obtains and that its presence can be diagnosed by utilizing the ‘Person-Case’ generalization discussed by Bonet (1994):

(35) If Dat (agreement), then Accusative or Absolutive/Nominative (agreement) = 3rd person

Indeed, Icelandic generally allows only third person agreement with a Nominative argument when a ‘quirky’ subject is present, as shown below (examples from Sigurðsson 2000; see Schütze 1997, Anagnostopoulou 2003 for discussion of other relevant cases):

(36) a. ?* Henni líkuðum við.
           her-dat liked-1pl we-nom
           ‘She liked us.’
     b. Henni líkuðu þeir.
        her-dat liked(3pl) they-nom
        ‘She liked them.’

On the other hand, number agreement obtains between the verb and the Nominative object:

(37) Henni leiddust/*?leiddist þeir. (Taraldsen 1995)
     her-dat bored-3pl/bored-3sg they
     ‘She was bored with them.’
Sigurðsson (2000) considers the restriction on person agreement as a case of ‘dative intervention’, in the following sense:

(38) In the Dat-Nom construction, the dative satisfies the matching requirements of Pers (inducing ‘null-agreement’), thereby intervening between Pers and the nominative (by Relativized Minimality) [p. 92]

That is, the ‘quirky’ NP, as a structurally higher element, enters partial (Person) agreement with the sentential person licensor (Pers for Sigurðsson, T for Chomsky), which has no overt morphological reflex. As a result, Person agreement is blocked with the lower Nominative, by the MLC. Significantly, a similar person restriction on the Absolutive object is also observed in ergative languages (cf. Dixon 1994, Jelinek 1993, among others). In particular, the Ergative/Absolutive Case pattern in many languages switches to the nominative/accusative pattern just in case the transitive object is a 1st or 2nd person pronoun. It is ergative when the object is a 3rd person pronoun or a full NP. This is illustrated in Dyirbal (after Dixon 1994 and Isaak 2000):

(39) a. bagul yara-gu balan dyugumbil balga-n.
        class man-erg class woman-abs hit
        ‘The man is hitting the woman.’
     b. cf. balan dyugumbil bani-nyu.
        class woman-abs come
        ‘The woman is coming.’

(40) a. adya inuna balga-n.
        I-nom you-acc hit
        ‘I am hitting you.’
     b. cf. adya bani-nyu.
        I-nom come
        ‘I am coming.’

In other words, it is not possible to retain the ergative pattern when the object is not 3rd person. This is strikingly reminiscent of (36b) in which the dative-nominative pattern cannot be retained. I take this parallel to support the view of Ergative subjects which establish agreement with T as ‘quirky’, in the sense of Icelandic. The observed similarity in person restrictions in Icelandic and subject-agreeing ergative languages suggests a common analysis of the two. I adopt
the view in Alexiadou (to appear), who suggests that Absolutive is assigned in a functional projection lower than T, which she terms Asp(ect) (see also Woolford 1997). Thus the pattern of Case assignment in the agreeing languages is the following (Sigurðsson 2000 develops a similar proposal for the dative-nominative pattern in Icelandic):

(41) [T…[Asp…[Erg…Abs…]]]

According to Alexiadou, the impossibility of 1st and 2nd person Absolutive objects is due to the fact that Asp does not check person, but only number. 3rd person, in this analysis, is not an instantiation of a person feature; thus Asp can check 3rd person via a number feature: no MLC violation arises in this case. This provides an answer to question 1) posed above. At the same time, 1st and 2nd person DPs need to check their person feature, and the only person licensor in the clause is T. Consequently, 1st and 2nd person DPs can be found as Ergative subjects, which check their phi-features in T, but not as Absolutive objects, which check their (number) feature in Asp. Coupled with our analysis of Case assignment in non-subject agreeing languages like Hindi, proposed in the previous sections, the resulting approach is in fact more fine-grained than the one proposed by Alexiadou. Alexiadou does not distinguish between ergative languages with respect to the possibility of person split. Thus, for her, even languages like Hindi are expected to show person split in the ergative pattern, apparently incorrectly. Correspondingly, under Alexiadou’s approach, all ergative subjects, being licensed by T as in (41), are expected to show agreement. For us, on the other hand, person split is correctly not expected in Hindi. This is so because Absolutive in Hindi is assigned not by Asp, as in (41), but by T, as in (4). Correspondingly, the Absolutive object, not the subject, agrees with T. Asp in the relevant sense does not play a role in Hindi. This sheds light on question 2) posed above.
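The person restriction just described can be summarized as a small licensing check. The Python sketch below is only a schematic restatement under stated assumptions: the language-type labels `"subject_agreement"` and `"hindi_type"`, the table `ABSOLUTIVE_LICENSOR`, and the function names are hypothetical, and 3rd person is encoded as the absence of a person feature, following the summary of Alexiadou's analysis in the text.

```python
# Which head licenses the Absolutive object, per language type (assumed labels).
ABSOLUTIVE_LICENSOR = {
    "subject_agreement": "Asp",  # pattern (41): Basque-type languages
    "hindi_type": "T",           # Absolutive checked by T
}

# Which phi-features each head can check, as summarized in the text.
CHECKS = {
    "T": {"person", "number"},
    "Asp": {"number"},
}


def needs_person_licensing(person):
    """1st and 2nd person carry a person feature; 3rd person does not."""
    return person in (1, 2)


def absolutive_object_ok(person, language_type):
    """Can an object of the given person surface as an Absolutive object?"""
    head = ABSOLUTIVE_LICENSOR[language_type]
    return (not needs_person_licensing(person)) or ("person" in CHECKS[head])


for language_type in ("subject_agreement", "hindi_type"):
    print(language_type, [absolutive_object_ok(p, language_type) for p in (1, 2, 3)])
```

On this encoding a person split arises only where Asp is the licensor, which is the contrast drawn in the text between the subject-agreement languages and Hindi.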
Whatever property accounts for the parametric variation between Icelandic, which allows ‘quirky’ subjects, and languages like English that do not (cf. *Him was helped) should also be relevant in the division of ergative languages into those in which subject agreement with T obtains (Basque, Inuktitut etc.) and those in which it does not (Hindi). This property should at the same time govern the distribution of Asp, so that it can assign Absolutive in those languages in which the ‘quirky’ subject enters the agreement relation in T. I leave for future research a more detailed investigation of this property. If the above line of reasoning is on the right track, it opens a possibility of reconciling the tension between conflicting views in the literature concerning the mode of licensing of Absolutive Case. As mentioned in Section 1, a number of authors propose an account of ergative Case assignment on the assumption that Absolutive is licensed as Nominative, viz. by Infl/Tense. In contrast, Bobaljik (1993), Laka (1990), Chomsky (1995), and, more recently, Elordieta (2001) argue for an account of ergativity which treats Absolutive as Accusative. It is conceivable, then, that there can be no unified account of the ergative-absolutive Case pattern, and that Absolutive can in principle be licensed in different ways, the concrete manifestation being determined by independent factors such as the role of Asp in a language and the option of having ‘quirky’ subjects. Oyharçabal (1992) arrives at a similar conclusion, arguing that Absolutive is not a structurally homogeneous Case (for him, it can be checked in different Agr projections even within one language; in particular, Absolutive is checked in AgrO in transitives and unergatives, but in AgrS in unaccusative constructions). Exploring this possibility remains a task for a future study.
7. Conclusion

This paper focused on the problem of an apparent MLC violation in the Case assigning pattern in languages like Hindi. Adopting the view that structural Case is a reflex of agreement (Chomsky 2000, 2001), I suggested an analysis reinforcing the validity of the MLC. The solution capitalizes on the timing of the derivational steps which introduce the intervenor (the ergative subject) and establish a dependency between Tense and the object. I argued that the ergative subject is introduced into the derivation after the dependency is established and deactivated; the system that established the dependency does not ‘see’ the ergative subject for the purposes of the MLC at the relevant derivational point. Furthermore, at the point the ergative subject is introduced, it never intervenes in the dependency, because at that point there is no dependency as such. One theoretical outcome of the proposed analysis is that it contributes to the long-standing debate concerning the derivational vs. representational view of syntactic computation in general and the MLC in particular (for overview and discussion see Browning 1991, Lasnik 2000 and Chomsky 1973, among others). It seems quite difficult to imagine an insightful fully representational account of ‘voiding’ the MLC effect in ergative languages. Under an approach that takes the MLC to be a condition on (final) representations, transitive sentences in ergative languages ought to be ungrammatical, violating the MLC. On the other hand, the present study demonstrated that all steps
of the derivation of sentences involving ergative subjects are legitimate. The fact that the output of such a derivation is grammatical therefore supports the derivational approach. Another result of the present theory is a more principled distinction between the adjunction and substitution forms of Merge. Indeed, there is a sense in which the ontological duality of Merge might a priori seem to be a departure from ‘conceptual necessity’: a single form of Merge would clearly lead to a simpler system. The present study ties the duality of Merge to the presence/absence of uninterpretable feature(s) of the ‘goals’ being Merged. The device of uninterpretable features is available anyway, so its serving additionally as a regulator for the mode of Merge naturally accounts for the duality of the latter.19 Additional support is also provided for Chomsky’s (2001) mechanism of feature valuation, involving uninterpretable features of ‘goals’, and for the proposal that structural Case and agreement are essentially the same phenomenon.
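The timing argument summarized in this conclusion can be pictured with a deliberately simplified toy model. The sketch below is only an illustration under strong assumptions: the structure is flattened into a list ordered from highest to lowest, every nominal counts as a potential goal, and the function names `closest_goal` and `derive` are hypothetical and not part of the analysis itself.

```python
# Toy model (illustrative assumption): the probe T agrees with the closest
# potential goal in the structure that exists at the point of probing.

def closest_goal(structure):
    """Return the name of the highest potential goal, or None if there is none."""
    for item in structure:  # structure is ordered from highest to lowest
        if item["potential_goal"]:
            return item["name"]
    return None

def derive(subject_merged_before_agree):
    """Build the clause with the ergative subject merged before or after Agree."""
    subject = {"name": "ergative subject", "potential_goal": True}
    obj = {"name": "object", "potential_goal": True}
    structure = [obj]
    if subject_merged_before_agree:
        structure.insert(0, subject)      # cyclic Merge: subject is above the object
    agreed_with = closest_goal(structure)  # T probes at this derivational point
    if not subject_merged_before_agree:
        structure.insert(0, subject)      # postcyclic Merge: after the dependency exists
    return agreed_with, [item["name"] for item in structure]

cyclic = derive(subject_merged_before_agree=True)
postcyclic = derive(subject_merged_before_agree=False)
```

Both derivations end in the same final structure, but only in the postcyclic one does T reach the object without an intervenor, which is the sense in which the result depends on derivational timing rather than on the final representation.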
Acknowledgements
I thank Joanna Blaszczak, Željko Bošković, Gisbert Fanselow, Martha McGinnis, Penka Stateva, Ralf Vogel and the audience at the Workshop for fruitful discussions of this work. I am also grateful to two reviewers for their very helpful remarks. This research was supported by the German Research Foundation (DFG) grant FOR375 “Conflicting Rules and Conflict Resolution Strategies in Cognitive Science” and the DFG project DO544/1-1 “Non-Structural Case”.
Abbreviations in glosses
erg = Ergative, abs = Absolutive, perf = Perfective, imp = Imperfective, m = masculine, f = feminine, DO = direct object, nom = Nominative, acc = Accusative, gen = Genitive, dat = Dative, pres = Present, nonact = nonactive voice, aor = aorist, pl = plural.
Notes
1. Not all ergative languages show the Hindi pattern of agreement. I discuss languages with subject agreement in Section 6.
2. Bok-Bennema in fact claims that the ergative pattern arises as a language ‘solution’ to the Case problem posed by unaccusativity.
3. A number of further accounts of ergativity in the literature do not face the MLC problem since they do not assume that Nominative is assigned by Infl/T. The existing alternatives roughly fall into the following classes: 1) syntactic approaches postulating a licensor for the Absolutive lower than T (e.g. McGinnis 1998b); 2) approaches dissociating morphological and Abstract Case, so that Cases like Ergative are treated as morphological Cases, hence arguably not subject to syntactic constraints such as the MLC (Marantz 1991; Harley 1995; Schütze 1997). Among the syntactic approaches consistent with the view that Absolutive is assigned in Infl/T, one finds approaches like Campana (1992) and Bittner and Hale (1996b), for whom the MLC problem does not arise due to the A’-character of the dependency between Infl/T and the object (see these works for details). In contrast, I continue to adopt the hypothesis that T establishes an (A-) dependency with the object.
4. See also Anagnostopoulou (2003) for further empirical discussion of MLC effects arising with structural and inherent Case.
5. Chomsky (2000) restates the relevant property as follows: inherent (theta-related) Case ‘inactivates’ the phi-features of the NP, making it ‘invisible’ to matching with the probe, hence, not a potential intervenor. Note that this conception of inherent Case is radically different from Chomsky’s conception of structural Case, which is not a separate ‘feature’, but simply a flag making the NP ‘active’ for checking phi-features. The reason(s) for this distinction between the two Cases remain unclear. Chomsky (2001) discusses conditions under which intervention effects may not obtain (in particular, when the intervenor blocks remote matching of some, but not all, features, cf. expletives and participles); inherent Case does not seem to fall under these conditions, however.
6. Similar argumentation can be constructed if sisterhood is included in the set of basic relations inside a phrase marker. It should be stressed that comparison between the option of Merging at the root or Merging not at the root does not have to involve one and the same α: different objects may be considered for Merger.
7. This version of the algorithm thus involves a two-step look-ahead. See Stepanov (2001b) for a version that does not involve look-ahead at all, as well as for further details and consequences of the algorithm. Observe also that the algorithm, as presented, does not distinguish between adjunction by pure Merge and adjunction by movement. If the latter exists, it must also be postcyclic. See Stepanov (2002) for further discussion.
8. Chomsky (2000) indicates that the status of the wh-feature in the A’-system is similar to the structural Case feature in the A-system, although he does not specify exactly where the similarities lie.
9. Here I leave open the exact position where the adjunct wh-phrase is Merged. It must be a specifier. Note also that the present proposal does not jeopardize the well known distinction between arguments and adjuncts with respect to locality (Lasnik and Saito 1984), given that it is stated in terms of theta-theory, not the mode of Merger. Other distinctive properties of theta-theoretic arguments, such as A-binding, also (continue to) follow from theta-theory, rather than the mode of Merger.
10. An unaccusative light verb is also associated with verbs like seem in English (e.g. Chomsky 2000). It is an interesting question as to what triggers this kind of v in split ergative languages, in which the ergative pattern is often conditioned by a single morphosyntactic factor such as perfectivity (e.g. Hindi) or a particular tense series (cf. aorist in Georgian). Selectional considerations might play a role here, in which case the question is what kinds of heads choose the unaccusative v in split ergative languages. This issue, however, is orthogonal to the present discussion, and I put it aside here. See Alexiadou (2001), Nakamura (1998) for some relevant discussion.
11. The PP status of the ergative subject is not immediately consistent with the binding facts, namely, that the subject can bind (into) the object in Hindi. The situation is similar to the raising case (7). In order to retain the fundamental insight that binding is sensitive to c-command (see Lasnik 1989 for compelling evidence), an additional proviso must be taken into consideration, namely, that NPs that are complements of P can c-command out of PPs. A number of works (Kitahara 1997, Epstein et al. 1998, among others) try to account for this property by invoking either a somewhat dubious notion of ‘reanalysis’ (see Baltin and Postal 1996 for discussion of general problems with such accounts), or the idea that the ‘referential’ features of NP adjoin to P at LF, creating a binding configuration (see Lasnik 1999 for arguments against such a possibility). I will not discuss the nature of this property in any detail in this work. See Pesetsky (1995), Kayne (2001) for much relevant discussion.
12. English raising constructions in (6) receive a similar account. For details, and analysis of other construction types involving inherent Case marked NPs, see Stepanov (2002).
13. I continue to assume that the ergative subject is merged and stays as a vP specifier (an alternative that I do not pursue here is that the subject is Merged directly to Spec-TP).
14. Marantz himself interprets these facts differently, and outlines a theory which is incompatible with the theory of Abstract Case. In fact, Marantz suggests abandoning Abstract Case theory altogether.
15. I put aside a detailed discussion of the wh-system of Hindi and other ergative languages in the light of the present proposal. For Hindi, which I take to be largely a wh-in-situ language (Dayal 1996), a wh-phrase may not have an uninterpretable wh-feature, unlike in overt wh-movement languages. Consequently, wh-PPs (including ergative subjects) still enter the structure postcyclically, with the wh-dependency established at LF (perhaps by the unselective binding strategy, cf. Reinhart 1998, Tsai 1994). In the ‘single-cycle’ model endorsed here, further adjustments to the proposal might be necessary to accommodate this insight, perhaps restricting postcyclic Merger of wh-PPs to a single phase at a time. With respect to so-called ‘wh-scrambling’ (optional fronting of wh-phrases in Hindi; cf. Dayal 1996 and others), I am led to assume a base-generated analysis of ‘scrambled’ wh-phrases, of the type argued for in Bošković and Takahashi (1998).
16. Alternatively, Mahajan (2000) argues on the basis of ergative languages that Burzio’s generalization must be abandoned (see also Levin 1983). As independent evidence, Mahajan shows that ergative NPs in the ergative language Hindi behave similarly to subjects in accusative languages with respect to a number of structural tests: binding, control, certain kinds of extraction. On the basis of these tests, he concludes that just like (nominative) NPs in accusative languages, ergative NPs are in the “subject position”. However, it is not clear whether this conclusion provides sufficient grounds for a radical rejection of Burzio’s generalization. The latter does not refer to the subject position, but rather to the theta-role assigned to the subject. Establishing that ergative subjects behave similarly to nominative subjects with respect to a number of syntactic phenomena does not necessarily mean that they are assigned their theta-role under the same conditions. Structural tests seem to be largely insensitive to the issue of where exactly the ergative, or for that matter, nominative subject receives its theta-role.
17. A reviewer points out that a version of the equidistance solution based on layered specifiers might be plausible for Hindi, if one supposes that the Absolutive object enters a ‘multiple Agree/Move’ relation with v and T (cf. Hiraiwa 2002, Anagnostopoulou 2003). If the Ergative subject is also introduced by v, then the Ergative and the Absolutive are at some point equidistant. While avoiding to some extent the first two problems mentioned in the text, this modification still faces the latter two. In addition, this move seems to have less potential than the postcyclic Merger solution in terms of capturing the difference between non-subject agreement (Hindi) and subject agreement languages, discussed in the next section.
18. Mahajan (1990) assumed that the ergative subject must get a structural Case in Hindi. Under the present proposal, crucially, this cannot be the case.
19. A similar form of conceptual argument is given in Chomsky (1995) and his later works, in which he identifies two ‘imperfections’ of the computational system: the existence of uninterpretable features, and the existence of displacement. He takes a reductionist approach by suggesting that uninterpretable features implement the displacement property. Thus, the two ‘imperfections’ cancel each other out.
References
Alexiadou, Artemis
2001 Functional Structure in Nominals: Nominalization and Ergativity. Amsterdam/Philadelphia: John Benjamins.
to appear On nominative case features and split agreement. In New Perspectives on Case Theory, Ellen Brandner and Heike Zinsmeister (eds.). Stanford, CA: CSLI Publications.
Anagnostopoulou, Elena
2003 The Syntax of Ditransitives: Evidence from Clitics. Berlin: Mouton de Gruyter.
Anand, Pranav, and Andrew Nevins
2002 Some AGREEment matters: A cross-linguistic generalization on reconstruction and agreement. Abstract for the West Coast Conference on Formal Linguistics XXII.
Anderson, Stephen R.
1976 On the notion of subject in ergative languages. In Subject and Topic, Charles N. Li (ed.), 1–23. New York, NY: Academic Press.
Baltin, Mark R., and Paul Postal
1996 More on reanalysis hypotheses. Linguistic Inquiry 27: 127–145.
Bittner, Maria
1994 Case, Scope and Binding. Dordrecht: Kluwer.
Bittner, Maria, and Ken Hale
1996a Ergativity: Toward a theory of a heterogeneous class. Linguistic Inquiry 27: 531–604.
1996b The structural determination of Case and agreement. Linguistic Inquiry 27: 1–68.
Bobaljik, Jonathan David
1993 On ergativity and ergative unergatives. In MIT Working Papers in Linguistics 19, 45–88. Department of Linguistics and Philosophy, MIT, Cambridge, Mass.
Boeckx, Cedric
2000 Quirky agreement. Studia Linguistica 54: 354–380.
Bok-Bennema, Reineke
1991 Case and Agreement in Inuit. Dordrecht: Foris.
Bonet, Eulalia
1994 The Person-Case Constraint: a morphological approach. In MIT Working Papers in Linguistics 22, Heidi Harley and Colin Phillips (eds.), 33–52. MIT, Cambridge, Mass.
Bošković, Željko
to appear On left branch extractions. In Proceedings of Formal Description of Slavic Languages 4.
Bošković, Željko, and Daiko Takahashi
1998 Scrambling and last resort. Linguistic Inquiry 29: 347–366.
Browning, Maggy
1991 Bounding conditions on representation. Linguistic Inquiry 22: 541–562.
Campana, Mark
1992 A movement theory of ergativity. Ph.D. diss., McGill University, Montreal.
Chametzky, Robert
2000 Phrase Structure: From GB to Minimalism. Malden, Mass.: Blackwell.
Chomsky, Noam
1973 Conditions on transformations. In A Festschrift for Morris Halle, Stephen Anderson and Paul Kiparsky (eds.), 232–286. New York: Holt, Rinehart and Winston.
1995 The Minimalist Program. Cambridge, Mass.: MIT Press.
2000 Minimalist inquiries: The framework. In Step by Step: Essays in Minimalist Syntax in Honor of Howard Lasnik, Roger Martin, David Michaels and Juan Uriagereka (eds.), 89–155. Cambridge, Mass.: MIT Press.
2001 Derivation by phase. In Ken Hale: A Life in Language, Michael Kenstowicz (ed.), 1–50. Cambridge, Mass.: MIT Press.
Dayal, Veneeta
1996 Locality in WH Quantification: Questions and Relative Clauses in Hindi. Dordrecht: Kluwer Academic Publishers.
Dixon, R. M. W.
1994 Ergativity. Cambridge: Cambridge University Press.
Elordieta, Arantzazu
2001 Verb movement and constituent permutation in Basque. Ph.D. diss., Leiden University (published by LOT).
Epstein, Samuel D., Erich M. Groat, Ruriko Kawashima, and Hisatsugu Kitahara
1998 A Derivational Approach to Syntactic Relations. Oxford: Oxford University Press.
Frampton, John, and Sam Gutmann
1999 Cyclic computation, a computationally efficient minimalist syntax. Syntax 2: 1–27.
Freidin, Robert, and Rex A. Sprouse
1991 Lexical case phenomena. In Principles and Parameters in Comparative Grammar, Robert Freidin (ed.), 392–416. Cambridge, Mass.: MIT Press.
Hale, Kenneth
1970 The passive and ergative in language change: The Australian case. In Pacific Linguistic Studies in Honour of Arthur Capell, S. A. Wurm and Donald C. Laycock (eds.), 757–781. Sydney: A. H. and A. W. Reed.
Hale, Kenneth, and Samuel J. Keyser
1993 On argument structure and the lexical expression of syntactic relations. In The View from Building 20: Essays in Linguistics in Honor of Sylvain Bromberger, Kenneth Hale and Samuel J. Keyser (eds.), 53–110. Cambridge, Mass.: MIT Press.
Harley, Heidi Britton
1995 Subjects, events and licensing. Ph.D. diss., MIT, Cambridge, Mass.
Harris, Alice C.
1981 Georgian syntax: A study in relational grammar. New York: Cambridge University Press.
Hiraiwa, Ken
2002 Multiple Agree. Paper presented at the GLOW Workshop Tools in Linguistic Theory, University of Utrecht, April 7–8, 2002.
Huang, C.-T. James
1982 Logical relations in Chinese and the theory of grammar. Ph.D. diss., MIT, Cambridge, Mass.
Isaak, Andre G.
2000 Split case marking and prominence relations. Ph.D. diss., University of Massachusetts, Amherst.
Jelinek, Eloise
1993 Ergative ‘splits’ and argument type. In MIT Working Papers in Linguistics 18 (Papers on Case and Agreement I), 15–42. Department of Linguistics and Philosophy, MIT, Cambridge, Mass.
Johns, Alana
1992 Deriving ergativity. Linguistic Inquiry 23: 57–87.
2000 Ergativity: A perspective on recent work. In The First GLOT International State-of-the-article Book: The Latest in Linguistics, Lisa Cheng and Rint Sybesma (eds.), 47–73. Berlin: Mouton de Gruyter.
Kayne, Richard
2001 Prepositions as probes. Ms., New York University, New York.
Kitahara, Hisatsugu
1997 Elementary Operations and Optimal Derivations. Cambridge, Mass.: MIT Press.
Laka, Itziar
1990 Negation in syntax: on the nature of functional categories and projections. Ph.D. diss., MIT, Cambridge, Mass.
1993 Unergatives that assign ergative, unaccusatives that assign accusative. In MIT Working Papers in Linguistics 18, 149–172. Department of Linguistics and Philosophy, MIT, Cambridge, Mass.
Lasnik, Howard
1989 Essays on Anaphora. Dordrecht: Kluwer.
1999 Minimalist Analysis. Malden, Mass.: Blackwell.
2000 Derivation and transformation in modern transformational syntax. In The Handbook of Contemporary Syntactic Theory, Mark Baltin and Chris Collins (eds.). Malden, Mass.: Blackwell.
Lasnik, Howard, and Mamoru Saito
1984 On the nature of proper government. Linguistic Inquiry 15: 235–289.
Lebeaux, David
1988 Language acquisition and the form of the grammar. Ph.D. diss., University of Massachusetts, Amherst.
Levin, Beth
1983 On the nature of ergativity. Ph.D. diss., MIT, Cambridge, Mass.
Mahajan, Anoop
1990 The A/A-bar distinction and movement theory. Ph.D. diss., MIT, Cambridge, Mass.
1994 The ergativity parameter: have-be alternations, word order and split ergativity. In Proceedings of North Eastern Linguistic Society 24, M. Gonzàlez (ed.), 317–331. GLSA, University of Massachusetts, Amherst, Mass.
1997 Universal Grammar and the typology of ergative languages. In Studies on Universal Grammar and Typological Variation, Artemis Alexiadou and T. Alan Hall (eds.), 35–57. Amsterdam: Benjamins.
2000 Oblique subjects and Burzio’s generalization. In Arguments and Case: Explaining Burzio’s Generalization, Eric Reuland (ed.), 79–102. Amsterdam: Benjamins.
Marantz, Alec
1991 Case and licensing. In Proceedings – Eastern States Conference on Linguistics (ESCOL) 8, 234–253. Cornell University, Ithaca.
Massey, Victoria Walker
1990 Experiencers, themes and c-command in Albanian. In Papers from the Second Student Conference in Linguistics, MIT Working Papers in Linguistics 12, Thomas Green and Sigal Uziel (eds.), 128–143. MIT, Cambridge, Mass.
McGinnis, Martha
1998a Reflexive external arguments and lethal ambiguity. In Proceedings of the West Coast Conference on Formal Linguistics 16, Emily Curtis, James Lyle, and Gabriel Webster (eds.), 303–317. Stanford, CA: CSLI Publications.
1998b Locality in A-movement. Ph.D. diss., MIT, Cambridge, Mass.
Murasugi, Kumiko
1992 Crossing and nested paths: NP movement in accusative and ergative languages. Ph.D. diss., MIT, Cambridge, Mass.
1995 Lexical case and NP Raising. In Grammatical Relations: Theoretical Approaches to Empirical Questions, Clifford S. Burgess, Katarzyna Dziwirek, and Donna Gerdts (eds.), 309–320. Stanford, CA: CSLI Publications.
Nakamura, Masanori
1998 Reference set, Minimal Link Condition, and parameterization. In Is the Best Good Enough?, Pilar Barbosa, Danny Fox, Paul Hagstrom, Martha McGinnis, and David Pesetsky (eds.), 291–313. Cambridge, Mass.: MIT Press.
Nash, Léa
1996 The internal ergative subject hypothesis. In Proceedings of the North Eastern Linguistic Society 26, 195–209. GLSA, University of Massachusetts, Amherst, Mass.
Oyharçabal, Beñat
1992 Structural Case and inherent Case marking: Ergaccusativity in Basque. In Syntactic Theory and Basque Syntax, Joseba A. Lakarra and Jon Ortiz de Urbina (eds.), 309–342. San Sebastián: Donostia.
Pesetsky, David
1995 Zero Syntax. Cambridge, Mass.: MIT Press.
Pesetsky, David, and Esther Torrego
2001 T-to-C movement: Causes and consequences. In Ken Hale: A Life in Language, Michael Kenstowicz (ed.), 355–426. Cambridge, Mass.: MIT Press.
Phillips, Colin
1995 Ergative subjects. In Grammatical Relations: Theoretical Approaches to Empirical Questions, Clifford S. Burgess, Katarzyna Dziwirek, and Donna Gerdts (eds.), 341–357. Stanford, CA: CSLI Publications.
Reinhart, Tanya
1998 Wh-in-situ in the framework of the Minimalist Program. Natural Language Semantics 6: 29–56.
Rizzi, Luigi
1986 On chain formation. In Syntax and Semantics: The Syntax of Pronominal Clitics, Vol. 19, Hagit Borer (ed.), 65–95. New York: Academic Press.
Schütze, Carson T.
1997 Infl in child and adult language: agreement, Case and licensing. Ph.D. diss., MIT, Cambridge, Mass.
Sigurðsson, Halldór Ármann
2000 The locus of Case and agreement. In Working Papers in Scandinavian Syntax 65, Christer Platzack (ed.), 65–108. Lund University.
Stepanov, Arthur
2001a Cyclic domains in syntactic theory. Ph.D. diss., University of Connecticut, Storrs.
2001b Late adjunction and minimalist phrase structure. Syntax 4: 94–125.
2002 Derivational properties of inherent Case. Ms., University of Potsdam.
Taraldsen, Knut Tarald
1995 On agreement and nominative objects in Icelandic. In Studies in Comparative Germanic Syntax, Hubert Haider, Susan Olsen, and Sten Vikner (eds.), 307–327. Dordrecht: Kluwer.
Tsai, W.-T. Dylan
1994 On economizing the theory of A’-dependencies. Ph.D. diss., MIT, Cambridge, Mass.
Woolford, Ellen
1997 Four-way Case systems: Ergative, Nominative, Objective and Accusative. Natural Language and Linguistic Theory 15: 181–227.
Zaenen, Annie, Joan Maling, and Höskuldur Thráinsson
1985 Case and grammatical functions: the Icelandic passive. Natural Language and Linguistic Theory 3: 441–483.
Correspondence in OT syntax and Minimal Link effects
Ralf Vogel
The aim of this paper is the exploration of an optimality theoretic architecture for syntax that is guided by the concept of correspondence: syntax is understood as the mechanism of “translating” underlying representations into a surface form. In minimalism, this surface form is called “Phonological Form” (PF). Both semantic and abstract syntactic information are reflected by the surface form. The empirical domain in which this architecture is tested is minimal link effects, especially in the case of wh-movement. The OT constraints require the surface form to reflect the underlying semantic and syntactic representations as fully as possible. The means by which underlying relations and properties are encoded are precedence, adjacency, surface morphology and prosodic structure. Information that is not encoded in one of these ways remains unexpressed, and gets lost unless it is recoverable via the context. Different kinds of information are often expressed by the same means. The resulting conflicts are resolved by the relative ranking of the relevant correspondence constraints. The minimal link condition (cf. Chomsky 1995, Rizzi 1990) as given in (1) expresses a locality restriction on syntactic movement: movement of α to a target K is blocked by β if β is closer to K and could enter the same checking relation.
(1) Minimal Link Condition (MLC): K attracts α only if there is no β, β closer to K than α, such that K attracts β. (Chomsky 1995: 311)
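The content of (1) can be made concrete with a minimal sketch. It rests on simplifying assumptions that are not part of the definition itself: closeness to K is given in advance as the order of a list, and “K attracts x” is reduced to x bearing a matching feature. The function names `attract` and `mlc_allows` are hypothetical.

```python
def attract(candidates, feature):
    """Return the closest element bearing `feature`, or None.

    `candidates` is a list of (name, features) pairs ordered from the
    element closest to the target K to the one farthest away.
    """
    for name, features in candidates:
        if feature in features:
            return name
    return None

def mlc_allows(candidates, feature, moved):
    """K may attract `moved` only if no closer element could be attracted."""
    return attract(candidates, feature) == moved

# Two wh-phrases below an interrogative C: the subject is closer than the object.
wh_phrases = [("who", {"wh"}), ("what", {"wh"})]
subject_ok = mlc_allows(wh_phrases, "wh", "who")
object_ok = mlc_allows(wh_phrases, "wh", "what")
```

On these assumptions, attraction of the lower wh-phrase is excluded because a closer element with the same feature is available.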
The restriction described by the MLC has been explained in two ways. The most common explanation is in terms of economy of movement: β blocks α because movement of β would require a shorter movement step. Economy of movement is a core principle of grammar in minimalist syntax (Chomsky 1995). An optimality theoretic implementation of this idea has been developed by Legendre et al. (1998) in the form of the constraint BAR:
(2) BAR: A chain link may not cross a barrier. (Legendre et al. 1998, 261; see also Hale and Legendre, this volume)
Conjoined versions of BAR like BAR² (“A chain link may not cross two barriers.”), BAR³ etc. make up the “MINLINK power hierarchy”. The more barriers are crossed, the more violations of BAR constraints are incurred by a movement step. Candidates with fewer BAR violations block others with more violations. The second strategy of explanation for the MLC relies on the fact that movement of α across β reverses the relative order of these two elements. Within OT, Müller (2001) presented an analysis of various syntactic movement phenomena that seem to be governed by the pressure to keep the underlying relative order of elements in a surface form. Müller calls the constraint “Parallel Movement”:
(3) “Parallel Movement” (PAR-MOVE): If α c-commands β at level L_n, then α c-commands β at level L_{n+1} (where α, β are arguments). (Müller 2001, 279)
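A rough way to count violations of a constraint like (3) is sketched below. The sketch assumes, purely for illustration, that c-command among the arguments of a clause can be approximated by a total order (highest argument first), so that each level is just a list; the function name `par_move_violations` is hypothetical.

```python
from itertools import combinations

def par_move_violations(level_n, level_n_plus_1):
    """Count argument pairs whose c-command relation is not preserved.

    Both arguments are lists of the same argument labels, ordered from
    the highest (c-commanding) element to the lowest.
    """
    rank = {arg: i for i, arg in enumerate(level_n_plus_1)}
    # combinations() yields (a, b) with a preceding b in level_n, i.e. a
    # c-commands b at level n; the pair is a violation if b outranks a later.
    return sum(1 for a, b in combinations(level_n, 2) if rank[a] > rank[b])

preserved = par_move_violations(["who", "what"], ["who", "what"])
reversed_order = par_move_violations(["who", "what"], ["what", "who"])
```

Under this approximation, a movement step that carries the lower argument across the higher one incurs one violation per reversed pair.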
Williams’s (2003) “Representation Theory” is largely built on the principle of shape conservation. His proposal is more radical than Müller’s in that shape conservation is conceived as a replacement for derivational economy. Under the term “faithfulness”, structure preservation already plays a central role in OT. Correspondence Theory, as developed by McCarthy and Prince (1995), is an extension and systematisation of the standard input-output faithfulness system. Relations between representations, elements within representations and their properties are evaluated by a set of correspondence constraints, demanding, among other things, the existence of a correspondent, the conservation of the relative order of elements, one-to-one mappings, feature identity of corresponding elements etc. If it is possible to reconstruct the MLC as shape conservation, then this is a natural way of dealing with it within OT, because one would only use tools which are already there, while derivational economy needs to be added in the form of constraints like BAR, or Grimshaw’s (1997) STAY (“No movement”), or by other means. An OT account of the MLC in terms of derivational economy still needs correspondence. Let me demonstrate this with the example of superiority in English:
(4) *What did who say?
This is one of the standard cases for which the MLC has been used as an explanation. Movement of the object to the clause-initial position [Spec,CP] involves the crossing of more barriers than movement of the subject. Consider a syntactic representation where case is assigned in the specifier of AGR-phrases, as is usual in minimalist syntax. The table in (5) displays the BAR violations of possible OT candidates for (4):
(5)
Input: Qy.Qx.say(x,y)                                         BAR
   c1: [CP W_O … [AGRSP W_S [AGROP t_O [VP t_S t_O ]]]]       *****!
☞ c2: [CP W_S … [AGRSP t_S [AGROP W_O [VP t_S t_O ]]]]       ****
☞ c3: [CP W_O … [AGROP t_O [AGRSP W_S [VP t_S t_O ]]]]       ****
Candidate c1 represents the ungrammatical (4). The object wh-phrase crosses one barrier, VP, when moving to AGROP, and two further ones, AGROP and AGRSP, when moving to CP. The subject movement crosses VP and AGROP.1 We have five violations of BAR. The blocking candidate c2 has one violation fewer because the final subject movement to CP crosses only one barrier. But what about candidate c3? Here, we reversed the relative order of AGROP and AGRSP, and we see that we have only four violations now. This candidate has the ungrammatical surface order in (4) and therefore must not be a winner. The problem that is raised here for the OT implementation of the derivational MLC is: if β blocks movement of α, why does β precede α in the first place? Could there not be a candidate that has the reverse underlying order? Furthermore, if movement to [Spec,CP] is blocked, why not insert α directly into that position without movement? In OT, with its relatively unconstrained candidate generator, these are real options that must be excluded explicitly. This is usually achieved by stipulating that AGRSP must embed AGROP universally, that all NPs have to be inserted into their “theta positions” within VP etc.: inviolable constraints that are assumed to be part of the candidate generator, GEN. A theory that is built on shape conservation has a straightforward explanation for the restrictions just discussed. In LFG-OT, they follow from f-structure/c-structure correspondence (see Kuhn 2001). In the system of Williams (2003), case structure represents theta structure, and the parallel relative order of corresponding elements follows from shape conservation. This is certainly the most plausible explanation for the parallelism of case and argument hierarchy.
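The violation counts in (5) can be reproduced with a small sketch. It assumes, following the way the barriers are counted in the text, that every maximal projection between the launching site and the landing site of a movement step counts as one barrier; the encoding of candidates as a clausal spine plus a list of movement steps, and the name `bar_violations`, are illustrative assumptions.

```python
def bar_violations(spine, steps):
    """Sum the barriers crossed by all movement steps of a candidate.

    `spine` lists the projections from lowest to highest. A step
    (source, target) moves an element from inside `source` to the
    specifier of `target`, crossing every projection from `source`
    up to, but not including, `target`.
    """
    position = {label: i for i, label in enumerate(spine)}
    return sum(position[target] - position[source] for source, target in steps)

standard = ["VP", "AGROP", "AGRSP", "CP"]       # AGRSP embeds AGROP
reversed_agr = ["VP", "AGRSP", "AGROP", "CP"]   # AGROP embeds AGRSP

candidates = {
    # object to AGROP and on to CP; subject to AGRSP
    "c1": (standard, [("VP", "AGROP"), ("AGROP", "CP"), ("VP", "AGRSP")]),
    # subject to AGRSP and on to CP; object to AGROP
    "c2": (standard, [("VP", "AGRSP"), ("AGRSP", "CP"), ("VP", "AGROP")]),
    # as c1, but with the order of the AGR projections reversed
    "c3": (reversed_agr, [("VP", "AGROP"), ("AGROP", "CP"), ("VP", "AGRSP")]),
}

scores = {name: bar_violations(spine, steps) for name, (spine, steps) in candidates.items()}
fewest = min(scores.values())
winners = sorted(name for name, score in scores.items() if score == fewest)
```

With BAR as the only constraint, c2 and c3 tie, which is the problem discussed above: nothing in the economy constraint itself excludes the candidate with the reversed order of AGR projections.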
A correspondence theoretic OT approach can use ordinary violable OT constraints here which are not part of GEN. This is the second conceptual advantage of such a model. GEN should be as unconstrained and do as little explanatory work as possible: the assumption of inviolable constraints as such weakens an OT account. Limitations of space do not allow an exhaustive discussion of all details of an OT syntax model based on correspondence. In what follows, I will concentrate on those aspects that are relevant for the discussion of minimal link effects, first of all superiority. The paper is organised as follows: In section 1 the architecture of the proposed OT syntax model is laid out; section 2 demonstrates in a first application how Greenberg’s (1963) first universal can be derived. Section 3 analyses topicalisation and wh-movement in English, in section 4 this is also done for German. The discussion in section 5 focuses on multiple questions. Section 6 focuses on word order freezing, introduces recoverability as a central criterion for grammaticality, and shows how this can be implemented in a bidirectional model of the grammar.
1. Correspondence
1.1. A brief sketch of Optimality Theory
Optimality Theoretic models consist of five components:
– An input representation In.
– A set of representations of output candidates O.
– A generation function GEN(In,O) that generates O on the basis of In.
– A constraint hierarchy CON.
– An evaluation function EVAL(CON,O,On) that selects the optimal output On from O on the basis of CON.
This model can be applied to very different tasks. The input and output representations often vary with the problem that is targeted. In OT syntax, the input is often considered a more or less complex semantic representation, and the output a syntactic representation. Other approaches use an abstract syntactic representation for the input, and a phonological representation as the output. Input and output representations are as unrestricted as possible. Constraints should be part of CON. The constraints are violable and hierarchically ordered. Low ranked constraints can never override the effects of higher ranked constraints, no matter
how often they are violated. There are two constraint types: Markedness constraints evaluate features of candidates, while faithfulness constraints evaluate how similar an output candidate is to the input. This latter constraint type is particularly important for our discussion. Output candidates could be more complex, for instance, a pair of representations. In that case, correspondence of two parts of an output candidate is also an issue. A generalised theory of correspondence that includes input-output faithfulness as a special case has been presented by McCarthy and Prince (1995).
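The evaluation procedure sketched in section 1.1 can be illustrated as follows. This is a minimal sketch, not a full OT implementation: candidates are assumed to come with their violation counts already computed, the constraint names MARK and FAITH are hypothetical placeholders for a markedness and a faithfulness constraint, and strict domination is modelled by comparing violation profiles lexicographically.

```python
def evaluate(candidates, ranking):
    """Return the optimal candidate(s) under a strict constraint ranking.

    `candidates` maps candidate names to {constraint: violation count};
    `ranking` lists the constraints from highest to lowest ranked.
    """
    def profile(name):
        # Violations ordered by rank; unlisted constraints count as zero.
        return tuple(candidates[name].get(constraint, 0) for constraint in ranking)

    # Tuples compare lexicographically, so a lower-ranked constraint only
    # matters when all higher-ranked constraints tie.
    best = min(profile(name) for name in candidates)
    return sorted(name for name in candidates if profile(name) == best)

outputs = {
    "a": {"MARK": 1, "FAITH": 0},
    "b": {"MARK": 0, "FAITH": 3},
}
mark_over_faith = evaluate(outputs, ["MARK", "FAITH"])
faith_over_mark = evaluate(outputs, ["FAITH", "MARK"])
```

Candidate b wins under the first ranking although it violates the lower-ranked constraint three times, which is the sense in which low-ranked constraints can never override higher-ranked ones; reranking the two constraints reverses the outcome.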
1.2. Correspondence-based OT syntax

Müller’s (2001b) PAR-MOV requires correspondence between different levels of syntactic representation. He uses the standard model from Chomsky’s (1981) Government and Binding Theory, with D-structure, S-structure and LF as levels. An alternative based on more recent minimalist syntax has been proposed by Heck and Müller (2000) in their system of serial optimisation. A second kind of correspondence is also conceivable, namely, one that relates different kinds of representations. Standard OT syntax, as introduced by the work of Grimshaw (1997) and others2, assumes the input to be a semantic representation containing first of all the argument structure, but also scope representations for operators and quantifiers. Most researchers also include information structural specifications in the input, as has been suggested by work dealing with the relation between syntax and information structure.3 The candidates are syntactic representations. Faithfulness constraints in this case evaluate the correspondence of semantic and syntactic representations. Other examples of correspondence between different types of representations are the papers by Pesetsky (1997, 1998), where the syntactic structure stands in correspondence with the phonological output representation, PF. Legendre (2000) uses constraints on syntax-PF correspondence in her essentially PF-based analysis of the behaviour of Bulgarian clitics. These are examples of the use of the term “correspondence” that is prominent in the work of Jackendoff (1990). The role of syntax, according to Jackendoff, is to mediate between meaning and sound, semantics and phonology, by virtue of so-called “correspondence rules”.

I want to explore an architecture for OT syntax which is based on Jackendoff’s notion of correspondence. However, I also assume that there must be a direct correspondence relation between conceptual and phonological structures, as has been proposed, for instance, in work on the relation between prosodic and information structure. I will use the labels M (“meaning”, semantics), S (abstract syntax) and P (phonological representation or surface syntax). The mapping relations that hold between these representations have the character of a translation. For instance, scope is often translated into c-command and precedence; the same holds for other instances of prominence, like being the higher argument or being more salient in the discourse. Sisterhood and adjacency are a similar pair. Truckenbrodt (1999) discusses the relation between syntactic and phonological phrases.

Following more recent work in minimalist syntax, I assume that the abstract syntactic representation does not represent linear order itself. This is only represented at P. Kayne (1994) postulates the Linear Correspondence Axiom, which expresses, as a mapping principle, that asymmetric c-command in a syntactic representation translates into precedence. In the system explored here, this is a good candidate for a correspondence constraint, but, as an OT constraint, it would be violable.4

I assume that S is an X-bar structure, where phrases have a top phrasal node, at most one specifier, at most one X-bar node dominating a head and at most one complement. Movement operations are restricted to cyclic substitution. All (abstract) syntactic movement is therefore structure-building. Adjunction is treated as an instance of linear reordering in the mapping from S to P. Consequently, true syntactic adjuncts, like adverbial modifiers, are generated together with matrix trees, but not connected to them in S. So there is no abstract syntactic operation of adjunction, neither of XPs nor of heads. S therefore is not necessarily a unique syntactic tree, but could as well be a “forest”, a set containing the matrix tree and all its adjuncts as elements.5 As long as there is movement, there is also the notion of a movement chain.
I assume that only the head of a movement chain is relevant for the mapping from S to P. This might help in reducing movement and assumptions about it to a minimum. But this restriction might as well follow from a recoverability condition: how can a movement step be made visible, if the trace is spelled out instead of the movement chain’s head? S and P could be seen as two representations that both partially represent syntactic properties of a clause. While S represents constituency, abstract features and further abstract syntactic relations (like binding, case assignment, and others, insofar as they are assumed to be syntactic), linear order is represented at P, together with other surface aspects, like morphology and prosodic phrasing.
I will introduce recoverability below as another central aspect of the correspondence-based OT syntax model. M and S are encoded into P and must be completely recoverable from P. This defines grammaticality. Four constraint families form the base of this system. These are constraints on the correspondence of M and S (MGS), M and P (MGP), and S and P (SGP). The fourth family, SIGSO, will become necessary with the inclusion of S in both input and output (see section 5 below).
2. Deriving Greenberg’s First Universal

As a first demonstration, I want to show how such a system is able to derive Greenberg’s (1963) first universal:

(6) Universal I. In declarative sentences with nominal subject and object, the dominant order is almost always one in which the subject precedes the object.
This universal proposes three basic word order patterns, VSO, SVO and SOV. In the meantime, it has become clear that VOS is also an option. Malagasy (Rackowski and Travis 2000) and Tzotzil (Aissen 1987, 1992, 1996) are examples.6 It seems that in some Mayan languages clauses of the form ‘V NP NP’ are ambiguous between VSO and VOS order if no other features of the NPs disambiguate the structure. The examples in (7) are from K’ichee’ (Mondloch, 1978):7

(7) VSO/VOS order in K’ichee’ (Mayan):
    a. X-uu-kuna-j rii achih rii ixoq.
       CP-E3-cure-ACT DET man DET woman
       ‘The man cured the woman’ or ‘The woman cured the man’.
    b. X-ee-ki-kuna-j rii achijaab’ rii ixoqiib’.
       CP-A3pl-E3pl-cure-ACT DET men DET women
       ‘The men cured the women’ or ‘The women cured the men’.
    (CP = completive aspect; E = ergative; A = absolutive; ACT = suffix for active transitive verbs; DET = determiner)
In the following, I assume that VOS is an option and should be a possible winner under some ranking. The orders that have to be excluded as default orders are OVS and OSV. The constraints in (9) evaluate how a VP of the form (8) is linearised. They are introduced in an informal way and will be given more precise definitions in section 4.8

(8) [VP Subject [V’ V0 Object ] ]

(9) SGP(sh): A specifier precedes its head.
    SGP(ch): A complement precedes its head.
    SGP(hc): A head precedes its complement.
    SGP(sa): Sisters are adjacent.
    SGP(NP): Asymmetric c-command among NPs translates into precedence.
    MGP(fa): A functor precedes all its arguments.
The six logically possible linearisations of subject, object and verb constitute the candidate set. Four of the six candidates are possible under some ranking.9 Two orders are correctly excluded, those with the object in first position. This is not to say that such orders are impossible; they are only impossible as default or unmarked orders. Table (10) displays the violations of the constraints by the six candidates and shows how this result is achieved.

(10)
[VP S [V’ V0 O ] ]    SGP(sh)  SGP(ch)  SGP(hc)  SGP(sa)  SGP(NP)  MGP(fa)
VSO                   *        *                 *
VOS                   *        *                          *
SVO                            *                                   *
SOV                                     *                          *
OSV                                     *        *!       *!       *
OVS                   *!                *                 *!       *
The orders OSV and OVS violate the same constraints as the SOV order, and further constraints in addition which have been marked with “!”. The candidates are “harmonically bounded”: they would lose against SOV under any ranking. A constraint that is violated by both is SGP(NP). This is the constraint that formulates the core of Greenberg’s first universal, namely, that subjects precede objects, i.e. that asymmetric c-command between elements of the syntactic category NP translates into precedence. The only candidate that violates this constraint and is not harmonically bounded is VOS. The reason is that VSO, the candidate with the most similar violation profile, cannot fulfil sister adjacency: head-initial orders cannot simultaneously fulfil SGP(NP) and SGP(sa). The winning rankings for the four possible winners are summarised in (11); only the crucial rankings are indicated:

(11) SVO: {SGP(hc), SGP(sa), SGP(NP), SGP(sh)} >> {SGP(ch), MGP(fa)}
     SOV: {SGP(ch), SGP(sa), SGP(NP), SGP(sh)} >> {SGP(hc), MGP(fa)}
     VSO: {SGP(hc), MGP(fa), SGP(NP), SGP(sh)} >> {SGP(ch), SGP(sa)}
     VOS: {SGP(hc), MGP(fa), SGP(sa)} >> {SGP(ch), SGP(NP), SGP(sh)}
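The typology described here can be checked mechanically. The following Python sketch is an illustration under stated assumptions, not part of the original analysis: it encodes the informal constraints in (9) as categorical checks on three-letter order strings (at most one mark per constraint, which is an assumption about how violations are counted), and all function names are hypothetical. It tries every total ranking of the six constraints and also searches for harmonically bounded candidates.

```python
from itertools import permutations

# Candidate linearisations of [VP S [V' V O]].
ORDERS = ["VSO", "VOS", "SVO", "SOV", "OSV", "OVS"]

def before(order, a, b):
    """True if symbol a precedes symbol b in the order string."""
    return order.index(a) < order.index(b)

# Each entry returns True when the candidate VIOLATES the constraint.
# Assumption: violations are categorical (no gradient counting).
CONSTRAINTS = {
    "SGP(sh)": lambda o: not before(o, "S", "V"),   # specifier precedes head
    "SGP(ch)": lambda o: not before(o, "O", "V"),   # complement precedes head
    "SGP(hc)": lambda o: not before(o, "V", "O"),   # head precedes complement
    "SGP(sa)": lambda o: abs(o.index("V") - o.index("O")) != 1,  # sisters adjacent
    "SGP(NP)": lambda o: not before(o, "S", "O"),   # higher NP precedes lower NP
    "MGP(fa)": lambda o: o[0] != "V",               # functor precedes its arguments
}

PROFILE = {o: {c: int(f(o)) for c, f in CONSTRAINTS.items()} for o in ORDERS}

def winners(ranking):
    """Candidates with the lexicographically smallest violation vector."""
    vector = lambda o: tuple(PROFILE[o][c] for c in ranking)
    best = min(vector(o) for o in ORDERS)
    return [o for o in ORDERS if vector(o) == best]

def factorial_typology():
    """All orders that win under at least one total ranking."""
    found = set()
    for ranking in permutations(CONSTRAINTS):
        found.update(winners(ranking))
    return found

def harmonically_bounded():
    """Orders for which some rival is never worse and somewhere better."""
    bounded = set()
    for a in ORDERS:
        for b in ORDERS:
            if a == b:
                continue
            no_worse = all(PROFILE[b][c] <= PROFILE[a][c] for c in CONSTRAINTS)
            better = any(PROFILE[b][c] < PROFILE[a][c] for c in CONSTRAINTS)
            if no_worse and better:
                bounded.add(a)
    return bounded

print(sorted(factorial_typology()))
print(sorted(harmonically_bounded()))
```

Under these assumptions the two object-initial orders are the only bounded candidates, and the remaining four orders each win under some ranking.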
A language like K’ichee’ would have a constraint ranking where SGP(NP) and SGP(sa) are tied, i.e., are of equal rank:10

(12) VSO/VOS (e.g., K’ichee’): {SGP(hc), MGP(fa)} >> SGP(sa) ● SGP(NP) >> SGP(ch)
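How a tie such as the one in (12) behaves can be sketched in a few lines of Python. The sketch rests on assumptions that go beyond the text: a tie is read as pooling the violations of the tied constraints into a single stratum (other interpretations of ties exist), the unordered set {SGP(hc), MGP(fa)} is treated the same way, and the violation marks are the categorical ones that follow from the informal definitions in (9). The names `MARKS` and `evaluate` are hypothetical.

```python
# Violation marks per candidate, derived from the informal definitions in (9).
# Assumption: one mark per violated constraint (no gradient counting).
MARKS = {
    "VSO": {"SGP(sh)", "SGP(ch)", "SGP(sa)"},
    "VOS": {"SGP(sh)", "SGP(ch)", "SGP(NP)"},
    "SVO": {"SGP(ch)", "MGP(fa)"},
    "SOV": {"SGP(hc)", "MGP(fa)"},
    "OSV": {"SGP(hc)", "SGP(sa)", "SGP(NP)", "MGP(fa)"},
    "OVS": {"SGP(sh)", "SGP(hc)", "SGP(NP)", "MGP(fa)"},
}

def evaluate(strata, marks=MARKS):
    """Filter candidates stratum by stratum.

    Within a stratum, the violations of all its constraints are pooled,
    so tied constraints cannot decide between candidates that violate
    different members of the tie equally often.
    """
    survivors = list(marks)
    for stratum in strata:
        cost = {c: len(marks[c] & set(stratum)) for c in survivors}
        best = min(cost.values())
        survivors = [c for c in survivors if cost[c] == best]
    return survivors

# Ranking (12): SGP(sa) and SGP(NP) share one stratum.
TIED = [["SGP(hc)", "MGP(fa)"], ["SGP(sa)", "SGP(NP)"], ["SGP(ch)"]]
# The same ranking with the tie resolved in favour of SGP(NP).
UNTIED = [["SGP(hc)", "MGP(fa)"], ["SGP(NP)"], ["SGP(sa)"], ["SGP(ch)"]]

print(evaluate(TIED))    # both verb-initial orders survive
print(evaluate(UNTIED))  # only VSO survives
```

The design choice of pooling marks is only one way to model equal rank; evaluating both resolutions of the tie and taking the union of winners gives the same result for this candidate set.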
Head-initial orders cannot fulfil both SGP(sa) and SGP(NP) simultaneously, contrary to SVO and SOV. Spanish has been classified as SVO and VSO by different authors.11 Perhaps both orders are equally unmarked. A tie between SGP(sa) and MGP(fa) in combination with “SGP(hc) >> SGP(ch)” would yield such a VSO/SVO language.

The following sections will explore how the system of constraints described in this section implements the minimal link condition. The discussion will focus on A-bar movement, in particular, topicalisation and wh-movement. I will compare the accounts for English and German.
3. Topicalisation and wh-movement in English

In the English sentences in (13), the order of the NPs is determined by argument structure, operator scope and information structure, respectively:12

(13) a. John wrote this book.          (order follows argument structure)
     b. What did John write?           (order follows wh-scope marking)
     c. The red book, John wrote.      (order follows information structure)
In both (13b) and (13c), the order of the arguments as determined by argument structure is not preserved. These examples could be seen as simple cases of “minimal link violations”, under a version of the minimal link condition that is equal to the requirement to preserve the underlying argument order. Many cases of MLC violations discussed in the literature are of this kind. What is particularly interesting is that (13b) and (13c) have two different syntactic structures: while in (13b) ‘what’ is moved into the specifier of an additional projection headed by ‘do’, ‘The red book’ in (13c) is left dislocated, presumably adjoined to the root node. The two different structural solutions can be interpreted as resulting from different priorities among correspondence constraints. Scope marking is reflected at both S and P, while information structural prominence is only reflected at P. I assume the following constraints on the correspondence between M and S:13

(14) Constraints on MGS mapping (identical indices indicate correspondence of elements, e.g., m1 corresponds to s1):
     a. MGS(Arg): If an argument m1 is higher than another argument m2 at M, then s1 asymmetrically c-commands s2 at S.
     b. MGS(Wh): If a wh-operator m1 has scope over m2 at M, then s1 asymmetrically c-commands s2 at S.
     c. MGS(Inf): If m1 is [+prom] and m2 is [–prom] at M, then s1 asymmetrically c-commands s2 at S.

The ranking in (15) prefers abstract syntax to reflect scope and to ignore information structure. Thus, the relative order of the arguments will be preserved if they differ only in information structure, but it will not be preserved if (only) a lower argument is a wh-element.

(15) English ranking of MGS constraints: MGS(Wh) >> MGS(Arg) >> MGS(Inf)

It is easy to see that these three correspondence constraints potentially conflict: they compete for the same means of encoding for different kinds of information. As introduced above, I also assume constraints that govern the correspondence between M and P directly. These constraints are parallel to those in (14).
(16) Constraints on MGP mapping:
     a. MGP(Arg): If an argument m1 is higher than another argument m2 at M, then p1 precedes p2 at P.
     b. MGP(Wh): If a wh-operator m1 has scope over m2 at M, then p1 precedes p2 at P.
     c. MGP(Inf): If m1 is [+prom] and m2 is [–prom] at M, then p1 precedes p2 at P.

The ranking of these three constraints is not parallel to the ranking for MGS mapping.14

(17) English ranking of MGP constraints: MGP(Inf) >> MGP(Arg) (MGP(Wh))

Most crucially, MGP(Inf) is higher than MGP(Arg). Thus, although information structure cannot override argument structure in S, it can do so in P: prominent elements cannot be moved in S, but they can be fronted in the string. The cause for the two different fronting strategies that we find in English is that wh-elements are fronted in S, while discourse prominent phrases are fronted in P. To front a non-subject in S, we need an additional projection that provides a specifier as the landing site. This is the motivation for ‘do’-support in non-subject wh-questions.15 These constraints together ensure that MGP(Wh) is fulfilled by winners. MGP(Wh) will be left out in the discussion below, but it certainly has a function in languages other than English. One could imagine that it is responsible for multiple wh-fronting in, for instance, many Slavic languages.16 The two different fronting strategies result from the ranking in (18) (the family of SGP constraints is abbreviated with one general constraint here):

(18) English ranking of MGS, MGP, and SGP constraints: MGS(Wh) >> {MGP(Inf), MGS(Arg)} >> SGP >> {MGS(Inf), MGP(Arg)}
Let us now take a look at the OT analyses. We will start with wh-movement. The constraint MGS(Wh) is higher than MGS(Arg), hence, a structure with syntactic wh-movement is preferred over one without. The effects of this movement are preserved at P. This is ensured by ranking SGP higher than MGP(Arg).
Let us start with a standard OT system with the following properties: we use M as input representation, a specification of all relevant semantic information, including at least argument structure, scope specification and information structure. Candidates are [S,P] pairs:

(19) Standard OT syntax model:
     Input:  M
     Output: S, P

We now want to derive (13b), repeated in (20):

(20) What did John write?

For S, we have to consider two candidates, one with wh-movement (abbreviated as “+mvt”), and one without (“–mvt”):

(21) Candidates for S:
     a. +mvt: [ NP[+wh]i doj NP tj [ V ti ] ]
     b. –mvt: [ NP V NP[+wh] ]

For P, we in principle have to consider all possible permutations of subject, (do), verb and object. I will restrict myself to the orders SVO, OSV and OVS, and parallel cases with do for the [+mvt] candidates. The other orders are ruled out independently. SOV is ruled out by the ranking “SGP(hc) >> SGP(ch)”, which sets the head parameter to VO. Verb-initial orders are ruled out by SGP(sh), which ensures that specifiers precede the heads of their projections.

(22) Candidates for P:
     a. [–mvt]: SVO, OSV, OVS
     b. [+mvt]: SdoVO, OSdoV, OdoSV

This gives us six candidate [S,P] pairs. The following table only uses the constraints relevant for the discussion of wh-movement, tacitly assuming that information structure is not relevant in this particular case.
(23) Wh-movement of an object in English:

Qx.write(j,x)       MGS(Wh)  MGS(Arg)  SGP(sh)  SGP(hc)  SGP(NP)  MGP(Arg)
   [+mvt,SdoVO]              *         *!       *        *
   [+mvt,OSdoV]              *                  *!                *
☞ [+mvt,OdoSV]              *                                    *
   [–mvt,SVO]       *!
   [–mvt,OSV]       *!                          *        *        *
   [–mvt,OVS]       *!                 *        *        *        *
The highest constraint is MGS(Wh). So the three candidates without wh-movement at S are excluded. For the three candidates left, we have to find the optimal linear order, since they differ only at P. They all violate MGS(Arg) the same way, because of the syntactic wh-movement. So this constraint cannot decide. The SGP constraints17 determine as winner the OdoSV order: the elements are linearised according to their relative c-command relations (O asymmetrically c-commands do, which asymmetrically c-commands S, which asymmetrically c-commands V). Only the heads of movement chains are taken into account in the evaluation. SGP(sh) needs to be fulfilled by O, the fronted wh-phrase, and do. SGP(hc) can only be violated by do and its complement [S V]. SGP(NP) is fulfilled if O precedes S. For the [–mvt] candidates the violations are determined accordingly.

Let us now turn to (13c), left dislocation of a discourse prominent object, repeated in (24).

(24) The red book, John wrote.

We use the same candidates as before. The constraints on operator scope are left out, as they are not active here. The SGP constraints have the general effect of keeping departures from the default linear order at P, relative to a given S, as minimal as possible. This is again decisive, as (25) shows:
(25) Topicalisation of an object in English:

write(j,rb), rb=[+PROM]  MGP(Inf)  MGS(Arg)  SGP(sh)  SGP(hc)  SGP(NP)  MGP(Arg)  MGS(Inf)
   [+mvt,SdoVO]          *!        *         *        *        *
   [+mvt,OSdoV]                    *!                 *                 *
   [+mvt,OdoSV]                    *!                                   *
   [–mvt,SVO]            *!                                                       *
☞ [–mvt,OSV]                                         *        *        *         *
   [–mvt,OVS]                                *!       *        *        *         *
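The two English tableaux can be replayed with a small Python sketch. It is a sketch under assumptions, not the author's implementation: violations are computed categorically from the verbal definitions in (14), (16) and the discussion of the SGP constraints; the two constraints that ranking (18) leaves unordered, MGP(Inf) and MGS(Arg), are linearised in the column order of the tableaux; MGP(Inf) is read as "the prominent object is string-initial"; and the names `violations` and `optimal` are hypothetical.

```python
def before(p, a, b):
    """True if token a precedes token b in the linear order p."""
    return p.index(a) < p.index(b)

def violations(mvt, p, wh=False, prom=False):
    """Marks for one [S,P] pair; wh/prom describe the object in the input M."""
    v = {
        "MGS(Wh)": int(wh and not mvt),        # wh-object must c-command at S
        "MGP(Inf)": int(prom and p[0] != "O"), # prominent object must be first at P
        "MGS(Arg)": int(mvt),                  # object movement reverses argument order at S
        "MGS(Inf)": int(prom and not mvt),     # prominent object must c-command at S
        "MGP(Arg)": int(before(p, "O", "S")),  # subject should precede object at P
    }
    if mvt:   # S = [O do [S V]]: O is the specifier of the projection headed by do
        v["SGP(sh)"] = int(not before(p, "O", "do"))
        v["SGP(hc)"] = int(not (before(p, "do", "S") and before(p, "do", "V")))
        v["SGP(NP)"] = int(not before(p, "O", "S"))
    else:     # S = [S [V O]]
        v["SGP(sh)"] = int(not before(p, "S", "V"))
        v["SGP(hc)"] = int(not before(p, "V", "O"))
        v["SGP(NP)"] = int(not before(p, "S", "O"))
    return v

# Ranking (18), with the unordered sets linearised as in the tableau columns.
RANKING = ["MGS(Wh)", "MGP(Inf)", "MGS(Arg)",
           "SGP(sh)", "SGP(hc)", "SGP(NP)",
           "MGS(Inf)", "MGP(Arg)"]

CANDIDATES = [
    (True, ["S", "do", "V", "O"]),
    (True, ["O", "S", "do", "V"]),
    (True, ["O", "do", "S", "V"]),
    (False, ["S", "V", "O"]),
    (False, ["O", "S", "V"]),
    (False, ["O", "V", "S"]),
]

def optimal(wh=False, prom=False):
    """Label of the candidate with the best violation vector."""
    def key(cand):
        marks = violations(cand[0], cand[1], wh=wh, prom=prom)
        return tuple(marks[c] for c in RANKING)
    mvt, p = min(CANDIDATES, key=key)
    return ("+mvt" if mvt else "-mvt") + "," + "".join(p)

print(optimal(wh=True))    # wh-object: syntactic fronting with do
print(optimal(prom=True))  # prominent object: reordering at P only
print(optimal())           # neutral object: plain SVO
```

With a wh-object the sketch selects the candidate with movement at S, with a prominent object the candidate that is reordered only at P, and with a neutral object the unmoved SVO candidate.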
The candidates with syntactic object movement are ruled out early because of the sub-ranking “MGS(Arg) >> MGS(Inf)”. MGP(Inf) enforces linear reordering according to information structural needs, and the SGP constraints choose the candidate that preserves most of the underlying syntactic configuration in the linear order. This is the candidate “[–mvt,OSV]” where the object is left dislocated, which is, according to the analysis presented here, a process of mere linear reordering. The constraint SGP(sh) makes the difference in the comparison with OVS order. The relative order of the head V and its complement O cannot be preserved, since the object is topicalised. But the OSV candidate preserves the relative order of V and its specifier, S.

4. Topicalisation, Wh-movement, and Verb-Second in German

This section discusses the equivalent German data. The clauses in (13) translate into German as follows:

(26) a. John schrieb dieses Buch
        John wrote this book
     b. Was schrieb John?
        What wrote John?
     c. Das rote Buch schrieb John
        The red book wrote John

At first sight, accounting for these examples seems rather easy: if topicalisation has the same syntactic effects as wh-movement, then perhaps this is because MGS(Inf) is ranked as high as MGS(Wh):

(27) Possible German ranking I: {MGS(Wh), MGS(Inf)} >> MGS(Arg) >> SGP >> {MGP(Inf), MGP(Arg)}

Alternatively, one could achieve the same effect in an indirect way: ranking both MGP(Inf) and SGP higher than MGS(Arg) has the effect that a structure with object movement at S is preferred in order to make S and P maximally correspond:

(28) Possible German ranking II: {MGS(Wh), MGP(Inf)} >> SGP >> MGS(Arg) >> MGP(Arg)
However, German is a head-final language, with the exception of the Verb-Second effect. To account for this, we need to be more detailed about the constraint family SGP. We already introduced a number of these constraints informally in section 2. (29)–(31) give more precise definitions:

(29) a. SGP(ch) – “complement before head”: If s1 and s2 are sister nodes at S, and s1 is a head and s2 its complement, then p2 precedes p1 at P.
     b. SGP(hc) – “head before complement”: If s1 and s2 are sister nodes at S, and s1 is a head and s2 its complement, then p1 precedes p2 at P.

The relative ranking of SGP(hc) and SGP(ch) determines whether a language is head-initial (SGP(hc) >> SGP(ch)) or head-final (SGP(ch) >> SGP(hc)). The two constraints in (30) express default linearisation tendencies for further relations within a phrase.

(30) a. SGP(sa) – “sisters are adjacent”: If s1 and s2 are sisters at S, then p1 and p2 are adjacent at P.
     b. SGP(sh) – “specifier precedes head”: If s1 is the specifier of a maximal projection headed by s2 at S, then p1 precedes p2 at P.

The two constraints in (31) restrict shape conservation to syntactic categories: asymmetric c-command between elements of the same syntactic category translates into precedence.
(31) a. SGP(NP): If an NP s1 asymmetrically c-commands another NP s2 at S, then p1 precedes p2 at P.
     b. SGP(V0): If a verbal head s1 asymmetrically c-commands another verbal head s2 at S, then p1 precedes p2 at P.

SGP(NP) already played an important role in the discussion of Greenberg’s first universal in section 2. SGP(V0) has been used by Schmid and Vogel (2004) in accounting for the dialectal typology of three-verb clusters in German. Swiss German dialects, like Dutch dialects, display a default order in verbal complexes where the higher verb precedes the lower verb, while standard German displays the opposite order, obeying SGP(ch) even within verbal complexes. While Swiss German dialects are still head-final for the relative order of object and verb, complex V-VP structures are head-initial. The high ranking of SGP(V0) introduces this kind of restricted head initiality into a verb-final language.

The two constraints have no counterpart constraint requiring a right-to-left order. This asymmetry reflects what is also expressed in Kayne’s (1994) Linear Correspondence Axiom, namely, a universal tendency for the iconic mapping of asymmetric c-command into precedence. Symmetric c-command, as observed in the case of head-complement sisterhood, is string-ambiguous, and this motivates the two constraints on head-complement order which directly implement a syntactic parameter. Specifiers again asymmetrically c-command their heads, so there is only a constraint that requires specifiers to precede heads. The restriction to a particular syntactic category in SGP(V0) and SGP(NP) might sound arbitrary. It would sound less so if we could turn it into a restriction on the formulation of constraints: constraints on the relative linear order of syntactic elements relate either the immediate constituents of a phrase (specifier, head, complement) or elements of the same syntactic category. I will follow this restriction in this paper.
It is a mode of implementing the idea of relativised minimality (cf. Rizzi 1990): only like elements can block each other by minimality. One could imagine that relative prominence is also important for the head-complement relation: though the two elements are sisters, the head could be seen as relatively more prominent, because, in traditional terminology, it often governs and selects its complement. Thus, although the structural symmetry of the head-complement relation justifies the two mirror image constraints SGP(ch) and SGP(hc), under extreme conditions head-before-complement might emerge as the unmarked case.18 The verb-second effect in German could be interpreted as the consequence of such a situation. German is head-final, except for the highest projection in the clause. For the highest projection, the head parameter has the biggest consequences. Its head is either the first (given that the specifier is empty) or the last element of the clause. Let us assume a constraint that reflects this:

(32) SGP(hc-top): If s1 is the head of the topmost projection in S, and s2 its complement, then p1 precedes p2 at P.

High rank of this constraint turns an OV language like German into a “context sensitive VO-language”. The highest head of an extended projection is more important than other heads because it signals the categorial status of the phrase. If this is the motivation for SGP(hc-top), then there should not be a counterpart constraint SGP(ch-top) requiring the topmost head to occur at the end of the clause. In an SOV language, the effects of such a constraint could not be seen, but for an SVO language this constraint would predict a verb-last effect for the highest verb. Thus, while the German order of a matrix clause with an analytic tense is “S Aux O V”, the imaginary language we are talking about would have “S V O Aux”, and a clause with simple tense would have “S O V”. As far as I know, such a language does not exist. I take this gap as another piece of evidence for the claim made above that precedence is the iconic translation of relative syntactic prominence into relative linear order. The standard German ranking, then, is:

(33) SGP(hc-top) >> SGP(ch) >> SGP(hc)

Let us now return to our discussion of topicalisation and wh-movement in German. The main difference from English is that German does not have the English type of topicalisation. Fronting is always accompanied by verb-second.
One ranking option is that German has MGS(Inf) ranked higher than MGS(Arg). Linear reordering at P would then be unnecessary to fulfil MGP(Inf). However, left dislocation is also impossible in the case of fronted adverbials, cf. the following three clauses:
(34) a. Yesterday I met Pierce Brosnan
     b. *Gestern ich traf Pierce Brosnan
        yesterday I met P. B.
     c. Gestern traf ich Pierce Brosnan

Adverbials do not need to undergo movement; they can be inserted where they occur at the surface. There is nothing so far in the system of constraints that would prevent (34b). The ban on left dislocation of the English kind is more general. German has a construction that looks very much like left dislocation, but here it is crucial that the left dislocated element has a correlate within the matrix clause:

(35) a. Gestern, da traf ich Pierce Brosnan
        yesterday there met I P. B.
     b. (Der/Den) Pierce Brosnan, den traf ich gestern
        (the-NOM/the-ACC) P. B. that-one-ACC met I yesterday

I assume that the clause that follows the left dislocated element must contain all its constituents, and, furthermore, that its shape has to be as would be expected from SGP mapping constraints, and, in particular, the left edge of P has to correspond to the highest constituent in S. Remember that the analysis of English left dislocation presented in section 3 assumes a mismatch between S and P: the leftmost element in P is the fronted constituent. It is only in initial position in P, not in S, so the highest constituent in S is another element, usually the subject. I assume that German has a ban on such a mismatch in the clause-initial position. We already saw in our discussion of the shift from head-final to head-initial in the highest clausal projection that the initial projection of a clause has a special status in German. The constraint that I propose is the one in (36):

(36) SGP(top): If s1 is the highest constituent in S, then p1 is leftmost in P.

The notion of “highest constituent” is defined as follows:

(37) HIGHEST CONSTITUENT: Given a syntactic structure S with the root node R, α is the highest constituent of R iff α is an X0 or XP node dominated by R, and there is no other X0 or XP node β ≠ R such that β dominates or asymmetrically c-commands α.
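The definition in (37) is procedural enough to be tried out on small trees. The Python sketch below is only an illustration under assumptions: c-command is computed via the first branching node above a constituent (a common textbook formulation that the text does not spell out), nodes carry an explicit bar level, and the class and function names are hypothetical.

```python
class Node:
    """A tree node with a bar level: "X0", "X'" or "XP"."""
    def __init__(self, label, level, children=()):
        self.label = label
        self.level = level
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

    def descendants(self):
        """All nodes properly dominated by this node, in pre-order."""
        for child in self.children:
            yield child
            yield from child.descendants()

def dominates(a, b):
    return any(d is b for d in a.descendants())

def c_commands(a, b):
    """a c-commands b: neither dominates the other, and the first
    branching node above a dominates b (assumed definition)."""
    if a is b or dominates(a, b) or dominates(b, a):
        return False
    anc = a.parent
    while anc is not None and len(anc.children) < 2:
        anc = anc.parent
    return anc is not None and dominates(anc, b)

def highest_constituents(root):
    """Labels of all X0/XP nodes below root that no other X0/XP node
    below root dominates or asymmetrically c-commands, following (37)."""
    pool = [n for n in root.descendants() if n.level in ("X0", "XP")]
    def outranked(a):
        return any(
            b is not a and (dominates(b, a)
                            or (c_commands(b, a) and not c_commands(a, b)))
            for b in pool
        )
    return [n.label for n in pool if not outranked(n)]

# [VP S [V' V O]]: a simple clause.
vp = Node("VP", "XP", [
    Node("S", "XP"),
    Node("V'", "X'", [Node("V", "X0"), Node("O", "XP")]),
])

# [CP C [IP S [I' I VP]]]: a complementiser-introduced clause.
cp = Node("CP", "XP", [
    Node("C", "X0"),
    Node("IP", "XP", [
        Node("S", "XP"),
        Node("I'", "X'", [Node("I", "X0"), Node("VP", "XP")]),
    ]),
])

print(highest_constituents(vp))
print(highest_constituents(cp))
```

In the simple VP only the subject qualifies, whereas in the complementiser structure the head and its sister c-command each other symmetrically, so both qualify.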
A complementiser-introduced subordinate clause now has two highest constituents, because the complementiser and its complement mutually c-command each other. SGP(top) can only be fulfilled for one of them. But SGP(hc-top) ensures that the complementiser is the first element. The fronting of a third constituent would be dispreferred, as it results in a double violation of SGP(top). For the examples in (35), we can assume that the left dislocated phrase is outside of the clause in both S and P. The highest constituents of the matrix clauses are the resumptive pronouns ‘da’, ‘der/den’, respectively. SGP(top) is fulfilled. With high rank of this constraint, we only need to have MGS(Inf) or MGP(Inf) ranked higher than both MGS(Arg) and MGP(Arg).19

Let us now discuss some concrete examples. First, a simple subject-initial clause:

(38) Hans liebt Maria
     H.-NOM loves M.-ACC

We keep the syntactic structure constant for the candidates, a simple VP which is represented according to X-bar theory, and we only consider the six logically possible orders of the elements. The constraints in the tableau are those that are relevant here:

(39) Simple clause structure in German:
[VP S [V’ V O ]]   SGP(hc-top)  SGP(top)  SGP(ch)
☞ SVO                                    *
   SOV             *!
   VOS                          *!        *
   OVS             *!           *
   VSO                          *!        *
   OSV             *!           *
With an auxiliary, the picture slightly changes: the auxiliary is in second position, and the main verb is clause-final:

(40) Hans hat Maria geküsst
     H.-NOM has M.-ACC kissed
SGP(hc-top) is fulfilled by the auxiliary, and so the verbal participle occurs clause-finally in order to fulfil SGP(ch). For reasons of space, we only display candidates with SOV and SVO order in the tableau below:

(41) Simple clause structure with auxiliary in German:

[IP S Aux [VP [V’ V O ] ] ]   SGP(top)  SGP(hc-top)  SGP(ch)
   c1: AuxSVO                 *!                     **
   c2: AuxSOV                 *!                     *
   c3: SAuxVO                                        **!
☞ c4: SAuxOV                                        *
   c5: SVAuxO                           *!           *
   c6: SOAuxV                           *!           *
   c7: SVOAux                           *!           *
   c8: SOVAux                           *!
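The evaluation in (41) can be reproduced over all 24 orders of the four elements with a short Python sketch. The encoding is an assumption made for illustration: SGP(top) requires the subject to be string-initial, SGP(hc-top) requires Aux to precede both V and O, and SGP(ch) is counted once for each head that precedes (the head of) its complement, which matches the mark counts in the tableau. The names are hypothetical.

```python
from itertools import permutations

WORDS = ["S", "Aux", "V", "O"]

def before(p, a, b):
    """True if token a precedes token b in the order p."""
    return p.index(a) < p.index(b)

# Each function returns the number of violation marks for an order p.
CONSTRAINTS = {
    # The highest constituent (the subject) must be leftmost.
    "SGP(top)": lambda p: int(p[0] != "S"),
    # The topmost head (Aux) must precede its whole complement (V and O).
    "SGP(hc-top)": lambda p: int(not (before(p, "Aux", "V")
                                      and before(p, "Aux", "O"))),
    # Complement before head, one mark per head-complement pair:
    # V before O, and Aux before the verbal head of its complement.
    "SGP(ch)": lambda p: int(before(p, "V", "O")) + int(before(p, "Aux", "V")),
}

def optimal(ranking):
    """Orders with the lexicographically smallest violation vector."""
    candidates = list(permutations(WORDS))
    key = lambda p: tuple(CONSTRAINTS[c](p) for c in ranking)
    best = min(key(p) for p in candidates)
    return [" ".join(p) for p in candidates if key(p) == best]

GERMAN = ["SGP(top)", "SGP(hc-top)", "SGP(ch)"]
# Hypothetical comparison: the same constraints with SGP(hc-top) demoted.
NO_V2 = ["SGP(top)", "SGP(ch)", "SGP(hc-top)"]

print(optimal(GERMAN))  # auxiliary second, main verb final
print(optimal(NO_V2))   # consistently head-final order
```

Demoting SGP(hc-top) below SGP(ch) removes the verb-second effect in this toy setting, which illustrates why the high rank of that constraint is what makes the topmost projection head-initial.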
We see that the two constraints that take care of the topmost projection exclude all structures that do not begin with the sequence “S Aux”. SGP(ch) then makes the decision in favour of the candidate c4, which has head-final order for the VP. I leave out the table that shows the result for subordinate clauses. The winning candidate roughly conforms to c8 in the previous tableau, with the addition of an initial complementiser. That structure performs optimally on SGP(top) and SGP(hc-top) because of the complementiser, and has the ideal head-final order for the non-initial projections, which leads to a minimal violation of SGP(ch). Next, we turn to an example with a topicalised direct object:

(42) Das Buch schrieb Helga
     the book wrote H.

(43) Topicalisation of an object in German:

schrieb(h,b), b=[+PROM]  SGP(top)  SGP(hc-top)  MGP(Inf)  SGP(ch)
☞ [+mvt,OVS]                                             *
   [–mvt,SVO]                                   *!        *
   [+mvt,OSV]                      *!
   [–mvt,SOV]                      *!           *
A number of further possible candidates are not listed here, for instance, V-initial orders. As all of these alternative variants would violate SGP(top), I excluded them here for ease of representation. We only consider candidates that fulfil this constraint. The second constraint, SGP(hc-top), enforces verb-second and excludes the candidates with the verb in final position. Between the two remaining candidates, SVO order without movement and OVS order with movement, MGP(Inf) makes the decision. This reconstructs the standard analysis of such cases: object fronting is triggered by information structure, thereby observing the verb-second requirement in German. The case of wh-movement is very similar. Here, the constraint MGS(Wh) makes the decision:

(44) Was schrieb Helga?
     What wrote H.

(45) Wh-movement of an object in German:
Qx.schrieb(h,x)   SGP(top)  SGP(hc-top)  MGS(Wh)  SGP(ch)
☞ [+mvt,OVS]                                     *
   [–mvt,SVO]                            *!       *
   [+mvt,OSV]               *!
   [–mvt,SOV]               *!           *
5. Multiple Questions

The system developed thus far predicts minimality effects with multiple questions. Consider the following German example:

(46) Was kaufte wer?
     what bought who?

With two wh-phrases, the problem arises that only one of them can fulfil MGS(Wh). We would either have a structure like (47a) or (47b), in both of which the verb is outside of the scope of the lower wh-phrase.

(47) a. [VP [NP wer] [V’ [V0 schrieb] [NP was] ] ]
     b. [VP [NP was]i [V’ [V0 schrieb]j [VP [NP wer] [V’ tj ti ] ] ] ]
As a consequence of this, MGS(Wh) cannot decide between the two structures. MGP(Wh) could be fulfilled by a candidate that has both wh-phrases in front in the linear string, but this would violate the highly ranked SGP(hc-top), which requires the complement of I0, including the lower wh-phrase, to remain to the right of the finite verb. In addition, structure (47b) performs worse than (47a) in MGS(Arg) and MGP(Arg). Therefore, whatever constraint ranking one chooses, (47b) will lose against (47a). In the light of an example like (46), this might look like a bad result. On the other hand, this situation simply derives the superiority effect. For English, where a clause like (46) is ungrammatical, this would be a fine result, and even German native speakers often judge clauses like (47b) worse than (47a). There is a universal markedness relation between these two structures. Probably all languages that allow for (47b) also allow for (47a), but not vice versa. The survival of (46) in German in spite of its markedness must be due to another factor. The standard OT answer to such situations is that there is not only markedness, but also faithfulness: the force to preserve the input in the output. In the system developed thus far, there is no place for such faithfulness. The correspondence constraints that we established are in fact a kind of faithfulness constraint under the more general conception of faithfulness and correspondence developed by McCarthy and Prince (1995). But our correspondence constraints only evaluate the translations between different types of representations. (47a,b) are of equal status here, and it is impossible to give (47b) the advantage it needs. This would only be possible if we specified in the input already that we want structure (47b). Thus far, I assumed a standard version of OT syntax with M in the input, and pairs [S,P] as output candidates.
What we obviously need is a specification for S in the input already, and faithfulness constraints controlling for the preservation of the “input S” in the “output S”. I argued for such an architecture elsewhere already.20 The main motivation for this move is the observation that there usually is more than one way to express the same meaning. Different modes of expression cannot always be traced back to some functional or subtle semantic-pragmatic difference in a predictable manner. The choice of syntactic construction in an utterance is often guided by stylistic and rhetorical factors, perhaps statistical factors like salience, and could even be simply accidental. The alternative to specifying S in the input would be an integration of any such factors into M, with the underlying assumption that all syntactic differences must have a predictable semantic or pragmatic “cause”. We are far
from being able to decide whether this is true or not. I assume that it is not true. Anyway, for now it seems more practicable to presuppose this decision for a particular syntactic structure. The OT grammar then evaluates whether that structure – represented by the syntactic part of the input – is wellformed, i.e., is correlated with a faithful output candidate by the evaluation procedure. One reviewer expresses a worry concerning this architecture. According to him, it “raises the clear danger of stipulating in the input all the syntactic properties that we wish to derive and then use faithfulness to distinguish one structure from the other. The result would be highly stipulative with no genuine explanation of the syntactic properties at issue.” To my mind, this worry does not affect the current approach in any sense. First of all, the question of what is a legitimate syntactic structure in general is not decided casewise for individual languages in generative grammar. A legitimate syntactic structure is any structure that can be built with the candidate generation function, GEN. In this paper, I assume for GEN a simplified version of X-bar theory, excluding adjunction, and allowing for “forests” of phrases instead. All we need is a well-defined grammar formalism. Whether that formalism is stipulative in itself is an independent issue. The question which of these structures is possible in a language is answered by that language’s grammar, its constraint hierarchy. The constraint system that I use here is also very restricted and independently motivated by the underlying concept of correspondence. It is not the task of the constraint hierarchy to explain the syntactic properties of winning candidates. This is again the subject of GEN. The constraint hierarchy explains why candidates with such properties are grammatical in a given language. The reviewer’s worry concerns a possible unsystematic usage of faithfulness. 
I think that this danger is much higher in an alternative to the proposed system that has only M in the input. I showed above that the proposed system predicts a typological implication, namely, that languages that allow for object-subject orders also allow for subject-object orders, but not necessarily vice versa. The ability to derive such markedness relations is a well-known property of OT systems using faithfulness. In order to account for optionality, the alternative theory has to play around with the definition of M in a very stipulative manner. The risk of losing typological implications because of arbitrary input definitions appears to me much higher.
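The typological implication mentioned here can be checked mechanically. The following is a minimal sketch, not the constraint system of this paper: it assumes only two constraints, a faithfulness constraint standing in for the preservation of the input S and a markedness constraint standing in for MGS(Arg), and the names FAITH, MARK, SO and OS are labels chosen for illustration. It enumerates both rankings and records which input orders surface unchanged.

```python
from itertools import permutations

ORDERS = ("SO", "OS")  # subject-object and object-subject order


def faith(input_s, output_s):
    """Illustrative faithfulness: one violation if the output S differs from the input S."""
    return 0 if input_s == output_s else 1


def mark(input_s, output_s):
    """Illustrative stand-in for MGS(Arg): object-before-subject order is marked."""
    return 1 if output_s == "OS" else 0


CONSTRAINTS = {"FAITH": faith, "MARK": mark}


def winner(ranking, input_s):
    """Strict domination: violation vectors are compared lexicographically."""
    return min(
        ORDERS,
        key=lambda out: tuple(CONSTRAINTS[name](input_s, out) for name in ranking),
    )


def typology():
    """Map every ranking to the set of orders that survive as faithful winners."""
    return {
        ranking: {s for s in ORDERS if winner(ranking, s) == s}
        for ranking in permutations(CONSTRAINTS)
    }


for ranking, grammatical in typology().items():
    print(" >> ".join(ranking), sorted(grammatical))
```

Under these assumptions, the ranking with faithfulness on top admits both orders, while the reverse ranking admits only the unmarked one. No ranking admits the marked order alone, which is the implication at issue.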
The whole discussion boils down to one central question which is answered in two different ways, the problem of syntactic optionality. The two answers are:
1. Optionality of syntactic structures is an irreducible property of natural languages. Therefore the input must be a pair [M,S].
2. Optionality of syntactic structures is only apparent. There are no two syntactic structures in any language of the world that have the same meaning. Therefore the input must contain M only.
To my mind, answer 2 is nothing more than wishful thinking, although it might be correct for a subset of cases of syntactic optionality. A functional argument from language use in favour of answer 1 is that answer 2 requires a level of linguistic precision that is never reached in everyday conversation. Language is an efficient means for communication because of this robustness. We can understand each other although the expressions we use are often “imperfect”. The alternative strategy, suggested by the reviewer, asks the question “How is a meaning M expressed in a language?”, on which the model proposed here comments that there are many ways of expressing M, and that it is impossible to give a single answer. Therefore the question has to be restated: “Is it possible to express a meaning M with the structure S in a language?”. This strategy presupposes a certain amount of imperfection and non-correspondence. It describes the degree of tolerance of such imperfections by the relative ranking of faithfulness. Note also that this is the strategy that is typically chosen when linguists do syntactic typology. We ask questions like “Is it possible to have superiority violations in multiple questions in a language?”. This question describes an input that contains both a semantic (multiple question) and a syntactic (the object wh-phrase precedes the subject wh-phrase) specification. To sum up, the model we arrive at is sketched in (48); input and output S are distinguished by subscripts.
(48) Input and output representations in OT syntax (Vogel 2002, 2004):
Input: SI, M
Output: SO, P
Preservation of SI in SO is enforced by faithfulness constraints. We could assume a large set of such faithfulness constraints, but for our purpose it is sufficient to use the general formulation in (49):
(49) SIGSO: SI and SO are identical.
To ensure that the minimality violating structure survives, we only need to rank SIGSO higher than MGS(Arg), and we get the correct predictions for multiple questions in German.21
6. Freezing, Recoverability and Bidirectionality
Free word order languages sometimes display surprising minimality effects with ordinary NPs, under surface ambiguity of the NPs involved. This phenomenon of “word order freezing” has been discussed at length by Lee (2001a, 2001b, this volume). In colloquial Korean, case markers can often be omitted. However, the NPs are then surface ambiguous for case, and their order is no longer free: subject-object order is the only available option:
(50) Mary Jane manna-ss-e
Mary Jane meet-pst-decl
a. ‘Mary met Jane.’
b. *‘Jane met Mary.’
(Lee 2001b, 111)
It is not surprising that German, another free word order language, shows the same effect:
(51) German, freezing to SO order:
a. Den HANS liebt Maria, ohne ihn zu kennen
the H.-ACC loves M. without him-ACC to know
“It is Hans who Maria loves, without knowing him”
b. HANS liebt Maria, ohne ihn zu kennen
H.-NOM/ACC loves M.-ACC/NOM without him-ACC to know
?? “It is Hans who loves Maria without knowing him”
(51b) has a strange interpretation, the person called ‘Maria’ is male. The reason is obviously that the plausible interpretation with the underlying marked OVS order in the matrix clause is inaccessible. Both proper nouns are morphologically unspecified for case. The important observation here is that a syntactic constraint, in particular: MGS(Arg) is active in the interpretive optimisation. We have a minimality effect in interpretation. This suggests that the two perspectives of optimisation are two sides of the same coin. We can combine them in a bidirectional model of OT syntax that uses the same set of ranked constraints for production-oriented and interpretation-oriented optimisation. I follow Lee in this conclusion. The two perspectives are combined in the definition of grammaticality given below in (53). It implements Pesetsky’s (1997, 1998) intuition that recoverability of underlying information constrains surface forms. Contrary to Pesetsky’s original approach, recoverability is not taken to be an OT constraint, but a condition on grammaticality and therefore inviolable. In a model that generates [underlying form,surface form] pairs, a violation of recoverability by a given pair proves its illegitimacy. If u1 is not recoverable from s1, then a different underlying form must be more optimal for s1, e.g., u2. In such a case, u1 is blocked by u2 in the interpretive optimisation for the input s1. I use the terms “first” and “feedback” optimisation for the two optimisation steps. This suggests a serial order of the two processes, but this is not essential. A reversal in the order would not change the results. The two optimisation steps should be seen as operating in parallel. Lee (2001a, 2001b, this volume) uses the terms “productive and interpretive optimisation”, following Smolensky (1996). The reason why I do not adopt these terms is that I want to avoid the usage of vocabulary from performance grammar when discussing the properties of a competence grammar. 
The two optimisation processes can be seen as independent of each other. They are only combined because of the way we define grammaticality. (52) Input and output representations in bidirectional OT syntax:
First optimisation:
Input: SI, M
Output: SO, P
Feedback optimisation:
Input: P
Output: SI, M
(53) Grammaticality: A triple [Mi,Si,Pi] is grammatical, if and only if the input [Mi,Si] yields [Si,Pi] in first optimisation, and the input [Pi] yields [Mi,Si] in feedback optimisation.
Instead of repeating Lee’s discussion here, I want to introduce additional observations concerning the behaviour of German wh-phrases. While a clause with two ambiguous wh-phrases like (54a) shows the same freezing effects as the examples given in (51), the effect disappears if only one of the two ambiguous NPs is a wh-phrase (54b,c):
(54) a. Welche Lehrerin besuchte welches Kind, ohne sie zu kennen?
which teacher-FEM-NOM/ACC visited which child-NEUT-NOM/ACC without her-ACC to know
inaccessible interpretation: “Which teacher did which child visit without knowing her?”
b. Welche Frau küsste Hans, ohne sie zu kennen?
which woman-NOM/ACC kissed H. without her to know
“Which woman did Hans kiss without knowing her?”
c. Welche Frau küsste Hans, ohne ihn zu kennen?
which woman-NOM/ACC kissed H. without him to know
“Which woman kissed Hans without knowing him?”
This lack of a freezing effect is unexpected under the account as presented thus far. In the input, we have a marked structure, an OVS order. High ranked syntactic faithfulness, SIGSO, ensures that this order survives first optimisation although it violates MGS(Arg). In feedback optimisation we have the P part of the winning candidate as input. Freezing occurs if P is morphologically ambiguous for two underlying S’s, the marked OVS and the less marked SVO. This is the case here. SIGSO can no longer rescue the marked structure, since there is no S representation in the input, and so MGS(Arg) becomes decisive in feedback optimisation: SVO wins against OVS, and the criterion for grammaticality given above is not met. The tables in (55) illustrate this.
(55) First optimisation:
S = OVS | SIGSO | MGS(Arg)
☞ OVS | | *
SVO | *! |

Feedback optimisation:
P = Hans…Maria | SIGSO | MGS(Arg)
OVS | | *!
☞ SVO | |
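The interplay of the two optimisation steps in (53) and (55) can be made concrete with a small sketch. It is a strong simplification under stated assumptions: the meaning M is held constant and left out, each structure has exactly one spell-out (so candidate pairs [S,P] reduce to S plus its surface string), and the ranking SGP(case) >> faithfulness >> MGS(Arg) is hard-coded. The function names and the string encoding of P are hypothetical and serve illustration only.

```python
STRUCTURES = ("SVO", "OVS")


def spell_out(s, overt_case):
    """Hypothetical spell-out of structure s as a surface string P."""
    if not overt_case:
        return "NP V NP"  # case-ambiguous: both structures sound alike
    return "NP V NP-acc" if s == "SVO" else "NP-acc V NP"


def violations(s_out, p, s_in=None):
    """Violation vector in the assumed ranking order:
    SGP(case) >> faithfulness (input S = output S) >> MGS(Arg)."""
    words = p.split()
    object_slot = 2 if s_out == "SVO" else 0  # where s_out places the object
    acc_slot = next((i for i, w in enumerate(words) if w.endswith("-acc")), None)
    sgp_case = int(acc_slot is not None and acc_slot != object_slot)
    faith = int(s_in is not None and s_in != s_out)  # vacuous without an input S
    mgs_arg = int(s_out == "OVS")  # object precedes subject
    return (sgp_case, faith, mgs_arg)


def first_optimisation(s_in, overt_case):
    """Input: S (M held constant). Output: the optimal pair [S, P]."""
    candidates = [(s, spell_out(s, overt_case)) for s in STRUCTURES]
    return min(candidates, key=lambda c: violations(c[0], c[1], s_in))


def feedback_optimisation(p):
    """Input: P. Output: the optimal underlying S."""
    return min(STRUCTURES, key=lambda s: violations(s, p))


def grammatical(s_in, overt_case):
    """Definition (53), restricted to S and P: both directions must agree."""
    s_out, p = first_optimisation(s_in, overt_case)
    return s_out == s_in and feedback_optimisation(p) == s_in


for s in STRUCTURES:
    for overt in (False, True):
        print(s, "overt case" if overt else "no overt case", grammatical(s, overt))
```

In this sketch the OVS input without overt case wins first optimisation but is not recovered in feedback optimisation, so it counts as ungrammatical (freezing), whereas the same input with an overtly accusative first NP is recovered, as with 'den' in (51a).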
With disambiguation by the preceding determiner ‘den’ in (51a), the marked structure survives because SGP(case) is ranked higher than MGS(Arg) and enforces the noun with accusative morphology to be interpreted as direct object. The scenario in (55) holds for (54a) and (54c), but it makes the wrong predictions for (54b). Obviously, argument structure makes no decision at all here: OVS is not blocked by SVO. It seems as if there was no constraint MGS(Arg) at all. But we know that it is there. What makes the difference is, of course, the fact that we are comparing the relative order of a wh-NP and a non-wh-NP. The two do not interfere with each other. This observation is in line with Rizzi’s (1990) conception of relativised minimality. But such relativisation is not encoded in our constraint set yet. An easy way of doing it would be assuming that MGS(Arg) applies only to XPs of the same type. In that case, MGS(Arg) could not be violated, because the two NPs are of different type. This would work quite well with German, but we would run into the danger of being unable to derive wh-in situ languages like Chinese. The sub-ranking “MGS(Arg) >> MGS(Wh)” is a reasonable candidate for the driving force behind wh-in situ. Hence, MGS(Arg) must be violated by wh-fronting in a clause like (54b). But it may not be violated by the same structure in German, as the grammaticality of this structure shows. As a way out of this dilemma I propose that the argument hierarchy is derived differently in the two languages. This difference results in different behaviour of the same structure with respect to MGS(Arg). Consider a predicate logic formula for a simple question like “who did John kiss?” as part of its semantic representation, M:
(56) Qy.kiss(x,y)
For the determination of the argument hierarchy on the basis of this formula, it is crucial which occurrence of ‘y’ in (56) is decisive. Let us assume that this decision is made differently in different languages. In German the argument
hierarchy is based first on the operatorhood of an element and in the second place on the argument position, while in Chinese it is the other way around. This can be described in an OT fashion:
(57) Determination of argument hierarchies in German and Chinese:
German:
Qy.kiss(x,y) | Operator | Arg. Str.
x>y | *! |
☞ y>x | | *

Chinese:
Qy.kiss(x,y) | Arg. Str. | Operator
☞ x>y | | *
y>x | *! |
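The two mini-grammars in (57) can be sketched as follows. The code assumes, for illustration only, that "Operator" is violated when the operator-bound variable is not highest in the hierarchy and that "Arg. Str." is violated once for every pair of variables whose order departs from the order of the predicate's argument list; these exact definitions, and all function names, are assumptions of the sketch rather than definitions given in the text.

```python
from itertools import permutations


def operator(hierarchy, args, op_var):
    """Assumed definition: violated if the operator-bound variable is not highest."""
    return int(hierarchy[0] != op_var)


def arg_str(hierarchy, args, op_var):
    """Assumed definition: one violation per pair that inverts the predicate's
    argument order, e.g. y > x for kiss(x,y)."""
    return sum(
        1
        for i in range(len(hierarchy))
        for j in range(i + 1, len(hierarchy))
        if args.index(hierarchy[i]) > args.index(hierarchy[j])
    )


GERMAN = (operator, arg_str)   # Operator >> Arg. Str.
CHINESE = (arg_str, operator)  # Arg. Str. >> Operator


def argument_hierarchy(ranking, args, op_var):
    """Return the optimal hierarchy (highest element first) under strict domination."""
    return min(
        permutations(args),
        key=lambda h: tuple(constraint(h, args, op_var) for constraint in ranking),
    )


# Qy.kiss(x,y): arguments in the order (x, y), with y bound by the question operator.
print("German: ", " > ".join(argument_hierarchy(GERMAN, ["x", "y"], "y")))   # y > x
print("Chinese:", " > ".join(argument_hierarchy(CHINESE, ["x", "y"], "y")))  # x > y
```

With a subject question (operator variable x), both rankings return x > y in this sketch, since no conflict between the two constraints arises.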
The argument hierarchies are presupposed by the constraints MGS(Arg) and MGP(Arg). Wh-in-situ now fulfils MGS(Arg) in Chinese, while it violates that constraint in German for the given structure. However, when the wh-phrase is fronted, MGS(Arg) is fulfilled in German, no matter whether the wh-phrase is subject or object, provided that it is the only wh-phrase in the clause. In Chinese, MGS(Arg) can only be fulfilled if the wh-phrase remains in situ. Ranking MGS(Arg) higher than MGS(Wh) derives the Chinese pattern. I finally want to show that the bidirectional strategy can successfully be applied to a long-standing empirical problem in German syntax which is exemplified in (58).
(58) a. Wem1 hat er abgeraten, sofort wem2 nach Saloniki nachzureisen?
who-DAT has he dissuaded immediately who-DAT to Saloniki to-travel-after
“Who has he dissuaded from travelling after whom to Saloniki?”
b. *Wem1/2 hat er (t1) wem2/1 abgeraten, sofort t2 nach Saloniki nachzureisen?
who-DAT has he who-DAT dissuaded immediately to Saloniki to-travel-after
intended reading: “Who is the person such that he has dissuaded whom to travel after that person?”
c. Wohin2 hat er wem1 abgeraten, der Prinzessin t2 nachzureisen?
where-to has he who-DAT dissuaded the princess-DAT to-travel-after
“What is the place such that he has dissuaded whom to travel there after the princess?”
(cf. Haider 1996, Haider 2000, Fanselow 1991)
The ungrammaticality of (58b) can be interpreted as a freezing effect across clause boundaries. We see in (58c) that an embedded wh-phrase may extract across a wh-phrase in the matrix clause. However, under surface identity of the two phrases, this option breaks down. The explanation for this problem follows the same line as before: In feedback optimisation, an underlying syntactic structure is chosen that fulfils MGS(Arg).22 The difference to the freezing cases discussed above is that this winning interpretation is itself ungrammatical. This is due to the vacuous movement of the embedded wh-phrase into the middle field of the matrix clause. For the representation of this vacuous movement, we have two options in our model. It could be represented at both S and P, or, as adjunction, only at P. The latter option is the more realistic one, as the example looks like a typical case of scrambling, which is mostly, though not always, analysed as an instance of adjunction in German.23 Let us first discuss how a candidate with vacuous syntactic movement of ‘wem2’ in S is excluded. The structure of this candidate is sketched in (59):
(59) [CP wh-NP1 … wh-NP2 t1 V [CP … t2 V ]]
To exclude this structure, we need a constraint that is ranked higher than SIGSO. Let us assume a constraint on “clausemateness” in the following way:24
(60) MGS(cm): If a predicate and one of its arguments are both constituents of the same tree in S, then they must be clause-mates.
Note the conditional clause in this definition. In the case of adjunct arguments, verb and argument are not clause-mates in S. But we do not want the constraint to penalise such cases, because this is not a case of syntactic dislocation. MGS(cm) can be violated by a winner in German, as exemplified by (58c). So the constraint must be ranked relatively low. It may, however, only be violated in order to fulfil a higher ranked constraint, like, e.g., MGS(Wh). Thus, the constraint on vacuous movement we are looking for should be a conjoined constraint, penalising a violation of MGS(cm) without fulfilling MGS(Wh):
(61) MGS(cm) &XP MGS(Wh)(*VAC): No simultaneous violation of MGS(cm) and MGS(Wh) by the same XP.
The ranking that we then need is the following:
(62) *VAC >> SIGSO >> MGS(cm)
This ranking excludes the structure in (59) already in first optimisation. The second option for deriving (58b) is interpreting it only as reordering at P. The S of this candidate would have the structure of (58a). Such an order would violate the following constraint:
(63) MGP(cm): A predicate and its arguments are clause-mates at P.
One could also imagine a clause-mate constraint for SGP mapping:
(64) SGP(cm): If s1 is a constituent of the clause s2 at S, then p1 is contained in p2 at P.
A high rank of this constraint would result in the blocking of all P movement across clause boundaries. This might be a plausible restriction for German. However, the other constraints on SGP mapping already cover effects of such a constraint.25 For this reason, I will avoid using it here. The easy task is showing that (58a) has the ideal linearisation of elements for an underlying syntactic configuration where the matrix clause ‘wem’ is fronted:
(65) First optimisation for (58a):
S = [ wem1 …t1 V [ wem2 V ]] | MGS(Wh) | MGS(Arg) | MGP(cm)
☞ c1: wem … … wem … V | * | |
c2: wem … wem … V … V | * | | *
Candidate c2, representing (58b), violates MGP(cm) in addition to MGS(Wh) (which is violated by both candidates) without gaining anything. c1 therefore blocks c2 in first optimisation. (58b) cannot have the interpretation given in the input in (65). However, under the interpretation where the embedded wem is fronted, first optimisation lets this order survive in the same way as it lets (58c) survive. In feedback optimisation, we have this order in the input, and look for its optimal underlying structure. We have three candidates, and we need to derive that the minimality violating structure, the original input (indicated with “G” in (66)) now loses:
(66) Feedback optimisation of (58b):
P = wem … wem … V … V | *VAC | MGS(Wh) | MGS(Arg) | MGS(cm) | MGP(cm)
G c1: [ wem2 … wem1 V [ t2 V ]] | | * | *! | * | *
☞ c2: [ wem1 … t1 V [ wem2 V ]] | | * | | | *
c3: [ wem1 … wem2 t1 V [ t2 V ]] | *! | * | | * | *
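The evaluation in (66) can be replayed with a short sketch that also shows how the conjoined constraint *VAC in (61) is computed from violations incurred by one and the same XP. The violation marks are entered by hand as read off the tableau and note 26; the per-phrase bookkeeping, the data layout and the use of the left-to-right column order of (66) as the ranking are assumptions made for illustration.

```python
# Assumed ranking: the left-to-right column order of tableau (66).
RANKING = ["*VAC", "MGS(Wh)", "MGS(Arg)", "MGS(cm)", "MGP(cm)"]

# "per_xp": violations attributable to an individual wh-phrase in S.
# "other": violations not tied to a single phrase in this sketch.
CANDIDATES = {
    "c1": {  # [ wem2 ... wem1 V [ t2 V ]], the original input
        "per_xp": {"wem1": {"MGS(Wh)"}, "wem2": {"MGS(cm)"}},
        "other": {"MGS(Arg)": 1, "MGP(cm)": 1},
    },
    "c2": {  # [ wem1 ... t1 V [ wem2 V ]]
        "per_xp": {"wem1": set(), "wem2": {"MGS(Wh)"}},
        "other": {"MGP(cm)": 1},
    },
    "c3": {  # [ wem1 ... wem2 t1 V [ t2 V ]], vacuous movement in S
        "per_xp": {"wem1": set(), "wem2": {"MGS(cm)", "MGS(Wh)"}},
        "other": {"MGP(cm)": 1},
    },
}


def profile(candidate):
    """Violation vector in ranking order, with *VAC derived by local conjunction."""
    counts = dict.fromkeys(RANKING, 0)
    for violated in candidate["per_xp"].values():
        for constraint in violated:
            counts[constraint] += 1
        # (61): *VAC is violated only if the SAME XP violates both conjuncts.
        if {"MGS(cm)", "MGS(Wh)"} <= violated:
            counts["*VAC"] += 1
    for constraint, n in candidate["other"].items():
        counts[constraint] += n
    return tuple(counts[c] for c in RANKING)


def winner():
    """Strict domination: the lexicographically smallest violation vector wins."""
    return min(CANDIDATES, key=lambda name: profile(CANDIDATES[name]))


for name, cand in CANDIDATES.items():
    print(name, dict(zip(RANKING, profile(cand))))
print("winner:", winner())
```

In c1 the two conjuncts are violated by different phrases, so *VAC stays unviolated and the decision falls to MGS(Arg), whereas in c3 both violations come from 'wem2' and *VAC is fatal.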
The worst candidate is c3, the one with vacuous movement in S. Because of the high rank of *VAC, this candidate has no chance in first optimisation either, as discussed above.26 MGS(Wh) cannot be fulfilled by the candidates, because we have two wh-phrases and only one position where they could fulfil that constraint. And now it is again MGS(Arg), the minimality constraint, that makes the decision against the initial input. The winning underlying structure is one that has no chance to win with the P under consideration here, because of the availability of the order (58a) in first optimisation. Clause (58b) is now excluded under all three possible inputs. The MLC obeying interpretation is excluded as an instance of vacuous movement in first optimisation. The MLC violating interpretation survives first optimisation, but loses in feedback optimisation against the MLC obeying structure, but this structure is paired in first optimisation with a more optimal linearisation, the one without vacuous movement in (58a). There is no grammatical triple [M,S,P] with (58b) as P according to the definition of grammaticality in (53). How can the blocking interactions described here be avoided in the case of the well-formed (58c)? The difference to (58b) is that a directional phrase is extracted from the subordinate infinitival clause. The blocking candidate would be an interpretation where the directional PP belongs to the matrix verb, and the vacuously moved dative object would belong to the embedded verb. This would be equivalent to a clause like (67):
(67) *Wohin2 hat er abgeraten, wem1 der Prinzessin t1 nachzureisen?
where-to has he dissuaded who-DAT the princess-DAT to-travel-after
“Whereto did he dissuade to travel after whom after the princess?”
While one argument of ‘dissuade’, the addressee, is missing, the verb has an additional argument, the directional PP, of which we cannot make sense. Likewise, the embedded verb now has two dative objects instead of one, and the thematic role of one of them is unclear. This candidate is subject to semantic markedness in a way that (58b) is not, because in the latter case two identical wh-phrases exchange their semantic roles. Semantic markedness surely plays a role in interpretive optimisation. An appropriate constraint should be ranked highly enough to rule out the interpretation in (67) early. This avoids the blocking of (58c) by a candidate like c2 in (66). Candidate c1 now becomes optimal. The analysis presented here requires a fully parallel conception of the bidirectional architecture. The candidate set of feedback optimisation is not restricted to winners of the first optimisation. Herein the proposed model differs from serial bidirectional models like that of Wilson (2001) or Zeevat (2001).
We also see that the bidirectional model is really a model of syntactic bidirectionality. Interpretation, here understood as interpretation of a surface form, is syntactic parsing and semantic interpretation at the same time.
7. Conclusion
I explored in this paper how a correspondence based conception of OT syntax derives classical minimality effects. I showed that the idea of structure preservation can be reconstructed in terms of correspondence between different representations. In standard “GB” (Chomsky 1981) and minimalism (Chomsky 1995), the configuration that underlies minimality effects among NPs is a projection according to the thematic hierarchy. Elements are inserted into their θ-positions. This yields the asymmetric c-command relations between arguments that induce minimality effects in case- and operator-movement. This analysis is reconstructed here in terms of MGS and MGP mapping. The semantic relations encoded at M have to be translated into syntactic relations in terms of an iconic mapping that translates, for instance, relative scope and argument hierarchy into asymmetric c-command and precedence. I then showed how this model extends to cases of word order freezing. These are hard to account for in unidirectional models like minimalism and standard OT. The reason is that freezing is dependent on the surface form. It occurs if a marked syntactic structure is homophonous to a less marked one. Unidirectional grammars model syntax as a mapping from S to P, as a feeding relation. They do not take into account whether the reverse mapping, from P to S, also holds. Word order freezing is a case where this reverse mapping fails. As this is obviously crucial for grammaticality, the definition of grammaticality has to take into account both directions. This has been done with a bidirectional version of OT syntax here. Two different kinds of minimality effects, superiority and word order freezing, are treated in an analogous fashion. In both cases, the constraint MGS(Arg) plays a crucial role. It takes effect in first optimisation in the case of superiority, and in feedback optimisation in the case of freezing. The bidirectional perspective brings the surface form into the centre of attention. All underlying information needs to be encoded in terms of the surface form, and this must be done in such a way that it is recoverable. This defines grammaticality.
The conditions for these translation processes can be formulated as OT correspondence constraints in the way demonstrated in this paper. While this surface orientation is a new perspective in OT syntax in the radical way proposed here, other features of the model are very much inspired by previous work: LFG-OT makes extensive use of correspondence (Bresnan 2000; Kuhn 2001). The constraint MGS(Wh) expresses more or less the same as Grimshaw’s (1997) “OpSpec”. Likewise, SGP(hc) is reminiscent of Grimshaw’s “HeadLeft”, though the two do not express exactly the same. The idea of specifying syntactic features in the input is present in (Bakoviç and Keer 2001; Legendre et al. 1998).
Acknowledgements
This paper has grown over a longer period of time. I want to thank everyone who helped me clarify my thoughts on these issues: Gereon Müller, Sten Vikner, Fabian Heck, Tanja Schmid, Silke Fischer, Jonas Kuhn, Gunnar Hrafnbjargarson, Gisbert Fanselow, Arthur Stepanov, Joanna Blaszczak, Susann Fischer, Florian Schäfer, Eva Engels, Andreas Haida, Hanjung Lee, Yukiko Morimoto, Gerhard Jäger, Doug Saddy, Peter Staudacher, Hubert Haider, Hans-Martin Gärtner, Cornelia Endriss, Jane Grimshaw, Judith Aissen, Juan Uriagareka, Vieri Samek-Lodovici, John Hale, Géraldine Legendre, three anonymous reviewers, and the audiences of several presentations in Potsdam in November 2001 (WOTS 5), July 2002 (minimal link workshop), and in Cologne (January 2003). The copyright for all errors and shortcomings contained in this paper is mine, however. This work has been supported by the Deutsche Forschungsgemeinschaft, research group “Conflicting Rules in Language and Cognition”, FOR–375/2–A3.
Notes
1. Legendre et al. (1998) use Chomsky’s (1986) definition of barrier, where IP is a barrier by inheritance from VP. This is here carried over to all functional projections above VP.
2. See the collection by Legendre et al. (2001) for a representative overview. The state of the art in LFG-oriented OT syntax is documented in (Bresnan, 2000) and the collection edited by Peter Sells (2001).
3. (Choi, 1996) and (Büring, 2001) are two cases in point.
4. A brief remark about frameworks: S, the abstract syntactic representation, is encoded within X-bar theory in this paper. But this is not crucial for the proposed analysis. S could as well be an HPSG feature structure, or a combination of f-structure and c-structure in LFG (without linearisation). The choice of representational alphabet is not of the same importance within OT syntax as in non-OT syntax, because all explanation lies or should lie in the constraints and their interaction. The constraints used in this paper deal with the correspondence between representations and are therefore formulated relative to the representational language that is used, but I expect that the constraints can be recoded to work with HPSG, LFG, or any other syntactic framework.
5. In a syntax class in winter 2001/2002, University of Potsdam, Juan Uriagareka gave a very similar formulation of this idea in correlation with his conception
6. 7.
8.
9. 10.
11. 12.
13. 14.
of “multiple spell-out” (Uriagareka, 1999): An adjunct is generated in parallel to the matrix tree, and adjoined, i.e. adjacent to that part of the tree that it is spelled out together with. A similar proposal within minimalism is Stepanov’s (2001) theory of Late Adjunction that treats adjunction as post-cyclic operation. Chomsky (2001, 16) seems to have something similar in mind when he states that “given the basic properties of adjunction, we might intuitively think of α as attached to β on a separate plane, with β retaining all its properties on the ‘primary plane’, the simple structure. …”. The notion of “separate plane” remains unclear. My proposal suggests that it is P. A question that immediately arises, and which has also been brought up by one anonymous reviewer, is how ditransitive verbs are represented in this proposal. One possibility is Larson’s (1988) solution of a “VP-Shell”, where the necessary positions for the NPs are provided by several stacked projections of the same verb. This might be the correct proposal for English. However, it has also been argued that arguments can be syntactic adjuncts. Grimshaw (1990) calls the English ‘by’-phrase an “argument adjunct”. Likewise, Vogel and Steinbach (1998) present rich evidence that German dative objects should best be treated as adjuncts syntactically. Both of these options are possible. For a recent overview and an alternative treatment within OT, see Zepter (2003). I am very thankful to Judith Aissen for providing me with information about these facts, and for discussing this and other aspects of the approach presented here. See also Zepter (2003) for an alternative OT treatment of these issues. Some of these constraints have predecessors in earlier work. Grimshaw’s (1997) “HeadLeft” and “HeadRight” closely resemble SGP(hc) and SGP(ch). In general, these constraints express common generalisations about unmarked orders.
The typological predictions of this system of constraints have been calculated with the help of “OTSoft” (Hayes et al., 2002). Several definitions for constraint ties have been proposed. The least problematic one, which is sufficient here, is the one that interprets a tie as an abbreviation for the existence of two co-grammars which only differ in the relative ranking of the two tied constraints. The advantage of this definition is that it does not introduce a new kind of constraint interaction. The tie is only a notational convention. McCarthy (2002, 227) traces this definition back to Paul Kiparsky, who motivated it with considerations about language change. See Costa (2001) for some discussion and the references cited there. Prince (1999) shows that calling the fronting operation in (13c) “topicalisation” is misleading in that the fronted element usually does not function as a discourse topic. The operation is possible with focus bearing phrases in particular discourse configurations. Without going into detail, I will refer to this particular property as some kind of information structural prominence. MGS(Wh) is nearly equivalent to Grimshaw’s (1997) constraint “OpSpec”. A brief note about the constraint MGP(Wh) in (17): Its actual ranking cannot be determined because its effects are hidden behind the high rank of MGS(Wh)
Correspondence in OT syntax and Minimal Link effects
15.
16. 17.
18.
19.
20. 21.
22.
23.
24.
437
and constraints on SGP correspondence, in particular, SGP(sh), SGP(hc), and SGP(NP). I am abstracting away from the special problem of do-support here. For an OTanalysis, see Grimshaw (1997). This analysis should, however, be made compatible with what is proposed here. The insertion of “do” can be assumed to be forced by a highly ranked constraint that requires to keep the relative order of subject and main verb. See (Legendre, 2000) for an account of Bulgarian wh-clusters along similar lines. We use three of the constraints introduced informally in section 2. Their relative ranking cannot be determined with the data at hand. All rankings yield the same result. In Schmid and Vogel (2004), we discuss another example in case, namely the loss of strict complement-head order in standard German verbal complexes with rising complexity. The decision between these two options is relevant for the treatment of scrambling. If scrambling is adjunction, then we have MGS(Inf) ranked lower than MGS(Arg) and MGP(Arg) (and MGP(Inf) ranked higher), if scrambling is movement to a specifier position, then the constraint that triggers it, for instance MGS(Inf), is ranked higher. SGP(top) ensures that S and P are synchronised in their initial constituent. The standard analysis of scrambling is that it is adjunction, and so we may assume that MGS(Inf) is ranked low. I will assume this in the subsequent discussion. See Vogel (2002, 2004), Bakoviç and Keer (2001) were the first to propose such an input for OT syntax. A well-known fact about English multiple questions is that discourse-linked wh-phrases can remain in situ. A clause like “What did which student read?” is grammatical. I sketch an analysis of these cases within a correspondence-based OT syntax in Vogel (2004). 
Like the preceding discussion, this example shows that the determination of an argument hierarchy depends on the whole representation M, and cannot be restricted to argument hierarchies provided by the argument structures of individual lexical items. In Vogel and Steinbach (1998), we show that dative objects in German share many syntactic properties with adjoined categories and contrast with NPs in A-positions, like accusative objects and subjects. From this perspective, too, an adjunction analysis is more realistic. The definition of clause-mateness should rely on the notion of ‘extended projection’ in the sense of Grimshaw (1991), which roughly corresponds to the array of VP, IP and CP nodes on top of the same lexical V. Two elements are clause-mates if they are dominated by the same extended projections of V. At P, we need to refer to the P correspondent of that extended projection, i.e. a particular substring of P. This has to contain the P correspondents of the two elements in question.
Ralf Vogel
25. For example, SGP(hc) requires all elements of the complement to follow the head. Material that is moved in front of the head in P results in a violation of that constraint. Likewise, the condition on sister adjacency is violated in such a configuration.
26. Note that c1 does not violate *VAC, because its violations of MGS(cm) and MGS(Wh) come from two different XPs!
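The co-grammar interpretation of constraint ties mentioned in the notes can be made concrete with a small sketch. The Python fragment below is only an illustration under stated assumptions: the constraint names C1 to C3, the candidates a, b and c, and their violation counts are hypothetical, and the functions expand_ties, optimal and winners_under_tie are not taken from OTSoft or from any other existing package. A ranking is given as a list of strata, a stratum with more than one constraint counts as a tie, and the tie is expanded into the strict rankings it abbreviates (two of them in the case of two tied constraints).

```python
from itertools import permutations, product


def expand_ties(ranking):
    """Expand a ranking with ties into the strict rankings (co-grammars) it abbreviates.

    `ranking` is a list of strata, highest first; each stratum is a list of
    constraint names, and a stratum with more than one name is a tie.
    """
    orders_per_stratum = [list(permutations(stratum)) for stratum in ranking]
    return [
        [constraint for stratum in choice for constraint in stratum]
        for choice in product(*orders_per_stratum)
    ]


def optimal(candidates, strict_ranking):
    """Return the set of candidates with the best violation profile.

    `candidates` maps a candidate name to a dict of violation counts;
    profiles are compared lexicographically, highest-ranked constraint first.
    """
    def profile(name):
        return tuple(candidates[name].get(c, 0) for c in strict_ranking)

    best = min(profile(name) for name in candidates)
    return {name for name in candidates if profile(name) == best}


def winners_under_tie(candidates, ranking):
    """Collect the winners of every co-grammar abbreviated by the tied ranking."""
    result = set()
    for strict_ranking in expand_ties(ranking):
        result |= optimal(candidates, strict_ranking)
    return result


# Hypothetical data: C1 dominates the tied pair C2, C3.
ranking = [["C1"], ["C2", "C3"]]
candidates = {
    "a": {"C2": 1},  # violates only C2
    "b": {"C3": 1},  # violates only C3
    "c": {"C1": 1},  # violates the top-ranked constraint
}

print(expand_ties(ranking))
print(sorted(winners_under_tie(candidates, ranking)))
```

On this reading, the outputs of the tied grammar are simply the union of the winners of its co-grammars. In the hypothetical data, candidate b wins when C2 outranks C3, candidate a wins under the opposite order, and candidate c loses in both, so the tie introduces no new kind of constraint interaction.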
References

Aissen, Judith
1987 Tzotzil Clause Structure. Dordrecht: Reidel.
1992 Topic and Focus in Mayan. Language, 68: 43–80.
1996 Pied-Piping, Abstract Agreement, and Functional Projections in Tzotzil. Natural Language and Linguistic Theory, 14: 447–491.
Baković, Eric and Edward Keer
2001 Optionality and Ineffability. In: Géraldine Legendre et al. (eds.), Optimality Theoretic Syntax. Cambridge, Mass.: MIT Press.
Barbosa, Pilar, Danny Fox, Paul Hagstrom, Martha McGinnis, and D. Pesetsky (eds.)
1998 Is the best good enough? Optimality and competition in syntax. Cambridge, Massachusetts: MIT Press.
Bresnan, Joan
2000 Optimal Syntax. In: Joost Dekkers, Frank van der Leeuw, and Jeroen van de Weijer (eds.), Optimality Theory: Phonology, Syntax and Acquisition, pp. 334–385. Oxford: Oxford University Press.
Büring, Daniel
2001 Let’s Phrase it. Focus, Word Order, and Prosodic Phrasing in German Double Object Constructions. In: Gereon Müller and Wolfgang Sternefeld (eds.), Competition in Syntax, pp. 69–106. Berlin: Mouton de Gruyter.
Choi, Hye-Won
1996 Optimizing Structure in Context: Scrambling and Information Structure. Ph.D. thesis, Stanford University.
Chomsky, Noam
1981 Lectures on Government and Binding. Dordrecht: Foris.
1986 Barriers. Cambridge, Massachusetts: MIT Press.
1995 The Minimalist Program. Cambridge, Mass.: MIT Press.
2001 Beyond Explanatory Adequacy. Manuscript, MIT.
Costa, João
2001 The Emergence of Unmarked Word Order. In: Géraldine Legendre et al. (eds.), Optimality Theoretic Syntax, pp. 171–203. Cambridge, Mass.: MIT Press.
Fanselow, Gisbert
1991 Minimale Syntax. Groninger Arbeiten zur Germanistischen Linguistik, 32.
Greenberg, Joseph H.
1963 Some Universals of Grammar with Particular Reference to the Order of Meaningful Elements. In: Joseph H. Greenberg (ed.), Universals of Language, pp. 58–90. Cambridge, Mass.: MIT Press.
Grimshaw, Jane
1990 Argument Structure. Cambridge, Mass.: MIT Press.
1991 Extended Projection. Manuscript, Brandeis University.
1997 Projection, Heads and Optimality. Linguistic Inquiry, 28: 373–422.
Haider, Hubert
1996 Towards a Superior Account of Superiority. In: Uli Lutz and Gereon Müller (eds.), Papers on Wh-Scope Marking, pp. 317–329. Arbeitspapiere des SFB 340, Bericht Nr. 76, Universität Tübingen.
2000 Towards a Superior Account of Superiority. In: Uli Lutz, Gereon Müller, and Arnim von Stechow (eds.), Wh-Scope Marking, Linguistik aktuell/Linguistics Today, pp. 231–248. Amsterdam, Philadelphia: John Benjamins.
Hayes, Bruce, Bruce Tesar and Kie Zuraw
2002 OTSoft. Software package. http://www.linguistics.ucla.edu/people/hayes/otsoft/.
Heck, Fabian and Gereon Müller
2000 Successive Cyclicity, Long-Distance Superiority, and Local Optimization. Proceedings of WCCFL, 19: 218–231.
Jackendoff, Ray
1990 Semantic Structures. Cambridge, Mass.: MIT Press.
Kayne, Richard
1994 The Antisymmetry of Syntax. Cambridge, Mass.: MIT Press.
Kuhn, Jonas
2001 Generation and Parsing in Optimality Theoretic Syntax: Issues in the Formalization of OT-LFG. In: Peter Sells (ed.), Formal and Empirical Issues in Optimality-Theoretic Syntax, pp. 313–366. Stanford, Calif.: CSLI Publications.
Larson, Richard
1988 On the Double Object Construction. Linguistic Inquiry, 19: 335–391.
Lee, Hanjung
2001a Markedness and Word Order Freezing. In: Peter Sells (ed.), Formal and Empirical Issues in Optimality-Theoretic Syntax, pp. 63–128. Stanford, California: CSLI Publications.
2001b Optimization in Argument Expression and Interpretation: A Unified Approach. Ph.D. thesis, Stanford University.
Legendre, Géraldine
2000 Morphological and Prosodic Alignment of Bulgarian Clitics. In: Joost Dekkers, Frank van der Leeuw and Jeroen van de Weijer (eds.), Optimality Theory: Phonology, Syntax and Acquisition, pp. 422–462. Oxford: Oxford University Press.
Legendre, Géraldine, Jane Grimshaw and Sten Vikner (eds.)
2001 Optimality Theoretic Syntax. Cambridge, Mass.: MIT Press.
Legendre, Géraldine, Paul Smolensky and Colin Wilson
1998 When is less more? Faithfulness and Minimal Links in WH-Chains. In: Pilar Barbosa et al. (eds.), Is the best good enough? Optimality and competition in syntax, pp. 249–289. Cambridge, Mass.: MIT Press.
McCarthy, John and Alan Prince
1995 Faithfulness and Reduplicative Identity. In: Jill Beckman, Laura Walsh-Dickie and Suzanne Urbanczyk (eds.), Papers in Optimality Theory, vol. 18, pp. 249–384. Amherst, Massachusetts: UMass Occasional Papers in Linguistics.
McCarthy, John J.
2002 A Thematic Guide to Optimality Theory. Research Surveys in Linguistics. Cambridge, UK: Cambridge University Press.
Mondloch, James
1978 Disambiguating subjects and objects in Quiché. Journal of Mayan Linguistics, 1(1): 3–19.
Müller, Gereon
2001 Order Preservation, Parallel Movement, and the Emergence of the Unmarked. In: Géraldine Legendre et al. (eds.), Optimality Theoretic Syntax, pp. 279–313. Cambridge, Mass.: MIT Press.
2002 Free Word Order, Morphological Case, and Sympathy Theory. In: Gisbert Fanselow and Caroline Féry (eds.), Resolving Conflicts in Grammars. Linguistische Berichte, Sonderheft 11, pp. 9–48. Hamburg: Helmut Buske Verlag.
Pesetsky, David
1997 Optimality Theory and Syntax: Movement and Pronunciation. In: Diana Archangeli and D. Terence Langendoen (eds.), Optimality Theory. An Overview, chap. 5. Malden, Mass. and Oxford: Blackwell.
1998 Some Optimality Principles of Sentence Pronunciation. In: Barbosa et al. (eds.), Is the best good enough? Optimality and competition in syntax, pp. 337–383. Cambridge, Mass.: MIT Press.
Prince, Ellen F.
1999 How not to mark topics: ‘Topicalization’ in English and Yiddish. Texas Linguistics Forum 1999, chapter 8.
Rackowski, Andrea and Travis, Lisa
2000 V-initial languages: X or XP-movement and Adverbial Placement. In: Andrew Carnie and Eithne Guilfoyle (eds.), The Syntax of Verb Initial Languages, pp. 117–142. New York: Oxford University Press.
Rizzi, Luigi
1990 Relativized Minimality. Cambridge, Mass.: MIT Press.
Schmid, Tanja and Vogel, Ralf
2004 Dialectal Variation in German 3-Verb Clusters. A surface oriented OT account. Journal of Comparative Germanic Linguistics, 7: 235–274.
Sells, Peter (ed.)
2001 Formal and Empirical Issues in Optimality-Theoretic Syntax. Stanford, California: CSLI Publications.
Smolensky, Paul
1996 On the Comprehension/Production Dilemma in Child Language. Linguistic Inquiry, 27: 720–731.
Stepanov, Arthur
2001 Late Adjunction and Minimalist Phrase Structure. Syntax, 4(2): 94–125.
Truckenbrodt, Hubert
1999 On the Relation between Syntactic Phrases and Phonological Phrases. Linguistic Inquiry, 30: 219–255.
Uriagereka, Juan
1999 Multiple Spell-Out. In: Samuel David Epstein and Norbert Hornstein (eds.), Working Minimalism, pp. 251–282. Cambridge, Mass.: MIT Press.
Vogel, Ralf
2002 Free Relative Constructions in OT Syntax. In: Gisbert Fanselow and Caroline Féry (eds.), Resolving Conflicts in Grammars. Optimality Theory in Syntax, Morphology, and Phonology. Linguistische Berichte Sonderheft 11, pp. 119–163. Hamburg: Helmut Buske Verlag.
2004 Remarks on the Architecture of Optimality Theoretic Syntax. In: Reinhard Blutner and Henk Zeevat (eds.), Optimality Theory and Pragmatics, pp. 211–227. Houndmills, Basingstoke, Hampshire, England: Palgrave MacMillan.
Vogel, Ralf and Markus Steinbach
1998 The Dative – an Oblique Case. Linguistische Berichte, 173: 65–90.
Williams, Edwin
2003 Representation Theory. Cambridge, Massachusetts: The MIT Press.
Wilson, Colin
2001 Bidirectional Optimization and the Theory of Anaphora. In: Géraldine Legendre et al. (eds.), Optimality Theoretic Syntax. Cambridge, Mass.: MIT Press.
Zeevat, Henk
2001 The asymmetry of Optimality Theoretic syntax and semantics. Journal of Semantics, 17: 243–262.
Zepter, Alexandra
2003 Phrase Structure Directionality: Having a few Choices. Ph.D. thesis, Rutgers University, New Brunswick, NJ.
Index
a(rgument)-structure, 259, 376–378, 405, 409, 411–412, 428, 437 adjacency, 6, 37–40, 43–60, 65, 131, 401, 406, 409, 438 adverb, 5–6, 37, 42–43, 46–48, 57, 60–64, 84–85, 99, 105, 114, 127, 134, 155, 159, 165, 223–224 locative, 223 of manner, 42–43, 155, 165, 224 of reason, 155, 165, 224 temporal, 223 wh-, 222–224 adverbial, 98, 105, 114, 147–151, 155– 159, 165–171, 309–310, 406, 417–418 CP, 309–310 local, 155 of manner, 165 of reason, 165 phrasal, 159, 169 temporal, 155 wh-, 147, 149, 157, 171 affix(ation), 6, 22, 23, 31, 39–40, 43– 45, 50–52, 58–60, 110, 143, 280 applicative, 23, 31 verbal, 6, 39, 40–45, 58, 60 affix hopping, 6, 39, 45 agent, 23, 245–246, 251, 254–260, 270–271, 370 agreement, 21, 58, 152–154, 245, 248, 270–273, 291–292, 359, 367, 386–391 long-distance, 291, 316 modifier, 280 object, 21, 367 spec-head, 165, 188 subject, 21, 367, 380, 384–385, 388, 391, 393 verb, 280
Albanian, 31, 384 alignment constraint, 257–258, 261, 268–270, 281–282 ambiguity, 270–272, 293 lethal, 384 morphological, 280 surface, 242, 425 animacy, 16, 150, 243, 270–272, 303 anti-nestedness, 89 anti-superiority, 77, 109 Attract Closest, 15, 23, 26–28, 32, 61 attractee, 206, 215–217, 227, 230 attractor, 206–207, 210–212, 215–218, 230 auxiliary, 5–7, 42, 49–51, 55–56, 97, 131, 138–140, 198, 328–337, 341, 351, 367, 419–420 auxiliary/copula contrast, 333 auxiliary-participle construction, 37, 48 background information, 257 Bantu, 23, 282 barriers, 1–3, 78–79, 181, 186, 402–403 Barriers Condition, 8, 181–183, 187 Basque, 385–388 bidirectional(ity), 9, 263–272, 276–282, 404, 425–426, 429, 433–434 blocking, 9–11, 26, 53, 56, 60–62, 115, 164, 264–267, 275–279, 336, 403, 431–433 effect, 62–63, 139, 210, 270, 273, 279 partial, 264–270, 275–279 bridge verb, 134, 360 Bulgarian, 3, 6, 37, 41–45, 57–64, 74– 77, 89–93, 106–111, 186, 405, 437 Burzio’s generalization, 383, 393
c(onstituent)-structure, 250–253, 259, 403, 435 Case absolutive, 8, 251, 367–370, 379, 383–393, 407 abstract, 252, 391–392 accusative, 17, 21, 31–32, 164, 243, 247, 251–255, 261, 267, 280– 281, 305, 367–370, 381–383, 386–390, 393, 428, 437 constructive, 250, 281 dative, 7, 15–17, 21–28, 31–32, 108, 114–115, 247, 252–255, 261, 280–281, 300, 385–388, 433, 436–437 ergative, 8, 243, 251–257, 260–262, 367, 370, 378, 381–382, 387–389 genitive, 7, 15–23, 26–28, 32 inherent, 8, 21, 255, 370–371, 375– 385, 391–392 lexical, 255, 281, 370–371 nominative, 7–9, 15, 21–28, 32, 131, 154, 164–169, 241–247, 251–253, 257, 260–262, 269, 280, 291, 367–371, 381–393 oblique, 21, 117, 131, 246, 272, 280 quirky, 21, 31, 165, 255, 281, 385– 389 semantic, 253–259 structural, 255–257, 281, 367–370, 376–393 Case assignment, 7–8, 131, 359, 368– 371, 388–389, 406 Case feature, 20, 147, 234, 251–252, 255, 261, 281, 376–380, 385– 386, 392 Case freezing, 10, 205, 211–213, 225, 229–231 Case markers, 245–247, 250–253, 256, 260–261, 378–379, 425 Case system, 22, 243, 252, 257 Catalan, 6, 89, 125–143 Chamorro, 9, 270–278 checking relation, 2, 20, 210–212, 226, 234, 401
clause-boundedness, 128, 194, 360 Clause Nonfinal Incomplete Constituent Constraint, 318 clitic doubling, 15–19, 26–32 clitic-doubled DP, 7, 15, 26 clitic(ization), 7, 15, 26–32, 58, 64, 100, 110, 135–137, 141–143, 334, 384, 405 postverbal, 6, 133–137, 142 closeness, 10, 20, 93, 96, 139, 207–210, 230–232, 241, 308, 320, 371 complexity, 294 computational, 2, 196 derivational, 196 featural, 253 Condition on Extraction Domain (CED), 187–188, 306–311, 319–320 contrastive topic, 104, 111 correspondence, 7–10, 192, 242, 248– 250, 256, 259, 269–270, 279, 344, 401–407, 410, 422–424, 433–437 imperfect, 189 (see also Linear Correspondence Axiom) cyclic merger, 223 cyclic spell-out, 214, 317 cyclicity, 223, 234, 293–294 Danish, 65 dative alternation, 15, 22, 114–115 dative shift, 114, 300–301, 318 dependent-marking languages, 245, 270 derivational constraint, 293–294, 316 derivational grammar, 177, 197, 202, 289, 308, 315 d-linking, 160–162, 315, 320, 325 double object construction, 7, 16–23, 31, 116, 300–302, 323 Dutch, 51, 84–86, 94–95, 99–107, 149– 160, 166–167, 173, 179–180, 342, 356, 416 Dyirbal, 251, 368, 387
economy, 2, 23, 28 condition, 10, 206, 232, 328, 333, 372 constraint, 76, 83, 86, 90, 113, 117, 206, 282 derivational, 73, 402 effect, 242 local, 24, 206 metric, 205, 233 of feature movement, 132 of movement, 401 of structure, 282 principle, 11 Empty Category Principle (ECP), 75, 81 emergence of the unmarked, 269, 272–273 emphasis, 6, 128, 134–143, 244 ‘engineering’ problems, 341 equidistance, 7, 20, 23–25, 49, 139, 293, 383–384, 393 ergative system, 256–257 ergativity, 243, 254–255, 367–369, 382–385, 389–391 EVAL(uator), 190, 250, 404 expressivity, 11, 73, 81–82, 110, 117 Extended Projection Principle (EPP), 7, 20–21, 30, 123, 132, 140–142, 172, 208–212, 217–220, 226–228, 234–235, 292, 296–297, 304, 317, 331, 381 f(unctional)-structure, 249–261, 282 Faroese, 37, 65, 125, 130, 139 Feature Condition, 292, 296–299, 303, 306, 312, 317 finite clause, 74, 78–79, 82, 129–131, 191, 211, 303, 307 focus, 50, 63–64, 107, 113, 127–128, 142, 152, 244–245, 254, 258, 303, 361, 368, 436 French, 31, 44, 59, 85, 94, 104, 125, 134, 157–158 GEN(erator), 178, 189, 196, 199, 249–250, 261–264, 403–404, 423
Georgian, 382–384, 392 Germanic, 5, 51, 125–126, 131–134, 141, 147, 180 grammaticalization, 15 Greek, 7, 15–17, 21–32 Head-Driven Phrase Structure Grammar (HPSG), 358–360, 435 head-marking language, 9, 245, 270–274, 279, 282 Hebrew, 110–111 hierarchy, 5, 61, 187, 248, 253, 274, 279, 360–361, 402–404, 423, 428–429, 437 accessibility, 329, 354 argument, 403 grammatical function, 246 markedness, 253 of animacy, 272 power (OT), 186, 402 thematic, 246–247, 434 Hindi, 8–9, 78–79, 241–247, 250–260, 264, 267–272, 280, 367–371, 378–381, 385, 388–393 Icelandic, 5–7, 21, 31, 37, 41–47, 58, 61–67, 98–105, 125–142, 147, 165–167, 235, 281, 319, 327– 358, 361–362, 385–388 information structure, 11, 74, 105–107, 113–118, 125–126, 135, 138–141, 189, 244, 346, 356, 405–412, 421 input, 183–195, 199, 244, 249–250, 255, 258–262, 266–267, 273–276, 282, 317, 359, 402–407, 412, 422–427, 432–434, 437 Japanese, 5, 8, 105, 155–157, 178, 185– 199, 244 Last Resort, 8, 40, 181–184, 296–305, 313–314, 317 late adjunction, 223, 377–379, 382, 436 Least Tampering, 372–373
Lexical-Functional Grammar (LFG), 9, 242, 249–250, 435 Optimality Theoretic LFG, 242, 249–253, 259, 281, 403, 434– 435 Linear Correspondence Axiom (LCA), 205, 213–215, 406, 416 m(orpholexical)-structure, 252, 256–260 Magritte, 343 m-command, 97 Merge over Move, 10, 205–209, 212– 213, 225–226, 229–233 ‘minimal disturbance’ principle, 354 minimal domain, 2, 7, 15, 20–26, 232 Minimalist Program, 2, 5, 9, 177, 196, 241, 255 Minimality, 1–11, 127, 147–151, 170, 241, 269–270, 416, 425, 432 effect, 150, 241–242, 263, 269–270, 295, 421, 425–426, 433–434 relativized, 1, 9, 61–63, 128, 210, 227–229, 269, 336–337, 387, 416, 428 movement A-, 1, 4–7, 19, 73–74, 115–117, 241–242, 316 A’-, 1, 32, 62, 117–118, 318, 409 copy, 215–218 cyclic, 10, 51, 63, 220–225, 292, 310, 314–315, 317 diving, 185, 190, 195 feature, 15, 28–32, 132, 316, 380 feature-driven, 227, 309, 313 F-, 10 head, 1–2, 6–7, 47, 52, 61, 73–74, 115, 127 I-to-C, 42, 60 long distance, 90, 150, 304 Long Head Movement (LHM), 6–7, 327, 330, 357, 360 multiple wh-, 4 NP-, 15, 18–22, 26–31, 165, 241 operator, 73–74, 115–116, 434 participle, 48
phonological, 52, 350 remnant, 141, 177–190, 197–198, 308, 315 surfing, 185, 190 V(erb)-, 42, 48–54, 133–137 (see also wh-movement) m-representation, 343–355, 362 default convention for chains, 349 default precedence convention, 349–356 multiple question, 69, 73–77, 80–86, 94–115, 220, 227, 314, 320, 404, 421, 424–425, 437 multiple spell-out, 6, 37, 46–57, 64, 436 narrative inversion, 137, 143 negation, 53–56, 62–63, 134–139, 159, 235, 329–330, 334, 340 nestedness, 3, 81, 86–90 Norwegian, 65, 90, 116 numeration, 10, 205–213, 230–234, 297–301, 310–314, 338, 383 object shift, 6, 37, 48–65, 235, 299 oblique, 21, 117, 131, 246, 272, 280 Old Catalan, 6, 125–132, 134–143 Old French, 125, 134 Old Icelandic, 142 Old Scandinavian, 37, 41, 61, 125, 141 Old Spanish, 125 operator constraint, 148, 152–155, 161– 162, 165–166, 169 Optimality Theory (OT), 3–5, 57, 106, 114, 178, 184–186, 192–199, 242, 248–281, 358–359, 401–407, 411, 422–426, 433–437 Asymmetric, 264–269, 273–276 Medium Strength, 9, 264–266, 269, 277–279 Strong, 264–265, 268, 274 Weak, 265–266, 269, 276–277 syntax, 4, 8–9, 196, 199, 281, 401, 404–407, 412, 422–426, 433–437 Optimality Theory constraints *BLOCK, 278
*MANYTOONE, 192 *PSV, 273 *ZEROTOONE, 192–194 BAR, 186–188, 402–403 BIAS, 272–273 DEP-OO, 253, 256 FAITH-OO, 252, 256–257 FEATUREFAITHFULNESS, 188–190, 197 IDENT-OO, 253 MAX-OO, 253 MINLINK, 186–197, 402 PAR-MOVE, 402 optimization, 177–178, 237, 244, 248, 263–269, 276–278 bidirectional, 199, 267–270, 281 comprehension, 267–268, 282 interpretation, 273 production, 267–269, 273 Optional EPP Feature Condition, 292, 297, 317 optionality, 40–41, 58, 131–132, 376, 423–424 output, 7, 76, 183, 189, 232, 248–259, 263, 266, 276, 293, 310, 316–317, 359, 390, 404–407, 412, 422–426 overt subject, 83, 125–126, 130–131, 137, 142, 228 participle, 5, 37, 43, 47–57, 64–65, 127, 130, 139, 328–337, 341, 351–353, 360, 391, 420 particle, 5, 37, 65, 100, 196, 304, 360 verb, 63, 139, 329 patient, 14, 24 PF merger, 6, 37–51, 55–57, 60–62 phase, 3, 10, 63–64, 90, 183, 205, 211–212, 216–227, 233–234, 265–266, 269, 277–278, 289–297, 315–317, 393 Phase Balance, 297 Phase Impenetrability Condition (PIC), 217, 289–291 Phrase Balance, 10, 297–318
‘piacere’ class verbs, 18 pied piping, 15, 28, 156, 172, 300–301, 318, 340 Polish, 93–94, 105–107 Portuguese, 143, 157–158 postcyclic, 8, 372–381, 385, 391–393 prepositional object construction, 300–301 PRO, 29, 131, 316 pro, 39–40, 97, 129, 131–132, 138 pro-drop, 167 proarb, 332, 349–350 processing account, 11, 95 processing effect, 162 pronoun, 24, 153, 164, 167–169, 304, 316, 387, 419 clitic, 100, 141 indefinite, 101, 153, 164 object, 99–100, 305, 311 relative, 41, 134 wh-, 83, 101, 109–115, 133, 151–153, 161–163, 313 fronting, 166 placement in German, 3 quantifier-variable binding, 16 recoverability, 9, 250, 267, 270, 277– 279, 404–407, 425–426 representational phonology, 345–347, 354–357 principles, 85 restrictions, 90 strictly, 346, 350–351, 355–356 strongly, 294 weakly, 294–296 vs. derivational, 3–4, 178, 184, 198, 294–295, 389 vs. realizational, 342, 327, 357 Representational Hypothesis (RH), 7, 343–351, 350–351, 358–359, 362 Representational Verb-Second Convention, 347, 352, 362 Romance, 5, 132–136, 141–143, 362, 384
Scandinavian, 6, 37, 41, 48, 50, 55–59, 61, 65, 125, 130–132, 136, 141, 362 scope, 32, 74–79, 87, 90–92, 159, 199, 205, 210, 221, 225, 231–232, 340, 362, 381, 405–406, 409–412, 421, 434 scope index, 91, 314 scrambling, 78, 93, 102–105, 114, 118, 178–199, 232, 248, 258, 262, 299, 303–315, 318–320, 393, 430, 437 search space, 2, 197, 289–297 semantic type, 16–18, 147–148, 155–162, 168–169 Serbo-Croatian, 3, 43 Slavic, 3, 93, 107–109, 411 Spanish, 59–60, 89–90, 97–98, 105, 409 Stranded Affix Filter, 45 Strict Cycle Condition (SCC), 290–298, 308, 317 Stylistic Fronting (SF), 5, 37–47, 55–61, 64, 125–131, 135, 140, 337, 357, 360 accessibility hierarchy in, 354 subject gap condition, 127–129, 334 subject gap restriction, 6, 37–45, 57 subject raising, 208–209, 212–213, 226, 235, 293–295, 304–305, 311, 319 subjunctive, 18, 25, 31, 80–84, 90, 154 Superraising, 10, 186–187, 210–213, 225, 229–231 Swedish, 48–50, 55, 62, 65, 85–89, 93–96, 104–105, 118 that-trace-filter, 81–83, 91–92 Tibetan, 85 topic, 104–105, 111–112, 183, 187, 244, 258–260, 361 topicalization, 9, 50, 105, 114, 126–130, 178–180, 185, 190, 199, 319, 330–331, 352, 404, 409, 414–420, 436 remnant, 331 V(P)-, 37, 49–51, 57, 64
Tzotzil, 9, 270–273, 279, 407 unaccusative, 7, 15–23, 26, 31, 99, 226, 369–370, 378, 382–383, 389, 392 Unambiguous Domination, 307–309 verb-second, 132–134, 138, 337–342, 347–349, 352–353, 358–362, 414, 417, 421 voice, 270–275, 279 volition(ality), 243, 251, 255, 259–260 Weak Crossover Effect (WCO), 28–32, 318 wh-adjunct, 84–86, 90, 95, 157–160, 169, 220 wh-cluster, 108–110, 437 wh-in situ, 11, 44, 59, 78–84, 90–92, 113–115, 147–171, 220, 224, 228, 301, 312, 315–316, 320, 393, 428–429, 437 wh-movement, 2–3, 6, 9, 43–45, 59, 64, 79, 90, 93, 96, 103, 164, 168–169, 179–180, 199, 222, 289, 292–293, 298–315, 319–320, 355–356, 362, 376–377, 393, 401, 404, 409–414, 417, 421 wh-scope marking construction, 79, 90 word order, 7–8, 135, 196, 214, 242–250, 258–263, 270–272, 327–330, 334–340, 346–348, 355–357, 361, 407, 425 canonical, 125, 128, 137, 247, 269 unmarked, 244–246, 258 word order freezing, 5, 8–9, 110, 241– 245, 248–250, 263–270, 279– 281, 404, 425, 434 workspace, 10, 297–304, 309–315, 319–320 Yiddish, 110–111, 125, 155–157