The Grammar of Polarity: Pragmatics, Sensitivity, and the Logic of Scales (Cambridge Studies in Linguistics)


Author: Michael Israel


The Grammar of Polarity

Many, and perhaps all, languages include constructions which are sensitive to the expression of polarity: that is, negative polarity items, which cannot occur in affirmative clauses, and positive polarity items, which cannot occur in negatives. Although relatively unknown outside of linguistics, the phenomenon of polarity sensitivity has been an important source of evidence for theories about the mental architecture of grammar over the last fifty years, and to many the oddly dysfunctional sensitivities of polarity items have seemed to support a view of grammar as an encapsulated mental module fundamentally unrelated to other aspects of human cognition or communicative behavior. This book draws on insights from cognitive/functional linguistics and formal semantics to argue that, on the contrary, the grammar of sensitivity is grounded in a very general human cognitive ability to form categories and draw inferences based on scalar alternatives, and in the ways this ability is deployed for rhetorical effects in ordinary interpersonal communication. The book surveys a wide variety of polarity items, both negative and positive, commonly found in English and other languages and shows that grammatical sensitivities arise regularly and only in semantic domains which are inherently scalar.

Michael Israel is Associate Professor of English Language at the University of Maryland, College Park.

In this series

81. Roger Lass: Historical linguistics and language change
82. John M. Anderson: A notional theory of syntactic categories
83. Bernd Heine: Possession: cognitive sources, forces and grammaticalization
84. Nomi Erteschik-Shir: The dynamics of focus structure
85. John Coleman: Phonological representations: their names, forms and powers
86. Christina Y. Bethin: Slavic prosody: language change and phonological theory
87. Barbara Dancygier: Conditionals and prediction
88. Claire Lefebvre: Creole genesis and the acquisition of grammar: the case of Haitian creole
89. Heinz Giegerich: Lexical strata in English
90. Keren Rice: Morpheme order and semantic scope
91. April McMahon: Lexical phonology and the history of English
92. Matthew Y. Chen: Tone Sandhi: patterns across Chinese dialects
93. Gregory T. Stump: Inflectional morphology: a theory of paradigm structure
94. Joan Bybee: Phonology and language use
95. Laurie Bauer: Morphological productivity
96. Thomas Ernst: The syntax of adjuncts
97. Elizabeth Closs Traugott and Richard B. Dasher: Regularity in semantic change
98. Maya Hickmann: Children’s discourse: person, space and time across languages
99. Diane Blakemore: Relevance and linguistic meaning: the semantics and pragmatics of discourse markers
100. Ian Roberts and Anna Roussou: Syntactic change: a minimalist approach to grammaticalization
101. Donka Minkova: Alliteration and sound change in early English
102. Mark C. Baker: Lexical categories: verbs, nouns and adjectives
103. Carlota S. Smith: Modes of discourse: the local structure of texts
104. Rochelle Lieber: Morphology and lexical semantics
105. Holger Diessel: The acquisition of complex sentences
106. Sharon Inkelas and Cheryl Zoll: Reduplication: doubling in morphology
107. Susan Edwards: Fluent aphasia
108. Barbara Dancygier and Eve Sweetser: Mental spaces in grammar: conditional constructions
109. Matthew Baerman, Dunstan Brown and Greville G. Corbett: The syntax-morphology interface: a study of syncretism
110. Marcus Tomalin: Linguistics and the formal sciences: the origins of generative grammar
111. Samuel D. Epstein and T. Daniel Seely: Derivations in minimalism
112. Paul de Lacy: Markedness: reduction and preservation in phonology
113. Yehuda N. Falk: Subjects and their properties
114. P. H. Matthews: Syntactic relations: a critical survey
115. Mark C. Baker: The syntax of agreement and concord
116. Gillian Catriona Ramchand: Verb meaning and the lexicon: a first phase syntax
117. Pieter Muysken: Functional categories
118. Juan Uriagereka: Syntactic anchors: on semantic structuring
119. D. Robert Ladd: Intonational phonology, second edition
120. Leonard H. Babby: The syntax of argument structure
121. B. Elan Dresher: The contrastive hierarchy in phonology
122. David Adger, Daniel Harbour and Laurel J. Watkins: Mirrors and microparameters: phrase structure beyond free word order
123. Niina Ning Zhang: Coordination in syntax
124. Neil Smith: Acquiring phonology
125. Nina Topintzi: Onsets: suprasegmental and prosodic behaviour
126. Cedric Boeckx, Norbert Hornstein and Jairo Nunes: Control as movement
127. Michael Israel: The grammar of polarity: pragmatics, sensitivity, and the logic of scales

Earlier issues not listed are also available

CAMBRIDGE STUDIES IN LINGUISTICS

General Editors: P. Austin, J. Bresnan, B. Comrie, S. Crain, W. Dressler, C. J. Ewen, R. Lass, D. Lightfoot, K. Rice, I. Roberts, S. Romaine, N. V. Smith

The Grammar of Polarity

THE GRAMMAR OF POLARITY
PRAGMATICS, SENSITIVITY, AND THE LOGIC OF SCALES

Michael Israel
University of Maryland, College Park

Cambridge University Press
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo, Delhi, Tokyo, Mexico City

Cambridge University Press
The Edinburgh Building, Cambridge CB2 8RU, UK

Published in the United States of America by Cambridge University Press, New York

www.cambridge.org
Information on this title: www.cambridge.org/9780521792400

© Cambridge University Press 2011

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2011
Printed in the United Kingdom at the University Press, Cambridge

A catalogue record for this publication is available from the British Library

Library of Congress Cataloguing in Publication data
Israel, Michael, 1965–
The grammar of polarity : pragmatics, sensitivity, and the logic of scales / Michael Israel.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-521-79240-0
1. Polarity (Linguistics) 2. Grammar, Comparative and general–Negatives. 3. Grammar, Comparative and general–Syntax. 4. Semantics. I. Title.
P299.N4.I87 2011
415–dc22 2010051866

ISBN 978-0-521-79240-0 Hardback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.

The more that I philosophize
The more and more I realize
That little things which I despise,
Like peanut shells and grains of sand,
Are very hard, hard to understand.

Delmer Israel, To Harry F. Harlow

Contents

List of figures
List of tables
Acknowledgments
List of abbreviations

1 Trivium pursuits
    1.1 As above, so below
    1.2 A quirk of grammar or a trick of thought?
    1.3 The hypothesis: sensitivity as lexical pragmatics
    1.4 Putting pragmatics in its place
    1.5 Pragmatics in a usage-based grammar

2 Ex nihilo: the grammar of polarity
    2.1 The simplicity of negation
    2.2 The complexity of polarity
    2.3 The phenomenon of polarity sensitivity
    2.4 Basic mysteries: three problems of polarity sensitivity
    2.5 Varieties of polarity sensitivity
    2.6 The Scalar Model of polarity sensitivity

3 Licensing and the logic of scalar models
    3.1 What is a polarity context?
    3.2 Fauconnier’s insight
    3.3 The natural logic of scalar models
    3.4 Affectivity as a mode of scalar construal
    3.5 Syntactic constraints on scalar construals
    3.6 Polarity contexts are mental spaces

4 Sensitivity as inherent scalar semantics
    4.1 Scalar operators
    4.2 Two scalar properties
    4.3 Four sorts of polarity items
    4.4 Sensitivity and the square of oppositions
    4.5 The conspiracy theory of polarity licensing
    4.6 The anomaly of inverted polarity items

5 The elements of sensitivity
    5.1 The Informativity Hypothesis
    5.2 Quantitative semantics
    5.3 The pragmatics of informativity
    5.4 Assessing informativity
    5.5 Rhetorical coherence in polarity contexts
    5.6 Compositional sensitivities

6 The scalar lexicon
    6.1 Paradigmatic predictions of the Scalar Model
    6.2 Modal polarity items
    6.3 Connective polarity items
    6.4 Aspectual polarity items
    6.5 The limits of diversity

7 The family of English indefinite polarity items
    7.1 The many splendors of any
    7.2 Indefinite family resemblances
    7.3 Emphatic construals of indefinite any
    7.4 The effects of phantom reference
    7.5 Some uses of some
    7.6 The limits of free choice
    7.7 Indefinite conclusions

8 Polarity and the architecture of grammar
    8.1 High stakes grammar
    8.2 Terms of the debate
    8.3 The syntactic approach
    8.4 Semantic approaches
    8.5 Toward a more pragmatic approach

9 The pragmatics of polarity licensing
    9.1 Affectivity reconsidered
    9.2 Scalar construal
    9.3 Logical conditions are not sufficient
    9.4 Logical conditions are not necessary
    9.5 Rhetorical coherence
    9.6 Affectivity reclaimed

10 Visions and revisions

Appendix: A catalogue of English polarity items
Notes
References
General index
Person index

Figures

2.1 Haspelmath’s semantic map of indefinite functions
3.1 A scalar model of puzzles
3.2 A two-dimensional scalar model
4.1 Four sorts of polarity items
4.2 Polarity items and the square of opposition
4.3 Canonical and inverted polarity items
6.1 A connective lattice
6.2 Durative until
6.3 Punctual until
8.1 The monotonicity hierarchy

Tables

1 English quantifiers and indefinite constructions
2 Distributions of three polarity items

Acknowledgments

This book began with an epiphany in a stairwell by the sea near San Diego. The idea that two rhetorical tropes, exaggeration (emphasis) and understatement (attenuation), might explain the entire grammar of polarity sensitivity (NPIs and PPIs), seemed in an instant so neat, obvious, and simple, I was sure it must be obviously wrong or else already widely assumed, or perhaps both. Now I think the idea was both less obvious and more correct than I first suspected. That idea became the basis for a qualifying paper in 1994, a paper in Linguistics and Philosophy in 1996, and a dissertation in 1998, as well as a handful of shorter works (Israel 1997, 1999, 2001, 2006), and now, finally, for this book. Even now I wonder if I have done justice to this one little idea, but I know that what justice I have done, I could never have done alone. While I am entirely responsible for the inadequacies which remain in this work, I am deeply in the debt of others for what virtues I have managed to include.

Probably I never could have had the idea at all were it not for the extraordinary scholars and teachers who inspired me on my way. It was Chuck Fillmore who first introduced me to polarity items and Eve Sweetser who first taught me to see the rhetoric in lexical semantics, and neither seems ever to have tired of encouraging me since. Adele Goldberg, Suzanne Kemmer, and George Lakoff, each in their different ways, taught me to seek the connections between grammar and meaning, and to appreciate the importance of doing so. Ron Langacker was always generous to me with his thoughts and patient and kind as he encouraged me to develop my own. I am deeply grateful for the thoughtful advice and meticulous readings he has given to me and this work over the years. Gilles Fauconnier, whose old ideas are at the heart of this work, was unstinting in his willingness to revisit old issues here and to help me as I worked through them again. And special thanks are due to Larry Horn, who has been generously reading and responding to drafts of this work almost from the beginning. His unflagging enthusiasm has sustained me throughout and his insights have greatly improved the final product.

Many others have read and commented on drafts of this work and contributed to its final form. I am thus grateful to Raúl Aranovich, Chris Barker, Christine Bartels, Jack Hoeksema, Bill Ladusaw, Haj Ross, Neil Smith, and Yuki Kuroda. Very special thanks are also due to Peter Mallios and Tess Wood for their generosity as readers and editors in the long last stages of this writing, and to my editorial team at Cambridge – Jacqueline French, Sarah Green, Tom O’Reilly, and Andrew Winnard – for all their hard work.

Many others have aided, abetted, encouraged, and inspired me in this work. Scholarship is nurtured in friendship – it is hard to say where one begins and the other ends – and I am grateful for the friendly feedback and lively conversations I have enjoyed with, among others, Michel Achard, Noah Baum, Patty Brooks, Bill Byrne, Claudia Brugman, Kathleen Carey, Linda Coleman, Bill Croft, Seana Coulson, Adrian Cussins, Michelle Cutrer, Rich Epstein, Jeanne Fahnestock, Anastasia Giannakidou, David Gil, Joe Grady, Peter Harder, Martin Haspelmath, Dennis Hilton, Paul Hirschbühler, Chris Johnson, Paul Kay, Henny Klein, Margaret Langdon, Pierre Larrivée, Chungmin Lee, Phil Lesourd, Jeff Lidz, Louise McNally, Laura Michaelis, Bill Morris, Karin Pizer, Hotze Rullmann, Scott Schwenter, Ron Sheffer, Vera Tobin, Michael Tomasello, Elizabeth Traugott, Mark Turner, Karen van Hoek, Arie Verhagen, Gregory Ward, Paul Weinstein, Deirdre Wilson, and Ton van der Wouden.

Finally, I am grateful to my family and friends for their love and support over the years, and especially to Tess Wood and Zev Israel who have had to live with me, and sometimes without me, as I wrote this. Words cannot express my gratitude.

Abbreviations

ACC    Accusative
ADJ    Adjective
API    Affective polarity item
BNC    British National Corpus
CN     Common Noun
DAT    Dative
DE     Downward entailing
DEC    Declarative
DET    Determiner
DISJ   Disjunctive
ERG    Ergative
FC     Free choice
FP     Focus particle
FUT    Future
IC     Implication Constraint
IMPF   Imperfective
INDEF  Indefinite
INF    Infinitive
LF     Logical form
LM     Landmark
MOD    Modal
N      Noun
NEG    Negative
NOM    Nominal (i.e. N′, the complement of a determiner in an NP)
NP     Noun Phrase
NPI    Negative polarity item
OED    Oxford English Dictionary
P      Preposition/particle
PFV    Perfective
PL     Plural
PPI    Positive polarity item
PRO    Pronoun
PS     Polarity sensitive
S      Finite clause
SG     Singular
SUBJ   Subjunctive
UE     Upward entailing
TR     Trajector
V      Verb
VP     Verb Phrase
WSJ    Wall Street Journal

1 Trivium pursuits

But the truth is, they be not the highest instances that give the securest information, as may be well expressed in the tale so common of the philosopher that while he gazed upwards into the stars fell into the water; for if he had looked down he might have seen the stars in the water, but looking aloft he could not see the water in the stars. So it cometh often to pass that mean and small things discover great, better than great can discover the small.

Bacon, The Advancement of Learning, Book II, 1.v. (1605)

1.1 As above, so below

Bacon’s philosopher might be forgiven for looking too much upwards and not enough down. We look “up” not just to the stars and the sky, but to those we admire and to our highest ideals. We look “down,” as often as not, on things we despise, things beneath us, which are low, mean, and base. Familiarity breeds contempt, and it is easy to forget that what lies beneath may also run deep.

Figuratively speaking, up is where it’s at. Up is above, on top of, superior to, beyond; it is higher than, taller than, farther than, and more. It can be a location or a direction. It is defined within a larger frame, the vertical scale, which it shares with down – normally, the physical dimension parallel to an upright person standing erect on an even surface. The basic experience of bodily uprightness motivates the common metaphorical associations of being “up” with wakefulness, alertness, strength, reason, and virtue, and being “down” with sleep, weakness, folly, and vice. This massive alignment of evaluative metaphors along a vertical scale is not just some whim of imaginative fancy, nor is it unique to English. Indeed, it is a normal way for conceptual contents to be imaginatively structured across semantic domains – a reflection in grammar of the workings of the mind.

The basic opposition between ‘up’ and ‘down,’ and the many metaphorical oppositions it engenders, are themselves symptoms of a much more general tendency for human concepts to be structured in terms of contraries. All languages, it seems, have metaphors in which abstract notions like ‘truth’ and ‘goodness’ are fleshed out in terms of more basic bodily experiences, and one of the most basic experiences featured in such metaphors is the sense of opposition one may feel between contrary concepts like ‘up’ and ‘down,’ ‘light’ and ‘dark,’ or ‘hot’ and ‘cold.’ Contrariety itself is a quintessentially abstract concept, but it is immanent in our most down-to-earth experiences. The human mind thrives on the logic of contraries, and this is everywhere reflected in the structure of language, from the most basic phonemic oppositions and antonymic lexical pairings to the elementary rules for predicate affirmation and denial.

Keeping with Bacon’s advice, this book looks mainly down at little things in order to glimpse therein the image of something great. The little things of concern here are matters of grammar – ordinary constructions of everyday talk and their attendant bits of form and meaning. The greater things to be discovered are the elements and principles of thought itself: the commonsense imaginative abilities which allow us, the speaking ape, to entertain concepts and to share them with one another.

1.2 A quirk of grammar or a trick of thought?

This book is concerned with a single, intricate, and easily overlooked grammatical phenomenon going by the awkward name of polarity sensitivity. Many, and perhaps all, human languages include a class of constructions which are somehow sensitive to the expression of polarity – forms whose acceptability in a sentence can depend on whether that sentence is grammatically negative or affirmative. Such polarity items arise in many semantic domains and come in many morphosyntactic flavors; but, since polarity itself is a binary relation, all polarity items divide into two basic classes: positive polarity items (PPIs), which are unacceptable in the scope of negation, and negative polarity items (NPIs), which are unacceptable in simple affirmative contexts.

Both NPIs and PPIs can be found side by side in semantic domains they share with semantically similar but grammatically insensitive (or neutral) constructions. The data in (1–4), for example, reveal four sets of sensitivity triplets – items with similar semantics but different sensitivities – taken from four basic semantic domains: (1) agentive effort, (2) epistemic possibility, (3) propositional conjunction, and (4) event frequency. For each domain, the examples in (i) illustrate neutral items, those in (ii) illustrate PPIs, and those in (iii) illustrate NPIs. The unacceptable sentences in (iib) and (iiia) give some impression of what happens when a polarity item occurs in the wrong sort of context.

(1) EFFORT: (i) make an effort to V, (ii) take a stab at V-ing, and (iii) even bother to V.
    i)   a. He made an effort to solve the puzzle.
         b. He didn’t make an effort to solve the puzzle.
    ii)  a. He took a stab at solving the puzzle.
         b. *He didn’t take a stab at solving the puzzle.
    iii) a. *He even bothered to solve the puzzle.
         b. He didn’t even bother to solve the puzzle.

(2) POSSIBILITY: (i) be likely to V, (ii) could well V, and (iii) can possibly V.
    i)   a. She is likely to win the race.
         b. She is not likely to win the race.
    ii)  a. She could well win the race.
         b. *She couldn’t well win the race.
    iii) a. *She can possibly win the race.
         b. She can’t possibly win the race.

(3) CONJUNCTION: (i) and, (ii) as well as, and (iii) let alone.
    i)   a. Chris has read the Aeneid and the Georgics.
         b. Chris hasn’t read the Aeneid and the Georgics.
    ii)  a. Sally has read the Aeneid as well as the Georgics.
         b. *Sally hasn’t read the Aeneid as well as the Georgics.
    iii) a. *Glynda has read the Aeneid, let alone the Georgics.
         b. Glynda hasn’t read the Aeneid, let alone the Georgics.

(4) FREQUENCY: (i) to V X a lot, (ii) be always V-ing X, (iii) to V X much.
    i)   a. Ann listens to the Grateful Dead a lot.
         b. Ann doesn’t listen to the Grateful Dead a lot.
    ii)  a. Hugh is always listening to the Grateful Dead.
         b. *Hugh isn’t always listening to the Grateful Dead.
    iii) a. *Jeff listens to the Grateful Dead much.
         b. Jeff doesn’t listen to the Grateful Dead much.
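The acceptability pattern in (1–4) can be restated as a small lookup table, sketched here in Python. This is only an illustrative encoding and not part of the analysis itself: the names `Sensitivity`, `TRIPLETS`, and `is_acceptable` are hypothetical, and the sketch assumes just the two contexts shown in the examples, a simple affirmative clause and a clause under ordinary negation.

```python
from enum import Enum


class Sensitivity(Enum):
    NEUTRAL = "neutral"  # acceptable with or without negation
    PPI = "PPI"          # unacceptable in the scope of negation
    NPI = "NPI"          # unacceptable in simple affirmative contexts


# The four sensitivity triplets from examples (1)-(4), keyed by semantic domain.
TRIPLETS = {
    "effort": {
        "make an effort to V": Sensitivity.NEUTRAL,
        "take a stab at V-ing": Sensitivity.PPI,
        "even bother to V": Sensitivity.NPI,
    },
    "possibility": {
        "be likely to V": Sensitivity.NEUTRAL,
        "could well V": Sensitivity.PPI,
        "can possibly V": Sensitivity.NPI,
    },
    "conjunction": {
        "and": Sensitivity.NEUTRAL,
        "as well as": Sensitivity.PPI,
        "let alone": Sensitivity.NPI,
    },
    "frequency": {
        "V X a lot": Sensitivity.NEUTRAL,
        "be always V-ing X": Sensitivity.PPI,
        "V X much": Sensitivity.NPI,
    },
}


def is_acceptable(sensitivity: Sensitivity, negated: bool) -> bool:
    """Predict acceptability in a simple clause, affirmative or negated.

    Simplifying assumption: only clausal negation is modelled; questions,
    conditionals, comparatives, and other contexts are ignored.
    """
    if sensitivity is Sensitivity.PPI:
        return not negated
    if sensitivity is Sensitivity.NPI:
        return negated
    return True  # neutral items are fine in both contexts


def mark(sensitivity: Sensitivity, negated: bool) -> str:
    """Return 'ok' or the linguist's star for an unacceptable combination."""
    return "ok" if is_acceptable(sensitivity, negated) else "*"


for domain, items in TRIPLETS.items():
    for form, sensitivity in items.items():
        print(f"{domain:<12} {form:<22} {sensitivity.value:<8} "
              f"affirmative={mark(sensitivity, False)} negated={mark(sensitivity, True)}")
```

The table makes the symmetry of the triplets explicit: within each domain, only the sensitivity label differs, and that label alone determines which of the two sentences receives a star.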

The proper way to account for this little phenomenon has been a subject of long-standing and at times rather intense controversy in theoretical linguistics. These are not the sorts of facts one is likely to notice about a language, but they are remarkable nonetheless. One would expect that anything one could affirm, one could also deny, and that anything one could deny, one could also affirm. But polarity items are subject to special constraints, the violation of which results in unexpectedly unacceptable sentences. These constraints are more complicated than the examples here suggest since NPIs can be licensed, and PPIs blocked, in a variety of contexts besides clausal negation – among others, in questions, and in conditional (if) and comparative (than) clauses (see below §2.3.2). Still, the fundamentally striking observation here is that a simple switch in polarity can make an otherwise unobjectionable sentence not just unacceptable, but apparently ungrammatical. The problem with these sentences is not just one of semantic anomaly (since it is clear what they should mean) nor of any obvious pragmatic infelicity (for it is easy to see how they might be used). Rather, something about these sentences seems to make them intrinsically incoherent. The question is, what is the nature of this incoherence? What, precisely, is wrong with these sentences? How should this wrongness be represented in a theory of grammar? And crucially, what is it about the way speakers understand such sentences that makes them feel so wrong?

To answer these questions, one must confront fundamental questions about the nature of grammar and meaning. Almost from the start of generative linguistics, polarity items have been a battleground in debates about the nature of grammatical representation (Lees 1960; Bolinger 1960; Klima 1964; Baker 1970), and as theories have evolved, polarity items have remained a flashpoint. Polarity sensitivity neatly straddles the realms of syntax, semantics and pragmatics, so that a theory of polarity necessarily raises questions not just about the interfaces between these components, but ultimately about the architecture of grammar itself and the grammar’s relation to extra-linguistic aspects of cognition (Fauconnier 1975a, 1976; Ladusaw 1979, 1983; Linebarger 1980, 1987; Israel 1996, 1998a, 2004; Chierchia 2004; Giannakidou 2006). For the most part, these debates have turned on the question of what sorts of entities are needed in a theory of grammatical representations in order to account for the constraints on polarity items.
The distributions of polarity items have thus served as evidence that the grammaticality of a sentence may depend on its entailments (Baker 1970) or on its implicatures (Linebarger 1980, 1987, 1991), and as such they have played a central role in debates about the nature of logical form as a level in grammatical representations. Most famously, perhaps, Ladusaw (1979, 1983) has argued that the grammar of polarity items depends on a fully interpreted level of logical form where negative polarity items are constrained to appear in the immediate scope of a downward entailing (DE) operator. According to this proposal, the model-theoretic representation of a sentence’s literal truth conditions is itself a part of grammar – a level where constraints on well-formedness are defined – and not merely the product of more general cognitive abilities operating on the output of a generative grammar. However one chooses to formulate the constraints on polarity items, one must also confront the problem of how language users manage to learn these constraints. Polarity items epitomize a classic quandary of language acquisition: the absence of negative evidence (Braine 1971; Bowerman 1988; Pinker 1989). Somehow speakers learn the grammar of polarity items without

hearing the ways these forms cannot be used. But what speakers have to learn about polarity items is precisely the ways they are not used. The obvious way one could learn such a thing – the way linguists in fact learn it – is to find an instance of a polarity item in a context where it cannot be used and to observe – whether by introspection or controlled elicitation – the oddness of its usage. But of course ordinary speakers can never make such an observation since the oddness, or “ungrammaticality,” of such uses normally prevents their occurring at all.

In fact, one of the few places such uses do occur (though even here they are rare) is in the spontaneous speech of very young children. The examples below, from the CHILDES database (MacWhinney 1995), illustrate the sort of uncertainty typical of young children’s early uses of polarity items. In (5–6), Abe is just under 33 months old (2;8.22), when he uses the idiomatic NPI in my life in a conversation with his father about an orange fish (Kuczaj 1976):

(5) *FAT: I bet if you used one of those orange fish # you could catch something what do you think?
    *ABE: what orange fish?
    *ABE: what orange fish?
    *ABE: I never heard of that my life.
    *FAT: you never heard of that in your life?
    *ABE: I wan(t) (t)a go catch a corn fish.
    (File 032 – lines 47–53)

In this first use, the NPI (or something close to it, since Abe actually omits the preposition in) is licensed by the negative never. Most likely Abe has learned the NPI here [in pro’s life] as part of a larger idiom – something like never heard of X in my life. But whatever the details, Abe’s usage here is clearly flexible and creative, as moments later he produces the same item in a simple affirmative sentence, without never or any other negative licensor.

(6) *FAT: what kind do you want to catch?
    *ABE: a [/] a [/] a [/] a stair fish.
    *FAT: a stair fish?
    *ABE: uhhuh I heard of that in my life.
    *FAT: you heard of that is [sic] your life?
    *ABE: uhhuh I can’t fish like that.
    (File 032 – lines 118–23)

Apparently, Abe at this age was not yet aware of the constraints which limit expressions like in my life to negative contexts, or if he was, he did not yet realize that this particular expression is subject to such constraints. A similar pattern of confusion appears in Nina’s corpus (Suppes 1974), where, on one occasion at 36 months (3;0.16), the child seemed to vacillate between any more and some more in several repetitions of the same clause.

(7) *NIN: have to close (th)em # (be)cause it’s not raining any more.
    *NIN: when it’s raining some more.
    *NIN: it’s not raining some more now.
    *NIN: it’s not raining any more # so we have to close this one.
    (File 44 – lines 640, 652, 653, and 687)

If these sorts of anecdotal observations are at all representative (and they are certainly not uncommon), it appears that whatever children might know about the theoretical constraints on polarity items, it is not enough to keep them from using such items in some very unconstrained ways.

Even if one assumes that speakers come equipped with some innate knowledge of the constraints which govern polarity items, speakers still must learn the particular constructions in their language that are sensitive, which sensitivities they have, and just how strongly sensitive they are. This is a formidable problem since languages vary widely both in the polarity items they include and in the details of their distributions. Moreover, as the data in (1–4) show, near synonyms can and do vary sharply in their sensitivities. Somehow, it seems, speakers must master these subtleties on a case-by-case basis. It thus seems reasonable to follow van der Wouden’s suggestion (1997: 80), that while “the mechanisms underlying the behaviour of polarity items are part of grammar; the specific behaviour of individual polarity items is part of the lexicon.” Still, the question is, just how do these grammatical mechanisms find their way into the individual polarity items?

This book seeks answers to this and other questions about the grammar of sensitivity by viewing polarity items, and sensitive items in general, in terms of the semantic and pragmatic contents they encode in observable discourse (whether “real” or in some way experimentally contrived). I assume, in other words, that polarity items are polarity sensitive because of the meanings they encode, so that speakers effectively learn “the grammar” of these constructions (i.e. their particular sensitivities) the same way they learn the meaning and use of any other linguistic construction.
This does not mean that “the grammar” here is not in some sense “innate” or “universal.” There are universal constraints on what a human mind may imagine, and on what sorts of imaginings can be encoded by a linguistic construction. But such constraints might take a variety of forms, and it is far from clear which, if any, of our innately human predispositions consists precisely in a constraint on linguistic representations. I will argue here that the distributions of polarity items, at least, are not determined by constraints on linguistic representations per se, but rather reflect the operation of general cognitive abilities in ordinary communicative interactions.

My goal is to explain not just why polarity items have the peculiar distributions they do or how speakers manage to learn these distributions, but also why it is that polarity items should exist in the first place. I argue that polarity sensitivity in general arises as a grammatical consequence of the ways language users regularly exploit a basic conceptual ability for rhetorical purposes. The conceptual ability here is the ability to reason in terms of scales – the ability, that is, to construe an entity within a particular sort of semantic frame, a scalar model, and to make inferences based on this construal.

1.3 The hypothesis: sensitivity as lexical pragmatics

The basic theory – what I call the Scalar Model of Polarity – is simple. The claim is that polarity contexts are defined by their effects on scalar inferences and that polarity items encode semantic properties which make them sensitive to such inferences. Polarity items are thus a special class of what Fillmore, Kay, and O’Connor (1988) and Kay (1990) term “scalar operators” – forms which must be interpreted with respect to an appropriately structured scalar model. In particular, I claim, sensitivity arises from the interaction of two sorts of scalar semantic properties – quantitative (q-) value and informative (i-) value – each of which functions independently of polarity sensitivity, but which together constitute the necessary and sufficient conditions for a construction to be polarity sensitive. A form’s q-value depends on its relative position (either high or low) in a scalar ordering. A form’s i-value reflects the informative strength (either emphatic or attenuating) of the proposition to which the form contributes its meaning. Both features are grounded in the logic of scalar reasoning and the rhetoric of interpersonal communication. Their combination within a single form effectively limits that form to contexts which allow the scalar inferences needed to make both values felicitous. The theory makes clear predictions about where polarity items might be found in a language and what forms they can take. Most generally, the theory predicts the existence of four broad classes of polarity items: NPIs divide into emphatic forms with low q-value and attenuating forms with high q-value; PPIs divide into attenuating forms with low q-value and emphatic forms with high q-value. All four sorts are well attested in English and other languages,

and the theory predicts that all sorts of polarity items from all sorts of domains fit this broad taxonomy.

The idea that sensitivity might be related to scalar semantics is not new: it has been advanced in one way or another by an impressive set of theorists (e.g. Schmerling 1971; Horn 1972, 1989, 2005; Fauconnier 1975a, 1976; Fillmore, Kay & O’Connor 1988; Kadmon & Landman 1993; Lee & Horn 1994; Krifka 1995; Haspelmath 1997; Lahiri 1998; van Rooy 2003; Zepter 2003), and disputed in one way or another by an equally impressive set (Linebarger 1980; Progovac 1992, 1994; Rullmann 1996; Giannakidou 1998, 1999; Chierchia 2004; Szabolcsi 2004). The present work, however, makes the unusual claim (though see Verhagen 2005 for a similar view) that polarity items are not just scalar in their propositional semantics, but also in their pragmatics. Polarity items are, I contend, argumentative operators which conventionally index an argumentative attitude – an attitude, that is, toward the expressed content of an utterance; or, in Gricean terms, toward what is (baldly and explicitly) said. For my purposes here, it will suffice to distinguish just two major types of argumentative attitude, emphasis and attenuation, each of which may attach to either a positive or a negative proposition. Constructions which express an emphatic attitude – for example, the English [really Adj] and [(not) at all Adj] constructions in (8) and (9) – present an expressed proposition (what is said) as somehow stronger and more significant than an alternative proposition which might have been said. Conversely, constructions expressing an attenuating attitude – like [sort of Adj] and [(not) such a Adj] in (10) and (11) – hedge what is said, and present a proposition as weaker and less exciting than it might have been.

(8)  a. That’s true.                        p
     b. That’s really true.                 p (> n)

(9)  a. That’s not true.                    ~p
     b. That’s not true at all.             ~p (> n)

(10) a. That’s a good idea.                 q
     b. That’s sort of a good idea.         q (< n)

(11) a. That’s not a good idea.             ~q
     b. That’s not such a good idea.        ~q (< n)

The constructions here illustrate the four basic sorts of argumentative meanings. These are very general sorts of meaning, and as such can be (and typically are) coded by a great many constructions within a single language. The notations on the right reflect the status of these sentences as neutral, emphatic,

or attenuating: “p” and “q” here stand for expressed propositions, “(n)” for a salient alternative proposition (the parenthesis indicates its status as implicit or backgrounded), and the “more than” (“>”) and “less than” (“k. The general phenomenon was first brought to the attention of linguists in Grice’s 1967 William James lectures at Harvard (later appearing as Grice 1975). The examples in (7–9) are typical: in each case, a speaker’s utterance of the (a) sentence may conversationally implicate the conclusion expressed in the corresponding (b) sentence, and in each case, the generation of this implicature depends on the italicized expressions featured in Horn scales.

(7) a. Zelda sometimes drinks her whiskey neat.
    b. (For all S knows) Zelda doesn’t always drink her whiskey neat.

(8) a. It is possible that the USA will invade Costa Rica.
    b. (For all S knows) it is not necessary that the USA will invade Costa Rica.

(9) a. Michael has seen the movie or read the book.
    b. (For all S knows) Michael has not seen the movie and read the book.
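
The pattern shared by (7–9) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `HORN_SCALES` and `stronger_alternatives` are hypothetical names, the three pairs are taken from the examples above, and each pair is assumed to run from the weaker term to the stronger one.

```python
# Hypothetical encoding of the scales behind examples (7-9),
# each listed from the weaker term to the stronger one (an assumption).
HORN_SCALES = [
    ("sometimes", "always"),
    ("possible", "necessary"),
    ("or", "and"),
]


def stronger_alternatives(term):
    """Return the scale-mates that are stronger than `term`.

    Asserting a sentence with `term` may implicate that, for all the
    speaker knows, the corresponding sentences built with these
    stronger terms do not hold.
    """
    for scale in HORN_SCALES:
        if term in scale:
            return list(scale[scale.index(term) + 1:])
    return []  # the term is not on any listed scale


print(stronger_alternatives("sometimes"))  # ['always']
print(stronger_alternatives("always"))     # []
```

The strongest term on a scale has no stronger alternatives, so no implicature of this kind is generated from it.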

The conclusions in the (b) sentences do not follow logically from the truth of the (a) sentences, but they are natural inferences which can be, and regularly are, exploited in everyday communication. Crucially, as Grice argued, the ability to make such inferences is not specifically linguistic in nature but rather reflects a general cognitive ability and a general presumption that speakers will act in a rational manner. The key to these and other sorts of conversational implicatures can be located in a general principle, Grice’s Cooperative Principle, that, all things being equal, the participants in a communicative exchange can be assumed to be cooperating with each other. More specifically, scalar implicature exploits Grice’s first submaxim of quantity, which states that in order to be cooperative a speaker should, as Grice puts it (1975: 46), “make [her] contribution as informative as is required (for the current purposes of the exchange).” So, if a speaker says something less informative than something else she could have said just as easily, one may draw the inference that she doesn’t believe the more informative thing, for otherwise she should have just said it.

This thumbnail sketch of scalar implicature does justice neither to the complexity of the phenomenon nor to the sophistication of its feature-length treatments. (Those seeking such justice might look in Horn (1972, 1989), Gazdar (1979), Levinson (1983, 2000), Hirschberg (1985), Carston (1995, 1998), Matsumoto (1995), or Schwenter (1999a), among others.) The important point for our purposes is just that people do seem to possess a general and very basic ability for scalar reasoning, and that this basic ability has significant linguistic consequences. Among other things, as I have already hinted, this general cognitive ability holds the key to the grammar of polarity sensitivity.

The question is, what exactly is scalar reasoning? As many have pointed out, and as Horn himself freely concedes (1989: 231–42), the original conception of quantitative scales being ordered by semantic entailment may be too narrow.
As Fauconnier (1975a, b, 1976, 1978) and Hirschberg (1985) demonstrate, actual entailments are not required for scalar implicatures. Scalar inferences are often not logically valid, but rather depend on general and contingent pragmatic knowledge about how the world normally seems to work. This is clearly the case, for example, in the quantificational interpretations of examples (3–4) with their conjectures about a famous tycoon’s purchasing power and the Pope’s potential temptation to use viagra. Hirschberg (1985), in particular, shows that the types of relation which support scalar implicatures (and, by extension, scalar reasoning in general) go well beyond the logical orderings of Horn scales to include a variety of linear orderings, hierarchical rankings, whole/part relations, entity/attribute relations, set/subset relations, type/instance relations, and orderings defined by step-by-step processes. Hirschberg argues that the relations which can define a scale and thus support scalar implicature in fact include all and only those relations which define partially ordered sets (or posets). As far as I can tell, this is a robust result, and given this result, we are now in position to define the crucial mechanisms which support scalar reasoning.

3.3.2 Cognitive foundations: conceptual scales

Scalar reasoning is a general cognitive ability. As such, it seems reasonable to assume that it is based on very general cognitive constructs. The constructs which I will take as the foundation for scalar reasoning are conceptual scales and scalar models. A conceptual scale is simply a partially ordered set of conceptual entities. In the spirit of Hirschberg (1985), I assume that the ordering of elements in a conceptual scale is determined by an ordering metric, understood as a relation which is irreflexive, asymmetric, and transitive. These three notions are defined as follows:

(10) Given a set Q with elements {…, qi, qj, qk, …}, and a relation R defined on Q:
     R is irreflexive iff, for all qi ∈ Q, ¬(qi R qi).
     R is asymmetric iff, for all qi, qj ∈ Q, qi R qj ⇒ ¬(qj R qi).
     R is transitive iff, for all qi, qj, qk ∈ Q, (qi R qj & qj R qk) ⇒ qi R qk.
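
The three conditions in (10) can be checked mechanically for any finite relation. The following is a minimal sketch, assuming that R is represented as a set of ordered pairs; the function name `is_strict_partial_order` and the sample scale of temperature terms ordered by a hypothetical "is colder than" relation are illustrative choices, not part of the definitions themselves.

```python
from itertools import product


def is_strict_partial_order(elements, relation):
    """Check the three conditions in (10) for a finite relation.

    `elements` is the set Q; `relation` is R, given as a set of
    ordered pairs (a, b) meaning "a R b".
    """
    irreflexive = all((q, q) not in relation for q in elements)
    asymmetric = all((b, a) not in relation for (a, b) in relation)
    transitive = all(
        (a, d) in relation
        for (a, b), (c, d) in product(relation, repeat=2)
        if b == c
    )
    return irreflexive and asymmetric and transitive


# Hypothetical scale: temperature terms ordered by "is colder than".
terms = {"freezing", "cold", "cool"}
colder_than = {("freezing", "cold"), ("cold", "cool"), ("freezing", "cool")}

print(is_strict_partial_order(terms, colder_than))  # True
```

Dropping the pair ("freezing", "cool") from the sample relation would violate transitivity, so the check would then fail.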

The formal nature of these definitions should not distract one from the basic simplicity of the idea behind them. The relations which satisfy these definitions are, for the most part, the sort of thing one would expect to define a scale. Prominent among these are comparative relations defined on gradable predicates – be taller than, be harder than, be more likely than, be less intelligent than, be less interesting to read than – and absolute comparatives like be more than and be less than. But the class of ordering metrics for conceptual scales is much broader than our ordinary, pretheoretical notion of a scale. Conceptual scales are not simply orderings of amounts or degrees but include a variety of other relations, like the kind of relation in a taxonomic hierarchy and the inclusion relation of set membership, which, though they might not look prototypically scalar, have similar logical structures and support similar patterns of inferencing as do other scales.

Hirschberg, like Horn before her, defines her scales as orderings of linguistic expressions; however, since scalar reasoning appears to be a general cognitive ability rather than a specifically linguistic one, I assume that the entities ordered by a conceptual scale are in fact conceptual structures. That is, the scalar relation between a set of words like freezing, cold, and cool reflects the fact that our experience of cold things is itself fundamentally scalar. Scalar reasoning is a way of thinking about these sorts of scalar experiences, and it works whether or not the particular experiences are conventionally associated with a particular linguistic expression. Of course, as Levinson (2000) emphasizes, languages do regularly include sets of expressions which are paradigmatically arranged as profiling distinct scalar values – i.e. the Horn scales discussed above – and speakers do exploit such paradigmatic oppositions to generate scalar implicatures. My point is just that these sorts of linguistic oppositions themselves depend on conceptual scales.

Technically, a conceptual scale is an ordered pair 〈Q, R〉, where Q is a set of conceptual structures and R is a relation which defines a partial order on the elements of Q. Effectively, then, R is a set of ordered pairs 〈qi, qj〉. Generally, the elements of Q will include all and only those conceptual structures which can be ordered by the relation R. Stretching the mathematical notion of a domain, we can say that the elements of Q constitute a semantic domain, defined in Langacker (1987: 488) as “a coherent area of conceptualization relative to which semantic units may be characterized.” This kind of semantic domain is much broader than what is needed just to define a conceptual scale – it includes, for example, such “coherent areas of conceptualization” as the geometric structure of space, the rules of baseball, and the nature of the speech act situation. Nonetheless, and rather trivially, the ordered elements on a conceptual scale necessarily constitute a semantic domain relative to which an ordering relation is characterized.

Some simple examples may serve to illustrate. Time, or, more precisely, the set of points in time, functions as the semantic domain for relations like be earlier than and be later than.
These two relations, then, define two distinct conceptual scales with converse orderings on the domain of time: in one, times are ranked from the latest to the earliest; in the other, times are ranked from the earliest to the latest. Similarly, the (extremely complex) semantic domain of private property involves, among other things, the set of things which can, in some sense, be possessed: elements of this set can be ordered on conceptual scales by relations like be more valuable than or be less valuable than.

In general, conceptual scales simply reflect the fact that much of our real-world knowledge depends on the way we experience different entities in terms of orderings of one sort or another. The technical notion of a conceptual scale is thus not all that different from the pretheoretical, commonsense notion, though it is strikingly more general. A great many of our most basic experiences of the world, and particularly of the physical world, are scalar in nature: the dimensions of space (length, width, depth, height), color (brightness, saturation, hue), sound (pitch, amplitude), and temperature (warmth, cold), to name some of the most basic perceptual domains, are all essentially scalar. And they are similar in that they all feature objective, and indeed measurable, properties of the world. But the elements on a scale need not be measurable in any objective sense: they only need to be ordered, and this can be done in basically any way conceivable. In addition to prototypical scales based on measures and quantities, conceptual scales may be defined by rank orderings, hierarchies, taxonomies, or sequences. Hierarchies – including the orders of poker hands, military ranks, and social classes – consist of an ordered set of ranks based on dominance relations, and taxonomies consist of sets of categories ordered by inclusion (i.e. the isa relation). Thus, in Linnaean taxonomics, a phylum includes classes, classes include orders, orders include families, and families include genera. In this sense, biological classification in general is a scalar semantic domain. Similarly, many complex processes require an intrinsically ordered series of steps: to make an omelette one must break the eggs, beat them, and cook them, in just that order, and so “omelette-making” is an inherently scalar domain. Finally, even the basic experience of moving along a path is fundamentally scalar in nature, since every point on a path is intrinsically ordered with respect to every other point.

All these sorts of orderings are regularly used to draw scalar inferences. If we observe someone on a path, we use their position and direction of movement to infer where they have been and where they are going. If we observe someone cooking a meal or building a house – or in any telic process, for that matter – we infer from where that person is in the process what sorts of things they must already have done, and what they may be doing next.
If we observe someone succeed at a difficult activity – lifting a heavy object, reading a difficult text, performing a complicated acrobatic routine – we assume this person will also succeed at any comparable but less difficult activity. Inferencing of this sort is essential to the most ordinary activities of everyday life.

Ultimately, the structure of conceptual scales and their broad significance for reasoning in general stem from a basic cognitive ability to compare and contrast different events in our mental experience. As Langacker (1987: 101) has argued, this ability is fundamental to virtually all aspects of cognition:

Fundamental to cognitive processing and the structuring of experience is our ability to compare events and register any contrast or discrepancy between them. Such comparison is at work when we perceive a spot of light against a dark background, for example, or when we catch a spelling error. I assume that this ability to compare two events is both generalized and ubiquitous: acts of comparison continually occur in all active cognitive domains, and at various levels of abstraction and complexity; regardless of domain and level, moreover, they are manifestations of the same basic capacity (or at least are functionally parallel).

Conceptual scales are thus a manifestation of a much more general feature of cognitive processing. One should thus not mistake their apparent simplicity for a sign of triviality. Scales seem so simple only because they are such a fundamental part of the way we conceive the world.

3.3.3 Inferential mechanisms: scalar models

Conceptual scales form the foundation for scalar reasoning, but there is more to scalar reasoning than just conceptual scales. As defined above, a conceptual scale can consist of elements of any semantic type. However, we do not in fact reason about structures of any semantic type; rather we reason about things like propositions, possible worlds, and communicative intentions, all of which regularly depend on complex combinations of simple scales. If, for example, all I know is that a new Ferrari costs more than a candy bar, there is little I can conclude beyond the mere corollary that a candy bar costs less than a new Ferrari. On its own, the ordering of elements on a conceptual scale is not so informative, but in combination with other simple scales in a matrix of imaginable propositions, its real usefulness emerges. The ordering of Ferraris and candy bars on a scale of cost is really only of interest where (or to the extent that) such items are considered as potential items of purchase. Thus, if I know my friend Paul can afford a Ferrari, then I also know he can afford a candy bar. And given the logic of scalar implicature, if, when asked for a loan, Paul says that he can lend me enough for a candy bar, I will reasonably conclude that he won’t lend me enough for a Ferrari.

To explain scalar reasoning, we need to understand how reasoners use their knowledge of conceptual scales to draw inferences about propositions in discourse. The way they do this, I suggest, is that they use the complex kind of conceptual structure which Fillmore, Kay, and O’Connor (1988) dubbed a scalar model.
Loosely, a scalar model is a structured set of propositions ordered in terms of one or more conceptual scales. A scalar model consists of a propositional function (or schema), P, with one or more open variables, each ranging over the ordered elements of a conceptual scale. Scalar models are thus (potentially) multidimensional structures: each variable in P defines a unique dimension and each dimension corresponds to a distinct conceptual scale. Fillmore, Kay, and O’Connor (1988: 535–6) call the set of dimensions, D, in a scalar model an argument space, since it is the values in D which define the set of potential arguments for the function P.
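
The structure just described can be pictured as a small data structure. The sketch below is an illustration under stated assumptions rather than anything proposed by Fillmore, Kay, and O'Connor: the class `ScalarModel` and its field names are hypothetical, each dimension is assumed to be a finite list of scale values, and propositions are represented simply as strings.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List, Sequence


@dataclass
class ScalarModel:
    """Hypothetical container for a propositional function and its argument space."""
    schema: Callable[..., str]           # the propositional function P
    dimensions: Sequence[Sequence[str]]  # one conceptual scale per open variable

    def propositions(self) -> List[str]:
        """Apply P to every point in the argument space D."""
        return [self.schema(*point) for point in product(*self.dimensions)]


puzzles = ["y1", "y2", "y3"]    # assumed order: easiest to hardest
puzzlers = ["Stella", "Norm"]   # illustrative second dimension

one_dim = ScalarModel(lambda y: f"Norm can solve {y}", [puzzles])
two_dim = ScalarModel(lambda x, y: f"{x} can solve {y}", [puzzlers, puzzles])

print(one_dim.propositions())
# ['Norm can solve y1', 'Norm can solve y2', 'Norm can solve y3']
print(len(two_dim.propositions()))  # 6
```

Adding a variable to the schema adds a dimension to the argument space, so the number of propositions in the model is the product of the sizes of its scales.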

Figure 3.1 A scalar model of puzzles [P: ‘Norm can solve y’; puzzles y1 (easiest) to y5 (hardest)]

Figure 3.1 presents a one-dimensional scalar model in which a propositional function, P[y], stated informally as ‘Norm can solve y,’ combines with a conceptual scale of puzzles ordered in terms of their difficulty. The scale specifies a range of values, 〈y1, y2, y3, …〉, as possible arguments for the function P. Within a scalar model, whenever the propositional function P holds for some point yn, then P can be assumed to hold for all points lower than yn on the scale. In other words, for any two propositions P[yi] and P[yj], where yi > yj, P[yi] → P[yj]. Fauconnier (1975a: 193) calls this basic principle of commonsense logic the Scale Principle: here the solid arrow pointing down from y4 thus represents inferences following from the truth of P[y4].

Expanding the argument space only slightly, Figure 3.2 depicts a two-dimensional model pairing puzzles ranked in terms of their difficulty with puzzlers ranked in terms of their acuity. Elements on both dimensions are ordered in a way that supports inferences in terms of their potential to satisfy the propositional function Q, ‘x can solve y’. The ordering of puzzles from the easiest to the hardest reflects a default assumption that if someone can solve a hard puzzle, they can also solve an easier one. Similarly, the ordering of puzzlers from the brightest to the dullest reflects the default assumption that if a dimwitted puzzler can solve a particular problem, then any more clever puzzler will succeed as well. Once again, the scalar model defines a pattern of pragmatic entailments. Given the truth of a proposition p within the model (i.e. where p has a value of T), one can infer that any distinct proposition q which is lower than p on at least one dimension and no higher than p on any other dimension will also be true. Conversely, given the falsity of p (i.e. where p has a value of F), one can conclude that any proposition q higher than p on at least one dimension and no lower than p on any other dimension will also be false. The ordering of elements on a conceptual scale may reflect cultural assumptions or context-specific expectations, so entailments within a scalar

Figure 3.2 A two-dimensional scalar model [Q: ‘x can solve y’; axes: puzzles (easy to hard) and puzzlers (Stella, Norm, Dim); regions marked T and F]

model need not be, even if they usually are, logically valid. Very clever people can be confused by things which should be obvious, and very simple problems can sometimes baffle a brilliant mind. Still, an assertion that one can solve the most difficult puzzle normally invites the inference that one can in fact solve any puzzle. As Fillmore, Kay, and O’Connor (1988: 537) put it, such entailments hold relative to a scalar model. For Fauconnier, such context-sensitive inferences are “pragmatic entailments.” Pragmatic entailments assume a sort of ceteris paribus condition: they are inferences which do not necessarily hold in all the possible worlds, but just in all the worlds one might reasonably consider on any given occasion. They are thus practically, if not logically, valid. The structure of scalar models, together with the practical wisdom of Fauconnier’s Scale Principle, trivially explains how superlative expressions sometimes allow quantificational interpretations. Given the rather innocent assumption that superlatives designate the endpoint of a conceptual scale, it follows from the Scale Principle that, with the right sort of propositional function, predication of a superlative will pragmatically entail all values lower on the scale. The Scale Principle thus effectively turns superlatives into universal quantifiers. The question is, what, exactly, makes something the right sort of propositional function? As outlined above, a scalar model consists of a propositional function and an argument space defined by a set of conceptual scales. Significantly, different propositional functions may operate on the same argument space to define different scalar models. Thus, the conceptual scale of puzzles above might combine with a variety of schemas – for example, ‘I doubt that Norm can solve y,’ ‘I

expect that Norm can solve y,’ ‘If Norm can solve y, I’ll give him a candy,’ ‘Stella will be pleased if Norm can solve y’ – to define a class of related scalar models. The inferences available in a scalar model depend on the schematic proposition that defines that model, but for our purposes here, all such propositional schemas divide into two basic sorts depending on the direction of the inferences they support. With a schema like P, ‘Norm can solve y,’ in Figure 3.1, inferences canonically flow from high to low values for y. With the contradictory schema ¬P, ‘Norm cannot solve y,’ and with other negative and affective contexts, the inferences are reversed, and so run from low to high scalar values, from easier to harder propositions; the validity of any proposition ¬P[yn] low in the model pragmatically entails the validity of all propositions in the model higher than yn. In semantics, as in photography, everything is backwards in the negative, and so for any two propositions ¬P[yi] and ¬P[yj], if yi < yj, then ¬P[yi] → ¬P[yj].

Beyond negation, I follow Fauconnier (1976, 1978) in drawing a general distinction between propositional schemas which, like simple affirmatives, license inferences from high values to low values for a thematic argument in a scalar model, and those which, like negation, reverse these entailments and license inferences from low values to high values for the same argument. Since affirmative assertions constitute the unmarked context, I will refer to schemas which license inferences from high values to low values as scale preserving, and schemas which license inferences from low values to high values I will call scale reversing. Additionally, there may be propositions which, for whatever reason, are not construed with respect to a scalar model or which simply do not license inferences within a scalar model. Such propositions are non-scalar.

The crucial notions of scale reversal and scale preservation provide the foundation for a definition of polarity contexts, and ultimately, for an explanation of polarity sensitivity. As noted above (§3.2), and as Fauconnier (1976) showed in detail, scale reversal appears to be a general property of the contexts which license canonical polarity items like English ever and French jamais. This suggests that polarity items really are in some sense like quantificational superlatives, and that polarity sensitivity in general is a sensitivity to scalar inferencing. In what follows, I propose that polarity items are a special class of scalar operators: their distribution thus depends on the availability, in context, of an appropriately structured scalar model, and on the way an expressed proposition is construed in relation to its scalar alternatives. NPIs need a model with reversed scalar inferences; PPIs need one with preserved scalar inferences.
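
The inference patterns described in this section can be sketched in Python. This is a minimal illustration under stated assumptions: the function names `inferable` and `pragmatically_entails` are hypothetical, a scale is assumed to be a finite list running from low to high, and propositions in a multidimensional model are assumed to be identified by tuples of ranks, one per dimension.

```python
def inferable(scale, asserted, reversing=False):
    """Values for which the schema can be inferred to hold,
    given that it holds for `asserted`.

    `scale` lists values from low to high. A scale-preserving schema
    licenses inferences to lower values; a scale-reversing schema
    licenses inferences to higher values.
    """
    i = scale.index(asserted)
    return scale[i + 1:] if reversing else scale[:i]


def pragmatically_entails(p, q):
    """Multidimensional case for a scale-preserving schema.

    `p` and `q` are tuples of ranks, one per dimension. The truth of p
    licenses q when q differs from p and is no higher than p on any
    dimension.
    """
    return p != q and all(qi <= pi for pi, qi in zip(p, q))


puzzles = ["y1", "y2", "y3", "y4", "y5"]  # easiest to hardest

print(inferable(puzzles, "y4"))                  # ['y1', 'y2', 'y3']
print(inferable(puzzles, "y2", reversing=True))  # ['y3', 'y4', 'y5']
print(pragmatically_entails((2, 4), (1, 4)))     # True
print(pragmatically_entails((2, 4), (3, 1)))     # False
```

The first call corresponds to ‘Norm can solve y4’, the second to ‘Norm cannot solve y2’. In the last call the second proposition is higher on one dimension, so nothing follows about it.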

3.4 Affectivity as a mode of scalar construal

If it really is an act of inferencing that licenses NPIs, then strictly speaking licensing itself is neither a syntactic relation between constituents in a sentence, nor even a semantic relation between propositional contents, but fundamentally a pragmatic relation – a mode of construal – which holds between a conceptualizer and the expressed proposition to which a polarity item contributes its meaning. An act of inferencing is always a matter of judgment. To draw an inference in a scalar model, one must make a judgment about the factual status in some discourse context of a single proposition in the model. Given the scalar background, the way any proposition is judged – e.g. whether it is taken as affirmed, denied, suggested, hypothesized, or questioned – has automatic consequences for the way other propositions in the model can be judged. It is the mode of judgment – intuitively, the way a proposition is brought to mind – that determines whether a propositional function is scale-reversing or scale-preserving, and so, by hypothesis, what sorts of polarity items it can license or tolerate. Of course, there is nothing new to the idea that polarity licensing is somehow linked to scalar inferencing, nor even to the idea that this link is somehow mediated by pragmatics. Since Ladusaw (1979) first proposed that licensing is a matter of logical semantics, it has been widely noted that the relevant inferences are subject to pragmatic constraints: they can be triggered by certain conversational implicatures (Linebarger 1980, 1987, 1991; Israel 1996), and they may be blocked if certain presuppositions are not held constant (Heim 1984; Kadmon & Landman 1993; von Fintel 1999; Horn 2002). But while all may admit a role for pragmatics in polarity licensing, the phenomenon is widely seen as in essence a semantic constraint on grammatical well-formedness – a constraint that is defined on a model-theoretic representation of an expression’s objective truth conditions. 
This is both because in general, as Ladusaw noted, “whether an expression (lexical or phrasal) is a trigger is predictable from its meaning” (1979: 3), and especially because the relevant sort of meaning here consists in an expression’s contribution to the truth conditions of an expressed proposition. Since most linguists will agree that sentences like *I ever kissed her and *There was anyone at the party are not merely semantically anomalous but somehow truly “ungrammatical,” the default assumption has been that the mechanism responsible for polarity licensing must apply at some level of grammatical representation. There is, however, a substantial question here as to whether the inferences involved in polarity licensing are best understood as properties of linguistic representations per se, or as elements of conceptual

structure more generally. In other words, are the constraints on polarity items really a matter of sentence grammar, or do they apply rather at the level of utterance interpretation itself, as constraints on coherent conceptualizations? Since my basic argument here is that polarity licensing depends on the way an expressed proposition is construed within a scalar model, and since I have already defined scalar models as schematic conceptual structures which facilitate general patterns of inferencing, it follows that polarity licensing itself must be essentially a conceptual phenomenon. In general, I take it, any linguistic construction, of any arbitrary complexity (whether a word, a phrase, a clause, or a complex schema), is associated with some semantic content which determines (or at least constrains) the contribution it makes to an expressed proposition. But an expressed proposition is more than just a reflection of linguistic semantics: it is the culmination of a process of meaning construction in which a variety of complex cognitive abilities and structured features of background knowledge are integrated in a mental space (Fauconnier 1985, 1997; cf. Sperber & Wilson 1986). In this light, I contend that polarity contexts are defined not just by the logical, semantic, or syntactic properties of constructions, but also, and crucially, by the pragmatic properties which determine how an expressed proposition is construed in context. The relevant level of analysis here is thus neither purely linguistic nor purely conceptual, but a little of both: it is the level at which a speech act is conceptualized in an act of communication, the mental space in which a proposition is expressed and interpreted. Since polarity items, as opposed to polarity contexts, are necessarily linguistic constructions, they must occur in a linguistic context of some sort. 
But if licensing depends on the construal of a constructional meaning within a scalar model, then polarity contexts are properly defined not by their grammatical structure per se, but directly in terms of their semantic-pragmatic effects. The formulae below thus define three ways a focused construction C can be construed in a context (or mental space) M with respect to a scalar model SM built on a conceptual scale S = 〈Q, R〉, with Q a set of conceptual entities and R an ordering on Q. Let c′ be the conventional interpretation of c in M, assume that c′ ∈ Q, and for any element x ∈ Q, let M [x / c′] be the context M where c′ = x. Then:

(11) a. M is scale preserving with respect to c iff, for all xi, xj ∈ Q, xi < xj → (M [xj / c′] → M [xi / c′])
     b. M is scale reversing with respect to c iff, for all xi, xj ∈ Q, xi < xj → (M [xi / c′] → M [xj / c′])
     c. M is non-scalar if it is neither scale preserving, nor scale reversing.
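
The definitions in (11) can be tried out on a toy model. The sketch below rests on assumptions that go beyond the definitions themselves: the inner arrow is modelled as truth preservation across a small finite set of worlds, each world is identified with the hardest puzzle rank Norm can solve in it, and the names `classify`, `difficulty`, and `worlds` are hypothetical.

```python
from itertools import combinations


def classify(context, scale, worlds):
    """Classify `context` along the lines of (11).

    context(x, w) -> bool says whether M[x / c'] holds in world w.
    `scale` lists the elements of Q from low to high.
    Entailment is modelled, as an assumption, as truth preservation
    across every world in `worlds`.
    """
    def entails(a, b):
        return all(context(b, w) for w in worlds if context(a, w))

    pairs = list(combinations(scale, 2))  # every (xi, xj) with xi < xj
    if all(entails(xj, xi) for xi, xj in pairs):
        return "scale preserving"
    if all(entails(xi, xj) for xi, xj in pairs):
        return "scale reversing"
    return "non-scalar"


difficulty = [1, 2, 3, 4, 5]  # hypothetical puzzle ranks, easy to hard
worlds = range(0, 6)          # w = hardest rank Norm can solve in that world

print(classify(lambda x, w: x <= w, difficulty, worlds))  # scale preserving
print(classify(lambda x, w: x > w, difficulty, worlds))   # scale reversing
print(classify(lambda x, w: x == w, difficulty, worlds))  # non-scalar
```

The three contexts stand for ‘Norm can solve x’, ‘Norm cannot solve x’, and ‘x is exactly the hardest puzzle Norm can solve’. A degenerate context that satisfied both (11a) and (11b) would be reported as scale preserving by this sketch, since that condition is tested first.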

The examples below (12–16) illustrate pairs of semantically similar grammatical contexts which differ in the scalar construals they allow for the models of puzzles and puzzlers discussed above (§3.3.3). The constituents in square brackets in the (a) examples support scale-preserving interpretations, as demonstrated by the apparent entailments from (high-scalar) hard problems to (low-scalar) easy problems; in the minimally different (b) examples, the bracketed constituents are scale reversing, giving inferences from (low-scalar) “easy problem” propositions to (high-scalar) “hard problem” propositions. The overt triggers most responsible for these scale reversals are shown here in boldface.

(12) a. Someone [who could solve the hard problems] got a prize.
        → Someone who could solve the easy problems got a prize.
     b. Everyone [who could solve the easy problems] got a prize.
        → Everyone who could solve the hard problems got a prize.

(13) a. Norm must [be able to solve the hard problems] to get a prize.
        → Norm must be able to solve the easy problems to get a prize.
     b. If [Norm can solve the easy problems], he’ll get a prize.
        → If Norm can solve the hard problems, he’ll get a prize.

(14) a. Norm quit after [he figured out how to solve the hard problems].
        → Norm quit after he figured out how to solve the easy problems.
     b. Norm quit before [he figured out how to solve the easy problems].
        → Norm quit before he figured out how to solve the hard problems.

(15) a. I expected [Norm could solve the hard problems].
        → I expected Norm could solve the easy problems.
     b. I’d be surprised if [Norm could solve the easy problems].
        → I’d be surprised if Norm could solve the hard problems.

(16) a. I’m sure [Norm could solve the hard problems].
        → I’m sure that Norm could solve the easy problems.
     b. I doubt [Norm could solve the easy problems].
        → I doubt Norm could solve the hard problems.

In all these cases the context M is a clause which supports the construal of problem-solving situations in terms of the ease or difficulty of the problems involved. The inferences here thus depend on a scalar model like that in Figure 3.2, with a conceptual scale S including a set Q of things that can be solved (puzzles, riddles, problems, mysteries, or whatever) and an ordering relation R ranking members of Q in terms of their difficulty. Each sentence includes an instance of the English noun phrase (NP) construction: thus, C is the NP construction, c is the particular NP, either the easy problems or the hard problems, and c′ is the scalar interpretation of the NP in M as contrasting with other members of Q.

The definitions in (11) are, it should be noted, strictly analogous to standard definitions of downward and upward entailing (UE) operators (e.g. Ladusaw 1979: 145–6; van der Wouden 1997: 90; §8.4 below). The real question about any such definition lies in the definition of the arrow “→” here: in just how, that is, one understands the notion of “entailment” or “informativity” which holds between propositions in a scalar model. The basic claim here is that affective contexts are defined precisely by their effects on scalar inferencing, and in particular by their effects on things like the interpretation of superlative NPs and on the use of scalar focus particles like English even, French même, and German sogar and auch nur. The examples in (12–16) support this hypothesis, since it appears that the constructions here which license NPIs are just those triggers in the (b) examples that license inferences from ‘easy’ to ‘hard’ problems. These examples also suggest that the inferences relevant for licensing can be identified precisely with a sentence’s logical semantic entailments: in each case it seems that any reasonable person who accepted one of these premisses would also have to accept the truth of the consequent proposition.

But scalar inferencing is not always so straightforward. Natural language sentences typically have many entailments, and sometimes they can seem to pull in opposite directions. Consider “approximate” (Huddleston & Pullum 2002: 815) or “quasi-negative” (Klein 1998) negators like few, rarely, scant, or hardly. These are well-known NPI triggers, but they also invite some clearly positive sorts of inferences: to say few people came to the show suggests that at least some people did come; to claim that one rarely smokes is in some sense to admit that one might smoke occasionally. This poses a problem for inferences like those in (17–18).

(17)

a. Few primates [can solve the easy problems]. b. → Few primates can solve the hard problems.

(18)

a. Norm can rarely [solve the easy problems]. b. → Norm can rarely solve the hard problems.
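Whether the inferences in (17–18) go through depends on how few is read. The sketch below is a toy check under assumptions that are purely illustrative: four primates, two difficulty levels, a hypothetical cutoff `FEW` for what counts as "few", and two candidate readings, an upper-bounded one ("at most a small number") and a two-sided one ("some, but not many"). The function names are invented for the example.

```python
from itertools import product

N_PRIMATES = 4
FEW = 1  # hypothetical cutoff for what counts as "few"

# A world assigns each primate the hardest difficulty it can solve
# (0 = nothing, 1 = easy problems only, 2 = easy and hard problems).
WORLDS = list(product(range(0, 3), repeat=N_PRIMATES))


def solvers(world, difficulty):
    """Number of primates in this world that can solve problems of this difficulty."""
    return sum(1 for threshold in world if threshold >= difficulty)


def few_at_most(world, difficulty):
    """Upper-bounded reading: at most FEW primates can solve the problems."""
    return solvers(world, difficulty) <= FEW


def few_some_but_not_many(world, difficulty):
    """Two-sided reading: at least one but at most FEW primates can solve the problems."""
    return 1 <= solvers(world, difficulty) <= FEW


def counterexamples(reading, easy=1, hard=2):
    """Worlds where the 'easy problems' sentence is true but the 'hard problems' sentence is false."""
    return [w for w in WORLDS if reading(w, easy) and not reading(w, hard)]


print(len(counterexamples(few_at_most)))            # prints 0
print(len(counterexamples(few_some_but_not_many)))  # more than 0: the inference can fail
```

On the upper-bounded reading the step from easy to hard problems never fails in this model, since anyone who solves the hard problems also solves the easy ones. On the two-sided reading it fails whenever exactly one primate solves the easy problems and none solves the hard ones.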

The thing is, in a situation where some, but not many (i.e. “few”) primates can solve the easy problems, it is perfectly conceivable that no primates could solve the hard ones. Thus, there are situations where (17a) is true, but where it would be misleading, at best, to utter (17b). But while the use of a weak negative like few usually does license a positive inference – i.e. few Xs Y suggests that ‘at least some Xs Y’ – it is unclear whether this inference is a semantic entailment. As many have noted (Horn 1969, 1972, 1989; Ducrot 1973; Carston 1998; Ladusaw 1979: 153), while positive quantifiers like many and often give lower-bounded ‘at least’ readings compatible with universal statements (e.g. Many – if not all – of my friends like chocolate), weakly negative quantifiers like few, rarely, seldom, hardly, and (arguably) only yield upper-bounded ‘at most’ readings which are compatible with negation (e.g. Few – if any – of my friends rob liquor stores). Since it is suspendible, the positive inference here seems to be a kind of implicature: the inferences in (17–18) are thus valid, and so the relevant contexts count as scale reversing.

In fact, there is good psycholinguistic evidence that speakers draw very different sorts of inferences from quantifiers like few and not many than they do from their counterparts a few and many. Moxey and Sanford (1993, 1994) report on a series of studies in which subjects were asked to complete a short discourse in which a pronoun refers back to a quantifier of some sort (e.g. {Few / A few} MPs were at the meeting. They …). The results show that with a positive quantifier, subjects overwhelmingly treat the pronoun as referring to what Moxey and Sanford call “the reference set” – the subset of a quantifier’s restriction which intersects with its nuclear scope (i.e. the MPs who were at the meeting); however, with negative quantifiers there is a strong tendency to interpret the pronoun as referring to the “complement set” – the subset of the quantifier’s restriction which is in the complement of the nuclear scope (i.e. the MPs who missed the meeting). Given this tendency, it seems clear that while quantifiers like few can support a positive inference, they typically serve to highlight a negative proposition and so function as scale reversers.

But while the positive inferences of the approximatives may be dismissed as mere implicatures, other polarity triggers really do seem to entail a positive proposition.
Consider, for example, a sentence like Only Bill had any fun, where the exclusively focused NP only Bill licenses the NPI any. This sentence clearly entails that ‘no one besides Bill had any fun’ (Horn 1969, 1996), but as Atlas (1993, 1996) has vigorously argued, it defies common sense to think such a sentence could be true if Bill himself did not have any fun. Since, as Horn (2002) himself concedes, a sentence like Only Bill had any fun, and even he didn’t is irreparably self-contradictory, the positive proposition associated with the only NP (i.e. here ‘that Bill had fun’) must be an entailment and not just an implicature. But this poses a problem for the putative reversed entailment in (19), since (19a) clearly does not share the entailment in (19b) that Norm can solve the hard problems, even if both sentences do entail that no one besides Norm can solve them. (19)

a. Only Norm can solve the easy problems. b. ? → Only Norm can solve the hard problems.

In fact, a number of polarity contexts come with similar positive entailments which surprisingly seem not to interfere with their potential as licensors. Factive adversatives like regret and be surprised that in (20–21) are another notorious example (Ladusaw 1979; Linebarger 1980, 1987, 1991; Heim 1984; Kadmon & Landman 1993; von Fintel 1999). These forms both license NPIs in their complements and seem to presuppose the truth of the propositions to which these NPIs would contribute. This combination poses a problem for the putative entailments in (20b–c) and (21b–c).

(20)

a. I’m surprised that Norm solved any of the problems. b. I’m surprised that Norm solved the easy problems. c. ? → I’m surprised that Norm solved the hard problems.

(21)

a. I regret having tried to solve any of the problems. b. I regret having tried to solve the easy problems. c. ? → I regret having tried to solve the hard problems.

The problem is that one could perfectly well be surprised that Norm solved some easy problems without ever believing, let alone being surprised, that he also solved any hard problems. Of course, these triggers are scale reversing in the limited sense that, given a “constant perspective” on what counts as surprising (Kadmon & Landman 1993: 381), if I am surprised that Norm solved the simplest puzzle, I will be even more surprised if he turns out to have solved some harder puzzle. Thus, von Fintel (1999) calls the relation from the (b) to the (c) sentences in (20–21) “Strawson Entailment” because the inference only works if one can ignore the sentences’ presuppositions. Contexts like these are scale reversing with respect to their assertions, but not with respect to their presuppositions. This suggests that for purposes of NPI-licensing, what really matters is not so much what a sentence entails, but crucially, what it says – the ostensive contribution it makes to a discourse context. Nowhere is this more dramatically illustrated than in the behavior of forms like almost and barely. Surprisingly, while almost in (22) seems to entail a negative proposition and barely in (23) seems to entail a positive proposition, it is actually barely, and not almost, which licenses an NPI in (24). (22)

a. Yesterday, Stella almost proved Fermat’s Last Theorem. b. → Stella didn’t prove Fermat’s Last Theorem.

(23)

a. Dim barely knows how to balance his checkbook. b. → Dim does know how to balance his checkbook.

(24)

a. *Stella almost proved anything. b. Dim barely knows anything.

These examples seem puzzling if one assumes that NPIs are somehow triggered by the expression of a negative proposition, since almost with its negative entailments clearly seems more negative than barely with its positive entailments. But not all entailments are created equal. In the spirit of Ducrot (1973, 1980: 20–2; see also Verhagen 2005: 41ff.), I suggest that almost and barely are argumentative operators which, when added to a monoclausal sentence p with a single entailment P, yield a sentence with two entailments: [almost p] entails both (i) that the situation is, in some salient way, very much like one in which P is true, and (ii) that P is not true; and [barely p] entails both (i) that the situation is somehow very like one in which P is false, and (ii) that P is true. There has been a great deal of debate about the status of these dual entailments, if that is indeed what they are (e.g. Sadock 1981; Atlas 1984; Fillmore, Kay & O’Connor 1988; Horn 1991, 1996; Klein 1998 – see Horn 2002 for an overview), but for our purposes here, the important point is just that the two propositions are not on the same pragmatic footing: for both forms, it is the first which is communicatively salient (i.e. asserted), while the second is presented as relatively uncontroversial (or presupposed). In effect, (22a) presents a situation in which Stella has failed to accomplish something very impressive, but it asserts that what she did do is quite impressive in itself. Similarly, (23a) evokes a situation in which Dim can do something very unimpressive, but it asserts that he cannot do anything more. Almost is thus scale preserving: if Stella can accomplish a lot with a difficult problem (even without solving it), then she will succeed all the more with an easier problem. Barely is scale reversing: if Dim has only meager success with easy problems, then clearly he will not have any greater success with harder problems.
While polarity licensing may depend on scalar inferencing, polarity items are not, it seems, sensitive to just any inference a context might support. As Horn (2002) suggests, the important thing is not what a sentence entails, but rather what it actually asserts:

Semantically entailed material that is outside the scope of the asserted, and hence potentially controversial, aspect of utterance meaning counts as assertorically inert and hence as effectively transparent to NPI-licensing and related diagnostics of scalar orientation. (Horn 2002: 63)

The existence of such assertorically inert entailments clarifies a fundamental fact about the nature of polarity sensitivity. The inferences which can license or block a polarity item are not just a matter of a sentence’s objective truth conditions, but rather are features of the ways a sentence can be used: they are, specifically, the inferences which a sentence is meant to communicate or taken to mean in the event of an illocutionary act. And in this sense the mechanism of polarity licensing is essentially a matter of pragmatics: really, it all depends on what is said and how it’s taken.

The idea that licensing in some sense takes place at the level of the speech act or utterance rather than at some more autonomous level of grammatical structure may also help with one of the great old mysteries of polarity sensitivity – that is, why both NPIs and PPIs are commonly licensed in interrogative clauses. The basic facts here have never fit easily with standard accounts of licensing, since questions are neither essentially negative nor obviously scale reversing (van Rooy 2003). NPIs like any and ever are licensed in all sorts of interrogative clauses: in yes–no questions in (25a), information questions in (25b), and indirect questions in (25c).

(25)

a. Will Norm ever solve any of these problems? b. When has Norm ever solved anything? c. I wonder if Norm will ever solve any of these problems.

Since the effect of the NPIs here, particularly with ever and any together in the same clause, is a strong bias towards a negative response like “no” or “never,” these sentences do have a negative flavor. But the negativity seems to come directly from the NPIs rather than the interrogative contexts, and NPIs normally cannot license themselves. And in any case the same contexts also license PPIs like some, already, long since, and every single one in (26).

(26)

a. Has Norm already solved some of these problems? b. Who has already solved some of these problems? c. I wonder if Stella has long since solved every single one of these problems.

Presumably, if NPIs really do require scale-reversing contexts and PPIs scale-preserving ones, then interrogative clauses must somehow provide both. But do they provide either? Is there any entailment either way with the pairs of questions in (27–29)?

(27)

a. Will Norm be able to solve the easy problems? b. ? → Will Norm be able to solve the hard problems?

(28)

a. When has Norm solved the easy problems? b. ? → When has Norm solved the hard problems?

(29)

a. I wonder if Norm will be able to solve the easy problems. b. ? → I wonder if Norm will be able to solve the hard problems.

The first problem is just to explain what it means for one question to entail another. A standard approach is to define the meaning of a question in terms of its possible answers, so that one question is said to entail another just in case any true and complete answer to the former is also a true and complete answer to the latter. Or, more generally, we can say that a sentence P pragmatically entails another sentence Q just in case the utterance of P normally commits a speaker to everything (at least) that the utterance of Q commits her to: thus a question P entails another question Q only if any speaker, in asking P, also asks in effect whatever would be asked by asking Q. But this is clearly not the case for the questions here. The putative entailment in (27) fails since a “yes” answer might easily be true for (27a) but false for (27b); and similarly for (29), one can perfectly well wonder whether Norm can solve an easy problem without thereby wondering whether he can solve any harder problem. Interrogative clauses are thus not scale reversing, and they are not scale preserving either, since one could easily wonder about a man’s ability with some hard problems without thereby questioning his ability with the easier problems. According to these criteria then, interrogative clauses appear to be inherently non-scalar.

This makes sense if interrogatives really are neutral between affirmation and denial; still, the act of posing a question is rarely rhetorically neutral.3 While the primary inference licensed by a question may be just that the questioning speaker is unsure of the truth of a questioned proposition, still, as Fauconnier (1980: 63–4) points out, the use of a question like (27a) can say something about a speaker’s attitude toward questions like (27b). If a speaker wonders about the easy problems, then she must either believe that Norm cannot solve the harder ones or else be uncertain about his ability, since if she believed that he could solve the hard ones, she would have to conclude, by the logic of the scale principle, that he could solve the easy ones too.
Thus, while questioning a proposition in a scalar model does not entail any questioning of propositions higher in the model, it does presuppose that one either doubts or disbelieves those higher propositions. So the (a) questions in (27) really are stronger than the (b) questions since they express a greater degree of doubt and wonder. And in this sense, interrogatives really can be scale reversing, but the relevant inferences are not between the questions themselves, but the levels of doubt the questions express. Questions on their own seem to be inherently non-scalar, but the act of posing a question is itself enough to express a scale-reversing doubt. Or at least it can be. The important point is that only where such doubts are available do NPIs get licensed in interrogative clauses. And interrogatives do differ in this respect. For closed interrogatives like the yes–no questions in (27), the logic is fairly simple: just by posing a question about Norm’s ability with easy problems, a speaker will normally implicate her doubt that Norm could solve the harder ones. But with the open interrogatives in (28), the questions turn not on problem-solving abilities per se, but on the particular times such abilities have been demonstrated. Taken literally, such questions are not scale reversing even in the extended sense described above: one can perfectly well wonder when Norm solved an easy problem without being in any doubt as to whether or when he solved the harder ones. In order to get the inference from easy problems to hard ones here, the questions must be taken as purely rhetorical, not as requests for information about an event, but as expressions of doubt that an event took place at all. And taken this way, a sentence like (28a) is not really a question at all but a sort of indirect denial.

Since information questions are only scale reversing on a rhetorical reading, it follows that only rhetorical information questions should license NPIs. And indeed, this is the case. As the contrasts in (30–31) suggest, while any and ever can occur with minimal or no bias in yes–no questions, they come with a note of skepticism or disbelief when they occur in information questions.

(30)

a. Do you want any bourbon? b. Who wants any bourbon?

(31)

a. Have you ever hunted wild boar? b. When have you ever hunted wild boar?

These facts suggest that interrogative clauses license polarity items not because of what they mean (their semantic entailments) but rather because of the ways they can be used (their illocutionary effects). In other words, it’s not the question itself that licenses a polarity item, but the way it is posed. While approximatives like few, exclusives like only, and adversatives like regret all come with entailments or implicatures which might be expected to block reversed scalar inferences, interrogatives on their own seem to lack the entailments needed to trigger scale reversing. These constructions are not united by their contributions to truth-conditional meaning but rather by their effects on the presentation of such meanings – that is, by the ways they frame a proposition within a scalar model. The evidence reviewed in this section thus suggests that polarity contexts are defined not by their logical meanings alone, but rather and crucially by the ways their meanings affect pragmatic inferencing in a communicative context.

3.5 Syntactic constraints on scalar construals

Beyond the basic problem of finding an eligible trigger, negative polarity items must negotiate a number of minor constraints and petty regulations in order to receive a license. Prominent among these are locality conditions prohibiting certain operators from intervening between an item and its licensor (Linebarger 1980, 1987) and the precedence condition which requires a licensor to precede the items it licenses (Ladusaw 1979; Hoeksema 2000). The behavior of NPIs in contexts with multiple licensors is also relevant here: given the logic of scalar reversal and the law of double negation, one might expect any two scale-reversing triggers to form a scale-preserving context, so that NPIs would require an odd number of triggers to be licensed, but this is not always the case (Baker 1970; Chierchia 2004). These constraints show that there is more to licensing than the mere presence of an eligible licensor; however, their effects apply not just to polarity licensing but to the availability of scalar construals more generally. As a group they show just how delicate scalar construals can be, and they further support the claim that such construals are precisely what licensing depends on.

3.5.1 The precedence condition

All human languages feature utterances structured as a linear progression of symbolic units, and many languages have rules which govern the ordering of sentence constituents. As (32–34) suggest, NPIs like English any normally appear only after an overtly preceding trigger.

(32)

a. The dancers didn’t eat anything for breakfast. b. *Anything wasn’t eaten by the dancers.

(33)

a. None of the dancers ate anything for breakfast. b. *Any of the dancers ate nothing for breakfast.

(34)

a. Nobody came to visit at any time. b. *At any time nobody came to visit.

The precedence condition seems to apply only within a single clause, since NPIs can occur before their licensor when embedded in the clausal complement of an adversative predicate, as in (35), though only if the NPI-containing clause is itself preceded by some overt indication of its subordinate status. (35)

a. *(The idea that) she would so much as think of betraying you sickens me. b. *(The notion) he would budge an inch to save her is pure fantasy. c. *(That) she would lift a finger to help him comes as a complete surprise.

As Ladusaw (1979) acknowledged, the precedence condition greatly undermines the idea that polarity licensing is a matter of logical semantics: since the logical operators which trigger polarity items take scope over whole propositions, their surface word order should not affect licensing. Basically, the (a) and (b) sentences in contrasts like (32–34) above are, or should be, logically identical. This suggests that it is just a hard syntactic fact of life that NPIs must follow their licensors, at least when both occur in the same clause. But of course the same proposition can be construed and constructed in many different ways, and so the precedence condition may reflect something about the ways NPIs contribute to the construal of a proposition. The problem seems to be that where NPIs occur before their licensors, they are liable to be interpreted as denoting particular, individuated referents, rather than virtual scalar endpoints. Thus, (36a) suggests a specific finger that was lifted to help no one, (36b) a specific occasion of Hillary batting just one eye, and (36c) a particular expression of interest slighter than all others.

(36)

a. *Bill lifted a finger to help none of his friends. b. *Hillary batted an eye at none of his outrageous antics. c. *Monica has shown the slightest interest in none of these issues.

The important intuition is perhaps clearest when NPIs occur in subject position, and so are more likely to be construed as topical. The sentences in (37) are heavily biased toward an interpretation in which their respective fingers, interests, and eyes are construed as referential, and so the effect of ungrammaticality is markedly enhanced. (37)

a. **A finger was lifted to help no one. b. **An eye was batted at none of his antics. c. **The slightest interest was not expressed.

It appears, then, that for at least some NPIs, the scalar properties of a licensing context must be established on-line before the item itself can be introduced. In fact, it is difficult, though not quite impossible, to construe any end-of-scale expression within the scope of a following tautoclausal scale reverser. Thus, in (38), while the superlative NP the simplest problem may allow a quantificational interpretation with the following negative (particularly if construed as the focus of even), the construction of the sentence as a whole is awkward in a way that seems analogous to the ill-formedness in violations of the precedence condition.

(38)

a. ??The simplest problem couldn’t be solved by those fools. b. ??Oscar solved the simplest problem on none of his exams.

The awkwardness of these examples does not, of course, explain the precedence condition, but it does suggest that the condition may be a constraint on scalar construals in general rather than, as often assumed (Crain & Pietroski 2002), a purely syntactic rule.

In any case, as Hoeksema (2000) has shown, the condition does not apply equally to all polarity items. English NPIs like as of yet, auxiliary need, and can stand are among those which can and often do precede their licensors.

(39)

a. As of yet, there has been no answer from the Klingons. b. You need not trouble yourself with the details. c. I can stand the anticipation not a second longer.

Other NPIs, or NPI-like constructions, actually require their licensor to follow them. For example, the likes of which construction, illustrated with examples from Google in (40–41) below, features a polarity trigger inside the relative clause headed by which and thus after (or perhaps inside of) the NPI it licenses.

(40)

a. Reality has a taste the likes of which fiction can rarely match. b. It was a manhunt the likes of which we will never see again. c. Saddam believes that he is a great natural leader, the likes of which his world has not seen in thirteen centuries.

(41)

a. *Reality has a taste the likes of which fiction can sometimes match. b. *It was a manhunt the likes of which we will see again. c. *He is a natural leader, the likes of which his world has often seen.

Still, in a language like English, where negation canonically occurs at the start of a predicate, most NPIs do mostly occur after their licensors. The important point for my purposes here is just that where NPIs are blocked by a failure of precedence, the scalar inferences which license NPIs also appear to be systematically blocked.

3.5.2 Intervention effects

Sentences involving multiple operators are often multiply ambiguous. Negation is especially notorious for creating ambiguities which depend on what falls within its scope and what, precisely, is being negated. Example (42) illustrates a scope ambiguity arising from the interaction of negation with the quantifier every. On the reading in (a), the quantified NP every child is unaffected by the negation, which therefore is said to take narrow scope; on the reading in (b), every child is affected by the negation, which in this case is said to take wide scope. (Note that other readings are possible too: for instance, where a story takes wide scope with respect to negation.)

(42)

Margaret didn’t tell every child a story. a. There is no story that Margaret told to every child. b. Not every child was told a story by Margaret.

At this point an interesting fact emerges. If we substitute the NPI any for the indefinite article a in this example, the ambiguity disappears. The sentences below are only grammatical where the quantifier every takes wide scope with respect to negation and the NPI any is interpreted in the immediate scope of negation.

(43)

a. Margaret didn’t tell every child any story. b. Margaret didn’t tell any stories to every child.

The sentences in (43) are acceptable only on a narrow scope reading for negation. For this reason, given normal background assumptions, the (b) sentences in (44–46) sound odd or ungrammatical. In each case, the reading in which the any NP takes wide scope over the every NP is blocked or at least made unlikely for pragmatic reasons: one would not expect anyone to give each of his wives the same painting or to put the same egg in every basket consecutively.

(44)

a. Pablo didn’t give all of his wives a painting. b. *Pablo didn’t give all of his wives any paintings.

(45)

a. Gwyneth didn’t fill most of the baskets with eggs. b. *Gwyneth didn’t fill most of the baskets with any eggs.

(46)

a. Hillary didn’t make a donation to every charity. b. *Hillary didn’t give a red cent to every charity.

Horn (1998) makes a similar point when he argues that intervention effects explain the failure of absolutely and just to occur with polarity sensitive any (e.g. Alf won’t eat (*absolutely) any squid). The problem, according to Horn, is that degree adverbs like absolutely incorporate the semantics of a universal quantifier (see also Klein 1998 on the semantics of degree adverbs). In general, certain quantificational expressions – in English, for example, all, every, most, several, and always, among others – seem to absorb the force of a negative operator and block the licensing of NPIs in their scope (Linebarger 1980, 1987; Chierchia 2004), though other logical expressions – in particular, indefinite determiners like many and modals like necessarily and can – do not have this property. While the Scalar Model of Polarity cannot, on its own, explain these facts, it does at least predict them. The prediction is that NPIs should be acceptable in just those sentential contexts which reverse entailments in a scalar model. As it turns out, one basic effect of an intervening quantificational operator is to block scale reversal. The point is illustrated by the examples below where a single doughnut and the easiest problem cannot receive a scalar construal when they are interpreted in the immediate scope of every. (47)

a. Parker didn’t eat a single doughnut in every diner she visited. b. Bruno can’t solve the easiest problem on every test.

The preferred reading for (47a) is the one on which it entails that Parker ate no doughnuts, that is, the reading in which every takes wide scope over negation and a single takes narrow scope. Another possible reading entails that in at least some restaurants the cardinality of the doughnuts consumed by Parker was other than one. The reading which is, as far as I can tell, perfectly impossible, is the one where Parker ate no doughnuts in only some of the diners – that is, where negation takes wide scope over every, every takes wide scope over a single, and a single gets a scalar construal. Similarly, (47b) cannot be construed to mean that it is not on every test that Bruno fails to solve any problem (only perhaps on some tests). For whatever reason, every seems to absorb the scale-reversing properties of the negation and so blocks the quantificational reading of the superlative NP. Again, the constraints on polarity licensing appear to be part of a larger pattern affecting the availability of scalar construals in general.

3.5.3 The paradox of double negation

One of the oldest truths about negation is the law of double negation, that the negation of a negative proposition yields a positive proposition. More generally, as has been known since the work of the medieval scholastics (see Horn 1989, 1996; Sánchez Valencia 1991 for the history), in any complex expression containing multiple scale-reversing operators, the context as a whole will be scale preserving if the number of reversers is even, and scale reversing if the number of reversers is odd. This fact leads one to expect that if polarity items really are sensitive to scale reversal, polarity licensing should be sensitive to the ways multiple scale reversers can combine in a single complex context. But the facts here are more complicated than one might expect.
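The parity law just stated can be illustrated with a small sketch. It covers only the idealized logical pattern, not the licensing facts themselves, and it rests on assumptions made for the example: a world is a solver's ability threshold, plain negation stands in for any scale reverser, and the helper names (`negate`, `orientation`, `predicted_by_parity`) are hypothetical.

```python
def can_solve(threshold, difficulty):
    """Base context: a solver with this ability threshold can solve this difficulty."""
    return difficulty <= threshold


def negate(context):
    """Wrap a context in one scale-reversing operator (here, plain negation)."""
    return lambda world, x: not context(world, x)


def orientation(context, scale, worlds):
    """Brute-force check of the direction of inference a context supports."""
    pairs = [(lo, hi) for i, lo in enumerate(scale) for hi in scale[i + 1:]]
    # Preserving: whatever holds of the higher element also holds of the lower one.
    preserving = all(context(w, lo) for lo, hi in pairs for w in worlds if context(w, hi))
    # Reversing: whatever holds of the lower element also holds of the higher one.
    reversing = all(context(w, hi) for lo, hi in pairs for w in worlds if context(w, lo))
    if preserving and not reversing:
        return "scale preserving"
    if reversing and not preserving:
        return "scale reversing"
    return "non-scalar or trivial"


def predicted_by_parity(n_reversers):
    """The idealized law: an even number of reversers preserves, an odd number reverses."""
    return "scale preserving" if n_reversers % 2 == 0 else "scale reversing"


SCALE = [1, 2, 3]      # difficulty levels, lowest first
WORLDS = [0, 1, 2, 3]  # possible ability thresholds

context = can_solve
for n in range(5):
    # Each pass stacks one more negation on the context and compares the
    # brute-force result with the parity prediction.
    assert orientation(context, SCALE, WORLDS) == predicted_by_parity(n)
    context = negate(context)
```

In this toy setting stacked negations behave exactly as the law predicts. Whether polarity items track the same parity is a separate, empirical question.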
Baker (1970) was the first to address the problem of double negation in a systematic account of polarity phenomena. As he pointed out, while the PPIs would rather and still are blocked by a single negation in sentences like (48a and 49a), the same forms are licensed when they appear embedded under two negations, as in (48b and 49b) (1970: 171–2).

(48)

a. *Karin wouldn’t rather be in Montpellier. b. There isn’t anyone here who wouldn’t rather be in Montpellier.

(49)

a. *Someone isn’t still holed up in that cave. b. You can’t convince me that someone isn’t still holed up in that cave.

In much the same way, NPIs in affirmative clauses are licensed when that clause appears embedded in an appropriately scale-reversing context. Thus, in the (b) sentences below, the NPIs anything and all that are licensed because the sentence as a whole provides a scale-reversing context, even if the local clauses in which they appear do not.

(50) a. *The doctor is doing anything at the moment.
     b. I’m not sure that the doctor is doing anything at the moment.

(51) a. *Ellen is all that interested in your sordid tales.
     b. I find it hard to believe that Ellen is all that interested in your sordid tales.

If NPIs really are sensitive to scalar inferences, then they should also be blocked where an even number of scale reversers combine to form a scale-preserving context. In certain cases this prediction is borne out. Hoeksema (1986: 37–8) offers the examples below as a case in point.

(52) a. Every student who knows anything about logic should know Modus Ponens.
     b. *Not every student who knows anything about logic should know Modus Ponens.

(53) a. Only her husband was ever allowed to dance with her.
     b. *Not only her husband was ever allowed to dance with her.

In (52b) the combination of negation with the quantifier every creates a complex scale-preserving quantifier not every and so blocks polarity licensing. In (53b) the combination of negation with the normally scale-reversing focus particle only creates a complex operator not only which also blocks licensing.

Usually, however, when multiple scale reversers interact across a sentence, their combinations will not affect the licensing of an NPI, so long as the NPI is appropriately licensed by one of the scale reversers at some level of sentence organization. Thus, when an NPI is licensed within a given clause, embedding that clause within a negative environment does not affect its acceptability. The examples in (54), from Baker (1970: 177–8), and (55), from Horn (1996), show that doubling negations do not necessarily cancel the licensing power of a well-placed local negation, and so most multiply-negated clauses will welcome either NPIs or PPIs – though perhaps not both at once (54c).4

(54) a. There isn’t anyone here who wouldn’t care to do anything down town.
     b. There isn’t anyone here who wouldn’t rather do something down town.
     c. *There isn’t anyone here who wouldn’t rather do anything down town.

(55) a. None of the guests who had seen any of the suspects were excused.
     b. None of the guests who hadn’t seen any of the suspects were questioned.

These facts show that the scalar inferences needed for licensing need not be computed globally over a complete sentence, so long as they are locally available within some well-formed subpart. In (54a), for example, the NPIs care to and anything are licensed by the negated proposition expressed in the relative clause despite the fact that the matrix negation makes the sentence as a whole scale preserving and in fact yields the global entailment that everyone wants to do something in town. The licensing of NPIs in doubly negated contexts shows that if polarity items are licensed by scale reversal, the domain in which they are licensed cannot be that of the sentence as a whole but must be flexibly defined in a way that will include both simple propositions expressed by a single clause and the multi-propositional complexes formed by multi-clausal sentences. As Krifka puts it, “the semantic contribution of a polarity item can be exploited at various levels of a complex semantic expression, not just at the uppermost level of the sentence” (1995: 244). On the basis of such observations, Baker (1970: 178) offers what is surely the most colorful theory of cross-clausal polarity licensing in the literature:

We can think metaphorically of a presentational negative element as giving off paint, which spreads through any structure within the scope of that negative element. The flow of paint can, however, be stopped at any S, so that each S represents a sort of valve which, if shut, stops the flow of paint. However, if a valve is left open, the flow of paint cannot be stopped again except by some lower S.

Baker’s flowing paint theory of polarity captures the insight that for purposes of polarity licensing, a clause must be construed either as positive or as negative but never as both. There is in fact a good semantic reason for the apparently syntactic fact that the valves controlling the flow of polarity are located at S – that is, the level of the clause. The reason is that polarity licensing depends on the way a proposition is construed within a scalar model, and the clause is the minimal grammatical level which encodes a complete proposition. A clause which occurs with two scale-reversing operators can be given a scalar construal with respect to either one or both of these operators just so long as each operator can be understood as contributing to the expression of its own distinct proposition which contrasts with its own ordered set of alternatives. Problems arise only when two or more triggers either combine in a sort of complex operator (e.g. as in not every and not only in (52–53), above, and in other non-licensors like not unaware and not unlikely), or for some other reason are not easily construed as contributing to the expression of separate propositions.

(56) a. I doubt that anyone lifted a finger to help him.
     b. I doubt that no one lifted a finger to help him.

(57) a. There wasn’t anyone who lifted a finger to help him.
     b. ??There wasn’t anyone who didn’t lift a finger to help him.

(58) a. It was never the case that Wilbur didn’t lift a finger to help you.
     b. ??Wilbur never didn’t lift a finger to help you.
     c. ???Wilbur never failed to lift a finger to help you.

Thus, while lift a finger is fine in (56b), where it occurs in the negated complement of the negative verb doubt, it is awkward in (57b), where it occurs in a negated relative clause modifying the negated indefinite pronoun anyone. The difference is that in (56b) the embedded negative clause denotes a negative fact which may be considered doubtful or not; but in (57b) the interpretation of the relative clause depends on the interpretation of its indefinite antecedent, and so the proposition denoted – in effect, ‘that not anyone did not lift a finger’ – is composed of two negatives which are construed together, and which therefore do cancel each other out. So it seems there may be something to the old law of double negation after all, but it really only counts where the doublings come so quickly they can’t be kept apart.

3.6 Polarity contexts are mental spaces

I have argued here that Klima’s (1964) old grammatical feature, ±Affective, is not a property of grammatical representations, but a matter of imaginable conceptualizations. The contexts which license polarity items are defined not so much by their syntactic structures or logical form, but rather, and precisely, by their effects on scalar inferencing, and so by the ways they are judged and construed. The relevant modes of judgment are not matters of individual fancy or imagination but depend on the sorts of conceptual structures which make communication possible. Affectivity is thus not properly a syntactic relation between symbolic representations, nor a matter of entailments between objective propositions, nor even just a property of the ways a proposition can be subjectively entertained. It is a matter of meaning and communication. Polarity contexts are mental spaces in which conceptual contents can be jointly imagined and considered in a meaningful discourse. They are defined not just by the denotations of linguistic constructions, but by the very acts in which those constructions are used. Licensing indeed depends on meaning – not just sentence meaning, but speaker meaning as well; not just what is entailed, but also and crucially what is said.

4 Sensitivity as inherent scalar semantics

Nothing that actually occurs is of the smallest importance. Oscar Wilde (1894)

4.1 Scalar operators

Why should polarity items be sensitive to scalar inferencing? In some cases the answer is simple. Just like the quantificational superlatives, many NPIs literally designate a scalar endpoint. Some are themselves superlatives indicating minimal degrees: the foggiest notion, the least bit, in the slightest. Others, like sleep a wink, lift a finger, and a shred of evidence, feature a stereotypical minimal unit on some scale. These minimizer NPIs are like superlatives which only allow a quantificational reading: they have no inherent referential value, and so they cannot refer to a specific minimal unit, but they can be used emphatically, as a way of triggering reference to the ordered set of elements on a conceptual scale. And of course this is only possible in scale-reversing contexts, where pragmatic inferences are licensed from lower to higher scalar values. So, at least for the minimizer NPIs, the sensitivity to scalar inferencing seems intuitively well motivated. But how should this sensitivity be represented in the lexicon? And more importantly, will this sort of intuitive explanation extend to other polarity items with similar sensitivities but with very different scalar semantic properties?

This chapter seeks answers to these questions by exploring the hypothesis that polarity items in general constitute a broad but well-defined class of scalar operators. Fillmore, Kay, and O’Connor (1988) introduced the notion of a scalar operator to model the complex semantics and pragmatics of the idiomatic (and polarity sensitive) conjunction let alone. Scalar operators are themselves a special class of what Kay (1989) calls contextual operators – expressions whose meanings involve, usually in addition to constraints on the situations they can appropriately describe, constraints on the contexts where they can appropriately be used. More precisely, contextual operators are

lexical items or grammatical constructions whose semantic value consists, at least in part, of instructions to find in, or impute to, the context a certain kind of information structure and to locate the information presented by the sentence within that information structure in a specified way. (Kay 1989: 181)

Naturally, a contextual operator will not be acceptable if it occurs in a context where the information structures it requires can neither be found nor constructed.1

Not all scalar operators are polarity items, nor are all contextual operators scalar in nature. Contextual operators in general are forms which situate the expressed meaning of a sentence within some larger conceptual structure which must be pragmatically available in (i.e. which can be found in or imputed to) the context. Such conceptual structures need not be scalar. Forms like respective, respectively, and vice versa, for example, are non-scalar contextual operators whose basic function is to direct traffic among multiple sets of denotata as they are mapped onto their intended propositional roles (Kay 1989). Other contextual operators include hedges like technically, strictly speaking, loosely speaking, a regular X (Lakoff 1972; Kay 1983), and discourse particles such as as a matter of fact, as it turns out, and of course but.

Scalar operators in particular are forms which must be construed with respect to a scalar model: they presuppose a scalar model available in the context, and they require the information they express to be integrated with that scalar model in a particular way (Fillmore, Kay & O’Connor 1988; Kay 1990, 1997).

The focus particle even exemplifies one of the most common types of scalar operator, the scalar focus particle. There have been many proposals concerning the peculiar contribution words like even make to a sentence (Horn 1969; Fauconnier 1976; Kay 1990; König 1991; Francescotti 1995; among others), but there is a broad agreement, at least to a first approximation, that a sentence containing even will express a proposition which is somehow less expected or more informative than some other contextually supplied proposition.
So in a sentence like Even Ezra failed the exam, the particle even does not affect the truth conditions of the expressed proposition but rather introduces a presupposition that the expression of this proposition is somehow very surprising. More precisely, even presupposes that the element in its focus, in this case Ezra, represents the least, or one of the least, likely values one would expect to satisfy the proposition over which even has scope – in this case, x failed the exam. In effect, even here presupposes a scale in which individuals are ranked in terms of their propensity to failure, and it presupposes that the focus element, Ezra, occupies a position at or near the bottom of this scale.

I suggest that polarity items as a class are like even in that they impose a scalar construal on an expressed proposition. As Heim (1984) noted, many polarity items – and especially, the minimizers – seem to incorporate the semantics of even itself, requiring that the expressed content of a proposition be construed as the least likely in a scale of alternative propositions. But as Heim also noted, not all polarity items can be so analyzed: in particular, indefinite NPIs like any and ever differ from their minimizer cousins in several critical respects (Lee & Horn 1994; Rullmann 1996; §7.4 below). And polarity items like much and all that (as in she doesn’t cheat much or he rarely gets all that drunk) differ from both indefinite and minimizer NPIs in their scalar effects (Linebarger 1980: 236), making for weakly informative or understated propositions instead of strong, emphatic propositions. But while polarity items are not entirely uniform in their scalar effects, they are united at least in that they all have scalar effects. I therefore propose that all polarity items conventionally encode two sorts of semantic properties inherent in the construction of a scalar model.

4.2 Two scalar properties

Every proposition within a scalar model is distinguished by two basic properties. Quantitative value (q-value) refers to a proposition’s position within a scalar model: the higher a proposition is along a scale, the higher its quantitative value. Informative value (i-value) refers to a proposition’s relative informativity within a model: the more entailments a proposition has within a model, the higher its i-value. Q-value and i-value are essentially properties of propositions within a scalar model, but they also find work as lexical features. Many lexical and grammatical constructions are conventionally specified as encoding either a particular q-value or a particular i-value or both. Such forms are, by definition, scalar operators.

Since scalar models are built on conceptual scales, the idea of q-value seems relatively straightforward. For many expressions, and for most polarity items, q-value is a salient, and even transparent, feature of the construction’s semantic content. Quantifiers and degree modifiers, for example, typically designate an abstract scalar extent or degree, often without reference to any particular dimension. Thus, a PPI like utterly (as in she was utterly amazed) signals that the predicate it modifies holds to a high degree, while the NPI the least bit (as in she wasn’t the least bit impressed) indicates a minimal degree. The precise position these forms designate within a scalar ordering is vague and may vary with context, but their fundamentally quantificational nature is hardly open to doubt.

In fact, quantitative value is somewhat less straightforward than it might seem at first. The temptation, naturally, is to think of a q-value as a kind of fixed, objective quantity or amount. But problems arise when one considers the analysis of gradable antonyms like fast/slow, easy/difficult, and clever/dull. There are good reasons to assume that the two terms in each of these pairs do not simply pick out different regions in a single ordering but actually define two distinct scales with inverse orderings of identical elements. One reason is that degree adverbs like very and extremely, forms which themselves seem to denote high q-values, apply equally well to both members of each pair. Presumably a phrase like very fast applies to elements “high” in speed, while very slow applies to elements “high” in slowness. But then the very same objective entity will have a high q-value with respect to one conceptual scale and a low q-value with respect to another. This seems reasonable enough – what it means is just that quantitative value is not itself an inherent property of things in the world but is always defined relative to some scale. It is, in effect, a matter of construal.

The crucial question is, how can one determine which scales are operating in any given scalar model? Consider again the model of puzzles and puzzlers discussed above (§3.2.3). Here the argument space consists of two conceptual scales corresponding to each of the two participants in the propositional function ‘x can solve y’.
In theory, both of these scales could be ordered in either of two possible ways: the puzzlers on the x-axis can be arranged either from the least to the most clever or from the least to the most dim; and the puzzles on the y-axis can be ordered either from the least to the most difficult or from the least to most easy. The choice, however, is not arbitrary. In order for the model to support the right pragmatic inferences, the two scales need to be correctly coordinated. In effect this means that elements for each dimension are ordered in terms of their potential to satisfy the propositional schema which defines the model. Puzzles are ordered from the least to the most difficult, because if someone can solve a difficult puzzle, then presumably she can also solve any easier puzzle. Similarly, puzzlers are ordered from the least to the most dim, since it is the dim puzzlers who are least likely to solve any puzzles.

There is something counterintuitive about this. It would seem more natural to order the puzzlers in terms of their cleverness, and to think of clever puzzlers as having more of something which less clever puzzlers lack. This is an important intuition, and it appears to be the normal way people have of thinking about scalar phenomena: in general, given any two inverse orderings for a given domain (i.e. fast vs. slow, sharp vs. dull, bright vs. dim, etc.) one ordering tends to be the default. Thus, we can ask how fast something is without assuming that it really is fast, but if we ask how slow it is, we must have some idea that it really isn’t fast. But scalar models do not necessarily use such unmarked orderings. In general, the ordering of elements on any conceptual scale within a scalar model depends on the role that particular scale plays within a larger proposition. Elements are ordered not in terms of their inherent amounts, nor even in terms of default assumptions about normal orderings, but rather in terms of their significance within a scalar model. As we will see below (§4.5), this fact has important consequences for the ways polarity items are lexicalized, and more generally, for the types of scalar reasoning which underlie polarity sensitivity.

For the moment, we can think of quantitative value simply in terms of an element’s position within a scalar ordering. For a form to encode a q-value, it simply has to designate some relative or absolute position within such an ordering. In principle, this allows for an infinite number of distinct q-values, but languages are rather stingy about lexicalizing such distinctions. Among degree adverbs, the locus classicus for scalar distinctions, we tend to find no more than eight basic degrees which are lexically encoded, ranging from the absolute to the absolutely negative (Bolinger 1972; Hübler 1983; Paradis 1997; Klein 1998). For the purpose of understanding polarity sensitivity, I suggest we need only recognize two: high q-value and low q-value, both of which are defined relative to some contextual norm associated with a scale.
In general, a coded quantitative value will not denote a precise or objectively fixed position on a scale; more often, q-values consist in a contextually determined range of scalar values. What counts as high or low on a scale depends on background assumptions and implicit norms: what’s big for a mouse tends to be small for a house. In context the construal of a conceptual scale, and hence the use of any scalar predicate, always evokes some scalar norm as an implicit standard of comparison (Sapir 1944). The point is trivial in the case of gradable predicates like tall, fast, beautiful, and intelligent. For something to count as tall, it must exceed some normal expectation about height: it must be construed with respect to a conceptual scale ordering elements in terms of their height and it must be judged as exceeding some scalar norm associated with that scale. The particular value of a scalar norm varies with the expectations and assumptions of speech act participants, but in general it simply reflects a default understanding of the entity under discussion. The scalar norm allows us to view the gradient notion of q-value as a binary opposition: propositions above the scalar norm associated with a conceptual scale have high q-values; propositions below the scalar norm have low q-values.

The need for a scalar norm is also apparent in the case of informative value. I-value depends on an expressed proposition’s inferential relation to other propositions in a model. The question is, how is this relation determined and with respect to which other propositions? If scalar norms constitute an essential, if unspoken, aspect of any scalar model, then i-value can be understood directly in terms of an expressed proposition’s inferential relation to the norm. The norm effectively represents an expectation about what proposition within a model would, in some default context, be most likely to hold. What it means for an assertion to be construed with respect to a scalar model is that it is implicitly contrasted with some alternative default proposition. Kay (1990) thus distinguishes between the expressed proposition overtly encoded by a sentence – the text proposition – and a presupposed proposition in a scalar model with respect to which the text proposition is evaluated – the context proposition. In what follows, I make a parallel distinction between the manifest content expressed by a sentence in context, which I call simply the expressed proposition, and the scalar norm, understood as a proposition within a scalar model with respect to which an expressed proposition may be understood as implicitly contrasting. In general, if an expressed proposition entails the scalar norm, then it is more informative than one might have expected and so has a high i-value; if the expressed proposition is (or would be) itself entailed by the scalar norm, then it is less informative than one might have expected and so has a low i-value.
By defining i-value relative to an implicit scalar norm, we again reduce a gradient phenomenon to a binary opposition: propositions entailing the scalar norm have a high i-value; propositions entailed by the norm have a low i-value. This should be intuitive. In general, if a proposition entails the norm, its assertion is informative because it exceeds what one would normally expect to be asserted. I call such relatively informative propositions emphatic. On the other hand, if an expressed proposition is itself entailed by the norm, then its assertion is uninformative, or at least under-informative, because it fails to say whether the default expectation of the norm is met as well. Such under-informative propositions I call attenuating.

Given this basic distinction between q-value and i-value, we can now consider how these features are encoded in polarity items and how together they can create polarity sensitivities. These features are special because the content they contribute to an expressed proposition is not in fact an inherent property of that proposition but rather depends on its position within a structured set of alternatives: q-value determines an expressed proposition’s position within a scalar model; i-value determines an expressed proposition’s inferential value with respect to other propositions in the model. Effectively, what it means for a lexical form to encode one of these properties is that the proposition to which it contributes must be construed relative to a scalar model: i-value and q-value do not simply add information to a proposition; rather, they situate a proposition within a sort of informational matrix. In this sense, both of these features have more to do with construal than they do with the objective content of an expressed proposition.

In general, if a form conventionally encodes either an i-value or a q-value, it counts as a scalar operator and must be interpreted relative to a scalar model. But if a form encodes both an i-value and a q-value, it will also be a polarity item: the combination of a fixed scalar location (q-value) with a fixed inferential relation to the scalar norm (i-value) constrains a form to occurring in just those contexts where the direction of scalar inferencing is compatible with both of its scalar values. Thus, the minimizer NPIs discussed above combine a low (in fact, minimal) q-value with a high (or emphatic) i-value, and this combination effectively makes them polarity sensitive, limiting their distribution to the scale-reversing contexts in which their low q-values can support their emphatic i-values.
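
The way a fixed q-value and a fixed i-value jointly restrict a form's distribution can be sketched as a small compatibility check. The Python below is a hedged illustration, not the book's own formalism: the names derived_i_value, ScalarItem, and is_licensed, and the string labels, are all assumptions made here. It also rests on one reading of the scalar model, namely that in a scale-preserving context inferences run from higher to lower values (so a proposition above the norm entails it), while a scale-reversing context runs them from lower to higher.

```python
from dataclasses import dataclass

# Illustrative labels only; they are not notation from the text.
LOW, HIGH = "low", "high"
PRESERVING, REVERSING = "scale-preserving", "scale-reversing"


def derived_i_value(q_value: str, context: str) -> str:
    """I-value that a proposition with this q-value receives in this context.

    Scale preserving: a high q-value entails the norm (high i-value) and a
    low q-value is entailed by it (low i-value). Scale reversing flips this.
    """
    if q_value not in (LOW, HIGH) or context not in (PRESERVING, REVERSING):
        raise ValueError("unknown q-value or context label")
    if context == PRESERVING:
        return q_value
    return HIGH if q_value == LOW else LOW


@dataclass(frozen=True)
class ScalarItem:
    form: str
    q_value: str  # encoded position relative to the scalar norm
    i_value: str  # encoded informativity: HIGH = emphatic, LOW = attenuating


def is_licensed(item: ScalarItem, context: str) -> bool:
    """A form fits a context when the derived i-value matches its encoded one."""
    return derived_i_value(item.q_value, context) == item.i_value


# A minimizer: minimal q-value, emphatic i-value.
wink = ScalarItem("sleep a wink", LOW, HIGH)
print(is_licensed(wink, REVERSING))   # True
print(is_licensed(wink, PRESERVING))  # False
```

The check treats licensing as an all-or-nothing match within a single context; it makes no attempt to model the local versus global domains of licensing discussed in §3.5.3.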

4.3

Four sorts of polarity items

Since q-value and i-value are both, effectively, binary features, their potential combination yields four theoretically possible classes of scalar operators. The minimizers offer a clear example of one of these basic types, combining low q-values with high i-values. NPIs like much and all that illustrate a second group, in which high q-values combine with low i-values. As NPIs, these forms show roughly the same distributional constraints as the minimizers, though their pragmatic purpose in life is quite different, being used not to strengthen but rather to mitigate the force of a negative utterance. The contrast in (1) between the NPIs much and a wink illustrates the difference.

(1) a. Margo did *(not) sleep a wink before her big test.
    b. Margo did *(not) sleep much before her big test.

Intuitively, (1a) makes a strong claim by denying that Margo slept even the smallest amount, while (1b) makes a weak claim by denying only that Margo slept for a long time. In (1a), a wink expresses a minimal q-value and produces an emphatic sentence; in (1b), much marks a relatively high q-value and produces an understatement.

Similar examples abound. As noted above (§2.2), one of the most common sorts of NPI is the minimizer – an expression which denotes a minimal quantity or a scalar endpoint and which serves a stereotypically emphatic function (Borkin 1971; Schmerling 1971; Fauconnier 1975a; Horn 1989). Examples in English include drink a drop, (spend) a red cent, budge an inch, lift a finger, and have a snowball’s chance in hell, and similar examples are found in many (and perhaps all) other languages. Another common class of emphatic NPIs includes degree adverbs like English at all, in the slightest, and the least bit, and cross-linguistic counterparts like French le moindre ‘the slightest,’ Hindi zaraa-(bhii) ‘(even) a little,’ and kataii-(bhii) ‘at all’ (Vasishth 1998), and Japanese ikkoo-ni and kaimoku, both meaning roughly ‘at all’ (McGloin 1972). Other emphatic NPIs include scalar conjunctions like let alone and much less, modal constructions like can possibly, and a variety of verbs and verbal idioms such as budge, can stand, can stomach, can fathom, and would dream of. Also in this class are the classic indefinite polarity items any and ever, which in most, though not all (pace Kadmon & Landman 1993), of their uses are clearly emphatic (Heim 1984; Krifka 1995; Rullmann 1996; Israel 1995a, 1998a; §7 below).

Attenuating NPIs patterning like the construction with much in (1b) have attracted less attention than their emphatic counterparts, but they are very common both in English and cross-linguistically. Other obvious English items include the temporal adverbial long (e.g. he won’t last long); the degree adverb all that (e.g. he’s not all that clever); and certain uses of many, which in informal usage tends to be replaced by a lot of in positive contexts.
Similar NPIs from other languages include French grand chose ‘a whole lot,’ grand monde ‘many people,’ grand choix ‘much choice,’ pour autant ‘for all that,’ and de sitôt ‘so very soon’ (Gaatone 1971; Bouvier 2002: 229); German sonderlich ‘particularly’; and Dutch bijster ‘very,’ pluis ‘plush,’ or ‘easy,’ and mals ‘tender, gentle’ (van der Wouden 1997; Klein 1998); Japanese sonna-ni ‘that much,’ anmari ‘too very,’ rokuni ‘much,’ and betu-ni ‘particularly’ (McGloin 1972: 82); and Persian cœndan ‘much’ and un-qœdrha ‘that much’ (Raghibdoust 1994).

Appropriately enough, everything is backwards when polarity is reversed: the neat division of NPIs into low-scalar emphatics and high-scalar attenuators is mirrored by a division of PPIs into high-scalar emphatics and low-scalar attenuators. The abundant emphatic PPIs include quantificational idioms like heaps of, scads of, and the whole shebang, and degree modifiers like horribly, utterly, and amazingly. These forms encode high-scalar q-values in bold, expressive, high i-value assertions, and their use tends to signal a high degree of speaker confidence in the content of an expressed proposition. Attenuating PPIs, the last of the four types, also include a wide variety of quantifiers (some, several, few, scant), quantificational idioms (a dab, a tad, a trifle, a soupçon), and degree modifiers (pretty, fairly, kinda). These forms encode (relatively) low-scalar q-values in hedged, low i-value assertions: their use tends to signal either a certain tentativeness, or, at least, a desire not to insist too strongly on one’s point.

Consider the contrast between the low-scalar PPI a little bit, and the high-scalar scads. (The status of these expressions as PPIs is demonstrated by their unacceptability in the polarity context formed by rarely.)

(2) a. Belinda (*rarely) won scads of money at the races.
    b. Belinda (*rarely) won a little bit of money at the races.

Again, the difference is intuitively straightforward: (2a) makes an emphatic assertion that Belinda won a very large quantity of money, while (2b) modestly asserts the winning of only a small quantity. Once again, there is a correlation between a polarity item’s informative and quantitative values, only here the correlation is the mirror image of that found with the NPIs in (1): scads designates a high quantity and produces an emphatic sentence; a little bit designates a small quantity and produces an understatement. Similar examples of both low-scalar attenuating and high-scalar emphatic PPIs are readily multiplied. Indeed, if anything, it seems that both sorts of PPIs may be far more abundant than NPIs. High-scalar PPIs, among them what Hinds (1974) calls “doubleplusgood polarity items,” include comparative and superlative expressions such as far Xer, way Xer, and by far the Xest; intensifiers such as utterly, damnably, intensely, and as hell; quantifying NPs such as heaps, mountains, and tons; universalizing idioms like all the time in the world, all smiles, every jot and tittle, and the whole kit and caboodle; and a large class of slangy and unstable evaluative adjectives such as (in some registers of my own idiolect) bitchin, awesome, radical, gnarly, and way cool. There is in fact an overwhelming cross-linguistic tendency for degree words encoding an extremely high quantitative value to function as PPIs. McGloin (1972: 82) cites Japanese PPIs with meanings like ‘everything’ (nan-demo), ‘extremely’ (zuibun, hidoku), ‘very’ (taihen, totemo), ‘considerably’ (hizyoo-ni), ‘quite’ (kanari, naka-naka), and ‘all the more’ (issoo). In her comprehensive study of Dutch degree adverbs, Klein (1998: 208–9) lists just over a hundred Dutch forms expressing a very high degree, among which she identifies eighty-six PPIs (among others, enorm, flagrant, idioot, kolossaal, and vervloekt). 
And indeed, van der Wouden ventures that in any language most, if not all, “inherently intensified” lexical items are PPIs (1997: 80).

In addition to the quantificational and degree modifier constructions noted above, the class of low-scalar PPIs in English includes frequency adverbs like occasionally and at times; modal auxiliary and catenative verb constructions like would rather, could well, might as well, and might consider; and a good many verbal idioms like have a go at, give X a shot, take a stab at, do pro’s bit for, get by, make do, take a dim view of, and put in a word for. Examples from other languages include French un peu ‘a little’ and plutôt ‘rather’; German etwas ‘somewhat’ and ziemlich ‘rather’; Dutch een beetje ‘a little bit’ and nogal ‘rather’ (Klein 1998); Persian forms like qædri ‘a bit,’ kæm kæm ‘little by little,’ and the idiomatic VP ye qolop xordæn ‘to drink a gulp’ (Raghibdoust 1994); and Japanese forms like dare-ka ‘somebody,’ sukosi ‘a bit,’ ikubun ‘to some extent,’ and tasyoo ‘somewhat’ (McGloin 1972: 81–2).

Having distinguished these four classes of polarity items, it is important to acknowledge that they do not form neat, homogeneous groups. The features which define them, q-value and i-value, are schematic properties and allow for a wide range of variation in the ways they apply to lexical items. The crude distinction between low and high q-value, for example, flattens fine-grained distinctions between scalar degrees that could be made in the analysis of degree modifiers. Klein (1998) thus distinguishes between expressions marking absolute, extremely high, high, moderate, and minimal degrees. There are, of course, other ways to divide up the scale, and other distinctions to be made among degree modifiers (Bolinger 1972; Hübler 1983; Paradis 1997, 2001), but these variations appear not to be relevant to the narrow question of what it is that causes polarity sensitivity. One variation which deserves special mention, however, involves the distinction between simple attenuation and actual understatement (Israel 2006; see also Margerie 2007).
Some of the forms identified here as attenuating PPIs, for example, sometimes seem to work more like intensifiers. This is particularly true of degree words like rather and pretty and their cross-linguistic counterparts. The basic problem is inherent in the nature of attenuation. An attenuated proposition is one which says less than could have been said – which is less informative than some other proposition that could have been, but was not, expressed. As it turns out, there are two very different sorts of circumstances in which such uninformative propositions get expressed. In cases of true attenuation, one says little because that is all one wants to say; but in other cases, attenuation shades into understatement, where one says little but means much more. The distinction is evident in the different possible uses of a litotic expression like not bad (Horn 1991). One might say of a party, for example, that it was not bad. As a simple attenuation, the expression indicates merely that the party could not be characterized as unpleasant. The assertion is attenuating because it says less than one might want to know – it does not say whether the party was actually any good. But as an understatement, the same expression may convey that the party was not only ‘not bad,’ but indeed extraordinarily good. Saying little always raises the question of what is left unsaid, and in some cases, saying less may be a way of meaning more. This is the essence of understatement (i.e. minus dicimus, plus significamus).

Certain attenuating forms like rather and pretty seem to be more or less conventionalized expressions of understatement as opposed to mere attenuators. But understatements are not emphatic assertions, and understating attenuators like rather and pretty are not the same as emphatics or intensifiers. In particular, a form like rather displays a certain vagueness, which makes it weaker than a true intensifier like very. To say that someone is rather beautiful is, in effect, a nuanced form of praise, and may well be perceived as less generous than the unqualified compliment without rather. Where rather does express a genuinely high-scalar value, it does so with some delicacy, as if the speaker were reluctant to express the full force of her opinion. Thus, in the examples below, the understatements with rather allow for some latitude as to the precise degree which is meant. This contrasts with the less nuanced expression of forms like very, quite, and awfully, which unambiguously commit the speaker to a high, or very high, degree.

(3)
a. I was rather disappointed by your behavior last night.
b. Your performance at the party was rather impressive.
c. She is, indeed, a rather remarkable young woman.

Considerations of this sort suggest that while rather can be used to express a high degree, such uses are consistent with its basic value as a low-scalar attenuator. Similar points apply to pretty, as in a pretty good dissertation, which, although it may express a relatively high-scalar degree, also allows a certain equivocation absent from a true intensifier like very.2 It is in the nature of attenuation that more may be conveyed than is actually said. Forms which are by their very nature uninformative (that is, which encode a low i-value) naturally tend to be interpreted as coy or oblique ways of expressing something more. In the case of rather and pretty this coy effect of understatement has become so conventional that the forms function almost like intensifiers themselves. With other forms, like sort of, kind of, and fairly, the obliqueness, if present at all, remains more obvious: to call a thesis fairly brilliant could (depending on the speaker) express the highest praise, but what is actually said is still much more attenuated. In any case, not all attenuating PPIs allow such interpretations: expressions like somewhat, moderately, and slightly do not so easily admit of obliquely emphatic, “understating,” readings.

The lexicalization pattern of PPIs mirrors that of NPIs. While low-scalar attenuators are PPIs, low-scalar emphatics are NPIs, and conversely, while high-scalar attenuators are NPIs, high-scalar emphatics are PPIs. This situation is depicted schematically in Figure 4.1, which presents the four sorts of polarity items arranged in terms of their quantitative and informative values. As depicted here, emphatic – heaping- and damn-type – polarity items tend to hug the scalar extremes, while attenuating – somewhat- and much-type – polarity items hover more around the middle. But the difference between these classes has less to do with their precise positions on a scale than with how those positions are construed. In ordinary use, emphatic items express a value which is felt as somehow more than some alternative, while attenuating items express a value felt to be somehow less.

Figure 4.1 Four sorts of polarity items
High q-value: emphatic PPIs (a heap, a ton, utterly, the whole shebang); attenuating NPIs (a whole hell of a lot, much, all that much, any too)
Low q-value: attenuating PPIs (a little bit, sort of, rather, somewhat); emphatic NPIs (a damn thing, an inch, at all, the least bit)

The taxonomy here was first proposed in Israel (1996). Each of the four classes of polarity items had by then already been individually recognized in the literature, though they had not yet been considered as parts of a larger whole. And for the most part, they still aren’t. Instead, one class, the low-scalar emphatic NPIs, overshadows the others in most theoretical attempts to explain the causes of polarity sensitivity. As Ladusaw (1996: 336) notes:

It is a theme running through the history of the investigation of this topic that negative polarity items strengthen negative statements, that they are useable precisely where they make strong statements, and hence when the polarity items are not licensed, the sentence makes such a weak statement that it is in effect unuseable.

The conventional wisdom remains that polarity items are inherently emphatic, and that the constraints on negative polarity items (NPIs) reflect a need to be maximally informative. In their influential analysis of any, Kadmon and Landman suggest that it is “a very prominent characteristic of any as well as other NPIs that they make the statement they are in stronger” (1993: 369). The implication seems to be that it is a prominent characteristic of all other NPIs. The same intuition has motivated a succession of otherwise quite different analyses (inter alia, Krifka 1992, 1994, 1995; Jackson 1994; Lahiri 1998; van Rooy 2003; Zepter 2003; Chierchia 2004), which take some notion of informative strength or strengthening as the essence of polarity sensitivity. As Zepter puts it, “The actual licensing condition of NPIs is the requirement to be contained in a particularly strong statement” (2003: 235). This view seems unequivocally to predict that all polarity items should occur only where they are especially informative and that no polarity items could be conventionally attenuating or understating. Thus, while Krifka (1992) recognizes a connection between high-scalar PPIs and low-scalar NPIs suggesting that both are somehow inherently emphatic, he argues that some cannot be a true PPI because it is not obligatorily construed against a set of contrasting alternatives (1995: 241). But of course there are many polarity items, both NPIs and PPIs, which, like some and all that, appear only where they are less than fully informative. Many early works on polarity (Klima 1964; Baker 1970; McGloin 1972) discuss both NPIs and PPIs which are clearly attenuating as opposed to emphatic, and these forms have featured prominently in work by Horn (1989), von Bergen and von Bergen (1993), and van der Wouden (1997). 
Similarly, the formation of understatements via the denial of high-scalar expressions has been widely discussed in the pragmatics literature (Spitzbardt 1963; Bolinger 1972; Hübler 1983; Horn 1989, 1991; Margerie 2008), but rarely with reference to the phenomenon of sensitivity. Linebarger was perhaps the first to recognize “scalar endpoint” NPIs and “understater” NPIs as distinct classes (1980: 236–7), but even she denied any natural connection between them, claiming that each has its own distinct pragmatic motivation (1980: 248). While this is certainly true, it effectively ignores the shared foundation for both types of NPI in the cognitive bedrock of scalar reasoning.

The proposed taxonomy is thus neither daring nor entirely original, but it does unite a set of facts which clearly belong together. Each of the four pieces has already, in one way or another, been independently identified and discussed in the literature; but this just makes it all the more remarkable that they have so rarely been put together, for together they provide new insight into the mystery of polarity sensitivity.

4.4 Sensitivity and the square of opposition

Interestingly, the proposed taxonomy for polarity items maps rather neatly onto the classical square of opposition. The square, which goes back, at least implicitly, to Aristotle, charts a four-way contrast between elements defined in terms of quantity (universal vs. particular) and quality (affirmation vs. negation). The corners of the square are related to each other in terms of contradictory, contrary, and subcontrary opposition. The four corners are labeled with the vowels of the Latin verbs affirmo and nego: the A and I corners represent universal and particular affirmation, respectively; the E and O corners represent universal and particular negation, respectively. These are illustrated with the sentences in (4):

(4)
A: All men are foolish.
I: Some men are foolish.
E: No man is foolish. (≈All men are not-foolish.)
O: Not all men are foolish. (≈Some men are not-foolish.)

By minimally redefining the quantity axis in terms of informative value and the quality axis in terms of polarity sensitivity, we can superimpose the taxonomy of polarity items directly onto the square, as illustrated in Figure 4.2 and by the sentences in (5).

(5)
A: Stella is awfully clever.
I: Stella is sort of clever.
E: Stella is not at all clever.
O: Stella is not all that clever.
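
The oppositions among the sentences in (5) can also be checked mechanically. The following Python sketch is only an illustration: it assumes that cleverness can be modeled as an integer degree from 0 to 10, and the thresholds LOW and HIGH, like all the function names, are hypothetical choices rather than part of the analysis. Sort of clever is given its lower-bounded reading (‘at least sort of clever’).

```python
# Toy model (an assumption for illustration): cleverness is a degree from 0 to 10.
DEGREES = range(0, 11)
LOW, HIGH = 3, 8  # hypothetical thresholds for 'sort of' and 'awfully'


def a(d):
    """A: Stella is awfully clever."""
    return d >= HIGH


def i(d):
    """I: Stella is (at least) sort of clever."""
    return d >= LOW


def e(d):
    """E: Stella is not at all clever."""
    return d < LOW


def o(d):
    """O: Stella is not all that clever."""
    return d < HIGH


def contraries(p, q):
    """Never both true, but both false for at least one degree."""
    return (not any(p(d) and q(d) for d in DEGREES)
            and any(not p(d) and not q(d) for d in DEGREES))


def subcontraries(p, q):
    """Never both false, but both true for at least one degree."""
    return (not any(not p(d) and not q(d) for d in DEGREES)
            and any(p(d) and q(d) for d in DEGREES))


def contradictories(p, q):
    """Exactly one of the two is true at every degree."""
    return all(p(d) != q(d) for d in DEGREES)
```

On this toy model, contraries(a, e), subcontraries(i, o), contradictories(a, o), and contradictories(i, e) all hold, and any degree from LOW up to (but not including) HIGH makes both I and O true and both A and E false.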

A moment’s introspection confirms that the oppositions between these sentences conform to those required by the square. A and E are contraries since they cannot both be simultaneously true, though they can both be false: if Stella is awfully clever, then it cannot be that Stella is not clever at all, and vice versa; although she could be just moderately clever, in which case both A and E would be false. I and O are subcontraries since they cannot both be false, though they can both be true: if it’s not true that Stella is at least sort of clever, then she must not be all that clever either; though again, if Stella is just moderately clever, it will be true both that she is sort of clever and that she is not all that clever. And I and E, and A and O, are contradictories since in all possible worlds, one member of each pair must be true and the other false: thus, if Stella is in fact awfully clever, it cannot be that she is not all that clever, and if it is not the case that she is awfully clever, then it must be true that she is not all that clever.

Figure 4.2 Polarity items and the square of opposition
A (emphatic PPIs): awfully; E (emphatic NPIs): (not) at all; I (attenuating PPIs): sorta; O (attenuating NPIs): (not) all that. A and E are contraries; I and O are subcontraries; A and O, and I and E, are contradictories.

The relationship between the four sorts of polarity items and the classical square of opposition is intriguing, for it suggests an important parallel between polarity items and other sorts of quantificational elements. Klein (1998: 115–26) pursues and refines these parallels in her study of degree adverbs in Dutch. As she notes, the real parallel here likely holds among degree adverbs in general, whether polarity sensitive or not, and among other sorts of quantifying expressions. Either way the evidence does suggest one general pattern to the ways in which polarity items may be lexicalized, and this general pattern seems to conform in broad outline to the patterns which hold among quantifying expressions generally.

4.5 The conspiracy theory of polarity licensing

The analysis of polarity items as scalar operators helps explain both what it is that makes polarity items so sensitive and what it is that polarity items are sensitive to. Polarity items are sensitive to scalar inferencing, and they are sensitive because of the interaction of their scalar semantic properties. In forms which are conventionally specified for both q-value and i-value, the two properties will conspire to create polarity sensitivity. The NPI sleep a wink in (6) provides a simple illustration.

(6)
a. Marianne didn’t sleep a wink that night.
b. *Marianne slept a wink that night.

Here, the emphatic NPI marks a minimal scalar value in an expressed text proposition and requires that the expressed proposition be construed as more informative than what would be expressed by a proposition based on the scalar norm. In (6a), since the expressed proposition, ‘M didn’t sleep the smallest amount,’ entails the norm, ‘M didn’t sleep a normal amount,’ the requirement is met: the NPI counts as emphatic and the sentence is well formed. In (6b), however, the NPI cannot properly express its emphatic i-value: here the expressed text proposition, ‘M slept the smallest amount,’ is itself entailed by the scalar norm, ‘M slept a normal amount.’ This produces a relatively weak assertion, which clashes with the emphatic nature of the NPI. Similar considerations apply to attenuating PPIs like some and a smidge.

(7)
a. Brandon had a smidge of jelly on his collar.
b. *Brandon didn’t have a smidge of jelly on his collar.
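
The reasoning applied to (6), which carries over to the attenuating PPI in (7), can be sketched in a few lines of Python. This is a minimal sketch under simplifying assumptions, not an implementation of the Scalar Model: q-values are reduced to a binary low/high contrast relative to the norm, contexts are either scale preserving or scale reversing, and the names QValue, IValue, Context, entails_norm, licensed, and sensitivity are all hypothetical.

```python
from enum import Enum


class QValue(Enum):
    LOW = "low"
    HIGH = "high"


class IValue(Enum):
    EMPHATIC = "emphatic"
    ATTENUATING = "attenuating"


class Context(Enum):
    PRESERVING = "scale-preserving"  # e.g. a simple affirmative clause
    REVERSING = "scale-reversing"    # e.g. the scope of negation


def entails_norm(q, context):
    """True if a proposition with this q-value entails the scalar norm.

    In scale-preserving contexts inferences run from high to low values,
    so only a high value entails the norm; scale reversal flips this.
    """
    if context is Context.PRESERVING:
        return q is QValue.HIGH
    return q is QValue.LOW


def licensed(q, i, context):
    """An emphatic item must entail the norm; an attenuating item must not.

    Simplification (an assumption): with only two q-values, 'does not
    entail the norm' stands in for 'is entailed by the norm'.
    """
    stronger = entails_norm(q, context)
    return stronger if i is IValue.EMPHATIC else not stronger


def sensitivity(q, i):
    """Classify an item by the kind of context in which it is licensed."""
    if licensed(q, i, Context.REVERSING):
        return "NPI"
    return "PPI"
```

Under these assumptions, sleep a wink (low q-value, emphatic) comes out as an NPI and a smidge (low q-value, attenuating) as a PPI, while the corresponding high-scalar combinations yield the opposite sensitivities.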

As a scalar operator, a smidge denotes a quantity which contrasts with the set of alternative scalar values underlying a scalar model. The expressed proposition in (7a), ‘that B had a slight amount of jelly on his collar,’ is construed relative to an implicit scalar norm in the evoked model, something like ‘B had a moderate amount of jelly on his collar.’ (Note that the scalar norm does not require any particular expectation that B should have jelly on him; rather, given that he is so besmirched, the scalar norm just reflects what might be expected to constitute a normal degree of besmirchment in such circumstances.) Since having a moderate amount of jelly on one’s collar entails having a slight amount there too, the implicit norm is more informative than the expressed proposition. A smidge happily expresses its low i-value, and so the sentence is attenuating and grammatical. In (7b), however, the implication reversal triggered by negation makes the diminutive a smidge unacceptably emphatic. Here the expressed proposition, that ‘B didn’t have a slight amount …,’ entails the implicit norm, that ‘B didn’t have a moderate amount …’ The emphatic effect is at odds with the attenuating i-value of the PPI, and the sentence is consequently odd, at best.

As these examples suggest, the particular combinations of q-value and i-value in PPIs are such that they are only compatible with contexts where inferences run from high to low q-values, that is, in scale-preserving contexts; contrariwise, the particular combinations in NPIs are such that they are compatible only with scale-reversing contexts, where inferences run from low to high q-values. This, in essence, is why polarity items are sensitive to polarity. Polarity in general is a matter of scalar inferencing and polarity items are just scalar operators: the proper expression of their lexical semantics depends on the availability of a properly constructed scalar model.

Thus far, then, we have addressed the sensitivity problem by defining polarity items as a special class of scalar operators which encode both a proposition’s position within a scalar model and a proposition’s rhetorical informativity. Given this, and given an understanding of informative value in terms of scalar inferencing, the licensing problem effectively solves itself. Polarity items require a scalar model which can support the expression of their conventional scalar semantics, and so affectivity turns out to be nothing more than the property of being construed with respect to an appropriately structured scalar model: [+Affective] contexts are scale reversing; [-Affective] environments are scale preserving.

4.6 The anomaly of inverted polarity items

As outlined above, the “conspiracy theory” of polarity licensing predicts that there should be four and only four types of polarity items. Although one never knows in advance whether a given form will encode the relevant scalar features – the association of a semantic feature with a lexical form is always essentially arbitrary – one can at least expect that these features should not interact randomly. Rather, the precise sensitivities of any given polarity item should be a direct function of the scalar features it encodes. Basically, this means that certain sorts of polarity item should not exist. For example, one should never find an NPI combining a high-scalar value with an emphatic rhetorical force – such a combination should only yield PPIs. Similarly, there should be no PPIs combining a low-scalar value with an emphatic informative value – such a combination should always create an NPI.

At this point, it seems, we have a problem. Both types of putatively impossible polarity item, or things very much like them, not only exist but are in fact rather common. Von Bergen and von Bergen (1993: 155–7), for example, discuss a variety of “maximizing” NPIs – forms which emphatically strengthen negation precisely by virtue of their high quantitative values. Typical instances from English include the idioms in (8).

(8)
a. Wild horses could *(-n’t) keep me away.
b. I would *(-n’t) do it for all the tea in China.
c. I wouldn’t touch it with a ten-foot pole.

Intuitively, it seems that the wild horses in (8a) stands for something like the most irresistible force imaginable. Similarly, in (8b) all the tea in China represents an unusually valuable reward, one high on a scale of monetary worth. And the ten-foot pole in (8c) is an unusually large instrument, one which maximizes the distance between the toucher and the touched.

Such maximizing NPIs are not peculiar to English. Larrivée (1996) notes parallel constructions from French including pour tout l’or du monde ‘for all the gold in the world,’ de mémoire d’homme, roughly ‘in living memory’ (see also Gaatone 1971: 190), and de (toute) sa vie ‘in his (whole) life.’ Similarly, van der Wal (1996) and Hoeksema and Rullmann (2001) note ‘maximum quantity’ NPIs in Dutch such as voor goud ‘for gold’ and in de verste verte ‘in the farthest distance.’

Parallel to these troublesome maximizing NPIs is a set of equally troubling minimizing PPIs like those in (9) – constructions with low-scalar q-values which produce emphatic effects in affirmative contexts.

(9)
a. Godfrey is (*not) scared of his own shadow.
b. She would (*not) betray us at the drop of a hat.
c. You could have knocked me over with a feather.

Clearly, the shadow in (9a) is a minimally frightful sort of thing, the dropped hat in (9b) a stereotypically minimal provocation, and the feather in (9c) a minimally forceful tool for knocking things over, but all three contribute to the expression of maximally emphatic positive propositions. The Scalar Model would seem to make these minimalistic emphatic PPIs just as impossible as the maximalist emphatic NPIs noted above.

While inverted polarity items may be less common than their canonical counterparts, they are not particularly rare either, and in some semantic fields are fairly abundant. Both English and Dutch (Hoeksema p.c.) feature an open class of inverted NPIs denoting large time spans, as in (10). Similarly, many inverted PPIs denote minimal time spans, as in (11).

(10)
a. We have *(-n’t) heard from you in a coon’s age!
b. You’ll *(never) in a million years guess who I saw last night.

(11)
a. We will (*not) be back in a jiffy.
b. In a New York minute, everything can (*not) change.

Comparable examples include PPIs like in a flash, in a second, in a trice, and in a heartbeat, and NPIs like in days, in weeks, in years, in ages, and in a blue moon.

It gets worse. Among polarity items denoting monetary values, there are emphatic constructions at both ends of the same scale: unambiguously emphatic NPIs denoting things of little value (canonical: a red cent, a plugged nickel, a thin dime, a brass farthing) and others referring to things of extreme value (inverted: for all the tea in China, for all the money in the world, for love or money, for the life of me); and both unambiguously emphatic PPIs referring to things of the greatest value (canonical: a king’s ransom, an arm and a leg) and others referring to things of negligible value (inverted: for peanuts, for a song, for a pittance, for next to nothing). This is the pecuniary paradox of polarity sensitivity.

(12)
a. He won’t spend a red cent on your wedding.
b. She wouldn’t kiss him for all the tea in China.

(13)
a. Julio spent a king’s ransom on the party.
b. But he somehow got Madonna to play for peanuts.

Figure 4.3 Canonical and inverted polarity items
Canonical: high q-value, emphatic PPIs (tons of, utterly, insanely, way, a heap); low q-value, emphatic NPIs (a wink, an inch, at all, the least bit)
Inverted: high q-value, emphatic NPIs (wild horses, in ages, all the tea in China); low q-value, emphatic PPIs (the drop of a hat, a jiffy, a pittance)

Apparently, the simple correlation between scalar semantics and polarity sensitivity cannot be as simple as one might have hoped, or as the Scalar Model would seem to predict. Maximizing NPIs and minimizing PPIs invert the normal correlations observed among the more canonical polarity items. As Figure 4.3 suggests, the existence of both canonical and inverted polarity items would seem to preclude the possibility of there being any regular correlation between scalar semantics and polarity sensitivity. On the other hand, all of these apparent counterexamples do share a clearly scalar semantics with their canonical counterparts: their rhetorical effects depend on the scalar values they encode. The distribution of emphatic inverted polarity items obeys the same scalar logic which rules the distribution of canonical emphatic items, and as with the canonical items, this scalar logic is driven by the pragmatics of informativity: such forms are acceptable only where their different q-values support the same sorts of emphatic scalar inferences.

In (12b), for example, all the tea in China denotes a highly valuable reward, and under negation triggers inferences about all less valuable rewards: presumably, if a girl isn’t tempted by all the tea in China, nothing less could tempt her either. Similarly, in (9a), if Godfrey fears such minimally fearsome things as his own shadow, he will presumably be scared of anything more fearsome – that is, in effect, of anything.

But how can these forms both invert the normal scalar semantics of canonical polarity items, and still obey the same scalar logic? The answer can be found in the different kinds of scales associated with different polarity items. As it turns out, there is a consistent correlation between the sorts of syntactic and semantic roles a polarity item plays within a proposition and its status as inverted or canonical. Prototypical minimizers – canonical NPIs like crack a book, hurt a fly, lift a finger, or breathe a word – feature indefinite direct objects which measure out the action of a predicate. Inverted polarity items, on the other hand, tend to have idiomatic NPs anywhere but direct object position. The wild horses idiom, for example, is perhaps the only English NPI with an idiomatic subject NP, while other inverted items feature indefinites governed by prepositions such as for (for love or money, for a song), in (in a flash, in a million years), with (touch with a ten-foot pole, knock down with a feather), and at (at the drop of a hat, at a moment’s notice).

Underlying this superficial syntactic distinction is a deeper semantic generalization concerning the thematic roles typically associated with canonical and inverted polarity items. Canonical polarity items tend to refer to a patient (crack a book, hurt a fly), a theme (lift a finger, move a muscle, bat an eye), or some sort of increment (sleep a wink, drink a drop, budge an inch, breathe a word).
All these forms involve entities which are somehow affected by the action of the verb: they are low in the thematic hierarchy, near the bottom of the action chain (Langacker 1987). Inverted polarity items, however, feature participants at the top of the thematic hierarchy – entities which somehow facilitate the realization of an eventuality. The idiomatic use of wild horses, for example, denotes a stereotypically irresistible force which can affect an agent’s behavior. Thematically, the wild horses idiom fits into a more general class of inverted polarity items which depict a stimulus or causal trigger: for example, at the drop of a hat profiles a small event which provokes a big response, and scared of one’s own shadow profiles a minor threat that triggers major fears. Pushing the generalization a bit, forms like all the tea in China and for a song also profile stimuli of a sort – rewards which might motivate one to act. Finally, polarity items involving reference to an instrument (touch with a ten-foot pole, knock down with a feather) are always inverted: the use of a bigger or more powerful instrument tends to facilitate the performance of an act.

It appears, then, that the division between canonical and inverted polarity items reflects a deeper distinction in the ways scalar reasoning applies to different propositional roles. Certain types of participants function effectively as obstacles to the occurrence of an event; others, on the contrary, act as stimuli. A theme or patient, for example, is an entity which must be affected for an event to take place: the bigger it is, the more resistance it offers, the less likely the event will be. An agent or a stimulus, on the other hand, is an entity which must itself be effective for an event to take place: in this case, the bigger the agent, the more powerful it is, the more likely the event will be.

In this light, the pecuniary paradox noted above simply reflects the fact that valuables in an exchange are split between two very different sorts of participant roles. As a rule, any participant in a commercial event must both give something up and gain something in return: otherwise, the exchange won’t happen. We may thus distinguish between the valuables given and the valuables gained. The logic of self-interest treats these two types of valuable very differently. All things being equal, a rationally self-interested participant will strive to give up the smallest amount necessary, and to gain the greatest amount possible. The logic of commercial exchange thus depends on whether a given valuable is understood as a Resource – what one stands to lose – or a Reward – what one stands to gain. The greater the demands on one’s Resources, the less likely one will be to accept an exchange; conversely, the greater the potential Reward, the more likely one will be to accept.
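
The contrast between obstacles and stimuli, including the Resource/Reward asymmetry, can be sketched in Python as follows. This is a minimal illustration rather than a worked-out analysis: the role labels, the binary low/high q-values, and the function names are all assumptions, and the classification covers emphatic items only, using the sensitivities already observed in (8), (9), (12), and (13).

```python
# Hypothetical grouping of participant roles (the labels are assumptions):
# the bigger a facilitator, the more likely the event;
# the bigger an obstacle, the less likely the event.
FACILITATORS = {"agent", "stimulus", "instrument", "reward"}
OBSTACLES = {"patient", "theme", "resource"}


def favors_event(role, q_value):
    """True if a participant of this size makes the event as likely as possible."""
    if q_value not in ("low", "high"):
        raise ValueError(f"unknown q-value: {q_value}")
    if role in FACILITATORS:
        return q_value == "high"
    if role in OBSTACLES:
        return q_value == "low"
    raise ValueError(f"unknown role: {role}")


def emphatic_sensitivity(role, q_value):
    """Emphatic items that favor the event are NPIs; the others are PPIs."""
    return "NPI" if favors_event(role, q_value) else "PPI"


def scale_type(role):
    """Obstacle roles give canonical scales; facilitator roles give inverted ones."""
    if role in OBSTACLES:
        return "canonical"
    if role in FACILITATORS:
        return "inverted"
    raise ValueError(f"unknown role: {role}")
```

For instance, emphatic_sensitivity("resource", "low") corresponds to a red cent in (12a) and emphatic_sensitivity("reward", "high") to all the tea in China in (12b); both come out as NPIs even though they sit at opposite ends of the monetary scale.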
Given this, canonical polarity items – emphatic PPIs like a king’s ransom and emphatic NPIs like a red cent – can be understood as expressions denoting Resources (things one can own or spend), while inverted polarity items – emphatic PPIs like for a song and emphatic NPIs like for all the tea in China – denote Rewards (things one can gain). Fundamentally, then, canonical and inverted polarity items do obey the same scalar principles. Emphatic NPIs, whether canonical or inverted, always pick out a class of participants – in this case, big rewards and small expenses – which facilitates the realization of an event. Emphatic PPIs, on the other hand, always denote the sorts of participants which militate against the realization of an event – in this case, small rewards and large expenses.

Similar considerations apply to the logic of temporal polarity items like those in (10–11), where emphatic NPIs denote large time spans (in weeks, in years, etc.) and emphatic PPIs denote minimal spans (in a sec, in a jiffy, etc.). Of course, there is nothing about the domain of time itself which makes long time spans emphatic in negative contexts and short ones emphatic in positive contexts. As it turns out, the briefest moments can also be emphatic in negative contexts, as in (14), and the longest periods can be emphatic in affirmative contexts, as in (15).

(14)
a. I {won’t/??will} be half a minute.
b. I can *(not) for a second believe she would do that.

(15)
a. It has always been so, time out of mind.
b. They’ve been married for {ages / an eternity / donkey’s years}.
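
Whether a long time span helps or hinders depends on what the span measures, and the contrast can be illustrated numerically. The Python sketch below rests on a strong simplifying assumption that is not part of the analysis itself: events occur, and situations end, at a constant hypothetical rate (RATE), so that the probabilities follow an exponential model. The function names are likewise invented for illustration.

```python
import math

RATE = 0.1  # hypothetical constant rate per unit of time (an assumption)


def p_event_within(interval):
    """Chance that a punctual event occurs somewhere inside a bounding interval."""
    return 1.0 - math.exp(-RATE * interval)


def p_state_lasts(duration):
    """Chance that a durative situation holds throughout the whole duration."""
    return math.exp(-RATE * duration)
```

On this toy model, p_event_within grows with the interval, so a long bounding interval like in a million years in (10b) is the most favorable case and a short one like in a jiffy in (11a) the least favorable. By contrast, p_state_lasts shrinks with the duration, which reverses the pattern for expressions like for a second in (14b) and for ages in (15b).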

The key to this apparent chaos is, again, the realization that the same type of entity – in this case, a temporal interval – may be associated with very different sorts of roles within a proposition. In the case of time spans, the crucial difference depends on the aspectual character of an expressed proposition. Basically, whether or not a long time span makes a given eventuality more or less likely depends on the durativity of that eventuality. Punctual events culminate in an instant within a temporal interval: the longer the interval, the more likely it is that the event will actually happen. Durative situations, on the other hand, must hold for every instant of a time span: the more time that passes the more likely it is that the situation will no longer obtain. So in (14) and (15), where the temporal expressions indicate how long a situation will or will not last, the expression of emphasis involves a canonical scale: brief durations are emphatic under negation; long durations are emphatic in affirmations. Inverted forms like those in (10) and (11), however, invariably designate the bounded interval within which an event takes place. Their inverted scales – with short intervals emphatic in affirmation – simply reflect the logic of punctuality: the shorter the interval, the less likely it is to include the moment where a punctual event takes place. Again, the apparent anomaly of these forms turns out to be a regular feature of the roles they play within an expressed proposition. The question remains, though, why certain propositional roles are associated with canonical scales and others with inverted scales. As I noted above, there are suggestive correlations here with traditional thematic roles like agent, patient, and instrument. But thematic structure alone is an unwieldy instrument for sorting out canonical and inverted polarity items. 
Aside from the fact that there is no consensus on the inventory of thematic roles, or even on their status within linguistic theory, it is unclear, at best, how the multiplicity of thematic roles should map onto a binary distinction between inverted and canonical scalar semantics. Put bluntly, the question is what do Agents, Stimuli, Instruments, Rewards, and Temporal Intervals all share that distinguishes them from Themes, Patients, Expenses, and Durations? One might think of the distinction broadly as a force dynamic division between “antagonistic” participants (agents, stimuli, etc.) which facilitate the realization of an eventuality, and “agonistic” participants (patients, themes, etc.) which act against the force of an antagonist to impede the realization of an eventuality (Talmy 1985). The explanation seems appealing with a contrast like the one between the affected fly in hurt a fly and the forceful horses in wild horses, but other polarity items do not lend themselves so naturally to an analysis in terms of force dynamics. In an expression like have a clue, for example, it is hard to see how the clue acts against or impedes the ‘having’ relation. Similar considerations apply to the paper in be worth the paper it’s written on and the ghost in stand a ghost of a chance. And it seems a stretch to think of temporal intervals like a jiffy or a million years as forces compelling or impeding the occurrence of an event. Another possibility might be to appeal to Dowty’s (1991) notion of proto-agent and proto-patient to predict when a polarity item can be inverted. One might thus propose that polarity items are inverted if and only if they occur in argument positions where they have more proto-agent properties than at least one other argument. Dowty’s proto-agent properties – that is, (1) volitional involvement, (2) sentience or perception, (3) causing an event or change of state in another participant, (4) movement relative to another participant, (5) independent existence from the event denoted by the verb – seem like a good place to start, at least. Such a proposal should make the right predictions for at least those canonical polarity items involving a direct object.
It also seems to work for the canonical NPI be worth the paper it’s written on: since both arguments of the predicate be worth entail just one of Dowty’s proto-agent properties (i.e. neither one is more agentive than the other), the NPI is canonical. But it is unclear how this approach could handle temporal adjuncts like a coon’s age and in a jiffy, which do not seem to entail any proto-agent properties. And even if the general framework does extend to such cases, it is unclear what one gains thereby, unless one can also somehow explain what it is about proto-agentivity that makes it invert polarity items. Ultimately, to understand the division between canonical and inverted polarity items we must first understand the roles such forms play within the structure of a scalar model. A scalar model is basically a conceptual tool for thinking about the relations between different possible eventualities. The structure of the model is such that if one knows the status of a given eventuality

(i.e. whether it does or does not hold), one may automatically infer the status of other, related eventualities within the model. This in fact is the key to the problem of inverted polarity. Elements on any scale within a scalar model are always ranked in the same way, that is, in terms of the entailments they yield for a given propositional schema. Those elements which in scale-preserving contexts form the propositions with the most entailments are ranked at the top of the scale; those elements which under the same conditions form the propositions with the fewest entailments are ranked at the bottom. The ranking thus does not depend on the objective properties of the scalar elements alone but is crucially determined by the way these properties interact with a given propositional schema. Normally, of course, one thinks of scales more concretely as ordered in terms of amounts or degrees. Canonically, these orderings run from lesser to greater amounts. Prototypical scales measuring things like size, weight, or intelligence regularly conform to this pattern, and the pervasive scalar metaphor, MORE IS UP, whereby an increase in amount is conceptualized as a rise in elevation (Lakoff & Johnson 1980), similarly presupposes a canonical ordering of elements from lesser to greater quantities. But the canonical scale is in fact just a special case (albeit the default case), and in this case, as with all others, the ordering depends on the scale’s role within a larger propositional frame. The frame here involves nothing more than the attribution of a scalar property: x has property to extent-y. Such scales are always canonical, running from smaller to larger extents. Their logic reflects the fact that if some entity instantiates a property to some high degree, then it must also instantiate that property to all lesser degrees as well.
As Hoeksema and Rullmann (2001) note, this sort of logic makes canonical scales useful in reasoning about the possible existence of different entities. For example, while things may vary infinitely in weight, everything with any weight will weigh at least a minimal amount, but relatively few things will weigh as much as a ton: so, all things being equal, things with a minimal weight will be more likely to exist, and the canonical order for weights will run from light things to heavy things. The distinction between canonical and inverted polarity items depends on the fact that for every canonical scale, there exists a corresponding inverted scale, and which of these two scales appears in a scalar model depends on its role there. So, again speaking of weights, a propositional frame like this camel can carry X has a very different logic from a frame like X will break this camel’s back. The former calls for a canonical scale, the latter for an inverted one. And the choice in both cases depends on the way the variable affects the possibility of the proposition as a whole being true.
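The ranking procedure described here can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions, not part of the original analysis: the sample weights, the list of possible camel capacities (standing in for the alternative states of affairs a scalar model ranges over), and the names `entails` and `rank` are all hypothetical. It ranks elements by how many alternative propositions each one entails within a frame, and suggests how the same weights can come out in canonical order for this camel can carry X and in inverted order for X will break this camel’s back.

```python
from typing import Callable, List, Sequence

# A frame maps (element, world) to a truth value.
Frame = Callable[[int, int], bool]

# Hypothetical data: candidate weights for X, and possible capacities
# of the camel (each capacity is one "world" in the model).
WEIGHTS = [1, 10, 100, 1000]
CAPACITIES = [0, 5, 50, 500, 5000]


def can_carry(x: int, capacity: int) -> bool:
    """Frame: 'this camel can carry X'."""
    return x <= capacity


def breaks_back(x: int, capacity: int) -> bool:
    """Frame: 'X will break this camel's back'."""
    return x > capacity


def entails(frame: Frame, a: int, b: int, worlds: Sequence[int]) -> bool:
    """frame(a) entails frame(b) if b holds in every world where a holds."""
    return all(frame(b, w) for w in worlds if frame(a, w))


def rank(frame: Frame, elements: Sequence[int], worlds: Sequence[int]) -> List[int]:
    """Order elements from bottom (fewest entailments) to top (most)."""
    def entailment_count(a: int) -> int:
        return sum(entails(frame, a, b, worlds) for b in elements if b != a)
    return sorted(elements, key=entailment_count)


print(rank(can_carry, WEIGHTS, CAPACITIES))    # [1, 10, 100, 1000]
print(rank(breaks_back, WEIGHTS, CAPACITIES))  # [1000, 100, 10, 1]
```

In this toy model the heaviest load tops the scale in the carrying frame, while the lightest load tops it in the back-breaking frame, although the ranking criterion (number of entailments) is the same in both cases.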

In short, it is an irreducible fact about scalar logic that different roles within a proposition relate differently to the probability of the proposition’s truth. Some roles involve entities which may facilitate the realization of a proposition; others involve entities which militate against its realization. Canonical polarity items always involve roles of the latter sort; inverted polarity items always involve roles of the former sort. The scalar logic in both cases is identical, and in both cases reflects the way propositions in a model are ordered in terms of their pragmatic entailments. Inverted polarity items thus do not undermine the theory that polarity items are scalar operators; rather, they confirm it. These peculiar polarity items do, however, raise questions about the structure of scalar models and, more particularly, about the nature of quantitative value. The basic generalization remains that polarity items encode a fixed position within a scalar model, and that quantitative value is the feature which expresses this positioning. Emphatic NPIs, whether canonical or inverted, consistently encode a low-scalar q-value, and emphatic PPIs, equally consistently, encode a high-scalar q-value. Only the relevant notion of quantitative value is not a matter of size or amount per se but rather reflects the role a profiled participant plays in the realization of a proposition. Loosely speaking, one may think of quantitative value as reflecting a default expectation of how likely it is that a given element on some scale within a scalar model will yield a true proposition. Scalar models themselves constitute complex presuppositions about the way the world usually works, and the orderings of elements within a model simply reflect a default understanding of how those elements will contribute to the realization of a given situation type.
To conclude, polarity items are defined not just with respect to the contexts which license them, but also in terms of the roles they fulfill in a larger propositional context. Different propositional roles are associated with different scalar orderings. More precisely, the ordering for any given propositional role in a scalar model depends on the way that role affects the possibility of the proposition as a whole being true. While the simultaneous existence of both canonical and inverted polarity items shows that the relation between sensitivity and scalar semantics is more complicated than one might have thought, the intricate interaction between argument structure and quantitative value in the lexicalization of sensitive constructions itself confirms the central importance of scalar reasoning in the grammar of polarity.

5 The elements of sensitivity

5.1 The Informativity Hypothesis

This book began with a mystery. Polarity items seem like such peculiar constructions, their patterns of distribution so apparently unmotivated, one wonders why such forms should exist in any language. But polarity items might not be so mysterious after all. Thus far I have argued that polarity sensitivity is a regular function of the meanings of polarity items. The theory is that polarity items are scalar operators and that polarity sensitivity is a sensitivity to scalar inferencing. This is the “Scalar Model of Polarity,” or simply, the Scalar Model. The key to the Scalar Model is the idea that polarity items are defined by their rhetorical functions, and particularly by the argumentative force which they contribute to the expression of a proposition. Certain polarity items are associated with the expression of emphatic propositions. Others are associated with the expression of attenuating propositions. The hypothesis is that these associations are not just incidental facts about the uses of polarity items, but essential facts about the nature of grammatical sensitivity. Polarity items are sensitive precisely because they are conventionally associated with these rhetorical functions. The Informativity Hypothesis makes two claims about the lexical semantics of polarity items:

i) Polarity items profile an element with a fixed q-value, either high or low in an ordered set of semantic alternatives.
ii) Polarity items are conventionally construed with a fixed i-value, either emphatic or attenuating with respect to their ordered alternatives.
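One way to see how a fixed q-value and a fixed i-value could jointly yield sensitivity is the following Python sketch. It is an illustration only, not the author’s formalization: the function names and string labels are hypothetical, and the sketch assumes (in line with the discussion of emphatic NPIs and PPIs above) that a high value is the informative one in scale-preserving contexts while a low value is the informative one in scale-reversing contexts such as negation.

```python
from itertools import product


def informativity(q_value: str, scale_reversing: bool) -> str:
    """Informativity a q-value delivers in a given kind of context.

    Assumption: a high value entails its lower alternatives in a
    scale-preserving context; a scale-reversing context flips this,
    so that a low value then entails the higher alternatives.
    """
    strong = (q_value == "high") != scale_reversing
    return "emphatic" if strong else "attenuating"


def predicted_sensitivity(q_value: str, i_value: str) -> str:
    """Kind of context in which the encoded i-value is actually delivered."""
    if informativity(q_value, scale_reversing=True) == i_value:
        return "NPI"  # the two features only agree under scale reversal
    return "PPI"      # the two features only agree when the scale is preserved


for q, i in product(("low", "high"), ("emphatic", "attenuating")):
    print(f"{q:>4} q-value, {i:<11} i-value -> {predicted_sensitivity(q, i)}")
```

On these assumptions, each of the four feature combinations is compatible with only one kind of context: low emphatic and high attenuating forms come out as NPIs, high emphatic and low attenuating forms as PPIs.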

It is the interaction of these properties which is held to cause polarity sensitivity. If this is correct, then all polarity items should exhibit both of these properties, and all constructions which exhibit them should be polarity sensitive. The strange case of the inverted polarity items (§4.6) shows that which values count as high or low within a scalar ordering depends in part on how those values are construed within a proposition. But with the inverted polarity items, even if they seem to have the wrong sorts of quantitative values, the values they have are at least clearly quantitative. The question now is, are all polarity items really scalar operators? And if so, is that really such a remarkable thing? In order to test the Informativity Hypothesis, we need some general way (or ways) of assessing when and to what degree particular constructions can be said to encode quantitative and informative values.

5.2 Quantitative semantics

What makes a construction susceptible to the grammaticalization of polarity sensitivity? This is as much a question about the nature of the lexicon as it is about syntax or semantics. Part of the problem is that forms with very similar meanings may differ in their sensitivities, which suggests that sensitivity is at least in part an arbitrary property of constructions. But the fact that something is not fully predictable does not make it unprincipled. In fact, a relatively small number of semantic domains account for the vast majority of polarity items both within English and cross-linguistically (I distinguish twenty-six such domains in the partial catalogue of English polarity items given in the Appendix, though the list is hardly exhaustive). My main claim here is that not only are all of these domains in some sense inherently scalar, but also that all inherently scalar semantic domains – all domains, that is, in which elements are distinguished by quantitative values – can and do support the formation of polarity sensitive constructions. Quantitative value seems like an easy notion, but it is also an easy notion to misunderstand. The most obvious confusion is to assume that in order to encode a q-value, a construction must somehow denote a quantity, but q-value is a feature of many expressions which are not so obviously quantitative in nature. Basically, a quantitative value is just a location in an ordering. In the most general sense, any construction which saliently evokes an entity construed as located on a conceptual scale encodes a q-value. The precise scalar location is typically not specified; more often, q-value includes a range of values relative to a scalar norm.
For example, the gradable predicates tall, handsome, and polite profile vague regions on scales of height, beauty, and courtesy, respectively; all three, however, encode a high q-value since in each case the specified range is construed as relatively high with respect to some scalar norm (Sapir 1944; Klein 1998).

The idea that many polarity items – or even most – are scalar operators is hardly controversial. Minimizers, for example, clearly encode both quantitative and informative values: they are conventionally emphatic, and their emphatic force depends on their transparent expression of a minimal quantitative value in an appropriately scale-reversing context. And many other polarity items are just as transparently scalar in meaning: indeed, quantifiers and degree modifiers appear to be among the constructions most prone to polarity sensitivities in English (Bolinger 1972), Japanese (McGloin 1972), Dutch, and many other languages (Klein 1998; Hoeksema & Rullmann 2001). But to focus only on the most obviously scalar sorts of constructions would be to miss the bigger picture. In fact, one finds polarity items in abundance among a wide variety of modal verbs, conjunctions, and temporal and aspectual adverbs, all of which exhibit a demonstrably scalar semantics, even if their scalar properties are in some ways less glaringly obvious than those of a quantifier or a degree modifier. One reason why minimizers and degree modifiers may seem more obviously scalar than some other constructions is that they tend to feature, more or less transparently, one element which designates a gradable relation and another which profiles the extent to which that relation is instantiated: for example, in sleep a wink the verb sleep designates a process of variable duration while the measure NP a wink designates a minimal unit of duration. This clear division of labor between the expression of a gradable relation and the expression of a scalar extent effectively highlights the act of scalar construal in the compositional semantics of the construction as a whole. Many constructions, however, do not so neatly distinguish in the meanings of their parts the relation between a gradable relation and a scalar extent.
More often a single word will simultaneously evoke a conceptual scale and designate a quantitative value on that scale. This is particularly clear with implicitly superlative adjectives like fantastic, marvelous, or wonderful (each of which can mean just ‘very good’), but it is true of many lexical constructions: verbs like crawl, amble, and run, for example, both designate a manner of bodily locomotion and saliently express something about the relative rates of motion involved; similarly, like, love, and adore denote emotional attitudes and saliently express something about those attitudes’ intensity. All these expressions simultaneously evoke a conceptual scale and highlight a range of values on that scale. Expressions like these are inherently scalar, but their inherent scalarity may be overshadowed by their broader lexical content. Of course, almost any construction can be given a scalar construal. Even the most robustly non-scalar predicates sometimes combine with degree

modifiers: a woman in her ninth month of pregnancy can be said to be very pregnant, and a man who has been shot, stabbed, chopped into pieces, and scattered at sea will certainly be very dead. Similarly, even an apparently innocent predicate like the reciprocal hold hands may allow a scalar construal in a sentence like all we did was hold hands! where it may contrast with other predicates – like kiss, make out, pet, fondle, etc. – representing different degrees of physical intimacy. Nonetheless, these expressions are not inherently scalar: their scalar construals depend on the contexts in which they are used and do not, as such, constitute an essential and indefeasible feature of their meanings. If the claim that polarity items are scalar operators is to have any empirical teeth, this sort of contingent scalarity must be excluded. For a construction to count as encoding a q-value, it must be inherently scalar – it must evoke some conceptual scale as a necessary and conventional aspect of its meaning. The question is, what exactly does this mean? Within Cognitive Grammar (Langacker 1987, 1991) the semantic content of any linguistic expression (including expressions of any arbitrary complexity) involves the imposition of a semantic profile on some base of construal. The distinction can be thought of as a particular linguistic manifestation of the much more general phenomenon of figure/ground organization. Roughly, the profile of an expression E is the cognitive structure (an entity of any semantic sort) to which E conceptually refers: within the overall conceptualization evoked by E, the profile is that subpart which receives focal prominence. The base of an expression, then, is the semantic domain within which an entity is profiled. A standard example concerns the meaning of the word hypotenuse, where the conceptualization of a right triangle provides the base in which a particular line segment is profiled.
Given this basic distinction, we may say that a construction counts as inherently scalar to the extent that its profiled content must be construed against a scalar base, that is an ordered set of alternatives. A construction thus encodes a q-value if and only if it includes a conceptual scale as (part of) its base, and profiles, or includes within its profile, an entity located on that scale. From this it follows that many constructions which can support a scalar construal nonetheless do not count as inherently scalar. Thus, while a word like dance contrasts with words like stand, jump, and slither, and while silk contrasts with cotton, wool, and satin, these oppositions are not scalar but absolute. Words like these evoke alternatives, but the alternatives are not ordinarily construed as ordered in any particular way. Not so with items like love, care for, mind, or matter, each of which profiles a kind of gradable relation which can only be construed against a range of more or less

intense alternatives: for these forms, the scale is an inalienable aspect of their meaning. Furthermore, while many expressions denote elements or relations in domains that are inherently scalar (e.g. ability, cost, reward, significance, comprehension, desire, etc.), many others are in fact inherently unscalar – either because their meaning precludes a construal in terms of alternative values, or because the alternatives they evoke are inherently unordered. The construal of a referring expression as definite, for example, cannot depend on the availability of an ordered set of alternatives. While it is possible to construe a specific individual against a set of scalar alternatives (e.g. even the dean was amused), the construal of an entity as fully individuated and specific cannot be a matter of degree: a proper noun like Glynda or George refers to a particular individual and not just to someone who is more specific or individuated than some set of alternatives. Their reference is absolute rather than scalar. Similarly, words denoting basic-level categories of all sorts – colors, shapes, artifacts, and natural kinds – are probably never inherently scalar, at least in their most basic senses. Such words do, of course, present their profiled content against a range of contrasting alternatives (i.e. something counts as red only to the extent that it is not blue, yellow, or orange), but their evoked alternatives are not ordinarily ordered in any particular way. Most constructions and most semantic domains are thus not inherently scalar. Probably any situation can be given a scalar construal, but few expressions actually encode such a construal as an indefeasible part of their meanings, even when one or more conceptual scales is salient in their encyclopedic semantics. Although one can talk loudly, quickly, fluently, angrily, or lovingly to different degrees, the verb talk itself is neutral with respect to all these parameters.
Similarly, the verbs eat and drink are neutral as to the quantity or quality of what is consumed, and the verbs buy, sell, charge, and pay say nothing about the quantity or value of what is exchanged. All of these expressions are defined largely in terms of the participant roles they evoke and the relations they profile between these roles, rather than in terms of any scalar contrast between one sort of participant and another. The claim that all polarity items are inherently scalar is thus by no means trivial. Indeed, given the diversity of forms and functions observed across polarity items, the fact that they appear to be drawn exclusively from scalar semantic domains seems highly significant. It also seems to fit with a parallel observation – one which is perhaps equally remarkable, though by and large much less remarked on – that certain sorts of constructions appear never to be polarity sensitive. Thus, it would be surprising to find a language in which the

word for a particular kind of fruit or insect or a variety of clothing could only be used in negative sentences. Such words can, of course, function as stereotypical minimal units in idiomatic NPIs like hurt a fly or care a fig, but where they do so, they are more like measure terms than the names of natural kinds. The fact that many sorts of constructions are not scalar means that the Informativity Hypothesis could easily be falsified by a single example of a clearly non-scalar polarity sensitive item. Although it has often been claimed that such items do exist (e.g. Linebarger 1980, 1987; van der Wal 1997; Szabolcsi 2002, 2004), the sorts of forms most commonly cited in support of this claim – e.g. modal NPIs like English auxiliary need, Dutch hoeven, German brauchen, and French être besoin de; phasal adverbs like yet and already; and basic expressions of disjunction and conjunction in Hungarian and Japanese – actually support the Scalar Model. All of these constructions come from semantic domains which are rather trivially scalar in nature, and each of these domains actually hosts a variety of polarity sensitive constructions of precisely the four sorts predicted by the Scalar Model. Ultimately it will be impossible to prove either that all polarity items are scalar operators or that all inherently scalar domains give rise to polarity sensitive constructions. Still, either of these propositions could in principle easily be falsified: the fact that no such falsification appears to be at hand stands in favor of the Scalar Model.

5.3 The pragmatics of informativity

Informativity seems like a rather mysterious property for a construction. Why should speakers need to mark the informative value of the propositions they express? In context, every sentence is just as informative as it is: any extra signaling of informativity would seem, at best, needlessly redundant. A sentence like Kristen was not the least bit impressed counts as emphatic because under negation the minimal q-value of the least bit entails all higher values on the conceptual scale associated with the predicate impressed. The fact that the least bit also encodes a high informative value does not, in itself, make the sentence any more informative. Paradoxically, informative value appears to be a particularly uninformative property. Informativity is fundamentally an expression of speaker affect. In producing an emphatic or an attenuating utterance, a speaker does more than simply assert a proposition – she expresses an attitude toward that proposition, toward its significance and its informative strength. Most importantly, she expresses an attitude toward her audience, saying “you are the sort of person with whom

I would express myself in this way.” Emphasis and attenuation are, in essence, rhetorical strategies for the presentation of self in discourse. Viewed in this light, the notion of informativity as a linguistic property may not seem so mysterious after all. In this section, I argue that informativity in general, and the linguistic encoding of informative value in particular, reflect general strategies for the negotiation of social interaction. They are, in effect, devices for the expression of speaker involvement. The notion of involvement has a rich and rather heterogeneous history of applications in the study of emotive meaning (Caffi & Janney 1994: 343–8), but for our purposes, the basic idea may be usefully cast in terms of a general theory of politeness. Roughly, understatement and attenuation serve to give a hearer more options in responding to a speech act. As such, they may be seen as expressions of a speaker’s detachment and deference to the hearer. Emphasis, on the other hand, is a sign of intensity and speaker involvement. As such, it serves as a marker of camaraderie and solidarity with the hearer. There is more to communication than just the efficient exchange of information. Social interaction in general, and communicative interaction in particular, are hazardous undertakings. There is almost always more to worry about than just making oneself understood. One must consider the feelings of others and the potential danger to one’s own feelings in any social encounter. Broadly construed, such considerations are aspects of politeness. As many have noted (Lakoff 1973; Brown & Levinson 1978; Leech 1980, 1983; Hübler 1983; Geis 1995), the expression of politeness has important consequences for the ways language is used and, ultimately, for the structure of language itself. Brown and Levinson (1978) conceive of politeness in terms of the work undertaken by a speaker and a hearer to maintain each other’s face.
As defined by Goffman, face is “the positive social value a person effectively claims for himself … an image of self, delineated in terms of approved social attributes” (1967: 5). For Brown and Levinson, face consists specifically of “two particular wants – roughly, the want to be unimpeded and the want to be approved of in certain respects” (1978: 63): the first of these they call “negative face,” the second “positive face.” Politeness, in this conception, involves the strategies a speaker may use to satisfy an addressee’s face wants. In general, it is in the speaker’s interest to provide such satisfaction since, all things being equal, this is the best way to ensure that the hearer will in turn work to satisfy her face wants. The basic face wants define two basic strategies for the expression of politeness. According to Brown and Levinson, positive politeness consists in the strategies and devices a speaker may use to reassure the hearer that his wants, and more generally his positive conception of himself, are desirable to the speaker

(1978: 106). Basically, a speaker does this by expressing her interest and approval for her hearer, and by generally observing Lakoff’s third politeness maxim, “Make [H] feel good – be friendly” (1973: 298). While positive politeness emphasizes a speaker’s solidarity with the hearer, negative politeness focuses on the need to show deference. Negative politeness consists in the strategies a speaker may use to satisfy a hearer’s desire that his actions should be unimpeded (Brown & Levinson 1978: 134) and corresponds roughly to Lakoff’s first two maxims of politeness: “Give options” and “Don’t impose” (1973: 298). For the most part, strategies of negative politeness are a matter of avoidance. They seek to avoid the negative consequences to a hearer’s face which might arise from some action of the speaker’s. As such, they are tailored to mitigate the imposition associated with specific sorts of face-threatening acts. The prototypical strategy of negative politeness is perhaps the indirect speech act, in which a speaker seeks to avoid imposing by superficially disguising a potentially face-threatening speech act as something more innocent. Thus, as in Gordon and Lakoff’s (1971) example, a speaker may avoid expressing a potentially face-threatening opinion that it’s silly to paint one’s house purple by asking the apparently innocent question Why are you painting your house purple? Emphasis and attenuation are usefully understood as strategies of positive and negative politeness, respectively. Indeed, Brown and Levinson all but explicitly list them as such. The positive politeness strategy which exhorts a speaker to “exaggerate (interest, approval, sympathy with H)” (1978: 109) depends on a judicious use of emphasis. More generally, the strategy “Intensify interest to H” requires a speaker to make her contributions as vivid and intense as possible in order to convey the pleasure of “a good story” (1978: 110).
Similarly, understatement and attenuation figure prominently among the strategies Brown and Levinson list for negative politeness. In general, the surest way to minimize the threat associated with any given speech act is to minimize the expressed content of that act. Thus, instead of asking for a piece of cake, one might ask for just a taste, even if a piece is really what one wants; or when seeking the privilege of taking up somebody’s time, one might innocently ask Do you have a minute?, even if what one really wants is much more. Not surprisingly, then, many of the forms which Brown and Levinson list as useful for the mitigation of face threats are in fact attenuating PPIs: among them a sip, a taste, a smidgen, and a little bit (1978: 182). It might be rash to reduce the notions of emphasis and attenuation to simple politeness strategies. At best, such a move would seem to underestimate the full extent of their usefulness. Emphasis is not just a matter of solidarity: emphatic utterances can convey a sense of urgency (e.g. Don’t waste a second!), and

they can express insults as easily as intimacy (e.g. You wouldn’t know your ass from a hole in the wall). Similarly, the low informativity of an attenuated construction is just as useful for protecting a speaker’s positive face as it is for protecting a hearer’s negative face: hedging, for instance, as Brown and Levinson themselves point out (1978: 151), may serve the selfish function of protecting a speaker from criminal liability when giving testimony in a court of law. Clearly, the rhetorical motivations for emphasis and attenuation extend beyond their usefulness in politely protecting the face wants of others. For this reason, it might help to think of informativity in general as a way of expressing interpersonal involvement with an audience or even just with an utterance. The point is that emphasis and attenuation are useful forms of expression, and their usefulness is systematically related to the ways in which they are (or are not) informative. It is this general usefulness which makes informative value a natural sort of lexical semantic feature, and which motivates its conventional association with particular constructions. Beyond politeness, the pragmatics of informativity plays a fundamental role in the generation of non-logical inferences in the interpretation of texts and conversational interaction. In this respect, emphasis and attenuation basically reflect different strategies a speaker might follow in the formulation of an utterance based on the different strategies a hearer might use in interpreting it. Horn (1984, 1989) develops a model of non-logical inference based on two antithetical principles of conversational interaction.
As Horn conceives them, these principles reflect a general principle of least effort governing all linguistic interactions: from the hearer’s perspective, least effort requires that utterances should be as easy as possible to understand; from the speaker’s perspective, least effort requires that utterances should be as easy as possible to produce. As Horn notes, this general tension between what is easy for the speaker and what is easy for the hearer has deep roots in the linguistic literature, going back to the work of Hermann Paul, André Martinet, and George Zipf, and to Atlas and Levinson (1981). In Horn’s (1989: 194) formulation, the basic tension is cashed out in terms of two principles of conversational inference: the Q and R Principles (with Q and R alluding to Grice’s Maxims of Quantity and Relation, respectively):

Q Principle (Hearer Economy): Make your contribution sufficient: Say as much as you can.
R Principle (Speaker Economy): Make your contribution necessary: Say no more than you must.

Basically, the Q Principle ensures that a hearer will be given enough information to understand a speaker’s meaning, while the R Principle ensures that the expression of this information will be as simple as possible.1 From the hearer’s point of view, the two principles define distinct strategies for the interpretation of any utterance. If one assumes that the Q Principle is in effect, it follows that the speaker has given as much information as she could. Consequently, a hearer can infer that any stronger utterance the speaker might have made would either be false, or at least would not be something the speaker could confidently vouch for. This, of course, is the logic which underlies scalar implicature. On the other hand, if one assumes that the R Principle is in effect, it follows that the speaker may in fact mean more than she says, and a hearer may infer that the explicit, expressed content of an utterance does not exhaust what the speaker hopes to convey. This is the logic which underlies irony, allusion, understatement, and indirectness of all kinds. Given Horn’s formulation here, it seems natural to treat these as principles not just for the pragmatic interpretation of an utterance, but actually as defining two distinct strategies for communicative interaction, roughly analogous to the strategies of positive and negative politeness outlined above. The Q Principle thus encourages a speaker to produce the strongest and most informative contribution she can honestly maintain, and so favors the formulation of emphatic utterances. The R Principle, on the other hand, encourages a speaker to say as little as she can without compromising clarity, and as such, it favors strategies of understatement and attenuation. 
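The Q-based inference just described can be made concrete with a small sketch. It is an illustration only, not part of Horn's formulation: the scale in `QUANTIFIER_SCALE` and the function name `q_implicatures` are assumptions introduced here, and the sketch models nothing more than the step from "the speaker said X" to "the speaker could not vouch for anything stronger than X."

```python
from typing import List

# Illustrative scale, ordered from the weakest to the strongest claim.
# The particular items are an assumption made for this sketch.
QUANTIFIER_SCALE: List[str] = ["some", "many", "most", "all"]


def q_implicatures(uttered: str, scale: List[str]) -> List[str]:
    """Return the stronger scale-mates which a hearer may take the speaker
    to be unable to vouch for, assuming the Q Principle is in effect."""
    position = scale.index(uttered)  # raises ValueError if not on the scale
    return scale[position + 1:]


if __name__ == "__main__":
    for word in QUANTIFIER_SCALE:
        print(word, "->", q_implicatures(word, QUANTIFIER_SCALE))
```

On this toy model the weakest item licenses the most inferences and the strongest item licenses none. R-based inference is deliberately left out, since it enriches what is said rather than ruling out alternatives on a fixed scale.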
Of course, in Horn’s formulation, the R Principle serves primarily as a constraint on the form of an utterance rather than on its inherent informativity: if the point is to do no more work than is absolutely necessary, the important thing would seem to be to minimize the amount one utters, not the amount one actually communicates. But communication itself can be a risky business. The more information one conveys with one’s utterance, the more one imposes on the credulity of one’s audience, and the more one risks being exposed to disagreement and rejection. It thus seems reasonable to view the principle of Speaker Economy in broad terms as a force which militates not just against excess articulatory effort but more generally against taking any excessive communicative risks. These considerations may help clarify why informative value can be, and often is, a meaningful part of what a speaker is doing in an utterance. As suggested above, informativity is essentially an index of speaker involvement, but it is also a way of involving an audience in the act of communication. The use of low i-value, informatively attenuated utterances puts a light touch on

communication – they give the hearer options, protect his negative face wants, and afford him the pleasure of working out for himself the full significance of what is said. High i-value, emphatic utterances, on the other hand, leave less to chance in the interpretive process, but they also leave a hearer with fewer options for responding. Ultimately, the fact that linguistic constructions can be conventionally tied one way or the other to the expression of emphasis or attenuation allows speakers to signal their attitude toward their own speech acts, and allows hearers a simple way of knowing where they stand. Figures like emphasis and attenuation turn out to be useful in a variety of rhetorical contexts, and through frequent use, they may come to be conventionally associated with particular lexical items or expressions. I have stressed here the role of politeness in explaining this usefulness because it illustrates just how important, and how pervasive, such rhetorical properties can be in actual usage. At the heart of almost any canonical social encounter, there is the desire to make sure that the interaction will come off well, and that all participants (speaker, hearer, ratified and even non-ratified audience) will leave with their feelings and their face intact. For this reason, it is not only useful, but in fact critical that speakers should have at their disposal the linguistic devices they need to express their good intentions. The conventional encoding of properties like emphasis and attenuation provides a speaker with just such devices, allowing her to unambiguously signal the rhetorical nature of her conversational moves. Seen in this light, it would in fact be strange if languages failed to encode such properties. 
The conventionalization of informative value as a property of particular lexical items is consistent with, and indeed exemplary of, the general tendency noted by Traugott for “meanings to become increasingly situated in a speaker’s subjective … attitude” toward what is said (1988: 411). In general, i-values get associated with particular lexical items because the rhetorical properties of emphasis and attenuation are salient features of the utterances in which they occur. In essence, informativity is a property of sentences used in context. Emphatic sentences convey more or somehow make a stronger claim than might have been expected; attenuating sentences say less or make a weaker claim than might have been expected. I-value, the sentential property, becomes a feature of lexical semantics when particular words or constructions are regularly used in emphatic or attenuating contexts. If a given form occurs frequently and systematically in such contexts, the rhetorical properties of the utterance as a whole may come to be associated with the use of the form itself. This sort of metonymy is in fact a common source of semantic change. A typical example is the tendency for connectives expressing temporal overlap

to develop concessive meanings, as with English while, still, and yet (Traugott & König 1991: 199): often the point of saying that two things occur together is to draw attention to their normal incompatibility (e.g. She’s seven and she’s studying modal logic), and so this notion of controverted expectation may become associated with a marker of simultaneity. In general, if the use of a word frequently involves the expression of an attitude of some sort, that attitude may become an important part of the word’s meaning. How this actually happens with polarity items is a complex problem, but the idea that constructions can over time acquire an informative value, and so gradually become polarity sensitive, explains one fundamental mystery about polarity items, which is why different constructions with similar or even identical referential properties often exhibit very different sensitivities. In general, it seems that lexical semantics plays an important role in determining what sorts of constructions can become polarity sensitive (Hoeksema 1994, 1998; van der Wal 1997; van der Wouden 1996a, b; 1997). The role it plays, however, is far from determinative. Hoeksema (1994), for example, mentions two Dutch verbs, klikken and boteren, both of which idiomatically mean ‘to get along/be compatible,’ but which differ wildly in their affinity for negation: while boteren occurs with negation 98 percent of the time, klikken does so only 40 percent of the time. Similarly, Hoeksema offers corpus data on English verbs of indifference showing that the verb mind, as in (1), occurs in affirmative contexts just 1 percent of the time (with n=341), while the verb care, as in (2), does so 20 percent of the time (with n=792). (1)

a. I really don’t mind waiting. b. Would you mind waiting a little while? c. I don’t mind waiting, but I will mind if you don’t show up.

(2)

a. I don’t care for kippered herring. b. Would you care for some kippered herring? c. I don’t care for kippered herring, but I do care for you!

As the affirmative (c) examples here suggest, these forms are in fact more likely to occur in affirmative contexts where they function echoically in the rejection of a contextually relevant negative proposition. Still, the fact that they occur in such contexts at all qualifies them in Hoeksema’s estimation as quasi-polarity items rather than true NPIs. Hoeksema suggests that such quasi-polarity items may become true polarity items by a process of grammaticalization. He notes that many NPIs exhibit three major properties typical of grammaticalized constructions: they are semantically bleached relative to their lexical counterparts, like wink in sleep a wink or finger in lift a finger; they encode relatively subjective meanings, for example indexing a speaker’s attitude toward what is said; and they exhibit constructional layering, with one form having several uses, some of which are sensitive while others are not. Indeed, even with the most lexical of sensitive constructions, it makes sense to view the property of sensitivity itself as essentially grammatical. Typical examples of grammaticalization involve a loss of lexical independence and a fusing of elements across word boundaries – as in the evolution of tense markers from periphrastic verbal constructions, agreement markers from pronouns, or case markers from erstwhile adpositions. And this is precisely what happens in the development of NPIs, as constructions become increasingly dependent on and fused with the expression of negation. But grammaticalization in general is not just a matter of syntactic erosion and semantic bleaching; typically, it also involves a process of pragmatic strengthening and subjectification (Sweetser 1988; Traugott 1988; Traugott & Dasher 2002). This is where the Informativity Hypothesis seems particularly helpful, as it offers some insight into what it is that allows some constructions to show up first disproportionately and then exclusively in contexts of one polarity or another. Verbs like care, mind, matter, and bother can grammaticize as polarity items because they profile relations in domains – like desire, displeasure, significance, and effort – which are themselves intrinsically scalar. The degree to which such forms count as true polarity items depends on the degree to which they are conventionally associated with a particular informative value: inasmuch as verbs like care and mind can still function in neutral or emphatic contexts, their association with an attenuating i-value remains more a pragmatic preference than a strict semantic requirement. 
The attenuating force of such forms is thus essentially a defeasible conversational implicature which may over time grow more difficult to override, until it becomes an indissociable feature of a form’s conventional meaning.

5.4 Assessing informativity

If polarity items really are conventionally associated with rhetorical functions like emphasis and attenuation, these associations should have consequences for their distributions. For a polarity item to be felicitous, it must express its informative value in an appropriately attenuating or emphatic way. The right scalar inferences alone will not license an item, if its conventional i-value is inconsistent with the rhetorical force of its use. Where such clashes arise, the effects may range from mild semantic anomaly to outright ungrammaticality. This section briefly reviews a variety of constructions which bias an expressed proposition either toward an emphatic or an attenuating rhetorical force, and which may therefore be used to assess the inherent i-value of the constructions with which they combine.

5.4.1 Diagnostics of emphasis

Literally. The English adverb literally is widely used as an expression of emphasis, indicating that an utterance is to be taken in the strongest possible sense (Powell 1992; Israel 2002). Emphatic literally is compatible with emphatic polarity items like sleep a wink and scads but is ungrammatical with attenuators like much or a little bit in its focus.

(3) Margo literally didn’t sleep {a wink / *much} before her big test.
(4) Belinda literally won {scads / *a little bit} of money playing blackjack.

Similarly, in (5–6), where the emphatics sleep a wink and a ton can be felicitously introduced by a breathless You’ll never believe it!, the attenuators much and a couple of sound awkwardly off.

(5) You’ll never believe it! I didn’t sleep {a wink / ?much} last week.
(6) You’ll never believe it! Belinda won {a ton of money / ?a couple of dollars} at the races.

The anomalies here are mitigated when the attenuators occur without focal stress, which suggests that the anomaly arises from the use of these constructions as expressions of emphatic or controversial new information.

Absolute modification. In general, emphatic polarity items can occur in the scope of degree modifiers like absolute and absolutely. Attenuating polarity items cannot.

(7) Gilda absolutely would not {lift a finger to help / *help much}.
(8) They’ve been in there for absolutely {hours / *a while}.
(9) She brought an absolute {ton / *touch} of energy to the performance.
(10) a. I absolutely do not give a damn what you say. b. *I absolutely do not care to comment.

Modification by even. In general, emphatic polarity items can occur in the focus of even. Most attenuating polarity items are ungrammatical in this context.

(11) a. Laura didn’t give even so much as a word of explanation. b. Even wild horses couldn’t get me to write another dissertation. c. She didn’t even bother to return his desperate phone calls.

(12) a. ??Brandon didn’t even eat much. b. ??Are you even entirely sure that’s wise? c. ??Huguette doesn’t even care for asparagus. cf. Huguette doesn’t even like asparagus (let alone love it).

Focal prominence. In general, emphatic polarity items allow, and even welcome, emphatic focus. Because these forms serve to signal the unusual significance of an expressed proposition, they naturally tend to serve as the informational and intonational focus of a sentence. Attenuating polarity items, on the other hand, usually prefer not to draw attention to themselves and so may be awkward where they occur with focal stress, as in (13–14).

(13) The news left me {thoroughly / ??somewhat} confused.
(14) Lily didn’t {lift a finger / ??make much effort} to help us.

The anomaly here reflects the incoherence of emphatically strengthening an element whose very purpose is to weaken an expressed proposition. But attenuators can occur with focal stress where the stress is used contrastively, for example to attenuate a prior assertion, as in (15–16).

(15) That novel confused me. At least, it sort of confused me.
(16) Jude didn’t help with the moving. At least, he didn’t help much.

In such cases, the use of stress is basically metalinguistic rather than truly emphatic: the point is not to strengthen an expressed proposition, but just to highlight the way the proposition is expressed, and so it is not inconsistent with the expression of an attenuated proposition. The point is, while emphatic forms welcome, and even crave, focal prominence, attenuating forms tolerate it only under specific discourse conditions. Horn scales. The distinction between the emphatic and attenuating sentences is nicely illustrated in the syntactic tests developed by Horn (1972, 1989) to define quantitative scales. These tests establish paradigmatic relations between forms ranged on a scale. Thus, the connective construction or at least, as in (17), links two clauses, the first of which must be stronger than the second, while in fact in (18) links clauses where the second is stronger than the first. Negated restrictive particles (like not only/just), with or without a following adversative (like but also/actually) work similarly, as in (19), requiring their second conjunct to be stronger than the first. (17)

a. Margo didn’t sleep a wink, or at least she didn’t sleep much. b. *Margo didn’t sleep much, or at least she didn’t sleep a wink.

(18)

a. Jerry didn’t help much, in fact he didn’t lift a finger. b. *Jerry didn’t lift a finger, in fact he didn’t help much.

(19)

a. Belinda didn’t just win a little bit of money, she won scads. b. *Belinda didn’t just win scads of money, she won a little bit.
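The ordering requirements of the three frames in (17–19) can be restated as a small check. This is a minimal sketch under stated assumptions: the numeric ranks in `STRENGTH` are illustrative stand-ins for relative informative strength, not measured values, and the names `FIRST_MUST_BE_STRONGER` and `frame_is_coherent` are hypothetical labels introduced for this example.

```python
from typing import Dict

# Assumed strength ranks for the clauses in (17)-(19): a higher number
# means a stronger (more informative) claim. The numbers are illustrative.
STRENGTH: Dict[str, int] = {
    "didn't sleep much": 1,
    "didn't sleep a wink": 2,
    "didn't help much": 1,
    "didn't lift a finger": 2,
    "won a little bit of money": 1,
    "won scads of money": 2,
}

# For each frame, True means the first clause must be the stronger one;
# False means the second clause must be the stronger one.
FIRST_MUST_BE_STRONGER: Dict[str, bool] = {
    "or at least": True,
    "in fact": False,
    "not just": False,
}


def frame_is_coherent(frame: str, first: str, second: str) -> bool:
    """Check whether two clauses stand in the strength order the frame requires."""
    if FIRST_MUST_BE_STRONGER[frame]:
        return STRENGTH[first] > STRENGTH[second]
    return STRENGTH[second] > STRENGTH[first]


if __name__ == "__main__":
    print(frame_is_coherent("or at least", "didn't sleep a wink", "didn't sleep much"))
    print(frame_is_coherent("or at least", "didn't sleep much", "didn't sleep a wink"))
```

The sketch treats strength as a simple ranking of whole clauses; it says nothing about how that ranking is derived from the scalar semantics of the polarity items themselves.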

The contrasts here clearly show that emphatic and attenuating polarity items differ precisely in terms of their informative strength, and that these differences have real distributional consequences.

5.4.2 Diagnostics of attenuation

At least. Attenuating polarity items can occur in the focus of at least, while emphatic polarity items are comparatively awkward in this context.

(20) The evening was a disaster, but at least we didn’t spend {all that much money / *a red cent}.
(21) Her taste is expensive, but at least it’s not {too / *the least bit} garish.
(22) She is a cruel woman, but at least she’s {sort of / *awfully} cute.
(23) It’s very baroque, but at least {I’m beginning to / *I thoroughly} understand it.

Hedged concession. Attenuating polarity items can be used to qualify a concession. Given a strong expectation of some sort, and confronted with evidence that the facts do not justify this expectation, a hedged concession allows one to acknowledge the evidence without abandoning the more general expectation. (24)

a. Well, I guess he’s not here yet, (but I still think he will come). b. Well, I guess he didn’t get all that drunk, (but I still think he drank too much). c. Well, I guess she has some interest in tax policy, (but I still think she’d rather go dancing). d. Well, I guess the movie was just a tad pornographic, (but I still think that it was tastefully done).

Anti-concessives. Attenuating polarity items, but not emphatic polarity items, may be used to qualify a concessive construction in order to reestablish an argumentative conclusion. I call such uses “anti-concessive” because while the first clause superficially concedes a point, the second effectively denies that it matters. The formula I might be stupid, but I’m not that stupid is a classic illustration. Because anti-concessives necessarily express a qualified judgment rather than an absolute one, they are an ideal environment for attenuators but are inhospitable to emphatic constructions. The examples below thus sound fine with attenuators like the NPI much and the PPIs a good bit and fairly, but are basically incoherent with the emphatic counterparts of these constructions, the NPI at all and the PPIs a ton and insanely.

(25) He may have danced, but he didn’t dance {much / *at all}.
(26) She might not be rich, but she does have a {good bit / *ton} of money.
(27) She may not be brilliant, but she is {fairly / *insanely} clever.

This test does not work unless the concessive and the anti-concessive clauses involve values on the same scale. Paired scalar constructions, as below, allow emphatics in the anti-concessive.

(28) She may be beautiful, but she can be awfully selfish.
(29) He may have showed up, but he didn’t say a word to either of us.

The patterns of acceptability in these sentences lend support to the claim that polarity items are divided between emphatic and attenuating forms.

5.5 Rhetorical coherence in polarity contexts

If i-value in general is a rhetorical property of constructions, then forms which encode i-values should be sensitive to the rhetorical properties of any potential licensor. Certain polarity contexts, for example, seem by their very nature to express an emphatic proposition, and these contexts, therefore, should be uncomfortable with attenuating polarity items. Consider (30). (30)

Jasmine kept pestering the coach long after a. she had a hope in hell of getting on the team. b. ?she had much hope of getting on the team.

The use of long after here depends on a felt contrast between the intensity of Jasmine’s efforts and the likelihood of her success: the minimizer a hope in hell reinforces this contrast, while the attenuating much undermines it. The construction as a whole, particularly with the modifier long, sets up an expectation for something emphatically surprising, which effectively blocks the use of an attenuating construction. Comparative constructions in general are similarly inhospitable to attenuators, since typically they serve to emphasize the contrast between compared propositions, and the use of an attenuator may undermine this contrast. At least, this is what seems to go wrong with the use of much in (31). (31) I’d rather be trapped in an elevator with a lecherous Martian than spend {a minute / ??much time} with that Murray.

The comparison here serves to emphasize the speaker’s distaste for Murray: the minimizer a minute effectively reinforces this judgment, while the weaker much diminishes it. Similar considerations apply below, where the attenuators all that much and terribly lead to rhetorical anomaly, if not outright ungrammaticality.

(32) Jasmine is more likely to chase chimpanzees through the forest than she is to study {at all / *all that much}.
(33) Taylor visits the moon more often than she gets {the least bit / *terribly} excited about her work.

The significance of i-value is also evident in its effects on the uses of polarity items in questions. As noted above, emphatic polarity items tend to bias a question toward either a negative or a positive response (Lakoff 1969; Borkin 1971; Hinds 1974; Linebarger 1980; Guerzoni 2004). But, as the examples below suggest, attenuating polarity items are more open-ended and less likely to introduce bias one way or another. (34)

a. Did you eat a bite of the cake? (biased: expects a negative response) b. Did you eat much of the cake? (unbiased)

(35)

a. Wasn’t she awfully pretty? (biased: expects a positive response) b. Wasn’t she sorta pretty? (less biased)

In (35) the negated form of the question itself signals the expectation of a positive response; but the choice between awfully and sorta is significant nonetheless. The emphatic (a) sentence is hardly even a question so much as an expression of amazement coupled with a request for agreement. The (b) sentence, with its tentative sorta, leaves room for disagreement and some latitude concerning the degree of expected pulchritude.

Rhetorical questions constitute a sort of indirect speech act in which a speaker, by superficially and insincerely requesting information, actually conveys a very definite opinion. Normally, if a question is a sincere request for information, the speaker will not want to excessively prejudice any possible response; however, that is exactly what an emphatic polarity item will do. By posing a question with reference to an extreme value, the speaker renders one possible response extremely informative and the other extremely uninformative: if the answer to (34a) is “no,” we learn precisely how much cake was eaten (none); if it’s “yes,” we know only that at least the smallest amount possible was eaten. Such a prejudicial posing generates the implicature that the speaker in fact has a very definite idea about the answer, and so the question is rhetorical. The attenuating polarity items, on the other hand, allow more room for negotiation, and so can be used to form simple information questions.

5.6 Compositional sensitivities

Thus far I have argued that q-value and i-value are jointly responsible for a wide array of polarity effects. But if these really are independent lexical semantic features, they should occur independently of each other in constructions which are in themselves not polarity sensitive but which are systematically sensitive in combination with other scalar expressions. Obviously, there are many constructions which encode a q-value of some sort but are not polarity sensitive. Sometimes, indeed, a single q-value can be conventionally associated with distinct constructions which differ in their sensitivities. The degree modifiers below are a case in point. All encode low q-values, but they vary with respect to i-value: only a bit, unlike its near synonyms the least bit (NPI) and a tad (PPI), can occur equally in emphatic and attenuating contexts. (36)

a. Harry is a bit overweight. b. Harry is a tad overweight. c. *Harry is the least bit overweight.

(37)

a. Harry isn’t a bit overweight. b. *Harry isn’t a tad overweight. c. Harry isn’t the least bit overweight.

The positive sentences in (36) all make weak claims and so can function only as understatements or hedged assertions: the emphatic NPI the least bit cannot be accommodated. In (37), where the same q-value yields a strong scalar claim, the sentences can only count as emphatic denials: here, the attenuating PPI a tad is ruled out. But the versatile a bit is fine in both situations. This shows that quantitative value alone does not determine a form’s sensitivity. A similar contrast is found among the high q-value constructions in (38–39), where the insensitive item very contrasts with the PPI awfully and the NPI all that. Here in (39) negation produces a set of contrary propositions, unlike the contradictory ones in (37). (38)

a. Lewis is very clever. b. Lewis is awfully clever. c. *Lewis is all that clever.

(39)

a. Lewis isn’t very clever. b. *Lewis isn’t awfully clever. c. Lewis isn’t all that clever.

In (38a), very marks a high degree of cleverness in an emphatic assertion; in (39a), very marks a high degree of cleverness in a hedged denial. The (b) and (c) sentences show that awfully and all that, while notionally similar to very, are not so flexible. The notion of i-value provides a simple explanation: forms specified for a particular i-value are limited to contexts supporting that value; forms not so specified are free to occur in emphatic, attenuating, or neutral contexts. Forms like a bit and very, while sharing a q-value with their apparent

synonyms, differ in that they do not encode a conventional i-value. They are therefore not polarity sensitive and their distributions are consequently less constrained. At this point one may object that the argument has turned circular.2 While I’ve claimed that polarity sensitivity is predictable on the basis of i-value and q-value, it seems that in (38–39) the determination of i-value itself depends on a form’s polarity sensitive behavior. If there were no other evidence than this for informative value, it would be just a clever diacritic. But as we have seen, i-value does have significant grammatical reflexes. The point is that i-value turns out to be independent from q-value. More generally, i-value cannot be predicted from lexical semantics because i-value is itself a part of lexical semantics, and so its association with any given form is arbitrary. In this respect, i-value is no different from any other lexical semantic feature. Still, it is worth pointing out that the situation here is at least somewhat more complex. The behavior of degree words in polarity contexts clearly demonstrates that there is more to their meanings than the simple specification of a quantitative value. But as it turns out, there is also more to informative value than these examples might suggest. When one compares the behavior of the insensitive a bit with the superficially similar a little strange things happen (Bolinger 1972; Horn 1989: 401). (40)

a. I’m {a bit / a little} worried about the situation. b. I’m not {a bit / a little} worried about the situation.

While both constructions are similarly attenuating in positive contexts, in the black light of negation (Horn 2000a: 147) their true characters come out: a bit in (40b) contributes to an emphatic denial of concern, while a little only denies that the concern is negligible and implicates that it is actually considerable. The effect of not a little here illustrates the classic figures of litotes and understatement: it is litotic in its application of negation to a low-scalar value, and it is truly understating (Israel 2006) in its effective expression of an obliquely emphatic positive proposition. If i-value is a fully functioning lexical-semantic feature, we should be able to find it at work in constructions with or without any accompanying q-value; and where it occurs without any co-encoded q-value, presumably it will not be polarity sensitive either. The obvious example is the focus particle even. Even is not polarity sensitive, occurring freely in both negative and affirmative sentences, and even is not linked to any fixed q-value, since both low- and high-scalar expressions can occur in its focus. But even is sensitive to the interaction of polarity with the

scalar semantics of its focus. While both even the lowest and even the highest are perfectly well-formed phrases, generally only one of the two can occur in any given context, and which one that is depends on the context’s polarity. (41)

a. Dolly can jump over even the highest fence. b. #Dolly can jump over even the lowest fence.

(42)

a. Dolly can’t jump over even the lowest fence. b. #Dolly can’t jump over even the highest fence.

These expressions are not generally polarity sensitive; rather they are sensitive only with respect to a given propositional context. Thus, a change of predicate as in (43) reverses the pattern of acceptability. (43)

a. Dolly has trouble with even the lowest fence. b. #Dolly has trouble with even the highest fence.

Superlatives like those in the (a) sentences here are tantamount to universal quantifiers. As such, these sentences represent remarkable claims and so welcome even as a marker of their unusual informativity; but replacing these superlatives with their polar contraries, as in the (b) sentences, renders the claims trivial and makes even sound bizarre. Because even is not itself tied to any particular point on a scale, it can occur in both positive and negative sentences and still retain its emphatic force; but to do so, its focus must encode a q-value which fits the polarity of the sentence. Concessive conditionals are subject to a similar effect (Sweetser 1990: 134). Either a positive or a negative apodosis may allow a concessive, even if, reading in (44). (44)

a. Dolly wouldn’t marry you (even) if you were the last man on earth. b. Dolly would marry you (even) if you were a monster from Mars.

But given normal background assumptions about marriage and attractiveness, reversing the polarity of these examples, as in (45), blocks the concessive readings. (45)

a. Dolly would marry you (*even) if you were the last man on earth. b. Dolly wouldn’t marry you (*even) if you were a monster from Mars.

The behavior of even, very, and a bit has significant implications for a theory of polarity. First, quantitative value and informative value are autonomous lexical features. Either one can be conventionally associated with a lexical item independently of the other: even encodes an emphatic i-value but is neutral as to q-value; very and a bit encode high and low q-values, respectively, but are unspecified for i-value. And the behavior of these three forms as a group demonstrates that the three parameters relevant to polarity sensitivity – polarity, q-value, and i-value – all interact independently of polarity items themselves. Most importantly, the systematic interaction of these features shows that the grammar of polarity sensitivity itself involves a regular process of semantic composition. This point is brought home by the fact that i-value and q-value produce the same grammatical effects whether they co-occur within a single lexical item (e.g. in polarity items like the least bit), or come together as the result of syntactic composition, as in (40–44). Thus, although neither even nor a bit is strictly speaking polarity sensitive, their conventional scalar semantics ensures a regular interaction with polarity contexts. In (41–43), where even expresses an emphatic i-value, a change in sentence polarity necessitates a change in the q-value of the focus. Similarly, in (36–40), while a bit and very mark a constant q-value, a change in polarity brings with it a change in the sentence’s i-value. The implications for a theory of sensitivity are simple and profound. If an expression is such that it conventionally holds constant both quantitative and informative value, that expression will be acceptable only in contexts where both its quantitative and informative values can be compatibly expressed. Polarity items are scalar operators and polarity sensitivity is a sensitivity to scalar inferencing. 
The account developed here has three major virtues: first, by recognizing polarity items in general as a semantically coherent class of expressions, it explains their distributions directly in terms of the meanings they encode; second, the proposed classification provides a unified account for a wide range of both NPIs and PPIs; and finally, by distinguishing emphatic from attenuating polarity items, the account provides the beginning of a principled explanation for distributional differences between two broad classes of polarity item.
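The compositional logic summarized above can be made concrete with a small illustrative sketch. The Python below is not part of the Scalar Model's own formalism: the two-valued labels for q-value, i-value, and context, and the function names `i_value`, `licensing_contexts`, and `sensitivity`, are simplifying assumptions introduced here purely for illustration. The sketch treats the i-value of a sentence as a function of a q-value and the polarity of its context, and then asks in which contexts an expression that fixes both values could be used.

```python
from itertools import product

# Hypothetical labels (assumptions for illustration, not the author's notation).
LOW, HIGH = "low", "high"
EMPHATIC, ATTENUATING = "emphatic", "attenuating"
AFFIRMATIVE, REVERSING = "affirmative", "scale-reversing"


def i_value(q_value, context):
    """Informative value produced by a q-value in a given context.

    High values are informative in affirmative contexts; low values are
    informative in scale-reversing contexts such as negation.
    """
    if context == AFFIRMATIVE:
        return EMPHATIC if q_value == HIGH else ATTENUATING
    return EMPHATIC if q_value == LOW else ATTENUATING


def licensing_contexts(q_value, fixed_i):
    """Contexts in which a fixed q-value yields the fixed i-value."""
    return [c for c in (AFFIRMATIVE, REVERSING)
            if i_value(q_value, c) == fixed_i]


def sensitivity(q_value, fixed_i):
    """Classify an item that conventionally fixes both values."""
    return "NPI" if licensing_contexts(q_value, fixed_i) == [REVERSING] else "PPI"


for q, i in product((LOW, HIGH), (EMPHATIC, ATTENUATING)):
    print(f"{q}-scalar {i}: {sensitivity(q, i)}")
# low-scalar emphatic: NPI
# low-scalar attenuating: PPI
# high-scalar emphatic: PPI
# high-scalar attenuating: NPI
```

On these assumptions, an item like a bit, which fixes only a q-value, is compatible with both contexts, while an item that also fixes an i-value is left with exactly one. The binary coding is of course far cruder than the scalar semantics it stands in for.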

6 The scalar lexicon

Negation seals strange friendships.
Dwight Bolinger (1960: 380)

6.1 Paradigmatic predictions of the Scalar Model

The Scalar Model aims to explain what polarity items are and why they should exist. It also makes clear predictions about what sorts of constructions can be polarity sensitive and where they might be found in the lexicon. If all polarity items are scalar operators, it follows that polar sensitivity can only arise in scalar semantic domains and that all polarity items must profile an entity against a scalar base. Given the diversity of polarity items in both form and meaning and both within and across languages, this might seem an unlikely, or even quixotic, hypothesis. But scalar reasoning is itself a broad and abstract phenomenon, and given its pervasiveness in language and cognition and the ease with which almost anything can be construed against a set of alternatives, this is really a rather weak claim. However, if polarity items really are all conventional expressions of rhetorical affect – of emphatic and attenuating i-values – then they should arise in contexts where there are good pragmatic reasons for expressing such a rhetorical stance. Furthermore, if sensitivity really is an effect of frequently felt pragmatic needs, then there should be regular patterns in the types of expressions which become polarity sensitive and the types of functions these serve, both within and across languages. Thus, polarity sensitive constructions are likely to have sensitive synonyms (or near synonyms) within a language and sensitive counterparts in other languages. Indeed, the Scalar Model predicts that all polarity items should divide into just four basic sorts: (i) Low-scalar emphatic NPIs; (ii) Low-scalar attenuating PPIs; (iii) High-scalar attenuating NPIs; and (iv) High-scalar emphatic PPIs. And since both q-value and i-value are assumed to operate independently of any particular semantic domain, it follows that any domain which includes polarity items of any one sort is liable to include NPIs and PPIs of the other three sorts as well: domains in which NPIs occur should host PPIs too; and domains with emphatic operators should also typically include attenuators.

Of course, the Scalar Model is founded on the observation that all four types of polarity item are abundant in the lexicons of English and many other languages. But the model is confirmed only to the degree that all sorts of polarity items can be assimilated to just these four semantic types. Given the lack of any consensus as to what the complete class of observable polarity items is in any language, this may be a tricky proposition to assess. Most practical and theoretical work on sensitivity has focused on the syntagmatic distributions of polarity items in sentence grammar, and has been much less concerned with their paradigmatic distributions in the lexicon. But there is no reason to assume that the latter should be any less orderly than the former, and the Scalar Model makes very specific predictions as to just what sorts of order to expect.

The evidence from English strongly supports these predictions. My partial catalogue of English polarity items (see Appendix) includes constructions from some twenty-six scalar semantic domains, of which at least nineteen include multiple instances of each of the four construction types predicted by the Scalar Model. NPIs and PPIs regularly arise in precisely the same scalar semantic domains, some very abstract (e.g. quantity, degree, frequency, potential), others more lexically contentful (e.g. similarity, significance, effort, affection). The items in these domains are themselves quite heterogeneous: they are not all sensitive to the same degree or in the same ways in all dialects and registers of English. But their overall distribution in the lexicon does form a pattern, and it is precisely the pattern the Scalar Model predicts.

This chapter focuses on three of the less obviously scalar classes of polarity items – modals, connectives, and aspectual operators – to make the case that indeed, all three classes are grounded in inherently scalar semantic domains, and, as predicted by the Informativity Hypothesis, sensitive constructions in these domains conventionally encode emphatic or attenuating scalar pragmatic meanings.

6.2 Modal polarity items

The semantics of modality is notoriously complex, and the complexity begins with the problem of just what modality is supposed to be. In general a modal operator is any construction which profiles the status of an expressed proposition with respect to a speaker’s conception of reality. The most basic sorts of modal operators express notions of ‘possibility’ or ‘necessity’ in relation to everyday reasoning (epistemic modality), social obligation (deontic modality), or mental and physical ability (dynamic modality). These modalities are of special interest because they are cross-linguistically prone to grammaticalization (Bybee, Perkins & Pagliuca 1994; van der Auwera & Plungian 1998), and because within languages they often exhibit close structural and historical relations (Traugott 1989; Palmer 1990; Sweetser 1990). Syntactically, modal constructions occur in various guises with either a verb phrase or clausal complement: there are modal adjectives (e.g. possible, impossible, easy, hard), modal verbs (e.g. can, may, must, need, want), modal nouns (e.g. chance, hope, or prayer in have a ___ of V-ing), and modal adverbs (e.g. maybe, certainly). Such constructions are often constrained in their combinations with other logical operators, and particularly with negation, both in English (Palmer 1990; Cormack & Smith 2002) and cross-linguistically (Palmer 1995; de Haan 1997; van der Auwera 2001).

A modally unmarked proposition presents a situation as actual with respect to some conceived reality – as occurring in a mental space with the status of fact in the past or present (Cutrer 1994; Fauconnier 1997) – while modally marked propositions (i.e. things which are possible or necessary or preferable or imaginable) present a situation as a kind of potential with respect to a real or imagined world (i.e. in a mental space not construed as fact). I use the term potential here generally to include such relations as the ease with which a situation might be realized, the evaluation of a situation as desirable or distasteful, and the status of a situation as logically, morally, or physically possible. I assume here, as is common in cognitive linguistics, that while these different sorts of potentials are logically and ontologically very different sorts of creatures, they are all commonly conceptualized in terms of basic embodied, force dynamic image schemas (Talmy 1985; Sweetser 1990; Langacker 1991 – cf. Portner 2009). Thus, the relation between the subject NP of a modal verb and the process profiled in its complement VP is understood as a force – a compulsion, attraction, enablement, or blockage – which can push a proposition into or out of construal as fact. Different modal constructions contrast paradigmatically both in the kinds of potentials they encode (i.e. dynamic, deontic, epistemic, bouletic, etc.) and in the strength of their profiled potencies: strong modals like need and must have high q-values, while weak modals like can and may have low q-values.

Modality provides a fertile field for the growth of polarity items because modality itself is a scalar phenomenon. Modal operators have traditionally been understood as the propositional analogues of nominal quantifiers (Jespersen 1917, 1924; von Wright 1952; Horn 1972, 1989: 259ff.; Kratzer 1991; von Fintel 2006). Thus, in possible world semantics, necessity operators are basically universal quantifiers over sets of worlds, while possibility operators are existential quantifiers.1 The analogy between quantifiers and modals need not be exact – the important point is that like quantificational operators, modal operators support scalar inferences. Just as all entails some, so necessity entails possibility, and obligation entails permission. The scalar structure of epistemic (1) and root (2) modality is illustrated below.

(1) It’s after midnight, so the game ___ be over by now.
    a. must
    b. should
    c. may

(2) You ____ eat your Brussels sprouts before you have dessert.
    a. must
    b. should
    c. may

The strong (a) modals express necessity and encode the sort of extreme q-value found in a quantifier like all or every; the mid-scalar (b) modals express likelihood or obligation and feature high q-values analogous to that of most; and the weak (c) modals express possibility or permission and have low q-values like those of some or any. Probably the best-known polarity sensitive modals are expressions of necessity like the semi-auxiliary use of need as in you needn’t worry. In its most basic use, the lexical verb need denotes a social or physical necessity: for something to need to happen, there must be some positive force, either a social obligation or a physical requirement, favoring its occurrence. In this sense, need encodes a high q-value and contrasts with weaker modals like can or could: normally, if one needs to do something, it follows that one could do it as well. But since, as an NPI, auxiliary need occurs only in scale-reversing contexts, its high q-value contributes to the expression of weak, attenuating propositions. The attenuating force of need makes it particularly suitable for indirect requests as in (3), where the denial that something is necessary (you needn’t) gently implicates that it is also undesirable (please don’t).

(3) a. You needn’t be so coy.
    b. You needn’t leave yet.

These sentences are conspicuously uninformative if one wants to know what it is one should or should not do, and so they sound more like muted requests than simple statements of fact. They exhibit negative politeness by giving options instead of dictating a desired course of action (Lakoff 1973).
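The quantificational analogy behind this weakness can be checked mechanically in a toy possible-worlds setting. The sketch below is only an illustration under stated assumptions: the three-member set `WORLDS`, the brute-force `entails` check, and the glosses of must, may, needn’t, and can’t as bare necessity and possibility operators are all hypothetical simplifications, not an analysis proposed in the text. It confirms that necessity entails possibility over a non-empty set of worlds, and that negation reverses the direction of entailment, so that denying a necessity is the weaker of the two negative claims.

```python
from itertools import product

# Hypothetical, non-empty set of accessible worlds (an assumption of this sketch).
WORLDS = ("w1", "w2", "w3")


def necessary(p):
    """Necessity as universal quantification over worlds."""
    return all(p[w] for w in WORLDS)


def possible(p):
    """Possibility as existential quantification over worlds."""
    return any(p[w] for w in WORLDS)


def must(p):      # 'necessary'
    return necessary(p)


def may(p):       # 'possible'
    return possible(p)


def need_not(p):  # 'not necessary'
    return not necessary(p)


def cannot(p):    # 'not possible'
    return not possible(p)


def entails(f, g):
    """True if g holds under every assignment of truth values where f holds."""
    for values in product((True, False), repeat=len(WORLDS)):
        p = dict(zip(WORLDS, values))
        if f(p) and not g(p):
            return False
    return True


print(entails(must, may))         # True: necessity entails possibility
print(entails(may, must))         # False
print(entails(cannot, need_not))  # True: under negation the scale reverses
print(entails(need_not, cannot))  # False: 'needn't' is the weaker claim
```

The non-emptiness of the world set matters here: with no accessible worlds, universal quantification would hold vacuously and the first entailment would fail.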

Similarly sensitive needful constructions are found in many languages. These include German brauchen+VP, Norwegian å behøve (Johannessen 2003), Dutch hoeven (Hoeksema 1994; van der Wal 1997), French être besoin de, and Mandarin yòng (Edmondson 1983; van der Wouden 1996a, b). Despite the evidence of a clear cross-linguistic pattern here, these forms remain marginal in most studies of polarity, and as Duffley (1994) notes, where they are discussed, they are mostly seen as an interesting anomaly – either an odd sort of a modal (Coates 1983; Palmer 1990) or an unusual kind of NPI (Jackson 1995; van der Wal 1997). But such constructions may be more ordinary than many have supposed. In fact, modal expressions of all sorts are prone to the four polarity sensitivities predicted by the Scalar Model. The well-documented ‘necessary’ NPIs are complemented by an even larger class of ‘possible’ NPIs, in which the notion of ‘possibility’ serves an emphatic function. And both the ‘possible’ and the ‘necessary’ NPIs are complemented by robust classes of attenuating ‘possible’ PPIs and emphatic ‘necessary’ PPIs.

English ‘necessary’ PPIs include auxiliary and catenative verb constructions with must, should, have got to (gotta), and (had) better, all of which, like some, obligatorily take wide scope over negation. While needn’t expresses the absence of a requirement to do something, shouldn’t and mustn’t profile a requirement to not do something: mustn’t means ‘necessary not’; needn’t can only mean ‘not necessary.’

(4) a. You {shouldn’t/mustn’t} be so coy.
    b. You {shouldn’t/mustn’t} leave yet.

These constructions are a rather weak breed of PPI, since though they cannot be interpreted in the scope of negation, both allow negative contraction (in many dialects, at least – mustn’t is uncommon in American English). The constraints on the periphrastic modals (have) got to and (had) better are stronger: these constructions cannot combine directly with negation at all. While both can take a negative VP-complement, as in (5b), neither allows negation in the higher auxiliary phrase, as in (5c).

(5) a. You {have got to / had better} finish the report by Tuesday.
    b. You {have got to / had better} not worry so much.
    c. *You {haven’t got to / hadn’t better} finish the report by tomorrow.

Many expressions of ‘necessity’ are similarly uncomfortable in the scope of negation – among others, the complex auxiliary constructions be bound to, be compelled to (at least when used epistemically), and the epistemic modal adverbs surely and certainly, which can emphasize a negative assertion (e.g. I certainly did not eat the last cookie!) but cannot occur to the right of or be interpreted inside the scope of negation (e.g. *I did not certainly eat the cake).

Nor is it just ‘necessary’ modals that are subject to such sensitivities. One also finds polarity items in abundance at the low end of the modality scale, where ‘possibility’ is the modal analogue to low q-value quantifiers and indefinites. Probably the best-known example of a ‘possible’ NPI is the epistemic use of the auxiliary can (Horn 1972), which, unlike epistemic could, occurs only in negative (6) and interrogative (7) clauses.

(6) a. They actually pay you to make up words? You can’t be serious.
    b. He says they pay him to make up words. He could/*can be serious.

(7) a. Can this really be happening to me?
    b. *This can really be happening to me.

Epistemic can is essentially an expression of shocked disbelief. It profiles a minimal degree on a scale of possibility and is restricted to use in emphatic propositions where every possibility is effectively excluded. Other ‘possible’ NPIs are even more obviously emphatic. For example, modal nouns and nominals like chance, prayer, ghost of a chance, and snowball’s chance in hell all express emphatically minimal likelihoods where they occur as values of X in the have an X of V-ing construction. Also worth mentioning here are the idiomatic uses of seem, manage, and begin with can and could as in I {can’t/*can} seem to get her attention; We {couldn’t/*could} manage to fix it; and You {can’t/*can} begin to imagine what it was like. Each of these complex NPIs expresses a minimal ability of some sort: as such they are all emphatic NPIs profiling a low q-value on a dynamic modality scale.2

Just as the attenuating NPI need finds an emphatic PPI counterpart in must, so the low-scalar epistemic can is complemented by epistemic may, which is an attenuating PPI. In (8), where may indicates permission, it allows either narrow or wide scope with negation, though the narrow scope reading is generally preferred; but where may indicates logical possibility, as in (9), it obligatorily scopes over negation.

(8) You may not leave the table.
    a. Narrow: You do not have permission to leave.
    b. Wide: You have permission to not leave (= you can stay).

(9) We may not get a chance to talk again.
    a. *Narrow: It’s not possible that we will talk again.
    b. Wide: It’s possible that we will not talk again.

The exclusion of epistemic may from questions, as in (10), strengthens the parallel between the PPI may and the NPI can.

(10) a. *May this really be happening to me?
     b. Her light is on. Can/Might/*May she be home already?

Epistemic may thus appears to be blocked in just those contexts – questions and negatives – where epistemic can is licensed. Together the two divide up the expression of logical possibility in much the same way that need and must divide the expression of necessity and obligation: can is an emphatic NPI, with low q-value and high i-value; may is an attenuating PPI, with low q-value and low i-value. Other modals patterning with may as low-scalar attenuating PPIs include might, might well, just might, could well, and could just as well, all of which allow negated VP complements (e.g. He could well not be there or She just might not know) but require any tautoclausal negative operator to occur to their right and to be interpreted in their scope. The modal adverbs maybe and perhaps are similarly constrained, as illustrated below.

(11) a. Maybe/Perhaps, it will/won’t rain tonight.
     b. *It won’t {maybe/perhaps} rain tonight.

The evidence from English suggests that modal constructions of all sorts are prone to polarity sensitivity, and that the sensitivities they display are a function of their basic modal meanings: low q-value ‘possible’ modals grammaticalize as emphatic NPIs and attenuating PPIs, high q-value ‘necessary’ modals as emphatic PPIs or attenuating NPIs (see Appendix, section 8). And if such constructions are common, then presumably they must be somehow useful. Indeed, the Scalar Model predicts that scalar constructions will be polarity sensitive only to the extent that they are conventional expressions of rhetorical affect, encoding either an emphatic or an attenuating informative value. Consider the ‘necessary’ polarity items. Because hearers in general prefer utterances which are highly informative, and because scalar extremes are such salient conceptual reference points, speakers in general are inclined to make emphatic claims. This leads to the grammaticalization of modal PPIs like gotta and better in just those contexts where their ‘necessary’ meanings are effectively emphatic. But emphatic claims are by their nature more likely to be false than more moderate claims, and so are more frequently subject to questioning or denial; and this favors the grammaticalization of ‘necessary’ operators in negative contexts where they are used to rebut their positive counterparts.

The modal adverb necessarily, for example, appears in the BNC about three times as often in negative contexts, as in (12), as it does in simple affirmatives, as in (13).

(12) a. I tried to make her understand that kids weren’t necessarily the key to happiness.
     b. To be a layman, even to be anticlerical, is not necessarily to be irreligious.
     c. “Life isn’t necessarily fair, Miss Levington,” he rapped.

(13) a. This could obviously cause problems when information is recorded electronically, since any print out will necessarily be a copy of the original.
     b. And we can go on keeping it ad lib. whereas Byrd concerts will necessarily be few and far between.
     c. It is necessarily selective and undoubtedly subjective in choice of material, and the author apologizes where appropriate.

But while necessarily can appear without negation, where it is negated, and particularly when it is used epistemically or metalinguistically to express disagreement, the negation is often indispensable. In positive uses like those in (13) necessarily presents a proposition objectively, as a consequence of the way the world is: thus the necessity in (13a) is inherent in the nature of electronic records, while the paucity of Byrd concerts in (13b) is due to the inherent difficulty of arranging them. But in (12) the relevant notion of ‘necessity’ concerns the degree to which an inference is justified: thus in (12c) the point is not so much to deny that life is always of necessity fair (which probably no one believes anyway), but to counsel against the not uncommon expectation that it will be. Similarly, in (14), where necessarily works as a conversational particle expressing “a non-committal response to a question or suggestion” (OED II), it must occur with negation.

(14) “I’ve just had the most boring night of my life with Bucky Leo and his one amazing brain cell.”… “A no-go, then?” “Not necessarily… He has one mighty fine body.” (S. Stewart, Sharking, xiv. 234)

Here, as in (12), the negated necessarily provides a way of delicately demurring from a contextually salient conclusion without having to fully disagree with one’s interlocutor. It is thus classically attenuating. It leaves options open. The same general tendency which drives (not) necessarily to its frequent use in gentle denials is evident also in the frequent use of auxiliary need with predicates denoting speech acts and anxious emotions. Thus, of the forty-five instances of need hardly in the BNC, thirty-eight (84 percent) feature auxiliary need complemented by a speech act verb and a first person actor, as in (15).

(15) a. I need hardly say that my wife’s first impression of Lewis differed somewhat from my own.
     b. “I need hardly tell you,” he continued in his dry voice, “what a blow you dealt to she who cared so much for your welfare.”
     c. It need hardly be pointed out that the provision of additional health/leisure facilities would also justify an increase in room rates.

The effect in these examples is distinctly mitigating. In each case, what is said is just that an informative act is unnecessary, presumably because the information it would provide is too obvious to mention. The use of need hardly is thus attenuating in the trivial sense that the proposition it expresses is conspicuously uninformative and literally non-informing. But the construction here is also understating in the stronger sense that it allows the speaker to mention a proposition without actually taking responsibility for saying it. Thus, the most common use of auxiliary need in this sample is as a device for coyly expressing something which is explicitly left unsaid. The effects are similar where need is used with predicates of worry and distress, as in (16), to discourage an addressee from some presupposed potential discomfort.

(16) a. And you need not worry about whether I was safe or not.
     b. You need not be ashamed of any degree from Glasgow.
     c. “You need not trouble yourself, Doctor Sparrow, you need have no anxiety about the question of a wheelchair.”

This use only occurs where an addressee is presumed to be susceptible to some negative emotion, and it serves as a way of both indirectly acknowledging this feeling and gently dismissing it. Pragmatically, what auxiliary need and NPI necessarily share in these examples is a tendency to qualify a highly topical proposition – either one recently posed in a discourse, or one likely to be entertained anyway by the hearer. These are thus dialogical and inherently argumentative constructions – they are used to respond to old propositions in a discourse rather than to construct new ones. But they are not exact synonyms either. Auxiliary need seems especially well suited to and common in the formulation of indirect requests (3), oblique assertions (15), and gentle reassurances (16). NPI necessarily serves primarily as a way of objecting to a conclusion that one’s addressee may have jumped to. A better synonym than need for necessarily is perhaps the use of just because which Bender and Kathol (2001) call the “JB-X DM-Y construction.” The examples in (17–18) suggest that this construction is an NPI and that it is licensed in rhetorical questions, conditional antecedents, and indirectly negated clauses.

(17) a. Just because we live in Berkeley doesn’t mean we’re left-wing radicals.
     b. *Just because we live in Berkeley means we’re left-wing radicals.

(18) a. “Just because a guy has bleached hair, winter tan, speaks slowly and is pleasant to the point of being vacuous … does that mean he’s a surfer.”
     b. If just because we live in Berkeley means we’re left-wing radicals, you have some serious misconceptions about our city.
     c. Don’t assume that just because we live in Berkeley means we’re left-wing radicals.

This construction features a preposed just because clause expressing the grounds (X) for a possible conclusion (Y), and a main clause proposition denying the inference from X to Y. Typically, this denial is lexicalized by the words doesn’t mean, but other predicates are possible if they effectively convey that X is not a reason to conclude Y. In effect, like NPI necessarily, the JB-X DM-Y construction presupposes a context in which some topical proposition X is considered a strong argument for a conclusion, and it serves to deny that the conclusion is warranted. As with necessarily, there is some question as to whether this construction really is polarity sensitive. Based on examples like those in (19), Bender and Kathol conclude that it is not an NPI but rather is licensed “by any environment that distances the speaker from the belief that X in fact implies Y.”

(19) a. Kim seems to believe that just because we live in Berkeley means we’re left-wing radicals.
     b. So what you’re saying is just because we live in Berkeley means we’re left-wing radicals.

I think this is right, but what it shows is just that, like many other NPIs, this construction can be licensed by an appropriate negative implicature (Linebarger 1980; Horn 2001; Israel 2004, 2006). In (19) these implicatures are triggered by the representational predicates seems to believe and what you’re saying, which suggest a contrast between the speaker’s beliefs and those of some other person. In fact, many polarity items can be licensed by just this sort of quasi-ironic distancing from a proposition. The expression big deal, for example, occurs in negative understatements only about 70 percent of the time in the BNC, and as such seems not to be a true NPI. But where it occurs without negation, it is either in ironic exclamations (big deal!) or in contexts which place some epistemic distance between a speaker and an expressed proposition, as in the examples below where the big deal is embedded under verbs like think, feel, and seem.

(20) a. That’s because parents in Sylhet seem to think it is a big deal, a real status symbol to get a Biliti Bor (a bridegroom from England).
     b. The extent to which they are felt to be a big deal for the pupils will mirror the extent to which they are felt to be a big deal by their teachers.
     c. “It always seemed like a pretty big deal to me,” he said.

While the constraints on the big deal and JB-X DM-Y constructions may be looser than those on some NPIs, they do not seem to be different in kind. In fact the two constructions seem to be fairly representative of two larger classes of polarity items: thus big deal is, like the end of the world and something to write home about, an attenuating NPI which operates on a scale of significance and contrasts with emphatic NPIs like matter and make a difference (see Appendix, section 9). Similarly, the JB-X DM-Y construction is not so different from modal polarity items like the emphatic NPI epistemic can or attenuating ‘necessary’ NPIs like English need and Dutch hoeven. Of course, these sorts of connections across polarity items may be difficult to discern if one lacks a proper theory of what it means to be a polarity item.

The Scalar Model predicts that modal operators in general should grammaticalize as polarity items precisely where their use is most pragmatically loaded, and the behavior of the ‘necessary’ NPIs reviewed here suggests that one of the more useful functions a polarity item can perform is the attenuated expression of a contrary opinion. But while the ‘necessary’ NPIs are all similarly argumentative in their meanings, they are not identical in the pragmatic problems they solve. ‘Necessity’ itself is a very abstract sort of concept, and as such it plays a role in the conceptualization of more concrete semantic domains and may be useful in a wide variety of communicative contexts. Indeed, modal operators appear to be magnets for polarity sensitivity precisely because modality is such a quintessentially abstract scalar semantic domain. Thus, beyond the traditional realms of epistemic, deontic, and dynamic modality, many more contentful verbal polarity items incorporate notions of ‘possibility’ or ‘necessity’ in the broad sense that they denote relations which can affect the likelihood of a situation obtaining. Many of these are psychological in nature, for example states like desire, aversion, tolerance, and antipathy, which can be thought of as forces which compel a subject either to seek out or to avoid a situation of some sort. The Scalar Model is thus further confirmed to the degree that these more contentful sorts of modal domains also include polarity items of all four predicted types. A large class of English verbal polarity items profile different states of desire, like the care to V and the dream of V-ing constructions illustrated below (all unasterisked examples are from the BNC).

(21) a. It’s not a decision I would care to have to make.
     b. No one here, I trust, would care to disagree.
     c. Would you care to spend Christmas with me?
     d. *It’s a decision I would care to make.

(22) a. I would never dream of marrying for anything less than love.
     b. If you think I’d dream of sharing so much as a blanket with you after that you’re crazy!
     c. Who, he asked, would dream of privatising the Royal Navy?
     d. *I would dream of sharing a blanket with you.

I take it that the care to V construction profiles a moderate-to-strong desire and is thus attenuating, while the would dream of V-ing construction denotes a minimal inclination, and so is emphatic. The pragmatic differences between the two are particularly clear in questions, where the use of would dream of, as in (22c), strongly anticipates a negative response, while the milder care to, as in (21c), makes an indirect invitation with a real hope of a positive response. Indeed, in formulae like would you care…, if you’d care…, you might care…, and perhaps you’d care…, all of which are abundant in the BNC, the care to V construction has arguably grammaticalized as a kind of illocutionary force-indicating device for invitations and polite directives. Like the would dream of construction, the semi-auxiliary use of dare in (23) is a low-scalar emphatic NPI which profiles a subject’s minimal inclination to act in some way.

(23) a. Of what followed I cannot tell in detail – I dare not put it into words.
     b. I was wearing dresses that showed more than she ever would dare, before she was born.
     c. But who would dare approach the aloof Lady Eleanor?
     d. *I will dare put it into words.

Both of these constructions mark a low degree of willingness to act, and so, in the scale-reversing contexts to which they are limited, they form highly informative propositions emphasizing the unlikelihood of an event. The analysis of dare as encoding a low q-value may seem counterintuitive. The problem is that ‘daring’ itself is a scalar notion, and the verb dare seems to indicate a high degree on a scale of daring or audacity. But audacity itself is a relatively weak indicator of future action: one’s having the audacity to do something does not entail that one will do it, or even that one would want to; and if one lacks the audacity to do something, then in that respect one must prefer not to do it. Parallel to NPIs like dare and would dream of, many PPIs also operate on a scale of inclinations – among others, the attenuating would rather construction illustrated in (24) and the emphatic would love to construction in (25). Examples in (24–25) are from the BNC.

(24)
a. I would rather have been down at the villa making figgy hedgehogs for Tony, but a promise is a promise.
b. I would rather do anything than have to fly.

(25)
a. I would love to hear from you again, if you can spare the time.
b. I would love to have a chauffeured Cadillac, but I can’t afford it.

One striking piece of evidence for the status of these constructions as PPIs comes, ironically enough, from the ways they combine with negation. PPIs are usually acceptable only in the scope of negation where the negation itself is construed in the scope of another polarity trigger (Baker 1970; Chierchia 2004; Szabolcsi 2004; §3.6.3, above). Assuming that simple questions and denials are more frequent than negated questions and denials, if a construction commonly occurs with negation only when it is also in the scope of a question, a conditional, or a negative, then that construction is very likely a PPI. There are just two instances of the string would not rather in the entire BNC, both in (26), compared to 548 for would rather, and no instances of would not (or wouldn’t) love to. Where instances of the latter can be found on the web, they are almost always in rhetorical questions, as in (27).

(26)
a. Who would not rather own to theft and deception within the Church’s writ, rather than put his neck into the sheriff’s noose for murder?
b. I wonder whether Christ would not rather go to Calvary again than to suffer the unfaithfulness of some of his friends.

(27)
a. What man on all of Terra would not love to have private nude dancers running around a room of eight foot tube lights suspended from the ceiling.
b. What guy wouldn’t love to claim bragging rights as Wendy Shalit’s first lover?

While desiderative polarity items profile an experiencer’s positive desire for or approval of a potential situation, another large class of polarity items operates on precisely the opposite sort of scale – a scale of aversions, where affective attitudes range from a minimal ability to keep away from something to absolute loathing for it. At the low end of the scale, we find predicates which denote an ability to resist or abstain from engaging in some act, or else which indicate simple indifference (e.g. the attenuating PPI can take it or leave it). Falkenberg’s class of “abstentive” NPIs (2001: 81) fits in here as emphatic expressions of minimal aversion. Her examples from German include anstehen ‘hesitate,’ erwarten können ‘can wait,’ sich enthalten können ‘to refrain from,’ sich entblöden ‘to be ashamed to,’ sich entmutigen lassen ‘grow discouraged,’ ermüden ‘tire of,’ verfehlen ‘fail,’ and sich zurückhalten können ‘can hold back from.’ The examples in (28) illustrate a few similar NPIs in English.

(28)
a. She {couldn’t/*could} help laughing.
b. He {couldn’t/*could} resist asking her about her date.
c. I {couldn’t/*could} wait to see him.

The BNC examples in (29–30) illustrate two constructions which denote relations at the high end of the aversion scale: the attenuating NPI mind and the emphatic PPI hate to.

(29)
a. Oh well I wouldn’t mind being entertained by her!
b. Do you mind if I join you?

(30)
a. I hate to sound sappy, but anything can make me cry…
b. I hate to burden you with this.

Attenuating aversive NPIs like mind reflect a coy strategy common in some cultures (Hübler 1984) for expressing desires indirectly. Other English constructions with similar uses include be averse to, have qualms about, and a broad family of expressions which literally denote disdain but are typically used to express approval – among others, would sniff at, would turn up one’s nose at, and would kick X out of bed, which, as Horn (p.c.) notes, typically functions as an oblique way of expressing sexual attraction by denying an inclination toward rejection. The difference between aversive and desiderative polarity items is particularly clear in the contrast between the mind V-ing and care to V constructions. As attenuators, both constructions tend to be used in politely uninformative speech acts, commonly occurring in more or less formulaic questions where they effectively make it easier for an addressee to answer negatively. In general, one is less likely to have either a strong desire or a strong aversion than to have a weaker one, and so to ask of someone do you mind? (roughly, ‘do you strongly object?’) or would you care? (roughly, ‘do you strongly desire?’) is, at least formulaically, a way of giving an addressee more options. But since the two forms here are associated with the opposite sorts of scales, the pragmatic functions which they serve are similarly opposed. Thus, as noted above, would you care to V (or would you care for X) typically expresses an offer or an invitation – because it politely allows one to decline gracefully – while an expression like would you mind V-ing is typically used for requests – because saying “no” in this case implicates a willingness to do what is requested.

The expression of modality also plays an important role in the semantics of many more complex verbal NPIs. Epistemic can is perhaps the purest example of what Horn (1972: 187ff.) calls an “impossible polarity item” – a form which can be used only to express what cannot be. Impossible polarity items themselves are a subclass of “possible polarity items” (ibid.) – forms like cope and afford, which must occur in the scope of a ‘possibility’ operator of some sort (e.g. can, be hard to, be enough to, help, etc.). ‘Impossible’ polarity items are just ‘possible’ polarity items for which the notion of ‘possibility’ must itself be interpreted inside the scope of negation or some other polarity trigger. The complex sensitivities of this last group are nicely illustrated in Horn’s multilayered example below (1972: 190).

(31)
I {can’t/?can/*didn’t} {abide/bear/stand/take/stomach} {linguistics/writing dissertations}.

Not all of these forms are strict NPIs. Hoeksema (1994) finds instances of stand, take, and bear in affirmative contexts, though only rarely in his corpus (
