JOURNAL OF SEMANTICS: AN INTERNATIONAL JOURNAL FOR THE INTERDISCIPLINARY STUDY OF THE SEMANTICS OF NATURAL LANGUAGE
MANAGING EDITOR: PETER BOSCH (University of Osnabrück)
ASSOCIATE EDITORS: NICHOLAS ASHER (University of Texas, Austin), ROB VAN DER SANDT (University of Nijmegen)
EDITORIAL BOARD: MANFRED BIERWISCH (MPG and Humboldt University Berlin), BRANIMIR BOGURAEV (IBM T.J. Watson Research Center), KEITH BROWN (University of Essex), GENNARO CHIERCHIA (University of Milan), ANN COPESTAKE (University of Cambridge), ÖSTEN DAHL (University of Stockholm), KEES VAN DEEMTER (University of Brighton), PAUL DEKKER (University of Amsterdam), KURT EBERLE (linguatec-es, Heidelberg), REGINE ECKARDT (University of Konstanz), CLAIRE GARDENT (CNRS, Nancy), BART GEURTS (University of Nijmegen), LAURENCE R. HORN (Yale University), JOACHIM JACOBS (University of Wuppertal), PHILIP N. JOHNSON-LAIRD (Princeton University), HANS KAMP (University of Stuttgart), GRAHAM KATZ (University of Tübingen), SEBASTIAN LÖBNER (University of Düsseldorf), SIR JOHN LYONS (Verneuil-en-Bourbonnais), MARC MOENS (University of Edinburgh), FRANCIS J. PELLETIER (University of Alberta), MANFRED PINKAL (University of Saarbrücken), ARNIM VON STECHOW (University of Tübingen), MARK STEEDMAN (University of Edinburgh), ANATOLI STRIGIN (ZAS, Berlin), HENRIETTE DE SWART (University of Utrecht), BONNIE WEBBER (University of Edinburgh), HENK ZEEVAT (University of Amsterdam), THOMAS E. ZIMMERMANN (University of Frankfurt)
EDITORIAL ADDRESS: Journal of Semantics, c/o Dr P. Bosch, Lerchenstr. 76, 70176 Stuttgart, Germany. Phone: (49-711-) 2262616. Telefax: (49-711-) 2262614. Email: [email protected]
© Oxford University Press 2000
All rights reserved; no part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise without either the prior written permission of the Publishers, or a licence permitting restricted copying issued in the UK by the Copyright Licensing Agency Ltd, 90 Tottenham Court Road, London W1P 9HE, or in the USA by the Copyright Clearance Center, 222 Rosewood Drive, Danvers, Massachusetts 01923, USA
Journal of Semantics (ISSN 0167-5133) is published quarterly in February, May, August and November by Oxford University Press, Oxford, UK. Annual subscription is US$173 per year. Journal of Semantics is distributed by MAIL America, 2323 Randolph Avenue, Avenel, New Jersey 07001, USA. Periodical postage paid at Rahway, New Jersey, USA and at additional entry points.
US POSTMASTER: send address corrections to Journal of Semantics, c/o MAIL America, 2323 Randolph Avenue, Avenel, New Jersey 07001, USA
For subscription information please see inside back cover.
JOURNAL OF SEMANTICS Volume 17 Number 3
Special Issue on Optimization of Interpretation (Part I) Guest Editors: Petra Hendriks, Henriette de Swart and Helen de Hoop
CONTENTS
PETRA HENDRIKS, HENRIETTE DE SWART AND HELEN DE HOOP
Introduction 185
REINHARD BLUTNER
Some Aspects of Optimality in Natural Language Interpretation 189
PAUL DEKKER AND ROBERT VAN ROOY
Bi-Directional Optimality Theory: An Application of Game Theory 217
HENK ZEEVAT
The Asymmetry of Optimality Theoretic Syntax and Semantics 243
(Part II to follow in vol. 17.4)
Please visit the journal's world wide web site at http://jos.oupjournals.org and the editorial web site at http://journal-of-semantics.org
Subscriptions: The Journal of Semantics is published quarterly.
Institutional: UK and Europe £99; USA and Rest of World US$173. (Single issues: UK and Europe £31; USA and Rest of World US$54.)
Personal:* UK and Europe £42.50; USA and Rest of World US$79. (Single issue: UK and Europe £13; USA and Rest of World US$25.)
*Personal rates apply only when copies are sent to a private address and payment is made by personal cheque/credit card. Prices include postage by surface mail or, for subscribers in the USA and Canada, by Airfreight or, in Japan, Australia, New Zealand and India, by Air Speeded Post. Airmail rates are available on request.
Back Issues. The current plus two back volumes are available from the Oxford University Press, Great Clarendon Street, Oxford OX2 6DP. Previous volumes can be obtained from Dawsons Back Issues, Cannon House, Park Farm Road, Folkestone, Kent CT19 5EE, tel +44 (0)1303 850101, fax +44 (0)1303 850440. Volumes 1-6 are available from Swets and Zeitlinger, PO Box 830, 2160 SZ Lisse, The Netherlands.
Payment is required with all orders and subscriptions are accepted and entered by the volume. Payment may be made by cheque or Eurocheque (made payable to Oxford University Press), National Girobank (account 500 1056), credit cards (Access, Visa, American Express, Diners Club), or UNESCO coupons. Please send orders and requests for sample copies to the Journals Subscriptions Department, Oxford University Press, Great Clarendon Street, Oxford OX2 6DP, UK, tel +44 (0)1865 267907, fax +44 (0)1865 267485, [email protected].
Scope of this Journal
The Journal of Semantics publishes articles, notes, discussions, and book reviews in the area of academic research into the semantics of natural language. It is explicitly interdisciplinary, in that it aims at an integration of philosophical, psychological, and linguistic semantics as well as semantic work done in logic, artificial intelligence, and anthropology. Contributions must be of good quality (to be judged by at least two referees) and must report original research relating to questions of comprehension and interpretation of sentences, texts, or discourse in natural language. The editors welcome not only papers that cross traditional discipline boundaries, but also more specialized contributions, provided they are accessible to and interesting for a general readership in the field of natural language semantics. Empirical relevance, sound theoretic foundation, and formal as well as methodological correctness by currently accepted academic standards are the central criteria of acceptance for publication. It is also required of contributions published in the Journal that they link up with currently relevant discussions in the field of natural language semantics.
Information for Authors: Papers for publication should be submitted to the Managing Editor ([email protected]) as a PDF file or PS file attachment. Only if this is not feasible, please send three paper copies by post to the editorial address and, if possible, enclose a DOS-formatted 3.5 inch disk with a PDF or PS file, or text processing source file. Papers are accepted for review only on the condition that they have neither as a whole, nor in part, been published elsewhere, are elsewhere under review, or have been accepted for publication. In case of any doubt authors must notify the editor of the relevant circumstances at the time of submission. The style requirements of the Journal of Semantics are found in the style sheet http://journal-of-semantics.org/style.html and are binding for the final version to be prepared by the author when the paper is accepted for publication. For initial submission it suffices if the following minimal requirements are met. The page size should be A4 (or similar format). The paper must be headed by its title and must carry the name and affiliation of the author along with the author's correspondence address (post and email) at the end of the text. All submissions must be accompanied by an approx. 200 word abstract. Detailed bibliographical references must appear at the end of the paper in alphabetical order of authors' names, abbreviated in the text by author's surname and year of publication. Diagrams must be submitted in electronic files or camera-ready on paper.
Copyright: It is a condition of publication in the Journal that authors assign copyright to Oxford University Press. This ensures that requests from third parties to reproduce articles are handled efficiently and consistently and will also allow the article to be as widely disseminated as possible. In assigning copyright, authors may use their own material in other publications provided that the Journal is acknowledged as the original place of publication, and Oxford University Press is notified in writing and in advance.
Advertising: Advertisements are welcome and rates will be quoted on request. Enquiries should be addressed to Helen Pearson, Oxford Journals Advertising, PO Box 347, Abingdon SO, OX14 5XX, UK. Tel/fax: +44 (0)1235 201904, [email protected].
Journal of Semantics 17: 185-187
© Oxford University Press 2000
Guest Editors' Introduction PETRA HENDRIKS, HELEN DE HOOP, AND HENRIETTE DE SWART
Downloaded from jos.oxfordjournals.org by guest on January 1, 2011
Optimality Theory (OT) was developed in the 1990s by Alan Prince and Paul Smolensky as a general theory of language and grammar. Crucial for OT is Smolensky's idea of identifying a connectionist notion of well-formedness (Harmony) with linguistic well-formedness. In OT a grammar consists of a set of well-formedness constraints. These constraints apply to representations of linguistic structures simultaneously. Moreover, they are soft, which means violable and potentially conflicting. At least an important subset of these constraints is assumed to be shared by all languages. Individual languages rank these universal constraints differently in such a way that higher-ranked constraints have total dominance over lower-ranked constraints. Possible output candidates for each underlying form are evaluated by means of these constraint rankings. The output that best satisfies the constraints is the optimal candidate and will be realized. Although OT was applied to semantic and pragmatic analysis only recently, the last two years have shown a remarkable growth in the use of soft, conflicting constraints to characterize natural language interpretation. In the OT semantic theory developed by Hendriks & De Hoop (1997, to appear), each grammatical expression is associated with an, in principle, infinite number of interpretations. These candidate interpretations are tested against the ranked constraints in a parallel fashion. One of the advantages of such an approach is that constraints of various kinds (syntactic, pragmatic, etc.) interact with each other in a truly cross-modular way. This view crucially differs from the classical compositional approach, where one interpretation is computed on the basis of the syntactic input, making use of context only when necessary.
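The evaluation step sketched above, in which higher-ranked constraints have total dominance over lower-ranked ones, can be illustrated with a few lines of Python. This is only a minimal sketch under stated assumptions: the constraint names (`no_voiced_coda`, `ident_voice`), the toy input "bed" and its two candidates are illustrative choices in the style of textbook final-devoicing examples, not material from the papers in this issue. Violation profiles are compared lexicographically, which is one straightforward way to encode strict domination.

```python
from typing import Callable, List, Sequence, Tuple

# A constraint maps an (input, candidate) pair to a number of violations.
Constraint = Callable[[str, str], int]


def no_voiced_coda(inp: str, cand: str) -> int:
    """Hypothetical markedness constraint: penalize a final voiced obstruent."""
    return 1 if cand and cand[-1] in "bdgvz" else 0


def ident_voice(inp: str, cand: str) -> int:
    """Hypothetical faithfulness constraint: one violation per changed segment."""
    return sum(1 for a, b in zip(inp, cand) if a != b)


def profile(inp: str, cand: str, ranking: Sequence[Constraint]) -> Tuple[int, ...]:
    """Violations listed from the highest-ranked to the lowest-ranked constraint."""
    return tuple(constraint(inp, cand) for constraint in ranking)


def optimal(inp: str, candidates: Sequence[str],
            ranking: Sequence[Constraint]) -> List[str]:
    """Return the candidates whose profile is lexicographically minimal.

    Tuples compare position by position, so a single violation of a
    higher-ranked constraint outweighs any number of lower-ranked ones.
    """
    profiles = {cand: profile(inp, cand, ranking) for cand in candidates}
    best = min(profiles.values())
    return [cand for cand in candidates if profiles[cand] == best]


if __name__ == "__main__":
    candidates = ["bed", "bet"]
    # Markedness over faithfulness: the devoiced candidate wins.
    print(optimal("bed", candidates, [no_voiced_coda, ident_voice]))
    # Faithfulness over markedness: the faithful candidate wins.
    print(optimal("bed", candidates, [ident_voice, no_voiced_coda]))
```

Reversing the ranking selects a different winner from the same candidate set, which mirrors the idea that individual languages differ in how they rank shared constraints.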
One aspect that receives a lot of attention in this special issue is the adequate treatment of the roles of the speaker's perspective (generation) and the hearer's perspective (comprehension). Whereas OT syntax optimizes syntactic structure with respect to a semantic input (one might say that OT syntax takes the perspective of the speaker, who has a certain thought and wants to express this correctly and optimally through a syntactic structure), OT semantics, on the other hand, takes the point of view of a hearer, who hears (or reads) a certain utterance and wants to interpret it correctly and optimally.
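The difference between the two perspectives can be made concrete with a small Python sketch. It uses the pork/pig contrast that Blutner discusses in his contribution to this issue, but everything else is an assumption made for illustration: the cost values, the meaning labels, and the simple "optimal in both directions" criterion are hypothetical and are not claimed to be the formulation adopted in any of the papers.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (form, meaning)

# Hypothetical costs of form-meaning pairs in the frame "I ate ___".
# Lower is better; the numbers are invented for illustration only.
COST: Dict[Pair, int] = {
    ("pork", "pig-meat"): 0,   # specialized mass term
    ("pig", "pig-meat"): 1,    # requires grinding the count noun
    ("pig", "whole-pig"): 2,   # contextually implausible reading
}


def best_meanings(form: str) -> Set[str]:
    """Hearer's perspective: the cheapest meanings available for a form."""
    options = {m: c for (f, m), c in COST.items() if f == form}
    if not options:
        return set()
    lowest = min(options.values())
    return {m for m, c in options.items() if c == lowest}


def best_forms(meaning: str) -> Set[str]:
    """Speaker's perspective: the cheapest forms available for a meaning."""
    options = {f: c for (f, m), c in COST.items() if m == meaning}
    if not options:
        return set()
    lowest = min(options.values())
    return {f for f, c in options.items() if c == lowest}


def bidirectionally_optimal() -> List[Pair]:
    """Pairs that are optimal from both perspectives at the same time."""
    return sorted(
        (form, meaning)
        for (form, meaning) in COST
        if meaning in best_meanings(form) and form in best_forms(meaning)
    )


if __name__ == "__main__":
    print(best_meanings("pig"))       # the hearer alone prefers the meat reading
    print(best_forms("pig-meat"))     # the speaker alone prefers "pork"
    print(bidirectionally_optimal())  # only the pork/pig-meat pair survives
```

Under these invented costs the hearer's perspective alone would accept "pig" with the meat reading, whereas the speaker's perspective shows that a cheaper form exists for that reading. Requiring optimality in both directions leaves "pig" without an optimal interpretation in this frame.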
Several papers in this special issue argue in favour of a bi-directional OT, where the speaker's and hearer's perspectives are taken simultaneously. Reinhard Blutner establishes a conceptual framework that realizes the integration of the two perspectives. A bi-directional approach explains interpretative preferences that are problematic from the speaker's point of view as well as blocking effects that cannot be explained from the hearer's perspective. Blutner argues that his bi-directional framework captures the essence of the Gricean maxims and the balance between informativeness and efficiency in natural language processing.
Henk Zeevat argues for a slightly different combination of syntax and semantics that avoids certain problems of Blutner's bi-directionality. In Zeevat's view, OT syntax is the basic framework, which also deals with interpretation. This program is extended with a bi-directional pragmatic component in the spirit of Blutner. The resulting asymmetry between OT syntax and OT semantics is consistent with the vast differences between what people can say and what they can understand.
Another case of syntax/semantics interaction is the Finnish partitive construction discussed by Arto Anttila and Vivienne Fong. This construction exhibits a case alternation that is partly semantically and partly syntactically driven. The crucial syntactic and semantic constraints conflict with each other, leading to various kinds of outcomes, including free variation and ambiguity, as well as preferences in expression and preferences in interpretation. An OT analysis of these facts is developed based on partially ordered grammars. Partial ordering is argued to be crucial in deriving ambiguity and blocking effects.
An important question in OT semantics is whether we can account for cross-linguistic variation in interpretation as a result of different rankings among the different types of constraints that relate form and meaning.
Alice ter Meulen accounts for differences in reflexivization strategies of Dutch and English by supplementing binding principles applied to Dutch reflexives with optimality considerations and a general principle of linguistic economy. Dutch SE-reflexives optimally encode coreference in contrast to English ordinary bound pronouns. The framework of OT naturally suggests itself for dealing with a wide range of problems in semantics and pragmatics, according to Bart Geurts. Geurts' paper can be viewed as a reply to the OT treatments of presupposition proposed by Blutner and Zeevat. Geurts compares the Informativeness Principle (which states that more informative readings are preferred to less informative ones) to his own Buoyancy Principle (which states that backgrounded material tends to float up to the main context) and concludes in favour of the BP.
The papers in this issue share the goal of elucidating the processes of natural language interpretation, but the theoretical perspectives differ from one another. In the paper by Paul Dekker and Robert van Rooy some parallels are pointed out between principles employed in OT interpretation, and notions from the field of Game Theory. OT interpretation is defined as what Dekker and Van Rooy call an 'interpretation game' and optimality itself is the solution concept for a game. More in particular, optimality is characterized in terms of the game-theoretical notion of a 'Nash Equilibrium'. We hope that the present collection of papers will bring the project of OT semantics to the attention of a broad linguistic community. The papers in this issue represent some of the major developments in OT semantics and they will hopefully form a basis for future research in this exciting new field.

Acknowledgements
Helen de Hoop gratefully acknowledges support by the Netherlands Organization for Scientific Research, NWO (grant 300-75-020).

HELEN DE HOOP
Rijksuniversiteit Utrecht
Trans 10
3512 JK Utrecht
The Netherlands
email: [email protected]

REFERENCES
Hendriks, Petra & Hoop, Helen de (1997), 'On the interpretation of semantic relations in the absence of syntactic structure', in P. Dekker, M. Stokhof, & Y. Venema (eds), The Proceedings of the 11th Amsterdam Colloquium, ILLC/Department of Philosophy, Amsterdam, 157-62.
Hendriks, Petra & Hoop, Helen de (to appear), 'Optimality theoretic semantics', Linguistics and Philosophy.
Journal of Semantics 17: 189-216
© Oxford University Press 2000
Some Aspects of Optimality in Natural Language Interpretation
REINHARD BLUTNER
Humboldt University Berlin

Abstract
In a series of papers, Petra Hendriks, Helen de Hoop, and Henriette de Swart have applied optimality theory (OT) to semantics. These authors argue that there is a fundamental difference between the form of OT as used in syntax on the one hand and its form as used in semantics on the other hand. Whereas in the first case OT takes the point of view of the speaker, in the second case the point of view of the hearer is taken. The aim of this paper is to argue that the proper treatment of OT in natural language interpretation has to take both perspectives at the same time. A conceptual framework is established that realizes the integration of both perspectives. It will be argued that this framework captures the essence of the Gricean maxims and gives a precise explication of Atlas & Levinson's (1981) idea of balancing between informativeness and efficiency in natural language processing. The ideas are then applied to resolve some puzzles in natural language interpretation.

1 INTRODUCTION
The popularity of Optimality Theory (OT) is notably different in the various fields of linguistics. In phonology it has become the dominant theoretical paradigm. The main reason that OT grew so rapidly in this field is that constraint ranking was silently present in the phonological literature for many years. After the idea was brought from the periphery to the foreground, its need in phonology was quite clear. In syntax, the predominant research tradition has typically given negative answers to the question whether a conflict between constraints is resolved by ranking one constraint over the other. Constraints were assumed to be hard and there is ample evidence that conflicts block the existence of any acceptable output (cf. the discussion in Pesetsky 1997). The recent interest in OT syntax is obvious in the investigation of some non-standard phenomena, especially concerning the interaction between syntax, pronunciation and reference (e.g. Pesetsky 1997). Other motivation came from language typology and from the view that the parser and the grammar are not very different objects. Furthermore, a closer look at the 'absolute' principles has made clear that their violability is actually quite widespread (Speas 1997).
In natural language interpretation the idea of optimization is quite obvious and there is much evidence in favour of competition and constraint ranking in this field. However, the field is rather divergent. Looking at the different conceptions of discourse coherence gives an impression of the heterogeneity of the field. What is essential is a kind of integrative framework that makes it possible to formulate the different conceptions in one scientific language and thus to make comparisons between different models transparent. In my opinion, OT is an opportunity for realizing such an integrative framework. However, in its present form OT is insufficient to do this job. So, what we have to do first is to adjust OT to the specific demands of natural language interpretation. Then we can come back to the task of integrating different aspects and different views of natural language interpretation.
In OT it is common to assume three formal components: the Generator, the Evaluator and a system of (ranked) Constraints. These components are characterized by three basic assumptions. First, a set of inputs A is assumed. For each input, Gen creates a candidate set of potential outputs B. The second assumption is that from the candidate set Eval selects the optimal output for that input. The third assumption is that there is a language-particular ranking of constraints from a universal set of constraints. Constraints are absolute and the ranking of the constraints is strict in the sense that outputs that have at least one violation of a higher-ranked constraint can never win over outputs that have arbitrarily many violations of lower-ranked constraints (cf. Prince & Smolensky 1993; Kager 1999).
Each of these three assumptions has to be adjusted or revised in order to satisfy the demands of natural language interpretation. With respect to Gen, I think, it is best to take a dynamic picture of natural language semantics and to describe it in terms of a context change semantics.
This adjustment is especially important in order to deal with the context dependency of natural language interpretation (e.g. Kamp & Reyle 1993). Next, consider Eval. The direction of optimization is usually taken to be unidirectional (from A to B, where the elements of A sometimes are called inputs and the elements of B outputs). One of my main arguments is that in the case of interpretation it is inevitable to have bidirection of optimization (from A to B and from B to A). The two directions are not independent of each other; instead, they should be interrelated in a particular way. Third, with regard to Con we have to acknowledge the role of graded constraints. Graded constraints also appear in other domains, for example in phonology (cf. Prince & Smolensky 1993; Boersma 1998). However, in natural language interpretation the role of graded constraints seems to be much more important than in other domains. Another point is that in natural language interpretation the relevant pragmatic constraints are always ranked universally within the set of pragmatic constraints. As a consequence, typological differences between languages are not triggered by a reranking of the constraints within the pragmatic domain. Instead, typological effects are triggered, among other things, by variations that concern the relative importance of pragmatic constraints with regard to other types of constraints. Choi (1996) supports this point in an indirect way by comparing scrambling phenomena in German and Korean.
The paper is structured as follows. In section 2 some arguments are put forward as to why bidirection of optimization is of central importance when we try to apply OT to natural language interpretation. Section 3 introduces my proposals for a proper treatment of optimality in natural language interpretation. The starting point is the context change potential of an (underspecified) expression which is described as a relation between input and output contexts. The effect of optimality is simply to constrain this relationship in a way which involves both optimization for interpretation and optimization for production. In section 4 the general framework is put in concrete terms by modelling contexts as DRSs. It is demonstrated that van der Sandt's/Geurts' projection mechanism for presuppositions can be reconstructed and extended as a consequence of the present form of OT.

2 TWO PERSPECTIVES OF OPTIMALITY
De Hoop & de Swart (1998), Hendriks & de Hoop (to appear), and de Hoop (2000) applied OT to sentence interpretation. These authors argue that there is a fundamental difference between the form of OT as used in syntax on the one hand and its form as used in semantics on the other. Whereas in the former case OT takes the point of view of the speaker (production perspective), in the latter case the point of view of the hearer is taken (comprehension perspective). [Footnote 1: By using the terms 'comprehension' and 'production' we do not refer to performance but rather to abstract functions in a mathematical sense that pair certain pairs of representations (cf. Smolensky 1996).] This idea is an important one and I think most of the existing analyses conform to it. Moreover, the picture can be extended to OT phonology and morphology. For example, in phonology Gen clearly takes the production perspective and creates a candidate set of potential outputs (= speech sounds as they occur in utterances) for a given input (= speech sounds as they occur in the mental lexicon). From the candidate set, Eval selects the best (optimal) output for that input. A similar picture can be found in OT morphology (e.g. Bresnan, to appear). Here the input
represents language-independent 'content' in the multidimensional space of possible grammatical and lexical contrasts and Gen enumerates a set of concrete realizations of the input that are available across languages (expressing the 'content' with varying fidelity).
However, the one-way tableau typically assumed in phonology may be insufficient. One reason for this insufficiency has to do with the nature of the input under OT. In contrast to standard generative phonology, where numerous constraints were imposed on the input, in OT constraints on the input are typically lacking. In principle, the set of inputs to the grammars of all languages is assumed to be the same (richness of the base). As a consequence, in many cases it is easy to construct multiple inputs that converge on a single output. Which of the multiple inputs should be selected? This question is important when we assume that the relevant inputs must be stored somewhere in the mental lexicon. The economy of the lexicon requires that corresponding inputs must be selected carefully. Prince & Smolensky (1993: section 9) introduced an algorithm called lexicon optimization (further developed by Ito, Mester, & Padgett 1995) which optimizes the inputs. The algorithm examines the constraint violations incurred by the winning output candidate corresponding to each competing input. The input-output pair with the fewest violations is selected as the optimal pair. Thus, lexicon optimization works from the inputs A to the outputs B and back from B to A. As a consequence, the 'input' set A is restricted in an indirect way, by means of the system of ranked constraints and the possible outputs.
OT syntax is another case where the production perspective is taken exclusively. It optimizes syntactic structure with respect to a semantic input. Now we have to notice human sentence parsing as a related area in which optimality has always been assumed.
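Returning to lexicon optimization as described above, the procedure can be sketched in Python as follows. This is a hedged illustration rather than the algorithm as formulated by Prince & Smolensky: the toy `gen` function, the two constraints, and the forms "bed" and "bet" are hypothetical, and "fewest violations" is read here as the lexicographically smallest violation profile under the given ranking.

```python
from typing import Callable, List, Sequence, Tuple

Constraint = Callable[[str, str], int]

DEVOICE = {"b": "p", "d": "t", "g": "k"}
VOICE = {voiceless: voiced for voiced, voiceless in DEVOICE.items()}


def gen(inp: str) -> List[str]:
    """Toy Gen (an assumption): the non-empty input itself plus a variant
    in which the voicing of a final obstruent is toggled."""
    candidates = [inp]
    for table in (DEVOICE, VOICE):
        if inp[-1] in table:
            candidates.append(inp[:-1] + table[inp[-1]])
    return candidates


def no_voiced_coda(inp: str, cand: str) -> int:
    """Hypothetical markedness constraint on final voiced obstruents."""
    return 1 if cand[-1] in DEVOICE else 0


def ident_voice(inp: str, cand: str) -> int:
    """Hypothetical faithfulness constraint: count changed segments."""
    return sum(a != b for a, b in zip(inp, cand))


def profile(inp: str, cand: str, ranking: Sequence[Constraint]) -> Tuple[int, ...]:
    return tuple(constraint(inp, cand) for constraint in ranking)


def winners(inp: str, ranking: Sequence[Constraint]) -> List[str]:
    """Forward direction (A to B): the optimal outputs for one input."""
    candidates = gen(inp)
    best = min(profile(inp, c, ranking) for c in candidates)
    return [c for c in candidates if profile(inp, c, ranking) == best]


def lexicon_optimization(output: str, inputs: Sequence[str],
                         ranking: Sequence[Constraint]) -> List[str]:
    """Backward direction (B to A): among the inputs that converge on
    `output`, keep those whose input-output pair has the best profile."""
    converging = [i for i in inputs if output in winners(i, ranking)]
    if not converging:
        return []
    best = min(profile(i, output, ranking) for i in converging)
    return [i for i in converging if profile(i, output, ranking) == best]


if __name__ == "__main__":
    ranking = [no_voiced_coda, ident_voice]
    # Both "bed" and "bet" surface as "bet"; the faithful input is selected.
    print(lexicon_optimization("bet", ["bed", "bet"], ranking))
```

In this toy grammar both inputs converge on the same output, and the comparison of the two input-output pairs restricts the input set indirectly, by way of the ranked constraints and the possible outputs.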
In parsing, by its very nature, the comprehension perspective comes in. Consequently, the parser optimizes underlying structures with respect to a surface input. Gibson & Broihier (1998) and Fanselow, Schlesewsky, Cavar, & Kliegl (1999) have shown that parsing preferences can be explained in this way. Furthermore, Fanselow, Schlesewsky, Cavar, & Kliegl (1999) have tried to demonstrate that the same constraints seem to be used both in OT syntax and parsing. If this is right, it demonstrates that both directions of optimization are relevant. OT syntax normally ignores the phenomenon of syntactic ambiguities and does not try to explain the preferences for the different readings that suggest themselves (cases in point are quantifier scope and PP attachment). I see it as an opportunity for OT syntax to explain the relevant preferences with the help of syntactic constraints, which are motivated independently. If we consider optimality under the production perspective exclusively, we lose this opportunity to give a syntactic explanation for the
preferences. This does not exclude the relevance of pragmatic factors that arguably interact with the syntactic factors.
Now let us address natural language interpretation. Ambiguity, polysemy, and other forms of flexibility are much more obvious and manifested in a much broader way in this area than in the realm of syntax. The assumption that OT in sentence interpretation takes the point of view of the hearer is mainly motivated by this observation and the aim to explain the interpretive preferences. Using this perspective a mechanism for preferred interpretations is constituted that provides insights into different phenomena of interpretations, such as the determination of quantificational structure (Hendriks & de Hoop, to appear), nominal and temporal anaphorization (de Hoop & de Swart 1998), and the interpretational effects of scrambling (de Hoop 2000). However, I think there are reasons that show this design of OT to be inappropriate and too weak in a number of cases. The reasons have to do with the fact that Gen can pair different forms with one and the same interpretation. The existence of such alternative forms may raise blocking effects that strongly affect what is selected as the preferred interpretation.
It is not difficult to see that the arguments for a bidirectional view in syntax and the arguments for a bidirectional view in interpretation are complementary. In the case of syntax, we cannot explain interpretative preferences when we take the production perspective alone. In the case of semantics/pragmatics we cannot explain blocking effects when we take the comprehension perspective alone.
Blocking effects are essential for the explanation of pragmatic anomalies. This may be illustrated with an example. Consider the well-known phenomenon of 'conceptual grinding', whereby ordinary count nouns acquire a mass noun reading denoting the stuff the individual objects are made of, as in 'Fish is on the table' or 'Dog is all over the street'. One of the essential factors that restrict the grinding mechanism is lexical blocking. For example, in English the specialized mass terms pork, beef, wood usually block the grinding mechanism in connection with the count nouns pig, cow, tree. This explains the contrasts given in (1).
(1) a. I ate pork/?pig
    b. Some persons are forbidden to eat beef/?cow
    c. The table is made of wood/?tree
Blocking effects need not be absolute. Instead, they may be cancelled under special contextual conditions. Nunberg & Zaenen (1992) give the following example of what they call deblocking:
(2) Hindus are forbidden to eat cow/?beef
194
Some Aspects of Optimality in Natural Language Interpretation
(3) a. Johni washes himsel( b. *Johni washes himi c. Johni expected Mary to wash himi In (3b) the coreferential reading is impossible because this interpretation is blocked by the form (3a) which is assumed to be more cheaply generated (because of a weak constraint saying 'bound NPs are marked reflexive'). In (3c) this blocking effect is cancelled out by a higher-ranked constraint 'A reflexive must be bound locally' (Burzio 1 998). The version of (3c) with a reflexive will now be taken to violate this constraint, while the one with the pronoun only violates the lower-ranked constraint 'bound NPs are marked reflexive', thus representing the optimal candidate. Appreciating the basic findings of Petra Hendriks, Helen de Hoop and Henriette de Swart concerning the selection among interpretations, the conclusion can only be that we have to consider bidirectional optimization. This appears to be almost a conceptual necessity. A careful argument in favour of bidirectionality has to take into account the important distinction between a semantic representation (=formal meaning) and an interpretation (content). If we identify semantic repre sentations and interpretational content, then we simply have to state that a bidirectional OT is established by combining OT syntax and OT semantics SYNTAX syntactic representation
semantic representation
SEMANTICS Figure
I
Syntax and semantics as the two directions of bidirectional OT
Downloaded from jos.oxfordjournals.org by guest on January 1, 2011
They argue that what makes beif odd here is that the interdiction concerns the status of the animal as a whole, and not simply its meat. That is, Hindus are forbidden to eat beef only because it is cow-stuff Copestake & Briscoe (1995) provide further examples that substantiate this claim. The simplest explanation for blocking (and also deblocking) is a bidirectional OT that takes into account the production perspective. An expression is blocked with regard to a certain interpretation if this interpretation can be generated more economically by an alternative expression. Linguistic and contextual factors can trigger deblocking in case they reverse the corresponding cost values (cf Copestake & Briscoe 1 995; Blutner 1998). The binding behaviour of pronominal expressions gives another illus tration for the importance of blocking in natural language interpretation.
Reinhard Blutner
Figure 2. The two directions of optimization in a model without bidirection (diagram labels: SYNTAX, PRAGMASEMANTICS; semantic representation, syntactic representation, interpretation)
Figure 3. A model with two modes of bidirection (diagram labels: SYNTAX, SEMANTICS, PRAGMATICS; syntactic representation, semantic representation, interpretation)
(see Figure 1). OT semantics takes syntactic representations as inputs and yields optimal semantic outputs, and OT syntax takes semantic representations as inputs and yields optimal syntactic outputs. To say that we need bidirectionality is then simply to say that we need OT syntax and OT semantics. Presumably, this is the view taken by the pioneers of OT semantics. Several schools of linguistics treat the distinction between formal meaning and interpretational content as a very important issue. For example, Bierwisch (1983, 1996) proposed his two-level semantics, Carston (1998) made a similar point from the perspective of relevance theory, and many people in computational linguistics draw a related distinction based on the idea of underspecification (e.g. van Deemter & Peters 1996). Assuming this distinction could lead us to an architecture combining the ideas of optimal production and optimal interpretation in a way that does not make use of bidirection (Figure 2). It is not difficult to see that this architecture is unable to explain the blocking of interpretations in the general case. It describes the blocking of interpretations only for those cases where the corresponding semantic representations are blocked. The example of 'conceptual grinding' and other phenomena within the realm of lexical pragmatics (cf. Blutner 1998) suggest that one and the same semantic representation may be connected with a variety of different interpretations. Nevertheless, certain interpretations can be blocked without blocking the corresponding semantic representations. It is not difficult to suggest an architecture that does not suffer from these shortcomings. It is shown in Figure 3. Here we have to consider two modes of bidirection: one for relating syntactic and semantic representations and one for relating semantic representations and interpretations. It goes without saying that this architecture does not really conflict with the
Some Aspects of Optimality in Natural Language Interpretation
2 Another nice example where a bidirectional competition technique can help to explain empirical generalizations is discussed by Lee (2000). Based on the constraints assumed by Choi (1996), Lee shows that a bidirectional model can explain some types of 'freezing effects' concerning word order in German and Korean (looking at sentences with ambiguous case marking). For further examples and references, see Kuhn (2000) and the web page of Bresnan: http://www.lfg.stanford.edu/lfg/bresnan/.
ideas of the pioneers of OT semantics. Instead it broadens their view in a straightforward way. Not surprisingly, it is sometimes rather unclear which phenomenon should be treated within which mode of bidirection. Consider the case of binding phenomena. Building on Burzio (1989), Colin Wilson (1998) develops a theory of anaphora incorporating two types of competition. Assuming the interface between syntax and semantics to have a particular 'direction', Wilson takes both directions into account: the direction that maps from semantic structures to syntactic ones and the opposite direction that maps from syntactic structures to semantic ones. Clearly, Wilson's account refers to the mode of bidirection shown on the left-hand side of Figure 3. In contrast, there is Levinson's pragmatic theory of anaphora (e.g. Levinson 1987), which can be seen as operating in the pragmatic mode of bidirection (right-hand side of Figure 3). It is not the aim of this paper to judge which decision is the better one. Independent of the position we take with regard to the distinction between meaning and interpretation, the advantage of the bidirectional view now becomes clear: it integrates interpretational preferences and blocking effects and it keeps OT simple: 'What is best expressed as a generation principle is expressed as a generation principle, what is best expressed as an interpretation principle is expressed as an interpretation principle' (Zeevat, this volume). Under the present perspective of integrating production and comprehension optimality we can account both for ineffability and for pragmatic anomaly. The first case occurs when the optimal production can be triggered more efficiently by an alternative interpretational input. The second case occurs when the optimal interpretation can be expressed more efficiently by an alternative form.2 The final remark has to do with the foundation of OT in Harmony Theory. Harmony Theory is a formalism which abstracts away from the details of connectionist networks and seeks general mathematical techniques for analysing classes of connectionist networks (Prince & Smolensky 1993; Smolensky 1986). One essence of Harmony Theory is that it is founded on a two-layer scheme which allows a combination of simplicity with uniformity. On the lower layer we find representational nodes that encode the different kinds of information involved in language processing
(phonological, morphological, syntactic, semantic). On the upper level we find knowledge nodes, hidden units that encode certain 'patterns' relating particular configurations of representational units. A connectionist network is a dynamical system that is controlled by a certain Ljapunov function. When activation spreads dynamically, this function always decreases or remains constant. In other words, harmony theory says that starting from any incomplete representational vector, this vector is always completed in a minimalistic/optimal way. Harmony theory does not say that the different optimizations converge when we start with different parts of a lucid representational vector. The theory says only that one and the same Ljapunov function (= system of ranked constraints in OT) can be used when the system operates like a hearer (starting with a natural language form and ending with an interpretation) or when it operates like a speaker (starting with an activated interpretation and ending with a form). The theory does not say that we come back to the original expression when we execute both operations in succession. Everyone can describe numerous situations in which he was unable to produce what he understands. More drastically, the phenomenon of aphasia illustrates possible asymmetries in production and comprehension (e.g. Jakobson 1941/1968). A related asymmetry is found in language acquisition. It is well known that children's abilities in production lag dramatically behind their abilities in comprehension. In overcoming this lag, a kind of bootstrap mechanism seems to apply that depends crucially on the robustness of comprehension, possibly by using a technique called robust interpretative parsing (Smolensky 1996; Tesar & Smolensky 2000). Consequently, when it comes to relating the two perspectives within a bidirectional OT, we have to acknowledge the close interrelation between them in the OT learning algorithm. In summary, harmony theory per se does not give any argument in favour of bidirection. Instead, the arguments come from OT learning theory. I will come back to this important conceptual point in the next section.

3 AN INTEGRATIVE FRAMEWORK

In this section an attempt is made to integrate optimal interpretation and optimal production. A look at the area of pragmatics seems to be useful since an analogous optimality metric plays an indispensable role there. The Gricean conversational maxims are widely recognized as a (rather informal) expression of this metric. With Zipf (1949) as a forerunner we have to
acknowledge two basic and competing forces, one force of unification, or Speaker's economy, and the antithetical force of diversification, or Auditor's economy. The two opposing economies are in extreme conflict, and we have to look for an optimal way to resolve this conflict. An important step in reformulating and explicating the Gricean framework has been made by Atlas & Levinson (1981) and Horn (1984), who have tried to clarify the consequences of these opposing economies. Taking Quantity as a starting point, they distinguish between two principles, the Q-principle and the I-principle (termed R-principle by Horn 1984). The I-principle can be seen as the force of unification minimizing the Speaker's effort, and the Q-principle can be seen as the force of diversification minimizing the Auditor's effort. Simple but informal formulations of these principles are as follows:

(4) Q-principle: Say as much as you can (given I) (Horn 1984: 13). Do not provide a statement that is informationally weaker than your knowledge of the world allows, unless providing a stronger statement would contravene the I-principle (Levinson 1987: 401).
    I-principle: Say no more than you must (given Q) (Horn 1984: 13). Say as little as necessary, i.e. produce the minimal linguistic information sufficient to achieve your communicational ends (bearing the Q-principle in mind) (Levinson 1987: 402). Read as much into an utterance as is consistent with what you know about the world (Levinson 1983: 146-7).

Obviously, the Q-principle corresponds to the first part of Grice's quantity maxim (make your contribution as informative as required), while it can be argued that the countervailing I-principle collects the second part of the quantity maxim (do not make your contribution more informative than is required), the maxim of relation and possibly all the manner maxims. In a slightly different formulation, the I-principle seeks to select the most coherent interpretation, and the Q-principle acts as a blocking mechanism and blocks all the outputs that can be derived more economically from an alternative linguistic input (for a detailed discussion see Blutner 1998). This formulation makes it quite clear that the Gricean framework can be understood in a bidirectional optimality framework which integrates production and comprehension optimality. At first glance, using a bidirectional competition technique can be seen merely as establishing the very same ideas as presented in Blutner (1998) on a more widely acknowledged and well-known basis. However, that is not the whole story. We have to acknowledge that the framework of OT gives us a much wider
perspective on relating natural language comprehension, language acquisition (Tesar & Smolensky 2000) and language change (e.g. Haspelmath 1999). Furthermore, there are interesting mathematical results concerning the computational capacity of OT systems (see Kuhn 2000 for further references). Taking the broader perspective and the more rigorous formalization, the use of OT may give the enterprise of Radical Pragmatics in general and Lexical Pragmatics in particular a new impulse. With the Gricean maxims as Eval, we now have to make the status of Gen more explicit. Following current trends in semantics, we see the formal meaning of a natural language expression $A$ as its context change potential (e.g. Heim 1982; Kamp 1981; Kamp & Reyle 1993; Groenendijk & Stokhof 1991). It describes the way $A$ (or better, the semantic form $\mathrm{sem}(A)$ that is associated with $A$) updates the current context $\sigma$, leading to a new context $\tau$. In standard dynamic semantics the context change potential is assumed to be a function, with the argument of the function usually written to the left: $\sigma[\mathrm{sem}(A)] = \tau$. Taking into account that the semantics is highly underspecified (e.g. Reyle 1993) and that it seldom specifies a definite outcome, we assume that the context change potential is a relational notion. If $\tau$ is one of the potential outcomes of updating $\sigma$ with $\mathrm{sem}(A)$, this is written as $\sigma[\mathrm{sem}(A)]\tau$. The Generator $\mathrm{Gen}_\sigma$ is now identified with the set of input-output (form-interpretation) pairs $(\mathrm{sem}(A), \tau)$ such that $\tau$ is a potential result of updating $\sigma$ with $\mathrm{sem}(A)$; more formally:

(5) $\mathrm{Gen}_\sigma = \{(\mathrm{sem}(A), \tau) : \sigma[\mathrm{sem}(A)]\tau\}$

For convenience, we will simply write $A$ instead of $\mathrm{sem}(A)$ from now on. The effect of the Gricean maxims is simply to constrain this relation in a particular way, and we have already given some initial motivation that this constraint can be formulated best in a bidirectional OT framework. In OT there is a cost function (harmony function) that evaluates the elements of the generator. For the present aims it is sufficient to assume an ordering relation $\succ$ (being more harmonic, being more economical) that ranks the elements of the Generator.3 Now the following formulation of the Q- and the I-principle comes immediately to mind and brings us to a bidirectional optimality view:

(6) Bidirectional OT (strong version)
    (Q) $(A, \tau)$ satisfies the Q-principle iff $(A, \tau) \in \mathrm{Gen}_\sigma$ and there is no other pair $(A', \tau)$ such that $(A', \tau) \succ (A, \tau)$
    (I) $(A, \tau)$ satisfies the I-principle iff $(A, \tau) \in \mathrm{Gen}_\sigma$ and there is no other pair $(A, \tau')$ such that $(A, \tau') \succ (A, \tau)$
    $(A, \tau)$ is called optimal iff it satisfies both the Q-principle and the I-principle.4

3 Being more pedantic, we should write $\succ_\sigma$ in order to indicate the dependence on the actual context $\sigma$. We can drop the index because here and in the following we assume the actual context to be fixed.
4 In terms of game theory, the solution concept that underlies the formulation of (strong) optimality is that of a 'Nash Equilibrium' (see Dekker & van Rooy, this volume).
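The strong version in (6) can be read as a filter over a finite generator. The following Python fragment is only a minimal sketch under stated assumptions: the function name `strong_optimal`, the pair encoding, and the toy costs are hypothetical illustrations, and the ordering is passed in as an arbitrary predicate `better(p, q)` standing for "p is more harmonic than q".

```python
from typing import Callable, Hashable, Iterable, Set, Tuple

Pair = Tuple[Hashable, Hashable]  # (form A, resulting context tau)


def strong_optimal(gen: Iterable[Pair],
                   better: Callable[[Pair, Pair], bool]) -> Set[Pair]:
    """Return the pairs of `gen` that satisfy both clauses of definition (6)."""
    gen = set(gen)

    def satisfies_q(pair: Pair) -> bool:
        # (Q): no other form paired with the same outcome is more harmonic.
        return not any(better(other, pair)
                       for other in gen
                       if other[1] == pair[1] and other != pair)

    def satisfies_i(pair: Pair) -> bool:
        # (I): no other outcome paired with the same form is more harmonic.
        return not any(better(other, pair)
                       for other in gen
                       if other[0] == pair[0] and other != pair)

    return {p for p in gen if satisfies_q(p) and satisfies_i(p)}


# Hypothetical toy generator with invented numeric costs (lower is better).
toy_cost = {("a", "t1"): 0, ("a", "t2"): 1, ("b", "t1"): 1, ("b", "t2"): 2}


def toy_better(p: Pair, q: Pair) -> bool:
    return toy_cost[p] < toy_cost[q]


print(strong_optimal(toy_cost, toy_better))
```

Passing the ordering as a predicate keeps the sketch neutral about how the ranking is obtained; any system of ranked constraints that induces such an ordering could be plugged in.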
Obviously, a pair $(A, \tau)$ satisfies the Q-principle just in case $A$ is an optimal production that can be generated starting with $\tau$. On the other hand, a pair $(A, \tau)$ satisfies the I-principle just in case $\tau$ is an optimal outcome of interpreting $A$. Seeing both principles as part of the real mechanism of natural language comprehension, the I-principle can be considered a sub-mechanism for finding preferred interpretations, and the Q-principle can be considered an (absolute) blocking mechanism that suppresses the interpretations that are connected more economically with an alternative form. In standard OT the ordering relation between elements of the generator is established via a system of ranked constraints. These constraints are typically assumed to be output constraints, i.e. they may be either satisfied or violated by an output form. In the bidirectional framework just presented, changing the perspective is possible. This means that an output under one perspective can be seen as an input under the other perspective. Therefore, it is plausible to assume both output and input constraints. However, we should avoid (relational) constraints that refer to inputs and outputs simultaneously. Seeing the input as a linguistic form that conveys phonological, syntactic, and semantic information, input constraints are typically markedness conditions that evaluate the 'harmony' of the form. On the other hand, the output (i.e. the resulting context $\tau$) is evaluated by constraints that determine its coherence and informativeness (with regard to the initial context $\sigma$).

Let me now give a very schematic example in order to illustrate some characteristics of the bidirectional OT (labelled strong version in order to discriminate it from a weak version introduced later). Assume that we have two constraints called F and C. F is a constraint on linguistic forms and collects the effects of linguistic markedness. C is a constraint on resulting contexts and refers to coherence and informativeness. There is no reason to introduce a ranking between F and C. Let us assume two forms $A_1$ and $A_2$ which are semantically equivalent. That means $\mathrm{Gen}_\sigma$ associates the same relations of context change with them. With $\sigma$ as initial context, let us assume the possible outcomes are $\tau_1$ and $\tau_2$. Further, we assume that no other form updates $\sigma$ to one of these outcomes. Let us stipulate that $A_1$ satisfies F but $A_2$ does not, and that $\tau_1$ satisfies C but $\tau_2$ does not. That makes the form $A_2$ less well-formed than the form $A_1$ and the resulting context $\tau_2$ more complex than the resulting context $\tau_1$. The bidirectional view can be demonstrated by the following tableau, where two super-columns are introduced, one for each result of context change.
(7)
Form    | $\tau_1$: marks (F, C) | $\tau_2$: marks (F, C)
$A_1$   | ☞ ➳  ( , )             | ☞  ( , *)
$A_2$   | ➳  (*, )               | (*, *)
I use Smolensky's (1996) repertoire of symbols here: ☞ indicates the optimal candidate when the production perspective is taken (find an optimal expression starting with $\tau_i$) and ➳ indicates the optimal candidate when the comprehension perspective is taken (find an optimal interpretation starting with $A_i$). Super-optimal pairs are those that are both production optimal and comprehension optimal. This is indicated by the simultaneous occurrence of ☞ and ➳. The tableau shows that only the form $A_1$ survives, with $\tau_1$ as its only interpretative outcome. Obviously, the form $A_2$ is blocked in all its (semantically admissible) interpretations.5

The scenario just set up describes the case of total blocking where some forms (e.g. *furiosity, *fallacity) do not exist because others do (fury, fallacy). However, blocking is not always total but may be partial. According to Kiparsky (1982), partial blocking is realized in the case where the special (less productive) affix occurs in some restricted meaning and the general (more productive) affix picks up the remaining meaning (consider examples like refrigerant - refrigerator, informant - informer, contestant - contester). To handle these and other cases Kiparsky (1982) formulates a general condition Avoid Synonymy. Working independently of the Aronoff-Kiparsky line, McCawley (1978) collects a number of further examples demonstrating the phenomenon of partial blocking outside the domain of derivational and inflectional processes. For example, he observes that the distribution of productive causatives (in English, Japanese, German, and other languages) is restricted by the existence of a corresponding lexical causative. Whereas lexical causatives (e.g. (8a)) tend to be restricted in their distribution to the stereotypical causative situation (direct, unmediated causation through physical action), productive (periphrastic) causatives tend to pick up more marked situations of mediated, indirect causation. For example, (8b) could have been used appropriately when Black Bart caused the sheriff's gun to backfire by stuffing it with cotton.

(8) a. Black Bart killed the sheriff
    b. Black Bart caused the sheriff to die

5 Zeevat (personal communication) has proposed using pictures of the following kind, where arrows indicate the optimal candidate that arises when the indicated direction of optimization is taken. A link with arrows in both directions indicates a super-optimal pair.
(Diagram: the forms $A_1$, $A_2$, ordered by F, linked by arrows to the interpretations $\tau_1$, $\tau_2$, ordered by C.)
Typical cases of total and partial blocking are found not only in morphology, but in syntax and semantics as well (cf. Atlas & Levinson 1981; Horn 1984; Williams 1997). The general tendency of partial blocking seems to be that 'unmarked forms tend to be used for unmarked situations and marked forms for marked situations' (Horn 1984: 26), a tendency that Horn (1984: 22) calls 'the division of pragmatic labour'. There are two principal possibilities for avoiding total blocking within the bidirectional OT framework. The first possibility is to make some stipulations concerning Gen that exclude equivalent semantic forms. Such a case is demonstrated in (9):

(9) (Diagram: Forms, ordered by F, against Interpretations, ordered by C.)
In this case the unmarked form $A_1$ is stipulated to be used for the unmarked situation only. (This seems plausible when we assume that the child learns the meaning of kill in stereotypical, unmarked situations.) The interpretation of the marked form $A_2$ remains open. Unfortunately, the bidirectional OT described in (6) does not select any situation for $A_2$. Starting with $\tau_2$, expressive optimization selects $A_2$, as desired. However, we do not come back to the marked situation $\tau_2$ when the inverse perspective (interpretative optimization) is taken. Instead, the unmarked situation $\tau_1$ is selected. Consequently, there is no output that is paired super-optimally with $A_2$. That means $A_2$ is blocked in all interpretations.
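The marks of tableau (7) and the blocking result for the restricted generator in (9) can be recomputed mechanically. The sketch below is an illustration only and rests on assumptions that are not in the text: the names `marks`, `optimal`, `GEN_7` and `GEN_9` are hypothetical, the strings "t1" and "t2" stand for $\tau_1$ and $\tau_2$, and, since F and C are unranked, one pair is taken to be more harmonic than another exactly when its set of violated constraints is a proper subset of the other's.

```python
# Violation profiles as stipulated in the text: A2 violates F, tau2 violates C.
VIOLATIONS = {"A1": set(), "A2": {"F"}, "t1": set(), "t2": {"C"}}


def better(p, q):
    """p is more harmonic than q if p's violations are a proper subset of q's."""
    vp = VIOLATIONS[p[0]] | VIOLATIONS[p[1]]
    vq = VIOLATIONS[q[0]] | VIOLATIONS[q[1]]
    return vp < vq


def marks(gen):
    """Map each pair to the perspectives under which it is the optimal candidate."""
    result = {}
    for pair in gen:
        form, tau = pair
        same_outcome = [r for r in gen if r[1] == tau]   # rival forms
        same_form = [r for r in gen if r[0] == form]     # rival outcomes
        found = set()
        if not any(better(r, pair) for r in same_outcome):
            found.add("production")      # the 'hand' mark
        if not any(better(r, pair) for r in same_form):
            found.add("comprehension")   # the 'arrow' mark
        result[pair] = found
    return result


def optimal(gen):
    """Pairs that are optimal under both perspectives (strong version)."""
    return {p for p, m in marks(gen).items()
            if m == {"production", "comprehension"}}


GEN_7 = {("A1", "t1"), ("A1", "t2"), ("A2", "t1"), ("A2", "t2")}
GEN_9 = GEN_7 - {("A1", "t2")}  # A1 stipulated for the unmarked situation only

print(marks(GEN_7))
print(optimal(GEN_9))
```

Under these assumptions the restricted generator still leaves $A_2$ without any optimal pairing: production from "t2" picks A2, but comprehension of A2 prefers "t1".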
The only possibility to account for Horn's division of pragmatic labour is to stipulate it as a property of the Generator. This is indicated by the following tableau:

(10) (Tableau of Forms against Interpretations.)
Obviously, this solution is completely ad hoc, and we should look for an alternative solution.6 The bidirectional OT we have considered until now is a very strong and absolute one. We have assumed (i) that an input-output pair $(A, \tau)$ is super-optimal just in case $\tau$ is optimal for $A$ and $A$ is optimal for $\tau$, and (ii) that the two directions of optimization are independent of each other. This means that the results of optimization under one perspective are not assumed to influence which structures compete under the other perspective. Our initial motivation for developing a bidirectional OT was the formulation of the Gricean maxims in Radical Pragmatics (Atlas & Levinson 1981; Horn 1984). Already the informal formulations given in (4) make it completely clear that we need a formalization where the two directions of optimization refer to each other. Such a formalization has been given in Blutner (1998):
(11) Bidirectional OT (weak version)
     (Q) $(A, \tau)$ satisfies the Q-principle iff $(A, \tau) \in \mathrm{Gen}_\sigma$ and there is no other pair $(A', \tau)$ satisfying the I-principle such that $(A', \tau) \succ (A, \tau)$
     (I) $(A, \tau)$ satisfies the I-principle iff $(A, \tau) \in \mathrm{Gen}_\sigma$ and there is no other pair $(A, \tau')$ satisfying the Q-principle such that $(A, \tau') \succ (A, \tau)$
     $(A, \tau)$ is called super-optimal iff it satisfies both the Q-principle and the I-principle.7
6 As suggested by an anonymous referee, there is a further argument that shows that it is problematic to have hard constraints for excluding total blocking. In fact, a sentence like (8a) CAN be used in situations where Black Bart caused the sheriff's gun to backfire by stuffing it with cotton. This possibility is excluded when hard constraints are used as in (9) and (10).
7 Recently, Gerhard Jäger (Jäger 1999; see also Jäger & Blutner to appear) has presented a more transparent formulation of bidirectional OT:
I call this variant of the bidirectional OT the weak version. The important point is that the structures that compete in one perspective of optimization are constrained by the outcomes of the other perspective and vice versa. The purpose of this kind of recursive dependence can be demonstrated by coming back to our original example, which now leads to the following tableau:
(12)
Form    | $\tau_1$: marks (F, C) | $\tau_2$: marks (F, C)
$A_1$   | ☞ ➳  ( , )             | ( , *)
$A_2$   | (*, )                  | ☞ ➳  (*, *)
Let us first take the comprehension perspective starting with $A_1$. The structures that compete are $\{\tau_1, \tau_2\}$ (the marked form $A_2$ does not block any of them). From the fact that $\tau_1$ is less expensive (more stereotypical) than $\tau_2$ it follows that the little arc ➳ has to select $\tau_1$. Now take the production perspective starting with $\tau_1$. An analogous argument shows that the little hand ☞ selects $A_1$. Consequently, the pair $(A_1, \tau_1)$ is super-optimal, just as in tableau (7) where we discussed the strong view. Next consider the comprehension perspective starting with $A_2$. In this case the structures that compete are restricted to the singleton $\{\tau_2\}$ since the unmarked form $A_1$ blocks $\tau_1$, and we get that the little arc ➳ has to select $\tau_2$. An analogous argument applies to the production perspective starting with $\tau_2$. In this case the competition set is restricted to the singleton $\{A_2\}$, and the little hand ☞ selects $A_2$. In contrast to the strong view, now the pair $(A_2, \tau_2)$ comes out as super-optimal as well. And this demonstrates that the weak view can

7 (continued)
$(A, \tau)$ is super-optimal iff $(A, \tau) \in \mathrm{Gen}_\sigma$ and
(Q) there is no super-optimal $(A', \tau) < (A, \tau)$
(I) there is no super-optimal $(A, \tau') < (A, \tau)$.
Jäger has shown that there is a unique super-optimality relation in case $<$ is well founded. Furthermore, this formulation of super-optimality is equivalent to that presented in (11) if $<$ satisfies transitivity. Jäger's results demonstrate that the circularity inherent in definition (11) is only apparent. Suppose the preference relation $<$ is well founded; then both definition (11) and Jäger's definition of super-optimality come out as sound recursive definitions (cf. also Dekker & van Rooy, this volume). Does the recursive variant of bidirectionality (i.e. weak bidirection) extend the computational capacity of the generator and, if yes, in which way? These are important but largely unsolved problems even for unidirectional OT. (For some interesting results concerning the system OT-LFG, cf. Kuhn 2000.) Gerhard Jäger (p.c.) has a proof that under the same conditions that are assumed in Karttunen (1998), weak bidirection does not extend the generative capacity of the generator.
account for the good old idea that unmarked forms tend to be used for unmarked situations and marked forms for marked situations. One consequence of the strong mode of optimization in (6) can be summarized as follows: What we produce we are able to understand adequately and what we understand we are able to produce adequately. At least the second part of this consequence is clearly false when we consider children's ability in natural language production, which lags dramatically behind their ability in comprehension. Smolensky (1996) has demonstrated that OT gives a plausible explanation for this lag. OT predicts that in comprehension relatively marked forms can be understood appropriately. However, when we consider generation, highly unmarked forms are produced that differ significantly from the initial forms. The lag between comprehension and production is overcome by learning. According to the OT learning theory (Smolensky 1996; Tesar & Smolensky 2000), learning results in a state of the system that satisfies the demands of strong bidirection. It is easy to prove that a pair that is optimal (strong bidirection, cf. (6)) is super-optimal (weak bidirection, cf. (11)) as well. However, weak bidirection gives a chance to find additional super-optimal solutions. This is demonstrated by tableau (12). Is it possible to give a natural interpretation for these additional solutions? I want to propose the idea that these additional solutions are due to the flexibility and ability to learn which the weak formulation alludes to. In my opinion, the weak version of the bidirectional OT can be taken to describe the possible outcomes of self-organization before the learning mechanism has fully realized the equilibrium between productive and interpretative optimization.
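The relation between the two versions can be checked on the schematic example with a few lines of Python. This is a hedged sketch, not the paper's own algorithm: it uses the recursive reformulation attributed to Jäger in footnote 7, it assumes the same proper-subset ordering over unranked violations of F and C as a stand-in for the preference relation, and all identifiers (`super_optimal`, `strong_optimal`, `GEN`, "t1", "t2") are hypothetical.

```python
from functools import lru_cache

# Violation profiles of the schematic example: A2 violates F, tau2 violates C.
VIOLATIONS = {"A1": frozenset(), "A2": frozenset({"F"}),
              "t1": frozenset(), "t2": frozenset({"C"})}
GEN = frozenset({("A1", "t1"), ("A1", "t2"), ("A2", "t1"), ("A2", "t2")})


def better(p, q):
    """p is more harmonic than q (proper subset of violations; well founded)."""
    vp = VIOLATIONS[p[0]] | VIOLATIONS[p[1]]
    vq = VIOLATIONS[q[0]] | VIOLATIONS[q[1]]
    return vp < vq


def rivals(pair):
    """Pairs in GEN that share the form or the outcome with `pair`."""
    return [r for r in GEN
            if r != pair and (r[0] == pair[0] or r[1] == pair[1])]


@lru_cache(maxsize=None)
def super_optimal(pair):
    """Weak version: no better rival that is itself super-optimal."""
    # The recursion only descends to strictly better pairs, so it terminates.
    return not any(better(r, pair) and super_optimal(r) for r in rivals(pair))


def strong_optimal(pair):
    """Strong version: no better rival at all."""
    return not any(better(r, pair) for r in rivals(pair))


weak = {p for p in GEN if super_optimal(p)}
strong = {p for p in GEN if strong_optimal(p)}
print(sorted(strong), sorted(weak))
```

On this toy generator the strong solutions form a subset of the weak ones, and the marked form paired with the marked outcome appears only under the weak version, as in tableau (12).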
Jäger (1999) and Dekker & van Rooy (this volume) have proposed algorithms that update the ordering (preference) relation $\succ$ such that (i) optimal pairs are preserved and (ii) a new optimal pair is produced if and only if the same pair was super-optimal at earlier stages. Consequently, we can take the solutions of weak bidirection to be identical with the solutions of strong bidirection considering all the systems that result from updating the ordering relation. Arguably, updating the ordering relation in the style of Jäger describes a kind of self-organization which is very close to certain mechanisms of self-organization in language change. This point may be clarified when we (re)consider the principle of iconicity (called 'the division of pragmatic labour' within the domain of pragmasemantics). This principle can be proven to result from weak bidirection (Gerhard Jäger, personal communication). In the school of natural morphology (for references cf. Wurzel 1998), the same principle plays an important role in describing the direction of language change.
Constructional iconicity: A semantically more complex, derived morphological form is unmarked regarding constructional iconicity if it is symbolized formally more costly than its semantically less complex base; it is the more marked, the stronger its symbolization deviates from this (Wurzel 1998: 68).
Analogies of this kind give substance to the claim that weak bidirection can be considered as a principle describing (in part) the direction of language change: super-optimal pairs are tentatively realized in language change. This relates to the view of Horn (1984), who considers the Q-principle and the I-principle as diametrically opposed forces in inference strategies of language change.
4 PRESUPPOSITION PROJECTION

In the previous section we have outlined two general ideas that determine the shape of Gen in natural language interpretation: underspecification and dynamic semantics. Within the realm of underspecification we can discriminate between structural underspecification and lexical underspecification. Structural underspecification is related, for example, to scope, ellipsis, and presupposition. Lexical underspecification, on the other hand, relates to polysemy, metonymy, and other aspects of the 'Generative Lexicon'. Although it is seldom made completely explicit in OT, the choice of a particular representational format is unavoidable in order to give a sound formulation of the constraints and their ranking. With regard to the representational format, we will proceed by modelling contexts as DRSs. Moreover, the initial DRSs of presupposition-inducing expressions are treated in the particular framework of van der Sandt (1992) and Geurts (1995). This framework combines the idea of dynamics with the aspect of underspecification that relates to presupposition projection. The aim of this section is to demonstrate that van der Sandt's/Geurts' projection mechanism for presuppositions can be reconstructed (in important aspects) and improved (in secondary aspects) as a consequence of the I-principle. Moreover, it can be explained why accommodation is sometimes blocked. This is an important consequence of the Q-principle, and its integration realizes an effective extension of the van der Sandt/Geurts proposal.

As usual, we consider a DRS $K$ as a pair $(U(K), \mathrm{Con}(K))$, where $U(K)$ is a set of reference markers and $\mathrm{Con}(K)$ is a set of DRS-conditions. If $P$ is an $n$-place predicate, and $x_1, \ldots, x_n$ are reference markers, then $P(x_1, \ldots, x_n)$ is a simple DRS-condition. If $K$ and $K'$ are DRSs, then $\neg K$, $K \vee K'$, $K \Rightarrow K'$ are (complex) DRS-conditions (cf. Kamp & Reyle 1993; Kadmon 1990; Geurts 1995, 1999).
u [sem(A)] T just in case T is the result of merging8 u with the result of projecting the presupposed material of sem(A) such that the resulting DRS is a proper one (it may not contain any free reference markers).9
Using the conception of Gen as defined in ( 5 ) , the formulation in ( 1 4) results where the Generator is considered for a specific input form A:
( 1 4) Gena (A) = {T: T is the result of merging u with the result of projecting the presupposed material of sem(A) such that the resulting DRS is a proper one}
The part of the proj ected DRS that factors with part of the superordinated DRS/initial context (u) will be called bound (or resolved) material; the part that does not factor will be called accommodated material. For convenience, in the corresponding DRSs, the part of the presupposition which counts as bound when projected is underlined, and the part which has to be accommodated is underlined twice. 1 995): IfKis a set ofDRSs, then EBK ( UK E K U(K ) , UK E K Con(K )}. A necessary condition is that presupposed material projects to a DRS that subordinates the origin position. 8 9
DRS-merge (c£ Geurts
=
Geurts 1 995): ::; is the smallest preorder (transitive, reflexive) for which all of the following hold, for any K, K', K":
Subordination (c£
a. b. c. d.
If -,K' E Con(K), then K ::; K' If K ' V K " E Con(K), then K ::; K' and K ::; K " If K ' => K" E Con(K), then K ::; K' ::; K" If B/K ' E Con(K), then K ::; K' (Read K' ::; K as K' subordinates K ).
Downloaded from jos.oxfordjournals.org by guest on January 1, 2011
In order to account for presupposition inducers we introduce a further type of complex DRS-conditions: conditions of the form B/K, where K is a DRS and B is a DRS-condition. Conditions B/K have a special status and are called slash-conditions. They induce presuppositions and mark them as material behind the slash. Though not identical, this notation is very similar to that of Geurts ( I 995 ). The role of slash conditions is to indicate that a presupposition may be bound or accommodated in any DRS that subordinates the DRS in which it originates. Since the structural position where the presupposition is resolved/accommodated is not specified semantically, an element of structural underspecification is introduced into the whole framework. More formally, let u and T be ordinary DRSs and sem(A) be a DRS that may contain slash conditions (introducing presupposed material). Then the idea can be expressed by the following notion of context change:
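The DRS-merge and subordination definitions given in footnotes 8 and 9 can be made concrete with a small sketch. The encoding below is an assumption made for illustration only: complex conditions are written as tagged tuples ("not", "or", "imp", "slash"), simple conditions as strings, and Con(K) is kept as a list rather than a set. None of the names (DRS, merge, immediate_pairs, subordinates) come from the paper.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare DRSs by identity, so they can be nested freely
class DRS:
    universe: set = field(default_factory=set)      # reference markers U(K)
    conditions: list = field(default_factory=list)  # Con(K), kept as a list here


def merge(drss):
    """DRS-merge: pool the universes and conditions of all DRSs in `drss`."""
    result = DRS()
    for k in drss:
        result.universe |= k.universe
        result.conditions += k.conditions
    return result


def immediate_pairs(root):
    """Collect (upper, lower) pairs licensed by clauses (a)-(d), walking all of root."""
    pairs, todo = [], [root]
    while todo:
        k = todo.pop()
        for cond in k.conditions:
            if not isinstance(cond, tuple):
                continue  # simple condition such as "dog(x)"
            tag, *args = cond
            if tag == "imp":        # K' => K'': K <= K' <= K''
                new = [(k, args[0]), (args[0], args[1])]
            elif tag == "slash":    # B/K': K <= K'
                new = [(k, args[1])]
            else:                   # ("not", K') or ("or", K', K'')
                new = [(k, sub) for sub in args]
            pairs += new
            todo += [lower for _, lower in new]
    return pairs


def subordinates(root, upper, lower):
    """True if upper <= lower in the reflexive, transitive closure within root."""
    if upper is lower:
        return True
    pairs = immediate_pairs(root)
    frontier, seen = [upper], {id(upper)}
    while frontier:
        node = frontier.pop()
        for a, b in pairs:
            if a is node and id(b) not in seen:
                if b is lower:
                    return True
                seen.add(id(b))
                frontier.append(b)
    return False


# sem(A) for "If Peter has a dog, then his cat is gray" (hypothetical encoding)
presup = DRS({"y"}, ["have(Peter, y)", "cat(y)"])
consequent = DRS(set(), [("slash", "gray(y)", presup)])
antecedent = DRS({"x"}, ["dog(x)", "have(Peter, x)"])
main = DRS(set(), [("imp", antecedent, consequent)])

# Candidate projection sites: every DRS that subordinates the DRS in which
# the presupposition originates (here the consequent).
sites = [name for name, k in [("global", main), ("intermediate", antecedent),
                              ("local", consequent)]
         if subordinates(main, k, consequent)]
print(sites)
```

Under this encoding the three sites found for the conditional are the main DRS, the antecedent, and the consequent, which is the usual global/intermediate/local distinction. Checking that a projected result is a proper DRS (no free reference markers) is left out of the sketch.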
Let us give two simple examples. In (15) a conditional A is given and its semantic form sem(A) is indicated. With regard to an initial context that is empty (∅), three projections of the presupposed material are possible. They are indicated by τ1, τ2, τ3 and correspond to what is usually called local, intermediate, and global accommodation, respectively. Binding is not possible in this situation.

(15) A: If Peter has a dog, then his cat is gray
sem(A) = [ : [x: dog(x), have(Peter, x)] ⇒ [ : gray(y) / [y: have(Peter, y), cat(y)] ] ]
Gen(A) = {τ1, τ2, τ3}, where
τ1 = [ : [x: dog(x), have(Peter, x)] ⇒ [y: gray(y), have(Peter, y), cat(y)] ]
τ2 = [ : [x, y: dog(x), have(Peter, x), have(Peter, y), cat(y)] ⇒ [ : gray(y)] ]
τ3 = [y: have(Peter, y), cat(y), [x: dog(x), have(Peter, x)] ⇒ [ : gray(y)] ]

Intuitively, the interpretation given by τ3 (global accommodation) seems to be strictly preferred. This conforms to our intuition, which interprets A by assuming that Peter has a cat and saying that it is gray in case Peter has a dog. Another example is the following:

(16) A: If Peter has a cat, then his cat is gray
sem(A) = [ : [x: cat(x), have(Peter, x)] ⇒ [ : gray(y) / [y: have(Peter, y), cat(y)] ] ]
Gen(A) = {τ1, τ2, τ3}, where
τ1 = [ : [x: cat(x), have(Peter, x)] ⇒ [y: gray(y), have(Peter, y), cat(y)] ]
τ2 = [ : [x: cat(x), have(Peter, x)] ⇒ [ : gray(x)] ]
τ3 = [y: have(Peter, y), cat(y), [x: cat(x), have(Peter, x)] ⇒ [ : gray(y)] ]

In this case, the local projection (τ1) and the global projection (τ3) require accommodation. In contrast, the intermediate projection allows factoring, which is already realized in τ2. (Bound material is indicated by single underlining.) In example (16) the intuitively correct interpretation corresponds to the intermediate projection (τ2).

In order to account for the intuitively correct interpretations of complex sentences that contain presupposition inducers, van der Sandt (1992) assumes that the projection process is restricted by general preferences. Geurts (1995) has reformulated and improved van der Sandt's account. His preferences are as follows:
(i) If a presupposition can both be bound or accommodated, there will in general be a preference for the first option, and
(ii) If a presupposition can be accommodated at two different sites, one of which is subordinate to the other, the higher site will, ceteris paribus, be preferred. (Geurts 1995: 27ff)
Moreover, Geurts provides a clear motivation for these preferences. The rationale behind (i) is that hearers generally aim at interpretations that are maximally coherent, and (ii) is explained by the assumption that hearers tend to prefer the strongest interpretation that is consistent with what the speaker says (Geurts 1995: 28).¹⁰

My suggestion for an OT treatment of presupposition projection is simply to take the rationale behind Geurts' preferences more seriously than the preferences themselves. Consequently, the following constraints can be formulated:

C1: Avoid Accommodation (AvoidA): It counts the number of discourse markers that are involved in accommodation.
C2: Be Strong (BeStrong): It evaluates pairs ⟨A, τ⟩ with stronger outputs τ higher than pairs with weaker ones.

Their ranking is R: AvoidA » BeStrong

The first constraint prefers binding presupposed material to accommodating it. Moreover, the present formulation of AvoidA gives a partial explanation for the preference for bridging and partial resolution over pure accommodation.¹¹ The notion of strength, on the other hand, is based on the entailment relation, which is well defined within DRT (cf. Geurts 1995). As demonstrated in Blutner (1998), this notion can be refined by introducing a probabilistic measure. In any case, what is important is the fact that BeStrong is a graded constraint, not an absolute one. The ranking AvoidA » BeStrong is necessary to validate van der Sandt's/Geurts' first preference.¹² It is not difficult to see how interpretation optimality (I-principle) solves the selection task with regard to the examples given in (15) and (16). The respective OT tableaus are presented in (17) and (18) in a schematic form.

(17) Input: ∅, If p then q/r (u > v, w > v)

                               AvoidA   BeStrong
 ☞ r, p ⇒ q (global)             *         u
   (r ∧ p) ⇒ q (interm.)         *         v
   p ⇒ (q ∧ r) (local)           *         w

10 In a footnote, Geurts tells us that this is true only as long as we ignore bridging. In the present paper, we likewise ignore bridging.
11 By introducing probabilistic notions such as salience and cue validity the formulation of the constraint can be refined (perhaps along the lines outlined in Blutner 1998).
12 I am convinced that this strict ranking system must be replaced by a cumulative constraint weighting system when it comes to considering the bulk of bridging phenomena.
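How the ranking AvoidA » BeStrong selects among the candidates for (15) and (16) can be sketched in a few lines. The sketch works on the schematic propositional forms only (If p then q/r and If p then q/p) and ignores reference markers; measuring strength by classical entailment over the atoms p, q, r and hard-coding the AvoidA counts are assumptions made for illustration, not part of the original proposal.

```python
from itertools import product

ATOMS = ("p", "q", "r")


def models(formula):
    """Return the valuations (as tuples over ATOMS) that satisfy `formula`."""
    return {vals for vals in product([False, True], repeat=len(ATOMS))
            if formula(dict(zip(ATOMS, vals)))}


def implies(a, b):
    return (not a) or b


def stronger_or_equal(f, g):
    """f is at least as strong as g if every model of f is a model of g."""
    return models(f) <= models(g)


# Each candidate: (number of accommodated markers, schematic truth conditions).
# "If p then q/r" in an empty context: every projection accommodates.
CANDIDATES_R = {
    "global":       (1, lambda v: v["r"] and implies(v["p"], v["q"])),
    "intermediate": (1, lambda v: implies(v["r"] and v["p"], v["q"])),
    "local":        (1, lambda v: implies(v["p"], v["q"] and v["r"])),
}
# "If p then q/p": the intermediate projection binds, so no accommodation.
CANDIDATES_P = {
    "global":       (1, lambda v: v["p"] and implies(v["p"], v["q"])),
    "intermediate": (0, lambda v: implies(v["p"], v["q"])),
    "local":        (1, lambda v: implies(v["p"], v["q"] and v["p"])),
}


def optimal(candidates):
    """AvoidA >> BeStrong: first keep the candidates with the fewest
    accommodated markers, then drop any that a survivor strictly entails."""
    fewest = min(n for n, _ in candidates.values())
    survivors = {k: f for k, (n, f) in candidates.items() if n == fewest}
    return sorted(
        name for name, f in survivors.items()
        if not any(stronger_or_equal(g, f) and not stronger_or_equal(f, g)
                   for g in survivors.values())
    )


print(optimal(CANDIDATES_R))
print(optimal(CANDIDATES_P))
```

With these assumptions the first input selects the global candidate, because all three candidates tie on AvoidA and the global reading entails the other two. The second input selects the intermediate candidate, because it is the only one without an AvoidA violation and BeStrong is never consulted.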
In the first case all the possible outcomes (τ1, τ2, τ3) violate the constraint AvoidA (with regard to the reference marker y). Consequently, BeStrong is the critical constraint. Because global accommodation gives the strongest outcome, it wins the competition.

(18) Input: ∅, If p then q/p (u > v, w = v)

                               AvoidA   BeStrong
   p, p ⇒ q (global)             *         u
 ☞ p ⇒ q (interm.)                         v
   p ⇒ (q ∧ p) (local)¹³         *         w

In the second case, global and local projection give outcomes that violate the constraint AvoidA. In contrast, intermediate projection allows factoring and that is why it avoids accommodation. Because the constraint AvoidA ranks higher than the constraint BeStrong, intermediate projection is the winner.

Obviously, there is no necessary connection between how close the projection is to the main DRS and how strong the resulting interpretation is. A case in point where the two criteria diverge is given by the following example:

(19) a. Every German is proud of his car
b. Every German who owns a car is proud of it
c. Every German has a car and is proud of it

In (19a) global accommodation is excluded¹⁴ and we have to select between intermediate and local accommodation only. Local accommodation yields the stronger interpretation and intermediate accommodation is accommodation at the higher site. Consequently, if we take the criterion that prefers the higher site, then the interpretation of (19a) is identified with that of (19b). In contrast, the criterion that prefers the stronger interpretation identifies the interpretation of (19a) with that of (19c). Unfortunately, it is not easy to determine what the intuitively correct interpretation of (19a) is, since the proposition that Germans have cars is nearly tautological. Beaver (1994) gives an example where the judgement is easier. The following is a slightly simplified version.

13 In this schematic formulation (ignoring reference markers) the intermediate and the local version seem to be logically equivalent, which is not really the case.
14 The presupposition triggered by his car contains a reference marker that is bound by the quantifier and it would be free if the presupposition were accommodated globally (resulting in an improper DRS).
(20) a. ??Few of the team members can drive, but every team member will come to the match in her car.
b. Few of the team members can drive, but every team member who owns a car will come to the match in her car.
c. ?Few of the team members can drive, but every team member owns a car and will come to the match in her car.

Intuitively the interpretation of (20a) is rather strange while (20b) is a perfectly acceptable sentence. According to Beaver (1994), this demonstrates that the van der Sandt/Geurts proposal must be wrong, since their criterion identifies the interpretation of (20a) with that of (20b). In contrast, the present OT proposal identifies the interpretation of (20a) with that of (20c), which I think is a much better choice.¹⁵ A further point is that we should explain why in many examples intermediate accommodation is clearly dominant, such as in the following:

(21) a. Birds lay eggs (preferred: female birds lay eggs)
b. Most ships unload at night (preferred: most ships that unload do it at night)

My feeling is that intermediate accommodation is partial in these cases and can outrank local accommodation, which is less partial.¹⁶ The kind of partiality I have in mind is probabilistic in nature. A possible way to approach this phenomenon is by adopting an OT framework that is controlled by cue validity and other probabilistic factors (cf. Blutner (1998) for realizing such a framework using a Generator based on abduction). Further research seems necessary to clarify this point.

So far we have almost exclusively considered interpretation optimality (I-principle). Is it necessary to make use of the other way of optimization (Q-principle)? The answer is clearly affirmative. The point is that accommodation is not always possible although the I-principle demands it. Accommodation can be blocked. The following example by Asher & Lascarides (1998) gives a demonstration. Let us compare the two dialogues (22abc) and (22abd):

(22) a. A: Did you hear about John?
b. B: No, what?
c. A: He had an accident. A car hit him.
d. A: He had an accident. ??The car hit him.

15 This is a somewhat unfair and roughly simplified view of the van der Sandt/Geurts proposal. Geurts and van der Sandt (1999) demonstrate that with a little use of abstraction rules and propositional reference markers the data of Beaver (1994) can be handled. My point here is only to demonstrate that the problems can be resolved in a different way if we take the rationale behind the preferences more seriously than the preferences themselves.
16 Note also the importance of stress and focus, especially in example (21b) (cf. Hendriks & de Hoop, to appear).
The van der Sandt/Geurts approach does not predict any difference between these two discourses and would find them both acceptable. But (22abd) is unacceptable, while (22abc) is acceptable. As a matter of fact, the presupposition of the car cannot be accommodated in (22abd). With the help of the Q-principle this observation is easy to explain. Starting with a neutral context σ (neutral with regard to cars), the outcome of context change is the same for (22c) and for (22d). Consequently, the two sentences constitute simple expression alternatives. The difference is that in the second case, but not in the first one, accommodation is necessary to yield the output context. This makes the second case the more complex one, and as such it is blocked by the simpler alternative (Q-principle).¹⁷

17 Bart Geurts (p.c.) argues that the discourse (22d) is unacceptable because the proposition made by the second part is rather uninformative (supposing appropriate bridging). Though this idea is interesting, it cannot be the whole story. In particular, the idea cannot explain the contrast between the following examples:
c'. He had a bike accident. A car hit him seriously.
d'. He had a bike accident. ?The car hit him seriously.
Furthermore, the contrast does not disappear when dropping the material that according to Geurts can trigger bridging:
c''. A car hit him (seriously).
d''. ?The car hit him (seriously).

Zeevat (1999) formulated and substantiated the following theorem, which generalizes a series of related facts. It can be proved in the very same way we have just sketched.

(23) A trigger for presuppositions does not accommodate iff any occurrence of it has a simple expression alternative that does not trigger.

Based on the availability of expression alternatives and the logical requirements of the presupposition, a fine-grained classification of presupposition triggers can be proposed. Even more interesting, an understanding of presupposition triggers like discourse particles, which are typically outside the scope of most standard theories, becomes feasible (cf. Zeevat 1999). The semantics and pragmatics of focus provides a further challenge for applying the present ideas. Adding only one new constraint, Avoid Focus, which is ranked lower than Avoid Accommodation, it is a simple exercise to demonstrate that Schwarzschild's deaccenting theory of congruence (Schwarzschild 1999) is a natural consequence of the present ideas, crucially making use of the Q-principle.

In the first part of this paper I have outlined some theoretical reasons that recommend the weak version of bidirectional OT. From an empirical point of view it is not trivial to find data where the weak version is clearly
preferred over its strong counterpart. The investigation of phenomena where Q-based effects (blocking) interact with I-based effects (interpretational preferences) may be an opportunity to make the comparison feasible. As a first step in this direction, Jäger & Blutner (to appear) investigated the interaction between polysemy and focus. Dealing with the German adverb of repetition 'wieder' (again), the specific linguistic puzzle that was envisaged concerned the selection of the repetitive vs. the restitutive readings, depending on focus and scrambling. The results appeared to favour the weak version of bidirectional OT. It seems important to me to pursue the problem of discriminating between the weak and the strong version in depth.

Acknowledgements
This paper is dedicated to Manfred Bierwisch on the occasion of his 70th birthday. This work was supported by the Deutsche Forschungsgemeinschaft (DFG). Parts of this paper were first presented at a DIP colloquium in Amsterdam. My special thanks go to Henk Zeevat and Helen de Hoop, who have encouraged me to pursue this line of research and gave valuable impulses and stimulation. Furthermore, I have to thank Anton Benz, Manfred Bierwisch, Paul David Doherty, Werner Frey, Bart Geurts, Gerhard Jäger, Paul Law, Klaus Robering, Paul Smolensky, and Rob van der Sandt. I am grateful to two anonymous referees for their very helpful comments.

REINHARD BLUTNER
Humboldt University, Berlin
Prenzlauer Promenade 149-152
D-13189 Berlin
Germany
[email protected]
http://www2.rz.hu-berlin.de/asg/blutner/

Received: 05.04.00
Final version received: 25.07.00

REFERENCES
Asher, Nicholas & Lascarides, Alex (1998), 'The Semantics and Pragmatics of Presupposition', Journal of Semantics, 15, 239-99.
Atlas, Jay David & Levinson, Stephen C. (1981), 'It-clefts, informativeness and logical form', in Peter Cole (ed.), Radical Pragmatics, Academic Press, New York, 1-61.
Beaver, David (1994), 'Accommodating Topics', in Peter Bosch & Rob van der Sandt (eds), Focus and Natural Language Processing. Volume 3: Discourse, IBM, Heidelberg, 439-48.
Bierwisch, Manfred (1983), 'Semantische Einheiten und konzeptuelle Repräsentation lexikalischer Einheiten', in Untersuchungen zur Semantik, Akademie-Verlag, Berlin, 61-99.
Bierwisch, Manfred (1996), 'Lexical information from a minimalistic point', in Chris Wilder (ed.), The Role of Economy Principles in Linguistic Theory, Akademie Verlag, Berlin, 227-66.
Blutner, Reinhard (1998), 'Lexical pragmatics', Journal of Semantics, 15, 115-62.
Boersma, Paul (1998), Functional Phonology, Holland Academic Graphics, The Hague.
Bresnan, Joan (to appear), 'Explaining morphosyntactic competition', in Mark Baltin & Chris Collins (eds), Handbook of Contemporary Syntactic Theory, Blackwell, Oxford.
Burzio, Luigi (1989), 'On the non-existence of disjoint reference principles', Rivista di Grammatica Generativa, 14, 3-27.
Burzio, Luigi (1998), 'Anaphora and soft constraints', in Pilar Barbosa, Danny Fox, Paul Hagstrom, Martha McGinnis, & David Pesetsky (eds), Optimality and Competition in Syntax, MIT Press, Cambridge, MA, 93-113.
Carston, Robyn (1998), 'The semantics/pragmatics distinction: a view from relevance theory', UCL Working Papers in Linguistics, 10, 1-30.
Choi, Hye-Won (1996), 'Optimizing Structure in Context', Ph.D. dissertation, Stanford University.
Copestake, Ann & Briscoe, Ted (1995), 'Semi-productive polysemy and sense extension', Journal of Semantics, 12, 15-67.
Deemter, Kees van & Peters, Stanley (eds) (1996), Semantic Ambiguity and Underspecification, CSLI Publications, Stanford, CA.
Dekker, Paul & Rooy, Robert van (this volume), 'Optimality theory and game theory: some parallels'.
Fanselow, Gisbert, Schlesewsky, Matthias, Cavar, Damir & Kliegl, Reinhold (1999), 'Optimal parsing', MS, University of Potsdam.
Geurts, Bart (1995), 'Presupposing', Ph.D. dissertation, University of Osnabrück.
Geurts, Bart (1999), Presuppositions and Pronouns, Elsevier, Oxford.
Geurts, Bart & Sandt, Robert A. van der (1999), 'Domain restriction', in Peter Bosch & Robert A. van der Sandt (eds), Focus: Linguistic, Cognitive and Computational Perspectives, Cambridge University Press, Cambridge.
Gibson, Edward & Broihier, Kevin (1998), 'Optimality theory and human sentence processing', in Pilar Barbosa, Danny Fox, Paul Hagstrom, Martha McGinnis, & David Pesetsky (eds), Optimality and Competition in Syntax, MIT Press, Cambridge, MA, 157-91.
Groenendijk, Jeroen & Stokhof, Martin (1991), 'Dynamic predicate logic', Linguistics and Philosophy, 14, 39-100.
Haspelmath, Martin (1999), 'Optimality and diachronic adaptation', Zeitschrift für Sprachwissenschaft, 18, 180-205.
Heim, Irene (1982), 'The semantics of definite and indefinite noun phrases', Ph.D. thesis, University of Massachusetts, Amherst.
Hendriks, Petra & Hoop, Helen de (to appear), 'Optimality theoretic semantics', Linguistics and Philosophy.
Hoop, Helen de (2000), 'Optimal scrambling and interpretation', in H. Bennis, M. Everaert, & E. Reuland (eds), Interface Strategies, KNAW, Amsterdam, 153-168.
Hoop, Helen de & Swart, Henriette de (1998), 'Temporal adjunct clauses in optimality theory', MS, OTS Utrecht.
Horn, Laurence R. (1984), 'Toward a new taxonomy for pragmatic inference: Q-based and R-based implicatures', in D. Schiffrin (ed.), Meaning, Form, and Use in Context, Georgetown University Press, Washington, 11-42.
Ito, Junko, Mester, Armin & Padgett, Jaye (1995), 'Underspecification in optimality theory', Linguistic Inquiry, 26, 571-613.
Jäger, Gerhard (1999), 'Optimal syntax and optimal semantics', handout for talk at DIP-colloquium. Available from http://www.zas.gwz-berlin.de/mitarb/homepage/jaeger/
Jäger, Gerhard & Blutner, Reinhard (to appear), 'Against lexical decomposition in syntax', in Proceedings of IATL 15, University of Haifa.
Jakobson, Roman (1941/1968), Child Language, Aphasia and Phonological Universals, Mouton, The Hague.
Kadmon, Nirit (1990), 'Uniqueness', Linguistics and Philosophy, 13, 273-324.
Kager, René (1999), Optimality Theory, Cambridge University Press, Cambridge.
Kamp, Hans (1981), 'A theory of truth and semantic representation', in Jeroen Groenendijk et al. (eds), Formal Methods in the Study of Language, Mathematisch Centrum, Amsterdam.
Kamp, Hans & Reyle, Uwe (1993), From Discourse to Logic: Introduction to Modeltheoretic Semantics of Natural Language, Formal Logic and Discourse Representation Theory, Kluwer Academic Publishers, Dordrecht.
Karttunen, Lauri (1998), 'The proper treatment of optimality in computational phonology', Xerox Research Centre Europe manuscript. ROA-258-0498, Rutgers Optimality Archive, http://ruccs.rutgers.edu/roa.html.
Kiparsky, Paul (1982), 'Word-formation and the lexicon', in F. Ingeman (ed.), Proceedings of the 1982 Mid-America Linguistic Conference.
Kuhn, Jonas (2000), 'Generation and parsing in optimality theoretic syntax: issues in the formalization of OT-LFG', to appear in Peter Sells (ed.), Formal and Empirical Issues in Optimality-theoretic Syntax, CSLI Publications, Stanford, CA.
Lee, Hanjung (2000), 'Markedness and word order freezing', to appear in Peter Sells (ed.), Formal and Empirical Issues in Optimality-theoretic Syntax, CSLI Publications, Stanford, CA.
Levinson, Stephen C. (1983), Pragmatics, Cambridge University Press, Cambridge.
Levinson, Stephen C. (1987), 'Pragmatics and the grammar of anaphora', Journal of Linguistics, 23, 379-434.
McCawley, James D. (1978), 'Conversational implicature and the lexicon', in Peter Cole (ed.), Syntax and Semantics 9: Pragmatics, Academic Press, New York, 245-59.
Nunberg, G. & Zaenen, A. (1992), 'Systematic polysemy in lexicology and lexicography', in K. Varantola, H. Tommola, T. Salmi-Tolonen, & J. Schopp (eds), Euralex II, Tampere, Finland.
Pesetsky, David (1997), 'Optimality Theory and syntax: movement and pronunciation', in Diana Archangeli & D. Terence Langendoen (eds), Optimality Theory: An Overview, Blackwell, Oxford, 134-70.
Prince, Alan & Smolensky, Paul (1993), 'Optimality theory: constraint interaction in generative grammar', MS, Rutgers University, New Brunswick, NJ and University of Colorado, Boulder (to appear, MIT Press, Cambridge, MA).
Reyle, Uwe (1993), 'Dealing with ambiguities by underspecification: construction, representation and deduction', Journal of Semantics, 10, 123-79.
Sandt, Robert A. van der (1992), 'Presupposition projection as anaphora resolution', Journal of Semantics, 9, 333-77.
Schwarzschild, Roger (1999), 'GIVENness, AvoidF and other constraints on the placement of accent', Natural Language Semantics, 13, 87-138.
Smolensky, Paul (1986), 'Information processing in dynamical systems: foundation of harmony theory', in David E. Rumelhart & James L. McClelland (eds), Explorations in the Microstructure of Cognition, MIT Press, Cambridge, MA, 194-281.
Smolensky, Paul (1996), 'On the comprehension/production dilemma in child language', Linguistic Inquiry, 27, 720-31.
Speas, Margaret (1997), 'Optimality theory and syntax: null pronouns and control', in Diana Archangeli & D. Terence Langendoen (eds), Optimality Theory: An Overview, Blackwell, Oxford, 134-70.
Tesar, Bruce & Smolensky, Paul (2000), Learnability in Optimality Theory, MIT Press, Cambridge, MA.
Williams, Edwin (1997), 'Blocking and anaphora', Linguistic Inquiry, 28, 577-628.
Wilson, Colin (1998), 'Bidirectional optimization and the theory of anaphora', MS, Johns Hopkins University, to appear in Jane Grimshaw, Geraldine Legendre, & Sten Vikner (eds), Optimality Theoretic Syntax, MIT Press, Cambridge, MA.
Wurzel, Wolfgang U. (1998), 'On markedness', Theoretical Linguistics, 24, 53-71.
Zeevat, Henk (1999), 'Explaining presupposition triggers', MS AC99, University of Amsterdam, available from http://www.hum.uva.nl/computerlinguistiek/henk/
Zeevat, Henk (this volume), 'Semantics in optimality theory'.
Zipf, George K. (1949), Human Behavior and the Principle of Least Effort, Addison-Wesley, Cambridge.
Journal of Semantics 17: 217-242
© Oxford University Press 2000

Bi-Directional Optimality Theory: An Application of Game Theory

PAUL DEKKER AND ROBERT VAN ROOY
University of Amsterdam

Abstract
Optimality Theory catches on in linguistics, first in phonology, then in syntax, and recently also at the semantics/pragmatics interface. In this paper we point to some parallels between principles employed in optimality theoretic interpretation, and notions from the well established field of Game Theory. Optimality theoretic interpretation can be defined as what we call an 'interpretation game', and optimality itself can be viewed as a solution concept for a game. More in particular, optimality can be characterized in terms of the game-theoretical notion of a 'Nash Equilibrium'.

1 INTRODUCTION
If John says that OTS is possibly right, we can infer from this that he thinks it is not obviously, or necessarily, right. What kind of inference is this? Suppose that from Possibly A we can infer semantically that it is possible that A is false. By this assumption we can easily account for the above inference, but we can no longer account for the fact that we might appropriately say OTS is possibly right, if not necessarily. The latter example makes clear that the above inference to the possibility that OTS is wrong cannot be conventionally associated with all sentences in which the sentential clause OTS is possibly right occurs. But how then should we account for the intuition that we can conclude that OTS might be wrong from what John says?

Following Grice (1975), it has become a common practice in the area of pragmatics to distinguish what is said by the speaker's use of a sentence (the semantic or truth-conditional meaning of a sentence), and what is meant by it on a particular occasion. Thus conceived, pragmatics is concerned with the study of what is meant by an utterance above its semantic, or truth-conditional, content by taking into account the issue whether the utterance is appropriate in its conversational context, i.e. with respect to the (common) beliefs and intentions of the participants of the conversation. The main motivation for this division of labour between semantics and pragmatics is to keep the semantics as simple as possible; it allows us to determine the semantic content of a sentence in a compositional way based on its syntactic structure, without making reference to the attitudes of speakers and hearers.
Following Gazdar (1979), the following general pipe-line architecture of the semantics/pragmatics interface has emerged:
1. What is said by a (declarative) sentence, its semantic content, is equated with its truth-conditions.
2. Truth-conditional content can be determined in a rather simple way compositionally without making reference to either what is (or could be) pragmatically implicated by what is said, or the attitudes of the participants of the conversation.
3. To determine what is pragmatically implicated we can, and have to, make use of the truth-conditional content of the sentence; what is potentially implicated might be overruled, or cancelled, if it conflicts with what is semantically entailed, as in our above example OTS is possibly right, if not necessarily.

Thus, according to Gazdar, the semantics/pragmatics interaction goes only one way; although what is pragmatically presupposed or implicated might depend on the semantic content of the sentence, semantics is autonomous from pragmatics. It seems clear to us that this strong Gazdarian picture of the interface must be wrong for the following reason: not only does what is pragmatically implicated depend on the attitudes of the participants of the conversation, but this might also be the case for the truth-conditions that a sentence has. Pragmatic notions like appropriateness, expectation/naturalness and relevance are used both to determine what is conversationally implicated and to determine what is asserted by a sentence. It is clear that this dependence of what is said, or asserted, on pragmatic notions undermines the goal to determine the truth-conditions of sentences in a compositional way. Natural-language sentences are highly context-dependent; their truth conditions depend not only on the words used, but also on the circumstances in which they are used. The crucial point is that it seems impossible to explain systematically the truth-conditions that sentences have without referring to the beliefs, presuppositions and intentions of the participants of the conversation. For an illustrative example, let us consider briefly the process of anaphora resolution for a sentence like He is tall. It is clear that this sentence is highly underspecified or ambiguous; in different contexts the pronoun might refer to different individuals. Its resolution implies reference to such things as focus (Sidner 1983), the syntactic position (subject/non-subject) of the antecedent (Grosz et al. 1995), but also to the scenarios/prototypical situations involved (e.g. Sanford & Garrod 1981).

Although the meaning of the sentence is highly context-dependent, the sentence has a more constant meaning, too; we might say that in all contexts the pronoun refers to the most salient male individual in that context. A Gazdarian might then propose to represent this contextual information in a more or less objective way, without referring to the attitudes of the agents. What is the most salient individual in a context? For some contexts we can give rather objective criteria. For instance, it seems clear that when we utter the above sentence in the context where Bill is next to John has just been uttered, the pronoun will refer to Bill, but if the foregoing sentence had been John is next to Bill, the pronoun would refer to John. The objective criterion in this case is that the (individual denoted by the) subject of a preceding sentence is more salient than the (individual denoted by the) object. But now consider the following discourse: Bill tickled John. He squirmed. According to the above rule the pronoun should refer to Bill. It is clear, however, that according to its most reasonable interpretation the pronoun does not refer to Bill, but to John. Why? Because we assume that it is the tickled person who has reason to squirm; the assertion that John squirmed is more in accordance with the expected scenario triggered by the previous sentence than the assertion that it is Bill who squirmed. We conclude that the speaker asserted that John squirmed, i.e. the constraint that the pronoun refers to the most salient person in its context of interpretation is overruled by the constraint that demands that what is said should be natural in its context of interpretation, i.e. in accordance with the relevant scenario. The triggered scenarios depend on world-knowledge and expectations of the participants of a conversation, which suggests that the relevant contextual parameters cannot be given without making reference to the attitudes of the speakers. But now we are running ahead of ourselves.
For we might think of representing the relevant contextual parameter in the context of interpretation of the sentence in which the pronoun occurs as an 'objective' salience order, if we allow, with Lewis (1979), for a rule of accommodation of comparative salience. In principle this is feasible, but note that in this case it is the process of accommodation that is governed by notions like appropriateness, naturalness or relevance, which cannot be described without making reference to the attitudes of agents. Notice that according to this variant the relevant contextual parameter that helps to determine what is said (its truth conditions) by an utterance, the salience ordering, crucially depends on the utterance itself; whether and how the salience order should be accommodated depends on what would have been said by this utterance according to the different possible salience orderings. Observe also that in this variant some constraints can be overruled by our general pragmatic notions; in this case not the constraint that a pronoun should refer to the most salient individual in its context, but rather the constraint that the salience order determined after the interpretation of the first sentence of a discourse will function as the relevant salience order to interpret the anaphoric pronouns of the following sentence.
Bi-Directional Optimality Theory: An Application of Game Theory
2 OPTIMALITY THEORETIC INTERPRETATION

Recently, various phenomena on the semantics/pragmatics interface, like the ones discussed above, have been given an optimality theoretic formulation (Blutner, Hendriks & de Hoop, de Hoop & de Swart, Jäger, Zeevat). In this section, and in section 4, we give a short overview of the various types of analyses that have been proposed, and illustrate these by means of a few examples.
2.1 One-dimensional optimality
According to the proposed application of Optimality Theoretic principles by de Hoop & de Swart (to appear) and Hendriks & de Hoop (2001) to the theory of interpretation, what compositional semantics gives us is a radically underspecified notion of meaning represented by a possibly infinite set of
The above example shows that we cannot systematically determine the semantic content of a sentence in a compositional way based on its syntactic structure, without making reference to the attitudes of speakers and hearers, if we equate the semantic content of a sentence with its truth-conditions. So what should we do? Give up compositionality, or give up the assumption that what should be determined compositionally are the truth-conditions of a sentence? The former, radical, option would almost surely result in giving up the distinction between semantics and pragmatics, as was proposed in the old days of generative semantics. According to the latter option, compositional semantics still has a role to play. However, the semantic content of a sentence is not fully determined and does not give rise to clear-cut truth-conditions; it is left underspecified. We have only discussed pronouns above, but similar remarks can be, and have been, made for the interpretation of other context-dependent constructions like modals (Kratzer 1977), presuppositions (van der Sandt 1992), quantifier scope (Parikh 1991), tenses (Asher & Lascarides 1993), adjectives (Blutner 1998), and quantified constructions (Hendriks & de Hoop 2001). For all those cases it has been proposed that what should be determined compositionally should be left rather underspecified, and that to determine the actual truth-conditions of a sentence we have to rely on constraints motivated by principles of rational communication as given, for instance, by Grice's maxims of conversation. This results, obviously, in a new formulation of the semantics/pragmatics interface.
(1) Often when I talk to a doctor$_i$, the doctor$_{\{i,j\}}$ disagrees with him$_{\{i,j\}}$.

In the interpretation of this example two constraints are at work:

(B) If two arguments of the same semantic relation are not marked as being identical, interpret them as being distinct
(DOAP) Don't Overlook Anaphoric Possibilities
In example (1), the two constraints have conflicting effects. If (DOAP) is fully satisfied, that is, if both 'the doctor' and 'him' are interpreted as anaphoric upon 'a doctor', then (B) is violated. And if (B) is satisfied, then at least either 'the doctor' or 'him' remains unresolved. Intuitively, this seems the best solution, and Hendriks & de Hoop therefore use this example to show that constraint (B) is harder than (DOAP). The (DOAP)-principle can be
interpretations of a well-formed syntactic structure. In addition, optimality theory gives us a ranked set of constraints that allow us to select the optimal interpretation associated with a particular syntactic structure. These constraints should of course be as general as possible, and the rankings between those constraints should also, if possible, be valid for a wide range of languages, based on general principles of rational communication. In order to illustrate how things might work out in such a theory, consider again the example that we discussed above with an anaphoric pronoun. The example is of the form aRb. He is P, where in the first sentence a and b are both names for male individuals. Discourses of this form are potentially ambiguous, or underspecified, because the pronoun might refer back to either a or b. But we can say something more; on the basis of empirical data we might observe that the pronoun will typically refer back to the subject expression, i.e. a. We can state this observation explicitly in a constraint. This constraint is very particular, but we might embed it within a more general one, if we make use of the notion of comparative salience. In whatever way we do this, the important point is that the relevant constraint should not be too hard; in some circumstances it might be overruled. In the discourse Bill tickled John. He squirmed discussed above, for instance, it does not seem natural to state that Bill squirmed after the first sentence. Because it seems reasonable, with an eye to the communicative aims, to assume that the constraint on naturalness is more important than the constraint on salience, the constraint that in our case demands that the pronoun should refer to the subject expression of the previous sentence is overruled.
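The selection mechanism just described can be sketched in a few lines of Python. The sketch is purely illustrative: the two constraint functions, their violation counts, and the encoding of candidate interpretations as referent names are assumptions made for this example, not part of the cited proposals. Each candidate receives a tuple of violation counts in ranking order, and the optimal interpretation is the one whose tuple is lexicographically smallest, so that a lower-ranked constraint only matters when all higher-ranked constraints tie.

```python
# Toy one-dimensional OT evaluation for "Bill tickled John. He squirmed."
# Candidates are the possible referents of the pronoun "He".
CANDIDATES = ["Bill", "John"]


def salience(referent):
    """Violated (1) if the pronoun does not pick up the previous subject."""
    return 0 if referent == "Bill" else 1


def naturalness(referent):
    """Violated (1) if the reading clashes with the expected scenario.

    Assumption for this example: the tickled person (John) is the one
    who is expected to squirm.
    """
    return 0 if referent == "John" else 1


def optimal(candidates, ranking):
    """Return the candidates with the lexicographically smallest violations.

    `ranking` lists the constraints from highest-ranked to lowest-ranked.
    """
    def profile(candidate):
        return tuple(constraint(candidate) for constraint in ranking)

    best = min(profile(c) for c in candidates)
    return [c for c in candidates if profile(c) == best]


# Naturalness outranks salience, as argued in the text.
print(optimal(CANDIDATES, [naturalness, salience]))  # ['John']
# With the salience constraint alone, the subject wins.
print(optimal(CANDIDATES, [salience]))  # ['Bill']
```

Reversing the ranking to `[salience, naturalness]` selects Bill again, which shows that it is the ranking, not the constraints themselves, that decides the outcome.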
Thus, although pronouns are meant to refer back to subject expressions of previous sentences, this will only result in an optimal interpretation in case the stronger constraint of naturalness is also met. Another example, discussed in Hendriks & de Hoop, is the following:
2.2 The Q- and I-principles
In his seminal paper on Logic and Conversation, Grice (1975) tried to account for so-called pragmatic inferences by making use of four maxims of conversation: the maxims of quality, quantity, relation, and manner. More recently, some attempts have been made to reduce these maxims to, and explicate them in terms of, some more principled rules of, or constraints on, rational behaviour in communication. Valuable contributions in this direction have been made especially by Atlas & Levinson (1981) and Horn (1984), who seek to reduce the maxims of quantity, relation, and manner to the following two principles: the Q-principle (implementing Grice's first maxim of quantity), which advises the speaker to say as much as he can to fulfil his communicative goals, and the I-principle (called R-principle by Horn 1984, and implementing the rest of the Gricean maxims except for quality), which advises the speaker to say no more than he must to fulfil his

1 The idea to compare not only different outputs with each other to determine the optimal interpretation, but also to take different inputs into account, can be traced back to Prince & Smolensky's (to appear) principle of Lexicon Optimization (section 9.3). A bi-directional view on optimality implicitly also plays an important role in the OT learning algorithm (Tesar & Smolensky, to appear). According to this algorithm each piece of positive evidence (structural description) about the correct ordering of constraints brings with it a body of implicit negative evidence; the chosen description is preferred to the given competitors.
overruled in order to satisfy (B), and the 'optimal' interpretation is that either 'the doctor' and not 'him' is anaphoric upon the antecedent 'a doctor', or the pronoun and not the definite description is.

So far we have sketched an optimality theoretic formulation of only one of the two types of pragmatic inferences which we discussed in the first section of this paper. So how should we account for the case with which we began our story: the scalar implicature from $\Diamond A$ to $\neg\Box A$? Our intuitive explanation for this implicature was that the speaker did not think it was necessary that OTS was right, because otherwise he would have said so, i.e. he would have used another expression. It is not entirely clear how to account for this reasoning in terms of the above sketched one-dimensional search for optimality, where the input is given by a single syntactic structure and no reference is made to alternative expressions that the speaker might have used. Blutner (MS) has recently argued that an account of scalar implicatures requires us to take into consideration what the speaker could have said, and proposed to go from a one-dimensional to a two-dimensional search for optimality.1 This two-dimensional view was mainly motivated by a reduction of Grice's maxims of conversation to two principles.
2 Notice the resemblance with Sperber & Wilson's (1986) Relevance Theory, according to which meaning (optimal relevance) can be thought of as a balance between the two competing forces of maximization of contextual effect and minimization of processing effort.
communicative goals. By means of the I-principle we can explain, for instance, why in many contexts we can use (short, and thus efficient) pronouns to refer to individuals, instead of long eternal definite descriptions, and it can also help to explain why in many cases the conjunctive connective and gives rise to a temporal, or even causal, interpretation. The Q-principle is responsible for the so-called scalar implicatures, and makes essential reference to alternative expressions the speaker could have used. Although both principles have the effect that the hearers can conclude more from the utterance than what is explicitly said by it, the strengthenings due to the I- and Q-principles typically go in opposite directions. As a result, the two principles sometimes advise the speaker to do opposite things, and thus we would expect that the hearers sometimes do not know what to make of the utterance. For instance, if you say John was able to solve the problem, I can conclude by means of the I-principle that John actually solved the problem, while the Q-principle gives rise to the opposite conclusion that John actually did not solve the problem. (For otherwise you should have said he did so.) Horn (1984), following Zipf (1949), gives an interesting motivation for why the I- and Q-principles seem to give rise to opposite conclusions. He argues that the principles can be seen as representations of rational goals of competing forces to minimize their efforts: the I-principle represents the speaker's goal to minimize the effort to communicate as much as possible, while the Q-principle can be seen to represent the hearer's goal to minimize his effort to understand.2 Looking at both principles from a minimization point of view has the effect that the I-principle and the Q-principle should be seen from two different perspectives: the I-principle from the speaker's perspective, and the Q-principle from the hearer's perspective.
Interestingly, the principles can be viewed, equivalently it seems, from a maximization point of view when we switch roles. That is, an I-maxim requiring a cooperative speaker to say no more than needed will lead a rational hearer to get as much as possible out of what the speaker says; that is, in such a cooperative setting, the I-principle relates to a hearer's goal to maximize the relevance, or informativity, of a given utterance. Conversely, the Q-principle advises the speaker to maximize his contribution to the goal of being as informative as he can (as indeed it did in Grice's formulation). The two points of view thus collaborate to achieve two mutually dependent goals of the interlocutors: to maximize the cooperative and mutual goal of informativity, and to minimize individual efforts.
2.3 Two-dimensional optimality theoretic interpretation
(2) Two-dimensional OT (Strong Version)
A representation-meaning pair (r, m) is optimal iff it satisfies both the Q- and the I-principle, where:
(Q) (r, m) satisfies the Q-principle iff there is no other pair (r', m) such that (r', m) > (r, m)
(I) (r, m) satisfies the I-principle iff there is no other pair (r, m') such that (r, m') > (r, m)

How does this blocking due to the Q-principle work? Consider the scalar implicature again from Possibly A to Not necessarily A. Let us suppose that the speaker knows all about the possibility of A, and that he has the opportunity to say Possibly A ($\Diamond p$), Necessarily A ($\Box p$) and the negations of these modal possibilities ($\neg\Diamond p = \Box\neg p$ and $\neg\Box p = \Diamond\neg p$, respectively). Let us also assume, as seems quite natural, that $\Box p \models \Diamond p$. Given these

3 Boersma (1998) has recently made a similar move in phonology. He argues that sound structures reflect an interaction between the articulatory and perceptual principles of efficient and effective communication: the speaker-oriented principle of minimization of articulatory effort and the hearer-oriented principle of minimization of perceptual confusion.
Blutner (1998, 1999) has recently given the I- and Q-principle a slightly different formulation such that the Gricean maxims can be seen as being part of a two-dimensional optimality theoretic framework of disambiguation. The I-principle is formulated much like it was above from a maximization point of view, and helps to select the most coherent, or relevant, interpretation. This principle corresponds to the one-directional view on optimality theoretic interpretation as proposed by Hendriks & de Hoop (2001) and de Hoop & de Swart (to appear), which exclusively adopt the hearer's perspective on disambiguation. What is interesting is that Blutner also implements the Q-principle within an Optimality Theoretical framework, thereby also taking the speaker's perspective into account. Where the I-principle compares different possible interpretations for the same syntactic expression, the Q-principle compares different possible syntactic expressions that the speaker could have used to communicate the same meaning. The interesting feature of Blutner's formulation of the Q-principle within two-dimensional OT is that although it compares alternative syntactic inputs to one another, it still helps to select the optimal meaning among the various possible outputs of the single actual syntactic input given, by acting as a blocking mechanism.3 The strong version of Blutner's two-dimensional OT can be formulated as follows (we here relate pairs (r, m) of possible representations (r) and meanings (m) by means of an ordering relation '>', 'being more efficient'):
assumptions, the speaker knows that only one of three logical possibilities obtains: (i) that $\Box p$ (and, hence, $\Diamond p$), (ii) that $\Diamond p \wedge \Diamond\neg p$ (so $\neg\Box\neg p \wedge \neg\Box p$), or (iii) that $\Box\neg p$ (i.e.
3 GAME THEORY AND STRONG OPTIMALITY
The ranking and judging of representations and meanings in optimality theoretic interpretation has a structure which resembles principles developed in the well-investigated field of Game Theory. In this section we present a game-theoretical formulation of Blutner's notion of optimality. (For an in-depth introduction to game theory, cf. e.g. Osborne & Rubinstein 1994.) The first subsection presents an introduction to some of the basics of Game Theory, in particular to the notion of a strategic game. In the next subsection we present the notion of a 'Nash Equilibrium', a renowned solution concept in Game Theory. In the third subsection we then show how optimality theoretic interpretation can be given a formulation in terms of an interpretation game, and that Blutner's concept of optimality corresponds to precisely this concept of a Nash Equilibrium.

3.1 A formal definition of games
In Game Theory, a 'strategic game' is the formal rendering of a game that can be played with a specific number of players, who can play various roles in the game. In strategic games it is assumed that the players all make one choice at the beginning of the game. The players (simultaneously) choose a strategy, and then they play the game, each according to the strategy chosen.
Furthermore, there is a common preference to communicate as much as possible, that is, a preference for (i) and (iii) over (ii). In this situation, saying Possibly A ($\Diamond p$) implicates $\Diamond\neg p$. For if the speaker had information to the effect that $\neg\Diamond\neg p = \Box p$, he would have said Necessarily A, which is more informative. As he has not done so, and as long as there is no reason to suppose otherwise, the hearer is entitled to infer $\Diamond\neg p$. So, although the sentence Possibly A is logically consistent with both Necessarily A and Possibly not A, the first is blocked (by the Q-principle), because of the existence of an alternative syntactic form that would express that meaning in a more efficient way.
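The blocking step can be made concrete with a small Python sketch of the Q- and I-conditions in (2). Everything specific in it is an assumption made for illustration: the encoding of forms and states as strings, the restriction to states (i) and (ii), and above all the ordering `BETTER`, which contains only the single efficiency comparison that the argument relies on. The common informativity preference among the states is deliberately left out, so the sketch illustrates the Q-step only and is not a full model of the example.

```python
# Candidate form-meaning pairs: a form is paired with each state it is true in.
# States follow the text: "i"  = necessarily p,
#                         "ii" = possibly p and possibly not p.
PAIRS = {
    ("Possibly A", "i"),
    ("Possibly A", "ii"),
    ("Necessarily A", "i"),
}

# BETTER contains (x, y) whenever x > y, i.e. x is the more efficient pair.
# Assumption: only the one comparison used in the text is encoded here.
BETTER = {
    (("Necessarily A", "i"), ("Possibly A", "i")),
}


def satisfies_q(pair, pairs=PAIRS, better=BETTER):
    """(Q): there is no other pair (r', m) with (r', m) > (r, m)."""
    r, m = pair
    return not any((other, pair) in better
                   for other in pairs
                   if other[1] == m and other[0] != r)


def satisfies_i(pair, pairs=PAIRS, better=BETTER):
    """(I): there is no other pair (r, m') with (r, m') > (r, m)."""
    r, m = pair
    return not any((other, pair) in better
                   for other in pairs
                   if other[0] == r and other[1] != m)


def strongly_optimal(pairs=PAIRS, better=BETTER):
    """Pairs that satisfy both conditions of definition (2)."""
    return {p for p in pairs
            if satisfies_q(p, pairs, better) and satisfies_i(p, pairs, better)}


print(satisfies_q(("Possibly A", "i")))   # False: blocked by "Necessarily A"
print(satisfies_q(("Possibly A", "ii")))  # True
print(sorted(strongly_optimal()))
# [('Necessarily A', 'i'), ('Possibly A', 'ii')]
```

Under this partial ordering, the only meaning left for Possibly A is state (ii), which corresponds to the implicature that A is not necessary.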
It is assumed that the players know what options are available to them and to the other players, and what the outcomes of the game are, given the actions chosen. A strategic game is formalized as a triple $\langle N, (A_i), (\geq_i) \rangle$ which consists of a set of players $N$ and, for each player $i \in N$, a non-empty set of possible actions $A_i$ and a preference relation $\geq_i$ over the product $\times_{j \in N} A_j$ of possible actions of all players. The intuitive idea behind this definition can be put as follows. Each player $i$ can choose any action from his alternatives $A_i$. If all the players have made their choice, we get what is called an 'action profile'. Intuitively, such a profile is one of the possible courses which a game may take. If our players are $1, \ldots, n$ and if they choose actions $a_1, \ldots, a_n \in \times_{j \in N} A_j$, then that's one possible 'run' of the game. Players are assumed to choose an action which has a preferred result. Preferences over results are given by the preference relations $(\geq_i)$, which are taken to depend wholly and only on the particular actions which the players may choose. Thus, if the players $1, \ldots, n$ choose actions $a^* = a_1, \ldots, a_n$, respectively, then the result may be better for one player $i$ than when they choose $b^* = b_1, \ldots, b_n$. In that case, we find that $a^* >_i b^*$, that is, $a^* \geq_i b^*$ and not $b^* \geq_i a^*$. Obviously, it may be the case that $a^* >_i b^*$ and $b^* >_j a^*$ for two profiles $a^*$ and $b^*$ and players $i$ and $j$. (This is the case, typically, when two players have competing or conflicting interests.) In general it is assumed that preference relations are reflexive, transitive, and complete. It may be clear, even from these introductory comments, that the consequences of a particular choice of player $i$ for action $a_i$ generally depend, not only on this particular choice, but also on the choices which the other players make. Thus, if the players $1, \ldots, n$ choose the action profile $a^* = a_1, \ldots, a_n$, respectively, then player $i$ may be happy about the result, but if player $i$ sticks to his choice $a_i$, while the others $1, \ldots, i-1, i+1, \ldots, n$ happen to choose $b_1, \ldots, b_{i-1}, b_{i+1}, \ldots, b_n$, the result may be less welcome for $i$, of course. On the other hand, if we may assume that the other players $1, \ldots, i-1, i+1, \ldots, n$ choose $a_1, \ldots, a_{i-1}, a_{i+1}, \ldots, a_n$, respectively, then player $i$ is assumed to choose an action $a_i$ such that the outcome or profile $a^* = a_1, \ldots, a_n$ is at least as good as any alternative profile $a_1, \ldots, a_{i-1}, b_i, a_{i+1}, \ldots, a_n$ which may result from an alternative choice of $i$ for $b_i$. A note on notation: if we have a profile $a^* = a_1, \ldots, a_n$, then we use $a^*_{-i}$ to indicate the list of the profile's strategies of all players except $i$, i.e. $a_1, \ldots, a_{i-1}, a_{i+1}, \ldots, a_n$, and we use $(a^*_{-i}, b_i)$ to indicate the profile which is like $a^*$ with the sole difference that $i$ chooses $b_i$ instead of $a_i$. Typically, of course, $a^* = (a^*_{-i}, a_i)$. In order to clarify these notions a bit more, consider the following somewhat stylized example. A famous two-player game is a 'coordination
game' called 'Bach or Stravinsky'.4 In this game two persons want to go out. They can choose between the performance of a concert of Bach and the performance of a concert of Stravinsky. One player (Bonnie) prefers to go to Bach, the other (Clyde) prefers Stravinsky, but the main concern of both players is to go out together. Formally, this corresponds to a game $\langle N, (A_i), (\geq_i) \rangle$, where
(3) the set of players $N = \{b, c\}$ consists of Bonnie and Clyde
(4) the sets of possible actions of Bonnie and Clyde, $A_b = A_c = \{B, S\}$, consist of (a choice for) Bach and Stravinsky
(5) $(B, B) >_b (S, S) >_b (B, S) >_b (S, B)$
(6) $(S, S) >_c (B, B) >_c (B, S) >_c (S, B)$
A convenient representation of two-player games can be given in a two-dimensional matrix, in which the various rows represent the possible actions of player one (Bonnie) and the columns the possible actions of player two (Clyde):

(7)
          B         S
    B   (3, 2)    (1, 1)
    S   (0, 0)    (2, 3)
In this matrix, we have filled in payoff pairs (n, m) which indicate the relative payoff of a specific action profile (x, y) for Bonnie and Clyde, respectively. Thus, the pair (3, 2) indicates the relative payoff of Bonnie (3) and Clyde (2) when Bonnie and Clyde both choose Bach. For Bonnie this constitutes a better payoff than the one in which both choose Stravinsky, because in that case we find a relative payoff pair (2, 3) where Bonnie's payoff (2) is less than 3. For a similar reason, the last profile is better for Clyde, because he prefers a joint choice for Stravinsky over a joint choice for Bach. However, both of these profiles are better than the two in which

4 Originally known as 'The Battle of the Sexes'.
The profiles of this game are (B, B), (B, S), (S, B), and (S, S), where (x, y) indicates the profile which obtains when Bonnie chooses x and Clyde chooses y. Since Bonnie and Clyde definitely prefer to go out together, they both prefer (B, B) and (S, S) over the two other profiles (B, S) and (S, B). Since Bonnie moreover prefers Bach, she also prefers (B, B) over (S, S) and (B, S) over (S, B). Similarly, Clyde prefers (S, S) over (B, B), and (B, S) over (S, B). The preferences of Bonnie and Clyde, $>_b$ and $>_c$, can thus be summarized as follows:
they do not go out together, and in which they at best reach a payoff of only one.

3.2 Nash equilibria as solutions
(8) $\forall i \in N$ and $a_i \in A_i$: $a^* \geq_i (a^*_{-i}, a_i)$
Intuitively, this says the following. A Nash Equilibrium is a profile in which each player's action is a best response to the choices of the other players in that profile. For no player $i$ is there any alternative $a_i$ for the action $a^*_i$ which he chooses in $a^*$, by means of which he can get a better payoff, given that all the other players choose as they choose in $a^*$. A Nash Equilibrium clearly need not give the best possible result which one player might prefer. A player gets the best payoff relative to the choices of the other players in the profile, and this really is an equilibrium because this holds for all players. If we now return to the example which we discussed above we can see that it has two Nash Equilibria, the one in which both Bonnie and Clyde choose Bach, and the one in which both choose Stravinsky. It is instructive to see why these profiles qualify as equilibria. The profile (B, B) is a Nash Equilibrium because, given that Bonnie chooses Bach, the best possible outcome for Clyde obtains when he chooses Bach as well (since $(B, B) >_c (B, S)$), while given that Clyde chooses Bach, Bach is also the very best choice for Bonnie (since $(B, B) >_b (S, B)$). Something analogous holds of the (S, S) equilibrium. In both profiles, neither of the two players has reason to deviate from the choice he actually makes. Surely, when Bonnie considers the Nash Equilibrium (S, S) she might reason as follows: 'well, I better choose Bach rather than Stravinsky, because given that choice, it is better for Clyde to choose Bach as well, and I like (B, B) better than (S, S)' and therefore choose Bach after all. However, this type of reasoning does not by itself constitute a sound solution concept, because if Clyde also reasons this way, he will choose Stravinsky, and the outcome is (B, S), a profile that is worse, for both Bonnie and Clyde, than the outcome of each of the two mentioned equilibria.
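The equilibrium check described in this paragraph can be written out as a short Python sketch. The numerical payoffs are those of matrix (7); the function and variable names are hypothetical, and the use of numbers is only a convenient way of encoding the ordinal preferences in (5) and (6). A profile counts as a Nash Equilibrium when no player can obtain a strictly better payoff by changing only his or her own action.

```python
from itertools import product

# Available actions per player: Bach ("B") or Stravinsky ("S").
ACTIONS = {"bonnie": ["B", "S"], "clyde": ["B", "S"]}

# Payoffs (Bonnie, Clyde) for each profile (Bonnie's choice, Clyde's choice),
# copied from matrix (7).
PAYOFF = {
    ("B", "B"): (3, 2),
    ("B", "S"): (1, 1),
    ("S", "B"): (0, 0),
    ("S", "S"): (2, 3),
}


def nash_equilibria(actions, payoff):
    """Return all profiles from which no player gains by deviating alone."""
    players = list(actions)
    equilibria = []
    for profile in product(*(actions[p] for p in players)):
        stable = True
        for i, player in enumerate(players):
            for alternative in actions[player]:
                # Profile in which only player i switches to `alternative`.
                deviation = profile[:i] + (alternative,) + profile[i + 1:]
                if payoff[deviation][i] > payoff[profile][i]:
                    stable = False
        if stable:
            equilibria.append(profile)
    return equilibria


print(nash_equilibria(ACTIONS, PAYOFF))  # [('B', 'B'), ('S', 'S')]
```

The profile (B, S), which results when both players try to steer towards their own favourite equilibrium, is not returned: each player could do better by switching unilaterally.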
The nice point about the two Nash Equilibria in the Bach or Stravinsky game is that the two equilibria are not
One of the central notions in game theory is that of a solution concept. In general, solution concepts are abstract and formal specifications of certain optimality recipes. They relate to the reasonable choices which players may make, given some notion of rationality and common knowledge. A very well known solution concept is that of a 'Nash Equilibrium'. A Nash Equilibrium of a strategic game $\langle N, (A_i), (\geq_i) \rangle$ is an action profile $a^* \in \times_{j \in N} A_j$ such that:
absolutely optimal profiles for both players, but optimal profiles relative to the other's choices. Both equilibria are satisfying for both players in this sense, or 'stable'. In the definition of a Nash equilibrium, the only preferences that really count are those between two action profiles $a^*$ and $b^*$ if their only difference lies in the choice of $i$, i.e. if $a^*_{-i} = b^*_{-i}$. Furthermore, non-strict preferences, where both $a^* \geq_i b^*$ and $b^* \geq_i a^*$, do not count either. (In a Nash Equilibrium, players may have alternative options which are equally good, as long as they are not strictly better.) For this reason, Nash Equilibria in two-player games can be visualized by drawing arrows between two profiles on the same row, or in the same column, with the following meaning:

s ('him', Lbb) and ('self', Lbb) $>_H$ ('self', Lbx). In the representation of an interpretation game we can visualize this kind of blocking by removing arrows. That is, if a profile points to a Nash Equilibrium, then all pointers to that profile can be removed. If we, thus, remove the arrows pointing to profiles which point to the equilibrium o in the example above, then we get the following, derived game:
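(24) [Diagram of the derived interpretation game: rows 'self' and 'him', columns Lbb and Lbx; the arrows are not recoverable.]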
In the resulting interpretation game we find two Nash Equilibria, corresponding to the two BJ-optimal solutions in the original game. This result can be generalized for more involved games with more than two representations and meanings. In such more involved games, the removal of preferences may yield games with new equilibria, and these in their turn may block yet other alternatives. Thus, if we successively keep on removing preferences for blocked profiles, then we collect more and more possible solutions, and if this process reaches a fixed point, then all the resulting Nash Equilibria of the fixed point correspond to the BJ-optimal pairs in the original game. As a matter of fact, such a procedure is the Interpretation Game Theoretical counterpart of Jäger's algorithm. Formally, this procedure can be specified as follows. Let $I_0$ be an interpretation game $\langle N, (A_S, A_H), (>_{S,0}, >_{H,0}) \rangle$, with $>_i$ a strict preference relation. Then we define the game $I_{n+1}$, which is the game $I_n$ with updated preferences, as follows:
(25) $I_{n+1} = \langle N, (A_S, A_H), (>_{S,n+1}, >_{H,n+1}) \rangle$ with
1. $>_{S,n+1} \;=\; >_{S,n} \setminus \{ (y, z) \mid \exists x \in NE_{I_n} : x >_{H,n} y \}$ and
2. $>_{H,n+1} \;=\; >_{H,n} \setminus \{ (y, z) \mid \exists x \in NE_{I_n} : x >_{S,n} y \}$
(In this definition $NE_{I_n}$ indicates the set of Nash Equilibria of game $I_n$.)

If we now construct a sequence of interpretation games $I_0, \ldots, I_n, \ldots$ and if we find that $I_{n+1} = I_n$, then:

Observation 3 (BJ-solutions are Nash in updated games)
• the BJ-optimal solutions of $I_0$ are the Nash Equilibria of $I_n$
This fact can be proved by comparing the update of preferences with Jäger's algorithm for computing optimal solutions. Jäger's procedure involves the iterated generation of optimal and blocked profiles. In the
first run of this procedure, profiles are accepted as optimal that are Nash Equilibria in $I_0$,7 and next those are blocked that have an optimal alternative. It is relatively easily seen that:

1. updates of preferences preserve Nash Equilibria;
2. if an update produces a new Nash Equilibrium, then the same profile was BJ-optimal at earlier stages;
3. if we reach a fixed point $I_n$, then all profiles either are a Nash Equilibrium (have no arrow leaving that profile), or are blocked (point at a Nash Equilibrium).
7 Since the procedure starts with empty sets of blocked and optimal profiles, the selected optimals (r, m) are those for which there is no preferred alternative (r', m'); of course it may be that there is such an alternative for a Nash Equilibrium, in case r' ≠ r and m' ≠ m. However, if (r, m) really is a Nash Equilibrium, then it will never get blocked, and as soon as (r', m') is qualified as either optimal or blocked at some stage, then (r, m) gets accepted as optimal at the next stage. Well-foundedness of Jäger's > guarantees this effect.

8 The Game Theoretical formulation of BJ-optimality is close in spirit to von Neumann & Morgenstern's (1944) notion of a Stable Set in a coalitional game. Stable Sets are minimal sets of outcomes for which there are no other preferable stable outcomes. Although the concept is framed in terms of outcomes of coalitional games, the idea is clearly similar. Cf. e.g. Osborne & Rubinstein (1994: 278ff) for more discussion.
Here we witness one merit of viewing optimality theoretic interpretation in terms of (interpretation) games: BJ-optimal solutions can be characterized by means of the independently motivated and well-studied notion of a Nash Equilibrium.8

The update procedure defined above can be illustrated by means of a somewhat artificial but illuminating example. Suppose the possible representations are linearly ordered, so that we can number them: $r_0, r_1, \ldots$, and that the possible meanings are linearly ordered, too: $m_0, m_1, \ldots$. In this game $I_0$ there is one Nash Equilibrium, which is $(r_0, m_0)$. If we update the preferences in this game, then all H's preferences for $(r_1, m_0), (r_2, m_0), \ldots$ are removed, because $(r_0, m_0)$ is a better Nash Equilibrium for S, and S's preferences for $(r_0, m_1), (r_0, m_2), \ldots$ are removed because $(r_0, m_0)$ is a better Nash Equilibrium for H. Thus, in $I_1$, profile $(r_1, m_1)$ comes out as a Nash Equilibrium as well, because the preferences for $(r_1, m_0)$ and $(r_0, m_1)$ have been removed. But then we can update again, and remove all H's preferences for $(r_2, m_1), (r_3, m_1), \ldots$ and S's preferences for $(r_1, m_2), (r_1, m_3), \ldots$. Thus, in $I_2$, profile $(r_2, m_2)$ comes out as a Nash Equilibrium as well. In short, we will find that in game $I_n$ we have Nash Equilibria $(r_i, m_i)$ for all $i \leq n$, so that we construct the diagonal as the solution of $I_0$. The last example also constitutes inspiration for the following proposition:
Observation 4 (linearizing unambiguous interpretation games)
• If the set of solutions of an interpretation game is a one-to-one relation between representations and meanings, then the preferences in the game can be equivalently stated by means of a linear order of representations and meanings.
Proof. If the solutions constitute such a one-to-one relation, and if we order the solutions, then we can identify the i-th representation r_i with the representation in the i-th solution, and the i-th meaning with the meaning in the i-th solution; then we can take H's preferences to be defined by precedence in the sequence of meanings, and S's preferences by precedence in the sequence of representations, and the resulting set of solutions is the diagonal, the set of solutions we started out with. End of Proof.
4.3 On two × two interpretation games
In this section we give a systematic study of two × two interpretation games, that is, games with four profiles. If we thus restrict our attention, we can in principle distinguish seven possible types: one in which there is no solution, one in which there is one solution, one in which there are four, one in which there are three, and three in which there are two:
(26) [Seven diagrams, one per game type; not recoverable from the source.]
(All other types are logical permutations of these types of games.) As we already observed above, the first case is excluded by Jäger's well-foundedness of >, and the second two are void. A three-solutions game is in a sense a combination of the first two two-solutions games. The first two-solutions game models ambiguity, the second synonymy and (expressive) incompleteness, and the last is the (ideal) diagonal type. It is interesting to note that the last type of interpretation can again be obtained in a variety of ways. All of the following matrices have the diagonal as a solution:
[Diagrams of the matrices; not recoverable from the source.]
(Besides, any matrix that is a mirror image of these matrices along one of the two diagonals yields the same result as well.) In all matrices (and their mirror images) except (the mirror images of) the first one, one solution is not Nash, that is, in these cases the BJ-optimality of that profile is obtained by
blocked preferences. This is interesting, because it shows that one and the same result can be obtained by a variety of preferences. However, this does not mean that any statement of preferences, which gives the right results, is equally good. In order to appreciate this point, consider the pair of examples discussed in Hendriks & de Hoop (2001), under the analysis suggested by Blutner:
(27) Often when I talk to a doctor_i, the doctor_{i, j} disagrees with him_{i, j}.
(28) Often when I talk to a doctor_i, the doctor_{i, j} disagrees with himself_{i, j}.
A BJ-optimal interpretation of example (27) is one in which the indices on the noun phrases 'the doctor' and 'him' are different, so that either 'the doctor' or 'him' is interpreted as anaphoric upon 'a doctor', not both. An optimal interpretation of example (28) is one in which both 'the doctor' and 'himself' are interpreted as anaphoric upon 'a doctor'. These results can be obtained by the joint effect of the two constraints (RE) and (LA), which we repeat here for convenience:
(RE) a reflexive element is preferable to a pronoun
(LA) a syntactic domain must contain a pronoun's antecedent
The relevant preferences are displayed in the following diagram:
[Diagram: two × two matrix over the representations 'the doctor-self' and 'the doctor-him' and the meanings (i, i) and {i, j}; arrows not recoverable from the source.]
This is a diagram of the third diagonal type, in which ('the doctor-him', {i, j}) is a BJ-optimal solution because the (LA)-preference for ('the doctor-him', (i, i)) is blocked by the (RE)-preference of ('the doctor-self', (i, i)) over this alternative, and because the (RE)-preference for ('the doctor-self', {i, j}) is blocked by the (LA)-preference of ('the doctor-self', (i, i)) over this alternative. However, as we argued, we could have obtained the very same result if the preferences were spelled out, alternatively, as indicated by the following diagram:
[Diagram: two × two matrix over the same representations and meanings; arrows not recoverable from the source.]
In this diagram, we have encoded the effect of the converse of the principles (RE) and (LA), and we have obtained a mirror image of the original matrix.
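The blocking argument for the two diagrams discussed above can be checked mechanically. The following Python sketch is only an illustration, not the authors' formal definition: it assumes that the update procedure amounts to repeatedly marking a profile as blocked when some unilaterally preferred alternative is already optimal, and as optimal when all such alternatives are blocked. The names bj_optimal, s_prefers and h_prefers, and the labels "self", "him", "ii" and "ij", are hypothetical.

```python
from itertools import product


def bj_optimal(reps, meanings, s_prefers, h_prefers):
    """Return the set of BJ-optimal profiles (r, m) of a finite game.

    s_prefers(r2, r, m): S prefers representation r2 to r for meaning m.
    h_prefers(m2, m, r): H prefers meaning m2 to m for representation r.
    Assumes the preferences are well-founded (no cycles).
    """
    profiles = list(product(reps, meanings))

    def rivals(profile):
        # Preferred profiles that differ from `profile` in one coordinate.
        r, m = profile
        by_s = [(r2, m) for r2 in reps if s_prefers(r2, r, m)]
        by_h = [(r, m2) for m2 in meanings if h_prefers(m2, m, r)]
        return by_s + by_h

    optimal, blocked = set(), set()
    changed = True
    while changed:
        changed = False
        for p in profiles:
            if p in optimal or p in blocked:
                continue
            better = rivals(p)
            if any(q in optimal for q in better):
                blocked.add(p)   # beaten by an already accepted solution
                changed = True
            elif all(q in blocked for q in better):
                optimal.add(p)   # every preferred rival has been blocked
                changed = True
    return optimal


REPS, MEANINGS = ["self", "him"], ["ii", "ij"]

# First diagram: (RE) favours 'self' for every meaning,
# (LA) favours (i, i) for every representation.
first = bj_optimal(
    REPS, MEANINGS,
    s_prefers=lambda r2, r, m: (r2, r) == ("self", "him"),
    h_prefers=lambda m2, m, r: (m2, m) == ("ii", "ij"),
)

# Second diagram: the converses of (RE) and (LA).
second = bj_optimal(
    REPS, MEANINGS,
    s_prefers=lambda r2, r, m: (r2, r) == ("him", "self"),
    h_prefers=lambda m2, m, r: (m2, m) == ("ij", "ii"),
)

# Linearly ordered game of the earlier example: lower index is preferred.
n = 4
diagonal = bj_optimal(
    range(n), range(n),
    s_prefers=lambda r2, r, m: r2 < r,
    h_prefers=lambda m2, m, r: m2 < m,
)
```

Under these assumptions both diagrams yield the same pair of solutions, ("self", "ii") and ("him", "ij"), and the linearly ordered game yields the diagonal. What differs between the two diagrams is which profile is accepted first, that is, which one is Nash in the original game.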
This time the solution ('the doctor-him', {i, j}) is optimal (Nash), and the interpretation of ('the doctor-self', (i, i)) turns out BJ-optimal, but the resulting BJ-optimal pairs are the same. Does this mean that we can get away with using the converses of any two or more principles? Certainly not. This can be appreciated when we look at a more general case, where we take more possibilities ((j, j) and {j, k}) into account:
(31)
''he Joc