Uncertainty in Economic Theory
Recent decades have witnessed developments in decision theory that propose an alternative to the accepted Bayesian view, according to which all uncertainty can be quantified by probability measures. This view has been criticized on empirical as well as conceptual grounds. David Schmeidler has offered an alternative way of thinking about decision under uncertainty, which has become popular in recent years. This book provides a review of and an introduction to this new decision theory under uncertainty. The first part focuses on theory: axiomatizations, the definitions of uncertainty aversion, updating, independence, and so forth. The second part deals with applications to economic theory, game theory, and finance. This is the first collection of chapters devoted to this topic, and it can thus serve as an introduction for researchers who are new to the field as well as a graduate course textbook. With this goal in mind, the book contains survey introductions aimed at graduate students, which help explain the main ideas and put them in perspective. Itzhak Gilboa is Professor at the Eitan Berglas School of Economics, Tel-Aviv University, Israel. He is also a Fellow of the Cowles Foundation for Research in Economics at Yale University, USA.
Routledge frontiers of political economy
1 Equilibrium Versus Understanding
  Towards the rehumanization of economics within social theory
  Mark Addleson
2 Evolution, Order and Complexity
  Edited by Elias L. Khalil and Kenneth E. Boulding
3 Interactions in Political Economy
  Malvern after ten years
  Edited by Steven Pressman
4 The End of Economics
  Michael Perelman
5 Probability in Economics
  Omar F. Hamouda and Robin Rowley
6 Capital Controversy, Post-Keynesian Economics and the History of Economics
  Essays in honour of Geoff Harcourt, volume one
  Edited by Philip Arestis, Gabriel Palma and Malcolm Sawyer
7 Markets, Unemployment and Economic Policy
  Essays in honour of Geoff Harcourt, volume two
  Edited by Philip Arestis, Gabriel Palma and Malcolm Sawyer
8 Social Economy
  The logic of capitalist development
  Clark Everling
9 New Keynesian Economics/Post-Keynesian Alternatives
  Edited by Roy J. Rotheim
10 The Representative Agent in Macroeconomics
  James E. Hartley
11 Borderlands of Economics
  Essays in honour of Daniel R. Fusfeld
  Edited by Nahid Aslanbeigui and Young Back Choi
12 Value, Distribution and Capital
  Essays in honour of Pierangelo Garegnani
  Edited by Gary Mongiovi and Fabio Petri
13 The Economics of Science
  Methodology and epistemology as if economics really mattered
  James R. Wible
14 Competitiveness, Localised Learning and Regional Development
  Specialisation and prosperity in small open economies
  Peter Maskell, Heikki Eskelinen, Ingjaldur Hannibalsson, Anders Malmberg and Eirik Vatne
15 Labour Market Theory
  A constructive reassessment
  Ben J. Fine
16 Women and European Employment
  Jill Rubery, Mark Smith, Colette Fagan, Damian Grimshaw
17 Explorations in Economic Methodology
  From Lakatos to empirical philosophy of science
  Roger Backhouse
18 Subjectivity in Political Economy
  Essays on wanting and choosing
  David P. Levine
19 The Political Economy of Middle East Peace
  The impact of competing trade agendas
  Edited by J.W. Wright, Jnr
20 The Active Consumer
  Novelty and surprise in consumer choice
  Edited by Marina Bianchi
21 Subjectivism and Economic Analysis
  Essays in memory of Ludwig Lachmann
  Edited by Roger Koppl and Gary Mongiovi
22 Themes in Post-Keynesian Economics
  Essays in honour of Geoff Harcourt, volume three
  Edited by Claudio Sardoni and Peter Kriesler
23 The Dynamics of Technological Knowledge
  Cristiano Antonelli
24 The Political Economy of Diet, Health and Food Policy
  Ben J. Fine
25 The End of Finance
  Capital market inflation, financial derivatives and pension fund capitalism
  Jan Toporowski
26 Political Economy and the New Capitalism
  Edited by Jan Toporowski
27 Growth Theory
  A philosophical perspective
  Patricia Northover
28 The Political Economy of the Small Firm
  Edited by Charlie Dannreuther
29 Hahn and Economic Methodology
  Edited by Thomas Boylan and Paschal F. O’Gorman
30 Gender, Growth and Trade
  The miracle economies of the postwar years
  David Kucera
31 Normative Political Economy
  Subjective freedom, the market and the state
  David Levine
32 Economist with a Public Purpose
  Essays in honour of John Kenneth Galbraith
  Edited by Michael Keaney
33 Involuntary Unemployment
  The elusive quest for a theory
  Michel De Vroey
34 The Fundamental Institutions of Capitalism
  Ernesto Screpanti
35 Transcending Transaction
  The search for self-generating markets
  Alan Shipman
36 Power in Business and the State
  An historical analysis of its concentration
  Frank Bealey
37 Editing Economics
  Essays in honour of Mark Perlman
  Hank Lim, Ungsuh K. Park and Geoff Harcourt
38 Money, Macroeconomics and Keynes
  Essays in honour of Victoria Chick, volume 1
  Philip Arestis, Meghnad Desai and Sheila Dow
39 Methodology, Microeconomics and Keynes
  Essays in honour of Victoria Chick, volume 2
  Philip Arestis, Meghnad Desai and Sheila Dow
40 Market Drive and Governance
  Reexamining the rules for economic and commercial contest
  Ralf Boscheck
41 The Value of Marx
  Political economy for contemporary capitalism
  Alfredo Saad-Filho
42 Issues in Positive Political Economy
  S. Mansoob Murshed
43 The Enigma of Globalisation
  A journey to a new stage of capitalism
  Robert Went
44 The Market
  Equilibrium, stability, mythology
  S.N. Afriat
45 The Political Economy of Rule Evasion and Policy Reform
  Jim Leitzel
46 Unpaid Work and the Economy
  Edited by Antonella Picchio
47 Distributional Justice
  Theory and measurement
  Hilde Bojer
48 Cognitive Developments in Economics
  Edited by Salvatore Rizzello
49 Social Foundations of Markets, Money and Credit
  Costas Lapavitsas
50 Rethinking Capitalist Development
  Essays on the economics of Josef Steindl
  Edited by Tracy Mott and Nina Shapiro
51 An Evolutionary Approach to Social Welfare
  Christian Sartorius
52 Kalecki’s Economics Today
  Edited by Zdzislaw L. Sadowski and Adam Szeworski
53 Fiscal Policy from Reagan to Blair
  The Left veers Right
  Ravi K. Roy and Arthur T. Denzau
54 The Cognitive Mechanics of Economic Development and Institutional Change
  Bertin Martens
55 Individualism and the Social Order
  The social element in liberal thought
  Charles R. McCann, Jnr
56 Affirmative Action in the United States and India
  A comparative perspective
  Thomas E. Weisskopf
57 Global Political Economy and the Wealth of Nations
  Performance, institutions, problems and policies
  Edited by Phillip Anthony O’Hara
58 Structural Economics
  Thijs ten Raa
59 Macroeconomic Theory and Economic Policy
  Essays in honour of Jean-Paul Fitoussi
  Edited by K. Vela Velupillai
60 The Struggle Over Work
  The “end of work” and employment alternatives in post-industrial societies
  Shaun Wilson
61 The Political Economy of Global Sporting Organisations
  John Forster and Nigel Pope
62 The Flawed Foundations of General Equilibrium
  Critical essays on economic theory
  Frank Ackerman and Alejandro Nadal
63 Uncertainty in Economic Theory
  Essays in honor of David Schmeidler’s 65th birthday
  Edited by Itzhak Gilboa
Uncertainty in Economic Theory Essays in honor of David Schmeidler’s 65th birthday
Edited by Itzhak Gilboa
First published 2004 by Routledge 11 New Fetter Lane, London EC4P 4EE Simultaneously published in the USA and Canada by Routledge 29 West 35th Street, New York, NY 10001 This edition published in the Taylor & Francis e-Library, 2006.
“To purchase your own copy of this or any of Taylor & Francis or Routledge’s collection of thousands of eBooks please go to www.eBookstore.tandf.co.uk.” Routledge is an imprint of the Taylor & Francis Group © 2004 selection and editorial matter, Itzhak Gilboa; individual chapters, the contributors All rights reserved. No part of this book may be reprinted or reproduced or utilized in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers. British Library Cataloguing in Publication Data A catalogue record for this book is available from the British Library Library of Congress Cataloging in Publication Data A catalog record for this book has been requested ISBN 0-415-32494-7
Contents

List of contributors
Preface

PART I
Theory

1 Introduction (Itzhak Gilboa)
2 Preference axiomatizations for decision under uncertainty (Peter P. Wakker)
3 Defining ambiguity and ambiguity attitude (Paolo Ghirardato)
4 Introduction to the mathematics of ambiguity (Massimo Marinacci and Luigi Montrucchio)
5 Subjective probability and expected utility without additivity (David Schmeidler)
6 Maxmin expected utility with non-unique prior (Itzhak Gilboa and David Schmeidler)
7 A simple axiomatization of nonadditive expected utility (Rakesh Sarin and Peter P. Wakker)
8 Updating ambiguous beliefs (Itzhak Gilboa and David Schmeidler)
9 A definition of uncertainty aversion (Larry G. Epstein)
10 Ambiguity made precise: a comparative foundation (Paolo Ghirardato and Massimo Marinacci)
11 Stochastically independent randomization and uncertainty aversion (Peter Klibanoff)
12 Decomposition and representation of coalitional games (Massimo Marinacci)

PART II
Applications

13 An overview of economic applications of David Schmeidler’s models of decision making under uncertainty (Sujoy Mukerji and Jean-Marc Tallon)
14 Ambiguity aversion and incompleteness of contractual form (Sujoy Mukerji)
15 Ambiguity aversion and incompleteness of financial markets (Sujoy Mukerji and Jean-Marc Tallon)
16 A quartet of semigroups for model specification, robustness, prices of risk, and model detection (Evan W. Anderson, Lars Peter Hansen, and Thomas J. Sargent)
17 Uncertainty aversion, risk aversion, and the optimal choice of portfolio (James Dow and Sérgio Ribeiro da Costa Werlang)
18 Intertemporal asset pricing under Knightian uncertainty (Larry G. Epstein and Tan Wang)
19 Sharing beliefs: between agreeing and disagreeing (Antoine Billot, Alain Chateauneuf, Itzhak Gilboa, and Jean-Marc Tallon)
20 Equilibrium in beliefs under uncertainty (Kin Chung Lo)
21 The right to remain silent (Joseph Greenberg)
22 On the measurement of inequality under uncertainty (Elchanan Ben-Porath, Itzhak Gilboa, and David Schmeidler)

Index
Contributors
Evan W. Anderson is Assistant Professor of Economics, University of North Carolina at Chapel Hill. His research interests include heterogeneous agents, recursive utility, robustness, and computational methods.

Elchanan Ben-Porath is Professor of Economics at the Hebrew University of Jerusalem. His fields of interest include game theory, decision theory, and social choice theory.

Antoine Billot is Professor of Economics at the Université de Paris II, Panthéon-Assas, and junior member of the Institut Universitaire de France. His research interests are in the field of preference theory, social choice theory, and decision theory.

Alain Chateauneuf is Professor of Mathematics at the Université de Paris I, Panthéon-Sorbonne. His research is mainly concerned with mathematical economics, focusing on decision theory and particularly on decision under uncertainty.

James Dow is Professor of Finance at London Business School. His recent research has been on models that integrate the financial markets with corporate finance. He has also worked on executive compensation and leadership. In his work on Knightian uncertainty with Sérgio Werlang, he applied the models developed by David Schmeidler to portfolio choice, to stock price volatility, to the no-trade theorem, and to Nash equilibrium.

Larry Epstein is the Elmer B. Milliman Professor of Economics at the University of Rochester. His research interests include decision theory and its applications to macroeconomics and finance.

Paolo Ghirardato is Associate Professor of Mathematical Economics at the Università di Torino. His main research interest is decision theory and its consequences for economic, political, and financial modeling.

Itzhak Gilboa is Professor of Economics at Tel-Aviv University and a Fellow of the Cowles Foundation for Research in Economics at Yale University. He is interested in decision theory, game theory, and social choice.

Joseph Greenberg is the Dow Professor of Political Economy at McGill University. His research interests include economic theory, game theory, and theory of social situations.

Lars Peter Hansen is the Homer J. Livingston Distinguished Service Professor at the University of Chicago. He is interested in macroeconomic theory, risk, and uncertainty.

Peter Klibanoff is Associate Professor of Managerial Economics and Decision Sciences at the Kellogg School of Management, Northwestern University. His research interests include decision making under uncertainty, microeconomic theory, and behavioral finance.

Kin Chung Lo is Associate Professor of Economics at York University. He specializes in game theory and decision theory. His publications cover areas such as nonexpected utility, auctions, and foundations of solution concepts in games.

Massimo Marinacci is Professor of Applied Mathematics at the Università di Torino, Italy. His main research interest is mathematical economics, in particular choice theory.

Luigi Montrucchio is Professor of Economics at the Università di Torino, Italy. His main research interest is mathematical economics, in particular economic dynamics and optimal growth.

Sujoy Mukerji is a University Lecturer in Economic Theory at the University of Oxford and Fellow of University College. His research has primarily been on decision making under ambiguity, its foundations, and its relevance in economic contexts. His broader research interests lie at the intersection of bounded rationality and economic theory.

Thomas Sargent is Professor of Economics at New York University and senior fellow at the Hoover Institution at Stanford. He is interested in macroeconomics and applied economic dynamics.

Rakesh Sarin is Professor of Decisions, Operations, and Technology Management and the Paine Chair in Management at the Anderson School of Management at the University of California at Los Angeles. He is interested in decision analysis and societal risk analysis.

David Schmeidler is Professor of Statistics and Management at Tel-Aviv University and Professor of Economics at Ohio State University. His research topics include economic theory, game theory, decision theory, and social choice.

Jean-Marc Tallon is Directeur de Recherche at CNRS and Université Paris I. His research mainly deals with economic applications of models of decision under uncertainty, and general equilibrium models with incomplete markets.

Peter P. Wakker is Professor of Decision under Uncertainty at the University of Amsterdam. His research is on normative foundations of Bayesianism and on descriptive deviations from Bayesianism, the latter both theoretically and empirically.

Tan Wang is Associate Professor at the Sauder School of Business, University of British Columbia. His research interest is in decision theory under risk and uncertainty and asset pricing, focusing on the implications of uncertainty aversion.

Sérgio Ribeiro da Costa Werlang is Professor of Economics at the Getulio Vargas Foundation. His interests include economic theory and macroeconomics.
Preface
This book is published in celebration of David Schmeidler’s 65th birthday. It is a collection of seventeen papers that have appeared in refereed journals, combined with five introductory chapters written for this volume. All papers deal with uncertainty (or “ambiguity”) in economic theory. They range from purely theoretical issues such as axiomatic foundations, definitions, and measurement, to economic applications in fields such as contract theory and finance.

There is a large and rapidly growing literature on uncertainty in economic theory, following David’s seminal work on non-additive expected utility. But there is no general introduction to the topic, and scholars who are interested in it are usually referred to the original papers, which are scattered across various journals and are often rather technical. We felt that a collection of papers and introductory surveys would make a significant contribution to the literature, allowing us to introduce the novice and to guide the expert. Thus, David’s birthday was the impetus for the publication of this collection, but the latter has an independent raison d’être.

We have no intention or pretense to summarize David Schmeidler’s research. Indeed, uncertainty in economic theory is but one topic that David has worked on. He has made many other remarkable and path-breaking contributions to game theory, mathematical economics, economic theory, and decision theory. We do not attempt to give an overview of David’s contributions for two reasons. First, such an overview is a daunting task. Second, this book does not mark David’s retirement in any way. David Schmeidler is a very active researcher. We hope and believe that he will continue to produce new breakthroughs in the future. This book marks a special birthday, but by no means the end of a research career. We thank the contributors to the volume, as well as the publishers for the right to reprint published papers.
We hope that this collection, while obviously partial, will give readers a preliminary overview of the research on uncertainty in economic theory. We are grateful to Ms. Lada Burde for her invaluable help in editing and proofreading this volume. Paolo Ghirardato, Itzhak Gilboa, Massimo Marinacci, Luigi Montrucchio, Sujoy Mukerji, Jean-Marc Tallon, and Peter P. Wakker
Part I
Theory
1
Introduction
Itzhak Gilboa
1.1. Uncertainty and Bayesianism

Ever since economic theory started to engage in formal modeling of uncertainty, it has espoused the Bayesian paradigm. In the mid-twentieth century, the Bayesian approach came to dominate decision theory and game theory, and it has remained a dominant paradigm in the applications of these theories to economics to this day. Economic problems ranging from insurance and portfolio selection to signaling and health policy are typically analyzed in a Bayesian way. In fact, there is probably no other field of formal inquiry involving uncertainty in which Bayesianism enjoys such a predominant status as it does in economic theory. But what exactly does it mean to be Bayesian? One may discern at least three distinct tenets that are often assumed to be held by Bayesians. First, a Bayesian quantifies uncertainty in a probabilistic way. Second, Bayesianism entails updating one’s belief given new information in accordance with Bayes’ law. Finally, in light of the axiomatizations of the Bayesian approach (Ramsey, 1931; de Finetti, 1937; Savage, 1954), Bayesianism is often taken to also imply the maximization of expected utility (EU) relative to probabilistic beliefs.1

Taken as assumptions regarding the behavior of economic agents, all three tenets of Bayesianism have come under attack. The assumption of EU maximization was challenged by Allais (1953). The famous Allais paradox, combined with the body of work starting with Kahneman and Tversky’s Prospect Theory (1979), aimed to show that people may fail to maximize EU even in the face of decisions under risk, namely, where probabilities are known. Tversky and Kahneman (1974) have also shown that people may fail to perform Bayesian updating. That is, even when probabilities are given in a problem, they might not be manipulated in accordance with Bayes’ law. Thus, the second tenet of Bayesianism has also been criticized in terms of descriptive validity.
Moreover, other work by Kahneman and Tversky, such as the documentation of framing effects (Tversky and Kahneman, 1981) has shown that some of the implicit assumptions of the Bayesian model are also descriptively inaccurate. Yet, violations of the second and third tenets have not amounted to a serious critique of Bayesianism per se. Violations of Bayesian updating are viewed by most researchers as mistakes. While these mistakes pose a challenge to descriptive
Bayesian theories, they fail to sway one from the belief that Bayes’ law should be the way probabilities are updated. Some researchers also view violations of EU maximization (given a probability measure) as plain mistakes, which do not challenge the normative validity of the theory. Other researchers disagree. At any rate, these violations do not clash with the Bayesian view as statisticians or computer scientists understand it. That is, an agent may quantify uncertainty by a prior probability measure, and update this prior to a posterior in a Bayesian way, without maximizing EU with respect to her probabilistic beliefs.2

This book is devoted to behavioral violations of the first tenet of Bayesianism, namely, that all uncertainty can be quantified by a probability measure. In contrast to the other two types of violations, the rejection of the first tenet is a direct attack on the essence of the Bayesian approach, even when the latter is interpreted as a normative theory. As we argue shortly, there are situations in which violations of the first tenet cannot be viewed as mistakes, and cannot be easily corrected even by decision makers who are willing to convert to Bayesianism.

When explaining the basic notion of uncertainty, as opposed to risk, one often starts out with Ellsberg’s (1961) famous examples (the “Ellsberg paradox”, which refers both to the two-urn and to the one-urn experiments). These experiments show that many people tend to prefer bets with known probabilities to bets with unknown ones, in a way that cannot be reconciled with the first tenet of Bayesianism. Specifically, Ellsberg’s paradox provides an example in which Savage’s axiom P2 is consistently violated by a nonnegligible proportion of decision makers.3 A decision maker who violates P2 as in Ellsberg’s paradox does not only deviate from EU maximization. Rather, such a decision maker exhibits a mode of behavior that cannot be described as a function of a probability measure.
To the extent that behavioral data can challenge a purely cognitive assumption, Ellsberg’s paradox exhibits a violation of the first tenet of Bayesianism. Yet, David Schmeidler’s interest in uncertainty was not aroused by Ellsberg’s paradox4 or by any other behavioral manifestation of a non-Bayesian approach. Rather, Schmeidler’s starting point was purely cognitive: like Knight (1921) and Ellsberg, he did not find the first tenet of Bayesianism plausible. Specifically, Schmeidler argued that the Bayesian approach “does not reflect the heuristic amount of information that led to the assignment of […] probability” (Schmeidler, 1989: 571). His example was the following: assume that you take a coin out of your pocket, and that you are about to bet on it. You have tossed this coin many times in the past, and you have not observed any significant deviations from the assumption of fairness. For the sake of argument, assume that you have tossed the coin 1,000 times and that it has come up exactly 500 times each as Heads and Tails. Thus, you assign probability of 50 percent to the coin coming up on Heads, as well as on Tails, in the next toss. Next assume that your friend takes a coin out of her pocket. You have absolutely no information about this coin. If you are asked to assign probabilities to the two sides of the coin, you may well follow symmetry considerations (equivalently, Laplace’s principle of insufficient reason) and assign probability of 50 percent to each side of this coin as well. However,
argued Schmeidler, the 50 percent that are based on empirical frequencies in large databases do not “feel” the same as the 50 percent that were assigned based on symmetry considerations. The Bayesian approach, in insisting that every source of uncertainty be quantified by a (single, additive) probability measure, is too restrictive. It does not allow the amount of information used for probabilistic assessments to be reflected in these assessments.

It seems a natural step to couch this cognitive observation in a behavioral setup. Indeed, Ellsberg’s two-urn experiment is very similar to Schmeidler’s contrast between the two coins. It is, however, important to note that Schmeidler’s critique of Bayesianism starts from a cognitive perspective. It is not motivated by an observed pattern of behavior, unlike much of the work ensuing from Allais’s paradox. Relatedly, Schmeidler’s critique of the first tenet of Bayesianism is not solely on descriptive grounds. Starting from the logic of the Bayesian approach, rather than from experimental evidence, this critique cannot be dismissed as focusing on a setup in which decision makers err. Rather, Schmeidler’s point was that in many situations there is not enough information for the generation of a Bayesian prior. In these situations, it is not clear that the rational thing to do is to behave as if one had such a prior.

These considerations also raise doubts regarding the definition of rationality by internal consistency of decisions or of statements. If we were to assume that “rationality” only means coherence, Savage’s axioms would appear to be a very promising candidate for a canon of rationality. However, if we take these axioms as a rationality test, it is too easy to pass: in a situation of uncertainty, one can arbitrarily choose any prior probability and behave so as to maximize EU with respect to this prior. This will clearly suffice to pass Savage’s rationality test.
But it would not seem rational by any intuitive definition of the term. Ellsberg’s paradox is an extremely elegant illustration of a behavioral rejection of the first tenet of Bayesianism. It manages to translate the cognitive unease with the Bayesian approach into observed choice, thanks to certain symmetries in the decision problem. But these symmetries may also be misleading. While many decision makers violate Savage’s P2 in Ellsberg’s paradox, it seems easy enough to “correct” their choices so that they correspond to Savage’s theory. In both of Ellsberg’s examples there is enough symmetry to allow Laplace’s principle of insufficient reason to pinpoint a single probability measure. This symmetric prior might appear as a natural candidate for the would-be Bayesian, and it might give the impression that violations of P2 can easily be worked around. Cognitive ease aside, the decision maker may behave as if she were Bayesian.

This impression would be wrong. Most real-life problems do not exhibit enough symmetries to allow for a Laplacian prior. To consider a simple example, assume that one faces the uncertainty of war. There are only two states of the world to consider, war and no war. Empirical frequencies surely do not suffice to generate a prior probability over these two states, since the uncertain situation cannot be construed as a repeated experiment. Therefore, this is a situation of uncertainty as opposed to risk. But it would be ludicrous to suggest that the probability of war should be 50 percent, simply because there are two states of the world
with no historical data on their relative frequencies. Indeed, in this situation there is sufficient reason to distinguish between the two states, though not sufficient information to generate a Bayesian prior. Ellsberg’s paradox, as well as Schmeidler’s coin example, should therefore be taken with a grain of salt. They drive home the point that the cognitive unease generated by uncertainty may have behavioral implications. But they do not capture the complexity of a multitude of real-life decisions in which there is sufficient reason (to distinguish among states) but not sufficient information (to generate a prior).
1.2. Nonadditive probabilities (CEU)

David Schmeidler’s first attempt to model a non-Bayesian approach to uncertainty involved nonadditive probabilities. This term refers to mathematical entities that resemble probability measures, with the exception that they need not satisfy the additivity axiom. The idea can be simply explained in Schmeidler’s coin example (equivalently, in Ellsberg’s two-urn paradox). Assume, again, that there are two coins. One, the “known” coin, has been tossed many times, with a relative frequency of 50 percent Heads and 50 percent Tails. The other, the “unknown” one, has never been tossed before. Assume further that a decision maker feels uneasy about betting on the unknown coin. That is, she prefers betting on the known coin coming up Heads to betting on the unknown coin coming up Heads, and the same applies to Tails. This preference pattern holds despite the fact that the decision maker agrees that each coin will eventually come up either Heads or Tails. To be more concrete, the decision maker is indifferent between betting on “the known coin comes up Heads or Tails” and on “the unknown coin comes up Heads or Tails”. Should probabilities reflect willingness to bet, argued Schmeidler, the probability that the decision maker assigns to the unknown coin coming up Heads is lower than the probability she assigns to the known coin coming up Heads, and the same applies to Tails. To be precise, consider a model with four states of the world, each one describing the outcome of both coin tosses: S = {HH, HT, TH, TT}. HH denotes the state in which both coins come up Heads; HT denotes the state in which the known coin comes up Heads and the unknown coin Tails; and so forth. In this setup, imagine that the probability of {HH, HT} and of {TH, TT} is 50 percent, whereas the probability of {HH, TH} and of {HT, TT} is 40 percent. This would reflect the fact that the EU of a bet on any side of the known coin is higher than that of a bet on either side of the unknown coin.
Yet, the union of the first pair of events, {HH, HT} and {TH, TT}, equals the union of the second pair, namely, {HH, TH} and {HT, TT}, and it equals S. Thus, if probabilities reflect willingness to bet, they are nonadditive: the probability of each of {HH, TH} and {HT, TT} is 40 percent, while the probability of their union is 100 percent. More generally, nonadditive probabilities are real-valued set functions that are defined over a sigma-algebra of events. They are assumed to satisfy three conditions: (i) monotonicity with respect to set inclusion; (ii) assigning zero to the empty
set; and (iii) assigning 1 to the entire state space (normalization). Observe that no continuity is generally assumed. Hence, adding the requirement of additivity (with respect to the union of two disjoint events) would result in a finitely additive (rather than sigma-additive) probability, in accordance with the derivations of de Finetti (1937) and Savage (1954). As illustrated by the coin example, nonadditive probability measures can reflect the amount of information that was used in estimating the probability of an event. But how does one compute EU with respect to a nonadditive probability measure? Or, how does one define an integral of a real-valued function with respect to such a measure? In the simple case where the function assumes a positive value x on an event A, and zero otherwise, the answer seems simple: the integral should be xv(A), where v(A) denotes the nonadditive probability of A. Indeed, this definition has been implicit in our discussion of “willingness to bet on an event” mentioned earlier. If x stands for the utility level of the more desirable outcome, and zero for the less desirable one, then 0.4x would be the integral of the bet on each side of the unknown coin, whereas 0.5x would be the integral of the bet on each side of the known coin. What happens, then, if a function assumes two positive values, x on A and y on B (where A and B are two disjoint events)? The straightforward extension would be to define the integral as xv(A) + yv(B). Indeed, this definition would seem to generalize the Riemann integral to the case in which v is nonadditive: it sums the areas of rectangles whose base is the domain of the function and whose height is the value of the function (Figure 1.1). Yet, this definition is problematic. First, letting y approach x, one finds that the integral thus defined is not continuous with respect to the integrand.
Specifically, for x = y the value of the integral would be xv(A ∪ B), which will, in general, differ from xv(A) + yv(B) = x[v(A) + v(B)].
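The discontinuity is easy to reproduce in a few lines of Python (a minimal sketch; the capacity values follow the unknown-coin example, but the variable and function names are ours):

```python
# Naive "sum of rectangles" integral with respect to a nonadditive measure,
# with v(A) = v(B) = 0.4 and v(A ∪ B) = 1, as in the unknown-coin example.
V = {"A": 0.4, "B": 0.4, "AuB": 1.0}

def naive_integral(x, y):
    """Naive integral of the function equal to x on A and y on B: xv(A) + yv(B)."""
    return x * V["A"] + y * V["B"]

# As y approaches x = 1 the naive value approaches 0.8, yet the constant
# function equal to 1 should integrate to 1 * v(A ∪ B) = 1: a jump at x = y.
assert abs(naive_integral(1.0, 1.0) - 0.8) < 1e-9
assert abs(1.0 * V["AuB"] - 1.0) < 1e-9
```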
Figure 1.1 The naïve integral: rectangles of heights x over A and y over B, summing to xv(A) + yv(B).
8
Itzhak Gilboa
Figure 1.2 The Choquet integral: a rectangle of height y over A ∪ B plus one of height (x − y) over A, summing to (x − y)v(A) + yv(A ∪ B).
Second, the same example can serve to show that the integral is not monotone with respect to the integrand: a function f may dominate another function g (pointwise), yet the integral of f will be strictly lower than that of g. Schmeidler’s solution to these difficulties was to use the Choquet integral. Choquet (1953–54) dealt with capacities, which he defined as nonadditive probability measures that satisfy certain continuity conditions. Choquet defined a notion of integration of real-valued functions with respect to capacities that satisfies both continuity and monotonicity with respect to the integrand. In the earlier example, the Choquet integral would be computed as follows: assume that x > y. Over the event A ∪ B, one is guaranteed the value y. Hence, let us first calculate yv(A ∪ B). Next, over the event A the function is above y. The additional value, (x − y), is added to y over the event A, but not over B. Thus, we add to the integral (x−y)v(A). Overall, the integral of the function would be (x−y)v(A)+yv(A∪B). This is also the sum of areas of rectangles. But this time their height is not the value of the function. Rather, it is the difference between two consecutive values that the function assumes. (See Figure 1.2.) Observe that this value equals xv(A) + yv(B) if v happens to be additive. That is, this definition generalizes the standard one for additive measures. But even if v is not additive, the Choquet integral retains the properties of continuity and monotonicity. Consider a simplified version of the coin example mentioned earlier, where we ignore the known coin and focus on the unknown coin. There are only two states of the world, H and T. Let us assume that v(H) = v(T) = 0.4. Assume that a function f takes the value x at H and the value y at T. Then, if x ≥ y ≥ 0 the Choquet integral of f is 0.4(x − y) + y = 0.4x + 0.6y. If, however, y > x ≥ 0, the Choquet integral of f is 0.4(y − x) + x = 0.6x + 0.4y.
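The computation just described amounts to a short algorithm: sort the values of the function in decreasing order and weight each successive decrement by the capacity of the corresponding upper level set. A Python sketch (the function and variable names are ours; the capacity must be defined on the upper level sets that arise):

```python
def choquet(f, v):
    """Choquet integral of a non-negative function f (dict: state -> value)
    with respect to a capacity v (dict: frozenset of states -> number,
    with v(empty set) = 0 and v(all states) = 1)."""
    levels = sorted(set(f.values()), reverse=True) + [0.0]
    total = 0.0
    for hi, nxt in zip(levels, levels[1:]):
        upper = frozenset(s for s in f if f[s] >= hi)  # upper level set
        total += (hi - nxt) * v[upper]                 # decrement * capacity
    return total

# Two-state unknown-coin example: v(H) = v(T) = 0.4.
v = {frozenset(): 0.0, frozenset({"H"}): 0.4,
     frozenset({"T"}): 0.4, frozenset({"H", "T"}): 1.0}
# For f(H) = x >= f(T) = y >= 0 the integral is (x - y)*0.4 + y = 0.4x + 0.6y.
assert abs(choquet({"H": 10, "T": 5}, v) - 7.0) < 1e-9
```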
The definition of the Choquet integral for general functions follows the intuition stated earlier. The following chapters contain a precise definition and other details. For the time being it suffices to mention that the Choquet integral is, in general, continuous and monotone with respect to its integrand. Schmeidler proposed that decision makers behave as if they were maximizing the Choquet integral of their utility function, where the integral is computed with respect to their subjective beliefs, and the latter are modeled by a nonadditive probability measure. This claim is often referred to as “Choquet Expected Utility” (CEU) theory. Whereas the notion of capacities and the Choquet integral existed in the mathematical literature, Schmeidler (1989) was the seminal paper that first applied these concepts to decision under uncertainty, and that also provided an axiomatic foundation for CEU, comparable to the derivation of Expected Utility Theory (EUT) by Anscombe and Aumann (1963). In particular, the axiomatic derivation identifies the nonadditive probability v uniquely, and the utility function up to a positive linear transformation.5
1.3. A cognitive interpretation of CEU with convex capacities
Schmeidler defined a nonadditive (Choquet) EU maximizer to be uncertainty averse if her nonadditive subjective probability v was convex. Convexity, as defined in cooperative game theory,6 means that for any two events A and B, v(A) + v(B) ≤ v(A ∩ B) + v(A ∪ B). This condition is also referred to as supermodularity, or 2-monotonicity, and it is equivalent to stating that the marginal v-contribution of an event is always nondecreasing in the following sense: suppose that an event R is “added”, in the sense of set union, to events S and T that are disjoint from R. Suppose further that S is a subset of T. Then v is convex if and only if, for all such triples of events R, S, and T, the marginal contribution of R to T, v(T ∪ R) − v(T), is at least as high as the marginal contribution of R to S, v(S ∪ R) − v(S). Later literature has questioned the appropriateness of this definition of uncertainty aversion, and has provided several alternative definitions. Chapter 3 is devoted to this issue and we do not dwell on it here. But convexity of nonadditive measures has remained an important property for other reasons. It is well known in cooperative game theory that convex games have a nonempty core. That is, if a nonadditive measure v is convex, then there are (finitely) additive probability measures p that dominate it pointwise (p(A) ≥ v(A) for every event A). In the context of cooperative game theory, a dominating measure p suggests a stable imputation: a way to split the worth of the grand coalition, v(S) = p(S) = 1, among its members, in such a way that no coalition A has an incentive to deviate and operate on its own. Schmeidler (1986) showed that, for a convex game v, the Choquet integral of every real-valued function with respect to v equals the minimum of the integrals of this function with respect to the various (additive) measures in the core of v.
Conversely, if a game v has a nonempty core, and the Choquet integral of every
function with respect to v equals the minimum, over additive measures in the core, of the integrals of this function, then v is convex. Consider again the simplified version of the coin example mentioned earlier, where there are only two states of the world, H and T. Assume that v(H) = v(T) = 0.4. This v is convex and its core is Core(v) = {(p, 1 − p) | 0.4 ≤ p ≤ 0.6}. For a function f that takes the value x ≥ 0 at H and the value y ≥ 0 at T, we computed the Choquet integral with respect to v, and found that it is 0.4x + 0.6y if x ≥ y and 0.6x + 0.4y if y > x. It is readily observed that this integral is precisely min{px + (1 − p)y | 0.4 ≤ p ≤ 0.6}. That is, the Choquet integral of any non-negative f with respect to the convex v equals the minimum of the integrals of f over all additive measures in Core(v). The decision maker might therefore be viewed as if she does not know what probability measure governs the unknown coin. But she believes that each side of the coin cannot have a probability lower than 40 percent. Thus, she considers all the probability measures that are consistent with this estimate. Each such probability measure defines an integral of the function f. Faced with this set of integral values, the CEU maximizer behaves as if the lowest possible expected value of f is the relevant one. In other words, CEU (with respect to a convex capacity) may be viewed as a theory combining the maxmin principle and EU: the decision maker first computes all possible EU values, then considers the minimal one, and finally chooses an act that maximizes this minimal EU. Whenever a CEU maximizer has a convex capacity v, this cognitive interpretation of the Choquet integral holds. The set of probabilities with respect to which one takes the minimum of the integral may be interpreted as representing the information available to the decision maker.
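The equality between the Choquet integral and the minimum over the core can be checked numerically in the two-state example (a small sketch with names of our choosing; since expected utility is linear in p, the minimum over the core is attained at an endpoint, so a grid over [0.4, 0.6] suffices for illustration):

```python
# Choquet integral vs. minimum over the core, for v(H) = v(T) = 0.4,
# Core(v) = {(p, 1 - p) : 0.4 <= p <= 0.6}.

def choquet_value(x, y):
    """Choquet integral of f(H) = x, f(T) = y, both non-negative."""
    lo, hi = min(x, y), max(x, y)
    return lo + 0.4 * (hi - lo)   # = 0.4x + 0.6y when x >= y

def core_minimum(x, y, grid=101):
    """Minimum of px + (1 - p)y over a grid of measures in Core(v)."""
    ps = [0.4 + 0.2 * i / (grid - 1) for i in range(grid)]
    return min(p * x + (1 - p) * y for p in ps)

for x, y in [(10, 0), (0, 10), (3, 7), (5, 5)]:
    assert abs(choquet_value(x, y) - core_minimum(x, y)) < 1e-9
```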
In the example stated earlier, the decision maker might not be able to specify the probabilities of the events in question, but she may be able to provide bounds on these probabilities. This cognitive interpretation should be taken with a grain of salt. Observe that in the coin example, the decision maker has no information about the unknown coin. If we were to ask what is the set of probabilities that she deems as possible, we would have to include all probability measures, {(p, 1 − p) | 0 ≤ p ≤ 1}. Yet, the decision maker behaves as if only the measures {(p, 1 − p) | 0.4 ≤ p ≤ 0.6} were indeed possible. Thus, the core of the capacity v need not coincide with the probabilities that are, indeed, possible according to available information. Rather, the core of v is the set of probabilities that the decision maker appears to entertain, given her choices, and in the context of the maxmin decision rule. One may conceive of other decision rules that would give rise to other sets of probability measures, and it is not clear, a priori, that the maxmin framework is the appropriate one to elicit the decision maker’s “real” beliefs.
1.4. Multiple priors (MMEU)
In the coin example, as well as in Ellsberg’s experiments, the information that is explicitly provided to the decision maker can be fully captured by placing lower and/or upper bounds on the probabilities of specific events. For instance, in the coin example it is natural to imagine that the probability of each side of the “known” coin is known to be 50 percent, whereas the probability of each side of the “unknown” coin is only known to be in the range (0, 1). As mentioned earlier, the decision maker’s behavior may not be guided by this set of probabilities. Rather, the decision maker may behave as if each side of the unknown coin has some probability in the range (0.4, 0.6). Further, her behavior may exhibit some uncertainty about the probability governing the known coin as well, and we may find that she behaves as if each side of the known coin has some probability in the range, say, (0.45, 0.55). In all these examples, both explicitly given information and behaviorally derived uncertainty are reflected by lower and upper bounds on the probabilities of specific events. Since an upper bound on the probability of an event may be written as a lower bound on the probability of its complement, lower bounds suffice. That is, one may define the set of relevant probability measures by simple constraints of the form p(A) ≥ v(A) for various events A and an appropriately chosen v. In other words, the set of probabilities may be defined by a nonadditive probability measure, interpreted as the lower bound on the unknown probability.7 But should one follow this cognitive interpretation of CEU, one may find it too restrictive. For example, one might believe that an event A is at least twice as likely as an event B. Thus, one would like to consider only probability measures that satisfy p(A) ≥ 2p(B). This is a simple linear constraint, but it cannot be reduced to constraints of the form p(A) ≥ v(A).
Moreover, one may have various pieces of information that restrict the set of probability measures that might be governing the decision problem, that are not representable by linear constraints. For example, assume that a random variable is known (or assumed) to have a normal distribution, with unknown expectation and variance. Ranging over all the possible values of the unknown parameters results in a set of probability measures. This set will generally not be defined by a lower bound nonadditive probability function v. In fact, all problems that are analyzed by the tools of classical statistics are modeled by a set of possible probability measures, over which one has no prior distribution. Thus, a huge variety of problems encountered on a daily basis by individual decision makers, scientists, professional consultants, and other experts involve sets of probability, where, for the most part, these sets do not constitute the core of a nonadditive measure. Whereas classical statistics does not offer a general decision theory, it is natural to extend the CEU interpretation for the case of a convex v to this more general setup. Specifically, assume that a decision maker conceives of a state space S. Over this state, she considers as possible a set of probability measures C. Given a choice problem, and assuming that the decision maker has a utility function u
defined over possible outcomes, she might adopt the following decision rule: for each possible act f, and for each possible (additive) probability measure p ∈ C, compute the EU of f relative to p. Next consider the minimum (or infimum) of these EU values of f, ranging over all measures in C. Evaluate f by this minimum value, and choose the act that maximizes this index. This theory was suggested and axiomatized by Gilboa and Schmeidler (1989). It has come to be known as the Maxmin Expected Utility (MMEU) model, or the “multiple prior” model. The axiomatization derives a set of priors C that is convex.8 Indeed, given the decision rule of MMEU, any set of probability measures C is observationally equivalent to its convex hull. Given the restriction of convexity, the set C is uniquely identified by the decision maker’s preferences. The utility function in this model is unique up to positive linear transformation, and it is identified in tandem with the set C. In a sense, MMEU provided classical statistics with the foundations that Ramsey, de Finetti, and Savage provided Bayesian statistics.9 EUT specified how a Bayesian prior might be used for decision making, and the axiomatizations of (subjective) EUT offered a derivation of this prior from observable behavior. Similarly, MMEU specified how decisions might be made given a set of priors, and the related axiomatization provided a derivation of the set of priors. However, with a set of priors there seems to be a much lower degree of agreement about the appropriate way to use it for decision making. In particular, the maxmin criterion was often criticized for being too extreme, and several alternatives have been offered. For example, Jaffray (1989) proposed using Hurwicz’s α-criterion over the set of EU values of an act, and Klibanoff et al. (2003) propose aggregating all these EU values. The Gilboa–Schmeidler axiomatization is based on behavioral data.
As such, the set of priors that they derive shares the duality of interpretation with the core of a convex capacity. That is, while it is tempting to interpret the set of priors as reflecting the information available to the decision maker, the two might differ. The set of priors is simply the set of probabilities that describes the decision maker’s behavior, via the maxmin rule, should she satisfy the Gilboa–Schmeidler axioms. It is possible that a decision maker would have actual information represented by a set C, but that she would behave according to the maxmin rule with respect to a different set of priors C′. With the caveat mentioned earlier, MMEU has two main advantages over CEU. First, a general set of priors, restricted only by convexity, may represent a much larger variety of decision situations than may a set that has to be the core of a convex capacity.10 Second, to many authors MMEU appears to be a more intuitive theory than does CEU.11 MMEU is almost as simple to explain as classical EUT. At the same time, MMEU may be easier to implement than EUT, because the former relaxes the informational requirements imposed by the latter. Given that Schmeidler’s interest in uncertainty started with a cognitive unease generated by the assumptions of the Bayesian approach, it is comforting to know that an alternative theory can be offered that relaxes the first tenet of Bayesianism, but that retains the cognitive appeal of EUT.
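The maxmin rule of Section 1.4 can be sketched in a few lines of code (the acts, utilities, and priors below are illustrative, not taken from the text; since EU is linear in the prior, listing the extreme priors of the interval [0.4, 0.6] suffices for the minimum):

```python
# Maxmin EU: evaluate each act by its worst expected utility over the set
# of priors C, then choose the act maximizing that worst case.

def expected_utility(utilities, prior):
    return sum(p * u for p, u in zip(prior, utilities))

def maxmin_choice(acts, priors):
    """acts: dict name -> tuple of state-utilities; priors: list of tuples."""
    worst = {name: min(expected_utility(u, p) for p in priors)
             for name, u in acts.items()}
    return max(worst, key=worst.get)

# Bets on the unknown coin vs. a sure amount, with p(H) in {0.4, 0.6}:
acts = {"bet_on_H": (10, 0), "bet_on_T": (0, 10), "sure_thing": (4.5, 4.5)}
priors = [(0.4, 0.6), (0.6, 0.4)]
# Each bet has worst-case EU 4.0; the sure thing yields 4.5 and is chosen,
# reflecting the uncertainty aversion built into the maxmin rule.
assert maxmin_choice(acts, priors) == "sure_thing"
```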
1.5. Related literature
In the introduction, we followed the development of CEU and of MMEU in an associative and chronological order, tracing the path that Schmeidler had taken in his thoughts about decision under uncertainty. Indeed, CEU and MMEU will remain the focus of this volume. However, these theories bear some similarities to other theories of belief representation and/or of decision making. While we do not intend to provide here a complete history of reasoning about uncertainty, the reader would probably benefit from a brief survey of a few other, closely related theories. All of them were developed independently of CEU and MMEU, and some of these developments were more or less concurrent with the development of CEU and MMEU.
1.5.1. Rank dependent expected utility (RDEU)12
Several psychologists have suggested the notion that individuals may not perceive probabilities correctly. This idea dates back to Preston and Baratta (1948) and Edwards (1954), but it has gained popularity among economists mostly with prospect theory (PT), suggested by Kahneman and Tversky (1979). Specifically, it is postulated that, in describing decision making under risk, an event with a stated probability of p has a decision weight f(p), which is, in general, different from p. It is typically assumed that small probabilities are weighted in a disproportionate way, namely, that f(p) > p for small values of p, and that the converse inequality holds when p is close to 1. If we were to separate this idea from the other ingredients of PT (most notably, from gain–loss asymmetry), we would have the following generalization of EUT: faced with a lottery that promises an outcome xi with probability pi, the decision maker evaluates it by Σi f(pi)u(xi) rather than by Σi pi u(xi). While this idea can quite intuitively explain many violations of EUT under risk, it poses several theoretical difficulties.
First, the evaluation of a lottery depends on its presentation: if the same outcome appears twice in the lottery (with two distinct probabilities), it will enter utility calculations differently than in the case in which it appears only once, with the sum of the corresponding probabilities. Second, the functional Σi f(pi)u(xi) fails to be continuous in the outcomes xi. Finally, it fails to respect first order stochastic dominance. Specifically, it may decrease as some of the xi increase. All of these problems disappear if f is additive, but in this case it has to be the identity function and the model fails to capture distortion of probabilities (Fishburn, 1978). Prospect theory dealt with the first difficulty by an editing phase that the decision maker goes through before evaluating lotteries.13 But it did not offer solutions to the other two problems. These problems are reminiscent of those that one encounters when one attempts to use a naïve definition of integration with respect to a nonadditive measure, as discussed earlier (see Figure 1.1). Specifically, if one starts out with an additive probability measure P, and “distorts” it by a function f, one obtains a nonadditive probability v defined by v(A) = f(P(A)). In this case,
the functional Σi f(pi)u(xi) can be thought of as the naïvely defined integral of the utility of the outcome x, with respect to v. It comes as no surprise, then, that maximization of the functional Σi f(pi)u(xi) poses the same difficulties as those discussed earlier. The discussion of Choquet integration earlier may suggest that, in the context of decision under risk, PT may be modified so that it respects first order stochastic dominance and continuity. To this end, one would like to apply the distortion function f not to the probability that a certain outcome is obtained, but to the cumulative probability that at least a certain outcome is obtained. Defining v = f(P), this is tantamount to defining an integral as in Figure 1.2 as opposed to Figure 1.1. This idea was proposed, independently and more or less concurrently, by Quiggin (1982) and by Yaari (1987). Both were developed independently of Schmeidler’s work. Moreover, Weymark (1981) offered yet another independent derivation of the same functional in the context of social choice. The resulting model in the context of decision under risk has come to be known as the “rank-dependent expected utility model” (see Chew (1983)), because in this model the decision weight of outcome xi does not depend only on the probability of that outcome, but also on the aggregate probabilities of all outcomes that are ranked above (or below) it. The rank-dependent model has been elaborated on by Segal (1989), and it has been applied to a range of economic problems, as well as tested experimentally. More recently, the rank-dependent model was combined with other ideas of PT to generate cumulative prospect theory (CPT, see Tversky and Kahneman, 1992). The rank-dependent model (without additional ingredients of PT) is a special case of CEU. Specifically, defining v = f(P), CEU reduces to RDEU. However, the converse is false: not every CEU model can be represented as an RDEU model.
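The rank-dependent evaluation just described can be sketched numerically (the lottery and the distortion function below are illustrative, not taken from the text): with finitely many outcomes, applying f to decumulative probabilities means that outcome x receives the decision weight f(P(outcome ≥ x)) − f(P(outcome > x)).

```python
def rdeu(lottery, f, u=lambda x: x):
    """Rank-dependent evaluation of a lottery given as (outcome, probability)
    pairs with probabilities summing to 1, distortion f, utility u."""
    total, cum = 0.0, 0.0
    for x, p in sorted(lottery, key=lambda op: op[0], reverse=True):
        total += (f(cum + p) - f(cum)) * u(x)   # decision weight of this rank
        cum += p
    return total

lottery = [(100, 0.1), (50, 0.4), (0, 0.5)]
# With the identity distortion, RDEU reduces to plain expected utility:
assert abs(rdeu(lottery, lambda p: p) - 30.0) < 1e-9
# Unlike the naive functional, the evaluation does not depend on the
# presentation of the lottery (splitting an outcome changes nothing):
f2 = lambda p: p ** 2
assert abs(rdeu([(50, 0.2), (50, 0.2), (0, 0.6)], f2)
           - rdeu([(50, 0.4), (0, 0.6)], f2)) < 1e-9
```

Because the decision weights are non-negative and sum to f(1) = 1 for any increasing distortion, the functional is also monotone in the outcomes, which is precisely why it respects first order stochastic dominance.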
Only a very special class of nonadditive probability measures v can be represented by an additive measure P and a distortion function f as stated earlier. The RDEU model does not deal with uncertainty. It is restricted to situations of risk, that is, of known probabilities. In particular, the RDEU model cannot help explain the pattern of choices observed in Ellsberg’s paradox. We therefore do not discuss the RDEU model in this book.
1.5.2. Belief functions
The notion that a nonadditive set function may represent bounds on the probabilities of events dates back to the theory of belief functions suggested by Dempster (1967) and Shafer (1976). Assume that one gathers evidence for various events. Some evidence is very specific, suggesting that a particular state of the world is likely to be the case. Other evidence may be more nebulous, suggesting that a nonsingleton event is likely to have obtained, without specifying which state in it has indeed materialized. Generally, evidence is conceptualized as a non-negative number attached to an event. Thus, evidence is specific to an event, and the weight of evidence is measured numerically. Given a collection of such number–event pairs, what is the total weight of evidence supporting a particular event? In answering this question according to
Dempster–Shafer’s theory, one first normalizes the weight of all evidence gathered so that it adds up to unity. Then one sums up the evidence for the event in question, as well as the evidence for each subset thereof. The resulting function is a nonadditive probability. It can also be shown that this nonadditive probability is convex. In fact, it satisfies a stronger condition than convexity, called infinite monotonicity. Conversely, an infinitely monotone nonadditive measure can be obtained from a set of non-negative weights as described earlier. Such functions are called belief functions. In this theory, nonadditivity arises from the fact that evidence is not fully specified. In the context of the coin example, one might imagine that we have evidence for the fact that one of the sides of the coin will come up, but no evidence that specifically points to any side. The weight of evidence for the event {H, T} cannot be split between {H} and {T}. If one were to think in terms of a “true”, “objective” probability measure, one would view the belief function as a lower bound on the values of this probability measure: each event should be assigned a probability that is at least as large as the value attributed to it by the belief function. Since belief functions are convex, and hence have a nonempty core, there are always probability measures that satisfy the constraints represented by a belief function. Dempster–Shafer’s theory is purely cognitive. It has no behavioral component, and no decision theory attached to it. Dempster and Shafer have not offered an axiomatic foundation for their theory. That is, there is no set of axioms on observable data, such as likelihood judgments, that characterizes a unique belief function related to these data. Yet, it shares with CEU the representation of uncertainty by a nonadditive measure, and the potential interpretation of this nonadditive measure as a lower bound on what the “real”, additive probability might be.
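The construction of a belief function from normalized weights of evidence can be sketched as follows (the particular weights are ours, chosen so that the resulting belief function reproduces the capacity v(H) = v(T) = 0.4 of the coin example):

```python
from itertools import chain, combinations

def belief_from_masses(masses, universe):
    """Dempster-Shafer: bel(A) = sum of the (normalized) weights of
    evidence attached to A and to every subset of A."""
    def subsets(event):
        items = sorted(event)
        return (frozenset(c) for c in
                chain.from_iterable(combinations(items, r)
                                    for r in range(len(items) + 1)))
    return {A: sum(masses.get(B, 0.0) for B in subsets(A))
            for A in subsets(frozenset(universe))}

# Illustrative weights: 0.4 of the evidence points at each side, and 0.2
# points only at "some side comes up" and cannot be split between H and T.
m = {frozenset({"H"}): 0.4, frozenset({"T"}): 0.4, frozenset({"H", "T"}): 0.2}
bel = belief_from_masses(m, {"H", "T"})
assert abs(bel[frozenset({"H"})] - 0.4) < 1e-9       # reproduces v(H) = 0.4
assert abs(bel[frozenset({"H", "T"})] - 1.0) < 1e-9  # normalization
```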
1.5.3. Multiple priors with unanimity ranking
Suppose that a decision maker entertains a set of probability measures as possible priors. For every act, she has a range of possible expected utility values, computed according to these priors. In making a decision, the decision maker may summarize this range by a single number, as suggested by the maxmin, Hurwicz’s, or some other criterion. But she may also refrain from collapsing this set of expected utilities into a single number. Rather, she may retain the entire set of EU values, indexed by the priors, as a representation of the act’s desirability. It is then natural to suggest that act f is preferred to act g if and only if, for each and every possible probability measure p, the EU value of f with respect to p is above that of g. If we think of each possible prior as the opinion of a given individual, then f is preferred to g if and only if f is considered to be better than g unanimously. Alternatively, if we were to think of probability measures as columns in a decision matrix that specifies, for every act, a row of EU values, this criterion would coincide with strict domination.14 This decision rule was axiomatized, independently, by Gilboa (1984) and Bewley (1986, 2003), both relying on a theorem of Aumann (1962).15
Strict domination is, obviously, a partial ordering. It follows that the decision theory one ends up with will have to remain silent on certain choices. Specifically, if an act f is preferred to another act g according to some priors, but the converse preference holds for other priors, the theory does not offer any prediction of choice. This violation of the completeness axiom is viewed by many as problematic for several reasons. Theoretically, the completeness axiom is often justified by necessity: a decision has to be made, so that ultimately revealed preference will have to decide whether f is (at least weakly) preferred to g or vice versa. From a more practical viewpoint, it is often hard to conduct economic analysis when the theory leaves considerable freedom in terms of its predictions. Bewley’s attempt to deal with these difficulties was to suggest that there always is a “status quo” act, f0, which gets to be chosen unless another act dominates it, and then this new act becomes the new status quo. Bewley has not offered a theory of how the status quo is generated. However, such a theory could still be offered, complementing the unanimity multiple prior model as an alternative to MMEU.
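The unanimity ranking, and the incompleteness just discussed, can be sketched with two priors over {H, T} (the acts, utilities, and priors are illustrative, not from the text):

```python
# Unanimity (Bewley-style) ranking: f is weakly preferred to g only if its
# expected utility is at least as high under every prior in the set C.

def eu(utilities, prior):
    return sum(p * u for p, u in zip(prior, utilities))

def unanimously_weakly_preferred(f, g, priors):
    return all(eu(f, p) >= eu(g, p) for p in priors)

priors = [(0.4, 0.6), (0.6, 0.4)]
f, g, h = (10, 0), (0, 10), (1, 1)
assert unanimously_weakly_preferred(f, h, priors)   # f dominates h
# f and g are unranked either way: the ordering is only partial, and the
# theory remains silent on this choice.
assert not unanimously_weakly_preferred(f, g, priors)
assert not unanimously_weakly_preferred(g, f, priors)
```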
1.6. Conclusion
Choquet expected utility and MMEU were suggested as theories of decision making under uncertainty, rejecting the first tenet of Bayesianism. While some researchers view them solely as theories of bounded rationality, their starting point is not mistakes that decision makers might commit, but the theoretical inadequacy of the Bayesian paradigm. Specifically, when there is insufficient information for the generation of a prior probability, it is not obvious how one can choose a prior rationally. Instead, one may entertain uncertainty, and make decisions in a way that reflects one’s state of knowledge. The chapters in this volume constitute a sample of the works published on uncertainty in economic theory. They are divided into two main parts, theory and applications, each containing a more detailed introduction that may help to orient the reader. It (almost) goes without saying that this volume is not exhaustive. Many very good papers, published and unpublished, were written on the topics discussed here, but could not be included in the volume due to obvious constraints. In making the particular selection we offer the reader here, we strove for brevity and variety, in the hope of whetting the reader’s appetite, and with no claim to exhaust the important contributions to this literature. A final caveat relates to terminology. As is often the case, authors who contributed to this volume do not always agree on the appropriate terms for various concepts. In particular, some authors feel very strongly about the choice between “uncertainty” and “ambiguity” (and, correspondingly, between “uncertainty aversion” and “ambiguity aversion”), while they refer, for the most part, to the same concept. Similarly, “MMEU” is sometimes referred to as “MEU” (leaving the maximization out, as in “EU” and “CEU”), or as “the multiple prior model”. Since
all relevant concepts are defined formally in a way that leaves no room for confusion, we decided to let everyone use the terms they prefer, in the hope that the diverse terms would lead to more fruitful associations.
Acknowledgments
I thank my colleagues for comments and suggestions. In particular, this introduction has benefited greatly from many comments by Peter Wakker.
Notes
1 Two more assumptions are often entailed by “Bayesianism”. First, a Bayesian is supposed to conceive of all relevant eventualities. Second, she is expected to apply the Bayesian approach to any decision problem. We do not dwell on these assumptions here.
2 Moreover, the agent may use her probabilistic beliefs in her decisions in a way that uniquely identifies these beliefs, yet that differs from EU maximization, as suggested by Machina and Schmeidler’s (1992) “probabilistic sophistication”.
3 Savage’s axiom P2 states that, if two acts are equal on a given event, then it should not matter what they are equal to on that event. That is, one can determine preference between them based solely on their values on the event on which they differ. This axiom is often referred to as “the Sure Thing Principle”, though this term has been used in several other ways as well. See Wakker (Chapter 2).
4 In fact, Schmeidler was not aware of Ellsberg’s work when he started his study in the early 1980s.
5 Schmeidler’s work appeared as a working paper in 1982. Some of the mathematical analysis was published separately in Schmeidler (1986). However, it was not until 1989 that his main paper appeared in print.
6 Schmeidler’s choice of the letter v to denote a nonadditive measure was probably guided by his past experience with cooperative game theory, in which the letter v is a standard notation for the characteristic function of a transferable utility cooperative game (which is a nonadditive set function). See Shapley (1965).
7 This set of constraints defines the core of v. Yet, even if this set is nonempty, v need not be convex. It will, however, be exact (see Schmeidler, 1972).
8 That is, if p and q are in C, then so is αp + (1 − α)q for every α ∈ [0, 1].
9 The Gilboa and Schmeidler (1989) derivation was conducted in the framework of Anscombe and Aumann (1963). Derivations that do not resort to objective probabilities were provided by Casadesus-Masanell et al. (2000) and Ghirardato et al. (2001).
10 To be concrete, with a finite state space, the beliefs modeled by CEU are characterized by finitely many parameters, whereas the beliefs modeled by MMEU constitute an infinite-dimensional class. This does not imply that MMEU is a generalization of CEU. MMEU only generalizes CEU with a convex capacity. Generally, capacities need not be convex. Moreover, CEU may reflect uncertainty-liking behavior, whereas a certain form of uncertainty aversion is built into MMEU.
11 The axioms of MMEU in the Anscombe–Aumann framework are also easier to interpret than those of CEU.
12 The term “rank dependent utility” is also used for this model.
13 Prospect theory deals with prospects rather than with lotteries, namely, with outcomes as they are perceived relative to a reference point. The notion of a reference point, and the idea that people respond to changes rather than to absolute levels, are ingredients of PT that we ignore here.
14 Alternatively, one may borrow the idea of interval orders (Fishburn, 1985) and argue that f is preferred to g if and only if the entire interval of expected utility values of f
is above that of g. This would correspond to the notion of “overwhelming” strategy in game theory, which is evidently stronger than domination.
15 Gilboa (1984) appeared in a master’s thesis, and has not been translated from Hebrew. Bewley’s paper appeared as a Cowles Foundation discussion paper in 1986. Bewley took this decision rule as a building block of an elaborate theory.
References
Allais, M. (1953), “Le Comportement de l’Homme Rationnel devant le Risque: Critique des Postulats et Axiomes de l’École Américaine,” Econometrica, 21: 503–546.
Anscombe, F. J. and R. J. Aumann (1963), “A Definition of Subjective Probability,” Annals of Mathematical Statistics, 34: 199–205.
Aumann, R. J. (1962), “Utility Theory without the Completeness Axiom,” Econometrica, 30: 445–462.
Bayes, T. (1764), “Essay Towards Solving a Problem in the Doctrine of Chances,” Philosophical Transactions of the Royal Society of London.
Bewley, T. F. (2003), “Knightian Decision Theory, Part I,” Decisions in Economics and Finance, 25: 79–110 (Cowles Foundation Discussion Paper, 1986).
Casadesus-Masanell, R., P. Klibanoff, and E. Ozdenoren (2000), “Maxmin Expected Utility over Savage Acts with a Set of Priors,” Journal of Economic Theory, 92: 35–65.
Chew, S. H. (1983), “A Generalization of the Quasilinear Mean with Applications to the Measurement of Income Inequality and Decision Theory Resolving the Allais Paradox,” Econometrica, 51: 1065–1092.
Choquet, G. (1953–4), “Theory of Capacities,” Annales de l’Institut Fourier, 5 (Grenoble): 131–295.
de Finetti, B. (1937), “La Prévision: Ses Lois Logiques, Ses Sources Subjectives,” Annales de l’Institut Henri Poincaré, 7: 1–68.
Dempster, A. P. (1967), “Upper and Lower Probabilities Induced by a Multivalued Mapping,” Annals of Mathematical Statistics, 38: 325–339.
Edwards, W. (1954), “The Theory of Decision Making,” Psychological Bulletin, 51: 380–417.
Ellsberg, D. (1961), “Risk, Ambiguity and the Savage Axioms,” Quarterly Journal of Economics, 75: 643–669.
Fishburn, P. C. (1978), “On Handa’s ‘New Theory of Cardinal Utility’ and the Maximization of Expected Return,” Journal of Political Economy, 86: 321–324.
Fishburn, P. C. (1985), Interval Orders and Interval Graphs, John Wiley and Sons, New York.
Ghirardato, P., F. Maccheroni, M. Marinacci et al.
(2001), “A subjective Spin on Roulette Wheels,” Econometrica, 71 (6): 1897–1908, November 2003. (Reprinted as Chapter 6 in this volume.) Gilboa, I. (1984), Aggregation of Preferences, MA Thesis, Tel-Aviv University. Gilboa, I. and D. Schmeidler (1989), “Maxmin Expected Utility with a Non-Unique Prior,” Journal of Mathematical Economics, 18: 141–153. Jaffray, J.-Y. (1989), “Linear Utility Theory for Belief Functions,” Operations Research Letters, 8: 107–112. Kahneman, D. and A. Tversky (1979), “Prospect Theory: An Analysis of Decision Under Risk,” Econometrica, 47: 263–291. Klibanoff, P., M. Marinacci, and S. Mukerji (2003), “A Smooth Model of Decision Making under Ambiguity,” mimeo.
Introduction
19
Knight, F. H. (1921), Risk, Uncertainty, and Profit. Boston, New York: Houghton Mifflin.
Machina, M. and D. Schmeidler (1992), “A More Robust Definition of Subjective Probability,” Econometrica, 60: 745–780.
Preston, M. G. and P. Baratta (1948), “An Experimental Study of the Auction Value of an Uncertain Outcome,” American Journal of Psychology, 61: 183–193.
Quiggin, J. (1982), “A Theory of Anticipated Utility,” Journal of Economic Behavior and Organization, 3: 323–343.
Ramsey, F. P. (1931), “Truth and Probability,” in The Foundations of Mathematics and Other Logical Essays. New York: Harcourt, Brace and Co.
Savage, L. J. (1954), The Foundations of Statistics. New York: John Wiley and Sons.
Schmeidler, D. (1972), “Cores of Exact Games, I,” Journal of Mathematical Analysis and Applications, 40: 214–225.
Schmeidler, D. (1986), “Integral Representation without Additivity,” Proceedings of the American Mathematical Society, 97: 255–261.
Schmeidler, D. (1989), “Subjective Probability and Expected Utility without Additivity,” Econometrica, 57: 571–587.
Segal, U. (1989), “Anticipated Utility: A Measure Representation Approach,” Annals of Operations Research, 19: 359–373.
Shafer, G. (1976), A Mathematical Theory of Evidence. Princeton, NJ: Princeton University Press.
Shapley, L. S. (1965), “Notes on n-Person Games VII: Cores of Convex Games,” The RAND Corporation R.M. Reprinted as: Shapley, L. S. (1972), “Cores of Convex Games,” International Journal of Game Theory, 1: 11–26. (Reprinted as Chapter 5 in this volume.)
Tversky, A. and D. Kahneman (1974), “Judgment under Uncertainty: Heuristics and Biases,” Science, 185(4157): 1124–1131.
Tversky, A. and D. Kahneman (1981), “The Framing of Decisions and the Psychology of Choice,” Science, 211(4481): 453–458.
Tversky, A. and D. Kahneman (1992), “Advances in Prospect Theory: Cumulative Representation of Uncertainty,” Journal of Risk and Uncertainty, 5: 297–323.
Weymark, J. A. (1981), “Generalized Gini Inequality Indices,” Mathematical Social Sciences, 1: 409–430.
Yaari, M. E. (1987), “The Dual Theory of Choice under Risk,” Econometrica, 55: 95–115.
2
Preference axiomatizations for decision under uncertainty Peter P. Wakker
Several contributions in this book present axiomatizations of decision models, and of special forms thereof. This chapter explains the general usefulness of such axiomatizations, and reviews the basic axiomatizations for static individual decisions under uncertainty. It will demonstrate that David Schmeidler’s contributions to this field were crucial.
2.1. The general purpose of axiomatizations
In this section we discuss some general purposes of axiomatizations. In particular, the aim is to convince the reader that axiomatizations are an essential step in the development of new models. To start, imagine that you are a novice in decision theory, and have an important decision to take, say which of several risky medical treatments to undergo. You consult a decision theorist, and she gives you a first advice, as follows:
1 List all relevant uncertainties. In your case we assume that the uncertainty concerns which of n potential diseases s1, . . . , sn is the one you have.
2 Express your uncertainty about what your disease is numerically through probabilities p1, . . . , pn, subjective if necessary.
3 Express numerically how good you think the result of each treatment is conditional upon each disease. Call these numbers utilities.
4 Of the available treatments, choose the one that maximizes expected utility, that is, the probability-weighted average utility.
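For readers who prefer a computational illustration, the four steps can be sketched as follows; the diseases, probabilities, and utility numbers below are hypothetical illustrations, not part of the theory.

```python
# Hypothetical illustration of the "first advice": subjective expected
# utility maximization over risky medical treatments.

# Step 1: list the relevant uncertainties (here, three candidate diseases).
diseases = ["d1", "d2", "d3"]

# Step 2: quantify the uncertainty by (subjective) probabilities p1,...,pn.
probs = [0.5, 0.3, 0.2]

# Step 3: utilities of each treatment's result conditional on each disease.
utilities = {
    "treatment A": [0.9, 0.2, 0.4],
    "treatment B": [0.6, 0.7, 0.5],
}

# Step 4: choose the treatment maximizing expected utility, i.e. the
# probability-weighted average utility.
def expected_utility(us, ps):
    return sum(u * p for u, p in zip(us, ps))

best = max(utilities, key=lambda t: expected_utility(utilities[t], probs))
```

With these (made-up) numbers, treatment B's expected utility (0.61) exceeds treatment A's (0.59), so step 4 selects B.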
Presented in this way, the first advice is ad hoc, and will not convince you. What are such subjective probabilities, and how are you to choose them? Similar questions apply to the utility numbers. And, if such numbers can be chosen, why should you take products of probabilities and utilities, and then sum these products? Why not use other mathematical operations? The main problem with the first advice is that its concepts of probabilities and utilities do not have a clear meaning. They are theoretical constructs, which means that they have no meaning in isolation, but can only get meaning within a model, in relation to other concepts. The decision theorist did not succeed in convincing you, and she now turns to a second advice, seemingly very different. She explains the meaning of transitivity
Preference axiomatizations under uncertainty
21
and completeness of preferences to you, and you declare that you want to satisfy these conditions. She next explains the sure-thing principle to you, meaning that a choice between two treatments should depend only on their results under those diseases where the treatments differ, and not on the results for diseases for which the two treatments give the same results. Let us assume that you want to satisfy this condition as well. Next the decision theorist succeeds in convincing you of the appropriateness of the other preference conditions of Savage (1954). Satisfying these conditions is the decision analyst’s second advice.

The second advice is of a different nature than the first. All of its conditions have been stated directly in terms of choice making. Even if you do not agree with the appropriateness of all conditions, at least you can relate to them, and know what they mean. They do not concern strange undefined theoretical concepts. Still, and this was Savage’s (1954) surprising result, the two advices turn out to be identical. One holds if and only if the other holds, given a number of technical assumptions that we ignore here. Whereas the second advice seemed to be entirely different from the first, it turns out to be the same. The second advice translates the first advice, which was stated in a theoretical language, into the meaningful language of empirical primitives, that is, preferences. Such translations are called axiomatizations. They reformulate, directly in terms of the observable primitives such as choices, what it means to assume that some theoretical model holds.

A decision model is normatively appropriate if and only if its characterizing axioms are, and is descriptively valid if and only if the characterizing axioms are. Axiomatizations can be used to justify a model, but also to criticize it. Expected utility can be criticized by criticizing, for instance, the sure-thing principle. This is what Allais (1953) did.
If a model is to be falsified empirically, then axioms can be of help because they are stated in terms of directly testable empirical primitives. In applications, we usually do not believe models to hold perfectly true, and use them as approximations or as metaphors, to clarify some aspects of reality that are relevant to us. We mostly do not actually measure the concepts used in models. For instance, most economic models assume that consumers maximize utility, but we rarely measure consumers’ utility functions. The assumption of utility maximization is justified by the belief that, for the topics considered, completeness and transitivity of preference are reasonable assumptions. These preference axioms, jointly with continuity, axiomatize the maximization of utility and clarify the validity and limitations thereof. Axiomatizations are crucial at an early stage of the development of new models or concepts, namely at the stage where setups and intuitions are qualitative but quantifications seem to be desirable. Not only do axiomatizations show how to verify or falsify, and how to justify or criticize, given models, but they also demonstrate which are the essential parameters and concepts to be measured or determined. Without axiomatizations of expected utility, Choquet expected utility (CEU), and multiple priors, it would not be clear whether their concepts, such as utility, are sensible at all, or whether they are the parameters to be assessed. A historical example may illustrate the importance of axiomatizations. For a long time, models were popular that deviated from expected utility by transforming
22
Peter P. Wakker
probabilities of separate outcomes, such as those examined by Edwards (1955) and Kahneman and Tversky (1979). These models were never axiomatized, which could have served as a warning signal that something was wrong. Indeed, in 1978, Fishburn discovered that no sensible axiomatization of such models will ever be found because these models violate basic axioms such as continuity and, even more seriously, stochastic dominance. When Quiggin (1982) and Schmeidler (1989, first version 1982) introduced alternative models of nonlinear probabilities, they took good care of providing axiomatic foundations. This made clear what the empirical meaning of their models is, that these models do not contain intrinsic inconsistencies, and that their concepts of utilities and nonlinear probabilities are sensible. Quiggin (1982) and Schmeidler (1989) independently developed the idea of rank-dependence and, thus, were the first to present sound models that allow for a new component in individual decision theory: a subjective decision attitude toward incomplete information (i.e. risk and uncertainty). This new component is essential for the study of decision under incomplete information, and sound models for handling it had been dearly missing in the literature up to that point. I consider this development the main step forward for decision under incomplete information of the last decades. Quiggin developed his idea for decision under risk, Schmeidler for the more important and more subtle domain of decision under uncertainty, which is the topic of this book. Axioms can be divided into three different classes. First there are the basic rationality axioms such as transitivity, completeness, and monotonicity, which are satisfied by most models studied today. For descriptive purposes, it has become understood during the last decades that these very basic axioms are the main cause of most deviations from theoretical models. 
For normative applications, these axioms are relatively uncontroversial, although there is no unanimous agreement on any axiom. The second class of axioms consists of technical axioms, mostly continuity, that impose a richness on the structures considered. For decision under uncertainty, these axioms impose a richness on the state space or on the outcome space. They are usually necessary for obtaining mathematical proofs, and will be further discussed later in this chapter. The third and final class of axioms consists of the “intuitive” axioms that are most characteristic of the models they characterize. They vary from model to model. For expected utility, the sure-thing principle (which amounts to the independence axiom for given probabilities) is the most characteristic axiom. Most axiomatizations of nonexpected utility models have relaxed this axiom. Many examples will be discussed in the following sections, and in other chapters in this book. I end this introduction with a citation from Gilboa and Schmeidler (2001), who concisely listed the purposes of axiomatizations as follows:
Meta-theoretical: Define theoretical terms by observables (and enable their elicitation).
Descriptive: Define terms of refutability.
Normative: Do the right thing.
2.2. General conditions for decision under uncertainty
S denotes a state space, with elements called states (of nature). Exactly one state is true, the others are not true. The decision maker does not know which state is the true one, and has no influence on the truth of the states (no moral hazard). For example, assume that a horse race will take place. Exactly one horse will win the race. Every s ∈ S refers to one of the horses participating, and designates the “state of nature” that this horse will win the race. Alternative terms for state of nature are state of the world or proposition. An event is a subset of S, and is true or obtains if it contains the true state of nature. For example, the event “a Spanish horse will win” is the set {s ∈ S: s is Spanish}. C denotes the outcome space, and F the set of acts. Formally, acts are functions from S to C, and F contains all such functions. A decision maker should choose between different acts. An act f will yield the outcome f(s) for the decision maker when s is the true state of nature. Because the decision maker is uncertain about which state is true, she is uncertain about what outcome will result from an act, and has to make decisions under uncertainty. An alternative term for an act is state-contingent payoffs, and acts can refer to financial assets. Acts can be considered random variables with the randomness not expressed through probabilities but through states of nature. David Schmeidler is known for his concise ways of formulating things. In the abstract of Schmeidler (1989), he used only seven words to describe the above model: “Acts map states of nature to outcomes.” By ≽, a binary relation on F, we denote the preference relation of the decision maker over acts. In decision under uncertainty, we study properties of the quadruple ⟨S, C, F, ≽⟩. A function V represents ≽ if V: F → R and f ≽ g if and only if V(f) ≥ V(g).
If a representing function exists, then ≽ must be a weak order, that is, it is complete (f ≽ g or g ≽ f for all acts f, g) and transitive. Completeness implies reflexivity, that is, f ≽ f for all acts f. We write f ≻ g if f ≽ g and not g ≽ f, f ∼ g if f ≽ g and g ≽ f, f ≺ g if g ≻ f, and f ≼ g if g ≽ f. For a weak order ≽, ∼ is an equivalence relation, that is, it is symmetric (f ∼ g if g ∼ f), transitive, and reflexive. Outcomes are often identified with the corresponding constant acts. In this way, ≽ on F generates a binary relation on C, denoted by the same symbol and identified with the restriction of ≽ to the constant acts. Decision under risk refers to the special case of decision under uncertainty where an objective probability measure Q on S is given, and f ∼ g whenever f and g generate the same probability distribution over C. Then the only information relevant for the preference value of an act is the probability distribution that the act generates over the outcomes. Therefore, acts are usually identified with the probability distributions generated over the outcomes, and S is suppressed from the model. It is useful to keep in mind, though, that probabilities must be generated by some random process, and that some randomizing state space S is underlying, even if not an explicit part of the model. It is commonly assumed in decision under risk that S is rich enough to generate all probabilities, and all probability distributions. My experience in decision under risk and uncertainty has been that
formulations of concepts for the general context of uncertainty are more clarifying and intuitive than formulations restricted to the special case of risk. This chapter will focus on axiomatizations for decision under uncertainty, the central topic of this book, and will not discuss axiomatizations for decision under risk. Often, axiomatizations for decision under risk readily follow simply by restricting the axioms of uncertainty to the special case of risk. For example, Yaari’s (1987) axiomatization of rank-dependent utility for risk can be obtained as a mathematical corollary of Schmeidler (1989); I will not elaborate on this point. We will also restrict attention to static models, and will not consider dynamic decision making or multistage models such as examined by Luce (2000) unless they serve to interpret static models. Other restrictions are that we only consider individual decisions, and do not examine decompositions of multiattribute outcomes. We will neither discuss topological or measure-theoretic details, and primarily refer to works introducing results and not to follow-up works and generalizations. The most well-known representation for decision under uncertainty is subjective expected utility (SEU). SEU holds if there exist a probability measure P on S and a utility function U: C → R such that f ↦ ∫S U(f(s)) dP(s), the SEU of f, represents preferences. For infinite state spaces S, measure-theoretic conditions can be imposed to ensure that the expectation is well defined for all acts considered. For the special case of decision under risk, P has to agree with the objective probability measure on S under mild richness assumptions regarding S, contrary to what has often been thought in the psychological literature. In general, P need not be based on objective statistical information, and may be based on subjective judgments of the decision situation in the same way as U is. P is, therefore, often called a subjective probability measure.
SEU implies monotonicity, that is, f ≽ g whenever f(s) ≽ g(s) for all s, where furthermore f ≻ g if f(s) = α ≻ β = g(s) for outcomes α, β and all s in an event E that is “nonnull” in some sense. E being nonnull means that the outcomes of E can affect the preference value of an act, in a way that depends on the theory considered and that will not be formalized here. The most important implication of SEU is the sure-thing principle, discussed informally in the introduction. It means that a preference between two acts is not affected if, for an event for which the two acts yield the same outcome, that common outcome is changed into another common outcome. The condition holds true under SEU, because an event with a common outcome contributes the same term to the expected-utility integral of both acts, which will cancel from the comparison irrespective of what that common outcome is. Savage (1954) introduced this condition as his P2. He did not use the term sure-thing principle for this condition alone, but for a broader idea. The term is, however, used exclusively for Savage’s P2 nowadays. In a mathematical sense, the sure-thing principle can be equated with separability from consumer demand theory, although Savage developed his idea independently. The condition can be derived from principles for dynamic decisions (Burks, 1977: chapter 5; Hammond, 1988), a topic that falls outside the scope of this chapter.
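A minimal numeric check of this cancellation may be helpful; the states, probabilities, and utility numbers below are hypothetical, not from the chapter. Varying the common outcome on the shared event never reverses the SEU comparison.

```python
# Hypothetical check of the sure-thing principle under SEU. Acts f and g
# share a common outcome on event E = {s3}; the EU comparison between
# them does not depend on what that common outcome is.

probs = [0.2, 0.5, 0.3]           # P(s1), P(s2), P(s3)

def eu(act):                      # act = list of utilities, one per state
    return sum(p * u for p, u in zip(probs, act))

for common in [0.0, 1.0, 5.0]:    # vary the shared outcome on s3
    f = [0.8, 0.1, common]
    g = [0.2, 0.6, common]
    # The common term probs[2]*common cancels from the comparison, so the
    # preference direction matches the comparison on {s1, s2} alone.
    assert (eu(f) > eu(g)) == (0.2*0.8 + 0.5*0.1 > 0.2*0.2 + 0.5*0.6)
```

Here g remains weakly preferred to f for every choice of the common outcome, as the sure-thing principle requires.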
The sure-thing principle is too weak to imply SEU. For instance, for a fixed partition (A1, . . . , An) of S, and acts (A1: x1; . . . ; An: xn) yielding xj for each s ∈ Aj, the sure-thing principle amounts to an additively decomposable representation V1(x1) + · · · + Vn(xn), under some technical assumptions discussed later. This representation is strictly more general than the SEU representation P(A1)U(x1) + · · · + P(An)U(xn), for instance if V2 = exp(V1). It can be interpreted as state-dependent expected utility (Karni, 1985). Therefore, additional conditions are required to imply the SEU model. The particular reinforcements of the sure-thing principle depend on the particular model chosen, and are discussed in the next section.
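To see concretely that the decomposable representation is more general, a small sketch (with a hypothetical V1, and V2 = exp(V1) as in the text) confirms that such a V still satisfies the sure-thing principle on a two-event partition: a shared outcome on A2 cancels from every comparison.

```python
# Hypothetical illustration: V(x1, x2) = V1(x1) + V2(x2) with V2 = exp(V1)
# is not of the SEU form P(A1)U(x1) + P(A2)U(x2), yet it satisfies the
# sure-thing principle: the common outcome on A2 cancels from comparisons.
import math

V1 = lambda x: x
V2 = lambda x: math.exp(x)
V = lambda x1, x2: V1(x1) + V2(x2)

for common in [0.0, 1.0, 2.0]:
    # Comparing (x1=3, x2=common) with (x1=2.5, x2=common): the V2(common)
    # term is identical on both sides, so the direction never depends on it.
    assert (V(3.0, common) > V(2.5, common)) == (V1(3.0) > V1(2.5))
```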
2.3. Conditions to characterize subjective expected utility The most desirable characterization of SEU, or any model, would concern an arbitrary set of preferences over acts, not necessarily a complete set of preferences over a set F , and would give necessary and sufficient conditions for the preferences considered to be representable by SEU. Most important would be the case of a finite set of preferences, to truly capture the empirical and normative meaning of models such as SEU. Unfortunately, such general results are very difficult to obtain. For SEU, necessary and sufficient conditions for finite models were given by Shapiro (1979). These conditions are, however, extremely complex, and amount to general solvability requirements of inequalities for mathematical models called rings. They do not clarify the intuitive meaning of the model. Therefore, people have usually resorted to continuity conditions so as to simplify the axiomatizations of models. These continuity conditions imply richness of either the state space or the outcome space. Difficulties in using such technical richness conditions are discussed by Krantz et al. (1971: section 9.1) and Pfanzagl (1968: section 9.5). The following discussion is illustrated in Table 2.1. The most prominent model with richness of the state space is Savage (1954). Savage added an axiom P4 to the sure-thing principle, requiring that a preference for betting on one event rather than another is independent of the stakes of the bets. The richness of the state space was ensured by an axiom P6 requiring arbitrarily fine partitions of the state space to exist, so that the state space must be atomless. Decision under risk can be considered a special case of decision under uncertainty where the state space is rich, because it is commonly assumed that all probabilities can be generated by random events. Other than that, there have not been many derivations of SEU with a rich state space. 
Most axiomatizations have imposed richness structure on the outcome space, to which we turn in the rest of this section. We start with approaches that assume convex subsets of linear spaces as outcome space, with linear utility. In these approaches, outcomes are either monetary, with C ⊂ R an interval, or they are probability distributions over a set of prizes. The sure-thing principle is reinforced into linearity with respect to addition (f ≽ g ⇒ f + c ≽ g + c for acts f, g, c, where addition is statewise), or mixing (f ≽ g ⇒ λf + (1 − λ)c ≽ λg + (1 − λ)c for acts f, g, c,
Table 2.1 Axiomatizations and their structural assumptions
Structural assumption | SEU | CEU | PT | Multiple priors
Continuous state space | Savage (1954) | Gilboa (1987)∗ | |
U linear in money | de Finetti (1931, 1937) | Chateauneuf (1991) | | Chateauneuf (1991)
U linear in probability; mixing, 2-stage | Anscombe and Aumann (1963) | Schmeidler (1989) | | Gilboa and Schmeidler (1989)
Canonical probabilities | Raiffa (1968), Sarin and Wakker (1997) | Sarin and Wakker (1992) | Sarin and Wakker (1994) |
Continuous U, tradeoff consistency | Wakker (1984) | Wakker (1989) | Tversky and Kahneman (1992) |
Continuous U, multisymmetry | Nakamura (1990) | Nakamura (1990) | × |
Continuous U, act-independence | Gul (1992) | Chew and Karni (1994), Ghirardato et al. (2003) | × | Ghirardato et al. (2003); Casadesus-Masanell et al. (2000)
Notes
× Such an extension is not possible, because the required certainty equivalents are not contained in most of the sign-comonotonic sets.
∗ Required more modifications than only comonotonic restrictions.
where mixing is statewise, and under continuity can be restricted to λ = 1/2). Both of these approaches characterize SEU with a linear utility function. The additive approach was followed by de Finetti (1931, 1937) and Blackwell and Girshick (1954: theorem 4.3.1 and problem 4.3.1). For the mixture approach, Anscombe and Aumann (1963) provided the most appealing result. For earlier results on mixture spaces, see Arrow (1951: 431–432). In addition to the axioms mentioned, these works used weak ordering, monotonicity (this, together with additivity, is what de Finetti’s book-making amounts to), and some continuity (existence of “fair prizes” for de Finetti, continuous mixing for the mixture approaches). In the mixture approaches, the linear utility function is interpreted as an expected utility functional for the probability distributions over prizes, and acts are two-stage: In the first stage, the uncertainty about the true state of nature is resolved, yielding a probability distribution over prizes; in the second stage the probability distribution is resolved, finally leading to a prize. This approach assumes that the two stages are processed through backwards induction (“folding back”). The second-stage probabilities could also be modeled through a rich product state space, but for this survey the categorization as rich outcomes is more convenient.
An alternative to Anscombe and Aumann’s (1963) approach was customary in the early decision-analysis literature of the 1960s (Raiffa, 1968: chapter 5). As in Anscombe and Aumann (1963), a rich set of events with objectively given probabilities was assumed present, with preferences over acts on these events governed by expected utility. However, these events were not part of a second stage to be resolved after the events of interest, but they were simply a subset of the collection of events considered in the first, and only, stage. Formally, this approach belongs to the category that requires a rich state space. To evaluate an arbitrary act (A1 : x1 ; . . . ; An : xn ), where no objective probabilities are given for the events Aj , a canonical representation (E1 : x1 ; . . . ; En : xn ) is constructed. Here each event Ej does have an objective probability and is equally likely as event Aj in the sense that one would just as well bet $1 on Ej as on Aj . It is assumed that such canonical representations can be constructed and are preferentially equivalent. In this manner, SEU is obtained over all acts. Sarin and Wakker (1997) formalized this approach. Ramsey (1931) can be interpreted as a variation of this canonical approach, with his “ethically neutral” event an event with probability half, utility derived from gambles on this event, and the extension of SEU to all acts and events not formalized. Returning to the approach with rich outcome sets, more general axiomatizations have been derived for continuous instead of linear utility. Then C can, more generally, be a connected topological space. For simplicity, we continue to assume that C is a convex subset of a linear space. Pfanzagl (1959) gave an axiomatization of SEU when restricted to two-outcome acts. He added a bisymmetry axiom to the sure-thing principle. Denote by CE(f ) a certainty equivalent of act f , that is, an outcome (identified with a constant act) equivalent to f . 
For events A, M with complements Aᶜ, Mᶜ, bisymmetry requires that (A: CE(M: x1; Mᶜ: y1); Aᶜ: CE(M: x2; Mᶜ: y2)) ∼ (M: CE(A: x1; Aᶜ: x2); Mᶜ: CE(A: y1; Aᶜ: y2)). For arbitrary finite state spaces S, Grodal (1978) axiomatized SEU with continuous utility using a mean-groupoid operation (a generalized mixture operation derived from preference) developed by Vind. These works were finally published in Vind (2003). Wakker (1984, 1993) characterized SEU for continuous utility using a tradeoff consistency technique based on the conjoint measurement theory of Krantz et al. (1971) and suggested by Pfanzagl (1968: end of remark 9.4.5). The basic axiom requires that
(A1: α; A2: x2; . . . ; An: xn) ≽ (A1: β; A2: y2; . . . ; An: yn),
(A1: γ; A2: x2; . . . ; An: xn) ≼ (A1: δ; A2: y2; . . . ; An: yn), and
(A1: v1; . . . ; An−1: vn−1; An: α) ≼ (A1: v1; . . . ; An−1: vn−1; An: β)
imply
(A1: v1; . . . ; An−1: vn−1; An: γ) ≼ (A1: v1; . . . ; An−1: vn−1; An: δ),
where (A1, . . . , An) can be any partition of S. By renumbering, similar conditions follow for outcomes α, β, γ, δ conditional on all pairs of events Ai, Aj. Nakamura (1990) used multisymmetry, a generalization of Pfanzagl’s (1959, 1968) bisymmetry to general acts, to characterize SEU with continuous utility for finite state spaces. Similar conditions had appeared before in decision under risk (Quiggin, 1982; Chew, 1989). Chew called the condition event commutativity. Consider a partition (A1, . . . , An) and a “mixing” event M with complementary event Mᶜ. Multisymmetry requires that (A1: CE(M: x1; Mᶜ: y1); . . . ; An: CE(M: xn; Mᶜ: yn)) ∼ (M: CE(A1: x1; . . . ; An: xn); Mᶜ: CE(A1: y1; . . . ; An: yn)). Multisymmetry implies that (x1, . . . , xn) is separable in (A1: CE(M: x1; Mᶜ: c1); . . . ; An: CE(M: xn; Mᶜ: cn)). This implication is called act-independence, and was introduced by Gul (1992). Formally, the condition requires that (A1: x1; . . . ; An: xn) ≽ (A1: y1; . . . ; An: yn) implies (A1: CE(M: x1; Mᶜ: c1); . . . ; An: CE(M: xn; Mᶜ: cn)) ≽ (A1: CE(M: y1; Mᶜ: c1); . . . ; An: CE(M: yn; Mᶜ: cn)). Gul showed that this condition suffices to characterize SEU with continuous utility for finite state spaces, under the usual other assumptions. Gul used an additional symmetry requirement that was shown to be redundant by Chew and Karni (1994). Using bisymmetry axioms for two-outcome acts, Ghirardato et al. (2003a) defined a mixture operation that can be interpreted as an endogenous analog of the mixture operation used in Anscombe and Aumann (1963). They used it also to derive nonexpected utility models discussed in the next section. Characterizations of properties of utility such as concavity have mostly been studied for decision under risk, and less so for decision under uncertainty.
Also for uncertainty, utility is concave if and only if the subjective expected value of an act is always preferred to the act (Wakker, 1989: proposition VII.6.3.ii). This result is more difficult to prove than for decision under risk because not all probabilities need to be available, and is less useful because the subjective expected value is not directly observable, in the same way as subjective probabilities are not. More interesting for uncertainty is that utility is concave if and only if preferences are convex with respect to the mixing of outcomes, that is, if f ≽ g then ½f + ½g ≽ g, where outcomes are mixed statewise (Wakker, 1989: proposition VII.6.3.iv). This condition has the advantage that it is directly observable.
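A numeric illustration of the last condition may help; the acts, probabilities, and (concave) utility below are hypothetical, not from the chapter. With two equally likely states and concave U, an act weakly preferred to another remains weakly preferred after statewise mixing with it.

```python
# Hypothetical illustration: with concave U (here sqrt) and two equally
# likely states, f >= g in preference implies (f+g)/2 >= g, where the
# mixture is taken statewise on monetary outcomes.
import math

U = lambda x: math.sqrt(x)
seu = lambda act: 0.5 * U(act[0]) + 0.5 * U(act[1])

f, g = [9.0, 1.0], [4.0, 4.0]          # two acts over two states
assert seu(f) >= seu(g)                # f is weakly preferred to g

mix = [(a + b) / 2 for a, b in zip(f, g)]   # statewise 50-50 mixture
assert seu(mix) >= seu(g)              # the mixture is still preferred
```

By Jensen's inequality, concavity of U guarantees that the mixture's utility in each state is at least the average of the two acts' utilities, which drives the second assertion.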
2.4. Nonexpected utility models
This section considers models deviating from SEU.
Abandoning basic axioms. Models abandoning completeness (Bewley, 1986; Dubra et al., 2004), transitivity (Fishburn, 1982; Loomes and Sugden, 1982; Vind, 2003), or continuity (Fishburn and LaValle, 1993) will not be discussed. We will only discuss models that weaken the sure-thing principle. In this class, we will not discuss betweenness models (Chew, 1983; Dekel, 1986; Epstein, 1992). These models have been examined almost exclusively for risk, with statements for uncertainty only in Hazen (1987) and Sarin and Wakker (1998), and have nowadays lost popularity. We will neither discuss quadratic utility (Chew et al., 1991), which has been stated only for decision under risk.
Choquet expected utility. The first nonexpected utility model that we discuss is rank-dependent utility, or Choquet expected utility (CEU) as it is often called when considered for uncertainty. We assume a utility function as under SEU, but instead of a subjective probability P on S we assume, more generally, a capacity W on S. W is defined on the collection of subsets of S with W(∅) = 0, W(S) = 1, and C ⊃ D ⇒ W(C) ≥ W(D). ≽ is represented by f ↦ ∫S U(f(s)) dW(s), the CEU of f, defined next. Assume that f = (E1: x1; . . . ; En: xn). The integral is π1U(x1) + · · · + πnU(xn), where the πj are defined as follows. Take a permutation ρ on {1, . . . , n} such that xρ(1) ≥ · · · ≥ xρ(n). Then πρ(j) is W(Eρ(1) ∪ · · · ∪ Eρ(j)) − W(Eρ(1) ∪ · · · ∪ Eρ(j−1)); in particular, πρ(1) = W(Eρ(1)). An important concept in CEU, introduced by Schmeidler (1989), is comonotonicity. Two acts f and g are comonotonic if f(s) > f(t) and g(s) < g(t) for no states s, t. A set of acts is comonotonic if every pair of its elements is comonotonic.
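For concreteness, the rank-dependent decision-weight construction just defined can be sketched computationally; the capacity and outcomes below are hypothetical illustrations, with U taken as the identity on money for simplicity.

```python
# A sketch of the CEU computation for a finite partition (E1:x1; ...; En:xn).
# The capacity W and the outcomes are hypothetical examples.

def ceu(outcomes, utility, W):
    """Choquet expected utility. W maps a list of event indices to the
    capacity of the union of those events."""
    n = len(outcomes)
    # Permutation rho ranking the outcomes from best to worst.
    rho = sorted(range(n), key=lambda j: utility(outcomes[j]), reverse=True)
    total, prev, cum = 0.0, 0.0, []
    for j in rho:
        cum.append(j)                    # E_rho(1) U ... U E_rho(j)
        w = W(cum)
        # Decision weight pi_rho(j) = W(cumulative union) - W(previous union).
        total += (w - prev) * utility(outcomes[j])
        prev = w
    return total

# Hypothetical example: three equally likely events with the convex
# capacity W(A) = P(A)**2, and U the identity.
W = lambda idx: (len(idx) / 3.0) ** 2
U = lambda x: x
value = ceu([9.0, 3.0, 6.0], U, W)   # pessimistic: below the mean of 6.0
```

With an additive W (here, W(A) = |A|/3) the same routine returns the plain expected value, illustrating that CEU reduces to SEU when the capacity is a probability measure.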
Comonotonicity is an important concept because, as can be proved, within any comonotonic subset of F the CEU functional is an SEU functional (with numbers such as the above πρ(j) playing the role of probabilities). It is, therefore, obvious that a necessary requirement for CEU is that all conditions of SEU hold within comonotonic subsets. Such restrictions are indicated by the prefix comonotonic, leading to the comonotonic sure-thing principle, and so on. It is more complex to demonstrate that these comonotonic restrictions are also sufficient to imply CEU, but this can be proved in many circumstances. The third column of Table 2.1 gives the axiomatizations of CEU.

Prospect theory. Original prospect theory, introduced by Kahneman and Tversky (1979), assumed nonlinear probability weighting but had theoretical problems, and it was defined only for risk, not for uncertainty. Only after Schmeidler (1989) introduced a sound model of nonlinear probabilities could a version of prospect theory be developed that is theoretically sound and that also deals with uncertainty (Tversky and Kahneman, 1992). We define it next. Under prospect theory, one outcome, called the reference outcome, plays a special role. Outcomes preferred to the reference outcome are gains; outcomes to which the reference outcome is preferred are losses. The main deviation from other theories is that in different decision situations the decision maker may choose
Peter P. Wakker
different reference points, and remodel her decisions accordingly. Although there is much empirical evidence for such procedures, formal theories to describe them have not yet been developed. We will therefore restrict attention, in this theoretical chapter, to one fixed reference point. For results on varying reference points, see Schmidt (2003). With a fixed reference point, prospect theory generalizes CEU and SEU in that it allows for a different capacity, W⁻, for losses than for gains, where the gain capacity is denoted W⁺. Under prospect theory we define, for an act f, f⁺ by replacing all losses of f by the reference outcome, and f⁻ by replacing all gains of f by the reference outcome. Our notation f⁻ deviates from the mathematical convention that, for real-valued functions f, takes f⁻ as a positive function, namely our function f⁻ multiplied by −1. For general outcomes, however, such a multiplication cannot be defined, which explains our definition. The prospect theory (PT) value of an act f is PT(f) = CEU(f⁺) + CEU(f⁻), where CEU(f⁺) is taken with respect to W⁺ and CEU(f⁻) with respect to the dual of W⁻, which assigns 1 − W⁻(Aᶜ) to each event A (Aᶜ denotes the complement). Two acts f, g are sign-comonotonic if they are comonotonic and, further, there is no state s such that of f(s), g(s) one is a gain and the other a loss. A set of acts is sign-comonotonic if any pair of its elements is sign-comonotonic. Sign-comonotonicity plays the same role for PT as comonotonicity does for CEU. Within any sign-comonotonic set, PT agrees with SEU and, therefore, all conditions of SEU are satisfied within sign-comonotonic sets. A more difficult result, which can be proved in several situations, is that PT holds as soon as the sign-comonotonic conditions of SEU hold, that is, the restrictions of these conditions to sign-comonotonic subsets of acts. Axiomatizations of PT are given in the fourth column of Table 2.1.

Properties of utility and capacities under CEU and PT.
Specific properties of utilities and of capacities have been characterized for CEU and for PT alike. Schmeidler (1989) demonstrated, in his CEU model with linear utility, that the capacity is convex (W(A ∪ B) + W(A ∩ B) ≥ W(A) + W(B)) if and only if preferences are convex. Chateauneuf and Tallon (2002) generalized this result by showing that, under differentiability assumptions, preferences are convex if and only if both the utility is concave and the capacity W is convex. Wakker (2001) gave necessary and sufficient conditions for convexity of the capacity, without restricting the form of utility beyond continuity. Tversky and Wakker (1995) characterized a number of other conditions on capacities, such as bounded subadditivity, that are often found in experimental tests of prospect theory.

Multiple priors. Another popular deviation from expected utility is the multiple priors model. As in SEU, it assumes a utility function U over outcomes. It deviates by considering not one fixed probability measure but a set of probability measures. Say C is such a set of probability measures over S. Then an act f is evaluated by min_{P ∈ C} SEU_P(f), where SEU_P is the subjective expected utility taken with respect to P. This defines the multiple priors model. It was first characterized by Gilboa and
Schmeidler (1989) in an Anscombe–Aumann setup where outcomes designate probability distributions over prizes, evaluated through a linear utility function (an expected utility functional). In a comprehensive paper, Chateauneuf (1991) obtained the same characterization independently, also for linear utility, but with linearity relating to monetary outcomes. For two-outcome acts, the multiple priors model coincides with CEU, so that the common generalization of these two models that imposes the representation only on two-outcome acts can serve as a good starting point (Ghirardato and Marinacci, 2001). The axiomatization of the multiple priors model requires convexity of preference, implying that a representing functional is quasi-concave. It is mainly independence with respect to constant acts, f ≽ g ⇒ λf + (1 − λ)c ≽ λg + (1 − λ)c for acts f, g and constant acts c, that ensures, both in the Gilboa–Schmeidler approach and in Chateauneuf's approach, that the representing functional is even concave. A functional is concave if and only if it is the minimum of dominating linear functionals, which, under appropriate monotonicity, must be expected utility functionals. Thus, the multiple priors model results. The axiomatization of multiple priors for continuous instead of linear utility has been obtained by Casadesus-Masanell et al. (2000), who used both bisymmetry-like and tradeoff-consistency-like axioms, and by Ghirardato et al. (2003a), who used bisymmetry-like axioms to define an endogenous mixture operation. A less conservative extension of the multiple priors model is the α-Hurwicz criterion, where acts are evaluated by α times the minimal SEU plus 1 − α times the maximal SEU over C. It was axiomatized by Ghirardato et al. (2003b).

Probabilistic sophistication. We finally discuss probabilistic sophistication. The derivation of SEU can be divided into two steps.
In the first step, uncertainty is quantified through probabilities, and the only relevant aspect for the preference value of an act is the probability distribution that it generates over the outcomes. In the second step, the probability distribution over outcomes is evaluated through expected utility. Probabilistic sophistication refers to the first of these steps without imposing expected utility in the second step. A first characterization was given by Machina and Schmeidler (1992), with an appealing generalization in Epstein and LeBreton (1993). The main axiom is de Finetti's (1949) additivity: if you would rather bet on A than on B, then you would also rather bet on A ∪ D than on B ∪ D for any event D disjoint from A and B. Under appropriate richness of the event space, this axiom implies that there exists a probability measure P on the events such that you would rather bet on A than on B if and only if P(A) ≥ P(B) (for a review, see Fishburn, 1986). Additional assumptions then guarantee that two different acts generating the same probability distribution over outcomes are equivalent, which implies probabilistic sophistication.
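Returning to the multiple priors model and the α-Hurwicz criterion discussed above, both evaluations reduce to a few lines of code once a set of priors is fixed. A minimal sketch (the priors, bet, and linear utility are my own illustrations, not from the chapter):

```python
def seu(act, P, U):
    """Expected utility of `act` (dict: state -> outcome) under probability P."""
    return sum(P[s] * U(x) for s, x in act.items())

def multiple_priors(act, priors, U):
    """Gilboa-Schmeidler evaluation: the worst expected utility over the set C."""
    return min(seu(act, P, U) for P in priors)

def alpha_hurwicz(act, priors, U, alpha):
    """alpha times the minimal SEU plus (1 - alpha) times the maximal SEU."""
    values = [seu(act, P, U) for P in priors]
    return alpha * min(values) + (1 - alpha) * max(values)

# An ambiguous coin: the priors leave P(H) anywhere between 0.4 and 0.6.
C = [{"H": p, "T": 1 - p} for p in (0.4, 0.5, 0.6)]
bet = {"H": 1.0, "T": 0.0}

print(multiple_priors(bet, C, lambda x: x))      # 0.4
print(alpha_hurwicz(bet, C, lambda x: x, 0.5))   # 0.5
```

With α = 1 the Hurwicz criterion collapses to the multiple priors (maxmin) evaluation, and with α = 0 to its "maximax" mirror image.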
2.5. Conclusion

For all models discussed, axiomatizations provided a crucial step at the beginning of their development, when it was not entirely clear what the right subjective
parameters and their quantitative rules of combination were. It is remarkable that prospect theory could be modeled in a sound way only after Schmeidler (1989) had developed the first axiomatization of decision under uncertainty with nonlinear decision weights.
References Allais, Maurice (1953), “Fondements d’une Théorie Positive des Choix Comportant un Risque et Critique des Postulats et Axiomes de l’Ecole Américaine,” Colloques Internationaux du Centre National de la Recherche Scientifique 40, Econométrie, 257–332. Paris: Centre National de la Recherche Scientifique. Translated into English, with additions, as “The Foundations of a Positive Theory of Choice Involving Risk and a Criticism of the Postulates and Axioms of the American School,” in Maurice Allais and Ole Hagen (1979, eds), Expected Utility Hypotheses and the Allais Paradox, 27–145, Reidel, Dordrecht, The Netherlands. Anscombe, F. J. and Robert J. Aumann (1963), “A Definition of Subjective Probability,” Annals of Mathematical Statistics 34, 199–205. Arrow, Kenneth J. (1951), “Alternative Approaches to the Theory of Choice in Risk-Taking Situations,” Econometrica 19, 404–437. Bewley, Truman F. (1986), “Knightian Decision Theory Part I,” Cowles Foundation Discussion Paper No. 807. Blackwell, David and M. A. Girshick (1954), “Theory of Games and Statistical Decisions.” Wiley, New York. Burks, Arthur W. (1977), “Chance, Cause, Reason (An Inquiry into the Nature of Scientific Evidence).” The University of Chicago Press, Chicago. Casadesus-Masanell, Ramon, Peter Klibanoff, and Emre Ozdenoren (2000), “Maxmin Expected Utility over Savage Acts with a Set of Priors,” Journal of Economic Theory 92, 35–65. Chateauneuf, Alain (1991), “On the Use of Capacities in Modeling Uncertainty Aversion and Risk Aversion,” Journal of Mathematical Economics 20, 343–369. Chateauneuf, Alain and Jean-Marc Tallon (2002), “Diversification, Convex Preferences and Non-Empty Core,” Economic Theory, 19, 509–523. Chew, Soo Hong (1983), “A Generalization of the Quasilinear Mean with Applications to the Measurement of Income Inequality and Decision Theory Resolving the Allais Paradox,” Econometrica 51, 1065–1092. 
Chew, Soo Hong (1989), “The Rank-Dependent Quasilinear Mean,” Unpublished manuscript, Department of Economics, University of California, Irvine, USA. Chew, Soo Hong and Edi Karni (1994), “Choquet Expected Utility with a Finite State Space: Commutativity and Act-Independence,” Journal of Economic Theory 62, 469–479. Chew, Soo Hong, Larry G. Epstein, and Uzi Segal (1991), “Mixture Symmetry and Quadratic Utility,” Econometrica 59, 139–163. de Finetti, Bruno (1931), “Sul Significato Soggettivo della Probabilità,” Fundamenta Mathematicae 17, 298–329. Translated into English as “On the Subjective Meaning of Probability,” in Paola Monari and Daniela Cocchi (eds, 1993) “Probabilità e Induzione,” Clueb, Bologna, 291–321. de Finetti, Bruno (1937), “La Prévision: Ses Lois Logiques, ses Sources Subjectives,” Annales de l’Institut Henri Poincaré 7, 1–68. Translated into English by Henry E. Kyburg Jr., “Foresight: Its Logical Laws, its Subjective Sources,” in Henry E. Kyburg Jr.
and Howard E. Smokler (1964, eds), Studies in Subjective Probability, Wiley, New York; 2nd edition 1980, Krieger, New York. de Finetti, Bruno (1949), “La ‘Logica del Plausible’ Secondo la Concezione di Pòlya,” Atti della XLII Riunione della Società Italiana per il Progresso delle Scienze, 227–236. Dekel, Eddie (1986), “An Axiomatic Characterization of Preferences under Uncertainty: Weakening the Independence Axiom,” Journal of Economic Theory 40, 304–318. Dubra, Juan, Fabio Maccheroni, and Efe A. Ok (2004), “Expected Utility without the Completeness Axiom,” Journal of Economic Theory, 115, 118–133. Edwards, Ward (1955), “The Prediction of Decisions Among Bets,” Journal of Experimental Psychology 50, 201–214. Epstein, Larry G. (1992), “Behavior under Risk: Recent Developments in Theory and Applications.” In Jean-Jacques Laffont (ed.), Advances in Economic Theory II, 1–63, Cambridge University Press, Cambridge, UK. Epstein, Larry G. and Michel Le Breton (1993), “Dynamically Consistent Beliefs Must Be Bayesian,” Journal of Economic Theory 61, 1–22. Fishburn, Peter C. (1978), “On Handa’s ‘New Theory of Cardinal Utility’ and the Maximization of Expected Return,” Journal of Political Economy 86, 321–324. Fishburn, Peter C. (1982), “Nontransitive Measurable Utility,” Journal of Mathematical Psychology 26, 31–67. Fishburn, Peter C. (1986), “The Axioms of Subjective Probability,” Statistical Science 1, 335–358. Fishburn, Peter C. and Irving H. LaValle (1993), “On Matrix Probabilities in Nonarchimedean Decision Theory,” Journal of Risk and Uncertainty 7, 283–299. Ghirardato, Paolo and Massimo Marinacci (2001), “Risk, Ambiguity, and the Separation of Utility and Beliefs,” Mathematics of Operations Research 26, 864–890. Ghirardato, Paolo, Fabio Maccheroni, Massimo Marinacci, and Marciano Siniscalchi (2003a), “A Subjective Spin on Roulette Wheels,” Econometrica, 71, 1897–1908. 
Ghirardato, Paolo, Fabio Maccheroni, and Massimo Marinacci (2003b), “Differentiating Ambiguity and Ambiguity Attitude,” Economic Dept, University of Torino. Gilboa, Itzhak (1987), “Expected Utility with Purely Subjective Non-Additive Probabilities,” Journal of Mathematical Economics 16, 65–88. Gilboa, Itzhak and David Schmeidler (1989), “Maxmin Expected Utility with a Non-Unique Prior,” Journal of Mathematical Economics 18, 141–153. (Reprinted as Chapter 6 in this volume.) Gilboa, Itzhak and David Schmeidler (2001), lecture at 22nd Linz Seminar on Fuzzy Set Theory, Linz, Austria. Grodal, Birgit (1978), “Some Further Results on Integral Representation of Utility Functions,” Institute of Economics, University of Copenhagen, Copenhagen. Appeared in Vind, Karl (2003), “Independence, Additivity, Uncertainty.” With contributions by B. Grodal. Springer, Berlin. Gul, Faruk (1992), “Savage’s Theorem with a Finite Number of States,” Journal of Economic Theory 57, 99–110. (“Erratum,” 1993, Journal of Economic Theory 61, 184.) Hammond, Peter J. (1988), “Consequentialist Foundations for Expected Utility,” Theory and Decision 25, 25–78. Hazen, Gordon B. (1987), “Subjectively Weighted Linear Utility,” Theory and Decision 23, 261–282. Kahneman, Daniel and Amos Tversky (1979), “Prospect Theory: An Analysis of Decision under Risk,” Econometrica 47, 263–291.
Karni, Edi (1985), “Decision-Making under Uncertainty: The Case of State-Dependent Preferences.” Harvard University Press, Cambridge, MA. Krantz, David H., R. Duncan Luce, Patrick Suppes, and Amos Tversky (1971), “Foundations of Measurement, Vol. I. (Additive and Polynomial Representations).” Academic Press, New York. Loomes, Graham and Robert Sugden (1982), “Regret Theory: An Alternative Theory of Rational Choice under Uncertainty,” Economic Journal 92, 805–824. Luce, R. Duncan (2000), “Utility of Gains and Losses: Measurement-Theoretical and Experimental Approaches.” Lawrence Erlbaum Publishers, London. Machina, Mark J. and David Schmeidler (1992), “A More Robust Definition of Subjective Probability,” Econometrica 60, 745–780. Nakamura, Yutaka (1990), “Subjective Expected Utility with Non-Additive Probabilities on Finite State Spaces,” Journal of Economic Theory 51, 346–366. Pfanzagl, Johann (1959), “A General Theory of Measurement—Applications to Utility,” Naval Research Logistics Quarterly 6, 283–294. Pfanzagl, Johann (1968), “Theory of Measurement.” Physica-Verlag, Vienna. Quiggin, John (1982), “A Theory of Anticipated Utility,” Journal of Economic Behaviour and Organization 3, 323–343. Raiffa, Howard (1968), “Decision Analysis.” Addison-Wesley, London. Ramsey, Frank P. (1931), “Truth and Probability.” In “The Foundations of Mathematics and other Logical Essays,” 156–198, Routledge and Kegan Paul, London. Reprinted in Henry E. Kyburg Jr. and Howard E. Smokler (1964, eds), Studies in Subjective Probability, 61–92, Wiley, New York. (2nd edition 1980, Krieger, New York.) Sarin, Rakesh K. and Peter P. Wakker (1992), “A Simple Axiomatization of Nonadditive Expected Utility,” Econometrica 60, 1255–1272. (Reprinted as Chapter 7 in this volume.) Sarin, Rakesh K. and Peter P. Wakker (1994), “Gains and Losses in Nonadditive Expected Utility.” In Mark J. Machina and Bertrand R. 
Munier (eds), Models and Experiments on Risk and Rationality, Kluwer Academic Publishers, Dordrecht, The Netherlands, 157–172. Sarin, Rakesh K. and Peter P. Wakker (1997), “A Single-Stage Approach to Anscombe and Aumann’s Expected Utility,” Review of Economic Studies 64, 399–409. Sarin, Rakesh K. and Peter P. Wakker (1998), “Dynamic Choice and Nonexpected Utility,” Journal of Risk and Uncertainty 17, 87–119. Savage, Leonard J. (1954), “The Foundations of Statistics.” Wiley, New York. (2nd edition 1972, Dover, New York.) Schmeidler, David (1989), “Subjective Probability and Expected Utility without Additivity,” Econometrica 57, 571–587. (Reprinted as Chapter 5 in this volume.) Schmidt, Ulrich (2003), “Reference Dependence in Cumulative Prospect Theory,” Journal of Mathematical Psychology 47, 122–131. Shapiro, Leonard (1979), “Necessary and Sufficient Conditions for Expected Utility Maximizations: The Finite Case, with a Partial Order,” Annals of Statistics 7, 1288–1302. Tversky, Amos and Daniel Kahneman (1992), “Advances in Prospect Theory: Cumulative Representation of Uncertainty,” Journal of Risk and Uncertainty 5, 297–323. Tversky, Amos and Peter P. Wakker (1995), “Risk Attitudes and Decision Weights,” Econometrica 63, 1255–1280. Vind, Karl (2003), “Independence, Additivity, Uncertainty.” With contributions by B. Grodal. Springer, Berlin. Wakker, Peter P. (1984), “Cardinal Coordinate Independence for Expected Utility,” Journal of Mathematical Psychology 28, 110–117.
Wakker, Peter P. (1989), “Additive Representations of Preferences, A New Foundation of Decision Analysis.” Kluwer Academic Publishers, Dordrecht, The Netherlands. Wakker, Peter P. (1993), “Unbounded Utility for Savage’s ‘Foundations of Statistics,’ and other Models,” Mathematics of Operations Research 18, 446–485. Wakker, Peter P. (2001), “Testing and Characterizing Properties of Nonadditive Measures through Violations of the Sure-Thing Principle,” Econometrica 69, 1039–1059. Yaari, Menahem E. (1987), “The Dual Theory of Choice under Risk,” Econometrica 55, 95–115.
3 Defining ambiguity and ambiguity attitude

Paolo Ghirardato
According to the well-known distinction attributed to Knight (1921), there are two kinds of uncertainty. The first, called “risk,” corresponds to situations in which all events relevant to decision making are associated with obvious probability assignments (on which every decision maker agrees). The second, called “(Knightian) uncertainty” or, following Ellsberg (1961), “ambiguity,” corresponds to situations in which some events do not have an obvious, unanimously agreeable probability assignment. As Chapter 1 makes clear, this collection focuses on the issues related to decision making under ambiguity. In this chapter, I briefly discuss the formal definition of ambiguity and of ambiguity attitude. In his seminal paper on the Choquet expected utility (CEU) model, David Schmeidler (1989) proposed a behavioral definition of ambiguity aversion, showing that it is represented mathematically by the convexity of the decision maker’s capacity v. The property he proposed can be understood by means of the example of the two coins used in Chapter 1. Assume that the decision maker places bets that depend on the result of two coin flips, the first of a coin that she is very familiar with, the second of a coin provided by somebody else. Given that she is not familiar with the second coin, it is possible that she would consider “ambiguous” all the bets whose payoff depends on the result of the second flip. (For instance, a bet that pays $1 if the second coin lands with heads up, or equivalently if the event {HH, TH} obtains.) If she is averse to ambiguity, she may therefore see such bets as somewhat less desirable than bets that are “unambiguous,” that is, depend only on the result of the first flip. (For instance, a bet that pays $1 if the first coin lands with heads up, or equivalently if the event {HH, HT} obtains.) However, suppose that we give the decision maker the possibility of buying shares of each bet.
Then, if she is offered a bet that pays $0.50 on {HH} and $0.50 on {HT}, she may prefer it to either of the two bets that pay $1 contingently on {HH} or on {HT}, which are ambiguous. In fact, such a bet has the same contingent payoffs as a bet which pays $0.50 if the first coin lands with heads up, which is unambiguous. That is, a decision maker who is averse to ambiguity may prefer the equal-probability “mixture” of two ambiguous acts to either of the acts. In contrast, a decision maker who is attracted to ambiguity may prefer to choose one of the ambiguous acts.
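The hedging argument above can be made concrete with linear utility and an illustrative capacity (the numbers are mine, not the chapter's): bets on the familiar first coin get weight 1/2, while ambiguous events get only 1/8 per state. For a bet paying a positive amount on an event and nothing otherwise, the Choquet value is simply the amount times the capacity of the event.

```python
STATES = frozenset({"HH", "HT", "TH", "TT"})

def v(event):
    """Illustrative capacity: first-coin events are unambiguous, the rest discounted."""
    e = frozenset(event)
    if e == STATES:
        return 1.0
    if e in (frozenset({"HH", "HT"}), frozenset({"TH", "TT"})):
        return 0.5                  # bets on the familiar first coin
    return len(e) / 8               # ambiguous events: 1/8 per state

def bet_value(amount, event):
    """Choquet value (linear utility) of a bet paying `amount` on `event`, 0 off it."""
    return amount * v(event)

ambiguous = bet_value(1.00, {"HH"})           # $1 on {HH}: worth 0.125
hedged = bet_value(0.50, {"HH", "HT"})        # the 50-cent "mixture": worth 0.25
assert hedged > ambiguous                     # hedging makes the mixture preferred
```

The mixture of the two ambiguous $1 bets pays $0.50 on {HH, HT}, an unambiguous event, which is exactly why its Choquet value exceeds that of either ambiguous bet.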
Formally, Schmeidler called ambiguity averse a decision maker who prefers the even mixture (1/2)f + (1/2)g of two acts that she finds indifferent to either of the two acts. That is, (1/2)f + (1/2)g ≽ f for all f and g such that f ∼ g. As recalled earlier, if the decision maker has CEU preferences, this property implies that her capacity v is convex. If, instead, she has maxmin expected utility (MMEU) preferences, then she satisfies this property automatically (indeed, it is one of the axioms that characterize the model). While this is certainly a compelling definition, it does not seem to be fully satisfactory as a definition of ambiguity aversion. First of all, it explicitly relies on the availability of mixtures of acts, and thus apparently on the existence of objective randomizing devices. This is not a serious problem, for it has been shown by Ghirardato et al. (2001) that mixtures can be defined without invoking randomizing devices, provided the set of prizes is rich and preferences satisfy some mild restrictions. (Moreover, Casadesus-Masanell et al. (2000) show that Schmeidler’s definition can be formulated in a Savage setting which does not explicitly involve mixtures.) Second—and more important—Schmeidler’s definition is not satisfied by preferences that do seem to embody ambiguity aversion, as illustrated by the following example.

Example 3.1. Consider again the decision maker facing the set S = {HH, HT, TH, TT} of results of flips of a familiar and an unfamiliar coin. Suppose that she has CEU preferences represented by a capacity v on S which:

• assigns 1/8 to each singleton state, that is, v({HH}) = v({HT}) = v({TH}) = v({TT}) = 1/8;
• assigns 1/2 to the results of the familiar coin flip, that is, v({HH, HT}) = v({TH, TT}) = 1/2;
• assigns 9/16 to any 3-state event (like {HH, HT, TH}) and 1 to the whole state space;
• assigns the sum of the weights of its (singleton) elements to each other event.
Such a preference embodies a dislike of ambiguity: the decision maker prefers to bet on the familiar coin rather than on the unfamiliar one (notice that v({HH, TH}) = 1/4 < 1/2 = v({HH, HT})). However, the capacity v is not convex, so that she is not ambiguity averse according to Schmeidler’s definition.
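The claims of Example 3.1 can be verified by brute force over all sixteen events; the sketch below (the encoding is mine) also confirms the nonempty-core observation made later in the chapter, namely that the uniform probability lies in Core(v).

```python
from itertools import combinations

states = ["HH", "HT", "TH", "TT"]

def v(event):
    """The capacity of Example 3.1."""
    e = frozenset(event)
    if e in (frozenset({"HH", "HT"}), frozenset({"TH", "TT"})):
        return 1 / 2                # results of the familiar coin flip
    if len(e) == 3:
        return 9 / 16
    if len(e) == 4:
        return 1.0
    return len(e) / 8               # 1/8 per singleton, summed for other events

events = [frozenset(c) for r in range(5) for c in combinations(states, r)]

# She prefers betting on the familiar coin ...
assert v({"HH", "TH"}) < v({"HH", "HT"})
# ... yet v is not convex, so Schmeidler's definition fails ...
assert any(v(A | B) + v(A & B) < v(A) + v(B) for A in events for B in events)
# ... while the uniform probability is in Core(v): P(A) >= v(A) for every event.
assert all(len(A) / 4 >= v(A) for A in events)
```

A concrete convexity violation is A = {HH, HT}, B = {HH, TH}: v(A ∪ B) + v(A ∩ B) = 9/16 + 1/8 < 1/2 + 1/4 = v(A) + v(B).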
3.1. Comparative foundations to ambiguity aversion Motivated by these problems with Schmeidler’s definition, Epstein (1999) tried a different approach to defining aversion to ambiguity, inspired by Yaari’s (1969) general definition of risk aversion for non-expected utility preferences.
He suggested using a two-stage approach: first defining a notion of comparative ambiguity aversion, and then calling averse to ambiguity any preference which is more averse than (what we establish to be) an ambiguity neutral preference. Ghirardato and Marinacci (2002, GM) followed his example, employing a different comparative notion and a different definition of ambiguity neutrality. For reasons that will become clear presently, I shall discuss these contributions in inverse chronological order. Ghirardato and Marinacci start from the observation that preferences that obey classical Expected Utility Theory (EUT) are intuitively ambiguity neutral, and propose using such preferences as the benchmark against which to measure ambiguity aversion. As to the comparative notion, they suggest calling a preference ≽₂ more ambiguity averse than a preference ≽₁ if both preferences are represented by the same utility function and, given any constant act x and any act f, whenever the first preference favors the (certainly unambiguous) constant x over the (possibly ambiguous) f, the second does the same; that is,

x ≽₁ (≻₁) f  ⟹  x ≽₂ (≻₂) f.  (3.1)
Thus, a preference is ambiguity averse if it is more averse to ambiguity than some EUT preference. GM show that every MMEU preference is averse to ambiguity in this sense (while “maximax EU” preferences are ambiguity seeking). In contrast, a CEU preference is ambiguity averse if and only if its capacity v has a nonempty core, a strictly weaker property than convexity. Therefore, GM conclude that Schmeidler’s definition captures strictly more than aversion to ambiguity. (Notice that the capacity v in Example 3.1 does have a nonempty core; the uniform probability on S is in Core(v).) This definition is simple and it has intuitive characterizations, but it can be criticized in an important respect: it does not distinguish between those departures from EUT which are unrelated to ambiguity (like the celebrated “Allais paradox”)—in the terminology of Chapter 1, violations of the third tenet of Bayesianism—and those which are. Every departure from the EUT benchmark is attributed to the presence of ambiguity. To see why this may be an issue, consider the following example.

Example 3.2. Using again the two-coin example, consider a decision maker with CEU (indeed, RDEU) preferences and the capacity v′ defined by v′(S) = 1 and v′(A) = P(A)/2 for A ≠ S, where P is the uniform probability on the state space S. At first blush, we may invoke aversion to ambiguity (recall that the second coin is the unfamiliar one) to explain the fact that v′({TH, HH}) = v′({TT, HT}) = 1/4. However, we also see that v′({HH, HT}) = v′({TH, TT}) = 1/4; that is, the decision maker is similarly unwilling to bet on the familiar, unambiguous coin. What we are observing is a dislike of uncertainty which is more general than just aversion to ambiguity: the decision maker treats even events with “known” probability 1/2 as if they really had probability 1/4. This is a trait usually called probabilistic risk aversion; the decision maker appears in fact to be neutral to
the ambiguity in this problem. However, this capacity is convex, so that both Schmeidler and GM would classify this decision maker as ambiguity averse.

Epstein (1999) offers a definition that avoids this problem, carefully distinguishing between “risk-based” behavioral traits and “ambiguity-based” ones. The key idea is to use a set A of events which are exogenously known to be considered unambiguous by every decision maker, like the results of the flips of the familiar coin in the example stated earlier. Acts which only depend on the events in A are called unambiguous. The comparative definition is then modified as follows: say that preference ≽₂ is more ambiguity averse than preference ≽₁ if for any act f and any unambiguous act h, we have

h ≽₁ (≻₁) f  ⟹  h ≽₂ (≻₂) f.  (3.2)
Notice that this definition is strictly stronger than GM’s: constant acts are unambiguous, while in general (i.e. for nontrivial A) there will be unambiguous acts which are not constant. As long as the set A (and hence the set of unambiguous acts) is sufficiently rich, Eq. (3.2) implies that the two preferences have identical utility functions as well as identical probabilistic risk aversion. For instance, the CEU decision maker of Example 3.1 cannot be compared to the one of Example 3.2; their willingness to bet on the unambiguous results of the familiar coin’s flips differs. A CEU preference comparable to that of Example 3.2 must also “transform” an objective probability of 1/2 into a 1/4. The choice of the benchmark with respect to which ambiguity aversion is to be measured is made consistently with this modified comparative notion. EUT preferences are probabilistic risk neutral, and do not “transform” the probabilities of unambiguous events, so they cannot be compared to preferences like the CEU preference of Example 3.2. Epstein instead uses preferences which satisfy Machina and Schmeidler’s (1992) probabilistic sophistication model, which allows nonexpected utility preferences as long as their ranking of bets on events can be represented by a probability. He calls a decision maker ambiguity averse if his preference is more averse to ambiguity than a probabilistically sophisticated preference. His characterization results are not as clear-cut as those in GM: while basically every MMEU preference is ambiguity averse, the characterization of CEU preferences is less straightforward. Epstein does provide a full characterization for those CEU preferences that satisfy a certain smoothness condition, which he calls “eventwise differentiability.” I refer the reader to his chapter for details.

Epstein’s definition of ambiguity aversion is limited by the requirement of a rich set A of exogenously unambiguous events.
Suppose that we observe a decision maker who has CEU preferences with the capacity of Example 3.2, but we do not know what the decision maker knows about the two coins. Can we conclude that he is ambiguity neutral and probabilistic risk averse? If both coins were unfamiliar, his capacity would instead reflect ambiguity aversion—for all we know, he may even have EUT preferences (i.e. be probabilistic risk neutral) when betting on familiar coins. The problem is that in this case the set A is just the trivial {∅, S},
too poor to enable us to distinguish between “pure” ambiguity aversion and probabilistic risk aversion. (As a consequence, the observation that the capacity of Example 3.2 is convex yet induces behavior that is not intuitively ambiguity averse may be in need of reconsideration.) We reach the conclusion that a theory of “pure” ambiguity aversion (as opposed to what is measured by GM) must be founded on an endogenous theory of ambiguity, if it is to be generally valid. This is what Epstein next turned his attention to; it is discussed in the next subsection. Before closing this discussion of the comparative foundation of ambiguity aversion, I remark that, while Epstein’s (1999) chapter is the earliest to use a comparative approach to provide an absolute notion of ambiguity aversion, others discussed comparative ambiguity aversion much earlier. Tversky and Wakker (1995) present and characterize several comparative notions related to ambiguity and probabilistic risk aversion. Kelsey and Nandeibam (1996) propose a comparative notion similar to GM’s, implicitly assuming the equality of utilities, and show its characterization for CEU and MMEU preferences.
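The capacity of Example 3.2 (written v′ here to distinguish it from Example 3.1's v) can be checked the same brute-force way: it is convex, so both Schmeidler's and GM's definitions classify the preference as ambiguity averse, even though it discounts bets on the familiar and the unfamiliar coin identically. A sketch of mine:

```python
from itertools import combinations

states = ["HH", "HT", "TH", "TT"]

def v_prime(event):
    """The capacity of Example 3.2: v'(S) = 1, v'(A) = P(A)/2 for A != S."""
    e = frozenset(event)
    if e == frozenset(states):
        return 1.0
    return (len(e) / 4) / 2         # P is uniform on the four states

events = [frozenset(c) for r in range(5) for c in combinations(states, r)]

# v' is convex: v'(A u B) + v'(A n B) >= v'(A) + v'(B) for all events ...
assert all(v_prime(A | B) + v_prime(A & B) >= v_prime(A) + v_prime(B)
           for A in events for B in events)
# ... yet it treats familiar and unfamiliar coin bets identically:
assert v_prime({"HH", "HT"}) == v_prime({"HH", "TH"}) == 0.25
```

The equality in the last line is the telltale sign of probabilistic risk aversion rather than ambiguity aversion: the "known" probability 1/2 is itself discounted to 1/4.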
3.2. What is ambiguity?

As observed earlier, the quest for a distinction between ambiguity aversion and behavioral traits unrelated to the presence of ambiguity was a driving force behind the more recent attempts (like Epstein and Zhang (2001)) at understanding the behavioral consequences of the presence of ambiguity. However, others too have addressed the definition of ambiguity. Fishburn (1993) considers a primitive ambiguity relation over events, and discusses its properties and representation by an ambiguity measure. Nehring (1999) defines an event A unambiguous for an MMEU preference with set of priors C if P(A) = P′(A) for every P, P′ ∈ C. As to CEU preferences, Nehring recalls that any capacity v on a finite state space S = {s1, s2, …, sn} can be canonically associated with the set Cv of the probabilities Pσ defined as follows. Let σ denote a permutation of the indices {1, …, n}, and define Pσ(sσ(i)) = v({sσ(1), sσ(2), …, sσ(i)}) − v({sσ(1), sσ(2), …, sσ(i−1)}). This fact allows him to define ambiguity of events analogously to the MMEU case, with Cv in place of C. In both cases, an event is unambiguous if it is given identical weight in the evaluation of any act. Nehring shows that while for MMEU preferences the set of unambiguous events is a λ-system (a class closed with respect to complements and disjoint unions), for CEU preferences it is an algebra (i.e. it is also closed with respect to intersections). As there are situations in which the set of unambiguous events is not an algebra, this suggests that CEU preferences cannot be used to model all decision problems under ambiguity. A notion of ambiguity for events that holds for a wider class of preferences was introduced in Zhang (2002). Loosely put, Zhang calls unambiguous an event A such that Savage’s sure-thing principle holds for acts separated on the partition {A, Ac}. He then shows that the set of such events is a λ-system, and that for
Defining ambiguity and ambiguity attitude
41
a subset of CEU preferences (those which induce an exact v; details are found in GM) it has a simple representation in terms of the capacity v: it is the set of the A’s such that v(A) + v(Ac) = 1. Zhang’s definition of unambiguous event was later modified in Epstein and Zhang (2001, EZ), the announced attempt to endogenize the class of unambiguous events used in Epstein’s definition of ambiguity aversion. The idea of EZ’s definition is similar to Zhang’s (2002), though it yields a larger collection of unambiguous events. Axioms on the decision maker’s preferences are introduced which guarantee that the resulting collection of events is a λ-system, and that the preferences over the unambiguous acts (those which are measurable with respect to unambiguous events) are probabilistically sophisticated in the sense of Machina and Schmeidler (1992). This yields an interesting extension of Machina and Schmeidler’s and Savage’s models, wherein the set of events on which the decision maker satisfies the first and second tenets of Bayesianism is determined endogenously.8 However, it does not fully solve the problem of screening out “risk-based” behavioral traits. In fact, if a preference is probabilistically sophisticated, then every event is unambiguous in the EZ sense. It follows that the decision maker with CEU preferences and capacity v in Example 3.2 (who, recall, is probabilistically sophisticated) considers every event unambiguous and is probabilistically risk averse. This is regardless of the information that is available to her; it does not matter whether she is betting on familiar or unfamiliar coins. The problem is that EZ’s definition does not distinguish between the events that really are unambiguous and those which merely appear to be. It seems likely that such a distinction could only be drawn by enriching the decision framework; that is, by allowing the theorist to observe more than just the decision maker’s preferences over acts.
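Zhang’s simple representation lends itself to a quick numerical check. The sketch below (in Python, purely for illustration) builds a small capacity of our own devising—it is not the capacity of Example 3.2—collects the events with v(A) + v(Ac) = 1, and verifies the two λ-system closure properties on the resulting collection:

```python
from itertools import chain, combinations

def subsets(states):
    """All subsets of `states`, as frozensets."""
    s = list(states)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))]

S = frozenset({1, 2, 3, 4})

def v(A):
    """A hypothetical capacity: additive (weight 1/4 each) on states 1 and 2,
    but undervaluing states 3 and 4 taken singly (0.15 instead of 0.25)."""
    base = 0.25 * len(A & {1, 2})
    amb = {0: 0.0, 1: 0.15, 2: 0.5}[len(A & {3, 4})]
    return base + amb

# Zhang-style candidate collection: events with v(A) + v(A^c) = 1.
U = [A for A in subsets(S) if abs(v(A) + v(S - A) - 1) < 1e-9]

# lambda-system closure: complements and disjoint unions stay in U.
assert all(S - A in U for A in U)
assert all(A | B in U for A in U for B in U if not (A & B))
```

Here U consists of the eight events that do not split the “ambiguous” pair {3, 4}; in this toy case U happens to be an algebra as well, while in general—as noted in the text—only the λ-system structure is guaranteed.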
Going back to the two-coins example, regardless of what a decision maker thinks about the unfamiliar coin, she may believe that the event that it lands heads up on a single flip is more likely than the event that it lands heads up twice in a row. That is, she may hold that a bet on one head in two flips is “unambiguously better than” a bet on two heads in two flips. None of the notions of ambiguity introduced thus far can formally capture this possibility. In an unpublished 1996 conference talk, Nehring suggested doing so using the largest subrelation of ≽ that satisfies independence, which I shall label ≽^I. He argued that if S is finite, for a class of preferences9 the results in Bewley (2002) can be used to show that ≽^I has a multiple priors with unanimity representation, with a set of priors D. In particular, when the decision maker satisfies MMEU with set of priors C we have C = D, while D = Cv when she satisfies CEU with capacity v. Although the relation ≽^I thus obtained can in principle be constructed using only behavioral data, its derivation is not simple. Independently, Nehring (2001) and Ghirardato et al. (2002) proposed to derive from the decision maker’s preference an unambiguous preference relation as follows: say that act f is unambiguously preferred to act g, denoted f ≽* g, if

αf + (1 − α)h ≽ αg + (1 − α)h for every α and every h.

That is, f ≽* g if the preference of f over g cannot be overturned by mixing them with another act h, regardless of whether the latter allows one to hedge (or speculate on) the ambiguity. It turns out that ≽* = ≽^I, providing
42
Paolo Ghirardato
a more immediate behavioral foundation to the approach proposed by Nehring in his 1996 talk. The set of priors D representing ≽* by unanimity is naturally interpreted as the ambiguity that the decision maker perceives—better, appears to perceive—in her problem. The events on which all probabilities in D agree (which can be characterized simply in terms of the primitive ≽; see Ghirardato et al. (2002: prop. 24)) are natural candidates for being called unambiguous, and the collection of unambiguous events forms a λ-system. Unlike in his 1996 talk, Nehring (2001) considers a countably infinite S and preferences whose induced ≽* is represented by a D satisfying a “range convexity” condition. Among the various consequences of such range convexity, he characterizes two intuitive notions of absolute ambiguity aversion. In particular, say that a preference relation is weakly ambiguity averse if for every pair of partitions of S, {A1, A2, . . . , An} and {T1, T2, . . . , Tn}, such that each Ti is unambiguous, we cannot have the decision maker preferring betting on Ai over betting on Ti for every i. Under Nehring’s assumptions, a decision maker is weakly ambiguity averse if and only if her ranking of bets can be represented by a capacity v with a nonempty core. A stronger property, which Nehring calls “ambiguity aversion,” is instead shown to be equivalent to the decision maker’s ranking of bets being represented by the lower envelope of D. Ghirardato et al. (2002) consider an arbitrary S and a different class of preferences.10 They show that the set D representing ≽* can also be obtained as an (appropriately defined) “derivative” of the functional that represents the preferences. In particular, when the state space S is finite, this characterization implies that D is the (closed convex hull of the) set of all the Gateaux derivatives of the preference functional, where they exist.
This result generalizes the EUT intuition that a decision maker’s subjective probability of state s is the shadow price for changes in the utility received in state s, by allowing a multiplicity of shadow prices. A consequence is the extension to preferences with nonlinear utility of Nehring’s 1996 result that D corresponds to C (resp. Cv) in the MMEU (resp. CEU) case—which in turn implies that the set of unambiguous events coincides with that defined for such preferences in Nehring (1999). Ghirardato et al. (2002) also prove that the preferences they study can in general be given a representation which generalizes the MMEU representation. More precisely, an act f is evaluated via

a(f) min_{P∈D} ∫ u(f(s)) dP(s) + (1 − a(f)) max_{P∈D} ∫ u(f(s)) dP(s),
where a(·) is a function taking values in [0, 1] which represents the decision maker’s aversion to perceived ambiguity in the sense of GM. They also axiomatize the so-called α-maxmin EU model, in which a(·) ≡ α.11 The interesting aspect of this representation is its clear separation of ambiguity (represented by D) and ambiguity attitude (represented by a(·)), and it is encouraging that the model does not impose cross-restrictions between these two aspects of the representation. As can be seen from the foregoing discussion, the “relation-based” approach to modeling ambiguity is, at least in terms of its consequences, a significant
improvement over the previous “event-based” approaches. It has also yielded some interesting new perspectives on the characterization of ambiguity aversion and love. On the other hand, it is important to stress that this approach suffers from the same shortcoming as GM’s theory of ambiguity aversion: it does not really describe “pure” ambiguity aversion, but rather the conjunction of all those behavioral features that induce departures from the independence axiom of EUT. In the terminology of Chapter 1, it does not distinguish between violations of the first and of the third tenets of Bayesianism. As observed earlier, it is not obvious that a solution to this identification problem can be reached without departing from a purely behavioral approach. Besides, a difficulty with such a departure is that it would require some prejudgment as to what really constitutes ambiguity, which is the very question that we set out to answer. Another limitation of the “relation-based” approach due to its purely behavioral nature is the identification of ambiguity neutrality with lack of ambiguity. If a decision maker’s preferences satisfy EUT, she is deemed to perceive no ambiguity, while it may be the case that she perceives ambiguity and is neutral with respect to it. Clearly, the distinction could be drawn if we considered ancillary information about the ambiguity present in the problem, at the aforementioned cost of prejudging the nature of ambiguity. On the other hand, this is not as serious a concern as the one mentioned earlier, for ultimately our interest is in modeling ambiguity as it affects decision makers’ behavior, and not otherwise.
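To make the objects just discussed concrete, the sketch below takes a made-up two-state prior set D (all numbers are ours, chosen only for illustration), checks unanimous dominance—the numerical content of the unambiguous preference ≽* under its multiple priors with unanimity representation—and then evaluates acts with the a(f)·min + (1 − a(f))·max functional, holding a(·) constant at α (the α-maxmin EU special case):

```python
def expected_utility(P, uf):
    """Expected utility of the state-indexed utility profile uf under prior P."""
    return sum(p * x for p, x in zip(P, uf))

def unambiguously_preferred(uf, ug, D, tol=1e-9):
    """f >=* g iff f's expected utility weakly dominates g's under every prior in D."""
    return all(expected_utility(P, uf) >= expected_utility(P, ug) - tol for P in D)

def alpha_meu(uf, D, alpha):
    """alpha * min_P E_P[u(f)] + (1 - alpha) * max_P E_P[u(f)]."""
    evals = [expected_utility(P, uf) for P in D]
    return alpha * min(evals) + (1 - alpha) * max(evals)

# Perceived ambiguity: the probability of state 1 is pinned down only to [0.4, 0.6].
D = [(0.4, 0.6), (0.5, 0.5), (0.6, 0.4)]

f = (1.0, 1.0)   # constant act (in utils)
g = (0.0, 2.0)   # ambiguous bet on state 2

# f and g are >=*-incomparable: each beats the other under some prior in D.
assert not unambiguously_preferred(f, g, D)
assert not unambiguously_preferred(g, f, D)

# The alpha-MEU criterion completes the ranking; the result depends on attitude:
print(alpha_meu(g, D, alpha=1.0))   # ambiguity-averse (MMEU) value of g: 0.8
print(alpha_meu(g, D, alpha=0.0))   # ambiguity-loving (maxmax) value of g: 1.2
print(alpha_meu(f, D, alpha=1.0))   # the unambiguous act is worth 1.0 at any alpha
```

An ambiguity-averse decision maker thus picks f over g while an ambiguity lover picks g, even though ≽* ranks neither above the other—illustrating the separation of perceived ambiguity (D) from ambiguity attitude (a(·)) emphasized above.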
Notes

1 Recall that Schmeidler used the Anscombe–Aumann setting, in which mixtures of acts can be defined state by state. Also, he used the term “uncertainty averse” rather than ambiguity averse.
2 For instance, we have that v({HH, HT, TH}) = 9/16 < 10/16 = v({HH, HT}) + v({TH}).
3 The preferences considered in GM, called “biseparable preferences,” induce state-independent and cardinally unique utilities. They include CEU and MMEU preferences (and other models as well) as special cases.
4 The idea that nonemptiness of the core could be a more appropriate formalization of ambiguity aversion for CEU preferences had already been suggested by Montesano and Giovannoni (1996).
5 For instance, a CEU preference is probabilistically sophisticated if its capacity v is ordinally equivalent to a probability; that is, if it is RDEU. Such is the case of the preference with capacity v in Example 3.2.
6 Given utility u and an act f, it can be seen from the definition of the Choquet integral that if σ is such that u(f(sσ(1))) ≥ u(f(sσ(2))) ≥ · · · ≥ u(f(sσ(n))), then ∫ u(f) dv = ∫ u(f) dPσ.
7 The fact that unambiguous events should form λ-systems and not algebras was observed earlier in Zhang (2002), whose first version predates Nehring’s.
8 Further extensions in this spirit are found in Kopylov (2002). In that chapter it is also shown that in general the sets of unambiguous events of Zhang and EZ are not λ-systems, but less structured families called “mosaics.”
9 Those with linear utility among the preferences that satisfy all the axioms in Gilboa and Schmeidler (1989) except their “uncertainty aversion” axiom. The latter are called invariant biseparable preferences by Ghirardato et al. (2002).
10 Invariant biseparable preferences (see note 9). Such preferences do not yield specific restrictions on D (beyond convexity, nonemptiness, and closedness), but they embody a mild restriction that Nehring (2001) calls “utility sophistication.” Nehring shows that under range convexity it is possible to define an unambiguous likelihood relation on events even without utility sophistication. See that chapter for details.
11 Variants of this representation are well known at least since the seminal work of Hurwicz (published in Arrow and Hurwicz (1972)). See, in particular, Jaffray (1989).
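The identity in note 6—that the Choquet integral of u(f) against v equals the ordinary expectation of u(f) under the charge Pσ built from the ranking of the outcomes—can be checked numerically. The capacity and utilities below are toy choices of our own, not taken from the text:

```python
def choquet(u, v, states):
    """Choquet integral of a nonnegative profile u (dict: state -> utility),
    via the layer formula sum_i (u_(i) - u_(i+1)) * v(top-i set)."""
    order = sorted(states, key=lambda s: -u[s])
    total, top = 0.0, set()
    for i, s in enumerate(order):
        top.add(s)
        nxt = u[order[i + 1]] if i + 1 < len(order) else 0.0
        total += (u[s] - nxt) * v(frozenset(top))
    return total

def p_sigma(u, v, states):
    """The probability P_sigma induced by the ranking of u, as in the text."""
    order = sorted(states, key=lambda s: -u[s])
    P, prev, top = {}, 0.0, set()
    for s in order:
        top.add(s)
        P[s] = v(frozenset(top)) - prev
        prev += P[s]
    return P

states = {1, 2, 3}
v = lambda A: (len(A) / 3) ** 2        # a toy (convex) capacity
u = {1: 5.0, 2: 2.0, 3: 1.0}           # utilities of the act's outcomes

P = p_sigma(u, v, states)
lhs = choquet(u, v, states)
rhs = sum(P[s] * u[s] for s in states)
assert abs(lhs - rhs) < 1e-9           # both equal 16/9
```

Because σ depends on the ranking induced by the act, a different act can pick out a different Pσ—this ranking dependence is what makes the Choquet integral nonadditive across acts.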
References

Arrow, K. J. and L. Hurwicz (1972). “An Optimality Criterion for Decision Making under Ignorance,” in Uncertainty and Expectations in Economics, ed. by C. Carter and J. Ford. Basil Blackwell, Oxford.
Bewley, T. (2002). “Knightian Decision Theory: Part I,” Decisions in Economics and Finance, 25(2), 79–110 (first version 1986).
Casadesus-Masanell, R., P. Klibanoff, and E. Ozdenoren (2000). “Maxmin Expected Utility over Savage Acts with a Set of Priors,” Journal of Economic Theory, 92, 33–65.
Ellsberg, D. (1961). “Risk, Ambiguity, and the Savage Axioms,” Quarterly Journal of Economics, 75, 643–669.
Epstein, L. G. (1999). “A Definition of Uncertainty Aversion,” Review of Economic Studies, 66, 579–608. (Reprinted as Chapter 9 in this volume.)
Epstein, L. G. and J. Zhang (2001). “Subjective Probabilities on Subjectively Unambiguous Events,” Econometrica, 69, 265–306.
Fishburn, P. C. (1993). “The Axioms and Algebra of Ambiguity,” Theory and Decision, 34, 119–137.
Ghirardato, P., F. Maccheroni, and M. Marinacci (2002). “Ambiguity from the Differential Viewpoint,” Social Science Working Paper 1130, Caltech, http://www.hss.caltech.edu/∼paolo/differential.pdf.
Ghirardato, P., F. Maccheroni, M. Marinacci, and M. Siniscalchi (2001). “A Subjective Spin on Roulette Wheels,” Econometrica, 71(6), 1897–1908.
Ghirardato, P. and M. Marinacci (2002). “Ambiguity Made Precise: A Comparative Foundation,” Journal of Economic Theory, 102, 251–289. (Reprinted as Chapter 10 in this volume.)
Gilboa, I. and D. Schmeidler (1989). “Maxmin Expected Utility with a Non-Unique Prior,” Journal of Mathematical Economics, 18, 141–153. (Reprinted as Chapter 6 in this volume.)
Jaffray, J.-Y. (1989). “Linear Utility Theory for Belief Functions,” Operations Research Letters, 8, 107–112.
Kelsey, D. and S. Nandeibam (1996). “On the Measurement of Uncertainty Aversion,” Mimeo, University of Birmingham.
Knight, F. H. (1921). Risk, Uncertainty and Profit. Houghton Mifflin, Boston.
Kopylov, I. (2002). “Subjective Probabilities on ‘Small’ Domains,” Work in progress, University of Rochester.
Machina, M. J. and D. Schmeidler (1992). “A More Robust Definition of Subjective Probability,” Econometrica, 60, 745–780.
Montesano, A. and F. Giovannoni (1996). “Uncertainty Aversion and Aversion to Increasing Uncertainty,” Theory and Decision, 41, 133–148.
Nehring, K. (1999). “Capacities and Probabilistic Beliefs: A Precarious Coexistence,” Mathematical Social Sciences, 38, 197–213.
—— (2001). “Ambiguity in the Context of Probabilistic Beliefs,” Mimeo, UC Davis.
Schmeidler, D. (1989). “Subjective Probability and Expected Utility without Additivity,” Econometrica, 57, 571–587. (Reprinted as Chapter 5 in this volume.)
Tversky, A. and P. P. Wakker (1995). “Risk Attitudes and Decision Weights,” Econometrica, 63, 1255–1280.
Yaari, M. E. (1969). “Some Remarks on Measures of Risk Aversion and on Their Uses,” Journal of Economic Theory, 1, 315–329.
Zhang, J. (2002). “Subjective Ambiguity, Probability and Capacity,” Economic Theory, 20, 159–181.
4
Introduction to the mathematics of ambiguity Massimo Marinacci and Luigi Montrucchio
4.1. Introduction

As discussed at length in Chapters 1–3, some mathematical objects play a central role in Schmeidler’s decision-theoretic ideas. In this chapter we provide some more details on them. One of the novelties of Schmeidler’s decision theory papers was the use of general set functions, not necessarily additive, to model “ambiguous” beliefs. This provided a new and intriguing motivation for the study of these mathematical objects, already investigated from a different standpoint in cooperative game theory, another field to which David Schmeidler has made important contributions. Here we overview the main properties of such set functions. Most of the results we will present are known, though often not in the generality in which we state and prove them. In the attempt to provide streamlined proofs and more general statements, we have sometimes come up with novel arguments.
4.2. Set functions

4.2.1. Basic properties

We begin by studying the basic properties of set functions. We use the setting of cooperative game theory, as most of these concepts originated there; their decision-theoretic interpretation is treated in great detail in Chapters 1–3 and 13, as well as in the other chapters in this book. Let Ω be a set of players and Σ an algebra of admissible coalitions in Ω. A (transferable utility) game is a real-valued set function ν : Σ → R with the only requirement that ν(Ø) = 0. Given a coalition A ∈ Σ, the number ν(A) is interpreted as its worth, that is, the overall value that its members can achieve by teaming up. The condition ν(Ø) = 0 reflects the obvious fact that the worth of the empty coalition is zero; a priori, nothing more is assumed in defining a game ν. In the game theory literature several additional conditions have been considered. In
particular, a game ν is1

1 positive if ν(A) ≥ 0 for all A;
2 bounded if sup_{A∈Σ} |ν(A)| < +∞;
3 monotone if ν(A) ≤ ν(B) whenever A ⊆ B;
4 superadditive if ν(A ∪ B) ≥ ν(A) + ν(B) for all pairwise disjoint sets A and B;
5 convex (supermodular) if ν(A ∪ B) + ν(A ∩ B) ≥ ν(A) + ν(B) for all A, B;
6 additive (a charge) if ν(A ∪ B) = ν(A) + ν(B) for all pairwise disjoint sets A and B.
All these conditions have natural game-theoretic interpretations (see, e.g. Moulin (1995) and Owen (1995)). For example, a game is monotone when larger coalitions can achieve higher values, and it is superadditive when combining disjoint coalitions results in more than proportional increases in value. As to supermodularity, it is a stronger property than superadditivity and it can be equivalently formulated as

ν(B ∪ C ∪ A) − ν(B ∪ C) ≥ ν(B ∪ A) − ν(B),   (4.1)
for all disjoint sets A, B, and C; hence, it can be interpreted as a property of increasing marginal values (see Proposition 4.15). Some assumptions of a more technical nature are also often made. For example, a game ν is

7 outer (inner, resp.) continuous at A if lim_{n→∞} ν(An) = ν(A) whenever An ↓ A (An ↑ A, resp.);
8 continuous at A if it is both inner and outer continuous at A;
9 continuous if it is continuous at each A;
10 countably additive (a measure) if ν(∪_{i=1}^∞ Ai) = Σ_{i=1}^∞ ν(Ai) for all countable collections of pairwise disjoint sets {Ai}_{i=1}^∞ ⊆ Σ such that ∪_{i=1}^∞ Ai ∈ Σ.

We get important classes of games by combining some of the previous properties. In particular, monotone games are called capacities, additive games are called charges, and countably additive games are called measures. Finally, positive games ν that are normalized with ν(Ω) = 1 are called probabilities. Notice that capacities are always positive and bounded, while positive superadditive games are always capacities.

Given a charge µ, its total variation norm ‖µ‖ is given by

‖µ‖ = sup Σ_{i=1}^{n} |µ(Ai) − µ(Ai−1)|,   (4.2)

where the supremum is taken over all finite chains Ø = A0 ⊆ A1 ⊆ · · · ⊆ An = Ω. Denote by ba(Σ) and ca(Σ) the vector spaces of all charges and of all measures having finite total variation norm, respectively. By classic results (e.g. Dunford and Schwartz (1958) and Rao and Rao (1983)), a charge has finite total variation
if and only if it is bounded, and both ba(Σ) and ca(Σ) are Banach spaces when endowed with the total variation norm. In particular, ca(Σ) is a closed subspace of ba(Σ). In view of these classic results, it is natural to wonder whether a useful norm can be introduced in more general spaces of games. Aumann and Shapley (1974) showed that this is the case by introducing the variation norm on the space of all games. Given a game ν, its variation norm ‖ν‖ is given by

‖ν‖ = sup Σ_{i=1}^{n} |ν(Ai) − ν(Ai−1)|,   (4.3)

where the supremum is taken over all finite chains Ø = A0 ⊆ A1 ⊆ · · · ⊆ An = Ω. If ν is a charge, the variation norm reduces to the total variation norm. Moreover, all finite games are of bounded variation, as they have a finite number of finite chains. Denote by bv(Σ) the vector space of all games ν having finite variation norm. Aumann and Shapley (1974) proved the following noteworthy properties.

Proposition 4.1. A game belongs to bv(Σ) if and only if it can be written as the difference of two capacities. Moreover, bv(Σ) endowed with the variation norm is a Banach space, and ba(Σ) and ca(Σ) are closed subspaces of bv(Σ).2

In view of this result, we can say that bv(Σ) is a Banach environment for not necessarily additive games that generalizes the classic spaces ba(Σ) and ca(Σ). In the sequel we will mostly consider games belonging to it.

We close this section by observing that each game ν has a dual game ν̄ defined by ν̄(A) = ν(Ω) − ν(Ac) for each A. From the definition it follows immediately that:

• the dual of ν̄ is ν itself;
• ν is monotone if and only if ν̄ is;
• ν belongs to bv(Σ) if and only if ν̄ does.
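On a finite algebra these definitions can be verified by brute force. The sketch below uses an illustrative three-player game of our own choosing (worth 0 for singletons, 5/6 for pairs, 1 for the grand coalition) to check monotonicity, superadditivity, and convexity, and to build the dual game:

```python
from itertools import chain, combinations

def powerset(omega):
    """All coalitions (subsets of omega), as frozensets."""
    s = list(omega)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))]

def is_monotone(v, omega):
    return all(v(A) <= v(B) for A in powerset(omega)
               for B in powerset(omega) if A <= B)

def is_superadditive(v, omega):
    return all(v(A | B) >= v(A) + v(B) for A in powerset(omega)
               for B in powerset(omega) if not (A & B))

def is_convex(v, omega):
    return all(v(A | B) + v(A & B) >= v(A) + v(B)
               for A in powerset(omega) for B in powerset(omega))

def dual(v, omega):
    """The dual game: dual(v)(A) = v(omega) - v(omega \\ A)."""
    return lambda A: v(frozenset(omega)) - v(frozenset(omega) - A)

omega = frozenset({1, 2, 3})
worth = {0: 0.0, 1: 0.0, 2: 5/6, 3: 1.0}   # worth depends only on coalition size
v = lambda A: worth[len(A)]

assert is_monotone(v, omega) and is_superadditive(v, omega)
assert not is_convex(v, omega)   # e.g. A = {1,2}, B = {2,3}: 1 + 0 < 5/6 + 5/6
vbar = dual(v, omega)
# vbar gives 1/6 to each singleton and 1 to each pair: it is not subadditive,
# since vbar({1,2}) = 1 > vbar({1}) + vbar({2}) = 1/3.
```

The same enumeration idea extends to any property stated as a finite family of inequalities; its cost grows as 4^|Ω|, so it is only a sanity-check tool for small examples.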
More important, dual games have “dual” properties relative to the original game. For example:

• ν is convex if and only if ν̄ is concave, that is, ν̄(A ∪ B) + ν̄(A ∩ B) ≤ ν̄(A) + ν̄(B) for all A, B;
• ν is inner continuous at A if and only if ν̄ is outer continuous at Ac.

For charges µ it clearly holds that µ = µ̄. Without additivity, ν and ν̄ are in general distinct games (see Proposition 4.3), and sometimes it is useful to consider the pair (ν, ν̄) rather than ν alone.

Example 4.1. The duality between ν and ν̄ does not hold for all properties. For example, it is false that ν is superadditive if and only if ν̄ is subadditive. Consider
the game ν on Ω = {ω1, ω2, ω3} given by ν({ωi}) = 0 for i = 1, 2, 3, ν({ωi, ωj}) = 5/6 for i ≠ j, and ν(Ω) = 1. Its dual ν̄ is given by ν̄({ωi}) = 1/6 for i = 1, 2, 3, ν̄({ωi, ωj}) = 1 for i ≠ j, and ν̄(Ω) = 1. While ν is superadditive, its dual is not subadditive. In fact, ν̄({ω1, ω2}) = 1 > ν̄({ω1}) + ν̄({ω2}) = 1/3. Normalized superadditive games having subadditive duals are sometimes called upper probabilities (see Wolfenson and Fine (1982) and the references contained therein).

4.2.2. The core

Given a game ν, its core is the (possibly empty) set given by

core(ν) = {µ ∈ ba(Σ) : µ(A) ≥ ν(A) for each A and µ(Ω) = ν(Ω)}.

In other words, the core of ν is the set of all suitably normalized charges that setwise dominate ν. Notice that

core(ν) = {µ ∈ ba(Σ) : ν ≤ µ ≤ ν̄} = {µ ∈ ba(Σ) : µ(A) ≤ ν̄(A) for each A and µ(Ω) = ν(Ω)},

and so the core can also be regarded as the set of charges “sandwiched” between the game and its dual, as well as the set of charges setwise dominated by the dual game. The core is a fundamental solution concept in cooperative game theory, where it is interpreted as the set of undominated allocations (see Moulin (1995) and Owen (1995)). Since Schmeidler’s seminal works, the core has played an important role in decision theory as well, as detailed in Chapters 1–3. Mathematically, the interest of the core lies in the connection it provides between games and charges, which, unlike games, are familiar objects in measure theory. As will be seen later, useful properties of games can be deduced via the core from classic properties of charges. The core is a convex subset of ba(Σ). More interestingly, it has the following compactness property.3

Proposition 4.2. When nonempty, the core of a bounded game is weak*-compact.

Proof. Let µ ∈ core(ν) and let k = 2 sup_{A∈Σ} |ν(A)|. For each A it clearly holds that µ(A) ≥ ν(A) ≥ −k. On the other hand,

µ(A) = µ(Ω) − µ(Ac) ≤ ν(Ω) − ν(Ac) ≤ 2 sup_{A∈Σ} |ν(A)|,
and so |µ(A)| ≤ k. By Dunford and Schwartz (1958: 97), ‖µ‖ ≤ 2k, which implies

core(ν) ⊆ {µ ∈ ba(Σ) : ‖µ‖ ≤ 2k}.

By the Alaoglu Theorem (see Dunford and Schwartz, 1958: 424), {µ ∈ ba(Σ) : ‖µ‖ ≤ 2k} is weak*-compact. Therefore, to complete the proof it remains
to show that core(ν) is weak*-closed. Let {µα}α be a net in core(ν) that weak*-converges to µ ∈ ba(Σ). Using the properties of the weak* topology, it is easy to see that µ ∈ core(ν). Hence, core(ν) is weak*-closed.

Remark. When Σ is a σ-algebra, the condition of boundedness of the game in Proposition 4.2 is superfluous by the Nikodym Uniform Boundedness Theorem (e.g. Rao and Rao, 1983: 204–205).

The core suggests some further taxonomy on games. A game ν is

11 balanced if its core is nonempty;
12 totally balanced if all its subgames νA have nonempty cores.4

We already observed that for a charge µ it holds that µ = µ̄. This property actually characterizes charges among balanced games.

Proposition 4.3. A balanced game ν is a charge if and only if ν = ν̄.

Proof. The “only if” part is trivial. As to the “if” part, let µ ∈ core(ν). As ν ≤ µ ≤ ν̄, we have ν = µ = ν̄, as desired.

The next result characterizes balanced games directly in terms of properties of the game ν. It was proved by Bondareva (1963) and Shapley (1967) for finite games, and extended to infinite games by Schmeidler (1968).

Theorem 4.1. A bounded game is balanced if and only if, for all λ1, . . . , λn ≥ 0 and all A1, . . . , An ∈ Σ, it holds that

Σ_{i=1}^{n} λi ν(Ai) ≤ ν(Ω) whenever Σ_{i=1}^{n} λi 1_{Ai} = 1.   (4.4)
Proof. As the converse is trivial, we only show that ν is balanced provided it satisfies (4.4). By (4.4), ν(A) + ν(Ac) ≤ ν(Ω) for all A, so that ν ≤ ν̄. Let E be the collection of all finite subalgebras Σ0 of Σ; for each Σ0 ∈ E set

c(Σ0) = {γ ∈ R^Σ : ν(A) ≤ γ(A) ≤ ν̄(A) for each A ∈ Σ and γ|Σ0 is a charge},

where R^Σ is the collection of all set functions on Σ, and γ|Σ0 is the restriction of γ to Σ0. The set c(Σ0) is nonempty. In fact, as Σ0 is finite and the restriction ν|Σ0 satisfies (4.4), by Bondareva (1963) and Shapley (1967) there exists a charge γ0
on Σ0 satisfying ν(A) ≤ γ0(A) ≤ ν̄(A) for each A ∈ Σ0. If we set

γ(A) = γ0(A) if A ∈ Σ0, and γ(A) = ν(A) otherwise,

we have γ ∈ c(Σ0), so that c(Σ0) ≠ Ø. Set a = inf_{A∈Σ} ν(A) and b = sup_{A∈Σ} ν̄(A). Both a and b belong to R since ν is bounded, and so by the Tychonoff Theorem (see Aliprantis and Border, 1999: 52) the set Π_{B∈Σ} [a, b] is compact in the product topology of R^Σ. Clearly, c(Σ0) ⊆ Π_{B∈Σ} [a, b]. We want to show that c(Σ0) is actually a closed subset of Π_{B∈Σ} [a, b]. Let γt be a net in c(Σ0) such that γt → γ ∈ R^Σ in the product topology, that is, γt(A) → γ(A) for all A ∈ Σ. For each A and each t, we have ν(A) ≤ γt(A) ≤ ν̄(A); hence, ν(A) ≤ γ(A) ≤ ν̄(A). For each t and for all disjoint A and B in Σ0, we have γt(A ∪ B) = γt(A) + γt(B); hence, γ(A ∪ B) = γ(A) + γ(B). We conclude that γ ∈ c(Σ0), and so c(Σ0) is a closed (and so compact) subset of Π_{B∈Σ} [a, b].
˜0 ⊆ Ø = c c 0i . i=1
In other words, the collection of compact sets {c(0 )}0 ∈E satisfies the finite intersection property. In turn, this implies 0 ∈E c(0 ) = Ø (see Aliprantis and Border, 1999: 38), which means that there exists a charge γ such that ν(A) ≤ γ (A) ≤ ν¯ (A) for each A ∈ . Since γ ∈ B∈ [a, b], the charge γ is bounded and so it belongs to ba(). We conclude that core(ν) = Ø, as desired. Remark. As observed by Kannai (1969: 229–230) for positive games Theorem 4.1 also follows from a result of Fan (1956) on systems of linear inequalities in normed spaces. Since countable additivity is a most useful technical property, it is natural to wonder when it is the case that a nonempty core actually contains some measures. The next example of Kannai (1969) shows that this might well not happen. Example 4.2. Let = N and consider the game ν : 2N → R defined by 0 Ac is infinite ν(A) = 1 else Here core(ν) = Ø. In fact, let ∇ be any ultrafilter containing the filter of all sets having finite complements. The two-valued charge u∇ : 2N → R defined by 1 A∈∇ u∇ (A) = 0 else
belongs to core(ν). On the other hand, core(ν) ∩ ca(Σ) = Ø. For, suppose per contra that µ ∈ core(ν) ∩ ca(Σ). For each n ∈ N we have µ({n}) = µ(N) − µ(N \ {n}) = 0. The countable additivity of µ then implies µ(N) = Σ_n µ({n}) = 0, which contradicts µ(N) = ν(N) = 1.

For positive games it is trivially true that core(ν) ⊆ ca(Σ) provided ν is continuous at Ω. In fact, for each monotone sequence An ↑ Ω it holds that

ν(Ω) = µ(Ω) ≥ lim_n µ(An) ≥ lim_n ν(An) = ν(Ω),

for all µ ∈ core(ν). Hence, µ(Ω) = lim_n µ(An), which implies µ ∈ ca(Σ). For signed games we have a more interesting result, based on Aumann and Shapley (1974: 173).

Proposition 4.4. Given a balanced game ν, it holds that core(ν) ⊆ ca(Σ) provided ν is continuous at both Ω and Ø.

Proof. Consider An ↑ Ω. Let µ ∈ core(ν). We want to show that µ(Ω) = lim_n µ(An). Since µ(An) ≥ ν(An) for each n, by the continuity of ν at Ω we have lim inf_n µ(An) ≥ lim inf_n ν(An) = ν(Ω). On the other hand, since An^c ↓ Ø and ν is continuous at Ø, we have

lim sup_n µ(An) = µ(Ω) − lim inf_n µ(An^c) ≤ ν(Ω) − lim inf_n ν(An^c) = ν(Ω).
In sum,

lim sup_n µ(An) ≤ ν(Ω) ≤ lim inf_n µ(An),

and so µ(Ω) = lim_n µ(An), as desired.

The next example shows that in general these continuity properties are only sufficient for the core to be contained in ca(Σ).

Example 4.3. Let λ be the Lebesgue measure on [0, 1] and let f : [0, 1] → R be given by

f(x) = x for 0 ≤ x ≤ 1/2;  f(x) = 1/2 for 1/2 < x < 1;  f(x) = 1 for x = 1.

Consider the game ν(A) = f(λ(A)) for each A. Though this game is not continuous at Ω, we have core(ν) = {λ} ⊆ ca(Σ). For, let µ ∈ core(ν). We want to show that µ = λ. Given any A, there is a partition {Ai}_{i=1}^{n} of A such that λ(Ai) ≤ 1/2 for each i. Hence, µ(A) = Σ_{i=1}^{n} µ(Ai) ≥ Σ_{i=1}^{n} λ(Ai) = λ(A). Since A was arbitrary, this implies µ ≥ λ, and so µ = λ.
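On finite algebras, by contrast, core membership can be checked by direct enumeration. The sketch below verifies a standard fact not proved in this chapter (due to Shapley): for a convex game, every charge of marginal contributions—the Pσ construction—lies in the core. For contrast it also checks that for the merely superadditive three-player game of Example 4.1 no marginal vector is in the core; that game’s core is in fact empty, since each pair needs mass at least 5/6, forcing each singleton to carry at most 1/6. The particular convex capacity is a toy choice of our own:

```python
from itertools import chain, combinations, permutations

def powerset(omega):
    s = list(omega)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))]

def marginal_vector(v, order):
    """Charge of marginal contributions along `order` (the P_sigma construction)."""
    mu, prev, top = {}, 0.0, set()
    for s in order:
        top.add(s)
        mu[s] = v(frozenset(top)) - prev
        prev += mu[s]
    return mu

def in_core(mu, v, omega, tol=1e-9):
    """Setwise domination plus the normalization mu(omega) = v(omega)."""
    return (all(sum(mu[s] for s in A) >= v(A) - tol for A in powerset(omega))
            and abs(sum(mu.values()) - v(frozenset(omega))) < tol)

omega = {1, 2, 3}
convex = lambda A: (len(A) / 3) ** 2                            # supermodular
superadd = lambda A: {0: 0.0, 1: 0.0, 2: 5/6, 3: 1.0}[len(A)]   # Example 4.1's game

# Every marginal vector of the convex game is a core charge...
assert all(in_core(marginal_vector(convex, order), convex, omega)
           for order in permutations(omega))
# ...while none of the superadditive game's marginal vectors is
# (its core is actually empty).
assert not any(in_core(marginal_vector(superadd, order), superadd, omega)
               for order in permutations(omega))
```

The enumeration is exponential in |Ω|, so this is a sanity-check device for small games, not a substitute for the balancedness criterion of Theorem 4.1.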
Intuitively, this example works because the connection between the form of the game ν = f(λ) and its core is a bit “loose.” Formally, there are gaps between ν and the core’s lower envelope min_{µ∈core(ν)} µ(A). For example, if A is such that λ(A) = 3/4, then ν(A) = 1/2 < 3/4 = min_{µ∈core(ν)} µ(A). To fix this problem, Schmeidler (1972) introduced the following class of games: a game ν is

13 exact if it is balanced and ν(A) = min_{µ∈core(ν)} µ(A) for each A.

In other words, a game is exact if for each A there is µ ∈ core(ν) such that ν(A) = µ(A). Exact games can thus be viewed as games in which there is a tight connection between the form of the game and its core. Schmeidler (1972) provided a characterization of exact games in terms of the game ν, related to (4.4). Moreover, he was able to prove that for exact games continuity becomes a necessary and sufficient condition for the core to be a subset of ca(Σ). To see why this is the case, we need a remarkable property of weak*-compact subsets of ca(Σ), due to Bartle, Dunford, and Schwartz (see Maccheroni and Marinacci (2000) and the references contained therein). The result requires Σ to be a σ-algebra, a natural domain for continuous set functions.

Lemma 4.1. If Σ is a σ-algebra, then a subset of ca(Σ) is weak*-compact if and only if it is weakly compact.

Remark. As the proof shows, this lemma is a consequence of the Dini Theorem when K ⊆ ca+(Σ).

Proof. It is enough to prove that a weak*-compact subset of ca(Σ) is weakly compact, the converse being trivial. Suppose K ⊆ ca(Σ) is weak*-compact. Since K is bounded and weakly closed, by Dunford and Schwartz (1958: Theorem IV.9.1) the set K is sequentially weakly compact if and only if, given An ↑ Ω, for each ε > 0 there is a positive integer n(ε) such that |µ(Ω) − µ(An)| < ε for all µ ∈ K and all n ≥ n(ε); in other words, if and only if the measures in K are uniformly countably additive.
For convenience, we only consider the case K ⊆ ca+(Σ) (see, e.g., Maccheroni and Marinacci (2000) for the general case). For each n ≥ 1 consider the evaluation function φn : ba(Σ) → R defined by φn(µ) = µ(An) for each µ ∈ ba(Σ). Moreover, let φ : ba(Σ) → R be defined by φ(µ) = µ(Ω) for each µ ∈ ba(Σ). Both the function φ and each function φn are weak*-continuous, and the sequence {φn}_{n≥1} is increasing on K. As K is weak*-compact and

lim_n φn(µ) = lim_n µ(An) = µ(Ω) = φ(µ) for each µ ∈ K,

by the Dini Theorem (see Aliprantis and Border, 1999: 55) φn converges uniformly to φ on K. In turn, this easily implies the desired uniform countable additivity of the
measures in K, and so K is sequentially weakly compact. By the Eberlein–Smulian Theorem (see Aliprantis and Border, 1999: 256), K is then weakly compact as well.
Using this lemma we can prove the following result, due to Schmeidler (1972) for positive games. Here |µ|(A) denotes the total variation of µ at A (see Aliprantis and Border, 1999: 360).

Theorem 4.2. Let ν : Σ → R be an exact game defined on a σ-algebra Σ. Then the following conditions are equivalent:

(i) ν is continuous at Ω and Ø.
(ii) ν is continuous at each A.
(iii) core(ν) is a weakly compact subset of ca(Σ).
(iv) There exists λ ∈ ca+(Σ) such that for all ε > 0 there exists δ > 0 such that, for every A,

λ(A) < δ =⇒ |µ|(A) < ε for all µ ∈ core(ν).   (4.5)
Remark. Inspection of the proof shows that when ν is positive, in (i) we can just assume continuity at Ω, while in (iv) we can choose λ so that it belongs to core(ν).

Proof. (ii) trivially implies (i), which in turn implies core(ν) ⊆ ca(Σ) by Proposition 4.4. By Proposition 4.2, core(ν) is weak*-compact, and so, by Lemma 4.1, it is weakly compact as well. Assume (iii) holds. Since core(ν) is a weakly compact subset of ca(Σ), by Dunford and Schwartz (1958: Theorem IV.9.2) there is λ ∈ ca+(Σ) such that (iv) holds. If ν is positive, following Delbaen (1974: 226) replace 1/2^i by 1/m_n at the bottom of Dunford and Schwartz (1958: 307) to get λ ∈ core(ν). It remains to show that (iv) implies (ii). Assume (iv). Since λ is countably additive, (4.5) implies that each µ ∈ core(ν) is countably additive as well, that is, core(ν) ⊆ ca(Σ). By Lemma 4.1, core(ν) is weakly compact. We are now ready to show that ν is continuous at each A. Per contra, suppose there is some A at which ν is not continuous; that is, there is a sequence, say An ↑ A (the argument for An ↓ A is similar), and some η > 0 such that |ν(An) − ν(A)| ≥ η. As ν is exact, for each n there is µn ∈ core(ν) such that ν(An) = µn(An). By the Eberlein–Smulian Theorem (see Aliprantis and Border, 1999: 256), core(ν) is sequentially weakly compact as well. Hence, there is a suitable subsequence {µ_{nk}}_k of {µn}_n such that µ_{nk} weakly converges to some µ̃ ∈ core(ν). By Dunford and Schwartz (1958: Theorem IV.9.5), this means that lim_k µ_{nk}(A) = µ̃(A) for each A ∈ Σ. Now, consider

ν(A_{nk}) = µ_{nk}(A_{nk}) = µ_{nk}(A) − µ_{nk}(A \ A_{nk}).   (4.6)
Clearly, A \ A_{n_k} ↓ Ø. Since core(ν) is weakly compact, by Dunford and Schwartz (1958: Theorem IV.9.1) the measures in core(ν) are uniformly countably additive,
Introduction to the mathematics of ambiguity
and so for each ε > 0 there is k(ε) ≥ 1 such that |µ(A \ A_{n_k})| < ε for all µ ∈ core(ν) and all k ≥ k(ε). In particular, |µ_{n_k}(A \ A_{n_k})| < ε for all k ≥ k(ε). As ε is arbitrary, this implies lim_k µ_{n_k}(A \ A_{n_k}) = 0. By (4.6), we then have

    lim_k ν(A_{n_k}) = lim_k µ_{n_k}(A_{n_k}) = µ̃(A) ≥ ν(A).   (4.7)
On the other hand, there exists a µ̂ ∈ core(ν) such that µ̂(A) = ν(A). Hence,

    ν(A) = µ̂(A) = lim_k µ̂(A_{n_k}) ≥ lim_k ν(A_{n_k}).   (4.8)
Putting together (4.7) and (4.8), we get ν(A) = lim_k ν(A_{n_k}), thus contradicting |ν(A_n) − ν(A)| ≥ η. We conclude that ν is continuous at A, as desired.

Point (iv) is noteworthy. It says that the continuity of ν guarantees the existence of a positive control measure λ for core(ν), that is, a measure λ ∈ ca⁺(Σ) such that µ ≪ λ for all µ ∈ core(ν). This is a very useful property; inter alia, it implies that core(ν) can be identified with a subset of L¹(λ), the set of all (equivalence classes of) Σ-measurable functions that are integrable with respect to λ. In fact, by the Radon–Nikodym Theorem (see Aliprantis and Border, 1999: 437), to each µ ≪ λ there corresponds a unique f ∈ L¹(λ) such that µ(A) = ∫_A f dλ for all A.

Corollary 4.1. Let ν : Σ → R be an exact game defined on a σ-algebra Σ. Then, ν is continuous at Ω and Ø if and only if there is λ ∈ ca⁺(Σ) such that core(ν) is a weakly compact subset of L¹(λ).

Proof. Set ca(λ) = {µ ∈ ca(Σ) : µ ≪ λ}. By the Radon–Nikodym Theorem, there is an isometric isomorphism between ca(λ) and L¹(λ) determined by the formula µ(A) = ∫_A f dλ (see Dunford and Schwartz, 1958: 306). Hence, a subset is weakly compact in ca(λ) if and only if it is weakly compact in L¹(λ) as well.

It is sometimes useful to know when the core of a continuous game consists of non-atomic measures. We close the section by studying this problem, which also provides a further illustration of the usefulness of the control measure λ. In order to do so, we need to introduce null sets. Given a game ν, a set N is ν-null if

    ν(N ∪ A) = ν(A) for all A ∈ Σ.   (4.9)

The next lemma collects some basic properties of null sets.

Lemma 4.2. Given a game ν, let N be a ν-null set. Then

(i) each subset B ⊆ N is ν-null;
(ii) ν(B) = 0 and ν(A \ B) = ν(A) for any B ⊆ N;
(iii) N is ν̄-null.
Massimo Marinacci and Luigi Montrucchio
Proof. (i) Let B ⊆ N and let A be any set in Σ. By (4.9), ν(B ∪ A) = ν(B ∪ A ∪ N) = ν(A ∪ N) = ν(A), and so B is ν-null.

(ii) If we put A = Ø in (4.9), we get ν(N) = 0. By (i), each B ⊆ N is ν-null, so that ν(B) = 0 by what we have just established. It remains to show that ν(A \ B) = ν(A) for any B ⊆ N. By (i), A ∩ B is ν-null. Hence, ν(A \ B) = ν((A \ B) ∪ (A ∩ B)) = ν(A), as desired.

(iii) Let A be any set in Σ. By (ii) we then have ν̄(A ∪ N) = ν(Ω) − ν(A^c \ N) = ν(Ω) − ν(A^c) = ν̄(A), as desired.

For a charge µ, a set N is µ-null if and only if |µ|(N) = 0. For, suppose N is µ-null. We have (see Aliprantis and Border, 1999: 360)

    |µ|(N) = sup{|µ(B)| + |µ(N \ B)| : B ⊆ N},

and so point (ii) of Lemma 4.2 implies |µ|(N) = 0. Conversely, suppose |µ|(N) = 0. Then, |µ(B)| = 0 for each B ⊆ N, and so µ(A ∪ N) = µ(A ∪ (N \ A)) = µ(A) + µ(N \ A) = µ(A) for each set A ∈ Σ. We conclude that N is µ-null, as desired.

Given two games ν₁ and ν₂, we say that ν₁ is absolutely continuous with respect to ν₂ (written ν₁ ≪ ν₂) when each ν₂-null set is ν₁-null; we say that the two games are equivalent (written ν₁ ≡ ν₂) when a set is ν₁-null if and only if it is ν₂-null. In the special case of charges we get back the standard definitions of absolute continuity (see Aliprantis and Border, 1999: 363).

Given a balanced game ν, we have µ ≪ ν for each µ ∈ core(ν). For, let m ∈ core(ν) and suppose N is ν-null. For each A ⊆ N, we have m(A) ≥ ν(A) = 0, and

    m(A^c) ≥ ν(A^c) = ν(Ω) = m(Ω) = m(A) + m(A^c),

so that m(A) ≤ 0. Hence, m(A) = 0 for all A ⊆ N, namely, |m|(N) = 0.

For continuous exact games we have the following deeper result, due to Schmeidler (1972: Theorem 3.10), which provides a further useful property of the control measure λ.

Lemma 4.3. Given an exact and continuous game ν defined on a σ-algebra Σ, let λ be the control measure of Theorem 4.2. Then, ν ≡ λ.
Proof. By Dunford and Schwartz (1958: Theorem IV.9.2), we have

    λ = ∑_{n=1}^∞ (2^{−n}/k_n) ∑_{i=1}^{k_n} |µ_i^n|   (4.10)

with each µ_i^n ∈ core(ν). Let N be ν-null. As µ ≪ ν for each µ ∈ core(ν), N is µ-null for each µ ∈ core(ν). Hence, |µ|(N) = 0 for all µ ∈ core(ν). By (4.10), λ(N) = 0. Therefore N is λ-null. Conversely, suppose λ(N) = 0. As µ ≪ λ for each µ ∈ core(ν), we have |µ|(N) = 0 for each µ ∈ core(ν). By exactness, there are µ, µ′ ∈ core(ν) such that

    ν(N ∪ F) = µ(N ∪ F) = µ(F) ≥ ν(F) = µ′(F) = µ′(N ∪ F) ≥ ν(N ∪ F),

and so N is ν-null. We conclude that ν ≡ λ, as desired.

A game ν is non-atomic if for each ν-nonnull set A there is a set B ⊆ A such that both B and A \ B are ν-nonnull. In particular, a charge µ is non-atomic if and only if for each A with |µ|(A) > 0 there is B ⊆ A such that 0 < |µ|(B) < |µ|(A). In turn, this is equivalent to requiring that for each A with µ(A) ≠ 0 there is B ⊆ A such that both µ(B) ≠ 0 and µ(A \ B) ≠ 0 (see Rao and Rao, 1983: 141–142). We can now state and prove the announced result on "non-atomic" cores.

Proposition 4.5. Let ν be a continuous exact game defined on a σ-algebra Σ. Then, ν is non-atomic if and only if core(ν) consists of non-atomic measures.

Proof. "Only if" part. Suppose ν is non-atomic. By Lemma 4.3, λ is non-atomic as well. In turn, this implies that each µ ∈ core(ν) is non-atomic. In fact, let |µ|(A) > 0 for some A, so that λ(A) > 0. Since λ is non-atomic, there is a partition {A₁, B₁} of A such that λ(A₁) = λ(B₁) = λ(A)/2 (see Rao and Rao, 1983: Theorem 5.1.6). If 0 < |µ|(A₁) < |µ|(A) or 0 < |µ|(B₁) < |µ|(A), we are done. Suppose, in contrast, that either |µ|(A₁) = |µ|(A) or |µ|(B₁) = |µ|(A); without loss of generality, let |µ|(A₁) = |µ|(A). Let {A₂, B₂} be a partition of A₁ such that λ(A₂) = λ(B₂) = λ(A₁)/2. If 0 < |µ|(A₂) < |µ|(A₁) or 0 < |µ|(B₂) < |µ|(A₁), we are done. Suppose, in contrast, that either |µ|(A₂) = |µ|(A₁) or |µ|(B₂) = |µ|(A₁); without loss of generality, let |µ|(A₂) = |µ|(A₁).

By proceeding in this way, either we find a set B ⊆ A such that 0 < |µ|(B) < |µ|(A), or we construct a decreasing chain {A_n}_{n≥1} such that λ(A_n) = 2^{−n}λ(A) and |µ|(A_n) = |µ|(A) for all n ≥ 1. In the latter case, since ⋂_{n≥1} A_n ∈ Σ, we get λ(⋂_{n≥1} A_n) = 0 and |µ|(⋂_{n≥1} A_n) = |µ|(A) > 0. Since µ ≪ λ, this is impossible, and so there exists some set B ⊆ A such that 0 < |µ|(B) < |µ|(A). We conclude that µ is non-atomic, as desired.

"If" part. Suppose that each µ ∈ core(ν) is non-atomic. Set λ_n = (2^{−n}/k_n) ∑_{i=1}^{k_n} |µ_i^n| in (4.10). Then, λ = ∑_{n=1}^∞ λ_n. Each positive measure λ_n is non-atomic. For, suppose λ_n(A) > 0. There is some |µ_i^n| such that |µ_i^n|(A) > 0. Hence, there is B ⊆ A such that |µ_i^n|(B) > 0 and |µ_i^n|(A \ B) > 0. Since λ_n ≥ (2^{−n}/k_n)|µ_i^n|, we then have λ_n(B) > 0 and λ_n(A \ B) > 0, as desired. Since each λ_n is non-atomic, λ is non-atomic as well. For, suppose λ(A) > 0. There is some λ_n such that λ_n(A) > 0. Hence, there is B ⊆ A such that λ_n(B) > 0 and λ_n(A \ B) > 0. Since λ ≥ λ_n, we then have λ(B) > 0 and λ(A \ B) > 0, and so λ is non-atomic. By Lemma 4.3, ν ≡ λ. As λ is non-atomic, this implies that ν is non-atomic as well, as desired.
4.3. Choquet integrals

Given a game ν : Σ → R and a real-valued function f : Ω → R, a natural question is whether there is a meaningful way to define an integral ∫ f dν that extends the standard notions of integral for additive games. Fortunately, Choquet (1953: 265) has shown that it is possible to develop a rich theory of integration in a non-additive setting. As usual with notions of integration, we will present Choquet's integral in a few steps, beginning with positive functions.

4.3.1. Positive functions

A function f : Ω → R is Σ-measurable if f^{−1}(I) ∈ Σ for each open and each closed interval I of R (see Dunford and Schwartz, 1958: 240). The set of all bounded Σ-measurable f : Ω → R is denoted by B(Σ).

Proposition 4.6. The set B(Σ) is a lattice. If, in addition, Σ is a σ-algebra, then B(Σ) is a vector lattice.

Proof. Let f, g ∈ B(Σ). We only prove that (f ∨ g)^{−1}(a, b) ∈ Σ for any open (possibly unbounded) interval (a, b) ⊆ R, the other cases being similar. For each t ∈ R, the following holds:

    (f ∨ g > t) = (f > t) ∪ (g > t),
    (f ∨ g < t) = (f < t) ∩ (g < t).

Hence,

    (f ∨ g)^{−1}(a, b) = (f ∨ g > a) ∩ (f ∨ g < b) = ((f > a) ∪ (g > a)) ∩ ((f < b) ∩ (g < b)) ∈ Σ,

as desired. Finally, the fact that B(Σ) is a vector space when Σ is a σ-algebra is a standard result in measure theory (see Aliprantis and Border, 1999: Theorem 4.26).
Given a capacity ν : Σ → R and a positive Σ-measurable function f : Ω → R, the Choquet integral of f with respect to ν is given by

    ∫ f dν = ∫₀^∞ ν({ω ∈ Ω : f(ω) ≥ t}) dt,   (4.11)

where on the right we have a Riemann integral. To see why the Riemann integral is well defined, first observe that

    f^{−1}([t, +∞)) = {ω ∈ Ω : f(ω) ≥ t} ∈ Σ for each t ∈ R.

Set E_t = {ω ∈ Ω : f(ω) ≥ t}; the survival function G_ν : R → R of f with respect to ν is defined by G_ν(t) = ν(E_t) for each t ∈ R. Using this function, we can write (4.11) as ∫ f dν = ∫₀^∞ G_ν(t) dt. The family {E_t}_{t∈R} is a chain, with E_t ⊇ E_{t′} if t ≤ t′.⁵ Since ν is a capacity, we have ν(E_t) ≥ ν(E_{t′}) if t ≤ t′, and so G_ν is a decreasing function. Moreover, since f is both positive and bounded, the function G_ν is positive, decreasing, and with compact support. By standard results on Riemann integration, we conclude that the Riemann integral ∫₀^{+∞} G_ν(t) dt exists, and so the Choquet integral (4.11) is well defined.

The Choquet integral ∫ f dν reduces to the standard additive integral when ν is additive. Given a positive charge µ and a function f in B(Σ), let ∫ f dµ be the standard additive integral for charges (see Aliprantis and Border, 1999: 399 and Rao and Rao, 1983: 115–121).

Proposition 4.7. Given a positive function f ∈ B(Σ) and a positive charge µ ∈ ba(Σ), the Choquet integral coincides with the standard additive integral: ∫ f dµ = ∫₀^∞ µ(f ≥ t) dt.

Proof. We use an argument of Rudin (1987: 172). Set E_t = (f ≥ t). Given ω ∈ Ω, we have
    ∫₀^∞ 1_{E_t}(ω) dt = ∫₀^∞ 1_{[0, f(ω)]}(t) dt = ∫₀^{f(ω)} dt = f(ω).

Equivalently, f(ω) = ∫₀^∞ 1_{E_t}(ω) dλ, where λ is the Lebesgue measure on R. By the Fubini Theorem for the integral (e.g., Marinacci, 1997), we can write

    ∫ f dµ = ∫ (∫₀^∞ 1_{E_t}(ω) dλ) dµ = ∫₀^∞ (∫ 1_{E_t}(ω) dµ) dλ
           = ∫₀^∞ µ(f ≥ t) dλ = ∫₀^∞ µ(f ≥ t) dt,

as desired.
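The layer-cake identity behind Proposition 4.7 is easy to check numerically on a finite state space, where the survival function G_ν is a step function. The following sketch is ours, not from the text; the helper names (`choquet_positive`, `weights`) are illustrative assumptions. It computes (4.11) for a positive f and verifies that an additive ν recovers the usual weighted sum.

```python
# Sketch: Choquet integral of a positive function on a finite state space,
# computed as the Riemann integral of the step function G_nu(t) = nu(f >= t).

def choquet_positive(f, nu):
    """f: dict state -> value >= 0; nu: set function on frozensets, nu(empty) = 0."""
    levels = sorted(set(f.values()))      # jump points of the survival function
    total, prev = 0.0, 0.0
    for t in levels:
        upper = frozenset(s for s in f if f[s] >= t)   # upper set (f >= t)
        total += (t - prev) * nu(upper)   # G_nu is constant on (prev, t]
        prev = t
    return total

# When nu is a charge (additive), the Choquet integral is the usual integral.
weights = {"a": 0.2, "b": 0.3, "c": 0.5}
charge = lambda A: sum(weights[s] for s in A)
f = {"a": 1.0, "b": 4.0, "c": 2.0}

choquet_value = choquet_positive(f, charge)
additive_value = sum(weights[s] * f[s] for s in weights)   # 2.4
```

The step decomposition mirrors the chain {E_t}: between consecutive values of f the upper set, hence G_ν, does not change.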
We close by observing that in defining Choquet integrals we could have equivalently used the "strict" upper sets (f > t).

Proposition 4.8. Let ν be a capacity and f a positive function in B(Σ). Then,

    ∫₀^∞ ν(f ≥ t) dt = ∫₀^∞ ν(f > t) dt.
Proof. As before, set G_ν(t) = ν(f ≥ t) for each t ∈ R. Moreover, set G′_ν(t) = ν(f > t) for each t ∈ R. We have (f ≥ t + 1/n) ⊆ (f > t) ⊆ (f ≥ t) for each t ∈ R, and so G_ν(t + 1/n) ≤ G′_ν(t) ≤ G_ν(t) for each t ∈ R. If G_ν is continuous at t, we have G_ν(t) = lim_n G_ν(t + 1/n) ≤ G′_ν(t) ≤ G_ν(t), so that G′_ν(t) = G_ν(t). On the other hand, as G_ν is a decreasing function, it is continuous except on an at most countable set T ⊆ R. As a result, it holds that G′_ν(t) = G_ν(t) for all t ∉ T, which in turn implies ∫₀^∞ G_ν(t) dt = ∫₀^∞ G′_ν(t) dt by standard results on Riemann integration.

4.3.2. General functions

We now extend the definition of the Choquet integral to general Σ-measurable functions. In the previous subsection we defined the Choquet integral on B⁺(Σ), the cone of all positive elements of B(Σ). Each capacity ν induces a functional ν_c : B⁺(Σ) → R on this cone, given by ν_c(f) = ∫ f dν for each f ∈ B⁺(Σ). If f is a characteristic function 1_A, we get ν_c(1_A) = ∫ 1_A dν = ν(A); thus, the functional ν_c, which we call the Choquet functional, can be viewed as an extension of the capacity ν from Σ to B⁺(Σ).

Our problem of defining a Choquet integral on B(Σ) can be viewed as the problem of how to extend the Choquet functional to the entire space B(Σ). In principle, there are many different ways to extend it. To make the extension problem meaningful we have to set a desideratum for the extension, that is, a property we want it to satisfy. A natural property to require is that the extended functional ν_c : B(Σ) → R be translation invariant, that is, ν_c(f + α1_Ω) = ν_c(f) + αν_c(1_Ω) for each α ∈ R and each f ∈ B(Σ). The next result shows that this desideratum pins down the extension to a particular form.

Proposition 4.9. A Choquet functional ν_c : B⁺(Σ) → R induced by a capacity admits a unique translation invariant extension, given by

    ν_c(f) = ∫₀^∞ ν(f ≥ t) dt + ∫_{−∞}^0 [ν(f ≥ t) − ν(Ω)] dt   (4.12)

for each f ∈ B(Σ), where on the right we have two Riemann integrals.
Proof. Set

    ν_c(f) = ∫₀^∞ ν(f ≥ t) dt + ∫_{−∞}^0 [ν(f ≥ t) − ν(Ω)] dt.

The functional ν_c is well defined, and some simple algebra shows that it is translation invariant and that it reduces to the Choquet integral when f ∈ B⁺(Σ). Assume ν̃ : B(Σ) → R is a translation invariant functional such that ν̃(f) = ν_c(f) whenever f ∈ B⁺(Σ). We want to show that ν̃ satisfies (4.12), so that ν̃ = ν_c. Let f ∈ B(Σ) be such that inf f = γ < 0. By translation invariance, ν̃(f − γ) = ν̃(f) − γν̃(1_Ω). As f − γ belongs to B⁺(Σ), we can then write:

    ν̃(f) = ν̃(f − γ) + γν̃(1_Ω) = ν_c(f − γ) + γν_c(1_Ω)
         = ∫₀^∞ ν((f − γ) ≥ t) dt + γν_c(1_Ω)
         = ∫₀^∞ ν(f ≥ t + γ) dt + γν_c(1_Ω)
         = ∫_γ^∞ ν(f ≥ τ) dτ + γν_c(1_Ω)
         = ∫₀^∞ ν(f ≥ τ) dτ + ∫_γ^0 ν(f ≥ τ) dτ − ∫_γ^0 ν(Ω) dτ,

where the penultimate equality is due to the change of variable τ = t + γ. As ν(f ≥ τ) − ν(Ω) = 0 for all τ ≤ γ, the following holds:

    ν̃(f) = ∫₀^∞ ν(f ≥ τ) dτ + ∫_{−∞}^0 [ν(f ≥ τ) − ν(Ω)] dτ.

Hence, ν̃ = ν_c, as desired.

Before moving on, observe that the Riemann integrals in (4.12) exist whenever ν is a game of bounded variation, that is, whenever ν ∈ bv(Σ). In fact, for each such game there exist two capacities ν₁ and ν₂ with ν = ν₁ − ν₂. Hence, ν(f ≥ t) = ν₁(f ≥ t) − ν₂(f ≥ t) for each t ∈ R, and so ν(f ≥ t) is a function of bounded variation in t. The Riemann integrals in (4.12) then exist by standard results on Riemann integrals.

In view of Proposition 4.9 and the above observation, we next define the Choquet integral for functions in B(Σ) with respect to games in bv(Σ) as the translation invariant extension of the definition given in (4.11) for positive functions.
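The two-part formula (4.12) can likewise be checked on a finite state space, where both Riemann integrals are over step functions and can be evaluated exactly. The sketch below is ours, under assumed helper names (`choquet_general`, `omega`), and confirms translation invariance numerically for a function taking a negative value.

```python
# Sketch of formula (4.12) on a finite state space:
# int_0^inf nu(f >= t) dt + int_{-inf}^0 [nu(f >= t) - nu(Omega)] dt.

def choquet_general(f, nu, omega):
    nu_omega = nu(frozenset(omega))
    cuts = sorted(set(f.values()) | {0.0})    # integrand is a step function
    total = 0.0
    lo = min(cuts)                            # below min f the integrand is 0
    grid = [lo] + [c for c in cuts if c > lo]
    for a, b in zip(grid, grid[1:]):
        G = nu(frozenset(s for s in omega if f[s] >= b))   # value on (a, b]
        total += (b - a) * (G - nu_omega if b <= 0 else G)
    return total

omega = frozenset({1, 2})
# a convex capacity: nu(singleton) = 0.2, nu(Omega) = 1
nu = lambda A: 1.0 if len(A) == 2 else (0.2 if len(A) == 1 else 0.0)
f = {1: -2.0, 2: 3.0}

value = choquet_general(f, nu, omega)                        # -1.0 here
shifted = choquet_general({s: f[s] + 2.0 for s in f}, nu, omega)
# translation invariance: shifted == value + 2 * nu(Omega)
```

The value −1.0 agrees with Example 4.5's closed form x₁(1 − ν(ω₂)) + x₂ν(ω₂) for x₂ ≥ x₁.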
Definition 4.1. Given a game ν ∈ bv(Σ) and a function f ∈ B(Σ), the Choquet integral ∫ f dν is defined by

    ∫ f dν = ∫₀^∞ ν(f ≥ t) dt + ∫_{−∞}^0 [ν(f ≥ t) − ν(Ω)] dt.   (4.13)

The associated Choquet functional ν_c : B(Σ) → R is given by ν_c(f) = ∫ f dν for each f ∈ B(Σ).

Translation invariance and Proposition 4.7 imply that when ν is a bounded charge, the Choquet integral ∫ f dν of an f ∈ B(Σ) reduces to the standard additive integral. Moreover, it is easy to check that Proposition 4.8 holds for general Choquet integrals, that is,

    ∫ f dν = ∫₀^∞ ν(f > t) dt + ∫_{−∞}^0 [ν(f > t) − ν(Ω)] dt.

Finally, the Choquet integral (4.13) is well defined for all finite games since they belong to bv(Σ). As in the finite case B(Σ) = R^Ω, this means that finite games induce Choquet functionals ν_c : R^Ω → R.

Example 4.4. Given a nonempty coalition A, the unanimity game u_A : Σ → R is the two-valued convex game defined by

    u_A(B) = 1 if A ⊆ B, and u_A(B) = 0 otherwise,

for all B ∈ Σ. For each f ∈ B(Σ) it holds that ∫ f du_A = inf_{ω∈A} f(ω). In fact, we have A ⊆ (f ≥ t) if and only if t ≤ inf_{ω∈A} f(ω), and so G_{u_A}(t) = 1_{(−∞, inf_{ω∈A} f(ω)]}(t).

Example 4.5. Let Ω = {ω₁, ω₂} and suppose ν is a capacity on 2^Ω with 0 < ν(ω₁) < 1, 0 < ν(ω₂) < 1, and ν(Ω) = 1. Then, ν_c : R² → R is given by

    ν_c(x₁, x₂) = x₁(1 − ν(ω₂)) + x₂ν(ω₂) if x₂ ≥ x₁,
    ν_c(x₁, x₂) = x₁ν(ω₁) + x₂(1 − ν(ω₁)) if x₂ < x₁.

Given any k ∈ R, the level curve {(x₁, x₂) ∈ R² : ν_c(x₁, x₂) = k} is

    x₂ = k/ν(ω₂) − [(1 − ν(ω₂))/ν(ω₂)] x₁ if x₂ ≥ x₁,
    x₂ = k/(1 − ν(ω₁)) − [ν(ω₁)/(1 − ν(ω₁))] x₁ if x₂ < x₁.

As a result, the level curve is a straight line when ν is a charge, that is, when ν(ω₁) + ν(ω₂) = 1; in contrast, it has a kink at the 45-degree line {(x₁, x₂) ∈ R² : x₁ = x₂} when ν is not a charge. The non-additivity of ν is thus reflected
by kinks in the level curves. In general, level curves of Choquet integrals are not affine spaces, unless the game is a charge.

A function f in B(Σ) is simple if it is finite-valued, that is, if the set {f(ω) : ω ∈ Ω} is finite. Each simple function f admits a unique representation f = ∑_{i=1}^k α_i 1_{A_i}, where {A_i}_{i=1}^k ⊆ Σ is a suitable partition of Ω and α₁ > ··· > α_k. Using this representation, we can rewrite formula (4.13) in a couple of equivalent ways, which are sometimes useful (e.g., in the discussion of the Choquet Expected Utility model of Schmeidler (1989) in Chapter 1).

Proposition 4.10. Given a game ν ∈ bv(Σ) and a simple function f ∈ B(Σ), it holds that

    ∫ f dν = ∑_{i=1}^k (α_i − α_{i+1}) ν(⋃_{j=1}^i A_j)
           = ∑_{i=1}^k α_i [ν(⋃_{j=0}^i A_j) − ν(⋃_{j=0}^{i−1} A_j)],

where we set α_{k+1} = 0 and A₀ = Ø.

Proof. It is enough to prove the first equality, the other being a simple rearrangement of its terms. Let f be positive, so that α_k ≥ 0. If t > α₁, then {ω ∈ Ω : f(ω) ≥ t} = Ø. If t ∈ (α_{i+1}, α_i], then (recall that α_{k+1} = 0):

    {ω ∈ Ω : f(ω) ≥ t} = ⋃_{j=1}^i A_j.

Hence,

    ν(f ≥ t) = ∑_{i=1}^k ν(⋃_{j=1}^i A_j) 1_{(α_{i+1}, α_i]}(t) for each t ∈ R₊,

so that

    ∫ f dν = ∫₀^∞ ν(f ≥ t) dt = ∫₀^∞ ∑_{i=1}^k ν(⋃_{j=1}^i A_j) 1_{(α_{i+1}, α_i]}(t) dt
           = ∑_{i=1}^k ν(⋃_{j=1}^i A_j) ∫₀^∞ 1_{(α_{i+1}, α_i]}(t) dt = ∑_{i=1}^k (α_i − α_{i+1}) ν(⋃_{j=1}^i A_j),

as desired. This proves the first equality for a positive f. The case of a general f is easily obtained using translation invariance.
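The first formula in Proposition 4.10 gives a direct recipe for simple functions: sort the distinct values in decreasing order and telescope the capacity of the cumulative unions. A sketch (our own helper names), tested against Example 4.4's unanimity game, where the integral should be the minimum over the coalition:

```python
# Sketch of Proposition 4.10's formula
# sum_i (alpha_i - alpha_{i+1}) * nu(A_1 u ... u A_i)
# for a positive simple function on a finite state space.

def choquet_simple(f, nu):
    alphas = sorted(set(f.values()), reverse=True)   # alpha_1 > ... > alpha_k
    total, cumulative = 0.0, set()
    for i, a in enumerate(alphas):
        cumulative |= {s for s in f if f[s] == a}    # add the level set A_i
        a_next = alphas[i + 1] if i + 1 < len(alphas) else 0.0  # alpha_{k+1} = 0
        total += (a - a_next) * nu(frozenset(cumulative))
    return total

# Example 4.4 check: for the unanimity game u_A, the integral is min over A.
A = frozenset({"a", "c"})
u_A = lambda B: 1.0 if A <= B else 0.0
f = {"a": 5.0, "b": 1.0, "c": 2.0}
value = choquet_simple(f, u_A)          # expect min over A, i.e. 2.0
```

Only the cumulative unions enter, which is why the non-additivity of ν along a chain is all that matters (compare Lemma 4.5 below).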
When ν ∈ ba(Σ), the above formulae reduce to

    ∫ f dν = ∑_{i=1}^k α_i ν(A_i),

which is the standard integral of f with respect to the charge ν.

Example 4.6. Let P : Σ → [0, 1] be a probability charge with range R(P) = {P(A) : A ∈ Σ}. Given a real-valued function g : R(P) → R, the game ν = g(P) is called a scalar measure game. It holds that

    ∫ f dν = ∫₀^∞ g(P(f ≥ t)) dt + ∫_{−∞}^0 [g(P(f ≥ t)) − g(1)] dt.

The right-hand side becomes

    ∑_{i=1}^k α_i [g(P(⋃_{j=0}^i A_j)) − g(P(⋃_{j=0}^{i−1} A_j))]

when f is a simple function. This is a familiar formula in Rank Dependent Expected Utility (see Chapters 1 and 2).

4.3.3. Basic properties

We begin by collecting a few basic properties of Choquet integrals. Here, ‖·‖ on bv(Σ) is the variation norm given by (4.3), while ≥ and ‖·‖ on B(Σ) are the pointwise order and the supnorm, respectively.⁶

Proposition 4.11. Suppose ν_c : B(Σ) → R is the Choquet functional induced by a game ν ∈ bv(Σ). Then

(i) (Positive homogeneity): ν_c(αf) = αν_c(f) for each α ≥ 0.
(ii) (Translation invariance): ν_c(f + α1_Ω) = ν_c(f) + αν_c(1_Ω) for each α ∈ R.
(iii) (Monotonicity): ν_c(f) ≥ ν_c(g) if f ≥ g, provided ν is a capacity.
(iv) (Lipschitz continuity): for all f, g ∈ B(Σ),

    |ν_c(f) − ν_c(g)| ≤ ‖ν‖ ‖f − g‖.   (4.14)

Proof. Properties (i) and (ii) are easily established. To see that (iii) holds, it is enough to observe that, ν being a capacity, ν(g ≥ t) ≤ ν(f ≥ t) for each t ∈ R, since f ≥ g implies (g ≥ t) ⊆ (f ≥ t) for each t ∈ R.

As to (iv), suppose first that ν is a capacity. Assume ν_c(f) ≥ ν_c(g) (the other case is similar). As f ≤ g + ‖f − g‖, by (ii) and (iii) we have ν_c(f) ≤ ν_c(g) + ‖f − g‖ν(Ω). This implies

    |ν_c(f) − ν_c(g)| ≤ ν(Ω)‖f − g‖,   (4.15)

which is (4.14) when ν is monotonic, for in this case ‖ν‖ = ν(Ω).
Now, let ν ∈ bv(Σ). By Aumann and Shapley (1974: 28), ν can be written as ν = ν⁺ − ν⁻, where ν⁺ and ν⁻ are capacities such that ‖ν‖ = ν⁺(Ω) + ν⁻(Ω). By (4.15), we then have |ν_c(f) − ν_c(g)| ≤ [ν⁺(Ω) + ν⁻(Ω)]‖f − g‖, as desired.

If a game ν belongs to bv(Σ), its dual ν̄ belongs to bv(Σ) as well. The Choquet functional ν̄_c is therefore well defined, and next we show that it can be viewed as the dual functional of ν_c.

Proposition 4.12. Let ν ∈ bv(Σ). Then, ν̄_c(f) = −ν_c(−f) for each f ∈ B(Σ). If, in addition, ν is balanced, then ν_c(f) ≤ µ(f) ≤ ν̄_c(f) for each f ∈ B(Σ) and each µ ∈ core(ν).

Proof. Given f ∈ B(Σ), we have

    ν̄_c(f) = ∫₀^∞ ν̄(f ≥ t) dt + ∫_{−∞}^0 [ν̄(f ≥ t) − ν̄(Ω)] dt
           = ∫₀^∞ [ν(Ω) − ν(f < t)] dt + ∫_{−∞}^0 [−ν(f < t)] dt
           = ∫₀^∞ [ν(Ω) − ν(f ≤ t)] dt − ∫_{−∞}^0 ν(f ≤ t) dt
           = ∫₀^∞ [ν(Ω) − ν(−f ≥ −t)] dt − ∫_{−∞}^0 ν(−f ≥ −t) dt
           = ∫_{−∞}^0 [ν(Ω) − ν(−f ≥ t)] dt − ∫₀^∞ ν(−f ≥ t) dt
           = −∫₀^∞ ν(−f ≥ t) dt − ∫_{−∞}^0 [ν(−f ≥ t) − ν(Ω)] dt
           = −ν_c(−f),

where the third equality holds because ν(f < t) and ν(f ≤ t) differ for at most countably many t.

Suppose ν is balanced. Then ν(A) ≤ µ(A) ≤ ν̄(A) for each A ∈ Σ and each µ ∈ core(ν). In turn this implies that, given any f ∈ B(Σ), ν(f ≥ t) ≤ µ(f ≥ t) ≤ ν̄(f ≥ t) for each t ∈ R. By the monotonicity of the Riemann integral,

    ∫₀^∞ ν̄(f ≥ t) dt + ∫_{−∞}^0 [ν̄(f ≥ t) − ν̄(Ω)] dt
    ≥ ∫₀^∞ µ(f ≥ t) dt + ∫_{−∞}^0 [µ(f ≥ t) − ν̄(Ω)] dt
    ≥ ∫₀^∞ ν(f ≥ t) dt + ∫_{−∞}^0 [ν(f ≥ t) − ν(Ω)] dt,

and so ν_c(f) ≤ µ(f) ≤ ν̄_c(f), as desired.

In general, Choquet functionals ν_c : B(Σ) → R are not additive, that is, it is in general false that ν_c(f + g) = ν_c(f) + ν_c(g). However, the next result, due to Dellacherie (1971), shows that additivity holds in a restricted sense. Say that two functions f, g ∈ B(Σ) are comonotonic (short for "commonly monotonic") if

    (f(ω) − f(ω′))(g(ω) − g(ω′)) ≥ 0

for any pair ω, ω′ ∈ Ω. That is, two functions are comonotonic provided they have a similar pattern.

Theorem 4.3. Suppose ν_c : B(Σ) → R is the Choquet functional induced by a game ν ∈ bv(Σ). Then, ν_c(f + g) = ν_c(f) + ν_c(g) provided f and g are comonotonic and f + g ∈ B(Σ).

To prove this result we need a couple of useful lemmas. The first one says that two functions f and g are comonotonic if and only if all their upper sets are nested. This is trivially true for the two collections (f ≥ t) and (g ≥ t) separately; the interesting part here is that f and g are comonotonic if and only if this is still the case for the combined collection {(f ≥ t)}_{t∈R} ∪ {(g ≥ t)}_{t∈R}. For a proof of this lemma we refer to Denneberg (1994: Prop. 4.5).

Lemma 4.4. Two functions f, g ∈ B(Σ) are comonotonic if and only if the overall collection of all upper sets (f ≥ t) and (g ≥ t) is a chain.

The next lemma says that we can replicate games over chains with suitable charges. The non-additivity of a game is, therefore, immaterial as long as we restrict ourselves to chains.

Lemma 4.5. Let ν ∈ bv(Σ). Given any chain C in Σ there is µ ∈ ba(Σ) such that

    µ(A) = ν(A) for all A ∈ C.   (4.16)

If, in addition, ν is a capacity, then we can take µ ∈ ba⁺(Σ).

Proof. It is enough to prove the result for a capacity ν, as the extension to any game in bv(Σ) is routine in view of their decomposition as differences of capacities given in Proposition 4.1.
Consider first a finite chain Ø = A₀ ⊆ A₁ ⊆ ··· ⊆ A_n ⊆ A_{n+1} = Ω. Let Σ₀ be the finite subalgebra of Σ generated by such a chain. Let µ₀ ∈ ba⁺(Σ₀) be defined by µ₀(A_{i+1} \ A_i) = ν(A_{i+1}) − ν(A_i) for i = 0, . . . , n. By standard extension theorems for positive charges (see Rao and Rao, 1983: Corollary 3.3.4), there exists µ ∈ ba⁺(Σ) which extends µ₀ to Σ, that is, µ(A) = µ₀(A) for each A ∈ Σ₀. Hence, µ is the desired charge.

Now, let C be any chain. Let {C_α}_α be the collection of all its finite subchains, and set Γ_α = {µ ∈ ba⁺(Σ) : µ(A) = ν(A) for each A ∈ C_α}. By what we just proved, each Γ_α is nonempty. Moreover, the collection {Γ_α}_α has the finite intersection property. For, let {C_i}_{i=1}^n ⊆ {C_α}_α be a finite collection. Since ⋃_{i=1}^n C_i is in turn a finite chain, by proceeding as before it is easy to establish the existence of a µ ∈ ba⁺(Σ) such that µ(A) = ν(A) for each A ∈ ⋃_{i=1}^n C_i. As µ ∈ ⋂_{i=1}^n Γ_i, the intersection ⋂_{i=1}^n Γ_i is nonempty, as desired.

Each Γ_α is a weak*-closed subset of the weak*-compact set {µ ∈ ba⁺(Σ) : µ(Ω) = ν(Ω)}. Since {Γ_α}_α has the finite intersection property, we conclude that ⋂_α Γ_α ≠ Ø. Any charge µ ∈ ⋂_α Γ_α satisfies (4.16).
Proof of Theorem 4.3. Suppose f and g are comonotonic functions in B(Σ). Then, the sum f + g is comonotonic with both f and g, so that the collection {f, g, f + g} consists of pairwise comonotonic functions. Let

    C = {(f ≥ t)}_{t∈R} ∪ {(g ≥ t)}_{t∈R} ∪ {(f + g ≥ t)}_{t∈R}.

By Lemma 4.4, C is a chain. By Lemma 4.5, there is µ ∈ ba(Σ) such that µ(A) = ν(A) for all A ∈ C. Hence,

    ∫ f dν + ∫ g dν = ∫ f dµ + ∫ g dµ = ∫ (f + g) dµ = ∫ (f + g) dν,

as desired.

As constant functions are comonotonic with all other functions, comonotonic additivity is a much stronger property than translation invariance. The next result of Bassanezi and Greco (1984: Theorem 2.1) shows that comonotonic additivity is actually the "best" possible type of additivity for Choquet functionals.

Proposition 4.13. Suppose Σ contains all singletons. Then, two functions f, g ∈ B(Σ), with f + g ∈ B(Σ), are comonotonic if and only if ν_c(f + g) = ν_c(f) + ν_c(g) holds for all Choquet functionals induced by convex capacities ν : Σ → R.
Proof. The "only if" part holds by Theorem 4.3. As to the "if" part, assume that

    ν_c(f + g) = ν_c(f) + ν_c(g)   (4.17)

holds for all Choquet functionals induced by convex capacities. Suppose, per contra, that f and g are not comonotonic. Then, there exist ω′, ω″ ∈ Ω such that [f(ω′) − f(ω″)][g(ω′) − g(ω″)] < 0. Say that f(ω′) < f(ω″) and g(ω′) > g(ω″), and consider the convex game

    u_{ω′,ω″}(A) = 1 if {ω′, ω″} ⊆ A, and 0 otherwise.

By Example 4.4, u_{ω′,ω″,c}(f) = f(ω′) and u_{ω′,ω″,c}(g) = g(ω″). Hence,

    u_{ω′,ω″,c}(f + g) = min{(f + g)(ω′), (f + g)(ω″)} > f(ω′) + g(ω″) = u_{ω′,ω″,c}(f) + u_{ω′,ω″,c}(g),

which contradicts (4.17).

Notice that the argument used to prove the last result can be adapted to give the following characterization of comonotonicity: when Σ contains all singletons, two functions f, g ∈ B(Σ) are comonotonic if and only if

    inf_{ω∈A} (f(ω) + g(ω)) = inf_{ω∈A} f(ω) + inf_{ω∈A} g(ω)

for all A ∈ Σ.

Lemmas 4.4 and 4.5 are especially useful in finding counterparts, for games and for their Choquet integrals, of standard results that hold in the additive case. Theorem 4.3 is a first important example, since through these lemmas we could derive the counterpart for Choquet integrals of the additivity of standard integrals. We close this subsection with another simple illustration of this feature of Lemmas 4.4 and 4.5 by proving a version for Choquet integrals of the classic Jensen inequality.

Proposition 4.14. Let ν be a capacity with ν(Ω) = 1. Given a monotone convex function φ : R → R, for each f ∈ B(Σ) the following holds:

    ∫ φ(f) dν ≥ φ(∫ f dν).

Proof. Given any f ∈ B(Σ), the functions φ ∘ f and f are comonotonic. By Lemmas 4.4 and 4.5, there is µ ∈ ba⁺(Σ) such that µ(f ≥ t) = ν(f ≥ t) and µ(φ(f) ≥ t) = ν(φ(f) ≥ t) for each t ∈ R. In turn, this implies µ(Ω) = ν(Ω) = 1, ∫ φ(f) dν = ∫ φ(f) dµ, and ∫ f dν = ∫ f dµ. By the standard Jensen inequality:

    ∫ φ(f) dν = ∫ φ(f) dµ ≥ φ(∫ f dµ) = φ(∫ f dν),

as desired.
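The scope and limits of comonotonic additivity (Theorem 4.3) are easy to see numerically: on a comonotonic pair the Choquet integral is additive, while on a non-comonotonic pair a convex capacity makes it superadditive. A small sketch under assumed helper names (`choquet`, the two-state `nu`):

```python
# Sketch: comonotonic additivity of the Choquet integral on a two-state
# space with a convex capacity. All names here are illustrative.

def choquet(f, nu):
    """Choquet integral of a positive f on a finite state space."""
    levels = sorted(set(f.values()))
    total, prev = 0.0, 0.0
    for t in levels:
        total += (t - prev) * nu(frozenset(s for s in f if f[s] >= t))
        prev = t
    return total

# Convex capacity on Omega = {1, 2}: nu(singleton) = 0.2 < 1/2.
nu = lambda A: 1.0 if len(A) == 2 else (0.2 if len(A) == 1 else 0.0)

f = {1: 1.0, 2: 3.0}
g = {1: 2.0, 2: 5.0}   # comonotonic with f: both rank state 2 higher
h = {1: 3.0, 2: 1.0}   # not comonotonic with f

fg = {s: f[s] + g[s] for s in f}
fh = {s: f[s] + h[s] for s in f}

comonotonic_gap = choquet(fg, nu) - (choquet(f, nu) + choquet(g, nu))      # 0
non_comonotonic_gap = choquet(fh, nu) - (choquet(f, nu) + choquet(h, nu))  # > 0
```

The strictly positive gap on the non-comonotonic pair is the hedging effect of a convex capacity: f + h is constant, so its Choquet integral uses ν(Ω) throughout, while each summand separately is penalized on singletons.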
4.4. Representation

Summing up, Choquet functionals are positively homogeneous, comonotonic additive, and Lipschitz continuous; they are also monotone provided the underlying game is. A natural question is whether these properties actually characterize Choquet functionals among all the functionals defined on B(Σ). Schmeidler (1986) showed that this is the case, and we now present his result.

Theorem 4.4. Let ν̃ : B(Σ) → R be a functional. Define the game ν(A) = ν̃(1_A) on Σ. The following conditions are equivalent:

(i) ν̃ is monotone and comonotonic additive;
(ii) ν is a capacity and, for all f ∈ B(Σ), it holds:

    ν̃(f) = ∫₀^∞ ν(f ≥ t) dt + ∫_{−∞}^0 [ν(f ≥ t) − ν(Ω)] dt.   (4.18)

Remarks. (i) Positive homogeneity is a redundant condition here, as it is implied by comonotonic additivity and monotonicity, as shown in the proof. (ii) Zhou (1998) proved a version of this result on Stone lattices.

Proof. (ii) trivially implies (i). Conversely, assume (i). We divide the proof into three steps.

Step 1. For any f ∈ B(Σ) and any integer n, by comonotonic additivity we have ν̃(f) = ν̃(n(f/n)) = nν̃(f/n), namely, ν̃(f/n) = (1/n)ν̃(f). Hence, given any positive rational number α = m/n,

    ν̃((m/n)f) = ν̃(f/n + ··· + f/n) = ν̃(f/n) + ··· + ν̃(f/n) = (m/n)ν̃(f).

As a result, we have ν̃(λf) = λν̃(f) for any λ ∈ Q⁺. In particular, this implies 0 = ν̃(λ1_Ω − λ1_Ω) = λν(Ω) + ν̃(−λ1_Ω) for each λ ∈ Q⁺, and so

    ν̃(f + λ1_Ω) = ν̃(f) + ν̃(λ1_Ω) = ν̃(f) + λν(Ω)

for each f ∈ B(Σ) and each λ ∈ Q.

Step 2. We now prove that ν̃ is supnorm continuous. Let f, g ∈ B(Σ) and let {r_n}_n be a sequence of rationals such that r_n ↓ ‖f − g‖. As

    f ≤ g + ‖f − g‖ ≤ g + r_n,
it follows that ν̃(f) ≤ ν̃(g) + r_nν(Ω). Consequently, |ν̃(f) − ν̃(g)| ≤ r_nν(Ω). As n → ∞, we get |ν̃(f) − ν̃(g)| ≤ ‖f − g‖ν(Ω). Hence, ν̃ is Lipschitz continuous, and so supnorm continuous. In turn, this implies ν̃(λf) = λν̃(f) for all λ ≥ 0 and ν̃(f + λ1_Ω) = ν̃(f) + λν(Ω) for each f ∈ B(Σ) and each λ ∈ R, that is, ν̃ is translation invariant.

Step 3. It remains to show that (4.18) holds, that is, that ν̃(f) = ν_c(f) for all f ∈ B(Σ). Since both ν̃ and ν_c are supnorm continuous and B₀(Σ) is supnorm dense in B(Σ), it is enough to show that ν̃(f) = ν_c(f) for all f ∈ B₀(Σ). Let f ∈ B₀(Σ). Since both ν̃ and ν_c are translation invariant, it is enough to show that ν̃(f) = ν_c(f) for f ≥ 0. As f ∈ B₀(Σ), we can write f = ∑_{i=1}^k α_i 1_{A_i}, where {A_i}_{i=1}^k ⊆ Σ is a suitable partition of Ω and α₁ > ··· > α_k. Setting D_i = ⋃_{j=1}^i A_j and α_{k+1} = 0, we can then write f = ∑_{i=1}^{k−1} (α_i − α_{i+1})1_{D_i} + α_k1_Ω. As the functions {(α_i − α_{i+1})1_{D_i}}_{i=1}^{k−1} and α_k1_Ω are pairwise comonotonic, by the comonotonic additivity and positive homogeneity of ν̃ we have

    ν̃(f) = ∑_{i=1}^{k−1} (α_i − α_{i+1}) ν(⋃_{j=1}^i A_j) + α_kν(Ω).

Since ∑_{i=1}^k (α_i − α_{i+1}) ν(⋃_{j=1}^i A_j) = ∫₀^∞ ν(f ≥ t) dt, and the i = k term of this sum is precisely α_kν(Ω), we conclude that ν̃(f) = ∫₀^∞ ν(f ≥ t) dt, that is, ν̃(f) = ν_c(f), as desired.

Next we extend Schmeidler's Theorem to the non-monotonic case. Given a functional ν̃ : B(Σ) → R and any two f, g ∈ B(Σ) with f ≤ g, set

    V(f; g) = sup ∑_{i=0}^{n−1} |ν̃(f_{i+1}) − ν̃(f_i)|,

where the supremum is taken over all finite chains f = f₀ ≤ f₁ ≤ ··· ≤ f_n = g. We say that ν̃ is of bounded variation if V(0; f) < +∞ for all f ∈ B⁺(Σ).

Theorem 4.5. Let ν̃ : B(Σ) → R be a functional. Define the game ν(A) = ν̃(1_A) on Σ. The following conditions are equivalent:

(i) ν̃ is comonotonic additive and of bounded variation;
(ii) ν̃ is comonotonic additive and supnorm continuous on B⁺(Σ), and ν ∈ bv(Σ);
(iii) ν ∈ bv(Σ) and, for all f ∈ B(Σ),

    ν̃(f) = ∫₀^∞ ν(f ≥ t) dt + ∫_{−∞}^0 [ν(f ≥ t) − ν(Ω)] dt.

Remark. When Ω is finite, the requirement ν ∈ bv(Σ) becomes superfluous in conditions (ii) and (iii), as all finite games are of bounded variation.
Before proving the result, we give a useful lemma. Observe that the decomposition f = (f − t)⁺ + (f ∧ t) reduces to the standard f = f⁺ − f⁻ when t = 0.

Lemma 4.6. Let ν̃ : B(Σ) → R be a comonotonic additive functional. Then, ν̃(f) = ν̃((f − t)⁺) + ν̃(f ∧ t) for each t ∈ R and f ∈ B(Σ).

Proof. Given any t ∈ R, the functions (f − t)⁺ and f ∧ t are comonotonic. In fact, since f ∧ t = t − (f − t)⁻ and x⁺x⁻ = 0, for any ω, ω′ ∈ Ω we have

    [(f − t)⁺(ω) − (f − t)⁺(ω′)][(f ∧ t)(ω) − (f ∧ t)(ω′)]
    = (f − t)⁺(ω)(f ∧ t)(ω) − (f − t)⁺(ω)(f ∧ t)(ω′) − (f − t)⁺(ω′)(f ∧ t)(ω) + (f − t)⁺(ω′)(f ∧ t)(ω′)
    = (f − t)⁺(ω)(f − t)⁻(ω′) + (f − t)⁺(ω′)(f − t)⁻(ω) ≥ 0,

as desired.

Proof of Theorem 4.5. (i) implies (ii). Clearly, ν ∈ bv(Σ). We want to show that (i) implies that ν̃ is supnorm continuous over B⁺(Σ). As Step 1 of the proof of Theorem 4.4 still holds here, we have ν̃(f + λ1_Ω) = ν̃(f) + λν(Ω) for each f ∈ B(Σ) and each λ ∈ Q; that is, ν̃ is translation invariant w.r.t. Q.

Let f, g ∈ B(Σ) with f ≤ g. If f ≥ 0, then V(f; g) ≤ V(0; g) < +∞. Suppose f is not necessarily positive. There exists λ ∈ Q⁺ such that f + λ ≥ 0 and g + λ ≥ 0. By the translation invariance w.r.t. Q of ν̃, we have V(f; g) = V(f + λ; g + λ) for all λ ∈ Q. Hence, V(f; g) = V(f + λ; g + λ) < +∞. It is easy to see that V(0; λf) = λV(0; f) for all λ ∈ Q⁺. The next claim gives a deeper property of V(f; g).

Claim. For all f ≥ 0 and all λ ∈ Q⁺, it holds that V(−λ; f) = V(−λ; 0) + V(0; f).

Proof of the Claim. If f ≤ h ≤ g, we have V(f; g) ≥ V(f; h) + V(h; g). Hence, it suffices to show that V(−λ; f) ≤ V(−λ; 0) + V(0; f). By definition, for any ε > 0 there exists a chain {ϕ_i}_{i=0}^n such that

    ∑_{i=0}^{n−1} |ν̃(ϕ_{i+1}) − ν̃(ϕ_i)| ≥ V(−λ; f) − ε,

with ϕ₀ = −λ and ϕ_n = f. For each ϕ_i consider the two functions ϕ_i⁻ = −(ϕ_i ∧ 0) and ϕ_i⁺ = ϕ_i ∨ 0 and the two chains {−ϕ_i⁻} and {ϕ_i⁺}. The former chain is relative
72
Massimo Marinacci and Luigi Montrucchio
to V(−λ; 0), while the latter is relative to V(0; f). Therefore, we have

V(−λ; 0) + V(0; f) ≥ ∑_{i=0}^{n−1} |ν̂(−ϕ_{i+1}⁻) − ν̂(−ϕ_i⁻)| + ∑_{i=0}^{n−1} |ν̂(ϕ_{i+1}⁺) − ν̂(ϕ_i⁺)|
= ∑_{i=0}^{n−1} (|ν̂(−ϕ_{i+1}⁻) − ν̂(−ϕ_i⁻)| + |ν̂(ϕ_{i+1}⁺) − ν̂(ϕ_i⁺)|).   (4.19)

On the other hand, by Lemma 4.6 (with t = 0), for each i we have ν̂(ϕ_i) = ν̂(ϕ_i⁺) + ν̂(−ϕ_i⁻), and so

|ν̂(ϕ_{i+1}) − ν̂(ϕ_i)| = |ν̂(ϕ_{i+1}⁺) + ν̂(−ϕ_{i+1}⁻) − ν̂(ϕ_i⁺) − ν̂(−ϕ_i⁻)|
≤ |ν̂(ϕ_{i+1}⁺) − ν̂(ϕ_i⁺)| + |ν̂(−ϕ_{i+1}⁻) − ν̂(−ϕ_i⁻)|.

In view of (4.19), we can write

V(−λ; 0) + V(0; f) ≥ ∑_{i=0}^{n−1} |ν̂(ϕ_{i+1}) − ν̂(ϕ_i)| ≥ V(−λ; f) − ε,

which proves our claim.

Define the monotone functional ν̂_1(f) = V(0; f) on B⁺(Σ). For each λ ∈ Q⁺ we have

ν̂_1(f + λ) = V(0; f + λ) = V(−λ; f) = V(−λ; 0) + V(0; f) = V(0; λ) + V(0; f) = λV(0; 1) + V(0; f) = λν̂_1(1_Ω) + ν̂_1(f).

Hence, ν̂_1 is translation invariant w.r.t. Q⁺. Since ν̂_1 is monotone, by Step 2 of the proof of Theorem 4.4 it is Lipschitz continuous, and so supnorm continuous. Consider the functional ν̂_2 = ν̂_1 − ν̂ on B⁺(Σ). The functional ν̂_2 is monotone; moreover, it is translation invariant w.r.t. Q as both ν̂_1 and ν̂ are. Consequently, by Step 2 of the proof of Theorem 4.4, ν̂_2 is supnorm continuous. As ν̂ = ν̂_1 − ν̂_2, we conclude that ν̂ too is supnorm continuous, thus completing the proof that (i) implies (ii).

(ii) implies (iii). Step 1 of the proof of Theorem 4.4 holds here as well. Hence, ν̂(λf) = λν̂(f) for all λ ∈ Q⁺, and ν̂(f + λ1_Ω) = ν̂(f) + λν(Ω) for each f ∈ B(Σ) and each λ ∈ Q. By supnorm continuity, ν̂(λf) = λν̂(f) for all λ ≥ 0, and ν̂(f + λ1_Ω) = ν̂(f) + λν(Ω) for each λ ∈ R. The functional ν̂ is, therefore, positively homogeneous and translation invariant. Let ν_c be the Choquet functional associated with ν. As ν ∈ bv(Σ), ν_c is well defined and supnorm continuous. We want to show that ν̂ = ν_c. Since both ν̂ and ν_c are supnorm continuous and B_0(Σ) is supnorm dense in B(Σ), it is enough
to show that ν̂(f) = ν_c(f) for each f ∈ B_0(Σ). This can be established by proceeding as in Step 3 of the proof of Theorem 4.4.

(iii) implies (i). It remains to show that the Choquet functional ν_c is of bounded variation as long as ν ∈ bv(Σ). By Proposition 4.1, there exist capacities ν¹ and ν² such that ν = ν¹ − ν². Hence, ν_c = ν¹_c − ν²_c, and so the functional ν_c is the difference of two monotone functionals. This implies

V(f; g) ≤ ν¹_c(g) − ν¹_c(f) + ν²_c(g) − ν²_c(f),

and we conclude that ν_c is of bounded variation.
4.5. Convex games

Convex games are an interesting class of games and played an important role in Schmeidler's approach to ambiguity, as explained in Chapter 1. Here we show some of their remarkable mathematical properties.

We begin by proving formally that convexity can be formulated as in Equation (4.1), a version useful in game theory for interpreting supermodularity in terms of marginal values (see Moulin, 1995).

Proposition 4.15. For any game ν, the following properties are equivalent: (i) ν is convex; (ii) for all sets A, B, and C such that A ⊆ B and B ∩ C = Ø, ν(A ∪ C) − ν(A) ≤ ν(B ∪ C) − ν(B); (iii) for all disjoint sets A, B, and C: ν(B ∪ A) − ν(B) ≤ ν(B ∪ C ∪ A) − ν(B ∪ C).

Proof. (iii) easily implies (ii): given A ⊆ B and C with B ∩ C = Ø, it is enough to apply (iii) to the disjoint sets C, A, and B\A. Next, assume (ii) holds. Since (A ∪ B)\A = B\(A ∩ B), to check the supermodularity of ν it is enough to set C = (A ∪ B)\A in (ii), with A ∩ B in place of A and A in place of B. Finally, assume (i) holds. If the sets A, B, and C are disjoint, then (B ∪ C) ∩ (B ∪ A) = B, and so supermodularity implies (iii), as desired.

The next result, due to Choquet (1953: 289), shows that the convexity of the game and the superlinearity of the associated Choquet functional are two sides of the same coin.⁷ Recall that, by Proposition 4.6, B(Σ) is a lattice and it becomes a vector lattice when Σ is a σ-algebra.

Theorem 4.6. For any game ν in bv(Σ), the following conditions are equivalent: (i) ν is convex; (ii) ν_c is superadditive on B(Σ), that is, ν_c(f + g) ≥ ν_c(f) + ν_c(g) for all f, g ∈ B(Σ) such that f + g ∈ B(Σ); (iii) ν_c is supermodular on B(Σ), that is, ν_c(f ∨ g) + ν_c(f ∧ g) ≥ ν_c(f) + ν_c(g) for all f, g ∈ B(Σ).
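The equivalent formulations of convexity in Proposition 4.15 can be checked by brute force on a small finite game. The following Python sketch is our own illustration (the two games shown are hypothetical examples, not taken from the text): it verifies that condition (i), supermodularity, and condition (ii), increasing marginal values, agree on both a convex and a non-convex game.

```python
from itertools import combinations

def subsets(omega):
    """All subsets of omega as frozensets."""
    om = list(omega)
    for k in range(len(om) + 1):
        for A in combinations(om, k):
            yield frozenset(A)

def is_supermodular(nu, omega):
    # condition (i): nu(A ∪ B) + nu(A ∩ B) >= nu(A) + nu(B) for all A, B
    S = list(subsets(omega))
    return all(nu[A | B] + nu[A & B] >= nu[A] + nu[B] - 1e-12
               for A in S for B in S)

def has_increasing_marginals(nu, omega):
    # condition (ii): nu(A ∪ C) - nu(A) <= nu(B ∪ C) - nu(B)
    # whenever A ⊆ B and B ∩ C = Ø
    S = list(subsets(omega))
    return all(nu[A | C] - nu[A] <= nu[B | C] - nu[B] + 1e-12
               for A in S for B in S if A <= B
               for C in S if not (B & C))

omega = (1, 2, 3)
convex = {A: len(A) ** 2 for A in subsets(omega)}        # |A|^2: convex
nonconvex = {A: (1 if A else 0) for A in subsets(omega)} # not supermodular

print(is_supermodular(convex, omega), has_increasing_marginals(convex, omega))
```

On the second game, two disjoint nonempty coalitions already violate supermodularity, and the marginal-value test fails as well, as the proposition predicts.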
Proof. We prove that both (ii) and (iii) are equivalent to (i).

(i) implies (ii). Given f ∈ B⁺(Σ) and E ∈ Σ, we have (f + 1_E ≥ t) = (f ≥ t) ∪ (E ∩ (f ≥ t − 1)), and so f + 1_E ∈ B⁺(Σ). In turn, this implies f + g ∈ B⁺(Σ) whenever g ∈ B⁺(Σ) is simple. Moreover, as ν is convex, we get

ν(f + 1_E ≥ t) ≥ ν(f ≥ t) + ν(E ∩ (f ≥ t − 1)) − ν(E ∩ (f ≥ t)).

Consequently,

ν_c(f + 1_E) = ∫₀^∞ ν(f + 1_E ≥ t) dt
≥ ∫₀^∞ ν(f ≥ t) dt + ∫₀^∞ ν(E ∩ (f ≥ t − 1)) dt − ∫₀^∞ ν(E ∩ (f ≥ t)) dt
= ν_c(f) + ∫_{−1}^0 ν(E ∩ (f ≥ t)) dt = ν_c(f) + ν(E).

As ν_c is positively homogeneous, for each λ > 0 we have

ν_c(f + λ1_E) = λν_c(f/λ + 1_E) ≥ λ[ν_c(f/λ) + ν(E)] = ν_c(f) + λν(E).

Let g ∈ B⁺(Σ) be a simple function. We can write g = ∑_{i=1}^n λ_i 1_{D_i}, where D_1 ⊆ ··· ⊆ D_n and λ_i ≥ 0 for each i = 1, ..., n. As g is simple, we have f + g ∈ B⁺(Σ). Hence,

ν_c(f + g) = ν_c(f + ∑_{i=1}^n λ_i 1_{D_i}) ≥ ν_c(f + ∑_{i=2}^n λ_i 1_{D_i}) + λ_1 ν(D_1) ≥ ··· ≥ ν_c(f) + ∑_{i=1}^n λ_i ν(D_i) = ν_c(f) + ν_c(g),

as desired. To show that the inequality ν_c(f + g) ≥ ν_c(f) + ν_c(g) holds for all f, g ∈ B(Σ), it is now enough to use the translation invariance and supnorm continuity of ν_c.

(ii) implies (i). Given any sets A and B, it holds 1_{A∪B} + 1_{A∩B} = 1_A + 1_B. Since the characteristic functions 1_{A∪B} and 1_{A∩B} are comonotonic, we then have

ν(A ∪ B) + ν(A ∩ B) = ν_c(1_{A∪B}) + ν_c(1_{A∩B}) = ν_c(1_{A∪B} + 1_{A∩B}) = ν_c(1_A + 1_B) ≥ ν_c(1_A) + ν_c(1_B) = ν(A) + ν(B),

and so the game ν is convex, as desired.
(i) implies (iii). As ν_c is translation invariant, it is enough to prove the implication for f and g positive. It is easy to check that, for each t ∈ R, it holds

(f ∨ g ≥ t) = (f ≥ t) ∪ (g ≥ t),  (f ∧ g ≥ t) = (f ≥ t) ∩ (g ≥ t).

Therefore, if ν is convex, then ν(f ∨ g ≥ t) + ν(f ∧ g ≥ t) ≥ ν(f ≥ t) + ν(g ≥ t). Hence,

ν_c(f ∨ g) + ν_c(f ∧ g) = ∫₀^∞ ν(f ∨ g ≥ t) dt + ∫₀^∞ ν(f ∧ g ≥ t) dt
= ∫₀^∞ [ν(f ∨ g ≥ t) + ν(f ∧ g ≥ t)] dt
≥ ∫₀^∞ [ν(f ≥ t) + ν(g ≥ t)] dt = ν_c(f) + ν_c(g),

as desired.

(iii) implies (i). We have 1_A ∨ 1_B = 1_{A∪B} and 1_A ∧ 1_B = 1_{A∩B}. Hence, if we put f = 1_A and g = 1_B in the inequality ν_c(f ∨ g) + ν_c(f ∧ g) ≥ ν_c(f) + ν_c(g), we get ν(A ∪ B) + ν(A ∩ B) ≥ ν(A) + ν(B), as desired.

By Theorem 4.6, a game is convex if and only if the associated Choquet functional ν_c is superlinear, that is, superadditive and positively homogeneous. This is a useful property that, for example, makes it possible to use the classic Hahn–Banach Theorem in studying convex games. In order to do so, however, we first have to deal with a technical problem: unless Σ is a σ-algebra, the space B(Σ) is not in general a vector space, something needed to apply the Hahn–Banach Theorem and other standard functional analytic results.

There are at least two ways to bypass the problem. The first one is to consider the vector space B_0(Σ) of Σ-measurable simple functions in place of the whole set B(Σ). This can be enough as long as one is interested in using results that, like the Hahn–Banach Theorem, hold on any vector space. There are important results, however, that only hold on Banach spaces (e.g. the Uniform Boundedness Principle). In this case B_0(Σ), which is not a Banach space, is useless. A solution is to consider B̄(Σ), the supnorm closure of B_0(Σ),⁸ which is a Banach lattice under the supnorm (Dunford and Schwartz, 1958: 258). B(Σ) is a dense subset of B̄(Σ); it holds B(Σ) = B̄(Σ) when Σ is a σ-algebra, and so in this case B(Σ) itself is a Banach lattice. If Σ is not a σ-algebra, to work with the Banach lattice B̄(Σ) we have to extend on it the Choquet functional ν_c, which is originally defined on B(Σ).
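Theorem 4.6's equivalence between convexity of the game and superadditivity of its Choquet functional is easy to probe numerically on a two-point state space. In the following sketch (our own illustration; the two games are hypothetical), superadditivity of the Choquet functional holds on a grid of test functions exactly for the convex game.

```python
from itertools import product

# Two hypothetical games on Omega = {a, b}; the first is convex
# (supermodular), the second is not.
E, A, B, O = frozenset(), frozenset("a"), frozenset("b"), frozenset("ab")
nu_convex = {E: 0.0, A: 0.2, B: 0.2, O: 1.0}      # 0.2 + 0.2 <= 1
nu_nonconvex = {E: 0.0, A: 0.8, B: 0.8, O: 1.0}   # 0.8 + 0.8 > 1

def choquet(f, nu):
    # comonotonic layer formula for the Choquet integral
    order = sorted(f, key=f.get, reverse=True)
    x = [f[w] for w in order] + [0.0]
    total, acc = 0.0, frozenset()
    for i, w in enumerate(order):
        acc = acc | {w}
        total += (x[i] - x[i + 1]) * nu[acc]
    return total

states = ("a", "b")
grid = [dict(zip(states, v)) for v in product(range(-2, 3), repeat=2)]

def superadditive_on_grid(nu):
    # nu_c(f + g) >= nu_c(f) + nu_c(g) for all test pairs on the grid
    return all(choquet({w: f[w] + g[w] for w in states}, nu)
               >= choquet(f, nu) + choquet(g, nu) - 1e-9
               for f in grid for g in grid)

print(superadditive_on_grid(nu_convex), superadditive_on_grid(nu_nonconvex))
```

For the non-convex game, the pair f = 1_{a}, g = 1_{b} already gives ν_c(f) + ν_c(g) = 1.6 > 1 = ν_c(f + g), mirroring the failure of condition (ii).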
Lemma 4.7. Any Choquet functional ν_c : B(Σ) → R induced by a game ν ∈ bv(Σ) admits a unique supnorm continuous extension to B̄(Σ). Such extension is positively homogeneous and comonotonic additive.

Proof. By Proposition 4.11(iv), ν_c is Lipschitz continuous on B(Σ). By standard results (Aliprantis and Border, 1999: 77), it then admits a unique supnorm continuous extension to the closure B̄(Σ). Using its supnorm continuity, such extension is easily seen to be positively homogeneous. As to comonotonic additivity, we first prove the following claim.

Claim. Given any two comonotonic and supnorm bounded functions f and g, there exist two sequences of simple functions {f_n}_n and {g_n}_n uniformly converging to f and g, respectively, and such that f_n and g_n are comonotonic for each n.

Proof of the Claim. It is enough to prove the claim for positive functions. Let f : Ω → R be positive and supnorm bounded, so that there exists a constant M > 0 such that 0 ≤ f(ω) ≤ M for each ω ∈ Ω. Let M = α_n > α_{n−1} > ··· > α_1 > α_0 = 0, with α_i = (i/n)M for each i = 0, 1, ..., n. Set A_i = (f ≥ α_i) for each i = 1, ..., n − 1, and define f_n : Ω → R as f_n = ∑_{i=1}^{n−1} (α_i − α_{i−1}) 1_{A_i}. The collection of upper sets {(f_n ≥ t)}_{t∈R} is included in {(f ≥ t)}_{t∈R} and ‖f − f_n‖ ≤ max_{i∈{0,...,n−1}} (α_{i+1} − α_i) = M/n. In a similar way, for each n we can construct a simple function g_n such that the collection of upper sets {(g_n ≥ t)}_{t∈R} is included in {(g ≥ t)}_{t∈R} and ‖g − g_n‖ ≤ M/n.

By Lemma 4.4, the collections {(g ≥ t)}_{t∈R} and {(f ≥ t)}_{t∈R} together form a chain. Hence, by what we just proved, for each n the collections {(g ≥ t)}_{t∈R}, {(g_n ≥ t)}_{t∈R}, {(f ≥ t)}_{t∈R}, and {(f_n ≥ t)}_{t∈R} together form a chain as well. Again by Lemma 4.4, f_n and g_n are then comonotonic functions, and so the sequences {f_n}_n and {g_n}_n we have constructed have the desired properties. This completes the proof of the Claim.

Let f, g ∈ B̄(Σ). Consider the sequences {f_n}_n and {g_n}_n of simple functions given by the Claim. As such sequences belong to B_0(Σ), by the supnorm continuity of ν_c we have

ν_c(f + g) = lim_n ν_c(f_n + g_n) = lim_n ν_c(f_n) + lim_n ν_c(g_n) = ν_c(f) + ν_c(g),
as desired.

It is convenient to denote this extension still by ν_c, and in the sequel we will write ν_c : B̄(Σ) → R. In the enlarged domain B̄(Σ) the following cleaner version of Theorem 4.6 holds. As B̄(Σ) is a vector space, here we can consider concavity and quasi-concavity. The latter property is the only nontrivial feature of the next result relative to Theorem 4.6.⁹

Corollary 4.2. For any game ν in bv(Σ), the following conditions are equivalent:

(i) ν is convex,
(ii) ν_c is superlinear on B̄(Σ),
(iii) ν_c is supermodular on B̄(Σ),
(iv) ν_c is concave on B̄(Σ),
(v) ν_c is quasi-concave on B̄(Σ), provided ν(Ω) ≠ 0.

Proof. In view of Theorem 4.6, the only nontrivial part is to show that (v) implies (iv). We will actually prove the stronger result that (iv) is equivalent to the convexity of the cone {f : ν_c(f) ≥ 0}. Set K = {f ∈ B̄(Σ) : ν_c(f) ≥ 0}. Given two functions f, g ∈ B̄(Σ), we have

ν_c(f − (ν_c(f)/ν(Ω)) 1_Ω) = 0,  ν_c(g − (ν_c(g)/ν(Ω)) 1_Ω) = 0.

Hence, both f − (ν_c(f)/ν(Ω))1_Ω and g − (ν_c(g)/ν(Ω))1_Ω lie in K. By the convexity of K, taking α ∈ [0, 1] and ᾱ ≡ 1 − α, we have

α[f − (ν_c(f)/ν(Ω))1_Ω] + ᾱ[g − (ν_c(g)/ν(Ω))1_Ω] ∈ K.

Namely,

ν_c(αf − α(ν_c(f)/ν(Ω))1_Ω + ᾱg − ᾱ(ν_c(g)/ν(Ω))1_Ω) = ν_c(αf + ᾱg) − αν_c(f) − ᾱν_c(g) ≥ 0.

Therefore, ν_c is concave.

Remarks. (i) Dual properties hold for submodular games. For example, a game ν is submodular if and only if its Choquet functional ν_c is convex on B̄(Σ); equivalently, a game ν is convex if and only if its dual Choquet functional ν̄_c is convex on B̄(Σ). For brevity, we omit these dual properties.

(ii) The condition ν(Ω) ≠ 0 in point (v) is needed. Consider the game ν on Ω = {ω1, ω2} with ν(ω1) = 2, ν(ω2) = −1, and ν(Ω) = 0. Being subadditive, ν is not convex. On the other hand, its Choquet integral is

ν_c(x1, x2) = 2(x1 − x2) if x1 ≥ x2, and ν_c(x1, x2) = x1 − x2 if x2 > x1,

which is quasi-concave.

The next result is a first consequence of the use of functional analytic tools in the study of convex games. The equivalence between (i) and (v) is due to Schmeidler (1986) for positive games and to De Waegenaere and Wakker (2001) for finite games; for the other equivalences we refer to Delbaen (1974) and Marinacci and Montrucchio (2003).
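The counterexample of Remark (ii) can be checked mechanically. The sketch below (our own illustration) evaluates the piecewise Choquet functional of that game, confirms quasi-concavity at midpoints over a grid, and exhibits the failure of superadditivity that rules out convexity of the game.

```python
from itertools import product

def nu_c(x1, x2):
    """Choquet functional of the game in Remark (ii):
    nu({w1}) = 2, nu({w2}) = -1, nu(Omega) = 0."""
    return 2.0 * (x1 - x2) if x1 >= x2 else (x1 - x2)

pts = list(product(range(-2, 3), repeat=2))

# quasi-concavity probed at midpoints: the value at the midpoint is
# never below the smaller of the two endpoint values
quasi_concave = all(
    nu_c((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)
    >= min(nu_c(*p), nu_c(*q)) - 1e-9
    for p in pts for q in pts)

# superadditivity fails, so by Theorem 4.6 the game is not convex:
# nu_c(1,0) + nu_c(0,1) = 2 - 1 = 1 > 0 = nu_c(1,1)
not_superadditive = nu_c(1, 0) + nu_c(0, 1) > nu_c(1, 1) + 1e-9
print(quasi_concave, not_superadditive)
```

Quasi-concavity here reflects the fact that ν_c is an increasing function of the single linear statistic x1 − x2.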
Theorem 4.7. For a bounded game ν, the following conditions are equivalent:

(i) ν is convex;
(ii) for any A ⊆ B there is µ ∈ core(ν) such that µ(A) = ν(A) and µ(B) = ν(B);
(iii) for any finite chain {A_i}_{i=1}^n, there is µ ∈ core(ν) such that µ(A_i) = ν(A_i) for all i = 1, ..., n;
(iv) ν ∈ bv(Σ) and, for any chain {A_i}_{i∈I}, there is µ ∈ ext(core(ν)) such that µ(A_i) = ν(A_i) for all i ∈ I;
(v) ν ∈ bv(Σ) and ν_c(f) = min_{µ∈core(ν)} ∫ f dµ for all f ∈ B̄(Σ);
(vi) ν_c(f) = min_{µ∈core(ν)} ∫ f dµ for all f ∈ B_0(Σ).

This theorem has a few noteworthy features. First, it shows that bounded and convex games belong to bv(Σ), so that they always have well-defined Choquet integrals on B(Σ). Second, it improves Lemma 4.5 by showing that in the convex case the "replicating" measures over chains can be assumed to be in the core. Finally, Theorem 4.7 shows that Choquet functionals of convex games can be viewed as lower envelopes of the linear functionals on B̄(Σ) induced by the measures in the core. In other words, convex games are exact games of a special type, in which the close connection between the game and the measures in the core holds on the entire space B̄(Σ), and not just on Σ.

Proof. The proof proceeds as follows: (i) ⇒ (vi) ⇒ (iv) ⇒ (v) ⇒ (iii) ⇒ (ii) ⇒ (i).

(i) implies (vi). Given any f ∈ B_0(Σ), the Choquet integral ∫ f dν is well defined since ν ∈ bv(Σ_f), where Σ_f is the finite algebra generated by f. Hence, the Choquet functional ν_c : B_0(Σ) → R exists on the vector space B_0(Σ), and it is positively homogeneous and translation invariant. Let f, g : Ω → R be any two functions in B_0(Σ). Let Σ_{f,g} be the smallest algebra that makes both f and g measurable. As Σ_{f,g} is finite, ν ∈ bv(Σ_{f,g}) and so we can apply Theorem 4.6 to the restricted Choquet integral ν_c : B(Σ_{f,g}) → R. Thus, ν_c(f + g) ≥ ν_c(f) + ν_c(g). Since f and g were arbitrary elements of B_0(Σ), we conclude that ν_c : B_0(Σ) → R is a superlinear functional on B_0(Σ).

Let f ∈ B_0(Σ). The algebraic dual of B_0(Σ) is the space fa(Σ) of all finitely additive games on Σ.¹⁰ As ν_c : B_0(Σ) → R is superlinear, by the Hahn–Banach Theorem there is µ ∈ fa(Σ) such that µ(f) = ν_c(f) and µ(g) ≥ ν_c(g) for each g ∈ B_0(Σ), where we write µ(h) for the linear functional ∫ h dµ. In other words,

ν_c(f) = min_{µ∈C} µ(f),

where C = {µ ∈ fa(Σ) : µ(f) ≥ ν_c(f) for each f ∈ B_0(Σ)}. Next we show that C coincides with the set

C′ = {µ ∈ fa(Σ) : µ ≥ ν and µ(Ω) = ν(Ω)}.
Let µ ∈ C. Then, µ(A) = µ(1_A) ≥ ν_c(1_A) = ν(A) for all A ∈ Σ; moreover, −µ(Ω) = µ(−1_Ω) ≥ ν_c(−1_Ω) = −ν(Ω). Hence, µ ∈ C′. Conversely, suppose µ ∈ C′. As µ ≥ ν and µ(Ω) = ν(Ω), the definition of Choquet integral immediately implies that ν_c(f) ≤ µ(f). Hence, µ ∈ C.

It remains to show that C′ = core(ν). As ba(Σ) ⊆ fa(Σ), core(ν) ⊆ C′. As to the converse inclusion, suppose µ ∈ C′. Since ν is bounded, for each µ ∈ C′ we have |µ(A)| ≤ 2 sup_{A∈Σ} |ν(A)| (see Proposition 4.2). Then, µ ∈ ba(Σ) (see Dunford and Schwartz, 1958: 97) and we conclude that C′ ⊆ core(ν), as desired.

(vi) implies (iv). Consider first a finite chain A_1 ⊆ ··· ⊆ A_n. By (vi), there exists µ ∈ core(ν) such that

µ(∑_{i=1}^n 1_{A_i}) = ν_c(∑_{i=1}^n 1_{A_i}).
By comonotonic additivity, ∑_{i=1}^n µ(A_i) = ∑_{i=1}^n ν(A_i). As µ ∈ core(ν), we have µ(A_i) ≥ ν(A_i) for all i = 1, ..., n, which in turn implies µ(A_i) = ν(A_i) for all i = 1, ..., n.

Now, let {A_i}_{i∈I} be any chain in Σ. Let Σ_J be the (finite) algebra generated by a finite subchain {A_i}_{i∈J} and Δ_J = {µ ∈ core(ν) : µ(A_j) = ν(A_j) for all j ∈ J}. Since core(ν) is weak∗-compact, the set Δ_J is weak∗-compact. Moreover, it is convex and, by what we just proved, Δ_J ≠ Ø. It is easily seen that Δ_J is also extremal in core(ν). The collection of weak∗-compact sets {Δ_J}_{J : J⊆I and |J|<∞}

Suppose that α^ν_A < 0 for some A. Then, for t > 0, the vector t1_A ∈ R^n_+ and

Bν(t1_A) = α^ν_A t^{|A|} + terms of lower degree.
Hence, for t large enough we have Bν(t1_A) < 0, a contradiction. We conclude that α^ν_A ≥ 0 for each A, as desired.

This lemma is the reason why we considered multilinear polynomials defined on R^n rather than on [0, 1]^n, as is usually the case. In fact, by Lemma 4.12(iv) the positivity of the Owen polynomial on [0, 1]^n only reflects the positivity of the associated game, not its total monotonicity.

We now illustrate Lemmas 4.12 and 4.13 with a couple of examples.

Example 4.13. Consider the game ν(A) = |A|² of Example 4.12. As

Bν(x) = ∑_{i=1}^n x_i + 2 ∑_{i<j} x_i x_j ≥ 0  for each x ∈ R^n_+,

by Lemma 4.13 the game ν is totally monotone.
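The coefficients α^ν_A used in Lemma 4.13 can be computed directly on a finite space via the standard Möbius inversion α_A = ∑_{B⊆A} (−1)^{|A\B|} ν(B). The following sketch (our own illustration) recovers the coefficients of Example 4.13's game ν(A) = |A|²: singletons get 1, pairs get 2, larger coalitions get 0, so all coefficients are nonnegative, as total monotonicity requires.

```python
from itertools import combinations

def subsets(s):
    """All subsets of s as frozensets."""
    s = list(s)
    for k in range(len(s) + 1):
        for A in combinations(s, k):
            yield frozenset(A)

def mobius(nu, omega):
    """Moebius coefficients alpha_A = sum_{B ⊆ A} (-1)^(|A|-|B|) nu(B);
    the game nu is totally monotone iff all of them are >= 0."""
    return {A: sum((-1) ** (len(A) - len(B)) * nu[B] for B in subsets(A))
            for A in subsets(omega)}

omega = (1, 2, 3, 4)
nu = {A: len(A) ** 2 for A in subsets(omega)}   # the game of Example 4.13
alpha = mobius(nu, omega)
print(sorted(set(alpha.values())))
```

These coefficients match the polynomial Bν(x) = ∑ x_i + 2 ∑_{i<j} x_i x_j displayed above: degree-one terms have weight 1 and degree-two terms weight 2, with nothing of higher degree.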
Example 4.14. Consider the game associated with the multilinear polynomial B(x) = x1x2 + x1x3 + x2x3 − εx1x2x3 with ε > 0. As B(10/ε, 10/ε, 2/ε) < 0 for each ε > 0, this game is not totally monotone. The game is positive and convex when ε ≤ 1. In fact, B(x) = x1x2(1 − εx3) + x1x3 + x2x3 ≥ 0 on [0, 1]³, and so by Lemma 4.12(iv) the game is positive. On the other hand,

∂²B/∂x_i∂x_j = 1 − εx_k ≥ 0

on (0, 1)³, so that, by Lemma 4.12(v), the game is convex.

In view of Lemma 4.13, it is natural to consider the pointed convex cone

P^+_n = {P ∈ P_n : P(x) ≥ 0 for each x ∈ R^n_+}.

It induces in the usual way an order ≽_p on P_n as follows: given P_1, P_2 ∈ P_n, write P_1 ≽_p P_2 if P_1 − P_2 ∈ P^+_n. In turn, ≽_p induces a lattice structure and a norm, denoted by ‖·‖_p, that make P_n an AL-space. For brevity, we omit the details of these by now standard notions.

The next result summarizes the relations existing between the space of finite games and the space of multilinear polynomials just introduced.

Theorem 4.15. There is a lattice preserving and isometric isomorphism B between the AL-spaces (V_n, ≽, ‖·‖) and (P_n, ≽_p, ‖·‖_p) determined by the identity

P(x) = ∑_{Ø≠A∈Σ} ν(A) ∏_{i∈A} x_i ∏_{j∈A^c} (1 − x_j)  for each x ∈ R^n.
The game ν is totally monotone if and only if the corresponding polynomial P in P_n is nonnegative on R^n_+.

Summing up, Theorems 4.11, 4.13, and 4.15 established the following lattice isometries:

(R^{2^{|Ω|}−1}, ≥, ‖·‖_1) ⟷_T (V_n, ≽, ‖·‖) ⟷_I (ba(2^Ω), ≽_ba, ‖·‖),

with B mapping (V_n, ≽, ‖·‖) onto (P_n, ≽_p, ‖·‖_p). The resulting isometries I ∘ T^{−1} and B ∘ T^{−1} between R^{2^{|Ω|}−1}, ba(2^Ω), and P_n are obviously well known. The interesting part here is given by the possibility of representing finite games in different ways, each useful for different purposes.
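The isomorphism B of Theorem 4.15 is easy to implement on a small space. The sketch below (our own illustration; the game ν(A) = |A|² is just a convenient example) builds the multilinear polynomial from a game and checks the defining identity on indicator vectors, where P(1_A) = ν(A).

```python
from itertools import combinations

def subsets(s):
    """All subsets of s as frozensets."""
    s = list(s)
    for k in range(len(s) + 1):
        for A in combinations(s, k):
            yield frozenset(A)

omega = (1, 2, 3)
nu = {A: len(A) ** 2 for A in subsets(omega)}   # example finite game

def owen(nu, x):
    """Multilinear polynomial of Theorem 4.15:
    P(x) = sum_A nu(A) * prod_{i in A} x_i * prod_{j not in A} (1 - x_j)."""
    total = 0.0
    for A in subsets(omega):
        term = nu[A]
        for i in omega:
            term *= x[i] if i in A else (1.0 - x[i])
        total += term
    return total

# On indicator vectors the polynomial reproduces the game: P(1_A) = nu(A),
# since only the term indexed by A itself survives.
ok = all(abs(owen(nu, {i: 1.0 if i in A else 0.0 for i in omega}) - nu[A]) < 1e-12
         for A in subsets(omega))
print(ok)
```

This is why the correspondence ν ↔ P is a bijection: the polynomial's values on the vertices of the unit cube encode the game, and multilinearity pins down everything else.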
4.6.5. Convex games

In this last subsection we show some noteworthy properties of finite convex games. A first important property has already been mentioned right after Theorem 4.12: any finite game can be written as the difference of two convex games.

To see other properties of finite convex games, we have to turn our attention to chains of subsets of Ω. As Ω = {ω1, ..., ωn}, the collection C given by

{ω1}, {ω1, ω2}, ..., {ω1, ..., ωn}

forms a maximal chain, that is, no other chain can contain it. More generally, given any permutation σ on {1, ..., n}, the collection C_σ given by

{ω_{σ(1)}}, {ω_{σ(1)}, ω_{σ(2)}}, ..., {ω_{σ(1)}, ..., ω_{σ(n)}}

forms another maximal chain. All maximal chains in Ω have this form, and so there are n! of them.

Let ν be any game. By Lemma 4.5, for each C_σ there is a charge µ_σ ∈ ba(Σ) such that µ_σ(A) = ν(A) for each A ∈ C_σ. Because of the maximality of C_σ, the charge µ_σ is easily seen to be unique. We call µ_σ the marginal worth charge associated with the permutation σ.

Marginal worth charges play a central role in studying finite convex games. We begin by providing a characterization of convexity based on them, due to Ichiishi (1981).

Theorem 4.16. A finite game ν is convex if and only if all its marginal worth charges µ_σ belong to the core.

Proof. "Only if". Suppose ν is convex. We want to show that each µ_σ belongs to core(ν). By Theorem 4.7, there exists µ ∈ core(ν) such that µ(A) = ν(A) for each A ∈ C_σ. By the maximality of C_σ, µ_σ is the unique charge having such property. Hence, µ = µ_σ, as desired.

"If". Suppose µ_σ ∈ core(ν) for all permutations σ. Given any A and B, let C_σ be a maximal chain containing A ∩ B, A, and A ∪ B. Then

ν(A ∪ B) + ν(A ∩ B) − ν(A) = µ_σ(A ∪ B) + µ_σ(A ∩ B) − µ_σ(A) = µ_σ(B).

As µ_σ ∈ core(ν), we have µ_σ(B) ≥ ν(B), and so ν is convex.

Turn now to cores of finite games.
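Theorem 4.16 can be verified by direct computation. In the following sketch (our own illustration; the convex game ν(A) = |A|² is the one from Example 4.12), each permutation yields a marginal worth charge, and every such charge satisfies all the core inequalities.

```python
from itertools import combinations, permutations

def subsets(s):
    """All subsets of s as frozensets."""
    s = list(s)
    for k in range(len(s) + 1):
        for A in combinations(s, k):
            yield frozenset(A)

omega = (1, 2, 3)
nu = {A: len(A) ** 2 for A in subsets(omega)}   # convex game |A|^2

def marginal_worth(nu, order):
    """Charge mu_sigma: each state receives its marginal contribution
    along the maximal chain induced by the permutation `order`."""
    mu, prev = {}, frozenset()
    for w in order:
        mu[w] = nu[prev | {w}] - nu[prev]
        prev = prev | {w}
    return mu

def in_core(mu, nu):
    # efficiency: total mass equals nu(Omega)
    if abs(sum(mu.values()) - nu[frozenset(omega)]) > 1e-9:
        return False
    # coalitional rationality: mu(A) >= nu(A) for every coalition A
    return all(sum(mu[w] for w in A) >= nu[A] - 1e-9 for A in subsets(omega))

charges = [marginal_worth(nu, p) for p in permutations(omega)]
print(all(in_core(mu, nu) for mu in charges))
```

For instance, the identity permutation yields the charge (1, 3, 5): each state's mass is the increment of |A|² as it joins the growing coalition.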
The first observation to make is that the core of a finite game is a subset of the |Ω|-dimensional space R^Ω of the form:

core(ν) = {x ∈ R^Ω : ∑_{ω∈Ω} x_ω = ν(Ω) and ∑_{ω∈A} x_ω ≥ ν(A) for each A}.

Equivalently,

core(ν) = ⋂_{A∈Σ} {x ∈ R^Ω : ∑_{ω∈A} x_ω ≥ ν(A)} ∩ {x ∈ R^Ω : ∑_{ω∈Ω} x_ω ≤ ν(Ω)},
that is, core(ν) is the set of solutions of a finite system of linear inequalities on R^Ω. Sets of this form are called polyhedra. By Proposition 4.2 the core is weak∗-compact. In this finite setting, this means that it is a compact subset of R^Ω, where compactness is in the standard norm topology of R^Ω. The core of a finite game is, therefore, a compact polyhedron. As a result, we have the following geometric property of cores of finite games.

Proposition 4.16. The core of a finite game is a polytope in R^Ω, that is, it is the convex hull of a finite set.

Proof. By a standard result (see Aliprantis and Border, 1999: 233–234 or Webster, 1994: 114), compact polyhedra are polytopes.

The extreme points of a polytope are called vertices and they form a finite set. As each element of a polytope can be represented as a convex combination of its vertices, the knowledge of the set of vertices is, therefore, key in describing the structure of a polytope. All this means that, by Proposition 4.16, in order to understand the structure of the core it is crucial to identify the set of its vertices. This is achieved by the next result, due to Shapley (1971). Interestingly, the marginal worth charges, which by Theorem 4.16 always belong to the core of a convex game, turn out to be exactly the sought-after vertices.

Theorem 4.17. Let ν be a finite convex game. Then, a charge µ ∈ ba(Σ) is a vertex of core(ν) if and only if it is a marginal worth charge, that is, if and only if there is a maximal chain C_σ such that ν(A) = µ(A) for all A ∈ C_σ.

Proof. An element of a polytope is a vertex if and only if it is an exposed point. Hence, it is enough to show that the marginal worth charges are the set of exposed points of core(ν).

"If". Suppose µ_σ is a marginal worth charge, with associated maximal chain C_σ. We want to show that it is an exposed point of core(ν). Since C_σ is a maximal chain, there is an injective function f_σ whose upper sets are given by C_σ, that is, C_σ = {(f_σ ≥ t)}_{t∈R}. For example, if C_σ = {A_{σ(i)}}, take f_σ = ∑_{i=1}^n 1_{A_{σ(i)}}. By the definition of Choquet integral, we have ∫ f_σ dµ_σ = ∫ f_σ dν. Since C_σ is maximal, µ_σ is the unique charge replicating ν on C_σ. Therefore, given any other charge µ in core(ν), there exists A ∈ C_σ such that µ_σ(A) < µ(A). Equivalently, there is some t ∈ R such that ν(f_σ ≥ t) = µ_σ(f_σ ≥ t) < µ(f_σ ≥ t). Hence,

∫ f_σ dν = ∫ f_σ dµ_σ < ∫ f_σ dµ  for all µ ∈ core(ν) with µ ≠ µ_σ,

and this proves that µ_σ is an exposed point, as desired.
“Only if ”. Suppose µ∗ is an exposed point of core(ν). We want to show that is a marginal worth charge, that is, that there exists a maximal chain C ∗ in
such that µ∗ (A) = ν(A) for each A ∈ C ∗ . ∗ Let {µi }m i=1 be the set of all exposed points of core(ν), except µ . Set k1 = ∗ ∗ f: → µ ∨ (maxi=1,...,m µi ). Since µ is an exposed point, there exists ∗ . Set k = f dµ for all µ ∈ core(ν) with µ = µ R such that f dµ∗ < 2 mini=1,...,m ( f dµi − f dµ∗ ). Clearly, k2 > 0. Given 0 < ε < k2 /2k1 , there is an injective g : → R such that f − g < ε. Hence, for each i we have g dµi − g dµ∗ = g dµi − f dµi + f dµi − f dµ∗ + f dµ∗ − g dµ∗ ≥ −εk1 + k2 − εk1 > 0.
µ∗
We conclude that g dµ∗ < g dµi for each i, and so g dµ∗ < g dµ for all µ ∈ core(ν) with µ = µ∗ . Since ν is convex, by Theorem 4.7 it holds g dν = minµ∈core(ν) g dµ, and ∗ < g dµ for all µ ∈ core(ν) with µ = µ∗ . The equality so g dν = g dµ ∗ g dν = g dµ implies that µ∗ (g ≥ t) = ν(g ≥ t) for all t ∈ R. Since g is injective, the chain of upper sets {g ≥ t} is maximal in , and it is actually the desired maximal chain C ∗ . Denote by M(ν) the set of all marginal worth charges of a game ν. By Theorem 4.17, we have core(ν) = co(M(ν)), and so all elements of the core can be represented as convex combinations of marginal worth charges. This result has been recently generalized to infinite games by Marinacci and Montrucchio (forthcoming). Putting together Theorems 4.16 and 4.17, we have the following remarkable property of finite games. Corollary 4.4. A finite game ν is convex if and only if M(ν) = exp(core(ν)). Therefore, given a game, the knowledge of its n! marginal worth charges makes it possible to determine both whether the game is convex and what is the structure of its core. We close by observing that it is not by chance that in Corollary 4.4 we use the set of exposed point exp rather than that of extreme points ext. For a polytope these two sets coincide and they form the set of vertices. For general compact convex sets, even in finite dimensional spaces, this is no longer the case and exposed points are only a subset of the set of extreme points. Inspection of the proof of Theorem 4.17 shows that what we have actually proved is that marginal worth charges are the set of exposed points of the core. The fact that they then turn out to coincide with the set of extreme points is a consequence of properties of polytopes, which are immaterial for the proof.
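Theorems 4.7 and 4.17 together say that, for a finite convex game, the Choquet integral of any f is the minimum of ∫ f dµ over the marginal worth charges, since a linear functional attains its minimum over a polytope at a vertex. The sketch below (our own illustration; the game is a convex distortion chosen for the example) checks this equality numerically.

```python
from itertools import combinations, permutations

def subsets(s):
    """All subsets of s as frozensets."""
    s = list(s)
    for k in range(len(s) + 1):
        for A in combinations(s, k):
            yield frozenset(A)

omega = (1, 2, 3)
nu = {A: len(A) ** 2 / 9.0 for A in subsets(omega)}   # convex, nu(Omega) = 1

def choquet(f, nu):
    # comonotonic layer formula
    order = sorted(f, key=f.get, reverse=True)
    x = [f[w] for w in order] + [0.0]
    total, acc = 0.0, frozenset()
    for i, w in enumerate(order):
        acc = acc | {w}
        total += (x[i] - x[i + 1]) * nu[acc]
    return total

def marginal_worth(nu, order):
    mu, prev = {}, frozenset()
    for w in order:
        mu[w] = nu[prev | {w}] - nu[prev]
        prev = prev | {w}
    return mu

# vertices of core(nu) for a convex game (Theorem 4.17)
vertices = [marginal_worth(nu, p) for p in permutations(omega)]

def core_min(f):
    """Minimum of the integral of f over the core, attained at a vertex."""
    return min(sum(mu[w] * f[w] for w in omega) for mu in vertices)

f = {1: 3.0, 2: -1.0, 3: 0.5}
print(choquet(f, nu), core_min(f))
```

The minimizing vertex is the marginal worth charge of the permutation that orders states by decreasing values of f, in line with the comonotonic structure of the Choquet integral.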
When extending the result to infinite convex games this observation is important, as in the more general setting, where exposed and extreme points no longer necessarily coincide, the analog of marginal worth charges will actually characterize the exposed points. We refer the interested reader to Marinacci and Montrucchio (forthcoming) for details.
4.7. Concluding remarks

1 In this chapter we only considered games defined on spaces having no topological structure. There is a large literature on suitably "regular" set functions defined on topological spaces, tracing back to Choquet (1953). We refer the interested reader to Huber and Strassen (1973) and Dellacherie and Meyer (1978). Epstein and Wang (1996) and Philippe et al. (1999) provide some decision-theoretic applications of capacities on topological domains.
2 In a series of papers, Gabriele Greco proposed an interesting notion of measurability on algebras. A noteworthy feature of his approach is that, unlike B(Σ), the resulting class of measurable functions forms a vector space. Greco's approach is, therefore, a further way to bypass the lack of vector structure of B(Σ) that we discussed in some detail after Theorem 4.6. In this chapter, we preferred to define the Choquet functional on the smaller domain B(Σ) and then extend it to the vector space B̄(Σ) using its Lipschitz continuity, following in this way a standard procedure in functional analysis. In any case, details on Greco's approach can be found in his papers (e.g. Greco, 1981 and Bassanezi and Greco, 1984) and in Denneberg (1994).
3 We did not consider here games and Choquet functionals defined on product algebras. For details on this topic we refer the interested reader to Ben Porath et al. (1997), Ghirardato (1997), and to the references contained therein.
4 Throughout the chapter we only considered Choquet functionals defined on bounded functions. Results for the unbounded case can be found in Greco (1976, 1982), Bassanezi and Greco (1984), and Wakker (1993).
5 Sipos (1979a,b) introduced a different notion of integral for capacities. It coincides with the Choquet integral for positive functions, but the extension to general functions is done according to the standard procedure used to extend the Lebesgue integral from positive functions to general functions, based on the decomposition f = f⁺ − f⁻. The resulting integral is in general different from the Choquet integral and it turned out to be useful in some applications. We refer the interested reader to Sipos' original papers and to Denneberg (1994).
6 Theorem 4.6 and Corollary 4.2 make it possible to use convex analysis tools in studying convex games and their Choquet integrals. For example, Carlier and Dana (forthcoming) and Marinacci and Montrucchio (forthcoming) use such tools to study the structure of cores of convex games and the differentiability and subdifferentiability properties of their Choquet integrals.
Acknowledgments We thank Fabio Maccheroni for his very insightful suggestions, which greatly improved this chapter. The financial support of MIUR (Ministero dell’Istruzione, Università e Ricerca Scientifica) is gratefully acknowledged.
Notes

1 In the sequel, subsets of Ω are understood to be in Σ even where not stated explicitly, and they are referred to both as sets and as coalitions.
2 Maccheroni and Ruckle (2002) proved that (bv(Σ), ‖·‖) is a dual Banach space.
3 The weak∗-topology and its properties can be found in, for example, Aliprantis and Border (1999), Dunford and Schwartz (1958), and Rudin (1973).
4 The subgame ν_A is the restriction of ν to the induced algebra Σ_A = Σ ∩ A, given by ν_A(B) = ν(B) for all B ⊆ A.
5 A collection C in Σ is a chain if for each A and B in C it holds either A ⊆ B or B ⊆ A. Throughout we assume that Ø, Ω ∈ C.
6 That is, f ≥ g if f(ω) ≥ g(ω) for each ω ∈ Ω, and ‖f‖ = sup_{ω∈Ω} |f(ω)|.
7 A functional is superlinear if it is positively homogeneous and superadditive. Recall that, by Proposition 4.11, Choquet functionals are always positively homogeneous.
8 That is, f ∈ B̄(Σ) provided there is a sequence {f_n}_n ⊆ B_0(Σ) such that lim_n ‖f − f_n‖ = 0. Here we are viewing B_0(Σ) as a subset of the set of all bounded functions f : Ω → R.
9 The equivalence between the convexity of ν and the concavity of ν_c established in Corollary 4.2 is also a curious terminological phenomenon, which may give rise to some confusion. A simple way to avoid any problem is to use the terminology "supermodular games."
10 Notice that ba(Σ) is the subspace of fa(Σ) consisting of all bounded charges.
11 See, for example, Biswas et al. (1999) and the references therein contained. For characterizations of convexity and exactness related to stability, see Kikuta (1988) and Sharkey (1982).
12 Needless to say, the properties we will establish for finite games also hold for games defined on finite algebras of subsets of infinite spaces.
13 That is, V⁺ ∩ (−V⁺) = {0}.
14 See Aliprantis and Border (1999: 263–330) for a definition of these lattice operations, as well as for all notions on vector lattices needed in the sequel.
15 The l1-norm ‖·‖_1 of R^n is given by ‖x‖_1 = ∑_{i=1}^n |x_i| for each x ∈ R^n.
16 In the statement, ≽_ba denotes the restriction of ≽ to ba(2^Ω), as discussed right after Theorem 4.11.
17 B(2^Ω)∗ is the vector space of all linear functionals defined on the vector space B(2^Ω) of all functions defined on the enlarged space.
References

Aliprantis, C. D. and K. C. Border (1999) Infinite dimensional analysis, Springer-Verlag, New York.
Aumann, R. and L. Shapley (1974) Values of non-atomic games, Princeton University Press, Princeton.
Bassanezi, R. C. and G. H. Greco (1984) Sull'additività dell'integrale, Rendiconti Seminario Matematico Università di Padova, 72, 249–275.
Ben Porath, E., I. Gilboa, and D. Schmeidler (1997) On the measurement of inequality under uncertainty, Journal of Economic Theory, 75, 194–204. (Reprinted as Chapter 22 in this volume.)
Bhaskara Rao, K. P. S. and M. Bhaskara Rao (1983) Theory of charges, Academic Press, New York.
Biswas, A. K., T. Parthasarathy, J. A. M. Potters, and M. Voorneveld (1999) Large cores and exactness, Games and Economic Behavior, 28, 1–12.
Bondareva, O. (1963) Certain applications of the methods of linear programming to the theory of cooperative games (in Russian), Problemy Kibernetiki, 10, 119–139.
Boros, E. and P. L. Hammer (2002) Pseudo-Boolean optimization, Discrete Applied Mathematics, 123, 155–225.
Carlier, G. and R. A. Dana (2003) Core of convex distortions of a probability on a non-atomic space, Journal of Economic Theory, 173, 199–222.
Chateauneuf, A. and J.-Y. Jaffray (1989) Some characterizations of lower probabilities and other monotone capacities through the use of Möbius inversion, Mathematical Social Sciences, 17, 263–283.
Choquet, G. (1953) Theory of capacities, Annales de l'Institut Fourier, 5, 131–295.
Delbaen, F. (1974) Convex games and extreme points, Journal of Mathematical Analysis and Applications, 45, 210–233.
Dellacherie, C. (1971) Quelques commentaires sur les prolongements de capacités, Séminaire de Probabilités V, Lecture Notes in Mathematics 191, Springer-Verlag, New York.
Dellacherie, C. and P.-A. Meyer (1978) Probabilities and potential, North-Holland, Amsterdam.
Dempster, A. (1967) Upper and lower probabilities induced by a multivalued mapping, Annals of Mathematical Statistics, 38, 325–339.
Dempster, A. (1968) A generalization of Bayesian inference, Journal of the Royal Statistical Society (B), 30, 205–247.
Denneberg, D. (1994) Non-additive measure and integral, Kluwer, Dordrecht.
Denneberg, D. (1997) Representation of the Choquet integral with the σ-additive Möbius transform, Fuzzy Sets and Systems, 92, 139–156.
De Waegenaere, A. and P. Wakker (2001) Nonmonotonic Choquet integrals, Journal of Mathematical Economics, 36, 45–60.
Dunford, N. and J. T. Schwartz (1958) Linear operators, part I: general theory, Wiley-Interscience, London.
Einy, E. and B. Shitovitz (1996) Convex games and stable sets, Games and Economic Behavior, 16, 192–201.
Epstein, L. G. and T. Wang (1996) "Beliefs about beliefs" without probabilities, Econometrica, 64, 1343–1373.
Fan, K. (1956) On systems of linear inequalities, in Linear inequalities and related systems, Annals of Mathematics Studies, 38, 99–156.
Ghirardato, P. (1997) On independence for non-additive measures, with a Fubini theorem, Journal of Economic Theory, 73, 261–291.
Gilboa, I. and E. Lehrer (1991) Global games, International Journal of Game Theory, 20, 129–147.
Gilboa, I. and D. Schmeidler (1994) Additive representations of non-additive measures and the Choquet integral, Annals of Operations Research, 52, 43–65.
Gilboa, I. and D. Schmeidler (1995) Canonical representation of set functions, Mathematics of Operations Research, 20, 197–212.
Massimo Marinacci and Luigi Montrucchio
Grabisch, M., J.-L. Marichal and M. Roubens (2000) Equivalent representations of set functions, Mathematics of Operations Research, 25, 157–178.
Greco, G. H. (1976) Integrale monotono, Rendiconti Seminario Matematico Università di Padova, 57, 149–166.
Greco, G. H. (1981) Sur la mesurabilité d'une fonction numérique par rapport à une famille d'ensembles, Rendiconti Seminario Matematico Università di Padova, 65, 163–176.
Greco, G. H. (1982) Sulla rappresentazione di funzionali mediante integrali, Rendiconti Seminario Matematico Università di Padova, 66, 21–42.
Huber, P. J. and V. Strassen (1973) Minimax tests and the Neyman–Pearson lemma for capacities, Annals of Statistics, 1, 251–263.
Ichiishi, T. (1981) Super-modularity: applications to convex games and to the greedy algorithm for LP, Journal of Economic Theory, 25, 283–286.
Kannai, Y. (1969) Countably additive measures in cores of games, Journal of Mathematical Analysis and Applications, 27, 227–240.
Kelley, J. L. (1959) Measures on Boolean algebras, Pacific Journal of Mathematics, 9, 1165–1177.
Kikuta, K. (1988) A condition for a game to be convex, Mathematica Japonica, 33, 425–430.
Kikuta, K. and L. S. Shapley (1986) Core stability in n-person games, mimeo.
Maccheroni, F. and M. Marinacci (2000) A Heine–Borel theorem for ba(), mimeo.
Maccheroni, F. and W. H. Ruckle (2002) BV as a dual space, Rendiconti Seminario Matematico Università di Padova, 107, 101–109.
Marinacci, M. (1996) Decomposition and representation of coalitional games, Mathematics of Operations Research, 21, 1000–1015. (Reprinted as Chapter 12 in this volume.)
Marinacci, M. (1997) Finitely additive and epsilon Nash equilibria, International Journal of Game Theory, 26, 315–333.
Marinacci, M. and L. Montrucchio (2003) Subcalculus for set functions and cores of TU games, Journal of Mathematical Economics, 39, 1–25.
Marinacci, M. and L. Montrucchio (2004) A characterization of the core of convex games through Gateaux derivatives, Journal of Economic Theory, 116, 229–248.
Moulin, H. (1995) Cooperative microeconomics, Princeton University Press, Princeton.
Myerson, R. (1991) Game theory, Harvard University Press, Cambridge.
Nguyen, H. T. (1978) On random sets and belief functions, Journal of Mathematical Analysis and Applications, 65, 531–542.
Owen, G. (1972) Multilinear extensions of games, Management Science, 18, 64–79.
Owen, G. (1995) Game theory, Academic Press, New York.
Philippe, F., G. Debs and J.-Y. Jaffray (1999) Decision making with monotone lower probabilities of infinite order, Mathematics of Operations Research, 24, 767–784.
Revuz, A. (1955) Fonctions croissantes et mesures sur les espaces topologiques ordonnés, Annales de l'Institut Fourier, 6, 187–269.
Rota, G. C. (1964) Theory of Möbius functions, Z. Wahrsch. und Verw. Geb., 2, 340–368.
Rudin, W. (1973) Functional analysis, McGraw-Hill, New York.
Rudin, W. (1987) Real and complex analysis (3rd edition), McGraw-Hill, New York.
Salinetti, G. and R. Wets (1986) On the convergence in distribution of measurable multifunctions (random sets), normal integrands, stochastic processes and stochastic infima, Mathematics of Operations Research, 11, 385–419.
Schmeidler, D. (1968) On balanced games with infinitely many players, Research Program in Game Theory and Mathematical Economics, RM 28, The Hebrew University of Jerusalem.
Schmeidler, D. (1972) Cores of exact games, Journal of Mathematical Analysis and Applications, 40, 214–225.
Schmeidler, D. (1986) Integral representation without additivity, Proceedings of the American Mathematical Society, 97, 255–261.
Schmeidler, D. (1989) Subjective probability and expected utility without additivity, Econometrica, 57, 571–587. (Reprinted as Chapter 5 in this volume.)
Schultz, M. H. (1969) L∞-multivariate approximation theory, SIAM Journal on Numerical Analysis, 6, 161–183.
Shafer, G. (1976) A mathematical theory of evidence, Princeton University Press, Princeton.
Shapley, L. S. (1953) A value for n-person games, in Contributions to the Theory of Games (H. Kuhn and A. W. Tucker, eds), Princeton University Press, Princeton.
Shapley, L. S. (1967) On balanced sets and cores, Naval Research Logistics Quarterly, 14, 453–460.
Shapley, L. S. (1971) Cores of convex games, International Journal of Game Theory, 1, 12–26.
Sharkey, W. W. (1982) Cooperative games with large cores, International Journal of Game Theory, 11, 175–182.
Sipos, J. (1979a) Integral with respect to a pre-measure, Mathematica Slovaca, 29, 141–155.
Sipos, J. (1979b) Non linear integrals, Mathematica Slovaca, 29, 257–270.
Wakker, P. (1993) Unbounded utility for Savage's "Foundations of Statistics" and other models, Mathematics of Operations Research, 18, 446–485.
Webster, R. (1994) Convexity, Oxford University Press, Oxford.
Widder, D. V. (1941) The Laplace transform, Princeton University Press, Princeton.
Wolfenson, M. and T.-L. Fine (1982) Bayes-like decision making with upper and lower probabilities, Journal of the American Statistical Association, 77, 80–88.
Zhou, L. (1998) Integral representation of continuous comonotonically additive functionals, Transactions of the American Mathematical Society, 350, 1811–1822.
5 Subjective probability and expected utility without additivity
David Schmeidler
5.1. Introduction
Bayesian statistical techniques are applicable when the information and uncertainty with respect to the parameters or hypotheses in question can be expressed by a probability distribution. This prior probability is also the focus of most of the criticism against the Bayesian school. My starting point is to join the critics in attacking a certain aspect of the prior probability: the probability attached to an uncertain event does not reflect the heuristic amount of information that led to the assignment of that probability. For example, when the information on the occurrence of two events is symmetric, they are assigned equal prior probabilities. If the events are complementary, the probabilities will be 1/2, independently of whether the symmetric information is meager or abundant. There are two (unwritten?) rules for assigning prior probabilities to events in case of uncertainty. The first says that symmetric information with respect to the occurrence of events results in equal probabilities. The second says that if the space is partitioned into k symmetric (i.e. equiprobable) events, then the probability of each event is 1/k. I agree with the first rule and object to the second. In the example mentioned earlier, if each of the symmetric and complementary uncertain events is assigned the index 3/7, then the number 1/7 = 1 − (3/7 + 3/7) would indicate the decision maker's confidence in the probability assessment. Thus, allowing nonadditive (not necessarily additive) probabilities enables transmission or recording of information that additive probabilities cannot represent. The idea of nonadditive probabilities is not new. Nonadditive (objective) probabilities have been in use in physics for a long time (Feynman, 1963). The nonadditivity describes the deviation of elementary particles from mechanical behavior toward wave-like behavior.
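The 3/7 assignment can be made concrete in a small sketch (illustrative values only, not part of the chapter's formal development): a nonadditive probability on two symmetric, complementary events that records the confidence index 1/7 as unassigned mass.

```python
from fractions import Fraction

# Nonadditive probability on S = {R, B}: symmetric information gives the
# complementary events equal values 3/7, as in the text's example.
v = {
    frozenset(): Fraction(0),            # normalization: v(empty set) = 0
    frozenset({"R"}): Fraction(3, 7),
    frozenset({"B"}): Fraction(3, 7),
    frozenset({"R", "B"}): Fraction(1),  # normalization: v(S) = 1
}

# Monotonicity: E subset of G implies v(E) <= v(G).
monotone = all(v[E] <= v[G] for E in v for G in v if E <= G)

# Confidence index: 1 - (3/7 + 3/7) = 1/7, mass that additivity would forbid.
confidence_index = 1 - (v[frozenset({"R"})] + v[frozenset({"B"})])
```

Under an additive probability the two symmetric complementary events would be forced to 1/2 each and the index would be 0, however meager the information.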
Daniel Ellsberg (1961) presented his arguments against necessarily additive (subjective) probabilities with the help of the following “mind experiments”: There are two urns each containing one hundred balls. Each ball is either red or black. In urn I there are fifty balls of each color and
Schmeidler, D. (1989). “Subjective probability and expected utility without additivity,” Econometrica, 57, 571–587.
there is no additional information about urn II. One ball is chosen at random from each urn. There are four events, denoted IR, IB, IIR, IIB, where IR denotes the event that the ball chosen from urn I is red, etc. On each of the events a bet is offered: $100 if the event occurs and zero if it does not. According to Ellsberg, most decision makers are indifferent between betting on IR and betting on IB, and are similarly indifferent between bets on IIR and IIB. It may be that the majority are indifferent among all four bets. However, there is a nonnegligible proportion of decision makers who prefer every bet from urn I (IB or IR) to every bet from urn II (IIB or IIR). These decision makers cannot represent their beliefs with respect to the occurrence of uncertain events through an additive probability. The most compelling justification for representation of beliefs about uncertain events through an additive prior probability has been suggested by Savage. Building on previous work by Ramsey, de Finetti, and von Neumann–Morgenstern (N–M), Savage suggested axioms for decision theory that lead to the criterion of maximization of expected utility. The expectation operation is carried out with respect to a prior probability derived uniquely from the decision maker's preferences over acts. The axiom violated by the preferences of the minority in the example above is the "sure thing principle," that is, Savage's P2. In this chapter a simplified version of Savage's model is used. The simplification consists of the introduction of objective or physical probabilities. An act in this model assigns to each state an objective lottery over deterministic outcomes. The uncertainty concerns which state will occur. Such a model containing objective and subjective probabilities has been suggested by Anscombe and Aumann (1963). They speak about roulette lotteries (objective) and horse lotteries (subjective).
In the presentation here the version in Fishburn (1970) is used. The N–M utility theorem used here can also be found in Fishburn (1970). The concept of objective probability is considered here as a physical concept like acceleration, momentum, or temperature; to construct a lottery with given objective probabilities (a roulette lottery) is a technical problem conceptually not different from building a thermometer. When a person has constructed a “perfect” die, he assigns a probability of 1/6 to each outcome. This probability is objective in the same sense as the temperature measured by the thermometer. Another person can check and verify the calibration of the thermometer. Similarly, he can verify the perfection of the die by measuring its dimensions, scanning it to verify uniform density, etc. Rolling the die many times is not necessarily the exclusive test for verification of objective probability. On the other hand, the subjective or personal probability of an event is interpreted here as the number used in calculating the expectation (integral) of a random variable. This definition includes objective or physical probabilities as a special case where there is no doubt as to which number is to be used. This interpretation does not impose any restriction of additivity on probabilities, as long as it is possible to perform the expectation operation which is the subject of this work. Subjective probability is derived from a person’s preferences over acts. In the Anscombe–Aumann type model usually five assumptions are imposed on preferences to define unique additive subjective probability and N–M utility over
outcomes. The first three assumptions are essentially N–M's (weak order, independence, and continuity), and the fourth assumption is equivalent to Savage's P3, that is, state-independence of preferences. The additional assumption is nondegeneracy; without it uniqueness is not guaranteed. The example quoted earlier can be embedded in such a model. There are four states: (IB, IIB), (IB, IIR), (IR, IIB), (IR, IIR). The deterministic outcomes are sums of dollars. For concreteness of the example, assume that there are 101 deterministic outcomes: $0, $1, $2, . . . , $100. An act assigns to each state a probability distribution over the outcomes. The bet "$100 if IIB" is an act which assigns the (degenerate objective) lottery of receiving "$100 with probability one" to each state in the event IIB and "zero dollars with probability one" to each state in the event IIR. The bet on IIR is similarly interpreted. Indifference between these two acts (bets), the independence condition, continuity, and weak order imply indifference between either of them and the constant act which assigns to each state the objective lottery of receiving $100 with probability 1/2 and receiving zero dollars with probability 1/2. The same considerations imply that the constant act above is indifferent to either of the two acts (bets) "$100 if IB" and "$100 if IR". Hence the indifference between IB and IR and the indifference between IIB and IIR in Ellsberg's example, together with the von N–M conditions, imply indifference between all four bets. The nonnegligible minority of Ellsberg's example does not share this indifference: they are indifferent between the constant act (stated earlier) and each bet from urn I, and prefer the constant act to each bet from urn II.
Our first objective consists of restatement, or more specifically of weakening, of the independence condition such that the new assumption together with the other three assumptions can be consistently imposed on the preference relation over acts. In particular the special preferences of the example become admissible. It is obvious that the example’s preferences between bets (acts) do not admit additive subjective probability. Do they define in some consistent way a unique nonadditive subjective probability, and if so, is there a way to define the expected utility maximization criterion for the nonadditive case? An affirmative answer to this problem is presented in the third section. Thus the new model rationalizes nonadditive (personal) probabilities and admits the computation of expected utility with respect to these probabilities. It formally extends the additive model and it makes the expected utility criterion applicable to cases where additive expected utility is not applicable. Before turning to a precise and detailed presentation of the model, another heuristic observation is made. The nomenclature used in economics distinguishes between risk and uncertainty. Decisions in a risk situation are precisely the choices among roulette lotteries. The probabilities are objectively given; they are part of the data. For this case the economic theory went beyond N–M utility and defined concepts of risk aversion, risk premium, and certainty equivalence. Translating these concepts to the case of decisions under uncertainty we can speak about uncertainty aversion, uncertainty premium, and risk equivalence. Returning to the example, suppose that betting $100 on I I R is indifferent to betting $100 on a risky event with an (objective) probability of 3/7. Thus, the subjective probability
of an event is its risk equivalent (P(IIR) = 3/7). In this example the number 1/7 computed earlier expresses the uncertainty premium in terms of risk. Note that a nonadditive probability may not exhibit consistently either uncertainty aversion or uncertainty attraction. This is similar to the case of decisions in risk situations, where N–M utility (of money) may be neither concave nor convex.
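The risk-equivalent reading can be checked numerically in a hedged sketch (the capacity values 1/2 and 3/7 are the illustrative ones from the text; the reduction of a bet's value to the capacity of its event anticipates the Choquet integral defined in Section 5.3):

```python
from fractions import Fraction

# Risk equivalents of the four Ellsberg events, recorded as a capacity.
v = {"IR": Fraction(1, 2), "IB": Fraction(1, 2),
     "IIR": Fraction(3, 7), "IIB": Fraction(3, 7)}

def bet_value(event):
    # For a bet paying utility 1 on `event` and 0 elsewhere, Choquet
    # expected utility reduces to the capacity of the event.
    return v[event]

# Urn I bets are strictly preferred to urn II bets ...
urn_I_preferred = bet_value("IR") > bet_value("IIR")
# ... which no additive probability allows, since v(IIR) + v(IIB) < 1
# even though IIR and IIB are complementary.
nonadditive = v["IIR"] + v["IIB"] < 1
```

This reproduces the ranking of the nonnegligible minority: both urn I bets above both urn II bets, with the shortfall 1/7 left over on the ambiguous urn.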
5.2. Axioms and background
Let X be a set and let Y be the set of distributions over X with finite supports:

Y = {y : X → [0, 1] | y(x) ≠ 0 for only finitely many x's in X and Σ_{x∈X} y(x) = 1}.
For notational simplicity we identify X with the subset {y ∈ Y | y(x) = 1 for some x in X} of Y. Let S be a set and let Σ be an algebra of subsets of S. Both sets, X and S, are assumed to be nonempty. Denote by L0 the set of all Σ-measurable finite-valued functions from S to Y, and denote by Lc the constant functions in L0. Let L be a convex subset of Y^S which includes Lc. Note that Y can be considered a subset of some linear space, and Y^S, in turn, can then be considered as a subspace of the linear space of all functions from S to the first linear space. Whereas it is obvious how to perform convex combinations in Y, it should be stressed that convex combinations in Y^S are performed pointwise. That is, for f and g in Y^S and α in [0, 1], αf + (1 − α)g = h where h(s) = αf(s) + (1 − α)g(s) on S. In the neo-Bayesian nomenclature, elements of X are (deterministic) outcomes, elements of Y are random outcomes or (roulette) lotteries, and elements of L are acts (or horse lotteries). Elements of S are states (of nature) and elements of Σ are events. The primitive of a neo-Bayesian decision model is a binary (preference) relation over L, to be denoted by ≽. Next are stated several properties (axioms) of the preference relation, which will be used in the sequel.
(i) Weak order. (a) For all f and g in L: f ≽ g or g ≽ f. (b) For all f, g, and h in L: if f ≽ g and g ≽ h, then f ≽ h.
The relation ≽ on L induces a relation, also denoted by ≽, on Y: y ≽ z iff y^S ≽ z^S, where y^S denotes the constant function y on S (i.e. {y}^S). As usual, ≻ and ∼ denote the asymmetric and symmetric parts, respectively, of ≽.
Definition 5.1. Two acts f and g in Y^S are said to be comonotonic if for no s and t in S, f(s) ≻ f(t) and g(t) ≻ g(s).
A constant act f, that is, f = y^S for some y in Y, and any act g are comonotonic. An act f whose statewise lotteries {f(s)} are mutually indifferent, that is, f(s) ∼ y for all s in S, and any act g are comonotonic. If X is a set of numbers and
preferences respect the usual order on numbers, then any two X-valued functions f and g are comonotonic iff (f(s) − f(t))(g(s) − g(t)) ≥ 0 for all s and t in S. Clearly, IIR and IIB of the Introduction are not comonotonic. (Comonotonicity stands for common monotonicity.) Next our new axiom for neo-Bayesian decision theory is introduced.
(ii) Comonotonic independence. For all pairwise comonotonic acts f, g, and h in L and for all α in ]0, 1[: f ≻ g implies αf + (1 − α)h ≻ αg + (1 − α)h. (]0, 1[ is the open unit interval.)
Elaboration of this condition is delayed until after condition (vii). Comonotonic independence is clearly a less restrictive condition than the independence condition stated below.
(iii) Independence. For all f, g, and h in L and for all α in ]0, 1[: f ≻ g implies αf + (1 − α)h ≻ αg + (1 − α)h.
(iv) Continuity. For all f, g, and h in L: if f ≻ g and g ≻ h, then there are α and β in ]0, 1[ such that αf + (1 − α)h ≻ g and g ≻ βf + (1 − β)h.
Next, two versions of state-independence are introduced. The intuitive meaning of each of these conditions is that the preferences over random outcomes do not depend on the state that occurred. The first version is the one to be used here. The second version is stated for comparison since it is the common one in the literature.
(v) Monotonicity. For all f and g in L: if f(s) ≽ g(s) on S, then f ≽ g.
(vi) Strict monotonicity. For all f and g in L, y and z in Y, and E in Σ: if f ≻ g, f(s) = y on E, g(s) = z on E, and f(s) = g(s) on E^c, then y ≻ z.
Observation.
If L = L0 , then (vi) and (i) imply (v).
Proof. Let f and g be finite step functions such that f(s) ≽ g(s) on S. There is a finite chain f = h0, h1, . . . , hk = g where each pair of consecutive functions hi−1, hi are constant on the set on which they differ. For each such pair, (vi) and (i) imply hi−1 ≽ hi. Transitivity ((i)(b)) of ≽ concludes the proof. Clearly, (i) and (v) imply (vi). For the sake of completeness we list as an axiom:
(vii) Nondegeneracy. Not for all f and g in L, f ≽ g.
Out of the seven axioms listed here, the completeness of the preferences, (i)(a), seems to be the most restrictive and most imposing assumption of the theory. One can view the weakening of the completeness assumption as a main contribution of all other axioms. Imagine a decision maker who initially has a partial preference relation over acts. After additional introspection she accepts the validity of several of the axioms. She can then extend her preferences using these axioms.
For example, if she ranks f ≻ g and g ≻ h, and if she accepts transitivity, then she concludes that f ≻ h. From this point of view, the independence axiom, (iii), seems the most powerful axiom for extending partial preferences. Given f ≻ g and independence we get, for all h in L and α in ]0, 1[: f′ ≡ αf + (1 − α)h ≻ αg + (1 − α)h ≡ g′. However, after additional reflection this implication may be too powerful to be acceptable. For example, consider the case where outcomes are real numbers and S = [0, 2π]. Let f and g be two acts defined by f(s) = sin(s) and g(s) = sin(s + π/2) = cos(s). The preference f ≻ g may be induced by the rough evaluation that the event [π/3, 4π/3] is more probable than its complement. Define the act h by h(s) = sin(77s). In this case the structure of the acts f′ = ½f + ½h and g′ = ½g + ½h is far from transparent, and the automatic implication of independence, f′ ≻ g′, may seem doubtful to the decision maker. More generally: the ranking f ≻ g implies some rough estimation by the decision maker of the probabilities of events (in the algebra) defined by the acts f and g. If mixture with an arbitrary act h is allowed, the resulting acts f′ and g′ may define a much finer (larger) algebra (especially when the algebra defined by h is qualitatively independent of the algebras of f and g). Careful comparison of the acts f′ and g′ may lead the decision maker to the ranking g′ ≻ f′ (as in the case of the Ellsberg paradox), contradicting the implication of the independence axiom. Restricting the application of independence to comonotonic acts rules out the possibility of contradiction. If f, g, and h are pairwise comonotonic, then the comparison of f to g is not very different from the comparison of f′ to g′. Hence the decision maker can accept the validity of the implication f ≻ g ⟺ f′ ≻ g′ without fear of running into a contradiction.
Note that accepting the validity of comonotonic independence, (ii), means accepting the validity of the implication mentioned earlier without knowing the specific acts f, g, h, f′, g′, but knowing that all five are pairwise comonotonic. Before presenting the von Neumann–Morgenstern Theorem we point out that the axioms (i) weak order, (iii) independence, and (iv) continuity do not require that the preference relation be defined on a set L containing Lc. Only the convexity of L is required for (ii) and (iii).
von Neumann–Morgenstern Theorem. Let M be a convex subset of some linear space, with a binary relation ≽ defined on it. A necessary and sufficient condition for the relation to satisfy (i) weak order, (iii) independence, and (iv) continuity is the existence of an affine real-valued function, say w, on M such that for all f and g in M: f ≽ g iff w(f) ≥ w(g). (Affinity of w means that w(αf + (1 − α)g) = αw(f) + (1 − α)w(g) for 0 < α < 1.) Furthermore, an affine real-valued function w′ on M can replace w in the above statement iff there exist a positive number α and a real number β such that w′(f) = αw(f) + β on M.
As mentioned earlier, for a proof of this theorem and the statement and proof of the Anscombe–Aumann Theorem stated later, the reader is referred to Fishburn (1970).
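For real-valued acts on a finite state space, the product criterion discussed after Definition 5.1 gives a direct comonotonicity test. A minimal sketch (state labels and payoffs are illustrative):

```python
from itertools import combinations

def comonotonic(f, g):
    """Acts given as dicts state -> real payoff; f and g are comonotonic
    iff (f(s) - f(t)) * (g(s) - g(t)) >= 0 for all states s, t."""
    return all((f[s] - f[t]) * (g[s] - g[t]) >= 0
               for s, t in combinations(f, 2))

# The four Ellsberg states and the $100 bets on IIR and IIB.
states = ["IB,IIB", "IB,IIR", "IR,IIB", "IR,IIR"]
bet_IIR = {s: 100 if s.endswith("IIR") else 0 for s in states}
bet_IIB = {s: 100 if s.endswith("IIB") else 0 for s in states}
constant = {s: 50 for s in states}   # a constant act
```

As the text notes, the bets on IIR and IIB are not comonotonic (each wins exactly where the other loses), while a constant act is comonotonic with every act, so comonotonic independence never mixes the two problematic bets with each other.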
Implication. Suppose that a binary relation ≽ on some convex subset L of Y^S with Lc ⊂ L satisfies (i) weak order, (ii) comonotonic independence, and (iv) continuity. Suppose also that there is a convex subset M of L with Lc ⊂ M such that any two acts in M are comonotonic. Then by the von Neumann–Morgenstern Theorem there is an affine function on M, to be denoted by J, which represents the binary relation on M. That is, for all f and g in M: f ≽ g iff J(f) ≥ J(g). Clearly, if M = Lc ≡ {y^S | y ∈ Y}, any two acts in M are comonotonic. Hence, if a function u is defined on Y by u(y) = J(y^S), then u is affine and represents the induced preferences on Y. The affinity of u implies u(y) = Σ_{x∈X} y(x)u(x).
When subjective probability enters into the calculation of expected utility of an act, an integral with respect to a finitely additive set function has to be defined. Denote by P a finitely additive probability measure on Σ and let a be a real-valued Σ-measurable function on S. For the special case where a is a finite step function, a can be uniquely represented by Σ_{i=1}^k α_i E_i^*, where α_1 > α_2 > · · · > α_k are the values that a attains and E_i^* is the indicator function on S of E_i ≡ {s ∈ S | a(s) = α_i} for i = 1, . . . , k. Then

∫_S a dP = Σ_{i=1}^k α_i P(E_i).
The more general case where a is not finitely valued is treated as a special case of nonadditive probability.
Anscombe–Aumann Theorem. Suppose that a preference relation ≽ on L = L0 satisfies (i) weak order, (iii) independence, (iv) continuity, (vi) strict monotonicity, and (vii) nondegeneracy. Then there exist a unique finitely additive probability measure P on Σ and an affine real-valued function u on Y such that for all f and g in L0:

f ≽ g iff ∫_S u(f(·)) dP ≥ ∫_S u(g(·)) dP.

Furthermore, if there exist P and u as above, then the preference relation they induce on L0 satisfies conditions (i), (iii), (iv), (vi), and (vii). Finally, the function u is unique up to a positive linear transformation.
There are three apparent differences between the statement of the main result in the next section and the Anscombe–Aumann Theorem stated earlier: (i) Instead of strict monotonicity, monotonicity is used. It has been shown in the Observation that this does not make a difference. However, for the forthcoming extension, monotonicity is the natural condition. (ii) Independence is replaced with comonotonic independence. (iii) The finitely additive probability measure P is replaced with a nonadditive probability v.
5.3. Theorem
A real-valued set function v on Σ is termed a nonadditive probability if it satisfies the normalization conditions v(∅) = 0 and v(S) = 1, and monotonicity, that is, for all E and G in Σ: E ⊂ G implies v(E) ≤ v(G). We now introduce the definition of ∫_S a dv for v a nonadditive probability and a = Σ_{i=1}^k α_i E_i^* a finite step function with α_1 > α_2 > · · · > α_k and (E_i)_{i=1}^k a partition of S. Let α_{k+1} = 0 and define

∫_S a dv = Σ_{i=1}^k (α_i − α_{i+1}) v(∪_{j=1}^i E_j).
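The definition transcribes directly into code. A sketch for finite state spaces (the capacity is supplied as a dict on the cumulative unions that arise, and the example values are assumptions of this illustration):

```python
def choquet(values, v):
    """Choquet integral of a finite step function.
    values: dict state -> real number; v: capacity, dict frozenset -> number.
    Implements sum_i (a_i - a_{i+1}) * v(E_1 u ... u E_i) with a_{k+1} = 0."""
    levels = sorted(set(values.values()), reverse=True)  # a_1 > ... > a_k
    total, cumulative = 0.0, frozenset()
    for i, a in enumerate(levels):
        # E_i = states where the act attains the i-th largest value.
        cumulative |= frozenset(s for s in values if values[s] == a)
        a_next = levels[i + 1] if i + 1 < len(levels) else 0.0
        total += (a - a_next) * v[cumulative]
    return total

S = frozenset({"s1", "s2"})
additive = {frozenset({"s1"}): 0.5, frozenset({"s2"}): 0.5, S: 1.0}
convex = {frozenset({"s1"}): 0.3, frozenset({"s2"}): 0.3, S: 1.0}
a = {"s1": 10.0, "s2": 4.0}
```

For the additive capacity the integral is the ordinary expectation (0.5·10 + 0.5·4 = 7), while the convex capacity weights the worse outcome more heavily ((10 − 4)·0.3 + 4·1 = 5.8), the uncertainty-averse pattern discussed in the Introduction.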
For the special case of v additive, the definition stated earlier coincides with the usual one mentioned in the previous section.
Theorem 5.1. Suppose that the preference relation ≽ on L = L0 satisfies (i) weak order, (ii) comonotonic independence, (iv) continuity, (v) monotonicity, and (vii) nondegeneracy. Then there exist a unique nonadditive probability v on Σ and an affine real-valued function u on Y such that for all f and g in L0:

f ≽ g iff ∫_S u(f(·)) dv ≥ ∫_S u(g(·)) dv.
Conversely, if there exist v and u as above, u nonconstant, then the preference relation they induce on L0 satisfies (i), (ii), (iv), (v), and (vii). Finally, the function u is unique up to positive linear transformations.
Proof. From the Implication of the von N–M Theorem we get a N–M utility u representing the preference relation ≽ induces on Y. By nondegeneracy there are f* and f_* in L0 with f* ≻ f_*. Monotonicity, (v), implies existence of a state s in S such that f*(s) ≡ y* ≻ f_*(s) ≡ y_*. Since u is given up to a positive linear transformation, suppose from now on that u(y*) = 1 and u(y_*) = −1. Denote K = u(Y). Hence K is a convex subset of the real line including the interval [−1, 1]. For an arbitrary f in L0 denote

Mf = {αf + (1 − α)y^S | y ∈ Y and α ∈ [0, 1]}.

Thus Mf is the convex hull of the union of {f} and Lc. It is easy to see that any two acts in Mf are comonotonic. Hence, there is an affine real-valued function on Mf which represents the preference relation restricted to Mf. After rescaling, this function, Jf, satisfies Jf(y*^S) = 1 and Jf(y_*^S) = −1. Clearly, if h ∈ Mf ∩ Mg, then Jf(h) = Jg(h). So, defining J(f) = Jf(f) for f in L0, we get a real-valued function on L0 which represents the preferences on L0 and satisfies for all y in Y: J(y^S) = u(y). Let B0(K) denote the Σ-measurable, K-valued finite step functions on S. Let U : L0 → B0(K) be defined by U(f)(s) = u(f(s)) for s in S
and f in L0. The function U is onto, and if U(f) = U(g), then by monotonicity f ∼ g, which in turn implies J(f) = J(g). We now define a real-valued function I on B0(K). Given a in B0(K), let f in L0 be such that U(f) = a. Then define I(a) = J(f). I is well defined since, as mentioned earlier, J is constant on U^{−1}(a):
[Diagram: J = I ∘ U, where U maps L0 onto B0(K), J : L0 → ℝ, and I : B0(K) → ℝ.]
We now have a real-valued function I on B0(K) which satisfies the following three conditions:
(i) For all α in K: I(αS*) = α.
(ii) For all pairwise comonotonic functions a, b, and c in B0(K) and α in [0, 1]: if I(a) > I(b), then I(αa + (1 − α)c) > I(αb + (1 − α)c).
(iii) If a(s) ≥ b(s) on S for a and b in B0(K), then I(a) ≥ I(b).
To see that (i) is satisfied, let y in Y be such that u(y) = α. Then J(y^S) = α and U(y^S) = αS*. Hence I(αS*) = α. Similarly, (ii) is satisfied because comonotonicity is preserved by U and J represents ≽, which satisfies comonotonic independence. Finally, (iii) holds because U preserves monotonicity. The corollary of Section 3 and the Remark following it in Schmeidler (1986) say that if a real-valued function I on B0(K) satisfies conditions (i), (ii), and (iii), then the nonadditive probability v on Σ defined by v(E) = I(E*) satisfies for all a and b in B0(K):
I(a) ≥ I(b) iff ∫_S a dv ≥ ∫_S b dv.    (5.1)
Hence, for all f and g in L0:

f ≽ g iff ∫_S U(f) dv ≥ ∫_S U(g) dv,
and the proof of the main part of the theorem is completed.
To prove the opposite direction note first that in Schmeidler (1986) it is shown and referenced that if I on B0(K) is defined by (5.1), then it satisfies conditions (i), (ii), and (iii). (Only (ii) requires some proof.) Second, the assumptions of the opposite direction say that J is defined as the composition of U and I in the diagram. Hence the preference relation on L0 induced by J satisfies all the required conditions. (U preserves monotonicity and comonotonicity, and ∫_S a dv is a (sup) norm continuous function of a.)
Finally, the uniqueness properties of the expected utility representation will be proved. Suppose that there exist an affine real-valued function u′ on Y and a nonadditive probability v′ on Σ such that for all f and g in L0:

f ≽ g iff ∫_S u′(f(s)) dv′ ≥ ∫_S u′(g(s)) dv′.    (5.2)
Note that monotonicity of v′ can be derived instead of assumed. When considering (5.2) for all f and g in Lc we immediately obtain, from the uniqueness part of the von N–M Theorem, that u′ is a positive linear transformation of u. On the other hand, it is obvious that the inequality in (5.2) is preserved under positive linear transformations of the utility. Hence, in order to prove that v′ = v we may assume without loss of generality that u′ = u. For an arbitrary E in Σ let f in L0 be such that U(f) = E*. (E.g. f(s) = y* on E and f(s) = y*/2 + y∗/2 on E^C. Then ∫_S U(f) dv = v(E) and ∫_S U(f) dv′ = v′(E).) Let y in Y be such that u(y) = v(E). (E.g. y = v(E)y* + (1 − v(E))(y*/2 + y∗/2).) Then f ∼ y^S, which in turn implies u(y) = u′(y) = ∫_S u′(f(s)) dv′ = v′(E). The last equality is implied by (5.2). In order to extend the Theorem to more general acts, we have to specify precisely the set of acts L on which the extension holds, and we have to extend correspondingly the definition of the integral with respect to nonadditive probability. We start with the latter. Denote by B the set of real-valued, bounded, Σ-measurable functions on S. Given a in B and a nonadditive probability v on Σ we define

∫_S a dv = ∫_{−∞}^{0} (v(a ≥ α) − 1) dα + ∫_{0}^{∞} v(a ≥ α) dα,

where (a ≥ α) stands for the event {s ∈ S | a(s) ≥ α}.
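For a finite step function this integral collapses to a telescoping sum over upper level sets. A minimal sketch in Python (the `choquet` helper and the two-state capacity are my own illustration, not from the text; the layer formula assumes a normalized capacity, v(∅) = 0 and v(S) = 1):

```python
def choquet(a, v, states):
    """Choquet integral of a finite step function a (dict: state -> value)
    with respect to a normalized capacity v (frozenset of states -> real).
    Uses sum_i (alpha_i - alpha_{i+1}) * v({a >= alpha_i}) over the distinct
    values alpha_1 > ... > alpha_k of a, with alpha_{k+1} taken to be 0."""
    alphas = sorted(set(a.values()), reverse=True)
    total = 0.0
    for i, alpha in enumerate(alphas):
        nxt = alphas[i + 1] if i + 1 < len(alphas) else 0.0
        level_set = frozenset(s for s in states if a[s] >= alpha)
        total += (alpha - nxt) * v(level_set)
    return total

# A two-state example; note v need not be additive.
v = {frozenset(): 0.0, frozenset({'s1'}): 0.3,
     frozenset({'s2'}): 0.2, frozenset({'s1', 's2'}): 1.0}
value = choquet({'s1': 2.0, 's2': -1.0}, v.get, {'s1', 's2'})
# (2 - (-1)) * v({s1}) + (-1 - 0) * v(S) = 3 * 0.3 - 1 ≈ -0.1
```

When v is additive this reduces to the ordinary expectation, and on finitely many values it agrees with the α-level formula displayed above.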
Each of the integrands mentioned earlier is monotonic, bounded, and identically zero where |α| > λ for some number λ. This definition of integration for nonnegative functions in B has been suggested by Choquet (1955). A more detailed exposition appears in Schmeidler (1986). It should be mentioned here that this definition coincides, of course, with the one at the beginning of this section when a attains finitely many values. For the next definition, existence of a weak order ⪰ over Lc is presupposed. An act f: S → Y is said to be ⪰-measurable if for all y in Y the sets {s | f(s) ≻ y} and {s | f(s) ⪰ y} belong to Σ. It is said to be bounded if there are y and z in Y such that y ⪰ f(s) ⪰ z on S. The set of all ⪰-measurable bounded acts in Y^S is denoted by L(⪰). Clearly, it contains L0.

Corollary 5.1. (a) Suppose that a preference relation ⪰ over L0 satisfies (i) weak order, (ii) comonotonic independence, (iv) continuity, and (v) monotonicity. Then it has a unique extension to all of L(⪰) which satisfies the same conditions (over L(⪰)). (b) If the extended relation, also to be denoted by ⪰, is nondegenerate, then there exist a unique nonadditive probability v on Σ and an affine real-valued function u (unique up to positive linear transformations) such that for all f and g in L(⪰): f ⪰ g iff ∫_S u(f(·)) dv ≥ ∫_S u(g(·)) dv.
David Schmeidler
Proof. The case of degeneracy is obvious, so assume nondegenerate preferences. Consider the following diagram:

                U′               I′
    L(⪰)  ─────────→  B(K)  ─────────→  R
      ↑ i               ↑ i
      L0  ─────────→  B0(K) ─────────→  R
                U                I

where J = I ∘ U on L0 and J′ = I′ ∘ U′ on L(⪰). The inner triangle is that of the Proof of the Theorem. B(K) is the set of K-valued, Σ-measurable, bounded functions on S, and i denotes the identity (inclusion). U′ is the natural extension of U and is also onto. Because B0(K) is (sup) norm dense in B(K) and I satisfies condition (iii), I′ is the unique extension of I that satisfies on B(K) the three conditions that I satisfies on B0(K). The functional J′, defined on L(⪰) by J′(f) = I′(U′(f)), extends J. Hence, the relation on L(⪰) defined by f ⪰ g iff J′(f) ≥ J′(g) extends the relation on L0, and satisfies the desired properties. By the corollary of section 3 in Schmeidler (1986) there exists a nonadditive probability v on Σ such that for all f and g in L(⪰): J′(f) ≥ J′(g) iff ∫_S U′(f) dv ≥ ∫_S U′(g) dv. Hence, the expected utility representation of the preference relation has been shown. To complete the proof of (b), uniqueness of v and uniqueness up to a positive linear transformation of u have to be established. However, this follows from the corresponding part of the Theorem. The uniqueness properties also imply that the extension of ⪰ from L0 to L(⪰) is unique. □

Remark 5.1. Instead of first stating the Theorem for L0 and then extending it to L(⪰), one can state the extended Theorem directly. More precisely, a preference relation ⪰ on L, L0 ⊂ L ⊂ Y^S, is defined such that in addition to the conditions (i), (ii), (iv), and (vii) it satisfies L = L(⪰). It can then be represented by expected utility with respect to a nonadditive probability. However, the first part of the Corollary shows that in this case the preference relation on L(⪰) is overspecified: the preferences over L0 dictate those over L(⪰).

Remark 5.2. If Σ does not contain all subsets of S, and #X ≥ 3, then L(⪰) contains finite step functions that do not belong to L0. Let y and z in Y be such that y ∼ z but y ≠ z, and let E ⊂ S with E ∉ Σ. Define f(s) = y on E and f(s) = z on E^C. Clearly f ∉ L0. The condition #X ≥ 3 is required to guarantee existence of y and z as mentioned earlier.
Remark 5.3. It is an elementary exercise to show that under the conditions of the Theorem, v is additive iff ⪰ satisfies (iii) independence (instead of, or in addition to, (ii) comonotonic independence). Also, an extension of an independent relation,
Subjective probability and EU without additivity
as in Corollary (a), is independent. Hence our results formally extend the additive theory. We now introduce formally the concept of uncertainty aversion alluded to in the Introduction. A binary relation ⪰ on L is said to reveal uncertainty aversion if for any three acts f, g, and h in L and any α in [0, 1]: If f ⪰ h and g ⪰ h, then αf + (1 − α)g ⪰ h. Equivalently we may state: If f ⪰ g, then αf + (1 − α)g ⪰ g. For the definition of strict uncertainty aversion the conclusion should be a strict preference ≻. However, some restrictions then have to be imposed on f and g. One such obvious restriction is that f and g are not comonotonic. We will return to this question in a subsequent Remark. Intuitively, uncertainty aversion means that “smoothing” or averaging utility distributions makes the decision maker better off. Another way to say this is that substituting objective mixing for subjective mixing makes the decision maker better off. The definition of uncertainty aversion may become more transparent when its full mathematical characterization is presented.

Proposition 5.1. Suppose that ⪰ on L = L(⪰) is the extension of ⪰ on L0 according to the Corollary. Let v be the derived nonadditive subjective probability and I′ (the I′ of the Corollary) be the functional on B, I′(a) = ∫_S a dv. Then the following conditions are equivalent:

(i) ⪰ reveals uncertainty aversion.
(ii) For all a and b in B: I′(a + b) ≥ I′(a) + I′(b).
(iii) For all a and b in B and for all α in [0, 1]: I′(αa + (1 − α)b) ≥ αI′(a) + (1 − α)I′(b).
(iv) For all a and b in B and for all α in [0, 1]: I′(αa + (1 − α)b) ≥ min{I′(a), I′(b)}.
(v) For all α in R the sets {a ∈ B | I′(a) ≥ α} are convex.
(vi) There exists an ᾱ in R s.t. the set {a ∈ B | I′(a) ≥ ᾱ} is convex.
(vii) For all a and b in B and for all α in [0, 1]: If I′(a) = I′(b), then I′(αa + (1 − α)b) ≥ I′(a).
(viii) For all a and b in B: If I′(a) = I′(b), then I′(a + b) ≥ I′(a) + I′(b).
(ix) v is convex. That is, for all E and F in Σ: v(E) + v(F) ≤ v(EF) + v(E + F), where EF and E + F denote E ∩ F and E ∪ F, respectively.
(x) For all a in B: I′(a) = min{∫_S a dp | p ∈ core(v)}, where core(v) = {p: Σ → R | p is additive, p(S) = v(S), and for all E in Σ, p(E) ≥ v(E)}.
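Condition (x) can be checked directly on small examples. A sketch (the symmetric capacity below is my own illustration, not from the text) using the standard fact that for a convex capacity the extreme points of core(v) are the "marginal vector" measures obtained by filling up the states in some order:

```python
from itertools import permutations

def choquet(a, v):
    """Choquet integral of a step function a (dict: state -> value)."""
    alphas = sorted(set(a.values()), reverse=True)
    total = 0.0
    for i, alpha in enumerate(alphas):
        nxt = alphas[i + 1] if i + 1 < len(alphas) else 0.0
        total += (alpha - nxt) * v(frozenset(s for s in a if a[s] >= alpha))
    return total

def marginal_vector(v, order):
    """Additive p with p(first i states of `order`) = v of that set."""
    p, cum, prev = {}, frozenset(), 0.0
    for s in order:
        cum = cum | {s}
        p[s] = v(cum) - prev
        prev = v(cum)
    return p

v = lambda E: (len(E) / 3.0) ** 2          # symmetric, convex capacity on 3 states
a = {1: 3.0, 2: 1.0, 3: 0.0}
core_extremes = (marginal_vector(v, o) for o in permutations(a))
min_over_core = min(sum(a[s] * p[s] for s in a) for p in core_extremes)
# both sides equal 2/3; the minimum is attained by filling states
# in decreasing order of a, which mirrors the Choquet computation
```

The minimizing measure concentrates weight on the states where a is low, which is exactly the pessimism that convexity of v encodes.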
Proof. For any functional on B: (iii) implies (iv), (iv) implies (vii), (iv) is equivalent to (v), and (v) implies (vi). The positive homogeneity of degree one of I′ results in: (ii) equivalent to (iii) and (vii) equivalent to (viii). (vi) implies (v) because for all β in R (β = α − ᾱ), I′(a + βS*) = I′(a) + β, and because adding βS* preserves convexity. (viii) implies (ix): Suppose, without loss of generality, that v(E) ≥ v(F). Then there is γ ≥ 1 such that v(E) = γv(F). Since I′(E*) = v(E) = γv(F) = I′(γF*), we have by (viii), v(E) + γv(F) ≤ I′(E* + γF*). But E* + γF* = (EF)* + (γ − 1)F* + (E + F)*, which implies I′(E* + γF*) = v(EF) + (γ − 1)v(F) + v(E + F). Inserting the last equality in the inequality above leads to the inequality in (ix). The equivalence of (ix), (x), and (ii) is stated as proposition 3 in Schmeidler (1986). Last but not least, (i) is equivalent to (iv). This becomes obvious after considering the mapping U′ from the diagram in the Proof of the Corollary. □

The basic result of the Proposition is the equivalence of (i), (iii), (iv), (ix), and (x). (iv) is quasiconcavity of I′, and it is the translation of (i) by U′ from L to B. (iii) is concavity, which usually is a stronger assumption. Here I′ is concave iff it is quasiconcave. Concavity captures best the heuristic meaning of uncertainty aversion.

Remark 5.4. The Proposition holds if all the inequalities are strict and in (i) it is strict uncertainty aversion. To state this precisely, null or dummy events in Σ have to be defined. An event E in Σ is termed dummy if for all F in Σ: v(F + E) = v(F). In (ii)–(vii), in order to state strict inequality one has to assume that a and b′ are not comonotonic for any b′ which differs from b on a dummy set. To have a strict inequality in (ix) one has to assume that (E − F)*, (EF)*, and (F − E)* are not dummies. In (x) a geometric condition on the core of v has to be assumed.

Remark 5.5.
The point of view of this work is that if the information is too vague to be represented by an additive prior, it still may be represented by a nonadditive prior. Another possibility is to represent vague information by a set of priors. Condition (x) and its equivalence to the other conditions of the Proposition point out when the two approaches coincide.

Remark 5.6. The concept of uncertainty appeal can be defined by: f ⪰ g implies f ⪰ αf + (1 − α)g. In the Proposition all the inequalities then have to be reversed and maxima have to replace minima. Obviously, an additive probability, or the independence axiom, reveals uncertainty neutrality.
5.4. Concluding remarks

5.4.1 In the introduction a point of view distinguishing between objective and subjective probabilities has been articulated. It is not necessary for the results
of this work. What matters is that the lotteries in Y be constructed of additive probabilities. These probabilities can be subjectively arrived at. This is the point of view of Anscombe and Aumann (1963). They describe their result as a way to assess complicated probabilities, “horse lotteries,” assuming that the probabilities used in the simpler “roulette lotteries” are already known. The Theorem here can also be interpreted in this way, and one can consider the lotteries in Y as derived within the behavioristic framework as follows: Let Ω be a set (a roulette). An additive probability P on all subsets of Ω is derived via Savage’s Theorem. More specifically, let Z be a set of outcomes with two or more elements. (Suppose that the sets Z and X are disjoint.) Let F denote the set of Savage’s acts, that is, all functions from Ω to Z. Postulating existence of a preference relation on F satisfying Savage’s axioms leads to an additive probability P on Ω. Next we identify a lottery, say y, in Y with all the acts from Ω
to X, which induce the probability distribution y. Thus we have a two-step model within the framework of a behavioristic (or personal or subjective) theory of probability. Since the motivation of our Theorem is behavioristic (i.e. derivation of utility and probability from preference), the conceptual consistency of the work requires that the probabilities in Y could also be derived from preferences. We will return to the question of conceptual consistency in the next Remark. Instead of the two-step model of the previous paragraph one can think of omitting the roulette lotteries from the model. One natural way to do this is to try to extend Savage’s Theorem to nonadditive probability. This has been done by Gilboa (1987). Another approach has been followed by Wakker (1986), wherein he substituted a connected topological space for the linear structure of Y.

5.4.2 In recent years many articles have been written which challenge the expected utility hypothesis in the von Neumann–Morgenstern model and in the model with state-dependent acts. We restrict our attention to models that (i) introduce a functional representation of a preference relation derived from axioms, and (ii) separate “utilities” from “probabilities” (in the representation). Furthermore, (iii) we consider functional representations which are sums of products of two numbers; one number has a “probability” interpretation and the other number has a “utility” interpretation. (For recent works disregarding restriction (iii) the reader may consult Fishburn (1985) and the references there.) Restriction (iii) is tantamount to the functional representation used in the Theorem (the Choquet integral). An article that preceded the present work in this kind of representation using nonadditive probability is Quiggin (1982). (Thanks for this reference are due to a referee.) His result will be introduced here somewhat indirectly.
5.4.2.1 Consider a preference relation over acts satisfying the assumptions, and hence the conclusions, of the Theorem. Does there exist an additive probability P on Σ and a nondecreasing function f from the unit interval onto itself such that v(E) = f(P(E)) on Σ? (Such a function f is referred to as a distortion function.) Conditions leading to a positive answer when the function f is increasing are well known. (They are stated as a step in the proof in Savage (1954); see also Fishburn (1970).) In this case v represents a qualitative (or ordinal) probability, and
the question we deal with can be restated as follows: Under what conditions does a qualitative probability have an additive representation? The problem is much more difficult when f is just nondecreasing but not necessarily increasing. A solution has been provided by Gilboa (1985).

5.4.2.2 The set of nonadditive probabilities which can be represented as a composition of a distortion function f and an additive probability P is “small” relative to all nonadditive probabilities. For example, consider the following version of the Ellsberg paradox. There are 90 balls in an urn, 30 of them black, B, and all the other balls either white, W, or red, R. Bets on the color of a ball drawn at random from the urn are offered. A correct guess is awarded $100. There are six bets: “B”, “R”, “W”, “B or W”, “R or W”, and “B or R”. The following preferences constitute an Ellsberg paradox: B ≻ R ∼ W, and R or W ≻ B or R ∼ B or W. It is impossible to define an additive probability on the events B, R, and W such that this probability’s (nondecreasing) distortion will be compatible with the preferences mentioned earlier.

5.4.2.3 In Quiggin’s model X is the set of real numbers. An act is a lottery of the form y = (x_i, p_i)_{i=1}^{k} where k ≥ 1, x_1 ≥ x_2 ≥ ⋯ ≥ x_k, p_i ≥ 0 and Σ_i p_i = 1. Quiggin postulates a weak order over all such acts which satisfies several axioms. As a result he gets a unique distortion function f and a monotonic utility function u on X, unique up to a positive linear transformation, such that the mapping y ↦ Σ_{i=1}^{k} (u(x_i) − u(x_{i+1})) f(Σ_{j=1}^{i} p_j) (with u(x_{k+1}) := 0) represents the preferences. However, f(1/2) = 1/2. Quiggin’s axioms are not immediate analogues of the assumptions in Section 5.2. For example, he postulates the existence of a certainty equivalent for each act, that is, for every y there is x in X such that y ∼ x. Yaari (1987) simplified Quiggin’s axioms and got rid of the restriction f(1/2) = 1/2 on the distortion function.
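The rank-dependent mapping just described is easy to evaluate numerically. A hedged sketch (the function names and the square distortion are mine, chosen only for illustration; u(x_{k+1}) is taken as 0 as in the formula):

```python
def anticipated_utility(lottery, u, f):
    """Rank-dependent value sum_i (u(x_i) - u(x_{i+1})) * f(p_1 + ... + p_i),
    with outcomes sorted best-first and u(x_{k+1}) taken to be 0.
    lottery: list of (outcome, probability) pairs with probabilities summing to 1."""
    pairs = sorted(lottery, key=lambda t: u(t[0]), reverse=True)
    total, cum = 0.0, 0.0
    for i, (x, p) in enumerate(pairs):
        cum += p
        u_next = u(pairs[i + 1][0]) if i + 1 < len(pairs) else 0.0
        total += (u(x) - u_next) * f(min(cum, 1.0))
    return total

even_bet = [(100.0, 0.5), (0.0, 0.5)]
identity = lambda t: t
# with f = identity this is just the expectation (50);
# a convex distortion f(q) = q**2 underweights the good outcome (25)
assert anticipated_utility(even_bet, identity, identity) == 50.0
assert anticipated_utility(even_bet, identity, lambda q: q * q) == 25.0
```

Note how a convex distortion with f(1/2) ≠ 1/2 (as Yaari's version permits) prices the fair bet below its expectation.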
However, Yaari’s main interest was the uncertainty aversion properties of the distortion function f. Hence his simplified axioms result in a linear utility over the set of incomes, X. He explored the duality between concavity of the utility function in the theory of risk aversion and convexity of the distortion function in the theory of uncertainty aversion. Quiggin extended his results from distributions over the real numbers with finite support to distributions over the real line having density functions. Yaari dealt with arbitrary distribution functions over the real line. Finally, Segal (1984) and Chew (1984) obtained the most general representation for Quiggin’s model. I conclude my remark on the works of Quiggin, Yaari, and Segal with a criticism from a normative, behavioristic point of view: It may seem conceptually inconsistent to postulate a decision maker who, while computing anticipated utility, assigns weight f(p), f(p) ≠ p, to an event known to him to be of probability p. His knowledge of p is derived, within the behavioristic model, from preferences over acts (as in 5.4.1). The use of the terms “anticipation” and “weight,” instead of “expectation” and “probability,” does not resolve, in my opinion, the inconsistencies. One way out would be to follow paragraph 5.4.2.1 and to try to derive simultaneously distorted and additive probabilities of events.

5.4.3 The first version of this work (Schmeidler (1982)) includes a slightly extended version of the present Theorem. First recall that Savage termed an event
E null if for all f and g in L: f = g on E^c implies f ∼ g. Clearly, if the conditions of the Theorem are satisfied then an event is null iff it is dummy. The extended version of the Theorem includes the following addition: The nonadditive probability v of the Theorem satisfies the condition that v(E) = 0 implies E is dummy, if and only if the preference relation also satisfies: E is not null, f = g on E^c and f(s) ≻ g(s) on E imply f ≻ g.

5.4.4 The expected utility model has in economic theory two other interpretations in addition to decisions under uncertainty. One interpretation is decisions over time: s in S represents a time or period. The other interpretation of S is the set of persons or agents in the society, and the model is applied to the analysis of social welfare functions. Our extension of the expected utility model may have the same uses. Consider the special case where f(s) is person s’s income. Two income allocations f and g are comonotonic if the social rank (according to income) of any two persons is not reversed between f and g. Comonotonic f, g, and h induce the same social rank on individuals, and then f ⪰ g implies γf + (1 − γ)h ⪰ γg + (1 − γ)h. This restriction on independence is, of course, consistent with strict uncertainty aversion, which can here be interpreted as inequality (or inequity) aversion. In other words we have here an “Expected Utility” representation of a concave Bergson–Samuelson social welfare function.

5.4.5 One of the puzzling phenomena of decisions under uncertainty is people buying life insurance and gambling at the same time.¹ This behavior is compatible with the model of this chapter. Let S^0 = S^1 × S^2 × S^3, where s^1 in S^1 describes a possible state of health of the decision maker, s^2 in S^2 describes a possible resolution of the gamble, and s^3 in S^3 describes a possible resolution of all other relevant uncertainties. Let v^i be a nonadditive probability on S^i, i = 0, 1, 2, 3.
Suppose that v^1 is strictly convex (i.e. satisfying strict uncertainty aversion) and v^2 is strictly concave (i.e. v^2(E) + v^2(F) > v^2(E ∪ F) + v^2(E ∩ F) if E∖F and F∖E are nonnull). Furthermore, if E^0 = E^1 × E^2 × E^3 with E^i ⊂ S^i, then v^0(E^0) = v^1(E^1)v^2(E^2)v^3(E^3). To simplify matters suppose that X is a bounded interval of real numbers (representing an income in dollars), and the utility u is linear on X. Let the preference relation over acts on S^0 be represented by f ↦ ∫ u(f) dv^0. In this case buying insurance and gambling (betting) simultaneously is preferred to buying insurance only or gambling only, ceteris paribus. Also, either of these last two acts is preferred to “no insurance, no gambling.”
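The role of convexity versus concavity in this example can be seen in miniature. A sketch (the half-half weights and the square-root/square distortions are my own illustrative assumptions, not from the text): a fair bet has negative Choquet value under a convex capacity, making insurance attractive, and positive value under a concave one, making gambling attractive.

```python
def choquet(a, v, states):
    """Choquet integral of a step function a (dict: state -> value)."""
    alphas = sorted(set(a.values()), reverse=True)
    total = 0.0
    for i, alpha in enumerate(alphas):
        nxt = alphas[i + 1] if i + 1 < len(alphas) else 0.0
        total += (alpha - nxt) * v(frozenset(s for s in states if a[s] >= alpha))
    return total

states = {'win', 'lose'}
fair_bet = {'win': 1.0, 'lose': -1.0}       # fair under P(win) = P(lose) = 1/2
P = {frozenset(): 0.0, frozenset({'win'}): 0.5,
     frozenset({'lose'}): 0.5, frozenset(states): 1.0}

concave_v = lambda E: P[E] ** 0.5   # strictly concave distortion of P
convex_v  = lambda E: P[E] ** 2     # strictly convex distortion of P

assert choquet(fair_bet, concave_v, states) > 0   # 2*sqrt(1/2) - 1 > 0
assert choquet(fair_bet, convex_v,  states) < 0   # 2*(1/4) - 1 < 0
```

The product capacity v^0 then combines both attitudes across the independent components, which is what makes insuring and gambling simultaneously optimal.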
Acknowledgments

I am thankful to Roy Radner for comments on the previous version presented at Oberwolfach, 1982. Thanks are due also to Benyamin Shitovitz and anonymous referees for pointing out numerous typos in previous versions. Partial financial support from the Foerder Institute and NSF Grant No. SES 8026086 is gratefully acknowledged. Parts of this research were done at the University of Pennsylvania and at the Institute for Mathematics and its Applications at the University of Minnesota.
Note

1 It is not puzzling, as a referee pointed out, if one accepts the Friedman–Savage (1948) explanation of this phenomenon.
References

Anscombe, F. J. and R. J. Aumann (1963). “A Definition of Subjective Probability,” The Annals of Mathematical Statistics, 34, 199–205.
Chew, Soo Hong (1984). “An Axiomatization of the Rank-dependent Quasilinear Mean Generalizing the Gini Mean and the Quasilinear Mean,” mimeo.
Choquet, G. (1955). “Theory of Capacities,” Annales de l’Institut Fourier (Grenoble), 5, 131–295.
Dunford, N. and J. T. Schwartz (1957). Linear Operators, Part I. New York: Interscience.
Ellsberg, D. (1961). “Risk, Ambiguity, and the Savage Axioms,” Quarterly Journal of Economics, 75, 643–669.
Feynman, R. P. et al. (eds) (1963, 1965). The Feynman Lectures on Physics, Vol. I, Sections 37-4 to 37-7; Vol. III, Chapter 1.
Fishburn, P. C. (1970). Utility Theory for Decision Making. New York: John Wiley & Sons.
—— (1985). “Uncertainty Aversion and Separated Effects in Decision Making Under Uncertainty,” mimeo.
Friedman, M. and L. J. Savage (1948). “The Utility Analysis of Choices Involving Risk,” Journal of Political Economy, 56, 279–304.
Gilboa, I. (1985). “Subjective Distortions of Probabilities and Non-Additive Probabilities,” Working Paper, The Foerder Institute for Economic Research, Tel Aviv University.
—— (1987). “Expected Utility with Purely Subjective Non-Additive Probabilities,” Journal of Mathematical Economics, 16, 65–88.
von Neumann, J. and O. Morgenstern (1947). Theory of Games and Economic Behavior, 2nd ed. Princeton: Princeton University Press.
Quiggin, J. (1982). “A Theory of Anticipated Utility,” Journal of Economic Behavior and Organization, 3, 323–343.
Savage, L. J. (1954). The Foundations of Statistics. New York: John Wiley & Sons (2nd ed. 1972, New York: Dover Publications).
Schmeidler, D. (1982). “Subjective Probability without Additivity” (Temporary Title), Working Paper, The Foerder Institute for Economic Research, Tel Aviv University.
—— (1986). “Integral Representation without Additivity,” Proceedings of the American Mathematical Society, 97, 255–261.
Segal, U. (1984).
“Nonlinear Decision Weights with the Independence Axiom,” UCLA Working Paper #353.
Wakker, P. P. (1986). “Representations of Choice Situations,” Ph.D. Thesis, Tilburg; rewritten as Additive Representations of Preferences (1989). Norwell, MA: Kluwer Academic Publishers (Ch. VI).
Yaari, M. E. (1987). “The Dual Theory of Choice under Risk,” Econometrica, 55, 95–115.
6
Maxmin expected utility with non-unique prior Itzhak Gilboa and David Schmeidler
6.1. Introduction

One of the first objections to Savage’s paradigm was raised by Ellsberg (1961). He suggested the following mind experiment, challenging the expected utility hypothesis: a subject is asked to rank four bets. He/she is shown two urns, each containing 100 balls, each ball either red or black. Urn A contains 50 black balls and 50 red ones, while there is no additional information about urn B. One ball is drawn at random from each urn. Bet 1 is “the ball drawn from urn A is black,” and will be denoted by AB. Bet 2 is “the ball drawn from urn A is red,” and will be denoted by AR; similarly we have BB and BR. Winning a bet entitles the subject to $100. The following preferences have been observed empirically: AB ∼ AR ≻ BB ∼ BR. It is easy to see that there is no probability measure supporting these preferences through expected utility maximization. One conceivable explanation of this phenomenon, which we adopt here, is as follows: In the case of urn B, the subject has too little information to form a prior. Hence he/she considers a set of priors as possible. Being uncertainty averse, he/she takes into account the minimal expected utility (over all priors in the set) while evaluating a bet. For instance, one may consider the extreme case in which our decision maker takes into account all possible priors over urn B. In this case the minimal expected utility of each one of the bets AB, AR is $50, while that of the bets BB and BR is $0, so that the observed preferences are compatible with the maxmin expected utility decision rule. These ideas are not new. Hurwicz (1951) showed an example of statistical analysis where the statistician is too ignorant to have a unique “Bayesian” prior, but “not quite as ignorant” as to apply Wald’s decision rule with respect to all priors. Smith (1961) suggested considering an interval of priors in such situations. He tried to axiomatize this behavior pattern using the “Odds” concept.
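The two-urn computation just described can be spelled out directly. A minimal sketch (the helper and the discretized prior set for urn B are my own illustration):

```python
def maxmin_eu(outcomes, priors):
    """Maxmin expected utility: the worst-case expectation over a set of priors.
    outcomes: dict state -> utility; priors: iterable of dicts state -> prob."""
    return min(sum(p[s] * outcomes[s] for s in outcomes) for p in priors)

# Urn A: composition known (50/50).  Urn B: any composition is possible.
priors_A = [{'red': 0.5, 'black': 0.5}]
priors_B = [{'red': k / 100, 'black': 1 - k / 100} for k in range(101)]

bet_on_black = {'red': 0.0, 'black': 100.0}

assert maxmin_eu(bet_on_black, priors_A) == 50.0   # AB: worst case is the only case
assert maxmin_eu(bet_on_black, priors_B) == 0.0    # BB: worst case has no black balls
```

By symmetry the same values obtain for AR and BR, reproducing AB ∼ AR ≻ BB ∼ BR.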
Other works utilize the Choquet Integration with respect to capacities (Choquet (1955)) to deal with the
Gilboa, I. and D. Schmeidler (1989). Maxmin expected utility with a non-unique prior, Journal of Mathematical Economics, 18, 141–153.
problem of a nonunique prior. Huber and Strassen (1973) use the Choquet Integral in testing hypotheses regarding the choice between two disjoint sets of measures. Schmeidler (1982, 1984, 1986) axiomatizes preferences representable via the Choquet Integral of the utility with respect to a nonadditive probability measure. He used a framework including both “Horse Lotteries” and “Roulette Lotteries,” à la Anscombe and Aumann (1963). Gilboa (1987) obtains the same representation in the original framework of Savage (1954). (See also Wakker (1986).) In Schmeidler (1986) it has been shown, roughly speaking, that when the nonadditive probability v on S is convex (i.e. v(A ∪ B) + v(A ∩ B) ≥ v(A) + v(B)), the Choquet Integral of a real-valued function, say a, with respect to v is equal to the minimum of {∫ a dP | P is in the core of v}. The core of v, by definition, consists of all finitely additive probability measures that majorize v pointwise (i.e. event-wise). That is to say, the nonadditive expected utility theory coincides with the decision rule we propose here, where the set of possible priors is the core of v. However, when an arbitrary (closed and convex) set of priors C is given, and one defines v(A) = min{P(A) | P ∈ C}, v need not be convex, though it is exact, that is, a pointwise minimum of additive set functions. (See examples in Schmeidler (1972) and Huber and Strassen (1973).) Furthermore, even if v happens to be convex, C does not have to be its core. It is not hard to construct an example in which C is a proper subset of the core of v. This chapter proposes an axiomatic foundation of the maxmin expected utility decision rule. As in Schmeidler (1984), some of whose notation we repeat, we use the framework of Anscombe and Aumann (1963). The main difference among the models of Anscombe and Aumann (1963), Schmeidler (1984), and the present one lies in the phrasing of the independence axiom (sure-thing principle).
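That the lower envelope v(A) = min{P(A) | P ∈ C} need not be convex is easy to witness on a four-state example. A small sketch (the two measures are my own example, not one from Schmeidler (1972) or Huber and Strassen (1973)):

```python
P1 = {1: 0.4, 2: 0.1, 3: 0.4, 4: 0.1}
P2 = {1: 0.1, 2: 0.4, 3: 0.1, 4: 0.4}

def lower_env(E):
    """v(E) = min over the two priors of P(E): the lower envelope of C = {P1, P2}."""
    return min(sum(P[s] for s in E) for P in (P1, P2))

A, B = {1, 2}, {2, 3}
# convexity would require v(A∪B) + v(A∩B) >= v(A) + v(B);
# here the left side is about 0.7 while the right side is about 1.0
assert lower_env(A | B) + lower_env(A & B) < lower_env(A) + lower_env(B)
```

So the maxmin rule over C is genuinely more general than Choquet expected utility with a convex capacity.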
Unlike the other two works, we also use here an axiom of uncertainty aversion. Similarly to the nonadditive expected utility theory, this model extends classical expected utility. In general, the theories differ from each other; as mentioned earlier, they coincide in the case of a convex v. The straightforward interpretation of our result is an extension of the neo-Bayesian paradigm which leads to a set of priors instead of a unique one. However, with a different interpretation, in which the set C is the set of possible probability distributions in a statistical decision problem, our result sheds light on Wald’s minimax criterion and on its relation to personalistic probability. (We refer here to the minimax loss criterion, which is equivalent to maxmin utility, and not to the minimax regret criterion suggested by Savage (1954: Ch. 9).) In Wald (1950: Section 1.4.2) we find: “A minimax solution seems, in general, to be a reasonable solution of the decision problem when an a priori distribution in Ω does not exist or is unknown to the experimenter.” Hence our main result can be considered as an axiomatic foundation of Wald’s criterion. The detailed exposition of the model and the main result are stated in the next section. The proof is given in Section 6.3, and Section 6.4 is devoted to an extension and several concluding remarks. In particular, we deal there with the definition of the concept of independence in the case of a nonunique prior.
Finally, we would like to note that different approaches to the phenomenon of a nonunique prior appear in Lindley et al. (1979), Vardeman and Meeden (1983), Agnew (1985), Genest and Schervish (1985), Bewley (1986), and others.
6.2. Statement of the main result

Let X be a set and let Y be the set of distributions over X with finite supports:

Y = {y : X → [0, 1] | y(x) ≠ 0 for only finitely many x’s in X and Σ_{x∈X} y(x) = 1}.
For notational simplicity we identify X with the subset {y ∈ Y | y(x) = 1 for some x in X} of Y. Let S be a set and let Σ be an algebra of subsets of S. Both sets, X and S, are assumed to be nonempty. Denote by L0 the set of all Σ-measurable finite step functions from S to Y and denote by Lc the constant functions in L0. Let L be a convex subset of Y^S which includes Lc. Note that Y can be considered a subset of some linear space, and Y^S, in turn, can then be considered as a subspace of the linear space of all functions from S to the first linear space. Whereas it is obvious how to perform convex combinations in Y, it should be stressed that convex combinations in Y^S are performed pointwise. That is, for f and g in Y^S and α in [0, 1], αf + (1 − α)g = h where h(s) = αf(s) + (1 − α)g(s) for s ∈ S. In the neo-Bayesian nomenclature, elements of X are (deterministic) outcomes, elements of Y are random outcomes or (roulette) lotteries, and elements of L are acts (or horse lotteries). Elements of S are states (of nature) and elements of Σ are events. The primitive of a neo-Bayesian decision model is a binary (preference) relation over L, to be denoted by ⪰. Next are stated several properties (axioms) of the preference relation, which will be used in the sequel.

A1 Weak order. (a) For all f and g in L: f ⪰ g or g ⪰ f. (b) For all f, g, and h in L: If f ⪰ g and g ⪰ h then f ⪰ h.

The relation ⪰ on L induces a relation also denoted by ⪰ on Y: y ⪰ z iff y* ⪰ z*, where x*(s) = x for all x ∈ Y and s ∈ S. When no confusion is likely to arise, we shall not distinguish between y* and y. As usual, ≻ and ∼ denote the asymmetric and symmetric parts, respectively, of ⪰.

A2 Certainty-Independence (C-independence for short). For all f, g in L and h in Lc and for all α in ]0, 1[: f ≻ g iff αf + (1 − α)h ≻ αg + (1 − α)h.

A3 Continuity. For all f, g, and h in L: if f ≻ g and g ≻ h then there are α and β in ]0, 1[ such that αf + (1 − α)h ≻ g and g ≻ βf + (1 − β)h.

A4 Monotonicity.
For all f and g in L: if f(s) ⪰ g(s) on S then f ⪰ g.
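Since the statewise mixture αf + (1 − α)g is central to A2 and to the uncertainty aversion axiom below, it may help to see it concretely. A sketch (representing acts as dicts from states to lotteries; the helper name is mine): mixing two "opposite" bets yields a constant act, which is exactly the hedging that uncertainty aversion rewards.

```python
def mix(alpha, f, g):
    """Pointwise mixture of acts: (alpha*f + (1-alpha)*g)(s) mixes the
    lotteries f(s) and g(s).  Acts: dict state -> (dict outcome -> prob)."""
    h = {}
    for s in f.keys() | g.keys():
        outcomes = set(f.get(s, {})) | set(g.get(s, {}))
        h[s] = {x: alpha * f.get(s, {}).get(x, 0.0)
                   + (1 - alpha) * g.get(s, {}).get(x, 0.0)
                for x in outcomes}
    return h

f = {'s1': {'win': 1.0}, 's2': {'lose': 1.0}}   # pays off only in state s1
g = {'s1': {'lose': 1.0}, 's2': {'win': 1.0}}   # pays off only in state s2
h = mix(0.5, f, g)
assert h['s1'] == {'win': 0.5, 'lose': 0.5}     # the same 50/50 lottery
assert h['s2'] == {'win': 0.5, 'lose': 0.5}     # in every state: a hedge
```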
A5 Uncertainty aversion. For all f, g ∈ L and α ∈ ]0, 1[: f ∼ g implies αf + (1 − α)g ⪰ f.

A6 Nondegeneracy. Not for all f and g in L, f ⪰ g.

All the assumptions except for A2 and A5 are quite common. The standard independence axiom is stronger than C-independence, as it allows h to be any act in L rather than restricting it to constant acts. This axiom seems heuristically more appealing: a decision maker who prefers f to g can more easily visualize the mixtures of f and g with a constant h than with an arbitrary one, hence he is less likely to reverse his preferences. An intuitive objection to the standard independence axiom is that it ignores the phenomenon of hedging. Like comonotonic independence (Schmeidler (1984)), C-independence does not exclude hedging. However, C-independence is much simpler than, and implied by, comonotonic independence. Uncertainty aversion (which was introduced in Schmeidler (1984)) captures the phenomenon of hedging, especially when the preference is strict. Thus this assumption complements C-independence. Before stating the main result we mention that the topology to be used on the space of finitely additive set functions on Σ is the product topology, that is, the weak* topology in Dunford and Schwartz (1957) terms. Recall that in this topology the set of finitely additive probability measures on Σ is compact.

Theorem 6.1. Let ⪰ be a binary relation on L0. Then the following conditions are equivalent:

(1) ⪰ satisfies assumptions A1–A5 for L = L0.
(2) There exist an affine function u: Y → R and a nonempty, closed and convex set C of finitely additive probability measures on Σ such that:

(∗) f ⪰ g iff min_{P∈C} ∫ u ◦ f dP ≥ min_{P∈C} ∫ u ◦ g dP (for all f, g ∈ L0).

Furthermore: (a) The function u in (2) is unique up to a positive linear transformation; (b) The set C in (2) is unique iff assumption A6 is added to (1).
6.3. Proof of Theorem 6.1

The crucial part of the proof is that (1) implies (2). If A6 fails to hold, then a constant function u and any closed and convex set C will satisfy (2); hence for the next several lemmata we suppose assumptions A1–A6.

Lemma 6.1. There exists an affine u: Y → R such that for all y, z ∈ Y: y ⪰ z iff u(y) ≥ u(z). Furthermore, u is unique up to a positive linear transformation.

Proof. This is an immediate consequence of the von Neumann–Morgenstern theorem, since the independence assumption for Lc is implied by C-independence. (See Fishburn (1970: Ch. 8).)
Maxmin expected utility with non-unique prior
129
Lemma 6.2. Given a u: Y → R from Lemma 6.1, there exists a unique J: L0 → R such that: (i) f ≽ g iff J(f) ≥ J(g) (for all f, g ∈ L0); (ii) for a constant act f = y∗ ∈ Lc, J(f) = u(y).

Proof. On Lc, J is uniquely determined by (ii). We extend J to L0 as follows. Given f ∈ L0, there are y, ȳ ∈ Y such that y ≽ f ≽ ȳ. By the continuity assumption and the other assumptions, there exists a unique α ∈ [0, 1] such that f ∼ αy + (1 − α)ȳ. Define J(f) = J(αy + (1 − α)ȳ). By construction, J satisfies (i), hence it is also unique.

We shall henceforth choose a specific u: Y → R such that there are y1, y2 ∈ Y for which u(y1) < −1 and u(y2) > 1. (Such a choice of a utility u is possible in view of the nondegeneracy assumption.) We denote by B the space of all bounded Σ-measurable real-valued functions on S (which is denoted B(S, Σ) in Dunford and Schwartz (1957)). B0 will denote the space of functions in B which assume finitely many values. Let K = u(Y), and let B0(K) be the subset of functions in B0 with values in K. For γ ∈ R, let γ∗ ∈ B0 be the constant function on S the value of which is γ.

Lemma 6.3. There exists a functional I: B0 → R such that:

(i) for all f ∈ L0, I(u ◦ f) = J(f) (hence I(1∗) = 1);
(ii) I is monotonic (i.e. for a, b ∈ B0: a ≥ b ⇒ I(a) ≥ I(b));
(iii) I is superlinear (i.e. superadditive and homogeneous of degree 1);
(iv) I is C-independent: for any a ∈ B0 and γ ∈ R, I(a + γ∗) = I(a) + I(γ∗).
Proof. We first define I on B0(K) by condition (i). (Lemma 6.2 and the monotonicity assumption assure that I is thus well defined.) We now show that I is homogeneous on B0(K). Assume a = αb where a, b ∈ B0(K) and 0 < α ≤ 1. We have to show that I(a) = αI(b). (This will imply the equality for α > 1.) Let g ∈ L0 satisfy u ◦ g = b. Let z ∈ Y satisfy J(z) = 0 and define f = αg + (1 − α)z. Hence u ◦ f = αu ◦ g + (1 − α)u ◦ z = αb = a, so I(a) = J(f). Let y ∈ Y satisfy y ∼ g (hence J(y) = J(g) = I(b)). By C-independence, αy + (1 − α)z ∼ αg + (1 − α)z = f. Hence J(f) = J(αy + (1 − α)z) = αJ(y) + (1 − α)J(z) = αJ(y). Whence I(a) = J(f) = αJ(y) = αI(b). We now extend I by homogeneity to all of B0. Note that I is monotonic and homogeneous of degree 1 on B0.

Next we show that I is C-independent (part (iv) of the Lemma). Let there be given a ∈ B0 and γ ∈ R. By homogeneity we may assume without loss of generality that 2a, 2γ∗ ∈ B0(K). Now define β = I(2a) = 2I(a). Let f ∈ L0 satisfy u ◦ f = 2a and let y, z ∈ Y satisfy u ◦ y = β∗ and u ◦ z = 2γ∗. Since
130
Itzhak Gilboa and David Schmeidler
f ∼ y, C-independence of ≽ implies that ½f + ½z ∼ ½y + ½z. Hence

I(a + γ∗) = I(½β∗ + γ∗) = ½β + γ = I(a) + γ,

and I is C-independent.

It is left to show that I is superadditive. Let there be given a, b ∈ B0. Once again, by homogeneity we may assume without loss of generality that a, b ∈ B0(K). Furthermore, for the same reason it suffices to prove that I(½a + ½b) ≥ ½I(a) + ½I(b). Suppose that f, g ∈ L0 are such that u ◦ f = a and u ◦ g = b. If I(a) = I(b), then f ∼ g and by uncertainty aversion (assumption A5), ½f + ½g ≽ f, which, in turn, implies I(½a + ½b) ≥ I(a) = ½I(a) + ½I(b). Assume, then, I(a) > I(b), and let γ = I(a) − I(b). Set c = b + γ∗ and note that I(c) = I(b) + γ = I(a) by C-independence of I. Using the C-independence of I twice more and its superadditivity for the case proven earlier, one obtains:

I(½a + ½b) + ½γ = I(½a + ½c) ≥ ½I(a) + ½I(c) = ½I(a) + ½I(b) + ½γ,
which completes the proof of the Lemma.

Recall that the space B is a Banach space with the sup norm ‖·‖, and B0 is a norm-dense subspace of B. Lemma 6.4 will also be used in an extension of the Theorem.

Lemma 6.4. There exists a unique continuous extension of I to B. Furthermore, this extension is monotonic, superlinear and C-independent.

Proof. We first show that for each a, b ∈ B0, |I(a) − I(b)| ≤ ‖a − b‖. Indeed, a = b + (a − b) ≤ b + ‖a − b‖∗. Monotonicity and C-independence of I imply that I(a) ≤ I(b + ‖a − b‖∗) = I(b) + ‖a − b‖, or I(a) − I(b) ≤ ‖a − b‖. The same argument implies I(b) − I(a) ≤ ‖b − a‖. Thus there exists a unique continuous extension of I. Obviously, it is superlinear, monotonic and C-independent.

In Lemma 6.5 the convex set of finitely additive probability measures C of Theorem 6.1 will be constructed via a separation theorem.

Lemma 6.5. If I is a monotonic, superlinear and C-independent functional on B with I(1∗) = 1, then there exists a closed and convex set C of finitely additive probability measures on Σ such that: for all b ∈ B, I(b) = min{∫ b dP | P ∈ C}.

Proof. Let b ∈ B with I(b) > 0 be given. We will construct a finitely additive probability measure Pb such that I(b) = ∫ b dPb and I(a) ≤ ∫ a dPb for all
a ∈ B. To this end we define

D1 = {a ∈ B | I(a) > 1},
D2 = conv({a ∈ B | a ≤ 1∗} ∪ {a ∈ B | a ≤ b/I(b)}).

We now show that D1 ∩ D2 = ∅. Let d2 ∈ D2 satisfy d2 = αa1 + (1 − α)a2 where a1 ≤ 1∗, a2 ≤ b/I(b), and α ∈ [0, 1]. By monotonicity, homogeneity, and C-independence of I, I(d2) ≤ α + (1 − α)I(a2) ≤ 1. Note that each of the sets D1, D2 has an interior point and that they are both convex. Thus, by a separation theorem (see Dunford and Schwartz (1957, V.2.8)) there exists a nonzero continuous linear functional pb and an α ∈ R such that for all d1 ∈ D1 and d2 ∈ D2,

pb(d1) ≥ α ≥ pb(d2).   (6.1)
Since the unit ball of B is included in D2, α > 0. (Otherwise pb would have been identically zero.) We may therefore assume without loss of generality that α = 1. By (6.1), pb(1∗) ≤ 1. Since 1∗ is a limit point of D1, pb(1∗) ≥ 1 is also true, hence pb(1∗) = 1. We now show that pb is non-negative, or, more specifically, that pb(1E) ≥ 0 whenever 1E is the indicator function of some E ∈ Σ. Since pb(1E) + pb(1∗ − 1E) = pb(1∗) = 1, and 1∗ − 1E ∈ D2, the inequality follows. By the classical representation theorem there exists a finitely additive probability measure Pb on Σ such that pb(a) = ∫ a dPb for all a ∈ B.

We will now show that pb(a) ≥ I(a) for all a ∈ B, with equality for a = b. First assume I(a) > 0. It is easily seen that a/I(a) + (1/n)∗ ∈ D1, so the continuity of pb and (6.1) imply pb(a) ≥ I(a). For the case I(a) ≤ 0 the inequality follows from C-independence. Since b/I(b) ∈ D2, we obtain the converse inequality for b; thus pb(b) = I(b).

We now define the set C as the closure of the convex hull of {Pb | I(b) > 0} (which, of course, is convex). It is easy to see that I(a) ≤ min{∫ a dP | P ∈ C}. For a such that I(a) > 0, we have shown the converse inequality to hold as well. For a such that I(a) ≤ 0, it is again a simple implication of C-independence.

Conclusion of the Proof of Theorem 6.1. Lemmata 6.1–6.5 prove that (1) implies (2). Assuming (2), define I on B by I(b) = min{∫ b dP | P ∈ C}, with C compact and convex. It is easy to see that I is monotonic, superlinear, C-independent, and continuous. So, in turn, the preference relation defined on L0 by (2) satisfies A1–A5.

We now turn to prove the uniqueness properties of u and C. The uniqueness of u up to positive linear transformation is implied by Lemma 6.1. If assumption A6 does not hold, the range of u, K, is a singleton, and C can be any nonempty closed and convex set. We shall now show that if assumption A6
does hold, C is unique. Assume the contrary, that is, that there are C1 ≠ C2, both nonempty, closed and convex, such that the two functions on L0,

J1(f) = min{∫ u(f) dP | P ∈ C1},
J2(f) = min{∫ u(f) dP | P ∈ C2},

both represent ≽. Without loss of generality one may assume that there exists P1 ∈ C1\C2. By a separation theorem (Dunford and Schwartz (1957: V.2.10)), there exists a ∈ B such that

∫ a dP1 < min{∫ a dP | P ∈ C2}.

Without loss of generality we may assume that a ∈ B0(K). Hence there exists f ∈ L0 such that J1(f) < J2(f). Now let y ∈ Y satisfy y ∼ f. We get J1(y) = J1(f) < J2(f) = J2(y), a contradiction.
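Conversely, the conclusion of the proof observes that any functional of the form I(b) = min{∫ b dP | P ∈ C} is monotonic, superlinear, and C-independent. A quick numerical spot-check of these properties on a three-state space (our sketch, with hypothetical priors; it verifies but does not replace the proof):

```python
import random

def I(b, priors):
    # min over the extreme points of C of the expectation of b
    return min(sum(p * x for p, x in zip(P, b)) for P in priors)

C = [(0.2, 0.5, 0.3), (0.4, 0.4, 0.2)]   # hypothetical priors on 3 states

random.seed(0)
for _ in range(1000):
    a = [random.uniform(-2, 2) for _ in range(3)]
    b = [random.uniform(-2, 2) for _ in range(3)]
    lam = random.uniform(0, 3)
    gamma = random.uniform(-2, 2)
    # superadditivity: I(a + b) >= I(a) + I(b)
    assert I([x + y for x, y in zip(a, b)], C) >= I(a, C) + I(b, C) - 1e-9
    # homogeneity of degree 1 (nonnegative scalars)
    assert abs(I([lam * x for x in a], C) - lam * I(a, C)) < 1e-9
    # C-independence: adding the constant function gamma* shifts I by gamma
    assert abs(I([x + gamma for x in a], C) - (I(a, C) + gamma)) < 1e-9
```

Superadditivity holds because the minimizing prior for a + b is generally not the minimizer for a or for b separately, which is exactly the hedging effect captured by A5.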
6.4. Extension and concluding remarks

A natural question arising in view of Theorem 6.1 is whether it holds when the set of acts L, on which the preference relation ≽ is given, is a convex superset of L0. A partial answer is presented in the sequel. It will be shown that, for a certain superset of L0, the preference relation on it is completely determined by its restriction to L0, provided it satisfies the assumptions introduced in Section 6.2.

Given a weak order ≽ on Lc, an act f: S → Y is said to be ≽-measurable if for all y ∈ Y the sets {s | f(s) ≻ y} and {s | f(s) ≽ y} belong to Σ. It is said to be bounded (or, more precisely, ≽-bounded) if there are y1, y2 ∈ Y such that y1 ≽ f(s) ≽ y2 for all s ∈ S. The set of all ≽-measurable bounded acts in Y^S is denoted by L(≽). It is obvious that L(≽) is convex and contains L0.

Proposition 6.1. Suppose that a preference relation ≽ over L0 satisfies assumptions A1–A5. Then it has a unique extension to L(≽) which satisfies the same assumptions (over L(≽)).

Proof. Because of monotonicity, the proposition is obvious in case assumption A6 does not hold. Therefore we assume it does, and we may apply Lemmata 6.1–6.4. We then define the extension of ≽ (also to be denoted by ≽) as follows: f ≽ g iff I(u ◦ f) ≥ I(u ◦ g). It is obvious that ≽ satisfies A1–A5 and that ≽ on L(≽) is the unique monotonic extension of ≽ on L0.
Remark. Suppose that ≽ satisfies A1–A5 over L, which is convex and contains L0. Then, in view of Proposition 6.1, ≽ may be represented as in Theorem 6.1 on L ∩ L(≽).

We now introduce the concepts of independence of acts and products of binary relations. Suppose that a given preference relation ≽ satisfies A1–A6 over L0. By Proposition 6.1 we extend it to L = L(≽), and let u and C be as in Theorem 6.1. Two acts f, g ∈ L are said to be independent if the following two conditions hold:

(1) there exists P0 ∈ C such that

∫ u ◦ f dP0 = min{∫ u ◦ f dP | P ∈ C} and ∫ u ◦ g dP0 = min{∫ u ◦ g dP | P ∈ C};
(2) u ◦ f and u ◦ g are two stochastically independent random variables with respect to any extreme point of C (for short: Ext(C)).

As expected, this notion of independence turns out to be closely related to that of product spaces, once the latter is defined. We will refer to a triple (S, Σ, C) as a nonunique probability space. Given two nonunique probability spaces (Si, Σi, Ci), i = 1, 2, we define their product (S, Σ, C) as follows: S = S1 × S2, Σ = Σ1 ⊗ Σ2, and C is the closed convex hull of {P1 ⊗ P2 | P1 ∈ C1, P2 ∈ C2}.

Suppose that for a given set of outcomes X, there are given two act spaces Li0 ⊂ Y^Si, i = 1, 2, and two preference relations ≽i correspondingly, such that the restrictions of ≽1 and ≽2 to Y coincide. As before, we suppose that each ≽i satisfies A1–A6 and we consider its extension to Li = Li(≽i). For the product act space L0 ⊂ Y^(S1×S2) we define the product preference relation ≽ = ≽1 ⊗ ≽2 as derived from u and C. It is obvious that ≽ also satisfies A1–A6, and it has a unique extension to L = L(≽). Given f^i ∈ Li, it has a unique trivial extension f̄^i ∈ L. Now we formulate the result which justifies our definition of independence:

Proposition 6.2. Given L1, ≽1, L2, ≽2 and L as stated earlier, ≽ is the unique preference relation over L satisfying:

(1) assumptions A1–A6;
(2) for all f^i, g^i ∈ Li, f^i ≽i g^i iff f̄^i ≽ ḡ^i (i = 1, 2);
(3) for all f ∈ L1 and g ∈ L2, f̄ and ḡ are independent.

Proof. It is trivial to see that ≽ indeed satisfies (1)–(3). To see that it is unique, let ≽′ also satisfy (1)–(3). By (1) and our main result, ≽′ is representable by a utility
u′ and a convex and closed set of finitely additive measures C′. By Lemma 6.1 we assume without loss of generality that u′ = u. We now wish to show that C′ = C.

Step 1. C′ ⊂ C.

Proof of Step 1. As C is convex, it suffices to show that Ext(C′) ⊂ C. Let, then, P0 ∈ Ext(C′). Define Pi to be the restriction of P0 to Σi (i = 1, 2). Choose A ∈ Σ1 and B ∈ Σ2, and let f ∈ L1 and g ∈ L2 satisfy u ◦ f = 1A, u ◦ g = 1B. Since f̄ and ḡ are independent, they are stochastically independent with respect to P0. Hence P0(A × B) = P0(A × S2)P0(S1 × B) = P1(A)P2(B). This implies P0 = P1 ⊗ P2 ∈ C.

Step 2. C ⊂ C′.

Proof of Step 2. We begin with

Step 2a. If Σ1 and Σ2 are finite, then C ⊂ C′.

Proof of Step 2a. By a theorem of Straszewicz (1935), it suffices to show that P1 ⊗ P2 ∈ C′ for all P1 ∈ Exp(C1) and P2 ∈ Exp(C2), where Exp(C) denotes the set of exposed points of C, that is, the points at which there exists a supporting hyperplane which does not pass through any other point of C. Let there be given, then, P1 ∈ Exp(C1) and P2 ∈ Exp(C2). Let f ∈ L1 and g ∈ L2 be such that

∫ u ◦ f dP1 = min{∫ u ◦ f dP | P ∈ C1}

and

∫ u ◦ g dP2 = min{∫ u ◦ g dP | P ∈ C2}.
By the independence of f̄ and ḡ, there exists P0 ∈ C′ for which ∫ u ◦ f̄ dP and ∫ u ◦ ḡ dP are minimized simultaneously. By Step 1, P0 ∈ C, hence there are P′1 ∈ C1 and P′2 ∈ C2 such that P0 = P′1 ⊗ P′2. However, ∫ u ◦ f̄ dP0 = ∫ u ◦ f dP′1 and ∫ u ◦ ḡ dP0 = ∫ u ◦ g dP′2. By the uniqueness property of Exp(Ci) (i = 1, 2), we obtain P′1 = P1 and P′2 = P2. Hence P1 ⊗ P2 = P0 ∈ C′, and Step 2a is proved.

We will now complete the Proof of Step 2. Assume, by way of negation, that C\C′ ≠ ∅, that is, ≽ ≠ ≽′. As in the proof of the Theorem, there exist f ∈ L0 and y ∈ Y such that f ≻ y∗ and y∗ ≻′ f. Consider the finite sub-algebra, say Σ̃, of Σ generated by f. There are finite sub-algebras Σ′i of Σi (i = 1, 2) such that Σ̃ ⊂ Σ′ = Σ′1 ⊗ Σ′2. Next consider the restrictions of ≽i to the Σ′i-measurable functions, and the restrictions of ≽, ≽′ to the Σ′-measurable functions. Obviously, both ≽ and ≽′ satisfy requirements (1)–(3) of the Proposition, although they differ on the set of Σ′-measurable functions (to which f and y∗ belong). This contradicts Step 2a, and the Proof of the Proposition is thus completed.
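The product construction can be illustrated numerically. The following sketch is ours (hypothetical priors, not from the chapter): the extreme points of the product set C are products of extreme points of C1 and C2, and trivially extended acts are evaluated exactly as in their coordinate space, in line with condition (2) of Proposition 6.2.

```python
import itertools

def tensor(P1, P2):
    """Product measure on S1 x S2, states enumerated row-major."""
    return tuple(p * q for p in P1 for q in P2)

C1 = [(0.3, 0.7), (0.6, 0.4)]   # hypothetical priors on S1 = {s1, s2}
C2 = [(0.2, 0.8), (0.5, 0.5)]   # hypothetical priors on S2 = {t1, t2}

# Extreme points of C, the closed convex hull of the product measures:
C = [tensor(P1, P2) for P1, P2 in itertools.product(C1, C2)]

def min_eu(profile, priors):
    return min(sum(p * x for p, x in zip(P, profile)) for P in priors)

# Trivial extensions of one-coordinate bets (utility profiles on S1 x S2,
# states ordered (s1,t1), (s1,t2), (s2,t1), (s2,t2)):
f_bar = (1, 1, 0, 0)   # extension of the bet on s1
g_bar = (1, 0, 1, 0)   # extension of the bet on t1

# Each extension is evaluated exactly as the original act was:
assert abs(min_eu(f_bar, C) - min_eu((1, 0), C1)) < 1e-12
assert abs(min_eu(g_bar, C) - min_eu((1, 0), C2)) < 1e-12
```

The asserts hold because a profile depending only on one coordinate integrates against a product measure to the corresponding marginal expectation.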
Acknowledgments The authors acknowledge partial financial support by the Foerder Institute for Economic Research and by The Keren Rauch Fund at Tel Aviv University.
References

Agnew, C. E. (1985). Multiple probability assessments by dependent experts, Journal of the American Statistical Association, 80, 343–347.
Anscombe, F. J. and R. J. Aumann (1963). A definition of subjective probability, Annals of Mathematical Statistics, 34, 199–205.
Bewley, T. (1986). Knightian decision theory: Part 1, Mimeo (Yale University, New Haven, CT).
Choquet, G. (1953–54). Theory of capacities, Annales de l'Institut Fourier, 5, 131–295.
Dunford, N. and J. T. Schwartz (1957). Linear operators, Part I (Interscience, New York).
Ellsberg, D. (1961). Risk, ambiguity and the Savage axioms, Quarterly Journal of Economics, 75, 643–669.
Fishburn, P. C. (1970). Utility theory for decision making (Wiley, New York).
Genest, C. and M. J. Schervish (1985). Modeling expert judgments for Bayesian updating, The Annals of Statistics, 13, 1198–1212.
Gilboa, I. (1987). Expected utility theory with purely subjective non-additive probabilities, Journal of Mathematical Economics, 16, 65–88.
Huber, P. J. and V. Strassen (1973). Minimax tests and the Neyman–Pearson lemma for capacities, The Annals of Statistics, 1, 251–263.
Hurwicz, L. (1951). Some specification problems and application to econometric models, Econometrica, 19, 343–344.
Lindley, D. V., A. Tversky and R. V. Brown (1979). On the reconciliation of probability assessments, Journal of the Royal Statistical Society, Series A, 142, 146–180.
Savage, L. J. (1954). The foundations of statistics (Wiley, New York).
Schmeidler, D. (1972). Cores of exact games, I, Journal of Mathematical Analysis and Applications, 40, 214–225.
Schmeidler, D. (1982). Subjective probability without additivity (temporary title), Working paper (Foerder Institute for Economic Research, Tel Aviv University, Tel Aviv).
Schmeidler, D. (1984). Subjective probability and expected utility without additivity, IMA Preprint Series. (Reprinted as Chapter 5 in this volume.)
Schmeidler, D. (1986). Integral representation without additivity, Proceedings of the American Mathematical Society, 97, 255–261.
Smith, Cedric A. B. (1961). Consistency in statistical inference and decision, Journal of the Royal Statistical Society, Series B, 23, 1–25.
Straszewicz, S. (1935). Über exponierte Punkte abgeschlossener Punktmengen, Fundamenta Mathematicae, 24, 139–143.
Vardeman, S. and G. Meeden (1983). Calibration, sufficiency and domination considerations for Bayesian probability assessors, Journal of the American Statistical Association, 78, 808–816.
Wakker, P. (1986). Ch. 6 in a draft of a Ph.D. thesis.
Wald, A. (1950). Statistical decision functions (Wiley, New York).
7
A simple axiomatization of nonadditive expected utility Rakesh Sarin and Peter P. Wakker
7.1. Introduction

Savage's (1954) subjective expected utility (SEU) theory has been widely adopted as the guide for rational decision making in the face of uncertainty. In SEU theory both the probabilities and the utilities are derived from preferences (see also Ramsey (1931)). This represents a hallmark contribution, as it avoids the reliance on introspection for quantifying tastes and beliefs. We continue in Savage's vein and extend his theory to derive a more general nonadditive expected utility representation, called Choquet expected utility (CEU). Schmeidler (1989, first version 1982) made the first contribution in providing a CEU representation, and Gilboa (1987) extended this work. We develop this line of research further by providing an intuitive axiomatization of CEU.

The key distinction between our work and that of Savage is that we identify two types of events: unambiguous and ambiguous. People feel relatively "sure" about the probabilities of unambiguous events. An example of an unambiguous event could be the outcome of a toss of a fair coin (heads or tails). We assume that Savage's axioms hold for a sufficiently rich set of "unambiguous acts," that is, acts measurable with respect to the unambiguous events. The probabilities of ambiguous events, however, are not known with precision. An example of such an event could be next week's weather conditions (rain or sunshine). Ambiguity in the probability of such events may be caused, for example, by a lack of available information relative to the amount of conceivable information (Keynes, 1921). Most people exhibit a reluctance to bet on events with ambiguous probabilities. This reluctance leads to a violation of Savage's sure-thing principle (P2). The CEU theory proposed here does not impose the sure-thing principle for all events and is therefore capable of permitting a liking for specificity and a dislike for ambiguity in probability.
The key condition in this chapter to provide the CEU representation is “cumulative dominance” (P4 in Section 7.3). Simply stated, this condition requires that
Sarin, R. and P. P. Wakker (1992). "A simple axiomatization of nonadditive expected utility," Econometrica, 60, 1255–1272.
Axiomatization of nonadditive expected utility
137
if receiving consequence α or a superior consequence is considered more likely for an act f than for an act g, for every α, then the act f is preferred to the act g. This condition is trivially satisfied for an SEU maximizer. Unlike the sure-thing principle, which forces the probabilities for all events to be additive, cumulative dominance permits probabilities for some events to be nonadditive. A probability function is nonadditive if the probability of the union of two disjoint events is not equal to the sum of the individual probabilities of the events. An example will show how nonadditive probabilities can accommodate an aversion toward ambiguity.

The judgments and preferences that may lead to nonadditive probability have been rationalized by many authors. For example, Keynes (1921) argued that confidence in a probability influences decisions under uncertainty. Knight (1921) made the distinction between risk and uncertainty based on whether the event probabilities are known or unknown. More recently, Schmeidler (1989) has argued that the amount of information available about an event may influence probabilities in such a way that probabilities are not necessarily additive. In a seminal paper, Ellsberg (1961) showed that if one accepts Savage's definition of probability, then a majority of subjects violates additivity of probability. Numerous experiments since then have confirmed Ellsberg's findings.

Even though Ellsberg's example is well known, we present it as it serves to illustrate the motivation and direction for our proposed modification of Savage's theory. Suppose an urn is filled with 90 balls, 30 of which are red (R), and 60 of which are white (W) and yellow (Y) in an unknown proportion. One ball will be drawn randomly from the urn and your payoff will depend on the color of the drawn ball and the "act" (decision alternative) you choose. See Table 7.1.
When subjects are asked to choose between acts f and g, a majority chooses act f, presumably because in act f the chance of winning $1,000 is precisely known to be 1/3. In act g the chance of drawing a white ball is ambiguous since the number of white balls is unknown. Now, when the same subjects are asked to choose between acts f′ and g′, a majority chooses the act g′. Again, in act g′ the chance of winning $1,000 is precisely known to be 2/3, whereas in act f′ the chance of winning is ambiguous. Thus, subjects tend to like specificity and to avoid ambiguity. Denoting by v(R), v(W), and v(Y) the probability of drawing a red, white, or yellow ball respectively, we obtain, assuming expected utility with
Table 7.1 The Ellsberg options

            30 balls     60 balls
Act         Red          White        Yellow
f           $1,000       $0           $0
g           $0           $1,000       $0
f′          $1,000       $0           $1,000
g′          $0           $1,000       $1,000
u(0) = 0:

f ≻ g implies v(R)u(1,000) > v(W)u(1,000), or v(R) > v(W);
g′ ≻ f′ implies v(W)u(1,000) + v(Y)u(1,000) > v(R)u(1,000) + v(Y)u(1,000), or v(W) > v(R).

Thus, consistent probabilities cannot be assigned to the states, as v(R) cannot simultaneously be larger as well as smaller than v(W). Clearly, in this example no inconsistency results if v(R ∪ Y) ≠ v(R) + v(Y). In our development we permit nonadditive probabilities for some events (such as R ∪ Y) that we call ambiguous events. Our strategy is to differentiate between ambiguous and unambiguous events by requiring that only the acts that are measurable with respect to unambiguous events satisfy Savage's axioms. General acts are assumed to satisfy somewhat weaker conditions that may yield nonadditive probabilities for ambiguous events. It is to be noted that we do not require an a priori definition of unambiguous or ambiguous events (for the latter see Fishburn, 1991). We do, however, assume that there exists a subclass of events, such as those generated by a roulette wheel, such that an SEU representation holds with respect to these events. The idea is that these events are unambiguous. The subclass of unambiguous events should be rich enough to ensure that all ambiguous events can be calibrated by appropriate bets contingent on unambiguous events.

The strategy of permitting probabilities to be nonadditive and using them in CEU was first proposed by Schmeidler (1989, first version 1982). Schmeidler uses the set-up of Anscombe and Aumann (1963) (as refined in Fishburn, 1967, 1970, 1982), where for every state an act leads to an objective probability distribution, to formulate his axioms and derive the result. A nonadditive probability extension for the approach of Savage (1954) in full generality is very complicated. Gilboa (1987) succeeded in finding such an extension. The resulting axioms are, however, quite complicated and do not seem to have simple intuitive interpretations (see Fishburn 1988: 202).
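Under CEU, a nonadditive capacity reconciles both modal choices. With u($1,000) = 1 and u($0) = 0, the CEU of each Ellsberg bet is just the capacity of the event on which it pays. The following sketch uses one hypothetical ambiguity-averse capacity (the specific numerical values are ours, not the chapter's):

```python
# Hypothetical capacity for the Ellsberg urn: unambiguous events get their
# known probabilities, ambiguous events are discounted below them.
v = {
    "R": 1/3,            # unambiguous: exactly 30/90
    "W∪Y": 2/3,          # unambiguous: exactly 60/90
    "W": 1/4, "Y": 1/4,  # ambiguous singletons, discounted
    "R∪Y": 0.6,          # ambiguous union; note v(R∪Y) ≠ v(R) + v(Y)
}

# With u($1,000) = 1 and u($0) = 0, CEU of each bet is the capacity of its
# winning event, so the modal pattern is consistent:
assert v["R"] > v["W"]               # CEU(f) > CEU(g):   f ≻ g
assert v["W∪Y"] > v["R∪Y"]           # CEU(g′) > CEU(f′): g′ ≻ f′
assert v["W"] + v["Y"] != v["W∪Y"]   # v is nonadditive
```

No additive probability can satisfy the first two inequalities simultaneously; dropping additivity on the ambiguous events is exactly what restores consistency.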
In this chapter, we propose another extension of Schmeidler’s model that in our view has a greater intuitive appeal. The basic idea is to reformulate Savage’s axioms to permit nonadditivity in probability for ambiguous events (event R ∪ Y in Table 7.1) while preserving additivity for unambiguous events (event Y ∪ W in Table 7.1). Technically, our work may be viewed as a sort of unification of Gilboa (1987) and Schmeidler (1989), and builds heavily on these works. Additional axiomatizations of CEU that assume some rich structure on the consequences instead of the states have been provided in Wakker (1989a,b, 1993a), and Nakamura (1990, 1992). Wakker (1990) has shown that CEU when applied to decision making under risk (where probabilities are extraneously specified) is identical to rank-dependent (anticipated) utility. A survey of several independent discoveries of the CEU form has been given in Wakker (1991).
Schmeidler’s lottery-acts formulation may be viewed as a two-stage process where a state s occurs in the first stage and in the second stage a lottery is played to determine the final consequence. If probabilities are additive the one-stage formulation (e.g. of Savage) and the two-stage formulation (e.g., of Anscombe and Aumann) yield the same conclusion. However, as we shall see, in the nonadditive case the two formulations yield different conclusions about the preference rankings of acts. We begin by presenting some notations and definitions in Section 7.2. Our axioms and main result are stated in Section 7.3. In Section 7.4 we explore the relationship between CEU and SEU models. An example and a general result showing the irreconcilability of Schmeidler’s two-stage formulation with a naturally equivalent one-stage formulation are presented in Section 7.5. Finally, Section 7.6 contains conclusions, and proofs are given in the Appendix.
7.2. Definitions

7.2.1. Elementary definitions

In this section we present the notation for the Savage (1954) style formulation of decisions under uncertainty and introduce some definitions that are useful in developing our results. There is a set C of consequences (payoffs, prizes, outcomes) and a set S of states of nature. The states in S are mutually exclusive and collectively exhaustive, so that exactly one state is the true state. We shall let A denote a σ-algebra of subsets of S, that is, A contains S, A ∈ A implies Ac (the complement of A) ∈ A, and A is closed under countable unions (this will be generalized in Remark 7.1). Thus A also contains Ø, and is closed under countable intersections. Subjective probabilities or "capacities" will be assigned to the elements of A; these elements are called events. An event A is informally said to occur if A contains the true state. The set C is also assumed to be endowed with a σ-algebra D; this will only play a role for acts with an infinite number of consequences.

A decision alternative or an act is a function from S to C that is measurable, that is, f−1(D) ∈ A for all D ∈ D. If the decision maker chooses an act f, then the consequence f(s) will result where s is the true state. The decision maker is uncertain about which state is true, hence about which consequence will result from an act. The set of acts is denoted as F. An act f is constant if, for some α ∈ C, f(s) = α for all states s. Often a constant act is identified with the resulting consequence. Statements of conditions are simplified by defining fA as the restriction of f to A, and fAh as the act that assigns consequences f(s) to all s ∈ A and consequences h(s) to all s ∈ S\A. Given that consequences are identified with constant acts, fAα designates the act that is identical to f on A and constant α on S\A; αAβ is similar. Further, for a partition {A1, . . . , Am}, we denote by α1A1 · · · αmAm the act that assigns consequence αj to each s ∈ Aj, j = 1, . . . , m. Such acts are called step acts.¹ A binary relation ≽ over F gives the decision
maker's preferences. The notations ≻, ≼, ≺, and ∼ are as usual. Further, ≽ is a weak order if it is complete (f ≽ g or g ≽ f for all f, g) and transitive. We define ≽ on C from ≽ on F through constant acts: α ≽ β if f ≽ g where f is constant α and g is constant β. Postulate P3 will ensure that ≽ on F and ≽ on C are in proper agreement.

We assume that ≽ and D are compatible in the sense that all "preference intervals" are contained in D. A preference interval, as defined in Fishburn (1982), is a set E ⊂ C such that α, γ ∈ E and α ≽ β ≽ γ imply β ∈ E. A special case is a set E such that α ∈ E and β ≽ α imply β ∈ E. Such sets are called cumulative consequence sets. They will play a central role in this chapter. Example 7A.1 shows why, in the absence of set continuity, cumulative dominance must include all cumulative consequence sets and not just sets of the form {β : β ≽ α}; in the latter case cumulative dominance would become too strong.

Following Savage (1954) (see also de Finetti (1931, 1937) and Ramsey (1931)), we define ≽ on A from ≽ on F through "bets on events": A ≽ B if there exist consequences α ≻ β such that αAβ ≽ αBβ. We then say that A is more likely than B. Postulate P4 will ensure that ≽ on A satisfies the usual conditions such as transitivity and completeness, and is in proper agreement with ≽ on F; see also Lemma 7.1 in Section 7.2.2. Obviously, in this chapter the more-likely-than relation will not correspond to an additive probability; it will correspond to a "capacity," that is, a nonadditive probability; see Lemma 7.1.²

We will make use of a sub-σ-algebra Aua of A that should be thought of as containing unambiguous events, for example events generated by the spin of a roulette wheel, or by repeated tosses of a coin. We denote by F ua the set of acts that are D–Aua measurable; that is, F ua contains the acts f for which f−1(E) ∈ Aua for each E ∈ D. We will assume that Savage's (1954) axioms are satisfied if attention is restricted to the unambiguous events and F ua.
An event A ∈ Aua is null if fAh ∼ gAh for all f, g ∈ F ua; it is non-null otherwise.

7.2.2. Choquet expected utility

A function v: A → [0, 1] is a capacity if v(Ø) = 0, v(S) = 1, and v is monotonic with respect to set-inclusion, that is, A ⊃ B ⇒ v(A) ≥ v(B). The capacity v is a (finitely additive) probability measure if, in addition, v is additive, that is, v(A ∪ B) = v(A) + v(B) for all disjoint A, B. A capacity v is convex-ranged if for every A ⊃ C and every µ between v(A) and v(C) there exists A ⊃ B ⊃ C such that v(B) = µ. For a capacity v and a measurable function φ: S → R, the Choquet integral of φ (with respect to v), denoted ∫S φ dv, or ∫ φ dv, or ∫ φ, and introduced in Choquet (1953–1954), is

∫_{R+} v({s ∈ S : φ(s) ≥ τ}) dτ + ∫_{R−} [v({s ∈ S : φ(s) ≥ τ}) − 1] dτ.   (7.1)

In Wakker (1989b, Chapter VI) illustrations are given for the Choquet integral. We say that ≽ maximizes Choquet expected utility (CEU) if there exist a capacity v
on A and a measurable utility function U: C → R such that the preference relation ≽ is represented by f ↦ ∫S U(f(s)) dv; the latter is called the Choquet expected utility of f, denoted CEU(f). Suppose there are n states s1, . . . , sn and U(f(s1)) ≥ · · · ≥ U(f(sn)). Then

CEU(f) = Σ_{i=1}^{n−1} (U(f(si)) − U(f(si+1))) v({s1, . . . , si}) + U(f(sn)).

The proof of the following lemma is left to the reader.

Lemma 7.1. If ≽ on F maximizes CEU, then the relation ≽ on C is represented by the utility function U, and the relation ≽ on A is represented by the capacity v whenever U is nonconstant.
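For a finite state space the displayed formula can be computed directly. A minimal sketch (ours, with hypothetical numbers): rank the states by utility and cumulate capacity increments; when v happens to be an additive probability measure, the Choquet integral collapses to ordinary expected utility.

```python
def ceu(utils, v):
    """Choquet expected utility of an act on states 0..n-1.

    utils[i] = U(f(s_i)); v maps frozensets of states to [0, 1]."""
    order = sorted(range(len(utils)), key=lambda i: utils[i], reverse=True)
    total = utils[order[-1]]                     # U(f(s_n)) after ranking
    for k in range(len(order) - 1):
        top = frozenset(order[:k + 1])           # the k+1 highest-ranked states
        total += (utils[order[k]] - utils[order[k + 1]]) * v[top]
    return total

# When v is an additive probability measure, CEU is ordinary expected utility.
p = [1/2, 1/3, 1/6]
v = {frozenset(A): sum(p[i] for i in A)
     for A in [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]}
u = [4.0, 1.0, 2.0]
assert abs(ceu(u, v) - sum(pi * ui for pi, ui in zip(p, u))) < 1e-12
```

With a nonadditive v the same routine applies unchanged; only the capacity values of the ranked events v({s1, . . . , si}) then fail to decompose into sums of singleton weights.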
7.3. The main result

Apart from the well-known postulates of Savage on the unambiguous acts, we shall use one additional postulate, "cumulative dominance" (P4 below), to govern preferences over ambiguous acts. It is a natural extension of Savage's P4 to acts with more than two consequences. When restricted to acts with exactly two consequences, our P4 is identical to Savage's P4. It is best appreciated as an adaptation of the stochastic dominance condition. Let us recall that stochastic dominance applies to decision making under risk, where for each uncertain event A ∈ A a probability P(A) is well specified, and usually C is an interval within R. In this setting, an act (or the probability distribution it generates over consequences) stochastically dominates another if it assigns to each cumulative consequence set³ at least as high a probability. In the present set-up, without probabilities attached to each event, it is natural to say that an act f stochastically ("cumulatively") dominates an act g if the decision maker regards each cumulative consequence set at least as likely under f as under g. Monotonicity with respect to stochastic dominance, reformulated with this adaptation, is our additional postulate P4. It turns out that this condition, in the presence of the usual conditions and Savage's conditions on a rich set of unambiguous acts, is necessary and sufficient for CEU. To readers familiar with CEU and with Savage's set-up, the proof of the main result may be transparent if P4 is assumed. We hope that this mathematical simplicity is viewed as a strength of the chapter, because P4, in our opinion, is an intuitively appealing assumption about behavior under uncertainty as well. We first state the axioms and then the main theorem, which is followed by a discussion.

Postulate P1. Weak ordering.

Postulate P2. (The sure-thing principle for unambiguous acts).
For all events A and acts f, g, h, h′ with fA h, gA h, fA h′, gA h′ ∈ F ua: fA h ≽ gA h ⟺ fA h′ ≽ gA h′.
Rakesh Sarin and Peter P. Wakker
Postulate P3. For all events A ∈ A, acts f ∈ F, and consequences α, β: α ≽ β ⇒ αA f ≽ βA f. The reversed implication holds as well if A ∈ Aua, A is nonnull, and f ∈ F ua.
Postulate P4. (Cumulative dominance). For all acts f, g we have: f ≽ g whenever f −1(E) ≽ g−1(E) for all cumulative consequence sets E.
Postulate P5. (Nontriviality). There exist consequences α, β such that α ≻ β.
Postulate P6. (Fineness of the unambiguous events). If α ∈ C and, for f ∈ F ua, g ∈ F, f ≻ g, then there exists a partition (A1, . . . , Am) of S, with all elements in Aua, such that αAj f ≻ g for all j, and the same holds with ≺ instead of ≻.
The following postulate is Gilboa's adaptation of Savage's P7 to the case of CEU. It is a technical condition, and is only needed for the extension of CEU to acts with infinite range. In order to state the postulate, we define an event A to be f-convex if for any s, s′ ∈ A and s″ ∈ S, f(s) ≽ f(s″) ≽ f(s′) ⇒ s″ ∈ A. Note that, for some fixed s ∈ A, f(s)A h denotes the act that assigns f(s) to each s′ ∈ A, and is identical to h on Ac.
Postulate P7. For all f, g ∈ F, and nonempty f-convex events A: f(s)A f ≽ g for all s ∈ A ⇒ f ≽ g, and the same holds with ≼ instead of ≽.
We now state the main theorem. In it, cardinal abbreviates "unique up to scale and location." Theorem 7.1. The following two statements are equivalent: (i) The preference relation ≽ maximizes CEU for a bounded nonconstant utility function U on C, and for a capacity v on A. On Aua the capacity is additive and convex-ranged. (ii) Postulates P1–P7 are satisfied. Further, the utility function in statement (i) is cardinal, and the capacity is unique. In this result, condition P4 can be weakened to the following "cumulative reduction" condition P4′, if in addition we include Savage's P4 (i.e. our P4 restricted to two-consequence acts). Cumulative reduction says that the only relevant aspect of an act is its "decumulative" distribution. Cumulative reduction follows from two-fold application of P4, with the roles of f and g interchanged. This condition is the only implication of P4 that we shall use in the proof of Theorem 7.1 for acts
Axiomatization of nonadditive expected utility
with more than two consequences. We have preferred to present the stronger P4 in the theorem because of its close relationship with stochastic dominance.
Postulate P4′. (Cumulative reduction). For all acts f, g we have: f ∼ g whenever f −1(E) ∼ g−1(E) for all cumulative consequence sets E.
Let us also point out that all conditions can be weakened to hold only for step acts, with the exception of P1, the act g in P6, and P7. If P4/P4′ is restricted to step acts then cumulative consequence sets can be restricted to sets of the form {β ∈ C : β ≽ α} for some α ∈ C. The next example considers the cases where the state space is a product space. These are the cases considered by Schmeidler. The above theorem applies to any case where there is a sub σ-algebra isomorphic to the Borel sets on [0, 1] endowed with the Lebesgue measure; the latter is somewhat more general than product spaces. The technique of this chapter allows for more generality: the sets of ambiguous acts and events can be quite general, as long as the set of unambiguous acts and events is sufficiently rich. This will be explicated in Remark 7.1. A further generalization can be obtained in our one-stage approach by imposing on F ua the conditions of Gilboa (1987) which lead to CEU, instead of using Savage's conditions which lead to additive expected utility. The proof of this more general result is almost identical to the proof of Theorem 7.1. In other words, as soon as there is a sufficiently rich subset of acts on which CEU holds, then by cumulative dominance CEU will spread over all acts. Alternatively, for the rich subset of acts, we could have taken the set of probability distributions over the consequences, with expected utility or rank-dependent utility maximized there. We chose Savage's set-up because it is very appealing. Example 7.1. Let [0, 1] be endowed with the usual Lebesgue measure (i.e. uniform distribution) over the usual Borel σ-algebra. Ω can be any set endowed with any σ-algebra. Let S = Ω × [0, 1], endowed with the usual product σ-algebra; v is any capacity that assigns the Lebesgue measure of E to any set Ω × E. C can be any arbitrary set, and U: C → R any function, nonconstant to avoid triviality. Preferences maximize CEU.
With Aua the σ-algebra of all sets of the form Ω × E for E a Borel subset of [0, 1], all Postulates P1–P7 are satisfied. Remark 7.1. The requirement that A should be a σ-algebra, and that all A–D measurable functions from S to C should be included in F, can be restricted to the unambiguous acts and events, as follows. (i) Aua should be a σ-algebra, and all Aua–D measurable functions from S to C should be included in F. Then, in addition, the following adaptations should be made. First, the measurability requirement should be imposed that for all f ∈ F and cumulative consequence sets E, f −1(E) ∈ A. Second, Postulate P3 should be required only if αA f, βA f ∈ F. Third, the nontriviality Postulate P5 should be changed as follows.
Postulate P5′. There exist consequences α ≻ β such that αA βAc ∈ F for all events A ∈ A. P5′ as such is not a necessary condition for the CEU representation. Fourth and finally, for Postulate P7, needed for nonsimple acts, it should be required that for all acts f ∈ F, f-convex events A, and states s ∈ A, f(s)A f be contained in F (consequences can be "collapsed"). Note that this allows for great generality. For instance, A may consist of Aua, events described by a roulette wheel, and a collection of events entirely unrelated to the roulette wheel. There is no need to incorporate intersections or unions of events described by the roulette wheel, and other events. Let us finally comment further on the uniqueness of the capacity in Theorem 7.1. Suppose Statement (i) in Theorem 7.1 holds. Would there exist CEU representations that also represent the preference relation but have v nonadditive on Aua? The following observation answers this question. Observation 7.1. Suppose Statement (i) in Theorem 7.1 holds. If there exist three or more equivalence classes of consequences, then for any CEU representation the capacity will be additive on Aua. If there exist no more than two equivalence classes of consequences, then any capacity can be taken that is a strictly increasing transform of the capacity of Theorem 7.1.4
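Postulate P4 can be inspected numerically on a finite example: when every cumulative ("at least this good") event under f carries at least the capacity of the corresponding event under g, the Choquet integral must rank f at least as high as g. The snippet below is our own illustration (capacity values and acts invented for the purpose), with a small rank-dependent CEU helper repeated so that it is self-contained:

```python
def ceu(utilities, v):
    """CEU of a step act, given directly as dict state -> utility;
    v maps frozensets of states to capacity values (rank-dependent sum)."""
    levels = sorted(set(utilities.values()), reverse=True)
    total, prev = 0.0, 0.0
    for u in levels:
        upper = frozenset(s for s, x in utilities.items() if x >= u)
        total += u * (v[upper] - prev)
        prev = v[upper]
    return total

# A nonadditive capacity on S = {1, 2, 3}, listed subset by subset.
v = {frozenset(): 0.0,
     frozenset({1}): 0.3, frozenset({2}): 0.2, frozenset({3}): 0.1,
     frozenset({1, 2}): 0.6, frozenset({1, 3}): 0.5, frozenset({2, 3}): 0.4,
     frozenset({1, 2, 3}): 1.0}

f = {1: 2.0, 2: 1.0, 3: 0.0}
g = {1: 1.0, 2: 2.0, 3: 0.0}

# f cumulatively dominates g: each event {f >= u} has at least the
# capacity of the corresponding event {g >= u}.
for u in (2.0, 1.0, 0.0):
    fu = frozenset(s for s, x in f.items() if x >= u)
    gu = frozenset(s for s, x in g.items() if x >= u)
    assert v[fu] >= v[gu]

print(ceu(f, v), ceu(g, v))  # CEU(f) >= CEU(g), as P4 requires
```

Here CEU(f) ≈ 0.9 and CEU(g) ≈ 0.8; the necessity direction of P4 in Theorem 7.1 is exactly this monotonicity of the Choquet integral in the capacities of cumulative events.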
7.4. Revealed unambiguous events In this section we characterize revealed unambiguous events and partitions, that is, those for which the capacity is additive (defined hereafter). It is possible that a decision maker considers some events as ambiguous but nevertheless reveals an additive capacity with respect to these. The characterization of this section will lead to a generalization of the theorem of Anscombe and Aumann (1963). A capacity is additive on a partition {A1 , . . . , Am } if v(A ∪ B) = v(A) + v(B) for all disjoint events A, B that are unions of elements of the partition. This is equivalent to additivity of the capacity on the algebra generated by the partition. A capacity is additive with respect to an event A if it is additive with respect to the partition {A, Ac }, that is, if v(A) = 1 − v(Ac ). Gilboa (1989) used the term symmetry for a capacity that is additive with respect to each event. As shown there, symmetry does not imply that the capacity is additive. A capacity is additive if and only if it is additive on each partition, which holds if and only if it is additive on each partition consisting of three events (consider, for disjoint events A, B, the partition {A, B, (A ∪ B)c }). In the presence of the rich Aua in Theorem 7.1, the characterization of revealed unambiguous partitions is easy. Note that in CEU additivity of the capacity immediately leads to SEU. Machina and Schmeidler (1992) consider the case with an additive probability measure on the events, and a general (nonexpected utility) functional, such as used in Machina (1982). Like our main result, their main result weakens Savage’s sure-thing principle and strengthens his P4. Their P4 implies the sure-thing principle for two-consequence acts, which our P4
obviously does not. In addition, it implies, mainly in the presence of P6, our P4. The Ellsberg paradoxes give examples where their P4 is satisfied.
Proposition 7.1. Suppose Statement (i) in Theorem 7.1 holds. Let {A1, . . . , Am} be a partition. The following four statements are equivalent:
(i) The capacity is additive on the partition.
(ii) For all disjoint A and A′ that are unions of elements of the partition, and for disjoint unambiguous events Bua ∼ A, B′ua ∼ A′, we have A ∪ A′ ∼ Bua ∪ B′ua.
(iii) There exists an unambiguous partition {B1ua, . . . , Bmua} such that
α1A1 . . . αmAm ∼ α1B1ua . . . αmBmua
for all consequences α1, . . . , αm.
(iv) For each unambiguous partition {B1ua, . . . , Bmua} we have:
A1 ∪ . . . ∪ Aj ∼ B1ua ∪ . . . ∪ Bjua for all j ⇒ α1A1 . . . αmAm ∼ α1B1ua . . . αmBmua   (7.2)
for all consequences α1, . . . , αm.
We could obviously obtain additivity of the capacity v in Statement (i) of Theorem 7.1 by adding any of the conditions in Statements (ii), (iii), or (iv) above, for each partition, to Statement (ii) of Theorem 7.1. Given the importance of the result that can be derived from Statement (iv), let us make the condition explicit: Postulate P4″. (Reduction.) For each partition {A1, . . . , Am} and each unambiguous partition {B1ua, . . . , Bmua}, (7.2) holds true. If in the definition of reduction we would have added the condition that the consequences in (7.2) are rank-ordered, that is, α1 ≽ · · · ≽ αm, then the condition would have been identical to P4′ (cumulative reduction) restricted to step acts, which is all of P4 that is needed apart from its restriction to two-consequence acts (i.e. Savage's P4). P4″ resembles the reduction principle in Fishburn (1988), which is called neutrality in Yaari (1987). This principle says that if for two acts consequences are in some sense equally likely, then the acts are equivalent. Corollary 7.1. In Statement (i) of Theorem 7.1 additivity of the capacity can be added if in Statement (ii) P4 (cumulative dominance) is replaced by P4″ (reduction) plus the restriction of P4 to two-consequence acts. The above corollary can be regarded as a generalization of the result of Anscombe and Aumann (1963) and Fishburn (1967). Their structure is rich enough to satisfy P1–P3, P4″, and P5–P7. The set-up of the above corollary is more general in exactly the same way that the set-up of Theorem 7.1 is more general than the result of Schmeidler (1989): The state space is not required to be a Cartesian product of ambiguous and unambiguous events. All that is needed is that the set of unambiguous events be rich enough. In the same way that Theorem 7.1 can be
considered a unification of the results of Schmeidler (1989) and Gilboa (1987), the above corollary can be considered a unification of the results of Anscombe and Aumann (1963) and Savage (1954). The key feature in either case is that the events generated by a random device are incorporated within the state space. We think this is more natural than the two-stage approach of Anscombe and Aumann (1963). In the practice of decision analysis, objective probabilities of events Aua generated by a roulette wheel will typically be used as in Lemma 7.A.1 in the Appendix to elicit "unknown" probabilities. This in no way requires a two-stage structure. While Theorem 7.1 was (apart from convex-rangedness) less general than Gilboa's result, the above corollary is a generalization of both Anscombe and Aumann's result and Savage's result. A generalization as indicated in Remark 7.1 can also be obtained for the above corollary. An earlier result along these lines, within the classical additive set-up, is Bernardo et al. (1985). Corollary 7.1 is more general, mainly because, unlike Bernardo et al., we do not require a stochastic independence relation as a primitive, or existence of independent unambiguous events.
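The distinction drawn earlier in this section between symmetry (v(A) = 1 − v(Ac) for every event) and additivity on a partition can be checked mechanically. The sketch below is our own construction, not from the chapter; it exhibits a capacity on a three-cell partition that is symmetric yet fails additivity, as Gilboa's (1989) remark anticipates:

```python
from itertools import combinations

def union_of(cells, idxs):
    """Union of the partition cells with the given indices."""
    out = frozenset()
    for i in idxs:
        out |= cells[i]
    return out

def additive_on_partition(v, partition):
    """True iff v(A ∪ B) = v(A) + v(B) for all disjoint nonempty unions
    A, B of cells of the partition (the chapter's definition)."""
    cells = [frozenset(c) for c in partition]
    idx = range(len(cells))
    for r in range(1, len(cells)):
        for a in combinations(idx, r):
            rest = [i for i in idx if i not in a]
            for r2 in range(1, len(rest) + 1):
                for b in combinations(rest, r2):
                    A, B = union_of(cells, a), union_of(cells, b)
                    if abs(v[A | B] - (v[A] + v[B])) > 1e-12:
                        return False
    return True

# Symmetric but nonadditive capacity on the partition {a}, {b}, {c}:
# every event satisfies v(A) = 1 - v(A^c), yet v({a,b}) != v({a}) + v({b}).
v = {frozenset(): 0.0, frozenset("a"): 0.2, frozenset("b"): 0.2,
     frozenset("c"): 0.2, frozenset("ab"): 0.8, frozenset("ac"): 0.8,
     frozenset("bc"): 0.8, frozenset("abc"): 1.0}
assert all(abs(v[A] + v[frozenset("abc") - A] - 1.0) < 1e-12 for A in v)
print(additive_on_partition(v, ["a", "b", "c"]))  # False
```

As the chapter notes, checking unions of partition cells is equivalent to checking additivity on the whole algebra generated by the partition.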
7.5. Nonequivalence of one- and two-stage approaches Schmeidler made the novel contribution of showing that CEU is capable of permitting attitudes toward ambiguity that are disallowed by Savage’s SEU. Schmeidler stated his axioms using the horserace–roulette wheel set-up of Anscombe and Aumann (1963). This is a two-stage set-up; that is, in the first stage an event (e.g. the horse Secretariat winning) obtains and in the second stage the consequence is determined depending, for example, on a roulette wheel. In Schmeidler’s model capacities are assigned to first-stage events. Further, the lotteries in the second stage are evaluated by the usual additive expected utility. An act assigns to each first-stage event a lottery, thus an expected utility value. The Choquet integral of these (with respect to the capacity over the first-stage events) gives the evaluation of the act. In our one-stage approach we embed the roulette wheel lotteries within Savage’s formulation by enlarging the state space S. Our one-stage approach is complementary to the two-stage approach of Schmeidler as it provides additional flexibility in modeling decisions under uncertainty. This one-stage approach to CEU was introduced in Becker and Sarin (1989). In the SEU theory, whether the one-stage or a two-stage approach is employed is purely a matter of taste or convenience in modeling. In the CEU framework, however, these two variations produce theoretically different results. We demonstrate this theoretical nonequivalence of one- and two-stage approaches through an example. Our analysis gives further evidence that multi-stage set-ups in nonexpected utility may cause complications. Gärdenfors and Sahlin (1983), Luce and Narens (1985), Luce (1988, 1991, 1992), Luce and Fishburn (1991), Segal (1987, 1990) focus on distinctions between one- and two-or-more-stage set-ups. Segal (1990) uses a two-stage set-up to describe an ambiguous event. 
Probabilities within each stage are assumed to be additive but they do not follow multiplicative rules between
[Figure 7.1: (a) Two-stage formulation of Example 7.2; (b) one-stage formulation of Example 7.2.]
the two stages. Segal showed how dominance-type axioms can provide nonexpected utility characterizations in the two-stage set-up (also see Wakker, 1993b). Example 7.2. This example is a small variation on one of the paradoxes of Ellsberg. The preferences used in the example are consistent with those observed in the Ellsberg paradox. Further, the single-stage capacities are uniquely determined by the equivalent two-stage model of Schmeidler. Suppose a biased coin and an unbiased coin will be tossed. The possible states of nature are HbHub, HbTub, TbHub, TbTub, where HbTub denotes the state where the biased coin lands heads up and the unbiased coin lands tails up, and so on. For simplicity assume that utility is known and that payment is in utility. It follows in Schmeidler's model that subjects consider a bet of 1 on Hub5 as well as a bet of 1 on Tub equivalent to 1/2 for certain (given that payment is in utility). It has been observed that subjects will typically consider a bet of 1 on Hb as well as a bet of 1 on Tb less preferable. Let us assume the latter bets are equivalent to α for certain, for some number α < 1/2. In the two-stage set-up of Anscombe–Aumann and Schmeidler, decisions are formulated as shown in Figure 7.1(a). For the act f shown in Figure 7.1(a), the two-stage approach yields CEU(f) = 0, because the probability of Hub and Tub is 1/2. Thus, f is judged indifferent to a constant act g with consequence 0. Note that our assumption stated in the preceding paragraph implies that, with vm denoting the capacity in the two-stage approach, vm(Hb) = vm(Tb) = α. Now consider the one-stage formulation of the act in Figure 7.1(a) as depicted in Figure 7.1(b). To evaluate CEU(f) in Figure 7.1(b) we need the single-stage capacities, now denoted vj to distinguish them from the capacities in the two-stage approach: vj(HbHub) and vj({HbHub, TbHub, TbTub}). For consistency with
the two-stage approach (see the boxed columns in Figure 7.1(a) and (b)), the first column in Schmeidler's two-stage approach is equivalent to α/2 and the second column to α × 1 + (1 − α) × 1/2 = 1/2 + α/2, so vj(HbHub) = α/2 and vj({HbHub, TbHub, TbTub}) = 1/2 + α/2 must be chosen. Hence, in the one-stage approach, CEU(f) is α/2 + (1 − (1/2 + α/2)) × (−1) = α − 1/2 < 0; it follows that f ≺ g (≡ 0). Thus the one- and the two-stage approach yield different results, and are irreconcilable. They only agree in the additive case α = 1/2. In Sarin and Wakker (1990: Theorem 10) it is shown that the result of the above example holds in full generality. That is to say, only under expected utility can the one- and two-stage approach of CEU be equivalent. As soon as the capacity is nonadditive in Schmeidler's two-stage approach, the equivalent one-stage approach is not a CEU model.
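The arithmetic of Example 7.2 can be replayed in a few lines. The sketch below is our own (function names hypothetical); it evaluates the act f of Figure 7.1 both ways and reproduces the gap between 0 and α − 1/2:

```python
def two_stage_ceu(alpha):
    """Schmeidler-style evaluation of the act f of Example 7.2: additive
    expected utility within each first-stage event, then a Choquet
    integral over the first stage with v_m(Hb) = v_m(Tb) = alpha."""
    eu_on_Hb = 0.5 * 1 + 0.5 * (-1)  # unbiased coin: heads -> 1, tails -> -1
    eu_on_Tb = 0.5 * 0 + 0.5 * 0     # pays 0 either way
    hi, lo = max(eu_on_Hb, eu_on_Tb), min(eu_on_Hb, eu_on_Tb)
    return hi * alpha + lo * (1 - alpha)

def one_stage_ceu(alpha):
    """One-stage evaluation of the same act, with the single-stage
    capacities forced by consistency with the two-stage model:
    v_j({HbHub}) = alpha/2 and v_j({HbHub, TbHub, TbTub}) = 1/2 + alpha/2."""
    v_top = alpha / 2                # utility 1 is obtained on HbHub
    v_top_and_mid = 0.5 + alpha / 2  # utility >= 0 adds TbHub and TbTub
    # rank-dependent sum over the utility levels 1, 0, -1:
    return 1 * v_top + 0 * (v_top_and_mid - v_top) - 1 * (1 - v_top_and_mid)

alpha = 0.4                          # ambiguity aversion: alpha < 1/2
print(two_stage_ceu(alpha))          # 0.0
print(one_stage_ceu(alpha))          # alpha - 1/2, i.e. about -0.1
```

At α = 1/2, the additive case, both evaluations return 0, matching the chapter's observation that the two set-ups agree only under expected utility.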
7.6. Conclusion Savage’s SEU theory is widely accepted as a rational theory of decision making under uncertainty in economics and decision sciences. Unfortunately, however, people’s choices violate the axioms of SEU theory in some well-defined situations. One such situation is when event probabilities are ambiguous. In this chapter we have shown that a simple extension of SEU theory called CEU theory can be derived by assuming a natural cumulative dominance condition. CEU permits a subject to assign probabilities to events so that the probability of a union of two disjoint events is not necessarily the sum of the individual event probabilities. The violation of additivity may occur because a person’s choice may be influenced by the degree of confidence or specificity about the event probabilities. Schmeidler and Gilboa have also proposed axioms to derive the CEU representation. Building on their work, we have provided the simplest derivation of CEU presently available. Also, conditions have been given under which CEU reduces to SEU. It is also shown that unlike SEU theory, where a one-stage set-up of Savage or a two-stage set-up of Anscombe and Aumann yield identical results, the two-stage set-up of Schmeidler cannot be reconciled with a one-stage formulation unless event probabilities are additive. In our opinion the one-stage set-up as used by Gilboa seems more appropriate in single-person decision theory. We hope that our work has clarified the distinction between CEU and SEU theories and that it will stimulate further research and additional explorations of CEU.
Appendix: Proofs
A1. Proof of Theorem 7.1, Remark 7.1, and Observation 7.1
For the implication (i) ⇒ (ii) in Theorem 7.1, suppose (i) holds. Then P1 follows directly. P2 and P3 are standard results from, mainly, the usual additive expected utility theory. For Postulate P4, note that if [f −1(E) ≽ g−1(E) for all cumulative consequence sets E], then by Lemma 7.1 the integrand in (7.1) is at least as large for φ = U ◦ f as for φ = U ◦ g. So f ≽ g, as P4 requires. P5 is direct from
nonconstantness of U. For P6, let f ∈ F ua, g ∈ F, f ≻ g (the case f ≺ g is similar) and α ∈ C. By boundedness of utility, there exists µ > 0 such that ∀s ∈ S: U(f(s)) − U(α) < µ. Because v is convex-ranged within Aua, we can take a partition {A1, . . . , Am} of S such that Aj ∈ Aua and v(Aj) < (CEU(f) − CEU(g))/µ for all j. For P7, let f, g ∈ F, and let A ∈ A be a nonempty event (f-convexity of A will not be used). Then, with U∗ = U ◦ f on Ac, and U∗ = infA U ◦ f (inf is real-valued by nonemptiness of A and boundedness of U) on A, the premise in P7 implies
∫ U ◦ f dv ≥ ∫ U∗ dv = inf s∈A ∫ U ◦ (f(s)A f) dv ≥ CEU(g).
Next we suppose (ii) holds, and derive (i) and the uniqueness results, including Observation 7.1. It is immediate that Savage's postulates P1–P6 hold true on F ua. So we get an SEU representation on F s,ua, which denotes the set of step acts in F ua. There exist a cardinal utility function U: C → R and a unique additive probability measure P on Aua, such that expected utility represents preferences on F s,ua. We call P(A) the "probability" of A. As follows from Savage (1954), P is atomless and satisfies convex-rangedness. Obviously, P will be the restriction of v to Aua. Let us next extend the CEU representation as now established for all unambiguous step acts, to all step acts. First, we define the capacity v. By P5 there are consequences ζ ≻ η, which are kept fixed throughout the proof.
Lemma 7.A.1. For each event A there exists an Aua ∈ Aua such that ζA η ∼ ζAua η.
Proof. By P2, ζS η ≽ ζA η ≽ ζØ η. Suppose that in fact ζS η ≻ ζA η ≻ ζØ η (otherwise we are done immediately), and that for event Bua ∈ Aua we have ζA η ≻ ζBua η (e.g. Bua = Ø). This implies P((Bua)c) > 0. By P6, there exists a partition C1, . . . , Cn of S, with all Cj ∈ Aua, such that ζBua∪Cj η ≺ ζA η for all j. There exists at least one Cj ∩ (Bua)c with strictly positive probability. So there exists an event B′ua := Bua ∪ Cj with probability strictly greater than that of Bua, and such that still ζA η ≻ ζB′ua η. So, using convex-rangedness, the set of probabilities of events Bua as above must be of the form [0, p−[ for some 0 < p− ≤ 1. Similarly, the set of probabilities of events Cua ∈ Aua such that ζA η ≺ ζCua η must be of the form ]p+, 1] for some 0 ≤ p+ < 1. The only possibility is p− = p+. By convex-rangedness there exists an event Aua ∈ Aua with probability p−. Now ζA η ∼ ζAua η is the only possibility. Q.E.D.
Thus, for every A ∈ A, there exists an Aua that is equally likely.
Because each possible choice of Aua has the same P value, we can define v(A) := P(Aua), extending v from Aua (where v = P) to the entire A. For monotonicity with respect to set-inclusion, suppose that A ⊃ B. Then, by P2, ζA η ≽ ζB η. From this v(A) ≥ v(B) follows, and v is a capacity. To establish the CEU representation for all step acts, we construct for each ambiguous step act an unambiguous one "with the same cumulative distribution."
That is, for the ambiguous and the unambiguous acts the events of obtaining a consequence at least as good as α are equally likely, for each consequence α. For step acts this is not only necessary, but also sufficient, to have all cumulative consequence sets equally likely under the two acts. First, we extend Lemma 7.A.1. The proof of the extension is completely similar, with µ, ν in the place of ζ, η, further f in the place of ζA η, and µ ≽ f ≽ ν implied by P4.
Lemma 7.A.2. For each act f for which there exist consequences µ, ν such that [∀s ∈ S: µ ≽ f(s) ≽ ν], there exists an Aua ∈ Aua such that µAua ν ∼ f.
Obviously, by the SEU representation as already established, ζAua η ∼ ζBua η for each unambiguous event Bua equally likely as Aua. By convex-rangedness of P, and Lemma 7.A.1, for each partition A1, . . . , Am of S we can find an unambiguous partition B1, . . . , Bm of S such that A1 ∪ · · · ∪ Aj is equally likely as B1 ∪ · · · ∪ Bj, for each j. To do so, first we find an unambiguous B′1 ∼ A1, and set B1 := B′1. Next we find an unambiguous B′2 ∼ A1 ∪ A2. By convex-rangedness of P, we can find an unambiguous B2 with B2 ∩ B1 = Ø such that P(B1 ∪ B2) = P(B′2), so that B1 ∪ B2 ∼ A1 ∪ A2, and so on. The next paragraph is the central part of the proof, and is simple. The other parts of the proof are all standard after Savage (1954), using Gilboa's (1987) P7. Let α1A1 . . . αmAm be an arbitrary step act, with α1 ≽ · · · ≽ αm. We take an unambiguous partition {B1, . . . , Bm} as described earlier. The unambiguous act α1B1 . . . αmBm, by two-fold application of P4 (once with ≽, once with ≼), is equivalent to the ambiguous act. Its SEU value can, similarly to the Choquet integral, be written as P(B1)U(α1) + [P(B1 ∪ B2) − P(B1)]U(α2) + · · · + [1 − P(B1 ∪ · · · ∪ Bm−1)]U(αm). This shows that it is identical to the CEU value of the ambiguous act. So indeed CEU represents preferences between all step acts. The extension of the CEU representation to non-step acts is mainly by P7, and is similar to Gilboa (1987). Note that this in particular establishes the expected utility representation on the entire set F ua. Contrary to Gilboa (1987), our capacity need not be convex-ranged. We can however follow the reasoning of his subsection 4.3 with only unambiguous step acts f¯, g¯. Convex-rangedness is used there for the existence of g¯, while convex-rangedness of P suffices for that.
In the proof of his Theorem 4.3.4, in Statement (i), the act f¯ can always be chosen unambiguous, by Lemma 7.A.2. Let us also mention that one cannot restrict P7 to F ua. This would be possible if for each ambiguous act there would exist an unambiguous act with the same cumulative distribution. This however is not the case in general. For example, if P is countably additive, then it cannot generate strictly finitely additive distributions; for example, with C = R, it does not generate cumulative distribution functions that are not continuous from the right. Also it is possible that, for instance, U(C) = [0, 1[, P is countably additive, and there exists a positive ε such that under an ambiguous act f each cumulative event {s ∈ S : f(s) ≽ α} (0 ≤ U(α) < 1) has capacity at least ε.
The utility functions must be bounded, as follows from the representation on F ua. This is shown in Fishburn (1970: section 14.1), and the second 1972 edition of Savage (1954: footnote on p. 80). Finally we establish the uniqueness results. By the standard results of Savage (1954) we get cardinality of U, and uniqueness of the restriction P of v to Aua. The extension of v to A \ Aua shows that v is uniquely determined. Next let us suppose that v is allowed to be nonadditive on Aua, as studied in Observation 7.1. Let us at first also suppose that there are three or more nonequivalent consequences. Then the representation, if restricted to F ua, satisfies all conditions in Gilboa (1987); hence by his uniqueness results the restriction of v to Aua is unique, so additive. The uniqueness of v follows in the same way as above. Let us finally consider the case where there are exactly two equivalence classes of consequences, with say ζ ≻ η. Any U′ instead of U in a CEU representation is constant on equivalence classes of consequences and satisfies U′(ζ) > U′(η). So U′ is a strictly increasing transform of U, and obviously is bounded. Given the two-valued range, U′ is cardinal. Because any v′ in a CEU representation has to represent the same ordering over events as v, v′ must be a strictly increasing transform of v. Conversely, any such v′ will do. Thus, it is possible to choose v′ such that it is not convex-ranged and not additive on Aua. It can however always be chosen such that it is convex-ranged and additive on Aua. For the proof of Remark 7.1, note that all constructions in the proof of the implication (ii) ⇒ (i) of Theorem 7.1 (including the extension to nonsimple acts, following Gilboa 1987) remain possible under the conditions of Remark 7.1. Our result has not established convex-rangedness of the capacity v. That can be characterized by addition of one condition, Gilboa's P6*.
We propose to rename this as "solvability." Solvability is satisfied if for all acts f, g, consequences α ≻ β, and events A, if αA f ≽ g ≽ βA f, and αA f, βA f are "comonotonic" (∀s ∈ Ac: f(s) ≽ α or f(s) ≼ β), there exists an event B ⊂ A such that αB βA\B fAc ∼ g. That solvability, even if restricted to two-consequence acts, is sufficient for convex-rangedness of v, follows mainly from convex-rangedness of P, which gives all desired "intermediate" g. Necessity is straightforward.
Proposition 7.A.1. Suppose Statement (i) in Theorem 7.1 holds. Then v is convex-ranged if and only if ≽ satisfies solvability.
For the case of three or more equivalence classes of consequences, a more general derivation, without use of F ua, is given in Gilboa (1987). If there are exactly two equivalence classes of consequences and v is not required to be additive on Aua, then, by Observation 7.1, v need not be convex-ranged, even if solvability is satisfied. The following example shows why we used cumulative consequence sets, instead of less general sets of the form {β ∈ C: β ≽ α} for α ∈ C, in the definition P4 of cumulative dominance, and its derivatives P4′ and P4″. Note that the distinction is relevant only for nonstep acts, and that we could have restricted
P4, P4′, P4″ to step acts. In that case, we could have used the less general sets as mentioned earlier.
Example 7.A.1. Suppose the special case of Statement (i) in Theorem 7.1 holds where in fact all of Savage's axioms are satisfied. So v is an additive probability measure, that we denote by P. Let C = {−1/j : j ∈ N} ∪ {1 + 1/j : j ∈ N}, and let U be the identity. Let {Aj}∞j=1 ∪ {Bj}∞j=1 be a partition of S, and {A′j}∞j=1 ∪ {B′j}∞j=1 another partition of S. A := ∪∞j=1 Aj; B, A′, B′ are defined similarly. Suppose that P(Aj) = P(A′j), P(Bj) = P(B′j) for all j. Further suppose that P(A) > P(A′). Such cases can be constructed if P is not set-continuous, that is, not countably additive. Let f assign 1 + 1/j to each Aj, and −1/j to each Bj. Similarly f′ assigns 1 + 1/j to each A′j, and −1/j to each B′j. For each consequence 1 + 1/j we have P(f(s) ≽ 1 + 1/j) = Σ_{i=1}^{j} P(Ai) = Σ_{i=1}^{j} P(A′i) = P(f′(s) ≽ 1 + 1/j). For each consequence −1/j we have P(f(s) ≽ −1/j) = 1 − P(f(s) ≺ −1/j) = 1 − Σ_{i=1}^{j−1} P(Bi) = 1 − Σ_{i=1}^{j−1} P(B′i) = P(f′(s) ≽ −1/j). So for each α ∈ C: {s ∈ S : f(s) ≽ α} ∼ {s ∈ S : f′(s) ≽ α}. However, for 0 ≤ µ ≤ 1, P(f(s) ≥ µ) = P(A) > P(A′) = P(f′(s) ≥ µ). By Formula (7.1), CEU(f) − CEU(f′) = 1 × (P(A) − P(A′)) > 0. So f ≻ f′. Only for cumulative consequence sets E := [µ, ∞[ with µ as above do we not have f −1(E) ∼ f′−1(E).
A2. Proof of Proposition 7.1
The implications (i) ⇒ (ii) and (i) ⇒ (iv) are direct. The implication (i) ⇒ (iii) follows from convex-rangedness of P. Next we prove that Statement (i) is implied by each of the other statements. (ii) ⇒ (i) is direct. (iii) ⇒ (ii) follows from taking A and A′ as unions of Aj's, taking Bua and B′ua as unions of corresponding Bjua's, and from the equivalences (with ζ ≻ η) ζA η ∼ ζBua η, ζA′ η ∼ ζB′ua η, ζA∪A′ η ∼ ζBua∪B′ua η. Finally, suppose (iv) holds.
Similarly to the reasoning below Lemma 7.A.2, we can show the existence of an unambiguous partition {B1ua, . . . , Bmua} such that A1 ∪ · · · ∪ Aj ∼ B1ua ∪ · · · ∪ Bjua for all j. For any A that is a union of Aj's different from A1, and Bua a union of the corresponding Bjua's, we have, by (7.2), ζA∪A1 η ∼ ζBua∪B1ua η and ζA η ∼ ζBua η. Taking differences and dividing by the positive U(ζ) − U(η) we get v(A ∪ A1) − v(A) = P(Bua ∪ B1ua) − P(Bua) = P(B1ua). So the "decision weight" that A1 contributes to each union of the other Aj's is independent of those other Aj's. The same holds for each Aj. Hence, the capacity of a union of different Aj's is the sum of the separate capacities: v is additive on the partition.
Acknowledgments The support for this research was provided in part by the Decision, Risk, and Management Science branch of the National Science Foundation; the research of Peter Wakker has been made possible by a fellowship of the Royal Netherlands Academy of Arts and Sciences, and a fellowship of the Netherlands Organization for Scientific Research. The authors are thankful to two referees for many detailed comments.
Axiomatization of nonadditive expected utility
Notes
1 Every step act is “simple,” that is, is measurable and has a finite range. If D contains every one-element subset, then every simple act is a step act. Step acts turn out to be easier to work with than simple acts.
2 Sometimes a nonadditive capacity is a strictly increasing transform of a probability measure, which then also represents the “more-likely-than” relation. In general, however, a capacity will not be of that form.
3 For example, receiving α or a superior consequence.
4 Only one will be additive on A^ua, of course.
5 Such a bet gives 1 if H^ub obtains and 0 if T^ub obtains.
References
Anscombe, F. J. and R. J. Aumann (1963). “A Definition of Subjective Probability,” Annals of Mathematical Statistics, 34, 199–205.
Becker, J. L. and R. K. Sarin (1989). “Economics of Ambiguity,” Duke University, Fuqua School of Business, Durham, NC, USA.
Bernardo, J. M., J. R. Ferrandiz, and A. F. M. Smith (1985). “The Foundations of Decision Theory: An Intuitive, Operational Approach with Mathematical Extensions,” Theory and Decision, 19, 127–150.
Choquet, G. (1953–1954). “Theory of Capacities,” Annales de l’Institut Fourier (Grenoble), 5, 131–295.
de Finetti, B. (1931). “Sul Significato Soggettivo della Probabilità,” Fundamenta Mathematicae, 17, 298–329.
—— (1937). “La Prévision: Ses Lois Logiques, ses Sources Subjectives,” Annales de l’Institut Henri Poincaré, 7, 1–68. Translated into English by H. E. Kyburg, “Foresight: Its Logical Laws, its Subjective Sources,” in Studies in Subjective Probability, ed. by H. E. Kyburg and H. E. Smokler. New York: Wiley, 1964, 53–118; 2nd edition, New York: Krieger, 1980.
Ellsberg, D. (1961). “Risk, Ambiguity and the Savage Axioms,” Quarterly Journal of Economics, 75, 643–669.
Fishburn, P. C. (1967). “Preference-Based Definitions of Subjective Probability,” Annals of Mathematical Statistics, 38, 1605–1617.
—— (1970). Utility Theory for Decision Making. New York: Wiley.
—— (1982). The Foundations of Expected Utility. Dordrecht: Reidel.
—— (1988). Nonlinear Preference and Utility Theory. Baltimore: Johns Hopkins University Press.
—— (1991). “On the Theory of Ambiguity,” International Journal of Information and Management Science, 2, 1–16.
Gärdenfors, P. and N.-E. Sahlin (1983). “Decision Making with Unreliable Probabilities,” British Journal of Mathematical and Statistical Psychology, 36, 240–251.
Gilboa, I. (1987). “Expected Utility with Purely Subjective Non-Additive Probabilities,” Journal of Mathematical Economics, 16, 65–88.
—— (1989).
“Duality in Non-Additive Expected Utility Theory,” in Choice under Uncertainty, Annals of Operations Research, ed. by P. C. Fishburn and I. H. LaValle. Basel: J. C. Baltzer AG, 405–414.
Keynes, J. M. (1921). A Treatise on Probability. London: Macmillan. Second edition, 1948.
Knight, F. H. (1921). Risk, Uncertainty, and Profit. New York: Houghton Mifflin.
Rakesh Sarin and Peter P. Wakker
Luce, R. D. (1988). “Rank-Dependent, Subjective Expected-Utility Representations,” Journal of Risk and Uncertainty, 1, 305–332.
—— (1991). “Rank- and Sign-Dependent Linear Utility Models for Binary Gambles,” Journal of Economic Theory, 53, 75–100.
—— (1992). “Where Does Subjective Expected Utility Fail Descriptively?” Journal of Risk and Uncertainty, 4, 5–27.
Luce, R. D. and P. C. Fishburn (1991). “Rank- and Sign-Dependent Linear Utility Models for Finite First-Order Gambles,” Journal of Risk and Uncertainty, 4, 29–59.
Luce, R. D. and L. Narens (1985). “Classification of Concatenation Measurement Structures According to Scale Type,” Journal of Mathematical Psychology, 29, 1–72.
Machina, M. J. (1982). “‘Expected Utility’ Analysis without the Independence Axiom,” Econometrica, 50, 277–323.
Machina, M. J. and D. Schmeidler (1992). “A More Robust Definition of Subjective Probability,” Econometrica, 60, 745–780.
Nakamura, Y. (1990). “Subjective Expected Utility with Non-Additive Probabilities on Finite State Spaces,” Journal of Economic Theory, 51, 346–366.
—— (1992). “Multi-Symmetric Structures and Non-Expected Utility,” Journal of Mathematical Psychology, 36, 375–395.
Ramsey, F. P. (1931). “Truth and Probability,” in The Foundations of Mathematics and Other Logical Essays. London: Routledge and Kegan Paul, 156–198. Reprinted in Studies in Subjective Probability, ed. by H. E. Kyburg and H. E. Smokler. New York: Wiley, 1964, 61–92.
Sarin, R. and P. P. Wakker (1990). “Incorporating Attitudes towards Ambiguity in Savage’s Set-up,” University of California, Los Angeles, CA.
Savage, L. J. (1954). The Foundations of Statistics. New York: Wiley. Second edition, New York: Dover, 1972.
Schmeidler, D. (1989). “Subjective Probability and Expected Utility without Additivity,” Econometrica, 57, 571–587. (Reprinted as Chapter 5 in this volume.)
Segal, U. (1987). “The Ellsberg Paradox and Risk Aversion: An Anticipated Utility Approach,” International Economic Review, 28, 175–202.
—— (1990). “Two-Stage Lotteries without the Reduction Axiom,” Econometrica, 58, 349–377.
Wakker, P. P. (1989a). “Continuous Subjective Expected Utility with Nonadditive Probabilities,” Journal of Mathematical Economics, 18, 1–27.
—— (1989b). Additive Representations of Preferences, A New Foundation of Decision Analysis. Dordrecht: Kluwer.
—— (1990). “Under Stochastic Dominance Choquet-Expected Utility and Anticipated Utility are Identical,” Theory and Decision, 29, 119–132.
—— (1991). “Additive Representations of Preferences on Rank-Ordered Sets I. The Algebraic Approach,” Journal of Mathematical Psychology, 35, 501–531.
—— (1993a). “Unbounded Utility for Savage’s ‘Foundations of Statistics’, and Other Models,” Mathematics of Operations Research, 18, 446–485.
—— (1993b). “Counterexamples to Segal’s Measure Representation Theorem,” Journal of Risk and Uncertainty, 6, 91–98.
Yaari, M. E. (1987). “The Dual Theory of Choice under Risk,” Econometrica, 55, 95–115.
8
Updating ambiguous beliefs Itzhak Gilboa and David Schmeidler
8.1. Introduction

The Bayesian approach to decision making under uncertainty prescribes that a decision maker have a unique prior probability and a utility function such that decisions are made so as to maximize the expected utility. In particular, in a statistical inference problem the decision maker is assumed to have a probability distribution over all possible distributions which may govern a certain random process. This paradigm was justified by axiomatic treatments, most notably that of Savage (1954), and it enjoys unrivaled popularity in economic theory, game theory, and so forth. However, this theory is challenged by two classes of evidence: on the one hand, there are experiments and thought experiments (such as Ellsberg (1961) and many others) which seem to show that individuals tend to violate the consistency conditions underlying (and implied by) the Bayesian approach. On the other hand, people seem to have difficulties with specification of a prior for actual statistical inference problems. Thus, classical—rather than Bayesian—methods are used for practical purposes, although they are theoretically less satisfactory.

The last decade has witnessed—among numerous and various generalizations of von Neumann and Morgenstern’s (1947) expected utility theory—generalizations of the Bayesian paradigm as well. We will not attempt to provide a survey of them here. Instead, we only mention the models which are relevant to the sequel.

1 Nonadditive probabilities. First introduced by Schmeidler (1982, 1986, 1989) and also axiomatized in Gilboa (1987), Fishburn (1988), and Wakker (1989), nonadditive probabilities are monotone set-functions which do not have to satisfy additivity. Using the Choquet integral (Choquet, 1953–1954) one may define expected utility, and the works cited before axiomatize preference relations which are representable by expected utility in this sense.
2 Multiple priors (MP). As axiomatized by Gilboa and Schmeidler (1989), this model assumes that the decision maker has a set of priors, and each alternative
Gilboa, I. and D. Schmeidler (1993). Updating ambiguous beliefs, J. Econ. Theory, 59, 33–49.
is assessed according to its minimal expected utility, where the minimum is taken over all priors in the set. (This idea is also related to Bewley (1986–1988), who suggests a partial order over alternatives, such that one alternative is preferred to another only if its expected utility is higher according to all priors in the set.)
Of particular interest to this study is the intersection of the two models: it turns out that if a nonadditive measure (NA) exhibits uncertainty aversion (technically, if it is convex in the sense v(A ∪ B) + v(A ∩ B) ≥ v(A) + v(B)), then the Choquet integral of a real-valued function with respect to v equals the minimum of all its integrals with respect to additive priors taken from the core of v. (The core is defined as in cooperative game theory, that is, p is in the core of v if p(A) ≥ v(A) for every event A, with equality for the whole sample space. Convex NA have nonempty cores.) While these models—as many others—suggested generalizations of the Bayesian approach for a one-shot decision problem, they shed very little light on the problem of dynamically updating probabilities as new information is gathered. We find this problem to be of paramount importance for several interrelated reasons:

1 The theoretical validity of any model of decision making under uncertainty is quite dubious if it cannot cope successfully with the dynamic aspect.
2 The updating problem is at the heart of statistical theory. In fact, it may be viewed as the problem statistical inference is trying to solve. Some of the works in the statistical literature which pertain to this study are Agnew (1985), Genest and Schervish (1985), and Lindley et al. (1979).
3 Applications of these models to economic and game theory models require some assumptions on how economic agents change their beliefs over time. The question naturally arises, then: What are the reasonable ways to update such beliefs?
4 The theory of artificial intelligence, which in general seems to have much in common with the foundations of economic, decision, and game theory, also tries to cope with this problem. See, for instance, Fagin and Halpern (1989), Halpern and Fagin (1989), and Halpern and Tuttle (1989).
In this study we try to deal with the problem axiomatically and suggest plausible update rules which satisfy some basic requirements. We present a family of pseudo-Bayesian rules, each of which may be considered a generalization of Bayes’ rule for a unique additive prior. We also present a family of “classical” update rules, each of which starts out with a given set of priors, possibly rules some of them out in the face of new information, and continues with the (Bayesian) updates of the remaining ones. In particular, a maximum-likelihood-update rule would be the following: consider only those priors which ascribe the maximal probability to the event
that is known to have occurred, update each of them according to Bayes’ rule, and continue in this fashion. It turns out that if the set of priors one starts out with can also be represented by a nonadditive probability, the results of this rule are independent of the order in which information is gathered. Furthermore, for those preferences which can be simultaneously represented by an NA and by MP, the maximum-likelihood-update rule coincides with one of the more intuitive Bayesian rules, and they boil down to the Dempster–Shafer rule (see Dempster (1967, 1968), Shafer (1976), and Smets (1986)). For recent work on belief functions and their updating, see Jaffray (1989), Chateauneuf and Jaffray (1989), and especially Jaffray (1990). Thus, we find that an axiomatically based generalization of the Bayesian approach can accommodate MP (which are used in classical statistics). Moreover, the maximum-likelihood principle, which is at the heart of statistical inference (and implicit in the techniques of confidence sets and hypothesis testing), coincides with the generalized Bayesian updating. Due to the prominence of this rule, it may be a source of insight to study it in a simple example.

Consider Ellsberg’s example in which an urn with 90 balls is given, out of which 30 are red, and 60 are either blue or yellow. For simplicity of exposition, let us model this situation in a somewhat extreme fashion, allowing for all distributions of blue and yellow balls. Maxmin expected utility with respect to this set of priors is equivalent to the maximization of the Choquet integral of utility with respect to (w.r.t.) an NA v defined as

v(R) = 1/3,   v(B) = v(Y) = 0,
v(R ∪ B) = v(R ∪ Y) = 1/3,   v(B ∪ Y) = 2/3,
v(R ∪ B ∪ Y) = 1,

where R, B, and Y denote the events of a red, blue, or yellow ball being drawn, respectively. Assume now that it is known that a ball (which, say, has already been drawn) is not red. Conditioning on the event B ∪ Y, all priors in the set ascribe probability 2/3 to it. Thus, they are all left in the set and updated according to Bayes’ rule. This captures our intuition that no ambiguity was resolved, and our complete ignorance regarding the event B ∪ Y has not changed. (Actually, it is now highlighted by the fact that this event, about which we know the least, is now known to have occurred.) Consider, on the other hand, the same update rules in the case that R ∪ B is known. The priors we started out with ascribe to this event probabilities ranging from 1/3 to 1. According to the maximum-likelihood principle, only one of them is chosen—namely, the p which satisfies

p(R) = 1/3,   p(B) = 2/3,   p(Y) = 0.
In this particular case, the set of priors shrinks to a singleton and, equivalently, the updated measure v is additive (and equals p itself). Ambiguity is thus reduced (in this case, eliminated) by the generalized Bayesian learning embodied in the exclusion of some priors. In the context of such examples it is sometimes argued that the maximum-likelihood rule is too extreme, and that priors which, say, only ε-maximize the likelihood function should not be ruled out. Indeed, classical statistics techniques such as hypothesis testing do allow for ranges of the likelihood function. At present we are not aware of a nice axiomatization of such rules. We point out, however, that the other extreme rule, that is, updating all priors without excluding any of them (see, for instance, Fagin and Halpern (1989), and Jaffray (1990)), does not appear to be any less “extreme” in general, nor does it seem to be implied by more compelling axioms.

We believe that our theory can be applied to a variety of economic models, explaining phenomena which are incompatible with the Bayesian theory, and possibly providing better predictions. As a matter of fact, this belief may be updated given new evidence: Dow and Werlang (1990) and Simonsen and Werlang (1990) have already applied the MP theory to portfolio selection problems. These studies have shown that a decision maker having ambiguous beliefs will have a (nontrivial) range of prices at which he/she will neither buy nor sell an uncertain asset, exhibiting inertia in portfolio selection. Applying our new results regarding ambiguous beliefs update, one may study the conditions under which these price ranges will shrink in the face of new information. Dow et al. (1989) studied trade among agents, at least one of whom has ambiguous beliefs. They show that the celebrated no-trade result of Milgrom and Stokey (1982) fails to hold in this context.
In this study, the Dempster–Shafer rule for updating NA was used, a rule which is justified by the current chapter. Casting the trade setup into a dynamic context raises the question of an asymptotic no-trade theorem: Under what conditions will additional information reduce the volume or probability of trade? In another recent study, Yoo (1990) addressed the question of why stock prices tend to fall after the initial public offering and rise at a later stage. Although Yoo uses ambiguous beliefs mostly as in Bewley’s (1986) model, his results can also be obtained using the models mentioned earlier. It seems that the update rule justified by our study may explain the price dynamics. These various models seem to point at a basic problem: given a convex NA (or, equivalently, a set of priors which is the core of such a measure), under what conditions will the Dempster–Shafer rule yield convergence of beliefs to a single additive prior? Obviously, the answer cannot be “always.” Consider a “large” measurable space with all possible priors (equivalently, with the “unanimity game” as an NA measure). In this setup of “complete ignorance,” no conclusions about the future may be drawn from past observations—that is, the updated beliefs still include all possible priors. However, with some initial information (say, finitely many extreme points of the set of priors) convergence is possible. Conditions that will guarantee such convergence call for further study.
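The urn example of this section can be sketched numerically. This is only an illustrative sketch: the set of priors is discretized here by the number k of blue balls among the 60 non-red ones, a parameterization chosen for the example and not part of the text.

```python
# One prior per possible number k of blue balls among the 60 non-red balls.
priors = [{'R': 30/90, 'B': k/90, 'Y': (60 - k)/90} for k in range(61)]

def likelihood(p, event):
    return sum(p[s] for s in event)

# Conditioning on "not red": every prior assigns 2/3 to B u Y, so the
# maximum-likelihood rule rules out nothing; the blue/yellow ambiguity survives.
lik_BY = [likelihood(p, {'B', 'Y'}) for p in priors]
assert all(abs(l - 60/90) < 1e-12 for l in lik_BY)

# Conditioning on R u B: likelihoods range over [1/3, 1]; only the prior with
# k = 60 (all non-red balls blue) attains the maximum, so ambiguity is eliminated.
lik_RB = [likelihood(p, {'R', 'B'}) for p in priors]
best = [p for p, l in zip(priors, lik_RB) if l == max(lik_RB)]
print(len(best))   # one prior left: p(R) = 1/3, p(B) = 2/3, p(Y) = 0
```

The two conditioning events thus behave exactly as described above: one leaves the whole set intact, the other collapses it to a singleton.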
The rest of this chapter is organized as follows. Section 8.2 presents the framework and quotes some results. Section 8.3 defines the update rules and states the theorems. Finally, Section 8.4 includes proofs, related analysis, and some remarks regarding possible generalizations.
8.2. Framework and preliminaries

Let X be a set of consequences endowed with a weak order ≽. Let (S, Σ) be a measurable space of states of the world, where Σ is the algebra of events. A function f: S → X is Σ-measurable if for every x ∈ X

{s | f(s) ≻ x}, {s | f(s) ≽ x} ∈ Σ.

Let F = {f: S → X | f is Σ-measurable} be the set of acts. Let F_0 = {f ∈ F | |range(f)| < ∞} be the set of simple acts. A function u: X → R which represents ≽, that is,

u(x) ≥ u(y) ⇐⇒ x ≽ y,  ∀x, y ∈ X,

is called a utility function. A function v: Σ → [0, 1] satisfying (i) v(Ø) = 0; v(S) = 1; and (ii) A ⊆ B ⇒ v(A) ≤ v(B) is an NA. It is convex if v(A ∪ B) + v(A ∩ B) ≥ v(A) + v(B) for all A, B ∈ Σ. It is additive, or simply a measure, if the above inequality is always satisfied as an equality. A real-valued function w is Σ-measurable if for every t ∈ R

{s | w(s) ≥ t}, {s | w(s) > t} ∈ Σ.

Given such a function w and an NA v, the (Choquet) integral of w w.r.t. v on S is

∫_S w dv = ∫_0^∞ v({s | w(s) ≥ t}) dt + ∫_{−∞}^0 [v({s | w(s) ≥ t}) − 1] dt.

For an NA measure v we define the core as for a cooperative game, that is,

Core(v) = {p | p is a measure s.t. p(A) ≥ v(A) ∀A ∈ Σ}.

Recall that a convex v has a nonempty core (see Shapley (1965)). We are now about to define two classes of binary relations on F: those represented by maximization of expected utility with NA, and those represented by maxmin of expected utility with MP.
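As a concrete illustration of the definition (a sketch, not part of the text: the capacity below is the Ellsberg urn capacity of Section 8.1, and the two core extreme points are enumerated by hand), the Choquet integral of a nonnegative simple function weights increments of the outcome level by the capacity of the upper level sets; for this convex v it agrees with the minimum of expectations over the core.

```python
def choquet(w, v):
    # Choquet integral of a nonnegative simple function w (dict: state -> value)
    # w.r.t. a capacity v (dict: frozenset of states -> number).
    total, prev = 0.0, 0.0
    for t in sorted(set(w.values())):
        if t > 0:
            upper = frozenset(s for s, x in w.items() if x >= t)
            total += (t - prev) * v[upper]
            prev = t
    return total

# Ellsberg urn capacity: R is red, B blue, Y yellow.
S = frozenset({'R', 'B', 'Y'})
v = {frozenset(): 0.0,
     frozenset({'R'}): 1/3, frozenset({'B'}): 0.0, frozenset({'Y'}): 0.0,
     frozenset({'R', 'B'}): 1/3, frozenset({'R', 'Y'}): 1/3,
     frozenset({'B', 'Y'}): 2/3, S: 1.0}

# v is convex; the extreme points of Core(v) (p(R) = 1/3 is forced):
core_extremes = [{'R': 1/3, 'B': 0.0, 'Y': 2/3},
                 {'R': 1/3, 'B': 2/3, 'Y': 0.0}]

w = {'R': 0.0, 'B': 1.0, 'Y': 1.0}   # utility profile of a bet on "not red"
cev = choquet(w, v)
low = min(sum(p[s] * w[s] for s in S) for p in core_extremes)
print(cev, low)   # both equal 2/3: Choquet integral = min over the core
```

The agreement of the two numbers is exactly the convexity/core fact quoted in Section 8.1; it holds for every act, not just this bet.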
Denote by NA◦ (= NA◦(X, Σ, S, ≽)) the set of binary relations ≽ on F such that there are a utility u, unique up to positive linear transformation (p.l.t.), and a unique NA v satisfying: (i) for every f ∈ F, u ∘ f is Σ-measurable; (ii) for every f, g ∈ F

f ≽ g ⇐⇒ ∫ u ∘ f dv ≥ ∫ u ∘ g dv.

Note that in general the measurability of f does not guarantee that of u ∘ f, and that (ii) implies that ≽ on F, when restricted to constant acts, extends ≽ on X. Hence we use this convenient abuse of notation. Similarly, we will not distinguish between x ∈ X and the constant act which equals x on S. Characterizations of NA◦ were given by Schmeidler (1986, 1989) for the Anscombe–Aumann (1963) framework, where X is a mixture space and u is assumed affine; by Gilboa (1987) in the Savage (1954) framework, where X is arbitrary but Σ = 2^S and v is nonatomic; and by Wakker (1989) for the case where X is a connected topological space. Fishburn (1988) extended the characterization to non-transitive relations.

Let MP◦ (= MP◦(X, Σ, S, ≽)) denote the set of binary relations ≽ on F such that there are a utility u, unique up to a p.l.t., and a unique nonempty, closed (in the weak* topology), and convex set C of (finitely additive) measures on Σ such that: (i) for every f ∈ F, u ∘ f is Σ-measurable; (ii) for every f, g ∈ F

f ≽ g ⇐⇒ min_{p∈C} ∫ u ∘ f dp ≥ min_{p∈C} ∫ u ∘ g dp.

A characterization of MP◦ in the Anscombe–Aumann framework was given in Gilboa and Schmeidler (1989). To the best of our knowledge, there is no such axiomatization in the framework of Savage. However, the set NA◦ ∩ MP◦, which will play an important role in the sequel, may be characterized by strengthening the axioms in Gilboa (1987). It will be convenient to include the trivial weak order ≽* = F × F in both NA and MP. Hence, we define NA = NA◦ ∪ {≽*} and MP = MP◦ ∪ {≽*}. For simplicity we assume that X has ≽-maximal and ≽-minimal elements. More specifically, let x^*, x_* ∈ X satisfy x_* ≼ x ≼ x^* for all x ∈ X. Without loss of generality (w.l.o.g.), assume that x_* and x^* are unique. Since for both NA◦ and MP◦ the utility function is unique only up to a p.l.t., we will assume w.l.o.g. that u(x_*) = 0 and u(x^*) = 1 for all utilities henceforth considered. When X is a mixture space we define NA^a and MP^a to be the subsets of NA and MP, respectively, where the utility u is also required to be affine. For such spaces X we recall the following results.
Proposition 8.1. Suppose that ≽ ∈ NA^a and let v be the associated NA. Then ≽ ∈ MP^a iff v is convex.

Proposition 8.2. Suppose that ≽ ∈ MP^a and let C be the associated set of measures. Define

v(A) = min_{p∈C} p(A) for A ∈ Σ.

Then v is an NA, and ≽ ∈ NA^a iff v is convex and C = Core(v).

The proofs of these appear, explicitly or implicitly, in Schmeidler (1984, 1986, 1989). Note that the axiomatization of NA^a (Schmeidler, 1989) uses comonotonic independence, and given this property the convexity of v is equivalent to uncertainty aversion. The axiomatization of MP^a (Gilboa and Schmeidler, 1989) uses a weaker independence notion—termed C-independence—and uncertainty aversion. Given these, the convexity of v and the equality C = Core(v) (where v is defined as in Proposition 8.2) are equivalent to comonotonic independence.

We now define update rules. We need the following definitions. Given a measurable partition {A_i}_{i=1}^n of S and {f_i}_{i=1}^n ⊆ F, let (f_1, A_1; . . . ; f_n, A_n) denote the act g ∈ F satisfying g(s) = f_i(s) for all s ∈ A_i and all 1 ≤ i ≤ n. Given a binary relation ≽ on F, an event A ∈ Σ is ≽-null iff the following holds: for every f, g, h_1, h_2 ∈ F,

f ≽ g iff (f, A^c; h_1, A) ≽ (g, A^c; h_2, A).

Let B̄ denote the set of all binary relations on F. Given B ⊆ B̄, an update rule for B is a collection of functions, U = {U_A}_{A∈Σ}, where U_A: B → B̄, such that for all ≽ ∈ B and A ∈ Σ, A^c is U_A(≽)-null and U_S(≽) = ≽. U_A(≽) should be thought of as the preference relation once A is known to have occurred. Given B and an update rule for it, U = {U_A}_{A∈Σ}, U is said to be commutative w.r.t. ≽ or ≽-commutative if for every A, B ∈ Σ we have U_A(≽) ∈ B and U_B(U_A(≽)) = U_{A∩B}(≽). It is commutative if it is commutative w.r.t. ≽ for all ≽ ∈ B. (Note that this condition is stronger than strict commutativity, that is, U_A ∘ U_B = U_B ∘ U_A. However, “commutativity” seems to be a suggestive name which is not overburdened with other meanings.)
8.3. Bayesian and classical rules

Given a set B of binary relations on F, every f ∈ F suggests a natural update rule for B: define BU^f = {BU^f_A}_{A∈Σ} by

g BU^f_A(≽) h ⇐⇒ (g, A; f, A^c) ≽ (h, A; f, A^c) for all g, h ∈ F.
It is obvious that for every f, BU^f is an update rule, that is, that A^c is BU^f_A(≽)-null for all ≽ ∈ B and A ∈ Σ. We will refer to it as the f-Bayesian update rule and {BU^f}_{f∈F} will be called the set of Bayesian-update rules. Note that for ≽ ∈ NA^a with an additive v, all the Bayesian update rules coincide with Bayes’ rule, hence the definition of the Bayesian-update rules may be considered a formulation and axiomatization of Bayes’ rule in the case of (a unique) additive prior.

Proposition 8.3. For every ≽ ∈ B̄ and f ∈ F, BU^f is ≽-commutative.

Theorem 8.1. Let f ∈ F and assume that |Σ| > 4. Then the following are equivalent:

(i) BU^f_A(NA^a) ⊆ NA^a for all A ∈ Σ;
(ii) f = (x^*, T; x_*, T^c) for some T ∈ Σ.
Of particular interest are the Bayesian update rules corresponding to f = x^* and f = x_* (i.e. T = S or T = Ø in (ii) earlier). For the latter (x_*) there is an “optimistic” interpretation: when comparing two actions given a certain event A, the decision maker implicitly assumes that had A not occurred, the worst possible outcome (x_*) would have resulted. In other words, the behavior given A—BU^f_A(≽)—exhibits “happiness” that A has occurred; the decisions are made as if we are always in “the best of all possible worlds.” Note that the corresponding NA is v_A(B) = v(B ∩ A)/v(A). On the other hand, for f = x^*, we consider a “pessimistic” decision maker, whose choices reveal the hidden assumption that all the impossible worlds are the best conceivable ones. This rule defines the nonadditive function by

v_A(B) = [v((B ∩ A) ∪ A^c) − v(A^c)]/(1 − v(A^c)),

which is identical to the Dempster–Shafer rule for updating probabilities. It should not surprise us that this “pessimistic” rule is going to play a major role in relation to MP—that is, to uncertainty averse decision makers who follow a maxmin (expected utility) decision rule. In a similar way one may develop a “dual” theory of “optimism” in which uncertainty seeking will replace uncertainty aversion, concavity of v will replace convexity, and maxmax will supersede maxmin. For this “dual” theory, the update rule v_A(B) = v(B ∩ A)/v(A) would be the “appropriate” one (in a sense that will be clear shortly). Note that this rule was used—without axiomatization—as a definition of probability update in Gilboa (1989).
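Both update formulas can be transcribed directly. A minimal sketch on the Ellsberg capacity of Section 8.1 (the dictionary encoding of the capacity is an illustrative choice, not from the text):

```python
S = frozenset({'R', 'B', 'Y'})
v = {frozenset(): 0.0,
     frozenset({'R'}): 1/3, frozenset({'B'}): 0.0, frozenset({'Y'}): 0.0,
     frozenset({'R', 'B'}): 1/3, frozenset({'R', 'Y'}): 1/3,
     frozenset({'B', 'Y'}): 2/3, S: 1.0}

def ds_update(v, A):
    # "Pessimistic" / Dempster-Shafer rule:
    # v_A(B) = [v((B n A) u A^c) - v(A^c)] / (1 - v(A^c)); needs v(A^c) < 1.
    Ac = S - A
    return {B: (v[(B & A) | Ac] - v[Ac]) / (1.0 - v[Ac]) for B in v}

def opt_update(v, A):
    # "Optimistic" rule: v_A(B) = v(B n A) / v(A); needs v(A) > 0.
    return {B: v[B & A] / v[A] for B in v}

# Learning "not red" resolves no ambiguity: blue and yellow still get weight 0.
nr = ds_update(v, frozenset({'B', 'Y'}))
print(nr[frozenset({'B'})], nr[frozenset({'Y'})])   # 0.0 0.0

# Learning "red or blue" makes the updated capacity additive: 1/3 and 2/3.
rb = ds_update(v, frozenset({'R', 'B'}))
print(rb[frozenset({'R'})], rb[frozenset({'B'})])
```

The two conditioning events reproduce the behavior described in the introduction: complete ignorance about blue versus yellow is preserved in the first case and eliminated in the second.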
Taking a classical statistics point of view, it is natural to start out with a set of priors. Hence we only define classical update rules for B = MP^a. A natural procedure in the classical updating process is to rule out some of the given priors, and update the rest according to Bayes’ rule. Thus, we get a family of update rules, which differ in the way the priors are selected. Formally, a classical update rule is characterized by a function R which assigns to each such pair (C, A), A ∈ Σ, a closed and convex set of measures R(C, A) ⊆ C, with R(C, S) = C. The associated update rule will be denoted CU^R = {CU^R_A}_{A∈Σ}. (If R(C, A) = Ø we define CU^R_A(≽) = ≽*.) Note that these are indeed update rules, that is, for every ≽ ∈ MP^a, every R, and every A ∈ Σ, A^c is CU^R_A(≽)-null. Furthermore, for ≽ ∈ MP^a with an associated set C, CU^R_A(≽) ∈ MP^a provided that inf{p(A) | p ∈ R(C, A)} > 0 for all A ∈ Σ. Of particular interest will be the classical update rule called maximum likelihood and defined by

R^0(C, A) = {p ∈ C | p(A) = max_{q∈C} q(A) > 0}.

Theorem 8.2. CU^{R^0} is commutative on NA^a ∩ MP^a. Furthermore, for ≽ ∈ NA^a ∩ MP^a,

BU^{(x^*,S)}_A(≽) = CU^{R^0}_A(≽) ∈ NA^a ∩ MP^a.

That is, the Bayesian update rule with f = (x^*, S) coincides with the maximum-likelihood classical update rule. Moreover, they are also equivalent to the Dempster–Shafer update rule for belief functions. (Note that every belief function (see Shafer (1976)) is convex, though the converse is false. Yet one may apply the Dempster–Shafer rule for every NA v.)
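Theorem 8.2 can be checked numerically on the urn example: keeping only the likelihood-maximizing priors in the core and Bayes-updating them yields a set whose lower envelope is exactly the Dempster–Shafer update of v quoted earlier. A sketch, with the core extreme points enumerated by hand for this particular capacity:

```python
S = frozenset({'R', 'B', 'Y'})
v = {frozenset(): 0.0,
     frozenset({'R'}): 1/3, frozenset({'B'}): 0.0, frozenset({'Y'}): 0.0,
     frozenset({'R', 'B'}): 1/3, frozenset({'R', 'Y'}): 1/3,
     frozenset({'B', 'Y'}): 2/3, S: 1.0}
core = [{'R': 1/3, 'B': 0.0, 'Y': 2/3},    # extreme points of Core(v)
        {'R': 1/3, 'B': 2/3, 'Y': 0.0}]

A = frozenset({'R', 'B'})
Ac = S - A

def pr(p, E):
    return sum(p[s] for s in E)

def bayes(p, A):
    return {s: (p[s] / pr(p, A) if s in A else 0.0) for s in p}

# Maximum-likelihood classical rule: keep the p(A)-maximizers, then update them.
mx = max(pr(p, A) for p in core)
posteriors = [bayes(p, A) for p in core if pr(p, A) == mx]

# Dempster-Shafer update of the capacity v itself:
ds = {B: (v[(B & A) | Ac] - v[Ac]) / (1.0 - v[Ac]) for B in v}

# The lower envelope of the updated priors reproduces the updated capacity.
for B in v:
    assert abs(min(pr(p, B) for p in posteriors) - ds[B]) < 1e-12
```

Note that updating all priors instead (the "other extreme rule" of the introduction) would leave the lower probability of drawing blue at 0, whereas both rules checked here raise it to 2/3.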
8.4. Proofs and related analysis

Proof of Proposition 8.3. It only requires noting that for every f, g ∈ F and A, B ∈ Σ,

((g, A; f, A^c), B; f, B^c) = (g, A ∩ B; f, (A ∩ B)^c).

Proof of Theorem 8.1. First assume (ii). Let there be given ≽ ∈ NA^a with associated u and v. Define for B ∈ Σ an NA v_B by

v_B(A) = [v((A ∩ B) ∪ (T ∩ B^c)) − v(T ∩ B^c)]/[v(B ∪ T) − v(T ∩ B^c)]
if the denominator is positive. (Otherwise the result is trivial.) For every g ∈ F we have

∫_S u ∘ (g, B; f, B^c) dv = ∫_0^1 v({s | u ∘ (g, B; f, B^c)(s) ≥ t}) dt
  = ∫_0^1 v((T ∩ B^c) ∪ ({s | u ∘ g(s) ≥ t} ∩ B)) dt
  = ∫_0^1 [v(T ∩ B^c) + [v(B ∪ T) − v(T ∩ B^c)] v_B({s | u ∘ g(s) ≥ t})] dt
  = v(T ∩ B^c) + [v(B ∪ T) − v(T ∩ B^c)] ∫ u ∘ g dv_B,

where v_B and u represent BU^f_B(≽), which implies that the latter is in NA^a.

Conversely, assume (i) holds. Assume, to the contrary, that f(s) ∼ x for s ∈ D, where D ∈ Σ, D ≠ S, and x_* ≺ x ≺ x^* (where ∼ denotes ≽-equivalence). Let E, F ∈ Σ satisfy E ∩ F = E ∩ D = F ∩ D = Ø. Denote α = u(x) (where 0 < α < 1). Choose m ∈ (α, 1) and a nonadditive v such that

v(E) = v(F) = v(D) = m,
v(E ∪ F) = v(E ∪ D) = v(F ∪ D) = m,

and v(T) = v(T ∩ (E ∪ F ∪ D)) for all T ∈ Σ. Next define ≽ ∈ NA^a by v and (the unique) u. Choose g_1, g_2 such that

u ∘ g_1(s) = u ∘ g_2(s) = α for s ∈ D,
u ∘ g_1(s) = 1, u ∘ g_2(s) = α + (1 − α/m) for s ∈ E,
u ∘ g_1(s) = 0, u ∘ g_2(s) = α + (1 − α/m) for s ∈ F.

Let ≽′ be BU^f_{E∪F}(≽). By assumption it belongs to NA^a; hence, there correspond to it u′ = u and v′. Note that v′ is unique as ≽′ is nontrivial, and that v′(T) = v′(T ∩ (E ∪ F)) for all T ∈ Σ. As ∫ u ∘ g_1 dv = ∫ u ∘ g_2 dv, g_1 ∼ g_2, whence g_1 ∼′ g_2. Hence, ∫ u ∘ g_1 dv′ = ∫ u ∘ g_2 dv′, that is, v′(E) = α + (1 − α/m). Next choose β ∈ (0, α) and choose an act g_3 ∈ F such that

u ∘ g_3(s) = α for s ∈ D, β for s ∈ E, and 0 for s ∈ F.
For every γ ∈ (0, α) choose gγ ∈ F such that α s∈D u ◦ gγ (s) = γ s ∈ E ∪ F. Then u ◦ g3 dv = αm and u ◦gγ dv = αm + γ (1 − m). Hence, gγ > g3 and gγ > g3 for all γ > 0. However, u ◦ gγ dv = γ and u ◦ g3 dv = βv (E), where v (E) = 0, which is a contradiction. Remark 8.1. In the case of no extreme outcomes, that is, when X has no -maximal or no -minimal elements, and in particular when the utility is not bounded, there are no update rules BUf which map NA into itself. However, one may choose for g, h ∈ F ◦ x ∗ , x∗ ∈ X such that x ∗ g(s), h(s) x∗ , ∀s ∈ S, and for every f T ∈ define BUf () = {BUA }A∈ between g and h by f = (x ∗ , T ; x∗ , T c ). If ∈ NA, this definition is independent of the choice of x ∗ and x∗ . The resulting update rule will be commutative for any (fixed) T ∈ . Proof of Theorem 8.2. Let ∈ MP be given, and let C denote its associated set of additive measures. Define v(·) = minp∈C p(·). Assume that v is convex and C = Core(v). For A ∈ with q(A) > 0 for some q ∈ C, we have " % R 0 (C, A) = p ∈ C|p(A) = max q(A) = {p ∈ C|p(Ac ) = v(Ac )}. q∈C
R R R ∗ (Note that if v(Ac ) = 1, CUR A (CUB ()) = CUB (CUA () = .) As was shown in Schmeidler (1984), v is convex iff for every chain Ø = E0 ⊆ E1 ⊂ · · · ⊂ En = S there is an additive measure p in Core(v) = C such that p(Ei ) = v(Ei ), 0 i n. Furthermore, this requirement for n = 3 is also equivalent to convexity. Next define 0
0
vA (T ) = min{p(T ∩ A)|p ∈ R 0 (C, A)}. (T ) = v((T ∩ A) ∪ Ac ) − v(Ac ). Claim. vA
Proof. For p ∈ R 0 (C, A) we have p(T ∩ A) = p((T ∩ A) ∪ Ac ) − p(Ac ) = p((T ∩ A) ∪ Ac ) − v(Ac ) v((T ∩ A) ∪ Ac ) − v(Ac ) whence vA (T ) v((T ∩ A) ∪ Ac ) − v(Ac ).
0
0
166
Itzhak Gilboa and David Schmeidler
To show the converse inequality, consider the chain Ø ⊆ Ac ⊆ Ac ∪ (A ∩ T ) ⊆ S. By convexity there is p ∈ Core(v) = C satisfying p(Ac ) = v(Ac ) and p(Ac ∪ (T ∩ A)) = v(Ac ∪ (T ∩ A)) which also implies p ∈ R 0 (C, A). Then vA (T ) p(T ∩ A) = p((T ∩ A) ∪ Ac ) − p(Ac )
= v((T ∩ A) ∪ Ac ) − v(Ac ). ∗ c Consider CUR A (). If it is not equal to , it has to be the case that v(A ) < 1, and then it is defined by the set of additive measures 0
CA = {pA |p ∈ R 0 (C, A)} where pA (T ) = p(T ∩ A)/p(A) = p(T ∩ A)/(1 − v(Ac )). Note that CA is nonempty, closed, and convex. Define vA (T ) = min{p(T )|p ∈ CA }, (T )/(1 − v(Ac )), that is, and observe that vA (T ) = vA
vA (T ) = [v((T ∩ A) ∪ Ac ) − v(Ac )]/[1 − v(Ac )].
(8.1)
Hence, vA is also convex. We have to show that CA = Core(vA ). To see this, let p ∈ Core(vA ). We will show that p = qA for some q ∈ R 0 (C, A). Take any q ∈ Core(v) and define q(T ) = p(T ∩ A)[1 − v(Ac )] + q (T ∩ Ac ). Note that q(T ∩ A) = p(T ∩ A)[1 − v(Ac )] vA (T ∩ A)[1 − v(Ac )] = v((T ∩ A) ∪ Ac ) − v(Ac ). (As p ∈ Core(vA ) and by definition of the latter.) Next, since q ∈ Core(v), q(T ∩ Ac ) = q (T ∩ Ac ) v(T ∩ Ac ). Hence, q(T ) = q(T ∩ A) + q(T ∩ Ac ) v((T ∩ A) ∪ Ac ) − v(Ac ) + v(T ∩ Ac ) = v(T ∪ Ac ) − v(Ac ) + v(T ∩ Ac ) v(T ), where the last inequality follows from the convexity of v. Finally, q(S) = q(A) + q(Ac ) = p(A)[1 − v(Ac )] + v(Ac ) = 1. Hence, q ∈ Core(v). Furthermore, q ∈ R 0 (C, A). Obviously, p = qA .
Thus we establish CU^{R0}_A(≽) ∈ N_A. Furthermore, CU^{R0}_A(≽) = BU^{(x*,S)}_A(≽), and the non-additive probability update rule (8.1) coincides with the Dempster–Shafer rule. Either of these two facts, combined with the observation CU^{R0}_A(≽) ∈ N_A, implies that CU^{R0} is commutative.
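The closed form (8.1) is easy to experiment with numerically. The following sketch is our own illustration (the capacity v(E) = (|E|/|S|)^2 is a hypothetical convex example, not from the text); it implements rule (8.1) and checks that the update of this convex capacity is again a normalized convex capacity:

```python
from itertools import combinations

def subsets(S):
    S = list(S)
    return [frozenset(c) for r in range(len(S) + 1) for c in combinations(S, r)]

def is_convex(v, S):
    ev = subsets(S)
    return all(v[X | Y] + v[X & Y] >= v[X] + v[Y] - 1e-12 for X in ev for Y in ev)

def ds_update(v, A, S):
    # rule (8.1): v_A(T) = [v((T ∩ A) ∪ A^c) − v(A^c)] / [1 − v(A^c)]
    Ac = S - A
    assert v[Ac] < 1
    return {T: (v[(T & A) | Ac] - v[Ac]) / (1 - v[Ac]) for T in subsets(A)}

S = frozenset({1, 2, 3})
# hypothetical convex capacity: v(E) = (|E|/|S|)^2 (x -> x^2 convex, hence supermodular)
v = {E: (len(E) / len(S)) ** 2 for E in subsets(S)}
A = frozenset({1, 2})

vA = ds_update(v, A, S)
assert is_convex(v, S) and is_convex(vA, A)             # convexity is preserved here
assert vA[frozenset()] == 0 and abs(vA[A] - 1) < 1e-12  # normalized capacity on A
print(round(vA[frozenset({1})], 6))                     # 0.375 = (4/9 - 1/9)/(8/9)
```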
Remark 8.2. It is not difficult to see that the maximum-likelihood update rule is not commutative in general. In fact, one may ask whether the converse of Theorem 8.2 is true, that is, whether a relation ≽ ∈ M_P w.r.t. which CU^{R0} is commutative has to define a set C which is the core of some non-additive measure. The negative answer is given by the following example: S = {1, 2, 3, 4}, Σ = 2^S, C = conv{p1, p2} defined by
        1     2     3     4
p1     0.7   0.1   0.1   0.1
p2     0.1   0.3   0.3   0.3
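One way to see that this C is not the core of any non-additive measure is to exhibit a measure dominating the lower envelope v(E) = min(p1(E), p2(E)) that lies outside conv{p1, p2}: any v with C ⊆ Core(v) satisfies v ≤ this envelope, so such a witness belongs to Core(v) \ C. A numerical sketch (the witness q is our choice, not from the original text):

```python
from itertools import combinations

p1 = {"1": 0.7, "2": 0.1, "3": 0.1, "4": 0.1}
p2 = {"1": 0.1, "2": 0.3, "3": 0.3, "4": 0.3}
q  = {"1": 0.4, "2": 0.3, "3": 0.2, "4": 0.1}  # witness measure (our choice)

def mass(p, E):
    return sum(p[s] for s in E)

events = [set(c) for r in range(5) for c in combinations("1234", r)]

# q dominates the lower envelope v(E) = min(p1(E), p2(E)) on every event...
assert all(mass(q, E) >= min(mass(p1, E), mass(p2, E)) - 1e-12 for E in events)

# ...but q is not a mixture t*p1 + (1-t)*p2: matching state 1 forces t = 1/2,
# which gives the wrong mass to state 2. Hence q lies in Core(v) \ C.
t = (q["1"] - p2["1"]) / (p1["1"] - p2["1"])
assert abs(t - 0.5) < 1e-12
assert abs(t * p1["2"] + (1 - t) * p2["2"] - q["2"]) > 1e-9
print("q dominates the envelope but is outside conv{p1, p2}")
```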
It is easily verifiable that the maximum-likelihood update rule is commutative w.r.t. the induced ≽ ∈ M_P, though C is not the core of any v.

Remark 8.3. It seems that the maximum-likelihood update rule is not commutative in general because it lacks some "look-ahead" property. One is tempted to define an update rule that will retain all the priors which may, at some point in the future, turn out to be likelihood maximizers. Thus, we are led to the "semi-generalized maximum likelihood" rule:

R^1(C, A) = cl conv{p ∈ C | p(E) = max_{q∈C} q(E) > 0 for some measurable E ⊆ A}

(where cl means closure in the weak* topology). Note that the resulting set of measures may include p ∈ C such that p(A) = 0. In this case, define CU^{R1}_A(≽) = ≽*. However, the following example shows that this update rule also fails to be commutative in general. Consider S = {1, 2, 3, 4, 5}, Σ = 2^S, and let C be conv{p1, p2, p3, p4} defined by the following table:
        1      2      3      4      5
p1     0.2    0.2    0.01   0.09   0.5
p2     0      0      0.4    0.4    0.2
p3     0.27   0      0.03   0      0.7
p4     0      0.27   0.03   0      0.7
Taking A = {1, 2, 3, 4} and B = {1, 2, 3}, one may verify that R^1(R^1(C, A), B) = {p2, p3, p4} and R^1(C, B) = {p1, p2, p3, p4}, and that p1_B is not in the convex hull of {p2_B, p3_B, p4_B}.

We may try an even more generalized version of the maximum-likelihood criterion: retain all priors according to which the integral of some non-negative simple function is maximized. That is, define

R^2(C, A) = cl conv{p ∈ C | ∫ u ◦ f dp = max_{q∈C} ∫ u ◦ f dq > 0 for some f ∈ F◦}.

The maximization of ∫ u ◦ f dp for some f may be viewed as maximization of some convex combination of the likelihood function at several points of time. However, the same example shows that CU^{R2} is not commutative in general.

Remark 8.4. Although our results are formulated for NA and M_P, they may be generalized easily. First, one should note that none of the results actually requires that X be a mixture space. All that is needed is that the utility on X be uniquely defined (up to a positive linear transformation) and that its range contain an open interval. In particular, connected topological spaces with a continuous utility function will do. Moreover, most of the results do not even require such richness of the utility's range. In fact, this richness was only used in the proof of (i) ⇒ (iii) in Theorem 8.1.
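The verification suggested for the example above can be carried out numerically. In the sketch below (our own illustration), the inner update is read as selecting the maximizing extreme points and then Bayes-conditioning them on A; with that reading, the second application of R^1 drops p1, while the direct update by B retains it, so commutativity fails:

```python
from itertools import combinations

P = {
    "p1": (0.20, 0.20, 0.01, 0.09, 0.50),
    "p2": (0.00, 0.00, 0.40, 0.40, 0.20),
    "p3": (0.27, 0.00, 0.03, 0.00, 0.70),
    "p4": (0.00, 0.27, 0.03, 0.00, 0.70),
}
STATES = range(5)  # states 1..5 encoded as indices 0..4

def mass(q, E):
    return sum(q[s] for s in E)

def condition(q, E):  # Bayesian conditioning of an additive measure on E
    m = mass(q, E)
    return tuple(q[s] / m if s in E else 0.0 for s in STATES)

def events(A):  # nonempty subsets of A
    A = list(A)
    return [frozenset(c) for r in range(1, len(A) + 1) for c in combinations(A, r)]

def r1(extreme, A):
    # keep the extreme points attaining max_q q(E) > 0 for some E ⊆ A
    keep = set()
    for E in events(A):
        best = max(mass(q, E) for q in extreme.values())
        if best > 0:
            keep |= {k for k, q in extreme.items() if abs(mass(q, E) - best) < 1e-12}
    return {k: extreme[k] for k in keep}

A, B = frozenset({0, 1, 2, 3}), frozenset({0, 1, 2})

via_A = {k: condition(q, A) for k, q in r1(P, A).items()}  # update by A, condition on A
assert set(r1(via_A, B)) == {"p2", "p3", "p4"}             # iterated update drops p1
assert set(r1(P, B)) == {"p1", "p2", "p3", "p4"}           # direct update by B keeps it
print("commutativity fails")
```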
Acknowledgments We thank James Dow, Joe Halpern, Jean-Yves Jaffray, Bart Lipman, Klaus Nehring, Sergiu Werlang, and an anonymous referee for stimulating discussions and comments. NSF grants SES-9113108 and SES-9111873 are gratefully acknowledged.
References

C. E. Agnew (1985) Multiple probability assessments by dependent experts, J. Amer. Statist. Assoc., 80, 343–347.
F. J. Anscombe and R. J. Aumann (1963) A definition of subjective probability, Ann. Math. Statist., 34, 199–205.
T. Bewley (1986) "Knightian Decision Theory: Part I," Cowles Foundation Discussion Paper No. 807, Yale University.
T. Bewley (1987) "Knightian Decision Theory: Part II: Intertemporal Problems," Cowles Foundation Discussion Paper No. 835, Yale University.
T. Bewley (1988) "Knightian Decision Theory and Econometric Inference," Cowles Foundation Discussion Paper No. 868, Yale University.
A. Chateauneuf and J.-Y. Jaffray (1989) Some characterizations of lower probabilities and other monotone capacities through the use of Moebius inversion, Mathematical Social Sciences, 17, 263–283.
G. Choquet (1953–1954) Theory of capacities, Ann. l'Institut Fourier, 5, 131–295.
A. P. Dempster (1967) Upper and lower probabilities induced by a multivalued mapping, Ann. Math. Statist., 38, 325–339.
A. P. Dempster (1968) A generalization of Bayesian inference, J. Roy. Statist. Soc. Series B, 30, 205–247.
J. Dow and S. R. C. Werlang (1990) "Risk Aversion, Uncertainty Aversion and the Optimal Choice of Portfolio," London Business School Working Paper. (Reprinted as Chapter 17 in this volume.)
J. Dow, V. Madrigal, and S. R. C. Werlang (1989) "Preferences, Common Knowledge and Speculative Trade," Fundacao Getulio Vargas Working Paper.
D. Ellsberg (1961) Risk, ambiguity and the Savage axioms, Quart. J. Econ., 75, 643–669.
R. Fagin and J. Y. Halpern (1989) A new approach to updating beliefs, mimeo.
P. C. Fishburn (1988) Uncertainty aversion and separated effects in decision making under uncertainty, in "Combining Fuzzy Imprecision with Probabilistic Uncertainty in Decision Making" (J. Kacprzyk and M. Fedrizzi, eds), pp. 10–25, Springer-Verlag, New York/Berlin.
C. Genest and M. J. Schervish (1985) Modeling expert judgments for Bayesian updating, Ann. Statist., 13, 1198–1212.
I. Gilboa (1987) Expected utility theory with purely subjective non-additive probabilities, J. Math. Econ., 16, 65–88.
I. Gilboa (1989) Additivizations of non-additive measures, Math. Operations Res., 14, 1–17.
I. Gilboa and D. Schmeidler (1989) Maxmin expected utility with non-unique prior, J. Math. Econ., 18, 141–153. (Reprinted as Chapter 6 in this volume.)
J. Y. Halpern and R. Fagin (1989) Two views of belief: Belief as generalized probability and belief as evidence, mimeo.
J. Y. Halpern and M. R. Tuttle (1989) Knowledge, probability and adversaries, mimeo.
J. Y. Jaffray (1989) Linear utility theory for belief functions, Operations Res. Lett., 8, 107–112.
J. Y. Jaffray (1990) Bayesian updating and belief functions, mimeo.
D. V. Lindley, A. Tversky, and R. V. Brown (1979) On the reconciliation of probability assessments, J. Roy. Statist. Soc. Series A, 142, 146–180.
P. Milgrom and N. Stokey (1982) Information, trade and common knowledge, J. Econ. Theory, 26, 17–27.
J. von Neumann and O. Morgenstern (1947) "Theory of Games and Economic Behavior," 2nd edn, Princeton University Press, Princeton, NJ.
L. J. Savage (1954) "The Foundations of Statistics," Wiley, New York.
D. Schmeidler (1982) "Subjective Probability Without Additivity" (temporary title), Working Paper, Foerder Institute for Economic Research, Tel Aviv University.
D. Schmeidler (1984) Nonadditive probabilities and convex games, mimeo.
D. Schmeidler (1986) Integral representation without additivity, Proc. Amer. Math. Soc., 97, 253–261.
D. Schmeidler (1989) Subjective probability and expected utility without additivity, Econometrica, 57, 571–587. (Reprinted as Chapter 5 in this volume.)
G. Shafer (1976) "A Mathematical Theory of Evidence," Princeton University Press, Princeton, NJ.
L. S. Shapley (1971) Cores of convex games, Int. J. Game Theory, 1, 11–26. (First circulated as "Notes on n-Person Games VII: Cores of Convex Games," The RAND Corporation, RM, 1965.)
M. H. Simonsen and S. R. C. Werlang (1990) Subadditive probabilities and portfolio inertia, mimeo.
Ph. Smets (1986) "Combining Non-distinct Evidences," Technical report ULB-IRIDIA-86/3.
P. Wakker (1989) Continuous subjective expected utility with non-additive probabilities, J. Math. Econ., 18, 1–17.
K. R. Yoo (1990) A theory of the underpricing of initial public offerings, mimeo.
9
A definition of uncertainty aversion Larry G. Epstein
9.1. Introduction

9.1.1. Objectives

The concepts of risk and risk aversion are cornerstones of a broad range of models in economics and finance. In contrast, relatively little attention is paid in formal models to the phenomenon of uncertainty, which is arguably more prevalent than risk. The distinction between them is roughly that risk refers to situations where the perceived likelihoods of events of interest can be represented by probabilities, whereas uncertainty refers to situations where the information available to the decision-maker is too imprecise to be summarized by a probability measure. Thus the terms "vagueness" or "ambiguity" can serve as close substitutes. Ellsberg, in his famous experiment, demonstrated that such a distinction is meaningful empirically but cannot be accommodated within the subjective expected utility (SEU) model. Perhaps because this latter model has been so dominant, our formal understanding of uncertainty and uncertainty aversion is poor.

There exists a definition of uncertainty aversion, due to Schmeidler (1989), for the special setting of Anscombe–Aumann (AA) horse-race/roulette-wheel acts. Though it has been transported and widely adopted in models employing the Savage domain of acts, I feel that it is both less appealing and less useful in such contexts. Because the Savage domain is typically more appropriate and also more widely used in descriptive modeling, this suggests the need for an alternative definition of uncertainty aversion that is more suited to applications in a Savage domain. Providing such a definition is the objective of this chapter.

Uncertainty aversion is defined for a large class of preferences. This is done for the obvious reason that a satisfactory understanding of uncertainty aversion can be achieved only if its meaning does not rely on preference axioms that are auxiliary rather than germane to the issue.
Epstein, L. G. (1999) "A Definition of Uncertainty Aversion," Review of Economic Studies, 66, 579–608.

On the other hand, Choquet expected utility (CEU) theory (Schmeidler, 1989) and its close relative, the multiple-priors model
(Gilboa and Schmeidler, 1989), provide important examples for understanding the nature of our definition, as they are the most widely used and studied theories of preference that can accommodate Ellsberg-type behavior. Recall that risk aversion has been defined and characterized for general preferences, including those that lie outside the expected utility class (see Yaari (1969) and Chew and Mao (1995), for example).

There is a separate technical or methodological contribution of the chapter. After the formulation and initial examination of the definition of uncertainty aversion, subsequent analysis is facilitated by assuming eventwise differentiability of utility. The role of eventwise differentiability may be described roughly as follows: the notion of uncertainty aversion leads to concern with the "local probabilistic beliefs" implicit in an arbitrary preference order or utility function. These beliefs represent the decision-maker's underlying "mean" or "ambiguity-free" likelihood assessments for events. In general, they need not be unique. But they are unique if utility is eventwise differentiable (given suitable additional conditions).

Further perspective is provided by recalling the role of differentiability in decision theory under risk, where utility functions are defined on cumulative distribution functions. Much as calculus is a powerful tool generally, Machina (1982) has shown that differential methods are useful in decision theory under risk. He employs Frechet differentiability; others have shown that Gateaux differentiability suffices for many purposes (Chew et al., 1987). In the present context of decision-making under uncertainty, where utility functions are defined over acts, the preceding two notions of differentiability are not useful for the task of uncovering implicit local beliefs.
On the other hand, eventwise differentiability "works." Because local probabilistic beliefs are likely to be useful more broadly, so, it seems, will be the notion of eventwise differentiability. It must be acknowledged, however, that eventwise differentiability has close relatives in the literature, namely in Rosenmuller (1972) and Machina (1992).1 The differences from this chapter and the value added here are clarified later (Appendix C). It seems accurate to say that this chapter adds to the demonstration in Machina (1992) that differential techniques are useful also for analysis of decision-making under uncertainty.

The chapter proceeds as follows: The Schmeidler definition of uncertainty aversion is examined first. This is accompanied by examples that motivate the search for an alternative definition. Then, because the parallel with the well understood theory of risk aversion is bound to be helpful, relevant aspects of that theory are reviewed. A new definition of uncertainty aversion is formulated in the remainder of Section 9.2 and some attractive properties are described in Section 9.3. In particular, uncertainty aversion is shown to have intuitive empirical content and to admit simple characterizations within the CEU and multiple-priors models. The notion of "eventwise derivative" and the analysis of uncertainty aversion given eventwise differentiability follow in Section 9.4. It is shown that eventwise differentiability of utility simplifies the task of checking whether the corresponding preference order is uncertainty averse and thus enhances the tractability of the proposed definition. Section 9.5 concludes with remarks on the significance of the choice between the domain of Savage acts versus the larger AA domain of horse-race/roulette-wheel
acts. This difference in domains is central to understanding the relation between this chapter and Schmeidler (1989).

Two important limitations of the analysis should be acknowledged at the start. First, uncertainty aversion is defined relative to an exogenously specified collection of events A. Events in A are thought of as unambiguous or uncertainty free. They play a role here parallel to that played by constant (or risk-free) acts in the standard analysis of risk aversion. However, whether or not an event is ambiguous is naturally viewed as subjective or derived from preference. Accordingly, it seems desirable to define uncertainty aversion relative to the collection of subjectively unambiguous events. Unfortunately, such a formulation is beyond the scope of this chapter.2 In defense of the exogenous specification of the collection A, observe that Schmeidler (1989) relies on a comparable specification through the presence of objective lotteries in the AA domain. In addition, it seems likely that given any future success in endogenizing ambiguity, the present analysis of uncertainty aversion relative to a given collection A will be useful.

The other limitation concerns the limited success in this chapter in achieving the ultimate objective of deriving the behavioral consequences of uncertainty aversion. The focus here is on the definition of uncertainty aversion. Some behavioral implications are derived but much is left for future work. In particular, applications to standard economic contexts, such as asset pricing or games, are beyond the scope of the chapter. However, the importance of the groundwork laid here for future applications merits emphasis—an essential precondition for understanding the behavioral consequences of uncertainty aversion is that the latter term has a precise and intuitively satisfactory meaning.
Admittedly, there have been several papers in the literature claiming to have derived consequences of uncertainty aversion for strategic behavior and also for asset pricing. To varying degrees these studies either adopt the Schmeidler definition of uncertainty aversion or they do not rely on a precise definition. In the latter case, they adopt a model of preference that has been developed in order to accommodate an intuitive notion of uncertainty aversion and interpret the implications of this preference specification as due to uncertainty aversion. (This author is partly responsible for such an exercise (Epstein and Wang, 1995); there are other examples in the literature.) There is an obvious logical flaw in such a procedure, and the claims made (or the interpretations proposed) are unsupportable without a satisfactory definition of uncertainty aversion.

9.1.2. The current definition of uncertainty aversion

In order to motivate the chapter further, consider briefly Schmeidler's definition of uncertainty aversion. See Section 9.5 for a more complete description and for a discussion of the importance of the choice between the AA domain (as in Schmeidler, 1989) and the Savage domain (as in this chapter). Fix a state space (S, Σ), where Σ is an algebra, and an outcome set X. Denote by F the Savage domain, that is, the set of all finite-ranged (simple) and measurable acts e from (S, Σ) into X. Choice behavior relative to F is the object of study.
Accordingly, postulate a preference order ≽ and a representing utility function U defined on F. Schmeidler's definition of uncertainty aversion has been used primarily in the context of CEU theory, according to which uncertain prospects are evaluated by a utility function having the following form:

U^{ceu}(e) = ∫_S u(e) dν,  e ∈ F.   (9.1)

Here, u : X → R^1 is a vNM utility index, ν is a capacity (or nonadditive probability) on Σ, integration is in the sense of Choquet, and other details will be provided later.3 For such a preference order, uncertainty aversion in the sense of Schmeidler is equivalent to convexity of the capacity ν, that is, to the property whereby

ν(A ∪ B) + ν(A ∩ B) ≥ ν(A) + ν(B),   (9.2)
for all measurable events A and B. Additivity is a special case that characterizes uncertainty neutrality (suitably defined). However, Ellsberg's single-urn experiment illustrates the weak connection between convexity of the capacity and behavior that is intuitively uncertainty averse.4 The urn is represented by the state space S = {R, B, G}, where the symbols represent the possible colors, red, blue, and green, of a ball drawn at random from an urn. The information provided the decision-maker is that the urn contains 30 red balls and 90 balls in total. Thus, while he knows that there are 60 balls that are either blue or green, the relative proportions of each are not given. Let ≽ be the decision-maker's preference over bets on events E ⊂ S. Typical choices in such a situation correspond to the following rankings of events:5

{R} ≻ {B} ∼ {G},  {B, G} ≻ {R, B} ∼ {R, G}.   (9.3)
The intuition for these rankings is well known and is based on the fact that {R} and {B, G} have objective probabilities, while the other events are "ambiguous," or have "ambiguous probabilities." Thus these rankings correspond to an intuitive notion of uncertainty or ambiguity aversion. Next suppose that the decision-maker has CEU preferences with capacity ν. Then convexity is neither necessary nor sufficient for the above rankings. For example, if ν(R) = 8/24, ν(B) = ν(G) = 7/24, ν({B, G}) = 13/24, and ν({R, G}) = ν({R, B}) = 1/2, then (9.3) is implied but ν is not convex (consider the disjoint events {B} and {G}: ν({B, G}) + ν(∅) = 13/24 < ν({B}) + ν({G}) = 14/24). For the fact that convexity is not sufficient, observe that convexity does not even exclude the "opposite" rankings that intuitively reflect an affinity for ambiguity. (Let ν(R) = 1/12, ν(B) = ν(G) = 1/6, ν({B, G}) = 1/3, ν({R, G}) = ν({R, B}) = 1/2.) An additional example, taken from Zhang (1997), will reinforce the point just made and also illustrate a key feature of the analysis to follow. An urn contains 100 balls in total, with color composition R, B, W, and G, such
that R + B = 50 = G + B. Thus S = {R, B, G, W} and the collection A = {∅, S, {B, G}, {R, W}, {B, R}, {G, W}} contains the events that are intuitively unambiguous. It is natural to suppose that the decision-maker would use the probability measure p on A, where p assigns probability 1/2 to each binary event. For other subsets of S, she might use the capacity p∗ defined by6

p∗(E) = sup{p(B) : B ⊂ E, B ∈ A},  E ⊂ S.

The fact that the capacity of any E is computed by means of an inner approximation by unambiguous events seems to capture a form of aversion to ambiguity. However, p∗ is not convex, because

1 = p∗({B, G}) + p∗({B, R}) > p∗({B, G, R}) + p∗({B}) = 1/2.
Finally, observe that the collection A is not an algebra, because it is not closed with respect to intersections. Each of {R, B} and {G, B} is unambiguous, but {B} is ambiguous, showing that an algebra is not the appropriate mathematical structure for modeling collections of unambiguous events. This important insight is due to Zhang (1997). He further proposes an alternative structure, called a λ-system, that is adopted in Section 9.2.2.
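The claims about the capacities in this subsection (the two Ellsberg capacities and Zhang's inner capacity p∗) are finite checks. A sketch in Python (our own illustration; events are encoded as sets of color letters):

```python
from itertools import combinations

def subsets(S):
    S = list(S)
    return [frozenset(c) for r in range(len(S) + 1) for c in combinations(S, r)]

def is_convex(v, S):
    ev = subsets(S)
    return all(v[A | B] + v[A & B] >= v[A] + v[B] - 1e-12 for A in ev for B in ev)

def cap(d):  # build a capacity from a {label-string: value} dict
    return {frozenset(k): x for k, x in d.items()}

S3 = frozenset("RBG")
# capacity implying the Ellsberg rankings (9.3), yet not convex
v1 = cap({"": 0, "R": 8/24, "B": 7/24, "G": 7/24,
          "BG": 13/24, "RG": 12/24, "RB": 12/24, "RBG": 1})
# convex capacity implying the opposite, uncertainty-loving rankings
v2 = cap({"": 0, "R": 1/12, "B": 1/6, "G": 1/6,
          "BG": 1/3, "RG": 1/2, "RB": 1/2, "RBG": 1})

assert v1[frozenset("R")] > v1[frozenset("B")]    # bet on {R} over {B}
assert v1[frozenset("BG")] > v1[frozenset("RB")]  # bet on {B,G} over {R,B}
assert not is_convex(v1, S3)  # e.g. v1({B,G}) + v1(∅) < v1({B}) + v1({G})
assert is_convex(v2, S3)

# Zhang's inner capacity p* is likewise not convex
S4 = frozenset("RBGW")
unamb = [frozenset(x) for x in ["", "RBGW", "BG", "RW", "BR", "GW"]]
p = {E: len(E) / 4 for E in unamb}  # probability 1/2 on each binary event
pstar = {E: max(p[B] for B in unamb if B <= E) for E in subsets(S4)}
assert not is_convex(pstar, S4)
print("all checks pass")
```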
9.2. Aversion to risk and uncertainty

9.2.1. Risk aversion

Recall first some aspects of the received theory of risk aversion. This will provide some perspective for the analysis of uncertainty aversion. In addition, it will become apparent that if a distinction between risk and uncertainty is desired, then the theory of risk aversion must be modified. Because a subjective approach to risk aversion is the relevant one, adapt Yaari's analysis (Yaari, 1969), which applies to the primitives (S, Σ), X ⊂ R^N, and ≽, a preference over the set of acts F. Turn first to "comparative risk aversion." Say that ≽2 is more risk averse than ≽1 if for every act e and outcome x,

x ≽1 (≻1) e ⇒ x ≽2 (≻2) e.   (9.4)
The two acts that are being compared here differ in that the variable outcomes prescribed by e are replaced by the single outcome x. The intuition for this definition is clear given the identification of constant acts with the absence of risk or perfect certainty. To define absolute (rather than comparative) risk aversion, it is necessary to adopt a “normalization” for risk neutrality. Note that this normalization is exogenous to the model. The standard normalization is the “expected value function,”
that is, risk-neutral orders ≽rn are those satisfying

e ≽rn e′ ⟺ ∫_S e(s) dm(s) ≽rn ∫_S e′(s) dm(s),   (9.5)
for some probability measure m on (S, Σ), where the R^N-valued integrals are interpreted as constant acts and accordingly are ranked by ≽rn. This leads to the following definition of risk aversion: Say that ≽ is risk averse if there exists a risk-neutral order ≽rn such that ≽ is more risk averse than ≽rn. Risk loving and risk neutrality can be defined in the obvious ways. In the SEU framework, this notion of risk aversion is the familiar one characterized by concavity of the vNM index, with the required m being the subjective beliefs or prior. By examining the implications of risk aversion for choice between binary acts, Yaari (1969) argues that this interpretation for m extends to more general preferences.

Three points from this review merit emphasis. First, the definition of comparative risk aversion requires an a priori definition for the absence of risk. Observe that the identification of risklessness with constant acts is not tautological. For example, Karni (1983) argues that in a state-dependent expected utility model "risklessness" may very well correspond to acts that are not constant. Thus the choice of how to model risklessness is a substantive normalization that precedes the definition of "more risk averse." Second, the definition of risk aversion requires further an a priori definition of risk neutrality. The final point is perhaps less evident or familiar. Consider rankings of the sort used in (9.4) to define "more risk averse." A decision-maker may prefer the constant act because she dislikes variable outcomes even when they are realized on events that are understood well enough to be assigned probabilities (risk aversion). Alternatively, the reason for the indicated preference may be that the variable outcomes occur on events that are ambiguous and because she dislikes ambiguity or uncertainty.
Thus it seems more appropriate to describe (9.4) as revealing that ≽2 is "more risk and uncertainty averse than ≽1," with no attempt being made at a distinction. However, the importance of the distinction between these two underlying reasons seems self-evident; it is reflected also in recent concern with formal models of "Knightian uncertainty" and decision theories that accommodate the Ellsberg (as opposed to Allais) Paradox. The second possibility mentioned earlier can be excluded, and thus a distinction made, by assuming that the decision-maker is indifferent to uncertainty or, put another way, by assuming that there is no uncertainty (all events are assigned probabilities). But these are extreme assumptions that are contradicted in Ellsberg-type situations. This chapter identifies and focuses upon the uncertainty aversion component implicit in the comparisons (9.4) and, to a limited extent, achieves a separation between risk aversion and uncertainty aversion.
9.2.2. Uncertainty aversion

Once again, consider orders ≽ on F, where for the rest of the chapter the outcome set X is arbitrary rather than Euclidean. The objective now is to formulate intuitive notions of comparative and absolute uncertainty aversion. Turn first to comparative uncertainty aversion. It is clear intuitively, and also from the discussion of risk aversion, that one can proceed only given a prior specification of the "absence of uncertainty." This specification takes the form of an exogenous family A ⊂ Σ of "unambiguous" events. Assume throughout the following intuitive requirements for A: it contains S;

A ∈ A implies that A^c ∈ A; and
A1, A2 ∈ A and A1 ∩ A2 = ∅ imply that A1 ∪ A2 ∈ A.
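Zhang's urn collection from Section 9.1.2 satisfies exactly these closure properties. A quick check (our own sketch) confirms that it is a λ-system but not an algebra:

```python
S = frozenset("RBGW")
A = [frozenset(x) for x in ["", "RBGW", "BG", "RW", "BR", "GW"]]

def is_lambda_system(A, S):
    # contains S; closed under complements; closed under disjoint unions
    return (S in A
            and all((S - E) in A for E in A)
            and all((E | F) in A for E in A for F in A if not (E & F)))

def is_algebra(A, S):
    # an algebra would additionally be closed under intersections
    return is_lambda_system(A, S) and all((E & F) in A for E in A for F in A)

assert is_lambda_system(A, S)
assert not is_algebra(A, S)  # {B,G} ∩ {B,R} = {B} is not in A
print("lambda-system but not an algebra")
```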
Zhang (1997) argues that these properties are natural for a collection of unambiguous events and, following Billingsley (1986: 36), calls such collections λ-systems. Intuitively, if an event being unambiguous means that it can be assigned a probability by the decision-maker, then the sum of the individual probabilities is naturally assigned to a disjoint union, while the complementary probability is naturally assigned to the complementary event. As demonstrated earlier, it is not intuitive to require that A be closed with respect to non-disjoint unions or intersections, that is, that A be an algebra. Denote by F^ua the set of A-measurable acts, also called unambiguous acts. The following definition parallels the earlier one for comparative risk aversion. Given two orderings, say that ≽2 is more uncertainty averse than ≽1 if for every unambiguous act h and every act e in F,

h ≽1 (≻1) e ⇒ h ≽2 (≻2) e.   (9.6)
There is no loss of generality in supposing that the acts h and e deliver the identical outcomes. The difference between the acts lies in the nature of the events where these outcomes are delivered (some of these events may be empty). For h, the typical outcome x is delivered on the unambiguous event h^{-1}(x), while it occurs on an ambiguous event given e. Then whenever the greater ambiguity inherent in e leads ≽1 to prefer h, the more ambiguity averse ≽2 will also prefer h. This interpretation relies on the assumption that each event in A is unambiguous and thus is (weakly) less ambiguous than any E ∈ Σ.

Fix an order ≽. To define absolute (rather than comparative) uncertainty aversion for ≽, it is necessary to adopt a "normalization" for uncertainty neutrality. As in the case of risk, a natural though exogenous normalization exists, namely that preference is based on probabilities in the sense of being probabilistically sophisticated as defined in Machina and Schmeidler (1992). The functional form of representing utility functions reveals clearly the sense in which preference is based on probabilities. The components of that functional form are a probability measure m on the state space (S, Σ) and a functional W : Δ(X) → R^1, where Δ(X) denotes the set of all simple (finite-support) probability measures
on the outcome set X. Using m, any act e induces such a probability distribution, denoted ψ_{m,e}. Probabilistic sophistication requires that e be evaluated only through the distribution over outcomes ψ_{m,e} that it induces. More precisely, utility has the form

U^{ps}(e) = W(ψ_{m,e}),  e ∈ F.   (9.7)
Following Machina and Schmeidler (1992: 754), assume also that W is strictly increasing in the sense of first-order stochastic dominance, suitably defined.7 Denote any such order by ≽ps. A decision-maker with ≽ps assigns probabilities to all events and in this way transforms any act into a lottery, or pure risk. Such exclusive reliance on probabilities is, in particular, inconsistent with the typical "uncertainty averse" behavior exhibited in Ellsberg-type experiments. Thus it is both intuitive and consistent with common practice to identify probabilistic sophistication with uncertainty neutrality. Think of m and W as the "beliefs" (or probability measure) and "risk preferences" underlying ≽ps.8

This normalization leads to the following definition: Say that ≽ is uncertainty averse if there exists a probabilistically sophisticated order ≽ps such that ≽ is more uncertainty averse than ≽ps. In other words, under the conditions stated in (9.6),

h ≽ps (≻ps) e ⇒ h ≽ (≻) e.   (9.8)
The intuition is similar to that for (9.6). It is immediate that ≽ and ≽ps agree on unambiguous acts. Further, ≽ps is indifferent to uncertainty and thus views all acts as being risky only. Therefore, interpret (9.8) as stating that ≽ps is a "risk preference component" of ≽. The indefinite article is needed for two reasons: first, because all definitions depend on the exogenously specified collection A, and second, because ≽ps need not be unique even given A. Subject to these same qualifications, the probability measure underlying ≽ps is naturally interpreted as "mean" or "uncertainty-free" beliefs underlying ≽. The formal analysis stated later does not depend on these interpretations.

It might be useful to adapt familiar terminology and refer to ≽ps satisfying (9.8) as constituting a support for ≽ at h. Then uncertainty aversion for ≽ means that there exists a single order ≽ps supporting ≽ at every unambiguous act. A parallel requirement in consumer theory is that there exist a single price vector supporting the indifference curve at each consumption bundle on the 45° line. (This parallel is developed further in Section 9.3.4 and via Theorem 9.3(c).)

Turn next to uncertainty loving and uncertainty neutrality. For the definition of the former, reverse the inequalities in (9.8). That is, say that ≽ is uncertainty loving if there exists a probabilistically sophisticated order ≽ps such that, under the conditions stated in (9.6),

h ≼ps (≺ps) e ⇒ h ≼ (≺) e.   (9.9)
The conjunction of uncertainty aversion and uncertainty loving is called uncertainty neutrality.
9.2.3. A degree of separation

Consider the question of a separation between attitudes toward uncertainty and attitudes toward risk. Suppose that ≽ is uncertainty averse with support ≽ps. Because ≽ and ≽ps agree on the set F^ua of unambiguous acts, ≽ is probabilistically sophisticated there. Thus, treating the probability measure underlying ≽ps as objective, one may adopt the standard notion of risk aversion (or loving) for objective lotteries (see, e.g., Machina (1982)) in order to give precise meaning to the statement that ≽ is risk averse (or loving). In the same way, such risk attitudes are well defined if ≽ is uncertainty loving. That a degree of separation between risk and uncertainty attitudes has been achieved is reflected in the fact that all four logically possible combinations of risk and uncertainty attitudes are admissible. On the other hand, the separation is partial: if ≽1 is more uncertainty averse than ≽2, then these two preference orders must agree on F^ua and thus embody the same risk aversion.

As emphasized earlier, the meaning of uncertainty aversion depends on the exogenously specified A. That specification also bears on the distinction between risk aversion and uncertainty aversion. The suggestion just expressed is that the risk attitude of an order ≽ is embodied in the ranking it induces on F^ua, while the attitude toward uncertainty is reflected in the way in which ≽ relates arbitrary acts e with unambiguous acts h, as in (9.6). Thus if the modeler specifies that A = {∅, S}, and hence that F^ua contains only constant acts, then she is assuming that the decision-maker is not facing any meaningful risk. Accordingly, the modeler is led to interpret comparisons of the form (9.4) as reflecting (comparative) uncertainty aversion exclusively.
At the other extreme, if the modeler specifies that A = Σ, and hence that all acts in F are unambiguous, then she is assuming that the decision-maker faces only risk, which leads to the interpretation of (9.4) as reflecting (comparative) risk aversion exclusively. More generally, the specification of A reflects the modeler’s prior view of the decision-maker’s perception of his environment.
9.3. Is the definition attractive?

9.3.1. Some attractive properties

The definition of uncertainty aversion has been based on the a priori identification of uncertainty neutrality (defined informally) with probabilistic sophistication. Therefore, internal consistency of the approach should deliver this identification as a formal result. On the other hand, because attitudes toward uncertainty have been defined relative to a given A, such a result cannot be expected unless it is assumed that A is “large.” Suppose, therefore, that A is rich: there exist x* ≻ x_* such that for every E̅ ⊂ E in Σ and A in A satisfying

(x*, A; x_*, A^c) ∼ (x*, E; x_*, E^c),
there exists A̅ in A, A̅ ⊂ A, such that

(x*, A̅; x_*, A̅^c) ∼ (x*, E̅; x_*, E̅^c).

A corresponding notion of richness is valid for the roulette-wheel lotteries in the AA framework adopted by Schmeidler (1989).9 The next theorem (proved in Appendix A) establishes the internal consistency of our approach.

Theorem 9.1. If ≽ is probabilistically sophisticated, then it is uncertainty neutral. The converse is true if A is rich.

The potential usefulness of the notion of uncertainty aversion depends on being able to check for the existence of a probabilistically sophisticated order supporting a given ≽. This concern with tractability motivates the later analysis of eventwise differentiability. Anticipating that analysis, consider here the narrower question “does there exist ≽ps that both supports ≽ and has underlying beliefs represented by the given probability measure m on Σ?” On its own, the question may seem to be of limited interest. But once eventwise differentiability delivers m, its answer completes a procedure for checking for uncertainty aversion.

Lemma 9.1. Let ≽ps support ≽ in the sense of (9.8) and have underlying probability measure m on Σ. Then: (i) For any two unambiguous acts h and h′, if the outcome distribution induced by m and h first-order stochastically dominates that induced by m and h′, then U(h) ≥ U(h′). (ii) For all acts e and unambiguous acts h, if m and e induce the same outcome distribution as m and h, then U(e) ≤ U(h). The converse is true if m satisfies: for each unambiguous A and 0 < r < mA, there exists unambiguous B ⊂ A with mB = r.

The added assumption for m is satisfied if S = S1 × S2, unambiguous events are measurable subsets of S1 and the marginal of m on S1 is convex-ranged in the usual sense. The role of the assumption is to ensure that, in the notation surrounding (9.7), the set of outcome distributions induced by m and unambiguous acts h is all of Δ(X).

9.3.2. Multiple-priors and CEU utilities

The two most widely used generalizations of SEU theory are CEU and the multiple-priors model. In this subsection, uncertainty aversion is examined in the context of these models.
Say that ≽ is a multiple-priors preference order if it is represented by a utility function U^mp of the form

U^mp(e) = min_{m ∈ P} ∫_S u(e) dm,   (9.10)

for some set P of probability measures on (S, Σ) and some vNM index u : X → R¹. Given a class A, it is natural to model the unambiguous nature of events in A by supposing that all measures in P are identical when restricted to A; that is,

mA = m′A for all m and m′ in P and A in A.   (9.11)
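On a finite state space, the minimum in (9.10) is a direct computation. The sketch below is illustrative only: the states, the set of priors P, and the vNM index u are invented for the example and are not taken from the text.

```python
# Finite-state sketch of the multiple-priors utility (9.10).
# The states, priors, and vNM index below are illustrative assumptions.

def mp_utility(act, u, priors):
    """U^mp(e) = min over m in P of the expected utility of act e."""
    return min(sum(u[act[s]] * m[s] for s in act) for m in priors)

u = {'x': 1.0, 'y': 0.0}                 # vNM index
act = {'s1': 'x', 's2': 'y'}             # act: state -> outcome

# A set P of priors; with A = {∅, S}, restriction (9.11) is vacuous.
priors = [{'s1': 0.3, 's2': 0.7},
          {'s1': 0.6, 's2': 0.4}]

print(mp_utility(act, u, priors))        # min(0.3, 0.6) = 0.3
```

Enlarging P can only lower U^mp, which is one sense in which a larger set of priors expresses greater uncertainty aversion.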
These two restrictions on ≽ imply uncertainty aversion, because ≽ is more uncertainty averse than the expected utility order ≽ps with vNM index u and any probability measure m in P. More precisely, the following intuitive result is valid:

Theorem 9.2. Any multiple-priors order satisfying (9.11) is uncertainty averse.

Proof. Let ≽ps denote an expected utility order with vNM index u and any probability measure m in P. Then h ≽ps e ⟺ ∫ u(h) dm ≥ ∫ u(e) dm ⟹ U^mp(h) = ∫ u(h) dm ≥ ∫ u(e) dm ≥ U^mp(e).

A commonly studied special case of the multiple-priors model is a CEU order with convex capacity ν. Then (9.10) applies with P = core(ν) = {m : m(·) ≥ ν(·) on Σ}. Thus convexity of the capacity implies uncertainty aversion given (9.11). Focus more closely on the CEU model, with particular emphasis on the connection between uncertainty aversion and convexity of the capacity. The next result translates Lemma 9.1 into the present setting, thus providing necessary and sufficient conditions for uncertainty aversion combined with a prespecified supporting probability measure m. For necessity, an added assumption is adopted. Say that a capacity ν is convex-ranged if for all events E1 ⊂ E2 and ν(E1) < r < ν(E2), there exists E, E1 ⊂ E ⊂ E2, such that ν(E) = r. This terminology applies in particular if ν is additive, where it is standard.10 For axiomatizations of CEU that deliver a convex-ranged capacity, see Gilboa (1987: 73) and Sarin and Wakker (1992: Proposition A.3). Savage’s axiomatization of expected utility delivers a convex-ranged probability measure. Use the notation U^ceu to refer to utility functions defined by (9.1), where the vNM index u : X → R¹ satisfies: u(X) has nonempty interior in R¹. For those unfamiliar with Choquet integration, observe that for simple acts it yields

U^ceu(e) = ∑_{i=1}^{n−1} [u(x_i) − u(x_{i+1})] ν(∪_{j=1}^{i} E_j) + u(x_n),   (9.12)

where the outcomes are ranked as x_1 ≻ x_2 ≻ · · · ≻ x_n and the act e has e(x_i) = E_i, i = 1, . . . , n.
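Formula (9.12) translates directly into code on a finite state space. In the sketch below the three-color capacity is an illustrative assumption in the spirit of the Ellsberg examples discussed earlier; it is not additive.

```python
# Sketch of the Choquet integral formula (9.12) for a simple act.
# The capacity nu and the acts below are illustrative assumptions.

def choquet(act, u, nu):
    """act maps outcomes to disjoint events (frozensets of states);
    nu maps frozensets to [0, 1]."""
    xs = sorted(act, key=lambda x: u[x], reverse=True)   # x1 ≻ ... ≻ xn
    total, cum = u[xs[-1]], frozenset()
    for i in range(len(xs) - 1):
        cum = cum | act[xs[i]]                           # ∪_{j<=i} E_j
        total += (u[xs[i]] - u[xs[i + 1]]) * nu[cum]
    return total

# A non-additive capacity on S = {R, B, Y}.
nu = {frozenset('R'): 1/3, frozenset('B'): 0.0,
      frozenset('RB'): 1/3, frozenset('RY'): 2/3}
u = {'win': 1.0, 'lose': 0.0}
bet_on_R = {'win': frozenset('R'), 'lose': frozenset('BY')}
bet_on_B = {'win': frozenset('B'), 'lose': frozenset('RY')}
print(choquet(bet_on_R, u, nu))   # nu({R}) = 1/3
print(choquet(bet_on_B, u, nu))   # nu({B}) = 0.0
```

With stakes u(win) = 1, u(lose) = 0, the Choquet utility of a bet on E is just ν(E), so the decision-maker here strictly prefers the bet on R — the Ellsberg-style pattern.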
Lemma 9.2. Let U^ceu be a CEU utility function with capacity ν.

(a) The following conditions are sufficient for U^ceu to be uncertainty averse with supporting U^ps having m as underlying probability measure: there exists a bijection g : [0, 1] → [0, 1] such that

m ∈ core(g⁻¹(ν));   (9.13)

and

m(·) = g⁻¹(ν(·)) on A.   (9.14)

(b) Suppose that ν is convex-ranged and that A is rich. Then the conditions in (a) are necessary in order that U^ceu be uncertainty averse with supporting U^ps having m as underlying probability measure.

(c) Finally, in each of the preceding parts, the supporting utility U^ps can be taken to be an expected utility function if and only if in addition g is the identity function, that is,

m = ν on A and m ≥ ν on Σ.   (9.15)
See Appendix A for a proof. The supporting utility function U^ps that is provided by the proof of (a) has the form (9.7), where the risk preference functional W is

W(Ψ) = ∫_X u(x) d(g ∘ Ψ)(x),

a member of the rank-dependent expected utility class (Chew et al., 1987). Observe first that attitudes toward uncertainty do not depend on properties of the vNM index u. More surprising is that, given m, the conditions on ν described in (a) are ordinal invariants; that is, ν and g satisfy these conditions if and only if ϕ(ν) and g′ = ϕ ∘ g do, for any monotonic transformation ϕ. Consequently, under the regularity conditions in the lemma, the CEU utility function ∫ u(e) dν is uncertainty averse if and only if the same is true for ∫ u(e) dϕ(ν). The fact that uncertainty aversion is determined by ordinal properties of the capacity makes it perfectly clear that uncertainty aversion has little to do with convexity, a cardinal property. Thus far, only parts (a) and (b) of the lemma have been used. Focus now on (c), characterizing conditions under which U^ceu is “more uncertainty averse than some expected utility order with probability measure m.” Because the CEU utility functions studied by Schmeidler are defined on horse-race/roulette-wheels and conform with expected utility on the objective roulette-wheels, this latter comparison may be more relevant than uncertainty aversion per se for understanding the connection with convexity. The lemma delivers the requirement that ν be additive on A and that it admit an extension to a measure lying in its core. It is well known that convexity of ν is sufficient for nonemptiness of the core, but that seems to be the extent of the link with uncertainty aversion. The final example in Section 9.1.2, as completed in the next subsection, shows that U^ceu may be more uncertainty averse than some expected utility order even though its capacity is not convex.
To summarize, there appears to be no logical connection in the Savage framework between uncertainty aversion and convexity. Convexity does not imply uncertainty aversion, unless added conditions such as (9.11) are imposed. Furthermore, convexity is not necessary even for the stricter notion “more uncertainty averse than some expected utility order” that seems closer to Schmeidler’s notion. This is not to say that convexity and the associated multiple-priors functional structure that it delivers are not useful hypotheses. Rather, the point is to object to the widely adopted behavioral interpretation of convexity as uncertainty aversion.

9.3.3. Inner measures

Zhang (1997) argues that it is capacities that are inner measures, rather than convex capacities, that model uncertainty aversion. These capacities are defined as follows: let p be a probability measure on A; its existence reflects the unambiguous nature of events in A. Then the corresponding inner measure p∗ is the capacity given by

p∗(E) = sup{p(B) : B ⊂ E, B ∈ A},   E ∈ Σ.
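On a finite state space the supremum is a maximum, so p∗ can be computed by scanning the unambiguous events. The λ-system A and the probability p below are illustrative assumptions, not taken from the text.

```python
# Sketch of the inner measure p*(E) = sup{p(B) : B ⊆ E, B ∈ A}.
# The λ-system A and probability p below are illustrative assumptions.

S = frozenset({1, 2, 3, 4})
A = [frozenset(), frozenset({1, 2}), frozenset({3, 4}), S]   # unambiguous events
p = {frozenset(): 0.0, frozenset({1, 2}): 0.5,
     frozenset({3, 4}): 0.5, S: 1.0}

def inner(E):
    """Approximate E from inside by unambiguous events."""
    return max(p[B] for B in A if B <= E)

print(inner(frozenset({1, 2, 3})))   # best unambiguous subset is {1, 2}: 0.5
print(inner(frozenset({1, 3})))      # only ∅ fits: 0.0
```

Note the failure of additivity: inner({1, 3}) = inner({2, 4}) = 0 while inner(S) = 1. The inner measure is superadditive on ambiguous events, which is the sense in which inner approximation discounts ambiguity.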
The fact that the capacity of any E is computed by means of an inner approximation by unambiguous events seems to capture a form of aversion to ambiguity. Zhang provides axioms for preference that are consistent with this intuition and that deliver the subclass of CEU preferences having an inner measure as the capacity ν. It is interesting to ask whether CEU preferences with inner measures are uncertainty averse in the formal sense of this chapter. The answer is “sometimes,” as described in the next lemma.

Lemma 9.3. Let U^ceu(·) ≡ ∫ u(·) dp∗, where p∗ is the inner measure generated as above from the probability measure p on A.

(a) If p admits an extension to a probability measure on Σ, then U^ceu is more uncertainty averse than the expected utility function ∫ u(·) dp.

(b) Adopt the auxiliary assumptions in Lemma 9.2(b). If U^ceu is uncertainty averse, then p admits an extension from A to a measure on all of Σ.

Proof. (a) Denote the extension also by p. Then p∗ and p coincide on A, and for every B ⊂ E with B in A, p(B) ≤ p(E); therefore, p∗(E) ≤ p(E) on Σ. From the formula (9.12) for the Choquet integral, conclude that for all acts e and unambiguous acts h,

∫ u(h) dp∗ = ∫ u(h) dp  and  ∫ u(e) dp∗ ≤ ∫ u(e) dp.

(b) By Lemma 9.2 and its proof, p = p∗ = g(m) on A and m(A) = [0, 1]. Therefore, g must be the identity function. Again by the previous lemma, m lies in core(p∗), implying that m ≥ p∗ = p on A. Because A is closed with respect to complements, conclude that m = p on A and hence that m is the asserted extension of p.
Both directions in the lemma are of interest. In general, a probability measure on the λ-system A need not admit an extension to the algebra Σ.11 Therefore, (b) shows that the intuition surrounding “inner approximation” is flawed or incomplete, demonstrating the importance of a formal definition of uncertainty aversion. Part (a) provides a class of examples of CEU functions that are more uncertainty averse than some expected utility order. These can be used to show that even if this stricter notion of (more) uncertainty averse is adopted, the capacity p∗ need not be convex. For instance, the last example in Section 9.1.2 satisfies the conditions in (a)—the required extension is the equally likely (counting) probability measure on the power set. Thus preference is uncertainty averse, even though p∗ is not convex.

9.3.4. Bets, beliefs and uncertainty aversion

This section examines some implications of uncertainty aversion for the ranking of binary acts. Because the ranking of bets reveals the decision-maker’s underlying beliefs or likelihoods, these implications clarify the meaning of uncertainty aversion and help to demonstrate its intuitive empirical content. The generic binary act is denoted xEy, indicating that x is obtained if E is realized and y otherwise. Let ≽ be uncertainty averse with probabilistically sophisticated order ≽ps satisfying (9.8). Apply the latter to binary acts, to obtain the following relation: for all unambiguous A, events E and outcomes x1 and x2,

x1Ax2 ≽ps (≻ps) x1Ex2 ⟹ x1Ax2 ≽ (≻) x1Ex2.

Proceed to transform this relation into a more illuminating form. Exclude the uninteresting case x1 ∼ x2 and assume that x1 ≻ x2. Then x1Ex2 can be viewed as a bet on the event E. As noted earlier, ≽ps necessarily agrees with the given ≽ in the ranking of unambiguous acts and hence also constant acts or outcomes, so x1 ≻ps x2. Let m be the subjective probability measure on the state space (S, Σ) that underlies ≽ps. Then the monotonicity property inherent in probabilistic sophistication implies that

x1Ax2 ≽ps (≻ps) x1Ex2 ⟺ m(A) ≥ (>) m(E).

Conclude that uncertainty aversion implies the existence of a probability measure m such that: for all A, E, x1 and x2 as given earlier,

m(A) ≥ (>) m(E) ⟹ x1Ax2 ≽ (≻) x1Ex2.

One final rewriting is useful. Define, for the given pair x1 ≻ x2,

ν(E) = U(x1Ex2).
Then, mA ≥ (>)mE =⇒ νA ≥ (>)νE,
(9.16)
which is the sought-after implication of uncertainty aversion.12 In the special case of CEU (9.1), with vNM index satisfying u(x1) = 1 and u(x2) = 0, ν defined as stated earlier coincides with the capacity in the CEU functional form. Even when CEU is not assumed, (suppose that ν is monotone with respect to set inclusion and) refer to ν as a capacity. The interpretation is that ν represents ≽ numerically over bets on various events with the given stakes x1 and x2, or alternatively, that it represents numerically the likelihood relation underlying preference ≽. From this perspective, only the ordinal properties of ν are significant.13 An implication of (9.16) is that ν and m must be ordinally equivalent on A (though not on Σ). In other words, uncertainty aversion implies the existence of a probability measure m that supports {E ∈ Σ : ν(E) ≥ ν(A)} at each unambiguous A, where support is in a sense analogous to the usual meaning, except that the usual linear supporting function defined on a linear space is replaced by an additive function defined on an algebra. Think of the measure m as describing the (not necessarily unique) “mean ambiguity-free likelihoods” implicit in ν and ≽. This interpretation and the “support” analogy are pursued and developed further in Section 9.4.3 under the assumption that preference is eventwise differentiable. In a similar fashion, one can show that uncertainty loving implies the existence of a probability measure q on (S, Σ) such that

qA ≤ (<) qE ⟹ νA ≤ (<) νE.   (9.17)

[…] u(x1) > · · · > u(xn), and these utility levels can be varied over an open set containing some point (u(x), . . . , u(x)), it follows that

g(m(∪_{j=1}^{i} e(xj))) = g(m(∪_{j=1}^{i} h(xj))) ≥ ν(∪_{j=1}^{i} e(xj)),

for all e and h as stated earlier. Given E ∈ Σ, let e(x1) = E and e(x2) = E^c, x1 ≻ x2. There exists unambiguous A such that mE = mA. Let h(x1) = A and h(x2) = A^c. Then g(m(E)) ≥ ν(E) follows, proving (9.13). The sufficiency portion (a) can be proven by suitably reversing the preceding argument.
Proof of Theorem 9.1. The following lemma is of independent interest because of the special significance of bets as a subclass of all acts. Notation from Section 9.3.4 is used below.

Lemma 9.A.1. Suppose that A is rich, with outcomes x* and x_* as in the definition of richness. Let ν(E) ≡ U(x*Ex_*). Then the conjunction of (9.16) and (9.17) implies that ν is ordinally equivalent to a probability measure on Σ (or equivalently, ν satisfies (9.25)). A fortiori, the conclusion is valid if ≽ is both uncertainty averse and uncertainty loving.

Proof. Let m and q be the hypothesized supports. Their defining properties imply that mF ≤ mG ⟹ qF ≤ qG, for all A ∈ A, F ⊂ A^c and G ⊂ A. But if this relation is applied to A^c in place of A, noting that A^c ∈ A, then the roles of F and G are reversed and one obtains mF ≥ mG ⟹ qF ≥ qG. In other words, mF ≤ mG ⟺ qF ≤ qG, for all A ∈ A, F ⊂ A^c and G ⊂ A. Conclude from (9.16) and (9.17) that mF ≤ mG ⟺ ν(A + F − G) ≤ νA, for all A ∈ A, F ⊂ A^c and G ⊂ A; or equivalently, that for all A ∈ A, mE ≤ mA ⟺ νE ≤ νA. In other words, every indifference curve for ν containing some unambiguous event is also an indifference curve for m. The stated hypothesis regarding A ensures that every indifference curve contains some unambiguous A and therefore that ν and m are ordinally equivalent on all of Σ.
Complete the proof of Theorem 9.1. Denote by ≽ps and ≽ps∗ the probabilistically sophisticated preference orders supporting ≽ in the sense of (9.8) and (9.9), respectively, and having underlying probability measures m and q defined on Σ. From the proof of the lemma, m and q are ordinally equivalent on Σ.

Claim. For each act e, there exists h ∈ F^ua such that

e ∼ps h and e ∼ps∗ h.
To see this, let e = ((xi, Ei)_{i=1}^{n}). By the richness of A, there exists an unambiguous event H1 such that x*H1x_* ∼ x*E1x_*, or, in the notation of the lemma, ν(H1) = ν(E1). Because ν and m are ordinally equivalent, m(H1) = m(E1) and thus also m(H1^c) = m(E1^c) and ν(H1^c) = ν(E1^c). Thus one can apply richness again to find a suitable unambiguous subset H2 of H1^c. Proceeding in this way, one constructs an unambiguous act h = ((xi, Hi)_{i=1}^{n}) such that

ν(Hi) = ν(Ei) and m(Hi) = m(Ei), for all i.

By the ordinal equivalence of m and q,

q(Hi) = q(Ei), all i.

The claim now follows immediately from the nature of probabilistic sophistication. From (9.8), ≽ and ≽ps agree on F^ua. Similarly, ≽ and ≽ps∗ agree on F^ua. Therefore, ≽ps and ≽ps∗ agree there. From the claim, it follows that they agree on the complete set of acts F. The support properties (9.8) and (9.9) thus imply that

h ≽ps e ⟺ h ≽ e, for all h ∈ F^ua and e ∈ F.

In particular, every indifference curve for ≽ps containing some unambiguous act is also an indifference curve for ≽. But the qualification can be dropped because of the claim. It follows that ≽ and ≽ps coincide on F.

Proof of Theorem 9.3. (a) Let m satisfy (9.16) at A. Show first that

mF ≤ mG ⟹ δν(F; A) ≤ δν(G; A),   (9.A.1)
for all F ⊂ A^c and G ⊂ A. Fix ε > 0 and let λ0 be such that the expression defining δν(·; A) is less than ε whenever λ > λ0. By Lemma 9.B.1, there exist partitions {F^{j,λ}}_{j=1}^{n_λ} and {G^{j,λ}}_{j=1}^{n_λ} such that mF^{j,λ} ≤ mG^{j,λ}, j = 1, . . . , n_λ, and λ > λ0; hence

∑_{j=1}^{n_λ} |[ν(A) − ν(A + F^{j,λ} − G^{j,λ})] − [δν(G^{j,λ}; A) − δν(F^{j,λ}; A)]| < ε.

Because m is a support, ν(A + F^{j,λ} − G^{j,λ}) ≤ ν(A). Thus20

δν(G; A) − δν(F; A) = ∑_{j=1}^{n_λ} [δν(G^{j,λ}; A) − δν(F^{j,λ}; A)] > −ε.

However, ε is arbitrary. This proves (9.A.1).
Replace A by A^c, in which case F and G reverse roles, and deduce that mF ≥ mG ⟹ δν(F; A^c) ≥ δν(G; A^c), or equivalently,

δν(F; A^c) ≤ δν(G; A^c) ⟹ mF ≤ mG.   (9.A.2)

Because m is a support, this yields (9.28).

(b) Let A ∈ A satisfy

S ≻ A and S ≻ A^c.   (9.A.3)
Claim 1. δν(A^c; A) > 0. If it equals zero, then δν(A^c; A) = δν(∅; A) implies, by (9.28), that A + A^c ≼ A, that is, S ∼ A, contrary to (9.A.3).

Claim 2. mA^c > 0. If not, then mS ≤ mA = 1 and (9.16) implies that S ∼ A, contrary to (9.A.3).

Claim 3. δν(A; A^c) > 0 and mA > 0. Replace A by A^c above.

Claim 4. δν(A^c; A^c) > 0. If it equals zero, then δν(A; A^c) mA^c = 0 by (9.29), contradicting Claim 3.

Claim 5. For any G ⊂ A, δν(G; A) = 0 ⟹ mG = 0: Let F = A^c. By Claim 1, δν(F; A) > 0. Therefore, Lemma 9.B.1 implies that ∀λ0 ∃λ > λ0 such that δν(F^{j,λ}; A) > 0 = δν(G; A) for all j. By (9.A.1), ∀λ0 ∃λ > λ0 with m(F^{j,λ}) > m(G) for all j, and thus also mF > n_λ(mG). This implies mG = 0.

Claim 6. For any F ⊂ A^c, mF = 0 ⟹ δν(F; A) = 0: mF = 0 ⟹ (by (9.A.1)) δν(F; A) ≤ δν(G; A) for all G ⊂ A. Claim 4 implies δν(G; A) > 0 if G = A. Therefore, δν(·; A) convex-ranged implies (Lemma 9.B.1) that δν(F; A) = 0.

Claim 7. m is convex-ranged: By Claim 5, m is absolutely continuous with respect to δν(·; A) on A. The latter measure is convex-ranged. Therefore, m has no atoms in A. Replace A by A^c and use the convex range of δν(·; A^c) to deduce in a similar fashion that m has no atoms in A^c. Thus m is non-atomic. Because it is also countably additive by hypothesis, conclude that it is convex-ranged (Rao and Rao, 1983: Theorem 5.1.6).

Turn to (9.29); (9.30) may be proven similarly. Define the measures µ and p on A^c × A as follows:

µ = m ⊗ δν(·; A),  p = δν(·; A) ⊗ m.

Claims 5 and 6 prove that p ≪ µ. Denote by h ≡ dp/dµ the Radon–Nikodym density. (Countable additivity is used here.)
Claim 8. µ{(s, t) ∈ A^c × A : h(s, t) > 1} = 0: If not, then there exist F0 ⊂ A^c and G0 ⊂ A, with µ(F0 × G0) > 0, such that h > 1 on F0 × G0.

Case 1. mF0 = mG0. Integration delivers ∫_{F0×G0} [h(s, t) − 1] dµ > 0, implying

δν(F0; A) mG0 − mF0 δν(G0; A) > 0.

Consequently, mF0 = mG0 and δν(F0; A) > δν(G0; A), contradicting (9.A.1).

Case 2. mF0 < mG0. Because m is convex-ranged (Claim 7), there exists G1 ⊂ G0 such that mG1 = mF0 and µ(F0 × G1) > 0. Thus the argument in Case 1 can be applied.

Case 3. mF0 > mG0. Similar to Case 2.

This proves Claim 8. Finally, for any F ⊂ A^c and G ⊂ A,

δν(F; A)(mG) − (mF) δν(G; A) = ∫_{F×G} (h − 1) dµ ≤ 0,

proving (9.29).

(c) Though at first glance the proof may seem obvious given (9.31), some needed details are provided here. Let A ∈ A0. Multiply through (9.29) by δν(G; A^c) to obtain

δν(F; A) δν(G; A^c) mG ≤ δν(G; A) δν(G; A^c) mF,

for all F ⊂ A^c and G ⊂ A. Similarly, multiplying through (9.30) by δν(G; A) yields

δν(G; A) δν(G; A^c) mF ≤ δν(G; A) δν(F; A^c) mG,

for all such F and G. Conclude from coherence that

δν(G; A) δν(G; A^c) mF = δν(G; A) δν(F; A^c) mG,   (9.A.4)

for all F ⊂ A^c and G ⊂ A. Take G = A in (9.A.4) to deduce

δν(F; A^c) = δν(A; A^c) m(F)/m(A), for all F ⊂ A^c.   (9.A.5)

Next take F = A^c in (9.A.4). If δν(G; A) > 0, then

δν(G; A^c) = δν(A^c; A^c) m(G)/m(A^c), for all G ⊂ A.   (9.A.6)

This equation is true also if δν(G; A) = 0, because then (9.28), with F = A^c, implies δν(A^c; A) m(G) = 0, which implies mG = 0 by Claim 1.
Substitute the expressions for δν(F; A^c) and δν(G; A^c) into (9.A.4) and set F = A^c and G = A to derive

δν(A^c; A^c)/m(A^c) = δν(A; A^c)/m(A) ≡ α(A) > 0.

Thus δν(·; A^c) = α(A)m(·) both on Σ ∩ A^c and on Σ ∩ A. By additivity, it follows that δν(·; A^c) = α(A)m(·) on all of Σ. Thus δν(·; A) = κ(A)α(A)m(·), completing the proof.
Appendix B: Additive functions on Σ^X
Some details are provided for such functions, as defined in Section 9.4.1. For any additive µ, µ(∅) = 0 and

µ(e) = ∑_x µ_x(e(x)),   (9.B.1)

where µ_x is the marginal measure on Σ defined by: µ_x(E) = the µ-measure of the act that assigns E to the outcome x and the empty set to every other outcome. Apply to each marginal the standard notions and results for finitely additive measures on an algebra (see Rao and Rao, 1983). In this way, one obtains a decomposition of µ, µ = µ⁺ − µ⁻, where µ⁺ and µ⁻ are non-negative measures. Define |µ| = µ⁺ + µ⁻. Say that the measure µ is bounded if

sup_f |µ|(f) = sup { ∑_{j=1}^{n_λ} |µ(f^{j,λ})| : f ∈ Σ^X, λ } < ∞.   (9.B.2)

Call the measure µ on Σ^X convex-ranged if for every e and r ∈ (0, |µ|(e)), there exists b, b ⊂ e, such that |µ|(b) = r, where e and b are elements of Σ^X. Lemma 9.B.1 summarizes some useful properties of convex-ranged measures on Σ^X. See Rao and Rao (1983: 142–3) for comparable results for measures on an algebra; there, property (b) is referred to as strong continuity.

Lemma 9.B.1. Let µ be a measure on Σ^X. Then the following statements are equivalent:

(a) µ is convex-ranged.
(b) For any act f, with corresponding net of all finite partitions {f^{j,λ}}_{j=1}^{n_λ}, and for any ε > 0, there exists λ0 such that

λ > λ0 ⟹ |µ|(f^{j,λ}) < ε, for j = 1, . . . , n_λ.

(c) For any ε > 0 and any acts f, g, and h ≡ f + g, if µ(f) > µ(g), then there exists a partition {h^{j,λ}}_{j=1}^{n_λ} of h such that µ(h^{j,λ}) < ε and µ(h^{j,λ} ∩ f) > µ(h^{j,λ} ∩ g), j = 1, . . . , n_λ.
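The decomposition µ = µ⁺ − µ⁻ and the total variation |µ| = µ⁺ + µ⁻ introduced above can be made concrete on a finite algebra, where a signed measure is determined by its values on atoms. The atoms and signed weights below are illustrative assumptions.

```python
# Jordan decomposition of a signed measure on a finite algebra.
# The atoms and their signed weights are illustrative assumptions.

atoms = {'a': 0.5, 'b': -0.2, 'c': 0.1}    # signed weight of each atom

def mu(E):        return sum(atoms[s] for s in E)
def mu_plus(E):   return sum(w for s, w in atoms.items() if s in E and w > 0)
def mu_minus(E):  return sum(-w for s, w in atoms.items() if s in E and w < 0)
def total_var(E): return mu_plus(E) + mu_minus(E)    # |mu|(E)

E = {'a', 'b'}
print(mu(E))            # ≈ 0.3 = mu_plus(E) - mu_minus(E)
print(total_var(E))     # ≈ 0.7 = mu_plus(E) + mu_minus(E)
```

The sketch illustrates why |µ| rather than µ itself measures the “size” of a perturbation: the positive and negative parts can cancel in µ(E) while both contribute to |µ|(E).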
Appendix C: Differentiability

This appendix elaborates on mathematical aspects of the definition of eventwise differentiability. Then it describes a stronger differentiability notion. The requirement of convex range for δΦ(·; e) is not needed everywhere, but is built into the definition for ease of exposition. Though I use the term derivative, δΦ(·; e) is actually the counterpart of a differential. The need for a signed measure arises from the absence of any monotonicity assumptions. If Φ(·) is monotone with respect to inclusion ⊂, then each δΦ(·; e) is a non-negative measure. The limiting condition (9.23) may seem unusual because it does not involve a difference quotient. It may be comforting, therefore, to observe that a comparable condition can be identified in calculus: for a function ϕ : R¹ → R¹ that is differentiable at some x in the usual sense, elementary algebraic manipulation of the definition of the derivative ϕ′(x) yields the following expression paralleling (9.23):

∑_{i=1}^{N} [ϕ(x + N⁻¹) − ϕ(x) − N⁻¹ϕ′(x)] → 0 as N → ∞.

Further clarification is afforded by comparison with Gateaux differentiability: roughly speaking, eventwise differentiability at e states that the difference Φ(e + f − g) − Φ(e) can be approximated by δΦ(f; e) − δΦ(g; e) for suitably “small” f and g, where the small size of the perturbation “f − g” is in the sense of the fineness of the partitions as λ grows. Naturally, it is important that the approximating functional δΦ(·; e) is additive (a signed measure). There is an apparent parallel with Gateaux (directional) differentiability of functions defined on a linear space—“f − g” represents the “direction” of perturbation and the additive approximation replaces the usual linear one. Note that the perturbation from e to e + f − g is perfectly general; any e′ can be expressed (uniquely) in the form e′ = e + f − g, with f ⊂ e^c and g ⊂ e (see (9.18)). A natural question is “how restrictive is the assumption of eventwise differentiability?” In this connection, the reader may have noted that the definition is formulated for an arbitrary state space S and algebra Σ. However, eventwise differentiability is potentially interesting only in cases where these are both infinite. That is because if Σ is finite, then Φ is differentiable if and only if it is additive.
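The calculus expression above is easy to verify numerically. The function ϕ = exp and the point x = 1 below are arbitrary illustrative choices; each bracketed term is o(1/N), so the sum of N of them vanishes.

```python
# Numeric check of the calculus parallel to (9.23):
# sum_{i=1}^{N} [phi(x + 1/N) - phi(x) - phi'(x)/N] -> 0 as N -> infinity.
import math

phi, dphi, x = math.exp, math.exp, 1.0    # illustrative choice: phi = exp

def gap(N):
    # N identical terms, each phi(x + 1/N) - phi(x) - phi'(x)/N = o(1/N)
    return N * (phi(x + 1 / N) - phi(x) - dphi(x) / N)

for N in (10, 100, 1000):
    print(N, gap(N))    # shrinks roughly like phi''(x)/(2N)
```

By Taylor expansion the sum equals ϕ″(x)/(2N) plus higher-order terms, which is why the printed values shrink by roughly a factor of ten at each step.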
Another question concerns the uniqueness of the derivative. The limiting condition (9.23) has at most one solution; that is, the derivative is unique if it exists: if p and q are two measures on Σ^X satisfying the limiting property, then for each g ⊂ e,

|p(g) − q(g)| ≤ ∑_{j=1}^{n_λ} |p(g^{j,λ}) − q(g^{j,λ})| →_λ 0.

Therefore, p(g) = q(g) for all g ⊂ e. Similarly, prove equality for all f ⊂ e^c and then apply additivity.

Next I describe a Chain Rule for eventwise differentiability.

Theorem 9.C.1. Let Φ : Σ^X → R¹ be eventwise differentiable at e and ϕ : Φ(Σ^X) → R¹ be strictly increasing and continuously differentiable. Then ϕ ∘ Φ is eventwise differentiable at e and

δ(ϕ ∘ Φ)(·; e) = ϕ′(Φ(e)) δΦ(·; e).

Proof. Consider the sum whose convergence defines the eventwise derivative of ϕ ∘ Φ. By the Mean Value Theorem,

ϕ ∘ Φ(e + f^{j,λ} − g^{j,λ}) − ϕ ∘ Φ(e) = ϕ′(z^{j,λ}) [Φ(e + f^{j,λ} − g^{j,λ}) − Φ(e)]

for suitable real numbers z^{j,λ}. Therefore, it suffices to prove that

∑_{j=1}^{n_λ} |Φ(e + f^{j,λ} − g^{j,λ}) − Φ(e)| · |ϕ′(z^{j,λ}) − ϕ′(Φ(e))| →_λ 0.

By the continuity of ϕ′, the second term converges to zero uniformly in j. Eventwise differentiability of Φ implies that given ε, there exists λ0 such that λ > λ0 ⟹

∑_{j=1}^{n_λ} |Φ(e + f^{j,λ} − g^{j,λ}) − Φ(e)| ≤ ε + ∑_{j=1}^{n_λ} |δΦ(f^{j,λ}; e) − δΦ(g^{j,λ}; e)|
≤ ε + ∑_{j=1}^{n_λ} [|δΦ(f^{j,λ}; e)| + |δΦ(g^{j,λ}; e)|]
≤ K,

for some K < ∞ that is independent of λ, f and g, as provided by the boundedness of the measure δΦ(·; e).

Eventwise differentiability is inspired by Rosenmuller’s (1972) notion, but there are differences. Rosenmuller deals with convex capacities defined on Σ, rather than with utility functions defined on acts. Even within that framework, his formulation differs from (9.23) and relies on the assumed convexity. Moreover, he restricts attention to “one-sided” derivatives, that is, where the inner perturbation g is identically empty (producing an outer derivative), or where the outer perturbation
f is identically empty (producing an inner derivative). Finally, Rosenmuller’s application is to cooperative game theory rather than to decision theory.

A strengthening of eventwise differentiability, called µ-differentiability, is described here. The stronger notion is more easily interpreted, thus casting further light on eventwise differentiability, and it delivers a form of the Fundamental Theorem of Calculus. Machina (1992) introduces a very similar notion. Because it is new and still unfamiliar, and because our formulation is somewhat different and arguably more transparent, a detailed description seems in order.21 To proceed, adopt as another primitive a non-negative, bounded and convex-ranged measure µ on Σ^X. This measure serves the “technical role” of determining the distance between acts. To be precise, if e and e′ are identified whenever µ(e △ e′) = 0, then

d(e, e′) = µ(e △ e′)   (9.C.1)

defines a metric on Σ^X; the assumption of convex range renders the metric space path-connected (by Volkmer and Weber, 1983; see also Landers, 1973: Lemma 4). One way in which such a measure can arise is from a convex-ranged probability measure µ0 on Σ. Given µ0, define µ by

µ(e) = ∑_x µ0(e(x)).   (9.C.2)

Once again let Φ : Σ^X → R¹. Because acts e and e′ are identified when µ(e △ e′) = 0, Φ is assumed to satisfy the condition

µ(e △ e′) = 0 ⟹ Φ(e ∪ f) = Φ(e′ ∪ f), for all f.   (9.C.3)

In particular, acts of µ-measure 0 are assumed to be “null” with respect to Φ.

Definition 9.C.1. Φ is µ-differentiable at e ∈ Σ^X if there exists a bounded and convex-ranged measure δΦ(·; e) on Σ^X such that, for all f ⊂ e^c and g ⊂ e,

|Φ(e + f − g) − Φ(e) − δΦ(f; e) + δΦ(g; e)| / µ(f + g) → 0   (9.C.4)

as µ(f + g) → 0.

The presence of a “difference quotient” makes the definition more familiar in appearance and permits an obvious interpretation. Think in particular of the case (|X| = 1) where the domain of Φ is Σ. It is easy to see that δΦ(·; e) is absolutely continuous with respect to µ for each e. (Use additivity of the derivative and (9.C.3).) Eventwise and µ-derivatives have not been distinguished notationally because they coincide whenever both exist.

Lemma 9.C.1. If Φ is µ-differentiable at some e in Σ^X, then Φ is also eventwise differentiable at e and the two derivatives coincide.
Proof. Let δΦ(·; e) be the µ-derivative at e, f ⊂ e^c and g ⊂ e. Given ε > 0, there exists (by µ-differentiability) ε′ > 0 such that

|Φ(e + f′ − g′) − Φ(e) − δΦ(f′; e) + δΦ(g′; e)| < ε µ(f′ + g′),   (9.C.5)

if µ(f′ + g′) < ε′. By Lemma 9.B.1 applied to the convex-ranged µ, there exists λ0 such that

µ(f^{j,λ} + g^{j,λ}) < ε′, for all λ > λ0.

Therefore, one can apply (9.C.5) to the acts (f′, g′) = (f^{j,λ}, g^{j,λ}). Deduce that

∑_{j=1}^{n_λ} |Φ(e + f^{j,λ} − g^{j,λ}) − Φ(e) − δΦ(f^{j,λ}; e) + δΦ(g^{j,λ}; e)| < ε ∑_{j=1}^{n_λ} µ(f^{j,λ} + g^{j,λ}) = ε µ(f + g).

[…] for every ε > 0, f ⊂ e^c and g ⊂ e, there exist finite partitions f = ∑ f^j and g = ∑ g^j such that

ε > |Φ(e + f − g) − Φ(e) − ∑_i δΦ(f^i; e + F^{i−1} − G^{i−1}) + ∑_i δΦ(g^i; e + F^{i−1} − G^{i−1})|,   (9.C.6)

where F^i = ∑_{j=1}^{i} f^j and G^i = ∑_{j=1}^{i} g^j.

Proof. µ-differentiability and the indicated uniform convergence imply that

|Φ(e + F^{i−1} − G^{i−1} + f^i − g^i) − Φ(e + F^{i−1} − G^{i−1}) − δΦ(f^i; e + F^{i−1} − G^{i−1}) + δΦ(g^i; e + F^{i−1} − G^{i−1})| < ε µ(f^i + g^i),

for any partitions {f^j} and {g^j} such that µ(f^j + g^j) is sufficiently small for all j. But the latter can be ensured by taking the partitions {f^{j,λ}} and {g^{j,λ}} for λ sufficiently large. The convex range assumption for µ enters here; use Lemma 9.B.1.
Therefore, the triangle inequality delivers

|Φ(e + f − g) − Φ(e) − ∑_i δΦ(f^i; e + F^{i−1} − G^{i−1}) + ∑_i δΦ(g^i; e + F^{i−1} − G^{i−1})| ≤ ε ∑_i µ(f^i + g^i) = ε µ(f + g).
Acknowledgments

An earlier version of this chapter was circulated in July 1997 under the title “Uncertainty Aversion.” The financial support of the Social Sciences and Humanities Research Council of Canada and the hospitality of the Hong Kong University of Science and Technology are gratefully acknowledged. I have also benefitted from discussions with Paolo Ghirardato, Jiankang Zhang and especially Kin Chung Lo, Massimo Marinacci and Uzi Segal, and from comments by audiences at HKUST and the Chantilly Workshop on Decision Theory, June 1997. The suggestions of an editor and referee led to an improved exposition.
Notes

1 After a version of this chapter was completed, I learned of a revision of Machina (1992), dated 1997, that is even more closely related.
2 Zhang (1997) is the first paper to propose a definition of ambiguity that is derived from preference, but his definition is problematic. An improved definition is the subject of current research by this author and Zhang.
3 See Section 9.3.2 for the definition of Choquet integration.
4 As explained in Section 9.5, the examples to follow raise questions about the widespread use that has been made of Schmeidler’s definition rather than about the definition itself. Section 9.3.4 describes the performance of this chapter’s definition of uncertainty aversion in the Ellsbergian setting.
5 In terms of acts, {R} ≻ {B} means 1_R ≻ 1_B, and so on. For CEU, a decision-maker always prefers to bet on the event having the larger capacity.
6 p∗ is an inner measure, as defined and discussed further in Section 9.3.3.
7 Write y ≥ x if receiving outcome y with probability 1 is weakly preferable, according to U^ps, to receiving x for sure. A measure m first-order stochastically dominates m′ if for all outcomes y, m({x ∈ X : y ≥ x}) ≤ m′({x ∈ X : y ≥ x}). Thus the partial order depends on the utility function U^ps, but that causes no difficulties. See Machina and Schmeidler (1992) for further details.
8 Subjective expected utility is the special case of (9.7) with W(m) = ∫_X u(x) dm(x). But more general risk preferences W are admitted, subject only to the noted monotonicity restriction. In particular, probabilistically sophisticated preference can rationalize behavior such as that exhibited in the Allais Paradox. It follows that uncertainty aversion, as defined shortly, is concerned with Ellsberg-type, and not Allais-type, behavior.
9 It merits emphasis that richness of A is needed only for some results stated later; for example, for the necessity parts of Theorem 9.1 and Lemma 9.2. Richness is not used to describe conditions that are sufficient for uncertainty aversion (or neutrality). In particular, the approach and definitions of this chapter are potentially useful even if A = {∅, S}.
10 See Rao and Rao (1983). Given countable additivity, convex-ranged is equivalent to non-atomicity.
11 Massimo Marinacci provided the following example: Let S be the set of integers {1, . . . , 6} and A the λ-system {∅, S} ∪ {A_i, A_i^c : 1 ≤ i ≤ 3}, where A_1 = {1, 2, 3}, A_2 = {3, 4, 5} and A_3 = {1, 5, 6}. Define p on A as the unique probability measure satisfying p(A_i) = 1/6 for all i. If p has an extension to the power set, then p(∪_i A_i) = 1 > 1/2 = Σ_i p(A_i). However, the reverse inequality must obtain for any probability measure.
12 This condition is necessary for uncertainty aversion but not sufficient, even if there are only two possible outcomes. That is because, by taking h in (9.8) to be a constant act, one concludes that an uncertainty averse order ≽ assigns a lower certainty equivalent to any act than does the supporting order ≽^ps. In contrast, (9.16) contains information only on the ranking of bets and not on their certainty equivalents. (I am assuming here that certainty equivalents exist.)
13 These ordinal properties are independent of the particular pair of outcomes satisfying x_1 ≻ x_2 if (and only if) ≽ satisfies Savage’s axiom P4: For any events A and B and outcomes x_1 ≻ x_2 and y_1 ≻ y_2, x_1Ax_2 ≽ x_1Bx_2 implies that y_1Ay_2 ≽ y_1By_2.
14 Alternatively, we could show that the rankings in (9.3) are inconsistent with the implication (9.17) of uncertainty loving.
15 A slight strengthening of (9.19) is valid. Suppose that A − F^i + G^i ≽ A for all i, for some partitions F = ∪F^i and G = ∪G^i. Only the trivial partitions were admitted earlier. Then additivity of the supporting measure implies, as above, that mF ≤ mG and hence that A + F − G ≻ A.
16 In particular, Σ_X is not the product algebra on S × X induced by Σ. However, Σ_X is a ring; that is, it is closed with respect to unions and differences.
17 More generally, a counterpart of the usual product rule of differentiation is valid for eventwise differentiation.
18 Even given (9.27), the supporting measure at a given single A is not unique, contrary to the intuition suggested by calculus. If the support property “mF ≤ mG ⇒ ν(A + F − G) ≤ ν(A)” is satisfied by m, then it is also satisfied by any m′ satisfying m(·) ≤ m′(·) on Σ ∩ A^c and m(·) ≥ m′(·) on Σ ∩ A. For example, let m′ be the conditional of m given A^c.
19 Each urn contains 100 balls that are either red or blue. For the ambiguous urn this is all the information provided. For the unambiguous urn, the decision-maker is told that there are 50 balls of each color. The choice problem is whether to bet on drawing a red (or blue) ball from the ambiguous urn versus the unambiguous one.
20 Given |x_j − y_j| < ε and y_j ≥ 0 for all j, then x_j ≥ −|x_j − y_j| + y_j ≥ −|x_j − y_j|, implying that x_j > −ε.
21 As mentioned earlier, after a version of this chapter was completed, I learned of a revision of Machina (1992), dated 1997, in which Machina provides a formulation very similar to that provided in this subsection. The connection with the more general “partitions-based” notion of eventwise differentiability, inspired by Rosenmuller (1972), is not observed by Machina.
References

F. Anscombe and R.J. Aumann (1963) “A Definition of Subjective Probability,” Ann. Math. Stat., 34, 199–205.
P. Billingsley (1986) Probability and Measure, John Wiley.
S.H. Chew, E. Karni and Z. Safra (1987) “Risk Aversion in the Theory of Expected Utility with Rank Dependent Probabilities,” J. Econ. Theory, 42, 370–381.
S.H. Chew and M.H. Mao (1995) “A Schur Concave Characterization of Risk Aversion for Non-Expected Utility Preferences,” J. Econ. Theory, 67, 402–435.
J. Eichberger and D. Kelsey (1996) “Uncertainty Aversion and Preference for Randomization,” J. Econ. Theory, 71, 31–43.
L.G. Epstein and T. Wang (1995) “Uncertainty, Risk-Neutral Measures and Security Price Booms and Crashes,” J. Econ. Theory, 67, 40–82.
I. Gilboa (1987) “Expected Utility with Purely Subjective Non-Additive Probabilities,” J. Math. Econ., 16, 65–88.
I. Gilboa and D. Schmeidler (1989) “Maxmin Expected Utility With Nonunique Prior,” J. Math. Econ., 18, 141–153. (Reprinted as Chapter 6 in this volume.)
P.R. Halmos (1974) Measure Theory, Springer-Verlag.
E. Karni (1983) “Risk Aversion for State-Dependent Utility Functions: Measurement and Applications,” Int. Ec. Rev., 24, 637–647.
D.M. Kreps (1988) Notes on the Theory of Choice, Westview.
D. Landers (1973) “Connectedness Properties of the Range of Vector and Semimeasures,” Manuscripta Math., 9, 105–112.
M. Machina (1982) “Expected Utility Analysis Without the Independence Axiom,” Econometrica, 50, 277–323.
M. Machina (1992) “Local Probabilistic Sophistication,” mimeo.
M. Machina and D. Schmeidler (1992) “A More Robust Definition of Subjective Probability,” Econometrica, 60, 745–780.
K.P. Rao and M.B. Rao (1983) Theory of Charges, Academic Press.
J. Rosenmuller (1972) “Some Properties of Convex Set Functions, Part II,” Methods of Oper. Research, 17, 277–307.
R. Sarin and P. Wakker (1992) “A Simple Axiomatization of Nonadditive Expected Utility,” Econometrica, 60, 1255–1272. (Reprinted as Chapter 7 in this volume.)
L. Savage (1954) The Foundations of Statistics, John Wiley.
D. Schmeidler (1972) “Cores of Exact Games,” J. Math. Anal. and Appl., 40, 214–225.
D. Schmeidler (1986) “Integral Representation without Additivity,” Proc. Amer. Math. Soc., 97, 255–261.
D. Schmeidler (1989) “Subjective Probability and Expected Utility Without Additivity,” Econometrica, 57, 571–587. (Reprinted as Chapter 5 in this volume.)
G. Shafer (1979) “Allocations of Probability,” Ann. Prob., 7, 827–839.
H. Volkmer and H. Weber (1983) “Der Wertebereich Atomloser Inhalte,” Archiv der Mathematik, 40, 464–474.
P. Wakker (1996) “Preference Conditions for Convex and Concave Capacities in Choquet Expected Utility,” mimeo.
L. Wasserman (1990) “Bayes’ Theorem for Choquet Capacities,” Ann. Stat., 18, 1328–1339.
M. Yaari (1969) “Some Remarks on Measures of Risk Aversion and on Their Uses,” J. Econ. Theory, 1, 315–329.
J. Zhang (1997) “Subjective Ambiguity, Probability and Capacity,” U. Toronto, mimeo.
10 Ambiguity made precise
A comparative foundation
Paolo Ghirardato and Massimo Marinacci
10.1. Introduction

In this chapter we propose and characterize a formal definition of ambiguity aversion for a class of preference models which encompasses the most popular models developed to allow ambiguity attitude in decision making. Using this notion, we define and characterize ambiguity of events for ambiguity averse or loving preferences. Our analysis is based on a fully “subjective” framework with no extraneous devices (like a roulette wheel, or a rich set of exogenously “unambiguous” events). This yields a definition that can be fruitfully used with any preference in the mentioned class, though it limits the definition’s ability to distinguish “real” ambiguity aversion from other behavioral traits that have been observed experimentally.

The subjective expected utility (SEU) theory of decision making under uncertainty of Savage (1954) is firmly established as the choice-theoretic underpinning of modern economic theory. However, such success has well-known costs: SEU’s simple and powerful representation is often violated by actual behavior, and it imposes unwanted restrictions. In particular, Ellsberg’s (1961) famous thought experiment (see Section 10.6) convincingly shows that SEU cannot take into account the possibility that the information a decision maker (DM) has about some relevant uncertain event is vague or imprecise, and that such “ambiguity” affects her behavior. Ellsberg observed that ambiguity affected his “nonexperimental” subjects in a consistent fashion: Most of them preferred to bet on unambiguous rather than ambiguous events.
Furthermore, he found that even when shown the inconsistency of their behavior with SEU, the subjects stood their ground “because it seems to them the sensible way to behave.” This attitude has later been named ambiguity aversion, and has received ample experimental confirmation.1 Savage was well aware of this limit of SEU, for he wrote:

There seem to be some probability relations about which we feel relatively “sure” as compared with others. . . . The notion of “sure” and “unsure”
Ghirardato, P. and M. Marinacci (2002), “Ambiguity made precise: A comparative foundation,” Journal of Economic Theory, 102, 251–289.
introduced here is vague, and my complaint is precisely that neither the theory of personal probability, as it is developed in this book, nor any other device known to me renders the notion less vague.
(Savage 1954: 57–58 of the 1972 edition)
In the wake of Ellsberg’s contribution, extensions of SEU have been developed allowing ambiguity, and the DM’s attitude towards it, to play a role in her choices. Two methods for extending SEU have established themselves as the standards of this literature. The first, originally proposed in Schmeidler (1989), is to allow the DM’s beliefs on the state space to be represented by nonadditive probabilities, called capacities, and her preferences by Choquet integrals (which reduce to standard integrals when taken with respect to additive probabilities). For this reason, this generalization is called the theory of Choquet expected utility (CEU) maximization. The second, axiomatized by Gilboa and Schmeidler (1989), allows the DM’s beliefs to be represented by multiple probabilities, and represents her preferences by the minimum of the expected utilities over this set. This generalization is thus called the maxmin expected utility (MEU) theory.

Here we use the general class of preferences with ambiguity attitudes developed in Ghirardato and Marinacci (2000a). These orderings, which we call biseparable preferences, are all those such that the ranking of consequences can be represented by a state-independent cardinal utility u, and the ranking of bets on events by u and a unique numerical function (a capacity) ρ.2 The latter represents the DM’s willingness to bet; that is, ρ(A) is roughly the number of euros she is willing to exchange for a bet that pays 1 euro if event A obtains and 0 euros otherwise. The only restriction imposed on the ranking of nonbinary acts is a mild dominance condition. CEU and MEU are special cases of biseparable preferences, where ρ is respectively the DM’s nonadditive belief and the lower envelope of her multiple probabilities.

An important reason for the lasting success of SEU theory is the elegant theory of the measurement of risk aversion developed from the seminal contributions of de Finetti (1952), Arrow (1974) and Pratt (1964).
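To make the two generalizations just described concrete, here is a small computational sketch. It is purely illustrative (the state space, capacity, act, and prior set are all invented): a CEU functional evaluates an act by the discrete Choquet integral of its utility profile with respect to a capacity, while an MEU functional takes the worst-case expected utility over a set of priors.

```python
from itertools import chain, combinations

states = ("s1", "s2", "s3")

def subsets(xs):
    """All subsets of xs, as frozensets (the domain of a capacity)."""
    return [frozenset(a) for a in
            chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))]

def choquet(util, nu):
    """Discrete Choquet integral of the utility profile `util` w.r.t. capacity `nu`.

    Walk the distinct utility levels from highest to lowest; each level u is
    weighted by nu({util >= u}) - nu({util > u}).
    """
    total, prev = 0.0, 0.0
    for u in sorted(set(util.values()), reverse=True):
        event = frozenset(s for s in util if util[s] >= u)
        total += u * (nu[event] - prev)
        prev = nu[event]
    return total

def meu(util, priors):
    """Maxmin expected utility: the worst-case expectation over the prior set."""
    return min(sum(p[s] * util[s] for s in util) for p in priors)

act = {"s1": 3.0, "s2": 1.0, "s3": 2.0}                    # utility levels of an act
additive = {A: len(A) / 3 for A in subsets(states)}        # a probability measure
convex = {A: (len(A) / 3) ** 2 for A in subsets(states)}   # a convex capacity

# With the additive capacity the Choquet integral is the ordinary expectation
# (here 2.0); with the convex capacity it is lower (14/9), reflecting the
# pessimistic weighting that a convex capacity induces.
```

With a convex capacity, the Choquet integral coincides with the minimum expectation over the capacity's core, which is one way to see the tight link between the CEU and MEU representations.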
Unlike risk aversion, ambiguity aversion still lacks a fully general formalization, one that does not require extraneous devices and applies to most if not all the existing models of ambiguity averse behavior. This chapter attempts to fill this gap: We propose a definition of ambiguity aversion and show its formal characterization in the general decision-theoretic framework of Savage, whose only restriction is a richness condition on the set of consequences. Our definition is behavioral; that is, it only requires observation of the DM’s preferences on acts in this fully subjective setting. However, the definition works as well (indeed better, see Proposition 10.2) in the Anscombe–Aumann framework, a special case of Savage’s framework which presumes the existence of an auxiliary device with “known” probabilities.

Decision models with ambiguity averse preferences are the object of increasing attention by economists and political scientists interested in explaining phenomena at odds with SEU. For example, they have been used to explain the existence of incomplete contracts (Mukerji, 1998), the existence of substantial volatility in stock markets (Epstein and Wang, 1994; Hansen et al., 1999), or selective
abstention in political elections (Ghirardato and Katz, 2000). We hope that the characterization provided here will turn out to be useful for the “applications” of models of ambiguity aversion, as that of risk aversion was for the applications of SEU. More concretely, we hope that it will help to understand the predictive differences of risk and ambiguity attitudes.

To understand our definition, it is helpful to go back to the characterization of risk aversion in the SEU model. The following approach to defining risk aversion was inspired by Yaari (1969). Given a state space S, let F denote a collection of “acts,” maps from S into R (e.g. monetary payoffs). Define a comparative notion of risk aversion for SEU preferences as follows: Say that ≽2 is more risk averse than ≽1 if they have identical beliefs and the following implications hold for every “riskless” (i.e. constant) act x and every “risky” act f:

x ≽1 f ⇒ x ≽2 f   (10.1)

x ≻1 f ⇒ x ≻2 f   (10.2)
(where ≻ is the asymmetric component of ≽). Identity of beliefs is required to avoid possible confusion between differences in risk attitudes and in beliefs (cf. Yaari, 1969: 317). We can use this comparative ranking to obtain an absolute notion of risk aversion by calling some DMs (for instance, expected value maximizers) risk neutral, and by then calling risk averse those DMs who are more risk averse than the risk neutral ones. As is well known, this “comparatively founded” notion has the usual characterization. Like the traditional “direct” definition of risk aversion, it is fully behavioral in the sense defined above. However, its interpretation is based on two primitive assumptions. First, constant acts are intuitively riskless. Second, expected value maximization intuitively reflects risk neutral behavior, so that it can be used as our benchmark for measuring risk aversion.

In this chapter, we follow the example of Epstein (1999) in giving a comparative foundation to ambiguity attitude: We start from a “more ambiguity averse than . . .” ranking and then establish a benchmark, thus obtaining an “absolute” definition of ambiguity aversion. Analogously to Yaari’s, our “more ambiguity averse . . .” relation is based on the following intuitive consideration: If a DM prefers an unambiguous (resp. ambiguous) act to an ambiguous (resp. unambiguous) one, a more (resp. less) ambiguity averse one will do the same. This is natural, but it raises the obvious question of which acts should be used as the “unambiguous” acts for this ranking. Depending on the decision problem the DM is facing and on her information, there might be different sets of “obviously” unambiguous acts; that is, acts that we are confident any DM perceives as unambiguous. It seems intuitive to us that in any well-formulated problem, the constant acts will be in this set.
Hence, we make our first primitive assumption: Constant acts are the only acts that are “obviously” unambiguous in any problem, since other acts may not be perceived as unambiguous by some DM in some state of information. This assumption implies that a preference (not necessarily SEU) ≽2 is more ambiguity averse than ≽1 whenever Equations (10.1) and (10.2) hold. However, the following example casts some doubt on the intuitive appeal of such a definition.
Example 10.1. Consider an (Ellsberg) urn containing balls of two colors: Black and Red. Two DMs are facing this urn, and they have no information on its composition. The first DM has SEU preferences ≽1, with a utility function on the set of consequences R given by u1(x) = x, and beliefs on the state space of ball extractions S = {B, R} given by

ρ1(B) = 1/2 and ρ1(R) = 1/2.
The second DM also has SEU preferences, and identical beliefs: Her preference ≽2 is represented by u2(x) = √x and ρ2 = ρ1. Both (10.1) and (10.2) hold, but it is quite clear that this is due to differences in the DMs’ risk attitudes, and not in their ambiguity attitudes: They both apparently disregard the ambiguity in their information.

Given a biseparable preference, call cardinal risk attitude the psychological trait described by the utility function u: what explains any differences in the choices over bets of two biseparable preferences with the same willingness to bet ρ. The problem with the example is that the two DMs have different cardinal risk attitudes. To avoid confusions of this sort, our comparative ambiguity ranking uses Equations (10.1) and (10.2) only on pairs which satisfy a behavioral condition, called cardinal symmetry, that implies that the two DMs have identical u. As it only looks at each DM’s preferences over bets on one event (which may be different across DMs), cardinal symmetry does not impose any restriction on the DMs’ relative ambiguity attitudes.

Having thus constructed the comparative ambiguity ranking, we next choose a benchmark against which to measure ambiguity aversion. It seems generally agreed that SEU preferences are intuitively ambiguity neutral. We use SEU preferences as benchmarks because we posit (our second primitive assumption) that they are the only ones that are “obviously” ambiguity neutral in any decision problem and in any situation. Thus, ambiguity averse is any preference relation ≽ for which there is a SEU preference “less ambiguity averse than” ≽. Ambiguity love and (endogenous) neutrality are defined in the obvious way.

The main results in the chapter present the characterization of these notions of ambiguity attitude for biseparable preferences. The characterization of ambiguity neutrality is simply stated: A preference is ambiguity neutral if and only if it has a SEU representation.
That is, the only preferences which are endogenously ambiguity neutral are SEU. The general characterization of ambiguity aversion (resp. love) implies in particular that a preference is ambiguity averse (resp. loving) only if its willingness to bet ρ is pointwise dominated by (resp. pointwise dominates) a probability. In the CEU case, the converse is also true: A CEU preference is ambiguity averse if and only if its belief (which is equal to ρ) is dominated by a probability; that is, it has a nonempty “core.” On the other hand, all MEU preferences are ambiguity averse, as is intuitive. As to comparative ambiguity aversion, we find that if ≽2 is more ambiguity averse than ≽1 then ρ1 ≥ ρ2. That is, a less ambiguity averse DM will have a uniformly higher willingness to
bet. The latter condition is also sufficient for CEU preferences, whereas for MEU preferences containment of the sets of probabilities is necessary and sufficient for relative ambiguity aversion.

We next briefly turn to the issue of defining ambiguity itself. A “behavioral” notion of unambiguous act follows naturally from our earlier analysis: Say that an act is unambiguous if an ambiguity averse (or loving) DM evaluates it in an ambiguity neutral fashion. The unambiguous events are those that unambiguous acts depend upon. We obtain the following simple characterization of the set of unambiguous events for biseparable preferences: For an ambiguity averse (or loving) DM with willingness to bet ρ, event A is unambiguous if and only if ρ(A) + ρ(A^c) = 1. (A more extensive discussion of ambiguity is contained in the companion paper, Ghirardato and Marinacci, 2000a.)

Finally, as an application of the previous analysis, we consider the classical Ellsberg problem with a 3-color urn. We show that the theory delivers the intuitive answers, once the information provided to the DM is correctly incorporated.

It is important to underscore from the outset two important limitations of the notions of ambiguity attitude we propose. The first limitation is that while the comparative foundation makes our absolute notion “behavioral,” in the sense defined above, it also makes it computationally demanding. A more satisfactory definition would be one which is more “direct”: one that can be verified by observing a smaller subset of the DM’s preference relation. While we conjecture that it may be possible to construct such a definition, obtaining the same characterization as the one proposed here, we leave its development to future work. Our comparative notion is more direct, and thus less amenable to this criticism. However, it is in turn limited by the requirement of the identity of cardinal risk attitude.
The absolute notion is not limited in this way, as it conceptually builds on the comparison of the DM with an idealized version of herself, identical to her in all traits but her ambiguity aversion. The second limitation stems from the fact that no extraneous devices are used in this chapter. An advantage of this is that our notions apply to any decision problem under uncertainty, and our results to any biseparable preference. However, such wide scope carries costs: Our notion of ambiguity aversion comprises behavioral traits that may not be due to ambiguity—like probabilistic risk aversion, the tendency to discount “objective” probabilities that has been observed in many experiments on decision making under risk (including the celebrated “Allais paradox”). Thus, one may consider it more appropriate to use a different name for what is measured here, like “chance aversion” or “extended ambiguity aversion.” The reason for our choice of terminology is that we see a ranking of conceptual importance between ambiguity aversion/love and other departures from SEU maximization. As we argued above using Savage’s words, the presence of ambiguity provides a normatively compelling reason for violating SEU. We do not feel that other documented reasons are similarly compelling. Moreover, we hold (see below and Subsection 10.7.3) that extraneous devices—say, a rich set of exogenously “unambiguous” events—are required for ascertaining the reason for a given departure. Thus, when these devices are not available—say, because the set
of “unambiguous” events is not rich enough—we prefer to attribute a departure to the reasons we find normatively more compelling. However, the reader is warned, so that he/she may choose to give a different name to the phenomenon we formally describe.
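Two of the characterizations stated above (a CEU preference is ambiguity averse iff its capacity has a nonempty core, and an event A is unambiguous iff ρ(A) + ρ(A^c) = 1) are easy to check mechanically. The sketch below is illustrative only: the willingness to bet is an invented stand-in for a 3-color urn in which one third of the balls are known to be B and the R/Y split is unknown.

```python
from itertools import chain, combinations

states = ("B", "R", "Y")
full = frozenset(states)

def subsets(xs):
    """All subsets of xs, as frozensets."""
    return [frozenset(a) for a in
            chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))]

# Invented willingness to bet: only the B-mass (1/3) and the combined R+Y
# mass (2/3) are "known"; bets on R or Y alone get no credit.
rho = {}
for A in subsets(states):
    rho[A] = (1/3 if "B" in A else 0.0) + (2/3 if {"R", "Y"} <= A else 0.0)

def unambiguous(A):
    """Characterization in the text: A is unambiguous iff rho(A) + rho(A^c) = 1."""
    return abs(rho[A] + rho[full - A] - 1.0) < 1e-9

# {B} and {R, Y} come out unambiguous; {R} alone does not.
# The uniform probability dominates rho eventwise, so rho has a nonempty core,
# which is the nonemptiness condition for ambiguity aversion of a CEU preference.
uniform = {A: len(A) / 3 for A in subsets(states)}
assert all(uniform[A] >= rho[A] - 1e-9 for A in subsets(states))
```

Exhibiting one dominating probability, as the last assertion does, suffices to show the core is nonempty; deciding core nonemptiness for an arbitrary capacity would instead require solving a small linear feasibility problem.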
10.1.1. The related literature The problem of defining ambiguity and ambiguity aversion is discussed in a number of earlier papers. The closest to ours in spirit and generality is Epstein (1999), the first paper to develop a notion of absolute ambiguity aversion from a comparative foundation.3 As we discuss in more detail in Subsection 10.7.3, the comparative notion and benchmarks he uses are different from ours. Epstein’s objective is to provide a more precise measurement of ambiguity attitude than the one we attempt here; in particular, to filter out probabilistic risk aversion. For this reason, he assumes that in the absence of ambiguity a DM’s preferences are “probabilistically sophisticated” in the sense of Machina and Schmeidler (1992). However, we argue that for its conclusions to conform with intuition, Epstein’s approach requires an extraneous device: a rich set of acts which are exogenously established to be “unambiguous,” much larger than the set of the constants that we use. Thus, the higher accuracy of his approach limits its applicability vis à vis our cruder but less demanding approach. The most widely known and accepted definition of absolute ambiguity aversion is that proposed by Schmeidler in his seminal CEU model (Schmeidler, 1989). Employing an Anscombe–Aumann framework, he defines ambiguity aversion as the preference for “objective mixtures” of acts, and he shows that for CEU preferences this notion is characterized by the convexity of the capacity representing the DM’s beliefs. While the intuition behind this definition is certainly compelling, Schmeidler’s axiom captures more than our notion of ambiguity aversion. It gives rise to ambiguity averse behavior, but it entails additional structure that does not seem to be related to ambiguity aversion (see Example 10.4). 
Doubts about the relation of convexity to ambiguity aversion in the CEU case are also raised by Epstein (1999), but he concludes that they are completely unrelated (see Section 10.6 for a discussion). There are other interesting papers dealing with ambiguity and ambiguity aversion. In a finite setting, Kelsey and Nandeibam (1996) propose a notion of comparative ambiguity for the CEU and MEU models similar to ours and obtain a similar characterization, as well as an additional characterization in the CEU case. Unlike us, they do not consider absolute ambiguity attitude, and they do not discuss the distinction between cardinal risk attitude and ambiguity attitude. Montesano and Giovannoni (1996) notice a connection between absolute ambiguity aversion in the CEU model and nonemptiness of the core, but their argument rests purely on intuitive considerations about Ellsberg’s example. Chateauneuf and Tallon (1998) present an intuitive necessary and sufficient condition for nonemptiness of the core of CEU preferences in an Anscombe–Aumann framework. Zhang
(1996), Nehring (1999), and Epstein and Zhang (2001) propose different definitions of unambiguous event and act. Fishburn (1993) characterizes axiomatically a primitive notion of ambiguity.

10.1.2. Organization

The structure of the chapter is as follows. Section 10.2 provides the necessary definitions and set-up. Section 10.3 introduces the notions of ambiguity aversion. The cardinal symmetry condition is introduced in Subsection 10.3.1, and the comparative and absolute definitions in 10.3.2. Section 10.4 presents the characterization results. Section 10.5 contains the notions of unambiguous act and event, and the characterization of the latter. In Section 10.6, we go back to the Ellsberg urn and show the implications of our results for that example. Section 10.7 discusses the key aspects of our approach, in particular, the choices of the comparative ambiguity ranking and the benchmark for defining ambiguity neutrality; it thus provides a more detailed comparison with Epstein’s (1999) approach. The Appendices contain the proofs and some technical material.
10.2. Set-up and preliminaries

The general set-up of Savage (1954) is the following. There is a set S of states of the world, an algebra Σ of subsets of S, and a set X of consequences. The choice set F is the set of all finite-valued acts f : S → X which are measurable w.r.t. Σ. With the customary abuse of notation, for x ∈ X we define x ∈ F to be the constant act x(s) = x for all s ∈ S, so that X ⊆ F. Given A ∈ Σ, we denote by xAy the binary act (bet) f ∈ F such that f(s) = x for s ∈ A, and f(s) = y for s ∉ A.

Our definitions require that the DM’s preferences be represented by a weak order ≽ on F: a complete and transitive binary relation, with asymmetric (resp. symmetric) component ≻ (resp. ∼). The weak order ≽ is called nontrivial if there are f, g ∈ F such that f ≻ g. We henceforth call preference relation any nontrivial weak order on F. A functional V : F → R is a representation of ≽ if for every f, g ∈ F, f ≽ g if and only if V(f) ≥ V(g). A representation V is called: monotonic if f(s) ≽ g(s) for every s ∈ S implies V(f) ≥ V(g); nontrivial if V(f) > V(g) for some f, g ∈ F.

While the definitions apply to any preference relation, our results require a little more structure, provided by a general decision model introduced in Ghirardato and Marinacci (2000a). To present it, we need the following notion of “nontrivial” event: Given a preference relation ≽, A ∈ Σ is essential for ≽ if for some x, y ∈ X, we have x ≻ xAy ≻ y.

Definition 10.1. Let ≽ be a binary relation. We say that a representation V : F → R of ≽ is canonical if it is nontrivial, monotonic, and there exists
a set-function ρ : Σ → [0, 1] such that, letting u(x) ≡ V(x) for all x ∈ X, for all consequences x ≽ y and all events A,

V(xAy) = u(x) ρ(A) + u(y) (1 − ρ(A)).   (10.3)
A relation is called a biseparable preference if it admits a canonical representation, and moreover such representation is unique up to a positive affine transformation when ≽ has at least one essential event. Clearly, a biseparable preference is a preference relation. If V is a canonical representation of ≽, then u is a cardinal state-independent representation of the DM’s preferences over consequences, hence we call it his canonical utility index. Moreover, for all x ≻ y and all events A, B ∈ Σ we have xAy ≽ xBy if and only if ρ(A) ≥ ρ(B). Thus, ρ represents the DM’s willingness to bet (likelihood relation) on events. ρ is easily shown to be a capacity (a set-function normalized and monotonic w.r.t. set inclusion), so that V evaluates binary acts by taking the Choquet expectation of u with respect to ρ.4 However, the DM’s preferences over nonbinary acts are not constrained to a specific functional form.

To understand the rationale of the clause relating to essential events, first observe that for any ≽ with a canonical representation with willingness to bet ρ, an event A is essential if and only if 0 < ρ(A) < 1. Thus, there are no essential events iff ρ(A) is either 0 or 1 for every A; that is, the DM behaves as if he does not judge any bet to be uncertain, and his canonical utility index is ordinal. In such a case, the DM’s cardinal risk attitude is intuitively not defined: without an uncertain event there is no risk. On the other hand, it can be shown (Ghirardato and Marinacci, 2000a: Theorem 4) that cardinal risk attitude is characterized by a cardinal property of the canonical utility index, its concavity. Hence the additional requirement in Definition 10.1 guarantees that, when there is some uncertain event, cardinal risk aversion is well defined. As differences in two DMs’ cardinal risk attitudes might play a role in the choices in Equations (10.1) and (10.2), it is useful to identify the situation in which these attitudes are defined: Say that preference relations ≽1 and ≽2 have essential events if there are events A1, A2 ∈ Σ such that for each i = 1, 2, Ai is essential for ≽i.

To avoid repetitions, the following lists all the assumptions on the structure of the decision problem and on the DM’s preferences that are tacitly assumed in all results in the chapter:

Structural Assumption. X is a connected and separable topological space (e.g. a convex subset of R^n with the usual topology). Every biseparable preference on F has a continuous canonical utility function.

A full axiomatic characterization of the biseparable preferences satisfying the Structural Assumption is provided in Ghirardato and Marinacci (2000a).

10.2.1. Some examples of biseparable preferences

As mentioned earlier, the biseparable preference model is very general. In fact, it contains most of the known preference models that obtain a separation between
As differences in two DMs' cardinal risk attitudes might play a role in the choices in Equations (10.1) and (10.2), it is useful to identify the situations in which these attitudes are defined: Say that preference relations ≽1 and ≽2 have essential events if there are events A1, A2 ∈ Σ such that for each i = 1, 2, Ai is essential for ≽i. To avoid repetition, the following lists all the assumptions on the structure of the decision problem and on the DM's preferences that are tacitly assumed in all results in the chapter:

Structural Assumption. X is a connected and separable topological space (e.g. a convex subset of Rn with the usual topology). Every biseparable preference on F has a continuous canonical utility function.

A full axiomatic characterization of the biseparable preferences satisfying the Structural Assumption is provided in Ghirardato and Marinacci (2000a).

10.2.1. Some examples of biseparable preferences

As mentioned earlier, the biseparable preference model is very general. In fact, it contains most of the known preference models that obtain a separation between
Ambiguity made precise
cardinal (state-independent) utility and willingness to bet. We now illustrate this claim by showing some examples of decision models which, under mild additional restrictions (e.g. the Structural Assumption), belong to the biseparable class. (More examples and details are found in Ghirardato and Marinacci, 2000a.)

(i) A binary relation ≽ on F is a CEU ordering if there exist a cardinal utility index u on X and a capacity ν on (S, Σ) such that ≽ can be represented by the functional V : F → R defined by the following equation:

V(f) = ∫_S u(f(s)) ν(ds),  (10.4)

where the integral is taken in the sense of Choquet (notice that it is finite because each act in F is finite-valued). The functional V is immediately seen to be a canonical representation of ≽, and ρ = ν is its willingness to bet. An important subclass of CEU orderings are the SEU orderings, which correspond to the special case in which ν is a probability measure, that is, a finitely additive capacity. See Wakker (1989) for an axiomatization of CEU and SEU preferences (satisfying the Structural Assumption) in the Savage setting.

(ii) Let ∆ denote the set of all the probability measures on (S, Σ). A binary relation ≽ on F is a MEU ordering if there exist a cardinal utility index u and a unique nonempty, (weak∗)-compact and convex set C ⊆ ∆ such that ≽ can be represented by the functional V : F → R defined by the following equation:

V(f) = min_{P ∈ C} ∫_S u(f(s)) P(ds).  (10.5)

SEU also corresponds to the special case of MEU in which C = {P} for some probability measure P. If we now let, for any A ∈ Σ,

P̲(A) = min_{P ∈ C} P(A),  (10.6)

we see that P̲ is an exact capacity. While in general V(f) is not equal to the Choquet integral of u(f) with respect to P̲, this is the case for binary acts f. This shows that V is a canonical representation of ≽, with willingness to bet ρ = P̲. See Casadesus-Masanell et al. (2000) for an axiomatization of MEU preferences (satisfying the Structural Assumption) in the Savage setting. More generally, consider an α-MEU preference, which assigns some weight to both the worst-case and best-case scenarios. Formally, there is a cardinal utility u, a set of probabilities C, and α ∈ [0, 1], such that ≽ is
represented by

V(f) = α min_{P ∈ C} ∫_S u(f(s)) P(ds) + (1 − α) max_{P ∈ C} ∫_S u(f(s)) P(ds).

This includes the case of a "maximax" DM, who has α ≡ 0. V is canonical, so that ≽ is biseparable, with ρ given by ρ(A) = α min_{P ∈ C} P(A) + (1 − α) max_{P ∈ C} P(A), for A ∈ Σ.

(iii) Consider a binary relation ≽ constructed as follows: There is a cardinal utility u, a probability P and a number β ∈ [0, 1] such that ≽ is represented by

V(f) ≡ (1 − β) ∫_S u(f(s)) P(ds) + β ϕ(u ◦ f),

where

ϕ(u ◦ f) ≡ sup { ∫_S u(g(s)) P(ds) : g ∈ F binary, u(g(s)) ≤ u(f(s)) for all s ∈ S }.

≽ describes a DM who behaves as if he was maximizing SEU when choosing among binary acts, but not when comparing more complex acts. The higher the parameter β, the farther the preference relation is from SEU on nonbinary acts. V is monotonic and it satisfies Equation (10.3) with ρ = P, so that it is a canonical representation of ≽.

10.2.2. The Anscombe–Aumann case

The Anscombe–Aumann framework is a widely used special case of our framework in which the consequences have an objective feature: X is also a convex subset of a vector space. For instance, X is the set of all the lotteries on a set of prizes if the DM has access to an "objective" independent randomizing device. In this framework, it is natural to consider the following variant of the biseparable preference model—where for every f, g ∈ F and α ∈ [0, 1], αf + (1 − α)g denotes the act which pays αf(s) + (1 − α)g(s) ∈ X for every s ∈ S.

Definition 10.2. A canonical representation V of a preference relation ≽ is constant linear (c-linear for short) if V(αf + (1 − α)x) = αV(f) + (1 − α)V(x) for all binary f ∈ F, x ∈ X, and α ∈ [0, 1]. A relation is called a c-linearly biseparable preference if it admits a c-linear canonical representation.

Again, an axiomatic characterization of this model is found in Ghirardato and Marinacci (2000a). It generalizes the SEU model of Anscombe and Aumann (1963) and many non-EU extensions that followed, like the CEU and MEU models of Schmeidler (1989) and Gilboa and Schmeidler (1989) respectively. In fact, a c-linearly biseparable preference behaves in a SEU fashion over the set X of the
constant acts, but it is almost unconstrained over nonbinary acts. (C-linearity guarantees the cardinality of V and hence u.) All the results in this chapter are immediately translated to this class of preferences, in particular to the CEU and MEU models in the Anscombe–Aumann framework mentioned earlier. Indeed, as we show in Proposition 10.2 later, in this case removing cardinal risk aversion is much easier than in the more general framework we use.
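Before moving on, the three functional forms of Subsection 10.2.1 can be illustrated numerically. All numbers in the sketch below (the set of priors C and the acts) are hypothetical; it evaluates a Choquet integral, the MEU functional (10.5) and an α-MEU functional, and checks the claim in example (ii) that on binary acts MEU coincides with the Choquet integral with respect to the lower envelope (10.6):

```python
from itertools import chain, combinations

# Hypothetical numbers throughout: a 3-state space and a set of priors C
# given by its extreme points.
S = ["s1", "s2", "s3"]
events = [frozenset(c) for c in
          chain.from_iterable(combinations(S, r) for r in range(len(S) + 1))]

C = [{"s1": 0.5, "s2": 0.5, "s3": 0.0},
     {"s1": 0.2, "s2": 0.3, "s3": 0.5}]

# Lower envelope (10.6): for each event, the min over the convex hull of C is
# attained at an extreme point, so minimizing over the list C is enough.
P_lower = {A: min(sum(p[s] for s in A) for p in C) for A in events}

def choquet(uf, nu):
    """Choquet integral of the finite act s -> uf[s] w.r.t. capacity nu."""
    order = sorted(uf, key=uf.get, reverse=True)  # states by decreasing utility
    total, A, prev = 0.0, frozenset(), 0.0
    for s in order:
        A = A | {s}
        total += uf[s] * (nu[A] - prev)
        prev = nu[A]
    return total

def meu(uf):
    """MEU functional (10.5): worst-case expected utility over C."""
    return min(sum(p[s] * uf[s] for s in S) for p in C)

def alpha_meu(uf, alpha):
    """alpha-MEU: weight alpha on the worst case, 1 - alpha on the best case."""
    best = max(sum(p[s] * uf[s] for s in S) for p in C)
    return alpha * meu(uf) + (1 - alpha) * best

# On a binary act, MEU coincides with the Choquet integral w.r.t. P_lower,
# as claimed in example (ii); on nonbinary acts the two may differ.
binary = {"s1": 1.0, "s2": 0.0, "s3": 0.0}  # the bet "1 on {s1}, 0 otherwise"
assert abs(meu(binary) - choquet(binary, P_lower)) < 1e-12
```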
10.3. The definitions

As anticipated in the Introduction, the point of departure of our search for an extended notion of ambiguity aversion is the following partial order on preference relations:

Definition 10.3. Let ≽1 and ≽2 be two preference relations. We say that ≽2 is more uncertainty averse than ≽1 if, for all x ∈ X and f ∈ F, both

x ≽1 f ⇒ x ≽2 f  (10.7)

and

x ≻1 f ⇒ x ≻2 f.  (10.8)
This order has the advantage of making the weakest prejudgment on which acts are "intuitively" unambiguous: the constants. However, Example 10.1 illustrates that it does not discriminate between cardinal risk attitude and ambiguity attitude: DMs 1 and 2 are intuitively both ambiguity neutral, but ≽1 is more cardinal risk averse, and hence more uncertainty averse, than ≽2. The problem is that constant acts are "neutral" with respect to ambiguity and with respect to cardinal risk. Given that our objective is comparing ambiguity attitudes, we thus need to find ways to coarsen the ranking above, so as to identify which part is due to differences in cardinal risk attitude and which is due to differences in ambiguity attitude.

10.3.1. Filtering cardinal risk attitude

While the "factorization" just described can be achieved easily if we impose more structure on the decision framework (see, e.g. the discussion in Subsection 10.7.3), we present a method for separating cardinal risk and ambiguity attitude which is based only on preferences, does not employ extraneous devices, and obtains the result for all biseparable preferences. Moreover, this approach does not impose any restrictions on the two DMs' beliefs (and hence on their relative ambiguity attitude), a problem that all alternatives share. The key step is coarsening comparative uncertainty aversion by adding the following restriction on which pairs of preferences are to be compared (we write {x, y} ≻ z as a short-hand for x ≻ z and y ≻ z, and similarly for ≺):
Definition 10.4. Two preference relations ≽1 and ≽2 are cardinally symmetric if for any pair (A1, A2) ∈ Σ × Σ such that each Ai is essential for ≽i, i = 1, 2, and any v_*, v^*, w_*, w^* ∈ X such that v_* ≺1 v^* and w_* ≺2 w^*, we have:

• If there are x, y ∈ X such that v_* ≻1 {x, y}, w_* ≻2 {x, y}, and

v_* A1 x ∼1 v^* A1 y  and  w_* A2 x ∼2 w^* A2 y,  (10.9)

then for every x′, y′ ∈ X such that v_* ≻1 {x′, y′} and w_* ≻2 {x′, y′} we have

v_* A1 x′ ∼1 v^* A1 y′ ⟺ w_* A2 x′ ∼2 w^* A2 y′.  (10.10)

• Symmetrically, if there are x, y ∈ X such that v^* ≺1 {x, y}, w^* ≺2 {x, y}, and

x A1 v^* ∼1 y A1 v_*  and  x A2 w^* ∼2 y A2 w_*,  (10.11)

then for every x′, y′ ∈ X such that v^* ≺1 {x′, y′} and w^* ≺2 {x′, y′} we have

x′ A1 v^* ∼1 y′ A1 v_* ⟺ x′ A2 w^* ∼2 y′ A2 w_*.  (10.12)
This condition is inspired by the utility construction technique used in the axiomatizations of additive conjoint measurement in, for example, Krantz et al. (1971) and Wakker (1989). A few remarks are in order: First, cardinal symmetry holds vacuously for any pair of preferences which do not have essential events. Second, cardinal symmetry does not impose restrictions on the DMs' relative ambiguity attitudes. In fact, for all acts ranked by ≽i, the consequence obtained if Ai obtains is always strictly better than that obtained if Ai^c does, so that all acts are bets on the same event Ai. Intuitively, a DM's ambiguity attitude affects these bets symmetrically, so that his preferences do not convey any information about it. Moreover, cardinal symmetry does not constrain the DMs' relative confidence in A1 and A2, since the "win" (or "loss") payoffs can be different for the two DMs. On the other hand, it does, unsurprisingly, restrict their relative cardinal risk attitudes. To better understand the restrictions implied by cardinal symmetry, assume that consequences are monetary payoffs and that both DMs prefer more money to less. Suppose that, when betting on events (A1, A2), (10.9) holds for some "loss" payoffs x and y and "win" payoffs v^* ≻1 v_* and w^* ≻2 w_* respectively. This says that exchanging v_* for v^* as the prize for A1, and w_* for w^* as the prize for A2, can for both DMs be traded off with a reduction in "loss" from x to y. Suppose that when the initial loss is x′ < x, ≽1 is willing to trade off the increase in "win" with a reduction in "loss" to y′, but ≽2 accepts reducing "loss" only to y″ > y′ (i.e., w_* A2 x′ ≻2 w^* A2 y′, in violation of (10.10)). That is, as the amount of the low payoff decreases, DM 2 becomes more sensitive to differences in payoffs than DM 1.
Such diversity of behavior—that we intuitively attribute to differences in the DMs’ risk attitude—is ruled out by cardinal symmetry, which requires that the two DMs consistently agree on the acceptable tradeoff for improving their “win”
payoff, and similarly for the "loss" payoff. It is important to stress that this discussion makes sense only when both DMs face nontrivial uncertainty (i.e. they are both betting on essential events). Thus, we do not use "trade-off" to mean certain substitution; rather, substitution in the context of an uncertain prospect. To see how cardinal symmetry is used to show that two biseparable preferences have the same cardinal risk attitude, assume first that the two relations are ordinally equivalent: for every x, y ∈ X, x ≽1 y ⇔ x ≽2 y. When that is the case, cardinal symmetry holds if and only if their canonical utility indices are positive affine transformations of each other. To simplify the statements, we write u1 ≈ u2 to denote such "equality" of indices.

Proposition 10.1. Suppose that ≽1 and ≽2 are ordinally equivalent biseparable preferences which have essential events. Then ≽1 and ≽2 are cardinally symmetric if and only if their canonical utility indices satisfy u1 ≈ u2.

The intuition of the proof (see Appendix B) can be quickly grasped by rewriting, say, Equations (10.9) and (10.10) in terms of the canonical representations to find that for every x, y, x′, y′ ∈ X,

u1(x) − u1(y) = u1(x′) − u1(y′) ⟺ u2(x) − u2(y) = u2(x′) − u2(y′).

Notice, however, that this does not imply that the preferences are identical on binary acts: The DMs' beliefs on events could be totally different. The comparative notion of ambiguity aversion we propose in the next subsection checks comparative uncertainty aversion in preferences with the same cardinal risk attitude. Clearly, it would be nicer to have a comparative notion that also ranks preferences without the same cardinal risk attitude. In Subsection 10.7.1, we discuss how to extend our notion to deal with these cases. This extension requires the exact measurement of the two preferences' canonical utility indices, and is thus "less behavioral" than the one we just anticipated.
Finally, we remark that an exercise symmetric to the one performed here is to coarsen comparative uncertainty aversion so as to rank preferences by their cardinal risk aversion only. In Ghirardato and Marinacci (2000a) it is shown that for biseparable preferences this ranking is represented by the ordering of canonical utilities by their relative concavity, thus generalizing the standard result.

10.3.2. Comparative and absolute ambiguity aversion

Having thus prepared the ground, our comparative notion of ambiguity is immediately stated:

Definition 10.5. Let ≽1 and ≽2 be two preference relations. We say that ≽2 is more ambiguity averse than ≽1 whenever both the following
conditions hold: (A) ≽2 is more uncertainty averse than ≽1; (B) ≽1 and ≽2 are cardinally symmetric. Thus, we restrict our attention to pairs which are cardinally symmetric. As explained earlier, when one DM's preference does not have an essential event, cardinal risk aversion does not play a role in that DM's choices, so that we do not need to remove it from the picture.

Remark 10.1. So far, we have tacitly assumed that cardinal risk and ambiguity attitude completely characterize biseparable preferences. Indeed, the validity of this can be easily verified by observing that if two such preferences are "as uncertainty averse as" each other (i.e., ≽1 is more uncertainty averse than ≽2, and vice versa), they are identical.

We finally come to the absolute definition of ambiguity aversion and love. Let ≽ be a preference relation on F with a SEU representation.5 As we observed in Section 10.1, these relations intuitively embody ambiguity neutrality. We propose to use them as the benchmark for defining ambiguity aversion. Of course, one could intuitively hold that the SEU relations are not the only ones embodying ambiguity neutrality, and thus prefer using a wider set of benchmarks. This alternative route is discussed in Subsection 10.7.3.

Definition 10.6. A preference relation ≽ is ambiguity averse (loving) if there exists a SEU preference relation which is less (more) ambiguity averse than ≽. It is ambiguity neutral if it is both ambiguity averse and ambiguity loving.

If ≽′ is a SEU preference which is less ambiguity averse than ≽, we call it a benchmark preference for ≽. We denote by R(≽) the set of all benchmark preferences for ≽. That is, R(≽) ≡ {≽′ ⊆ F × F : ≽′ is SEU and ≽ is more ambiguity averse than ≽′}. Each benchmark preference ≽′ ∈ R(≽) induces a probability measure P on Σ, so a natural twin of R(≽) is the set of the benchmark measures: M(≽) ≡ {P ∈ ∆ : P represents ≽′, for some ≽′ ∈ R(≽)}.
Using this notation, Definition 10.6 can be rewritten as follows: ≽ is ambiguity averse if R(≽) ≠ Ø or, equivalently, M(≽) ≠ Ø.
10.4. The characterizations

We now characterize the notions of comparative and absolute ambiguity aversion defined in the previous section for the general case of biseparable preferences, and the important subcases of CEU and MEU preferences. To start, we use
Proposition 10.1 and the observation that the canonical utility index of a preference with no essential events is ordinal, to show that if two preferences are biseparable and they are ranked by Definition 10.5, then they have the same canonical utility index:

Theorem 10.1. Suppose that ≽1 and ≽2 are biseparable preferences, and that ≽2 is more ambiguity averse than ≽1. Then u1 ≈ u2.

Checking cardinal symmetry is clearly not a trivial task, but for an important subclass of preference relations—the c-linearly biseparable preferences in an Anscombe–Aumann setting—it is implied by comparative uncertainty aversion. In fact, under c-linearity, ordinal equivalence easily implies cardinal symmetry, so that we get:

Proposition 10.2. Suppose that X is a convex subset of a vector space, and that ≽1 and ≽2 are c-linearly biseparable preferences. Then ≽2 is more ambiguity averse than ≽1 if and only if ≽2 is more uncertainty averse than ≽1.

Therefore, in this case Definition 10.3 can be directly used as our definition of comparative ambiguity attitude.

10.4.1. Absolute ambiguity aversion

We first characterize absolute ambiguity aversion for a general biseparable preference ≽. Suppose that V is a canonical representation of ≽, with canonical utility u. We let

D(≽) ≡ { P ∈ ∆ : ∫_S u(f(s)) P(ds) ≥ V(f) for all f ∈ F }.
That is, D(≽), which depends only on V, is the set of beliefs inducing SEU preferences which assign (weakly) higher expected utility to every act f. These preferences exhaust the set of the benchmarks of ≽:

Theorem 10.2. Let ≽ be a biseparable preference. Then M(≽) = D(≽). In particular, ≽ is ambiguity averse if and only if D(≽) ≠ Ø.

Let ρ be the capacity associated with the canonical representation V. It is immediate to see that if P ∈ D(≽), then P ≥ ρ. Thus, nonemptiness of the core of ρ (the set of the probabilities that dominate ρ pointwise, which we denote C(ρ)) is necessary for ≽ to be ambiguity averse. In Subsection 10.4.2 it is shown to be not sufficient in general. Turn now to the characterization of ambiguity aversion for the popular CEU and MEU models. Suppose first that ≽ is a CEU preference relation represented by the capacity ν, and let C(ν) denote ν's possibly empty core. It is shown that D(≽) = C(ν), so that the following result—which also provides a novel decision-theoretic interpretation of the core as the set of all the benchmark measures—follows as a corollary of Theorem 10.2.
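Since, for a CEU preference, D(≽) reduces to the core of ν (Corollary 10.1 below), ambiguity aversion can be checked numerically. The following sketch uses an invented three-state capacity and a brute-force grid search for a core element, i.e. a probability dominating ν event by event:

```python
from itertools import chain, combinations

# Sketch with an invented three-state capacity nu: the induced CEU preference
# is ambiguity averse iff the core of nu is nonempty, i.e. some probability P
# satisfies P(A) >= nu(A) for every event A.
S = ["s1", "s2", "s3"]
events = [frozenset(c) for c in
          chain.from_iterable(combinations(S, r) for r in range(1, len(S)))]

nu = {A: (0.2 if len(A) == 1 else 0.5) for A in events}  # a toy capacity

def in_core(P):
    return all(sum(P[s] for s in A) >= nu[A] - 1e-9 for A in events)

# Brute-force grid search over the probability simplex.
grid = [i / 20 for i in range(21)]
core_points = [{"s1": a, "s2": b, "s3": 1 - a - b}
               for a in grid for b in grid if a + b <= 1 + 1e-9]
core_points = [P for P in core_points if in_core(P)]

assert core_points                     # nonempty core: nu is balanced
assert in_core({s: 1 / 3 for s in S})  # e.g. the uniform measure dominates nu
```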
Corollary 10.1. Suppose that ≽ is a CEU preference relation, represented by capacity ν. Then C(ν) = M(≽). In particular, ≽ is ambiguity averse if and only if C(ν) ≠ Ø.

Thus, the core of an ambiguity averse capacity is equal to the set of its benchmark measures, and the ambiguity averse capacities are those with a nonempty core, called "balanced." A classical result (see, e.g. Kannai, 1992) thus provides an internal characterization of ambiguity aversion in the CEU case: Letting 1_A denote the characteristic function of A ∈ Σ, a capacity ν reflects ambiguity aversion if and only if for all λ1, …, λn ≥ 0 and all A1, …, An ∈ Σ such that Σ_{i=1}^n λi 1_{Ai} ≤ 1_S, we have Σ_{i=1}^n λi ν(Ai) ≤ 1. As convex capacities are balanced, but not conversely, the corollary motivates our claim that convexity does not characterize our notion of ambiguity aversion. This point is illustrated by Example 10.4 below, which presents a capacity that intuitively reflects ambiguity aversion but is not convex. On the other hand, given a MEU preference relation ≽ with set of priors C, it is shown that D(≽) = C. Thus, Theorem 10.2 implies that any MEU preference is ambiguity averse (as is intuitive) and, more interestingly, that the set C can be interpreted as the set of the benchmark measures for ≽.

Corollary 10.2. Suppose that ≽ is a MEU preference relation, represented by the set of probabilities C. Then C = M(≽), so that ≽ is ambiguity averse.

As to ambiguity love, reversing the proof of Theorem 10.2 shows that for any biseparable preference, ambiguity love is characterized by nonemptiness of the set

E(≽) ≡ { P ∈ ∆ : ∫_S u(f(s)) P(ds) ≤ V(f) for all f ∈ F }.
In particular, a CEU preference with capacity ν is ambiguity loving if and only if the set of probabilities dominated by ν is nonempty. As for MEU preferences: None is ambiguity loving. Conversely, any "maximax" EU preference is ambiguity loving, with E(≽) = C. Finally, we look at ambiguity neutrality. Since we started with an informal intuition of SEU preferences as reflecting neutrality to ambiguity, an important consistency check on our analysis is to verify that they are ambiguity neutral in the formal sense. This is the case:

Proposition 10.3. Let ≽ be a biseparable preference. Then ≽ is ambiguity neutral if and only if it is a SEU preference relation.

10.4.2. Comparative ambiguity aversion

We conclude the section with the characterization of comparative ambiguity aversion. The general result on comparative ambiguity, an immediate consequence
of Theorem 10.2, is stated as follows (where ρ1 and ρ2 represent the willingness to bet of ≽1 and ≽2 respectively):

Proposition 10.4. Let ≽1 and ≽2 be two biseparable preferences. If ≽2 is more ambiguity averse than ≽1, then ρ1 ≥ ρ2, D(≽1) ⊆ D(≽2), E(≽1) ⊇ E(≽2) and u1 ≈ u2.

Thus, relative ambiguity implies containment of the sets D(≽) and E(≽) (clearly in opposite directions), and dominance of the willingness to bet ρ. Of course, the proposition lacks a converse, and thus it does not offer a full characterization. As we argue below, biseparable preferences seem to have too little structure for obtaining a general characterization result. Things are different if we restrict our attention to specific models. For instance, the next result characterizes comparative ambiguity for the CEU and MEU models:

Theorem 10.3. Let ≽1 and ≽2 be biseparable preferences, with canonical utilities u1 and u2 respectively. (i) Suppose that ≽1 and ≽2 are CEU, with respective capacities ν1 and ν2. Then ≽2 is more ambiguity averse than ≽1 if and only if ν1 ≥ ν2 and u1 ≈ u2. (ii) Suppose that ≽1 is MEU, with set of probabilities C1. Then ≽2 is more ambiguity averse than ≽1 if and only if C1 = D(≽1) ⊆ D(≽2) and u1 ≈ u2.

Observe that part (ii) of the theorem does more than characterize comparative ambiguity for MEU preferences, as it applies to any biseparable ≽2. For instance, it is immediate to notice that one can characterize absolute ambiguity aversion using that result and the fact that if ≽1 is a SEU preference relation with beliefs P, then C1 = {P}. Also, a result symmetric to (ii) holds: If ≽2 is "maximax" EU, it is more ambiguity averse than ≽1 iff C2 = E(≽2) ⊆ E(≽1).

Remark 10.2. Theorem 10.3 can be used to explain the apparent incongruence between the characterization of comparative risk aversion in SEU (in the sense of Yaari, 1969) and that of comparative ambiguity aversion in CEU: Convexity of ν seems to be the natural counterpart of concavity of u, but it is not.
This is due to the different uniqueness properties of utility functions and capacities. A SEU ≽2 is more risk averse than a SEU ≽1 iff for every common normalization of the utilities we have u2(x) ≥ u1(x) inside the interval of normalization. Since any normalization is allowed, u2 must then be a concave transformation of u1. In the case of capacities only one normalization is allowed, so we only have ν1 ≥ ν2. It is not difficult to show that the necessary conditions of Proposition 10.4 are not sufficient if taken one by one. For instance, there are pairs of MEU (resp. CEU) preferences ≽1 and ≽2 for which ρ1 ≥ ρ2 (resp. C(ν1) = D(≽1) ⊆ D(≽2) = C(ν2)) holds but ≽2 is not more ambiguity averse than ≽1.
Example 10.2. Let S = {s1, s2, s3} and Σ the power set of S. Consider the probabilities P, Q and R defined by P = [1/2, 0, 1/2], Q = [0, 1, 0] and R = [1/2, 1/2, 0]. Let C1 and C2 respectively be the closed convex hulls of {P, Q} and {P, Q, R}. Then ρ1 = P̲1 = P̲2 = ρ2, but C1 ⊊ C2, and indeed by Theorem 10.3 the MEU preference ≽2 inducing C2 is more ambiguity averse than the MEU preference ≽1 inducing C1. Consider next a capacity ν such that ν(A) = 1/3 for every A ∉ {Ø, S}, and a probability P equal to 1/3 on each singleton. Then C(ν) = {P}, so that ν is balanced but not exact (for instance, P({s1, s2}) = 2/3 > 1/3 = ν({s1, s2})). We have C(ν) ⊆ C(P) but ν ≱ P, and by Theorem 10.3 the CEU preference inducing P is not more ambiguity averse than that inducing ν. In contrast, P is exact, and we have both C(P) ⊆ C(ν) and P ≥ ν.

This example illustrates two conceptual observations. The first (anticipated in Subsection 10.4.1) is that nonemptiness of the core of ρ is not sufficient for absolute ambiguity aversion: A probability can dominate ρ without being a benchmark measure for ≽. Unsurprisingly, in general the capacity ρ does not completely describe the DM's ambiguity attitude. The second observation is that, while D(≽) does characterize the DM's absolute ambiguity aversion, it is also an incomplete description of the DM's ambiguity attitude: There can be preferences ≽1 and ≽2 strictly ranked by comparative ambiguity even though D(≽1) = D(≽2). To better appreciate the difficulty of obtaining a general sufficiency result for biseparable preferences, we now present an example in which all the necessary conditions hold but the comparative ranking does not obtain.

Example 10.3. For a general S and Σ (but see the restriction on P below), consider two preference relations ≽1 and ≽2 which behave according to example (iii) of biseparable preference in Section 10.2.
Both have identical P and u (which ranges in a nondegenerate interval of R), with the following restriction on P: There are at least three disjoint events A1, A2 and A3 in Σ such that P(Ai) > 0 for i = 1, 2, 3 (otherwise both preferences are indistinguishable from SEU preferences with utility u and beliefs P). Their β parameters are different; in particular, β2 > β1 > 0. Clearly ρ1 = ρ2 = P and u1 ≈ u2. It is also immediate to verify that, under the assumption on P, D(≽1) = D(≽2) = {P} and E(≽1) = E(≽2) = Ø, so that both preferences are (strictly) ambiguity averse. However, ≽1 is not more ambiguity averse than ≽2 (nor are ≽1 and ≽2 equal, which would follow from two applications of the converse). Indeed, the parameter β measures comparative ambiguity for these preferences, so that ≽2 is more ambiguity averse than ≽1.
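The first claim in Example 10.2 above can be verified numerically. The sketch below computes the lower envelopes of C1 and C2 on every event (for each event the minimum over a convex hull is attained at an extreme point) and checks that they coincide even though C1 is strictly contained in C2:

```python
from itertools import chain, combinations

# Numerical check of the first part of Example 10.2: the lower envelopes of
# C1 = co{P, Q} and C2 = co{P, Q, R} coincide on every event, although C1 is
# a strict subset of C2.
S = ["s1", "s2", "s3"]
P = {"s1": 0.5, "s2": 0.0, "s3": 0.5}
Q = {"s1": 0.0, "s2": 1.0, "s3": 0.0}
R = {"s1": 0.5, "s2": 0.5, "s3": 0.0}
events = [frozenset(c) for c in
          chain.from_iterable(combinations(S, r) for r in range(len(S) + 1))]

def lower_envelope(extreme_points):
    # For each event, the min over the convex hull is attained at an extreme point.
    return {A: min(sum(p[s] for s in A) for p in extreme_points) for A in events}

rho1 = lower_envelope([P, Q])
rho2 = lower_envelope([P, Q, R])
assert rho1 == rho2  # identical willingness to bet: rho1 = rho2

# Yet R is not a mixture lam*P + (1 - lam)*Q: lam/2 = 1/2 forces lam = 1,
# which contradicts 1 - lam = 1/2. Hence C1 is strictly contained in C2.
assert not any(all(abs(lam * P[s] + (1 - lam) * Q[s] - R[s]) < 1e-9 for s in S)
               for lam in (i / 1000 for i in range(1001)))
```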
10.5. Unambiguous acts and events

Let ≽ be an ambiguity averse or loving preference relation. Even though ≽ has a strict ambiguity attitude, it may nevertheless behave in an ambiguity neutral fashion with respect to some subclass of acts and events,
that we may like to consider "unambiguous." The purpose of this section is to identify the class of the unambiguous acts and the related class of unambiguous events, and to present a characterization of the latter for biseparable (in particular CEU and MEU) preference relations. We henceforth focus on ambiguity averse preference relations, but it is easy to see that all the results in this section can be shown for ambiguity loving preferences. A more extensive discussion of the behavioral definition of ambiguity for events and acts is found in Ghirardato and Marinacci, 2000b. In view of our results so far, the natural approach in defining the class of unambiguous events of a preference relation ≽ is to fix a benchmark ≽′ ∈ R(≽), and to consider the subset of all the acts in F over which ≽ is as ambiguity averse as ≽′. Intuitively, ambiguity is a property that the DM attaches to partitions of events, so that nonconstant acts which generate the same partition should be consistently deemed either both ambiguous or both unambiguous. Hence, we consider as "truly" unambiguous only the acts which belong to the set defined next.

Definition 10.7. Given a preference relation ≽ and ≽′ ∈ R(≽), the set of ≽′-unambiguous acts, denoted H≽′, is the largest subset of F satisfying the following two conditions:6 (A) For every x ∈ X and every f ∈ H≽′, ≽ and ≽′ agree on the ranking of f and x. (B) For every f ∈ H≽′ and every g ∈ F, if {{s : g(s) ∼ x} : x ∈ X} ⊆ {{s : f(s) ∼ x} : x ∈ X}, then g ∈ H≽′.

Given a preference relation ≽, for any f ∈ F denote by Λf the collection of all the "upper pre-image" sets of f, that is,

Λf = {{s : f(s) ≽ x} : x ∈ X}.  (10.13)

Since any benchmark ≽′ ∈ R(≽) is ordinally equivalent to ≽, for any act f ∈ F the upper pre-images of f with respect to ≽ and ≽′ coincide: for all x ∈ X, {s : f(s) ≽ x} = {s : f(s) ≽′ x}. The set Λ≽′ ⊆ Σ of the ≽′-unambiguous events is thus naturally defined to be the collection of all sets of upper pre-images of the acts in H≽′. That is,

Λ≽′ ≡ ∪_{f ∈ H≽′} Λf.
Proposition 10.5. Let ≽ be an ambiguity averse biseparable preference with willingness to bet ρ. Then for every ≽′ ∈ R(≽), the set Λ≽′ satisfies:

Λ≽′ = {A ∈ Σ : ρ(A) + ρ(A^c) = 1}.  (10.14)

It immediately follows from the proposition that the choice of the specific benchmark ≽′ does not change the resulting set of events. In light of this, we henceforth call Λ = Λ≽′ the set of unambiguous events for ≽. The consequences of the proposition for the CEU and MEU models are clear: Just substitute ν or P̲ for ρ. In particular, when ≽ is a MEU preference with a set of probabilities C, it can be further shown that Λ is the set of events on which all the probabilities in C agree: Λ = {A ∈ Σ : ρ(A) = P(A) for all P ∈ C}. It is also interesting to observe that Λ is in general not an algebra. This is intuitive, as the intersection of unambiguous events could be ambiguous.7 As to the set of unambiguous acts H≽′, it can also be seen to be independent of the choice of benchmark ≽′. In general, the only way to ascertain which acts are unambiguous is to construct the set H≽′. However, for MEU preferences and for CEU preferences whose capacity is exact (the lower envelope of its core), H≽′ is the set of all the acts which are measurable with respect to the events in Λ. Therefore, in these cases Λ characterizes the set of unambiguous acts as well. (All these results are proved in Ghirardato and Marinacci, 2000b.)
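Proposition 10.5 is easy to illustrate numerically. In the sketch below, all numbers are invented: the willingness to bet ρ is complement-additive exactly on the partition {Ø, {s1}, {s2, s3}, S}, and those are precisely the events that the test (10.14) recovers as unambiguous:

```python
from itertools import chain, combinations

# Invented example for Proposition 10.5: the unambiguous events are exactly
# those on which rho is complement-additive.
S = frozenset({"s1", "s2", "s3"})
events = [frozenset(c) for c in
          chain.from_iterable(combinations(sorted(S), r) for r in range(len(S) + 1))]

rho = {frozenset(): 0.0, S: 1.0,
       frozenset({"s1"}): 0.3, frozenset({"s2", "s3"}): 0.7,  # additive pair
       frozenset({"s2"}): 0.2, frozenset({"s1", "s3"}): 0.6,
       frozenset({"s3"}): 0.2, frozenset({"s1", "s2"}): 0.6}

def unambiguous(rho):
    """Equation (10.14): events A with rho(A) + rho(A^c) = 1."""
    return {A for A in events if abs(rho[A] + rho[S - A] - 1.0) < 1e-9}

lam = unambiguous(rho)
# Lambda is the partition {empty set, {s1}, {s2, s3}, S}: closed under
# complements, though in general it need not be an algebra.
assert lam == {frozenset(), frozenset({"s1"}), frozenset({"s2", "s3"}), S}
```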
10.6. Back to Ellsberg

We now illustrate our results using the classical Ellsberg urn. The urn contains 90 balls of three colors: red, blue and yellow. The DM knows that there are 30 red balls and that the other 60 balls are either blue or yellow. However, he does not know their relative proportion. The state space for an extraction from the urn is S = {B, R, Y}. Given the nature of his information, it is natural to assume that the DM's preference relation ≽ will be such that its set of unambiguous events satisfies Λ ⊇ {Ø, {R}, {B, Y}, S}. In particular, assume that the DM's preference relation is CEU and induces the capacity ν. To reflect the fact that {R} and {B, Y} form an unambiguous partition, we know from Section 10.5 that if the DM is ambiguity averse (or loving) ν must satisfy

ν(R) + ν(B, Y) = 1.  (10.15)
Also, because of the symmetry of the information that the DM is given, it is natural to assume that

ν(B) = ν(Y)  and  ν(B, R) = ν(R, Y).  (10.16)
We first show that, if the ambiguity restriction (10.15) is imposed, ambiguity aversion is not compatible with the following beliefs, which induce behavior that
would on intuitive grounds be considered “ambiguity loving”: ν(R) < ν(B) = ν(Y ); ν(B, Y ) < ν(B, R) = ν(R, Y ).
(10.17)
Proposition 10.6. No ambiguity averse CEU preference relation such that its set of unambiguous events contains {{R}, {B, Y }} can agree with the ranking (10.17). In his paper on ambiguity aversion, Epstein (1999) also discusses the Ellsberg urn, and he presents a convex capacity compatible with ambiguity loving in his sense (see Subsection 10.7.3 for a brief review), which satisfies the conditions in (10.17). This is the capacity ν1 defined by 1 , ν1 (B, Y ) = 13 , ν1 (R) = 12 1 ν1 (B) = ν1 (Y ) = 6 , ν1 (B, R) = ν1 (R, Y ) = 12 .
He thus concludes that convexity of beliefs does not imply ambiguity aversion for CEU preferences (nor is it implied, in his definition). We know from Corollary 10.1 that convexity implies ambiguity aversion in our sense. Proposition 10.6 helps clarify why this example does not conflict with the intuition developed earlier: In fact, ν1 does embody ambiguity aversion in our sense, but it does not reflect the usual presumption that {R} and {B, Y} are seen as unambiguous events. If it did, it would have to satisfy (10.15), which is not the case (it cannot be, since convex capacities are balanced). For us, the DM with beliefs ν1 does not perceive {R} and {B, Y} as unambiguous. Of course, it is then not clear in which sense the conditions in (10.17) should "intuitively" embody ambiguity loving behavior. Going back to the example, we would say that the DM's preferences intuitively reflect ambiguity aversion if the reverse inequalities held:

ν(R) ≥ ν(B) = ν(Y); ν(B, Y) ≥ ν(B, R) = ν(R, Y).  (10.18)
We now show that the notion of ambiguity aversion proposed earlier characterizes this intuitive ranking when, besides the obvious symmetry restrictions in (10.16), we strengthen the requirement in (10.15) in the following natural way:

ν(R) = 1/3 and ν(B, Y) = 2/3.  (10.19)
Proposition 10.7. Let ≿ be a CEU preference relation such that its representing capacity ν satisfies the equalities (10.16) and (10.19). Then ≿ is ambiguity averse if and only if ν agrees with the ranking (10.18).

In closing our discussion of Ellsberg's problem, we provide further backing for our belief that convexity is not necessary for ambiguity aversion. Here is a capacity which is not convex, and still makes the typical Ellsberg choices.
230
Paolo Ghirardato and Massimo Marinacci
Example 10.4. Consider the capacity ν2 defined by (10.19) and

ν2(B) = ν2(Y) = 7/24, ν2(B, R) = ν2(R, Y) = 1/2.

This capacity satisfies (10.18), so that it reflects ambiguity aversion both formally and intuitively, but it is not superadditive, let alone convex.
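The claims of Example 10.4 can be verified mechanically, since for a finite state space superadditivity and convexity are finite systems of inequalities. A small sketch, using exact rational arithmetic:

```python
from fractions import Fraction as Fr
from itertools import chain, combinations

F = frozenset
S = F(["B", "R", "Y"])

def subsets(s):
    xs = list(s)
    return [F(c) for c in
            chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))]

# nu2 as in Example 10.4: (10.19) plus the stated binary and singleton values.
nu2 = {F(): Fr(0), F("R"): Fr(1, 3), F("B"): Fr(7, 24), F("Y"): Fr(7, 24),
       F("BY"): Fr(2, 3), F("BR"): Fr(1, 2), F("RY"): Fr(1, 2), S: Fr(1)}

def is_superadditive(nu):
    # nu(A ∪ B) >= nu(A) + nu(B) for all disjoint A, B
    return all(nu[A | B] >= nu[A] + nu[B]
               for A in subsets(S) for B in subsets(S) if not (A & B))

def is_convex(nu):
    # nu(A ∪ B) + nu(A ∩ B) >= nu(A) + nu(B) for all A, B
    return all(nu[A | B] + nu[A & B] >= nu[A] + nu[B]
               for A in subsets(S) for B in subsets(S))

# (10.18): nu(R) >= nu(B) = nu(Y) and nu(B,Y) >= nu(B,R) = nu(R,Y)
satisfies_10_18 = (nu2[F("R")] >= nu2[F("B")] == nu2[F("Y")]
                   and nu2[F("BY")] >= nu2[F("BR")] == nu2[F("RY")])
# nu2 meets (10.18) but fails superadditivity:
# nu2(B) + nu2(R) = 7/24 + 8/24 = 15/24 > 1/2 = nu2(B, R).
```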
10.7. Discussion

In this section we discuss some of the choices we have made in the previous sections. First we briefly discuss how the comparative ambiguity ranking can be extended to preferences with different cardinal risk attitude. Then we discuss in more detail how the unambiguous acts described in Section 10.5 can be used in the comparative ranking, and why we chose SEU preferences as benchmarks.

10.7.1. Comparative ambiguity and equality of cardinal risk attitude

As we observed earlier, our comparative ambiguity aversion notion cannot compare biseparable preferences with different canonical utility indices. Of course, the characterization results of Section 10.4 can be used to qualitatively compare two preferences by ambiguity: For instance, we can look at two CEU preferences and compare their willingness to bet, or we can use utility functions to compare two SEU preferences by risk aversion, even if they do not have the same beliefs. However, when dealing with biseparable preferences, it is easy to apply the intuition of our comparative ranking to compare preferences which do not have the same canonical utility. This requires eliciting the canonical utility indices first, and then using acts and constants that are "utility equivalents" in Equations (10.7) and (10.8).8 The ranking thus obtained is very general (it does not even entail ordinal equivalence), but it yields mutatis mutandis the same characterization results that we obtained with the more restrictive one. For instance: ≿ is ambiguity averse iff D(≿) ≠ ∅, and CEU (MEU) preference ≿2 is more ambiguity averse than CEU (MEU) preference ≿1 iff ν1 ≥ ν2 (C1 ⊆ C2), even though in general u1 and u2 are not cardinally equivalent. Nonetheless, this ranking requires the full elicitation of the DMs' canonical utility indices, and is thus operationally more complex than that in Definition 10.5.

10.7.2. Using unambiguous acts in the comparative ranking

One of the intuitive assumptions that our analysis builds on is that constant acts are primitively "unambiguous": That is, we assume that every DM perceives constants as unambiguous. No other acts are "unambiguous" in this primitive sense. However, one could argue that it is natural to use in the comparative ranking also those acts which are revealed to be deemed unambiguous by both DMs, even if they are not constant.
Suppose that ≿ is an ambiguity averse biseparable preference, and let H^ua (Σ^ua) be its set of unambiguous acts (events), as defined in Section 10.5. It is possible to show (see Ghirardato and Marinacci, 2000b) that for every ⩾ ∈ R(≿), every h ∈ H^ua, and every f ∈ F, we have

h ⩾ f ⇒ h ≿ f and h > f ⇒ h ≻ f.  (10.20)
That is, all benchmarks according to Definition 10.5 satisfy the stronger comparative ranking suggested above. Conversely, it is obvious that if ≿ and a SEU preference ⩾ are cardinally symmetric and satisfy (10.20), they satisfy Definition 10.5. Thus, modifying Definition 10.5 to use (10.20) in part (A) does not change the set of the ambiguity averse preferences.

10.7.3. A more general benchmark

We chose SEU maximization as the benchmark representing ambiguity neutrality. While few would disagree that SEU preferences are "ambiguity neutral" (in a primitive, nonformal sense), some readers may find that the result of Proposition 10.3, that SEU maximization characterizes ambiguity neutrality, does not agree with their intuition of what constitutes ambiguity neutral behavior. In particular, they might feel that we should also classify as ambiguity neutral any non-SEU preference whose likelihood relation can still be represented by a probability measure. This would clearly be the case if we let such preferences be benchmarks for our comparative ambiguity notion. Here we explain why we have not followed that route, and the consequences of this choice for the interpretation of our notions. The non-SEU preferences in question are those that are probabilistically sophisticated (PS) in the sense of Machina and Schmeidler (1992). For example, consider a CEU preference whose willingness to bet is ρ = g(P) for some probability measure P and "distortion" function g; that is, an increasing g : [0, 1] → [0, 1] such that g(0) = 0 and g(1) = 1. Such a preference is PS, since its ranking of bets (likelihood relation) is represented by the probability P, but it is not SEU if g is different from the identity function. According to the point of view suggested above, such a preference is "ambiguity neutral"; it should thus be used as a benchmark in characterizing ambiguity aversion.
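The distortion construction is easy to see numerically. A sketch under assumed values of our own (a uniform P on three states and the distortion g(t) = t²): the set function ρ = g(P) ranks events exactly as P does, so the induced likelihood relation is probabilistic, yet ρ itself is not additive.

```python
from itertools import chain, combinations

F = frozenset
S = ["s1", "s2", "s3"]
P = {s: 1/3 for s in S}  # assumed uniform probability

def events(states):
    return [F(c) for c in
            chain.from_iterable(combinations(states, r) for r in range(len(states) + 1))]

g = lambda t: t ** 2     # a convex "distortion": g(0) = 0, g(1) = 1, increasing
prob = {A: sum(P[s] for s in A) for A in events(S)}
rho = {A: g(prob[A]) for A in events(S)}  # willingness to bet rho = g(P)

# Ordinal equivalence: rho ranks any two events exactly as P does...
same_ranking = all((rho[A] > rho[B]) == (prob[A] > prob[B])
                   for A in events(S) for B in events(S))
# ...but rho is not additive: rho({s1}) + rho({s2}) != rho({s1, s2}).
```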
Moreover, if we used PS preferences as benchmarks it might be possible to avoid attributing to ambiguity aversion the effects of probabilistic risk aversion. However, go back to the ambiguous urn of Example 10.1 and consider the following:

Example 10.1. (continued) In the framework of Example 10.1, consider a third DM with CEU preferences ≿3, with canonical utility u(x) = x and willingness to bet defined by ρ3(B) = 1/4 and ρ3(R) = 1/4.
It is immediate to verify that according to Definition 10.5, DM 3 is more ambiguity averse than DM 1 (who is SEU), so that he is ambiguity averse in our sense.
That seems quite natural, since he is willing to invest less in bets on the ball extractions. With PS benchmarks, we conclude that both DMs are ambiguity neutral, since both willingness-to-bet functions are ordinally equivalent to the probability ρ1 (ρ3 = g(ρ1) for any distortion g such that g(1/2) = 1/4), so that both are PS. Hence, DM 3's behavior is only due to his probabilistic risk aversion. Yet, it seems that the fact that DM 3 is only willing to bet 1/4 utils on any color may at least in part be due to the ambiguity of the urn and his possible ambiguity aversion. This example is not the only case in which using PS benchmarks yields counterintuitive conclusions. When the state space is finite, if we use PS preferences as benchmarks we find that almost every CEU preference inducing a strictly positive ρ on a finite state space is both ambiguity averse and loving. Thus, a large set of preferences are shown to be ambiguity neutral, including, as the following example illustrates, many preferences which are not PS.

Example 10.5. Suppose that two DMs are faced with the following decision problem. There are two urns, both containing 100 balls, either red or black. The DMs are told that Urn I contains at least 40 balls of each color, while Urn II contains at least 10 balls of each color. One ball will be extracted from each urn. Thus, the state space is S = {Rr, Rb, Br, Bb}, where the upper (lower) case letter stands for the color of the ball extracted from Urn I (II). Suppose that both DMs have CEU preferences ≿1 and ≿2, with respective willingness to bet ρ1 and ρ2. Using obvious notation, suppose that ρ1(b) = ρ1(r) = 0.1 and ρ1(B) = ρ1(R) = 0.4, that ρ1(s) = 0.04 for each singleton s, and that for every other event ρ1 is obtained by additivity. According to Definition 10.5, DM 1 is strictly ambiguity averse. In contrast, with PS benchmarks the result mentioned above shows that he is ambiguity neutral.
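The claim that a preference like DM 1's is not PS can be checked directly from the values in Example 10.5. A small sketch (the reasoning in the comments is ours, spelled out from the example's data):

```python
F = frozenset
# Example 10.5 state space: ball from Urn I (R/B) and ball from Urn II (r/b).
states = ["Rr", "Rb", "Br", "Bb"]

rho1 = {F([s]): 0.04 for s in states}   # each singleton: 0.04
rho1[F(["Rr", "Rb"])] = 0.4             # R: red ball from Urn I
rho1[F(["Br", "Bb"])] = 0.4             # B: black ball from Urn I
rho1[F(["Rr", "Br"])] = 0.1             # r: red ball from Urn II
rho1[F(["Rb", "Bb"])] = 0.1             # b: black ball from Urn II

# All singletons are equally likely under rho1, so any probability P that
# represents the same likelihood relation must be uniform. But a uniform P
# gives P(R) = P(r) = 1/2, while rho1 ranks R strictly above r (0.4 > 0.1).
# Hence rho1 is not ordinally equivalent to any probability: the preference
# is not probabilistically sophisticated.
singletons_equal = len({rho1[F([s])] for s in states}) == 1
R, r = F(["Rr", "Rb"]), F(["Rr", "Br"])
ranks_R_above_r = rho1[R] > rho1[r]
```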
Let ρ2 be as follows: ρ2(b) = ρ2(r) = 0.9 and ρ2(B) = ρ2(R) = 0.6, ρ2(s) = 0.54 for each singleton s, ρ2(A) = 0.92 for each A ∈ {Rr ∪ Bb, Rb ∪ Br}, and ρ2(A) = 0.95 for each ternary set. According to Corollary 10.1, DM 2 is ambiguity loving, but if we use PS benchmarks we conclude that she is ambiguity neutral. Both conclusions go against our intuition. Moreover, since neither ρ1 nor ρ2 is ordinally equivalent to a probability, ≿1 and ≿2 are not PS. The foregoing discussion shows some of the difficulties that may arise if we use PS, rather than SEU, preferences as benchmarks with our comparative ambiguity aversion notion: We end up attributing too much explanatory power to probabilistic risk aversion. Instead, with SEU benchmarks we overemphasize the role of ambiguity aversion. Is it possible to remove probabilistic risk attitude from the picture, as we did for cardinal risk attitude?9

10.7.3.1. Removing probabilistic risk aversion

Suppose that there is a subset E of acts which are universally accepted as "unambiguous," in the sense that we are sure that a DM's choices among these acts are unaffected by his ambiguity attitude. Then, if E (and the
associated set of "unambiguous" events, denoted Λ) is sufficiently rich, we can discriminate between probabilistic risk and ambiguity aversion. For instance, modify Example 10.1 by assuming the availability of an "unambiguous" randomizing device, so that each state describes the result of the device as well. Now, find a set A of results of the device (obviously, here Λ is the family of all such sets) which is as likely as R(ed), and then check if B(lack) is as likely as Ac. If it is, the DM behaves identically when faced with (equally likely) ambiguous and unambiguous events, so that all the nonadditivity of ρ3 on {B, R} must be due to his probabilistic risk aversion. His preferences are also PS on the extended problem. If it is not, then DM 3's behavior is affected by ambiguity, and his preferences are not PS on the extended problem. The point is that in the presence of a sufficiently rich Λ, a DM whose preferences are PS is treating ambiguous and unambiguous events symmetrically, and is hence intuitively ambiguity neutral. Therefore, in such a case we would expect PS preferences to be found ambiguity neutral. This is not the case in the original version of Example 10.1, since a rich set of "unambiguous" events is missing. More generally, consider a biseparable preference ≿ which is not PS overall, but is PS when comparing only unambiguous acts. That is, the DM behaves as if he forms a probability P on the set Λ, and calculates his willingness to bet on these events by means of a distortion function g which only reflects his probabilistic risk attitude. As we did in controlling for cardinal risk attitude, we want to use as benchmarks for ≿ only those PS preferences which have the same probabilistic risk attitude; for example, those biseparable preferences which share g as distortion function.
Interestingly, it turns out that if the set E is rich enough, any PS preference satisfying Equation (10.20) for all h ∈ E has this property. This is exactly the approach followed by Epstein (1999) in his work on ambiguity aversion: He assumes the existence of a suitably rich set Λ of "unambiguous" events,10 defines E as the set of all the Λ-measurable acts, and uses Equation (10.20) with h ∈ E as his comparative ambiguity notion. His chosen benchmarks are PS preferences. This approach attains the objective of "filtering" the effects of probabilistic risk attitude from our absolute ambiguity notion. It thus yields a finer assessment of the DM's ambiguity attitude. However, the foregoing discussion has illustrated that a crucial ingredient of this filtration is the existence of a sufficiently rich set of "unambiguous" acts: If it is too poor (e.g. it contains only the constants, as in Example 10.5), we may use benchmarks whose probabilistic risk attitude is different from the DM's. This may cause Epstein's approach to reach counterintuitive conclusions, as illustrated in the previous examples. The main problem we have with this approach is that we find it undesirable to base our measurement of ambiguity attitude on an exogenous notion of "ambiguity," especially in view of the richness requirement. It seems that in many cases of interest, such as Ellsberg's example, the "obvious" set of "unambiguous" acts does not satisfy such a requirement. Our objective is to develop a notion of ambiguity attitude which is based on the weakest set of primitive requisites (like the two assumptions stated in the Introduction), even though this has a cost in terms of the "purity" of the interpretation of the behavioral feature we measure.
Epstein and Zhang (2001) propose a behavioral foundation for the notion of "ambiguity," so that the existence of a rich set E can be objectively verified, solving the problem mentioned earlier. In Ghirardato and Marinacci (2000b) we present an example which suggests that their behavioral notion can lead to counterintuitive conclusions (in that case, an intuitively ambiguous event is found unambiguous). More generally, we see the following problem with this enterprise: There may be events which are "unambiguous" (resp. "ambiguous") with respect to which the DM nonetheless behaves in an ambiguity nonneutral (resp. neutral) fashion. Consider a DM who listens to a weather forecast stated as a probabilistic judgment. If the DM does not consider the specific source reliable, he might express a willingness to bet which is a distortion of this judgment, while being probabilistically risk neutral. Alternatively, he may find the source reliable, and hence perceive no ambiguity, but be probabilistically risk averse. A preference-based notion of ambiguity must be able to distinguish between these two cases, classifying the relevant events as ambiguous in the first case and unambiguous in the second, and it must do so without using any auxiliary information. Considering moreover that the set of "verifiably unambiguous" events must be rich, we are skeptical that this feat is possible: The problem is that the Savage set-up does not provide us with enough instruments; it is too abstract.

10.7.3.2. Summing up

We have argued that what motivates using PS (rather than SEU) preferences as benchmarks is the objective of discriminating between probabilistic risk aversion and ambiguity attitude. We have shown that this requires a rich set of "verifiably unambiguous" events, and briefly reviewed our doubts about the possibility of providing a behavioral foundation for this "verifiable ambiguity" notion in a general subjective setting without extraneous devices.
In contrast, the analysis in this chapter shows that there are no such problems in using SEU benchmarks to identify an “extended” notion of ambiguity attitude, which can be disentangled from cardinal risk attitude using only behavioral data and no extraneous devices. Though it does not distinguish “real” ambiguity and probabilistic risk attitudes, we think that this “extended” ambiguity attitude is worthwhile, especially because of its wider applicability.
Appendix A: Capacities and Choquet integrals

A set-function ν on (S, Σ) is called a capacity if it is monotone and normalized. That is: if A, B ∈ Σ and A ⊆ B, then ν(A) ≤ ν(B); ν(∅) = 0 and ν(S) = 1. A capacity is called a probability measure if it is finitely additive: ν(A ∪ B) = ν(A) + ν(B) for all disjoint A, B ∈ Σ. It is called convex if for every pair A, B ∈ Σ, we have ν(A ∪ B) ≥ ν(A) + ν(B) − ν(A ∩ B). The core of a capacity ν is the (possibly empty) set C(ν) of all the probability measures on (S, Σ) which dominate it, that is, C(ν) ≡ {P : P is a probability measure on (S, Σ) and P(A) ≥ ν(A) for all A ∈ Σ}.
Following the usage in Cooperative Game Theory (e.g., Kannai, 1992), all capacities with nonempty core are called balanced. A capacity ν is called exact if it is balanced and equal to the lower envelope of its core (i.e., for all A ∈ Σ, ν(A) = min_{P∈C(ν)} P(A)). Convexity implies exactness, which in turn implies balancedness, but the converse implications are all false. The notion of integral used for capacities is the Choquet integral, due to Choquet (1953). For a given Σ-measurable function ϕ : S → R, the Choquet integral of ϕ with respect to a capacity ν is defined as:
∫_S ϕ dν = ∫_0^∞ ν({s ∈ S : ϕ(s) ≥ α}) dα + ∫_{−∞}^0 [ν({s ∈ S : ϕ(s) ≥ α}) − 1] dα,  (10.A.1)

where the r.h.s. is a Riemann integral (which is well defined because ν is monotone). When ν is additive, (10.A.1) becomes a standard (additive) integral. In general it can be seen to be monotonic, positively homogeneous, and comonotonic additive: If ϕ, ψ : S → R are non-negative and comonotonic, then ∫(ϕ + ψ) dν = ∫ϕ dν + ∫ψ dν. Two functions ϕ, ψ : S → R are called comonotonic if there are no s, s′ ∈ S such that ϕ(s) > ϕ(s′) and ψ(s) < ψ(s′).
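On a finite state space, (10.A.1) collapses to a finite sum over a decreasing rearrangement of ϕ. A minimal sketch; the two-state example values are our own illustration:

```python
def choquet(phi, nu, states):
    """Choquet integral of phi (dict state -> value) w.r.t. capacity nu
    (dict frozenset -> value), via the decreasing-rearrangement formula."""
    order = sorted(states, key=lambda s: phi[s], reverse=True)
    total, prev_nu = 0.0, 0.0
    upper = set()
    for s in order:
        upper.add(s)
        v = nu[frozenset(upper)]
        total += phi[s] * (v - prev_nu)  # weight = nu-increment of the upper set
        prev_nu = v
    return total

F = frozenset
states = ["a", "b"]
# Additive case: the Choquet integral is the ordinary expectation.
P = {F(): 0.0, F("a"): 0.5, F("b"): 0.5, F("ab"): 1.0}
phi = {"a": 2.0, "b": 4.0}
assert abs(choquet(phi, P, states) - 3.0) < 1e-12
# Non-additive (convex) capacity: the integral tilts toward low outcomes.
nu = {F(): 0.0, F("a"): 0.3, F("b"): 0.3, F("ab"): 1.0}
# choquet(phi, nu, states) = 4*0.3 + 2*0.7 = 2.6 < 3.0
```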
Appendix B: Cardinal symmetry and biseparable preferences

In this Appendix, we prove Proposition 10.1. In order to make the proof as clear as possible, we first explain the notion of "standard sequence," and then show how the latter can be used to prove the proposition.

10.B.1. Standard sequences

Consider a DM whose preferences ≿ have a canonical representation V, with canonical utility index u, willingness to bet ρ, and an essential event A ∈ Σ. Fix a pair of consequences v^* ≻ v_*, and consider x^0 ∈ X such that x^0 ≿ v^*. If there is an x ∈ X such that x A v_* ≻ x^0 A v^*, then by (10.3) and the convexity of the range of u, there is x^1 ∈ X such that

x^1 A v_* ∼ x^0 A v^*.  (10.B.1)

It is easy to verify that x^1 ≻ x^0: If x^0 ≿ x^1 held, by monotonicity and biseparability we would have x^0 A v^* ≿ x^1 A v^* and x^1 A v^* ≻ x^1 A v_*. This yields x^0 A v^* ≻ x^1 A v_*, a contradiction. Assuming that there is an x ∈ X such that x A v_* ≻ x^1 A v^*, as above we can find x^2 ∈ X such that

x^2 A v_* ∼ x^1 A v^*.  (10.B.2)
Again, x^2 ≻ x^1. We can use the representation V to check that the equivalences in (10.B.1) and (10.B.2) translate to

u(x^1) − u(x^0) = [(1 − ρ(A))/ρ(A)] (u(v^*) − u(v_*)) = u(x^2) − u(x^1),  (10.B.3)

that is, the three points x^0, x^1, x^2 are equidistant in u. Proceeding in this fashion we can construct a sequence of points {x^0, x^1, x^2, . . .} all evenly spaced in utility. Such a sequence we call an increasing standard sequence with base x^0, carrier A, and mesh (v_*, v^*). (Notice that the distance in utility between the points in the sequence is proportional to the distance in utility between v_* and v^*, which is used as the "measuring rod.") Analogously, we can construct a decreasing standard sequence with base x^0, carrier A, and mesh (v_*, v^*), where v_* ≿ x^0. This will be a sequence starting again from x^0, but now moving in the direction of decreasing utility: For every n ≥ 0, v^* A x^{n+1} ∼ v_* A x^n. Henceforth, we call a standard sequence w.r.t. (x^0, A) any sequence {x̄^0, x̄^1, x̄^2, . . .} such that x̄^0 = x^0, and there is a pair of points (above or below x^0) which provides the mesh for obtaining {x̄^0, x̄^1, x̄^2, . . .} as a decreasing/increasing standard sequence with carrier A. It is simple to see how, having fixed an essential event A and a base x^0 which is non-extremal in the ordering on X (i.e. there are y, z ∈ X such that y ≻ x^0 ≻ z), standard sequences can be used to measure the canonical utility index u of a biseparable preference (extending the scope of the method proposed by Wakker and Deneffe, 1996): One just needs to construct (increasing and decreasing) standard sequences with base x^0 and finer and finer mesh. In what follows we use standard sequences and cardinal symmetry to show that equality of the ui, i = 1, 2, can be verified without eliciting them.

10.B.2. Equality of utilities: Proof of Proposition 10.1

The proof of Proposition 10.1 builds on two lemmas.
The first lemma, whose simple proof we omit, shows the following: If a pair of biseparable preferences are cardinally symmetric, then for a fixed non-extremal x^0 and essential events A1 and A2, the sets of standard sequences (with respect to (x^0, A1) and (x^0, A2), respectively) of the two orderings are "nested" into each other. Stating this lemma requires some terminology and notation: Given a standard sequence {x^n} for preference relation ≿i, we say that a sequence {y^m} ⊆ X is a refinement of {x^n} if it is itself a standard sequence and there is k ∈ N such that y^m = x^n whenever m = kn. Two canonical utility indices are subject to a common normalization if they take identical values on two consequences x, y ∈ X such that x ≻i y for both i. Finally, for the rest of this section: For each i = 1, 2, the carrier of any standard sequence for ≿i is a fixed essential event Ai, and SQ(≿i, x^0) ⊆ X denotes the set of the points belonging to some standard sequence of ≿i with base x^0 and carrier Ai.
Lemma 10.B.1. Suppose that ≿1, ≿2 are as assumed in Proposition 10.1. Fix a non-extremal x^0 ∈ X. If ≿1 and ≿2 are cardinally symmetric, then the following holds: Either every standard sequence for ordering ≿1 is a refinement of a standard sequence for ≿2, or every standard sequence for ordering ≿2 is a refinement of a standard sequence for ≿1. Hence, SQ(≿1, x^0) = SQ(≿2, x^0) ≡ SQ(x^0).

The second lemma shows that, because of cardinal symmetry, the result holds on SQ(x^0):

Lemma 10.B.2. Suppose that ≿1, ≿2 are as assumed in Proposition 10.1. If ≿1 and ≿2 are cardinally symmetric, then for any non-extremal x^0 ∈ X and any common normalization of the two indices, u1(x) = u2(x) for every x ∈ SQ(x^0).

Proof. Fix a non-extremal x^0. Suppose that x belongs to an increasing standard sequence {x^n} for ≿i. Since the relations are cardinally symmetric, by Lemma 10.B.1 it is w.l.o.g. (taking refinements if necessary) to take the sequence to be standard for both orderings. That is, there are v_*, v^*, w_*, w^* ∈ X such that v^* ≻1 v_*, w^* ≻2 w_*, and for n ≥ 0, x^{n+1} A1 v_* ∼1 x^n A1 v^*, and analogously for ≿2 (with w replacing v). Moreover, there is n ≥ 0 such that x = x^n. Choose x^m for some m > n, and take positive affine transformations of the two canonical utility functions so as to obtain u1(x^0) = u2(x^0) = 0 and u1(x^m) = u2(x^m) = 1. All points in the sequence are evenly spaced for both preferences (cf. Equation (10.B.3)). Hence we have u1(x^n) = u2(x^n) = n/m. The case in which x belongs to a decreasing standard sequence is treated symmetrically. Finally, we have the immediate observation that if u1(x) = u2(x) for one common normalization, the equality holds for every common normalization.

Proof of Proposition 10.1. The "if" part follows immediately from the canonical representation. We now prove the "only if." Start by fixing a non-extremal x^0 and adding a constant to both indices, so that u1(x^0) = u2(x^0) = 0.
Suppose that (after this transformation) there is x ∈ X such that u1(x) ≠ u2(x). By relabelling if necessary, assume that u1(x) = α > β = u2(x). There are different cases to consider, depending on where α and β are located. Suppose first that β ≥ 0. Choose v^* ∈ X such that x^0 ≻1 v^* and further transform the utilities so that ū1(v^*) = ū2(v^*) = −1, to obtain ū1(x) = ᾱ > β̄ = ū2(x). Choose ε > 0 such that ᾱ − β̄ > ε. By the connectedness of the range of each ui and Lemma 10.B.1, there are v_*, w_* ∈ X such that (v_*, v^*) and (w_*, v^*) generate the same standard sequence {x^n} and ū1(x^{n+1}) − ū1(x^n) = ū2(x^{n+1}) − ū2(x^n) < ε. So the "length" of the utility interval between consecutive elements of the increasing standard sequence is smaller than the distance between ᾱ and β̄. We also proved
in Lemma 10.B.2 that for each element of the standard sequence we have equality of the utilities (since we imposed a common normalization). Hence there must be n ≥ 0 such that ū1(x^n) = ū2(x^n) = γ ∈ (β̄, ᾱ). We then have

ū1(x^n) < ū1(x), that is, x^n ≺1 x, and ū2(x^n) > ū2(x), that is, x^n ≻2 x,

which contradicts the assumption of ordinal equivalence. The case in which α ≤ 0 is treated symmetrically. If, finally, α > 0 > β then, using an argument similar to the one just presented, one can find x̄ ∈ X such that u1(x̄) = u2(x̄) ∈ (0, α) and obtain a similar contradiction. This shows that u1(x) = u2(x) for every x ∈ X.
Appendix C: Proofs for Sections 10.4–10.6

10.C.1. Section 10.4

Proof of Theorem 10.1. We first state without proof an immediate result:

Lemma 10.C.1. Two preference relations ≿1 and ≿2 satisfying Equations (10.7) and (10.8) are ordinally equivalent.

Given this lemma, if ≿1 and ≿2 have essential events the result follows immediately from Proposition 10.1. If, say, relation ≿i does not have essential events, any ordinal transformation of ui is still a canonical utility. Since the two preferences are ordinally equivalent by the lemma, it is then w.l.o.g. to use uj (j ≠ i) to represent both of them.

Proof of Theorem 10.2. We first prove that D(≿) ⊆ M(≿). Given a canonical representation V of ≿ with canonical utility u, suppose that P ∈ D(≿), and consider the SEU relation ⩾ induced by P and u. We want to show that ≿ is more ambiguity averse than ⩾. Since P ∈ D(≿), ∫ u(f) dP ≥ V(f) for all f ∈ F, so that for every x ∈ X and f ∈ F,

u(x) ≥ ∫_S u(f(s)) P(ds) ⟹ V(x) ≥ V(f),
where the implication follows from the definition u(x) = V(x) for all x ∈ X. This proves that (10.7) holds. Similarly one shows the validity of (10.8). Part (B) of Definition 10.5 is immediate: If ≿ and ⩾ have essential events, then the result follows from Proposition 10.1. Hence ⩾ ∈ R(≿), or in other words P ∈ M(≿). We now prove the opposite inclusion D(≿) ⊇ M(≿). Suppose that P ∈ M(≿). Let ⩾ be the benchmark preference corresponding to P, and let u′ be the canonical utility index of ⩾. Since ⩾ is a benchmark for ≿, we have for every x ∈ X and f ∈ F,

u′(x) ≥ ∫_S u′(f(s)) P(ds) ⟹ u(x) ≥ V(f),  (10.C.1)
and the same with strict inequality. We have to show that P ∈ D(≿). By Theorem 10.1, it is w.l.o.g. to take u = u′. Hence, (10.C.1) implies that ∫ u(f) dP ≥ V(f) for all f ∈ F, and so P ∈ D(≿).

Proof of Corollary 10.1. By Theorem 10.2, M(≿) = D(≿). Let P ∈ D(≿). For every A ∈ Σ and x^* ≻ x_*, consider the act f = x^* A x_*. Normalizing u(x^*) = 1 and u(x_*) = 0, we have

P(A) = ∫_S u(f(s)) P(ds) ≥ ∫_S u(f(s)) ν(ds) = ν(A),

and so P ∈ C(ν). This implies D(≿) ⊆ C(ν). The converse inclusion is trivial, since P ∈ C(ν) implies ∫ u(f) dP ≥ ∫ u(f) dν for all f ∈ F.

Proof of Corollary 10.2. We are done if we show that for all f, g ∈ F,

f ≿ g ⟺ min_{P∈D(≿)} ∫_S u(f(s)) P(ds) ≥ min_{P∈D(≿)} ∫_S u(g(s)) P(ds).  (10.C.2)

This follows from the fact that there exists a unique weak∗-compact and convex set C representing ≿. D(≿) is clearly weak∗-compact (so that the minimum in (10.C.2) is well defined) and convex. Hence, if (10.C.2) holds, C = D(≿), and by Theorem 10.2, D(≿) = M(≿). To prove (10.C.2), suppose there are f, g ∈ F such that

min_{P∈C} ∫ u(f) dP ≥ min_{P∈C} ∫ u(g) dP and min_{P∈D(≿)} ∫ u(f) dP < min_{P∈D(≿)} ∫ u(g) dP.

Let P^* ∈ arg min{∫_S u(f(s)) P(ds) : P ∈ D(≿)}. Since C ⊆ D(≿), we have:

min_{P∈C} ∫_S u(f(s)) P(ds) ≤ ∫_S u(f(s)) P^*(ds) < min_{P∈D(≿)} ∫_S u(g(s)) P(ds) ≤ min_{P∈C} ∫_S u(g(s)) P(ds),

a contradiction. Similarly, one shows that there cannot be f, g ∈ F such that the preference based on D(≿) prefers weakly f to g, while g ≻ f. This shows that Equation (10.C.2) holds, concluding the proof.

Proof of Proposition 10.3. That every SEU preference is ambiguity neutral follows immediately from two applications of Theorem 10.1. As for the converse: If ≿ is both ambiguity averse and ambiguity loving, there are a SEU preference
relation ⩾1 (represented by probability P1) such that ≿ is more ambiguity averse than ⩾1, and a SEU preference relation ⩾2 (represented by probability P2) which is more ambiguity averse than ≿. Applying Definition 10.5 twice, we obtain that for every f ∈ F and x ∈ X,

x ⩾1 f ⇒ x ⩾2 f and x >1 f ⇒ x >2 f.

We show that ⩾1 and ⩾2 are cardinally symmetric. This requires first showing that if ⩾2 has an essential event, so must ≿. Suppose that A ∈ Σ is essential for ⩾2, so that for some x ≻ y (remember that ≿, ⩾1 and ⩾2 are all ordinally equivalent), x >2 x A y >2 y. Using the contrapositive of (10.7), we then have x A y ≻ y. Since ⩾2 is a SEU preference, Ac is also ⩾2-essential, similarly implying x Ac y ≻ y. Now, suppose that ≿ has no essential event. Because of the preferences we just derived, we must have both x ∼ x A y and x ∼ x Ac y. This is impossible since ⩾1 ∈ R(≿), for the contrapositive of (10.8) then yields x A y ⩾1 x, which implies P1(A) = 1, and x Ac y ⩾1 x, which implies P1(A) = 0. This gives us a contradiction, so that ≿ must have an essential event if ⩾2 does. Hence, ⩾2 and ≿ have essential events, and they are cardinally symmetric by assumption. Similarly one shows that ⩾1 and ≿ have essential events and are cardinally symmetric. It is now immediate to check that these facts imply that ⩾1 and ⩾2 are cardinally symmetric. We thus conclude that ⩾2 is more ambiguity averse than ⩾1. Mimicking the last part of the proof of Theorem 10.2, we then show that P1 ≥ P2, which immediately implies P1 = P2, so that ⩾1 = ⩾2 ≡ ⩾. Thus ≿ is both more and less ambiguity averse than ⩾, which immediately implies ≿ = ⩾.

Proof of Theorem 10.3. Part (i) follows immediately along the lines of the proofs of Theorem 10.2 and Corollary 10.1. As for part (ii), it is similarly immediate to show that if ≿2 is more ambiguity averse than ≿1, then C1 ⊆ D(≿2) and u1 ≈ u2. We show the converse. Let V1 and V2 denote the canonical representations of ≿1 and ≿2, and w.l.o.g. assume that u1 = u2 = u. Then C1 ⊆ D(≿2) implies that for every f ∈ F and every P ∈ C1, V2(f) ≤ ∫ u(f) dP. Hence, using the fact that ≿1 is MEU, we find

V2(f) ≤ min_{P∈C1} ∫_S u(f(s)) P(ds) = V1(f),
which immediately yields the desired result.

10.C.2. Section 10.5

Proof of Proposition 10.5. Let ⩾ ∈ R(≿) and set Σ~ ≡ {A ∈ Σ : ρ(A) + ρ(Ac) = 1}. If A ∈ Σ^{≿,⩾}, for all x ∈ X we have

u(x) = P(A) ⟺ u(x) = ρ(A) and u(x) = P(Ac) ⟺ u(x) = ρ(Ac),

and so ρ(A) = P(A) and ρ(Ac) = P(Ac). This implies that A ∈ Σ~, so that Σ^{≿,⩾} ⊆ Σ~. Now, if A ∈ Σ~ we have

ρ(A) = P(A) and ρ(Ac) = P(Ac).  (10.C.3)

In order to show that A ∈ Σ^{≿,⩾}, we need to show that any act measurable w.r.t. the partition {A, Ac} is in H^{≿,⩾}. This follows from (10.C.3), as for every x, y ∈ X the two representations coincide on x A y. Thus Σ~ ⊆ Σ^{≿,⩾}, which concludes the proof.

10.C.3. Section 10.6

Proof of Proposition 10.6. Suppose, to the contrary, that ν agrees with (10.17). If Eq. (10.15) holds, then P(R) = ν(R) and P(B, Y) = ν(B, Y) for all P ∈ C(ν), so that we have P(B, Y) = ν(B, Y) < ν(B, R) ≤ P(B, R). In turn, this implies P(Y) < P(R), yielding ν(Y) ≤ P(Y) < P(R) = ν(R). Hence ν(Y) < ν(R), contradicting (10.17).

Proof of Proposition 10.7. Every ν which satisfies (10.18) is such that C(ν) ≠ ∅: indeed, the measure P such that P(R) = P(B) = P(Y) = 1/3 belongs to C(ν). This proves that all preferences satisfying (10.18) are ambiguity averse. As to the converse, let ≿ be ambiguity averse, that is, C(ν) ≠ ∅, and let P ∈ C(ν); notice that, since P(R) ≥ ν(R) = 1/3 and P(B, Y) ≥ ν(B, Y) = 2/3, (10.19) forces P(R) = 1/3. Assume first that ν(B) = ν(Y) > ν(R). Since P(B) ≥ ν(B) and P(Y) ≥ ν(Y), we get P(B) + P(R) + P(Y) ≥ ν(B) + ν(R) + ν(Y) > 1, a contradiction. Assume now ν(B, Y) < ν(B, R) = ν(R, Y). This implies P(B, Y) < P(B, R) and P(B, Y) < P(R, Y), so that P(Y) < P(R) and P(B) < P(R), and hence P(B) + P(R) + P(Y) < 1, a contradiction.
Acknowledgments An earlier version of this chapter was circulated with the title "Ambiguity Made Precise: A Comparative Foundation and Some Implications." We thank Kim Border, Eddie Dekel, Itzhak Gilboa, Tony Kwasnica, Antonio Rangel, David Schmeidler, audiences at Caltech, Johns Hopkins, Northwestern, NYU, Rochester, UC-Irvine, Université Paris I, the TARK VII-Summer Micro Conference (Northwestern, July 1998), the 1999 RUD Workshop, and especially Simon Grant, Peter Klibanoff, Biung-Ghi Ju, Peter Wakker, and an anonymous referee for helpful comments and discussion. Our greatest debt of gratitude is, however, to Larry Epstein, who sparked our interest in this subject with his paper (Epstein, 1999) and stimulated it with many discussions. Marinacci gratefully acknowledges the financial support of MURST.
Notes
1 Other widespread names are "uncertainty aversion" and "aversion to Knightian uncertainty." We like to use "uncertainty" in its common meaning of any situation in which the consequences of the DM's possible actions are not known at the time of choice.
2 A bet "on" an event is any binary act in which a better payoff ("win") is received when the event obtains.
3 There are earlier papers that use a comparative approach for studying ambiguity attitude, but they do not use it as a basis for defining absolute notions. For example, Tversky and Wakker (1995).
4 See Appendix A for the definition of capacities, Choquet integrals, and some of their properties.
5 We use the symbols ≿ (and ≻) to denote SEU weak (and strict) preferences.
6 Such a set is well-defined, since it is trivially true that the union of any collection of sets satisfying (A) and (B) below also satisfies the two conditions.
7 See Zhang (1996) for a compelling urn example in which this happens.
8 For any pair of biseparable preferences which have essential events, this elicitation can be done without extraneous devices by using the tradeoff method briefly outlined in Appendix B.
9 We thank Peter Klibanoff for his substantial help in developing the ensuing discussion.
10 The richness condition is: For every F ⊆ E in 𝒜 and A ∈ 𝒜 such that A is as likely as E, there is B ⊆ A in 𝒜 such that B is as likely as F. Epstein remarks that richness of 𝒜 is not required for some of his results.
References
F. J. Anscombe and R. J. Aumann (1963), A definition of subjective probability, Ann. Math. Stat. 34, 199–205.
K. J. Arrow (1974), The theory of risk aversion, in "Essays in the Theory of Risk-Bearing," Chap. 3, North-Holland, Amsterdam.
R. Casadesus-Masanell, P. Klibanoff, and E. Ozdenoren (2000), Maxmin expected utility over Savage acts with a set of priors, J. Econ. Theory 92, 33–65.
A. Chateauneuf and J. M. Tallon (1998), Diversification, convex preferences and non-empty core, mimeo, Université Paris I, July.
G. Choquet (1953), Theory of capacities, Ann. Inst. Fourier (Grenoble) 5, 131–295.
B. de Finetti (1952), Sulla preferibilità, Giorn. Econ. 6, 3–27.
D. Ellsberg (1961), Risk, ambiguity, and the Savage axioms, Quart. J. Econ. 75, 643–669.
L. G. Epstein (1999), A definition of uncertainty aversion, Rev. Econ. Stud. 66, 579–608. (Reprinted as Chapter 9 in this volume.)
L. G. Epstein and T. Wang (1994), Intertemporal asset pricing under Knightian uncertainty, Econometrica 62, 283–322. (Reprinted as Chapter 18 in this volume.)
L. G. Epstein and J. Zhang (2001), Subjective probabilities on subjectively unambiguous events, Econometrica 69, 265–306.
P. C. Fishburn (1993), The axioms and algebra of ambiguity, Theory Dec. 34, 119–137.
P. Ghirardato and J. N. Katz (2000a), "Indecision Theory: Explaining Selective Abstention in Multiple Elections," Social Science Working Paper 1106, Caltech, November.
P. Ghirardato and M. Marinacci (2000b), "Risk, Ambiguity, and the Separation of Utility and Beliefs," Social Science Working Paper 1085, Caltech, March (Revised: January 2001).
P. Ghirardato and M. Marinacci (2000), A subjective definition of ambiguity, Work in progress, Caltech and Università di Torino.
I. Gilboa and D. Schmeidler (1989), Maxmin expected utility with a non-unique prior, J. Math. Econ. 18, 141–153. (Reprinted as Chapter 6 in this volume.)
L. P. Hansen, T. Sargent, and T. D. Tallarini (1999), Robust permanent income and pricing, Rev. Econ. Stud. 66, 873–907.
Y. Kannai (1992), The core and balancedness, in "Handbook of Game Theory" (R. J. Aumann and S. Hart, eds), pp. 355–395, North-Holland, Amsterdam.
D. Kelsey and S. Nandeibam (1996), On the measurement of uncertainty aversion, mimeo, University of Birmingham, September.
D. H. Krantz, R. D. Luce, P. Suppes, and A. Tversky (1971), "Foundations of Measurement: Additive and Polynomial Representations," Vol. 1, Academic Press, San Diego.
M. J. Machina and D. Schmeidler (1992), A more robust definition of subjective probability, Econometrica 60, 745–780.
A. Montesano and F. Giovannoni (1996), Uncertainty aversion and aversion to increasing uncertainty, Theory Dec. 41, 133–148.
S. Mukerji (1998), Ambiguity aversion and incompleteness of contractual form, Amer. Econ. Rev. 88, 1207–1231. (Reprinted as Chapter 14 in this volume.)
K. Nehring (1999), Capacities and probabilistic beliefs: A precarious coexistence, Math. Soc. Sci. 38, 197–213.
J. W. Pratt (1964), Risk aversion in the small and in the large, Econometrica 32, 122–136.
L. J. Savage (1954), "The Foundations of Statistics," Wiley, New York.
D. Schmeidler (1989), Subjective probability and expected utility without additivity, Econometrica 57, 571–587. (Reprinted as Chapter 5 in this volume.)
A. Tversky and P. P. Wakker (1995), Risk attitudes and decision weights, Econometrica 63, 1255–1280.
P. P. Wakker (1989), "Additive Representations of Preferences," Kluwer, Dordrecht.
P. P. Wakker and D. Deneffe (1996), Eliciting von Neumann–Morgenstern utilities when probabilities are distorted or unknown, Manage. Sci. 42, 1131–1150.
M. E. Yaari (1969), Some remarks on measures of risk aversion and on their uses, J. Econ. Theory 1, 315–329.
J. Zhang (1996), Subjective ambiguity, probability and capacity, mimeo, University of Toronto, October.
11 Stochastically independent randomization and uncertainty aversion Peter Klibanoff
11.1. Introduction An example seminal to the interest in uncertainty (or ambiguity) aversion is Ellsberg's (1961) "two-color" problem. There is a "known urn" which contains 50 red balls and 50 black balls, and an "unknown urn" which contains a mix of red and black balls, totaling 100, about which no information is given. Ellsberg observed (as did many afterwards, more carefully) that a substantial fraction of individuals were indifferent between the colors in both urns, but preferred to bet on either color in the "known urn" rather than the corresponding color in the "unknown urn." This violates not only expected utility (EU), but probabilistically sophisticated behavior more generally. One contemporary criticism of the displayed behavior was put forward by Raiffa (1961), who pointed out that flipping a coin to decide which color to bet on in the unknown urn should be viewed as equivalent to betting on the "known" 50–50 urn. One can think of such preferences as displaying a preference for randomization. Jumping ahead to more recent work, there is a burgeoning literature attempting to model uncertainty (or ambiguity) aversion in decision makers using representations with nonadditive probabilities or sets of probabilities. Some of this work (e.g. Lo, 1996; Klibanoff, 1994) accepts this preference for mixture or randomization as a facet of uncertainty aversion, while other work (e.g. Dow and Werlang, 1994; Eichberger and Kelsey, 2000) does not. This difference has led to several papers, most directly Eichberger and Kelsey (1996), but also Ghirardato (1997) and Sarin and Wakker (1992). In particular, all three papers observe that the choice of a "one-stage" or Savage model as opposed to a "two-stage" or Anscombe–Aumann model can lead to different preferences when modeling uncertainty aversion.
In Eichberger and Kelsey (1996) the authors set out to “show that while individuals with nonadditive beliefs may display a strict preference for randomization in an Anscombe–Aumann framework they will not do so in a Savage-style decision theory.”1
Klibanoff, P. Stochastically independent randomization and uncertainty aversion. Economic Theory 18, 605–620.
This chapter was motivated in part by the intuition that the one-stage/two-stage modeling distinction is largely a red herring, at least as it relates to preference for randomization. In particular, while appreciating that there can be differences between the frameworks, one goal of this chapter is to relate these differences to violations of stochastic independence and to point out that they have essentially no role to play in the debate over preference for randomization in uncertainty aversion. A key step in making this point is the related finding that Choquet expected utility (CEU) preferences are restrictive in allowing for randomizing devices. An additional contribution of the chapter is to provide preference-based conditions describing a stochastically independent randomizing device in a non-Bayesian environment. Section 11.2 sets out some preliminaries and notation. Section 11.3 describes two frameworks in which a randomizing device can be modeled. Section 11.4 provides the key preference conditions and contains the main results on the restrictiveness of CEU when stochastic independence is required and the relative flexibility of Maxmin expected utility (MMEU) with multiple priors. Section 11.5 concludes.
11.2. Preliminaries and notation We will consider two representations of preferences, each of which generalizes EU and allows for uncertainty aversion. The first model is CEU. CEU was axiomatized first in an Anscombe–Aumann framework by Schmeidler (1989), and then in a Savage framework by Gilboa (1987) and Sarin and Wakker (1992). In a Savage framework, but assuming a rich set of consequences and a finite state space, Wakker (1989), Nakamura (1990), and Chew and Karni (1994) have axiomatized CEU. The second model is MMEU with a non-unique prior. MMEU was first axiomatized in an Anscombe–Aumann framework by Gilboa and Schmeidler (1989). In a Savage framework, but assuming a rich set of consequences and allowing a finite or infinite state space, MMEU has been axiomatized by Casadesus-Masanell et al. (2000b). Consider a finite set of states of the world S. Let X be a set of consequences. An act f is a function from S to X. Denote the set of acts by F. A function v : 2^S → [0, 1] is a capacity or nonadditive probability if it satisfies (i) v(∅) = 0, (ii) v(S) = 1, and (iii) A ⊆ B implies v(A) ≤ v(B). It is convex if, in addition, (iv) for all A, B ⊆ S, v(A) + v(B) ≤ v(A ∪ B) + v(A ∩ B). Now define the (finite) Choquet integral of a real-valued function a to be:

∫ a dv = α₁ v(E₁) + Σ_{i=2}^{n} α_i [v(∪_{j=1}^{i} E_j) − v(∪_{j=1}^{i−1} E_j)],

where α_i is the ith largest value that a takes on, and E_i = a⁻¹(α_i).
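As a minimal computational sketch of this definition (the function name and data layout are my own, not from the chapter), the Choquet integral can be computed by sorting states in decreasing order of a and weighting each value by the capacity increment of the growing upper-level set:

```python
def choquet(a, v, states):
    """Choquet integral of the real-valued function `a` (dict: state -> value)
    with respect to the capacity `v` (a function on frozensets of states)."""
    order = sorted(states, key=lambda s: -a[s])  # decreasing values of a
    total, prev, upper = 0.0, 0.0, set()
    for s in order:
        upper.add(s)                  # upper-level set E_1 ∪ ... ∪ E_i
        w = v(frozenset(upper))
        total += a[s] * (w - prev)    # value times capacity increment
        prev = w
    return total

# Sanity check: for an additive capacity the Choquet integral reduces to
# the ordinary expectation (illustrative numbers).
p = {1: 0.5, 2: 0.3, 3: 0.2}
additive = lambda A: sum(p[s] for s in A)
assert abs(choquet({1: 3.0, 2: 1.0, 3: 2.0}, additive, p) - 2.2) < 1e-9
```

The increment form is equivalent to the displayed sum when the α_i are distinct, and handles ties correctly because equal values receive the summed increments of their level set.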
246
Peter Klibanoff
Let ≽ be a binary relation on acts, F, that represents (weak) preferences. A decision maker is said to have CEU preferences if there exists a utility function u : X → ℝ and a nonadditive probability v : 2^S → [0, 1] such that, for all f, g ∈ F, f ≽ g if and only if ∫ u ∘ f dv ≥ ∫ u ∘ g dv. CEU preferences are said to display uncertainty aversion if v is convex.2 A decision maker is said to have MMEU preferences if there exists a utility function u : X → ℝ and a non-empty, closed and convex set B of additive probability measures on S such that, for all f, g ∈ F, f ≽ g if and only if min_{p∈B} ∫ u ∘ f dp ≥ min_{p∈B} ∫ u ∘ g dp. All MMEU preferences display uncertainty aversion.3 Finally, note that the set of MMEU preferences strictly contains the set of CEU preferences with convex capacities.
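The containment just noted rests on the fact that, for a convex capacity, the Choquet integral coincides with the minimum expected utility over the core {p : p(A) ≥ v(A) for all A}. A two-state sketch (all numbers are assumed for illustration):

```python
# Two states s1, s2; convex capacity with v({s1}) = v({s2}) = 0.3, v(S) = 1.
v1, v2 = 0.3, 0.3
u = (10.0, 0.0)                      # utility act with u(s1) > u(s2)

# Choquet integral: top value weighted by v({s1}), rest by the increment.
ceu = u[0] * v1 + u[1] * (1.0 - v1)

# Core of v: p(s1) ranges over [v({s1}), 1 - v({s2})]; expected utility is
# linear in p(s1), so the minimum over the core is attained at an endpoint.
mmeu = min(p1 * u[0] + (1.0 - p1) * u[1] for p1 in (v1, 1.0 - v2))

assert abs(ceu - mmeu) < 1e-12       # CEU equals the min over the core here
```

So this convex-capacity CEU decision maker is also an MMEU decision maker with B equal to the core; the converse inclusion fails, which is the strictness claimed in the text.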
11.3. Modeling a randomizing device Corresponding to the two standard frameworks for modeling uncertainty (Anscombe–Aumann and Savage) there are at least two alternative ways to model a randomizing device. In an Anscombe–Aumann setting, a randomizing device is incorporated in the structure of the consequence space. Specifically, the "consequences" X are often taken to be the set of all simple probability distributions over some more primitive set of outcomes, Z. In this setup, a randomization over two acts f and g with probabilities p and 1 − p respectively is modeled by an act h where h(s)(z) = pf(s)(z) + (1 − p)g(s)(z), for all s ∈ S, z ∈ Z. Observe that h is, indeed, a well-defined act because the set of simple probability distributions is closed under mixture. Returning to the "unknown urn" of the introduction, Table 11.1 shows the three acts (a) "bet on red," (b) "bet on black," and (c) "randomize 50–50 over betting on red or on black" as modeled in this setting. Alternatively, consider a Savage-style setting with a finite state space (e.g. Wakker (1984), Nakamura (1990), or Gul (1992)). Here a convex combination of two elements of the consequence space X need not be an element of X (and need not even be defined). Therefore, to model a randomization, we may instead expand the original state space, S, by forming the cross product of S with the possible outcomes (or "states") of the randomizing device. For example, Table 11.2 shows the acts (a) "bet on red," (b) "bet on black," (c) "bet on red if heads, black if tails," and (d) "bet on black if heads, red if tails" in the case of the unknown urn with a coin used to randomize.

Table 11.1 Unknown urn with randomization in the consequence space (Anscombe–Aumann)

        R(ed)            B(lack)
(a)     $100             $0
(b)     $0               $100
(c)     ½$100 ⊕ ½$0      ½$100 ⊕ ½$0
Table 11.2 Unknown urn with randomization in the state space only (Savage)

        R(ed), H(eads)   B(lack), H(eads)   R(ed), T(ails)   B(lack), T(ails)
(a)     $100             $0                 $100             $0
(b)     $0               $100               $0               $100
(c)     $100             $0                 $0               $100
(d)     $0               $100               $100             $0
In comparing the two models, observe that the Anscombe–Aumann setting builds in several key properties that a randomizing device should satisfy while the Savage setting does not. In particular, the probabilities attached to the outcomes of the randomizing device should be unambiguous, and the device should be stochastically independent from the (rest of the) state space. Arguably these two properties capture the essence of what is meant by a randomizing device. Both properties are automatically satisfied in an Anscombe–Aumann setting. In a Savage setting, as we will see later, these properties require additional restrictions on preferences.4 Several recent papers (including Eichberger and Kelsey, 1996; Ghirardato, 1997; and Sarin and Wakker, 1992) have noted that CEU need not give identical results in the two frameworks. Specifically, they suggest that the choice of a one-stage (Savage) or two-stage (Anscombe–Aumann) model can lead to different behavior. To see this in the unknown urn example, consider the case where the decision maker's marginal capacity over the colors is v(R) = v(B) = 1/3. In the Anscombe–Aumann setting this is enough to pin down preferences as c ≻ a ∼ b (i.e. the Raiffa preferences or preference for randomization). In the Savage setting, consider the capacity given by

v(R × {H, T}) = v(B × {H, T}) = 1/3,
v({R, B} × H) = v({R, B} × T) = 1/2,
v(R × H) = v(R × T) = v(B × H) = v(B × T) = 1/6,
v((R × H) ∪ (B × T)) = v((R × T) ∪ (B × H)) = 1/3,
v(any 3 states) = 2/3.

This capacity yields the preferences a ∼ b ∼ c ∼ d, and thus does not provide a preference for randomization as in the Anscombe–Aumann setting. Why does this occur despite the fact that the marginals are identical in the two cases and the product capacity is equal to the product of the marginals on all rectangles? Mathematically, as Ghirardato (1997) explains, the source is a failure of the usual Fubini Theorem to hold for Choquet integrals. Intuitively, however, it is not clear what is going "wrong" in the example.
Table 11.3 Non-product weights for randomized act c

             R, H    B, H    R, T    B, T
payoff       $100    $0      $0      $100
weight       1/6     1/3     1/3     1/6
To gain some insight, it is useful to examine the weights applied to each state when evaluating the randomized acts using the Choquet integral. For example, as Table 11.3 shows, "bet on red if heads, black if tails" is evaluated using non-product weights. The fact that such non-product weights can be applied suggests that the CEU preferences with the capacity above reflect ambiguity not only about the color of the ball drawn from the urn but also about the correlation between the randomizing device and the color of the ball. This can also be seen by noting that v({R, B} × H) > v((R × H) ∪ (B × T)), in contrast to the equality one might expect if H and T are really produced by a symmetric, independent randomization. While such ambiguity is certainly possible, it runs directly counter to the stochastic independence we would expect of a randomizing device. In the next section, therefore, I propose conditions on preferences that ensure this independence.
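The computation behind a ∼ b ∼ c ∼ d can be replicated directly. The capacity values are the ones displayed above; the Choquet routine and the data encoding are mine:

```python
states = [("R", "H"), ("B", "H"), ("R", "T"), ("B", "T")]

def v(A):
    """The capacity from the text on the product space colours x coin."""
    A = frozenset(A)
    n = len(A)
    if n == 0: return 0.0
    if n == 1: return 1/6
    if n == 3: return 2/3
    if n == 4: return 1.0
    colours = {c for c, _ in A}
    if len(colours) == 1: return 1/3        # R x {H,T} or B x {H,T}
    coins = {h for _, h in A}
    if len(coins) == 1: return 1/2          # {R,B} x H or {R,B} x T
    return 1/3                              # the two "diagonal" events

def choquet(act):
    order = sorted(states, key=lambda s: -act[s])
    total, prev, upper = 0.0, 0.0, set()
    for s in order:
        upper.add(s)
        w = v(upper)
        total += act[s] * (w - prev)
        prev = w
    return total

acts = {"a": {("R","H"):100, ("B","H"):0, ("R","T"):100, ("B","T"):0},
        "b": {("R","H"):0, ("B","H"):100, ("R","T"):0, ("B","T"):100},
        "c": {("R","H"):100, ("B","H"):0, ("R","T"):0, ("B","T"):100},
        "d": {("R","H"):0, ("B","H"):100, ("R","T"):100, ("B","T"):0}}

# All four acts receive Choquet value 100/3: a ~ b ~ c ~ d, so no
# preference for randomization, unlike the Anscombe-Aumann evaluation.
for f in acts.values():
    assert abs(choquet(f) - 100/3) < 1e-9
```

For act c the successive capacity increments are exactly the non-product weights (1/6, 1/3, 1/3, 1/6) of Table 11.3.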
11.4. Stochastically independent randomization and preferences Here I propose conditions on preferences that are designed to reflect two properties of a randomizing device: unambiguous probabilities and stochastic independence. These two properties are essential to what is meant by a randomizing device. Formally, consider preferences, ≽, over acts, F : S → X, on a finite product state space, S = S_1 × S_2 × ⋯ × S_N. Let S_{-i} denote the product of all ordinates other than i. Denote by F_{S_i} the subset of acts for which outcomes are determined entirely by the ith ordinate. This means that f ∈ F_{S_i} implies f(s_i, s_{-i}) = f(s_i, ŝ_{-i}) for all s_{-i}, ŝ_{-i} ∈ S_{-i} and s_i ∈ S_i. For f, g ∈ F and A ⊆ S, denote by f_A g the act which equals f(s) for s ∈ A and equals g(s) for s ∉ A. We now state some useful definitions concerning preferences. Definition 11.1. ≽ satisfies solvability on S_i if, for f ∈ F_{S_i}, x, y, z ∈ X and A_i ⊆ S_i, x_{A_i×S_{-i}} z ≽ f ≽ y_{A_i×S_{-i}} z implies f ∼ w_{A_i×S_{-i}} z for some w ∈ X. Solvability should be seen as a joint richness condition on ≽ and X. It is satisfied in all of the axiomatizations of EU, CEU, or MMEU over Savage acts on a finite state space of which I am aware. For example, Nakamura (1991) imposes solvability directly, while Wakker (1984, 1989), Gul (1992) and Casadesus-Masanell et al. (2000a,b) ensure it is satisfied through topological assumptions on X and continuity assumptions on ≽.
Definition 11.2. ≽ satisfies expected utility (EU) on S_i if ≽ restricted to F_{S_i} can be represented by expected utility where the utility function is unique up to a positive affine transformation and the probability measure on the set of all subsets of S_i is unique. While the definition is intentionally stated somewhat flexibly, it could easily be made more primitive/rigorous by assuming that preferences restricted to F_{S_i} satisfy the axioms in one of the existing axiomatizations of EU over Savage acts on a finite state space, such as Wakker (1984), Nakamura (1991), Gul (1992), or Chew and Karni (1994). This definition is intended to capture the fact that the decision maker associates a unique probability distribution with S_i and uses that distribution to weight outcomes. Note that the uniqueness requirement on the probability measure entails the existence of consequences x, y ∈ X such that x ≻ y (where preferences over X are derived from preferences over the associated constant-consequence acts in the usual way). Furthermore, any of the axiomatizations cited will imply solvability on S_i as well.

Definition 11.3. s_i ∈ S_i is null if f_{s_i×S_{-i}} h ∼ g_{s_i×S_{-i}} h for all f, g, h ∈ F_{S_i}.

Note that given EU, a state is null if and only if it is assigned zero probability.

Definition 11.4. S_i is stochastically independent of S_{-i} if, for all ŝ_{-i} ∈ S_{-i}, f ∈ F_{S_i} and w ∈ X,

f ∼ w    (11.1)

implies

f_{S_i×ŝ_{-i}} w ∼ w.    (11.2)

While this is formulated as a general definition of stochastic independence of an ordinate, this chapter will focus only on independence of a randomizing device. For this purpose, the main definition is the following:

Definition 11.5. S_i is a stochastically independent randomizing device (SIRD) if S_i is stochastically independent and contains at least two non-null states, and ≽ satisfies solvability and EU on S_i.

This condition is designed to differentiate between EU ordinates that are stochastically independent from the rest of the state space and those that are dependent, while still allowing for possible uncertainty aversion on other ordinates. A useful way to understand this definition is as follows: There are several potential reasons why Equation (11.1) could hold while Equation (11.2) is violated. First, it might be that uncertainty aversion over S_i leads a different marginal probability measure over S_i to be used when evaluating the acts in (11.2) than when evaluating acts
in (11.1). This is ruled out by the assumption that preferences satisfy EU on S_i. Second, it might be that the marginal over S_i conditional on ŝ_{-i} is different than the unconditional marginal over S_i due to some stochastic dependence (or uncertainty about stochastic independence) between S_i and S_{-i}. Since we want to model an independent randomizing device, it is proper that the SIRD condition does not allow for such dependence. Also supporting the idea that this definition reflects stochastic independence is the observation that if preferences are EU and nontrivial, then S_i being an SIRD is equivalent to requiring that the representing probability measure be a product measure on S_i × S_{-i}. Note also that all of the results that follow will hold true if we additionally impose that S_{-i} is stochastically independent of S_i (by switching the roles of i and −i in the definition of stochastically independent). Thus this concept shares the symmetry that a notion of stochastic independence should intuitively possess. In the next two sections, we develop the implications of SIRD for some common classes of uncertainty averse preferences. 11.4.1. MMEU and randomizing devices This section develops the implications for MMEU preferences of one ordinate of the state space being a SIRD. MMEU will be found to be flexible enough to easily incorporate both a SIRD and uncertainty aversion. Theorem 11.1. Assume ≽ are MMEU preferences satisfying solvability for some S_i that contains at least two non-null states. Then the following are equivalent: (i) S_i is a SIRD; (ii) there exists a probability measure p̂ on 2^{S_i} such that all probability measures p in the closed, convex set of measures B of the MMEU representation satisfy p(s) = p̂(s_i) p(S_i × s_{-i}) for all s ∈ S. Proof. ((i) ⇒ (ii)) We first show that all p ∈ B must have the same marginal on S_i. Fix outcomes x, y ∈ X such that x ≻ y. EU on S_i implies that ≽
restricted to F_{S_i} may be represented by Σ_{s_i∈S_i} u(f(s_i)) p̂(s_i), where p̂ is the unique representing probability measure on 2^{S_i}, and u is unique up to a positive affine transformation. Using the MMEU representation of ≽ yields a utility function ũ and a set of measures B such that, for all f, g ∈ F_{S_i},

min_{p∈B} Σ_{s_i∈S_i} ũ(f(s_i)) p(s_i × S_{-i}) ≥ min_{p∈B} Σ_{s_i∈S_i} ũ(g(s_i)) p(s_i × S_{-i})
⟺ Σ_{s_i∈S_i} u(f(s_i)) p̂(s_i) ≥ Σ_{s_i∈S_i} u(g(s_i)) p̂(s_i).

Without loss of generality, set u(x) = ũ(x) = 1 and u(y) = ũ(y) = 0. Using the fact that S_i satisfies EU and solvability, combined with the MMEU representation,
allows one to apply Nakamura (1990: lemma 3) and conclude that, given the normalization, the two utility functions must be the same (i.e. ũ(x) = u(x) for all x ∈ X). Therefore,

min_{p∈B} Σ_{s_i∈S_i} u(f(s_i)) p(s_i × S_{-i}) ≥ min_{p∈B} Σ_{s_i∈S_i} u(g(s_i)) p(s_i × S_{-i})
⟺ Σ_{s_i∈S_i} u(f(s_i)) p̂(s_i) ≥ Σ_{s_i∈S_i} u(g(s_i)) p̂(s_i).
Suppose there is some p′ ∈ B such that p′(s_i × S_{-i}) ≠ p̂(s_i) for some s_i ∈ S_i. Without loss of generality, assume that p̂(s_i) > p′(s_i × S_{-i}) for an s_i ∈ S_i. Consider the act f = x_{s_i×S_{-i}} y. Solvability guarantees that there exists a z ∈ X such that z ∼ f. Thus, u(z) = min_{p∈B} Σ_{s_i∈S_i} u(f(s_i)) p(s_i × S_{-i})
≤
0 ⇔ s(ω_i) ≥ 0, and rearranging terms, (14.A.3) yields (14.A.5), and (14.A.4) yields (14.A.6):
Σ_{ω_i∈Ω} (π̃_o(ω_i|β_H, σ_H) − π̃_o(ω_i|β_L, σ_H)) (s(ω_i)δ(ω_i)) ≥ h_B(β_H) − h_B(β_L)    (14.A.5)

Σ_{ω_i∈Ω} (π̃_o(ω_i|β_H, σ_H) − π̃_o(ω_i|β_H, σ_L)) (s(ω_i)δ(ω_i)) ≥ h_S(σ_H) − h_S(σ_L).    (14.A.6)
Hence solutions to (14.A.1) and (14.A.2) will exist if we find t̃ that solves (14.A.7) and (14.A.8):

Σ_{ω_i∈Ω} [π̃_b(ω_i|β_H, σ_H) − π̃_b(ω_i|β_L, σ_H)] [s(ω_i)δ(ω_i) − t̃(ω_i)]
    ≥ Σ_{ω_i∈Ω} (π̃_o(ω_i|β_H, σ_H) − π̃_o(ω_i|β_L, σ_H)) (s(ω_i)δ(ω_i))    (14.A.7)

Σ_{ω_i∈Ω} [π̃_s(ω_i|β_H, σ_H) − π̃_s(ω_i|β_H, σ_L)] t̃(ω_i)
    ≥ Σ_{ω_i∈Ω} (π̃_o(ω_i|β_H, σ_H) − π̃_o(ω_i|β_H, σ_L)) (s(ω_i)δ(ω_i)).    (14.A.8)
Using matrix notation, the inequalities (14.A.7) and (14.A.8) may be replaced by (14.A.9):

[−π̃_b(·|β_H, σ_H) + π̃_b(·|β_L, σ_H) ; π̃_s(·|β_H, σ_H) − π̃_s(·|β_H, σ_L)] [t̃(ω_1), …, t̃(ω_N)]′
    ≥ [Σ_{ω_i∈Ω} (π̃_o(ω_i|β_H, σ_H) − π̃_o(ω_i|β_L, σ_H)) (s(ω_i)δ(ω_i)) ;
       Σ_{ω_i∈Ω} (π̃_o(ω_i|β_H, σ_H) − π̃_o(ω_i|β_H, σ_L)) (s(ω_i)δ(ω_i))].    (14.A.9)
⎥ ⎢ i ≥⎢ ⎥. ⎣ (π˜ (ω |β , σ ) − π˜ (ω |β , σ ))(s(ω )δ(ω )) ⎦ o
i
H
H
o
i
H
L
i
i
ωi ∈
(14.A.10) Let τ = {t|t solves (14A.10)}. Recall the role of contingencies ωk and ωl as stated in Condition 14.1. Make a selection t¯ from τ such that t(ωi ) = 0, if i = k or i = l. It follows from Condition 14.1 that such a selection exists and is unique. Furthermore, t¯ is bounded.
326
Sujoy Mukerji
Step 3. Finally consider a set {π˜ 1t¯, π˜ 2t¯, π˜ 3t¯, π˜ 4t¯} where π˜ 1t¯, π˜ 2t¯ ∈ π(t¯; βH , σH ), π˜ 3t¯ ∈ π(t¯; βH , σL ), π˜ 4t¯ ∈ π(t¯; βL , σH ). Steps 1 and 2 together define a continuous function 1 : (π(·|βH , σH )) × (π(·|βH , σH )) × (π(·|βH , σL )) × (π(·|βL , σH )) → RN while Step 3 defines a convex-valued upper hemicontinuous correspondence. 2 : RN ⇒ (π(·|βH , σH )) × (π(·|βH , σH )) × (π(·|βH , σL )) × (π(·|βl , σH )). Hence the composition , ≡ 2 ◦ defines a convex-valued upper hemicontinuous correspondence from the convex domain (π(·|βH , σH )) × (π(·|βH , σH )) × (π(·|βH , σL )) × (π(·|βL , σH )) into itself. Kakutani’s fixed-point theorem ensures that has a fixed point. Let {π1t ∗, π2t ∗, π3t∗ , π4t∗ } be a fixed point of . If t∗ solves (14.A.11), then clearly t∗ satisfies conditions required of the transfer t˜ (in 14.A.1) and (14.A.2): ⎤ ⎡ * t(ω1 ) ) −π1t ∗ + π4t ∗ ⎢ . ⎥ . π˜ 2t ∗ − π˜ 3t ∗ ⎣ . ⎦ t(ωn ) ⎡ ⎤ (π˜ o (ωi |βH , σH ) − π˜ o (ωi |βl , σH ))(s(ωi )δ(ωi )) ⎢ ω ∈
⎥ ⎢ t ⎥ ≥⎢ ⎥ ⎣ (π˜ (ω |β , σ ) − π˜ (ω |β , σ ))(s(ω )δ(ω )) ⎦ o
i
H
H
o
i
H
L
i
i
ωt ∈
(14.A.11) Proof of Proposition 14.1. Lemma 14.1 proves that a (bounded) t exists which satisfies the incentive constraints relevant to implementing the first best. Given such a t the expected payoffs to B and S are E(s(ωi )δ(ωi ) − t(ωi )|βH , σH ) − hB (βH ) and E(t(ωi )|βH , σH ) − hS (σH ), respectively. Since the expectations operator is additive, the sum of the expected payoffs to the two parties is E(s(ωi )δ(ωi )|βH , σH ) − hB (βH ) − hS (σH ). (βH , σH ) is the first best, implying E(s(ωi )δ(ωi )|βH , σH ) − hB (βH ) − hS (σH ) ≥ 0. Hence participation constraints can be taken care of by a transfer
τ ∈ R transacted when the contract is signed, so long as τ satisfies the following conditions:

(PC*_B) E(s(ω_i)δ(ω_i) − t(ω_i)|β_H, σ_H) − τ − h_B(β_H) ≥ 0
(PC*_S) E(t(ω_i)|β_H, σ_H) + τ − h_S(σ_H) ≥ 0.

Lemma 14.A.1. Suppose f : Ω → R, g : Ω → R, and π is a convex nonadditive probability function, π : 2^Ω → [0, 1]; and the labeling of the state space Ω = {ω_i}_{i=1}^N is such that f(ω_m) > g(ω_n) ⇒ m > n.

(a) If f and g are comonotonic then E_π(f + g) = E_π(f) + E_π(g).
(b) Let f and f + g be comonotonic, and suppose that there is ω_{k+1}, ω_k such that (i) and (ii) hold: (i) g(ω_{k+1}) < g(ω_k); (ii) π({ω_{k+1}}) + π({ω_k}) < π({ω_{k+1}, ω_k}). Then E_π(f + g) > E_π(f) + E_π(g).

Proof. The proof is straightforward and hence omitted.

Lemma 14.A.2. Assume π(·|·, ·) satisfies Conditions 14.2a, 14.2b, and that (β_H, σ_H) is the first-best action profile. Then there exist h_B(·), h_S(·) such that for any t̃ which satisfies the incentive compatibility conditions IC_B, IC_S in Lemma 14.1, t̃ and the vector [max{s(·), 0} − t̃(·)] are not comonotonic.

Proof. The strategy of the proof will be to choose h_S(·) such that h_S(σ_H) − h_S(σ_L) = SocBen(σ_H/σ_L), and then show that if t̃ and s − t̃ are comonotonic, B's marginal private benefit from choosing β_H (i.e. E(s − t̃|β_H, σ_H) − E(s − t̃|β_L, σ_H)) falls short of SocBen(β_H/β_L). Hence we can choose h_B(·) with the difference h_B(β_H) − h_B(β_L) large enough (but less than SocBen(β_H/β_L)), so that if t̃, s − t̃ are to satisfy (IC_B), (IC_S), then it must be that t̃ and s − t̃ are not comonotonic. Fix s(·) such that SocBen(σ_H/σ_L) > 0 and SocBen(β_H/β_L) > 0 and consider h_S(·) and h_B(·) such that (β_H, σ_H) is the first-best action profile. Choose h_S(·) such that h_S(σ_H) − h_S(σ_L) = SocBen(σ_H/σ_L). Choose t̃ such that s, t̃ and s − t̃ are comonotonic and t̃ satisfies IC_S. Let i_S be the smallest value of the contingent state index i such that π(X(ω_{i+1})|β_H, σ_H/σ_L) > 0.
Similarly, let ī_S be the highest index i such that π(X(ω_i)|β_H, σ_H/σ_L) > 0. Note that Condition 14.2b guarantees that the set {ω_i ∈ Ω : i_S ≤ i ≤ ī_S} is not a singleton set.
Claim 14.A.1. s(ω_i) − t̃(ω_i) is the same for all i satisfying the condition i_S ≤ i ≤ ī_S.

Proof. Since for all i ≤ i_S, π(X(ω_i)|β_H, σ_H/σ_L) = 0, it follows that

π(X(ω_{i_S})|β_H, σ_H/σ_L) − π(X(ω_{i_S+1})|β_H, σ_H/σ_L) < 0.    (14.A.12)

Since for all i > ī_S, π(X(ω_i)|β_H, σ_H/σ_L) = 0, it follows that

π(X(ω_{ī_S})|β_H, σ_H/σ_L) − π(X(ω_{ī_S+1})|β_H, σ_H/σ_L) > 0.    (14.A.13)
At this point it is useful to recall that E(s − t̃|β, σ) may be written as

Σ_{i=1}^{N−1} [s(ω_i) − t̃(ω_i)] [π(X(ω_i)|β, σ) − π(X(ω_{i+1})|β, σ)] + [s(ω_N) − t̃(ω_N)] π(ω_N|β, σ).

Suppose the claim is false; that is, s(ω_i) − t̃(ω_i) is not constant when i varies in the interval [i_S, ī_S]. By comonotonicity of s and s − t̃, s(ω_i) − t̃(ω_i) is weakly increasing in ω_i for i such that i_S ≤ i ≤ ī_S. Hence it must be that s(ω_{i_S}) − t̃(ω_{i_S}) < s(ω_{ī_S}) − t̃(ω_{ī_S}). Next notice that

E(s − t̃|β_H, σ_H) − E(s − t̃|β_H, σ_L)
  = Σ_{i=i_S}^{ī_S} [s(ω_i) − t̃(ω_i)] [π(X(ω_i)|β_H, σ_H) − π(X(ω_{i+1})|β_H, σ_H)]
    − Σ_{i=i_S}^{ī_S} [s(ω_i) − t̃(ω_i)] [π(X(ω_i)|β_H, σ_L) − π(X(ω_{i+1})|β_H, σ_L)]
  = Σ_{i=i_S}^{ī_S} [s(ω_i) − t̃(ω_i)] [π(X(ω_i)|β_H, σ_H/σ_L) − π(X(ω_{i+1})|β_H, σ_H/σ_L)].    (14.A.14)
By inspecting the final expression (14.A.14), it may be checked that E(s − t˜|βH , σH ) − E(s − t˜|βH , σL ) > 0.
(14.A.15)
To see this, first note that for any i = î,

π(X(ω_î)|β, σ) = Σ_{i=î}^{N−1} [π(X(ωi)|β, σ) − π(X(ω_{i+1})|β, σ)] + π(ωN|β, σ).
Incompleteness of contractual form
329
Hence, Assumption 14.2b (π(X(ωi)|βH, σH/σL) ≥ 0) implies

Σ_{i=î}^{ī_S} [π(X(ωi)|βH, σH/σL) − π(X(ω_{i+1})|βH, σH/σL)] ≥ 0.  (14.A.16)

Then, (14.A.15) finally follows from the fact that s(ωi) − t̃(ωi) is weakly increasing in ωi, together with (14.A.16), (14.A.12), (14.A.13), and the fact that s(ω_{i_S}) − t̃(ω_{i_S}) < s(ω_{ī_S}) − t̃(ω_{ī_S}). But (14.A.15) in turn implies

E(t̃|βH, σH) − E(t̃|βH, σL) < SocBen(σH/σL).  (14.A.17)
Hence, given that we chose t̃ to satisfy (IC_S), we have arrived at a contradiction.

Claim 14.A.2. E(s − t̃|βH, σH) − E(s − t̃|βL, σH) < SocBen(βH/βL).

Proof. Suppose not; that is, E(s − t̃|βH, σH) − E(s − t̃|βL, σH) = SocBen(βH/βL). Then by an argument as in Claim 14.A.1, one may show that there must be a set of contiguous contingencies {ω_{i_B}, . . . , ω_{ī_B}} where t̃(ωi) is constant as i varies in the closed interval [i_B, ī_B]. Further, Condition 14.2 (a and b taken together) ensures that the intervals [i_S, ī_S] and [i_B, ī_B] overlap; that is, i_B < ī_S and i_S < ī_B. Since it has already been established that s(ωi) − t̃(ωi) is constant for i ∈ [i_S, ī_S], the fact that t̃(ωi) is constant for i ∈ [i_B, ī_B] therefore implies that both s(ωi) − t̃(ωi) and t̃(ωi) are constant for i ∈ [i, ī], where i ≡ min{i_B, i_S} and ī ≡ max{ī_B, ī_S}. Thus we are left with the contradictory conclusion that SocBen(βH/βL) = 0 = SocBen(σH/σL).

With Claim 14.A.2 we have established that if t̃ and s − t̃ are comonotonic, and if hS(σH) − hS(σL) = SocBen(σH/σL), then we can find hB(·) such that s − t̃ will not satisfy (IC_B) if t̃ satisfies (IC_S).

Lemma 14.A.3. Suppose Conditions 14.2a, 14.2b, and 14.3 are satisfied and (βH, σH) is the first-best profile. Then there will exist hB(·) and hS(·) such that (βH, σH) cannot be implemented even though there are t̃ and s − t̃ which satisfy (IC_B), (IC_S).

Proof. We choose two investment cost functions hB(·) and hS(·) such that any t and s − t which satisfy the corresponding (IC_B), (IC_S) are necessarily noncomonotonic. Lemma 14.A.2 assures that such a choice is available. Of all t and s − t that satisfy the corresponding (IC_B), (IC_S) with investment cost functions hB(·) and hS(·), let t̃ and s − t̃ be the pair which maximizes E(max{s(ωi), 0} − t(ωi)|βH, σH) + E(t(·)|βH, σH).
Given what has been assumed about π(·|β, σ), t̃ and s − t̃ are not comonotonic. Hence, it follows from Lemma 14.A.1(b) and Condition 14.3 that

E(max{s(ωi), 0} − t̃(ωi)|βH, σH) + E(t̃(·)|βH, σH) < E(max{s(ωi), 0}|βH, σH),

implying that there exist nonnegative real numbers ξb and ξs such that

E(max{s(ωi), 0} − t̃(ωi)|βH, σH) + E(t̃(·)|βH, σH) − hB(·) − ξb − hS(·) − ξs < 0,

even though

E(max{s(ωi), 0}|βH, σH) − hB(·) − ξb − hS(·) − ξs > 0.

Choose h̄B(·) = hB(·) + ξb and h̄S(·) = hS(·) + ξs; this choice will satisfy (IC_B), (IC_S) for the division of surplus t̃ and s − t̃; however, it will fail to satisfy the participation constraint(s) for implementing (βH, σH).

Proof of Lemma 14.2. Clearly t and s − t satisfy the appropriate incentive constraints. Notice, t and s − t are comonotonic, and thus by Lemma 14.A.1(a) the expectations operator is additive. Hence ex ante payments can be arranged to satisfy the individual participation constraints, given that the aggregate participation constraint (14.8) is satisfied.

Proof of Proposition 14.3. We know from Proposition 14.2 that there exists a tuple (π̂(·|β, σ), ĥB(·), ĥS(·), ŝ(·)) satisfying Conditions 14.1, 14.2a, 14.2b, and 14.3 such that no contract may implement the first best (βH, σH). Recall, (β∗, σ∗) is a second-best profile if ES(β′, σ′) > ES(β∗, σ∗) ⇒ (β′, σ′) is the first best. If (βL, σL) is the second best in the model described by the tuple (π̂(·|β, σ), ĥB(·), ĥS(·), ŝ(·)), then it must be the case that the inequalities (14.A.18), (14.A.19), (14.A.20) are satisfied, implying that the null contract will implement (βL, σL) (by Lemma 14.2).

λ̄[E(max{ŝ(ωi), 0}|βL, σL) − E(max{ŝ(ωi), 0}|βH, σL)] ≥ ĥB(βL) − ĥB(βH)
(14.A.18)
(1 − λ̄)[E(max{ŝ(ωi), 0}|βL, σL) − E(max{ŝ(ωi), 0}|βL, σH)] ≥ ĥS(σL) − ĥS(σH)
(14.A.19)
E(max{ŝ(ωi), 0}|βL, σL) ≥ ĥS(σL) + ĥB(βL).
(14.A.20)
To see why (14.A.18), (14.A.19) must be satisfied, suppose (14.A.18) does not hold. That is,

λ̄[E(max{ŝ(ωi), 0}|βH, σL) − E(max{ŝ(ωi), 0}|βL, σL)] > ĥB(βH) − ĥB(βL).
(14.A.21)
But the assumption of stochastic dominance (in Assumption 14.2a) implies that

(1 − λ̄)E(max{ŝ(ωi), 0}|βH, σL) − ĥS(σL) ≥ (1 − λ̄)E(max{ŝ(ωi), 0}|βL, σL) − ĥS(σL).
(14.A.22)
Summing up (14.A.21) and (14.A.22), we get (14.A.23):

E(max{ŝ(ωi), 0}|βH, σL) − ĥB(βH) − ĥS(σL) > E(max{ŝ(ωi), 0}|βL, σL) − ĥB(βL) − ĥS(σL).
(14.A.23)
But (14.A.23) contradicts the hypothesis that (βL, σL) is the second best. Hence, if (βL, σL) is the second best, the null contract is an optimal contract for the model (π̂(·|β, σ), ĥB(·), ĥS(·), ŝ(·)). Next consider the case where (βL, σL) is not the second best. Assume w.l.o.g. that (βH, σL) is the second best. Adjust the investment cost functions as follows: let h̃B(βH) = ĥB(βH); h̃S(σH) = ĥS(σH); and h̃B(βL) = ĥB(βL) + ε; h̃S(σL) = ĥS(σL) − ε, where ε is such that

λ̄[E(max{ŝ(ωi), 0}|βH, σL) − E(max{ŝ(ωi), 0}|βL, σL)] = h̃B(βH) − h̃B(βL).
(14.A.24)
It may be checked that, with the adjusted cost functions, (βH, σL) will be implemented by the null contract and, further, the adjustment will not alter the fact that (βH, σH) is the first best. One has to verify that (βH, σH) cannot be implemented given the adjusted cost functions h̃B and h̃S. To that end, suppose the contrary: by this hypothesis there exists a transfer t̃ which meets the required incentive and participation constraints corresponding to the adjusted cost functions h̃B and h̃S. That implies there exists a transfer t̂ = t̃ + α which ensures that the incentive and participation constraints are met for the implementation of (βH, σH) corresponding to the original investment cost functions ĥB and ĥS. This contradicts the fact that the model given by (π̂(·|β, σ), ĥB(·), ĥS(·), ŝ(·)) is such that the first best (βH, σH) cannot be implemented by a contract.
Acknowledgments The chapter has benefited very substantially from the many constructive suggestions of the two anonymous referees. Their efforts went much beyond the call
of duty and I remain very grateful. Jim Malcomson's painstaking scrutiny of an earlier draft made possible the much-needed expositional improvements. I also thank Dieter Balkenborg, Jacques Crémer, David Kelsey, Fahad Khalil, Peter Klibanoff, Andrew Mountford, David Pearce, R. Edward Ray, Gerd Weinrich, and seminar members at various universities and conferences (especially the audience at the Conference on Decision Making Under Uncertainty held at Saarbrücken (Germany), University of Saarland) for helpful discussions and comments.
Notes

1 To preempt misunderstandings it is emphasized that the term "ambiguity," as used in this chapter, refers purely to the fuzzy perception of the likelihood subjectively associated with an event (e.g. when asked about his subjective estimate of the probability of an event, the agent replies, "It is between 50 and 60 percent."). It does not refer to a lack of clarity in the description of contingent events and actions. Also note, some authors and researchers refer to ambiguity as "Knightian Uncertainty" or even simply as "uncertainty." As it is used in this chapter, the word "uncertainty" is simply the defining characteristic of any environment where the consequence of at least one action is not known for certain.
2 The reader is assured that the example is essentially unaffected by also having a state in which both the statements are true.
3 The author remains most grateful to the two anonymous referees for drawing his attention to this point.
4 In general, a nonadditive probability (or capacity) π obeys the axioms (i), (ii), and the condition that X ⊇ Y ⇒ π(X) ≥ π(Y). The axiom (iii) applies to the special case of a convex nonadditive probability. The term "convex" points to the requirement that the nonadditive probability of a set is (weakly) greater than the sum of the nonadditive probabilities of the cells of a partition of the set. Presumably, the analogy is to the property of any increasing convex function, say φ : R+ → R+, that φ(x) + φ(y) ≤ φ(x + y). It is when the nonadditive probability is convex that the CEU decision rule corresponds to ambiguity aversion.
5 Consider the following stronger version of the third property: (iii′) For every n > 0 and every collection χ1, . . . , χn ∈ 2^Ω,

π(∪_{i=1}^{n} χi) ≥ Σ_{I⊆{1,...,n}, I≠∅} (−1)^{|I|+1} π(∩_{i∈I} χi),

where |I| denotes the cardinality of I. Non-additive probabilities which satisfy (iii′), in addition to (i) and (ii), have been variously referred to as "belief functions," "totally monotone capacities," and "n-convex capacities." In the rest of the chapter, all references to convex non-additive probability measures should be understood to be referring to non-additive probabilities which satisfy (i), (ii), and (iii′), rather than to those satisfying (i), (ii), and (iii), as they did in the version published in the AER. The amendment ensures that Lemma 14.A.1 is correct as stated. The author thanks Ben Polak for bringing to his notice that Lemma 14.A.1 need not hold for convex capacities which satisfy only (iii), the weaker version of the third property.
6 This follows from the celebrated theorem in Lloyd S. Shapley (1971) which asserts the existence of a core allocation corresponding to any convex characteristic value function defined on possible coalitions in a cooperative game.
7 Peter C. Fishburn (1993) provides an axiomatic justification of this definition of ambiguity and Mukerji (1997) demonstrates its equivalence to a more primitive and epistemic
notion of ambiguity (expressed in terms of the DM's knowledge of the state space). Massimo M. Marinacci (1995) applies the idea to game theory, while David Kelsey and Shasikanta Nandeibam's (1996) analysis explains why this definition is sometimes interpreted as a measure of "uncertainty aversion."
8 The Choquet expectation operator may be directly defined with respect to a nonadditive probability. Label the ωi such that f(ω1) ≤ · · · ≤ f(ωN). Then,

CEπ(f) = f(ω1) + Σ_{i=2}^{N} [f(ωi) − f(ω_{i−1})] × π({ωi, . . . , ωN})
= Σ_{i=1}^{N−1} f(ωi)[π({ωi, . . . , ωN}) − π({ω_{i+1}, . . . , ωN})] + f(ωN)π({ωN}).
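The two equivalent forms of the Choquet expectation in Note 8 can be checked numerically. The following sketch is illustrative only and not part of the original text: the three-state convex capacity `nu` and the act `f` are assumed examples.

```python
# Both forms of the Choquet expectation of Note 8, for an act f over states
# labeled so that f(w1) <= ... <= f(wN).

def ce_increments(f, nu):
    """CE(f) = f(w1) + sum_{i=2}^{N} [f(wi) - f(w_{i-1})] * nu({wi,...,wN})."""
    s = sorted(f, key=f.get)
    ce = f[s[0]]
    for i in range(1, len(s)):
        ce += (f[s[i]] - f[s[i - 1]]) * nu[frozenset(s[i:])]
    return ce

def ce_layers(f, nu):
    """CE(f) = sum_{i=1}^{N-1} f(wi)[nu({wi,...,wN}) - nu({w_{i+1},...,wN})]
               + f(wN) * nu({wN})."""
    s = sorted(f, key=f.get)
    ce = f[s[-1]] * nu[frozenset({s[-1]})]
    for i in range(len(s) - 1):
        ce += f[s[i]] * (nu[frozenset(s[i:])] - nu[frozenset(s[i + 1:])])
    return ce

# A convex capacity on {a, b, c}: singletons 0.2, pairs 0.5, whole space 1.
nu = {frozenset(x): w for x, w in
      [({"a"}, 0.2), ({"b"}, 0.2), ({"c"}, 0.2),
       ({"a", "b"}, 0.5), ({"a", "c"}, 0.5), ({"b", "c"}, 0.5),
       ({"a", "b", "c"}, 1.0)]}

f = {"a": 1.0, "b": 2.0, "c": 4.0}
print(ce_increments(f, nu), ce_layers(f, nu))  # both give the same value (≈ 1.9)
```

The first form integrates the "increments" of f against the decumulative weights; the second rearranges the same sum by states, which is the form used in the body of the chapter.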
9 For a fuller review of the arithmetic of the Choquet expectation operator, see Example 14.A.1 in the Appendix.
10 This is technically evident from the fact that if {X, Y} is a partition of the set E, then convexity of the belief (on E) implies A(π(X)) + A(π(Y)) ≥ A(π(E)).
11 Usually stochastic dominance is defined with respect to the payoff or the outcome space. As stated here, the reference is instead to the underlying contingency space. Thus we have to suitably amend the usual definition to accommodate the fact that contiguous contingencies may yield the same outcome, that is, the same surplus.
12 The reader will observe that this notion of the first best is "vindicated" by the fact that this is the profile that will be chosen if the investment effort were contractible.
13 In particular, this allows for terms of trade being contingent on realizations of v(·) and c(·) by the simple expedient of making the terms contingent on events such as E(V; C) ≡ {ωi ∈ Ω | v(ωi) = V and c(ωi) = C}.
14 By taking δ as a mapping into {0, 1}, ex post randomization is ruled out. This follows the dominant tradition in the literature on incomplete contracting; see, for example, Hart and John H. Moore (1988).
15 It seems a reasonable conjecture that Conditions 14.2 and 14.3 are generic. If, for instance, Condition 14.2 fails to hold, even the slightest perturbation of beliefs should restore the condition. A similar understanding can be suggested for Condition 14.3. In a suitably rich space of measures that includes all convex nonadditive measures, the subspace of beliefs that are strictly additive (over at least some events) would appear to be nongeneric. (NB the parametric specification in Example 14.2 satisfies Conditions 14.1, 14.2a, 14.2b, and 14.3.)
16 A referee has remarked, "the claim . . . that [a] transactions cost argument cannot rationalize the use of long-term contracts in stable environments but not in highly uncertain ones is debatable . . . if the greater uncertainty means more contingencies that must be foreseen, described, bargained over, and ultimately recognized . . . ." While admittedly debatable, the claim is definitely defensible. It is certainly intuitive to posit a link between the nature of the incumbent uncertainty and the extent of transactions costs. But one is yet to see a formal clarification of such a story. For instance, it is hard to say precisely what primitives and principles imply that "greater uncertainty means more contingencies that must be foreseen (etc)." The point is, while ambiguity aversion does manage to convey a precise and coherent account of a link between uncertainty and contracting costs, the transactions-costs paradigm is yet to find one.
17 James M. Malcomson and W. Bentley MacLeod (1993) explain Joskow contracts by essentially arguing that in such contexts conditioning instructions over a coarse partition
of the contingency space is sufficient. This explanation is very consistent with the ambiguity aversion story: as has been observed earlier (Note 9), the coarser the partition the less the bite from ambiguity.
References

Anderlini, Luca and Felli, Leonardo. 1994, "Incomplete Written Contracts: Undescribable States of Nature," Quarterly Journal of Economics, November, 109(4), pp. 1085–124.
Baker, George; Gibbons, Robert and Murphy, Kevin J. 1997, "Implicit Contracts and the Theory of the Firm," Unpublished manuscript.
Bernheim, B. Douglas and Whinston, Michael D. 1997, "Incomplete Contracts and Strategic Ambiguity," Discussion Paper No. 1787, Harvard University.
Camerer, Colin F. and Weber, Martin. 1992, "Recent Developments in Modelling Preferences: Uncertainty and Ambiguity," Journal of Risk and Uncertainty, October, 5(4), pp. 325–70.
Carlton, Dennis W. 1979, "Vertical Integration in Competitive Markets Under Uncertainty," Journal of Industrial Economics, March, 27(3), pp. 189–209.
Coase, Ronald. 1937, "The Nature of the Firm," Economica, November, 4(16), pp. 386–405.
Dow, James P. and Werlang, Sergio R. 1992, "Uncertainty Aversion, Risk Aversion, and the Optimal Choice of Portfolio," Econometrica, January, 60(1), pp. 197–204. (Reprinted as Chapter 17 in this volume.)
—— 1994, "Nash Equilibrium Under Knightian Uncertainty: Breaking Down Backward Induction," Journal of Economic Theory, December, 64(2), pp. 305–24.
Eichberger, Jurgen and Kelsey, David. 1996, "Signalling Games with Uncertainty," Mimeo, University of Birmingham, U.K.
Ellsberg, Daniel. 1961, "Risk, Ambiguity, and the Savage Axioms," Quarterly Journal of Economics, November, 75(4), pp. 643–69.
Epstein, Larry G. and Wang, Tan. 1994, "Intertemporal Asset Pricing Under Knightian Uncertainty," Econometrica, March, 62(2), pp. 283–322. (Reprinted as Chapter 18 in this volume.)
—— 1995, "Uncertainty, Risk-Neutral Measures and Security Price Booms and Crashes," Journal of Economic Theory, October, 67(1), pp. 40–82.
Fishburn, Peter C. 1993, "The Axioms and Algebra of Ambiguity," Theory and Decision, March, 34(2), pp. 119–37.
Ghirardato, Paolo.
1995, "Coping with Ignorance: Unforeseen Contingencies and Nonadditive Uncertainty," Mimeo, University of California, Berkeley.
Grossman, Sanford J. and Hart, Oliver D. 1986, "The Costs and Benefits of Ownership: A Theory of Vertical and Lateral Integration," Journal of Political Economy, August, 94(4), pp. 691–719.
Hart, Oliver D. 1995, Firms, contracts, and financial structure. Oxford: Clarendon Press.
Hart, Oliver D. and Moore, John H. 1988, "Incomplete Contracts and Renegotiation," Econometrica, July, 56(4), pp. 755–85.
Joskow, Paul L. 1985, "Vertical Integration and Long-Term Contracts: The Case of Coal-Burning Electric Generating Plants," Journal of Law, Economics and Organization, Spring, 1(1), pp. 33–80.
Kelsey, David and Nandeibam, Shasikanta. 1996, "On the Measurement of Uncertainty Aversion," Mimeo, University of Birmingham, U.K.
Klein, Benjamin; Crawford, Robert G. and Alchian, Armen A. 1978, "Vertical Integration, Appropriable Rents and the Competitive Contracting Process," Journal of Law and Economics, October, 21(2), pp. 297–326.
Legros, Patrick and Matsushima, Hitoshi. 1991, "Efficiency in Partnerships," Journal of Economic Theory, December, 55(2), pp. 296–322.
Lipman, Barton L. 1992, "Limited Rationality and Endogenously Incomplete Contracts," Queen's Institute for Economic Research Discussion Paper No. 858, October.
Lo, Kin Chung. 1998, "Sealed Bid Auctions with Uncertainty Averse Bidders," Economic Theory, July, 12(1), pp. 1–20.
MacLeod, W. Bentley and Malcomson, James M. 1993, "Investments, Holdup, and the Form of Market Contracts," American Economic Review, September, 83(4), pp. 811–37.
Malcomson, James M. 1997, "Contracts, Hold-Up, and Labor Markets," Journal of Economic Literature, December, 35(4), pp. 1916–57.
Marinacci, Massimo M. 1995, "Ambiguous Games," Mimeo, Northwestern University.
Masten, Scott E.; Meehan, James W. and Snyder, Edward A. 1989, "Vertical Integration in the U.S. Auto Industry: A Note on the Influence of Transaction Specific Assets," Journal of Economic Behavior and Organization, October, 12(2), pp. 265–73.
Mukerji, Sujoy. 1995, "A Theory of Play for Games in Strategic Form when Rationality Is Not Common Knowledge," Mimeo, University of Southampton, U.K.
—— 1997, "Understanding the Nonadditive Probability Decision Model," Economic Theory, January, 9(1), pp. 23–46.
Schmeidler, David. 1989, "Subjective Probability and Expected Utility without Additivity," Econometrica, May, 57(3), pp. 571–87. (Reprinted as Chapter 5 in this volume.)
Shapley, Lloyd S. 1971, "Cores of Convex Games," International Journal of Game Theory, January, 1(1), pp. 12–26.
Simon, Herbert A. 1951, "A Formal Theory of the Employment Relationship," Econometrica, July, 19(3), pp. 293–305.
Tallon, Jean-Marc.
1998, "Asymmetric Information, Nonadditive Expected Utility, and the Information Revealed by Prices: An Example," International Economic Review, May, 39(2), pp. 329–42.
Tirole, Jean. 1994, "Incomplete Contracts: Where Do We Stand?" Walras-Bowley Lecture, Summer Meetings of the Econometric Society.
Williams, Steven R. and Radner, Roy. 1988, "Efficiency in Partnership When the Joint Output Is Uncertain," Northwestern Center for Mathematical Studies in Economics and Management Science Working Paper No. 760.
Williamson, Oliver E. 1985, The economic institutions of capitalism. New York: Free Press.
15 Ambiguity aversion and incompleteness of financial markets Sujoy Mukerji and Jean-Marc Tallon
15.1. Introduction

Suppose an agent's subjective knowledge about the likelihood of contingent events is consistent with more than one probability distribution. Suppose further that what the agent knows does not inform him of a precise (second-order) probability distribution over the set of "possible" (first-order) probabilities. We then say that the agent's beliefs about contingent events are characterized by ambiguity. If ambiguous, the agent's beliefs are captured not by a unique probability distribution in the standard Bayesian fashion but instead by a set of probabilities. Thus not only is the outcome of an act uncertain, but so is the expected payoff of the action, since the payoff may be measured with respect to more than one probability. An ambiguity averse decision maker evaluates an act by the minimum expected value that may be associated with it: the decision rule is to compute all possible expected values for each action and then choose the act which has the best minimum expected outcome. This (informal) notion of ambiguity aversion inspires the formal model of Choquet expected utility (CEU) preferences introduced in Schmeidler (1989). The present chapter considers a model of financial markets populated by agents with CEU preferences, with the interpretation that the agents' preferences demonstrate ambiguity aversion.1 Typically, economic agents are endowed with income streams that are not evenly spread over time or across uncertain states of nature. A financial contract is a claim to a contingent income stream—hence the logic of the financial markets: by exchanging such claims agents change the shapes of their income streams, obtaining a more even consumption across time and the uncertain contingencies. A financial market is said to be complete if contingent payoffs from the different marketed financial contracts are varied enough to span all the contingencies.
However, casual empiricism suggests that in just about every financial market in the real world the span is less than the full set of contingencies, that is, the markets are incomplete. The primary implication of incompleteness of financial markets is that agents may transfer income only across a limited set of contingencies and are thus left to share risk in a suboptimal manner.2
Mukerji, Sujoy and Tallon, Jean-Marc (forthcoming). “Ambiguity aversion and incompleteness of financial markets,” Review of Economic Studies (2001), vol. 68(4), 883–904.
Consider the following question: Take a (financial) economy with complete markets, but suppose agents are not subjective expected utility (SEU) maximizers, but rather CEU maximizers; are there conditions under which it is possible that at a competitive equilibrium agents do not trade some assets, so that their equilibrium allocations are equivalent to competitive allocations deriving from some incomplete market economy wherein the allocations are not Pareto optimal? The answer to the question is a qualified yes. The qualification is important, and the essential contribution of the present chapter is in identifying this qualification. Imposing CEU maximization in a complete market economy does not generate no-trade, but, as this chapter shows, one can construct a robust sequence of incomplete market economies that would converge to complete markets if agents were SEU maximizers but does not when they are CEU maximizers. The key characteristic of such a sequence of economies is that they include, as nonredundant instruments of risk-sharing, financial assets which are affected by idiosyncratic risk.3 We establish that trade in financial assets, whose payoffs have idiosyncratic components, may break down because of ambiguity aversion. We find, furthermore, that the no-trade due to ambiguity aversion is a robust occurrence, in the sense that it takes place even in the limit replica economy, with enough replicas of the financial assets such that idiosyncratic risk may be completely hedged. Hence, the behavior of the limit replica economy is markedly different depending on whether agents are SEU maximizers or CEU maximizers: in the former case the allocation is precisely that of a complete markets economy, whereas in the latter case, because of the endogenous breakdown of trade, the equilibrium allocation, given a "high enough" level of ambiguity aversion and idiosyncratic risk, is not Pareto optimal and the nature of risk-sharing is as in an incomplete markets economy.
These findings are of interest both for the way they complement the related literature and for the substantive economic insight they give rise to. Dow and Werlang (1992) showed, in a model with one risky and one riskless asset, a single ambiguity averse agent with CEU preferences, exogenously determined asset prices, and a riskless initial endowment, that there exists a nondegenerate price interval at which the agent will strictly prefer to take a zero position in the risky asset. Recall, the logic of this result essentially rests on the observation that a CEU agent going short in the risky asset will use a different probability to evaluate the expected return than when going long, since an agent taking a short (long) position is relatively better (respectively, worse) off in states where the asset payoff is shocked adversely. Having (robustly) rationalized a zero position in a single decision-maker framework, one might be tempted to conjecture (even though such a conjecture is not made by Dow and Werlang) that it is but a short step to generate no-trade in a full equilibrium model. But, as we remarked earlier, simply imposing CEU maximization in a complete market economy does not generate no-trade unless endowments are Pareto optimal to begin with. The point is that, with complete markets, allocations are Pareto optimal and hence comonotonic, that is, every agent's ranking of states, ranked in accordance with the agent's ex post utility from the given allocation, is identical (Chateauneuf et al., 2000). Comonotonicity implies that all agents evaluate the returns of assets with
the same probability measure in a CEU world. Thus, closing the Dow and Werlang model in the obvious way makes it apparent that, for generic endowments, assets will surely be traded. Hence it is, at least, of academic interest to find what condition actually generates an endogenous closure of some financial markets, and a consequent lessening of risk-sharing opportunities, when moving from an SEU to a CEU world. Perhaps a more compelling reason for interest in our findings is their economic significance. It is widely regarded that a crucial function of financial markets is that they allow individuals to hedge their income (from, say, human capital/labor) risk even though such risks are not, per se, contractible in appropriate detail, for the usual reasons of asymmetric information and/or transactions costs. For instance, take X, a shopowner in Detroit, whose fortunes are heavily dependent on the fortunes of the automobile industry centered in Detroit. While X would love to smooth consumption across the various possible income shocks, it is hardly likely that an insurance company would be willing to insure X against anything other than accidents like fire and theft. But, standard economic/finance theory would argue, even though such personalized contracts may not be available, X should be able to hedge his income shocks in the stock market. To transfer income from the "good" states to the "bad," all that is required is that X take a short position on a portfolio of shares of different firms in the automobile (and related) industries and a long position on a "safe" asset (e.g. a government bond). Of course, the returns of any particular share will not be perfectly correlated with X's income; in particular, each individual share return will be subject to some idiosyncratic risk. But, with a large enough number of such equities in the portfolio, the idiosyncrasies may be hedged away, and X would find the (almost) perfect hedge for his income shocks.
To X, therefore, for all practical purposes, the economy is very much a complete market economy. However, what this chapter shows is that this story goes through in an SEU world but not in a CEU world. Consider two agents trading an equity subject to idiosyncratic risk, with one agent taking a short position while the other goes long. Evidently, then, the variation in each agent's consumption across states which differ only in terms of the idiosyncratic shocks would be exclusively determined by the nature and extent of the shocks and the agent's position on the asset. Moreover, the variations of the two agents' consumption across such states will be inversely related, and therefore their consumption will not be comonotonic. Hence, given ambiguity aversion with CEU preferences, an agent will behave as if he applies a different probability measure depending on whether he is choosing to go short or to go long. Therefore, it may be that the minimum asking price of the agent choosing to go short will be higher than the maximum bid of the agent choosing to go long. Thus no trade may result, and the chapter provides sufficient conditions under which this result obtains. Indeed, as we show, the no-trade outcome will survive even in the limit, when there are an arbitrarily large number of (independent) replicas of the equity. The intuition here is that the law of large numbers implies that the agents' beliefs on the payoff of a portfolio of risky assets, hit (in part) by idiosyncratic shocks, converge to some mean, but the mean is in principle different for
agents taking differently signed positions on the (relevant) assets. In this fashion, ambiguity aversion creates an endogenous limit to the extent of risk sharing possible through financial markets, thereby providing a (theoretical) justification for the basic premise of the general equilibrium with incomplete markets (GEI) model. To see it through the eyes of X: in a CEU world, unlike in an SEU world, there may not exist prices that would allow X to go short on automobile industry equities, as he needs to do to "export" his income risk. The same market which offers possibilities of risk-sharing equivalent to complete markets when beliefs and behavior are in accordance with SEU offers only the Pareto suboptimal risk-sharing possibilities of an incomplete market economy when agents are CEU maximizers with beliefs that are "sufficiently" ambiguous. The rest of the chapter is organized as follows. Section 15.2 provides an introduction to the formal model of ambiguity aversion applied in this chapter. Section 15.3 contains the formal model of the finance economy and the main result. Section 15.4 concludes the chapter. Appendix A contains some technical material on independence and the law of large numbers for capacities. All formal proofs are in Appendix B.
15.2. Choquet expected utility and the related literature

Let Ω = {ωi}_{i=1}^{N} be a finite state space, and assume that the decision maker (DM) chooses among acts with state-contingent payoffs, z : Ω → R. In the CEU model (Schmeidler, 1989) an ambiguity averse DM's subjective belief is represented by a convex non-additive probability function (or a convex capacity) ν such that (i) ν(∅) = 0, (ii) ν(Ω) = 1, and (iii) ν(X ∪ Y) ≥ ν(X) + ν(Y) − ν(X ∩ Y), for all X, Y ⊂ Ω. Define the core of ν (notation: Δ(Ω) is the set of all additive probability measures on Ω): C(ν) = {µ ∈ Δ(Ω) | µ(X) ≥ ν(X), for all X ⊂ Ω}. Hence, ν(X) = min_{µ∈C(ν)} µ(X). The ambiguity4 of the belief about an event X is measured by the expression A(X; ν) ≡ 1 − ν(X) − ν(X^c) = max_{µ∈C(ν)} µ(X) − min_{µ∈C(ν)} µ(X). As in SEU, a utility function u : R+ → R, u′(·) ≥ 0, describes the DM's attitude to risk and wealth. Given a convex non-additive probability ν, the Choquet expected utility5 of an act is simply the minimum of all possible "standard" expected utility values obtained by measuring the contingent utilities possible from the act with respect to each of the additive probabilities in the core of ν:

CEν u(z) = min_{µ∈C(ν)} Σ_{ω∈Ω} u(z(ω))µ(ω).

The fact that the same additive probability in C(ν) will not in general "minimize" the expectation for two different acts explains why the Choquet expectations operator is not additive; that is, given any acts z, w: CEν(z) + CEν(w) ≤ CEν(z + w).
340
Sujoy Mukerji and Jean-Marc Tallon
The operator is additive, however, if the two acts z and w are comonotonic, that is, if (z(ω_i) − z(ω_j))(w(ω_i) − w(ω_j)) ≥ 0 for all ω_i, ω_j ∈ Ω. In our analysis we will need to consider the independent product of capacities. The independent product of two convex capacities, ν_1 and ν_2, according to the definition (suggested by Gilboa and Schmeidler, 1989) we apply in this chapter, may be (informally) understood as the lower envelope of the set {µ_1 × µ_2 | µ_1 ∈ C(ν_1), µ_2 ∈ C(ν_2)}. Unlike with "standard" probabilities, there is more than one way to define the independent product of two capacities. As it turns out, the formal analysis in this chapter is unaffected if an alternative definition of independence is applied. We refer the interested reader to Appendix A and to the discussion at the end of Section 15.3 for more on the independent product of capacities, and turn next to the use of capacities and CEU in portfolio decision problems. Dow and Werlang (1992), as noted earlier, identified an important implication of Schmeidler's model. They showed, in a model with one risky and one riskless asset, that if a CEU maximizer has a riskless endowment then there exists a set of asset prices that support the optimal choice of a riskless portfolio. The intuition behind this finding may be grasped from the following example. Consider an asset z that pays off 1 in state L and 3 in state H, and assume that ν(L) = 0.3 and ν(H) = 0.4. Assuming that the DM has a linear utility function, the expected payoff of buying a unit of z, the act z_b, is evaluated with the measure in C(ν) that puts most weight on the low-payoff state L (µ(L) = 1 − ν(H) = 0.6): CE_ν(z_b) = 0.6 × 1 + 0.4 × 3 = 1.8. On the other hand, the payoff from going short on a unit of z (the act z_s) is higher at L than at H. Hence, the relevant minimizing probability when evaluating CE_ν(z_s) is the probability in C(ν) that puts most weight on H. Thus, CE_ν(z_s) = 0.3 × (−1) + 0.7 × (−3) = −2.4.
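The Dow and Werlang computation can be reproduced in a few lines (an illustrative sketch of the example's arithmetic, not code from the chapter), again minimizing over the extreme points of the core:

```python
from fractions import Fraction as F

def choquet_value(payoffs, nu_L, nu_H):
    # Minimum expectation over the extreme points of the core of the
    # convex capacity (nu_L, nu_H); p is the weight placed on state L.
    xL, xH = payoffs
    return min(p * xL + (1 - p) * xH for p in (nu_L, 1 - nu_H))

nu_L, nu_H = F(3, 10), F(2, 5)  # nu(L) = 0.3, nu(H) = 0.4

buy = choquet_value((1, 3), nu_L, nu_H)      # value of going long one unit of z
sell = -choquet_value((-1, -3), nu_L, nu_H)  # reservation price for shorting
print(buy, sell)  # 9/5 12/5: a zero position is optimal at any price in (1.8, 2.4)
```

The gap between the buyer's and seller's reservation prices is exactly the ambiguity of the belief scaled by the payoff spread; with additive beliefs the two prices coincide.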
Hence, if the price of the asset z were to lie in the open interval (1.8, 2.4), the investor would strictly prefer a zero position to either going short or buying. Unlike in the case of unambiguous beliefs, there is no single price at which to switch from buying to selling. Taking a zero position on the risky asset has the unique advantage that its evaluation is not affected by ambiguity. The "inertia" zone demonstrated by Dow and Werlang was simply a statement about optimal portfolio choice corresponding to exogenously determined prices, given an initially riskless position. However, it does not follow from this result at the individual level that no-trade is an equilibrium when we close the model by allowing agents to trade their risks, as we illustrate next using the Edgeworth box diagram in Figure 15.1. The diagram depicts the possibilities of risk-sharing (one may think of the risk-sharing as being obtained through the exchange of two Arrow securities, one for each contingency) between two CEU agents, h = 1, 2, with uncertain endowments in the two states, ω_a and ω_b. W is the endowment vector. Notice that, because of ambiguity aversion, the indifference curves are kinked at the point of intersection with the 45° ray through the origin. The shaded area in the diagram represents the area of mutually advantageous trade. Hence, no-trade is an equilibrium outcome in this economy if and only if the endowment is Pareto optimal to begin with. The introduction of ambiguity aversion in an economy, seemingly, would not impede trade in risk-sharing contracts and would not be a reason for incomplete risk sharing. The reason for this "absence of no-trade" goes as follows: Pareto optimal
Incompleteness of financial markets
341
[Edgeworth box: origins O1 and O2, axes x_1^a, x_1^b and x_2^a, x_2^b, endowment point W, 45° rays through each origin.]
Figure 15.1 Risk sharing with two CEU agents.
allocations lie within the "tramlines," the 45° rays through each origin; that is, they are comonotonic. Hence, at a Pareto optimal allocation, the ranking of the states is identical for both agents and is given by the ordering of aggregate endowment. Now, with complete markets, equilibrium allocations are Pareto optimal and therefore comonotonic as well. Thus, agents use the same "minimizing probability" at equilibrium, and agree on asset valuation. Risk-sharing proceeds just as in an economy with SEU agents (see Chateauneuf et al. (2000)). Thus, to obtain an equilibrium characterized by the absence of trade, one has to move away from this (canonical) example, something that is accomplished by introducing into the model assets with idiosyncratic payoff components. Epstein and Wang (1994) recognized the role of the first of the two conditions defining idiosyncratic risk (as defined in this chapter) in obtaining nonunique equilibrium asset prices in a CEU world. That result is related to ours, and the precise relationship between the results deserves careful discussion. For expository purposes, we turn to this discussion at the end of the next section, after the presentation of our model. We end this section with a discussion of another model of behavior under Knightian uncertainty, due to Bewley (1986), distinct from the one applied in this chapter, which easily generates a no-trade result. Bewley, essentially, drops Savage's assumption that preferences are complete and adds an axiom of preference for the "status quo." In our Edgeworth box this would amount to assuming that indifference curves are kinked precisely at the endowment point, irrespective of its position in the box. If indifference curves are "kinked enough," the incompleteness of markets for contingent deliveries (the absence of trade) is then a direct consequence of the preference for the status quo, which is exogenously imposed as part of the definition of ambiguity aversion.
15.3. The model and the main result
The setting for our formal analysis is a model of a stylized two-period finance economy which we call an n-financial asset economy with idiosyncracy. Households (h = 1, ..., H) trade assets in period 0, before uncertainty is resolved, and consume the one (and only) good in period 1. The assets available for trade are claims on deliveries of the consumption good in period 1. There are two sources of uncertainty. First, there is some "economic uncertainty": agents do not know their endowments tomorrow. An economic state of the world, s, s = 1, 2, is completely identified by the endowment vector for that state, (e^s_1, ..., e^s_h, ..., e^s_H), where each component of the vector, e^s_h ∈ R+, gives a particular household's endowment of the consumption good in state s (arising in period 1). We have restricted our analysis to the case of risk-sharing across only two economic states, to make the argument as transparent as possible. Second, there is idiosyncratic financial uncertainty. An idiosyncratic state of the world completely characterizes the realization of the idiosyncratic components of the payoffs of the available financial assets (described below); it is identified by the vector t = (t_1, t_2, ..., t_n), where t_i ∈ {0, 1}, i ∈ {1, 2, ..., n}, and n is the total number of financial assets. τ_n denotes the set of all t's, that is, τ_n ≡ {0, 1}^n. Hence, to obtain a complete description of a state of the world, exhausting all uncertainty relevant to the model, the economic states s must each be further partitioned into cells denoted (s, t). A typical state of the world is denoted by the letter ω, ω ∈ Ω ≡ {(1, t)_{t∈τ_n}, (2, t)_{t∈τ_n}}. The assets available for trade at date 0 are as follows:
1 Financial assets, z_i, i = 1, ..., n, with payoffs that have idiosyncratic components. An asset z_i yields a payoff of y^s + y(t_i) > 0 units of the good, s = 1, 2, t_i ∈ τ ≡ {0, 1}. y(t_i) is the idiosyncratic component, in the sense that it is independent of the realized economic state and independent of the realization of the payoff from any other financial asset z_j, j ≠ i. It is assumed that y(1) > y(0) and that y^1 ≠ y^2. The price of an asset z_i is denoted by q^{z_i}_n.
2 A safe asset, b, which delivers one unit of the good irrespective of the realized state of the world. The price of this security is normalized to 1.
A point behind modeling the asset structure as above is to ensure that, in order to transfer resources across the two economic states, the agents have to rely on financial assets whose payoffs are affected by idiosyncratic shocks. Prior to the resolution of uncertainty, agents are endowed with a common belief about the likelihood of each state ω. The (marginal) beliefs about a particular idiosyncratic component t_i are described by a capacity ν_i, ν_i(0) + ν_i(1) ≤ 1. To model the assumption that the realizations of t_i and t_j are believed to be independent, the beliefs on τ_n are described by the independent product (defined in Appendix A) ν ≡ ⊗^n_{i=1} ν_i. For simplicity, we shall assume that ν_i(t_i = r) = ν_j(t_j = r) = ν^r, r = 0, 1, i, j ∈ {1, ..., n}. The belief on an economic state s is given by π(s). To make it transparent that it is the ambiguity of beliefs about the idiosyncratic realizations which is responsible for the possibility of no-trade in financial assets,
and also to make the computation less tedious, we assume π(1) + π(2) = 1. Finally, the common belief on Ω is given by the independent product π ⊗ ν. Let e^ω_{h,n} and x^ω_{h,n} be h's endowment and consumption, respectively, in state ω = (s, t), given that the total number of financial assets in the economy is n. Note that the definition of an economic state implies e^{(s,t)}_{h,n} = e^{(s,t′)}_{h,n}. Hence, we may use e^s_{h,n} as a complete description of the state contingent endowment. The holding of the asset b by h is denoted b_{h,n} and the holding of the asset z_i by h is denoted z^i_{h,n}. Agent h has a von Neumann–Morgenstern utility index u_h : R+ → R, which is assumed to be strictly increasing, smooth and strictly concave. Furthermore, u′_h(0) = ∞ and e^ω_{h,n} > 0 for all h and all ω. P^n_h, which denotes the maximization program of agent h, is as follows:
   max_{b_{h,n}, z^1_{h,n}, ..., z^n_{h,n}}  CE_{π⊗ν} u_h(x^{s,t}_{h,n})
   s.t.  b_{h,n} + Σ^n_{i=1} q^{z_i}_n z^i_{h,n} = 0,
         x^{s,t}_{h,n} − e^s_{h,n} = b_{h,n} + Σ^n_{i=1} (y^s + y(t_i)) z^i_{h,n},  s = 1, 2, t ∈ τ_n.
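The independent product ν entering the beliefs π ⊗ ν above — the lower envelope of products of core measures, in the Gilboa–Schmeidler-style definition recalled in Section 15.2 — can be sketched numerically for two binary capacities. This is our illustrative computation, not the appendix's formal construction; it relies on the fact that a bilinear objective over a product of polytopes attains its minimum at vertex pairs:

```python
from fractions import Fraction as F
from itertools import product

def product_capacity(nu1, nu2, event):
    # Lower envelope of {mu1 x mu2 : mu_i in core(nu_i)} evaluated on
    # `event`, a set of pairs (t1, t2) with t_i in {0, 1}. Each core's
    # vertices are (nu_i(0), 1 - nu_i(0)) and (1 - nu_i(1), nu_i(1));
    # the product measure is bilinear in (mu1, mu2), so vertex pairs suffice.
    def vertices(nu):
        return [(nu[0], 1 - nu[0]), (1 - nu[1], nu[1])]
    return min(
        sum(m1[t1] * m2[t2] for t1, t2 in event)
        for m1, m2 in product(vertices(nu1), vertices(nu2))
    )

nu = (F(1, 4), F(1, 4))  # nu(t=0) = nu(t=1) = 1/4 for each asset
marginal = product_capacity(nu, nu, {(1, 0), (1, 1)})  # event {t1 = 1}
both_high = product_capacity(nu, nu, {(1, 1)})
print(marginal, both_high)  # 1/4 1/16
```

Note that the marginal of the product recovers ν_i itself, and that with additive ν_i (ν_i(0) + ν_i(1) = 1) the construction collapses to the ordinary product measure.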
An equilibrium consists of a set of asset prices, q_n ≡ (1, q^{z_1}_n, ..., q^{z_n}_n), a set of asset holdings, (b_n, z_n) ≡ {(b_{h,n}, z^1_{h,n}, ..., z^n_{h,n})}_{h=1,...,H}, and a consumption vector, x_n ≡ (x^ω_{h,n})_{h=1,...,H; ω∈Ω}, such that, given q_n, all agents solve P^n_h, the asset markets clear, that is,

   Σ_h b_{h,n} = Σ_h z^i_{h,n} = 0,  for all i ∈ {1, ..., n},

and the consumption vector is feasible at each state, that is, Σ_h x^ω_{h,n} = Σ_h e^ω_{h,n}. Notice that a tuple (q_n, (b_n, z_n)) uniquely pins down the equilibrium; hence we may denote an equilibrium of an n-financial asset economy by such a tuple. In interpreting various aspects of the model it helps to bear in mind the economic issue the model has been formulated to examine: how economic agents may share risks inherent in their labor/human capital endowment by trading in financial markets. Hence, as it appears in the model, a household's endowment income is distinct from the household's income obtained from the ownership of assets. Portfolio income is the instrument the household is allowed to use to absorb the shocks it faces in its endowed income. But the instrument is not a perfect one. The presence of idiosyncratic risk embodies the notion that the payoff from a financial asset is affected not only by some of the same shocks that affect individual households' endowment income and are common to many assets, but also by risks specific to each asset. While most firms' profits are naturally affected by aggregate or sectoral demand shifts and supply shocks, other factors, more idiosyncratic to the firm, do typically matter.6 Finally, notice that we have assumed that the assets are in zero net supply. This implies that the asset trading our analysis applies to includes
all manner of trade in corporate bonds7; but for general assets (e.g. equities) the analysis is (formally) restricted to those trades which involve one side of the market going short. The main point of the assumption is that it allows us the abstraction to study how an agent may use a financial asset (say, an equity) to share the risk in his exogenously endowed income: by going short on an asset he issues contingent claims on his risky income, thereby trading away his risk. To fix ideas, it might help to refer back to the example of X, the Detroit drugstore owner. X would be very representative of the agents in our model. Think of the economic states 1 and 2 as states defined by shocks to X's income from his drugstore. X may hedge his income shock by trading in a "safe" asset, such as a treasury bond, and in financial assets, such as corporate bonds/equities issued by the various automobile and ancillary firms located in and around Detroit. The payoff to each such financial asset is affected by the same income shock that affects X's drugstore profits. In addition, each financial asset is also affected by shocks idiosyncratic to the issuing firm. Assuming that the firms' profits and the drugstore profits are affected in the same direction by the income shock, X's hedging strategy would presumably be to take a short position on a portfolio of the available financial assets while simultaneously going long on the treasury bond. Our analysis, in effect, compares how such a strategy would fare in an SEU world and in a CEU world. Formally, the analysis compares equilibrium allocations across two cases: one where beliefs about the idiosyncratic outcomes are unambiguous (ν^0 + ν^1 = 1), and another where beliefs about the idiosyncracy are ambiguous (ν^0 + ν^1 < 1). In order to make the comparison stark, the analysis relates the two cases to two benchmarks.
One benchmark is a complete market economy which we call an economy without idiosyncracy, that is, an economy which is identical to the n-financial asset economy with idiosyncracy described above in every respect except that there is only a single financial asset z, which pays off y^s + E_ν y(t) ≡ ȳ^s units in the economic states s = 1, 2. Correspondingly, q^z denotes the price of z and z_h denotes the amount held by household h. (Note, when denoting endogenous variables in the economy without idiosyncracy we may omit the subscript n.) The second benchmark is an incomplete market economy which is identical to the n-financial asset economy with idiosyncracy in every respect except that the only asset available is the safe asset. The following Lemma simplifies the analysis greatly.

Lemma. Let (q_n, (b_n, z_n)) be an equilibrium of the n-financial assets economy with idiosyncracy. Suppose ν^0 + ν^1 ≤ 1. Then z^i_{h,n} = z^{i′}_{h,n} for all i, i′ ∈ {1, ..., n} and all h ∈ {1, ..., H}.

According to the Lemma, at an equilibrium agents will hold all the financial assets in the same proportion. This is essentially a consequence of the fact that agents are risk averse and that the n financial assets are simply "independent replicas." Let z̃_n denote a unit of a portfolio composed of 1/n unit of each asset z_i, i = 1, ..., n; z̃_{h,n} is the amount of this portfolio held by h and q̃_n is the price
of a unit of this portfolio. Given the Lemma, we may assume, without loss of generality, that it is only the asset z̃_n, instead of the individual assets z_i, that is available for trade in the economy. Hence, an equilibrium of an n-financial assets economy with idiosyncracy, (q_n, (b_n, z_n)), may equivalently be denoted by the tuple (q̃_n, (b_n, z̃_n)), where q̃_n ≡ {1, q̃_n} and (b_n, z̃_n) ≡ {(b_{h,n}, z̃_{h,n})}_{h=1,...,H}. The above characterization of the equilibrium in turn facilitates a simple definition of what it means to satisfy the conditions of equilibrium when n is arbitrarily large. We say (q̃_∞, (b_∞, z̃_∞), x_∞) satisfies the conditions of equilibrium of the n-financial assets economy with idiosyncracy where n is arbitrarily large, that is, n → ∞, if:8

1 Given q̃_∞, ((b_∞, z̃_∞), x_∞) is a solution to the problem P̃_{h,∞}, defined as follows:

   max CE_{π⊗ν} u_h(x^{s,t}_{h,∞})
   s.t. b_{h,∞} + q̃_∞ z̃_{h,∞} = 0,
        x^{s,t}_{h,∞} − e^s_{h,∞} = b_{h,∞} + z̃_{h,∞} lim_{n→∞} [Σ^n_{i=1} (y^s + y(t_i))]/n,  s = 1, 2, with probability 1;

2 The asset markets clear, Σ_h b_{h,∞} = Σ_h z̃_{h,∞} = 0, and the consumption vector is feasible at each state, that is, Σ_h x^ω_{h,∞} = Σ_h e^ω_{h,∞} with probability 1.

Theorem. Suppose ν^0 + ν^1 = 1. Then (q̃_∞, (b_∞, z̃_∞)) satisfies the conditions of equilibrium of the n-financial assets economy with idiosyncracy where n is arbitrarily large, if and only if (q̃_∞, (b_∞, z̃_∞)) describes an equilibrium of an economy without idiosyncracy wherein the price of a unit of z is equal to q̃_∞ and a household's holding of the asset z, z_h, is equal to z̃_{h,∞}.

The theorem shows that equilibrium allocations of the n-financial assets economy with idiosyncracy are essentially identical to those of the economy without idiosyncracy, in which financial markets are complete, provided the number of available financial assets is large enough and agents' beliefs are unambiguous. The result follows from an application of the usual diversification principle, stating that in the limit idiosyncracies are "washed away," in conjunction with the assumption that y^1 ≠ y^2. However, if the model of the n-financial assets economy with idiosyncracy were to be reconsidered with the sole amendment that beliefs about idiosyncracies are ambiguous, that is, ν^0 + ν^1 < 1, then the result no longer holds. In such an economy, however large the n, given sufficient ambiguity, the equilibrium allocation is bounded away from Pareto optimal risk-sharing. The allocation actually coincides with the allocation of an incomplete market economy in which it is impossible to transfer resources between states 1 and 2, as we show in our main theorem below. But first we present Example 15.1 to convey an intuition for the result.
Example 15.1. Consider a 2-period finance economy with two risk averse agents, h = 1, 2, and two economic states. There are two assets available, b and z. b is a safe asset; it delivers one unit of the good in each of the two economic states. The payoff of z in state (s, t) is y^s + y(t), s = α, β; t = 0, 1. Fix y^α = 1, y^β = 2, y(0) = 0, y(1) = 2. First consider the case where ν^0 + ν^1 = 1. The model reduces to a standard incomplete market equilibrium with two assets and four states, in which, for "generic" endowments, there is trade, that is, some partial insurance among agents.9 Next suppose, to simplify matters drastically, that ν^0 = ν^1 = 0. Consider an agent h contemplating buying the uncertain asset at a price q^z, given that the safe asset is priced at q^b = 1. h may buy z_h units of the uncertain asset and take a position b_h in the safe asset such that b_h + q^z z_h = 0. His utility functional is then given by:

   CE_{π⊗ν} u_h(e^s_h + z_h(y^s + y(t)) + b_h)
      = u_h(e^α_h + z_h(y^α + y(0)) + b_h) π(α)(1 − ν^1)
      + u_h(e^α_h + z_h(y^α + y(1)) + b_h) π(α) ν^1
      + u_h(e^β_h + z_h(y^β + y(0)) + b_h) π(β)(1 − ν^1)
      + u_h(e^β_h + z_h(y^β + y(1)) + b_h) π(β) ν^1

Once we substitute in ν^1 = 0, it is clear from the above functional that the payoff matrix (rows: states α, β; columns: assets b, z) the agent, as a buyer of z, will consider is:

   [ 1  1 ]
   [ 1  2 ]

If q^z ≥ 2, any balanced portfolio with z_h > 0 yields negative payoffs and is therefore not worth buying. Thus, an agent will wish to buy the uncertain asset only if q^z < 2. Next consider an agent h who contemplates going short on asset z. His utility functional is therefore:

   CE_{π⊗ν} u_h(e^s_h + z_h(y^s + y(t)) + b_h)
      = u_h(e^α_h + z_h(y^α + y(0)) + b_h) π(α) ν^0
      + u_h(e^α_h + z_h(y^α + y(1)) + b_h) π(α)(1 − ν^0)
      + u_h(e^β_h + z_h(y^β + y(0)) + b_h) π(β) ν^0
      + u_h(e^β_h + z_h(y^β + y(1)) + b_h) π(β)(1 − ν^0)

Notice that the functional now depends on ν^0, since the agent is going short, that is, z_h < 0. Substituting ν^0 = 0, we find the payoff matrix the agent h will consider:

   [ 1  3 ]
   [ 1  4 ]
For q^z ≤ 3, any balanced portfolio with z_h < 0 yields negative payoffs. Thus, an agent will wish to sell the risky asset only if q^z > 3. Thus, buyers of the asset will not want to pay more than 2, while sellers will not sell it for less than 3. Hence, there does not exist an equilibrium price such that agents will have a nonzero holding of the uncertain asset. Next, consider another extreme, a case in which ambiguity appears only on the economic states while the agents are able to assess (additive) probabilities for the idiosyncratic states. In fact, to keep matters stark, assume π(α) = π(β) = 0, though the additive probability on idiosyncratic states is arbitrary, simply ensuring that ν^0 + ν^1 = 1. Suppose that, for agent h, e^α_h > e^β_h. Then, for z_h ∈ (−ε, ε), with ε small enough,

   CE_{π⊗ν} u_h(e^s_h + z_h(y^s + y(t)) + b_h)
      = ν^0 u_h(e^β_h + z_h(y^β + y(0)) + b_h) + ν^1 u_h(e^β_h + z_h(y^β + y(1)) + b_h),

since for z_h small enough e^α_h + z_h(y^α + y(t)) + b_h > e^β_h + z_h(y^β + y(t)) + b_h. Hence, z_h = 0 if and only if q^z = y^β + ν^0 y(0) + ν^1 y(1) (the fact that endowments and the utility function do not appear in this expression is due to the extreme form of ambiguity assumed, that is, maximin behavior). Thus, the only candidate for a no-trade equilibrium price is q^z = y^β + ν^0 y(0) + ν^1 y(1). Now assume that for at least one other agent the order of the endowment is reversed, that is, e^β_h > e^α_h; then a computation similar to the one above shows that such agents will not want to trade the risky asset if and only if q^z = y^α + ν^0 y(0) + ν^1 y(1). Hence, if both types of agents are present in the economy, trade will occur, as y^α ≠ y^β. If we were not to assume the extreme maximin form of preferences, but rather π(α) + π(β) < 1 with, say, π(α) > 0 and π(β) > 0, the no-trade price for agent h (say with e^α_h > e^β_h) depends on his initial endowment and utility function (i.e. relative attitude to risk). In that case, even if the endowments of all agents were comonotonic (i.e. e^α_h ≥ e^β_h for all h) there would not exist, for a generic endowment vector, an asset price q^z that would support no-trade as an equilibrium of this economy. The two more significant ways in which the main theorem generalizes the demonstration in Example 15.1 are: one, it shows that no-trade obtains even when beliefs have a degree of ambiguity strictly less than 1; two, it allows for any arbitrary number of financial assets, in particular, for n → ∞. We consider the intuition for each of these generalizations in turn. First, consider a 2-(economic) state, 2 agent, 1-financial asset (and 1 safe asset) economy with idiosyncracy,
in which the financial asset's payoffs are as in Example 15.1. Consider an agent thinking of buying the financial asset. The maximum payoff he expects in any economic state is 2 + [0 × (1 − ν^1) + 2 × ν^1] ≡ V(B), the amount he expects in state β. This implies that, whatever his utility function, whatever his endowment vector, whatever his beliefs about the economic uncertainty, he will not want to buy the asset for more than V(B). Now, instead, if an agent were to go short on the asset, the least he expects to have to repay in any economic state is 1 + [0 × ν^0 + 2 × (1 − ν^0)] ≡ V(S), and therefore he will not want to sell the asset if the price is less than this. Clearly, if ν^0 and ν^1 are small enough, V(B) < V(S). Therefore, if ν^0 and ν^1 are small enough, agents will not trade in the financial asset. Intuition about the second bit of generalization is difficult to obtain without some understanding of how the law of large numbers works for nonadditive beliefs. Specifically, let us consider an i.i.d. sequence {X_n}_{n≥1} of {0, 1}-valued random variables. Suppose ν({X_n = 0}) = ν({X_n = 1}) = 1/4 for all n ≥ 1. As is usual with laws of large numbers, the question is about the limiting distribution of the sample average, (1/n) Σ^n_{i=1} X_i. The law10 implies:

   ν( 1/4 ≤ lim inf_{n→∞} (1/n) Σ^n_{i=1} X_i ≤ lim sup_{n→∞} (1/n) Σ^n_{i=1} X_i ≤ 3/4 ) = 1.
This shows that the DM has a probability 1 belief that the limiting value of the sample average lies in the (closed) interval [1/4, 3/4]. However, unlike in the case of additive probabilities, the DM is not able to further pin down its value. Thus, even with non-additive probabilities the law of large numbers works in the usual way, in the sense that here too the tails of the distribution are "canceled out" and the distribution "converges on the mean." But of course here, given that the DM's knowledge is consistent with more than one prior, there is more than one mean to converge on; hence, the convergence is to the set of means corresponding to the set of priors consistent with the DM's knowledge. Hence, a CEU maximizer whose (ex post) utility is increasing in X (e.g. when the DM is a buyer of an asset with payoff X) will behave as if the convergence of the sample average occurred at 1/4, the lower boundary of the interval, while a DM whose utility is increasing in −X (e.g. when the DM is a seller of an asset with payoff X) will behave as if the convergence occurred at 3/4, the upper boundary of the interval. Now we can complete our intuition for the main result. Consider a modification of the simplified financial economy of Example 15.1 such that, ceteris paribus, there are now n-fold replicas of the financial asset, n → ∞. We consider trade between "two" assets, one the safe asset and the other the "portfolio" asset, containing each of the independent replica assets in equal proportion. The law of large numbers result, explained earlier, implies that any agent contemplating going long on the portfolio asset will behave as if a unit of the portfolio will pay off y^s + [0 × (1 − ν^1) + 2 × ν^1] with probability 1 in economic state s
while an agent contemplating going short will behave as if a unit of the portfolio will pay off y^s + [0 × ν^0 + 2 × (1 − ν^0)] with probability 1 in economic state s. Hence, exactly the same argument as before applies: for ν^0 and ν^1 sufficiently small, V(B) < V(S) and there will not be any trade in the portfolio. The important insight here is that while agents are fully aware that a "well diversified" portfolio "averages out" the idiosyncracies, they only have an imprecise knowledge of what it averages out to. Another important point demonstrated in Example 15.1, as modified earlier, is how equilibrium risk-sharing is affected by ambiguity aversion. If 1 − ν^0 − ν^1 > 1/2, then the equilibrium allocation is necessarily not Pareto optimal unless endowments are, no matter how large the value of n. Consider an economy, E, which is the same as in the original example except that there is only one financial asset available in this economy, the safe asset b. Given that ambiguity is greater than 1/2, there is no trade in the portfolio of uncertain assets in the economy of (the modified) Example 15.1; hence an equilibrium allocation of that economy is an equilibrium allocation of E. E has two states, α and β, but one asset, and therefore is an incomplete markets economy with sub-optimal risk-sharing. We now state our main result:

Main Theorem. Consider the n-financial assets economy with idiosyncracy. Let y̲ ≡ min_s{y^s} and ȳ ≡ max_s{y^s}, and suppose that ȳ − y̲ < y(1) − y(0). Then there exists an Ā, 0 < Ā < 1, such that if 1 − ν^0 − ν^1 > Ā, then z̃_{h,n} = 0 for all h ∈ {1, ..., H} and x^{s,t}_n = x^{s,t′}_n, s = 1, 2, t ≠ t′, at every equilibrium (q_n, (b_n, z_n)), for all n ∈ N, including n → ∞.
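The V(B) < V(S) mechanism behind the theorem can be checked in a small numerical sketch (ours, using the parameter values of Example 15.1, not code from the chapter):

```python
from fractions import Fraction as F

def reservation_values(y_states, y0, y1, nu0, nu1):
    # V(B): the most a buyer would pay -- best economic payoff plus the
    # buyer's pessimistic expectation of the idiosyncratic component.
    VB = max(y_states) + y0 * (1 - nu1) + y1 * nu1
    # V(S): the least a seller would accept -- worst economic payoff plus
    # the seller's pessimistic idiosyncratic expectation.
    VS = min(y_states) + y0 * nu0 + y1 * (1 - nu0)
    return VB, VS

y_states, y0, y1 = (1, 2), 0, 2  # Example 15.1: (y^alpha, y^beta), y(0), y(1)

VB, VS = reservation_values(y_states, y0, y1, F(1, 2), F(1, 2))
print(VB, VS, VB > VS)  # 3 2 True: unambiguous beliefs, bid exceeds ask, trade

VB, VS = reservation_values(y_states, y0, y1, F(1, 8), F(1, 8))
print(VB, VS, VB < VS)  # 9/4 11/4 True: ambiguity 3/4 exceeds the bound, no trade
```

With these parameters Ā = (ȳ − y̲)/(y(1) − y(0)) = 1/2, and a little algebra confirms that V(B) < V(S) holds exactly when 1 − ν^0 − ν^1 exceeds that bound.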
Stated differently, this says that if the range of variation of the idiosyncratic component of the financial assets' payoffs is greater than the range of variation due to the economic shocks, if the beliefs over the idiosyncratic states are ambiguous enough, and if agents are ambiguity averse, then irrespective of the utility functions of the agents and the endowment vector, the equilibrium of an n-financial assets economy with idiosyncracy is an equilibrium of the economy with one safe asset, that is, an economy with incomplete markets, since the financial assets are not traded in equilibrium, whatever the value of n. Notice further that if the conditions described in the theorem are met, then for a generic endowment, an equilibrium allocation of the n-financial assets economy with idiosyncracy is necessarily not Pareto optimal. This follows simply from the understanding that an equilibrium allocation of the n-financial assets economy with idiosyncracy, given the conditions of the theorem, is an equilibrium of the economy with one safe asset. The latter economy is an incomplete market economy in which it would not be possible to transfer resources between states 1 and 2. The significant sufficient condition to ensure no-trade, irrespective of the utility functions of the agents and the endowment vector, is that ȳ − y̲ < y(1) − y(0). The bound follows from the expression for Ā, Ā = (ȳ − y̲)/(y(1) − y(0)), constructed in the proof of the main theorem. Notice that Ā is the supremum among
the values of ambiguity required for no-trade, across all the possible combinations of parameters of utilities or endowments, and is independent of any parameter of utility or endowment. So, typically, the ambiguity required for no-trade will be less than Ā; further, no-trade may result even if y(1) − y(0) < ȳ − y̲. Also, the required ambiguity will be greater, the greater the risk aversion and/or the riskiness of the endowment (see example 3 in Mukerji and Tallon (1999)). One might be tempted to conjecture that the results of the chapter may be replicated by simply assuming heterogeneous beliefs among agents. Or to conjecture that, since with incomplete markets the comonotonicity of equilibrium allocations is in general broken, so that different (CEU) agents would evaluate their prospects using different (effective) probabilities, adding CEU agents might "worsen" incompleteness even in the absence of idiosyncratic risks. Both conjectures are, however, false. What is at work in obtaining no-trade is not that different agents have different beliefs, but that any given agent behaves as if he evaluates the two different actions, going short and going long, with different (probabilistic) beliefs. Market incompleteness in the absence of idiosyncratic risk does not make for this peculiarity and therefore does not, in and of itself, lead to no-trade. We illustrate this with the following example.

Example 15.2. Suppose there are S states, H agents, one safe asset and one risky asset that pays off y^s units of the good in state s. Agent h's budget constraints are (we normalize the price of the safe asset in the first period, as well as the price of the good in all states, to be equal to 1):

   b_h + q z_h = 0
   x^s_h = e^s_h + (y^s − q) z_h,  s = 1, ..., S.

Claim. Assume that there are no pairs of states s and s′ such that y^s ≠ y^{s′} and e^s_h = e^{s′}_h. Then there exists a unique price q_h such that z*_h(q_h) = 0.
Proof. Assume w.l.o.g. that e^1_h ≤ e^2_h ≤ ··· ≤ e^S_h. Since, by assumption, e^s_h = e^{s′}_h ⇒ y^s = y^{s′}, there exists ε > 0 such that for all z_h ∈ (−ε, ε):

   e^1_h + (y^1 − q)z_h ≤ e^2_h + (y^2 − q)z_h ≤ ··· ≤ e^S_h + (y^S − q)z_h.

Let Π(z_h) be the set of probability measures in C(ν) that minimize the expected utility, that is, Π(z_h) = {(µ_1, ..., µ_S) ∈ C(ν) | E_µ u_h(e^s_h + (y^s − q)z_h) = CE_ν u_h(e^s_h + (y^s − q)z_h)}. Observe that if µ, µ′ ∈ Π(z_h) are different, then they can disagree only on those states where consumption is identical; or, said differently (given the order we adopted on h's endowment):

   e^s_h + (y^s − q)z_h ≠ e^{s′}_h + (y^{s′} − q)z_h for all s′ ≠ s
   ⇒ µ_s = µ′_s = ν({s, ..., S}) − ν({s + 1, ..., S}).
Hence, z_h = 0 is optimal at price q_h if and only if there exists µ ∈ Π(0) such that:

   q = q_h ≡ Σ_s µ_s y^s u′_h(e^s_h) / Σ_s µ_s u′_h(e^s_h).

Recall now that probability measures in Π(0) can differ only on those states in which the endowment is constant. Since, by assumption, e^s_h = e^{s′}_h ⇒ y^s = y^{s′}, one obtains E_µ[y^s u′_h(e^s_h)] = E_{µ′}[y^s u′_h(e^s_h)] for all µ, µ′ ∈ Π(0). Since E_µ u′_h(e^s_h) = E_{µ′} u′_h(e^s_h) for all µ, µ′ ∈ Π(0), q_h as defined above is unique. We just established that there is only one price q_h, defined in the proof above, such that at this price agent h optimally takes a zero position in the risky asset. Now, unless the endowment allocation is Pareto optimal, q_h ≠ q_{h′} for some pair of agents h, h′. Hence, at an equilibrium, trade on the market for the risky asset will be observed. This establishes that, "generically," in order for z_h = 0 for all h to be an equilibrium of the model, there must be pairs of states s, s′ such that e^s_h = e^{s′}_h for all h and y^s ≠ y^{s′}; in other words, an idiosyncratic element is necessary to obtain no-trade. Before we close this section, we attempt to clarify further how our main result adds to the findings in the related literature. In Example 15.2, in spite of an incomplete markets environment and in spite of CEU agents, no-trade fails to materialize because each agent has a unique price at which he takes a zero position in the asset, and in general this price is different for different agents. Dow and Werlang (1992) may be read as an exercise purely in deriving the demand function for a risky asset, given an initial riskless position. By putting together two Dow and Werlang agents one does obtain an economy where an equilibrium may be defined, but given that such agents' endowments are riskless, agents do not have any risks to share in such an economy. Hence, simply "completing" the Dow and Werlang exercise to obtain an equilibrium model does not allow one to investigate the question addressed in the present chapter, which is whether ambiguity aversion affects risk-sharing possibilities in the economy.
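(As a parenthetical check on the Claim, the zero-position price q_h = Σ_s µ_s y^s u′_h(e^s_h)/Σ_s µ_s u′_h(e^s_h) can be evaluated for two hypothetical agents with reversed endowments; the log utility and all numbers below are our illustrative choices, not the chapter's:)

```python
from fractions import Fraction as F

def no_trade_price(mu, y, e):
    # q_h = sum_s mu_s * y^s * u'(e^s_h) / sum_s mu_s * u'(e^s_h),
    # here with log utility, so u'(x) = 1/x.
    num = sum(m * ys * F(1, es) for m, ys, es in zip(mu, y, e))
    den = sum(m * F(1, es) for m, es in zip(mu, e))
    return num / den

mu = (F(1, 2), F(1, 2))  # a common additive belief over two states
y = (1, 3)               # risky-asset payoffs differ across the two states
q1 = no_trade_price(mu, y, (1, 2))  # agent 1: endowment rises from state 1 to 2
q2 = no_trade_price(mu, y, (2, 1))  # agent 2: reversed endowment
print(q1, q2)  # 5/3 7/3: the zero-position prices differ, so trade occurs
```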
And, as explained in the previous section, even if we were to make the simple further extension of allowing uncertain endowments, given complete markets we would find that ambiguity has no effect. Finally, as Example 15.2 demonstrates, the even further extension of allowing market incompleteness does not provide the answer either. Evidently, one has to move further afield from the Dow and Werlang analysis to address our question. Epstein and Wang (1994) significantly generalized the Dow and Werlang (1992) result, finding that price intervals supporting the zero position occur (in equilibrium) if there are some states across which asset payoffs differ while endowments remain identical. The intuition for this is as follows. To obtain a range of supporting prices for the zero position, there must occur a "switch" in the effective probability distribution precisely at the zero position. That is, depending on whether he takes a position + or − away from 0, however small, the agent evaluates his position using a different probability. For this to happen, the agent's ranking
352
Sujoy Mukerji and Jean-Marc Tallon
of states (according to his consumption) must switch depending exclusively on whether he takes a positive or negative position on the asset. Hence, there must be at least two states for which even the smallest movement away from the zero position would cause a difference in the ranking of the states depending on which side of zero one moves to. Clearly, this may only be true if the endowment is constant across the two states while the asset payoff is not. The clarification obtained in Epstein and Wang (1994) of the condition that enables multiple price supports to emerge was the point of inspiration for the research reported in the present chapter. Indeed, the condition of Epstein and Wang (1994) is one of the two conditions we apply to define idiosyncratic risk. Where the present chapter has gone further, and what in essence is its contribution, is in finding conditions for an economy wherein the agents' price intervals overlap in such a manner that every equilibrium of the economy involves no-trade in an asset, and more importantly, conditions under which ambiguity aversion demonstrably "worsens" risk sharing and the incompleteness of markets. These are issues that were neither addressed nor even raised in Epstein and Wang (1994), formally or informally, and understandably so, since the principal model in that paper was the Lucas (1978) pure exchange economy amended to include ambiguity averse beliefs. This is a model with a single representative agent, or equivalently, a number of agents with identical preferences and endowments. In an equilibrium of such an economy, trade and risk-sharing are trivial since agents will consume their endowments; endowments are, by construction, Pareto optimal.11 Kelsey and Milne (1995) extend the equilibrium arbitrage pricing theory (APT) by allowing for various kinds of nonexpected utility preferences. One of the cases they consider is the CEU model.
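The mechanics behind the Dow and Werlang price interval can be reproduced numerically. The sketch below is our own illustration, not an object from the chapter: the payoffs and capacity values are made up, `choquet_expectation` is a hypothetical helper, and we simplify to a risk-neutral CEU agent at a riskless initial position, for whom CE_ν(y) is the highest price at which a marginal long position is attractive and −CE_ν(−y) the lowest price at which a marginal short position is attractive, so any price in between supports the zero position.

```python
def choquet_expectation(values, capacity):
    """Choquet integral of a finite-valued random variable.

    values:   payoff of each state, indexed 0..n-1
    capacity: dict mapping frozensets of states to capacity values
    """
    states = sorted(range(len(values)), key=lambda s: values[s], reverse=True)
    total, prev, upper = 0.0, 0.0, set()
    for s in states:
        upper.add(s)
        cur = capacity[frozenset(upper)]
        total += values[s] * (cur - prev)   # decumulative weighting
        prev = cur
    return total

# Illustrative two-state asset: payoff 0 in state 0, 1 in state 1, with a
# convex (ambiguity-averse) capacity nu({0}) = 0.5, nu({1}) = 0.3.
nu = {frozenset(): 0.0, frozenset({0}): 0.5,
      frozenset({1}): 0.3, frozenset({0, 1}): 1.0}
y = [0.0, 1.0]

buy_price = choquet_expectation(y, nu)                  # CE_nu(y)
sell_price = -choquet_expectation([-v for v in y], nu)  # -CE_nu(-y)

# Any price q with buy_price <= q <= sell_price supports the zero position.
print(buy_price, sell_price)
```

With these numbers the interval is [0.3, 0.5]; it is nonempty precisely because the capacity is nonadditive (ν({0}) + ν({1}) < 1), which is the switch in the effective probability discussed above.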
The model in the present chapter may be thought of as a special case of the equilibrium APT framework: what are labeled as factor risks in APT are precisely what we call economic states, and idiosyncratic risk is present in both models, though in our model the idiosyncratic risk has a simpler structure in that there are only two possible idiosyncratic states corresponding to each asset. Only a special case of CEU preferences is investigated by Kelsey and Milne (1995): their Assumption 3.3 allows nonadditive beliefs only with respect to factor risks; idiosyncratic risk is described only by additive probabilities (see Assumption 3.3, the Remark following the assumption and footnote 2). The formal result of their analysis appears in Corollary 3.1 and shows that, given the qualifications, the usual APT result continues to hold: diversification may proceed as usual, idiosyncratic risk disappears in the limit as the number of assets tends to infinity and the price of any asset is, consequently, a linear function of factor risks. This formal result is readily understandable given our analysis. As is repeatedly stressed in the present chapter, what drives our result is the nonadditivity of beliefs over the idiosyncratic states. While it is not necessary that ambiguity aversion be restricted to idiosyncratic states for our result to hold, it is necessary that there be some ambiguity about idiosyncracies. The no-trade result fails if ambiguity is merely restricted to economic states, as we explained in the latter part of Example 15.1 and in Example 15.2. With ambiguity only on economic states, ambiguity aversion has no bite, irrespective of whether there is only a single asset or infinitely many, and
Incompleteness of financial markets
353
hence diversification proceeds as with SEU. Hence, their result would not obtain without the restriction imposed by (their) Assumption 3.3. Our analysis therefore warns against informally extrapolating the Kelsey and Milne (1995) result to conclude that diversification would proceed as usual even when the special circumstances of Assumption 3.3 do not hold (i.e. the ambiguity is not restricted to economic states but occurs more generally over the state space). Further, it would appear to be a compelling description of the economic environment to assume that, if an agent is at all ambiguity averse, the agent will be ambiguity averse about an idiosyncratic risk. By definition, such a risk is unrelated to his own income risk and the macroeconomic environment; the risk stems from the internal workings of a particular firm, something about which the typical agent is likely to have little knowledge. It is well known that it is possible to define more than one notion of independence for nonadditive beliefs. Ghirardato (1997) presents a comprehensive analysis of the various notions. As Ghirardato notes (p. 263), the problem of defining an independent product had been studied, previous to Ghirardato's investigation, by Hendon et al. (1996), Gilboa and Schmeidler (1989) and Walley and Fine (1982). The definition invoked in the present chapter, suggested by Gilboa and Schmeidler (1989) and Walley and Fine (1982), is arguably the most prominent in the literature. However, the formal analysis in the present chapter, given the primitives of our model, does not hinge on this particular choice of the notion of independence. An important finding of Ghirardato's analysis was that the proposed notions of independent product give rise to a unique product when the marginals have certain additional structural properties. The capacity we use in our model is a product of an additive probability and n two-point capacities (each such capacity is fully described by the two numbers ν⁰ and ν¹).
A two-point capacity is, of course, a convex capacity and (trivially) a belief function. As is explicit in Theorems 2 and 3 in Ghirardato (1997), if the marginals satisfy the structural properties, as the marginals we use do, then uniqueness of the product capacity obtains. That is, irrespective of which of the two definitions of independence is adopted, the one suggested by Hendon et al. (1996) or the one we use, the computed product capacity is the same. The law of large numbers that we use formally invokes the Gilboa–Schmeidler notion (see Marinacci, 1999: Theorem 15 and Section 7.2). Since both notions of independence are equivalent given the primitives of our model, it is irrelevant to our analysis whether the law of large numbers that we use would also hold under the alternative notion of independence. In other words, the conclusions of our formal analysis are robust to the adoption of the alternative notion of independence.
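The product construction can be made concrete for the simplest case relevant here, an independent product of two two-point capacities. The sketch below is our own illustration with arbitrary capacity values; the helper names are hypothetical. It exploits the fact that a product measure is bilinear in the marginals, so the minimum over the (interval-shaped) cores is attained at core vertices, and it then checks numerically the Fact 15.A.1 equality of product and iterated Choquet expectations for a slice-comonotonic function.

```python
from itertools import combinations, product

def choquet(f, capacity, space):
    """Choquet integral of f over a finite space w.r.t. a capacity (dict on frozensets)."""
    pts = sorted(space, key=f, reverse=True)
    total, prev, upper = 0.0, 0.0, set()
    for x in pts:
        upper.add(x)
        cur = capacity[frozenset(upper)]
        total += f(x) * (cur - prev)
        prev = cur
    return total

def two_point(nu0, nu1):
    """Convex two-point capacity on {0, 1}: nu({0}) = nu0, nu({1}) = nu1, nu0 + nu1 <= 1."""
    return {frozenset(): 0.0, frozenset({0}): nu0,
            frozenset({1}): nu1, frozenset({0, 1}): 1.0}

def core_vertices(cap):
    """Extreme points of the core of a two-point capacity, as dicts state -> probability."""
    nu0, nu1 = cap[frozenset({0})], cap[frozenset({1})]
    return [{0: nu0, 1: 1.0 - nu0}, {0: 1.0 - nu1, 1: nu1}]

def independent_product(cap1, cap2):
    """Gilboa-Schmeidler / Walley-Fine product: min of product measures over the cores.

    A product measure's value on a set is bilinear in the marginal probabilities,
    so the minimum over the box of cores is attained at core vertices."""
    space = list(product([0, 1], repeat=2))
    out = {}
    for r in range(len(space) + 1):
        for A in combinations(space, r):
            out[frozenset(A)] = min(
                sum(m1[x] * m2[y] for (x, y) in A)
                for m1 in core_vertices(cap1) for m2 in core_vertices(cap2))
    return out

cap_a, cap_b = two_point(0.2, 0.3), two_point(0.4, 0.1)
prod_cap = independent_product(cap_a, cap_b)

# f(x1, x2) = 2*x1 + x2 is slice-comonotonic, so by Fact 15.A.1 the product
# Choquet expectation should equal the iterated one.
f = lambda xy: 2 * xy[0] + xy[1]
lhs = choquet(f, prod_cap, list(product([0, 1], repeat=2)))
inner = {x1: choquet(lambda x2: 2 * x1 + x2, cap_b, [0, 1]) for x1 in [0, 1]}
rhs = choquet(lambda x1: inner[x1], cap_a, [0, 1])
print(lhs, rhs)   # both values agree (here approximately 0.7)
```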
15.4. Concluding remarks

Financial assets typically carry some risk idiosyncratic to them; hence, disposing of income risk using financial assets involves buying into the inherent idiosyncratic risk. However, standard theory argues that diversification would, in principle, reduce the inconvenience of idiosyncratic risk to arbitrarily low levels, thereby making the tradeoff between the two types of risk much less severe. This argument is less robust than standard theory leads us to believe. Ambiguity
aversion can actually exacerbate the tension between the two kinds of risks to the point that classes of agents may find it impossible to trade some financial assets: they can no longer rely on such assets as routes for "exporting" their income risks. Thus, theoretically, the effect of ambiguity aversion on financial markets is to make the risk-sharing opportunities offered by financial markets less complete than they would be otherwise. This is the principal conclusion of the exercise in this chapter. This conclusion is robust, in the sense that many of the assumptions of the model presented in the last section could be substantially relaxed without losing the substance of the analytical results. First, it does not matter whether the beliefs about the economic states are ambiguous; the no-trade result still obtains. Second, given that diversification with replica assets does not work with ambiguous beliefs, one might wonder whether diversification can be achieved through assets which are not replicas (in terms of payoffs). It turns out that relaxing the assumption of "strict" replicas makes no difference to the main result (see Mukerji and Tallon, 1999). It is instructive to note the distinction between the empirical content of a theory of no-trade based on the "lemons" problem (e.g. Morris (1997)) and the theory based on ambiguity aversion. The primitive of the former theory is asymmetric information between the transacting parties and, significantly, no-trade may result even if there were no idiosyncratic component. Thus that theory, per se, does not link the presence and extent of an idiosyncratic component to no-trade. To obtain such a link, one has to assume, a priori, that there is sufficient asymmetric information only in the presence of an idiosyncratic component.
On the other hand, the theory based on ambiguity aversion does not require one to assume that ambiguity is present only with idiosyncracies, or that agents have ambiguous beliefs especially with respect to payoffs of assets with idiosyncratic components. One may well begin with the primitive that ambiguity is present in a "general" way, across all contingencies. However, since ambiguity aversion selectively attacks only those assets whose payoffs have idiosyncratic components, the link between idiosyncracy and no-trade is endogenously generated in the theory based on ambiguity aversion. This positive understanding is of significance. The history of financial markets is replete with episodes of increased uncertainty leading to a thinning out of trade (or even its seizing up completely), particularly in assets such as high-yield corporate bonds ("junk" bonds) and bonds issued in "emerging markets" (e.g., Latin America, Eastern Europe and East Asia) (see Mukerji and Tallon, 1999). The understanding also explains certain institutional structures adopted in some countries to protect markets from such episodes (see Mukerji and Tallon, 1999).
Appendix A: Some formal details relating to the CEU model

Independent product for capacities

We consider here the formal modeling of the idea of stochastic independence of random variables when beliefs are ambiguous. Let y be a function from a given space τ to R, and σ(y) be the smallest σ-algebra that makes y a random variable.
τ^n denotes the n-fold Cartesian product of τ, and σ(y_1, …, y_n) the product σ-algebra on τ^n generated by the σ-algebras {σ(y_i)}_{i=1}^n. The following definition was proposed by Gilboa and Schmeidler (1989), and earlier, by Walley and Fine (1982).

Definition 15.A.1. Let ν_i be a convex non-additive probability defined on σ(y_i). The independent product, denoted ⊗_{i=1}^n ν_i, is defined as follows:

(⊗_{i=1}^n ν_i)(A) = min{ (µ_1 × ⋯ × µ_n)(A) : µ_i ∈ C(ν_i) for 1 ≤ i ≤ n }

for every A ∈ σ(y_1, …, y_n), where µ_1 × ⋯ × µ_n is the standard additive product measure. We denote by ⊗_{i≥1} ν_i any non-additive probability on σ(y_1, …, y_n, …) such that for any finite class {y_{t_1}, …, y_{t_n}} it holds that (⊗_{i≥1} ν_i)(A) = (⊗_{i=1}^n ν_i)(A) for every A ∈ σ(y_1, …, y_n).

The computation of the Choquet expectation operator using product capacities is particularly simple for slice-comonotonic functions (Ghirardato (1997)), defined now. Let X_1, …, X_n be n (finite) sets and let Ω = X_1 × ⋯ × X_n. Correspondingly, let ν_i be convex non-additive probabilities defined on algebras of subsets of X_i, i = 1, …, n.

Definition 15.A.2. Let f: Ω → R. We say that f has comonotonic x_i-sections if for every (x_1, …, x_{i−1}, x_{i+1}, …, x_n), (x_1′, …, x_{i−1}′, x_{i+1}′, …, x_n′) ∈ X_1 × ⋯ × X_{i−1} × X_{i+1} × ⋯ × X_n, the functions f(x_1, …, x_{i−1}, ·, x_{i+1}, …, x_n): X_i → R and f(x_1′, …, x_{i−1}′, ·, x_{i+1}′, …, x_n′): X_i → R are comonotonic. f is called slice-comonotonic if it has comonotonic x_i-sections for every i ∈ {1, …, n}.

The following fact follows from Proposition 7 and Theorem 1 in Ghirardato (1997).

Fact 15.A.1. Suppose that f: Ω → R is slice-comonotonic. Then

CE_{⊗ν_i} f(x_1, …, x_n) = CE_{ν_1} … CE_{ν_n} f(x_1, …, x_n).

In what follows we verify that Fact 15.A.1 applies to the calculation of the Choquet expected utility of an agent's contingent consumption vector. As in the main text,
let Ω = S × {0,1}^n be the state space, with generic element ω = (s, t_1, …, t_n) = (s, t). For a given h let x(ω) = x_{h,n}^{s,t}, h's consumption at state ω = (s, t). Finally, let u: R → R denote the strictly increasing utility index. It will be shown that the composite function u ∘ x(·): Ω → R is slice-comonotonic, and therefore the calculation of CE u(x(ω)) may proceed as in Fact 15.A.1. Recall,

x(ω) = x(s, t) = e_h^s + b_h + z̃_h Σ_{i=1}^n (y^s + y(t_i))/n

where z̃_h is the holding of the diversified portfolio consisting of 1/n units of each financial asset. We first show that x(·) is slice-comonotonic. This is done
by demonstrating, in turn, that x has comonotonic s-sections and comonotonic t_j-sections.

Fix t = (t_1, …, t_n) and t′ = (t_1′, …, t_n′). Assume that x(s, t) ≥ x(s′, t). Then, as required in Definition 15.A.2 (slice comonotonicity), we want to show that x(s, t′) ≥ x(s′, t′). Now,

x(s, t) ≥ x(s′, t)
⇔ e_h^s + b_h + z̃_h Σ_{i=1}^n (y^s + y(t_i))/n ≥ e_h^{s′} + b_h + z̃_h Σ_{i=1}^n (y^{s′} + y(t_i))/n
⇔ e_h^s + b_h + z̃_h y^s ≥ e_h^{s′} + b_h + z̃_h y^{s′}
⇔ e_h^s + b_h + z̃_h Σ_{i=1}^n (y^s + y(t_i′))/n ≥ e_h^{s′} + b_h + z̃_h Σ_{i=1}^n (y^{s′} + y(t_i′))/n
⇔ x(s, t′) ≥ x(s′, t′)

Hence, x has comonotonic s-sections.
Next, fix (s, t_{−j}) and (s′, t_{−j}′), where t_{−j} = (t_1, …, t_{j−1}, t_{j+1}, …, t_n). Now,

x(s, t_{−j}, t_j) ≥ x(s, t_{−j}, t_j′)
⇔ e_h^s + b_h + z̃_h ( Σ_{i≠j} (y^s + y(t_i))/n + (y^s + y(t_j))/n ) ≥ e_h^s + b_h + z̃_h ( Σ_{i≠j} (y^s + y(t_i))/n + (y^s + y(t_j′))/n )
⇔ y(t_j) ≥ y(t_j′)
⇔ x(s′, t_{−j}′, t_j) ≥ x(s′, t_{−j}′, t_j′)

Repeating this, one shows that x has comonotonic t_j-sections, for all j = 1, …, n. Hence, x is slice-comonotonic. Now, slice comonotonicity of u ∘ x(·): Ω → R follows readily from the assumption that u is strictly increasing. To this end, notice:

x(s, t) ≥ x(s′, t) ⇔ u(x(s, t)) ≥ u(x(s′, t))

and

x(s, t_{−j}, t_j) ≥ x(s, t_{−j}, t_j′) ⇔ u(x(s, t_{−j}, t_j)) ≥ u(x(s, t_{−j}, t_j′)).
Law of large numbers for capacities (Marinacci (1996), Theorem 7.7; Walley and Fine (1982))

Let y be a function from a given (countably) finite space Ω to the real line R, and σ(y) the smallest σ-algebra that makes y a random variable. Ω^n denotes the n-fold Cartesian product of Ω, and σ(y_1, …, y_n) the product σ-algebra on Ω^n generated by the σ-algebras {σ(y_i)}_{i=1}^n. Let each ν_i be a convex capacity on σ(y_i), and let {y_i}_{i≥1} be a sequence of random variables, independent and identically distributed relative to ⊗ν_i. Set S_n = (1/n) Σ_{i=1}^n y_i. Suppose both CE_{ν_1}(y_1) and CE_{ν_1}(−y_1) exist. Then:

1. (⊗ν_i)({ω ∈ Ω^∞ : CE_{ν_1}(y_1) ≤ liminf_n S_n(ω) ≤ limsup_n S_n(ω) ≤ −CE_{ν_1}(−y_1)}) = 1.
2. (⊗ν_i)({ω ∈ Ω^∞ : CE_{ν_1}(y_1) < liminf_n S_n(ω) ≤ limsup_n S_n(ω) < −CE_{ν_1}(−y_1)}) = 0.
3. (⊗ν_i)({ω ∈ Ω^∞ : CE_{ν_1}(y_1) = liminf_n S_n(ω)}) = 0.
4. (⊗ν_i)({ω ∈ Ω^∞ : −CE_{ν_1}(−y_1) = limsup_n S_n(ω)}) = 0.
Appendix B: Proofs of results in the main text
Proof of the Lemma. Suppose w.l.o.g. q_z^i ≥ q_z^{i′} for some i, i′ ∈ {1, …, n}. First we show that z_{h,n}^i ≤ z_{h,n}^{i′}, ∀h ∈ {1, …, H}. Indeed, assume z_{h,n}^i > z_{h,n}^{i′} for some h, and construct the portfolio ẑ_{h,n} as follows:

ẑ_{h,n}^i = z_{h,n}^i − ε,  ẑ_{h,n}^{i′} = z_{h,n}^{i′} + (q_z^i / q_z^{i′}) ε,  and ẑ_{h,n}^j = z_{h,n}^j ∀j ≠ i, i′,

where ε is small enough so that ẑ_{h,n}^i > ẑ_{h,n}^{i′}. Note that ẑ_{h,n} is budget feasible. Let

x̂_{h,n}^{s,t} ≡ e_h^s + b_{h,n} + Σ_{i=1}^n ẑ_{h,n}^i (y^s + y(t_i)) for s = 1, 2.

Because x_{h,n}^{s,t} and x̂_{h,n}^{s,t} are comonotonic, and u_h is strictly increasing, it follows from Definition 15.A.1 that there exist additive product measures µ ≡ ×_{i=1}^n µ_i and µ̂ ≡ ×_{i=1}^n µ̂_i, where the µ_i, µ̂_i: 2^{{0,1}} → [0,1] are additive measures, such that

CE_{π⊗ν} x_{h,n}^{s,t} = E_{π×µ} x_{h,n}^{s,t},  CE_{π⊗ν} x̂_{h,n}^{s,t} = E_{π×µ̂} x̂_{h,n}^{s,t},

and

CE_{π⊗ν} u_h(x_{h,n}^{s,t}) = E_{π×µ} u_h(x_{h,n}^{s,t}),  CE_{π⊗ν} u_h(x̂_{h,n}^{s,t}) = E_{π×µ̂} u_h(x̂_{h,n}^{s,t}),  s = 1, 2, ∀t ∈ τ^n.

Furthermore, E_{µ̂}[x̂_{h,n}^{s,t} | s] = E_µ[x_{h,n}^{s,t} | s] + E_{µ̂_i}[εy(t_i)] − E_{µ_i}[εy(t_i)], s = 1, 2. Next, notice E_{µ̂_i}[εy(t_i)] − E_{µ_i}[εy(t_i)] ≤ 0. Indeed, either ẑ_{h,n}^i and z_{h,n}^i have the same sign, in which case µ_i = µ̂_i and E_{µ̂_i}[εy(t_i)] − E_{µ_i}[εy(t_i)] = 0; or z_{h,n}^i > 0 > ẑ_{h,n}^i and then

E_{µ̂_i}[εy(t_i)] − E_{µ_i}[εy(t_i)] = ε[1 − ν⁰ − ν¹][y(0) − y(1)] ≤ 0.

Hence, x̂^s stochastically dominates x^s. Given u_h′′ < 0, therefore, E_{π×µ} u_h(x̂_{h,n}^{s,t}) > E_{π×µ} u_h(x_{h,n}^{s,t}). As a consequence, CE_{π⊗ν} u_h(x̂_{h,n}^{s,t}) > CE_{π⊗ν} u_h(x_{h,n}^{s,t}). But this is a contradiction to the hypothesis that (q_n, (b_n, z_n, x_n)) is an equilibrium. ∴ z_{h,n}^i ≤ z_{h,n}^{i′}, ∀h ∈ {1, …, H}. Since (q_n, (b_n, z_n, x_n)) is an equilibrium, Σ_{h=1}^H z_{h,n}^i = Σ_{h=1}^H z_{h,n}^{i′} = 0. Therefore, using the fact that z_{h,n}^i ≤ z_{h,n}^{i′} for all h, we get that z_{h,n}^i = z_{h,n}^{i′} for all h.

Proof of the Theorem. The maximization problem P̂_h^∞, given asset prices q̃_∞, may be written as follows:

max E_{π⊗ν} u_h( e_h^s + b_{h,∞} + z̃_{h,∞} lim_{n→∞} Σ_{i=1}^n (y^s + y(t_i))/n )
s.t. b_{h,∞} + q̃_∞ z̃_{h,∞} = 0.

And the maximization problem P_h, solved by the agent in an economy without idiosyncracy, given asset prices q = q̃_∞:

max Σ_{s∈{1,2}} π(s) u_h( e_h^s + b_h + z_h ŷ^s )
s.t. b_h + q̃_∞ z_h = 0.

If n → ∞, by the law of large numbers, with probability 1 a unit of the portfolio z̃_n yields a payoff of y^s + E_{t∈{0,1}} y(t) ≡ ŷ^s units. That is, lim_{n→∞} Σ_{i=1}^n (y^s + y(t_i))/n = ŷ^s a.s. Recall, the financial asset ẑ yields ŷ^s units of the good in the economic states s = 1, 2. Hence, (b̂_∞, ẑ_∞) solves the maximization problem P̂_h^∞ at prices q̂_∞ if and only if (b̂_∞, ẑ_∞) also solves the maximization problem P_h at prices q̂_∞. Finally note, if (q̂_∞, (b̂_∞, ẑ_∞, x̂_∞)) describes an equilibrium of the n-financial assets economy with idiosyncracy, it must be that (b̂_∞, ẑ_∞) satisfies the conditions of (asset) market clearing at the price vector q̂_∞. Hence, q̂_∞ will also clear asset markets in the economy without idiosyncracy. Conversely, if (q̂_∞, (b̂_∞, ẑ_∞, x̂_∞)) describes an equilibrium of the economy without idiosyncracy then q̂_∞ will also clear asset markets in the n-financial assets economy with idiosyncracy.

Proof of the Main Theorem. Consider P̂_{h,n}, the maximization problem in the n-financial asset economy with idiosyncracy. Suppose that, at equilibrium, there exists h′ such that z̃_{h′,n} ≠ 0, say z̃_{h′,n} > 0. Then, there must be h′′ such that
z̃_{h′′,n} < 0. Next, since z̃_{h′,n} > 0 and y(0) < y(1), Fact 15.A.1 together with the fact that u_{h′}(x_{h′,n}^ω) is slice-comonotonic (see Appendix A) implies that CE_{π⊗ν} u_{h′}(x_{h′,n}^{s,t}) is a standard expectation with respect to the additive measure π × µ(t), where µ(t) = (1 − ν¹)^{n_0} × (ν¹)^{n−n_0}, n_0 being the number of financial assets whose idiosyncratic payoff is y(0) at state (s, t). This is because x_{h′,n}^{s,(t_i,t_{−i})} is necessarily smaller at a state (s, (0, t_{−i})) than at the state (s, (1, t_{−i})), s = 1, 2. The first order conditions of the problem P̂_{h′,n} (for agent h′) then give:

q̃_n = E_{π×µ}[ ( Σ_{i=1}^n (y^s + y(t_i))/n ) u_{h′}′(x_{h′,n}^{s,t}) ] / E_{π×µ}[ u_{h′}′(x_{h′,n}^{s,t}) ].

Notice, for s = 1, 2, x_{h′,n}^{s,t} and Σ_{i=1}^n (y^s + y(t_i))/n are positively dependent given s (see Magill and Quinzii (1996)) since z̃_{h′,n} > 0. Hence, because u′′(·) < 0,

Covariance( Σ_{i=1}^n (y^s + y(t_i))/n , u_{h′}′(x_{h′,n}^{s,t}) ) < 0, given s.

Now,

E_µ[ ( Σ_{i=1}^n (y^s + y(t_i))/n ) u_{h′}′(x_{h′,n}^{s,t}) ]
= Covariance( Σ_{i=1}^n (y^s + y(t_i))/n , u_{h′}′(x_{h′,n}^{s,t}) ) + E_µ[ Σ_{i=1}^n (y^s + y(t_i))/n ] E_µ[ u_{h′}′(x_{h′,n}^{s,t}) ].

Thus,

E_µ[ ( Σ_{i=1}^n (y^s + y(t_i))/n ) u_{h′}′(x_{h′,n}^{s,t}) ] < E_µ[ Σ_{i=1}^n (y^s + y(t_i))/n ] E_µ[ u_{h′}′(x_{h′,n}^{s,t}) ].

Hence,

q̃_n < Σ_{s=1}^2 π(s) E_µ[ Σ_{i=1}^n (y^s + y(t_i))/n ] E_µ[ u_{h′}′(x_{h′,n}^{s,t}) ] / Σ_{s=1}^2 π(s) E_µ[ u_{h′}′(x_{h′,n}^{s,t}) ]
⇒ q̃_n < max_s E_µ[ Σ_{i=1}^n (y^s + y(t_i))/n ]
⇒ q̃_n < ȳ + (1 − ν¹)y(0) + ν¹y(1),   (15.A.1)

where ȳ ≡ max{y¹, y²}. Consider next h′′ such that z̃_{h′′,n} < 0. By a reasoning similar to that followed for the agent h′ (noticing that x_{h′′,n}^{s,t} and Σ_{i=1}^n (y^s + y(t_i))/n are negatively dependent given s) one gets

q̃_n > y̲ + (1 − ν⁰)y(1) + ν⁰y(0),   (15.A.2)

where y̲ ≡ min{y¹, y²}. Therefore, a necessary condition for having an equilibrium with z̃_{h,n} ≠ 0 for at least some h is that ȳ − y̲ > (1 − ν⁰ − ν¹)(y(1) − y(0)). Set Ā = (ȳ − y̲)/(y(1) − y(0)) ∈ (0, 1). If 1 − ν⁰ − ν¹ > Ā, then z̃_{h,n} = 0 for all h, at any equilibrium.
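The necessary condition just derived is straightforward to operationalize. The helper below is our own sketch (names and numbers are illustrative, not from the chapter): it computes the buyer's upper bound (15.A.1) and the seller's lower bound (15.A.2), and flags no-trade when 1 − ν⁰ − ν¹ > Ā = (ȳ − y̲)/(y(1) − y(0)).

```python
def no_trade(y_econ, y0, y1, nu0, nu1):
    """Check the no-trade condition of the Main Theorem.

    y_econ:     payoffs y^s across economic states
    y0 < y1:    idiosyncratic payoffs y(0), y(1)
    nu0, nu1:   two-point capacity; 1 - nu0 - nu1 >= 0 measures ambiguity
    """
    y_hi, y_lo = max(y_econ), min(y_econ)
    buyer_bound = y_hi + (1 - nu1) * y0 + nu1 * y1    # q must lie below this (15.A.1)
    seller_bound = y_lo + (1 - nu0) * y1 + nu0 * y0   # q must lie above this (15.A.2)
    a_bar = (y_hi - y_lo) / (y1 - y0)
    ambiguity = 1 - nu0 - nu1
    return {"buyer_bound": buyer_bound, "seller_bound": seller_bound,
            "A_bar": a_bar, "no_trade": ambiguity > a_bar}

# With additive beliefs (nu0 + nu1 = 1, no ambiguity) the price intervals overlap,
# so trade is possible:
r_additive = no_trade([1.0, 1.5], 0.0, 1.0, 0.6, 0.4)
# With enough ambiguity about the idiosyncratic payoff (1 - nu0 - nu1 = 0.6 > A_bar = 0.5)
# the buyer's ceiling falls below the seller's floor and the market shuts:
r_ambiguous = no_trade([1.0, 1.5], 0.0, 1.0, 0.2, 0.2)
print(r_additive["no_trade"], r_ambiguous["no_trade"])
```

The two calls illustrate the chapter's main point: holding payoffs fixed, increasing the nonadditivity of beliefs over idiosyncratic states (shrinking ν⁰ + ν¹) is what eliminates every price at which buyers and sellers can agree.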
Finally, note that if n → ∞, CE_{π⊗ν} u_{h′}(x_{h′,n}^{s,t}) is just a standard expectation operator with respect to the additive measure π × µ(t), where µ(t) is such that

µ({ t : lim_{n→∞} Σ_{i=1}^n (y^s + y(t_i))/n = y^s + (1 − ν¹)y(0) + ν¹y(1) }) = 1.

The proof then proceeds as in the case of finite n, except that the inequality (15.A.1) reads

lim_{n→∞} q̃_n ≤ ȳ + (1 − ν¹)y(0) + ν¹y(1)

and the inequality (15.A.2) reads

lim_{n→∞} q̃_n ≥ y̲ + (1 − ν⁰)y(1) + ν⁰y(0).
Acknowledgments

We thank the referees and the Managing Editor, M. Armstrong, as well as A. Bisin, S. Bose, P. Ghirardato, I. Gilboa, R. Guesnerie, B. Lipman, J. Malcomson, J. Marin, M. Marinacci, M. Piccione, Z. Safra, H. S. Shin and J. C. Vergnaud for helpful comments. The chapter has also benefitted from the responses of seminar members at the University of British Columbia, University of Essex, University of Evry,
Johns Hopkins, Nuffield College, Tilburg University, CORE-Louvain-la-Neuve, NYU, UPenn, University of Paris I, University of Toulouse, Université du Maine, ENPC-Paris, University of Venice and the ESRC Economic Theory Conference at Kenilworth. The first author gratefully acknowledges financial assistance from an Economic and Social Research Council of UK Research Fellowship (# R000 27 1065).
Notes
1 Recent literature has debated the merits of the CEU framework as a model of ambiguity aversion. For instance, Epstein (1999) contends that CEU preferences associated with convex capacities (see Section 15.2) do not always conform with a "natural" notion of ambiguity averse behavior. On the other hand, Ghirardato and Marinacci (1997) argue that ambiguity aversion is demonstrated in the CEU model by a broad class of capacities which includes convex capacities.
2 And, indeed, formal empirical investigations overwhelmingly confirm that the data on individual consumption are more consistent with incomplete than complete markets. Among others, see Zeldes (1989), Carroll (1992), Deaton and Paxson (1994) and Hayashi et al. (1996). The evidence, however, is not unanimous; see, for example, Mace (1991).
3 We will say an asset's payoff has an idiosyncratic component if at least some component of the payoff is independent of (1) the realized endowments of agents and (2) the payoff of any other asset.
4 Fishburn (1993) provides an axiomatic justification of this definition of ambiguity and Mukerji (1997) demonstrates its equivalence to a more primitive and epistemic notion of ambiguity (expressed in terms of the DM's knowledge of the state space).
5 The Choquet expectation operator may be directly defined with respect to a non-additive probability; see Schmeidler (1989). Also, for an intuitive introduction to the CEU model see Section 2 in Mukerji (1998).
6 For instance, suppose a firm introduces a new product line, an innovation, into the market. In such a case, typically, it is not just the shocks commonly affecting firms in the same trade that will affect the sales of the new product but also more (brand) specific elements, for example, whether (or not) the innovation has a "special" appeal for the consumers. Another example of idiosyncratic shocks is shocks to firms' internal organizational capabilities.
7 In this context it is worth noting that it is reported that almost 70 percent of corporate borrowing in the US is through bonds. Default rates on bonds are also significant. The Financial Times, October 13, 1998, in its report headlined "US corporate bond market hit," notes, "the rate of default on US high-yield bonds was running at 10% in the early 1990s … today the default rate is hovering around 3% but creeping higher."
8 Werner (1997) considers a finance economy of which this is just a special case. There are standard arguments that ensure the existence of equilibria of such economies (op. cit., p. 100).
9 This has to be qualified since there exist some nongeneric constraints among endowments in different states, namely e_h^{s,t} = e_h^{s,t′} ≡ e_h^s.
10 Laws of large numbers for ambiguous beliefs have been studied by, among others, Walley and Fine (1982) and Marinacci (1996, 1999). Appendix A contains a formal statement of the version we apply. This version was, essentially, originally proved in Walley and Fine (1982). The statement given here is from Marinacci (1996), Theorem 7.7. However, the result is a direct implication of the more general Theorem 15 in Marinacci (1999).
11 Section 3.4 of Epstein and Wang (1994) presents an example of an economy with heterogeneous agents. But in this model markets are assumed to be complete, and hence risk-sharing continues to be efficient (Pareto optimal), as is explicitly observed by the authors.
References

Bewley, T. (1986). "Knightian Decision Theory: Part I," Discussion Paper 807, Cowles Foundation.
Carroll, C. (1992). "The Buffer-Stock Theory of Saving: Some Macroeconomic Evidence," Brookings Papers on Economic Activity, 2, 61–135.
Chateauneuf, A., R. Dana, and J.-M. Tallon (2000). "Optimal Risk Sharing Rules and Equilibria with Choquet-Expected-Utility," Journal of Mathematical Economics, 34, 191–214.
Deaton, A. and C. Paxson (1994). "Intertemporal Choice and Inequality," Journal of Political Economy, 102(3), 437–467.
Dow, J. and S. Werlang (1992). "Uncertainty Aversion, Risk Aversion, and the Optimal Choice of Portfolio," Econometrica, 60(1), 197–204. (Reprinted as Chapter 17 in this volume.)
Epstein, L. (1999). "A Definition of Uncertainty Aversion," Review of Economic Studies, 66, 579–608. (Reprinted as Chapter 9 in this volume.)
Epstein, L. and T. Wang (1994). "Intertemporal Asset Pricing under Knightian Uncertainty," Econometrica, 62(3), 283–322. (Reprinted as Chapter 18 in this volume.)
Fishburn, P. (1993). "The Axioms and Algebra of Ambiguity," Theory and Decision, 34, 119–137.
Ghirardato, P. (1997). "On Independence for Non-Additive Measures, with a Fubini Theorem," Journal of Economic Theory, 73, 261–291.
Ghirardato, P. and M. Marinacci (1997). "Ambiguity Aversion Made Precise: A Comparative Foundation and Some Implications," Social Science Working Paper 1026, CalTech.
Gilboa, I. and D. Schmeidler (1989). "Maxmin Expected Utility with a Non-Unique Prior," Journal of Mathematical Economics, 18, 141–153. (Reprinted as Chapter 6 in this volume.)
Hayashi, F., J. Altonji, and L. Kotlikoff (1996). "Risk Sharing Between and Within Families," Econometrica, 64(2), 261–294.
Hendon, E., H. Jacobsen, B. Sloth, and T. Tranaes (1996). "The Product of Capacities and Belief Functions," Mathematical Social Sciences, 32(2), 95–108.
Kelsey, D. and F. Milne (1995).
“The Arbitrage Pricing Theorem with Non-Expected Utility Preferences,” Journal of Economic Theory, 65(2), 557–574. Lucas, R. (1978). “Asset Prices in an Exchange Economy,” Econometrica, 46, 1429–1445. Mace, B. (1991). “Full Insurance in the Presence of Aggregate Uncertainty,” Journal of Political Economy, 99(5), 928–956. Magill, M. and M. Quinzii (1996). Theory of Incomplete Markets, Vol. 1, MIT Press. Marinacci, M. (1996). “Limit Laws for Non-Additive Probabilities, and their Frequentist Interpretation,” mimeo. Marinacci, M. (1999). “Limit Laws for Non-Additive Probabilities and their Frequentist Interpretation,” Journal of Economic Theory, 84, 145–195. Morris, S. (1997). “Risk, Uncertainty and Hidden Information,” Theory and Decision, 42(3), 235–269.
Mukerji, S. (1997). "Understanding the Nonadditive Probability Decision Model," Economic Theory, 9(1), 23–46.
Mukerji, S. (1998). "Ambiguity Aversion and Incompleteness of Contractual Form," American Economic Review, 88(5), 1207–1231. (Reprinted as Chapter 14 in this volume.)
Mukerji, S. and J.-M. Tallon (1999). "Ambiguity Aversion and Incompleteness of Financial Markets - Extended Version," Mimeo 99-28, Cahiers de la Maison des Sciences Economiques, Universite Paris I, available for download at http://eurequa.univparis1.fr/membros/tallon/tallon.htm
Schmeidler, D. (1989). "Subjective Probability and Expected Utility Without Additivity," Econometrica, 57(3), 571–587. (Reprinted as Chapter 5 in this volume.)
Walley, P. and T. L. Fine (1982). "Towards a Frequentist Theory of Upper and Lower Probability," Annals of Statistics, 10, 741–761.
Werner, J. (1997). "Diversification and Equilibrium in Securities Markets," Journal of Economic Theory, 75, 89–103.
Zeldes, S. (1989). "Consumption and Liquidity: An Empirical Investigation," Journal of Political Economy, 97, 305–346.
16 A quartet of semigroups for model specification, robustness, prices of risk, and model detection Evan W. Anderson, Lars Peter Hansen, and Thomas J. Sargent
16.1. Introduction

16.1.1. Rational expectations and model misspecification

A rational expectations econometrician or calibrator typically attributes no concern about specification error to agents even as he shuttles among alternative specifications.1 Decision makers inside a rational expectations model know the model.2 Their confidence contrasts with the attitudes of both econometricians and calibrators. Econometricians routinely use likelihood-based specification tests (information criteria or IC) to organize comparisons between models and empirical distributions. Less formally, calibrators sometimes justify their estimation procedures by saying that they regard their models as incorrect and unreliable guides to parameter selection if taken literally as likelihood functions. But the agents inside a calibrator's model do not share the model-builder's doubts about specification. By equating agents' subjective probability distributions to the objective one implied by the model, the assumption of rational expectations precludes any concerns that agents should have about the model's specification. The empirical power of the rational expectations hypothesis comes from having decision makers' beliefs be outcomes, not inputs, of the model-building enterprise. A standard argument that justifies equating objective and subjective probability distributions is that agents would eventually detect any difference between them, and would adjust their subjective distributions accordingly. This argument implicitly gives agents an infinite history of observations, a point that is formalized by the literature on convergence of myopic learning algorithms to rational expectations equilibria of games and dynamic economies.3 Specification tests leave applied econometricians in doubt because they have too few observations to discriminate among alternative models. Econometricians with finite data sets thus face a model detection problem that builders of rational
Anderson Evan W., Lars Peter Hansen, and Thomas J. Sargent, (forthcoming), “A quartet of semigroups for model specification, robustness, prices of risk, and model detection,” Journal of the European Economic Association. March 2003; 1(1): 68–123.
expectations models let agents sidestep by endowing them with infinite histories of observations "before time zero." This chapter is about models with agents whose databases are finite, like econometricians and calibrators. Their limited data leave agents with model specification doubts that are quantitatively similar to those of econometricians and that make them value decision rules that perform well across a set of models. In particular, agents fear misspecifications of the state transition law that are sufficiently small that they are difficult to detect because they are obscured by random shocks that impinge on the dynamical system. Agents adjust decision rules to protect themselves against modeling errors, a precaution that puts model uncertainty premia into equilibrium security market prices. Because we work with Markov models, we can avail ourselves of a powerful tool called a semigroup.

16.1.2. Iterated laws and semigroups

The law of iterated expectations imposes consistency requirements that cause a collection of conditional expectations operators associated with a Markov process to form a mathematical object called a semigroup. The operators are indexed by the time that elapses between when the forecast is made and when the random variable being forecast is realized. This semigroup and its associated generator characterize the Markov process. Because we consider forecasting random variables that are functions of a Markov state, the current forecast depends only on the current value of the Markov state.4 The law of iterated values embodies analogous consistency requirements for a collection of economic values assigned to claims to payoffs that are functions of future values of a Markov state. The family of valuation operators indexed by the time that elapses between when the claims are valued and when their payoffs are realized forms another semigroup.
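In discrete time the consistency requirements of the law of iterated expectations are easy to check numerically: for a finite-state Markov chain the conditional expectation operators are powers of the one-step transition matrix, and the semigroup property S_{t+τ} = S_t S_τ is simply the rule for multiplying those powers. A minimal sketch (the transition matrix and test function are our own illustration, not values from the chapter):

```python
import numpy as np

# Hypothetical 3-state Markov chain: S_t(phi) = P^t @ phi gives E[phi(x_t) | x_0].
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.0, 0.3, 0.7]])
phi = np.array([1.0, -2.0, 5.0])   # a test function of the state

# The semigroup property (law of iterated expectations): S_{t+s} = S_t S_s.
S = {t: np.linalg.matrix_power(P, t) for t in (2, 3, 5)}
assert np.allclose(S[5] @ phi, S[2] @ (S[3] @ phi))
print((S[5] @ phi).round(4))
```

The same consistency restrictions, applied to a continuum of horizons, are what make the continuous-time family {S_t} a semigroup.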
Just as a Markov process is characterized by its semigroup, so prices of payoffs that are functions of a Markov state can be characterized by a semigroup. Hansen and Scheinkman (2002) exploited this insight. Here we extend their insight to other semigroups. In particular, we describe four semigroups: (1) one that describes a Markov process; (2) another that adjusts continuation values in a way that rewards decision rules that are robust to misspecification of the approximating model; (3) another that models the equilibrium pricing of securities with payoff dates in the future; and (4) another that governs statistics for discriminating between alternative Markov processes using a finite time series data record.5 We show the close connections that bind these four semigroups.

16.1.3. Model detection errors and market prices of risk

In earlier work (Hansen, Sargent, and Tallarini (1999), henceforth denoted HST, and Hansen, Sargent, and Wang (2002), henceforth denoted HSW), we studied various discrete time asset pricing models in which decision makers' fear of model misspecification put model uncertainty premia into market prices of risk, thereby
potentially helping to account for the equity premium. Transcending the detailed dynamics of our examples was a tight relationship between the market price of risk and the probability of distinguishing the representative decision maker's approximating model from a worst-case model that emerges as a byproduct of his cautious decision making procedure. Although we had offered only a heuristic explanation for that relationship, we nevertheless exploited it to help us calibrate the set of alternative models that the decision maker should plausibly seek robustness against. In the context of continuous time Markov models, this chapter analytically establishes a precise link between the uncertainty component of risk prices and a bound on the probability of distinguishing the decision maker's approximating and worst-case models. We also develop new ways of representing decision makers' concerns about model misspecification and their equilibrium consequences.
16.1.4. Related literature

In the context of a discrete-time, linear-quadratic permanent income model, HST considered model misspecifications measured by a single robustness parameter. HST showed how robust decision making promotes behavior like that induced by risk aversion. They interpreted a preference for robustness as a decision maker's response to Knightian uncertainty and calculated how much concern about robustness would be required to put market prices of risk into empirically realistic regions. Our fourth semigroup, which describes model detection errors, provides a statistical method for judging whether the required concern about robustness is plausible. HST and HSW allowed the robust decision maker to consider only a limited array of specification errors, namely, shifts in the conditional mean of shocks that are i.i.d. and normally distributed under an approximating model. In this chapter, we consider more general approximating models and motivate the form of potential specification errors by using specification test statistics. We show that HST's perturbations to the approximating model emerge in a linear-quadratic, Gaussian control problem as well as in a more general class of control problems in which the stochastic evolution of the state is a Markov diffusion process. However, we also show that misspecifications different from HST's must be entertained when the approximating model includes Markov jump components. As in HST, our formulation of robustness allows us to reinterpret one of Epstein and Zin's (1989) recursions as reflecting a preference for robustness rather than aversion to risk. As we explain in Hansen, Sargent, Turmuhambetova, and Williams (henceforth HSTW) (2002), the robust control theory described in Section 16.5 is closely connected to the minmax expected utility or multiple priors model of Gilboa and Schmeidler (1989).
A main theme of this chapter is to advocate a workable strategy for actually specifying those multiple priors in applied work. Our strategy is to use detection error probabilities to surround the single model that is typically specified in applied work with a set of empirically plausible but vaguely specified alternatives.
16.1.5. Robustness versus learning

A convenient feature of rational expectations models is that the model builder imputes a unique and explicit model to the decision maker. Our analysis shares this analytical convenience. While an agent distrusts his model, he still uses it to guide his decisions.6 But the agent uses his model in a way that recognizes that it is an approximation. To quantify approximation, we measure discrepancy between the approximating model and other models with relative entropy, an expected log likelihood ratio, where the expectation is taken with respect to the distribution from the alternative model. Relative entropy is used in the theory of large deviations, a powerful mathematical theory about the rate at which uncertainty about unknown distributions is resolved as the number of observations grows.7 An advantage of using entropy to restrain model perturbations is that we can appeal to the theory of statistical detection to provide information about how much concern about robustness is quantitatively reasonable. Our decision maker confronts alternative models that can be discriminated among only with substantial amounts of data, so much data that, because he discounts the future, the robust decision maker simply accepts model misspecification as a permanent situation. He designs robust controls, and does not use data to improve his model specification over time. He adopts this stance because relative to his discount factor, it would take too much time for enough data to accrue for him to dispose of the alternative models that concern him. In contrast, many formulations of learning have decision makers fully embrace an approximating model when making their choices.8 Despite their different orientations, learners and robust decision makers both need a convenient way to measure the proximity of two probability distributions. This fact builds technical bridges between robust decision theory and learning theory.
The same expressions from large deviation theory that govern bounds on rates of learning also provide bounds on value functions across alternative possible models in robust decision theory.9 More importantly here, we shall show that the tight relationship between detection error probabilities and the market price of risk that was encountered by HST and HSW can be explained by formally studying the rate at which detection errors decrease as sample size grows.
16.1.6. Reader's guide

A reader interested only in our main results can read Section 16.2, then jump to the empirical applications in Section 16.9.
16.2. Overview

This section briefly tells how our main results apply in the special case in which the approximating model is a diffusion. Later sections provide technical details and show how things change when we allow jump components.
A representative agent's model asserts that the state of an economy x_t in a state space D follows a diffusion10

dx_t = μ(x_t) dt + Λ(x_t) dB_t,    (16.1)
where B_t is a Brownian vector. The agent wants decision rules that work well not just when (16.1) is true but also when the data conform to models that are statistically difficult to distinguish from (16.1). A robust control problem to be studied in Section 16.5 leads to such a robust decision rule together with a value function V(x_t) and a process γ(x_t) for the marginal utility of consumption of a representative agent. As a byproduct of the robust control problem, the decision maker computes a worst-case diffusion that takes the form

dx_t = [μ(x_t) + Λ(x_t)ĝ(x_t)] dt + Λ(x_t) dB_t,    (16.2)
where ĝ = −(1/θ)Λ′(∂V/∂x) and θ > 0 is a parameter measuring the size of potential model misspecifications. Notice that (16.2) modifies the drift but not the volatility relative to (16.1). The formula for ĝ tells us that large values of θ are associated with ĝ_t's that are small in absolute value, making model (16.2) difficult to distinguish statistically from model (16.1). The diffusion (16.6) lets us quantify just how difficult this statistical detection problem is. Without a preference for robustness to model misspecification, the usual approach to asset pricing is to compute the expected discounted value of payoffs with respect to the "risk-neutral" probability measure that is associated with the following twisted version of the physical measure (diffusion (16.1)):

dx_t = [μ(x_t) + Λ(x_t)ḡ(x_t)] dt + Λ(x_t) dB_t.    (16.3)
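The relation among these diffusions, which share the same volatility but carry different drifts, can be visualized with a discretized simulation. The sketch below uses an Euler–Maruyama scheme with a made-up scalar drift μ(x) = −x, volatility Λ = 0.2, and a constant distortion ĝ = −0.3; none of these values come from the chapter, and a state-dependent ĝ would plug into the same loop:

```python
import numpy as np

# Illustrative scalar primitives (our own choices, not the chapter's):
mu   = lambda x: -1.0 * x     # approximating drift mu(x)
Lam  = lambda x: 0.2          # volatility Lambda(x); identical under both models
ghat = lambda x: -0.3         # hypothetical constant drift distortion g_hat(x)

def simulate(x0, T, n, distorted, seed=0):
    """Euler-Maruyama path of dx = [mu + Lam*ghat*1{distorted}] dt + Lam dB."""
    rng = np.random.default_rng(seed)
    dt, x = T / n, np.empty(n + 1)
    x[0] = x0
    for k in range(n):
        drift = mu(x[k]) + (Lam(x[k]) * ghat(x[k]) if distorted else 0.0)
        x[k + 1] = x[k] + drift * dt + Lam(x[k]) * np.sqrt(dt) * rng.standard_normal()
    return x

# Same seed, hence same Brownian shocks: the paths differ only through the
# small drift distortion, which is what makes them hard to tell apart.
base  = simulate(1.0, T=5.0, n=1000, distorted=False)
worst = simulate(1.0, T=5.0, n=1000, distorted=True)
print(np.max(np.abs(base - worst)))
```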
In using the risk-neutral measure to price assets, future expected returns are discounted at the risk-free rate ρ(x_t), obtained as follows. The marginal utility of the representative household γ(x_t) conforms to

dγ_t = μ_γ(x_t) dt + σ_γ(x_t) dB_t.

Then the risk-free rate is ρ(x_t) = δ − (μ_γ(x_t)/γ(x_t)), where δ is the instantaneous rate at which the household discounts future utilities; the risk-free rate thus equals the negative of the expected growth rate of the representative household's marginal utility. The price of a payoff φ(x_N) contingent on a Markov state in period N is then

Ē[ exp( −∫_0^N ρ(x_u) du ) φ(x_N) | x_0 = x ],    (16.4)

where Ē is the expectation evaluated with respect to the distribution generated by (16.3). This formula gives rise to a pricing operator for every horizon N. Relative to the approximating model, the diffusion (16.3) for the risk-neutral measure distorts the drift in the Brownian motion by adding the term Λ(x_t)ḡ(x_t), where ḡ = Λ′(∂ log γ(x)/∂x). Here ḡ is a vector of "factor risk prices" or "market prices of risk." The equity premium puzzle is the finding that with plausible quantitative
specifications for the marginal utility γ(x), factor risk prices ḡ are too small relative to their empirically estimated counterparts. In Section 16.7, we show that when the planner and a representative consumer want robustness, the diffusion associated with the risk-neutral measure appropriate for pricing becomes

dx_t = (μ(x_t) + Λ(x_t)[ḡ(x_t) + ĝ(x_t)]) dt + Λ(x_t) dB_t,    (16.5)
where ĝ is the same process that appears in (16.2). With robustness sought over a set of alternative models that is indexed by θ, factor risk prices become augmented to ḡ + ĝ. The representative agent's concerns about model misspecification contribute the ĝ component of the factor risk prices. To evaluate the quantitative potential for attributing parts of the market prices of risk to agents' concerns about model misspecification, we need to calibrate θ and therefore |ĝ|. To calibrate θ and ĝ, we turn to a closely related fourth diffusion that governs the probability distribution of errors from using likelihood ratio tests to detect which of two models generated a continuous record of length N of observations on x_t. Here the key idea is that we can represent the average error in using a likelihood ratio test to detect the difference between the two models (16.1) and (16.2) from a continuous record of data of length N as 0.5 E( min{exp(ℓ_N), 1} | x_0 = x ), where E is evaluated with respect to model (16.1) and ℓ_N is the log likelihood ratio of the data record of model (16.2) with respect to model (16.1). For each α ∈ (0, 1), we can use the inequality

E( min{exp(ℓ_N), 1} | x_0 = x ) ≤ E( exp(αℓ_N) | x_0 = x )

to attain a bound on the detection error probability. For each α, we show that the bound can be calculated by forming a new diffusion that uses (16.1) and (16.2) as ingredients, and in which the drift distortion ĝ from (16.2) plays a key role. In particular, for α ∈ (0, 1), define

dx_t^α = [μ(x_t) + αΛ(x_t)ĝ(x_t)] dt + Λ(x_t) dB_t,    (16.6)

and define the local rate function ρ^α(x) = ((1 − α)α/2) ĝ(x)′ĝ(x). Then the bound on the average error in using a likelihood ratio test to discriminate between the approximating model (16.1) and the worst-case model (16.2) from a continuous data record of length N is

av error ≤ 0.5 E^α[ exp( −∫_0^N ρ^α(x_t) dt ) | x_0 = x ],    (16.7)
where E^α is the mathematical expectation evaluated with respect to the diffusion (16.6). The error rate ρ^α(x) is maximized by setting α = 0.5. Notice that the right side of (16.7) is one half the price of a pure discount bond that pays off one unit of consumption for sure N periods in the future, treating ρ^α as the risk-free rate and the measure induced by (16.6) as the risk-neutral probability measure. It is remarkable that the three diffusions (16.2), (16.5), and (16.6) that describe the worst-case model, asset pricing under a preference for robustness, and the local behavior of a bound on model detection errors, respectively, are all obtained
by perturbing the drift in the approximating model (16.1) with functions of the same drift distortion ĝ(x) that emerges from the robust control problem. To the extent that the bound on detection probabilities is informative about the detection probabilities themselves, our theoretical results thus neatly explain the pattern that was observed in the empirical applications of HST and HSW, namely, that there is a tight link between calculated detection error probabilities and the market price of risk. That link transcends all details of the model specification.11 In Section 16.9, we shall encounter this tight link again when we calibrate the contribution to market prices of risk that can plausibly be attributed to a preference for robustness in the context of three continuous time asset pricing models. Subsequent sections of this chapter substantiate these and other results in a more general Markov setting that permits x to have jump components, so that jump distortions also appear in the Markov processes for the worst-case model, asset pricing, and model detection error. We shall exploit and extend the asset-pricing structure of formulas like (16.4) and (16.7) by recognizing that collections of expectations, values, and bounds on detection error rates can all be described with semigroups.
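To see how the bound (16.7) behaves, consider the special case of a constant distortion ĝ, for which ρ^α is constant and the bound collapses to 0.5 exp(−α(1 − α)|ĝ|²N/2). The sketch below, with an arbitrary illustrative ĝ and sample length (not a calibration from the chapter), checks that α = 0.5 maximizes the discrimination rate and that the bound falls geometrically in N:

```python
import numpy as np

# With a constant g_hat, rho^alpha(x) = alpha*(1-alpha)/2 * |g_hat|^2, so the
# bound (16.7) reduces to 0.5 * exp(-rho^alpha * N).
def detection_bound(alpha, g_hat, N):
    rate = alpha * (1.0 - alpha) / 2.0 * np.dot(g_hat, g_hat)
    return 0.5 * np.exp(-rate * N)

g_hat = np.array([0.25])                  # hypothetical drift distortion
alphas = np.linspace(0.01, 0.99, 99)
bounds = detection_bound(alphas, g_hat, N=100.0)

# The discrimination rate is largest at alpha = 0.5, giving the tightest bound.
best = alphas[np.argmin(bounds)]
print(best, bounds.min())
```

Larger |ĝ| or longer samples N shrink the bound, which is why hard-to-detect models must have small drift distortions.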
16.3. Mathematical preliminaries

The remainder of this chapter studies continuous-time Markov formulations of model specification, robust decision making, pricing, and statistical model detection. We use Feller semigroups indexed by time for all four purposes. This section develops the semigroup theory needed for our chapter.

16.3.1. Semigroups and their generators

Let D be a Markov state space that is a locally compact and separable subset of R^m. We distinguish two cases. First, when D is compact, we let C denote the space of continuous functions mapping D into R. Second, when we want to study cases in which the state space is unbounded so that D is not compact, we shall use a one-point compactification that enlarges the state space by adding a point at ∞. In this case we let C be the space of continuous functions that vanish at ∞. We can think of such functions as having domain D or domain D ∪ {∞}. The compactification is used to limit the behavior of functions in the tails when the state space is unbounded. We use the sup norm to measure the magnitude of functions in C and to define a notion of convergence. We are interested in a strongly continuous semigroup of operators {S_t : t ≥ 0} with an infinitesimal generator G. For {S_t : t ≥ 0} to be a semigroup we require that S_0 = I and S_{t+τ} = S_t S_τ for all τ, t ≥ 0. A semigroup is strongly continuous if

lim_{τ↓0} S_τ φ = φ,
where the convergence is uniform for each φ in C. Continuity allows us to compute a time derivative and to define a generator

Gφ = lim_{τ↓0} (S_τ φ − φ)/τ.    (16.8)

This is again a uniform limit and it is well defined on a dense subset of C. A generator describes the instantaneous evolution of a semigroup. A semigroup can be constructed from a generator by solving a differential equation. Thus applying the semigroup property gives

lim_{τ↓0} (S_{t+τ} φ − S_t φ)/τ = G S_t φ,    (16.9)
a differential equation for a semigroup that is subject to the initial condition that S_0 is the identity operator. The solution to differential Equation (16.9) is depicted heuristically as S_t = exp(tG), and thus satisfies the semigroup requirements. The exponential formula can be justified rigorously using a Yosida approximation, which formally constructs a semigroup from its generator. In what follows, we will use semigroups to model Markov processes, intertemporal prices, and statistical discrimination. Using a formulation of Hansen and Scheinkman (2002), we first examine semigroups that are designed to model Markov processes.

16.3.2. Representation of a generator

We describe a convenient representation result for a strongly continuous, positive, contraction semigroup. Positivity requires that S_t maps nonnegative functions φ into nonnegative functions for each t. When the semigroup is a contraction, it is referred to as a Feller semigroup. The contraction property restricts the norm of S_t to be less than or equal to one for each t and is satisfied for semigroups associated with Markov processes. Generators of Feller semigroups have a convenient characterization:

Gφ = μ · (∂φ/∂x) + (1/2) trace(Σ ∂²φ/∂x∂x′) + Nφ − ρφ,    (16.10)

where N has the product form

Nφ(x) = ∫ [φ(y) − φ(x)] η(dy|x),    (16.11)
where ρ is a nonnegative continuous function, μ is an m-dimensional vector of continuous functions, Σ is a matrix of continuous functions that is positive semidefinite on the state space, and η(·|x) is a finite measure for each x and continuous in x for each Borel subset of D. We require that N map C²_K into C, where C²_K is the subspace of functions that are twice continuously differentiable and have compact support in D. Formula (16.11) is valid at least on C²_K.12 To depict equilibrium prices we will sometimes go beyond Feller semigroups. Pricing semigroups are not necessarily contraction semigroups unless the instantaneous yield on a real discount bond is nonnegative. When we use this approach for pricing, we will allow ρ to be negative. While this puts us out of the realm of Feller semigroups, as argued by Hansen and Scheinkman (2002), known results for Feller semigroups can often be extended to pricing semigroups. We can think of the generator (16.10) as being composed of three parts. The first two components are associated with well known continuous-time Markov process models, namely, diffusion and jump processes. The third part discounts. The next three subsections interpret these components of Equation (16.10).

16.3.2.1. Diffusion processes

The generator of a Markov diffusion process is a second-order differential operator:

G_d φ = μ · (∂φ/∂x) + (1/2) trace(Σ ∂²φ/∂x∂x′),

where the coefficient vector μ is the drift or local mean of the process and the coefficient matrix Σ is the diffusion or local covariance matrix. The corresponding stochastic differential equation is

dx_t = μ(x_t) dt + Λ(x_t) dB_t,

where {B_t} is a multivariate standard Brownian motion and Σ = ΛΛ′. Sometimes the resulting process will have attainable boundaries, in which case we either stop the process at the boundary or impose other boundary protocols.

16.3.2.2. Jump processes

The generator for a Markov jump process is

G_n φ = Nφ = λ[Qφ − φ],    (16.12)

where the coefficient λ(x) = ∫ η(dy|x) is a possibly state-dependent Poisson intensity parameter that sets the jump probabilities and Q is a conditional expectation operator that encodes the transition probabilities conditioned on a jump taking place. Without loss of generality, we can assume that the transition distribution associated with the operator Q assigns probability zero to the event y = x provided that x ≠ ∞, where x is the current Markov state and y the state after a jump takes place. That is, conditioned on a jump taking place, the process cannot stay put with positive probability unless it reaches a boundary.
The jump and diffusion components can be combined in a model of a Markov process. That is,

G_d φ + G_n φ = μ · (∂φ/∂x) + (1/2) trace(Σ ∂²φ/∂x∂x′) + Nφ    (16.13)

is the generator of a family (semigroup) of conditional expectation operators of a Markov process {x_t}, say S_t(φ)(x) = E[φ(x_t) | x_0 = x].

16.3.2.3. Discounting

The third part of (16.10) accounts for discounting. Thus, consider a Markov process {x_t} with generator G_d + G_n. Construct the semigroup

S_t φ = E[ exp( −∫_0^t ρ(x_τ) dτ ) φ(x_t) | x_0 = x ]
on C. We can think of this semigroup as discounting the future state at the stochastic rate ρ(x). Discount rates will play essential roles in representing shadow prices from a robust resource allocation problem and in measuring statistical discrimination between competing models.13

16.3.3. Extending the domain to bounded functions

While it is mathematically convenient to construct the semigroup on C, sometimes it is necessary for us to extend the domain to a larger class of functions. For instance, indicator functions 1_D of nondegenerate subsets D are omitted from C. Moreover, 1_D is not in C when D is not compact; nor can this function be approximated uniformly. Thus to extend the semigroup to bounded, Borel measurable functions, we need a weaker notion of convergence. Let {φ_j : j = 1, 2, ...} be a sequence of uniformly bounded functions that converges pointwise to a bounded function φ_o. We can then extend the S_τ semigroup to φ_o using the formula

S_τ φ_o = lim_{j→∞} S_τ φ_j,

where the limit notion is now pointwise. The choice of approximating sequence does not matter and the extension is unique.14 With this construction, we define the instantaneous discount or interest rate as the pointwise derivative

−lim_{τ↓0} (1/τ) log S_τ 1_D = ρ,
when the derivative exists.
16.3.4. Extending the generator to unbounded functions

Value functions for control problems on noncompact state spaces are often not bounded. Thus for our study of robust counterparts to optimization, we must extend the semigroup and its generator to unbounded functions. We adopt an approach that is specific to a Markov process and hence we study this extension only for a semigroup generated by G = G_d + G_n. We extend the generator using martingales. To understand this approach, we first remark that for a given φ in the domain of the generator,

M_t = φ(x_t) − φ(x_0) − ∫_0^t Gφ(x_τ) dτ

is a martingale. In effect, we produce a martingale by subtracting the integral of the local means from the process {φ(x_t)}. This martingale construction suggests a way to build the extended generator. Given φ, we find a function ψ such that

M_t = φ(x_t) − φ(x_0) − ∫_0^t ψ(x_τ) dτ    (16.14)

is a local martingale (a martingale under all members of a sequence of stopping times that increases to ∞). We then define Gφ = ψ. This construction extends the operator G to a larger class of functions than those for which the operator differentiation (16.8) is well defined. For every φ in the domain of the generator, ψ = Gφ in (16.14) produces a martingale. However, there are φ's not in the domain of the generator for which (16.14) also produces a martingale.15 In the case of a Feller process defined on a state space D that is an open subset of R^m, this extended domain contains at least the functions in C̃², the space of functions that are twice continuously differentiable on D. Such functions can be unbounded when the original state space D is not compact.
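A concrete instance of the construction in (16.14): for a standard Brownian motion the generator is G = (1/2) d²/dx², so the unbounded function φ(x) = x² (outside C but in the extended domain) has Gφ = 1, and M_t = x_t² − x_0² − t should be a martingale. A Monte Carlo sketch checking E[M_T] = 0 (horizon, step count, and path count are arbitrary illustrative choices):

```python
import numpy as np

# For standard Brownian motion, phi(x) = x^2 gives G phi = 1, so per (16.14)
# M_t = phi(x_t) - phi(x_0) - t should have mean zero at every t.
rng = np.random.default_rng(42)
T, n_steps, n_paths = 1.0, 100, 200_000
dt = T / n_steps
increments = rng.standard_normal((n_paths, n_steps)) * np.sqrt(dt)
x_T = increments.sum(axis=1)      # Brownian endpoints, starting from x_0 = 0
M_T = x_T**2 - 0.0 - T
print(M_T.mean())                 # should be near zero
```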
16.4. A tour of four semigroups

In the remainder of the chapter we will study four semigroups. Before describing each in detail, it is useful to tabulate the four semigroups and their uses. We have already introduced the first semigroup, which describes the evolution of a state vector process {x_t}. This semigroup portrays a decision maker's approximating model. It has the generator displayed in (16.10) with ρ = 0, which we repeat here for convenience:

Gφ = μ · (∂φ/∂x) + (1/2) trace(Σ ∂²φ/∂x∂x′) + Nφ.    (16.15)

While up to now we used G to denote a generic semigroup, from this point forward we will reserve it for the approximating model. We can think of the decision maker as using the semigroup generated by G to forecast functions φ(x_t). This semigroup for the approximating model can have both jump and Brownian components, but
the discount rate ρ is zero. In some settings, the semigroup associated with the approximating model includes a description of endogenous state variables and therefore embeds robust decision rules of one or more decision makers, as for example when the approximating model emerges from a robust resource allocation problem of the kind to be described in Section 16.5. With our first semigroup as a point of reference, we will consider three additional semigroups. The second semigroup represents an endogenous worst-case model that a decision maker uses to promote robustness to possible misspecification of his approximating model (16.15). For reasons that we discuss in Section 16.8, we shall focus the decision maker's attention on worst-case models that are absolutely continuous with respect to his approximating model. Following Kunita (1969), we shall assume that the decision maker believes that the data are actually generated by a member of a class of models that are obtained as Markov perturbations of the approximating model (16.15). We parameterize this class of models by a pair of functions (g, h), where g is a continuous function of the Markov state x that has the same number of coordinates as the underlying Brownian motion, and h is a nonnegative function of (y, x) that distorts the jump intensities. For the worst-case model, we have the particular settings g = ĝ and h = ĥ. Then we can represent the worst-case generator Ĝ as

Ĝφ = μ̂ · (∂φ/∂x) + (1/2) trace(Σ ∂²φ/∂x∂x′) + N̂φ,    (16.16)

where

μ̂ = μ + Λĝ
η̂(dy|x) = ĥ(y, x)η(dy|x).

The distortion ĝ to the diffusion and the distortion ĥ to the jump component in the worst-case model will also play essential roles both in asset pricing and in the detection probability formulas. From (16.12), it follows that the jump intensity under this parameterization is given by λ̂(x) = ∫ ĥ(y, x)η(dy|x) and the jump distribution conditioned on x is (ĥ(y, x)/λ̂(x))η(dy|x). A generator of the form (16.16) emerges from a robust decision problem, the perturbation pair (ĝ, ĥ) being chosen by a malevolent player, as we discuss next. Our third semigroup modifies one that Hansen and Scheinkman (2002) developed for computing the time zero price of a state contingent claim that pays off φ(x_t) at time t. Hansen and Scheinkman showed that the time zero price can be computed with a risk-free rate ρ̄ and a risk-neutral probability measure embedded in a semigroup with generator:

Ḡφ = −ρ̄φ + μ̄ · (∂φ/∂x) + (1/2) trace(Σ̄ ∂²φ/∂x∂x′) + N̄φ.    (16.17a)
Here

μ̄ = μ + Λπ̄
Σ̄ = Σ    (16.17b)
η̄(dy|x) = Π̄(y, x)η(dy|x).

In the absence of a concern about robustness, π̄ = ḡ is a vector of prices for the Brownian motion factors and Π̄ = h̄ encodes the jump risk prices. In Markov settings without a concern for robustness, (16.17b) represents the connection between the physical probability and the so-called risk-neutral probability that is widely used for asset pricing, along with the interest rate adjustment. We alter generator (16.17) to incorporate a representative consumer's concern about robustness to model misspecification. Specifically, a preference for robustness changes the ordinary formulas for π̄ and Π̄ that are based solely on pricing risks under the assumption that the approximating model is true. A concern about robustness alters the relationship between the semigroups for representing the underlying Markov processes and pricing. With a concern for robustness, we represent factor risk prices by relating μ̄ to the worst-case drift μ̂: μ̄ = μ̂ + Λḡ, and risk-based jump prices by relating η̄ to the worst-case jump measure η̂: η̄(dy|x) = h̄(y, x)η̂(dy|x). Combining this decomposition with the relation between the worst-case and the approximating models gives the new vectors of pricing functions

π̄ = ḡ + ĝ
Π̄ = h̄ĥ,

where the pair (ĝ, ĥ) is used to portray the (constrained) worst-case model in (16.16). Later we will supply formulas for (ρ̄, ḡ, h̄). A fourth semigroup statistically quantifies the discrepancy between two competing models as a function of the time interval of available data. We are particularly interested in measuring the discrepancy between the approximating and worst-case models. For each α ∈ (0, 1), we develop a bound on a detection error probability in terms of a semigroup and what looks like an associated "risk-free interest rate." The counterpart to the risk-free rate serves as an instantaneous discrimination rate. For each α, the generator for the bound on the detection error probability can be represented as

G^α φ = −ρ^α φ + μ^α · (∂φ/∂x) + (1/2) trace(Σ^α ∂²φ/∂x∂x′) + N^α φ,

where

μ^α = μ + Λg^α
Σ^α = Σ
η^α(dy|x) = h^α(y, x)η(dy|x).
Table 16.1 Parameterizations of the generators of four semigroups. The rate modifies the generator associated with the approximating model by adding −ρφ to the generator for a test function φ. The drift distortion adds a term Λg · (∂φ/∂x) to the generator associated with the approximating model. The jump distortion density makes the jump measure h(y, x)η(dy|x) instead of the jump distribution η(dy|x) in the generator for the approximating model

Semigroup            | Generator | Rate    | Drift distortion      | Jump dist. density
Approximating model  | G         | 0       | 0                     | 1
Worst-case model     | Ĝ         | 0       | ĝ(x)                  | ĥ(y, x)
Pricing              | Ḡ         | ρ̄(x)    | π̄(x) = ḡ(x) + ĝ(x)   | Π̄(y, x) = h̄(y, x)ĥ(y, x)
Detection            | G^α       | ρ^α(x)  | g^α(x)                | h^α(y, x)
The semigroup generated by G^α governs the behavior, as sample size grows, of a bound on the fraction of errors made when distinguishing two Markov models using likelihood ratios or posterior odds ratios. The α associated with the best bound is determined on a case-by-case basis and is especially easy to find in the special case that the Markov process is a pure diffusion. Table 16.1 summarizes our parameterization of these four semigroups. Subsequent sections supply formulas for the entries in this table.
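To illustrate the role of α, consider the simplest pure-diffusion case: two models that differ by a constant scalar Brownian drift distortion g observed over a record of length t. The sketch below assumes (anticipating the statistical-detection analysis of Section 16.8) that the instantaneous discrimination rate takes the Chernoff form ρ^α = α(1 − α)g²/2; the particular values of g and t are illustrative, not taken from the chapter.

```python
import math

# Distinguishing two scalar Brownian-drift models by likelihood ratio.
# g is the drift gap, t the length of the data record (both invented here).
g, t = 0.4, 10.0

# Assumed discrimination rate rho_alpha = alpha*(1 - alpha)*g^2/2
rate = lambda a: a * (1 - a) * g**2 / 2
alphas = [i / 100 for i in range(1, 100)]
best = max(alphas, key=rate)        # alpha giving the tightest bound

# Bound on the detection error probability, and the exact error of the
# likelihood-ratio test (a normal tail probability) for comparison.
bound = 0.5 * math.exp(-rate(best) * t)
exact = 0.5 * math.erfc(g * math.sqrt(t) / (2 * math.sqrt(2)))
```

As the chapter notes, the best α is especially easy to find for a pure diffusion: the rate is symmetric and is maximized at α = 1/2, and the resulting bound dominates the exact error probability.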
16.5. Model misspecification and robust control

We now study the continuous-time robust resource allocation problem. In addition to an approximating model, this analysis will produce a constrained worst-case model that helps the decision maker assess the fragility of any given decision rule and that can therefore be used as a device for choosing a robust decision rule.

16.5.1. Lyapunov equation under Markov approximating model and a fixed decision rule

Under a Markov approximating model with generator G and a fixed policy function i(x), the decision maker's value function is

V(x) = ∫₀^∞ exp(−δt) E[U[x_t, i(x_t)] | x₀ = x] dt.
The value function V satisfies the continuous-time Lyapunov equation:

δV(x) = U[x, i(x)] + GV(x).  (16.18)
Since V may not be bounded, we interpret G as the weak extension of the generator (16.13) defined using local martingales. The local martingale associated with this
equation is:

M_t = V(x_t) − V(x₀) − ∫₀ᵗ (δV(x_s) − U[x_s, i(x_s)]) ds.
As in (16.13), this generator can include diffusion and jump contributions. We will eventually be interested in optimizing over a control i, in which case the generator G will depend explicitly on the control. For now we suppress that dependence. We refer to G as the approximating model; G can be modelled using the triple (μ, Σ, η) as in (16.13). The pair (μ, Σ) consists of the drift and diffusion coefficients, while the conditional measure η encodes both the jump intensity and the jump distribution. We want to modify the Lyapunov equation (16.18) to incorporate a concern about model misspecification. We shall accomplish this by replacing G with another generator that expresses the decision maker's precaution about the specification of G.

16.5.2. Entropy penalties

We now introduce perturbations to the decision maker's approximating model that are designed to make the finite-horizon transition densities of the perturbed model absolutely continuous with respect to those of the approximating model. We use a notion of absolute continuity that pertains only to finite intervals of time. In particular, imagine a Markov process evolving for a finite length of time. Our notion of absolute continuity restricts the probabilities induced by the paths {x_τ : 0 ≤ τ ≤ t} for all finite t. See HSTW (2002), who discuss this notion as well as an infinite-history version of absolute continuity. Kunita (1969) shows how to preserve both the Markov structure and absolute continuity. Following Kunita (1969), we shall consider a Markov perturbation that can be parameterized by a pair (g, h), where g is a continuous function of the Markov state x with the same number of coordinates as the underlying Brownian motion, and h is a nonnegative function of (y, x) used to model the jump intensities. In Section 16.8, we will have more to say about these perturbations, including a discussion of why we do not perturb Σ.
For the pair (g, h), the perturbed generator is portrayed using a drift μ + Λg, a diffusion matrix Σ, and a jump measure h(y, x)η(dy|x). Thus the perturbed generator is

G(g, h)φ(x) = Gφ(x) + [Λ(x)g(x)] · ∂φ(x)/∂x + ∫ [h(y, x) − 1][φ(y) − φ(x)] η(dy|x).

For this perturbed generator to generate a Feller process would require that we impose additional restrictions on h. For analytical tractability we will only limit the perturbations to have finite entropy. We will be compelled to show, however, that the perturbation used to implement robustness does indeed generate a Markov process. This perturbation will be constructed formally as the solution to a constrained
minimization problem. In what follows, we continue to use the notation G to denote the approximating model in place of the more tedious G(0, 1).

16.5.3. Conditional relative entropy

At this point, it is useful to have a local measure of conditional relative entropy.16 Conditional relative entropy plays a prominent role in large deviation theory and in classical statistical discrimination, where it is sometimes used to study the decay in so-called type II error probabilities, holding fixed type I errors (Stein's Lemma). For the purposes of this section, we will use relative entropy as a discrepancy measure. In Section 16.8 we will elaborate on its connection to the theory of statistical discrimination. As a measure of discrepancy, it has been axiomatized by Csiszar (1991), although his defense of it shall not concern us here. By ℓ_t we denote the log of the ratio of the likelihood of model one to the likelihood of model zero, given a data record of length t. For now, let the data be either a continuous or a discrete time sample. The relative entropy conditioned on x₀ is defined to be:

E(ℓ_t | x₀, model 1) = E[ℓ_t exp(ℓ_t) | x₀, model 0] = (d/dα) E[exp(α ℓ_t) | x₀, model 0] |_{α=1},  (16.19)
where we have assumed that the model one probability distribution is absolutely continuous with respect to the model zero probability distribution. To evaluate entropy, the second relation differentiates the moment-generating function for the log-likelihood ratio. The same information inequality that justifies maximum likelihood estimation implies that relative entropy is nonnegative. When the model zero transition distribution is absolutely continuous with respect to the model one transition distribution, entropy collapses to zero as the length of the data record t → 0. Therefore, with a continuous data record, we shall use a concept of conditional relative entropy as a rate, specifically the time derivative of (16.19). Thus, as a local counterpart to (16.19), we have the following measure:

ε(g, h)(x) = g(x)′g(x)/2 + ∫ [1 − h(y, x) + h(y, x) log h(y, x)] η(dy|x),  (16.20)
where model zero is parameterized by (0, 1) and model one is parameterized by (g, h). The quadratic form g′g/2 comes from the diffusion contribution, and the term ∫ [1 − h(y, x) + h(y, x) log h(y, x)] η(dy|x) measures the discrepancy in the jump intensities and distributions. It is nonnegative because of the convexity of h log h in h.
Let Ω denote the space of all such perturbation pairs (g, h). Conditional relative entropy ε is convex in (g, h). It will be finite only when ∫ h(y, x)η(dy|x) < ∞.

Here θ > 0 is a penalty parameter. We are led to the following entropy penalty problem:

Problem A

J(V) = inf_{(g,h)∈Ω} {θ ε(g, h) + G(g, h)V}.  (16.21)
Theorem 16.1. Suppose that (i) V is in C̃², and (ii) ∫ exp[−V(y)/θ] η(dy|x) < ∞ for all x. The minimizer of Problem A is

ĝ(x) = −(1/θ) Λ(x)′ ∂V(x)/∂x,
ĥ(y, x) = exp{[V(x) − V(y)]/θ}.  (16.22a)

The optimized value of the criterion is:

J(V) = −θ G[exp(−V/θ)] / exp(−V/θ).  (16.22b)

Finally, the implied measure of conditional relative entropy is:

ε* = (V G[exp(−V/θ)] − G[V exp(−V/θ)] − θ G[exp(−V/θ)]) / (θ exp(−V/θ)).  (16.22c)
Proof. The proof is in Appendix A. The formulas (16.22a) for the distortions will play a key role in our applications to asset pricing and statistical detection.
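The diffusion part of the Theorem 16.1 minimizer is easy to check numerically. The sketch below minimizes the diffusion portion of the Problem A objective, θ g′g/2 + (Λg) · V_x, by brute force over a grid and compares the grid minimizer with ĝ = −(1/θ)Λ′V_x from (16.22a); the matrix Λ, gradient V_x, and θ are invented for illustration.

```python
import numpy as np

# Hypothetical two-factor diffusion at a fixed state x; all numbers are
# illustrative assumptions, not values from the chapter.
theta = 2.0
Lam = np.array([[0.4, 0.0], [0.1, 0.3]])   # diffusion coefficient Lambda(x)
Vx = np.array([1.5, -0.8])                 # value-function gradient dV/dx

# Diffusion part of the Problem A objective: theta*g'g/2 + (Lam g).Vx
objective = lambda g: theta * g @ g / 2 + (Lam @ g) @ Vx

# Closed-form minimizer from (16.22a): g_hat = -(1/theta) Lam' Vx
g_hat = -(Lam.T @ Vx) / theta

# Brute-force search over a grid of candidate perturbations
grid = np.linspace(-2, 2, 201)
best = min(((objective(np.array([a, b])), (a, b)) for a in grid for b in grid))
```

The grid minimizer coincides with ĝ, and the first-order condition θg + Λ′V_x = 0 holds exactly at ĝ.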
16.5.4. Risk-sensitivity as an alternative interpretation

In light of Theorem 16.1, our modified version of Lyapunov equation (16.18) is

δV(x) = min_{(g,h)∈Ω} {U[x, i(x)] + θ ε(g, h) + G(g, h)V(x)}
      = U[x, i(x)] − θ G[exp(−V/θ)](x) / exp[−V(x)/θ].  (16.23)
If we ignore the minimization prompted by fear of model misspecification and instead simply start with the modified Lyapunov equation as a description of preferences, then replacing GV in Lyapunov equation (16.18) with −θ G[exp(−V/θ)]/exp(−V/θ) can be interpreted as adjusting the continuation value for risk. For undiscounted problems, the connection between risk sensitivity and robustness is developed in a literature on risk-sensitive control (e.g. see James (1992) and Runolfsson (1994)). Hansen and Sargent's (1995) recursive formulation of risk sensitivity accommodates discounting. The connection between the robustness and risk-sensitivity interpretations is most evident when G = G_d, so that x is a diffusion. Then

−θ G_d[exp(−V/θ)] / exp(−V/θ) = G_d(V) − (1/2θ) (∂V/∂x)′ Σ (∂V/∂x).

In this case, (16.23) is a partial differential equation. Notice that −1/(2θ) scales (∂V/∂x)′Σ(∂V/∂x), the local variance of the value function process {V(x_t)}. The interpretation of (16.23) under risk-sensitive preferences would be that the decision maker is concerned not about robustness but about both the local mean and the local variance of the continuation value process. The parameter θ is inversely related to the size of the risk adjustment: larger values of θ imply a smaller concern about risk. The term 1/θ is the so-called risk-sensitivity parameter. Runolfsson (1994) deduced the δ = 0 (ergodic control) counterpart to (16.23) to obtain a robust interpretation of risk sensitivity. Partial differential equation (16.23) is also a special case of the equation system that Duffie and Epstein (1992), Duffie and Lions (1992), and Schroder and Skiadas (1999) have analyzed for stochastic differential utility. They showed that for diffusion models the recursive utility generalization introduces a variance multiplier that can be state dependent. The counterpart to this multiplier in our setup is state independent and equal to the risk-sensitivity parameter 1/θ. For a robust decision maker, this variance multiplier restrains entropy between the approximating and alternative models. The mathematical connections between robustness, on the one hand, and risk sensitivity and recursive utility, on the other, let us draw on a set of analytical results from those literatures.17
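The displayed identity can be verified numerically for a scalar diffusion. In the sketch below, the drift μ, volatility σ (so that G_d φ = μφ′ + ½σ²φ″ and Σ = σ²), the quadratic value function V, and θ are all illustrative choices, not values from the chapter; the generator is applied via central finite differences.

```python
import numpy as np

# Scalar-diffusion check of
#   -theta * Gd[exp(-V/theta)] / exp(-V/theta) = Gd(V) - (1/(2 theta)) sigma^2 (V')^2
mu, sigma, theta = 0.3, 0.5, 2.0        # invented primitives
V = lambda x: 1.0 + 0.7 * x + 0.25 * x**2
Vp = lambda x: 0.7 + 0.5 * x            # V'
Vpp = 0.5                               # V''

def Gd(f, x, eps=1e-4):
    """Apply mu f' + (sigma^2/2) f'' using central differences."""
    fp = (f(x + eps) - f(x - eps)) / (2 * eps)
    fpp = (f(x + eps) - 2 * f(x) + f(x - eps)) / eps**2
    return mu * fp + 0.5 * sigma**2 * fpp

x = 0.8
lhs = -theta * Gd(lambda z: np.exp(-V(z) / theta), x) / np.exp(-V(x) / theta)
rhs = mu * Vp(x) + 0.5 * sigma**2 * Vpp - sigma**2 * Vp(x)**2 / (2 * theta)
```

The two sides agree up to finite-difference error, illustrating how the robustness adjustment penalizes the local variance of the continuation value.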
16.5.5. The θ-constrained worst-case model

Given a value function, Theorem 16.1 reports the formulas for the distortions (ĝ, ĥ) of a worst-case model used to enforce robustness. This worst-case model is Markov and is depicted in terms of the value function. The theorem thus gives us a generator Ĝ and shows us how to fill out the second row of Table 16.1. In fact, a separate argument is needed to show formally that Ĝ does in fact generate a Feller process, or more generally a Markov process. There is a host of alternative sufficient conditions in the probability theory literature. Kunita (1969) gives one of the more general treatments of this problem and goes outside the realm of Feller semigroups. Also, Ethier and Kurtz (1985: Chapter 8) give some sufficient conditions for operators to generate Feller semigroups, including restrictions on the jump component of the operator. Using the Theorem 16.1 characterization of Ĝ, we can apply Theorem 16.3 to obtain the generator of a detection semigroup that measures the statistical discrepancy between the approximating model and the worst-case model.

16.5.6. An alternative entropy constraint

We briefly consider an alternative but closely related way to compute worst-case models and to enforce robustness. In particular, we consider:

Problem B

J*(V) = inf_{(g,h)∈Ω, ε(g,h)≤ε̄} G(g, h)V.  (16.24)
This problem has the same solution as that given by Problem A, except that θ must now be chosen so that the relative entropy constraint is satisfied; that is, θ should be chosen so that ε(ĝ, ĥ) satisfies the constraint. The resulting θ will typically depend on x. The optimized objective must now be adjusted to remove the penalty:

J*(V) = J(V) − θ ε* = (G[V exp(−V/θ)] − V G[exp(−V/θ)]) / exp(−V/θ),

which follows from (16.22c). These formulas simplify greatly when the approximating model is a diffusion. Then θ satisfies

θ² = (1/(2ε̄)) (∂V(x)/∂x)′ Σ(x) (∂V(x)/∂x).

This formulation embeds a version of the continuous-time preference order that Chen and Epstein (2001) proposed to capture uncertainty aversion. We had also suggested the diffusion version of this robust adjustment in our earlier paper (Anderson et al., 1998).
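In the diffusion case the constraint formulation can be implemented by choosing θ so that the entropy constraint binds. A minimal sketch, in which Σ = ΛΛ′, the gradient V_x, and the bound ε̄ are invented for illustration:

```python
import numpy as np

# Diffusion case of Problem B: choose theta so the entropy constraint binds.
Lam = np.array([[0.4, 0.0], [0.1, 0.3]])   # illustrative Lambda(x)
Sigma = Lam @ Lam.T
Vx = np.array([1.5, -0.8])                 # illustrative dV/dx
eps_bar = 0.05                             # entropy bound

# theta^2 = (1/(2 eps_bar)) Vx' Sigma Vx
theta = np.sqrt(Vx @ Sigma @ Vx / (2 * eps_bar))
g_hat = -(Lam.T @ Vx) / theta              # worst-case drift distortion (16.22a)
entropy = g_hat @ g_hat / 2                # diffusion part of (16.20)
```

By construction the entropy of the implied worst-case drift distortion equals the bound ε̄, so the Lagrange multiplier interpretation of θ is exact here.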
16.5.7. Enlarging the class of perturbations

In this chapter we focus on misspecifications or perturbations to an approximating Markov model that are themselves Markov models. But in HSTW, we took a more general approach and began with a family of absolutely continuous perturbations to an approximating model that is a Markov diffusion. Absolute continuity over finite intervals puts a precise structure on the perturbations, even when the Markov specification is not imposed on them. As a consequence, HSTW follow James (1992) by considering path-dependent specifications ∫₀ᵗ g_s ds of the drift of the Brownian motion, where g_s is constructed as a general function of past x's. Given the Markov structure of this control problem, its solution can be represented as a time-invariant function of the state vector x_t, which we denote ĝ_t = ĝ(x_t).

16.5.8. Adding controls to the original state equation

We now allow the generator to depend on a control vector. Consider an approximating Markov control law of the form i(x) and let the generator associated with an approximating model be G(i). For this generator, we introduce a perturbation (g, h) as before and write the corresponding generator as G(g, h, i). To attain a robust decision rule, we use the Bellman equation for a two-player zero-sum Markov multiplier game:

δV = max_i min_{(g,h)∈Ω} {U(x, i) + θ ε(g, h) + G(g, h, i)V}.  (16.25)
The Bellman equation for a corresponding constraint game is:

δV = max_i min_{(g,h)∈Ω(i), ε(g,h)≤ε̄} {U(x, i) + G(g, h, i)V}.
Sometimes infinite-horizon counterparts to terminal conditions must be imposed on the solutions to these Bellman equations. Moreover, application of a Verification Theorem will be needed to guarantee that the implied control laws actually solve the game. Finally, these Bellman equations presume that the value function is twice continuously differentiable. It is well known that this differentiability is not always present in problems in which the diffusion matrix can be singular. In these circumstances there is typically a viscosity generalization to each of these Bellman equations with very similar structures. (See Fleming and Soner (1991) for a development of the viscosity approach to controlled Markov processes.)
16.6. Portfolio allocation

To put some of the results of Section 16.5 to work, we now consider a robust portfolio problem. In Section 16.7 we will use this problem to exhibit how asset prices can be deduced from the shadow prices of a robust resource allocation problem. We depart somewhat from our previous notation and let {x_t : t ≥ 0} denote a state vector that is exogenous to the individual investor. The investor influences
the evolution of his wealth, which we denote by w_t. Thus the investor's composite state at date t is (w_t, x_t). We first consider the case in which the exogenous component of the state vector evolves as a diffusion process. Later we let it be a jump process. Combining the diffusion and jump pieces is straightforward. We focus on the formulation with the entropy penalty used in Problem (16.21), but the constraint counterpart is similar.

16.6.1. Diffusion

An investor confronts asset markets that are driven by a Brownian motion. Under an approximating model, the Brownian increment factors have date t prices given by π(x_t), and x_t evolves according to a diffusion:

dx_t = μ(x_t) dt + Λ(x_t) dB_t.  (16.26)
Equivalently, the x process has a generator G_d that is a second-order differential operator with drift μ and diffusion matrix Σ = ΛΛ′. A control vector b_t entitles the investor to an instantaneous payoff b_t · dB_t with a price π(x_t) · b_t in terms of the consumption numeraire. This cost can be positive or negative. Adjusting for cost, the investment has payoff −π(x_t) · b_t dt + b_t · dB_t. There is also a market in a riskless security with an instantaneous risk-free rate ρ(x). The wealth dynamics are therefore

dw_t = [w_t ρ(x_t) − π(x_t) · b_t − c_t] dt + b_t · dB_t,  (16.27)
where c_t is date t consumption. The control vector is i = (b′, c). Only consumption enters the instantaneous utility function. By combining (16.26) and (16.27), we form the evolution of a composite Markov process. But the investor has doubts about this approximating model and wants a robust decision rule. Therefore he solves a version of game (16.25) with (16.26) and (16.27) governing the dynamics of his composite state vector (w, x). With only the diffusion component, the investor's Bellman equation is

δV(w, x) = max_{(c,b)} min_g {U(c) + θ ε(g) + G(g, b, c)V},

where G(g, b, c) is constructed using the drift vector

[ μ(x) + Λ(x)g
  wρ(x) − π(x) · b − c + b · g ]

and the diffusion coefficient matrix

[ Λ(x)
  b′ ].

The choice of the worst-case shock g satisfies the first-order condition:

θg + V_w b + Λ(x)′V_x = 0,  (16.28)
A quartet of semigroups for model specification
385
where V_w = ∂V/∂w and similarly for V_x. Solving (16.28) for g gives a special case of the formula in (16.22a). The resulting worst-case shock would depend on the control vector b. In what follows we seek a solution that does not depend on b. The first-order condition for consumption is V_w(w, x) = U_c(c), and the first-order condition for the risk allocation vector b is

−V_w π + V_ww b + Λ′V_xw + V_w g = 0.  (16.29)
In the limiting case in which the robustness penalty parameter is set to ∞, we obtain the familiar result that

b = (π V_w − Λ′V_xw) / V_ww,

in which the portfolio allocation rule has a contribution from risk aversion measured by −V_w/(w V_ww) and a hedging demand contributed by the dynamics of the exogenous forcing process x. Take the Markov perfect equilibrium of the relevant version of game (16.25). Provided that V_ww is negative, the same equilibrium decision rules prevail no matter whether one player or the other chooses first, or whether they choose simultaneously. The first-order conditions (16.28) and (16.29) are linear in b and g. Solving these two linear equations gives the control laws for b and g as functions of the composite state (w, x):

b̂ = (θ π V_w − θ Λ′V_xw + V_w Λ′V_x) / (θ V_ww − (V_w)²),
ĝ = (V_w Λ′V_xw − (V_w)² π − V_ww Λ′V_x) / (θ V_ww − (V_w)²).  (16.30)
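Because (16.28) and (16.29) are linear in (b, g), the rules in (16.30) can be checked by solving the stacked linear system directly. In the sketch below, π, Λ′V_x, Λ′V_xw, V_w, V_ww, and θ are hypothetical numbers chosen only to make the system well posed:

```python
import numpy as np

# Solve the linear first-order conditions (16.28)-(16.29) for (g_hat, b_hat)
# and compare with the closed forms in (16.30). All inputs are illustrative.
theta = 3.0
pi = np.array([0.30, 0.10])          # factor risk prices pi(x)
LtVx = np.array([0.50, -0.20])       # Lambda(x)' V_x
LtVxw = np.array([0.05, 0.02])       # Lambda(x)' V_xw
Vw, Vww = 1.2, -0.9                  # wealth derivatives of V

n = len(pi)
I = np.eye(n)
# (16.28): theta*g + Vw*b = -Lambda'Vx ; (16.29): Vw*g + Vww*b = Vw*pi - Lambda'Vxw
A = np.block([[theta * I, Vw * I], [Vw * I, Vww * I]])
rhs = np.concatenate([-LtVx, Vw * pi - LtVxw])
g_hat, b_hat = np.split(np.linalg.solve(A, rhs), 2)

D = theta * Vww - Vw**2              # common denominator in (16.30)
b_closed = (theta * pi * Vw - theta * LtVxw + Vw * LtVx) / D
g_closed = (Vw * LtVxw - Vw**2 * pi - Vww * LtVx) / D
```

The direct solve reproduces the closed-form rules whenever the determinant θV_ww − (V_w)² is nonzero, which holds here because V_ww < 0.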
Notice how the robustness penalty adds terms to the numerator and denominator of the portfolio allocation rule. Of course, the value function V also changes when we introduce θ. Notice also that (16.30) gives decision rules of the form

b̂ = b̂(w, x),  ĝ = ĝ(w, x),  (16.31)
and in particular how the worst-case shock g feeds back on the consumer's endogenous state variable w. Permitting g to depend on w expands the kinds of misspecifications that the consumer considers.

16.6.1.1. Related formulations

So far we have studied portfolio choice in the case of a constant robustness parameter θ. Maenhout (2001) considers portfolio problems in which the robustness
penalty depends on the continuation value. In his case, the preference for robustness is designed so that asset demands are not sensitive to wealth levels, as is typical in constant-θ formulations. Lei (2000) uses the instantaneous constraint formulation of robustness described in Section 16.5.6 to investigate portfolio choice. His formulation also makes θ state dependent, since θ now formally plays the role of a Lagrange multiplier that restricts conditional entropy at every instant. Lei specifically considers the case of incomplete asset markets, in which the counterpart to b has a lower dimension than the Brownian motion.

16.6.1.2. Ex post Bayesian interpretation

While the dependence of g on the endogenous state w seems reasonable as a way to enforce robustness, it can be unattractive if we wish to interpret the implied worst-case model as one with misspecified exogenous dynamics. It is sometimes asked whether a prescribed decision rule can be rationalized as being optimal for some set of beliefs and, if so, what those beliefs must be. The dependence of the shock distributions on an endogenous state variable such as wealth w might be regarded as a peculiar set of beliefs, because it is egotistical to let an adverse nature feed back on personal state variables. But there is a way to make this feature more acceptable. It requires using a dynamic counterpart to an argument of Blackwell and Girshick (1954). We can produce a different representation of the solution to the decision problem by forming an exogenous state vector W that conforms to the Markov perfect equilibrium of the game. We can confront a decision maker with this law of motion for the exogenous state vector, have him not be concerned with robustness against misspecification of this law by setting θ = ∞, and pose an ordinary decision problem in which the decision maker has a unique model. We initialize the exogenous state at W₀ = w₀.
The optimal decision processes for {(b_t, c_t)} (but not the control laws) will be identical for this decision problem and for game (16.25) (see HSTW). It can be said that this alternative problem gives a Bayesian rationale for the robust decision procedure.

16.6.2. Jumps

Suppose now that the exogenous state vector {x_t} evolves according to a Markov jump process with jump measure η. To accommodate portfolio allocation, introduce the choice of a function a that specifies how wealth changes when a jump takes place. Consider an investor who faces asset markets with date-state Arrow security prices given by Π(y, x_t), where {x_t} is an exogenous state vector with jump dynamics. In particular, a choice a with instantaneous payoff a(y) if the state jumps to y has a price ∫ Π(y, x_t)a(y)η(dy|x_t) in terms of the consumption numeraire. This cost can be positive or negative. When a jump does not take place, wealth evolves according to

dw_t = [ρ(x_{t−})w_{t−} − ∫ Π(y, x_{t−})a(y)η(dy|x_{t−}) − c_{t−}] dt,
where ρ(x) is the risk-free rate given state x and, for any variable z, z_{t−} = lim_{τ↑t} z_τ. If the state x jumps to y at date t, the new wealth is a(y). The Bellman equation for this problem is

δV(w, x) = max_{(c,a)} min_{h∈Ω} { U(c) + V_w(w, x)[ρ(x)w − ∫ Π(y, x)a(y)η(dy|x) − c]
  + θ ∫ [1 − h(y, x) + h(y, x) log h(y, x)] η(dy|x)
  + ∫ h(y, x)(V[a(y), y] − V(w, x)) η(dy|x) }.

The first-order condition for c is the same as for the diffusion case and equates V_w to the marginal utility of consumption. The first-order condition for a requires

ĥ(y, x) V_w[â(y), y] = V_w(w, x) Π(y, x),

and the first-order condition for h requires

−θ log ĥ(y, x) = V[â(y), y] − V(w, x).

Solving this second condition for ĥ gives the jump counterpart to the solution asserted in Theorem 16.1. Thus the robust â satisfies:

V_w[â(y), y] / V_w(w, x) = Π(y, x) / exp((−V[â(y), y] + V(w, x))/θ).

In the limiting no-concern-about-robustness case θ = ∞, ĥ is set to one. Since V_w is equated to the marginal utility of consumption, the first-order condition for a equates the marginal rate of substitution of consumption before and after the jump to the price Π(y, x). Introducing robustness scales the price by the jump distribution distortion. In this portrayal, the worst-case h depends on the endogenous state w, but it is again possible to obtain an alternative representation of the probability distortion that would give an ex post Bayesian justification for the decision process for a.
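The first-order condition for h can be checked pointwise: for each destination y, the Bellman objective is minimized over h by trading off the entropy penalty against the change in continuation value. The sketch below does this by grid search for one jump destination; the value gap and θ are invented numbers.

```python
import numpy as np

# Pointwise minimization over the jump distortion h:
# minimize theta*(1 - h + h*log h) + h*(V[a(y), y] - V(w, x)) for a fixed y.
theta, dV = 2.0, 0.7          # dV = V[a(y), y] - V(w, x); both values invented

obj = lambda h: theta * (1 - h + h * np.log(h)) + h * dV
h_grid = np.linspace(1e-4, 3.0, 300001)
h_star = h_grid[np.argmin(obj(h_grid))]

# Worst-case jump distortion implied by the first-order condition
# -theta*log(h_hat) = dV, i.e. the jump counterpart of Theorem 16.1
h_hat = np.exp(-dV / theta)
```

The grid minimizer agrees with ĥ = exp(−dV/θ): jumps that lower the continuation value (dV < 0) get weight above one, and vice versa.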
16.7. Pricing risky claims

By building on findings of Hansen and Scheinkman (2002), we now consider a third semigroup that is to be used to price risky claims. We denote this semigroup by {P_t : t ≥ 0}, where P_t φ assigns a price at date zero to a date t payoff φ(x_t). That pricing can be described by a semigroup follows from the Law of Iterated Values: a date 0 claim to the date t payoff φ(x_t) can be replicated by first buying a claim P_τ φ(x_{t−τ}) and then at time t − τ buying a claim φ(x_t). Like our other semigroups, this one has a generator, say Ḡ, that we write as in (16.10):

Ḡφ = −ρ̄φ + μ̄ · ∂φ/∂x + ½ trace(Σ̄ ∂²φ/∂x∂x′) + N̄φ,

where

N̄φ = ∫ [φ(y) − φ(x)] η̄(dy|x).
The coefficient ρ̄ on the level term is the instantaneous riskless yield, to be given in formula (16.34). It is used to price locally riskless claims. Taken together, the remaining terms,

μ̄ · ∂φ/∂x + ½ trace(Σ̄ ∂²φ/∂x∂x′) + N̄φ,

comprise the generator of the so-called risk-neutral probabilities. The risk-neutral evolution is Markov. As discussed by Hansen and Scheinkman (2002), we should expect there to be a connection between the semigroup underlying the Markov process and the semigroup that underlies pricing. Like the semigroup for Markov processes, a pricing semigroup is positive: it assigns nonnegative prices to nonnegative functions of the Markov state. We can thus relate the semigroups by importing the measure-theoretic notion of equivalence. Prices of contingent claims that pay off only on events of probability measure zero should be zero. Conversely, when the price of a contingent claim is zero, the event associated with that claim should occur only with measure zero; this states the principle of no-arbitrage. We can capture these properties by specifying that the generator Ḡ of the pricing semigroup satisfies:

μ̄(x) = μ(x) + Λ(x)π̄(x)
Σ̄(x) = Σ(x)
η̄(dy|x) = Π̄(y, x)η(dy|x),  (16.32)

where Π̄ is strictly positive. Thus we construct equilibrium prices by producing a triple (ρ̄, π̄, Π̄). We now show how to construct this triple both with and without a preference for robustness.

16.7.1. Marginal rate of substitution pricing

To compute prices, we follow Lucas (1978) and focus on the consumption side of the market. While Lucas used an endowment economy, Brock (1982) showed that the essential thing in Lucas's analysis was not the pure endowment feature. Instead it was the idea of pricing assets from marginal utilities that are evaluated at a candidate equilibrium consumption process that can be computed prior to computing prices. In contrast to Brock, we use a robust planning problem to generate a
candidate equilibrium allocation. As in Breeden (1979), we use a continuous-time formulation that provides simplicity along some dimensions.18

16.7.2. Pricing without a concern for robustness

First consider the case in which the consumer has no concern about model misspecification. Proceeding in the spirit of Lucas (1978) and Brock (1982), we can construct market prices of risk from the shadow prices of a planning problem. Following Lucas and Prescott (1971) and Prescott and Mehra (1980), we solve a representative agent planning problem to get a state process {x_t}, an associated control process {i_t}, and a marginal utility of consumption process {γ_t}. We let G∗ denote the generator for the state vector process that emerges when the optimal controls from the resource allocation problem with no concern for robustness are imposed. In effect, G∗ is the generator for the θ = ∞ robust control problem. We construct a stochastic discount factor process by evaluating the marginal rate of substitution at the proposed equilibrium consumption process:

mrs_t = exp(−δt) γ(x_t)/γ(x₀),

where γ(x) denotes the marginal utility of consumption as a function of the state x. Without a preference for robustness, the pricing semigroup satisfies

P_t φ(x) = E∗[mrs_t φ(x_t) | x₀ = x],  (16.33)
where the expectation operator E∗ is the one implied by G∗. Individuals solve a version of the portfolio problem described in Section 16.6 without a concern for robustness. This supports the following representation of the generator for the equilibrium pricing semigroup P_t:

ρ̄ = −G∗γ/γ + δ
μ̄ = μ∗ + Λ∗π̄ = μ∗ + Λ∗Λ∗′ (∂ log γ/∂x)
η̄(dy|x) = Π̄(y, x)η∗(dy|x) = [γ(y)/γ(x)] η∗(dy|x).  (16.34)

These are the usual rational expectations risk prices. The risk-free rate is the subjective rate of discount reduced by the local mean of the equilibrium marginal utility process, scaled by the marginal utility. The vector π̄ of Brownian motion risk prices consists of weights on the Brownian increment in the evolution of the marginal utility of consumption, again scaled by the marginal utility. Finally, the jump risk prices Π̄ are given by the equilibrium marginal rate of substitution between consumption before and after a jump.
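For a scalar diffusion without jumps, (16.34) can be computed directly: the chain μ̄ = μ∗ + Λ∗π̄ = μ∗ + Λ∗Λ∗′(∂ log γ/∂x) implies π̄ = Λ∗(∂ log γ/∂x) in one dimension. The primitives below (constant μ∗ and Λ∗, a power marginal utility γ, and δ) are illustrative stand-ins, not objects from the chapter.

```python
import numpy as np

# Rational-expectations risk prices (16.34), scalar diffusion, no jumps.
mu_s, lam_s, delta = 0.1, 0.3, 0.05     # invented mu*, Lambda*, discount rate
gamma = lambda x: x ** -2.0             # invented marginal utility of consumption
x = 1.5

eps = 1e-5
gp = (gamma(x + eps) - gamma(x - eps)) / (2 * eps)
gpp = (gamma(x + eps) - 2 * gamma(x) + gamma(x - eps)) / eps**2

G_star_gamma = mu_s * gp + 0.5 * lam_s**2 * gpp   # G* applied to gamma
rho_bar = -G_star_gamma / gamma(x) + delta         # instantaneous risk-free rate
pi_bar = lam_s * gp / gamma(x)                     # Brownian risk price
mu_bar = mu_s + lam_s * pi_bar                     # risk-neutral drift, per (16.32)
```

With γ(x) = x⁻², we have ∂ log γ/∂x = −2/x, so the computed π̄ should match Λ∗ · (−2/x) up to finite-difference error.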
16.7.3. Pricing with a concern for robustness under the worst-case model

As in our previous analysis, let G denote the approximating model. This is the model that emerges after imposing the robust control law î while assuming that there is no model misspecification (g = 0 and h = 1). It differs from G∗, which also assumes no model misspecification but instead imposes a rule derived without any preference for robustness. But simply attributing the beliefs G to private agents in (16.34) will not give us the correct equilibrium prices when there is a preference for robustness. Let Ĝ denote the worst-case model that emerges as part of the Markov perfect equilibrium of the two-player zero-sum game. However, formula (16.34) will yield the correct equilibrium prices if we in effect impute to the individual agents the worst-case generator Ĝ instead of G∗ as their model of state evolution when making their decisions without any concern about its possible misspecification. To substantiate this claim, we consider individual decision makers who, when choosing their portfolios, use the worst-case model Ĝ as if it were correct (i.e. they have no concern about the misspecification of that model, so that rather than entertaining a family of models, the individuals commit to the worst-case Ĝ as a model of the state vector {x_t : t ≥ 0}). The pricing semigroup then becomes

P_t φ(x) = Ê[mrs_t φ(x_t) | x₀ = x],  (16.35)
where Ê denotes the mathematical expectation with respect to the distorted measure described by the generator Ĝ. The generator for this pricing semigroup is parameterized by

ρ̄ = −Ĝγ/γ + δ
μ̄ = μ̂ + Λḡ = μ̂ + ΛΛ′ (∂ log γ/∂x)
η̄(dy|x) = h̄(y, x)η̂(dy|x) = [γ(y)/γ(x)] η̂(dy|x).  (16.36)
As in subsection 16.7.2, γ(x) is the marginal utility of consumption, except that here it is evaluated at the solution of the robust planning problem. Individuals solve the portfolio problem described in Section 16.6 using the worst-case model of the state {x_t}, with pricing functions π̄ = ḡ and Π̄ = h̄ specified relative to the worst-case model. We refer to ḡ and h̄ as risk prices because they are equilibrium prices that emerge from an economy in which individual agents use the worst-case model as if it were the correct model to assess risk. The vector ḡ contains the so-called factor risk prices associated with the vector of Brownian motion increments. Similarly, h̄ prices jump risk. Comparison of (16.34) and (16.36) shows that the formulas for factor risk prices and the risk-free rate are identical except that we have used the distorted generator Ĝ
in place of G∗. This comparison shows that we can use standard characterizations of asset pricing formulas if we simply replace the generator for the approximating model, G, with the distorted generator Ĝ.19

16.7.4. Pricing under the approximating model

There is another portrayal of prices that uses the approximating model G as a reference point and that provides a vehicle for defining model uncertainty prices and for distinguishing between the contributions of risk and model uncertainty. The ḡ and h̄ from subsection 16.7.3 give the risk components. We now use the discrepancy between G and Ĝ to produce the model uncertainty prices. To formulate model uncertainty prices, we consider how prices can be represented under the approximating model when the consumer has a preference for robustness. We want to represent the pricing semigroup as

P_t φ(x) = E[(mrs_t)(mpu_t) φ(x_t) | x₀ = x],  (16.37)
where mpu is a multiplicative adjustment to the marginal rate of substitution that allows us to evaluate the conditional expectation with respect to the approximating model rather than the distorted model. Instead of (16.34), to attain (16.37), we portray the drift and jump distortion in the generator for the pricing semigroup as

μ̄ = μ̂ + Λḡ = μ + Λ(ĝ + ḡ)
η̄(dy|x) = h̄(y, x) η̂(dy|x) = ĥ(y, x) h̄(y, x) η(dy|x).

Changing expectation operators in depicting the pricing semigroup will not change the instantaneous risk-free yield. Thus from Theorem 16.1 we have:

Theorem 16.2. Let V^p be the value function for the robust resource allocation problem. Suppose that (i) V^p is in C̃² and (ii) ∫ exp[−V^p(y)/θ] η(dy|x) < ∞ for all x. Moreover, γ is assumed to be in the domain of the extended generator Ĝ. Then the equilibrium prices can be represented by:

ρ̄ = −(Ĝγ)/γ + δ
π̄(x) = −(1/θ) Λ(x)′ V_x^p(x) + Λ(x)′ [γ_x(x)/γ(x)] = ĝ(x) + ḡ(x)
log Π̄(y, x) = −(1/θ)[V^p(y) − V^p(x)] + log γ(y) − log γ(x) = log ĥ(y, x) + log h̄(y, x).

This theorem follows directly from the relation between G and Ĝ given in Theorem 16.1 and from the risk prices of subsection 16.7.3. It supplies the third row of Table 16.1.
16.7.5. Model uncertainty prices: diffusion and jump components

We have already interpreted ḡ and h̄ as risk prices. Thus we view ĝ = −(1/θ)Λ′V_x^p as the contribution to the Brownian exposure prices that comes from model uncertainty. Similarly, we think of ĥ(y, x) = exp{−(1/θ)[V^p(y) − V^p(x)]} as the model uncertainty contribution to the jump exposure prices. HST obtained the additive decomposition for the Brownian motion exposure asserted in Theorem 16.2 as an approximation for linear-quadratic, Gaussian resource allocation problems. By studying continuous-time diffusion models we have been able to sharpen their results and relax the linear-quadratic specification of constraints and preferences.
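The decomposition above is simple enough to evaluate numerically. The following sketch uses inputs (θ, the loading matrix Λ, the gradients of V^p and log γ) that are purely illustrative rather than taken from any calibrated model, and simply shows how the Brownian exposure price splits into risk and model uncertainty components:

```python
import numpy as np

theta = 8.0                              # robustness parameter (illustrative)
Lam = np.array([[0.3, 0.0],
                [0.1, 0.2]])             # hypothetical diffusion loading Lambda(x)
Vx = np.array([-2.0, -0.5])              # gradient of the robust value function V^p at x
dlog_gamma = np.array([-1.5, -0.4])      # gradient of log gamma at x

g_hat = -(1.0 / theta) * Lam.T @ Vx      # model uncertainty component of the price
g_bar = Lam.T @ dlog_gamma               # risk component of the price
pi_bar = g_hat + g_bar                   # total price of Brownian exposure

# Jump component for a move from x to y: the model uncertainty contribution is
# h_hat(y, x) = exp(-(1/theta) * (V^p(y) - V^p(x))); the values below are made up.
Vp_x, Vp_y = -3.0, -2.2
h_hat = np.exp(-(Vp_y - Vp_x) / theta)
```

As θ grows the robustness penalty becomes prohibitive, ĝ and log ĥ shrink toward zero, and π̄ collapses to the pure risk price ḡ.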
16.7.6. Subtleties about decentralization

In Hansen and Sargent (2003), we confirm that the solution of a robust planning problem can be decentralized with households who also solve robust decision problems while facing the state-date prices that we derived above. We confront the household with a recursive representation of state-date prices, give the household the same robustness parameter θ as the planner, and allow the household to choose a new worst-case model. The recursive representation of the state-date prices is portrayed in terms of the state vector X for the planning problem. As in the portfolio problems of Section 16.6, among the households' state variables is their endogenously determined financial wealth, w. In equilibrium, the household's wealth can be expressed as a function of the state vector X of the planner. However, in posing the household's problem, it is necessary to include both wealth w and the state vector X that propels the state-date prices as distinct components of the household's state. More generally, it is necessary to include both economy-wide and individual versions of household capital stocks and physical capital stocks in the household's state vector, where the economy-wide components are used to provide a recursive representation of the state-date prices. Thus the controls and the worst-case shocks chosen by both the planner, on the one hand, and the households in the decentralized economy, on the other hand, will depend on different state vectors. However, in a competitive equilibrium, the decisions that emerge from these distinct rules will be perfectly aligned. That is, if we take the decision rules of the household in the decentralized economy and impose the equilibrium conditions requiring that "the representative agent be representative," then the decisions and the motion of the state will match. The worst-case models will also match.
In addition, although the worst-case models depend on different state variables, they coincide along an equilibrium path.

16.7.7. Ex post Bayesian equilibrium interpretation of robustness

In a decentralized economy, Hansen and Sargent (2003) also confirm that it is possible to compute robust decision rules for both the planner and the household
by (a) endowing each such decision maker with his own worst-case model, and (b) having each solve his decision problem without a preference for robustness, while treating those worst-case models as if they were true. Ex post it is possible to interpret the decisions made by a robust decision maker who has a concern about the misspecification of his model as also being made by an equivalent decision maker who has no concern about the misspecification of a different model that can be constructed from the worst-case model that is computed by the robust decision maker. Hansen and Sargent’s (2003) results thus extend results of HSTW, discussed in Section 16.6.1.2, to a setting where both a planner and a representative household choose worst-case models, and where their worst-case models turn out to be aligned.
16.8. Statistical discrimination

A weakness in what we have achieved up to now is that we have provided the practitioner with no guidance on how to calibrate our model uncertainty premia of Theorem 16.2 or, what formulas (16.22a) tell us is virtually the same thing, the decision maker's robustness parameter θ. It is at this critical point that our fourth semigroup enters the picture.20 Our fourth semigroup governs bounds on detection statistics that we can use to guide our thinking about how to calibrate a concern about robustness. We shall synthesize this semigroup from the objects in two other semigroups that represent alternative models that we want to choose between given a finite data record. We apply the bounds associated with distinguishing between the decision maker's approximating and worst-case models. In designing a robust decision rule, we assume that our decision maker worries about alternative models that available time series data cannot readily dispose of. Therefore, we study a stylized model selection problem. Suppose that a decision maker chooses between two models that we will refer to as zero and one. Both are continuous-time Markov process models. We construct a measure of how much time series data are needed to distinguish these models and then use it to calibrate our robustness parameter θ. Our statistical discrepancy measure is the same one that in Section 16.5 we used to adjust continuation values in a dynamic programming problem that is designed to acknowledge concern about model misspecification.
16.8.1. Measurement and prior probabilities We assume that there are direct measurements of the state vector {xt :0 ≤ t ≤ N } and aim to discriminate between two Markov models: model zero and model one. We assign prior probabilities of one-half to each model. If we choose the model with the maximum posterior probability, two types of errors are possible, choosing model zero when model one is correct and choosing model one when model zero is correct. We weight these errors by the prior probabilities and, following Chernoff (1952), study the error probabilities as the sample interval becomes large.
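The two classification errors and their prior-weighted average can be simulated directly. The sketch below deliberately replaces the chapter's continuous-time Markov models with the simplest stand-in, two i.i.d. Gaussian models that differ only in their mean; all numbers are illustrative, and the reweighting step anticipates the likelihood ratio device used in the next subsection:

```python
import numpy as np

rng = np.random.default_rng(0)
mu0, mu1, T, reps = 0.0, 0.3, 50, 20000   # model 0: N(0,1); model 1: N(0.3,1)

def loglik_ratio(x):
    # log likelihood of model one relative to model zero, summed over the sample
    return np.sum(((x - mu0) ** 2 - (x - mu1) ** 2) / 2.0)

# P(select model 1 | model 0 true): simulate under model zero
err0 = np.mean([loglik_ratio(rng.normal(mu0, 1.0, T)) > 0 for _ in range(reps)])
# P(select model 0 | model 1 true): simulate under model one
err1 = np.mean([loglik_ratio(rng.normal(mu1, 1.0, T)) < 0 for _ in range(reps)])

# The model-one error can also be computed under model zero by reweighting
# with the likelihood ratio exp(l):
ells = np.array([loglik_ratio(rng.normal(mu0, 1.0, T)) for _ in range(reps)])
err1_reweighted = np.mean((ells < 0) * np.exp(ells))

avg_error = 0.5 * err0 + 0.5 * err1       # errors weighted by the equal priors
```

With these illustrative parameters both error probabilities are on the order of fifteen percent, and the reweighted estimate agrees with the direct simulation under model one up to Monte Carlo noise.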
16.8.2. A semigroup formulation of bounds on error probabilities

We evade the difficult problem of precisely calculating error probabilities for nonlinear Markov processes and instead seek bounds on those error probabilities. To compute those bounds, we adapt Chernoff's (1952) large deviation bounds to discriminate between Markov processes. Large deviation tools apply here because the two types of error both get small as the sample size increases. Let G0 denote the generator for Markov model zero and G1 the generator for Markov model one. Both can be represented as in (16.13).

16.8.2.1. Discrimination in discrete time

Before developing results in continuous time, we discuss discrimination between two Markov models in discrete time. Associated with each Markov process is a family of transition probabilities. For any interval τ, these transition probabilities are mutually absolutely continuous when restricted to some event that has positive probability under both probability measures. If no such event existed, then the probability distributions would be orthogonal, making statistical discrimination easy. Let p_τ(y|x) denote the ratio of the transition density over a time interval τ of model one relative to that for model zero. We include the possibility that p_τ(y|x) integrates to a magnitude less than one using the model zero transition probability distribution. This would occur if the model one transition distribution assigned positive probability to an event that has measure zero under model zero. We also allow the density p_τ to be zero with positive model zero transition probability. If discrete time data were available, say x_0, x_τ, x_{2τ}, . . . , x_{Tτ} where N = Tτ, then we could form the log likelihood ratio:

ℓ_N^τ = Σ_{j=1}^{T} log p_τ(x_{jτ} | x_{(j−1)τ}).

Model one is selected when

ℓ_N^τ > 0,   (16.38)

and model zero is selected otherwise. The probability of making a classification error at date zero conditioned on model zero is

Pr{ℓ_N^τ > 0 | x_0 = x, model 0} = E(1_{{ℓ_N^τ > 0}} | x_0 = x, model 0).

It is convenient that the probability of making a classification error conditioned on model one can also be computed as an expectation of a transformed random variable conditioned on model zero. Thus,

Pr{ℓ_N^τ < 0 | x_0 = x, model 1} = E[1_{{ℓ_N^τ < 0}} exp(ℓ_N^τ) | x_0 = x, model 0]

L + π(H − L). At prices in between these two numbers the investor will not hold the asset. Figure 17.1 shows the expected payoff from buying and selling the asset as a function of p. This example illustrates how the expected value is computed under a nonadditive distribution. In this case, E(X) = L + π(H − L) (the details are given in the Appendix). It should be clear from the discussion that adding a constant to a random variable or multiplying it by a positive constant has the same effect on its
Figure 17.1 Expected gains from buying and selling short one unit of the asset. (The figure plots, against the price p, the expected gain from buying, L + π(H − L) − p, and the expected gain from a short sale, p − H + π′(H − L); the investor is long at prices below L + π(H − L) and short at prices above H − π′(H − L).)
Uncertainty aversion and optimal portfolio
423
expectation. On the other hand, this property does not hold for negative constants: −E(−X) = H − π′(H − L), so that −E(−X) > E(X). It is this inequality which gives rise to the interval of prices with no asset holdings. A closely related representation of decisions is to suppose that the agent evaluates expected utility for a set of prior (additive) probability distributions and acts to maximize the minimum of expected utility over these priors (see Gilboa and Schmeidler (1989)). At one extreme, the agent considers only one prior—a "known" distribution—and acts according to the standard theory of expected utility. At the other extreme, if all prior distributions over outcomes are considered, the agent considers only the worst possible outcome. In our example, we would consider a set of additive priors where the chance of a high return lies between π (at least) and 1 − π′ (at most). The payoff from buying a unit of the asset is then

Min{L + λ(H − L) − p | λ ∈ [π, 1 − π′]} = L + π(H − L) − p,

and from selling it short,

Min{p − H + λ(H − L) | λ ∈ [π′, 1 − π]} = p − H + π′(H − L).
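With illustrative numbers, the two minimizations above can be checked directly over the set of priors (a sketch; H, L, π and π′ are chosen arbitrarily):

```python
import numpy as np

H, L, pi, pi_p = 120.0, 80.0, 0.4, 0.4    # payoffs and capacities; pi + pi_p < 1
p = 100.0                                  # a price inside the no-trade interval

# Set of additive priors: probability of the high payoff, lam, in [pi, 1 - pi_p]
lams = np.linspace(pi, 1.0 - pi_p, 1001)

expected_payoff = L + lams * (H - L)       # E_lam X for each prior

buy_gain = np.min(expected_payoff - p)     # worst case for a long position
sell_gain = np.min(p - expected_payoff)    # worst case for a short position

EX = L + pi * (H - L)                      # = 96: buying threshold
mEmX = H - pi_p * (H - L)                  # = 104: short-sale threshold
```

At p = 100 both worst-case gains are negative (each equals −4 here), so the investor takes no position; only for p below 96 or above 104 does one side become profitable.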
17.3. Uncertainty aversion

We define a measure of uncertainty aversion, following an idea of Schmeidler (1989) for the case of two states of nature. The reader should refer as necessary to the Appendix for the notation, the definition of nonadditive probabilities, and a summary of their mathematical properties.

Definition 17.1. Let P be a probability and A ⊂ Ω an event. The uncertainty aversion of P at A is defined by c(P, A) = 1 − P(A) − P(A^c).

This number measures the amount of probability "lost" by the presence of uncertainty aversion. It gives the deviation of P from additivity at A. Notice that c(P, A) = c(P, A^c), which is natural.

Lemma 17.1. c(P, A) = 0 for all events A ⊂ Ω if, and only if, P is additive.

The proof is omitted.

Example 17.1. Constant Uncertainty Aversion. Let Ω be finite with n elements and let the event space be the power set of Ω, 2^Ω. For all ω ∈ Ω, set P({ω}) = (1 − c)/n, where c ∈ [0, 1]. For A ⊂ Ω, A ≠ Ω, define P(A) = Σ_{ω∈A} P({ω}). It is easy to verify that c(P, A) = c, ∀A ≠ Ω, Ø. In other words this is a distribution with constant uncertainty aversion. In general a nonadditive probability need not be so simple.
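For the capacity of Example 17.1 the computation of c(P, A) is mechanical; the sketch below checks it exhaustively for a small state space (the values of n and c are arbitrary):

```python
from itertools import chain, combinations

n, c = 4, 0.25
omega = frozenset(range(n))

def P(A):
    # Example 17.1: P(A) = (1 - c)|A|/n for A != Omega, and P(Omega) = 1
    A = frozenset(A)
    return 1.0 if A == omega else (1.0 - c) * len(A) / n

def uncertainty_aversion(A):
    # Definition 17.1: c(P, A) = 1 - P(A) - P(A complement)
    A = frozenset(A)
    return 1.0 - P(A) - P(omega - A)

proper = list(chain.from_iterable(combinations(range(n), k) for k in range(1, n)))
aversions = {A: uncertainty_aversion(A) for A in proper}
```

Every proper nonempty event has uncertainty aversion exactly c = 0.25, while c(P, Ω) = 1 − P(Ω) − P(Ø) = 0, as additivity at the trivial partition requires.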
424
James Dow and Sérgio Ribeiro da Costa Werlang
Example 17.2. Maximin Behavior. A person with extreme uncertainty aversion who is completely uninformed maximizes the payoff of the worst possible outcome. Suppose c(P, A) = 1 for all events A ≠ Ω, Ø. Then P(A) = 0 for all A ≠ Ω. Let u : R → R₊ be the utility function of the agent. Then

Eu = ∫ u dP = ∫₀^∞ P(u ≥ α) dα.

Let u̲ = inf_{x∈R} u(x). Then P(u ≥ u̲) = 1 and P(u ≥ u̲ + ε) = 0 ∀ε > 0. Therefore

Eu = ∫₀^{u̲} 1 dα = u̲ = inf_{x∈R} u(x).
This "maximin" rule was proposed by Wald (1950) for situations of complete uncertainty, and Ellsberg (1961) and Rawls (1971) also suggest that this rule should be considered in such circumstances. Simonsen (1986) is a recent application to the theory of inflationary inertia. We now proceed to extend this "local" measure of uncertainty aversion to the whole range of two nonadditive probabilities.

Definition 17.2. Given two nonadditive probabilities P and Q defined on the same space of events, we say that P is at least as uncertainty averse as Q if for all events A ⊂ Ω, c(P, A) ≥ c(Q, A).

The terminology is clumsy, but shorter than alternatives such as "P reflects at least as much perceived uncertainty as Q," etc. This definition allows us to formalize the statement that the gap between buying and selling prices increases as the uncertainty aversion increases.

Theorem 17.1. The following statements are equivalent: (i) P is at least as uncertainty averse as Q. (ii) For all random variables X for which the integrals are finite, −E_P(−X) − E_P X ≥ −E_Q(−X) − E_Q X.

Proof. (i) ⇒ (ii): Let A(α) = {ω ∈ Ω | X(ω) ≥ α}. Then

E_P X = ∫_{−∞}^{0} [P(A(α)) − 1] dα + ∫_{0}^{∞} P(A(α)) dα.

Notice that {ω ∈ Ω | −X(ω) > α} = A(−α)^c. Thus

E_P(−X) = ∫_{−∞}^{0} [P(A(−α)^c) − 1] dα + ∫_{0}^{∞} P(A(−α)^c) dα
        = ∫_{0}^{∞} [P(A(α)^c) − 1] dα + ∫_{−∞}^{0} P(A(α)^c) dα.
Hence

−E_P(−X) − E_P(X) = ∫_{−∞}^{∞} [1 − P(A(α)) − P(A(α)^c)] dα.

By the same argument,

−E_Q(−X) − E_Q(X) = ∫_{−∞}^{∞} [1 − Q(A(α)) − Q(A(α)^c)] dα.

Since P is at least as uncertainty averse as Q, the result follows immediately.
(ii) ⇒ (i): For all events A ∈ Σ, define the random variable X = 1_A (the characteristic function of the set A). Then E_P X = P(A), E_P(−X) = P(A^c) − 1, E_Q X = Q(A), and E_Q(−X) = Q(A^c) − 1. Applying (ii) to X, we get (i).

The next example illustrates the effect of uncertainty aversion on the difference between −E(−X) and E(X).

Example 17.3. Let X be a random variable with X̲ = inf_{ω∈Ω} X(ω) ≥ 0 and X̄ = sup_{ω∈Ω} X(ω) < ∞. Let P be an additive probability, and fix c ∈ [0, 1]. We define a nonadditive probability which is obtained by uniformly increasing the uncertainty aversion from P: let P_c(Ω) = 1, and P_c(A) = (1 − c)P(A) for A ≠ Ω. It is easy to verify that c(P_c, A) = c for all A ≠ Ω, Ø, and that

E_{P_c} X = cX̲ + (1 − c)E_P X   and   −E_{P_c}(−X) = cX̄ + (1 − c)E_P X.

Thus −E_{P_c}(−X) − E_{P_c} X = c(X̄ − X̲), which is increasing in the uncertainty aversion c in accordance with Theorem 17.1. Here we have taken an additive distribution and squeezed it uniformly. A risk-neutral agent whose behavior is represented by this distribution will maximize a weighted average of the worst possible outcome and the expectation of the additive distribution. Ellsberg (1961) suggested this as an ad hoc decision rule; this example provides some rationale for the rule.
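Example 17.3 can be verified numerically with a discrete Choquet integral, computed by sorting states in decreasing order of payoff and weighting each payoff by the marginal capacity it adds (the payoffs and the value of c below are arbitrary):

```python
def choquet(x, nu):
    # Discrete Choquet integral: visit states in decreasing order of payoff and
    # accumulate payoff * (capacity of top set - capacity of previous top set).
    order = sorted(x, key=lambda s: -x[s])
    total, prev, top = 0.0, 0.0, set()
    for s in order:
        top.add(s)
        cap = nu(frozenset(top))
        total += x[s] * (cap - prev)
        prev = cap
    return total

states = [0, 1, 2, 3]
P = {s: 0.25 for s in states}           # uniform additive probability
c = 0.3

def Pc(A):
    # the "squeezed" capacity of Example 17.3
    return 1.0 if len(A) == len(states) else (1.0 - c) * sum(P[s] for s in A)

x = {0: 1.0, 1: 3.0, 2: 5.0, 3: 9.0}    # X_min = 1, X_max = 9
EP = sum(P[s] * x[s] for s in states)   # additive expectation = 4.5

E_Pc = choquet(x, Pc)                              # = c*X_min + (1-c)*E_P X
neg = -choquet({s: -x[s] for s in states}, Pc)     # -E(-X) = c*X_max + (1-c)*E_P X
```

Both closed forms check out: E_{Pc}X = 0.3·1 + 0.7·4.5 = 3.45 and −E_{Pc}(−X) = 0.3·9 + 0.7·4.5 = 5.85, so the gap is c(X̄ − X̲) = 2.4.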
17.4. Portfolio choice

In this section we derive our main result, namely that there will be a range of prices, from E(X) to −E(−X), at which the investor has no position in the asset. At prices below these, the investor holds a positive amount of the asset, and at higher prices he holds a short position. Notice that this range of prices depends only on the beliefs and attitudes to uncertainty incorporated in the agent's prior, and not on the attitudes towards risk captured by the utility function. Let W > 0 be the investor's initial wealth, u ≥ 0 the utility function, and X a random variable with nonadditive distribution P. We assume that u is C², u′ > 0, and u″ ≤ 0.
Lemma 17.2. Suppose EX < ∞ and −E(−X²) < ∞. For λ ∈ R define f(λ) = Eu(W + λX). Then (i) f is right-differentiable at λ = 0; (ii) f′₊(0) = u′(W)EX.

The proof is omitted. We now proceed to the main result, namely the behavior of the risk-averse or risk-neutral agent under uncertainty aversion. Suppose the investor is faced with the problem of choosing the sum of money S he will invest in an asset. The present value of one unit of the asset next period is a random amount X with nonadditive probability distribution P. We characterize the demand for the asset as a function of the price.

Theorem 17.2. A risk-averse or risk-neutral investor with certain wealth W, who is faced with an asset which yields X per unit, whose price is p > 0 per unit, will buy the asset if p < EX and only if p ≤ EX. He will sell the asset if p > −E(−X) and only if p ≥ −E(−X).

Proof. By Jensen's inequality (see the Appendix), Eu(W − S + (S/p)X) ≤ u(E[W − S + (S/p)X]). If EX ≤ p, then E[W − S + (S/p)X] ≤ W (by property (iv) of the integral in the Appendix). Thus the investor is at least as well off not holding the asset, giving expected utility u(W), as buying any positive amount. Similarly if EX < p, no holding is strictly better than investing in the asset. We now show that if p < EX the investor will buy some of the asset. The investor's objective is to maximize g(S) = Eu(W − S + (S/p)X). By Lemma 17.2,

g′₊(0) = u′(W)E[(X − p)/p] > 0,

since EX > p. Thus the investor will buy a strictly positive amount of the asset. Similar arguments give the corresponding results for short sales. Notice that if u is not differentiable at some point W, then there is a range of prices with no trade even with an additive measure (if u is concave, the set of such points has measure zero).
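A numerical sketch of Theorem 17.2, using the two-state asset and the multiple-priors representation from earlier in the chapter (log utility and all numbers are arbitrary, and a grid search stands in for the exact optimization):

```python
import numpy as np

H, L, pi, pi_p = 120.0, 80.0, 0.35, 0.45   # so EX = 94 and -E(-X) = 102
W = 1000.0
u = np.log                                  # risk-averse utility
lams = np.linspace(pi, 1.0 - pi_p, 101)     # priors: probability of the high payoff

def worst_case_eu(S, p):
    # Worst-case expected utility of investing S at price p (negative S = short)
    w_hi = W - S + (S / p) * H
    w_lo = W - S + (S / p) * L
    return np.min(lams * u(w_hi) + (1.0 - lams) * u(w_lo))

def demand(p):
    S_grid = np.linspace(-300.0, 300.0, 601)
    values = [worst_case_eu(S, p) for S in S_grid]
    return S_grid[int(np.argmax(values))]
```

Here demand(90.0) is positive (90 < EX = 94), demand(105.0) is negative (105 > −E(−X) = 102), and at any price in between, such as 98, the optimal position is zero: the no-trade interval is pinned down by the capacities π and π′ alone, not by the curvature of u.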
Appendix

The mathematical treatment of nonadditive probabilities may be found in Schmeidler (1982, 1986, 1989), Choquet (1955), Dellacherie (1970), Gilboa (1987), Gilboa and Schmeidler (1989), Shafer (1976), and Dempster (1967). The reader is referred to these sources. In particular, Schmeidler (1986) contains only material related to the mathematical aspects of the theory. Let Ω be a set, and Σ an algebra, that is, a set of subsets of Ω such that (i) Ω ∈ Σ, (ii) A, B ∈ Σ ⇒ A ∪ B ∈ Σ, and (iii) A ∈ Σ ⇒ A^c ∈ Σ (here A^c means the set of elements of Ω not in A). Ω is the set of states of nature and the elements of Σ are called events. A function P : Σ → [0, 1] is a nonadditive probability
if (i) P(Ø) = 0, (ii) P(Ω) = 1, and (iii) P(A) ≤ P(B) if A ⊂ B. We impose an additional restriction (see Gilboa and Schmeidler (1989), Schmeidler (1986) and Shafer (1976)): (iv) ∀A, B ∈ Σ, P(A ∪ B) + P(A ∩ B) ≥ P(A) + P(B). In Section 17.3 of the chapter we show that this corresponds to uncertainty aversion. A real valued function X : Ω → R is said to be a random variable if for all open sets O of R, X⁻¹(O) ∈ Σ. The expected value of a random variable X is defined as:

EX = ∫ X dP = ∫_{−∞}^{0} (P(X ≥ α) − 1) dα + ∫_{0}^{∞} P(X ≥ α) dα,

whenever these integrals exist (in the improper Riemann sense) and are finite. Notice that since P(X ≥ α) = P(X > α) a.e., the expression for the expected value may also be written with strict inequalities. When it is necessary to distinguish between P and other distributions, we write E_P X. The following properties of the integral are either proved in the papers referred to previously, or else can be proved immediately:

(i) X ≥ Y ⇒ EX ≥ EY;
(ii) E(X + Y) ≥ EX + EY;
(iii) −E(−X) ≥ EX;
(iv) ∀a ≥ 0 and b ∈ R, E(aX + b) = aEX + b;
(v) For all concave functions u : R → R, Eu(X) ≤ u(EX) (Jensen's inequality).
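The defining integral is easy to evaluate exactly for a finite state space, since P(X ≥ α) is then a step function that jumps only at the values taken by X. The sketch below does so for an arbitrary convex capacity (a squared uniform probability, chosen purely for illustration) and confirms properties (ii) and (iii):

```python
states = [0, 1, 2]

def nu(A):
    # a convex ("distorted") capacity: nu(A) = (|A|/3)^2, so nu is monotone,
    # nu(empty) = 0, nu(Omega) = 1, and nu(AuB) + nu(AnB) >= nu(A) + nu(B)
    return (len(A) / 3.0) ** 2

def E(x, lo=-100.0, hi=100.0):
    # EX = int_{-inf}^0 (P(X >= a) - 1) da + int_0^inf P(X >= a) da, computed
    # exactly interval by interval between the jump points of the survival function
    breaks = sorted(set([lo, 0.0, hi] + list(x.values())))
    total = 0.0
    for a, b in zip(breaks, breaks[1:]):
        mid = (a + b) / 2.0
        s = nu({st for st in states if x[st] >= mid})
        total += (s - 1.0 if b <= 0.0 else s) * (b - a)
    return total

X = {0: -1.0, 1: 2.0, 2: 4.0}
Y = {0: 3.0, 1: 0.0, 2: 1.0}
XY = {s: X[s] + Y[s] for s in states}

EX, EY, EXY = E(X), E(Y), E(XY)
mEmX = -E({s: -X[s] for s in states})
```

Here EX = 5/9 while −E(−X) = 25/9, so (iii) holds strictly, and E(X + Y) = 7/3 exceeds EX + EY = 11/9, illustrating the superadditivity in (ii).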
Note ∗ This research was initiated while the authors were at the University of Pennsylvania.
References
Arrow, K. J. (1965). "The Theory of Risk Aversion," Chapter 2 of Aspects of the Theory of Risk Bearing. Helsinki: Yrjö Jahnssonin Säätiö.
Bernoulli, D. (1730). "Exposition of a New Theory of the Measurement of Risk" (in Latin), English translation in Econometrica, 21 (1953), 503–546.
Bewley, T. (1986). "Knightian Decision Theory, Part 1," Yale University.
Choquet, G. (1955). "Theory of Capacities," Annales de l'Institut Fourier, Grenoble, 5, 131–295.
Dellacherie, C. (1970). "Quelques Commentaires sur les Prolongements de Capacités," Séminaire de Probabilités V, Strasbourg. Berlin: Springer-Verlag, Lecture Notes in Mathematics 191.
Dempster, A. (1967). "Upper and Lower Probabilities Induced by a Multivalued Mapping," Annals of Mathematical Statistics, 38, 205–247.
Dow, J., V. Madrigal, and S. Werlang (1989). "Preferences, Common Knowledge and Speculative Trade," Working Paper, EPGE/Fundação Getúlio Vargas.
Ellsberg, D. (1961). "Risk, Ambiguity and the Savage Axioms," Quarterly Journal of Economics, 75, 643–669.
Gilboa, I. (1987). "Expected Utility Theory with Purely Subjective Non-Additive Probabilities," Journal of Mathematical Economics, 16, 65–88.
Gilboa, I. and D. Schmeidler (1989). "Maxmin Expected Utility with a Non-unique Prior," Journal of Mathematical Economics, 18, 141–153. (Reprinted as Chapter 6 in this volume.)
Knight, F. (1921). Risk, Uncertainty and Profit. Boston: Houghton Mifflin.
Neumann, J. von and O. Morgenstern (1947). Theory of Games and Economic Behavior. Princeton: Princeton University Press.
Rawls, J. (1971). A Theory of Justice. Cambridge: Harvard University Press.
Savage, L. J. (1954). The Foundations of Statistics. New York: John Wiley (2nd edn, 1972, New York: Dover).
Schmeidler, D. (1982). "Subjective Probability without Additivity (Temporary Title)," Foerder Institute for Economic Research Working Paper, Tel Aviv University.
—— (1986). "Integral Representation with Additivity," Proceedings of the American Mathematical Society, 97, 255–261.
—— (1989). "Subjective Probability and Expected Utility without Additivity," Econometrica, 57, 571–587. (Reprinted as Chapter 5 in this volume.)
Shafer, G. (1976). A Mathematical Theory of Evidence. Princeton: Princeton University Press.
Simonsen, M. H. (1986). "Rational Expectations, Income Policies and Game Theory," Revista de Econometria, 6, 7–46.
Wald, A. (1950). Statistical Decision Functions. New York: John Wiley.
18 Intertemporal asset pricing under Knightian uncertainty
Larry G. Epstein and Tan Wang
18.1. Introduction Modern asset pricing theory typically adopts strong assumptions about agents’ beliefs. According to the rational expectations hypothesis, for example, there exists an objective probability law describing the state process, and it is assumed that agents know this probability law precisely. More generally, even if existence of the latter is not assumed, each agent’s beliefs about the likelihoods of future states of the world are represented by a subjective probability measure or Bayesian prior, in conformity with the Bayesian model of decision-making and, more particularly, with the Savage (1954) axioms. As a result, no meaningful distinction is allowed between risk, where probabilities are available to guide choice, and uncertainty, where information is too imprecise to be summarized adequately by probabilities. In contrast, Knight (1921) emphasized the distinction between risk and uncertainty and argued that uncertainty is more common in economic decision-making.1 Particularly, in the context of asset prices, Keynes emphasized the importance of “animal spirits” when, because of Knightian uncertainty, individuals cannot estimate probabilities reliably and so cannot make a good calculation of expected values. (See Keynes (1936) and (1921: Ch. 6); see also Koppl (1991) for discussion and additional references.) This chapter provides a formal model of asset price determination in which Knightian uncertainty plays a role. Specifically, we extend the Lucas (1978) general equilibrium pure exchange economy by suitably generalizing the representation of beliefs. Two principal results are the proof of existence of equilibrium and the characterization of equilibrium prices by an “Euler inequality.” The latter represents the appropriate generalization of the standard Euler equation to the context of uncertainty. 
A noteworthy feature of our model is that uncertainty may lead to equilibria that are indeterminate; that is, there may exist a continuum of equilibria for given fundamentals. That leaves the determination of a particular equilibrium price process to “animal spirits” and sizable volatility may result.
Epstein, Larry G. and Tan Wang (1994) “Intertemporal asset pricing under Knightian uncertainty,” Econometrica, 62, 283–322.
430
Larry G. Epstein and Tan Wang
Overall our model conforms closely to Keynes' (1936: p. 154) description of the consequences of uncertainty: A conventional valuation which is established as the outcome of the mass psychology of a large number of ignorant individuals is liable to change violently as a result of a sudden fluctuation of opinion due to factors which do not really make much difference to the prospective yield; since there will be no strong roots of conviction to hold it steady. Besides the motivation provided by the intuitively appealing ideas of Knight and Keynes, our chapter is motivated also by evidence that people prefer to act on known rather than unknown or vague probabilities. For example, they typically prefer to bet on drawing a red ball from an urn containing 50 red balls and 50 black balls than from an urn containing 100 red and black balls in undisclosed proportions. The best known such evidence is the Ellsberg Paradox (Ellsberg (1961)); the large body of empirical evidence inspired by this paradox, both experimental and market-based, is surveyed by Camerer and Weber (1992). Behavior such as that exhibited in the context of the Ellsberg Paradox contradicts the Bayesian paradigm, that is the existence of any prior underlying choices. Intuitively, the reason is that a probability measure cannot adequately represent both the relative likelihoods of events and the amount, type, and reliability of the information underlying those likelihoods. On the other hand, in a multiperiod setting such as ours, one may wonder whether "vagueness" might disappear asymptotically as a result of learning by the agent, at least if the environment is stationary. Learning in the presence of uncertainty has not yet been studied sufficiently well to provide a definitive theoretical answer to this question. In any event, it would seem that economic processes are typically too complicated or unstable to be modeled in detail and understood precisely.
Thus we would not presume uncertainty to be strictly a short-run phenomenon. See Walley (1991: Ch. 5) for further arguments about the general prevalence of imprecision and Zarnowitz (1992: 61–63) for cogent arguments in a business cycle setting. In an asset pricing context, Barsky and DeLong (1992) argue that there is substantial uncertainty about the structure of the aggregate dividend process in the United States over the last century, even on the part of current analysts who have the benefit of hindsight. In addition, many processes of interest are presumably physically indeterminate; Papamarcou and Fine (1991) describe an empirical process that generates relative frequencies that can be modeled by a set of probability measures, but not by any single probability measure. Ultimately, our objective is to investigate whether the noted shortcoming of the Bayesian paradigm is at all responsible for any of the empirical failures of the consumption-based asset pricing model derived from Lucas (1978). While serious empirical analysis is beyond the scope of this chapter, we will address the empirical content of our model informally. We do so first in Section 18.3.4 where we indicate the potential usefulness of our model for resolving the excess volatility puzzle (Shiller (1981) and Cochrane (1991)). Further discussion of empirical content is provided in Section 18.4.
Intertemporal asset pricing
431
There are now available a number of extensions of the Bayesian model that admit a distinction between risk and uncertainty. One, due to Bewley (1986), drops Savage’s assumption that preferences are complete and adds a model of the “status quo.” An alternative direction, due to Gilboa and Schmeidler (1989), is to weaken Savage’s Sure-Thing Principle. The consequence for the representation of preferences and beliefs is that Savage’s single prior is replaced by a set of priors. In this chapter, we take this multiple-priors model as our starting point.2 Then, since our framework is intertemporal and since the Gilboa–Schmeidler (1989) framework is atemporal and deals exclusively with one-shot choice, we extend their model (nonaxiomatically) to an intertemporal, infinite-horizon setting. Moreover, this is done in a way that delivers two attractive properties of the standard expected additive utility model that dominates economics and finance—dynamic consistency and tractability. Since such an extension is potentially useful for addressing issues other than asset pricing where uncertainty may be important, we view it as a separate contribution of the chapter. While the rational expectations hypothesis has considerable a priori appeal for economists, it has come under scrutiny in recent years because of apparently contradictory empirical evidence. We have already mentioned the asset pricing anomalies that indicate rejection of a collection of joint hypotheses that includes rational expectations. In addition, where it has been tested separately by means of survey data, the rational expectations hypothesis has generally been rejected (e.g., see Cragg and Malkiel (1982), Zarnowitz (1984), Ito (1990), Frankel and Froot (1990)). As a result, models with “irrational expectations” have been developed, involving “fads” (Shiller (1991), Barsky and DeLong (1992)) or “noise traders” (DeLong et al. (1990)). 
A focus on beliefs is shared by the model proposed in this chapter, though, in a sense, we deviate much less from the standard Lucas-style model. One can interpret our model as differing from Lucas’ only by replacing the Sure-Thing Principle and its implied Bayesian prior, by the Gilboa–Schmeidler set of axioms, suitably adapted to the intertemporal framework, and the resulting set of priors. We proceed as follows: Section 18.2 describes our model of intertemporal utility, including beliefs. Equilibrium asset pricing is studied in Section 18.3. We conclude in Section 18.4 with some comments on the empirical content of our model. Technical details are collected in appendices.
18.2. Intertemporal utility

18.2.1. Background

The standard specification of utility over infinite horizon consumption processes is given by

U(c) = E[ Σ_{t=1}^{∞} β^t u(c_t) ],   (18.1)
or in recursive form

U(c) = u(c_1) + βEU(c_2, c_3, . . .).   (18.2)
Here E denotes the expectation operator conditional on available information; other notation is standard and will shortly be defined precisely in any event, as will the underlying stochastic environment. Beliefs about the likelihoods of future underlying states of the world are represented by a conditional probability measure π. In the rational expectations paradigm, π is an objective probability law that governs the evolution of states of the world and is assumed known to the decision-maker. An alternative justification for π is the Savage representation theorem according to which π is a subjective probability measure; an objective probability law need not exist in principle. In either approach, a role for Knightian uncertainty or imprecise information is excluded a priori, either because information is assumed to be precise, or, in the second approach, because the Savage axioms imply that imprecision is a matter of indifference to the decision-maker (as discussed later). Our objective is to investigate the implications of imprecise information and thus we need to adopt a more general representation for beliefs. In order to focus more sharply on our objective, we consider otherwise "minimal" variations of (18.1) and (18.2).

18.2.2. The environment and beliefs

The set of states is Ω, a compact metric space with Borel σ-algebra B(Ω). Under the weak convergence topology, M(Ω), the space of all Borel probability measures, is also a compact metric space. At time t the decision-maker observes some realization ω_t ∈ Ω. Beliefs about the evolution of the process {ω_t} conform to a time-homogeneous Markov structure. In standard models, this would involve a Markov probability kernel giving conditional probabilities. Here we assume that beliefs conditional on ω_t are too vague to be represented by a probability measure and are represented instead by a set of probability measures.
Thus we model beliefs by a probability kernel correspondence P, which is a (nonempty-valued) correspondence P : Ω → M(Ω), assumed to be continuous, compact-valued, and convex-valued. For each ω ∈ Ω, we think of P(ω) as the set of probability measures representing beliefs about next period’s state. However, the rigorous interpretation of P is as a component of the representation of the preference ordering over consumption processes, as described in the next subsection. Anticipating somewhat the noted representation of preferences, we adapt common terminology and refer to the multivalued nature of P as reflecting uncertainty aversion of preferences (see Schmeidler (1989: Proposition, p. 532)). In fact, the multivaluedness of P reflects both the presence of uncertainty and the agent’s aversion to uncertainty; for our purposes, there is no need to attempt to define a meaningful distinction between the “absence of uncertainty” on the one hand, and the presence of uncertainty accompanied by indifference to it on the other. If
Intertemporal asset pricing
433
P is singleton-valued, then P = {π}, where π is a probability kernel, that is, a continuous map from Ω into M(Ω). Since this Bayesian representation of beliefs excludes any role for uncertainty, we refer to uncertainty neutrality or indifference in this case. It will be convenient to adopt the following notation: for any bounded, Borel-measurable f : Ω → R and for any set P ⊂ M(Ω),

∫ f dP ≡ inf { ∫ f dm : m ∈ P },   (18.3)

and accordingly,

P(A) ≡ inf {m(A) : m ∈ P},   A ∈ B(Ω).   (18.4)
In particular, if P = P(ω) for some ω, then

P(ω, A) ≡ inf {m(A) : m ∈ P(ω)},   (18.5)

and for any continuous f,

∫ f(·) dP(ω, ·) ≡ ∫ f dP(ω) ≡ min { ∫ f dm : m ∈ P(ω) }.   (18.6)
Note that the latter minimum exists since P(ω) is compact and the map m ↦ ∫ f dm is continuous by the nature of the weak convergence topology. We stretch common terminology and refer to the expressions on the left sides of (18.3) and (18.6) as “integrals” or “expected values.”3 On occasion, we will want to impose a link between beliefs and “reality,” at least with respect to which events are null or impossible. Suppose therefore that objectively null events are defined in the obvious way by the probability kernel π*. It is not necessary to assume that the {ωt} process evolves according to π* or any other probability kernel. Say that P is absolutely continuous with respect to π* if for all ω ∈ Ω and A ∈ B(Ω),

π*(ω, A) = 0 ⇒ m(A) = 0   ∀m ∈ P(ω);   (18.7)
that is, the objective nullity of A (π*(ω, A) = 0) implies the subjective nullity of A, with “conditioning on ω” understood throughout. For other purposes, it is useful to have the following property satisfied for all ω ∈ Ω and all continuous functions f, g : Ω → R+:

f ≥ g, π*(ω, {ω′ : f(ω′) > g(ω′)}) > 0 ⇒ ∫ f dP(ω) > ∫ g dP(ω).   (18.8)

With this in mind, we assume where explicitly stated that P has full support, that is, m(A) > 0 for all ω ∈ Ω, m ∈ P(ω), and nonempty open subsets A ⊂ Ω. Given
this assumption, the indicated strict inequality for the integrals holds if f ≥ g and f ≠ g. Thus we avoid the expositional and notational clutter associated with qualifications of the form “a.e. [π*(ω, ·)]” in various definitions and statements of theorems stated later. Note that P(ω, A) > 0 ⇒ P(ω, Ω \ A) < 1. Therefore, the assumption of full support limits the class of subjectively null events and guarantees that, at least for open sets A, P(ω, Ω \ A) = 1 ⇒ A = Ø ⇒ π*(ω, A) = 0, which is the converse of the implication in (18.7).

18.2.3. Examples of probability kernel correspondences

Many natural and useful specifications of sets of priors have been studied in the statistics literature (see, e.g. Wasserman (1990), Wasserman and Kadane (1990), and Walley (1991)), and many of these are readily extended to probability kernel correspondences. Here we describe two such examples.

Example 18.1. (ε-Contamination) Fix a probability kernel π* and a continuous function ε : Ω → [0, 1]. Let P be defined by

P(ω) ≡ {(1 − ε(ω))π*(ω) + ε(ω)m : m ∈ M(Ω)}.   (18.9)
Then the associated integrals (18.6) and lower probabilities (18.4) take the form

∫ f dP(ω) = (1 − ε(ω)) ∫ f(ω′) π*(ω, dω′) + ε(ω) · inf f,   (18.10)

and for each B ∈ B(Ω),

P(ω, B) = (1 − ε(ω))π*(ω, B) if B ≠ Ω,   and   P(ω, B) = 1 if B = Ω.   (18.11)
The correspondence P has full support if ε < 1 and supp π*(ω) = Ω for all ω ∈ Ω. It reduces to the probability kernel π* if ε ≡ 0. The other extreme, called complete ignorance, has ε ≡ 1 and P(·) ≡ M(Ω), in which case ∫ f dP(ω) = inf f.
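The closed form (18.10) is easy to verify numerically on a finite state space; the sketch below (Python, invented numbers) exploits the fact that the infimum over contaminations m ∈ M(Ω) is attained at a point mass on a state minimizing f, so a search over point-mass contaminations recovers the formula:

```python
# Sketch: epsilon-contamination on a finite state space (illustrative numbers).
# P(omega) = {(1 - eps)*pi_star + eps*m : m any probability measure}, and the
# lower expectation (18.10) is (1 - eps)*E_pi_star[f] + eps*min(f).

f = [2.0, -1.0, 3.0, 0.5]            # a payoff on four states
pi_star = [0.25, 0.25, 0.25, 0.25]   # the "true" law, known only imprecisely
eps = 0.3                            # amount of error deemed possible

# Closed form (18.10):
closed_form = (1 - eps) * sum(fi * p for fi, p in zip(f, pi_star)) + eps * min(f)

# Brute force: the infimum over contaminations is attained at a Dirac measure,
# so it suffices to search over the point masses delta_j.
candidates = []
for j in range(len(f)):
    m = [(1 - eps) * pi_star[k] + (eps if k == j else 0.0) for k in range(len(f))]
    candidates.append(sum(fi * mi for fi, mi in zip(f, m)))
brute_force = min(candidates)

assert abs(closed_form - brute_force) < 1e-12
print(closed_form)
```

The worst-case contamination simply shifts mass ε(ω) onto the least favorable state, which is exactly the ε · inf f term in (18.10).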
The set P(ω) includes all perturbations of π*(ω, ·), where ε(ω) reflects the amount of error deemed possible. Accordingly, a possible rationale for this specification of P is that π* represents the “true” probability law on Ω that the agent knows only imprecisely.4 Other forms of perturbations are also possible, as suggested by the examples in the references cited earlier. An attractive feature of the particular perturbation represented by (18.9) is the explicit formula (18.10) available for associated integrals.

Example 18.2. (Belief Function Kernels) The set of states Ω is assumed to be exhaustive and therefore is presumably large and complex. Consequently, the law
of motion on Ω may be too complicated to be understood precisely, or alternatively may not be representable by a probability kernel. Suppose, however, that the agent observes N statistics, each a function of the current state, and that the probability law governing the dynamics of these statistics is known. More precisely, let

G : Ω → R^N   (18.12)
and let p be a probability kernel that describes the evolution of {G(ωt)} as a time-homogeneous Markov process; that is, p(·|y) is a conditional probability measure on G(Ω) that varies continuously with y ∈ G(Ω). We assume both that G is continuous and that the inverse y ↦ G−1(y) is a continuous correspondence. Since, as described later, pay-off relevant variables, such as consumption and dividends, vary with ωt rather than G(ωt), assessment of likelihoods over Ω is essential to the agent. It is important to note that p and G do not imply a probability kernel over Ω unless G is one-to-one. However, a representation of likelihoods in terms of a probability kernel correspondence may be constructed for arbitrary G in the following intuitively plausible fashion. For any ω ∈ Ω and B ∈ B(Ω), let

µ(ω, B) ≡ p({y ∈ R^N : G−1(y) ⊆ B} | G(ω)),   (18.13)
the probability according to p of those realizations for the statistics that imply B, conditional on the values of the statistics at ω. Now define PG by

PG(ω) ≡ {m ∈ M(Ω) : m(B) ≥ µ(ω, B), ∀B ∈ B(Ω)}.   (18.14)
Then PG is continuous and therefore is a probability kernel correspondence.5 To elucidate (18.14), note that for each ω ∈ Ω, µ(ω, ·) is a special capacity, called a belief function (Dempster (1967)), and PG(ω) is the “core” of µ(ω, ·); see also Wasserman (1988, 1990), Jaffray (1992), and Schmeidler (1989). Examination of the integration formulae implied by (18.14) provides further insight into the nature of PG. The analogues of (18.5)–(18.6) are

PG(ω, A) = µ(ω, A),   ω ∈ Ω,   A ∈ B(Ω),

and

∫ f dPG(ω) ≡ ∫ f*(y) dp(y|G(ω)),   (18.15)
where f* : G(Ω) → R is defined by f*(y) ≡ min{f(ω′) : G(ω′) = y}. Since f* is defined as the indicated minimum, the integral on the right side of (18.15) reflects the agent’s ignorance on each level set {ω′ : G(ω′) = y}. Thus PG models the situation where the law of motion p for the statistics G is the only information available regarding the law of motion on Ω.
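For concreteness, (18.15) can be sketched numerically (Python; the states, statistic G, payoff, and law p are all invented): the integrand is collapsed to its minimum f*(y) on each level set of G, and any selection of one state per level set induces a measure in the core whose expectation weakly exceeds the lower integral:

```python
# Sketch: the lower integral (18.15) for a belief-function kernel (illustrative
# numbers). States are 0..3; the statistic G is many-to-one, and p gives the
# (conditional) law of the statistic. On each level set G^{-1}(y) the agent is
# completely ignorant, so the integrand is replaced by its minimum f*(y).

states = [0, 1, 2, 3]
G = {0: 'a', 1: 'a', 2: 'b', 3: 'b'}   # two statistic values; level sets {0,1} and {2,3}
p = {'a': 0.6, 'b': 0.4}               # known law of the statistic G(omega)
f = {0: 5.0, 1: 2.0, 2: 7.0, 3: 3.0}   # a payoff on states

# f*(y) = minimum of f on the level set G^{-1}(y)
f_star = {y: min(f[w] for w in states if G[w] == y) for y in p}

# Lower integral (18.15): integrate f* against p; here 0.6*2 + 0.4*3 = 2.4
lower_integral = sum(f_star[y] * p[y] for y in p)

# Any selection y -> omega(y) in G^{-1}(y) induces a measure in the core P_G;
# its expectation is never below the lower integral (here 0.6*5 + 0.4*7 = 5.8).
selection = {'a': 0, 'b': 2}           # one (non-minimizing) selection
sel_expectation = sum(f[selection[y]] * p[y] for y in p)

assert sel_expectation >= lower_integral
print(lower_integral, sel_expectation)
```

Choosing the minimizing state on each level set recovers the lower integral exactly, which is the sense in which (18.15) is the worst case over the core.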
18.2.4. Utility

This subsection completes the description of the utility function over consumption processes, the first component of which is the probability kernel correspondence P. To define the domain of consumption processes, we need some notation and terminology. The measurable space underlying all random processes is (Ω^∞, B(Ω^∞)), where B(Ω^∞) is the product Borel σ-algebra. For ω ∈ Ω^∞ and t ≥ 1, ω^t ≡ (ω1, …, ωt); Ω^t is the collection of all such points. Let B(Ω^t) be the product Borel σ-algebra and embed it in the usual fashion in B(Ω^∞). A process {Xt}, Xt : Ω^∞ → R^n for each t, is adapted if Xt is B(Ω^t)-measurable for all t. Given such measurability, we can identify Xt with a map from Ω^t to R^n. If each such map is also continuous, refer to the process {Xt} as a continuous process. The process is real-valued if n = 1. Consumption processes lie in the complete normed space

D ≡ { X = {Xt} : {Xt} is an adapted and continuous real-valued process, Xt(ω^t) ≥ 0 for all t ≥ 1 and ω^t ∈ Ω^t, and ||X|| ≡ sup_t sup_{ω^t} |Xt(ω^t)| / b^t < ∞ },
where b ≥ 1 is a fixed real number that provides an upper bound for the average rate of growth of consumption. The restriction to adapted processes is natural; consumption at time t can depend only on information available then. The assumption of continuity is undoubtedly less natural. Nevertheless, it affords considerable analytical simplification and is important for the analysis of equilibrium asset pricing, and therefore seems appropriate in our attempt to balance mathematical generality with economic significance and accessibility.6 Consumption processes are typically denoted by c = {ct}. Since D will also be the ambient space for utility and price processes, the “neutral” dummy variable X is used earlier. An element X in D is Markovian if for each t and ωt ∈ Ω, Xt(·, ωt) is constant on Ω^{t−1}, and time-homogeneous if in addition Xt does not vary with t. Utilities over D are defined by three primitives: a probability kernel correspondence P, a discount factor β ∈ (0, 1), and an instantaneous utility or felicity function u : R+ → R+ assumed to be continuous, increasing, concave, and normalized to satisfy u(0) = 0. For each given c in D we define a utility process {Vt(c)}_{t=1}^∞ as the unique element of D satisfying the following recursive relation: for all t ≥ 1 and ω^t in Ω^t,

Vt(c; ω^t) = u(ct(ω^t)) + β ∫ V_{t+1}(c; ω^t, ·) dP(ω^t, ·),   (18.16)

where Vt(c; ω^t) denotes Vt(c)(ω^t). Think of Vt(c; ω^t) as the utility of the continuation consumption process tc ≡ (ct, c_{t+1}, …) conditional on the history ω^t. The (initial) utility of the entire path c is V1(c; ω1). The interpretation of (18.16) is clear. Given the history ω^t at time t, the individual evaluates the consumption process for the remaining future in two stages. First, the
future from (t + 1) onward is evaluated by means of the “expected value” of V_{t+1} with respect to beliefs P(ω^t). This summary index of the future is then combined with the instantaneous utility of time-t consumption to define the utility of the consumption process from t onward. If each P(ω^t) is a singleton probability measure, then (18.16) reduces to the standard model (18.2). Uncertainty aversion is incorporated into preferences in the general case by permitting P(ω^t) to be multivalued.7 By routine arguments based on the contraction mapping theorem, we show in Appendix A that utilities are well defined by (18.16). To state our theorem, adopt the notation

tc|ω^{t−1} ≡ {cτ(ω^{t−1}, ·)}_{τ=t}^∞ ∈ D,

for the continuation of c given the history ω^{t−1} preceding t. Also, if c and c′ are elements of D, write c′ > c if c′ ≠ c and c′t ≥ ct for all t; c′ ≫ c if c′ > c and there exists t such that c′t(ω^t) > ct(ω^t) for all ω^t ∈ Ω^t. Finally, if U : D → R+, say that U is increasing if c′ > c implies U(c′) ≥ U(c), and strictly increasing if c′ ≫ c implies U(c′) > U(c).

Theorem 18.1. (Existence of utility). Suppose that βb < 1. Then for each c ∈ D, there exists a unique V(c) ∈ D satisfying (18.16). Moreover, for all c, t ≥ 1, and ω^t,

Vt(c; ω^t) = V1(tc|ω^{t−1}; ω^t).   (18.17)
For each ω ∈ Ω, V1(·; ω) is increasing and concave on D; it is strictly increasing if P has full support. Finally, if u satisfies a growth condition, that is, if there exist constants k1 and k2 > 0 such that u(x) ≤ k1 + k2 x for all x ∈ R+, then V1(c; ω) is jointly continuous on D × Ω.

Condition (18.17) asserts that time-t utility equals a time-invariant function of the continuation consumption path tc|ω^{t−1} and the current state ωt. This follows from the time-homogeneous, first-order Markov structure assumed for beliefs. Since the time 1 designation is irrelevant, we denote V1(c; ω) simply by V(c; ω) and refer to V as the utility function defined by (18.16). By the last part of the theorem, V possesses some standard regularity properties. Note that the assumption βb < 1 is adopted throughout. Another important property of V, or at least of the entire utility process, is dynamic consistency. The recursive construction of utility via (18.16) suggests that dynamic consistency (suitably defined) will be satisfied. To be more precise, each Vt(·; ω^t) is a utility function over D; denote by {Vt} the corresponding process of utility functions. Say that {Vt} is dynamically consistent if for all ω1 ∈ Ω, c and c′ in D, and T ≥ 1, V1(c′; ω1) > V1(c; ω1) whenever (i) c′t = ct for t = 1, …, T − 1, (ii) VT(c′; ω1, ·) ≠ VT(c; ω1, ·), and (iii) VT(c′; ω1, ·) ≥ VT(c; ω1, ·) on Ω^{T−1}. Say that {Vt} is weakly dynamically consistent if (i)–(iii) imply only that V1(c′; ω1) ≥ V1(c; ω1). The stronger notion of dynamic consistency is the counterpart for our framework of the usual definition (e.g. Epstein and Zin (1989), Duffie
and Epstein (1992)). Only the weaker notion is satisfied in general by the process {Vt}, since the set of histories ω2, …, ω_{T−1} where VT(c′; ω1, ·) > VT(c; ω1, ·) could be “null” from the perspective of time 1 and the beliefs prevailing there, and thus V1(c′; ω1) = V1(c; ω1) is possible. That possibility is ruled out if P has full support, in which case dynamic consistency holds (see Appendix A). However, even if only weak dynamic consistency obtains, we show that our asset pricing model of Section 18.3 has an equilibrium along which optimal plans are carried out. Risk aversion for V is not mentioned earlier since it is well defined only given the existence of probabilities that can be used to define actuarial fairness. For that purpose, suppose there exist events in B(Ω) that can be assigned probabilities; that is, suppose B* is a sub-σ-algebra of B(Ω) such that for each ω, any two measures in P(ω) agree when restricted to B*. Then V has the form (18.1) on the subdomain of consumption processes defined by B*-measurability, and so is clearly risk averse there. Finally, in this subsection we relate our recursive model of utility to Gilboa and Schmeidler (1989) and argue that (18.16) represents a sensible extension of their atemporal model to an intertemporal framework. An alternative extension has the following form: there exists a correspondence K, with the set of measures K(ω^t) ⊂ M(Ω^∞) representing beliefs at (t, ω^t) about the entire future, such that intertemporal utility Ut is given by

Ut(c; ω^t) = ∫_{Ω^∞} [ Σ_{i=t}^∞ β^{i−t} u(ci) ] dK(ω^t) ≡ inf { ∫_{Ω^∞} [ Σ_{i=t}^∞ β^{i−t} u(ci) ] dm : m ∈ K(ω^t) }.   (18.18)
In comparing (18.16) and (18.18), note first that they coincide under uncertainty neutrality but not more generally. In particular, if P is a probability kernel, then, given ω^t, it determines a unique probability measure p(ω^t) on Ω^∞, and (18.18) is derived with K(ω^t) equal to the singleton {p(ω^t)}. However, such a derivation of (18.18) from (18.16) fails more generally, since the additivity property of Lebesgue integration with respect to a probability measure is not satisfied by “integration” with respect to a set of probability measures. Given that (18.16) and (18.18) represent distinct models of intertemporal utility, one is left wondering which is more attractive. A definitive judgment would presumably require an examination of the axiomatic underpinnings of each model.8 While such an examination is beyond the scope of this chapter, we can point to an axiomatic difference between the two models that is important and supportive of (18.16), at least when “time” is taken seriously. That feature is simply that {Ut} is generally weakly dynamically inconsistent. Therefore, in the absence of an explanation of how dynamic inconsistency is resolved, the model (18.18) does not deliver predictions about choice behavior. In an important sense, therefore, the model (18.18) is incomplete; in particular, it is not clear how it should be applied to describe consumption/savings behavior and asset price determination in the model
economy of Section 18.3. A game-theoretic resolution of dynamic inconsistency has been examined in related models, but the tractability of such an approach is a serious concern in the setting of Section 18.3. There is a closely related observation concerning (18.18) that also merits mention. One might think of adopting the specification (18.18) at t = 1 and then suitably updating the set of priors K(ω^t) as time proceeds. However, any updating rule will invariably imply the weak dynamic inconsistency of preferences, excluding a “small” number of arguably uninteresting specifications for K(ω^t), one of which is that K(ω^t) is a single probability measure (see Epstein and LeBreton (1993)). This difficulty reflects the problematic nature of rules for updating vague beliefs, which is now well recognized (see Walley (1991: 279–281), Gilboa and Schmeidler (1993), Jaffray (1992), Epstein and LeBreton (1993)). In contrast, our model of utility delivers weak dynamic consistency. By adopting “conditional beliefs,” represented by P, as the primitive, we obviate the need for an updating rule. Moreover, we feel that the recursive framework has some psychological plausibility because of the algorithmic appeal of backward induction.

18.2.5. Utility supergradients

Since we will be concerned later with the (shadow) pricing of securities, we are led naturally to an examination of the supergradients, suitably defined, of our utility function V. A novel feature of V, relative to utility functions that have generally been applied previously in the macro/finance literature, is that V(·; ω) is “frequently” nondifferentiable in the Gâteaux sense unless P is a probability kernel. However, since V(·; ω) is concave, it possesses one-sided Gâteaux derivatives. Here we derive representations for these one-sided derivatives and the associated supergradients. These representations are applied to the security valuation problem in Section 18.3.2.
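As a computational warm-up, the recursion (18.16) can be solved by successive approximation on a finite state space; the sketch below (Python) uses invented numbers, a two-point state space, and a brute-force minimum over finitely many candidate priors:

```python
# Sketch: computing the utility recursion (18.16) on a finite state space for a
# time-homogeneous Markovian consumption plan (all numbers invented). Here
# V(w) = u(c(w)) + beta * min over m in P(w) of sum_j V(j) m(j),
# iterated to its fixed point (a contraction since beta < 1).

import math

beta = 0.9
c = [1.0, 4.0]                       # consumption in each of two states
u = [math.sqrt(x) for x in c]        # felicity u(x) = sqrt(x), evaluated at c

# Probability kernel correspondence: a finite set of transition rows per state.
P = {
    0: [[0.5, 0.5], [0.8, 0.2]],     # P(0): candidate priors over next state
    1: [[0.3, 0.7], [0.6, 0.4]],     # P(1)
}

V = [0.0, 0.0]
for _ in range(500):                 # iterate the contraction to convergence
    V = [u[w] + beta * min(sum(V[j] * m[j] for j in range(2)) for m in P[w])
         for w in range(2)]

print(V)   # fixed point of (18.16); the worst-case prior is applied at each state
```

Note that the minimizing prior can differ across states and across candidate continuation values, which is exactly the source of the nondifferentiability studied next.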
Let e ∈ D represent a base consumption process that is everywhere strictly positive, and consider the effect on utility of perturbations in specified directions. It will suffice to consider perturbations in “today’s” and “next period’s” consumption only, that is, to consider the change from e to e + ξh, where ξ ∈ R and h = {ht}_{t=1}^∞ is a continuous real-valued process such that ht ≡ 0 for t ≠ 1, 2, h1 ∈ R, and h2 ∈ C(Ω). Note that e + ξh ∈ D for sufficiently small ξ. Therefore, V(e + ξh; ω) is defined for such ξ. Since V is defined via a minimum over a set of probability measures, as in (18.6) and (18.16), one-sided directional derivatives of V may be derived by an appropriate “envelope theorem.” The one-sided derivatives are described in Lemma 18.1, which is a special case of the envelope theorem result in Aubin (1979: 118). For simplicity, the Lemma deals with the case where e is Markovian and time-homogeneous.

Lemma 18.1. Let e ∈ D be a positive, Markovian, and time-homogeneous consumption process with et(ω^t) = e*(ωt). Let h = {ht}_{t=1}^∞ with ht = 0 for t ≠ 1, 2, h1 ∈ R, and h2 ∈ C(Ω). Define the convex-valued and compact-valued
correspondence Q : Ω → M(Ω) by

Q(ω) = { m ∈ P(ω) : ∫ V* dm = ∫ V* dP(ω) },   (18.19)

where

V*(ω) ≡ V(e; ω),   ω ∈ Ω.   (18.20)

Then the one-sided Gâteaux derivatives of V(·; ω) at e in the direction h are given by

(d/dξ) V(e + ξh; ω) |_{ξ=0+} = u′(e*(ω))h1 + β min { ∫ u′(e*)h2 dm : m ∈ Q(ω) },   (18.21)

(d/dξ) V(e + ξh; ω) |_{ξ=0−} = u′(e*(ω))h1 + β max { ∫ u′(e*)h2 dm : m ∈ Q(ω) }.
The Lemma suggests, and this will be confirmed by examples later, that utility is not Gâteaux differentiable in general. The “origin” of this nondifferentiability is clear: utility is defined via a pointwise minimum, namely the “integral” on the right side of (18.16), corresponding to the Gilboa–Schmeidler (1989) way of modeling uncertainty aversion, and a pointwise minimum of functions is not differentiable in general. The particular representation for one-sided derivatives provided in (18.21) is also intuitive for “envelope theorem” reasons. To elaborate, and in order to pave the way for its role in our study of asset prices, we spell out the following interpretation of Q: m ∈ Q(ω) if and only if m is (i) “compatible” with beliefs, in the sense of lying in P(ω), and (ii) “equivalent” to P(ω), in the sense of the calculation of expected future utility for the given base process e. Since any single prior reflects the absence of (or indifference to) uncertainty, the relation between P(ω) and each m ∈ Q(ω) is akin to that between a random pay-off and its certainty equivalent, familiar in the case of risk. Accordingly, refer to Q(ω) as the set of uncertainty-adjusted probability measures corresponding to P(ω), for the given e. A critical question for our purposes is whether the nondifferentiability suggested by (18.21) is likely to be sufficiently frequent to be “significant.” We postpone discussion of this question until Section 18.3.4, at which point the relevance of (18.16) for asset pricing will have been described. Finally, in this subsection, we provide an alternative formulation of Lemma 18.1. Let e and h be as stated earlier. Refer to s as a (one-period-ahead) supergradient
of V(·; ω) at e if s is a continuous linear functional on R × C(Ω) satisfying

V(e + h; ω) − V(e; ω) ≤ s(h1, h2)   (18.22)

for all (h1, h2) such that e + h ∈ D. Denote by M+(Ω) the space of positive countably additive measures on Ω, endowed with the weak topology induced by C(Ω). By the Riesz Representation Theorem, each s can be identified with an element (s1, p) ∈ R+ × M+(Ω) in the sense that

s(h1, h2) = s1 h1 + β ∫ h2 dp,   (h1, h2) ∈ R × C(Ω).

Lemma 18.1 shows that s1 = u′(e*(ω)) and dp = u′(e*) dm for some m ∈ Q(ω). Therefore the set of supergradients of V(·; ω) at e, denoted ∂V(e; ω) and viewed as a subset of R+ × M+(Ω), is given by

∂V(e; ω) = {(u′(e*(ω)), p) : p ∈ M+(Ω), ∃m ∈ Q(ω), dp = u′(e*) dm}.   (18.23)

The continuity of the correspondence ω ↦ ∂V(e; ω) is important in the proof of existence of an equilibrium in the economies to which we now turn.
18.3. Equilibrium asset pricing

18.3.1. The economy

We consider an extension of the Lucas (1978) pure exchange economy, having a representative agent, or equivalently a number of agents with identical preferences and endowments. Preferences are as stated earlier, with the exception that we add the assumptions that the felicity function u is strictly increasing and continuously differentiable, with u′(0) = ∞ admissible. Such a “minimal” variation of the Lucas model seems appropriate given our focus on the effects of uncertainty aversion. There is a single perishable consumption good, with the total supply available at any time and state described by the endowment process e = {et} ∈ D. For simplicity, assume that the endowment process has a time-homogeneous Markov structure, in the sense that for some function e*,

et(ω^t) = e*(ωt),   t ≥ 1,   ω^t ∈ Ω^t,   (18.24)
and that endowments are positive, that is,

e*(ω) > 0 on Ω.   (18.25)
There are n securities, where the ith provides the dividend process di = {di,t } ∈ D. In each period, the securities are traded in a competitive market at prices qi = {qi,t } ∈ D, i = 1, . . . , n, with consumption in each period serving as
numeraire.9 Write qt ≡ (q1,t, …, qn,t) and q = {qt} ∈ D^n. Without loss of generality, that is, by redefining e if necessary, we can assume that each asset is available in zero net supply at all times and states. At the beginning of each period, the consumer plans consumption and portfolio holdings for the current period and all future periods in order to maximize intertemporal utility. Plans are represented by a pair (c, θ), where c ∈ D and θ = {θt} is a continuous process with θt = (θ1,t, …, θn,t) representing the portfolio plan for period t. Consider a time–history pair (t, ω^t). Refer to (c, θ) as (t, ω^t)-feasible if for all τ ≥ t,

qτ · θτ + cτ = θτ−1 · [qτ + dτ] + eτ,   θt−1(ω^{t−1}) ≡ 0,   and   inf_{i,τ,ω^τ} θi,τ(ω^τ) > −∞,
where the latter is a weak restriction on short sales and θ0 ≡ 0.10 Say that the (t, ω^t)-feasible plan (c, θ) is (t, ω^t)-optimal if V(tc|ω^{t−1}; ω^t) ≥ V(tc′|ω^{t−1}; ω^t) for all other plans (c′, θ′) that are (t, ω^t)-feasible. An equilibrium is a price process {qt}_{1}^∞ ∈ D^n such that {(eτ, 0)}_{1}^∞ is a (t, ω^t)-optimal plan for all t ≥ 1 and ω^t ∈ Ω^t. In an equilibrium, spot asset and consumption good markets clear at any (t, ω^t) when the agent optimizes given expectations regarding future prices described by {qτ}_{τ=t+1}^∞; and subsequently, those expectations are fulfilled in that they clear later spot markets. Note that the consumer is dynamically consistent in equilibrium in the sense that the given (t, ω^t)-optimal plan remains optimal from the perspective of all later times and histories. A weaker notion of equilibrium would require only that {(et, 0)}_{1}^∞ be (1, ω1)-optimal. The relation between these two equilibrium notions is clarified in Theorem 18.2. The term “equilibrium” is reserved for the first definition.

18.3.2. Euler inequalities

In this section we derive necessary conditions for an equilibrium from the first-order conditions for the agent’s optimization problem. These conditions generalize the standard Euler equations; they take the form of inequalities rather than equalities, because V(·; ω) is generally nondifferentiable in the Gâteaux sense (see Section 18.2.5) unless P is a probability kernel. Suppose {qt} is an equilibrium. At any given (t, ω^t), consider a variation (c, θ) of the optimal policy such that cτ = eτ and θτ = 0 for τ ≠ t, t + 1, ct = et − ξ(η · qt), θt = ξη, θt+1 = 0, and ct+1 = et+1 + ξη · (qt+1 + dt+1), where η ∈ R^n represents the direction in which the period-t portfolio is perturbed and ξ ∈ R represents the “size” of the perturbation. Any such perturbation must leave the agent worse off. In other words, if

ht ≡ −η · qt   and   ht+1 ≡ η · (qt+1 + dt+1),
then in the obvious notation

0 ∈ argmax_ξ V(e + ξ(ht, ht+1, 0, …); ω^t).   (18.26)

By Lemma 18.1, the first-order conditions for this problem take the form11

β min_{m∈Q(ωt)} ∫ u′(e*) η · (qt+1 + dt+1) dm ≤ u′(et) η · qt ≤ β max_{m∈Q(ωt)} ∫ u′(e*) η · (qt+1 + dt+1) dm.
We can rewrite these inequalities in the equivalent and more compact form

min_{m∈Q(ωt)} { βEm[ (u′(et+1)/u′(et)) η · (qt+1 + dt+1) ] − η · qt } ≤ 0   ∀η ∈ R^n,   (18.27)

where Em denotes integration with respect to the probability measure m. We wish to express this infinite collection of inequalities in a more efficient and useful way. In usual formulations, where differentiability obtains, there is no loss in restricting η to the coordinate directions. Such an equivalence fails here, however, since the expression in (18.27) is not linear in η, or equivalently, the one-sided Gâteaux derivatives of V, described in Lemma 18.1, are not linear in the perturbation. Therefore, a slightly more elaborate procedure is required. First, denote the expression in braces in (18.27) by F(m, η) and rewrite (18.27) in the more manageable form

sup_η min_{m∈Q(ωt)} F(m, η) ≤ 0.
Since F(m, ·) is linearly homogeneous, this inequality is equivalent to

max_{η∈γ} min_{m∈Q(ωt)} F(m, η) ≤ 0,

where γ is the convex hull of {± the ith unit coordinate vector : i = 1, …, n}. By Fan’s Theorem (see Appendix B), the latter inequality is equivalent to

min_{m∈Q(ωt)} max_{η∈γ} F(m, η) ≤ 0.

By the Maximum Theorem, there exists m* ∈ Q(ωt) for which the minimum over Q(ωt) equals max_{η∈γ} F(m*, η) ≤ 0. By the linear homogeneity of F(m*, ·) and the fact that η ∈ γ ⇔ −η ∈ γ, we conclude that F(m*, η) = 0 for all η ∈ γ, and hence

min_{m∈Q(ωt)} max_{η∈γ} F(m, η) = 0.   (18.28)
Since for each m, F(m, ·) is linear, max{F(m, η) : η ∈ γ} is attained on the set of extreme points of γ. We arrive finally at the following system of Euler inequalities
that must be satisfied in equilibrium: for all (t, ω^t),

min_{m∈Q(ωt)} max_i | βEm[ (u′(et+1)/u′(et)) (qi,t+1 + di,t+1) ] − qi,t | = 0.   (18.29)
The presence of the minimization over Q(ωt) on the left side of (18.29) justifies our use of the term “inequalities” to refer to (18.29), in spite of the equality with zero. The inequality nature of (18.29) is highlighted in the single-asset case (n = 1), where it reduces, in the obvious notation, to

min_{m∈Q(ωt)} βEm[ (u′(et+1)/u′(et)) (qt+1 + dt+1) ] ≤ qt ≤ max_{m∈Q(ωt)} βEm[ (u′(et+1)/u′(et)) (qt+1 + dt+1) ].   (18.30)

When n > 1, (18.29) implies an inequality analogous to (18.30) for each asset, but this collection of n inequalities is not exhaustive, for the reasons given earlier concerning the nonlinearity of one-sided Gâteaux derivatives. Of course, if P is a probability kernel, then both P(ωt) and Q(ωt) are singletons and (18.29) reduces to the standard Euler equation

qi,t = βE_{P(ωt,·)}[ (u′(et+1)/u′(et)) (qi,t+1 + di,t+1) ],   for all i.

18.3.3. Equilibrium

The Euler inequalities are not only necessary, but they are also sufficient for an equilibrium; that is, any price process {qt} satisfying (18.29) is an equilibrium, as we show shortly. To establish the existence of solutions to (18.29), and therefore of equilibria, we need to restrict the probability kernel correspondence P. To formulate the added assumption, define the correspondence Qf from Ω into M(Ω), for any given f ∈ C(Ω), by

Qf(ω) ≡ argmin { ∫ f dm : m ∈ P(ω) }.   (18.31)

Assumption (Strict Feller property for P). Qf is a continuous correspondence for each f ∈ C(Ω).

If P is a probability kernel, then Qf is continuous since Qf = P. Another trivial case, termed i.i.d. beliefs, has P(ω) independent of ω; then Qf is constant and a fortiori continuous. The continuity of Qf is trivial also if Ω is finite and endowed with the discrete topology. More generally, we can infer from the continuity of P and the Maximum Theorem only that Qf is upper semi-continuous. The interpretation of the strict Feller property is facilitated by reference to Section 18.2.5.
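A small numerical sketch of the price interval implied by (18.30) (Python; invented endowments, dividends, and the i.i.d.-beliefs case just mentioned, where P(ω) does not depend on ω): iterating max- and min-versions of the Euler recursion to their fixed points produces upper and lower bound processes between which any equilibrium price of the asset must lie:

```python
# Sketch: upper and lower price bounds in the spirit of (18.30), on a finite
# state space with beliefs independent of the current state; all numbers are
# invented. Felicity is u(x) = log(x), so u'(e_{t+1})/u'(e_t) = e_t / e_{t+1}.

beta = 0.95
e = [1.0, 2.0]                       # endowment (= consumption) per state
d = [0.1, 0.3]                       # dividend of a single asset per state
P = [[0.6, 0.4], [0.3, 0.7]]         # set of priors over next period's state

def step(q, pick):
    """One Euler-recursion update; pick = max (upper bound) or min (lower bound)."""
    out = []
    for w in range(2):
        vals = [sum((e[w] / e[j]) * (q[j] + d[j]) * m[j] for j in range(2))
                for m in P]
        out.append(beta * pick(vals))
    return out

q_hi = [0.0, 0.0]
q_lo = [0.0, 0.0]
for _ in range(2000):                # iterate both recursions to their fixed points
    q_hi = step(q_hi, max)
    q_lo = step(q_lo, min)

print(q_lo, q_hi)   # any equilibrium price lies between these, state by state
```

With distinct priors the two fixed points differ, so the model admits a nondegenerate band of candidate equilibrium prices rather than a unique price, anticipating the indeterminacy results below.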
From (18.23), we see that it implies that the set of supergradients of V (· ; ω) varies continuously with ω. From the perspective of the question
of the existence of equilibria, such “continuous superdifferentiability” is the essential content of the added assumption. Note that for the proof of existence in the economy corresponding to the specific endowment process e, it suffices that QV* be continuous (see Lemma 18.1 and note that QV* = Q). In particular, existence is guaranteed if e* is constant, since then V* is constant and thus Q = P. The proof of existence of solutions to the Euler inequalities now proceeds as follows.12 Since Q is compact-valued, convex-valued, and continuous, it admits a continuous selection; that is, there exists a sequence of probability kernels {πt} such that πt(ωt, ·) ∈ Q(ωt) for all t and ωt ∈ Ω. Now consider the equation

qi,t = βE_{πt(ωt,·)}[ (u′(et+1)/u′(et)) (qi,t+1 + di,t+1) ]   (18.32)
for all t, ωt, and i. By contraction mapping arguments (as extended in Lemma 18.A.1), one can prove the existence of a unique (given {πt}) price process satisfying (18.32). For that solution q, the Euler inequalities follow immediately. The arguments given earlier lead us to the following central theorem, the proof of which is completed in Appendix B.

Theorem 18.2. (Existence and characterization of equilibria): (a) The set of equilibria coincides with the set of price processes satisfying (18.29). (b) If P satisfies the strict Feller property, there exist equilibria. (c) If P has full support, then q ∈ D^n is an equilibrium if and only if {(eτ, 0)}_{1}^∞ is (1, ω1)-optimal for all ω1 ∈ Ω.

Part (c) shows that the two equilibrium notions described earlier coincide if P has full support. This is not surprising in light of the dynamic consistency property of the utility process implied by the full support assumption, as discussed in Section 18.2.4. There exists an equilibrium for each sequence of selections {πt} used as in (18.32), implying that there may be many equilibria in our economy. This nonuniqueness is related to the findings of Dow and Werlang (1992), who show, in a static model with one risky and one riskless asset, that there exists a set of asset prices that support the optimal choice of a riskless portfolio. Here we extend their analysis to an infinite-horizon, multiple-asset framework, and we show that the nonuniqueness of supporting prices is not restricted to riskless positions. Simonsen and Werlang (1991) also observe the potential nonuniqueness of supporting prices under uncertainty aversion in a static setting. Note also that the nonuniqueness of prices and its “origin” in the multiplicity of underlying priors accord well with Keynes’ intuition. He writes (1936: 152) that the “existing market valuation . . . cannot be uniquely correct, since our existing knowledge does not provide a sufficient basis for a calculated mathematical expectation.” In order to discuss further the nonuniqueness or indeterminacy of equilibrium prices, adopt the following notation and terminology: Denote by E the set of all
446
Larry G. Epstein and Tan Wang
equilibria. Say that the price of the ith security is determinate if, for all q and q′ in E, {q_{i,t}}_{t=1}^∞ = {q′_{i,t}}_{t=1}^∞. Theorem 18.3. (Structure of the set of equilibria). If P satisfies the strict Feller property, then: (a) E is a closed and connected subset of D^n. (b) For each i, the equations

q̄_{i,t} = β max_{m∈Q(ω_t)} E_m [ (u′(e_{t+1})/u′(e_t)) (q̄_{i,t+1} + d_{i,t+1}) ]

and

q̲_{i,t} = β min_{m∈Q(ω_t)} E_m [ (u′(e_{t+1})/u′(e_t)) (q̲_{i,t+1} + d_{i,t+1}) ]    (18.33)

have unique solutions in D, denoted q̄_i and q̲_i, respectively. These solutions satisfy the condition that for any q ∈ E and for any i and t,

q̲_{i,t} ≤ q_{i,t} ≤ q̄_{i,t}  on Ω^t.    (18.34)
Moreover, given i, t, and any ε > 0, there exist q^1 and q^2 in E such that

q^1_{i,t} ≤ q̲_{i,t} + ε  and  q^2_{i,t} ≥ q̄_{i,t} − ε  on Ω^t.    (18.35)

Finally, the ith security price is indeterminate if and only if for some t

q̲_{i,t} ≢ q̄_{i,t},    (18.36)

in which case {q_i : q ∈ E} is an uncountably infinite set.13 Part (a) provides some information regarding the size of E. Since E is a connected complete metric space, it follows from the Baire category theorem (Royden (1988: 159)) that if the equilibrium is not unique, then there exists an uncountable infinity of equilibria. This is confirmed by part (b). The latter first provides, via (18.34), bounds for the equilibrium price of any security and then shows that these bounds are tight, in the natural sense of (18.35). Finally, (18.36) provides a necessary and sufficient condition for price indeterminacy. In special circumstances, that condition assumes a simpler form. For instance, the condition

min_{m∈Q(ω_t)} E_m [ u′(e_{t+1})/u′(e_t) ] ≠ max_{m∈Q(ω_t)} E_m [ u′(e_{t+1})/u′(e_t) ]

characterizes the indeterminacy of the price of a one-period discount bond issued at (t, ω_t) and paying one unit of consumption at (t + 1). Intuitively, we would expect a link between indeterminacy of asset prices and intertemporal price volatility. This intuition can be confirmed in the special case of “i.i.d. beliefs,” that is, where P(ω) is independent of ω, in which case the
correspondence Q is also constant. Hence, for a security with a time-homogeneous dividend process, if the price of the security is determinate, then it must be constant (across time and states). Consequently, any fluctuation in price is a reflection of indeterminacy. More generally, the link between indeterminacy and volatility can be thought of in the usual way in terms of the existence of “sunspot equilibria.” That is, if the selection {π_t} from Q (see (18.32)) is made to depend on a “sunspot” or “extrinsic” variable, then the corresponding equilibrium price process will also depend on that variable.14 The discussion to this point has assumed implicitly that price indeterminacy is a significant feature of our model in the sense of occurring on a “nonnegligible” set of economies. That this assumption is warranted is most easily demonstrated in the context of specific examples of probability kernel correspondences, and so we defer further discussion to the next section. The final result of this section provides a further characterization of equilibria. Let q be an equilibrium and reconsider (18.28). For the given t, we will now consider ω^t to be variable, and thus the dependence of F on ω^t (through q_t and q_{t+1}) is made explicit by writing F(m, θ, ω^t). From (18.28) and the linearity of F(m, ·, ω^t) we derive

min_{m∈Q(ω_t)} g(m, ω^t) = 0,

where g(m, ω^t) ≡ max{F(m, θ, ω^t) : θ an extreme point of γ}. By the Maximum Theorem, g is continuous and the correspondence of minimizers given earlier is upper semicontinuous. Therefore, it admits a measurable selection (Klein and Thomson (1984: Theorem 4.2.1)), that is, there exists for each t a measurable

ξ_t : Ω^t → M(Ω),  ξ_t(ω^t, ·) ∈ Q(ω_t)  ∀ω^t ∈ Ω^t,    (18.37)

such that g(ξ_t(ω^t, ·), ω^t) ≡ 0. Substitution of the appropriate expressions for g and F establishes the nontrivial portion of the following result. Theorem 18.4. (Further characterization of equilibria). q is in E if and only if q is in D^n and, for some {ξ_t} as in (18.37), q satisfies

q_{i,t} = β E_{ξ_t(ω^t,·)} [ (u′(e_{t+1})/u′(e_t)) (q_{i,t+1} + d_{i,t+1}) ],    (18.38)

for all t and i.15 The characterization provided by Theorem 18.4 is helpful in placing our model of asset price determination in the context of the literature. In order to proceed, adopt the standard assumption that the actual evolution of {ω_t} is described by a probability kernel π∗. In place of the rational expectations hypothesis that π∗ is known precisely by the agent, assume instead that Q is absolutely continuous with respect to π∗, for which it suffices that the probability kernel correspondence P
be absolutely continuous (recall (18.7)). Such absolute continuity is assured if Ω is finite and π∗(ω, ω′) > 0 for all ω and ω′ in Ω. Denote by z_{t+1}(ω^t, ·) : Ω → R_+ the Radon–Nikodym derivative of ξ_t(ω^t, ·) with respect to π∗(ω_t, ·). Then (18.38) has the form

q_{i,t} = β E_{π∗(ω_t,·)} [ z_{t+1} (u′(e_{t+1})/u′(e_t)) (q_{i,t+1} + d_{i,t+1}) ].    (18.39)

By construction, {z_{t+1}} is restricted by z_{t+1} ≥ 0, ∫ z_{t+1} dπ∗(ω_t, ·) ≡ 1, and

ξ_t(ω^t, ·) ∈ Q(ω_t),  dξ_t(ω^t, ·) ≡ z_{t+1}(ω^t, ·) dπ∗(ω_t, ·).    (18.40)

The relations (18.39), without (18.40) or other restrictions on {z_{t+1}}, can be established under fairly general conditions and contain commonly considered models as special cases (see Hansen and Richard (1987) and Hansen and Jagannathan (1991)). Generally, (18.39) is rewritten in terms of the “stochastic discount factors” γ_{t+1} ≡ βz_{t+1} u′(e_{t+1})/u′(e_t) in the form

q_{i,t} = E_{π∗(ω_t,·)} [ γ_{t+1} (q_{i,t+1} + d_{i,t+1}) ],  i = 1, . . . , n.    (18.41)
Since one can always find some {γ_{t+1}} so that (18.41) is satisfied, the empirical content of any particular model of asset prices is represented by the restrictions it imposes on the discount factors {γ_{t+1}} or, equivalently, on {z_{t+1}}. For our model, those restrictions are represented by (18.40). The standard Lucas-based rational expectations model imposes the stronger restriction {z_{t+1}} ≡ 1. See Cochrane and Hansen (1992) for examples of other restrictions on stochastic discount factors that have been studied in the literature. 18.3.4. Examples We illustrate and elaborate upon our analysis of asset price determination in the context of the two examples of probability kernel correspondences of Section 18.2.3. Then, in order to lend indirect support to our “explanation” of price indeterminacy, we examine another model where indeterminacy can occur—a Lucas-style model where the felicity function u is not differentiable. Finally, we consider briefly an example of an economy where agents are uncertainty averse and heterogeneous, so that trade may occur. ε-Contamination. It follows from (18.10) that for any f ∈ C(Ω),

Qf(ω) = { (1 − ε(ω))π∗(ω) + ε(ω)m : m ∈ M(argmin f) }.

Therefore, the strict Feller property is satisfied. In the particular case f = V∗ (see Lemma 18.1),

Q(ω) = (1 − ε(ω))π∗(ω) + ε(ω)M(Ω_m),  Ω_m ≡ argmin V∗.    (18.42)
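The price bounds (18.33) under the ε-contamination specification (18.42) can be illustrated numerically. In the sketch below, all primitives (β, ε, π∗, the dividend vector) are invented for illustration; nothing here is taken from the chapter. The endowment is constant, so the minimizing set is the whole state space, the contaminating measure may put all its mass on any single state, and with i.i.d. beliefs each bound is a scalar fixed point of a β-contraction:

```python
import numpy as np

# Illustrative finite-state sketch of the price bounds in (18.33) under
# epsilon-contamination. All parameters are hypothetical. The endowment is
# constant, so u'(e_{t+1})/u'(e_t) = 1 and argmin V* is the whole state
# space; with i.i.d. beliefs the price is state- and time-independent.
beta, eps = 0.95, 0.2
pi_star = np.array([0.5, 0.3, 0.2])   # hypothetical "true" i.i.d. kernel
d = np.array([1.0, 1.2, 0.8])         # hypothetical time-homogeneous dividend

def bound_price(extremum, n_iter=500):
    """Iterate q -> beta * [(1-eps) * E_pi*(q+d) + eps * extremum(q+d)]."""
    q = 0.0
    for _ in range(n_iter):           # beta-contraction: converges geometrically
        payoff = q + d                # next-period payoff, state by state
        q = beta * ((1 - eps) * (pi_star @ payoff) + eps * extremum(payoff))
    return q

q_bar = bound_price(np.max)           # upper bound of (18.33)
q_low = bound_price(np.min)           # lower bound of (18.33)
assert q_low < q_bar                  # nonconstant dividend: bounds separate
```

Because the dividend is nonconstant on the minimizing set, the two iterations converge to different prices, consistent with the indeterminacy condition (18.43); with a constant dividend they would coincide.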
For simplicity, assume henceforth that dividend processes are Markovian and time-homogeneous,16

d_{i,t}(ω^t) = d_i∗(ω_t),  for all i, t, and ω^t.

Then it follows from Theorem 18.4 and (18.42) that the price of security i is indeterminate if and only if

u′(e∗)d_i∗ is nonconstant on Ω_m.    (18.43)
The essential economic (as opposed to mathematical) content of this restriction is that knowledge of the level of intertemporal utility V∗ is not sufficient to infer the weighted dividend u′(e∗)d_i∗, or more precisely,

u′(e∗)d_i∗ is not V∗-measurable.17    (18.44)

The conditions for indeterminacy simplify if we consider the i.i.d. case where ε(ω) and π∗(ω, ·), and hence also P(ω) and Q(ω), are independent of ω. Then V∗(·) = u(e∗(·)) + constant. Therefore, by (18.43), the ith price is indeterminate if and only if

d_i∗ is nonconstant on argmin e∗,    (18.45)

which in the sense explained in the preceding footnote is tantamount to18

d_i∗ is not e∗-measurable.    (18.46)
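On a finite state space, the measurability condition (18.46) can be checked mechanically: a function is g-measurable iff it is constant on every level set of g. The state space, endowment, and dividend vectors below are hypothetical:

```python
import numpy as np

# Hypothetical finite-state check of conditions (18.45)-(18.46). A dividend
# d* that varies on argmin e* signals price indeterminacy; equivalently
# (per the text's footnote), d* fails to be e*-measurable.
e_star = np.array([1.0, 1.0, 2.0, 2.0])   # endowment per state (illustrative)
d_star = np.array([0.7, 0.9, 1.1, 1.1])   # dividend per state (illustrative)

def measurable(f, g):
    """True iff f is g-measurable: f constant on each level set of g."""
    return all(len(set(f[g == v])) == 1 for v in set(g))

low = e_star == e_star.min()               # argmin e*
indeterminate = len(set(d_star[low])) > 1  # condition (18.45)
assert indeterminate                       # d* varies on argmin e* here
assert not measurable(d_star, e_star)      # condition (18.46)
```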
This will be the case, for example, if there exist state variables affecting dividends that do not influence consumption. Since that is a plausible hypothesis, we conclude that our model predicts price indeterminacy for a “broad” or at least economically interesting class of dividend and endowment processes. Moreover, note that the model delivers predictions regarding the cross-sectional (across asset) variation of the degree of indeterminacy. That is, referring to Theorem 18.4, we see that

q̲_{j,t} ≤ q̲_{i,t} ≤ q̄_{i,t} ≤ q̄_{j,t}  if  min_{Ω_m} d_j∗ ≤ min_{Ω_m} d_i∗ ≤ max_{Ω_m} d_i∗ ≤ max_{Ω_m} d_j∗,

where, given the i.i.d. assumption, Ω_m = argmin e∗. Therefore, asset j features a “large” degree of indeterminacy in its price if the interval [min_{Ω_m} d_j∗, max_{Ω_m} d_j∗] is large; this interval provides a measure of the extent to which d_j∗ is “unpredictable” given consumption.19 Finally, consider the counterpart of (18.39)–(18.40), under the assumptions that the contamination function ε is constant, Ω is finite, and π∗(ω, ω′) > 0 for all
ω and ω′ in Ω, thereby ensuring the absolute continuity of P with respect to π∗. Then (18.40) is equivalent to20

∫ z_{t+1} dπ∗(ω_t, ·) = 1,  z_{t+1} ≥ 1 − ε,  and  z_{t+1}(ω^t, ·) = 1 − ε on Ω \ Ω_m,    (18.47)

and the associated Euler equations (18.39) take the form

β^{−1} = E_{π∗(ω_t,·)} [ (u′(e_{t+1})/u′(e_t)) z_{t+1} R_{i,t+1} ],  i = 1, . . . , n,    (18.48)

where R_{i,t+1} ≡ [q_{i,t+1} + d_{i,t+1}]/q_{i,t}. The potential empirical significance of (18.47)–(18.48) can be illustrated through the analysis of stochastic discount factors in Hansen and Jagannathan (1991), for example. They infer from asset price and aggregate consumption data for the United States that stochastic discount factors that rationalize the data in the sense of (18.41) must have a large variance. The indicated variance is often considered too extreme to be compatible with any “reasonable” model of fundamentals and is occasionally interpreted as evidence for “fads” (Poterba and Summers (1988)). In particular, the consumption-based model, having z_{t+1} ≡ 1, is rejected in this way because consumption is too smooth. It is interesting, therefore, to examine whether our specific model of discount factors (18.47) is compatible with a large variance. To highlight the role of uncertainty, we make the challenge facing our model of discount factors as difficult as possible and assume the extreme case of “smooth consumption,” e∗ constant. We then compute mvar(ε), the maximum variance of limiting distributions corresponding to some {z_{t+1}} satisfying (18.47) and ergodicity. (Ergodicity justifies the approximation of moments of the limiting distribution by appropriate sample moments.) Assuming that {ω_t} under π∗ is ergodic with limiting distribution described by p ∈ M(Ω), we find that21

mvar(ε) = ε² (1 − min_{ω∈Ω} p(ω))² / min_{ω∈Ω} p(ω).
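The qualitative force of this expression is easy to see numerically. The functional form coded below is one reading of the (typographically damaged) formula, namely mvar(ε) = ε²(1 − min p)²/min p; treat it as an assumption rather than a quotation. It reproduces the two claims made in the surrounding text: mvar vanishes at ε = 0 and grows without bound as min_ω p(ω) → 0 for any fixed ε > 0:

```python
# One reading (an assumption, not a quotation) of the maximum-variance
# formula: mvar(eps) = eps^2 * (1 - min_p)^2 / min_p.
def mvar(eps, min_p):
    return (eps * (1.0 - min_p)) ** 2 / min_p

assert mvar(0.0, 0.1) == 0.0                  # standard model: no extra variance
vals = [mvar(0.05, mp) for mp in (0.1, 0.01, 0.001)]
assert vals[0] < vals[1] < vals[2]            # blows up as min_p shrinks
```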
A consideration in evaluating the implications of this expression is that the underlying state space Ω, and therefore also π∗, may not be observable to the analyst even if the probability distributions induced by π∗ on dividends and rates of return are observable or estimable. Note accordingly that for any given ε > 0, mvar(ε) → ∞ as min_{ω∈Ω} p(ω) → 0. It follows that, unless the analyst insists on maintaining assumptions on Ω and π∗ that are themselves arguably irrefutable, our model does not restrict the variance of discount factors. Moreover, the above is true for any fixed ε > 0, even arbitrarily small. This suggests, therefore, that some heretofore anomalous features of asset return data can be accommodated if we introduce a “small” amount of uncertainty aversion into the standard model. The above is not to suggest that other important empirical puzzles are similarly resolvable or that the model (18.47)–(18.48) is irrefutable. Indeed,
in other dimensions the empirical restrictiveness of the generalization (18.42) diminishes “continuously” as ε increases from 0, the standard model, to the extreme of complete ignorance, ε = 1. For example, assuming for simplicity that e∗ is constant, it follows from (18.47)–(18.48) that the return to a one-period pure discount bond equals β^{−1} and that

E_{π∗(ω_t,·)}[R_{t+1}] − β^{−1} ≤ ε E_{π∗(ω_t,·)}[R_{t+1} − min R_{t+1}],

where R_{t+1} ≡ (q_{t+1} + d_{t+1})/q_t. Consequently, the largest admissible equity premium is small if ε is small.22 Belief Function Kernels. Let f ∈ C(Ω) and define ψ_f(y) ≡ argmin{f(ω) : G(ω) = y}. From (18.15) (see also Wasserman (1990: Theorem 2.1)), it follows that for any given f ∈ C(Ω),

Qf(ω) = { m ∈ M(Ω) : m(·) = ∫_{G(Ω)} r(y)(·) dp(y|G(ω)) for some function r : G(Ω) → M(Ω) such that r(y)(ψ_f(y)) = 1 for all y }.    (18.49)

Therefore, Qf is a continuous correspondence and P_G satisfies the strict Feller property if the mapping y → p(·|y) is continuous in the strong topology. If f is set equal to V∗ in (18.49), we obtain a representation for elements of Q as a suitable mixture of measures {r(y) : y ∈ G(Ω)}, where r(y) has support on ψ_{V∗}(y). Since V∗ is constant on each ψ_{V∗}(y), every m ∈ Q(ω) induces the identical probability distribution for V∗. Nevertheless, Q(ω) is a nonsingleton if the set of minimizers ψ_{V∗}(y) is a nonsingleton for “many” y values, since then there are many possible choices for the measure r(y) supported on ψ_{V∗}(y). Arguing as in the preceding example, we can show that the essential economic characterization of indeterminacy for the ith security price is the condition

u′(e∗)d_i∗ is not (G, V∗)-measurable;    (18.50)

that is, the level of the weighted dividend u′(e∗)d_i∗ cannot be inferred from knowledge of the levels of the statistics G and intertemporal utility V∗. This can be expected to be the case in situations where the statistics G provide only a crude summary of the underlying state. Nondifferentiable Lucas Model. Price indeterminacy can occur also in a Lucas-style model where the felicity function u is not necessarily differentiable. However, such an “explanation” of indeterminacy differs from ours in two important respects. First, it does not capture Keynes’ intuition, in the citation given earlier, regarding the link between uncertainty and indeterminacy. In our model, V(c) = Σ_{t=0}^∞ β^t u(c_t) for deterministic consumption processes, that is, those for which each c_t is a
constant function. Therefore, all the usual regularity properties, including the uniqueness of supporting prices, are satisfied on the domain of deterministic consumption processes, supporting our assertion that indeterminacy is due to uncertainty. In contrast, in the modified Lucas model, supporting prices are nonunique even for deterministic consumption processes. The second important difference concerns the robustness of the prediction of indeterminacy. Since u can fail to be differentiable only on a zero Lebesgue measure subset K of R, security prices are determinate in the Lucas model as long as all conditional distributions assign zero probability to consumption lying in K. For example, if the endowment process is constant with e∗ ≡ ē, then security prices are determinate for all ē ∉ K. On the other hand, for the constant endowment case our model predicts indeterminacy for all values of ē and all securities paying nonconstant dividends (see (18.43), for example, and note that Ω_m = Ω if e∗ is constant). More generally, we have argued earlier that in our model indeterminacy occurs in a “large” set of economies. Heterogeneous Agents. Our “justification” for representative agent modeling is the usual one, namely that it provides a simple way to organize observations in terms of familiar microeconomic principles and notions. One may also take a more stringent view and ask whether such a model can be justified theoretically in the context of an economy with heterogeneous agents. Here we adopt such an approach and prove a complete-markets aggregation theorem along the lines of Constantinides (1982), thereby providing an additional “example” to which our representative agent analysis applies. The example serves also to suggest an alternative interpretation for our price indeterminacy result in a model with trade and to clarify the “real” consequences of Knightian uncertainty in our model.
Expand the economy defined in Section 18.3.1 to admit H consumers, where consumer h has intertemporal utility function V^h corresponding to discount parameter β, belief kernel correspondence P, and felicity function u^h, the only source of differences in consumer preferences. (Though restrictive, these assumptions are weaker than those in Constantinides (1982), where the standard single-prior representation of beliefs is also imposed.) Specialize P further so that it implies i.i.d. beliefs (P(ω) independent of ω), has full support, and is based on the capacity representation of beliefs (Schmeidler (1989)); both the ε-contamination and belief function kernel examples fulfill the latter requirement. (See Appendix C for clarification and for a proof of the assertions given later under more general assumptions.) Though we will be interested in the competitive equilibria of a decentralized economy, it is useful first to characterize Pareto optimal allocations given the preferences just described, an aggregate endowment process c (possibly different from e), and the initial state ω_0. For the usual reasons, it is enough to consider, for each vector α = (α_h)_{h=1}^H of nonnegative utility weights, the planning problem

U^α(c; ω_0) ≡ max { Σ_h α_h V^h(c^h; ω_0) : c^h ∈ D, Σ_h c^h = c },  c ∈ D.    (18.51)
This U α is a candidate utility for the representative agent in the decentralized economy specified in the usual way (see e.g. Duffie (1992: Chapter 2)). Consumers
begin with endowments e^h ∈ D of consumption and zero shares of each asset and then trade in complete asset markets. Focus on a (Pareto optimal) equilibrium allocation and denote by q ∈ D^n a corresponding equilibrium price and by α the utility weights corresponding to (18.51). Then, by suitable adaptations of Duffie (1992: 9–11), q is also an equilibrium in the single-agent model with aggregate endowment e and intertemporal utility U^α. The agent with utility U^α is “representative” if the intertemporal utility function U^α lies in the same recursive class defined in Section 18.2 containing the individual utilities. Under our assumptions, this is indeed the case: the standard risk-sharing rule, which is Pareto optimal in the expected utility framework of Constantinides, is to allocate the endowment x at any time t and state ω_t by solving

u^α(x) ≡ max { Σ_h α_h u^h(x_h) : Σ_h x_h = x }.    (18.52)
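For concreteness, the sharing rule (18.52) can be solved in closed form for log felicities u^h(x) = log x (an illustrative choice, not one made in the chapter): the first-order conditions α_h/x_h = λ give x_h = α_h x / Σ_j α_j.

```python
import numpy as np

# Sketch of the static sharing problem (18.52) with hypothetical log
# felicities u^h(x) = log(x). The closed form x_h = alpha_h * x / sum(alpha)
# follows from the first-order conditions alpha_h / x_h = lambda.
alpha = np.array([0.5, 0.3, 0.2])          # illustrative Pareto weights
x = 2.0                                    # aggregate endowment in this state

x_h = alpha * x / alpha.sum()              # maximizer of sum_h alpha_h log(x_h)
assert abs(x_h.sum() - x) < 1e-12          # feasibility: shares exhaust x

def welfare(shares):
    return float(alpha @ np.log(shares))

# A feasible perturbation cannot improve on the closed-form allocation:
perturbed = x_h + np.array([0.01, -0.01, 0.0])
assert welfare(x_h) >= welfare(perturbed)
```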
Under our assumptions, this risk-sharing rule continues to be efficient given aversion to Knightian uncertainty; that is, the set of processes {c̄^h}_{h=1}^H solves (18.51) if and only if {c̄_t^h(ω^t)}_{h=1}^H solves (18.52) for all t, ω^t, and x = c_t(ω^t). It follows that U^α is the recursive intertemporal utility function corresponding, in the sense of our paper, to β, P, and u^α. This aggregation result “justifies” the application to aggregate data of our Euler inequalities (18.29) or the discount-factor model (18.39)–(18.40). However, interpretations of the indeterminacy of prices and its potential empirical relevance must be revised. That is because it is generally not the case that every equilibrium q for the representative agent economy with utility U^α is also a competitive equilibrium for the given initial endowments {e^h}. The situation is easily visualized in the context of an Edgeworth box, where at an arbitrary point on the contract curve there exists a continuum of price lines that separate the better-than sets for the two agents, but these lines do not all pass through the given initial endowment. Since not all selections from the set of representative agent equilibria are warranted, our earlier discussion of sunspots, animal spirits, and price volatility seems wrong from this perspective. However, not all is lost if the analyst does not know the initial micro endowments. Indeed, if she knows nothing at all about them other than that they sum to e, then from her perspective all equilibria in the representative agent model have equal standing and the significance of price indeterminacy in the representative agent model is restored. More generally, one would expect there to remain a continuum of representative agent price equilibria that are consistent with the analyst’s information about the micro endowments, and some potential for explaining price volatility would be retained.
We emphasize that, according to this perspective, the “origin” of price indeterminacy and the associated price volatility lies in the conjunction of: (i) agents’ aversion to Knightian uncertainty and (ii) incompleteness of a model formulated exclusively in terms of aggregate variables, or the analyst’s incomplete information. The preceding also clarifies the differing implications of our model for prices versus allocations. In the general representative agent model, prices may be
indeterminate while consumption is exogenously specified and thus trivially determinate. This can “explain” greater volatility for prices than for consumption. These comparisons are more interesting in the heterogeneous agent model, where the consumption side is nontrivial. Here we see the above confirmed in the sense that the prices supporting a given efficient allocation may be indeterminate. This is not to say, however, that Knightian uncertainty aversion has no real consequences, as it clearly influences the set of efficient and competitive allocations.
18.4. Remarks on empirical content
Alternative models of irrational expectations, such as Shiller’s model of “fads,” have been criticized for not being well enough specified to produce rejectable implications (West (1988), Cochrane (1991), LeRoy (1989)). Some readers may be skeptical also regarding the useful empirical content of our model. The discussions surrounding (18.39)–(18.40) and in Section 18.3.4 provided some indication of the potential usefulness of our model. Here we argue further that empirical investigation of our model is potentially fruitful. However, we caution the reader that the example just described may provide cause for suitably revising and weakening our arguments regarding empirical relevance. One potential source of skepticism concerns Theorem 18.4. Equation (18.38) is the Euler equation implied by a Lucas-style model in which {ξ_t} represents beliefs. Note that ξ_t is not a probability kernel because it (i) depends on ω^t and not just ω_t, and (ii) may not be continuous in ω^t. Nevertheless, the theorem raises concerns about whether our model is essentially observationally indistinguishable from a Lucas model, with the rational expectations hypothesis possibly deleted, but where beliefs are represented by probability measures and therefore uncertainty neutrality prevails. Observe, however, that to replicate an equilibrium q as an equilibrium of a Lucas-style model with the associated Euler equation (18.38), the required “shadow” sequence of probability kernels {ξ_t} may seem unnatural and contrived. (For convenience, we refer to the ξ_t’s as probability kernels, though they need not conform to our definition of the term.) For example, the ξ_t’s will often depend on history or be time dependent for no “good” reason.
Second, when some states are extrinsic (see the discussion of sunspot equilibria in Section 18.3.3), replication of a sunspot equilibrium q requires that “shadow” beliefs about intrinsic states, represented by {ξ_t}, depend upon extrinsic states. Therefore, acceptance of the Lucas model approximation requires that one revise the classification of “intrinsic” versus “extrinsic” states. Finally, we point out later that our model has some cross-sectional (across agent) implications. They can be delivered also by a Lucas-style model with a larger number of agents, if each agent’s beliefs are represented by some {ξ_t}, but the latter would have to vary across agents in an artificial way. Another possible reason for skepticism is the feeling that our model “can explain anything” by a suitable specification of the capacity kernel representing beliefs, which are presumably unobservable. But similar remarks apply with respect to the specification of utility even if the Bayesian, rational expectations model of beliefs is adopted. That is, in principle, a wide range of specifications are possible
for the intertemporal von Neumann–Morgenstern index v(c_0, c_1, . . . , c_t, . . .). The strong predictive content of the Lucas asset pricing model derives in part from the parametric specialization of v to the additive form Σ_{t=0}^∞ β^t u(c_t). This specialization is widely accepted, at least as a benchmark, both because of the tractability that it delivers and because we have some understanding of its plausibility, via its axiomatic underpinnings, for example. Analogy with the present context of modeling beliefs argues not for skepticism, but rather for the need to study the properties of alternative specifications for P. This chapter points out some attractive features of the ε-contamination model (18.9) and of belief function kernels, but much more work in this direction is required. In order to derive rejectable predictions for time series data, beliefs must be related to the actual evolution of the state process. One possible link is to posit that {ω_t} is governed by a probability kernel π∗ and that beliefs incorporate some vagueness about π∗ on the part of the investor. For reasons of robustness of empirical procedures, Lehmann (1992) suggests studying pricing equations for a range of discount factors, reflecting the analyst’s imprecise information about the correct factors. It is at least as plausible to posit that investors’ information is imprecise. Here such imprecision is incorporated into the theoretical framework and a “robust” theoretical model is delivered. Finally, some may disagree with the presumption that beliefs are unobservable; for instance, a number of researchers, cited in the introduction, have used survey data as an independent measure of investors’ expectations. Therefore, to conclude, suppose that such information is available for a cross-section of investors and consider some predictions of our model regarding expectations.
We interpret our model as containing a number of agents with identical endowments and preferences, including the probability kernel correspondence P. If surveys elicit entire correspondences, then agents will respond identically given our model. However, suppose that they are asked for a conditional probability distribution over next period’s state variables, or for some summary moments, and that they respond with an uncertainty-adjusted conditional prior, that is, with an element of Q(ω). Then there is no reason to expect all investors to report the same element of Q(ω). Thus our model is consistent with heterogeneous measured forecasts, even though agents have common information in the form of P. Moreover, the dispersion of forecasts should increase if Q(ω) increases in the sense of set inclusion. Specialize to the ε-contamination model of beliefs (18.9) and suppose that ε(ω) is larger in those states ω where the “true” conditional probability measure π∗(ω) is riskier, for example, has larger variance. Then a positive relation is indicated between the dispersion of reported expectations of forecasters, on the one hand, and the poor performance of point forecasts, on the other. For a related prediction, recall our earlier discussion of a link between the indeterminacy and volatility of prices. Given such a link, our model suggests a positive relation between price volatility and the dispersion of reported expectations of forecasters. There is some supporting evidence for such a relation (Cragg and Malkiel (1982), Frankel and Froot (1990)).
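The predicted link between the size of Q(ω) and forecast dispersion can be simulated directly. In this sketch (states, π∗, and the way contaminating priors are sampled are all invented for illustration), each forecaster reports the mean of some element of the ε-contamination set; the cross-sectional standard deviation of reports then scales with ε:

```python
import numpy as np

# Simulation sketch: forecasters report means of elements of the
# epsilon-contamination set Q = {(1-eps) pi* + eps m}. All primitives
# (states, pi*, the Dirichlet sampling of m) are hypothetical.
rng = np.random.default_rng(0)
states = np.array([0.0, 1.0, 2.0])         # next-period state variable
pi_star = np.array([0.3, 0.4, 0.3])        # common "reference" prior

def reported_means(eps, n=400):
    means = []
    for _ in range(n):
        m = rng.dirichlet(np.ones(3))      # an arbitrary contaminating prior
        mix = (1 - eps) * pi_star + eps * m
        means.append(mix @ states)         # the forecaster's reported mean
    return np.array(means)

spread_small = reported_means(0.05).std()
spread_large = reported_means(0.50).std()
assert spread_small < spread_large         # dispersion increases with eps
```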
One could derive a number of other predictions that would be testable given appropriate survey data. Needless to say, we are not asserting that such data are currently available (see, however, Zarnowitz and Lambros (1987)). The current paucity of suitable data is not damning of our model, however. After all, one of the roles of theory is to guide the collection of data.
Appendix A: Proof of Theorem 18.1
The following Lemma (18.A.1) is an adaptation to our space D, consisting of sequences of real-valued functions, of the well-known Blackwell sufficient condition for a contraction mapping that applies to a space of real-valued functions.
Lemma 18.A.1. Let T : D → D be an operator with the following properties: (i) (Monotonicity): if f, g ∈ D and f ≤ g, that is, f_t(ω^t) ≤ g_t(ω^t) for all t and ω^t, then Tf ≤ Tg; (ii) (Discounting): there exists a real constant β, 0 < βb < 1, such that for any f ∈ D and sequence of constant functions a = {a_t} ∈ D with a_t ∈ R_+,

(T(f + a))_t(ω^t) ≤ (Tf)_t(ω^t) + a_{t+1}β  for all t and ω^t.

Then T has a unique fixed point.
Proof. Let f and g ∈ D. Set a_t = ‖f_t − g_t‖. Then f ≤ g + a. By monotonicity and discounting,

(Tf)_t(ω^t) ≤ (T(g + a))_t(ω^t) ≤ (Tg)_t(ω^t) + β‖f_{t+1} − g_{t+1}‖.

Thus |(Tf)_t(ω^t) − (Tg)_t(ω^t)|/b^t ≤ βb‖f_{t+1} − g_{t+1}‖/b^{t+1}, and further ‖Tf − Tg‖ ≤ βb‖f − g‖, proving that T is a contraction.
Proposition 18.A.1 (Existence of utility). For each c ∈ D there exists a unique V(c) ∈ D such that (18.16) holds for all t and ω^t.
Proof. Define a map T : D → D by, for all f ∈ D,

(Tf)_t(ω^t) = u(c_t(ω^t)) + β ∫ f_{t+1}(ω^t, ·) dP(ω_t, ·).

By the continuity of P, (Tf)_t is continuous. Next,

sup_{ω^t} (Tf)_t(ω^t)/b^t ≤ sup_{ω^t} u(c_t(ω^t))/b^t + (β/b^t) sup_{ω^t} ∫ f_{t+1}(ω^t, ·) dP(ω_t, ·) ≤ sup_{ω^t} u(c_t(ω^t))/b^t + βb‖f_{t+1}‖/b^{t+1}.

Since u is increasing, concave, and u(0) = 0, we have ∞ > u(‖c‖) ≥ u(c_t(ω^t)/b^t) ≥ u(c_t(ω^t))/b^t, and

‖Tf‖ = sup_t sup_{ω^t} (Tf)_t(ω^t)/b^t ≤ u(‖c‖) + βb sup_t ‖f_{t+1}‖/b^{t+1} ≤ u(‖c‖) + βb‖f‖.
Therefore, T is well-defined. Monotonicity and discounting for T are obvious. By Lemma 18.A.1, T has a unique fixed point, which is the solution of (18.16).
Proposition 18.A.2 (Approximation of utility). Fix c ∈ D. For each T, define {V_t^T(c)}_{t=1}^∞ ∈ D by V_t^T ≡ 0 for t > T and

V_t^T(c; ω^t) = u(c_t(ω^t)) + β ∫ V_{t+1}^T(c; ω^t, ·) dP(ω_t, ·)  for 0 ≤ t ≤ T.

Then lim_{T→∞} V_t^T(c; ω^t) = V_t(c; ω^t) for all t and ω^t.
Proof. Verify that, for any t ≤ T and ω^t,

V_t^T(c; ω^t) ≤ V_t(c; ω^t) ≤ V_t^T(c; ω^t) + ‖V(c)‖(βb)^{T−t+1} b^t.    (18.A.1)
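Lemma 18.A.1 is the engine behind Proposition 18.A.1: an operator of the form (Tf)(ω) = u(c(ω)) + β ∫ f dP is monotone and discounts, so iterates from any starting point collapse onto the unique fixed point. A minimal two-state numerical sketch (all primitives invented for illustration):

```python
import numpy as np

# Finite-state illustration of the Blackwell-type argument in Lemma 18.A.1:
# T f = u_c + beta * P @ f is monotone and discounts, hence is a contraction
# with modulus beta and has a unique fixed point. Primitives are hypothetical.
beta = 0.9
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])                 # probability kernel on two states
u_c = np.array([1.0, 2.0])                 # felicity of consumption per state

def T(f):
    return u_c + beta * P @ f

f1, f2 = np.zeros(2), np.full(2, 50.0)     # two very different starting points
for _ in range(400):
    f1, f2 = T(f1), T(f2)

assert np.allclose(f1, f2)                 # iterates meet: unique fixed point
assert np.allclose(f1, T(f1))              # and it solves f = T f
```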
Proposition 18.A.3 (Continuity of utility). If u satisfies the growth condition, then V_t(c; ω^t) is continuous in (c, ω^t).
Proof. Under the growth condition,

|V_t(c; ω^t)| ≤ k_1 β/(1 − β) + k_2 Σ_{j=1}^∞ β^j ‖c_{t+j}‖.

Hence

‖V(c)‖ ≤ βk_1/(1 − β) + βbk_2‖c‖/(1 − βb).

Thus it follows from (18.A.1) that

|V_t(c; ω^t) − V_t^T(c; ω^t)| ≤ [ βk_1/(1 − β) + βbk_2‖c‖/(1 − βb) ] (βb)^{T−t+1} b^t.    (18.A.2)

Let c^n → c. For fixed t,

|V_t(c^n; ω^t) − V_t(c; ω^t)| ≤ |V_t(c^n; ω^t) − V_t^T(c^n; ω^t)| + |V_t^T(c^n; ω^t) − V_t^T(c; ω^t)| + |V_t(c; ω^t) − V_t^T(c; ω^t)|.

For all c^n such that ‖c^n − c‖ < 1, |c_t^n(ω^t)| ≤ (‖c‖ + 1)b^T for t ≤ T. It follows from the continuity of u that the second term converges to zero as c^n → c, and that the convergence is uniform in ω^t. By (18.A.2), the first and third terms on the right side converge to zero as T → ∞, uniformly in n and ω^t. Therefore, V_t(·; ω^t)
is continuous at c uniformly in ωt . The desired joint continuity now follows from the continuity of Vt (c; ωt ) in ωt . The remaining properties asserted for utility can be proven by standard arguments from the theory of recursive utility (see e.g. Lucas and Stokey (1984), Stokey and Lucas (1989), and Epstein and Zin (1989)). Footnote 8 clarifies the link with the recursive utility literature; note that W defined there is increasing and concave. For dynamic consistency, note that if P has full support, then (18.8) applies.
Appendix B: Proof of Theorems in Section 18.3

For the convenience of the reader, we provide here statements of two results invoked in Section 18.3. The first is the version of Fan's Theorem employed in the derivation of the Euler inequalities (18.29). A stronger form is proven in Sion (1958: Theorems 4.2 and 4.2′).

Fan's Theorem. Let X and Y be metrizable, convex, and compact subsets of some linear topological spaces, and f a continuous real-valued function on X × Y that satisfies (i) f(·,y) is concave on X for each y; and (ii) f(x,·) is convex on Y for each x. Then
$$\max_{x\in X}\,\min_{y\in Y} f(x,y) \;=\; \min_{y\in Y}\,\max_{x\in X} f(x,y).$$
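Since Fan's Theorem drives the Euler inequalities, a quick numerical spot-check may help fix ideas. The payoff below is invented for illustration: it is concave in x and convex in y on [−1,1]², so the grid values of max-min and min-max should (approximately) coincide at the saddle value.

```python
import numpy as np

# Grid check of the minimax identity for a concave-convex payoff
# (the specific function is invented for this illustration).
xs = np.linspace(-1.0, 1.0, 401)
ys = np.linspace(-1.0, 1.0, 401)
X, Y = np.meshgrid(xs, ys, indexing="ij")
F = -X**2 + X * Y + Y**2             # concave in x, convex in y

maxmin = np.max(np.min(F, axis=1))   # max over x of (min over y)
minmax = np.min(np.max(F, axis=0))   # min over y of (max over x)
print(maxmin, minmax)                # both are (approximately) 0
```

Weak duality (max min ≤ min max) holds on any grid; it is the concave-convex structure that forces the two values together, which is the content of the theorem.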
Second, the argument surrounding (18.32) relies on the following selection theorem, which is slightly stronger than the one explicitly stated in Michael (1956). This result is also needed below in the proof of Theorem 18.3.

Lemma 18.B.1 (Michael). Suppose that X is paracompact, Y is a topological linear space, and Z is a convex closed subset of Y containing 0 that has a base {B_n} for the neighborhoods of 0, consisting of symmetric and convex sets such that B_{n+1} ⊂ ½B_n. Suppose that ψ : X → Z ⊂ Y is a lower semicontinuous convex-valued correspondence such that ψ(X) + B_n ⊂ Z. Suppose further that for each y ∈ ψ(X), y + B_n is open in Z. Then the correspondence $\bar\psi$ defined by $\bar\psi(x) = \overline{\psi(x)}$ admits a continuous selection.

Proof. See the proofs of Lemma 4.1 and Theorem 3.2 in Michael (1956), or the proof of Theorem 9.G of Zeidler (1986).

Proof of Theorem 18.2. Part (a): It remains to show only that if a price process q satisfies (18.29), or equivalently (18.27), then it is an equilibrium. Let (c,θ) be any (t,ω^t)-feasible plan for which θ_{t−1}(ω^{t−1}) = 0. It follows from (18.27), applied to θ_τ, that there exist π_τ : Ω^τ → M(Ω), τ ≥ t, such that π_τ(ω^τ) ∈ Q(ω^τ) for
each ω^τ and
$$u'(e_\tau)\,\theta_\tau\cdot q_\tau \;\ge\; \beta E_{\pi_\tau(\omega^\tau,\cdot)}\bigl\{u'(e_{\tau+1})\,\theta_\tau\cdot(q_{\tau+1}+d_{\tau+1})\bigr\}. \tag{18.B.1}$$
It follows from the budget constraints that
$$e_\tau - c_\tau - \theta_\tau\cdot q_\tau \;=\; -(q_\tau + d_\tau)\cdot\theta_{\tau-1}. \tag{18.B.2}$$
By the concavity of u,
$$u(e_\tau) \;\ge\; u(c_\tau) + u'(e_\tau)\,(e_\tau - c_\tau). \tag{18.B.3}$$
Define V_t^T as in Proposition 18.A.2. We have the following lengthy but elementary chain of inequalities:
$$\begin{aligned}
&V_t^T(c;\omega^t) - V_t(e;\omega^t)\\
&= u(c_t) + \beta\!\int\! V_{t+1}^T(c;\omega^t,\cdot)\,dP(\omega^t,\cdot) - u(e_t) - \beta\!\int\! V_{t+1}(e;\omega^t,\cdot)\,dP(\omega^t,\cdot)\\
&\le u'(e_t)(c_t - e_t) + \beta\!\int\! V_{t+1}^T(c;\omega^t,\cdot)\,dP(\omega^t,\cdot) - \beta\!\int\! V_{t+1}(e;\omega^t,\cdot)\,dP(\omega^t,\cdot)\\
&= -u'(e_t)\,\theta_t\cdot q_t + \beta\!\int\! V_{t+1}^T(c;\omega^t,\cdot)\,dP(\omega^t,\cdot) - \beta\!\int\! V_{t+1}(e;\omega^t,\cdot)\,dP(\omega^t,\cdot)\\
&\le -u'(e_t)\,\theta_t\cdot q_t + \beta E_{\pi_t(\omega^t,\cdot)}\{V_{t+1}^T(c;\omega^t,\cdot)\} - \beta E_{\pi_t(\omega^t,\cdot)}\{V_{t+1}(e;\omega^t,\cdot)\}\\
&\le \beta E_{\pi_t(\omega^t,\cdot)}\{-u'(e_{t+1})\,\theta_t\cdot(q_{t+1}+d_{t+1})\} + \beta E_{\pi_t(\omega^t,\cdot)}\{V_{t+1}^T(c;\omega^t,\cdot)\} - \beta E_{\pi_t(\omega^t,\cdot)}\{V_{t+1}(e;\omega^t,\cdot)\}\\
&= \beta E_{\pi_t(\omega^t,\cdot)}\Bigl\{-u'(e_{t+1})\,\theta_t\cdot(q_{t+1}+d_{t+1}) + u(c_{t+1}) + \beta\!\int\! V_{t+2}^T(c;\omega^{t+1},\cdot)\,dP(\omega^{t+1},\cdot)\\
&\qquad\qquad - u(e_{t+1}) - \beta\!\int\! V_{t+2}(e;\omega^{t+1},\cdot)\,dP(\omega^{t+1},\cdot)\Bigr\}\\
&= \beta E_{\pi_t(\omega^t,\cdot)}\Bigl\{u'(e_{t+1})(e_{t+1} - c_{t+1} - \theta_{t+1}\cdot q_{t+1}) + u(c_{t+1}) + \beta\!\int\! V_{t+2}^T(c;\omega^{t+1},\cdot)\,dP(\omega^{t+1},\cdot)\\
&\qquad\qquad - u(e_{t+1}) - \beta\!\int\! V_{t+2}(e;\omega^{t+1},\cdot)\,dP(\omega^{t+1},\cdot)\Bigr\}\\
&\le \beta E_{\pi_t(\omega^t,\cdot)}\Bigl\{-u'(e_{t+1})\,\theta_{t+1}\cdot q_{t+1} + \beta\!\int\! V_{t+2}^T(c;\omega^{t+1},\cdot)\,dP(\omega^{t+1},\cdot) - \beta\!\int\! V_{t+2}(e;\omega^{t+1},\cdot)\,dP(\omega^{t+1},\cdot)\Bigr\}\\
&\le \beta E_{\pi_t(\omega^t,\cdot)}\Bigl\{\beta E_{\pi_{t+1}(\omega^{t+1},\cdot)}\Bigl\{-u'(e_{t+2})\,\theta_{t+1}\cdot(q_{t+2}+d_{t+2}) + u(c_{t+2}) + \beta\!\int\! V_{t+3}^T(c;\omega^{t+2},\cdot)\,dP(\omega^{t+2},\cdot)\\
&\qquad\qquad - u(e_{t+2}) - \beta\!\int\! V_{t+3}(e;\omega^{t+2},\cdot)\,dP(\omega^{t+2},\cdot)\Bigr\}\Bigr\}\\
&\;\;\vdots\\
&\le \beta E_{\pi_t(\omega^t,\cdot)}\Bigl\{\cdots\beta E_{\pi_{t+T}(\omega^{t+T},\cdot)}\Bigl\{-u'(e_{t+T+1})\,\theta_{t+T}\cdot(q_{t+T+1}+d_{t+T+1}) + u(c_{t+T+1}) - u(e_{t+T+1})\\
&\qquad\qquad - \beta\!\int\! V_{t+T+2}(e;\omega^{t+T+1},\cdot)\,dP(\omega^{t+T+1},\cdot)\Bigr\}\cdots\Bigr\}\\
&\le \beta E_{\pi_t(\omega^t,\cdot)}\Bigl\{\cdots\beta E_{\pi_{t+T}(\omega^{t+T},\cdot)}\Bigl\{-u'(e_{t+T+1})\,\theta_{t+T+1}\cdot q_{t+T+1} - \beta\!\int\! V_{t+T+2}(e;\omega^{t+T+1},\cdot)\,dP(\omega^{t+T+1},\cdot)\Bigr\}\cdots\Bigr\}\\
&\le \beta E_{\pi_t(\omega^t,\cdot)}\bigl\{\cdots\beta E_{\pi_{t+T}(\omega^{t+T},\cdot)}\{u'(e_{t+T+1})\,K\cdot q_{t+T+1}\}\cdots\bigr\}\\
&\le b^t(\beta b)^{T-t+1}\,\|u'(e^*)\,K\cdot q\| \;\longrightarrow\; 0,
\end{aligned}$$
where K ∈ R^n_+ has all components equal to −inf_{i,τ,ω^τ} θ_{i,τ}(ω^τ). The first inequality follows from (18.B.3); the second equality follows from (18.B.2) and θ_{t−1} = 0; the second inequality follows from π_t(ω^t) ∈ Q(ω^t); the third inequality follows from (18.B.1); the third and fourth equalities follow from (18.16) and (18.B.2); the fourth inequality follows from (18.B.3); the fifth inequality follows from (18.B.1) and π_{t+1}(ω^{t+1}) ∈ Q(ω^{t+1}); the seventh inequality follows from (18.B.2) and (18.B.3); the eighth inequality follows from the short selling constraint θ ≥ −K and the nonnegativity of utility; and the last inequality follows from the fact that the process {u′(e_t)K·q_t} is in D. Thus V_t(c;ω^t) − V_t(e;ω^t) ≤ 0, which implies that q is an equilibrium.

Part (b): We need to show only that Q admits a continuous selection. Then the claim of (b) follows from (a) and the arguments in the text surrounding (18.32). Let C*(Ω) be the dual of C(Ω) endowed with the weak* topology. Then M(Ω) is a compact subset of C*(Ω). Since C(Ω) is separable, there exists a countable family {f_n} that is a dense subset of the closed unit ball of C(Ω). Let Z be the closed ball with radius 4 in C*(Ω), that is, Z ≡ {m ∈ C*(Ω) : ‖m‖ ≤ 4}, where the norm is the usual norm on the dual space. Define a metric on Z by
$$d(P,Q) \;=\; \sum_n \frac{1}{9^n}\left|\int f_n\,dP - \int f_n\,dQ\right|.$$
This metric induces the weak* topology on Z. In particular, it induces the weak convergence topology on M(Ω), which is a subset of Z. Under this metric, Z is a convex and compact metric space. Define
$$B \equiv \left\{m \in Z : \sum_n \frac{1}{9^n}\left|\int f_n\,dm\right| < \frac{1}{2}\right\}, \qquad B_n = \frac{1}{2^n}B,$$
and apply Lemma 18.B.1.

Part (c): Follows from the dynamic consistency of the utility process under the assumption of full support for P.

Proof of Theorem 18.4. See text.

Proof of Theorem 18.3. Part (b): (i) Proof of (18.33) and (18.34): Define contraction mappings $\bar T^i : D \to D$ and $\underline T^i : D \to D$ by, for each f ∈ D,
$$(\bar T^i f)_t(\omega^t) = \beta \max_{m\in Q(\omega^t)} E_m\{f_{t+1} + u'(e_{t+1})\,d_{i,t+1}\},$$
$$(\underline T^i f)_t(\omega^t) = \beta \min_{m\in Q(\omega^t)} E_m\{f_{t+1} + u'(e_{t+1})\,d_{i,t+1}\}.$$
Denote their unique fixed points by $\bar f_i$ and $\underline f_i$ and define $\bar q_{i,t}$ and $\underline q_{i,t}$ by $\bar q_{i,t}(\omega^t) = \bar f_{i,t}(\omega^t)/u'(e_t(\omega^t))$ and $\underline q_{i,t}(\omega^t) = \underline f_{i,t}(\omega^t)/u'(e_t(\omega^t))$. By construction, $\{\bar q_{i,t}\}$ and $\{\underline q_{i,t}\}$ ∈ D and satisfy (18.33).
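The role of the two operators can be mimicked numerically. In the sketch below, all primitives are invented: a two-state example in which a two-kernel set stands in for Q. The "max" and "min" one-step pricing maps are iterated to their fixed points, yielding upper and lower price bounds in the spirit of (18.33).

```python
import numpy as np

# Iterate sup/inf one-step pricing operators to their fixed points.
beta = 0.95
d = np.array([1.0, 0.5])             # dividend by next-period state
mu = np.array([1.0, 0.8])            # marginal utilities u'(e) by state
kernels = [np.array([[0.6, 0.4], [0.3, 0.7]]),
           np.array([[0.8, 0.2], [0.5, 0.5]])]   # stand-in for Q

def fixed_point(op, n=2000):
    f = np.zeros(2)
    for _ in range(n):
        one_step = np.stack([K @ (f + mu * d) for K in kernels])
        f = beta * op(one_step, axis=0)   # beta * max (or min) over kernels
    return f

f_hi = fixed_point(np.max)           # fixed point of the "max" operator
f_lo = fixed_point(np.min)           # fixed point of the "min" operator
q_hi, q_lo = f_hi / mu, f_lo / mu    # price bounds, as in q = f / u'(e)
print(q_lo, q_hi)                    # lower bound <= upper bound, statewise
```

With a singleton set of kernels the two fixed points coincide, mirroring the collapse of the price interval when beliefs reduce to a single prior.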
Given q ∈ E, let {ξ_t} be as in (18.37) and (18.38). Denote by D̃ the set of processes satisfying the requirements in the definition of D with the possible exception of continuity. Define contraction mappings T_i : D̃ → D̃ by
$$(T_i f)_t(\omega^t) = \beta E_{\xi_t(\omega^t,\cdot)}\{f_{t+1} + u'(e_{t+1})\,d_{i,t+1}\}.$$
Its unique fixed point is f_i ∈ D̃. By (18.38) and the uniqueness of the fixed point, u′(e_t)q_{i,t} = f_{i,t} on Ω^t. Now (18.34) follows from the monotonicity of the three maps $\bar T^i$, $\underline T^i$, and T_i and the observation that $(\underline T^i f)_t \le (T_i f)_t \le (\bar T^i f)_t$.

For the next step, we need the following Lemma 18.B.2 concerning the existence of ε-optimal continuous policies. Bertsekas and Shreve (1978: Section 8.2) contains a parallel result for measurable policies.

Lemma 18.B.2. Suppose that X is paracompact, Y is a topological linear space, and Z is a convex closed subset of Y containing 0 that has a base {B_n} for the neighborhoods of 0, consisting of symmetric and convex sets such that B_{n+1} ⊂ ½B_n. Suppose Φ : X → Z ⊂ Y is a continuous, compact- and convex-valued correspondence such that Φ(X) + B_n ⊂ Z. Suppose further that for each y ∈ Φ(X), y + B_n is open in Z. Let F : X × Z → R be continuous. Define f, g : X → R by
$$f(x) = \min_{y\in\Phi(x)} F(x,y) \quad\text{and}\quad g(x) = \max_{y\in\Phi(x)} F(x,y).$$
(a) If F(x,y) is convex in y, then for any ε > 0 there exists a continuous function h : X → Y such that h(x) ∈ Φ(x) and F(x,h(x)) ≤ f(x) + ε for all x ∈ X. (b) If F(x,y) is concave in y, then for any ε > 0 there exists a continuous function h : X → Y such that h(x) ∈ Φ(x) and F(x,h(x)) ≥ g(x) − ε for all x ∈ X.

Proof. We prove (a). Fix ε > 0. Define a correspondence ψ : X → Z ⊂ Y by
$$\psi(x) = \{y \in \Phi(x) : F(x,y) < f(x) + \varepsilon\}.$$
By the convexity of F(x,y) in y, ψ(x) is convex. Suppose y ∈ ψ(x_0). Then F(x_0,y) < f(x_0) + ε. By the continuity of f(·) + ε (via the Maximum Theorem) and of F, there exists a neighborhood N(x_0) of x_0 such that F(x,y) < f(x) + ε for all x ∈ N(x_0), which implies that y ∈ ψ(x) for all x ∈ N(x_0), which in turn implies that for any open set V, the set {x : ψ(x) ∩ V ≠ Ø} is open. Therefore ψ is lower semicontinuous. By Lemma 18.B.1, $\bar\psi$ admits a continuous selection, say h. Since
$$\bar\psi(x) \subset \{y \in \Phi(x) : F(x,y) \le f(x) + \varepsilon\},$$
we have F(x,h(x)) ≤ f(x) + ε for all x ∈ X.

Lemma 18.B.3. If q_i ∈ D and satisfies (18.38) for some {ξ_t}, then
$$u'(e_t)\,q_t \;\le\; b^t\,\frac{\beta b\,u(\bar e)}{1-\beta b}.$$
Proof. Apply (18.38) and the concavity of u.

(ii) Proof of (18.35). We show the existence of q^1. The existence of q^2 can be shown similarly. In the following, the superscript 1 is suppressed and, without essential loss of generality, we set t = 0. Choose T such that
$$2\,\frac{(\beta b)^T u(\bar e)\,\beta b}{u'(e_0)(1-\beta b)} \;<\; \beta\varepsilon.$$
By Lemma 18.B.2 (with X = Ω^t, Z as in the proof of part (b) of Theorem 18.2, Φ ≡ Q, F(ω^t,m) ≡ $E_m\{u'(e_{t+1})(d_{i,t+1} + \underline q_{i,t+1})\}$, and noting that the right side of the last expression is a continuous function of (ω^t,m)), there exists, for each t, a continuous π_t : Ω^t → M(Ω) such that π_t(ω^t) ∈ Q(ω^t) and
$$\beta E_{\pi_t(\omega^t,\cdot)}\{u'(e_{t+1})(d_{i,t+1} + \underline q_{i,t+1})\} \;\le\; u'(e_t(\omega^t))\,\underline q_{i,t}(\omega^t) + u'(e^*)(1-\beta)^2\varepsilon.$$
By the proof of Theorem 18.2(b), there is a unique equilibrium price process q in E associated with {π_t} as in (18.38) with ξ_t replaced by π_t. Now we show that
q_{i,0} satisfies the appropriate form of (18.35). For this purpose, define $q_i^T \in D$ by $q_{i,t}^T \equiv 0$ for t > T + 1, $q_{i,T+1}^T = \underline q_{i,T+1}$, and
$$q_{i,t}^T = \beta E_{\pi_t}\left\{\frac{u'(e_{t+1})}{u'(e_t)}\,(d_{i,t+1} + q_{i,t+1}^T)\right\} \quad\text{for } t \le T.$$
Then we claim that, for t ≤ T,
$$u'(e_t)\,q_{i,t}^T \;\le\; u'(e_t)\,\underline q_{i,t} + u'(e^*)(1-\beta)^2(\varepsilon + \beta\varepsilon + \cdots + \beta^{T-t}\varepsilon). \tag{18.B.4}$$
This is true when t = T, since
$$u'(e_T)\,q_{i,T}^T = \beta E_{\pi_T}\{u'(e_{T+1})(d_{i,T+1} + \underline q_{i,T+1})\} \;\le\; u'(e_T)\,\underline q_{i,T} + u'(e^*)\varepsilon(1-\beta)^2.$$
Assume that (18.B.4) is true for some t + 1 ≤ T. Then
$$\begin{aligned}
u'(e_t)\,q_{i,t}^T &= \beta E_{\pi_t}\{u'(e_{t+1})(d_{i,t+1} + q_{i,t+1}^T)\}\\
&\le \beta E_{\pi_t}\{u'(e_{t+1})\,d_{i,t+1} + u'(e_{t+1})\,\underline q_{i,t+1} + u'(e^*)(1-\beta)^2(\varepsilon + \beta\varepsilon + \cdots + \beta^{T-t-1}\varepsilon)\}\\
&\le u'(e_t)\,\underline q_{i,t} + u'(e^*)(1-\beta)^2(\varepsilon + \beta\varepsilon + \cdots + \beta^{T-t}\varepsilon).
\end{aligned}$$
Thus (18.B.4) is established. Setting t = 1 and letting T → ∞ on the right side of (18.B.4), we obtain
$$q_{i,1}^T \;\le\; \underline q_{i,1} + \varepsilon(1-\beta).$$
Now by Lemma 18.B.3 and straightforward calculation,
$$0 \;\le\; q_{i,1} - q_{i,1}^T \;\le\; 2\,\frac{(\beta b)^T u(\bar e)\,\beta b}{u'(e_0)(1-\beta b)}.$$
Then by our choice of T, $q_{i,1} \le q_{i,1}^T + \beta\varepsilon \le \underline q_{i,1} + \varepsilon$.
(iii) Proof of (18.36). “Only if” follows from (18.34). For the converse, assume (18.36). By choosing ε sufficiently small in (18.35) and noting the proof of the latter, it follows that there exist two equilibria q^0 and q^1, with $q_i^0 \ne q_i^1$, and corresponding (in the sense of (18.32)) sequences of continuous functions {π_t^0} and {π_t^1} from Ω^t to M(Ω) with π_t^i(ω^t) ∈ Q(ω^t) for i = 0, 1 and all ω^t ∈ Ω^t. For each α ∈ [0,1], define π_t^α = απ_t^0 + (1 − α)π_t^1. By the proof of Theorem 18.2(b), there exists a unique q(α) ∈ E such that
$$q_{i,t}(\alpha,\omega^t) = \beta E_{\pi_t^\alpha(\omega^t,\cdot)}\left\{\frac{u'(e_{t+1})}{u'(e_t)}\,(q_{i,t+1}(\alpha) + d_{i,t+1})\right\}.$$
If it can be shown that for each i and t, the map α → q_{i,t}(α) ∈ C(Ω^t) is continuous, then the proposition is proven, because $q_{i,t}^0 \ne q_{i,t}^1$ implies that $q_{i,t}^0(\omega^t) \ne q_{i,t}^1(\omega^t)$
for some ω^t. Then q_{i,t}(α,ω^t), as a continuous function of α, assumes at least two distinct values and hence must assume a continuum of distinct values. It remains to show that q_{i,t}(α) is continuous in α. Let ε > 0. By Lemma 18.B.3,
$$\begin{aligned}
\|q_{i,t}(\alpha) - q_{i,t}(\alpha_0)\| &\le \|q_{i,t}(\alpha) - q_{i,t}^T(\alpha)\| + \|q_{i,t}^T(\alpha) - q_{i,t}^T(\alpha_0)\| + \|q_{i,t}^T(\alpha_0) - q_{i,t}(\alpha_0)\|\\
&\le 2\,\frac{b^t(\beta b)^{T-t+1} u(\bar e)\,\beta b}{u'(e_t)(1-\beta b)} + \|q_{i,t}^T(\alpha) - q_{i,t}^T(\alpha_0)\|,
\end{aligned} \tag{18.B.5}$$
where $q_i^T(\alpha) \in D$ is defined by $q_{i,t}^T(\alpha) \equiv 0$ for t > T and
$$q_{i,t}^T(\alpha) = \beta E_{\pi_t^\alpha}\left\{\frac{u'(e_{t+1})}{u'(e_t)}\,(q_{i,t+1}^T(\alpha) + d_{i,t+1})\right\} \quad\text{for } t \le T.$$
The continuity of $q_{i,t}^T(\alpha)$ as a function from [0,1] to C(Ω^t) follows by straightforward induction. This implies that the second term of (18.B.5) can be made less than ε/2 by choosing |α − α_0| sufficiently small. Finally, choose T such that the first term of (18.B.5) is less than ε/2.
Part (a): Let $q^n \in E$ with $q^n \to q \in D^n$. Then, by the Maximum Theorem, (18.27) is satisfied for q. Therefore, q ∈ E and E is closed. Define PE ⊂ E to consist of those equilibria q for which (18.38) is satisfied by some sequence {ξ_t} as in (18.37), except that “measurability” is strengthened to “continuity.” As in the proof of (18.36), PE can be shown to be path-connected and hence also connected. Second, PE is dense in E. (The argument is similar to the proof of (18.35); in particular, ε-optimal continuous policies are used. A detailed proof is available from the authors upon request.) We conclude (Dugundji (1966: Theorem 1.6, p. 109)) that E is connected.
Appendix C: Aggregation in a heterogeneous agent economy

We provide the details to support the example in Section 18.3.4 dealing with heterogeneous agents. First define the subclass of our model of utility that corresponds to Schmeidler (1989), where beliefs are represented by a capacity. Say that the probability kernel correspondence P is capacity-based if for each ω ∈ Ω: (i) the mapping A → P(ω,A), from B(Ω) into [0,1], defines a convex capacity; and (ii) P(ω) = {m ∈ M(Ω) : m(A) ≥ P(ω,A), ∀A ∈ B(Ω)}. In that case we have the following convenient Choquet integration formula for any f ∈ C_+(Ω):
$$\int f\,dP(\omega) \;=\; \int_0^\infty P(\omega,\{f \ge t\})\,dt.$$
Moreover, and this is critical for what follows, for any two such functions f and g:
$$\int (f+g)\,dP(\omega) \;\ge\; \int f\,dP(\omega) + \int g\,dP(\omega), \tag{18.C.1}$$
and equality prevails if f and g are comonotone, that is, if
$$[f(\omega') - f(\omega)]\,[g(\omega') - g(\omega)] \;\ge\; 0 \qquad \forall\,\omega',\omega \in \Omega. \tag{18.C.2}$$

Note that the ε-contamination and belief function kernel examples are capacity-based. For these and other examples in a static setting, see Wasserman and Kadane (1990). Suppose further that P has full support and that P(ω) is constant in ω. Let β, {u^h}, {α_h}, and e be as in the text. For each t and ω^t, assign consumption to agent h given by the solution to (18.52) with x = e^*(ω_t). Denote the consumption processes defined in this way by $\bar c^h$ and the associated utility processes by $\bar V^h$. Then $\bar V_t^h(\omega^t) = u^h(x^{h*}(e^*(\omega_t))) + \text{constant}$, where $x^{h*}(x)$, h = 1, ..., H, is the solution to (18.52). Since the latter functions are all nondecreasing, we see that for each t, the functions $\{\bar V_t^h : h = 1, \ldots, H\}$ are pairwise comonotone. (∗) That is, given the allocation $\{\bar c^h\}$, agents agree (weakly) in their induced rankings of states. This occurs because the i.i.d. assumption restricts the dependence of beliefs on the current state so that it does not offset the comonotonicity of current felicities $u^h(x^{h*}(e^*(\cdot)))$.

We now show that $\{\bar c^h\}$ solves (18.57) uniquely and in particular is Pareto optimal: For any other feasible utility processes $\{V_t^h\}_{t=0}^\infty$ for h = 1, ..., H, (18.16), (18.C.1), and (18.52) imply
$$\begin{aligned}
\sum_h \alpha_h V_t^h(\omega^t) &= \sum_h \alpha_h u^h(c_t^h(\omega^t)) + \beta \sum_h \alpha_h \int V_{t+1}^h(\omega^t,\cdot)\,dP(\omega^t,\cdot)\\
&\le \sum_h \alpha_h u^h(\bar c_t^h(\omega^t)) + \beta \int \sum_h \alpha_h V_{t+1}^h(\omega^t,\cdot)\,dP(\omega^t,\cdot),
\end{aligned}$$
whereas
$$\sum_h \alpha_h \bar V_t^h(\omega^t) = \sum_h \alpha_h u^h(\bar c_t^h(\omega^t)) + \beta \int \sum_h \alpha_h \bar V_{t+1}^h(\omega^t,\cdot)\,dP(\omega^t,\cdot). \tag{18.C.3}$$
By the contraction mapping arguments in Appendix A, it follows that
$$\sum_h \alpha_h V_t^h(\omega^t) \;\le\; \sum_h \alpha_h \bar V_t^h(\omega^t). \tag{18.C.4}$$
The full support assumption for P guarantees that {c¯h } is the unique solution to (18.51).
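The Choquet formula and properties (18.C.1)–(18.C.2) are easy to verify on a finite state space. The sketch below is illustrative only: the three-state space, probabilities, and contamination weight are invented, and the capacity is the ε-contamination lower envelope, which is convex.

```python
import numpy as np

p = np.array([0.2, 0.3, 0.5])        # invented baseline probabilities
eps = 0.25                           # invented contamination weight

def capacity(A):
    # epsilon-contamination lower envelope: a convex capacity
    if len(A) == len(p):
        return 1.0
    return (1 - eps) * p[list(A)].sum()

def choquet(f):
    # Choquet integral of f >= 0 via its upper level sets {f >= t}
    order = np.argsort(f)
    total, prev = 0.0, 0.0
    for k in range(len(order)):
        total += (f[order[k]] - prev) * capacity(order[k:])
        prev = f[order[k]]
    return total

f = np.array([1.0, 3.0, 2.0])
g = np.array([0.5, 2.0, 1.0])        # comonotone with f (same state ranking)
h = np.array([2.0, 0.0, 1.0])        # not comonotone with f

print(choquet(f + g) - choquet(f) - choquet(g))   # ~0: comonotone additivity
print(choquet(f + h) - choquet(f) - choquet(h))   # > 0 here: superadditivity
```

For a convex capacity the Choquet integral equals the minimum expectation over the core, which is exactly the sense in which (18.C.1) and its comonotone-equality case drive the aggregation argument above.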
In terms of the candidate representative agent’s intertemporal utility U^α defined by (18.51), it follows from (18.C.3) and (18.C.4) that
$$U_t^\alpha(e;\omega^t) = u^\alpha(e^*(\omega_t)) + \beta \int U_{t+1}^\alpha(e;\omega^t,\cdot)\,dP(\omega^t,\cdot).$$
Moreover, a corresponding equality holds also if e is replaced by an arbitrary c ∈ D, since the preceding arguments extend to arbitrary endowment processes e. Therefore, U^α is generated by β, P, and u^α, completing the arguments sketched in the text.

Finally, note that the assumption of i.i.d. beliefs was used earlier only to guarantee (∗). Indeed, the latter condition, assumed to hold not only for the given e but for all endowment processes in an open neighborhood of e in the norm topology of D, suffices for our aggregation result.
Acknowledgments We are grateful to the Social Sciences and Humanities Research Council of Canada for financial support and to Chew Soo Hong, Darrell Duffie, Mike Peters, J. C. Rochet, Rishin Roy, and especially to Angelo Melino and Guy Laroque for valuable discussions and suggestions.
Notes
1 See, however, LeRoy and Singell (1987), for a different interpretation of Knight.
2 A closely related model, due to Schmeidler (1989) and Gilboa (1987), replaces the Bayesian prior by a nonadditive probability measure or capacity. In an earlier working paper, the capacity-based model was adopted as a starting point and virtually the identical results were obtained.
3 Under suitable additional restrictions on P, the map A → P(A) defines a capacity and ∫f dP equals the associated Choquet integral that is central to the capacity-based model of Schmeidler (1989) and Gilboa (1987). (See Appendix C.) With this link in mind, we add the following remark concerning the above definition of uncertainty aversion: Not every uncertainty averse P can accommodate Ellsberg-type behavior; the latter is inconsistent with the “small” subclass of correspondences P for which the capacity P(ω,·), for some ω, defines a qualitative probability (Schmeidler (1989: 585)).
4 Another possible rationale for (18.9) is based on the hypothesis that the set of states Ω is not exhaustive. See Epstein and Wang (1992) for elaboration and for another class of examples motivated by “missing states.”
5 The continuity of P_G follows from Epstein and Wang (1992: Proposition A.2.1). A closely related form of continuity is apparent from (18.15) later. Under our assumptions on p and G, the integrals there vary continuously with ω since f^* is continuous by the Maximum Theorem.
6 The class of functions that are analytic in the sense of analytic set theory (see Dellacherie and Meyer (1982)) is the appropriate one for Choquet integration, which is closely related to the integration notion (18.3) employed here. Therefore, future extensions of D may need to go beyond the space of adapted processes to include processes for which each X_t is analytic.
7 Epstein and Zin (1989) study recursive relations of the form V_t(c;ω^t) = W(c(ω^t), m(V_{t+1}(c;ω^t,·))), where m is a generalized certainty equivalent operator.
8 Relation (18.16) is the special case in which W(c,z) = u(c) + βz and m is the generalized expected value operator (18.6). Several of our results can be extended considerably beyond the specification (18.16) by applying and adapting available results on recursive utility, but such extensions would detract from the main focus of this chapter. Note also that the presence of uncertainty aversion introduces an important technical difference, namely a lack of Gâteaux differentiability, relative to the analysis in Epstein and Zin. See Sections 18.2.5 and 18.3 for elaboration and for the economic significance of the nondifferentiability.
9 Though we have not axiomatized our model, there is reason to believe that it can be provided with a respectable axiomatic basis. That is because in the literature on preferences under risk, the corresponding question of how to extend atemporal theories has been thoroughly examined and the recursive approach has been provided with respectable axiomatic credentials (see Kreps and Porteus (1978), Chew and Epstein (1991), and, for an overview, Epstein (1993)). In addition, Skiadas (1992) axiomatizes recursive utility in a Savage-style framework where conditional subjective probabilities are derived.
10 In particular, we assume that for each security buying and selling prices coincide. In fact, the presence of uncertainty aversion can “explain” bid-ask spreads even in the absence of transactions costs. We leave this extension of our model to a separate chapter.
11 By c_τ, we mean the function c_τ(ω^t,·) on Ω^{τ−t}, and similarly for q_τ, θ_τ, and so on. The indicated equality and inequality are intended at the level of functions and so apply throughout Ω^{τ−t}. Similar simplifying notation is adopted throughout the chapter. Finally, note that restrictions on short sales are commonly assumed in the literature in order to guarantee existence of planning optima and equilibria.
12 The objective function in (18.26) is concave in ξ and therefore is almost everywhere differentiable in ξ, for given e, d, q, and P. It is incorrect, however, to interpret this fact as implying that the price indeterminacy discussed later is “infrequent.” Only differentiability at ξ = 0 is relevant to price determinacy. Thus the relevant question is whether for “many” specifications of e, d, q, and P, the objective function in (18.26) is nondifferentiable in ξ at ξ = 0. The frequency of price indeterminacy is examined in Section 18.3.4.
13 We continue to write Q rather than Q_{V^*}.
14 It is common in the literature to assume a time-homogeneous Markov structure for dividends and to restrict attention to price processes that are time-homogeneous and Markovian. Therefore, we point out that under the earlier mentioned assumption, Theorems 18.2 and 18.3 remain valid if price processes are defined to be elements of D that are time-homogeneous and Markovian.
15 It is well known that sunspot equilibria may exist, even in infinitely lived representative agent models, given financial constraints, externalities, nonconvexities, or other sources of market imperfection that lead to inefficient equilibrium allocations. See Guesnerie and Woodford (1993) for a survey. In contrast, in our model no such imperfections exist and the equilibrium allocation is trivially efficient, but quantities do not vary with the extrinsic state.
16 Note the difference between (18.32) and (18.38). The former is sufficient for q to be an equilibrium since the selection {π_t} is assumed to be continuous and hence the solution q to (18.32) must lie in D^n. On the other hand, as just shown, the existence of a measurable selection {ξ_t} as in (18.37)–(18.38) is a necessary condition for q to be an equilibrium. It is also sufficient only if, as in Theorem 18.4, we assume that the solution q to (18.38) lies in D^n. It is more common to assume a time-homogeneous Markov structure for growth rates rather than levels. Our analysis is readily modified accordingly with no effects on our qualitative results.
17 That is, u′(e^*)d_i^* is not measurable with respect to the σ-algebra on Ω generated by the mapping V^* : Ω → R. Note that (18.44) is weaker than (18.43) since the latter requires only that one not be able to infer the magnitude of u′(e^*)d_i^* from knowledge that V^* = min V^*, while the former rules out the possibility of such inference given V^* = k for some k. This difference does not appear to us to be economically significant and thus we will not differentiate between (18.43) and (18.44).
18 If Ω consists of only two states, then (18.45) and (18.46) are each equivalent to: (∗) e^* is constant and d_i^* is not constant (on Ω). In particular, for this i.i.d. case, indeterminacy can occur only if consumption is certain. This conclusion that asset price indeterminacy is limited to riskless initial positions is also apparent from examination of the indifference curves of a Gilboa–Schmeidler utility in the state preference diagram for a static setting (see Simonsen and Werlang (1991), for example). However, one must be cautious in extrapolating to more general state spaces, where (18.45) implies not (∗), but rather that the conditions specified there apply on arg min e^*.
19 Note the loose parallel with the case of risk (ε = 0 and P a probability kernel) where our model reduces to the consumption-based CAPM, according to which the risk premium for asset i depends on the covariation of d_i^* and consumption.
20 Under the stated assumptions, (18.42) implies that any ξ_{t+1}(ω^t,·) ∈ Q(ω^t) has Radon–Nikodym derivative of the form z_{t+1}(ω^t,·) = 1 − ε + εg_{t+1}(ω^t,·), for some g_{t+1} ≥ 0 satisfying ∫g_{t+1} dπ^*(ω^t,·) ≡ 1 and g_{t+1} = 0 on Ω∖Ω_m. These restrictions on z_{t+1} are equivalent to (18.47).
21 Specifically, if Ω = {ω_1, ..., ω_n}, then
$$m\text{-var}(\varepsilon) \equiv \max\Bigl\{\sum_i p(\omega_i)z_i^2 - 1 \;:\; z \in \mathbb{R}^n,\ z_i \ge 1-\varepsilon\ \forall i,\ \sum_i p(\omega_i)z_i = 1\Bigr\},$$
and the maximum is attained at one of the n extreme points $\{z^j\}_{j=1}^n$ of the constraint set, where $z_j^j = 1 - \varepsilon + \varepsilon/p(\omega_j)$ and $z_i^j = 1 - \varepsilon$ if i ≠ j.
Finally, note that e^* constant implies that Ω_m = Ω.
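The maximal-variance formula in note 21 can be verified directly. In the sketch below the probabilities are invented; the function scans the n extreme points z^j, each of which piles all of the contamination mass ε on a single state.

```python
import numpy as np

p = np.array([0.2, 0.3, 0.5])        # invented state probabilities

def m_var(eps):
    # max of E_p[z^2] - 1 over z >= 1 - eps with E_p[z] = 1,
    # attained at an extreme point z^j of the constraint set
    best = 0.0
    for j in range(len(p)):
        z = np.full(len(p), 1.0 - eps)
        z[j] = 1.0 - eps + eps / p[j]     # contamination mass on state j
        assert abs(p @ z - 1.0) < 1e-12   # feasibility: E_p[z] = 1
        best = max(best, p @ z**2 - 1.0)
    return best

print(m_var(0.0), m_var(0.25))       # vanishes at eps = 0, grows with eps
```

The largest variance comes from concentrating the contamination on the least likely state, and m-var(0) = 0 recovers the purely risky (single-prior) benchmark.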
The latter vanishes if ε = 0. Therefore, this expression represents a premium for the uncertainty associated with holding equity rather than for the bearing of risk. We leave to a separate chapter consideration of the equity premium puzzle (Mehra and Prescott (1985)) unrestricted by the numerous simplifying assumptions of this section.
References
Aubin, J. P. (1979). Mathematical Methods of Game and Economic Theory. Amsterdam: North Holland.
Barsky, R. B. and J. B. DeLong (1992). “Why Does the Stock Market Fluctuate?” NBER Working Paper 3995.
Bertsekas, D. P. and S. E. Shreve (1978). Stochastic Optimal Control. New York: Academic Press.
Bewley, T. (1986). “Knightian Decision Theory: Part I,” Cowles Foundation Discussion Paper No. 807, Yale University.
Camerer, C. and M. Weber (1992). “Recent Developments in Modeling Preferences: Uncertainty and Ambiguity,” Journal of Risk and Uncertainty, 5, 325–370.
Chew, S. H. and L. G. Epstein (1991). “Recursive Utility under Uncertainty,” in Equilibrium with an Infinite Number of Commodities, ed. A. Khan and N. Yannelis. Heidelberg: Springer-Verlag.
Cochrane, J. H. (1991). “Volatility Tests and Efficient Markets,” Journal of Monetary Economics, 27, 463–485.
Cochrane, J. H. and L. P. Hansen (1992). “Asset Pricing Explorations for Macroeconomics,” NBER Working Paper 4088.
Constantinides, G. M. (1982). “Intertemporal Asset Pricing with Heterogeneous Consumers and without Demand Aggregation,” Journal of Business, 55, 253–267.
Cragg, J. and B. Malkiel (1982). Expectations and the Structure of Share Prices. Chicago: University of Chicago Press.
Dellacherie, C. and P. A. Meyer (1982). Probabilities and Potential. New York: North-Holland.
DeLong, J. B., A. Shleifer, L. H. Summers, and R. J. Waldmann (1990). “Noise Trader Risk in Financial Markets,” Journal of Political Economy, 98, 703–738.
Dempster, A. P. (1967). “Upper and Lower Probabilities Induced by a Multivalued Mapping,” Annals of Mathematical Statistics, 38, 325–339.
Dow, J. and S. R. Werlang (1992). “Uncertainty Aversion, Risk Aversion and the Optimal Choice of Portfolio,” Econometrica, 60, 197–204. (Reprinted as Chapter 17 in this volume.)
Duffie, D. (1992). Dynamic Asset Pricing Theory. Princeton: Princeton University Press.
Duffie, D. and L. G. Epstein (1992). “Stochastic Differential Utility,” Econometrica, 60, 353–394.
Dugundji, J. (1966). Topology. Boston: Allyn and Bacon.
Ellsberg, D. (1961). “Risk, Ambiguity, and the Savage Axioms,” Quarterly Journal of Economics, 75, 643–669.
Epstein, L. G. (1993). “Behavior under Risk: Recent Developments in Theory and Applications,” in Advances in Economic Theory, Vol. II, ed. J. J. Laffont. Cambridge: Cambridge University Press.
Epstein, L. G. and M. LeBreton (1993). “Dynamically Consistent Beliefs Must Be Bayesian,” Journal of Economic Theory, 61, 1–22.
Epstein, L. G. and S. Zin (1989).
“Substitution, Risk Aversion, and the Temporal Behavior of Consumption and Asset Returns: A Theoretical Framework,” Econometrica, 57, 937–969.
Epstein, L. G. and T. Wang (1992). “Intertemporal Asset Pricing under Knightian Uncertainty,” University of Toronto, Working Paper 9211.
Frankel, J. A. and K. Froot (1990). “Exchange Rate Forecasting Techniques, Survey Data, and Implications for the Foreign Exchange Market,” NBER Working Paper 3470.
Gilboa, I. (1987). “Expected Utility Theory with Purely Subjective Non-Additive Probabilities,” Journal of Mathematical Economics, 16, 65–88.
Gilboa, I. and D. Schmeidler (1989). “Maxmin Expected Utility with Non-Unique Prior,” Journal of Mathematical Economics, 18, 141–153. (Reprinted as Chapter 6 in this volume.)
—— (1993). “Updating Ambiguous Beliefs,” Journal of Economic Theory, 59, 33–49. (Reprinted as Chapter 8 in this volume.)
Guesnerie, R. and M. Woodford (1993). “Endogenous Fluctuations,” in Advances in Economic Theory, Vol. II, ed. J. J. Laffont. Cambridge: Cambridge University Press.
Hansen, L. P. and R. Jagannathan (1991). “Implications of Security Market Data for Models of Dynamic Economies,” Journal of Political Economy, 99, 225–262.
Hansen, L. P. and S. F. Richard (1987). “The Role of Conditioning Information in Deducing Testable Restrictions Implied by Dynamic Asset Pricing Models,” Econometrica, 55, 587–619.
Ito, T. (1990). “Foreign Exchange Rate Expectations: Micro Survey Data,” American Economic Review, 80, 434–449.
Jaffray, J. Y. (1992). “Dynamic Decision Making with Belief Functions,” mimeo.
Keynes, J. M. (1921). A Treatise on Probability. London: Macmillan.
—— (1936). The General Theory of Employment Interest and Money. London: Macmillan.
Klein, E. and A. Thompson (1984). Theory of Correspondences. New York: Wiley.
Knight, F. H. (1921). Risk, Uncertainty and Profit. Boston: Houghton Mifflin.
Koppel, R. (1991). “Animal Spirits,” Journal of Economic Perspectives, 5, 203–210.
Kreps, D. M. and E. L. Porteus (1978). “Temporal Resolution of Uncertainty and Dynamic Choice Theory,” Econometrica, 46, 185–200.
Lehmann, B. N. (1992). “Asset Pricing and Intrinsic Values: A Review Essay,” Journal of Monetary Economics, 28, 485–500.
LeRoy, S. F. (1989). “Efficient Capital Markets and Martingales,” Journal of Economic Literature, 27, 1583–1621.
LeRoy, S. F. and L. D. Singell Jr. (1987). “Knightian Risk and Uncertainty,” Journal of Political Economy, 95, 384–406.
Lucas, R. E. Jr. (1978). “Asset Prices in an Exchange Economy,” Econometrica, 46, 1429–1445.
Lucas, R. E. Jr. and N. Stokey (1984). “Optimal Growth with Many Consumers,” Journal of Economic Theory, 7, 188–209.
Mehra, R. and E. Prescott (1985). “The Equity Premium: A Puzzle,” Journal of Monetary Economics, 15, 145–161.
Michael, E. (1956). “Continuous Selections, I,” Annals of Mathematics, 63, 361–382.
Papamarcou, A. and T. L. Fine (1991). “Unstable Collectives and Envelopes of Probability Measures,” Annals of Probability, 19, 893–906.
Poterba, J. M. and L. H. Summers (1988). “Mean Reversion in Stock Prices: Evidence and Implications,” Journal of Financial Economics, 22, 27–59.
Royden, H. L. (1988). Real Analysis, 3rd ed. New York: Macmillan.
Savage, L. (1954). The Foundations of Statistics. New York: John Wiley.
Schmeidler, D. (1989). “Subjective Probability and Expected Utility without Additivity,” Econometrica, 57, 571–587. (Reprinted as Chapter 5 in this volume.)
Shiller, R. J. (1981). “Do Stock Prices Move Too Much to Be Justified by Subsequent Changes in Dividends?” American Economic Review, 71, 421–436.
—— (1991). Market Volatility. Cambridge: MIT Press.
Simonsen, M. H. and S. R. C. Werlang (1991). “Subadditive Probabilities and Portfolio Inertia,” R. de Econometria, 11, 1–19.
Sion, M. (1958). “On General Min-Max Theorems,” Pacific Journal of Mathematics, 8, 171–176.
Skiadas, C. (1992). “Advances in the Theory of Choice and Asset Pricing,” Ph.D. Dissertation, Stanford University.
Stokey, N. and R. E. Lucas Jr. (1989). Recursive Methods in Economic Dynamics. Cambridge: Harvard University Press.
Walley, P. (1991). Statistical Reasoning with Imprecise Probabilities. London: Chapman and Hall.
Wasserman, L. A. (1988). “Some Applications of Belief Functions to Statistical Inference,” Ph.D. Dissertation, University of Toronto.
—— (1990). “Prior Envelopes Based on Belief Functions,” Annals of Statistics, 18, 454–464.
Wasserman, L. A. and J. Kadane (1990). “Bayes’ Theorem for Choquet Capacities,” Annals of Statistics, 18, 1328–1339.
West, K. D. (1988). “Bubbles, Fads and Stock Price Volatility Tests: A Partial Evaluation,” Journal of Finance, 43, 639–660.
Zarnowitz, V. (1984). “Business Cycle Analysis and Expectational Survey Data,” in Leading Indicators and Business Cycle Surveys, eds K. H. Openheimer and G. Poser. Aldershot: Gower Publishing.
—— (1992). Business Cycles: Theory, History, Indicators and Forecasting. Chicago: University of Chicago Press.
Zarnowitz, V. and L. A. Lambros (1987). “Consensus and Uncertainty in Economic Prediction,” Journal of Political Economy, 95, 591–621.
Zeidler, E. (1986). Nonlinear Functional Analysis and Its Applications, I: Fixed-Point Theorems. New York: Springer-Verlag.
19 Sharing beliefs
Between agreeing and disagreeing
Antoine Billot, Alain Chateauneuf, Itzhak Gilboa, and Jean-Marc Tallon
19.1. Introduction When is it Pareto optimal for risk averse agents to take bets? Under what conditions do they choose to introduce uncertainty into an otherwise certain economic environment? One obvious case is where they do not share beliefs. As in the classical (theoretical) example of horse lotteries, people who do not agree on probability assessments do find it mutually beneficial to engage in uncertainty-generating trade. If the agents involved are Bayesian expected utility maximizers and strictly risk averse, it is not hard to see that disagreement on probabilities is the only way that betting, understood as trade of an uncertain asset, may be Pareto improving when starting from a full insurance allocation. On the other hand, any such disagreement induces betting. Put differently, Pareto optimality dictates either that there be no betting (in case beliefs are common to all agents) or that there be betting (in case of disagreement). This is somewhat puzzling, because there is no lack of allocation-neutral, “sunspot” sources of uncertainty in the world around us. If every disagreement on probabilities of states of the world suggests a Pareto improving trade, one might have expected to see much more betting taking place. Rather than believing that people who do not bet necessarily share probabilistic beliefs about anything they do not bet on (or, to be precise, share these beliefs up to some slack allowed by transaction costs), we tend to take the relative rarity of bets as a piece of empirical evidence against the Bayesian model. It seems that often people do not bet because they are uncertainty averse, and they therefore tend to avoid uncertainty that they know little about. It follows that a person’s willingness to bet will increase with her subjective confidence in her information and in her likelihood assessments. 
It is worth emphasizing that Bewley’s (1986) motivation for his work on Knightian decision theory was partly this absence of observed widespread betting.
Billot, Antoine, Alain Chateauneuf, Itzhak Gilboa, and Jean-Marc Tallon (2000) “Sharing beliefs: between agreeing and disagreeing,” Econometrica, 68, 685–694.
Sharing beliefs
473
While we do not attempt to argue that the full complexity of betting behavior can be explained by the type of models we study here,1 we are led to ask how much can be explained by these models if we relax some of the more demanding assumptions of the Bayesian model. Specifically, we consider maxmin expected utility with a nonunique prior (Gilboa and Schmeidler, 1989), which captures Knightian uncertainty (Knight, 1921). Assume that such uncertainty averse agents, who are also risk averse, populate an economy in which there is no aggregate risk. When does there exist a full insurance, that is, no-bet, allocation that is also Pareto optimal? When is it the case that all Pareto optimal allocations are full insurance? Is any betting due to different beliefs, and, conversely, does a difference in beliefs always trigger some betting? In the multiple prior model an individual is characterized by a utility function and a nonempty, closed and convex set of probability measures. The individual evaluates every act by its expected utility according to each possible probability measure, and chooses an act whose minimal expected utility is the highest. The family of preference relations described by this model strictly contains the relations described by Choquet expected utility with a convex capacity (Schmeidler, 1989). Consider now a pair of agents conforming to the multiple prior model. It is an easy extension of the expected utility analysis to show that these agents will not bet against one another if they share at least one prior. Moreover, in a general framework with more than two agents and complex bets possibly involving several of them, it is easy to show, following Dow and Werlang's (1992) early intuition, that Pareto optimal allocations are indeed full insurance allocations whenever agents' sets of priors have a nonempty intersection (see, e.g. Dana, 1998; Tallon, 1998).
The question of whether the converse to this result holds arises naturally: is commonality of beliefs, in the sense of agents sharing a prior in common, exactly what is needed to explain, within the framework of the multiple prior model, the absence of betting on the many possible sources of "extrinsic" uncertainty? Differently put, is the observation of a Pareto optimal allocation that is immune to sunspots enough to tell us something about the intersection of agents' sets of priors? It turns out that we can answer this question affirmatively and that the result in the Bayesian model has a conceptually identical counterpart in the multiple prior model. Under the same nontriviality conditions, there exists a Pareto optimal full insurance allocation if and only if all Pareto optimal allocations provide full insurance, and this holds if and only if all agents share a prior probability on the states of the world. In other words, commonality of beliefs is the necessary and sufficient condition to explain the absence of betting. Whereas in the Bayesian model "sharing a prior" could only mean "having an identical prior," in the multiple prior model this phrase may be read as "having at least one prior in common." With this grammatical convention in place, the result holds verbatim. Bayesian agents either agree on probability assessments, or disagree enough to bet against each other. By contrast, uncertainty averse agents can be in a "grey area" between agreeing and disagreeing: they may not agree in the sense of having the same set of possible priors, yet not disagree in the sense of being willing to bet against each other.
474
Billot et al.
Finally, we emphasize another contribution of this note. In showing that commonality of beliefs is the minimal assumption explaining the absence of bets, we prove a separation theorem for n convex sets that might be of interest on its own. The rest of this chapter is organized as follows. Section 19.2 provides the setup of the model. In Section 19.3 we state the main result and the separation theorem. Proofs are relegated to an Appendix.
19.2. Setup

The economy we consider is a standard two-period pure-exchange economy with uncertainty in the second period, but for agents' preferences. The state space is S, and Σ is a σ-algebra of subsets of S, so that (S, Σ) is a measurable state space. There are n agents indexed by subscript i. We assume (i) that there is only one good, which can be interpreted as income or money; and (ii) that there is no aggregate uncertainty. Trading an uncertain asset is thus interpreted as betting rather than as hedging. Let B(S, Σ) be the Banach space of real-valued, bounded and measurable functions on S, endowed with the sup-norm. Let ba(S, Σ) be the space of bounded finitely additive measures on (S, Σ), endowed with the weak*-topology. Agent i's consumption Ci is a positive element of B(S, Σ), that is, Ci(s) is the consumption of agent i in state s. Denote by w ∈ B(S, Σ) the constant-across-states aggregate endowment, and assume that w > 0. An allocation C = (C1, ..., Cn) is feasible if ∑_{i=1}^n Ci = w. An allocation is interior if Ci(s) > 0 for all i, for all s.

In the multiple prior approach, each agent i is endowed with a utility index Ui: R+ → R and a set Pi of probability distributions over S. Ui is defined up to a positive affine transformation, and is taken to be differentiable, strictly increasing, and strictly concave. Pi is a convex and closed subset of ba(S, Σ). We assume that all priors in Pi are σ-additive.2 Note that Pi is compact in the weak*-topology since it is a weak*-closed subset of the set of finitely additive probability measures on Σ, which is compact in the weak*-topology (see, e.g. Dunford and Schwartz, 1958). The norm-dual of B(S, Σ), which is isometrically isomorphic to ba(S, Σ), will be denoted B*(S, Σ). The overall utility function Vi defined over B(S, Σ) then takes the following form:

Vi(Ci) = min_{π ∈ Pi} Eπ Ui(Ci).
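When S is finite and Pi is the convex hull of finitely many priors, Vi can be computed by minimizing over the generating priors alone, since the expectation is linear in the prior. A minimal sketch (the two-state numbers and the utility index are illustrative choices, not from the text):

```python
import math

def maxmin_utility(consumption, priors, u):
    """V(C) = min over pi in P of E_pi[ u(C) ].

    Since E_pi[u(C)] is linear in pi, the minimum over the convex hull
    of `priors` is attained at one of the listed (extreme) priors.
    """
    return min(
        sum(p[s] * u(c) for s, c in enumerate(consumption))
        for p in priors
    )

# Two states; the agent's set of priors is the segment between two measures.
priors = [(0.3, 0.7), (0.6, 0.4)]
u = math.sqrt  # differentiable, strictly increasing, strictly concave

risky = (1.5, 0.5)  # consumption that varies across states
safe = (1.0, 1.0)   # a full insurance consumption plan

V_risky = maxmin_utility(risky, priors, u)
V_safe = maxmin_utility(safe, priors, u)
# The uncertainty averse agent evaluates the risky plan at its worst prior.
```

Here V_safe = u(1) = 1, while V_risky is the expectation of u under the least favorable generating prior (0.3, 0.7), roughly 0.862.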
We assume throughout that:

∀A ∈ Σ, ∀i, j, ∀πi ∈ Pi, ∀πj ∈ Pj:  πi(A) = 0 ⇐⇒ πj(A) = 0.

This assumption essentially says that all agents agree on "null events." The last definition we need is that of a full insurance allocation. An allocation C is said to be a full insurance allocation if it is constant apart from a set A ∈ Σ that has πi(A) = 0 for some (and therefore, by the assumption of mutual absolute continuity, for all) πi ∈ Pi and i.3
19.3. The main result

The following theorem states that the set of Pareto optimal allocations and the set of full insurance allocations are either identical or disjoint. Moreover, they are identical if and only if the agents share at least one prior.

Theorem 19.1. Under the maintained assumptions, the following assertions are equivalent:

(i) There exists an interior full insurance Pareto optimal allocation.
(ii) Any Pareto optimal allocation is a full insurance allocation.
(iii) Every full insurance allocation is Pareto optimal.
(iv) ∩_{i=1}^n Pi ≠ Ø.
The intuition for the proof (and the role of some assumptions) is as follows. We prove that (iv) ⇒ (ii) ⇒ (iii) ⇒ (i) ⇒ (iv). If there is a common prior (iv), one can use strict concavity to show that a risk bearing allocation is Pareto dominated by the full insurance allocation that equals its expectation at every state, proving (ii).4 This step uses the mutual absolute continuity assumption, as well as the assumption that the probability measures we deal with are σ-additive (rather than only finitely additive). Observe that with finitely additive measures the implication (iv) ⇒ (ii) does not hold, even in a Bayesian set-up. This is so because the integral of a function with respect to a finitely additive measure may be strictly smaller than each of the values the function assumes. Therefore individuals who hold assets that they view as uncertain may not benefit from smoothing them across states. If every Pareto optimal allocation provides full insurance (ii), the converse (iii) also holds, since no two full insurance allocations can be Pareto ranked,5 and it follows trivially that there is at least one such allocation (i). Finally, the crucial step and the main contribution of the theorem is that the existence of a full insurance Pareto optimal allocation (i) implies that there is a common prior (iv). This step does not require concavity of the utility index.6

In proving this last part we make use of the following theorem, which generalizes the standard separating hyperplane theorem, and may be of interest on its own. In the Appendix we also comment on the geometric interpretation of this result, which may be viewed as a separation theorem among n convex sets.

Theorem 19.2. Let X be a locally convex linear topological space and let Pi ⊆ X, 1 ≤ i ≤ n, be convex, nonempty, and compact. Then, the following are equivalent:

(i) ∩_{i=1}^n Pi = Ø.
(ii) There exist I ⊆ {1, ..., n}, I ≠ Ø, a point p ∈ co(∪_{i∈I} Pi), and, for each i ∈ I, a continuous linear functional hi: X → R such that:
(a) ∀i ∈ I, hi(q − p) > 0 for all q ∈ Pi;
(b) ∑_{i∈I} hi = 0.
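A numerical check of the implication from a shared prior to no betting in Theorem 19.1 ((iv) ⇒ (ii)), in a two-state, two-agent economy. All numbers are illustrative; each agent's prior set is an interval of probabilities for the first state, so the minimum expected utility is attained at an interval endpoint:

```python
import math

u = math.sqrt  # common strictly concave utility index

def V(c, priors):
    # Maxmin expected utility; `priors` lists the endpoints of the
    # interval of probabilities of state 1, which suffice because the
    # expectation is linear (hence monotone) in that probability.
    return min(p * u(c[0]) + (1 - p) * u(c[1]) for p in priors)

# P1 = [0.3, 0.6] and P2 = [0.4, 0.7]: the intersection [0.4, 0.6]
# is nonempty, so the agents share at least one prior.
P1, P2 = (0.3, 0.6), (0.4, 0.7)

# Aggregate endowment is 2 in both states (no aggregate uncertainty).
# Compare a betting allocation with the full insurance split (1, 1):
C1, C2 = (1.5, 0.5), (0.5, 1.5)
full = (1.0, 1.0)

bet_worse_for_1 = V(C1, P1) < V(full, P1)
bet_worse_for_2 = V(C2, P2) < V(full, P2)
# Both hold: the bet is Pareto dominated by full insurance.
```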
An immediate corollary of Theorem 19.2 is that, under the same assumptions, if ∩_{i=1}^n Pi = Ø, there exist continuous linear functionals hi, i = 1, ..., n, and a point p such that (a′) hi(q − p) ≥ 0 for all q ∈ Pi, for all i; (b′) ∑_{i=1}^n hi = 0; and (c′) there exist i, i′ such that the inequality in (a′) is strict. It is worthy of note that a similar result, developed independently and with a rather different motivation, is to be found in Samet (1998), for subsets of a finite dimensional simplex. Samet's result is weaker in the sense that it guarantees the existence of linear functionals as in our case, but does not guarantee that the separating hyperplanes will intersect at one point p in the convex hull of the sets, and therefore does not lend itself to a straightforward geometric interpretation. Further, Samet's result can be easily derived from the corollary above specialized to subsets of the simplex. It does not appear that Samet's argument could easily be amended to yield ours.

Theorem 19.1 has two immediate corollaries. First, in the Choquet expected utility model with convex capacities, nonempty core intersection is equivalent to some, or all, Pareto optimal allocations being full insurance. Second, in the expected utility case, where the sets of priors are reduced to one point, some, or all, Pareto optimal allocations are full insurance allocations if and only if agents have the same beliefs (i.e. the same prior). Note that even though we cast the argument in the multiple prior model, it should be clear from the proof that a similar result holds for the Bewley (1986) approach. In Bewley's approach, agents are also endowed with a set of priors and move away from an (exogenously defined) status quo situation only if the new situation is better than the status quo for all the probability distributions in their set of priors.
While Bewley characterizes a partial order over acts, a proposed bet will be preferred to a certain status quo if and only if this preference holds in the multiple prior model of Gilboa and Schmeidler.7 Our analysis is conducted for an economy with one good. However, the only use we make of this assumption is in arguing that all full insurance allocations are Pareto optimal. Indeed, one can generalize our results to an economy with m goods, with the slight modification that full insurance allocations that are considered for optimality be assumed Pareto optimal in each state.
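Conversely, when the prior sets are disjoint, the separating functionals of Theorem 19.2 correspond to sides Di of a bet, and the proof of (i) ⇒ (iv) perturbs full insurance by Ĉi = Ci + ε[Di − p(Di)1S]. A sketch for two agents with disjoint prior intervals (the intervals, the separating point p = 0.5, and ε = 0.1 are illustrative choices):

```python
import math

u = math.sqrt  # strictly concave utility index

def V(c, priors):
    # Min expected utility over an interval of priors; checking the
    # endpoints suffices since the expectation is linear in the prior.
    return min(p * u(c[0]) + (1 - p) * u(c[1]) for p in priors)

# Disjoint sets of priors (probability of state 1): no common prior.
P1, P2 = (0.2, 0.4), (0.6, 0.8)

# Separating point p = 0.5 with D2 = indicator of state 1 and D1 = -D2:
# h2(q - p) = q(s1) - 0.5 > 0 on P2, h1(q - p) = 0.5 - q(s1) > 0 on P1.
eps = 0.1
full = (1.0, 1.0)
C1_hat = (1.0 - eps * 0.5, 1.0 + eps * 0.5)  # C1 + eps*[D1 - p(D1)1]
C2_hat = (1.0 + eps * 0.5, 1.0 - eps * 0.5)  # C2 + eps*[D2 - p(D2)1]

agent1_gains = V(C1_hat, P1) > V(full, P1)
agent2_gains = V(C2_hat, P2) > V(full, P2)
# Both hold: the bet strictly improves on full insurance for each agent,
# despite risk aversion, because their worst-case beliefs disagree.
```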
Appendix

Proof of Theorem 19.1. We first prove (iv) ⇒ (ii). Assume to the contrary that there exists an agent, say, agent 1, such that for every π1 ∈ P1 and every c ∈ R+, π1({s | C1(s) < c}) + π1({s | C1(s) > c}) > 0. Let π ∈ ∩i Pi and define C̄i = Eπ Ci for all i. Abusing notation, let C̄i also denote the constant allocation giving C̄i to agent i in all states. C̄ = (C̄i)i is a feasible allocation since ∑i C̄i = ∑i Eπ Ci = Eπ ∑i Ci = Eπ w1S = w. Now,

Vi(Ci) = min_{ϕ ∈ Pi} Eϕ Ui(Ci) ≤ Eπ Ui(Ci).
Furthermore, Eπ Ui(Ci) ≤ Ui(Eπ(Ci)) = Ui(C̄i) = Vi(C̄i) for all i, since Ui is concave. Since π belongs to P1, one gets that π({s | C1(s) < C̄1}) + π({s | C1(s) > C̄1}) > 0. Furthermore, π({s | C1(s) < C̄1}) = 0 is impossible, for then π({s | C1(s) > C̄1}) > 0, implying by σ-additivity of π that Eπ(C1) > C̄1, a contradiction. Hence, π({s | C1(s) < C̄1}) > 0 and, similarly, π({s | C1(s) > C̄1}) > 0. It follows that V1(C1) < V1(C̄1) since U1 is strictly concave. Therefore, the allocation C̄ Pareto dominates C, a contradiction.

To see that (ii) implies (iii), let C be a full insurance allocation. Assume, contrary to (iii), that it is not Pareto optimal, and is dominated by another allocation C′. By the same argument as above, the constant allocation C̄′ derived from C′ is at least as desirable as C′ for every agent. By transitivity of Pareto domination, C̄′ Pareto dominates C. But this is a contradiction, since both C̄′ and C provide full insurance and there is only one good in the economy.

That (iii) implies (i) is obvious, and it remains to prove that (i) implies (iv). Suppose to the contrary that ∩i Pi = Ø, and let C be an interior Pareto optimal allocation that is a full insurance allocation (Ci is constant for all i apart from a set of measure zero, the latter notion being defined unambiguously given our mutual absolute continuity assumption). By Theorem 19.2 (where X is B*(S, Σ) endowed with the weak*-topology), since ∩i Pi = Ø, there exist a nonempty set I, a point p, and weak*-continuous linear functionals hi on B*(S, Σ), i ∈ I, such that:

(a) ∀i ∈ I, hi(q − p) > 0 for all q ∈ Pi;
(b) ∑_{i∈I} hi = 0.
Recall that (see, e.g. Kelley and Namioka, 1963: 155) every weak*-continuous linear functional on the conjugate space of a linear topological space E is the evaluation at some point of E. Hence, for all i ∈ I, there exists Di ∈ B(S, Σ) such that hi(q) = q(Di) for all q ∈ B*(S, Σ). Construct the allocation (Ĉi)_{i=1,...,n} as follows:

Ĉi = Ci,  i ∉ I,
Ĉi = Ci + ε[Di − p(Di)1S],  i ∈ I,

with ε > 0 small enough so that Ĉ is an allocation. We first check that this allocation is feasible:

∑_{i∈I} ε[Di − p(Di)1S] = ε[∑_{i∈I} Di − (∑_{i∈I} hi(p))1S] = ε ∑_{i∈I} Di,

since ∑_{i∈I} hi = 0.
Now, Di is such that hi(q) = q(Di) for all q ∈ B*(S, Σ), and hence q(∑_{i∈I} Di) = 0 for all q ∈ B*(S, Σ). To conclude that ∑_{i∈I} Di = 0, suppose there exists s such that ∑_{i∈I} Di(s) = a, a ≠ 0. The event {s | ∑_{i∈I} Di(s) = a} is measurable because the Di are measurable. Now, let q be the continuous linear functional in B*(S, Σ) corresponding to the additive probability in ba(S, Σ) with mass 1 on that event. Then q(∑_{i∈I} Di) = 0 implies a = 0, a contradiction. Hence, ∑_{i∈I} Di = 0.

Now, for i ∈ I, one has:

Vi(Ĉi) = E_{q̂ε} Ui(Ci + ε[Di − p(Di)1S])  for some q̂ε ∈ Pi
  = Vi(Ci) + ε U′i(Ci)[q̂ε(Di) − p(Di)] + o(ε)
  = Vi(Ci) + ε U′i(Ci)[hi(q̂ε − p)] + o(ε)
  ≥ Vi(Ci) + ε U′i(Ci)[ inf_{q∈Pi} hi(q − p) + α(ε) ],

where α(ε) = o(ε)/ε → 0 as ε → 0. Since inf_{q∈Pi} hi(q − p) > 0 by continuity of hi and compactness of Pi, and α(ε) → 0, there exists ε small enough so that the term in brackets is strictly positive. Hence, Vi(Ĉi) > Vi(Ci) for i ∈ I, and we have found a Pareto dominating allocation (Ĉi)_{i=1,...,n}, a contradiction.

Proof of Theorem 19.2. We start with the following lemma.

Lemma. Let X be a locally convex linear topological space and let Pi ⊆ X, 1 ≤ i ≤ n, be convex, nonempty, and compact. Assume that ∩_{i≤n} Pi = Ø but that for all l ≤ n, ∩_{i≠l} Pi ≠ Ø. Then, there exist p ∈ co(∪_{i=1}^n Pi) and a continuous linear functional hi: X → R for each i ≤ n such that:

(a) ∀i ≤ n, hi(q − p) > 0 ∀q ∈ Pi;
(b) ∑_{i≤n} hi = 0.

The geometric interpretation of this lemma is as follows. Assume that n convex and compact sets have an empty intersection, but that every proper subcollection of them has a nonempty intersection. Then, we can find a point p that is not included in any of the sets, but that is "in the middle" in the following sense: one can find, for each set Pi, a hyperplane hi that passes through p, which lies in the convex hull of the union of the Pi, and leaves the entire set Pi on one side, such that the normals of these hyperplanes, multiplied by appropriate positive constants, add up to zero. In the case n = 2, our lemma reduces to a standard separation theorem between two disjoint sets. For n > 2, the lemma may be considered as an n-way separation among n convex sets. See Figure 19.1 for an illustration of the case n = 3.

Figure 19.1 Separation among three convex sets: the hyperplanes h1 = h1(p), h2 = h2(p), and h3 = h3(p) pass through the common point p and separate the sets P1, P2, and P3, respectively.

Proof of the Lemma. The proof is by induction on n. For n = 2, we have P1 ∩ P2 = Ø and we use a standard separation theorem (cf. Kelley and Namioka, 1963: 119, theorem on strong separation) to conclude that there is a continuous linear functional h: X → R and a number β ∈ R such that h(q) > β for q ∈ P1 and h(q) < β for q ∈ P2. Choose p such that h(p) = β, and set h1 = h and h2 = −h. By linearity of h it is possible to choose p ∈ co(P1 ∪ P2).

Assume that the lemma holds for every n′ < n, and let there be given (Pi)_{i=1}^n. Set A = ∩_{i<n} Pi and B = Pn. By assumption, A and B are nonempty, convex, compact, and disjoint. By the strong separation theorem again, there exist a continuous linear functional h̃n: X → R and β ∈ R such that

h̃n(q) > β  ∀q ∈ B  and  h̃n(q) < β  ∀q ∈ A.
Choose q0 ∈ X such that h̃n(q0) = β. We shift the origin to q0. Specifically, define for each i ≤ n, P̂i = {p − q0 | p ∈ Pi} = Pi − q0. Naturally, (P̂i)_{i=1}^n and their intersections inherit all relevant properties of (Pi)i. Denote B̂ = B − q0 = P̂n and Â = A − q0 = ∩_{i<n} P̂i, so that h̃n(q) > 0 ∀q ∈ B̂ and h̃n(q) < 0 ∀q ∈ Â. Consider X′ = {q ∈ X | h̃n(q) = 0}. X′ is a locally convex linear topological subspace of X. Focusing on this subspace, define P̂′i = P̂i ∩ X′ for i < n. Obviously, P̂′i is convex and compact for every i < n. We argue that it is also nonempty. Indeed, P̂i contains Â, on which h̃n is negative. On the other hand, P̂i has a nonempty intersection with B̂ = P̂n, on which h̃n is positive. By convexity of P̂i and linearity of h̃n, P̂′i ≠ Ø. Similarly, for l < n, ∩_{i≠l,n} P̂i contains Â and intersects B̂, and we therefore get

∩_{i≠l,n} P̂′i ≠ Ø  ∀l < n.

Hence the sets (P̂′i)_{i<n} satisfy, within X′, the hypotheses of the lemma for n − 1 sets: ∩_{i<n} P̂′i = Â ∩ X′ = Ø (as h̃n < 0 on Â), while every proper subcollection has a nonempty intersection. By the induction hypothesis there exist a point p̂ ∈ co(∪_{i<n} P̂′i) and continuous linear functionals h′i: X′ → R, i < n, satisfying (a) and (b) on X′. These functionals are extended to all of X, and hn is then constructed so that (a) and (b) hold for all n sets, by means of the following fact.

Fact 19.1. Let X′ be a linear subspace of X, let C ⊆ X be convex and compact, and let h′: X′ → R be a continuous linear functional with h′(p) > 0 ∀p ∈ C ∩ X′. Then h′ can be extended to a continuous linear functional h: X → R such that h(p) > 0 ∀p ∈ C.

Proof of Fact 19.1. Set D = {p ∈ X′ | h′(p) = 0}. Observe that D ≠ Ø since the origin is in D. Thus C and D are disjoint nonempty closed and convex sets in
˜ → R and d ∈ R be X, and C is compact. Let a continuous linear functional h:X such that ˜ h(p) d
∀p ∈ C.
We claim that h˜ has to be constant on D. Indeed, assume that for some p, q ∈ D, ˜ ˜ ˆ ˆ h(p) = h(q). Since p, q ∈ D implies h(p) = h(q) = 0 and h (p) = h (q) = 0, ˜ + α(q − p)) | α ∈ we conclude that p + α(q − p) ∈ D for all α ∈ R. Hence {h(p ˜ R} = R, a contradiction to the fact that h(p) < d ∀ p ∈ D. Thus there is a c ∈ R ˜ such that h(p) = c ∀p ∈ D. Since the origin is in D, we obtain c = 0. It follows that d > 0 and therefore ˜ h(p) >d>0
∀ p ∈ C.
We now wish to show that, up to multiplication by a positive constant, h˜ extends h on X. Restrict attention to X . If p ∈ X satisfies h (p) = 0, then p ∈ D ˜ and we know that h(p) = 0. By Fact 19.2 below, there exists α ∈ R such that ˜ h(p) = αh (p) ∀ p ∈ X . However, on C ∩ X , both h˜ and h are positive. Therefore α > 0. Hence h ≡ (1/α)h˜ extends h on X and is positive on all of C. ˜ h: X → R be linear. Assume that Fact 19.2. Let X be a linear space and let h, ˜ h(q) = 0 ⇒ h(q) = 0 ∀ q ∈ X. ˜ Then there exists α ∈ R such that h(q) = α h(q) ∀q ∈ X. We skip the proof of this Fact and now turn to the proof of Theorem 19.2: (i) ⇒ (ii). Assume that ∩i≤n Pi = Ø. Let I be a minimal (with respect to set inclusion) subset of {1, . . . , n} with the property that ∩i∈I Pi = Ø. Since ∩ni=1 Pi = Ø, but Pi = Ø for every i, such a set I exists and for every such set | I | ≥ 2. Apply the Lemma to I . (ii) ⇒ (i). Assume that a point p ∈ X, a set I ⊆ {1, . . . , n}, and functionals (hi )i∈I exist asrequired, and suppose, contrary to (i), that there exists q ∈ ∩i≤n Pi . Then, by (a), i∈I hi (q − p) > 0, contrary to (b).
Acknowledgments We thank participants of the Erasmus conference at Tilburg University and two referees for useful comments.
Notes 1 In particular, we ignore the social aspects of betting as well as the strategic ones (see, e.g., Milgrom and Stokey, 1982). 2 Note that the axiomatization of Gilboa and Schmeidler (1989) delivers only finitely additive probability distributions.
3 It is straightforward to check that C is a full insurance allocation if and only if, for all i, Ci is constant apart from a set Ai ∈ Σ that has πi(Ai) = 0 for some (and therefore, by the assumption of mutual absolute continuity, for all) πi ∈ Pi.
4 This implication follows the logic of similar results for Choquet expected utility in Chateauneuf et al. (2000).
5 The fact that (iv) implies (ii) and (iii) also appears in Dana (1998), but in a finite set-up.
6 Dana (1998) shows that if there is a full insurance competitive equilibrium in this economy with finitely many states, then agents share a prior in common. Her proof, however, uses the concavity of the utility index and relies on the existence of a competitive equilibrium.
7 Bewley (1989) contains a similar no-trade result for agents whose preferences are given by partial orders as in Bewley (1986). His proof is very similar to Samet's, and his result is weaker than Theorem 19.2 in the same sense that Samet's is.
References

Bewley, T. (1986). "Knightian Decision Theory: Part I," Discussion Paper 807, Cowles Foundation.
—— (1989). "Market Innovation and Entrepreneurship: A Knightian View," Discussion Paper 905, Cowles Foundation.
Chateauneuf, A., R. A. Dana, and J.-M. Tallon (2000). "Optimal Risk-sharing Rules and Equilibria with Choquet Expected Utility," Cahiers EcoMaths 97-54, Université Paris I; Journal of Mathematical Economics, 34(2), 191–214.
Dana, R. A. (1998). "Pricing Rules when Agents have Non-additive Expected Utility and Homogeneous Expectations," Cahier du Ceremade, Université Paris IX.
Dow, J., and S. Werlang (1992). "Uncertainty Aversion, Risk Aversion, and the Optimal Choice of Portfolio," Econometrica, 60, 197–204. (Reprinted as Chapter 17 in this volume.)
Dunford, N., and J. T. Schwartz (1958). Linear Operators. Part I. New York: Interscience.
Gilboa, I., and D. Schmeidler (1989). "Maxmin Expected Utility with a Non-unique Prior," Journal of Mathematical Economics, 18, 141–153. (Reprinted as Chapter 6 in this volume.)
Kelley, J., and I. Namioka (1963). Linear Topological Spaces, Graduate Texts in Mathematics. New York: Springer Verlag.
Knight, F. (1921). Risk, Uncertainty and Profit. Boston: Houghton Mifflin.
Milgrom, P., and N. Stokey (1982). "Information, Trade and Common Knowledge," Journal of Economic Theory, 26, 17–27.
Samet, D. (1998). "Common Priors and Separation of Convex Sets," Games and Economic Behavior, 24, 172–174.
Schmeidler, D. (1989). "Subjective Probability and Expected Utility without Additivity," Econometrica, 57, 571–587. (Reprinted as Chapter 5 in this volume.)
Tallon, J.-M. (1998). "Do Sunspots Matter when Agents are Choquet-Expected-Utility Maximizers?" Journal of Economic Dynamics and Control, 22, 357–368.
20 Equilibrium in beliefs under uncertainty
Kin Chung Lo
20.1. Introduction

Due to its simplicity and tractability, the subjective expected utility model axiomatized by Savage (1954) has been the most important theory for analyzing human decision making under uncertainty. In particular, it is almost universally used in game theory. Using the subjective expected utility model to represent players' preferences, a large number of equilibrium concepts have been developed, the central one being Nash Equilibrium. On the other hand, the descriptive validity of the subjective expected utility model has been questioned, for example, because of Ellsberg's (1961) famous thought experiment, a version of which follows. Suppose there are two urns. Urn 1 contains 50 red balls and 50 black balls. Urn 2 contains 100 balls. Each ball in urn 2 can be either red or black, but the relative proportions are not specified. Consider the four acts listed in Table 20.1. Ellsberg argues that the typical preferences over the acts are f1 ∼ f2 ≻ f3 ∼ f4, where the strict preference f2 ≻ f3 reflects an aversion to the "ambiguity" or "Knightian uncertainty" associated with urn 2. Subsequent experimental studies generally support the finding that people are averse to ambiguity. (A summary can be found in Camerer and Weber (1992).) Such aversion contradicts the subjective expected utility model, as is readily demonstrated for the Ellsberg experiment. In fact, it contradicts any model of preferences in which underlying beliefs are represented by a probability measure. (Machina and Schmeidler (1992) call such preferences "probabilistically sophisticated." In this chapter, I reserve the term "Bayesian" for subjective expected utility maximizers.) The Ellsberg paradox has motivated generalizations of the subjective expected utility model. In the multiple priors model axiomatized by Gilboa and Schmeidler (1989), the single prior of Savage is replaced by a closed and convex set of probability measures.
The decision maker is said to be uncertainty averse if the set is not a singleton. He evaluates an act by computing the minimum expected utility over the probability measures in his set of priors.
Lo, K. C. (1996) Equilibrium in beliefs under uncertainty, J. Econ. Theory, 71: 443–484.
Table 20.1 Acts in Ellsberg's experiment

f1: Win $100 if the ball drawn from urn 1 is black
f2: Win $100 if the ball drawn from urn 1 is red
f3: Win $100 if the ball drawn from urn 2 is black
f4: Win $100 if the ball drawn from urn 2 is red
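If the probability of drawing black from urn 2 is only known to lie in an interval, the maxmin criterion delivers the typical Ellsberg ranking f1 ∼ f2 ≻ f3 ∼ f4. A minimal sketch (the interval [0.25, 0.75] is an illustrative choice; u($100) is normalized to 1 and u($0) to 0, so each act's utility is its worst-case winning probability):

```python
urn1_black = 0.5            # urn 1: composition known exactly
urn2_black = (0.25, 0.75)   # urn 2: ambiguous probability of black

# Maxmin utility of each act = minimum winning probability over priors.
V_f1 = urn1_black                       # bet on urn 1 black
V_f2 = 1 - urn1_black                   # bet on urn 1 red
V_f3 = min(urn2_black)                  # bet on urn 2 black
V_f4 = min(1 - p for p in urn2_black)   # bet on urn 2 red

# f1 ~ f2, f3 ~ f4, and both urn-1 bets are strictly preferred:
ellsberg_ranking = V_f1 == V_f2 > V_f3 == V_f4
```

Note that this pattern is inconsistent with any single probability for urn 2, which would force either f3 or f4 to be weakly preferred to the urn-1 bets.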
Although the Ellsberg Paradox only involves a single decision maker facing an exogenously specified environment, it is natural to think that ambiguity aversion is also common in decision-making problems where more than one person is involved. Since existing equilibrium notions of games are defined under the assumption that players are subjective expected utility maximizers, deviations from the Savage model to accommodate aversion to uncertainty make it necessary to redefine equilibrium concepts. This chapter generalizes Nash Equilibrium and one of its variations in normal form games to allow the beliefs of each player to be representable by a closed and convex set of probability measures as in the Gilboa–Schmeidler model. The chapter then employs the generalized equilibrium concepts to study the effects of uncertainty aversion on strategic interaction in normal form games. Note that in order to carry out a ceteris paribus study of the effects of uncertainty aversion on how a game is played, the solution concept we use for uncertainty averse players should be different from that for Bayesian players only in terms of attitude toward uncertainty. In particular, the solution concepts should share, as far as possible, comparable epistemic conditions. That is, the requirements on what the players should know about each other’s beliefs and rationality underlying the new equilibrium concepts should be “similar” to those underlying familiar equilibrium concepts. This point is emphasized throughout the chapter and is used to differentiate the equilibrium concepts proposed here from those proposed by Dow and Werlang (1994) and Klibanoff (1993), also in an attempt to generalize Nash Equilibrium in normal form games to accommodate uncertainty aversion. The chapter is organized as follows. Section 20.2 contains a brief review of the multiple priors model and a discussion of how it is adapted to the context of normal form games. 
Section 20.3 defines Nash Equilibrium and one of its variants. Section 20.4 defines and discusses the generalized equilibrium concepts used in this chapter. Section 20.5 makes use of the equilibrium concepts defined in Section 20.4 to investigate how uncertainty aversion affects players’ strategy choices and welfare. Section 20.6 identifies how uncertainty aversion is related to the structure of a game. Section 20.7 discusses the epistemic conditions of the equilibrium concepts for uncertainty averse players used in this chapter and compares them with those underlying the corresponding equilibrium notions for subjective expected utility maximizing players. Section 20.8 provides a comparison with Dow and Werlang (1994) and Klibanoff (1993). The comparison also serves to clarify the implications for adopting different approaches for developing equilibrium notions for games with uncertainty averse players. Section 20.9 argues that the results in previous sections hold even if we drop the particular functional form
of the utility function proposed by Gilboa and Schmeidler (1989) but retain some of its basic properties. Some concluding remarks are offered in Section 20.10.
20.2. Preliminaries

20.2.1. Multiple priors model

In this section, I provide a brief review of the multiple priors model and a discussion of some of its properties that will be relevant in later sections. For any topological space Y, adopt the Borel σ-algebra 𝒴 and denote by M(Y) the set of all probability measures over Y.1 Adopt the weak∗ topology on the set of all finitely additive probability measures over (Y, 𝒴) and the induced topology on subsets. Let (X, 𝒳) be the space of outcomes and (Ω, 𝒜) the space of states of nature. Let F be the set of all bounded measurable functions from Ω to M(X).2 That is, F is the set of two-stage, horse-race/roulette-wheel acts, as in Anscombe and Aumann (1963). For f, g ∈ F and α ∈ [0, 1], αf + (1 − α)g ≡ h, where h(ω) = αf(ω) + (1 − α)g(ω) ∀ω ∈ Ω. An act f ∈ F is called a constant act if f(ω) = p ∀ω ∈ Ω; such an act involves (probabilistic) risk but no uncertainty. For notational simplicity, I also use p ∈ M(X) to denote the constant act that yields p in every state of the world, x ∈ X the degenerate probability measure on x, and ω ∈ Ω the event {ω} ∈ 𝒜. The primitive is a preference ordering ≿ over acts. The relations of strict preference and indifference are denoted by ≻ and ∼, respectively. Gilboa and Schmeidler (1989) impose a set of axioms on ≿ that are necessary and sufficient for ≿ to be represented by a numerical function with the following structure: there exists an affine function u : M(X) → R and a unique, nonempty, closed and convex set Π of finitely additive probability measures on Ω such that for all f, g ∈ F,

f ≿ g ⇐⇒ min_{p∈Π} ∫ u ◦ f dp ≥ min_{p∈Π} ∫ u ◦ g dp.  (20.1)
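As a concrete illustration of the representation (20.1), the following sketch evaluates the maxmin criterion for a finite state space, assuming (as a simplification) that the set of priors Π is described by finitely many extreme points; by linearity of the expectation, the minimum over Π is attained at one of them. All names and numbers here are hypothetical.

```python
# Maxmin expected utility as in (20.1), with a finite state space.
# The set of priors is represented by its extreme points only.

def maxmin_utility(utilities, extreme_priors):
    """utilities[w] = u(f(w)); each prior maps states to probabilities."""
    return min(
        sum(p[w] * utilities[w] for w in utilities)
        for p in extreme_priors
    )

# Two states; priors with p(w1) ranging over [0.25, 0.75]:
u_f = {"w1": 8.0, "w2": 0.0}
priors = [{"w1": 0.25, "w2": 0.75}, {"w1": 0.75, "w2": 0.25}]
print(maxmin_utility(u_f, priors))  # min(2.0, 6.0) = 2.0
```

A Bayesian with a single prior is the special case in which `extreme_priors` has exactly one element.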
It is convenient, but in no way essential, to interpret Π as “representing the beliefs underlying ≿”; I provide no formal justification for such an interpretation. The difference between the subjective expected utility model and the multiple priors model can be illustrated by a simple example. Suppose Ω = {ω1, ω2} and X = R. Consider an act f ≡ (f(ω1), f(ω2)). If the decision maker is a Bayesian and his beliefs over Ω are represented by a probability measure p, the utility of f is p(ω1)u(f(ω1)) + p(ω2)u(f(ω2)). On the other hand, if the decision maker is uncertainty averse with the set of priors

Π = {p ∈ M({ω1, ω2}) | p_l ≤ p(ω1) ≤ p_h, with 0 ≤ p_l < p_h ≤ 1},
486
Kin Chung Lo
then the utility of f is

p_l u(f(ω1)) + (1 − p_l) u(f(ω2))   if u(f(ω1)) ≥ u(f(ω2)),
p_h u(f(ω1)) + (1 − p_h) u(f(ω2))   if u(f(ω1)) ≤ u(f(ω2)).
Note that given any act f with u(f(ω1)) > u(f(ω2)), (ω1, p_l; ω2, 1 − p_l)3 can be interpreted as local probabilistic beliefs at f in the following sense. There exists an open neighborhood of f such that for any two acts g and h in the neighborhood,

g ≿ h ⇐⇒ p_l u(g(ω1)) + (1 − p_l) u(g(ω2)) ≥ p_l u(h(ω1)) + (1 − p_l) u(h(ω2)).

That is, the individual behaves like an expected utility maximizer in that neighborhood, with beliefs represented by (ω1, p_l; ω2, 1 − p_l). Similarly, (ω1, p_h; ω2, 1 − p_h) represents the local probabilistic beliefs at f if u(f(ω1)) < u(f(ω2)). Therefore, a decision maker who “consumes” different acts may have different local probability measures at those acts.

There are three issues regarding the multiple priors model that will be relevant when the model is applied to normal form games. The first concerns the decision maker’s preference for randomization. According to the multiple priors model, preferences over constant acts, which can be identified with objective lotteries over X, are represented by u(·) and thus conform with the von Neumann–Morgenstern model. The preference ordering ≿ over the set of all acts is quasiconcave. That is, for any two acts f, g ∈ F with f ∼ g, we have αf + (1 − α)g ≿ f for any α ∈ (0, 1). This implies that the decision maker may have a strict incentive to randomize among acts.

The second concerns the notion of a null event. Given any preference ordering ≿ over acts, define an event T ⊂ Ω to be ≿-null as in Savage (1954): T is ≿-null if for all f, f′, g ∈ F,

(f(ω) if ω ∈ T; g(ω) if ω ∉ T) ∼ (f′(ω) if ω ∈ T; g(ω) if ω ∉ T).

In other words, an event T is ≿-null if the decision maker does not care about payoffs in states belonging to T. This can be interpreted as the decision maker knowing (or believing) that T can never happen. If ≿ is an expected utility preference ordering, then T is ≿-null if and only if the decision maker attaches zero probability to T.
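The strict incentive to randomize implied by quasiconcavity can be seen numerically. A minimal sketch with hypothetical payoffs, in a two-state setting with p(ω1) ranging over [0.25, 0.75]: two acts that are indifferent under maxmin can be strictly dominated by their mixture.

```python
# Mixing two indifferent acts can strictly raise maxmin utility.

def maxmin(u1, u2, pl=0.25, ph=0.75):
    """Maxmin utility of an act with state utilities (u1, u2);
    the minimum over the prior interval is attained at an endpoint."""
    return min(p * u1 + (1 - p) * u2 for p in (pl, ph))

f = (8.0, 0.0)   # good in state 1
g = (0.0, 8.0)   # good in state 2
mix = tuple(0.5 * a + 0.5 * b for a, b in zip(f, g))  # the act 0.5f + 0.5g

print(maxmin(*f), maxmin(*g), maxmin(*mix))  # 2.0 2.0 4.0
```

The mixture hedges across states, so the worst-case prior can no longer hurt the decision maker.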
If ≿ is represented by the multiple priors model, then T is ≿-null if and only if every probability measure in Π attaches zero probability to T. Finally, the notion of stochastic independence will also be relevant when the multiple priors model is applied to games with more than two players. Suppose the set of states is a product space Ω = Ω1 × · · · × ΩN. In the case of a subjective expected utility maximizer, where beliefs are represented by a probability measure p ∈ M(Ω), beliefs are said to be stochastically independent if p is a product measure: p = ×_{i=1}^N m_i, where m_i ∈ M(Ω_i) ∀i. In the case of uncertainty aversion,
the decision maker’s beliefs over Ω are represented by a closed and convex set of probability measures Π. Let marg_{Ω_i} Π be the set of marginal probability measures on Ω_i obtained as one varies over all the probability measures in Π. That is,

marg_{Ω_i} Π ≡ {m_i ∈ M(Ω_i) | ∃p ∈ Π such that m_i = marg_{Ω_i} p}.

Following Gilboa and Schmeidler (1989: 150–151), say that the decision maker’s beliefs are stochastically independent if

Π = closed convex hull of {×_{i=1}^N m_i | m_i ∈ marg_{Ω_i} Π ∀i}.

That is, Π is the smallest closed convex set containing all the product measures in ×_{i=1}^N marg_{Ω_i} Π.

20.2.2. Normal form games

This section defines n-person normal form games in which players’ preferences are represented by the multiple priors model. Throughout, the indices i, j, and k vary over distinct players in {1, . . . , n}. Unless specified otherwise, the quantifier “for all such i, j, and k” is to be understood. As usual, −i denotes the set of all players other than i. Player i’s finite pure strategy space is Si, with typical element si. The set of pure strategy profiles is S ≡ ×_{i=1}^n Si. The game specifies an outcome function gi : S → X for player i. Since mixed strategies induce lotteries over X, we specify an affine function ûi : M(X) → R to represent player i’s preference ordering over M(X). A set of strategy profiles, outcome functions, and utility functions determines a normal form game (Si, gi, ûi)_{i=1}^n. Let M(Si) be the set of mixed strategies for player i, with typical element σi. The set of mixed strategy profiles is therefore ×_{i=1}^n M(Si). σi(si) denotes the probability of playing si according to the mixed strategy σi; σ−i(s−i) denotes ∏_{j≠i} σj(sj), and σ−i is the corresponding probability measure on S−i ≡ ×_{j≠i} Sj. Note that when players are Bayesians, σi is sometimes interpreted as the probabilistic conjecture held by i’s opponents about i’s pure strategy choice.
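The stochastic-independence construction of Section 20.2.1 (the closed convex hull of products of marginals) can be sketched by tracking only extreme points: products of extreme marginals generate the extreme product measures. A simplified illustration with hypothetical marginal sets over two opponents' strategies:

```python
import itertools

def product_extremes(marginal_extremes):
    """marginal_extremes: one list of candidate marginals (dicts) per
    opponent; returns the product measures of all combinations."""
    products = []
    for combo in itertools.product(*marginal_extremes):
        p = {}
        for states in itertools.product(*(list(m) for m in combo)):
            prob = 1.0
            for m, s in zip(combo, states):
                prob *= m[s]
            p[states] = prob
        products.append(p)
    return products

# Opponent j's marginal p(U) in {0.25, 0.75}; opponent k's p(L) = 0.5:
m_j = [{"U": 0.25, "D": 0.75}, {"U": 0.75, "D": 0.25}]
m_k = [{"L": 0.5, "R": 0.5}]
prods = product_extremes([m_j, m_k])
print(len(prods), prods[0][("U", "L")])  # 2 products; 0.25 * 0.5 = 0.125
```

The set of independent beliefs would then be the convex hull of these product measures.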
This chapter adopts the view that uncertainty averse players have a strict incentive to randomize.4 Therefore, σi represents player i’s conscious randomization. For example, suppose a factory employer has two pure strategies, s = monitor worker 1 and s′ = monitor worker 2. His decision problem is to choose a (possibly degenerate) random device to determine which worker he is going to monitor. (See Section 20.4.2 for arguments for and against this approach.) Assume that player i is uncertain about the strategy choices of all the other players. Since the objects of choice for player j form the set of mixed strategies M(Sj), the relevant state space for player i is ×_{j≠i} M(Sj), endowed with the product topology. Each mixed strategy of player i can be regarded as an act over this state space. If player i plays σi and the other players play σ−i, i receives the lottery that yields outcome gi(si, s−i) with probability σi(si)σ−i(s−i). Note that this lottery has finite support because S, and therefore {gi(s)}_{s∈S}, are finite sets. It is also easy to see that the act corresponding to any mixed strategy is bounded and
measurable in the sense of the preceding subsection. Consistent with the multiple priors model, player i’s beliefs over ×_{j≠i} M(Sj) are represented by a closed and convex set of probability measures B̂i. Therefore, the objective of player i is to choose σi ∈ M(Si) to maximize

min_{p̂i∈B̂i} ∫_{×_{j≠i} M(Sj)} Σ_{si∈Si} Σ_{s−i∈S−i} ûi(gi(si, s−i)) σi(si) σ−i(s−i) dp̂i(σ−i).
Define the payoff function ui : S → R as follows: ui(s) ≡ ûi(gi(s)) ∀s ∈ S. A normal form game can then be denoted alternatively as (Si, ui)_{i=1}^n, and the objective function of player i can be restated in the form

min_{p̂i∈B̂i} ∫_{×_{j≠i} M(Sj)} Σ_{si∈Si} Σ_{s−i∈S−i} ui(si, s−i) σi(si) σ−i(s−i) dp̂i(σ−i).  (20.2)
In order to produce a simpler formulation of player i’s objective function, note that each element of B̂i is a probability measure over a set of probability measures. Therefore, the standard rule for reducing two-stage lotteries leads to the following construction of Bi ⊆ M(S−i):

Bi ≡ {pi ∈ M(S−i) | ∃p̂i ∈ B̂i such that pi(s−i) = ∫_{×_{j≠i} M(Sj)} σ−i(s−i) dp̂i(σ−i) ∀s−i ∈ S−i}.

The objective function of player i can now be rewritten as

min_{pi∈Bi} ui(σi, pi),  (20.3)

where ui(σi, pi) ≡ Σ_{si∈Si} Σ_{s−i∈S−i} ui(si, s−i) σi(si) pi(s−i). Convexity of B̂i implies that Bi is also convex. Further, from the perspective of the multiple priors model (20.1), (20.3) admits a natural interpretation whereby S−i is the set of states of nature relevant to i and Bi is his set of priors over S−i. Because of the greater simplicity of (20.3), the equilibrium concepts used in this chapter will be expressed in terms of (20.3) and Bi instead of (20.2) and B̂i. The above construction shows that doing so involves no loss of generality. However, the reader should always bear in mind that the former is derived from the latter, and I will occasionally go back to the primitive level to interpret the equilibrium concepts.
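Objective (20.3) can be evaluated directly when Bi is described by finitely many extreme priors over the opponents' pure strategy profiles S−i. A sketch with hypothetical payoffs for a 2×2 game:

```python
# Maxmin payoff (20.3) of a mixed strategy sigma_i against a set of
# priors B_i over opponents' pure strategy profiles.

def maxmin_payoff(u_i, sigma_i, extreme_priors):
    """u_i[(s_i, s_other)] -> payoff; each prior is a dict over s_other."""
    return min(
        sum(sigma_i[si] * p[sm] * u_i[(si, sm)]
            for si in sigma_i for sm in p)
        for p in extreme_priors
    )

# Player 1's belief p(L) lies in [0.25, 0.75]:
u1 = {("U", "L"): 4, ("U", "R"): 0, ("D", "L"): 0, ("D", "R"): 4}
B1 = [{"L": 0.25, "R": 0.75}, {"L": 0.75, "R": 0.25}]
print(maxmin_payoff(u1, {"U": 1.0}, B1))            # 1.0
print(maxmin_payoff(u1, {"U": 0.5, "D": 0.5}, B1))  # 2.0: mixing hedges
```

Maximizing this value over σi gives the best-response correspondence used in the equilibrium concepts below.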
20.3. Equilibrium concepts for Bayesian players

This section defines equilibrium concepts for Bayesian players. The definition of equilibrium proposed by Nash (1951) can be stated as follows:
Definition 20.1. A Nash Equilibrium is a mixed strategy profile {σi∗}_{i=1}^n such that

σi∗ ∈ BRi(σ−i∗) ≡ argmax_{σi∈M(Si)} ui(σi, σ−i∗).
Under the assumption that players are expected utility maximizers, Nash proves that any finite matrix game of complete information has a Nash Equilibrium. It is well known that there are two interpretations of Nash Equilibrium. The traditional interpretation is that σi∗ is the actual strategy used by player i. In a Nash Equilibrium, it is best for player i to use σi∗ given that the other players choose σ−i∗. The second interpretation is that σi∗ is not necessarily the actual strategy used by player i. Instead it represents the marginal beliefs of player j about what pure strategy player i is going to pick. Under this interpretation, Nash Equilibrium is usually stated as an n-tuple of probability measures {σi∗}_{i=1}^n such that

si ∈ BRi(σ−i∗)  ∀si ∈ support of σi∗.
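Definition 20.1 can be checked numerically. A sketch for matching pennies (hypothetical ±1 payoffs): against the uniform mixture, every strategy of the opponent yields the same expected payoff, so the uniform profile satisfies the best-response condition.

```python
import itertools

# Player 1's payoffs in matching pennies.
u1 = {("H", "H"): 1, ("H", "T"): -1, ("T", "H"): -1, ("T", "T"): 1}

def payoff(sigma1, sigma2):
    """Expected payoff of player 1 under mixed strategies sigma1, sigma2."""
    return sum(sigma1[a] * sigma2[b] * u1[(a, b)]
               for a, b in itertools.product("HT", repeat=2))

mix = {"H": 0.5, "T": 0.5}
vals = [payoff({"H": a, "T": 1 - a}, mix) for a in (0.0, 0.25, 0.5, 1.0)]
print(vals)  # [0.0, 0.0, 0.0, 0.0] -- every mixture is a best response
```

Player 2's payoff is the negative of player 1's, so the symmetric argument applies to her as well.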
Its justification is that, given that player i’s beliefs are represented by σ−i∗, BRi(σ−i∗) is the set of strategies that maximize the utility of player i. So player j should “think,” if j knows i’s beliefs, that only strategies in BRi(σ−i∗) will be chosen by i. Therefore, the event that player i will choose a strategy which is not in BRi(σ−i∗) should be “null” (in the sense of Section 20.2.1) from the point of view of player j. This is the reason for imposing the requirement that every strategy si in the support of σi∗, which represents the marginal beliefs of player j, must be an element of BRi(σ−i∗). The “beliefs” interpretation of Nash Equilibrium allows us to see clearly the source of the restrictiveness of this solution concept. First, the marginal beliefs of player j and player k about what player i is going to do are represented by the same probability measure σi∗. Second, player i’s beliefs about what his opponents are going to do are required to be stochastically independent, in the sense that the probability distribution σ−i∗ on the strategy choices of the other players is a product measure. We are therefore led to consider the following variation.
Definition 20.2. A Bayesian Beliefs Equilibrium is an n-tuple of probability measures {bi}_{i=1}^n, where bi ∈ M(S−i), such that5

margSi bj ∈ BRi(bi) ≡ argmax_{σi∈M(Si)} ui(σi, bi).
It is easy to see that if {σi∗}_{i=1}^n is a Nash Equilibrium, then {σ−i∗}_{i=1}^n is a Bayesian Beliefs Equilibrium. Conversely, a Bayesian Beliefs Equilibrium {bi}_{i=1}^n constitutes a Nash Equilibrium {σi∗}_{i=1}^n if bi = σ−i∗. Note that in games involving only two players, the two equilibrium concepts are equivalent in that a Bayesian Beliefs Equilibrium must constitute a Nash Equilibrium. However, when a game involves more than two players, the definition of Bayesian Beliefs Equilibrium is more general. For instance, in a Bayesian Beliefs
Equilibrium, players i and k can disagree about what player j is going to do. That is, it is allowed that margSj bi ≠ margSj bk.

Example 20.1. Marginal beliefs disagree. Suppose the game involves three players. Player 1 has only one strategy, {X}. Player 2 has only one strategy, {Y}. Player 3 has two pure strategies, {L, R}. The payoff to player 3 is a constant. {b1 = YL, b2 = XR, b3 = XY} is a Bayesian Beliefs Equilibrium. However, it does not constitute a Nash Equilibrium because players 1 and 2 disagree about what player 3 is going to do.

Second, in a Bayesian Beliefs Equilibrium, player i is allowed to believe that the other players are playing in a correlated manner. As argued by Aumann (1987), this does not mean that the other players are actually coordinating with each other. It may simply reflect that i believes that there exist some common factors among the players that affect their behavior; for example, player i knows that all the other players are professors of economics.

Example 20.2. Stochastically dependent beliefs. Suppose the game involves three players. Player 1 has two pure strategies, {U, D}. Player 2 has two pure strategies, {L, R}. Player 3 has two pure strategies, {T, B}. The payoffs of players 1 and 2 are constant. The payoff matrix for player 3 is as shown in Table 20.2. (For all n-person games presented in this chapter, the payoff is in terms of utility.) It is easy to see that b1 = (LT, 0.5; RT, 0.5), b2 = (UT, 0.5; DT, 0.5), and b3 = (UR, 0.5; DL, 0.5) constitute a Bayesian Beliefs Equilibrium. Moreover, the marginal beliefs of the players agree. However, the profile does not constitute a Nash Equilibrium. The reason is that player 3’s beliefs about the strategies of players 1 and 2 are stochastically dependent. If player 3 believed that the strategies of player 1 and player 2 were stochastically independent, his beliefs would be, say, (UL, 0.25; UR, 0.25; DL, 0.25; DR, 0.25), and T would no longer be his best response.
Table 20.2 Payoff matrix for player 3

      UL    UR    DL    DR
T    −10     3     4   −10
B      0     0     0     0
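The arithmetic behind Example 20.2 can be verified directly. A sketch comparing player 3's expected payoff from T under the correlated belief b3 and under the independent (uniform product) belief:

```python
# Player 3's payoffs from T (Table 20.2); B always yields 0.
u3_T = {("U", "L"): -10, ("U", "R"): 3, ("D", "L"): 4, ("D", "R"): -10}

def expected(u, belief):
    """Expected payoff of a pure strategy under a belief over profiles."""
    return sum(prob * u[s] for s, prob in belief.items())

correlated = {("U", "R"): 0.5, ("D", "L"): 0.5}  # b3 in the example
independent = {s: 0.25 for s in u3_T}            # product of uniform marginals

print(expected(u3_T, correlated))   # 3.5 > 0, so T is a best response
print(expected(u3_T, independent))  # -3.25 < 0, so B would be better
```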
20.4. Equilibrium concepts for uncertainty averse players

20.4.1. Equilibrium concepts

This section defines generalizations of Nash Equilibrium and Bayesian Beliefs Equilibrium that allow players’ preferences to be represented by the multiple priors model. The proposed equilibrium concepts preserve all essential features of their Bayesian counterparts, except that players’ beliefs are not necessarily represented by a probability measure. Further discussion is provided in Section 20.4.2. The generalization of Bayesian Beliefs Equilibrium is presented first.

Definition 20.3. A Beliefs Equilibrium is an n-tuple of sets of probability measures {Bi}_{i=1}^n, where Bi ⊆ M(S−i) is a nonempty, closed, and convex set, such that

margSi Bj ⊆ BRi(Bi) ≡ argmax_{σi∈M(Si)} min_{pi∈Bi} ui(σi, pi).
When expressed in terms of B̂i, a Beliefs Equilibrium is an n-tuple of closed and convex sets of probability measures {B̂i}_{i=1}^n such that

σi ∈ BRi(B̂i)  ∀σi ∈ ∪_{p̂j∈B̂j} support of marg_{M(Si)} p̂j,
where BRi(B̂i) is the set of strategies which maximize (20.2).6 The interpretation of Beliefs Equilibrium parallels that of its Bayesian counterpart. Given that player i’s beliefs are represented by B̂i, BRi(B̂i) is the set of strategies that maximize the utility of player i. So player j should “think,” if j knows i’s beliefs, that only strategies in BRi(B̂i) will be chosen by i. Therefore, the event that player i will choose a strategy that is not in BRi(B̂i) should be “null” (in the sense of Section 20.2.1) from the point of view of player j. This is the reason for imposing the requirement that every strategy σi in the union of the supports of the probability measures in marg_{M(Si)} B̂j, which represents the marginal beliefs of player j about what player i is going to do, must be an element of BRi(B̂i). It is obvious that every Bayesian Beliefs Equilibrium is a Beliefs Equilibrium. Say that a Beliefs Equilibrium {Bi}_{i=1}^n is proper if not every Bi is a singleton. Recall that Nash Equilibrium differs from Bayesian Beliefs Equilibrium in two respects: (i) the marginal beliefs of the players agree, and (ii) the overall beliefs of each player are stochastically independent. An appropriate generalization of Nash Equilibrium allowing for uncertainty aversion should also possess these two properties. Consider therefore the following definition.

Definition 20.4. A Beliefs Equilibrium {Bi}_{i=1}^n is called a Beliefs Equilibrium with Agreement if there exists ×_{i=1}^n Δi ⊆ ×_{i=1}^n M(Si) such that

Bi = closed convex hull of {σ−i ∈ M(S−i) | margSj σ−i ∈ Δj}.
We can see as follows that this definition delivers the two properties “agreement” and “stochastic independence of beliefs.” As explained in Section 20.2.2, player i’s beliefs are represented by a closed and convex set of probability measures B̂i on ×_{j≠i} M(Sj). I require the marginal beliefs of the players to agree in the sense that marg_{M(Sj)} B̂i = marg_{M(Sj)} B̂k. To capture the idea that the beliefs of each player are stochastically independent, I impose the requirement that B̂i contains all the product measures. That is,

B̂i = closed convex hull of {×_{j≠i} m̂j | m̂j ∈ marg_{M(Sj)} B̂i}.

Bi is derived from B̂i as in Section 20.2.2. By construction, we have margSj Bi = margSj Bk = convex hull of Δj, and Bi takes the form required in the definition of Beliefs Equilibrium with Agreement. Note that Beliefs Equilibrium and Beliefs Equilibrium with Agreement coincide in two-person games. Further, for n-person games, if {bi}_{i=1}^n is a Bayesian Beliefs Equilibrium with Agreement, then {bi}_{i=1}^n constitutes a Nash Equilibrium. To provide further perspective and motivation, I state two variations of Beliefs Equilibrium and explain why they are not the focus of this chapter. Given that any strategy in BRi(Bi) is equally good for player i, it is reasonable for player j to feel completely ignorant about which strategy i will pick from BRi(Bi). This leads us to consider the following strengthening of Beliefs Equilibrium:

Definition 20.5. A Strict Beliefs Equilibrium is a Beliefs Equilibrium with margSi Bj = BRi(Bi).

A Beliefs Equilibrium may not be a Strict Beliefs Equilibrium, as demonstrated in the following example.7 The example also shows that a Strict Beliefs Equilibrium does not always exist, which is obviously a serious deficiency of this solution concept.

Example 20.3. Nonexistence of Strict Beliefs Equilibrium. The game in Table 20.3 has only one Nash Equilibrium, {U, L}. It is easy to check that it is not a Strict Beliefs Equilibrium.
In fact, there is no Strict Beliefs Equilibrium for this game. Moving in the opposite direction, one may consider weakening the definition of Beliefs Equilibrium.

Table 20.3 A two-person game

      L       R
U    3, 2    −1, 2
D    0, 4    0, −100
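A numerical sketch for the game of Table 20.3: (U, L) is a Nash Equilibrium, while under complete ignorance (beliefs B1 = M({L, R})) player 1's worst-case payoffs favor D. This is the kind of tension that underlies the nonexistence claim above.

```python
# Table 20.3 payoffs.
u1 = {("U", "L"): 3, ("U", "R"): -1, ("D", "L"): 0, ("D", "R"): 0}
u2 = {("U", "L"): 2, ("U", "R"): 2, ("D", "L"): 4, ("D", "R"): -100}

# Nash check at (U, L): no profitable pure deviation for either player.
print(u1[("U", "L")] >= u1[("D", "L")])  # True: U is best against L
print(u2[("U", "L")] >= u2[("U", "R")])  # True: L is best against U

# Player 1's worst-case (maxmin) payoff of each pure strategy:
for s in ("U", "D"):
    print(s, min(u1[(s, t)] for t in ("L", "R")))  # U -1, D 0
```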
Definition 20.6. A Weak Beliefs Equilibrium is an n-tuple of beliefs {Bi}_{i=1}^n such that margSi Bj ∩ BRi(Bi) ≠ Ø.

It is clear that any Beliefs Equilibrium is a Weak Beliefs Equilibrium. The converse is not true. If margSi Bj ⊈ BRi(Bi), there are some strategies (in j’s beliefs about i) that player i will definitely not choose. However, player j considers those strategies “possible.” On the other hand, margSi Bj ∩ BRi(Bi) ≠ Ø captures the idea that player j cannot be “too wrong.” Weak Beliefs Equilibrium is also not the focus of this chapter, because we do not expect much strategic interaction if the players know so little about their opponents. However, it will be discussed further in Section 20.7 (Proposition 20.8) and Section 20.8, where its relation to the equilibrium concepts proposed by Dow and Werlang (1994) and Klibanoff (1993) is discussed.

20.4.2. Discussion

20.4.2.1. Mixed strategies as objective randomization vs subjective beliefs

As pointed out earlier, a mixed strategy of a player is traditionally regarded as his conscious randomization. In recent years, a modern view of mixed strategies has emerged according to which players do not randomize. Each player chooses a definite action, but his opponents may not know which one, and the mixture represents their conjecture about his choice. Note that in games with Bayesian players, the two views are “observationally indistinguishable” in the sense that they lead to the same set of Nash Equilibria. However, this is not necessarily the case for games with uncertainty averse players. For example, consider the game in Table 20.4. (For all two-person games presented in this chapter, player 1 is the row player and player 2 is the column player.) If player 1 is Bayesian, D is never optimal, whether he randomizes or not. Now suppose that player 1 is uncertainty averse.
If he has a preference ordering represented by (20.3), then whatever his beliefs, the utility of the mixed strategy (U, 0.5; C, 0.5) equals 5, while the utility of D equals only 1; hence D is again never optimal. On the other hand, if we assume that player 1 does not randomize and therefore has the choice set {U, C, D}, then he will strictly prefer to play D rather than U or C if his beliefs are, say, B1 = M({L, R}).

Table 20.4 A two-person game

       L        R
U    10, 1     0, 1
C     0, 1    10, 1
D     1, 1     1, 1

The above example demonstrates the necessity of reexamining the two views of mixed strategies when we consider games with uncertainty averse players. Such
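The computation claimed for Table 20.4 can be sketched directly: against any belief set containing both L and R, the 50/50 mixture of U and C guarantees 5, while D guarantees only 1 and the pure strategies U and C guarantee 0.

```python
# Table 20.4, player 1's payoffs.
u1 = {("U", "L"): 10, ("U", "R"): 0,
      ("C", "L"): 0,  ("C", "R"): 10,
      ("D", "L"): 1,  ("D", "R"): 1}

def guarantee(sigma):
    """Worst-case expected payoff over beliefs in M({L, R});
    the minimum is attained at a pure strategy of player 2."""
    return min(sum(sigma[s] * u1[(s, t)] for s in sigma) for t in ("L", "R"))

print(guarantee({"U": 1.0}))            # 0.0
print(guarantee({"C": 1.0}))            # 0.0
print(guarantee({"D": 1.0}))            # 1.0
print(guarantee({"U": 0.5, "C": 0.5}))  # 5.0
```

Without access to randomization, player 1's best guarantee among {U, C, D} is D's payoff of 1; with randomization, the mixture dominates it.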
a reexamination is provided here. It also serves to justify the adoption of the traditional view in this chapter. One justification of the modern view is that the normal form game under study is repeated over time, where each player’s pure strategy choices are independent and identically distributed random variables. A mixed strategy equilibrium can therefore be interpreted as a stochastic steady state. However, since uncertainty is presumably eliminated asymptotically, this repeated game scenario is of limited relevance for the present study of games with uncertainty averse players. The standard objection to the traditional view also does not necessarily extend to games with uncertainty averse players. The argument against the traditional view is that, since expected utility is linear in probabilities, Bayesian players do not have a strict incentive to randomize (see, for instance, Brandenburger (1992: 91)). However, when preferences deviate from the expected utility model, there may exist a strict incentive to randomize. To see this, let us first go back to the context of single-person decision theory. Recall that ≿ is a preference ordering over the set of acts F, where each act f maps Ω into M(X). The interpretation of f is as follows. First a horse race determines the true state of nature ω ∈ Ω. The decision maker is then given the objective lottery ticket f(ω). He spins the roulette wheel as specified by f(ω) to determine the actual prize he is going to receive. Also recall that for any two acts f, f′ ∈ F and α ∈ [0, 1], αf + (1 − α)f′ refers to the act which yields the lottery ticket αf(ω) + (1 − α)f′(ω) in state ω. Suppose ≿ is strictly quasiconcave, as in the Gilboa–Schmeidler model, and the decision maker has to choose between f and f′. Suppose further that he perceives that nature moves first; that is, a particular state ω∗ ∈ Ω has been realized, but the decision maker does not know what ω∗ is. If the decision maker randomizes between choosing f and f′ with probabilities α and 1 − α, respectively, he will receive the lottery αf(ω) + (1 − α)f′(ω) when ω∗ = ω. This is precisely the payoff of the act αf + (1 − α)f′ in state ω. That is, randomization enables him to enlarge the choice set from {f, f′} to {αf + (1 − α)f′ | α ∈ [0, 1]}. Correspondingly, there will “typically” be an α ∈ (0, 1) such that αf + (1 − α)f′ is optimal according to ≿.8 On the other hand, suppose the decision maker moves first and nature moves second. If the decision maker randomizes between choosing f and f′ with probabilities α and 1 − α, respectively, he faces the lottery (f, α; f′, 1 − α) that delivers act f with probability α and f′ with probability 1 − α. Therefore, randomization delivers the set {(f, α; f′, 1 − α) | α ∈ [0, 1]} of objective lotteries over F. Note that {(f, α; f′, 1 − α) | α ∈ [0, 1]} is not contained in the domain F of ≿, and so the Gilboa–Schmeidler model is silent on the decision maker’s preference ordering over this set. This discussion translates to the context of normal form games with uncertainty averse players as follows. The key is whether player i perceives himself as moving first or last. The assumption of a strict incentive to randomize can be justified by the assumption that each player perceives himself as moving last. On the other hand, if we assume that each player perceives himself as moving first, and has
an expected utility representation for preferences over objective lotteries on F, then there will be no strict incentive to randomize. Since the perception of each player about the order of strategy choices is not observable, and there does not seem to be a compelling theoretical case for assuming either order, it would seem that either specification of the strategy space merits study. (See also Dekel et al., 1991 for another instance where the perception of the players about the order of moves is important.) Another objection to the assumption of a strict incentive to randomize that might be raised in the context of uncertainty is that it contradicts “Ellsberg type” behavior. The argument goes as follows. Suppose a decision maker can choose between f3 and f4 listed in Table 20.1. If the decision maker randomizes between f3 and f4 with equal probability, this generates the act ½f3 + ½f4, which yields the lottery [$100, ½; $0, ½] in each state. Therefore, ½f3 + ½f4 is as desirable as f1 or f2. This implies that the decision maker will be indifferent between having the opportunity to choose an act from {f1, f2} or from {f3, f4}, and the Ellsberg Paradox disappears! The discussion in the previous paragraphs gives us the correct framework to handle this objection. Randomization between f3 and f4 with equal probability will generate the act ½f3 + ½f4 only if either the decision maker is explicitly told, or he himself perceives, that he can first draw a ball from the urn but not look at its color, then toss a fair coin and choose f3 (f4) when heads (tails) comes up (Raiffa, 1961: 693). Also, the preference pattern f1 ∼ f2 ≻ f3 ∼ f4 is already sufficient to constitute one version of the Ellsberg Paradox. In this version, consideration of randomization is irrelevant. Therefore, assuming a strict incentive to randomize does not make every version of the Ellsberg Paradox disappear.
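Since Table 20.1 is not reproduced in this excerpt, the following sketch assumes the familiar two-urn version of the paradox: f3 and f4 are complementary bets on an urn of unknown composition, so their 50/50 mixture yields the lottery [$100, ½; $0, ½] in every state. All acts and payoffs here are illustrative assumptions, not the chapter's own Table 20.1.

```python
# Two-urn Ellsberg sketch (assumed payoffs): f3 bets on red, f4 on
# black, with p(red) anywhere in [0, 1] for the unknown urn.

def maxmin(act, extreme_priors):
    """Maxmin utility of a bet over the two colors."""
    return min(p * act["red"] + (1 - p) * act["black"]
               for p in extreme_priors)

f3 = {"red": 100, "black": 0}
f4 = {"red": 0, "black": 100}
mix = {c: 0.5 * f3[c] + 0.5 * f4[c] for c in f3}  # pays 50 in every state

print(maxmin(f3, (0.0, 1.0)))   # 0.0
print(maxmin(f4, (0.0, 1.0)))   # 0.0
print(maxmin(mix, (0.0, 1.0)))  # 50.0: randomization removes the ambiguity
```

As the text notes, this mixture argument applies only if the decision maker perceives the randomization as taking place after the state is fixed.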
Finally, one standard defense of the interpretation of mixed strategies as objective randomization also makes sense in games with uncertainty averse players. That is, one may imagine a hypothetical “guide to playing games.” Such a guide can certainly recommend a mixed strategy or a set of mixed strategies to each player.

20.4.2.2. Knowledge of beliefs

In common with the equilibrium concepts for Bayesian players presented in Section 20.3, Beliefs Equilibrium (with Agreement) assumes that each player knows his opponents’ beliefs. Three possible justifications for this assumption are as follows. First, players’ beliefs are derived from statistical information (which is not necessarily precise enough to be characterized by an objective probability measure). For instance, a salesman possesses statistical information about the bargaining behavior of customers. If a customer also knows the information, then he knows the beliefs of the salesman (Aumann and Brandenburger, 1995: 1176). Second, players may learn about their opponents’ beliefs through pre-game communication. For instance, suppose a player has two pure strategies X and Y. He may announce that he will choose X with probability between 0 and 1. In fact, Example 20.5 in Section 20.5 illustrates that it may be strictly better for a player
to make such a “vague” announcement. This point is also discussed by Greenberg (1994). Finally, players’ beliefs may be derived from public recommendation. A “guide” suggests a set of strategies to each player publicly. After receiving the suggestion, each player chooses a strategy which is unknown to his opponents. Admittedly, these justifications may not be entirely convincing. For example, when two players receive the same statistical information, which does not take the form of an objective probability measure, it is demanding to assume that their beliefs agree. However, without the assumption of agreement, it would be even harder to justify the claim that players know each other’s beliefs, even though beliefs are derived from the same source of information. Nevertheless, note that the agreement assumption is equally strong if we restrict players to be Bayesians. In fact, if we allow players’ beliefs to disagree, it seems even harder for Bayesian beliefs to be mutual knowledge. How can player i know the unique subjective probability measure representing the beliefs of player j? To conclude, I acknowledge the limitations of the above story and, following Aumann and Brandenburger (1995: 1176), only intend to show that a player may well know another’s conjecture in situations of economic interest.

20.4.2.3. Knowledge of rationality

As do their Bayesian counterparts, Beliefs Equilibrium (with Agreement) assumes mutual knowledge of rationality. That is, player j’s beliefs about player i’s behavior are focused on i’s best responses given i’s true beliefs. This can be justified by the assumption that players learn their opponents’ rational behavior from past observations. We can assume that past observations are obtained from previous plays of the same normal form game (which has not been repeated sufficiently often to eliminate all uncertainty about strategy choices).9 Alternatively, we can assume that players’ knowledge of opponents’ rationality is derived from other sources.
For example, before players i and j play a normal form game, i has observed that j was rational when j played a different game with player k.

20.4.3. Relationship with maximin strategy and rationalizability

Finally, it is useful to clarify the relationship between the Beliefs Equilibrium defined in Section 20.4.1 and some familiar concepts in the received theory of normal form games.

Definition 20.7. The strategy σi∗ is a maximin strategy for player i if

σi∗ ∈ argmax_{σi∈M(Si)} min_{pi∈M(S−i)} ui(σi, pi).

The following result is immediate:

Proposition 20.1. If {M(Si)}_{i=1}^n is a Beliefs Equilibrium, then every σi ∈ M(Si) is a maximin strategy.
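Definition 20.7 can be computed for a small game by a grid search over player i's mixing probability (a numerical sketch with hypothetical payoffs; an exact solution would use linear programming). The inner minimum over M(S−i) is attained at a pure strategy of the opponent.

```python
def worst_case(alpha, u):
    """Guarantee of playing U with probability alpha and D otherwise."""
    return min(alpha * u[("U", t)] + (1 - alpha) * u[("D", t)]
               for t in ("L", "R"))

u1 = {("U", "L"): 4, ("U", "R"): 0, ("D", "L"): 0, ("D", "R"): 4}

# Approximate the maximin strategy on a grid of mixing probabilities.
value, alpha = max((worst_case(a / 100, u1), a / 100) for a in range(101))
print(value, alpha)  # 2.0 0.5: mix 50/50 to guarantee 2
```

In this symmetric example the 50/50 mixture equalizes the payoff across the opponent's pure strategies, which is the defining property of a maximin mixture here.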
Definition 20.8. Set Σi0 ≡ M(Si) and recursively define10

Σin = {σi ∈ Σi^{n−1} | ∃p ∈ M(×_{j≠i} supp Σj^{n−1}) such that ui(σi, p) ≥ ui(σi′, p) ∀σi′ ∈ Σi^{n−1}}.

For player i, the set of Correlated Rationalizable Strategies is Ri ≡ ∩_{n=0}^∞ Σin.

We call RBi ≡ ∩_{n=1}^∞ M(×_{j≠i} supp Σj^{n−1}) the set of Rationalizable Beliefs. These notions are related to Beliefs Equilibrium by the next proposition.

Proposition 20.2. Suppose {Bi}_{i=1}^n is a Beliefs Equilibrium. Then BRi(Bi) ⊆ Ri and Bi ⊆ RBi.

Proof. Set Σ̂i0 ≡ M(Si) and recursively define

Σ̂in = {σi ∈ Σ̂i^{n−1} | ∃P ⊆ M(×_{j≠i} supp Σ̂j^{n−1}) such that min_{p∈P} ui(σi, p) ≥ min_{p∈P} ui(σi′, p) ∀σi′ ∈ Σ̂i^{n−1}}.

By definition, Σi0 = Σ̂i0. It is obvious that Σi1 ⊆ Σ̂i1. Any element σi not in Σi1 does not survive the first round of the iteration in the definition of correlated rationalizability. Since correlated rationalizability and iterated strict dominance coincide (see Fudenberg and Tirole, 1991: 52), there must exist σi∗ ∈ Σi0 such that ui(σi∗, p) > ui(σi, p) ∀p ∈ M(×_{j≠i} supp Σj0). This implies min_{p∈P} ui(σi∗, p) > min_{p∈P} ui(σi, p) ∀P ⊆ M(×_{j≠i} supp Σ̂j0). Therefore, σi ∉ Σ̂i1, and we have Σi1 = Σ̂i1. The argument can be repeated to establish Σin = Σ̂in ∀n. BRi(Bi) is rationalized by Bi; that is, BRi(Bi) ⊆ Σ̂i1. According to the definition of Beliefs Equilibrium, margSi Bj ⊆ BRi(Bi) ⊆ Σ̂i1. This implies ×_{j≠i} margSj Bi ⊆ ×_{j≠i} Σ̂j1, and therefore Bi ⊆ M(×_{j≠i} supp Σ̂j1). The argument can be repeated to establish BRi(Bi) ⊆ Σ̂in and Bi ⊆ M(×_{j≠i} supp Σ̂jn) ∀n.
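Since correlated rationalizability coincides with iterated strict dominance in finite games (the fact used in the proof above), the recursion can be sketched by iterated deletion of strictly dominated pure strategies. This is a simplification: domination by mixed strategies is not checked, and the game below is a hypothetical example.

```python
import itertools

def iterated_dominance(strategies, u):
    """strategies: player -> list of pure strategies.
    u: player -> dict keyed by (own strategy, opponents' profile)."""
    S = {i: list(si) for i, si in strategies.items()}
    changed = True
    while changed:
        changed = False
        for i in S:
            profiles = list(itertools.product(*(S[j] for j in S if j != i)))
            def dominated(si):
                return any(
                    all(u[i][(sj, o)] > u[i][(si, o)] for o in profiles)
                    for sj in S[i] if sj != si)
            keep = [si for si in S[i] if not dominated(si)]
            if len(keep) < len(S[i]):
                S[i], changed = keep, True
    return S

# A prisoners'-dilemma-style game: D strictly dominates C for both.
players = {"1": ["C", "D"], "2": ["C", "D"]}
pd = {("C", ("C",)): 3, ("C", ("D",)): 0, ("D", ("C",)): 4, ("D", ("D",)): 1}
u = {"1": dict(pd), "2": dict(pd)}
print(iterated_dominance(players, u))  # {'1': ['D'], '2': ['D']}
```

The surviving strategy sets are the (pure-strategy) correlated rationalizable strategies of this game.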
20.5. Does uncertainty aversion matter?

20.5.1. Questions

In Section 20.4, I set up a framework that enables us to investigate how uncertainty aversion affects strategic interaction in the context of normal form
games. My objective here is to address the following two specific questions:

1. As an outside observer, one only observes the actual strategy choice but not the beliefs of each player. Is it possible for an outside observer to distinguish uncertainty averse players from Bayesian players?
2. Does uncertainty aversion make the players worse off (or better off)?

To deepen our understanding, let me first provide the answers to these two questions in the context of single-person decision making and then conjecture the possibility of extending them to the context of normal form games.

20.5.2. Single-person decision making

The first question is: as an outside observer, can we distinguish an uncertainty averse decision maker from a Bayesian decision maker? The answer is obviously yes if we have "enough" observations. (Otherwise the Ellsberg Paradox would not exist!) However, it is easy to see that if we only observe an uncertainty averse decision maker who chooses one act from a convex constraint set G ⊆ F, then his choice can always be rationalized (as long as monotonicity is not violated) by a subjective expected utility function. For example, take the simple case where
Ω = {ω1, ω2}. The feasible set of utility payoffs C ≡ {(u(f(ω1)), u(f(ω2))) | f ∈ G} generated by G will be a convex set in R2. Suppose the decision maker chooses a point c ∈ C. To rationalize his choice by an expected utility function, we can simply draw a linear indifference curve that is tangent to C at c, with the slope describing the probabilistic beliefs of the expected utility maximizer.

This answer is at least partly relevant to the first question posed in Section 20.5.1. That is because in a normal form game, an outside observer only observes that each player i chooses a strategy from the set M(Si). An important difference, though, is that the strategy chosen by i is a best response given his beliefs, and these are part of an equilibrium. Therefore, it is possible that the consistency condition imposed by the equilibrium concept can enable us to break the observational equivalence.

The second question addresses the welfare consequences of uncertainty aversion: does uncertainty aversion make a decision maker worse off (or better off)? There is a sense in which uncertainty aversion makes a decision maker worse off. For simplicity, suppose again that X = R. Suppose that initially, beliefs over the state space are represented by a probability measure p̂, and next that beliefs change from p̂ to a set of priors Δ with p̂ ∈ Δ. Given f ∈ F, let CEΔ(f) be the certainty equivalent of f under Δ, that is, u(CEΔ(f)) = min_{p∈Δ} ∫ u∘f dp. Similar meaning is given to CEp̂(f). Then uncertainty aversion makes the decision maker worse off in the sense that CEp̂(f) ≥ CEΔ(f). That is, the certainty equivalent of any f when beliefs are represented by p̂ is at least as high as that when beliefs are represented by Δ. Note that in this welfare comparison, I am fixing the utility function over lotteries, u. This assumption can be clarified by the following restatement: Assume that the
decision maker has a fixed preference ordering ⪰∗ over M(X) which satisfies the independence axiom and is represented numerically by u. Denote by ⪰p̂ and ⪰Δ the orderings over acts corresponding to the prior p̂ and the set of priors Δ, respectively. Then the welfare comparison presumes that both ⪰p̂ and ⪰Δ agree with ⪰∗ on the set of constant acts, that is, for any f, g ∈ F with f(ω) = p and g(ω) = q for all ω ∈ Ω, f ⪰p̂ g ⇔ f ⪰Δ g ⇔ p ⪰∗ q.

At this point, it is not clear that the earlier discussion extends to the context of normal form games. When strategic considerations are present, one might wonder whether it is possible that if player 1 is uncertainty averse and if player 2 knows that player 1 is uncertainty averse, then the behavior of player 2 is affected in a fashion that benefits player 1 relative to a situation where 2 knows that 1 is a Bayesian.11 When both players are uncertainty averse and know that their opponents are uncertainty averse, can they choose a strategy profile that Pareto dominates equilibria generated when players are Bayesians?

20.5.3. Every Beliefs Equilibrium contains a Bayesian Beliefs Equilibrium

In this section, the two questions posed in Section 20.5.1 are addressed using the equilibrium concepts Bayesian Beliefs Equilibrium and Beliefs Equilibrium. The answers are implied by the following proposition.

Proposition 20.3. If {Bi}ni=1 is a Beliefs Equilibrium, then there exist bi ∈ Bi, i = 1, . . . , n, such that {bi}ni=1 is a Bayesian Beliefs Equilibrium and BRi(Bi) ⊆ BRi(bi).

Proof.12 It is sufficient to show that there exists bi ∈ Bi such that BRi(Bi) ⊆ BRi(bi). This and the fact that {Bi}ni=1 is a Beliefs Equilibrium imply margSi bj ∈ margSi Bj ⊆ BRi(Bi) ⊆ BRi(bi). Therefore, {bi}ni=1 is a Bayesian Beliefs Equilibrium. We have that ui(·, pi) is linear on M(Si) for each pi and ui(σi, ·) is linear on Bi for each σi. Therefore, by Fan's Theorem (Fan, 1953),

u ≡ max_{σi∈M(Si)} min_{pi∈Bi} ui(σi, pi) = min_{pi∈Bi} max_{σi∈M(Si)} ui(σi, pi).

By definition, σi ∈ BRi(Bi) if and only if min_{pi∈Bi} ui(σi, pi) = u. Therefore,

ui(σi, pi) ≥ u   ∀pi ∈ Bi, ∀σi ∈ BRi(Bi).   (20.4)

Take bi ∈ argmin_{pi∈Bi} max_{σi∈M(Si)} ui(σi, pi). Then conclude that

ui(σi, bi) ≤ u = max_{σi′∈M(Si)} ui(σi′, bi)   ∀σi ∈ M(Si).   (20.5)

Combining (20.4) and (20.5), we have ui(σi, bi) = u ∀σi ∈ BRi(Bi), that is, BRi(Bi) ⊆ BRi(bi).
Table 20.5 A two-person game

        C1      C2      C3      C4
R1      0, 1    0, 1    1, 1    0, 1
R2      0, 1    0, 1    0, 1    1, 1
R3      1, 1    −1, 1   0, 1    0, 1
R4      −1, 1   1, 1    0, 1    0, 1
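As a quick numerical check of Table 20.5 (the game analyzed in Example 20.4 below), the following sketch, assuming player 1's payoff entries above, confirms that under full ignorance over columns a row mixture has a nonnegative worst case exactly when it puts equal weight on R3 and R4, and that every row earns 0 against b1 = (C1, 0.5; C2, 0.5):

```python
# A small check (mine, not part of the text) of the game in Table 20.5,
# using player 1's payoff matrix.  Under full ignorance over columns, the
# worst-case payoff of a row mixture p is min(p3 - p4, p4 - p3, p1, p2),
# which is nonnegative exactly when p(R3) = p(R4).

U1 = [[0, 0, 1, 0],    # R1
      [0, 0, 0, 1],    # R2
      [1, -1, 0, 0],   # R3
      [-1, 1, 0, 0]]   # R4

def worst_case(p):
    """Worst case over the four pure columns of the mixture p over rows."""
    return min(sum(p[r] * U1[r][c] for r in range(4)) for c in range(4))

assert worst_case([0.25, 0.25, 0.25, 0.25]) == 0   # p(R3) = p(R4)
assert worst_case([0.0, 0.0, 0.5, 0.5]) == 0
assert worst_case([0.0, 0.0, 1.0, 0.0]) < 0        # p(R3) != p(R4)

# Against b1 = (C1, 0.5; C2, 0.5) every row earns 0, so BR1(b1) = M(S1).
for row in U1:
    assert 0.5 * row[0] + 0.5 * row[1] == 0
print("Table 20.5 checks passed")
```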
Example 20.4. Illustrating Proposition 20.3. Consider the game in Table 20.5. The sets of probability measures B1 = M({C1, C2, C3, C4}) and B2 = {R1} constitute a Beliefs Equilibrium. It contains the Bayesian Beliefs Equilibrium {b1 = (C1, 0.5; C2, 0.5), b2 = R1}. Also note that BR1(B1) = {p ∈ M({R1, R2, R3, R4}) | p(R3) = p(R4)} and BR1(b1) = M({R1, R2, R3, R4}). This shows that the inclusion BRi(Bi) ⊆ BRi(bi) in Proposition 20.3 can be strict. The example also demonstrates that a Proper Beliefs Equilibrium may contain more than one Bayesian Beliefs Equilibrium. For instance, {b1′ = C3, b2′ = R1} is another Bayesian Beliefs Equilibrium. However, BR1(b1′) = {R1}. Therefore, not every Bayesian Beliefs Equilibrium {bi}ni=1 contained in a Beliefs Equilibrium {Bi}ni=1 has the property BRi(Bi) ⊆ BRi(bi).

For games involving more than two players, a Beliefs Equilibrium in general does not contain a Nash Equilibrium. This is already implied by the fact that a Bayesian Beliefs Equilibrium is itself a Beliefs Equilibrium but not necessarily a Nash Equilibrium. However, since Bayesian Beliefs Equilibrium and Nash Equilibrium are equivalent in two-person games, Proposition 20.3 has the following corollary.

Corollary of Proposition 20.3. In a two-person game, if {B1, B2} is a Beliefs Equilibrium, then there exists σj∗ ∈ Bi such that {σ1∗, σ2∗} is a Nash Equilibrium and BRi(Bi) ⊆ BRi(σj∗).

Proposition 20.3 delivers two messages. The first concerns the prediction of how the game will be played. Suppose {Bi}ni=1 is a Beliefs Equilibrium. The associated prediction regarding strategies played is that i chooses some σi ∈ BRi(Bi).
According to Proposition 20.3, it is always possible to find at least one Bayesian Beliefs Equilibrium {bi }ni=1 contained in {Bi }ni=1 such that the observed behavior of the uncertainty averse players (the actual strategies they choose) is consistent with utility maximization given beliefs represented by {bi }ni=1 . This implies that an outsider who can only observe the actual strategy choices in the single game under study will not be able to distinguish uncertainty averse players from Bayesian players. (I will provide reasons to qualify such observational equivalence in the next section.) We can use Proposition 20.3 also to address the welfare consequences of uncertainty aversion, where the nature of our welfare comparisons is spelled out in Section 20.5.2. If {Bi }ni=1 is a Beliefs Equilibrium, it contains a Bayesian Beliefs
Equilibrium {bi}ni=1, and therefore,

max_{σi∈M(Si)} min_{pi∈Bi} ui(σi, pi) ≤ max_{σi∈M(Si)} ui(σi, bi).
The left-hand side of the above inequality is the ex ante utility of player i when his beliefs are represented by Bi, and the right-hand side is his ex ante utility when beliefs are represented by bi. The inequality implies that ex ante, i would prefer to play the Bayesian Beliefs Equilibrium {bi}ni=1 to the Beliefs Equilibrium {Bi}ni=1. In this ex ante sense, uncertainty aversion makes the players worse off.13

20.5.4. Uncertainty aversion can be beneficial when players agree

The earlier comparisons addressed the effects of uncertainty aversion when the equilibrium concepts used, namely Beliefs Equilibrium and Bayesian Beliefs Equilibrium, do not require agreement between agents. Here I reexamine the effects of uncertainty aversion when agreement is imposed, as incorporated in the Beliefs Equilibrium with Agreement and Nash Equilibrium solution concepts. For two-person games, the Corollary of Proposition 20.3 still applies, since agreement is not an issue given only two players. However, for games involving more than two players, the following example demonstrates that it is possible to have a Beliefs Equilibrium with Agreement not containing any Nash Equilibrium.

Example 20.5. Uncertainty aversion leads to Pareto improvement. The game presented in this example is a modified version of the prisoners' dilemma. The game involves three players, 1, 2, and N. Player N can be interpreted as "nature." The payoff of player N is a constant and his set of pure strategies is {X, Y}. Players 1 and 2 can be interpreted as two prisoners. The sets of pure strategies available to players 1 and 2 are {C1, D1} and {C2, D2}, respectively, where C stands for "cooperation" and D stands for "defection." The payoff matrix for player 1 is shown in Table 20.6 and that for player 2 is shown in Table 20.7. Assume that the payoffs satisfy

a > c > b,   c > d > e,   and   2c < a + b.   (20.6)
Note that the game is the prisoners' dilemma game if the inequalities a > c > b in (20.6) are replaced by a = b > c. (When a = b > c, the payoffs of players 1 and 2 for all strategy profiles do not depend on nature's move.) This game is different from the standard prisoners' dilemma in one respect. In the standard prisoners' dilemma game, the expression a = b > c says that it is always better for one

Table 20.6 Payoff matrix for player 1

        XC2     YC2     XD2     YD2
C1      c       c       e       e
D1      a       b       d       d
Table 20.7 Payoff matrix for player 2

        XC1     YC1     XD1     YD1
C2      c       c       e       e
D2      b       a       d       d
Table 20.8 A two-person game

        C2                          D2
C1      c, c                        e, ph·b + (1 − ph)·a
D1      pl·a + (1 − pl)·b, e        d, d
player to play D given that his opponent plays C. In this game, the expression a > c > b says that if one player plays D and one plays C, the player who plays D may either gain or lose. The interpretation of the inequalities c > d > e in (20.6) is the same as in the standard prisoners' dilemma. That is, it is better for both players to play C rather than D; however, a player should play D given that his opponent plays D. Note that the last inequality 2c < a + b in (20.6) is implied by a = b > c in the prisoners' dilemma game. The inequality 2c < a + b can be rewritten as (a − c) − (c − b) > 0. For player 1, for example, (a − c) is the utility gain from playing D1 instead of C1 if the true state is XC2, and (c − b) is the corresponding utility loss if the true state is YC2. Therefore, the interpretation of 2c < a + b is that if you know your opponent plays C, the possible gain (loss) from playing D instead of C is high (low).

Assume that players 1 and 2 know each other's action but are uncertain about nature's move. To be precise, suppose that the beliefs of the players are

BN = {C1C2},
B1 = {p ∈ M({XC2, YC2}) | pl ≤ p(XC2) ≤ ph},
B2 = {p ∈ M({XC1, YC1}) | pl ≤ p(XC1) ≤ ph},

with 0 ≤ pl < ph ≤ 1. The construction of {BN, B1, B2} reflects the fact that the players agree. For example, the marginal beliefs of players 1 and 2 regarding {X, Y} agree with

{p ∈ M({X, Y}) | pl ≤ p(X) ≤ ph}   with   0 ≤ pl < ph ≤ 1.   (20.7)
Given {BN, B1, B2}, the payoffs of each pure strategy profile for players 1 and 2 are shown in Table 20.8. Recall that the payoff of player N is a constant. Therefore, both X and Y are optimal for N and we only need to consider players 1 and 2. {BN, B1, B2} is a Beliefs Equilibrium with Agreement with

BR1(B1) = {C1}   and   BR2(B2) = {C2}

if and only if pl < (a − c)/(a − b) < ph.
Table 20.9 A two-person game

        C2                          D2
C1      c, c                        e, λ·b + (1 − λ)·a
D1      λ·a + (1 − λ)·b, e          d, d
Note that our assumptions guarantee that (c − b)/(a − b) > 0 and (a − c)/(a − b) < 1, so there exist values of pl and ph consistent with the above inequalities.

However, {BN, B1, B2} does not contain a Nash Equilibrium. To see this, suppose that players 1 and 2 are Bayesians who agree that p(X) = λ = 1 − p(Y) for some λ ∈ [0, 1]. Then they are playing the game in Table 20.9. The strategy profile {C1, C2} is a Nash Equilibrium if and only if

c ≥ λa + (1 − λ)b   and   c ≥ λb + (1 − λ)a.

There exists λ ∈ [0, 1] such that {C1, C2} is a Nash Equilibrium if and only if

c ≥ (a + b)/2,
which contradicts the last inequality in (20.6). Therefore, it is never optimal for both Bayesian players to play C: any Nash Equilibrium requires both players to play D, and therefore both players receive d with certainty. In the Beliefs Equilibrium with Agreement constructed above, on the other hand, both players play C and receive c > d with certainty.

To better understand why uncertainty aversion leads to a better equilibrium in this game, let us go back to the beliefs {B1, B2} of players 1 and 2. As explained in Section 20.2.1, although the global beliefs of players 1 and 2 on {X, Y} are represented by the same set in (20.7), the local probability measures for different acts may be different. For example, the local probability measure on {X, Y} at the act corresponding to D1 is (X, pl; Y, 1 − pl), while for D2 it is (X, ph; Y, 1 − ph). In the sense of local probability measures, therefore, players 1 and 2 disagree on the relative likelihood of X and Y when they are evaluating the acts D1 and D2, respectively. This is what makes playing D undesirable for both players.

The example delivers three messages. First, it shows that in a game involving more than two players, uncertainty aversion can lead to an equilibrium that Pareto dominates all Nash Equilibria. Second, interpreting player N in the above game as "nature," the game becomes a two-person game where the players are uncertain
about their own payoff functions. Therefore, uncertainty aversion can be "beneficial" even in two-person games. Third, the beliefs profile {BN, B1, B2} continues to be a Beliefs Equilibrium with Agreement if, for instance, the payoff of player N is independent of his own strategy, but is highest when player 1 plays C1 and player 2 plays C2. In this case, even if players can communicate, player N has a strict incentive not to announce his own strategy.14
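The inequalities in Example 20.5 can be checked with concrete numbers. The values below (a, b, c, d, e = 10, 0, 4, 2, 0 and pl, ph = 0.3, 0.8) are my own illustrative choices, not the chapter's:

```python
# An illustrative check (numbers chosen here, not in the chapter) of
# Example 20.5 with a, b, c, d, e = 10, 0, 4, 2, 0, which satisfy (20.6).
a, b, c, d, e = 10, 0, 4, 2, 0
assert a > c > b and c > d > e and 2 * c < a + b

pl, ph = 0.3, 0.8              # straddle (a - c)/(a - b) = 0.6
assert pl < (a - c) / (a - b) < ph

# Player 1, given that 2 plays C2: C1 pays c for sure, while D1's worst case
# over p(X) in [pl, ph] is attained at pl because a > b.
worst_D1 = pl * a + (1 - pl) * b
assert c > worst_D1            # so C1 is the unique maximin best response

# Player 2 symmetrically: D2's worst case is attained at ph.
worst_D2 = ph * b + (1 - ph) * a
assert c > worst_D2

# Bayesian players: no lambda in [0, 1] makes {C1, C2} a Nash Equilibrium,
# since that would require c >= la + (1-l)b and c >= lb + (1-l)a, i.e.
# 2c >= a + b, contradicting (20.6).
feasible = [l / 100 for l in range(101)
            if c >= l / 100 * a + (1 - l / 100) * b
            and c >= l / 100 * b + (1 - l / 100) * a]
assert feasible == []
print("Example 20.5 checks passed")
```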
20.6. Why do we have uncertainty aversion?

The next question I want to address is: when should we expect (or not expect) the existence of an equilibrium reflecting uncertainty aversion? In the context of single-person decision theory, we do not have much to say about the origin or precise nature of beliefs on the set of states of nature. However, we should be able to say more in the context of game theory. The beliefs of the players should be "endogenous" in the sense of depending on the structure of the game. For example, it is reasonable to predict that the players will not be uncertainty averse if there is an "obvious way" to play the game. The following two examples identify possible reasons for players to be uncertainty averse.

Example 20.6. Nonunique equilibria. In the game in Table 20.10, any strategy profile is a Nash Equilibrium, and any {B1, B2} is a Beliefs Equilibrium. Uncertainty aversion in this game is due to the fact that the players do not have any idea about how their opponents will play.

Table 20.10 A two-person game

        L       C       R
U       1, 1    1, 1    10, 1
M       1, 1    1, 1    10, 1
D       1, 10   1, 10   10, 10

Example 20.7. Nonunique best responses. In the game in Table 20.11, {U, L} is the only Nash Equilibrium. However, it is equally good for player 1 to play D if he believes that player 2 plays L. Under this circumstance, it may be too demanding to require player 2 to attach probability one to player 1 playing U. At the other extreme, where 2 is totally ignorant of 1's strategy choice, we obtain the Proper Beliefs Equilibrium {B1 = {L}, B2 = M({U, D})}. This example shows that the existence of a unique Nash Equilibrium is not sufficient to rule out an equilibrium with uncertainty aversion. However, I can
prove the following:

Table 20.11 A two-person game

        L       R
U       0, 1    1, 0.5
D       0, 1    0, 2

Table 20.12 A two-person game

        L       R
U       2, 1    1, 2
D       2, 2    0, 2
Proposition 20.4. If the game has a unique Bayesian Beliefs Equilibrium and it is also a strict Nash Equilibrium, then there does not exist a Proper Beliefs Equilibrium.

Proof. Let {bi}ni=1 be the unique Bayesian Beliefs Equilibrium. Since it is also a strict Nash Equilibrium of the game, there exists si∗ ∈ Si such that bi = s−i∗. Let {Bi}ni=1 be a Beliefs Equilibrium. According to Proposition 20.3, s−i∗ ∈ Bi. Using Proposition 20.3 and the definition of Beliefs Equilibrium, we have margSi Bj ⊆ BRi(Bi) ⊆ BRi(s−i∗) = {si∗}. This implies {Bi}ni=1 = {s−i∗}ni=1.

Corollary of Proposition 20.4. In a two-person game, if the game has a unique Nash Equilibrium and it is also a strict Nash Equilibrium, then there does not exist a Proper Beliefs Equilibrium.

A Proper Beliefs Equilibrium can be ruled out also if the game is dominance solvable.

Proposition 20.5. If the game is dominance solvable, then there does not exist a Proper Beliefs Equilibrium.15 (See Table 20.12.)

Proof. Let {Bi}ni=1 be a Beliefs Equilibrium. According to Proposition 20.2, BRi(Bi) ⊆ Ri and Bi ⊆ RBi. Since iterated strict dominance and correlated rationalizability are equivalent, a dominance solvable game has a unique pure strategy profile {si∗}ni=1 such that Ri = {si∗} and RBi = {s−i∗}. Therefore, BRi(Bi) = {si∗} and Bi = {s−i∗}.
20.7. Decision theoretic foundation

Recently, decision theoretic foundations for Bayesian solution concepts have been developed. In particular, Aumann and Brandenburger (1995) develop epistemic
conditions for Nash Equilibrium. The purpose of this line of research is to understand the knowledge requirements needed to justify equilibrium concepts. Although research on the generalization of Nash Equilibrium to allow for uncertainty averse preferences has already started (see Section 20.8), serious study of the epistemic conditions for those generalized equilibrium concepts has not yet been carried out. In this section, I provide epistemic conditions for the equilibrium concepts proposed in this chapter. The main finding is that Beliefs Equilibrium (Beliefs Equilibrium with Agreement) and Bayesian Beliefs Equilibrium (Nash Equilibrium) presume similar knowledge requirements. This supports the interpretation of results in previous sections as reflecting solely the effects of uncertainty aversion.

Before I proceed, I acknowledge that although the partitional information structure used below is standard in game theory (see, for instance, Aumann, 1987 and Osborne and Rubinstein, 1994: 76), it is more restrictive than the interactive belief system used by Aumann and Brandenburger (1995). In their framework, "know" means "ascribe probability 1 to," which is more general than the "absolute certainty without possibility of error" that is being used here (Aumann and Brandenburger, 1995: 1175). Apart from this difference, the two approaches share essentially the same spirit. There is a common set of states of the world. A state contains a description of each player's knowledge, beliefs, strategy, and payoff function.16

Formally, the following notation is needed. Let Ω be a common finite set of states of nature for the players, with typical element ω. Each state ω ∈ Ω consists of a specification for each player i of

• Hi(ω) ⊆ Ω, which describes player i's knowledge in state ω (where Hi is a partitional information function);
• Δi(ω), a closed and convex set of probability measures on Hi(ω), the beliefs of player i in state ω;
• fi(ω) ∈ M(Si), the mixed strategy used by player i in state ω;
• ui(ω, ·): S → R, the payoff function of player i in state ω.
To respect the partitional information structure, the payoff function ui: Ω × S → R and the strategy fi: Ω → M(Si) are required to be adapted to Hi. Given fi, fi(ω)(si) denotes the probability that i plays si according to the strategy fi in state ω. For each ω ∈ Ω, player i's beliefs over S−i are represented by a closed and convex set of probability measures Bi(ω) that is induced from Δi(ω) in the following way:

Bi(ω) ≡ {pi ∈ M(S−i) | ∃qi ∈ Δi(ω) such that pi(s−i) = Σ_{ω̂∈Hi(ω)} qi(ω̂) Π_{j≠i} fj(ω̂)(sj) ∀s−i ∈ S−i}.
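The construction of Bi(ω) can be sketched computationally. The helper below (names and the toy two-state example are mine, not the chapter's) computes the distribution over opponents' pure-strategy profiles induced by one measure qi over the states player i considers possible; Bi(ω) is then the set of such pushforwards as qi ranges over Δi(ω), and when Δi(ω) = M(Hi(ω)) its extreme points come from the point masses on states in Hi(ω):

```python
# A schematic rendering (mine, not the chapter's) of the construction of
# B_i(omega): push a measure q over the states in H_i(omega) forward through
# the opponents' strategies f_j to a distribution over their pure profiles.
from itertools import product

def induced_profile_distribution(q, H, strategies, i):
    """Distribution over opponents' pure-strategy profiles induced by a
    probability measure q over the states in H, where strategies[omega][j]
    is a dict mapping j's pure strategies to probabilities."""
    dist = {}
    for omega, q_omega in zip(H, q):
        opponents = [j for j in sorted(strategies[omega]) if j != i]
        supports = [list(strategies[omega][j].items()) for j in opponents]
        for combo in product(*supports):
            profile = tuple(s for s, _ in combo)
            weight = q_omega
            for _, pr in combo:
                weight *= pr
            if weight > 0:
                dist[profile] = dist.get(profile, 0.0) + weight
    return dist

# Toy example: two states; player 1's opponents (2 and 3) play pure
# strategies that depend on the state.
strategies = {
    "w1": {1: {"L": 1.0}, 2: {"U": 1.0}, 3: {"T": 1.0}},
    "w2": {1: {"L": 1.0}, 2: {"D": 1.0}, 3: {"T": 1.0}},
}
H1 = ["w1", "w2"]
# A point mass on w1 yields the profile (U, T); an even mixture yields both.
print(induced_profile_distribution([1.0, 0.0], H1, strategies, 1))
print(induced_profile_distribution([0.5, 0.5], H1, strategies, 1))
```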
This specification is common knowledge among the players. Player i is said to know an event E at ω if Hi(ω) ⊆ E. Say that an event is mutual knowledge if everyone knows it. Let H be the meet of the partitions of all the players and H(ω) the element of H which contains ω. An event E is common knowledge at ω if and only if H(ω) ⊆ E. Say that player i is rational at ω if his strategy fi(ω) maximizes utility as stated in (20.3) when beliefs are represented by Bi(ω).

The following proposition describes formally the knowledge requirements for {Bi}ni=1 to be a Beliefs Equilibrium. If each Bi is a singleton, then a parallel result for Bayesian Beliefs Equilibrium is obtained. (The version of Proposition 20.6 for Bayesian Beliefs Equilibrium in two-person games can be found in Theorem A in Aumann and Brandenburger, 1995: 1167.) To focus on the intuition, all the propositions in this section are only informally discussed. Their proofs can be found in the Appendix.

Proposition 20.6. Suppose that at some state ω, the rationality of the players, {ui}ni=1, and {Bi}ni=1 are mutual knowledge. Then {Bi}ni=1 is a Beliefs Equilibrium.

The idea of Proposition 20.6 is not difficult. At ω, player i knows j's beliefs Bj(ω), payoff function uj(ω, ·), and that j is rational. Therefore, any strategy fj(ω′) for player j with ω′ ∈ Hi(ω), that player i thinks is possible, must be player j's best response given his beliefs. That is, fj(ω′) ∈ BRj(ω)(Bj(ω)) ∀ω′ ∈ Hi(ω). Since the preference ordering of player j is quasiconcave, any convex combination of strategies in the set {fj(ω′) | ω′ ∈ Hi(ω)} must also be a best response for player j. By construction, margSj Bi(ω) is a subset of the convex hull of {fj(ω′) | ω′ ∈ Hi(ω)}. This implies margSj Bi(ω) ⊆ BRj(ω)(Bj(ω)), and therefore {Bi}ni=1 is a Beliefs Equilibrium.
In a Beliefs Equilibrium with Agreement, the beliefs {Bi}ni=1 of the players over the strategy choices of opponents are required to have the properties of agreement and stochastic independence. Since Bi is derived from Δi, it is to be expected that some restrictions on Δi are needed for {Bi}ni=1 to possess the desired properties. In the case where players are expected utility maximizers, so that Δi(ω) is a singleton for all ω, Theorem B in Aumann and Brandenburger (1995: 1168) shows that by restricting {Δi}ni=1 to come from a common prior, mutual knowledge of rationality and payoff functions and common knowledge of beliefs are sufficient to imply Nash Equilibrium. In the case where players are uncertainty averse, the following proposition says that by restricting each player i to being completely ignorant at each ω about the relative likelihood of states in Hi(ω), exactly the same knowledge requirements that imply Nash Equilibrium also imply Beliefs Equilibrium with Agreement.

Proposition 20.7. Suppose that Δi(ω) = M(Hi(ω)) ∀ω. Suppose that at some state ω, the rationality of the players and {ui}ni=1 are mutual knowledge and that {Bi}ni=1 is common knowledge. Then {Bi}ni=1 is a Beliefs Equilibrium with Agreement.
The specification of Δi(ω) as the set of all probability measures on Hi(ω) reflects the fact that player i is completely ignorant about the relative likelihood of states in Hi(ω). It is useful to explain the role played by this parametric specialization of beliefs. Given any state ω and any event E, the beliefs {p(E) | p ∈ M(Hi(ω))} of player i about E at ω must satisfy one and only one of the following three conditions:

1. p(E) = 1 ∀p ∈ M(Hi(ω)) if and only if Hi(ω) ⊆ E (player i knows E).
2. {p(E) | p ∈ M(Hi(ω))} = [0, 1] if and only if Hi(ω) ⊄ E and Hi(ω) ∩ E ≠ Ø (player i knows neither E nor Ω \ E).
3. p(E) = 0 ∀p ∈ M(Hi(ω)) if and only if Hi(ω) ∩ E = Ø (player i knows Ω \ E).

Therefore, player j knows i's beliefs about E if and only if one and only one of the following is true:

1. j knows that i knows E.
2. j knows that i knows neither E nor Ω \ E.
3. j knows that i knows Ω \ E.
As a result, mutual knowledge of beliefs about E implies agreement of beliefs about E. The common knowledge assumption in Proposition 20.7 is used only to derive the property of stochastically independent beliefs.17

Note that Theorem B in Aumann and Brandenburger (1995) is not a special case of Proposition 20.7. The common prior assumption imposed by their theorem coincides with the restriction on Δi imposed by Proposition 20.7 only in the case where Hi(ω) = {ω} ∀ω. Therefore, the examples they provide to show the sharpness of their theorem do not apply to Proposition 20.7. To serve this purpose, I provide Example 20.8 to show that Proposition 20.7 is tight, in the sense that mutual knowledge (rather than common knowledge) of {Bi}ni=1 is not sufficient to guarantee a Beliefs Equilibrium with Agreement.

Example 20.8. Mutual knowledge of beliefs is not sufficient for agreement. The game consists of three players. The set of states of nature is Ω = {ω1, ω2, ω3, ω4}. The players' information structures are H1 = {{ω1, ω2}, {ω3, ω4}}, H2 = {{ω1, ω3}, {ω2, ω4}}, and H3 = {{ω1, ω2, ω3}, {ω4}}. Their strategies are listed in Table 20.13. Suppose Δi(ω) = M(Hi(ω)) ∀ω. At ω1, the beliefs {Bi(ω1)}3i=1 of the players are mutual knowledge. For example, since B1(ω) = M({UT, DT}) ∀ω ∈ Ω,
Table 20.13 Strategies

        ω1      ω2      ω3      ω4
f1      L       L       R       R
f2      U       D       U       D
f3      T       T       T       T
B1(ω1) is common knowledge, and therefore mutual knowledge, at ω1. According to the proof of Proposition 20.7, the marginal beliefs of the players agree; for example, margS2 B1(ω1) = margS2 B3(ω1) = M({U, D}). However, B3(ω1) = M({LU, LD, RU}) is not common knowledge at ω1: player 3 does not know that player 1 knows player 3's beliefs. At ω1, player 3 cannot exclude the possibility that the true state is ω3. At ω3, player 1 only knows that player 3's beliefs are represented by either B3(ω3) = M({LU, LD, RU}) or B3(ω4) = {RD}. Note that B3(ω1) does not take the form required in the definition of Beliefs Equilibrium with Agreement.

Finally, although the notion of Weak Beliefs Equilibrium (Definition 20.6) is not the main focus of this chapter, it is closely related to the equilibrium concepts proposed by the papers discussed in Section 20.8. According to the following proposition, complete ignorance and rationality at a state ω are sufficient to imply Weak Beliefs Equilibrium.

Proposition 20.8. Suppose that at some state ω, Δi(ω) = M(Hi(ω)) and the players are rational. Then {Bi}ni=1 is a Weak Beliefs Equilibrium.

By looking at the definition of Weak Beliefs Equilibrium more carefully, Proposition 20.8 is hardly surprising. For instance, given any n-person normal form game, it is immediate that {M(S−i)}ni=1 is a Weak Beliefs Equilibrium. Note that Proposition 20.8 requires only that the players be rational; they do not need to know that their opponents are rational. They also do not need to know anything about the beliefs of their opponents.
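The supports of the belief sets in Example 20.8 follow mechanically from Table 20.13 and the partitions: under complete ignorance, supp Bi(ω1) is just the set of opponents' pure-strategy profiles played at the states in Hi(ω1). A small check (mine, not the chapter's):

```python
# A check (mine) of Example 20.8: with complete ignorance on each
# information cell, supp B_i(omega_1) is the set of opponents' profiles
# played at the states player i considers possible at omega_1.
f = {  # strategies from Table 20.13, indexed by state
    "w1": {"1": "L", "2": "U", "3": "T"},
    "w2": {"1": "L", "2": "D", "3": "T"},
    "w3": {"1": "R", "2": "U", "3": "T"},
    "w4": {"1": "R", "2": "D", "3": "T"},
}
H = {  # each player's information cell containing omega_1
    "1": ["w1", "w2"],
    "2": ["w1", "w3"],
    "3": ["w1", "w2", "w3"],
}

def support_at_w1(i):
    """Opponents' pure-strategy profiles at the states in H_i(omega_1)."""
    profiles = set()
    for w in H[i]:
        profiles.add("".join(f[w][j] for j in sorted(f[w]) if j != i))
    return profiles

print(support_at_w1("1"))  # the set {UT, DT}, matching B1(omega_1)
print(support_at_w1("3"))  # the set {LU, LD, RU}, matching B3(omega_1)
```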
20.8. Related literature

In this section, I compare my equilibrium concepts with those proposed by Dow and Werlang (1994) and Klibanoff (1993).18 Since the latter employs the same strategy space as in this chapter, let me first conduct a direct comparison.

20.8.1. Klibanoff (1993)

Klibanoff (1993) also adopts the multiple priors model to represent players' preferences in normal form games with any finite number of players and proposes the following solution concept:19
Definition 20.9. ({σi}ni=1, {Bi}ni=1) is an Equilibrium with Uncertainty Aversion if the following conditions are satisfied:

1. σ−i ∈ Bi.
2. min_{pi∈Bi} ui(σi, pi) ≥ min_{pi∈Bi} ui(σi′, pi) ∀σi′ ∈ M(Si).
σi is the actual strategy used by player i and Bi represents his beliefs about opponents' strategy choices. Condition 1 says that player i's beliefs cannot be "too wrong"; that is, the strategy profile σ−i chosen by the other players should be considered "possible" by player i. Condition 2 says that σi is a best response for i given his beliefs Bi. In addition, the following refinement of Equilibrium with Uncertainty Aversion is offered in his paper: ({σi}ni=1, {Bi}ni=1) is an Equilibrium with Uncertainty Aversion and Rationalizable Beliefs if it is an Equilibrium with Uncertainty Aversion and, in addition, Bi ⊆ RBi (defined in Definition 20.8). That is, player i believes that his opponents' strategy choices are correlated rationalizable.

Although Klibanoff's equilibrium concepts involve both the specification of beliefs and the actual strategies used by the players, while the equilibrium concepts in my chapter involve only the former, the main differences between them can be summarized in terms of beliefs in four aspects, as shown in Table 20.14. The table enables us to conclude that Beliefs Equilibrium with Agreement is a refinement of Klibanoff's equilibrium concepts.

Proposition 20.9. If {Bi}ni=1 is a Beliefs Equilibrium with Agreement, then for any σi ∈ margSi Bj, it is the case that ({σi}ni=1, {Bi}ni=1) is an Equilibrium with Uncertainty Aversion and Rationalizable Beliefs.

Since knowledge of rationality and beliefs is the most essential property underlying the equilibrium concepts, let us focus on two-person games, where agreement and stochastic independence are not the issues. The following example illustrates

Table 20.14 Comparison of the two equilibrium concepts

                                       Equilibrium with Uncertainty           Beliefs Equilibrium
                                       Aversion and Rationalizable Beliefs    with Agreement
Knowledge of rationality and beliefs   margSi Bj ∩ BRi(Bi) ≠ Ø                margSi Bj ⊆ BRi(Bi)
Agreement of marginal beliefs          margSj Bi ∩ margSj Bk ≠ Ø              margSj Bi = margSj Bk
Stochastically independent beliefs     Bi contains at least one product       Bi contains all product
                                       measure in ×j≠i margSj Bi              measures in ×j≠i margSj Bi
Rationalizable beliefs                 Bi ⊆ RBi                               Bi ⊆ RBi (Proposition 20.2)
Table 20.15 A two-person game

        L         C          R
U       10, 10    1.99, 10   10, 10
D       2, 4      2, 4       2, 5
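A quick check (mine, not part of the text) of the maximin structure of Table 20.15, on which Example 20.9 below relies: under full ignorance, D is player 1's unique best response (any weight on U drags the worst case, attained at column C, below 2), and R is player 2's (it guarantees 5 while L and C guarantee only 4):

```python
# A numerical check (mine) of the claims about Table 20.15: under full
# ignorance, D is player 1's unique maximin choice and R is player 2's.
u1 = {"U": [10.0, 1.99, 10.0],   # player 1's payoffs against L, C, R
      "D": [2.0, 2.0, 2.0]}
u2 = {"L": [10.0, 4.0],          # player 2's payoffs against U, D
      "C": [10.0, 4.0],
      "R": [10.0, 5.0]}

def worst(payoffs):
    """Worst case over the opponent's pure strategies."""
    return min(payoffs)

# Pure-strategy worst cases.
assert worst(u1["D"]) == 2.0 and worst(u1["U"]) == 1.99
assert worst(u2["R"]) == 5.0 and worst(u2["L"]) == 4.0 and worst(u2["C"]) == 4.0

# Mixing does not help player 1: any weight on U drags column C below 2.
for k in range(1, 101):
    alpha = k / 100              # probability of U
    mixed = [alpha * x + (1 - alpha) * y for x, y in zip(u1["U"], u1["D"])]
    assert min(mixed) < 2.0
print("D and R are the unique maximin choices")
```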
that an Equilibrium with Uncertainty Aversion and Rationalizable Beliefs may not be a Beliefs Equilibrium.

Example 20.9. Refinement of Klibanoff (1993). The game in Table 20.15 is deliberately constructed so that every strategy of every player survives iterated elimination of strictly dominated strategies. Therefore, Klibanoff's standard equilibrium concept coincides with his own refinement. It is easy to check that {σ1, σ2, B1, B2} = {D, R, M(S2), M(S1)} is an Equilibrium with Uncertainty Aversion (and Rationalizable Beliefs). This equilibrium predicts D and R to be the unique best responses for players 1 and 2, respectively. As a result, player 1 receives 2 and player 2 receives 5. It is reasonable that player 2 will play R: R is as good as L and C if player 1 plays U, and it is strictly better than L and C if player 1 plays D. However, if player 1 realizes this, 1 should play U, and as a result, both players will receive 10. Note that {B1, B2} = {M(S2), M(S1)} is not a Beliefs Equilibrium. Moreover, no Beliefs Equilibrium in this game will predict D to be player 1's unique best response.

Finally, for any two-person game, Klibanoff's standard equilibrium concept is equivalent to the notion of Weak Beliefs Equilibrium (Definition 20.6).

Proposition 20.10. {B1, B2} is a Weak Beliefs Equilibrium if and only if there exists σi ∈ Bj such that {σ1, σ2, B1, B2} is an Equilibrium with Uncertainty Aversion.

20.8.2. Dow and Werlang (1994)

Dow and Werlang (1994) consider two-person games and assume that players' preference orderings over acts are represented by the convex capacity model proposed by Schmeidler (1989). Any such preference ordering is a member of the multiple priors model (Gilboa and Schmeidler, 1989). Their equilibrium concept can be restated using the multiple priors model as follows.

Definition 20.10. {B1, B2} is a Nash Equilibrium Under Uncertainty if the following conditions are satisfied:

1. There exists Ei ⊆ Si such that pj(Ei) = 1 for at least one pj ∈ Bj.
2. min_{pi∈Bi} ui(si, pi) ≥ min_{pi∈Bi} ui(si′, pi) ∀si ∈ Ei, ∀si′ ∈ Si.
Dow and Werlang (1994) interpret Condition 1 as saying that player j “knows” that player i will choose a strategy in Ei . Condition 2 says that every si ∈ Ei is
512
Kin Chung Lo
a best response for i, given that Bi represents i's beliefs about the strategy choice of player j. Unlike Klibanoff and this chapter, Dow and Werlang restrict players to choosing pure rather than mixed strategies. It is therefore important to reiterate the justification for using one strategy space instead of the other. According to the discussion in Section 20.4.2, the use of pure versus mixed strategy spaces depends on the players' perception of the order of strategy choices. The adoption of a mixed strategy space in Klibanoff (1993) and in this chapter can be justified by the assumption that each player perceives himself as moving last. Conversely, the adoption of a pure strategy space in Dow and Werlang (1994) can be understood as assuming that each player perceives himself as moving first and has an expected utility representation for preferences over objective lotteries on acts. Further comparison of Dow and Werlang (1994) and this chapter is provided in the next subsection.

20.8.3. Epistemic conditions

I suggested in the introduction that, in order to carry out a ceteris paribus study of the effects of uncertainty aversion on how a game is played, we should ensure that the generalized equilibrium concept differs from Nash Equilibrium in only one dimension: players' attitude toward uncertainty. In particular, the generalized solution concept should share comparable knowledge requirements with Nash Equilibrium. According to this criterion, I argue in Section 20.7 that the solution concepts I propose are appropriate generalizations of their Bayesian counterparts. Dow and Werlang (1994) and Klibanoff (1993) do not provide epistemic foundations for their solution concepts, and a detailed study is beyond the scope of this chapter.
However, I show below that in the context of two-person normal form games, exactly the same epistemic conditions that support Weak Beliefs Equilibrium as stated in Proposition 20.8, namely complete ignorance and rationality, also support Nash Equilibrium Under Uncertainty and Equilibrium with Uncertainty Aversion. Therefore, the sufficient conditions for players' beliefs to constitute an equilibrium in these two senses do not require the players to know anything about the beliefs and rationality of their opponents. The weak epistemic foundation for their equilibrium concepts is readily reflected in the fact that, given any two-person normal form game, {M(S2), M(S1)} is always a Nash Equilibrium Under Uncertainty and there always exist σ1 and σ2 such that {σ1, σ2, M(S2), M(S1)} is an Equilibrium with Uncertainty Aversion. The equilibrium notions in these two papers therefore do not fully exploit the difference between a game, whose payoff structure (e.g. dominance solvability) may limit the set of "reasonable" beliefs, and a single-person decision-making problem, where any set of priors (or single prior in the Bayesian case) is "rational." In fact, Dow and Werlang (1994: 313) explicitly adopt the view that the degree of uncertainty aversion is subjective, as in the single-agent setting, rather than reasonably tied to the structure of the game. As a result, their equilibrium concept delivers a continuum of equilibria for every normal form game (see their theorem on p. 313).
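These claims can be checked numerically for the game of Example 20.9. The sketch below (Python; the payoff arrays transcribe Table 20.15, and the helper name `maxmin_values` is mine, not the chapter's) computes each player's maxmin best responses under the complete-ignorance beliefs M(S2) and M(S1). Because the worst case over the opponent's mixed strategies is attained at a pure strategy, the maxmin value of each own pure strategy is simply its row minimum.

```python
import numpy as np

# Payoffs of Table 20.15: rows are player 1's strategies (U, D),
# columns are player 2's strategies (L, C, R).
u1 = np.array([[10.0, 1.99, 10.0],
               [2.0, 2.0, 2.0]])
u2 = np.array([[10.0, 10.0, 10.0],
               [4.0, 4.0, 5.0]])

def maxmin_values(payoffs):
    # Under complete ignorance (beliefs = all of M(S_j)), the worst case
    # over the opponent's mixtures is attained at a pure strategy, so the
    # maxmin value of each own pure strategy is its row minimum.
    return payoffs.min(axis=1)

v1 = maxmin_values(u1)       # player 1 over rows
v2 = maxmin_values(u2.T)     # player 2 over columns
print(v1)  # [1.99 2.  ] -> D is the unique maxmin best response
print(v2)  # [4. 4. 5.]  -> R is the unique maxmin best response

# Mixing does not help player 1: every mixture q*U + (1-q)*D has
# worst-case payoff min(2 - 0.01*q, 2 + 8*q), which never exceeds 2.
for q in np.linspace(0.0, 1.0, 101):
    worst = (q * u1[0] + (1 - q) * u1[1]).min()
    assert worst <= v1.max() + 1e-9
```

At {D, R} the realized payoffs are 2 and 5, while {U, R} would give both players 10, matching the discussion of Example 20.9 above.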
Equilibrium in beliefs under uncertainty
513
Let me now proceed to the formal statements. Recall that Klibanoff's standard equilibrium concept can be readily rewritten as a Weak Beliefs Equilibrium (Proposition 20.10). It follows that Proposition 20.8 provides the epistemic conditions underlying Klibanoff's equilibrium concept as simplified here. Since Dow and Werlang (1994) adopt a pure strategy space, I keep all notation from Section 20.7 but redefine

• fi : Ω → Si
• Bi(ω) ≡ {pi ∈ M(Sj) | ∃qi ∈ Δi(ω) such that pi(sj) = Σ_{ω̂ ∈ Hi(ω) ∩ {ω′ | fj(ω′) = sj}} qi(ω̂) ∀sj ∈ Sj}
• BRi(Bi) ≡ argmax_{si ∈ Si} min_{pi ∈ Bi} ui(si, pi).

That is, fi(ω) is the pure action used by player i at ω, and Bi(ω) is the set of probability measures on Sj induced from Δi(ω) and fj. It represents the beliefs of player i at ω about j's strategy choice.

Proposition 20.11. Suppose that at some state ω, Δi(ω) = M(Hi(ω)) and that the players are rational. Then {B1, B2} is a Nash Equilibrium Under Uncertainty.

When a decision maker's beliefs are represented by a probability measure p, an event E is ≿-null if p(E) = 0. It is well recognized that when preferences are not probabilistically sophisticated, there are alternative ways of defining nullity. The equilibrium concepts in Dow and Werlang (1994), Klibanoff (1993), and this chapter can all be regarded as generalizations of Nash Equilibrium if the "right" notion of nullity is adopted. To see this, first assume that each player does not have a strict incentive to randomize. Take S−i to be the state space of player i and Si to be a subset of acts which map S−i to R. Suppose that ≿i represents the preference ordering of player i over Si. Player i is rational if he chooses si such that si ≿i ŝi ∀ŝi ∈ Si. The following is an appropriate restatement of Nash Equilibrium in terms of preferences:

Definition 20.11. {≿i, ≿j} is a Nash Equilibrium if the following conditions are satisfied:

1 There exists Ψi ⊆ Si such that the complement of Ψi is ≿j-null.
2 si ≿i ŝi ∀si ∈ Ψi, ∀ŝi ∈ Si.
In words, {≿i, ≿j} is a Nash Equilibrium if the event that player i is irrational is ≿j-null. Suppose ≿i and ≿j are represented by the multiple priors model. Let Bi and Bj be the sets of probability measures underlying ≿i and ≿j, respectively. Then {Bi, Bj} is a Beliefs Equilibrium (with Agreement) if and only if {≿i, ≿j} satisfies Definition 20.11, with Si replaced by M(Si) and using the definition of nullity as stated in Section 20.2.1. The equilibrium concepts of Dow and Werlang (1994) and Klibanoff (1993) are equivalent to Definition 20.11 if the notion of nullity in Dow and Werlang (1994) is adopted: an event is ≿j-null if it is attached zero probability by at least one probability measure in Bj.20
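The gap between the two notions of nullity is easy to see in a two-prior toy example (my own illustration, not from the chapter): under Dow and Werlang's definition an event is null as soon as one prior in the set assigns it probability zero, whereas the stricter unanimous notion requires every prior to assign it probability zero.

```python
# Beliefs of player j over player i's strategies {a, b, c}: two priors
# (hypothetical numbers, for illustration only).
p = {'a': 0.5, 'b': 0.5, 'c': 0.0}
q = {'a': 0.5, 'b': 0.0, 'c': 0.5}
B_j = [p, q]

def prob(measure, event):
    return sum(measure[s] for s in event)

def dw_null(event, priors):
    # Dow-Werlang nullity: zero probability under at least one prior.
    return any(prob(m, event) == 0 for m in priors)

def unanimous_null(event, priors):
    # Stricter notion: zero probability under every prior in the set.
    return all(prob(m, event) == 0 for m in priors)

print(dw_null({'c'}, B_j))         # True: p puts zero mass on c
print(unanimous_null({'c'}, B_j))  # False: q puts mass 0.5 on c
```

With a singleton belief set the two notions coincide, which is why both reduce to ordinary nullity in the Bayesian case.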
514
Kin Chung Lo
The above discussion may lead the reader to think that the epistemic conditions provided for the equilibrium concepts in Dow and Werlang (1994) and Klibanoff (1993) are biased by using a notion of knowledge that is only appropriate for the equilibrium concepts proposed in this chapter. It is therefore worth reiterating that the conditions stated in Propositions 20.8 and 20.11 do not require the players to know anything about their opponents' beliefs and rationality; the notion of knowledge to be adopted is therefore irrelevant. Moreover, Propositions 20.8 and 20.11 do not even exploit the fact that the information structure is represented by partitions. The two propositions and their proofs continue to hold as long as the beliefs of player i at ω are represented by the set of all probability measures over an event Hi(ω) ⊆ Ω with the property that ω ∈ Hi(ω). Therefore, the conclusion that complete ignorance and rationality imply the two equilibrium concepts remains valid even in the absence of partitional information structures.
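To make the complete-ignorance construction of Bi(ω) concrete, here is a toy sketch (the three-state information structure and the strategy labels are entirely my own, hypothetical choices). When Δi(ω) = M(Hi(ω)), the induced belief set Bi(ω) is the set of all probability measures over the strategies that j actually plays somewhere on Hi(ω), so it is pinned down by that support set.

```python
# Toy information structure on Omega = {w1, w2, w3} (hypothetical).
H_i = {'w1': {'w1', 'w2'}, 'w2': {'w1', 'w2'}, 'w3': {'w3'}}  # i's partition
f_j = {'w1': 'L', 'w2': 'R', 'w3': 'L'}  # j's pure action at each state

def belief_support(omega):
    # Under complete ignorance, Delta_i(omega) = M(H_i(omega)), so B_i(omega)
    # is M of the set of strategies j plays on H_i(omega); this support
    # determines it completely.
    return {f_j[w] for w in H_i[omega]}

print(sorted(belief_support('w1')))  # ['L', 'R']: ignorant between L and R
print(sorted(belief_support('w3')))  # ['L']: i is certain j plays L
```

At any state ω the point mass on fj(ω) lies in Bi(ω), which is why condition 1 of Definition 20.10 holds automatically with Ej = {fj(ω)}; rationality alone then delivers condition 2, in the spirit of Proposition 20.11.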
20.9. More general preferences

The purpose of this section is to show that even if we drop the particular functional form proposed by Gilboa and Schmeidler (1989) but retain some of its basic properties, counterparts of this chapter's equilibrium concepts and results can be formulated and proven.

Let us first go back to the context of single-person decision theory and define a class of utility functions that generalizes the multiple priors model. Recall the notation introduced in Section 20.2.1, whereby ≿ is a preference ordering over the set of acts F, where each act maps Ω into M(X). Impose the following restrictions on ≿. Suppose that ≿ restricted to constant acts conforms to expected utility theory and so is represented by an affine u : M(X) → R. Suppose that there exists a nonempty, closed, and convex set Δ of probability measures on Ω such that ≿ is representable by a utility function of the form

f ↦ U(f) ≡ V({∫ u ∘ f dp | p ∈ Δ})    (20.8)

for some real-valued function V. Assume that ≿ is monotonic in the sense that for any f, g ∈ F, if ∫ u ∘ f dp > ∫ u ∘ g dp ∀p ∈ Δ, then f ≻ g. Say that ≿ is multiple priors representable if ≿ satisfies all the above properties. Quasiconcavity of ≿ will also be imposed occasionally.

Two examples are provided here to clarify the structure of the utility function U in (20.8). Suppose there exists a probability measure µ over M(Ω) and a concave and increasing function h such that

U(f) = ∫_{M(Ω)} h(∫ u ∘ f dp) dµ.

In this example, the set of probability measures Δ corresponds to the support of µ. The interpretation of the above utility function is that the decision maker views an act f as a two-stage lottery. However, the reduction of compound lotteries axiom
may not hold (Segal, 1990). Note that this utility function satisfies quasiconcavity. Another example of U, which is not necessarily quasiconcave, is the Hurwicz (1951) criterion:

U(f) = α min_{p ∈ Δ} ∫ u ∘ f dp + (1 − α) max_{p ∈ Δ} ∫ u ∘ f dp,

where 0 ≤ α ≤ 1.

Adapting the model to the context of normal form games as in Section 20.2.2, the objective function of player i is V({ui(σi, bi) | bi ∈ Bi}). All equilibrium notions can be defined precisely as before. I now prove the following extension of Proposition 20.3.

Proposition 20.12. Consider an n-person game. Suppose that the preference ordering of each player is multiple priors representable and quasiconcave. If {Bi}_{i=1}^{n} is a Beliefs Equilibrium, then there exists bi ∈ Bi such that {bi}_{i=1}^{n} is a Bayesian Beliefs Equilibrium and BRi(Bi) ⊆ BRi(bi).

Proof. As in the proof of Proposition 20.3, it suffices to show that, given Bi, there exists bi ∈ Bi such that BRi(Bi) ⊆ BRi(bi). I first show that for each σi ∈ BRi(Bi), there exists bi ∈ Bi such that σi ∈ BRi(bi). Suppose that this were not true. Then there exists σi ∈ BRi(Bi) such that for each bi ∈ Bi, we can find σi′ ∈ M(Si) with ui(σi, bi) < ui(σi′, bi). This implies that there exists σi* ∈ M(Si) such that ui(σi*, bi) > ui(σi, bi) ∀bi ∈ Bi. (See Lemma 3 in Pearce, 1984: 1048.) Since the preference of player i is monotonic, player i should strictly prefer σi* to σi when his beliefs are represented by Bi. This contradicts the fact that σi ∈ BRi(Bi). Quasiconcavity of preference implies that BRi(Bi) is a convex set. Therefore, there exists an element σ̄i ∈ BRi(Bi) such that the support of σ̄i is equal to the union of the supports of all the probability measures in BRi(Bi). Since σ̄i ∈ BRi(bi) for some bi ∈ Bi and ui(·, bi) is linear on M(Si), this implies that si ∈ BRi(bi) for every si in the support of σ̄i. This in turn implies BRi(Bi) ⊆ BRi(bi).

Besides Proposition 20.3, it is not difficult to see that Proposition 20.2 also holds if the preference ordering of each player is multiple priors representable. (The monotonicity of the preference ordering for player i ensures that σi^n = σ̂i^n ∀n in the proof of Proposition 20.2.)
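The Hurwicz criterion above is easy to sketch directly. The fragment below (my own illustration, assuming a finite state space and a finite set Δ of priors with hypothetical numbers) shows that α = 1 recovers the maxmin (Gilboa-Schmeidler) utility and α = 0 the maxmax.

```python
def expected_u(utilities, prior):
    # utilities: u(f(s)) for each state s; prior: probability of each state.
    return sum(u * p for u, p in zip(utilities, prior))

def hurwicz(utilities, priors, alpha):
    # Hurwicz alpha-criterion over a set of priors:
    # U(f) = alpha * min_p E_p[u(f)] + (1 - alpha) * max_p E_p[u(f)].
    evs = [expected_u(utilities, p) for p in priors]
    return alpha * min(evs) + (1 - alpha) * max(evs)

# Two states; an act paying utility 1 in state 1 and 0 in state 2;
# two priors disagreeing about state 1.
utils = [1.0, 0.0]
priors = [[0.2, 0.8], [0.7, 0.3]]
print(hurwicz(utils, priors, 1.0))  # 0.2 -> pure maxmin
print(hurwicz(utils, priors, 0.0))  # 0.7 -> pure maxmax
print(hurwicz(utils, priors, 0.5))  # midway between the two
```

For α < 1 the induced V is monotonic but, as the text notes, not necessarily quasiconcave, which is why the Hurwicz criterion falls outside the quasiconcavity assumption of Proposition 20.12 below.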
Propositions 20.4 and 20.5 are also valid because their proofs depend only on Propositions 20.2 and 20.3. Finally, all the results in Section 20.6, except Proposition 20.6 which also requires preferences to be quasiconcave, are true under the assumption of multiple priors representable preferences.
20.10. Concluding remarks Let me first summarize the questions addressed in this chapter.
1 What is a generalization of Nash Equilibrium (and its variants) in normal form games that allows for uncertainty averse preferences?
2 What are the epistemic conditions for those equilibrium concepts?
3 Can an outside observer distinguish uncertainty averse players from Bayesian players?
4 Does uncertainty aversion make the players worse off (better off)?
5 How is uncertainty aversion related to the structure of the game?
Generalizations of Nash Equilibrium have already been proposed by Dow and Werlang (1994) and Klibanoff (1993) to partly answer questions 3, 4, and 5. One important feature of the equilibrium concepts presented in this chapter that differs from Dow and Werlang (1994) but is shared with Klibanoff (1993) is the adoption of a mixed rather than a pure strategy space. Both choices can be justified by different perceptions of the players about the order of strategy choices. On the other hand, I can highlight the following relative merits of the approach pursued here. A distinctive feature of the solution concepts proposed in this chapter is their epistemic foundations, which resemble as closely as possible those underlying the corresponding Bayesian equilibrium concepts. As pointed out by Dow and Werlang (1994: 313), their equilibrium concepts are only "presented intuitively rather than derived axiomatically." In this chapter, some epistemic conditions are also provided for the equilibrium concepts proposed by Dow and Werlang (1994) and Klibanoff (1993). The weakness of their equilibrium concepts is revealed by the fact that the epistemic conditions do not involve any strategic considerations. This point was demonstrated in Section 20.8.3, where I noted that in any normal form game, regardless of its payoff structure, the beliefs profile {M(S2), M(S1)} constitutes an equilibrium in their sense.
20.11. Appendix

Proof of Proposition 20.6. Fix ω at a state where the rationality of the players, {ui}_{i=1}^{n}, and {Bi}_{i=1}^{n} are mutual knowledge. That player i knows player j's beliefs means

Hi(ω) ⊆ {ω′ ∈ Ω | Bj(ω′) = Bj(ω)}.

That player i knows player j is rational means

Hi(ω) ⊆ {ω′ ∈ Ω | fj(ω′) ∈ BRj(ω′)(Bj(ω′))}.

Note that BRj varies with the state because uj does. That player i knows player j's payoff function means

Hi(ω) ⊆ {ω′ ∈ Ω | BRj(ω′) = BRj(ω)}.
Therefore,

Hi(ω) ⊆ {ω′ ∈ Ω | Bj(ω′) = Bj(ω)} ∩ {ω′ ∈ Ω | fj(ω′) ∈ BRj(ω′)(Bj(ω′))} ∩ {ω′ ∈ Ω | BRj(ω′) = BRj(ω)} ⊆ {ω′ ∈ Ω | fj(ω′) ∈ BRj(ω)(Bj(ω))}.

This implies {fj(ω′) | ω′ ∈ Hi(ω)} ⊆ BRj(ω)(Bj(ω)). The fact that the preference of player j is quasiconcave implies that BRj(ω)(Bj(ω)) is a convex set. Therefore, we have

convex hull of {fj(ω′) | ω′ ∈ Hi(ω)} ⊆ BRj(ω)(Bj(ω)).

By construction of Bi(ω),

margSj Bi(ω) ⊆ convex hull of {fj(ω′) | ω′ ∈ Hi(ω)} ⊆ BRj(ω)(Bj(ω)).

This shows that {Bi}_{i=1}^{n} is a Beliefs Equilibrium.

Proof of Proposition 20.7. The conditions stated in Proposition 20.7 imply those in Proposition 20.6. Therefore, it is immediate that {Bi}_{i=1}^{n} is a Beliefs Equilibrium. By construction of Bi(ω) and the assumption Δi(ω) = M(Hi(ω)) ∀ω, we have

margSj Bi(ω) = convex hull of {fj(ω′) | ω′ ∈ Hi(ω)} ∀ω.

Now fix ω at a state where the rationality of the players and {ui}_{i=1}^{n} are mutual knowledge and {Bi}_{i=1}^{n} is common knowledge. That player k knows player i's beliefs implies that Bi(ω′) = Bi(ω) ∀ω′ ∈ Hk(ω), and therefore

convex hull of {fj(ω″) | ω″ ∈ Hi(ω′)} = margSj Bi(ω′) = margSj Bi(ω) = convex hull of {fj(ω″) | ω″ ∈ Hi(ω)} ∀ω′ ∈ Hk(ω).
Let Φj^k ⊆ {fj(ω′) | ω′ ∈ Hk(ω)} be the set of extreme points of margSj Bk(ω). I claim that Φj^k ⊆ margSj Bi(ω) (and therefore margSj Bk(ω) ⊆ margSj Bi(ω)). Suppose that it were not true. Then there exists σj ∈ Φj^k such that σj ∉ margSj Bi(ω′) ∀ω′ ∈ Hk(ω). Therefore,

σj ∉ ∪_{ω′ ∈ Hk(ω)} {fj(ω″) | ω″ ∈ Hi(ω′)} ⊇ {fj(ω′) | ω′ ∈ Hk(ω)} ⊇ Φj^k ∋ σj,

which is a contradiction. Since i, j, and k are arbitrary, we have margSj Bi(ω) = margSj Bk(ω) and, in particular, Φj^i = Φj^k ≡ Φj. It only remains to show that

Bi(ω) = convex hull of {σ−i ∈ M(S−i) | margSj σ−i ∈ Φj ∀j ≠ i}.

Bi(ω) takes the required form if and only if for each σ−i ∈ ×_{j≠i} Φj, there exists ω′ ∈ Hi(ω) such that fj(ω′) = σj ∀j ≠ i. Suppose that the condition stated above were not satisfied. Without loss of generality, assume that for player 1 there exists σ−1 ∈ ×_{j≠1} Φj such that for each ω′ ∈ H1(ω), there exists j ≠ 1 with fj(ω′) ≠ σj. This implies that σ−1 ∉ B1(ω). That B1(ω) is common knowledge at ω implies that B1(ω′) = B1(ω) ∀ω′ ∈ H(ω). Therefore, σ−1 ∉ B1(ω′) ∀ω′ ∈ H(ω), and thus for each ω′ ∈ H(ω), there exists j ≠ 1 with fj(ω′) ≠ σj.

Now consider player 2. The last sentence in the previous paragraph implies that for each ω′ ∈ H(ω) such that f2(ω′) = σ2, and for each ω″ ∈ H2(ω′), there exists j ∈ {3, . . . , n} such that fj(ω″) ≠ σj. Therefore, ⊗_{j=3}^{n} σj ∉ marg_{×_{j=3}^{n} Sj} B2(ω′). Again, that B2(ω) is common knowledge at ω implies B2(ω′) = B2(ω) ∀ω′ ∈ H(ω). Therefore, ⊗_{j=3}^{n} σj ∉ marg_{×_{j=3}^{n} Sj} B2(ω′) ∀ω′ ∈ H(ω), and we can conclude that for each ω′ ∈ H(ω), there exists j ∈ {3, . . . , n} such that fj(ω′) ≠ σj. Repeat the same argument for players 3, . . . , n to conclude that for each ω′ ∈ H(ω), fn(ω′) ≠ σn. This contradicts the fact that σn ∈ Φn.

Proof of Proposition 20.8. By construction of Bi(ω) and the assumption Δi(ω) = M(Hi(ω)), it follows that margSj Bi(ω) = convex hull of {fj(ω′) | ω′ ∈ Hi(ω)}. In particular, fj(ω) ∈ margSj Bi(ω). At ω, the fact that player j is rational implies fj(ω) ∈ BRj(ω)(Bj(ω)). Therefore, margSj Bi(ω) ∩ BRj(ω)(Bj(ω)) ≠ Ø.

Proof of Proposition 20.11. Set Ej = {fj(ω)}. By construction of Bi(ω) and the assumption Δi(ω) = M(Hi(ω)), it follows that

Bi(ω) = M({fj(ω′) | ω′ ∈ Hi(ω)}).

In particular, there exists a probability measure in Bi(ω) which attaches probability one to Ej. Therefore, Ej satisfies condition 1 in Definition 20.10. At ω, the fact that player j is rational implies fj(ω) ∈ BRj(ω)(Bj(ω)). Therefore, condition 2 in Definition 20.10 is also satisfied. This completes the proof that {B1, B2} is a Nash Equilibrium Under Uncertainty.
Acknowledgments This is a revised version of Chapter 1 of my PhD thesis at the University of Toronto. I especially thank Professor Larry G. Epstein for pointing out this topic, and for
providing supervision and encouragement. I am also grateful to Professors Eddie Dekel, R. M. Neal, Mike Peters, and Shinji Yamashige for valuable discussions and to an associate editor and two referees for helpful comments. Remaining errors are my responsibility.
Notes
1 The only exception is that when Y is the space of outcomes X, M(X) denotes the set of all probability measures over X with finite supports.
2 See Gilboa and Schmeidler (1989: 149–150).
3 Throughout this chapter, I use (y1, p1; . . . ; ym, pm) to denote the probability measure which attaches probability pi to yi.
4 Also note that uncertainty aversion is not the only reason for players to have a strict incentive to randomize. In Crawford (1990) and Dekel et al. (1991), players may also strictly prefer to randomize even though they are probabilistically sophisticated.
5 To avoid confusion, note that this is not Harsanyi's Bayesian Equilibrium for games of incomplete information with Bayesian players.
6 To be even more precise, a Beliefs Equilibrium is an n-tuple of closed and convex sets of probability measures {B̂i}_{i=1}^{n} such that the complement of BRi(B̂i) is a set of margM(Si) p̂j-measure zero for every p̂j ∈ B̂j.
7 A parallel statement for Bayesian players is that a Nash Equilibrium may not be a Strict Nash Equilibrium.
8 Note that this only explains why the decision maker may strictly prefer to randomize. We also need to rely on the dynamic consistency argument proposed by Machina (1989) to ensure that the decision maker is willing to obey the randomization result after the randomizing device is used. See also Dekel et al. (1991: 241) for discussion of this issue in the context of normal form games.
9 Mukerji (1994) argues that uncertainty about opponents' strategy choices can even persist as a steady state in the repeated game scenario.
10 The notation supp B_j^{n−1} stands for the union of the supports of the probability measures in B_j^{n−1}.
11 A well-known example where this kind of reasoning applies is the following. An expected utility maximizer who is facing an exogenously specified set of states of nature always prefers to have more information before making a decision. However, this is not necessarily the case if the decision maker is playing a game against another player. The reason is that if player 1 chooses to have less information and if player 2 "knows" it, the strategic behavior of player 2 may be affected. The end result is that player 1 may obtain a higher utility by throwing away information. (See the discussion of correlated equilibrium in Chapter 2 of Fudenberg and Tirole, 1991.)
12 Though I prove a result later (Proposition 20.12) for more general preferences, I provide a separate proof here because the special structure of the Gilboa–Schmeidler model permits some simplification.
13 For the Bayesian Beliefs Equilibrium {bi}_{i=1}^{n} constructed in the proof of Proposition 20.3, we actually have

max_{σi ∈ M(Si)} min_{pi ∈ Bi} ui(σi, pi) = max_{σi ∈ M(Si)} ui(σi, bi).

This, of course, does not exclude the possibility that there may exist other Bayesian Beliefs Equilibria contained in {Bi}_{i=1}^{n} such that the equality is replaced by a strict inequality.
14 Greenberg (2000) develops an example independently. The intuition of his example is very similar to that of Example 20.5 in this chapter.
15 We may also want to ask the reverse question: Are the conditions stated in Propositions 20.4 and 20.5 necessary for the absence of Proper Beliefs Equilibrium? The game in Table 20.12 has two Nash Equilibria (and therefore it is not dominance solvable). They are {U, R} and {D, L}. However, there does not exist a Proper Beliefs Equilibrium.
16 Another feature of the interactive belief system (Aumann and Brandenburger, 1995: 1164) that is shared by the model here is that players' prior beliefs are not part of the specification. Note that the common prior assumption in their paper is imposed only for their Theorem B (p. 1168).
17 In particular, unlike Theorem B in Aumann and Brandenburger (1995), the proof of Proposition 20.7 does not rely on the "agreeing to disagree" result of Aumann (1976).
18 A brief review of other related papers is provided below. There are two other papers on generalizations of Nash Equilibrium. Lo (1999a) proposes Cautious Nash Equilibrium which, when specialized to the multiple priors model, refines the equilibrium concept in Dow and Werlang (1994). However, its main focus is on relaxing mutual knowledge of rationality, rather than uncertainty aversion. Mukerji (1994) proposes the equilibrium concept Equilibrium in ε-ambiguous Beliefs. The equilibrium concept only admits players' utility functions having a specific form but otherwise is identical to that in Dow and Werlang (1994). Epstein (1997) and Mukerji (1994) generalize rationalizability. The former requires common knowledge of rationality but the latter does not. For normal form games of incomplete information, Epstein and Wang (1996) establish the general theoretical justification for the Harsanyi style formulation for non-Bayesian players. Lo (1999b) provides a generalization of Nash Equilibrium in extensive form games. All the above papers either adopt the multiple priors model or consider a class of preferences that includes the multiple priors model as a special case.
19 The equilibrium concept presented here is a simplified version. Klibanoff (1993) assumes that players' beliefs are represented by lexicographic sets of probability measures.
20 To see that Klibanoff's equilibrium concept satisfies Definition 20.11 when Dow and Werlang's definition of nullity is adopted, restate Weak Beliefs Equilibrium in terms of B̂i: {B̂i, B̂j} is a Weak Beliefs Equilibrium if there exists b̂j ∈ B̂j such that σi ∈ BRi(B̂i) ∀σi ∈ support of b̂j. Set Ψi in Definition 20.11 to be the support of b̂j.
References
F. J. Anscombe and R. Aumann, (1963) A definition of subjective probability, Ann. Math. Statist. 34, 199–205.
R. Aumann, (1976) Agreeing to disagree, Ann. Statist. 4, 1236–1239.
R. Aumann, (1987) Correlated equilibrium as an expression of Bayesian rationality, Econometrica 55, 1–18.
R. Aumann and A. Brandenburger, (1995) Epistemic conditions for Nash equilibrium, Econometrica 63, 1161–1180.
A. Brandenburger, (1992) Knowledge and equilibrium in games, J. Econ. Perspect. 6, 83–101.
C. Camerer and M. Weber, (1992) Recent developments in modelling preference: Uncertainty and ambiguity, J. Risk. Uncertainty 5, 325–370.
V. Crawford, (1990) Equilibrium without independence, J. Econ. Theory 50, 127–154.
E. Dekel, Z. Safra, and U. Segal, (1991) Existence and dynamic consistency of Nash equilibrium with non-expected utility preferences, J. Econ. Theory 55, 229–246.
J. Dow and S. Werlang, (1994) Nash equilibrium under Knightian uncertainty: Breaking down backward induction, J. Econ. Theory 64, 305–324.
D. Ellsberg, (1961) Risk, ambiguity, and the savage axioms, Quart. J. Econ. 75, 643–669.
L. G. Epstein, (1997) Preference, rationalizability and equilibrium, J. Econ. Theory 73, 1–29.
L. G. Epstein and T. Wang, (1996) Beliefs about beliefs without probabilities, Econometrica 64, 1343–1373.
K. Fan, (1953) Minimax theorems, Proc. Nat. Acad. Sci. 39, 42–47.
D. Fudenberg and J. Tirole, (1991) "Game Theory," MIT Press, Cambridge, MA.
I. Gilboa and D. Schmeidler, (1989) Maxmin expected utility with non-unique prior, J. Math. Econ. 18, 141–153. (Reprinted as Chapter 6 in this volume.)
J. Greenberg, (2000) The right to remain silent, Theory and Decision 48, 193–204.
L. Hurwicz, (1951) Optimality criteria for decision making under ignorance, Cowles Commission Discussion Paper.
P. Klibanoff, (1993) Uncertainty, decision, and normal form games, manuscript, MIT.
K. C. Lo, (1999a) Nash equilibrium without mutual knowledge of rationality, Economic Theory 14, 621–633.
K. C. Lo, (1999b) Extensive form games with uncertainty averse players, Games and Economic Behaviour 28, 256–270.
S. Mukerji, (1994) A theory of play for games in strategic form when rationality is not common knowledge, manuscript, Yale University.
M. Machina, (1989) Dynamic consistency and non-expected utility models of choice under uncertainty, J. Econ. Lit. 27, 1622–1668.
M. Machina and D. Schmeidler, (1992) A more robust definition of subjective probability, Econometrica 60, 745–780.
J. Nash, (1951) Non-cooperative games, Ann. Math. 54, 286–295.
M. Osborne and A. Rubinstein, (1994) "Game Theory," MIT Press, Cambridge, MA.
D. Pearce, (1984) Rationalizable strategic behaviour and the problem of perfection, Econometrica 52, 1029–1050.
H. Raiffa, (1961) Risk, ambiguity and the savage axioms: Comment, Quart. J. Econ. 75, 690–694.
L. Savage, (1954) "The Foundations of Statistics," Wiley, New York.
D. Schmeidler, (1989) Subjective probability and expected utility without additivity, Econometrica 57, 571–581. (Reprinted as Chapter 5 in this volume.)
U. Segal, (1990) Two-stage lotteries without the reduction axiom, Econometrica 58, 349–377.
21 The right to remain silent
Joseph Greenberg
21.1. Introduction

Over the last two decades, economic theorists (in fields that include microeconomics, macroeconomics, industrial organization, and labor economics) have extensively studied dynamic situations in which players move sequentially. Most of the formal analysis is done by representing the model under consideration as a "game tree," and then employing the notion of "equilibrium in strategy profiles," notably Nash equilibrium (or any one of its many refinements). In recent years, economists and game theorists have come to recognize many shortcomings of Nash equilibrium. In a narrow sense, the contribution of this short chapter is to point out another deficiency of Nash equilibrium in dynamic games. Much more ambitiously, I hope to convince (at least some of) the readers that "equilibrium in strategy profiles" is not the appropriate notion to use in the analysis of dynamic games. This is true both conceptually and empirically: it is very hard to interpret a strategy profile (viewed either as a choice of actions or as beliefs), and neither introspection nor observed behavior suggests that players consider strategy profiles. Moreover, I shall show that the use of "equilibrium in strategy profiles" does not allow players to use ambiguity to their advantage.

In "normal form games" the strategy sets constitute part of the given data. Indeed, such games are now known as "games in strategic form." Thus, in any analysis of a normal form game, strategies must constitute the basic (in fact, the only!) "building block." Such is not the case with dynamic games. It was an ingenious idea to invent the notion of strategy in dynamic games, enabling their transformation into normal form games.1 This transformation is not trivial; a "strategy" becomes a function that assigns to every information set an action available at that information set. In particular, a strategy profile specifies the precise (perhaps probabilistic) actions to be taken in every possible contingency (information set). Clearly, this notion is both complex and unintuitive.2 It is also very difficult to interpret. But more importantly, I shall argue that rarely
Greenberg, Joseph (2000) “The right to remain silent,” Theory and Decision, 48: 193–204.
do we observe players employing strategies. A more plausible building block in the analysis of dynamic games is, perhaps, a path or a play, that is, the course of action that is to be followed.3 I also contend that in many social interactions in "real life," players communicate and discuss their choice of actions (even if no agreement can be signed or trusted).4 Players typically negotiate over the "paths" to be taken, not over strategies.

The Example in Section 21.3 illustrates that when players are involved in "open negotiations," it may be disadvantageous for a player to choose a strategy. That is, a player may benefit by not revealing (or not pre-determining) the choice of his action in an information set that, he thereby hopes, will not be reached.5 He would be better off "crossing the bridge if and when he gets to it." A player might benefit from exercising his "right to remain silent" if he believes, as the empirical evidence shows, that players display aversion to "Knightian uncertainty."6 In that case, a player who behaves strategically may wish to avoid revealing/choosing his strategy.7 Section 21.4 concludes the chapter with a discussion of some related literature, and with a modification of the Example of Section 21.3 demonstrating that by "remaining silent," all players can be made better off, relative to the (unique) Nash equilibrium.
21.2. Strategies in a dynamic game

In this section I shall argue that the analysis of a dynamic game should not be based on "equilibrium in strategy profiles." This is true both conceptually and empirically. On the conceptual level, it is by no means clear what it is that a strategy profile in dynamic games represents, because a crucial feature of a dynamic game is that (some of) players' actions are revealed along the play of the game. Ever since Cournot, a strategy in a normal form game has typically represented a choice of action(s). It is in this way that game theorists have, for a long time, interpreted the notion of a strategy also in extensive form games. But then, how are we to interpret, for example, the action player i's strategy specifies in some information set h, if that information set cannot be reached (because of i's own previous choice of actions) when i follows this strategy?8 To rescue the usefulness of the notion of "equilibrium in strategy profiles" (and hence of Nash equilibrium or its refinements) in dynamic games, it was then suggested to interpret a strategy of player i as representing the beliefs other players have over the actions i would take. But, again, because in a dynamic game (some of) players' actions are revealed along the play of the game, the beliefs other players have over the actions i would take should be modified as the game unravels and i's past actions are revealed. Beliefs, therefore, ought to depend on the subgame reached.9

In addition to the conceptual difficulties, strategy profiles also fail to be descriptive. Typically, individuals do not consider all possible contingencies. Rather, players often "negotiate openly," trying to "convince," "influence," "coordinate," and "agree" on a course of action that is to be followed.10 Sometimes, such agreements
524
Joseph Greenberg
include clauses that prescribe the precise consequences (sanctions/punishments) for some deviations. But rarely, if ever, are all possible deviations covered. Almost no contract is “complete.” The same is true for any “social norm” or “legal system.” They specify the “appropriate/legal/acceptable behavior,” but neither the social norm nor any legal system pins down the precise actions (“punishments”) to be taken in all contingencies that might possibly arise when the prescribed behavior is not followed.11 To conclude, any notion that uses “equilibrium in strategy profiles” considerably limits the relevance of the analysis. This is true when strategies are interpreted as the actual choice of actions by the players or as players’ beliefs, or as representing the legal system or players’ “thought processes.”
21.3. An example Consider the following diplomatic “peace-negotiation” scenario, which is represented by the game tree in Figure 21.1.12 Each of the two warring countries, 1 and 2, has to decide whether or not to reach a peace agreement, represented by the path (bd). Failing to reach an agreement, country 3 would “re-evaluate” its policy, a decision that will affect both countries 1 and 2. Assume that country 3 has no way to know which of the two countries caused the breakup of the negotiations (otherwise, it could threaten to retaliate against that country). All it observes is whether or not the negotiations were successful. As the payoffs in Figure 21.1 indicate, it is in the best interest of country 3 that the two warring countries sign the peace agreement.13 Since country 3 cannot know who is responsible for the breakup of the peace negotiations, both policies L and R are “rational.” Both countries 1 and 2 (correctly) anticipate this set of “plausible/rational” re-evaluated policies. Therefore, unless country 3 pre-determines, or reveals in advance, the policy it is going to adopt should the peace treaty not be reached, countries 1 and 2 have no way to know (even probabilistically) which policy would be adopted by country 3.

Figure 21.1 (game tree): Country 1 moves first, choosing a or b; after b, country 2 chooses c or d, where the path (bd) represents the peace agreement and yields the payoffs (4, 4, 4) to countries 1, 2, and 3, respectively. The move a leads to vertex v, and the moves (b, c) lead to vertex w; the two vertices form country 3’s information set, at which it chooses L or R. The remaining payoffs are (0, 9, 1) after (a, L), (9, 0, 0) after (a, R), (3, 9, 0) after (b, c, L), and (6, 0, 1) after (b, c, R).

It is, then, conceivable14 that each country
The right to remain silent
525
will follow the path (bd), but each for a different reason: country 1 for believing that policy L is more likely to be adopted than policy R, and country 2 for believing that policy R is more likely to be adopted than policy L. It is important to observe that if both countries held the same beliefs on the precise likelihood of the adoption of policies L and R, at least one of these two countries would find it in its best interest to jeopardize the peace talks. Nevertheless, by remaining silent, player 3 can create some uncertainty in the other players’ minds, thereby accomplishing his goal (that his information set is not reached). However, no Nash equilibrium for this game supports the path (bd). In fact, this game possesses a unique Nash equilibrium, which is given by: player 1 chooses actions a and b with equal probabilities (i.e. he uses the mixed strategy (1/2 a, 1/2 b)), player 2 chooses c (with probability 1), and player 3 chooses actions L and R with equal probabilities (i.e. he uses the mixed strategy (1/2 L, 1/2 R)). The resulting equilibrium payoff vector is (4.5, 4.5, 0.5).15 The success of the peace talks between Israel and Egypt (players 1 and 2) mediated by the USA (player 3) following the 1973 war may be, at least partially, attributed to such a phenomenon. Egypt and Israel were each afraid that if negotiations broke down, she would be the loser. “And once a negotiation is thus reduced to details, it has a high probability of success – unless one party has consciously decided to make a show of flexibility simply to put itself in a better light for a deliberate breakup of the talks.16 Egypt was precluded from such a course by the plight of the Third Army, Israel by the fear of diplomatic isolation” (Kissinger, 1982: 802). I shall now show how player 3 can implement the path (bd) when players are allowed to openly communicate. Were I player 3, I would suggest to players 1 and 2 that they follow the path (bd).
I definitely would choose not to disclose the choice of my action if my information set were to be reached. By “remaining silent,” players 1 and 2 would no longer have a single common belief about my choice of action. It is then conceivable that player 1 might fear that I would choose L (with probability greater than 5/9), and that player 2 might fear that I would choose R (with probability greater than 5/9). In this case, each of the two players would be happy with the payoff of 4; thus, they would accept my suggestion to follow the path (bd). And I shall get a payoff of 4 instead of my Nash equilibrium payoff of 1/2. That is, by deferring or concealing the choice of my strategy, I may well deter the players from employing the Nash strategies, thereby considerably increasing my own payoff. The unique Nash equilibrium may not be acceptable even if it is interpreted as a recommendation. Indeed, if either an outside recommender or one of the two players were to suggest that we follow the unique Nash equilibrium rather than the path (bd), I, as player 3, would openly reject this recommendation. Instead, I would tell the other two players that I am not yet sure which probability distribution over my actions L and R I will choose, but in any case, I can assure them that I shall not follow their (Nash) recommendation. Note that this threat of mine is “credible.” For, were players 1 and 2 to follow their Nash strategies, my (expected) payoff would be 1/2 no matter what action I choose. I stand to lose nothing by
adhering to my threat. It is, therefore, likely that players 1 and 2 would reconsider and agree to follow the path (bd) instead.
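The two belief thresholds invoked in this section (country 1 fearing L with probability above 5/9, country 2 fearing R with probability above 5/9) can be checked directly from the payoffs of Figure 21.1. The following is a minimal numerical sketch (the function names are illustrative, not part of the original analysis):

```python
# A numerical check of the belief thresholds in the example of Section 21.3.
# Payoffs follow Figure 21.1: after a, country 1 gets 0 under L and 9 under R;
# after (b, c), country 2 gets 9 under L and 0 under R; the path (bd) pays 4.

def payoff_1_from_a(p_L):
    # Country 1's expected payoff from breaking off (action a),
    # given its belief p_L that country 3 would adopt policy L.
    return 0 * p_L + 9 * (1 - p_L)

def payoff_2_from_c(p_L):
    # Country 2's expected payoff from breaking off (action c).
    return 9 * p_L + 0 * (1 - p_L)

PEACE = 4  # each warring country's payoff on the path (bd)

# Country 1 strictly prefers peace iff 9(1 - p_L) < 4, i.e. iff p_L > 5/9.
assert payoff_1_from_a(0.6) < PEACE < payoff_1_from_a(0.5)
# Country 2 strictly prefers peace iff 9 p_L < 4, i.e. iff 1 - p_L > 5/9.
assert payoff_2_from_c(0.4) < PEACE < payoff_2_from_c(0.5)

# Under any single common belief, at least one country strictly prefers to
# jeopardize the talks (since 4/9 < 5/9, the two "fears" cannot coexist).
assert all(payoff_1_from_a(p / 100) > PEACE or payoff_2_from_c(p / 100) > PEACE
           for p in range(101))
```

The last assertion is the computational content of the observation that no common belief over L and R supports the path (bd), whereas silence, which allows the two countries to hold different beliefs, does.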
21.4. Concluding remarks Remark 21.1. The following simple modification of our example shows that the strategic employment of Knightian uncertainty might yield an outcome that is Pareto superior to the (unique) Nash outcome. Since the game in Figure 21.1 has a unique Nash equilibrium that passes through player 3’s information set, the only Nash payoff in the game depicted in Figure 21.2 is (2, 2, 2, 2). But, if player 3 does not specify his strategy, then the players may well agree to follow the path (Dbd), which yields the Pareto superior payoff of (4, 4, 4, 4). (Note that in this example player 4 need not worry that player 3 might decide to “double cross,” that is, to remain silent in order to induce player 4 to choose D, and then to disclose his choice were his information set reached. Player 3’s interests are best served by remaining silent.)

Figure 21.2 (game tree): Player 4 moves first, choosing U or D. The move U ends the game with the payoffs (2, 2, 2, 2) to players 1, 2, 3, and 4, respectively. The move D leads to the game of Figure 21.1, with player 4 receiving a fourth payoff of 0 at every terminal node except (Dbd): the payoffs are (0, 9, 1, 0) after (D, a, L), (9, 0, 0, 0) after (D, a, R), (3, 9, 0, 0) after (D, b, c, L), (6, 0, 1, 0) after (D, b, c, R), and (4, 4, 4, 4) after (D, b, d).

Remark 21.2. Knight (1921) argued for a distinction between uncertainty (a situation in which players are not informed about the “objective” probabilities) and risk (when the “objective” probabilities are known by the players). There is ample evidence that players behave differently under uncertainty and risk. Specifically, most players exhibit aversion to uncertainty. The best known example of this phenomenon is the Ellsberg (1961) Paradox. As Ellsberg (1961: 656) notes: The important finding is that, after rethinking all their “offending” decisions in the light of [Savage] axioms, a number of people who are not only
sophisticated but reasonable decide that they wish to persist in their choices. This includes many people who previously felt a “first-order commitment” to the axioms, many of them surprised and some dismayed to find that they wished, in these situations, to violate the Sure-thing Principle. Many subsequent studies (see, e.g. Camerer and Weber (1992)) have found ambiguity premiums that are strictly positive. Observe that for the purpose of this chapter, the magnitude of these premiums (which is typically around 10–20 percent in expected value terms) is irrelevant. The existence of these premiums implies that one can construct examples, similar to the one given in Section 21.3, in which it would benefit a player not to reveal/pre-determine his choice of actions in some contingencies. Remark 21.3. As was mentioned in the Introduction, there are many other solution concepts that support the path (bd). Bernheim’s (1984) and Pearce’s (1984) notion of “rationalizability,” which is appropriate if no communication among players takes place, includes this path. Other concepts that include this path emerge from the recent literature on “learning,” and are motivated by the fact that “off equilibrium choices” are not observed, and hence the requirement of “commonality of beliefs” cannot be justified.17 Finally, (bd) is also included in the solution concepts that modify the notion of Nash equilibrium to incorporate Knightian uncertainty.18 But all of the above are notions of “equilibrium in strategies” (or in “capacities”), and they all extend the notion of Nash equilibrium. The same is true for rationalizable outcomes. Thus, even in our simple example, these notions support other paths as well (including the “Nash path”). In contrast, I am not attempting here to come up with an “equilibrium notion” in the absence of commonality of beliefs or in the presence of Knightian uncertainty. Rather, I suggest that players use these features to their advantage.
In particular, in our example, I suggest that it is the path (bd) that would result in this game. Remark 21.4. Of course, just as it might pay a player not to reveal his choice of “credible” action in some of his information sets (as is the case with player 3 in our example), there are other situations in which a player may wish to reveal the actions he intends to take in the future, thereby attracting players to his information set. I intend to further study the set of paths that is likely to prevail when players behave strategically, but my purpose here is only to suggest that equilibrium in strategies might be inappropriate for studying strategic behavior in dynamic games.
Appendix: Proof of uniqueness We shall now verify that the game depicted in Figure 21.1 admits a unique Nash equilibrium, given by: player 1 uses the mixed strategy (1/2 a, 1/2 b), player 2 uses the pure strategy c, and player 3 uses the mixed strategy (1/2 L, 1/2 R). It is easy to see that there is no Nash equilibrium in which player 3 employs a pure strategy, since if it is R then player 1 must choose a, in which case player 3’s
best response is L. If, on the other hand, player 3’s pure strategy is L, then player 1 will choose b, player 2 will choose c, in which case player 3’s best response is R. Moreover, in every Nash equilibrium player 1 must employ a strictly mixed strategy, since otherwise player 3 would know whether he is in vertex v or in vertex w, and thus employ a pure strategy, contrary to the earlier argument. As for player 2, he cannot employ the pure strategy d, since then player 3 would know that he is in vertex v and employ the pure strategy L, contradicting our conclusion that in every Nash equilibrium player 3 does not employ a pure strategy. Denote by α, β, and γ, respectively, the probabilities that player 1 chooses a, player 2 chooses c, and player 3 chooses L in a Nash equilibrium for this game. By the earlier discussion, we have that 0 < α, γ < 1, and β > 0. We shall now show that the only values that α, β, and γ can assume are 1/2, 1, and 1/2, respectively. To see that β = 1, assume otherwise. Then, since we have established that β > 0, player 2 employs a strictly mixed strategy and therefore must be indifferent between c and d. That is, 9γ = 4, so that γ = 4/9. Since β < 1, player 1’s unique best response is a (guaranteeing himself a payoff of 5), which contradicts our conclusion that player 1 uses a strictly mixed strategy. Thus, β = 1. As 0 < α < 1, player 1 is indifferent between a and b. That is, 9(1 − γ) = 3γ + 6(1 − γ), implying that γ = 1/2. Finally, since 0 < γ < 1, player 3 is indifferent between L and R, that is, α = 1/2. Thus, in the unique Nash equilibrium in this game, α = 1/2, β = 1, and γ = 1/2 – as we wished to show.
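The equilibrium just derived can also be verified numerically. The sketch below (illustrative code; payoffs read off Figure 21.1) computes the expected payoffs at (α, β, γ) = (1/2, 1, 1/2) and checks that no player gains by deviating to either of his pure strategies, which, by the linearity of expected payoffs in one's own mixing probability, confirms the equilibrium property (though not uniqueness, which is what the proof above establishes):

```python
# A sketch verifying the Nash equilibrium of Figure 21.1 numerically.
# alpha = P(player 1 plays a), beta = P(player 2 plays c),
# gamma = P(player 3 plays L).

def expected_payoffs(alpha, beta, gamma):
    # Terminal payoffs (player 1, player 2, player 3), following Figure 21.1.
    leaves = {
        ('a', 'L'): (0, 9, 1), ('a', 'R'): (9, 0, 0),
        ('bc', 'L'): (3, 9, 0), ('bc', 'R'): (6, 0, 1),
        ('bd',): (4, 4, 4),
    }
    probs = {
        ('a', 'L'): alpha * gamma, ('a', 'R'): alpha * (1 - gamma),
        ('bc', 'L'): (1 - alpha) * beta * gamma,
        ('bc', 'R'): (1 - alpha) * beta * (1 - gamma),
        ('bd',): (1 - alpha) * (1 - beta),
    }
    return tuple(sum(probs[k] * leaves[k][i] for k in leaves) for i in range(3))

eq = (0.5, 1.0, 0.5)
assert expected_payoffs(*eq) == (4.5, 4.5, 0.5)

# No player gains by deviating to any pure strategy:
for i in range(3):
    for pure in (0.0, 1.0):
        trial = list(eq)
        trial[i] = pure
        assert expected_payoffs(*trial)[i] <= expected_payoffs(*eq)[i] + 1e-12
```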
Acknowledgments I thank Daniel Arce, Geir Asheim, Ariel Assaf, Giacomo Bonanno, Faye Diamantoudi, Benyamin Shitovitz, Xiao Luo, and Licun Xue for their useful comments and advice. I also thank the Editor for his support and encouragement. Financial support from the Natural Sciences and Engineering Research Council of Canada (NSERC), the Social Sciences and Humanities Research Council of Canada (SSHRC), and Quebec’s Fonds FCAR is gratefully acknowledged.
Notes 1 My understanding is that the definition of a strategy in dynamic games is due to Kuhn (1953). 2 This is evidenced, for example, by the difficulty almost every student encounters when first exposed to this notion. 3 See Greenberg (1990, 1996), and Greenberg et al. (1996). 4 This, of course, is in sharp contrast to the social environment envisioned by Nash (1951) where: “each participant acts independently, without collaboration or communication with any of the others.” 5 See Remark 21.4. 6 See Remark 21.2. 7 Ed Green communicated to me that some of his colleagues in the Federal Reserve System in Minnesota use the term “constructive ambiguity” to describe a policy of
being deliberately vague about how far they would be willing to go to bail out a large bank if one were to fail. 8 For a more detailed discussion, see, for example, Rubinstein (1991). 9 A similar criticism, regarding the notion of subgame perfect equilibrium, was put forward by Binmore (1987), arguing that players cannot hold to their beliefs if these beliefs have been proved to be wrong in the past; see, also, Osborne and Rubinstein (1994). 10 Only a very limited set of real life situations is captured in Nash’s “complete noncommunicative” realm. 11 As I have argued in Greenberg (1990), the description of a normal form game does not provide any information concerning the way in which the game is being played. For example, it provides no information concerning the availability of legal institutions that allow for binding agreements, self-commitments, or coalition formation. Nash equilibrium, in addition to providing a solution concept, also “completes” the description of the game by assuming that every player takes the actions of the other players as given. For a more detailed analysis and discussion, see Greenberg (1996). 12 The example is reminiscent of the “horse-shaped game” in Fudenberg and Kreps (1995: Example 6.1). 13 Country 3’s payoff in that case is 4, while the most it can obtain if the negotiations break up is a payoff of 1. 14 See Remark 21.2. 15 See proof in Appendix. 16 Thus, the USA was unable to know which of the two players is “really” responsible for the breakup of the talks, as is reflected in Figure 21.1. (My footnote.) 17 See, for example, Fudenberg and Levine (1993), Kalai and Lehrer (1993), and Rubinstein and Wolinsky (1994). 18 See, for example, Dow and Werlang (1994), Goes et al. (1998), Hendon et al. (1994), Klibanoff (1993), and Lo (1996). Goes et al. (1998) consider a game similar to our example, and they, too, single out the path (bd) from among the set of Nash equilibria in lower probabilities.
References
Bernheim, D. (1984). Rationalizable strategic behavior, Econometrica, 52: 1007–1028.
Binmore, K. (1987). Modelling rational players: Part I, Economics and Philosophy, 3: 179–214.
Camerer, C. and Weber, M. (1992). Recent developments in modelling preferences: uncertainty and ambiguity, Journal of Risk and Uncertainty, 5(4): 325–370.
Dow, J. and Werlang, S. (1994). Nash equilibrium under Knightian uncertainty, Journal of Economic Theory, 64(2): 305–324.
Ellsberg, D. (1961). Risk, ambiguity, and the Savage axioms, Quarterly Journal of Economics, 75: 643–669.
Fudenberg, D. and Kreps, D. (1995). Learning in extensive-form games, I: self-confirming equilibria, Games and Economic Behavior, 8: 20–55.
Fudenberg, D. and Levine, D. (1993). Self-confirming equilibrium, Econometrica, 61(3): 523–545.
Goes, E., Jacobsen, H. J., Sloth, B. and Tranaes, T. (1998). Nash equilibrium with lower probabilities, Theory and Decision, 44: 37–66.
Greenberg, J. (1990). The theory of social situations: an alternative game-theoretic approach. Cambridge: Cambridge University Press.
Greenberg, J. (1996). Acceptable course of action in dynamic games, in Filar, J., Gaitsgory, V. and Imado, F. (eds.), Proceedings of the Seventh International Symposium on Dynamic Games and Applications, 283–298.
Greenberg, J., Monderer, D. and Shitovitz, B. (1996). Multistage situations, Econometrica, 64(6): 1415–1437.
Hendon, E., Jacobsen, H. J., Sloth, B. and Tranaes, T. (1994). Game theory with lower probabilities. University of Copenhagen, mimeo.
Kalai, E. and Lehrer, E. (1993). Subjective equilibrium in repeated games, Econometrica, 61(5): 1231–1240.
Kissinger, H. (1982). Years of upheaval. Boston: Little, Brown and Company.
Klibanoff, P. (1993). Uncertainty, decision and normal-form games. Cambridge, MA: MIT, mimeo.
Knight, F. (1921). Risk, uncertainty, and profit. Boston: Houghton Mifflin.
Kuhn, H. W. (1953). Extensive games and the problem of information, in Contributions to the Theory of Games, Vol. II (pp. 193–216). Princeton, NJ: Princeton University Press.
Lo, K. C. (1996). Equilibrium in beliefs under uncertainty, Journal of Economic Theory, 71: 443–484. (Reprinted as Chapter 20 in this volume.)
Osborne, M. J. and Rubinstein, A. (1994). A course in game theory. Cambridge, MA: The MIT Press.
Pearce, D. (1984). Rationalizable strategic behavior and the problem of perfection, Econometrica, 52: 1029–1050.
Rubinstein, A. (1991). Comments on the interpretation of game theory, Econometrica, 59: 909–924.
Rubinstein, A. and Wolinsky, A. (1994). Rationalizable conjectural equilibrium: between Nash and rationalizability, Games and Economic Behavior, 6(2): 299–311.
22 On the measurement of inequality under uncertainty Elchanan Ben-Porath, Itzhak Gilboa, and David Schmeidler
22.1. Motivation The bulk of the literature on inequality measurement assumes that the income profiles do not involve uncertainty. It is natural to suppose that if uncertainty (or risk) is present, one may use the theory of decision under uncertainty to reduce the inequality problem to the case of certainty, say, by replacing each individual’s income distribution by its expected value or expected utility. Alternatively, it would appear that one may use the theory of inequality measurement to reduce the problem to a single decision-maker’s choice under uncertainty, say, to the choice among distributions over inequality indices. We claim, however, that neither of these reductions would result in a satisfactory approach to the measurement of inequality under uncertainty. Rather, inequality and uncertainty need to be analyzed in tandem. The following example illustrates. Consider a society consisting of two individuals, a and b. They are facing two possible states of the world, s and t. A social policy determines the income of each individual at each state of the world. We further assume that both individuals, hence also a “social planner,” have identical beliefs represented by a probability over the states. Say, the two states are equally likely. Consider the following possible choices (or “social policies”), where each policy lists the income pair (a, b) at each of the states s and t:

f1 = (s: 0, 0; t: 1, 1)    f2 = (s: 1, 1; t: 0, 0)
g1 = (s: 0, 1; t: 1, 0)    g2 = (s: 1, 0; t: 0, 1)
h1 = (s: 0, 1; t: 0, 1)    h2 = (s: 1, 0; t: 1, 0)
“On the Measurement of Inequality under Uncertainty”, by Elchanan Ben-Porath, Itzhak Gilboa, and David Schmeidler (Journal of Economic Theory, 75 (1997): 194–204).
We argue that a reasonable social ordering ≽ would rank these choices from top to bottom: f1 ≈ f2 ≻ g1 ≈ g2 ≻ h1 ≈ h2. Indeed, symmetry between the states and anonymity of individuals imply the equivalence relations. The f choices are preferred to the g choices due to ex-post inequality: in all of these (f and g) alternatives, the expected income of each individual is 0.5, and thus there is no inequality ex-ante. But according to f, both individuals will have the same income at each state of the world, while under g the resulting income profile will have a rich individual and a poor individual.1 One may argue that the f alternatives are riskier than the g ones from a social standpoint, since under f there is a state of the world in which no individual has any income, whereas the g alternatives allow additional transfers in each state of the world, after which both individuals would have a positive income. However, we consider a social planner’s preferences over final allocations. These preferences are the basis on which potential transfers will be made. The comparison between g and h hinges on ex-ante inequality: ex-post, both choices have the same level of inequality at each state of the world. However, the g alternatives promise each individual the same expected income, while the h choices pre-determine which individual will be the rich one and which will be the poor one. Thus g is “more ex-ante egalitarian” than h. Matrices g2 and h2 are identical to those of Diamond (1967).2 We observe that one cannot capture these preferences if one reduces uncertainty to, say, expected utility and measures the inequality of the latter, or vice versa. For instance, suppose that the Gini index is the accepted measure of inequality. In the case of two individuals, and in the absence of uncertainty, the Gini welfare function can be written as G(y1, y2) = (3ỹ1 + ỹ2)/4, where (ỹ1, ỹ2) is a permutation of (y1, y2) such that ỹ1 ≤ ỹ2.
Ranking alternatives by the Gini welfare function of the expected incomes will distinguish between g and h, but not between f and g. On the other hand, selecting the expected Gini index as a choice criterion will serve to distinguish between f and g, but not between g and h. By contrast, a (weighted) average of the expected Gini index and the Gini of the expected income would rank f above g and g above h. We are therefore interested in measures of social welfare under uncertainty that take into account both ex-ante and ex-post inequality and, in particular, that include the above-mentioned functionals. Furthermore, our goal is to characterize a class of measures that is a natural generalization of those commonly used for the measurement of social welfare under certainty. That is, we seek a set of principles that are equally plausible in the contexts of certainty and of uncertainty, that are satisfied by known measures under certainty, and that, under uncertainty, will reflect both ex-ante and ex-post inequality considerations.
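The arithmetic behind this claim can be made explicit. The following sketch (illustrative code) evaluates the policies f1, g1, and h1 by the Gini welfare function of expected incomes, by the expected Gini welfare, and by their average:

```python
# A sketch of the computation behind the ranking f > g > h, using the
# two-person Gini welfare function G(y1, y2) = (3*min + max)/4 and the
# policies f1, g1, h1 with equally likely states s and t.

def gini_welfare(y1, y2):
    lo, hi = sorted((y1, y2))
    return (3 * lo + hi) / 4

def evaluate(policy):
    # policy maps each (equally likely) state to the income pair (a, b).
    incomes = list(policy.values())
    exp_a = sum(y[0] for y in incomes) / len(incomes)
    exp_b = sum(y[1] for y in incomes) / len(incomes)
    gini_of_expected = gini_welfare(exp_a, exp_b)
    expected_gini = sum(gini_welfare(*y) for y in incomes) / len(incomes)
    return gini_of_expected, expected_gini

f1 = {'s': (0, 0), 't': (1, 1)}
g1 = {'s': (0, 1), 't': (1, 0)}
h1 = {'s': (0, 1), 't': (0, 1)}

# Gini of expected income ties f1 and g1, but separates h1; the expected
# Gini separates f1 from g1, but ties g1 and h1.
assert evaluate(f1) == (0.5, 0.5)
assert evaluate(g1) == (0.5, 0.25)
assert evaluate(h1) == (0.25, 0.25)

# The average of the two criteria ranks f1 above g1 above h1.
avg = lambda pair: sum(pair) / 2
assert avg(evaluate(f1)) > avg(evaluate(g1)) > avg(evaluate(h1))
```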
22.2. Inequality and uncertainty The relationship between the measurement of inequality and the evaluation of decisions under uncertainty has long been recognized. Harsanyi’s utilitarian solution (Harsanyi, 1955) corresponds to maximization of expected utility, while
Rawls’ egalitarian solution (Rawls, 1971) is equivalent to maximization of the minimal payoff under uncertainty.3 Further, the previous decade has witnessed several derivations of a class of functionals that generalizes both expected utility (in the context of choice under uncertainty) and the Gini index of inequality (in the context of inequality measurement). The reader is referred to Weymark (1981), Quiggin (1982), Chew (1983), Yaari (1987, 1988), and Schmeidler (1989). Chew also pointed out the relationship between the “rank-dependent probabilities” approach to uncertainty and the generalization of the Gini index. The “rank-dependent probabilities” approach suggests that the probability weight assigned to a state of the world in the evaluation of an uncertain act f depends not only on the state, but also on its relative ranking according to f. Yet, if we restrict our attention to acts that are “comonotonic,” that is, that agree on the payoff-ranking of the states, probabilities play their standard role as in expected utility calculations. We therefore refer to these functionals as “comonotonically linear.” For simplicity, consider the symmetric case. A symmetric comonotonically linear functional is characterized by a probability vector (p1, …, pn), where n is the number of states of the world. Given an uncertain act f that yields a payoff of fj at state of the world j, let f(i) denote the ith lowest payoff in f. Then f is evaluated by the weighted sum

I(f) = Σi pi f(i).
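This weighted sum is straightforward to state in code. The following minimal sketch (illustrative, for the symmetric case) sorts the act's payoffs and weights the ith lowest by pi; uniform weights recover the expectation, and all weight on the lowest payoff recovers the minimum:

```python
# A sketch of a symmetric comonotonically linear functional: sort the act's
# payoffs and weight the i-th lowest payoff by p_i.

def comonotonic_value(weights, act):
    # weights: a probability vector (p_1, ..., p_n); act: payoffs, one per state.
    return sum(p * x for p, x in zip(weights, sorted(act)))

act = [9, 1, 4]

# All weight on the lowest payoff gives the minimum (maxmin evaluation).
assert comonotonic_value([1, 0, 0], act) == min(act)
# Uniform weights give the plain expectation.
assert abs(comonotonic_value([1/3, 1/3, 1/3], act) - sum(act) / 3) < 1e-12
# Decreasing weights overweight bad outcomes, so the value drops below the mean.
assert comonotonic_value([3/6, 2/6, 1/6], act) < sum(act) / 3
```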
In the context of inequality measurement, Weymark (1981), Chew (1983), and Yaari (1988) considered the same type of evaluation functionals, where an element j is interpreted as an individual in a society rather than as a state. In this context, it is natural to impose the symmetry (or “anonymity”) condition, which implies that an individual’s weight in the comonotonically linear aggregation above depends only on her social ranking. A common assumption is that p1 ≥ p2 ≥ · · · ≥ pn. For such a vector p, the functional above represents social preferences according to which a transfer of income from a richer to a poorer individual that preserves the social ranking can only increase social welfare. Special cases of these functionals are the following: (i) if pi = 1/n, we get the average income function; (ii) if pi−1 − pi = pi − pi+1 > 0 (for 1 < i < n), the resulting functional agrees with the Gini index on subspaces of income profiles defined by a certain level of total income (see Ben-Porath and Gilboa, 1994); (iii) if p1 = 1 (and pi = 0 for i > 1), the functional reduces to the minimal income level. While axiomatizations of these special cases do exist in the literature, we prefer to keep the discussion here on the more general level, dealing with all functionals defined by p1 ≥ p2 ≥ · · · ≥ pn as above, and focusing on these cases as examples and reference points. The strong connection between inequality measurement and decision under uncertainty, and, furthermore, the fact that comonotonically linear functionals were independently developed in both fields, may lead one to believe that the problem of
inequality measurement under uncertainty is (mathematically) a special case of the known problems of the measurement of inequality (or uncertainty) aversion. But this is not the case. The rank-dependent approach of Weymark, Quiggin, Yaari, and Chew cannot satisfactorily deal with the preference patterns described in Section 22.1. Specifically, in each of the six alternatives in the f, g, and h matrices there are two 1’s and two 0’s. If we follow the rank-dependent approach, applied to the state–individual matrix, and impose symmetry between the states and between the individuals, we will have to conclude that all six alternatives are equivalent. Indeed, the pattern of preferences between the f’s and the g’s, as well as that between the g’s and the h’s, is mathematically equivalent to Ellsberg’s paradox (see Ellsberg, 1961), and the rank-dependent approach is not general enough to deal with this paradox. We suggest measuring inequality under uncertainty by the class of min-of-means functionals. A min-of-means functional is representable by a set of probability vectors (or matrices, in the case of uncertainty) in the following sense: for every income profile, the functional assigns the minimal expected income, where the expectation is taken separately with respect to each of the probability vectors (matrices). Observe that a comonotonically linear functional defined by (p1, …, pn) with p1 ≥ p2 ≥ · · · ≥ pn can be viewed as a min-of-means functional, for the set of measures which is the convex hull of all permutations of (p1, …, pn). In Section 22.3 we define this class axiomatically, and contend that the axioms are no less acceptable in the presence of uncertainty than under certainty. After quoting a representation theorem, we prove (in Section 22.4) that this class is closed under iterative application, as well as under averaging. It follows that this class includes linear combinations of, say, the expected Gini index and the Gini index of expected income.
We devote Section 22.5 to a brief discussion of the related class of Choquet integrals. Finally, an Appendix contains an explicit calculation of the probability matrices for some examples.
22.3. Min-of-means functionals We start with some general-purpose notation. For a finite set A, let F = FA be the set of real-valued functions on A, and let P = PA be the space of probability vectors on A. For f ∈ F and p ∈ P we use fi and pi with the obvious meaning for i ∈ A. Elements of F and of P will be identified with real-valued vectors or matrices, as the case may be. We also use p · f to denote the inner product Σi∈A pi fi. Specifically, let S be a finite set of states of the world, and let K be a finite set of individuals. (Both S and K are assumed non-empty.) Let A = S × K, so that F = FA denotes the space of income profiles under uncertainty for the society K, where uncertainty is represented by S. Let ≽ denote the binary relation on F reflecting the preference order of “society” or of a “social planner.” Consider the following axioms on ≽. (Here and in the sequel, ≈ and ≻ stand for the symmetric and asymmetric parts of ≽, respectively.)
A1 Weak order: For all f, g, h ∈ F: (i) f ≽ g or g ≽ f; (ii) f ≽ g and g ≽ h imply f ≽ h;
A2 Continuity: f^k → f and f^k ≽ (≼) g imply f ≽ (≼) g;
A3 Monotonicity: fsi ≥ (>) gsi for all (s, i) ∈ A implies f ≽ (≻) g;
A4 Homogeneity: f ≽ g and λ > 0 imply λf ≽ λg;
A5 Shift covariance: f ≽ g implies f + c ≽ g + c for any constant function c ∈ F;
A6 Concavity: f ≈ g and α ∈ (0, 1) imply αf + (1 − α)g ≽ f.
We do not insist that any of these axioms, let alone all of them taken together, is indisputable. However, under certainty (where F is the set of income vectors), they are satisfied by utilitarian preferences, by egalitarian (maxmin) preferences, as well as by any preferences that correspond to a comonotonically linear functional with p1 ≥ p2 ≥ · · · ≥ pn. Moreover, axioms A1–A6 seem to be as reasonable in the case of uncertainty as they are in the case of certainty. In Gilboa and Schmeidler (1989) it is shown that a preference order satisfies A1–A6 iff it can be numerically represented by a functional I : F → R defined by a compact and convex set of measures C ⊆ P as

I(f) = Minp∈C p · f    for all f ∈ F.

Moreover, in this case the set C is the unique compact and convex set of measures satisfying this equality for all f ∈ F. We refer to such a functional I as a “min-of-means” functional: for every function f, its value is the minimum over a set of values, each of which is a weighted average of the values of f.
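Computationally, a min-of-means functional with a finitely generated set of measures is immediate; the following sketch (illustrative code) also checks the observation made in Section 22.2 that the convex hull of all permutations of a decreasing weight vector recovers the comonotonically linear value:

```python
from itertools import permutations

# A sketch of a min-of-means functional: the value of a profile f is the
# smallest expectation of f over a set C of probability vectors (listing the
# extreme points of the convex set C suffices, since the minimum of a linear
# function over a convex hull is attained at an extreme point).

def min_of_means(C, f):
    return min(sum(p_i * f_i for p_i, f_i in zip(p, f)) for p in C)

f = [9, 1, 4]

# A singleton C reduces to an ordinary expectation.
assert min_of_means([[0.2, 0.5, 0.3]], f) == 0.2 * 9 + 0.5 * 1 + 0.3 * 4
# The set of all degenerate (vertex) measures reduces to the minimum.
assert min_of_means([[1, 0, 0], [0, 1, 0], [0, 0, 1]], f) == 1
# Permutations of a decreasing weight vector recover the comonotonically
# linear value: the minimizing permutation puts the largest weight on the
# smallest payoff (cf. the observation in Section 22.2).
C = [list(p) for p in permutations([3/6, 2/6, 1/6])]
assert abs(min_of_means(C, f) - (3/6 * 1 + 2/6 * 4 + 1/6 * 9)) < 1e-12
```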
22.4. Iteration and averaging We now prove that the class of min-of-means functionals is closed under two operations: pointwise averaging over a given space, and iterated application over two spaces. (This, of course, will prove closure under any finite number of iterations.) Let there be given two sets A1 and A2, to be interpreted as the sets of states and of individuals, respectively. Consider the product space A = A1 × A2. Given two min-of-means functionals I1 and I2 (on F1 = FA1 and on F2 = FA2, respectively), we wish to show that applying one of them to the results of the other generates a min-of-means functional on F = FA. We first define iterative application formally. Notation 22.1. For a matrix f ∈ F, define f̃1 ∈ F1 to be the vector of I2-values of the rows of f; that is, for i ∈ A1, (f̃1)i = I2(fi·). Then (I1 ∗ I2)(f) = I1(f̃1). Should I1 ∗ I2 be a min-of-means functional (on F), it would have a set of probability matrices (on A) corresponding to it. In order to specify this set, we need the following notation. First, let Pr = PAr for r = 1, 2. We now define
a “product” operation between a probability vector on A1 and a matrix on A, associating a probability vector on A2 with each element of A1.

Notation 22.2. Let m = (mij)ij be a stochastic matrix, such that mi· ∈ P2 for every i ∈ A1. Then, for p1 ∈ P1, let p1 ∗ m be the probability matrix on A defined by (p1 ∗ m)ij = (p1)i mij.

Notation 22.3. Let Cr ⊆ Pr be given (for r = 1, 2). Let C1 ∗ C2 ⊆ P = PA be defined by C1 ∗ C2 = {p1 ∗ m | p1 ∈ C1, ∀i, mi· ∈ C2}. That is, C1 ∗ C2 denotes the set of all probability matrices for which every conditional probability on A2, given a row in A1, is in C2, and whose marginal on A1 is in C1.

Theorem 22.1. Let there be given A1 and A2 as above, and let I1 and I2 be min-of-means functionals on them, respectively. Let Cr ⊆ Pr be the set of measures corresponding to Ir, r = 1, 2, by the result quoted above. Then I1 ∗ I2 is a min-of-means functional on A. Furthermore, the set C ⊆ P corresponding to I1 ∗ I2 is C ≡ C1 ∗ C2.

Proof. Observe that the set C is compact. We note that it is also convex. Indeed, assume that p1 ∗ m, p′1 ∗ m′ ∈ C and let α ∈ [0, 1]. Define p̄1 = αp1 + (1 − α)p′1 ∈ C1 and

m̄i· = [α(p1)i mi· + (1 − α)(p′1)i m′i·] / [α(p1)i + (1 − α)(p′1)i] ∈ C2
for i ∈ A1 whenever the denominator does not vanish. (The definition of m̄ is immaterial when it does.) It is easily verified that p̄1 ∗ m̄ = α(p1 ∗ m) + (1 − α)(p′1 ∗ m′).

We now turn to show that C is the set of measures corresponding to I1 ∗ I2. We need to show that for every f ∈ F, (I1 ∗ I2)(f) = Min_{p ∈ C} p · f. Let f ∈ F be given. We first show that (I1 ∗ I2)(f) ≥ Min_{p ∈ C} p · f. Let mi· ∈ C2 be a minimizer of Σj mij fij ≡ f̂i. Let p1 ∈ C1 be a minimizer of p1 · f̂
Measurement of inequality under uncertainty
537
(where f̂ is defined in the obvious way). Note that p1 · f̂ = (I1 ∗ I2)(f). Since p1 ∗ m appears in C, the inequality follows.

Next we show that (I1 ∗ I2)(f) ≤ Min_{p ∈ C} p · f. Assume that the minimum on the right-hand side is attained by the measure p1 ∗ m. We claim that, unless (p1)i = 0, mi· is a minimizer, over all p2 ∈ C2, of Σj (p2)j fij. Indeed, were one to minimize p · f by choosing a measure p1 ∗ m, one could choose mi· independently at different states i; hence, for any choice of p1, the minimal product will be obtained for mi· that is a pointwise minimizer. Without loss of generality we may therefore assume that mi· is a minimizer of Σj mij fij ≡ f̂i. Hence p1 has to be a minimizer of p1 · f̂, and the equality has been established. Finally, since I1 ∗ I2 is representable as Min_{p ∈ C} p · f, it is a min-of-means functional.

Under the above conditions, both I1 ∗ I2 and I2 ∗ I1 are min-of-means functionals. However, in general they are not equal, as the examples in Section 22.1 show. Specifically, let A1 = {s, t}, A2 = {a, b}, and define I1 by C1 = {(1/2, 1/2)} and I2 by C2 = {(p, 1 − p) | 0 ≤ p ≤ 1}. That is, I1 is the expectation with respect to a uniform prior, and I2 is the minimum operator. Consider the matrix g1 defined in Section 22.1, and observe that (I1 ∗ I2)(g1) = 0 while (I2 ∗ I1)(g1) = 1/2.

The theorem above states that if a certain inequality index, such as the Gini index or the minimal income index, is representable as a min-of-means functional, so will be that index applied to expected income, and so will be the expected value of this index. However, if we consider a sum (or an average) of these two, we need the following result to guarantee that the resulting functional is also a min-of-means functional.

Proposition 22.1. Let there be given two min-of-means functionals I^1 and I^2 on F = FA. Let α ∈ [0, 1]. Then I = αI^1 + (1 − α)I^2 is a min-of-means functional.
Furthermore, if C^1 and C^2 are the sets of measures corresponding to I^1 and I^2, respectively, then the set C corresponding to I is given by C = {αp^1 + (1 − α)p^2 | p^1 ∈ C^1, p^2 ∈ C^2}.
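Theorem 22.1 and Proposition 22.1 can be checked numerically on the two-state, two-individual example above. The sketch below (Python, for illustration only) assumes a g1 whose entries are consistent with the values reported in the text, (I1 ∗ I2)(g1) = 0 and (I2 ∗ I1)(g1) = 1/2; since p · f is linear in p, the minimum over a polytope of priors is attained at a vertex, so listing extreme points suffices.

```python
def min_of_means(C, x):
    # Min over the priors p in C of the expectation p . x
    # (listing the vertices of C suffices, since p . x is linear in p).
    return min(sum(pi * xi for pi, xi in zip(p, x)) for p in C)

def iterate(C_outer, C_inner, f):
    # (I_outer * I_inner)(f): apply I_inner to each row of f,
    # then I_outer to the resulting vector (Notation 22.1).
    return min_of_means(C_outer, [min_of_means(C_inner, row) for row in f])

C1 = [(0.5, 0.5)]                 # I1: expectation under the uniform prior on {s, t}
C2 = [(1.0, 0.0), (0.0, 1.0)]     # I2: minimum operator (vertices of all priors on {a, b})

# A matrix consistent with the g1 of Section 22.1 (entries assumed for illustration):
g1 = [[1.0, 0.0],
      [0.0, 1.0]]
g1_T = [list(col) for col in zip(*g1)]  # rows indexed by individuals instead of states

print(iterate(C1, C2, g1))    # (I1 * I2)(g1) = 0.0
print(iterate(C2, C1, g1_T))  # (I2 * I1)(g1) = 0.5

# Proposition 22.1: averaging I^1 and I^2 mixes their sets of priors pointwise.
C_mix = [tuple(0.5 * a + 0.5 * b for a, b in zip(p1, p2)) for p1 in C1 for p2 in C2]
x = [3.0, 1.0]
assert min_of_means(C_mix, x) == 0.5 * min_of_means(C1, x) + 0.5 * min_of_means(C2, x)
```

The two print statements reproduce the non-commutativity of iteration discussed above: the order in which the minimum and the expectation are applied matters.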
22.5. A comment on Choquet integration

Schmeidler (1989) suggested using Choquet integration (Choquet, 1953–1954) with respect to non-additive measures for the representation of preferences under uncertainty. For the sake of the present discussion, the reader may think of a Choquet integral as a continuous functional which is linear over each cone of comonotonic income profiles. Specifically, assume that f and g are two matrices, with rows corresponding to states and columns to individuals. Recall that the two are comonotonic if there is an ordering of the state–individual pairs that both f and g agree with, that is, if there are no two pairs (s, a) and (t, b) such that fsa > ftb while gsa < gtb. In this case, the Choquet integral I with respect to any
non-additive measure has to satisfy I(f + g) = I(f) + I(g).

Under certainty, the assumption of symmetry (between individuals) reduces the non-additive integration approach to the rank-dependent one. This is not the case, however, when the relevant space is a product of states and individuals. In a two-dimensional space, symmetry between rows and between columns does not imply that every subset of the matrix is equivalent to any other subset with identical cardinality. Thus, the non-additive approach may explain preferences as in Ellsberg’s paradox, and can, correspondingly, account for preferences as in Section 22.1, without violating the symmetry assumptions. This would give one a reason to hope that this approach is the appropriate generalization of comonotonically linear functionals in the case of certainty. Yet, we find that this linearity property, even when restricted to comonotonic profiles, is hardly plausible in our context. Consider the following four alternatives.

    f3    a   b        f4    a   b
    s     1   0        s     0   1
    t     0   0        t     0   0

    g3    a   b        g4    a   b
    s     2   1        s     1   2
    t     1   0        t     1   0

By symmetry between the two individuals, the first two alternatives are equivalent. Assuming a Choquet-integral representation, this would imply that the last two alternatives are also equivalent. To see this, note that g3 = f3 + h and g4 = f4 + h for h given by

    h     a   b
    s     1   1
    t     1   0
Since f3 and h are comonotonic, I (g3 ) = I (f3 ) + I (h). Similarly, I (g4 ) = I (f4 ) + I (h). From I (f3 ) = I (f4 ) one therefore obtains I (g3 ) = I (g4 ). However, g3 and g4 are equivalent only as far as ex-post inequality considerations are concerned. Ex-ante, g3 makes one individual always better off than the other, while g4 guarantees the two individuals identical expected income. The expected minimal income, as well as the expected Gini index, are the same under g3 and g4 . Yet, the Gini index of the expected income, as well as the minimal expected income, differ. From a mathematical viewpoint, we note that the average of Choquet integrals is a Choquet integral, and therefore the expected Gini and the expected minimum are representable by a Choquet integral over the states–individuals matrix. By contrast, the minimum of Choquet integrals, or the integral (over individuals) of Choquet integrals (over states) need not be a Choquet integral itself. In other words, the family of Choquet integrals fails to be closed under iterative application.
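The failure just described can be reproduced numerically. The sketch below (Python, for illustration only) computes Choquet integrals with respect to ν(A) = (|A|/4)², an arbitrary symmetric convex capacity chosen for this example; any symmetric capacity yields the same indifference between g3 and g4, while the ex-ante criteria separate them.

```python
def choquet(nu, x):
    # Choquet integral of a nonnegative profile x (dict: point -> value)
    # with respect to a capacity nu (a function on frozensets).
    pts = sorted(x, key=x.get, reverse=True)  # points in decreasing order of value
    total = 0.0
    for k in range(1, len(pts) + 1):
        # x_(k) * [nu(top-k set) - nu(top-(k-1) set)]
        total += x[pts[k - 1]] * (nu(frozenset(pts[:k])) - nu(frozenset(pts[:k - 1])))
    return total

nu = lambda A: (len(A) / 4) ** 2   # a symmetric convex capacity (chosen for illustration)

f3 = {('s', 'a'): 1, ('s', 'b'): 0, ('t', 'a'): 0, ('t', 'b'): 0}
f4 = {('s', 'a'): 0, ('s', 'b'): 1, ('t', 'a'): 0, ('t', 'b'): 0}
h  = {('s', 'a'): 1, ('s', 'b'): 1, ('t', 'a'): 1, ('t', 'b'): 0}
g3 = {k: f3[k] + h[k] for k in f3}
g4 = {k: f4[k] + h[k] for k in f4}

# Comonotonic additivity forces indifference between g3 and g4:
assert choquet(nu, g3) == choquet(nu, f3) + choquet(nu, h)
assert choquet(nu, g4) == choquet(nu, f4) + choquet(nu, h)
assert choquet(nu, g3) == choquet(nu, g4)

# Yet ex-ante criteria (under a uniform prior on states) separate them:
exp_min = lambda g: 0.5 * min(g['s', 'a'], g['s', 'b']) + 0.5 * min(g['t', 'a'], g['t', 'b'])
min_exp = lambda g: min(0.5 * (g['s', 'a'] + g['t', 'a']), 0.5 * (g['s', 'b'] + g['t', 'b']))
assert exp_min(g3) == exp_min(g4) == 0.5          # expected minimal income: identical
assert min_exp(g3) == 0.5 and min_exp(g4) == 1.0  # minimal expected income: differs
```

The Choquet integral, being additive on the comonotonic pairs (f3, h) and (f4, h), cannot register the ex-ante difference that the minimal expected income detects.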
Appendix

Suppose that an inequality measure (under certainty) is represented by a min-of-means functional IK on FK that corresponds to a set of measures CK ⊆ PK. Assume that p ∈ PS is an objective probability measure on the state space S. Then the functional

I(f) = αEs[IK(fs·)] + (1 − α)IK(Es[fsi]),  ∀f ∈ FS×K

is a min-of-means functional, and the corresponding set of measures is

{ q ∈ PS×K | ∃r^0, r^1, . . . , r^|S| ∈ CK s.t. qsi = αps r^s_i + (1 − α)ps r^0_i ∀s, i }.

As an example, consider the case of extreme egalitarianism. That is, on the set of individuals K we adopt the set of all probability measures: CK = PK. Assume that p ∈ PS is an objective probability measure as given above. Then the functional

I(f) = (1/2) Es[min_i fsi] + (1/2) min_i Es[fsi],  ∀f ∈ FS×K

is a min-of-means functional, and its corresponding set of measures (on S × K) is

{ q ∈ PS×K | (i) Σi qsi = ps ∀s;  (ii) ∃r^0 ∈ PK s.t. qsi − ps r^0_i / 2 ≥ 0 ∀s, i }.
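For the extreme egalitarian case, the min-of-means representation can be verified directly: since q · f is linear in each r separately, it suffices to minimize over the unit vectors of PK. A minimal Python sketch, assuming a uniform objective prior p on two states for concreteness:

```python
from itertools import product

S, K = 2, 2
p = [0.5, 0.5]   # objective prior on the states (uniform, assumed for illustration)

def I_direct(f):
    # I(f) = (1/2) E_s[min_i f_si] + (1/2) min_i E_s[f_si]
    exp_min = sum(p[s] * min(f[s]) for s in range(S))
    min_exp = min(sum(p[s] * f[s][i] for s in range(S)) for i in range(K))
    return 0.5 * exp_min + 0.5 * min_exp

def I_min_of_means(f):
    # Minimize q . f over q_si = p_s (r^s_i + r^0_i) / 2 with r^0, r^1, ..., r^|S| in P_K;
    # by linearity in each r, the unit vectors of P_K suffice.
    units = [[1.0 if i == j else 0.0 for i in range(K)] for j in range(K)]
    return min(
        sum(p[s] * 0.5 * (rs[s][i] + r0[i]) * f[s][i]
            for s in range(S) for i in range(K))
        for r0 in units for rs in product(units, repeat=S)
    )

f = [[2.0, 1.0], [1.0, 0.0]]   # the profile g3 of Section 22.5
assert abs(I_direct(f) - I_min_of_means(f)) < 1e-12   # both equal 0.5 here
```

The enumeration over unit vectors is exactly the vertex set of the measure set displayed above, so agreement of the two computations illustrates the representation.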
Acknowledgment

We thank two anonymous referees and the associate editor for their comments.
Notes

1 Myerson (1981) also draws the distinction between ex-ante and ex-post inequality, and argues that a social planner’s concern for ex-post inequality might lead to a choice which is not ex-ante Pareto optimal.
2 One can justify the preference of g over h on grounds of “procedural justice” (Karni and Schmeidler, 1995): it is more just that Nature (i.e., a lottery) would choose which of a and b will be the rich individual, rather than that the choice be made by a person (or persons) acting on behalf of “society.”
3 Indeed, Rawls’ conceptual derivation of his criterion resorts to the “veil of ignorance,” that is, to the reduction of social choice problems to decision under uncertainty. (See also Harsanyi, 1953.)
References

Ben-Porath, E. and I. Gilboa (1994) Linear measures, the Gini index and the income–equality tradeoff, J. Econ. Theory 64, 443–467.
Chew, S. H. (1983) A generalization of the quasilinear mean with applications to measurement of income inequality and decision theory resolving the Allais paradox, Econometrica 51, 1065–1092.
Choquet, G. (1953–1954) Theory of capacities, Ann. Inst. Fourier 5, 131–295.
Diamond, P. A. (1967) Cardinal welfare, individualistic ethics, and interpersonal comparison of utility: Comment, J. Polit. Econ. 75, 765–766.
Ellsberg, D. (1961) Risk, ambiguity and the Savage axioms, Quart. J. Econ. 75, 643–669.
Gilboa, I. and D. Schmeidler (1989) Maxmin expected utility with non-unique prior, J. Math. Econ. 18, 141–153. (Reprinted as Chapter 6 in this volume.)
Harsanyi, J. C. (1953) Cardinal utility in welfare economics and in the theory of risk-taking, J. Polit. Econ. 61, 434–435.
—— (1955) Cardinal welfare, individualistic ethics and interpersonal comparisons of utility, J. Polit. Econ. 63, 309–321.
Karni, E. and D. Schmeidler (1995) Justice and common knowledge, mimeo.
Myerson, R. (1981) Utilitarianism, egalitarianism, and the timing effect in social choice problems, Econometrica 49, 883–897.
Quiggin, J. (1982) A theory of anticipated utility, J. Econ. Behav. and Organ. 3, 323–343.
Rawls, J. (1971) “A Theory of Justice,” Harvard Univ. Press, Cambridge, MA.
Schmeidler, D. (1989) Subjective probability and expected utility without additivity, Econometrica 57, 571–587. (Reprinted as Chapter 5 in this volume.)
Weymark, J. (1981) Generalized Gini inequality indices, Math. Soc. Sci. 1, 409–430.
Yaari, M. (1987) The dual theory of choice under risk, Econometrica 55, 95–115.
—— (1988) A controversial proposal concerning inequality measurement, J. Econ. Theory 44, 381–397.
Index
acts: comonotonic 29, 111, 307; constant 211; as decision alternative 137; as horse lotteries 127; measurable 139; “objective mixtures” of 214; set of 23; step 139 additive games (charge games) 47 additive (subjective) probabilities: on idiosyncratic states 347; in physics 108 additivity for unambiguous events 138 agents: aversion to Knightian uncertainty 453; beliefs, assumptions about 429; consumption across states 338; with finite databases 365; heterogeneous 452; ranking of states 337 Agnew, C. E. 127, 156 Alaoglu Theorem 49, 264 Σ as σ-algebra 50, 53–4, 73, 474 Aliprantis, C. D. 51–6, 58–9, 76, 79, 91–2, 101 Allais, M. 3, 21; Paradox of 3, 5, 176, 206, 213 ambiguity: affinity for 185; and ambiguity attitude, defining 36–44; and ambiguity aversion, related literature 214–15; in beliefs, updating 155–68; definition of 40–3; in events 138; as fuzzy perception of likelihood 332; made precise 209–41; in probabilities 174 ambiguity aversion 209, 212; absolute, characterization 223–4; behavioral definition of 36; capacities and Choquet integrals 234–5; cardinal symmetry and biseparable preferences 235–8; characterizations comparative and absolute 222–6; comparative and absolute 221–2; comparative, and equality of cardinal risk attitude 230; comparative foundations 37–40; definition for class of preference models
209; definition of unambiguous acts 227; definitions 37, 219–22; equality of utilities 236; filtering cardinal risk attitude 219; with heterogeneous agents, static model 285; probabilistic risk aversion 232–4; proposition on ambiguity averse CEU preference relation 229; “pure” 40; set-up and preliminaries 215–19; theorem on absolute ambiguity aversion 223–4; theorems on biseparable preferences 223, 225; theorem proofs 238–41; theory 309, 354; unambiguous acts, using in comparative ranking 230 Anderlini, L. 303 Anderson, E. W. 286, 364, 382, 411 animal spirits 429 Anscombe, F. J. 17, 26–7, 109, 121, 126, 138–9, 144–6, 168, 194, 210, 214, 218, 223, 520; models of 126; see also Anscombe–Aumann Anscombe–Aumann model 31, 43, 109, 160, 195; framework 210, 214, 218, 223; horse-race/roulette-wheel acts 171; standard frameworks for modeling uncertainty 246; Theorem 113–14; or “two-stage” model 147, 244–5, 255, 258 ARMA model for consumption growth 414 Arrow, K. J. 26, 44, 210, 284, 420 Arrow securities 340 artificial intelligence, theory of 156 asset price: and aggregate consumption data for the United States 450; determination 438 asset pricing 370; models 365 asset pricing under Knightian uncertainty, intertemporal 429–68; aggregation in heterogenous agent economy 464–6;
asset pricing under Knightian – cont. belief function kernels 451; equilibrium asset pricing 441–54; integral equilibrium examples 448; lemma on continuity 462; lemma on existence of ε-optimal continuous policies 461; lemma from Michael 458; Lucas model, nondifferentiable 451; Maximum Theorem 462; proposition on approximation of utility 457; proposition on continuity of utility 457; proposition on existence of utility 456; remarks on empirical content 454–6; theorem on existence of utility proof 456–8; theorem on structure of set of equilibria proofs 458–64; theorems on existence and characterization of equilibria 445, 447; utility 431–4, 436 asset trading 343 asymmetric uncertainty between transacting parties 354 Atkinson–Kolm–Sen, traditional welfare function 297 attitudes toward uncertainty and risk, degree of separation between 179 Aubin, J. P. 468 auctions, analysis of behavior in 295 Aumann, R. J. 17, 26–7, 48, 65, 109, 121, 138–9, 144–6, 168, 194, 210, 214, 218, 223, 263, 495, 505–8, 520; models of 126; theorem of 15; see also Anscombe–Aumann aversion to risk and uncertainty 175–9 axiomatizations: for decision under uncertainty 24; and development of new models or concepts 20–1; of expected utility 21; general purpose of 20–2; and their structural assumptions 26 axioms: bisymmetry 28, 31; P4 25; three different classes 22 Baire category theorem 446 Baker, G. 305 Balcar, B. 267 Banach: lattice 75; space, dual 104; spaces 48, 75, 130, 263, 278, 474 Bansal, R. 403–4, 414 Baratta, P. 13 Barsky, R. B. 430–1 Bartle, R. G. 53 Bassanezi, R. C. 67 Bayes, T. 3 Bayes’/Bayesian: Beliefs Equilibrium 490–1, 501, 507, 519; decision problems
386–7, 414; expected utility maximizers 472; implementation theory 296; law of 3; learning generalized 158; rule 163; statistical techniques 108; tenets of 43; update rules 162–3; updating, violations of 3 Bayesian approach: to decision making under uncertainty 155; for one-shot decision problem 156 Bayesian model 430; of decision-making 429; extensions of the 431; prior in 466 Bayesianism: assumptions 17; first tenet of 4–5, 16 Becker, J. L. 146 behavioral consequences of uncertainty aversion 173 behavioral definition of ambiguity for events and acts 227 belief functions 14–15, 84; kernels 455; and their updating 157 Beliefs Equilibrium 501, 507, 515, 519; with Agreement 506 beliefs, sharing 472–81; lemma on convex and compact sets 478; mutual knowledge of 508; proof of theorem Pareto optimal allocations 476–8; standard two-period economy 474–5; theorem on Pareto optimal allocations theorem 475, 478–8 Bellman equations 383–4, 387 Ben Porath, E. 103; 257, 297–8, 531, 533 Bernardo, J. M. 146 Bernheim, B. D. 295, 321 Bernheim and Pearce’s notion of “rationalizability” 527 Bernoulli, D. 419 Bernstein polynomials, 85, 94–6 Bertsekas, D. P. 461 Bessanezi, R. C. 103 bets: on event 242; on events, ranking of 210; willingness as complementadditive 227 Bewley, T. F. 15–16, 18, 29, 41, 127, 156, 287, 341, 420, 431, 476; model 158, 421; work on Knightian decision theory 472 Billingsley, P. 177, 271 Billot, A. 287, 472 Binmore, K. 529 biseparable preferences 43, 210, 212, 216, 222, 236; ambiguity averse 231; examples of 216–18; theorems 223–5 biseparability 235 Biswas, A. K. 81, 104
Blackwell, D. 26, 386; sufficient condition for contraction mapping 456 bond/s: Chernoff 400; high yield corporate (“junk” bonds) 354; issued in “emerging markets” 354; market, US corporate 361 Border, K. C. 51–6, 58–9, 76, 79, 91–2, 101, 104 Borel: measurable functions 413; measures 275; probability measures 432; sets 143 Borel σ-algebra 143, 261, 268, 432, 436; of compact Hausdorff space 274 Boros, E. 94–5 bounded game 47, 50 bounded rationality 303 Brandenburger, A. 494–5, 505–8, 520 Bray, M. 413 Breeden, D. T. 389, 413 Brock, W. A. 388–9 Brownian motion 372, 375, 378, 383–4, 386, 390; with constant drift 397; risk prices 389; shocks 407–8 Burks, A. W. 24 business: cycle models, analysis of 406; firm 303 Camerer, C. F. 309, 430, 483, 527 Cameron, H. R. 397–8 Campbell, J. Y. 414 canonical utility index 216, 221, 223 capacity: additive 144; notion of 9; see also nonadditive probability cardinal risk attitude 212 cardinal symmetry 212, 220, 223 Carlier, G. 103 Carlton, D. W. 289, 305 Carroll, C. 361 Cartesian product state space 194 Casadesus-Masanell, R. 17, 26, 31, 37, 217, 245, 248, 258 Case-Based Decision Theory of Gilboa and Schmeidler 299 certainty: equivalence 110; independence 127 CEU see Choquet expected utility Chain Rule for eventwise differentiability 203 Chateauneuf, A. 30–1, 84, 96, 157, 214, 287, 337, 341, 472 Chen, Z. 285, 382 Chernoff, H. 393, 395–6, 415; bonds 400; entropy 396–8, 401, 403, 406, 409, 414;
entropy rate 400; large deviation bounds 394; measure 413 Chew, S. H. 14, 26, 28–9, 122, 172, 182, 190, 245, 249, 259, 467, 533–4 Cho, I.-K. 413 Choquet expectation 291, 323; of act 307; operator 308, 311, 332, 361 Choquet expected utility (CEU) 21, 26, 29, 136, 138, 140–1, 344, 482; with convex capacities, cognitive interpretation of 9–10; framework as model of ambiguity aversion 361; maximizers 337; model with convex capacities 473, 476; model of Gilboa and Schmeidler 218; model of Schmeidler 36, 63, 218, 284; one-stage and two-stage approaches, nonequivalence 146–8; orderings 217; properties of utility and capacities under 30; restrictiveness 245; uncertainty aversion, and randomizing devices 253–5 Choquet expected utility (CEU) model, formal details relating to 354; independent product for capacities 353–7; law of large numbers for capacities 357 Choquet expected utility (CEU) preferences 40–1, 43, 222, 228, 245, 290, 336, 361; with convex capacities in product state space model 256; with convex capacity restrictive class in Savage-like setting 255; with a convex capacity, restrictiveness 253; theorem 253 Choquet expected utility (CEU) theory 9, 16, 180, 210; with convex capacities, cognitive interpretation of 9–10; development of theory, related literature 13; maximizer 10; nonadditive probabilities 6–9; Schmeidler 171, 182; utility function 189–90 Choquet functionals 60, 62–7, 73, 76–8, 81, 93, 103–4; representation 69–73; representation lemma 71; representation theorem 69–70 Choquet, G. 8, 73, 84, 103, 117, 125, 153, 155, 174, 217, 235, 261, 273–4, 426, 540; decision rule 309; method of evaluation of act 323; Representation Theory 269 Choquet integral 8, 10, 58–69, 77–8, 90–3, 121, 126, 141, 146, 155, 210, 235, 245, 247–8, 264, 274–5, 277, 298, 534; basic properties 64; definition for general
Choquet integral – cont. functions 9; evaluation 293; of every real-valued function 9; ∫ f dν definition 61; general functions 60–4; Jensen inequality 68; lemma on comonotonic functions 66; Lipschitz continuity 64; monotonicity 64; for positive functions 58–60, 103; positive homogeneity 64; proposition on basic properties 64–5; proposition with comonotonic additivity 67; proposition with Jensen inequality 68; proposition with simple functions 63; proposition with unique invariant extension 60; propositions defining 58–9; representation 538; theorem with comonotonic functions 66; translation invariance 64; variants of 299 Choquet integration 14, 125, 181, 466, 537–8; definition of 206; formula 464; proposition on min-of-means functionals 537 Choquet subjective expected utility (CSEU) 275 Chow, C. K. 400 classification errors in models 396 coalition games, representation of 263; corollary on monotone set function 277; decomposition 265–7, 273; decomposition and representation of 261–78; definition of filter games 267; Dempster–Shafer–Shapley Representation Theorem 263; dual spaces 278; filter games, proposition 267–8; of finitely additive representation of games in Vb 275; integral representation 274–5; isometric isomorphism T 276; Jordan Decomposition Theorem for measures 265–7; locally convex topological vector space on V 264–5; representation of countably additive representation of games in Vb 267–74; theorem on isometric isomorphism 268–9 coalitions in cooperative game 332 Coase, R. H. 303 Cochrane, J. H. 402, 414, 430, 448, 454 cognitive ease 5 cognitive unease 6 coins, example of two 11, 36 collection of unambiguous events 177; see also λ-systems commutativity 161 comonotonic additivity 235 comonotonic independence 112, 128
comonotonicity 68 compact Hausdorff space 271, 274 complete information normal form games 290 complete-markets aggregation theorem 452 completeness 22 composition norm 88 concavity 28; of vNM index 176 conditional relative entropy 379 confidence sets and hypothesis testing, techniques of 157 Constantinides, G. M. 414, 452 consumer behavior under certainty, theory of 420 consumption: model based on 450; processes 436; and savings behavior 438 ε-contamination 448, 452; model of beliefs 455 contingent contracts 320 contingent deliveries: incompleteness of markets for 341; rule 311 contingent payoffs 336; comonotonic 316; noncomonotonic 316 contingent states, relevant details 308 contingent surplus 310 continuation values 365 continuity 29, 112, 127; absolute 378 continuous-time: diffusion specification 414; Markov formulations of model specification, robust decision making, pricing, and statistical model detection 370; Markov models 366, 412 contracts with low powered incentives 289 contractual arrangements, ambiguity aversion 283 contractual relationship, “efficient” 320 convex games (supermodular games) 47, 73–82; and their Choquet integrals 103; convex analysis tools in studying; corollary, conditions in 76; lemma on Choquet functionals 76; lemma on finite game 81; proposition, properties of 73; theorem on bounded game 80; theorem on condition equivalence 73; theorem on condition equivalence in bounded game 78 convex nonadditive probability 158, 312–13, 339; function 306, 308 convexity: as defined in cooperative game theory 9; identification with uncertainty aversion 194; of nonadditive measures 9 Cooperative Game Theory 235 core: of bounded game, weak compact 49;
of convex games 103; weak*-compact 54; corporate borrowing in US 361 countable additivity 51–3; games called measures 47 Cournot, A. A. 523 Cowles foundation 18 Cox, J. C. 416 Cragg, J. 431, 455 Crawford, V. 519 Csiszar, I. 379 cumulative dominance 137, 141–2, 145 cumulative prospect theory (CPT) 14 Dana, R. A. 103, 473 Deaton, A. 361 decision: analysis, early literature 27; in risk situation 110; models with ambiguity averse preferences 210; process, ex post Bayesian justification for 387; under risk 23–5; under uncertainty, general conditions for 23–5 decision maker: ambiguity averse 37, 39, 283; approximating model 378; concern about “model uncertainty” 286; Knightian uncertainty, response to 366; rational expectations model 364 decision maker’s beliefs 210; convexity of capacity representing 214; represented by multiple probabilities 210; stochastically independent 487 decision theory: applications of capacities on topological domains 103; models developed by Schmeidler 286; single-person 494; under risk 297; under uncertainty 16, 148 de Finetti, B. 3, 7, 12, 26, 31, 109, 140, 210 Dekel, E. 29, 519 Delbaen, F. 54, 77, 81 Dellacherie, C. 66, 103, 426, 466 DeLong, J. B. 430–1 Dempster, A. P. 14, 84–5, 157, 426, 435 Dempster–Shafer theory 15; for belief functions 163; for probabilities 162; update rule 157–8, 167 Dempster–Shafer–Shapley Representation Theorem 262, 269; for finite algebras 274; for finite games 261 Deneffe, D. 236 Denneberg, D. 66, 91, 103 detection 377; error probability 401; error probability bounds 402 deterministic outcomes 127 De Waegenaere, A. 77, 298
Diamond, P. A. 532 differentiability and uncertainty aversion, examples 189–91 Dini Theorem 53 Dirac: charge 97; measures 274–5 discount-factor model 453 diversification opportunities 289 Dow, J. P. 158, 259, 284–5, 291–2, 294, 309–10, 337–8, 340, 351, 416, 419, 421, 445, 473, 483, 493, 512–14, 516, 520, 529; notion of equilibrium 296 Dow and Werlang model, equilibrium beliefs 293 dual games with “dual” properties 48 dual spaces, proposition 278 Dubois, D. 299 Dubra, J. 29 Duffie, D. 381, 414, 437, 452 Dugundji, J. 464 Dunford, N. 47, 49, 53–5, 58, 75, 79, 104, 124, 128–9, 131–2, 272, 276, 474 dynamic consistency of choices 284 Dynkin, E. B. 413 East Asia 354 Eastern Europe 354 Eberlein–Smulian Theorem 54 economic effects of ambiguity aversion 288 economic significance of nondifferentiability 467 economy: without idiosyncracy 344; limit replica 337; markets, complete and incomplete 337–40; models 286–7; see also n-financial asset economy Edgeworth box 285, 341, 453 Edwards, W. 13, 22 Eichberger, J. 195, 244, 247, 255, 258, 296, 310 Einy, E. 81 Ellsberg, D. 4, 17, 36, 108, 110, 135, 137, 155, 176, 206, 210, 309, 420, 424–5, 430, 534; options 137; experiments 11, 195, 209, 212–14, 228–30, 257, 309, 483; mind experiment, challenging expected utility hypotheses 125; see also Ellsberg Paradox Ellsberg Paradox 4–6, 14, 113, 145, 430, 484, 495, 526, 538; example of urn with 90 balls 137, 157; one-stage formulation 147; “two-color” problem 244; twostage formulation 147; two-urn experiment 5; “unknown urn” example 242; urn experiments 174, 178, 185, 194, 207
Ellsberg-type behavior 466 entropy 381; penalties 378; penalty problem 380; relative, models with 367 Epstein, L. G. 29, 31, 37, 39, 40–1, 103, 173, 210–11, 214–15, 229, 234, 258, 284–5, 300, 309, 341, 351–2, 361, 366, 381–2, 414, 429, 437–9, 458, 466–7, 520; definition of ambiguity aversion 39 equilibrium: in ε-ambiguous beliefs 520; arbitrage price theory (APT) 352; model with multiple-prior agents 284; as price process 442; prices, characterization by an “Euler inequality” 429; pricing of securities with payoff dates in the future 365; profile 291; risk-sharing 349; in strategy profiles 522–4 equilibrium asset pricing 441–54; belief function kernels 451; economy 441; Euler inequalities 442–4; examples 448–54; heterogenous agents 452–3; nondifferentiable Lucas model 451–2; theorem on existence and characterization of 445–6, 447–8; theorem on structure set of 446–7 equilibrium in beliefs under uncertainty 483–518; Bayesian players concepts for 488–90; belief equilibrium containing Bayesian beliefs equilibrium 499; Beliefs Equilibrium (with Agreement) 496; correlated rationalizable strategies 497; general preferences 514–15; knowledge of rationality 496; marginal beliefs disagreement 490; multiple priors model 485–7; Nash Equilibrium, “beliefs” interpretation of 489; normal form games 487–8; proposition on Bayesian Beliefs Equilibrium 507; proposition on Bayesian Beliefs Equilibrium proof 516; proposition on Beliefs Equilibrium with Agreement 510; proposition on Beliefs Equilibrium with Agreement proof 517; proposition on Nash Equilibrium Under Uncertainty 513; proposition on Nash Equilibrium Under Uncertainty proof 518; proposition on Proper Beliefs Equilibrium 507; proposition on Weak Beliefs Equilibrium 511; proposition Weak Beliefs Equilibrium proof 518; rationalizable beliefs 497; single-person decision making 498; stochastically dependent beliefs 490; uncertainty averse players, concepts for 491–7;
uncertainty aversion, importance of 497–504 equilibrium in beliefs under uncertainty, decision theoretic foundation for Bayesian solution concepts 505–9; proposition on Beliefs Equilibrium 507; proposition on Beliefs Equilibrium with Agreement 507–8, 510; Proposition on Weak Beliefs Equilibrium 511 equilibrium in beliefs under uncertainty, definitions 489; Bayesian Beliefs Equilibrium 489; Beliefs Equilibrium 491–2; Nash Equilibrium 489, 513; Strict Beliefs Equilibrium 492; Weak Beliefs Equilibrium 493 equilibrium in beliefs under uncertainty, related literature 509–14; definition of Nash Equilibrium 489, 513; Dow and Werlang 511–12; epistemic conditions for equilibrium 512–14; proposition on Nash Equilibrium Under Uncertainty 513; proposition on Nash Equilibrium Under Uncertainty proof 518 equilibrium in beliefs under uncertainty, with uncertainty aversion 510; beneficial when players agree 501; importance of 497–504; need for 504–5; and rationalizable beliefs 510 equilibrium concepts for uncertainty averse players 491–7; definition of Beliefs Equilibrium 491–2; definition of Strict Beliefs Equilibrium 492–3; definition of Weak Beliefs Equilibrium 493; knowledge of beliefs 495; knowledge of rationality 496; mixed strategies as objective randomization vs subjective beliefs 493–5; nonunique best responses 504; nonunique equilibria 504; relationship with maximin strategy and rationalizability 496 equity premium puzzle 287, 368, 414, 468 equivalence relation 23 Ethier, S. N. 382, 412 EU see expected utility Euclidean space 87 Euler: equalities 285; equations 442, 444, 450, 454; inequalities 285, 443–4, 453, 458 Evans, G. W. 412 events 139; commutativity 28; evidence as non-negative number attached to 14; fineness of the unambiguous 142; types 136; unambiguous 40 eventwise differentiability 39
eventwise differentiability of utility 172; definition 188; technical aspects of 189, 202–5 expectation 122 expected gains 312, 422 expected Gini index 534 expected utility (EU) 22, 172; maximization relative to probabilistic beliefs 3; violations of 4 expected utility model 123; of concave Bergson-Samuelson social welfare function 123 Expected Utility Theory (EUT) 419; by Anscombe and Aumann 9; classical 38; subjective, violations of independence axiom/sure thing principle 256 expected utility under nonadditive probability measure: maximizing 420; model of 419; value computed 422, 426–7 expected value: of contracted payoffs 289; function 175 extended generator 374 factor risk prices 368–9 Fagin, R. 156, 158 fair prizes 26 Fan, K. 51, 521; Theorem 443, 458 Federal Reserve System in Minnesota 528 Feller: process 374, 378; property, strict 444–5, 448; semigroups 370–2, 382 Felli, L. 303 Feynman, R. P. 108 filter games 267; corollary 273; definition 267; proposition 267 financial markets: ambiguity aversion 283; outcomes 284–7 financial uncertainty, idiosyncratic 342 Fine, T.-L. 47, 353, 357, 361, 430 finite algebras 262 finite convex games 100–3; properties of 100; theorem on marginal worth charges 100; theorem on vertex of core 101 finite games 77, 81–103; additive representation 90–3; decomposition 89–90; lemma on decomposition 89; lemma on lattice preserving isomorphism 87; lemma on Owen correspondence 97; lemma on polynomial counterpart of total monotonicity 98; lemma on Riesz space with lattice operations 86; polynomial representation 94–9; proposition on core of 101; space of 82–9; theorem on
decomposition 90; theorem on lattice preserving and isometric isomorphism 88, 91, 99; theorem on mononomials 95; theorem on totally monotone games 85; theorem on unanimity games 83 finite time series data record 365 firms: profits of 343; as risk neutral 289, 321 Fishburn, P. C. 22, 29, 31, 40, 109, 113, 121, 128, 138, 40, 145–6, 151, 155, 160, 215, 332, 361 Fisher Body and GM merger between 320 Fleming, W. H. 383 framing effects, documentation of 3 Franek, F. 267 Frankel, J. A. 431, 455 Frechet differentiability 172 Friedman, M. 124 Froot, K. 431, 455 Fubini Theorem 59, 247 Fudenberg, D. 295, 412, 497, 519, 529 full-insurance allocation 477 functional analytic tools in study of convex games 77 functions, comonotonic 66–7 Fundamental Theorem of Calculus 204–5 Gajdos, T. 298 game theory: ambiguity aversion 283; interpreting supermodularity in terms of marginal values 73; resolution of dynamic inconsistency 439 game tree 524; model 522 games: balanced 50; dynamic, building blocks in analysis of 522–3; extendable 81; monotone 262; non-atomic 57; representation of locally convex topological vector space 264–5; in strategic form 522; totally balanced 50; two-player zero-sum Markov multiplier 383 Gärdenfors, P. 146 Gardner, R. J. 272 Gâteaux 439, 444; derivatives 439, 443; differentiability of functions 172, 189, 202, 440, 467; nondifferentiable 442 Gaussian control problem 366 Genest, C. 127, 156 Ghirardato and Katz, MEU framework to analysis of voting behavior 296 Ghirardato, P. 17, 26, 28, 31, 36–8, 41–2, 103, 209–10, 215–17, 221, 227–8, 231, 234, 244, 247, 255–8, 290, 309, 353, 361
Gilboa, I. 12, 15, 17, 22, 26, 30, 43, 85, 88–9, 91, 121–2, 125–6, 136, 143, 146, 148, 150–1, 155, 160–1, 172, 181, 195, 210, 243, 256, 261–4, 266, 274–5, 291, 299, 340, 353, 416, 419–20, 423, 426, 431, 438–9, 466, 472–3, 476, 481, 485, 487, 514, 519, 531, 533, 535; adaptation of Savage’s P7 to case of CEU 142 Gilboa and Schmeidler 440; axiomatization based on behavioral data 12; multiple-prior model 284; representation 262; Theorem E of 277; unification of 138; utility 468 Gilboa–Schmeidler model 494, 519; closed and convex set of probability measures 483–4 Gini index 532, 537; of expectation 298; of expected income 297, 534, 538; of inequality 533; on subspaces of income profiles 533; welfare function 532 Giovannoni, F. 40, 43, 214; theory of ambiguity aversion 43 Girshick, M. A. 26, 386 Goes, E. 529 government/defense procurement contracts 321 Grabisch, M. 84, 94, 299 Greco, G. H. 67, 103–4 Green, E. 528 Greenberg, J. 294, 519, 522, 528–9 Grodal, B. 27 Grossman, S. J. 303 Grossman–Hart separation of CEU/MEU preferences 290 Guesnerie, R. 467 Gul, F. 26, 28, 246, 248–9 Hahn–Banach Theorem 75, 80–2 Halmos, P. R. 208 Halpern, J. Y. 156, 158 Hammer, P. L. 94–5 Hammond, P. J. 24 Hansen, L. P. 210, 286, 299–300, 364–5, 371–2, 375, 381, 387–8, 392–3, 402–3, 412–14, 448, 450 Hansen, Sargent, and Tallarini 365–7, 370, 392, 406–7, 409, 412–14; discrete-time model 406; equilibrium permanent income model 402; robust permanent income model 406; see also Hansen; Sargent; Tallarini Hansen, Sargent, Turmuhambetova, and Williams 366, 378, 383, 386, 392; see
also Hansen; Sargent; Turmuhambetova; Williams Hansen, Sargent, and Wang 365–7, 370, 401, 412–14; see also Hansen; Sargent; Wang Harsanyi, J. C. 532; Bayesian Equilibrium for games of incomplete information with Bayesian players 519; utilitarian solution 532 Hart, O. D. 303 Hausdorff: compact space 271, 274; topological vector space 264 Hayashi, F. 361 Hazen, G. B. 29 Heaton, J. C. 414 Hellman, M. E. 395, 400 Hendon, E. 256–7, 353, 529 Henry, H. 286 Holmes, R. B. 278 home-bias puzzle 285 Honkapohja, S. 412 horse lotteries 109, 111, 126–7, 472; acts 146, 182, 194; complicated probabilities 121 Huber, P. J. 103, 126 Hurwicz, L. 44, 125, 515; α-criterion 12, 31 hypercube [0, 1]n 94, 97 hypothesis testing 158 Ichiishi, T. 100 idiosyncratic risk 352–3 idiosyncratic shocks 361 incomplete contracts 210, 314; ambiguous beliefs, investment holdup and 310–20; assumptions 310; literature 320; null 316 incomplete information games 290 incomplete market economy 344; with sub-optimal risk-sharing 349 incompleteness of contractual form: ambiguity aversion and 303–33; condition on informativeness 312, 315; corollaries on first best action 314; corollary on tuple of informed and ambiguous beliefs 320; informed and ambiguous beliefs 320; lemma on informed convex nonadditive probability 313; lemma on informed convex nonadditive probability proof 323; lemma on nonadditive probability 318; lemma on nonadditive probability proof 330; model of decision-making by ambiguity-averse agents 306–10;
propositions on first best action profile 313, 315, 319; proposition on first best action profile proofs 326, 330 incompleteness of financial markets: ambiguity aversion and 336–62; CEU model, formal details relating to 354–7; Choquet expected utility and related literature 339–41; lemma on equilibrium of n-financial assets economy with idiosyncracy 344; model and main result 342–53; theorems on n-financial assets economy with idiosyncracy 345–53; theorems on n-financial assets economy with idiosyncracy, proofs of 357–60 independence 112; axiom (sure-thing principle) 126, 420; concept in case of non-unique prior 126; monotonicity 112; for nonadditive beliefs, notion of 353; nondegeneracy 112; strict monotonicity 112 indexed debt 286 inequality (or inequity): aversion 123; and uncertainty 532–4 inequality measurement 297, 533; evaluation functionals 533; theory of 297; under uncertainty 531 inertia property 300 infinite convex games 103 infinitesimal generator 370 information criteria (IC) 364 integral for capacities, notion of 103 integration: for non-negative functions, definition of 117; of real-valued functions 8 intertemporal asset pricing, model of 285; see also asset pricing intertemporal choice and multi-attribute choice 298 intertemporal utility 431–41; belief function kernels 434; environment and beliefs 432; models of 438; probability kernel correspondences, examples of 434–5; supergradients, lemma 439; theorem on existence of 437–8 interval beliefs 191 investment holdup 320 irrational expectations 431; alternative models of 454 iteration and averaging 535–7 Ito, T. 470 Jaffray, J.-Y. 12, 44, 84, 96, 157–8, 435, 439 Jagannathan, R. 402–3, 448, 450
James, M. R. 381, 383 Jordan Decomposition Theorem: for charges 89; for measures 265, 273 Jorgenson, D. W. 412 Joskow, P. L. 320 jump distortions 370 Kadane, J. 434, 465 Kahneman, D. 3, 13, 14, 22, 26, 29; and Tversky, Prospect Theory of 3; see also Prospect Theory Kalai, E. 295, 529 Kannai, Y. 51, 224 Karni, E. 25–6, 28, 176, 245, 249, 539 Kakutani’s fixed-point theorem 326 Katz, J. N. 242, 296 Kelley, J. L. 81, 477–8 Kelsey, D. 40, 195, 214, 244, 247, 255, 258, 296, 310, 332, 352–3 Keynes, J. M. 136–7, 429, 445, 451; description of consequences of uncertainty 430 Kikuta, K. 81, 104 Klein, B. 303, 320 Klein, E. 447 Klein, P. 289 Klibanoff, P. 12, 257, 291, 294, 483, 493, 509–12, 514, 516, 529; definition of equilibrium in normal form games 292; equilibrium in 292, 513, 520; model, equilibrium belief 293 Knight, F. H. 4, 36, 137, 417, 420, 429–30, 466, 473, 526 Knightian uncertainty 36, 176, 332, 429, 432, 452–3, 473, 483, 523, 526–7; ambiguity 293–4; aversion 242, 454; model of behavior under 341 Kogan, L. 286 Koppel, R. 429 Kopylov, I. 43 Krantz, D. H. 25, 220; conjoint measurement theory of 27 Krein–Milman Theorem 79, 274 Kreps, D. M. 208, 413, 467, 529 Kreps, S. 195 Kuhn, H. W. 528 Kunita, H. 375, 378, 382 Kurtz, T. G. 382, 412 Kurz, M. 412 Lambros, L. A. 456 Landers, D. 208 Laplace’s principle of insufficient reason 4–5
Latin America 354 La Valle, I. H. 29 Laws: of iterated expectations 365; of Iterated Values 387 law of large numbers 353; for ambiguous beliefs 361; of capacities 357 Lebesgue: integration 438; measure 143 LeBreton, M. 31, 439 Legros, P. 313 Lehmann, B. N. 455 Lehrer, E. 85, 295, 529 Lei, C. I. 386 Leonardo, F. 333 LeRoy, S. F. 454, 466 Levine, D. K. 295, 412, 529 Lindley, D. V. 127, 156 linear inequalities in normed spaces, systems of 51 Lions, P. L. 381 Lipman, B. L. 303 Lipschitz continuity 103 Lo, K. C. 259, 281, 295, 310, 483, 529; definition of equilibrium 292; model, equilibrium beliefs 293 local probabilistic beliefs 172 local risk-neutrality result 421 Loomes, G. 29 loss of surplus 320 lotteries 17; evaluating 13; ticket 494; two-stage 514 Luca, R. 301 Lucas, D. J. 414 Lucas, R. 352 Lucas, R. E. Jr. 388–9, 412, 429–30, 458; model 448, 452, 454; pure exchange economy 441; rational expectations model 448 Luce, R. D. 24, 146 Lyapunov equation 377–8, 380 Maccheroni, F. 53, 104 Mace, B. 361 Machina, M. J. 17, 31, 41, 144, 172, 177–9, 189, 204, 206–7, 214, 231, 483, 519; probabilistic sophistication model 39 MacLeod, W. B. 333–4 Maenhout, P. 385 Magill, M. J. P. 359 Malcomson, J. M. 303, 333 Malkiel, B. 431, 455 Mao, M. H. 172 marginal worth 101; charges 100, 102 Marinacci, M. 31, 38, 46, 53, 59, 77, 89,
91, 102–3, 206, 209–10, 216–17, 221, 227–8, 231, 234, 255, 258, 261, 289, 291, 294, 332, 353, 361; definition of equilibrium in two-player normal form games 292; model 293; theorem 357 market prices of risk 365–6, 368–9, 414 Markov: evolution indexed by α 414; models 365; perfect equilibrium of two-player, zero-sum game 390; probability kernel 432–3, 454; structure 378, 467 Markov processes 365, 373; conditional expectations operators 365; diffusion, generator of 372; as Feller semigroups 398; jump, generator for 372; mathematical theory of continuous-time 409; models, discriminating between 398 martingale construction 374 Martin, W. T. 397–8 Masten, S. E. 321 mathematical treatment of nonadditive probabilities 426 mathematics of ambiguity 46–104; Choquet integrals 58–69; convex games 73–82; finite games 82–103; representation 69–73; set functions 46–58 MATLAB control toolbox 404, 406 Matsushima, H. 313 Maurin, E. 298 maximizing utility with nonadditive prior 421 maximum-likelihood-update rule 156–7, 163; generalized version of 168; not commutative 167 Maximum Theorem 443–4, 447, 462, 464, 466 maxmin expected utility (MEU) (MMEU) 157; preferences 37, 40, 224, 228; theory 210 Maxmin Expected Utility (MMEU) model or “multiple prior” model 16; in Anscombe–Aumann framework, axioms of 17; development of theory, related literature 13; preferences 222, 290; of Schmeidler, and Gilboa and Schmeidler 218; two main advantages over CEU 12 Maxmin expected utility (MMEU) model with non-unique prior 125–34, 245; extension of 132–4; lemmata 128–30; with multiple priors 245; preferences with convex capacities in a product state space model 256; proof of 128–31;
propositions 132–3; and randomizing devices theorem 250–3; theorem 127–8 Meeden, G. 127 Mehra, R. 389, 468 MEU see maxmin expected utility Meyer, P.-A. 103, 466 Miao, J. 285 Michael, E. 470 Milgrom, P. 158, 481 Milne, F. 352 minimal income index 537 minimax criteria: loss 126; regret 126 minmax expected utility or multiple priors model of Gilboa and Schmeidler 366; see also maxmin expected utility min-of-means functionals 536; theorem 534–6 Möbius: inverse of capacity 299; theory of 84; transform 98 model misspecification and robust control 365, 377–83; for adding controls to the original state equation 383; alternative entropy constraint 382–3; conditional relative entropy 379–80; enlarging class of perturbations 383; entropy penalties 378–9; Lyapunov equation under Markov approximating model and fixed decision rule 377–8; risk-sensitivity as an alternative interpretation 381; theorem on entropy penalty 380–2; worst-case model, θ-constrained 382 model specification, semigroups for: detection errors 365–6, 369–70; discounting 373; entropy solution 411–12; extending domain to bounded functions 373; extending generator to unbounded function 374; and their generators 370–1; iterated laws and 365; market prices of risk 365–6; mathematics 370–4; related literature 366; robustness versus learning 367; theorem on entropy penalty 380; worst-case models 375, 390; see also semigroups for model specification model specification, semigroups for entropy and the market price uncertainty 401–9; permanent income economy 406, 409; prices of model uncertainty and detection-error probabilities 403, 409; risk-sensitivity and calibration of θ 406 model specification, semigroups for portfolio allocation 383–7; diffusion 384; ex post Bayesian interpretation
386; jumps 386–7; related formulations 385–6 model specification semigroups for pricing risky claims 387–93; ex post Bayesian equilibrium interpretation of robustness 392; marginal rate of substitution pricing 388; model uncertainty prices, diffusion and jump components 392; pricing under approximating model 391; pricing without concern for robustness 389–91; subtleties about decentralization 392; theorem on robust resource allocation 391 model specification semigroups for statistical discrimination 393–401; constant drift 397; continuous time 398; detection and plausibility 401; detection statistics and robustness 400; discrimination Markov models in discrete time 394; formulation of bounds on error probabilities 394; measurement and prior probabilities 393; rates for measuring discrepancies between models locally 396; theorem on generator of positive contraction semigroup 400 model uncertainty premia: for an alternative economy 404; in market prices of risk 365 monotonicity 22, 24, 112, 127, 235, 456; in game 47, 273; strict 112 Montesano, A. 43, 214 Montrucchio, L. 46, 77, 102–3 Moore, J. H. 333 moral hazard 289; models of double-sided 322 Morgenstern, O. 124, 169, 419 Morris, S. 354 Moscarini, G. 400, 414 Moulin, H. 47, 49, 73 Mukerji, S. 210, 285–6, 289, 293, 300, 303, 309–10, 336, 350, 354, 519–20 multi-attribute choice 298 multiple price supports 352 multiple priors (MP) 21, 26, 155–7; theory applied to portfolio selection problems 158; with unanimity ranking 15–16 multiple-priors model 11–12, 30, 155–7, 171, 286, 291, 298, 431, 473, 476, 509, 511, 515; axiomatized by Gilboa and Schmeidler 483; preference order 181; see also Maxmin Expected Utility (MMEU) Myerson, R. B. 83, 275, 539
Nakamura, Y. 26, 138, 245–6, 248–9, 251; multi-symmetry 28 Namioka, I. 477–8 Nandeibam, S. 40, 214, 332 Narens, L. 146 Nash equilibrium 290–1, 294, 483–4, 491, 500–1, 503, 506–7, 519–20, 522; Cautious 520; deficiency in dynamic games 522; generalization of 516; proof of uniqueness 527; strict 519; Under Uncertainty 512; unique 525–6; unique recommendation 525 Nash, J. 488, 528; path 527 Nehring, K. 40–3, 215, 257, 302 Neo-Bayesian decision theory: Anscombe–Aumann Theorem 113; comonotonic independence 112; continuity 112; implication of von Neumann–Morgenstern Theorem; Newman, C. M. 400 n-financial asset economy with idiosyncracy 342–4; lemma on equilibrium of 344; maximization problem in 358; model and main result 342–53; theorem proofs 357–60; theorems 345–53 Nguyen, H. T. 106 Nikodym Uniform Boundedness Theorem 50 no-arbitrage, principle of 388 noise traders 431 nonadditive (objective) probabilities 108, 115; in physics 108; set of 122 nonadditive beliefs 294, 352; law of large numbers work 348 nonadditive expected utility, simple axiomatization of 136–53; corollary on additivity of capacity 145; definition of Choquet expected utility 140; lemma 141; main result 141; nonequivalence of one- and two-stage approaches 146–8; proposition 145; revealed unambiguous events 144–6; theorem 142–3; theorem proof 148–52; theory 126 nonadditive expected utility, simple axiomatization, definitions 139–41; Choquet expected utility (CEU) 140–1; consequences 139; cumulative consequence sets 140; events 139; states of nature 139; step acts 139 nonadditive measures 157; nonempty cores, convex 156; for representation of preferences under uncertainty 537; uncertainty aversion 156
nonadditive probabilities 6–9, 155, 244; or capacities 210, 466; distribution 420; for events 138; measures 421; monotone set-functions 155; update rule 167; updating 158; see also Choquet Expected Utilities theory nonadditivity 14–15; in probability for ambiguous events 138 non-Bayesian uncertainty 287 noncooperative game theory 290 nondegeneracy 112, 128 nonexpected utility models 29–31; abandoning basic axioms 29; preferences 352; probabilistic sophistication 31; properties of utility and capacities under CEU and PT 30–1; prospect theory 29–30 nonlinear decision weights 32 nonlinear probabilities, alternative models of 22 non-product weights for randomized act 248 non-trade theorem 421 nontriviality 142 nonuniqueness or indeterminacy of equilibrium prices 445 normal form games 522 no-trade: based on “lemons” problem, theory of 354; equilibrium price 347; price interval 285; result 352 null contract 311 null sets, basic properties of 55 objective probabilities 174; distribution 138 “odds” concept 125 off equilibrium choices 527 “one-stage” or Savage model 244, 247 ε-optimal continuous policies, existence of 461 optimal contractual arrangements 287–90; incentive contracts 288; risk sharing 287; role of ambiguity aversion in design of 322 Osborne, J. M. 529 Osborne, M. 521 outcome space 23 Owen, G. 47, 49, 95; correspondence 96; multilinear extension 95; polynomial 96–7 ownership rights 303 Ozdenoren, E. 295 Papamarcou, A. 430
Pareto optimal allocations 473; optimality 472; theorem 475, 478–8; theorem proof 476–8 path games, analysis of 523 Paxson, C. 361 peace-negotiation scenario example 524–6 Pearce, D. 302, 521 permanent income model, discrete-time, linear-quadratic 366 Pfanzagl, J. 27; bisymmetry 28 Phelps, R. 269 Philippe, F. 91, 103 physical probability measure 413 players: Bayesian 291; preferences, multiple priors framework to model 292; strategic choice 291 Poisson intensity parameter, state-dependent 372 Pólya index 298 polyhedra 101 polytopes 101 Porteus, E. L. 467 portfolio: choice 284; income 343 positive games 47, 77; called probabilities 47 Poterba, J. M. 450 Pratt, J. W. 210, 410 preference axiomatizations for decision under uncertainty 20–32; axioms 21; general conditions for decision under uncertainty 23–5; general purpose of 20–2; nonexpected utility models 29–31; relation of decision 23; subjective expected utility (SEU) 24, 26; subjective expected utility (SEU) conditions to characterize 25–8 preferences: ambiguity averse 38, 42; intervals 140; mixture or randomization as facet of uncertainty aversion 244; randomization 257; relations 117, 215, 220, 222 preferences, definition 249; expected utility (EU) 249; solvability 248; stochastically independent 249; stochastically independent randomizing device (SIRD) 249 Prescott, E. C. 389, 468 Preston, M. G. 13 price: dynamics 158; indeterminacy 451; risk, two types of 287 pricing semigroups see semigroups principal–agent problem under moral hazard 290
priors, set of 155, 157; see also multiple priors prisoners’ dilemma 501 probabilistic risk aversion 39–40, 213, 232–4 probabilistic sophistication: behavior 244; within CEU 190 probabilistically sophisticated (PS) preferences 231, 234; as benchmarks 231 probability 122; measure 190; on λ-system 184; prior 108; update in Gilboa, definition of 162 probability kernel 433, 454; correspondences 432; intertemporal utility examples of 434 procedural justice 539 proper randomizing device (SIRD) 255; see also stochastically independent randomizing device prospect theory (PT) 13, 17, 26; properties of utility and capacities under 30; reference outcome 29 pseudo-Bayesian rules, family of 156 pseudo-Boolean functions 94; grounded 94–6 Puppe, C. 299 pure endowment feature 388 pure exchange economy 352 quadratic capacity 190 quasiconcavity of preference 515 Quiggin, J. 14, 22, 28, 121, 533–4; model of 122 Quinzii, M. 359 Radner, R. 313 Radon–Nikodym Theorem 55; derivative 448, 468 Raiffa, H. 26–7, 244, 257, 495; preference for randomization 253; preferences 247 Ramsey, F. P. 3, 12, 27, 109, 136, 140 random outcomes or (roulette) lotteries 111, 127 randomization: device, modeling 246–8; stochastically independent and preferences 248–56 rank-dependence 22 rank dependent utility 17, 29, 138 rank-dependent expected utility (RDEU) 64, 182; class of functions 190; model 13–14 “rank-dependent probabilities” approach to uncertainty 533
Rao, K. P. 47, 50, 57, 59, 67, 199, 201, 206 Rao, M. B. 47, 50, 57, 59, 67, 199, 201, 206 rational expectations: hypothesis 429; model of beliefs 454; models 367; revolution in macroeconomics 286; risk price 389 “rationalization” of non-Nash strategy profiles 293 Raviv, J. 395, 400 Rawls, J. 424, 533, 539; egalitarian solution 533 real-life decisions 6 real-valued function 7 real-world contracts, incomplete 303 recursive utility 381; specification 403 reference point, notion of 17 “relation-based” approach to modeling ambiguity 42; limitations 43 research and development procurement by US Defense Department 289 reservation prices 421 Revenue Equivalence Theorem 295 Revuz, A. 89, 91, 261, 275; finitely additive representation of 274 Revuz, D. 413 Richard, S. F. 448 Riemann integrals 59, 61, 92, 235, 264 Riesz: Representation Theorem 441; space 87; space with lattice operations 86 Rigotti, S. 287 risk 36; based and ambiguity-based behavioral traits 39; equivalence 110; interest rate free of 376; neutral agent 421; neutral firms, vertically related 304; neutral probabilities 388; neutral probability measure 368; premium 110; sensitivity 381; and uncertainty, distinction between 137, 420 risk aversion 110, 172, 175–6; concepts of 171; neutrality, definition of 176; preferences 178; in SEU model 211; theory of measurement of 210; and uncertainty, distinction between 175 risklessness 176; in state-dependent expected utility model 176 risk-sharing 285; opportunities 338; possibilities on an incomplete market economy 339; problem 287; proceeds 341 robust control: problem 368; theory 366 robustness 381; asset pricing under preference for 369; versus learning 367
Rosenmuller, J. 172, 189, 203–4 Rota, G. C. 84 roulette wheel lotteries (objective) 109, 121, 126–7, 494; acts 146, 182, 194; richness notion valid for 180 Routledge, B. 286 Royden, H. L. 446 Rubinstein, A. 506, 529 Ruckle, W. H. 104, 278 Rudin, W. 59, 104 Runolfsson, T. 381 Ryan, M. J. 291 Sahlin, N.-E. 146 Salinetti, G. 85 Samet, D. 476 Sargent, T. J. 286, 299–300, 364, 366, 378, 381, 383, 386, 392–3, 412–14 Sarin, R. K. 26–7, 29, 136, 146, 148, 181, 195, 244–5, 247, 257 Savage, L. J. 3, 7, 12, 24, 26, 37, 109, 122, 126, 140, 146, 151, 155, 160, 173, 183, 195, 209–10, 213, 215, 217, 234, 258, 419, 429, 486; P2 109; Theorem 121 Savage, L. J.: act 194; acts, EU axiomatizations on finite state space 249; axiomatization of expected utility 181; axioms 4–5, 17432, 526; decision theory 244; definition of probability 137; EU 146; framework by Gilboa 245; model 41, 421; model with richness of state space 25; paradigm, objections to 125; preference conditions of 21; rationality test 5; single prior 431, 483; standard frameworks for modeling uncertainty 246; subjective expected utility model axiomatized by 483; subjective expected utility (SEU) theory 136, 146, 148; Savage, postulates of: cumulative dominance 142; fineness of unambiguous events 142; nontriviality 142; P4 (cumulative reduction) 141, 143, 145; sure-thing principle (P2) 5, 24, 40, 136, 141, 144, 431; on unambiguous acts 141; weak ordering 141 Schaefer, H. H. 279 Scheinkman, J. A. 365, 371–2, 375, 387–8, 412–13 Schervish, M. J. 127, 156 Schmeidler, D. J. 4–6, 9, 14, 17, 20, 22–3, 26, 29–32, 37, 41, 43, 46, 49, 53–4, 56, 69, 77, 81, 88–9, 91, 108, 116, 124–6,
128, 136–8, 143–4, 146, 148, 155, 160–1, 165, 171, 173, 177–8, 180, 194, 210, 214, 231, 253, 256, 258, 261–3, 266, 274–5, 291, 296, 299, 306, 336, 339–40, 353, 361, 416, 419–20, 423, 426, 431–2, 438–9, 452, 464, 466, 473, 476, 481, 483, 485, 487, 514, 519, 531, 533, 535, 537, 539; approach to ambiguity 73; CEU model 214; characterization of exact games 53; coin example 6; critique of Bayesianism 5; decision theory 46; definition of uncertainty aversions 172, 174, 195; financial market outcomes 284–7; formulation of CEU 195; inequality measurement 297–8; interest in uncertainty 12; intertemporal choice and multi-attribute choice 298–9; lottery-acts formulation 139; model 146; model of decision making under uncertainty, economic applications of 283–300; models, tractability of 299; optimal contractual arrangements 287–90; probabilistic sophistication model 39; strategic interaction 290–6; theorem 70; two-stage approach 148 Schmeidler–Gilboa model 421 Schmidt, U. 30 Schneider, M. 284, 414 Schroder, M. 381 Schultz, M. H. 107 Schwartz, J. T. 47, 49, 53–5, 58, 75, 79, 104, 124, 128–9, 131–2, 272, 276, 474 Segal, U. 14, 122, 146, 515 semigroups for model specification, robustness, prices of risk, and model detection, four 364–415; approximating model 377, 402; and its associated generator 365; as contraction 413; for detection error probability bounds 401; formulation of Markov processes 412; iterated laws and 365; marginal utility process in approximating model 407; model space, generator of positive contraction 400; parameterizations of generators of 377; pricing 372, 377; for pricing under robustness 401; rational expectations and model misspecification 364–5; see also model specification and semigroups Serlang, S. R. C. 169 set: of consequences 159; of priors 126 set functions 46–58; basic properties 46–9; core 49–58; lemma on basic properties
of null sets 55; lemma on continuous exact games 56; lemma on Σ as a σ-algebra 53; proposition on balanced games 52; proposition on core compactness property 49; proposition on finite variations norm 48; proposition on “non-atomic” cores 57; theorem on balanced game 50; theorem on positive games 54 SEU see subjective expected utility; see also Savage Shafer, G. 15, 84–5, 96, 157, 208, 261, 267, 274, 426 Shalev, J. 298 Shannon, C. 287 Shapiro, L. 25 Shapley, L. S. 19, 48, 50, 65, 81–2, 101, 104, 159, 261, 263, 332 Sharkey, W. W. 104 Shelanski, H. 289 Shiller, R. J. 430–1; model of “fads” 454 Shin, H. S. 293 Shitovitz, B. 81 shocks 366; distributions 386 Shreve, S. E. 461 Simon, H. A. 303 Simonsen, M. H. 158, 428, 445, 468 Sims, C. A. 412 Singell Jr., L. D. 466 Sion, M. 458 Sipos, J. 103 SIRD see stochastically independent randomizing device Skiadas, C. 381, 467 Slutsky matrix 420 Smets, Ph. 157 Smith, A. B. C. 125 Smith, L. 401, 414 social choice 14; problems, reduction to decision under uncertainty 539 Soner, H. M. 383 sources of uncertainty 342 space of events and outcomes 322 specification: error in rational expectations models 412; tests, likelihood-based 364 standard expectations operator 311; additivity of 305 standard incomplete market equilibrium 356 standard two-period pure-exchange economy 474 state-contingent contracts 303 state space 23; outcome space, richness on 22
states: of nature 23, 127, 139; of the world 159, 245 statistical detection operator 412 stochastic discount factors 448, 450 stochastic dominance underlying contingency space 333 stochastically independent randomization and uncertainty aversion 244–58; CEU, uncertainty and randomizing devices 253–5; definition of preferences 248–9; MMEU and randomising devices 250–3; modelling randomizing device 246–8; and preferences 248–56; preliminaries and notation 245–6; stochastically independent randomizing device (SIRD), condition 255–6; stochastically independent randomizing device (SIRD), definition of 249–50; theorem on CEU preferences 253–4; theorem on MMEU preferences 250–1 stochastically independent randomizing device (SIRD) 249–50; condition 255–6 stock markets: prices fall after initial public offering 158; volatility in 210 Stokey, N. 158, 458, 481; updating NA 158 Stone lattices 69 Strassen, V. 103, 126 Straszewicz, S. 134; theorem of 134 strategies in dynamic game 523–4; choice of action(s) 523; deferring or concealing choice of 525 strategy: in decision making, theory of 290; in equilibrium, defining under ambiguity 290; in interaction 290–6; pure 291 Stuck, B. W. 400 subjective decision attitude toward incomplete information 22 subjective expected utility (SEU) 24, 26; conditions to characterize 25–8; maximization as benchmark representing ambiguity neutrality 231–4; maximizers 305, 337; model 171; model of Anscombe and Aumann 218; orderings 217; preference relation 224; theory 136, 146, 148, 180; theory of decision making under uncertainty 209; world 344; see also Savage subjective probabilities or “capacities” 139; measure 24 subjective probability and expected utility (EU) without additivity 108–34; Anscombe–Aumann Theorem 114–15;
axioms and background 111–14; theorem 115–20 Sugden, R. 29 Summers, L. H. 450 supermodularity, or 2-monotonicity 9 superadditive game 47 Sure-thing Principle 21–2, 24–5, 109, 527; of unambiguous acts 136, 141, 144; variation preserving 298; see also Savage symmetric and complementary uncertain events 108 symmetry: for additive capacity 144; in information 108 λ-system (a class closed with respect to complements and disjoint unions) 40–2; see also Zhang Tallarini, Jr, T. D. 365–7, 370, 392, 406–7, 409, 411–14; risk-sensitivity interpretation Tallon, J.-M. 30, 214, 285–6, 289, 300, 309, 336, 350, 354, 472–3 taxonomy on games 50 technical axioms 22 technical richness conditions 25 Thompson, A. 447 Tirole, J. 303, 497, 519 tools of classical statistics 11 ν-topology 264; properties of the 265 total variation norm 47 trade among agents 158 tradeoff-consistency-like axioms 31 transaction-specific assets 320 transaction-specific investments 321 transactions costs 303, 414; of asymmetric information 421; paradigm 320, 333 transferable utility coalitional games 261 transitivity 22, 29; and completeness of preferences 20 Turmuhambetova, G. A. 366, 378, 383, 386, 392 Tuttle, M. R. 156 Tversky, A. 3, 13–14, 22, 26, 29–30, 40, 242; and Kahneman, Prospect Theory of 3; see also Prospect Theory two-coins flip example 41 two-consequence acts 145 two-period finance economy, model of stylized 342 “two-stage” or Anscombe–Aumann model 147, 244, 247 two-urn and one-urn experiments of Ellsberg 4; see also Ellsberg
Tychonoff Theorem 51 unambiguous acts: “behavioral” notion of 213; and events 226–8 unambiguous events 138; revealed 144–6 unambiguous preference relation 41 unanimity games 62, 82–3, 262, 266–7, 275 uncertainty: averse players 291; aversion axiom 43; and Bayesianism 3–6; kinds of 36; premium 110; premium incorporated in equilibrium security market prices 287; quantified by a probability measure 4; representation by nonadditive measure 15; and vertical integration, link between 305; of war 5 uncertainty aversion 110, 119, 128, 177–8, 181, 183, 187, 201, 242, 246, 423; additive function on σχ 201; bets, beliefs and 184–6; for the CEU model 258; and convexity 183; current definition of 173–5; definition of 171–207; definition and attractive properties of 179–86; under differentiability 191–3, 202–5; differentiable utilities 186–93; eventwise differentiability, definition of 187–9; for general nonbinary acts 193; implications of 185–6; inner measures 183–4; lemma on CEU utility function 182; lemma on CEU utility function proof 196; lemma on inner measure 183; lemma on inner measure proof 198; lemma on probability measure 180; lemma on probability measure proof 196; multiple-priors and CEU utilities 180–3; of preferences 432; theorem on eventwise differentiability 192–3; theorem on eventwise differentiability, proof 198–201; theorem on multiple priors order 181; theorem on probabilistically sophisticated order 180; theorem 190 uncertainty aversion and optimal portfolio: constant 423; definition on amount of probability lost 423; definition on nonadditive probabilities 424; lemma on constant uncertainty aversion 423; lemma on optimal portfolio 426; portfolio choice 425–6; risk aversion and 419–27; theorem on nonadditive probabilities 424; theorem on risk-averse investor 426 uncertainty loving and uncertainty neutrality 178
uncertainty neutral benchmark, Epstein set of preferences 258 Uniform Boundedness Principle 75 unindexed debt 286 unique prior probability 155 unknown urn with randomization: in consequence space (Anscombe–Aumann) 246; in the state space only (Savage) 247 update rule 161; classical 163 updating ambiguous beliefs 155–68; Bayesian and classical rules 161–3; framework and preliminaries 159–61; proofs and related analysis 163–8; proposition on associated set of measures 161; proposition on commutativity in 162; proposition on convexity in 161; theorem on commutativity 163; theorem on commutativity proof 165–8; theorem on equivalence 162; theorem on equivalence proof 163–5 Uppal, R. 286 U.S. auto manufacturers 321 utility: characterizations of properties of 28; function 24, 155, 159, 437; recursive model of 438; sophistication 44 Vardennan, S. 127 Vind, K. 27, 29 volatility: for prices 454; puzzle of excess 430 Volkmer, H. 208 von Neumann, J. 124, 169, 419 von Neumann–Morgenstern: expected utility theory 155; index intertemporal 455; model 121, 486; stability of cores 81; Theorem 109, 113–14, 117, 128; utility index 343; utility of money 111 wage contract 287 Wakai, K. 298 Wakker, P. P. 17, 20, 26–30, 40, 77, 121, 126, 136, 138, 141, 148, 155, 160, 181, 195, 217, 220, 236, 242, 244–9, 257, 298; tradeoff consistency technique 27 Wald, A. 126, 424; decision rule 125; minimax criterion 126 Walley, P. 353, 357, 361, 430, 434, 439 Wasserman, L. A. 208, 434–5, 451, 465 Weak Beliefs Equilibrium 509, 511, 513
weak order 111, 127, 140–1, 215
weak*-topology 104, 128, 474, 477
Weber, H. 208
Weber, M. 309, 430, 483, 527
Webster, R. 101
Werlang, S. R. C. 158, 259, 284–5, 291–2, 294, 309–10, 337–8, 340, 351, 416, 445, 468–9, 473, 483, 493, 509, 511–14, 516, 520, 529; definition of Nash Equilibrium Under Uncertainty 511; notion of equilibrium 296
Werner, J. 363
West, K. D. 454
Wets, R. 85
Weymark, J. A. 14, 302, 533–4
Whinston, M. D. 321
Widder, D. V. 85
Williams, N. 366, 378, 383, 386, 392
Williamson, O. E. 303
Williams, S. R. 313
Wolfenson, M. 49
Wolinsky, A. 529
Woodford, M. 467
worst-case jump measure 376
worst-case models 366, 369–70, 377, 386, 392–3, 402, 405, 410; specification semigroups, θ-constrained 382
Yaari, M. E. 14, 122, 145, 172, 175, 211, 533–4; axiomatization of rank-dependent utility for risk 24; general definition of risk aversion for nonexpected utility preferences 37
Yaron, A. 403–4, 414
Yoo, K. R. 158
Yor, M. 413
Yosida approximation 371
Zarnowitz, V. 430–1, 456
Zeidler, E. 458
Zeldes, S. 361
Zhang, J. 40, 43, 174, 177, 183, 214–15, 234, 242; definition of unambiguous event 41; λ-system 175
Zhou, L. 69
Zin, S. E. 286, 366, 437, 458, 466–7