VOL. 77, NO. 2 — March, 2009

CONTENTS

FERNANDO ALVAREZ AND FRANCESCO LIPPI: Financial Innovation and the Transactions Demand for Cash . . . 363
RICARDO LAGOS AND GUILLAUME ROCHETEAU: Liquidity in Asset Markets With Search Frictions . . . 403
GLENN ELLISON AND SARA FISHER ELLISON: Search, Obfuscation, and Price Elasticities on the Internet . . . 427
JOHANNES HÖRNER AND STEFANO LOVO: Belief-Free Equilibria in Games With Incomplete Information . . . 453
MANUEL ARELLANO AND STÉPHANE BONHOMME: Robust Priors in Nonlinear Panel Data Models . . . 489

NOTES AND COMMENTS:
HENRIK JACOBSEN KLEVEN, CLAUS THUSTRUP KREINER, AND EMMANUEL SAEZ: The Optimal Income Taxation of Couples . . . 537
SHOUYONG SHI: Directed Search for Equilibrium Wage–Tenure Contracts . . . 561
SOKBAE LEE, OLIVER LINTON, AND YOON-JAE WHANG: Testing for Stochastic Monotonicity . . . 585
EDI KARNI: A Mechanism for Eliciting Probabilities . . . 603

ANNOUNCEMENTS . . . 607
FORTHCOMING PAPERS . . . 615
2008 ELECTION OF FELLOWS TO THE ECONOMETRIC SOCIETY . . . 617
An International Society for the Advancement of Economic Theory in its Relation to Statistics and Mathematics
Founded December 29, 1930
Website: www.econometricsociety.org

EDITOR
STEPHEN MORRIS, Dept. of Economics, Princeton University, Fisher Hall, Prospect Avenue, Princeton, NJ 08544-1021, U.S.A.; [email protected]

MANAGING EDITOR
GERI MATTSON, 2002 Holly Neck Road, Baltimore, MD 21221, U.S.A.; [email protected]

CO-EDITORS
DARON ACEMOGLU, Dept. of Economics, MIT, E52-380B, 50 Memorial Drive, Cambridge, MA 02142-1347, U.S.A.; [email protected]
STEVEN BERRY, Dept. of Economics, Yale University, 37 Hillhouse Avenue/P.O. Box 8264, New Haven, CT 06520-8264, U.S.A.; [email protected]
WHITNEY K. NEWEY, Dept. of Economics, MIT, E52-262D, 50 Memorial Drive, Cambridge, MA 02142-1347, U.S.A.; [email protected]
WOLFGANG PESENDORFER, Dept. of Economics, Princeton University, Fisher Hall, Prospect Avenue, Princeton, NJ 08544-1021, U.S.A.; [email protected]
LARRY SAMUELSON, Dept. of Economics, Yale University, New Haven, CT 06520-8281, U.S.A.; [email protected]
HARALD UHLIG, Dept. of Economics, University of Chicago, 1126 East 59th Street, Chicago, IL 60637, U.S.A.; [email protected]

ASSOCIATE EDITORS
YACINE AÏT-SAHALIA, Princeton University
JOSEPH G. ALTONJI, Yale University
JAMES ANDREONI, University of California, San Diego
DONALD W. K. ANDREWS, Yale University
JUSHAN BAI, New York University
MARCO BATTAGLINI, Princeton University
PIERPAOLO BATTIGALLI, Università Bocconi
DIRK BERGEMANN, Yale University
MICHELE BOLDRIN, Washington University in St. Louis
VICTOR CHERNOZHUKOV, Massachusetts Institute of Technology
J. DARRELL DUFFIE, Stanford University
JEFFREY ELY, Northwestern University
LARRY G. EPSTEIN, Boston University
HALUK ERGIN, Washington University in St. Louis
FARUK GUL, Princeton University
JINYONG HAHN, University of California, Los Angeles
PHILIP A. HAILE, Yale University
PHILIPPE JEHIEL, Paris School of Economics
YUICHI KITAMURA, Yale University
PER KRUSELL, Princeton University and Stockholm University
OLIVER LINTON, London School of Economics
BART LIPMAN, Boston University
THIERRY MAGNAC, Toulouse School of Economics
GEORGE J. MAILATH, University of Pennsylvania
DAVID MARTIMORT, IDEI-GREMAQ, Université des Sciences Sociales de Toulouse
STEVEN A. MATTHEWS, University of Pennsylvania
ROSA L. MATZKIN, University of California, Los Angeles
LEE OHANIAN, University of California, Los Angeles
WOJCIECH OLSZEWSKI, Northwestern University
ERIC RENAULT, University of North Carolina
PHILIP J. RENY, University of Chicago
JEAN-MARC ROBIN, Université de Paris 1 and University College London
SUSANNE M. SCHENNACH, University of Chicago
UZI SEGAL, Boston College
CHRIS SHANNON, University of California, Berkeley
NEIL SHEPHARD, Oxford University
MARCIANO SINISCALCHI, Northwestern University
JEROEN M. SWINKELS, Washington University in St. Louis
ELIE TAMER, Northwestern University
IVÁN WERNING, Massachusetts Institute of Technology
ASHER WOLINSKY, Northwestern University
EDITORIAL ASSISTANT: MARY BETH BELLANDO, Dept. of Economics, Princeton University, Fisher Hall, Princeton, NJ 08544-1021, U.S.A.; [email protected]

Information on MANUSCRIPT SUBMISSION is provided in the last two pages. Information on MEMBERSHIP, SUBSCRIPTIONS, AND CLAIMS is provided in the inside back cover.
SUBMISSION OF MANUSCRIPTS TO ECONOMETRICA

1. Members of the Econometric Society may submit papers to Econometrica electronically in pdf format according to the guidelines at the Society's website: http://www.econometricsociety.org/submissions.asp. Only electronic submissions will be accepted. In exceptional cases, for those who are unable to submit electronic files in pdf format, one copy of a paper prepared according to the guidelines at the website above can be submitted, with a cover letter, by mail addressed to Professor Stephen Morris, Dept. of Economics, Princeton University, Fisher Hall, Prospect Avenue, Princeton, NJ 08544-1021, U.S.A.

2. There is no charge for submission to Econometrica, but only members of the Econometric Society may submit papers for consideration. In the case of coauthored manuscripts, at least one author must be a member of the Econometric Society. Nonmembers wishing to submit a paper may join the Society immediately via Blackwell Publishing's website. Note that Econometrica rejects a substantial number of submissions without consulting outside referees.

3. It is a condition of publication in Econometrica that the copyright of any published article be transferred to the Econometric Society. Submission of a paper will be taken to imply that the author agrees that the copyright of the material will be transferred to the Econometric Society if and when the article is accepted for publication, and that the contents of the paper represent original and unpublished work that has not been submitted for publication elsewhere. If the author has submitted related work elsewhere, or does so during the term in which Econometrica is considering the manuscript, it is the author's responsibility to provide Econometrica with details. There is no page fee, nor is any payment made to the authors.

4. Econometrica has the policy that all empirical and experimental results, as well as simulation experiments, must be replicable. For this purpose the Journal editors require that all authors submit the datasets, programs, and information on experiments that are needed for replication and some limited sensitivity analysis. (Authors of experimental papers can consult the posted list of what is required.) This material for replication will be made available through the Econometrica supplementary material website. The format is described in the posted information for authors. Submitting this material indicates that you license users to download, copy, and modify it; when doing so, such users must acknowledge all authors as the original creators and Econometrica as the original publisher. If you have a compelling reason, we may post restrictions regarding such usage. At the same time the Journal understands that there may be some practical difficulties, such as in the case of proprietary datasets with limited access, as well as public use datasets that require consent forms to be signed before use. In these cases the editors require that a detailed data description and the programs used to generate the estimation datasets are deposited, as well as information on the source of the data, so that researchers who do obtain access may be able to replicate the results. This exemption is offered on the understanding that the authors made a reasonable effort to obtain permission to make available the final data used in estimation, but were not granted permission.
We also understand that in some particularly complicated cases the estimation programs may have value in themselves and the authors may not make them public. This, together with any other difficulties relating to depositing data or restricting usage, should be stated clearly when the paper is first submitted for review. In each case it will be at the editors' discretion whether the paper can be reviewed.

5. Papers may be rejected, returned for specified revision, or accepted. Approximately 10% of submitted papers are eventually accepted. Currently, a paper will appear approximately six months from the date of acceptance. In 2002, 90% of new submissions were reviewed in six months or less.

6. Submitted manuscripts should be formatted for paper of standard size with margins of at least 1.25 inches on all sides, 1.5 or double spaced with text in 12 point font (i.e., under about 2,000 characters, 380 words, or 30 lines per page). Material should be organized to maximize readability; for instance, footnotes, figures, etc., should not be placed at the end of the manuscript. We strongly encourage authors to submit manuscripts that are under 45 pages (17,000 words) including everything (except appendices containing extensive and detailed data and experimental instructions).
While we understand that some papers must be longer, if the main body of a manuscript (excluding appendices) exceeds the aforementioned length, it will typically be rejected without review.

7. Additional information that may be of use to authors is contained in the "Manual for Econometrica Authors, Revised," written by Drew Fudenberg and Dorothy Hodges and published in the July 1997 issue of Econometrica. It explains editorial policy regarding style and standards of craftsmanship. One change from the procedures discussed in this document is that authors are not immediately told which co-editor is handling a manuscript. The manual also describes how footnotes, diagrams, tables, etc. need to be formatted once papers are accepted. It is not necessary to follow the formatting guidelines when first submitting a paper. Initial submissions need only be 1.5 or double-spaced and clearly organized.

8. Papers should be accompanied by an abstract of no more than 150 words that is full enough to convey the main results of the paper. On the same sheet as the abstract should appear the title of the paper, the name(s) and full address(es) of the author(s), and a list of keywords.

9. If you plan to submit a comment on an article that has appeared in Econometrica, we recommend corresponding with the author, but require this only if the comment indicates an error in the original paper. When you submit your comment, please include any correspondence with the author. Regarding comments pointing out errors, if an author does not respond to you after a reasonable amount of time, indicate this when submitting. Authors will be invited to submit for consideration a reply to any accepted comment.

10. Manuscripts on experimental economics should adhere to the "Guidelines for Manuscripts on Experimental Economics" written by Thomas Palfrey and Robert Porter and published in the July 1991 issue of Econometrica.

Typeset at VTEX, Akademijos Str. 4, 08412 Vilnius, Lithuania. Printed at The Sheridan Press, 450 Fame Avenue, Hanover, PA 17331, USA. Copyright © 2009 by The Econometric Society (ISSN 0012-9682).

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or direct commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation, including the name of the author. Copyrights for components of this work owned by others than the Econometric Society must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Posting of an article on the author's own website is allowed subject to the inclusion of a copyright statement; the text of this statement can be downloaded from the copyright page on the website www.econometricsociety.org/permis.asp. Any other permission requests or questions should be addressed to Claire Sashi, General Manager, The Econometric Society, Dept. of Economics, New York University, 19 West 4th Street, New York, NY 10012, USA. Email:
[email protected]. Econometrica (ISSN 0012-9682) is published bi-monthly by the Econometric Society, Department of Economics, New York University, 19 West 4th Street, New York, NY 10012. Mailing agent: Sheridan Press, 450 Fame Avenue, Hanover, PA 17331. Periodicals postage paid at New York, NY and additional mailing offices. U.S. POSTMASTER: Send all address changes to Econometrica, Blackwell Publishing Inc., Journals Dept., 350 Main St., Malden, MA 02148, USA.
An International Society for the Advancement of Economic Theory in its Relation to Statistics and Mathematics
Founded December 29, 1930
Website: www.econometricsociety.org

MEMBERSHIP, SUBSCRIPTIONS, AND CLAIMS

Membership, subscriptions, and claims are handled by Blackwell Publishing, P.O. Box 1269, 9600 Garsington Rd., Oxford, OX4 2ZE, U.K.; Tel. (+44) 1865-778171; Fax (+44) 1865-471776; Email
[email protected]. North American members and subscribers may write to Blackwell Publishing, Journals Department, 350 Main St., Malden, MA 02148, USA; Tel. 781-3888200; Fax 781-3888232. Credit card payments can be made at www.econometricsociety.org. Please make checks/money orders payable to Blackwell Publishing. Memberships and subscriptions are accepted on a calendar year basis only; however, the Society welcomes new members and subscribers at any time of the year and will promptly send any missed issues published earlier in the same calendar year.
Individual Membership Rates

                                                          $^a     €^b     £^c     Concessionary^d
Ordinary Member 2009 (Print + Online, 1933 to date)       $60     €40     £32     $45
Ordinary Member 2009 (Online only, 1933 to date)          $25     €18     £14     $10
Student Member 2009 (Print + Online, 1933 to date)        $45     €30     £25     $45
Student Member 2009 (Online only, 1933 to date)           $10     €8      £6      $10
Ordinary Member, 3 years 2009–2011 (Print + Online)       $175    €115    £92
Ordinary Member, 3 years 2009–2011 (Online only)          $70     €50     £38

Subscription Rates for Libraries and Other Institutions

                                                          $^a     €^b     £^c     Concessionary^d
Premium 2009 (Print + Online, 1999 to date)               $550    €360    £290    $50
Online 2009 (Online only, 1999 to date)                   $500    €325    £260    Free
^a All countries, excluding U.K., Euro area, and countries not classified as high income economies by the World Bank (http://www.worldbank.org/data/countryclass/classgroups.htm), pay the US$ rate. High income economies are: Andorra, Antigua and Barbuda, Aruba, Australia, Austria, The Bahamas, Bahrain, Barbados, Belgium, Bermuda, Brunei, Canada, Cayman Islands, Channel Islands, Cyprus, Czech Republic, Denmark, Equatorial Guinea, Estonia, Faeroe Islands, Finland, France, French Polynesia, Germany, Greece, Greenland, Guam, Hong Kong (China), Hungary, Iceland, Ireland, Isle of Man, Israel, Italy, Japan, Rep. of Korea, Kuwait, Liechtenstein, Luxembourg, Macao (China), Malta, Monaco, Netherlands, Netherlands Antilles, New Caledonia, New Zealand, Northern Mariana Islands, Norway, Oman, Portugal, Puerto Rico, Qatar, San Marino, Saudi Arabia, Singapore, Slovak Republic, Slovenia, Spain, Sweden, Switzerland, Taiwan (China), Trinidad and Tobago, United Arab Emirates, United Kingdom, United States, Virgin Islands (US). Canadian customers will have 6% GST added to the prices above.
^b Euro area countries only.
^c UK only.
^d Countries not classified as high income economies by the World Bank only.

Back Issues: Single issues from the current and previous two volumes are available from Blackwell Publishing; see address above. Earlier issues from 1986 (Vol. 54) onward may be obtained from Periodicals Service Co., 11 Main St., Germantown, NY 12526, USA; Tel. 518-5374700; Fax 518-5375899; Email
[email protected].
An International Society for the Advancement of Economic Theory in its Relation to Statistics and Mathematics
Founded December 29, 1930
Website: www.econometricsociety.org
Administrative Office: Department of Economics, New York University, 19 West 4th Street, New York, NY 10012, USA; Tel. 212-9983820; Fax 212-9954487
General Manager: Claire Sashi ([email protected])

2009 OFFICERS
ROGER B. MYERSON, University of Chicago, PRESIDENT
JOHN MOORE, University of Edinburgh and London School of Economics, FIRST VICE-PRESIDENT
BENGT HOLMSTRÖM, Massachusetts Institute of Technology, SECOND VICE-PRESIDENT
TORSTEN PERSSON, Stockholm University, PAST PRESIDENT
RAFAEL REPULLO, CEMFI, EXECUTIVE VICE-PRESIDENT
2009 COUNCIL
(*)DARON ACEMOGLU, Massachusetts Institute of Technology
MANUEL ARELLANO, CEMFI
SUSAN ATHEY, Harvard University
ORAZIO ATTANASIO, University College London
(*)TIMOTHY J. BESLEY, London School of Economics
KENNETH BINMORE, University College London
TREVOR S. BREUSCH, Australian National University
DAVID CARD, University of California, Berkeley
JACQUES CRÉMER, Toulouse School of Economics
(*)EDDIE DEKEL, Tel Aviv University and Northwestern University
MATHIAS DEWATRIPONT, Free University of Brussels
DARRELL DUFFIE, Stanford University
HIDEHIKO ICHIMURA, University of Tokyo
MATTHEW O. JACKSON, Stanford University
LAWRENCE J. LAU, The Chinese University of Hong Kong
CESAR MARTINELLI, ITAM
HITOSHI MATSUSHIMA, University of Tokyo
MARGARET MEYER, University of Oxford
PAUL R. MILGROM, Stanford University
STEPHEN MORRIS, Princeton University
ADRIAN R. PAGAN, Queensland University of Technology
JOON Y. PARK, Texas A&M University and Sungkyunkwan University
CHRISTOPHER A. PISSARIDES, London School of Economics
ROBERT PORTER, Northwestern University
ALVIN E. ROTH, Harvard University
LARRY SAMUELSON, Yale University
ARUNAVA SEN, Indian Statistical Institute
MARILDA SOTOMAYOR, University of São Paulo
JÖRGEN W. WEIBULL, Stockholm School of Economics
The Executive Committee consists of the Officers, the Editor, and the starred (*) members of the Council.
REGIONAL STANDING COMMITTEES
Australasia: Trevor S. Breusch, Australian National University, CHAIR; Maxwell L. King, Monash University, SECRETARY.
Europe and Other Areas: John Moore, University of Edinburgh and London School of Economics, CHAIR; Helmut Bester, Free University Berlin, SECRETARY; Enrique Sentana, CEMFI, TREASURER.
Far East: Joon Y. Park, Texas A&M University and Sungkyunkwan University, CHAIR.
Latin America: Pablo Andres Neumeyer, Universidad Torcuato Di Tella, CHAIR; Juan Dubra, University of Montevideo, SECRETARY.
North America: Roger B. Myerson, University of Chicago, CHAIR; Claire Sashi, New York University, SECRETARY.
South and Southeast Asia: Arunava Sen, Indian Statistical Institute, CHAIR.
Econometrica, Vol. 77, No. 2 (March, 2009), 363–402
FINANCIAL INNOVATION AND THE TRANSACTIONS DEMAND FOR CASH

BY FERNANDO ALVAREZ AND FRANCESCO LIPPI¹

We document cash management patterns for households that are at odds with the predictions of deterministic inventory models that abstract from precautionary motives. We extend the Baumol–Tobin cash inventory model to a dynamic environment that allows for the possibility of withdrawing cash at random times at a low cost. This modification introduces a precautionary motive for holding cash and naturally captures developments in withdrawal technology, such as the increasing diffusion of bank branches and ATM terminals. We characterize the solution of the model, which qualitatively reproduces several empirical patterns. We estimate the structural parameters using micro data and show that quantitatively the model captures important economic patterns. The estimates are used to quantify the expenditure and interest rate elasticity of money demand, the impact of financial innovation on money demand, the welfare cost of inflation, and the benefit of ATM ownership.

KEYWORDS: Money demand, technological progress, inventory models.
1. INTRODUCTION

THERE IS A LARGE LITERATURE arguing that financial innovation is important for understanding money demand, yet this literature seldom integrates the empirical analysis with a model of the financial innovation. In this paper we develop a dynamic inventory model of money demand that explicitly incorporates the effects of financial innovation on cash management. We estimate the structural parameters of the model using detailed micro data from Italian households, and use the estimates to revisit several classic questions on money demand.

As is standard in inventory theory, we assume that nonnegative cash holdings are needed to pay for cash purchases c. We extend the Baumol (1952) and Tobin (1956) model to a dynamic environment that allows for the opportunity to withdraw cash at random times at no cost. Withdrawals at any other time involve a fixed cost b. The expected number of such free opportunities per period of time is described by a single parameter p. Examples of such opportunities are finding an ATM that does not charge a fee or passing by a bank desk at a time with a low opportunity cost.

¹ We thank the co-editor and three anonymous referees for constructive criticisms on previous versions of the paper. We also thank Alessandro Secchi for his guidance in the construction and analysis of the data base. We benefited from the comments of Manuel Arellano, V. V. Chari, Luigi Guiso, Bob Lucas, Greg Mankiw, Fabiano Schivardi, Rob Shimer, Pedro Teles, Randy Wright, and seminar participants at the University of Chicago, University of Sassari, Harvard University, MIT, Wharton School, Northwestern, FRB of Chicago, FRB of Minneapolis, Bank of Portugal, European Central Bank, Bank of Italy, CEMFI, EIEF, University of Cagliari, University of Salerno, Austrian National Bank, Tilburg University, and Erasmus University Rotterdam. Alvarez thanks the Templeton Foundation for support and the EIEF for their hospitality.
© 2009 The Econometric Society
DOI: 10.3982/ECTA7451
Another interpretation is that p measures the probability that an ATM is working properly or that a bank desk is open for business. Financial innovations, such as the increase in the number of bank branches and ATM terminals, can be modeled as increases in p and decreases in b.

It is useful to split the agent's decision on the financing of her purchases into three parts: (i) choose a technology, indexed by p and b (e.g., adopt the ATM card); (ii) decide the amount of expenditure to be made in cash (c), as opposed to credit or debit; (iii) decide the optimal inventory policy to minimize the cost of cash management for a given technology (p, b) and level of c. This paper focuses on (iii). We stress that given p, b, and c, the inventory problem in (iii) is well defined even if the optimal choice of c in (ii) depends on p and b. In Section 5 we show that the presence of a systematic relationship between c and (p, b) does not bias the estimates of the technological parameters.

Our model changes the predictions of the Baumol–Tobin model (BT henceforth) in ways that are consistent with stylized facts concerning households' cash management behavior. The randomness introduced by p gives rise to a precautionary motive for holding cash: when agents have an opportunity to withdraw cash at zero cost, they do so even if they have some cash on hand. Thus the average cash balance held at the time of a withdrawal relative to the average cash holdings, $\underline{M}/M$, is a measure of the strength of the precautionary motive. This ratio ranges between 0 and 1, and is increasing in p. Using household data for Italy and the United States, we document that $\underline{M}/M$ is about 0.4, instead of 0 as predicted by the BT model. Another property of our model is that a higher p increases the number of withdrawals n and decreases the average withdrawal size W, with W/M ranging between 2 and 0. Using data from Italian households, we measure values of n and W/M that are inconsistent with those predicted by the BT model, but can be rationalized by our model.

We organize the analysis as follows. In Section 2 we use panel data on Italian households to illustrate key cash management patterns, including the strength of the precautionary motive, to compare them with the predictions of the BT model, and to motivate the analysis that follows. Sections 3 and 4 present the theory. Section 3 analyzes the effect of financial innovation using a version of the BT model in which agents have a deterministic number of free withdrawals per period. This model provides a simple illustration of how technology affects the level and the shape of money demand, that is, its interest and expenditure elasticities. Section 4 introduces our benchmark inventory model, a stochastic dynamic version of the one in Section 3. In this model agents have random meetings with a financial intermediary at which they can withdraw money at no cost. We solve analytically for the Bellman equation and characterize its optimal decision rule. We derive the distribution of currency holdings, the aggregate money demand, the average number of withdrawals, the average size of withdrawals, and the average cash balances at the time of a withdrawal. We show that a single index of technology, b·p², determines both the shape of the money demand and the strength
of its precautionary component. While technological improvements (higher p and lower b) unambiguously decrease the level of money demand, their effect on this index—and hence on the shape and the precautionary component of money demand—is ambiguous. The structural estimation of the model parameters will allow us to shed light on this issue. We conclude the section with an analysis of the welfare implications.

Sections 5, 6, and 7 contain the empirical analysis. In Section 5 we estimate the model using the panel data for Italian households. We discuss identification and show that the two parameters p and b are overidentified because we observe four dimensions of household behavior: M, W, $\underline{M}$, and n. The estimates reproduce the sizable precautionary holdings observed in the data. The patterns of the estimates are reasonable: for instance, the parameters for the households with an ATM card indicate their access to a better technology (higher p and lower b). Section 6 studies the implications of our findings for the time pattern of technology and for the expenditure and interest elasticity of the demand for currency. The estimates indicate that technology is better in locations with a higher density of ATM terminals and bank branches, and that it has improved through time. Even though our model can generate interest rate elasticities between 0 and 1/2, and expenditure elasticities between 1/2 and 1, the values implied by the estimates are close to 1/2 for both—the values of the BT model. We discuss how to reconcile this finding with the smaller estimates of the interest rate elasticity that are common in the literature.² In Section 7 we use the estimates to quantify the welfare cost of inflation—relating it to the one in Lucas (2000)—and to measure the benefits of ATM card ownership. In spite of the finding that the interest elasticity is close to the one in BT over the sample, our estimate of the welfare cost is about half of the cost in BT. This happens because the interest elasticity is constant in BT, while it converges to zero in our model as the interest rate becomes nil.

2. CASH HOLDINGS PATTERNS OF ITALIAN HOUSEHOLDS

Table I presents some statistics on the cash holdings patterns of Italian households based on the Survey of Household Income and Wealth.³ All households have checking accounts that pay interest at rates documented below. We report statistics separately for households with and without an ATM card.

² We remark that our interest rate elasticity, as in the BT model, refers to the ratio of the money stock to cash consumption. Of course, if cash consumption relative to total consumption is a function of interest rates, as in the Lucas and Stokey (1987) cash–credit model, the elasticity of money to total consumption will be even higher. A similar argument applies to the expenditure elasticity. The distinction is important for comparing our results with estimates in the literature, which typically use money/total consumption. See for instance Lucas (2000), who used aggregate US data, or Attanasio, Guiso, and Jappelli (2002), who used the same household data used here.

³ A periodic survey of the Bank of Italy that collects data on several social and economic characteristics. The cash management information used below is available only since 1993.
TABLE I
HOUSEHOLDS' CURRENCY MANAGEMENT^a

Variable                                        1993    1995    1998    2000    2002    2004
Expenditure share paid w/ currency^b
  w/o ATM                                       0.68    0.67    0.63    0.66    0.65    0.63
  w. ATM                                        0.62    0.59    0.56    0.55    0.52    0.47
Currency^c M/c (c per day)
  w/o ATM                                         15      17      19      18      17      18
  w. ATM                                          10      11      13      12      13      14
M per household, in 2004 euros^d
  w/o ATM                                        430     490     440     440     410     410
  w. ATM                                         370     410     370     340     330     350
Currency at withdrawals^e $\underline{M}/M$
  w/o ATM                                       0.41    0.31    0.47    0.46    0.46      na
  w. ATM                                        0.42    0.30    0.39    0.45    0.41      na
Withdrawal^f W/M
  w/o ATM                                        2.3     1.7     1.9     2.0     2.0     1.9
  w. ATM                                         1.5     1.2     1.3     1.4     1.3     1.4
No. of withdrawals n (per year)^g
  w/o ATM                                         16      17      25      24      23      23
  w. ATM                                          50      51      59      64      58      63
n normalized: n/(c/(2M)) (c per year)^g
  w/o ATM                                        1.2     1.4     2.6     2.0     1.7     2.0
  w. ATM                                         2.4     2.7     3.8     3.8     3.9     4.1
No. of observations w. ATM card^h               2322    2781    2998    3562    3729    3866
No. of observations w/o ATM card^h              3421    3020    2103    2276    2275    2190

^a The unit of observation is the household. Entries are sample means computed using sample weights. Only households with a checking account and whose head is not self-employed are included, which accounts for about 85% of the sample observations in each year.
^b Ratio of expenditures paid with cash to total expenditures (durables, nondurables, and services).
^c Average currency during the year divided by daily expenditures paid with cash.
^d The average number of adults per household is 2.3. In 2004, 1 euro in Italy was equivalent to $1.25 in the United States, PPP adjusted (Source: World Bank International Comparison Program (ICP) tables).
^e Average currency at the time of withdrawal as a ratio to average currency.
^f Average withdrawal during the year as a ratio to average currency.
^g The entries with n = 0 are coded as missing values.
^h Number of households with bank account for whom the currency and the cash consumption data were available in each survey. Data on withdrawals were supplied by a smaller number of respondents (Source: Bank of Italy Survey of Household Income and Wealth).
The survey records the household expenditure paid in cash during the year (we use cash and currency interchangeably to denote the value of coins and banknotes), which the table displays as a fraction of total consumption. The fraction is smaller for households with an ATM card and displays a downward trend for both types of households. These percentages are comparable to those
for the United States between 1984 and 1995.⁴ The table reports the sample mean of the ratio M/c, where M is the average currency held by the household during the year and c is the daily expenditure paid with currency. We notice that, relative to c, Italian households held about twice as much cash as U.S. households between 1984 and 1995.⁵

Table I reports three statistics that are useful to assess the empirical performance of deterministic inventory models, such as the classic one by Baumol and Tobin. The first statistic is the ratio between currency holdings at the time of a withdrawal ($\underline{M}$) and average currency holdings in each year (M). While this ratio is zero in deterministic inventory-theoretic models, its sample mean in the data is about 0.4. A comparable statistic for U.S. households is about 0.3 in 1984, 1986, and 1995 (see Table 1 in Porter and Judson (1996)). The second statistic is the ratio between the withdrawal amount (W) and average currency holdings.⁶ While this ratio is 2 in the BT model, it is smaller in the data. The sample mean of this ratio for households with an ATM card is below 1.4; for those without an ATM card it is slightly below 2. Inspection of the raw data shows that there is substantial variation across provinces: indeed, the median across households is about 1.0 for households with and without an ATM. The third statistic is the normalized number of withdrawals, n/(c/(2M)). The normalization is chosen so that this statistic equals 1 in BT.⁷ As the table shows, the sample mean of this statistic is well above 1, especially for households with an ATM card.

The second statistic, W/M, and the third, n/(c/(2M)), are related through the accounting identity c = nW. In particular, if W/M is smaller than 2 and the identity holds, then the third statistic must be above 1. Yet we present separate sample means for these statistics because of the large measurement error in all these variables. This is informative because W enters in the first statistic but not in the second, and c enters in the third but not in the second. In the estimation section of the paper, we consider the effect of measurement error systematically, without altering the conclusion about the drawbacks of deterministic inventory-theoretic models.
⁴ Humphrey (2004) estimated that the mean share of total expenditures paid with currency in the United States was 36% in 1984 and 28% in 1995. If expenditures paid with checks are added to those paid with currency, the resulting statistics are about 85% and 75% in 1984 and 1995, respectively. The measure including checks was used by Cooley and Hansen (1991) to compute the share of cash expenditures for households in the United States where, contrary to the practice in Italy, checking accounts did not pay interest. For comparison, the mean share of total expenditures paid with currency by all Italian households was 65% in 1995.

⁵ Porter and Judson (1996), using currency and expenditure paid with currency, estimated that M/c was about 7 days both in 1984 and in 1986, and 10 days in 1995. A calculation for Italy following the same methodology yields about 20 and 17 days in 1993 and 1995, respectively.

⁶ The withdrawal amount is computed as the weighted average of ATM and bank desk withdrawals. Since in Italy there is no cash back at withdrawals, this measures the withdrawal amount quite accurately. See Appendix A in Alvarez and Lippi (2009) for more documentation.

⁷ In the BT model, the accounting identity nW = c holds and, since withdrawals only happen when cash balances reach zero, M = W/2.
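To make the three diagnostics concrete, the sketch below computes them for a single hypothetical household; the numbers are ours, chosen only to loosely resemble the ATM-card sample means in Table I, and are not taken from the survey. Note how badly the accounting identity c = nW can fail in measured data, which is one reason the table reports sample means of the statistics separately.

```python
def cash_diagnostics(M, M_under, W, n, c):
    """M: average currency; M_under: currency at withdrawal; W: average
    withdrawal; n: withdrawals per year; c: yearly cash expenditure.
    In Baumol-Tobin: W/M = 2, M_under/M = 0, and the normalized n is 1."""
    return {
        "W/M": W / M,
        "M_under/M": M_under / M,
        "n normalized": n / (c / (2 * M)),
        "identity gap c - n*W": c - n * W,  # zero absent measurement error
    }

# Hypothetical household, roughly in line with the ATM-card means above:
print(cash_diagnostics(M=350, M_under=140, W=480, n=60, c=10_500))
```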
TABLE II
FINANCIAL INNOVATION AND THE OPPORTUNITY COST OF CASH^a

Variable                        1993         1995         1998         2000         2002         2004
Bank branches^b              0.38 (0.13)  0.42 (0.14)  0.47 (0.16)  0.50 (0.17)  0.53 (0.18)  0.55 (0.18)
ATM terminals^b              0.31 (0.18)  0.39 (0.19)  0.50 (0.22)  0.57 (0.22)  0.65 (0.23)  0.65 (0.22)
Interest rate on deposits^c  6.1 (0.4)    5.4 (0.3)    2.2 (0.2)    1.7 (0.2)    1.1 (0.2)    0.7 (0.1)
Probability of cash theft^d  2.2 (2.6)    1.8 (2.1)    2.1 (2.4)    2.2 (2.5)    2.1 (2.4)    2.2 (2.6)
CPI inflation (%)            4.6          5.2          2.0          2.6          2.6          2.3

^a Mean (standard deviation in parentheses) across provinces.
^b Per thousand residents (Source: supervisory reports to the Bank of Italy and the Italian Central Credit Register).
^c Net nominal interest rates in percent. Arithmetic average between the self-reported interest on deposit account (Source: Survey of Household Income and Wealth) and the average deposit interest rate reported by banks in the province (Source: central credit register).
^d We estimate this probability using the time and province variation from statistics on reported crimes on purse snatching and pickpocketing. The level is adjusted to take into account both the fraction of unreported crimes as well as the fraction of cash stolen for different types of crimes using survey data on victimization rates (Source: Istituto nazionale di statistica (Istat) and authors' computations; see Appendix B in Alvarez and Lippi (2009) for details).
For each year, Table II reports the mean and standard deviation across provinces of the diffusion of bank branches and ATM terminals, and of two components of the opportunity cost of holding cash: the interest rate paid on deposits and the probability of cash theft. The diffusion of bank branches and ATM terminals varies significantly across provinces and is increasing through time. Differences in the nominal interest rate across time are due mainly to the disinflation. The variation of nominal interest rates across provinces mostly reflects the segmentation of banking markets. The large differences in the probability of cash theft across provinces reflect variation in crime rates across rural vs. urban areas, and a higher incidence of such crimes in the North.

Lippi and Secchi (2009) reported that the household data display patterns in line with previous empirical studies showing that the demand for currency decreases with financial development and that its interest elasticity is below 1/2.⁸ Tables I and II show that the opportunity cost of cash in 2004 is about 1/3 of its value in 1993 (the corresponding ratio for the nominal interest rate is about 1/9) and that the average of M/c shows an upward trend. Indeed, the average of M/c across households of a given type (with and without
ATM cards) is negatively correlated with the opportunity cost R in the cross-section, the time-series, and the pooled time-series and cross-section data. Yet the largest estimate of the interest rate elasticity is smaller than 0.25, and in most cases it is about 0.05 (in absolute value). Such patterns are consistent with both shifts of the money demand and movements along it. Our model and estimation strategy allow us to quantify each of them.

⁸ They estimated that the elasticity of cash holdings with respect to the interest rate is about 0 for agents who hold an ATM card and −0.2 for agents who do not.

3. A MODEL WITH DETERMINISTIC FREE WITHDRAWALS

This section presents a modified version of the BT model to illustrate how technological progress affects the level and interest elasticity of the demand for currency. Consider an agent who finances a consumption flow c by making n withdrawals from a deposit account. Let R be the opportunity cost of cash (e.g., the nominal interest rate on a deposit account). In a deterministic setting, the agent's cash balances decrease until they hit zero, at which point a new withdrawal must take place. Hence the size of each withdrawal is W = c/n and the average cash balance is M = W/2. In the BT model, agents pay a fixed cost b for each withdrawal. We modify the latter by assuming that the agent has p free withdrawals, so that if the total number of withdrawals is n, she pays only for the excess of n over p. Technology is thus represented by the parameters b and p.

For concreteness, assume that the cost of a withdrawal is proportional to the distance to an ATM or bank branch. In a given period the agent moves across locations, for reasons unrelated to her cash management, so that p is the number of times that she is in a location with an ATM or bank branch. At any other time, b is the distance that the agent must travel to withdraw. In this setup, an increase in the density of bank branches or ATMs increases p and decreases b. The optimal number of withdrawals solves the minimization problem

(1) $$\min_{n}\ \left\{R\,\frac{c}{2n} + b\,\max(n-p,\,0)\right\}.$$

It is immediate that the value of n that solves the problem, and its associated M/c, depends only on p and on β ≡ b/(cR), the ratio of the two costs. The money demand for a technology with p ≥ 0 is given by

(2) $$\frac{M}{c} = \frac{1}{2p}\,\min\left\{\sqrt{\frac{2\hat{b}}{R}},\ 1\right\},\qquad\text{where } \hat{b} \equiv \frac{b\,p^{2}}{c}.$$

To understand the workings of the model, fix b and consider the effect of increasing p (so that b̂ increases). For p = 0 we have the BT setup, so that when R is small the agent economizes on withdrawals and chooses a large value of M. Now consider the case of p > 0. In this case there is no reason to have fewer than p withdrawals, since these are free by assumption. Hence, for all R ≤ 2b̂ the agent chooses the same level of money holdings, namely M = c/(2p): she is not paying for any withdrawal but is subject to a positive opportunity cost. Note that the interest elasticity is zero for R ≤ 2b̂. Thus, as p (and hence b̂) increases, the money demand has a lower level and a lower interest rate elasticity than the money demand from the BT model. Indeed, (2) implies that the range of interest rates R for which the money demand is smaller and has lower interest rate elasticity is increasing in p. On the other hand, if we fix b̂ and increase p, the only effect is to lower the level of money demand. The previous discussion makes clear that for fixed p, b̂ controls the "shape" of the money demand, and for fixed b̂, p controls its level. We think of technological improvements as both increasing p and lowering b: the net effect on b̂, and hence on the slope of the money demand, is in principle ambiguous. The empirical analysis below allows us to sign and quantify this effect.
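A minimal numerical sketch of this money demand (our code, not the authors'; parameter values are purely illustrative): it evaluates equation (2) and shows the flat, zero-elasticity segment for R ≤ 2b̂.

```python
import math

def money_demand(b, c, R, p):
    """M/c from equation (2); reduces to the BT value when p = 0."""
    if p == 0:
        return math.sqrt(b / (2 * c * R))      # BT: M = sqrt(bc/(2R))
    b_hat = b * p ** 2 / c                      # the technology index
    return min(math.sqrt(2 * b_hat / R), 1.0) / (2 * p)

b, c, p = 0.01, 1.0, 2.0                        # implies b_hat = 0.04
for R in (0.02, 0.05, 0.2):                     # flat segment for R <= 0.08
    print(R, money_demand(b, c, R, p))
```

The first two interest rates fall in the flat segment (M/c = 1/(2p) = 0.25), while the third lies on the downward-sloping BT branch.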
4. A MODEL WITH RANDOM FREE WITHDRAWALS

This section presents a model that generalizes the example of the previous section in several dimensions. It takes explicit account of the dynamic nature of the cash inventory problem, as opposed to minimizing the average steady state cost. It distinguishes between real and nominal variables, as opposed to financing a constant nominal expenditure or, alternatively, assuming zero inflation. Most importantly, it assumes that the agent has a Poisson arrival of free opportunities to withdraw cash at rate p. Relative to the deterministic model, the randomness gives rise to a precautionary motive, so that some withdrawals occur when the agent still has a positive cash balance and the (average) W/M ratio is smaller than 2, as observed in Table I. The model retains the feature, discussed in Section 3, that the interest rate elasticity is smaller than 1/2 and is decreasing in the parameter p. It also generalizes the sense in which the shape of the money demand depends on the parameter b̂ = p²b/c.

We assume that the agent is subject to a cash-in-advance constraint and minimizes the cost of financing a given constant flow of cash consumption, denoted by c. Let m ≥ 0 denote the agent's nonnegative real cash balances, which decrease due to consumption and inflation:

(3) $$\frac{dm(t)}{dt} = -c - m(t)\,\pi \qquad\text{for almost all } t \ge 0.$$

The agent can withdraw or deposit at any time from an account that yields a net real interest rate r. Transfers from the interest-bearing account to cash balances are indicated by discontinuities in m: a withdrawal is a jump up in the cash balances, that is, m(t⁺) − m(t⁻) > 0, and likewise for a deposit. There are two sources of randomness in the environment, described by independent Poisson processes with intensities p₁ and p₂. The first source describes
the arrivals of "free adjustment opportunities" (see the Introduction for examples); the second describes the arrival times at which the agent's cash balances are stolen. We assume that a fixed cost b is paid for each adjustment, unless it happens at the time of a free adjustment opportunity. We can write the problem of the agent as
(4) $$G(m) = \min_{\{m(t),\,\tau_j\}} E_0\left[\sum_{j=0}^{\infty} e^{-r\tau_j}\left(I_{\tau_j}\,b + \left(m(\tau_j^{+}) - m(\tau_j^{-})\right)\right)\right]$$
subject to (3) and m(t) ≥ 0, where the τ_j denote the stopping times at which an adjustment (jump) of m takes place and m(0) = m is given. The indicator I_{τ_j} is 0—so the cost is not paid—if the adjustment occurs upon a free opportunity; otherwise it is 1. The expectation is taken with respect to the two Poisson processes. The parameters of this problem are r, π, p₁, p₂, b, and c.

4.1. Bellman Equations and Optimal Policy

We turn to the characterization of the Bellman equation and of its associated optimal policy. We will guess, and later verify, that the optimal policy is described by two thresholds for m: 0 < m* < m**. The threshold m* is the value of cash that the agent chooses after a contact with a financial intermediary; we refer to it as the optimal cash replenishment level. The threshold m** is the value of cash beyond which the agent pays the cost b, contacts the intermediary, and makes a deposit so as to leave her cash balances at m*. Assuming that the optimal policy is of this type and that for m ∈ (0, m**) the value function G is differentiable, it must satisfy
(5) $$rG(m) = G'(m)(-c-\pi m) + p_1 \min_{\hat{m}\ge 0}\left[\hat{m} - m + G(\hat{m}) - G(m)\right] + p_2 \min_{\hat{m}\ge 0}\left[b + \hat{m} + G(\hat{m}) - G(m)\right].$$
The first term gives the change in the value function per unit of time, conditional on no arrival of either a free adjustment or a cash theft. The second term gives the expected change conditional on the arrival of a free adjustment opportunity: an adjustment m̂ − m is incurred instantly, with its associated "capital gain" G(m̂) − G(m). Likewise, the third term gives the change in the value function conditional on a cash theft. In this case the cost b must be paid and the cash adjustment equals m̂. Upon being matched with a financial intermediary the agent replenishes her balances to m = m*, which solves
(6) $$m^* = \arg\min_{\hat{m}\ge 0}\left\{\hat{m} + G(\hat{m})\right\}.$$
This problem has two boundary conditions. First, if m = 0, the agent withdraws to prevent violation of the nonnegative cash constraint in the next instant. Second, for m ≥ m**, the agent pays b and deposits cash in excess of m*. Combining these boundary conditions with (5), we have
(7) $$G(m)=\begin{cases} b+m^*+G(m^*) & \text{if } m=0,\\[4pt] \dfrac{-G'(m)(c+\pi m)+(p_1+p_2)\left[m^*+G(m^*)\right]+p_2 b-p_1 m}{r+p_1+p_2} & \text{if } m\in(0,m^{**}),\\[4pt] b+m^*-m+G(m^*) & \text{if } m\ge m^{**}. \end{cases}$$
For the assumed configuration to be optimal it must be the case that the agent prefers not to pay the cost b and adjust money balances in the relevant range: (8)
$$m+G(m) < b+m^*+G(m^*)\qquad\text{for } m\in(0,m^{**}).$$
Summarizing, we say that m*, m**, and G(·) solve the Bellman equation for the total cost problem (4) if they satisfy (6), (7), and (8). We define a related problem that is closer to the standard inventory-theoretic problem, in which the agent minimizes the shadow cost

(9) $$V(m) = \min_{\{m(t),\,\tau_j\}} E_0\left[\sum_{j=0}^{\infty} e^{-r\tau_j}\left(I_{\tau_j}\,b + \int_{0}^{\tau_{j+1}-\tau_j} e^{-rt}\,R\,m(t+\tau_j)\,dt\right)\right]$$
subject to (3) and m(t) ≥ 0, where the τ_j denote the stopping times at which an adjustment (jump) of m takes place and m(0) = m is given.⁹ The indicator I_{τ_j} equals 0 if the adjustment takes place at the time of a free adjustment opportunity; otherwise it is 1. In this formulation, R is the opportunity cost of holding cash and there is only one Poisson process, with intensity p, describing the arrival of a free opportunity to adjust. The parameters of the problem are r, R, π, p, b, and c. The derivation of the Bellman equation for an agent unmatched with a financial intermediary and holding a real value of cash m follows the same logic used to derive equation (5). The only decision that the agent must make is whether to remain unmatched or to pay the fixed cost b and be matched with a financial intermediary. Denoting by V′(m) the derivative of V(m) with respect to m, the Bellman equation satisfies
(10) $$rV(m) = Rm + p\,\min_{\hat{m}\ge 0}\left(V(\hat{m}) - V(m)\right) + V'(m)(-c-m\pi).$$
9 The shadow cost formulation is standard in the literature on inventory-theoretic models, as in, for example, Baumol (1952), Tobin (1956), Miller and Orr (1966), and Constantinides (1976). In these papers the problem is to minimize the steady state cost of a stationary inventory policy. This differs from our formulation, where the agent minimizes the expected discounted cost in (9). In this regard, our analysis follows that of Constantinides and Richard (1978). In a related model, Frenkel and Jovanovic (1980) compared the resulting money demand arising from minimizing the steady state vs. the expected discounted cost.
Upon being matched with a financial intermediary, the agent chooses the optimal adjustment, setting m = m*, or

(11) $$V^* \equiv V(m^*) = \min_{\hat{m}\ge 0} V(\hat{m}).$$
As in problem (4), we conjecture that the optimal policy is described by two threshold values satisfying 0 < m* < m**. This requires two boundary conditions. At m = 0, the agent must pay the cost b and withdraw; for m ≥ m**, the agent pays the cost b and deposits cash in excess of m*.¹⁰ Combining these boundary conditions with (10), we have

(12) $$V(m)=\begin{cases} V^*+b & \text{if } m=0,\\[4pt] \dfrac{Rm+pV^*-V'(m)(c+m\pi)}{r+p} & \text{if } m\in(0,m^{**}),\\[4pt] V^*+b & \text{if } m\ge m^{**}. \end{cases}$$

To ensure that it is optimal not to pay the cost and contact the intermediary in the relevant range, we require
(13) $$V(m) < V^*+b\qquad\text{for } m\in(0,m^{**}).$$
Summarizing, we say that m*, m**, and V(·) solve the Bellman equation for the shadow cost problem (9) if they satisfy (11), (12), and (13). We are now ready to show that (4) and (9) are equivalent and to characterize the solution.

PROPOSITION 1: Assume that the opportunity cost is given by R = r + π + p₂ and that the contact rate with the financial intermediary is p = p₁ + p₂. Assume that the functions G(·) and V(·) satisfy
(14) $$G(m) = V(m) - m + c/r + p_2 b/r$$
for all m ≥ 0. Then m*, m**, and G(·) solve the Bellman equation for the total cost problem (4) if and only if m*, m**, and V(·) solve the Bellman equation for the shadow cost problem (9).

See the Appendix for the proof. Notice that the total and the shadow cost problems are described by the same number of parameters. They have r, π, c, and b in common. The total cost problem uses p₁ and p₂, while the shadow cost problem uses R and p. That R = r + π + p₂ is simple: the shadow cost of holding money is given by the real opportunity cost of investing, r, plus the fact that cash holdings lose real value continually at a rate π and are lost entirely with probability p₂ per

¹⁰ Since withdrawals are the agent's only source of cash in this economy, in the invariant distribution money holdings are distributed on the interval (0, m*) and m** is never reached.
unit of time. That p = p₁ + p₂ is clear too, since the effect of either shock is to force an adjustment of cash balances. Treating theft as part of the opportunity cost allows us to parameterize R as being, at least conceptually, independent of r and π. Quantitatively, we think that, at least for low nominal interest rates, the presence of other opportunity costs may be important.

The relation between G and V in (14) is intuitive. First, the constant c/r is required since, even if all withdrawals were free, consumption expenditures must be financed. Second, the constant p₂b/r is the present value of all the withdrawal costs paid upon a cash theft. This adjustment is required because in the shadow cost problem there is no theft. Third, the term m has to be subtracted from V since this amount has already been withdrawn from the interest-bearing account.

From now on we use the shadow cost formulation, since it is closer to the standard inventory decision problem. The predictions of the two models concerning cash holdings statistics are identical for M and n, and display a small difference for W and $\underline{M}$, which is discussed later. The next proposition gives one nonlinear equation whose unique solution determines the cash replenishment value m* as a function of the model parameters R, π, r, p, c, and b.

PROPOSITION 2: Assume that r + π + p > 0. The optimal return point m*/c depends on three arguments: β ≡ b/(cR), r + p, and π. The return point m* is given by the unique positive solution to

(15) $$\left(\frac{m^*}{c}\,\pi + 1\right)^{1+(r+p)/\pi} = \frac{m^*}{c}\,(r+p+\pi) + 1 + (r+p)(r+p+\pi)\,\frac{b}{cR}.$$

The optimal return point m* has the following properties:
(i) m*/c is increasing in b/(cR), with m*/c = 0 when b/(cR) = 0, and m*/c → ∞ as b/(cR) → ∞.
(ii) For small b/(cR), $m^*/c = \sqrt{2b/(cR)} + o\left(\sqrt{b/(cR)}\right)$, where o(z)/z → 0 as z → 0.
(iii) The elasticity of m* with respect to p, evaluated at zero inflation, satisfies
$$0 \le -\frac{p}{m^*}\frac{dm^*}{dp}\bigg|_{\pi=0} \le \frac{p}{p+r}.$$
(iv) The elasticity of m* with respect to R, evaluated at zero inflation, satisfies $0 \le -(R/m^*)(dm^*/dR)\big|_{\pi=0} \le 1/2$, is decreasing in p, and satisfies
$$-\frac{R}{m^*}\frac{\partial m^*}{\partial R}\bigg|_{\pi=0} \to \frac{1}{2}\ \ \text{as}\ \ \frac{\hat{b}}{R}\to 0 \qquad\text{and}\qquad -\frac{R}{m^*}\frac{\partial m^*}{\partial R}\bigg|_{\pi=0} \to 0\ \ \text{as}\ \ \frac{\hat{b}}{R}\to \infty,$$
where b̂ ≡ (p+r)²b/c.

See the Appendix for the proof.
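Equation (15) has no closed form solution, but its unique positive root is straightforward to compute. The sketch below is ours, with illustrative parameters: it finds m*/c by bisection, using a small positive π as a stand-in for zero inflation, and checks property (ii), namely that the root approaches the BT value √(2b/(cR)) when b/(cR) is small.

```python
from math import sqrt

def m_star_over_c(b, c, R, r, p, pi=1e-6):
    """Unique positive root of equation (15); requires pi > 0, with a
    small pi approximating zero inflation."""
    beta = b / (c * R)
    def F(x):  # x = m*/c; F < 0 at x = 0 and F -> +inf as x grows
        return (x * pi + 1) ** (1 + (r + p) / pi) - (
            x * (r + p + pi) + 1 + (r + p) * (r + p + pi) * beta)
    lo, hi = 0.0, 1.0
    while F(hi) < 0:           # bracket the root
        hi *= 2
    for _ in range(200):       # bisection
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if F(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)

b, c, R, r, p = 1e-4, 1.0, 0.05, 0.02, 10.0     # small b/(cR) = 0.002
print(m_star_over_c(b, c, R, r, p), sqrt(2 * b / (c * R)))  # close values
```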
Note that, keeping r and π fixed, the solution for m*/c is a function of b/(cR), as in the steady state money demand of Section 3. Hence m* is homogeneous of degree 1 in (c, b). Result (ii) shows that when b/(cR) is small, the resulting money demand is well approximated by the BT model. Part (iv) shows that the absolute value of the interest elasticity ranges between 0 and 1/2, and that it is decreasing in p (at low inflation). In the limits, we use b̂ to write a comparative static result for the interest elasticity of m* with respect to p. Indeed, for r = 0 we have already given an economic interpretation to b̂ in Section 3, to which we will return in Proposition 6. Since m* is a function of b/(cR), the elasticity of m* with respect to b/c equals that with respect to R with the opposite sign.

The next proposition gives a closed form solution for the function V(·) and the scalar V* in terms of m*.

PROPOSITION 3: Assume that r + π + p > 0. Let m* be the solution of (15).
(i) The value for an agent not matched with a financial institution, for m ∈ (0, m**), is given by the convex function

(16) $$V(m) = \frac{pV^* - Rc/(r+p+\pi)}{r+p} + \frac{R}{r+p+\pi}\,m + A\left(\frac{c}{r+p}\right)^{2}\left(1+\pi\,\frac{m}{c}\right)^{-(r+p)/\pi},$$

where $A = \dfrac{r+p}{c^{2}}\left(Rm^* + (r+p)\,b + \dfrac{Rc}{r+p+\pi}\right) > 0$. For m = 0 or m ≥ m**, V(m) = V* + b.
(ii) The value for an agent matched with a financial institution is V* = (R/r)m*.

See the Appendix for the proof. The close relationship between the value function at zero cash and the optimal return point, V(0) = (R/r)m* + b, derived in this proposition will be useful to measure the gains of different financial arrangements.

4.2. Cash Holdings Patterns of the Model

This section derives the model's predictions for observable cash management statistics under the invariant distribution of real cash holdings when a policy (m*, p, c) is followed and the inflation rate is π. Throughout the section, m* is treated as a parameter, so that the policy is to replenish cash holdings to m* when a free withdrawal opportunity occurs or when m = 0.
Our first result is to compute the expected number of withdrawals per unit of time, denoted by n. By fundamental renewal theory, n equals the reciprocal of the expected time between withdrawals, which gives¹¹

(17) $$n\left(\frac{m^*}{c},\pi,p\right) = \frac{p}{1 - \left(1+\pi\,\frac{m^*}{c}\right)^{-p/\pi}}.$$

As can be seen from expression (17), the ratio n/p ≥ 1: in addition to the p free withdrawals, n includes the costly withdrawals that agents make when they exhaust their cash. In the limit p = π = 0 this formula yields exactly the BT expression, n = c/m*.

¹¹ The time between withdrawals is distributed as a truncated exponential with parameter p. It is exponential because free withdrawal opportunities arrive at rate p. Since agents must withdraw when m = 0, the distribution is truncated at t̄ = (1/π) log(1 + π m*/c), the time needed to deplete cash balances from m* to 0 conditional on not having a free withdrawal. Simple algebra gives the equation in the text.
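A quick numerical check of (17), under assumed parameter values of our choosing: the function below evaluates the formula and illustrates both n/p ≥ 1 and the BT limit n → c/m* as p and π vanish.

```python
def n_withdrawals(m_star, c, p, pi):
    """Equation (17): expected withdrawals per unit of time (pi > 0)."""
    return p / (1 - (1 + pi * m_star / c) ** (-p / pi))

m_star, c = 0.05, 1.0
print(n_withdrawals(m_star, c, p=20.0, pi=0.02))   # ~31.6, so n/p ~ 1.58
# BT limit: as p and pi shrink, n approaches c/m* = 20.
for p, pi in [(1.0, 0.02), (0.1, 0.002), (0.01, 0.0002)]:
    print(p, n_withdrawals(m_star, c, p, pi))
```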
The next proposition derives the density of the invariant distribution of real cash balances as a function of p, π, c, and m*/c.

PROPOSITION 4: (i) The density of the real balances m is

(18) $$h(m) = \frac{p}{c}\,\frac{\left(1+\pi\,\frac{m}{c}\right)^{p/\pi - 1}}{\left(1+\pi\,\frac{m^*}{c}\right)^{p/\pi} - 1} \qquad\text{for } m\in(0,m^*).$$

(ii) Let H(m; m₁*) be the cumulative distribution function (CDF) of m for a given m*. Let m₁* < m₂*. Then H(m; m₂*) ≤ H(m; m₁*); that is, H(·; m₂*) first order stochastically dominates H(·; m₁*).

For the proof, see the Appendix. The density of m solves the ordinary differential equation (ODE) (see the proof of Proposition 4)

(19) $$\frac{\partial h(m)}{\partial m} = \frac{(p-\pi)}{(\pi m + c)}\,h(m)$$

for any m ∈ (0, m*). There are two forces that determine the shape of this density. One is that agents replenish their balances to m* at rate p. The other is that inflation erodes the real value of their nominal balances. For p = π = 0 the two effects cancel and the distribution is uniform, as in BT.

We define the average money demand as $M = \int_0^{m^*} m\,h(m)\,dm$. Using the expression for h(m), integration gives

(20) $$\frac{M}{c}\left(\frac{m^*}{c},\pi,p\right) = \frac{\left(1+\pi\,\frac{m^*}{c}\right)^{p/\pi}\left(p\,\frac{m^*}{c}-1\right) + 1}{(p+\pi)\left[\left(1+\pi\,\frac{m^*}{c}\right)^{p/\pi} - 1\right]}.$$

The function (M/c)(·, π, p) is increasing in m*, which follows immediately from part (ii) of Proposition 4. Next we show that, for a fixed m*, M is increasing in p:

PROPOSITION 5: The ratio M/m* is increasing in p, with M/m* = 1/2 for p = π and M/m* → 1 as p → ∞.

For the proof, see the Appendix. Note that p = π = 0 gives BT, that is, M/m* = 1/2. The other limit corresponds to the case where the agent is continuously replenishing her balances.

The average withdrawal is $W = m^*\left[1-\frac{p}{n}\right] + \frac{p}{n}\int_0^{m^*}(m^*-m)\,h(m)\,dm$. The term [1 − p/n] is the fraction of withdrawals, of size m*, that occur when m = 0. The complementary fraction gives the withdrawals that occur upon a chance meeting with the intermediary, which are of size m* − m and happen with frequency h(m). Combining the previous results, we can see that for p ≥ π the ratio of withdrawals to average cash holdings is less than 2. To see this, use the definition of W to write

(21) $$\frac{W}{M} = \frac{m^*}{M} - \frac{p}{n}.$$

Since M/m* ≥ 1/2, then W/M ≤ 2. Notice that for p large enough this ratio can be smaller than 1. In the BT model W/M = 2, while in the data of Table I the average ratio is below 1.5 and its median value is 1 for households with an ATM card. The intuition for this result is clear: the agent takes advantage of the free random withdrawals regardless of her cash balances, hence the withdrawals are distributed on [0, m*], as opposed to being concentrated at m*, as in the BT model.

Let $\underline{M}$ be the average amount of money at the time of a withdrawal. The derivation, analogous to that for W, gives $\underline{M} = 0\cdot\left[1-\frac{p}{n}\right] + \frac{p}{n}\int_0^{m^*} m\,h(m)\,dm$, or

(22) $$\underline{M} = \frac{p}{n}\,M,$$

which implies that 0 < $\underline{M}$/M < 1, with $\underline{M}$/M → 0 as p → 0 and $\underline{M}$/M → 1 as p → ∞. Other researchers, noticing that currency holdings are positive at the time of withdrawals, account for this feature by adding a constant amount of cash, $\underline{M}$, to the sawtoothed path of a deterministic inventory model, which implies that the average cash balance is $M_1 = \underline{M} + 0.5\,c/n$ or $M_2 = \underline{M} + 0.5\,W$; see, for example, equations (1) and (2) in Attanasio, Guiso, and Jappelli (2002) and Table 1 in Porter and Judson (1996). Instead, in our model $W/2 < M < \underline{M} + W/2$. The leftmost inequality is a consequence of Proposition 5 and equation (21); the other can be derived using the form of the optimal decision rules and the law of motion of cash flows (see Appendix C in Alvarez and Lippi (2009)). Hence in our model M₁ and M₂ are upward biased estimates of M. Indeed the data of Table I show they overestimate M by a large margin.¹²

We finish with two comments on W/M and $\underline{M}$/M. First, these ratios, as is the case with M/c and n, are functions of three arguments: m*/c, p, and π. This property is useful in the estimation, where we use the normalization m*/c. Second, as mentioned in the comment to Proposition 1, the statistics for W/M and $\underline{M}$/M produced by the total cost problem differ from those for the shadow cost problem displayed in (21) and (22). The expressions for the total cost problem are $W/M = m^*/M - p/n + p_2/n$ and $\underline{M}/M = p/n - p_2/n$. The correction term p₂/n is due to the effect of cash theft. Quantitatively, the effect of p₂/n on W/M and $\underline{M}$ is negligible compared to p, and hence we ignore this term in the expressions for W and $\underline{M}$ below. Note that, instead, p₂ is quantitatively important in the computation of the opportunity cost of cash, R = r + π + p₂, at low inflation rates.

¹² The expression for M₁ overestimates the average cash by 20% and 140% for households with and without ATMs, respectively; the one for M₂ overestimates by 7% and 40%, respectively.

4.3. Comparative Statics on M, $\underline{M}$, W, and Welfare

We begin with a comparative statics exercise on M, $\underline{M}$, and W in terms of the primitive parameters b/c, p, and R. To do this, we combine the results of Proposition 2, which describe how m*/c depends on p, b/c, and R, with the results of Section 4.2, which show how M, $\underline{M}$, and W change as functions of m*/c and p. The next proposition defines a one dimensional index b̂ ≡ (b/c)p² that characterizes the shape of the money demand and the strength of the precautionary motive, focusing on π = r = 0. When r → 0, our problem is equivalent to minimizing the steady state cost. The choice of π = r = 0 simplifies comparison of the analytical results with those of the original BT model and with those of Section 3.
INNOVATION AND THE DEMAND FOR CASH
379
PROPOSITION 6: Let π = 0 and r → 0. The ratios W /M, M/M, and (M/c)p ˆ are determined by three strictly monotone functions of b/R that satisfy
as
bˆ → 0: R
as
bˆ → ∞: R
W → 2 M
W → 0 M
Mp c → 1; 2 bˆ ∂ log R Mp ∂ log c → 0 bˆ ∂ log R
∂ log
M → 0 M
M → 1 M
See the Appendix for the proof. ˆ The limit where b/R → 0 corresponds to the BT model, where W /M = 2, ∂ log(M/c) M/M = 0, and ∂ log R = −1/2 for all b/c and R. The elasticity of (M/c)p with ˆ respect to b/R determines the effect of the technological parameters b/c and p on the level of money demand, as well as on the interest rate elasticity of M/c with respect to R since ˆ b ∂ log(M/c)p ∂ log(M/c) η ≡ =− ˆ R ∂ log R ∂ log(b/R) Direct computation gives that ˆ ∂ log(M/c) b = −1 + 2η ≤0 ∂ log p R
ˆ ∂ log(M/c) b and 0 ≤ =η ∂ log(b/c) R
The effects of p, b/c, and R on M/c are smooth versions of those described in the model with p deterministic free withdrawals in Section 3; the effects on W and M differ due to the precautionary demand generated by the random withdrawal cost. ˆ Figure 1 plots W /M, M/M, and η as functions of b/R. This figure completely characterizes the shape of the money demand and the strength of the ˆ precautionary motive since the functions plotted in it depend only on b/R. The ˆ range of the b/R values used in this figure is chosen to span the variation of the estimates presented in Table V. While this figure is based on results for π = r = 0, the figure obtained using the values of π and r that correspond to the averages for Italy during 1993–2004 is quantitatively indistinguishable. We conclude this section with a result on the welfare cost of inflation and the effect of technological change. Let (R κ) be the vector of parameters that index the value function V (m; R κ) and the invariant distribution h(m; R κ), where κ = (π r b p c). We define the average flow cost of cash purchases
380
F. ALVAREZ AND F. LIPPI
FIGURE 1.—W /M, M/M, m∗ /M, and η = elasticity of (M/c)p.
m∗ borne by households v(R κ) ≡ 0 rV (m; R κ)h(m; R κ) dm. We measure the benefit of lower inflation for households, say as captured by a lower R and π, or of a better technology, say as captured by a lower b/c or a higher p, by comparing v(·) for the corresponding values of (R κ). A related concept is (R κ), the expected withdrawal cost borne by households that follow the optimal rule (23)
(R κ) = n(m∗ (R κ) p π) − p · b
where n is given in (17) and the expected number of free withdrawals, p, are subtracted. The value of (R κ) measures the resources wasted trying to economize on cash balances, that is, the deadweight loss for the society corresponding to R. While is the relevant measure of the cost for the society, we find it useful to define v separately to measure the consumers’ benefit of using ATM cards. The next proposition characterizes (R κ) and v(R κ) as r → 0. This limit is useful for comparison with the BT model and it also turns out to be an excellent approximation for the values of r used in our estimation.
INNOVATION AND THE DEMAND FOR CASH
381
PROPOSITION 7: Let r → 0. Then (i) v(R κ) = Rm∗ (R κ); (ii) v(R κ) = R ˜ κ) d R; ˜ and (iii) (R κ) = v(R κ) − RM(R κ). M(R 0 For the proof, see the Appendix. The proposition implies that the loss for society coincides with the consumer R ˜ d R˜ − surplus that can be gained by reducing R to 0, that is, (R) = 0 M(R) 13 RM(R). This extends the result of Lucas (2000), derived from a money-inthe-utility-function model, to an explicit inventory-theoretic model. Note that measuring the welfare cost of inflation using consumer surplus requires estimation of the money demand for different interest rates, while the approach using (i) and (iii) can be implemented once M, W , and M are known, since W + M = m∗ . In the BT model = RM, since m∗ = W = 2M. Relative to BT, the welfare cost in our model is /(RM) = m∗ /M − 1, a value that ranges between 1 and 0, as can be seen in Figure 1. Hence the cost of inflation in our model is smaller than in BT. The difference is due to the behavior of the interest elasticity: while it is constant and equal to 1/2 in BT, the elasticity is between 1/2 and 0 in our model, and is smaller at lower interest rates (recall Proposition 6). Section 7 presents a quantitative application of this result. Finally, note that the loss for society is smaller than the cost for households; using (i)–(iii) and Figure 1 the ˆ two can be easily compared. As b/R ranges from 0 to ∞, the ratio of the costs
/v decreases from 1/2 (the BT value) to 0. 5. ESTIMATION OF THE MODEL We estimate the parameters (p b/c) using the data described in Section 2 under two alternative sets of assumptions. Our baseline assumptions are that all households in the same cell (to be defined below) have the same parameters (p b/c). Alternatively, in Section 5.3 we assume that (p b/c) comprises a simple parametric function of individual household characteristics, that is, an instance of “observed heterogeneity.” In both cases we take the opportunity cost R as observable (see Table II), and assume that households’ values of (M/c n W /M M/M) are observed with classical normally distributed measurement error (in logs). Appendices F and G in Alvarez and Lippi (2009) explore alternative estimation setups, including one with unobserved heterogeneity, all of which lead to similar results. The assumption of classical measurement error is often used when estimating models based on household survey data. We find that the pattern of violations of a simple accounting identity, c = nW − πM, is consistent with large classical measurement error. In particular, a histogram of the deviations of 13 In (ii) and (iii) we measure welfare and consumer surplus with respect to variations in R, keeping π fixed. The effect on M and v of changes in π for a constant R are quantitatively small.
382
F. ALVAREZ AND F. LIPPI
this identity (in log points) is centered around zero, symmetric, and roughly bell shaped (see Appendix D in Alvarez and Lippi (2009)). We stressed in the Introduction that ignoring the endogeneity of c with respect to (p b) does not impair our estimation strategy. Suppose, for instance, that the cash expenditure c(p b) depends on p and b (this would be the solution of problem (ii) in the Introduction). At this point, the agent solves the inventory problem, that is, chooses the number of withdrawals n(c p b) W (c p b) to finance c(p b), where the notation emphasizes that n(·) and W (·) depend on (c p b). Since we have data on c, we can invert the decisions on n(c p b), W (c p b) to estimate p and b without the need to know the mapping: c(p b). Instead, if c was not observed, the mapping c(p b) would be needed to estimate p and b by inverting n(c(p b) p b), W (c(p b) p b) 5.1. Identification of p and b/(cR) Our identification strategy uses the fact that the model is described by two parameters (p b/(cR)) and that, for each observation, the model has implications for four variables (M/c n W /M M/M) as shown below. Under the hypothesis that the model is the data generating process, the parameters (p b/(cR)) can be estimated independently for each observation, regardless of the distribution of (M/c n W /M M/M R) across observations. An advantage of this strategy is that the estimates of (p b/(cR)) would be unbiased (or unaffected by selection bias) even if agents were assigned to ATM and nonATM holder classes in a systematic way. For simplicity this subsection assumes that π = 0 and ignores measurement error. Both assumptions are relaxed in the estimation. We consider three cases, each of which exactly identifies p and b/(cR) using a different pair of variables. In the first case we show how M/c and n exactly identify p and b/(cR). For the BT model, that is, for p = 0, we have W = m∗ c = m∗ n, and M = m∗ /2, which implies 2M/c = 1/n. Hence, if the data were generated by the BT model, M/c and n would have to satisfy this relation. Now consider the average cash balances generated by a policy like the model in Section 4. From (17) and (20), for a given value of p we have 1 m∗ p M (24) = n −1 and n = c p c 1 − exp(−pm∗ /c) or, solving for M/c as a function of n, M 1 n p (25) = ξ(n p) = − log 1 − −1 c p p n For a given p, the pairs M/c = ξ(n p) and n are consistent with a cash management policy of replenishing balances to some value m∗ either when the zero
INNOVATION AND THE DEMAND FOR CASH
383
balance is reached or when a chance meeting with an intermediary occurs. The function ξ is defined only for n ≥ p. Analysis of ξ shows that p is identified. Specifically, consider a pair of observations on M/c and n: if M/c ≥ 1/(2n) = ξ(n 0), then there is a unique value of p that solves M/c = ξ(n p); if M/c < 1/(2n), then there is no solution.14 Thus, for any n ≥ /(2M), our model can rationalize values of M/c > ξ(n 0), where ξ(n 0) is the value of M/c in the BT model. In fact, fixing M/c, a higher value of n implies a higher value of p. The identification of β ≡ b/(cR) uses the first order condition for m∗ . In particular, given the values of p and the corresponding pair (M/c n), we use (24) to solve for m∗ /c. Finally, using the equation for m∗ given in Proposition 2 gives (26)
β≡
b exp[(r + p)m∗ /c] − [1 + (r + p)(m∗ /c)] = cR (r + p)2
To understand this expression, consider two pairs (M/c n), both on the locus defined by ξ(· p) for a given value of p. The pair with higher M/c and lower n corresponds to a higher value of β, because when trips are expensive relative to the opportunity cost of cash (high β), agents visit the intermediary less often. Hence, data on M/c and n identify p and β. An estimate of b/c can then be retrieved using data on R. The second case shows that W /M and n exactly identify p and b/(cR). Consider an agent who follows an arbitrary policy of replenishing her cash to m∗ either as m = 0 or when a free withdrawal occurs. Using the cash flow identity nW = c and (25) yields (27)
−1 1 1 p W = δ(n p) ≡ + − M p/n log(1 − p/n) n
for n ≥ p and p ≥ 0. Notice that the ratio W /M is a function only of the ratio p/n. As in the previous case, given a pair of observations on 0 < W /M ≤ 2 and n > 0, we can use δ(n p) to solve for the unique corresponding value of p.15 The interpretation of this is clear: for p = 0 we have W /M = 2, the highest value that can be achieved by W /M. A smaller W /M observed for a given n implies a larger value of p. Indeed, as n converges to p—a case where almost all the withdrawals are due to chance meetings with the intermediary— then W /M goes to zero. As in the first case, the identification of β ≡ b/(cR) uses the first order condition for m∗ . In particular, we can find the value of m∗ /c using W /M = (m∗ /c)/(M/c) − p/n (equation (21)). With the values of Since for any n > 0 the function ξ satisfies ξ(n 0) = 1/(2n), ∂ξ(n p)/∂p > 0 for all p > 0, and limp↑n ξ(n p) = ∞. 15 This follows since for all n > 0, δ(n 0) = 2, ∂δ(n p)/∂p < 0, and limp↑n δ(n p) = 0 . 14
384
F. ALVAREZ AND F. LIPPI
(m∗ /c p) we can find the unique value of β = (b/c)/R that rationalizes this choice, using (26). Thus, data on W /M and n identify β and p. The third case shows that observations on M/M and n exactly identify p and b/(cR). Equation (22) gives p = n(M/M). If M/M < 1, then p is immediately identified; otherwise, there is no solution with n ≤ p. As in the previous cases the identification of β uses the first order condition for m∗ . For a fixed p, different combinations of n and M/M that give the same product are due to differences in β = (b/c)/R. If β is high, then agents economize on the number of withdrawals and keep larger cash balances (see Figure 2 in Alvarez and Lippi (2007) for a graphical analysis of the identification problem). We have discussed how data on each of the pairs (M/c n), (W /M n), or (M/M n) identify p and β. Of course, if the data had been generated by the model, the three ways to estimate (p β) should produce identical estimates. In other words, the model is overidentified. In the next section, we will use this idea to report how well the model fits the data or, more formally, to test for the overidentifying restrictions. Considering the case of π > 0 makes the expressions more complex, but, at least qualitatively, does not change any of the properties discussed above. Moreover, since the inflation rate in our data set is quite low, the expressions for π = 0 approximate the relevant range for π > 0 very well. 5.2. Baseline Case: Cell Level Estimation In the baseline estimation we define a cell as a particular combination of year–province–household type, where the latter is defined by the cashexpenditure group (lowest, middle, and highest third of households ranked by cash expenditure) and ATM ownership. This yields about 3700 cells, the product of the 103 provinces of Italy × 6 time periods (spanning 1993–2004) × 2 ATM ownership statuses (whether a household has an ATM card or not) × 3 cash expenditure group. For each year we observe the inflation rate π, and for each year–province–ATM ownership type we observe the opportunity cost R. Let i index the households in a cell. For all households in that cell we assume that bi /ci and pi are identical. Given the homogeneity of the optimal decision rules, this implies that all household i have the same values of M/c W /M n, and M/M. j Let j = 1 2 3 4 index the variables M/c W /M, n, and M/M, let zi be the j (log of the) ith household observation on variable j, and let ζ (θ) be the (log of the) model prediction of the j variable for the parameter vector θ ≡ (p b/c). j j The variable zi is observed with a zero-mean measurement error εi with varij j ance σj2 , so that zi = ζ j (θ) + εi . It is assumed that the parameter σj2 is common across cells (we allow one set of variances for households with ATM cards and one for those without). The estimation proceeds in two steps. We first estimate σj2 by regressing each of the four observables, measured at the individual household level, on a vector
INNOVATION AND THE DEMAND FOR CASH
385
of cell dummies. The variance of the regression residual is our estimate of σj2 . We treat σj2 as known parameters because there are about 20,000 degrees of j freedom for each estimate. Since the errors εi are assumed to be independent across households i and variables j, in the second step we estimate the vector of parameters θ for each cell separately, by minimizing the likelihood criterion F(θ; z) ≡
4 Nj j=1
σj2
Nj 1 j z − ζ j (θ) Nj i=1 i
2
where σj2 is the measurement error variance estimated above and Nj is the sample size of the variable j.16 Minimizing F (for each cell) yields the maximum j likelihood estimator provided the εi are independent across j for each i. Table III reports some summary statistics for the baseline cell (province– year–type combination) estimates. The first two panels in the table report the mean, median, and 95th and 5th percentiles of the estimated values for p and b b/c across all cells. As explained above, our procedure estimates β ≡ cR , so to obtain b/c we compute the opportunity cost R as the sum of the nominal interest rate and the probability of cash theft described in Table II. Inflation in each year is measured by the Italian consumer price index (CPI) (the same across provinces); the real return r is fixed at 2% per year. The parameter p gives the average number of free withdrawal opportunities per year. The parameter b/c · 100 is the cost of a withdrawal in percentage of daily cash expenditure. We also report the mean value of the t statistics for these parameters. The asymptotic standard errors are computed by solving for the information matrix. The estimates reported in the first two columns of the table concern households who possesses an ATM card, shown separately for those in the lowest and highest cash-expenditure group. The corresponding statistics for households without ATM cards appear in the third and fourth columns. The difference between the 95th and the 5th percentile indicates that there is a significant amount of heterogeneity across cells. The relatively low values for the mean t-statistics reflects that the number of households used in each cell is small. Indeed, in Appendix F in Alvarez and Lippi (2009) we consider different levels of aggregation and data selection. In all the cases considered we find very similar values for the average of the parameters p and b/c, and we find that when we do not disaggregate the data as much, the average 16 The average number of observations (Nj ) available for each variable varies. It is similar for households with and without ATM cards. There are more observations on M/c than for each of the other variables, and its average weight (N1 /σ12 ) is about 1.5 times larger than each of the other three weights (see Appendix E in Alvarez and Lippi (2009) for further documentation). The number of household–year–type combinations used to construct all the cells is approximately 40,000.
386
F. ALVAREZ AND F. LIPPI TABLE III SUMMARY OF (p b/c) ESTIMATES ACROSS PROVINCE–YEAR–TYPE CELLS Cash Expenditurea Household w/o ATM
Household w. ATM
Low
High
Low
High
Parameter p (avg. no. of opportunities per year) 68 Meanb 56 Medianb 17 95th percentileb 11 5th percentileb 25 Mean t-statisticsb
87 62 25 08 22
20 17 49 3 27
25 20 61 4 35
Parameter b/c (in % of daily cash expenditure) Meanb Medianb 95th percentileb 5th percentileb Mean t-statisticsb
105 73 30 15 28
55 36 17 04 25
65 35 24 06 24
21 11 7 03 33
No. of cellsc
504
505
525
569
Goodness of Fit: Likelihood Criterion F(θ; z) ∼ χ2(2) Household w/o ATM
Household w. ATM
Percentage of cells whered F(θ; z) ≤ 46 = 90th percentile of χ2(2) F(θ; z) ≤ 14 = 50th percentile of χ2(2)
59% 28%
48% 22%
Average no. of households per estimate
10.7
13.5
a Low (high) denotes the lowest (highest) third of households ranked by cash expenditure c . b Statistics computed across cells. c The total number of cells, which includes the group with middle cash expenditure, is 1539 and 1654 for households without and with ATM, respectively. d Only cells where all four variables (M/c n W /M M/M) are available are used to computed these statistics (about 80% of all cells).
t-statistics increase roughly with the (square root) of the average number of observations per cell.17 Table III shows that the average value of b/c across all cells is between 2% and 10% of daily cash consumption. Fixing an ATM ownership type and comparing the average estimates for p and b/c across cash consumption cells, we 17 Concerning aggregation, we repeat all the estimates without disaggregating by the level of cash consumption, so that Nj is three times larger. Concerning data selection, we repeat all the estimates excluding those observations where the cash holding identity is violated by more than 200% or where the share of total income received in cash by the household exceeds 50%. The goal of this data selection, that roughly halves the sample size, is to explore the robustness of the estimates to measurement error.
INNOVATION AND THE DEMAND FOR CASH
387
see that there are small differences for p, but that b/c is substantially smaller for the those in the high cash-expenditure group. Indeed, combining this information with the level of cash consumption that corresponds to each cell, we estimate b to be uncorrelated with cash consumption levels, as documented in Section 6. Using information from Table I for the corresponding cash expenditure to which these percentages refer, the mean values of b for households with and without ATMs are 0.8 and 1.7 euros at year 2004 prices, respectively. For comparison, the cash withdrawal charge for own-bank transactions was zero, while the average charge for other-bank transactions, which account for less than 20% of the total, was 2.0 euros.18 Next we discuss three patterns that emerge from the estimates of (p b/c) that are consistent with the economics of the withdrawal technology, but that were not imposed in the estimation of these parameters, which we take as supportive of the economic significance of the model and its estimates. The first pattern is that households with ATM cards have higher values of p and correspondingly lower values of b/c than those without cards. This can be seen for the mean and median values in Table III, but more strikingly (not shown in the table), the estimated value of p is higher for those with ATM cards in 88% of the cells, and the value for b/c is smaller in 82% of the cells. This pattern supports the hypothesis that households with ATM cards have access to a superior withdrawal technology. The second pattern is the positive correlation (0.69) of the estimated values of b/c between households with and without ATM across province–year– consumption cells. Likewise we find a correlation (0.30) of the estimated values of p between households with and without ATM cards. This pattern supports the hypothesis that province–year–consumption cells are characterized by different levels of efficiency on the withdrawal technology for both ATM and non-ATM card holders. The third pattern is that b/c shows a strong negative correlation with indicators of the density of financial intermediaries (bank branches and ATMs per resident, shown in Table II) that vary across provinces and years. Likewise, the correlation of p with those indicators is positive, although it is close to zero (see Alvarez and Lippi (2007) for details). As greater financial diffusion raises the chances of a free withdrawal opportunity (p) and reduces the cost of contacting an intermediary (b/c), we find that these correlation are consistent with the economics of the model. We find these patterns reassuring since we have estimated the model independently for each cell, that is, for ATM holders/nonholders (first pattern), for province–year–consumption combinations (second pattern), and without using information on indicators of financial diffusion (third pattern). Finally, we report on the statistical goodness of fit of the model. The bottom panel of Table III reports some statistics on the goodness of fit of the 18
The sources are RBR (2005) and an internal report by the Bank of Italy.
388
F. ALVAREZ AND F. LIPPI
model. Let S be the number of estimation cells and consider a cell s ≤ S with data zs and estimated parameter θs . Under the assumption of normally distributed errors, or as the number of households in the cell is large, the minimized likelihood criterion F(θs ; zs ) is distributed as a χ2(2) . The 2 degrees of freedom result from having four observable variables—that give four moments— and two parameters, that is, two overidentifying restrictions. As standard, the cell s passes the overidentifying restriction test with a 10% confidence level if F(θs ; zs ) < 46, the 90th percentile of χ2(2) . As shown in the table, this happens for 48% and 59% of the cells for households with and without ATM cards, respectively. In this sense the statistical fit of the model is relatively good. Alternatively, under the assumption that errors are independent across cells, the vector {F(θs ; zs )}Ss=1 are is a sample of size S from a chi square with 2 degrees of freedom. Since S is a large number, the fraction of cells with F < 46 should be around 090 and with F < 14, the median of a χ2(2) , should be around 050. As the corresponding values in the table are smaller, the joint statistical fit of the model is poor. 5.3. Estimates With Observed Household Heterogeneity This section explores an alternative estimation strategy that incorporates observed household level heterogeneity. It is assumed that the four variables (M/c, W /M, n, M/M) are observed with classical measurement error, and that households differ in the parameters b/c and p which are given by a simple parametric function of household observables. In particular, let Xi be a k dimensional vector containing the value of households i covariates. We assume that for each household i the values of b/c and p are given by (b/c)i = exp(λb/c · Xi ) and pi = exp(λp · Xi ), where λp and λb/c are the parameters to be estimated. The vector Xi contains k = 8 covariates: a constant, calendar year, the (log) household cash expenditure, an ATM dummy, a measure of the financial diffusion of bank branches (BB) and ATM terminals at the province level, a credit card dummy, the (log) income level per adult, and the household (HH) size. Assuming that the measurement error is independent across households and variables, the maximum likelihood estimate of λ minimizes F(λ; X z) ≡
4 N
2 1 j zi − ζ j (θ(λ Xi Ri )) 2 σj i=1 j=1
j
where, as above, zi is the log of the jth observable for household i, ζ j (θ) is the model solution given the parameters θ, and N is the number of households in the sample.19 The estimation proceeds in two steps. We first estimate σj2 19 We treat the opportunity cost Ri as known. To speed up the calculations, we estimate the model by assuming that inflation is zero, which has almost no effect on the estimates. We also
389
INNOVATION AND THE DEMAND FOR CASH TABLE IV HOUSEHOLD LEVEL (p b/c) ESTIMATES WITH OBSERVED HETEROGENEITYa Xi Covariates
Constant Year log cash expenditure ATM dummy log ATM and BB density Credit card dummy log income log HH size
λp
t -Stat
λ¯ p
λb/c
−87.7 004 004 124 −015 030 025 035
(−1370) (067) (003) (6470) (−130) (296) (418) (405)
−87.7 004 −001 128 −016 021 030 028
225 −011 −096 −066 −037 −001 026 026
t -Stat
(3340) (−164) (−062) (−327) (−28) (−005) (405) (274)
λ¯ b/c
217 −011 −097 −075 −034 008 033 027
a Estimates for p and b/c under the assumption that (b/c) = exp(λ i b/c Xi ) and pi = exp(λp Xi ), Xi is at the household level, and (M/c , W /M , n, M/M ) is measured with error.
for each of the four variables by running a regression at the household level of each of the four variables against the household level Xi . We then minimize the likelihood criterion F(·; X z), taking the estimated σˆ j2 as given. The asymptotic standard errors of λ are computed by inverting the information matrix. Table IV presents the estimates of λ. The first data column displays the point estimates of λp and the fourth data column displays the point estimates for λb/c . The numbers in parentheses next to the point estimates are the corresponding t-statistics. To compare the results with the baseline estimates of Section 5.2, the table also includes the coefficients of two regressions, labeled λ¯ p and λ¯ b/c . The dependent variables of these regression are the baseline estimates of p and b/c, and hence they are the same for all households in a cell (i.e., combination of a year, province, ATM card ownership, and third-tile cash consumption). The right hand side variables are the cell means of the Xi covariates. We summarize the findings of the household level observed heterogeneity estimates displayed in Table IV. First, and most importantly, the values of λ and λ¯ are extremely close, which shows that the benchmark cell estimates and the household level estimates provide the same information on the variations of (p b/c) on observables.20 The estimates of p and b/c that correspond to a household with average values of each of the Xi variables and our estimated parameters λp and λb/c are, respectively, 11 and 5.2%. These values are similar to the estimates reported in Table III (in particular they are close to the median across cells). The mean estimate for p, greater than zero, supports the introduction of this dimension of the technology, as opposed to having only the BT parameter b/c. The estimates of both ATM dummies are economically restrict the sample to households that have information for all four variables of interest. This gives us a sample of about N = 17,000 (as opposed to 40,000 in our baseline estimates). 20 The exceptions are two values which are small and statistically insignificant.
390
F. ALVAREZ AND F. LIPPI
important and statistically significant. Households with an ATM card have a value of p approximately three times larger (exp(124) ≈ 346) and a value of b/c about half (exp(−066) ≈ 052) relative to households without ATM cards. There is a small positive time trend on p and a larger negative time trend on b/c, although neither estimate is statistically significant. The value of b/c is smaller in locations with a higher density of ATMs or bank branches, with an elasticity of −037, which is borderline statistically significant, but this measure has a small negative effect on p. Credit card ownership has no effect on b/c and a small positive (borderline significant) effect on p. A possible interpretation for the effect on p is that households with a credit card have better access to financial intermediaries. We find a positive effect of the household size (number of adults) on both p and b/c.21 The coefficient of cash expenditure indicates no effect on p and a negative near-unit elasticity with respect to b/c, though it is imprecisely estimated (this elasticity is very close to that estimated using cell level aggregated data). The income per adult has a positive elasticity of about 0.25 for both p and b/c. We interpret the effect of income per capita on p as reflecting better access to financial intermediaries, and with respect to b/c as measuring a higher opportunity cost of time. The combination of the effects of income per capita and cash expenditures yields the following important corollary: the value of b is estimated to be independent of the level of cash expenditure of the household, implying a cash-expenditure elasticity of money demand of approximately one-half provided that the opportunity cost of time is the same. Under the assumption of independent measurement error, the value of the likelihood criterion F is asymptotically distributed as a χ2 with N × 4 − 2 × k = 54260 degrees of freedom.22 The minimized value for F , given by F = 62804, implies a relatively poor statistical fit of the model since the tail probability for the corresponding χ2 of such value of F is essentially zero. 6. IMPLICATIONS FOR MONEY DEMAND In this section we study the implications of our findings for the time patterns of technology and for the expenditure and interest elasticity of the demand for currency. We begin by documenting the trends in the withdrawal technology, as measured by our baseline estimates of p and b/c. Table V shows that p has approximately doubled and that (b/c) has approximately halved over the sample period. In words, our point estimates indicate that the withdrawal technology 21
This result is hard to interpret because if the withdrawal technology had increasing (decreasing) returns with respect to the household size, we would have expected the p and b/c to vary in opposite ways as the size changed. 22 F equals half of the log-likelihood minus a constant not involving λ. We estimate k loadings λb/c and k loadings λp using N households with four observations each.
391
INNOVATION AND THE DEMAND FOR CASH TABLE V TIME SERIES PATTERN OF ESTIMATED MODEL PARAMETERSa 1993
1995
1998
2000
2002
2004
All Years
Households with ATMs p 17 b/c × 100 66 ˆ b/R 11
16 57 14
20 28 19
24 31 56
22 28 30
33 35 58
22 40 32
Households without ATMs p 6 b/c × 100 13 ˆ b/R 02
5 12 02
8 62 04
9 49 04
8 45 04
12 57 16
8 77 05
73
43
39
32
29
50
R × 100
85
a R and p are annual rates, c is the daily cash-expenditure rate, and, for each province–year–type, b/R ˆ = b·p2 /(365· c · R), which has no time dimension. Entries in the table are sample means across province type in a year.
ˆ ≡ (b/c)p2 /R, which, has improved through time.23 The table also reports b/R as shown in Proposition 6 and illustrated in Figure 1, determines the elasticity of the money demand and the strength of the precautionary motive. In particˆ The ular, the proposition implies that W /M and M/M depend only on b/R. ˆ upward trend in the estimates of b/R, which is mostly a reflection of the downward trend in the data for W /M, implies that the interest rate elasticity of the money demand has decreased through time. ˆ By Proposition 6, the interest rate elasticity η(b/R) implied by those estiˆ mates is smaller than 1/2, the BT value. Using the mean of b/R reported in the last column of Table V to evaluate the function η in Figure 1 yields values for the elasticity equal to 0.43 and 0.48 for households with and without ATM ˆ cards, respectively. Even for the largest values of b/R recorded in Table V, the value of η remains above 0.4. In fact, further extending the range of Figure 1 ˆ shows that values of b/R close to 100 are required to obtain an elasticity η ˆ smaller than 0.25. For such high values of b/R, the model implies M/M of about 0.99 and W /M below 0.3, values reflecting much stronger precautionary demand for money than those observed for most Italian households. On the other hand, studies using cross-sectional household data, such as Lippi and Secchi (2009) for Italian data and Daniels and Murphy (1994) using U.S. data, report interest rate elasticities smaller than 0.25. A possible explanation for the difference in the estimated elasticities is that the cross-sectional regressions in the studies mentioned above fail to include adequate measures of financial innovations, and hence the estimate of the 23 Since we have only six time periods, the time trends are imprecisely estimated, as can be seen from the t-statistics corresponding to years in Table IV.
392
F. ALVAREZ AND F. LIPPI TABLE VI A LABORATORY EXPERIMENT ON THE INTEREST ELASTICITY OF MONEY DEMANDa
Dependent Variable: log(M/c)
log(p) log(b/c) log(R) R2 # observations (cells)
Household w. ATM
−005 045 −044 0985 1654
— — −007 001 1654
Household w/o ATM
−001 048 −048 0996 1539
— — −004 0004 1539
a All regressions include a constant.
interest rate elasticity is biased toward zero. To explore this hypothesis, in Table VI we estimate the interest elasticity of M/c by running two regressions for each household type, where M/c is the model fitted value for each province–year–consumption type. The first regression includes the log of p, ˆ b/c, and R. According to Proposition 6, (M/c)p has elasticity η(b/R) so that we approximate it using a constant elasticity: log M/c = − log p + η(log(b/c) + 2 log(p))−η log(R). The regression coefficient for η estimated from this equation gives virtually the same value obtained from Figure 1. Since the left hand side of the equation uses the values of M/c produced by the model using the estimated p and b/c and no measurement error, the only reason why the regression R2 does not equal 1 is that we are approximating a nonlinear function with a linear one. Yet the R2 is pretty close to 1 because the elasticity for this range of parameters is close to constant. To estimate the size of the bias due to omission of the variables log p and log b/c, the second regression includes only log R. The regression coefficient for log R is an order of magnitude smaller than the value of η, pointing to a large omitted variable bias: the correlation between (log(b/c) + 2 log(p)) and log R is 0.12 and 0.17 for households with and without ATM cards, respectively. Interestingly, the regression coefficients on log R estimated by omitting the log of p and b/c are similar to the values that are reported in the literature mentioned above. Replicating the regressions of Table VI using the actual, as opposed to the fitted, value of M/c yields very similar results (not reported here). We now estimate the expenditure elasticity of the money demand. An advantage of our data is that we use direct measures of cash expenditures (as opposed to income or wealth).24 By Proposition 6, the expenditure elasticity is ˆ ∂ log M b ∂ log b/c =1+η ∂ log c R ∂ log c 24
Dotsey (1988) argued for the use of cash expenditure as the appropriate scale variable.
INNOVATION AND THE DEMAND FOR CASH
393
For instance, if the ratio b/c is constant across values of c, then the elasticity is 1; alternatively, if b/c decreases proportionately with c, the elasticity is 1 − η. Using the variation of the estimated b/c across time, locations, and household groups with different values of c, we estimate the elasticity of b/c with respect to c equal to −082 and −101 for households without and with ATM cards, respectively. Using the estimates for η, we obtain that the mean expenditure elasticity is 1 + 048 · (−082) = 061 for households without ATMs, and 0.56 for those with. 7. COST OF INFLATION AND BENEFITS OF ATM CARD This section uses our model to quantify the cost of inflation and the benefits of ATM card ownership. Section 4.3 shows that the loss is = R(m∗ − M) and the household cost is v = Rm∗ . We use the baseline estimates of (p bc ) from Section 5.2 to compute m∗ and M and the implied losses for each estimation cell. The analysis shows that the cost v is lower for households with ATM cards, reflecting their access to a better technology, and that it is lower for households with higher cash expenditures c, reflecting that our estimates of b/c are uncorrelated with c. Quantitatively, the sample mean value of across all years and households in our sample is about 15 euros or approximately 0.6 day of cash purchases per year. To put this quantity in perspective, we relate it to the one in Lucas (2000), obtained by fitting a log-log money demand with constant interest elasticity of 1/2, which corresponds to the BT model. Our model predicts a smaller welfare cost of inflation relative to BT: /(RM) = m∗ /M − 1 (see Section 4.3). For ˆ = 18, which are about the mean of our baseline estimates, R = 005 and b/R
/(RM) = 06, which shows that the welfare cost in our model is 40% smaller than in BT. As discussed after Proposition 7, the discrepancy is due to the different behavior of the interest rate elasticity in our model. As indicated by Lucas, the behavior of the elasticity at low interest rates is key to quantifying the inflation costs. Despite the fact that the interest elasticity is about 1/2 in both models at the sample mean estimates, the elasticity is constant in BT while it is decreasing and eventually zero in our model (recall Proposition 6). Another difference between these estimates is the choice of the monetary aggregate. In both models the welfare cost is proportional to the level of the money demand. While we focus on currency held by households, Lucas used the stock of M1, an aggregate much larger than ours.25 25 Attanasio, Guiso, and Jappelli (2002) fitted a different model to the same data set, focusing mostly on cash balances M—as opposed to W , n, and M—but endogenizing the decision to obtain an ATM card. They also found a first order difference compared to Lucas’ estimates that originates from the use of a smaller monetary aggregate.
394
F. ALVAREZ AND F. LIPPI TABLE VII DEADWEIGHT LOSS AND HOUSEHOLD COST v OF CASH PURCHASESa
Variableb
v
1993
1995
1998
2000
2002
2004
Mean
24 51
23 49
11 25
11 25
10 22
10 25
15 33
a and v are weighted sample averages, measured as annual flows. b Per household in 2004 euros.
Table VII shows that the welfare loss in 2004 is about 40% smaller than in 1993. The reduction is due to the decrease in R and advances in the withdrawal technology (decreases in b/c and increases in p).26 We use v to quantify the benefits associated with ownership of an ATM card. Under the maintained assumption that b is proportional to consumption within each year–province–consumption group type, the value of the benefit for an agent without an ATM card, keeping cash purchases constant, is defined as v0 − v1 c0 /c1 = R(m∗0 − m∗1 c0 /c1 ), where the subscript 1 (0) indicates ownership of (lack of) an ATM card. Our computations show that the mean benefit of ATM card ownership, computed as the weighted average of the benefits across all years and households, is 17 euros. The benefit associated with ATM card ownership is estimated to be positive for over 91% of the province–year–type estimates. The null hypothesis that the gain is positive cannot be rejected (at the 10% confidence level) in 99.5% of our estimates. Since our estimates of the parameters for households with and without ATMs are done independently, we think that the finding that the estimated benefit is positive for most province– years provides additional support for the model. There are two important limitations of this counterfactual exercise. First, the estimated benefit assumes that households without ATM cards differ from those with a card only in terms of the withdrawal technology that is available to them (p b/c). The second is that ATM cards provide other benefits, such as access to electronic retail transactions. In future work we plan to study the household card adoption choice, which will be informative on the size of the estimates’ bias. Yet, we find it interesting that our estimated benefit of ATM cards is close to annual cardholder fees for debit cards, which vary from 10 to 18 euros for most Italian banks over 2001–2005 (see page 35 and Figure 3.8.2 in RBR (2005)).
26
A counterfactual exercise suggests that the contribution of the disinflation and of technological change to the reduction in the welfare loss is of similar magnitude; see Section 8 in Alvarez and Lippi (2007).
INNOVATION AND THE DEMAND FOR CASH
395
8. CONCLUSIONS This paper proposes a simple, tightly parameterized, extension of the classic Baumol–Tobin model to capture empirical regularities of households’ cash management. We now discuss some extensions of the model that we plan to develop fully in the future. Our model has some unrealistic features: all random withdrawals are free and all cash expenditures are deterministic. Two variations of our model that address these issues are sketched below. The first one introduces an additional parameter f , which denotes a fixed cost for withdrawals upon random contacts with the financial intermediary (see Appendix F in Alvarez and Lippi (2009)). The motivation for this is that when random withdrawals are free, the model has the unrealistic feature that agents withdraw every time they match with an intermediary, making several withdrawals of extremely small size. Instead, the model with 0 < f < b has a strictly positive minimum withdrawal size. In Appendix I in Alvarez and Lippi (2009) we use a likelihood ratio test to compare the fit of the f > 0 model with our benchmark f = 0 model. It is shown that the fit does not improve much. Additionally, we show that the parameter f is nearly not identified. To understand the intuition behind this result, notice that the BT model is obtained for p = 0, f = 0, and b > 0 or, equivalently, for f = b > 0, and p > 0. More data, such as information on the minimum withdrawal size, would be needed to estimate f > 0. We left this exploration for future work. The second variation explores the consequences of assuming that the cash expenditure has a random component. One interesting result of this model is that it may produce W /M ≥ 2 or, equivalently, M < E(c)/2n, where E(c) stands for expected cash consumption per unit of time. These inequalities are indeed observed for a small number of households, especially those without ATM cards (see Table I). However, this model is less tractable than our benchmark model, and it is inconsistent with the large number of withdrawals and the values of W /M that characterize the behavior of most households in the sample. Although we solved for the dynamic programming problem for both variations, as well as for the implied statistics for cash balances and withdrawals, we do not develop them further here to keep the discussion simple. Moreover, as briefly discussed, while the models incorporate some realistic features of cash management, they deliver only a modest improvement on the fit of the statistics that we focussed on. Our model, like the one by BT, takes as given the household cash expenditure. We think that this framework should work well as an input for a cash– credit model, and view this as an important extension for future work. New household level data sets with information on cash management, similar to the one we have used, as well as detailed diary information on how different
396
F. ALVAREZ AND F. LIPPI
purchases were paid (cash, debit, credit card, check, etc.) will allow careful quantitative work in this area.27 APPENDIX: PROOFS FOR THE MODEL WITH FREE WITHDRAWALS PROOF OF PROPOSITION 1: Given two functions G and V that satisfy (14), it is immediate to verify that the boundary conditions of the two systems at m = 0 and m ≥ m∗∗ are equivalent. Also, it is immediate to show that for two such functions, ˆ = arg min[m ˆ + G(m)] ˆ m∗ = arg min V (m) ˆ m≥0
ˆ m≥0
It only remains to be shown that the Bellman equations are equivalent for m ∈ (0 m∗∗ ). Using (14), we compute G (m) = V (m) − 1. Assume that G(·) solves the Bellman equation (7) in this range. Inserting (14) and its derivative into (7) gives [r + p1 + p2 ]V (m) = V (m)(−c − πm) + [p1 + p2 ]V (m∗ ) + [r + p2 + π]m Using R = r + π + p2 and p = p1 + p2 , we obtain the desired result, that is, (12). The proof that if V solves the Bellman equation for m ∈ (0 m∗∗ ), then so does G defined as in (14) follows from analogous steps. Q.E.D. LEMMA 1: Let V ∗ be an arbitrary nonnegative value. (a) For m ∈ (0 m∗∗ ) the ODE in (10) is solved by (16) for some constant A > 0. (b) Imposing that (16) satisfies V (0) = V ∗ + b gives π ∗ + (r + p)2 b V (r + p)r + Rc 1 + r +p A= (28) > 0 c2 (c) The expressions in (16) and (28) imply that V (·) is a convex function of m. (d) Let A be the constant that indexes the expression for V (·) in (16). The value m∗ that solves V (m∗ ) = 0 is −π/(r+p+π) R π c ∗ m = 1+ (29) −1 π Ac r+p 27 One such data set was developed by the Austrian National Bank and was used, for instance, by Stix (2004) and Mooslechner, Stix, and Wagner (2006).
INNOVATION AND THE DEMAND FOR CASH
397
(e) The value of V ∗ is (30)
V∗=
R ∗ m r
PROOF: (a) Follows by differentiation. (b) Follows using simple algebra. (c) Direct differentiation gives V (m) > 0. (d) Follows using simple algebra. (e) Replacing V (m∗ ) = 0 and V (m∗ ) = V ∗ in (10) evaluated at m = m∗ yields rV ∗ = Rm∗ . Q.E.D. PROOF OF PROPOSITION 2: Lemma 1 yields a system of three equations, (28), (29), and (30), in the three unknowns A, m∗ , and V ∗ . Replacing equation (30) into (28) yields one equation for A. Rearranging equation (29), we obtain another equation for A. Equating these expressions for A, collecting terms, and rearranging yields equation (15). Let f (m∗ ) and g(m∗ ) be the left and the right hand sides of equation (15), respectively. We know that f (0) < g(0) for b > 0, g (0) = f (0) > 0, g (m∗ ) = 0, and f (m∗ ) > 0 for all m∗ > 0. Thus there exists a unique value of m∗ that solves (15). (i) Let u(m∗ ) ≡ f (m∗ ) − g(m∗ ) + b/(cR)(r + p)(r + π + p). Notice that u(m∗ ) is strictly increasing, convex, goes from [0 ∞), and does not depend on the desired properties of m∗ . b/(cR). Simple analysis of u(m∗ ) establishes m∗ m∗ (ii) For this result we use that f ( c ) = g( c ) is equivalent to j j ∗ 2 ∞ b 1 m 1 m∗ (31) = + (r + p − sπ) cR c 2 j=1 (2 + j)! s=1 c which follows by expanding ( mc π + 1)1+(r+p)/π around m = 0. We notice o( b/c) is equivalent to (m∗ /c)2 = 2b/(cR) + that m∗ /c = 2b/(cR) + [o( b/c)]2 + 2 2b/(cR)o( b/c). Inserting this expression into (31), dividing both sides by b/(cR), and taking the limit as b/(cR) → 0 verifies our approximation. (iii) For π = R − r = 0, using (31) we have ∗ 2 ∗ j ∞ 1 m 1 b j m = + (r + p) cr c 2 j=1 (j + 2)! c To see that m∗ is decreasing in p notice that the right hand side is increasing in p and m. That m∗ (p + r) is increasing in p follows by noting that since (m∗ )2 decreases as p increases, then the term in square brackets, which is a function of (r + p)m∗ , must increase. This implies that the elasticity of m∗ with respect to p is smaller than p/(p + r) since ∂ (p + r) p ∂m∗ ∂m∗ ∗ ∗ ∗ (m (p + r)) = m + (p + r) =m 1+ 0< ∂p ∂p p m∗ ∂p
398
F. ALVAREZ AND F. LIPPI
Thus (p + r) p ∂m∗ ≥ −1 p m∗ ∂p
0≤−
or
p ∂m∗ p ≤ ∗ m ∂p p+r
(iv) For π → 0, equation (15) yields exp(m∗ /c(r + p)) = 1 + m∗ /c(r + p) + (r + p)2 b/(cR). Replacing bˆ ≡ (p + r)2 b/c and x ≡ m∗ (r + p)/c into this expression, expanding the exponential, collecting terms, and rearranging yields x 1+ 2
∞ j=1
bˆ 2 j (x) = 2 (j + 2)! R
We now analyze the elasticity of x with respect to R. Letting ϕ(x) ≡ ∞ 2 j 2 ˆ j=1 (j+2)! [x] , we can write that x solves x [1 + ϕ(x)] = 2b/R. Taking logs ˆ − and defining z ≡ log(x) we get z + (1/2) log(1 + ϕ(exp(z))) = (1/2) log(2b) (1/2) log R. Differentiating z with respect to (w.r.t.) log R, 1 ϕ (exp(z)) exp(z) 1 =− z 1 + 2 2 1 + ϕ(exp(z)) or ηxR ≡ −
R dx = x dR
(1/2) ϕ (x)x 1 + (1/2) 1 + ϕ(x)
Direct computation gives ∞
ϕ (x)x = 1 + ϕ(x)
j=1
1+
j
2 [x]j (j + 2)!
∞ j=1
2 [x]j (j + 2)!
=
∞
jκj (x)
j=0
where
κj (x) = 1+
2 [x]j (j + 2)! ∞ 2 s=1
(s + 2)!
for j ≥ 1 [x]
s
INNOVATION AND THE DEMAND FOR CASH
399
and 1
κ0 (x) = 1+
∞ s=1
2 [x]s (s + 2)!
so that κj has the interpretation of a probability. For larger x, the distribution κ is stochastically larger since κj+1 (x)/κj (x) = x/(j + 3) for all j ≥ 1 and x. ϕ (x)x Then we can write 1+ϕ(x) = E x [j], where the right hand side is the expected value of j for each x. Hence, for higher x we have that E x [j] increases and thus the elasticity ηxR decreases. As x → 0, the distribution κ puts all the mass in j = 0 and hence ηxR → 1/2. As x → ∞, the distribution κ concentrates all the mass in arbitrarily large values of j, hence E x [j] → ∞ and ηxR → 0. Q.E.D. PROOF OF PROPOSITION 3: (i) The function V (·) and the expression for A are derived in parts (a) and (b) of Lemma 1. (ii) V ∗ is given in part (e) of Lemma 1. Q.E.D. PROOF OF PROPOSITION 4: (i) Let H(m t)be the CDF for m at time t. Define ψ(m t; ) ≡ H(m t) − H(m − (mπ + c) t). Thus ψ(m t; ) is the fraction of agents with money in the interval [m m − (mπ + c)) at time t. Let (32)
h(m t; ) =
ψ(m t; ) (mπ + c)
so that lim h(m t; ) as → 0 is the density of H evaluated at m at time t. In the discrete time version of the model with period of length , the law of motion of cash implies (33)
ψ(m t + ; ) = ψ(m + (mπ + c) t; )(1 − p)
Assuming that we are in the stationary distribution, h(m t; ) does not depend on t, so we write h(m; ). Inserting equation (32) into (33), substituting ∂h h(m; ) + ∂m (m; )[(mπ + c)] + o() for h(m + (mπ + c); ), canceling terms, dividing by , and taking the limit as → 0, we obtain (19). The solution of this ODE is h(m) = 1/m∗ if p = π and h(m) = A[1 + π mc ](p−π)/π for some constant A if p = π. The constant A is chosen so that the density integrates to 1, so that A = 1/{( pc )([1 + πc m∗ ]p/π − 1)}. (ii) We now show that the distribution of m that corresponds to a higher value of m∗ is stochastically higher. Consider the CDF H(m; m∗ ) and let m∗1 < m∗2 be two values for the optimal return point. We argue that H(m; m∗1 ) >
400
F. ALVAREZ AND F. LIPPI
H(m; m∗2 ) for all m ∈ [0 m∗2 ). This follows because in m ∈ [0 m∗1 ] the densities satisfy p/π p/π h(m; m∗2 ) m∗1 m∗2 = 1 + π − 1 1 + π − 1 < 1 h(m; m∗1 ) c c In the interval [m∗1 m∗2 ) we have H(m; m∗1 ) = 1 > H(m; m∗2 ).
Q.E.D.
PROOF OF PROPOSITION 5: We first show that if p′ > p, then the distribution associated with p′ stochastically dominates the one associated with p. For this we use four properties. First, equation (18) evaluated at m = 0 shows that h(0;p) is decreasing in p. Second, since h(·;p) and h(·;p′) are continuous densities, they integrate to 1, and hence there must be some value m̃ such that h(m̃;p′) > h(m̃;p). Third, by the intermediate value theorem, there must be at least one m̂ ∈ (0,m*) at which h(m̂;p) = h(m̂;p′). Fourth, note that there is at most one such value m̂ ∈ (0,m*). To see why, recall that h solves $\frac{\partial h(m)}{\partial m} = \frac{(p-\pi)}{(\pi m + c)}\,h(m)$, so that if $h(\hat m;p) = h(\hat m;p')$, then $\frac{\partial h(\hat m;p')}{\partial m} > \frac{\partial h(\hat m;p)}{\partial m}$. To summarize, h(m;p) > h(m;p′) for 0 ≤ m < m̂, h(m̂;p) = h(m̂;p′), and h(m;p) < h(m;p′) for m̂ < m ≤ m*. This establishes that H(·;p′) is stochastically higher than H(·;p). Clearly this implies that M/m* is increasing in p. Finally, we obtain the expressions for the two limiting cases. Direct computation yields h(m) = 1/m* for p = π, hence M/m* = 1/2. For the other case, note that
$$\frac{1}{h(m^*)} = \frac{c}{p}\,\frac{\left[1+\pi\frac{m^*}{c}\right]^{p/\pi}-1}{\left[1+\pi\frac{m^*}{c}\right]^{p/\pi-1}} = \frac{c}{p}\left(1+\pi\frac{m^*}{c}\right)\left[1-\left(1+\pi\frac{m^*}{c}\right)^{-p/\pi}\right],$$
hence h(m*) → ∞ as p → ∞. Since h is continuous in m, for large p the distribution of m is concentrated around m*. This implies that M/m* → 1 as p → ∞. Q.E.D.
PROOF OF PROPOSITION 6: Let x ≡ m*(r+p)/c. Equation (15) for π = 0 and r = 0 shows that the value of x solves $e^x = 1 + x + b/\hat R$. This defines the increasing function $x = \gamma(b/\hat R)$. Note that x → ∞ as $b/\hat R \to \infty$ and x → 0 as $b/\hat R \to 0$. To see how the ratio Mp/c depends on x, notice that from (24) we have that $Mp/c = \varphi(xp/(p+r))$, where $\varphi(z) \equiv z/(1-e^{-z}) - 1$. Thus $\lim_{r\to 0} Mp/c = \varphi(x)$. To see why the ratios W/M and $\underline M/M$ are functions only of x, note from (24) that $p/n = 1-\exp(-pm^*/c) = 1-\exp(-xp/(p+r))$ and hence, as r → 0, we can write $p/n = \omega(x) = \underline M/M$, where the last equality follows from (22) and ω is the function $\omega(x) \equiv 1-\exp(-x)$. Using (27) we have $W/M = \alpha(\omega)$, where $\alpha(\omega) \equiv [1/\omega + 1/\log(1-\omega)]^{-1} - \omega$. The monotonicity of the functions φ, ω, and α is straightforward to check. The limits for $\underline M/M$ and W/M as x → 0 or as x → ∞ follow from a tedious but straightforward calculation. Finally, the elasticity of the aggregate money demand with respect to $b/\hat R$ is
$$\frac{R}{M/c}\,\frac{\partial M/c}{\partial R} = \frac{R}{M/c}\,\frac{\varphi'(x)}{p}\,\frac{\partial x}{\partial R} = \frac{x\varphi'(x)}{\varphi(x)}\cdot\frac{R}{x}\frac{\partial x}{\partial R} = \eta_{\varphi x}\cdot\eta_{x,b/\hat R};$$
that is, it is the product of the elasticity of φ with respect to x, denoted by $\eta_{\varphi x}$, and the elasticity of x with respect to $b/\hat R$, denoted by $\eta_{x,b/\hat R}$. The definition of φ(x) gives $\eta_{\varphi x} = \frac{x(1-e^{-x}-xe^{-x})}{(x-1+e^{-x})(1-e^{-x})}$, where $\lim_{x\to\infty}\eta_{\varphi x} = 1$. A second-order expansion of each of the exponential functions shows that $\lim_{x\to 0}\eta_{\varphi x} = 1$. Direct computations using $x = \gamma(b/\hat R)$ yield $\eta_{x,b/\hat R} = (e^x - x - 1)/(x(e^x - 1))$. It is immediate that $\lim_{x\to\infty}\eta_{x,b/\hat R} = 0$ and $\lim_{x\to 0}\eta_{x,b/\hat R} = 1/2$. Q.E.D.
PROOF OF PROPOSITION 7: (i) By Proposition 3, rV(m*) = Rm*, V(·) is decreasing in m, and V(0) = V(m*) + b. The result then follows since m* is continuous at r = 0. (ii) Since v(0) = 0, it suffices to show that $\frac{\partial v(R)}{\partial R} = \frac{\partial[R\,m^*(R)]}{\partial R} = M(R)$ or, equivalently, that $m^*(R) + R\,\frac{\partial m^*(R)}{\partial R} = M(R)$. From (15) we have that
$$\frac{\partial m^*}{\partial R}\,\frac{(r+p+\pi)}{c}\left[\left(1+\pi\frac{m^*}{c}\right)^{(r+p)/\pi}-1\right] = -\frac{b}{cR^2}\,(r+p)(r+p+\pi).$$
Using (15) again to replace $\frac{b}{cR}(r+p)(r+p+\pi)$, inserting the resulting expression into $m^*(R) + R\,\partial m^*(R)/\partial R$, letting r → 0, and rearranging yields the expression for M obtained in (20). (iii) Using (i) in (iii) yields $R(m^* - M) = (n-p)b$. Replacing M and n using the equations for the expected values (17) and (20) for an arbitrary m* yields an equation identical to the one characterizing the optimal value of m*, (15), evaluated at r = 0. Q.E.D.
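To make the limiting behavior in the proof of Proposition 6 concrete, here is a small numerical sketch. It solves $e^x = 1+x+b/\hat R$ for $x = \gamma(b/\hat R)$ and evaluates the two elasticities; the limits stated above are visible on the grid. Function names and parameter values are ours, for illustration only.

```python
# Sketch of the objects in the proof of Proposition 6 (our function names;
# the parameter grid is illustrative). gamma solves e^x = 1 + x + b/R_hat.
import numpy as np
from scipy.optimize import brentq

def gamma(z):                      # z stands for b/R_hat
    return brentq(lambda x: np.exp(x) - 1.0 - x - z, 1e-12, 100.0)

def eta_phi_x(x):                  # elasticity of phi(x) = x/(1-e^-x) - 1
    return x * (1 - np.exp(-x) - x * np.exp(-x)) / (
        (x - 1 + np.exp(-x)) * (1 - np.exp(-x)))

def eta_x_b(x):                    # elasticity of x w.r.t. b/R_hat
    return (np.exp(x) - x - 1.0) / (x * (np.exp(x) - 1.0))

for z in [1e-6, 1e-2, 1.0, 10.0, 1e4]:
    x = gamma(z)
    print(f"b/R = {z:9.0e}   x = {x:8.4f}   "
          f"eta_phi_x = {eta_phi_x(x):.4f}   eta_x_b = {eta_x_b(x):.4f}")
# As x -> 0 the product of the two elasticities tends to 1/2, and as
# x -> infinity it tends to 0, matching the limits derived above.
```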
Dept. of Economics, University of Chicago, 1126 East 59th Street, Chicago, IL 60637, U.S.A. and NBER;
[email protected] and Dept. of Economics, University of Sassari, Via di Torre Tonda, 34, 07100 Sassari, Italy and Einaudi Institute for Economics and Finance, Rome, Italy and CEPR;
[email protected]. Manuscript received September, 2007; final revision received October, 2008.
Econometrica, Vol. 77, No. 2 (March, 2009), 403–426
LIQUIDITY IN ASSET MARKETS WITH SEARCH FRICTIONS
BY RICARDO LAGOS AND GUILLAUME ROCHETEAU1
We develop a search-theoretic model of financial intermediation in an over-the-counter market and study how trading frictions affect the distribution of asset holdings and standard measures of liquidity. A distinctive feature of our theory is that it allows for unrestricted asset holdings, so market participants can accommodate trading frictions by adjusting their asset positions. We show that these individual responses of asset demands constitute a fundamental feature of illiquid markets: they are a key determinant of trade volume, bid–ask spreads, and trading delays—the dimensions of market liquidity that search-based theories seek to explain.
KEYWORDS: Bid–ask spreads, trading delays, liquidity, search, trade volume.
1. INTRODUCTION
RECENT LITERATURE pioneered by Duffie, Gârleanu, and Pedersen (2005) (DGP) uses search theory to model the trading frictions that are characteristic of over-the-counter (OTC) markets.2 The search-based approach is appealing because it can parsimoniously rationalize standard measures of liquidity such as trade volume, bid–ask spreads, and trading delays, and can be used to study how market conditions influence these measures. A virtue of DGP's formulation is that it is analytically tractable, so all these mechanisms can be well understood. The literature spurred by DGP keeps the framework tractable by imposing a stark restriction on asset holdings: agents can only hold either 0 units or 1 unit of the asset. In effect, investors' ability to respond to changes in market conditions is severely limited by this restriction. In this paper we develop a search-based model of liquidity in asset markets with no restrictions on investors' asset holdings. The model is close in structure and spirit to DGP, but captures the heterogeneous responses of individual investors to changes in market conditions. As a result of the restrictions they imposed on asset holdings, existing search-based theories neglect a critical feature of illiquid markets, namely, that market participants can mitigate trading frictions by adjusting their asset positions to reduce their trading needs.3
1 We are grateful to Gadi Barlevy, Darrell Duffie, Mariacristina De Nardi, Nicolae Gârleanu, Joe Haubrich, Rob Shimer, Neil Wallace, Pierre-Olivier Weill, and Ruilin Zhou for comments. We also thank seminar participants at the Bank of Portugal, Chicago GSB, Federal Reserve Bank of Cleveland, Federal Reserve Bank of New York, Indiana, HEC Lausanne, London Business School, MIT, New York University, Penn State, Princeton, Queens, Rice, Simon Fraser, Singapore Management University, Universidad de San Andrés, Universidad Torcuato Di Tella, University of Basel, UCLA, UCSB, University of Pennsylvania, and University of Texas at Austin. We thank Patrick Higgins for research assistance. Financial support from the C. V. Starr Center for Applied Economics at NYU is gratefully acknowledged.
2 The search-theoretic literature on financial markets also includes Duffie, Gârleanu, and Pedersen (2007), Gârleanu (2008), Miao (2006), Rust and Hall (2003), Spulber (1996), and Weill (2007).
The key theoretical observation is that an investor's asset demand in an OTC market depends not only on his valuation for the asset at the time of the trade, but also on his expected valuation over the holding period until his next opportunity to trade. A reduction in trading frictions makes investors less likely to remain locked into an undesirable asset position and therefore induces them to put more weight on their current valuation. As a result, a reduction in trading frictions induces an investor to demand a larger asset position if his current valuation is relatively high and a smaller position if it is relatively low, which tends to increase the spread of the distribution of asset holdings. We find that this effect on the dispersion of the distribution of asset holdings is a key channel through which trading frictions determine trade volume, bid–ask spreads, and trading delays—the dimensions of market liquidity that search-based theories of financial intermediation are designed to explain. Trade volume is a manifestation of the ability of the exchange mechanism to reallocate assets across investors. We find that by increasing the dispersion of asset positions, a reduction in trading delays (or in dealers' market power) tends to increase trade volume. Bid–ask spreads constitute the main out-of-pocket transaction cost in an illiquid market. Our model generates a distribution of spreads across trade sizes and predicts that spreads per unit of asset traded decrease with the ease with which investors can find alternative trading partners (a mechanism identified in DGP), but increase with the size of the trade. Since reduced trading delays tend to increase trade sizes, marketwide measures of transaction costs can vary in a nonmonotonic fashion with the extent of the trading frictions. Trading delays are a distinguishing feature of an OTC market. We find that the distribution of asset holdings is a key determinant of trading delays and that the interaction with the dealers' incentives to make markets generates a liquidity externality that can give rise to multiple steady states. There is a connection between DGP and Kiyotaki and Wright (1989): both are search-based theories of exchange, and both restrict asset holdings to keep the distribution of assets manageable. Similarly, there is also a connection between our work and the monetary literature that attempts to generalize the inventory restrictions of Kiyotaki and Wright (e.g., Camera and Corbae (1999) and Molico (2006)). In standard monetary models, idiosyncratic trading shocks are the only source of heterogeneity, so in his numerical examples, Molico (2006) found that the distribution of money holdings becomes more concentrated as trading frictions are reduced. Our theory has the opposite prediction
3 The importance of this mechanism in the context of another class of models—those with exogenous transaction costs—has been stressed by Constantinides (1986) for the case of proportional transaction costs, and by Lo, Mamaysky, and Wang (2004) for the case of fixed transaction costs.
due to the asset reallocation that dealers carry out among investors with heterogeneous valuations. Another difference between our work and the analogous monetary literature is that, aside from a few exceptions (e.g., Green and Zhou (2002)), the latter is eminently computational, and theoretical results are limited. Recent search models of money (e.g., Lagos and Wright (2005)) allow for unrestricted inventories but keep the analysis tractable by making assumptions that render the distribution of money holdings degenerate. In contrast, the heterogeneity in asset holdings that is propagated endogenously by random matching is an important feature of our model, and we are able to provide an analytical characterization of the equilibrium—including transitional dynamics and the endogenous distribution of asset holdings.
2. ENVIRONMENT
Time is continuous, starts at t = 0, and goes on forever. There are two types of infinitely lived agents: a unit measure of investors and a unit measure of dealers. There is one asset, one perishable consumption good called fruit, and another consumption good defined as numéraire. The asset is durable, perfectly divisible, and in fixed supply, $A\in\mathbb R_+$. Each unit of the asset produces a unit flow of fruit. There is no market for fruit, so holding the asset is necessary to consume this good. The numéraire good is produced and consumed by all agents. The instantaneous utility function of an investor is $u_i(a) + c$, where $a\in\mathbb R_+$ represents the fruit consumption (which coincides with the investor's asset holdings), $c\in\mathbb R$ is the net consumption of the numéraire good (c < 0 if the investor produces more of these goods than he consumes), and $i\in X = \{1,\dots,I\}$ indexes a preference type. The utility function $u_i(a)$ is twice continuously differentiable, strictly increasing, and strictly concave.4 Each investor receives a preference shock with Poisson arrival rate δ. This process is independent across investors. Conditional on the preference shock, the probability the investor draws preference type i is $\pi_i > 0$, with $\sum_{i=1}^I \pi_i = 1$. These preference shocks capture the notion that investors will value the asset differently over time, thereby generating the need to rebalance their asset positions.5
4 Just as in DGP, our specification associates a certain utility to the investor as a function of his asset holdings. The utility from holding an asset position could be simply the value from enjoying the asset itself, as would be the case for real assets such as cars or houses. An alternative interpretation that leads to the same formulation would be to assume that there is a single consumption good, that investors are risk-neutral and able to borrow and lend freely at rate r, and regard the asset as physical capital used to produce the consumption good with the production technology $u_i$. As yet another possibility, one could adopt the preferred interpretation of DGP, namely that $u_i$ is in fact a reduced-form utility function that stands in for the various reasons why investors may want to hold different quantities of the asset, such as differences in liquidity needs, financing or financial-distress costs, correlation of asset returns with endowments (hedging needs), or relative tax disadvantages. By now, several papers that build on the work of DGP have formalized the "hedging needs" interpretation. Examples include Duffie, Gârleanu, and Pedersen (2007), Gârleanu (2008), and Vayanos and Weill (2008).
Dealers do not hold positions and their instantaneous utility is c, their consumption of the numéraire good.6 All agents discount at rate r > 0. Dealers can trade the asset continuously in a competitive interdealer market. Investors periodically contact dealers who can trade in this market on their behalf. Meetings with dealers occur at random according to a Poisson process with arrival rate α.7 Once a dealer and an investor have contacted each other, they negotiate the quantity of assets that the dealer will acquire for the investor and the intermediation fee that the dealer charges for his services. After the transaction has been completed, the dealer and the investor part ways. Asset holdings and preference types lie in the sets $\mathbb R_+$ and X, respectively, and vary across investors and over time. We describe this heterogeneity with a probability space $(S,\Sigma,H_t)$, where $S = \mathbb R_+\times X$, Σ is the σ-field generated by the sets $(\mathcal A,\mathcal I)$, where $\mathcal A\subseteq\mathbb R_+$ and $\mathcal I\subseteq X$, and $H_t$ is a probability measure on Σ that represents the distribution of investors across asset holdings and preference types at time t.
3. EQUILIBRIUM
Let $V_i(a,t)$ denote the maximum expected discounted utility attainable by an investor who has preference type i and is holding a assets at time t. The value function $V_i(a,t)$ satisfies
(1) $V_i(a,t) = E_i\left[\int_t^{T_\alpha} e^{-r(s-t)}u_{k(s)}(a)\,ds + e^{-r(T_\alpha-t)}\Big\{V_{k(T_\alpha)}\big(a_{k(T_\alpha)}(T_\alpha),T_\alpha\big) - p(T_\alpha)\big[a_{k(T_\alpha)}(T_\alpha) - a\big] - \phi_{k(T_\alpha)}(a,T_\alpha)\Big\}\right],$
where $T_\alpha$ denotes the next time the investor contacts a dealer and $k(s)\in X$ denotes the investor's preference type at time s. The expectations operator, $E_i$, is over the random variables $T_\alpha$ and k(s), and is indexed by i to indicate that it is conditional on k(t) = i.8 The first term on the right side of (1) contains the expected discounted utility flows over the time interval $[t,T_\alpha]$, whose length is exponentially distributed with mean 1/α.
5 In online Appendix B (Lagos and Rocheteau (2009)), we allow preference shocks to follow a general continuous-time Markov chain and find that most of the substantive results generalize under appropriate regularity conditions.
6 The restriction that dealers cannot hold assets is immaterial when analyzing steady-state equilibria. Lagos, Rocheteau, and Weill (2007) studied dynamic equilibria where dealers may choose to hold asset positions.
7 Although our description of the trading process is stylized, it captures the salient features of the actual trading arrangements in OTC markets. We refer the interested reader to Schultz (2001) as well as the discussion in Section 2.1 in Lagos and Rocheteau (2006).
8 For now we proceed under the assumption that the right side of (1) is well defined. Later in this section we verify that this is the case by calculating $V_i(a,t)$ explicitly. More generally, in online Appendix D (Lagos and Rocheteau (2009)) we formulate the investor's infinite-horizon problem from the time-0 perspective, and formalize the relationship between the maximum value of that problem and the function $\{V_i\}_{i\in X}$ that satisfies (1).
The flow utility is indexed by the preference type, k(s), which follows a compound Poisson process with $\Pr[k(s)=j\,|\,k(t)=i] = [1-e^{-\delta(s-t)}]\pi_j + e^{-\delta(s-t)}I_{\{j=i\}}$ for s ≥ t. The second term on the right side of (1) is the expected discounted utility from the time when the investor next contacts a dealer, $T_\alpha$, onward. At this time $T_\alpha$, the dealer purchases $a_{k(T_\alpha)}(T_\alpha) - a$ in the market (or sells if this quantity is negative) at price $p(T_\alpha)$ on behalf of the investor; the investor readjusts his asset holdings from a to $a_{k(T_\alpha)}(T_\alpha)$ and pays the dealer an intermediation fee $\phi_{k(T_\alpha)}(a,T_\alpha)$. Throughout, we will focus on price functions p(t) that are nonnegative and Lebesgue measurable. Both the fee and the asset price are expressed in terms of the numéraire good.9 Let W(t) denote the maximum expected discounted utility attainable by a dealer. It satisfies
$$W(t) = E\left[e^{-r(T_\alpha-t)}\left\{\int_S \phi_i(a,T_\alpha)\,dH_{T_\alpha} + W(T_\alpha)\right\}\right],$$
where the expectations operator, E, is over the next time the dealer meets an investor, $T_\alpha$. Random matching implies that the investor whom the dealer meets is a random draw from $H_{T_\alpha}$, the distribution of investors across preference types and asset holdings at time $T_\alpha$. We turn to the determination of the terms of trade in a bilateral meeting at time t between a dealer and an investor of type i who is holding a. Let a′ denote the investor's post-trade asset holdings and let φ denote the intermediation fee. We take (a′,φ) to be the outcome corresponding to the Nash solution to a bargaining problem where the dealer has bargaining power η ∈ [0,1]. The utility of the investor is $V_i(a',t) - p(t)(a'-a) - \phi$ if an agreement (a′,φ) is reached and is $V_i(a,t)$ in case of disagreement. Therefore, the investor's gain from trade is $V_i(a',t) - V_i(a,t) - p(t)(a'-a) - \phi$. Analogously, the utility of the dealer is W(t) + φ if an agreement (a′,φ) is reached and is W(t) in case of disagreement, so the dealer's gain from trade is the fee, φ. The bargaining outcome is
(2) $[a_i(t),\phi_i(a,t)] = \arg\max_{(a',\phi)}\big[V_i(a',t) - V_i(a,t) - p(t)(a'-a) - \phi\big]^{1-\eta}\,\phi^{\eta},$
9 Since the intermediation fee is determined in a bilateral meeting, it may depend on the investor's preference type and asset holdings. Our notation for the investor's new asset position, $a_{k(T_\alpha)}(T_\alpha)$, makes explicit that it may depend on time and on the investor's preference type at the time of the trade. Below (condition (3)), we will find that the investor's new asset position is independent of the asset position he was holding at the time of the trade. To simplify the notation, we anticipate this result and do not include a as an argument of his new asset position.
where the maximization is subject to a′ ≥ 0.10 The solution to (2) can be written as
(3) $a_i(t) = \arg\max_{a'\ge 0}\,[V_i(a',t) - p(t)a'],$
(4) $\phi_i(a,t) = \eta\{V_i[a_i(t),t] - V_i(a,t) - p(t)[a_i(t) - a]\}.$
We now turn to the investor's problem. Substitute (3) and (4) into (1) to obtain
(5) $V_i(a,t) = E_i\left[\int_t^{T_\alpha} e^{-r(s-t)}u_{k(s)}(a)\,ds + e^{-r(T_\alpha-t)}\Big\{(1-\eta)\max_{a'\ge 0}\big[V_{k(T_\alpha)}(a',T_\alpha) - p(T_\alpha)(a'-a)\big] + \eta\,V_{k(T_\alpha)}(a,T_\alpha)\Big\}\right].$
It is apparent from (5) that the investor's payoff is the same as he would get in an alternative environment where he meets dealers according to a Poisson process with arrival rate α, but instead of bargaining, he readjusts his asset position and extracts the whole surplus with probability 1 − η, whereas with probability η he cannot readjust his asset position and enjoys no gain from trade. Therefore, from the standpoint of the investor, keeping the paths of the aggregate variables unchanged, the environment we are analyzing is payoff-equivalent to an alternative one in which he meets dealers according to a Poisson process with arrival rate κ = α(1 − η) and has all the bargaining power in bilateral negotiations. Based on this observation, the following lemma offers an equivalent formulation of the investor's choice of asset holdings that appears on the right side of (5).
10 The maximum in (2) is achieved provided that $\max_{a'}[V_i(a',t) - p(t)a']$ is achieved, which will be the case in equilibrium (see Lemma 1 and the proof of Proposition 1). Also, note that it would be equivalent to set $\phi = (\hat p - p(t))(a'-a)$ in (2) and reformulate the bargaining problem as a choice of $(a'-a,\hat p)$. If a′ > a, the investor is a buyer and $\hat p > p(t)$ can be interpreted as the ask price he is charged by the dealer. Conversely, if a′ < a, the investor is a seller and $\hat p < p(t)$ is the bid price he is paid by the dealer. The outcome from the axiomatic Nash solution can also be obtained from a strategic bargaining game in which, upon contact, a randomly selected proposer makes a take-it-or-leave-it offer. Nature selects the dealer to make an offer with probability η, which the investor must either accept or reject on the spot. With complementary probability 1 − η, the investor makes the offer and the dealer either accepts or rejects on the spot. It is easy to check that the expected equilibrium outcome of this game coincides with (2), subject to the obvious reinterpretation of $\phi_i(a,t)$ as an expected intermediation fee, which is inconsequential. See online Appendix C (Lagos and Rocheteau (2009)) for details.
LEMMA 1: An investor with preference type i and asset holdings a who readjusts his asset position at time t chooses
(6) $a_i(t) = \arg\max_{a'\ge 0}\,[\bar u_i(a') - q(t)a'],$
where
(7) $\bar u_i(a) = \frac{(r+\kappa)u_i(a) + \delta\sum_j \pi_j u_j(a)}{r+\kappa+\delta},$
(8) $q(t) = (r+\kappa)\left[p(t) - \kappa\int_0^\infty e^{-(r+\kappa)s}\,p(t+s)\,ds\right].$
If $q(t) > \bar u_i'(\infty)$, then $a_i(t)$ exists and is unique.
In Lemma 1, $\bar u_i(a)/(r+\kappa)$ is the expected discounted utility and $q(t)/(r+\kappa) = p(t) - E[e^{-r(T_\kappa-t)}p(T_\kappa)]$ is the present value of the expected capital loss to the investor from holding a from t until the next (effective) time $T_\kappa$ when he readjusts his holdings, where $T_\kappa - t$ is exponentially distributed with mean 1/κ. The assumptions on q(t) are without loss of generality since they will be implied by the market-clearing condition. Given that $u_i$ is strictly concave for each i, the asset position $a_i(t)$ solves the maximization problem on the right side of (6) at time t if and only if it satisfies
(9) $\bar u_i'[a_i(t)] \le q(t)$, with "=" if $a_i(t) > 0$.
In online Appendix D (Proposition 9) we show that a feasible asset plan $\{(a_i(t),\,t\in[0,\infty))\}_{i=1}^I$ maximizes the investor's infinite-horizon problem from the time-0 perspective if and only if it satisfies (9) and
(10) $\lim_{n\to\infty} E_i\big[e^{-rT_n}\,p(T_n)\,a_{k(T_n)}(T_n)\big] = 0,$
where $T_n$, for n = 1, 2, …, denotes the time at which the investor gains his nth effective access to the market. In online Appendix D we also establish a version of the principle of optimality for our economy (Lemma 8), and show that if there exist real numbers $\underline B$ and $\bar B$ such that $\max_j \bar u_j'(\infty) < \underline B \le q(t) \le \bar B$ for all t, then $V_i(a,t) = \bar u_i(a)/(r+\kappa) + [p(t) - q(t)/(r+\kappa)]a + K_i(t)$, where $K_i(t)\in\mathbb R$. From (4), $\phi_i(a,t) = \eta\{V_i[a_i(t),t] - V_i(a,t) - p(t)[a_i(t)-a]\}$, with $a_i(t)$ characterized by (9). If we substitute the value function, we arrive at
(11) $\phi_i(a,t) = \frac{\eta\{\bar u_i[a_i(t)] - \bar u_i(a) - q(t)[a_i(t)-a]\}}{r+\kappa}.$
Since each investor contacts a dealer with equal probability, the quantity of assets supplied in the interdealer market over a small interval of time dt is α dt A.11 Similarly, the measure of type-i investors who contact dealers is α dt ni (t), where (12)
ni (t) = e−δt ni (0) + (1 − e−δt )πi
is the measure of investors with preference type i at time t, so the demand for assets in the interdealer market is $\alpha\,dt\sum_{i=1}^I n_i(t)a_i(t)$. The clearing condition for the asset market is
(13) $\sum_{i=1}^I n_i(t)\,a_i(t) = A.$
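To fix ideas, here is a minimal numerical sketch of the market-clearing step: it computes each type's demand from the first-order condition (9) under the CRRA specification $u_i(a) = \varepsilon_i a^{1-\sigma}/(1-\sigma)$ used later in the paper (so $\bar u_i'(a) = \bar\varepsilon_i a^{-\sigma}$) and finds the q that satisfies (13) by bisection. All parameter values are illustrative, not taken from the paper.

```python
# Solve the market-clearing condition (13) for q, assuming CRRA utility
# (illustrative parameters; type shares n_i are taken at their stationary
# values pi_i).
import numpy as np
from scipy.optimize import brentq

r, kappa, delta, sigma, A = 0.05, 2.0, 1.0, 2.0, 1.0
eps = np.array([0.5, 1.0, 1.5])      # valuation types eps_i
pi = np.array([0.3, 0.4, 0.3])       # type probabilities pi_i
n = pi                               # n_i(t) -> pi_i as t -> infinity

epsbar = ((r + kappa) * eps + delta * (pi @ eps)) / (r + kappa + delta)

def excess_demand(q):
    a = (epsbar / q) ** (1.0 / sigma)   # demands from the FOC (9)
    return n @ a - A                    # clearing condition (13)

q = brentq(excess_demand, 1e-9, 1e9)
print("q =", q, "  demands:", (epsbar / q) ** (1 / sigma))
```

With this specification q also has the closed form $q = [(\sum_i n_i\bar\varepsilon_i^{1/\sigma})/A]^\sigma$, which the bisection reproduces; the fixed-point logic, however, applies to any strictly concave $u_i$.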
This condition implies that the q(t) that clears the market is continuous and bounded.12 Given such a q(t) and (10), the following lemma shows how to recover p(t).
LEMMA 2: For any continuous and bounded q(t), the price of the asset is
(14) $p(t) = \frac{1}{r+\kappa}\left[q(t) + \kappa\int_t^\infty e^{-r(s-t)}q(s)\,ds\right].$
At any point in time, investors differ in asset holdings and preference types. Consider a set of asset holdings $\mathcal A$ and a set of preference types $\mathcal I$. Then for all $(\mathcal A,\mathcal I)\in\Sigma$, $H_t(\mathcal A,\mathcal I)$ gives the measure of investors whose asset holdings and preference types lie in $\mathcal A$ and $\mathcal I$, respectively. We characterize this probability measure in the following lemma, where $I_{\{a\in\mathcal A\}}$ denotes an indicator function that equals 1 if $a\in\mathcal A$.
LEMMA 3: The measure of investors across individual states at time t satisfies
(15) $H_t(\mathcal A,\mathcal I) = \sum_{i\in\mathcal I}\sum_{j=1}^I\left[n_{ji}^0(\mathcal A,t) + \int_0^t I_{\{a_j(t-\tau)\in\mathcal A\}}\,n_{ji}(\tau,t)\,d\tau\right]$
for all $(\mathcal A,\mathcal I)\in\Sigma$, where
(16) $n_{ji}^0(\mathcal A,t) = e^{-\alpha t}\big[(1-e^{-\delta t})\pi_i + e^{-\delta t}I_{\{i=j\}}\big]\,H_0(\mathcal A,\{j\}),$
(17) $n_{ji}(\tau,t) = \alpha e^{-\alpha\tau}\big[(1-e^{-\delta\tau})\pi_i + e^{-\delta\tau}I_{\{i=j\}}\big]\,n_j(t-\tau).$
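As a consistency check on Lemma 3, the sketch below verifies numerically that the masses in (16) and (17) account for every type-i investor: summing $n_{ji}^0(\mathbb R_+,t)$ over j and integrating $n_{ji}(\tau,t)$ over τ ∈ [0,t] recovers $n_i(t)$ from (12). The parameters and initial type shares are hypothetical.

```python
# Check that (16)-(17) integrate up to n_i(t) from (12).
# Parameters and the initial type distribution n0 are hypothetical.
import numpy as np
from scipy.integrate import quad

alpha, delta, t = 1.5, 0.8, 2.0
pi = np.array([0.25, 0.5, 0.25])
n0 = np.array([0.6, 0.3, 0.1])        # hypothetical n_i(0)
I = len(pi)

def n(i, s):                          # equation (12)
    return np.exp(-delta * s) * n0[i] + (1 - np.exp(-delta * s)) * pi[i]

for i in range(I):
    # investors who have never traded: sum over j of (16) evaluated at R+
    never = sum(np.exp(-alpha * t) * ((1 - np.exp(-delta * t)) * pi[i]
                + np.exp(-delta * t) * (i == j)) * n0[j] for j in range(I))
    # investors whose last trade was at t - tau: integrate (17) over tau
    dens = lambda tau: sum(alpha * np.exp(-alpha * tau)
                           * ((1 - np.exp(-delta * tau)) * pi[i]
                              + np.exp(-delta * tau) * (i == j)) * n(j, t - tau)
                           for j in range(I))
    traded, _ = quad(dens, 0.0, t)
    print(f"type {i}: {never + traded:.6f}   vs   n_i(t) = {n(i, t):.6f}")
```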
11 See Duffie and Sun (2007) for a derivation of the law of large numbers in random-matching environments.
12 The asset demand, $a_i$, is a continuous function of q, and $n_i(t)$ is continuous in t, so q(t) is continuous. Also, (13) implies $\max_i \bar u_i'(\infty) < q(t) \le \max_i \bar u_i'(A)$.
At time 0, the market starts with investors distributed across preference types and asset holdings according to the initial probability measure $H_0$. Subsequently, there are two types of investors: those who have not contacted a dealer since time 0 and those who have. The time-t measure of those who started at time 0 with preference type j and assets in $\mathcal A$, whose preference type is i at the current time t, and who have never traded (so their asset holdings are still in $\mathcal A$) is $n_{ji}^0(\mathcal A,t)$ as given in (16). Analogously, $n_{ji}(\tau,t)$ in (17) gives the time-t density of investors whose last trade was at time t − τ when their preference type was j and who have preference type i at time t.
DEFINITION 1: An equilibrium is a time path $\langle\{a_i(t)\},q(t),p(t),\{\phi_i(a,t)\},H_t\rangle$ that satisfies (6), (11), (13), (14), and (15), given an initial condition $H_0$.
PROPOSITION 1: There exists a unique equilibrium. For any $H_0$, the equilibrium allocations and prices, $\langle\{a_i(t)\},q(t),p(t),\{\phi_i(a,t)\},H_t\rangle$, converge to the unique steady-state allocations and prices $\langle\{a_i\},q,p,\{\phi_i(a)\},H\rangle$ that satisfy p = q/r,
(18) $\bar u_i'(a_i) \le q$, with "=" if $a_i > 0$,
(19) $\sum_{i=1}^I \pi_i a_i = A,$
(20) $\phi_i(a) = \frac{\eta[\bar u_i(a_i) - \bar u_i(a) - q(a_i-a)]}{r+\kappa},$
(21) $H(\{a_i\},\{j\}) = \frac{\delta\pi_i\pi_j + \alpha\pi_i I_{\{i=j\}}}{\alpha+\delta},$
and $H(\mathcal A,\mathcal I) = 0$ for all $(\mathcal A,\mathcal I)\in\Sigma$ such that $\bigcup_{j=1}^I\{a_j\}\cap\mathcal A = \emptyset$.
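A compact numerical sketch of the steady state described in Proposition 1, under the CRRA specification and hypothetical parameters: it computes q, p, and the holdings $a_i$ from (18)–(19), the fee schedule from (20), and the joint distribution from (21).

```python
# Steady state of Proposition 1 under CRRA utility (illustrative numbers).
import numpy as np

r, delta, sigma, A, eta, alpha = 0.05, 1.0, 2.0, 1.0, 0.5, 2.0
kappa = alpha * (1 - eta)
eps = np.array([0.5, 1.0, 1.5])
pi = np.array([0.3, 0.4, 0.3])

epsbar = ((r + kappa) * eps + delta * (pi @ eps)) / (r + kappa + delta)
q = ((pi @ epsbar ** (1 / sigma)) / A) ** sigma   # clears (19)
a = (epsbar / q) ** (1 / sigma)                   # FOC (18), interior
p = q / r

def ubar(x, i):                                   # ubar_i(x), equation (7)
    return epsbar[i] * x ** (1 - sigma) / (1 - sigma)

# phi[j, i]: fee (20) paid by a type-i investor who arrives holding a_j
phi = np.array([[eta * (ubar(a[i], i) - ubar(a[j], i) - q * (a[i] - a[j]))
                 / (r + kappa) for i in range(3)] for j in range(3)])
# n[i, j] = H({a_i}, {j}) from (21)
n = (delta * np.outer(pi, pi) + alpha * np.diag(pi)) / (alpha + delta)

print("p =", round(p, 4), "  holdings:", np.round(a, 4))
print("fees >= 0:", bool((phi >= -1e-12).all()), "  total mass:", n.sum())
```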
It is possible to show that the equilibrium is efficient if and only if η = 0 (see online Appendix B). To illustrate how a reduction in trading delays affects the equilibrium, consider the limiting case κ → ∞. From (7), $\bar u_i(a) \to u_i(a)$, and from (8) and (9), $u_i'[a_i(t)] \le q(t) = rp(t) - \dot p(t)$ for all i. From (13), $q(t) \to q^*(t)$, which solves $\sum_{i\in I_t^+} n_i(t)\,u_i'^{-1}[q^*(t)] = A$, where $I_t^+ = \{i\in X: a_i(t) > 0\}$. From (11), $\phi_i(a,t)\to 0$ for all a, i, and t. With regard to the distribution of investors, α → ∞ implies that every investor holds his desired asset position at all times.13 Thus, as frictions vanish, investors choose $a_i(t)$ continuously by equating their current marginal utility from holding the asset to its effective cost $q^*(t)$, and the equilibrium fees, asset price, and distribution of asset holdings are the ones that would prevail in a Walrasian economy.
13 To see this, first note that (16) implies the measure of agents who have not contacted a dealer since time 0 vanishes; that is, $n_{ji}^0(\mathcal A,t)\to 0$ for all i and j, all t, and all $\mathcal A\subseteq\mathbb R_+$ as α → ∞. The time-t density of agents who have not contacted a dealer since time t − τ > 0 is $n(\tau,t) = \sum_{i,j=1}^I n_{ji}(\tau,t)$. From (17), α → ∞ implies $n(\tau,t)\to 0$ for all τ > 0; that is, investors can find a dealer instantly when α is arbitrarily large, so the measure of investors who have not met a dealer between t − τ and t is zero for all τ > 0. As for those investors who have met a dealer this "instant," from (17), $n_{ji}(0,t) = 0$ for i ≠ j and $n_{ii}(0,t) = n_i(t)$. Therefore, $H_t(\mathcal A,\mathcal I) \to \sum_{i\in\mathcal I} I_{\{a_i(t)\in\mathcal A\}}\,n_i(t)$ as α → ∞; that is, every investor of type i holds $a_i(t)$ at every t.
In what follows, when we analyze the steady state we will denote an individual investor's state $(a_i,j)\in\{a_i\}_{i=1}^I\times X$ by $(i,j)\in X^2$, and $H(\{a_i\},\{j\})$ by $n_{ij}$. Also, at times we use $\phi_{ji}$ to denote $\phi_i(a_j)$ for $(i,j)\in X^2$.
4. SEARCH FRICTIONS AND THE DISTRIBUTION OF ASSET HOLDINGS
In this section we focus on the steady state to study the effects of trading frictions on the distribution of asset holdings. Hereafter we assume $u_i'(\infty) = 0$ and $u_i'(0) = \infty$ for each i.14 Condition (18) becomes
(22) $\bar u_i'(a_i) = rp.$
Let $a_i = g_i(\kappa;p)$ denote the choice of asset holdings characterized by (22). Then
(23) $\frac{\partial g_i(\kappa;p)}{\partial\kappa} = \frac{\delta\left[u_i'(a_i) - \sum_{j=1}^I \pi_j u_j'(a_i)\right]}{-\bar u_i''(a_i)\,(r+\kappa+\delta)^2}$
has the sign of $u_i'(a_i) - \sum_{j=1}^I \pi_j u_j'(a_i)$; that is, an investor whose current marginal valuation exceeds his expected marginal valuation over the expected holding period increases his demand when κ increases. If $u_i'(a_i) > \sum_{j=1}^I \pi_j u_j'(a_i)$, the investor anticipates that his valuation is likely to revert toward $\sum_{j=1}^I \pi_j u_j'(a_i)$ in the future and that when this happens, he may be unable to rebalance his asset position for some time. Thus, from (22), his choice of $a_i$ is lower than $u_i'^{-1}(rp)$, what he would choose in a world with no trading delays. If α increases, the investor is more likely to find a dealer faster; if η decreases, it will be cheaper for the investor to readjust his asset holdings once he finds a dealer. In both cases, the investor assigns more weight to the current marginal utility from holding the asset relative to the expected value, so his demand increases. Conversely, an investor with a current marginal valuation that is below his expected marginal valuation over the holding period reduces his demand when κ increases.15 All this seems to suggest that the distribution of asset holdings will spread out if frictions are reduced.
14 These conditions imply that the investor's problem has a solution for all q > 0 and that the nonnegativity constraints in (6) are slack at every date for every investor in the unique equilibrium. This will simplify the notation, but is otherwise inessential for our results.
15 In online Appendix B we show that this insight does not rely on preference shocks being independently and identically distributed (i.i.d.). There we derive an expression analogous to (23) when preference shocks follow a general Markov process, and we provide several sufficient conditions that allow us to sign $\partial g_i(\kappa,p)/\partial\kappa$. We show, for instance, that for κ sufficiently large, $\partial g_i(\kappa,p)/\partial\kappa > 0$ if and only if $u_i'(a_i) < \sum_{j=1}^I \pi_{ij}u_j'(a_i)$, where $\pi_{ij}$ is the probability that an investor with preference type i draws type j conditional on his receiving a preference shock. This condition is equivalent to the condition in part (i) of Proposition 2 in Gârleanu (2008). See Proposition 6 in online Appendix B (Lagos and Rocheteau (2009)) for details.
However, this intuition is only partial because (23) keeps the equilibrium asset price constant. Next, we study the effect of trading frictions on asset prices—a necessary step to establish the general equilibrium effect of trading frictions on the distribution of asset holdings. Let $u_i(a) = \varepsilon_i u(a)$. Then (22) becomes $\bar\varepsilon_i u'(a_i) = rp$, where $\bar\varepsilon_i = ((r+\kappa)\varepsilon_i + \delta\bar\varepsilon)/(r+\kappa+\delta)$ and $\bar\varepsilon = \sum_{j=1}^I \pi_j\varepsilon_j$. For a given p, as κ increases, the demands of investors with relatively low valuations ($\varepsilon_i < \bar\varepsilon$) fall, while those of investors with high valuations ($\varepsilon_i > \bar\varepsilon$) rise. Whether an increase in κ causes the asset price to rise depends on the curvature of the individual demand for the asset as a function of $\bar\varepsilon_i$, that is, on the slope of $\partial a_i/\partial\bar\varepsilon_i = -[u'(a_i)]^2/[u''(a_i)rp]$. In Appendix A (Proposition 5) we show that dp/dκ ≥ 0 if $[u'(a)]^2/[u''(a)rp]$ is decreasing in a.16 The following proposition characterizes the general equilibrium effect of trading frictions on the dispersion of the distribution of asset holdings.
PROPOSITION 2: (i) For all i ∈ {1,…,I}, $a_i \to A$ as r+κ → 0. (ii) Let $u_i(a) = \varepsilon_i a^{1-\sigma}/(1-\sigma)$ with σ > 0. An increase in κ causes the equilibrium distribution of asset holdings to become more dispersed.
According to part (i) of Proposition 2, the dispersion of the distribution of asset holdings approaches zero as trading frictions become very severe, provided that investors are sufficiently patient. This result holds for general preferences and will be useful in our analysis of trade volume, transaction costs, and trading delays.17
16 For example, if $u(a) = a^{1-\sigma}/(1-\sigma)$ with σ > 0, then dp/dκ < 0 (> 0) if σ > 1 (< 1). If $u(a) = \log a$, then $a_i$ is linear in $\bar\varepsilon_i$ and dp/dκ = 0. This particular result is reminiscent of the findings in Constantinides (1986), Gârleanu (2008), and Heaton and Lucas (1995) that the equilibrium asset price is not (much) affected by transaction costs. In online Appendix B, we show that this finding generalizes to the more general case of Markovian preference shocks.
17 In online Appendix B (part (iii) of Proposition 8 in Lagos and Rocheteau (2009)), we show that part (i) of Proposition 2 also holds for more general preference shock processes. The proof of part (ii) of Proposition 2 relies on the assumption of i.i.d. preference shocks and its immediate mean-reverting property. The i.i.d. specification, however, is without loss of generality for the case I = 2. (This is the case analyzed by DGP and much of the subsequent literature.) For I > 2, an increase in trading frictions need not compress the cross-sectional distribution of asset holdings. As pointed out by Gârleanu (2008), it is possible that for certain ranges of κ, an investor with a high current valuation (relative to the cross section of current valuations) may increase his asset holdings in response to an increase in trading frictions. The general insight, however, is that investors always react to more severe trading frictions by choosing asset positions that reduce the expected sizes of their future asset reallocations.
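Part (ii) of Proposition 2 can be illustrated directly, since with CRRA utility the equilibrium demands have the closed form derived in the Appendix (equation (37)). The sketch below, with hypothetical parameters, shows the cross-sectional spread of holdings widening as κ rises and collapsing toward A as r+κ → 0.

```python
# Sketch of Proposition 2(ii): with u_i(a) = eps_i * a**(1-sigma)/(1-sigma),
# holdings follow the closed form (37); their spread widens with kappa.
import numpy as np

r, delta, sigma, A = 0.01, 1.0, 2.0, 1.0
eps = np.array([0.5, 1.0, 1.5])
pi = np.array([0.3, 0.4, 0.3])

def holdings(kappa):
    epsbar = ((r + kappa) * eps + delta * (pi @ eps)) / (r + kappa + delta)
    return A * epsbar ** (1 / sigma) / (pi @ epsbar ** (1 / sigma))

for kappa in [0.01, 0.1, 1.0, 10.0, 100.0]:
    a = holdings(kappa)
    print(f"kappa = {kappa:7.2f}   a = {np.round(a, 4)}   "
          f"spread = {a[-1] - a[0]:.4f}")
# The spread a_I - a_1 is increasing in kappa, and all a_i -> A as
# r + kappa -> 0, consistent with both parts of Proposition 2.
```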
5. MARKET LIQUIDITY
In the previous section we showed that traders who operate in markets with OTC-style frictions will seek to mitigate these trading frictions by adjusting their asset positions so as to reduce their trading needs. In this section we show how this kind of "liquidity hedging" that we have identified—and that only becomes possible with unrestricted asset holdings—shapes the effects of trading frictions on the three key dimensions of market liquidity: trade volume, transaction costs, and trading delays.
Trade Volume
Let V denote trade volume, defined as
(24) $V = \frac{\alpha}{2}\sum_{i,j=1}^I n_{ij}\,|a_j - a_i|.$
An increase in α has three distinct effects on V. First, the measure of investors in any individual state $(i,j)\in X^2$ who gain access to the market and are able to trade increases, which tends to increase V. Second, the proportion $1-\sum_{i=1}^I n_{ii}$ of agents who are mismatched to their asset position—the fraction of agents who wish to trade—decreases, which tends to decrease V. Finally, the distribution of asset holdings spreads out, which tends to increase the quantity of assets traded in many individual trades. With (21) and (24), it is possible to show that the first two effects combined lead to an increase in V. Although it is difficult to sign the third effect in general due to the general equilibrium effects of the price on the distribution of asset holdings, the following proposition establishes analytical results for three cases.
PROPOSITION 3: (i) Trade volume approaches zero as r+κ → 0. (ii) Let $u_i(a) = \varepsilon_i\ln a$. Trade volume increases with κ. Moreover, for any pair (κ,κ′) such that κ′ > κ, the distribution of trade sizes associated with κ′ first-order stochastically dominates the one associated with κ. (iii) Let $u_i(a) = \varepsilon_i a^{1-\sigma}/(1-\sigma)$ with σ > 0 and assume that I = 2. Trade volume increases with κ.
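The following sketch computes trade volume (24) in the steady state for the log-utility case of part (ii) of Proposition 3 (illustrative parameters; κ varies through α for fixed η).

```python
# Trade volume (24) under log utility: V = (alpha/2) sum_ij n_ij |a_j - a_i|.
import numpy as np

r, delta, eta, A = 0.01, 1.0, 0.5, 1.0
eps = np.array([0.5, 1.0, 1.5])
pi = np.array([0.3, 0.4, 0.3])

def volume(alpha):
    kappa = alpha * (1 - eta)
    epsbar = ((r + kappa) * eps + delta * (pi @ eps)) / (r + kappa + delta)
    a = A * epsbar / (pi @ epsbar)   # log utility: a_i proportional to epsbar_i
    n = (delta * np.outer(pi, pi) + alpha * np.diag(pi)) / (alpha + delta)
    return 0.5 * alpha * (n * np.abs(a[:, None] - a[None, :])).sum()

for alpha in [0.1, 1.0, 10.0, 100.0]:
    print(f"alpha = {alpha:6.1f}   V = {volume(alpha):.5f}")
# V is increasing in alpha (hence in kappa), as stated in Proposition 3(ii).
```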
Transaction Costs
Intermediation fees and the implied bid–ask spreads constitute the out-of-pocket transaction costs borne by investors and are commonly used measures of market liquidity.18 At the same time, these spreads determine the revenue of dealers, and hence are a key determinant of their incentives to make markets and provide liquidity. Intermediation fees depend on the rate at which investors can contact alternative dealers, on their bargaining power in bilateral negotiations, and on the size of the trade. We showed in Propositions 2 and 3 that trade sizes tend to increase as trading frictions are reduced. The following result shows that, keeping the characteristics of an investor and a dealer constant, transaction costs—both total and per unit of asset traded—increase with the size of the trade.19
LEMMA 4: Consider an investor who holds asset position a ≥ 0 and wishes to trade $|a_i - a| > 0$. Both $\partial\phi_i(a)/\partial a$ and $\partial/\partial a\,[\phi_i(a)/|a_i-a|]$ have the same sign as $a - a_i$.
In the general equilibrium, κ affects the distribution of asset holdings, and this can give rise to nonmonotonicities in trading costs in response to changes in the degree of trading frictions. We prove this result for the case of patient traders, both for intermediation fees in individual trades and for a marketwide measure of transaction costs, $\Phi = \sum_{i,j=1}^I n_{ji}\phi_{ji}$. The average fee, Φ, represents the expected revenue of an individual dealer conditional on meeting an investor.
PROPOSITION 4: (i) For each $(i,j)\in X^2$, there exists $\bar r > 0$ such that for all $r < \bar r$ and η ∈ (0,1), $\phi_{ji}$ is nonmonotonic in κ and is largest for some κ ∈ (0,∞). (ii) There exists $\hat r > 0$ such that for all $r < \hat r$ and η ∈ (0,1), Φ is nonmonotonic in κ and is largest for some κ ∈ (0,∞).
In very illiquid markets (as r+κ → 0), investors hedge against future preference shocks by choosing asset holdings that reflect their average utility from holding the asset rather than their current utility at the time they trade. Thus, trade sizes and fees are small. In very liquid markets (as κ → ∞) investors trade large quantities, but the fees they pay are also small because of favorable search options. For intermediate values of κ, trade sizes are considerable and dealers have a degree of market power that results in larger intermediation fees. Part (ii) of Proposition 4 implies that dealers are better off when they trade in markets that are neither too liquid nor too illiquid.
18 See footnote 10 for the theoretical link between intermediation fees and bid–ask spreads.
19 The theory generates a distribution of transaction costs, not only across trade-size categories, but also among trades of equal size, which is in accordance with the evidence from the OTC market for municipal bonds (Green, Hollifield, and Schurhoff (2007)). The increasing relationship between trade size and transaction cost for given α is consistent with the empirical evidence on foreign exchange markets (Burnside, Eichenbaum, Kleshchelski, and Rebelo (2006, Table 12)). In contrast, empirical studies on municipal and corporate bond markets document that larger trades tend to be executed at a discount (Harris and Piwowar (2006)). Our model can rationalize this observation if we allow for heterogeneous investors, some of which can contact dealers faster than others. See Lagos and Rocheteau (2006).
If κ is very large, dealers would find it profitable to shift the trading activity to markets with larger η or smaller α. Conversely, if κ is very small, perhaps surprisingly, dealers would benefit from reductions in η or increases in α.
Trading Delays
Here we allow for free entry of dealers so as to endogenize the length of the trading delays and formalize the notion that a dealer's profit depends on the competition for order flow that he faces from other dealers. Let α now be a continuously differentiable function of the measure of dealers in the market, υ, with ∂α(υ)/∂υ > 0, ∂[α(υ)/υ]/∂υ < 0, α(0) = 0, $\lim_{\upsilon\to\infty}\alpha(\upsilon) = \infty$, and $\lim_{\upsilon\to\infty}\alpha(\upsilon)/\upsilon = 0$. Since all matches are bilateral and random, a dealer contacts an investor with Poisson rate α(υ)/υ. A large measure of dealers can choose to participate in the market, and while they participate, incur a flow cost γ > 0 that represents the ongoing costs of running the dealership.20 A steady-state equilibrium with free entry is a list $\langle\{a_i\},q,\{\phi_i(a)\},\{n_{ji}\},\upsilon\rangle$ that satisfies (18)–(21) with α = α(υ) and the free-entry condition
$$\frac{\alpha(\upsilon)}{\upsilon}\,\Phi = \gamma.$$
For any η > 0, there exists a steady-state equilibrium with entry of dealers (see Lagos and Rocheteau (2006)). However, the steady-state equilibrium with free entry need not be unique. Although the measure of dealers, υ, is strictly increasing in Φ, the dealers' expected revenue, Φ, can be a nonmonotonic function of α(υ) (part (ii) of Proposition 4). For the case of patient traders, it can be shown that the model necessarily exhibits multiple steady-state equilibria if α(υ)/υ is not too elastic (the effect of an additional dealer on existing dealers' order flow is not too large) and γ is in an intermediate range.21 In the case of multiple equilibria, the market could operate in a "low-liquidity equilibrium" with small trade volume, large spreads, and long trading delays, merely because few dealers make markets and investors engage in small transactions.22
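Behind the possible multiplicity is the hump shape of Φ established in Proposition 4(ii). The sketch below traces the dealers' expected revenue Φ over a grid of meeting rates under CRRA utility with a small r (all numbers hypothetical): Φ vanishes at both extremes and peaks at an intermediate α, so a free-entry condition of the form [α(υ)/υ]Φ = γ can admit more than one solution.

```python
# Sketch of Proposition 4(ii): the average fee Phi = sum_ij n_ji * phi_ji
# is hump-shaped in alpha when r is small. CRRA utility, hypothetical numbers.
import numpy as np

r, delta, sigma, A, eta = 0.001, 1.0, 2.0, 1.0, 0.5
eps = np.array([0.5, 1.0, 1.5])
pi = np.array([0.3, 0.4, 0.3])

def avg_fee(alpha):
    kappa = alpha * (1 - eta)
    epsbar = ((r + kappa) * eps + delta * (pi @ eps)) / (r + kappa + delta)
    q = ((pi @ epsbar ** (1 / sigma)) / A) ** sigma
    a = (epsbar / q) ** (1 / sigma)
    u = lambda x, i: epsbar[i] * x ** (1 - sigma) / (1 - sigma)
    # phi[j, i]: fee (20) paid by a type-i investor who arrives holding a_j
    phi = np.array([[eta * (u(a[i], i) - u(a[j], i) - q * (a[i] - a[j]))
                     / (r + kappa) for i in range(3)] for j in range(3)])
    n = (delta * np.outer(pi, pi) + alpha * np.diag(pi)) / (alpha + delta)
    return (n * phi).sum()

for alpha in [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]:
    print(f"alpha = {alpha:8.2f}   Phi = {avg_fee(alpha):.6f}")
# Phi is small for both very small and very large alpha and peaks in between,
# which is the nonmonotonicity behind the possible multiplicity of equilibria.
```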
APPENDIX A: PROOFS
PROOF OF LEMMA 1: We can write (5) as
(25) $V_i(a,t) = \bar U_i(a) + E_i\left[e^{-r(T_\kappa-t)}\Big\{p(T_\kappa)a + \max_{a'\ge 0}\big[V_{k(T_\kappa)}(a',T_\kappa) - p(T_\kappa)a'\big]\Big\}\right],$
20 This formulation of the free entry of dealers is analogous to the free entry of firms in Pissarides (2000). 21 See Lagos and Rocheteau (2008) for details. 22 The strategic complementarity that leads to multiple equilibria in this model depends crucially on the endogenous distribution of asset holdings. The multiplicity is not due to increasing returns in the meeting technology, as in Diamond (1982) or Vayanos and Weill (2008), or to the cost of holding the asset, as in Rocheteau and Wright (2005).
where
(26) $\bar U_i(a) = E_i\left[\int_0^{T_\kappa-t} e^{-rs}\,u_{k(t+s)}(a)\,ds\right].$
From (25), the problem of an investor with preference shock i who gains access to the market at time t is given by
(27) $\max_{a'\ge 0}\left\{\bar U_i(a') - \big[p(t) - E\big(e^{-r(T_\kappa-t)}p(T_\kappa)\big)\big]a'\right\}.$
Equation (26) can be written recursively,
(28) $(r+\kappa)\bar U_i(a) = u_i(a) + \delta\sum_{j=1}^I \pi_j\big[\bar U_j(a) - \bar U_i(a)\big].$
Multiply (28) through by $\pi_i$, sum over i, solve for $\sum_{j=1}^I \pi_j\bar U_j(a)$, and substitute this expression back into (28) to obtain $\bar U_i(a) = \bar u_i(a)/(r+\kappa)$, where $\bar u_i(a)$ is as in (7). The expected discounted price of the asset at the next time when the investor gets an opportunity to trade is
(29) $E\big[e^{-r(T_\kappa-t)}p(T_\kappa)\big] = \kappa\int_0^\infty e^{-(r+\kappa)s}\,p(t+s)\,ds.$
Substitute $\bar U_i(a) = \bar u_i(a)/(r+\kappa)$ and (29) into (27), and multiply through by (r+κ) to see that the investor's problem is given by the maximization in (6). The objective function on the right side of (6) is strictly concave and differentiable, so $\bar u_i'[a_i(t)] - q(t) \le 0$ ("=" if $a_i(t) > 0$) is necessary and sufficient for an optimum of this problem. Since $q(t) > \bar u_i'(\infty)$, $a_i(t)$ given by (6) is the unique solution to the maximization problem on the right side of (6). Q.E.D.
PROOF OF LEMMA 2: Since q(t) is continuous and bounded, the right side of (14) is well defined. Rewrite (8) as
(30) $q(t) = (r+\kappa)\left[p(t) - \kappa\int_t^\infty e^{-(r+\kappa)(s-t)}\,p(s)\,ds\right].$
It can be checked that $\tilde p(t) = \frac{1}{r+\kappa}\big[q(t) + \kappa\int_t^\infty e^{-r(s-t)}q(s)\,ds\big]$ is a particular solution to (30). Since q(t) is continuous and bounded, $\tilde p(t)$ is well defined, and it is continuous and bounded. Suppose that p(t) is any other solution to (30). Then $p(t) = \tilde p(t) + z(t)$, where
(31) $z(t) = \kappa\int_t^\infty e^{-(r+\kappa)(s-t)}\,z(s)\,ds.$
The right side of (31) is differentiable with respect to t, so z(t) is differentiable with $rz(t) - \dot z(t) = 0$, which implies $z(t) = Ze^{rt}$ for $Z\in\mathbb R$. Hence, any solution p(t) to (30) takes the form
(32) $p(t) = \frac{1}{r+\kappa}\left[q(t) + \kappa\int_t^\infty e^{-r(s-t)}q(s)\,ds\right] + Ze^{rt}.$
To determine the value of Z, we use (10), which can be written as
(33) $\lim_{n\to\infty} E\big\{e^{-rT_n}p(T_n)\,E_i\big[a_{k(T_n)}(T_n)\,\big|\,T_n\big]\big\} = 0.$
Since (33) holds for each i, multiply the left side by $n_i(0)$ and sum over i to get
(34) $\lim_{n\to\infty} E\left[e^{-rT_n}\,p(T_n)\sum_{i=1}^I n_i(0)\,E_i\big[a_{k(T_n)}(T_n)\,\big|\,T_n\big]\right] = 0,$
where
$$\sum_{i=1}^I n_i(0)\,E_i\big[a_{k(T_n)}(T_n)\,\big|\,T_n\big] = \sum_{i=1}^I n_i(0)\sum_{j=1}^I a_j(T_n)\,\Pr[k(T_n)=j\,|\,k(0)=i] = \sum_{j=1}^I a_j(T_n)\,n_j(T_n) = A.$$
Therefore, (34) becomes $\lim_{n\to\infty} E[e^{-rT_n}p(T_n)A] = 0$, and since A > 0, it implies
(35) $\lim_{n\to\infty} E[e^{-rT_n}p(T_n)] = 0.$
Substitute (32) into (35) to obtain
(36) $\lim_{n\to\infty} E\left[e^{-rT_n}\frac{q(T_n)}{r+\kappa} + \frac{\kappa}{r+\kappa}\int_{T_n}^\infty e^{-rs}q(s)\,ds + Z\right] = 0.$
Let $F_n$ denote the distribution function of $T_n$. Normalize $T_0 = 0$ and notice that $T_n = \sum_{m=1}^n (T_m - T_{m-1})$ is the sum of n independent exponentially distributed random variables with mean 1/κ, so $T_n/n \to 1/\kappa$ almost surely as n → ∞, by the strong law of large numbers.
419
the strong law of large numbers. This implies that limn→∞ Fn (x) = F(x) with F(x) = 0 for all x ∈ [0 ∞). Therefore, by Theorem 1 in Feller (1971, p. 249), ∞ κ −rx q(x) −rs e + e q(s) ds dFn (x) lim n→∞ r +κ r +κ x ∞ κ −rx q(x) −rs + = e e q(s) ds dF(x) r+κ r+κ x =0 + since limx→∞ [e−rx q(x) r+κ
κ r+κ
∞ x
e−rs q(s) ds]. Hence, (36) implies Z = 0. Q.E.D.
PROOF OF LEMMA 3: We proceed in three steps: (i) derive nji (τ t), (ii) derive n0ji (A t), and (iii) obtain Ht (A I ) for an arbitrary (A I ) ∈ Σ. (i) The density measure of investors who last readjusted their asset holdings at time t − τ > 0 is αe−ατ . The probability that an investor who last contacted a dealer at time t − τ has a history of preference types involving k(t − τ) = j and k(t) = i is (1 − e−δτ )πi + I{i=j} e−δτ . Since the measure of investors with preference type j at time t − τ is nj (t − τ), and the Poisson process for meeting dealers and the compound Poisson process for preference shocks are independent, the density measure of investors who last traded at time t − τ and who have a history of preferences involving k(t − τ) = j and k(t) = i is nji (τ t) = αe−ατ [(1 − e−δτ )πi + I{i=j} e−δτ ]nj (t − τ), as given by (17). (ii) The measure of investors who have not contacted a dealer up to time t is e−αt . Since the Poisson meeting process is independent of investors’ individual states, the time-t measure of investors whose asset holdings and preference types lay in the set (A {j}) at time 0 and who have not yet met a dealer at time t is e−αt H0 (A {j}). The measure of investors who were of preference type j at time 0 and are of type i at time t is (1 − e−δt )πi + e−δt I{j=i} . Thus, the time-t measure of investors who at time 0 had preference type j and assets in A, whose preference type is i at the current time t, and who have never traded (so their asset holdings are still in A) is n0ji (A t) = e−αt [(1 − e−δt )πi + e−δt I{j=i} ]H0 (A {j}), as given in (16). (iii) Ht (A I ) is the measure of investors who have an individual state I (a i) ∈ (A I ) at time t. The first term in Ht (A I ) is i∈I j=1 n0ji (A t), namely, those investors who never contacted dealers but who were holding asset positions in the set A at time 0 and whose preference types at t lie in I . The time-t measure of investors of type i who chose an asset position in the set A the last time they traded, given that their preference type at that time t was j, is 0 I{aj (t−τ)∈A} nji (τ t) dτ. Thus, the second term in Ht (A I ), namely, the measure of investors who the last time they traded chose asset positions that belong to the set A and whose preference types at time t lie in I , is I t Q.E.D. j=1 0 I{aj (t−τ)∈A} nji (τ t) dτ. i∈I
420
R. LAGOS AND G. ROCHETEAU
PROOF OF PROPOSITION 1: For all t ≥ 0, the distribution {ni (t)}Ii=1 is I unique and given by (12). Define Adt (q) ≡ { i=1 ni (t)ai (q) : ai (q) ∈ arg maxa ≥0 [u¯ i (a ) − qa ]} for q ∈ (q(t) +∞), where q(t) = maxi∈X u¯ i (∞) × I{ni (t)>0} . (If q ≤ q(t), then (9) has no solution for some i such that ni (t) > 0.) From Lemma 1, the optimal choice ai is uniquely determined for all q ∈ (q(t) +∞) and all i such that ni (t) > 0, and it is continuous in q. Consequently, Adt (q) is single-valued and continuous for q ∈ (q(t) +∞). Moreover, (9) implies that any interior choice ai (t) is a strictly decreasing function of ¯ q(t) for every i. Thus, Adt (q) is strictly decreasing for all q ∈ (q(t) q(t)), d ¯ ¯ where q(t) = maxi∈X u¯ i (0)I{ni (t)>0} and At (q) = {0} for all q ≥ q(t). As q ↓ ¯ Adt (q) → 0. So for each t there is a q(t), Adt (q) → +∞, and as q ↑ q(t), ¯ unique q(t) ∈ (q(t) q(t)) such that Adt [q(t)] = {A} or, equivalently, such that I I i=1 ni (t)ai [q(t)] = A. Given this q(t), there is a unique {ai (t)}i=1 that solves (9). Given q(t), (11) gives the fee φi (a t) for every i and a. Finally, given {ai (t)}Ii=1 , the distribution Ht is given by (15). From (12), limt→∞ ni (t) = πi for each i. By an argument similar to that in the proof of Proposition 1, one can establish that there is a unique, time-invariant q that clears the asset market. Given this q, (9) implies a unique set of timeinvariant optimal asset holdings {ai }Ii=1 . Thus, {ai }Ii=1 and q satisfy (18) and (19). Given the fact that q(t) = q for all t, (14) implies p = q/r. Given q and {ai }Ii=1 , (11) implies (20), which determines the time-invariant fees {φi (a)}Ii=1 . To derive (21), start from Lemma 3 and note that limt→∞ n0ji (A t) = 0 for all i, j ∈ X and all A ⊆ R+ . Also, limt→∞ nji (τ t) = αe−ατ [(1 − e−δτ )πi + e−δτ I{i=j} ]πj ≡ nji (τ ∞) and limt→∞ aj (t − τ) = aj , so lim Ht (A I ) =
t→∞
I i∈I j=1
∞
I{aj ∈A} nji (τ ∞) dτ ≡ H(A I )
0
for all (A I ) ∈ Σ. To conclude, observe that H({ai } {j}) = carry out the integration to obtain (21).
∞ 0
nij (τ ∞) dτ and Q.E.D.
PROOF OF PROPOSITION 2: (i) From (7), as r + κ → 0, then u¯ i (a) → I δ j=1 πj uj (a) which is independent of i. Together with market clearing, this implies that ai → A for all i ∈ {1 I} as r + κ → 0. (ii) Let ai (κ) denote the individual demand of an investor with preference type i in a market with effective contact rate κ. With ui (a) = εi a1−σ /(1 − σ), (37)
I
ai (κ) = A
j=1
(r + κ)εj + δε¯ πj (r + κ)εi + δε¯
1/σ
421
LIQUIDITY IN ASSET MARKETS
Consider κ > κ. We have a1 (κ ) < a1 (κ), since (r + κ)εj + δε¯ (r + κ )εj + δε¯ > (r + κ )ε1 + δε¯ (r + κ)ε1 + δε¯
for all j > 1
and aI (κ ) > aI (κ), since (r + κ)εj + δε¯ (r + κ )εj + δε¯ < (r + κ )εI + δε¯ (r + κ)εI + δε¯
for all j < I
The difference ai (κ ) − ai (κ) is continuous in εi , so there exists ε˜ ∈ (ε1 εI ) ˜ Moreover, from (37), such that ai (κ ) = ai (κ) ≡ a. (r + κ)a˜ ∂ai (κ) (r + κ )a˜ ∂ai (κ ) > = = ∂εi εi =ε˜ σ[(r + κ )ε˜ + δε] ¯ σ[(r + κ)ε˜ + δε] ¯ ∂εi εi =ε˜ so ai (κ ) as a function of εi intersects ai (κ) from below. Hence ε˜ is unique, ˜ With (21), the and ai (κ ) < ai (κ) for all εi < ε˜ and ai (κ ) > ai (κ) for all εi > ε. cumulative distribution of assets indexed by κ, is Gκ (a) =
I
I{aj (κ)≤a} πj
j=1 ˜ The fact a that ai (κ ) 0 for all i and ai = aj unless i = j. From (21), the proportion of trades that involve buying ai and selling aj or vice I I versa (for i = j) is (nij + nji )/(1 − i=1 nii ) = 2πi πj /(1 − i=1 πi2 ), which is independent of κ. From Proposition 5, dp/dκ = 0, so differentiating (22), d[gi (κ; p) − gj (κ; p)] δ(εi − εj ) = dκ rp(r + κ + δ)2 Thus, |ai −aj | = |gi (κ; p) −gj (κ; p)| increases with κ for all i = j. The measure of trades of size less than z ≥ 0 is I i=1 j=i
πi πj I{|ai −aj |≤z} I 2 1− πi i=1
which is decreasing in κ. This establishes that the distribution of trade sizes associated with κ first-order stochastically dominates the one associated with κ if κ > κ. Since every trade size is larger in the market with a larger κ, we conclude that V increases with κ. (iii) For I = 2, we have X ={1 2} and
V=
αδπ1 π2 [a2 (κ) − a1 (κ)] α+δ
where ai (κ) is given by (37). Since ε1 < ε2 , we have a1 (κ) < a2 (κ), and by part (i) of Proposition 2, da2 (κ) da1 (κ) 0 dκ α+δ dκ dκ (b) An increase in κ caused by an increase in α, which implies dV = dκ
> 0
δ α+δ
2
αδπ1 π2 da2 (κ) da1 (κ) − π1 π2 [a2 (κ) − a1 (κ)] + α+δ dκ dκ Q.E.D.
423
LIQUIDITY IN ASSET MARKETS
PROOF OF LEMMA 4: Differentiate (20) to obtain ∂φi (a) η =− [u¯ (a) − q] ∂a r +κ i Suppose that the nonnegativity constraint on ai is slack. Then, since u¯ i is strictly concave and u¯ i (ai ) − q = 0, we know that u¯ i (a) − q < 0 if and only if a − ai > 0, and ∂φi (a)/∂a has the same sign as a − ai . If ai = 0, then a > ai and u¯ i (a) − q < u¯ i (ai ) − q ≤ 0, so ∂φi (a)/∂a > 0, which is the same sign as a − ai = a > 0. This establishes the first part. To show the second part, divide (20) by (ai − a) and differentiate the resulting expression to get ∂ φi (a) η u¯ i (ai ) − u¯ i (a) − u¯ i (a)(ai − a) = ∂a ai − a r +κ (ai − a)2 which is strictly negative, since u¯ i is strictly concave.
Q.E.D.
PROOF OF PROPOSITION 4: (i) Let q(κ r), ai (κ r), and φji (κ r) denote, respectively, the equilibrium q, ai , and φji that solve (18), (19), and (20) for all i j ∈ X. We proceed in three steps: (a) show that φji (κ r) > 0 for all κ ∈ (0 ∞) and all r ∈ [0 ∞) provided ai (κ r) = aj (κ r) and η > 0; (b) establish that limκ→∞ φji (κ r) = 0 for any r ≥ 0 and all (i j) ∈ X2 ; (c) show that for each κ ∈ (0 ∞) there is r¯ > 0 such that φji (0 r) < φji (κ r) for all r ∈ (0 r¯). The nonmonotonicity of φji (κ r) with respect to κ for all r ∈ [0 r¯) will then follow from steps (a) through (c). η {maxa [u¯ i (a ; κ r) − qa ] − [u¯ i (aj ; κ r) − qaj ]}, so (a) From (20), φij = r+κ φij (κ r) > 0 for all κ ∈ (0 ∞) and all r ∈ [0 ∞) provided η > 0 and aj = arg maxa ≥0 [u¯ i (a ) − qa ] (i.e., provided the investor trades). (b) limκ→∞ q(κ r) = q∗ and limκ→∞ ai (κ r) = arg maxa ≥0 [ui (a ) − q∗ a ] ≡ I ∞ ∗ hi (q∗ ), where q∗ is independent of r and solves i=1 πi h∞ i (q ) = A, which ∗ ∞ ∗ ∗ in turn implies q ∈ (0 ∞), hi (q ) < ∞, and hence |ui (aj ) − q aj | < ∞ for all (i j) ∈ X2 . Therefore, limκ→∞ φij (κ r) = 0 for any r ≥ 0 and all (i j) ∈ X2 . ˜ (c) Let κ → 0 to obtain q(0 r) = q(r) and ai (0 r) = arg maxa ≥0 [u˜ i (a ) − I 0 ˜ where u˜ i (a; r) = (rui (a) + δu(a))/(r ˜ ˜ ˜ ] ≡ hi (q), + δ), u(a) = k=1 πk uk (a), qa I ˜ = A. From (20), and q˜ solves i=1 πi h0i (q) (38)
φji (0 r) = η
r[ui (ai ) − ui (aj )] + δ
I
πk [uk (ai ) − uk (aj )]
k=1
˜ − (r + δ)q(r)(a i − aj )
((r + δ)r)
424
R. LAGOS AND G. ROCHETEAU
˜ Observe that limr→0 ai (0 r) = u˜ −1 [q(0)] = A for each i ∈ X. Totally differentiate (18) and (19) with respect to r and evaluate at κ = r = 0 to find ˜ ∂ai (0 0) q(0) + δq˜ (0) − ui (A) = I ∂r δ πk uk (A) k=1
and I i=1
πi
∂ai (0 0) = 0 ∂r
Combine these conditions to get ˜ q(0) + δq˜ (0) − u˜ (A) = 0 I δ πk uk (A) k=1
˜ which together with the investor’s first-order condition, u˜ (A) = q(0), implies q˜ (0) = 0 and hence ˜ q(0) − ui (A) ∂ai (0 0) = I ∂r δ πk uk (A) k=1
With this, apply l'Hôpital's rule to (38) to find $\lim_{r\to 0}\phi_{ji}(0, r) = 0$. Our assumptions on primitives imply that $q(\kappa, r)$ and $a_i(\kappa, r)$ are continuous functions, so $\phi_{ji}(\kappa, r)$ is continuous. Hence, for each $(i, j)$ with $i \ne j$ and each $\kappa \in (0, \infty)$, there is some $\bar r > 0$ such that for all $r \in [0, \bar r)$, we have $\lim_{\kappa'\to\infty}\phi_{ji}(\kappa', r) = 0 < \phi_{ji}(\kappa, r)$ (by (a) and (b)) and $\phi_{ji}(0, r) < \phi_{ji}(\kappa, r)$ (by (a) and (c)), which establishes the nonmonotonicity of $\phi_{ij}$ with respect to $\kappa$.
(ii) Write $\Phi(\alpha, \eta, r) = \sum_{i,j=1}^I n_{ji}(\alpha)\,\phi_{ji}[\alpha(1 - \eta), r]$, where $n_{ji}(\alpha)$ is given by (21). Fix an arbitrary $(\alpha, \eta) \in (0, \infty) \times (0, 1)$. From step (a) in part (i), $\phi_{Ij}[\alpha(1 - \eta), r] > 0$ for $j < I$ and all $r \in [0, \infty)$. Hence, $\Phi(\alpha, \eta, r) > 0$ for all $\alpha(1 - \eta) \in (0, \infty)$ and all $r \in [0, \infty)$. Following a similar reasoning as in step (c) in part (i), for each $(i, j) \in X^2$, there is $\bar r_{ji} > 0$ such that for all $r \in [0, \bar r_{ji})$, $\phi_{ji}(0, r) < \Phi(\alpha, \eta, r)$. Then $\Phi(0, \eta, r) < \Phi(\alpha, \eta, r)$ for any $r \in [0, r_0)$, where $r_0 = \min_{(i,j)\in X^2} \bar r_{ji}$. Finally, from step (b) in part (i), for any $r \ge 0$ we have $\lim_{\alpha'\to\infty}\Phi(\alpha', \eta, r) = 0 < \Phi(\alpha, \eta, r)$, which establishes the nonmonotonicity of $\Phi$ with respect to $\alpha$ and, therefore, with respect to $\kappa = \alpha(1 - \eta)$. Q.E.D.
PROPOSITION 5: Let $u_i(a) = \varepsilon_i u(a)$. If $[u'(a)]^2/u''(a)$ is strictly decreasing in $a$, then $dp/d\kappa > 0$. If $[u'(a)]^2/u''(a)$ is increasing in $a$, then $dp/d\kappa \le 0$ (with "=" if $[u'(a)]^2/u''(a)$ is constant).

PROOF: Differentiate (19) to obtain
$$\frac{dp}{d\kappa} = \frac{\sum_{i=1}^I \pi_i\,\partial a_i/\partial\kappa}{-\sum_{i=1}^I \pi_i\,\partial a_i/\partial p}.$$
The denominator of this expression is strictly positive (from (22)), so focus on the sign of the numerator. Differentiate (22) to obtain $\partial a_i/\partial\kappa$, multiply by $\pi_i$, and add over all $i$ to arrive at
$$\sum_{i=1}^I \pi_i\,\frac{\partial a_i}{\partial\kappa} = \frac{\delta}{(r + \kappa + \delta)^2 rp}\sum_{i=1}^I \pi_i(\varepsilon_i - \bar\varepsilon)\,\frac{[u'(a_i)]^2}{-u''(a_i)}.$$
Suppose $-[u'(a)]^2/u''(a)$ is strictly increasing in $a$. Let $\bar a$ denote the $a$ that solves (22) for $\varepsilon_i = \bar\varepsilon$. Then note that $-[u'(a_i)]^2(\varepsilon_i - \bar\varepsilon)/u''(a_i) \ge -[u'(\bar a)]^2(\varepsilon_i - \bar\varepsilon)/u''(\bar a)$ for each $i$, with strict inequality for all $i$ such that $\varepsilon_i \ne \bar\varepsilon$. Thus, $\sum_{i=1}^I \pi_i\,\partial a_i/\partial\kappa > 0$ and, consequently, $dp/d\kappa > 0$. Similar reasoning implies $dp/d\kappa < 0$ if $-[u'(a)]^2/u''(a)$ is strictly decreasing and $dp/d\kappa = 0$ if $-[u'(a)]^2/u''(a)$ is constant in $a$. Q.E.D.
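To see what the curvature condition amounts to, take the CRRA family $u(a) = a^{1-\sigma}/(1 - \sigma)$, for which $[u'(a)]^2/u''(a) = -a^{1-\sigma}/\sigma$. A short symbolic check (our example; the proposition does not assume CRRA) shows that this ratio is strictly decreasing in $a$, so that $dp/d\kappa > 0$, exactly when $0 < \sigma < 1$:

    import sympy as sp

    a, sigma = sp.symbols('a sigma', positive=True)
    u = a**(1 - sigma) / (1 - sigma)          # CRRA utility, sigma != 1

    ratio = sp.diff(u, a)**2 / sp.diff(u, a, 2)
    print(sp.simplify(ratio))                 # -a**(1 - sigma)/sigma
    print(sp.simplify(sp.diff(ratio, a)))     # negative iff sigma < 1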
Dept. of Economics, New York University, 269 Mercer Street, New York, NY 10003, U.S.A.;
[email protected] and Dept. of Economics, University of California–Irvine, 3151 Social Science Plaza, Irvine, CA 92697-5100, U.S.A. and Federal Reserve Bank of Cleveland;
[email protected]. Manuscript received June, 2007; final revision received October, 2008.
Econometrica, Vol. 77, No. 2 (March, 2009), 427–452
SEARCH, OBFUSCATION, AND PRICE ELASTICITIES ON THE INTERNET

BY GLENN ELLISON AND SARA FISHER ELLISON1

We examine the competition between a group of Internet retailers who operate in an environment where a price search engine plays a dominant role. We show that for some products in this environment, the easy price search makes demand tremendously price-sensitive. Retailers, though, engage in obfuscation—practices that frustrate consumer search or make it less damaging to firms—resulting in much less price sensitivity on some other products. We discuss several models of obfuscation and examine its effects on demand and markups empirically.

KEYWORDS: Search, obfuscation, Internet, retail, search engines, loss leaders, add-on pricing, demand elasticities, frictionless commerce.
1. INTRODUCTION

WHEN INTERNET COMMERCE first emerged, one heard a lot about the promise of "frictionless commerce." Search technologies would have a dramatic effect by making it easy for consumers to compare prices at online and offline merchants. This paper examines an environment where Internet price search plays a dominant role: small firms selling computer parts through Pricewatch.com. A primary observation is that the effect of the Internet on search frictions is not so clear-cut: advances in search technology are accompanied by investments by firms in obfuscation.
We begin with a brief discussion of some relevant theory. One way to think about obfuscation is in relation to standard search-theoretic models in which consumers do not learn all prices in equilibrium. Obfuscation can be thought of as an action that raises search costs, which can lead to less consumer learning and higher profits. Another way to think about obfuscation is in relation to Ellison (2005), which describes how sales of "add-ons" at high unadvertised prices can raise equilibrium profits in a competitive price discrimination model. Designing products to require add-ons can thereby be a profit-enhancing obfuscation strategy even when consumers correctly infer all prices.
Pricewatch is an Internet price search engine popular with savvy computer-parts shoppers. Dozens of small, low-overhead retailers attract consumers just
1 We would like to thank Nathan Barczi, Jeffrey Borowitz, Nada Mora, Youngjun Jang, Silke Januszewski, Caroline Smith, Andrew Sweeting, and Alex Wolitzky for outstanding research assistance. We also thank Patrick Goebel for a valuable tip on Internet data collection, Steve Ellison for sharing substantial industry expertise, and Drew Fudenberg, the co-editor, and three anonymous referees for their comments. This work was supported by NSF Grants SBR-9818524, SES-0219205, and SES-0550897. The first author's work was supported by fellowships from the Center for Advanced Study in the Behavioral Sciences and the Institute for Advanced Study. The second author's work was supported by fellowships from the Hoover Institution and the Institute for Advanced Study.
© 2009 The Econometric Society    DOI: 10.3982/ECTA5708
by keeping Pricewatch informed of their low prices. Although atypical as a retail segment, Pricewatch retail has many of the features one looks for as a setting for an empirical industrial organization study: it is not too complicated, there is unusually rich data, and the extreme aspects of the environment should make the mechanisms of the theory easier to examine.
We present an informal evidence section describing various practices that can be thought of as forms of obfuscation. Some of these are as simple as making product descriptions complicated and creating multiple versions of products. We particularly call attention to the practice of offering a low-quality product at a low price to attract consumers and then trying to convince them to pay more for a superior product. We refer to this as a "loss-leader strategy" even though it sometimes differs from the classic loss-leader strategy in two respects: it involves getting consumers to upgrade to a superior product rather than getting them to buy both the loss leader and a second physical good, and the loss leader may be sold for a slight profit rather than at a loss.
The majority of the paper is devoted to formal empirical analyses. We analyze demand and substitution patterns within four categories of computer memory modules. Data come from two sources. We obtained yearlong hourly price series by repeatedly conducting price searches on Pricewatch. We matched this to sales data obtained from a single private firm that operates several computer parts websites and derives most of its sales from Pricewatch referrals.
Our first empirical result is a striking confirmation that price search technologies can dramatically reduce search frictions. We estimate that the firm faces a demand elasticity of −20 or more for its lowest quality memory modules!
Our second main empirical result is a contribution to the empirics of loss leaders. We show that charging a low price for a low-quality product increases our retailer's sales of medium- and high-quality products. Intuitively, this happens because one cannot ask a search engine to find "decent-quality memory module sold with reasonable shipping, return, warranty, and other terms." Hence, many consumers use Pricewatch to do what it is good at—finding websites that offer the lowest prices for any memory module—and then search within a few of these websites to find products that better fit their preferences.
Other empirical results examine how obfuscation affects profitability. We examine predictions of the two obfuscation mechanisms mentioned above. In the search-theoretic model, obfuscation raises profits by making consumers less informed. In Ellison's (2005) add-on pricing model, obfuscation raises profits by creating an adverse-selection effect that deters price-cutting. We find evidence of the relevance of both mechanisms.
Finally, we examine an additional data source—cost data—for direct evidence that retailers' obfuscation strategies have been successful in raising markups beyond the level that would otherwise be sustainable. Given the extreme price sensitivity of the demand for low-quality products, a naive application of single-good markup rules would suggest that equilibrium price–cost margins might be just 3% to 6%.
We find that the average markup on the memory modules sold by the firm that provided us with data is about 12%.
A few previous papers have examined price search engines empirically. Brynjolfsson and Smith (2001) used a data set containing the click sequences of tens of thousands of people who conducted price searches for books on Dealtime to estimate several discrete-choice models of demand. Baye, Gatti, Kattuman, and Morgan (2006) examined an extensive data set on the Kelkoo price comparison site and noted that there is a big discontinuity in clicks at the top, in line with clearinghouse models. One advantage of our data set relative to others we are aware of is that we observe actual quantities sold and not just "clickthroughs." A large number of studies have documented online price dispersion.2 The one study we know of that reports price elasticities obtained from quantity data in an online retail sector is Chevalier and Goolsbee (2003). Some other studies that provide evidence related to Internet search and price levels are Brown and Goolsbee (2002) and Scott Morton, Zettelmeyer, and Silva-Risso (2001, 2003). Our paper has also spawned a broader literature on obfuscation.3

2. THEORY OF SEARCH AND OBFUSCATION

Our most basic observation is that it is not a priori obvious that the Internet will lead us toward frictionless commerce. Price search engines and other Internet tools will help consumers to find and to process information, but retailers may simultaneously harness the power of the Internet to make information processing problems more formidable and/or to make consumer informedness less damaging to their profits. In this section we quickly sketch two ways in which one might think about obfuscation.4

2.1. Incomplete Consumer Search

A number of authors have developed models in which consumer search costs affect market efficiency and firm profits. Stahl (1989, 1996), for example, considered a model in which some consumers incur a search cost every time they obtain a price quote, whereas other consumers do not. The model has a mixed strategy equilibrium: retailers randomize over prices in some interval; fully informed consumers purchase from the lowest priced firm; other consumers often stop searching before finding the lowest priced firm.
2 See Baye, Morgan, and Scholten (2004) for one such study and Baye, Morgan, and Scholten (2007) for a survey.
3 See Ellison (2005), Gabaix and Laibson (2006), Spiegler (2006), and Brown, Hossain, and Morgan (2007).
4 See Ellison and Ellison (2004) for a longer discussion of search engines and search and obfuscation; see Baye and Morgan (2001, 2003) for two formal models of search engines and their effects on prices and firm profits.
Firm profits are increasing in the fraction of consumers with positive search costs and in the level of the search costs.
One could regard obfuscation as an action that raises search costs and/or the fraction of consumers who incur search costs. Such actions would increase average markups and the fraction of consumers buying from relatively high-priced firms. Developing such a formal model for our application is well beyond the scope of this paper: one would want all consumers' searches to be directed by the Pricewatch list, whereas Stahl's consumers search in a random manner; one would want to extend the model to include multiple products per firm; and one would also want to make search costs firm-specific so that obfuscation could be an action taken by individual firms and not by firms as a whole.5 Nonetheless, the basic intuition from search models that obfuscation might lead to higher profits by making consumer learning less complete seems useful to explore empirically.

2.2. Add-Ons and Adverse Selection

Ellison (2005) provided a model with a somewhat different flavor—add-on pricing schemes can raise retailers' profits even if consumers correctly infer all prices in equilibrium. We develop this idea in more generality below to illustrate how it would work in an empirically relevant setting.6
Suppose two firms $i = 1, 2$ can each produce two versions of a good $j = L, H$ at constant marginal costs $c_L$ and $c_H$. They post prices $p_{iL}$ for their low-quality goods on a price comparison site and simultaneously choose nonposted prices $p_{iH}$ for their high-quality products. Consumers who visit the price comparison site learn both low-quality prices. At a time cost of $s$, consumers can visit a firm's website, learn its high-quality price, and buy or not buy. They can then visit the second firm's site at an additional cost of $s$ if they so desire. We assume, however, that consumers wish to buy at most one unit.
As in Diamond (1971), the incremental price of the "upgrade" from good L to good H is priced at the ex post monopoly price in any pure strategy equilibrium. The argument is that at any lower price the firm will always be tempted to raise its upgrade price by $\varepsilon$. For $\varepsilon < s$, no consumer will switch to the other firm, because that would require incurring $s$ again and the other firm's product was less attractive at the prices that the consumer anticipated. Formally, if we write $p_{iU} \equiv p_{iH} - p_{iL}$ for the upgrade price, $c_U = c_H - c_L$ for the cost of the
5 Another difficulty with the application is that the mixed strategy nature of the equilibrium is awkward.
6 Ellison (2005) used several special assumptions. The population consists of two types, demand for the low-quality good is linear, and all consumers of the same type have an identical willingness to pay to upgrade to the high-quality good.
upgrade, and $x(p_{iU}, p_{iL}, p_{-iL})$ for the fraction of consumers who choose to upgrade, Diamond's argument implies that the equilibrium price $p^*_{iU}$ satisfies
$$p^*_{iU}(p_{iL}, p_{-iL}) = p^m_{iU}(p_{iL}, p_{-iL}) \equiv \arg\max_{p}\,(p - c_U)\,x(p, p_{iL}, p_{-iL}).$$
Write $x^*(p_{1L}, p_{2L})$ for $x(p^*_{iU}(p_{iL}, p_{-iL}), p_{iL}, p_{-iL})$. Write $D_1(p_1, p_2)$ for the number of consumers who visit firm 1.7 Assume that this function is smooth, strictly decreasing in $p_1$, and otherwise well behaved. Firm 1's profits when it sets price $p_{1L}$ and the other firm follows its equilibrium strategy are given by
$$\pi_1(p_{1L}, p^*_{2L}) = \bigl[p_{1L} - c_L + x^*(p_{1L}, p^*_{2L})\bigl(p^m_{1U}(p_{1L}, p^*_{2L}) - c_U\bigr)\bigr] \times D_1(p_{1L}, p^*_{2L}).$$
The first-order condition implies that the equilibrium prices satisfy
(1)  $$\frac{p^*_{1L} + x^*(p^*_{1L}, p^*_{2L})\,p^m_{1U} - c_L - x^*(p^*_{1L}, p^*_{2L})\,c_U}{p^*_{1L} + x^*(p^*_{1L}, p^*_{2L})\,p^m_{1U}} = -\frac{1}{\varepsilon}\left[1 + (p^m_{1U} - c_U)\,\frac{\partial x^*}{\partial p_{1L}} + x^*(p^*_{1L}, p^*_{2L})\,\frac{\partial p^m_{1U}}{\partial p_{1L}}\right],$$
where
$$\varepsilon = \frac{\partial D_1}{\partial p_{1L}}\,\frac{p^*_{1L} + x^*(p^*_{1L}, p^*_{2L})\,p^m_{1U}}{D_1(p^*_{1L}, p^*_{2L})}.$$
The left-hand side of this expression is the firm's revenue-weighted average markup. The right-hand side is the product of a term that is like the inverse of a demand elasticity and a multiplier.
Suppose first that the fraction of firm 1's customers who buy the upgrade at any given price $p_{1U}$ is independent of $p_{1L}$.8 Then the last two terms in the multiplier are zero. Hence, the average markup satisfies an inverse elasticity rule. If total demand is highly sensitive to the low-quality price, then markups will be low. It does not matter whether the firm earns extremely high profits on add-on sales: these are fully "competed away" with below-cost prices if necessary in the attempt to attract consumers.
Although the constant-upgrade-fraction assumption might seem natural and has been made with little comment in many papers on competitive price discrimination, Ellison (2005) argued that it is not compelling.
7 In any pure strategy equilibrium, all consumers who visit firm i will buy from firm i. Otherwise they would be better off not visiting.
8 For example, suppose that the optimal price for good H is always $25 above the price of good L and that 50% of consumers upgrade at this price differential.
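Equation (1) is easy to confirm numerically. In the sketch below (entirely our construction: the functional forms for $D_1$, the upgrade share $x$, and the upgrade price are invented for illustration), we maximize firm 1's profit over the low-quality price and check that the revenue-weighted markup at the optimum equals the right-hand side of (1):

    import numpy as np
    from scipy.optimize import minimize_scalar

    c_L, c_U = 45.0, 10.0
    D1 = lambda p: 1e30 * p**(-15.0)        # visits, highly price-sensitive
    x  = lambda p: 0.3 + 0.01 * (p - 50.0)  # upgrade share rising in p1L
    P  = lambda p: 20.0 + 0.2 * (p - 50.0)  # ex post optimal upgrade price

    profit = lambda p: (p - c_L + x(p) * (P(p) - c_U)) * D1(p)
    p = minimize_scalar(lambda q: -profit(q),
                        bounds=(45.5, 80.0), method='bounded').x

    rev = p + x(p) * P(p)                   # revenue per visiting consumer
    lhs = (rev - c_L - x(p) * c_U) / rev    # revenue-weighted average markup
    h = 1e-6
    eps = (D1(p + h) - D1(p - h)) / (2 * h) * rev / D1(p)
    multiplier = 1 + (P(p) - c_U) * 0.01 + x(p) * 0.2
    print(lhs, -multiplier / eps)           # the two sides of (1) agree

Because the upgrade share and upgrade price both rise in the low-quality price here, this example is in the adverse-selection case and its multiplier exceeds 1.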
One way in which real-world consumers will be heterogeneous is in their marginal utility of income. In this case, price cuts disproportionately attract "cheapskates" who have a lower willingness to pay for upgrades. This suggests that it may be more common that both $\partial p^m_{1U}/\partial p_{1L} > 0$ and $\partial x^*/\partial p_{1L} > 0$. Ellison (2005) referred to such demand systems as having an adverse-selection problem when add-ons are sold. With such demand, sales of add-ons will raise equilibrium profit margins above the inverse-elasticity benchmark. The factor by which profit margins increase is increasing in both the upgrade price and the fraction of consumers who upgrade. Hence, taking a low-cost, high-value feature out of the low-quality good and making it available in the high-quality good may be a profit-enhancing strategy.

3. THE PRICEWATCH UNIVERSE AND MEMORY MODULES

We study a segment of e-retail shaped by the Pricewatch price search engine. It is composed of a large number of small, minimally differentiated firms selling memory upgrades, central processing units (CPUs), and other computer parts. The firms do little or no advertising, and receive most of their customers through Pricewatch. Pricewatch presents a menu that contains a set of predefined categories. Clicking on one returns a list of websites sorted from cheapest to most expensive in a twelve-listings-per-page format. The categories invariably contain heterogeneous offerings: some include products made by higher and lower quality manufacturers, and all include offers with varying return policies, warranties, and other terms of trade. Figure 1 contains the first page of a typical list, that for 128MB PC100 memory modules from October 12, 2000.
There is substantial reshuffling in the sorted lists, making Pricewatch a nice environment for empirical study. For example, on average three of the twenty-four retailers on the first two pages of the 128MB PC100 list change their prices in a given hour. Each price change can move several other firms up or down one place. Some websites regularly occupy a position near the top of the Pricewatch list, but there is no rigid hierarchy. Several factors contribute to the reshuffling. One of these is the volatility of wholesale memory prices: wholesale price changes will make firms want to change retail prices. Memory prices declined by about 70% over the course of the year we study, but there were also two subperiods during which prices rose by at least 25%. A second complementary factor is a limitation of Pricewatch's technology: Pricewatch relied on retailers updating their prices in its data base. Most or all of the retailers were doing this manually in the period we study and would probably reassess each price one or a few times per day.9 When wholesale prices are declining, this results in a pattern where each firm's price tends to drift slowly down the list until the next time it is reset.
9 A retailer may have dozens or hundreds of products listed in various Pricewatch categories.
FIGURE 1.—A sample Pricewatch search list: 128MB PC100 memory modules at 12:01pm ET on October 12, 2000.
Our sales and cost data come from a firm that operates several websites, two of which regularly sell memory modules.10 We have data on products in four Pricewatch categories of memory modules: 128MB PC100, 128MB PC133, 256MB PC100, and 256MB PC133. PC100 versus PC133 refers to the speed with which the memory communicates with the CPU. They are not substitutes for most retail consumers because the speed of a memory module must match the speed of a computer’s CPU and motherboard. The second part of the 10
10 We will call these Site A and Site B.
FIGURE 2.—A website designed to induce consumers to upgrade to a higher quality memory module.
product description is the capacity of the memory in megabytes. The 256MB modules are about twice as expensive.
Each of our firm's websites sells three different quality products within each Pricewatch category. They are differentiated by the quality of the physical product and by contract terms. Figure 2 illustrates how a similar quality choice is presented to consumers on a website that copied Site A's design. Making comparisons across websites would be much harder than making within-website comparisons because many sites contain minimal technical specifications and contractual terms are multidimensional.

4. OBSERVATIONS OF OBFUSCATION

Pricewatch has made a number of enhancements to combat obfuscation. Practices that frustrate search nonetheless remain commonplace.
One of the most visible search-and-obfuscation battles was fought over shipping costs. In its early days Pricewatch did not collect information on shipping costs and sorted its lists purely on the basis of the item price. Shipping charges grew to the point that it was not uncommon for firms to list a price of $1 for a memory module and inform consumers of a $40 "shipping and handling" fee at check out. Pricewatch fought this with a two-pronged approach: it mandated that all firms offer United Parcel Service (UPS) ground shipping for a fee no greater than a Pricewatch-set amount ($11 for memory modules); and it added a column that displayed the shipping charge or a warning that customers should be wary of stores that do not report their shipping charges.11
Many retailers adopted an $11 shipping fee in response, but uncertainty about the cost of UPS ground shipping was not completely eliminated: a number of retailers left the column blank or reported a range of charges. The meaning of "UPS ground shipping" was also subject to manipulation: one company explicitly stated on its website that items ordered with the standard UPS ground shipping were given lower priority for packing and might take two weeks to arrive. More recently, Pricewatch mandated that retailers provide it with shipping charges and switched to sorting low-price lists based on shipping-inclusive prices. This appears to be working, but is only fully satisfactory for customers who prefer ground shipping: those who wish to upgrade to third-, second-, or next-day air must search manually through retailers' websites.
One model of obfuscation we discussed involved firms trying to increase customers' inspection costs and/or reduce the fraction of customers who will buy from the firm on the top of the search engine's list. We observed several practices that might serve this purpose. The most effective seems to be bundling low-quality goods with unattractive contractual terms, like providing no warranty and charging a 20% restocking fee on all returns. Given the variety of terms we observed, it would seem unwise to purchase a product without reading the fine print. Another practice is making advertised prices difficult to find. In 2001 it took us quite a bit of time to find the prices listed on Pricewatch on several retailers' sites. In a few cases, we never found the listed prices. Several other firms were explicit that Pricewatch prices were only available on telephone orders. Given that phone calls are more costly for the retailers, we assume that firms either wanted people to waste time on hold or wanted to make people sit through sales pitches. Pricewatch has fought these practices in several ways. For example, it added a "buy now" button, which (at least in theory) takes customers directly to the advertised product.
The second obfuscation mechanism we discussed is the adoption of a loss-leader or add-on pricing scheme: damaged goods are listed on the search engine at low prices and websites are designed to convince customers attracted by the low prices to upgrade to a higher quality product. Such practices are now ubiquitous on Pricewatch. Figure 2 is one example. Customers who tried to order a generic memory module from Buyaib.com at the price advertised on Pricewatch.com were directed to this page. It illustrates several ways in which the low-priced product is inferior to other products the company sells (at higher markups). Figure 3 is another example. A consumer who tried to order a generic module from Tufshop.com was taken to this page, on which a number of complementary products, upgrades, and services were listed. The figure shows the webpage as it initially appeared, defaulting the buyer to several
11 Our empirical work is based on data from the period when these policies were in effect.
FIGURE 3.—Another website designed to induce consumers to upgrade and/or buy add-ons.
upgrades. To avoid purchasing the various add-ons, the consumer must read through the various options and unclick several boxes. After completing this page, the customer was taken to another on which he or she must choose from a long list of shipping options. These include paying $15.91 extra to upgrade
FIGURE 3.—(Continued.)
from UPS ground to UPS 3-day, $30.96 extra to upgrade to UPS 2-day, and $45.96 extra to upgrade to UPS next day.12
Our impression is that the practices are also consistent with the add-on pricing model in terms of the low-priced goods being of inefficiently low quality. In Pricewatch's CPU categories all of the listings on the first few pages were "bare" CPUs without fans attached. This seems highly inefficient: an experienced installer can attach a fan in less than a minute, whereas there is a nontrivial probability that a novice will snap off a pin and ruin a $200 chip.
12 The incremental costs to Tufshop of the upgraded delivery methods were about $4, $6, and $20.
We were also told that most of the generic memory modules at the top of Pricewatch's memory lists are poor quality products that are much more likely to have problems than are other modules that can sometimes be purchased wholesale for just $1 or $2 more. We know that the wholesale price difference is occasionally so small as to induce the retailer from which we got our data to ship medium-quality generic modules to customers who ordered low-quality modules (without telling the customers) because it felt the time cost and hassle of dealing with returns was not worth the cost savings.
Obfuscation could presumably take many forms in addition to those we outlined in our theory section. One is that firms could try to confuse boundedly rational consumers. Presumably, this would involve either tricking consumers into paying more for a product than it is worth to them or altering their utility functions in a way that raises equilibrium profits. Our impression is that many Pricewatch retailers' sites are intentionally confusing. For example, whereas several sites will provide consumers with product comparison lists like that in Figure 2, we did not see any that augmented such a comparison with a description of what "CAS latency" means to help consumers think about whether they should care about it.
Pricewatch requires that retailers enter their prices into a data base. An alternate technology for running a price comparison site is to use shopbots to gather information automatically from retailers' sites. The shopbot approach may be even more prone to obfuscation. In 2001, for example, the Yahoo! Shopping search engine should have had a much easier time gathering information than a general search engine because it only searched sites hosted by Yahoo. Yahoo collected a royalty on all sales made by merchants through Yahoo! Shopping, so there must have been some standardization of listing and ordering mechanics. Nonetheless, when we typed "128MB PC100 SDRAM DIMM" into the search box, the five lowest listed prices were from merchants who had figured out how to get Yahoo! Shopping's search engine to think the price is zero even though a human who clicks over to the retailer can easily see the price (and see that it is 50–100% above the Pricewatch price). The next hundred or so cheapest items on Yahoo's search results were also either products for which Yahoo's search engine had misinterpreted the price or misclassified items.

5. DATA

Our price data were collected from Pricewatch.com. They contain information on the twelve or twenty-four lowest price offerings within each of the four predefined categories mentioned above.13 They are at hourly frequency from May 2000 to May 2001.
13 We collected the twenty-four lowest prices for the 128MB PC100 and 128MB PC133 categories and the twelve lowest prices for the other two.
In addition to the price data for these low-quality products, we obtained price and quantity data from an Internet retailer who operates two websites that sell memory modules. The data contain the prices and the quantities sold for all products that fit within the four Pricewatch categories. The websites usually offer three different quality products in each category. We aggregate data on individual orders to produce daily sales totals for each product–website pair.14 Our primary price variables are the average transaction prices for sales of a given product on a given day.15 We also record the daily average position of each website on Pricewatch's price-ranked list. The same Internet retailer also provided us with data on wholesale acquisition costs for each product.
Websites A and B have identical product lineups: they sell three products within each memory module category, which we refer to as the low-, the medium-, and the high-quality module. Our data set contains between 575 and 683 observations in each category.16 Summary statistics for the 128MB PC100 category are given in Table I.17 The data are at the level of the website-day, so the number of days covered is approximately half of the number of observations.

TABLE I
SUMMARY STATISTICS FOR MEMORY MODULE DATA
(128MB PC100 MEMORY MODULES; 683 WEBSITE-DAY OBSERVATIONS)

Variable               Mean      Stdev     Min       Max
LowestPrice            62.98     33.31     21.00     120.85
Range 1–12              6.76      2.52      1.00      13.53
PLow                   66.88     34.51     21.00     123.49
PMid                   90.71     40.10     35.49     149.49
PHi                   115.19     46.37     48.50     185.50
log(1 + PLowRank)       1.86      0.53      0.69       3.26
QLow                   12.80     17.03      0        163
QMid                    2.44      3.33      0         25
QHi                     2.02      3.46      0         47

14 Here, "product" also includes the quality level, for example, a high-quality 128MB PC100 module.
15 Transaction prices are unavailable for products which have zero sales on a given day. These are filled in using the data collected from Pricewatch or imputed using prices on surrounding days and prices charged by the firm's other websites.
16 Data are occasionally missing due to failures of the program we used to collect data and missing data in the files the firm provided. The 256MB prices are missing for most of the last six weeks, so we chose to use mid-March rather than May as the end of the 256MB samples.
17 Summary statistics for the other categories are presented in Ellison and Ellison (2004). We will present many results for the 128MB PC100 category and only discuss how the most important of these extend to the other categories. One reason for this choice is that the 128MB PC100 data are available for the longest time period and demand is less time-varying, which allows for more precise estimates.
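The aggregation from order-level records to website-day observations described above is straightforward; a sketch with hypothetical column names (the raw files are proprietary):

    import pandas as pd

    def daily_panel(orders: pd.DataFrame) -> pd.DataFrame:
        # orders: one row per order, with assumed columns
        # ['date', 'website', 'category', 'quality', 'price', 'units'].
        g = orders.groupby(['date', 'website', 'category', 'quality'])
        return g.agg(quantity=('units', 'sum'),     # daily sales totals
                     avg_price=('price', 'mean')    # average transaction price
                     ).reset_index()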
LowestPrice is the lowest price listed on Pricewatch (which is presumably for a low-quality memory module).18 Range 1–12 is the difference between the twelfth lowest listed price and the lowest listed price. Note that the price distribution is fairly tight. PLow, PMid, and PHi are the prices for the three qualities of memory modules at the two websites. QLow, QMid, and QHi are the average daily quantities of each quality of module sold by each website. The majority of the sales are the low-quality modules. PLowRank is the rank of the website's first entry in Pricewatch's sorted list of prices within the category.19 This variable turns out to allow us to predict sales much better than we can with simple functions of the cardinal price variables. We have not broken the summary statistics down by website. Website A's prices are usually lower than website B's, but there is no rigid relationship. In the 128MB PC100 category, website A has a lower low-quality price on 251 days and accounts for 70% of the combined unit sales.

6. DEMAND PATTERNS

In this section we estimate demand elasticities and examine how consumers substitute between low-, medium-, and high-quality products. We do this both to provide descriptive evidence on search-engine-influenced e-retail and to provide empirical evidence on theories of obfuscation.

6.1. Methodology for Demand Estimation

Assume that within each product category $c$, the quantity of quality $q$ products purchased from website $w$ on day $t$ is $Q_{wcqt} = e^{X_{wct}\beta_{cq}} u_{wcqt}$ with
$$X_{wct}\beta_{cq} = \beta_{cq0} + \beta_{cq1}\log(PLow_{wct}) + \beta_{cq2}\log(PMid_{wct}) + \beta_{cq3}\log(PHi_{wct}) + \beta_{cq4}\log(LowestPrice_{ct}) + \beta_{cq5}\log(1 + PLowRank_{wct}) + \beta_{cq6}\,Weekend_t + \beta_{cq7}\,SiteB_w + \sum_{s=1}^{12}\beta_{cq,7+s}\,TimeTrend_{st}.$$
18 The Pricewatch data are hourly. Daily variables are constructed by taking a weighted average across hours using weights that reflect the average hourly sales volumes of the websites we study.
19 We only know a site's Pricewatch rank if it is among the twelve or twenty-four lowest priced websites. When a site does not appear on the list, we impute a value for PLowRank using the difference between the site's price and the highest price on the list. In the 128MB category this happens for fewer than 1% of the observations. In the 256MB category this happens for 3% of the Site A observations and 14% of the Site B observations.
The effect of PLowRank on demand is of interest for two reasons: it will contribute to the own-price elasticity of demand for low-quality memory and it provides information on how the Pricewatch list is guiding consumers who buy other products. The price variables PLow, PMid, and PHi are used to estimate elasticities. We think of the other variables mostly as important controls. An important part of our estimation strategy is the inclusion of the TimeTrend variables, which allow for a piecewise linear time trend with a slope that changes every 30 days.
We estimate the demand equations via generalized method of moments. Specifically, for most of our estimates we assume that the multiplicative error term $u_{wcqt}$ satisfies $E(u_{wcqt}\,|\,X_{wct}) = 1$ so that we can estimate the models using the moment condition
$$E\bigl(Q_{wcqt}\,e^{-X_{wct}\beta_{cq}} - 1\,\big|\,X_{wct}\bigr) = 0.$$
These estimates are done separately for each product category and each quality level. Standard errors use a Newey–West style approach to allow for serial correlation.
This estimation approach presumes that the price variables and PLowRank are not endogenous. In the case of PLowRank we think this is a very good assumption: our e-retailer has little information on demand fluctuations and little analytic capability to assess whether idiosyncratic conditions affect the relative merits of different positions on the Pricewatch list. The person who sets prices told us that he checks some of the Pricewatch lists a few times a day and might change prices for a few reasons: if a rank has drifted too far from where he typically leaves it, if there has been a wholesale price change, or occasionally if multiple employees have failed to show up for work and he needs to reduce volume.
The price variables are more problematic. The obvious endogeneity concern is that prices may be positively correlated with demand shocks and/or rivals' prices, which would bias estimates of own-price elasticities toward zero. The idea behind our base estimates, however, is that the unusual time-series properties of the variables may let us address this at least in part without instruments. The unusual aspect of the data is that our retailer tends to leave medium- and high-quality prices fixed for a week or two and then to change prices by $5–10. Our hope is that demand shifts and rivals' prices are moving sufficiently smoothly so that much of the variation in them can be captured by the flexible time trends. The effect of our firm's prices on demand may be picked up in the periods around the discontinuous changes. In the next section we will see that we have some success with this approach, but in several categories it does not work very well. We present alternate estimates derived from using two distinct sets of instruments for the price variables in Section 6.5.
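The estimator itself is simple to implement. The sketch below (our own minimal version, run on simulated data; the Newey-West standard errors are omitted) solves the sample analogue of the moment condition, using each regressor as its own instrument. Replacing X with an instrument matrix Z inside sample_moments gives the IV variants discussed in Section 6.5:

    import numpy as np
    from scipy.optimize import root

    def sample_moments(beta, X, Q):
        resid = Q * np.exp(-X @ beta) - 1.0   # u - 1, evaluated at beta
        return X.T @ resid / len(Q)           # one sample moment per regressor

    def fit_gmm(X, Q):
        return root(sample_moments, np.zeros(X.shape[1]), args=(X, Q)).x

    # Check on simulated data with E(u | X) = 1 (lognormal disturbances):
    rng = np.random.default_rng(0)
    n = 683
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
    beta_true = np.array([1.0, -1.3, -0.3, 0.2])
    Q = np.exp(X @ beta_true) * rng.lognormal(-0.125, 0.5, size=n)
    print(fit_gmm(X, Q))                      # approximately beta_true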
6.2. Basic Results on Demand

Table II presents demand estimates from the 128MB PC100 memory module category. The first column of the table contains estimates of the demand equation for low-quality modules. The second and third columns contain estimates of the demand for medium- and high-quality modules.
Our first main empirical result is that demand for low-quality modules at a website is extremely price-sensitive. Most of this is due to the effect of Pricewatch rank on demand. The rank effect is very strong: the coefficient on the log(1 + PLowRank) variable in the first column implies that moving from first to seventh on the list reduces a website's sales of low-quality modules by 83%. The estimates are highly significant—we get a t-statistic of 10.9 in a regression with only 683 observations. Table III presents demand elasticities derived from the coefficient estimates.20 The upper left number in the upper left matrix indicates that the combination of the two price effects in the model results in an own-price elasticity of −24.9 for low-quality 128MB PC100 modules.

TABLE II
DEMAND FOR 128MB PC100 MEMORY MODULESa
Dep. Var.: Quantities of Each Quality Level

Independent Variables     Low q            Mid q            High q
log(1 + PLowRank)        −1.29* (10.9)    −0.77* (4.6)     −0.51* (2.9)
log(PLow)                −3.03  (2.3)     −0.59  (0.4)      1.49  (0.9)
log(PMid)                 0.68  (0.8)     −6.74* (5.9)      2.38  (1.7)
log(PHi)                  0.17  (0.2)      2.72  (1.8)     −4.76* (3.3)
SiteB                    −0.25* (3.5)     −0.31* (2.9)     −0.59* (5.6)
Weekend                  −0.49* (8.4)     −0.94* (8.3)     −0.72* (5.8)
log(LowestPrice)          1.20  (1.1)      0.83  (0.6)     −0.14  (0.1)
Number of obs.            683              683              683

a Absolute value of t-statistics in parentheses. Asterisks (*) denote significance at the 5% level.
20 Elasticities with respect to changes in the low-quality price are a sum of two effects: one due to changes in the PLow variable and one due to changes in the PLowRank variable. We estimate the latter by treating PLowRank as a continuous variable and setting the derivative of PLowRank with respect to PLow equal to the inverse of the average distance between the twelve lowest prices, and setting the rank and other variables equal to their sample means.
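Footnote 20's recipe can be reproduced from the sample means in Table I and the coefficients in Table II. The arithmetic below (ours; the gap between adjacent listed prices is approximated by Range 1-12 divided by 11) recovers the −24.9 elasticity and the rank effects quoted in the text:

    import numpy as np

    b_price, b_rank = -3.03, -1.29    # log(PLow), log(1 + PLowRank) in Table II
    p_low = 66.88                     # mean PLow (Table I)
    one_plus_rank = np.exp(1.86)      # mean log(1 + PLowRank) (Table I)
    gap = 6.76 / 11                   # approx. distance between adjacent prices

    # dlog(1 + rank)/dlog(p) = (p / (1 + rank)) * drank/dp, drank/dp = 1/gap
    print(b_price + b_rank * p_low / (one_plus_rank * gap))  # about -24.9

    # Moving from 1st to 7th multiplies sales by ((1+7)/(1+1))**b = 4**b:
    print([1 - 4.0**b for b in (-1.29, -0.77, -0.51)])  # about 0.83, 0.66, 0.51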
TABLE III
PRICE ELASTICITIES FOR MEMORY MODULES: THREE QUALITIES IN EACH OF FOUR PRODUCT CLASSESa

          128MB PC100 Modules            128MB PC133 Modules
          Low       Mid       Hi         Low       Mid       Hi
PLow     −24.9*   −12.5*    −7.2*      −33.1*   −11.2*    −4.9*
PMid       0.7     −6.7*     2.4         0.8     −3.6*     0.5
PHi        0.2      2.7     −4.8*        0.2     −4.8*    −4.8*

          256MB PC100 Modules            256MB PC133 Modules
          Low       Mid       Hi         Low       Mid       Hi
PLow     −17.4*    −8.1*    −4.1       −24.8*   −12.5     −6.6
PMid       5.7     −7.8     −4.1         0.3      3.3      3.9*
PHi        0.7      6.4     −3.8        −0.9     −7.2     −0.8

a Asterisks (*) denote significance at the 5% level.
A second striking empirical result in Table II is that low-quality memory is an effective loss leader. The coefficients on log(1 + PLowRank) in the second and third columns are negative and highly significant. This means that controlling for a site's medium- and high-quality prices and other variables, a site sells more medium- and high-quality memory when it occupies a higher position on Pricewatch's (low-quality) list. The effect is very strong. The −0.77 coefficient estimate indicates that moving from first to seventh on the Pricewatch list for low-quality 128MB PC100 memory reduces a website's sales of medium-quality 128MB PC100 memory by 66%. The −0.51 coefficient estimate indicates that moving from first to seventh on the Pricewatch list for low-quality memory would reduce high-quality memory sales by 51%.21
A potential concern about this result is that PLowRank might be significant not because Pricewatch's low-quality list is guiding consumers' searches, but rather because of an omitted variable problem in our analysis: PLowRank might be correlated with a ranking of our firm's medium- and high-quality prices relative to its competitors' prices for comparable goods. We think that this is unlikely given what we know of the time-series behavior of the different series: Pricewatch ranks change frequently, whereas medium- and high-quality prices are left unchanged for substantial periods of time, so that most of the variation in the attractiveness of our firm's medium- and high-quality prices will occur around the occasional price changes. One's first reaction to this concern would be to want to address it by including within-category rank variables
21 Although it is common in marketing to talk about loss leaders, the empirical marketing literature on the effectiveness of loss leaders has produced mixed results (Walters (1988), Walters and McKenzie (1988), Chevalier, Kashyap, and Rossi (2003)). We are not aware of any evidence nearly as clear as our results.
PMidRank or PHiRank analogous to PLowRank. This is, however, not possible.22 We can, however, provide a test robust to this concern by looking at choices conditional on buying from one of our websites. We discuss this and present results in Section 6.3.
A third noteworthy result is that the coefficients on the Site B dummy are negative and significant in all three regressions. Site B is particularly less successful at selling high-quality memory. This could indicate that website design is important.23 Alternative explanations would include that people may prefer to buy memory from Site A because it specializes in memory and that there may be reputational advantages we cannot directly observe.
We report elasticity matrices for the other memory categories in Table III, but to save space we have not included full tables of demand estimates.24 The elasticity tables reveal that our findings that low-quality products have highly elastic demand and that there are loss-leader benefits from selling low-quality goods at a low price are consistent across categories. The estimated own-price elasticities for low-quality modules range from −33.1 in the 128MB PC133 category to −17.4 in the 256MB PC100 category. The one way in which the results for the 128MB PC100 category are unusual is that the own-price elasticities of medium- and high-quality memory are precisely estimated; in the other categories these estimates are often insignificant. This imprecision is particularly severe in the 256MB categories where the effective sample size is reduced by the fact that most of the memory is sold toward the end of the data period.

6.3. The Mechanics of Obfuscation: Incomplete Consumer Search

One way to think about the obfuscation discussed in Section 2 is as an increase in search costs that made search less complete. We noted in Section 6.2 that the finding that PLowRank affects medium- and high-quality sales suggests that consumers are conducting a meaningfully incomplete search with the omissions being influenced by Pricewatch's list, but that an alternate explanation for the finding could be that PLowRank is correlated with the rank of a site's higher quality offerings. In this section we note that the structure of our data set provides an opportunity to avoid this confounding.
22 We did not collect data on other firms' full product lines. Even if we had done so, medium- and high-quality memory are not sufficiently well defined concepts to make within-quality rank a well defined concept: every website has a different number of offerings with (often undisclosed) technical attributes and service terms that do not line up neatly with the offerings of our retailer.
23 Site A and Site B are owned by the same firm. They share customer service and packing employees. A few attributes should make Site B more attractive: it had slightly lower shipping charges for part of the sample, it offers more products other than memory, and at the time it had a higher customer feedback rating at ResellerRatings.com, which was probably the most important reputation-posting site for firms like this.
24 Significance levels in the other categories are generally similar to those in the 128MB PC100 category. The log(1 + PLowRank), Weekend, and Site B variables are usually highly significant. The other variables are usually insignificant.
TABLE IV
EVIDENCE OF INCOMPLETE CONSUMER LEARNING: CONDITIONAL SITE CHOICES OF CONSUMERS OF MEDIUM- AND HIGH-QUALITY MEMORYa
Dependent Variable: Dummy for Choice of Site A

Independent Variables     Medium Quality     High Quality
log(1 + PLowRank)        −0.64* (4.2)       −0.31* (4.0)
log(PMid)                −3.08* (2.2)        1.48  (1.4)
log(PHi)                 −1.43  (1.2)       −5.73* (3.4)
Number of obs.            4118               6768

a The table presents estimates of logit models. The dependent variable for the transaction-level data set is a dummy for whether a consumer chose to buy from Site A (versus Site B). The samples are all purchases of medium- or high-quality modules from Site A or Site B. Absolute values of z-statistics in parentheses. Asterisks (*) denote significance at the 5% level. The regressions also include unreported category dummies, a linear time trend, and the difference between dummies for appearing on Pricewatch's first screen.
Our two websites offer identical products. If all consumers learned about all prices, then conditioning on a consumer's decision to purchase from one of our sites, the relative position of the two sites on Pricewatch's list should not help predict which site a consumer will purchase from.25
To provide a straightforward analysis of conditional choices, we estimate simple logit models on the consumer-level data using a dummy for whether each consumer chose to buy from Site A (versus Site B) as the dependent variable. As explanatory variables we include the difference across sites in log(1 + PLowRank), log(PMid), and log(PHi), and a set of time trends. The two columns of Table IV report estimates from the sample of all consumers who purchased medium- and high-quality memory, respectively.26 The significant coefficients on the log(PMid) variable in the first column and on the log(PHi) variable in the second column indicate that consumers are influenced by the prices of the product they are buying. Interestingly, however, the significant coefficients on log(1 + PLowRank) in both columns indicate that consumers are also more likely to purchase from the site with a lower low-quality price.
25 This would be exactly true in a discrete-choice model with the IIA property. In a random-coefficients model where consumers had preferences over websites and over quality levels, one would expect PLowRank to have the opposite effect from the one we find: when Site A has a low price for low-quality memory, then fewer consumers with a strong Site A preference will buy medium-quality memory, which makes the pool of consumers buying medium-quality memory tilted toward Site B.
26 We have pooled observations from all four memory categories.
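A sketch of the Table IV logits (hypothetical variable names; we assume transaction-level frames df_medium and df_high that already hold the across-site differences, and we suppress the category dummies for brevity):

    import statsmodels.api as sm

    def site_choice_logit(df, own_price_diff):
        # Dependent variable: dummy for buying from Site A rather than Site B.
        X = sm.add_constant(df[['d_log1p_rank', own_price_diff,
                                'd_first_screen', 'trend']])
        return sm.Logit(df['chose_site_A'], X).fit(disp=0)

    res_mid = site_choice_logit(df_medium, 'd_log_pmid')  # medium-quality buyers
    res_hi = site_choice_logit(df_high, 'd_log_phi')      # high-quality buyers
    print(res_mid.params, res_hi.params)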
Considering the standard deviations of the two variables, we find that the rank of a firm's low-quality product has about as much influence on consumer decisions as the price of the product consumers are buying. Overall, the regressions support the conclusion that consumer learning about prices is incomplete.

6.4. The Mechanics of Obfuscation: Add-Ons and Adverse Selection

The second model in Section 2 noted that creating inferior versions of products to advertise could raise equilibrium markups by creating an adverse selection problem. More concretely, this occurs if a decrease in a firm's low-quality price decreases the fraction of consumers who buy upgrades. In other words, if the elasticity on the low-quality memory is larger (in absolute value) than that for medium- or high-quality memory, there is evidence of adverse selection. This feature is present in all four of our elasticity matrices.27
An alternate way to get intuition for the magnitude of this adverse-selection effect without relying on the functional form assumptions is to look at the firm's quality mix using sample means (see the tabulation sketch below). For example, we find that when one of our sites is first on one of the Pricewatch lists for 256MB memory, 63% of its unit sales are low-quality memory. On days when one of them is in tenth place, only 35% of the unit sales are low-quality memory.

6.5. Instrumental Variables Estimates

We noted above that an obvious source of potential difficulty for our elasticity estimates (especially with respect to changes in medium- and high-quality prices) is that our price variables may be correlated with demand shocks, rival firms' prices, or both. In this section we present two sets of instrumental variables (IV) estimates.
Our first set of instruments is cost-based. We instrument for PLow, PMid, and PHi with our firm's acquisition costs for each product. Many textbooks use costs as the prototypical example of an instrument for price in a demand equation. In retail, however, the case for instrumenting with acquisition costs is tenuous: "costs" are really wholesale prices and will therefore be affected by broader demand shocks, and they may be correlated with retail prices charged by our firm's rivals. Two features of the memory market make correlation with demand shocks less of a worry than it would be in other retail industries: (i) sales of aftermarket memory are small compared to the use of memory in new computers, so aftermarket memory prices will not be much affected by aftermarket demand shocks; (ii) some of the variation in wholesale prices in the period we study is due to collusion among memory manufacturers.28 The correlation with rivals' prices is clearly a potential problem.
27 See Ellison and Ellison (2004) for additional evidence on this point, including similar estimates from CPUs.
28 Demand shocks in the new computer and memory upgrades markets may be correlated, of course, if both are driven by the memory requirements of popular applications. Samsung, Elpida, Infineon, and Hynix pled guilty in separate cases to collusion charges covering the period from April 1999 to June 2002. Executives from these companies and a Micron sales representative were also prosecuted individually and received jail sentences.
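The quality-mix comparison mentioned above is a simple tabulation; a sketch (hypothetical frame panel with columns rank, quality, and quantity, built as in Section 5):

    import pandas as pd

    def low_quality_share(panel: pd.DataFrame) -> pd.Series:
        # Share of unit sales that are low quality, by Pricewatch rank.
        sales = panel.groupby([panel['rank'].round(),
                               'quality'])['quantity'].sum()
        by_rank = sales.unstack('quality')
        return by_rank['low'] / by_rank.sum(axis=1)

    # In the 256MB data this is roughly 0.63 at rank 1 and 0.35 at rank 10.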
TABLE V
INSTRUMENTAL VARIABLES ESTIMATES OF PC100 128MB MEMORY DEMAND MODELa
Dependent Variables: Quantity of 128MB PC100 Memory Modules of Each Quality Level

                          Cost Instruments
Independent Variables     Low              Mid              Hi
log(1 + PLowRank)        −1.99* (3.8)     −1.06* (2.2)     −1.19* (3.1)
log(PLow)                 0.94  (0.1)      3.54  (0.6)     12.75* (2.3)
log(PMid)                12.61  (1.6)     −6.88  (0.9)      0.76  (0.1)
log(PHi)                  6.41  (1.3)      6.38  (1.4)     −4.69  (1.4)
SiteB                    −0.40* (2.6)     −0.36* (3.0)     −0.59* (4.4)
Weekend                  −0.55* (5.1)     −0.95* (8.0)     −0.76* (4.7)
log(LowestPrice)         −5.91  (1.0)     −2.47  (0.5)     −8.26  (1.6)
Number of obs.            683              683              683

                          Other-Speed Instruments
Independent Variables     Low              Mid              Hi
log(1 + PLowRank)        −1.75* (3.0)     −0.25  (0.4)      0.02  (0.0)
log(PLow)                 2.68  (0.6)     −1.58  (0.4)      2.39  (0.5)
log(PMid)                −1.72  (1.1)     −7.14* (4.1)     −0.50  (0.1)
log(PHi)                  5.21* (2.5)      2.03  (0.8)     −3.45  (0.9)
SiteB                    −0.25* (2.1)     −0.51* (2.8)     −0.66* (2.6)
Weekend                  −0.48* (7.3)     −0.95* (8.2)     −0.64* (4.6)
log(LowestPrice)         −3.89  (1.1)      2.17  (0.7)      2.37  (0.7)
Number of obs.            608              608              608

a Absolute value of t-statistics in parentheses. Asterisks (*) denote significance at the 5% level.
The first three columns of Table V report estimates of the demand equations for 128MB PC100 memory modules (comparable to those in Table II) obtained using the cost-based instruments for PLow, PMid, and PHi.29 Our primary results about own-price elasticities and loss-leader benefits are robust to this change: the effect of PLowRank on sales remains large, negative, and significant in all three categories. The biggest difference between the IV estimates and our earlier estimates is that the cross-price terms are all positive and many are much larger. The standard errors, however, are also generally larger in these regressions, so few of the own-price and cross-price estimates are significant.
We refer to our second set of instruments as the "other-speed" set. We instrument for PLow, PMid, PHi, and log(1 + PLowRank) in the 128MB PC100 category using a website's prices for low-, medium-, and high-quality 128MB PC133 modules and its rank in this category.30
29 First-stage regressions are presented in Ellison and Ellison (2009). The cost of medium-quality memory has less predictive power than one might hope.
30 Ellison and Ellison (2009) present first-stage regressions showing that the instruments are not weak, although predictive power is better for the prices than for the rank.
These may be useful in identifying exogenous shifts in medium- and high-quality prices if these tend to occur in both categories simultaneously, either because prices are reviewed sporadically or because prices are adjusted in response to unexpected labor shortages. Another attractive aspect of this strategy is that the availability of the other-speed rank gives us a fourth instrument, whereas in our cost-based instrument set we had to maintain the assumption that log(1 + PLowRank) was exogenous. There are still potential concerns. For example, prices in the other category may not be completely orthogonal to demand conditions if demand in both categories is driven by a common shock, like the memory requirements of popular software applications. The second three columns of Table V present estimates from the other-speed instruments. Instrumenting for log(1 + PLowRank) makes the standard errors on the estimates much larger. Two of the estimates become more negative and one becomes less negative. The cross-price effects between low- and high-quality memory are much larger than in our noninstrumented results. Standard errors on all the price effects are also much larger. Overall, we see the IV results as indicating that cross-price terms probably are larger than in our noninstrumented results. There is nothing to cause concern about any of our main results, although the limited quality of the instruments does not let us provide strong additional support either.

7. MARKUPS

This section examines price–cost margins. It is intended both to provide descriptive evidence on price search-dominated e-commerce and to give insight on how obfuscation affects markups. Table VI presents revenue-weighted average percentage markups for each of the four categories of memory modules.31 In the two 128MB memory categories, the markups for low-quality products are slightly negative. Prices have not, however, been pushed far below cost by the desire to attract customers who can be sold upgrades. Markups are about 16% for medium-quality modules and about 27% for high-quality modules. Averaging across all three quality levels, markups are about 8% and 12% in the two categories. This corresponds to about $5 for a PC100 module and $10 for a PC133 module. The firm's average markups in the 256MB memory categories were higher: 13% and 16% in the two categories. Part of the difference is due to the fact that a higher fraction of consumers buy premium quality products, but the largest part comes from the markups on low-quality memory being substantially higher.

31 The percentage markup is the percentage of the sale price, that is, 100(p − mc)/p. Dollar markups were obtained by adding the standard shipping and handling charge to the advertised item price, and then subtracting the wholesale acquisition cost, credit card fees, an approximate shipping cost, an estimate of marginal labor costs for order processing, packing, and returns, and an allowance for losses due to fraud. The labor and shipping costs were chosen after discussions with the firm, but are obviously subject to some error.
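The markup definition in footnote 31 translates directly into code; a sketch in which the cost components are illustrative stand-ins, not the firm's actual figures:

```python
# Percentage markup as defined in footnote 31: 100 * (p - mc) / p.
advertised_price = 80.00
shipping_charge = 11.00       # standard shipping and handling charged to buyer
wholesale_cost = 76.00
credit_card_fee = 2.10
shipping_cost = 6.00          # approximate cost of actually shipping the item
labor_cost = 3.00             # order processing, packing, and returns
fraud_allowance = 0.80

p = advertised_price + shipping_charge
mc = wholesale_cost + credit_card_fee + shipping_cost + labor_cost + fraud_allowance
pct_markup = 100 * (p - mc) / p
print(f"dollar markup = {p - mc:.2f}, percentage markup = {pct_markup:.1f}%")
```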
TABLE VI
MEAN PERCENTAGE MARKUP IN THE FOUR PRODUCT CATEGORIES^a

                                  128MB Memory           256MB Memory
Product Category                 PC100    PC133         PC100    PC133
Actual low markup                −0.7%    −2.5%          4.3%     2.9%
Actual mid markup                17.3%    15.6%         16.2%    19.9%
Actual hi markup                 27.3%    26.9%         24.3%    24.9%
Overall markup                    7.7%    11.5%         12.7%    15.8%
Overall elasticity ε            −23.9    −27.7         −16.0    −21.2
1/ε                               4.2%     3.6%          6.3%     4.7%
Adverse selection multiplier      2.0      3.5           1.7      2.4
Predicted markup                  8.3%    12.8%         10.9%    11.4%

a The table presents revenue-weighted mean percentage markups for products sold by websites A and B in each of four product categories along with predicted markups as described in Sections 2.2 and 7.
It is interesting to examine how the actual markups compare to what one would expect given the overall demand elasticity and the strength of the adverse selection effect. The sixth row of the table reports the inverse demand elasticity 1/ε defined in Section 2.2. Absent any adverse selection effects, these would be the expected markups. They range from 3.6% to 6.3% across the categories. Although these are small numbers and we have emphasized that demand is highly elastic, one channel by which obfuscation may be affecting markups is by preventing elasticities from being even higher than they are. We do not know how elastic demand would be absent the obfuscation, but it is perhaps informative to note that our estimates imply that fewer than one-third of consumers are buying from the lowest priced firm. If Pricewatch ads were more standardized and consumers did not need to worry about restocking policies, etcetera, then one might imagine that many more consumers would buy from the lowest priced firm and demand could be substantially more elastic. The second mechanism by which we noted that obfuscation could affect markups is through the adverse selection effect that arises when firms sell add-ons. The seventh row reports the markup multiplier we would expect given the degree of adverse selection we have estimated to be present. Specifically, we report an estimate of the rightmost term in parentheses in equation (1), obtained by assuming ∂π_{1U}/∂p_{1L} = 0 and computing the multiplier term as 1 + (∂x*/∂p_{1L})(p_{1U} − c_{1U}).^{32} The multipliers range from 1.7 to 3.5 across the four categories.

32 The effect of the low-quality price on the fraction upgrading comes from the demand system and the markup on the upgrade is set to its sample mean.
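The predicted markups in the last row of Table VI combine the inverse elasticity with this multiplier; a sketch using the 128MB PC100 column (our reading of the Section 2.2 formula; the product is rounded in the table):

```python
# Predicted markup = (1 / |overall elasticity|) * adverse-selection multiplier.
overall_elasticity = -23.9
multiplier = 2.0               # 1 + (dx*/dp_1L) * (p_1U - c_1U), as estimated

inverse_elasticity = 1 / abs(overall_elasticity)     # ~0.042, i.e., 4.2%
predicted_markup = inverse_elasticity * multiplier   # ~0.084, i.e., 8.3% in Table VI
print(f"1/eps = {inverse_elasticity:.3f}, predicted markup = {predicted_markup:.3f}")
```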
This indicates that the adverse selection we have identified is sufficiently strong that one would expect it to have a substantial effect on equilibrium markups. The actual and predicted markups are roughly consistent. In three of the four categories the actual markups are within two percentage points of the predicted markups. This implies, for example, that prices are within $2 of what we would predict on a $100 product. The actual and predicted markups are both lowest in the 128MB PC100 category. The difference between the actual and predicted markups is largest in the 256MB PC133 category, where actual markups are four percentage points higher than the prediction. Looking further into the data, we note that the positive average markups for low-quality 256MB modules are entirely attributable to two subperiods: low-quality 256MB modules were sold at about $10 above cost in September–October 2000 and at about $5 above cost in February–March 2001. We think we understand what happened in the former period. A small number of retailers found an obscure supplier willing to sell them 256MB modules at a price far below the price offered by the standard wholesale distributors.33 As a result, there were effectively six or fewer retailers competing in these two months rather than dozens.

8. CONCLUSION

In this paper we have noted that the extent to which the Internet will reduce consumer search costs is not clear. Although the Internet clearly facilitates search, it also allows firms to adopt a number of strategies that make search more difficult. In the Pricewatch universe, we see that demand is sometimes remarkably elastic, but that this is not always what happens. The most popular obfuscation strategy for the products we study is to intentionally create an inferior quality good that can be offered at a very low price. Retailers could, of course, avoid the negative impact of search engines simply by refusing to let the search engines have access to their prices. This easy solution, however, has a free rider problem: if other firms are listing, a firm will suffer from not being listed. What may help make the obfuscation strategy we observe popular is that it is hard not to copy it: if a retailer tries to advertise a decent quality product with reasonable contractual terms at a fair price, it will be buried behind dozens of lower price offers on the search engine's list. The endogenous-quality aspect of the practice makes it somewhat different from previous bait-and-switch and loss-leader models, and it seems a worthwhile topic for future research.34 We would also be interested to see

33 The first retailer to have found the supplier appears to have found it on July 10. On that day, when the firm that supplied us with data bought modules wholesale for $270, PC Cost cut its retail price to $218—a full $51 below the next lowest price.

34 Simester's (1995) model seems to be the most similar to the practice. We would imagine, however, that what makes the low prices on Pricewatch have advertising value is that the offerings are sufficiently attractive so as to force a retailer to set low prices for its other offerings to avoid having everyone buy the advertised product.
more work integrating search engines into models with search frictions, exploring other obfuscation techniques (such as individualized prices), and trying to understand why adoption of price search engines has been slow.

REFERENCES

BAYE, M. R., AND J. MORGAN (2001): "Information Gatekeepers on the Internet and the Competitiveness of Homogeneous Product Markets," American Economic Review, 91, 454–474. [429]
——— (2003): "Information Gatekeepers and Price Discrimination on the Internet," Economics Letters, 76, 47–51. [429]
BAYE, M. R., J. MORGAN, AND P. SCHOLTEN (2004): "Price Dispersion in the Small and the Large: Evidence From an Internet Price Comparison Site," Journal of Industrial Economics, 52, 463–496. [429]
——— (2007): "Information, Search, and Price Dispersion," in Handbook on Economics and Information Systems, ed. by T. Hendershott. Amsterdam: North-Holland. [429]
BAYE, M. R., J. RUPERT J. GATTI, P. KATTUMAN, AND J. MORGAN (2006): "Clicks, Discontinuities, and Firm Demand Online," Mimeo, Haas School of Business, University of California, Berkeley. [429]
BROWN, J. R., AND A. GOOLSBEE (2002): "Does the Internet Make Markets More Competitive? Evidence From the Life Insurance Industry," Journal of Political Economy, 110, 481–507. [429]
BROWN, J., T. HOSSAIN, AND J. MORGAN (2006): "Shrouded Attributes and Information Suppression: Evidence From the Field," Mimeo, Haas School of Business, University of California, Berkeley. [429]
BRYNJOLFSSON, E., AND M. SMITH (2001): "Consumer Decision-Making at an Internet Shopbot: Brand Still Matters," Journal of Industrial Economics, XLIX, 541–558. [429]
CHEVALIER, J., AND A. GOOLSBEE (2003): "Measuring Prices and Price Competition Online: Amazon vs. Barnes and Noble," Quantitative Marketing and Economics, 1, 203–222. [429]
CHEVALIER, J., A. KASHYAP, AND P. ROSSI (2003): "Why Don't Prices Rise During Periods of Peak Demand? Evidence From Scanner Data," American Economic Review, 93, 15–37. [443]
DIAMOND, P. A. (1971): "A Model of Price Adjustment," Journal of Economic Theory, 3, 156–168. [430]
ELLISON, G. (2005): "A Model of Add-On Pricing," Quarterly Journal of Economics, 120, 585–637. [427-432]
ELLISON, G., AND S. F. ELLISON (2004): "Search, Obfuscation, and Price Elasticities on the Internet," Working Paper 10570, NBER. [429,439,446]
——— (2009): "Supplement to 'Search, Obfuscation, and Price Elasticities on the Internet'," Econometrica Supplemental Material, 77, http://www.econometricsociety.org/ecta/Supmat/5708_Data.pdf; http://www.econometricsociety.org/ecta/Supmat/5708_Data and programs.pdf. [447]
GABAIX, X., AND D. LAIBSON (2006): "Shrouded Attributes, Consumer Myopia, and Information Suppression in Competitive Markets," Quarterly Journal of Economics, 121, 505–540. [429]
SCOTT MORTON, F., F. ZETTELMEYER, AND J. SILVA-RISSO (2001): "Internet Car Retailing," Journal of Industrial Economics, 49, 501–519. [429]
——— (2003): "Cowboys or Cowards: Why Are Internet Car Prices Lower?" Mimeo, School of Management, Yale University. [429]
SIMESTER, D. (1995): "Signaling Price Image Using Advertised Prices," Marketing Science, 14, 166–188. [450]
SPIEGLER, R. (2006): “Competition Over Agents With Boundedly Rational Expectations,” Theoretical Economics, 1, 207–231. [429] STAHL, D. O. (1989): “Oligopolistic Pricing With Sequential Consumer Search,” American Economic Review, 79, 700–712. [429] (1996): “Oligopolistic Pricing With Sequential Consumer Search and Heterogeneous Search Costs,” International Journal of Industrial Organization, 14, 243–268. [429] WALTERS, R. (1988): “Retail Promotions and Retail Store Performance: A Test of Some Key Hypotheses,” Journal of Retailing, 64, 153–180. [443] WALTERS, R., AND S. MCKENZIE (1988): “A Structural Equation Analysis of the Impact of Price Promotions on Store Performance,” Journal of Marketing Research, 25, 51–63. [443]
Dept. of Economics, Massachusetts Institute of Technology, E52-380, 50 Memorial Drive, Cambridge, MA 02142-1347, U.S.A. and NBER;
[email protected] and Dept. of Economics, Massachusetts Institute of Technology, E52-274B, 50 Memorial Drive, Cambridge, MA 02142-1347, U.S.A.;
[email protected]. Manuscript received February, 2005; final revision received October, 2007.
Econometrica, Vol. 77, No. 2 (March, 2009), 453–487
BELIEF-FREE EQUILIBRIA IN GAMES WITH INCOMPLETE INFORMATION

BY JOHANNES HÖRNER1 AND STEFANO LOVO

We define belief-free equilibria in two-player games with incomplete information as sequential equilibria for which players' continuation strategies are best replies after every history, independently of their beliefs about the state of nature. We characterize a set of payoffs that includes all belief-free equilibrium payoffs. Conversely, any payoff in the interior of this set is a belief-free equilibrium payoff. The characterization is applied to the analysis of reputations.

KEYWORDS: Repeated game with incomplete information, Harsanyi doctrine, belief-free equilibria.
1. INTRODUCTION

THE PURPOSE OF THIS PAPER is to characterize the set of payoffs that can be achieved by equilibria that are robust to the specification of beliefs. The games considered are two-player discounted repeated games with two-sided incomplete information and observable actions. The equilibria whose payoffs are studied are such that the players' strategies are optimal from any history on and independently of players' beliefs about their opponent's type. This concept is not new. It has been introduced in another context, namely in repeated games with imperfect private monitoring, in Piccione (2002) and Ely and Välimäki (2002), and further examined in Ely, Hörner, and Olszewski (2005). It is also related to the concept of ex post equilibrium that is used in mechanism design (see Crémer and McLean (1985)) as well as in large games (see Kalai (2004)). A recent study of ex post equilibria and related belief-free solution concepts in the context of static games of incomplete information was provided by Bergemann and Morris (2007). To predict players' behavior in games with unknown parameters, a model typically includes a specification of the players' subjective probability distributions over these unknowns, following Harsanyi (1967–1968). This is not necessary when belief-free equilibria are considered, as their characterization requires a relatively parsimonious description of the model. One needs to enumerate the set of possible states of the world and players' information partitions over these states, but it is no longer necessary to specify players' beliefs. Therefore, while solving for belief-free equilibria requires the game to be fully specified, it does not require that all players know all the parameters of the model.

1 We thank Gabrielle Demange, Jeff Ely, Stephen Morris, Tristan Tomala, and, particularly, Larry Samuelson for useful comments. We are grateful to two anonymous referees and the editor for insightful comments and suggestions that substantially improved the paper. Stefano Lovo gratefully acknowledges financial support from the HEC Foundation.
In this sense, this idea is close to the original motivation of von Neumann and Morgenstern (1944) in defining "games of incomplete information" as games in which some parameters remain unknown, and it is consistent with misperceptions as defined by Luce and Raiffa (1957). Nevertheless, as in the case of games with perfect information, players are expected utility maximizers: players are allowed to randomize, and take expectations with respect to such mixtures when evaluating their payoff.2 Our purpose is to characterize which equilibria do not require any probabilistic sophistication beyond that assumed in repeated games with perfect information. Just as for ex post equilibria, belief-free equilibria enjoy the desirable property that the beliefs about the underlying uncertainty are irrelevant. This means that they remain equilibria when players are endowed with arbitrary beliefs. Such beliefs need not be derived by Bayes' rule from a common prior. Furthermore, the way in which players update their beliefs as the game unfolds is irrelevant. For instance, belief-free equilibria remain equilibria if we allow players to observe a signal of their stage-game payoff and to learn in this way about the other player's private information. Thus, belief-free equilibria are robust to all specifications of how players form and update their beliefs. In particular, belief-free equilibria are sequential equilibria (for any prior) satisfying any potentially desirable refinement. In a belief-free equilibrium, the players' strategies must be a subgame-perfect Nash equilibrium of the game of complete information that is determined by the joint of their private information. However, we do not view belief-free equilibrium as an equilibrium refinement per se. In fact, belief-free equilibria need not exist. The robustness that is demanded is extreme in the sense that the strategies are required to be mutual best replies not only for a neighborhood of beliefs, but for all possible beliefs. Note that it is also stronger than the way robustness is modeled in the recent macroeconomics literature (Hansen and Sargent (2007)), since the property examined here treats all possible beliefs identically. We provide a set of necessary conditions that belief-free equilibrium payoffs must satisfy, which defines a closed, convex, and possibly empty set. Conversely, we prove that every interior point of this set is a belief-free equilibrium payoff, provided that players are sufficiently patient. In the proof, we also show how to construct a belief-free equilibrium supporting any payoff in this set. This equilibrium has a recursive structure similar to standard constructions based on an equilibrium path and a punishment path for each player. While the set of belief-free equilibria is empty for some games, belief-free equilibria exist for large classes of games studied in economics such as, for example, most types of auctions, Cournot games, and Bertrand games. Constructing "belief-based" equilibria generally requires keeping track of beliefs and even of hierarchies of beliefs. This is usually intractable unless the information structure is quite

2 This is also the standard assumption used in the literature on non-Bayesian equilibria (see, for instance, Monderer and Tennenholtz (1999)).
special or the game is repeated at most twice.3 This problem does not arise with belief-free equilibria, thus offering a possible route for the analysis of dynamic economic interactions with relatively complex and realistic information structures at a time when there is a strong interest in modeling robustness in economics. The set of belief-free equilibrium payoffs turns out to coincide with a set that plays a prominent role in the literature on Nash equilibria in games with one-sided incomplete information. Building on this literature, we describe the implications of the concept for the study of reputations. In particular, the Stackelberg payoff is equal to the lowest (belief-free) equilibrium payoff if the game is of conflicting interest, which is precisely the type of game typically used as an example to show how surprisingly limited reputation effects are when players are equally patient. More generally, focusing attention on belief-free equilibria with equally patient players is shown to involve restrictions on the equilibrium payoff set similar to those of Nash equilibria when the informed player is infinitely more patient than the uninformed player. As mentioned, the set of payoffs that characterizes belief-free equilibria has already appeared in the literature, at least in the case of one-sided incomplete information. In particular, Shalev (1994) considered the case of known-own payoffs (the uninformed player knows his own payoffs) and showed that the set of uniform (undiscounted) Nash equilibrium payoffs can be derived from this set. Closest to our analysis is Cripps and Thomas (2003), which considered the one-sided case with known-own payoffs as well, but with discounting. Most relevant here is their Theorem 2, which establishes that the payoffs in the strict interior of this set are Nash equilibria for all priors. In general, however, the set of Nash equilibrium payoffs is larger, as they demonstrated in their Theorem 3, which establishes a folk theorem. The work of Forges and Minelli (1997) is related as well. They showed how communication can significantly simplify the construction of strategies that achieve the Nash equilibrium payoffs. These simple strategies also appear in Koren (1992). The most general characterization of Nash equilibrium payoffs remains the one obtained by Hart (1985) for the case of one-sided incomplete information. A survey is provided by Forges (1992). The assumption of Bayesianism has already been relaxed in several papers. Baños (1968) and Megiddo (1980) showed that strategies exist that asymptotically allow a player to secure a payoff as high as in the game with complete information. Milnor (1954) reviewed several alternative criteria and discussed their relative merits. The topic has also been explored in computer science. Aghassi and Bertsimas (2006) used robust optimization techniques to provide an alternative concept in the case of bounded payoff uncertainty. Monderer and
Tennenholtz (1999) studied the asymptotic efficiency in the case in which players are non-Bayesian and monitoring is imperfect. All these papers either offer an alternative equilibrium concept or study what is asymptotically achievable without using any solution concept. Yet the strategy profiles that are characterized in these papers are not Bayesian equilibria (at least under discounting), which is a major difference from our paper. As mentioned, the concept of belief-free equilibria has already been introduced in the context of games with complete but imperfect information. There, the restriction on the equilibrium pertains to the private history observed by the opponent. In both contexts, the application of the concept reduces the complexity of the problem (players need no longer keep track of the relevant beliefs) and yields a simple characterization. In games with imperfect private monitoring, this has further allowed the construction of equilibria in cases in which only trivial equilibria were known so far. The next section introduces notations and definitions. Section 3 then provides the payoff characterization, identifying in turn necessary and sufficient conditions on payoffs that belief-free equilibrium payoffs satisfy. This section also gives a relatively short proof of sufficiency using explicit communication (the proof without such communication is given in the Appendix) and provides counterexamples to existence, as well as sufficient conditions for existence. Section 4 applies the concept to the study of reputations.
2. NOTATION AND DEFINITIONS

We consider repeated games with two-sided incomplete information, as defined by Harsanyi (1967–1968) and Aumann and Maschler (1995). There is a $J \times K$ array of two-person games in normal form. The number of actions of player $i = 1, 2$ is the same across all $J \times K$ games. At the beginning of time and once and for all, Nature chooses the game in the $J \times K$ array. Player 1 is told in which row $j = 1, \ldots, J$ the true game lies, but he is not told which of the games in that row is actually being played. Player 2 is told in which column $k = 1, \ldots, K$ the true game lies, but he is not told which of the games in that column is the true game. The row $j$ (respectively, column $k$) is also referred to as player 1's (respectively, player 2's) type. Given some finite set $B$, $|B|$ denotes the cardinality of $B$ and $\Delta(B)$ denotes the probability simplex over $B$. Also, given some set $B$, let $\mathrm{int}\,B$ denote its interior and $\mathrm{co}\,B$ denote its convex hull. The stage game is a finite-action game. Let $A_1$ and $A_2$ be the finite sets of actions for players 1 and 2, respectively, where $|A_i| \ge 2$. Let $A = A_1 \times A_2$. When the row is $j$ and the column is $k$ (for short, when the state is $(j,k)$), player $i$'s reward (or payoff) function is denoted $u_i^{jk}$ for $i = 1, 2$. We extend the domain of $u_i^{jk}$ from pure action profiles $a \in A$ to mixed action profiles $\alpha \in \Delta(A)$
in the standard way. We let $u_1^k := \{u_1^{jk}\}_{j=1}^{J}$ and $u_2^j := \{u_2^{jk}\}_{k=1}^{K}$. The set of feasible payoffs in $\mathbb{R}^{J \times K} \times \mathbb{R}^{J \times K}$ is defined, as usual, as

$$\mathrm{co}\,\bigl\{\bigl((u_1^{jk}(a))_{(j,k)},\ (u_2^{jk}(a))_{(j,k)}\bigr) : a \in A\bigr\}.$$
Let $M := \max |u_i^{jk}(a)|$, where the maximum is taken over players $i = 1, 2$, states $(j,k)$, and action profiles $a \in A$. Given some payoff function $u$, let $\underline{u}$ or $\mathrm{val}\,u$ refer to the corresponding minmax payoff. We let $B_i^{jk}(\alpha_{-i})$ denote the set of player $i$'s best replies in the stage game given state $(j,k)$ and action $\alpha_{-i}$ of player $-i$. We omit the superscript $k$ in case $|K| = 1$, that is, if the game is of one-sided incomplete information. If furthermore player 2's payoff does not depend on $j$, we write $B(\alpha_1)$ for his set of best replies. As an example, consider the stage game given below. Since this game is dominance solvable and the dominant action depends on the state, ex post equilibria do not exist in the static game. Yet, as we shall see, the repeated game admits a rich set of belief-free equilibria.

EXAMPLE 1—Prisoner's Dilemma With One-Sided Incomplete Information: Player 1 is informed of the true state (= the row), player 2 is not, and there is only one column ($J = 2$, $K = 1$). If the true game corresponds to $j = 1$, payoffs are given (in every period) by the prisoner's dilemma payoff matrix in which $T$ is "cooperate" and $B$ is "defect." If the true game corresponds to $j = 2$, payoffs are given by the prisoner's dilemma payoff matrix in which $B$ is "cooperate" and $T$ is "defect." The payoffs in the first state are given by
            T              B
    T     1, 1         −L, 1 + G
    B   1 + G, −L        0, 0

and in the second state by

            T              B
    T     0, 0         1 + G, −L
    B   −L, 1 + G        1, 1
We consider the repeated game between the two players. Players select an action in each period $t = 1, 2, \ldots$. Realized actions are observable; mixed actions and realized rewards are not.
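A compact way to encode Example 1 (our own encoding, with indices 0 and 1 standing for actions $T$ and $B$) and to verify that the optimal stage-game action flips with the state:

```python
# u[j][(a1, a2)] gives the payoff pair in state j for the action profile (a1, a2).
G, L = 2.0, 1.0  # any G, L > 0 work for this illustration

u = {
    1: {(0, 0): (1, 1),      (0, 1): (-L, 1 + G),
        (1, 0): (1 + G, -L), (1, 1): (0, 0)},        # T is "cooperate"
    2: {(0, 0): (0, 0),      (0, 1): (1 + G, -L),
        (1, 0): (-L, 1 + G), (1, 1): (1, 1)},        # B is "cooperate"
}

# "Defect" is strictly dominant for player 1 in each state; here we recover it
# via maximin (which coincides with the dominant action in this game). Which
# action it is depends on the state, so no ex post stage-game equilibrium exists.
for j in (1, 2):
    best = max((0, 1), key=lambda a1: min(u[j][(a1, a2)][0] for a2 in (0, 1)))
    print(f"state {j}: player 1's dominant stage-game action is {'TB'[best]}")
```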
Let $H^t = (A_1 \times A_2)^{t-1}$ be the set of all possible histories of actions $h^t$ up to period $t$, with $H^1 = \emptyset$. A (behavioral) strategy for type $j$ of player 1 (resp. type $k$ of player 2) is a sequence of maps $s_1^j := (s_1^{j,1}, s_1^{j,2}, \ldots)$, $s_1^{j,t} : H^t \to \Delta(A_1)$ (resp. $s_2^k := (s_2^{k,1}, s_2^{k,2}, \ldots)$, $s_2^{k,t} : H^t \to \Delta(A_2)$). We define $s_1 := \{s_1^j\}_{j=1}^{J}$ and $s_2 := \{s_2^k\}_{k=1}^{K}$. Consider the game of complete information given state $(j,k)$. Given the common discount factor $\delta < 1$, player $i$'s payoff in this game is the average discounted sum of expected rewards. A subgame-perfect Nash equilibrium of this game is defined as usual. Our purpose is to characterize the payoffs that can be achieved, with low discounting, by a special class of Nash equilibria. In a belief-free equilibrium, each player's continuation strategy, after any history, is a best reply to his opponent's continuation strategy, independently of his beliefs about the state of the world and, therefore, independently of his opponent's type. Such equilibria are trivially sequential equilibria that satisfy any belief-based refinement. At the same time, they do not require players to be Bayesian or to share a common prior. Because they are belief-free, they must, in particular, induce a subgame-perfect equilibrium in every complete information game that is consistent with the players' private information. Formally, a belief-free equilibrium is defined as follows.

DEFINITION 1: A strategy profile $s := (s_1, s_2)$ is a belief-free equilibrium if it is the case that, for all states $(j,k)$, $(s_1^j, s_2^k)$ is a subgame-perfect Nash equilibrium of the infinitely repeated game with stage-game payoffs given by $(u_1^{jk}, u_2^{jk})$.

As mentioned, belief-free equilibria have been previously introduced in and applied to games with imperfect private monitoring. With incomplete information but observable actions, there is no need for randomization on the equilibrium path. Indeed, in our construction, along the equilibrium path, players always have a strict preference to play some particular action. Of course, this action potentially depends on a player's private information (and on the history). In our construction, randomization is only necessary during punishment phases, as is standard in folk theorems that allow for mixed strategies to determine minmax payoffs, as we do.4 It follows from the definition of belief-free equilibria that even when different player types use the same strategy, it would be weakly optimal for them to reveal their type (if there were a communication device). Indeed, by definition, the strategy profile that is played is an equilibrium of the underlying complete information game. This means that pooling belief-free equilibria are simply

4 Yet a randomization device considerably simplifies the exposition. At the end of the Appendix, we indicate how to dispense with it.
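For concreteness, the average discounted sum of rewards used throughout can be computed as follows (a sketch with an arbitrary reward stream; the function name is ours):

```python
# Average discounted payoff: (1 - delta) * sum over t of delta^(t-1) * u_t.
def average_discounted_payoff(rewards, delta):
    return (1 - delta) * sum(u * delta**t for t, u in enumerate(rewards))

stream = [1.0] * 200   # e.g., the cooperative reward in Example 1 every period
print(average_discounted_payoff(stream, delta=0.9))   # close to 1
```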
“degenerately separating” belief-free equilibria. In particular, the payoffs of pooling belief-free equilibria are in the closure of the set of payoffs achieved by separating belief-free equilibria. This finding implies that this concept is more restrictive than most refinements, since refinements do not usually prune all pooling equilibria that are not degenerate separating ones.5

3. CHARACTERIZATION
Any belief-free equilibrium determines a $J \times K$ array of payoffs $(v_i^{jk})$ for each player $i = 1, 2$. We first provide necessary conditions that such a pair of arrays must satisfy, before providing sufficient conditions that ensure they are achieved by some belief-free equilibrium.

3.1. Necessary Conditions

For definiteness, consider $i = 1$. Conditional on the column $k$ he is being told, player 2 knows that player 1's equilibrium payoff is one among the coordinates of the vector $v_1^k = (v_1^{1k}, \ldots, v_1^{Jk})$. Because the equilibrium is belief-free, player 1's payoff must be individually rational in the special case in which his beliefs are degenerate on the true column $k$. This means that, for a given column $k$, player 2's strategy $s_2^k$ is such that player 1 cannot gain from deviating from $s_1^j$, for all $j = 1, \ldots, J$. The existence of such a strategy $s_2^k$ puts a restriction on how low player 1's payoff $v_1^{jk}$ can be (in fact, a joint restriction on the coordinates of the vector $v_1^k$). If $J = 1$, so that the game is of one-sided incomplete information, this restriction on player 1's payoff is standard: for each $k$, player 1 must receive at least as much as his minmax payoff (in mixed strategies) in the true game being played. In the general case, however, the minmax level in one state depends on the payoffs in the other states, and there is a trade-off between these levels: punishing player 1 for one row may require conceding him a high payoff for some other row. Determining these minmax levels is not obvious. This is precisely the content of Blackwell's approachability theorem (Blackwell (1956)). For a given $p \in \Delta(\{1, \ldots, J\})$ (resp. $q \in \Delta(\{1, \ldots, K\})$), let $b_1^k(p)$ (resp. $b_2^j(q)$) be the value for player 1 (resp. player 2) of the one-shot game with payoff matrix $p \cdot u_1^k$ (resp. $q \cdot u_2^j$). We say that a vector $v_1 \in \mathbb{R}^{J \times K}$ is individually rational for player 1 if it is the case that, for all $k = 1, \ldots, K$,

$$p \cdot v_1^k \ge b_1^k(p) \quad \forall p \in \Delta(\{1, \ldots, J\}),$$
5 Indeed, many games admit “traditional” pooling equilibria in which individual rationality holds in expectation, but not conditional on every type of opponent.
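The bound $b_1^k(p)$ just defined is the value of a zero-sum game and can be computed by linear programming. A minimal sketch, assuming scipy is available (the belief $p$ and payoff arrays below are toy inputs, not those of Example 1):

```python
# Value of the zero-sum game with matrix sum_j p_j * u_1^{jk} (row player maximizes).
import numpy as np
from scipy.optimize import linprog

def value(M):
    m, n = M.shape
    # Variables (x_1..x_m, v): maximize v s.t. x'M >= v for every column, x in simplex.
    c = np.zeros(m + 1); c[-1] = -1.0                      # minimize -v
    A_ub = np.hstack([-M.T, np.ones((n, 1))])              # v - x'M[:, a2] <= 0
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])  # sum of x = 1
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return -res.fun

p = np.array([0.5, 0.5])                     # a weight vector over rows j = 1, 2
u1 = np.array([[[10.0, 1.0], [1.0, 0.0]],    # u_1^{1k} (toy numbers)
               [[0.0, 1.0], [1.0, 10.0]]])   # u_1^{2k}
print(value(np.tensordot(p, u1, axes=1)))    # b_1^k(p)
```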
where $v_1^k = (v_1^{1k}, \ldots, v_1^{Jk})$. Similarly, $v_2 \in \mathbb{R}^{J \times K}$ is individually rational for player 2 if it is the case that, for all $j = 1, \ldots, J$,

$$q \cdot v_2^j \ge b_2^j(q) \quad \forall q \in \Delta(\{1, \ldots, K\}),$$

where $v_2^j = (v_2^{j1}, \ldots, v_2^{jK})$. Blackwell's characterization ensures that if $v_1 \in \mathbb{R}^{J \times K}$ is individually rational for player 1, then for any column $k$, player 2 has a strategy $\hat{s}_2^k$ (referred to as a punishment strategy hereafter) such that player 1's average payoff cannot be larger than $v_1^{jk}$ for all $j$, independently of the strategy he uses. In a belief-free equilibrium, each player can guarantee that his payoff is individually rational, independently of the discount factor.6

NECESSARY CONDITION 1—Individual Rationality: If $v_i$ is a belief-free equilibrium payoff, then it is individually rational.

In a belief-free equilibrium, play may depend on a player's private information. That is, player 1's equilibrium strategy $s_1^j$ typically depends on the row $j$ he is told, and player 2's strategy $s_2^k$ depends on the column $k$. Since player 1's strategy $s_1^j$ must be a best reply to $s_2$ independently of his beliefs, it must be a best reply to $s_2^k$, corresponding to beliefs that are degenerate on the true column $k$. In particular, $s_1^j$ must be a better reply to $s_2^k$ than $s_1^{j'}$, $j' \ne j$, when the row is $j$. While this might seem a weaker restriction than individual rationality, it is not implied by it, since it places restrictions on the equilibrium path. By deviating to $s_1^{j'}$ when the state is $(j,k)$, player 1 induces the same distribution over action profiles as the one generating the payoff $v_1^{j'k}$ in state $(j',k)$. This imposes additional restrictions on the equilibrium strategies. To state this second necessary condition in terms of payoffs, observe that each pair $(s_1^j, s_2^k)$ induces a probability distribution $\{\Pr\{a \mid (j,k)\} : a \in A\}_{(j,k)}$ over action profiles, where

$$\Pr\{a \mid (j,k)\} = (1 - \delta) \sum_{t=1}^{\infty} \delta^{t-1} \Pr\{a^t = a \mid (s_1^j, s_2^k)\}$$
t=1 j
and Pr{at = a | (s1 s2k )} is the probability that action a is played in period t j given the strategy profile (s1 s2k ). 6 The punishments that can be imposed in the discounted game are lower than, but converge uniformly to, those that can be imposed in the undiscounted game. See Cripps and Thomas (2003) and references therein. We thank a referee for pointing out that individual rationality must hold for all discount factors.
461
BELIEF-FREE EQUILIBRIA IN GAMES
NECESSARY CONDITION 2—Incentive Compatibility: If (v1 v2 ) is a pair of belief-free equilibrium payoff arrays, there must exist distributions {Pr{a | (j k)} : a ∈ A}(jk) such that, for all (j k), jk jk jk v1 = Pr{a | (j k)}u1 (a) ≥ Pr{a | (j k)}u1 (a) a
and jk
v2 =
a
a
jk
Pr{a | (j k)}u2 (a) ≥
Pr{a | (j k )}u2 (a) jk
a
If such distributions exist, we say that $(v_1, v_2)$ is incentive compatible. While not every pair of payoff arrays is incentive compatible, there always exist some incentive compatible pairs, since the constraints are trivially satisfied for distributions $\Pr\{a \mid (j,k)\}$ that are independent of $(j,k)$.

3.2. Sufficient Conditions

Let $V^*$ denote the feasible set of pairs of payoff arrays satisfying Conditions 1 and 2. It is clear that $V^*$ is convex. Our main result is the following.

THEOREM 1: Fix some $v$ in the interior of $V^*$. The pair of payoff arrays $v$ is achieved in some belief-free equilibrium if players are sufficiently patient.

This theorem establishes that the necessary conditions are "almost" sufficient. It is then natural to ask whether we can get an exact characterization. However, the strict inequalities corresponding to individual rationality cannot generally be weakened. One reason for this is that our optimality criterion involves discounting, while Blackwell's characterization of approachability is only valid for the undiscounted case. The strict inequalities corresponding to incentive compatibility may be weakened when $V^*$ has nonempty interior. However, for the interesting case in which $V^*$ has empty interior, this may not be possible.7 While belief-free equilibria need not exist, as shown in Section 3.4, they exist in a variety of games. For instance, the game in Example 1 admits a large set of belief-free equilibrium payoffs. Figures 1 and 2 display the resulting equilibrium payoffs.8

7 Consider for instance the case of one-sided incomplete information: player 1 knows the row, but his payoff does not depend on the row, so the incentive compatibility constraints necessarily bind.

8 Note that these are the projections of the equilibrium payoff pairs onto each player's payoff. It is not true that every pair of vectors selected from these projections is a pair of equilibrium payoff vectors. Incentive compatibility imposes some restrictions on the pairing. Details on the derivation of individually rational and incentive compatible payoffs for Example 1 can be found in the working paper HEC CR 884/2008 available at http://www.hec.fr/hec/fr/professeurs_recherche/upload/cahiers/CR884SLOVO.pdf. We thank one referee for pointing out a mistake.
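The incentive-compatibility inequalities of Necessary Condition 2 are straightforward to verify numerically for a candidate family of distributions; a minimal sketch with toy inputs (not the distributions underlying Figures 1 and 2):

```python
# Check player 1's IC constraints for a candidate Pr{. | (j,k)} over |A| profiles.
import numpy as np

J = K = 2; nA = 4
P = np.full((J, K, nA), 0.25)        # state-independent => trivially IC
u1 = np.random.default_rng(1).normal(size=(J, K, nA))   # toy payoffs u_1^{jk}(a)

ic_ok = all(
    P[j, k] @ u1[j, k] >= P[jp, k] @ u1[j, k] - 1e-12
    for j in range(J) for k in range(K) for jp in range(J)
)
print("player 1 IC satisfied:", ic_ok)   # True for state-independent P
```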
FIGURE 1.—Belief-free equilibrium payoffs for player 1 as δ → 1.

FIGURE 2.—Belief-free equilibrium payoffs for player 2 as δ → 1.
3.3. Sketch of the Proof The proof of the theorem is constructive. A natural way to proceed is to follow Koren (1992) and others. First, players signal their type (through their choice of actions). Given the reported types, players then choose actions so as
to generate the distribution over action profiles corresponding to these reports. If a player deviates in this second phase, he is minmaxed. Individual rationality guarantees that deviating after some report yields a lower payoff than equilibrium play does, independently of the state. Incentive compatibility ensures that truthful reporting is optimal. However, such strategies typically fail to be sequentially rational. Minmaxing forever one’s opponent need not be individual rational. While this issue can be addressed with the obvious modification, a more serious difficulty is that the resulting strategy profile still fails to be belief-free. In particular, if a player believes that the reported type is incorrect, following his prescribed continuation strategy is no longer individually rational. The actual construction is therefore more involved, to ensure that beliefs are irrelevant after every possible history. To simplify exposition, we assume here that there is a public randomization device and that players can communicate at no cost in every period. These assumptions are dropped in the Appendix. So suppose that at the beginning of each period, a draw from the uniform distribution on the unit interval (independent of the state of nature and over time) is publicly observed, and suppose that at the beginning of the game (before the first draw is realized) and at the end of every period, players simultaneously make a report that is publicly observable. The set of possible reports is the set of rows and columns, respectively. Player 1 reports some j = 1 J, while player 2 reports some k = 1 K. In every period, and using the most recent outcome of the randomization device as a correlation device, a correlated action profile is played that only depends on the last pair of reports made by the players. These correlated action profiles are such that each player obtains the desired payoff whenever (j k ) = (j k), that is, whenever reports are correct, and such that this payoff exceeds what can be obtained by misreporting, independently of the type truthfully reported by the opponent. Thus, players are willing to report their type truthfully, regardless of their beliefs. In case a player deviates from the prescribed action, he is then punished for finitely many periods. Making sure that play during such a punishment phase is also belief-free introduces some additional complications. Because players report their types infinitely often, a player who believes that his opponent’s report is incorrect still expects his opponent to revert to the true report in the next period. As a consequence, it is less costly for him to play for one period according to the report that he believes to be false than to deviate and to face a long punishment phase. More formally, given some v ∈ int V ∗ , we first describe the equilibrium strategies, and then check that these strategies (i) achieve v and (ii) are best replies that are belief-free.
Equilibrium Strategies

The play can be divided into phases, which are similar to states of an automaton. There are two kinds of phases. Regular phases last one period. Punishment phases can last from 1 to $T$ periods, where $T$ is to be specified. Regular phases are denoted $R^{jk}(\varepsilon_1, \varepsilon_2)$, where $\varepsilon_1, \varepsilon_2 \in \mathbb{R}$. Punishment phases are denoted $P_1^k$, $P_2^j$.

Actions

(i) Regular phase: In a regular phase $R^{jk}(\varepsilon_1, \varepsilon_2)$, actions are determined by the outcome of the public randomization device. Each action profile $a$ is selected with probability $\Pr\{a \mid R^{jk}(\varepsilon_1, \varepsilon_2)\}$. Given

$$v_i^{jk}(R^{j'k'}(\varepsilon_1, \varepsilon_2)) := \sum_{a \in A} \Pr\{a \mid R^{j'k'}(\varepsilon_1, \varepsilon_2)\}\, u_i^{jk}(a)$$

and $v_i(R(\varepsilon_1, \varepsilon_2)) := \{v_i^{jk}(R^{jk}(\varepsilon_1, \varepsilon_2))\}_{(j,k)}$, these probabilities, along with some number $\bar{\varepsilon} > 0$, are chosen such that

(1) $\quad v_i(R(\varepsilon_1, \varepsilon_2)) = v_i + \varepsilon_i$

and

(2) $\quad v_1^{jk}(R^{jk}(\varepsilon_1, \varepsilon_2)) > v_1^{jk}(R^{j'k}(\varepsilon_1', \varepsilon_2')),$
$\quad\;\; v_2^{jk}(R^{jk}(\varepsilon_1, \varepsilon_2)) > v_2^{jk}(R^{jk'}(\varepsilon_1', \varepsilon_2'))$

for all $i = 1, 2$, $\varepsilon_i, \varepsilon_i' \in [-\bar{\varepsilon}, \bar{\varepsilon}]$, $j' \ne j$, $k' \ne k$. This is possible for all sufficiently small $\bar{\varepsilon}$ by incentive compatibility, given that $v \in \mathrm{int}\,V^*$. At the end of a regular phase, types are reported truthfully.

(ii) Punishment phase: The punishment phase lasts at most $T$ periods. Without loss of generality, we describe here the actions and reports in phase $P_1^k$. Both the subscript (the identity of the punished player) and the superscript (the reported type by the punisher) remain constant throughout the phase. Decreasing $\bar{\varepsilon}$ if necessary, the (behavior) strategy $\hat{s}_2^k$ of player 2 during the punishment phase $P_1^k$ is such that, for some $\bar{\delta} < 1$ and all discount factors $\delta > \bar{\delta}$, the average discounted payoff of player 1 over the $T$ periods, conditional on state $(j,k)$, is no larger than $v_1^{jk} - 2\bar{\varepsilon}$. This is possible for all sufficiently large $T$ by individual rationality, given that $v \in \mathrm{int}\,V^*$. We further assume that $T$, $\bar{\delta}$, and $\bar{\varepsilon}$ satisfy, for all $j$, $k$, and $i = 1, 2$,

(3) $\quad -(1-\delta)M + \delta(v_i^{jk} - \bar{\varepsilon}) > (1-\delta)M + \delta\bigl[(1-\delta^T)(v_i^{jk} - 2\bar{\varepsilon}) + \delta^T(v_i^{jk} - \bar{\varepsilon})\bigr]$
and

(4) $\quad -(1-\delta^T)M + \delta^T v_i^{jk} > (1-\delta^T)M + \delta^T(v_i^{jk} - 2\bar{\varepsilon}/3).$
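A quick numeric sanity check of inequalities (3) and (4), with illustrative parameter values chosen in the order the text suggests ($T$ large first, then $\delta$ close to 1):

```python
# Verify (3) and (4) for illustrative values (not derived from a particular game).
M, v, eps = 2.0, 1.0, 0.05
T, delta = 100, 0.99999

ineq3 = (-(1 - delta) * M + delta * (v - eps)
         > (1 - delta) * M + delta * ((1 - delta**T) * (v - 2 * eps)
                                      + delta**T * (v - eps)))
ineq4 = (-(1 - delta**T) * M + delta**T * v
         > (1 - delta**T) * M + delta**T * (v - 2 * eps / 3))
print(ineq3, ineq4)   # True True
```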
¯ and ε¯ exist, observe that for a fixed but small enough To see that such T , δ, ε¯ > 0, (3) is satisfied for all T large enough and δ > δ¯ for δ¯ close enough to 1. Increasing the value of δ¯ if necessary, (4) is then satisfied as well. Returning to the specification of actions and reports, as long as the punishs2k ment phase P1k lasts (i.e., for at most T periods), player 2 plays according to k (given k and the history starting in the initial period of P1 ). Observe that s2k jk need not be pure. Player 1 plays a best reply s1 to s2k , conditional on the true jk column being k. Without loss of generality, we pick s1 to be pure. Observe jk that s1 may depend on j. Players report truthfully types in all periods of the punishment phase. (iii) Initial phase: As mentioned, players report types at the beginning of the game. These initial reports are made truthfully. The initial phase is the regular phase Rjk (0 0), where (j k) are the initial reports. Transitions (i) From a regular phase Rjk (ε1 ε2 ): If the action of player 1 (resp. player 2) differs from the prescribed action, while player 2 (resp. 1) plays the prescribed j action, then the next phase is P1k (resp. P2 ), where k (resp. j ) is the report made at the end of the period by the corresponding player. (Observe that the message of the deviator plays no role here.) Otherwise: (a) if (j k ) = (j k) or both j = j and k = k , the next phase is Rj k (ε1 ε2 ), where (j k ) is the pair of messages in the period; (b) if j = j and k = k (resp. j = j and k = k ), the ¯ ε2 ) (resp. Rj k (ε1 −ε)). ¯ In words, unilateral deviations next phase is Rj k (−ε from the prescribed action profile trigger a punishment phase, while inconsistencies in successive reports are punished via the payoff prescribed by the regular phase. Simultaneous deviations are ignored. (ii) From a punishment phase: Without loss of generality, consider P1k , where k is player 2’s report at the end of the last period before the punishment phase (so k is fixed throughout P1k ). In what follows, all statements to histories and periods refer to the partial histories starting at the beginning of the punishment phase. Given s2k , define H k ⊆ H T as the set of histories of length at most T for which there exists a (arbitrary) strategy s1 of player 1 such that this history is on the equilibrium path for s1 and s2k as far as actions are conk cerned. That is, a history is not in H if and only if, in some period, the action of player 2 is inconsistent with s2k . t k t+1 k / H , the punishment phase stops at the end of period If h ∈ H but h ∈ j t + 1 and the punishment phase P2 starts, where j is player 1’s report in period
t + 1. Otherwise, the punishment phase continues up to the T th period, and we henceforth let h denote such a history of length T . Let (j k ) denote the pair of reports in the last period of the punishment phase. ¯ 0] The next phase is then Rj k (ε1 (h; P1k ) ε2 (h; P1k )) with ε1 (h; P1k ) ∈ [−ε and ε1 (h; P1k ) = −ε¯ if k = k, and PROPERTY 1: ε1 (h; P1k ) is such that, if k = k, playing the action specified in the punishment phase is optimal for player 1 along every history h ∈ H k under the state of the world (j k ) (recall that h specifies (j k )).9 Inequality (4) guarantees that the variation of ε1 (h; P1k ) across histories h that is required is less than 2ε/3, ¯ so that this can be done with ε1 (h; P1k ) in ¯ ε] ¯ if k = k and in [−ε ¯ 0] for all histories h. As for ε2 (h; P1k ), it is in [ε/3 [−ε ¯ −ε/3] ¯ otherwise. Furthermore, PROPERTY 2: ε2 (·; P1k ) is such that, conditional on state (j k ) and after every history h ∈ H k within the punishment phase, player 2 is indifferent over all sequences over action profiles (within the punishment phase) consistent with H k , and prefers those to all others. Given (4), this is possible whether k = k or not. It is clear that the strategy profile yields the pair of payoff arrays v = (v1 v2 ). It is equally clear that play is specified in a way that is independent of beliefs. Verification That the Described Strategy Profile Is a Perfect Bayesian Equilibrium Regular Phase Rjk (ε1 ε2 ): (i) Actions: Suppose that one player, say player 1, unilaterally deviates from the prescribed action profile. Then the punishment phase P1k starts, where k is the announcement by player 2. Accordingly, the payoff from deviating is at most equal to the right-hand side of (3), while the payoff from following the prescribed strategy is at least the left-hand side of (3). The result follows. (ii) Messages: (a) Assume first that player 1 has deviated from the recommended action profile, while player 2 has not. Because player 2 will cor rectly report the column k at the end of the punishment phase P1k that starts, jk ¯ by announcing k = k, while he will get at most (1 − δT )M + δT (vi − ε/3) jk ¯ if he announces k = k, so he gets at least −(1 − δT )M + δT (vi + ε/3) that player 2 has a strict incentive to report truthfully given (4). Given that player 1 has deviated, player 1’s report is irrelevant, and so it is also optimal 9
See Hörner and Olszewski (2006) for the details of an analogous specification.
for player 1 to report truthfully; (b) Otherwise, if player i (say player 2) reports jk ¯ while if he misreports, he gets at most the true state, he gets at least vi − ε, jk jk jk ¯ ε)) ¯ + δ(vi − ε). ¯ Therefore, (2) guarantees that neither (1 − δ) maxk vi (R (ε player has an incentive to deviate. Note that whenever player i’s reports conjk tradict his previous report, his continuation payoff is at most vi − ε, ¯ ensuring 10 that no player benefits from misreporting his type. Punishment Phase: Without loss of generality, consider P1k . (i) Messages: Observe first that all the messages in the punishment phase are irrelevant except in the last period of this punishment phase, whether this occurs after T periods or before. If such a history belongs to H k , then truthful announcements are optimal because of (2), as in case (ii)(b) above. If such a history does not belong to H k , then truthful announcements are also optimal as the situation is identical to the one described just above (case (ii)(a)). (ii) Actions: The inequality (4) (for i = 2) along with Property 2 ensures that player 2 has no incentive to take an action outside of the support of the (possibly mixed) action specified by s2k after every history h ∈ H k and that he is indifferent over all the actions within this support (whether his report k is correct or not). As for player 1, by definition his strategy is optimal in case k is the true column, and Property 1 guarantees that it remains optimal to play jk according to s1 in state (j k ), for all j k . 3.4. Existence Strict individual rationality and incentive compatibility are stringent restrictions, implying that the set of belief-free individually rational payoffs is empty for some games. In the following, we discuss two examples in which the set V ∗ is empty and we provide two conditions ensuring nonemptiness. In Example 2 there is no feasible payoff that is individually rational for both players simultaneously. In Example 3 both the set of individually rational payoffs and the set of incentive compatible payoffs are nonempty, but their intersection is empty. EXAMPLE 2 —Nonexistence of Belief-Free Individually Rational Payoffs: Player 1 is informed of the true state ( = the row); player 2 is not (J = 2, K = 1). The payoffs are either
            L            R
    U    10, −4        1, 1
    D     1, 1         0, 0
10 Otherwise a player could profit from misreporting his type at the beginning of a punishment phase and in the next regular phase, and reverting to truthtelling afterward.
or
            L            R
    U     0, 0         1, 1
    D     1, 1        10, −4
In each state, player 2 must be guaranteed at least 0 in a belief-free equilibrium: his equilibrium strategy must be optimal given any beliefs he may hold, including degenerate beliefs on the true state. His payoff must therefore be at least as large as his minmax payoff given the true state, which exceeds 0 in both states. This implies that the action profile yielding −4 to player 2 cannot be played more than a fifth of the time in equilibrium. Equivalently, this means that player 1’s equilibrium payoff is at most 14/5 in each state. However, if player 1 randomizes equally between U and D independently of the state, he is guaranteed to get at least 3 in one of the states, a contradiction. (This state typically depends on player 2’s strategy. However, no strategy of player 2 can bring down player 1’s payoff below 3 in both states simultaneously.) In Example 2, player 2’s payoff matrix depends on player 1’s type. In a belieffree equilibrium, player 2 must get at least what he can guarantee when he knows player’s 1 type. In this example, this is only possible, for all beliefs of player 2, if the equilibrium is separating, that is, if in equilibrium player 1 reveals his information. However, in this example a nonrevealing strategy yields a higher payoff to player 1 than any separating outcome that is individually rational for player 2, and so no belief-free equilibrium exists. This does not arise when the uninformed player does not need to know the state to secure his individually rational payoff. This gives rise to the following condition that guarantees that V ∗ is nonempty. CONDITION 3: Consider a game of one-sided incomplete information in j j which player 1 is informed. If there exist α∗2 ∈ A2 and α1 ∈ B1 (α∗2 ) such that, for all j = 1 J, u2 (α1 α∗2 ) ≥ u2 j
j
j
then V ∗ is nonempty. In fact, the above inequality implies that the payoffs obtained if player 2 j plays α∗2 and player 1 plays his type-dependent best reply α1 are individually rational for both players as well as incentive compatible for player 1. When the uninformed player always knows his own payoff, the strategy guaranteeing him his minmax payoff is independent of the informed player’s type. Thus, Condition 3 always holds in games of one-sided incomplete information
BELIEF-FREE EQUILIBRIA IN GAMES
469
with known-own payoffs.11 This is the main class of games examined in the literature on reputations. See Section 4. The next example shows that, with two-sided incomplete information, known-own payoffs is not a sufficient condition for V ∗ to be nonempty.12 EXAMPLE 3—Nonexistence of Individually Rational and Incentive Compatible Payoffs: Each player is informed of his own payoffs. Player 1’s payoff is
T B
L 3 0
R 0 1
or
T B
L 1+ε 0
R 1 0
for j = 1 and j = 2 respectively. Player 2’s payoff is
T B
L 1 0
R 0 3
or
T B
L 0 0
R 1 1+ε
for k = 1 and k = 2, respectively, where ε ∈ (0 1/35) Consider state (j k) = (2 1). Player 1 can secure a payoff of at least 1. This requires that, in equilibrium, action T is used with frequency not smaller than 1 − ε/(1 + ε). Player 2 can guarantee 3/4, but as action profile {R B} cannot be played more than ε/(1+ε) of the time, it follows that Pr{T L|(j k) = (2 1)} > 3/4 − 3ε/(1 + ε). Applying a symmetric argument to player 2, we obtain that Pr{B R|(j k) = (1 2)} > 3/4 − 3ε/(1 + ε). Consider now state (j k) = (1 1). Each player may pretend that he is of type 2 so that his preferred outcome occurs at least 3/4 − 3ε/(1 + ε) of the time. Thus, the incentive compatibility constraints for player 1 and for player 2 in state (1 1) require that 3 Pr{T L|(j k) = (1 1)} + 1 − Pr{T L|(j k) = (1 1)} ε 3 −3 ≥3 4 1+ε and Pr{T L|(j k) = (1 1)} + 3 1 − Pr{T L|(j k) = (1 1)} 3 ε ≥3 −3 4 1+ε For this class of games, Shalev (1994) showed that V ∗ is nonempty (Proposition 5, p. 253). Example 3 is inspired by Example 6.6 in Koren (1992) that establishes that Nash equilibria need not exist in undiscounted games with two-sided incomplete information. 11 12
470
J. HÖRNER AND S. LOVO
respectively. However, there is no value of Pr{T L|(j k) = (1 1)} that satisfies both inequalities for ε < 1/35. In Example 3, known-own payoffs guarantee that the set of individually rational payoffs is nonempty. Still, none of these payoff arrays is incentive compatible. This issue does not arise when there exists a distribution over action profiles that yields individually rational payoffs independently of the state. More formally, let α ∈ A be a distribution over action profiles and jk let ui (α) be player i’s payoff in state (j k) under the distribution α. Let JK ui (α) := (u11 i (α) ui (α)). j
α1 ∈ A1 , j = 1 2 J, and αk2 ∈ CONDITION 4: If there exist α∗ ∈ A A2 , k = 1 2 K, such that, for all (j k) and i, αk2 ) αk2 ) u1 (α∗ ) ≥ u1 (B1 ( jk
jk
jk
and u2 (α∗ ) ≥ u2 (B2 ( α1 ) α1 ) jk
jk
jk
j
j
then V ∗ is nonempty. The payoff array (u1 (α∗ ) u2 (α∗ )) is obviously incentive compatible, since it is achieved by strategies that do not depend on players’ types. The existence of j “punishment” strategies α1 , αk2 that are independent of the other player’s type ∗ guarantees that (u1 (α ), u2 (α∗ )) is individually rational.13 Condition 3 relied on the existence of a strategy that secured a player his minmax payoff independently of the state. Condition 4 relies on the existence of a strategy that drives down an opponent’s payoff below some target level independently of the state. Condition 4 can be further simplified when a player punishment strategy is j α1 and αk2 = α2 for all states jk. This is the case state-independent, that is, α1 = in a variety of games commonly used in economics. For instance, most auction formats considered in the literature (including affiliated values, auctions with synergies, and multiunit auctions) satisfy it provided that the range of allowable bids includes the range of possible values of the units. In this case, any distribution α∗ for which the winning price is below the lowest possible value and each bidder wins the auction with positive probability guarantees each player a positive payoff, while any punishment strategy αi that sets a bid no smaller than the largest possible value drives player −i’s payoff to zero. Similar reasoning applies to Bertrand games and Cournot games provided that for some output range the market price is commonly known to exceed production cost. Thus, 13 We thank a referee for pointing out that punishment strategies α can vary with the punishing player’s type.
BELIEF-FREE EQUILIBRIA IN GAMES
471
on the one hand, there always exists a way of sharing the market such that both players achieve a positive profit. On the other hand, each player can minmax his opponent by setting a low price in a Bertrand game or a high quantity in Cournot games, independently of the state. So far, we have focused on conditions guaranteeing that V ∗ is nonempty. Yet Theorem 1 asserts the existence of belief-free equilibria only for payoffs in the interior of V ∗ . Focusing on the interior of V ∗ guarantees that it is possible to provide incentives for players to carry out punishments, as in standard proofs of folk theorems with perfect monitoring and complete information, and three or more players (see Fudenberg and Maskin (1986)). This may or may not be possible otherwise. There are games for which the set V ∗ is nonempty, but its interior is empty. The problem may lie with individual rationality. For instance, V ∗ has empty interior in zero-sum games or in games in which a player has a strictly dominant action yielding a payoff independent of the opponent’s action—the payoff corresponding to a Stackelberg type. We are not aware of any simple condition ensuring that strictly individually rational payoffs exist. On the other hand, the problem may lie with weak vs. strict incentive compatibility. Recall that weakly incentive compatible payoffs always exist, and suppose that some weakly incentive compatible payoff is strictly individually rational. We may then as well assume that the corresponding (distribution over) action profile(s) α ∈ (A)JK is completely mixed.14 Strict incentive compatibility is equivalent to KJ(J − 1) and JK(K − 1) linear inequalities, corresponding to player 1 and player 2, respectively. Given that incentive compatibility constraints only depend on differences in the distributions of outcomes corresponding to different reports, there are (JK − 1) distributions that can be chosen to find (strictly) incentive compatible payoffs. Thus, generically, this is possible if (|A| − 1)(JK − 1) is at least as large as the number of constraints, JK(J + K − 2). Observe that (J + K − 1)(JK − 1) − JK(J + K − 2) = (J − 1)(K − 1). Therefore, a sufficient condition ensuring that, for a generic payoff matrix, V ∗ has nonempty interior whenever there exists some strictly individually rational, weakly incentive compatible payoff v, is |A| ≥ J + K If V ∗ is nonempty, but its interior is empty, belief-free equilibrium may or may not exist. For instance, in strictly dominant action games with a unique Stackelberg type—a class of games examined in the literature on reputations— a belief-free equilibrium always exist, although the interior of V ∗ is empty. 14 To see this, observe that any state-independent action profile is weakly individually rational. Pick any such completely mixed action profile α and consider the convex combination εα + (1 − ε)α, ε ∈ [0 1]. Since the set of incentive compatible action profiles is convex, this linear combination is weakly incentive compatible, is completely mixed, and is strictly individually rational for small enough ε.
472
J. HÖRNER AND S. LOVO
4. REPUTATIONS We consider games with known-own payoffs and one-sided incomplete information. By Proposition 5 of Shalev (1994), the set V ∗ is nonempty in such games, and we restrict attention for now to games in which this set has nonempty interior, which guarantees that belief-free equilibria exist. Player 1 is the informed player, while player 2 is uninformed. We fix one payoff type of player 1—the rational type—and study how the lower bound on the limit of equilibrium payoffs as the discount factor tends to 1 varies with the addition of (finitely many) other payoff types. The supremum of this lower bound over these payoff types is called the reputation payoff. Given some action α1 ∈ A1 of the informed player, recall that B(α1 ) is the set of best replies of player 2. The rational payoff type is denoted u1 . When considering two types only, we write u1 for player 1’s other payoff type. The analysis of reputation is strikingly simple. Observe that if some other type u1 is present, the rational type’s payoff must be at least min u1 (α) such that
α∈A
u2 (α) ≥ u2
u1 (α) ≥ u1
Indeed, player 2’s strategy must be optimal if he assigns probability 1 to player 1’s other (nonrational) type, so that the distribution over action profiles induced by player 1’s other type against player 2 must be individually rational for both players. Yet player 1’s rational type may mimic the other type. The supremum over u1 of this expression gives then a lower bound on the reputation payoff. (Introducing more than two types can only increase this lower bound.) The dual problem is sup
u1 p≥0q≥0
pu2 + qu1
such that
pu2 + qu1 ≤ u1
Since the constraints can be taken to be binding, the reputation payoff is at least sup val(u1 − p(u2 − u2 1)) p≥0
where 1 is an |A1 | × |A2 | matrix with 1s as entries. This is the bound found for Nash equilibrium payoffs in the undiscounted case by Israeli (1999, Theorem 1) using Farkas’ lemma. His proof shows that it is tight and achieved by u1 = −u2 .15 Since there usually is a trade-off between punishing player 1’s rational type and his other type, punishing player 1’s rational type might give his other type a payoff above his minmax. But if the other type’s preferences are opposite to player 2’s, player 2’s payoff is below his minmax, a contradiction: 15 As mentioned, zero-sum games have been ruled out by the assumption int V ∗ = ∅. Nevertheless, there exist payoff types arbitrarily close to u1 = −u2 for which the assumption is satisfied, so that Israeli’s analysis applies. See the online supplemental material (Hörner and Lovo (2009)).
BELIEF-FREE EQUILIBRIA IN GAMES
473
By maximizing his payoff, player 2 minimizes the other type’s payoff, implying that the rational type’s payoff is high. Note that the reputation payoff is the lower bound on belief-free equilibrium payoffs, which may be higher than that of Nash or sequential equilibrium payoffs. A standard concept in the analysis of reputations is the Stackelberg payoff, introduced by Fudenberg and Levine (1989). DEFINITION 2: The Stackelberg payoff u∗1 is defined as sup
min u1 (α1 α2 )
α1 ∈A1 α2 ∈B(α1 )
A sequence achieving the supremum is a Stackelberg sequence and its limit is a Stackelberg action. We say that a reputation is possible in a given game if there exists some type u1 such that, in all (belief-free) equilibria of the game, the rational type secures a payoff strictly above the minmax payoff. We also introduce a particular class of games (see, e.g., Schmidt (1993)).16 DEFINITION 3: A game has conflicting interest if some Stackelberg sequence minmaxes player 2. In other words, in a game of conflicting interests, there exists a Stackelberg sequence {αn1 } such that max u2 (αn1 α2 ) = u2
α2 ∈A2
THEOREM 2: Fix a game of one-sided incomplete information with known-own payoffs in which player 1 is the informed player. (i) The reputation payoff is equal to sup
min
α1 ∈A1 α2 :u2 (α1 α2 )≥u2
u1 (α1 α2 )
(ii) A reputation is possible if and only if, for some α1 ∈ A1 , ∀α2 ∈ A2
u2 (α1 α2 ) ≥ u2
⇒
u1 (α1 α2 ) > u1
(iii) The reputation and Stackelberg payoffs are equal if and only if, for any n ∈ N, there exists αn1 ∈ A1 such that ∀α2 ∈ A2
u2 (αn1 α2 ) ≥ u2
⇒
u1 (αn1 α2 ) ≥ u∗1 − 1/n
16 Our definition is slightly stronger than the usual one, as minmaxing must occur along the sequence. If the supremum is a maximum, then one can take the constant sequence {α1 } and the definition coincides with Schmidt (1993). We thank a referee for an illuminating example.
474
J. HÖRNER AND S. LOVO
This includes all games of conflicting interest. The first conclusion is due to Israeli (1999). The second conclusion of the theorem follows immediately from the first. If a sequence {αn1 } satisfying the condition of the third conclusion exists, then this sequence guarantees that the reputation payoff is at least as large as the Stackelberg payoff. Conversely, if the reputation payoff equals the Stackelberg payoff, then by definition of the reputation payoff, there must exist a sequence satisfying this condition. As for the last statement, observe that from the definition of a game of conflicting interest, given any term αn1 of a Stackelberg sequence, the set of best replies to αn1 and the set of individually rational actions for player 2 coincide. Therefore, plugging this Stackelberg sequence into the definition of the reputation payoff, it follows that the reputation payoff must be at least as large as, and therefore equal to, the Stackelberg payoff.17 Note that the Stackelberg payoff may or may not exceed the minmax payoff. That is, while the second conclusion characterizes when the reputation payoff exceeds the minmax payoff, the third makes no claim regarding the level of the reputation payoff. A few more remarks are in order. • Reputation may or may not be possible in games of common interest (Aumann and Sorin (1989)). This should not be surprising, since we allow for mixed strategies and, more importantly, incomplete information pertains to payoffs, not to the complexity of strategies. • The theorem is reminiscent of results for Nash equilibrium payoffs with unequal discount factors. Schmidt (1993) showed that the Stackelberg payoff and the reputation payoff coincide in games of conflicting interests, when player 1 is sufficiently more patient than player 2 and both discount factors tend to 1. Cripps, Schmidt, and Thomas (1996) generalized this result by showing that the reputation payoff is as given in the first conclusion of the theorem and they provided an example that shows that the result is false with equal discounting. In the case of equal discounting, more severe restrictions are thus required to obtain reputational effects with Nash equilibria. Cripps, Dekel, and Pesendorfer (2005) showed that the Stackelberg payoff can be achieved when attention is restricted to a subclass of games with conflicting interest, namely games of strictly conflicting interest. It should come as no surprise that, unlike in the case of Nash equilibria, it is not necessary that player 1 be more patient than player 2 here. After all, the uninformed player must play a best reply to all possible beliefs. This alleviates the need for the informed player to build a reputation, which may be a costly enterprise, before enjoying it. Indeed, given that the general characterization of belief-free equilibrium payoffs is similar to the characterization of Nash 17 We thank both referees for pointing this out, correcting an erroneous statement made in an earlier version.
BELIEF-FREE EQUILIBRIA IN GAMES
475
equilibrium payoffs with no discounting (at least in the case of one-sided incomplete information), it is natural that our findings regarding reputations parallel those of Cripps and Thomas (1995) for the case of no discounting. • Chan (2000) established that in strictly dominant action games (games in which player 1 has a strictly dominant action, and player 2’s best reply yields the highest possible individually rational payoff to player 1), the rational type receives the Stackelberg payoff in any sequential equilibrium when the game is perturbed by adding a single commitment type who always play the Stackelberg action. See, for instance, the game in Figure 3. The reader may wonder how this result is consistent with our analysis, since a strictly dominant action game need not be a game of conflicting interest. Recall that we assumed so far that int V ∗ = ∅, which rules out games in which the set of feasible and individually rational payoffs has empty interior. That is, we have excluded commitment types, as they correspond to payoff types with a dominant action whose payoff is independent of player 2’s action.18 If the game is perturbed by adding a single commitment type who always plays the Stackelberg action, a belief-free equilibrium exists if and only if there exists an action α1 of player 1 such that (α1 a2 ) is a Nash equilibrium in the (stage) game of complete information between player 1’s rational type and player 2, where a2 is 2’s best reply to the Stackelberg action. This condition is satisfied in strictly dominant action games and, indeed, it is then immediate that player 1’s rational type secures his Stackelberg payoff in the belief-free equilibrium of any such game: Since player 2’s strategy must be a best reply to all possible beliefs, including those which assign probability 1 to the commitment type, he must play a2 in every period, and player 1’s best reply is then to play his Stackelberg action. Observe, however, that reputation is fragile in such games: Consider replacing the single commitment type by any payoff type arbitrarily close to the commitment type, but for whom the dominant action does not yield a payoff independent of player 2’s action. Then, according to the previous theorem, to determine the reputation payoff, we must minimize player 1’s payoff from his Stackelberg action over player 2’s individually rational actions rather than over his best replies only. In the example of Figure 3, this implies that for all nearby payoff types, the reputation payoff is 4/3—still strictly above the minmax payoff of 0, so that a reputation is indeed possible, but below the Stackelberg payoff of 2. Since belief-free equilibria are sequential equilibria, this implies that reputation in strictly dominant action games is also fragile with respect to sequential equilibria.19 In contrast, the reputational effects obtained by Cripps, Dekel, 18 More precisely, player 1 has a dominant strategy in the repeated game for all discount factors if and only if he has a dominant strategy in the stage game yielding a payoff that is independent of player 2’s action. 19 Note, however, that this nongenericity is only with respect to the limit payoff set as the discount factor tends to 1. For a fixed discount factor, the Stackelberg action is a strictly dominant action in the supergame for payoff types sufficiently close to the Stackelberg type.
476
J. HÖRNER AND S. LOVO
FIGURE 3.—A strictly dominant action game.
and Pesendorfer (2005) in strictly conflicting interest games are robust, as the reputation payoff is continuous in the payoff parameters (as long as int V ∗ = ∅). • The result also sheds some light on the possible nonexistence of belief-free equilibria in games with two-sided incomplete information and known-own payoffs. Indeed, following the same logic, each player should be able to secure his reputation payoff in that case. However, nothing guarantees in general that it is feasible for both players to simultaneously achieve their reputation payoff. 5. CONCLUDING REMARKS This paper has introduced a solution concept for two-player repeated games of incomplete information and has characterized the corresponding payoff set as the discount factor tends to 1. This characterization is simple. Payoffs must be individually rational (in the sense of Blackwell (1956)) and must correspond to probability distributions over action profiles that are incentive compatible, given the private information of each player. The relevance and effectiveness of this concept has been illustrated in the context of reputations. There are several theoretical generalizations that demand attention. The information structure that we have considered in this paper is quite stylized, if standard. More generally, a player’s information can be modeled as a partition over the states of nature. Second, attention has been restricted to two players. While the appropriate generalization of the incentive compatibility conditions is quite obvious, it is less clear how to define individual rationality in the case of three players or more, as Blackwell’s characterization immediately applies to the case of two players only. Such a generalization could yield interesting insights for the study of reputations with more than two players. For economic applications, it is also of interest to extend the characterization to the case of a changing state. For instance, Athey and Bagwell (2001) and Athey, Bagwell, and Sanchirico (2004) characterized the (perfect public) equilibrium payoffs of a repeated game between price-setting oligopolists whose marginal cost in each period is private information. These costs are assumed to be drawn independently across players and over time, according to some commonly known distribution. In this context, we may wish to know which of these payoffs remain equilibrium payoffs if all that firms know is that the costs
BELIEF-FREE EQUILIBRIA IN GAMES
477
are independent and identically distributed, but the underlying distributions are unknown. In this way, the concept of belief-free equilibrium may turn out to provide a useful robustness criterion in this literature. APPENDIX: PROOF OF THEOREM 1 We first explain the construction without explicit communication, but with a randomization device. Communication is replaced by choices of actions, but since the set of actions may be smaller than the set of states, it may be necessary to use several periods to report types. We let c − 1 denote the smallest such number given the number of states and actions, that is, c is the smallest integer such that |A1 |c−1 ≥ J and |A2 |c−1 ≥ K (recall that |Ai | ≥ 2). Players will regularly report their type in rounds of c periods. For reasons that will become clear, in the last of these c periods, players have the opportunity, through the choice of a specific action, to signal that the report they have just made is incorrect. Equilibrium Strategies The play is again divided into phases. To guarantee that players’ best replies are independent of their beliefs, even within a round of communication (especially if a player’s own deviation during that round already prevents him from truthfully reporting his type), the construction must be considerably refined. For each player, we pick two specific actions from Ai , henceforth referred to as B and U. The pair of payoff arrays v is in the interior of V ∗ and is fixed throughout. There are two kinds of phases. Regular phases last at most n periods and punishment phases last at most T periods, where n and T are to be specified. Regular phases are denoted Rjk (ε1 ε2 ), where j ∈ {1 J} and k ∈ {1 K}, or Rxy , where x ∈ {1 J (L nU1 )} and y ∈ {1 K (L nU2 )}, with nUi ∈ {1 c} and either x = (L nU1 ) or y = (L nU2 ), or both (L stands j for “lie”). Punishment phases are denoted Pi , i = 1 2. Let s2k (resp. s1 ) denote a (behavior) strategy of player 2 (resp. 1) such that player 1’s (resp. player 2 ’s) jk jk payoff is less than v1 − 3ε¯ for all j and all strategies of player 1 (resp. v2 − 3ε¯ for all k and all strategies of player 2) for ε¯ > 0 small enough to be specified. jk jk Such strategies exist since v ∈ int V ∗ . Further, let s1 (resp. s2 ) denote some j s1 ) given row j (resp. column k). fixed pure best reply to s2k (resp. In several steps of the construction, a communication round of c periods takes place (within a phase). We fix a 1–1 mapping from states {1 J} to J t sequences {at1 }c−1 t=1 of length c − 1 (a1 ∈ A1 ) and similarly fix a 1–1 mapping from t states {1 K} to K sequences {at2 }c−1 t=1 of length c − 1 (a2 ∈ A2 ). If the play of player 1 during the first c − 1 periods equals such a sequence and his action in period c equals B, we say that player 1 (or his play) reports the row j that maps
478
J. HÖRNER AND S. LOVO
into this sequence of actions. Similarly, if the play of player 2 during the first c − 1 periods equals such a sequence and his action in period c equals B, we say that player 2 (or his play) reports the column k that maps into this sequence of actions. Otherwise, we say that player i (or his play) communicates (L nUi ), where U is the number of periods during these c periods in which player i chose action U. We shall provide incentives for player i to report the true row or column, rather than report (L nUi ) for any nUi , and to report (L nUi ) for any nUi ≥ 0, rather than the incorrect row or column. Further, we provide incentives for player i to maximize this number nUi as soon as his sequence of actions {at1 }τt=1 , τ ≤ c − 1, is inconsistent with any of the sequences that the mapping maps into. Actions (i) Regular phase: A regular phase lasts at most n > c periods, the last c of which is a communication round. During the first n − c periods, for all regular phases indexed by j k, and true column k , play proceeds as follows:
Phase U
Rj(Ln2 ) Rjk (ε1 ε2 ) U
U
R(Ln1 )(Ln2 )
U
Player 1
Player 2
j s1 jk a1 (ε1 ε2 )
jk s2 jk a2 (ε1 ε2 )
(U U)
(U U)
U
The specification for R(Ln1 )k is the obvious analogue to the case Rj(Ln2 ) . j jk The action ajk (ε1 ε2 ) is to be specified. The strategies s1 and s2 are the same as in the punishment phase (note that the duration is not the same, however). jk The superscript jk of the expression s2 refers to the row j that indexes the U regular phase Rj(Ln2 ) (which need not be the true row) and to the true col umn k . This specification of actions is valid as long as (in the case of Rjk (ε1 ε2 ) U U or R(Ln1 )(Ln2 ) ) the history within the phase is consistent with these actions or if all deviations from the specified actions during this phase were simultaneous, U and as long as (in the case of Rj(Ln2 ) ) the history within the phase is consistent j with s1 for some arbitrary s2 : As will be specified, a punishment phase is immediately entered otherwise. During the periods n − c + 1 n − 1 of this phase, player 1 (resp. player 2) communicates the true row j (resp. true column k); if this is impossible given his play from period n − c onward, he chooses U in every remaining period. (ii) Punishment phase: Without loss of generality, consider P1 , where T > 2c is to be specified. In the first c periods of this phase, player 1 plays U repeatedly while player 2 reports the true column (following the protocol described
BELIEF-FREE EQUILIBRIA IN GAMES
479
above). As in the regular phase, if this is impossible given player 2’s play, he chooses U in every remaining period of this communication round. In the table below, we refer to the case in which the column reported is k as the case k, while (L (nU1 nU2 )) refers to any other case, where nUi is the number of times player i chose action U in periods 1 c. Play in periods c + 1 T − c is then as follows:
Phase P1
k (L (nU1 nU2 ))
Player 1
Player 2
j k
s2k U
s1 U
This specification is valid up to period T − c (i) in the case (L (nU1 nU2 )), as long as both players have played U in all periods since period c + 1 or all deviations have been simultaneous, or (ii) in the case k, as long as the history since period c + 1 is consistent with s2k for some strategy s1 ; otherwise, a new punishment phase is immediately entered (see below). Here, j refers to the true row privately known to player 1. In the last c periods of a punishment phase (assuming that the specification above remained valid up to period T − c), a communication round takes place, that is, players report the true row and column, and as soon as they fail to do so, play U repeatedly. (iii) Initial phase: In the first c periods of the game, a communication round takes place, that is, players report the true row and column, and as soon as they fail to do so, play U repeatedly. In period c + 1, the regular phase Rjk (ε1 ε2 ) is entered if row j and column k are reported, where εi ∈ [−ε ¯ ε] ¯ is chosen so that the ex ante payoff in period 1 is exactly vjk conditional on j and k being the true row and column. If player 1 reports j and player 2 reports (L nU2 ) in U the first c periods, the regular phase Rj(Ln2 ) is entered. Similarly, if player 1 reU ports (L nU1 ), whereas player 2 reports k, the regular phase R(Ln2 )k is entered. U U Regular phase R(Ln1 )(Ln2 ) is entered in the remaining case. Transitions From a Regular Phase We have already mentioned what happens if there is a deviation during the first n − c periods of such a phase: If a player makes a unilateral deviation U U during the first n − c periods of a regular phase Rjk (ε1 ε2 ) or R(Ln1 )(Ln2 ) , a punishment phase starts. If player 1 (player 2) unilaterally deviates, punishment phase P1 (resp. P2 ) is immediately entered. Similarly, if player 1 (resp.
480
J. HÖRNER AND S. LOVO j
player 2) deviates from s1 (resp. s2k ) during the first n − c periods of a regular U )k j(LnU ) (Ln 2 (resp. R 2 ), the punishment phase P1 (resp. P2 ) is immephase R diately entered. From now on, we assume without repeating it that no such deviation occurs. In all tables that follow, j = j and k = k. (i) From Rjk (ε1 ε2 ): The new phase depends on the last c periods of the jk phase. Define also ρ := 2(1 − δ)δ− max(nT ) M. The quantity ε˜ i will be defined shortly. We have the following transitions:
Regular Phase
During Periods n − c + 1 n of the Phase, Players 1 and 2 Report
Next Regular Phase
Rjk (ε1 ε2 ) Rjk (ε1 ε2 ) Rjk (ε1 ε2 ) Rjk (ε1 ε2 ) Rjk (ε1 ε2 ) Rjk (ε1 ε2 )
(L nU1 ) (L nU2 ) (L nU1 ) k (L nU1 ) k j k j k j k
R(Ln1 )(Ln2 ) U R(Ln1 )k (LnU R 1 )k Rjk (ε1 −ε) ¯ Rj k (ε1 ε2 ) Rjk (ε1 ε2 )
U
U
(We omit the obvious symmetric specification for reports j (L nU2 ) and j (L nU2 )) U U (ii) From R(Ln1 )k (and symmetrically from Rj(Ln2 ) ): We have the following transitions:
During Periods n − c + 1 n of the Phase, Players 1 and 2 Report
Next Regular Phase
U
U (L nU 1 ) (L n2 ) U (L n1 ) k (L nU 1 ) k U j (L n2 ) j k
R(Ln1 )(Ln2 ) U R(Ln1 )k (LnU R 1 )k j(LnU ) 2 R jk Rjk (ε˜ 1 + ρnU1 εk;k 2 (h))
U
j k
k;k U Rjk (εk;k 1 (h) + ρn1 ε2 (h))
Regular Phase
R(Ln1 )k U R(Ln1 )k (LnU R 1 )k U R(Ln1 )k U R(Ln1 )k R(Ln1 )k
U
U
Here ε2k;k (h) ∈ [3ε/4 ¯ ε], ¯ ε2k;k (h) ∈ [−ε/2 ¯ −ε/4], ¯ and ε1k;k (h) ∈ [−ε ¯ ε] ¯ are k;k computed as follows: ε2 (·) makes player 2 precisely indifferent over all histories h that are consistent with s2k , conditional on the true column being k; ε2k;k (·) makes player 2 precisely indifferent over all histories h that are con sistent with s2k , conditional on the true column being k , while ε1k;k (h) compensates player 1 for every period along h in which the action he took is the
481
BELIEF-FREE EQUILIBRIA IN GAMES jk
action specified by s1 , so as to make sure that playing this action is optimal, conditional on the true state being (j k ) (reported in the last c periods). U U (iii) Finally, from R(Ln1 )(Ln2 ) : We have the following transitions:
Regular Phase U
U
R(Ln1 )(Ln2 ) U U R(Ln1 )(Ln2 ) (LnU )(LnU 1 2 ) R
During Periods n − c + 1 n of the Phase, Players 1 and 2 Report
Next Regular Phase
U (L nU 1 ) (L n2 ) U (L n1 ) k j k
R(Ln1 )(Ln2 ) U R(Ln1 )k U jk R (ρn1 ρnU2 )
U
U
(We omit the obvious symmetric specification for reports j (L nU2 )) From a Punishment Phase Without loss of generality, consider P1 . We have already briefly mentioned what happens if there is a deviation during the periods c +1 T −c of such a phase; in case (L (nU1 nU2 )), if player i unilaterally deviates from the play of U, the punishment phase Pi is immediately entered; in case k, if player 2 deviates from the support of the (possibly mixed) action specified by s2k , punishment phase P2 is entered (no matter how player 1 has played). From now on, we assume without repeating it that no such deviation occurs up to period T − c. In case k, let h denote the history during the periods c + 1 T − c. (i) In case k, we observe the following transitions:
During Periods T − c + 1 T of the Phase, Players 1 and 2 Report
Next Regular Phase
Case k Case k Case k Case k Case k
U (L nU 1 ) (L n2 ) (L nU ) k 1 (L nU 1 ) k j k j k
R(Ln1 )(Ln2 ) U R(Ln1 )k (LnU R 1 )k U jk R (ρn1 − ε ¯ εk;k 2 (h)) k;k jk R (ε1 (h) εk;k 2 (h))
Case k
j (L nU 2 )
Punishment Phase P1
U
U
U )
Rj(Ln2
(ii) In case (L (nU1 nU2 )), the transitions are described by the next table. It is clear from this specification that the strategy profile described here is belief-free, since actions are always determined by the history and possibly by a player’s own type (in case he is minmaxed), but not on his beliefs about his opponent’s type.
482
J. HÖRNER AND S. LOVO
Punishment Phase P1
During Periods T − c + 1 T of the Phase, Players 1 and 2 Report
Next Regular Phase
U (L nU 1 ) (L n2 ) (L nU ) k 1 j (L nU 2 ) j k
R(Ln1 )(Ln2 ) U R(Ln1 )k j(LnU 2 ) R U jk R (ρn1 ρnU2 )
Case (L (nU1 nU2 )) Case (L (nU1 nU2 )) Case (L (nU1 nU2 )) Case (L (nU1 nU2 ))
jk
U
U
jk
Specification of ε, ¯ a1 (ε1 ε2 ), δ, T , n, and ε˜ i : Since v is in the interior of V ∗ , it is possible to find ε¯ > 0, as well as, for all (ε1 ε2 ) (ε1 ε2 ) ∈ [−2ε ¯ 2ε], ¯ probability distributions over A, Pr{· | Rjk (ε1 ε2 )}, such that for all j k j k , and i = 1 2, defining jk jk vi (Rj k (ε1 ε2 )) := Pr{a | Rj k (ε1 ε2 )}ui (a) a∈A
it is the case that, for j = j and k = k (A1)
v1 (Rjk (ε1 ε2 )) > v1 (Rj k (ε1 ε2 )) jk
jk
and
v2 (Rjk (ε1 ε2 )) > v2 (Rjk (ε1 ε2 )) jk
jk
Furthermore, if {at1 }ct=1 and {at2 }ct=1 are the sequences corresponding to reports j and k, for all δ close enough to 1 and n large enough, we can pick those distributions so that player i’s average discounted payoff under state (j k) from the sequence {at1 at2 }ct=1 followed by n−c repetitions of the action profile deterjk mined by Pr{a | Rjk (ε1 ε2 )} is exactly equal to vi +εi . Observe that in the equilibrium described above, all values of εi are in [−ε ¯ ε]. ¯ Furthermore, since v is in the interior of V ∗ , we may assume that player 1’s (resp. player 2’s) average j discounted payoff under state (j k) given that player 2 uses s2k (ε) (resp. s1 (ε)) for n − 2c periods, followed by any arbitrary play during c periods, is at most jk jk v1 + ε (resp. v2 + ε) for ε > −3ε. ¯ Consider the inequalities jk
jk
jk
(A2)
v1 + ε1 > (1 − δc )M + δc (1 − δn )(v − 2ε) ¯ + δn+c (v1 + ε˜ 1 + cρ)
(A3)
v1 + ε1 < −(1 − δn+c )M + δn+c (v1 + ε˜ 1 )
(A4)
v1 − ε¯ > (1 − δc )M + δc (1 − δn−c )(v1 − 2ε) ¯ + δn (v1 − ε) ¯
jk jk
jk
jk
jk
jk
Given ε, ¯ fixing δn , inequality (A4) is satisfied as δ → 1, provided that the value n of δ is large enough. Similarly, given ε, ¯ fixing δn , inequality (A2) is satisfied as jk jk δ → 1 for ε˜ 1 = −ε¯ and (A3) is satisfied for ε˜ 1 = 3ε/4, ¯ provided that the value n of δ is large enough and ε1 < ε/2 ¯ (recall that ρ = 2(1 − δ)δ− max(nT ) M → 0 for
483
BELIEF-FREE EQUILIBRIA IN GAMES
fixed δ− max(nT ) ). Observe that the left-hand side of (A4) is the lowest possible payoff for player 1, evaluated in the first period of a communication round, concluding either a punishment phase or a regular phase, if he reports his true row j and player 2 reports his true column k, while the right-hand side is the most he can expect by reporting another row j = j when player 2 reports his true column k. Similarly, the left-hand sides of (A2) and (A3) are player 1’s payoff, evaluated in the first period of a communication round concluding either a punishment phase or a regular phase, if he reports his true row j and player 2 reports his true column k (and the upcoming regular phase is Rjk (ε1 ε2 )), while the right-hand side of (A2) (resp. (A3)) is the highest (resp. lowest) payoff he can expect if he reports (L nU1 ) for some nU1 . There¯ by the intermediate value theorem, we can find different valfore, if ε1 < ε/2, jk ¯ 3ε/4) ¯ so that the payoff from reporting the true row exceeds ues of ε˜ 1 ∈ (−ε the payoff from reporting (L nU1 ) for all nU1 , which in turn exceeds the payoff from reporting another row j = j, provided player 2 reports the true column. jk ¯ we can set ε˜ 1 = 0: In that case as well, the same ordering obtains If ε1 ≥ ε/2, jk provided that the value of δn is large enough as δ → 1. The values ε˜ 2 are defined similarly. Consider now the two inequalities jk
jk
(A5)
¯ > (1 − δn )M + δn (v1 + ρc) −(1 − δn )M + δn (v1 + 3ε/4)
(A6)
−(1 − δn )M + δn v1 > (1 − δn )M + δn (v1 − ε) ¯
jk
jk
Conditional on player 2 reporting (L nU2 ) for some nU2 , the left-hand side of (A5) is the lowest possible payoff for player 1, evaluated in the first period of a communication round concluding either a punishment phase or a regular phase, if he reports his true row j, while the right-hand side is the highest payoff he can get if he reports (L nU1 ) for some nU1 . Similarly, the left-hand side of (A6) is the lowest possible payoff for player 1, evaluated in the first period of a communication round concluding either a punishment phase or a regular phase, if he reports (L nU1 ) for some nU1 , while the right-hand side is the highest payoff he can get if he reports another row j = j. Observe that both inequalities hold, given ε, ¯ letting δ → 1, provided δn is large enough. Finally, observe that the choice of ρ trivially ensures that, conditional on having started reporting (L nU1 ) for some nU1 , player 1 has strict incentives to play U in all remaining periods of the communication round, no matter where this round takes place. Similar considerations hold for player 2. To summarize, we have shown that we can ensure that both players prefer to report their true type, in any communication round, than to report (L nUi ) for all nUi ; that, conditional on reporting (L nUi ) for some nUi , player i has strict incentives to choose U in any remaining period of the communication round; and that they prefer to report (L nUi ) for any nUi than to report an incorrect row or column; all this, provided that δn (and δT ) is fixed but large enough, by taking δ → 1, given ε. ¯
484
J. HÖRNER AND S. LOVO
Turning now to actions, we must consider (A7)
¯ + δT +1 (vi − ε) ¯ (1 − δn+1 )M + δn+1 (1 − δT −n )(vi − 2ε) jk
jk
jk
¯ < −(1 − δn )M + δn (vi − ε) (A8)
(1 − δn+1 )M + δn+1 (1 − δT −n )(vi − 2ε) ¯ + δT +1 (vi − ε) ¯
(A9)
¯ < −(1 − δT )M + δT (vi − ε/2) jk 1 − δmax(Tn) M + δmax(Tn) (vi − ε/2) ¯ jk < − 1 − δmax(Tn) M + δmax(Tn) (vi − ε/4) ¯
jk
jk
jk
Observe that all three inequalities hold for both i = 1 2, given ε, ¯ for δT and n fixed as δ → 1. This ensures that, given ε, ¯ we can choose n, T , and δ to satisfy all the inequalities above. As for the interpretation, (A7) ensures that player i does not want to deviate during any regular phase, (A8) ensures that player i does not want to deviate during the punishment phase P−i , and (A9) ensures that we can pick ε2k;k (·) and ε2k;k (·) within a range of values not exceeding u u ε/4 ¯ in case k and after phases R(Ln1 )k and Rj(Ln2 ) . Indeed, the left-hand sides of (A7) and (A8) are the highest payoffs player i can hope for by deviating at any time (outside communication rounds), while the right-hand side of (A7) (resp. (A8)) is the lowest payoff he can expect by sticking to the equilibrium strategies in a regular phase (resp. in a punishment phase). Note that (1 − δT )M is the highest payoff player 1 (resp. player 2) can get when using j strategy s1 (resp. s2k ) during the punishment phase P−i over all actions consistent with his equilibrium strategy, while −(1 − δT )M is the lowest such payoff. Inequality (A9) guarantees therefore that there exist functions ε1k;k and ε1k;k whose ranges do not exceed ε/4 ¯ such that player 1 is playing a best reply, given k;k ε1 (·), whether or not the true column is k. To conclude, it remains to show that public randomization can be dispensed with. Observe that this device is used in exactly one place. For all pairs (j k), and all (ε1 ε2 ) ∈ [−2ε ¯ 2ε], ¯ if {at1 at2 }ct=1 is the sequence of action profiles corresponding to the reports (j k), and for all δ close enough to 1, the public randomization device guarantees that we can find a correlated action profile such that the average discounted payoff from the sequence {at1 at2 }ct=1 followed jk by n − c repetitions of this correlated action profile yields a payoff vi + εi to player i, in state (j k). Observe now that all incentives in the regular phase are strict, so that they would also be satisfied, for all δ close enough to 1, as long as the continuation payoff vˆ it in period t of the regular phase is within jk 2ε¯ + εˆ (rather than within 2ε) ¯ of vi for some εˆ > 0 sufficiently small and all t = 1 n. Observe now that, following Fudenberg and Maskin (1991) (which itself builds on Sorin (1986)), we can find n large enough, so that for all δ close enough to 1, there exists a sequence of sequences {{at1 (ν) at2 (ν)}nt=1 }∞ v=1 ,
485
BELIEF-FREE EQUILIBRIA IN GAMES
with {at1 (ν) at2 (ν)}ct=1 = {at1 at2 }ct=1 for all ν, such that (i) the average discounted payoff from the infinite play {a11 (1) a12 (1) an1 (1) an2 (1) a11 (2) a12 (2) } jk
obtained by concatenation of the elements of this sequence, is equal to vi + εi , and that (ii) the continuation payoff from any period t onward in this infinite jk play is within εˆ of vi + εi .20 It is then clear how to modify the specification above: Increase n and choose δ close to 1, if necessary, to guarantee the existence of such sequences; if players are in the vth consecutive regular phase Rjk (ε1 ε2 ), with reports (j k) that agreed in all those phases, play in that vth phase is given by {at1 (ν) at2 (ν)}nt=1 . (Note that, in general, the continuation payjk off of i at the beginning of the vth phase is not exactly vi + εi , so (ε1 ε2 ) only refers to the continuation payoff achieved in the first such regular phase, or more precisely, from the communication phase that immediately precedes this first regular phase onward.) If a deviation occurs or consecutive reports disagree, a new such sequence of consecutive plays {{at1 (ν) at2 (ν)}nt=1 }∞ v=1 starts in the next regular phase (or more precisely, from the communication phase that immediately precedes this first regular phase), given the new values of (ε1 ε2 ). REFERENCES AGHASSI, M., AND D. BERTSIMAS (2006): “Robust Game Theory,” Mathematical Programming, Ser. B, 107, 231–273. [455] ATHEY, S., AND K. BAGWELL (2001): “Optimal Collusion With Private Information,” RAND Journal of Economics, 32, 428–465. [476] ATHEY, S., K. BAGWELL, AND C. SANCHIRICO (2004): “Collusion and Price Rigidity,” Review of Economic Studies, 71, 317–349. [476] AUMANN, R. J., AND M. B. MASCHLER (1995): Repeated Games With Incomplete Information. Cambridge, MA: The MIT Press. [456] AUMAN, R., AND S. SORIN (1989): “Cooperation and Bounded Recall,” Games and Economic Behavior, 1, 5–39. [474] BAÑOS, A. (1968): “On Pseudo-Games,” Annals of Mathematical Statistics, 39, 1932–1945. [455] BERGEMANN, D., AND S. MORRIS (2007): “Belief Free Incomplete Information Games,” Discussion Paper 1629, Cowles Foundation, Yale University. [453] BLACKWELL, D. (1956): “An Analog of the Minmax Theorem for Vector Payoffs,” Pacific Journal of Mathematics, 6, 1–8. [459,476] CHAN, J. (2000): “On the Non-Existence of Reputation Effects in Two-Person InfinitelyRepeated Games,” Working Paper, Johns Hopkins University. [475] CRÉMER, J., AND R. MCLEAN (1985): “Optimal Selling Strategies Under Uncertainty for a Discriminating Monopolist When Demands Are Interdependent,” Econometrica, 53, 345–361. [453] CRIPPS, M., AND J. THOMAS (1995): “Reputation and Commitment in Two-Person Repeated Games Without Discounting,” Econometrica, 63, 1401–1419. [475] 20 The construction of Sorin (1986) and Fudenberg and Maskin (1991) guarantees that n and δ can be chosen independently of vjk + ε.
486
J. HÖRNER AND S. LOVO
(2003): “Some Asymptotic Results in Discounted Repeated Games of One-Sided Incomplete Information,” Mathematics of Operations Research, 28, 433–462. [455,460] CRIPPS, M., E. DEKEL, AND W. PESENDORFER (2005): “Reputation With Equal Discounting in Repeated Games With Strictly Conflicting Interests,” Journal of Economic Theory, 121, 259–272. [474-476] CRIPPS, M., K. SCHMIDT, AND J. THOMAS (1996): “Reputation in Perturbed Repeated Games,” Journal of Economic Theory, 69, 387–410. [474] ELY, J., AND J. VÄLIMÄKI (2002): “A Robust Folk Theorem for the Prisoner’s Dilemma,” Journal of Economic Theory, 102, 84–105. [453] ELY, J., J. HÖRNER, AND W. OLSZEWSKI (2005): “Belief-Free Equilibria in Repeated Games,” Econometrica, 73, 377–415. [453] FORGES, F. (1992): “Non-Zero-Sum Repeated Games of Incomplete Information,” in Handbook of Game Theory, Vol. 1, ed. by R. J. Aumann and S. Hart. Amsterdam: North Holland. [455] FORGES, F., AND E. MINELLI (1997): “A Property of Nash Equilibria in Repeated Games With Incomplete Information,” Games and Economic Behavior, 18, 159–175. [455] FUDENBERG, D., AND D. LEVINE (1989): “Reputation and Equilibrium Selection in Games With a Single Patient Player,” Econometrica, 57, 759–778. [473] FUDENBERG, D., AND E. MASKIN (1986): “The Folk Theorem in Repeated Games With Discounting or With Incomplete Information,” Econometrica, 54, 533–554. [471] (1991): “On the Dispensability of Public Randomization in Discounted Repeated Games,” Journal of Economic Theory, 53, 428–438. [484,485] HANSEN, L. P., AND T. J. SARGENT (2007): Robustness. Princeton, NJ: Princeton University Press. [454] HARSANYI, J. C. (1967–1968): “Games With Incomplete Information Played by Bayesian Players,” Management Science, 14, 159–182, 320–334, 486–502. [453,456] HART, S. (1985): “Nonzero-Sum Two-Person Repeated Games With Incomplete Information,” Mathematics of Operations Research, 10, 117–153. [455] HÖRNER, J., AND S. LOVO (2009): “Supplement to ‘Belief-Free Equilibria in Games With Incomplete Information’,” Econometrica Supplemental Material, 77, http://www.econometricsociety. org/ecta/Supmat/7134_extensions.pdf. [472] HÖRNER, J., AND W. OLSZEWSKI (2006): “The Folk Theorem With Almost-Perfect Monitoring,” Econometrica, 74, 1499–1544. [466] ISRAELI, E. (1999): “Sowing Doubt Optimally in Two-Person Repeated Games,” Games and Economic Behavior, 28, 203–216. [472,474] KALAI, E. (2004): “Large Robust Games,” Econometrica, 72, 1631–1665. [453] KOREN, G. (1992): “Two-Person Repeated Games Where Players Know Their Own Payoffs,” Working Paper, New York University. [455,462,469] LUCE, D., AND H. RAIFFA (1957): Games and Decisions. New York: Wiley. [454] MEGIDDO, N. (1980): “On Repeated Games With Incomplete Information Played by NonBayesian Players,” International Journal of Game Theory, 9, 157–167. [455] MILNOR, J. (1954): “Games Against Nature,” in Decision Processes, ed. by R. M. Thrall, C. H. Coombs, and R. L. Davis. New York: Wiley, 49–59. [455] MONDERER, D., AND M. TENNENHOLTZ (1999): “Dynamic Non-Bayesian Decision Making in Multi-Agent Systems,” Annals of Mathematics and Artificial Intelligence, 25, 91–106. [454-456] PICCIONE, M. (2002): “The Repeated Prisoner’s Dilemma With Imperfect Private Monitoring,” Journal of Economic Theory, 102, 70–83. [453] SCHMIDT, K. (1993): “Reputation and Equilibrium Characterization in Repeated Games With Conflicting Interests,” Econometrica, 61, 325–351. [473,474] SHALEV, J. 
(1994): “Nonzero-Sum Two-Person Repeated Games With Incomplete Information and Known-Own Payoffs,” Games and Economic Behavior, 7, 246–259. [455,469,472] SORIN, S. (1986): “On Repeated Games With Complete Information,” Mathematics of Operations Research, 11, 147–160. [484,485]
BELIEF-FREE EQUILIBRIA IN GAMES
487
NEUMANN, J., AND O. MORGENSTERN (1944): Theory of Games and Economic Behavior. Princeton, NJ: Princeton University Press. [454]
VON
Dept. of Economics, Yale University, Box 208281, New Haven, CT 06520-8281, U.S.A. and Centre for Economic Policy Research, London EC1V 0DG, U.K.;
[email protected] and Dept. of Finance and Economics, HEC School of Management, Paris and GREGHEC, 1 rue de la Liberation, 78351 Jouy-en-Josas, France;
[email protected]. Manuscript received April, 2007; final revision received August, 2008.
Econometrica, Vol. 77, No. 2 (March, 2009), 489–536
ROBUST PRIORS IN NONLINEAR PANEL DATA MODELS BY MANUEL ARELLANO AND STÉPHANE BONHOMME1 Many approaches to estimation of panel models are based on an average or integrated likelihood that assigns weights to different values of the individual effects. Fixed effects, random effects, and Bayesian approaches all fall into this category. We provide a characterization of the class of weights (or priors) that produce estimators that are first-order unbiased. We show that such bias-reducing weights will depend on the data in general unless an orthogonal reparameterization or an essentially equivalent condition is available. Two intuitively appealing weighting schemes are discussed. We argue that asymptotically valid confidence intervals can be read from the posterior distribution of the common parameters when N and T grow at the same rate. Next, we show that random effects estimators are not bias reducing in general and we discuss important exceptions. Moreover, the bias depends on the Kullback–Leibler distance between the population distribution of the effects and its best approximation in the random effects family. Finally, we show that, in general, standard random effects estimation of marginal effects is inconsistent for large T , whereas the posterior mean of the marginal effect is large-T consistent, and we provide conditions for bias reduction. Some examples and Monte Carlo experiments illustrate the results. KEYWORDS: Panel data, incidental parameters, bias reduction, integrated likelihood, priors.
1. INTRODUCTION IN A PANEL MODEL the likelihood of the data yi for a given unit is typically a function f (yi θ αi ) = fi (θ αi ) of common and individual specific parameters θ and αi , respectively. Interest centers on the estimation of θ or other common policy parameters constructed as summary measures of the two types of parameters and data. The central feature of this estimation problem is the presence of many nuisance parameters (the individual effects) when the cross-sectional dimension is large relative to the number of time-series observations. Many approaches to estimation of θ in this context are based on an average likelihood that assigns weights to different values of αi : fia (θ) = fi (θ αi )wi (αi ) dαi (1) where wi (αi ) is a possibly θ-specific weight, related to a discrete or continuous measure. An estimate of θ is then usually chosen to maximize the average N likelihood of the sample under cross-sectional independence: i=1 ln fia (θ). 1 We thank Victor Aguirregabiria, Josh Angrist, Gary Chamberlain, Gabriele Fiorentini, JeanPierre Florens, Jinyong Hahn, Laura Hospido, Guido Imbens, Thierry Magnac, Jean-Marc Robin, Enrique Sentana, and two anonymous referees for useful comments. All remaining errors are our own. Research funding from the Spanish MEC Grant SEJ2005-08880 is gratefully acknowledged.
© 2009 The Econometric Society
DOI: 10.3982/ECTA6895
490
M. ARELLANO AND S. BONHOMME
A fixed effects approach that estimates θ jointly with the individual effects by maximum likelihood (ML) falls into this category with weights assigning all αi (θ), where αi (θ) is the maximum likelihood estimator of αi for mass to αi = given θ. That is, (2)
wi (αi ) = δ(αi − αi (θ))
where δ(·) denotes Dirac’s delta function. The resulting average likelihood in αi (θ)). this case is just the concentrated likelihood fi (θ A random effects approach is also based on an average likelihood in which the weights are chosen as a model for the distribution of individual effects in the population given covariates and initial observations. In this case, wi (αi ) is a parametric or semiparametric density or probability mass function which does not depend on θ, but includes additional unknown coefficients: wi (αi ) = πi (αi ; ξ) Finally, in a Bayesian approach, beginning with a joint prior for common and individual parameters π(θ α1 αN ), an average likelihood is also constructed. In this case, weights are chosen as a formulation of the prior probability distribution of αi given θ, covariates, and initial observations, under the assumption of prior conditional independence of α1 αN given θ: wi (αi ) = πi (αi |θ) such that (3)
π(θ α1 αN ) = π1 (α1 |θ) πN (αN |θ)π(θ)
However, αi and θ need not be independent, so that the weights assigned to different values of αi may depend on the value of θ. All these approaches, in general, lead to estimators of θ that are not consistent as N tends to infinity for fixed T , but have large-N biases of order 1/T . This situation, known as the incidental parameter problem, is of particular concern when T is small relative to N (a common situation in applications) and has become one of the main challenges in modern econometrics.2 The traditional reaction to this problem has been to look for estimators that yield fixed-T consistency as N goes to infinity.3 One drawback of these methods is that they are somewhat limited to linear models and certain nonlinear models, often due to the fact that fixed-T identification itself is problematic. 2
The classic reference on the incidental parameter problem is Neyman and Scott (1948). Lancaster (2000) reviewed the history of the problem since then. 3 See Arellano and Honoré (2001) for a review.
PRIORS IN NONLINEAR PANEL DATA
491
Other considerations are that their properties may deteriorate as T increases and that there may be superior methods that are not fixed-T consistent.4 More recently, it has been argued that the incidental parameter problem can be viewed as time-series finite-sample bias when T tends to infinity. Following this perspective, several approaches have been proposed to correct for the time-series bias. These methods include bias-correction of the ML estimator of the common parameters (Hahn and Newey (2004), Hahn and Kuersteiner (2004), Dhaene, Jochmans, and Thuysbaert (2006)), of the moment equation (Woutersen (2002), Arellano (2003), Carro (2007)), or of the objective function (Arellano and Hahn (2006, 2007), Bester and Hansen (2005a), Hospido (2006)), each of them based on analytical or simulation-based approximations. The aim in this literature has been to obtain estimators of θ with biases of order 1/T 2 (as opposed to 1/T ) and similar large-sample dispersion as the corresponding uncorrected methods when T/N tends to a constant. This is done in the hope that the reduction in the order of magnitude of the bias will essentially eliminate the incidental parameter problem, even in panels where T is much smaller than N, as long as individual time series are statistically informative. In this paper, we consider estimators that maximize an average likelihood such as (1) and provide a characterization of the class of weights that produce estimators that are first-order unbiased. Specifically, we consider θ= N arg maxθ i=1 ln fia (θ) for general weight functions, or priors, wi (αi ).5 For fixed θ. In general, θT = θ0 . T , we can define the pseudo true value θT = plimN→∞ However, expanding in powers of T , θT = θ0 +
1 B +o T T
We look for priors that yield B = 0. Our results suggest new bias-reducing estimators with attractive computational properties, as well as a natural way to obtain asymptotic confidence intervals. They also provide important insights into the properties of fixed effects, random effects, and Bayesian nonlinear panel estimators in a unified framework. The approach we follow was first considered in the panel data context by Lancaster (2002) from a Bayesian perspective, in situations where common parameters and fixed effects can be made information orthogonal by repara4
Alvarez and Arellano (2003) showed that standard panel generalized method of moments (GMM) estimators of linear dynamic models are asymptotically biased as T and N increase at the same rate. 5 We shall interchangeably use the terms “weights” and “priors,” since in this paper we treat priors as automatic weighting schemes.
492
M. ARELLANO AND S. BONHOMME
meterization.6 Indeed, it can be shown that under information orthogonality, taking a uniform prior for the effects reduces the bias on the parameter of interest. In this paper we generalize this approach to situations where orthogonal reparameterizations do not exist. We start with a characterization of bias-reducing priors. For a given weight function or prior, we derive the expression of the 1/T term of the bias of the average likelihood relative to an infeasible average likelihood without uncertainty about pseudo true values of the effects for given values of θ. We use this finding to show that there always exist bias-reducing weights. This result provides a generalization of Lancaster’s approach to a much wider class of models. We also find an expression for the bias of the score of the average or integrated likelihood, which allows us to make the link with information orthogonality. Namely we show that, when (generalized) orthogonal reparameterizations of the fixed effects are not available, bias-reducing priors will in general, depend on the data. We discuss two specific data-dependent bias-reducing priors. The first one, which we call the robust prior, can be written as a combination of a Hessian and an outer product of the score term. As such it is related to, but different from, the nonsubjective prior introduced by Harold Jeffreys. The second biasreducing prior is just the normal approximation to the sampling distribution of the estimated effects for given θ: αi (θ)] wi (αi ) ∼ N αi (θ) Var[ The bias-reduction property comes from the fact that, contrary to (2), the variability of the fixed effects estimates and its dependence on θ are taken into account. Given a bias-reducing prior, estimation of the common parameters can be performed by integration methods, as well as by using Bayesian simulation techniques such as the Markov chain Monte Carlo. The possibility of using computationally efficient techniques for estimation is an appealing feature of the method we propose. In addition, simulation methods can also be useful to compute confidence intervals. Building on Chernozhukov and Hong (2003), we argue that asymptotically valid confidence intervals of the parameter estimates can be read from the quantiles of the posterior distribution of θ when N and T grow at the same rate. Next we study random effects estimation, which we see as a particular case of the previous analysis when the priors on the individual effects are independent of the common parameters. We find that in the absence of prior knowledge on the distribution of the individual effects in the population, it is not possible, in 6
The classic paper on information orthogonality is Cox and Reid (1987), and its discussion by Sweeting (1987) makes the connection between orthogonality and inference from the integrated likelihood.
PRIORS IN NONLINEAR PANEL DATA
493
general, to correct for first-order bias. For a given random effects specification, we characterize the set of models for which random effects maximum likelihood (REML) is robust. As an important special case, we derive a necessary and sufficient condition for the Gaussian REML estimator to be bias reducing, which includes the class of linear autoregressive models. In more general nonlinear models, however, the use of Gaussian REML has no bias-reducing asymptotic justification. In contrast, if the random effects family approximates the population distribution of individual effects well, the properties of REML improve. Specifically, we show that the first-order bias of the REML estimator depends on the distance between the distribution of individual effects and its best approximation, in a Kullback–Leibler sense, in the random effects family. This suggests that using a flexible distribution for the effects may reduce the bias on the parameter of interest. As an example, we consider the case of a normal mixture with a number of components that grows with N, and we obtain first-order bias reduction of the REML estimator in a model without covariates. Finally, we study the estimation of averages over individual effects, such as average marginal effects. We compare two estimators: First, the standard random effects estimator, which is inconsistent for large T unless the population distribution of the effects belongs to the chosen family of priors; second, the Bayesian fixed effects (BFE) estimator, defined as the posterior mean of the marginal effect, which is large-T consistent. Thus, in the presence of misspecification, by updating the prior given the data, the bias of marginal effects is reduced by an order of magnitude. We compute the first-order bias term of BFE estimators of marginal effects. Priors that are bias reducing for the common parameters do not lead, in general, to bias reduction of marginal effects, and bias-reducing priors for marginal effects are specific to the effect considered. The BFE first-order bias depends on the distance between the population distribution of the effects and its best fitting approximation in the chosen family of priors. So, while updating lowers the bias on the marginal effects by an order of magnitude, the bias can be further reduced either by using a bias-reducing prior or a sufficiently close approximating family to the distribution of the effects. The related literature includes Woutersen (2002), who obtained the firstorder bias of the integrated likelihood estimator in the case where parameters are information orthogonal, and proposed a modification of the score when there is no orthogonality. In a contribution closely related to ours, Severini (1999) studied the conditions under which a classical pseudo-likelihood is asymptotically equivalent to some integrated likelihood, corresponding to a given prior distribution for the effects. The conditions he finds can be seen as a special case of our results when parameters are information orthogonal. Some of the results of this paper have been independently obtained by Bester and Hansen (2005b). They considered the form of bias-reducing priors for general parametric likelihood models and provided a data-dependent prior, which coincides with one of our proposals, but their focus is not on panel data and they
do not discuss the duality between existence of orthogonal reparameterizations and non-data-dependent bias-reducing priors. Other important differences are that we provide a formal justification for bias reduction in the panel context and that we are also concerned with developing a framework where we can study the bias-reducing properties of random effects estimators.

The plan of the paper is as follows. In Section 2, we derive the expression of the bias of the average likelihood and make the link with information orthogonality. In Section 3, we obtain analytical expressions of two special bias-reducing weight functions and discuss inference issues. Section 4 focuses on the bias-reducing properties of random effects estimators. In Section 5, we study the properties of marginal effects. Section 6 illustrates the results by means of two examples: the dynamic AR(p) model and the static logit model with fixed effects. In Section 7, we report a small Monte Carlo simulation to study the finite-sample behavior of the proposed estimators. Section 8 concludes. The Appendix contains proofs of results from Sections 2, 3, 4.1, and 4.2. Proofs of the remaining results, which are of a more technical nature, are in the online Supplemental material (Arellano and Bonhomme (2009)) on the journal website.

2. BIASES OF THE INTEGRATED LIKELIHOOD AND SCORE

In this section, we derive the expression of the first-order bias of the integrated likelihood with respect to an arbitrary prior distribution for the individual effects. We start by setting the notation.

2.1. Notation
Let $(y_{it}, x_{it})$, $i = 1, \dots, N$ and $t = 0, 1, \dots, T$, be the set of observations on the endogenous variable $y_{it}$ and a vector of strictly exogenous variables $x_{it}$ that we assume are independent and identically distributed (i.i.d.) across individuals. The density of $y_{it}$ conditioned on $(x_{i1}, \dots, x_{iT})$ and lagged $y$'s is given by
$$f_{it}(y_{it}|\theta_0, \alpha_{i0}) \equiv f\big(y_{it}|x_{it}, y_{i(t-1)}; \theta_0, \alpha_{i0}\big),$$
which leads to the expression for the scaled individual log-likelihood conditioned on exogenous covariates and initial observations:
$$\ell_i(\theta, \alpha_i) = \frac{1}{T} \sum_{t=1}^{T} \ln f_{it}(y_{it}|\theta, \alpha_i).$$
The likelihood is assumed to depend on a vector of common parameters $\theta$ and scalar individual fixed effects $\alpha_1, \dots, \alpha_N$.$^7$

$^7$Considering further lags and multiple fixed effects would complicate the notation, but leave the essence of what follows unaltered.

Then let $\pi_i(\alpha_i|\theta)$ be a conditional
prior distribution on the individual fixed effect given $\theta$. The conditioning on $\theta$ follows from our treatment of the $\alpha_i$ as nuisance parameters, while $\theta$ are the parameters of interest. Moreover, the subindex $i$ in $\pi_i$ refers to possible conditioning on strictly exogenous regressors and initial conditions.

Throughout the paper, we will assume that standard regularity conditions are satisfied (e.g., Severini (1999)). In particular, all likelihood and pseudo-likelihood functions as well as all priors will be three times differentiable. We will also assume that the prior is not dogmatic in the following sense.

ASSUMPTION 1: The support of $\pi_i(\alpha_i|\theta)$ contains an open neighborhood of the true parameters $(\alpha_{i0}, \theta_0)$.

The prior will generally depend on $T$. We assume that the order of magnitude of the logarithm of the prior is bounded as $T$ increases:

ASSUMPTION 2: When $T$ tends to infinity, we have, for all $\theta$ and $\alpha_i$, $\ln \pi_i(\alpha_i|\theta) = O(1)$ uniformly over $i$.$^8$

Concentrated Likelihood

Our analysis makes use of three different objective functions at the individual level. The first one is the concentrated or profile likelihood. It is defined as $\ell_i^c(\theta) = \ell_i(\theta, \hat{\alpha}_i(\theta))$, where the fixed effects estimates solve $\hat{\alpha}_i(\theta) = \arg\max_{\alpha_i} \ell_i(\theta, \alpha_i)$. Thus, the ML estimator solves $\hat{\theta}_{ML} = \arg\max_\theta \sum_{i=1}^{N} \ell_i^c(\theta)$. As is well known, $\hat{\theta}_{ML}$ is, in general, inconsistent for fixed $T$ as $N \to \infty$.

Integrated Likelihood

Bias-corrected estimators for $\theta$ based on the concentrated likelihood have been recently studied in the statistical and econometric literatures (Arellano and Hahn (2007)). In this paper, we study the behavior of the integrated likelihood with respect to a given prior $\pi_i(\alpha_i|\theta)$. The individual log integrated likelihood is given by
$$\ell_i^I(\theta) = \frac{1}{T} \ln \int \exp[T \ell_i(\theta, \alpha_i)]\, \pi_i(\alpha_i|\theta)\, d\alpha_i.$$
As noted by Berger, Liseo, and Wolpert (1999), this likelihood would be acceptable to a subjective Bayesian whose joint prior is separable in the individual effects; see (3). From this perspective, in this paper we implicitly assume a uniform prior on $\theta$: $\pi(\theta) \propto 1$.$^9$ Allowing for any nondogmatic prior on $\theta$ does not affect the analysis.

$^9$We write $a \propto b$ to denote that $a$ and $b$ are equal up to a multiplicative constant.
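To illustrate how $\ell_i^I(\theta)$ can be evaluated in practice, the following minimal sketch computes the integral for one unit by Gauss–Hermite quadrature. This is our own illustration, not code from the paper: the functions `loglik` and `prior`, and the centering and scaling of the quadrature nodes, are hypothetical placeholders that a user would supply.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def integrated_loglik(loglik, prior, theta, T, center=0.0, scale=1.0, n_nodes=30):
    """(1/T) * ln of the integral of exp[T * l_i(theta, a)] * pi_i(a | theta) da,
    by Gauss-Hermite quadrature after the change of variable a = center + scale * z."""
    z, w = hermegauss(n_nodes)            # nodes/weights for the weight exp(-z^2 / 2)
    a = center + scale * z                # quadrature points on the alpha axis
    # log of the integrand with the Gaussian quadrature weight divided out
    logvals = np.array([T * loglik(theta, ai) + np.log(prior(ai, theta)) for ai in a])
    logvals += 0.5 * z ** 2 + np.log(scale)
    m = logvals.max()                     # log-sum-exp for numerical stability
    return (m + np.log(np.dot(w, np.exp(logvals - m)))) / T

# Toy usage: a scaled logit log-likelihood for one unit and a flat prior.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T = 10
    x, y = rng.standard_normal(T), rng.integers(0, 2, T).astype(float)
    ll = lambda th, a: np.mean(y * -np.log1p(np.exp(-(th * x + a)))
                               + (1 - y) * -np.log1p(np.exp(th * x + a)))
    print(integrated_loglik(ll, lambda a, th: 1.0, theta=1.0, T=T))
```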
Target Likelihood

We shall compute the first-order bias of the integrated likelihood relative to a target likelihood without uncertainty about the value of the effects for given $\theta$. Let the target likelihood be $\bar{\ell}_i(\theta) = \ell_i(\theta, \bar{\alpha}_i(\theta))$, where $\bar{\alpha}_i(\theta) = \arg\max_{\alpha_i} \operatorname{plim}_{T\to\infty} \ell_i(\theta, \alpha_i)$. This function possesses many properties of a proper likelihood. In particular, it is maximized at $\theta_0$ and satisfies Bartlett identities (Severini (2000)). Note that the effects $\bar{\alpha}_i(\theta)$, and as such the likelihood $\bar{\ell}_i(\theta)$, are infeasible. The target likelihood provides a useful theoretical benchmark to compute first-order biases. It is a "least favorable" target likelihood in the sense that the expected information for $\theta$ calculated from $\bar{\ell}_i(\theta)$ coincides with the partial expected information.

The concentrated and target likelihood functions can be regarded as integrated likelihood functions with respect to the priors
$$\bar{\pi}_i(\alpha_i|\theta) = \delta\big(\alpha_i - \bar{\alpha}_i(\theta)\big) \quad\text{and}\quad \pi_i^c(\alpha_i|\theta) = \delta\big(\alpha_i - \hat{\alpha}_i(\theta)\big),$$
respectively. In this perspective, $\pi_i^c$ can be interpreted as a sample counterpart of $\bar{\pi}_i$. Below, we investigate the existence of nondegenerate feasible counterparts of $\bar{\pi}_i$ that, unlike $\pi_i^c$, reduce first-order bias.

Last, we denote the observed score with respect to the fixed effect as
$$v_i(\theta, \alpha_i) = \frac{\partial \ell_i(\theta, \alpha_i)}{\partial \alpha_i}$$
and denote its derivatives as
$$v_i^{\alpha_i}(\theta, \alpha_i) = \frac{\partial v_i(\theta, \alpha_i)}{\partial \alpha_i}, \qquad v_i^{\alpha_i \alpha_i}(\theta, \alpha_i) = \frac{\partial^2 v_i(\theta, \alpha_i)}{\partial \alpha_i^2}, \qquad v_i^{\theta}(\theta, \alpha_i) = \frac{\partial v_i(\theta, \alpha_i)}{\partial \theta},$$
etc.
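For readers who prefer a computational handle on these objects, the short sketch below (our own, with `loglik` a hypothetical stand-in for $\ell_i$) evaluates the score $v_i$ and the Hessian term $v_i^{\alpha_i}$ by finite differences; these are the ingredients of the robust priors constructed in Section 3.

```python
def score_v(loglik, theta, alpha, h=1e-5):
    """v_i(theta, alpha) = d l_i / d alpha, by a central finite difference."""
    return (loglik(theta, alpha + h) - loglik(theta, alpha - h)) / (2.0 * h)

def hessian_v_alpha(loglik, theta, alpha, h=1e-4):
    """v_i^{alpha_i}(theta, alpha) = d v_i / d alpha, i.e., the second
    derivative of l_i with respect to alpha."""
    return (loglik(theta, alpha + h) - 2.0 * loglik(theta, alpha)
            + loglik(theta, alpha - h)) / h ** 2
```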
2.2. Bias of the Integrated Likelihood

We now derive the expression of the first-order bias of the individual integrated likelihood relative to the target likelihood:
$$E_{\theta_0 \alpha_{i0}}\big[\ell_i^I(\theta) - \bar{\ell}_i(\theta)\big] = C^{st} + \frac{\beta_i(\theta)}{T} + O\!\left(\frac{1}{T^2}\right)$$
for a given prior $\pi_i(\alpha_i|\theta)$.$^{10}$ The expectation is taken with respect to $\exp[T \ell_i(\theta_0, \alpha_{i0})]$, so that a quantity like $E_{\theta_0 \alpha_{i0}}[\ell_i^I(\theta)]$ will depend on $\theta$, $\theta_0$, and $\alpha_{i0}$. We shall proceed in two steps.

$^{10}$Throughout the paper, we use $C^{st}$ to denote any constant term, which depending on the context may be scalar or vector-valued, and stochastic or nonstochastic.
In a first step, we use a Laplace approximation (e.g., Tierney, Kass, and Kadane (1989)) to link the integrated and the concentrated likelihood functions. The result is contained in the following lemma.

LEMMA 1: Let Assumptions 1 and 2 hold. Then
$$(4)\qquad E_{\theta_0 \alpha_{i0}}\big[\ell_i^I(\theta) - \ell_i^c(\theta)\big] = C^{st} - \frac{1}{2T} \ln E_{\theta_0 \alpha_{i0}}\big[-v_i^{\alpha_i}(\theta, \bar{\alpha}_i(\theta))\big] + \frac{1}{T} \ln \pi_i(\bar{\alpha}_i(\theta)|\theta) + O\!\left(\frac{1}{T^2}\right).$$

In a second step we use the formula that gives the first-order bias of the concentrated likelihood (e.g., Arellano and Hahn (2006, 2007)):
$$(5)\qquad E_{\theta_0 \alpha_{i0}}\big[\ell_i^c(\theta) - \bar{\ell}_i(\theta)\big] = \frac{1}{2T} \big\{E_{\theta_0 \alpha_{i0}}[-v_i^{\alpha_i}(\theta, \bar{\alpha}_i(\theta))]\big\}^{-1} E_{\theta_0 \alpha_{i0}}\big[T v_i^2(\theta, \bar{\alpha}_i(\theta))\big] + O\!\left(\frac{1}{T^2}\right).$$
The expression of the first-order bias of the integrated likelihood then follows directly.

THEOREM 1: Let Assumptions 1 and 2 hold. Then
$$E_{\theta_0 \alpha_{i0}}\big[\ell_i^I(\theta) - \bar{\ell}_i(\theta)\big] = C^{st} + \frac{\beta_i(\theta)}{T} + O\!\left(\frac{1}{T^2}\right),$$
where
$$(6)\qquad \beta_i(\theta) = \frac{1}{2} \big\{E_{\theta_0 \alpha_{i0}}[-v_i^{\alpha_i}(\theta, \bar{\alpha}_i(\theta))]\big\}^{-1} E_{\theta_0 \alpha_{i0}}\big[T v_i^2(\theta, \bar{\alpha}_i(\theta))\big] - \frac{1}{2} \ln E_{\theta_0 \alpha_{i0}}\big[-v_i^{\alpha_i}(\theta, \bar{\alpha}_i(\theta))\big] + \ln \pi_i(\bar{\alpha}_i(\theta)|\theta).$$
As the right-hand side of (6) is O(1), Theorem 1 shows that the effect of the prior vanishes as the amount of data increases. When T goes to infinity, the bias of the integrated likelihood goes to zero irrespective of the prior, provided that the latter is nondogmatic. In Section 4, we will see that this property is shared by random effects panel data models. However, it turns out that the prior has an effect on the first-order bias of the integrated likelihood as, in general, βi (θ) is not locally constant around θ0 .
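As a concrete illustration (a standard calculation that we add here; it is not part of the original argument), consider $y_{it} \sim \mathcal{N}(\alpha_{i0}, \sigma_0^2)$ with common parameter $\theta = \sigma^2$. Then $v_i(\sigma^2, \alpha_i) = \frac{1}{T\sigma^2} \sum_t (y_{it} - \alpha_i)$, so that $E_{\theta_0 \alpha_{i0}}[-v_i^{\alpha_i}] = 1/\sigma^2$ and $E_{\theta_0 \alpha_{i0}}[T v_i^2(\sigma^2, \bar{\alpha}_i(\sigma^2))] = \sigma_0^2/\sigma^4$. With a uniform prior, (6) gives
$$\beta_i(\sigma^2) = \frac{\sigma_0^2}{2\sigma^2} + \frac{1}{2} \ln \sigma^2 + C^{st},$$
which is minimized at $\sigma^2 = \sigma_0^2$, so that its derivative at the true value, the score bias studied in the next subsection, vanishes. This is consistent with $\alpha_i$ and $\sigma^2$ being information orthogonal in this model.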
2.3. Bias of the Integrated Score

We start with a definition of robust priors.

DEFINITION 1: Let $b_i(\theta_0) = \frac{\partial}{\partial\theta}\big|_{\theta_0} \beta_i(\theta)$ be the first-order bias of the integrated score evaluated at the true value. A prior family is said to be bias reducing, or robust, if and only if
$$b_\infty(\theta_0) \equiv \operatorname{plim}_{N\to\infty} \frac{1}{N} \sum_{i=1}^{N} b_i(\theta_0) = o(1).$$

Bias reduction of the moment equation implies bias reduction of the estimator (e.g., Arellano and Hahn (2006)). So, for a robust prior family, the mode of the integrated likelihood
$$\hat{\theta}_{IML} = \arg\max_\theta \sum_{i=1}^{N} \ell_i^I(\theta)$$
has zero first-order bias; that is,
$$\operatorname{plim}_{N\to\infty} \hat{\theta}_{IML} = \theta_0 + o\!\left(\frac{1}{T}\right).$$

We now use the results of the previous subsection to characterize robust priors. From Theorem 1 we can obtain the expression of the bias of the integrated score evaluated at the true value, $b_i(\theta_0)$. It is convenient, in the likelihood context, to use a simplification proposed by Pace and Salvan (2006). At the true value $\theta_0$, where the information matrix equality is satisfied, we have
$$(7)\qquad \frac{\partial}{\partial\theta}\bigg|_{\theta_0} \big\{E_{\theta_0 \alpha_{i0}}[-v_i^{\alpha_i}(\theta, \bar{\alpha}_i(\theta))]\big\}^{-1} E_{\theta_0 \alpha_{i0}}\big[T v_i^2(\theta, \bar{\alpha}_i(\theta))\big] = \frac{\partial}{\partial\theta}\bigg|_{\theta_0} \ln\Big(\big\{E_{\theta_0 \alpha_{i0}}[-v_i^{\alpha_i}(\theta, \bar{\alpha}_i(\theta))]\big\}^{-1} E_{\theta_0 \alpha_{i0}}\big[T v_i^2(\theta, \bar{\alpha}_i(\theta))\big]\Big).$$
The bias of the integrated score is thus given by
$$(8)\qquad b_i(\theta_0) = \frac{\partial}{\partial\theta}\bigg|_{\theta_0} \ln \pi_i(\bar{\alpha}_i(\theta)|\theta) - \frac{\partial}{\partial\theta}\bigg|_{\theta_0} \ln\Big(E_{\theta_0 \alpha_{i0}}\big[-v_i^{\alpha_i}(\theta, \bar{\alpha}_i(\theta))\big] \big\{E_{\theta_0 \alpha_{i0}}[T v_i^2(\theta, \bar{\alpha}_i(\theta))]\big\}^{-1/2}\Big).$$
Hence we get the following result:
THEOREM 2: A prior $\pi_i$ is bias reducing if
$$\frac{\partial}{\partial\theta}\bigg|_{\theta_0} \ln \pi_i(\bar{\alpha}_i(\theta)|\theta) = \frac{\partial}{\partial\theta}\bigg|_{\theta_0} \ln\Big(E_{\theta_0 \alpha_{i0}}\big[-v_i^{\alpha_i}(\theta, \bar{\alpha}_i(\theta))\big] \big\{E_{\theta_0 \alpha_{i0}}[T v_i^2(\theta, \bar{\alpha}_i(\theta))]\big\}^{-1/2}\Big) + O\!\left(\frac{1}{T}\right).$$

Theorem 2 gives a sufficient condition for bias reduction. The reason why the condition is not always necessary is that bias reduction might happen because of cross-sectional averaging, that is, $b_\infty(\theta_0)$ could be $O(1/T)$ even if some of the $b_i(\theta_0)$, $i = 1, \dots, N$, are not. However, the bias-reducing priors that we discuss in the next section will satisfy $b_i(\theta_0) = O(1/T)$ for all $i$.

2.4. Non-Distribution-Dependent Bias-Reducing Priors and Orthogonality

We turn to consider the role of information orthogonality. The next proposition shows the link between the ability of a prior to reduce bias and information orthogonality.

PROPOSITION 1: The following equality holds:
$$(9)\qquad b_i(\theta_0) = \frac{\partial}{\partial\theta}\bigg|_{\theta_0} \ln \pi_i(\bar{\alpha}_i(\theta)|\theta) + \frac{\partial}{\partial\alpha_i}\bigg|_{\alpha_{i0}} \rho_i(\theta_0, \alpha_i),$$
where
$$\rho_i(\theta, \alpha_i) \equiv \big\{E_{\theta\alpha_i}[-v_i^{\alpha_i}(\theta, \alpha_i)]\big\}^{-1} E_{\theta\alpha_i}\big[v_i^{\theta}(\theta, \alpha_i)\big].$$

Proposition 1 shows that the quantity $\rho_i(\theta, \alpha_i)$, the projection coefficient in the efficient score for $\theta$, is key in the ability of a given prior to reduce bias. A particular case is the information orthogonality studied by Cox and Reid (1987) and Lancaster (2002). In that case, the information matrix is block diagonal, so that $E_{\theta\alpha_i}[v_i^{\theta}(\theta, \alpha_i)]$ is identically zero. It follows from Proposition 1 that the uniform prior $\pi_i(\alpha_i|\theta) \propto 1$ is bias reducing. The same is true of all priors that are independent of $\theta$, in light of Proposition 1 and the fact that
$$(10)\qquad \frac{\partial \bar{\alpha}_i(\theta)}{\partial\theta}\bigg|_{\theta_0} = \rho_i(\theta_0, \alpha_{i0}),$$
which follows from applying the implicit function theorem to the first-order condition defining $\bar{\alpha}_i(\theta)$.
Conversely, Proposition 1 implies that the uniform prior reduces bias if and only if
$$(11)\qquad \operatorname{plim}_{N\to\infty} \frac{1}{N} \sum_{i=1}^{N} \frac{\partial}{\partial\alpha_i}\bigg|_{\alpha_{i0}} \rho_i(\theta_0, \alpha_i) = o(1).$$
Condition (11) is slightly more general than information orthogonality. For it to be satisfied, it suffices that $\rho_i(\theta, \alpha_i)$ is a function of $\theta$ only.

The uniform prior does not depend on the distribution of the data. That is, it is independent of the true parameters $\theta_0, \alpha_{10}, \dots, \alpha_{N0}$. We shall refer to the (infeasible) weighting schemes that depend on the true values of the parameters as distribution dependent. In particular, the uniform prior is not distribution dependent.

Other non-distribution-dependent priors are given by orthogonal reparameterizations of the fixed effects, when available. Let $\psi_i = \psi_i(\alpha_i, \theta)$ be a reparameterization of the individual effects. To every prior $\pi_i(\psi_i|\theta)$ on $\psi_i$ we can associate the transformed prior in the original parameterization:
$$\pi_i(\alpha_i|\theta) = \pi_i\big(\psi_i(\alpha_i, \theta)|\theta\big) \left|\frac{\partial \psi_i(\alpha_i, \theta)}{\partial \alpha_i}\right|.$$
The following result shows that the bias-reducing properties of a prior are not affected by a reparameterization of the effects.

PROPOSITION 2: $\pi_i$ is bias reducing in the transformed parameterization $\psi_i$ if and only if $\pi_i$ is bias reducing in the original parameterization $\alpha_i$.

We now apply Proposition 2 to a reparameterization $\psi_i = \psi_i(\alpha_i, \theta)$ such that $\psi_i$ and $\theta$ are information orthogonal in the sense of equation (11). In this case the uniform prior on $\psi_i$ is bias reducing. Hence, using Proposition 2, the transformed prior on $\alpha_i$,
$$\pi_i(\alpha_i|\theta) = \left|\frac{\partial \psi_i(\alpha_i, \theta)}{\partial \alpha_i}\right|,$$
is also bias reducing. Note that this prior is the Jacobian of the transformation which maps $(\alpha_i, \theta)$ onto $(\psi_i, \theta)$.

Conversely, any non-distribution-dependent bias-reducing prior $\pi_i(\alpha_i|\theta)$ can be associated with an orthogonal reparameterization in the sense of equation (11). It suffices to take $\psi_i = \psi_i(\alpha_i, \theta)$, where
$$\psi_i(\alpha_i, \theta) = \int_{-\infty}^{\alpha_i} \pi_i(\alpha|\theta)\, d\alpha.$$
This discussion shows that there exists a mapping between non-distribution-dependent bias-reducing priors and orthogonal reparameterizations in the sense of (11). Now, such reparameterizations do not always exist.
In the multiparameter case (when $\theta$ is a vector) one ends up with a partial differential equation which has no solution in general, in close analogy with the case of strict information orthogonality (Cox and Reid (1987)). Hence, to deal with the case where orthogonal reparameterizations are not available, it is, in general, necessary to search for robust priors that depend on the distribution of the data. We address this task in the next section.

3. CONSTRUCTIVE BIAS-REDUCING PRIORS

In this section we discuss two specific data-dependent priors that are bias reducing independently of the possibility of orthogonalization.

3.1. A Robust Prior

Theorem 2 shows that the following prior is bias reducing:
$$(12)\qquad \pi_i^R(\alpha_i|\theta) \propto \widehat{E}\big[-v_i^{\alpha_i}(\theta, \alpha_i)\big] \big\{\widehat{E}[v_i^2(\theta, \alpha_i)]\big\}^{-1/2},$$
where $\widehat{E}[-v_i^{\alpha_i}(\theta, \alpha_i)]$ and $\widehat{E}[v_i^2(\theta, \alpha_i)]$ are consistent estimates of $E_{\theta_0 \alpha_{i0}}[-v_i^{\alpha_i}(\theta, \alpha_i)]$ and $E_{\theta_0 \alpha_{i0}}[v_i^2(\theta, \alpha_i)]$, respectively, when $T$ tends to infinity. Note that replacing the expectations by large-$T$ consistent estimates in the condition of Theorem 2 does not affect the result.$^{11}$

The bias-reducing prior (12), which we call the robust prior, depends on the data. The discussion in the previous section has shown that non-data-dependent priors are generally not robust in cases when orthogonal reparameterizations of the fixed effects are not available.$^{12}$

Moreover, $\pi_i^R$ is the combination of a Hessian term ($\widehat{E}[-v_i^{\alpha_i}(\theta, \alpha_i)]$) and an outer product term ($\widehat{E}[v_i^2(\theta, \alpha_i)]$). A closely related expression appears in Jeffreys' automatic prior when $\theta$ is kept fixed, the expression of which is
$$(13)\qquad \pi_i^J(\alpha_i|\theta) \propto \big\{E_{\theta\alpha_i}[-v_i^{\alpha_i}(\theta, \alpha_i)]\big\}^{1/2}.$$
A crucial difference between $\pi_i^R(\alpha_i|\theta)$ and $\pi_i^J(\alpha_i|\theta)$ is that Jeffreys' prior does not depend on the data. In fact, Jeffreys' prior (13) is generally not bias reducing (see Hahn (2004)).

Before ending this discussion, note that we have assumed a likelihood setup, as opposed to a pseudo-likelihood setup. The likelihood assumption is required to obtain equation (7), which uses the information identity at true parameter values. In the pseudo-likelihood case, however, it is still possible to use Theorem 1 to obtain a robust weighting scheme for an integrated objective function. In effect, using the expression of the bias of the integrated likelihood (6), it is straightforward to show that the following prior is bias reducing in both likelihood and pseudo-likelihood settings:
$$(14)\qquad \big\{\widehat{E}[-v_i^{\alpha_i}(\theta, \alpha_i)]\big\}^{1/2} \exp\left(-\frac{T}{2} \big\{\widehat{E}[-v_i^{\alpha_i}(\theta, \alpha_i)]\big\}^{-1} \widehat{E}\big[v_i^2(\theta, \alpha_i)\big]\right).$$
Coming back to the likelihood setup, note that Proposition 1 shows that many other priors are robust. In particular, the two priors given by (12) and (14) are bias reducing. Using (14) instead of (12) for estimation can make a difference in finite samples. The Monte Carlo simulations reported below will illustrate this remark.

$^{11}$Thus, the problem of computing bias-reducing priors is analogous to the problem of estimating an additive bias correction to the concentrated likelihood. See, for example, Hahn and Kuersteiner (2004), Arellano and Hahn (2006, 2007), and Pace and Salvan (2006).

$^{12}$This result is in a similar spirit to one in Wasserman (2000), which showed that for certain mixture models, data-dependent priors are the only priors that produce intervals with second-order frequentist coverage.

3.2. Robust Reparameterizations

The following result provides an additional characterization of the robust prior.

PROPOSITION 3: We have
$$(15)\qquad \pi_i^R\big(\hat{\alpha}_i(\theta)|\theta\big) \propto \frac{1}{\sqrt{\widehat{Var}(\hat{\alpha}_i(\theta))}} \left(1 + O_p\!\left(\frac{1}{T}\right)\right),$$
where $T\, \widehat{Var}(\hat{\alpha}_i(\theta))$ is a consistent estimate of the asymptotic variance of $\sqrt{T}\big(\hat{\alpha}_i(\theta) - \bar{\alpha}_i(\theta)\big)$ when $T$ tends to infinity. In addition, every nondogmatic prior satisfying (15) is bias reducing.

Proposition 3 sheds some light on the properties of the robust prior. To see why, let us consider the reparameterization
$$(16)\qquad \psi_i(\alpha_i, \theta) = \frac{\alpha_i - \hat{\alpha}_i(\theta)}{\sqrt{\widehat{Var}(\hat{\alpha}_i(\theta))}}.$$
Reparameterizing the individual effects as in (16) amounts to rescaling the effects, weighting them in inverse proportion to the standard deviation of the fixed effects maximum likelihood estimator (MLE). Specifically, let us consider a prior on $\psi_i$ that is independent of $\theta$, with probability density function (p.d.f.) $f$. In terms of the original parameterization, the prior is$^{13}$
$$\pi_i^R(\alpha_i|\theta) = \frac{1}{\sqrt{\widehat{Var}(\hat{\alpha}_i(\theta))}}\, f\!\left(\frac{\alpha_i - \hat{\alpha}_i(\theta)}{\sqrt{\widehat{Var}(\hat{\alpha}_i(\theta))}}\right).$$

$^{13}$Note that $\pi_i^R$ does not satisfy Assumption 2. This does not matter for the present discussion, however, as shown by the proof of Proposition 3.
Then, clearly
$$\pi_i^R\big(\hat{\alpha}_i(\theta)|\theta\big) \propto \frac{1}{\sqrt{\widehat{Var}(\hat{\alpha}_i(\theta))}}.$$
It thus follows from Proposition 3 that $\pi_i^R$ is bias reducing.

For the particular choice of $\psi_i \sim \mathcal{N}(0, 1)$, we obtain the result that the normal approximation to the sampling distribution of the MLE $\hat{\alpha}_i(\theta)$ is a bias-reducing weighting scheme for $\alpha_i$:
$$(17)\qquad \alpha_i|\theta \sim \mathcal{N}\big(\hat{\alpha}_i(\theta),\, \widehat{Var}(\hat{\alpha}_i(\theta))\big).$$
Specifying a prior distribution on the fixed effects as in (17) is intuitively appealing from the point of view of bias reduction. First, unlike the robust prior ($\pi_i^R$), this prior is proper, so that it will unambiguously lead to a proper posterior. Second, it can be seen as a feasible counterpart of the (degenerate) prior associated to the target likelihood ($\bar{\pi}_i$). Unlike the prior associated with the concentrated likelihood ($\pi_i^c$), it takes into account the way the precision of $\hat{\alpha}_i(\theta)$ varies with $\theta$. When $\widehat{Var}(\hat{\alpha}_i(\theta))$ varies slowly with $\theta$, the uniform prior on the original effects is bias reducing. This happens when parameters are information orthogonal.

3.3. Asymptotic Distribution and Inference

Here we derive the asymptotic distribution of the integrated likelihood estimator and discuss how to perform inference from the posterior distribution of $\theta$.

Let $\ell_i^I(\theta)$ be associated with a bias-reducing prior. Let $\hat{\theta}_{IML} = \arg\max_\theta \sum_{i=1}^{N} \ell_i^I(\theta)$ be the mode of the integrated likelihood. We are interested in the asymptotic distribution of $\hat{\theta}_{IML}$ when $N$ and $T$ tend simultaneously to infinity at the same rate: $T/N \to C^{st} > 0$.

Let $\bar{\theta} = \arg\max_\theta \sum_{i=1}^{N} \bar{\ell}_i(\theta)$ be the (infeasible) mode of the target likelihood. Because the prior is bias reducing, we have
$$\hat{\theta}_{IML} = \bar{\theta} + o_p\!\left(\frac{1}{T}\right).$$
So, when $N$ and $T$ tend to infinity at the same rate,
$$\sqrt{NT}\big(\hat{\theta}_{IML} - \bar{\theta}\big) = o_p(1).$$
The mode of the integrated likelihood and the mode of the target likelihood are thus asymptotically equivalent. In particular, the asymptotic variance of $\sqrt{NT}(\hat{\theta}_{IML} - \theta_0)$ is equal to that of $\sqrt{NT}(\bar{\theta} - \theta_0)$. Now, $\bar{\theta}$ has the same asymptotic dispersion as the maximum likelihood estimator $\hat{\theta}_{ML}$. So, as in the case
of the additive approaches to bias reduction (Hahn and Newey (2004)), bias reduction occurs with no increase in the asymptotic variance relative to fixed effects maximum likelihood.

Given a robust weighting scheme, estimation based on the integrated likelihood can be performed using classical or Bayesian techniques. For this purpose, one can use integration routines (quadrature, Monte Carlo) to compute the integrated likelihood and then maximize the latter using optimization algorithms. This is the approach we have adopted in the Monte Carlo experiments reported below. However, in highly nonlinear models with possibly many parameters, this approach can be problematic. Our connection to Bayesian statistics makes it possible to use Bayesian techniques, such as Markov chain Monte Carlo (MCMC), to perform the estimation.

Moreover, an additional appealing feature of the simulation approach is the ability to read confidence intervals directly from the posterior distribution. Following Chernozhukov and Hong (2003), it can be shown that in a double asymptotics perspective when $N$ and $T$ tend to infinity at the same rate, the quantiles of the posterior distribution of $\theta$ provide asymptotically valid confidence intervals for $\theta_0$. Indeed, the marginal posterior of $\theta$ can be interpreted as a pseudo-posterior calculated from the integrated likelihood. Moreover, this objective function satisfies a generalized information equality in a double asymptotic sense.

4. RANDOM EFFECTS AND BIAS REDUCTION

In this section, we study the first-order bias properties of random effects maximum likelihood (REML) estimators.

4.1. The Random Effects Model

We assume that $\alpha_{i0}$, $i = 1, \dots, N$, are drawn from a distribution with density $\pi_0$ conditioned on covariates and initial observations. The marginal density of an observation is thus given by
$$f_i(y_{i1}, \dots, y_{iT}|y_{i0}, \theta_0, \pi_0) = \int \prod_{t=1}^{T} f\big(y_{it}|x_{it}, y_{i(t-1)}; \theta_0, \alpha_i\big)\, \pi_0(\alpha_i)\, d\alpha_i.$$
This model is very common in the panel data literature. Often $\pi_0$ is supposed to belong to a known parametric family such as the normal or a multinomial distribution with a finite number of mass points, possibly independent of covariates. In contrast, here we make no assumption about the functional form of $\pi_0$.

Let $\xi$ be a parameter and let $\pi_i(\alpha_i; \xi)$ be a family of prior distributions indexed by $\xi$. A typical example is when $\pi(\alpha_i; \xi)$ is a normal distribution with unknown mean and variance, $\xi = (m, s^2)$. Importantly, $\pi_i(\alpha_i; \xi)$ does not depend directly on the common parameter $\theta$ or on the cumulative distribution function (c.d.f.) of the distribution of the data (that is, on the true parameters $\theta_0, \alpha_{i0}$).
Nevertheless, we do allow $\pi_i$ to depend on conditioning covariates and/or initial conditions. For example, the mean and variance of the normal, $m$ and $s^2$, may be functions of covariates and/or initial conditions as in Chamberlain (1984).

The function $\pi_i(\alpha_i; \xi)$ has two possible interpretations. It can be regarded as a model for the population distribution of $\alpha_{i0}$; this is the "random effects" perspective. In a Bayesian perspective, it can also be seen as a hierarchical prior assuming independence between $\alpha_i$ and $\theta$. In both approaches, we are interested in the random effects pseudo-likelihood:
$$\ell_i^{RE}(\theta; \xi) = \frac{1}{T} \ln \int \exp[T \ell_i(\theta, \alpha_i)]\, \pi_i(\alpha_i; \xi)\, d\alpha_i,$$
which is the integrated likelihood with respect to the prior $\pi_i(\alpha_i; \xi)$.

4.2. Robust Random Effects

Here we study the existence of random effects specifications that are bias reducing for any population distribution of the individual effects $\pi_0$.$^{14}$ It is convenient to start by concentrating the likelihood with respect to $\xi$. Let
$$\hat{\xi}(\theta) = \arg\max_\xi \sum_{i=1}^{N} \ell_i^{RE}(\theta; \xi).$$
The score of the concentrated random effects likelihood is given by
$$\frac{1}{N} \sum_{i=1}^{N} \frac{\partial}{\partial\theta}\bigg|_{\theta_0} \ell_i^{RE}\big(\theta; \hat{\xi}(\theta)\big) = \frac{1}{N} \sum_{i=1}^{N} \frac{\partial}{\partial\theta}\bigg|_{\theta_0} \ell_i^{RE}\big(\theta; \hat{\xi}(\theta_0)\big),$$
where the equality comes from the envelope theorem. The bias of the score of the concentrated random effects likelihood is thus
$$(18)\qquad b_\infty(\theta_0) = \operatorname{plim}_{N\to\infty} \frac{1}{N} \sum_{i=1}^{N} \frac{\partial}{\partial\theta}\bigg|_{\theta_0} \ell_i^{RE}\big(\theta; \bar{\xi}(\theta_0)\big) = \operatorname{plim}_{N\to\infty} \frac{1}{N} \sum_{i=1}^{N} E_{\pi_0}\left[\frac{\partial}{\partial\theta}\bigg|_{\theta_0} \ell_i^{RE}\big(\theta; \bar{\xi}(\theta_0)\big)\right],$$
where $\bar{\xi}(\theta) = \operatorname{plim}_{N\to\infty} \hat{\xi}(\theta)$.

$^{14}$In general, $\pi_0$ is conditional on covariates and initial conditions, but for simplicity our notation does not make explicit that $\pi_0$ may be unit-specific.
The following result helps to interpret the pseudo true value $\bar{\xi}(\theta_0)$.

LEMMA 2: For all $\theta$, we have
$$(19)\qquad \operatorname{plim}_{N\to\infty} \frac{1}{N} \sum_{i=1}^{N} E_{\pi_0}\left[\frac{\partial \ln \pi_i\big(\bar{\alpha}_i(\theta); \bar{\xi}(\theta)\big)}{\partial \xi}\right] = O\!\left(\frac{1}{T}\right).$$

Lemma 2 provides a heuristic interpretation of $\bar{\xi}(\theta)$, up to a $O(1/T)$ term, as the pseudo true value of $\xi$ for the model $\pi_i(\cdot; \xi)$ and the "data" $\bar{\alpha}_1(\theta), \dots, \bar{\alpha}_N(\theta)$. Evaluated at $\theta = \theta_0$, equation (19) shows that $\pi_i(\cdot; \bar{\xi}(\theta_0))$ is the best approximation to $\pi_0$, in a Kullback–Leibler sense, in the family $\pi_i(\cdot; \xi)$. In the next subsection, we will see that the distance between $\pi_0$ and its best approximation also matters for bias reduction.

Equation (18) shows that the first-order bias properties of the random effects likelihood are the same as those of an integrated likelihood with prior $\pi_i(\alpha_i; \bar{\xi}(\theta_0))$. In particular, using Proposition 1 we obtain
$$(20)\qquad b_\infty(\theta_0) = \operatorname{plim}_{N\to\infty} \frac{1}{N} \sum_{i=1}^{N} E_{\pi_0}\left[\frac{\partial}{\partial\theta}\bigg|_{\theta_0} \ln \pi_i\big(\bar{\alpha}_i(\theta); \bar{\xi}(\theta_0)\big) + \frac{\partial}{\partial\alpha_i}\bigg|_{\alpha_{i0}} \rho_i(\theta_0, \alpha_i)\right].$$
So, using (20) together with equation (10) and rearranging, we find that REML is first-order bias reducing if and only if
$$(21)\qquad \operatorname{plim}_{N\to\infty} \frac{1}{N} \sum_{i=1}^{N} E_{\pi_0}\left[\frac{1}{\pi_i(\alpha_{i0}; \bar{\xi}(\theta_0))} \frac{\partial}{\partial\alpha_i}\bigg|_{\alpha_{i0}} \Big\{\pi_i\big(\alpha_i; \bar{\xi}(\theta_0)\big)\, \rho_i(\theta_0, \alpha_i)\Big\}\right] = o(1).$$
A first implication of (21) is that if the common parameters and the individual effects are information orthogonal, then every REML estimator is bias reducing. This is because in this case $\rho_i(\theta, \alpha)$ is identically zero. Another case where REML is bias reducing is when $\pi_0$ belongs to the parametric family $\pi_i(\cdot; \xi)$. Then the random effects model is correctly specified. So, under standard identification conditions, the REML estimator is fixed-$T$ consistent, hence bias reducing.

Moreover, equation (21) allows us to characterize the set of models for which a given random effects specification is bias reducing, as shown by the following theorem.

THEOREM 3: Let $\pi_i(\cdot; \xi)$ be a random effects specification depending on a $q$-dimensional vector of hyperparameters $\xi$. Then REML is bias reducing for all $\pi_0$ and covariate distributions if and only if there exists a constant $\dim(\theta) \times q$ matrix $\Gamma(\theta)$ such that
$$(22)\qquad \frac{\partial}{\partial\alpha}\bigg|_{\alpha_i} \Big\{\rho_i(\theta, \alpha)\, \pi_i\big(\alpha; \bar{\xi}(\theta)\big)\Big\} = \Gamma(\theta)\, \frac{\partial}{\partial\xi}\bigg|_{\bar{\xi}(\theta)} \pi_i(\alpha_i; \xi) + o(1).$$

Theorem 3 shows that for a given random effects family, the set of models where there is bias reduction is limited: it corresponds to $\rho_i$ being a linear combination of $q$ functions, where $q$ is the number of hyperparameters. As an important special case, we mention the following corollary.

COROLLARY 1 (Uncorrelated Random Effects): REML based on a location–scale family reduces first-order bias for all $\pi_0$ and covariate distributions if and only if there exist $\gamma_1(\theta)$ and $\gamma_2(\theta)$ such that
$$(23)\qquad \rho_i(\theta, \alpha_i) = \gamma_1(\theta) + \gamma_2(\theta)\alpha_i + o(1).$$

Corollary 1 gives a necessary and sufficient condition for REML based on a location–scale family to reduce bias. In the corollary, the mean and variance hyperparameters are independent of $x_i$. We also have the following result, where we let the mean depend linearly on $x_i$ (correlated random effects).$^{15}$

COROLLARY 2 (Correlated Random Effects): REML based on a location–scale family with mean depending linearly on $x_i$ reduces first-order bias for all $\pi_0$ and covariate distributions if and only if there exist $\gamma_1(\theta)$ and $\gamma_2(\theta)$ such that
$$(24)\qquad \rho_i(\theta, \alpha_i) = \gamma_1(\theta) x_i + \gamma_2(\theta)\alpha_i + o(1).$$

In particular, these results apply to Gaussian REML. Section 6 will give examples of models that satisfy conditions (23) or (24), such as dynamic AR(p) models with or without strictly exogenous regressors. In these models, the bias of REML based on the Gaussian family is of order $1/T^2$. Still, most models do not satisfy conditions (23) or (24). In those cases, the bias of the Gaussian REML estimator is of order $1/T$.

Corollaries 1 and 2 are interestingly related to the minimax finite-sample result obtained by Chamberlain and Moreira (2008). Using a very different perspective, our results also emphasize the importance of the model's linearity for Gaussian REML to have good properties.

$^{15}$As in Chamberlain's (1984) random effects probit, for example.
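To make the REML objective concrete, here is a minimal sketch of Gaussian REML in a static logit panel, with the prior $\mathcal{N}(m, s^2)$ and the hyperparameters estimated jointly with $\theta$. This is our own illustration under an entirely hypothetical data-generating design, not the authors' code; and, per Section 6.3 below, this uncorrelated specification is not robust in the logit model, so the sketch shows the mechanics rather than a recommended estimator.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit
from numpy.polynomial.hermite_e import hermegauss

rng = np.random.default_rng(0)                       # hypothetical design
N, T, theta0 = 100, 10, 1.0
x = rng.standard_normal((N, T))
alpha = x.mean(axis=1) + rng.standard_normal(N)      # effects correlated with x
y = (rng.random((N, T)) < expit(theta0 * x + alpha[:, None])).astype(float)

z, w = hermegauss(20)                                # Gauss-Hermite nodes/weights

def neg_reml(params):
    """Minus the sum over i of the log-integrated likelihood under N(m, s^2)."""
    theta, m, log_s = params
    a = m + np.exp(log_s) * z                        # prior nodes for alpha_i
    p = expit(theta * x[:, :, None] + a)             # (N, T, nodes)
    ll = (y[:, :, None] * np.log(p) + (1 - y[:, :, None]) * np.log(1 - p)).sum(axis=1)
    mx = ll.max(axis=1, keepdims=True)               # log-sum-exp over nodes
    lint = mx[:, 0] + np.log((np.exp(ll - mx) * w).sum(axis=1)) - np.log(w.sum())
    return -lint.sum()

fit = minimize(neg_reml, x0=np.array([0.5, 0.0, 0.0]), method="BFGS")
print("REML estimate of theta:", fit.x[0])
```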
4.3. Flexible Random Effects

In the previous subsection we asked the question: Given a random effects family of priors, what is the set of models in which REML is robust for any population distribution of the individual effects? In particular, we required bias reduction to hold even if the population distribution $\pi_0$ was very poorly approximated by the parametric family of prior distributions $\pi_i(\cdot; \xi)$. In contrast, here we ask: Is it possible to reduce the bias on $\theta$ by choosing a family of priors that approximates $\pi_0$ "sufficiently well"? Our motivation comes from the fact that in the absence of misspecification, that is, when $\pi_0$ belongs to the chosen family of prior distributions, the bias is zero.

To answer this question, it is convenient to define the objects
$$\xi_0 = \arg\max_\xi E_{\pi_0}\big(\ln \pi(\alpha_{i0}; \xi)\big) \quad\text{and}\quad \tilde{\pi}_0 \equiv \pi(\cdot; \xi_0).$$
$\xi_0$ is the infeasible ML estimand of $\xi$ for the "data" $\alpha_{10}, \dots, \alpha_{N0}$. So $\tilde{\pi}_0$ is the best approximation to $\pi_0$, in a Kullback–Leibler sense, in the family $\pi(\cdot; \xi)$. Note that both $\xi_0$ and $\tilde{\pi}_0$ are theoretical objects.$^{16}$ Note also that we have assumed for expositional simplicity that $\pi_i(\cdot; \xi) \equiv \pi(\cdot; \xi)$ does not depend on covariates. We come back to this point at the end of this subsection.

It is also convenient to define, for a density $p$,
$$K(\pi_0, p) = \left\{E_{\pi_0}\left[\left(\ln \frac{p(\alpha_{i0})}{\pi_0(\alpha_{i0})}\right)^2\right]\right\}^{1/2}.$$
$K(\pi_0, p)$ is the $L_2$ Kullback–Leibler loss. We will use it to measure how close the true $\pi_0$ and its best parametric approximation $\tilde{\pi}_0$ are.

Let
$$\hat{\theta}_{REML} = \arg\max_\theta \sum_{i=1}^{N} \ell_i^{RE}\big(\theta, \hat{\xi}(\theta)\big)$$
be the REML estimator and let $\bar{\theta} = \arg\max_\theta \sum_{i=1}^{N} \bar{\ell}_i(\theta)$ be the infeasible mode of the target likelihood. Unlike that of $\bar{\theta}$, the asymptotic distribution of $\hat{\theta}_{REML}$ is generally not centered at zero. The following theorem shows that the bias in the asymptotic distribution of $\hat{\theta}_{REML}$ depends on the discrepancy between the true density $\pi_0$ and its best fitting approximation $\tilde{\pi}_0$, as measured by the $L_2$ Kullback–Leibler loss. The theorem requires some conditions on the tails of $\pi_0$ that we detail in the Supplemental material, together with its proof.

THEOREM 4: Let $N$ and $T$ tend to infinity such that $N/T \to C^{st}$. Under suitable regularity conditions,
$$\sqrt{NT}\big(\hat{\theta}_{REML} - \theta_0\big) = \sqrt{NT}\big(\bar{\theta} - \theta_0\big) + O\big(K(\pi_0, \tilde{\pi}_0)\big) + o_p(1).$$

Theorem 4 shows that if the distance between $\pi_0$ and its best parametric approximation $\tilde{\pi}_0$ is $o(1)$, then the REML estimator is first-order unbiased and has the same asymptotic variance as the fixed effects estimator.

$^{16}$Note also that $\xi_0$ does not coincide with $\bar{\xi}(\theta_0)$, although due to (19) their difference is $O(1/T)$.
As a special case, Theorem 4 implies that $\hat{\theta}_{REML}$ and $\bar{\theta}$ are asymptotically equivalent if the model is correctly specified and $\pi_0$ belongs to the parametric family $\pi(\cdot; \xi)$. More interestingly, the result in Theorem 4 also suggests that for a flexible choice of $\pi(\cdot; \xi)$, one should be able to obtain asymptotically unbiased inference on $\theta$. The following result formalizes this intuition in the case of normal mixtures. For this purpose, we adopt the setup in Ghosal and van der Vaart (2001).

COROLLARY 3: Assume that $\pi_0$ can be expressed as a mixture of normals of the form
$$\pi_0(\alpha) = \int \frac{1}{\sigma}\, \varphi\!\left(\frac{\alpha - \mu}{\sigma}\right) dH_0(\mu, \sigma),$$
where $\sigma \in [\underline{\sigma}, \bar{\sigma}]$ belongs to a compact interval. Let $\pi$ be the p.d.f. of a finite mixture of $K$ normal components:
$$\pi(\alpha) = \sum_{k=1}^{K} p_k\, \frac{1}{\sigma_k}\, \varphi\!\left(\frac{\alpha - \mu_k}{\sigma_k}\right),$$
where $p_k \geq 0$, $\sum_{k=1}^{K} p_k = 1$, and $\mu_k \in [-A, A]$, with $A = O((\ln N)^\nu)$ for some $\nu > 0$. Assume also that there exists $\delta \in\, ]0, 1]$ such that$^{17}$
$$(25)\qquad \int_{\pi_0(\alpha)/\tilde{\pi}_0(\alpha) \geq e^{1/\delta}} \pi_0(\alpha) \left(\frac{\pi_0(\alpha)}{\tilde{\pi}_0(\alpha)}\right)^{\delta} d\alpha < \infty.$$
Then, for $K \geq C \ln N$ with $C$ large enough,
$$(26)\qquad K(\pi_0, \tilde{\pi}_0) = O\big(N^{-1/2+\gamma}\big) \quad\text{for any } \gamma > 0.$$
So when $N, T$ tend to infinity such that $N/T \to C^{st}$, then
$$(27)\qquad \sqrt{NT}\big(\hat{\theta}_{REML} - \theta_0\big) = \sqrt{NT}\big(\bar{\theta} - \theta_0\big) + o_p(1).$$

Corollary 3 shows that in the case where $\pi_0$ is a mixture of normals, the rate of convergence of the discrete sieve MLE is almost root-$N$ in (26). As noted by Ghosal and van der Vaart (2007), this near-parametric rate is driven by the assumptions on $\pi_0$.

$^{17}$Condition (25) imposes that the tails of $\tilde{\pi}_0$ are not too thin relative to those of $\pi_0$. We need this condition because Ghosal and van der Vaart (2001) bound the Hellinger distance between the two distributions (i.e., the $L_2$ distance of square roots), while we need to bound the $L_2$ Kullback–Leibler loss. A useful inequality between the two distances is given in Wong and Shen (1995). Also note that (25) is clearly satisfied if $\pi_0$ is compactly supported.

Working under much weaker assumptions, Ghosal
and van der Vaart (2007) found convergence rates of sieve MLEs that are close to the rate of nonparametric kernel estimators, $O(N^{-2/5})$. Applied to the case of finite mixtures of normals, their results imply that (27) holds for REML based on a normal mixture with a sufficiently large number of components, under much weaker assumptions on $\pi_0$. Indeed, for (27) to hold, we only need that $K(\pi_0, \tilde{\pi}_0) = o(1)$ and do not require a specific convergence rate.

Importantly, all results in this section are stated under the assumption that $\pi$ and $\pi_0$ do not depend on covariates, or that covariates are discrete and the analysis is conducted for specific values. If $\pi_0$ depends on more general $x$'s, then the statements of the theorem and corollary will still hold, provided that we let $\pi_i(\cdot; \xi)$ depend in an unrestricted way on $x_i$.

5. POLICY PARAMETERS: MARGINAL EFFECTS

5.1. Estimating Marginal Effects

In this section we study the bias properties of some estimators of averages over individual effects, such as average marginal effects. We consider quantities of the form
$$M = \frac{1}{N} \sum_{i=1}^{N} m_i(\theta_0, \alpha_{i0}).$$
A first example is the marginal effect of a covariate in a probit or logit model; for example, for probit, $m_i(\theta, \alpha_i) = \theta_k \frac{1}{T} \sum_{t=1}^{T} \varphi(x_{it}'\theta + \alpha_i)$, where $\varphi$ is the $\mathcal{N}(0, 1)$ density. Other examples are moments of the distribution of individual effects, $m_i(\theta, \alpha_i) = \alpha_i^k$.

A standard fixed effects estimator of $M$ is given by
$$\widehat{M}_{FE} = \frac{1}{N} \sum_{i=1}^{N} m_i\big(\hat{\theta}, \hat{\alpha}_i(\hat{\theta})\big),$$
where $\hat{\alpha}_i(\theta)$ is the MLE of $\alpha_i$ given $\theta$, and $\hat{\theta}$ is a possibly bias-reducing estimator of $\theta$. This estimator was studied by Hahn and Newey (2004). Whether $\hat{\theta}$ is bias corrected or not, $\widehat{M}_{FE}$ generally has a nonzero first-order bias term. Hahn and Newey suggested an approach to bias-correct the marginal effects also and obtained a bias of order $1/T^2$.

We consider two other estimators of $M$. In a random effects framework with family $\pi_i(\cdot; \xi)$, we may consider the standard random effects estimator given by
$$\widehat{M}_{RE} = \frac{1}{N} \sum_{i=1}^{N} \int m_i(\hat{\theta}, \alpha_i)\, \pi_i\big(\alpha_i; \hat{\xi}(\hat{\theta})\big)\, d\alpha_i,$$
where $\hat{\theta}$ is a large-$T$ consistent estimator of $\theta$, for example, the REML estimator, and $\hat{\xi}(\theta)$ is the MLE of $\xi$ given $\theta$.

More generally, assuming a family of prior distributions $\pi_i(\alpha_i|\theta)$, we can consider a Bayesian fixed effects (BFE) estimator of $M$ as
$$\widehat{M}_{BFE} = \int \cdots \int \frac{1}{N} \sum_{i=1}^{N} m_i(\theta, \alpha_i)\, p(\alpha_1, \dots, \alpha_N, \theta|y, x)\, d\alpha_1 \cdots d\alpha_N\, d\theta,$$
where $p$ is the posterior distribution of the model's parameters given the data. $\widehat{M}_{BFE}$ is the posterior mean of $\frac{1}{N} \sum_{i=1}^{N} m_i(\theta, \alpha_i)$. One could as well consider the posterior mode. As before, assuming a nonflat prior on $\theta$ does not affect the large-$T$ bias or the asymptotic distribution of the estimator.$^{18}$

$^{18}$In a random effects model, we could also consider another estimator, that one could refer to as Bayesian random effects, namely the posterior mean or mode of $\frac{1}{N} \sum_{i=1}^{N} \int m_i(\theta, \alpha_i)\, \pi_i(\alpha_i; \xi)\, d\alpha_i$. Using a Laplace approximation, it is easy to show that this estimator is asymptotically equivalent to $\widehat{M}_{RE}$ when $N$ and $T$ tend to infinity at the same rate.

5.2. Bayesian Fixed Effects Estimation

The following theorem gives the large-$T$ bias of the BFE estimator $\widehat{M}_{BFE}$.

THEOREM 5: When $T$ tends to infinity,
$$\operatorname{plim}_{N\to\infty}\big(\widehat{M}_{BFE} - M\big) = \frac{B_M}{T} + o\!\left(\frac{1}{T}\right),$$
where
$$B_M = \operatorname{plim}_{N\to\infty} \frac{1}{N} \sum_{i=1}^{N} \frac{\partial}{\partial\theta}\bigg|_{\theta_0} m_i(\theta, \alpha_{i0})\, B + \operatorname{plim}_{N\to\infty} \frac{1}{N} \sum_{i=1}^{N} \frac{1}{\pi_i(\alpha_{i0}|\theta_0)} \frac{\partial}{\partial\alpha}\bigg|_{\alpha_{i0}} \Big(\big\{E_{\theta_0 \alpha_{i0}}[-v_i^{\alpha_i}(\theta_0, \alpha)]\big\}^{-1} \pi_i(\alpha|\theta_0)\, m_i^{\alpha_i}(\theta_0, \alpha)\Big)$$
and $B$ is the first-order bias of the mode of the integrated likelihood (or, equivalently, of the posterior mean of $\theta$).

Theorem 5 shows that the BFE estimator of $M$ is large-$T$ consistent, independently of $\pi_i$, and gives an expression of the first-order bias.
It follows that taking a robust prior on $\alpha_i$ leads to first-order unbiasedness for $\theta$ ($B = 0$), but not for $M$ in general ($B_M \neq 0$). An exception where the two bias terms are zero occurs when $M = m(\theta_0)$ does not depend on the individual effects. So the properties of the BFE estimator are similar to those of the standard fixed effects estimator.

As in the case of common parameters $\theta$, one may look for priors on $\alpha_i$ that yield $B_M = 0$. If parameters are information orthogonal, the uniform prior is not bias reducing for $M$ if the marginal effect depends on individual effects. Instead, one may consider
$$(28)\qquad \pi_i^m(\alpha_i) = \frac{E_{\theta_0 \alpha_{i0}}\big[-v_i^{\alpha_i}(\theta_0, \alpha_i)\big]}{m_i^{\alpha_i}(\theta_0, \alpha_i)}.$$
Under information orthogonality, $\pi_i^m$ is bias reducing for both $\theta$ and $M$. In the general case, one can verify that the following prior is robust for $\theta$ and $M$ simultaneously:
$$(29)\qquad \pi_i^{Rm}(\alpha_i|\theta) = \frac{m_i^{\alpha_i}\big(\theta_0, \bar{\alpha}_i(\theta)\big)}{m_i^{\alpha_i}(\theta_0, \alpha_i)}\, E_{\theta_0 \alpha_{i0}}\big[-v_i^{\alpha_i}(\theta_0, \alpha_i)\big] \big\{E_{\theta_0 \alpha_{i0}}\big[v_i^2(\theta_0, \bar{\alpha}_i(\theta))\big]\big\}^{-1/2}.$$
As the robust priors considered in Section 3, $\pi_i^{Rm}$ depends on the distribution of the data.$^{19}$ However, $\pi_i^{Rm}$ also depends on $m_i$, and although it is not unique, there does not seem to be a way to find priors that are bias reducing for any marginal effect considered. So, in practice, one would need to estimate the model with different priors on $\alpha_i$ for the various marginal effects that one would consider.

In keeping with the discussion in Section 4, we now look for a flexible specification for $\pi_i$ that is bias reducing, independently of the marginal effect considered. For this purpose we use the setup of Section 4.3, and denote the population distribution of individual effects as $\pi_0$, the parametric random effects family as $\pi_i(\cdot; \xi)$, and the best fitting approximation as $\tilde{\pi}_0$. Then we have the following corollary to Theorem 5.

COROLLARY 4: Under suitable regularity conditions given in the Supplemental material,
$$\operatorname{plim}_{N\to\infty}\big(\widehat{M}_{BFE} - M\big) = O\!\left(\frac{K(\pi_0, \tilde{\pi}_0)}{T}\right) + o\!\left(\frac{1}{T}\right).$$

$^{19}$As such, $\pi_i^{Rm}$ and $\pi_i^m$ are infeasible. Feasible counterparts could be constructed as explained in Section 3.
Corollary 4 shows that the first-order bias of the BFE estimator of $M$ depends on the distance between the true $\pi_0$ and its parametric approximation $\tilde{\pi}_0$, as measured by the $L_2$ Kullback–Leibler loss. As in the case of Corollary 3, to eliminate first-order bias, one could choose $\pi_i(\cdot; \xi)$ to be the p.d.f. of a finite normal mixture with a sufficiently large number of components.

Finally, let us discuss inference when $N$ and $T$ tend to infinity at the same rate. Provided that one uses either a robust prior for $M$ or a flexible random effects specification, the asymptotic distribution of $\sqrt{NT}(\widehat{M}_{BFE} - M)$ is normal with zero mean and variance given by the large-$T$ inverse information matrix.$^{20}$ In addition, asymptotically valid confidence intervals can be read from the posterior distribution of the marginal effects, as in the case of common parameters.

5.3. Random Effects Estimation

Let us now turn to random effects estimation of marginal effects. The following theorem shows that $\widehat{M}_{RE}$ is generally inconsistent when $N$ and $T$ tend to infinity.

THEOREM 6: When $T$ tends to infinity,
$$\operatorname{plim}_{N\to\infty}\big(\widehat{M}_{RE} - M\big) = \operatorname{plim}_{N\to\infty} \frac{1}{N} \sum_{i=1}^{N} \int m_i(\theta_0, \alpha_i)\big(\tilde{\pi}_0(\alpha_i) - \pi_0(\alpha_i)\big)\, d\alpha_i + O\!\left(\frac{1}{T}\right).$$

In a random effects framework one can use either $\widehat{M}_{RE}$ or $\widehat{M}_{BFE}$ to estimate $M$. Theorem 5 showed that the BFE estimator of $M$ is large-$T$ consistent, independently of the priors postulated on the individual effects. In sharp contrast with this result, Theorem 6 shows that standard random effects estimators of $M$ are inconsistent in general. This happens because, in the estimation of $M$, $\widehat{M}_{BFE}$ updates the prior knowledge on the distribution of the fixed effects using the data, while $\widehat{M}_{RE}$ does not.$^{21}$

To summarize the results in this section, the comparison of Bayesian fixed effects and random effects estimators of marginal effects shows the benefits of updating by relying on the posterior distribution, as this reduces bias by an order of magnitude, from $O(1)$ to $O(1/T)$. Moreover, the magnitude of the bias of the Bayesian fixed effects estimator depends on how well the parametric distribution of priors approximates the population distribution of individual effects.

$^{20}$Note that if we are interested instead in inference about the plim of $M$, then (unless $m_i$ is independent of $\alpha_i$) the confidence intervals would be of order $1/\sqrt{N}$ as opposed to $1/\sqrt{NT}$. This is because when $N$ and $T$ grow at the same rate, the sampling error due to the averaging over cross-sectional units dominates.

$^{21}$Under suitable tail assumptions it can be shown that the bias of $\widehat{M}_{RE}$ is $O(K(\pi_0, \tilde{\pi}_0))$. However, using a flexible parametric family to reduce the bias would increase the asymptotic variance of the estimator, because $\tilde{\pi}_0$ appears in the first term of the expansion of $\widehat{M}_{RE}$.
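The following toy sketch (our own, not the authors' code) makes the contrast concrete in a static logit: for a fixed, consistent $\theta$ and a $\mathcal{N}(0, 1)$ prior, it computes the prior-weighted average of $m_i$, in the spirit of $\widehat{M}_{RE}$ with the hyperparameters frozen, and the posterior-weighted ("updated") average that underlies $\widehat{M}_{BFE}$. All design choices here are assumptions.

```python
import numpy as np
from scipy.special import expit
from numpy.polynomial.hermite_e import hermegauss

rng = np.random.default_rng(2)
N, T, theta = 100, 10, 1.0
x = rng.standard_normal((N, T))
alpha0 = rng.standard_normal(N) + 0.5            # true effects, off-center on purpose
y = (rng.random((N, T)) < expit(theta * x + alpha0[:, None])).astype(float)

z, w = hermegauss(30)                            # nodes for the N(0,1) prior

def m_i(i, a):
    """Logit average marginal effect for unit i at effect values a (vector)."""
    p = expit(theta * x[i][:, None] + a)         # (T, nodes)
    return theta * (p * (1 - p)).mean(axis=0)

M_re, M_bfe = 0.0, 0.0
for i in range(N):
    p = expit(theta * x[i][:, None] + z)
    lik = np.exp((y[i][:, None] * np.log(p)
                  + (1 - y[i][:, None]) * np.log(1 - p)).sum(axis=0))
    mi = m_i(i, z)
    M_re += np.dot(w, mi) / w.sum()              # prior-weighted average
    M_bfe += np.dot(w * lik, mi) / np.dot(w, lik)  # posterior-weighted average
print("prior-weighted:", M_re / N, "posterior-weighted:", M_bfe / N)
```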
6. EXAMPLES

In this section and the next we consider two specific examples: a dynamic AR(p) model and a static logit model. Derivations and an additional example concerning a Poisson counts model are available in Section S2 of the Supplemental material.

6.1. Dynamic AR(p)

The model we consider is given by
$$y_{it} = \mu_{10} y_{it-1} + \cdots + \mu_{p0} y_{it-p} + \alpha_{i0} + \varepsilon_{it} \qquad (i = 1, \dots, N;\ t = 1, \dots, T).$$
Let $y_i^0 = (y_{i,1-p}, \dots, y_{i0})'$ be the vector of initial conditions that we assume is observed. Observations are i.i.d. across $i$. Moreover, it is assumed that
$$(\varepsilon_{i1}, \dots, \varepsilon_{iT})'\,\big|\,\alpha_{i0}, y_i^0 \sim \mathcal{N}\big(0, \sigma_0^2 I_T\big),$$
where $I_T$ is the identity matrix of order $T$. For this model there exist likelihood-based fixed-$T$ consistent estimators (see, for example, Alvarez and Arellano (2004)), which can provide a useful benchmark for the application of our general methods. Another interesting aspect of this illustration is that, as we argue below, an orthogonal reparameterization is available for the first-order process, but not for models with $p > 1$.

The individual log-likelihood is given by
$$\ell_i(\mu, \sigma^2, \alpha_i) = \frac{1}{T} \ln f\big(y_i|y_i^0, \alpha_i; \mu, \sigma^2\big) = -\frac{1}{2} \ln(2\pi) - \frac{1}{2} \ln(\sigma^2) - \frac{1}{2T} \sum_{t=1}^{T} \frac{(y_{it} - x_{it}'\mu - \alpha_i)^2}{\sigma^2},$$
where $x_{it} = (y_{it-1}, \dots, y_{it-p})'$ and $\mu = (\mu_1, \dots, \mu_p)'$. We show in the Supplemental material that a robust prior can be chosen as a large-$T$ consistent estimate of the infeasible quantity
$$\pi_i^{IR}(\alpha_i|\mu, \sigma^2) \propto \big\{1 + a(\mu - \mu_0) + b_i(\mu - \mu_0, \alpha_i - \alpha_{i0})\big\}^{-1/2},$$
where $a(\cdot)$ and $b_i(\cdot, \cdot)$ are linear and quadratic functions, respectively, the coefficients of which depend on true parameter values and initial conditions. More precisely, $a \equiv a(\mu_0)$ is a function of $\mu_0$ only, while $b_i \equiv b(\mu_0, \alpha_{i0}, y_i^0)$ depends on true values and initial conditions.

The quadratic term $b_i(\mu - \mu_0, \alpha_i - \alpha_{i0})$ has no effect on the bias. Indeed, it could be replaced by any other quadratic function in the differences $\mu - \mu_0$ and $\alpha_i - \alpha_{i0}$. Removing the quadratic terms, we may consider
$$(30)\qquad \pi^{IR}(\alpha_i|\mu, \sigma^2) \propto \big\{1 + a(\mu - \mu_0)\big\}^{-1/2}.$$
The prior $\pi^{IR}$ is also bias reducing. Note that as $a(\mu - \mu_0)$ is linear, the function $\pi^{IR}(\alpha_i|\mu, \sigma^2)$ is degenerate for some values of $\mu$. When estimating the prior in practice, this degeneracy can be a problem. It can then make sense to use the alternative expression (14) for the robust prior and consider instead
$$(31)\qquad \pi^{IR}(\alpha_i|\mu, \sigma^2) \propto \exp\left\{-\frac{1}{2}\, a(\mu - \mu_0)\right\}.$$
Now, the priors given by (30) and (31) are distribution dependent because $a$ depends on $\mu_0$. Looking for a non-distribution-dependent prior requires solving
$$(32)\qquad \frac{\partial}{\partial\mu}\bigg|_{\mu_0} \ln \pi\big(\bar{\alpha}_i(\mu, \sigma^2)|\mu, \sigma^2\big) = \frac{\partial}{\partial\mu}\bigg|_{\mu_0} \ln \big\{1 + a(\mu - \mu_0)\big\}^{-1/2}$$
for some function $\pi$ independent of $(\mu_0, \sigma_0^2, \alpha_{i0})$. In the AR(1) case, we show in the Supplemental material that
$$\frac{\partial}{\partial\mu}\bigg|_{\mu_0} \ln \big\{1 + a(\mu - \mu_0)\big\}^{-1/2} = \frac{1}{T} \sum_{t=1}^{T-1} (T - t)\, \mu_{10}^{t-1}.$$
In this case, equation (32) admits solutions independent of true parameter values. For example, the following choice works:
$$(33)\qquad \pi(\alpha_i|\mu, \sigma^2) = \exp\left\{\frac{1}{T} \sum_{t=1}^{T-1} \frac{T - t}{t}\, \mu^t\right\}.$$
This is the prior found by Lancaster (2002) in terms of the original (non-information-orthogonal) parameterization. Note that this property is specific to the AR(1) case. In the AR(p) model, $p > 1$, a non-data-dependent bias-reducing prior generally does not exist. At the end of this section, the existence of bias-reducing data-dependent priors for the AR(p) model that are independent of the common parameters is discussed in the context of random effects estimation.
6.2. Static Logit

We now consider the model
$$y_{it} = 1\{x_{it}'\theta_0 + \alpha_{i0} + \varepsilon_{it} > 0\} \qquad (i = 1, \dots, N;\ t = 1, \dots, T),$$
where the $x$'s are known, and $\varepsilon_{it}$ are i.i.d. and drawn from the logistic distribution with c.d.f. $\Lambda$. The individual log-likelihood is given by
$$\ell_i(\theta, \alpha_i) = \frac{1}{T} \sum_{t=1}^{T} \Big\{y_{it} \ln \Lambda(x_{it}'\theta + \alpha_i) + (1 - y_{it}) \ln\big[1 - \Lambda(x_{it}'\theta + \alpha_i)\big]\Big\}.$$
In the Supplemental material we derive the expression of a robust prior as a consistent estimate of
$$(34)\qquad \pi_i^{IR}(\alpha_i|\theta) \propto \left\{\sum_{t=1}^{T} E_{\theta_0 \alpha_{i0}}\big([y_{it} - \Lambda(x_{it}'\theta + \alpha_i)]^2\big)\right\}^{-1/2} \times \sum_{t=1}^{T} \Lambda(x_{it}'\theta + \alpha_i)\big[1 - \Lambda(x_{it}'\theta + \alpha_i)\big].$$
As shown in Lancaster (2000), there also exists an orthogonal reparameterization in this model. Let
$$\psi_i = \sum_{t=1}^{T} \Lambda(x_{it}'\theta + \alpha_i).$$
Then $\psi_i$ and $\theta$ are information orthogonal. The uniform prior on $\psi_i$ is thus bias reducing. The corresponding prior on the original individual effects is
$$(35)\qquad \pi_i(\alpha_i|\theta) \propto \sum_{t=1}^{T} \Lambda(x_{it}'\theta + \alpha_i)\big[1 - \Lambda(x_{it}'\theta + \alpha_i)\big].$$
Note that in this case, Jeffreys' prior is given by $\pi_i^J(\alpha_i|\theta) \propto \{\pi_i(\alpha_i|\theta)\}^{1/2}$. It is readily verified that $\pi_i^J$ is not bias reducing. On the other hand, both $\pi_i^{IR}$ and $\pi_i$ reduce bias. In practice, one can thus compute the robust prior
$$(36)\qquad \pi_i^R(\alpha_i|\theta) \propto \left\{\sum_{t=1}^{T} \big(y_{it} - \Lambda(x_{it}'\theta + \alpha_i)\big)^2\right\}^{-1/2} \times \sum_{t=1}^{T} \Lambda(x_{it}'\theta + \alpha_i)\big[1 - \Lambda(x_{it}'\theta + \alpha_i)\big].$$
One can also use expected quantities and compute
$$(37)\qquad \pi_i^R(\alpha_i|\theta) \propto \left\{\sum_{t=1}^{T} \Lambda(x_{it}'\hat{\theta} + \hat{\alpha}_i)\big[1 - 2\Lambda(x_{it}'\theta + \alpha_i)\big] + \big[\Lambda(x_{it}'\theta + \alpha_i)\big]^2\right\}^{-1/2} \times \sum_{t=1}^{T} \Lambda(x_{it}'\theta + \alpha_i)\big[1 - \Lambda(x_{it}'\theta + \alpha_i)\big],$$
where $\hat{\theta}$ and $\hat{\alpha}_i$ are consistent estimates of the true parameters when $T$ tends to infinity (for example, maximum likelihood estimates).

6.3. Random Effects

We study the properties of random effects maximum likelihood (REML) estimators in the previous examples.

Dynamic AR(p)

We start with the dynamic AR(p) model of Section 6.1. We show in the Supplemental material that, for this model,
$$\rho_i(\mu, \sigma^2, \alpha_i) = a_0(\mu)\, y_i^0 + a_1(\mu)\, \alpha_i,$$
where $y_i^0$ is the vector of initial conditions, and $a_0(\mu)$ and $a_1(\mu)$ are matrices. Moreover, if the process is stationary, then $a_0(\mu) = O(1/T)$. Hence, it follows from Corollary 1 that uncorrelated Gaussian REML is bias reducing for this model. This result was proven by Cho, Hahn, and Kuersteiner (2004) in the case $p = 1$. If strictly exogenous covariates are included in the model, then it is easy to check that correlated Gaussian REML is robust, while uncorrelated REML is not, in general.

Linear Model With One Endogenous Regressor and Many Instruments

A closely related example is the following linear model with one endogenous regressor in a panel context$^{22}$:
$$y_{it} = \theta \alpha_i + u_{it}, \qquad x_{it} = \alpha_i + v_{it},$$

$^{22}$We are grateful to Jinyong Hahn for this suggestion.
where errors are i.i.d. and
$$\begin{pmatrix} u_{it} \\ v_{it} \end{pmatrix} \sim \mathcal{N}(0, \Omega).$$
We assume that the covariance matrix $\Omega$ is given. We let
$$\Omega^{-1} = \begin{pmatrix} \omega_{11} & \omega_{12} \\ \omega_{21} & \omega_{22} \end{pmatrix}.$$
In this example there is an analogy between having a large number of individual effects and a large number of instruments in a simultaneous equations perspective (see Hahn (2000)). We show in the Supplemental material that
$$\rho_i(\theta, \alpha_i) = \alpha_i\, \frac{-\omega_{11}\theta - \omega_{12}}{\omega_{11}\theta^2 + 2\omega_{12}\theta + \omega_{22}}.$$
We are thus in the case of Corollary 1 and Gaussian REML is bias reducing. A related situation arises in Chamberlain and Imbens' (2004) use of REQML under Bekker's (1994) asymptotics. Our treatment of this example shows that the linearity of the model is crucial for the success of random effects methods.

Static Logit

In the case of the static logit model, we have that
$$\rho_i(\theta, \alpha_i) = -\frac{\displaystyle\sum_{t=1}^{T} \Lambda(x_{it}'\theta + \alpha_i)\big(1 - \Lambda(x_{it}'\theta + \alpha_i)\big)\, x_{it}}{\displaystyle\sum_{t=1}^{T} \Lambda(x_{it}'\theta + \alpha_i)\big(1 - \Lambda(x_{it}'\theta + \alpha_i)\big)}.$$
This is a highly nonlinear expression in $\alpha_i$, $\theta$, and $x_i = (x_{i1}, \dots, x_{iT})'$. Thus, usual REML estimators are not bias reducing. For example, Corollary 1 shows that uncorrelated Gaussian REML is not robust. Note that this lack of unbiasedness is not corrected for by allowing the prior to depend on the covariates $x_{it}$, as in Chamberlain's (1984) probit model. In that case, it is still impossible to correct for the first-order bias without permitting the prior to depend on the common parameters $\theta$. In nonlinear models, thus, the success of random effects likelihood inference depends critically on prior knowledge about the form of the fixed effects.

7. MONTE CARLO SIMULATION

In this section, we provide some Monte Carlo evidence on the finite-sample behavior of integrated likelihood estimators.
7.1. Static Logit

We first focus on the static logit model:
$$(38)\qquad y_{it} = 1\{x_{it}\theta_0 + \alpha_{i0} + \varepsilon_{it} > 0\} \qquad (i = 1, \dots, N;\ t = 1, \dots, T).$$
The $x_{it}$ are constant across simulations and drawn from a $\mathcal{N}(0, 1)$ distribution. The individual effects are drawn in each simulation from $\mathcal{N}(\bar{x}_i, 1)$, where $\bar{x}_i = \frac{1}{T} \sum_{t=1}^{T} x_{it}$. Last, $\varepsilon_{it}$ are i.i.d. draws from the logistic c.d.f. and $\theta_0$ is set to 1. In all the experiments $N$ is 100.

Table I shows some statistics of the empirical distribution of 100 draws of $\hat{\theta}$, where $\hat{\theta}$ can be one of the following estimators: "Uncorrected" refers to the MLE and "Corrected" refers to the corrected MLE obtained using the DiCiccio and Stern (1993) adjustment based on equation (5) (see Arellano and Hahn (2007, p. 392)); "Uniform" is the integrated likelihood estimator with uniform prior $\pi_i \propto 1$; "Lancaster" is the integrated likelihood with the uniform prior on the orthogonal parameters written in terms of the original effects (see equation (35)); "Robust, observed" refers to the integrated likelihood with the robust prior constructed from observed quantities (see (36)), while "Robust, infeasible" refers to the integrated likelihood with the robust prior estimated using expected quantities where the true parameter $\theta_0$ is assumed known (see (37)); "Robust, iterated 1" refers to the same estimator, but when the expectation in (37) is evaluated at $\hat{\theta}$, the "Robust" integrated likelihood estimator; then, "Robust, iterated ∞" is obtained by iterating this procedure until convergence; "Random effects" is the Gaussian random effects estimator; "Conditional logit" is Chamberlain's (1980) conditional logit.$^{23}$

Table I shows that the bias of the MLE can be large: it is equal to 33% for $T = 5$ and still 6% for $T = 20$. The corrections based on the concentrated likelihood and the various integrated likelihoods give roughly the same results. In all cases considered, using one of these corrections reduces the bias by a factor of between 2 and 3. The best performance, in terms of bias, mean squared error (MSE), and mean absolute error (MAE), is achieved by Lancaster's (1998) integrated likelihood given by equation (35). Note that the infeasible estimator based on (37) and the iterated corrections do not give better results than the correction based on observed quantities.

The Gaussian random effects MLE gives rather good results. Our experiments (not reported) showed that the relative performance of REML worsens when the correlation between $\alpha_{i0}$ and $\bar{x}_i$ increases, and when the sampling distribution of the individual effects departs from the normal. Last, the conditional logit estimator is consistent for fixed $T$. Still, note that several corrected/integrated estimators yield MSE and MAE comparable to, or lower than, those of conditional logit for $T = 10$ and $T = 20$.

$^{23}$Both the random effects and conditional logit estimators were computed using the STATA xtlogit and clogit commands, respectively. The other estimators were computed using GAUSS.
TABLE I
VARIOUS ESTIMATORS OF θ IN THE STATIC LOGIT MODEL^a

                         Mean    Median   STD     q.05    q.10    MSE      MAE
T = 5
Uncorrected              1.33    1.30     .235    .929    1.08    .163     .335
Corrected                1.12    1.08     .188    .838    .868    .0489    .170
Uniform                  1.61    1.62     .260    1.22    1.29    .442     .613
Lancaster                1.06    1.05     .150    .800    .843    .0260    .126
Robust, observed         1.11    1.09     .199    .821    .867    .0523    .176
Robust, infeasible       1.18    1.17     .146    .950    .963    .0530    .193
Robust, iterated 1       1.13    1.14     .184    .878    .914    .0504    .172
Robust, iterated ∞       1.23    1.22     .195    1.01    1.03    .0907    .236
Random effects           1.14    1.13     .163    .854    .905    .0418    .178
Conditional logit        .997    .983     .172    .749    .793    .0283    .138

T = 10
Uncorrected              1.13    1.13     .117    .950    .994    .0296    .140
Corrected                1.06    1.05     .0975   .902    .927    .0136    .0943
Uniform                  1.26    1.26     .147    1.05    1.06    .0893    .263
Lancaster                1.02    1.03     .0911   .880    .899    .00880   .0790
Robust, observed         1.05    1.05     .109    .884    .909    .0145    .0974
Robust, infeasible       1.07    1.06     .100    .895    .933    .0142    .0946
Robust, iterated 1       1.04    1.04     .0892   .918    .932    .00976   .0785
Robust, iterated ∞       1.08    1.06     .0896   .939    .970    .0139    .0938
Random effects           1.03    1.03     .0986   .865    .906    .00848   .0832
Conditional logit        .997    .998     .0961   .859    .884    .0105    .0754

T = 20
Uncorrected              1.06    1.06     .0683   .947    .971    .00826   .0757
Corrected                1.02    1.03     .0606   .912    .946    .00424   .0530
Uniform                  1.12    1.11     .0683   .990    1.03    .0184    .119
Lancaster                .997    .997     .0548   .900    .921    .00298   .0429
Robust, observed         1.01    1.00     .0702   .905    .929    .00500   .0527
Robust, infeasible       1.04    1.04     .0613   .923    .955    .00558   .0629
Robust, iterated 1       1.01    1.00     .0673   .885    .934    .00459   .0536
Robust, iterated ∞       1.02    1.02     .0688   .893    .948    .00525   .0567
Random effects           1.02    1.01     .0664   .920    .940    .00579   .0523
Conditional logit        1.01    .995     .0682   .905    .920    .00492   .0535

^a Estimates of θ in model (38); N = 100 simulations; θ0 = 1.
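As a rough sketch of the design behind Table I (our own hypothetical code; the original computations used GAUSS and STATA), the following generates one panel from (38) and maximizes the integrated likelihood under the robust prior (36), with quadrature nodes on a fixed grid. In practice one would center the nodes at $\hat{\alpha}_i(\theta)$ rather than at zero.

```python
import numpy as np
from scipy.special import expit
from scipy.optimize import minimize_scalar
from numpy.polynomial.hermite_e import hermegauss

rng = np.random.default_rng(1)
N, T, theta0 = 100, 10, 1.0
x = rng.standard_normal((N, T))
alpha0 = x.mean(axis=1) + rng.standard_normal(N)       # alpha_i0 ~ N(xbar_i, 1)
y = (rng.random((N, T)) < expit(theta0 * x + alpha0[:, None])).astype(float)

z, w = hermegauss(20)                                  # fixed node grid (assumption)

def neg_integrated_loglik(theta):
    total = 0.0
    for i in range(N):
        p = expit(theta * x[i][:, None] + z)           # (T, nodes)
        ll = (y[i][:, None] * np.log(p)
              + (1 - y[i][:, None]) * np.log(1 - p)).sum(axis=0)
        s = ((y[i][:, None] - p) ** 2).sum(axis=0)     # outer-product term of (36)
        h = (p * (1 - p)).sum(axis=0)                  # Hessian term of (36)
        vals = ll + np.log(h) - 0.5 * np.log(s) + 0.5 * z ** 2
        m = vals.max()                                 # log-sum-exp over nodes
        total += m + np.log(np.dot(w, np.exp(vals - m)))
    return -total

fit = minimize_scalar(neg_integrated_loglik, bounds=(0.1, 3.0), method="bounded")
print("robust integrated likelihood estimate of theta:", fit.x)
```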
This suggests that for intermediate values of $T$, it may not be obvious to choose a fixed-$T$ consistent estimator rather than bias-corrected alternatives. Hahn, Kuersteiner, and Newey (2004) showed that bias-corrected estimators are second-order efficient. Clearly, under suitable regularity conditions, our robust integrated likelihood estimator falls into the class considered by these authors.$^{24}$ In contrast, there is a potential efficiency loss in conditioning on the sufficient statistic in the conditional logit model.

$^{24}$A second-order Laplace approximation of the integrated likelihood (as in Tierney, Kass, and Kadane (1989)) is necessary to prove this result formally.
FIGURE 1.—Likelihood functions in the static logit model (T = 10, N = 100, θ0 = 1). The thin line represents the likelihood function, the thick line represents the bias-corrected likelihood using DiCiccio and Stern (1993), and the dashed line represents the robust integrated likelihood.
Finally, in Figure 1 we draw the likelihood function of the static logit model (thin line). The thick line and the dashed line show the bias-corrected likelihood function (using the DiCiccio and Stern formula) and the robust integrated likelihood. The two pseudo-likelihoods are concave. Moreover, it is clear from the figure that they both correct bias with respect to the MLE.

7.2. Dynamic AR(1)

Next, we consider the dynamic AR(1) model:
$$(39)\qquad y_{it} = \mu_{10} y_{it-1} + \alpha_{i0} + \varepsilon_{it} \qquad (i = 1, \dots, N;\ t = 1, \dots, T).$$
Individual effects are drawn in each simulation from a standard normal distribution. Moreover, the initial condition $y_{i0}$ is drawn from the stationary distribution of $y_{it}$ for fixed $i$. Last, $\varepsilon_{it}$ are i.i.d. standard normal draws and $\mu_{10}$ is set to .5. As before, $N$ is 100. The standard deviation of the errors, set to 1, is treated as known.

With non-i.i.d. data, the choice of local approximation in the formulas for prior distributions may be important, as illustrated in Figure 2.
FIGURE 2.—Likelihood functions in the dynamic AR(1) model (one simulation, T = 10, N = 100, μ10 = 5). The thin line represents the likelihood function and the thick line represents the robust integrated likelihood. Left: Prior based on equation (12). Right: Prior based on equation (14).
The left panel in Figure 2 shows the likelihood function of the dynamic AR(1) model (thin line). The thick line shows the integrated likelihood with prior given by the formula (30), obtained using expected quantities. The function is degenerate around $\mu_1 = .8$. Moreover, a close look at the figure shows two local extrema. The local maximum corresponds to $\mu_1$ around .5, which means that inference from this local maximum is bias reducing. Still, the flatness of the curve suggests that one might have trouble trying to find this maximum using standard maximization algorithms. This problem is likely to be worse in situations with more parameters to consider.

The right panel of the same figure shows the integrated likelihood for the prior (31). The situation there is strikingly different, as the pseudo-likelihood is nicely concave. Moreover, its maximum is still much closer to the truth than the MLE. In the rest of this section, we use the prior (31) to estimate common parameters.

Table II shows some statistics of the empirical distributions of several estimators for $T = 10$: the MLE ("Uncorrected") and corrections based on various degrees of trimming (from $q = 1$ to $q = 3$); then the integrated likelihood based on the uniform prior ("Uniform") and on the Lancaster prior ("Lancaster") given by (33); the "Robust" expression of the prior is based on (14), where the outer product is estimated using observed quantities with various degrees of trimming; the "Expected" prior is the one given by (31), and it is plugged into the "Robust, q = 2" result to start the iterations in "Iterated"; "GMM" refers to the estimator discussed in Arellano and Bond (1991); "Random effects (uncorr.)" and "Random effects (corr.)" refer to the Gaussian random effects estimators assuming that the individual effects are independent of the initial conditions or allowing that the mean depends linearly on the initial condition.$^{25}$

$^{25}$We computed the GMM estimator using the STATA command xtabond2, with the option noleveleq. The other estimators were programmed in GAUSS.
523
PRIORS IN NONLINEAR PANEL DATA TABLE II VARIOUS ESTIMATORS OF μ1 IN THE DYNAMIC AR(1) MODELa (T = 10)
Uncorrected Corrected, q = 1 Corrected, q = 2 Corrected, q = 3 Uniform Lancaster Robust, observed q = 1 Robust, observed q = 2 Robust, observed q = 3 Robust, infeasible Robust, iterated 1 Robust, iterated ∞ GMM Random effects (uncorr.) Random effects (corr.)
Mean
Median
STD
05 p
[p 10]
MSE
MAE
.333 .391 .402 .384 .336 .504 .393 .409 .394 .500 .479 .499 .455 .562 .500
.328 .390 .402 .384 .335 .506 .394 .413 .395 .502 .477 .497 .459 .560 .498
.0320 .0341 .0327 .0343 .0330 .0374 .0296 .0304 .0345 .0302 .0299 .0323 .0608 .0501 .0348
.288 .336 .348 .328 .277 .435 .335 .356 .332 .449 .429 .445 .340 .448 .435
.300 .342 .359 .340 .296 .455 .352 .368 .342 .455 .436 .455 .373 .498 .461
0290 0131 0107 0145 0281 00140 0123 00920 0125 000903 00133 00104 00567 00629 00120
167 109 0984 116 164 0302 107 0910 106 0240 0299 0264 0602 0663 0274
a Estimates of μ in model (39); N = 100 simulations; μ = 5. 1 10
initial conditions or allowing that the mean depends linearly on the initial condition.25 We find a large bias of the MLE (30%) that is corrected for by almost onehalf by both the corrections of the concentrated likelihood and the robust integrated likelihood. In both cases the preferred degree of trimming is 2. The uniform prior yields no bias reduction at all, and the Lancaster prior based on the available orthogonalization gives almost no bias. Interestingly, the infeasible robust prior based on expected quantities and the true value of μ10 gives even better results in terms of bias, MSE, and MAE. Moreover, the iterated estimators have also very good finite-sample properties. In our simulations, we found that two iterations were enough to get very close to the infinitely iterated estimator. As the formulas of these priors are not based on parameter orthogonalization, these results suggest that iteration of the analytical expressions of the prior such as (14) can be useful for dealing with non-i.i.d. data. Last, note that the GMM estimator suffers from a small bias, which disappears when N grows (recall that N = 100 in the experiments). Moreover, it has larger variance than all the other estimators. The result is that the integrated likelihood functions with priors based on analytical calculations (infeasible and iterated) compare favorably with the fixed-T consistent GMM estimator in terms of MSE and MAE. 25 We computed the GMM estimator using the STATA command xtabond2, with the option noleveleq. The other estimators were programmed in GAUSS.
524
M. ARELLANO AND S. BONHOMME
The last two rows of Table II show the behavior of random effects estimators. In the dynamic AR(1) model, Alvarez and Arellano (2003) showed that the Gaussian RE pseudo-likelihood based on αi ∼ N (m1 + m2 yi0 s2 ) reduces bias. Then Cho, Hahn, and Kuersteiner (2004) showed that this is also the case of the random effects specification αi ∼ N (m s2 ), where the mean of αi is misspecified to be independent of the initial observation yi0 . We have shown that this result generalizes to dynamic AR(p) models without exogenous covariates. The numbers reported show that, in spite of the theoretical result, the uncorrelated REML estimator is substantially biased compared to its correlated counterpart. Thus, in dynamic linear models, it may be important to allow (even parametrically) for correlation between the individual effects and the initial conditions in the estimation. Last, note that the correlated random effects estimator compares favorably to all other estimators studied, except the infeasible and infinitely iterated robust integrated likelihood estimators. 7.3. Dynamic AR(2) We end this simulation section by considering the dynamic AR(2) model: (40)
yit = μ10 yit−1 + μ20 yit−2 + αi0 + εit
(i = 1 N t = 1 T )
As before, the individual effects are drawn in each simulation from a standard distribution, and the initial conditions yi−1 and yi0 are drawn in the stationary distribution of (yit yit+1 ) for fixed i. Then εit are i.i.d. standard normal draws, μ10 is set to 5, and μ20 is set to 0. Last, N is 100 and the standard deviation of errors, set to 1, is treated as known. To estimate the priors, we use the robust formula given in (14). Analytical expressions are given in the Supplemental material. Table III presents the results for T = 10. We find that the MLE is biased. A difference with the AR(1) TABLE III VARIOUS ESTIMATORS OF (μ1 μ2 ) IN THE DYNAMIC AR(2) MODELa (T = 10)
Uncorrected Corrected, q = 1 Corrected, q = 2 Uniform Robust, observed q = 1 Robust, observed q = 2 Robust, infeasible Robust, iterated 1 Robust, iterated ∞ GMM
Mean μ1
MSE μ1
Mean μ2
MSE μ2
.385 .419 .423 .369 .451 .435 .451 .441 .446 .440
0146 00808 00734 0189 00371 00602 00352 00455 00405 00739
−0774 −101 −0780 −104 −137 −0873 −00801 −0262 −0187 −0278
00700 0111 00715 0119 0198 00868 00117 00203 00175 00297
a Estimates of μ and μ in model (39); N = 100 simulations; μ = 5, μ = 0. 1 2 10 20
PRIORS IN NONLINEAR PANEL DATA
525
case is that if the corrected concentrated likelihood and the robust integrated likelihood estimated using observed quantities reduce bias, they do so only for the first autoregressive parameter. In that case, only the analytical correction (“infeasible”) reduces both biases. Interestingly, as before only one or two iterations starting with the “Robust” estimate get close to these infeasible estimates. Moreover, as in the AR(1) case, the iterated analytical corrections compare favorably with the GMM estimator. Note that in the AR(2) case no orthogonal reparameterization is available. The results obtained for the iterated estimators thus seem remarkable, both in terms of bias and mean squared error. 8. CONCLUSION Many approaches to the estimation of panel data models rely on an average likelihood that assigns weights to different values of the individual effects. In this paper, we study under which conditions such weighting schemes are robust, in that they yield biases of order 1/T 2 as opposed to 1/T . We find that robust weights, or priors, will in general satisfy two conditions. First, they depend on the data unless an orthogonal reparameterization is available. Second, they do not impose prior independence between the common parameters and the individual effects, as we show that random effects specifications are not bias reducing, in general. We propose two bias-reducing priors, which deal with the incidental parameter problem by taking into account the uncertainty about the individual effects. Our approach, based on prior distributions and integration, has a natural connection with simulation-based estimation techniques, such as MCMC. In addition, we argue that asymptotically valid confidence intervals can be read from the quantiles of the posterior distribution. We show that, in general, standard random effects estimation of policy parameters is inconsistent for large T , whereas the posterior mean is large-T consistent, and we provide conditions for bias reduction. Priors that are bias reducing for the common parameters do not lead to bias reduction of marginal effects, and bias-reducing priors for marginal effects are specific to the effect considered. We also show that in random effects models, both the estimators of common parameters and the posterior means of marginal effects have first-order biases that depend on the Kullback–Leibler distance between the population distribution of the effects and its best approximation in the random effects family. So, while updating the prior given the data lowers the bias on the marginal effects by an order of magnitude, the bias can be further reduced by using either a bias-reducing prior or an approximating family sufficiently close to the distribution of the effects. The Monte Carlo evidence suggests rather good finite-sample properties of integrated likelihood estimates based on robust priors. It seems very interesting to investigate the behavior of our method as the complexity of the model
526
M. ARELLANO AND S. BONHOMME
increases. If what we propose turns out to be feasible and satisfying, then structural microeconometric models would be a natural field of application. APPENDIX: PROOFS This appendix provides proofs of the results in Sections 2, 3, 4.1, and 4.2. Proofs of the results from Section 4.3 are in the Supplemental material. PROOF OF LEMMA 1: Let us fix i and denote LIi (θ) = exp[T i (θ αi )]πi (αi |θ) dαi αi (θ) and using a Laplace apAssuming that i (θ αi ) has a unique maximum proximation as in Tierney, Kass, and Kadane (1989) we obtain T α L (θ) = πi ( αi (θ)|θ) exp T i (θ αi (θ)) + vi i (θ αi (θ)) 2 1 2 × (αi − αi (θ)) dαi 1 + Op T
I i
αi (θ)|θ) exp[T i (θ αi (θ))] = πi ( 1 T αi vi (θ αi (θ))(αi − αi (θ))2 dαi 1 + Op × exp 2 T √ α αi (θ)|θ) 2π{−T vi i (θ αi (θ))}−1/2 = πi ( 1 × exp[T i (θ αi (θ))] 1 + Op T It thus follows that (A1)
2π 1 αi 1 (θ) − (θ) = ln − ln −vi (θ αi (θ)) 2T T 2T 1 1 + ln πi ( αi (θ)|θ) + Op T T2 I i
c i
where Assumption 1 allows us to take logs. αi (θ)) = 0 around Now by expanding the sample moment condition vi (θ αi (θ), we immediately find that 1 A αi (θ) − αi (θ) = √ + Op T T
PRIORS IN NONLINEAR PANEL DATA
527
where A = Op (1) and Eθ0 αi0 [A] = 0. This implies that 1 B α α vi i (θ αi (θ)) = vi i (θ αi (θ)) + √ + Op T T
1 C α = Eθ0 αi0 [vi i (θ αi (θ))] + √ + Op T T
where B and C are Op (1) with zero mean. Expanding the log yields (A2)
αi 1 αi Eθ0 αi0 ln −vi (θ αi (θ)) = ln Eθ0 αi0 [−vi (θ αi (θ))] + O T
Likewise, using Assumption 2 we obtain (A3)
1 αi (θ)|θ) = ln πi (αi (θ)|θ) + O Eθ0 αi0 ln πi ( T
Taking expectations in (A1) and combining the result with (A2) and (A3) yields 2π 1 1 α I c ln − ln Eθ0 αi0 [−vi i (θ αi (θ))] Eθ0 αi0 [ i (θ) − i (θ)] = 2T T 2T 1 1 + ln πi (αi (θ)|θ) + O Q.E.D. T T2 PROOF OF THEOREM 1: Immediate from (4) and (5).
Q.E.D.
PROOF OF THEOREM 2: Immediate using (8).
Q.E.D.
In preparation for the proof of Proposition 1, we state the following lemma: LEMMA A1:
−1 ∂
α (A4) αi (θ) = Eθ0 αi0 [−vi i (θ0 αi0 )] Eθ0 αi0 [viθ (θ0 αi0 )]
∂θ θ0 ≡ ρi (θ0 αi0 ) PROOF: By differentiating the moment condition solved by αi (θ) with respect to θ, Eθ0 αi0 [vi (θ αi (θ))] = 0
Q.E.D.
528
M. ARELLANO AND S. BONHOMME
PROOF OF PROPOSITION 1: The bias of the integrated score is
∂
bi (θ0 ) = ln πi (αi (θ)|θ) ∂θ θ0
−1/2 ∂
α − ln Eθ0 αi0 [−vi i (θ αi (θ))] Eθ0 αi0 [vi2 (θ αi (θ))] ∂θ θ0 A
In addition to Lemma A1, we need the information matrix equality at true values: (A5)
α
Eθ0 αi0 [−vi i (θ0 αi0 )] = T Eθ0 αi0 [vi2 (θ0 αi0 )]
To simplify the notation, we drop the arguments inside the expectation terms when they are evaluated at true values. We obtain αθ
A= =
αα
α
E(vi i ) + ρi E(vi i i ) 1 2E(viθ vi ) + 2ρi E(vi i vi ) − · α E(vi i ) 2 E(vi2 ) −1 αθ αα α E(vi i ) + T E(viθ vi ) + ρi [E(vi i i ) + T E(vi i vi )] αi E(−vi )
−1 α αθ E(−vi i )(E(vi i ) + T E(viθ vi )) α E(−vi i )2 αα + E(viθ )(E(vi i i ) + T E(viα vi ))
−1 ∂
αi = E(−vi ) Eθαi (viθ (θ αi )) α E(−vi i )2 ∂αi θ0 αi0
! ∂
αi − E(viθ ) E (−v (θ α )) θαi i i ∂αi θ0 αi0
=
where
Eθαi (v (θ αi )) = θ i
and
viθ (θ αi )fi (y; θ αi ) dy;
αi i
Eθαi (v (θ αi )) = It follows that
α
vi i (θ αi )fi (y; θ αi ) dy
−1 ∂
α Eθαi [−vi i (θ αi )] Eθαi [viθ (θ αi )] A=−
∂αi θ0 αi0
529
PRIORS IN NONLINEAR PANEL DATA
Q.E.D.
and the proposition is proved. PROOF OF PROPOSITION 2: We have
∂
∂
α ln πi (αi (θ)|θ) − ln Eθ0 αi0 [−vi i (θ αi (θ))] bi (θ0 ) =
∂θ θ0 ∂θ θ0 −1/2 × Eθ0 αi0 [T vi2 (θ αi (θ))] Note that it follows from the invariance property of ML that ψi (θ) = ψi (αi (θ) θ) Moreover, it is easily verified that α
Eθ0 αi0 [−vi i (θ αi )] =
∂ψi (αi θ) ∂αi
−
2 ψ
Eθ0 αi0 [−vi i (θ ψi (αi θ))]
∂2 ψi (αi θ) Eθ0 αi0 [vi (θ ψi (αi θ))] ∂α2i
and
∂ψi (αi θ) Eθ0 αi0 [v (θ αi )] = ∂αi 2 i
2 Eθ0 αi0 [vi2 (θ ψi (αi θ))]
where with some abuse of notation we have written vi (θ ψi ) for the score of the reparameterized likelihood with respect to the new fixed effects. Evaluating these two equalities at (θ αi (θ)) and using that Eθ0 αi0 [vi (θ ψi (θ))] = 0 yields Eθ0 αi0 [−vi i (θ αi (θ))] =
∂ψi (αi (θ) θ) ∂αi
2
α
2 ψ
Eθ0 αi0 [−vi i (θ ψi (θ))]
and ∂ψi (αi (θ) θ) Eθ0 αi0 [v (θ αi (θ))] = ∂αi 2 i
Hence
Eθ0 αi0 [vi2 (θ ψi (θ))]
∂
∂
ψ bi (θ0 ) = ln πi (αi (θ)|θ) − ln Eθ0 αi0 [−vi i (θ ψi (θ))]
∂θ θ0 ∂θ θ0 −1/2 × Eθ0 αi0 [T vi2 (θ ψi (θ))]
530
M. ARELLANO AND S. BONHOMME
∂ψi (αi (θ) θ) ∂
− ln
∂θ θ0 ∂αi ∂
∂
ψ = πi (ψi (θ)|θ) − ln Eθ0 ψi0 [−vi i (θ ψi (θ))]
ln ∂θ θ0 ∂θ θ0 −1/2 × Eθ0 ψi0 [T vi2 (θ ψi (θ))] Q.E.D.
The proposition follows.
PROOF OF PROPOSITION 3: A stochastic expansion of vi (θ αi (θ)) in the neighborhood of (θ αi (θ)) yields −1 1 αi αi (θ) − αi (θ) = Eθ0 αi0 [−vi (θ αi (θ))] vi (θ αi (θ)) + Op T This yields
1 αi (θ) − αi (θ)) = O Eθ0 αi0 ( T
and
−2 α Eθ0 αi0 [( αi (θ) − αi (θ))2 ] = Eθ0 αi0 [−vi i (θ αi (θ))]
1 × Eθ0 αi0 [v (θ αi (θ))] + O T2
2 i
Hence αi (θ)) = [πiR (αi (θ)|θ)]−2 + Op Var(
1 T2
αi (θ)) = Op (1/T ) we have Thus, as Var( 1 1 πiR (αi (θ)|θ) ∝ 1 + Op T Var( αi (θ)) Equation (15) follows by noting that
1 πiR ( αi (θ)|θ) = πiR (αi (θ)|θ) 1 + Op T
by the same arguments as in the proof of Lemma 1. To show the second part of the proposition, let πi be a nondogmatic prior satisfying 1 1 αi (θ)|θ) ∝ πi ( 1 + Op T Var( αi (θ))
PRIORS IN NONLINEAR PANEL DATA
531
Then the proof of Lemma 1 shows that the only quantity that matters for bias reduction is ln πi ( αi (θ)|θ). This result comes directly from the Laplace approximation to the integrated likelihood and does not require Assumption 2 to hold. As 1 R ln πi ( αi (θ)|θ) = ln πi ( αi (θ)|θ) + Op T and as πiR is robust, it follows that πi is also bias reducing.
Q.E.D.
PROOF OF LEMMA 2: The first-order conditions of the maximization imply that 0=
N
∂ RE (θ; ξ(θ)) i
i=1 N
1 = T i=1
∂ξ
ξ(θ))/∂ξ} dαi exp[T i (θ αi )]{∂πi (αi ; exp[T i (θ αi )]πi (αi ; ξ(θ)) dαi
A Laplace approximation of the two integrals yields, as in the proof of Lemma 1,
ξ(θ)) ∂πi (αi ; dαi ∂ξ √ −1/2 α = 2π −T vi i (θ αi (θ)) exp[T i (θ αi (θ))] αi (θ); ξ(θ)) 1 ∂πi ( 1 + Op × ∂ξ T exp(T i (θ αi ))
exp(T i (θ αi ))πi (αi ; ξ(θ)) dαi =
√
−1/2 α αi (θ)) exp[T i (θ αi (θ))] 2π −T vi i (θ 1 αi (θ); ξ(θ)) 1 + Op × πi ( T
Hence we obtain N 1 ∂ ln πi ( 1 αi (θ); ξ(θ)) 1 + Op = 0 N i=1 ∂ξ T
532
M. ARELLANO AND S. BONHOMME
Then taking the probability limit we have N 1 1
∂ ln πi ( αi (θ); ξ(θ)) =O Eπ0 Eθ0 αi0 plim ∂ξ T N→∞ N i=1 Last, using that Eθ0 αi0 ( αi (θ) − αi (θ)) = O(1/T ), we obtain N ∂ ln πi (αi (θ); ξ(θ)) 1 1
plim =O Eπ0 ∂ξ T N→∞ N i=1
Q.E.D.
PROOF OF THEOREM 3: Let πi (αi ξ) be a class of random effects distributions indexed by ξ. Also, let π0G be a population joint density of individual effects and exogenous covariates. Lemma 2 implies that the pseudo true value ξ(θ0 ) satisfies (A6)
Eπ0G
∂ ln πi (αi0 ; ξ(θ0 )) 1 =O ∂ξ T
Note that ξ(θ0 ) is population specific. Moreover, it follows from the analysis in Section 4 that πi (αi ξ) is bias reducing if and only if πi (αi ξ(θ0 )) is bias reducing, that is, (A7)
Eπ0G
∂
∂ ln πi (αi0 ; ξ(θ0 )) = o(1) ρi (θ0 αi ) + ρi (θ0 αi0 ) ∂αi αi0 ∂αi
Here we ask the question: In which case is πi (αi ξ) bias reducing for all π0G ? Clearly, this will hold if and only if (A7) holds for all π0G such that (A6) is satisfied. We now provide a linear algebra interpretation of this statement, which leads to an explicit solution. Let us consider the Hilbert space L2 , endowed with the inner product ϕ ψ = ϕ(α)ψ(α) dα (ϕ ψ) ∈ L2 × L2 We have, for any function ψ, Eπ0G (ψ(αi0 )) = π0G ψ So (A6) is equivalent to " (A8)
# ∂ ln πi (·; ξ(θ0 )) − AT π0G = 0 ∂ξ
PRIORS IN NONLINEAR PANEL DATA
533
and (A7) is equivalent to " # ∂ρi (θ0 ·) ∂ ln πi (·; ξ(θ0 )) (A9) + ρi (θ0 ·) − BT π0G = 0 ∂αi ∂αi where AT = O( T1 ) and BT = o(1). So πi (αi ξ) is bias reducing for all π0G if and only if, for all π0G ∈ L2 such that (A8) holds, (A9) holds also.26 Let A⊥ denote the orthogonal complement of A ⊂ L2 . πi (αi ξ) is thus bias reducing for all π0G if and only if ∂ρi (θ0 ·) ∂ ln πi (·; ξ(θ0 )) + ρi (θ0 ·) − BT ∂αi ∂αi ⊥ ⊥ ∂ ln πi (·; ξ(θ0 )) − AT ∈ ∂ξ Now, as there is a finite number of first-order conditions in (A6), the vector space spanned by (∂ ln πi (·; ξ(θ0 )))/∂ξ − AT is finite dimensional, so (e.g., Griffel, (1989, p. 66)) ⊥ ⊥ ∂ ln πi (·; ξ(θ0 )) ∂ ln πi (·; ξ(θ0 )) − AT − AT = Vect ∂ξ ∂ξ where Vect(V ) denotes the vector space spanned by V . So (A7) and (A6) hold for all π0G if and only if there exists a matrix Γ (θ0 ), with as many columns as the number of hyperparameters ξ, such that ∂
∂ ln πi (αi0 ; ξ(θ0 )) − BT
ρi (θ0 αi ) + ρi (θ0 αi0 ) α ∂αi i0 ∂αi ∂ ln πi (αi0 ; ξ(θ0 )) − AT = 0 − Γ (θ0 ) ∂ξ or, equivalently, ∂
∂ ln πi (αi0 ; ξ(θ0 ))
ρi (θ0 αi ) + ρi (θ0 αi0 ) ∂αi αi0 ∂αi − Γ (θ0 ) 26
∂ ln πi (αi0 ; ξ(θ0 )) = o(1) ∂ξ
Strictly speaking, bias reduction holds for any density π0G , so equations (A8) and (A9) hold only for all π0G ∈ L2 that are nonnegative and integrate to 1. However, this does not matter for the argument.
534
M. ARELLANO AND S. BONHOMME
that is,
∂
∂πi (αi0 ; ξ(θ0 )) = o(1) ρi (θ0 αi )πi (αi ; ξ(θ0 )) − Γ (θ0 )
∂αi αi0 ∂ξ Q.E.D.
This ends the proof.
PROOF OF COROLLARY 1: In the location–scale case, we have π(αi ) = f ((αi − μ)/σ), where f is a known pdf, and μ and σ 2 are hyperparameters. Then (22) yields27 αi − μ(θ) + o(1) ρi (θ αi ) = Γ1 (θ) + Γ2 (θ) σ(θ) 1 σ
Q.E.D.
The corollary follows.
PROOF OF COROLLARY 2: In that case πi (αi ) = σ1 f ((αi − xi μ)/σ), and (22) yields αi − xi μ(θ) + o(1) ρi (θ αi ) = Γ1 (θ)xi + Γ2 (θ) Q.E.D. σ(θ) REFERENCES ALVAREZ, J., AND M. ARELLANO (2003): “The Time Series and Cross-Section Asymptotics of Dynamic Panel Data Estimators,” Econometrica, 71, 1121–1159. [491,524] (2004): “Robust Likelihood Estimation of Dynamic Panel Data Models,” Unpublished Manuscript. [514] ARELLANO, M. (2003): “Discrete Choices With Panel Data,” Investigaciones Económicas, 27, 423–458. [491] ARELLANO, M., AND S. R. BOND (1991): “Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations,” Review of Economic Studies, 58, 277–297. [522] ARELLANO, M., AND S. R. BONHOMME (2009): “Supplement to ‘Robust Priors in Nonlinear Panel Data Models’,” Econometrica Supplemental Material, 77, http://econometricsociety. org/ecta/Supmat/6895_Proofs.pdf; http://econometricsociety.org/ecta/Supmat/6895_programs. zip. [494] ARELLANO, M., AND J. HAHN (2006): “A Likelihood-Based Approximate Solution to the Incidental Parameter Problem in Dynamic Nonlinear Models With Multiple Effects,” Unpublished Manuscript. [491,497,498,501] (2007): “Understanding Bias in Nonlinear Panel Models: Some Recent Developments,” in Advances in Economics and Econometrics, Ninth World Congress, Vol. 3, ed. by R. Blundell, W. Newey and T. Persson. Cambridge, U.K.: Cambridge University Press. [491,495,497,501,519] ARELLANO, M., AND B. HONORÉ (2001): “Panel Data Models: Some Recent Developments,” in Handbook of Econometrics, Vol. 5, ed. by J. Heckman and E. Leamer. Amsterdam: NorthHolland. [490] 27
We are only looking at the solutions that satisfy: limα→±∞ ρi (θ0 α)πi (α; ξ(θ0 )) = 0.
PRIORS IN NONLINEAR PANEL DATA
535
BEKKER, P. A. (1994): “Alternative Approximations to the Distributions of Instrumental Variable Estimators,” Econometrica, 62, 657–681. [518] BERGER, J., B. LISEO, AND R. L. WOLPERT (1999): “Integrated Likelihood Methods for Eliminating Nuisance Parameters,” Statistical Science, 14, 1–22. [495] BESTER, C. A., AND C. HANSEN (2005a): “A Penalty Function Approach to Bias Reduction in Non-Linear Panel Models With Fixed Effects,” Unpublished Manuscript. [491] (2005b): “Bias Reduction for Bayesian and Frequentist Estimators,” Unpublished Manuscript. [493] CARRO, J. (2007): “Estimating Dynamic Panel Data Discrete Choice Models With Fixed Effects,” Journal of Econometrics, 127, 503–528. [491] CHAMBERLAIN, G. (1980): “Analysis of Covariance With Qualitative Data,” Review of Economic Studies, 47, 225–238. [519] (1984): “Panel Data,” in Handbook of Econometrics, Vol. 2, ed. by Z. Griliches and M. D. Intriligator. Amsterdam: Elsevier Science. [505,507,518] CHAMBERLAIN, G., AND G. IMBENS (2004): “Random Effects Estimators With Many Instrumental Variables,” Econometrica, 72, 295–306. [518] CHAMBERLAIN, G., AND M. MOREIRA (2008): “Decision Theory Applied to a Linear Panel Data Model,” Econometrica (forthcoming). [507] CHERNOZHUKOV, V., AND H. HONG (2003): “An MCMC Approach to Classical Estimation,” Journal of Econometrics, 115, 293–346. [492,504] CHO, M. H., J. HAHN, AND G. KUERSTEINER (2004): “Asymptotic Distribution of Misspecified Random Effects Estimator for a Dynamic Panel Model With Fixed Effects When Both n and T Are Large,” Economics Letters, 84, 117–125. [517,524] COX, D. R., AND N. REID (1987): “Parameter Orthogonality and Approximate Conditional Inference” (with Discussion), Journal of the Royal Statistical Society, Series B, 49, 1–39. [492,499,501] DHAENE, G., K. JOCHMANS, AND B. THUYSBAERT (2006): “Split-Panel Jacknife Estimation of Fixed Effects Models,” Unpublished Manuscript. [491] DICICCIO, T. J., AND S. E. STERN (1993): “An Adjustment to Profile Likelihood Based on Observed Information,” Technical Report, Department of Statistics, Stanford University. [519,521] GHOSAL, S., AND A. W. VAN DER VAART (2001): “Rates of Convergence for Bayes and Maximum Likelihood Estimation for Mixture of Normal Densities,” Annals of Statistics, 29, 1233–1263. [509] (2007): “Posterior Convergence Rates of Dirichlet Mixtures of Normal Distributions at Smooth Densities,” Annals of Statistics, 35, 697–723. [509,510] GRIFFEL, D. H. (1989): Linear Algebra and Its Applications, Vol. 2. New York: Ellis Horwood. [533] HAHN, J. (2000): “Parameter Orthogonalization and Bayesian Inference,” Unpublished Manuscript. [518] (2004): “Does Jeffrey’s Prior Alleviate the Incidental Parameter Problem?” Economics Letters, 82, 135–138. [501] HAHN, J., AND G. KUERSTEINER (2004): “Bias Reduction for Dynamic Nonlinear Panel Models With Fixed Effects,” Unpublished Manuscript. [491,501] HAHN, J., AND W. K. NEWEY (2004): “Jackknife and Analytical Bias Reduction for Nonlinear Panel Models,” Econometrica, 72, 1295–1319. [491,504,510] HAHN, J., G. KUERSTEINER, AND W. NEWEY (2004): “Higher Order Efficiency of Bias Corrections,” Unpublished Manuscript. [520] HOSPIDO, L. (2006): “Modelling Heterogeneity and Dynamics in the Volatility of Individual Wages,” Unpublished Manuscript. [491] LANCASTER, T. (1998): “Panel Binary Choice With Fixed Effects,” Unpublished Manuscript. [519] (2000): “The Incidental Parameter Problem Since 1948,” Journal of Econometrics, 95, 391–413. [490,516]
536
M. ARELLANO AND S. BONHOMME
(2002): “Orthogonal Parameters and Panel Data,” Review of Economic Studies, 69, 647–666. [491,499,515] NEYMAN, J., AND E. L. SCOTT (1948): “Consistent Estimates Based on Partially Consistent Observations,” Econometrica, 16, 1–32. [490] PACE, L., AND A. SALVAN (2006): “Adjustments of the Profile Likelihood From a New Perspective,” Journal of Statistical Planning and Inference, 136, 3554–3564. [498,501] SEVERINI, T. A. (1999): “On the Relationship Between Bayesian and Non-Bayesian Elimination of Nuisance Parameters,” Statistica Sinica, 9, 713–724. [493,495] (2000): Likelihood Methods in Statistics. London: Oxford University Press. [496] SWEETING, T. J. (1987): “Discussion of the Paper by Professors Cox and Reid,” Journal of the Royal Statistical Society, Series B, 49, 20–21. [492] TIERNEY, L., R. E. KASS, AND J. B. KADANE (1989): “Fully Exponential Laplace Approximations to Expectations and Variances of Nonpositive Functions,” Journal of the American Statistical Association, 84, 710–716. [497,520,526] WASSERMAN, L. (2000): “Asymptotic Inference for Mixture Models Using Data-Dependent Priors,” Journal of the Royal Statistical Society, Series B, 62, 159–180. [501] WONG, W. H., AND X. SHEN (1995): “Probability Inequalities for Likelihood Ratios and Convergence Rates of Sieve MLEs,” Annals of Statistics, 23, 339–362. [509] WOUTERSEN, T. (2002): “Robustness Against Incidental Parameters,” Unpublished Manuscript. [491,493]
CEMFI, Casado del Alisal, 5, 28014 Madrid, Spain;
[email protected] and CEMFI, Casado del Alisal, 5, 28014 Madrid, Spain;
[email protected]. Manuscript received December, 2006; final revision received September, 2008.
Econometrica, Vol. 77, No. 2 (March, 2009), 537–560
NOTES AND COMMENTS THE OPTIMAL INCOME TAXATION OF COUPLES BY HENRIK JACOBSEN KLEVEN, CLAUS THUSTRUP KREINER, AND EMMANUEL SAEZ1 This paper analyzes the general nonlinear optimal income tax for couples, a multidimensional screening problem. Each couple consists of a primary earner who always participates in the labor market, but makes an hours-of-work choice, and a secondary earner who chooses whether or not to work. If second-earner participation is a signal of the couple being better (worse) off, we prove that optimal tax schemes display a positive tax (subsidy) on secondary earnings and that the tax (subsidy) on secondary earnings decreases with primary earnings and converges to zero asymptotically. We present calibrated microsimulations for the United Kingdom showing that decreasing tax rates on secondary earnings is quantitatively significant and consistent with actual income tax and transfer programs. KEYWORDS: Optimal income tax, multidimensional screening.
1. INTRODUCTION THIS PAPER EXPLORES the optimal income taxation of couples. Each couple is modelled as a unitary agent supplying labor along two dimensions: the labor supply of a primary earner and the labor supply of a secondary earner. Primary earners differ in ability and make a continuous labor supply decision as in the Mirrlees (1971) model. Secondary earners differ in opportunity costs of work and make a binary labor supply decision (work or not work). We consider a fully general nonlinear tax system allowing us to study the central question of couple taxation: how should the tax rate on one individual vary with the earnings of the spouse. This creates a multidimensional screening problem. We show that if second-earner labor force participation is a signal of the couple being better off (as when second-earner entry reflects high labor market opportunities), optimal tax schemes display positive tax rates on secondary earnings along with negative jointness whereby the tax rate on one person decreases with the earnings of the spouse. Conversely, if second-earner participation is a signal of the couple being worse off (as when second-earner entry reflects low home production ability), we obtain a negative tax rate on the secondary earner along with positive jointness: the second-earner subsidy is being phased out with primary earnings. These results imply that, in either case, the tax distortion on 1 We thank the co-editor, Mark Armstrong, Richard Blundell, Mike Brewer, Raj Chetty, Steven Durlauf, Nada Eissa, Kenneth Judd, Botond Koszegi, Etienne Lehmann, Randall Mariger, JeanCharles Rochet, Andrew Shephard, four anonymous referees, and numerous seminar and conference participants for very helpful comments and discussions. Financial support from NSF Grant SES-0134946 and an Economic Policy Research Network (EPRN) Grant is gratefully acknowledged.
© 2009 The Econometric Society
DOI: 10.3982/ECTA7343
538
H. J. KLEVEN, C. T. KREINER, AND E. SAEZ
the secondary earner is declining in primary earnings, which is therefore a general property of an optimum. We also prove that the second-earner tax distortion tends to zero asymptotically as primary earnings become large. Although this result may seem reminiscent of the classic no-distortion-at-the-top result, our result rests on a completely different reasoning and proof. Previous work on couple taxation assumed separability in the tax function and, hence, could not address the optimal form of jointness, which we view as central to the optimal couple tax problem.2 The separability assumption also sidesteps the complexities associated with multidimensional screening. In fact, very few studies in the optimal tax literature have attempted to deal with multidimensional screening problems.3 The nonlinear pricing literature in industrial organization has analyzed such problems extensively. A central complication of multidimensional screening problems is that first-order conditions are often not sufficient to characterize the optimal solution. The reason is that solutions usually display “bunching” at the bottom (Armstrong (1996), Rochet and Choné (1998)), whereby agents with different types are making the same choices. Our framework with a binary labor supply outcome for the secondary earner along with continuous earnings for the primary earner avoids the bunching complexities and offers a simple understanding of the shape of optimal taxes based on graphical exposition. Our key results are obtained under a number of strong simplifying assumptions:4 (i) We adopt the unitary model of family decision making. (ii) We assume that the government knows a priori the identity of the primary and secondary earner in the couple. (iii) We consider only couples and do not model the marriage decision. (iv) We assume uncorrelated abilities between spouses. (v) We assume no income effects on labor supply and separability in the disutility of working for the two members of the household, implying that there is no jointness in the family utility function. Instead, jointness in our model arises solely because the social welfare function depends on family utilities rather than individual utilities. Our assumptions allow us to zoom in on the role of equity concerns for the jointness of the tax system. 2
Boskin and Sheshinski (1983) considered linear taxation of couples, allowing for different marginal tax rates on husband and wife. The linearity assumption effectively implies separable and hence individual-based (albeit gender-specific) tax treatment. More recently, Schroyen (2003) extended the Boskin–Sheshinski framework to the case of nonlinear taxation but kept the assumption of separability in the tax treatment. 3 Mirrlees (1976, 1986) set out a general framework to study such problems and derived firstorder optimality conditions. More recently, Cremer, Pestieau, and Rochet (2001) revisited the issue of commodity versus income taxation in a multidimensional screening model assuming a discrete number of types. Brett (2006) and Cremer, Lozachmeur, and Pestieau (2007) considered the issue of couple taxation in discrete-type models. They showed that, in general, incentive compatibility constraints bind in complex ways, making it difficult to obtain general properties. 4 We refer to Kleven, Kreiner, and Saez (2006) for a discussion of robustness and generalizations.
OPTIMAL INCOME TAXATION OF COUPLES
539
Section 2 sets out our model and Section 3 derives our theoretical results. Section 4 presents a numerically calibrated illustrative simulation based on U.K. micro data. Some proofs are presented in Appendices A and B, while some supplemental material is available on the journal’s website (Kleven, Kreiner, and Saez (2009)). 2. THE MODEL 2.1. Family Labor Supply Choice We consider a population of couples, the size of which is normalized to 1. In each couple, there is a primary earner who always participates in the labor market and makes a choice about the size of labor earnings z. The primary ¯ in earner is characterized by a scalar ability parameter n distributed on (n n) the population. The cost of earning z for a primary earner with ability n is given by n · h(z/n), where h(·) is an increasing and convex function of class C 2 and normalized so that h(0) = 0 and h (1) = 1. Secondary earners choose whether or not to participate in the labor market, l = 0 1, but hours worked conditional on working are fixed. Their labor income is given by w · l, where w is a uniform wage rate, and they face a fixed cost of participation q, which is heterogeneous across the secondary earners. The government cannot observe n and q, and redistributes based on observed earnings using a nonlinear tax T (z wl). Because l is binary and w is uniform, this tax system simplifies to a pair of schedules, T0 (z) and T1 (z), depending on whether the spouse works or not.5 The tax system is separable iff T0 and T1 differ by a constant. Net-of-tax income for a couple with earnings (z wl) is given by c = z + w · l − Tl (z). We consider two sources of heterogeneity across secondary earners, differences in market opportunities and differences in home production abilities, as reflected in the utility function z − qw · l + qh · (1 − l) (1) u(c z l) = c − n · h n where qw + qh ≡ q is the total cost of second-earner participation, the sum of a direct work cost qw and an opportunity cost of lost home production qh . Het5 Like the rest of the literature, we assume that the government observes the identity of the primary and secondary earner in each couple, and is allowed to use this information in the tax system. If identity could not be used in the calculation of taxes (a so-called anonymous tax system), a symmetry constraint T (z w) = T (w z) would have to be added to the problem. However, this symmetry constraint can be ignored if the secondary earner is always the lower-earnings spouse in the couple. In the context of our simple model (where w is uniform), this assumption is equivalent to w < z(n). When identity is perfectly aligned with earnings, an earnings-based and anonymous tax can be made dependent on identity de facto without being identity-specific de jure. This is important in countries where an identity-specific (e.g., gender-specific) tax system would be unconstitutional.
540
H. J. KLEVEN, C. T. KREINER, AND E. SAEZ
erogeneity in qw creates differences in household utility across couples with l = 1 (heterogeneity in market opportunities), whereas heterogeneity in qh generates differences in household utility across couples with l = 0 (heterogeneity in home production abilities). As we shall see, the two types of heterogeneity pull optimal redistribution policy in opposite directions. To isolate the impact of each type of heterogeneity, we consider them in turn. In the work cost model (q = qw > 0, qh = 0), at a given primary earner ability n, two-earner couples will be those with low work costs and hence they will be better off than one-earner couples. This creates a motive for the government to tax the income of the secondary earner so as to redistribute from two-earner to one-earner couples. By contrast, in the home production model (qw = 0, q = qh > 0), two-earner couples will be those with low home production abilities and therefore they will be worse off than one-earner couples, creating the reverse redistributive motive. The work cost model is more consonant with the tradition in applied welfare and poverty measurement, which assumes that secondary earnings contribute positively to family well-being, and with the underlying notion in the existing optimal tax literature that higher income is a signal of higher wellbeing.6 On the other hand, the existing literature did not consider two-person households where home production (including child-bearing and child-caring) is more important. We therefore analyze both models symmetrically. The online supplemental material has a discussion of the general case with both types of heterogeneity. If T0 and T1 are differentiable, the first-order condition for z (conditional on l = 0 1) is h (zl /n) = 1 − Tl (z).7 In the case of no tax distortion, Tl (z) = 0, our normalization h (1) = 1 implies z = n. Hence, it is natural to interpret n as potential earnings.8 Positive marginal tax rates depress actual earnings z below potential earnings n. If the tax system is nonseparable such that T0 = T1 , primary earnings z depend on the labor force participation decision l of the spouse. We denote by zl the optimal choice of z at a given l. We define the 6
It is this notion that drives the result in the Mirrlees model that optimal marginal tax rates are positive. If differences in market earnings were driven by home production ability instead of market ability, the Mirrlees model would generate negative optimal tax rates as high-earnings individuals are those with low ability and utility. Ramey (2008) showed that primary earners provide significant home production but the main question is whether this effect is strong enough to make the poor better off than the rich, and thereby reverse the traditional results. 7 If the tax system is not differentiable, we can still define the implicit marginal tax rate Tl (with slight abuse of notation) as 1 − h (zl /n), where zl is the utility maximizing choice of earnings conditional on l. 8 Typically, economists consider models where n is a wage rate and utility is specified as u = c − h(z/n), leading to a first-order condition n · (1 − T (z)) = h (z/n). Our results carry over to this case but n would no longer reflect potential earnings and the interpretation of optimal tax formulas would be less transparent (Saez (2001)).
OPTIMAL INCOME TAXATION OF COUPLES
541
elasticity of primary earnings with respect to the net-of-tax rate 1 − Tl as εl ≡
1 − Tl h (zl /n) ∂zl = zl ∂(1 − Tl ) (zl /n)h (zl /n)
Under separable taxation where T0 = T1 , we have z0 = z1 and ε0 = ε1 . Secondary earners work if the utility from participation is greater than or equal to the utility from nonparticipation. Let us denote by zl (2) +w·l Vl (n) = zl − Tl (zl ) − nh n the indirect utility of the couple (exclusive of the fixed cost q) at a given l. Differentiating with respect to n (denoted by an upper dot from now on) and using the envelope theorem, we obtain zl zl zl (3) + · h ≥ 0 V˙l (n) = −h n n n The inequality follows from the fact that x → −h(x) + x · h (x) is increasing (as h > 0) and null at x = 0. The inequality is strict if zl > 0, that is, if Tl < 1. The participation constraint for secondary earners is given by (4)
¯ q ≤ V1 (n) − V0 (n) ≡ q(n)
¯ where q(n) is the net gain from working exclusive of the fixed cost q. For fam¯ ilies with a fixed cost below (above) the threshold value q(n), the secondary earner works (does not work). The couple characteristics (n q) are distributed according to a continuous ¯ × [0 ∞). We denote by P(q|n) the cudensity distribution defined over [n n] mulative distribution function of q conditional on n, by p(q|n) the density function of q conditional on n, and by f (n) the unconditional density of n. The probability of labor force participation for the secondary earner at a given ¯ ability level n of the primary earner is P(q|n). We define the participation elas¯ ¯ ticity with respect to the net gain from working q¯ as η = q¯ · p(q|n)/P( q|n). Since w is the gross gain from working, and q¯ has been defined as the (money metric) net utility gain from working, we can define the tax rate on secondary ¯ earnings as τ = (w − q)/w. Notice that if taxation is separate so that T0 = T1 and z0 = z1 , we have τ = (T1 − T0 )/w. If taxation is nonseparate, then T1 − T0 reflects the total tax change for the family when the secondary earner starts working and the primary earner makes an associated earnings adjustment, whereas w − q¯ reflects the tax burden on second-earner participation per se. The central optimal couple tax question we want to tackle is whether the tax rate on one person should depend on the earnings of the spouse. We may define the possible forms of couple taxation as follows:
542
H. J. KLEVEN, C. T. KREINER, AND E. SAEZ
DEFINITION 1: At any point n, we have either (i) positive jointness, T1 > T0 and τ˙ > 0, (ii) separability, T0 = T1 and τ˙ = 0, or (iii) negative jointness, T1 < T0 and τ˙ < 0.9 Finally, notice that double-deviation issues are taken care of in our model, because we consider earnings at a given n and allow z to adapt optimally when l changes. If the secondary earner starts working, optimal primary earnings shift from z0 (n) to z1 (n) but the key first-order condition (3) continues to apply. As in the Mirrlees model, a given path for (z0 (n) z1 (n)) can be implemented via a truthful mechanism or, equivalently, by a nonlinear tax system if and only if z0 (n) and z1 (n) are nonnegative and nondecreasing in n (a formal proof is provided in the online supplemental material). 2.2. Government Objective The government sets T0 (z) and T1 (z) to maximize social welfare n¯ ∞ (5) Ψ (Vl (n) − qw · l + qh · (1 − l))p(q|n)f (n) dq dn W = n=n
q=0
where Ψ (·) is an increasing and concave transformation (representing either the government redistributive preferences or individual concave utilities) subject to the budget constraint n¯ ∞ (6) Tl (zl )p(q|n)f (n) dq dn ≥ 0 n=n
q=0
and subject to V˙0 (n) and V˙1 (n) in equation (3). Let λ > 0 be the multiplier associated with the budget constraint (6). The government’s redistributive tastes may be represented by social marginal welfare weights on different couples. We denote by gl (n) the (average) social marginal welfare weight for couples with primary-earner ability n and secondary-earner participation status l. For the work cost model q¯ ¯ (qw > 0, qh = 0), we have g1 (n) = 0 Ψ (V1 (n) − qw )p(q|n) dq/(P(q|n) · λ) w h model (q = 0, q > 0), and g0 (n) = Ψ (V0 (n))/λ. For the home production ∞ we have g1 (n) = Ψ (V1 (n))/λ and g0 (n) = q¯ Ψ (V0 (n) + qh )p(q|n) dq/((1 − ¯ P(q|n)) · λ). Optimal redistribution depends crucially on the evolution of weights g0 (n) and g1 (n) through the ability distribution. In particular, we will show that the 9 Using equations (2)–(4), it is easy to prove that sign(T1 − T0 ) = sign(τ). ˙ This is simply another way of stating the theorem of equality of cross-partial derivatives. Notice that T0 and T1 are evaluated at the same ability level n but not at the same earnings level when T0 = T1 because this implies z0 (n) = z1 (n).
543
OPTIMAL INCOME TAXATION OF COUPLES
optimal tax scheme depends on properties of g0 (n) − g1 (n), which reflects the preferences for redistribution between one- and two-earner couples. At this stage, notice that the sign of g0 (n) − g1 (n) depends on whether second-earner heterogeneity is driven by work costs or by home production ability. In the work cost model, we have V1 (n) − qw > V0 (n), which implies (as Ψ is concave) that g0 (n) − g1 (n) > 0. By contrast, in the home production model, we have V0 (n) + qh > V1 (n) and hence g0 (n) − g1 (n) < 0. As we shall see, whether g0 (n) − g1 (n) is positive or negative determines whether the optimal tax on secondary earners is positive or negative. 3. CHARACTERIZATION OF THE OPTIMAL INCOME TAX SCHEDULE 3.1. Optimal Tax Formulas and Their Relation to Mirlees (1971) The simple model described above makes it possible to derive explicit optimal tax formulas as in the individualistic Mirrlees (1971) model. We introduce the following assumption: ASSUMPTION 1: The function x −→ (1 − h (x))/(x · h (x)) is decreasing.
1−h (z/n) T Assumption 1 ensures that the marginal deadweight loss ε · 1−T = (z/n)h (z/n) is increasing in T . When Assumption 1 fails, ε falls so quickly with T that the marginal deadweight loss falls with T , and such a point can never be optimum.10 Assumption 1 is satisfied, for example, for isoelastic utilities h(x) = x1+1/ε /(1 + 1/ε) or any utility function such that the elasticity ε = h /(x · h ) is decreasing in x. We prove the following proposition in Appendix A:
PROPOSITION 1: Under Assumption 1, an optimal solution exists such that (z0 z1 T0 T1 ) is continuous in n and satisfies (7)
T0 1 1 = · ¯ 1 − T0 ε0 nf (n)(1 − P(q|n)) n¯ ¯ )) + [T1 − T0 ]p(q|n ¯ ) f (n ) dn · (1 − g0 )(1 − P(q|n n
(8)
1
T 1 1 = · ¯ 1 − T1 ε1 nf (n)P(q|n) n¯ ¯ ) − [T1 − T0 ]p(q|n ¯ )}f (n ) dn · {(1 − g1 )P(q|n n
10 Mathematically, Assumption 1 is required to ensure that the first-order condition of the government problem generates a maximum (instead of a minimum); see Appendix A.
544
H. J. KLEVEN, C. T. KREINER, AND E. SAEZ
where all the terms outside the integrals are evaluated at ability level n and all the terms inside the integrals are evaluated at n . These conditions apply at any point n where there is no bunching, that is, where zl (n) is strictly increasing in n. If the conditions generate segments over which z0 (n) or z1 (n) are decreasing, then there is bunching and z0 (n) or z1 (n) are constant over a segment. Kleven, Kreiner, and Saez (2006) presented a detailed discussion of these formulas. Let us here remark on just two aspects. First, the (weighted) average marginal tax rate faced by primary earners in one- and two-earner couples equals (9)
¯ (1 − P(q|n)) · ε0 · 1 · = nf (n)
n¯
T0 T1 ¯ + P( q|n) · ε · 1 1 − T0 1 − T1
¯ ))f (n ) dn (1 − g(n
n
¯ ))g0 (n )+P(q|n ¯ )g1 (n ) is the average social marginal ¯ ) = (1−P(q|n where g(n welfare weight for couples with ability n . This result is identical to the Mirrlees formula (without income effects), implying that redistribution across couples with different primary earners follows the standard logic in the literature. The introduction of a secondary earner in the household creates a potential difference in the marginal tax rates faced by primary earners with working and nonworking spouses, which we explore in detail below. Second, the famous results that optimal marginal tax rates are zero at the bottom and at the top carry over to the couple model from the transversality conditions (see Appendix A).11 3.2. Asymptotic Properties of the Optimal Schedule Let the ability distribution of primary earners f (n) have an infinite tail (n¯ = ∞). As top tails of income distributions are well approximated by the Pareto distribution (Saez (2001)), we assume that f (n) has a Pareto tail with parameter a > 1 (f (n) = C/n1+a ). We also assume that the distribution of work costs P(q|n) converges to P ∞ (q). We can then show the next proposition: PROPOSITION 2: Suppose T1 − T0 , T0 , T1 , and q¯ converge to T ∞ , T ∞ 0 < 1, T < 1, and q¯ ∞ as n → ∞. Then (i) g0 and g1 converge to the same value g ≥ 0, (ii) the second-earner tax converges to zero, T ∞ = τ∞ = 0, and (iii) the ∞ ∞ marginal tax rates on primary earners converge to T ∞ 0 = T 1 = (1 − g )/(1 − ∞ ∞ ∞ g + a · ε ) > 0, where ε is the asymptotic elasticity. ∞ 1 ∞
11 As is well known, these results have limited relevance because (i) the bottom result does not apply when there is an atom of nonworkers, and (ii) the top rate drops to zero only for the single topmost earner (Saez (2001)).
OPTIMAL INCOME TAXATION OF COUPLES
545
PROOF: V0 (n) and V1 (n) are increasing in n without bound (as T0 T1 converge to values below 1). As Ψ > 0 is decreasing, it must converge to ψ¯ ≥ 0. q¯ Therefore, in the work cost model, g0 = Ψ (V0 )/λ and g1 = 0 Ψ (V0 + q¯ − ¯ ≥ 0.12 Because T1 − T0 ¯ q)p(q|n) dq/[λ · P(q|n)] both converge to g∞ = ψ/λ ∞ ∞ converges, it must be the case that T 0 = T 1 = T ∞ . Hence, as h (zl /n) = 1 − Tl , zl /n converge for both l = 0 1 and εl = h (zl /n)/(h (zl /n)zl /n) also converges to ε∞ . ¯ ¯ Because P(·|n) and q¯ converge, P(q|n) and p(q|n) converge to P ∞ (q¯ ∞ ) ∞ ∞ and p (q¯ ). The Pareto assumption implies that (1 − F(n))/(nf (n)) = 1/a for large n. Taking the limit of (7) and (8) as n → ∞, we obtain, respectively, T ∞ /(1 − T ∞ ) = (1/ε∞ )(1/a)[1 − g∞ + T ∞ p∞ /(1 − P ∞ )] and T ∞ /(1 − T ∞ ) = (1/ε∞ )(1/a)[1 −g∞ − T ∞ p∞ /(1−P ∞ )]. Hence, we must have T ∞ = Q.E.D. 0, and the formula for T ∞ then follows. It is quite striking that the spouses of very high earners should be exempted from taxation as n tends to infinity, even in the case where the government tries to extract as much tax revenue as possible from high-income couples (g∞ = 0). Although this result may seem similar to the classic no-distortion-at-the-top result reviewed above, the logic behind our result is completely different. In fact, in the present case with an infinite tail for n, Proposition 2 shows that the marginal tax rate on primary earners does not converge to zero. Instead, the marginal tax rates converges to the positive constant (1 − g∞ )/(1 − g∞ + aε∞ ), exactly as in the individualistic Mirrlees model when n → ∞ (Saez (2001)).13 To grasp the intuition behind the zero second-earner tax at the top, consider a situation where T1 − T0 does not converge to zero but instead converges to
T ∞ > 0 as illustrated on Figure 1. Consider then a reform that increases the tax on one-earner couples and decreases the tax on two-earner couples above some high n, and in such a way that the net mechanical effect on government revenue is zero.14 These tax burden changes are achieved by increasing the marginal tax rate for one-earner couples in a small band (n n + dn) and lowering the marginal tax rate for two-earner couples in this band. What are the welfare effects of the reform? First, there are direct welfare effects as the reform redistributes income from one-earner couples (who lose dW0 ) to two-earner couples (who gain dW1 ). However, because g0 and g1 have converged to g∞ , these direct welfare effects cancel out. Second, there are fiscal effects due to earnings responses of primary earners in the small band where marginal tax rates have been changed (dH0 and dH1 ). Because T1 − T0 has converged to a constant for large n, the marginal tax rates on one- and ¯ ≤ g0 < g1 = Ψ (V1 )/λ → ψ/λ. ¯ In the home production model, we also have ψ/λ Conversely, in the case of a bounded ability distribution, the top marginal tax rate on primary earnings would be zero, but then the tax on the secondary earner would be positive. 14 ¯ Because q¯ and hence P(q|n) have converged, revenue neutrality requires that the tax changes ¯ and dT1 = −dT/P(q), ¯ respectively. on one- and two-earner couples are dT0 = dT/(1 − P(q)) 12 13
546
H. J. KLEVEN, C. T. KREINER, AND E. SAEZ
FIGURE 1.—Zero second-earner tax at the top. ∞ two-earner couples are identical, T ∞ 0 = T 1 , which implies z0 /n = z1 /n and hence identical primary-earner elasticities ε0 = ε1 . Thus, the negative fiscal effect dH0 exactly offsets the positive fiscal effect dH1 . Third, there is a participation effect as some secondary earners are induced to join the labor force in response to the lower T1 − T0 . Because T1 − T0 is initially positive, this response generates a positive fiscal effect, dP > 0. Since all other effects were zero, dP > 0 is the net total welfare effect of the reform, implying that the original schedule with T ∞ > 0 cannot be optimal.15
3.3. Optimal Jointness To analyze the optimal form of jointness, we introduce two additional assumptions. ASSUMPTION 2: The function V −→ Ψ (V ) is strictly convex. This is satisfied for standard CRRA or CARA social welfare functions. In consumer theory, convexity of marginal utility of consumption is a common 15 The opposite situation with T ∞ < 0 cannot be optimal either, because the reverse reform would then improve welfare.
OPTIMAL INCOME TAXATION OF COUPLES
547
assumption, because it captures the notion of prudence and generates precautionary savings. As shown below, this assumption captures the central idea that secondary earnings matter less and less for social marginal welfare as primary earnings increase. ASSUMPTION 3: q and n are independently distributed. Abstracting from correlation in spouse characteristics (assortative matching) allows us to isolate the implications of the spousal interaction occurring through the social welfare function. In Section 4, we examine numerically how assortative matching affects our results. To establish an intuition on the optimal form of jointness, let us consider a tax reform introducing a little bit of jointness around the optimal separable tax system. For the work cost model, we will argue that the optimal separable schedule can be improved by introducing a little bit of negative jointness.16 ¯ and A separable schedule is one where T0 = T1 , implying that T1 − T0 , q, ¯ are constant in n. In the work cost model, we would have T1 − T0 > 0 due P(q) to the property g0 − g1 > 0. As discussed above, this property follows from the fact that, at a given n, being a two-earner couple is a signal of low work costs and being better off than one-earner couples. Moreover, under Assumptions 2 and 3, and starting from a separable tax system, g0 − g1 is decreasing in n. Intuitively, as primary-earner ability increases, the contribution of secondary earnings to couple utility is declining in relative terms, and therefore the value of redistribution from two- to one-earner couples is declining. Formally, under ¯ separable taxation and Assumption 3, we have that q¯ = w − (T1 − T0 ), P(q|n) = ¯ and p(q|n) = p(q) are constant in n. Then, from the definitions of g0 (n) P(q), and g1 (n), we obtain q¯ Ψ (V0 + q¯ − q)p(q) dq Ψ (V0 ) d[g0 (n) − g1 (n)] 0 = − · V˙0 < 0 (10) ¯ dn λ λ · P(q) where we have used V1 = V0 + q¯ from equation (4). Since Ψ (·) is increasing (by Assumption 2) and V0 is increasing in n, it follows that the expression in (10) is negative. Now, consider a tax reform introducing a little bit of negative jointness as shown in Figure 2. The tax reform has two components. Above ability level n, we increase the tax on one-earner couples and decrease the tax on two-earner couples. Below ability level n, we decrease the tax on one-earner couples and increase the tax on two-earner couples. These tax burden changes are associated with changes in the marginal tax rates on primary earners around n. 16 In the home production model, reversed arguments show that some positive jointness is welfare improving.
548
H. J. KLEVEN, C. T. KREINER, AND E. SAEZ
FIGURE 2.—Desirability of negative jointness.
To ensure that the reform is revenue-neutral (absent any behavioral responses), let the size of the tax change on each segment be inversely proportional to the number of couples on the segment. That is, above n, the tax ¯ and the change for one-earner couples is dT0a = dT/[(1 − F(n))(1 − P(q))] ¯ Below n, tax change for two-earner couples is dT1a = −dT/[(1 − F(n))P(q)]. ¯ and the the tax change for one-earner couples is dT0b = dT/[F(n)(1 − P(q))] ¯ There are three tax change for two-earner couples is dT1b = dT/[F(n)P(q)]. effects. First, there is a direct welfare effect created by the redistribution across couples at each n : n dT · (11) [g0 (n ) − g1 (n )]f (n ) dn dW = F(n) n n¯ dT · [g0 (n ) − g1 (n )]f (n ) dn > 0 − 1 − F(n) n The first term reflects the gain created at the bottom by redistributing from two-earner to one-earner couples, and the second term reflects the loss created at the top from the opposite redistribution. Equation (10) implies that the gain dominates the loss at the top, so that dW > 0.
OPTIMAL INCOME TAXATION OF COUPLES
549
Second, there are fiscal effects associated with earnings responses by primary earners induced by the changes in T0 and T1 around n. Since the reform increases the marginal tax rate for one-earner couples around n and reduces it for two-earner couples, the earnings responses are opposite. As we start from separable taxation, T0 = T1 , and hence identical primary-earner elasticities, ε0 = ε1 , the fiscal effects of primary earner responses cancel out exactly. Third, the reform creates participation responses by secondary earners. Above n, nonworking spouses will be induced to join the labor force. Below n, working spouses have an incentive to drop out. Because spouse characteristics q and n are independent, and since we start from a separable tax system, ¯ ¯ and T1 − T0 are initially constant. the participation elasticity η = q¯ · p(q)/P( q) Therefore, the fiscal implications of these responses also cancel out exactly. Therefore, dW > 0 is the net total welfare effect of the reform. Hence, under Assumptions 1–3, introducing a little bit of negative jointness increases welfare. This perturbation argument suggests that, for the work cost model, the optimal incentive scheme will be associated with negative jointness, a point we will prove formally after introducing a final technical assumption: ASSUMPTION 4: The function x −→ x · p(w − x)/[P(w − x) · (1 − P(w − x))] q is increasing and p(q)/P(q) ≤ P(q)/ 0 P(q ) dq for all q. This assumption is satisfied for isoelastic work cost distributions, P(q) = (q/qmax )η , where the participation elasticity of secondary earners is constant and equal to η.17 PROPOSITION 3: Under Assumptions 1–4 and if the optimal solution is not associated with bunching, the tax system is characterized by the following models: Work Cost Model: 1a. Positive tax on secondary-earner income, τ > 0 for all ¯ 1b. Negative jointness, T1 ≤ T0 and τ˙ ≤ 0 for all n ∈ [n n]. ¯ n ∈ [n n]. Home Production Model: 2a. Negative tax on secondary-earner income, τ < 0 ¯ 2b. Positive jointness, T1 ≥ T0 and τ˙ ≥ 0 for all n ∈ [n n]. ¯ for all n ∈ [n n]. PROOF: We consider the work cost model.18 Suppose by contradiction that T > T0 for some n. Then, because T0 and T1 are continuous in n and because T = T0 at the top and bottom skills, there exists an interval (na nb ) where T > T0 and where T1 = T0 at the end points na and nb . This implies that z1 < z0 1 1 1
17
Assumption 4 can be seen as a counterpart to Assumption 1 for the participation margin. It ensures that the participation response does not decrease too fast with the tax rate. It was not needed for the small reform argument, because in that case the efficiency effects from participation responses cancel out to the first order. 18 Results 2a and 2b may be established by reversing all inequalities in the proof below.
550
H. J. KLEVEN, C. T. KREINER, AND E. SAEZ
on (na nb ) with equality at the end points. Assumption 1 implies z0 z1 z1 ε1 T1 /(1 − T1 ) = 1 − h h n n n z1 z0 z0 > 1 − h h n n n = ε0 T0 /(1 − T0 ) on (na nb ). Then, because of our no bunching assumption, (7) and (8) imply n¯ 1 [(1 − g0 )(1 − P) + T · p]f (n ) dn 1−P n 1 n¯ < [(1 − g1 )P − T · p]f (n ) dn ≡ Ω1 (n) P n
Ω0 (n) ≡
on (na nb ) with equality at the end points. This implies that the derivatives of the above expressions with respect to n, at the end points, obey the inequalities Ω˙ 0 (na ) ≤ Ω˙ 1 (na ) and Ω˙ 0 (nb ) ≥ Ω˙ 1 (nb ). At the end points, we have T1 = T0 , z0 = z1 , and V˙0 = V˙1 , which implies q˙¯ = 0 and P˙ = 0. Hence, the inequalities in derivatives can be written as ≥ 1 − g1 − T · p/P at na , 1 − g0 + T · p/(1 − P) ≤ 1 − g1 − T · p/P at nb . Combining these inequalities, we obtain
T · p
T · p ≥ g0 (na ) − g1 (na ) > g0 (nb ) − g1 (nb ) ≥ P(1 − P) na P(1 − P) nb From our small reform argument, the middle inequality is intuitive and we prove it formally in Appendix B. Using that q¯ = w − T at na and nb , along with the first part of Assumption 4, we obtain T (na ) > T (nb ). However, given T1 > T0 and hence z1 < z0 , we have q˙¯ < 0 on the interval (na nb ). This ¯ a ) ≥ q(n ¯ b ) and thus T (na ) ≤ T (nb ). This generates a contradicimplies q(n tion, which proves that T1 ≤ T0 for all n. ¯ with Property 1a follows easily from 1b. Since we now have T1 ≤ T0 on (n n) ¯ with equality equality at the end points, we obtain Ω0 (n) ≥ Ω1 (n) on (n n) ¯ ≤ Ω˙ 1 (n), ¯ which implies 1 − g0 + at the end points. Then we have that Ω˙ 0 (n) ¯ Because g0 (n) ¯ − g1 (n) ¯ > 0, we have
T · p/(1 − P) ≥ 1 − g1 − T · p/P at n. ¯ > 0. Finally, T1 ≤ T0 and hence z1 ≥ z0 implies q˙¯ = V˙1 − V˙0 ≥ 0 from
T (n) ¯ ¯ n))/w ¯ ¯ equation (3). Hence, τ(n) = (w − q(n))/w ≥ (w − q( = T (n)/w >0 ¯ for all n, where the last equality follows from T1 = T0 = 0 at n. Q.E.D.
OPTIMAL INCOME TAXATION OF COUPLES
551
We may summarize our findings as follows. In the work cost model, secondearner participation is a signal of low work costs and hence being better off than one-earner couples. This implies g0 (n) > g1 (n), which makes it optimal to tax secondary earnings, τ > 0. In the home production model, second-earner participation is a signal of low ability in home production and hence being worse off than one-earner couples. In this model, it is therefore optimal to subsidize secondary earnings, τ < 0.19 In either model, the redistribution between one- and two-earner couples gives rise to a distortion in the entry–exit decision of secondary earners, creating an equity–efficiency trade-off. The size of the efficiency cost does not depend on the ability of the primary earner, because spousal characteristics q and n are independently distributed. An increase in n therefore influences the optimal second-earner distortion only through its impact on the equity gain as reflected by g0 (n) − g1 (n). Because the contribution of the secondary earner to couple utility is declining in relative terms, the value of redistribution between one- and two-earner couples is declining in n, that is, g0 (n) − g1 (n) is decreasing in n. Therefore, the second-earner distortion is declining with primary earnings. As shown in Proposition 2, if the ability distribution of primary earners is unbounded, the secondary-earner distortion tends to zero at the top.20 Instead of working with a social welfare function Ψ (·), if we assume exogenous Pareto weights (λ0 (n) λ1 (n)), then the social marginal welfare weights g0 (n) = λ0 (n)/λ and g1 = λ1 (n)/λ would be fixed a priori. Optimal tax formulas (7) and (8) would carry over. Positive versus negative second-earner tax rates would depend on the sign of λ0 (n) − λ1 (n), and positive versus negative jointness would depend on the profile of λ0 (n) − λ1 (n) with respect to n. The asymptotic zero tax result would be true iff λ0 (n) − λ1 (n) → 0 as n → ∞. Hence, all results would depend on the assumptions made on the exogenous Pareto weights. Unlike our reform argument, the negative jointness result in Proposition 3 relies on an assumption of no bunching. As we discuss in the online supplemental material, when redistributive tastes are weak, the optimal solution is close to the no-tax situation and therefore should display no bunching.21 For strong redistributive tastes, our numerical simulations show that there is no bunching in a wide set of cases. 19 In a more general model with both costs of work and home production, there should be a tax (subsidy) on secondary earnings if there is more (less) heterogeneity in work costs than in home production abilities (see the online supplemental material for a discussion). 20 If Ψ is quadratic, then g0 − g1 is constant in n and the optimal tax system is separable. If Ψ is concave, then g0 − g1 increases in n and the distortion on spouses actually increases with n. As discussed above, the case Ψ convex (Assumption 2) fits best with the intuition that secondary earnings affect marginal social utility less when primary earnings are higher. 21 This is also true in the one-dimensional model. We provide a simple formal proof of this in the online supplemental material.
552
H. J. KLEVEN, C. T. KREINER, AND E. SAEZ
4. NUMERICAL CALIBRATION FOR THE UNITED KINGDOM Numerical simulations are conceptually important (i) to assess whether our no bunching assumption in Proposition 3 is reasonable, (ii) to assess how quickly the second-earner tax rate decreases to zero (scope of Proposition 2), and (iii) to analyze if and to what extent optimal schedules resemble real-world schedules. We focus on the more realistic and traditional work cost model and make the following parametric assumptions: (a) h(x) = ε/(1 + ε)x1+1/ε so that the elasticity of primary earnings ε is constant; (b) q is distributed as a power function on the interval [0 qmax ] with distribution function P(q) = (q/qmax )η , implying a constant second-earner participation elasticity η; (c) the social welfare function is CRRA, Ψ (V ) = V 1−γ /(1 − γ), where γ > 0 measures preferences for equity. We calibrate the ability distribution F(n) and qmax using the British Family Resource Survey for 2004/5 linked to the tax-benefit microsimulation model TAXBEN at the Institute for Fiscal Studies. We define the primary earner as the husband and the secondary earner as the wife. Figure 3A depicts the ac-
FIGURE 3.—Numerical simulations: current system. Computations are based on the British Family Resource Survey for 2004/05 and TAXBEN tax/transfer calculator.
OPTIMAL INCOME TAXATION OF COUPLES
553
tual tax rates T0 , T1 , and τ faced by couples in the United Kingdom. As in Saez (2001), f (n) is calibrated such that, at the actual marginal tax rates, the resulting distribution of primary earnings matches the empirical earnings distribution for married men. The top quintile of the distribution (n ≥ £46000) is approximated by a Pareto distribution with coefficient a = 2, a good approximation according to Brewer, Saez, and Shephard (2008). Figure 3B depicts the calibrated density distribution f (n). The dashed line is the raw density distribution and the solid line is the smoothed density that we use to obtain smooth optimal schedules. Figure 3C shows that the participation rate of wives conditional on husbands’ earnings is fairly constant across the earnings distribution and equal to 75% on average. Figure 3D shows that average female earnings, conditional on participation, are slightly increasing in husbands’ earnings. Our model with homogenous secondary earnings does not capture this feature. We therefore assume (except when we explore the effects of assortative matching below) that qmax (and hence q) is independent of n. We calibrate qmax so that the average participation rate (under the current tax system) matches the empirical rate. The w parameter is set equal to average female earnings conditional on participation.22 Based on the empirical labor supply literature for the United Kingdom (see Brewer, Saez, and Shephard (2008)), we assume ε = 025 and η = 05 in our benchmark case. Based on estimates of the curvature of utility functions consistent with labor supply responses, we set γ equal to 1 (see, e.g., Chetty (2006)). Finally, we assume that the simulated optimal tax system (net of transfers) must collect as much tax revenue (net of transfers) as the actual U.K. tax system, which we compute using TAXBEN and the empirical data. In all simulations, we check that the implementation conditions (zl (n) increasing in n) are satisfied so that there is no bunching. All technical details of the simulations are described in the online supplemental material. Figure 4A plots the optimal T0 , T1 , and τ as a function of n in our benchmark case. Consistent with the theoretical results, we have T1 < T0 and τ declining in n. Consistent with earlier work on the single-earner model (e.g., Saez (2001)), optimal marginal tax rates on primary earners follow a U-shape, with very high marginal rates at the bottom corresponding to the phasing out of welfare benefits, lower rates at the middle, and increasing rates at the top converging to 667% = 1/(1 + a · ε). The difference between T1 and T0 is about 8 percentage points on average, and τ is almost 40% at the bottom and then declines toward zero fairly quickly. This suggests that the negative jointness property as well as the zero second-earner tax at the top are quantitatively 22 Positive correlation in abilities across spouses with income effects could also generate those empirical patterns. Analyzing a calibrated case with income effects is beyond the scope of this paper and is left for future work.
554
H. J. KLEVEN, C. T. KREINER, AND E. SAEZ
FIGURE 4.—Optimal tax simulations. Computations are based on the British Family Resource Survey for 2004/05 and TAXBEN tax/transfer calculator.
significant results and not just theoretical curiosities. Finally, notice that tax rates on primary earners are substantially higher than on secondary earners because the primary-earner elasticity is smaller than the secondary-earner elasticity. Figure 4B introduces a positive correlation in spousal abilities by letting qmax depend on n, so that the fraction of working spouses (under the current tax system) increases smoothly from 55% to 80% across the distribution of n. This captures indirectly the positive correlation in earnings shown in Figure 3D. Figure 4B shows that introducing this amount of correlation has minimal effects on optimal tax rates. Compared to no correlation, the second-earner tax is slightly higher at the bottom, which reinforces the declining profile for τ. Figure 4C explores the effects of increasing redistributive tastes γ from 1 to 2. Not surprisingly, this increases tax rates across the board. Figure 4D considers a higher primary-earner elasticity (ε = 05). As expected, this reduces primary-earner tax rates (especially at the top).
OPTIMAL INCOME TAXATION OF COUPLES
555
Importantly, none of our simulations displays bunching, which suggests that there is no bunching in a wide set of cases and hence that Proposition 3 applies broadly. Comparing the simulations with the empirical tax rates in Figure 3A is illuminating. The actual tax-transfer system also features negative jointness, with the second-earner tax rate falling from about 40% at the bottom to about 20% at the middle and upper parts of the primary earnings distribution. This may seem surprising at first glance given that the United Kingdom operates an individual income tax. However, income transfers in the United Kingdom (as in virtually all Organization for Economic Cooperation and Development countries) are means tested based on family income. The combination of an individual income tax and a family-based, meanstested welfare system generates negative jointness: a wife married to a lowincome husband will be in the phase-out range of welfare programs and hence faces a high tax rate, whereas a wife married to a high-income husband is beyond benefit phase-out and hence faces a low tax rate because the income tax is individual. Thus, our theoretical and numerical findings of negative jointness may provide a justification for the current practice in many countries of combining family-based transfers with individual income taxation.2324 Clearly, our calibration abstracts from several potentially important aspects such as income effects, heterogeneity in secondary earnings, and endogenous marriage. Hence, our simulations should be seen as an illustration of our theory rather than actual policy recommendation. More complex and comprehensive numerical calibrations are left for future work. APPENDIX A: PROOF OF PROPOSITION 1 The government maximizes W =
n¯ n
0
+
23
q¯
Ψ (V1 − qw )p(q|n) dq
∞
Ψ (V0 + q )p(q|n) dq f (n) dn h
q¯
Indeed, Immervoll, Kleven, Kreiner, and Verdelin (2008) showed that most European Union countries feature negative jointness at the bottom driven by family-based transfers. 24 As for the size and profile of primary-earner tax rates, the current U.K. schedule displays lower rates at the very bottom (below £6–7K) than the simulations. This might be justified by participation responses for low-income primary earners (Saez (2002)), not incorporated in our model. Above £6–7K, the current U.K. tax system does display a weak U-shape with the highest marginal rates at the bottom and modest increases above £40K.
556
H. J. KLEVEN, C. T. KREINER, AND E. SAEZ
where q¯ = V1 − V0 , q = qw + qh and either qw = 0 or qh = 0. The objective is maximized subject to the budget constraint n¯ z1 ¯ − V1 P(q|n) z1 + w − nh n n
z0 ¯ − V0 (1 − P(q|n)) + z0 − nh f (n) dn ≥ 0 n and the constraints from household optimization, V˙l = −h(zl /n)+zl /nh (zl /n) for l = 0 1. Let λ, μ0 (n), and μ1 (n) be the associated multipliers, and let H(z0 z1 V0 V1 μ0 μ1 λ n) be the Hamiltonian. We demonstrate the existence of a measurable solution n → z(n) in the online supplemental material. The Pontryagin maximum principle then provides necessary conditions that hold at the optimum: (i) There exist absolutely continuous multipliers (μ0 (n) μ1 (n)) such that ¯ μ˙ l (n) = −∂H/∂Vl almost everywhere in n with transversality condion (n n), ¯ = 0 for l = 0 1. tions μl (n) = μl (n) (ii) We have H(z(n) V (n) μ(n) λ n) ≥ H(z V (n) μ(n) λ n) for all z almost everywhere in n. The first-order conditions associated with this maximization condition are ∂H μ0 z0 z0 z0 ¯ (A1) · ·h +λ· 1−h · (1 − P(q|n)) · f (n) = ∂z0 n n n n = 0
(A2)
μ1 z1 z1 ∂H z1 ¯ · ·h +λ· 1−h · P(q|n) · f (n) = 0 = ∂z1 n n n n
By Assumption 1, ϕ(x) ≡ (1 − h (x))/(xh (x)) is decreasing in x. Rewriting ¯ (A1) as ϕ(z0 /n) = −μ0 (n)/[λ(1 − P(q|n))nf (n)], Assumption 1 implies that (A1) has a unique solution z0 (n), and that ∂H/∂z0 > 0 for z0 < z0 (n) and ∂H/∂z0 < 0 for z0 > z0 (n). This ensures that z0 (n) is indeed the global maximum for H as required in the Pontryagin maximum principle. Obviously, the state variable V (n) is continuous in n. Thus, ϕ(z0 (n)/n) = −μ0 (n)/[λ(1 − P(V1 (n) − V0 (n)|n))nf (n)] implies that z0 (n) is continuous in n.25 Similarly, z1 (n) is continuous in n.26 By defining Tl ≡ 1 − h (zl (n)/n), we have that (T0 T1 ) is also continuous in n.27 The assumption that n → f (n) and x → h (x) are continuous is required here. Those continuity results also apply to the one-dimensional case and were explicitly derived by Mirrlees (1971) under a condition equivalent to our Assumption 1. The subsequent literature almost always assumes continuity. 27 Notice that we adopt this definition of Tl everywhere, including points where z → T (z) has a kink. 25 26
OPTIMAL INCOME TAXATION OF COUPLES
557
The conditions μ˙ l (n) = −∂H/∂Vl for l = 0 1 imply ∞ ¯ (A3) Ψ (V0 + qh )p(q|n)f (n) dq − λ(1 − P(q|n))f (n) −μ˙ 0 (n) = q¯
(A4)
¯ (n) − λ[T1 − T0 ]p(q|n)f q¯ ¯ Ψ (V1 − qw )p(q|n)f (n) dq − λP(q|n)f (n)T1 −μ˙ 1 (n) = 0
¯ (n) + λ[−T0 ]p(q|n)f Using the definition of welfare weights, g0 (n) and g1 (n), we integrate (A3) and (A4) using the upper transversality conditions so as to obtain n¯ μ0 (n) ¯ ))f (n ) − = [1 − g0 (n )](1 − P(q|n λ n ¯ )f (n ) dn + [T1 − T0 ]p(q|n n¯ μ1 (n) ¯ )f (n ) − [T1 − T0 ]p(q|n ¯ )f (n ) dn = − [1 − g1 (n )]P(q|n λ n Inserting these two equations into (A1) and (A2), noting that Tl = 1 − hl , and using the elasticity definition εl = h (zl /n)/[zl /nh (zl /n)], we obtain equations (7) and (8) in Proposition 1. The transversality conditions μ0 = μ1 = 0 at n and n¯ combined with (A1) ¯ and (A2) imply that h (z0 /n) = h (z1 /n) = 1 and hence T1 = T0 = 0 at n and n. As shown in the online supplemental material, a necessary and sufficient condition for implementability is that z0 and z1 are weakly increasing in n (exactly as in the one-dimensional Mirrlees model). If (7) and (8) generate decreasing ranges for z0 or z1 , there is bunching and the formulas do not apply on the bunching portions. It is straightforward to include the constraints z˙ l (n) ≥ 0 in the maximization problem (as in Mirrlees (1986)).28 On a bunching portion, zl (n) is constant (say equal to z ∗ ) and hence Tl = 1 − h (z ∗ /n) remains continuous in n as stated in Proposition 1, but z → Tl (z) jumps discontinuously at z ∗ and z → Tl (z) displays a kink at z ∗ . Hence the optimal solution z → T (z) is continuous and z → T (z) is piecewise continuous. We do not establish that the solution is unique, but uniqueness is not required for our results. Uniqueness would follow from the concavity of (z V ) → H(z V μ(n) λ n), but this is a very strong assumption. In the simulations, we can check numerically that, under our parametric assumptions, the stronger concavity assumptions required for uniqueness hold in the domain of interest so that we are sure the numerical solution we find is indeed the global optimum. 28 We do not include such constraints formally so as to simplify the exposition and because our main Proposition 3 assumes no bunching and our simulations never involve bunching.
558
H. J. KLEVEN, C. T. KREINER, AND E. SAEZ
APPENDIX B: PROOF OF LEMMA IN PROPOSITION 3 LEMMA B1: Under Assumptions 1–4, if T1 > T0 on (na nb ) with equality at the end points, then g0 (na ) − g1 (na ) > g0 (nb ) − g1 (nb ). q¯ PROOF: We have q¯ = V1 −V0 and g0 −g1 = Ψ (V0 )/λ− 0 Ψ (V1 −q)p(q) dq/ ¯ > 0 (inequality follows from Ψ decreasing). Differentiating with re(λ · P(q)) spect to n, we obtain q¯ Ψ (V1 − q)p(q) dq (V Ψ 0) 0 − V˙1 · g˙ 0 − g˙ 1 = V˙0 · ¯ λ λ · P(q) q¯ Ψ (V1 − q)p(q) dq ¯ q˙¯ Ψ (V ) p(q) 0 · 0 − + ¯ ¯ P(q) λ · P(q) λ which can be rewritten as
(B1)
q¯
Ψ (V0 + q¯ − q)p(q) dq (V ) Ψ 0 − 0 g˙ 0 − g˙ 1 = V˙1 · ¯ λ λ · P(q) ¯ Ψ (V0 ) p(q) ˙ − + q¯ · −(g0 − g1 ) · ¯ P(q) λ
The first term in (B1) is negative, because V˙1 > 0 and Ψ is increasing (by Assumption 2) so that the term inside the first square brackets is negative. On (na nb ), z1 < z0 and hence q˙¯ < 0. Moreover, convexity of Ψ implies Ψ (V0 ) − Ψ (V0 + q¯ − q) ≤ −Ψ (V0 ) · (q¯ − q) and hence q¯ [Ψ (V0 ) − Ψ (V0 + q¯ − q)]p(q) dq 0 g0 − g1 = (B2) ¯ λ · P(q) q¯ P(q) dq ≤ −Ψ (V0 ) · 0 ¯ λ · P(q) q¯ q¯ where we have used that 0 (q¯ − q)p(q) dq = 0 P(q) dq by integration by parts and P(0) = 0. Combining (B2) and the second part of Assumption 4, we have ¯ ¯ ≤ −Ψ (V0 )/λ Thus, the second term in square brackets q) (g0 − g1 ) · p(q)/P( in (B1) is nonnegative, making the entire second term in (B1) nonpositive. As Q.E.D. a result, g˙ 0 (n) − g˙ 1 (n) < 0 on (na nb ) and the lemma is proven.
OPTIMAL INCOME TAXATION OF COUPLES
559
REFERENCES ARMSTRONG, M. (1996): “Multiproduct Nonlinear Pricing,” Econometrica, 64, 51–75. [538] BOSKIN, M., AND E. SHESHINSKI (1983): “Optimal Tax Treatment of the Family: Married Couples,” Journal of Public Economics, 20, 281–297. [538] BRETT, C. (2006): “Optimal Nonlinear Taxes for Families,” International Tax and Public Finance, 14, 225–261. [538] BREWER, M., E. SAEZ, AND A. SHEPHARD (2008): “Means Testing and Tax Rates on Earnings,” IFS Working Paper. Forthcoming in Reforming the Tax System for the 21st Century, Oxford University Press, 2009. [553] CHETTY, R. (2006): “A New Method of Estimating Risk Aversion,” American Economic Review, 96, 1821–1834. [553] CREMER, H., J. LOZACHMEUR, AND P. PESTIEAU (2007): “Income Taxation of Couples and the Tax Unit Choice,” CORE Discussion Paper No. 2007/13. [538] CREMER, H., P. PESTIEAU, AND J. ROCHET (2001): “Direct versus Indirect Taxation: The Design of the Tax Structure Revisited,” International Economic Review, 42, 781–799. [538] IMMERVOLL, H., H. J. KLEVEN, C. T. KREINER, AND N. VERDELIN (2008): “An Evaluation of the Tax-Transfer Treatment of Married Couples in European Countries,” Working Paper 2008-03, EPRU. [555] KLEVEN, H. J., C. T. KREINER, AND E. SAEZ (2006): “The Optimal Income Taxation of Couples,” Working Paper 12685, NBER. [538,544] (2009): “Supplement to ‘The Optimal Income Taxation of Couples’,” Econometrica Supplementary Material, 77, http://www.econometricsociety.org/ecta/Supmat/7343_Proofs.pdf and http://www.econometricsociety.org/ecta/Supmat/7343_Data and programs.zip. [539] MIRRLEES, J. A. (1971): “An Exploration in the Theory of Optimal Income Taxation,” Review of Economic Studies, 38, 175–208. [537,543,556] (1976): “Optimal Tax Theory: A Synthesis,” Journal of Public Economics, 6, 327–358. [538] (1986): “The Theory of Optimal Taxation,” in Handbook of Mathematical Economics, Vol. 3, ed. by K. J. Arrow and M. D. Intrilligator. Amsterdam: Elsevier. [538,557] RAMEY, V. A. (2008): “Time Spent in Home Production in the 20th Century: New Estimates From Old Data,” Working Paper 13985, NBER. [540] ROCHET, J., AND C. PHILIPPE (1998): “Ironing, Sweeping, and Multi-Dimensional Screening,” Econometrica, 66, 783–826. [538] SAEZ, E. (2001): “Using Elasticities to Derive Optimal Income Tax Rates,” Review of Economic Studies, 68, 205–229. [540,544,545,553] (2002): “Optimal Income Transfer Programs: Intensive versus Extensive Labor Supply Responses,” Quarterly Journal of Economics, 117, 1039–1073. [555] SCHROYEN, F. (2003): “Redistributive Taxation and the Household: The Case of Individual Filings,” Journal of Public Economics, 87, 2527–2547. [538]
Dept. of Economics, London School of Economics, Houghton Street, London WC2A 2AE, U.K. and Economic Policy Research Unit, Dept. of Economics, University of Copenhagen, Copenhagen, Denmark and Centre for Economic Policy Research, London, U.K.;
[email protected], Dept. of Economics, University of Copenhagen, Studiestraede 6, # 1455 Copenhagen, Denmark and Economic Policy Research Unit, Dept. of Economics, University of Copenhagen, Copenhagen, Denmark and CESifo, Munich, Germany, and
560
H. J. KLEVEN, C. T. KREINER, AND E. SAEZ
Dept. of Economics, University of California–Berkeley, 549 Evans Hall 3880, Berkeley, CA 94720, U.S.A. and NBER;
[email protected]. Manuscript received August, 2007; final revision received August, 2008.
Econometrica, Vol. 77, No. 2 (March, 2009), 561–584
DIRECTED SEARCH FOR EQUILIBRIUM WAGE–TENURE CONTRACTS BY SHOUYONG SHI1 I construct a theoretical framework in which firms offer wage–tenure contracts to direct the search by risk-averse workers. All workers can search, on or off the job. I characterize an equilibrium and prove its existence. The equilibrium generates a nondegenerate, continuous distribution of employed workers over the values of contracts, despite that all matches are identical and workers observe all offers. A striking property is that the equilibrium is block recursive; that is, individuals’ optimal decisions and optimal contracts are independent of the distribution of workers. This property makes the equilibrium analysis tractable. Consistent with stylized facts, the equilibrium predicts that (i) wages increase with tenure, (ii) job-to-job transitions decrease with tenure and wages, and (iii) wage mobility is limited in the sense that the lower the worker’s wage, the lower the future wage a worker will move to in the next job transition. Moreover, block recursivity implies that changes in the unemployment benefit and the minimum wage have no effect on an employed worker’s job-to-job transitions and contracts. KEYWORDS: Directed search, on-the-job search, wage–tenure contracts.
1. INTRODUCTION SEARCH ON THE JOB is prevalent and generates large job-to-job transitions. On average, 2.6 percent of employed workers in the United States change employers each month, and nearly two-fifths of new jobs represent employer changes (Fallick and Fleischman (2004)). This large flow of workers between jobs exhibits three stylized patterns. First, the longer the tenure that a worker has on his current job, the less likely he will quit for another job (Farber (1999)). Second, controlling for individual heterogeneity, wage is a key determinant of mobility: a worker with a higher wage is less likely to quit for another job (Topel and Ward (1992)). Third, wage mobility is limited in the following sense: controlling for individual characteristics, most of the transitions take place between adjacent quintiles of wages at the lower end of the wage distribution and probabilities of staying in a quintile are higher at the higher quintiles (Buchinsky and Hunt (1999)). 1
I thank a co-editor and three anonymous referees for providing thoughtful suggestions that led to significant improvements in the paper. I also thank Giuseppe Moscarini, Guido Menzio, and Alok Kumar for serving as discussants of the paper. Useful comments have also been received from the participants of workshops and conferences at University of Pennsylvania (2006), Rochester (2006), Yale (2006), University of British Columbia, Calgary, Pittsburgh, the Microfoundations of Markets With Frictions (UQAM, 2007), Research on Money and Markets (Toronto, 2007), the NBER Summer Institute (Cambridge, 2007), Canadian Macro Study Group Meeting (Ottawa, 2007), Cornell–Penn State Macro Conference (2008), and the Econometric Society Winter Meeting (2008). Financial support from a Bank of Canada Fellowship and from the Social Sciences and Humanities Research Council of Canada is gratefully acknowledged. The opinions expressed in the paper are my own and do not reflect the view of the Bank of Canada. All errors are mine. © 2009 The Econometric Society
DOI: 10.3982/ECTA7870
562
SHOUYONG SHI
To explain these facts, I construct a theoretical framework to integrate wage– tenure contracts and on-the-job search. In the model, firms enter the labor market competitively and offer wage–tenure contracts. Workers are risk-averse and identical, all of whom can search for jobs. Search is directed in the sense that, when making search decisions, agents take into account that a higher offer yields a lower matching rate for an applicant and a higher matching rate for a firm. Firms can commit to the contracts but workers cannot commit to staying with a firm. I characterize an equilibrium, prove its existence, and explore its properties. The framework provides consistent explanations for the above facts. First, wages increase with tenure, and job-to-job transitions decrease with tenure and wages. Making wages increase continuously with tenure is the optimal way for a firm to backload wages when workers are risk-averse. As wages rise with tenure, a worker is less likely to quit because the probability of finding higher wages elsewhere falls. Second, directed search strengthens the negative dependence of job-to-job transitions on wages. As an optimal trade-off between offers and matching rates, workers with low wages choose to search for relatively low offers. Because low offers are relatively easier to obtain, low-wage workers make job transitions with higher probabilities than high-wage workers. Third, and similarly, directed search generates limited wage mobility. By climbing up the wage ladder gradually, workers maximize the expected gain from search in each job transition. An equilibrium has a nondegenerate, continuous distribution of wages or values, despite the assumptions that all matches are identical and all workers observe all offers. On-the-job search generates a wage ladder among identical workers by creating dispersion among workers’ histories of search outcomes. Wage–tenure contracts fill in the gap between any two rungs of the ladder by increasing wages continuously with tenure. In addition to explaining the stylized facts, this paper formalizes and explores a key property of an equilibrium with directed search, called block recursivity. That is, individuals’ decisions and equilibrium contracts are independent of the distribution of workers over wages, although the distribution affects aggregate statistics. In general, the nondegenerate distribution can serve as a state variable in individuals’ decisions. By eliminating this role of the distribution, block recursivity makes the equilibrium analysis tractable. Block recursivity arises from directed search, because the optimal trade-off between offers and matching rates implies that workers at different wages choose to apply for different offers. With such endogenous separation, the workers who apply for a particular offer care about only the matching rate at that offer, but not the distribution of workers over other offers. In turn, the matching rate at each offer is determined by free entry of firms independently of the distribution of workers. Besides tractability, block recursivity has a novel policy implication—changes in the unemployment benefit and the minimum wage have no effect on an employed worker’s job-to-job transitions.
SEARCH FOR EQUILIBRIUM WAGE–TENURE CONTRACTS
563
This paper is closely related to Burdett and Coles (2003; BC henceforth). Both papers predict that on-the-job search induces firms to backload wages, thus making wages increase and quit rates decrease with tenure. As a main difference, BC assumed that search is undirected as workers exogenously receive offers.2 With undirected search, wage mobility is not limited as in the data, because all workers have the same probability of obtaining any particular offer regardless of their current wages. Moreover, to sustain a nondegenerate distribution of wages among identical matches, BC assumed that every worker observes at most one offer before applying. In contrast, I assume that all workers observe all offers, which makes wage dispersion more robust and on-the-job search more potent for explaining worker turnover. In addition, a different apparatus is required to establish the existence of an equilibrium with directed search. I model directed search as in Moen (1997) and Acemoglu and Shimer (1999a, 1999b). To the literature of directed search, the main contributions here are to incorporate wage–tenure contracts and on-the-job search, and to formally establish the existence of an equilibrium.3 Moen and Rosen (2004) examined on-the-job search with contracts, but their assumption that on-the-job search is entirely driven by changes in productivity eliminates the main issues and theoretical challenges that I face here. Delacroix and Shi (2006) examined directed search on the job with identical workers, but they assumed that firms offer constant wages, rather than wage–tenure contracts. In this paper, all matches are identical and the productivity of a match is public information. Although heterogeneity, private information, and learning about productivity are important for wage dynamics and turnover in reality, as modeled by Jovanovic (1979), Harris and Holmstrom (1982), and Moscarini (2005), abstracting from them enables me to focus on search. Most of the proofs are omitted in this paper, but are available as Supplemental Material (Shi (2009)). 2. THE MODEL ENVIRONMENT Consider a labor market that lasts forever in continuous time. There is a unit measure of homogeneous, risk-averse workers whose utility function in each period is u(w), where w is income. The utility function has the standard properties: 0 < u (w) < ∞ and −∞ < u (w) < 0 for all w ∈ (0 ∞), and u (0) = ∞. Workers cannot borrow against their future income. An employed 2 BC extended the wage-posting model of Burdett and Mortensen (1998). Undirected search is also the feature of another class of search models, pioneered by Diamond (1982), Mortensen (1982), and Pissarides (1990). 3 Peters (1984, 1991) and Montgomery (1991) are two of the earliest formulations of directed search. Other examples of directed search models include Julien, Kennes, and King (2000), Burdett, Shi, and Wright (2001), Shi (2001, 2002), Coles and Eeckhout (2003), and Galenianos and Kircher (2005).
564
SHOUYONG SHI
worker produces a flow of output y > 0 and an unemployed worker enjoys the unemployment benefit b > 0. A worker dies at a Poisson rate δ ∈ (0 ∞) and is replaced with a newborn who is unemployed. Death is the only exogenous separation. Firms are identical and risk-neutral. Jobs enter the market competitively: a firm can post a vacancy at a flow cost k > 0 and can treat different jobs independently. Firms announce wage–tenure contracts to recruit. A contract ∞ ˜ is denoted as W = {w(t)} t=0 , which specifies the wage at each tenure length t, conditional on that the worker stays with the firm. Firms are assumed to commit to the contracts, but workers can quit a job at any time. In particular, a firm cannot respond to the employee’s outside offers. Normalize the production cost to 0. Firms and workers discount future with the same rate of time preference ρ ≥ 0. Denote r = ρ + δ as the effective discount rate.4 Throughout this paper, t denotes tenure rather than calendar time. Denote V (t) as the value of a contract at t, that is, the worker’s lifetime expected utility generated by the remaining contract at t. The value of a contract at t = 0 is called an offer and is denoted as x = V (0). Denote an unemployed worker’s ˜ “tenure” as t = ∅, the unemployment benefit as b = w(∅), and the value of unemployment as Vu = V (∅). All offers are bounded in [V V¯ ], where (2.1)
¯ V¯ = u(w)/r
V = u(b)/r;
w¯ is the highest wage given later by (3.8), V¯ is the lifetime utility of a worker who is employed at w¯ permanently until death, and V is the lifetime utility of a worker who stays in unemployment forever. I use the phrases “all x” and “all V ” to mean all values in [V V¯ ]. All workers, employed or unemployed, can search. There is a continuum of submarkets indexed by the offer x. Each submarket x has a tightness θ(x), which is the ratio of applicants to vacancies in that submarket. The total number of matches in submarket x is given by a linearly homogeneous matching function M(N(x) N(x)/θ(x)), where N(x) is the number of applicants in the submarket. In submarket x, a vacancy is filled at the Poisson rate q(x) ≡ M(θ(x) 1) and an applicant obtains an offer at the rate p(x) ≡ M(1 1/θ(x)). I refer to q(·) as the hiring rate function and to p(·) as the employment rate function. In an equilibrium, q(x) is increasing and p(x) is decreasing in x. Thus, search is directed in the sense that agents face a tradeoff between offers and matching rates when choosing which submarket to enter. Since the matching rates act as hedonic prices in each submarket, search is competitive.5 4 The assumptions on the contracts and the separation process are the same as in BC. The main difference of my model and BC is that search is directed here. Also, I do not impose BC’s assumption u(0) = −∞. For a model in which firms can counter outside offers, see Postel-Vinay and Robin (2002). 5 Moen (1997) and Acemoglu and Shimer (1999a, 1999b) formulated this competitive process of directed search. An alternative is to formulate the process as a strategic game, for example,
SEARCH FOR EQUILIBRIUM WAGE–TENURE CONTRACTS
565
Although the function M is exogenous, the functions q(·), p(·), and θ(·) are equilibrium objects. Following Moen and Rosen (2004), I eliminate θ from the above expressions for p and q to express p(x) = M(q(x)). Because the function M(q) inherits all essential properties of the function, M, I will take M(·) as a primitive of the model and refer to it as the matching function. I focus on stationary equilibria where the set of offered contracts and the functions q(x) and p(x) are time invariant. Moreover, I focus on an equilibrium in which p(·) satisfies
(2.2)
(i) p(V¯ ) = 0, (ii) p(x) is bounded, continuous and concave for all x, (iii) p(x) is strictly decreasing and continuously differentiable for all x < V¯ .
I will first characterize individuals’ decisions under any arbitrary p function that satisfies (2.2) and then verify, in Theorem 4.1, that an equilibrium satisfying (2.2) exists indeed. 3. WORKERS’ AND FIRMS’ OPTIMAL DECISIONS 3.1. A Worker’s Optimal Search Decision Refer to a worker’s value, V , as the worker’s state or type. If the worker searches in submarket x, he obtains the offer x at rate p(x), which yields the gain (x − V ). The expected gain from search in submarket x is p(x)(x − V ). The optimal search decision x solves (3.1)
S(V (t)) ≡ max p(x)(x − V ) x∈[V (t)V¯ ]
Denote the solution as x = F(V ). I prove the following lemma in Appendix A: LEMMA 3.1: Assume (2.2). Then F(V¯ ) = V¯ . For all V < V¯ , the following results hold: (i) F(V ) is interior, strictly increasing in V , and satisfies (3.2)
V = F(V ) +
p(F(V )) p (F(V ))
(ii) F(V ) is unique for each V and continuous in V . (iii) S(V ) is differentiable, with S (V ) = −p(F(V )) < 0. (iv) F(V2 ) − F(V1 ) ≤ (V2 − V1 )/2 for all V2 ≥ V1 . (v) If p (·) exists, then F (V ) and S (V ) exist, with 0 < F (V ) ≤ 1/2. Peters (1991), Burdett, Shi, and Wright (2001), and Julien, Kennes, and King (2000). The strategic formulation endogenizes the matching function M, but the function converges to a linearly homogeneous function when the number of participants in the market goes to infinity. Moreover, Acemoglu and Shimer (1999b) relaxed the assumption that each applicant observes all offers, and Galenianos and Kircher (2005) allowed each applicant to send two or more applications at once.
566
SHOUYONG SHI
The following properties are noteworthy. First, F(V ) is unique for each V . For a worker at the state V , offers higher than F(V ) have too low employment rates to be optimal, while offers lower than F(V ) have too low values. Only the offer F(V ) provides the optimal trade-off between the value and the employment rate. Second, F(V ) is strictly increasing in V . That is, the higher a worker’s state, the higher the offer for which the worker will apply. Thus, the applicants separate themselves according to their states. This endogenous separation arises because an applicant’s current job is a backup for him when he fails to obtain the applied job. The higher this backup value is, the more the worker can afford to “gamble” on the application and, hence, the higher the offer for which he will apply. Third, the expected gain from search, S(V ), and the actual gain in percentage, (F − V )/V , diminish as V increases. Moreover, S (V ) > 0; that is, the expected gain from search diminishes at a lower rate as V increases. Endogenous separation of the applicants is a common result in directed search models (see Acemoglu and Shimer (1999a), Shi (2001), Moen and Rosen (2004), and Delacroix and Shi (2006)). However, it is not a result in undirected search models (e.g., BC and Burdett and Mortensen (1998)). With undirected search, workers receive offers randomly and exogenously, and so there is no counterpart to the search decision in (3.1). In Section 5, I will contrast my model with undirected search models on worker turnover and wage mobility. 3.2. Value Functions of Workers and Firms Denote f˙ = df/dt for any variable f . Consider first an employed worker whose tenure is t ≥ 0. From the analysis above, the worker searches for the offer x(t) = F(V (t)). At rate p(F(V (t))), he gets the offer and quits the current job. If the worker does not get the offer, he stays in the current contract whose value increases by V˙ . Taking into account time discounting and the event of death, the value for the worker obeys ˜ ρV (t) = u(w(t)) + V˙ (t) + p F(V (t)) F(V (t)) − V (t) − δV (t) Using S(V ) defined by (3.1) and the effective discount rate r = ρ + δ, I can rewrite (3.3)
˜ − S(V (t)) V˙ (t) = rV (t) − u(w(t))
Because I focus on stationary equilibria, the change V˙ is entirely caused by ˜ changes in wages with tenure. If wages are constant, then w(t) = w˜ and
SEARCH FOR EQUILIBRIUM WAGE–TENURE CONTRACTS
567
V˙ (t) = 0 for all t.6 In particular, because the unemployment benefit is constant, V˙u = 0, and Vu obeys (3.4)
0 = rVu − u(b) − S(Vu )
Since S(Vu ) > 0, it is clear that Vu > V , where V is defined in (2.1). Now consider the value of a firm whose worker has a contract with a re˜ denote this firm’s value. Similar to (3.3), I can maining value, V (t). Let J(t) derive ˜ ˜ − y + w(t) ˜ (3.5) d J(t)/dt = r + p F(V (t)) J(t) ˜ is bounded above and below for all t. For any arbitrary ta ∈ [0 t], Note that J(t) define t γ(t ta ) ≡ exp − (3.6) r + p F(V (τ)) dτ ta
Since limt→∞ γ(t ta ) = 0, integrating (3.5) yields ∞ ˜ ˜ J(ta ) = (3.7) [y − w(t)]γ(t ta ) dt ta
3.3. Optimal Recruiting Decisions and Contracts A firm’s recruiting decision contains two parts. The first is to choose an of˜ fer x to maximize the expected value of recruiting, q(x)J(0), taking the function q(·) as given. The second part is to choose a contract to deliver the value x ˜ and to maximize J(0). For the first part, I will later show that there is a continuum of optimal offers, denoted as V = [v1 V¯ ], where v1 ≡ F(Vu ). A high offer increases the chance of filling the vacancy but yields lower profit ex post. A low offer yields higher ex post profit, but reduces the chance of filling the vacancy. A firm is indifferent among the offers in V , because they all yield the same expected value of ˜ = k for all x ∈ V , where k is the vacancy cost. recruiting. That is, q(x)J(0) ¯ ¯ To derive w, ¯ note The highest offer, V , is delivered by the highest wage, w. that p(F(V¯ )) = p(V¯ ) = 0. Since (3.1) implies S(V¯ ) = 0 and (2.1) implies V¯ = ¯ u(w)/r, (3.3) implies V˙ = 0 at V¯ . Similarly, (3.5) implies that the value of a ¯ firm that employs a worker at w¯ is J = (y − w)/r. Because q(V¯ )J = k, then 6 Although a worker can quit the job to become unemployed, it is not optimal to do so in an equilibrium, because optimal contracts provide higher values in employment than in unemployment. Also, because an employed worker never returns to unemployment, the worker has no incentive to save provided that wages increase with tenure.
568
SHOUYONG SHI
w¯ = y − rk/q(V¯ ). Let q¯ ∈ (0 ∞) be the upper bound on q, discussed further ¯ if q(V¯ ) < q, ¯ offering a constant wage slightly in Assumption 1. Then q(V¯ ) = q: above w¯ would yield a higher expected value to the firm.7 Therefore, (3.8)
w¯ = y − rk/q¯ (< y)
J = k/q¯ (> 0)
∞ ˜ For the contracting part of a firm’s decisions, the optimal contract {w(t)} t=0 solves
˜ subject to V (0) = x (P ) max J(0) This problem differs from that in BC in two respects. First, BC assumed ˜ ˜ u(0) = −∞ to prove w(t) > 0 and d w(t)/dt > 0 for all t. This assumption is not necessary in the current model, because employed and unemployed workers face the same employment rate function. Second, the quit rate of a worker employed at V is p(F(V )) here, but it is λ[1 − Q(V )] in BC, where Q(·) is the distribution of offers and λ is a constant. Despite these differences, the following lemma is similar to the results in BC: LEMMA 3.2: Assume (2.2). Optimal contracts have the following features: ˜ ˜ ˜ (i) 0 < w(t) ≤ w¯ for all t > 0. (ii) d w(t)/dt > 0 for all t < ∞, w(t)
w¯ as t → ∞, and 2 ˜ ˜ [u (w(t))] d w(t) dp(F(V (t))) ˜ (3.9) = all t J(t) ˜ dt u (w(t)) dV ˜ ˜ J as (iii) V˙ (t) > 0 and d J(t)/dt < 0 for all t < ∞, with V (t) V¯ and J(t) t → ∞. Moreover, (3.10)
˜ d J(t)/dt =−
V˙ (t) ˜ u (w(t))
all t
Optimal contracts have several properties. First, wages are continuous and increasing in tenure for all finite tenure lengths. This property is generated by firms’ incentive to backload wages and workers’ risk aversion. Because a worker cannot commit to a job, a firm backloads wages to entice the worker to stay. A rising wage profile is less costly to the firm than a constant profile that promises the same value to the worker: as wages rise with tenure, it is more difficult for the worker to find a better offer elsewhere, and so the worker’s quit 7 Suppose that q(V¯ ) = q¯ − a for some a > 0. In this case, w¯ = y − rk/(q¯ − a). A firm that ¯ because deviates from w¯ to w¯ + ε, with ε > 0, attracts all of the workers who are employed at w, ¯ Thus, q(Vˆ ) = q, ¯ where Vˆ = the deviating firm is the only one that offers a wage higher than w. ¯ u(w¯ + ε)/r. The deviating firm’s expected value of recruiting is (y − w¯ − ε)q/r, that exceeds k for sufficiently small ε > 0. A contradiction.
SEARCH FOR EQUILIBRIUM WAGE–TENURE CONTRACTS
569
rate falls. However, if workers are risk neutral, one optimal way to backload wages is to offer a very low wage initially, with promised wage jumps in the future (see Stevens (2004)). Risk aversion makes such jumps suboptimal. Thus, wages increase continuously in tenure in optimal contracts. Second, wages and values are strictly increasing in tenure for all finite tenure lengths. To explain this result, suppose that an optimal contract has a constant segment of wages. This segment should be put at the beginning of the contract to increase the room for backloading wages. Moreover, the constant segment must be at the constrained level 0; otherwise, the firm can increase its expected value by reducing the initial wage and shortening the constant segment. However, such a contract has a strictly lower value than the value of unemployment, because an unemployed worker enjoys a positive benefit and faces the same job opportunities as an employed worker does. Thus, a contract with such a constant segment of wages will not be accepted. Third, optimal contracts induce efficient sharing of the value between a firm ˜ and its worker, in the sense described by (3.10). To elaborate, note that −d J/dt ˙ ˜ is the maris the marginal cost to the firm of increasing wages, while V /u (w) ginal benefit to the worker of the wage increase, measured in the same unit as profit. Thus, (3.10) requires that a wage increase should have the same marginal cost to the firm as the marginal benefit to the worker. Fourth, all optimal contracts are sections of a baseline contract. The baseline ˜ b (0) is the lowest contract, denoted as {w˜ b (t)}∞ t=0 , is an optimal contract where w wage in an equilibrium. The entire set of optimal contracts can be constructed as
∞ ˜ ˜ = w˜ b (t + ta ) ta ∈ [0 ∞) for all t {w(t)} t=0 : w(t) That is, the “tail” of the baseline contract from any arbitrary tenure ta onward is an optimal contract by itself when offered at the beginning of the match. This property is an implication of the principle of dynamic optimality.8 With the above property, it suffices to examine only the baseline contract. From now on, I suppress the subscript b on the baseline contract. In particular, V (t) denotes the value of the baseline contract for a worker at tenure t. Note that the set of equilibrium offers across contracts at any given time can be obtained alternatively by tracing out the baseline contract over tenure. That is, V = {x : x = V (t), all t ≥ 0}. ∞ ˆ If the property does not hold for some tenure ta > 0, then there is another contract {w(t)} t=0 ∞ ˜ ˜ that yields a higher value to the firm than the contract {w(t)} = w˜ b (t + ta ) for t=0 , where w(t) ˆ all t. Replace the tail of the baseline contract from tenure ta onward by letting wˆ b (t + ta ) = w(t) for all t. The new baseline contract yields a higher value to the firm than the original baseline contract, contradicting the optimality of the latter. 8
570
SHOUYONG SHI
4. EQUILIBRIUM AND BLOCK RECURSIVITY I will use V instead of t as the variable in various functions. To do so, define (4.1)
T (V (t)) = t
˜ (V )) w(V ) = w(T
˜ (V )) J(V ) = J(T
T (V ) is the inverse function of V (t) and it records the length of time for the value to increase from the lowest equilibrium offer, v1 , to V . A contract of a value V starts with the wage w(V ) and generates a present value J(V ) to a firm. Refer to w(V ) as the wage function. Since T (V (t)) = 1/V˙ (t), then ˜ d J(t)/dt = J (V (t))V˙ (t) and (3.10) becomes (4.2)
J (V ) = −
1 u (w(V ))
all V < V¯
4.1. Definition of the Equilibrium and Block Recursivity An equilibrium consists of a set of offers V = {V (t) : t ≥ 0}, a hiring rate function q(·), an employment function p(·), an application strategy F(·), a value function J(·), a wage function w(·), a distribution of employed workers over values G(·), and a fraction of employed workers n that satisfy the following requirements: (i) Given p(·), F(V ) solves (3.1). (ii) Given F(·) and p(·), each offer x ∈ V is delivered by a contract that solves (P ), and the resulting value function of the firm is J(x). (iii) Expected profit of recruiting is zero for all offers; that is, q(x)J(x) = k for all x ∈ [V V¯ ] and q(x)J(x) < k otherwise, where q(x) = M −1 (p(x)). (iv) G and n are stationary.9 Most elements of this definition are self-explanatory, except (iii). Requirement (iii) asks expected profit of recruiting to be zero for all x ∈ [V V¯ ], and it implies that an equilibrium indeed has meaningful trade-offs between offers and matching rates in all submarkets. Given J(·), the requirement yields the hiring rate function as q(x) = k/J(x) and the employment rate function as p(x) = M(k/J(x)). Because J(x) decreases in x (see (4.2)), the hiring rate is increasing and the employment rate is decreasing in the offer, as I have used in previous sections. Note that requirement (iii) is imposed not just on equilibrium offers in V , but on all offers in [V V¯ ]. Because the lowest equilibrium offer is v1 = F(Vu ) > Vu > V , V is a strict subset of [V V¯ ]. Thus, requirement (iii) restricts the beliefs out of the equilibrium. By completing the markets, this restriction refines the set of equilibria and has been commonly used in directed 9 The model can be extended to allow for a sunk cost of creating a vacancy, C, in addition to the flow cost k. Let R be the expected value of a vacancy, measured after a firm has incurred C. Then R and the optimal offer solve ρR = −k + maxx {q(x)[J(x) − R]}. Free entry of vacancies requires R = C. In this economy, (iii) in the equilibrium definition is modified as q(x)[J(x)−C] ≤ k + ρC, with equality for all x ∈ [V V¯ ]. An equilibrium is well defined if either k > 0 or ρ > 0. Only when k = ρ = 0 is there no finite R that satisfies R = C.
SEARCH FOR EQUILIBRIUM WAGE–TENURE CONTRACTS
571
search models, for example, Moen (1997), Acemoglu and Shimer (1999b), and Delacroix and Shi (2006).10 A striking property of an equilibrium is block recursivity: Although the distribution of workers over wages or values depends on the aggregation of individuals’ decisions, parts (i)–(iii) above are self-contained and independent of the distribution. Thus, the distribution plays no role in individuals’ decisions, optimal contracts, the equilibrium functions p(·) and q(·), and employed workers’ job-to-job transitions. The reason for this independence is that directed search separates the applicants into different submarkets and, in each submarket, free entry of firms determines the number of vacancies independently of the distributions of workers in other submarkets. As a result, the matching rate functions p(·) and q(·) are independent of the distributions. To elaborate, consider the fixed-point problem formed by (i)–(iii) in the above definition. Given q(·), the matching function yields the employment rate function p(·). Knowing p(·) is sufficient for the workers to choose the optimal target F(·). The functions p(V ) and F(V ) determine the quit rate of a worker at each V . For a firm, the worker’s quit rate summarizes all the effects of competition on the firm’s expected stream of profits. Thus, given the quit rate, the firm can calculate the expected value delivered by any wage contract and, hence, can choose the contract optimally. This optimal choice determines the wage function w(·) and the firm’s value function J(·). Finally, the free-entry condition ties the loop by determining the hiring rate function q(·) that, in an equilibrium, must be the same as the one with which the process started. The distributions of offers and workers do not appear in this process. Note that an equilibrium is block recursive even if there is exogenous separation into unemployment, which I have assumed away. With such separation, the value for a worker can still be determined given the function, p(·), without any reference to the distributions.11 Block recursivity relies critically on endogenous separation of workers, which is an implication of directed search. Not surprisingly, undirected search models (e.g., Burdett and Mortensen (1998) and BC) do not have this property. With undirected search, a worker’s quit rate is a function of the distribution of offers because the worker can receive an offer anywhere in the distribution of offers, and a firm’s hiring rate is a function of the distribution of workers 10
To see why there can be missing markets in general, suppose that all agents believe that no one will participate in submarket x. With such beliefs, no firm will post a vacancy and no worker will search in submarket x. Thus, the beliefs that submarket x will be missing are self-fulfilling. This outcome of a missing market may not be robust to a trembling-hand event that exogenously puts some firms in submarket x. 11 If the number of firms is fixed, rather than being determined by free entry, the expected value of recruiting is endogenous and depends on the distribution of workers. Even in this case, the distribution plays only a limited role because it affects individuals’ decisions and the functions p(·) and q(·) entirely through a one-dimensional object, that is, the expected value of recruiting.
572
SHOUYONG SHI
because all workers whose current values are less than the firm’s offer will accept the offer. Thus, the distributions of offers and workers affect individuals’ decisions and contracts in undirected search models. In turn, these decisions and contracts affect the flows of workers that determine the distribution of workers. The two-way dependence and the dimensionality of the distribution make an equilibrium analysis complicated in undirected search models. Block recursivity simplifies an equilibrium drastically. 4.2. Existence of an Equilibrium This subsection determines the equilibrium functions p(·), q(·), w(·), F(·), and J(·). I refer to existence of these functions as existence of an equilibrium, although an equilibrium also involves the distribution of workers that will be determined in Section 6 later. The following procedure formalizes the fixed-point problem discussed above. It is more convenient to develop a mapping on the wage function w(V ) than on q(·). Start with an arbitrary function w(·). First, integrating (4.2) and using J(V¯ ) = J = k/q¯ (see (3.8)), I get (4.3)
Jw (V ) =
k + q¯
V
V¯
1 dz u (w(z))
The subscript w on J, and on (q p F S) below, indicates the dependence on the initial function w. Second, the zero-profit condition yields qw (V ) = k/Jw (V ). Since p = M(q), then k (4.4) pw (V ) = M Jw (V ) Third, with pw (V ), the solution to (3.1) yields a worker’s optimal search as Fw (V ) and the expected gain from search as Sw (V ). Equation (3.3) yields V˙w and (3.5) yields dJw (V (t))/dt. Fourth, I combine (3.10) with (3.5) and (3.3). Recall that optimal contracts require V˙ (t) > 0 for all t < ∞ (see Lemma 3.2). However, V˙w (t) constructed from an arbitrary w may not necessarily be positive. To ensure that every step of the equilibrium mapping satisfies V˙w ≥ 0, I modify (3.10) as ˜ Substituting V˙w from (3.3) and d J˜w (t)/dt d J˜w (t)/dt = − max{0 V˙w }/u (w). from (3.5) into (3.10), I get w(V ) = ψw(V ), where the mapping ψ is defined as ψw(V ) ≡ y − r + pw (Fw (V )) Jw (V ) (4.5) −
max{0 rV − Sw (V ) − u(w(V ))} u (w(V ))
SEARCH FOR EQUILIBRIUM WAGE–TENURE CONTRACTS
573
The equilibrium wage function is a fixed point of ψ. With this fixed point, the first three steps above recover q(·), p(·), J(·), F(·), and S(·) in an equilibrium. Clearly, all these functions are independent of the distribution of workers. To characterize the fixed point of ψ, define (4.6)
¯ for all V ; w(V¯ ) = w; ¯ Ω = {w(V ) : w(V ) ∈ [w w] and w(V ) is continuous and (weakly) increasing}
(4.7)
Ω = {w ∈ Ω : w(V ) is strictly increasing for all V < V¯ }
The equilibrium wage function must lie in Ω (see Lemma 3.2). In addition, I must verify that the equilibrium wage function induces a function p(·), through (4.4), that indeed satisfies (2.2). To this end, I impose the following assumption on the matching function M(q): ¯ for all V , where q ASSUMPTION 1: (i) M(q) is continuous and q(V ) ∈ [q q] ¯ = 0. (iii) M(q) will be specified in (4.8) and q¯ < ∞. (ii) M (q) < 0 and M(q) ¯ where |M | ≤ m1 and |M | ≤ m2 for some is twice differentiable for all q ∈ [q q], finite constants m1 and m2 . (iv) qM (q) + 2M (q) ≤ 0. Part (i) is a regularity condition. In particular, the upper bound on q is imposed to apply a fixed-point theorem on bounded and continuous functions. Part (ii) captures the intuitive feature that if it is easy for a firm to fill a vacancy, it must be difficult for a worker to obtain a job. In the extreme case where a firm can fill a vacancy at the maximum rate, the employment rate is 0. Part (iii) simplifies the proof of existence significantly. By restricting convexity of M(q), part (iv) helps to establish concavity of p(·), which is stated in (2.2) and used to ensure uniqueness of each worker’s optimal search decision. Assumption 1 is ¯ + θ), satisfied by the so-called telegraph matching function, M(θ 1) = qθ/(1 which implies M(q) = q¯ − q.12 Next, I specify the following bounds on various functions. Define (4.8)
¯ J ≡ k/q
J¯ ≡ Jw¯ (V )
¯ q ≡ k/J
p¯ ≡ M(q)
S¯ ≡ Sw¯ (V )
Because Jw (V ), pw (V ), and Sw (V ) are decreasing in V , and qw (V ) is increasing in V , then ¯ Jw (V ) ∈ [J J]
¯ qw (V ) ∈ [q q]
¯ Sw (V ) ∈ [0 S]
all w ∈ Ω
¯ pw (V ) ∈ [0 p]
all V
As another example, consider the Cobb–Douglas matching function, which has M(θ 1) = ˆ θ , where α ∈ (0 1). This function implies p = M(q) ≡ q(α−1)/α . Let q¯ be a sufficiently large but ˆ ˆ ¯ Then M(q) satisfies Assumption 1 if and only if finite constant and let M(q) = M(q) − M(q). α ≥ 1/2. 12
α
574
SHOUYONG SHI
Choose the lower bound on wages w to be a strictly positive number sufficiently close to 0. ASSUMPTION 2: Assume that b, V , and w satisfy rk q¯
(4.9)
(0 < ) b < w¯ = y −
(4.10)
[u(b) − Sw (V ) − u(w)] y − r + pw¯ (Fw¯ (V )) J¯ ≥ w + u (w)
(4.11)
1+
u (w) ¯ − u(w)] ≥ 0 [u(w) [u (w)]2
¯ all w ∈ [w w]
Note that all elements in Assumption 2 can be derived exclusively from exogenous objects of the model. Condition (4.9) is necessary for there to be any worker employed. Condition (4.10) is sufficient for ψw(V ) ≥ w for all V , and (4.11) is sufficient for ψ to map increasing functions into increasing functions. There is a nonempty region of parameter values that satisfy all of these conditions.13 The following theorem establishes the existence of an equilibrium that indeed satisfies (2.2) (see Appendix B for a proof): THEOREM 4.1: Maintain Assumptions 1 and 2. The mapping ψ has a fixed point w∗ ∈ Ω . Moreover, the equilibrium has the following properties: (i) Jw∗ (V ) is ¯ strictly decreasing, strictly concave, and continstrictly positive, bounded in [J J], uously differentiable for all V , with Jw∗ (V¯ ) = J. (ii) pw∗ (V ) has all the properties in (2.2) and is strictly concave for all V < V¯ . (iii) V˙w∗ > 0 and dJw∗ (V (t))/dt < 0 for all V < V¯ . REMARK 1: Although I have focused on an equilibrium that satisfies (2.2), all equilibria must have a strictly decreasing p(·). If p(V2 ) ≥ p(V1 ) for some V2 > V1 , then q(V2 ) ≤ q(V1 ). In this case, no worker would apply to V1 , no firm would recruit at V2 , and so V1 and V2 could not both be equilibrium offers. Similarly, p(·) must be continuous in all equilibria. In contrast, not all equilibria necessarily have a concave and differentiable p(·). However, it is natural to 13 Condition (4.9) can be easily satisfied. By choosing w sufficiently close to 0 and using the assumption u (0) → ∞, I can ensure (4.10) if [r + pw¯ (Fw¯ (V ))]J¯ < y. Because the left-hand side of this inequality is a decreasing function of V , the inequality puts a lower bound on V . This lower bound is smaller than V¯ , because [r + pw¯ (Fw¯ (V¯ ))]Jw¯ (V¯ ) = rJ < y. Using the definition of V , ¯ Hence, I can translate this lower bound on V into a lower bound on b, which is smaller than w. there are values of b that satisfy both (4.9) and (4.10). Finally, there are utility functions that satisfy (4.11). For example, the utility function with constant relative risk aversion satisfies (4.11) if the relative risk aversion is lower than a critical level.
SEARCH FOR EQUILIBRIUM WAGE–TENURE CONTRACTS
575
focus on equilibria with a concave and differentiable p(·). Concavity of p(·) is useful for ensuring that each worker’s optimal search decision is unique, and differentiability of p(·) allows me to characterize this optimal decision with the first-order condition. I will suppress the asterisk on w∗ and the subscript w∗ on the functions J, p, q, F , and S. Moreover, I will focus on a wage function w(V ) that is differentiable. 5. JOB TRANSITIONS, WAGE MOBILITY, AND POLICY ANALYSIS A typical worker in this model experiences continuous wage increases when he stays with a job and discrete jumps in wages when he transits to another job. For example, consider a worker in unemployment. The worker’s value is Vu and he applies for the offer v1 = F(Vu ). If he obtains the offer, the value jumps to v1 and the target of his next search is v2 = F(v1 ). If the worker obtains the next offer, his value jumps to v2 . If the worker fails to obtain the offer v2 , the value for the worker increases continuously according to the contract. In both cases, the worker revises the target of search according to F(·). This process continues to increase the worker’s value toward V¯ asymptotically until the worker dies. The above process has the following predictions that are consistent with the stylized facts described in the Introduction. First, wages and values strictly increase with tenure, as shown by w (V ) > 0 and V˙ (t) > 0. Second, the rate at which a worker quits a job for a better offer strictly decreases with tenure and wages, as shown by the result that p(F(V )) strictly decreases in V . The cause for this feature is directed search, rather than the fact that a low-wage worker has more wage levels to which he can transit than a high-wage worker does. With directed search, low-wage workers optimally choose to search for relatively low offers that are easier to get, and so they make job transitions with higher probabilities than high-wage workers do. Third, wage mobility is limited endogenously, because the workers at a wage w(V ) optimally choose to search only for the contract that starts at the wage w(F(V )). The lower a worker’s current wage, the lower the future wage he will move to in the next job transition. Now consider two policies: an increase in the unemployment benefit b and a minimum-wage requirement w ≥ wmin . For the minimum wage to be nontrivial, assume that w(v1 ) < wmin , where w(v1 ) is the lowest equilibrium wage in the absence of the minimum wage. The following corollary summarizes the effects of these policies (the proof is straightforward and omitted): COROLLARY 5.1: Changes in b and wmin do not affect the functions w(·), F(·), p(·), q(·), and J(·). Hence, they do not affect an employed worker’s transitions or contracts, conditional on the worker’s current wage. However, they affect the distribution of workers and increase the lowest offer in an equilibrium, v1 . Moreover,
576
SHOUYONG SHI
an increase in b increases the value for unemployed workers Vu and reduces the measure of employed workers n. An increase in wmin reduces n and Vu . To see more clearly the effects of the policies, suppose that the policies increase v1 to vˆ 1 . The offers in [v1 vˆ 1 ) are no longer equilibrium offers, but the new baseline contract is the tail of the original baseline contract that starts at vˆ 1 . Since the latter is an equilibrium contract prior to the policy change, the set of equilibrium contracts after the policy change is a subset of the original set of equilibrium contracts. Conditional on a worker’s current value (or wage), the worker’s optimal application, the wage–tenure contract, and the worker’s transition rate to another job are all independent of the two policies. The reason for this independence is block recursivity of an equilibrium with directed search. Because the fixed-point problem that determines q, p, F , J, and w involves only employed workers and not unemployed workers, its solution does not depend on policies that affect only unemployed workers. The policies do affect aggregate activities in the current model by affecting (v1 Vu ) and the distribution of workers. These effects, stated in Corollary 5.1, are intuitive. For example, a higher unemployment benefit reduces employment, because it makes unemployed workers “picky” about offers. Note that an increase in the minimum wage reduces the value for unemployed workers, despite raising the target value of an unemployed worker’s search. The explanation is that the original target value, v1 , provides the best trade-off for an unemployed applicant between the offer and the employment rate. By raising the target value, the minimum wage reduces an unemployed worker’s transition rate into employment by so much that it cannot be adequately compensated by the rise in the target value. Let me contrast the results in this section with those in BC. Modeling search as an undirected process, BC has also shown that wages increase, and quit rates fall with tenure and wages. However, their model does not generate limited wage mobility; instead, even a worker at the bottom of the wage distribution can immediately transit to the top of the distribution. Moreover, because their model does not have block recursivity, the two policies above affect contracts and individuals’ transitions through the distribution of workers. In particular, an increase in the unemployment benefit in that model increases the equilibrium distribution of offers, the job-to-job transition rate, and the slope of the wage–tenure contracts.14 14 Both the current model and BC assume that there is no exogenous separation into unemployment. If such exogenous separation is introduced, the two policies will affect equilibrium contracts and employed workers’ transitions in the current model, because the value of unemployment will appear in the equation that determines the value for employed workers. Even in this extension of the current model, it is still true that the policies do not affect contracts and worker transitions through the distribution of workers.
SEARCH FOR EQUILIBRIUM WAGE–TENURE CONTRACTS
577
6. EQUILIBRIUM DISTRIBUTION OF WORKERS Let G be the cumulative distribution function of employed workers over V = [v1 V¯ ], and let g be the corresponding density function.15 For any arbitrary V ∈ V and a small interval of time dt, let me examine the flows in and out of the group of workers who are employed at values less than or equal to V . The measure of this group is nG(V ). The only inflow is unemployed workers who find matches at v1 , which is (1 − n)p(v1 ) dt. There are three outflows. First, death generates an outflow δn G(V ) dt. Second, the contracts increase the values for the workers in (V − V˙ dt V ] above V , the flow of which is n[G(V ) − G(V − V˙ dt)]. Third, some workers in the group quit for offers higher than V . These quitters are currently employed in (F −1 (V ) V ] if F −1 (V ) ≥ v1 and in (v1 V ] if F −1 (V ) < v1 . Thus, quitting generates the outflow V p(F(z)) dG(z) (dt)n max{v1 F −1 (V )}
Equating the inflows to the sum of outflows and taking the limit dt ↓ 0, I obtain (6.1)
G(V ) − G(V − V˙ dt) dt↓0 dt
lim
=
1−n p(v1 ) − δG(V ) − n
V max{v1 F −1 (V )}
p(F(z)) dG(z)
THEOREM 6.1: Denote vj = F (j) (v0 ), j = 1 2 where F (0) (v0 ) = v0 ≡ Vu and F (j) (v0 ) = F(F (j−1) (v0 )). Then G(V ) is continuous for all V , with G(v1 ) = 0. The density function g(V ) is continuous for all V and differentiable except for V = v2 . Moreover, (6.2)
n = p(v1 )/[δ + p(v1 )]
(6.3)
g(V )V˙ = δ[1 − G(V )] −
V
max{v1 F −1 (V )}
p(F(z)) dG(z)
With the function T (V ) in (4.1), define T (z2 ) (6.4) δ + p F(V (t)) dt Γ (z2 z1 ) = exp −
z1 z2 ≥ v1
T (z1 )
Add a subscript j to g(V ) for V ∈ [vj vj+1 ). Then g can be recursively solved piecewise as (6.5)
g1 (V )V˙ = δΓ (V v1 )
15 The distribution of employed workers over wages can be deduced as Gw (w(V )) = G(V ), with a density function gw (w(V )) = g(V )/w (V ).
578 (6.6)
SHOUYONG SHI
gj (V )V˙ − gj (vj )v˙ j Γ (V vj ) =
V
Γ (V z)p(z)gj−1 (F −1 (z)) dF −1 (z)
vj
where (6.6) holds for j ≥ 2. Moreover, gj (vj ) = limV ↑vj gj−1 (V ) for all j. The above theorem documents several features. First, the equilibrium distribution of employed workers is nondegenerate and continuous, despite the fact that all matches are identical and search is directed. Both on-the-job search and wage–tenure contracts are important for this dispersion of values. If onthe-job search were prohibited, only one value, v1 , would be offered in an equilibrium, as in most models of directed search with homogeneous matches. Onthe-job search produces jumps in values and, hence, a nondegenerate distribution of values. However, without wage–tenure contracts, on-the-job search alone would only produce a wage ladder formed by the set, {v1 v2 V¯ } as in Delacroix and Shi (2006). Wage–tenure contracts provide continuous increases in the values to fill in the gaps between any two levels in this discrete set. Second, there is no mass point anywhere in the support of the distribution. It is particularly remarkable that there is no buildup of workers at v1 . Although all unemployed workers only apply for v1 , all workers at v1 move out of v1 in any arbitrarily short length of time as a result of quits, death, or wage increases in the contracts. Moreover, the density function is differentiable except at V = v2 . It is not differentiable at v2 because offers above v2 receive applications from employed workers but offers below v2 do not.16 Finally, more workers are employed at low values than at high values, because the job-to-job transition rate decreases sharply in the target value. In particular, as V approaches V¯ , the employment rate declines to 0, which requires the measure of recruiting firms per applicant to approach zero. Thus, the density function g(V ) can be decreasing for V close to V¯ . 7. CONCLUSION I have constructed a theoretical framework in which firms offer wage–tenure contracts to direct the search by risk-averse workers. All workers can search on or off the job. I have characterized an equilibrium and proved its existence. The equilibrium generates a nondegenerate, continuous distribution of employed workers over the values of contracts, despite the fact that all matches are identical and workers observe all offers. A striking property is that the equilibrium is block recursive; that is, individuals’ optimal decisions and optimal contracts 16 Similarly, the density function of offers is discontinuous, because a mass of firms recruit at v1 but no firm recruits at V ∈ (v1 v2 ). To eliminate nondifferentiability of g at v2 and discontinuity of the offer density, an earlier version of this paper assumes that b is distributed in an interval ¯ whose upper bound is equal to w.
SEARCH FOR EQUILIBRIUM WAGE–TENURE CONTRACTS
579
are independent of the distribution of workers. This property makes the equilibrium analysis tractable. Consistent with stylized facts, the equilibrium predicts that (i) wages increase with tenure, (ii) job-to-job transitions decrease with tenure and wages, and (iii) wage mobility is limited in the sense that the lower the worker’s wage, the lower the future wage a worker will move to in the next job transition. Moreover, block recursivity implies that changes in the unemployment benefit and the minimum wage have no effect on an employed worker’s job-to-job transitions and contracts. The theoretical framework is tractable for a wide range of applications and extensions, because of block recursivity. In particular, Menzio and Shi (2008) incorporated aggregate and match-specific shocks into the model to examine dynamics and business cycles with on-the-job search. APPENDIX A: PROOF OF LEMMA 3.1 The result F(V¯ ) = V¯ is evident. Let V < V¯ and denote K(x V ) = p(x)(x − V ). Let V1 and V2 be two arbitrary values with V1 < V2 < V¯ , and denote Fi = F(Vi ), where i = 1 2. For part (i), because p(·) is bounded and continuous, K(x V ) is bounded and continuous. Thus, the maximization problem in (3.1) has a solution. Since K(x V ) > 0 for all x ∈ (V V¯ ) and K(V V ) = 0 = K(V¯ V ), the solutions are interior. Interior solutions and differentiability of p(·) imply that F(V ) is given by the first-order condition (3.2). Take two distinct values, V2 and V1 . They must generate different values for the right-hand side of (3.2). Thus, F(V1 ) ∩ F(V2 ) = ∅ for all V2 = V1 . This result implies that K(Fi Vi ) > K(Fj Vi ) for j = i. I have 0 > [K(F2 V1 ) − K(F1 V1 )] + [K(F1 V2 ) − K(F2 V2 )] = [p(F2 ) − p(F1 )](V2 − V1 ) Thus, p(F2 ) < p(F1 ). Because p(·) is strictly decreasing, F(V2 ) > F(V1 ). For part (ii), I show that K(x V ) is strictly concave in x for all x ∈ (V V¯ ). Let x1 and x2 be two arbitrary values with x2 > x1 > V . Let xα = αx1 + (1 − α)x2 , where α ∈ (0 1). Then K(xα V ) ≥ [αp(x1 ) + (1 − α)p(x2 )][α(x1 − V ) + (1 − α)(x2 − V )] = αK(x1 V ) + (1 − α)K(x2 V ) + α(1 − α)[p(x1 ) − p(x2 )][x2 − x1 ] > αK(x1 V ) + (1 − α)K(x2 V ) The first inequality comes from concavity of p, and the last comes from strictly decreasing p(·). Thus, K(x V ) is strictly concave in x and F(V ) is unique.
580
SHOUYONG SHI
Uniqueness implies that F(V ) is continuous in V by the theorem of the maximum (see Stokey, Lucas, and Prescott (1989, p. 62)). For part (iii), note that K(F1 V1 ) > K(F2 V1 ) and K(F2 V2 ) > K(F1 V2 ). Then S(V2 ) − S(V1 ) > K(F1 V2 ) − K(F1 V1 ) = −p(F1 )(V2 − V1 ) S(V2 ) − S(V1 ) < K(F2 V2 ) − K(F2 V1 ) = −p(F2 )(V2 − V1 ) Divide the two inequalities by (V2 − V1 ) and take the limit V2 → V1 . Because F(·) is continuous, the limit shows that S (V1 ) = −p(F1 ). Since V1 is arbitrary, part (iii) holds for all V . For part (iv), because p is decreasing and concave, p(F1 ) ≥ p(F2 ) − p (F1 )(F2 − F1 ). Substituting this inequality into (3.2) yields V2 − V1 ≥ 2(F2 − F1 ) + p(F2 )
p (F1 ) − p (F2 ) ≥ 2(F2 − F1 ) p (F1 )p (F2 )
This implies F2 − F1 ≤ (V2 − V1 )/2 and so F is Lipschitz. For part (v), if p is twice differentiable, then differentiating (3.2) generates F (V ). Part (iv) implies F (V ) ≤ 1/2. Hence, S (V ) = −p (F(V ))F (V ). Q.E.D. APPENDIX B: PROOF OF THEOREM 4.1 Consider the sets Ω and Ω , defined by (4.6) and (4.7), respectively. It can be verified that Ω is nonempty, closed, bounded, and convex. Lemma B.1 below shows that properties (i) and (ii) in Theorem 4.1 are satisfied not only by Jw∗ and pw∗ , but also by Jw and pw that are constructed through (4.3) and (4.4) with any arbitrary w ∈ Ω. Thus, (2.2) and parts (i)–(iv) of Lemma 3.1 hold in every iteration of the mapping ψ (defined by (4.5)), not just with the fixed point. In particular, Fw (V ) is strictly increasing and satisfies (3.2) in every iteration. Lemma B.2 below, whose proof uses (4.11), describes additional properties that will be used in the proofs of Lemmas B.3, B.4, and B.5. The latter three lemmas establish that the mapping ψ satisfies the conditions of the Schauder fixed-point theorem (see Stokey, Lucas, and Prescott (1989, p. 520)). Therefore, ψ has a fixed point in Ω, denoted as w∗ . Lemma B.3 then implies w∗ (V ) = (ψw∗ )(V ) ∈ Ω . Finally, I show that V˙w∗ > 0 for all V < V¯ , as in (iii) of Theorem 4.1. Once this is done, (4.5) implies dJw∗ (V (t))/dt = − max{0 V˙w∗ }/u (w∗ (V )) < 0 for all V < V¯ . Suppose that V˙w∗ ≤ 0 for some V1 < V¯ , contrary to the theorem. In this case, (4.5) implies w∗ (V1 ) = y − [r + pw∗ (Fw∗ (V1 ))]Jw∗ (V1 ) and so V˙w∗ = rV1 − Sw∗ (V1 ) − u y − r + pw∗ (Fw∗ (V1 )) Jw∗ (V1 )
SEARCH FOR EQUILIBRIUM WAGE–TENURE CONTRACTS
581
With the properties of Jw∗ and pw∗ in Lemma B.1, the right-hand side of the above equation is strictly decreasing in V1 and is equal to 0 at V1 = V¯ . Thus, V˙w∗ > 0 at V1 —a contradiction. This completes the proof of Theorem 4.1. Q.E.D. The proofs of Lemmas B.1, B.2, B.3, and B.4 below are omitted but can be found in the Supplementary Material (Shi (2009)). LEMMA B.1: For any w ∈ Ω, Jw (V ) and pw (V ), defined by (4.3) and (4.4), ¯ pw (V ) ∈ have the following properties: (i) They are bounded with Jw (V ) ∈ [J J], ¯ ¯ ¯ ¯ Jw (V ) = J, and pw (V ) = 0, where J, J, and p¯ are defined in (4.8). [0 p], (ii) They are strictly decreasing, continuously differentiable, and concave for all V . (iii) If w ∈ Ω , then Jw (V ) and pw (V ) are strictly concave. LEMMA B.2: Let w1 w2 w ∈ Ω. (i) pw (Fw (V )) is increasing in w in the sense that if w2 (V ) ≥ w1 (V ) for all V , then pw2 (Fw2 (V )) ≥ pw1 (Fw1 (V )) for all V . (ii) For all V2 ≥ V1 , (B.1)
u(w(V2 )) − u(w(V1 )) [rV1 − Sw (V1 )] − [rV2 − Sw (V2 )] ≥Δ≥ u (w(V1 )) u (w(V1 ))
where Δ is defined as (B.2)
Δ=
1 max 0 rV1 − Sw (V1 ) − u(w(V1 )) u (w(V1 ))
1 max 0 rV2 − Sw (V2 ) − u(w(V2 )) − u (w(V2 ))
LEMMA B.3: ψ : Ω → Ω ⊂ Ω. LEMMA B.4: ψ is Lipschitz continuous in the sup norm. LEMMA B.5: With an arbitrary w ∈ Ω, define ψ0 w = w and ψj+1 w = ψ(ψj w) for j = 0 1 2 The family of functions {ψj w}∞ j=0 is equicontinuous. PROOF: Take an arbitrary w ∈ Ω and construct the family {ψj w}∞ j=0 . The family is equicontinuous if it satisfies the following requirement (see Stokey, Lucas, and Prescott (1989, p. 520)): For any given ε > 0, there exists a > 0 such that, for all V1 and V2 , (B.3)
|V2 − V1 | < a
⇒
|ψj w(V2 ) − ψj w(V1 )| < ε
all j
I use the following procedure to establish (B.3). In the entire procedure, fix w as an arbitrary function in Ω and fix ε > 0 as an arbitrary number. First, be¯ is bounded and closed, w cause w is continuous and the domain of w, [w w],
582
SHOUYONG SHI
is uniformly continuous. Thus, there exists a0 > 0 such that, for all V1 and V2 , |V2 − V1 | < a0 ⇒ |w(V2 ) − w(V1 )| < ε. Second, I show that there exists a1 > 0 such that, for all V1 and V2 , if |w(V2 ) − w(V1 )| < ε, then (B.4)
|V2 − V1 | < a1
⇒
|ψw(V2 ) − ψw(V1 )| < ε
Let a = min{a0 a1 }. Then w and ψw both satisfy (B.3). Third, replacing w with ψw and a0 with a, the above two steps yield that ψ2 w satisfies (B.3). Repeating this process but fixing a at the level just defined, it is easy to see that ψj w satisfies (B.3) for all j. Only (B.4) needs a proof. Take arbitrary V1 and V2 that satisfy |w(V2 ) − w(V1 )| < ε. As before, shorten the notation f (Vi ) to fi , where f includes the functions w, Jw , Fw , pw , Sw , and ψw. Without loss of generality, assume V2 ≥ V1 . Since w(V ) and ψw(V ) are increasing functions, then w2 ≥ w1 and ψw2 ≥ ψw1 . With the first inequality in (B.1), I have (B.5)
(0 ≤ ) ψw2 − ψw1 ≤ [r + pw (Fw1 )](Jw1 − Jw2 ) + Jw2 [pw (Fw1 ) − pw (Fw2 )] + [u(w2 ) − u(w1 )]/u (w1 )
Examine the three terms on the right-hand side in turn. Using (4.3), I ob¯ For the difference in pw (F), recall that tain 0 ≤ Jw1 − Jw2 ≤ (V2 − V1 )/u (w). ¯ I can verify Fw2 − Fw1 ≤ (V2 − V1 )/2 (see Lemma 3.1). Also, because qw ≤ q, that |(dM(k/Jw ))/dJw | ≤ B1 ≡ m1 q¯ 2 /k, where m1 is specified in Assumption 1. Thus, k k (B.6) −M 0 ≤ pw (Fw1 ) − pw (Fw2 ) = M Jw (Fw1 ) Jw (Fw2 ) ≤ B1 [Jw (Fw1 ) − Jw (Fw2 )] ≤
B1 [Fw2 − Fw1 ] B1 (V2 − V1 ) ≤ ¯ ¯ u (w) [2u (w)]
To examine the last term in (B.5), define L(w) ≡ u(w) − u(w1 ) − u (w1 )(w − w1 ) + (μ1 /2)(w − w1 )2 , where μ1 ≡ minw∈[ww] ¯ |u (w)| > 0. Because L is con cave, and L (w1 ) = 0, L(w) is maximized at w = w1 , and so L(w2 ) ≤ L(w1 ) = 0. Since w1 ≥ w, I get (B.7)
(0 ≤ )
μ1 u(w2 ) − u(w1 ) ≤ w2 − w1 − (w2 − w1 )2 u (w1 ) 2u (w)
The right-hand side (RHS) of (B.7) is maximized at w2 − w1 = [u (w)/μ1 ]1/2 . Recall that w2 − w1 < ε. If ε ≤ [u (w)/μ1 ]1/2 , the RHS of (B.7) is increasing in (w2 − w1 ), and so it is strictly smaller than the value at w2 − w1 = ε, which is
SEARCH FOR EQUILIBRIUM WAGE–TENURE CONTRACTS
583
[ε − μ1 /(2u (w))ε2 ]. If ε > [u (w)/μ1 ]1/2 , then RHS (B.7) < 12 [u (w)/μ1 ]1/2 < ε/2. In both cases, I have
u(w2 ) − u(w1 ) 1 μ1 ε (0 ≤ ) (B.8) < ε max 1 − u (w1 ) 2 2u (w)
1 μ1 ε = ε − ε min 2 2u (w) Substitute the above bounds on the terms on the RHS of (B.5). Noting that ¯ where p¯ and J¯ are defined in (4.8), I obtain pw (Fw ) ≤ p¯ and Jw ≤ J,
1 μ1 ε (B.9) (0 ≤ ) ψw2 − ψw1 < A3 (V2 − V1 ) + ε − ε min 2 2u (w) ¯ 1 /2)/u (w). ¯ A sufficient where A3 ∈ (0 ∞) is defined as A3 ≡ (r + p¯ + JB condition for ψw2 − ψw1 < ε is that RHS (B.9) ≤ ε. This condition can be expressed as 0 ≤ V2 − V1 ≤ a1 , where a1 ≡ (ε/A3 ) min{ 12 μ1 ε/(2u (w))}. Because A3 ∈ (0 ∞), w > 0, and μ1 ∈ (0 ∞), then a1 > 0 and (B.4) holds. Moreover, a1 is independent of a0 , given ε. This completes the proof of Lemma B.5. Q.E.D. REFERENCES ACEMOGLU, D., AND R. SHIMER (1999a): “Efficient Unemployment Insurance,” Journal of Political Economy, 107, 893–928. [563,564,566] (1999b): “Holdups and Efficiency With Search Frictions,” International Economic Review, 40, 827–850. [563-565,571] BUCHINSKY, M., AND J. HUNT (1999): “Wage Mobility in the United States,” Review of Economics and Statistics, 81, 351–368. [561] BURDETT, K., AND M. G. COLES (2003): “Equilibrium Wage–Tenure Contracts,” Econometrica, 71, 1377–1404. [563] BURDETT, K., AND D. MORTENSEN (1998): “Wage Differentials, Employer Size, and Unemployment,” International Economic Review, 39, 257–273. [563,566,571] BURDETT, K., S. SHI, AND R. WRIGHT (2001): “Pricing and Matching With Frictions,” Journal of Political Economy, 109, 1060–1085. [563,565] COLES, M., AND J. EECKHOUT (2003): “Indeterminacy and Directed Search,” Journal of Economic Theory, 111, 265–276. [563] DELACROIX, A., AND S. SHI (2006): “Directed Search on the Job and the Wage Ladder,” International Economic Review, 47, 651–699. [563,566,571,578] DIAMOND, P. (1982): “Wage Determination and Efficiency in Search Equilibrium,” Review of Economic Studies, 49, 217–227. [563] FALLICK, B., AND C. FLEISCHMAN (2004): “Employer-to-Employer Flows in the U.S. Labor Market: The Complete Picture of Gross Worker Flows,” Unpublished Manuscript, Federal Reserve Board. [561] FARBER, H. S. (1999): “Mobility and Stability: The Dynamics of Job Change in Labor Markets,” in Handbook of Labor Economics, Vol. 3B, ed. by O. Ashenfelter and D. Card. Amsterdam: Elsevier, 2439–2484. [561]
584
SHOUYONG SHI
GALENIANOS, M., AND P. KIRCHER (2005): “Directed Search With Multiple Job Applications,” Unpublished Manuscript, University of Pennsylvania. [563,565] HARRIS, M., AND B. HOLMSTROM (1982): “A Theory of Wage Dynamics,” Review of Economic Studies, 49, 315–333. [563] JOVANOVIC, B. (1979): “Job Matching and the Theory of Turnover,” Journal of Political Economy, 87, 972–990. [563] JULIEN, B., J. KENNES, AND I. KING (2000): “Bidding for Labor,” Review of Economic Dynamics, 3, 619–649. [563,565] MENZIO, G., AND S. SHI (2008): “Efficient Search on the Job and the Business Cycle,” Working Paper 327, University of Toronto. [579] MOEN, E. R. (1997): “Competitive Search Equilibrium,” Journal of Political Economy, 105, 385–411. [563,564,571] MOEN, E. R., AND A. ROSEN (2004): “Does Poaching Distort Training?” Review of Economic Studies, 71, 1143–1162. [563,565,566] MONTGOMERY, J. D. (1991): “Equilibrium Wage Dispersion and Interindustry Wage Differentials,” Quarterly Journal of Economics, 106, 163–179. [563] MORTENSEN, D. (1982): “Property Rights and Efficiency in Mating, Racing, and Related Games,” American Economic Review, 72, 968–979. [563] MOSCARINI, G. (2005): “Job Matching and the Wage Distribution,” Econometrica, 73, 481–516. [563] PETERS, M. (1984): “Bertrand Equilibrium With Capacity Constraints and Restricted Mobility,” Econometrica, 52, 1117–1129. [563] (1991): “Ex ante Price Offers in Matching Games: Non-Steady State,” Econometrica, 59, 1425–1454. [563,565] PISSARIDES, C. (1990): Equilibrium Unemployment Theory. Cambridge, MA: Basil Blackwell. [563] POSTEL -VINAY, F., AND J.-M. ROBIN (2002): “Equilibrium Wage Dispersion With Worker and Employer Heterogeneity,” Econometrica, 70, 2295–2350. [564] SHI, S. (2001): “Frictional Assignment I: Efficiency,” Journal of Economic Theory, 98, 232–260. [563,566] (2002): “A Directed Search Model of Inequality With Heterogeneous Skills and SkillBiased Technology,” Review of Economic Studies, 69, 467–491. [563] (2009): “Supplement to ‘Directed Search for Equilibrium Wage–Tenure Contracts’,” Econometrica Supplemental Material, 77, http://www.econometricsociety.org/ecta/ Supmat/7870_Proofs.pdf. [563,581] STEVENS, M. (2004): “Wage–Tenure Contracts in a Frictional Labour Market: Firms’ Strategies for Recruitment and Retention,” Review of Economic Studies, 71, 535–551. [569] STOKEY, N., R. E. LUCAS, JR., AND E. PRESCOTT (1989): Recursive Methods in Economic Dynamics. Cambridge, MA: Harvard University Press. [580,581] TOPEL, R. H., AND M. P. WARD (1992): “Job Mobility and the Careers of Young Men,” Quarterly Journal of Economics, 107, 439–479. [561]
Dept. of Economics, University of Toronto, 150 St. George Street, Toronto, Ontario M5S 3G7, Canada;
[email protected]. Manuscript received April, 2008; final revision received August, 2008.
Econometrica, Vol. 77, No. 2 (March, 2009), 585–602
TESTING FOR STOCHASTIC MONOTONICITY BY SOKBAE LEE, OLIVER LINTON, AND YOON-JAE WHANG1 We propose a test of the hypothesis of stochastic monotonicity. This hypothesis is of interest in many applications in economics. Our test is based on the supremum of a rescaled U-statistic. We show that its asymptotic distribution is Gumbel. The proof is difficult because the approximating Gaussian stochastic process contains both a stationary and a nonstationary part, and so we have to extend existing results that only apply to either one or the other case. We also propose a refinement to the asymptotic approximation that we show works much better in finite samples. We apply our test to the study of intergenerational income mobility. KEYWORDS: Distribution function, extreme value theory, Gaussian process, monotonicity.
1. INTRODUCTION LET Y AND X DENOTE two random variables whose joint distribution is absolutely continuous with respect to Lebesgue measure on R2 . Let FY |X (·|x) denote the distribution of Y conditional on X = x. This paper is concerned with testing the stochastic monotonicity of FY |X . Specifically, we consider the hypothesis (1)
H0 : For each y ∈ Y , FY |X (y|x) ≤ FY |X (y|x ) whenever x ≥ x for x x ∈ X ,
where Y and X , respectively, are the supports of Y and X. We propose a test statistic and obtain asymptotically valid critical values. We are not aware of any existing test for (1) in the literature. This hypothesis can be of interest in a number of applied settings. If X is some policy, dosage, or other input variable, one might be interested in testing whether its effect on the distribution of Y is increasing in this sense. Also, one can test whether stochastic monotonicity exists in well-known economic relationships such as expenditures (Y ) versus incomes (X) at household levels, wages (Y ) versus cognitive skills (X) using individual data, outputs (Y ) versus the stock of capital (X) at the country level, sons’ incomes (Y ) versus fathers’ incomes (X) using family data, and so on. 1
We would like to thank a co-editor, three anonymous referees, John Einmahl, Jerry Hausman, Hidehiko Ichimura, Steve Machin, Charles Manski, and participants at several conferences and seminars for useful suggestions and helpful comments. Lee extends thanks to the ESRC and the Leverhulme Trust through funding of the Centre for Microdata Methods and Practice and of the research programme Evidence, Inference and Inquiry. Linton thanks the ESRC for financial support. Whang thanks the Korea Research Foundation Grant funded by the Korean government (MOEHRD) (KRF-2005-041-B00074). © 2009 The Econometric Society
DOI: 10.3982/ECTA7145
586
S. LEE, O. LINTON, AND Y.-J. WHANG
The notion of stochastic monotonicity is important in instrumental variables estimation. Manski and Pepper (2000) have introduced monotone instrumental variables assumptions that hold when the average outcome varies monotonically across the levels of instrumental variables. Small and Tan (2007) have used the stochastic monotonicity condition that does not require that a monotonic increasing relationship hold within individuals, thus allowing for “defiers” in treatments. Blundell, Gosling, Ichimura, and Meghir (2007) have recently adopted this hypothesis and obtained tight bounds on an unobservable cross-sectional wage distribution, thus allowing them to characterize the evolution of its inequality over time. Specifically, they assumed that the distribution of wages W for employed given observed characteristics X and an instrument Z is increasing in Z. Their instrument was the out of work income. They derived a bound on the implied distribution of wages given characteristics under this assumption of stochastic monotonicity. They also suggested a test of this hypothesis based on the implied bounds, using the bootstrap to calculate critical values. They found that the hypothesis was not rejected on their data at standard significance levels; indeed the p-values were very high. They did not provide any theory to justify their critical values and, moreover, did not test the monotonicity hypothesis itself, but an implication of it. This concept arises often in dynamic economic models. Thus suppose that Y = Yt+1 , X = Yt , and Yt is a Markov process so that FY |X = Ft+1|t is the transition measure of the process Yt . In that case the property, along with mild technical conditions, implies that the process has a stationary distribution. The influential monograph of Lucas and Stokey (1989) used the stochastic monotonicity property frequently in solving dynamic optimization problems of the Markov type and characterizing the properties of the solution. It is particularly important in problems where nonconvexities give rise to discontinuous stochastic behavior and it provides a route to proving the existence of stationary equilibria without requiring smoothness. Hopenhayn and Prescott (1992) argued that it arises “in economic models from the monotonicity of decision rules or equilibrium mappings that results from the optimizing behaviour of agents.” Pakes (1986) assumed that the distribution of the return to holding a patent conditional on current returns was nonincreasing in current returns. Consequently he showed that the optimal renewal policy took a very simple form based on the realization of current returns compared with the cost of renewing. Ericson and Pakes (1995), Olley and Pakes (1996), and Buettner (2003) have all used a similar property in various dynamic models of market structures. It is possible to test these restrictions with our methods given suitable data. Testing stochastic monotonicity can be relevant for testing the existence of firms’ strategic behaviors in industrial organization. Recently, Ellison and Ellison (2007) have shown that under some suitable conditions, investment levels are monotone in market size if firms are not influenced by strategic entry deterrence and are nonmonotone if influenced by a desire to deter entry. Ellison
TESTING FOR STOCHASTIC MONOTONICITY
587
and Ellison (2007) have also developed a couple of monotonicity tests, based on Hall and Heckman (2000), and have implemented them using pharmaceutical data. In addition to the tests used in Ellison and Ellison (2007), our test can be adopted to test the existence of strategic entry deterrence. We propose a simple test of hypothesis (1) for observed or (partially) estimated independent and identically distributed (i.i.d.) data. Our statistic is based on the supremum of a rescaled second-order U-process indexed by two parameters x and y (Nolan and Pollard (1987)). It generalizes the corresponding statistic introduced by Ghosal, Sen, and van der Vaart (2000) for testing the related hypothesis of monotonicity of a regression function. Our first contribution is to prove that the asymptotic distribution of our test statistic is Gumbel with certain nonstandard norming constants, thereby facilitating inference using critical values obtained from the limiting distribution. We also show that the test is consistent against all alternatives. The proof technique is quite complicated and novel because the approximating Gaussian stochastic process contains both a stationary part (corresponding to x) and a nonstationary part (corresponding to y), and so we have to extend existing results that apply only to one or the other case. For example, Stute (1986) established the weak convergence to a Brownian bridge of a conditional empirical process (effectively holding x constant in our problem). In the other direction, using the local strong invariance principle of Rio (1994), Ghosal, Sen, and van der Vaart (2000) established local (in x in our notation) weak convergence of an empirical process to a stationary limit, generalizing the seminal work of Bickel and Rosenblatt (1973). The most closely related work to ours is Beirlant and Einmahl (1996), who considered the asymptotics of some functional of a conditional empirical process except that they considered a maximum over a discrete partition of the support of the covariate. See also Einmahl and Van Keilegom (2008). We use some results of Piterbarg (1996) to establish the approximation. These results can be of use elsewhere. See Appendix A.1 of the electronic supplement to this article (Lee, Linton, and Whang (2009)) for some informal discussion on the proof technique. One known issue with the extreme value limiting distributions is the poor quality of the asymptotic approximation in the sense that the error declines only at a logarithmic (in sample size) rate. The usual approach to this has been to use the bootstrap, which provides an asymptotic refinement by removing the logarithmic error term and giving an error of polynomial order (Hall (1993)). In a special case of ours (of a stationary Gaussian process), Piterbarg (1996) provided a higher order analytic approximation to the limiting distribution that involves including the (known) logarithmic factor in the first-order error. His Theorem G1 shows that this corrected distribution is closer to the actual distribution and indeed has an error of polynomial (in sample size) magnitude. We apply this analysis to our more complicated setting and compute the corresponding “correction” term. Our simulation study shows that this approach gives a noticeable improvement in size. An alternative approach is to use a
588
S. LEE, O. LINTON, AND Y.-J. WHANG
standard bootstrap resample applied to the (recentered) statistic (or a bootstrap resample imposing independence between Y and X) to improve the size of the test, motivated by the reasoning of Hall (1993). This method should also yield an asymptotic refinement (Horowitz (2001)), but is much more time consuming than using the asymptotic critical values. The hypothesis (1) implies that the regression function E(Y |X = x), when it exists, is monotonic increasing. It also implies that all conditional quantile functions are increasing. It is a strong hypothesis, but can be reduced in strength by limiting the set of X and Y for which this property holds. See, for example, Bowman, Jones, and Gijbels (1998), Hall and Heckman (2000), Ghosal, Sen, and van der Vaart (2000), and Gijbels, Hall, Jones, and Koch (2000) for existing tests of the hypothesis that E(Y |X = x) is increasing in x. Note that the transformation regression model structure considered in Ghosal, Sen, and van der Vaart (2000) that is, φ(Y ) = m(X) + ε where ε is independent of X, and both φ and m are monotonic functions, actually implies stochastic monotonicity. See also Ekeland, Heckman, and Nesheim (2004). Also, a test of the hypothesis (1) can be viewed as a continuum version of the stochastic dominance test (see Linton, Maasoumi, and Whang (2005) and references therein for details on the stochastic dominance test). The remainder of the paper is organized as follows. Section 2 defines our test statistic and Section 3 states the asymptotic results and describes how to carry out the test. Section 4 contains results of some Monte Carlo experiments. Section 5 illustrates the usefulness of our test by applying it to the study of intergenerational income mobility. Section 6 considers a multivariate extension and Section 7 concludes. All the proofs are given in the electronic supplement to this article (Lee, Linton, and Whang (2009)). 2. THE TEST STATISTIC This section describes our test statistic. Let {(Yi Xi ) : i = 1 n} denote a random sample from (Y X). We suppose throughout that the data are i.i.d., but the main result also holds for the Markov time series case where Yi = Yt+1 i = and Xi = Yt . We actually suppose that Xi is not observed, but an estimate X ψ(Wi θ) is available, where Xi = ψ(Wi θ0 ) is a known function of observable θ is a root-n consistent estimator Wi for some true parameter value θ0 and thereof. The vector Wi can contain discrete and continuous variables. Let 1(·) denote the usual indicator function and let K(·) denote a one-dimensional kernel function with a bandwidth hn . Consider the U-process n (y x) = U
2 i − X j ) [1(Yi ≤ y) − 1(Yj ≤ y)] sgn(X n(n − 1) 1≤i<j≤n i − x)Khn (X j − x) × Khn (X
TESTING FOR STOCHASTIC MONOTONICITY
589
where Khn (·) = h−1 n K(·/ hn ) and sgn(x) = 1(x > 0) − 1(x < 0). Note that the n (y x) can be viewed as a locally weighted version of Kendall’s U-process U n (y x) is related to the U-process tau statistic applied to 1(Y ≤ y) and that U considered in Ghosal, Sen, and van der Vaart (2000, equation (2.1)). n (y x) computed using Xi instead of X i First, notice Let Un (y x) denote U that under regularity conditions including smoothness of FY |X (y|x), as n → ∞, h−1 n EUn (y x) → Fx (y|x) |u1 − u2 |K(u1 )K(u2 ) du1 du2 [fX (x)]2 where Fx (y|x) is a partial derivative of FY |X (y|x) with respect to x. Therefore, since θ is a consistent estimator, under the null hypothesis such that n (y x) is less than or equal to zero on Fx (y|x) ≤ 0 for all (y x) ∈ Y × X , U average for large n. Under the alternative hypothesis such that Fx (y|x) > 0 for n (y x) can be very some (y x) ∈ Y × X , a suitably normalized version of U large. In view of this, we define our test statistic as a supremum statistic (2)
Sn =
sup (yx)∈Y ×X
n (y x) U cn (x)
with some suitably defined cn (x), which may depend on (X1 Xn ) but not on (Y1 Yn ). The √ U-statistic structure suggests that we use the scaling facσn (x)/ n, where tor cn (x) = σn2 (x) =
4 i − X j ) sgn(X i − X k ) sgn(X n(n − 1)(n − 2) 1≤i =j =k≤n j − x)Khn (X k − x) Khn (X i − x) 2 × Khn (X
REMARK 2.1: (i) An alternative class of test statistics is based on explicit estimation of conditional cumulative distribution functions (c.d.f.’s). Thus, Y |X (y|x) − F Y |X (y|x )], where, for example, consider Tn = supy∈Y xx ∈X :x≥x [F Y |X (y|x) is some kernel estimate of the conditional c.d.f.; see Hall, Wolff, and F Yao (1999). The advantage that Tn has is that it does not require smoothness of FY |X (y|x). The disadvantage is that its limiting distribution is not pivotal and it is not known how to make it so. (ii) One might also be interested in testing second or higher order dominance (Levy (2006)) of the conditional distribution functions, which can be achieved by straightforward modification of either Sn or Tn .
590
S. LEE, O. LINTON, AND Y.-J. WHANG
REMARK 2.2: In applications one may also be interested in the following extension where there are multiple covariates. Specifically, suppose that X is replaced by X Z where Z is a vector, and the hypothesis is that H0 : For each y ∈ Y , FY |XZ (y|x z) ≤ FY |XZ (y|x z) whenever x ≥ x for x x ∈ X and z ∈ Z . This hypothesis allows the variable Z to affect the response in a general way. The hypothesis is nonnested with (1) for the same reason that a conditional independence hypothesis is non-nested with an independence hypothesis; see Dawid (1979) and Phillips (1988). In the case that Z are discrete random variables, our test statistic can be trivially adapted to test this hypothesis. If Z included some continuous random variables, then a modified version of our test statistic might work, but its asymptotic distribution would be different. 2.3: As an alternative norming constant, one can use cn (x) = REMARK √ σ˜ n (x)/ n, where −1 n
σ˜ (x) = 4h 2 n
q (u)K (u) du × fX3 (x) 2
2
q(u) = sgn(u − w)K(w) dw, and fX (x) is the kernel density estimator of σn (x). fX (x). It can be shown easily that σ˜ n (x) is asymptotically equivalent to In finite samples, σn (x) may worker better than σ˜ n (x) since the former is based on a more direct sample analog, but the latter is easier to compute. REMARK 2.4: In some applications, it might be more desirable to assume that Xi = ψ(Wi θ0 ) + εi (i = 1 n), where εi is an unobserved random i = ψ(Wi θ). To resolve variable. In this case, our test does not apply with X this case, one could assume certain regularity conditions that ensure that the stochastic monotonicity between Y and ψ(W θ0 ) holds if the stochastic monotonicity between Y and X holds (e.g., the monotone likelihood ratio property as in Proposition 2 of Ellison and Ellison (2007)). 3. ASYMPTOTIC THEORY This section provides the asymptotic behavior of the test statistic when the null hypothesis is true and when it is false. In particular, we determine the asymptotic critical region of the test and show that the test is consistent against general fixed alternatives at any level.
TESTING FOR STOCHASTIC MONOTONICITY
591
3.1. Distribution of the Test Statistic Since the hypothesis (1) is a composite hypothesis, it is necessary to find a case when the type I error probability is maximized asymptotically. First of all, under regularity conditions assumed below, it can be shown that
n (y x) − Un (y x) = Op n−1/2 and h1/2 (3) U n σn (x) = Op (1) uniformly over (y x) (see Lemmas A.6, A.7, and A.9 of Lee, Linton, and Whang (2009)). Then if hn → 0, (4)
n (y x) = Un (y x)[1 + op (1)] U
uniformly over (y x). Thus, the asymptotic distribution of the test statistic Sn is the same as if Xi were observed directly. Now define 2 [FY |X (y|Xi ) − FY |X (y|Xj )] sgn(Xi − Xj ) U˜ n (y x) = n(n − 1) 1≤i<j≤n × Khn (Xi − x)Khn (Xj − x) Since E[Un (y x) − U˜ n (y x)|X1 Xn ] = 0, under regularity conditions assumed below, using the empirical process method (see, e.g., Ghosal, Sen, and van der Vaart (2000, Appendix) and van der Vaart and Wellner (1996)), it can be shown that 1/2
log n ˜ and U˜ n (y x) = Op (hn ) Un (y x) − Un (y x) = Op nhn uniformly over (y x). Then if log n/(nh3n ) → 0, (5)
Un (y x) = U˜ n (y x)[1 + op (1)]
uniformly over (y x). Under the null hypothesis (1), note that (6)
[FY |X (y|Xi ) − FY |X (y|Xj )] sgn(Xi − Xj ) ≤ 0
Hence, by (4), (5), and (6), the type I error probability is maximized asymptotically when Fx (y|x) ≡ 0, equivalently FY |X (y|x) = FY (y) for any (y x) ∈ Y × X . Therefore, to derive the limiting distribution under the null hypothesis, we consider the case that Fx (y|x) ≡ 0, equivalently FY |X (y|x) = FY (y) for any (y x). That is, Y and X are independent. Further assume that without loss of generality, the support of X is X = [0 1]. To establish the asymptotic null distribution of the test statistic, we make the following assumptions, which are standard in the literature on nonparametric estimation and testing.
592
S. LEE, O. LINTON, AND Y.-J. WHANG
ASSUMPTION 3.1: Assume that (a) Y and X are independent; (b) X = [0 1]; (c) the distribution of X is absolutely continuous with respect to Lebesgue measure and the probability density function of X is continuously differentiable and strictly positive in X ; (d) the distribution of Y is absolutely continuous with respect to Lebesgue measure; (e) K is a second-order kernel function with support [−1 1] and is twice continuously differentiable; (f) θ0 is a finite-dimensional parameter and θ − θ0 = Op (n−1/2 ); (g) for each w, ψ(w θ) is continuously differentiable with respect to θ; (h) for any (Wi x hn ), there exists a positive constant CL < ∞ such that |ξ(Wi x θ1 hn ) − ξ(Wi x θ2 hn )| ≤ CL θ1 − θ2 for all θ1 and θ2 in a neighborhood of θ0 , where ξ(Wi x θ hn ) ˜ θ)]Khn [ψ(w ˜ θ) − x] dFW (w) ˜ = sgn[ψ(Wi θ) − ψ(w To describe the limiting distribution of Sn , recall that q(u) = w)K(w) dw. Let βn be the largest solution to the equation (7) where (8)
−1 n
h
8λ π
sgn(u −
1/2 βn exp(−2β2n ) = 1
6 q(x)K 2 (x)K (x) dx + q2 (x)K(x)K (x) dx λ=− q2 (x)K 2 (x) dx
The following theorem gives the asymptotic distribution of the test statistic when the null hypothesis is true. THEOREM 3.1: Let Assumption 3.1 hold. Let hn satisfy hn log n → 0, nh3n / (log n) → ∞, and nh2n /(log n)4 → ∞. Then for any x,
x x2 (9) + o(1) Pr(4βn (Sn − βn ) < x) = exp − exp −x − 2 1 + 2 8βn 4βn In particular, lim Pr(4βn (Sn − βn ) < x) = exp(−e−x ) ≡ F∞ (x)
n→∞
REMARK 3.1: It is necessary to compute βn in (7) to construct a test based on Theorem 3.1. The constant λ in (8) can be computed easily for commonly used kernels that are twice differentiable and have compact support. For example, λ = 1177/118 ≈ 9975 for the Epanechnikov kernel K(u) =
593
TESTING FOR STOCHASTIC MONOTONICITY
075(1 − u2 )1(|u| ≤ 1) and λ = 131689/11063 ≈ 11904 for the biweight kernel K(u) = (15/16)(1 − u2 )2 1(|u| ≤ 1). It is straightforward to show that (10)
βn =
1 ∗ log[h−1 n c ] 2
1/2 +
∗ log[ 12 log[h−1 n c ]] ∗ 1/2 8( 12 log[h−1 n c ])
+o
∗ 1/2 (log[h−1 n c ]) 1
where c ∗ = (8λ/π)1/2 . Then one can use an approximation to βn by the first two terms on the right side of (10) or solve the nonlinear equation (7) numerically. REMARK 3.2: We note that the regularity conditions on hn are not very restrictive. Bandwidth sequences hn that converge to zero at a rate of n−η , η < 1/3, or (log n)−ν , ν > 1, satisfy the conditions imposed in Theorem 3.1. We might also consider data-dependent bandwidths such as provided by crossvalidation, for example. That is, let hn be a data-dependent sequence such that p hn / hn → 1, where hn is a deterministic sequence that satisfies the assumptions of Theorem 3.1. In view of the results in Einmahl and Mason (2005), one expects that under some suitable regularity conditions, the asymptotic distribuhn is the same as that given tion of the test Sn with data-dependent bandwidths in Theorem 3.1. However, it is beyond the scope of this paper to provide such regularity conditions and corresponding proofs. As in Theorem 4.2 of Ghosal, Sen, and van der Vaart (2000), the theorem suggests that one can construct a test with an asymptotic level α: (11)
Reject H0 if F∞ (4βn (Sn − βn )) ≥ 1 − α
for any 0 < α < 1. Alternatively, one can construct an α-level test with (9): (12)
Reject H0 if Fn (4βn (Sn − βn )) ≥ 1 − α
where for each n, Fn (x) is the “distribution function” of the form2
x x2 Fn (x) = exp − exp −x − 2 1 + 2 8βn 4βn Although (11) yields the correct size asymptotically, the results of Hall (1979, 1991) suggest that Pr[(11) is true|H0 ] = α + O(1/ log n) which is rather slow for practical purposes. However, the results of Piterbarg (1996, Theorem G1) suggest that Pr[(12) is true |H0 ] = α + O(n−q ) for some q > 0 which is potentially much better. In the next section, we carry out Monte Carlo experiments using both critical regions (11) and (12). In our experiments, a test based on 2 The approach in (12) to defining the critical region is motivated partly by the normalizing transformation approach (Phillips (1979)).
594
S. LEE, O. LINTON, AND Y.-J. WHANG
(12) performs much better in finite samples and yields size quite close to the nominal value. An alternative approach to constructing critical values would be to use a bootstrap resampling method (that imposes independence between X and Y ) and then to reject if F∞ (4βn (Sn − βn )) exceeds the 1 − α critical value of the bootstrap distribution of F∞ (4βn (Sn∗ − βn )) where Sn∗ is the bootstrapped test statistic (Horowitz (2001)). Hall (1993) showed in the related context of density estimation that a bootstrapped test yields error of order n−q for some q > 0. We expect a similar result can be established here. The bootstrap approach is much more computationally demanding than the asymptotic approach outlined above. We now turn to the consistency of the test. It is straightforward to show that the test specified by (11) or (12) is consistent against general alternatives. THEOREM 3.2: Assume that nh3n / log h−1 n → ∞. If Fx (y|x) > 0 for some (y x) ∈ Y × X , then the test specified by (11) or (12) is consistent at any level. We end this subsection by mentioning that the test and its asymptotic properties obtained in this section can be extended easily to the case when the null hypothesis in (1) holds only for Y and X1 , where X1 is a compact interval and a strict subset of X . In this case, Fx (y|x) ≡ 0 does not imply that Y and X are independent; however, this would not matter since our test statistic depends only on observations inside an open interval containing X1 . Thus, the asymptotic properties of the supremum test statistic would be the same with X1 except that h−1 n in (7) and (10) is replaced with measure(X1 )/ hn . 4. MONTE CARLO EXPERIMENTS This section presents the results of some Monte Carlo experiments that illustrate the finite-sample performance of the test. For each Monte Carlo experiment, X was independently drawn from a uniform distribution on [0 1]. To evaluate the performance of the test under the correct null hypothesis, Y ≡ U was generated independently from X, where U ∼ N(0 012 ). In addition, to see the power of the test, Y was also generated from Y = m(X) + U, where m(x) = x(1 − x). The simulation design considered here is similar to that of Ghosal, Sen, and van der Vaart (2000).√ To save computing time the test statistic was computed by the maximum of nUn (y x)/σ˜ n (x) over Y × X , where Y = {Y1 Y2 Yn }, X = {005 010 090 095}, and σ˜ n (x) was defined in Remark 2.3. The kernel function was K(u) = 075(1 − u2 ) for −1 ≤ u ≤ 1. The simulations used sample sizes of n = 50 100 200, and 500, and all the simulations were carried out in GAUSS using GAUSS pseudo-random number generators. For each simulation, the number of replications was 1500. Table I reports results of Monte Carlo experiments using critical values obtained from the asymptotic expansion Fn of the limiting distribution (see (12))
TESTING FOR STOCHASTIC MONOTONICITY
595
TABLE I SIMULATION RESULTS Sample Size n
Bandwidth hn = 04
h = 05
h = 06
h = 07
Using critical values obtained from the asymptotic expansion Fn of the limiting distribution (1500 replications in each experiment) Rejection proportions when the null hypothesis is true 50 0.014 0.021 0.025 0.030 100 0.028 0.033 0.034 0.034 200 0.025 0.031 0.036 0.033 500 0.032 0.039 0.033 0.037 Rejection proportions when the null hypothesis is false 50 0.687 0.762 100 0.976 0.988 200 1.000 1.000
0.771 0.989 1.000
0.760 0.977 1.000
Using critical values obtained from the type I extreme value distribution (1500 replications in each experiment) Rejection proportions when the null hypothesis is true 50 0.009 0.017 0.013 0.017 100 0.022 0.024 0.022 0.021 200 0.015 0.021 0.022 0.021 500 0.021 0.021 0.022 0.023 Rejection proportions when the null hypothesis is false 50 0.618 0.693 100 0.966 0.976 200 1.000 1.000
0.697 0.983 1.000
0.694 0.965 1.000
Using bootstrap critical values with 500 bootstrap samples (500 replications in each experiment) Rejection proportions when the null hypothesis is true 50 0.064 0.062 0.046 0.058 Rejection proportions when the null hypothesis is false 50 0.814 0.872
0.880
0.856
and also using those from the type I extreme value distribution (see (11)). The nominal level was 5%. First, consider the first panel of the table that shows results with the critical values from Fn . When the null hypothesis is true, each rejection proportion is below the nominal level for all the bandwidths and is maximized at n = 500 and hn = 05. It can be seen that the best hn is decreasing with the sample size and the performance of the test is less sensitive to hn as n gets large. When the null hypothesis is false, for all values of hn , the powers of the test are high for n = 50, almost 1 for n = 100, and 1 for n = 200. The performance of the test with critical values from the type I extreme value distribution is uniformly worse, as seen from the second panel of the table.
Hence, our simulation study shows that the approximation based on (12) gives a substantial improvement in size. In addition, Table I gives results with bootstrap critical values. Each bootstrap resample is generated by sampling Y and X separately, at random and with replacement (i.e., imposing independence between Y and X). Because of very lengthy computation times, these Monte Carlo experiments were carried out only for n = 50 and with only 500 replications in each experiment; there were 500 bootstrap resamples for each replication. Not surprisingly, when the null hypothesis is true, the difference between actual and nominal rejection proportions is smaller than with either of the asymptotic critical values. As a result, the test with bootstrap critical values also has better power. In view of these experimental results, we recommend using bootstrap critical values when the sample size is small or moderate.

5. APPLICATION TO INTERGENERATIONAL INCOME MOBILITY

This section presents an empirical example in which the test statistic S_n is used to test a hypothesis about stochastic monotonicity between sons' incomes and parental incomes. See Solon (1999, 2002) for a detailed survey of intergenerational income mobility in the United States and other countries. A large body of this literature focuses on the extent to which sons' incomes are correlated with fathers' or parental incomes. Testing the hypothesis of stochastic monotonicity in (1), with Y being sons' incomes and X being parental incomes, can give further insight into intergenerational income mobility. For example, failing to reject the hypothesis would be consistent with sons' incomes at high parental incomes being higher, not only on average but also in the stochastic dominance sense, than those at low parental incomes.3

We use data from the Panel Study of Income Dynamics (PSID), which has been used frequently to study mobility since the highly influential paper by Solon (1992). In particular, we use Minicozzi's (2003) data extract, which is available on the Journal of Applied Econometrics website. The Y variable is the logarithm of sons' averaged full-time real labor income at ages 28 and 29, and the X variable is the logarithm of parental predicted permanent income.4

3 The related paper by Dearden, Machin, and Reed (1997) investigated intergenerational income mobility in Britain using the quantile transition matrix approach.

4 Minicozzi (2003) computed sons' average incomes only when incomes at both ages 28 and 29 were available and regarded those with only one or no income record as censored observations. In our empirical work, we define sons' average incomes as the average of observed incomes; hence, sons' average incomes are defined for those with only one income record at age 28 or at age 29. With our definition, only 12 cases have missing sons' incomes. This is only 2% of the 628 original observations in Minicozzi's (2003) data extract, so censoring is not a serious issue here. Parental permanent incomes are predicted values. The asymptotic distribution of S_n is the same as long as the parametric model used by Minicozzi (2003, equations (8) and (9)) gives consistent estimates of parental permanent incomes.
FIGURE 1.—Intergenerational income mobility: local linear quantile regression estimates of sons’ log incomes on parental log incomes.
Figure 1 shows local linear quantile regression estimates of sons' log incomes on parental log incomes. The kernel function used in this estimation is the same as that used in the Monte Carlo experiments. For each quantile, the bandwidth is chosen by the simple rule of thumb suggested by Fan and Gijbels (1996, p. 202): 0.59 (quantile = 10%), 0.55 (25%), 0.55 (50%), 0.56 (75%), and 0.69 (90%). It can be seen that all the conditional quantiles of sons' incomes are increasing functions of parental incomes over most of the range of the support of X. This suggests that there may be stochastic monotonicity between sons' and parental incomes.

To test this formally, the test statistic S_n is computed as the maximum of √n U_n(y, x)/σ̃_n(x) over Y × X, where Y = {Y_1, Y_2, ..., Y_n} and X = [8.48, 10.85]; the two endpoints of X correspond to the 1st and 99th percentiles of parental log permanent incomes. The same kernel is used, with the bandwidth h_n = 0.55 that was used above to estimate the local linear median. The test gives S_n = 0.5227. The normalizing constant β_n in (7) and (10) was obtained under the assumption that X = [0, 1], but it is trivial to extend it to the more general case X = [a, b]: one just needs to replace h_n^{−1} in (7) and (10) with (b − a)/h_n, as illustrated in the sketch below.
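As a numerical illustration of this modification, the following Python sketch computes the normalizing constant and the two resulting critical values. The paper's scalar normalizing equation ((7) and (10)) is not reproduced in this section, so the sketch uses the d = 1 case of equation (16) from Section 6 as a stand-in, with h_n^{−1} replaced by (b − a)/h_n as just described; the kernel constant lam (the λ of equation (8), also not reproduced here) is a placeholder value, so the printed numbers are illustrative rather than the 1.71 and 1.72 reported below.

```python
import numpy as np
from scipy.optimize import brentq

a, b, h, alpha = 8.48, 10.85, 0.55, 0.10
lam = 1.0   # placeholder for the kernel constant lambda of equation (8)

def beta_eq(t):
    # d = 1 case of equation (16), with (b - a)/h in place of 1/h:
    # ((b - a)/h) * sqrt(8*lam/pi) * t * exp(-2 t^2) = 1.
    return (b - a) / h * np.sqrt(8 * lam / np.pi) * t * np.exp(-2 * t ** 2) - 1

# t * exp(-2 t^2) is decreasing for t > 1/2, so the largest solution of the
# equation lies on that branch.
beta_n = brentq(beta_eq, 0.5, 10.0)

# Critical value from the Gumbel limit (11):
# reject if 4 beta_n (S_n - beta_n) > -log(-log(1 - alpha)).
q = -np.log(-np.log(1 - alpha))
c_gumbel = beta_n + q / (4 * beta_n)

# Critical value from the finite-sample expansion (12) (d = 1 case of (17)).
def F_n(x):
    return np.exp(-np.exp(-x - x ** 2 / (8 * beta_n ** 2))
                  * (1 + x / (4 * beta_n ** 2)))

x_star = brentq(lambda x: F_n(x) - (1 - alpha), 0.0, 20.0)
c_expansion = beta_n + x_star / (4 * beta_n)
print(beta_n, c_gumbel, c_expansion)
```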
After this simple modification, the critical values at the 10% nominal level are 1.71 using (11) and 1.72 using (12). Changing the bandwidth to 0.75h_n or to 1.25h_n did not change this conclusion. Thus, we fail to reject the null hypothesis of stochastic monotonicity at any conventional level, which confirms the findings from Figure 1.

6. TESTING FOR STOCHASTIC MONOTONICITY IN A VECTOR

In this section, we extend our analysis to the case where monotonicity in a vector is of interest. Let X be a d-dimensional vector of random variables whose distribution is absolutely continuous with respect to Lebesgue measure on R^d. We consider the following hypothesis, which is a multivariate generalization of (1):
(13)    H_0: For each y ∈ Y, F_{Y|X}(y|x) ≤ F_{Y|X}(y|x′) whenever x_j ≥ x′_j for all j = 1, ..., d, for x ≡ (x_1, ..., x_d), x′ ≡ (x′_1, ..., x′_d) ∈ X.

The hypothesis (13) restricts the stochastic ordering of F_{Y|X}(y|x) only when the components of x and x′ are ordered componentwise. In other words, using the terminology of Manski (1997), testing (13) amounts to testing the stochastic semimonotonicity of F_{Y|X}. The hypothesis (13) can be of interest in a number of empirical applications, for example, when Y is output and X is a vector of inputs used in production (Manski (1997)).

We now describe a test statistic for (13). Let {(Y_i, X_i) : i = 1, ..., n} denote a random sample from (Y, X). As in Section 2, we assume that X_i is not observed, but that X̂_i is an estimate of it based on a root-n consistent estimator θ̂ of θ. For u ≡ (u_1, ..., u_d), let K(·) denote a d-dimensional product of univariate kernel functions, K(u) = ∏_{j=1}^d K(u_j), and let I(u > 0) = ∏_{j=1}^d 1(u_j > 0). Consider the U-process
U_n(y, x) = [2/(n(n−1))] Σ_{1≤i<j≤n} [1(Y_i ≤ y) − 1(Y_j ≤ y)] sgn(X̂_i − X̂_j) K_{h_n}(X̂_i − x) K_{h_n}(X̂_j − x),
where K_{h_n}(·) = h_n^{−d} K(·/h_n) and sgn(x) = I(x > 0) − I(x < 0). Note that sgn(X̂_i − X̂_j) has a nonzero value only when semimonotonicity holds between X̂_i and X̂_j. Again, we define our test statistic as a supremum statistic,

(14)    S_n = sup_{(y, x) ∈ Y × X} U_n(y, x)/c_n(x),
with c_n(x) = s_n(x)/√n, where

s_n²(x) = [4/(n(n−1)(n−2))] Σ_{1≤i≠j≠k≤n} sgn(X̂_i − X̂_j) sgn(X̂_i − X̂_k) K_{h_n}(X̂_i − x)² K_{h_n}(X̂_j − x) K_{h_n}(X̂_k − x).
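To make the roles of the componentwise sign and the product kernel concrete, here is a minimal Python sketch of U_n(y, x) for d-dimensional covariates (a direct O(n²) implementation for clarity; in practice the estimates X̂_i would replace the X_i used here):

```python
import numpy as np

def sgn(u):
    # sgn(u) = I(u > 0) - I(u < 0), with I(u > 0) = prod_j 1(u_j > 0):
    # nonzero only when the two covariate vectors are ordered componentwise.
    return float(np.all(u > 0)) - float(np.all(u < 0))

def K_h(u, h):
    # Product Epanechnikov kernel, scaled by h^{-d}:
    # K_h(u) = h^{-d} prod_j 0.75 (1 - (u_j/h)^2) 1(|u_j/h| <= 1).
    v = u / h
    return np.prod(0.75 * (1.0 - v ** 2) * (np.abs(v) <= 1.0)) / h ** len(u)

def U_n(y, x, Y, X, h):
    # U_n(y, x) = 2/(n(n-1)) sum_{i<j} [1(Y_i<=y) - 1(Y_j<=y)]
    #             * sgn(X_i - X_j) K_h(X_i - x) K_h(X_j - x).
    n = len(Y)
    kw = np.array([K_h(Xi - x, h) for Xi in X])   # precompute kernel weights
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            s = sgn(X[i] - X[j])
            if s != 0.0:                          # most pairs are unordered
                total += ((float(Y[i] <= y) - float(Y[j] <= y))
                          * s * kw[i] * kw[j])
    return 2.0 * total / (n * (n - 1))
```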
Under the null hypothesis (13), note that

(15)    [F_{Y|X}(y|X_i) − F_{Y|X}(y|X_j)] sgn(X_i − X_j) ≤ 0.
Thus, using arguments similar to those used in Section 3.1, it can be shown that the type I error probability is maximized asymptotically when the inequality (15) holds with equality for all i and j. This equality occurs when Y is independent of X. Therefore, to derive the limiting distribution under the null hypothesis, we consider the case in which Y and X are independent.

ASSUMPTION 6.1: Assume that Y and X are independent and that the support of X is X = [0, 1]^d. Let conditions (c)–(h) of Assumption 3.1 hold.

Let b_n be the largest solution to the equation

(16)    h_n^{−d} 2^{−(d−1)} (8λ/π)^{d/2} b_n^d exp(−2b_n²) = 1,
where λ is defined in (8). The following theorem is a generalization of Theorem 3.1.

THEOREM 6.1: Let Assumption 6.1 hold. Let h_n satisfy h_n log n → 0, nh_n^{3d}/(log n)^{2(d+1)} → ∞, and nh_n^{d+1}/(log n) → ∞. Then for any x,
(17)    Pr(4b_n(S_n − b_n) < x) = exp{−exp(−x − x²/(8b_n²)) (1 + x/(4b_n²))^d} + o(1).

In particular,

lim_{n→∞} Pr(4b_n(S_n − b_n) < x) = exp(−e^{−x}) ≡ F_∞(x).
Then a test with asymptotically valid critical values can be constructed as in Section 3.1. Furthermore, it can be shown that the corresponding test is consistent at any level against fixed alternatives, provided that nh_n^{3d}/log h_n^{−1} → ∞.
7. CONCLUSIONS

We have proposed a test for stochastic monotonicity and have developed the asymptotic null distribution of our test statistic. There remain several research topics that we have not addressed in this paper. First, we have only established the consistency of the test against fixed general alternatives; it would be useful to establish asymptotic results on the local power of the test. Second, we have not considered an "optimal" choice of the bandwidth used in the test statistic. Our theoretical results for the asymptotic null distribution and the consistency of the test do not distinguish between different bandwidths, provided that the sequence of bandwidths satisfies some weak regularity conditions on rates of convergence. Thus, it would be necessary to develop a finer asymptotic result to discuss an optimal bandwidth choice. Doing this, and developing a corresponding data-dependent bandwidth choice, are topics for future research. Finally, using a method similar to that used in this paper, we can extend the supremum test of Ghosal, Sen, and van der Vaart (2000), who considered the monotonicity of the regression function with a scalar explanatory variable, to the multivariate setup. This is another topic for future research.

REFERENCES

BEIRLANT, J., AND J. H. J. EINMAHL (1996): “Maximal Type Test Statistics Based on Conditional Processes,” Journal of Statistical Planning and Inference, 53, 1–19. [587]
BICKEL, P. J., AND M. ROSENBLATT (1973): “On Some Global Measures of the Deviations of Density Function Estimates,” Annals of Statistics, 1, 1071–1095. [587]
BLUNDELL, R., A. GOSLING, H. ICHIMURA, AND C. MEGHIR (2007): “Changes in the Distribution of Male and Female Wages Accounting for Employment Composition Using Bounds,” Econometrica, 75, 323–363. [586]
BOWMAN, A. W., M. C. JONES, AND I. GIJBELS (1998): “Testing Monotonicity of Regression,” Journal of Computational and Graphical Statistics, 7, 489–500. [588]
BUETTNER, T. (2003): “R&D and the Dynamics of Productivity,” Manuscript, London School of Economics. [586]
DAWID, A. P. (1979): “Conditional Independence in Statistical Theory,” Journal of the Royal Statistical Society, Series B, 41, 1–31. [590]
DEARDEN, L., S. MACHIN, AND H. REED (1997): “Intergenerational Mobility in Britain,” Economic Journal, 107, 47–66. [596]
EINMAHL, J. H. J., AND I. VAN KEILEGOM (2008): “Specification Tests in Nonparametric Regression,” Journal of Econometrics, 143, 88–102. [587]
EINMAHL, U., AND D. M. MASON (2005): “Uniform in Bandwidth Consistency of Kernel-Type Function Estimators,” Annals of Statistics, 33, 1380–1403. [593]
EKELAND, I., J. J. HECKMAN, AND L. NESHEIM (2004): “Identification and Estimation of Hedonic Models,” Journal of Political Economy, 112, S60–S109. [588]
ELLISON, G., AND S. F. ELLISON (2007): “Strategic Entry Deterrence and the Behavior of Pharmaceutical Incumbents Prior to Patent Expiration,” Working Paper, Department of Economics, MIT. [586,587,590]
ERICSON, R., AND A. PAKES (1995): “Markov-Perfect Industry Dynamics: A Framework for Empirical Work,” Review of Economic Studies, 62, 53–82. [586]
FAN, J., AND I. GIJBELS (1996): Local Polynomial Modelling and Its Applications. London: Chapman & Hall. [597]
GHOSAL, S., A. SEN, AND A. W. VAN DER VAART (2000): “Testing Monotonicity of Regression,” Annals of Statistics, 28, 1054–1082. [587-589,593,594,600]
GIJBELS, I., P. HALL, M. C. JONES, AND I. KOCH (2000): “Tests for Monotonicity of a Regression Mean With Guaranteed Level,” Biometrika, 87, 663–673. [588]
HALL, P. (1979): “On the Rate of Convergence of Normal Extremes,” Journal of Applied Probability, 16, 433–439. [593]
——— (1991): “On Convergence Rate of Suprema,” Probability Theory and Related Fields, 89, 447–455. [593]
——— (1993): “On Edgeworth Expansion and Bootstrap Confidence Bands in Nonparametric Curve Estimation,” Journal of the Royal Statistical Society, Series B, 55, 291–304. [587,588,594]
HALL, P., AND N. E. HECKMAN (2000): “Testing for Monotonicity of a Regression Mean by Calibrating for Linear Functions,” Annals of Statistics, 28, 20–39. [587,588]
HALL, P., R. C. L. WOLFF, AND Q. YAO (1999): “Methods for Estimating a Conditional Distribution Function,” Journal of the American Statistical Association, 94, 154–163. [589]
HOPENHAYN, H. A., AND E. C. PRESCOTT (1992): “Stochastic Monotonicity and Stationary Distributions for Dynamic Economies,” Econometrica, 60, 1387–1406. [586]
HOROWITZ, J. L. (2001): “The Bootstrap,” in Handbook of Econometrics, Vol. 5, ed. by J. J. Heckman and E. Leamer. Amsterdam: North-Holland. [588,594]
LEE, S., O. LINTON, AND Y.-J. WHANG (2009): “Supplement to ‘Testing for Stochastic Monotonicity’,” Econometrica Supplemental Material, 77, http://www.econometricsociety.org/ecta/Supmat/7145_proofs.pdf; http://www.econometricsociety.org/ecta/Supmat/7145_Data and programs.pdf. [587,588,591]
LEVY, H. (2006): Stochastic Dominance: Investment Decision Making Under Uncertainty (Second Ed.). Berlin: Springer. [589]
LINTON, O., E. MAASOUMI, AND Y.-J. WHANG (2005): “Consistent Testing for Stochastic Dominance Under General Sampling Schemes,” Review of Economic Studies, 72, 735–765. [588]
LUCAS, R. E., AND N. L. STOKEY (1989): Recursive Methods in Dynamic Economics. Cambridge, MA: Harvard University Press. [586]
MANSKI, C. F. (1997): “Monotone Treatment Response,” Econometrica, 65, 1311–1334. [598]
MANSKI, C. F., AND J. PEPPER (2000): “Monotone Instrumental Variables: With an Application to the Returns to Schooling,” Econometrica, 68, 997–1010. [586]
MINICOZZI, A. L. (2003): “Estimation of Sons’ Intergenerational Earnings Mobility in the Presence of Censoring,” Journal of Applied Econometrics, 18, 291–314. [596]
NOLAN, D., AND D. POLLARD (1987): “U-Processes: Rates of Convergence,” Annals of Statistics, 15, 780–799. [587]
OLLEY, G. S., AND A. PAKES (1996): “The Dynamics of Productivity in the Telecommunications Equipment Industry,” Econometrica, 64, 1263–1297. [586]
PAKES, A. (1986): “Patents as Options: Some Estimates of the Value of Holding European Patent Stocks,” Econometrica, 54, 755–784. [586]
PHILLIPS, P. C. B. (1979): “Normalizing Transformations and Expansions for Functions of Statistics,” Research Notes, Birmingham University. [593]
——— (1988): “Conditional and Unconditional Statistical Independence,” Journal of Econometrics, 38, 341–348. [590]
PITERBARG, V. I. (1996): Asymptotic Methods in the Theory of Gaussian Processes and Fields, Translation of Mathematical Monographs, Vol. 148. Providence, RI: American Mathematical Society. [587]
RIO, E. (1994): “Local Invariance Principles and Their Application to Density Estimation,” Probability Theory and Related Fields, 98, 21–45. [587]
SMALL, D., AND Z. TAN (2007): “A Stochastic Monotonicity Assumption for the Instrumental Variables Method,” Working Paper, Department of Statistics, Wharton School, University of Pennsylvania. [586]
SOLON, G. (1992): “Intergenerational Income Mobility in the United States,” American Economic Review, 82, 393–408. [596]
——— (1999): “Intergenerational Mobility in the Labor Market,” in Handbook of Labor Economics, Vol. 3A, ed. by O. Ashenfelter and D. Card. Amsterdam: North-Holland, 1761–1800, Chapter 29. [596]
——— (2002): “Cross-Country Differences in Intergenerational Earnings Mobility,” Journal of Economic Perspectives, 16, 59–66. [596]
STUTE, W. (1986): “Conditional Empirical Processes,” Annals of Statistics, 14, 638–647. [587]
VAN DER VAART, A. W., AND J. A. WELLNER (1996): Weak Convergence and Empirical Processes. New York: Springer. [591]
Dept. of Economics, University College London, Gower Street, London WC1E 6BT, U.K.;
[email protected], Dept. of Economics, London School of Economics, Houghton Street, London WC2A 2AE, U.K.;
[email protected], http://econ.lse.ac.uk/staff/olinton/, and Dept. of Economics, Seoul National University, Seoul 151-742, Korea;
[email protected], http://plaza.snu.ac.kr/~whang. Manuscript received May, 2007; final revision received July, 2008.
Econometrica, Vol. 77, No. 2 (March, 2009), 603–606
A MECHANISM FOR ELICITING PROBABILITIES BY EDI KARNI1 This paper describes a direct revelation mechanism for eliciting agents’ subjective probabilities. The game induced by the mechanism has a dominant strategy equilibrium in which the players reveal their subjective probabilities. KEYWORDS: Probability elicitation, direct revelation mechanism.
1. INTRODUCTION

AN INDIVIDUAL’S ASSESSMENT of the likelihoods of events in which he has no stake may be of interest to others. This is the case when the person whose assessment is sought is an expert and the others are decision makers facing a choice between alternative courses of action whose consequences depend on which of the events obtains. Investors, for instance, may be interested in the probability a geologist assigns to discovering mineral deposits beneath a particular tract of land; a patient might want to seek a second opinion on the likelihood of success of a treatment recommended by his physician.

Procedures for eliciting the subjective probabilities of agents whose preferences are represented by subjective expected utility functionals include the proper scoring rule method (Savage (1971)), the promissory notes method (de Finetti (1974)), and the lotteries method (Kadane and Winkler (1988)). The first two procedures entail trade-offs between the incentives and the accuracy of the probability estimate. The third procedure is not incentive compatible.2 This paper introduces a new elicitation mechanism that yields accurate elicitation while allowing the incentives to be set at any desirable level.

2. THE ELICITATION MECHANISM

Let S be a set of states, one of which is the true state. Subsets of S are events. An event is said to obtain if the true state belongs to it. Simple acts are mappings from S to the real numbers, representing monetary payoffs, with finite images. A bet on an event E is a simple act that pays x dollars if E obtains and y dollars otherwise, x > y; it is denoted by x_E y. A simple lottery is a finite list of monetary prizes (that is, (x_1, ..., x_m) ∈ R^m, m < ∞) and a corresponding probability vector (p_1, ..., p_m), where, for each i, p_i ≥ 0 is the probability of winning the prize x_i and Σ_{i=1}^m p_i = 1.

1 I am grateful to John Hey for stimulating conversations and to LUISS University for its hospitality. I also benefited greatly from the comments and suggestions of the editor and three anonymous referees.

2 For a more detailed discussion, see the concluding section.
Denote by D the union of the sets of simple acts and lotteries, let ≽ be a preference relation on D, and denote by ≻ and ∼ the asymmetric and symmetric parts of ≽, respectively. A preference relation ≽ on D restricted to the set of finite acts is said to exhibit probabilistic sophistication if it ranks acts or lotteries solely on the basis of their implied probability distributions over outcomes (see Machina and Schmeidler (1995)). In particular, if π is the probability measure implicit in ≽, then probabilistic sophistication implies that, for all acts f and lotteries (p; x, y) = [x, p; y, (1 − p)], p ∈ [0, 1], π(f^{−1}(x)) = p implies x_{f^{−1}(x)} y ∼ (p; x, y).

Consider an agent whose assessment of the probability of the event E is of interest. Suppose that the agent’s preference relation ≽ on D displays probabilistic sophistication and dominance in the sense that (p; x, y) ≽ (p′; x, y) for all x > y if and only if p ≥ p′. Denote by π(E) the probability the agent assigns to the event E.

The elicitation mechanism selects a random number r from a uniform distribution on [0, 1] and requires the agent to submit a report, μ ∈ [0, 1], of his subjective probability assessment of the event E. The mechanism awards the agent the payoff β := x_E y if μ ≥ r and the lottery (r; x, y) if μ < r.

To see that truthful reporting is the agent’s unique dominant strategy, suppose that the agent reports μ > π(E). If r ≤ π(E) or r ≥ μ, the agent’s payoff is the same regardless of whether he reports μ or π(E).3 If r ∈ (π(E), μ), the agent’s payoff is β; had he reported π(E) instead of μ, his payoff would have been (r; x, y). But r > π(E), which, by probabilistic sophistication and dominance, implies (r; x, y) ≻ β. Thus the agent is worse off reporting μ instead of π(E). A similar argument applies when μ < π(E).4

The elicitation mechanism described above is quite general. In particular, it may be extended to the case of many agents by running the mechanism separately for each agent.5 It may also be extended to finitely many events by running the mechanism separately for each of the events. Moreover, if the payoff difference is sufficiently large, the agent is induced to exert the effort necessary to arrive at an accurate assessment of his subjective probability.

An equivalent probability-elicitation auction mechanism is as follows: The mechanism selects r as before and runs a continuous increasing-bid auction between the agent and a dummy bidder. The dummy bidder stays in the auction as long as the bid is smaller than r and drops out when the bid equals r. Starting at 0, the bid increases continuously as long as the agent and the dummy bidder are both “in the auction” and stops when one of them drops out or the bid reaches 1, whichever comes first.6 The agent is awarded (r; x, y) if he is the first to quit and β otherwise. Clearly, the agent’s dominant strategy is to stay in the auction as long as the bid is smaller than π(E) and to quit when the bid equals π(E).

3 Regardless of whether the agent reported μ or π(E), his payoff is β if r ≤ π(E) and (r; x, y) if r ≥ μ.

4 In this case, if r ∈ (μ, π(E)), the agent wins (r; x, y); had he reported π(E) instead, he would have won β. But β ≻ (r; x, y).

5 In the case of many agents, the mechanism may be redesigned so that the random number, r, is generated endogenously. Specifically, the highest reported value is substituted for r and all other reports are treated as μ. For more details, see Karni (2008).

6 The agent is “in the auction” until he signals that he quits.
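As a numerical check on the dominant-strategy argument, the following is a minimal Python simulation of the mechanism (a sketch, not part of the paper). The utility function, prizes, and π(E) are arbitrary illustrative choices; common random draws of r are reused across candidate reports so that expected utilities are compared on the same sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative primitives (assumptions of this sketch, not from the paper):
x, y, pi_E = 100.0, 10.0, 0.3       # bet pays x on E, y otherwise; pi(E) = 0.3
u = np.sqrt                         # a risk-averse utility, for concreteness
r = rng.uniform(size=500_000)       # common draws of the mechanism's r

def expected_utility(mu):
    # If mu >= r, the agent holds the bet beta = x_E y: wins x with prob pi(E).
    # If mu <  r, the agent holds the lottery (r; x, y): wins x with prob r.
    p_win = np.where(mu >= r, pi_E, r)
    return np.mean(p_win * u(x) + (1.0 - p_win) * u(y))

grid = np.linspace(0.0, 1.0, 101)
best = grid[np.argmax([expected_utility(m) for m in grid])]
print("EU-maximizing report:", best, "true pi(E):", pi_E)   # report ~ 0.30
```

Because the payoff probability is π(E) whenever μ ≥ r and r itself whenever μ < r, the comparison between reports reduces to a comparison of winning probabilities, so the curvature of u plays no role; this is why truth-telling is optimal regardless of risk attitude.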
3. CONCLUDING REMARKS

In situations in which the agent must be induced to take costly measures (e.g., time and effort) in order to arrive at a reliable probability estimate, the mechanisms introduced in this paper perform better than the elicitation procedures discussed in the literature. Consider, for example, the proper scoring rule method. Let E denote the event of interest and denote by E^c its complement. Then, according to this method, the agent’s payoff equals the score −r(δ_E − μ)², where r is a positive constant, μ is the agent’s reported probability of E, and δ_E is the indicator function of the event E. Let w̃ be the agent’s random wealth and denote by F its cumulative distribution function. Consistent with the no-stake requirement, suppose that w̃ is distributed independently of E. If the agent’s subjective assessment of the probability of E is π(E), then his problem is

(1)    max_μ  π(E) ∫ u(w − r(1 − μ)²) dF(w | E) + (1 − π(E)) ∫ u(w − rμ²) dF(w | E^c).
The necessary condition is

(2)    μ*(r)/(1 − μ*(r)) = K(r) π(E)/(1 − π(E)),

where μ*(r) denotes the optimal solution and

K(r) = ∫ u′(w − r(1 − μ*(r))²) dF(w | E) / ∫ u′(w − rμ*(r)²) dF(w | E^c).
Thus μ*(r) = π(E) if and only if K(r) = 1. Unless the agent is risk neutral, the elicitation of probabilities by the scoring rule method confounds subjective probabilities and marginal utilities. If, to motivate the agent to assess the probability of the event of interest accurately, it is necessary to expose him to risk by setting a large value of r, then, in general, μ*(r) is a biased estimate of π(E).7
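To illustrate this bias numerically, the following Python sketch solves the agent's problem (1) under assumed primitives (a degenerate wealth distribution, square-root utility, and π(E) = 0.3, none of which come from the paper) and traces μ*(r) as r grows:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Assumed primitives for this sketch: nonrandom wealth w0 (so F is degenerate
# and trivially independent of E), risk-averse utility u = sqrt, pi(E) = 0.3.
w0, pi_E = 10.0, 0.3
u = np.sqrt

def mu_star(r):
    # Maximize (1): pi u(w0 - r(1-mu)^2) + (1-pi) u(w0 - r mu^2) over mu.
    obj = lambda mu: -(pi_E * u(w0 - r * (1.0 - mu) ** 2)
                       + (1.0 - pi_E) * u(w0 - r * mu ** 2))
    return minimize_scalar(obj, bounds=(0.0, 1.0), method="bounded").x

for r in (0.01, 1.0, 5.0, 9.0):
    print(r, round(mu_star(r), 4))   # drifts from 0.30 toward 1/2 as r grows
```

Larger r strengthens incentives but worsens the confounding with marginal utility, which is exactly the trade-off described here; the direct mechanism above avoids it because its payoff comparison does not involve u.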
To obtain an unbiased assessment of π(E), it is necessary to let r tend to zero, but then the agent has no incentive to assess the probability of the event E accurately. The promissory notes method of de Finetti (1974) suffers from the same problem.8

The accuracy of the elicitation procedures described here depends critically on the agent having no stake in the event of interest.9 If he does have a stake in the event, the evaluations of the payoffs of the bet and the lotteries that figure in the mechanism are event dependent, and the preference relation does not exhibit probabilistic sophistication.

REFERENCES

DE FINETTI, B. (1974): Theory of Probability, Vol. 1. New York: Wiley. [603,606]
JAFFRAY, J.-Y., AND E. KARNI (1999): “Elicitation of Subjective Probabilities When the Initial Endowment Is Unobserved,” Journal of Risk and Uncertainty, 18, 5–20. [606]
KADANE, B. J., AND R. L. WINKLER (1988): “Separating Probability Elicitation From Utility,” Journal of the American Statistical Association, 83, 357–363. [603,606]
KARNI, E. (1999): “Elicitation of Subjective Probabilities When Preferences Are State-Dependent,” International Economic Review, 40, 479–486. [606]
——— (2008): “A Mechanism Design for Probability Elicitation,” Unpublished Manuscript, Johns Hopkins University. [604]
MACHINA, M. J., AND D. SCHMEIDLER (1995): “Bayes Without Bernoulli: Simple Conditions for Probabilistically Sophisticated Choice,” Journal of Economic Theory, 67, 106–128. [604]
SAVAGE, L. J. (1971): “Elicitation of Personal Probabilities and Expectations,” Journal of the American Statistical Association, 66, 783–801. [603]
Dept. of Economics, Johns Hopkins University, Baltimore, MD 21218, U.S.A.;
[email protected]. Manuscript received April, 2008; final revision received November, 2008.
7 If the agent is risk averse and π(E) ≠ 1/2, then μ*(r) is biased toward 1/2 and the bias increases with r. Similarly, if the agent is risk inclined, then μ*(r) is biased toward either 0 (if π(E) < 1/2) or 1 (if π(E) > 1/2), and the bias increases with r.
8 The lottery method requires the agent to indicate the probability p that would make him indifferent between (p; x, y) and β. However, because it does not specify the payoff to the agent, it is not incentive compatible.
9 When the agent has a stake in the events of interest, the other methods also fail (Kadane and Winkler (1988)). Jaffray and Karni (1999) and Karni (1999) developed elicitation methods designed to overcome this difficulty.
Econometrica, Vol. 77, No. 2 (March, 2009), 607–614
ANNOUNCEMENTS

2009 NORTH AMERICAN SUMMER MEETING
THE 2009 NORTH AMERICAN SUMMER MEETING of the Econometric Society will be held June 4–7, 2009, hosted by the Department of Economics, Boston University, in Boston, MA. The program will be composed of a selection of invited and contributed papers. The program co-chairs are Barton Lipman and Pierre Perron of Boston University. The local arrangements chair is Marc Rysman of Boston University.

In addition to submitted papers, the program will include the Presidential Address by Roger Myerson (University of Chicago), the Walras–Bowley Lecture by Jean-Marc Robin (Paris School of Economics, University of Paris I, and University College London), the Cowles Lecture by Victor Chernozhukov (Massachusetts Institute of Technology), and the following plenary sessions:

Behavioral economics: David Laibson (Harvard University), Sendhil Mullainathan (Harvard University)
Decision theory: Eddie Dekel (Northwestern University and Tel Aviv University), Ariel Rubinstein (Tel Aviv University and New York University)
Development economics: Edward Miguel (University of California, Berkeley), James Robinson (Harvard University)
Econometrics of policy evaluation: Guido Imbens (Harvard University), Edward Vytlacil (Yale University)
Education policy: James Heckman (University of Chicago), Derek Neal (University of Chicago)
Factor models: Serena Ng (Columbia University), James Stock (Harvard University)
The financial crisis: John Geanakoplos (Yale University), Jeremy Stein (Harvard University)
Theories of conflict: Sandeep Baliga (Northwestern University), Debraj Ray (New York University)
Trade and geography: Samuel Kortum (University of Chicago), Thomas Holmes (University of Minnesota)

Information on local arrangements will be available later.

Program Committee:
Daron Acemoglu, Massachusetts Institute of Technology (Macroeconomics: Growth, and Political Economy)
John Campbell, Harvard University (Financial Economics)
Yeon-Koo Che, Columbia University (Auctions and Contracts)
Francis X. Diebold, University of Pennsylvania (Financial Econometrics)
Jean-Marie Dufour, McGill University (Theoretical Econometrics)
Jonathan Eaton, New York University (International Trade)
Glenn Ellison, Massachusetts Institute of Technology (Theoretical Industrial Organization)
Charles Engel, University of Wisconsin (International Finance)
Larry Epstein, Boston University (Plenary Sessions)
Hanming Fang, Duke University (Theoretical Public Economics)
Jesus Fernandez-Villaverde, University of Pennsylvania (Macroeconomics: Dynamic Models and Computational Methods)
Simon Gilchrist, Boston University (Plenary Sessions)
Wojciech Kopczuk, Columbia University (Empirical Public Economics)
Thomas Lemieux, University of British Columbia (Empirical Microeconomics)
Dilip Mookherjee, Boston University (Plenary Sessions)
Kaivan Munshi, Brown University (Development)
Muriel Niederle, Stanford University (Experimental Economics and Market Design)
Edward O’Donoghue, Cornell University (Behavioral Economics)
Claudia Olivetti, Boston University (Empirical Labor/Macroeconomics)
Christine Parlour, University of California, Berkeley (Corporate Finance/Microeconomic Foundations of Asset Pricing)
Zhongjun Qu, Boston University (Plenary Sessions)
Lucrezia Reichlin, London School of Economics (Applied Macroeconomics/Factor Models: Theory and Application)
Marc Rysman, Boston University (Empirical Industrial Organization)
Uzi Segal, Boston College (Decision Theory)
Chris Shannon, University of California, Berkeley (General Equilibrium and Mathematical Economics)
Balazs Szentes, University of Chicago (Economic Theory)
Julia Thomas, Ohio State University (Macroeconomics: Business Cycles)
Timothy Vogelsang, Michigan State University (Time Series Econometrics)
Adonis Yatchew, University of Toronto (Micro-Econometrics and Non-Parametric Methods)
Muhammet Yildiz, Massachusetts Institute of Technology (Game Theory)

2009 AUSTRALASIAN MEETING
THE ECONOMETRIC SOCIETY AUSTRALASIAN MEETING in 2009 (ESAM09) will be held in Canberra, Australia, from July 7th to July 10th. ESAM09 will be hosted by the College of Business and Economics at the Australian National University, and the program committee will be co-chaired by Heather Anderson and Maria Racionero. The program will include plenary, invited, and contributed sessions in all fields of economics. Prospective contributors are invited to submit titles and abstracts of their papers, in both theoretical and applied economics and econometrics, by March 6th, 2009 via the conference website at http://esam09.anu.edu.au. Each person may submit only one paper, or be a co-author on others, provided that they will present no more than one paper. At least one co-author must be a member of the Society or must join prior to submission. The ESAM09 conference website contains details about the program, invited speakers, the paper submission process, and conference registration.
THE 2009 FAR EAST AND SOUTH ASIA MEETING of the Econometric Society (FESAMES 2009) will be hosted by the Faculty of Economics of the University of Tokyo on 3–5 August, 2009. The venue of the meeting is the Hongo campus of the University of Tokyo, which is about ten minutes away from Tokyo JR station. The program will consist of invited and contributed sessions in all fields of economics. Prospective contributors are invited to submit their papers electronically to the appropriate program committee member through the conference website: http://www.e.u-tokyo.ac.jp/cirje/research/conf/FESAMES2009/fesames2009.html. Complete papers (with title, abstract, and full text) must be submitted by 20 April, 2009.
Paper acceptance notification will be sent by email by 20 May, 2009. Final versions of accepted papers must be uploaded to the conference website by 8 July, 2009; paper presenters must register by that date. Each author may submit only one paper, or be a co-author on others, provided that no author would present more than one paper. At least one co-author must be a member of the Society or must join (electronically at http://www.econometricsociety.org) prior to submission. A partial subsidy is available for the travel and local expenses of young economists submitting a paper for a contributed session of the conference.

Confirmed Invited Speakers:
Dilip Abreu (Princeton University)
Abhijit Banerjee (Massachusetts Institute of Technology)
Markus K. Brunnermeier (Princeton University)
Larry G. Epstein (Boston University)
Faruk R. Gul (Princeton University)
James Heckman (University of Chicago)
Han Hong (Stanford University)
Michihiro Kandori (University of Tokyo)
Dean Karlan (Yale University)
Nobuhiro Kiyotaki (Princeton University)
John List (University of Chicago)
Charles Manski (Northwestern University)
Daniel McFadden (University of California, Berkeley)
Costas Meghir (University College London)
Roger Myerson (University of Chicago)
Sendhil Mullainathan (Harvard University)
Ariel Pakes (Harvard University)
Giorgio Primiceri (Northwestern University)
Jean-Marc Robin (Paris School of Economics/UCL)
Yuliy Sannikov (Princeton University)
Hyun Song Shin (Princeton University)
Christopher Sims (Princeton University)
James Stock (Harvard University)
David Weir (University of Michigan)

Program Committee:
Co-Chairs: Hidehiko Ichimura (University of Tokyo), Hitoshi Matsushima (University of Tokyo)
Toni Braun, Macroeconomics
Xiaohong Chen, Econometric Theory
Shinichi Fukuda, Monetary Economics, Macroeconomics, International Finance
Takeo Hoshi, Finance and Banking, Monetary Economics, Japanese Economy
Toshihiro Ihori, Public Economics, Fiscal Policy
Hideshi Itoh, Contract Theory
Atsushi Kajii, Economic Theory, Information Economics, General Equilibrium, Game Theory
Kazuya Kamiya, General Equilibrium, Decision Theory, Computational Economics
Yuichi Kitamura, Microeconometrics
Kazuharu Kiyono, Trade, Industrial Economics, Applied Economics
Siu Fai Leung, Labor
Akihiko Matsui, Game Theory
Tomoyuki Nakajima, Macroeconomic Theory
Masao Ogaki, Macroeconometrics
Hiroshi Ohashi, Industrial Organization, International Trade
Joon Y. Park, Time Series Econometrics
Tatsuyoshi Saijo, Experimental Economics, Mechanism Design
Makoto Saito, Asset Pricing, Consumption and Investment, Monetary Theory
Yasuyuki Sawada, Development, Applied Econometrics
Shigehiro Serizawa, Mechanism Design, Social Choice Theory
Akihisa Shibata, International Macroeconomics
Takatoshi Tabuchi, Urban Economics, International Trade, Economic Geography
Noriyuki Yanagawa, Law and Economics, Financial Contract

Local Organizing Committee:
Motoshige Itoh (Chair) (University of Tokyo)
Yoichi Arai (University of Tokyo)
Yun Jeong Choi (University of Tokyo)
Julen Esteban-Pretel (University of Tokyo)
Masahiro Fujiwara (University of Tokyo)
Fumio Hayashi (University of Tokyo)
Isao Ishida (University of Tokyo)
Takatoshi Ito (University of Tokyo)
Katsuhito Iwai (University of Tokyo)
Yasushi Iwamoto (University of Tokyo)
Yoshitsugu Kanemoto (University of Tokyo)
Takashi Kano (University of Tokyo)
Takao Kobayashi (University of Tokyo)
Tatsuya Kubokawa (University of Tokyo)
Naoto Kunitomo (University of Tokyo)
Hisashi Nakamura (University of Tokyo)
Tetsuji Okazaki (University of Tokyo)
Yasuhiro Omori (University of Tokyo)
Akihiko Takahashi (University of Tokyo)
Yoshiro Miwa (University of Tokyo)
Kazuo Ueda (University of Tokyo)
Makoto Yano (Kyoto University)
Jiro Yoshida (University of Tokyo)
Hiroshi Yoshikawa (University of Tokyo)

2009 EUROPEAN MEETING
THE 2009 EUROPEAN MEETING of the Econometric Society (ESEM) will take place in Barcelona, Spain, from 23 to 27 August, 2009. The Meeting is organized by the Barcelona Graduate School of Economics and will run in parallel with the Congress of the European Economic Association (EEA). The Program Committee Chairs are Prof. Juuso Välimäki (Helsinki School of Economics) for Theoretical and Applied Economics and Prof. Gerard J. van den Berg (Free University Amsterdam) for Econometrics and Empirical Economics. This year’s Fisher–Schultz Lecture will be given by Faruk Gul (Princeton University). The Laffont Lecture will be given by Guido Imbens (Harvard University).

The Local Arrangements Committee:
Albert Carreras, Chairman—Universitat Pompeu Fabra and Barcelona GSE
Carmen Bevià—Universitat Autònoma de Barcelona and Barcelona GSE
Jordi Brandts—Institute for Economic Analysis-CSIC and Barcelona GSE
Eduard Vallory, Secretary—Barcelona GSE Director-General

All details regarding the congress can be found on the website http://eea-esem2009.barcelonagse.eu/.
THE 2009 LATIN AMERICAN MEETINGS will be held jointly with the Latin American and Caribbean Economic Association in Buenos Aires, Argentina, from October 1 to 3, 2009. The Meetings will be hosted by Universidad Torcuato Di Tella (UTDT). The annual meetings of these two academic associations will run in parallel, under a single local organization; by registering for LAMES 2009, participants are welcome to attend all sessions of both meetings. Andrés Neumeyer (UTDT) is the conference chairman. The LAMES Program Committee is chaired by Emilio Espino (UTDT). The LACEA Program Committee is chaired by Sebastián Galiani (Washington University in St. Louis).
The deadline for submissions is May 4, 2009. Authors may submit only one paper to each meeting, and the same paper may not be submitted to both meetings. Authors submitting papers must be members of the respective association at the time of submission. Membership information can be found at http://www.lacea.org/ and http://www.econometricsociety.org. A limited number of papers will be invited to be presented in poster sessions that will be organized by topic. Further information can be found at the conference website at http://www.lacealames2009.utdt.edu or by email at [email protected].

2010 NORTH AMERICAN WINTER MEETING
THE 2010 NORTH AMERICAN WINTER MEETING of the Econometric Society will be held in Atlanta, GA, from January 3 to 5, 2010, as part of the annual meeting of the Allied Social Science Associations. The program will consist of contributed and invited papers. The program committee will be chaired by Dirk Bergemann of Yale University.

This year we are pleased to invite submissions of entire sessions (of three or four papers) in addition to individual papers. Each person may submit and present only one paper, but may be a co-author on several papers submitted to the conference. At least one co-author of each paper must be a member of the Society or must join prior to submission; you may join the Econometric Society at http://www.econometricsociety.org.

Prospective contributors are invited to submit titles and abstracts of their papers by May 5, 2009 at the conference website:

https://editorialexpress.com/conference/NAWM2010

The submissions should represent original manuscripts not previously presented at any Econometric Society regional meeting or submitted to other professional organizations for presentation at these same meetings. The following information should also be provided electronically at the time of submission: the authors’ names, affiliations, complete addresses, and telephone and fax numbers; the email addresses and websites (if any) of the submitters; the JEL primary field name and number; and the paper title.

Program Committee:
Dirk Bergemann, Yale University, Chair
Marco Battaglini, Princeton University (Political Economy)
Roland Benabou, Princeton University (Behavioral Economics)
Markus Brunnermeier, Princeton University (Financial Economics)
Xiaohong Chen, Yale University (Theoretical Econometrics, Time Series)
Liran Einav, Stanford University (Industrial Organization)
Luis Garicano, University of Chicago (Organization, Law and Economics)
John Geanakoplos, Yale University (General Equilibrium Theory, Mathematical Economics)
Mike Golosov, MIT (Macroeconomics)
Pierre-Olivier Gourinchas, University of California, Berkeley (International Finance)
Igal Hendel, Northwestern University (Empirical Microeconomics)
Johannes Hoerner, Yale University (Game Theory)
Han Hong, Stanford University (Applied Econometrics)
Wojciech Kopczuk, Columbia University (Public Economics)
Martin Lettau, University of California, Berkeley (Finance)
Enrico Moretti, University of California, Berkeley (Labor)
Muriel Niederle, Stanford University (Experimental Game Theory, Market Design)
Luigi Pistaferri, Stanford University (Labor)
Esteban Rossi-Hansberg, Princeton University (International Trade)
Marciano Siniscalchi, Northwestern University (Decision Theory)
Robert Townsend, Massachusetts Institute of Technology (Development Economics)
Aleh Tsyvinski, Yale University (Macroeconomics, Public Finance)
Harald Uhlig, University of Chicago (Macroeconomics, Computational Finance)
Ricky Vohra, Northwestern University (Auction, Mechanism Design)
Econometrica, Vol. 77, No. 2 (March, 2009), 615
FORTHCOMING PAPERS THE FOLLOWING MANUSCRIPTS, in addition to those listed in previous issues, have been accepted for publication in forthcoming issues of Econometrica. ACKERBERG, DANIEL, JOHN GEWEKE, AND JINYONG HAHN: “Comments on “Convergence Properties of the Likelihood of Computed Dynamic Models” by Fernández-Villaverde, Rubio-Ramírez, and Santos.” BAI, JUSHAN: “Panel Data Models With Interactive Fixed Effects.” BLOOM, NICHOLAS: “The Impact of Uncertainty Shocks.” CHARNESS, GARY, AND URI GNEEZY: “Incentives to Exercise.” COSTINOT, ARNAUD: “An Elementary Theory of Comparative Advantage.” ECHENIQUE, FEDERICO, AND IVANA KOMUNJER: “Testing Models With Multiple Equilibria by Quantile Methods.” FIRPO, SERGIO, NICOLE M. FORTIN, AND THOMAS LEMIEUX: “Unconditional Quantile Regressions.” GARRATT, RODNEY J., THOMAS TRÖGER, AND CHARLES Z. ZHENG: “Collusion via Resale.” GNEEZY, URI, KENNETH L. LEONARD, AND JOHN A. LIST: “Gender Differences in Competition: Evidence From a Matrilineal and a Patriarchal Society.” GUERRE, EMMANUEL, ISABELLE PERRIGNE, AND QUANG VUONG: “Nonparametric Identification of Risk Aversion in First-Price Auctions Under Exclusion Restrictions.” HELLWIG, CHRISTIAN, AND GUIDO LORENZONI: “Bubbles and Self-Enforcing Debt.” IMBENS, GUIDO W., AND WHITNEY K. NEWEY: “Identification and Estimation of Triangular Simultaneous Equations Models Without Additivity.” KARLAN, DEAN, AND JONATHAN ZINMAN: “Observing Unobservables: Identifying Information Asymmetries With a Consumer Credit Field Experiment.” KEANE, MICHAEL P., AND ROBERT M. SAUER: “Classification Error in Dynamic Discrete Choice Models: Implications for Female Labor Supply Behavior.” PISSARIDES, CHRISTOPHER A.: “The Unemployment Volatility Puzzle: Is Wage Stickiness the Answer?” RIEDEL, FRANK: “Optimal Stopping With Multiple Priors.” SINISCALCHI, MARCIANO: “Vector Expected Utility and Attitudes Toward Variation.” STOYE, JÖRG: “More on Confidence Intervals for Partially Identified Parameters.” SUN, NING, AND ZAIFU YANG: “A Double-Track Adjustment Process for Discrete Markets With Substitutes and Complements.”
Econometrica, Vol. 77, No. 2 (March, 2009), 617–622
2008 ELECTION OF FELLOWS TO THE ECONOMETRIC SOCIETY

THE FELLOWS OF THE ECONOMETRIC SOCIETY elected fifteen new Fellows in 2008. Their names and selected bibliographies are given below.

TORBEN G. ANDERSEN, Dept. of Finance, Kellogg School of Management, Northwestern University.
“Return Volatility and Trading Volume: An Information Flow Interpretation of Stochastic Volatility,” Journal of Finance, 51 (1996), 169–204.
“DM-Dollar Volatility: Intraday Activity Patterns, Macroeconomic Announcements, and Longer Run Dependencies” (with T. Bollerslev), Journal of Finance, 53 (1998), 219–265.
“Answering the Skeptics: Yes, Standard Volatility Models Do Provide Accurate Forecasts” (with T. Bollerslev), International Economic Review, 39 (1998), 885–905.
“An Empirical Investigation of Continuous-Time Models for Equity Returns” (with L. Benzoni and J. Lund), Journal of Finance, 57 (2002), 1239–1284.
“Modeling and Forecasting Realized Volatility” (with T. Bollerslev, F. X. Diebold, and P. Labys), Econometrica, 71 (2003), 579–626.
“Micro Effects of Macro Announcements: Real-Time Price Discovery in Foreign Exchange” (with T. Bollerslev, F. X. Diebold, and C. Vega), American Economic Review, 93 (2003), 38–62.

MARK ARMSTRONG, Professor of Economics, University College London.
“Multiproduct Nonlinear Pricing,” Econometrica, 64 (1996), 51–76.
“Network Interconnection in Telecommunications,” Economic Journal, 108 (1998), 545–564.
“Price Discrimination by a Many-Product Firm,” Review of Economic Studies, 66 (1999), 151–168.
“Optimal Multi-Object Auctions,” Review of Economic Studies, 67 (2000), 455–482.
“Competitive Price Discrimination” (with J. Vickers), Rand Journal of Economics, 32 (2001), 579–605.
“Competition in Two-sided Markets,” Rand Journal of Economics, 37 (2006), 668–691.

MARTIN CRIPPS, Professor of Economics, University College London.
“Reputation and Commitment in Two-Person Repeated Games Without Discounting” (with J. P. Thomas), Econometrica, 63 (1995), 1401–1420.
“Reputation in Perturbed Repeated Games” (with K. M. Schmidt and J. P. Thomas), Journal of Economic Theory, 69 (1996), 387–410.
“Imperfect Monitoring and Impermanent Reputations” (with G. J. Mailath and L. Samuelson), Econometrica, 72 (2004), 407–432.
“Strategic Experimentation With Exponential Bandits” (with G. Keller and S. Rady), Econometrica, 73 (2005), 39–68.
“Efficiency of Large Double Auctions” (with J. M. Swinkels), Econometrica, 74 (2006), 47–92.
“Common Learning” (with J. Ely, G. J. Mailath, and L. Samuelson), Econometrica, 76 (2008), 909–934.

ERNST FEHR, Professor of Economics, University of Zurich.
“Does Fairness Prevent Market Clearing?” (with G. Kirchsteiger and A. Riedl), Quarterly Journal of Economics, 108 (1993), 437–460.
“Reciprocity as a Contract Enforcement Device” (with S. Gächter and G. Kirchsteiger), Econometrica, 65 (1997), 833–860.
“A Theory of Fairness, Competition and Cooperation” (with K. Schmidt), Quarterly Journal of Economics, 114 (1999), 817–868.
“Cooperation and Punishment in Public Goods Experiments” (with S. Gächter), American Economic Review, 90 (2000), 980–994.
“Altruistic Punishment in Humans” (with S. Gächter), Nature, 415 (10 January, 2002), 137–140.
“Oxytocin Increases Trust in Humans” (with M. Kosfeld, M. Heinrichs, P. Zak, and U. Fischbacher), Nature, 435 (2 June, 2005), 673–676.

JEREMY GREENWOOD, Professor of Economics, University of Pennsylvania.
“Financial Development, Growth, and the Distribution of Income” (with B. Jovanovic), Journal of Political Economy, 98 (1990), 1076–1107.
“Long-Run Implications of Investment-Specific Technological Change” (with Z. Hercowitz and P. Krusell), American Economic Review, 87 (1997), 342–362.
“On the State of the Union” (with S. R. Aiyagari and N. Guner), Journal of Political Economy, 108 (2000), 213–244.
“Efficient Investment in Children” (with S. R. Aiyagari and A. Seshadri), Journal of Economic Theory, 102 (2002), 290–321.
“Engines of Liberation” (with A. Seshadri and M. Yorukoglu), Review of Economic Studies, 72 (2005), 109–133.
“The Baby Boom and Baby Bust” (with A. Seshadri and G. Vandenbroucke), American Economic Review, 95 (2005), 183–207.

PHILIP HAILE, Professor of Economics, Yale University.
“Auctions With Resale Markets: An Application to U.S. Forest Service Timber Sales,” American Economic Review, 91 (2001), 399–427.
“Identification of Standard Auction Models” (with S. Athey), Econometrica, 70 (2002), 2107–2140.
“Inference With an Incomplete Model of English Auctions” (with E. Tamer), Journal of Political Economy, 111 (2003), 1–51.
“Nonparametric Approaches to Auctions” (with S. Athey), in Handbook of Econometrics, Vol. 6A, ed. by J. Heckman and E. Leamer. Elsevier (2007), 3847–3965.
“On the Empirical Content of Quantal Response Equilibrium” (with A. Hortaçsu and G. Kosenok), American Economic Review, 98 (2008), 180–200.

IAN JEWITT, Official Fellow, Nuffield College, University of Oxford.
“Preference Structure and Piecemeal Second Best Policy,” Journal of Public Economics, 16 (1981), 215–231.
“Risk Aversion and the Choice Between Risky Prospects: The Preservation of Comparative Statics Results,” Review of Economic Studies, 54 (1987), 73–85.
“Justifying the First-Order Approach to Principal-Agent Problems,” Econometrica, 56 (1988), 1177–1190.
“Decentralizing Public Good Supply” (with T. Besley), Econometrica, 59 (1991), 1769–1778.
“The Economics of Career Concerns, Part I: Comparing Information Structures” (with M. Dewatripont and J. Tirole), Review of Economic Studies, 66 (1999), 183–198.
“The Economics of Career Concerns, Part II: Application to Missions and Accountability of Government Agencies” (with M. Dewatripont and J. Tirole), Review of Economic Studies, 66 (1999), 199–217.

MICHAEL KREMER, Professor of Economics, Harvard University.
“The O-Ring Theory of Economic Development,” Quarterly Journal of Economics, 108 (1993), 551–576.
“Population Growth and Technological Change: 1,000,000 B.C. to 1990,” Quarterly Journal of Economics, 108 (1993), 681–716.
“Patent Buyouts: A Mechanism for Encouraging Innovation,” Quarterly Journal of Economics, 113 (1998), 1137–1167.
“Vouchers for Private Schooling in Colombia: Evidence From a Randomized Natural Experiment” (with J. Angrist, E. Bettinger, E. Bloom, and E. King), American Economic Review, 92 (2002), 1535–1558.
“Worms: Identifying Impacts on Education and Health in the Presence of Treatment Externalities” (with E. Miguel), Econometrica, 72 (2004), 159–217.
“Odious Debt” (with S. Jayachandran), American Economic Review, 96 (2006), 82–92.

JONATHAN LEVIN, Professor of Economics, Stanford University.
“Information and Competition in U.S. Forest Service Timber Auctions” (with S. Athey), Journal of Political Economy, 109 (2001), 375–417.
“Multilateral Contracting and the Employment Relationship,” Quarterly Journal of Economics, 117 (2002), 1075–1103.
“Relational Incentive Contracts,” American Economic Review, 93 (2003), 835–857.
“Profit Sharing and the Role of Professional Partnerships” (with S. Tadelis), Quarterly Journal of Economics, 120 (2005), 131–171.
“Matching and Price Competition” (with J. Bulow), American Economic Review, 96 (2006), 652–668.
“Estimating Dynamic Models of Imperfect Competition” (with P. Bajari and L. Benkard), Econometrica, 75 (2007), 1331–1370.

AKIHIKO MATSUI, Professor of Economics, University of Tokyo.
“Social Stability and Equilibrium” (with I. Gilboa), Econometrica, 59 (1991), 859–867.
“Cheap-Talk and Cooperation in a Society,” Journal of Economic Theory, 54 (1991), 245–258.
“Best Response Dynamics and Socially Stable Strategies,” Journal of Economic Theory, 57 (1992), 343–362.
“An Approach to Equilibrium Selection” (with K. Matsuyama), Journal of Economic Theory, 65 (1995), 415–434.
“Asynchronous Choice in Repeated Coordination Games” (with R. Lagunoff), Econometrica, 65 (1997), 1467–1477.
“Learning Aspiration in Repeated Games” (with I.-K. Cho), Journal of Economic Theory, 124 (2005), 171–201.

MARC MELITZ, Professor of Economics, Princeton University.
“The Impact of Trade on Intra-Industry Reallocations and Aggregate Industry Productivity,” Econometrica, 71 (2003), 1695–1725.
“Export Versus FDI With Heterogeneous Firms” (with E. Helpman and S. Yeaple), American Economic Review, 94 (2004), 300–316.
“International Trade and Macroeconomic Dynamics With Heterogeneous Firms” (with F. Ghironi), Quarterly Journal of Economics, 120 (2005), 865–915.
“Market Size, Trade, and Productivity” (with G. Ottaviano), Review of Economic Studies, 75 (2008), 295–316.
“Estimating Trade Flows: Trading Partners and Trading Volumes” (with E. Helpman and Y. Rubinstein), Quarterly Journal of Economics, 123 (2008), 441–487.

DILIP MOOKHERJEE, Professor of Economics, Boston University.
“Optimal Incentive Schemes With Many Agents,” Review of Economic Studies, 51 (1984), 433–446.
“Optimal Auditing, Insurance and Redistribution” (with I. Png), Quarterly Journal of Economics, 102 (1989), 399–415.
“Learning Behavior in an Experimental Matching Pennies Game” (with B. Sopher), Games and Economic Behavior, 7 (1994), 62–91.
“Hierarchical Decentralization of Incentive Contracts” (with N. Melumad and S. Reichelstein), Rand Journal of Economics, 26 (1995), 654–672.
“Inequality, Control Rights, and Rent-Seeking: Sugar Cooperatives in Maharashtra” (with A. Banerjee, K. Munshi, and D. Ray), Journal of Political Economy, 109 (2001), 138–190.
“Persistent Inequality” (with D. Ray), Review of Economic Studies, 70 (2003), 369–394.

MONIKA PIAZZESI, Professor of Economics, Stanford University.
“A No-Arbitrage Vector Autoregression of Term Structure Dynamics With Macroeconomic and Latent Variables” (with A. Ang), Journal of Monetary Economics, 50 (2003), 745–787.
“Bond Risk Premia” (with J. Cochrane), American Economic Review, 95 (2005), 138–160.
“Bond Yields and the Federal Reserve,” Journal of Political Economy, 113 (2005), 311–344.
“What Does the Yield Curve Tell Us About GDP Growth?” (with A. Ang and M. Wei), Journal of Econometrics, 131 (2006), 359–403.
“Housing, Consumption, and Asset Pricing” (with M. Schneider and S. Tuzel), Journal of Financial Economics, 83 (2007), 531–569.

ROBERT W. STAIGER, Professor of Economics, Stanford University.
“Discretionary Trade Policy and Excessive Protection” (with G. Tabellini), American Economic Review, 77 (1987), 823–837.
“A Theory of Managed Trade” (with K. Bagwell), American Economic Review, 80 (1990), 779–795.
“An Economic Theory of GATT” (with K. Bagwell), American Economic Review, 89 (1999), 215–248.
“Domestic Policies, National Sovereignty and International Economic Institutions” (with K. Bagwell), Quarterly Journal of Economics, 116 (2001), 519–562.
“Will International Rules on Subsidies Disrupt the World Trading System?” (with K. Bagwell), American Economic Review, 96 (2006), 877–895.

ELIE TAMER, Professor of Economics, Northwestern University.
“Inference on Regressions With Interval Data on a Regressor or Outcome” (with C. Manski), Econometrica, 70 (2002), 519–546.
“Inference in Censored Models With Endogenous Regressors” (with H. Hong), Econometrica, 71 (2003), 905–932.
“Incomplete Simultaneous Discrete Response Model With Multiple Equilibria,” Review of Economic Studies, 70 (2003), 147–167.
“Inference With an Incomplete Model of English Auctions” (with P. Haile), Journal of Political Economy, 111 (2003), 1–51.
“Bounds on Parameters in Panel Dynamic Discrete Choice Models” (with B. Honore), Econometrica, 74 (2006), 611–630.
“Estimation and Confidence Regions for Parameter Sets in Econometric Models” (with V. Chernozhukov and H. Hong), Econometrica, 75 (2007), 1243–1284.